Understanding Spatial Data

Francisco Rowe (@fcorowe)

2021-11-05

Using Spatial Data Frames

A core area of this module is learning to work with spatial data in R. R has various purposedly designed packages for manipulation of spatial data and spatial analysis techniques. Various R packages exist in CRAN eg. spatial, sgeostat, splancs, maptools, tmap, rgdal, spand and more recent development of sf - see Lovelace, Nowosad, and Muenchow, 2021 for a great description and historical context for some of these packages.

Lovelace, R., Nowosad, J. and Muenchow, J., 2019. Geocomputation with R. CRC Press

During this session, we will use sf.

We first need to import our spatial data. We will use a shapefile containing data at Output Area (OA) level for Liverpool. These data illustrates the hierarchical structure of spatial data.

Read Spatial Data

oa_shp <- st_read("../data/Liverpool_OA.shp")
## Reading layer `Liverpool_OA' from data source `/Users/Franciscorowe 1/Dropbox/Francisco/Research/github_projects/courses/udd_course/data/Liverpool_OA.shp' using driver `ESRI Shapefile'
## Simple feature collection with 1584 features and 18 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: 332390.2 ymin: 379748.5 xmax: 345636 ymax: 397980.1
## projected CRS:  Transverse_Mercator

Examine the input data. A spatial data frame stores a range of attributes derived from a shapefile including the geometry of features (e.g. polygon shape and location), attributes for each feature (stored in the .dbf), projection and coordinates of the shapefile’s bounding box - for details, execute:

?st_read

You can employ the usual functions to visualise the content of the created data frame:

# visualise variable names
names(oa_shp)
##  [1] "OA_CD"    "LSOA_CD"  "MSOA_CD"  "LAD_CD"   "pop"      "H_Vbad"  
##  [7] "H_bad"    "H_fair"   "H_good"   "H_Vgood"  "age_men"  "age_med" 
## [13] "age_60"   "S_Rent"   "Ethnic"   "illness"  "unemp"    "males"   
## [19] "geometry"
# data structure
str(oa_shp)
## Classes 'sf' and 'data.frame':   1584 obs. of  19 variables:
##  $ OA_CD   : chr  "E00176737" "E00033515" "E00033141" "E00176757" ...
##  $ LSOA_CD : chr  "E01033761" "E01006614" "E01006546" "E01006646" ...
##  $ MSOA_CD : chr  "E02006932" "E02001358" "E02001365" "E02001369" ...
##  $ LAD_CD  : chr  "E08000012" "E08000012" "E08000012" "E08000012" ...
##  $ pop     : int  185 281 208 200 321 187 395 320 316 214 ...
##  $ H_Vbad  : int  1 2 3 7 4 4 5 9 5 4 ...
##  $ H_bad   : int  2 20 10 8 10 25 19 22 25 17 ...
##  $ H_fair  : int  9 47 22 17 32 70 42 53 55 39 ...
##  $ H_good  : int  53 111 71 52 112 57 131 104 104 53 ...
##  $ H_Vgood : int  120 101 102 116 163 31 198 132 127 101 ...
##  $ age_men : num  27.9 37.7 37.1 33.7 34.2 ...
##  $ age_med : num  25 36 32 29 34 53 23 30 34 29 ...
##  $ age_60  : num  0.0108 0.1637 0.1971 0.1 0.1402 ...
##  $ S_Rent  : num  0.0526 0.176 0.0235 0.2222 0.0222 ...
##  $ Ethnic  : num  0.3514 0.0463 0.0192 0.215 0.0779 ...
##  $ illness : int  185 281 208 200 321 187 395 320 316 214 ...
##  $ unemp   : num  0.0438 0.121 0.1121 0.036 0.0743 ...
##  $ males   : int  122 128 95 120 158 123 207 164 157 94 ...
##  $ geometry:sfc_MULTIPOLYGON of length 1584; first list element: List of 1
##   ..$ :List of 1
##   .. ..$ : num [1:14, 1:2] 335106 335130 335164 335173 335185 ...
##   ..- attr(*, "class")= chr [1:3] "XY" "MULTIPOLYGON" "sfg"
##  - attr(*, "sf_column")= chr "geometry"
##  - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA NA NA ...
##   ..- attr(*, "names")= chr [1:18] "OA_CD" "LSOA_CD" "MSOA_CD" "LAD_CD" ...
# see first few observations
head(oa_shp)
## Simple feature collection with 6 features and 18 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: 335071.6 ymin: 389876.7 xmax: 339426.9 ymax: 394479
## projected CRS:  Transverse_Mercator
##       OA_CD   LSOA_CD   MSOA_CD    LAD_CD pop H_Vbad H_bad H_fair H_good
## 1 E00176737 E01033761 E02006932 E08000012 185      1     2      9     53
## 2 E00033515 E01006614 E02001358 E08000012 281      2    20     47    111
## 3 E00033141 E01006546 E02001365 E08000012 208      3    10     22     71
## 4 E00176757 E01006646 E02001369 E08000012 200      7     8     17     52
## 5 E00034050 E01006712 E02001375 E08000012 321      4    10     32    112
## 6 E00034280 E01006761 E02001366 E08000012 187      4    25     70     57
##   H_Vgood  age_men age_med     age_60     S_Rent     Ethnic illness      unemp
## 1     120 27.94054      25 0.01081081 0.05263158 0.35135135     185 0.04379562
## 2     101 37.71174      36 0.16370107 0.17600000 0.04626335     281 0.12101911
## 3     102 37.08173      32 0.19711538 0.02352941 0.01923077     208 0.11214953
## 4     116 33.73000      29 0.10000000 0.22222222 0.21500000     200 0.03597122
## 5     163 34.19003      34 0.14018692 0.02222222 0.07788162     321 0.07428571
## 6      31 56.09091      53 0.44919786 0.88524590 0.11764706     187 0.44615385
##   males                       geometry
## 1   122 MULTIPOLYGON (((335106.3 38...
## 2   128 MULTIPOLYGON (((335810.5 39...
## 3    95 MULTIPOLYGON (((336738 3931...
## 4   120 MULTIPOLYGON (((335914.5 39...
## 5   158 MULTIPOLYGON (((339325 3914...
## 6   123 MULTIPOLYGON (((338198.1 39...

TASK:

Basic Mapping

Again, many functions exist in CRAN for creating maps:

Here this notebook demonstrates the use of plot and tmap. First plot is used to map the spatial distribution of non-British-born population in Liverpool. First we only map the geometries on the right,

Using plot

OAs of Livepool OAs of Livepool

# mapping geometry
plot(st_geometry(oa_shp))

and then:

# map attributes, adding intervals
plot(oa_shp["Ethnic"], key.pos = 4, axes = TRUE, key.width = lcm(1.3), key.length = 1.,
     breaks = "jenks", lwd = 0.1, border = 'grey') 

Spatial distribution of ethnic groups, Liverpool

Spatial distribution of ethnic groups, Liverpool

TASK:

Using tmap

Similar to ggplot2, tmap is based on the idea of a ‘grammar of graphics’ which involves a separation between the input data and aesthetics (i.e. the way data are visualised). Each data set can be mapped in various different ways, including location as defined by its geometry, colour and other features. The basic building block is tm_shape() (which defines input data), followed by one or more layer elements such as tm_fill() and tm_dots().

# ensure geometry is valid
oa_shp = sf::st_make_valid(oa_shp)
# map
legend_title = expression("% ethnic pop.")
map_oa = tm_shape(oa_shp) +
  tm_fill(col = "Ethnic", title = legend_title, palette = magma(256), style = "cont") + # add fill
  tm_borders(col = "white", lwd = .01)  + # add borders
  tm_compass(type = "arrow", position = c("right", "top") , size = 4) + # add compass
  tm_scale_bar(breaks = c(0,1,2), text.size = 0.5, position =  c("center", "bottom")) # add scale bar
map_oa

Note that the operation + is used to add new layers. You can set style themes by tm_style.

Interactive mapping

To visualise the existing styles use tmap_style_catalogue(), and you can also evaluate the code chunk below if you would like to create an interactive map.

tmap_mode("view")
## tmap mode set to interactive viewing
map_oa
## Compass not supported in view mode.
## Warning: In view mode, scale bar breaks are ignored.
ttm()
## tmap mode set to plotting

TASK: Try mapping other variables in the spatial data frame. Where do population aged 60 and over concentrate?

Comparing geographies

If you recall, one of the key issues of working with spatial data is the modifiable area unit problem (MAUP) - see Spatial Data. To get a sense of the effects of MAUP, we analyse differences in the spatial patterns of the ethnic population in Liverpool between Middle Layer Super Output Areas (MSOAs) and OAs. So we map these geographies together.

# read data at the msoa level
msoa_shp <- st_read("../data/Liverpool_MSOA.shp")
## Reading layer `Liverpool_MSOA' from data source `/Users/Franciscorowe 1/Dropbox/Francisco/Research/github_projects/courses/udd_course/data/Liverpool_MSOA.shp' using driver `ESRI Shapefile'
## Simple feature collection with 61 features and 16 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: 333086.1 ymin: 381426.3 xmax: 345636 ymax: 397980.1
## projected CRS:  Transverse_Mercator

# ensure geometry is valid
msoa_shp = sf::st_make_valid(msoa_shp)
# create a map
map_msoa = tm_shape(msoa_shp) +
  tm_fill(col = "Ethnic", title = legend_title, palette = magma(256), style = "cont") + 
  tm_borders(col = "white", lwd = .01)  + 
  tm_compass(type = "arrow", position = c("right", "top") , size = 4) + 
  tm_scale_bar(breaks = c(0,1,2), text.size = 0.5, position =  c("center", "bottom")) 
# arrange maps 
tmap_arrange(map_msoa, map_oa) 

TASK: