Fundamental Geographic Data Structures

Three main structures are generally used to organise geographic data:

Vector data structures: Vector data structures record geographic information using points, lines and polygons stored in a geographic table. Rows represent individual geographic objects, and columns store the attributes or features of those objects.
Raster data structures: Raster data structures record geographic data in a uniform way over space in the form of a grid. They divide a geographic surface into cells of constant size. The row and column of a cell give its geographic location.
Spatial graphs: Spatial graphs store connections between objects through space. These connections may derive from geographical topology (e.g. contiguity), distance, or more sophisticated dimensions, such as interaction flows (e.g. human mobility, trade and information).
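
To make the raster and spatial graph structures more concrete, here is a minimal base R sketch. It is an illustration only: the grid values, cell size, origin, point coordinates and the 150-metre threshold are all made up, and dedicated packages would normally be used for real data.

```r
# Raster: a regular grid stored as a matrix; every cell has the same size.
# Values, cell size and origin are made up for illustration.
elev <- matrix(c(10, 12, 15,
                 11, 14, 18,
                 13, 17, 21), nrow = 3, byrow = TRUE)
cell_size <- 100           # length of a cell side in metres (assumed)
origin <- c(x = 0, y = 0)  # lower-left corner of the grid (assumed)

# Centre coordinates of the cell in row i, column j (row 1 is the top row)
i <- 1
j <- 3
cell_x <- unname(origin["x"] + (j - 0.5) * cell_size)
cell_y <- unname(origin["y"] + (nrow(elev) - i + 0.5) * cell_size)

# Spatial graph: connections between objects, here derived from distance.
pts <- data.frame(id = c("a", "b", "c"),
                  x = c(0, 100, 400),
                  y = c(0, 0, 0))
d <- as.matrix(dist(pts[, c("x", "y")]))  # pairwise Euclidean distances
adj <- (d > 0 & d <= 150) * 1             # 1 if two points are within 150 m
dimnames(adj) <- list(pts$id, pts$id)
adj
```

The location of a raster cell follows from its row and column index alone, whereas the graph stores only which objects are connected, not their shape.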

Vector data structures tend to dominate in the social sciences, as the interest is often in capturing discrete geographic units containing populations. We therefore focus on vector data structures here.

Vector data

To understand the structure of vector data, let’s read a dataset (Liverpool_OA.shp) describing output areas within Liverpool in the United Kingdom. To read in the data, we use the function st_read() from the package sf. sf supports geometry collections, which can contain multiple geometry types in a single object. sf provides the functionality that was previously spread across three separate packages: sp, rgdal and rgeos (Robin et al. 2021). sf can also be used in combination with the tidyverse!

Reading the dataset with sf prints its geographic metadata: the geometry type, dimension, bounding box and coordinate reference system (on the line beginning Projected CRS).

For raster data, I would recommend using the package terra.

If you are interested in learning more about mapping geographic data, I cannot recommend enough: Lovelace, R., Nowosad, J. and Muenchow, J., 2019. “Geocomputation with R”. Chapman and Hall/CRC.

library(sf)  # provides st_read() and the other st_*() functions used below
oa_shp <- st_read("./data/Liverpool_OA.shp")

## Reading layer `Liverpool_OA' from data source 
##   `/Users/franciscorowe/Dropbox/Francisco/Research/github_projects/courses/intro-gds/data/Liverpool_OA.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 1584 features and 18 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 332390.2 ymin: 379748.5 xmax: 345636 ymax: 397980.1
## Projected CRS: Transverse_Mercator
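
The same metadata can also be queried after reading. The sketch below uses a toy square polygon rather than the Liverpool data; the coordinates are made up, and EPSG:27700 (British National Grid) is an assumption chosen for illustration, not something stated in the output above.

```r
library(sf)

# A toy square polygon; coordinates are made up for illustration
sq <- st_polygon(list(rbind(c(0, 0), c(10, 0), c(10, 10), c(0, 10), c(0, 0))))

# EPSG:27700 is an assumed CRS for this toy object
toy <- st_sf(id = "A", geometry = st_sfc(sq, crs = 27700))

st_geometry_type(toy)  # geometry type of each feature
st_bbox(toy)           # bounding box (xmin, ymin, xmax, ymax)
st_crs(toy)            # coordinate reference system
```

Applied to oa_shp, these calls should return the same information that st_read() prints when the file is loaded.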

st_read() returns an sf data frame containing attribute columns and a spatial column. We can examine its content with the function head(). Below we select the first four columns; the geometry column, which holds the geographic information, is kept automatically and appears last.

class(oa_shp)

## [1] "sf"         "data.frame"

head(oa_shp[,1:4])

## Simple feature collection with 6 features and 4 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 335071.6 ymin: 389876.7 xmax: 339426.9 ymax: 394479
## Projected CRS: Transverse_Mercator
##       OA_CD   LSOA_CD   MSOA_CD    LAD_CD                       geometry
## 1 E00176737 E01033761 E02006932 E08000012 MULTIPOLYGON (((335106.3 38...
## 2 E00033515 E01006614 E02001358 E08000012 MULTIPOLYGON (((335810.5 39...
## 3 E00033141 E01006546 E02001365 E08000012 MULTIPOLYGON (((336738 3931...
## 4 E00176757 E01006646 E02001369 E08000012 MULTIPOLYGON (((335914.5 39...
## 5 E00034050 E01006712 E02001375 E08000012 MULTIPOLYGON (((339325 3914...
## 6 E00034280 E01006761 E02001366 E08000012 MULTIPOLYGON (((338198.1 39...

Each row represents an output area. Each output area has multiple attributes (i.e. columns): administrative area codes and geometry, as well as information on the local population in these areas; however, this information is not displayed above (can you access it?).
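
As a hint, attribute columns of an sf data frame can be accessed in the same way as in an ordinary data frame. The sketch below uses a toy object with made-up codes and values; only the column names OA_CD and unemp are borrowed from the Liverpool data.

```r
library(sf)

# Toy sf data frame with two point features; all values are made up
toy <- st_sf(
  OA_CD = c("A1", "A2"),
  unemp = c(0.05, 0.12),
  geometry = st_sfc(st_point(c(0, 0)), st_point(c(1, 1)))
)

names(toy)                      # all column names, including geometry
toy$unemp                       # a single attribute as a plain vector
attrs <- st_drop_geometry(toy)  # attribute table only, as a data frame
attrs
```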

The content of the geometry column gives sf objects their spatial powers. oa_shp$geometry is a ‘list column’ that contains all the coordinates of the output area polygons. sf objects can be plotted quickly with the base R function plot().

For more advanced map making, use dedicated visualisation packages such as tmap or ggplot2.

plot(oa_shp$geometry)
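
The ‘list column’ structure of the geometry can be inspected directly. This is a minimal sketch on a toy unit square with made-up coordinates, not on the Liverpool data.

```r
library(sf)

# A toy unit square; coordinates are made up for illustration
sq <- st_polygon(list(rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0))))
toy <- st_sf(id = "A", geometry = st_sfc(sq))

geom <- st_geometry(toy)        # the geometry list column
class(geom)                     # includes the class "sfc"
coords <- st_coordinates(geom)  # matrix of vertex coordinates
coords
```

Each element of the list column holds the coordinates of one feature, so st_coordinates() returns one row per polygon vertex.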

We can colour the map thematically by an attribute of the spatial data frame by passing the name of its column to the plot function. Here we map the share of unemployed population. We can adjust the position of the key or legend (key.pos), whether axes are plotted (axes), the length of the scale bar (key.length), the thickness/width of the scale bar (key.width), the method or number of breaks used to classify the attribute (breaks), the line width (lwd) and the colour of polygon borders (border).

plot(oa_shp["unemp"], key.pos = 4, axes = TRUE, key.width = lcm(1.3), key.length = 1., breaks = "jenks", lwd = 0.1, border = 'grey')
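
The breaks argument also accepts a numeric vector of class boundaries instead of a method name. As a rough sketch of what a classification method does, the base R code below computes quantile breaks for a made-up vector of unemployment shares and assigns each value to a class.

```r
# Made-up unemployment shares, for illustration only
unemp <- c(0.02, 0.04, 0.05, 0.07, 0.09, 0.12, 0.15, 0.20)

# Quantile breaks: four classes with roughly equal numbers of areas
brks <- quantile(unemp, probs = seq(0, 1, by = 0.25))

# Assign each value to its class
classes <- cut(unemp, breaks = brks, include.lowest = TRUE)
table(classes)

# The vector could then be passed to the plot call, e.g.
# plot(oa_shp["unemp"], breaks = brks)
```

Quantile breaks place a similar number of areas in each class, whereas "jenks" looks for natural groupings in the data, so the two can produce quite different maps.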

Vector geometries come in various types (i.e. points, lines and polygons). We can convert the polygons into points, here their centroids, by running:

oa_cents <- st_centroid(oa_shp)
head(oa_cents[,1:4])

## Simple feature collection with 6 features and 4 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: 335127.6 ymin: 389943.5 xmax: 339371.6 ymax: 394432
## Projected CRS: Transverse_Mercator
##       OA_CD   LSOA_CD   MSOA_CD    LAD_CD                  geometry
## 1 E00176737 E01033761 E02006932 E08000012 POINT (335127.6 389943.5)
## 2 E00033515 E01006614 E02001358 E08000012   POINT (335896.7 394432)
## 3 E00033141 E01006546 E02001365 E08000012 POINT (336635.5 393061.4)
## 4 E00176757 E01006646 E02001369 E08000012 POINT (335809.5 391117.1)
## 5 E00034050 E01006712 E02001375 E08000012 POINT (339371.6 391441.5)
## 6 E00034280 E01006761 E02001366 E08000012 POINT (338390.6 391967.8)

And visualise the data by running:

plot(st_geometry(oa_cents))
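
To check what st_centroid() computes, it can help to try it on a shape whose centroid is known. The sketch below uses a toy 2 x 2 square with made-up coordinates; the centroid is taken from the geometry only, so no attributes are carried over.

```r
library(sf)

# A toy 2 x 2 square; coordinates are made up for illustration
sq <- st_polygon(list(rbind(c(0, 0), c(2, 0), c(2, 2), c(0, 2), c(0, 0))))
toy <- st_sf(id = "A", geometry = st_sfc(sq))

# Centroid of the geometry only
cent <- st_centroid(st_geometry(toy))
st_coordinates(cent)
```

Calling st_centroid() on the full sf data frame, as above with oa_shp, keeps the attribute columns and typically warns that attributes are assumed to be constant over each geometry.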

Spatial Data

Francisco Rowe (`@fcorowe`)

2022-07-04
