@fcorowe
Three main structures are generally used to organise geographic data:
Vector data structures: Vector data structures record geographic information using points, lines and polygons in a geographic table. Each row represents an individual geographic object (a feature), and each column stores an attribute of those objects.
Raster data structures: Raster data structures record geographic data uniformly over space in the form of a grid. They divide a geographic surface into cells of constant size, and the row and column position of a cell identifies its geographic location.
Spatial graphs: Spatial graphs store connections between objects through space. These connections may derive from geographical topology (e.g. contiguity), distance, or more sophisticated dimensions, such as interaction flows (e.g. human mobility, trade and information).
Vector data structures tend to dominate the social sciences, as the interest is often in capturing discrete geographic units containing populations. We therefore focus here on vector data structures.
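As a rough illustration of the three structures, the sketch below builds a toy example of each. All object names (toy_vector, toy_raster, toy_graph) and values are made up for illustration. A plain matrix stands in for a proper raster object, and the graph uses an arbitrary distance threshold of 2 units.

```r
library(sf)

# Vector: one row per geographic object, attribute columns plus a geometry column
pts <- st_sfc(st_point(c(0, 0)), st_point(c(1, 0)), st_point(c(5, 5)))
toy_vector <- st_sf(id = c("a", "b", "c"), pop = c(120, 85, 240), geometry = pts)

# Raster: a regular grid of equally sized cells (a plain matrix as a stand-in)
toy_raster <- matrix(1:12, nrow = 3, ncol = 4)

# Spatial graph: connections defined by distance between objects;
# TRUE where two different points lie within 2 units of each other
dist_mat <- st_distance(toy_vector)
toy_graph <- dist_mat <= 2 & dist_mat > 0

print(toy_vector)
print(toy_graph)
```

Only points a and b end up connected in toy_graph, since point c lies further than 2 units from both.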
To understand the structure of vector data, let’s read a dataset (Liverpool_OA.shp) describing output areas within Liverpool in the United Kingdom. To read in the data, we use the function st_read() from the package sf. sf supports geometry collections, which can contain multiple geometry types in a single object. sf provides the same functionality previously provided in three separate packages: sp, rgdal and rgeos (Robin et al. 2021).
sf can also be used in combination with tidyverse!
Reading the data set via sf returns its geographic metadata, i.e. Geometry type, Dimension, Bounding box and the coordinate reference system information on the line beginning Projected CRS.
For raster data, I would recommend using the package terra.
If you are interested in learning more about mapping geographic data, I cannot recommend enough: Lovelace, R., Nowosad, J. and Muenchow, J., 2019. “Geocomputation with R”. Chapman and Hall/CRC.
# Load sf (if not already attached) and read the shapefile
library(sf)
oa_shp <- st_read("./data/Liverpool_OA.shp")
## Reading layer `Liverpool_OA' from data source
## `/Users/franciscorowe/Dropbox/Francisco/Research/github_projects/courses/intro-gds/data/Liverpool_OA.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 1584 features and 18 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: 332390.2 ymin: 379748.5 xmax: 345636 ymax: 397980.1
## Projected CRS: Transverse_Mercator
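The metadata printed by st_read() can also be queried directly from an sf object. The sketch below uses a made-up one-polygon object (toy) rather than the Liverpool data so that it runs on its own. The British National Grid code (EPSG:27700) is an assumption chosen for illustration, not something read from the shapefile. The same calls should work on oa_shp.

```r
library(sf)

# A hypothetical unit square, used only so the example is self-contained
sq <- st_polygon(list(rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0))))
toy <- st_sf(id = "sq1", geometry = st_sfc(sq, crs = 27700))

geom_type <- st_geometry_type(toy)  # geometry type of each feature
bbox <- st_bbox(toy)                # xmin, ymin, xmax, ymax
crs <- st_crs(toy)                  # coordinate reference system

print(geom_type)
print(bbox)
print(crs)
```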
The result is an sf data frame containing spatial and attribute columns. We can examine the content of the data frame by using the function head(). Here we select the first four columns. The geometry column, which contains the geographic information, is retained automatically and appears as the last column.
class(oa_shp)
## [1] "sf" "data.frame"
head(oa_shp[,1:4])
## Simple feature collection with 6 features and 4 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: 335071.6 ymin: 389876.7 xmax: 339426.9 ymax: 394479
## Projected CRS: Transverse_Mercator
## OA_CD LSOA_CD MSOA_CD LAD_CD geometry
## 1 E00176737 E01033761 E02006932 E08000012 MULTIPOLYGON (((335106.3 38...
## 2 E00033515 E01006614 E02001358 E08000012 MULTIPOLYGON (((335810.5 39...
## 3 E00033141 E01006546 E02001365 E08000012 MULTIPOLYGON (((336738 3931...
## 4 E00176757 E01006646 E02001369 E08000012 MULTIPOLYGON (((335914.5 39...
## 5 E00034050 E01006712 E02001375 E08000012 MULTIPOLYGON (((339325 3914...
## 6 E00034280 E01006761 E02001366 E08000012 MULTIPOLYGON (((338198.1 39...
Each row represents an output area. Each output area has multiple attributes (i.e. columns): administrative area codes and geometry, as well as information on the local population in these areas. The population information is not displayed above (can you access it?).
The content of the geometry column gives sf objects their spatial powers. oa_shp$geometry is a ‘list column’ that contains all the coordinates of the output area polygons.
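To make the idea of a list column more concrete, here is a minimal sketch on a made-up two-point object (toy, with hypothetical id and pop columns). It shows that the geometry column is a list of geometries, that it stays attached when attribute columns are selected, and that st_drop_geometry() returns the attributes alone.

```r
library(sf)

# Hypothetical two-feature object, used only for illustration
toy <- st_sf(
  id = c("a", "b"),
  pop = c(120, 85),
  geometry = st_sfc(st_point(c(0, 0)), st_point(c(1, 1)))
)

geom <- st_geometry(toy)          # same content as toy$geometry
geom_is_list <- is.list(geom)     # the column is a list with one geometry per row

kept <- names(toy[, "id"])        # selecting a column keeps the geometry
dropped <- names(st_drop_geometry(toy))  # attributes only, as a plain data frame

print(class(geom))
print(kept)
print(dropped)
```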
sf objects can be plotted quickly with the base R function plot(). For more advanced map making, use dedicated visualisation packages such as tmap or ggplot2.
plot(oa_shp$geometry)
We can thematically colour the polygons based on an attribute by passing the name of the corresponding column to the plot function. Here we map the share of unemployed population. We can adjust the key or legend position (key.pos), plot axes (axes), length of the scale bar (key.length), thickness/width of the scale bar (key.width), method or number to break the data attribute (breaks), line width (lwd) and colour of polygon borders (border).
plot(oa_shp["unemp"], key.pos = 4, axes = TRUE, key.width = lcm(1.3), key.length = 1., breaks = "jenks", lwd = 0.1, border = 'grey')
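The breaks argument accepts either the name of a classification method, as above, or a numeric vector of class boundaries. The following sketch illustrates the numeric form on a made-up 3 x 3 grid of squares with a hypothetical value column. The boundaries 0, 3, 6 and 9 are arbitrary and chosen only so that the nine values fall into three classes.

```r
library(sf)

# Hypothetical 3 x 3 grid of square polygons with a made-up attribute
bb <- st_bbox(c(xmin = 0, ymin = 0, xmax = 3, ymax = 3))
grid <- st_make_grid(st_as_sfc(bb), n = c(3, 3))
toy <- st_sf(value = as.numeric(1:9), geometry = grid)

# Numeric breaks set the class boundaries directly instead of naming a method
plot(toy["value"], breaks = c(0, 3, 6, 9), key.pos = 4, border = "grey")
```

Supplying explicit boundaries can be useful when several maps need to share the same classes.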
Various types of geometries (i.e. lines, points and polygons) exist. We can convert the output area polygons into points by computing their centroids:
oa_cents <- st_centroid(oa_shp)
head(oa_cents[,1:4])
## Simple feature collection with 6 features and 4 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: 335127.6 ymin: 389943.5 xmax: 339371.6 ymax: 394432
## Projected CRS: Transverse_Mercator
## OA_CD LSOA_CD MSOA_CD LAD_CD geometry
## 1 E00176737 E01033761 E02006932 E08000012 POINT (335127.6 389943.5)
## 2 E00033515 E01006614 E02001358 E08000012 POINT (335896.7 394432)
## 3 E00033141 E01006546 E02001365 E08000012 POINT (336635.5 393061.4)
## 4 E00176757 E01006646 E02001369 E08000012 POINT (335809.5 391117.1)
## 5 E00034050 E01006712 E02001375 E08000012 POINT (339371.6 391441.5)
## 6 E00034280 E01006761 E02001366 E08000012 POINT (338390.6 391967.8)
We can then visualise the centroids by running:
plot(st_geometry(oa_cents))
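To check what st_centroid() returns, the sketch below applies it to a made-up 2 x 2 square (toy), whose centroid should sit at its centre. The object and its id column are hypothetical. Note that calling st_centroid() on a full sf object typically also emits a warning that attributes are assumed constant over geometries, so the geometry is extracted first here.

```r
library(sf)

# Hypothetical square with corners at (0, 0) and (2, 2)
sq <- st_polygon(list(rbind(c(0, 0), c(2, 0), c(2, 2), c(0, 2), c(0, 0))))
toy <- st_sf(id = "sq1", geometry = st_sfc(sq))

# Compute the centroid of the geometry and extract its coordinates
cent <- st_centroid(st_geometry(toy))
coords <- st_coordinates(cent)

print(coords)
```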