Mapping data

Francisco Rowe (@fcorowe)

2021-11-12

Before diving into this session, let’s ask ourselves:

Cartography is important when creating maps. A carefully crafted map can be an effective way of communicating complex information. Common design issues include poor placement, size and readability of text, and careless selection of colours. Have a look at the style guide of the Journal of Maps for details.

For colour palettes, I recommend:

* viridis
* ColorBrewer 2.0

Crameri, F., Shephard, G.E. and Heron, P.J., 2020. The misuse of colour in science communication. Nature communications, 11(1), pp.1-10.

Choropleths

Choropleths are thematic maps. They are easy to create but also easy to get wrong. We’ll look at a set of principles you can follow to create effective choropleth maps. Here are three more questions to consider:

MacEachren, A.M. and Kraak, M.J., 1997. Exploratory cartographic visualization: advancing the agenda, Computers & Geosciences, 23(4), 335-343.

Data

We will use internal migration data from the Office for National Statistics (ONS) of the United Kingdom.

The original data are organised in a long format structure and are disaggregated by sex and age. Each row captures the count of people moving from an origin to a destination. The spatial units of analysis are local authorities (LA).

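# clean workspace
rm(list=ls())

# load the packages used in this session
library(tidyverse) # read_csv(), dplyr verbs and the %>% pipe
library(sf)        # st_read(), st_simplify(), st_make_valid()
library(tmap)      # thematic maps
library(viridis)   # colour palettes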

# load data
df_long <- read_csv("../data/internal_migration/Detailed_Estimates_2020_LA_2021_Dataset_1.csv")
## 
## ── Column specification ────────────────────────────────────────────────────────
## cols(
##   OutLA = col_character(),
##   InLA = col_character(),
##   Age = col_double(),
##   Moves = col_double(),
##   Sex = col_character()
## )
# id for origins and destinations
orig_la_nm <- as.data.frame(unique(df_long$OutLA))
dest_la_nm <- as.data.frame(unique(df_long$InLA))

head(df_long)
## # A tibble: 6 x 5
##   OutLA     InLA        Age Moves Sex  
##   <chr>     <chr>     <dbl> <dbl> <chr>
## 1 E06000001 E06000002     0 1.24  M    
## 2 E06000001 E06000002     0 0.662 F    
## 3 E06000001 E06000002     1 5.09  M    
## 4 E06000001 E06000002     1 2.56  F    
## 5 E06000001 E06000002     2 2.64  M    
## 6 E06000001 E06000002     2 2.54  F

We also read our LA boundaries and inspect the structure of the data. We use open data from the ONS’s Geography portal, specifically the Local Authority Districts Boundaries (May 2021) UK BFE.

# read shapefile
la_shp <- st_read("../data/Local_Authority_Districts_(May_2021)_UK_BFE_V3/LAD_MAY_2021_UK_BFE_V2.shp")
## Reading layer `LAD_MAY_2021_UK_BFE_V2' from data source `/Users/Franciscorowe 1/Dropbox/Francisco/Research/github_projects/courses/udd_gds_course/data/Local_Authority_Districts_(May_2021)_UK_BFE_V3/LAD_MAY_2021_UK_BFE_V2.shp' using driver `ESRI Shapefile'
## Simple feature collection with 374 features and 9 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -116.1928 ymin: 5333.81 xmax: 655989 ymax: 1220310
## projected CRS:  OSGB 1936 / British National Grid
str(la_shp)
## Classes 'sf' and 'data.frame':   374 obs. of  10 variables:
##  $ OBJECTID  : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ LAD21CD   : chr  "E06000001" "E06000002" "E06000003" "E06000004" ...
##  $ LAD21NM   : chr  "Hartlepool" "Middlesbrough" "Redcar and Cleveland" "Stockton-on-Tees" ...
##  $ BNG_E     : int  447160 451141 464361 444940 428029 354246 362744 369490 332819 511894 ...
##  $ BNG_N     : int  531474 516887 519597 518183 515648 382146 388456 422806 436635 431650 ...
##  $ LONG      : num  -1.27 -1.21 -1.01 -1.31 -1.57 ...
##  $ LAT       : num  54.7 54.5 54.6 54.6 54.5 ...
##  $ SHAPE_Leng: num  66110 41056 105292 108085 107203 ...
##  $ SHAPE_Area: num  9.84e+07 5.46e+07 2.54e+08 2.10e+08 1.97e+08 ...
##  $ geometry  :sfc_MULTIPOLYGON of length 374; first list element: List of 1
##   ..$ :List of 1
##   .. ..$ : num [1:4055, 1:2] 447214 447229 447234 447243 447246 ...
##   ..- attr(*, "class")= chr [1:3] "XY" "MULTIPOLYGON" "sfg"
##  - attr(*, "sf_column")= chr "geometry"
##  - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA NA
##   ..- attr(*, "names")= chr [1:9] "OBJECTID" "LAD21CD" "LAD21NM" "BNG_E" ...

Computing mobility indicators

Before moving forward, we need to define our objective: what is it that we want to visualise, analyse or monitor? Recall the principles of research design and planning.

Once we have decided our objective, we can define our metrics.

# out-migration: total moves leaving each LA
outflows <- df_long %>% 
  group_by(OutLA) %>%
  dplyr::summarise(n = sum(Moves, na.rm = TRUE))

# in-migration: total moves arriving in each LA
inflows <- df_long %>% 
  group_by(InLA) %>%
  dplyr::summarise(n = sum(Moves, na.rm = TRUE))

# net migration: inflows minus outflows
# after the join, OutLA holds the LA code and n.x / n.y hold the two totals
indicators <- full_join(outflows, 
                        inflows,
                        by = c("OutLA" = "InLA")) %>% 
  mutate_if(is.numeric, ~replace(., is.na(.), 0)) %>% 
  mutate_if(is.numeric, round) %>% 
  rename(
    outflows = n.x,
    inflows = n.y
  ) %>% 
  mutate(
    netflows = (inflows - outflows)
  )
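
To see what these indicators measure, here is a minimal sketch on a made-up table with three areas. The object `toy`, its area labels and its values are hypothetical and are not ONS data. It uses base R only, so it runs without the packages loaded for this session.

```r
# hypothetical origin-destination table in long format
toy <- data.frame(
  OutLA = c("A", "A", "B", "C"),
  InLA  = c("B", "C", "A", "A"),
  Moves = c(10, 5, 3, 2)
)

# total moves leaving and arriving in each area (named numeric vectors)
out_tot <- sapply(split(toy$Moves, toy$OutLA), sum)
in_tot  <- sapply(split(toy$Moves, toy$InLA), sum)

# align both totals on the full set of area labels;
# an area missing on one side would get a total of 0
areas <- sort(union(names(out_tot), names(in_tot)))
out_aligned <- unname(out_tot[areas])
in_aligned  <- unname(in_tot[areas])
out_aligned[is.na(out_aligned)] <- 0
in_aligned[is.na(in_aligned)]   <- 0

toy_ind <- data.frame(la = areas, outflows = out_aligned, inflows = in_aligned)
toy_ind$netflows <- toy_ind$inflows - toy_ind$outflows

toy_ind
sum(toy_ind$netflows)
```

In this closed toy system net migration sums to zero, because every move out of one area is a move into another. With real data, rounding and areas that appear on only one side of the table can make the total differ slightly from zero.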

Joining spatial data

la_shp <- left_join(la_shp, indicators, by = c("LAD21CD" = "OutLA"))
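
A left join keeps every boundary, including those with no matching indicator row, which then carry `NA` values and appear as missing on the map. A quick way to spot this is to compare the two sets of codes. The sketch below uses made-up code vectors (`boundary_codes` and `indicator_codes` are hypothetical stand-ins) so that it runs on its own.

```r
# hypothetical codes standing in for the boundary and indicator identifiers
boundary_codes  <- c("E06000001", "E06000002", "S12000005")
indicator_codes <- c("E06000001", "E06000002", "W06000001")

# boundaries that would receive NA indicators after a left join
unmatched_boundaries <- setdiff(boundary_codes, indicator_codes)
unmatched_boundaries

# indicator rows that would be dropped because no boundary matches
unmatched_indicators <- setdiff(indicator_codes, boundary_codes)
unmatched_indicators
```

With the objects from this session, the same check would compare `la_shp$LAD21CD` with `indicators$OutLA`.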

Mapping categorical data

Let’s start by mapping categorical data and learning about the UK.

# id for country name initial
la_shp$ctry_nm <- substr(la_shp$LAD21CD, 1, 1)
la_shp$ctry_nm <- as.factor(la_shp$ctry_nm)

# simplify boundaries
# dTolerance is expressed in the units of the CRS (metres for British National Grid)
la_shp_simple <- st_simplify(la_shp, 
                             preserveTopology = TRUE,
                             dTolerance = 1000) # 1km

# ensure geometry is valid
la_shp_simple <- sf::st_make_valid(la_shp_simple)

tm_shape(la_shp_simple) +
  tm_fill(col = "ctry_nm", style = "cat", palette = viridis(4), title = "Country") +
  tm_borders(lwd = 0) +
  tm_layout(legend.title.size = 1,
            legend.text.size = 0.6,
            legend.position = c("right","top"),
            legend.bg.color = "white",
            legend.bg.alpha = 1)
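
The legend of this map shows single letters. The first character of an ONS local authority code identifies the country: E for England, N for Northern Ireland, S for Scotland and W for Wales. As a minimal sketch, the letters can be turned into readable labels when building the factor. The vector `codes` below is an illustrative set of codes, one per country, and is not taken from the session data.

```r
# illustrative LA codes, one per country
codes <- c("E06000001", "W06000001", "S12000005", "N09000001")

# first character of the code identifies the country
initial <- substr(codes, 1, 1)

# attach readable labels to each initial
ctry <- factor(initial,
               levels = c("E", "N", "S", "W"),
               labels = c("England", "Northern Ireland", "Scotland", "Wales"))
ctry
table(ctry)
```

Applied to `la_shp$LAD21CD`, the same `factor()` call would give a legend with country names instead of initials.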

Mapping continuous data

If instead we want to visualise the geographical distribution of a continuous phenomenon, we have a few more alternatives.

Equal interval

An option is ‘equal intervals’. The intuition is to divide the range of the data into segments of equal width.

tm_shape(la_shp_simple) +
  tm_fill(col = "netflows", style = "equal", palette = viridis(6), title = "Net migration") +
  tm_borders(lwd = 0) +
  tm_layout(legend.title.size = 1,
            legend.text.size = 0.6,
            legend.position = c("right","top"),
            legend.bg.color = "white",
            legend.bg.alpha = 1)

Equal interval bins are more appropriate for variables with a uniform distribution. They are not recommended for variables with a skewed distribution. Why?
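
A small numerical sketch can help with this question. The vector `x` below is made up: many small values and a few very large ones. The code computes five equal-width bins in base R and counts how many observations fall in each.

```r
# hypothetical skewed variable: many small values and a few very large ones
x <- c(1, 2, 3, 4, 5, 6, 8, 10, 50, 100)

# five equal-width bins spanning the range of x
breaks_equal <- seq(min(x), max(x), length.out = 6)
breaks_equal

# number of observations per bin
counts_equal <- table(cut(x, breaks = breaks_equal, include.lowest = TRUE))
counts_equal
```

Eight of the ten values land in the first bin and two bins are empty, so a map using these breaks would show almost every area in the same colour.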

Quantiles

Quantile breaks place (approximately) the same number of data points in each category. A potential issue is that the width of the bins can vary widely, so very different values may end up in the same category.

tm_shape(la_shp_simple) +
  tm_fill(col = "netflows", style = "quantile", palette = viridis(6), title = "Net migration") +
  tm_borders(lwd = 0) +
  tm_layout(legend.title.size = 1,
            legend.text.size = 0.6,
            legend.position = c("right","top"),
            legend.bg.color = "white",
            legend.bg.alpha = 1)
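
To illustrate both properties of quantile breaks, the sketch below bins a made-up skewed vector `x` into five quantile classes using base R. The values are hypothetical and chosen without ties so that the counts come out exactly equal.

```r
# hypothetical skewed variable without ties
x <- c(1, 2, 3, 4, 5, 6, 8, 10, 50, 100)

# break points at the 0%, 20%, ..., 100% quantiles
breaks_q <- quantile(x, probs = seq(0, 1, by = 0.2))
breaks_q

# each class receives the same number of observations
counts_q <- table(cut(x, breaks = breaks_q, include.lowest = TRUE))
counts_q

# but the width of the classes differs a lot
widths_q <- diff(breaks_q)
widths_q
```

Every class holds two observations, while the last class is many times wider than the first. With tied values the counts would only be approximately equal.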

Fisher-Jenks

The Fisher-Jenks algorithm, also known as ‘natural breaks’, groups similar values together: it looks for breaks that minimise the variation within categories and maximise the differences between categories.

Jenks, G.F., 1967. The data model concept in statistical mapping. International yearbook of cartography, 7, pp.186-190.

tm_shape(la_shp_simple) +
  tm_fill(col = "netflows", style = "jenks", palette = viridis(6), title = "Net migration") +
  tm_borders(lwd = 0) +
  tm_layout(legend.title.size = 1,
            legend.text.size = 0.6,
            legend.position = c("right","top"),
            legend.bg.color = "white",
            legend.bg.alpha = 1)
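
The idea behind natural breaks can be sketched with an exhaustive search on a tiny made-up vector. This is not the Fisher-Jenks algorithm itself, only an illustration of the criterion it targets: among all ways of cutting the sorted values into `k` classes, pick the one with the smallest within-class sum of squares. The helper `wss()` and the data are hypothetical, and the brute-force approach is only feasible for a handful of values.

```r
# within-class sum of squared deviations for a given class assignment
wss <- function(x, classes) {
  sum(tapply(x, classes, function(v) sum((v - mean(v))^2)))
}

# hypothetical values with three visible clusters
x <- sort(c(1, 2, 3, 20, 21, 23, 90, 95))
k <- 3

# every way of placing k - 1 cuts after positions 1, ..., n - 1 of the sorted values
cuts <- combn(length(x) - 1, k - 1)

# score each candidate partition
scores <- apply(cuts, 2, function(cp) {
  classes <- findInterval(seq_along(x), cp + 1)
  wss(x, classes)
})

# keep the partition with the smallest within-class sum of squares
best <- cuts[, which.min(scores)]
natural_classes <- split(x, findInterval(seq_along(x), best + 1))
natural_classes
```

The search recovers the three clusters that are visible by eye. For real data, the `jenks` style used in the map above does this work for you.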

Order

The ‘order’ style spreads a large number of colours along a continuous gradient, assigning colours according to the rank order of the values rather than the values themselves. It can be very useful for rasters and can help display skewed distributions.

tm_shape(la_shp_simple) +
  tm_fill(col = "netflows", style = "order", palette = viridis(256), title = "Net migration") +
  tm_borders(lwd = 0) +
  tm_layout(legend.title.size = 1,
          legend.text.size = 0.6,
          legend.position = c("right","top"),
          legend.bg.color = "white",
          legend.bg.alpha = 1)
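
To see why ranking helps with skewed data, the sketch below compares a linear mapping of values to a 256-colour palette with a rank-based mapping. It mimics the idea only and the internal implementation in tmap may differ. The vector `x` is made up, and `hcl.colors()` with the "viridis" palette (base R 3.6.0 or later) stands in for `viridis(256)` so that no extra package is needed.

```r
# hypothetical skewed variable
x <- c(1, 2, 3, 4, 5, 6, 8, 10, 50, 100)

# 256 colours from a viridis-like palette available in base R
pal <- hcl.colors(256, palette = "viridis")

# linear mapping: position along the palette is proportional to the value
idx_linear <- 1 + round((x - min(x)) / (max(x) - min(x)) * 255)

# rank-based mapping: position is proportional to the rank of the value
r <- rank(x, ties.method = "average")
idx_rank <- 1 + round((r - 1) / (length(x) - 1) * 255)

data.frame(x, idx_linear, idx_rank, colour_rank = pal[idx_rank])
```

Under the linear mapping, eight of the ten values are squeezed into the first few colours of the palette. Under the rank-based mapping, the values are spread evenly across the whole gradient, at the cost of a legend whose values no longer change linearly.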