@fcorowe
Before diving into this session, let’s ask ourselves:
Cartography matters when creating maps. A carefully crafted map can be an effective way of communicating complex information. Common design issues include poor placement, size and readability of text, and careless selection of colours. Have a look at the style guide of the Journal of Maps for details.
For colour palettes, I recommend:
* viridis
* ColorBrewer 2.0
Crameri, F., Shephard, G.E. and Heron, P.J., 2020. The misuse of colour in science communication. Nature Communications, 11(1), pp.1-10.
Choropleths are thematic maps in which areas are shaded according to the value of a variable. They are easy to create but also easy to get wrong. We’ll look at a set of principles you can follow to create effective choropleth maps. Here are three more questions to consider:
MacEachren, A.M. and Kraak, M.J., 1997. Exploratory cartographic visualization: advancing the agenda, Computers & Geosciences, 23(4), 335-343.
We will use internal migration data from the Office for National Statistics (ONS) of the United Kingdom.
The original data are organised in a long format structure and are disaggregated by sex and age. Each row captures the count of people moving from an origin to a destination. The spatial units of analysis are local authorities (LA).
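# clean workspace and load the packages used below
rm(list = ls())
library(tidyverse) # read_csv() and the dplyr verbs
library(sf)        # st_read(), st_simplify(), st_make_valid()
library(tmap)      # tm_shape(), tm_fill(), tm_borders(), tm_layout()
library(viridis)   # viridis() palettes
# load data
df_long <- read_csv("../data/internal_migration/Detailed_Estimates_2020_LA_2021_Dataset_1.csv")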
##
## ── Column specification ────────────────────────────────────────────────────────
## cols(
## OutLA = col_character(),
## InLA = col_character(),
## Age = col_double(),
## Moves = col_double(),
## Sex = col_character()
## )
# id for origins and destinations
orig_la_nm <- as.data.frame(unique(df_long$OutLA))
dest_la_nm <- as.data.frame(unique(df_long$InLA))
head(df_long)
## # A tibble: 6 x 5
## OutLA InLA Age Moves Sex
## <chr> <chr> <dbl> <dbl> <chr>
## 1 E06000001 E06000002 0 1.24 M
## 2 E06000001 E06000002 0 0.662 F
## 3 E06000001 E06000002 1 5.09 M
## 4 E06000001 E06000002 1 2.56 F
## 5 E06000001 E06000002 2 2.64 M
## 6 E06000001 E06000002 2 2.54 F
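To make the long format concrete, here is a minimal sketch on a made-up table with the same columns as `df_long`. The area codes `A`, `B`, `C` and all values are hypothetical. It collapses the age and sex breakdown into one total per origin-destination pair, and then cross-tabulates those totals into an origin-by-destination matrix.

```r
# Toy long-format table (made-up values), mirroring the columns of df_long
toy_long <- data.frame(
  OutLA = c("A", "A", "A", "B", "B", "C"),
  InLA  = c("B", "B", "C", "A", "C", "A"),
  Age   = c(0, 1, 0, 0, 1, 0),
  Moves = c(1.5, 2.5, 1.0, 3.0, 2.0, 4.0),
  Sex   = c("M", "F", "M", "F", "M", "F")
)

# Sum moves over age and sex for each origin-destination pair
toy_od <- aggregate(Moves ~ OutLA + InLA, data = toy_long, FUN = sum)
print(toy_od)

# Cross-tabulate into an origin (rows) by destination (columns) matrix
toy_matrix <- xtabs(Moves ~ OutLA + InLA, data = toy_long)
print(toy_matrix)
```

Each row of `toy_od` is one origin-destination pair, while `toy_matrix` holds the same totals in wide form, with zeros where no moves were recorded.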
We also read our LA boundaries and analyse the structure of the data. We use open data from the ONS’s Geography portal, specifically the Local Authority Districts Boundaries (May 2021) UK BFE.
# read shapefile
la_shp <- st_read("../data/Local_Authority_Districts_(May_2021)_UK_BFE_V3/LAD_MAY_2021_UK_BFE_V2.shp")
## Reading layer `LAD_MAY_2021_UK_BFE_V2' from data source `/Users/Franciscorowe 1/Dropbox/Francisco/Research/github_projects/courses/udd_gds_course/data/Local_Authority_Districts_(May_2021)_UK_BFE_V3/LAD_MAY_2021_UK_BFE_V2.shp' using driver `ESRI Shapefile'
## Simple feature collection with 374 features and 9 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -116.1928 ymin: 5333.81 xmax: 655989 ymax: 1220310
## projected CRS: OSGB 1936 / British National Grid
str(la_shp)
## Classes 'sf' and 'data.frame': 374 obs. of 10 variables:
## $ OBJECTID : int 1 2 3 4 5 6 7 8 9 10 ...
## $ LAD21CD : chr "E06000001" "E06000002" "E06000003" "E06000004" ...
## $ LAD21NM : chr "Hartlepool" "Middlesbrough" "Redcar and Cleveland" "Stockton-on-Tees" ...
## $ BNG_E : int 447160 451141 464361 444940 428029 354246 362744 369490 332819 511894 ...
## $ BNG_N : int 531474 516887 519597 518183 515648 382146 388456 422806 436635 431650 ...
## $ LONG : num -1.27 -1.21 -1.01 -1.31 -1.57 ...
## $ LAT : num 54.7 54.5 54.6 54.6 54.5 ...
## $ SHAPE_Leng: num 66110 41056 105292 108085 107203 ...
## $ SHAPE_Area: num 9.84e+07 5.46e+07 2.54e+08 2.10e+08 1.97e+08 ...
## $ geometry :sfc_MULTIPOLYGON of length 374; first list element: List of 1
## ..$ :List of 1
## .. ..$ : num [1:4055, 1:2] 447214 447229 447234 447243 447246 ...
## ..- attr(*, "class")= chr [1:3] "XY" "MULTIPOLYGON" "sfg"
## - attr(*, "sf_column")= chr "geometry"
## - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA NA
## ..- attr(*, "names")= chr [1:9] "OBJECTID" "LAD21CD" "LAD21NM" "BNG_E" ...
Before moving forward, we need to define our objective: what is it that we want to visualise, analyse or monitor? Recall the principles of research design and planning.
Once we have decided our objective, we can define our metrics.
# out-migration
outflows <- df_long %>%
  group_by(OutLA) %>%
  dplyr::summarise(n = sum(Moves, na.rm = TRUE))
# in-migration
inflows <- df_long %>%
  group_by(InLA) %>%
  dplyr::summarise(n = sum(Moves, na.rm = TRUE))
# net migration
indicators <- full_join(outflows,
                        inflows,
                        by = c("OutLA" = "InLA")) %>%
  mutate_if(is.numeric, ~replace(., is.na(.), 0)) %>%
  mutate_if(is.numeric, round) %>%
  rename(
    outflows = n.x,
    inflows = n.y
  ) %>%
  mutate(
    netflows = (inflows - outflows)
  )
la_shp <- left_join(la_shp, indicators, by = c("LAD21CD" = "OutLA"))
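The arithmetic behind these indicators can be checked on a toy example. This is a minimal base R sketch with made-up flows between three hypothetical areas; `merge(..., all = TRUE)` stands in for `full_join()`. Area `C` only receives migrants and area `A` only sends them, which is why the missing side has to be filled with zero before subtracting.

```r
# Toy flows between three hypothetical areas (made-up values)
toy_flows <- data.frame(
  OutLA = c("A", "A", "B"),
  InLA  = c("B", "C", "C"),
  Moves = c(10, 5, 4)
)

# Total moves leaving each origin and arriving at each destination
out_tot <- aggregate(Moves ~ OutLA, data = toy_flows, FUN = sum)
in_tot  <- aggregate(Moves ~ InLA, data = toy_flows, FUN = sum)

# Full join on the area code, keeping areas that appear on one side only
toy_ind <- merge(out_tot, in_tot, by.x = "OutLA", by.y = "InLA",
                 all = TRUE, suffixes = c(".out", ".in"))
names(toy_ind) <- c("area", "outflows", "inflows")

# Areas with no recorded outflow (or inflow) get 0 instead of NA
toy_ind[is.na(toy_ind)] <- 0

# Net migration: arrivals minus departures
toy_ind$netflows <- toy_ind$inflows - toy_ind$outflows
print(toy_ind)

# In a closed system every move is one outflow and one inflow,
# so net flows across all areas sum to zero
print(sum(toy_ind$netflows))
```

The zero-sum property is a quick sanity check for the real `indicators` table too, although rounding inflows and outflows before subtracting may leave a small deviation from zero.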
Let’s start by mapping categorical data and learning about the UK.
# id for country name initial
la_shp$ctry_nm <- substr(la_shp$LAD21CD, 1, 1)
la_shp$ctry_nm <- as.factor(la_shp$ctry_nm)
# simplify boundaries
la_shp_simple <- st_simplify(la_shp,
                             preserveTopology = TRUE,
                             dTolerance = 1000) # 1km
# ensure geometry is valid
la_shp_simple <- sf::st_make_valid(la_shp_simple)
tm_shape(la_shp_simple) +
  tm_fill(col = "ctry_nm", style = "cat", palette = viridis(4), title = "Country") +
  tm_borders(lwd = 0) +
  tm_layout(legend.title.size = 1,
            legend.text.size = 0.6,
            legend.position = c("right", "top"),
            legend.bg.color = "white",
            legend.bg.alpha = 1)
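The single-letter prefixes make for a terse legend. A minimal sketch of one way to label them follows. It assumes the ONS convention that codes starting with E, N, S and W refer to England, Northern Ireland, Scotland and Wales. The codes below are for illustration only, and `ctry_label` is a hypothetical name not used elsewhere in this session.

```r
# Example area codes, one per country prefix (for illustration only)
codes <- c("E06000001", "W06000001", "S12000005", "N09000001")

# Extract the first letter and attach a readable country label to it
ctry_label <- factor(
  substr(codes, 1, 1),
  levels = c("E", "N", "S", "W"),
  labels = c("England", "Northern Ireland", "Scotland", "Wales")
)
print(data.frame(code = codes, country = ctry_label))
```

Applied to `la_shp$LAD21CD`, the same call would produce a factor whose levels can be passed to `tm_fill()` in place of the raw initials.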
If instead we want to visualise the geographical distribution of a continuous phenomenon, we have a few more alternatives.
An option is ‘equal intervals’. The intuition is to divide the range of the distribution into segments of equal width.
tm_shape(la_shp_simple) +
  tm_fill(col = "netflows", style = "equal", palette = viridis(6), title = "Net migration") +
  tm_borders(lwd = 0) +
  tm_layout(legend.title.size = 1,
            legend.text.size = 0.6,
            legend.position = c("right", "top"),
            legend.bg.color = "white",
            legend.bg.alpha = 1)
Equal interval bins are more appropriate for variables with a uniform distribution. They are not recommended for variables with a skewed distribution. Why?
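One way to explore this question is to compute equal-interval breaks by hand. This is a minimal sketch on a hypothetical skewed vector, not the migration data: nine small values and one very large one.

```r
# Hypothetical skewed values: most are small, one is very large
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100)

# Equal-interval breaks: split the range [min, max] into bins of equal width
n_bins <- 5
breaks_equal <- seq(min(x), max(x), length.out = n_bins + 1)
print(breaks_equal)

# Count the observations falling into each bin
counts_equal <- table(cut(x, breaks = breaks_equal, include.lowest = TRUE))
print(counts_equal)
```

Nine of the ten values land in the first bin and three bins are empty, so on a map almost every area would share a single colour.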
Another option is ‘quantiles’. This algorithm ensures that the same number of data points fall into each category. A potential issue is that bin ranges can vary widely.
tm_shape(la_shp_simple) +
  tm_fill(col = "netflows", style = "quantile", palette = viridis(6), title = "Net migration") +
  tm_borders(lwd = 0) +
  tm_layout(legend.title.size = 1,
            legend.text.size = 0.6,
            legend.position = c("right", "top"),
            legend.bg.color = "white",
            legend.bg.alpha = 1)
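The trade-off of quantile bins can be seen in a minimal sketch on a hypothetical skewed vector (made-up values with no ties). The breaks come from base R’s `quantile()` with its default settings, which may differ slightly from the breaks tmap computes internally.

```r
# Hypothetical skewed values: most are small, one is very large
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100)

# Quantile breaks: cut points at equally spaced cumulative proportions
n_bins <- 5
breaks_q <- quantile(x, probs = seq(0, 1, length.out = n_bins + 1))
print(breaks_q)

# Each bin holds the same number of observations...
counts_q <- table(cut(x, breaks = breaks_q, include.lowest = TRUE))
print(counts_q)

# ...but the bin widths differ widely
widths_q <- diff(breaks_q)
print(widths_q)
```

Every bin contains two values, yet the last bin is far wider than the others, so areas with very different values can end up in the same colour class.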
The Fisher-Jenks algorithm, known as ‘natural breaks’, identifies groups of similar values in the data and maximises the differences between categories.
Jenks, G.F., 1967. The data model concept in statistical mapping. International Yearbook of Cartography, 7, pp.186-190.
tm_shape(la_shp_simple) +
  tm_fill(col = "netflows", style = "jenks", palette = viridis(6), title = "Net migration") +
  tm_borders(lwd = 0) +
  tm_layout(legend.title.size = 1,
            legend.text.size = 0.6,
            legend.position = c("right", "top"),
            legend.bg.color = "white",
            legend.bg.alpha = 1)
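The idea behind natural breaks can be illustrated by brute force on a handful of values. The sketch below is not the Fisher-Jenks algorithm itself, which relies on a more efficient search. It simply tries every possible partition of a small, hypothetical sorted vector and keeps the one with the smallest within-class sum of squared deviations. The helper `wss()` is a hypothetical name introduced for this example.

```r
# Hypothetical values with three visible clusters
x <- sort(c(1, 2, 3, 20, 21, 23, 60, 62))
k <- 3 # number of classes

# Within-class sum of squared deviations for a partition of sorted values;
# `cuts` holds the positions after which a new class starts
wss <- function(values, cuts) {
  groups <- cut(seq_along(values), breaks = c(0, cuts, length(values)))
  sum(tapply(values, groups, function(v) sum((v - mean(v))^2)))
}

# Try every way of placing k - 1 cut points between consecutive values
candidates <- combn(length(x) - 1, k - 1)
scores <- apply(candidates, 2, function(cuts) wss(x, cuts))
best_cuts <- candidates[, which.min(scores)]

# Class membership of each value under the best partition
best_class <- cut(seq_along(x), breaks = c(0, best_cuts, length(x)),
                  labels = FALSE)
print(split(x, best_class))
```

The best partition separates the three clusters, which is the behaviour the ‘natural breaks’ label refers to. Exhaustive search is only feasible for very small inputs, since the number of candidate partitions grows quickly with the number of values and classes.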
The ‘order’ style helps present a large number of colours over a continuous colour surface and can be very useful for rasters. It can also help display skewed distributions.
tm_shape(la_shp_simple) +
  tm_fill(col = "netflows", style = "order", palette = viridis(256), title = "Net migration") +
  tm_borders(lwd = 0) +
  tm_layout(legend.title.size = 1,
            legend.text.size = 0.6,
            legend.position = c("right", "top"),
            legend.bg.color = "white",
            legend.bg.alpha = 1)
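Why an order-based scale helps with skewed data can be sketched in base R. The example assumes that the ‘order’ style positions each value on the colour scale by its rank rather than by its magnitude. The vector and the names `pos_linear` and `pos_order` are hypothetical.

```r
# Hypothetical skewed values: most are small, one is very large
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100)

# Linear scaling to [0, 1]: the outlier squeezes all other values together
pos_linear <- (x - min(x)) / (max(x) - min(x))
print(round(pos_linear, 3))

# Rank-based scaling to [0, 1]: values are spread evenly along the scale
pos_order <- (rank(x) - 1) / (length(x) - 1)
print(round(pos_order, 3))
```

With linear scaling, the nine small values occupy less than a tenth of the colour range. With rank-based scaling they are evenly spaced, at the cost of no longer showing how far apart the underlying values are.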