@fcorowe
)In this section we will learn the intuition of how we can represent spatial relationships in practice. We will explore a key concept of spatial analysis: spatial weights matrices. Spatial weights matrices are structured sets of numbers that formally encode spatial associations between observations.
Key attributes of an spatial weight matrix:
Spatial weight matrices can be created in various ways. We will discuss the most commonly used in practice.
For now, we will only need our LA boundaries.
# clean workspace
rm(list=ls())
# read shapefile
la_shp <- st_read("/Users/Franciscorowe 1/Dropbox/Francisco/Research/github_projects/courses/udd_course/data/Local_Authority_Districts_(May_2021)_UK_BFE_V3/LAD_MAY_2021_UK_BFE_V2.shp")
## Reading layer `LAD_MAY_2021_UK_BFE_V2' from data source `/Users/Franciscorowe 1/Dropbox/Francisco/Research/github_projects/courses/udd_course/data/Local_Authority_Districts_(May_2021)_UK_BFE_V3/LAD_MAY_2021_UK_BFE_V2.shp' using driver `ESRI Shapefile'
## Simple feature collection with 374 features and 9 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -116.1928 ymin: 5333.81 xmax: 655989 ymax: 1220310
## projected CRS: OSGB 1936 / British National Grid
# simplify boundaries
la_shp_simple <- st_simplify(la_shp,
preserveTopology =T,
dTolerance = 500) # .5km
# ensure geometry is valid
la_shp_simple <- sf::st_make_valid(la_shp_simple)
str(la_shp_simple)
## Classes 'sf' and 'data.frame': 374 obs. of 10 variables:
## $ OBJECTID : int 1 2 3 4 5 6 7 8 9 10 ...
## $ LAD21CD : chr "E06000001" "E06000002" "E06000003" "E06000004" ...
## $ LAD21NM : chr "Hartlepool" "Middlesbrough" "Redcar and Cleveland" "Stockton-on-Tees" ...
## $ BNG_E : int 447160 451141 464361 444940 428029 354246 362744 369490 332819 511894 ...
## $ BNG_N : int 531474 516887 519597 518183 515648 382146 388456 422806 436635 431650 ...
## $ LONG : num -1.27 -1.21 -1.01 -1.31 -1.57 ...
## $ LAT : num 54.7 54.5 54.6 54.6 54.5 ...
## $ SHAPE_Leng: num 66110 41056 105292 108085 107203 ...
## $ SHAPE_Area: num 9.84e+07 5.46e+07 2.54e+08 2.10e+08 1.97e+08 ...
## $ geometry :sfc_GEOMETRY of length 374; first list element: List of 1
## ..$ : num [1:27, 1:2] 447214 448290 449073 453112 453352 ...
## ..- attr(*, "class")= chr [1:3] "XY" "POLYGON" "sfg"
## - attr(*, "sf_column")= chr "geometry"
## - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA NA
## ..- attr(*, "names")= chr [1:9] "OBJECTID" "LAD21CD" "LAD21NM" "BNG_E" ...
Contiguity weights matrices define spatial connection through the existence of common geographical boundaries.
Based on the queen criteria, two spatial units are contiguous if they share a vortex (a single point) of their boundaries.
wm_queen <- poly2nb(la_shp_simple, queen = TRUE)
summary(wm_queen)
## Neighbour list object:
## Number of regions: 374
## Number of nonzero links: 1592
## Percentage nonzero weights: 1.138151
## Average number of links: 4.256684
## 8 regions with no links:
## 44 50 136 277 326 333 335 353
## Link number distribution:
##
## 0 1 2 3 4 5 6 7 8 9 10 12
## 8 12 47 74 70 74 39 33 9 5 2 1
## 12 least connected regions:
## 10 26 27 60 66 87 88 102 126 187 298 339 with 1 link
## 1 most connected region:
## 373 with 12 links
How do we interpret the outcome?
Finding the most connected area:
la_shp_simple$LAD21NM[373]
## [1] "Powys"
Its neighbours:
wm_queen[[373]]
## [1] 19 48 354 356 358 359 361 363 367 369 371 374
Their names:
la_shp_simple$LAD21NM[c(19, 48, 354, 356, 358, 359, 361, 363, 367, 369, 371, 374)]
## [1] "Herefordshire, County of" "Shropshire"
## [3] "Gwynedd" "Denbighshire"
## [5] "Wrexham" "Ceredigion"
## [7] "Carmarthenshire" "Neath Port Talbot"
## [9] "Rhondda Cynon Taf" "Blaenau Gwent"
## [11] "Monmouthshire" "Merthyr Tydfil"
Visualising the weights matrix:
coords <- st_centroid(st_geometry(la_shp_simple))
plot(st_geometry(la_shp_simple), border="grey")
plot(wm_queen, coords, add = TRUE)
The rook defines two observations as neighbours if they share some of their boundaries. For irregular polygons, differences between the rook and queen definitions are minimal and tend to boil down to geocoding. For regular polygons, such as rasters or grids, differences are more noticeable.
wm_rook <- poly2nb(la_shp_simple, queen = FALSE)
summary(wm_rook)
## Neighbour list object:
## Number of regions: 374
## Number of nonzero links: 1290
## Percentage nonzero weights: 0.9222454
## Average number of links: 3.449198
## 19 regions with no links:
## 42 44 50 114 136 211 221 262 277 283 289 295 298 301 306 326 333 335
## 353
## Link number distribution:
##
## 0 1 2 3 4 5 6 7 8 9 12
## 19 41 64 67 77 57 31 11 2 4 1
## 41 least connected regions:
## 3 9 10 26 27 31 43 60 66 87 88 95 99 101 102 104 113 118 121 126 160 187 224 227 249 255 263 286 287 288 291 293 296 297 300 304 307 308 309 339 347 with 1 link
## 1 most connected region:
## 373 with 12 links
Have a go at interpreting and plotting the results.
Distance-based matrices define weights to each pair of observations as a function of their geographical proximity. There are various distance-based matrices, but they share that same intuition.
A approach is to define weights based on the distances between a reference observation and a the set of k observations; that is, the closest. For more details see this vignette
col.knn <- knearneigh(coords, k=4)
head(col.knn[[1]], 5)
## [,1] [,2] [,3] [,4]
## [1,] 4 2 3 5
## [2,] 4 3 1 5
## [3,] 2 1 4 177
## [4,] 2 1 5 3
## [5,] 4 2 1 45
Displaying the network.
plot(st_geometry(la_shp_simple), border="grey")
plot(knn2nb(col.knn), coords, add=TRUE)
An alternative way to define is to draw a circle of certain radius and consider neighbours all observations (i.e. centroids) within that radious.
wm_dist <- dnearneigh(coords, 0, 20000, longlat = TRUE)
## Warning in dnearneigh(coords, 0, 20000, longlat = TRUE): dnearneigh: longlat
## argument overrides object
wm_dist
## Neighbour list object:
## Number of regions: 374
## Number of nonzero links: 1770
## Percentage nonzero weights: 1.265407
## Average number of links: 4.73262
## 99 regions with no links:
## 12 13 17 19 25 26 27 28 29 35 40 45 46 49 50 51 52 54 55 56 58 59 62 63
## 65 66 67 68 69 70 79 82 83 85 86 87 93 103 110 119 143 159 163 165 167
## 168 172 173 174 175 176 177 180 186 190 192 193 204 238 239 240 310 311
## 313 314 315 317 318 319 320 322 323 324 326 328 331 332 333 334 335 336
## 337 338 339 340 341 346 347 349 350 353 354 355 358 359 360 361 362 373
plot(st_geometry(la_shp_simple), border="grey")
plot(wm_dist, coords, add=TRUE)
## Row Standardised Weights Matrices
A spatial weights matrix with raw values (e.g. 1/0s) is rarely the best approach for analysis and some kind of transformation is required.
Let’s use the queen definition to illustrate the example.
Note: ‘zero.policy = TRUE’ allows listing non-neighbours. See what happens if you drop this argument or print `rswm_queen’. If you have done that, you may be waiting for the answer. Well the answer is we have island in the dataset and the queen definition does not integrate these places very well.
The argument Style=“W”
indicates that equal weights are assigned to neighbouring polygons, so they are row standardised: for a given polygon, sums across the columns and divide each cell by the total, to derive weights.
Let’s see the weights for polygon 1:
rswm_queen <- nb2listw(wm_queen, style = "W", zero.policy = TRUE)
rswm_queen$weights[1]
## [[1]]
## [1] 0.5 0.5
This unit has 2 neighbours and each is assigned a 0.5 of the total weight.