Spatial Weights

Francisco Rowe (@fcorowe)

2022-07-04

Back
Next

Intuition

Now we will learn the intuition of how we can represent spatial relationships in practice. We will explore a key concept of spatial analysis: spatial weights matrices. Spatial weights matrices are structured sets of numbers that formally encode spatial associations between observations.

Fig. 1. Spatial weights matrix.

Key attributes of an spatial weight matrix:

Spatial weight matrices can be created in various ways. We will discuss the most commonly used in practice.

Data

For now, we will only need our LA boundaries.

# clean workspace
rm(list=ls())

# read shapefile
la_shp <- st_read("./data/Local_Authority_Districts_(May_2021)_UK_BFE_V3/LAD_MAY_2021_UK_BFE_V2.shp")
## Reading layer `LAD_MAY_2021_UK_BFE_V2' from data source 
##   `/Users/franciscorowe/Dropbox/Francisco/Research/github_projects/courses/intro-gds/data/Local_Authority_Districts_(May_2021)_UK_BFE_V3/LAD_MAY_2021_UK_BFE_V2.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 374 features and 9 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -116.1928 ymin: 5333.81 xmax: 655989 ymax: 1220310
## Projected CRS: OSGB36 / British National Grid
# simplify boundaries
la_shp_simple <- st_simplify(la_shp, 
                             preserveTopology =T,
                             dTolerance = 500) # .5km

# ensure geometry is valid
la_shp_simple <- sf::st_make_valid(la_shp_simple)

head(la_shp_simple[,c(2,3)])
## Simple feature collection with 6 features and 2 fields
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: 344666.1 ymin: 378867 xmax: 478289.4 ymax: 537036.1
## Projected CRS: OSGB36 / British National Grid
##     LAD21CD              LAD21NM                       geometry
## 1 E06000001           Hartlepool POLYGON ((447213.9 537036.1...
## 2 E06000002        Middlesbrough POLYGON ((448489.9 522071.8...
## 3 E06000003 Redcar and Cleveland POLYGON ((455525.9 528406.7...
## 4 E06000004     Stockton-on-Tees POLYGON ((444157 527956.3, ...
## 5 E06000005           Darlington POLYGON ((423496.6 524724.3...
## 6 E06000006               Halton POLYGON ((351539.9 389475.2...

Building Spatial Weights

Contiguity-based matrices

Contiguity weights matrices define spatial connection through the existence of common geographical boundaries.

Queen

Based on the queen criteria, two spatial units are contiguous if they share a vortex (a single point) of their boundaries.

wm_queen <- poly2nb(la_shp_simple, queen = TRUE)
summary(wm_queen)
## Neighbour list object:
## Number of regions: 374 
## Number of nonzero links: 1592 
## Percentage nonzero weights: 1.138151 
## Average number of links: 4.256684 
## 8 regions with no links:
## 44 50 136 277 326 333 335 353
## Link number distribution:
## 
##  0  1  2  3  4  5  6  7  8  9 10 12 
##  8 12 47 74 70 74 39 33  9  5  2  1 
## 12 least connected regions:
## 10 26 27 60 66 87 88 102 126 187 298 339 with 1 link
## 1 most connected region:
## 373 with 12 links

How do we interpret the outcome?

Finding the most connected area:

la_shp_simple$LAD21NM[373]
## [1] "Powys"

Its neighbours:

wm_queen[[373]]
##  [1]  19  48 354 356 358 359 361 363 367 369 371 374

Their names:

la_shp_simple$LAD21NM[c(19,  48, 354, 356, 358, 359, 361, 363, 367, 369, 371, 374)]
##  [1] "Herefordshire, County of" "Shropshire"              
##  [3] "Gwynedd"                  "Denbighshire"            
##  [5] "Wrexham"                  "Ceredigion"              
##  [7] "Carmarthenshire"          "Neath Port Talbot"       
##  [9] "Rhondda Cynon Taf"        "Blaenau Gwent"           
## [11] "Monmouthshire"            "Merthyr Tydfil"

Visualising the weights matrix:

coords <- st_centroid(st_geometry(la_shp_simple))
plot(st_geometry(la_shp_simple), border="grey")
plot(wm_queen, coords, add = TRUE)

Rook

The rook defines two observations as neighbours if they share some of their boundaries. For irregular polygons, differences between the rook and queen definitions are minimal and tend to boil down to geocoding. For regular polygons, such as rasters or grids, differences are more noticeable.

wm_rook <- poly2nb(la_shp_simple, queen = FALSE)
summary(wm_rook)
## Neighbour list object:
## Number of regions: 374 
## Number of nonzero links: 1290 
## Percentage nonzero weights: 0.9222454 
## Average number of links: 3.449198 
## 19 regions with no links:
## 42 44 50 114 136 211 221 262 277 283 289 295 298 301 306 326 333 335
## 353
## Link number distribution:
## 
##  0  1  2  3  4  5  6  7  8  9 12 
## 19 41 64 67 77 57 31 11  2  4  1 
## 41 least connected regions:
## 3 9 10 26 27 31 43 60 66 87 88 95 99 101 102 104 113 118 121 126 160 187 224 227 249 255 263 286 287 288 291 293 296 297 300 304 307 308 309 339 347 with 1 link
## 1 most connected region:
## 373 with 12 links

Have a go at interpreting and plotting the results.

Distance-based matrices

Distance-based matrices define weights to each pair of observations as a function of their geographical proximity. There are various distance-based matrices, but they share that same intuition.

K-Nearest Neighbours

A approach is to define weights based on the distances between a reference observation and a the set of k observations; that is, the closest. For more details see this vignette

col.knn <- knearneigh(coords, k=4)
head(col.knn[[1]], 5)
##      [,1] [,2] [,3] [,4]
## [1,]    4    2    3    5
## [2,]    4    3    1    5
## [3,]    2    1    4  177
## [4,]    2    1    5    3
## [5,]    4    2    1   45

Displaying the network.

plot(st_geometry(la_shp_simple), border="grey")
plot(knn2nb(col.knn), coords, add=TRUE)

Distance Band

An alternative way to define is to draw a circle of certain radius and consider neighbours all observations (i.e. centroids) within that radious.

wm_dist <- dnearneigh(coords, 0, 20000, longlat = TRUE)
## Warning in dnearneigh(coords, 0, 20000, longlat = TRUE): dnearneigh: longlat
## argument overrides object
wm_dist
## Neighbour list object:
## Number of regions: 374 
## Number of nonzero links: 1770 
## Percentage nonzero weights: 1.265407 
## Average number of links: 4.73262 
## 99 regions with no links:
## 12 13 17 19 25 26 27 28 29 35 40 45 46 49 50 51 52 54 55 56 58 59 62 63
## 65 66 67 68 69 70 79 82 83 85 86 87 93 103 110 119 143 159 163 165 167
## 168 172 173 174 175 176 177 180 186 190 192 193 204 238 239 240 310 311
## 313 314 315 317 318 319 320 322 323 324 326 328 331 332 333 334 335 336
## 337 338 339 340 341 346 347 349 350 353 354 355 358 359 360 361 362 373
plot(st_geometry(la_shp_simple), border="grey")
plot(wm_dist, coords, add=TRUE)

Row Standardised Weights Matrices

A spatial weights matrix with raw values (e.g. 1/0s) is rarely the best approach for analysis and some kind of transformation is required.

Let’s use the queen definition to illustrate the example.

Note: ‘zero.policy = TRUE’ allows listing non-neighbours. See what happens if you drop this argument or print `rswm_queen’. If you have done that, you may be waiting for the answer. Well the answer is we have island in the dataset and the queen definition does not integrate these places very well.

The argument Style=“W” indicates that equal weights are assigned to neighbouring polygons, so they are row standardised: for a given polygon, sums across the columns and divide each cell by the total, to derive weights.

Let’s see the weights for polygon 1:

rswm_queen <- nb2listw(wm_queen, style = "W", zero.policy = TRUE)
rswm_queen$weights[1]
## [[1]]
## [1] 0.5 0.5

This unit has 2 neighbours and each is assigned a 0.5 of the total weight.