-
Notifications
You must be signed in to change notification settings - Fork 0
/
workflow.qmd
418 lines (321 loc) · 17.1 KB
/
workflow.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
---
title: "Assessing the distributional impacts of rail schemes"
author: "Rahulan Chandrasekaran"
date: "now"
format:
html:
toc: true
anchor-sections: true
link-external-newwindow: true
code-fold: true
code-overflow: wrap
execute:
cache: false
warning: false
error: true
echo: true
messsage: false
---
## Introduction
This document explains how to perform distributional impacts analysis (DIA) on the modelled user (or direct transport) benefits of a rail intervention, in accordance with [TAG Unit A4.2](https://www.gov.uk/government/publications/tag-unit-a4-2-distributional-impact-appraisal).
The R code below has been developed specifically for rail schemes modelled with MOIRA or MOIRA2.
This analysis requires the user to provide two CSV files as inputs, each containing modelled estimates of changes in generalised journey time (in minutes) for pairs of origin and destination stations - described by their NLC identifiers - for trips of a specific journey purpose. One file should have modelled user benefits for commuting trips, and the other file should have the equivalent for leisure trips.^[This analysis does not use any data based on business trips, as business trip user benefits are not suitable for distributional impacts analysis.]
The code first converts the two sets of modelled GJT changes into monetised user benefits by station. For this step, the user benefits associated with a given pair of origin and destination stations is split equally between the origin and destination.^[This is in contrast to alternative assumptions typically used in the context of multi-modal transport models, such as assigning all of a link's benefits to the origin station (especially when the modelled time period is restricted to the morning peak).]
This yields estimates of user benefits by modelled station, which is then mapped to LSOAs using an assignment method where LSOAs with population-weighted centroids within a certain distance of a station are then paired with that station. Mapping user benefits to LSOAs allows for comparison against income deprivation data, which is also available at LSOA level.
Our final table summarises the distribution of scheme user benefits by LSOA across all five income deprivation quintiles (1 = most deprived 20%). Within the table, each quintile is assigned a grade based on its constituent impact area LSOAs' relative shares of modelled user benefits and population.
That is, for all impact area LSOAs within an income deprivation quintile:
| Quintile benefits | Comparison vs quintile population share | Grade |
|--------|--------|--------|
| Positive (benefits) | 5 perc. points or more **greater** than quintile pop. share | ✔✔✔ |
| Positive (benefits) | Within ±5 perc. points of quintile pop. share | ✔✔ |
| Positive (benefits) | 5 perc. points or more **smaller** than quintile pop. share | ✔ |
| Zero | - | Neutral |
| Negative (disbenefits) | 5 perc. points or more **smaller** than quintile pop. share | ❌ |
| Negative (disbenefits) | Within ±5 perc. points of quintile pop. share | ❌❌|
| Negative (disbenefits) | 5 perc. points or more **greater** than quintile pop. share | ❌❌❌ |
: System for grading of transport user benefits DIs for each of the social groups. Source: TAG Unit A4.2, Table 8. {tbl-colwidths="[40,40,20]"}
## Data sources
- User-supplied data on scheme benefits:
- Format: 3 column CSV: origin station NLC; destination staion NLC; modelled change in generalised journey time (minutes) for journeys of a specific purpose (either commuting or leisure)
- Worked example: based on TRU 2019 analysis
- Reference data on station NLCs and coordinates
- For the worked example, we will use the station coordinates and NLCs from the TRU scheme modelling.
- However, this will be replaced with a larger dataset constructed initially from ORR stations data.
- Population by LSOA: ONS, 2020 mid-year population estimates (all ages), published in September 2021. Link to data: [.xlsx](https://www.ons.gov.uk/file?uri=/peoplepopulationandcommunity/populationandmigration/populationestimates/datasets/lowersuperoutputareamidyearpopulationestimates/mid2020sape23dt2/sape23dt2mid2020lsoasyoaestimatesunformatted.xlsx)
- Income deprivation quintiles by LSOA, from the 2019 English indices of deprivation, published by MHCLG. [Link to IMD homepage](https://www.gov.uk/government/statistics/english-indices-of-deprivation-2019).
- Population-weighted centroids by LSOA: available on the ONS Open Geography portal. [Link](https://geoportal.statistics.gov.uk/datasets/54a76a36e3b4420a9ea83fcf7994525d_0/).
- Appraisal parameters:
- Value of time for commuting journeys: £9.95, from the TAG Data Book.
- Value of time for leisure journeys: £4.54, from the TAG Data Book.
## Code / outputs
First, we load the necessary R packages and define the distance to use for pairing LSOAs with stations.
```{r packages, include = FALSE}
# Load packages
if (!require("pacman")) install.packages("pacman")
pacman::p_load(dplyr, tidyr, readr, data.table, glue, here, janitor, sf, cli,
gt, tmap, geojsonio, stringr, assertthat, tictoc)
tictoc::tic("Runtime") # start timer
```
```{r user_options}
# Are scheme commuting/leisure benefits unmonetised?
commuting_benefits_unmonetised <- TRUE
leisure_benefits_unmonetised <- TRUE
# Scheme benefits data filenames
commuting_benefits_filename <- "user_benefits_commuting.csv"
leisure_benefits_filename <- "user_benefits_leisure.csv"
# Buffer distance to use when assigning LSOAs to stations by distance to
# LSOA population-weighted centroids (units: metres)
station_buffer <- 1500
# Values of time for commuting and leisure purposes
# Source: TAG Data Book, 2010 prices
vot_commuting_tag <- 9.95
vot_leisure_tag <- 4.54
```
We read in the user-supplied scheme benefits data, assumed to be located in the relative paths `data/user_benefits_commuting.csv` and `data/user_benefits_leisure.csv`, before converting the estimated travel time savings to monetary values, summing across the two journey purposes, and aggregating the results by station. The first few rows of the resulting data is displayed at the end.
```{r, eval = TRUE}
# Read data on modelled scheme GJT changes by purpose (commuting or leisure)
# for all modelled pairs of origin and destination stations, before monetising
# GJT changes if necessary
ReadStationGJTCByPurpose <- function(data_path, filename, vot, to_monetise){
bens <- readr::read_csv(glue("{data_path}{filename}"),
col_names = c("origin_tlc", "dest_tlc", "gjtc")) %>%
mutate(gjtc = gjtc * (vot/60) * to_monetise) %>%
filter(!if_any(everything(), is.na))
bens_orig <- bens %>%
group_by(origin_tlc) %>%
summarise(gjtc = sum(gjtc, na.rm = T) / 2) %>%
rename("tlc" = origin_tlc)
bens_dest <- bens %>%
group_by(dest_tlc) %>%
summarise(gjtc = sum(gjtc, na.rm = T) / 2) %>%
rename("tlc" = dest_tlc)
assertthat::are_equal(sum(bens$gjtc),
sum(bens_orig$gjtc) + sum(bens_dest$gjtc))
bens_agg <- rbind(bens_orig, bens_dest)
return(bens_agg)
}
commuting_bens <- ReadStationGJTCByPurpose(
data_path = glue("{here()}/data/"),
filename = commuting_benefits_filename,
vot = vot_commuting_tag,
to_monetise = commuting_benefits_unmonetised
)
leisure_bens <- ReadStationGJTCByPurpose(
data_path = glue("{here()}/data/"),
filename = leisure_benefits_filename,
vot = vot_leisure_tag,
to_monetise = leisure_benefits_unmonetised
)
total_bens <- rbind(commuting_bens, leisure_bens) %>%
group_by(tlc) %>%
summarise(gjtc = sum(gjtc, na.rm = T)) %>%
arrange(tlc)
head(total_bens)
```
Next, we read in our reference data on station NLCs and coordinates. For the worked example, we will use a dataset constructed from the modelling of the scheme that provides the user benefits for our worked example. This is a subset of all non-closed rail stations in England, but this does not affect the analysis. However, in future we will replace this with a dataset built on ORR stations data.
```{r}
stations_filename <- "tru_stations.csv"
stations <- readr::read_csv(glue("{here()}/data/{stations_filename}"),
col_names = c("name", "tlc", "easting", "northing")) %>%
filter(!if_any(everything(), is.na)) %>%
distinct(.keep_all = T) %>%
dplyr::inner_join(total_bens, by = c("tlc")) %>% # add benefits by station
mutate(gjtc = gjtc * -1) %>%
filter(!(str_detect(name, "Zone "))) # exclude non-rail London zones (eg "Zone Blackfriars")
```
Next, we need LSOA population data.
```{r}
# Population (all ages) in mid-2020 by 2011 LSOA
# Source: ONS (https://www.ons.gov.uk/file?uri=/peoplepopulationandcommunity/populationandmigration/populationestimates/datasets/lowersuperoutputareamidyearpopulationestimates/mid2020sape23dt2/sape23dt2mid2020lsoasyoaestimatesunformatted.xlsx, sheet: "Mid-2020 Persons", cols A-G)
lsoa_pop_filename <- "lsoa11_pop_2020.csv"
lsoa_pop <- readr::read_csv(glue("{here()}/data/{lsoa_pop_filename}")) %>%
janitor::clean_names() %>%
select(lsoa_code, "pop" = all_ages) %>%
rename("lsoa11cd" = lsoa_code) %>%
as.data.table()
head(lsoa_pop)
```
Followed by LSOA income deprivation quintile data, which requires converting deciles to quintiles.
```{r}
# LSOA income deprivation quintiles, 2019
# Source: MHCLG, English indices of deprivation 2019
imd_filename <- "File_7_-_All_IoD2019_Scores__Ranks__Deciles_and_Population_Denominators_3.csv"
imd <- readr::read_csv(glue("{here()}/data/{imd_filename}")) %>%
janitor::clean_names() %>%
select(lsoa_code_2011,
"income_decile" = income_decile_where_1_is_most_deprived_10_percent_of_lso_as) %>%
mutate(income_quintile = case_when(
income_decile >= 9 ~ 5,
income_decile >= 7 ~ 4,
income_decile >= 5 ~ 3,
income_decile >= 3 ~ 2,
income_decile >= 1 ~ 1,
TRUE ~ NA_integer_
)) %>%
rename("lsoa11cd" = lsoa_code_2011) %>%
select(-c(income_decile)) %>%
as.data.table()
head(imd)
```
Finally, we import a dataset containing the coordinates of all (2011 definition) LSOA population-weighted centroids in England and Wales in 2022.
```{r, eval = TRUE}
# Read data on 2021 LSOA population-weighted centroids
# Source: https://geoportal.statistics.gov.uk/datasets/ons::lsoa-dec-2021-pwc-for-england-and-wales/explore
lsoa_pwc_filename <- "LSOA_Dec_2011_PWC_in_England_and_Wales_2022_1923591000694358693.csv"
lsoa_pwc <- readr::read_csv(glue("{here()}/data/{lsoa_pwc_filename}")) %>%
janitor::clean_names() %>%
filter(!(stringr::str_sub(lsoa11cd, 1, 1) == "W")) %>% # exclude Welsh LSOAs
select(lsoa11cd, x, y)
head(lsoa_pwc)
```
At this stage, we can plot our scheme benefits data (by station) on a map.
```{r map_options, include = FALSE}
tmap_options(check.and.fix = T)
tmap_mode("view")
```
```{r, eval = TRUE, include = TRUE}
# Make bubble map of user benefits by modelled station
stations %>%
sf::st_as_sf(coords = c("easting", "northing"), crs = 27700) %>%
tm_shape(.) +
tm_bubbles(size = "gjtc", col = "blue", alpha = 0.2)
```
We continue with our analysis by assigning user benefits to LSOAs, which requires identifying all pairs of modelled stations and population-weighted LSOA centroids within `r paste0(station_buffer)`m of each other. Each LSOA that is paired with a modelled station in this way is then assigned a portion of the station's user benefits based on its share of the total population across all station-paired LSOAs.
::: {.callout-note}
This is undoubtedly the most computationally taxing step in the process, but we can exploit the speed of the [**data.table**](https://rdatatable.gitlab.io/data.table/) package to avoid memory-related issues. This works successfully for our worked example - which has 716 modelled stations - where this step is implemented within 5-10 seconds. However, further testing will be required to determine whether the step can be completed successfully for schemes with many more modelled stations.
:::
```{r}
stations <- as.data.table(stations)
lsoa_pwc <- as.data.table(lsoa_pwc)
station_pairs <- vector("list", nrow(stations))
for(s in seq(1:nrow(stations))){
station_pairs[[s]] <- as.data.table(stations[s,])[, as.list(lsoa_pwc),
by = stations[s,]]
station_pairs[[s]] <- station_pairs[[s]][,
dist := (((easting - x)^2) + ((northing - y)^2)) ^ 0.5
][dist <= station_buffer]
if(s %% 100 == 0){cli::cli_inform("Processed {s}/{nrow(stations)} stations")}
}
final <- rbindlist(station_pairs)
rm(station_pairs)
```
Now that we've obtained a dataset showing scheme user benefits by LSOA, for all LSOAs with population-weighted centroids within `r paste0(station_buffer)`m of a modelled station, we can add our other LSOA variables (population and income deprivation quintile) before calculating each LSOA's share of its paired station's population and associated user benefits.
Then we aggregate the data by income deprivation quintile and produce our final user benefits DI table.
```{r, eval = TRUE}
# Add LSOA income deprivation quintile and 2020 population data
final_2 <- final[imd,
on = "lsoa11cd",
income_quintile := i.income_quintile][
lsoa_pop,
on = "lsoa11cd",
pop := i.pop
] %>%
as_tibble() %>%
# Calculate total station population (across paired LSOAs)
group_by(tlc) %>%
mutate(sta_pop = sum(pop)) %>%
ungroup() %>%
# Calculate LSOA population shares
# And then LSOA benefits (based on pop. shares)
mutate(pop_share = pop / sta_pop,
lsoa_bens = gjtc * pop_share) %>%
filter(!if_any(everything(), is.na))
# Finish compiling benefits by LSOA
bens_by_lsoa <- final_2 |>
select(lsoa11cd, lsoa_bens)
head(bens_by_lsoa)
```
```{r}
# Produce TAG A4.2 user benefits DI table
# Group benefits/disbenefits data by income quintile, assign grade based on
# comparing each quintile's share of benefits (disbenefits) and its population
# share. Uses {gt} package to produce a table in HTML format.
bens_by_quintile <- final_2 %>%
group_by(income_quintile) %>%
summarise(lsoas = n(),
bens = sum(lsoa_bens),
pop = sum(pop)) |>
mutate(disbens = if_else(bens < 0, bens * -1, 0)) |>
mutate(bens = if_else(bens > 0, bens, 0)) |>
select(income_quintile, lsoas, bens, disbens, pop)
bens_by_quintile <- bens_by_quintile |>
mutate(share_bens = bens / sum(bens),
share_disbens = disbens / sum(disbens),
share_pop = pop / sum(pop)) %>%
mutate(grade = case_when(
share_bens > 0 & share_bens - share_pop >= 0.05 ~ "✔✔✔",
share_bens > 0 & share_bens - share_pop >= -0.05 ~ "✔✔",
share_bens > 0 & share_bens - share_pop < -0.05 ~ "✔",
share_bens == 0 ~ "Neutral",
share_disbens > 0 & share_disbens - share_pop >= 0.05 ~ "❌❌❌",
share_disbens > 0 & share_disbens - share_pop >= -0.05 ~ "❌❌",
share_disbens > 0 & share_disbens - share_pop <= -0.05 ~ "❌",
TRUE ~ NA_character_
)) %>%
mutate(bens = bens / 1000000,
disbens = disbens / 1000000,
pop = pop / 1000000)
# Make final HTML table
bens_by_quintile |>
gt() |>
sub_missing(missing_text = "-") |>
# Amend column formats
fmt_number(columns = 2, decimals = 0) %>% # lsoas
fmt_number(columns = c(3:5), decimals = 1) %>% # total bens/disbens/pop
fmt_percent(columns = c(6:8), decimals = 0) %>% # shares of bens/disbens/pop
# Amend column labels
cols_label(
income_quintile = "Income Deprivation Quintile",
lsoas = "LSOAs",
bens = "Total benefits",
disbens = "Total disbenefits",
pop = "Population",
share_bens = "Share of user benefits",
share_disbens = "Share of user disbenefits",
share_pop = "Share of population in impact area",
grade = "Assessment"
) %>%
# Table header
tab_header(title = "User Benefits Distributional Analysis") %>%
# Add table footnotes
tab_footnote(
footnote = "1 = most deprived quintile; 5 = least deprived quintile.",
locations = cells_column_labels(columns = income_quintile)
) %>%
tab_footnote(
footnote = "£m, 2010 prices, for a specific modelled year and scenario.",
locations = cells_column_labels(columns = bens)
) %>%
tab_footnote(
footnote = glue("Summed across all English LSOAs in each quintile with a population-weighted centroid within {station_buffer}m of a scheme-modelled station."),
locations = cells_column_labels(columns = pop)
) %>%
# Add/format summary totals
grand_summary_rows(
columns = c(lsoas),
fns = list(Total = ~sum(., na.rm = T)),
fmt = list(~fmt_integer(.))
) %>%
grand_summary_rows(
columns = c(bens, disbens, pop),
fns = list(Total = ~sum(., na.rm = T)),
fmt = list(
~fmt_number(., decimals = 1),
~fmt_number(., decimals = 1),
~fmt_number(., decimals = 1)
)
) %>%
grand_summary_rows(
columns = c(share_bens, share_disbens, share_pop),
fns = list(Total = ~sum(., na.rm = T)),
fmt = list(
~fmt_percent(., decimals = 0),
~fmt_percent(., decimals = 0),
~fmt_percent(., decimals = 0)
)
)
```
```{r}
tictoc::toc(log = TRUE)
sessionInfo()
```