Measuring the changing pattern of ethnic segregation in England and Wales with Consumer Registers

Analysis of changing patterns of ethnic residential segregation is usually framed by the coarse categorisations of ethnicity used in censuses and other large-scale public sector surveys and by the infrequent time intervals at which such surveys are conducted. In this paper, we use names-based classification of Consumer Registers to investigate changing degrees of segregation in England and Wales over the period 1997–2016 at annual resolution. We find that names-based ethnic classification of the individuals that make up Consumer Registers provides reliable estimates of the residential patterning of different ethnic groups and the degree to which they are segregated. Building upon this finding, we explore more detailed segregation patterns and trends of finer groups at annual resolutions and discover some unexpected trends that have hitherto remained unrecorded by Census-based studies. We conclude that appropriately processed Consumer Registers hold considerable potential to contribute to various domains of urban geography and policy.


Ethnic residential segregation has provided an enduring and debated focus for social investigation in the United Kingdom and elsewhere. Both the academic and public discussions are frequently dominated by anecdotal evidence because extensive, timely and detailed data on ethnic residential patterns are unavailable. Segregation researchers recognise that the lack of sufficiently granular data with respect to ethnic categories and temporal resolution is of paramount importance and impedes progress in significant policy debates, such as migration and segregation in England and Wales (Harris & Johnston, 2018). Additional obstacles in understanding the patterns, causes and consequences of ethnic segregation arise from the "slippery" (Peach, 2009) nature of segregation measures, which continue to be contested.

In the United Kingdom, conventional data sources of ethnicity information are mainly drawn from decennial censuses of population. While they are broad in coverage, relying on census data creates significant gaps, as data are only collected every ten years. The delay in the release of the information means that, currently, the most recent data are eight years out of date and the next population-wide update cannot be expected before 2022. Moreover, census categorisations provide little flexibility in profiling particular ethnic communities of policy concern beyond the coarse, pre-defined ethnic categories released by the Census.

In this paper, we seek to demonstrate the feasibility of using Consumer Registers (Lansley et al., 2019) as an alternative population data source to official censuses, in order to develop a more granular analysis of recent segregation trends and patterns in England and Wales. We make annual estimates of ethnic segregation in England and Wales from 1997 to 2016 for all ethnic groups recorded by the Census as well as a selected number of finer ethnic categories.
We develop two innovative manipulations: (1) we employ the algorithm developed by Kandt and [...] to infer probable ethnic origins for aggregations of individual names at the Census Output Area level; and then (2) track annual segregation estimates for England and Wales as a whole and for four case studies.

Consumer Registers
[...] to match records between years and thus turn Consumer Registers into a powerful, longitudinally linked data resource that can be aggregated to any convenient geography. Such a resource can permit novel insights into a range of research and policy problems, including segregation.

Yet, as is typical for consumer and other big data sources, Consumer Registers require significant data cleaning and pre-processing before they can be deployed for research purposes. A major challenge arises from the unknown provenance of individual records because [...]

The pairwise Index of Dissimilarity D is rewritten in Equation (1) [...] a city, rather than inter-city comparisons (Peach, 2009). [...]

In addition, in order to evaluate the influence of randomness, we test the significance of the Dissimilarity Index under the null hypothesis of no systematic segregation. Following the randomisation tests of Boisso et al. (1994) and Carrington and Troske (1997), we generate pseudo-sample distributions with 1000 repetitions by randomly allocating individuals from different ethnic groups to 2011 Census Output Areas. In each repetition, random counts of the two groups' populations in each unit are created under the multinomial distribution, using the restricted probabilities that the chance of a resident from either ethnic group being allocated to a given unit equals the proportion of the population in that unit relative to the total population of all units in the study area. Using 1000 repetitions of the random allocation process, we calculate the mean Dissimilarity Index D* and confidence intervals (CIs) to test the null hypothesis that the observed segregation level D is produced solely by randomness.

Results
[...] consistency (see Table 1). [...] these by population counts for the corresponding groups in 1997 (see Figure 1 and Supplementary material Table S1).

Both urban and rural areas have experienced population growth across all ethnic groups over the last 20 years. Growth is particularly pronounced for Indians in rural areas. In addition, two trends can be identified from annual population growth rates by selected ethnic groups, as shown in Figure 1. Growth in numbers of some ethnic groups is increasingly divergent in urban and rural areas, but the nature of this divergence differs among some ethnic groups.

[...] Consumer Registers with those from the Censuses, we conduct correlation analysis of the D values and ranks of the 10 ethnic groups from Table 2 and Table S2 for both urban and rural areas. Results of the analysis are summarised in Table 3. Here, the coefficients suggest that there are strong and positive correlations (coefficients > .8) between Consumer Registers and Censuses in terms of the index values and ranks at the 99% confidence level (p-value < .01). All in all, the correlations suggest that Consumer Registers offer a broadly accurate picture of the ethnic structure of segregation in the country and can be used as a source of information when examining segregation during the intercensal period. At the same time, the group-specific segregation indices should be viewed with caution.

[...] Greek group appears to be the most segregated group in each of the four areas. However, unlike the Bangladeshi community, the high segregation index values of the Greeks can be mainly attributed to the small size of the Greek community, as the distribution of smaller populations is more prone to randomness in a statistical sense, as we demonstrate in the previous section and Table 2. Given their small population size, similar arguments can be applied to segregation levels of the French group in Greater Manchester and Birmingham, as well as all of the ethnic minorities in rural areas like Lincolnshire.
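The sensitivity of small groups to randomness noted above can be illustrated with a minimal sketch of the Dissimilarity Index and the randomisation test. It is not the authors' implementation. It assumes the standard pairwise form D = 0.5 * sum_i |x_i/X - y_i/Y|, a percentile-based 95% interval (the paper does not state how its CIs are formed), and entirely hypothetical toy counts. The function names are illustrative only.

```python
import random
from collections import Counter


def dissimilarity(x, y):
    """Pairwise Dissimilarity Index (assumed standard form) for two
    groups' counts per areal unit."""
    total_x, total_y = sum(x), sum(y)
    return 0.5 * sum(abs(xi / total_x - yi / total_y) for xi, yi in zip(x, y))


def random_allocation(n_people, unit_totals, rng):
    """One multinomial draw: each person lands in a unit with probability
    proportional to that unit's total population."""
    units = range(len(unit_totals))
    counts = Counter(rng.choices(units, weights=unit_totals, k=n_people))
    return [counts.get(i, 0) for i in units]


def randomisation_test(x, y, unit_totals, n_rep=1000, seed=0):
    """Mean D* and a percentile interval (an assumption) under the null
    hypothesis of no systematic segregation."""
    rng = random.Random(seed)
    sims = sorted(
        dissimilarity(
            random_allocation(sum(x), unit_totals, rng),
            random_allocation(sum(y), unit_totals, rng),
        )
        for _ in range(n_rep)
    )
    lower = sims[int(0.025 * n_rep)]
    upper = sims[int(0.975 * n_rep) - 1]
    return sum(sims) / n_rep, (lower, upper)


# Hypothetical toy data: 50 units of 100 residents each.
unit_totals = [100] * 50
reference = [20] * 50            # reference group, 1000 people
small = [1] * 20 + [0] * 30      # small group, 20 people
large = [10] * 50                # large group, 500 people

# n_rep is reduced from the paper's 1000 only to keep the example quick.
d_star_small, ci_small = randomisation_test(small, reference, unit_totals, n_rep=200)
d_star_large, ci_large = randomisation_test(large, reference, unit_totals, n_rep=200)
print(f"small group: D* = {d_star_small:.3f}, interval = {ci_small}")
print(f"large group: D* = {d_star_large:.3f}, interval = {ci_large}")
```

In this toy setting the small group shows a much higher D* under purely random allocation than the large group, which is why an observed D for a small community should be read against its random baseline rather than at face value.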
Despite these statistical concerns, the relative, temporal changes of segregation indices remain meaningful for each group (Simpson, 2004). We plot the changes in Dissimilarity Index values by ethnic group each year relative to 1997 in Figure 2. We can ascertain that although ethnic diversity has been increasing with respect to the proportions of the ethnic minorities, the Dissimilarity Indices of most of the ethnic groups have been dropping, except for 'White British' and 'White Irish' in the three urban areas and 'Other White' in Lincolnshire. Such a decrease suggests that these minority groups are more evenly distributed and less segregated than before. In contrast to these groups with gently declining Dissimilarity Indices, the Indian, Black African and Polish communities have experienced a dramatic fall in segregation levels in terms of the evenness dimension.

In particular, we observe pronounced decreases in the Dissimilarity Indices for the Poles across the four areas from 2004. This trend is consistent with the national trend in urban and rural areas presented in the supplementary Table S1. The changing pattern of the Polish residents is quite different from the other communities from the EU, particularly the French. We may speculate that the apparent dispersion of Polish residents in Lincolnshire is a result of their settlement in areas of agricultural labour market shortages since the 2004 Polish EU accession. The Indian group appears to be distributed more evenly across Output Areas in the four urban regions, particularly after 2011. Such observations would not be possible with the Census population data until the next Census in 2021.

Measuring the exposure dimension of residential segregation, we find varying levels of Isolation among ethnic groups (Table S3).
The South Asian groups (Bangladeshis, Pakistanis and Indians) seem to be more isolated in the four case study areas, which indicates that they tend to live in spatial clusters with less likelihood of meeting people from different ethnic communities in their neighbourhoods. 'Black Caribbean', Chinese, Greek and French remain at relatively low levels of Isolation, which may be partly due to their small overall population sizes. Regarding the temporal trend of the Isolation Indices (see Figure 3), some of these groups have become less segregated along the exposure dimension, for instance, the Bangladeshis. Others have experienced increasing levels of Isolation, most notably the 'Black Africans' in urban areas and the Polish, Indian and 'Other White' groups in Lincolnshire. The Greek and French also exhibit almost identical stability in levels of segregation; conversely, there has been increased Isolation of Poles since 2004, which may reflect the sensitivity of Isolation [...]

[...] intervening years (see Figure 4). These changes in the four case study areas generally correspond to the national trend of the ethnic composition summarised in Table 1 [...]

We also calculate Dissimilarity Indices of the four case study areas using the adult population from the Consumer Registers and compare them to indices calculated from the whole population recorded in the 2011 and 2001 Censuses (see Figure 5 and Figure S1 in the supplementary material respectively). The magnitude and trend of the segregation levels of each ethnic group largely correspond between Consumer Registers and Censuses with some nuanced differences. The datasets consistently show that some ethnic minorities, for instance, Bangladeshis, Pakistanis, 'Black Africans' and 'Caribbeans' are more segregated than 'White British', 'White Irish' and 'Other White'.
A comparison of Figure 5 and Figure S1 in the [...]

This latter finding resonates with the chain migration process (Catney, 2015), which denotes the process by which earlier immigrants begin to move away from metropolitan gateway areas while subsequent immigration continues to settle in a wider set of urban cores.

Observations from the exposure dimension suggest that increased evenness for some ethnic groups does not necessarily accompany increased exposure. Contrary changes in the two measurements of individual ethnic groups, such as those observed in London, may indicate significant immigration involving one or more ethnic groups.
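The point that evenness and exposure can move in opposite directions can be made concrete with a small sketch. It assumes the standard Isolation Index, sum_i (x_i/X)(x_i/t_i), where t_i is the total population of unit i, and the standard pairwise Dissimilarity Index; this section does not restate either formula, so both are assumptions here. The four-unit counts are invented purely for illustration and do not come from the Consumer Registers.

```python
def dissimilarity(group, rest):
    """Assumed standard Dissimilarity Index of a group against everyone else."""
    g_total, r_total = sum(group), sum(rest)
    return 0.5 * sum(abs(g / g_total - r / r_total) for g, r in zip(group, rest))


def isolation(group, totals):
    """Assumed standard Isolation Index: the average share of a group
    member's unit population that belongs to the same group."""
    g_total = sum(group)
    return sum((g / g_total) * (g / t) for g, t in zip(group, totals) if t > 0)


def summarise(group, rest):
    """Return (Dissimilarity, Isolation) for one snapshot."""
    totals = [g + r for g, r in zip(group, rest)]
    return dissimilarity(group, rest), isolation(group, totals)


# Hypothetical counts in four units; the rest of the population is unchanged.
rest = [60, 90, 100, 100]
before = [40, 10, 0, 0]      # group concentrated in two units
after = [80, 40, 20, 10]     # group has grown and spread to all four units

d_before, p_before = summarise(before, rest)
d_after, p_after = summarise(after, rest)
print(f"Dissimilarity: {d_before:.3f} -> {d_after:.3f}")
print(f"Isolation:     {p_before:.3f} -> {p_after:.3f}")
```

Here the group triples in size while spreading into new units: Dissimilarity falls from about 0.63 to about 0.37, yet Isolation rises from 0.34 to about 0.42, because group members now make up a larger share of every unit they live in. This is the kind of contrary movement that substantial immigration can produce.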

Uncertainties and limitations
We note the nuanced inaccuracies arising from measuring segregation using Consumer Registers by validating our ethnicity estimates with reference to 2001 and 2011 Census data. We identify two major sources of mismatches: the dataset representativeness (e.g. age bias and voting registry eligibility); and systematic bias arising from the use of the Ethnicity Estimator algorithm (e.g. the underestimation of Caribbeans). Despite these discrepancies, values of the Dissimilarity Index and their ranks from Consumer Registers exhibit strong positive correlation with those obtained using Census data. The randomisation tests suggest that all of the Dissimilarity Indices are significant set against the null hypothesis of randomness. With the caveats of certain bias attributable to the ways in which they are assembled, Consumer Registers appear to be a promising supplementary source to, rather than a substitute for, Census data.

Additional uncertainty of ethnicity estimates arises because the provenance of the different consumer data sources used to augment the Electoral Registers with non-voters over the 20-year period is unknown, and the potential sources and operation of bias arising from opt-out from the public Electoral Roll post-2003 are also unknown. The methods developed by Lansley et al. (2019) promise to address this in part, but more research is necessary to fully establish the extent of bias.

Another broad issue is that ethnicity is only inferred from given-name and surname pairings, albeit in part using procedures that are more sensitive to the vagaries of self-assignment of identity than purely algorithmic procedures. The merits of names-based analysis would be much reduced were the focus of analysis upon segregation of individuals from 'New World' countries, however, since naming conventions there bear a less clearly identifiable correspondence with geographic origins.
Set against these issues, the use of algorithmic procedures to disaggregate the ethnic categories used in UK censuses allows consideration of more classes than is possible through census analysis. Consumer Registers also bring greatly enhanced temporal granularity, in that they are updated in real time, crystallised into annual incremental updates.

The motivation of this study has been to offer a more granular and comprehensive picture of recent segregation trends and to demonstrate the feasibility of revisiting the topic of ethnic segregation using a novel data source: Consumer Registers. Names-based ethnic classifications applied to consumer data offer an innovative and powerful way to identify nuanced patterns of and trends in segregation. Names remain an under-exploited resource in a variety of applications. In particular, the flexibility of defining finer categories of ethnicity produces detailed representations of the two widely established dimensions of ethnic residential segregation. This method can be extended to explicitly spatial investigations of segregation in future research and has the potential to enhance our understanding of ethnic segregation change both in space and over time (see e.g. Lan et al., 2019). Removing the constraint of aggregation to Census Output Areas, future research could reconceive segregation as a problem of point pattern analysis subject to restrictions of disclosure control. In view of the high degree of correspondence of segregation patterns with the Census, Consumer Registers are promising resources to uncover new and nuanced dynamics of the complex phenomenon of segregation.