Inter-regional migration in Europe: a spatial interaction modelling perspective

Adam Dennett*, Kimberley Claydon†, Pablo Mateos†
*Centre for Advanced Spatial Analysis
†Department of Geography
University College London

Presentation to the British Society for Population Studies – Annual Conference, 9th September 2011

Presentation Outline
• Introduction to the ENFOLD-ing project and motivation for this work
• Modelling migration – spatial interaction models
• A comparison of modelling methodology – entropy maximising models vs. statistical models
• A spatial interaction modelling perspective on inter-regional migration in Europe – work in progress…

ENFOLD-ing
• Explaining, Modelling and Forecasting Global Dynamics – 5 year EPSRC project
• Understand global dynamics through a model-based analysis and develop an associated forecasting capability
• Four substantive areas as key ‘understanding’ and modelling challenges in the context of globalisation:
  • trade, and economic development
  • migration, and global demography
  • security
  • development aid

Migration and global demography stream
• Challenge – year 1: to assemble the data and tools to enable us to build a dynamic model of global migration flows – principally international, country-to-country flows, but we are also interested in city/regional scales…
• Data:
  • ‘Slow dynamics’ – population counts, migrant stocks, other demographic data
  • ‘Fast dynamics’ – migration flows
  • Supplementary data – distance matrices, language associations, economic data, trade flows, flight data, surname associations, currency, colonial ties etc…
• Tools:
  • Theories of migration and associated models

Modelling migration
• Two main camps: “Probabilistic” and “Deterministic” – Plane (1982); Stillwell (1978)
• Probabilistic: rate-based Markov-style demographic migration models – e.g.
DEMIFER project, ONS SNPP
  • Yr 2000, Pop = 10,000, In-migrants = 500, Rate = 0.05
  • Yr 2001, Pop = 10,500, Rate = 0.05, Est In-migrants = 525
  • Reliable, historical time-series a prerequisite
• Deterministic: examine causal relationships between predictor variables and migration moves – e.g. increasing population = increasing numbers of migrants; increasing distance between origin and destination = decreasing numbers of migrants.
• Global data are patchy, so our initial focus is on deterministic migration models…

Spatial Interaction Models
• Most common deterministic models used in migration analysis
• Based on gravity models:
  $M_{ij} = \frac{P_i P_j}{d_{ij}}$ (Zipf, 1946)   (1)
• Estimated migration ($M$) between origin $i$ and destination $j$ is proportional to the product of the populations at origin ($P_i$) and destination ($P_j$) and inversely proportional to the distance between them ($d_{ij}$)
• In a migration model, the populations can be replaced by total out-migrants ($O_i$) and total in-migrants ($D_j$), and the frictional effect of distance decays exponentially. So:
  $M_{ij} = K O_i D_j e^{-\beta d_{ij}}$   (2)
  where:
  $K = \frac{\sum_i \sum_j M_{ij}}{\sum_i \sum_j O_i D_j e^{-\beta d_{ij}}}$   (3)
• The distance decay parameter ($\beta$) is calibrated within the model…

Calibrating spatial interaction models
• Different techniques can be used to calibrate the parameters of SIMs and this has led to a noticeable bifurcation in approaches…
1. Entropy Maximising spatial interaction models
  • Developed after pioneering work by Wilson (1971)
  • Used in migration models by Stillwell (1978), Pooler (1994), Plane (1982), Fotheringham (1983)
  • Use mathematical programs (usually bespoke – coded from scratch in Fortran, VB, Java etc.) to calibrate parameters through computational search algorithms
2. Statistical spatial interaction models
  • OLS regression, Poisson Regression and log-linear models
  • Used in migration modelling by Willekens (1983), Flowerdew (2010), Boyle et al. (1998), Cohen et al.
(2008), Mayda (2010), Abel (2010), Raymer (2007)
  • Calibrated using regression algorithms available in most off-the-shelf statistical software packages – R, SPSS, Stata etc.

Spatial interaction models – best approach?
• Advocates from both camps of spatial interaction modelling frequently justify their method as preferable
• For anyone new to migration modelling the choice of which route to take is unclear
Question:
• Is one approach preferable to the other in terms of:
  a) model predictions?
  b) other model outputs (e.g. parameter information)?
  c) ease of application?
• To answer these questions we turn to an empirical example…

Spatial Interaction Models – an empirical example
Inter-regional (NUTS2) migration – Austria, 2006

Flows (origins in rows, destinations in columns; Oi = row totals, Dj = column totals)
Origin   AT11   AT12   AT13   AT21   AT22   AT31   AT32   AT33   AT34     Oi
AT11        0   1131   1887     69    738     98     31     43     19   4016
AT12     1633      0  14055    416   1276   1850    388    303    159  20080
AT13     2301  20164      0   1080   1831   1943    742    674    407  29142
AT21       85    379   1597      0   1608    328    317    469    114   4897
AT22      762   1110   2973   1252      0   1081    622    425    262   8487
AT31      196   2027   3498    346   1332      0   2144    821    274  10638
AT32       49    378   1349    310    851   2117      0    630    106   5790
AT33       87    424    978    490    670    577    546      0    569   4341
AT34       33    128    643    154    328    199    112    587      0   2184
Dj       5146  25741  26980   4117   8634   8193   4902   3952   1910  89575

Distance (origins in rows, destinations in columns)
Origin   AT11   AT12   AT13   AT21   AT22   AT31   AT32   AT33   AT34
AT11        0    103     84    221    132    215    247    391    505
AT12      103      0     46    217    130    141    201    344    454
AT13       84     46      0    250    159    186    244    388    498
AT21      221    217    250      0     92    152     93    195    306
AT22      132    130    159     92      0    125    122    262    376
AT31      215    141    186    152    125      0     82    208    315
AT32      247    201    244     93    122     82      0    145    259
AT33      391    344    388    195    262    208    145      0    114
AT34      505    454    498    306    376    315    259    114      0
Figure printed beneath the distance matrix in the source: 16001

Spatial Interaction Models – an empirical example
• Data in the flow matrix can be modelled by using the data in the distance matrix and re-scaling this information subject to marginal constraints in the flow matrix
• Using the statistical modelling approach, an additive Poisson regression model which
does this would take the form:
  $\ln M_{ij} = \mu + \mu_i^O + \mu_j^D + \beta d_{ij}$   (4)
  • $\mu + \mu_i^O + \mu_j^D$: unsaturated log-linear model, i.e. overall effect + origin & destination main effect parameters – these are categorical predictors, equivalent to dummy variables, and they constrain the estimates
  • $\beta d_{ij}$: continuous predictor variable multiplied by the $\beta$ parameter

Spatial Interaction Models – an empirical example
• The Poisson model can be run in R using the glm() function with a command similar to:
  AustriaExp <- glm(Data ~ Origin + Destination + Dij + offset(log(Offset)),
                    family = poisson(link = "log"), data = Austria)
  The offset in the model is a matrix with 0s in the diagonal cells and 1s in all other cells, to force the modelled diagonals to = 0
• glm() will calibrate a series of parameters – dummy parameters for the origin and destination main effects, an overall main effect, and a slope parameter associated with the continuous distance variable…

Spatial Interaction Models – an empirical example
Poisson model estimates (origins in rows, destinations in columns) with origin ($\mu_i^O$) and destination ($\mu_j^D$) main effect parameters
Origin      AT11    AT12    AT13    AT21    AT22    AT31    AT32    AT33    AT34  $\mu_i^O$
AT11           0     979    2261     104     338     191      87      44      14       –
AT12        1027       0   14365     503    1614    1602     584     299      96   1.545
AT13        2921   17686       0     949    3149    2733    1020     516     164   2.441
AT21         174     799    1225       0     932     630     591     416     132  0.6992
AT22         450    2044    3239     743       0    1003     601     314      97  0.9487
AT31         329    2639    3657     653    1304       0    1166     674     222    1.29
AT32         147     946    1340     602     767    1145       0     644     201  0.7427
AT33          74     482     677     422     400     660     642       0     986   1.195
AT34          24     164     229     142     132     232     213    1049       0  0.9887
$\mu_j^D$      –   1.497   2.185  0.1878  0.6643  0.7426  0.2133  0.6677  0.3999
AT11 is the reference category (effect = 0), so no parameter is printed for it.
$\mu$ = 6.2050   R² = 0.975   $\beta$ = -0.007915
• Worked example, AT11 to AT12: $M_{12} = 979 = \exp(6.2050 + 0 + 1.497 + (-0.007915 \times 103))$

Spatial Interaction Models – an empirical example
• The multiplicative form of the Poisson model above is very similar to the gravity model in Equation 2 (written here with the minus sign absorbed into a negative $\beta$):
• $M_{ij} = K O_i D_j e^{\beta d_{ij}}$ – in log form the ‘main effects’ of this model are:
  $\ln M_{ij} = \ln K + \ln O_i + \ln D_j$   (6)
• But in log form the main effects of the Poisson model are:
  $\ln M_{ij} = \ln T + \ln \frac{O_i}{T} + \ln \frac{D_j}{T}$   (7)
  Where:
  $T = \sum_i \sum_j M_{ij}$
• These main effects
models in 6 and 7 produce identical results
• When we incorporate space back into the model, the Poisson model ≠ the gravity model, as each main effect parameter is a constraint (gravity only has an overall $K$ constraint).
• Including space, the entropy maximising equivalent of the Poisson model is a doubly constrained spatial interaction model…

Spatial Interaction Models – an empirical example
• The doubly constrained entropy maximising spatial interaction model equivalent to the Poisson model takes the form:
  $M_{ij} = A_i O_i B_j D_j e^{\beta d_{ij}}$   (8)
  Where
  $A_i = \frac{1}{\sum_j B_j D_j e^{\beta d_{ij}}}$   (9)
  And
  $B_j = \frac{1}{\sum_i A_i O_i e^{\beta d_{ij}}}$   (10)
• Model programmed in VBA – constraints (equivalent to main effects ‘dummy’ parameters in Poisson model) calculated using iterative procedure (Senior, 1979)
• $\beta$ parameter calibrated using Newton-Raphson routine (other routines available – see Batty and Mackie, 1972 for thorough comparison…)

Spatial Interaction Models – an empirical example
Entropy maximising model results

Entropy (R² = 0.977, $\beta$ = -9.51213)
Origin   AT11   AT12   AT13   AT21   AT22   AT31   AT32   AT33   AT34     Oi
AT11        0    924   2386     92    336    169     70     30      8   4016
AT12      967      0  14980    415   1495   1490    474    206     55  20080
AT13     3068  18400      0    806   3019   2559    836    359     95  29142
AT21      159    687   1085      0   1079    677    670    427    112   4897
AT22      457   1940   3189    846      0   1082    624    278     71   8487
AT31      301   2525   3531    694   1412      0   1325    667    184  10638
AT32      125    805   1156    689    817   1329      0    691    178   5790
AT33       55    358    508    449    373    685    707      0   1207   4341
AT34       15    102    144    127    102    203    196   1295      0   2184
Dj       5146  25741  26980   4117   8634   8193   4902   3952   1910  89575

Poisson (R² = 0.975, $\beta$ = -0.007915)
Origin   AT11   AT12   AT13   AT21   AT22   AT31   AT32   AT33   AT34     Oi
AT11        0    979   2261    104    338    191     87     44     14   4016
AT12     1027      0  14365    503   1614   1602    584    299     96  20080
AT13     2921  17686      0    949   3149   2733   1020    516    164  29142
AT21      174    799   1225      0    932    630    591    416    132   4897
AT22      450   2044   3239    743      0   1003    601    314     97   8487
AT31      329   2639   3657    653   1304      0   1166    674    222  10638
AT32      147    946   1340    602    767   1145      0    644    201   5790
AT33       74    482    677    422    400    660    642      0    986   4341
AT34       24    164    229    142    132    232    213   1049      0   2184
Dj       5146  25741  26980   4117   8634   8193   4902   3952   1910  89575

Spatial Interaction Models – an empirical example
• Reasons for the slight differences between the results of the entropy maximising model and the Poisson model?
• It’s all in the $\beta$ parameter!
• Entropy model calibrates parameter as $\beta$ = -9.51213
• Poisson model calibrates parameter as $\beta$ = -0.00791
• But a quirk of the entropy program means all distances needed to be divided by 1000… so, multiply the Poisson parameter by 1000 and $\beta$ = -7.91533 – similar to the entropy value
• If we plug this value into the entropy model (something which is much easier to do with a home-made bespoke model – the opposite can’t be done in R with the Poisson model) – hey presto, identical results!

Interim conclusions
• Poisson regression and entropy maximising spatial interaction models produce identical results when the distance decay parameter is the same
• The only difference between the two approaches is in the calibration of the distance decay parameter
• Statistical packages such as R and SPSS will calibrate these parameters automatically using methods which are not fully documented.
• R uses the ‘Iteratively Reweighted Least Squares’ algorithm, which produces maximum likelihood estimates comparable to those of the Newton-Raphson routine (Green, 1984), and will provide parameter estimates for as many dummy variables and covariates as specified

Interim conclusions
• Interpretation of parameters produced in R can be difficult – whilst different coding schemes can be chosen, an intuitive scheme such as Raymer’s (2007) ‘total reference category’ cannot
• A bespoke entropy maximising program offers more flexibility, but with the drawback that computer programming knowledge is required
• Calibrating more than one parameter in the entropy model is considerably more complicated than calibrating just one (for non-expert programmers / mathematicians!)
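
The equivalence claimed in the interim conclusions can be explored with a small sketch of the doubly constrained model in Equations 8 to 10. It is written in Python purely for illustration (the authors used VBA and R), and several things are assumptions rather than the authors' method: the function name `doubly_constrained` is hypothetical, the balancing factors are found by plain alternating iteration rather than Senior's (1979) procedure, and $\beta$ is held fixed at the Poisson estimate quoted on the slides instead of being calibrated. Margins and distances are copied from the Austrian tables above.

```python
import math

# Origin (row) and destination (column) totals from the Austrian NUTS2 flow matrix.
O = [4016, 20080, 29142, 4897, 8487, 10638, 5790, 4341, 2184]
D = [5146, 25741, 26980, 4117, 8634, 8193, 4902, 3952, 1910]

# Distance matrix as printed on the slides (AT11 ... AT34, symmetric).
DIST = [
    [0, 103, 84, 221, 132, 215, 247, 391, 505],
    [103, 0, 46, 217, 130, 141, 201, 344, 454],
    [84, 46, 0, 250, 159, 186, 244, 388, 498],
    [221, 217, 250, 0, 92, 152, 93, 195, 306],
    [132, 130, 159, 92, 0, 125, 122, 262, 376],
    [215, 141, 186, 152, 125, 0, 82, 208, 315],
    [247, 201, 244, 93, 122, 82, 0, 145, 259],
    [391, 344, 388, 195, 262, 208, 145, 0, 114],
    [505, 454, 498, 306, 376, 315, 259, 114, 0],
]


def doubly_constrained(O, D, dist, beta, tol=1e-12, max_iter=10000):
    """Estimate M_ij = A_i O_i B_j D_j exp(beta * d_ij) with a zero diagonal."""
    n = len(O)
    # Distance term; the diagonal is zeroed so no flows are kept within a region
    # (this plays the role of the 0/1 offset in the Poisson model).
    f = [[0.0 if i == j else math.exp(beta * dist[i][j]) for j in range(n)]
         for i in range(n)]
    A, B = [1.0] * n, [1.0] * n
    for _ in range(max_iter):
        # Equation 9: A_i = 1 / sum_j B_j D_j exp(beta d_ij)
        new_A = [1.0 / sum(B[j] * D[j] * f[i][j] for j in range(n))
                 for i in range(n)]
        # Equation 10: B_j = 1 / sum_i A_i O_i exp(beta d_ij)
        new_B = [1.0 / sum(new_A[i] * O[i] * f[i][j] for i in range(n))
                 for j in range(n)]
        # Largest relative change in any balancing factor since the last pass.
        change = max(abs(new - old) / new
                     for new, old in zip(new_A + new_B, A + B))
        A, B = new_A, new_B
        if change < tol:
            break
    # Equation 8, evaluated with the converged balancing factors.
    return [[A[i] * O[i] * B[j] * D[j] * f[i][j] for j in range(n)]
            for i in range(n)]


BETA = -0.007915  # Poisson estimate quoted on the slides (distances not rescaled)
estimates = doubly_constrained(O, D, DIST, BETA)
print(round(estimates[0][1]))  # AT11 -> AT12, to compare with the Poisson table
```

The row and column sums of `estimates` reproduce the Oi and Dj margins, and the individual cells can be compared one by one with the Poisson estimates tabulated above; small discrepancies would be expected from the rounding of $\beta$ and of the printed distances.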
Modelling inter-regional migration in Europe
• Experimentation with migration models led us to explore inter-regional migration in Europe – reliable data for model calibration
• Could our experimentation offer any new perspectives or fill gaps in data?
Question:
• Can we use inter-regional, intra-country data to effectively model inter-regional, inter-country flows in a post-Schengen open-border Europe?
• This is very much still work in progress, but…

European NUTS2 regional system
Data collected for countries in 2006 – collated for the DEMIFER project

Doubly constrained models – national results
Country Code  Country           R²     Generalised β (power function)
FI            Finland           0.996  -0.754
SE            Sweden            0.974  -0.771
AT            Austria           0.972  -0.747
HU            Hungary           0.963  -0.567
SK            Slovakia          0.948  -0.773
NL            Netherlands       0.936  -1.279
DK            Denmark           0.930  -0.969
NO            Norway            0.919  -0.814
BG            Bulgaria          0.901  -0.825
CZ            Czech Republic    0.889  -0.807
UK            United Kingdom    0.884  -0.927
PL            Poland            0.877  -1.068
CH            Switzerland       0.788  -0.867
BE            Belgium           0.772  -1.049
RO            Romania           0.745  -0.763
DE            Germany           0.715  -0.760
IT            Italy             0.699  -0.718
ES            Spain             0.621   0.154
FR            France            0.549   1.093
• Can national parameters be used and applied to regions?
• β parameter closer to positive = migrants less deterred by distance
• Noticeable variation, e.g.
Netherlands – highest negative $\beta$
• Spain and France $\beta$ values unreliable – they are positive (which would indicate that the propensity to migrate increases with distance), but the poor fits mean that in reality this is unlikely to be the situation
• Regional parameters may provide better model inputs…

Origin/destination specific
• We can decompose the national $\beta$ parameters to regional parameters by calibrating them separately for origins and destinations…
  $M_{Ij} = A_I B_j O_I D_j d_{ij}^{\beta_I}$   (11)
  $M_{iJ} = A_i B_J O_i D_J d_{ij}^{\beta_J}$   (12)
  after Stillwell (1978)
• Equivalent Poisson models are
  $\ln M_{ij} = \mu + \mu_i^O + \mu_j^D + \mu_i^O \ast \beta \ln d_{ij}$   (13)
  $\ln M_{ij} = \mu + \mu_i^O + \mu_j^D + \mu_j^D \ast \beta \ln d_{ij}$   (14)
• Whilst model flow estimates are similar (not identical, due to the algorithm issues discussed), parameters from the Poisson model are difficult to interpret as $\beta$ interacts with the dummy variables – consequently we chose entropy models as our vehicle...

Variation in distance decay parameters calibrated on intra-country (internal), inter-regional flows, 2006
[Figure: two panels, labelled R² = 0.941 and R² = 0.944]

Multi-level spatial interaction models
Notation                                                     Description
$M_{IJ}$                                                     Country level migration matrix
$M_{ij}$                                                     NUTS2 region level migration matrix
$M_{ij}^{II}$                                                Intra-country, inter-NUTS2 regional matrix
$O_I = \sum_{J \neq I} M_{IJ} = M_{I+}$                      Origin/row totals at inter-country level
$D_J = \sum_{I \neq J} M_{IJ} = M_{+J}$                      Destination/column totals at inter-country level
$O_i = \sum_{j \neq i} M_{ij}^{II} = M_{i+}^{II}$            Origin/row totals at NUTS2 level within country
$D_j = \sum_{i \neq j} M_{ij}^{JJ} = M_{+j}^{JJ}$            Destination/column totals at NUTS2 level within country
$O_i^I = \sum_{i \notin J} M_{ij}^{IJ} = M_{i+}^{IJ}$        Origin/row totals at NUTS2 level where NUTS2 region not a member of country
$D_j^J = \sum_{j \notin I} M_{ij}^{IJ} = M_{+j}^{IJ}$        Destination/column totals at NUTS2 level where NUTS2 region not a member of country
• Specifying multi-level constraints in an entropy maximising model means we can model unknown flows incorporating the maximum amount of known information… These constraints can be estimated for a crude version of the model using $O_i$ and $D_j$
distributions applied to $O_I$ and $D_J$ totals

Multi-level model results
• Full matrix of flows for 2006 modelled (Eq 11 & 12) using O/D specific $\beta$ parameter estimates from internal flows and crude $O_i$, $D_j$ estimates
• In most cases the model underpredicts internal flows, suggesting too many internal migrants are distributed internationally: country border effects under-estimated
• UK, DE, PL, RO – model over-predicts: border effects over-estimated

Average gross error by country
Country Code  Country           Origin-specific β   Destination-specific β
AT            Austria                   -957.670                -896.175
BE            Belgium                   -625.557                -662.539
BG            Bulgaria                 -1073.268               -1220.228
CH            Switzerland               -839.091               -1060.700
CZ            Czech Republic           -1017.628               -1128.391
DE            Germany                    101.323                 114.369
DK            Denmark                  -3058.586               -2268.791
ES            Spain                     -375.229                -494.283
FI            Finland                  -3280.723               -3421.611
FR            France                    -780.847                -747.982
HU            Hungary                  -2538.885               -2536.447
IE            Ireland
IT            Italy                     -595.699                -593.918
NL            Netherlands               -890.721                -981.864
NO            Norway                   -1666.206               -1766.215
PL            Poland                     185.229                 136.879
RO            Romania                    329.574                  97.110
SE            Sweden                   -2212.665               -2306.671
SI            Slovenia                 -2018.675               -2017.708
SK            Slovakia                 -1177.098               -1097.057
UK            United Kingdom             424.536                 434.086
(row label missing in source)          -13455.477              -13429.681
No values are printed beside Ireland in the source.

Conclusions
• Poisson migration models and entropy maximising migration models using distance to distribute flows produce identical migration predictions given identical distance decay parameters
• Differences in outputs from our experiments are down to the algorithms used to calibrate $\beta$ parameters – a trade-off between the proprietary software ‘black box’ and the additional complexity and knowledge required to build bespoke software
• Entropy maximising models calibrated on internal migration data can produce estimates of international (intra-EU), inter-regional flows – a multi-level SIM framework enables known information to be built into constraints
• In our trial model, despite open EU borders, internal migration under-predictions in the model suggest border effects are underestimated for most countries.
Model predicts too many migrants flowing between countries (with notable exceptions) – internal inter-regional migration is a poor predictor of international inter-regional migration

Future work
• Develop multi-level SIM framework fully so internal migration flows within EU SIM are constrained to known country level information on internal migration (which should improve inter-country estimates)
• Distance decay parameter has a large effect on model outcome – where this cannot be calibrated directly (due to insufficient data), can we model the parameter using other covariates?
• Developing a hierarchy of model constraints – if total migrant flows are not available, what are the next best constraints? Total populations? Migrant stocks? GDP?

References
• Abel, G.J. (2010), 'Estimation of international migration flow tables in Europe', Journal of the Royal Statistical Society: Series A (Statistics in Society).
• Batty, M. and Mackie, S. (1972), 'The calibration of gravity, entropy, and related models of spatial interaction', Environment and Planning, 4 (2), 205-33.
• Boyle, P.J., Flowerdew, R., and Shen, J. (1998), 'Modelling inter-ward migration in Hereford and Worcester: The importance of housing growth and tenure', Regional Studies, 32 (2), 113-32.
• Cohen, J., Roig, M., Reuman, D., and GoGwilt, C. (2008), 'International migration beyond gravity: a statistical model for use in population projections', Proceedings of the National Academy of Sciences, 105 (40), 15269-74.
• Flowerdew, R. (2010), 'Modelling migration with poisson regression', in J. Stillwell, O. Duke-Williams, and A. Dennett (eds.), Technologies for Migration and Commuting Analysis: Spatial Interaction Data Applications: IGI Global.
• Fotheringham, A. S. (1983), 'A new set of spatial-interaction models: the theory of competing destinations', Environment and Planning A, 15 (1), 15-36.
• Stillwell, J. (1978), 'Interzonal migration: some historical tests of spatial-interaction models', Environment and Planning A, 10, 1187-200.
• Plane, D. A. (1982), 'An information theoretic approach to the estimation of migration flows', Journal of Regional Science, 22 (4), 441-56.
• Pooler, J. (1994), 'An extended family of spatial interaction models', Progress in Human Geography, 18 (1), 17-39.
• Mayda, A. (2010), 'International migration: a panel data analysis of the determinants of bilateral flows', Journal of Population Economics, 23 (4), 1249-74.
• Raymer, J. (2007), 'The estimation of international migration flows: a general technique focused on the origin destination association structure', Environment and Planning A, 39, 985-95.
• Willekens, F. (1983), 'Log-linear modelling of spatial interaction', Papers in Regional Science, 52 (1), 187-205.
• Wilson, A. (1971), 'A family of spatial interaction models, and associated developments', Environment and Planning A, 3, 1-32.
• Zipf, G.K. (1946), 'The P1 P2 / D hypothesis: on the intercity movement of persons', American Sociological Review, 11 (6), 677-86.

Thank you
Adam Dennett
a.dennett@ucl.ac.uk
http://adamdennett.co.uk