1601(7)

MDAG members to provide comments and select their preferred option

Small Area Population Mid-Year Estimates (MYE) Errors

SG SIMD team, 5 November 2015

1. Summary

1.1. NRS have announced that there is an error in the small area population estimates which affects the mid-year estimates (MYE). This will have an impact on the construction and timescale of SIMD16: SIMD16 could either use the existing estimates or be delayed.

1.2. Sensitivity analysis has been conducted on selected domains of the SIMD to estimate the impact the MYE error could have on the index. From the analysis the following conclusions can be reached:

• The impact on the employment and income domains is small and affects the local share of deprivation of only a few local authorities.
• The error is likely to make areas look more deprived than they really are.
• In a worst case scenario where the whole error of one local authority is attributed to a small area, the effect on the local authority’s local share is unlikely to be noticeable.

1.3. The preferred option is to carry on with SIMD16 and highlight the issues in the publication of SIMD16.

2. Background

2.1. National Records of Scotland (NRS) have announced that there is an error in the MYE data from 2002 to 2014. The error has arisen in their calculation of the migration of 17-25 year olds and is caused by an error in a calculation step affecting students. This step uses GP registrations (NHS data) to estimate populations when students move from one area to another, usually between a parental home address and a term-time address.

2.2. The problem in the calculation stems from the lag between students changing their home address and re-registering with a GP closer to that address. This lag has been miscalculated.

2.3. Preliminary analysis suggests that the net effect for each local authority is small (the largest error is in Dundee City, where the population is overestimated by 143).
However, the gross gains and losses behind this net figure may be larger. There is no analysis of the effect at data zone level.

2.4. After consultation with their main users, NRS have opted to revise the data for 2012 to 2014. This will not be done at data zone level until August 2016. No revisions will be published for the period 2002-2010: these years will not be revised as the error is contained within the standard error of the MYE. It has been decided to update 2012-14 as local authorities indicated that they need more accurate estimates to direct policy and interpret small area statistics. However, NRS believe that the effects of the changes would normally be within the standard error of these estimates as well.

2.5. The MYEs have not been withdrawn as many users still use this data, but NRS advise including a cautionary warning when publishing any statistics which are based on MYE.

3. Impact on SIMD16

3.1. The error will directly affect three of the seven SIMD domains and will have smaller effects on a further two. The only domains that will remain unaffected are the Access to Services and Housing domains, which do not rely on MYE to calculate scores and ranks.

3.2. Employment and income domains

3.2.1. The employment and income domains are calculated using MYE and rely on accurate population estimates at the data zone level. Specifically, the employment domain needs accurate estimates of the 16-64 year old age range, which is affected by this error. The income domain requires accurate estimates for the entire population at data zone level.

3.2.2. Two issues are identified with the use of the MYE with this error. First, some data zones will have overestimated populations. This will happen when students have left a parental home or term-time address and have been included in the estimate when they have actually deregistered from their GP. Second, there will be other data zones, to which the students have moved, where the population is underestimated.
Here the GP registration will not have been included in the estimate.

3.2.3. In both domains, we are concerned that the estimate will under- or overestimate the deprivation rank of data zones and the concentrations of deprivation across Scotland.

3.2.4. To estimate the likely impact on SIMD16, sensitivity analysis has been conducted on both domains. The methods used to conduct this analysis are presented in ANNEX A.

3.3. Other domains

3.3.1. We have selected the income and employment domains for sensitivity analysis as they contribute 56% of the final SIMD and all of their indicators rely on MYE. This will give the best indication of the impact of the MYE error.

3.3.2. Most of the indicators in the health and crime domains require MYE, but sensitivity analysis has not been conducted due to the complexity of these indicators. Less than half of the indicators in the education domain rely on MYE, therefore the overall impact on SIMD will be small.

3.3.3. It is not possible to conduct sensitivity analysis on the overall index.

4. Sensitivity Analysis – population error is spread evenly in each local authority

4.1. The sensitivity analysis is based on the SIMD 2012 ranks and domains. We do not hold all the raw data for these domains: it is calculated by DWP and returned to us with disclosure controls in place, and we are supplied only with the scores from which we then calculate the ranks. We have therefore estimated the changes caused by population changes only. Estimates of those receiving benefits or tax credits are created from postcode data held by DWP.

4.2. The following sensitivity analysis has simulated evenly spread changes to the population at data zone level. This is achieved by randomly changing the population in every data zone according to the estimated error from NRS analysis. This random process has been simulated over 1,000 times to identify extreme conditions.
For a full explanation of the methods used see ANNEX A.

4.3. The sensitivity analysis identifies the change in local share of deprivation for each local authority in the income and the employment domains.

4.4. Table 1 and Table 2 (ANNEX B) list the results for all local authorities. A negative figure represents an overestimation of deprivation while a positive figure reflects an underestimation.

4.5. Summary of results for the income domain:

• for 11 of the 32 local authorities, no changes are recorded up to the 20% most deprived (md) level
• for a further 6 local authorities, the scale of change is 0.5% or less up to the 20% md level
• the largest overestimations of deprivation in local share are in Dundee City (-1.68%|10% md) and Clackmannanshire (-1.56%|5% md)
• the largest underestimations of deprivation in local share are in Inverclyde (0.91%|10% md) and Fife (0.88%|20% md)

4.6. Summary of results for the employment domain:

• for 14 of the 32 local authorities, no changes are recorded up to the 20% md level
• for a further 10 local authorities, the scale of change is 0.5% or less up to the 20% md level
• the largest overestimation of deprivation in local share is in Dundee City (-1.68%|10% md)
• the largest underestimation of deprivation in local share is in Argyll & Bute (0.82%|15% md)

4.7. This represents the maximum effect if the error is evenly spread across each local authority.

5. Sensitivity Analysis – population error is concentrated in one area

5.1. We have also looked at an extreme case where the population error is attributed to only a few data zones. The methodology used in this scenario is explained in full in ANNEX A.

5.2. The analysis here is conducted on Dundee City Council where, according to NRS analysis, the population in 2014 was overestimated by 143, the largest error in any local authority. We are looking at a worst case scenario.

5.3.
In Dundee, five data zones were identified where student halls of residence are located. These are the data zones most likely to have an overestimated population. All 143 people were attributed to these five data zones and removed from them. Again, the numbers of income and employment deprived people (those claiming benefits or credits) were unchanged. The resulting scores for the income and employment domains were then ranked and compared to the original ranks. The results are presented in Tables 3 and 4 in ANNEX B.

5.4. As expected, all five data zones change to more deprived rankings with a maximum change of 58 ranks. However, the overall effect on local share of income and employment deprivation is marginal. One data zone moved from the 35% md into the 30% md for employment deprivation, and another data zone moved from the 80% md into the 75% md for income deprivation. The local shares for the 5%, 10%, 15% and 20% md were not affected.

6. Conclusion

6.1. Considering the data from the sensitivity analysis, the following conclusions can be reached:

• the impact on the employment and income domains is small and only affects the local share of deprivation of a few local authorities
• the error is likely to make areas look worse than they really are
• in a worst case scenario where the whole error of one local authority is attributed to a small area, the effect on the local authority’s local share is unlikely to be noticeable

6.2. Further to this analysis, the SIMD team expect the overall effect of the error on individual domains to be weakened in the complete index as:

• each domain contributes only a portion of the overall index
• where indicators rely on the MYE to calculate both the proportion (numerator) and total (denominator) of an indicator, the effect will also be lessened as the overall value will remain the same.

7. Options

7.1. After discussion with NRS it has become clear that the MYE error cannot be rectified before August 2016.
The three options available are outlined as follows:

7.2. Option 1. SIMD16 as planned (preferred option)

7.2.1. The SIMD team would continue to work with the data available to produce SIMD16 as currently planned and highlight the problems in the technical report. The SIMD team would consult with the UK Statistics Authority to ensure that the appropriate steps have been taken to preserve the National Statistics designation. This would include a full report on this issue published with SIMD16 and appropriate communications to all users.

7.2.2. The SIMD team have conducted the sensitivity analysis to understand the scale of impact that might result from the revision to the MYE. Any impact is expected to be small and so there is a strong case to continue with SIMD16 as planned. Users’ needs for a refreshed SIMD would still be met. There is a risk that the trust built up by SIMD over many years could be undermined if we proceed in this way, but this should be mitigated as long as we communicate our approach clearly and openly so that users understand the issue as fully as possible.

7.3. Option 2. SIMD16 delayed

7.3.1. This option would see the SIMD team delaying the publication until NRS have rectified the problem with MYE. We would communicate the delay to users and re-plan with a new Forthcoming Publication date.

7.3.2. Whilst this option would mean that any problems from this error are mitigated, it is not definite that NRS will be able to deliver an update in Autumn 2016. Further delays would affect users who are reliant on SIMD16, and we would fail to meet their needs. However, delaying SIMD16 would give the team the opportunity to develop and work on other SIMD related products.

7.4. Option 3. Abandon SIMD16 and produce SIMD17

7.4.1. The SIMD team would stop all activity to produce SIMD16 and plan for the next publication, SIMD17.

7.4.2.
This option would again fail to meet users’ needs and would mean that work already carried out to develop SIMD16 would be wasted. However, the time could be used for the team to develop and work on other SIMD related products and continue to develop new indicators.

ANNEX A

Sensitivity Analysis Method – Even Population Distribution

This section covers:

• the reason for conducting sensitivity analysis on only the income and employment domains
• the limitations of conducting full sensitivity analysis
• the method for conducting sensitivity analysis across all data zones in Scotland
• the method for conducting sensitivity analysis across specific affected data zones (Dundee City)
• the impact of the MYE errors on the indicators in the two domains

Income and employment domains

The income and employment domains make up 56% of the total SIMD, and all of the indicators in these two domains rely on MYE. The sensitivity analysis looks only at the income and employment domains because this is where the MYE errors will have the largest impact on SIMD. Smaller effects may be felt from this error in other domains.

In the employment domain, the calculation for each data zone is summarised as:

  (number of people claiming out-of-work benefit (claimant count)) / (best-fit working population)

In the income domain, the calculation for each data zone is summarised as:

  (number of people claiming (or dependent on someone claiming) tax credits or other benefits) / (total population)

Limitations of conducting full sensitivity analysis

The following limitations were noted in conducting sensitivity analysis:

• The analysis is based on SIMD 2012 data, which uses 2010 MYE. The estimated error, however, is based on 2014 MYE.
• The numerator in both equations is calculated by other government departments based on the MYE to estimate the number of people in each equation claiming the relevant benefits or credits.
It is not possible to conduct sensitivity analysis based on estimated changes to the numerators.
• The data that is available has been augmented so that it does not disclose information about individual people.
• The variation caused by the error in MYE is only known at local authority level.

Method for conducting sensitivity analysis across all data zones in Scotland

The following steps are used to conduct the sensitivity analysis:

• Rankings for each domain were calculated based on the disclosed data available [a].
• The probability of a gain or a loss for a data zone in each local authority was calculated [b] based on the analysis conducted by NRS (see Table 5).
• A random function was calculated for the gain or loss of each local authority, based on a magnitude of 1.5x that expected from the NRS analysis [c].
• For each data zone a change to the population was calculated by:
  o predicting a gain or loss based on [b]
  o adding a change based on the expected magnitude [c].
• Data zones were re-ranked for each domain based on the error-adjusted data.
• Local share of the first four vigintiles was calculated for each local authority [d].
• The analysis shows the variation from the disclosed data rankings [d]-[a].
• This process was repeated over 1,000 times until extreme conditions were determined.

Sensitivity Analysis – Concentrated Population Distribution

• The same limitations identified above apply to the sensitivity analysis here.
• The data zones where changes were to be applied were chosen based on the locations of student halls of residence in Dundee, identified with a quick search on Google Maps.
• Using the 2010 MYE (as used in SIMD 2012), the NRS estimated error for the whole local authority was allocated to the selected data zones in proportion to their populations.
• The rank and vigintile of each data zone were recalculated based on the disclosure controlled data [a].
• A new rank and vigintile were calculated for each data zone based on the adjusted population [b].
• The cut-offs for each vigintile are based on the disclosure controlled rankings and are shown in Table 6.
• The analysis is based on the changes to the rank and vigintile [b]-[a].

ANNEX B - Tables

Table 1: Change in Percentage Local Share of most deprived (md) by Local Authority – Income Domain

Local Authority Change in Local Share of Deprivation (min/max) 5% md 10% md 15% md 20% md

Aberdeen City Aberdeenshire Angus Argyll & Bute Clackmannanshire -1.56% / 0% Dumfries & Galloway Dundee City -0.56% / 0% East Ayrshire East Dunbartonshire East Lothian East Renfrewshire Edinburgh, City of -0.36% / 0% Eilean Siar Falkirk Fife -0.66% / 0.22% Glasgow City -0.29% / 0.29% Highland -0.34% / 0% Inverclyde Midlothian Moray North Ayrshire 0% / 0.56% North Lanarkshire Orkney Islands Perth & Kinross 0% / 0.57% Renfrewshire 0% / 0.47% Scottish Borders Shetland Islands South Ayrshire South Lanarkshire Stirling West Dunbartonshire West Lothian 0% / 0.47% 0% / 0.37% 0% / 0.37% -0.33% / 0% 0% / 0.7% 0% / 0.82% -1.68% / 0% 0% / 0.56% 0% / 0.56% 0% / 0.65% -0.65% / 0% -0.65% / 0% -0.18% / 0.36% -0.18% / 0.18% -0.36% / 0% 0% / 0.51% -0.51% / 0% 0% / 0.44% -0.66% / 0% -0.44% / 0.88% -0.43% / 0% -0.14% / 0.29% -0.14% / 0% -0.34% / 0% 0% / 0.91% -0.56% / 0% 0% / 0.56% -0.24% / 0.48% -0.72% / 0.48% 0% / 0.24% -0.57% / 0% -0.47% / 0.47% -0.47% / 0% -0.25% / 0.25% 0% / 0.75% -0.25% / 0% -0.91% / 0% -0.91% / 0.91% -0.85% / 0% -0.47% / 0.47% 0% / 0.47%

Table 2: Change in Percentage Local Share of md by Local Authority – Employment Domain

Local Authority Aberdeen City Aberdeenshire Angus Argyll & Bute Clackmannanshire Dumfries & Galloway Dundee City East Ayrshire East Dunbartonshire East Lothian East Renfrewshire Edinburgh, City of Eilean Siar Falkirk Fife Glasgow City Highland Inverclyde Midlothian Moray North Ayrshire North Lanarkshire Orkney Islands Perth & Kinross Renfrewshire Scottish Borders Shetland Islands South Ayrshire
South Lanarkshire Stirling West Dunbartonshire West Lothian

Change in Local Share of Deprivation (min/max) 5% md 10% md 15% md 20% md

-0.56 / 0 -0.18 / 0 -0.22 / 0.22 -0.14 / 0.14 -0.34 / 0 0 / 0.56 0 / 0.57 0 / 0.47 0 / 0.47 -1.68 / 0 -0.65 / 0 -0.18 / 0 0 / 0.51 0 / 0.44 -0.29 / 0 0 / 0.24 0 / 0.47 0 / 0.7 -0.82 / 0.82 0 / 0.65 -0.18 / 0.18 -0.44 / 0 -0.14 / 0.14 -0.34 / 0 -0.48 / 0.24 -0.47 / 0 0 / 0.5 -0.91 / 0 - 0 / 0.37 -0.33 / 0 -0.36 / 0 -0.22 / 0.22 0 / 0.56 0 / 0.24 -0.25 / 0 -0.91 / 0 0 / 0.47

Table 3: Change in rank and vigintile of selected Dundee City Council data zones – income domain

Data Zone   Total population (2010)   Error   Rank     Vigintile   New rank   New vigintile   Rank change
S01001095   1488                      -31     5540.5   18          5504       17              -36.5
S01001101   1786                      -37     4212     13          4154       13              -58
S01001102   1079                      -22     4194.5   13          4145       13              -49.5
S01001109   1635                      -34     2216     7           2143       7               -73
S01001114   939                       -19     433      2           400        2               -33
Total       6927                      -143

Table 4: Change in rank and vigintile of selected Dundee City Council data zones – employment domain

Data Zone   Working-age population (2010)   Error   Rank     Vigintile   New rank   New vigintile   Rank change
S01001095   1410                            -23     6284     20          6271       20              -13
S01001101   1658                            -27     4106     13          4067       13              -39
S01001102   878                             -15     3777.5   12          3731       12              -46.5
S01001109   1406                            -23     1972     7           1916       6               -56
S01001114   612                             -10     164      1           145.5      1               -18.5
Total       5964                            -98

Table 5: Calculated probability of a gain or loss based on the NRS analysis of the MYE error

Local Authority         Probability of gain   Probability of loss
Aberdeen City           0.37                  0.63
Aberdeenshire           0.55                  0.45
Angus                   0.76                  0.24
Argyll & Bute           0.46                  0.54
Clackmannanshire        0.52                  0.48
Dumfries & Galloway     0.54                  0.46
Dundee City             0.10                  0.90
East Ayrshire           0.46                  0.54
East Dunbartonshire     0.43                  0.57
East Lothian            0.65                  0.35
East Renfrewshire       0.53                  0.48
Edinburgh, City of      0.43                  0.57
Eilean Siar             0.56                  0.44
Falkirk                 0.63                  0.37
Fife                    0.50                  0.50
Glasgow City            0.50                  0.50
Highland                0.47                  0.53
Inverclyde              0.67                  0.33
Midlothian              0.53                  0.47
Moray                   0.58                  0.42
North Ayrshire          0.57                  0.43
North Lanarkshire       0.57                  0.43
Orkney Islands          0.48                  0.52
Perth & Kinross         0.62                  0.38
Renfrewshire            0.50                  0.50
Scottish Borders        0.55                  0.45
Shetland Islands        0.47                  0.53
South Ayrshire          0.59                  0.41
South Lanarkshire       0.43                  0.57
Stirling                0.24                  0.76
West Dunbartonshire     0.42                  0.58
West Lothian            0.60                  0.40

Table 6: Dundee City Council Recalculated Cut-offs

Cut-off   SIMD 2012 published data   Recalculated with disclosed data
5%        11.73                      11.17
10%       22.35                      20.67
15%       31.28                      30.73
20%       40.78                      40.22
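
The concentrated scenario described in ANNEX A can be checked with a short calculation. The sketch below is illustrative only and is not the SIMD team's own code: the function names are hypothetical, the figure of 6,505 ranked data zones is an assumption not stated in this paper, and the vigintile rule (20 equal-width bands with simple rounding up) is likewise assumed. Under those assumptions, sharing the Dundee City error of 143 across the five data zones in proportion to their 2010 populations gives the Error column of Table 3, and the band rule gives its Vigintile and New vigintile columns.

```python
import math

# Assumption (not stated in this paper): SIMD 2012 ranks 6,505 data zones.
TOTAL_DATA_ZONES = 6505


def allocate_error(populations, total_error):
    """Share total_error across zones in proportion to population, rounded to whole people."""
    total_population = sum(populations.values())
    return {
        zone: round(total_error * population / total_population)
        for zone, population in populations.items()
    }


def vigintile(rank, n_zones=TOTAL_DATA_ZONES):
    """Band 1..20 for a rank, assuming 20 equal-width bands (1 = most deprived 5%)."""
    return math.ceil(20 * rank / n_zones)


# Total population (2010 MYE) of the five Dundee data zones, from Table 3.
populations = {
    "S01001095": 1488,
    "S01001101": 1786,
    "S01001102": 1079,
    "S01001109": 1635,
    "S01001114": 939,
}

# Income domain ranks (original, adjusted), from Table 3.
ranks = {
    "S01001095": (5540.5, 5504),
    "S01001101": (4212, 4154),
    "S01001102": (4194.5, 4145),
    "S01001109": (2216, 2143),
    "S01001114": (433, 400),
}

# The population was overestimated by 143, so the adjustment is negative.
errors = allocate_error(populations, total_error=-143)

for zone, (old_rank, new_rank) in ranks.items():
    print(
        zone,
        errors[zone],
        vigintile(old_rank),
        vigintile(new_rank),
        new_rank - old_rank,
    )
```

The same `allocate_error` call does not reproduce the Error column of Table 4 from the working-age populations, so the employment domain adjustment appears to have been derived differently; this paper does not say how.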