Open - The Scottish Government

1601(7)
MDAG members to provide comments and select their preferred option
Small Area Population Mid-Year Estimates (MYE) Errors
SG SIMD team, 5 November 2015
1.
Summary
1.1.
NRS have announced that there is an error in the small area population
estimates which affects the mid-year estimates. This will have an impact on
the construction and timescale of SIMD16. SIMD16 could either use the
existing estimates or be delayed.
1.2.
Sensitivity analysis has been conducted on selected domains of the SIMD to
estimate the impact the MYE error could have on the index. From the
analysis the following conclusions can be reached:
• The impact on the employment and income domains is small and affects
the local share of deprivation of only a few local authorities.
• The error is likely to make areas look more deprived than they really are.
• In a worst case scenario where the whole error of one local authority is
attributed to a small area, the effect on the local authority’s local share is
unlikely to be noticeable.
1.3.
The preferred option is to carry on with SIMD16 and highlight the issues in
the publication of SIMD16.
2.
Background
2.1.
National Records of Scotland (NRS) have announced that there is an error in the
MYE data from 2002 to 2014. The error has arisen in their calculation
of the migration of 17-25 year olds and is caused by a mistake in the
calculation step affecting students. This step uses GP
registrations (NHS data) to estimate populations when students
move from one area to another, usually between a parent's home address
and a student term-time address.
2.2.
The problem in the calculation arises from the lag between students moving
address and then re-registering with a GP closer to their new address. This
lag has been miscalculated.
2.3.
Preliminary analysis suggests that the net effect for each local authority is
small (the largest error is in Dundee City, where the population is overestimated
by 143). However, the effect of gains and losses may be larger. There is no
analysis on the effect at data zone level.
2.4.
After consultation with their main users, NRS have opted to revise the data
between 2012 and 2014. This will not be done at data zone level until August
2016. No revisions will be published for the period 2002-2010. These years
will not be revised as the error is contained within the standard error of the
MYE. It has been decided to update 2012-14 as local authorities indicated
that they need more accurate estimates to direct policy and interpret small
area statistics. However NRS believe that the effects of the changes would
normally be within the standard error of these estimates as well.
2.5.
The MYEs have not been withdrawn as many users still use these data, but
NRS advise including a cautionary warning when publishing any statistics
which are based on MYE.
3.
Impact on SIMD16
3.1.
This will directly affect three of the seven SIMD domains but will have smaller
effects on a further two. The only domains that will remain unaffected will be
the Access to Services and Housing domains, which do not rely on MYE to
calculate scores and ranks.
3.2.
Employment and income domains
3.2.1. The employment and income domains are calculated using MYE and rely on
accurate projections at the data zone level. Specifically, the employment
domain needs accurate estimates of the 16-64 year old age range which is
affected by this error. The income domain requires accurate estimates for the
entire population at data zone.
3.2.2. Two issues are identified with the use of the MYE containing this error. First,
some data zones will have overestimated populations. This will
happen where students have left a parental home or term-time address
but are still included in the estimate although they have actually deregistered
from their GP. Second, there will be other data zones that
students have moved to, resulting in an underestimate. Here the new GP
registration will not have been included in the estimate.
3.2.3. In both domains, we are concerned that the population error will lead to
under- or overestimated deprivation ranks for data zones and distort the
concentrations of deprivation across Scotland.
3.2.4. To estimate the likely impact on SIMD16, sensitivity analysis has been
conducted on both domains. The methods used to conduct this analysis are
presented in ANNEX A.
3.3.
Other domains
3.3.1. We have selected the income and employment domains to conduct
sensitivity analysis as they contribute to 56% of the final SIMD and all the
indicators rely on MYE. This will give the best indication of the impact of the
MYE error.
3.3.2. Most of the indicators in the health and crime domains require MYE but
sensitivity analysis has not been conducted due to the complexity of these
indicators. Less than half of the indicators in the education domain rely on MYE,
therefore the overall impact on SIMD will be small.
3.3.3. It is not possible to conduct sensitivity analysis on the overall index.
4.
Sensitivity Analysis – population error is spread evenly in each local
authority
4.1.
The sensitivity analysis is based on the SIMD 2012 ranks and domains. As
we do not hold all the raw data for these domains (it is calculated by DWP
and returned to us with disclosure controls in place and we are only supplied
the scores on which we then calculate the ranks), we have estimated the
changes caused only by population changes. Estimates of those receiving
benefits or tax credits are created from postcode data held by DWP.
4.2.
The following sensitivity analysis has simulated evenly spread changes to the
population at data zone level. This is achieved by randomly changing the
population in every data zone according to the estimated error from NRS
analysis. This random process has been simulated over 1,000 times to
identify extreme conditions. For full explanation of the methods used see
ANNEX A.
4.3.
The sensitivity analysis identifies the change in local share of deprivation for
each local authority in the income and the employment domains.
4.4.
Table 1 and Table 2 (ANNEX B) list the results for all local authorities. A
negative figure represents an overestimation of deprivation while a positive
figure reflects an underestimation.
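As a minimal sketch of how these figures can be read, the example below assumes that local share is the percentage of a local authority's data zones that fall within the x% most deprived data zones nationally, and that rank 1 is the most deprived. The function name, the two authorities "A" and "B" and all ranks are hypothetical and are not taken from the SIMD data.

```python
def local_share(ranks, authorities, authority, pct_most_deprived):
    """Percentage of one authority's data zones inside the national
    pct_most_deprived band (rank 1 = most deprived)."""
    n_zones = len(ranks)
    cutoff = n_zones * pct_most_deprived / 100  # highest rank still inside the band
    in_authority = [r for r, a in zip(ranks, authorities) if a == authority]
    in_band = [r for r in in_authority if r <= cutoff]
    return 100 * len(in_band) / len(in_authority)

# Toy data: 10 data zones in two hypothetical authorities, "A" and "B".
authorities = ["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"]
original_ranks = [1, 2, 5, 7, 9, 3, 4, 6, 8, 10]
# After a population adjustment, the zones ranked 2 and 3 swap places.
adjusted_ranks = [1, 3, 5, 7, 9, 2, 4, 6, 8, 10]

before = local_share(original_ranks, authorities, "A", 20)
after = local_share(adjusted_ranks, authorities, "A", 20)
change = after - before  # negative: the original ranking overstated deprivation in "A"
print(before, after, change)
```

In this toy case authority "A" loses one data zone from the 20% md band, so its change is negative, which corresponds to an overestimation of deprivation in the original ranking.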
4.5.
Summary of results for the income domain:
• for 11 of the 32 local authorities, no changes are recorded up to the 20%
most deprived (md) level
• for a further 6 local authorities, the scale of change is 0.5% or less up to
the 20% md level
• the largest overestimations of deprivation in local share are in Dundee City
(-1.68%|10% md) and Clackmannanshire (-1.56%|5% md) Councils
• the largest underestimations of deprivation in local share are in Inverclyde
(0.91%|10% md) and Fife (0.88%|20% md) Councils
4.6.
Summary of results for the employment domain:
• for 14 of the 32 local authorities, no changes are recorded up to the 20%
md level
• for a further 10 local authorities, the scale of change is 0.5% or less up to
the 20% md level
• the largest overestimation of deprivation in local share is in Dundee City
Council (-1.68%|10% md)
• the largest underestimation of deprivation in local share is in Argyll & Bute
Council (0.82%|15% md)
4.7.
This represents the maximum effect if the error is evenly spread across each
local authority.
5.
Sensitivity Analysis – population error is concentrated in one area
5.1.
We have also looked at an extreme case where the population error is
attributed to only a few data zones. The methodology used in this scenario is
explained in full in ANNEX A.
5.2.
The analysis here is conducted on Dundee City Council where according to
NRS analysis, the population in 2014 was overestimated by 143, the largest
error in any local authority. We are looking at a worst case scenario.
5.3.
In Dundee, five data zones were identified where student halls of residence
are located. These are the most likely data zones to have an overestimated
population. All 143 people were attributed to these five data zones and
removed from them. Again, the numbers of income and employment deprived
people (those claiming benefits or credits) were unchanged. The resulting
scores for the income and employment domains were then ranked and
compared to the original ranks. This is presented in Tables 3 and 4 in the
Annex.
5.4.
As expected, all five data zones change to more deprived rankings with a
maximum change of 58 ranks. However, the overall effect on local share of
income and employment deprivation is marginal. One data zone moved from
the 35% md into the 30% md for employment deprivation, and another data
zone moved from the 80% md into the 75% md for income deprivation. The
local shares for 5%, 10%, 15% and 20% mds were not affected.
6.
Conclusion
6.1.
Considering the data from the sensitivity analysis, the following conclusions
can be reached:
• the impact on the employment and income domains is small and
affects the local share of deprivation of only a few local authorities
• the error is likely to make areas look more deprived than they really are
• in a worst case scenario where the whole error of one local authority is
attributed to a small area, the effect on the local authority’s local share is
unlikely to be noticeable
6.2.
Further to this analysis, the SIMD team expect the overall effect of the
error on individual domains to be weaker in the complete index because:
• each domain contributes only a portion of the overall index.
• where indicators rely on the MYE to calculate both the proportion
(numerator) and total (denominator) of an indicator, the effect will also be
lessened as the overall value will remain the same.
7.
Options
7.1.
After discussion with NRS it has become clear that the MYE error cannot be
rectified before August 2016. The three options available are outlined as
follows:
7.2.
Option 1. SIMD16 as planned (preferred option)
7.2.1. The SIMD team would continue to work with the data available to produce
SIMD16 as currently planned and highlight the problems in the technical
report. SIMD would consult with the UK Statistics Authority to ensure that the
appropriate steps have been taken to preserve the National Statistic
designation. This would include a full report issued with SIMD16 on this issue
and the appropriate communications made to all users.
7.2.2. The SIMD team have conducted the sensitivity analysis to understand the
scale of impact that might be experienced by the revision in MYE. Any impact
is expected to be small and so there is a strong case to continue with
SIMD16 as planned. Users’ needs for a refreshed SIMD would still be met.
There is a risk that the trust built up by SIMD over many years could be
undermined if we proceed in this way, but as long as we communicate our
approach clearly and openly so that users understand the issue as fully as
possible, this should be mitigated.
7.3.
Option 2. SIMD16 delayed
7.3.1. This option would see the SIMD team delaying the publication until NRS have
rectified the problem with MYE. We would communicate the delay to users
and re-plan with a new Forthcoming Publication date.
7.3.2. Whilst this option would mean that any problems from this error are mitigated,
it is not certain that NRS will be able to deliver an update in Autumn 2016.
Further delays will impact on users who are reliant on SIMD16, and we will
fail to meet their needs. However, delaying SIMD16 would give the team the
opportunity to develop and work on other SIMD-related products.
7.4.
Option 3. Abandon SIMD16 and produce SIMD17
7.4.1. The SIMD team would stop all activity to produce SIMD16 and plan for the
next publication, SIMD17.
7.4.2. This option would again fail to meet users’ needs and would mean that work
that has already been conducted to develop SIMD16 will be wasted. However,
the time could be used for the team to develop and work on other SIMD-related
products and continue to develop new indicators.
ANNEX A
Sensitivity Analysis Method – Even Population Distribution
This section covers:
• the reason for conducting sensitivity analysis on only the income and
employment domains
• the limitations of conducting full sensitivity analysis
• the method for conducting sensitivity analysis across all data zones in Scotland
• the method for conducting sensitivity analysis across specific affected data
zones (Dundee City)
• the impact of the MYE errors on the indicators in the two domains
Income and employment domains
The income and employment domains make up 56% of the total SIMD, and all of the
indicators in these two domains rely on MYE. The sensitivity
analysis looks only at the income and employment domains because this is where
the MYE errors will have the largest impact on SIMD. Smaller effects may be felt
from this error in other domains.
In the employment domain, the calculation for each data zone is summarised as:

    number of people claiming out-of-work benefit (claimant count)
    ---------------------------------------------------------------
    best-fit working population

In the income domain, the calculation for each data zone is summarised as:

    number of people claiming (or dependent on someone claiming) tax credits or other benefits
    ------------------------------------------------------------------------------------------
    total population
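The effect of a population error on these ratios can be illustrated with a short worked example. This is a sketch only: the population and error for data zone S01001095 are taken from Table 3, but the claimant count of 300 is a made-up figure, since the real numerators are held by other departments. The function name is hypothetical.

```python
def domain_rate(claimants, population):
    """Share of a data zone's population counted as deprived."""
    return claimants / population

# Population and error for data zone S01001095 are taken from Table 3;
# the claimant count of 300 is a made-up figure for illustration only.
claimants = 300
published_population = 1488
corrected_population = published_population - 31

rate_published = domain_rate(claimants, published_population)
rate_corrected = domain_rate(claimants, corrected_population)
print(f"{rate_published:.4f} -> {rate_corrected:.4f}")
```

With the numerator held fixed, removing an overestimate from the denominator raises the rate, which is why the adjusted data zones in Tables 3 and 4 move to more deprived ranks.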
Limitations of conducting full sensitivity analysis
The following limitations were noted in conducting sensitivity analysis:
• The analysis is based on SIMD 2012 data, which uses 2010 MYE. The
estimated error, however, is based on 2014 MYE.
• The numerator in both equations is calculated by other government
departments, which use the MYE to estimate the number of people
claiming the relevant benefits or credits.
• It is not possible to conduct sensitivity analysis based on estimated changes to
the numerators.
• The data that is available has been adjusted for disclosure control so that it
does not disclose information about individual people.
• The variation caused by the error in MYE is only known at local authority level.
Method for conducting sensitivity analysis across all data zones in Scotland
The following steps are used to conduct the sensitivity analysis:
• Rankings for each domain were calculated based on the disclosed data
available [a].
• The probability of a gain or a loss for a data zone in each local authority was
calculated [b] based on the analysis conducted by NRS (see Table 5).
• A random function was calculated for each local authority, with a magnitude of
1.5 times the gain or loss expected from the NRS analysis [c].
• For each data zone a change to the population was calculated by
  o predicting a gain or loss based on [b]
  o adding a change based on the expected magnitude [c].
• Data zones were re-ranked for each domain based on the disclosed data with the
simulated population error applied.
• Local share of the first four vigintiles was calculated for each local authority [d].
• The analysis shows the change from the disclosed data rankings [d]-[a].
This process was repeated over 1,000 times until extreme conditions were
determined.
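The steps above can be sketched as a small simulation. This is an illustration under stated assumptions, not the SIMD team's implementation: the paper does not specify the random function, so a uniform draw between 0 and 1.5 times the expected magnitude is assumed here, and all inputs (authorities "X" and "Y", populations, claimant counts, probabilities and magnitudes) are synthetic. All function names are hypothetical.

```python
import random

def rank_most_deprived_first(rates):
    """Return ranks where 1 is the highest rate (most deprived)."""
    order = sorted(range(len(rates)), key=lambda i: rates[i], reverse=True)
    ranks = [0] * len(rates)
    for position, zone in enumerate(order, start=1):
        ranks[zone] = position
    return ranks

def local_share(ranks, zone_la, la, pct):
    """Percentage of an authority's zones in the pct% most deprived nationally."""
    cutoff = len(ranks) * pct / 100
    own = [r for r, a in zip(ranks, zone_la) if a == la]
    return 100 * sum(r <= cutoff for r in own) / len(own)

def simulate(claimants, population, zone_la, p_gain, magnitude,
             pct=20, runs=1000, seed=0):
    """Return {authority: (min change, max change)} in local share over all runs."""
    rng = random.Random(seed)
    authorities = sorted(set(zone_la))
    base_rates = [c / p for c, p in zip(claimants, population)]
    base_ranks = rank_most_deprived_first(base_rates)
    base = {la: local_share(base_ranks, zone_la, la, pct) for la in authorities}  # [a]
    extremes = {la: (0.0, 0.0) for la in authorities}
    for _ in range(runs):
        rates = []
        for c, p, la in zip(claimants, population, zone_la):
            sign = 1 if rng.random() < p_gain[la] else -1   # [b] gain or loss
            size = rng.uniform(0, 1.5 * magnitude[la])      # [c] assumed uniform draw
            rates.append(c / (p + sign * size))             # numerator unchanged
        ranks = rank_most_deprived_first(rates)
        for la in authorities:
            change = local_share(ranks, zone_la, la, pct) - base[la]  # [d]-[a]
            low, high = extremes[la]
            extremes[la] = (min(low, change), max(high, change))
    return extremes

# Entirely synthetic inputs: 40 zones split between two hypothetical authorities.
data_rng = random.Random(1)
zone_la = ["X"] * 20 + ["Y"] * 20
population = [data_rng.randint(500, 1500) for _ in zone_la]
claimants = [round(p * data_rng.uniform(0.05, 0.35)) for p in population]
result = simulate(claimants, population, zone_la,
                  p_gain={"X": 0.10, "Y": 0.60}, magnitude={"X": 10, "Y": 5})
for la, (low, high) in result.items():
    print(la, low, high)
```

Each authority's output is a min/max pair in the same form as the entries in Tables 1 and 2. Fixing the seed makes the run repeatable, which helps when comparing alternative assumptions about the random function.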
Sensitivity Analysis – Concentrated Population Distribution
The same limitations identified above apply to the sensitivity analysis here.
The data zones where changes were to be applied were chosen based on the locations
of student halls of residence in Dundee, identified with a quick search on Google
Maps.
Using the 2010 MYE (as used in SIMD 2012), the population was adjusted in
proportion to the total population of the data zones and the NRS estimated error for
the whole local authority.
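A minimal sketch of this adjustment for the income domain is given below, assuming a simple pro-rata split of the local authority error rounded to whole people, which this paper does not spell out. The populations are those listed in Table 3; the function name is hypothetical.

```python
def allocate_error(populations, total_error):
    """Split total_error across zones in proportion to population,
    rounded to whole people."""
    total_population = sum(populations.values())
    return {zone: round(total_error * pop / total_population)
            for zone, pop in populations.items()}

# 2010 populations of the five Dundee data zones, as listed in Table 3.
populations = {
    "S01001095": 1488,
    "S01001101": 1786,
    "S01001102": 1079,
    "S01001109": 1635,
    "S01001114": 939,
}
errors = allocate_error(populations, -143)
print(errors)
```

Under this assumption the split reproduces the Error column of Table 3. Simple rounding happens to sum to -143 here; in general a further adjustment may be needed to preserve the total.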
The rank and vigintile of each data zone were recalculated based on the disclosure
controlled data [a].
A new rank and vigintile were calculated for each data zone based on the adjusted
population [b]. The cut-offs for each vigintile are based on the disclosure controlled
rankings and are shown in Table 6.
The analysis is based on the changes to the rank and vigintile [b]-[a].
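The relationship between rank and vigintile can be sketched as follows, assuming vigintiles are equal 5% bands of the national ranking with rank 1 the most deprived. The total of 6,505 ranked data zones is an assumption brought in for the example and is not stated in this paper. The ranks are the income domain values from Table 3.

```python
import math

N_DATA_ZONES = 6505  # assumed number of data zones ranked in SIMD 2012; not stated in this paper

def vigintile(rank, n_zones=N_DATA_ZONES):
    """Vigintile 1 holds the 5% most deprived ranks, vigintile 20 the least deprived."""
    return math.ceil(20 * rank / n_zones)

# (rank, new rank) pairs for the income domain, taken from Table 3.
income_ranks = [(5540.5, 5504), (4212, 4154), (4194.5, 4145), (2216, 2143), (433, 400)]
for old, new in income_ranks:
    print(old, vigintile(old), "->", new, vigintile(new), "change", new - old)
```

Under these assumptions the computed vigintiles agree with the Vigintile and New vigintile columns of Table 3, and the rank changes match the Rank change column.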
ANNEX B - Tables
Table 1: Change in Percentage Local Share of most deprived (md) by Local
Authority – Income Domain
Local Authority
Change in Local Share of Deprivation (min/max)
5% md
10% md
15% md
20% md
Aberdeen City
Aberdeenshire
Angus
Argyll & Bute
Clackmannanshire
-1.56% / 0%
Dumfries & Galloway
Dundee City
-0.56% / 0%
East Ayrshire
East Dunbartonshire
East Lothian
East Renfrewshire
Edinburgh, City of
-0.36% / 0%
Eilean Siar
Falkirk
Fife
-0.66% / 0.22%
Glasgow City
-0.29% / 0.29%
Highland
-0.34% / 0%
Inverclyde
Midlothian
Moray
North Ayrshire
0% / 0.56%
North Lanarkshire
Orkney Islands
Perth & Kinross
0% / 0.57%
Renfrewshire
0% / 0.47%
Scottish Borders
Shetland Islands
South Ayrshire
South Lanarkshire
Stirling
West Dunbartonshire
West Lothian
0% / 0.47%
0% / 0.37%
0% / 0.37%
-0.33% / 0%
0% / 0.7%
0% / 0.82%
-1.68% / 0%
0% / 0.56%
0% / 0.56%
0% / 0.65%
-0.65% / 0%
-0.65% / 0%
-0.18% / 0.36% -0.18% / 0.18%
-0.36% / 0%
0% / 0.51%
-0.51% / 0%
0% / 0.44%
-0.66% / 0% -0.44% / 0.88%
-0.43% / 0% -0.14% / 0.29%
-0.14% / 0%
-0.34% / 0%
0% / 0.91%
-0.56% / 0%
0% / 0.56%
-0.24% / 0.48% -0.72% / 0.48%
0% / 0.24%
-0.57% / 0%
-0.47% / 0.47%
-0.47% / 0%
-0.25% / 0.25%
0% / 0.75%
-0.25% / 0%
-0.91% / 0% -0.91% / 0.91%
-0.85% / 0%
-0.47% / 0.47%
0% / 0.47%
Table 2: Change in Percentage Local Share of md by Local Authority – Employment
Domain
Local Authority
Aberdeen City
Aberdeenshire
Angus
Argyll & Bute
Clackmannanshire
Dumfries & Galloway
Dundee City
East Ayrshire
East Dunbartonshire
East Lothian
East Renfrewshire
Edinburgh, City of
Eilean Siar
Falkirk
Fife
Glasgow City
Highland
Inverclyde
Midlothian
Moray
North Ayrshire
North Lanarkshire
Orkney Islands
Perth & Kinross
Renfrewshire
Scottish Borders
Shetland Islands
South Ayrshire
South Lanarkshire
Stirling
West Dunbartonshire
West Lothian
Change in Local Share of Deprivation (min/max)
5% md
10% md
15% md
20% md
-0.56 / 0
-0.18 / 0
-0.22 / 0.22
-0.14 / 0.14
-0.34 / 0
0 / 0.56
0 / 0.57
0 / 0.47
0 / 0.47
-1.68 / 0
-0.65 / 0
-0.18 / 0
0 / 0.51
0 / 0.44
-0.29 / 0
0 / 0.24
0 / 0.47
0 / 0.7
-0.82 / 0.82
0 / 0.65
-0.18 / 0.18
-0.44 / 0
-0.14 / 0.14
-0.34 / 0
-0.48 / 0.24
-0.47 / 0
0 / 0.5
-0.91 / 0
-
0 / 0.37
-0.33 / 0
-0.36 / 0
-0.22 / 0.22
0 / 0.56
0 / 0.24
-0.25 / 0
-0.91 / 0
0 / 0.47
Table 3: Change in rank and vigintile of selected Dundee City Council data zones –
income domain
Data Zone   Total population (2010)   Error   Rank     Vigintile   New rank   New vigintile   Rank change
S01001095   1488                      -31     5540.5   18          5504       17              -36.5
S01001101   1786                      -37     4212     13          4154       13              -58
S01001102   1079                      -22     4194.5   13          4145       13              -49.5
S01001109   1635                      -34     2216     7           2143       7               -73
S01001114   939                       -19     433      2           400        2               -33
Total       6927                      -143
Table 4: Change in rank and vigintile of selected Dundee City Council data zones –
employment domain
Data Zone   Working-age population (2010)   Error   Rank     Vigintile   New rank   New vigintile   Rank change
S01001095   1410                            -23     6284     20          6271       20              -13
S01001101   1658                            -27     4106     13          4067       13              -39
S01001102   878                             -15     3777.5   12          3731       12              -46.5
S01001109   1406                            -23     1972     7           1916       6               -56
S01001114   612                             -10     164      1           145.5      1               -18.5
Total       5964                            -98
Table 5: Calculated probability of a gain or loss based on the NRS analysis of the
MYE error
Local Authority        Probability of gain   Probability of loss
Aberdeen City          0.37                  0.63
Aberdeenshire          0.55                  0.45
Angus                  0.76                  0.24
Argyll & Bute          0.46                  0.54
Clackmannanshire       0.52                  0.48
Dumfries & Galloway    0.54                  0.46
Dundee City            0.10                  0.90
East Ayrshire          0.46                  0.54
East Dunbartonshire    0.43                  0.57
East Lothian           0.65                  0.35
East Renfrewshire      0.53                  0.48
Edinburgh, City of     0.43                  0.57
Eilean Siar            0.56                  0.44
Falkirk                0.63                  0.37
Fife                   0.50                  0.50
Glasgow City           0.50                  0.50
Highland               0.47                  0.53
Inverclyde             0.67                  0.33
Midlothian             0.53                  0.47
Moray                  0.58                  0.42
North Ayrshire         0.57                  0.43
North Lanarkshire      0.57                  0.43
Orkney Islands         0.48                  0.52
Perth & Kinross        0.62                  0.38
Renfrewshire           0.50                  0.50
Scottish Borders       0.55                  0.45
Shetland Islands       0.47                  0.53
South Ayrshire         0.59                  0.41
South Lanarkshire      0.43                  0.57
Stirling               0.24                  0.76
West Dunbartonshire    0.42                  0.58
West Lothian           0.60                  0.40
Table 6: Dundee City Council Recalculated Cut-offs
Cut-off   SIMD 2012 published data   Recalculated with disclosed data
5%        11.73                      11.17
10%       22.35                      20.67
15%       31.28                      30.73
20%       40.78                      40.22