Introduction to the 2001 Licensed Individual SAR

Working with the 2001 Licensed
Individual SAR
• Coverage and quality
• SAR data issues
• Analysing SAR data
• Software
• The other datasets…
The SARs
• Introduced from ’91 Census as alternative to
tabular outputs
– Improved flexibility
– Huge sample sizes
– Only released following demonstration of non-disclosiveness
• Content and access methods of ’01 data
much more affected by confidentiality
– Less detail on many variables in the licensed files
– Codebook online
2001 Files
• Data available for download
– Individual licensed SAR
– On their way
• Household licensed SAR – under special license
from the UK Data Archive
• Small area microdata file
• If you need more detail – Controlled
Access Microdata Samples
– Individual file
– Household file (version 1)
Census coverage
• Major effort to improve coverage in 2001
• One Number Census
• Use of large Census Coverage Survey to
correct census results, 300K households
– Design independent of census;
– Used matched census and CCS data to estimate
total population in each area,
– adjusted all results for census non-response using
imputation of households and individuals
– Results in final database for UK adjusted for non-response
Census coverage
• Coverage before imputation:
– 94% households returned forms, with another
4% estimated to be in households identified by
enumerators.
• Response rate lowest for
– Young people in their early 20s (men aged 20-24: response rate of 87%)
– Inner London (resp rate of 78%)
• Once imputed cases are included estimated
to be 100% coverage
Population base
• One population base: usual residents
– differs from 1991 when user had to choose
either present or usual resident base
• Students enumerated at term time
address
– And are included in the data. Use
stulaway>1 to exclude those who are not usual
residents
• Communal establishments are included
in the individual file
Implications for 2001 SARs
• 1991 SARs selected from 10% sample
– Did not include imputed households
– 96% coverage
• 2001 SARs selected from 100% ONC
database
– 94% response; 6% imputed
– Imputed individuals/hholds are identified
– Imputed items are flagged
Two kinds of imputation
• Entire individual or household may be
imputed as part of ONC
– Complete records copied from enumerated
individuals/hhold
– Variable oncperim
• Variables imputed when information
missing
Edit
• 13.7 million edit procedures undertaken
– 28% population had 1+ items imputed
– Common:
• Missing prof quals set to none
• Carer set to no where missing (unless economic activity
also missing)
• Travel to work set to ‘work mainly at/from home’ where
workplace was ‘mainly at/from home’
– Others
• 14k people multi-ticked ‘sex’ (so imputed)
• 6k children had marital status changed to single
• impossible values set to missing then
imputed
• Missing values are imputed on the basis of similar local
cases
• does not remove unlikely values
Item imputation
For census output database as a whole:
• One or more items imputed for 28% of
the population
• Employment variables most affected:
– Industry ever worked: 18%
– Occupation ever worked: 14%
– Workplace size: 9%
• Under-enumerated groups are most
imputed, esp. single people
Can I tell what/who has been
imputed?
• Oncperim records whether an individual
has been imputed as part of the ONC
– Copies entire record from census database
• ‘z’ variables identify whether individual
has imputed information on a specific
variable
– Parallel set of variables
– zethew, zage0
Crosstab ethnic group (ethew)
by imputation flag (zethew)
Ethnic Group for England and Wales * ethew imputation flag Crosstabulation
Count
Ethnic group                     Not imputed    Imputed      Total
White                                1434719      36098    1470817
Mixed                                  17891       2373      20264
Asian or Asian British                 66556       3638      70194
Black or Black British                 32656       2607      35263
Chinese or Other ethnic group          12323       1517      13840
Total                                1564145      46233    1610378
Percentage with ethnicity
variable imputed, 2001 SARs
                 Not imputed    Imputed
White                   97.5        2.5
Mixed                   88.3       11.7
Asian                   94.8        5.2
Black                   92.6        7.4
Chinese/Other           89.0       11.0
All                     97.1        2.9
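The percentages follow directly from the counts in the ethew by zethew crosstab. A minimal Python sketch of that arithmetic is given here; the dictionary layout and the function name are illustrative only and are not part of the SARs files.

```python
# Counts from the ethew by zethew crosstab: (not imputed, imputed).
counts = {
    "White": (1434719, 36098),
    "Mixed": (17891, 2373),
    "Asian or Asian British": (66556, 3638),
    "Black or Black British": (32656, 2607),
    "Chinese or Other ethnic group": (12323, 1517),
}


def percent_imputed(not_imputed, imputed):
    """Return the percentage of cases whose ethnic group was imputed."""
    return 100.0 * imputed / (not_imputed + imputed)


rates = {group: round(percent_imputed(*pair), 1) for group, pair in counts.items()}

# Overall rate across all groups.
total_not = sum(pair[0] for pair in counts.values())
total_imp = sum(pair[1] for pair in counts.values())
rates["All"] = round(percent_imputed(total_not, total_imp), 1)

for group, rate in rates.items():
    print(f"{group:32s} {rate:5.1f}% imputed")
```

The same calculation applies to any other z variable once its crosstab against the substantive variable has been produced.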
Percentage ONC imputed, 2001
SARs
                 Not ONC imputed    ONC imputed
White                       94.8            5.2
Mixed                       91.5            8.5
Asian                       84.6           15.4
Black                       76.5           13.5
Chinese/Other               85.6           14.4
All                         93.8            6.2
Should I use imputed
individuals or variables?
• Imputation of individuals is designed to
compensate for under-enumeration
- using imputed cases will give results
comparable with national data
- will help overcome bias from non-response
• Imputed variables are generally
reported as accurate - in general we
advise using imputed information
Ethnicity
• But doubt over imputed ethnic
group
• Simpson and Akinwale used
Longitudinal Study to compare 1991
ethnic group with imputed 2001 ethnic
group
• Majority of imputed records are ‘wrong’
• Recommend not using imputed records
for minority groups
www.statistics.gov.uk/events/ls_census2001/agenda.asp
– SARs Percentage ethnic group imputed:
– 2.5% white; 7.4% black; 11.7% mixed
PRAMMing
• PRAMMing is perturbation designed to deal
with very unusual cases, e.g. widowed
16-year-olds
• Avoids additional broad-banding
• Perturbation is constrained to
– preserve univariate distributions
– Preserve multivariate distributions on control
variables
– Prevent strange results (like 5-year-old widows)
• Affects 15 variables
– Primary economic activity – 1% cases
The z-variables
• PRAMMed variables are flagged along
with imputed variables
– Cannot distinguish them
• Imputation flags are stored in variables
with z prefix
• Two versions of the download file
– use the larger *-impflag-*.extension
version if interested in
imputation/PRAMMing
General advice
• If unsure about impact of PRAMMing
and imputation
– Do a sensitivity test
– use the z var to exclude cases with
imputed variables and then repeat your
analysis
– Use ONCPERIM to exclude imputed
individuals and repeat your analysis
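A sensitivity test of this kind can be sketched in a few lines of Python. The records below are made-up toy data, not SAR cases, and the flag coding (1 = imputed or PRAMMed, 0 = not) is an assumption that should be checked against the codebook before applying the idea to the real file.

```python
# Toy person records, NOT real SAR data. The coding of the flags
# (1 = imputed, 0 = not) is an assumption; check the codebook.
records = [
    {"ethew": "White", "zethew": 0, "oncperim": 0},
    {"ethew": "White", "zethew": 0, "oncperim": 0},
    {"ethew": "Mixed", "zethew": 1, "oncperim": 0},
    {"ethew": "Black", "zethew": 0, "oncperim": 1},
    {"ethew": "White", "zethew": 1, "oncperim": 1},
    {"ethew": "Asian", "zethew": 0, "oncperim": 0},
]


def distribution(rows, variable):
    """Percentage distribution of `variable` over `rows`."""
    totals = {}
    for row in rows:
        totals[row[variable]] = totals.get(row[variable], 0) + 1
    return {k: round(100.0 * v / len(rows), 1) for k, v in totals.items()}


# Baseline: all cases, imputed or not.
all_cases = distribution(records, "ethew")
# Repeat after dropping cases whose ethnic group was imputed or PRAMMed.
no_item_imputation = distribution(
    [r for r in records if r["zethew"] == 0], "ethew"
)
# Repeat after dropping individuals imputed by the One Number Census.
no_onc_imputation = distribution(
    [r for r in records if r["oncperim"] == 0], "ethew"
)

print(all_cases)
print(no_item_imputation)
print(no_onc_imputation)
```

If the three distributions lead to the same substantive conclusion, imputation and PRAMMing are unlikely to be driving the result.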
National variation
• There is one file for the whole UK
• Some variables are country specific:
– Irish language
• Other variables have national variations
– educational qualifications
– ethnicity
– Watch out for the E,W,S and N suffixes!
• Sampling fraction is not quite consistent
across countries!
– Unlikely to result in major bias of proportions
– Will not gross up to census figures
Sampling fraction: by country &
sex
England Male 3.097
England Female 3.092
Wales Male 3.089
Wales Female 3.098
Scotland Male 3.210!
Scotland Female 3.232!
N Ireland Male 3.125
N Ireland Female 3.065
total 3.105
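Because the fraction varies by country and sex, a single grossing factor will not reproduce census totals. A minimal Python sketch, assuming the figures in the table are percentages of the population sampled (the slide gives no unit), derives a separate grossing factor for each stratum and shows the error from using the overall figure instead. The variable names are illustrative.

```python
# Sampling fractions from the table above, assumed to be percentages.
sampling_fraction = {
    ("England", "Male"): 3.097,
    ("England", "Female"): 3.092,
    ("Wales", "Male"): 3.089,
    ("Wales", "Female"): 3.098,
    ("Scotland", "Male"): 3.210,
    ("Scotland", "Female"): 3.232,
    ("N Ireland", "Male"): 3.125,
    ("N Ireland", "Female"): 3.065,
}
OVERALL_FRACTION = 3.105

# Grossing factor: population members represented by one sample case.
grossing_factor = {k: 100.0 / f for k, f in sampling_fraction.items()}
overall_factor = 100.0 / OVERALL_FRACTION

for (country, sex), factor in grossing_factor.items():
    # Relative error from applying the single overall factor instead.
    error = 100.0 * (overall_factor / factor - 1.0)
    print(f"{country:10s} {sex:6s} factor={factor:6.2f} "
          f"error with overall factor={error:+.1f}%")
```

Under that assumption, applying the overall factor to Scottish cases would overstate their totals by roughly 3 to 4%, which is why proportions are little affected but grossed-up counts are.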
How do the SARs compare
to the aggregate data?
Tables comparing the licensed Individual SAR with the
aggregate tables are available online in the user guide.
Results are very similar, with occasional values falling
outside the 95% confidence interval.
• Looked at univariate distribution of economic activity,
general health, marital status and ethnicity
• No proportion significantly different from aggregate data
at UK level
• By country, 9 of 107 cells are significantly different (slightly
over 5%); we will be looking to see whether PRAMMing is to blame
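With 95% intervals, some cells are expected to differ by chance alone. A rough Python check of how surprising 9 out of 107 is can be sketched as follows. It treats the cells as independent tests, a simplifying assumption not made in the source; cells from the same distribution are in fact related, so the result is indicative only.

```python
from math import comb

N_CELLS = 107      # cells compared by country
N_SIGNIFICANT = 9  # cells found significantly different
ALPHA = 0.05       # false-positive rate implied by a 95% interval

# Number of cells expected to look "significant" by chance alone.
expected = N_CELLS * ALPHA


def binomial_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


p_at_least_observed = binomial_tail(N_CELLS, N_SIGNIFICANT, ALPHA)

print(f"expected by chance: {expected:.2f} cells")
print(f"observed share: {100 * N_SIGNIFICANT / N_CELLS:.1f}%")
print(f"P(at least {N_SIGNIFICANT} by chance) = {p_at_least_observed:.3f}")
```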
Get to know the data
• Use the documentation
• SARs User Guide
– Use Census schedules to check questions
– Check univariate frequencies
– Do exploratory analyses
– Contact sars-helpdesk@man.ac.uk if you can’t
find the information you need in the online
documentation
• Contact sars-helpdesk@man.ac.uk if you
think there is a problem with the data
SARs as a LARGE dataset
• 1.8 Million cases can cause trouble!
• Use Nesstar to do initial data exploration
• Extract a subset using NESSTAR or take a
subset from the downloaded file
• For serious analysis, using a syntax (or .do)
file to record syntax makes re-running easier
– Create a single syntax file which starts with the
original data
– Use file naming conventions that will enable you
to trace versions
– Keep a record of work done
SARs as sample data
Geographically stratified sample
– approximates to simple random sample
– no clustering in Individual file
– Household file – clustering within
households
– Although large sample you may have small
sample sizes when using sub-groups
– use standard errors and confidence
intervals
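A minimal Python sketch of a 95% confidence interval for a proportion, treating the Individual SAR as a simple random sample and using the normal approximation. The first call uses the 'Mixed' count from the England and Wales ethnic group crosstab; the second uses hypothetical numbers to show how wide the interval becomes for a small sub-group.

```python
from math import sqrt


def proportion_ci(successes, n, z=1.96):
    """Normal-approximation (Wald) 95% CI for a proportion under SRS."""
    p = successes / n
    se = sqrt(p * (1 - p) / n)
    return p, se, (p - z * se, p + z * se)


# Large sample: 'Mixed' ethnic group in the England and Wales crosstab.
p_big, se_big, ci_big = proportion_ci(20264, 1610378)

# Hypothetical small sub-group: 10 cases out of 50 (illustrative numbers).
p_small, se_small, ci_small = proportion_ci(10, 50)

print(f"large sample: p={p_big:.4f}, "
      f"95% CI=({ci_big[0]:.4f}, {ci_big[1]:.4f})")
print(f"small sample: p={p_small:.2f}, "
      f"95% CI=({ci_small[0]:.2f}, {ci_small[1]:.2f})")
```

The normal approximation is a convenience here; for very small cell counts an exact or Wilson interval would be a safer choice.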
Comparisons between 1991
and 2001
• Population base changed
– Imputation (no imputed values in 1991 SARs)
– Students – enumerated at term-time address
– Residents only (choice in 1991)
• Variable continuity
– Variable names have been changed where the
variable is not exactly the same
– Some variables (e.g. age, LLI) are easy to
compare by grouping 1991 values
– Some variables are harder to compare as the
question has changed (eg qualifications)
Ethnicity 91/01
• Different questions asked in 1991 and
2001
• No agreed and perfect correspondence
• Simpson and Akinwale use LS to show
how 1991 maps on to 2001
www.statistics.gov.uk/events/ls_census2001/agenda.asp
Software options
• Supported packages
– Nesstar
– NSDstat
– SPSS
– Stata
• Other options
– Import or Stat/transfer to another package
– Use Nesstar to save to SAS or Statistica
– unless you use a v. small subsample the
SARs will be too big for most
spreadsheets!
Looking forward: Moving forward
• Controlled Access Microdata Samples
• Household SARs
• Small Area Microdata sample
• Learning and Teaching
CAMS content
• Controlled Access Microdata designed
for professional researchers:
• Access in safe setting only
• Specification on SARs website
• Individual file and Household file
Content of CAMS files
• Files contain much more detail; e.g.
– Individual year of age (topcoded at 95)
– FULL coding on country of birth
– SOC Unit Group
– Local authority geography
– Index of Multiple Deprivation for SOAs
– Index of Multiple Deprivation for migrants’
last address
Controlled Access
• CAMS is managed by ONS
• Data is accessed at
London/Titchfield/Newport in Virtual
Laboratory setting on a server
• Virtual lab looks like a standard Windows
interface
• Use SPSS/Stata in usual way
• output checked for confidentiality before
release
• Further information and appropriate forms at
http://www.statistics.gov.uk/census2001/sar_cams.asp
• Contact sars@ons.gsi.gov.uk for more details
CAMS Good practice
• Use the licensed SARs...
– to exhaust the potential of other datasets
– to write your syntax files
• check the disclosure guidelines before
writing your file
• Avoid complex tables
– small cell counts aren’t reliable
– unique cells will usually be suppressed
• Do use models
Household SAR
• 1% of households and all individuals
• Allows linkage between individuals in hholds
• Will be available SOON under special license
• Similar detail to Individual SAR
• Specification of Household SAR on website
The hierarchy of the
household file
Household 1: North West, Social rented
    Person 1: HoH, Female, 28, No quals, P/T Employee
    Person 2: Son of HoH, Male, 12, N/A, N/A
Household 2: Wales, Owner occupier
    Person 1: HoH, Male, 33, Degree, F/T Employee
    Person 2: Spouse of HoH, Female, 31, Degree, P/T Employee
    Person 3: Parent of HoH, Female, 72, No quals, Econ Inactive
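The two example households can be written as nested records to show what linkage within households makes possible. This is a minimal Python sketch; the field names are hypothetical and do not correspond to variable names in the Household SAR.

```python
# The two example households from the diagram, as nested records.
# Field names are illustrative, not Household SAR variable names.
households = [
    {
        "household": 1, "region": "North West", "tenure": "Social rented",
        "persons": [
            {"person": 1, "relationship": "HoH", "sex": "Female", "age": 28,
             "quals": "No quals", "activity": "P/T Employee"},
            {"person": 2, "relationship": "Son of HoH", "sex": "Male", "age": 12,
             "quals": "N/A", "activity": "N/A"},
        ],
    },
    {
        "household": 2, "region": "Wales", "tenure": "Owner occupier",
        "persons": [
            {"person": 1, "relationship": "HoH", "sex": "Male", "age": 33,
             "quals": "Degree", "activity": "F/T Employee"},
            {"person": 2, "relationship": "Spouse of HoH", "sex": "Female",
             "age": 31, "quals": "Degree", "activity": "P/T Employee"},
            {"person": 3, "relationship": "Parent of HoH", "sex": "Female",
             "age": 72, "quals": "No quals", "activity": "Econ Inactive"},
        ],
    },
]

# Flatten to one row per person, copying household attributes onto each row.
person_rows = [
    {"household": h["household"], "region": h["region"],
     "tenure": h["tenure"], **p}
    for h in households
    for p in h["persons"]
]

# Linkage within households: attach the HoH's qualifications to each member.
hoh_quals = {
    h["household"]: next(
        p["quals"] for p in h["persons"] if p["relationship"] == "HoH"
    )
    for h in households
}
for row in person_rows:
    row["hoh_quals"] = hoh_quals[row["household"]]

for row in person_rows:
    print(row["household"], row["person"], row["relationship"], row["hoh_quals"])
```

Deriving a household-level characteristic and attaching it to every member, as in the last step, is the kind of analysis the Individual SAR cannot support because its cases are not linked.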
Small Area Microdata file
• 5% sample of individuals
• Full range of variables
• LA lowest geography
• Except Isles of Scilly and City of London in E and W;
similar exceptions in S and NI
– Excludes communal establishments
– Age 11-year bands
– Ethnicity – 5 groups, or 16 with record swapping
between LAs
– Economic activity – 3 categories
• Delivery at CCSR soon
Using the SARs in Learning
and Teaching
• SARs provides easy to use dataset
• Fits well with aggregate data
• Supported by learning and teaching
materials
– www.chcc.ac.uk
• Access managed in same way:
– use Census Registration System
– need ATHENS (for data and CHCC)
User support
• Web pages are regularly updated
• Documentation online
• Resources and links added as we go
• Seminar invitations welcome!
• Regional workshop invites welcome!
• SARs Helpdesk
– sars-helpdesk@man.ac.uk
– (0161) 275 4735
• Join email and newsletter lists
• SARs User Group
Last word about surveys
• ESDS Government’s helpdesk:
– govsurveys@esds.ac.uk