General Editing Specifications

Population and Housing
Census and Survey Editing
Michael J. Levin
Center for Population and Development Studies
Harvard University
Michael.levin@yahoo.com
1
Appendix A
Censuses where some of these methods were applied
Country               Census Years
American Samoa        1974, 1980, 1990, 2000
Ethiopia              2007
Fiji                  1996, 2007
Ghana                 1984, 2000, 2010
Grenada               2001
Guam                  1980, 1990, 2000
Indonesia             1980, 2010
Kenya                 1999
Kiribati              2005
Lesotho               1996, 2006
Malawi                1998, 2008
Maldives              2006
Marshall Islands      1973, 1980, 1988
Micronesia            1973, 1980, 1994, 2000
Northern Marianas     1973, 1980, 1990, 1995, 2000
Palau                 1973, 1980, 1990, 1995, 2000, 2005
Papua New Guinea      1990
Samoa                 2001
Sierra Leone          2004
Solomon Islands       1999
South Africa          2001
Sudan                 2008
Tanzania              2002
Timor-Leste           2004
Tonga                 1996, 2006
Uganda                1991, 2002
US Virgin Islands     1980, 1990, 2000
Vanuatu               1989
Zambia                2000
Note: For some censuses, processing took place during the census itself; for others,
during preparation or during analysis (including own-children estimation).
2
Purpose of Handbook
• No census data are ever perfect
• Changes are made -- little documentation
• Promote communication between subject specialists and programmers
• “Cookbook” of suggestions -- presents possible resolutions
• But country edit teams must decide
3
The Census Process
• Data collection
• Capture
• Editing
• Tabulation and Dissemination
• Archiving
4
History of census editing
• Early years – manual or nothing
• Computers
• Within record editing
• Between record editing
• Hot decking
5
Editing in Historical Perspective
• Before computers: manual editing
• With computers: Increased complexity
• Automated changes
• Generalized editing packages
• New philosophies of editing
• Personal computers
• Appropriate levels of computer editing
6
Major Elements in a Census
• Preparatory work
• Enumeration
• Data processing -- keying, editing and tabulations
• Building data bases and dissemination
• Evaluation of results
• Analysis of results
7
Errors in Census Process
• Coverage Errors
• Questionnaire Design
• Enumerator/respondent errors
• Coding errors
• Data entry errors
• Computer editing errors
• Tabulation errors
8
What is editing

Editing is the systematic inspection of
invalid and inconsistent responses, and
subsequent manual or automatic correction
according to pre-determined rules.

The editing team!!
9
Editing Team
• Appropriate internal subject matter specialists
• Computer Programmers
• Work together as a team
• Edit Specs as means of communication
• Outside experts -- academicians
• Outside experts -- private sector
10
Why edit?
• Edited vs unedited data
• Always preserve original data
• Consider the users!!
11
Table 1. Sample population by 15-year age group and sex, using unedited and edited data
                      ----------- Unedited data ------------    ----- Edited data ------
Age group              Total    Male  Female  Not reported      Total    Male  Female
Total                  4,147   2,033   2,091            23      4,147   2,045   2,102
Less than 15 years     1,639     799     825            15      1,743     855     888
15 to 29 years         1,256     612     643             1      1,217     603     614
30 to 44 years           727     356     369             2        695     338     357
45 to 59 years           360     194     166             0        341     182     159
60 to 74 years           116      54      59             3        114      53      61
75 years and over         34      12      22             0         37      14      23
Not reported              15       6       7             2
12
Table showing trends with unknowns
TABLE 2. POPULATION AND POPULATION CHANGE BY 15-YEAR AGE GROUP WITH UNKNOWNS: 1990 AND 2000
                      -- Numbers --     Number   Per cent    -- Per cent --
Age group              2000    1990     change     change     2000    1990
Total                  4147    3319        828       24.9    100.0   100.0
Less than 15 years     1639    1348        291       21.6     39.5    40.6
15 to 29 years         1256     902        354       39.2     30.3    27.2
30 to 44 years          727     538        189       35.1     17.5    16.2
45 to 59 years          360     200        160       80.0      8.7     6.0
60 to 74 years          116      89         27       30.3      2.8     2.7
75 years and over        34      25          9       36.0      0.8     0.8
Not reported             15     217       -202      -93.1      0.4     6.5
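The change columns in Table 2 follow directly from the two counts. A minimal Python sketch of that arithmetic, using two rows of the table (the function name `change` is illustrative, not part of the original material):

```python
def change(now, before):
    """Return (absolute change, per cent change rounded to one decimal)."""
    diff = now - before
    return diff, round(100.0 * diff / before, 1)


# (2000 count, 1990 count) taken from Table 2
rows = {
    "Total": (4147, 3319),
    "Not reported": (15, 217),
}

for label, (y2000, y1990) in rows.items():
    diff, pct = change(y2000, y1990)
    print(f"{label}: {diff:+d} ({pct:+.1f}%)")
```

The large negative change in "Not reported" comes from the drop in unknowns between the two censuses, which is arguably why unknowns make the trends for the other age groups harder to read.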
13
WHAT CENSUS EDITING SHOULD DO
1 Give users measures of the quality of the data
2 Identify the types and sources of error, and
3 Provide adjusted census results
14
Goals of the edit
• Imputed household should closely resemble failed edit household
• Imputed data should come from a single donor person or house resembling donee
• Equally good donors should have equal chances
15
Basics of Census Editing
• Systematic inspection and change (not always correction)
• Fatal edits -- invalid or missing entries
• Query edits -- inconsistencies
• Must preserve the original data as much as possible
• Quality enumeration more important than editing
• Edit does not improve data quality -- makes data more aesthetic
• Team must determine how far to go
16
More of Basics
• Over-editing is harmful
• Treatment of unknowns
• Spurious changes
• Determining tolerances
• Learning from the edit process
• Quality assurance
• Costs of Editing
• Imputation
• Archiving
17
How Over-editing is Harmful
• Timeliness
• Finances
• Distortion of true values
• A false sense of security
18
What we have to look out for
• Treatment of unknowns
• Spurious changes
• Using tolerances
• Learning from the editing process
• Quality assurance
• Costs of editing
19
Two parts of a national edit
• Structure editing
• Content editing
20
Editing Applications
• Manual versus automatic correction
• Guidelines for correcting data
• Validity and consistency checks
• Methods of correcting and imputing data
• Other editing systems
21
Manual versus Automatic Correction
• Manual correction: takes a long time and very subject to error
• Automatic correction: faster and consistent.
• Not necessarily correct, just consistent.
• Can look at many variables at the same time
• Can keep an audit trail
22
Figure 1. Sample editing specifications to correct sex variable, in pseudocode
If SEX of the HEAD OF HOUSEHOLD = SEX of the SPOUSE
    If FERTILITY of the HEAD OF HOUSEHOLD is not blank
        If FERTILITY of the SPOUSE is blank
            (if the SEX of the head of household is not already female) Make the SEX = female endif
            (if the SEX of the spouse is not already male) Make the SEX = male endif
        else
            Do something else because they have same sex and both have fertility !!!
            [The “something” could be using the sex of the previous head, or alternating the sex of the
            Head, or using ratios of sexes of all heads for an appropriate response, etc.]
        endif
    Else
        This is the case where the head of household’s fertility is blank
        If FERTILITY of the SPOUSE is not blank
            (if the SEX of the head of household is not already male) Make the SEX = male endif
            (if the SEX of the spouse is not already female) Make the SEX = female endif
        else
            Do something else because BOTH have no fertility!!!
            [The “something” could be using the sex of the previous head, or alternating the sex of the
            Head, or using ratios of sexes of all heads for an appropriate response, etc.]
        endif
    Endif
Endif
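The specification in Figure 1 could be sketched in Python roughly as follows. The record layout (dicts with "sex" and "fertility" keys, None for blank) and the `fallback` callable are assumptions made for illustration; the figure leaves the fallback rule to the edit team.

```python
def fix_same_sex_couple(head, spouse, fallback):
    """Resolve a head/spouse pair reported with the same sex.

    head, spouse: dicts with "sex" ("male"/"female") and "fertility"
    (None when blank). fallback: hypothetical callable used for the two
    cases the figure leaves open (both or neither report fertility).
    """
    if head["sex"] != spouse["sex"]:
        return  # sexes already differ, nothing to edit

    head_has = head["fertility"] is not None
    spouse_has = spouse["fertility"] is not None

    if head_has and not spouse_has:
        # Only the head reports fertility, so the head is taken as female.
        head["sex"], spouse["sex"] = "female", "male"
    elif spouse_has and not head_has:
        # Only the spouse reports fertility, so the spouse is taken as female.
        head["sex"], spouse["sex"] = "male", "female"
    else:
        # Fertility gives no guidance; defer to the team's chosen rule.
        fallback(head, spouse)
```

Passing the fallback in as a parameter keeps the team's decision (previous head, alternating sexes, ratios) separate from the part of the rule the figure fixes.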
23
Guidelines for Correcting Data
• Make the fewest required changes possible to the originally collected data
• Eliminate obvious inconsistencies among the entries
• Systematically supply entries for erroneous or missing items by using other entries for the housing unit, person, or other persons in the household or group
• When appropriate, use “not reported”
24
25
Types of editing
• Top-down
  – The usual way
  – Is simple and straightforward
• Multiple-variable editing approach
  – Uses more information
  – Is likely to be a better guess
26
Methods of Correction and Imputation
• When imputation is not needed – toggling sexes
• Static imputation – cold deck technique
• Dynamic imputation – hot deck technique
27
Hot Deck Imputation
• Geographic considerations
• Use of related items
• Sequence of the items
• Complexity of the matrices
• Standardized hot decks
• Size of hot decks -- too big, audit trail, too small, difficult items
28
In developing hot decks
• Imputation matrices – structure of the matrices
• Standardized imputation matrices
• Seeding the decks
• Big, but not too big
• Understanding what the matrix is doing
• When the matrix is too small …
• Occupation and industry!!
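As an illustration of an imputation matrix and of seeding, here is a minimal Python sketch of a hot deck. The class and function names, the choice of sex by two age groups as matrix dimensions, and the record layout are all assumptions for this example; a production deck would use the dimensions the edit team selects.

```python
class HotDeck:
    """Imputation matrix: one stored donor value per cell."""

    def __init__(self, seed):
        # Seed (cold-deck) values guarantee that every cell can donate
        # before any valid response has been seen.
        self.cells = dict(seed)

    def update(self, key, value):
        # A valid response replaces the stored donor value for its cell.
        self.cells[key] = value

    def impute(self, key):
        return self.cells[key]


def age_group(age):
    # Two groups only, to keep the matrix small in this sketch.
    return "under 15" if age < 15 else "15 and over"


def edit_item(persons, item, deck):
    """Fill blanks (None) in `item`; return the person numbers imputed."""
    imputed = []
    for p in persons:
        key = (p["sex"], age_group(p["age"]))
        if p[item] is None:
            p[item] = deck.impute(key)
            imputed.append(p["pn"])  # imputation flag for the audit trail
        else:
            deck.update(key, p[item])
    return imputed
```

A matrix with more dimensions gives donors that resemble the donee more closely, but each cell is refreshed less often, which is one way to read "big, but not too big".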
29
Aids to checking edits
1. Listings
2. Writing whole households before and after with changes
3. Frequency matrices
30
Figure 4. Example of a listing summary for Malawi 2008 Census
[LISTING]
1718  336574        ******************************** ...
1719  336574        ******* Age & Head ********* ...
1720  336574        ******************************** ...
1805    1546   0.1  *P00-1* Head is not first person, is %2d...           1748490
1823     877   0.1  *P00-2* No head of household, first person 14+...     1748490
1835      62   0.0  *P00-3* No head 14+, first person becomes head...     1748490
1850    5074   0.3  *P00-4* Too many heads of household - 1 ...           1748490
1860    5238   0.3  *P00-5* Remaining heads made other RELATIONSHI...     1748490
1874     939   0.1  *P00-6* After head edit, not one and only one ...     1748490
1889    2301   0.1  *P00-6a* Spouses too young made other relative...     1748490
1909    1062   0.1  *P00-6ax* Multiple spouses for unmarried head...      1748490
1911    1062   0.1  *P00-6ax* Multiple spouses for unmarried head...      1748490
1929      44   0.0  *P00-6a1* Crazy case where spouse is visitor a...     1748490
1949      89   0.0  *P00-6a3* Crazy case where spouse is visitor a...     1748490
1998      12   0.0  *P00-6a1* Extra spouses who are visitors...           1748490
2017    1483   0.1  *P00-6a2* Extra spouses not married...                1748490
31
Figure 5. Example of a listing summary for Lesotho 2006 Census
[LISTING]
4388   21471        ...
4389   21471        ******* Sisterhood Characteristics *********...
4390   21471        ...
4401    1449   1.2  *G45-1* Total sisters out of range [%2d] illeg...     124839
4410    2897   2.3  *G45-2* Dead sisters out of range [%2d] illega...     124839
4419    3791   3.0  *G45-3* Pregnant sisters [%2d] illegal...             124839
4426    3895   3.1  *G45-4* At birth sisters [%2d] illegal...             124839
4433    4908   3.9  *G45-5* Week 6 sisters [%2d] illegal...               124839
4440     103   0.1  *G45-6* Sum of Dead Sisters [%2d][%2d][%2d] gr...     124839
4453       8   0.0  *G45-7* Sum of Dead Sisters [%2d][%2d][%2d] gr...     124839
4461     616   0.5  *G45-8* Dead Sisters [%2d] greater than total ...     124839
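The percent column in these listing summaries is consistent with the message count divided by the final column (for example, 1449 / 124839 is about 1.2 per cent). A small Python sketch of that calculation, with a tolerance check added as an assumption (the 2.0 per cent threshold and the function name are illustrative only):

```python
def summarize(counts, denominator, tolerance=2.0):
    """Return (message id, count, per cent, over_tolerance) per message."""
    rows = []
    for msg_id, n in counts.items():
        pct = round(100.0 * n / denominator, 1)
        rows.append((msg_id, n, pct, pct > tolerance))
    return rows


# Two of the message counts shown in Figure 5
for row in summarize({"G45-1": 1449, "G45-5": 4908}, 124839):
    print(row)
```

Messages that exceed the tolerance are candidates for a closer look at the questionnaire, the data capture, or the edit itself.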
32
Figure 8. Example of a write listing for Ethiopia 2007
Census
[WRITE]
BARCODE REGION ZONE WEREDA TOWN SUB_CITY SA KEBELE EA HHNO HUNO
------------------------------------------------------------------------
PN RS RH SX AG RL MT ET DS 1 2 3 4 5 6 7 8
01 01 01 01 31 01 05 67 02
02 01 06 01 34 01 05 67 02
03 01 09 02 30 01 05 05 02
04 01 09 02 20 01 05 05 02
05 01 09 01 01 01 05 05 02
P18-3 No literacy , but schooling 97, so
P20-20 Unable to read and write 98 because
P16-1 Mother's vital status invalid =
P17-1 Father's vital status invalid =
PN RS RH SX AG RL MT ET DS 1 2 3 4 5 6 7 8
01 01 01 01 31 01 05 67 02
02 01 06 01 34 01 05 67 02
03 01 09 02 30 01 05 05 02
04 01 09 02 20 01 05 05 02
05 01 09 01 01 01 05 05 02
9 0 1 2 3 CS YR PR ZN MO FA LT SC HG WL
08 01 01
01 97 12 01
08 01 01
01 97 17 01
07 02 02
97 05 01
03 02 02
02 98
03
08
03
literate, PN = 3
never attended school , PN = 4
PN = 5
PN = 5
9 0 1 2 3 CS YR PR ZN MO FA LT SC HG WL
08 01 01
01 97 12 01
08 01 01
01 97 17 01
07 02 02
01 97 05 01
03 02 02
02 98 00 03
08 01 01 01 01
RS LY
01
01
01
01 03
07 08
ES
07
03
05
MS MH FH MA FA MD FD LB
01
01
04 00 00 00 00 00 00 00
01 01 00 00 00 00 00 00
01
RS LY
01
01
01
01 03
ES
07
03
05
MS MH FH MA FA MD FD LB
01
01
04 00 00 00 00 00 00 00
01 01 00 00 00 00 00 00
33
Figure 10. Example of a frequency distribution for Sudan 2008 Census
[FREQUENCY]
Imputed Item Q18_ATTAINMENT: Education Attainment - all occurrences
_______________________________ _____________________________ _____________
Categories                       Frequency  CumFreq      %  Cum %  Net %  cNet %
_______________________________ _____________________________ _____________
 1 No Qualification                    105      105    2.2    2.2    2.4     2.4
 2 Incomplete Primary                 1564     1669   33.5   35.7   35.3    37.7
 3 Primary 4                           529     2198   11.3   47.0   11.9    49.6
 4 Primary 6                           492     2690   10.5   57.6   11.1    60.7
 5 Primary 8                           302     2992    6.5   64.0    6.8    67.5
 6 Junior 3                            251     3243    5.4   69.4    5.7    73.2
 7 Junior 4                             58     3301    1.2   70.7    1.3    74.5
 8 Secondary 3                          95     3396    2.0   72.7    2.1    76.6
 9 Secondary 4                           5     3401    0.1   72.8    0.1    76.7
10 Post Secondary Diploma                2     3403    0.0   72.8    0.0    76.8
11 University Degree                   154     3557    3.3   76.1    3.5    80.3
12 Post Graduate Diploma                10     3567    0.2   76.3    0.2    80.5
13 Master                               52     3619    1.1   77.5    1.2    81.7
14 Ph.D                                  1     3620    0.0   77.5    0.0    81.7
15 Khalwa                                1     3621    0.0   77.5    0.0    81.7
@17                                    144     3765    3.1   80.6    3.2    85.0
@98                                    667     4432   14.3   94.9   15.0   100.0
_______________________________ _____________________________ _____________
NotAppl                                240     4672    5.1  100.0
_______________________________ _____________________________ _____________
TOTAL                                 4672     4672  100.0  100.0
34
Figure 11. Example of a frequency distribution for additional edit for Zambia 1990 Census
[FREQUENCY]
Input: 1IN100.DAT
Program: ZAMHOUSE
ROOMS
------------------------------------------------------------
Values        Number of                    Cum.
Imputed       Imputations     Percent      Percent
------------------------------------------------------------
< 1               1,415        37.21        37.21
1                 2,185        57.45        94.66
2                   121         3.18        97.84
3                    22         0.58        98.42
4                    16         0.42        98.84
5                    23         0.60        99.45
6                    21         0.55       100.00
> 6
------------------------------------------------------------
                  3,803
35
Other considerations
• Running the edit three times: seed, run, check
• Saving original responses
• Imputation flags
36
Computer Edit Specifications
for Pilot Census 2001
Data Processing Project
Christopher S. Corlett
Data Processing Adviser
U.S. Census Bureau
37
Editing examples:
• Language – the general edit
• Young heads of household
• Population group
• Access to telephones
• Same-sex marriages
• Fertility
Source for all data except language: South Africa Pilot Census 2001
38
Language Edit
• If this is the head and language is missing, first look for someone else in the house with language, and assign that.
• If this is the head without language, no one else has language, use neighboring head of similar characteristics to assign a best guess.
• If this is someone else in the house and language is missing, assign the head’s language.
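The three rules above can be sketched in Python as follows. This is an illustration only: the record layout, the `neighbor_deck` dictionary, and the use of the head's sex as the "similar characteristics" key are assumptions, and the sketch presumes the head edit has already left exactly one head in the household.

```python
def edit_language(household, neighbor_deck):
    """Apply the language edit to one household, in place.

    household: list of person dicts with "rel", "sex" and "lang"
    (None when blank). neighbor_deck: hypothetical dict mapping a head's
    sex to the language of the most recent head with a reported language.
    """
    head = next(p for p in household if p["rel"] == "head")

    if head["lang"] is None:
        # Rule 1: borrow from another household member, if anyone has it.
        donor = next(
            (p["lang"] for p in household if p["lang"] is not None), None
        )
        # Rule 2: otherwise take the best guess from a neighboring head.
        head["lang"] = donor if donor is not None else neighbor_deck[head["sex"]]
    else:
        # A head with a reported language becomes the next donor.
        neighbor_deck[head["sex"]] = head["lang"]

    # Rule 3: everyone else with a missing language gets the head's.
    for p in household:
        if p["lang"] is None:
            p["lang"] = head["lang"]
```

The order matters: the head is resolved first so that rule 3 always has a value to copy.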
39
Language Edit: Within House
91200217 Population Group Case = 0009
ID1
ID2 PN SEX AGE RTN GRP LAN REL RSA PRV CNT CTZ URS
01
1 034 01
1
55
1 09
1
02
2 023 02
1 06 55
1 07
1
03
2 005 03
1 06 55
1 09
1
04
2 003 03
1 06 55
1 09
1
V.14c: P07 invalid for head, imputing from other PN = 01 Lang =
V.14c: P07 invalid for head, imputing from other PN = 01 Lang =
V.14c: P07 invalid for head, imputing from other PN = 01 Lang =
end ID1
ID2 PN SEX AGE RTN GRP LAN REL RSA PRV CNT CTZ URS
01
1 034 01
1 06 55
1 09
1
02
2 023 02
1 06 55
1 07
1
03
2 005 03
1 06 55
1 09
1
04
2 003 03
1 06 55
1 09
1
PERMPLAC SM
1
1
1
1
Oth lang = 06
06 Oth lang = 06
06 Oth lang = 06
PERMPLAC SM
1
1
1
1
40
Language Edit: Imputed House
91200697 Language Case = 0027
ID1
ID2 PN SEX AGE RTN GRP LAN REL RSA PRV CNT CTZ URS PERMPLAC SM
01
1 027 01
1
1 09
1
1
02
2 027 02
1
1 09
1
1
03
1 005 03
1
1 09
1
1
V.14d: P07 invalid, imputing from deck ALANGUAGE PN = 01 Lang =
V.15d: P08 invalid for head, impute from deck ARELIGIO PN = 01 Head Relig =
V.14f: P07 invalid, imputing from head PN = 02 Lang =  Head's lang = 06
V.15f: P08 invalid, imputing from head's religion PN = 02 Relig =  Head's relig = 38
V.14f: P07 invalid, imputing from head PN = 03 Lang =  Head's lang = 06
V.15b: imputing P08 from mother's religion PN = 03 Relig =  Mo relig = 38
end ID1
ID2 PN SEX AGE RTN GRP LAN REL RSA PRV CNT CTZ URS PERMPLAC SM
01
1 027 01
1 06 38
1 09
1
1
02
2 027 02
1 06 38
1 09
1
1
03
1 005 03
1 06 38
1 09
1
1
41
Editing examples:
• Language – the general edit
• Young heads of household
• Population group
• Access to telephones
• Same-sex marriages
• Fertility
Source for all data: Pilot Census 2001
42
Young heads of household
• V.3 (relationship for head) and V.5 (age of head)
• Related issue: each HH must have 1 and only 1 head.
• For invalid ages of heads, try to obtain via:
  – spouse (impute from deck based on spouse's age and head's sex)
  – otherwise, children (child's age and head's sex)
  – otherwise, impute from deck (household size and head's sex)
43
Young heads
• Skepticism about young heads; if younger than 12 then confirm:
  – if someone else older is present, then make them the head (V.3)
  – can't be married (must be 12+ years to be married)
  – has to be 12 years older than biological children
  – confirm consistency of age and educational level
  – confirm consistency of age and educational institution
  – can't have economic activity responses if younger than 10
  – can't have fertility (for girls)
• If head doesn't pass these age tests, then impute (based on head’s sex and household size).
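Some of the age tests above can be sketched in Python as follows. The field names are assumptions for illustration. The education consistency checks are left out because they depend on country-specific tables, and the "someone older is present" test is assumed to have run earlier under V.3.

```python
def young_head_failures(head, children_ages=()):
    """Return the age tests failed by a head younger than 12.

    head: dict with "age", and optionally "sex", "married",
    "economic_activity" and "children_ever_born".
    children_ages: ages of the head's biological children, if any.
    """
    failures = []
    age = head["age"]
    if age >= 12:
        return failures  # the tests apply only to heads younger than 12

    if head.get("married"):
        failures.append("married but younger than 12")
    if any(age - child < 12 for child in children_ages):
        failures.append("less than 12 years older than a child")
    if age < 10 and head.get("economic_activity"):
        failures.append("economic activity but younger than 10")
    # Only a positive number of children counts as reported fertility here.
    if head.get("sex") == "female" and head.get("children_ever_born"):
        failures.append("fertility reported")
    return failures
```

A non-empty result would send the head's age to imputation, based on the head's sex and household size as the slide describes.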
44
Young heads

Effect: number of heads younger than 12 years old
drops from 1296 (1.3%) to 627 (0.6%)
45
46
47
Case 1:
Notes:
PN = Person number
SEX = Sex
DOB = Day of birth
MOB = Month of birth
YOB = Year of birth
REL = Relationship to head
MAR = Marital status
SPN = Spouse person number
CEB = Children ever born (total)
CS = Children surviving (total)
MPN = Mother person number
FPN = Father person number
48
Case 1:
PN SEX DOB MOB YOB AGE REL MAR SPN CEB
01
1 11 01 1950 051 01
1 99
02
1 17 07 1977 023 03
5
03
2 04 04 1985 005 03
5
00
04
1 24 10 1987 011 03
5
05
1 01 07 1990 010 03
5
06
1 20 02 1994 007 01
5
07
1 20 02 1994 007
5
CS MPN FPN
99 01
01
09 01
53 01
49 01
99 01
99 01
V.2b4b: age and DOB inconsistent, age <= DOB, Age = 005 Date = 04/04/1985
V.2b4b: age and DOB inconsistent, age <= DOB, Age = 011 Date = 24/10/1987
V.3: either no heads or > 1= 0002
V.3h: more than 1 head =
V.3i: multiple heads, making oldest= 0051
V.3k: multiple heads, making excess other rel
V.9g: Relation invalid, has a dad, impute Rela
PN SEX DOB MOB YOB AGE REL MAR SPN CEB
01
1 11 01 1950 051 01
1 99
02
1 17 07 1977 023 03
5
03
2 04 04 1985 015 03
5
00
04
1 24 10 1987 013 03
5
05
1 01 07 1990 010 03
5
06
1 20 02 1994 007 11
5
07
1 20 02 1994 007 03
5
CS MPN FPN
99 01
01
09 01
53 01
49 01
99 01
99 01
49
Case 2:
PN SEX DOB MOB YOB AGE REL MAR SPN CEB
01
2 01 09 1986 015 01
5
00
02
2 09 06 1990 011 06
5
03
1 01 09 1991 010 06
5
04
2 01 09 1994 007 06
5
CS MPN FPN
90
99
99
V.2b4b: age and DOB inconsistent, age <= DOB, Age = 015 Date = 01/09/1986
V.2b4b: age and DOB inconsistent, age <= DOB, Age = 011 Date = 09/06/1990
V.2b4b: age and DOB inconsistent, age <= DOB, Age = 010 Date = 01/09/1991
V.2b4b: age and DOB inconsistent, age <= DOB, Age = 007 Date = 01/09/1994
V.3a1: head is younger than 16, Age = 014
V.3a3: no older relatives found; keep young head
PN SEX DOB MOB YOB AGE REL MAR SPN CEB
01
2 01 09 1986 014 01
5
00
02
2 09 06 1990 010 06
5
03
1 01 09 1991 009 06
5
04
2 01 09 1994 006 06
5
CS MPN FPN
90
99
99
50
Case 3:
PN SEX DOB MOB YOB AGE REL MAR SPN CEB
01
1 12 01 1998 003 09 05
02
2
008 09 05
CS MPN FPN
V.3b: no head of household!
V.3e: no head, making oldest person the head
V.5: head is younger than 12, about to confirm this
V.5e1: young head, but age consistent with educ lvl
V.5i1: young head, but age consistent with educ inst
V.5k: imputing young head's age from AHEADAGE for econ
activity inconsistency
PN SEX DOB MOB YOB AGE REL MAR SPN CEB
01
1 99 99 1908 092 01 05
02
2 12 01 1998 003 09 05
CS MPN FPN
51
Editing examples:
• Young heads of household
• Population group
• Access to telephones
• Same-sex marriages
• Fertility
52
Population Group (V.13)
• For invalid population group, try to obtain via:
  – Head of household
  – Someone else in the household
  – Otherwise, impute from deck (age by household size)
• Effects:
  – Removes 2.9% blank/invalid responses
53
Population Group (percents)
[Bar chart: percent (0.0% to 80.0%) by population group (Black African, Coloured, Indian or Asian, White, Other, blank, invalid), raw vs edited data]
54
Population Group
• Parts of the current edit might need refinement for South Africa
• Issues to explore:
  – Imputations in HHs with multiple pop groups;
  – Tolerances and household size:
      • Case where whole HH has blank/invalid pop group;
      • Case where all but 1 HH member has same pop group;
      • Situations between these two extremes
  – Effect on planning/data use of leaving the variable “not stated”
55
Population Group
56
Case 1:
PN SEX AGE REL GRP LAN RGN RSA PRV CNT CTZ URS PERMPLAC SM
01
1 073 01
1 06 55
1 09
1
1
02
2 063 02
1 06 55
1 09
1
1
03
2 025 11
1 06 55
1 09
1
1
04
1 016 09
1 06 55
1 09
1
1
05
1 014 09
1 06 55
1 09
1
1
06
2 011 09
1 06 55
1 09
1
1
07
2 000 11
1 09
1
1
V.13e: Pop group invalid, impute from head PN=07 Group=Head Group= 1
PN SEX AGE REL GRP LAN RGN RSA PRV CNT CTZ URS PERMPLAC SM
01
1 073 01
1 06 55
1 09
1
1
02
2 063 02
1 06 55
1 09
1
1
03
2 025 11
1 06 55
1 09
1
1
04
1 016 09
1 06 55
1 09
1
1
05
1 014 09
1 06 55
1 09
1
1
06
2 011 09
1 06 55
1 09
1
1
07
2 000 11
1 06 55
1 09
1
1
57
Case 2:
PN SEX AGE REL GRP LAN RGN RSA PRV CNT CTZ URS PERMPLAC SM
01
1 032 01
3 02 39
1 08 710
1
1
02
2 028 02
1 08
1
1
03
1 068 07
1 08
1
1
04
2 057 07
1 08
1
1
05
2 007 03
1 06
1
1
06
1 006 03
1 08
1
1
07
1 001 03
1 08
1
1
08
2 030 12
1 07
09
1
1
V.13e: Pop group invalid, impute from head (SIX TIMES)
PN SEX AGE REL GRP LAN RGN RSA PRV CNT CTZ URS PERMPLAC SM
01
1 032 01
3 02 39
1 08
1
1
02
2 028 02
3 02 39
1 08
1
1
03
1 068 07
3 02 39
1 08
1
1
04
2 057 07
3 02 39
1 08
1
1
05
2 007 03
3 02 39
1 06
1
1
06
1 006 03
3 02 39
1 08
1
1
07
1 001 03
3 02 39
1 08
1
1
08
2 030 12
1 07 39
1 09
1
1
58
Case 3:
PN SEX AGE REL GRP LAN RGN RSA PRV CNT CTZ URS PERMPLAC SM
01
1 045 01
01 32
1 01
1
02
2 048 02
01 32
1 01
1
V.13b: Pop group invalid, impute from deck
V.13e: Pop group invalid, impute from head
PN SEX AGE REL GRP LAN RGN RSA PRV CNT CTZ URS PERMPLAC SM
01
1 045 01
4 01 32
1 01
1
1
02
2 048 02
4 01 32
1 01
1
1
59
Editing examples:
• Young heads of household
• Population group
• Access to telephones
• Same-sex marriages
• Fertility
60
Telephones and cell phones (IV.16)
• Telephone access is not applicable for households that have telephones or cell phones.
• Households with responses to the telephone access question should not have telephones or cell phones.
• Impute these variables from hot decks (based on dwelling type and tenure status) if necessary.
61
62
Telephones and cell phones
• Many left all questions blank
  – Problems with capture of continuation qsts
  – Confusion of “blank” and “no” (also seen in disabilities section)
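The consistency rule for edit IV.16 (telephone access is not applicable when the household has a telephone or cell phone) can be written as a one-line check. In this Python sketch the argument names and the use of True/False/None instead of the questionnaire codes are assumptions; a blank is kept distinct from "no" because the slide above reports that the two were confused.

```python
def phone_access_consistent(tel, cell, access):
    """Check the telephone access rule for one household.

    tel, cell: True (has one), False (does not) or None (blank).
    access: the telephone access response, or None when blank.
    Returns False when the household has a phone and still reports access.
    """
    has_phone = tel is True or cell is True
    return not (has_phone and access is not None)


def count_inconsistent(households):
    """Count households that fail the rule, e.g. for a summary report."""
    return sum(
        not phone_access_consistent(h["tel"], h["cell"], h["access"])
        for h in households
    )
```

The check only detects the inconsistency; which of the three items to change, or to impute from a hot deck, remains a decision for the edit team.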
63
Summary report:
64
Case 1:
DWL MLT RMS SHR TEN WAT SRC TLT COK HET LIT RAD
01
2 006
1
4
7
4
1
5
1
1
IV.16c: impute cell phone = no
Phone2 Cell=
TV CMP FRG TEL CLL ACC RFS
1
2
1
2
2
4
DWL MLT RMS SHR TEN WAT SRC TLT COK HET LIT RAD
01
2 006
1
4
7
4
1
5
1
1
Access= 2
TV CMP FRG TEL CLL ACC RFS
1
2
1
2
2
2
4
65
Case 2:
DWL MLT RMS SHR TEN WAT SRC TLT COK HET LIT RAD
01
006
1
4
1
4
1
1
1
1
TV CMP FRG TEL CLL ACC RFS
1
2
1
1
4
IV.16h: imputed cell = 1 from deck
DWL MLT RMS SHR TEN WAT SRC TLT COK HET LIT RAD
01
2 006
1
4
1
4
1
1
1
1
TV CMP FRG TEL CLL ACC RFS
1
2
1
1
1
4
66
Case 3:
DWL MLT RMS SHR TEN WAT SRC TLT COK HET LIT RAD
02
2 005
1
4
1
4
4
4
1
1
TV CMP FRG TEL CLL ACC RFS
IV.13c: imputed television
IV.14c: imputed computer
IV.15c: imputed refrigerator
IV.16f: imputed telephone
IV.16h: imputed cell
IV.16j: imputed access
IV.17c: imputed rubbish
DWL MLT RMS SHR TEN WAT SRC TLT COK HET LIT RAD
02
2 005
1
4
1
4
4
4
1
1
TV CMP FRG TEL CLL ACC RFS
1
2
2
2
2
1
4
67
Editing examples:
• Young heads of household
• Population group
• Access to telephones
• Same-sex marriages
• Fertility
68
Same-sex marriages (V.7, V.8, and V.12)
• Treated as part of the marital status edits for heads and rest of household
• Imputations for invalid sex never result in a same-sex marriage
• No polygamous combinations of same-sex allowed
Same-sex marriages
• Skepticism about same-sex marriages; only allowable if:
  – Both partners 12 years or older;
  – Both sexes valid;
  – Relationships to head consistent (for sub-families);
  – Both partners’ marital statuses reported as “living together” (4).
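The four conditions above can be combined into a single test. In this Python sketch the field names are assumptions, sex codes 1 and 2 are taken from the listings in this presentation, and the relationship consistency check is passed in as a flag because its details are not given here.

```python
LIVING_TOGETHER = 4        # marital status code quoted on the slide
VALID_SEX_CODES = {1, 2}   # codes as they appear in the case listings


def same_sex_couple_allowed(a, b, relationships_consistent=True):
    """Return True if a reported same-sex couple passes all four tests.

    a, b: dicts with "age", "sex" and "mar" (marital status code).
    relationships_consistent: result of a separate, hypothetical check of
    the partners' relationships to the head.
    """
    return (
        a["age"] >= 12 and b["age"] >= 12
        and a["sex"] in VALID_SEX_CODES and b["sex"] in VALID_SEX_CODES
        and relationships_consistent
        and a["mar"] == LIVING_TOGETHER and b["mar"] == LIVING_TOGETHER
    )
```

A couple that fails the test would go on to the marital status and sex edits (V.7, V.8, V.12) rather than being kept as reported.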
70
71
Same-sex marriages
• Investigation shows that almost all of the reported same-sex marriages are erroneous.
• Enumerator’s manual contains instructions that add bias against accurate collection.
• Social situation in SA means that this might become a contentious issue.
72
Same-sex marriages

Enumerator’s Manual, pg 38:
“Question P-05: Marital Status …
Couples who are not married to each other but live
together as if they are married, belong to category 4.
This category is for people who live in every respect as
a married couple except that they have not undergone a
marriage ceremony. Only male/female couples should
indicate this category – the census
does not collect data on gay couples.”
73
Case 1:
PN SEX DOB MOB YOB AGE REL MAR SPN CEB
01
1 28 12 1930 071 01
1 02
02
1 17 06 1937 064 02
1 01
03
1 06 02 1935 066 06
8
04
2 06 03 1984 007 09
5
00
CS MPN FPN
99
V.2b4b: age and DOB inconsistent, age <= DOB,Age=071 Date=28/12/1930
V.2b4b: age and DOB inconsistent, age <= DOB,Age=064 Date=17/06/1937
V.2b4b: age and DOB inconsistent, age <= DOB,Age=007 Date=06/03/1984
V.7i: same sex marriage w/ MSs not both 4
PN SEX DOB MOB YOB AGE REL MAR SPN CEB
01
1 28 12 1930 070 01
1 02
02
2 17 06 1937 063 02
1 01
03
1 06 02 1935 066 06
8
04
2 06 03 1984 016 09
5
00
CS MPN FPN
99
74
Case 2:
PN SEX DOB MOB YOB AGE REL MAR SPN CEB
01
2 16 01 1956 044 01
5
01
02
2 09 05 1991 009 02
5
CS MPN FPN
01
V.2b4b: age and DOB inconsistent, age <= DOB,Age=044 Date=16/01/1956
V.7a: imputing SPN for head to point to spouse SPN=Spouse= 0002
V.7e: imputing head MS from female head MS= 5 SPN= 02
V.7g: spouse too young ... impute from age Head Age = 045 Sp Age= 009
V.7i: same sex marriage w/ MSs not both 4
V.7m: imputing sp MS from hot deck
V.7n: making spouse SPN point to head
PN SEX DOB MOB YOB AGE REL MAR SPN CEB
01
2 16 01 1956 045 01
1 02 01
02
1 09 05 1991 026 02
1 01
CS MPN FPN
01
75
Case 3:
PN SEX DOB MOB YOB AGE REL MAR SPN CEB
01
2 03 03 1976 025 01
4
01
02
2 14 08 1979 021 02
4
03
1 03 08 1995 005 03
5
CS MPN FPN
01 99 99
99
02 01
V.2b4b: age and DOB inconsistent, age <= DOB, Age=025 Date=03/03/1976
V.7a: imputing SPN for head to point to spouse
V.7h: same sex marriage, both head & spouse MS = 4
V.7n: making spouse SPN point to head
PN SEX DOB MOB YOB AGE REL MAR SPN CEB
01
2 03 03 1976 024 01
4 02 01
02
2 14 08 1979 021 02
4 01
03
1 03 08 1995 005 03
5
CS MPN FPN
01 99 99
99
02 01
76
Editing examples:
• Young heads of household
• Population group
• Access to telephones
• Same-sex marriages
• Fertility
77
Fertility (V.27)
• Fertility is not applicable for men, or for women outside ages 12 to 49.
• For women aged 12 to 49, blanks in the fertility section are treated as zeros.
• Handle common enumerator and reporting errors:
  – Lines switched when turning to next page;
  – Husband reports fertility, not wife;
  – Last child info recorded with child, not mother.
78
Notes:
TCEB = Total children ever born
MCEB = Male children ever born
FCEB = Female children ever born
TCS = Total children surviving
MCS = Male children surviving
FCS = Female children surviving
SXLAST = Sex of last child born
VSLAST = Vital status of last child born (still alive?)
YRLAST = Year of birth of last child born
MOLAST = Month of birth of last child born
79
Fertility
• Fertility is valid if all of the following are true:
  – TCEB = MCEB + FCEB, and
  – TCS = MCS + FCS, and
  – TCEB >= TCS, and
  – MCEB >= MCS, FCEB >= FCS, and
  – number of boys in the household who declared this person as their mother (using mother person number) ≤ MCS, and
  – number of girls in the household who declared this person as their mother (using mother person number) ≤ FCS, and
  – woman's age ≥ (11 + TCEB), and
  – FCEB>0 if SXLAST=female, and
  – MCEB>0 if SXLAST=male, and
  – FCS>0 if SXLAST=female and VSLAST=alive, and
  – MCS>0 if SXLAST=male and VSLAST=alive, and
  – all responses for last child born information (YRLAST, MOLAST, SXLAST, VSLAST) are complete and valid, or else they are all blank (indicating no births).
80
Fertility
• Also, maximum number of children (24 total and 12 per sex).
• When a bad CEB or CS value can be calculated from the other values, do that.
• When fertility is not valid, impute a consistent set of fertility responses from a deck (based on age, marital status, education level); then confirm last child born info from woman’s children in household.
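The validity rules for V.27 translate almost line for line into code. The Python sketch below is an illustration under assumed field names (lower-case versions of the variable names in the notes, with None for blank and "male"/"female"/"alive" in place of the questionnaire codes). It checks the last child block only for completeness, not for valid years or months.

```python
def fertility_problems(w, boys_in_hh=0, girls_in_hh=0):
    """Return the rules that woman `w` fails (an empty list means valid).

    boys_in_hh, girls_in_hh: counts of household members who give this
    woman as their mother (via mother person number).
    """
    problems = []
    if w["tceb"] != w["mceb"] + w["fceb"]:
        problems.append("TCEB != MCEB + FCEB")
    if w["tcs"] != w["mcs"] + w["fcs"]:
        problems.append("TCS != MCS + FCS")
    if w["tceb"] < w["tcs"] or w["mceb"] < w["mcs"] or w["fceb"] < w["fcs"]:
        problems.append("more surviving than ever born")
    if boys_in_hh > w["mcs"] or girls_in_hh > w["fcs"]:
        problems.append("more own children in household than surviving")
    if w["age"] < 11 + w["tceb"]:
        problems.append("age below 11 + TCEB")
    if w["tceb"] > 24 or w["mceb"] > 12 or w["fceb"] > 12:
        problems.append("over maximum number of children")

    # Last child born: all four items present, or all four blank.
    last = [w["yrlast"], w["molast"], w["sxlast"], w["vslast"]]
    if any(v is None for v in last) and not all(v is None for v in last):
        problems.append("last child block partly blank")

    # Sex and vital status of the last child must agree with the counts.
    ever, surv = {"female": ("fceb", "fcs"), "male": ("mceb", "mcs")}.get(
        w["sxlast"], (None, None)
    )
    if ever is not None:
        if w[ever] == 0:
            problems.append("last child's sex has no births recorded")
        elif w["vslast"] == "alive" and w[surv] == 0:
            problems.append("last child alive but none surviving of that sex")
    return problems
```

When the only problem is an arithmetic one (as with TCEB = 71, MCEB = 01, FCEB = 00 in the first fertility case), the bad value can be recalculated instead of imputing the whole set from the deck.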
81
82
Total Births (for women 12:49 years)
[Bar chart: percent of women (0.0% to 60.0%) by number of children (0 to 7, 8-9, 10-12, 13-15, 16-18, 19-24, Undefined), raw vs edited data]
Total children still living (for women 12:49 years)
[Bar chart: percent of women (0.0% to 60.0%) by number of children, raw vs edited data]
83
Case 1:
PN SEX AGE CEB MCB FCB
01
1 041
02
2 038 04 02 02
03
2 022 71 01 00
04
1 012
05
2 009
06
1 001
CS MCS FCS MLB YRLB SLB VLB
04
01
02
01
02
00
08 1991
06 1999
2
1
1
1
V.27: problems detected in fertility info ... PN= 03
V.27b: imputing TCEB = MCEB+FCEB PN= 03 TCEB=71 MCEB=01 FCEB=00
PN SEX AGE CEB MCB FCB
01
1 041
02
2 038 04 02 02
03
2 022 01 01 00
04
1 012
05
2 009
06
1 001
CS MCS FCS MLB YRLB SLB VLB
04
01
02
01
02
00
08 1991
06 1999
2
1
1
1
84
Case 2:
PN SEX AGE CEB MCB FCB
01
2 054
02
2 035 02 01 01
03
2 020 00
04
2 014 00
05
1 012
06
2 005
CS MCS FCS MLB YRLB SLB VLB
02
01
01
2
1
V.27: problems detected in fertility info ... PN=02
V.27POST: LAST info blank, imputing from youngest child PN= 02
(updates FCEB, TCEB, FCS, TCS)
V.27e: imputing fertility data from AFERTILITY PN=03
V.27e: imputing fertility data from AFERTILITY PN=04
PN SEX AGE CEB MCB FCB
01
2 054
02
2 035 02 01 01
03
2 020 00 00 00
04
2 014 00 00 00
05
1 012
06
2 005
CS MCS FCS MLB YRLB SLB VLB
02
00
00
01
00
00
01
00
00
11 1995
2
1
85
Case 3:
PN SEX AGE CEB MCB FCB CS MCS FCS MLB YRLB SLB VLB
01
2 042 05 02 03 05 02 03 09 1997
2
1
02
2 021 01 01
01 01
04 1994
1
1
03
2 018 01 00 00 01 00 00 01 2001
1
04
1 020
05
1 014
06
2 003
07
1 002
08
2 000
V.27: problems detected in fertility info ... PN= 02
V.27c: imputing FCEB = TCEB-MCEB PN= 02
V.27g: imputing FCS = TCS-MCS PN= 02
V.27: problems detected in fertility info ... PN= 03
V.27b: imputing TCEB = MCEB+FCEB PN= 03
V.27j: imputing fertility from hot deck PN= 03
PN SEX AGE CEB MCB FCB CS MCS FCS MLB YRLB SLB VLB
01
2 042 05 02 03 05 02 03 09 1997
2
1
02
2 021 01 01 00 01 01 00 04 1998
1
1
03
2 018 01 00 01 01 00 01 01 2001
2
1
04
1 020
05
1 014
06
2 003
07
1 002
08
2 000
86
Fertility
• Issues:
  – If woman reports zero TCEB and leaves rest blank, does that mean “no fertility” or “error”?
  – See if last child born can be handled separately from rest of fertility, so that full set is not imputed when last child born has problems and rest is valid
87
Conclusions
• Edits part of the series of census procedures
• Usually more for aesthetics than technical enhancement
• Hardware and software changing rapidly
• The revolution continues!
88