Population and Housing Census and Survey Editing

Michael J. Levin
Center for Population and Development Studies
Harvard University
Michael.levin@yahoo.com

Appendix A: Censuses where some of these methods were applied

Country              Census Years
American Samoa       1974, 1980, 1990, 2000
Ethiopia             2007
Fiji                 1996, 2007
Ghana                1984, 2000, 2010
Grenada              2001
Guam                 1980, 1990, 2000
Indonesia            1980, 2010
Kenya                1999
Kiribati             2005
Lesotho              1996, 2006
Malawi               1998, 2008
Maldives             2006
Marshall Islands     1973, 1980, 1988
Micronesia           1973, 1980, 1994, 2000
Northern Marianas    1973, 1980, 1990, 1995, 2000
Palau                1973, 1980, 1990, 1995, 2000, 2005
Papua New Guinea     1990
Samoa                2001
Sierra Leone         2004
Solomon Islands      1999
South Africa         2001
Sudan                2008
Tanzania             2002
Timor Leste          2004
Tonga                1996, 2006
Uganda               1991, 2002
US Virgin Islands    1980, 1990, 2000
Vanuatu              1989
Zambia               2000

Note: For some, processing occurred during the census; for others it was during preparation or during analysis (including own children estimation).

Purpose of Handbook
• No census data are ever perfect
• Changes are made -- little documentation
• Promote communication between subject specialists and programmers
• “Cookbook” of suggestions -- presents possible resolutions
• But country edit teams must decide

The Census Process
• Data collection
• Capture
• Editing
• Tabulation and Dissemination
• Archiving

History of census editing
• Early years – manual or nothing
• Computers
• Within record editing
• Between record editing
• Hot decking

Editing in Historical Perspective
• Before computers: manual editing
• With computers:
  – Increased complexity
  – Automated changes
  – Generalized editing packages
  – New philosophies of editing
  – Personal computers
  – Appropriate levels of computer editing

Major Elements in a Census
• Preparatory work
• Enumeration
• Data processing -- keying, editing and tabulations
• Building data bases and dissemination
• Evaluation of results
• Analysis of results

Errors in Census Process
• Coverage errors
• Questionnaire design
• Enumerator/respondent errors
• Coding errors
• Data entry errors
• Computer editing errors
• Tabulation errors

What is editing
• Editing is the systematic inspection of invalid and inconsistent responses, and subsequent manual or automatic correction according to pre-determined rules.
• The editing team!!

Editing Team
• Appropriate internal subject matter specialists
• Computer programmers
• Work together as a team
• Edit specs as means of communication
• Outside experts -- academicians
• Outside experts -- private sector

Why edit?
• Edited vs unedited data
• Always preserve original data
• Consider the users!!

Table 1. Sample population by 15-year age group and sex, using unedited and edited data

                      Unedited data                             Edited data
Age group             Total   Male    Female  Not reported      Total   Male    Female
Total                 4,147   2,033   2,091   23                4,147   2,045   2,102
Less than 15 years    1,639     799     825   15                1,743     855     888
15 to 29 years        1,256     612     643    1                1,217     603     614
30 to 44 years          727     356     369    2                  695     338     357
45 to 59 years          360     194     166    0                  341     182     159
60 to 74 years          116      54      59    3                  114      53      61
75 years and over        34      12      22    0                   37      14      23
Not reported             15       6       7    2                    -       -       -

Table showing trends with unknowns

TABLE 2.
POPULATION AND POPULATION CHANGE BY 15-YEAR AGE GROUP WITH UNKNOWNS: 1990 AND 2000

                      Numbers          Number    Per cent    Per cent
Age group             2000     1990    Change    Change      2000     1990
Total                 4147     3319       828      24.9      100.0    100.0
Less than 15 years    1639     1348       291      21.6       39.5     40.6
15 to 29 years        1256      902       354      39.2       30.3     27.2
30 to 44 years         727      538       189      35.1       17.5     16.2
45 to 59 years         360      200       160      80.0        8.7      6.0
60 to 74 years         116       89        27      30.3        2.8      2.7
75 years and over       34       25         9      36.0        0.8      0.8
Not reported            15      217      -202     -93.1        0.4      6.5

WHAT CENSUS EDITING SHOULD DO
1. Give users measures of the quality of the data
2. Identify the types and sources of error, and
3. Provide adjusted census results

Goals of the edit
• Imputed household should closely resemble failed edit household
• Imputed data should come from a single donor person or house resembling donee
• Equally good donors should have equal chances

Basics of Census Editing
• Systematic inspection and change (not always correction)
• Fatal edits -- invalid or missing entries
• Query edits -- inconsistencies
• Must preserve the original data as much as possible
• Quality enumeration more important than editing
• Edit does not improve data quality -- makes more esthetic
• Team must determine how far to go

More of Basics
• Over-editing is harmful
• Treatment of unknowns
• Spurious changes
• Determining tolerances
• Learning from the edit process
• Quality assurance
• Costs of editing
• Imputation
• Archiving

How Over-editing is Harmful
• Timeliness
• Finances
• Distortion of true values
• A false sense of security

What we have to look out for
• Treatment of unknowns
• Spurious changes
• Using tolerances
• Learning from the editing process
• Quality assurance
• Costs of editing

Two parts of a national edit
• Structure editing
• Content editing

Editing Applications
• Manual versus automatic correction
• Guidelines for correcting data
• Validity and consistency checks
• Methods of correcting and imputing data
• Other editing systems

Manual versus Automatic Correction
• Manual correction: takes a long time and very subject to error
• Automatic correction: faster and consistent
  – Not necessarily correct, just consistent
  – Can look at many variables at the same time
  – Can keep an audit trail

Figure 1. Sample editing specifications to correct sex variable, in pseudocode

If SEX of the HEAD OF HOUSEHOLD = SEX of the SPOUSE
    If FERTILITY of the HEAD OF HOUSEHOLD is not blank
        If FERTILITY of the SPOUSE is blank
            (if the SEX of the head of household is not already female) Make the SEX = female endif
            (if the SEX of the spouse is not already male) Make the SEX = male endif
        else
            Do something else because they have same sex and both have fertility !!!
            [The “something” could be using the sex of the previous head, or alternating the sex of the Head, or using ratios of sexes of all heads for an appropriate response, etc.]
        endif
    Else
        This is the case where the head of household’s fertility is blank
        If FERTILITY of the SPOUSE is not blank
            (if the SEX of the head of household is not already male) Make the SEX = male endif
            (if the SEX of the spouse is not already female) Make the SEX = female endif
        else
            Do something else because BOTH have no fertility!!!
            [The “something” could be using the sex of the previous head, or alternating the sex of the Head, or using ratios of sexes of all heads for an appropriate response, etc.]
        endif
    Endif
Endif

Guidelines for Correcting Data
• Make the fewest required changes possible to the originally collected data
• Eliminate obvious inconsistencies among the entries
• Systematically supply entries for erroneous or missing items by using other entries for the housing unit, person, or other persons in the household or group
• When appropriate, use “not reported”

Types of editing
Top Down
• The usual way
• Is simple and straightforward
Multiple-variable editing approach
• Uses more information
• Is likely to be a better guess

Methods of Correction and Imputation
• When imputation is not needed – toggling sexes
• Static imputation – cold deck technique
• Dynamic imputation – hot deck technique

Hot Deck Imputation
• Geographic considerations
• Use of related items
• Sequence of the items
• Complexity of the matrices
• Standardized hot decks
• Size of hot decks -- too big, audit trail, too small, difficult items

In developing hot decks
• Imputation matrices – structure of the matrices
• Standardized imputation matrices
• Seeding the decks
• Big, but not too big
• Understanding what the matrix is doing
• When the matrix is too small …
• Occupation and industry!!

Aids to checking edits
1. Listings
2. Writing whole households before and after with changes
3. Frequency matrices

Figure 4. Example of a listing summary for Malawi 2008 Census

[LISTING]
1718  336574        ******************************** ...
1719  336574        ******* Age & Head ********* ...
1720  336574        ******************************** ...
1805    1546  0.1   *P00-1* Head is not first person, is %2d...                1748490
1823     877  0.1   *P00-2* No head of household, first person 14+...          1748490
1835      62  0.0   *P00-3* No head 14+, first person becomes head...          1748490
1850    5074  0.3   *P00-4* Too many heads of household - 1 ...                1748490
1860    5238  0.3   *P00-5* Remaining heads made other RELATIONSHI...          1748490
1874     939  0.1   *P00-6* After head edit, not one and only one ...          1748490
1889    2301  0.1   *P00-6a* Spouses too young made other relative...          1748490
1909    1062  0.1   *P00-6ax* Multiple spouses for unmarried head...           1748490
1911    1062  0.1   *P00-6ax* Multiple spouses for unmarried head...           1748490
1929      44  0.0   *P00-6a1* Crazy case where spouse is visitor a...          1748490
1949      89  0.0   *P00-6a3* Crazy case where spouse is visitor a...          1748490
1998      12  0.0   *P00-6a1* Extra spouses who are visitors...                1748490
2017    1483  0.1   *P00-6a2* Extra spouses not married...                     1748490

Figure 5. Example of a listing summary for Lesotho 2006 Census

[LISTING]
4388   21471        ...
4389   21471        ******* Sisterhood Characteristics *********...
4390   21471        ...
4401    1449  1.2   *G45-1* Total sisters out of range [%2d] illeg...          124839
4410    2897  2.3   *G45-2* Dead sisters out of range [%2d] illega...          124839
4419    3791  3.0   *G45-3* Pregnant sisters [%2d] illegal...                  124839
4426    3895  3.1   *G45-4* At birth sisters [%2d] illegal...                  124839
4433    4908  3.9   *G45-5* Week 6 sisters [%2d] illegal...                    124839
4440     103  0.1   *G45-6* Sum of Dead Sisters [%2d][%2d][%2d] gr...          124839
4453       8  0.0   *G45-7* Sum of Dead Sisters [%2d][%2d][%2d] gr...          124839
4461     616  0.5   *G45-8* Dead Sisters [%2d] greater than total ...          124839

Figure 8.
Example of a write listing for Ethiopia 2007 Census

[WRITE]
BARCODE REGION ZONE WEREDA TOWN SUB_CITY SA KEBELE EA HHNO HUNO
-------------------------------------------------------------------------
PN RS RH SX AG RL MT ET DS
01 01 01 01 31 01 05 67 02
02 01 06 01 34 01 05 67 02
03 01 09 02 30 01 05 05 02
04 01 09 02 20 01 05 05 02
05 01 09 01 01 01 05 05 02

P18-3 No literacy, but schooling 97, so literate, PN = 3
P20-20 Unable to read and write 98 because never attended school, PN = 4
P16-1 Mother's vital status invalid = PN = 5
P17-1 Father's vital status invalid = PN = 5

PN RS RH SX AG RL MT ET DS
01 01 01 01 31 01 05 67 02
02 01 06 01 34 01 05 67 02
03 01 09 02 30 01 05 05 02
04 01 09 02 20 01 05 05 02
05 01 09 01 01 01 05 05 02

CS YR PR ZN MO FA LT SC HG WL
08 01 01 01 97 12 01 08 01 01 01 97 17 01 07 02 02 97 05 01 03 02 02 02 98 03 08 03
CS YR PR ZN MO FA LT SC HG WL
08 01 01 01 97 12 01 08 01 01 01 97 17 01 07 02 02 01 97 05 01 03 02 02 02 98 00 03 08 01 01 01 01
RS LY 01 01 01 01 03 07 08
ES 07 03 05
MS MH FH MA FA MD FD LB
01 01 04 00 00 00 00 00 00 00 01 01 00 00 00 00 00 00 01
RS LY 01 01 01 01 03
ES 07 03 05
MS MH FH MA FA MD FD LB
01 01 04 00 00 00 00 00 00 00 01 01 00 00 00 00 00 00

Figure 10.
Example of a frequency distribution for Sudan 2008 Census

[FREQUENCY]
Imputed Item Q18_ATTAINMENT: Education Attainment - all occurrences

Categories                   Frequency  CumFreq      %   Cum %   Net %  cNet %
 1 No Qualification                105      105    2.2     2.2     2.4     2.4
 2 Incomplete Primary             1564     1669   33.5    35.7    35.3    37.7
 3 Primary 4                       529     2198   11.3    47.0    11.9    49.6
 4 Primary 6                       492     2690   10.5    57.6    11.1    60.7
 5 Primary 8                       302     2992    6.5    64.0     6.8    67.5
 6 Junior 3                        251     3243    5.4    69.4     5.7    73.2
 7 Junior 4                         58     3301    1.2    70.7     1.3    74.5
 8 Secondary 3                      95     3396    2.0    72.7     2.1    76.6
 9 Secondary 4                       5     3401    0.1    72.8     0.1    76.7
10 Post Secondary Diploma            2     3403    0.0    72.8     0.0    76.8
11 University Degree               154     3557    3.3    76.1     3.5    80.3
12 Post Graduate Diploma            10     3567    0.2    76.3     0.2    80.5
13 Master                           52     3619    1.1    77.5     1.2    81.7
14 Ph.D                              1     3620    0.0    77.5     0.0    81.7
15 Khalwa                            1     3621    0.0    77.5     0.0    81.7
@17                                144     3765    3.1    80.6     3.2    85.0
@98                                667     4432   14.3    94.9    15.0   100.0
NotAppl                            240     4672    5.1   100.0
TOTAL                             4672     4672  100.0   100.0

Figure 11. Example of a frequency distribution for additional edit for Zambia 1990 Census

[FREQUENCY]
Input: 1IN100.DAT    Program: ZAMHOUSE

ROOMS
-------------------------------------------------------------
Values      Number of                   Cum.
Imputed     Imputations    Percent      Percent
-------------------------------------------------------------
< 1             1,415        37.21        37.21
1               2,185        57.45        94.66
2                 121         3.18        97.84
3                  22         0.58        98.42
4                  16         0.42        98.84
5                  23         0.60        99.45
6                  21         0.55       100.00
> 6
-------------------------------------
                3,803

Other considerations
• Running the edit three times: seed, run, check
• Saving original responses
• Imputation flags

Computer Edit Specifications for Pilot Census 2001 Data Processing Project
Christopher S. Corlett
Data Processing Adviser
U.S. Census Bureau

Editing examples:
• Language – the general edit
• Young heads of household
• Population group
• Access to telephones
• Same-sex marriages
• Fertility
Source for all data except language: South Africa Pilot Census 2001

Language Edit
• If this is the head and language is missing, first look for someone else in the house with language, and assign that.
• If this is the head without language and no one else has language, use a neighboring head of similar characteristics to assign a best guess.
• If this is someone else in the house and language is missing, assign the head’s language.

Language Edit: Within House

91200217 Population Group Case = 0009
ID1 ID2 PN SEX AGE RTN GRP LAN REL RSA PRV CNT CTZ URS PERMPLAC SM
01 1 034 01 1 55 1 09 1 1
02 2 023 02 1 06 55 1 07 1 1
03 2 005 03 1 06 55 1 09 1 1
04 2 003 03 1 06 55 1 09 1 1
V.14c: P07 invalid for head, imputing from other PN = 01 Lang =    Oth lang = 06
V.14c: P07 invalid for head, imputing from other PN = 01 Lang = 06 Oth lang = 06
V.14c: P07 invalid for head, imputing from other PN = 01 Lang = 06 Oth lang = 06
end
ID1 ID2 PN SEX AGE RTN GRP LAN REL RSA PRV CNT CTZ URS PERMPLAC SM
01 1 034 01 1 06 55 1 09 1 1
02 2 023 02 1 06 55 1 07 1 1
03 2 005 03 1 06 55 1 09 1 1
04 2 003 03 1 06 55 1 09 1 1

Language Edit: Imputed House

91200697 Language Case = 0027
ID1 ID2 PN SEX AGE RTN GRP LAN REL RSA PRV CNT CTZ URS PERMPLAC SM
01 1 027 01 1 1 09 1 1
02 2 027 02 1 1 09 1 1
03 1 005 03 1 1 09 1 1
V.14d: P07 invalid, imputing from deck ALANGUAGE PN = 01 Lang =
V.15d: P08 invalid for head, impute from deck ARELIGIO PN = 01 Head Relig =
V.14f: P07 invalid, imputing from head PN = 02 Lang = Head's lang = 06
V.15f: P08 invalid, imputing from head's religion PN = 02 Relig = Head's relig = 38
V.14f: P07 invalid, imputing from head PN = 03 Lang = Head's lang = 06
V.15b: imputing P08 from mother's religion PN = 03 Relig = Mo relig = 38
end
ID1 ID2 PN SEX AGE RTN GRP LAN REL RSA PRV CNT CTZ URS PERMPLAC SM
01 1 027 01 1 06 38 1 09 1 1
02 2 027 02 1 06 38 1 09 1 1
03 1 005 03 1 06 38 1 09 1 1

Editing examples:
• Language – the general edit
• Young heads of household
• Population group
• Access to telephones
• Same-sex marriages
• Fertility
Source for all data: Pilot Census 2001

Young heads of household
• V.3 (relationship for head) and V.5 (age of head)
• Related issue: each HH must have 1 and only 1 head.
• For invalid ages of heads, try to obtain via:
  – spouse (impute from deck based on spouse's age and head's sex)
  – otherwise, children (child's age and head's sex)
  – otherwise, impute from deck (household size and head's sex)

Young heads
• Skepticism about young heads; if younger than 12 then confirm:
  – if someone else older is present, then make them the head (V.3)
  – can't be married (must be 12+ years to be married)
  – has to be 12 years older than biological children
  – confirm consistency of age and educational level
  – confirm consistency of age and educational institution
  – can't have economic activity responses if younger than 10
  – can't have fertility (for girls)
• If head doesn't pass these age tests, then impute (based on head’s sex and household size).
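The young-head checks listed above can be sketched in Python. This is only an illustration under stated assumptions, not the edit program used for the pilot census: the `Person` fields, the sex coding (1 = male, 2 = female), the function name `confirm_young_head`, and the reading of "someone else older" as "another member aged 12 or more" are all hypothetical. The education consistency checks are left out because the slides do not give their rules.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

MIN_HEAD_AGE = 12     # heads younger than this are questioned (from the slide)
MIN_PARENT_GAP = 12   # head must be this many years older than biological children
MIN_ECON_AGE = 10     # no economic activity responses below this age
FEMALE = 2            # assumed sex coding: 1 = male, 2 = female


@dataclass
class Person:
    """Hypothetical person record; field names are illustrative only."""
    pn: int
    sex: int
    age: int
    is_head: bool = False
    married: bool = False
    has_econ_activity: bool = False
    has_fertility: bool = False
    child_ages: List[int] = field(default_factory=list)  # biological children in the household


def confirm_young_head(household: List[Person]) -> Tuple[str, List[str]]:
    """Return an action ('keep', 'reassign_head', 'impute_age') and the reasons for it."""
    head = next(p for p in household if p.is_head)
    if head.age >= MIN_HEAD_AGE:
        return "keep", []

    # Assumption: "someone else older" means another member aged 12 or more.
    older = [p for p in household if not p.is_head and p.age >= MIN_HEAD_AGE]
    if older:
        new_head = max(older, key=lambda p: p.age)
        return "reassign_head", [f"PN {new_head.pn:02d}"]

    failures = []
    if head.married:
        failures.append("married under 12")
    if any(head.age - a < MIN_PARENT_GAP for a in head.child_ages):
        failures.append("less than 12 years older than a child")
    if head.age < MIN_ECON_AGE and head.has_econ_activity:
        failures.append("economic activity under 10")
    if head.sex == FEMALE and head.has_fertility:
        failures.append("fertility reported")

    # A head failing any test would have the age imputed from a deck
    # (head's sex by household size); the deck itself is not shown here.
    return ("impute_age", failures) if failures else ("keep", [])
```

Returning the reasons alongside the action mirrors the audit-trail idea from the listings: each change can be written out with the rule that triggered it.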
Young heads
• Effect: number of heads younger than 12 years old drops from 1296 (1.3%) to 627 (0.6%)

Case 1: Notes:
PN = Person number
SEX = Sex
DOB = Day of birth
MOB = Month of birth
YOB = Year of birth
REL = Relationship to head
MAR = Marital status
SPN = Spouse person number
CEB = Children ever born (total)
CS = Children surviving (total)
MPN = Mother person number
FPN = Father person number

Case 1:
PN SEX DOB MOB YOB AGE REL MAR SPN CEB
01 1 11 01 1950 051 01 1 99
02 1 17 07 1977 023 03 5
03 2 04 04 1985 005 03 5 00
04 1 24 10 1987 011 03 5
05 1 01 07 1990 010 03 5
06 1 20 02 1994 007 01 5
07 1 20 02 1994 007 5
CS MPN FPN
99 01 01 09 01 53 01 49 01 99 01 99 01
V.2b4b: age and DOB inconsistent, age <= DOB, Age = 005 Date = 04/04/1985
V.2b4b: age and DOB inconsistent, age <= DOB, Age = 011 Date = 24/10/1987
V.3: either no heads or > 1= 0002
V.3h: more than 1 head =
V.3i: multiple heads, making oldest= 0051
V.3k: multiple heads, making excess other rel
V.9g: Relation invalid, has a dad, impute Rela
PN SEX DOB MOB YOB AGE REL MAR SPN CEB
01 1 11 01 1950 051 01 1 99
02 1 17 07 1977 023 03 5
03 2 04 04 1985 015 03 5 00
04 1 24 10 1987 013 03 5
05 1 01 07 1990 010 03 5
06 1 20 02 1994 007 11 5
07 1 20 02 1994 007 03 5
CS MPN FPN
99 01 01 09 01 53 01 49 01 99 01 99 01

Case 2:
PN SEX DOB MOB YOB AGE REL MAR SPN CEB
01 2 01 09 1986 015 01 5 00
02 2 09 06 1990 011 06 5
03 1 01 09 1991 010 06 5
04 2 01 09 1994 007 06 5
CS MPN FPN
90 99 99
V.2b4b: age and DOB inconsistent, age <= DOB, Age = 015 Date = 01/09/1986
V.2b4b: age and DOB inconsistent, age <= DOB, Age = 011 Date = 09/06/1990
V.2b4b: age and DOB inconsistent, age <= DOB, Age = 010 Date = 01/09/1991
V.2b4b: age and DOB inconsistent, age <= DOB, Age = 007 Date = 01/09/1994
V.3a1: head is younger than 16, Age = 014
V.3a3: no older relatives found; keep young head
PN SEX DOB MOB YOB AGE REL MAR SPN CEB
01 2 01 09 1986 014 01 5 00
02 2 09 06 1990 010 06 5
03 1 01 09 1991 009 06 5
04 2 01 09 1994 006 06 5
CS MPN FPN
90 99 99

Case 3:
PN SEX DOB MOB YOB AGE REL MAR SPN CEB CS MPN FPN
01 1 12 01 1998 003 09 05
02 2 008 09 05
V.3b: no head of household!
V.3e: no head, making oldest person the head
V.5: head is younger than 12, about to confirm this
V.5e1: young head, but age consistent with educ lvl
V.5i1: young head, but age consistent with educ inst
V.5k: imputing young head's age from AHEADAGE for econ activity inconsistency
PN SEX DOB MOB YOB AGE REL MAR SPN CEB CS MPN FPN
01 1 99 99 1908 092 01 05
02 2 12 01 1998 003 09 05

Editing examples:
• Young heads of household
• Population group
• Access to telephones
• Same-sex marriages
• Fertility

Population Group (V.13)
• For invalid population group, try to obtain via:
  – Head of household
  – Someone else in the household
  – Otherwise, impute from deck (age by household size)
• Effects:
  – Removes 2.9% blank/invalid responses

Population Group (percents)
[Bar chart: percent of population by population group (Black African, Coloured, Indian or Asian, White, Other, blank, invalid), raw versus edited data]

Population Group
• Parts of the current edit might need refinement for South Africa
• Issues to explore:
  – Imputations in HHs with multiple pop groups;
  – Tolerances and household size:
      Case where whole HH has blank/invalid pop group;
      Case where all but 1 HH member has same pop group;
      Situations between these two extremes
  – Effect on planning/data use of leaving the variable “not stated”

Population Group

Case 1:
PN SEX AGE REL GRP LAN RGN RSA PRV CNT CTZ URS PERMPLAC SM
01 1 073 01 1 06 55 1 09 1 1
02 2 063 02 1 06 55 1 09 1 1
03 2 025 11 1 06 55 1 09 1 1
04 1 016 09 1 06 55 1 09 1 1
05 1 014 09 1 06 55 1 09 1 1
06 2 011 09 1 06 55 1 09 1 1
07 2 000 11 1 09 1 1
V.13e: Pop group invalid, impute from head PN=07 Group=  Head Group= 1
PN SEX AGE REL GRP LAN RGN RSA PRV CNT CTZ URS PERMPLAC SM
01 1 073 01 1 06 55 1 09 1 1
02 2 063 02 1 06 55 1 09 1 1
03 2 025 11 1 06 55 1 09 1 1
04 1 016 09 1 06 55 1 09 1 1
05 1 014 09 1 06 55 1 09 1 1
06 2 011 09 1 06 55 1 09 1 1
07 2 000 11 1 06 55 1 09 1 1

Case 2:
PN SEX AGE REL GRP LAN RGN RSA PRV CNT CTZ URS PERMPLAC SM
01 1 032 01 3 02 39 1 08 710 1 1
02 2 028 02 1 08 1 1
03 1 068 07 1 08 1 1
04 2 057 07 1 08 1 1
05 2 007 03 1 06 1 1
06 1 006 03 1 08 1 1
07 1 001 03 1 08 1 1
08 2 030 12 1 07 09 1 1
V.13e: Pop group invalid, impute from head (SIX TIMES)
PN SEX AGE REL GRP LAN RGN RSA PRV CNT CTZ URS PERMPLAC SM
01 1 032 01 3 02 39 1 08 1 1
02 2 028 02 3 02 39 1 08 1 1
03 1 068 07 3 02 39 1 08 1 1
04 2 057 07 3 02 39 1 08 1 1
05 2 007 03 3 02 39 1 06 1 1
06 1 006 03 3 02 39 1 08 1 1
07 1 001 03 3 02 39 1 08 1 1
08 2 030 12 1 07 39 1 09 1 1

Case 3:
PN SEX AGE REL GRP LAN RGN RSA PRV CNT CTZ URS PERMPLAC SM
01 1 045 01 01 32 1 01 1
02 2 048 02 01 32 1 01 1
V.13b: Pop group invalid, impute from deck
V.13e: Pop group invalid, impute from head
PN SEX AGE REL GRP LAN RGN RSA PRV CNT CTZ URS PERMPLAC SM
01 1 045 01 4 01 32 1 01 1 1
02 2 048 02 4 01 32 1 01 1 1

Editing examples:
• Young heads of household
• Population group
• Access to telephones
• Same-sex marriages
• Fertility

Telephones and cell phones (IV.16)
• Telephone access is not applicable for households that have telephones or cell phones.
• Households with responses to the telephone access question should not have telephones or cell phones.
• Impute these variables from hot decks (based on dwelling type and tenure status) if necessary.
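A minimal Python sketch of the telephone rules above, using a simple hot deck keyed on dwelling type and tenure status. Everything named here is an assumption made for illustration: the codes (1 = yes, 2 = no), the dictionary keys, the `HotDeck` class, and the choice to set a blank telephone or cell item to "no" when the access question was answered. The slides do not spell out the actual specification.

```python
from typing import Dict, Hashable, Optional

YES, NO = 1, 2  # assumed response codes


class HotDeck:
    """One-cell-per-key hot deck: stores the last valid response seen for each key."""

    def __init__(self, seed_value: int):
        self.seed_value = seed_value           # used until a donor is seen (seeding the deck)
        self.cells: Dict[Hashable, int] = {}

    def update(self, key: Hashable, value: int) -> None:
        self.cells[key] = value

    def draw(self, key: Hashable) -> int:
        return self.cells.get(key, self.seed_value)


def edit_phone_items(house: dict, tel_deck: HotDeck, cell_deck: HotDeck,
                     access_deck: HotDeck) -> dict:
    """Return an edited copy of a housing record; the original is left untouched."""
    out = dict(house)
    key = (out["dwelling"], out["tenure"])
    access_reported = out.get("access") is not None

    for item, deck in (("tel", tel_deck), ("cell", cell_deck)):
        if out.get(item) in (YES, NO):
            deck.update(key, out[item])        # valid response becomes a donor
        elif access_reported:
            out[item] = NO                     # assumption: access answered implies no phone
        else:
            out[item] = deck.draw(key)         # impute from the most recent donor

    if out["tel"] == YES or out["cell"] == YES:
        out["access"] = None                   # access is not applicable
    elif access_reported:
        access_deck.update(key, out["access"])
    else:
        out["access"] = access_deck.draw(key)
    return out
```

Working on a copy keeps the original responses available, in line with the advice to always preserve original data.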
Telephones and cell phones
• Many left all questions blank
  – Problems with capture of continuation qsts
  – Confusion of “blank” and “no” (also seen in disabilities section)

Summary report: [figure]

Case 1:
DWL MLT RMS SHR TEN WAT SRC TLT COK HET LIT RAD
01 2 006 1 4 7 4 1 5 1 1
TV CMP FRG TEL CLL ACC RFS
1 2 1 2 2 4
IV.16c: impute cell phone = no Phone2 Cell= Access= 2
DWL MLT RMS SHR TEN WAT SRC TLT COK HET LIT RAD
01 2 006 1 4 7 4 1 5 1 1
TV CMP FRG TEL CLL ACC RFS
1 2 1 2 2 2 4

Case 2:
DWL MLT RMS SHR TEN WAT SRC TLT COK HET LIT RAD
01 006 1 4 1 4 1 1 1 1
TV CMP FRG TEL CLL ACC RFS
1 2 1 1 4
IV.16h: imputed cell = 1 from deck
DWL MLT RMS SHR TEN WAT SRC TLT COK HET LIT RAD
01 2 006 1 4 1 4 1 1 1 1
TV CMP FRG TEL CLL ACC RFS
1 2 1 1 1 4

Case 3:
DWL MLT RMS SHR TEN WAT SRC TLT COK HET LIT RAD
02 2 005 1 4 1 4 4 4 1 1
TV CMP FRG TEL CLL ACC RFS
IV.13c: imputed television
IV.14c: imputed computer
IV.15c: imputed refrigerator
IV.16f: imputed telephone
IV.16h: imputed cell
IV.16j: imputed access
IV.17c: imputed rubbish
DWL MLT RMS SHR TEN WAT SRC TLT COK HET LIT RAD
02 2 005 1 4 1 4 4 4 1 1
TV CMP FRG TEL CLL ACC RFS
1 2 2 2 2 1 4

Editing examples:
• Young heads of household
• Population group
• Access to telephones
• Same-sex marriages
• Fertility

Same-sex marriages (V.7, V.8, and V.12)
• Treated as part of the marital status edits for heads and rest of household
• Imputations for invalid sex never result in a same-sex marriage
• No polygamous combinations of same-sex allowed

Same-sex marriages
• Skepticism about same-sex marriages; only allowable if:
  – Both partners 12 years or older;
  – Both sexes valid;
  – Relationships to head consistent (for sub-families);
  – Both partners’ marital statuses reported as “living together” (4).

Same-sex marriages
• Investigation shows that almost all of the reported same-sex marriages are erroneous.
• Enumerator’s manual contains instructions that add bias against accurate collection.
• Social situation in SA means that this might become a contentious issue.

Same-sex marriages
Enumerator’s Manual, pg 38: “Question P-05: Marital Status … Couples who are not married to each other but live together as if they are married, belong to category 4. This category is for people who live in every respect as a married couple except that they have not undergone a marriage ceremony. Only male/female couples should indicate this category – the census does not collect data on gay couples.”

Case 1:
PN SEX DOB MOB YOB AGE REL MAR SPN CEB
01 1 28 12 1930 071 01 1 02
02 1 17 06 1937 064 02 1 01
03 1 06 02 1935 066 06 8
04 2 06 03 1984 007 09 5 00
CS MPN FPN
99
V.2b4b: age and DOB inconsistent, age <= DOB, Age=071 Date=28/12/1930
V.2b4b: age and DOB inconsistent, age <= DOB, Age=064 Date=17/06/1937
V.2b4b: age and DOB inconsistent, age <= DOB, Age=007 Date=06/03/1984
V.7i: same sex marriage w/ MSs not both 4
PN SEX DOB MOB YOB AGE REL MAR SPN CEB
01 1 28 12 1930 070 01 1 02
02 2 17 06 1937 063 02 1 01
03 1 06 02 1935 066 06 8
04 2 06 03 1984 016 09 5 00
CS MPN FPN
99

Case 2:
PN SEX DOB MOB YOB AGE REL MAR SPN CEB
01 2 16 01 1956 044 01 5 01
02 2 09 05 1991 009 02 5
CS MPN FPN
01
V.2b4b: age and DOB inconsistent, age <= DOB, Age=044 Date=16/01/1956
V.7a: imputing SPN for head to point to spouse SPN=Spouse= 0002
V.7e: imputing head MS from female head MS= 5 SPN= 02
V.7g: spouse too young ... impute from age Head Age = 045 Sp Age= 009
V.7i: same sex marriage w/ MSs not both 4
V.7m: imputing sp MS from hot deck
V.7n: making spouse SPN point to head
PN SEX DOB MOB YOB AGE REL MAR SPN CEB
01 2 16 01 1956 045 01 1 02 01
02 1 09 05 1991 026 02 1 01
CS MPN FPN
01

Case 3:
PN SEX DOB MOB YOB AGE REL MAR SPN CEB
01 2 03 03 1976 025 01 4 01
02 2 14 08 1979 021 02 4
03 1 03 08 1995 005 03 5
CS MPN FPN
01 99 99 99 02 01
V.2b4b: age and DOB inconsistent, age <= DOB, Age=025 Date=03/03/1976
V.7a: imputing SPN for head to point to spouse
V.7h: same sex marriage, both head & spouse MS = 4
V.7n: making spouse SPN point to head
PN SEX DOB MOB YOB AGE REL MAR SPN CEB
01 2 03 03 1976 024 01 4 02 01
02 2 14 08 1979 021 02 4 01
03 1 03 08 1995 005 03 5
CS MPN FPN
01 99 99 99 02 01

Editing examples:
• Young heads of household
• Population group
• Access to telephones
• Same-sex marriages
• Fertility

Fertility (V.27)
• Fertility is not applicable for men, or for women not 12:49 years old.
• For women 12:49, blanks in fertility section are treated as zeros.
• Handle common enumerator and reporting errors:
  – Switch lines when turning to next page;
  – Husband reports fertility, not wife;
  – Last child info with child, not mother.

Notes:
TCEB = Total children ever born
MCEB = Male children ever born
FCEB = Female children ever born
TCS = Total children surviving
MCS = Male children surviving
FCS = Female children surviving
SXLAST = Sex of last child born
VSLAST = Vital status of last child born (still alive?)
YRLAST = Year of birth of last child born
MOLAST = Month of birth of last child born

Fertility
Fertility is valid if all of the following are true:
• TCEB = MCEB + FCEB, and
• TCS = MCS + FCS, and
• TCEB >= TCS, and
• MCEB >= MCS, FCEB >= FCS, and
• number of boys in the household who declared this person as their mother (using mother person number) ≤ MCS, and
• number of girls in the household who declared this person as their mother (using mother person number) ≤ FCS, and
• woman's age ≥ (11 + TCEB), and
• FCEB > 0 if SXLAST = female, and
• MCEB > 0 if SXLAST = male, and
• FCS > 0 if SXLAST = female and VSLAST = alive, and
• MCS > 0 if SXLAST = male and VSLAST = alive, and
• all responses for last child born information (YRLAST, MOLAST, SXLAST, VSLAST) are complete and valid, or else they are all blank (indicating no births).

Fertility
• Also, maximum number of children (24 total and 12 per sex).
• When bad CEB or CS values can be calculated, then we do that.
• When fertility is not valid, impute a consistent set of fertility responses from a deck (based on age, marital status, education level); then confirm last child born info from woman’s children in household.

Total Births (for women 12:49 years)
[Bar chart: percent of women by number of children, raw versus edited data]

Total children still living (for women 12:49 years)
[Bar chart: percent of women by number of children, raw versus edited data]

Case 1:
PN SEX AGE CEB MCB FCB
01 1 041
02 2 038 04 02 02
03 2 022 71 01 00
04 1 012
05 2 009
06 1 001
CS MCS FCS MLB YRLB SLB VLB
04 01 02 01 02 00 08 1991 06 1999 2 1 1 1
V.27: problems detected in fertility info ... PN= 03
V.27b: imputing TCEB = MCEB+FCEB PN= 03 TCEB=71 MCEB=01 FCEB=00
PN SEX AGE CEB MCB FCB
01 1 041
02 2 038 04 02 02
03 2 022 01 01 00
04 1 012
05 2 009
06 1 001
CS MCS FCS MLB YRLB SLB VLB
04 01 02 01 02 00 08 1991 06 1999 2 1 1 1

Case 2:
PN SEX AGE CEB MCB FCB
01 2 054
02 2 035 02 01 01
03 2 020 00
04 2 014 00
05 1 012
06 2 005
CS MCS FCS MLB YRLB SLB VLB
02 01 01 2 1
V.27: problems detected in fertility info ... PN=02
V.27POST: LAST info blank, imputing from youngest child PN= 02 (updates FCEB, TCEB, FCS, TCS)
V.27e: imputing fertility data from AFERTILITY PN=03
V.27e: imputing fertility data from AFERTILITY PN=04
PN SEX AGE CEB MCB FCB
01 2 054
02 2 035 02 01 01
03 2 020 00 00 00
04 2 014 00 00 00
05 1 012
06 2 005
CS MCS FCS MLB YRLB SLB VLB
02 00 00 01 00 00 01 00 00 11 1995 2 1

Case 3:
PN SEX AGE CEB MCB FCB CS MCS FCS MLB YRLB SLB VLB
01 2 042 05 02 03 05 02 03 09 1997 2 1
02 2 021 01 01 01 01 04 1994 1 1
03 2 018 01 00 00 01 00 00 01 2001 1
04 1 020
05 1 014
06 2 003
07 1 002
08 2 000
V.27: problems detected in fertility info ... PN= 02
V.27c: imputing FCEB = TCEB-MCEB PN= 02
V.27g: imputing FCS = TCS-MCS PN= 02
V.27: problems detected in fertility info ... PN= 03
V.27b: imputing TCEB = MCEB+FCEB PN= 03
V.27j: imputing fertility from hot deck PN= 03
PN SEX AGE CEB MCB FCB CS MCS FCS MLB YRLB SLB VLB
01 2 042 05 02 03 05 02 03 09 1997 2 1
02 2 021 01 01 00 01 01 00 04 1998 1 1
03 2 018 01 00 01 01 00 01 01 2001 2 1
04 1 020
05 1 014
06 2 003
07 1 002
08 2 000

Fertility
Issues:
  – If woman reports zero TCEB and leaves rest blank, does that mean “no fertility” or “error”?
  – See if last child born can be handled separately from rest of fertility, so that full set is not imputed when last child born has problems and rest is valid

Conclusions
• Edits part of the series of census procedures
• Usually more for aesthetics than technical enhancement
• Hardware and software changing rapidly
• The revolution continues!
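As a worked illustration of the fertility validity conditions listed in the Fertility slides, the following Python sketch returns the list of conditions a woman's record fails. It is a hedged example rather than the pilot census program: the `Fertility` record, its field names, the codes (1 = male, 2 = female, 1 = alive, 2 = dead), and the function name `fertility_problems` are assumptions, and the sample values are illustrative only.

```python
from dataclasses import dataclass, replace
from typing import List, Optional

MALE, FEMALE = 1, 2      # assumed sex codes
ALIVE, DEAD = 1, 2       # assumed vital status codes
MAX_TOTAL, MAX_PER_SEX = 24, 12   # maxima stated on the slide


@dataclass
class Fertility:
    """Hypothetical fertility record for one woman aged 12 to 49."""
    age: int
    tceb: int
    mceb: int
    fceb: int
    tcs: int
    mcs: int
    fcs: int
    sxlast: Optional[int] = None
    vslast: Optional[int] = None
    yrlast: Optional[int] = None
    molast: Optional[int] = None
    sons_in_hh: int = 0        # boys naming her via mother person number
    daughters_in_hh: int = 0   # girls naming her via mother person number


def fertility_problems(f: Fertility) -> List[str]:
    """Return the failed conditions; an empty list means the record is valid."""
    problems = []
    if f.tceb != f.mceb + f.fceb:
        problems.append("TCEB != MCEB + FCEB")
    if f.tcs != f.mcs + f.fcs:
        problems.append("TCS != MCS + FCS")
    if f.tceb < f.tcs or f.mceb < f.mcs or f.fceb < f.fcs:
        problems.append("more surviving than ever born")
    if f.sons_in_hh > f.mcs or f.daughters_in_hh > f.fcs:
        problems.append("more children in household than surviving")
    if f.age < 11 + f.tceb:
        problems.append("age < 11 + TCEB")
    if f.tceb > MAX_TOTAL or f.mceb > MAX_PER_SEX or f.fceb > MAX_PER_SEX:
        problems.append("too many children")

    last = (f.sxlast, f.vslast, f.yrlast, f.molast)
    if all(v is None for v in last):
        return problems            # all blank: treated as no births
    complete = (f.sxlast in (MALE, FEMALE) and f.vslast in (ALIVE, DEAD)
                and f.yrlast is not None
                and f.molast is not None and 1 <= f.molast <= 12)
    if not complete:
        problems.append("last child info incomplete or invalid")
        return problems
    born = f.mceb if f.sxlast == MALE else f.fceb
    surviving = f.mcs if f.sxlast == MALE else f.fcs
    if born == 0:
        problems.append("last child's sex has no births")
    if f.vslast == ALIVE and surviving == 0:
        problems.append("last child alive but none surviving of that sex")
    return problems


# Illustrative values: one son, born June 1999, still alive.
ok = Fertility(age=22, tceb=1, mceb=1, fceb=0, tcs=1, mcs=1, fcs=0,
               sxlast=MALE, vslast=ALIVE, yrlast=1999, molast=6, sons_in_hh=1)
bad = replace(ok, tceb=71)   # a keying error in TCEB, as in message V.27b
print(fertility_problems(ok))
print(fertility_problems(bad))
```

A record that fails only the sum conditions is a candidate for recalculation (TCEB = MCEB + FCEB) before any deck imputation, which keeps changes to the original data to a minimum.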