Document1

Documentation file for test of CCRI Household Classification 1911 for use in RDC for 1921-51
RDC project: File 493861
Contract no: Moldofsky 2688

Steps:

1. Create directories to correspond to RDC directory structure:
   H:\moldofsky2688\data
   H:\moldofsky2688\syntax

2. Within data directory within folder: copy original 1911 dataset from:
   P:\project\CFI_CCRI\Research\Research_1911\1911_DATA\original\ \CCRI-CDN-Census1911V20091117.sav
   FOR 1921-51 datasets: Make sure MISSING values are defined as such, using the variable def, for all variables used in the syntax: RELATIONSHIP, MARITAL_STATUS, SEX, Derived_Surname_Number, Derived_Age_In_Years

3. Large dwellings (LDs) of Single Unit type (SU) (such as camps, prisons) will be excluded from analysis. LDs of Multiple Unit type (apartments, rooming houses) will be included.
   Extract records with Dwelling_type= MU or UU and save as file:
   Select cases if: Dwelling_Unit_Type = "UU" OR Dwelling_Unit_Type = "MU"
   CCRI_CDN_Census1911V20091117_MU_UU_only.sav
   This will be the base working data file from which all others are created. 355036 records.

4. Use Gordon Darroch’s syntax file as basis, and edit for new paths and filenames.
   Original file: Darroch General HHLD Syntax for 1911 - four files with comments.sps
   NEW FILE: MOLDOFSKY General HHLD Syntax for 1911 - four files with comments.sps

5. Data file corrections part 1:
   Darroch’s file documents corrections made after the fact because of missing values which were later inferred. Better method would be to analyze and correct these before processing. Issues dealt with as below.
   Corrections made manually and saved in file: CCRI_CDN_Census1911V20091117_MU_UU_revised.sav
   This now becomes the new “working file.”

5a. SEX missing for HEAD OF HOUSEHOLD. Can be inferred from name and occupation in most cases.
   Select cases: SELECT IF (MISSING (SEX) and RELATIONSHIP = 1).
   Save file as: SEX_HEAD_MISSING.spv. 47 records selected.
   Copy variable SEX to base working file: SEX_HEAD_REV
   Find identified cases by Derived_Individual_Id. Infer and correct SEX_HEAD_REV. Substitute this variable for SEX throughout the syntax.

5b. MARITAL_STATUS missing for HEAD OF HOUSEHOLD. Can be inferred from examination of household members in many cases.
   Select cases: SELECT IF (RELATIONSHIP = 1 and MISSING (MARITAL_STATUS)).
   Save file as: MARITAL_STATUS_HEAD_MISSING.spv. 322 records selected.
   Copy variable MARITAL_STATUS to base working file: MARITAL_ST_REV
   Find identified cases by Derived_Individual_Id. Infer and correct MARITAL_ST_REV. Substitute this variable for MARITAL_STATUS throughout the syntax.

5c. RELATIONSHIP incorrect in connection with the missing MARITAL_STATUS of 5b. Can be inferred: for example, if the head is MARRIED, the unknown relationship of the partner can be changed to SPOUSE or WIFE.
   Copy variable RELATIONSHIP to RELATION_REV. Only correct if 5b requires it. Substitute this variable for RELATIONSHIP throughout the syntax.

6. Data file corrections part 2:
   Edited Gordon’s syntax file to de-bug it and add comments. Ran tests on the Alberta dataset, see:
   H:\moldofsky_2688\data\CCRI_1911_TEST\DATA\Alberta_test
   Made further changes and refinements and then ran on the entire dataset. Final version:
   MOLDOFSKY General HHLD Syntax for 1911 REV 2012 02 29.sps
   This syntax splits the working file into 4 parts based on household heads: Regular heads, Double (multiple) heads, No heads and Solo heads (only one person in household). Each of these files is then processed separately to assign HHLDTYPE (Darroch’s 25 classes of Household type).
   Examining these files after completion and debugging revealed a number of other errors/inconsistencies in the dataset, and it was decided to correct these. Many of these were revealed when the HHLD_TYPE value was MISSING; some by visual examination of data. These were handled as described below.
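   The four-way split described in step 6 can be illustrated outside SPSS with a small Python sketch. It is not the Darroch/Moldofsky syntax itself: the rule that a one-person household counts as "solo" regardless of its RELATIONSHIP value is an assumption, the RELATIONSHIP codes other than 1 are invented, and the records are toy data rather than census rows.

```python
from collections import defaultdict

HEAD = 1  # RELATIONSHIP code for head of household, as used in steps 5a and 5b


def split_by_heads(records):
    """Group household ids into the four categories named in step 6.

    Assumption: a one-person household is "solo" whatever its RELATIONSHIP
    value; the actual syntax file may test this differently.
    """
    size = defaultdict(int)   # persons per household
    heads = defaultdict(int)  # persons coded as head per household
    for rec in records:
        hh = rec["Derived_Household_Id"]
        size[hh] += 1
        if rec["RELATIONSHIP"] == HEAD:
            heads[hh] += 1

    groups = {"solo": [], "regular": [], "double": [], "no_head": []}
    for hh in sorted(size):
        if size[hh] == 1:
            groups["solo"].append(hh)
        elif heads[hh] == 0:
            groups["no_head"].append(hh)
        elif heads[hh] == 1:
            groups["regular"].append(hh)
        else:
            groups["double"].append(hh)
    return groups


# Invented toy records; codes 2 and 3 are placeholders for non-head relationships.
toy = [
    {"Derived_Household_Id": 10, "RELATIONSHIP": 1},
    {"Derived_Household_Id": 10, "RELATIONSHIP": 2},
    {"Derived_Household_Id": 11, "RELATIONSHIP": 1},
    {"Derived_Household_Id": 12, "RELATIONSHIP": 1},
    {"Derived_Household_Id": 12, "RELATIONSHIP": 1},
    {"Derived_Household_Id": 13, "RELATIONSHIP": None},
    {"Derived_Household_Id": 13, "RELATIONSHIP": 3},
]
groups = split_by_heads(toy)
```

   A check of this kind on a small extract (such as the Alberta test file) makes it easy to confirm that every household lands in exactly one of the four files before the classification is run.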
   Corrections were made to working file CCRI_1911_V20091117_MU_UU_only.sav and saved as CCRI_1911_V20091117_MU_UU_revised.sav, which becomes the new working file. Syntax changed for next round.

6a. Solo heads (file 1911_Solo_Head_ALLVARIABLES.sav)
   29 records had HHLD_TYPE MISSING. This is because MARITAL_STATUS or SEX were missing in the original data file.
   For Marital Status: Darroch’s syntax recodes missing values as 0 for both variables, which causes problems later in classification. Therefore, for SOLO heads, change the syntax to Recode Not_Married_Head (missing=4) instead of 0. This assumes missing is unmarried, which for households of people living alone is overwhelmingly true.
   For SEX: examine records where MISSING(SEX) (only a few, which were not caught in 5a) and make corrections manually in SEX_HEAD_REV where inferrable from other data such as FIRST_NAME or RELATIONSHIP.

6b. Solo heads (file 1911_Solo_Head_ALLVARIABLES.sav)
   Incorrect identification of Households as different: Examination of the file revealed many records with consecutive Derived_Household_Ids and the same surnames. This should not happen and is a data entry error; if related people live in the same dwelling they should be part of the same household. In most cases these were errors in the way the households were numbered on the manuscript schedule, which were then carried through into the database because of our “verbatim data entry” policy.
   The file 1911_Solo_Head_ALLVARIABLES.sav was put into Access for analysis to identify duplicate Household/Surname records. These records were copied into a separate file for documentation purposes: 1911_Solo_Heads_Dup_DIDs_Lastnames.sav.
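   The duplicate check in 6b was done in Access; the same idea can be sketched in Python as a rough illustration. The field name SURNAME is hypothetical (the source does not name the surname field used), household ids are assumed to be integers, and the records below are invented.

```python
def consecutive_same_surname(solo_heads):
    """Return id pairs of solo-head records with consecutive household ids
    and matching surnames (compared case-insensitively, ignoring padding)."""
    ordered = sorted(solo_heads, key=lambda r: r["Derived_Household_Id"])
    pairs = []
    for prev, cur in zip(ordered, ordered[1:]):
        consecutive = cur["Derived_Household_Id"] - prev["Derived_Household_Id"] == 1
        same_name = prev["SURNAME"].strip().lower() == cur["SURNAME"].strip().lower()
        if consecutive and same_name:
            pairs.append((prev["Derived_Household_Id"], cur["Derived_Household_Id"]))
    return pairs


# Invented toy records; SURNAME is a hypothetical field name.
toy_solo = [
    {"Derived_Household_Id": 100, "SURNAME": "Smith"},
    {"Derived_Household_Id": 101, "SURNAME": "Smith"},
    {"Derived_Household_Id": 102, "SURNAME": "Jones"},
    {"Derived_Household_Id": 104, "SURNAME": "Jones"},
    {"Derived_Household_Id": 105, "SURNAME": "jones "},
]
candidates = consecutive_same_surname(toy_solo)
```

   Pairs flagged this way are only candidates: unrelated neighbours can share a surname, so each pair would still need to be checked by hand before households are combined.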
   These records WILL BE combined in the new working file by assigning them the identical Derived_Household_Id and changing the following fields as necessary: Derived_Household_Id_In_Dwelling, Derived_Person_Num_In_Household, Derived_Person_Num_In_Dwelling, Derived_Surname_Number

6c. No Heads file (file 1911_No_Head_ALLVARIABLES.sav)
   33 records had HHLD_TYPE MISSING. This is because MARITAL_STATUS was missing in the original data file; in 4 records SEX was missing as well.
   For MARITAL_STATUS and SEX: examine records where either is MISSING (these were not caught in 5a), and make corrections manually in MARITAL_ST_REV and SEX_HEAD_REV where inferrable from other data. This is possible most of the time. This process was augmented by examination of manuscript schedules online at:
   http://automatedgenealogy.com/census11/index.jsp
   In the process, 7 records were identified where entire households were entered with MISSING values for RELATIONSHIP as well as MARITAL_STATUS and SEX. Of these, 5 of the 7 seem to be clear from the manuscript schedules. Therefore these data were added to the dataset in the fields RELATION_REV, MARITAL_ST_REV and SEX_HEAD_REV.
   These records and the corrections are documented in the file: No_Head_HHLDTYPE_Missing_Corrections.sav (total 59 records)
   The corrections were then made manually in the new working file.

7. After all corrections were made to new working file: CCRI_1911_V20091117_MU_UU_revised.sav
   DELETE all constructed variables (egonum, TOTinDWELL, TOTinHHLD, etc).
   Entire syntax was re-run to create new separated classified files. These were then merged to create one classified file with all heads.
   Result is a classification file with all records of heads in it: 1911_ALL_Head_only_classif.sav
   To this file we add Gordon’s reclassification into two reclassed grouped variables for 8 classes and 3 classes.
   See in file: *Reclassification to create HHLD_8 and HHLD_3

8. Tested merging output classification files with original data file (test for Alberta).
   Merge screen set up as below, to add HHLD_type variable, using Derived Household ID as the key or linking variable:
   Open classification file - Sort on Derived indiv id
   Open full data file - Sort on Derived indiv id
   Merge files as follows, using Derived household id as KEY VARIABLE:
   Data -> Merge files -> Add variables (see below)
   Syntax generated was used as a guide as follows, with appropriate filename changes. This has been incorporated into the main syntax file.
   This file is then merged with original file to create new cleaned full HHLDTYPE file: CCRI_1911_V20091117_MU_UU_HHLDTYPE.sav
   see: *Merging with full working file based on Derived Household ID - Using Data - Merge - Add variables

9. Final syntax file: MOLDOFSKY General HHLD Syntax for 1911 REV 2012 02 29.sps
   Directory: C:\moldofsky_2688\syntax\CCRI_1911_TEST
   Data: C:\moldofsky_2688\data\CCRI_1911_TEST\2012_02
   Copied into: H:\moldofsky_2688\ as defined at top of file.

10. Test adding other variables to HHLD file for aggregation etc.
   Use Merge->Add variables - Merge by Derived Individual ID to add fields to the Household only file.
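
   The keyed merge of step 8 (one classification record per household, attached to every person in that household) can be illustrated with a minimal Python sketch. This is an analogue of the Merge -> Add variables operation, not the syntax that SPSS generated; the HHLD_TYPE codes and all records are invented, and the duplicate-key check is an added safeguard rather than something described above.

```python
def add_household_type(full_records, head_records):
    """Attach HHLD_TYPE from the heads-only classification records to every
    person record, using Derived_Household_Id as the key variable."""
    lookup = {}
    for head in head_records:
        hh = head["Derived_Household_Id"]
        if hh in lookup:
            # A keyed lookup needs exactly one classification per household.
            raise ValueError(f"duplicate classification for household {hh}")
        lookup[hh] = head["HHLD_TYPE"]

    merged = []
    for rec in full_records:
        row = dict(rec)  # copy so the input records are left unchanged
        row["HHLD_TYPE"] = lookup.get(rec["Derived_Household_Id"])  # None if unmatched
        merged.append(row)
    return merged


# Invented toy data; the HHLD_TYPE codes 7 and 1 are placeholders.
full = [
    {"Derived_Individual_Id": 1, "Derived_Household_Id": 10},
    {"Derived_Individual_Id": 2, "Derived_Household_Id": 10},
    {"Derived_Individual_Id": 3, "Derived_Household_Id": 11},
    {"Derived_Individual_Id": 4, "Derived_Household_Id": 12},
]
heads = [
    {"Derived_Household_Id": 10, "HHLD_TYPE": 7},
    {"Derived_Household_Id": 11, "HHLD_TYPE": 1},
]
merged = add_household_type(full, heads)
unmatched = [r["Derived_Individual_Id"] for r in merged if r["HHLD_TYPE"] is None]
```

   Listing the unmatched person records after the merge is a quick way to find households that never received a classification, which is how several of the errors in step 6 came to light.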