1901 Toronto Area Test Sample User Guide
TATS - 1901 Toronto Area Test Sample User Guide. Canadian Century Research Infrastructure (CCRI) at York University.
DRAFT

Contents

Introduction
The Census of 1901
Schedule 2
Sample Design
Toronto City and Greater Toronto Area (GTA), 1901
The Sample
Large Households, Institutions or Group Quarters
Data Entry
Local Verification
Sample Substitutions
Variables in the Database
Recommendations for Coding Schemes
Electoral Maps for 1901 Toronto Area Districts

Introduction

The 1901 Toronto Area Test Sample (TATS) is a large sample, representing a full 20 percent of all census-defined dwellings, drawn from the original population schedule (Schedule 1) of the 1901 census of Canada for Toronto. All individuals recorded in these dwellings are in the sample. The sample N is 52,702 records.

TATS was taken as part of the testing of data entry protocols and the training of data entry operators in the course of conducting the Canadian Century Research Infrastructure (CCRI) project at York University. For accounts of the CCRI, and access to the publicly available 1911 national sample of households and related materials, go to http://ccri.library.ualberta.ca/.

The design of the Toronto sample was based directly on experience in a prior project, the Canadian Families Project (CFP), conducted at the University of Victoria; see http://web.uvic.ca/hrd/cfp/data/index.html. Professor Gordon Darroch, a team leader of the CCRI project, was one of the principal researchers in the CFP. We wish to thank Doug Thompson and Patrick Frisby at the University of Victoria for their assistance in drawing the TATS sample.

The TATS was initially under the supervision of Dr. Evelyn Ruppert, the first coordinator of the York CCRI Centre, now Senior Research Fellow, CRESC, The Open University, United Kingdom (2006).
Alden Cudanin and Nicola Farnworth of the York CCRI Centre were responsible for supervising and coordinating the test sample data entry. Alden Cudanin and Gordon Darroch were responsible for this User Guide and for making the data files available.

The Census of 1901

We recommend that researchers read the account of the conduct of the 1901 census provided in the User Guide of the Canadian Families Project (see web link above). They can also find the original enumerators' instructions on that website.

The population was to be recorded as of March 31, 1901, and all information was meant to be accurate as of that date (rather than the day when the enumerator visited a dwelling). Where information was to relate to a year or "the census year," the year ran from 1 April 1900 to 31 March 1901.

The Canadian census of 1901 was a de jure census. The Census Act did not define that term, but the Instructions to Officers make an attempt: people were to be enumerated, not necessarily where they were actually located on 31 March 1901, but in "their home or usual place of abode." See also articles 70 through 78 of the Instructions (pp. xx-xxi) and the references to Special Form A on the CFP website. As the CFP documentation indicates, persons temporarily absent, such as a fisher at sea, a logger in a logging camp or a commercial traveller on the road, were to be enumerated in their usual place of residence. In the case of persons away from home, where there was no "fixed period of return," there should be no Schedule 1 entry. The application of these instructions must have led to some enumeration difficulties and variations in results.

Although this sample was recorded in the service of the larger CCRI project, it is unusually large and valuable as a research tool, and we wish to make it publicly available. We did not, however, have the resources to code the variables.
Below we provide an account of the sample, the data entry procedures, the variables in the database, and a guide to possible coding by users drawing on the CFP coding schemes.

Schedule 2

Users should note that Schedule 2 of the 1901 census has also been preserved and is available in digital image form. It was intended to be an extension of Schedule 1: an enumeration of properties for persons named in Schedule 1, not an independent enumeration of all properties in the country. We have entered data from Schedule 1 only, but Schedule 2 data on properties could be added to the file by interested researchers.

Sample Design

The sampling process used for the Canadian Families Project (CFP) was adopted for the 1901 Toronto Area sample. The sample points are census-defined dwellings. Whereas the CFP consisted of a 5% sample of each subdivision within the entire country, the TATS is a 20% sample of dwellings within each subdivision of the Toronto Area (see geographic boundaries below).

An indexing process, which entailed counting and documenting the total number of dwellings in each subdivision, was undertaken during the creation of the CFP sample and did not have to be repeated for the Toronto Area sample. Since only selected census districts were being sampled, a system was developed to create an 'index' file consisting of a list of the number of dwellings/sample points by district and subdistrict for the entire area. An SPSS script then selected a stratified random sample of the numbers in this list, creating the 20% 1901 Toronto Area sample. All the information for each dwelling/sample point selected was then entered into the database.

Further details on 1901 CFP sampling can be found on page 6 of The National Sample of the 1901 Census of Canada User Guide, 2002, http://web.uvic.ca/hrd/cfp/data/index.html.
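
The selection step described above can be illustrated with a short sketch. The project used an SPSS script that is not reproduced in this guide, so the Python below is only an illustration of drawing a 20% random sample of dwelling numbers within each subdistrict. The index entries, the function name draw_dwelling_sample, the rounding rule and the seed are all assumptions made for the example, not details taken from the CFP sampler.

```python
import random


def draw_dwelling_sample(index, fraction=0.20, seed=1901):
    """Return {stratum: sorted list of sampled dwelling numbers}.

    index maps (district, subdistrict) to the number of dwellings counted
    in that subdistrict. Dwellings are assumed to be numbered 1..n.
    """
    rng = random.Random(seed)  # fixed seed so the draw can be repeated
    sample = {}
    for stratum, n_dwellings in sorted(index.items()):
        # Assumption: the target count is the fraction rounded to a whole dwelling.
        k = round(n_dwellings * fraction)
        # Sample without replacement within the stratum.
        sample[stratum] = sorted(rng.sample(range(1, n_dwellings + 1), k))
    return sample


# Hypothetical index contents (not taken from the CFP index file).
index = {("116", "A"): 250, ("117", "B"): 105}
sample = draw_dwelling_sample(index)
for stratum, dwellings in sample.items():
    print(stratum, len(dwellings), dwellings[:5])
```

Sampling separately within each subdistrict is what keeps the sampling fraction close to 20% in every stratum, rather than only in the area as a whole.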

Toronto City and Greater Toronto Area, 1901

The TATS was designed to represent the City of Toronto and what would be the equivalent of the 1901 Greater Toronto Area. Districts 116, 117 and 118 and parts of district 129 cover all six wards of the city; the remaining sub districts of district 129 and all of districts 130 and 131 cover the rest. The following chart identifies the districts and their sub districts sampled within the entire Toronto Area.

District              Sub District
116 Toronto Centre    Toronto City, ward 3 (Part)
117 Toronto East      Toronto City, ward 1
                      Toronto City, ward 2
                      Toronto City, ward 3 (Part)
118 Toronto West      Toronto City, ward 3 (Part)
                      Toronto City, ward 4
                      Toronto City, ward 5
                      Toronto City, ward 6
129 York East         East Toronto Village
                      Markham
                      Markham, Village
                      North Toronto (Part) Town-Ville
                      Scarboro
                      Toronto City, ward 1
                      Toronto City, ward 2
                      Toronto City, ward 3
                      Toronto City, ward 4
                      York (part)
130 York North        Aurora, Town – Ville
                      Bradford, Village
                      Georgina & Georgina Island
                      Gwillimbury, East – Est
                      Gwillimbury, North - Nord & Snake Island
                      Gwillimbury, West - Ouest
                      Holland Landing, Village
                      King
                      Sutton Village
131 York West         Etobicoke
                      North Toronto (part), Town – Ville
                      Richmond Hill, Village
                      Toronto Junction, Town Ville
                      Vaughn
                      Weston, Village
                      Woodbridge, Village
                      York (part)

Provided at the end of this user guide are the electoral boundary maps for each district in the 1901 Toronto Area sample. These maps allow users to examine further the census districts and subdistricts for which the 1901 returns were enumerated. The electoral maps were published by the federal government in 1895. Since the 1901 census districts and the 1895 electoral districts have very similar boundaries, the electoral maps provide users with an accurate, detailed description of the Toronto Area districts that were enumerated in 1901.

The Sample

The sample yielded 9,187 dwellings and 52,702 individuals. The Toronto Area population in 1901, according to the published census, was 268,899; there were 46,134 dwellings within the area. Our sample appears to include 19.91% of all dwellings and 19.59% of all individuals counted in the published census. Users need to remember that the sampling unit here is the dwelling, not the individual. However, comparisons with published census data in the following tables suggest that the sample closely represents the distributions of dwellings and population by districts and subdistricts. Further comparisons with published tables of dwelling and population characteristics for the same geographic area can be conducted by researchers.

Table 1. Dwellings by District

District              Sampled Dwellings   Total Dwellings   Sample as % of Total
116 Toronto Centre                  970              4829                  20.09
117 Toronto East                   1743              8586                  20.30
118 Toronto West                   3110             15495                  20.07
129 York East                      1640              8273                  19.82
130 York North                      781              4033                  19.37
131 York West                       943              4918                  19.17
Totals                             9187             46134                  19.91

Table 2. Population by District

District              Sampled Individuals   Total Population   Sample as % of Total
116 Toronto Centre                   5675              28765                  19.72
117 Toronto East                     9281              45621                  20.34
118 Toronto West                    16024              81712                  19.61
129 York East                        8083              40405                  19.94
130 York North                       3606              18778                  19.21
131 York West                       10033              53618                  18.72
Totals                              52702             268899                  19.59

Large Households, Institutions or Group Quarters

Some dwellings selected in the sample, of course, were unusually large, as in the case of hospitals, orphanages, asylums and other institutions, including large boarding houses. The 1901 census treated these as separate dwellings. For the purposes of our test sample, large dwellings were treated as single sample units. Data entry operators were instructed to enter all persons in any large institutions identified as sample points.
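
The sampling fractions reported in Table 1 can be re-derived from the dwelling counts, which is a useful first check before comparing the sample with other published tables. The sketch below uses only the figures printed in Table 1; the variable and function names are illustrative.

```python
# Figures copied from Table 1: (sampled dwellings, total dwellings).
table1 = {
    "116 Toronto Centre": (970, 4829),
    "117 Toronto East": (1743, 8586),
    "118 Toronto West": (3110, 15495),
    "129 York East": (1640, 8273),
    "130 York North": (781, 4033),
    "131 York West": (943, 4918),
}


def sample_percent(sampled, total):
    """Sample as a percentage of the total, rounded to two decimals."""
    return round(100 * sampled / total, 2)


percents = {d: sample_percent(s, t) for d, (s, t) in table1.items()}
total_sampled = sum(s for s, _ in table1.values())
total_dwellings = sum(t for _, t in table1.values())
overall = sample_percent(total_sampled, total_dwellings)

for district, pct in percents.items():
    print(f"{district}: {pct:.2f}%")
print(f"Totals: {total_sampled} of {total_dwellings} = {overall:.2f}%")
```

Because the dwelling is the sampling unit, the dwelling fractions are the ones expected to sit close to 20%; the fractions for individuals depend on how many people lived in the sampled dwellings.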

As outlined on page 6 of the CFP's The National Sample of the 1901 Census of Canada User Guide, 2002, entering all persons in large institutions may affect population estimates based on individual records, sacrificing overall sample precision for the records of the complete population residing in sampled large institutions. Unusual sectors of a population are more appropriately treated as separate sampling strata, as was the practice of the larger CCRI project, but this was beyond the purposes of the TATS. (See CCRI sampling at http://ccri.library.ualberta.ca/.)

Data Entry

Data entry for the 1901 Toronto area sample took place over four years during three separate time periods, serving the purposes of data-entry training. The first was undertaken in the spring of 2003, the second briefly in the summer of 2007 and the third in the winter of 2008. Operationally, the batches of data entry were carried out somewhat differently, but the data were entered uniformly.

Two stages of preparation needed to be completed before any data could be entered. First, the sample point text files provided by the CFP's sampler needed to be extracted and made accessible to the data entry software. Second, the images for the districts needed to be downloaded and made available for use by the data entry operators. All of the images were downloaded from the public websites of Library and Archives Canada and Ancestry.com.

Data entry instructions and procedures were followed as specified in the York 1901 Data Entry Manual. Where a data-entry operator (DEO) found it difficult to enter information from the Schedule 1 forms, the Data Entry Supervisor was consulted and prescribed a solution. Data entry assignment sheets for all sub districts were created and used for tracking DEO progress and completed work.
Data entry operators recorded the number of people entered for each dwelling, any notes on the entry, and additional comments or requests for a second opinion, and indicated that all entered information had been verified.

Local Verification

Each DEO reviewed their work by reopening data entry tasks and reading through the transcribed data to ensure that the correct information had been entered and that the procedures in the data entry manual had been followed. Data entry operators would review each individual record and its corresponding information row by row and column by column, looking for spelling errors, incorrect values and typos. If any questions about or inconsistencies in the entered data arose, the verifying DEO would refer back to the census schedule corresponding to the dwelling in question and investigate. If corrections or changes to the data were needed, the verifying DEO would make them, completing the dwelling verification process. In cases where DEOs were not present to validate their own work, their data entry tasks were verified by another DEO. This verification process occurred after all data had been entered.

Sample Point Substitutions

If a sample point was deemed unusable, DEOs replaced it with the next complete dwelling. A dwelling note recording both the original and the substituted dwelling numbers was then entered in the Parent Table dwelling notes field. Only where an unusable dwelling was the last dwelling in the subdistrict was the previously listed dwelling taken as the substitute sample point.

Variables in the Database

Parent Table Information

Variable Name               Description
Province                    Province as entered at top of Schedule 1
District                    District no. from top of Schedule 1
Subdistrict                 Sub-district letter from top of Schedule 1
Poll                        Polling subdivision no. from top of Schedule 1
Place                       City, town, village or township from top of Schedule 1
EnumeratorLastName          Enumerator's last name from top of Schedule 1
EnumeratorFirstName         Enumerator's first name from top of Schedule 1
NumberInDwelling            Count of persons in dwelling (column 1 counts)
DwellingNumber              Dwelling house no. from column 1, Schedule 1
NumberFamiliesHouseholds    Count of families/households (column 2 counts) in dwelling
Institution                 Name of institution as given by enumerator
DwellingNote                Note on dwelling by operator during data entry (in Access, not in SPSS)
DataEntryOperator           Name of data entry operator who entered information

Child Table Information

Variable Name    Description
PageNo           Page number from top of Schedule 1
LineNo           Line number of individual on Schedule 1
HHNo             Household no. from column 1, Schedule 1
LastName         Surname of family/household from column 2, Schedule 1
FirstName        First name(s) and initials from column 3
Sex              Sex (f or m) from column 4
Colour           Colour (usually w, b, r or y) from column 5
RelHead          Relationship to head of household from column 6
Marstat          Marital status from column 7
Bday             Day and month of birth from column 8
YearBrith        Year of birth (4 digits) from column 9
Ageyr            Age at last birthday from column 10
AgeMo            Age in months (if less than 1 year) from column 10
BPL              Country or place of birth from column 11
UR               If born in Canada, whether birthplace rural or urban, column 11
ImmYear          Year of immigration to Canada from column 12
NATYR            Year of naturalization from column 13
RACE             Racial or tribal origins from column 14
NATL             Nationality from column 15
RELIGION         Religion from column 16
OCC              Profession, occupation, trade or means of living from column 17
RETIRED          R for retired from column 17
OENMEANS         Living on own means from column 18
EMPLOYER         Employer from column 19
EMPLOYEE         Employee from column 20
OWNACCT          Working on own account from column 21
TRADE            Working at trade in factory or home from column 22
WORKPLC          Name of workplace as given by enumerator
MOEMPFAC         Months employed at trade in factory from column 23
MOEMPHOM         Months employed at trade in home from column 24
MOEMPOTH         Months employed in other than trade in factory or home, column 25
EARNINGS         Earnings from occupation or trade from column 26
EARNSPER         Period of earnings if not yearly
EXEARN           Extra earnings from other than chief occupation, column 27
MOSCHOOL         Months at school in year from column 28
CANREAD          Can read from column 29
CANWRITE         Can write from column 30
ENGLISH          Can speak English from column 31
FRENCH           Can speak French from column 32
MTONGUE          Mother tongue from column 33
INFIRM           Infirmities from column 34
INDNOTE          Note entered by operator during data entry
PROPWNER         Property listed on Schedule 2 (see page and line number for linkage)

Recommendations on Coding Schemes

The TATS provides data in the form transcribed directly from the census enumerations; the data are uncoded. Researchers can, of course, develop or adopt any suitable coding schemes. The detailed and well-tested CFP coding schemes are recommended. The 1901 CFP coded variables are listed below along with descriptions of the coding schemes used, as outlined in the CFP User Guide, http://web.uvic.ca/hrd/cfp/data/index.html. For the full coding, consult the User's Guide on this website. In the listings below, an asterisk (*) marks a CFP coded variable.

PROV2            * Numeric code for province, from Province

Thomas Hillman's Census Returns...1901 links District numbers to provinces and territories. We created this variable from District numbers in order to see if there was a difference between our totals for each Province (as entered from the enumeration form) and our totals for provinces as inferred from District numbers. The results were almost identical, suggesting that enumerators almost always knew which province or District they were in.
The codes are:

1 British Columbia
2 Manitoba
3 New Brunswick
4 Nova Scotia
5 Ontario
6 P.E.I.
7 Quebec
8 Territories
9 Unorganized

RelHead          Relationship to head of household from column 6
RELHEAD2         * Numeric code for relationship to head

To allow for comparability with U.S. census samples, we begin with the 4-digit IPUMS codes for RELATE (IPUMS 95 version 1.0 User's Guide). Note that the codes are not sex-specific (son gets the same code as daughter). Unfortunately these codes by themselves lacked the flexibility to accommodate all of the variations in our sample. In the IPUMS system lodgers, boarders, roomers and tenants fall between 1201 and 1207, and employees begin at 1210, leaving no room to add the many variations we find among lodgers and their kin, boarders and their kin, etc. We have therefore added a fifth digit to the codes. The first four digits allow for comparisons with IPUMS samples.

In our sample it is often difficult to distinguish institutional employees from non-institutional employees, since there is no single, clear identification of institutions (and enumerators did not always make an entry in column 7 of Schedule 2, the name of the institution). Thus some institutional employees may appear in the coding sequence for domestic employees (as, for instance, in the case of a maid, cook or laundress who happens to be working in a hotel or an asylum). We have a distinct numeric sequence (13261 through 13273) for religious institutions. Our codes for "other relatives" tend to be inclusive: they include wards, foster children and godsons, but not orphans (who appear under Non-related Youth).

Bday             Day and month of birth from column 8
BDAY, BMONTH     * NOTE: in the CFP, the information from column 8 is split into two separate variables; in the Toronto Area 1901 sample it is collected together in one.

BPL              Country or place of birth from column 11
BPL2             * Numeric code for birthplace

The numeric codes for birthplace are the 5-digit codes from the IPUMS-95 User's Guide version 1.0, but with a major extension of the 150_ _ sequence to include all of the entries for Canada. In the IPUMS codes Canadian provinces fall between 15011 and 15081. We have revised the 150- sequence to allow room for provinces and all specific place names entered in this field. Often the province of a specific place cannot be determined (these are entered under 159_ _).

RACE             Racial or tribal origins from column 14
RACE2            * Numeric code for racial or tribal origin

This field contains the numeric codes for RACE ("racial or tribal origin"). The RACE codes in IPUMS-95 are not applicable. The comparable field in IPUMS is ANCESTR1, the respondent's self-reported ancestry or ethnic origin. We apply the 4-digit ANCESTR1 codes, but a major extension was required to accommodate Canadian aboriginal peoples and "mixed bloods". These are coded from 92_ _ to 98_ _. The codes are intended to reflect all possible variations among the original entries. The coding scheme allows for comparability with IPUMS samples but has the disadvantage (inherent in the IPUMS codes) that aggregation of certain groups important in the Canadian context will not be easy (English are coded 110, Scots 880, Welsh 970). Note that most francophone Canadians were entered as "French" in the original and hence are coded 260.

NATL             Nationality from column 15
NATL2            * Numeric code for nationality

The numeric codes for nationality could not be taken from the IPUMS-95 codes for citizenship (CITIZEN), which are too limited for use here. Instead we apply, virtually intact, the 4-digit codes for ANCESTR1 (as for RACE2 above).
Enumerators respected the instructions about "Canadian," but a few distinct entries for aboriginal nations appear.

RELIGION         Religion from column 16
RELIGION2        * Numeric code for religion

U.S. historical censuses did not ask respondents to state their religion, and so we cannot apply IPUMS codes. The numeric coding scheme used here is designed for ease of aggregation, and groups religions by broad categories or families. The grouping is adapted from J. Gordon Melton, The Encyclopedia of American Religions (Wilmington, N.C., 1978), volumes 1 and 2, a source which has the advantage that it pays attention to the historical development of religions in North America and the European roots of many. Enumerators often did not enter religion for aboriginal peoples (and some of the nonstandard forms in District 206 did not even have a column for religion), or entered merely "pagan." A study of the religious affiliation of aboriginal peoples would require careful over-sampling of reel 6556.

OCC              Profession, occupation, trade or means of living from column 17
OCC1             * Numeric code for profession, occupation, trade...

The 5-digit codes applied here are an extension of those in CCDO, the Canadian Classification and Dictionary of Occupations (Ministry of Supply and Services, 1989), itself an adaptation of ISCO categories. Since enumerators, following their instructions, often stated both the type of work (labourer, clerk, merchant) and the "branch" or sector in which the work was done, a decision had to be made about whether to give priority in coding to the type of work or to the sector. Most coding schemes give priority, of necessity, to the type of work: thus all clerks will appear in the same general category, all agents in another category, managers in another category, whatever sector of the economy they may be in.
The present coding scheme follows this precedent for most occupations: thus with agents, book-keepers, cashiers, checkers, clerks, dealers, and merchants priority is given to the job or function rather than the sector. The richness of the occupation information, however, allows some priority to be given to sector. Thus foremen, inspectors, labourers (other than general or unspecified), “makers,” managers, and manufacturers are grouped with their industry or sector, where it is given by the enumerator. The first 3 digits of the code also allow for fine distinctions by economic sector. Thus, difficult as it sometimes is to make the distinction, we have made 5 general categories for clerks. Enumerators often gave more than one occupation, despite the instruction that “the chief or principal calling is the only one to be recorded.” Thus farmers (711) are distinguished from farmers who were given some other occupation as well as farming (712); and farm employees are a separate category (714). 
The codes are intended to allow for ease of aggregation into very broad categories, using the first 2 digits:

11 Managerial, administrative, financial management, government and related
21 Scientists, architects, and related professionals
23 Law and social institutions
24 Students
25 Occupations in religion
27 Teaching professions
31 Occupations in medicine and health
33 Occupations in the arts and writing
41 Clerical and bookkeeping occupations
51 Commerce and sales occupations
61 Service occupations
71 Agricultural occupations
73 Occupations in fishing, hunting and trapping
75 Occupations in logging and forestry
77 Occupations in mining and oil and gas production
81 through 88 Occupations in primary and secondary processing, manufacture and construction (construction and related fall between 871 and 881)
91/93 Transportation
95 Others (printing and related is 951; stationary engineers and unspecified firemen 953; telegraph and telephone 955)
99 General labour and unclassifiable (with general labour at 991)

OCC2             * Constructed variable; occupation type

No single set of numeric codes can reflect the full complexity of entries under occupation. Users of census data often require a socio-economic ranking system derived from occupation information. We have not applied a socio-economic ranking here, in part because the census already contains a potentially powerful indicator of the social class of respondents in columns 18 through 21. The limited priority given to sector in the codes for occupation (OCC1), however, risks the loss of important information. OCC1 does not allow one easily to focus on all labourers, or all managers, for instance. To compensate for this loss we apply a simple 2-digit code to flag the presence of specific terms in the occupation information entered by the enumerator.
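
Because the TATS data are uncoded, users who want an OCC2-style flag must construct it from the transcribed OCC text themselves. The sketch below shows one possible way to do so, using a handful of the CFP term codes listed in this section. The matching rule (case-insensitive, whole-word, first listed term wins) and the function name occ2_flag are assumptions made for illustration; the CFP's own matching procedure is not described in this guide.

```python
import re

# A few term codes from the OCC2 list in this guide; the full list is longer.
OCC2_TERMS = {
    "manufacturer": "01",
    "proprietor": "02",
    "manager": "10",
    "foreman": "40",
    "labourer": "82",
    "servant": "88",
    "apprentice": "90",
}


def occ2_flag(occupation):
    """Return the 2-digit code of the first listed term found, else None.

    Assumption: when several terms occur in one entry, the term listed
    first in OCC2_TERMS takes priority.
    """
    text = occupation.lower()
    for term, code in OCC2_TERMS.items():
        # Whole-word match, so that "man" inside "foreman" style overlaps
        # cannot trigger a different term by accident.
        if re.search(r"\b" + re.escape(term) + r"\b", text):
            return code
    return None  # corresponds to a blank (system-missing) field


for entry in ["General Labourer", "Foreman, foundry", "Farmer"]:
    print(entry, "->", occ2_flag(entry))
```

Codes are kept as strings so that leading zeros such as "01" survive; a user loading the result into SPSS or a similar package may prefer numeric codes instead.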
At the very least, these codes will allow users to retrieve certain occupations from the occupational hierarchy that existed in many sectors at the turn of the century. Where the following words do not appear, the field is blank (in SPSS it is "system-missing").

01 Manufacturer
02 Proprietor
03 Owner
04 Employer
10 Manager/president
12 Secretary
13 Assistant secretary
14 Master
15 Partner
16 Chief/chef
17 Captain
20 Superintendent
21 Supervisor
22 Inspector
30 Agent
31 Solicitor
32 Assistant agent
40 Foreman/forewoman
41 Overseer
42 Boss
81 Employee
82 Labourer
83 Worker
84 Man
85 Hand
86 Woman/lady
87 Operative
88 Servant/domestic
90 Apprentice
91 Assistant
92 Boy
93 Helper
94 Girl/maid
95 Child
96 Son/daughter
97 Wife
98 Woman

MTONGUE          Mother tongue from column 33
MTONGUE2         * Numeric code for mother tongue

We applied the MTONGUE detailed codes from IPUMS-95, but added a series (9200 to 9295) for aboriginal languages in Canada.

INFIRM           Infirmities from column 34
INFIRM2          * Numeric code for infirmities

We created our own numeric codes for infirmities, grouped by first digit as follows:

1 blind
2 deaf/deaf and dumb
3 dumb only
4 unsound mind
5 lamed/cripple
6 idiocy
7 unspecified infirmity or invalid
8 old age/palsy/sick
9 other/illegible

ELECTORAL MAPS

Electoral Map 1 – District 116 Toronto Centre
Electoral Map 2 – District 117 Toronto East
Electoral Map 3 – District 118 Toronto West
Electoral Map 4 – District 129 York East
Electoral Map 5 – District 130 York North
Electoral Map 6 – District 131 York West