Statistics Netherlands
Division of Social and Spatial Statistics
Statistical Analysis Department
P.O. Box 4000
2270 JM Voorburg
The Netherlands
Email: chmn@cbs.nl

REGISTER-BASED HOUSEHOLD STATISTICS

Carel Harmsen and Abby Israëls

Paper for European Population Conference 2003: European Populations: Challenges and Opportunities
26 – 30 August 2003, Warsaw, Poland

Date: 22 August 2003

1. Introduction

In spring 2001 Statistics Netherlands presented, for the first time, household statistics based mainly on integral data from the municipal population registers. Until then, the continuous Labour Force Survey had been the data source for household statistics. An important advantage of the new source is that consistency between population statistics and household statistics is achieved at the micro level. As a result, information on households can be published for a broad range of variables that were not available from the Labour Force Survey alone: households at the municipal level, ‘foreign’ households, the foreign population by household position, etc. The population in institutional households is also integrated into the production files; in the old data set, which was based on the Labour Force Survey, this kind of integration was not possible. Furthermore, the degree of (regional) detail is much greater than in the household statistics based on Labour Force Survey data.

In order to derive household statistics from municipal population register data, the position in the household had to be imputed for about 7 percent of the population. Although imputation does not necessarily produce accurate results for individual households, it does yield robust results at higher levels of aggregation for a wide array of variables. In this paper we explain more about the population register data used and the way in which household statistics are produced.
Special attention is given to the way in which households are determined at addresses with an ambiguous household composition, i.e. a composition that cannot be derived directly from the family ties of the persons living at the same address.

2. Dutch population register data¹

The Dutch population and household statistics compiled by Statistics Netherlands are based on the automated municipal population registers. This registration system is known as the GBA system, which stands for ‘Gemeentelijke Basis Administratie persoonsgegevens’, the municipal basic registration of population data. ‘Basic’ refers to the fact that the GBA serves as the basic register of population data within a system of local registers. These include the local registers on social security, the local registers of water and electricity supply, the local registers of the police departments dealing with the foreign population in the Netherlands, and the (national) registers of the old-age pension fund system.

¹ This section is based on Prins, 2000.

2.1. The GBA system in short

The GBA system was introduced on 1 October 1994². It is a fully decentralised, comprehensive and cohesive population registration system. Owing to legal provisions there is no central counterpart of these municipal registers; in this respect the system is unique in the world. Every municipality in the Netherlands has its own population register containing information on all inhabitants of that municipality. This information is listed per individual inhabitant in a so-called personal list (PL). Each inhabitant has been given a unique personal identification number (PIN), which enables the municipal authorities to link his or her data to those of the spouse, parents and children. For this reason, each PL stores not only the inhabitant's own PIN but also those of the parents, the spouse and the offspring.

² Until 1 October 1994 the population registers were a paper card system. Dutch population statistics were based on those registers, as described by Van den Brekel (1977).

The main features of the GBA system are:
- the municipalities have retained responsibility for storing and supplying data; there is no central database;
- central government has developed an electronic communications network which links all municipalities and users of population data;
- this network provides fully standardised communication between all municipalities and users of population data;
- the network is an electronic mail system based on the EDI principle (Electronic Data Interchange); interactive real-time data exchange is possible;
- central and local government maintain the network jointly.

2.2. Contents of the population registers

A personal list (PL) consists of, among other information, the following categories:
1. personal data;
2. data about the mother;
3. data about the father;
4. data about marriage, partnership, widowhood and divorce;
5. data about the address;
6. data about the offspring.

As mentioned before, the population registers are a basic element in national and local government. This is why much attention was paid to the rules for keeping the population register data up to date. The information needed to update these registers is provided by the local registrar (births, deaths, marriages, partnerships), the judicial courts (divorces), the Ministry of Justice (changes of citizenship) or the persons concerned (house moves, immigration, emigration, and births, marriages and other events that took place abroad).

In a number of situations the population register does not match reality:
- Those who move house should notify the municipality of their new residence. This is not always done directly after the move.
- An unknown number of people live in the country without being registered in the population register.
- Emigrants should notify the local authorities of their departure. However, they often fail to do so: some simply forget, others do not take the trouble of going to the town hall.
- Events that have taken place abroad are usually registered with some delay. Marriages contracted abroad are the most striking example of delayed registration.

Among young people, students for instance, the proportion of misregistrations seems higher than among other groups.

2.3. Statistics Netherlands authorisations

Statistics Netherlands has been authorised to obtain all the data from the municipal population registers that it needs to compile population statistics, given the national needs and the needs of international organisations such as the UN, Eurostat and the Council of Europe. Every year in January Statistics Netherlands obtains a fixed set of data about all inhabitants of the Netherlands. These data are primarily used to give a statistical overview of the population on 1 January. They are also essential for the household statistics.

2.4. Combining electronic GBA messages

The GBA system is an individually oriented system of population data storage. The personal lists (PLs) display data per individual; relations with spouse, children and parents are shown by means of personal identification numbers (PINs). The construction of data on nuclear families and households is an example of combining data about various persons. The minimum condition for people to be grouped in the same nuclear family is that they live at the same address. Relations between persons at the same address can be detected by means of the PINs. We assume that young children belong to a nuclear family unless the data indicate otherwise. Starting with the youngest person at the address, this person's parents are detected through the mutual PINs. The same procedure is followed for the other persons at the address.

The Dutch population statistics are completely based on the municipal population register data.
This means that Statistics Netherlands accepts the register data at face value: no further investigations are carried out on the data received from the population registers. Statistics Netherlands is, of course, aware that not all data may be fully correct. As indicated in section 2.2, some people may be registered at an address other than the one at which they actually live. Although this may affect the family and household statistics, no attempts are made to correct these data.

3. Household statistics

The household statistics of Statistics Netherlands are compiled every year. They contain the number of households by household type, and the number of persons living in households by household position, in the Netherlands on 1 January. Data on households refer to the population in private and institutional households. Private households consist of one or more persons sharing the same address and providing for their own daily needs. A person in a one-person household is referred to as single. The members of multi-person households can be classified according to their position with respect to the so-called reference person³. The following positions can be distinguished for these members:
- child(ren) living at the parental home;
- living together;
- other.

Children may be blood-related children, stepchildren or adopted children living with (one of) the parent(s) and not having any children of their own living at home. If two persons are living together, it is assumed that they have a steady relationship. ‘Other members’ of the household are, for example, boarders, foster children and parent(s) of the reference person or of the partner. Persons living with their children but without a partner at the same address are included in the category ‘single parents’ (table 1).

³ The reference person is a statistical entity. In a heterosexual relationship the reference person is always the man; in homosexual and lesbian relationships, the reference person is the elder of the two.

Table 1. Persons in households, 1 January 2002* (x 1000)

                          Living together
Age group  Child  Single  Not married  Married  Single parent  Other member  Institutional  Total
Male
0-14       1520   -       -            -        -              10            4              1534
15-64      966    911     674          2778     56             103           42             5529
65+        -      163     28           656      9              22            31             909
Total      2487   1074    702          3433     65             134           77             7972
Female
0-14       1452   -       -            -        -              10            3              1464
15-64      676    711     658          2921     308            77            28             5379
65+        -      569     30           511      38             37            104            1290
Total      2128   1280    689          3432     346            123           134            8133
Male and female
Total      4615   2354    1391         6866     411            257           211            16105

* provisional data

The population in institutional households consists of persons whose accommodation and daily needs are provided for by a third party on a professional basis. It includes persons living in homes for the elderly, nursing homes and mental hospitals.

The type of household depends on the relation of its members to the reference person, their marital status and offspring. If the reference person is the only person at an address, the household is clearly a one-person household. Households may also consist of unmarried couples with or without children, and of married couples with or without children. The presence of an ‘other member’ in these households does not affect the classification by household type. A household of more than one person in which the reference person has neither a partner nor children is included in the category ‘other household’. If the reference person is not cohabiting but has children living at home, the category ‘single parent household’ applies (table 6).

3.1. Directly derived households

The main input for the household statistics consists of integral data on the Dutch population, which Statistics Netherlands obtains from the municipal population registers (GBA system, see section 2). First, all persons living in an institutional household are classified as such on the basis of address information. After this, persons in private households are derived.
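The classification rules at the start of section 3 (choice of reference person and household type) can be written down as a small decision function. The sketch below is an illustration under stated assumptions, not Statistics Netherlands' production code: the `Member` record, its field names and the output labels are hypothetical, each member's relation to the reference person is assumed to be known already, and ‘elder’ is approximated by birth year.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Member:
    # Hypothetical record for one member of a private household.
    sex: str                            # "M" or "F"
    birth_year: int
    relation: str                       # relation to the reference person:
                                        # "reference", "partner", "child" or "other"
    married_to_reference: bool = False  # only used for the partner


def choose_reference_person(a: Member, b: Member) -> Member:
    """Footnote 3: the man in a heterosexual couple, the elder in a same-sex couple."""
    if a.sex != b.sex:
        return a if a.sex == "M" else b
    return a if a.birth_year <= b.birth_year else b  # earlier birth year = elder


def household_type(members: List[Member]) -> str:
    """Household type from the members' relations to the reference person.
    'Other' members do not change the type."""
    if len(members) == 1:
        return "one person household"
    partner: Optional[Member] = next(
        (m for m in members if m.relation == "partner"), None)
    has_children = any(m.relation == "child" for m in members)
    if partner is not None:
        couple = "married couple" if partner.married_to_reference else "unmarried couple"
        return couple + (" with children" if has_children else " without children")
    if has_children:
        return "single parent household"
    return "other household"


household = [
    Member("M", 1960, "reference"),
    Member("F", 1962, "partner", married_to_reference=True),
    Member("F", 1990, "child"),
    Member("F", 1935, "other"),  # e.g. a grandmother
]
print(household_type(household))  # married couple with children
```

The grandmother in the example is an ‘other member’ and, as the text states, does not change the household type.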
For every identifiable address, the persons living at that address are identified together with their (family) relationships. The register provides information about family ties: every personal record contains information on the parent(s) and on all children born, irrespective of their present residence, as well as on the person's partner. Together with the detailed address information, this makes it possible to identify all traditional nuclear families. Persons living alone at an address obviously form a one-person household. When more than one person lives at an address, either:
1. all persons at the address are related to each other; or
2. one or more persons are not related to the other persons living at the address.

In the first case the household position and composition are derived directly from the family composition. This covers married couples with and without children, single parent households, most other households and some unmarried couples with children. In a number of specific cases the household composition is derived by applying decision rules. The most important rules are:
- Other persons related to the family nucleus, i.e. brothers/sisters or grandparent(s): if such a relationship can be identified, these persons become part of the household. As a general rule they are classified as other members of the household.
- In the case of two related families, the youngest couple is considered the family nucleus. The other family members are classified as other members of the household.
- Addresses where two brothers/sisters live together are classified as other households. These two persons can be linked because their information on the parents is the same.
- Persons aged 15 or younger living at an address without an identifiable parent are classified as other household members if there is one other family living at the address.
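The PIN-based linking of section 2.4, combined with the brothers/sisters rule above, can be sketched with a union-find structure over the persons registered at one address. Union-find is our choice of technique for illustration; the paper does not say how the linking is implemented. The `Person` fields are a hypothetical simplification of a personal list, and the other decision rules (choice of family nucleus, children aged 15 or younger, same-day moves) are left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Person:
    # Hypothetical extract of a personal list (PL): the person's own PIN
    # plus the PINs of parents and partner.
    pin: str
    birth_year: int
    mother_pin: Optional[str] = None
    father_pin: Optional[str] = None
    partner_pin: Optional[str] = None


def related_groups(residents: List[Person]) -> List[List[str]]:
    """Group the persons registered at one address by family ties found
    through mutual PINs; groups of size 1 are 'unattached' persons."""
    root: Dict[str, str] = {p.pin: p.pin for p in residents}

    def find(x: str) -> str:
        while root[x] != x:
            root[x] = root[root[x]]  # path halving
            x = root[x]
        return x

    def union(a: str, b: str) -> None:
        root[find(a)] = find(b)

    # Start with the youngest person and link to co-resident parents/partner.
    for p in sorted(residents, key=lambda r: r.birth_year, reverse=True):
        for other in (p.mother_pin, p.father_pin, p.partner_pin):
            if other is not None and other in root:  # only persons at this address
                union(p.pin, other)

    # Brothers/sisters without co-resident parents share the same parent PINs.
    first_child: Dict[Tuple[Optional[str], Optional[str]], str] = {}
    for p in residents:
        key = (p.mother_pin, p.father_pin)
        if key == (None, None):
            continue
        if key in first_child:
            union(p.pin, first_child[key])
        else:
            first_child[key] = p.pin

    groups: Dict[str, List[str]] = {}
    for p in residents:
        groups.setdefault(find(p.pin), []).append(p.pin)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: (-len(g), g))


address = [
    Person("A", 1960, partner_pin="B"),
    Person("B", 1962, partner_pin="A"),
    Person("C", 1990, mother_pin="B", father_pin="A"),
    Person("D", 1985),  # no registered ties: 'unattached'
]
print(related_groups(address))  # [['A', 'B', 'C'], ['D']]
```

Groups with more than one member are related persons whose household composition can be derived directly; singleton groups at a multi-person address are the ‘unattached’ persons that feed the imputation of section 3.2.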
When two unrelated persons came to live at an address on the same day, they are classified as a two-person household. At addresses with more than one family unit that do not belong to the types of addresses described in section 3.2, the household composition follows that of the separate families living at the address. For example, if a couple with children, a grandmother and two non-family persons live at an address, the households at that address are the couple with children with one other household member (the grandmother), and two one-person households.

3.2. Imputation

Most of the household information is derived from the population registers. However, these registers do not contain all the information required to distinguish all the different types of households. The position in the household and the composition of the household can be established if the relationships between the persons living at the same address are clear. This is the case for roughly 93 percent of the Dutch inhabitants. For the remaining 7 percent of the population in households, the household position is imputed, mainly on the basis of logistic regression models. For this purpose six groups of addresses are distinguished:
1. two ‘unattached’⁴ persons living at an address;
2. three ‘unattached’ persons living at an address;
3. four to nine ‘unattached’ persons living at an address;
4. one single parent family and an ‘unattached’ person living at an address;
5. one couple and one ‘unattached’ person living at an address;
6. addresses as mentioned above with a postal classification identifying more than one separate postal unit (a kind of substitute for households) at the address.

3.3. Logistic regression

In order to impute household compositions, Labour Force Survey (LFS) information on the composition of households at an address is coupled with the information from the municipal population registers. These coupled records form the basis for the logistic regressions.
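The six address groups of section 3.2 can be assigned mechanically once the related groups at an address are known. A minimal sketch follows; the function name, its parameters and the representation of family units as the strings `"couple"` and `"single parent"` are hypothetical, and any other address configuration is simply reported as not belonging to one of the six groups.

```python
from typing import List, Optional


def imputation_group(n_unattached: int, family_units: List[str],
                     n_postal_units: int = 1) -> Optional[int]:
    """Return the imputation group (1-6) of section 3.2 for an address,
    or None if the address is not one of the six imputed types.

    n_unattached   -- persons without identifiable family ties to anyone else
    family_units   -- family units present, each "couple" or "single parent"
    n_postal_units -- separate postal units registered at the address
    """
    group: Optional[int] = None
    if not family_units:
        if n_unattached == 2:
            group = 1
        elif n_unattached == 3:
            group = 2
        elif 4 <= n_unattached <= 9:
            group = 3
    elif n_unattached == 1 and family_units == ["single parent"]:
        group = 4
    elif n_unattached == 1 and family_units == ["couple"]:
        group = 5
    # Group 6: any of the above, but with more than one postal unit.
    if group is not None and n_postal_units > 1:
        group = 6
    return group


print(imputation_group(2, []))                            # 1
print(imputation_group(1, ["couple"], n_postal_units=2))  # 6
```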
The Labour Force Survey (LFS) data are used to determine the relationship between background variables and the probability of forming one household. In this paper we describe the models derived for:
- the imputation of households at addresses with two ‘unattached’ persons; this type of address is by far the largest group of addresses to be imputed;
- the imputation of addresses with one lone parent and an apparently unattached person, a relatively small but methodologically interesting group;
- the imputation of addresses with three unattached persons;
- the imputation of addresses with four unattached persons.

⁴ ‘Unattached’ means that no identifiable family ties are present between the persons.

3.3.1. The model for imputation of households at addresses with two ‘unattached’ persons

For the imputation, municipal addresses where two ‘unattached’ persons live are selected for which household information from the Labour Force Survey is available. This concerns about 4000 addresses observed in two successive years. These records form the basis for a logistic regression to identify the variables that determine the probability that the persons living at an address are part of two households. The model for 2002 consisted of the following variables (table 2):
- age difference between the two persons (DIFAGE);
- average age of the two persons (AVAGE);
- degree of urbanisation: 1 = highly urbanised, 5 = not urbanised (URBAN);
- number of never-married persons (NONMAR);
- interaction of age difference by same sex (DIFAGE by SAMESEX);
- interaction of average age by same sex (AVAGE by SAMESEX);
- interaction of number of never-married persons by same sex (NONMAR by SAMESEX);
- sex of the eldest combined with sex of the youngest person: male/female, female/male, same sex (SEX).

Table 2. Logistic regression for the probability that two ‘unattached’ persons are part of 2 households

Variable            B        S.E.     Wald      df   Sig.    Exp(B)
DIFAGE              0.139    0.020    46.200    1    0.000   1.149
AVAGET              0.078    0.022    13.178    1    0.000   1.081
URBAN               -0.360   0.060    35.469    1    0.000   0.697
NONMAR              1.924    0.373    26.560    1    0.000   6.849
DIFAGE by SAMESEX   -0.049   0.013    15.121    1    0.000   0.952
AVAGE by SAMESEX    -0.054   0.014    15.661    1    0.000   0.948
NONMAR by SAMESEX   -1.209   0.243    24.674    1    0.000   0.298
SEX                                   102.409   2    0.000
SEX(1)              -7.390   0.782    89.228    1    0.000   0.001
SEX(2)              -6.533   0.799    66.872    1    0.000   0.001
Constant            2.268    0.563    16.252    1    0.000   9.662

The information derived from this coupled LFS/municipal register file is used to impute the household variables for all addresses with two ‘unattached’ persons in the municipal registers. For every such address, the parameter estimates determine the probability that the two persons form two households (and hence, by complement, the probability that they belong to one household). This probability varies with the values of the explanatory variables. For example, two young (below age 25) unattached persons of the same sex have a high probability of constituting two households.

3.3.2. The model for imputation of addresses with a single parent and an unattached person

For the imputation, municipal addresses where a single parent and an ‘unattached’ person live are selected for which household information from the Labour Force Survey is available. This concerns about 600 addresses observed in two successive years. At this type of address three types of households can occur:
- Two households: a single parent household and a one-person household.
- One household: a cohabiting couple with child(ren); the unattached person is attached to the single parent. This is done if the age difference between the single parent and the unattached person is 15 years or less.
- One household: a cohabiting couple and an ‘other member’; the unattached person is attached to the child of the single parent, and together they form the cohabiting couple. The single parent becomes an ‘other member’ of the household. This type of household is imputed if the age difference between the single parent and the unattached person is more than 15 years.
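The age rule above fixes, for an address with a single parent and an unattached person, both which fitted model is used (table 3a or 3b) and which one-household form is imputed if the address turns out to form a single household. A sketch, assuming the age difference is taken as an absolute difference in completed years (the paper does not specify this); `SingleParentRule` and `single_parent_rule` are hypothetical names.

```python
from typing import NamedTuple


class SingleParentRule(NamedTuple):
    model_table: str         # fitted model giving P(two households)
    one_household_form: str  # household imputed if the address forms one household
    attach_to: str           # person the unattached member is attached to


def single_parent_rule(age_parent: int, age_unattached: int) -> SingleParentRule:
    # Assumption: absolute difference in completed years.
    diff = abs(age_parent - age_unattached)
    if diff <= 15:
        return SingleParentRule("3a", "cohabiting couple with child(ren)",
                                "single parent")
    # More than 15 years: the unattached person and the child form the couple;
    # the single parent becomes an other household member.
    return SingleParentRule("3b", "cohabiting couple and other household member",
                            "child")


print(single_parent_rule(42, 30).attach_to)  # single parent
print(single_parent_rule(48, 24).attach_to)  # child
```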
The model for addresses at which a single parent and an ‘unattached’ person live with an age difference of 15 years or less consisted of the following variables for 2002 (table 3a):
- degree of urbanisation: 1 = highly urbanised, 5 = not urbanised (STED1);
- sex of the single parent and the unattached person: male/female, female/male, male/male, female/female (GESGES).

If the unattached person and the single parent family form one household, the unattached person is attached to the parent in the single parent family.

Table 3a. Logistic regression for the probability that a single parent family and an ‘unattached’ person are part of 2 households, age difference of 15 years or less

Variable    B        S.E.    Wald     df   Sig.    Exp(B)
STED1       -0.371   0.119   9.647    1    0.002   0.690
GESGES                       17.025   3    0.001
GESGES(1)   1.907    0.977   3.807    1    0.051   6.730
GESGES(2)   -1.099   0.424   6.735    1    0.009   0.333
GESGES(3)   -1.304   0.722   3.257    1    0.071   0.271
Constant    -0.056   0.506   0.012    1    0.912   0.946

The model for addresses at which a single parent and an ‘unattached’ person live with an age difference of more than 15 years consisted of the following variables for 2002 (table 3b):
- age difference between the two persons: 1 = more than 30 years, 2 = 16–30 years (DLFTC150);
- age of the unattached person: 1 = 25 or older, 0 = 0–24 (LFTLOSCT);
- single parent is a widower/widow: 1 = widowed, 0 = not widowed (WEDUW2);
- sex of the unattached person: 1 = male, 2 = female (GESL1(1)).

If the unattached person and the single parent family form one household, the unattached person is attached to the child in the single parent family.

Table 3b. Logistic regression for the probability that a single parent and an ‘unattached’ person are part of 2 households, age difference of more than 15 years

Variable    B        S.E.    Wald     df   Sig.    Exp(B)
DLFTC150    -0.356   0.549   0.421    1    0.517   0.700
LFTLOSCT    -0.213   0.525   0.164    1    0.686   0.808
WEDUW2      0.146    0.599   0.060    1    0.807   1.157
GESL1(1)    -0.293   0.519   0.320    1    0.572   0.746
Constant    -0.196   1.020   0.037    1    0.848   0.822
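Applying a fitted model to an address amounts to taking the inverse logit of the linear predictor. The sketch below does this for the table 3a coefficients; the same function works for tables 2, 3b and 4 given their own coefficients and covariates. It assumes SPSS-style indicator coding for GESGES, with the categories in the order listed above and the last one (female/female) as reference category. The paper does not state the coding, so the dummy mapping and the helper names are assumptions.

```python
import math
from typing import Dict, Optional

# B coefficients from Table 3a (single parent and unattached person,
# age difference of 15 years or less).
TABLE_3A: Dict[str, float] = {
    "Constant": -0.056,
    "STED1": -0.371,
    "GESGES(1)": 1.907,
    "GESGES(2)": -1.099,
    "GESGES(3)": -1.304,
}

# Assumed indicator coding: categories in the order listed in the text,
# with the last one (female/female) as reference category (all dummies 0).
GESGES_DUMMY: Dict[str, Optional[str]] = {
    "male/female": "GESGES(1)",
    "female/male": "GESGES(2)",
    "male/male": "GESGES(3)",
    "female/female": None,
}


def table_3a_inputs(sted1: int, sexes: str) -> Dict[str, float]:
    """Covariate values for one address; sexes = 'single parent/unattached'."""
    x = {"STED1": float(sted1)}
    dummy = GESGES_DUMMY[sexes]
    if dummy is not None:
        x[dummy] = 1.0
    return x


def prob_two_households(coefs: Dict[str, float], x: Dict[str, float]) -> float:
    """Inverse logit of the linear predictor: 1 / (1 + exp(-(b0 + sum b_k x_k)))."""
    eta = coefs["Constant"] + sum(coefs[name] * value for name, value in x.items())
    return 1.0 / (1.0 + math.exp(-eta))


p = prob_two_households(TABLE_3A, table_3a_inputs(1, "female/female"))
print(round(p, 3))  # 0.395
```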
3.3.3. The model for imputation of addresses with three unattached persons

For the imputation, about 500 addresses where three ‘unattached’ persons live are selected. These records form the basis for identifying the variables that determine the probability that the persons living at an address are members of one household, or of two or three households. To determine these probabilities, two logistic regressions were carried out. The first step was to determine the probability of one household versus two or three households. The number of non-western foreigners appeared to be the determining variable: one or more non-western foreigners at the address led to a significantly higher probability of one household at the address. The result of this first step is the probability of one household. A second logistic regression was performed with the variable ‘one or more households’ as the dependent variable; its result is the probability of three households. The probability of two households is then

P(2 households) = 1 − P(1 household) − P(3 households).

The model for the probability of three households consisted of the following variables (table 4):
- 1 = all persons at the address have the same sex, else 0 (SAMESEX);
- number of non-western foreigners (NALLOCHT);
- 1 = average age of the persons at the address below 25, else 2 (AVAGE).

The probability of three households increases if the three unattached persons are of the same sex, the number of non-western foreigners is small and the average age is under 25.

Table 4. Logistic regression for the probability that three ‘unattached’ persons are part of 3 households

Variable    B        S.E.    Wald     df   Sig.    Exp(B)
SAMESEX     2.205    0.235   87.946   1    0.000   9.072
NALLOCHT    0.025    0.109   0.052    1    0.820   1.025
AVEAGE      -0.525   0.251   4.381    1    0.036   0.592
Constant    -1.086   0.460   5.569    1    0.018   0.338

3.3.4. The imputation of addresses with four unattached persons

For the imputation, about 200 municipal addresses at which four ‘unattached’ persons live are selected.
Because of the small number of cases and the many household forms that could occur at these addresses, a rather simple imputation is performed for them: the probability of each household composition is determined directly from the selected records (table 5).

Table 5. Imputation probability by household composition

Household composition              Sample frequency  Imputation probability
4 members                          5                 5/91
3 members – 1 member               4                 4/91
2 members – 2 members              15                15/91
2 members – 1 member – 1 member    2                 2/91
4 single households                65                65/91
Total                              91                1

3.4. Imputed households

Overall, about 10 percent of the households are determined by imputation. Table 6 shows that unmarried couples without children are the most difficult group to determine: about half of these couples are based on estimation rather than observation. About three quarters of the unmarried couples with children are based on observation; most of the remaining quarter comes from addresses containing a single parent and an ‘unattached’ person.

In the production line, the imputation is carried out using a cumulative imputation probability: the imputation probabilities of successive addresses are added up, and every time this cumulative probability crosses an integer value, the address concerned is imputed as two households. If the cumulative probability does not cross an integer value, the address is imputed as one household. The coupled addresses are also imputed in this way, ignoring the information on their composition from the LFS.

Table 6. Private households, 1 January 2002*

                                    Households  Not-imputed households
                                    (x 1000)    (x 1000)    (%)
married couples without children    1535        1535        100
married couples with children       1898        1898        100
unmarried couples without children  499         264         54
unmarried couples with children     197         152         77
single parent household             412         374         91
one person household                2345        1908        87
other household                     49          33          70
total                               6935        6164        89

* provisional data
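The cumulative imputation probability described above can be sketched as follows: the two-household probabilities of successive addresses are summed, and an address is imputed as two households exactly when its probability pushes the running total past an integer. The function name is hypothetical, and the running total is assumed to start at 0.

```python
import math
from typing import List


def cumulative_imputation(probs: List[float]) -> List[int]:
    """Impute the number of households (1 or 2) for a sequence of addresses.

    probs[i] is the model probability that address i forms two households.
    An address is imputed as two households when adding its probability
    makes the running sum cross an integer value, otherwise as one."""
    result = []
    cum = 0.0
    for p in probs:
        before = math.floor(cum)
        cum += p
        result.append(2 if math.floor(cum) > before else 1)
    return result


print(cumulative_imputation([0.5, 0.5, 0.25, 0.25, 0.25, 0.25]))  # [1, 2, 1, 1, 1, 2]
```

By construction, the number of addresses imputed as two households equals the integer part of the summed probabilities, so the aggregate stays close to its expected value while each individual address still receives a definite composition. Starting the total at a random value in [0, 1) would turn this into systematic sampling; the paper does not say whether such a random start is used.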
4. References

Van den Brekel, J.C., 1977. The use of the Netherlands system of continuous population accounting for the population statistics (Statistics Netherlands internal paper).
Prins, C.J.M., 2000. Dutch population statistics based on population register data. Maandstatistiek van de bevolking, February 2000, pp. 9–15.