Statistics Netherlands
Division of Social and Spatial Statistics
Statistical analysis department
P.O. Box 4000
2270 JM Voorburg
The Netherlands
Email: chmn@cbs.nl
REGISTER-BASED HOUSEHOLD STATISTICS
Carel Harmsen and Abby Israëls
Paper for European Population Conference 2003:
European Populations: Challenges and Opportunities
26 – 30 August 2003, Warsaw, Poland
Date: 22 August 2003
1. Introduction
In spring 2001 Statistics Netherlands presented, for the first time, household
statistics based mainly on integral data from the municipal population
registers. Until then, the continuous Labour Force Survey had been the data
source for household statistics.
An important advantage of the new source is that consistency between
population statistics and household statistics is achieved at the micro level.
As a result, information on households can be published for a broad range of
variables that were not available from the Labour Force Survey alone:
households at the municipal level, ‘foreign’ households, the foreign
population by household position, etc. The population in institutional
households is also integrated into the production files, which was not
possible in the old data set based on the Labour Force Survey. Furthermore,
the degree of (regional) detail is much greater than in the household
statistics based on Labour Force Survey data.
To derive household statistics from municipal population register data, the
position in the household had to be imputed for about 7 percent of the
population. Although imputation does not necessarily produce accurate results
for individual households, it does yield robust results at higher levels of
aggregation for a wide array of variables.
In this paper we explain the population register data used and the way in
which household statistics are produced. Special attention is given to how
households are determined at addresses with an ambiguous household
composition, i.e. a composition that cannot be derived directly from the
family ties of the persons living at the same address.
2. Dutch population register data¹
The Dutch population and household statistics compiled by Statistics
Netherlands are based on the automated municipal population registers. This
registration system is known as the GBA system, which stands for
‘Gemeentelijke Basis Administratie persoonsgegevens’, the municipal basic
registration of population data. ‘Basic’ refers to the fact that the GBA serves
as the basic register of population data within a system of local registers.
These registers include the local registers on social security, the local
registers of water and electricity supply, the local registers of the police
departments dealing with the foreign population in the Netherlands, and the
(national) registers of the old age pension fund system.
¹ This section is based on Prins, 2000.
2.1. The GBA system in short
The GBA system was introduced on 1 October 1994². It is a fully
decentralised, comprehensive and cohesive population registration system.
Due to legal provisions there is no central counterpart of these municipal
registers. In this respect the system is unique in the world. Every municipality
in the Netherlands has its own population register containing information on
all inhabitants of that municipality. This information is listed per individual
inhabitant in a so-called personal list (PL). In the registration system each
inhabitant has been given a unique personal identification number (PIN),
which enables the municipal authorities to link his or her data to those on the
spouse, parents and children. For this reason not only the inhabitant’s PIN is
stored on each PL, but also those of the parents, the spouse and the offspring.
The main features of the GBA system are:
- the municipalities have retained responsibility for storing and
  supplying data. There is no central database;
- central government has developed an electronic communications
  network which links all municipalities and users of population data;
- this network provides fully standardised communication between all
  municipalities and users of population data;
- the network is an electronic mail system, according to the EDI
  principle (Electronic Data Interchange). Interactive real-time data
  exchange is possible;
- central and local government maintain the network jointly.
2.2. Contents of the population registers
A personal list (PL) consists of, among other information, the following
categories:
1. personal data;
2. data about the mother;
3. data about the father;
4. data about marriage, partnership, widowhood and divorce;
5. data about the address;
6. data about the offspring.
As mentioned before, the population registers are a basic element of national
and local government. This is why much attention was paid to the rules for
keeping the population register data up to date. The information needed to
update these registers is provided by the local registrar (births, deaths,
marriages, partnerships), the judicial courts (divorces), the Ministry of
Justice (changes of citizenship) or the persons concerned (house moves,
immigration, emigration, and births, marriages and other events that took
place abroad).
² Until 1 October 1994 the population registers were a paper card system.
Dutch population statistics were based on those registers, as described by
Van den Brekel (1977).
In a number of situations the population register does not match reality:
- Among young people, students for instance, the proportion of
  misregistrations seems higher than among other groups. Those who move
  house should notify the municipality of their new residence, but this is
  not always done directly after the move.
- An unknown number of people live in the country without being
  registered in the population register.
- Emigrants should notify the local authorities of their departure.
  However, they often fail to do so: some simply forget, others do not
  take the trouble of going to the town hall.
- Events that have taken place abroad are usually registered with some
  delay. Marriages contracted abroad are the most striking example of
  delayed registration.
2.3. Statistics Netherlands authorisations
Statistics Netherlands is authorised to obtain from the municipal population
registers all the data it needs to compile population statistics, given
national needs and the needs of international organisations such as the UN,
Eurostat and the Council of Europe. Every year in January, Statistics
Netherlands obtains a fixed set of data about all inhabitants of the
Netherlands. These data are primarily used to give a statistical overview of
the population on 1 January, and they are also essential for the household
statistics.
2.4. Combining electronic GBA messages
The GBA-system is an individually oriented system of population data
storage. The personal lists (PLs) display data per individual. Relations with
spouse, children and parents are shown by means of Personal Identification
Numbers (PINs).
The construction of data on the nuclear family and on households is an
example of combining data about various persons. The minimum condition
for people to be grouped in the same nuclear family is that they live at the
same address. Relations between persons at the same address can be detected
by means of the PINs. We assume that young children are in a nuclear family
unless the data indicate otherwise. Starting with the youngest person at the
address, this person’s parents are detected through the mutual PINs. The
same procedure is followed for the other persons at the address.
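The linking step described above can be sketched in Python. This is a minimal, hypothetical illustration rather than the production procedure of Statistics Netherlands: the `Person` record is a simplified stand-in for a personal list, birth year is used to order persons from youngest to oldest, and letting a couple registered at the address form its own nucleus (instead of joining a parent's family) is our assumption, chosen to match the "youngest couple" rule in section 3.1.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    # Hypothetical, simplified personal list (PL): only the PIN links used here.
    pin: str
    birth_year: int
    mother_pin: Optional[str] = None
    father_pin: Optional[str] = None
    spouse_pin: Optional[str] = None


def nuclear_families(residents: list[Person]) -> list[set[str]]:
    """Group the residents of one address into nuclear families via PIN links."""
    here = {p.pin for p in residents}
    family_of: dict[str, int] = {}  # PIN -> index into `families`
    families: list[set[str]] = []

    def add(pin: str, fid: int) -> None:
        family_of[pin] = fid
        families[fid].add(pin)

    def new_family() -> int:
        families.append(set())
        return len(families) - 1

    # Youngest first: a later birth year means a younger person.
    for person in sorted(residents, key=lambda p: p.birth_year, reverse=True):
        if person.pin in family_of:
            continue  # already attached as a parent or spouse
        if person.spouse_pin in here:
            # Assumption: a couple at the address forms its own nucleus;
            # join the spouse's family if the spouse is already placed.
            fid = family_of.get(person.spouse_pin)
            if fid is None:
                fid = new_family()
                add(person.spouse_pin, fid)
            add(person.pin, fid)
            continue
        # Parents registered at the same address, found through the mutual PINs.
        parents = [pin for pin in (person.mother_pin, person.father_pin) if pin in here]
        # A parent may already be placed (e.g. through a younger sibling).
        fid = next((family_of[pin] for pin in parents if pin in family_of), None)
        if fid is None:
            fid = new_family()
        add(person.pin, fid)
        for pin in parents:
            if pin not in family_of:
                add(pin, fid)
    return families
```

For a couple with a child and the mother of one of the partners at the same address, the sketch returns two nuclear families: the couple with the child, and the grandmother on her own.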
The Dutch population statistics are completely based on the municipal
population register data. This means that Statistics Netherlands accepts the
register data at face value. No further investigations are carried out on the
data that are received from the population registers. Of course Statistics
Netherlands is aware of the possibility that not all data are fully correct. As
was indicated in section 2.2, some people may be registered at an address
other than the one at which they actually live. Although this may affect the
family and household statistics, no attempts are made to correct these data.
3. Household statistics
The household statistics of Statistics Netherlands are compiled every year.
They contain the number of households by household type, and the number of
persons living in households by household position, in the Netherlands on
1 January. Data on households refer to the population in private and
institutional households.
Private households consist of one or more persons sharing the same address
and providing for their own daily needs. A person in a one-person household
is referred to as single. The members of multi-person households can be
classified according to their position with respect to the so-called reference
person³. The following positions for those members can be distinguished:
- child(ren) living at parental home;
- living together;
- other.
Children may be blood-related, stepchildren or adopted children living with
(one of) the parent(s) and not having any children of their own living at
home. If two persons are living together, it is assumed that they have a steady
relationship. ‘Other members’ of the household are for example boarders,
foster children and parent(s) of the reference person or of the partner. Persons
living with their children but without a partner at the same address are
included in the category ‘single parents’ (table 1).
1. Persons in households, 1 January 2002* (x 1000)
                                      Living together
Age group        Child      Single not married     married      Single       Other    Institu-       Total
                                                                parent      member      tional
Male
0-14              1520           -           -           -           -          10           4        1534
15-64              966         911         674        2778          56         103          42        5529
65+                  -         163          28         656           9          22          31         909
Total             2487        1074         702        3433          65         134          77        7972
Female
0-14              1452           -           -           -           -          10           3        1464
15-64              676         711         658        2921         308          77          28        5379
65+                  -         569          30         511          38          37         104        1290
Total             2128        1280         689        3432         346         123         134        8133
Male and female
Total             4615        2354        1391        6866         411         257         211       16105
* provisional data
³ The reference person is a statistical entity. The reference person in a
heterosexual relationship is always the man. In same-sex relationships, the
reference person is the elder of the two.
The population in institutional households consists of persons whose
accommodation and daily needs are provided for by a third party on a
professional basis. It includes persons living in homes for the elderly, nursing
homes and mental hospitals.
The type of household depends on the relation of its members to the
reference person, their marital status and their offspring. If the reference
person is the only person at an address, the household is clearly a
one-person household. Households may also consist of unmarried couples with
or without children, or of married couples with or without children. The
presence of an ‘other member’ in these households does not affect the
classification by type of household. A household of more than one person in
which the reference person has neither a partner nor children is included in
the category 'other household'. If the reference person is not cohabiting but
has children living at home, the category 'single parent household' applies
(table 6).
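The classification rules above can be written as a short decision function. This is a sketch of the rules as stated in the text, with hypothetical argument names; it assumes that the reference person's partner, marital status and children living at home have already been derived.

```python
def household_type(size: int, has_partner: bool, married: bool,
                   has_children: bool) -> str:
    """Household type from the reference person's situation.

    'Other members', such as boarders or a grandparent, do not enter the
    rule, so only the household size and the reference person's partner,
    marital status and children living at home are needed.
    """
    if size == 1:
        return "one-person household"
    if has_partner:
        couple = "married couple" if married else "unmarried couple"
        children = "with children" if has_children else "without children"
        return f"{couple} {children}"
    if has_children:
        return "single parent household"
    return "other household"
```

A married couple with two children and a boarder (size 5) is still classified as "married couple with children", reflecting that other members do not change the type.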
3.1. Directly derived households
The main input for household statistics is the integral data on the Dutch
population that Statistics Netherlands obtains from the municipal population
registers (GBA system, see section 2).
First, all persons living in an institutional household are classified as such
on the basis of address information. After this, persons in private households
are derived. For every identifiable address, the persons living at that
address are identified together with their (family) relationships. The
register contains information on family ties: every personal record contains
information on the parent(s) and on all children born, irrespective of their
present residence, as well as on the person's partner. Together with the
detailed address information, this makes it possible to identify all
traditional nuclear families.
Obviously, persons living alone at an address form a one-person household.
When more than one person lives at an address, either:
1. all persons at the address are related to each other; or
2. one or more persons are not related to the other persons living at the
   address.
In the first case the household position and composition are derived directly
from the family composition. This covers married couples with and without
children, single parent households, most other households and some unmarried
couples with children.
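Whether all persons at an address are related to each other amounts to a connectivity question on the family ties recorded through the PINs. The sketch below assumes, for illustration only, that ties are given as pairs of PINs (parent-child or spouses) and checks connectivity with a breadth-first search; the function name and the returned labels are hypothetical.

```python
from collections import defaultdict, deque


def address_case(pins: set[str], links: list[tuple[str, str]]) -> str:
    """Classify an address by the family ties between its residents.

    `links` holds pairs of PINs with a registered family tie (parent-child or
    spouses). Ties to persons registered at other addresses are ignored.
    """
    if not pins:
        raise ValueError("an address needs at least one resident")
    if len(pins) == 1:
        return "one-person household"
    # Undirected graph of family ties among the residents of this address.
    graph: dict[str, set[str]] = defaultdict(set)
    for a, b in links:
        if a in pins and b in pins:
            graph[a].add(b)
            graph[b].add(a)
    # Breadth-first search from an arbitrary resident.
    start = next(iter(pins))
    seen = {start}
    queue = deque([start])
    while queue:
        for neighbour in graph[queue.popleft()]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    if seen == pins:
        return "all related: derive directly from family composition"
    return "not all related: apply specific rules or impute"
```

The first outcome corresponds to case 1 above; the second covers case 2, which is handled by the specific decisions below or by imputation.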
There are a number of specific cases in which the household composition is
derived by taking certain decisions. The most important decisions are:
- Other persons related to the family nucleus, i.e. brothers/sisters or
  grandparent(s): if such a relationship can be identified, these persons
  become part of the household. As a general rule they are classified as
  other members of the household. In the case of two related families, the
  youngest couple is considered the family nucleus and the other family
  members are classified as other members of the household.
- Addresses where two brothers/sisters live together are classified as
  other households. These two persons can be linked because their
  information on the parents is the same.
- Persons aged 15 or younger living at an address without an identifiable
  parent are classified as other household members if there is one other
  family living at the address.
- When two non-related persons came to live at an address on the same day,
  these two persons are classified as a two-person household.
- At addresses with more than one family unit that do not belong to the
  types of address listed in section 3.2, the household composition is the
  same as for the separate families living at the address. For example, if a
  couple with children, a grandmother and two non-family persons live at an
  address, the households at that address are the couple with children plus
  one other household member (the grandmother), and two one-person
  households.
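Two of these decision rules concern pairs of residents and can be expressed compactly. The sketch below is illustrative only: the `Resident` record, its fields (parent PINs and the date of settlement at the address) and the requirement that at least one parent PIN be known for two persons to count as brothers/sisters are our assumptions, not documented details of the production system.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Resident:
    # Hypothetical fields: parent PINs from the personal list and the date
    # of settlement at the current address.
    mother_pin: Optional[str]
    father_pin: Optional[str]
    moved_in: date


def two_person_rule(a: Resident, b: Resident) -> Optional[str]:
    """Apply the sibling and same-day rules to two otherwise unlinked persons."""
    parents_a = (a.mother_pin, a.father_pin)
    parents_b = (b.mother_pin, b.father_pin)
    # Brothers/sisters: identical parent information, with at least one known parent.
    if parents_a == parents_b and any(parents_a):
        return "other household"
    # Non-related persons who settled at the address on the same day.
    if a.moved_in == b.moved_in:
        return "two-person household"
    return None  # no decision rule applies; the address is left for imputation
```

Returning `None` marks the pair as still ambiguous, which is the situation handled by the imputation described in section 3.2.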
3.2. Imputation
Most of the household information is derived from the population registers.
However, these registers do not contain all the information required to
distinguish all the different types of households. The position in the
household and the composition of the household can be established if the
relationships between the persons living at the same address are clear. This
is the case for roughly 93 percent of Dutch inhabitants. For the remaining 7
percent of the population in households, the household is imputed on the
basis of a logistic regression model. For this purpose six groups of
addresses are formed:
1. Two ‘unattached’⁴ persons living at an address;
2. Three ‘unattached’ persons living at an address;
3. Four to nine ‘unattached’ persons living at an address;
4. One single-parent family and an ‘unattached’ person living at an
   address;
5. One couple and one ‘unattached’ person living at an address;
6. Addresses as mentioned above with a postal classification identifying
   more than one separate postal unit (a kind of substitute for
   households) at the address.
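The grouping into six address types can be sketched as a simple classification function. The argument names, the string labels for the family unit and the `postal_units` count are hypothetical; returning `None` for addresses outside the six groups is our assumption, since the text does not say how such addresses are handled.

```python
from typing import Optional


def imputation_group(unattached: int, family: Optional[str],
                     postal_units: int = 1) -> Optional[int]:
    """Return the imputation group (1-6) of an address, or None if not covered.

    family: None, "single parent" or "couple", i.e. the family unit present
    besides the unattached persons (hypothetical labels).
    """
    if family is None and unattached == 2:
        group = 1
    elif family is None and unattached == 3:
        group = 2
    elif family is None and 4 <= unattached <= 9:
        group = 3
    elif family == "single parent" and unattached == 1:
        group = 4
    elif family == "couple" and unattached == 1:
        group = 5
    else:
        return None
    # Group 6: any of the above with more than one postal unit at the address.
    return 6 if postal_units > 1 else group
```

Checking the postal classification last reflects that group 6 is defined as "addresses as mentioned above", i.e. a refinement of groups 1 to 5.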
3.3. Logistic regression
To impute household compositions, Labour Force Survey (LFS) information on
the composition of households at an address is coupled with the information
from the municipal population registers. These coupled records form the basis
for the logistic regression. The LFS data are used to determine the
relationship between background variables and the probability of forming one
household.
In this paper we describe the models derived for:
- the imputation of households at addresses with two ‘unattached’
  persons. This type of address is by far the largest group of addresses
  to be imputed;
- the imputation of addresses with one lone parent and an apparently
  unattached person, a relatively small but methodologically interesting
  group;
- the imputation of addresses with three unattached persons;
- the imputation of addresses with four unattached persons.
⁴ ‘Unattached’ means that no identifiable family ties are present between the
persons.
3.3.1. The model for imputation of households at addresses with two ‘unattached’ persons
For the imputation, municipal addresses where two ‘unattached’ persons live
are selected together with the household information from the Labour Force
Survey. This concerns about 4000 addresses observed in two successive years.
These records form the basis for a logistic regression to identify the
variables that determine the probability that the persons living at an
address are part of two households.
The model for 2002 consisted of the following variables (table 2):
- Age difference between the two persons (DIFAGE)
- Average age of the two persons (AVAGE)
- Degree of urbanisation: 1 = highly urbanised, 5 = not urbanised
  (URBAN)
- Number of never married persons (NONMAR)
- Interaction of age difference by same-sex (DIFAGE by SAMESEX)
- Interaction of average age by same-sex (AVAGE by SAMESEX)
- Interaction of number of never married persons by same-sex
  (NONMAR by SAMESEX)
- Sex of the older person combined with sex of the younger person:
  male/female, female/male, same-sex (SEX)
2. Logistic regression for the probability that two ‘unattached’ persons are part of two households
Variable                    B     S.E.     Wald       df     Sig.   Exp(B)
DIFAGE                  0.139    0.020   46.200        1    0.000    1.149
AVAGE                   0.078    0.022   13.178        1    0.000    1.081
URBAN                  -0.360    0.060   35.469        1    0.000    0.697
NONMAR                  1.924    0.373   26.560        1    0.000    6.849
DIFAGE by SAMESEX      -0.049    0.013   15.121        1    0.000    0.952
AVAGE by SAMESEX       -0.054    0.014   15.661        1    0.000    0.948
NONMAR by SAMESEX      -1.209    0.243   24.674        1    0.000    0.298
SEX                                     102.409        2    0.000
SEX(1)                 -7.390    0.782   89.228        1    0.000    0.001
SEX(2)                 -6.533    0.799   66.872        1    0.000    0.001
Constant                2.268    0.563   16.252        1    0.000    9.662
The information derived from this coupled LFS/municipal register file is used
to impute the household variables at all addresses with two ‘unattached’
persons in the municipal registers.
For every address with two ‘unattached’ persons, the parameter estimates
determine the probability that the two persons form two households (and,
conversely, the probability that they belong to one household). This
probability varies with the characteristics of the two persons and the
address. For example, two young (below age 25) unattached persons of the
same sex have a high probability of constituting two households.
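To show how the estimates in table 2 translate into a probability, the sketch below evaluates the fitted logistic model for one pair of persons. Several codings are assumptions not stated in the table: that SAMESEX equals 1 for a same-sex pair, that same-sex is the reference category of SEX, that SEX(1) is male/female and SEX(2) female/male (older/younger person), and that ages are measured in years.

```python
import math

# B coefficients of table 2 (model for 2002).
B = {
    "DIFAGE": 0.139,
    "AVAGE": 0.078,
    "URBAN": -0.360,
    "NONMAR": 1.924,
    "DIFAGE_BY_SAMESEX": -0.049,
    "AVAGE_BY_SAMESEX": -0.054,
    "NONMAR_BY_SAMESEX": -1.209,
    "SEX1": -7.390,  # assumed: older male / younger female
    "SEX2": -6.533,  # assumed: older female / younger male
    "CONSTANT": 2.268,
}


def p_two_households(age_older: float, age_younger: float, urban: int,
                     never_married: int, sex_pair: str) -> float:
    """Probability that two 'unattached' persons form two households.

    sex_pair: 'male/female' or 'female/male' (older/younger person), or
    'same-sex', assumed to be the reference category of SEX.
    """
    difage = abs(age_older - age_younger)
    avage = (age_older + age_younger) / 2
    samesex = 1 if sex_pair == "same-sex" else 0
    eta = (B["CONSTANT"]
           + B["DIFAGE"] * difage
           + B["AVAGE"] * avage
           + B["URBAN"] * urban
           + B["NONMAR"] * never_married
           # Interaction terms only contribute for same-sex pairs.
           + samesex * (B["DIFAGE_BY_SAMESEX"] * difage
                        + B["AVAGE_BY_SAMESEX"] * avage
                        + B["NONMAR_BY_SAMESEX"] * never_married)
           + (B["SEX1"] if sex_pair == "male/female" else 0.0)
           + (B["SEX2"] if sex_pair == "female/male" else 0.0))
    return 1.0 / (1.0 + math.exp(-eta))  # inverse logit
```

Under these assumptions, two never-married persons aged 22 and 20 of the same sex in a highly urbanised municipality (URBAN = 1) get a probability of about 0.98 of forming two households; for an otherwise identical male/female pair it drops to about 0.57. The probability of one household is the complement, 1 minus this value.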
3.3.2. The model for imputation of addresses with a single parent and an unattached person
For the imputation, municipal addresses where a single parent and an
‘unattached’ person live are selected together with the household information
from the Labour Force Survey. This concerns about 600 addresses observed in
two successive years. At this type of address three types of household can
occur:
- Two households: a single parent household and a one-person household.
- One household: a cohabiting couple with child(ren); the unattached
  person is attached to the single parent. This type is imputed if the age
  difference between the single parent and the unattached person is 15
  years or less.
- One household: a cohabiting couple and an other household member; the
  unattached person is attached to the child of the single parent, and the
  two form the cohabiting couple. The single parent mother becomes an other
  household member. This type of household is imputed if the age difference
  between the single parent and the unattached person is more than 15
  years.
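The 15-year threshold acts as a switch between the two regression models and the two one-household forms. A minimal sketch with hypothetical return labels; an age difference of exactly 15 years is placed in the first case, following the model definition for table 3a, and how the unattached person is paired when the single parent has more than one child is not specified in the text and is not modelled here.

```python
def single_parent_case(age_parent: float, age_unattached: float) -> tuple[str, str]:
    """Return (regression table used, one-household form if one household)."""
    if abs(age_parent - age_unattached) <= 15:
        # Unattached person becomes the partner of the single parent.
        return ("table 3a",
                "cohabiting couple with child(ren): unattached person partners the parent")
    # Unattached person becomes the partner of the child; the parent is an other member.
    return ("table 3b",
            "cohabiting couple plus other member: unattached person partners the child")
```

Whether the address actually becomes one household is decided by the regression model; this function only selects the model and the form used if it does.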
The model for addresses at which a single parent and an ‘unattached’ person
live with an age difference of 15 years or less consisted of the following
variables for 2002 (table 3a):
- Degree of urbanisation: 1 = highly urbanised, 5 = not urbanised
  (STED1)
- Sex of the single parent and the unattached person: male/female,
  female/male, male/male, female/female (GESGES)
If the unattached person and the single parent family form one household,
then the unattached person is attached to the parent in the single parent
family.
3a. Logistic regression for the probability that a single parent family and an ‘unattached’ person are part of two households, age difference 15 years or less
Variable                    B     S.E.     Wald       df     Sig.   Exp(B)
STED1                  -0.371    0.119    9.647        1    0.002    0.690
GESGES                                   17.025        3    0.001
GESGES(1)               1.907    0.977    3.807        1    0.051    6.730
GESGES(2)              -1.099    0.424    6.735        1    0.009    0.333
GESGES(3)              -1.304    0.722    3.257        1    0.071    0.271
Constant               -0.056    0.506    0.012        1    0.912    0.946
The model for addresses at which a single parent and an ‘unattached’ person
live with an age difference of more than 15 years consisted, for 2002, of the
following variables (table 3b):
- Age difference between the two persons (1 = more than 30 years;
  2 = 16-30 years) (DLFTC150)
- Age of the unattached person (1 = 25 or older; 0 = 0-24) (LFTLOSCT)
- Single parent is a widower/widow (1 = widowed; 0 = not widowed)
  (WEDUW2)
- Sex of the unattached person (1 = male; 2 = female) (GESL1(1))
If the unattached person and the single parent family form one household, the
unattached person is attached to the child in the single parent family.
3b. Logistic regression for the probability that a single parent and an ‘unattached’ person are part of two households, age difference more than 15 years
Variable                    B     S.E.     Wald       df     Sig.   Exp(B)
DLFTC150               -0.356    0.549    0.421        1    0.517    0.700
LFTLOSCT               -0.213    0.525    0.164        1    0.686    0.808
WEDUW2                  0.146    0.599    0.060        1    0.807    1.157
GESL1(1)               -0.293    0.519    0.320        1    0.572    0.746
Constant               -0.196    1.020    0.037        1    0.848    0.822
3.3.3. The model for imputation of addresses with three unattached persons
For the imputation, about 500 addresses where three ‘unattached’ persons live
are selected. These records form the basis for a logistic regression to
identify the variables that determine the probability that the persons living
at an address are members of one household or of two or three households.
To determine these probabilities, two logistic regressions were carried out.
The first step determined the probability of one household versus two or
three households. The number of non-western foreigners appeared to be the
determining variable: one or more non-western foreigners at the address led
to a significantly higher probability of one household at the address. The
result of this first step is the probability of one household. A second
logistic regression was performed with the variable ‘one or more households’
as the dependent variable; its result is the probability of three households.
The probability of two households is then 1 minus the probability of one
household minus the probability of three households.
The model for the probability of three households consisted of the following
variables (table 4):
- 1 = all persons at the address have the same sex; else 0 (SAMESEX)
- number of non-western foreigners (NALLOCHT)
- 1 = average age of the persons at the address below 25; else 2 (AVAGE)
The probability of three households increases when the three unattached
persons are of the same sex, the number of non-western foreigners is small
and the average age is under 25.
4. Logistic regression for the probability that three ‘unattached’ persons are part of three households
Variable                    B     S.E.     Wald       df     Sig.   Exp(B)
SAMESEX                 2.205    0.235   87.946        1    0.000    9.072
NALLOCHT                0.025    0.109    0.052        1    0.820    1.025
AVAGE                  -0.525    0.251    4.381        1    0.036    0.592
Constant               -1.086    0.460    5.569        1    0.018    0.338
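The combination of the two regression steps can be illustrated as follows. The coefficients in `p_three_households` are the B values from table 4, and the codings follow the variable list above (SAMESEX 1 or 0, AVAGE 1 below 25 and 2 otherwise). The step-one model for the probability of one household is not reported, so `p_one` is a hypothetical input, and the check that the resulting probabilities are non-negative is our addition.

```python
import math


def p_three_households(samesex: int, n_nonwestern: int, avage_code: int) -> float:
    """Probability of three households from the table 4 coefficients."""
    eta = (-1.086
           + 2.205 * samesex        # 1 = all persons have the same sex
           + 0.025 * n_nonwestern   # number of non-western foreigners
           - 0.525 * avage_code)    # 1 = average age below 25, 2 = otherwise
    return 1.0 / (1.0 + math.exp(-eta))


def household_count_probabilities(p_one: float, p_three: float) -> dict[int, float]:
    """Combine both steps: P(2 households) = 1 - P(1) - P(3)."""
    p_two = 1.0 - p_one - p_three
    if p_two < 0:
        raise ValueError("p_one + p_three exceeds 1")
    return {1: p_one, 2: p_two, 3: p_three}
```

For three persons of the same sex with no non-western foreigners and an average age below 25, the probability of three households is about 0.64; with a hypothetical one-household probability of 0.2, the remaining probability of two households is about 0.16.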
3.3.4. The imputation of addresses with four unattached persons
For the imputation, about 200 municipal addresses at which four ‘unattached’
persons live are selected. Because of the small number of cases and the many
possible household forms that could appear at these addresses, a rather
simple imputation is performed for these addresses: the probability of a
certain household composition is determined directly from the selected
records (table 5).
5. Imputation probability by household composition
Household composition               Sample frequency  Imputation probability
4 members                                          5                    5/91
3 members - 1 member                               4                    4/91
2 members - 2 members                             15                   15/91
2 members - 1 member - 1 member                    2                    2/91
4 one-person households                           65                   65/91
Total                                             91                       1
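Because these probabilities are simply relative frequencies, they can be reproduced and used for a random draw as below. This is an illustrative sketch: the text does not say whether compositions are drawn at random in production or assigned with the cumulative method described in section 3.4, and the function names are hypothetical.

```python
import random

# Sample frequencies of table 5 (addresses with four 'unattached' persons).
FREQUENCIES = {
    "4 members": 5,
    "3 members - 1 member": 4,
    "2 members - 2 members": 15,
    "2 members - 1 member - 1 member": 2,
    "4 one-person households": 65,
}


def imputation_probabilities() -> dict[str, float]:
    """Relative sample frequencies, as in the last column of table 5."""
    total = sum(FREQUENCIES.values())
    return {composition: n / total for composition, n in FREQUENCIES.items()}


def draw_composition(rng: random.Random) -> str:
    """Draw one household composition with the table 5 probabilities."""
    compositions = list(FREQUENCIES)
    return rng.choices(compositions, weights=list(FREQUENCIES.values()), k=1)[0]
```

Passing an explicit `random.Random` instance keeps draws reproducible when a seed is fixed.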
3.4. Imputed households
Overall, 10 percent of the households are determined by imputation. Table 6
shows that unmarried couples without children are the most difficult group to
determine: about half of these couples are based on estimation rather than
observation. About three quarters of the unmarried couples with children are
based on observation; most of the remaining quarter comes from addresses
containing a single parent and an ‘unattached’ person.
In the production line the imputation is carried out using a cumulative
imputation probability. Every time this cumulative probability crosses an
integer value, that specific address is imputed as two households. If the
cumulative probability does not cross an integer value, the address is
imputed as one household.
In determining the household composition, the coupled addresses are also
imputed, ignoring the information on the composition from the LFS.
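The cumulative method can be sketched as follows. Each address carries its modelled probability of two households; the running sum is tracked, and an address becomes two households exactly when the sum passes a whole number. Starting the sum at zero and processing addresses in the given order are assumptions of this sketch; the text does not state the starting value or the ordering used in production.

```python
import math


def cumulative_impute(probabilities: list[float]) -> list[int]:
    """Return 2 (two households) or 1 (one household) for each address, in order."""
    result = []
    cumulative = 0.0
    for p in probabilities:
        before = cumulative
        cumulative += p
        # Two households only if the running sum passes an integer value here.
        crossed = math.floor(cumulative) > math.floor(before)
        result.append(2 if crossed else 1)
    return result
```

With probabilities 0.25, 0.5, 0.5, 0.75 and 0.25 the sum passes 1 at the third address and 2 at the fourth, so those two addresses are imputed as two households. A consequence of this design is that the number of addresses imputed as two households equals the integer part of the summed probabilities, so totals stay close to their expected values, unlike independent random draws.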
6. Private households, 1 January 2002*
Household type                          Households   Not imputed   Not imputed
                                          (x 1000)      (x 1000)           (%)
Married couples without children              1535          1535           100
Married couples with children                 1898          1898           100
Unmarried couples without children             499           264            54
Unmarried couples with children                197           152            77
Single-parent households                       412           374            91
One-person households                         2345          1908            87
Other households                                49            33            70
Total                                         6935          6164            89
* provisional data
4. References
Van den Brekel, J.C., 1977. The use of the Netherlands system of continuous
population accounting for the population statistics. Statistics Netherlands
internal paper.
Prins, C.J.M., 2000. Dutch population statistics based on population register
data. Maandstatistiek van de bevolking, February 2000, pp. 9-15.