Internal and external consistency in multidimensional population projection models

Environment and Planning A, 1985, volume 17, pages 1473-1498
N W Keilman
Netherlands Interuniversity Demographic Institute (NIDI), PO Box 955, 2270 AZ Voorburg,
The Netherlands
Received 1 October 1984; in revised form 15 May 1985
Abstract. In this paper the impact of consistency requirements upon the formulation of
multidimensional population projection models is dealt with. The problem of internal
consistency arises when certain output variables of the model have to be consistent with one
another, as in models containing marital status (two-sex problem). External consistency
problems arise when certain output variables have to satisfy externally specified restrictions.
For instance, regional population projections may have to add up to national projections. The
paper gives internal and external consistency algorithms, which control both for stock and for
flow variables. Two approaches are discussed: a simple proportional adjustment for each
relevant demographic component and an approach that minimises deviations in age-specific
patterns of the input rates. Both algorithms are illustrated using data from the Netherlands. In
the first illustration the internal consistency algorithm is applied to the age, sex, and marital-status-specific model of the official 1980 national population forecasts. In the second, external
consistency is considered within the context of the multiregional projection model which was
constructed for the regional population forecasts for the Netherlands.
1 Introduction
Population projection models have become increasingly complex in recent years.
Stimulated by growing demands from users, amongst others, some countries project
their population not only by age and sex, but also by region of residence, labour
market status, marital status, level of education, household composition, and the like.
These relatively complex models may raise concern about several aspects of the
consistency both between several distinct models and within a single model. Recent
insights derived from multidimensional demography have greatly enhanced our
understanding of complex processes that have led to inconsistencies in earlier models.
For instance, multiregional models which describe out-migration by out-migration
(exit) parameters and in-migration by admission (entrance) parameters are confronted
with inconsistencies between the total number of out-migrations and the total number
of in-migrations. Multidimensional projection models as pioneered by Rogers
circumvent this sort of inconsistency.
However, in multidimensional projection models inconsistencies may still arise.
For example, in a sex and marital-status-specific model, the numbers of male and
female marriages may disagree if marriage rates are specified for each sex. Also, in a
household model there may be a discrepancy between, say, the number of adults
leaving a two-parent family and the number of new one-parent families.
In the literature, little attention has, as yet, been devoted to a general treatment of the
consistency problem in demographic projection models. This may be because of the
fact that a unifying framework for such projection models was lacking, as a result of
which ad hoc solutions were tailored to the specific model. But multidimensional
population projection models offer such a unifying framework. In this article,
therefore, I attempt to describe some general concepts pertaining to the consistency
problem within the context of multidimensional population projection models.
The paper starts with a description of the multidimensional population projection
model, which is a further improvement of the models developed earlier by Rogers,
Ledent, Willekens, and others. Next, some attention is given to the definition, nature,
and origin of the consistency problem. A fairly general approach to the solution of
inconsistencies is described, partly based upon a decomposition of the matrix
containing occurrence/exposure rates. The paper ends with two illustrations, for
which data from the Netherlands were used. The first applies to the internal
consistency of the age, sex, and marital-status-specific model which was used for the
1980-based national population forecasts. The second describes an external consistency
algorithm which was used in the context of a multiregional projection model.
2 The multidimensional population projection model
2.1 Introduction
In this section, I present a general formulation of the multidimensional population
projection model. The major features of this model are the following:
1. the model is purely demographic—it contains only definition equations, no
behavioural equations;
2. in its basic form, the model is two-sex and female-dominant;
3. the model is deterministic and time discrete;
4. the length of the age intervals is equal to the length of the unit projection periods;
5. the model has occurrence/exposure rates (central rates) as its parameters, which
describe the phenomena experienced by the model population during the projection
period;
6. external migration is taken into account;
7. the model structure and the parameter estimators are derived from a set of
accounting equations or balancing equations.
The last two characteristics result from the fact that the model presented here is an
improvement of the multidimensional projection models that have so far been
developed in multiregional demography. Before multiregional projections had
matured, the projection models usually contained only one dimension or demographic
characteristic, namely, age. Generally, the dimension of sex was implemented in
those models by using an age-specific model for each sex separately.
The traditional approach on which these models were based dates back to their
use by Leslie (1945) and was studied extensively by Keyfitz (1968, sections 2-4). Its
main characteristic is that the life table forms the basis for the projection model.
This method was generalised into a multidimensional approach by Rogers (1975),
who first extended the Leslie model into a multiregional projection model and soon
realised that migrations from one region to another were not the only events which
could be handled using such a model. In general, the model can describe transitions
from any one population category to another. This led to a universal framework, the
multidimensional population projection model, in which multiregional models, marital-status-specific models, household formation and dissolution models, and models
describing processes within, for example, the labour market can be located.
However, these models still display the fundamental characteristic of the Leslie
model. That is, they are based upon the life-table approach. This method has two
drawbacks:
1. The life table contains information on individuals classified by exact ages. Thus, a
cohort-observational plan is employed. In such a plan, when an individual experiences
an event, their cohort is recorded as well as their age in completed years at the time
of the event (Willekens and Drewe, 1984). In the Lexis diagram of figure 1, the
cohort-observational plan is indicated by the parallelogram QSVP. It extends over
two calendar years. However, for a projection model, the cohort-observational plan
is not very appropriate. Such a model describes the future number of persons
belonging to a certain cohort and experiencing a given event between two successive
points in time. This calls for a period-cohort-observational plan, of which parallelogram
QRSV in figure 1 is an example. This observational plan covers two age classes.
Starting the construction of a projection model from the life-table approach means
that the cohort-observational plan of this approach should be transformed into the
period-cohort-observational plan of the model. This is frequently done by deriving
survivorship proportions from transition probabilities (for instance, see Rogers, 1975,
pages 65 and 79). Hence, extra complications are introduced as compared with the
approach advocated in this paper. Here, I would rather lay the multidimensional life
table aside and start directly from a set of occurrence/exposure rates, based upon a
period-cohort criterion.
2. The life table contains information on the behaviour of a fixed set of individuals as
they grow older (and possibly pass through several classifying states in the multidimensional case). People may leave the system that is described by the information
in the life table, because of death or emigration, for instance. But immigrations are
not allowed. For some countries this may be a reasonable assumption. However,
other countries gain a substantial number of immigrants each year. Therefore, the
life-table-based population projection model was supplemented with a variable
describing immigration from outside the region in question. The approach frequently
adopted involves two stages (see Pollard, 1973, page 50). First, the population
present in the system at the beginning of the unit projection interval is assumed to
'survive' to the end of the interval. Second, a stock of immigrants is added to this
result.
With this approach, however, all immigrants are assumed to enter the country at
one particular moment. Hence, they cannot experience the relevant events during the
projection interval. For unit intervals of one year, this is probably often quite a
realistic assumption. But with projection models a unit projection interval of five
years is frequently employed, in which case the validity of the approach may be
questioned.
The approach outlined in this section is aimed at solving the problems of the
usual models as described above. Its main characteristic is that the starting point for
the model is a set of given occurrence/exposure rates defined upon a period-cohort
criterion, supplemented with the set of corresponding accounting equations. By
simply solving for the population at the end of the interval, a completely valid
projection model is constructed. This means that such a model may exactly simulate
the behaviour of the population in a known period. A simple illustration, using only
one population category and two events, may clarify this.
Figure 1. Lexis diagram.
Let us suppose that the population size is denoted by $K_1$ and $K_2$ at the beginning
and at the end of a particular period, respectively. The population may experience
the events of death and of immigration during the interval (1, 2). The number of
persons that die is denoted by $D$, $m$ is the corresponding occurrence/exposure rate,
and $I$ is the number of immigrants. When deaths and migrations are assumed to
occur uniformly over the interval, then $m$ may be written as $m = 2D/(K_1 + K_2)$. The
corresponding accounting equation is $K_2 = K_1 - D + I$. Eliminating $D$ and solving
for $K_2$ yields
$$K_2 = (1 + \tfrac{1}{2}m)^{-1}(1 - \tfrac{1}{2}m)K_1 + (1 + \tfrac{1}{2}m)^{-1}I .$$
This formula is a special case of the multidimensional model, described by expression (8)
in section 2.2.
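The one-category illustration can be checked numerically. The following is a minimal sketch, not code from the forecasts themselves: the function names and all figures are hypothetical. It derives the rate from an observed period and confirms that the projection formula returns the observed end-of-period population, which is what "exactly simulate" means here.

```python
def central_rate(k1, k2, deaths):
    """Occurrence/exposure rate m = 2D / (K1 + K2), assuming events
    are spread uniformly over the interval."""
    return 2.0 * deaths / (k1 + k2)


def project(k1, m, immigrants):
    """Solve K2 = K1 - D + I with D = m * (K1 + K2) / 2 for K2."""
    return ((1.0 - 0.5 * m) * k1 + immigrants) / (1.0 + 0.5 * m)


# Hypothetical observed period (illustrative numbers only).
k1, deaths, immigrants = 1000.0, 20.0, 50.0
k2_observed = k1 - deaths + immigrants      # accounting equation
m = central_rate(k1, k2_observed, deaths)
k2_projected = project(k1, m, immigrants)   # reproduces k2_observed
```

Because the rate and the projection formula rest on the same accounting equation, the projected and observed populations coincide up to rounding.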
This line of thought concerning the process of model construction was followed for
the 1980-based population forecasts of the Netherlands (Cruijsen and Keilman, 1984,
chapter 2; Keilman, 1982). Willekens also used this principle in his efforts to devise
a multiregional population projection model for the Netherlands. In that project, he
described a consistent framework for the derivation of multidimensional projection
model equations (Willekens, 1984; Willekens and Drewe, 1984).
The remaining part of this section is devoted to the presentation of the improved
multidimensional population projection model.
2.2 Model description
The starting point for the process of model construction is the identification of stock
and flow variables. The numbers of persons in each population category (denoted by
a particular combination of the dimensions age, sex, etc) define the set of stock
variables. They are denoted by the symbol $K_i(x, t)$, indicating the number of persons
aged $x$ at time $t$ who are classified in the $i$th population category. At a certain point
in time, the numbers of persons aged $x$ can be represented, by category, in a column
vector,
$$\mathbf{K}(x, t) = [K_1(x, t), K_2(x, t), \ldots, K_N(x, t)]^{\mathrm{T}} ,$$
in which N is the number of categories a person may occupy. For reasons of
convenience, the age dimension has been singled out in this notation.
The flow variables give the numbers of demographic events that occur in the unit
projection interval, that is, between $t$ and $t+1$. The events involve changes in status,
for instance, from status $i$ to status $j$, from alive in $i$ to dead, from alive in $i$ to living
abroad (emigration), from living abroad to alive in $j$ (immigration), and from not-yet-born to born in $i$. This means that the events may be classified into two broad groups:
internal events, which are defined by a transition between two categories of the system,
and external events, which involve a transition across system boundaries (because of
mortality, external migration, and fertility). The flow variables which belong to
internal events may be distinguished by the initial status, $i$, and the final status, $j$.
The external event of mortality will be indicated as a transition from status $i$ to
status $d$ (dead), emigration is a transition from $i$ to $o$ (outside the system), immigration
is a transition from $o$ to $j$, and birth involves a transition from not-yet-born to $i$.
This gives rise to the following flow variables:
$O_{ij}(x, t)$   the number of transitions from status $i$ to status $j$ in the period $(t, t+1)$ by persons aged $x$,
$O_{id}(x, t)$   the number of persons in status $i$ aged $x$ that die in the interval $(t, t+1)$,
$O_{io}(x, t)$   the number of emigrants aged $x$ who leave status $i$ during $(t, t+1)$,
$O_{oj}(x, t)$   the number of immigrants aged $x$ who enter status $j$ during $(t, t+1)$,
$B_{ij}(x, t)$   the number of children born during $(t, t+1)$ to mothers aged $x$ in status $j$. These children will occupy status $i$ immediately after being born.
Ages may be measured at the time the event occurs or at the beginning of the
interval. All flow variables are defined using a period-cohort observation criterion.
This enables us to define the following relation between stock variables and flow
variables:
$$K_i(x+1, t+1) = K_i(x, t) - \sum_{j \ne i} O_{ij}(x, t) + \sum_{k \ne i} O_{ki}(x, t) - O_{id}(x, t) - O_{io}(x, t) + O_{oi}(x, t) , \qquad 0 \le x \le z-1 . \tag{1}$$
This accounting or balancing equation simply states that the number of persons in a
certain category is found using the previous number in that category, corrected for all
outflows (due to internal as well as external events) and all inflows (idem). At time $t$,
persons aged $x$ are described. At time $t+1$, these persons are aged $x+1$. Hence
cohorts are observed over projection periods, which calls for a period-cohort-observational plan. Ages are defined at the beginning of the interval.
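The bookkeeping in equation (1) can be sketched in a few lines. This is an illustration under stated assumptions, not the implementation used for the forecasts: the function name, the argument layout, and the two-status figures are hypothetical.

```python
def balance(k, internal, deaths, emigrants, immigrants):
    """Apply accounting equation (1) to one cohort.

    k[i]            persons in status i at time t
    internal[i][j]  moves from status i to status j during (t, t+1)
    deaths[i], emigrants[i], immigrants[i]  external flows by status
    Returns the persons by status at time t+1, one age group older.
    """
    n = len(k)
    k_next = []
    for i in range(n):
        outflow = sum(internal[i][j] for j in range(n) if j != i)
        inflow = sum(internal[j][i] for j in range(n) if j != i)
        k_next.append(k[i] - outflow + inflow
                      - deaths[i] - emigrants[i] + immigrants[i])
    return k_next


# Hypothetical two-status example (say, two regions).
k = [1000, 500]
internal = [[0, 30], [10, 0]]
k_next = balance(k, internal, deaths=[5, 2], emigrants=[8, 1],
                 immigrants=[12, 4])
```

Internal moves cancel in the total, so the population of the whole system changes only through the external flows.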
Equation (1) is valid for all age groups, except the first one (children born during
the projection interval) and the last one, which is open-ended. For children born
during the projection interval the balancing equation is
$$K_i(0, t+1) = B_i(t) - \sum_{j \ne i} O_{ij}(00, t) + \sum_{k \ne i} O_{ki}(00, t) - O_{id}(00, t) - O_{io}(00, t) + O_{oi}(00, t) , \tag{2}$$
where $O_{ij}(00, t)$ represents the number of children who move from status $i$ to status $j$
in the interval of birth. $B_i(t)$ denotes the number of children born in status $i$, and
hence $B_i(t) = \sum_j \sum_x B_{ij}(x, t)$.
The number of persons in the highest age group, that is, aged z or over, at time t
in status $i$ is given by $K_i(z, t)$. This definition implies that the highest age group is an
open one. Hence, its balancing equation is
$$\begin{aligned}
K_i(z, t+1) ={}& K_i(z-1, t) - \sum_{j \ne i} O_{ij}(z-1, t) + \sum_{k \ne i} O_{ki}(z-1, t) \\
& - O_{id}(z-1, t) - O_{io}(z-1, t) + O_{oi}(z-1, t) + K_i(z, t) \\
& - \sum_{j \ne i} O_{ij}(z, t) + \sum_{k \ne i} O_{ki}(z, t) - O_{id}(z, t) - O_{io}(z, t) + O_{oi}(z, t) .
\end{aligned} \tag{3}$$
This formula states that at time $t+1$, the number of persons aged $z$ or over consists
of two parts. The first part is the number of survivors of those aged $z-1$ at time $t$.
The second part is the number of survivors of persons who already belonged to the
highest age group at time t.
After the presentation of the balancing equations for the first, the last, and the
intermediate age groups, the definitions of the occurrence/exposure rates will be
given. The rates together with the balancing equations lead to a system of linear
expressions that can be solved for K(x+ 1, t+ 1).
The occurrence/exposure rates are obtained by dividing the number of events by
the number of person-years lived in the initial status (for a definition see, for
example, Keyfitz, 1968, page 9). The transition rate $m_{ij}(x, t)$, the mortality rate
$m_{id}(x, t)$, and the emigration rate $m_{io}(x, t)$ are, respectively,
$$m_{ij}(x, t) = \frac{O_{ij}(x, t)}{L_i(x, t)} , \tag{4}$$
$$m_{id}(x, t) = \frac{O_{id}(x, t)}{L_i(x, t)} , \tag{5}$$
$$m_{io}(x, t) = \frac{O_{io}(x, t)}{L_i(x, t)} , \tag{6}$$
where $L_i(x, t)$ denotes the number of person-years lived in status $i$ by persons
aged $x$ during the interval $(t, t+1)$. Again, in calculating $L_i(x, t)$, a period-cohort-observational plan is used.
For immigrants, no occurrence/exposure rate is defined. The population at risk
for those immigrants is the population outside the system, which is not included in
the model variables. Hence, the number of immigrants, $O_{oi}(x, t)$, will be an
exogenous variable in the final form of the model.
In general, the number of person-years is equal to the total length of all lifelines
(Keyfitz, 1968, pages 9-10) in the observation interval in question. Hence, if the
interval is one year,
$$L_i(x, t) = \int_0^1 K_i(x + \tau, t + \tau)\, d\tau .$$
However, since the individual lifelines are not usually known, an approximation to the
integral must be made. As is frequently done in demography, it is assumed that all
events are uniformly distributed over the observation interval to which they apply.
For status $i$, the number of person-years lived by members of the cohort during
$(t, t+1)$ is then
$$L_i(x, t) = \tfrac{1}{2}[K_i(x, t) + K_i(x+1, t+1)] . \tag{7}$$
Substitution of equations (4)-(7) into equation (1) gives
$$K_i(x+1, t+1) = K_i(x, t) - \Big[ m_{id}(x, t) + \sum_{j \ne i} m_{ij}(x, t) + m_{io}(x, t) \Big] L_i(x, t) + \sum_{k \ne i} m_{ki}(x, t) L_k(x, t) + O_{oi}(x, t) .$$
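This equation applies to status $i$. Repeating the equation for each of the other
statuses results in a system of $N$ equations. These may conveniently be written as a
matrix equation
$$[\mathbf{I} + \tfrac{1}{2}\mathbf{M}(x, t)]\,\mathbf{K}(x+1, t+1) = [\mathbf{I} - \tfrac{1}{2}\mathbf{M}(x, t)]\,\mathbf{K}(x, t) + \mathbf{O}_o(x, t)$$
or equivalently
$$\mathbf{K}(x+1, t+1) = [\mathbf{I} + \tfrac{1}{2}\mathbf{M}(x, t)]^{-1}[\mathbf{I} - \tfrac{1}{2}\mathbf{M}(x, t)]\,\mathbf{K}(x, t) + [\mathbf{I} + \tfrac{1}{2}\mathbf{M}(x, t)]^{-1}\mathbf{O}_o(x, t) , \tag{8}$$
where $\mathbf{I}$ is the $N \times N$ identity matrix.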
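One projection step of expression (8) can be sketched without forming the inverse explicitly, by solving the linear system in its first form. The sketch below is illustrative only: the helper names `solve` and `project` and the two-region rates are hypothetical, and exact fractions are used so that the accounting identity can be checked without rounding error. The rate matrix has exit rates on the diagonal and minus the rate from status j to status i in row i, column j.

```python
from fractions import Fraction as F


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination (a square, nonsingular)."""
    n = len(a)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] for i in range(n)]


def project(m, k, imm):
    """One step of expression (8): solve
    [I + M/2] K(x+1, t+1) = [I - M/2] K(x, t) + O_o(x, t)."""
    n = len(k)
    half = F(1, 2)
    lhs = [[(1 if i == j else 0) + half * m[i][j] for j in range(n)]
           for i in range(n)]
    rhs = [sum(((1 if i == j else 0) - half * m[i][j]) * k[j]
               for j in range(n)) + imm[i] for i in range(n)]
    return solve(lhs, rhs)


# Hypothetical two-region example: migration rates 1/10 (region 1 to 2)
# and 1/20 (region 2 to 1), death rates 1/50 and 1/25, no emigration.
m = [[F(1, 50) + F(1, 10), -F(1, 20)],
     [-F(1, 10), F(1, 25) + F(1, 20)]]
k = [F(1000), F(500)]
imm = [F(10), F(0)]
k_next = project(m, k, imm)
```

With person-years taken as the mean of the start and end populations, the projected total equals the initial total minus deaths plus immigrants, as the balancing equations require.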
The vector $\mathbf{K}(x, t)$ was defined earlier as the vector of the numbers of $x$-year-old
persons at time $t$, by status. The vector $\mathbf{O}_o(x, t)$ is the vector of the numbers of $x$-year-old immigrants, by status of destination, in the interval $(t, t+1)$. The matrix $\mathbf{M}(x, t)$
can be interpreted as the multidimensional generalisation of the one-dimensional
occurrence/exposure rates defined by equations (4)-(6). Its configuration is the
following:
$$\mathbf{M}(x, t) = \begin{bmatrix}
m_{11}(x, t) & -m_{21}(x, t) & \cdots & -m_{N1}(x, t) \\
-m_{12}(x, t) & m_{22}(x, t) & \cdots & -m_{N2}(x, t) \\
\vdots & \vdots & & \vdots \\
-m_{1N}(x, t) & -m_{2N}(x, t) & \cdots & m_{NN}(x, t)
\end{bmatrix} \tag{9}$$
with $m_{ii}(x, t) = m_{id}(x, t) + \sum_{j \ne i} m_{ij}(x, t) + m_{io}(x, t)$. Hence, rates for external events
appear only on the main diagonal, whereas internal events have rates off the diagonal
as well.
The multidimensional population projection model for the intermediate age groups,
as given in expression (8), may also be written as
$$\mathbf{K}(x+1, t+1) = \mathbf{S}(x, t)\,\mathbf{K}(x, t) + \tfrac{1}{2}[\mathbf{I} + \mathbf{S}(x, t)]\,\mathbf{O}_o(x, t) ,$$
with $\mathbf{S} = (\mathbf{I} + \tfrac{1}{2}\mathbf{M})^{-1}(\mathbf{I} - \tfrac{1}{2}\mathbf{M})$ (footnote 1) as the matrix of period-cohort transition probabilities.
An element $s_{ij}(x, t)$ of $\mathbf{S}(x, t)$ represents the probability that a person aged $x$ and
classified in status $i$ at time $t$ will survive until $t+1$ (that is, will neither die nor
emigrate) and at that time will occupy status $j$. Between time $t$ and time $t+1$ this
person may experience several direct transitions (moves). In this respect, $\mathbf{S}$ differs
fundamentally from the matrix $\mathbf{M}$ containing occurrence/exposure rates. These rates
describe direct transitions only. In a marital-status-specific model, $\mathbf{M}$ does not contain
any rates which describe direct transitions from 'never married' to 'divorced', for
instance. However, because of matrix inversion and multiplication, transition
probabilities between these two states, although very small, are not equal to zero.
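The step from expression (8) to the form in terms of S rests on a short identity, written out here as a worked check using only the definition of S given above:

```latex
\begin{aligned}
\mathbf{I} + \mathbf{S}
  &= (\mathbf{I} + \tfrac{1}{2}\mathbf{M})^{-1}
     \left[ (\mathbf{I} + \tfrac{1}{2}\mathbf{M}) + (\mathbf{I} - \tfrac{1}{2}\mathbf{M}) \right]
   = 2\,(\mathbf{I} + \tfrac{1}{2}\mathbf{M})^{-1} , \\
\text{so that}\quad
(\mathbf{I} + \tfrac{1}{2}\mathbf{M})^{-1}
  &= \tfrac{1}{2}(\mathbf{I} + \mathbf{S}) .
\end{aligned}
```

Substituting this for the coefficient of the immigration vector in expression (8) gives the S form directly.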
Before ending the discussion on the model for the intermediate age groups, we
note that an expression equivalent to equation (8) is
$$\mathbf{K}(x+1, t+1) = [\mathbf{I} - \mathbf{Q}(x, t)]\,\mathbf{K}(x, t) + [\mathbf{I} - \tfrac{1}{2}\mathbf{Q}(x, t)]\,\mathbf{O}_o(x, t) ,$$
with $\mathbf{Q} = (\mathbf{I} + \tfrac{1}{2}\mathbf{M})^{-1}\mathbf{M}$. Hence the matrix $\mathbf{Q}$ is the multidimensional equivalent of the
one-dimensional probability, $q$, for an event. In the case of a uniform distribution of
the events in question over the observation interval, $q$ is defined as $q = (1 + \tfrac{1}{2}m)^{-1}m$,
where $m$ is the accompanying occurrence/exposure rate. This shows the analogy
between the definitions of $\mathbf{Q}$ and $q$. The element $q_{ij}$ of $\mathbf{Q}$ ($i \ne j$) is equal to $-s_{ij}$ and
hence cannot be interpreted as a probability. However, for a diagonal element $q_{ii}$ of
$\mathbf{Q}$ we find
$$q_{ii}(x, t) = 1 - s_{ii}(x, t) .$$
Therefore, $q_{ii}$ is the probability that an individual will leave status $i$ before the end of
the particular interval, because of either an internal event or an external event. For
immigrants this probability of leaving is only half that for those already present at
time $t$. This is purely a consequence of the assumption of a uniform distribution of
events.
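The relation between M, S, and Q can be illustrated with a small numerical sketch. It is not taken from the forecasts: the three statuses, the rates, and the helper names are hypothetical, and deaths and migration are left out so that every column of S sums to one. Rows index the status at the end of the interval and columns the status at the start, following the layout of expression (9).

```python
from fractions import Fraction as F


def inverse(a):
    """Invert a small nonsingular matrix by Gauss-Jordan elimination."""
    n = len(a)
    aug = [list(row) + [F(int(i == j)) for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def matmul(a, b):
    """Plain matrix product of two nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


# Hypothetical rates: first marriage 1/10, divorce 1/20.
# Statuses: 0 never married, 1 married, 2 divorced.
m12, m23 = F(1, 10), F(1, 20)
M = [[m12, 0, 0],
     [-m12, m23, 0],
     [0, -m23, 0]]
n = len(M)
I = [[F(int(i == j)) for j in range(n)] for i in range(n)]
plus = [[I[i][j] + F(M[i][j]) / 2 for j in range(n)] for i in range(n)]
minus = [[I[i][j] - F(M[i][j]) / 2 for j in range(n)] for i in range(n)]
S = matmul(inverse(plus), minus)   # period-cohort transition probabilities
Q = matmul(inverse(plus), M)       # equals I - S
```

Although M holds no direct rate from never married to divorced, the corresponding entry of S is small but positive, because a person can marry and divorce within the same interval.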
The derivation of the model equations concerning the highest and open-ended age
group is analogous to that for the intermediate age group. The result is
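$$\begin{aligned}
\mathbf{K}(z, t+1) ={}& [\mathbf{I} + \tfrac{1}{2}\mathbf{M}(z-1, t)]^{-1}[\mathbf{I} - \tfrac{1}{2}\mathbf{M}(z-1, t)]\,\mathbf{K}(z-1, t) \\
& + [\mathbf{I} + \tfrac{1}{2}\mathbf{M}(z, t)]^{-1}[\mathbf{I} - \tfrac{1}{2}\mathbf{M}(z, t)]\,\mathbf{K}(z, t) \\
& + [\mathbf{I} + \tfrac{1}{2}\mathbf{M}(z-1, t)]^{-1}\mathbf{O}_o(z-1, t) + [\mathbf{I} + \tfrac{1}{2}\mathbf{M}(z, t)]^{-1}\mathbf{O}_o(z, t) .
\end{aligned} \tag{10}$$
Footnote 1: For brevity, the operand $(x, t)$ is deleted from this term and from some that follow.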
Equivalent expressions are
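$$\begin{aligned}
\mathbf{K}(z, t+1) ={}& \mathbf{S}(z-1, t)\,\mathbf{K}(z-1, t) + \mathbf{S}(z, t)\,\mathbf{K}(z, t) \\
& + \tfrac{1}{2}[\mathbf{I} + \mathbf{S}(z-1, t)]\,\mathbf{O}_o(z-1, t) + \tfrac{1}{2}[\mathbf{I} + \mathbf{S}(z, t)]\,\mathbf{O}_o(z, t)
\end{aligned}$$
and
$$\begin{aligned}
\mathbf{K}(z, t+1) ={}& [\mathbf{I} - \mathbf{Q}(z-1, t)]\,\mathbf{K}(z-1, t) + [\mathbf{I} - \mathbf{Q}(z, t)]\,\mathbf{K}(z, t) \\
& + [\mathbf{I} - \tfrac{1}{2}\mathbf{Q}(z-1, t)]\,\mathbf{O}_o(z-1, t) + [\mathbf{I} - \tfrac{1}{2}\mathbf{Q}(z, t)]\,\mathbf{O}_o(z, t) .
\end{aligned}$$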
The model equation for the first age group is derived in two steps. First, an equation
for the number of births in the observation interval is given. Next, an expression is
obtained for the number of zero-year-old children at time t+ 1.
Recall that $B_{ij}(x, t)$ is the number of children born in status $i$ during the interval
$(t, t+1)$ to mothers in status $j$ and aged $x$. The fertility rate (occurrence/exposure
rate of childbearing) is
$$m^{b}_{ij}(x, t) = \frac{B_{ij}(x, t)}{L^{f}_{j}(x, t)} ,$$
where $L^{f}_{j}(x, t)$ is the number of person-years lived in the interval by the female
population at risk. If a uniform distribution of events for these women is assumed,
then
$$L^{f}_{j}(x, t) = \tfrac{1}{2}[K^{f}_{j}(x, t) + K^{f}_{j}(x+1, t+1)] ,$$
where $K^{f}_{j}(x, t)$ is the female population in status $j$, aged $x$, at time $t$. For every
status $i$ that children will occupy after their birth, the number of live-born children is
$$B_i(t) = \sum_j \sum_x m^{b}_{ij}(x, t)\,L^{f}_{j}(x, t) = \tfrac{1}{2} \sum_j \sum_x m^{b}_{ij}(x, t)\,[K^{f}_{j}(x, t) + K^{f}_{j}(x+1, t+1)] , \tag{11}$$
and the vector of the numbers of live-born children in the interval, by status, can be
written as
$$\mathbf{B}(t) = [B_1(t), B_2(t), \ldots, B_N(t)]^{\mathrm{T}} .$$
For marital-status-specific models, all children are born in the never-married
status. Hence, $\mathbf{B}(t)$ contains only one nonzero element. For multiregional models,
children are born in the region of residence of the mother, thus equation (11) should
be reformulated as $B_i(t) = \sum_x m^{b}_{ii}(x, t)\,L^{f}_{i}(x, t)$.
Equation (2) shows that children may make transitions from $i$ to $j$, may emigrate,
may immigrate, and/or may die between birth and the end of the interval. Occurrence/
exposure rates for these events (except for immigration) will be denoted by $m_{ij}(00, t)$,
$m_{io}(00, t)$, and $m_{id}(00, t)$, respectively. They are defined as the number of events
divided by the number of person-years lived by these children during the interval.
The observation interval consists of a triangle, for instance, MNT in figure 1. The
number of person-years, $L_i(00, t)$, is given by
$$L_i(00, t) = \tfrac{1}{2}[0 + K_i(0, t+1)] = \tfrac{1}{2} K_i(0, t+1) .$$
Hence, the occurrence/exposure rate for death in the interval $(t, t+1)$ for the lowest
age group is
$$m_{id}(00, t) = \frac{2\,O_{id}(00, t)}{K_i(0, t+1)} .$$
Rates for transition between states $i$ and $j$ and for emigration are obtained analogously.
Substitution of the occurrence/exposure rates in the balancing equation (2) leads to
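$$\begin{aligned}
\mathbf{K}(0, t+1) &= [\mathbf{I} + \tfrac{1}{2}\mathbf{M}(00, t)]^{-1}[\mathbf{B}(t) + \mathbf{O}_o(00, t)] \\
&= \tfrac{1}{2}[\mathbf{I} + \mathbf{S}(00, t)]\,[\mathbf{B}(t) + \mathbf{O}_o(00, t)] \\
&= [\mathbf{I} - \tfrac{1}{2}\mathbf{Q}(00, t)]\,[\mathbf{B}(t) + \mathbf{O}_o(00, t)] ,
\end{aligned} \tag{12}$$
where the matrices $\mathbf{M}(00, t)$, $\mathbf{S}(00, t)$, and $\mathbf{Q}(00, t)$ have definitions and interpretations
similar to those of the other age groups.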
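For a single status, expression (12) reduces to a scalar formula that is easy to verify against the balancing equation. The sketch below assumes one status only, a hypothetical function name, and made-up inputs; the rate is the total exit rate (death plus emigration) in the interval of birth.

```python
def newborn_survivors(births, immigrants, m00):
    """Single-status version of expression (12):
    K(0, t+1) = (B + O_o) / (1 + m/2), where m is the total exit rate
    (death plus emigration) in the interval of birth."""
    return (births + immigrants) / (1.0 + 0.5 * m00)


births, immigrants, m00 = 2000.0, 40.0, 0.02   # hypothetical inputs
k0 = newborn_survivors(births, immigrants, m00)
person_years = 0.5 * k0        # L(00, t) = K(0, t+1) / 2
exits = m00 * person_years     # deaths and emigrations of newborns
```

The result satisfies the balancing equation: children aged zero at the end of the interval equal births plus immigrants minus exits.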
The complete multidimensional projection model consists of the expressions (8),
(10), (11), and (12). This model constitutes the basis for the applications in the
following sections. However, in section 3, I shall first pay some attention to the
concepts of internal and external consistency.
3 Internal and external consistency
3.1 Definitions, nature, and origin of the problem
In this paper, consistency is defined as a situation in which a set of model variables
satisfy a certain constraint. When exogenous variables play a role in this constraint,
we speak of external consistency. Internal consistency arises when the constraint
applies only to endogenous variables. The definitions of internal and external
consistency can be formalised as follows. Let $u_1, u_2, \ldots, u_n$ be a set of (endogenous)
model variables. Internal consistency is defined as $g(u_1, u_2, \ldots, u_n) = 0$, where $g$ is a
(possibly vector-valued) function of the model variables. This function depicts the constraint
in question, which is often dictated by reality. External consistency arises in situations
where $g(u_1, u_2, \ldots, u_n) = e$, in which $e$ denotes one or more exogenous variables.
The concept of consistency used in this paper can be interpreted as numerical
consistency. Other forms, such as logical consistency (a concept which may be
applied to model structure instead of model variables), will not be treated here.
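The two definitions of numerical consistency can be made concrete with a small sketch. The function name, the tolerance, and all figures are hypothetical and serve only to show the difference between a constraint on endogenous variables alone and one that involves an exogenous value.

```python
import math


def is_consistent(g_value, target=0.0, tol=1e-9):
    """True when g(u_1, ..., u_n) equals the target: 0 for internal
    consistency, an exogenous value e for external consistency."""
    return math.isclose(g_value, target, rel_tol=0.0, abs_tol=tol)


# Internal: marriages of men and women (both endogenous) must balance.
male_marriages, female_marriages = 85300.0, 84100.0     # hypothetical
internal_ok = is_consistent(male_marriages - female_marriages)

# External: regional populations must add up to a national figure e
# that is supplied from outside the model.
regional = [5.1e6, 6.3e6, 3.0e6]                        # hypothetical
national_total = 14.4e6                                 # exogenous e
external_ok = is_consistent(sum(regional), target=national_total)
```

In this made-up case the marriage totals disagree, so the internal constraint fails, whereas the regional figures happen to satisfy the external one.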
An example may clarify the terms internal consistency and external consistency. In
1976 the Netherlands Central Bureau of Statistics (CBS) published the 1975-based
population forecasts of the Netherlands. In these forecasts, the population was
decomposed by age and sex; in addition, women were classified by marital status.
Hence, the forecasts produced information on the future numbers of marriages,
divorces, and new widows among these women (CBS, 1976). Additional calculations
were published in 1979, in which the marital status of men was also considered. In
the construction of the model for these calculations, three external consistency
constraints were formulated (CBS, 1979, page 9). In each year, the total number of
marrying women (both first marriages and remarriages, summed over all ages) should
be equal to the total number of marrying men. The same applies to divorce. Finally,
the total number of new widowers in a certain year should equal the total number of
dying married females. In the 1980-based forecasts of the CBS, males and females
were treated simultaneously. Hence the forecasters had to solve an internal
consistency problem. In chapter 4 the nuptiality model used in these forecasts is
discussed in more detail.
I shall now pay some attention to the reasons why inconsistencies may arise. In
general we can state that inconsistencies may be expected to occur when two or more
(sub)models describe the same elements. In demographic projection models these
elements consist of population categories, defined by cross-classifications of certain
(demographic) characteristics. Individuals aged thirty years and living in Europe
constitute such a category, as do thirty-year-old married women living in the Netherlands.
A model for the second population category describes elements which also constitute
the first group. When both groups are modelled, inconsistencies may arise.
From the example given above it is seen that (dis)aggregation may be a cause of
inconsistencies. Rogers (1976) and Gibberd (1981) discuss the consistency problem
within the framework of aggregation and decomposition.
The first term denotes a
reduction of the number of characteristics (or categories) in the population model; hence the dimension of the model is decreased. Reasons for reducing model
complexity may be efficiency and retainment of the overall picture. Decomposition
involves breaking up a large system into several relatively independent subsystems of
lower dimensionality. The solutions of the subsystem problems can then be combined
and, if necessary (for instance, to account for interactions between the smaller parts),
modified to yield the solution of the original problem.
In addition to aggregation and decomposition, inconsistencies may also be caused
by inadequate modelling.
I refer especially to the situation in which the model
describes the behaviour of individuals, despite the fact that their behaviour is, in
reality, derived from the behaviour of a group of persons. The reason for such an
inadequate modelling strategy may be that individual behaviour is easier to trace than
group behaviour. The following example may clarify this.
Suppose a model is being constructed to project the future number of households.
Following the multidimensional modelling strategy, one classifies individuals according
to, say, household category and sex. Within the dimension of household category, the
following states are distinguished: (1) living in a single household, (2) living in a
two-person consensual union, (3) living in a family household. Consider a male
and a female living in a consensual union and suppose the man moves to a family
household (for instance, by marrying a divorced mother). Then the male's transition
implies a transition for the female as well: she will become single. Therefore,
constraints have to be added to the model to ensure numerical consistency between
males and females leaving a consensual union. Not only the dissolution but also the
formation of relationships may create inconsistencies when individuals instead of
couples are considered in the model. However, couples are much more difficult to
monitor through time than are individuals, at least when standard macrodemographic
models are used.
3.2 Approaches to the solution of inconsistencies in demographic projection
models
As was stated in the introduction, little methodological research has so far been done
into the general consistency problem in demographic projection models. In sections
4 and 5 we shall see that the solution strategies for the two-sex problem (treating
inconsistencies in marital-status-specific models) were developed independently of
those for the aggregation of multiregional models into national models. However, a
common line of thought can be established with respect to these strategies, for they
frequently involve the application of the following method:
(1) formulate initial values of model parameters;
(2) check and adjust for consistency;
(3) translate consistent model variables into adjusted parameter values.
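The three steps can be sketched for the two-sex marriage case. This is an illustration under strong assumptions and not the algorithm developed later in the paper: the function name is hypothetical, the common total is taken to be the arithmetic mean of the male and female totals purely for the sake of the example, the adjustment is proportional within each sex, and person-years of exposure are held fixed.

```python
def adjust_two_sex(rates_m, exposure_m, rates_f, exposure_f):
    """Three-step consistency sketch for marriages by age and sex."""
    # (1) initial parameter values give initial flows by age
    flows_m = [r * l for r, l in zip(rates_m, exposure_m)]
    flows_f = [r * l for r, l in zip(rates_f, exposure_f)]
    # (2) check and adjust: scale both sexes to a common total
    total_m, total_f = sum(flows_m), sum(flows_f)
    target = 0.5 * (total_m + total_f)   # assumed compromise total
    flows_m = [f * target / total_m for f in flows_m]
    flows_f = [f * target / total_f for f in flows_f]
    # (3) translate consistent flows back into adjusted rates
    adj_m = [f / l for f, l in zip(flows_m, exposure_m)]
    adj_f = [f / l for f, l in zip(flows_f, exposure_f)]
    return adj_m, adj_f


# Hypothetical rates and person-years for two age groups.
exposure = [1000.0, 2000.0]
adj_m, adj_f = adjust_two_sex([0.10, 0.05], exposure,
                              [0.12, 0.03], exposure)
marriages_m = sum(r * l for r, l in zip(adj_m, exposure))
marriages_f = sum(r * l for r, l in zip(adj_f, exposure))
```

After the adjustment the two totals agree, and because the scaling is proportional the relative age pattern of each sex's rates is unchanged.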
Sometimes, the model variables checked for consistency are stock variables only.
This may be the case in multiregional models that have to satisfy national results.
But often, and this applies to both internal and external consistency, the constraints
are formulated in terms of the model flow variables. This then automatically
produces a consistent picture for the stock variables, because of the simple accounting
equations. Hence consistency algorithms for multidimensional models, which adopt
the three-stage solution strategy mentioned earlier in terms of flow variables, need a
device which translates consistent flow variables into the corresponding adjusted
parameters. Therefore, in this section, I shall give a reformulation of the multidimensional population projection model (8), (10), (11), and (12), which allows for
the inclusion of (some) components (flow variables) expressed in absolute numbers,
instead of occurrence/exposure rates.
Let the $N \times N$ matrix $M$ of occurrence/exposure rates, defined in expression (9), be written as the sum of $N \times N$ matrices $M_1, M_2, \ldots, M_c$, one for every component. This means that for each external component, the corresponding $M_i$ is a diagonal matrix containing the appropriate occurrence/exposure rates in its diagonal. Internal components, however, give rise to matrices $M_i$ that have occurrence/exposure rates, with opposite signs, off the diagonals, in addition to rates on their diagonals. For instance, the $M$-matrix of a marital-status-specific population classified in four categories (never-married, currently married, divorced, and widowed) has the following configuration for every age and period:

$$
M = \begin{bmatrix}
m_{11} & 0 & 0 & 0 \\
-m_{12} & m_{22} & -m_{32} & -m_{42} \\
0 & -m_{23} & m_{33} & 0 \\
0 & -m_{24} & 0 & m_{44}
\end{bmatrix},
$$

where $m_{12}$ describes first marriage, $m_{23}$ divorce, $m_{24}$ transition to widowhood, and $m_{32}$ and $m_{42}$ remarriage of divorced and widowed persons, respectively. These are all internal events. The external events of death and emigration are contained in the diagonal elements, since $m_{ii} = m_{id} + \sum_{j \neq i} m_{ij} + m_{io}$.
Let $M_d$ describe mortality, $M_o$ out-migration, $M_{12}$ first marriage, $M_{23}$ divorce, and so on. This means, for instance, that $M_d$ is a diagonal matrix with diagonal $(m_{1d}, m_{2d}, m_{3d}, m_{4d})$, whereas

$$
M_{23} = \begin{bmatrix}
0 & 0 & 0 & 0 \\
0 & m_{23} & 0 & 0 \\
0 & -m_{23} & 0 & 0 \\
0 & 0 & 0 & 0
\end{bmatrix}.
$$

Similar lines of thought yield definitions for the remaining matrices. Thus we see that in this example

$$
M = M_d + M_o + M_{12} + M_{23} + M_{24} + M_{32} + M_{42}
$$

or, in general,

$$
M = \sum_{i=1}^{c} M_i .
$$
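The decomposition can be checked numerically. The following is a minimal sketch in plain Python, assuming hypothetical rate values (none of them are taken from the forecasts) and hypothetical helper names `external`, `internal`, and `add`. It builds the component matrices of the four-status example and sums them; because every internal transition enters once with a plus sign and once with a minus sign, each column of $M$ should sum to the external rates only.

```python
N = 4  # statuses: 1 never-married, 2 currently married, 3 divorced, 4 widowed


def zeros(n):
    return [[0.0] * n for _ in range(n)]


def external(rates):
    """Diagonal matrix for an external component (death, emigration)."""
    m = zeros(len(rates))
    for k, rate in enumerate(rates):
        m[k][k] = rate
    return m


def internal(origin, dest, rate, n=N):
    """Matrix for the internal transition origin -> dest (1-based statuses):
    +rate on the diagonal of the origin, -rate in row dest of the same column."""
    m = zeros(n)
    m[origin - 1][origin - 1] = rate
    m[dest - 1][origin - 1] = -rate
    return m


def add(*mats):
    n = len(mats[0])
    return [[sum(m[r][c] for m in mats) for c in range(n)] for r in range(n)]


# Hypothetical occurrence/exposure rates, for illustration only.
death = [0.002, 0.004, 0.005, 0.030]
emig = [0.003, 0.002, 0.002, 0.001]

M_d, M_o = external(death), external(emig)
M_12 = internal(1, 2, 0.080)  # first marriage
M_23 = internal(2, 3, 0.010)  # divorce
M_24 = internal(2, 4, 0.006)  # transition to widowhood
M_32 = internal(3, 2, 0.050)  # remarriage of divorced persons
M_42 = internal(4, 2, 0.020)  # remarriage of widowed persons

M = add(M_d, M_o, M_12, M_23, M_24, M_32, M_42)

# Internal transitions cancel within a column; only death and emigration remain.
column_sums = [sum(M[r][c] for r in range(N)) for c in range(N)]
```

The column-sum property is a convenient sanity check whenever component matrices are assembled by hand.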
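To every matrix $M_i$ there corresponds a vector $O_i$ of the absolute numbers of direct transitions that belong to the $i$th component. This vector of flow variables is defined as $O_i = M_i L$, where $L$ is the vector of person-years, of which the elements are given in expression (7). Now the following reformulation of the multidimensional population projection model (8) can be proven.
For every age $x$ between 0 and $z$ and for every time $t$,

$$
K_1 = \Bigl[I + \tfrac{1}{2}\sum_{i \in C} M_i\Bigr]^{-1}\Bigl[I - \tfrac{1}{2}\sum_{i \in C} M_i\Bigr] K_0
+ \Bigl[I + \tfrac{1}{2}\sum_{i \in C} M_i\Bigr]^{-1}\Bigl[O_o - \sum_{i \notin C} O_i\Bigr], \tag{13}
$$

where $K_1$ and $K_0$ represent $K(x+1, t+1)$ and $K(x, t)$, respectively, $C$ is any subset of the set $\{i \mid i = 1, 2, \ldots, c\}$ of (internal and external) components, and $\sum_{i \in C} M_i + \sum_{i \notin C} M_i = M$. The components in $C$ thus enter through their rates, and the remaining components through their absolute numbers $O_i$.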
The proof of this identity is straightforward. It is based upon the observation that
$K_1 = K_0 - ML + O_o = K_0 - \tfrac{1}{2}M(K_1 + K_0) + O_o$.
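The intermediate steps can be spelled out as follows. This is only a sketch of the algebra, assuming the linear relation $L = \tfrac{1}{2}(K_0 + K_1)$ implied by the observation above, and reading $C$ as the set of components that remain expressed as rates, so that the flows of all other components are replaced by their absolute numbers $O_i = M_i L$.

```latex
\begin{align*}
K_1 &= K_0 - \sum_{i \in C} M_i L - \sum_{i \notin C} O_i + O_o ,
  \qquad L = \tfrac{1}{2}(K_0 + K_1), \\
\Bigl[I + \tfrac{1}{2}\sum_{i \in C} M_i\Bigr] K_1
  &= \Bigl[I - \tfrac{1}{2}\sum_{i \in C} M_i\Bigr] K_0 + O_o - \sum_{i \notin C} O_i .
\end{align*}
```

Premultiplying both sides of the second line by the inverse of $I + \tfrac{1}{2}\sum_{i \in C} M_i$ gives the reformulated projection equation.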
The usefulness of expression (13) can be illustrated by the following example. Let us suppose, in a multidimensional model, that the absolute number of deaths has been specified externally, by status. Let $O_d$ denote the flow vector for mortality, thus $O_d = (O_{1d}, O_{2d}, \ldots, O_{Nd})^{\mathrm{T}}$, and $M_d$ the (unknown) diagonal matrix containing death rates, by status. Then for every age $x$ between 0 and $z$, and for every $t$,

$$
K(x+1, t+1) = \bigl[I + \tfrac{1}{2}M^*(x,t)\bigr]^{-1}\bigl[I - \tfrac{1}{2}M^*(x,t)\bigr]K(x,t)
+ \bigl[I + \tfrac{1}{2}M^*(x,t)\bigr]^{-1}\bigl[O_o(x,t) - O_d(x,t)\bigr],
$$

where $M^* = M - M_d$, the occurrence/exposure matrix for the 'remaining' components.
Expression (13) provides the basis for the algorithms that ensure consistency between multiregional and national projections, in section 5. In those algorithms, death and emigration are specified externally. After $K_1$ has been computed by means of expression (13), adjusted parameter values are obtained by letting $O_i = \tfrac{1}{2}M_i(K_1 + K_0)$, for every component $i$, and solving for the unknown occurrence/exposure rates.
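A single-status version of this example may help to fix ideas. The sketch below assumes a population with one status only, so that all matrices reduce to scalars; the function name and the numbers are hypothetical. Emigration enters as a rate, immigration and deaths as absolute numbers, and the death rate is recovered afterwards from the person-years.

```python
def project_with_external_deaths(k0, m_o, o_in, deaths):
    """Scalar analogue of expression (13).

    k0     -- population at the start of the interval
    m_o    -- emigration rate (occurrence/exposure)
    o_in   -- absolute number of immigrants
    deaths -- externally specified absolute number of deaths
    """
    k1 = ((1 - 0.5 * m_o) * k0 + (o_in - deaths)) / (1 + 0.5 * m_o)
    person_years = 0.5 * (k0 + k1)
    m_d = deaths / person_years  # adjusted death rate implied by the deaths
    return k1, person_years, m_d


# Hypothetical inputs, for illustration only.
k0, m_o, o_in, deaths = 1000.0, 0.01, 5.0, 12.0
k1, person_years, m_d = project_with_external_deaths(k0, m_o, o_in, deaths)

# The accounting equation K1 = K0 - M L + O_o should hold with the recovered rate.
accounting_k1 = k0 - (m_o + m_d) * person_years + o_in
```

The recovered rate reproduces exactly the number of deaths that was imposed, which is the property the consistency algorithms rely on.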
4 Application: the two-sex problem in marital-status-specific population projection models
The first application of the concepts that have been outlined so far applies to
inconsistencies that give rise to the so-called two-sex problem in nuptiality models.
Attention shall be focussed on marital-status-specific projection models and an
algorithm which ensures internal consistency between male and female nuptiality
behaviour shall be discussed. A numerical illustration will be given using data from
the 1980-based official population forecasts of the Netherlands. An extensive
treatment of the problem has been given elsewhere (Keilman, 1985).
4.1 The two-sex problem
Usually demographic models of observed populations do not contain interactions
between the sexes. Most of them deal with the experiences of a single sex; at
most, the behaviour of males and that of females are treated separately. Attempts
to reflect the experience both of males and of females simultaneously were confronted
with the 'two-sex problem'. By this we mean that observed rates of some demographic
events for males and those for females cannot both be used consistently in a
demographic model if the model population differs from the observed one.
Such demographic events typically involve interaction of the sexes: nuptiality and
fertility, for instance. Suppose two sets of age-specific marriage rates were observed,
one for males and one for females. Now for those two sets of rates to reflect the
behaviour of any population, the total number of marriages derived from the male
rates must equal the total number of marriages implied by the female rates. In
general, however, the rates taken from one population will not yield consistent
numbers of marriages when applied to another population. An analogous situation
occurs with divorce, transition to widowhood, and fertility if these three phenomena
are described for the two sexes separately (Schoen, 1981, page 201).
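The inconsistency is easy to reproduce with a toy calculation. The sketch below uses hypothetical rates and populations in two age groups and, as a simplification, multiplies rates by stocks of unmarried persons rather than by person-years. The rates are chosen so that they are consistent in the 'observed' population but not in a 'model' population with a different structure.

```python
def total_marriages(rates, unmarried):
    """Total marriages implied by age-specific rates and unmarried persons."""
    return sum(rates[age] * unmarried[age] for age in rates)


# Hypothetical age-specific marriage rates, by sex.
male_rates = {"20-24": 0.06, "25-29": 0.10}
female_rates = {"20-24": 0.10, "25-29": 0.08}

# Observed population: both sets of rates imply 110 marriages.
observed_males = {"20-24": 1000, "25-29": 500}
observed_females = {"20-24": 700, "25-29": 500}

# Model population with a different age and sex structure.
model_males = {"20-24": 1200, "25-29": 500}
model_females = {"20-24": 700, "25-29": 450}

observed_gap = (total_marriages(male_rates, observed_males)
                - total_marriages(female_rates, observed_females))
model_gap = (total_marriages(male_rates, model_males)
             - total_marriages(female_rates, model_females))
```

In the model population the male rates imply 122 marriages and the female rates 106, so the two totals can no longer both be honoured.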
There is no generally accepted method available to solve this two-sex problem.
Several authors have, however, mentioned requirements (also labelled 'axioms') that
should be met by a realistic two-sex marriage model. When these requirements are
borne in mind, supplemented if necessary with other considerations, a realistic model
can be looked for.
4.2 Requirements for a realistic two-sex marriage model
Recent overviews of the requirements for realistic two-sex marriage models have been
given by Pollard (1977) and Wijewickrema (1980). The most important demands will
be considered here. Monogamy is presupposed. All symbols apply to a fixed period
(flow variables) or moment (stock variables).
(a) Availability: the number of marriages for males (females) aged $x$ ($y$) cannot exceed the number of eligible males (females) aged $x$ ($y$). Thus,

$$
\sum_{y} \nu(x, y) \le \mu(x) \quad \text{and} \quad \sum_{x} \nu(x, y) \le \phi(y),
$$

where $\nu(x, y)$ is the number of marriages during the period under consideration between males aged $x$ and females aged $y$; $\mu(x)$ and $\phi(y)$ denote the number of eligible males aged $x$ and eligible females aged $y$, respectively.
(b) Monotonicity: increased availability at any specified age can, ceteris paribus, result only in an increase in marriages. In symbols,

$$
\frac{\partial \nu(x, y)}{\partial \mu(x)} \ge 0 \quad \text{and} \quad \frac{\partial \nu(x, y)}{\partial \phi(y)} \ge 0 .
$$

For some age combinations, strict inequalities should occur.
(c) Homogeneity: an increase in the number of available members of each sex by the same factor entails an increase in the number of marriages by the same factor. If the numbers of unmarried males (females) aged $x$ ($y$) are arranged into a vector $\boldsymbol{\mu}$ [$\boldsymbol{\phi}$], then

$$
\nu[x, y, n\boldsymbol{\mu}, n\boldsymbol{\phi}] = n\,\nu[x, y, \boldsymbol{\mu}, \boldsymbol{\phi}], \qquad n > 0 .
$$

This requirement states that if the population were to double, we would expect the number of marriages to double rather than to quadruple or remain unchanged.
(d) Competition: the number of marriages $\nu(x, y)$ for age combination $(x, y)$ should be a nonincreasing [and, over some area of $(x, y)$, strictly decreasing] function of $\mu(x')$ for $x' \neq x$, and similarly, in the case of females, of $\phi(y')$, $y' \neq y$. Thus,

$$
\frac{\partial \nu(x, y)}{\partial \mu(x')} \le 0 \quad (x' \neq x) \quad \text{and} \quad \frac{\partial \nu(x, y)}{\partial \phi(y')} \le 0 \quad (y' \neq y) .
$$
This axiom stems from the fact that the marriage chances of a male aged twenty-five, say, will very likely decrease if there is an increase in the supply of eligible males aged twenty-seven or thirty-five. However, the decrease caused by twenty-seven-year-old males is not necessarily of the same magnitude as that caused by thirty-five-year-old males, as will be stated in the next requirement.
(e) Substitution (or relative competition): the negative effect of an increase of $\mu(x')$ ($x' \neq x$) on $\nu(x, y)$ depends on how close $x'$ is to $x$. It is such that the decrease of $\nu(x, y)$ for an increase of $\mu(x')$ is greater than the decrease of $\nu(x, y)$ for an equivalent increase of $\mu(x'')$ if $x'$ is closer to $x$ than is $x''$. This means that if

$$
(x'' - x) > (x' - x) > 0 \quad \text{or if} \quad (x'' - x) < (x' - x) < 0
$$

then

$$
\frac{\partial \nu(x, y)}{\partial \mu(x')} < \frac{\partial \nu(x, y)}{\partial \mu(x'')} < 0 .
$$

Thus an extra supply of twenty-seven-year-old single males leads to a stronger
decrease in the number of marrying males aged twenty-five than does a similar extra
amount of single males aged thirty-five. This stems from the fact that some of the
females who initially married twenty-five-year-old males will generally prefer males
aged twenty-seven to males aged thirty-five. Analogous statements are valid when the
sexes are interchanged.
(f) Consistency: the total number of male marriages should equal the total number of female marriages. A stronger version of this axiom should be used when marriages are distinguished according to age combination of the partners. Then the number of marriages between males aged $x$ and females aged $y$ implied by the male rates must be equal to the number of $(x, y)$ age combination marriages implied by the female rates.
The consistency requirement plays a key role in algorithms which solve the two-sex
problem for models based upon individual behaviour. However, not all of the
remaining five requirements given here are equally important. For instance, availability
is as strict an axiom as consistency is, whereas competition and substitution are only
desirable.
The six requirements apply to a marriage model. Models describing divorce or
transition to widowhood do not have to meet the competition requirement or the
substitution requirement.
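Some of these requirements can be checked mechanically for a candidate marriage function. The sketch below assumes a deliberately simple, hypothetical function for a single age combination, namely a constant `c` times the harmonic mean of the numbers of eligible males and females; it is not the CBS model. With one age group only, competition and substitution cannot be examined, so the sketch is restricted to availability, monotonicity, and homogeneity.

```python
def marriages(mu, phi, c=0.2):
    """Hypothetical marriage function for one age combination.

    mu, phi -- numbers of eligible males and females
    c       -- propensity constant; c <= 0.5 keeps the result below min(mu, phi)
    """
    if c > 0.5:
        raise ValueError("c must not exceed 0.5 for availability to hold")
    return c * 2.0 * mu * phi / (mu + phi)


mu, phi = 100.0, 900.0
base = marriages(mu, phi)

availability_ok = base <= min(mu, phi)
monotonicity_ok = (marriages(mu + 10.0, phi) >= base
                   and marriages(mu, phi + 10.0) >= base)
homogeneity_ok = abs(marriages(2 * mu, 2 * phi) - 2 * base) < 1e-9
```

A multi-age marriage function would need analogous checks on the cross-age derivatives to cover requirements (d) and (e).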
4.3 Solution to the two-sex problem in the 1980-based population forecasts of the Netherlands
The model of the 1980-based population forecasts of the Netherlands was
constructed by the Netherlands Central Bureau of Statistics. It possesses strong
multidimensional characteristics (see Cruijsen and Keilman, 1984, section 2). It is a
purely demographic, deterministic, and time-discrete model, based upon the cohort-component approach. The demographic characteristics (dimensions) by which the
population of the Netherlands is classified are sex, age (0, 1, 2, ..., 99 or over), and
marital status (never-married, currently married, divorced, widowed). This means that
the CBS model contains the external components of fertility, mortality, and external
migration as well as the internal components of marriage and marriage dissolution.
The model parameters of the CBS model are of the period-cohort type. However,
its formulas differ somewhat from the multidimensional model equations given in
section 2.2, for the CBS model can be viewed as a first-order approximation of the
multidimensional model contained in expressions (8), (10), (11), and (12). This can
be clarified as follows. Recall equation (8):
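$$
K(x+1, t+1) = \bigl[I + \tfrac{1}{2}M(x,t)\bigr]^{-1}\bigl[I - \tfrac{1}{2}M(x,t)\bigr]K(x,t) + \bigl[I + \tfrac{1}{2}M(x,t)\bigr]^{-1}O_o(x,t) .
$$

Under quite weak conditions, the matrix to be inverted may be expanded to a power series in $\tfrac{1}{2}M$. Since

$$
\bigl(I + \tfrac{1}{2}M\bigr)^{-1} = I - \tfrac{1}{2}M + \bigl(\tfrac{1}{2}M\bigr)^{2} - \bigl(\tfrac{1}{2}M\bigr)^{3} + \ldots ,
$$

it follows, neglecting terms in $M$ of order two or higher, that

$$
K(x+1, t+1) = [I - M(x,t)]K(x,t) + \bigl[I - \tfrac{1}{2}M(x,t)\bigr]O_o(x,t) . \tag{14}
$$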
Expression (14) is the basic formulation of the CBS model for ages 1, 2, 3, ..., 98.
Equations for the lowest age and the highest age group (99 or over) were derived
analogously. Forecast assumptions were formulated in terms of the nuptiality rates
and death rates contained in M, as well as of the absolute numbers of migrations
contained in $O_o$.
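The size of the error introduced by the first-order approximation can be gauged in the scalar case. The sketch below assumes a single status with total exit rate `m` (a hypothetical simplification) and compares the exact survival factor of equation (8) with the first-order factor of expression (14). The difference equals $\tfrac{1}{2}m^2 / (1 + \tfrac{1}{2}m)$ and is therefore always positive and smaller than $\tfrac{1}{2}m^2$.

```python
def exact_factor(m):
    """Scalar survival factor of the multidimensional model, equation (8)."""
    return (1 - 0.5 * m) / (1 + 0.5 * m)


def first_order_factor(m):
    """Scalar survival factor of the first-order approximation, expression (14)."""
    return 1 - m


# Hypothetical exit rates; the gap grows roughly with the square of the rate.
gaps = {m: exact_factor(m) - first_order_factor(m) for m in (0.01, 0.1, 0.3)}
```

For rates of a few per cent the two formulations are close, which is in line with the remark in section 5.1 that the person-years of the two models differ only slightly.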
A second but minor difference between the CBS model and the multidimensional
model is the fact that in the CBS model, $O_o$ contains the numbers of net migrations.
Hence there are no emigration rates in M.
Distinct nuptiality assumptions were formulated for females and for males.
Therefore, these nuptiality rates initially gave rise to inconsistencies in the numbers
marrying, when summed over all ages and all previous marital states. This was also
the case for marriage dissolution. The approach adopted in the CBS forecasts to
circumvent these inconsistencies is to adjust initially specified rates to meet the
consistency requirements. This method is completely in line with the strategy
discussed in section 3.2. However, the adjustment may be operationalised in various
ways. In the CBS forecasts the adjustment method was constructed in such a way as to
fulfil as many of the requirements for a realistic nuptiality model, discussed in
section 4.2, as possible. The method adopted contains three steps, which are
described below. For reasons of convenience the notation is adapted to the special
case of the marital-status-specific population projection model.
Step 1 For every calendar year, the number of marrying males aged $x$ is computed from

$$
\nu(x) = \nu_{12}(x) + \nu_{32}(x) + \nu_{42}(x)
= m_{12}(x)\bigl[\mu_1(x) + \tfrac{1}{2}O_{o1}(x)\bigr] + m_{32}(x)\bigl[\mu_3(x) + \tfrac{1}{2}O_{o3}(x)\bigr] + m_{42}(x)\bigl[\mu_4(x) + \tfrac{1}{2}O_{o4}(x)\bigr] .
$$

Here $\nu$ denotes the number of marrying persons, $\mu$ is the number of eligible males present at the beginning of the year, and $O$ is the net number of immigrations during the year. The single subscripts attached to $\mu$ denote the marital states: never-married (1), divorced (3), and widowed (4). The double subscripts for $\nu$ and $O$ describe transitions from one marital status to another: first marriage (12), remarriage of divorcees (32), remarriage of widowers (42), and immigration of single, divorced, or widowed persons ($o1$, $o3$, and $o4$, respectively). The operands attached to $\nu$ denote sex as well as age: males aged $x$ and females aged $y$.
Giving initial values $m^{\mathrm{i}}_{12}$, $m^{\mathrm{i}}_{32}$, and $m^{\mathrm{i}}_{42}$ to the marriage rates results in the initial number of marrying males aged $x$. Let $\nu^{\mathrm{i}}(x)$ denote this initial number. A similar way of reasoning gives $\nu^{\mathrm{i}}(y)$, the initial number of marrying females aged $y$.
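Step 2 Here consistency is achieved. Implementing step 1 yields a total number of marrying males, $\sum_x \nu^{\mathrm{i}}(x)$, which is not equal to the total number of marrying females, $\sum_y \nu^{\mathrm{i}}(y)$. Therefore, these initially inconsistent numbers are averaged, using a harmonic mean, to find the total number of marriages $\nu^{\mathrm{a}}$. Thus,

$$
\nu^{\mathrm{a}} = \frac{2 \sum_x \nu^{\mathrm{i}}(x) \sum_y \nu^{\mathrm{i}}(y)}{\sum_x \nu^{\mathrm{i}}(x) + \sum_y \nu^{\mathrm{i}}(y)} .
$$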
The choice for a harmonic mean is mainly justified by the requirements given in
section 4.2. Elsewhere it has been discussed that such a choice results in meeting the
availability, monotonicity, homogeneity, and competition requirements. Substitution is
not always guaranteed (Keilman, 1985). Compared with other averaging procedures
(for instance, an arithmetic mean, a geometric mean, or a minimum model) the CBS
model with its harmonic mean has relatively good characteristics.
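Step 3 New numbers of marriages for each sex at each age are calculated by proportionally rating the initial numbers up or down. For males the adjustment factor $\lambda_m$ is

$$
\lambda_m = \frac{\nu^{\mathrm{a}}}{\sum_x \nu^{\mathrm{i}}(x)} = \frac{2 \sum_y \nu^{\mathrm{i}}(y)}{\sum_x \nu^{\mathrm{i}}(x) + \sum_y \nu^{\mathrm{i}}(y)} . \tag{15}
$$

Analogously, the female adjustment factor $\lambda_f$ equals

$$
\lambda_f = \frac{2 \sum_x \nu^{\mathrm{i}}(x)}{\sum_x \nu^{\mathrm{i}}(x) + \sum_y \nu^{\mathrm{i}}(y)} = 2 - \lambda_m . \tag{16}
$$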
These adjustment factors allow for the calculation of the final numbers of marriages, $\nu^{\mathrm{a}}(x)$ and $\nu^{\mathrm{a}}(y)$, for each marital status. In addition, the adjusted first-marriage rate for males may be computed as

$$
m^{\mathrm{a}}_{12}(x) = \lambda_m \, m^{\mathrm{i}}_{12}(x) , \tag{17}
$$

and the remaining marriage rates can be calculated in a similar way. This enables one to check the deviation between initial and final assumptions concerning marriage.
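Steps 2 and 3 can be sketched in a few lines. The code below is an illustration under stated assumptions, not the CBS implementation: it starts from hypothetical initial numbers of marrying males and females by age (the output of step 1), takes the harmonic mean of the two totals, and rates the initial numbers up or down proportionally. The function names and all numbers are invented for the example.

```python
def harmonic_mean(a, b):
    return 2.0 * a * b / (a + b)


def adjust_marriages(initial_male, initial_female):
    """Return the adjustment factors and the adjusted numbers by age."""
    total_male = sum(initial_male.values())
    total_female = sum(initial_female.values())
    total = harmonic_mean(total_male, total_female)  # step 2
    lam_m = total / total_male                       # male adjustment factor
    lam_f = total / total_female                     # female adjustment factor
    adjusted_male = {x: lam_m * v for x, v in initial_male.items()}      # step 3
    adjusted_female = {y: lam_f * v for y, v in initial_female.items()}
    return lam_m, lam_f, adjusted_male, adjusted_female


# Hypothetical initial numbers of marrying persons by age.
initial_male = {23: 520.0, 25: 610.0, 27: 470.0}    # total 1600
initial_female = {21: 500.0, 23: 560.0, 25: 340.0}  # total 1400

lam_m, lam_f, adjusted_male, adjusted_female = adjust_marriages(
    initial_male, initial_female)
```

With 1600 initial male and 1400 initial female marriages, the harmonic mean is about 1493; the male numbers are scaled down by roughly 6.7% and the female numbers up by the same percentage, so the two factors sum to two.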
The algorithm described here applies to the CBS model, which can be viewed as a
first-order approximation of the multidimensional model discussed in section 2.2.
Which formulation of the algorithm should be employed in such a multidimensional
model? Obviously, the steps 1 and 2 of the CBS algorithm may be repeated. Then,
analogous to expressions (15) and (16), adjustment factors could be computed. But
relation (17) will no longer hold, since the calculation of the number of person-years
for each population category differs between the two models. However, expression (13)
enables one to compute for the general case:
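$$
L(x, t) = \tfrac{1}{2}[K(x, t) + K(x+1, t+1)]
= \Bigl[I + \tfrac{1}{2}\sum_{i \in C} M_i(x, t)\Bigr]^{-1}\Bigl[K(x, t) + \tfrac{1}{2}\Bigl(O_o(x, t) - \sum_{i \notin C} O_i(x, t)\Bigr)\Bigr] . \tag{18}
$$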
In addition to consistency in marriage behaviour, in a nuptiality model, male and
female divorce should correspond, as well as transition to widowhood and death of
the spouse. This means that the set C will contain only nonnuptiality components,
namely, death and emigration. But these are both external events, which implies that
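$\sum_{i \in C} M_i$ is a diagonal matrix. Hence, the matrix inversion required to calculate $L$ is straightforward and immediately leads to adjusted nuptiality rates. For instance, for first marriage we find, for every $t$, that

$$
m^{\mathrm{a}}_{12}(x) = \frac{\nu^{\mathrm{a}}_{12}(x)}{L^{\mathrm{a}}_{1}(x)}
= \frac{\nu^{\mathrm{a}}_{12}(x)\bigl[1 + \tfrac{1}{2}m_{1d}(x) + \tfrac{1}{2}m_{1o}(x)\bigr]}{K_1(x) + \tfrac{1}{2}O_{o1}(x) - \tfrac{1}{2}\nu^{\mathrm{a}}_{12}(x)} ,
$$

where $L^{\mathrm{a}}_{1}(x)$ denotes the number of person-years lived in the adjusted situation by never-married persons. This means that the initial rate is multiplied by a factor

$$
\frac{m^{\mathrm{a}}_{12}(x)}{m^{\mathrm{i}}_{12}(x)}
= \frac{\nu^{\mathrm{a}}_{12}(x)}{\nu^{\mathrm{i}}_{12}(x)} \cdot
\frac{K_1(x) + \tfrac{1}{2}O_{o1}(x) - \tfrac{1}{2}\nu^{\mathrm{i}}_{12}(x)}{K_1(x) + \tfrac{1}{2}O_{o1}(x) - \tfrac{1}{2}\nu^{\mathrm{a}}_{12}(x)} .
$$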
Adjusted rates for remarriage, divorce, and transition to widowhood may be
computed similarly.
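The difference between this factor and the simple proportional factor of relation (17) can be seen numerically. The sketch below assumes hypothetical values for the never-married population, net immigration, and the death and emigration rates; the function name is likewise invented. It computes the first-marriage rate from the number of marriages through the person-years, for both the initial and the adjusted number of marriages.

```python
def first_marriage_rate(marriages, never_married, immigrants, m_d, m_o):
    """Rate = marriages / person-years of never-married persons, where the
    person-years follow from the diagonal special case of expression (18)."""
    person_years = ((never_married + 0.5 * immigrants - 0.5 * marriages)
                    / (1 + 0.5 * m_d + 0.5 * m_o))
    return marriages / person_years


# Hypothetical inputs, for illustration only.
never_married, immigrants = 10000.0, 40.0
m_d, m_o = 0.001, 0.004
initial_marriages = 800.0
lam = 0.95                                   # adjustment factor for the numbers
adjusted_marriages = lam * initial_marriages

m_initial = first_marriage_rate(initial_marriages, never_married, immigrants, m_d, m_o)
m_adjusted = first_marriage_rate(adjusted_marriages, never_married, immigrants, m_d, m_o)
rate_factor = m_adjusted / m_initial

# Closed form of the factor in terms of stocks and flows only.
closed_form = (adjusted_marriages / initial_marriages) * (
    (never_married + 0.5 * immigrants - 0.5 * initial_marriages)
    / (never_married + 0.5 * immigrants - 0.5 * adjusted_marriages))
```

Because fewer marriages leave more person-years in the never-married state, the rate is reduced slightly more than the number of marriages: here the rate factor is about 0.948 against 0.95 for the numbers.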
4.4 Results
This section contains some of the results concerning marriage in the 1980-based
population forecasts, for which the CBS nuptiality model was used as part of the
complete forecasting model. I shall limit myself mainly to methodological results,
more specifically, the degree to which the initial numbers of persons marrying in a
certain year had to be adjusted to obtain the final numbers. More substantive
demographic results are given in the relevant CBS publications (CBS, 1982; Cruijsen
and Keilman, 1984).
The 1980-based population forecasts of the CBS contain three variants: high,
medium, and low. These variants were obtained by combining different assumptions
for fertility, first marriage, and external migration with a fixed set of assumptions for
other components (mortality, widowhood, divorce, remarriage) (CBS, 1982, page 17).
A brief description of the assumptions most relevant to this paper is given below.
More detailed information on this subject is to be found elsewhere (CBS, 1982,
section 2; Cruijsen and Keilman, 1984, section 3).
As far as marriage is concerned, it is expected that the family, and hence the
married couple, will play a less dominant role in future society than is the case at
present. Unmarried cohabitation will be more accepted than in the 1970s. Some of
those who cohabit will consider their life-style a worthwhile alternative to marriage.
Therefore, the propensity ultimately to marry will decrease. Whereas the proportions
ever marrying are 90% (males) and 95% (females) for persons born around 1940,
they are assumed to be 65-75% (males) and 75-85% (females) for persons born
around 1970. Hence, the majority of the population will marry at least once.
However, these marriages will be contracted at a relatively advanced age: for the
coming decades it is expected that the mean age at first marriage will be some two
years higher than it is nowadays. Remarriage of divorced and widowed persons will
continue to decrease.
In section 4.3 the adjustment factor for marriage was defined as the proportion by
which the initial rates (and also the initial numbers) of persons marrying have to be
adjusted in order to achieve consistency. Expressions (15) and (16) imply that
$\lambda_m + \lambda_f = 2$, so that $\lambda_f$ can be obtained from $\lambda_m$, and vice versa. Hence we need to
consider only one factor.
Figure 2 shows the development of $\lambda_m$, the male adjustment factor. In the period 1980-2030 the adjustment factor shows the same behaviour in all variants: after some initial fluctuations, between 1980 and 1985, a period of cyclical behaviour sets in. In the years between 1985 and 2030 the adjustment factor is less than one. This means that in the initial situation there is an oversupply of marrying males; this excess increases to 4½-6% (in 1998, when $\lambda_m$ is 0.94-0.955) and decreases to 2½-3½%.
What could be the cause of this development? The share of marriages that are
remarriages is taken to be 1 5 - 2 0 % for the coming decades. The majority of persons
marrying will not have been married before (CBS, 1982, page 45). More than 80%
of these first marriages occur in the age group twenty-thirty-four years. The ratio of
the number of never-married females to never-married males, aged twenty-thirty-four,
will, therefore, largely explain the development of the adjustment factor. This is
particularly true for the years from about 1995 onwards. In this period the marriage rates were assumed to be (almost) constant with time (CBS, 1982, page 18). This means that variations in the number of marriages (and hence in the adjustment factor) will be almost completely determined by variations within the never-married population.

Figure 2. Adjustment factor, $\lambda_m$, for marriages of men, by calendar year, for the low, medium, and high variants (source: CBS).

When the initial first-marriage assumption of the CBS forecasts was specified, a
lasting, and often increasing, excess in the supply of never-married males in the years
to come was, in fact, taken into account. This was expressed by the assumption that
for the generations born in the period 1930-1970 the difference between the male
proportion ever marrying and the corresponding female proportion will increase. For
instance, the proportion for females born around 1940 is some 5% more than the
proportion for males. However, for the cohorts born around 1970 the difference
increases to about 10%. Nevertheless this excess in male supply was underestimated,
since the male adjustment factor is continuously below one.
The relation suggested above between the sex ratio of never-married persons aged
twenty-thirty-four and the adjustment factor is confirmed by the first two columns of
table 1.
A comparison between this table and figure 2 shows that a period in which the sex
ratio decreases (1990/1995-2000, but particularly 2015-2030), corresponds roughly
with a period in which the adjustment factor decreases. For the years between 1980
and 1990/1995 the relation between the adjustment factor and the sex ratio is much
weaker. This is caused by the diverging trends in the marriage rates for this period
(CBS, 1982, pages 56 and 57).
For every forecast year, the sex ratio in the low variant is larger than that in the
high variant. Hence, in the high variant a relatively large excess in male supply
exists. The initial assumptions, however, made no distinction between the three forecast variants with respect to this excess male supply. The resulting initial inconsistencies between the sexes are therefore greater in the high variant than in the low variant. This agrees with figure 2: almost everywhere the low-variant $\lambda_m$ is closer to one than the high-variant $\lambda_m$.
Finally, I here draw a conclusion regarding the distance between the adjustment
factors in the high and the low variants. From the mid-1990s onwards, this distance
is almost constant with time (at 1-1½%). Moreover, it is much smaller than the
amplitude of the waves. These observations lead to the conclusion that the influence
of the marriage assumptions upon the adjustment factors (with an additional
amplification due to different fertility assumptions after the years around 2000) is
much smaller than the influence of the structure of the current population.
Table 1. Sex ratio (number of women per 1000 men) of never-married persons aged twenty-thirty-four years, by variant (source: calculated from CBS, 1982, tables 2-4).

Calendar year    Low variant    High variant
1980             608            608
1985             674            663
1990             713            687
1995             723            680
2000             717            663
2005             724            671
2010             737            691
2015             746            700
2020             745            698
2025             737            686
2030             731            678

5 Application: consistency between regional and national population projections
The second illustration of this paper deals with the reconciliation between a regional
and a national population projection. The latter imposes an external constraint on
the former. Thus, inconsistencies that may arise are external by nature.
The external consistency algorithm of this section was devised within the context of
the MUDEA (Multidimensional Demographic Analysis) project, carried out by the
Netherlands Interuniversity Demographic Institute and Delft Technical University on
behalf of the Netherlands' Physical Planning Agency. One of the aims of this project
was the construction of a multiregional population projection model (Willekens, 1984).
The constraint used in this section simply states that regional projection results,
when summed over all regions, have to be consistent with previously calculated
national totals. This applies not only to stock variables, namely, the population by
sex, age, and region, but also to flow variables; that is, results for regional fertility,
mortality, and external migration also have to be in line with the national results.
This constraint implies that the regional projection is lower in hierarchy than the
national projection. What calls for such a top-down approach? For a number of
reasons larger populations are easier to project than are smaller ones (Baxter and
Williams, 1978, page 61; Pittenger, 1976, page 79; Schwarz, 1975, page 46). First,
the behaviour of an aggregate population is often relatively stable. In addition,
statistical data for sophisticated calculations pertaining to small units are seldom
available. Finally, for a country as a whole, internal migration can be disregarded
whereas external migration is in general of limited importance. Regional projections
are often heavily influenced by internal migration, which is relatively hard to
extrapolate.
How have inconsistencies between regional and national population projections
previously been dealt with? Little methodological literature is available. In practice,
initially inconsistent regional results are almost always corrected to meet the national
constraints. Sometimes this solely applies to stock variables. In such cases, adjusted
model parameters are not calculated. Such a calculation is possible only when constraints are
formulated in terms of flow variables; for instance, see the multiregional projection
model of the Netherlands described by Eichperger (1984, pages 245 and 246).
However, even in such instances, little attention is paid to the question as to why
inconsistencies in numerical results arise: because of different assumptions or because
of different model structures?
In current handbooks for preparing regional population projections, the problem of
consistency is hardly mentioned. Pittenger (1976, pages 86 and 91) describes a
proportional correction method where regional shares of the total population, after
extrapolation, may fall outside the zero-one range. Schwarz (1975, pages 51
and 53), in illustrative computations, is confronted with the fact that the total
population in two regions exceeds the national population. He then simply diminishes
the regional populations by the appropriate percentage.
Further literature on the consistency problem is very scarce. Baxter and Williams
(1978, page 51) mention it in passing. Rogers (1976) and Gibberd (1981) deal with
it within the very restrictive framework of time-independent projection parameters
and a zero net migration. George (1981) discusses the Canadian experience. He
reports only minor consistency problems, since both the national and the provincial
projections are compiled by Statistics Canada. Within the above-mentioned MUDEA
project the consistency between the multiregional model results and those of the
Netherlands' national population projection model was treated extensively (Keilman,
1984). In this study the MUDEA model was compared with the CBS model, with
respect to the following points: basic principles, model equations, and assumptions.
Algorithms dealing with the inconsistencies were formulated, and illustrative
calculations showed the impact both of logical inconsistencies (due to different basic
principles and model equations) and of numerical inconsistencies (due to different
assumptions). In the remaining part of this section the major findings are highlighted
serving as an illustration of the concept of external consistency.
5.1 The MUDEA model compared with the CBS model
The MUDEA model is a special case of the multidimensional model discussed in
section 2 of this paper. It contains the dimensions of sex, age (0, 1, 2, ..., 89, 90 or
over), and region of residence (the twelve provinces of the Netherlands). Hence the
model describes the components of fertility, mortality, external migration, and internal
migration. All parameters are of the period-cohort type. For immigration, absolute
numbers are used; occurrence/exposure rates are employed for the remaining
components (including emigration).
The CBS model was briefly described in section 4.3. The most relevant difference
between the CBS model and the MUDEA model is the fact that the first is only a
first-order approximation of the second, as noted in expression (14). Therefore,
special attention was given to the case where national assumptions on fertility,
mortality, and external migration were applied to the parameters of the MUDEA
model. Ideally, this would have resulted in regional projection results that are
consistent with national totals. For instance, the national fertility rate for a certain
age can be viewed as the weighted sum of regional fertility rates, the number of
person-years by region constituting the weights. With equal fertility rates for the
regions, the person-years by region can be added up and this simply results in the
national number of person-years. Because of differences in model structure, the CBS
person-years differ slightly from those of the MUDEA model. Hence, consistency
was achieved by assigning only slightly reformulated values of the national assumptions
concerning fertility, mortality, and external migration, to the regional parameters.
In the case of regional differences in assumptions with respect to fertility,
mortality, and external migration, inconsistencies may arise between national and
regional results, even when the model structures are identical. Therefore, a consistency
algorithm was implemented in the MUDEA model. This algorithm will be discussed
in the following section.
5.2 Consistency algorithm of the MUDEA model
One of the requirements of the MUDEA model is complete consistency between the
results from its flow variables and the results from the corresponding flow variables
in the national forecasts. To that end, a consistency algorithm was implemented in
the MUDEA model. This algorithm contains two options. The first employs a
simple proportional adjustment of the relevant variables, the same adjustment factor
being applied to all regions. However, for a certain component, these factors may
differ for the successive ages. This might result in a situation where, for instance,
the fertility rate for women aged twenty-three has to be raised by 5%, whereas the
fertility rate for twenty-four-year-old women is diminished by 4%. This example
shows that distortions may arise in the initially specified age-specific pattern when a
proportional adjustment is employed.
Therefore, the second option was devised. It meets the consistency requirement as
well as minimising the deviation between the initial and the adjusted age-specific
pattern. In this paper, only the second option will be discussed.
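For comparison, the first option can be sketched as follows. The figures are invented so as to reproduce the 5% increase and 4% decrease mentioned above; the function name and array layout are assumptions made for this illustration.

```python
import numpy as np

def proportional_adjustment(initial_events, national_events):
    """Scale regional events so that they add up to the national figure.

    initial_events: array of shape (regions, ages)
    national_events: array of shape (ages,)
    One factor per age is shared by all regions.
    """
    factors = national_events / initial_events.sum(axis=0)
    return initial_events * factors, factors

# Hypothetical births for two regions at ages 23 and 24.
initial = np.array([[400.0, 500.0],
                    [600.0, 750.0]])
national = np.array([1050.0, 1200.0])

adjusted, factors = proportional_adjustment(initial, national)
```

Here the factor is 1.05 at age 23 and 0.96 at age 24, so the adjusted age pattern is distorted relative to the initial one even though each age adds up correctly.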
I shall first describe the external migration algorithm. Subsequently mortality and
fertility will be discussed. The external migration algorithm is relatively simple, since
this phenomenon is mainly formulated in terms of absolute numbers (with the
exception of emigration in the MUDEA model). The derivation of the formulae for
mortality is rather complex, since this component describes a nonrenewable event.
This implies that any adjustment of the mortality rates alters not only the numbers of
deaths, but also the relevant numbers of person-years. Fertility is located between
external migration and mortality, in terms of the complexity of the consistency
algorithm. This renewable phenomenon starts from known numbers of person-years
for women in the fertile ages. Any adjustment of the fertility rates does not affect
these person-years.
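5.2.1 External migration Since, in the CBS model, this component is formulated in
terms of absolute numbers of net immigrations, the algorithm is used to minimise the
adjustment from initial net immigrations in the MUDEA model, $[O^{\mathrm{i}}_{oi}(x) - O^{\mathrm{i}}_{io}(x)]$, to
final figures, $[O^{\mathrm{a}}_{oi}(x) - O^{\mathrm{a}}_{io}(x)]$, by treating all ages $x$ simultaneously. Formally

$$\text{minimise } \sum_i \sum_x \left\{ [O^{\mathrm{a}}_{oi}(x) - O^{\mathrm{a}}_{io}(x)] - [O^{\mathrm{i}}_{oi}(x) - O^{\mathrm{i}}_{io}(x)] \right\}^2$$

under the condition that

$$\sum_i [O^{\mathrm{a}}_{oi}(x) - O^{\mathrm{a}}_{io}(x)] = O^{\mathrm{a}}_{o\cdot}(x) - O^{\mathrm{a}}_{\cdot o}(x), \quad \text{for every } x.$$

Here $O^{\mathrm{a}}_{o\cdot}(x) - O^{\mathrm{a}}_{\cdot o}(x)$ denotes the national net number of immigrations for age $x$ as
computed from the CBS model. The solution of this problem may be found by using
the method of Lagrange multipliers. The result is

$$O^{\mathrm{a}}_{oi}(x) - O^{\mathrm{a}}_{io}(x) = O^{\mathrm{i}}_{oi}(x) - O^{\mathrm{i}}_{io}(x) - \frac{[O^{\mathrm{i}}_{o\cdot}(x) - O^{\mathrm{i}}_{\cdot o}(x)] - [O^{\mathrm{a}}_{o\cdot}(x) - O^{\mathrm{a}}_{\cdot o}(x)]}{N},$$

where $O^{\mathrm{i}}_{o\cdot}(x) - O^{\mathrm{i}}_{\cdot o}(x) = \sum_i [O^{\mathrm{i}}_{oi}(x) - O^{\mathrm{i}}_{io}(x)]$, and $N$ is the number of regions.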
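For readers who want the intermediate step, the Lagrangian argument can be sketched as follows. The shorthand is introduced here for illustration only: $u_i$ and $v_i$ stand for the initial and adjusted net immigration of region $i$, $n$ for the national figure, and $\lambda$ for the multiplier. The age argument is suppressed because the constraint applies to each age separately.

```latex
\begin{aligned}
\mathcal{L} &= \sum_i (v_i - u_i)^2 - \lambda\Bigl(\sum_i v_i - n\Bigr),\\
\frac{\partial \mathcal{L}}{\partial v_i} &= 2(v_i - u_i) - \lambda = 0
  \quad\Longrightarrow\quad v_i = u_i + \tfrac{1}{2}\lambda,\\
\sum_i v_i = n &\quad\Longrightarrow\quad
  \tfrac{1}{2}\lambda = \frac{n - \sum_j u_j}{N},\\
v_i &= u_i - \frac{\sum_j u_j - n}{N}.
\end{aligned}
```

Every region therefore receives the same absolute correction at a given age, namely the national discrepancy divided by the number of regions.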
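Next, the adjusted net immigrations in the MUDEA model are divided into
emigrations and immigrations. Here the assumption is that one half of the
adjustments may be attributed to emigration, and the other half to immigration. Hence,

$$\begin{aligned}
O^{\mathrm{a}}_{oi}(x) &= O^{\mathrm{i}}_{oi}(x) + \tfrac{1}{2} a(x), \\
O^{\mathrm{a}}_{io}(x) &= O^{\mathrm{i}}_{io}(x) - \tfrac{1}{2} a(x), \\
a(x) &= \frac{[O^{\mathrm{a}}_{o\cdot}(x) - O^{\mathrm{a}}_{\cdot o}(x)] - [O^{\mathrm{i}}_{o\cdot}(x) - O^{\mathrm{i}}_{\cdot o}(x)]}{N}.
\end{aligned} \tag{19}$$

Expressions (19) give the solution for the consistency problem when controlling for
the age pattern of net immigration.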
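The external migration step can be sketched in Python. The function name, the array layout (regions by ages), and the numbers are assumptions made for this illustration; the sketch does not guard against adjusted flows becoming negative.

```python
import numpy as np

def adjust_external_migration(immig, emig, national_net):
    """Least-squares consistency adjustment for external migration.

    immig, emig: initial regional immigrants and emigrants,
                 arrays of shape (regions, ages)
    national_net: national net immigration by age, shape (ages,)
    Returns the adjusted (immigrants, emigrants).
    """
    n_regions = immig.shape[0]
    initial_net_total = (immig - emig).sum(axis=0)
    # Equal absolute correction per region, computed age by age.
    a = (national_net - initial_net_total) / n_regions
    # Half of the correction goes to immigration, half to emigration.
    return immig + 0.5 * a, emig - 0.5 * a

# Hypothetical data: two regions, three ages.
immig = np.array([[120.0, 90.0, 40.0],
                  [80.0, 70.0, 30.0]])
emig = np.array([[100.0, 60.0, 35.0],
                 [90.0, 50.0, 20.0]])
national_net = np.array([30.0, 40.0, 5.0])

adj_immig, adj_emig = adjust_external_migration(immig, emig, national_net)
```

After the adjustment, regional net immigration summed over regions equals the national figure at every age, while the differences between regions at a given age are unchanged.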
5.2.2 Mortality The problem is to find consistent numbers of deaths, by age, sex,
and region, under the condition of a minimum deviation between initial and adjusted
age-specific mortality rates. In other words

$$\text{minimise } \sum_i \sum_x [m^{\mathrm{a}}_{id}(x) - m^{\mathrm{i}}_{id}(x)]^2$$

under the condition that

$$\sum_i O^{\mathrm{a}}_{id}(x) = O^{\mathrm{a}}_{\cdot d}(x), \quad \text{for every } x.$$

Here $O^{\mathrm{a}}_{\cdot d}(x)$ is the national number of deaths. In this expression, $x$ may have the values
0, 1, 2, ..., 90+. Mortality in the calendar year of birth ($x = 00$) will be treated after
fertility.
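Before solving this problem, a relation should be formulated between $m^{\mathrm{a}}_{id}(x)$
and $O^{\mathrm{a}}_{id}(x)$. Since the MUDEA model fits within the framework of the general
multidimensional model of section 3, such a relation can be found by singling out
mortality from the set of all components and using expressions (13) and (18). Then,
for the year $t$ the adjusted vector of person-years for age $x$ can be written as

$$\mathbf{L}^{\mathrm{a}}(x, t) = [\mathbf{I} + \tfrac{1}{2}\mathbf{M}_{*}(x, t) + \tfrac{1}{2}\mathbf{M}^{\mathrm{a}}_{d}(x, t)]^{-1} \{\mathbf{K}(x, t) + \tfrac{1}{2}[\mathbf{O}^{\mathrm{a}}_{oi}(x, t) - \mathbf{O}^{\mathrm{a}}_{io}(x, t)]\}. \tag{20}$$

Here $\mathbf{M}^{\mathrm{a}}_{d}$ denotes the diagonal matrix of adjusted mortality rates and $\mathbf{M}_{*}$ is the matrix
which describes the remaining components. $\mathbf{O}^{\mathrm{a}}_{oi}$ and $\mathbf{O}^{\mathrm{a}}_{io}$ stand for the adjusted
vectors of immigrants and emigrants, respectively. This implies that for a certain year
the adjusted vector of deaths can be written as

$$\mathbf{O}^{\mathrm{a}}_{id}(x) = \mathbf{M}^{\mathrm{a}}_{d}(x)\mathbf{L}^{\mathrm{a}}(x) = \mathbf{M}^{\mathrm{a}}_{d}(x)[\mathbf{I} + \tfrac{1}{2}\mathbf{M}_{*}(x) + \tfrac{1}{2}\mathbf{M}^{\mathrm{a}}_{d}(x)]^{-1}\mathbf{K}'(x), \tag{21}$$

where $\mathbf{K}'$ denotes the vector between braces in equation (20). This expression may
be expanded in a power series, since

$$(\mathbf{I} + \tfrac{1}{2}\mathbf{M}_{*} + \tfrac{1}{2}\mathbf{M}^{\mathrm{a}}_{d})^{-1} = \mathbf{I} - \tfrac{1}{2}(\mathbf{M}_{*} + \mathbf{M}^{\mathrm{a}}_{d}) + \tfrac{1}{4}(\mathbf{M}_{*} + \mathbf{M}^{\mathrm{a}}_{d})^{2} - \cdots.$$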
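When terms in $\mathbf{M}^{\mathrm{a}}_{d}$ of order two or higher are disregarded, expression (21) leads to

$$\mathbf{O}^{\mathrm{a}}_{id}(x) = \mathbf{M}^{\mathrm{a}}_{d}(x)[\mathbf{I} - \tfrac{1}{2}\mathbf{M}_{*}(x)]\mathbf{K}'(x).$$

Since $\mathbf{M}^{\mathrm{a}}_{d}$ is a diagonal matrix with elements $m^{\mathrm{a}}_{id}$ we find that

$$O^{\mathrm{a}}_{id}(x) = m^{\mathrm{a}}_{id}(x) K_i(x),$$

where $K_i$ is the $i$th element of the vector $(\mathbf{I} - \tfrac{1}{2}\mathbf{M}_{*})\mathbf{K}'$. Therefore the optimisation
problem can be formulated as

$$\text{minimise } \sum_i \sum_x \left[ \frac{O^{\mathrm{a}}_{id}(x)}{K_i(x)} - m^{\mathrm{i}}_{id}(x) \right]^2$$

subject to

$$\sum_i O^{\mathrm{a}}_{id}(x) = O^{\mathrm{a}}_{\cdot d}(x).$$

The solution is found in a similar way to the one for external migration. Using the
method of Lagrange multipliers yields

$$O^{\mathrm{a}}_{id}(x) = m^{\mathrm{i}}_{id}(x) K_i(x) - K_i^2(x) \, \frac{\sum_j m^{\mathrm{i}}_{jd}(x) K_j(x) - O^{\mathrm{a}}_{\cdot d}(x)}{\sum_j K_j^2(x)},$$

which constitutes the solution for the consistency problem in the context of mortality.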
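A minimal numerical sketch of the mortality adjustment for a single age group follows. The exposure-like terms are taken as given here rather than derived from the full model, and the function name and all figures are assumptions made for illustration.

```python
import numpy as np

def adjust_mortality(initial_rates, k, national_deaths):
    """Adjust regional death rates for one age group.

    initial_rates: initial regional mortality rates, shape (regions,)
    k: exposure-like terms K_i, taken as given, shape (regions,)
    national_deaths: national number of deaths at this age (scalar)
    Returns (adjusted_rates, adjusted_deaths).
    """
    # Deaths implied by the initial rates minus the national figure.
    excess = initial_rates @ k - national_deaths
    # Least-squares correction of the rates, proportional to K_i.
    adjusted_rates = initial_rates - k * excess / (k @ k)
    return adjusted_rates, adjusted_rates * k

# Hypothetical inputs for two regions.
rates = np.array([0.010, 0.012])
k = np.array([50_000.0, 70_000.0])
adj_rates, adj_deaths = adjust_mortality(rates, k, national_deaths=1300.0)
```

In this example the initial rates imply 1340 deaths against a national figure of 1300, so both regional rates are lowered, the larger region absorbing the larger share of the correction.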
5.2.3 Fertility
The treatment of fertility is very much the same as that of external migration and
mortality. It involves a minimisation of deviations in age-specific fertility rates for all
regions and all ages of the women involved. The result is

$$m^{\mathrm{a}}_{bi}(x) = m^{\mathrm{i}}_{bi}(x) - L_i(x) \, \frac{\sum_j m^{\mathrm{i}}_{bj}(x) L_j(x) - B_{\cdot}(x)}{\sum_j L_j^2(x)},$$

where $m_{bi}(x)$ is the age-specific fertility rate for women aged $x$ in region $i$.
Superscripts i and a denote the initial and the adjusted situation, respectively.
$L_i(x)$ is the number of person-years lived by the above-mentioned women, when
consistency for external migration and mortality has already been achieved. $B_{\cdot}(x)$ is
the total number of births to women aged $x$ calculated from the national projections.
Note the strong similarity between the result for fertility and that for mortality.
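The fertility result lends itself to a vectorised sketch over all ages at once. The function name, the array layout (regions by ages), and the figures below are assumptions made for illustration only.

```python
import numpy as np

def adjust_fertility(initial_rates, person_years, national_births):
    """Least-squares adjustment of regional age-specific fertility rates.

    initial_rates, person_years: arrays of shape (regions, ages)
    national_births: national births by age of mother, shape (ages,)
    """
    initial_births = (initial_rates * person_years).sum(axis=0)
    excess = initial_births - national_births
    # Each region's share of the correction is proportional to its
    # person-years, normalised by the sum of squared person-years.
    weights = person_years / (person_years ** 2).sum(axis=0)
    return initial_rates - weights * excess

# Hypothetical data: two regions, two ages.
initial_rates = np.array([[0.10, 0.12],
                          [0.08, 0.11]])
person_years = np.array([[1000.0, 1200.0],
                         [2000.0, 1800.0]])
national_births = np.array([270.0, 330.0])

adjusted = adjust_fertility(initial_rates, person_years, national_births)
```

Because person-years are not affected by the fertility adjustment, the adjusted rates reproduce the national births exactly, without the first-order approximation needed for mortality.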
5.2.4 Mortality in the calendar year of birth After adjustment of the number of births
per region, consistency may be achieved between the regional and the national
numbers of children who die in their year of birth. As before, a least-squares
minimisation problem is formulated, which results in

$$O^{\mathrm{a}}_{id}(00) = m^{\mathrm{i}}_{id}(00) B'_i - (B'_i)^2 \, \frac{\sum_j m^{\mathrm{i}}_{jd}(00) B'_j - O^{\mathrm{a}}_{\cdot d}(00)}{\sum_j (B'_j)^2}.$$

In this formula, $B'_i$ denotes the $i$th element of the vector

$$[\mathbf{I} - \tfrac{1}{2}\mathbf{M}_{*}(00)] \{\mathbf{B}^{\mathrm{a}} + \tfrac{1}{2}[\mathbf{O}^{\mathrm{a}}_{oi}(00) - \mathbf{O}^{\mathrm{a}}_{io}(00)]\},$$

where $\mathbf{B}^{\mathrm{a}}$ is the adjusted vector of births, by region, irrespective of the mother's age.
5.3 Results
Illustrative calculations were carried out for the years 1980 to 1983 in order to test
the age-specific consistency algorithm within the framework of the MUDEA model.
To that end, occurrence/exposure rates for fertility, mortality, emigration, and internal
migration together with absolute numbers of immigrants as observed for two regions
of the Netherlands (the four western provinces and the remaining eight provinces)
during 1976 were employed for the projection year 1980. The resulting flow variables
were compared with the results of the CBS projections (medium variant) for the same
year, and inconsistencies were dealt with using the algorithms of section 5.2. The
adjusted 1980 parameters served as an input for the projection year 1981. After
correction for inconsistencies, these were applied to the year 1982, and so on.
Extensive tabulations both for stock and for flow variables showed complete consistency
between the regional projection results and those of the national projections (Keilman,
1984). Some summary measures are presented in tables 2 and 3.
A few comments are due here. The third and the sixth column of table 2 show
that, except for inaccuracies due to rounding off, consistency was indeed achieved.
Table 3 gives the values of the objective functions for each component by calendar
year. Because of the method employed for assigning initial values to the MUDEA
model parameters, the values in table 3 are largely affected by the stability of the
results of the CBS projections. For instance, for external migration these projections
assumed a high level of net immigration for the year 1980, namely, some 53 000 for
males and females taken together. This was followed by a strong decrease in 1981 to
a level of some 20 000 persons. This pattern is reflected in the figures for net
immigration in table 3. The CBS results also determine the trends of the objective
functions for fertility and mortality. It should be noted that in the period 1980-1983
male mortality in the national projections fluctuated somewhat more strongly than did
female mortality.

Table 2. Summary measures for calculations of section 5.3; number of demographic events in
the four western provinces (west) and the remaining eight provinces (rest).

                                     1980                           1983
                             west     rest    total         west     rest    total
Births
  initial number            80113   105953   186066        80643   107595   188238
  adjusted number           78442   102778   181220        81094   108201   189295
  national number                            181225                         189298
Male deaths
  initial number            31214    36376    67590        30241    34604    64845
  adjusted number           29593    33702    63295        30504    34175    64679
  national number                             63302                          64694
Female deaths
  initial number            27931    29366    57297        26406    28786    55191
  adjusted number           24766    26290    51056        26212    28733    54945
  national number                             51064                          54942
Male net immigrations
  initial number             6436     1894     8330         6987     2287     9274
  adjusted number           15559    11017    26577         6745     2045     8790
  national number                             26577                           8791
Female net immigrations
  initial number             7398     4126    11524         6978     3588    10566
  adjusted number           14921    11649    26570         6637     3247     9884
  national number                             26570                           9876
Table 3. Values of the objective functions at minimum.

                                        1980      1981      1982      1983
Births (× 10^-3)                      15.676     0.230     0.332     0.177
Male deaths (× 10^-3)                 53.811     0.655    14.743    15.777
Female deaths (× 10^-3)               33.399     0.046     0.100     0.090
Male net immigrations (× 10^5)        45.974    36.540     0.888     0.678
Female net immigrations (× 10^5)      28.370    27.092     0.386     0.350
6 Summary and conclusions
In this paper the impact that consistency constraints may have upon the formulation
of multidimensional population projection models is described. To that end, the
equations of a fairly general version of such a model are given. This version may be
considered a further improvement of the traditional multidimensional population
projection model, developed by Rogers, Ledent, Willekens, and others. The
improvement is twofold:
(a) External migration is taken into account, with explicit recognition of the events
that migrants may experience in the system during the projection period in which
they migrate.
(b) The model structure and the parameter estimators are derived from a set of
accounting equations describing the relation between stocks and flows in the system.
The flows are defined over a period-cohort-observational plan which pertains to a
certain cohort during two successive points in time. This enables one to avoid the
calculation of life-table measures as an intermediate step in the construction of the
projection model.
The main characteristic of the approach adopted in this article is that such a model
may exactly simulate the population's behaviour in a known period.
Two types of consistency are described, namely, internal consistency and external
consistency. In both instances, a subset of the model variables has to satisfy a certain
constraint. When exogenous variables play a role in this constraint, we speak of
external consistency. Internal consistency arises when the constraint applies only to
endogenous variables. Inconsistencies may be expected in situations where two or
more (sub)models describe the same population categories. Aggregation and
decomposition of a system are two examples of such a situation. However, the
consistency problem may also be encountered as a result of inadequate modelling.
The population model may, for instance, describe individuals, although their
behaviour is, in reality, part of the behaviour of a larger group.
Methods for resolving inconsistencies frequently proceed in three steps:
1 formulate initial values of model parameters;
2 check and adjust for consistency;
3 translate consistent model variables into adjusted parameter values.
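The three steps can be sketched for a single component and age group. The sketch uses the simple proportional rule in step 2 purely for brevity; the function name, the tolerance, and the figures are assumptions made for illustration.

```python
def consistent_rates(initial_rates, exposure, target_total, tol=1e-9):
    """Return regional rates whose implied events add up to target_total."""
    # Step 1: initial parameter values imply initial numbers of events.
    events = [m * l for m, l in zip(initial_rates, exposure)]
    # Step 2: check the constraint and adjust the events if it fails
    # (simple proportional rule, chosen here for brevity).
    total = sum(events)
    if abs(total - target_total) > tol:
        events = [e * target_total / total for e in events]
    # Step 3: translate the consistent events back into rates.
    return [e / l for e, l in zip(events, exposure)]

# Hypothetical rates and person-years for two regions.
exposure = [1000.0, 2000.0]
adjusted = consistent_rates([0.02, 0.03], exposure, target_total=88.0)
```

Any other adjustment rule, such as a least-squares one, can be substituted in step 2 without changing steps 1 and 3.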
Consistency constraints are often formulated in terms of flow variables, representing
absolute numbers of events. Therefore, a reformulation of the multidimensional
population projection model is given which enables one to state assumptions for some
components in absolute numbers, rather than in occurrence/exposure rates.
Two applications of the concepts of consistency within the framework of
multidimensional population projection models are given. The first illustrates how the
problem of internal consistency was solved in the model of the 1980-based official
population forecasts of the Netherlands. One of the components to be extrapolated
was nuptiality, and hence in a certain period female marriages should be equal to
male marriages. An algorithm is given which not only ensures consistency between
female and male marriage behaviour, but which also meets most of the requirements
for a realistic two-sex marriage model mentioned in the literature. The adjustment
factors, defined as the ratio between adjusted and initial marriage rates, show a
behaviour which is largely dictated by the sex ratio of never-married persons aged
twenty to thirty-four. Moreover, the influence of the marriage assumptions upon the
adjustment factors (for first marriage, a high, a low, and a medium variant were
assumed) is much smaller than the impact of the structure of the base-year
population. The second application describes the (external) consistency between a
regional and a national population projection. At NIDI, a multiregional projection
model was constructed for which an external consistency constraint was formulated:
its regional outcomes (both stock and flow variables), when summed over all regions,
should be equal to the previously calculated corresponding national totals. An
algorithm for achieving consistency is described, which controls for age-specific
patterns of the relevant components by minimising deviations between initial and
adjusted age-specific patterns.
In this paper it has been shown that in multidimensional models which have been
used for practical projection purposes, the problem of consistency may be solved in a
fairly straightforward manner. However, only a very limited number of general
results could be presented. This means that in other applications of multidimensional
projection models, other consistency procedures may turn out to be more appropriate.
For instance, the simple least-squares minimisation should be considered to be just
one possibility out of many. A weighted version, which takes account of regional
variables, could produce more refined results. Also, other objective functions could
be considered. One possible candidate is the entropy function, optimisation of
which is equivalent to adapting the well-known Minimum Information Principle.
This approach was extensively used in dynamic household models for Sweden (for
instance, see Haarsman and Marksjoe, 1977; Haarsman and Snickars, 1983).
Finally, we may observe that the multidimensional model used here is based upon the
assumption of a uniform distribution of events during the projection period. An
alternative approach would be to start from a continuous-time Markov chain and to
assume constant transition intensities. This leads to an exponential form of the
matrix of transition probabilities.
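The contrast between the two assumptions can be illustrated numerically. The sketch below is not taken from the model described in this paper. It assumes a hypothetical two-region matrix M of occurrence/exposure rates (death plus out-migration rates on the diagonal, minus the migration rates off the diagonal, columns indexed by region of origin), a projection interval of unit length, and the commonly used linear form P = (I + M/2)^-1 (I - M/2). The exponential form is evaluated with a truncated power series so that only NumPy is needed.

```python
import numpy as np

def linear_transition_matrix(m):
    """P = (I + M/2)^-1 (I - M/2): events spread uniformly over the period."""
    eye = np.eye(m.shape[0])
    return np.linalg.solve(eye + 0.5 * m, eye - 0.5 * m)

def exponential_transition_matrix(m, terms=30):
    """P = exp(-M): constant intensities, via a truncated power series
    (adequate for the small rates assumed here)."""
    result = np.eye(m.shape[0])
    term = np.eye(m.shape[0])
    for n in range(1, terms):
        term = term @ (-m) / n  # (-M)^n / n!
        result = result + term
    return result

# Hypothetical rates: deaths 0.01 and 0.02, migration 1->2 0.03, 2->1 0.05.
m = np.array([[0.04, -0.05],
              [-0.03, 0.07]])

p_linear = linear_transition_matrix(m)
p_exponential = exponential_transition_matrix(m)
```

For rates of this size the two matrices differ only from the third order in M onwards, which suggests that the choice matters mainly when rates are high or projection intervals are long.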
However, the concepts described here may be used as guidelines in solving the
consistency problem of other applications.
Acknowledgements. Thanks are due to Willemien Kneppelhout for assistance with the English
text and to Joan Vrind and Tonny Nieuwstraten for expertly typing the manuscript. Comments
upon an earlier version of this text, made by Frans Willekens and an anonymous referee, are
gratefully acknowledged.
References
Baxter R, Williams I, 1978, "Population forecasting and uncertainty at the national and local
scale" Progress in Planning 9 1-72
CBS, 1976 De Toekomstige Demografische Ontwikkeling in Nederland na 1975 (Future
Demographic Trends in the Netherlands after 1975), Centraal Bureau voor de Statistiek
(Staatsuitgeverij, Den Haag)
CBS, 1979 De Toekomstige Ontwikkeling van de Burgerlijke Staat van Mannen (Future Trends in
the Marital Status of Men), Centraal Bureau voor de Statistiek (Staatsuitgeverij, Den Haag)
CBS, 1982 Prognose van de Bevolking van Nederland na 1980. Deel I: Uitkomsten en enkele
Achtergronden (Forecasts of the Population of the Netherlands after 1980. Part I: Results and
some Backgrounds), Centraal Bureau voor de Statistiek (Staatsuitgeverij, Den Haag)
Cruijsen H G J M, Keilman N W, 1984 Prognose van de Bevolking van Nederland na 1980.
Deel II: Modelbouw en Hypothesevorming (Forecasts of the Population of the Netherlands
after 1980. Part II: Model Building and Formulation of Assumptions) (Staatsuitgeverij,
Den Haag)
Eichperger C L, 1984, "Regional population forecasts: approaches and issues" in Demographic
Research and Spatial Policy Eds H Ter Heide, F J Willekens (Academic Press, London)
pp 235-252
George M V, 1981, "Toward achieving numerical and methodological consistency in developing
national and provincial projections: Canada's experience" paper presented at the 1981
General Conference of the International Union for the Scientific Study of Population,
Manila, 9-16 December; copy available from the author, Demography Division, Statistics
Canada, Ottawa
Gibberd R, 1981, "Aggregation of population projections models" IIASA Reports 4,
International Institute for Applied Systems Analysis, Laxenburg, Austria; pp 177-193
Haarsman B, Marksjoe B, 1977, "Modeling household changes by 'efficient information
adding'" in Demographic, Economic and Social Interaction Eds A E Andersson, J Holmberg
(Ballinger, Cambridge, MA) pp 307-324
Haarsman B, Snickars F, 1983, "A method for disaggregate household forecasts" Tijdschrift voor
Economische en Sociale Geografie 74 282-290
Keilman N W, 1982, "Validation of population projection models, illustrated by the case of the
Netherlands" WP-29, Netherlands Interuniversity Demographic Institute, PO Box 955,
2270 AZ Voorburg
Keilman N W, 1984, "Integratie en consistentie van regionale en nationale bevolkingsprognoses"
(Integration and consistency of regional and national population forecasts) internal report
number 36, Netherlands Interuniversity Demographic Institute, PO Box 955, 2270 AZ
Voorburg
Keilman N W, 1985, "Nuptiality models and the two-sex problem in national population
forecasts" European Journal of Population 1 (2/3) 207-235
Keyfitz N, 1968 Introduction to the Mathematics of Population (Addison-Wesley, Reading, MA)
Leslie P H, 1945, "On the use of matrices in certain population mathematics" Biometrika 33
183-212
Pittenger D, 1976 Projecting State and Local Populations (Ballinger, Cambridge, MA)
Pollard J H, 1973 Mathematical Models for the Growth of Human Populations (Cambridge
University Press, Cambridge)
Pollard J H, 1977, "The continuing attempt to incorporate both sexes into marriage analysis"
International Population Conference, Mexico International Union for the Scientific Study of
Population, Rue des Augustins 34, 4000 Liege; pp 291-310
Rogers A, 1975 Introduction to Multiregional Mathematical Demography (John Wiley, New York)
Rogers A, 1976, "Shrinking large-scale population projection models by aggregation and
decomposition" Environment and Planning A 8 515-541
Schoen R, 1981, "The harmonic mean as the basis of a realistic two-sex marriage model"
Demography 18 201-216
Schwarz K, 1975 Methoden der Bevölkerungsvorausschätzung unter Berücksichtigung regionaler
Gesichtspunkte (Herman Schroedel, Hannover)
Wijewickrema S M, 1980 Weak Ergodicity and the Two-sex Problem in Demography PhD
dissertation, Department of Sociology, Free University of Brussels, Pleinlaan 2, 1050 Brussels
Willekens F J, 1984, "Een multiregionaal demografisch vooruitberekeningsmodel voor
Nederland" (A multiregional demographic projection model for the Netherlands), Netherlands
Interuniversity Demographic Institute, PO Box 955, 2270 AZ Voorburg
Willekens F J, Drewe P, 1984, "A multiregional model for regional demographic projection" in
Demographic Research and Spatial Policy Eds H Ter Heide, F J Willekens (Academic Press,
London) pp 309-334
© 1985 a Pion publication printed in Great Britain