
Age-Period-Cohort Analysis:
New Models, Methods, and
Empirical Analyses
Kenneth C. Land, Ph.D.
John Franklin Crowell Professor of
Sociology and Demography
Duke University
Presentation
Indiana University
April 15, 2011
GUIDING PRINCIPLE FOR THIS WORK
Famous quote from George E. P. Box, Emeritus
Professor of Statistics, University of Wisconsin at
Madison:
“All statistical models are wrong, but some are
useful.”
Ken Land’s Version:
“All statistical models are wrong, but some have
better statistical properties than others – which
may make them useful.”
Organization
• Briefly Review the Early Literature on Cohort Analysis and the Age-Period-Cohort (APC) Identification Problem
• Describe Models & Methods Developed Recently for APC Analysis for Three Research Designs, with Empirical Applications:
1) APC Analysis of Age-by-Time Period Tables of Rates
2) APC Analysis of Microdata from Repeated Cross-Section Surveys
3) Cohort Analysis of Accelerated Longitudinal Panel Designs
• Conclusion
Part I: The Early Literature on Cohort Analysis and the
Age-Period-Cohort (APC) Identification Problem
Why cohort analysis?
See the abstract from Norman Ryder’s
classic article:
Ryder, Norman B. 1965. The Cohort as a Concept in the Study of Social Change. American Sociological Review 30:843-861.
And what is the APC identification problem?
See the abstract from the classic Mason et al.
article:
Mason, Karen Oppenheim, William M. Mason, H.
H. Winsborough, W. Kenneth Poole. 1973.
Some Methodological Issues in Cohort Analysis
of Archival Data. American Sociological
Review 38:242-258.
These two articles were particularly important in framing
the literature on cohort analysis in sociology,
demography, and the social sciences over the past five
decades:
Ryder (1965) argued that cohort membership could be as
important in determining behavior as other social
structural features such as socioeconomic status.
Mason et al. (1973) specified the APC multiple
classification/accounting model and defined the
identification problem therein.
The Mason et al. (1973) article, in particular,
spawned a large methodological literature,
beginning with Norval Glenn’s critique:
Glenn, N. D. (1976). Cohort Analysts’ Futile Quest: Statistical Attempts
to Separate Age, Period, and Cohort Effects. American
Sociological Review 41:900–905.
and Mason et al.’s (1976) reply:
Mason, W. M., K. O. Mason, and H. H. Winsborough. (1976). Reply to
Glenn. American Sociological Review 41:904-905.
The Mason et al. reply continued with Bill Mason’s
work with Stephen Fienberg:
Fienberg, Stephen E. and William M. Mason. 1978. "Identification and
Estimation of Age-Period-Cohort Models in the Analysis of Discrete
Archival Data." Sociological Methodology 8:1-67,
which culminated in their 1985 edited volume:
Fienberg, Stephen E. and William M. Mason, Eds. 1985. Cohort Analysis in
Social Research. New York: Springer-Verlag,
a defining volume in the methodological literature on APC analysis in the social sciences as of about 25 years ago.
New approaches and critiques thereof continued over
the years; see, e.g., an article applying a Bayesian
statistics approach:
Sasaki, M., & Suzuki, T. (1987). Changes in Religious Commitment in the United States, Holland, and Japan. American Journal of Sociology 92:1055–1076,
and the critique:
Glenn, N. D. (1989). A Caution About Mechanical Solutions to the Identification Problem in Cohort Analysis: A Comment on Sasaki and Suzuki. American Journal of Sociology 95:754–761.
For additional material on these and related
contributions to the literature on cohort analysis, see
the following three reviews:
Mason, William M. and N. H. Wolfinger. 2002. “Cohort
Analysis.” Pp. 151-228 in International Encyclopedia of the
Social and Behavioral Sciences. New York: Elsevier.
Glenn, Norval D. 2005. Cohort Analysis. 2nd edition.
Thousand Oaks: Sage.
Yang, Yang. 2007. “Age/Period/Cohort Distinctions.” Pp. 20-22
in Encyclopedia of Health and Aging. Kyriakos S. Markides
(ed). Sage Publications.
Where does this literature on cohort analysis leave us
today?
If a researcher has a temporally-ordered dataset and
wants to tease out its age, period, and cohort
components, how should he/she proceed?
Are there any methodological guidelines that can be
recommended?
There are some guidelines – and cautions, e.g., in
Glenn (2005).
But can more be done with new statistical models and methods? Perhaps, but any new method must meet the criteria laid down by Glenn (2005: 20), who argues that such a method may prove useful
“if it yields approximately correct estimates ‘more often than not,’
if researchers carefully assess the credibility of the estimates by using theory and side information, and
if they keep their conclusions about the effects tentative.”
Generally, however, much of the extant literature offers few useful guidelines on how to conduct an APC analysis. Rather, it often leads a researcher to conclude either that:
it is impossible to obtain meaningful estimates of the
distinct contributions of age, time period, and cohort to
the study of social change,
or that:
the conduct of an APC analysis is an esoteric art that is
best left to a few skilled methodologists.
Yang and Land and co-authors have bravely taken on
Glenn’s challenge and have developed new
approaches for APC analysis that are less esoteric
and can be used by researchers.
These new approaches are bound together as
members of the class of Generalized Linear
Mixed Models (GLMMs), models that allow linear
and nonlinear exponential family links and mixed
(both fixed and random) effects.
Part II: First Research Design: APC Analysis of Age-by-Time Period Tables of Rates or Proportions
References for Part II:
Fu, W. J. 2000. “Ridge Estimator in Singular Design with Application to
Age-Period-Cohort Analysis of Disease Rates.” Communications in
Statistics--Theory and Methods 29:263-278.
Yang Yang, Wenjiang J. Fu, and Kenneth C. Land. 2004. “A
Methodological Comparison of Age-Period-Cohort Models: The Intrinsic
Estimator and Conventional Generalized Linear Models.” Sociological
Methodology 34:75-110.
Yang Yang, Sam Schulhofer-Wohl, Wenjiang J. Fu, and Kenneth C. Land.
2008. “The Intrinsic Estimator for Age-Period-Cohort Analysis: What It Is
and How To Use It.” American Journal of Sociology 114(May):1697-1736.
Yang Yang. 2008. “Trends in U.S. Adult Chronic Disease Mortality, 1960-1999: Age, Period, and Cohort Variations.” Demography 45(May):387-416.
Data Structure: Tabular Rate Data
Example: Lung Cancer Death Rates for U.S. Adult Females, 1960-1999, Analyzed in Yang (2008)
Deaths per 100,000 Population, by Age and Period

Age      1960-64 1965-69 1970-74 1975-79 1980-84 1985-89 1990-94 1995-99
20-24        0.1     0.1     0.1     0.1     0.1     0.1     0.1     0.1
25-29        0.2     0.2     0.2     0.2     0.2     0.2     0.2     0.2
30-34        0.8     0.9     1.0     0.9     0.8     0.8     0.9     0.8
35-39        2.3     3.0     3.6     3.5     3.3     2.7     2.9     3.0
40-44        5.1     7.1     9.1    10.5     9.9     8.9     7.5     7.7
45-49        8.6    12.9    18.1    22.2    23.9    23.1    20.9    17.0
50-54       12.5    19.4    28.9    36.9    44.2    47.2    44.9    39.0
55-59       16.1    25.5    40.1    53.1    69.0    78.3    81.8    74.2
60-64       19.9    28.8    46.6    69.2    92.6   115.2   127.3   125.1
65-69       24.5    33.9    51.3    78.6   114.7   145.5   172.6   180.0
70-74       29.2    38.4    52.8    77.7   120.2   168.3   208.2   233.5
75-79       34.0    41.8    56.1    76.1   111.4   162.0   219.4   251.6
80-84       36.9    45.8    57.3    75.3   102.5   141.1   199.8   249.6
85-89       39.8    48.6    59.7    75.2    96.9   120.9   164.5   214.8
90-94       34.2    43.1    60.6    73.6    91.8   108.8   136.3   166.2
95-125+     26.5    44.2    51.0    68.9    82.7   104.1   120.0   132.8
All         10.3    14.7    21.3    28.9    38.3    47.9    57.0    61.7

Source: CDC/NCHS Multiple Cause of Death File
Part II: First Research Design: APC
Accounting/Multiple Classification Model
The Algebra of the APC Identification Problem
Linear Model Specification:
$$M_{ij} = \frac{D_{ij}}{P_{ij}} = \mu + \alpha_i + \beta_j + \gamma_k + \varepsilon_{ij} \qquad (1)$$
– $M_{ij}$ denotes the observed occurrence/exposure rate of deaths for the i-th age group, i = 1, …, a age groups, at the j-th time period, j = 1, …, p time periods of observed data
– $D_{ij}$ denotes the number of deaths in the ij-th group, and $P_{ij}$ denotes the size of the estimated population in the ij-th group
– $\mu$ denotes the intercept or adjusted mean
– $\alpha_i$ denotes the i-th row age effect, or the coefficient for the i-th age group
– $\beta_j$ denotes the j-th column period effect, or the coefficient for the j-th time period
– $\gamma_k$ denotes the k-th cohort effect, or the coefficient for the k-th cohort, for k = 1, …, (a + p - 1) cohorts, with k = a - i + j
– $\varepsilon_{ij}$ denotes the random errors, with expectation $E(\varepsilon_{ij}) = 0$
– Fixed effect GLIM reparameterization:
$$\sum_i \alpha_i = \sum_j \beta_j = \sum_k \gamma_k = 0,$$
or, alternatively, setting one of each of the categories as the reference group.
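Equation (1) ties the cohort index to the other two indices through k = a - i + j. As a quick check of that bookkeeping, the Python/NumPy sketch below builds the age-by-period grid of cohort indices and counts the cohorts. The helper name `cohort_index` is hypothetical, and the 16-by-8 size was chosen only to mirror the lung cancer table.

```python
import numpy as np


def cohort_index(a: int, p: int) -> np.ndarray:
    """Return the a-by-p array of cohort indices k = a - i + j (1-based i and j)."""
    i = np.arange(1, a + 1)[:, None]  # age groups i = 1..a as a column
    j = np.arange(1, p + 1)[None, :]  # periods j = 1..p as a row
    return a - i + j                  # broadcasts to shape (a, p)


k = cohort_index(a=16, p=8)           # 16 age groups x 8 periods
print(k.shape, k.min(), k.max())      # (16, 8) 1 23
print(len(np.unique(k)))              # 23, i.e. a + p - 1 cohorts
```

Cells on the same diagonal of the table share a cohort index. The index k = 1 is the oldest age group in the earliest period, and k = a + p - 1 is the youngest age group in the latest period.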
Alternative Specifications in the Generalized Linear Models (GLM) Class:
• Simple Linear Models
$$Y_{ij} = \mu + \alpha_i + \beta_j + \gamma_k + \varepsilon_{ij}$$
where $Y_{ij}$ is the outcome in cell (i, j) and is assumed to be normally distributed. Equivalently, the error term $\varepsilon_{ij}$ is assumed to be normally distributed with a mean of 0 and variance $\sigma^2$;
• Log-Linear Models
$$\log(E_{ij}) = \log(P_{ij}) + \mu + \alpha_i + \beta_j + \gamma_k$$
where $E_{ij}$ denotes the expected number of events in cell (i, j) and is assumed to be distributed as a Poisson variate, and $\log(P_{ij})$ is the log of the exposure $P_{ij}$ (an offset);
• Logistic Models
$$\theta_{ij} = \log\left(\frac{m_{ij}}{1 - m_{ij}}\right) = \mu + \alpha_i + \beta_j + \gamma_k$$
where $\theta_{ij}$ is the log odds of the event and $m_{ij}$ is the probability of the event in cell (i, j).
Least-squares regression in matrix form:
$$Y = Xb \qquad (2)$$
$$b = (\mu, \alpha_1, \ldots, \alpha_{a-1}, \beta_1, \ldots, \beta_{p-1}, \gamma_1, \ldots, \gamma_{a+p-2})^T$$
Identification Problem:
$$\hat{b} = (X^T X)^{-1} X^T Y \qquad (3)$$
A unique solution to these normal equations does not exist because the design matrix X is singular. It is 1 less than full rank, meaning one column can be written as a linear combination of the others. This is due to the identity
Period = Age + Cohort
Thus, $(X^T X)^{-1}$ does not exist.
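The rank deficiency can be seen directly by building X. This sketch is an illustration under stated assumptions and is not code from Yang et al. It uses sum-to-zero (effect) coding, matching the fixed effect GLIM reparameterization listed under equation (1), with the last category of each factor as the omitted one. `effect_codes` and `apc_design` are hypothetical helper names.

```python
import numpy as np


def effect_codes(levels: np.ndarray, n: int) -> np.ndarray:
    """Sum-to-zero (effect) coding: n - 1 columns, with level n coded -1 in every column."""
    out = np.zeros((levels.size, n - 1))
    for c in range(n - 1):
        out[:, c] = levels == c + 1   # 1 for level c + 1, else 0
    out[levels == n] = -1.0           # the omitted last level
    return out


def apc_design(a: int, p: int) -> np.ndarray:
    """Design matrix for model (2): intercept, a-1 age, p-1 period, a+p-2 cohort columns."""
    i, j = np.meshgrid(np.arange(1, a + 1), np.arange(1, p + 1), indexing="ij")
    i, j = i.ravel(), j.ravel()
    k = a - i + j                      # cohort index from equation (1)
    return np.column_stack([np.ones(i.size), effect_codes(i, a),
                            effect_codes(j, p), effect_codes(k, a + p - 1)])


X = apc_design(a=16, p=8)
print(X.shape)                    # (128, 45)
print(np.linalg.matrix_rank(X))   # 44
```

With 16 ages and 8 periods, X has 45 columns but rank 44. This leaves 128 - 44 = 84 residual degrees of freedom, the figure listed for the APC model in Table 1.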
Conventional Solutions to APC Identification Problem
Constrained Coefficients GLIM (CGLIM) Estimator
• Impose one or more equality constraints on the coefficients of the coefficient vector in (2) in order to just-identify (one equality constraint) or over-identify (two or more constraints) the model;
Proxy Variables/Age-Period-Cohort Characteristic (APCC) Approach
• Use one or more proxy variables as surrogates for the age, period, or cohort coefficients (see O'Brien, R.M. 2000. "Age Period Cohort Characteristic Models." Social Science Research 29:123-139);
Nonlinear Parametric (Algebraic) Transformation Approach
• Define a nonlinear parametric function of one of the age, period, or cohort variables so that its relationship to the others is nonlinear.
Limitations of Conventional Solutions to APC Identification
Problem
Proxy Variables Approach
• the analyst may not want to assume that a proxy variable fully accounts for all of the variation associated with the A, P, or C dimensions;
Nonlinear Parametric (Algebraic) Transformation Approach
• it may not be evident what nonlinear function should be defined for the effects of age, period, or cohort;
Constrained Coefficients GLIM (CGLIM) Estimator
• it is the most widely used of the three approaches, but it suffers from some major problems, summarized below.
Constrained Coefficients GLIM (CGLIM) Estimator:
• the analyst desires to employ the flexibility of the APC accounting model, with its individual effect coefficients for each of the A, P, and C categories;
• the analyst needs to rely on prior or external information to find constraints, but such information rarely exists or can rarely be verified;
• different choices of identifying constraints can produce widely different estimates of patterns of change across the A, P, and C categories of the analysis;
• all just-identified CGLIM models produce the same goodness of fit to the data, making it impossible to use model fit as the criterion for selecting the best constrained model.
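The last point can be reproduced in a few lines. The Python/NumPy sketch below uses small simulated data with arbitrary coefficients, and the helpers `cglim_fit` and `apc_design` are hypothetical. It imposes two different just-identifying constraints, alpha_1 = alpha_2 and gamma_1 = gamma_2, by merging the corresponding columns of X. Sum-to-zero coding is assumed.

```python
import numpy as np


def effect_codes(levels, n):
    """Sum-to-zero coding: n - 1 columns, level n coded -1."""
    out = np.zeros((levels.size, n - 1))
    for c in range(n - 1):
        out[:, c] = levels == c + 1
    out[levels == n] = -1.0
    return out


def apc_design(a, p):
    """Intercept, a-1 age, p-1 period and a+p-2 cohort columns (k = a - i + j)."""
    i, j = np.meshgrid(np.arange(1, a + 1), np.arange(1, p + 1), indexing="ij")
    i, j = i.ravel(), j.ravel()
    return np.column_stack([np.ones(i.size), effect_codes(i, a),
                            effect_codes(j, p), effect_codes(a - i + j, a + p - 1)])


def cglim_fit(X, y, c1, c2):
    """Least squares under the just-identifying constraint b[c1] == b[c2] (columns merged)."""
    keep = [c for c in range(X.shape[1]) if c != c2]
    Xc = X[:, keep]                          # fancy indexing makes a copy
    Xc[:, keep.index(c1)] += X[:, c2]
    coef, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    b = np.empty(X.shape[1])
    b[keep] = coef
    b[c2] = b[c1]
    return b


def rss(X, y, b):
    """Residual sum of squares."""
    return float(np.sum((y - X @ b) ** 2))


a, p = 6, 5
X = apc_design(a, p)
rng = np.random.default_rng(0)
y = X @ rng.normal(size=X.shape[1]) + rng.normal(scale=0.1, size=X.shape[0])

coh = 1 + (a - 1) + (p - 1)                  # column of gamma_1
b_age = cglim_fit(X, y, 1, 2)                # constraint alpha_1 = alpha_2
b_coh = cglim_fit(X, y, coh, coh + 1)        # constraint gamma_1 = gamma_2

print(np.isclose(rss(X, y, b_age), rss(X, y, b_coh)))  # True: identical fit
print(np.round(b_age[1:a] - b_coh[1:a], 3))            # yet the age effects differ
```

The two coefficient vectors differ by a multiple of the null vector of X. That is why no goodness-of-fit statistic can choose between them.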
So, what can be done? Some Guidelines for
Estimating APC Models for Tables of Rates or
Proportions
Step 1: Descriptive data analyses using graphics
Step 2: Model specification tests
Objectives:
• to provide a qualitative understanding of patterns of age, period, or cohort variation, or of two-way age-by-period and age-by-cohort variation;
• to ascertain whether the data are sufficiently well described by any single factor or two-way combination of the A, P, and C dimensions, or whether it is necessary to include all three.
Step 1: Graphical analyses: Female Lung Cancer Example
from Yang (2008)
Step 2: Model selection procedures
Examples from Yang et al. (2004) and Yang (2008)
Table 1. Goodness-of-Fit Statistics for Age-Period-Cohort Log-Linear Models of U.S. Adult Mortality

Cause of Death   Statistic          A        AP        AC      APC*
                 DF               112       105        90        84
Total (Female)   Deviance      695527     40443     72089     18903
                 AIC           695751     40653     72269     19071
                 BIC           695763     40664     72279     19080
Heart Disease    Deviance      782210     52225     18638      9243
                 AIC           782434     52435     18818      9411
                 BIC           782446     52446     18827      9420
Stroke           Deviance      655622     12660     25967      1480
                 AIC           655846     12870     26147      1648
                 BIC           655858     12881     26157      1657
Lung Cancer      Deviance      320050     42126      5296       245
                 AIC           320274     42336      5476       413
                 BIC           320286     42347      5486       422
Breast Cancer    Deviance        9748      7403      1553       512
                 AIC             9972      7613      1733       680
                 BIC             9984      7625      1743       689
Guidelines for Estimating APC Models of Rates or
Proportions
If the foregoing descriptive analyses suggest that only one or two of the A, P, and C dimensions are operative, then the analysis can proceed with a reduced version of model (2) that omits one or two dimensions, and there is no identification problem.
If, however, these analyses suggest that all three dimensions are at work,
then Yang et al. (2004, 2008) recommend:
Step 3: Apply the Intrinsic Estimator (IE).
What is the Intrinsic Estimator (IE)?
It is a new method of estimation that yields a unique solution to model (2). It is the unique estimable function of both the linear and nonlinear components of the APC model determined by the Moore-Penrose generalized inverse, and it achieves model identification with minimal assumptions.
Why is the IE useful?
The basic idea of the IE is to remove the influence of the design matrix on the coefficient estimates. The design matrix is fixed by the numbers of age and period groups and is unrelated to the outcome observations $Y_{ij}$. This constraint produces estimates that have desirable statistical properties.
Some preliminary matrix algebra concepts:
• Let A be a matrix of dimension q by d (q rows and d columns), let x be a column vector of dimension d, and let y be a column vector of dimension q. For a set of linear equations Ax = y, the set of vectors x0 of (real) numbers such that Ax0 = 0 is called the null space of the matrix A.
• When a matrix A is rank deficient (has linearly dependent columns), the dimension of the null space is at least one.
• In this case, if we have Ax = y, then we also have A(x + x0) = y.
• When A is rank deficient, the equation Ax = y (if it has any solution) has an infinite set of solutions, which differ by an element of the null space. If vectors x1 and x2 are solutions, then A(x1 - x2) = 0, and the vector x1 - x2 is in the null space.
• When A is rank deficient, there is always a well-defined solution whose projection on the null space is zero; this solution is the one given by the (Moore-Penrose) generalized inverse of A.
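These facts can be checked numerically on a toy system. The Python/NumPy sketch below uses a made-up 3-by-3 matrix whose third column is the sum of the first two. `numpy.linalg.pinv` returns the Moore-Penrose solution, and the last right singular vector from `numpy.linalg.svd` spans the null space. The explicit `rcond` cutoff is an assumption, chosen so that the numerically tiny singular value is treated as zero.

```python
import numpy as np

# Third column = first column + second column, so A has rank 2.
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0]])
y = np.array([1.0, 2.0, 3.0])   # consistent: y lies in the column space of A

x_pinv = np.linalg.pinv(A, rcond=1e-10) @ y   # Moore-Penrose (minimum-norm) solution
_, s, Vt = np.linalg.svd(A)
null_vec = Vt[-1]               # right singular vector for the (numerically) zero singular value

print(np.allclose(A @ x_pinv, y))                     # True: x_pinv solves Ax = y
print(np.allclose(A @ (x_pinv + 5.0 * null_vec), y))  # True: so does x_pinv + x0
print(np.isclose(null_vec @ x_pinv, 0.0))             # True: zero projection on the null space
```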
The Intrinsic Estimator (IE): Algebraic Definition
The linear dependency between A, P, and C in model (2) is mathematically equivalent to
$$X B_0 = 0 \qquad (4)$$
which defines the null space of model (2). Here $B_0$, the normalized eigenvector of $X^T X$ with eigenvalue 0, is fixed by the design matrix X:
$$B_0 = \frac{\tilde{B}_0}{\lVert \tilde{B}_0 \rVert}, \qquad \tilde{B}_0 = (0, A, P, C)^T,$$
$$A = \left(1 - \frac{a+1}{2}, \ldots, (a-1) - \frac{a+1}{2}\right),$$
$$P = \left(\frac{p+1}{2} - 1, \ldots, \frac{p+1}{2} - (p-1)\right),$$
$$C = \left(1 - \frac{a+p}{2}, \ldots, (a+p-2) - \frac{a+p}{2}\right).$$
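To make equation (4) concrete, the sketch below computes B0 from the formulas for A, P, and C and checks that X B0 = 0. As an illustration, it assumes that X uses sum-to-zero coding with the last category of each factor omitted and the coefficient order (mu, alpha, beta, gamma) of b in model (2). Under a different coding, the entries of B0 would change. The helper names are hypothetical.

```python
import numpy as np


def effect_codes(levels: np.ndarray, n: int) -> np.ndarray:
    """Sum-to-zero coding: n - 1 columns, with level n coded -1 in every column."""
    out = np.zeros((levels.size, n - 1))
    for c in range(n - 1):
        out[:, c] = levels == c + 1
    out[levels == n] = -1.0
    return out


def apc_design(a: int, p: int) -> np.ndarray:
    """Intercept, a-1 age, p-1 period and a+p-2 cohort columns, with k = a - i + j."""
    i, j = np.meshgrid(np.arange(1, a + 1), np.arange(1, p + 1), indexing="ij")
    i, j = i.ravel(), j.ravel()
    return np.column_stack([np.ones(i.size), effect_codes(i, a),
                            effect_codes(j, p), effect_codes(a - i + j, a + p - 1)])


def b0_vector(a: int, p: int) -> np.ndarray:
    """Unit-length null vector B0 = B0~ / ||B0~||, with B0~ = (0, A, P, C)^T."""
    A = np.arange(1, a) - (a + 1) / 2          # 1 - (a+1)/2, ..., (a-1) - (a+1)/2
    P = (p + 1) / 2 - np.arange(1, p)          # (p+1)/2 - 1, ..., (p+1)/2 - (p-1)
    C = np.arange(1, a + p - 1) - (a + p) / 2  # 1 - (a+p)/2, ..., (a+p-2) - (a+p)/2
    b0_tilde = np.concatenate([[0.0], A, P, C])
    return b0_tilde / np.linalg.norm(b0_tilde)


a, p = 16, 8
X = apc_design(a, p)
B0 = b0_vector(a, p)
print(np.allclose(X @ B0, 0.0))            # True: X B0 = 0, as in equation (4)

_, _, Vt = np.linalg.svd(X)
print(np.isclose(abs(Vt[-1] @ B0), 1.0))   # True: B0 spans the numerical null space (up to sign)
```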
Parameter vector orthogonal decomposition:
$$b = b_0 + t B_0 \qquad (5)$$
$$b_0 = (I - B_0 B_0^T)\, b \qquad (6)$$
where $b_0 = P_{\text{proj}}\, b$ is the projection of b onto the non-null space of X, and t is a real number. $tB_0$ lies in the null space of X and represents trends of linear constraints: different equality constraints used by CGLIM estimators, such as $b_1$ and $b_2$, yield different values of t.
[Figure: geometric sketch of the decomposition, showing b1, b2, b0, the null-space direction B0, and the offset tB0.]
The Intrinsic Estimator (IE) Method: Algebraic Definition
From the infinite number of estimators of b in model (2),
$$\hat{b} = B + t B_0, \qquad (7)$$
the IE B estimates the parameter vector $b_0$ corresponding to t = 0:
$$B = (I - B_0 B_0^T)\, \hat{b}. \qquad (8)$$
The IE is the special estimator that uniquely determines the age, period, and cohort effects in the parameter subspace defined by $b_0$:
$$X\hat{b} = X(B + tB_0) = XB + tXB_0 = XB + 0 = XB \qquad (9)$$
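Equations (7) and (8) suggest a two-step recipe: obtain any least-squares solution, then remove its component along B0. The Python/NumPy sketch below uses simulated data and hypothetical helper names. It assumes sum-to-zero coding with the last category of each factor omitted, under which B0 has the form given in equation (4). It starts from a CGLIM-type solution that sets the first cohort coefficient to zero, then checks that projecting that solution reproduces the Moore-Penrose solution, consistent with describing the IE as determined by the generalized inverse.

```python
import numpy as np


def effect_codes(levels, n):
    """Sum-to-zero coding: n - 1 columns, level n coded -1."""
    out = np.zeros((levels.size, n - 1))
    for c in range(n - 1):
        out[:, c] = levels == c + 1
    out[levels == n] = -1.0
    return out


def apc_design(a, p):
    """Intercept, a-1 age, p-1 period and a+p-2 cohort columns (k = a - i + j)."""
    i, j = np.meshgrid(np.arange(1, a + 1), np.arange(1, p + 1), indexing="ij")
    i, j = i.ravel(), j.ravel()
    return np.column_stack([np.ones(i.size), effect_codes(i, a),
                            effect_codes(j, p), effect_codes(a - i + j, a + p - 1)])


def b0_vector(a, p):
    """Unit-length null vector B0 of X (equation (4))."""
    A = np.arange(1, a) - (a + 1) / 2
    P = (p + 1) / 2 - np.arange(1, p)
    C = np.arange(1, a + p - 1) - (a + p) / 2
    b0 = np.concatenate([[0.0], A, P, C])
    return b0 / np.linalg.norm(b0)


a, p = 16, 8
X, B0 = apc_design(a, p), b0_vector(a, p)
rng = np.random.default_rng(1)
y = X @ rng.normal(size=X.shape[1]) + rng.normal(scale=0.1, size=X.shape[0])

# A CGLIM-type solution: constrain gamma_1 = 0 by dropping its column.
drop = 1 + (a - 1) + (p - 1)
keep = [c for c in range(X.shape[1]) if c != drop]
coef, *_ = np.linalg.lstsq(X[:, keep], y, rcond=None)
b_hat = np.zeros(X.shape[1])
b_hat[keep] = coef

t = B0 @ b_hat                    # equation (7): b_hat = B + t * B0
ie = b_hat - t * B0               # equation (8): B = (I - B0 B0^T) b_hat

print(np.allclose(ie, np.linalg.pinv(X, rcond=1e-10) @ y))  # True
print(np.allclose(X @ b_hat, X @ ie))                        # True: equation (9)
```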
The Intrinsic Estimator (IE) Method: Desirable statistical
properties (Yang et al. 2004, 2008):
1) Estimability: Yang et al. (2004) established that the IE satisfies the Kupper et al. (1985) condition for estimability, namely
$$l^T B_0 = 0,$$
where $l^T$ is a constraint vector (of appropriate dimension) that defines a linear function $l^T b$ of b.
Reference:
Kupper, L.L., J.M. Janis, A. Karmous, and B.G. Greenberg. 1985.
“Statistical Age-Period-Cohort Analysis: A Review and Critique.”
Journal of Chronic Diseases 38:811-830.
Proof: For the IE, $l^T = I - B_0 B_0^T$. Because $B_0$ has unit length ($B_0^T B_0 = 1$),
$$l^T B_0 = (I - B_0 B_0^T) B_0 = B_0 - B_0 B_0^T B_0 = B_0 - B_0 = 0.$$
Estimable functions are desirable as statistical estimators because they are linear functions of the unidentified parameter vector that can be estimated without bias, i.e., they have unbiased estimators.
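The estimability condition can also be checked for individual linear functions. This Python/NumPy sketch uses illustrative choices of l and hypothetical helper names, and it assumes sum-to-zero coding. A single age coefficient fails the condition, while the second difference alpha_1 - 2 alpha_2 + alpha_3 satisfies it. Only the second difference takes the same value for two least-squares solutions that differ by a multiple of B0.

```python
import numpy as np


def effect_codes(levels, n):
    """Sum-to-zero coding: n - 1 columns, level n coded -1."""
    out = np.zeros((levels.size, n - 1))
    for c in range(n - 1):
        out[:, c] = levels == c + 1
    out[levels == n] = -1.0
    return out


def apc_design(a, p):
    """Intercept, a-1 age, p-1 period and a+p-2 cohort columns (k = a - i + j)."""
    i, j = np.meshgrid(np.arange(1, a + 1), np.arange(1, p + 1), indexing="ij")
    i, j = i.ravel(), j.ravel()
    return np.column_stack([np.ones(i.size), effect_codes(i, a),
                            effect_codes(j, p), effect_codes(a - i + j, a + p - 1)])


def b0_vector(a, p):
    """Unit-length null vector B0 of X."""
    A = np.arange(1, a) - (a + 1) / 2
    P = (p + 1) / 2 - np.arange(1, p)
    C = np.arange(1, a + p - 1) - (a + p) / 2
    b0 = np.concatenate([[0.0], A, P, C])
    return b0 / np.linalg.norm(b0)


a, p = 6, 5
X, B0 = apc_design(a, p), b0_vector(a, p)
n_par = X.shape[1]

l_single = np.zeros(n_par)
l_single[1] = 1.0                          # alpha_1 on its own
l_curv = np.zeros(n_par)
l_curv[[1, 2, 3]] = [1.0, -2.0, 1.0]       # alpha_1 - 2*alpha_2 + alpha_3

print(np.isclose(l_single @ B0, 0.0))      # False: fails the Kupper et al. condition
print(np.isclose(l_curv @ B0, 0.0))        # True: satisfies it

rng = np.random.default_rng(2)
y = X @ rng.normal(size=n_par) + rng.normal(scale=0.1, size=X.shape[0])
b1 = np.linalg.pinv(X, rcond=1e-10) @ y    # one least-squares solution
b2 = b1 + 3.0 * B0                         # another one, shifted along the null space
print(np.isclose(l_curv @ b1, l_curv @ b2))      # True: same value
print(np.isclose(l_single @ b1, l_single @ b2))  # False: depends on the solution chosen
```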
Yang et al. (2004) also proved, independently of the Kupper et al. (1985) estimability condition, that the IE has the following two properties:
2) Unbiasedness: For a fixed number of time periods of data, it is an unbiased estimator of the special parameterization (or linear function) $b_0$ of b.
3) Relative efficiency: For a fixed number of time periods of data, it has a smaller variance than any CGLIM estimator.
4) Asymptotic consistency: This property derives largely from the fact that the length of the eigenvector $B_0$ decreases with increasing numbers of time periods of data. In fact, it converges to zero as the number of periods of data increases without bound. Therefore, consider any two estimators
$$\hat{b}_2 = B + t_2 B_0 \quad \text{and} \quad \hat{b}_1 = B + t_1 B_0,$$
where $t_1$ and $t_2$ are nonzero and correspond to different identifying constraints. As the number of time periods in an APC analysis increases, the difference between these two estimators decreases towards zero, and both estimators converge toward the IE B.
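One ingredient of this argument can be shown numerically. Because B0 is normalized to unit length in equation (4), the shrinking is most naturally read as applying to its individual entries. The Python/NumPy sketch below tracks the length of the age block of B0 as the number of periods p grows, with a = 6 age groups as an arbitrary choice. It illustrates only how the null direction's weight on the age coefficients shrinks. It is not the consistency proof in Yang et al.

```python
import numpy as np


def b0_vector(a: int, p: int) -> np.ndarray:
    """Unit-length B0 = B0~ / ||B0~|| with B0~ = (0, A, P, C)^T."""
    A = np.arange(1, a) - (a + 1) / 2
    P = (p + 1) / 2 - np.arange(1, p)
    C = np.arange(1, a + p - 1) - (a + p) / 2
    b0 = np.concatenate([[0.0], A, P, C])
    return b0 / np.linalg.norm(b0)


a = 6
age_norms = []
for p in (5, 10, 20, 40, 80):
    B0 = b0_vector(a, p)
    age_block = B0[1:a]                  # the a - 1 age entries of B0
    age_norms.append(float(np.linalg.norm(age_block)))
    print(p, round(age_norms[-1], 4))
```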
5) Monte Carlo Simulation: Yang et al. (2008) gave numerical simulation demonstrations of the foregoing statistical properties; one example is reproduced below.
Simulation Results of the IE and CGLIM Estimators: True Cohort Effects = 0
[Figure: six panels plotting mean estimates (log coefficients) and MSE for the age (a1-a9), period (p1-p5), and cohort (c1-c13) effects, comparing the true effect, the IE, CGLIM_p, and CGLIM_c.]
Based on these statistical properties, Yang et al. (2008) also showed how the IE can be used in an asymptotic t-test to evaluate a substantively informed equality constraint on the APC accounting model. The test asks whether the estimated coefficient vector that results from the constraint is (statistically) estimable, that is, within sampling error of meeting the Kupper et al. condition for estimability.
The Intrinsic Estimator (IE) Method: Computation
Software
Two programs for calculating the IE are available for use in popular statistical packages:
1) an S-Plus/R program, and
2) a Stata ado file
(both referenced in Yang et al., 2008).
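For readers working outside S-Plus/R or Stata, the following is an independent Python/NumPy sketch of the same idea for the log-linear model with an exposure offset. It is not a port of either program, and its names, including `poisson_ie`, are hypothetical. Because the Poisson likelihood depends on b only through Xb, any maximum-likelihood solution can be projected with (I - B0 B0^T) as in equation (8). Here, each iteratively reweighted least-squares (IRLS) step already uses the minimum-norm (Moore-Penrose) solution. The data are simulated, sum-to-zero coding is assumed, and standard errors are not computed.

```python
import numpy as np


def effect_codes(levels, n):
    """Sum-to-zero coding: n - 1 columns, level n coded -1."""
    out = np.zeros((levels.size, n - 1))
    for c in range(n - 1):
        out[:, c] = levels == c + 1
    out[levels == n] = -1.0
    return out


def apc_design(a, p):
    """Intercept, a-1 age, p-1 period and a+p-2 cohort columns (k = a - i + j)."""
    i, j = np.meshgrid(np.arange(1, a + 1), np.arange(1, p + 1), indexing="ij")
    i, j = i.ravel(), j.ravel()
    return np.column_stack([np.ones(i.size), effect_codes(i, a),
                            effect_codes(j, p), effect_codes(a - i + j, a + p - 1)])


def b0_vector(a, p):
    """Unit-length null vector B0 of X."""
    A = np.arange(1, a) - (a + 1) / 2
    P = (p + 1) / 2 - np.arange(1, p)
    C = np.arange(1, a + p - 1) - (a + p) / 2
    b0 = np.concatenate([[0.0], A, P, C])
    return b0 / np.linalg.norm(b0)


def poisson_ie(X, deaths, exposure, B0, n_iter=50, tol=1e-10):
    """Fit log E = log(exposure) + X b by IRLS with minimum-norm weighted
    least-squares steps, then apply (I - B0 B0^T) as in equation (8)."""
    offset = np.log(exposure)
    b = np.zeros(X.shape[1])
    b[0] = np.log(deaths.sum() / exposure.sum())     # start at the overall rate
    for _ in range(n_iter):
        mu = np.exp(offset + X @ b)                  # expected deaths
        z = X @ b + (deaths - mu) / mu               # working response, offset removed
        w = np.sqrt(mu)                              # square roots of the Poisson weights
        b_new = np.linalg.pinv(X * w[:, None], rcond=1e-10) @ (z * w)
        done = np.max(np.abs(b_new - b)) < tol
        b = b_new
        if done:
            break
    return b - (B0 @ b) * B0


a, p = 8, 6
X, B0 = apc_design(a, p), b0_vector(a, p)
rng = np.random.default_rng(3)
exposure = np.full(X.shape[0], 1e5)                  # hypothetical person-years per cell
true_b = np.concatenate([[np.log(1e-3)], rng.normal(scale=0.2, size=X.shape[1] - 1)])
deaths = rng.poisson(exposure * np.exp(X @ true_b)).astype(float)

ie = poisson_ie(X, deaths, exposure, B0)
fitted = exposure * np.exp(X @ ie)
print(np.isclose(B0 @ ie, 0.0))                      # True: no component along B0
print(np.isclose(fitted.sum(), deaths.sum()))        # True: intercept score equation
```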
Example: Intrinsic Estimates of Age, Period, and Cohort Effects of Lung Cancer Mortality by Sex (Yang 2008)
[Figure: three panels (Age Effect, Period Effect, Cohort Effect) plotting the intrinsic estimates as log coefficients for males and females against age, year, and birth cohort.]
Some Recent Empirical Applications of the Intrinsic Estimator:
Schwadel, P. 2011. “Age, Period, and Cohort Effects on Religious Activities and Beliefs.” Social Science Research 40:181-192.
Unknown Author. 2011. “Age, Period, and Cohort Effects on
Social Capital and Voting.” Social Forces
90:forthcoming.
Winkler, Richelle L., Jennifer Huck, and Keith Warnke. 2009.
“Deer hunter demography: An age-period-cohort
approach to population projections.” Paper presented
at the Population Association of America Annual
Meeting, Detroit, MI, April 30, 2009.
The Intrinsic Estimator (IE): Conclusion
Is the Intrinsic Estimator a “final” or “universal” solution to the APC
“conundrum”?
No. There will never be such a solution. The APC identification problem is
one of structural under-identification in linear or generalized linear
models for which there can only be partial solutions.
But the IE has been shown to be a useful approach to the identification and
estimation of the APC accounting model that
• has desirable mathematical and statistical properties; and
• has passed both case studies and simulation tests of model
validation.
Part III: Second Research Design: APC Analysis of
Repeated Cross-Section Surveys
References for Part III:
Yang, Yang. 2006. Bayesian Inference for Hierarchical Age-Period-Cohort
Models of Repeated Cross-Section Survey Data. Sociological
Methodology 36:39-74.
Yang Yang and Kenneth C. Land. 2006. A Mixed Models Approach to the
Age-Period-Cohort Analysis of Repeated Cross-Section Surveys, With
an Application to Data on Trends in Verbal Test Scores. Sociological
Methodology 36:75-98.
Yang Yang and Kenneth C. Land. 2008. Age-Period-Cohort Analysis of
Repeated Cross-Section Surveys: Fixed or Random Effects?
Sociological Methods and Research 36(February):297-326.
Yang, Yang. 2008. “Social Inequalities in Happiness in the United States,
1972 to 2004: An Age-Period-Cohort Analysis.” American
Sociological Review 73(April): 204-226.
References for Part III, Continued:
Yang Yang, Steven M. Frenk, and Kenneth C. Land. 2010. “Assessing
the Significance of Cohort and Period Effects in Hierarchical
Age-Period-Cohort Models.” Revision of a paper presented at the
American Sociological Association Annual Meeting, San Francisco,
CA, August 2009.
Zheng, Hui, Yang Yang, and Kenneth C. Land. 2011. “Heteroscedastic Regression in Hierarchical Age-Period-Cohort Models, With Applications to the Study of Self-Reported Health.” Revision of a paper presented at the American Sociological Association Annual Meeting, Atlanta, GA, August 2010.
Data Structure: Individual-level Data in an Age-by-Period
Array
[Figure: individual-level data arranged in an age (i) by period (j) array, with nij > 1 respondents in each cell.]
Approach to the Identification Problem
Many researchers previously have assumed that the APC identification problem for age-by-time period tables of rates transfers over directly to this research design.
But note that this research design yields individual-level data, i.e., microdata on the ages and other characteristics of the individuals in the samples.
Proposal: Use different temporal groupings for the A, P, and C dimensions to break the linear dependency:
• Single years of age
• Time periods that correspond to the years in which the surveys were conducted
• Cohorts defined either by the five- or ten-year intervals that are conventional in demography or by a substantive classification (e.g., War Babies, Baby Boomers, Baby Busters, etc.).
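Whether a given grouping breaks the dependency can be checked by computing the rank of the dummy-variable design matrix. In this Python/NumPy sketch, the survey years are those in the GSS verbal-ability data table, while the complete grid of single-year ages 20 to 80 in every year is a hypothetical simplification. The helper names are our own.

```python
import numpy as np


def dummies(levels: np.ndarray) -> np.ndarray:
    """Indicator columns for each distinct level, dropping the first level as reference."""
    _, idx = np.unique(levels, return_inverse=True)
    return np.eye(idx.max() + 1)[idx][:, 1:]


years = np.array([1974, 1976, 1978, 1982, 1984, 1987, 1988, 1989,
                  1990, 1991, 1993, 1994, 1996, 1998, 2000])   # GSS verbal-ability survey years
ages = np.arange(20, 81)                                        # hypothetical age range
age, year = (m.ravel() for m in np.meshgrid(ages, years, indexing="ij"))
birth_year = year - age


def design(cohort: np.ndarray) -> np.ndarray:
    """Intercept plus dummies for single-year age, survey year, and the given cohort grouping."""
    return np.column_stack([np.ones(age.size), dummies(age), dummies(year), dummies(cohort)])


X_single = design(birth_year)               # single-year birth cohorts
X_grouped = design(5 * (birth_year // 5))   # five-year cohorts: 1890-94, 1895-99, ...

for name, X in (("single-year cohorts", X_single), ("five-year cohorts", X_grouped)):
    print(name, X.shape[1], np.linalg.matrix_rank(X))
```

With single-year birth cohorts, the design is rank deficient because birth year = survey year - age holds exactly. With five-year cohort groups, the matrix has full column rank. Full rank makes estimation possible but does not by itself show that a grouping is substantively appropriate.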
Example: Two-way Cross-Classified Data Structure in the GSS: Number of
Observations by Cohort and Period in the Verbal Ability Data (Yang and Land
2006)
           Year (K)
Cohort (J)  1974 1976 1978 1982 1984 1987 1988 1989 1990 1991 1993 1994 1996 1998 2000  Total
1890          12   18    8    0    0    0    0    0    0    0    0    0    0    0    0     38
1895          31   25   19   19    6    0    0    0    0    0    0    0    0    0    0    100
1900          62   52   49   27   18   17   13   11    5    2    0    0    0    0    0    256
1905          88   69   68   43   38   23   11   12   11   11   15   15   10    0    0    414
1910          77   89   69   75   50   48   34   27   25   29   13   31   27   18    8    620
1915         109  111   84  100   81   81   42   36   37   41   37   60   39   24   27    909
1920         115  104  112  110   73   97   60   53   40   56   55   85   59   32   37   1088
1925         113  108  106  131   99   92   52   53   53   40   50   84   81   68   52   1182
1930         129   92   90  111   81   95   47   54   43   62   43   86   72   45   64   1114
1935         130  106  108  112   80  101   39   59   44   37   58  101  100   61   64   1200
1940         119  140  130  127  100  142   49   74   49   65   58  134  117   65   78   1447
1945         179  161  184  163  133  143   98   84   85   74   85  168  161  104   85   1907
1950         179  180  197  199  170  185  101   94   95  111   99  173  169  101  111   2164
1955          89  151  180  260  162  219  102  117  106  118  127  198  213  149  145   2336
1960           0    8   59  175  186  190  109  121  102  118  103  231  208  161  147   1918
1965           0    0    0   38   75  161  101   86   76   91  111  182  188  157  111   1377
1970           0    0    0    0    0   29   32   48   55   77   81  157  188  116  145    928
1975           0    0    0    0    0    0    0    0    0    1   23   59  128   84  107    402
1980           0    0    0    0    0    0    0    0    0    0    0    0    4   34   62    100
Total       1432 1414 1463 1690 1352 1623  890  929  826  933  958 1764 1764 1219 1243  19500
This Data Structure illustrates that:
• respondents are nested in, and cross-classified simultaneously by, the two higher-level social contexts defined by time period and birth cohort,
• individual members of any birth cohort can be interviewed in multiple replications of the survey, and
• individual respondents in any particular wave of the survey can be drawn from multiple birth cohorts.
Key Points:
1) this approach builds on the recognition that age is an
intrinsically individual-level property that individuals carry
with them and that varies from period to period;
2) by comparison an individual’s cohort is fixed, as is the time
period of a particular survey, and both cohort and period are
contexts within which individuals mature and age and
experience certain events.
Further Questions:
Is there evidence for clustering effects of random errors,
due to the facts that:
• individuals surveyed in the same year may be
subject to similar unmeasured events that influence
their outcomes, and
• members of the same birth cohort may be subject to
similar unmeasured events that influence their
outcomes?
How can this random variability be modeled and
explained?
Method: Apply Hierarchical Age-Period-Cohort (HAPC) Models
• These models generally are members of what statisticians call mixed (fixed and random) effects models; in the social sciences, these models typically are called hierarchical linear models (HLM).
• The mixed models may be linear mixed effects (LMM) models or, more generally, may allow for nonlinear link functions, in which case they are generalized linear mixed models (GLMM).
• A form of HLM applicable to cross-classified data of the form shown above is the class of cross-classified random effects models (CCREM).
• Objective: Model the level-two heterogeneity to:
  – assess the possibility that individuals within the same periods and cohorts could share unobserved random variance;
  – explain the level-two variance by contextual characteristics of time periods and birth cohorts.
Application 1 – A HAPC-LMM of General Social
Survey (GSS) Data on Verbal Test Scores: 1974 – 2006
The Initial Papers:
Alwin, D. 1991. “Family of Origin and Cohort Differences in Verbal Ability.”
American Sociological Review 56:625-38.
Glenn, N.D. 1994. “Television Watching, Newspaper Reading, and Cohort Differences in Verbal Ability.” Sociology of Education 67:216-30.
The debate in the American Sociological Review:
Wilson, J.A. and W.R. Gove. 1999. "The Intercohort Decline in Verbal Ability: Does It Exist?" and reply to Glenn and Alwin & McCammon. ASR 64:253-266, 287-302.
Glenn, N.D. 1999. “Further Discussion of the Evidence for An Intercohort Decline in
Education-Adjusted Vocabulary.” ASR 64:267-71.
Alwin, D.F. and R.J. McCammon. 1999. “Aging Versus Cohort Interpretations of
Intercohort Differences in GSS Vocabulary Scores.” ASR 64:272-86.
Research Questions
• What are the distinct age, period, and cohort components of change in verbal ability in the U.S.?
• How can period- and/or cohort-level heterogeneity be explained by period and/or cohort characteristics?
Analytic Method
• Apply the HAPC-CCREM to estimate:
– fixed effects of age and other individual-level and level-two covariates, and
– random effects of period and cohort, and the variance components.
Because the WORDSUM outcome variable has a relatively bell-shaped sample
frequency distribution, it is reasonable to use a HAPC model specification that
includes a conventional normal-errors regression model. Specifically, Yang and
Land (2006: 87) specified the cross-classified random effects model (CCREM):
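Level-1 or “Within-Cell” Model:
$$\text{WORDSUM}_{ijk} = \beta_{0jk} + \beta_1\,\text{AGE}_{ijk} + \beta_2\,\text{AGE}^2_{ijk} + \beta_3\,\text{EDUCATION}_{ijk} + \beta_4\,\text{FEMALE}_{ijk} + \beta_5\,\text{BLACK}_{ijk} + e_{ijk}, \qquad e_{ijk} \sim N(0, \sigma^2) \tag{1}$$
Level-2 or “Between-Cell” Model:
$$\beta_{0jk} = \gamma_0 + u_{0j} + v_{0k}, \qquad u_{0j} \sim N(0, \tau_u), \quad v_{0k} \sim N(0, \tau_v) \tag{2}$$
Combined Model:
$$\text{WORDSUM}_{ijk} = \gamma_0 + \beta_1\,\text{AGE}_{ijk} + \beta_2\,\text{AGE}^2_{ijk} + \beta_3\,\text{EDUCATION}_{ijk} + \beta_4\,\text{FEMALE}_{ijk} + \beta_5\,\text{BLACK}_{ijk} + u_{0j} + v_{0k} + e_{ijk} \tag{3}$$
for i = 1, 2, …, n_jk individuals cross-classified within cohort j and period k; j = 1, …, 20 birth cohorts; and k = 1, …, 17 time periods (survey years).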
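To make the cross-classified structure of Equations (1)-(3) concrete, the sketch below simulates data from them: each cohort j and each period k receives its own random intercept, and every individual in cell (j, k) shares β0jk = γ0 + u0j + v0k. The variances τu = 0.034, τv = 0.005, and σ² = 3.116 echo the variance components reported for the GSS model; the slopes, covariate distributions, and cell sizes are hypothetical. As a simplification, every cohort is observed in every period, which real survey data do not allow.

```python
import random

def simulate_ccrem(n_per_cell=20, n_cohorts=20, n_periods=17, gamma0=6.0,
                   betas=(0.03, -0.06, 0.37, 0.23, -1.0),
                   tau_u=0.034, tau_v=0.005, sigma2=3.116, seed=1):
    """Draw WORDSUM-like scores from Eqs. (1)-(3) with crossed cohort and period intercepts.

    betas: hypothetical slopes for (AGE, AGE^2, EDUCATION, FEMALE, BLACK).
    tau_u, tau_v, sigma2 are variances, so their square roots are passed to gauss().
    """
    rng = random.Random(seed)
    u = [rng.gauss(0.0, tau_u ** 0.5) for _ in range(n_cohorts)]   # u_0j, Eq. (2)
    v = [rng.gauss(0.0, tau_v ** 0.5) for _ in range(n_periods)]   # v_0k, Eq. (2)
    rows = []
    for j in range(n_cohorts):
        for k in range(n_periods):
            beta0 = gamma0 + u[j] + v[k]                           # beta_0jk, Eq. (2)
            for _ in range(n_per_cell):
                age = rng.uniform(-2.0, 2.0)     # hypothetical centered, rescaled age
                educ = rng.uniform(-4.0, 4.0)    # hypothetical centered years of schooling
                female = float(rng.random() < 0.5)
                black = float(rng.random() < 0.12)
                x = (age, age * age, educ, female, black)
                mean = beta0 + sum(b * xi for b, xi in zip(betas, x))
                y = mean + rng.gauss(0.0, sigma2 ** 0.5)            # e_ijk, Eq. (1)
                rows.append((j, k, y))
    return u, v, rows

u, v, rows = simulate_ccrem()
print(len(u), len(v), len(rows))
```

Fitting a CCREM to such simulated data and checking whether τu and τv are recovered is a useful sanity check before applying the model to real surveys.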
Table 2. HAPC Models of the GSS WORDSUM Data, 1974-2006

Fixed Effects
| | coefficient | se | t ratio | p value |
|---|---|---|---|---|
| INTERCEPT | 6.175 | 0.055 | 112.50 | < .001 |
| AGE | 0.026 | 0.015 | 1.71 | .087 |
| AGE² | -0.057 | 0.005 | -11.87 | < .001 |
| FEMALE | 0.229 | 0.024 | 9.49 | < .001 |
| BLACK | -1.030 | 0.034 | -30.07 | < .001 |
| EDUCATION | 0.366 | 0.004 | 86.57 | < .001 |

Random Effects: Cohort
| Cohort | coefficient | se | t ratio | p value |
|---|---|---|---|---|
| 1894 | -0.210 | 0.142 | -1.48 | 0.140 |
| 1895 | -0.114 | 0.123 | -0.93 | 0.353 |
| 1900 | -0.051 | 0.104 | -0.49 | 0.625 |
| 1905 | -0.294 | 0.090 | -3.27 | 0.001 |
| 1910 | 0.021 | 0.081 | 0.26 | 0.797 |
| 1915 | 0.163 | 0.073 | 2.22 | 0.027 |
| 1920 | -0.079 | 0.068 | -1.15 | 0.249 |
| 1925 | 0.083 | 0.068 | 1.23 | 0.220 |
| 1930 | 0.001 | 0.067 | 0.01 | 0.990 |
| 1935 | 0.068 | 0.064 | 1.06 | 0.289 |
| 1940 | 0.240 | 0.061 | 3.91 | < .001 |
| 1945 | 0.447 | 0.060 | 7.50 | < .001 |
| 1950 | 0.184 | 0.059 | 3.10 | 0.002 |
| 1955 | -0.035 | 0.061 | -0.57 | 0.568 |
| 1960 | 0.002 | 0.065 | 0.04 | 0.970 |
| 1965 | -0.157 | 0.071 | -2.20 | 0.028 |
| 1970 | -0.135 | 0.080 | -1.70 | 0.090 |
| 1975 | -0.001 | 0.092 | -0.01 | 0.990 |
| 1980 | 0.062 | 0.112 | 0.55 | 0.583 |
| 1985 | -0.195 | 0.146 | -1.34 | 0.180 |

Random Effects: Period
| Period | coefficient | se | t ratio | p value |
|---|---|---|---|---|
| 1974 | 0.033 | 0.043 | 0.77 | 0.442 |
| 1976 | 0.060 | 0.043 | 1.41 | 0.158 |
| 1978 | -0.002 | 0.042 | -0.04 | 0.967 |
| 1982 | -0.014 | 0.040 | -0.36 | 0.718 |
| 1984 | 0.016 | 0.042 | 0.37 | 0.709 |
| 1987 | -0.061 | 0.040 | -1.52 | 0.129 |
| 1988 | -0.128 | 0.046 | -2.76 | 0.006 |
| 1989 | -0.061 | 0.046 | -1.34 | 0.182 |
| 1990 | 0.020 | 0.047 | 0.43 | 0.670 |
| 1991 | 0.042 | 0.046 | 0.92 | 0.358 |
| 1993 | -0.004 | 0.045 | -0.09 | 0.926 |
| 1994 | 0.019 | 0.039 | 0.49 | 0.623 |
| 1996 | -0.060 | 0.039 | -1.52 | 0.128 |
| 1998 | 0.044 | 0.043 | 1.02 | 0.306 |
| 2000 | 0.005 | 0.043 | 0.11 | 0.915 |
| 2004 | 0.038 | 0.043 | 0.88 | 0.381 |
| 2006 | 0.052 | 0.045 | 1.16 | 0.247 |

Variance Components
| | variance | se | z value | p value |
|---|---|---|---|---|
| Cohort | 0.034 | 0.013 | 2.56 | .010 |
| Period | 0.005 | 0.003 | 1.49 | .135 |
| Individual | 3.116 | 0.030 | 104.87 | < .001 |

Deviance: 87707.2; AIC: 87713.2
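One way to read the variance components in Table 2 is to express each as a share of the total variance (an intraclass correlation). The sketch below assumes the three components are additive on the WORDSUM scale, as Equation (3) implies, and simply divides each by their sum; it is a descriptive summary, not a test.

```python
def variance_shares(tau_u, tau_v, sigma2):
    """Share of total variance at the cohort, period, and individual levels."""
    total = tau_u + tau_v + sigma2
    return {"cohort": tau_u / total, "period": tau_v / total, "individual": sigma2 / total}

shares = variance_shares(0.034, 0.005, 3.116)   # variance components from Table 2
for level, share in shares.items():
    print(f"{level:>10}: {share:.4f}")
```

Cohorts account for roughly 1.1 percent and periods for roughly 0.2 percent of the total variance, so the cohort component is several times the period component, consistent with the conclusion that verbal ability is mainly a cohort story.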
[Figure 1. Estimated Cohort and Period Effects and 95 Percent Confidence Bounds for GSS Verbal Ability Model. Two panels, Cohort Effect (x-axis: Cohort) and Period Effect (x-axis: Period); y-axis: Verbal Test Score, 5.00 to 7.00.]
To further test whether the birth cohort and time period effects, as a whole, make statistically significant contributions to the explained variance in an outcome variable, a general linear hypothesis test may be applied.
Specifically, one can either:
1) examine the statistical significance of the variance components (an asymptotic z test for LMMs), or
2) use an F test of the null hypothesis that a random effect is absent (its variance is zero).
The sampling distribution of the F statistic is exact in LMMs when the random effects are independently distributed as normal random variables. This F statistic is therefore preferred over the z score when the number of random-effect units (cohorts or periods) is small. The statistical theory for such tests has been developed in a very general LMM context by E. Demidenko (Mixed Models: Theory and Applications. Wiley, 2004).
In the present case, for the CCREM-HAPC model of Equations (1)-(3), only two sets of random-effect coefficients are estimated, namely, the residual random effects of cohort j, u0j, and the residual random effects of period k, v0k. Each of these sets of random coefficients is assumed to be independently and normally distributed with mean 0 and variance τu or τv, respectively.
Thus, for a CCREM-HAPC model with random intercepts of the form of
Equations (1)-(3), the exact F-test amounts to testing null hypotheses for the
relevance either of the birth cohort random effects:
H0: τu = 0, vs. Ha: τu > 0
or the time period effects:
H0: τv = 0, vs. Ha: τv > 0.
Alternatively, one can test for the joint relevance of both the cohort and
period effects:
H0: τu = τv = 0, vs. Ha: τu > 0 or τv > 0
Table 3. F-tests for the Presence of Random Effects, GSS WORDSUM Data

| | Cohort Effects | Period Effects | Cohort and Period Effects |
|---|---|---|---|
| Hypothesis | τu = 0 vs. τu > 0 | τv = 0 vs. τv > 0 | τu = τv = 0 vs. τu > 0 or τv > 0 |
| S_OLS | 69,377 | 69,377 | 69,377 |
| S_min | 68,696 | 69,268 | 68,558 |
| r | 25 | 22 | 42 |
| m | 5 | 5 | 5 |
| NT | 22,042 | 22,042 | 22,042 |
| (S_OLS − S_min)/(r − m) | 34.05 | 6.41 | 22.14 |
| S_min/(NT − r) | 3.12 | 3.15 | 3.12 |
| F | 10.9 | 2.03 | 7.096 |
| f0.95(r − m, NT − r) | 1.571 | 1.623 | 1.411 |
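The F row of Table 3 can be reproduced from the rows above it via F = [(S_OLS − S_min)/(r − m)] / [S_min/(NT − r)]. The sketch below takes S_OLS and S_min as the residual sums of squares of the model without and with the cohort and/or period terms, and r and m as the corresponding parameter counts (our reading of the table layout); the critical values f0.95 are copied from the table rather than computed.

```python
def random_effects_f(s_ols, s_min, r, m, nt):
    """F statistic in the layout of Table 3.

    s_ols: residual sum of squares without the random-effect terms (S_OLS)
    s_min: residual sum of squares with them (S_min)
    r, m : parameter counts of the larger and smaller models; nt: total observations (NT)
    """
    numerator = (s_ols - s_min) / (r - m)
    denominator = s_min / (nt - r)
    return numerator / denominator

ROWS = {  # (S_OLS, S_min, r, m, NT, f0.95) as reported in Table 3
    "cohort": (69_377, 68_696, 25, 5, 22_042, 1.571),
    "period": (69_377, 69_268, 22, 5, 22_042, 1.623),
    "cohort and period": (69_377, 68_558, 42, 5, 22_042, 1.411),
}

for name, (s_ols, s_min, r, m, nt, f_crit) in ROWS.items():
    f = random_effects_f(s_ols, s_min, r, m, nt)
    print(f"{name:>17}: F = {f:.2f} (critical {f_crit}); reject H0: {f > f_crit}")
```

Recomputed from the rounded sums of squares, F comes out at about 10.91, 2.04, and 7.10; the small gaps from the published 2.03 and 7.096 presumably reflect rounding of S_OLS and S_min. In all three cases F exceeds the 5 percent critical value.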
BACK TO THE DEBATE ON TRENDS IN VERBAL ABILITY:
So, who is right, Alwin and Glenn or Wilson and Gove?
The results of the HAPC analyses show:
• significant random variance components that reside in all three levels of the APC data: individuals nested within cohorts and periods;
• quadratic age effects that are not explained away by controlling for the effects of key individual characteristics (education, sex, and race) and for period and cohort effects;
• significant contextual effects of cohorts and periods on verbal ability, but this is mainly a cohort story; and
• strong effects of cohort characteristics: cohorts with a larger proportion of daily newspaper readers have higher average verbal ability, while more hours of TV watching per day tend to lower average cohort verbal ability.
Bottom Line: Alwin and Glenn are more right than Wilson and Gove.
Extensions of HAPC Modeling:
– Fixed Effects vs. Mixed Effects Model
– A Full Bayesian HAPC Model
– Generalized Linear Mixed Models (GLMM)
Fixed Effects vs. Mixed Effects Model:
The HAPC-CCREM approach illustrated above uses a
mixed (fixed and random) effects model with a random
effects specification for the level-2 (time period and
cohort) contextual variables.
Alternative: a fixed effects specification for the level-2 variables, in which one uses dummy (indicator) variables to record the cohort and the time period of the survey.
The comparison seems especially pertinent when the number of replications of the survey is relatively small, say 3 to 5.
Fixed Effects vs. Mixed Effects Model:
The estimates of cohort and time period effects from a fixed effects model for the GSS data are quite similar in pattern to those from the random effects model (Yang and Land 2008).
The mixed effects model is preferred to the fixed effects model because:
• It avoids potential model specification error by not relying on the fixed effects model's assumption that the indicator (dummy) variables representing the fixed cohort and period effects fully account for all of the group effects;
• It allows group-level covariates to be incorporated into the model, so that cohort characteristics and period events can be modeled explicitly to test explanatory hypotheses;
• For unbalanced research designs (designs with unequal numbers of respondents in the cells), such as one typically has in repeated cross-section surveys, a random effects model for the level-2 variables generally is more statistically efficient.
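The efficiency point can be illustrated with partial pooling. A dummy-variable specification uses each group's raw mean residual regardless of how many respondents it has; a random effects specification replaces it with an empirical Bayes (BLUP) estimate that shrinks the raw value toward zero by the reliability λ = τ / (τ + σ²/n). The sketch below assumes the simplest case, a one-way random-intercept model with known variances; in a CCREM the cohort and period effects are estimated jointly, but the same logic applies. The τ and σ² values echo the cohort and individual variances in Table 2 and are used only for illustration.

```python
def shrinkage_factor(tau, sigma2, n):
    """Reliability lambda = tau / (tau + sigma2 / n) in a one-way random-intercept model."""
    return tau / (tau + sigma2 / n)

def eb_group_effect(mean_residual, tau, sigma2, n):
    """Empirical Bayes (BLUP) group effect: the raw mean residual shrunk toward 0."""
    return shrinkage_factor(tau, sigma2, n) * mean_residual

# Small cells are shrunk heavily; large cells keep almost all of their raw estimate.
for n in (25, 250, 2500):
    print(n, round(shrinkage_factor(0.034, 3.116, n), 3))
```

With these values the factor is about 0.21 for a cell of 25 respondents, 0.73 for 250, and 0.96 for 2,500, so sparsely populated cohorts borrow strength from the others instead of contributing noisy dummy coefficients.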
A Full Bayesian HAPC Model:
Limitations of HAPC Modeling Using REML-EB Estimation
• Small numbers of cohorts (J) and periods (K)
• Unbalanced data
• Inaccurate REML estimates of variance-covariance components
• Inaccurate EB estimates of fixed effects regression coefficients
A Remedy: Bayesian Model Estimation (Yang 2006)
• A full Bayesian approach, by definition, ensures that inferences about
every parameter fully account for the uncertainty associated with all
others.
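The sketch below shows what "accounting for the uncertainty of all other parameters" means in practice, using a Gibbs sampler for the simplest hierarchical case, y_ij = μ + u_j + e_ij with u_j ~ N(0, τ) and e_ij ~ N(0, σ²). Each step draws one block of parameters conditional on the current draws of all the others, so the posterior for μ and the group effects reflects the uncertainty in τ and σ² rather than plugging in point estimates. The flat prior on μ and the inverse-gamma(0.001, 0.001) priors on τ and σ² are assumptions for illustration; when J is small, the posterior for τ can be sensitive to this choice. A full Bayesian HAPC model would add a second set of crossed effects v_k, updated in the same way as u_j.

```python
import random
import statistics

def gibbs_random_intercept(groups, n_iter=1500, burn_in=500, a=0.001, b=0.001, seed=7):
    """Gibbs sampler for y_ij = mu + u_j + e_ij; returns (mu, tau, sigma2) draws after burn-in."""
    rng = random.Random(seed)
    n_groups = len(groups)
    n_obs = sum(len(g) for g in groups)
    mu = statistics.fmean(y for g in groups for y in g)
    tau, sigma2 = 1.0, 1.0
    u = [0.0] * n_groups
    draws = []
    for it in range(n_iter):
        # 1. Group effects: conjugate normal update given mu, tau, sigma2.
        for j, g in enumerate(groups):
            prec = len(g) / sigma2 + 1.0 / tau
            mean = (sum(y - mu for y in g) / sigma2) / prec
            u[j] = rng.gauss(mean, (1.0 / prec) ** 0.5)
        # 2. Grand mean under a flat prior.
        resid_mean = sum(y - u[j] for j, g in enumerate(groups) for y in g) / n_obs
        mu = rng.gauss(resid_mean, (sigma2 / n_obs) ** 0.5)
        # 3. Variances: inverse-gamma(shape, rate) drawn as 1 / Gamma(shape, scale=1/rate).
        ss_e = sum((y - mu - u[j]) ** 2 for j, g in enumerate(groups) for y in g)
        sigma2 = 1.0 / rng.gammavariate(a + n_obs / 2, 1.0 / (b + ss_e / 2))
        ss_u = sum(uj ** 2 for uj in u)
        tau = 1.0 / rng.gammavariate(a + n_groups / 2, 1.0 / (b + ss_u / 2))
        if it >= burn_in:
            draws.append((mu, tau, sigma2))
    return draws

# Hypothetical data: 10 groups of 30, mu = 5, tau = 0.5, sigma2 = 1.
rng = random.Random(1)
true_u = [rng.gauss(0.0, 0.5 ** 0.5) for _ in range(10)]
groups = [[5.0 + uj + rng.gauss(0.0, 1.0) for _ in range(30)] for uj in true_u]
draws = gibbs_random_intercept(groups)
mu_hat = statistics.fmean(d[0] for d in draws)
tau_hat = statistics.fmean(d[1] for d in draws)
sigma2_hat = statistics.fmean(d[2] for d in draws)
print(round(mu_hat, 2), round(tau_hat, 2), round(sigma2_hat, 2))
```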
Application 2: A HAPC-GLMM of American National Election Studies (ANES) Data on Voting Turnout in U.S. Presidential Elections, 1952-2004 (Yang, Frenk, and Land 2010)
The GLMM Family of Models:
• Normal outcome: Linear mixed models using the identity link
• Binomial outcome: Logistic mixed models using the logit link
• Ordinal or nominal outcome: Ordinal logistic mixed models
• Count outcome: Poisson mixed models using the log link
• Count outcome with overdispersion: Negative binomial mixed models
REML-EB Estimation: Use, e.g., SAS PROC GLIMMIX
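Each family in the list pairs an inverse link, which maps the linear predictor η (fixed effects plus cohort and period random effects) to the mean μ, with a variance function V(μ). The sketch below encodes the single-response families; the ordinal case needs several cutpoints and is omitted. The negative binomial uses the common NB2 form Var = μ + αμ² (an assumed parameterization), and the dispersion values φ = 1 and α = 0.5 are placeholders.

```python
import math

def inv_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))

# family -> (inverse link mu = g^{-1}(eta), variance function V(mu))
FAMILIES = {
    "normal": (lambda eta: eta, lambda mu, phi=1.0: phi),           # identity link, constant variance
    "binomial": (inv_logit, lambda mu: mu * (1.0 - mu)),           # logit link
    "poisson": (math.exp, lambda mu: mu),                          # log link, Var = mu
    "negative_binomial": (math.exp, lambda mu, alpha=0.5: mu + alpha * mu ** 2),  # NB2
}

def mean_and_variance(family, eta):
    """Mean and variance of the outcome implied by linear predictor eta."""
    inv_link, var_fn = FAMILIES[family]
    mu = inv_link(eta)
    return mu, var_fn(mu)

for family in FAMILIES:
    print(family, mean_and_variance(family, 0.0))
```

At η = 0 the Poisson and negative binomial share the same mean (1.0), but the negative binomial variance is larger, which is why it is preferred for overdispersed counts.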
Application 2: A HAPC-GLMM of Voting Turnout in U.S.
Presidential Elections
Table 4. Descriptive Statistics for ANES Voter Turnout Data, 1952 to 2004

| Variables | Description | Mean | SD | Min | Max |
|---|---|---|---|---|---|
| Dependent Variable | | | | | |
| VOTE | 1 = Voted in U.S. presidential elections; 0 = Did not vote in U.S. presidential elections | 0.76 | 0.43 | 0 | 1 |
| Level-1 Variables | | | | | |
| MALE | Respondent's sex: 1 = male; 0 = female | 0.45 | 0.5 | 0 | 1 |
| BLACK | Respondent's race: 1 = black; 0 = nonblack | 0.1 | 0.3 | 0 | 1 |
| AGE | Respondent's age at survey year | 45.62 | 16.53 | 18 | 95 |
| | Centered around grand mean | 0 | 16.53 | -27.46 | 49.54 |
| Religion | Respondent's religious preference | | | | |
| PROTESTANT | 1 = Protestant; 0 = otherwise | 0.66 | 0.47 | 0 | 1 |
| CATHOLIC | 1 = Catholic; 0 = otherwise | 0.24 | 0.42 | 0 | 1 |
| JEW | 1 = Jewish; 0 = otherwise | 0.02 | 0.15 | 0 | 1 |
| OTHER | 1 = Other/None; 0 = otherwise | 0.08 | 0.27 | 0 | 1 |
| MARRIED | Respondent's marital status: 1 = Currently married; 0 = otherwise | 0.67 | 0.47 | 0 | 1 |
| Occupational Class | | | | | |
| PROFESSIONAL | 1 = Professional; 0 = otherwise | 0.25 | 0.43 | 0 | 1 |
| CLERICAL | 1 = Clerical; 0 = otherwise | 0.18 | 0.38 | 0 | 1 |
| SKILLED | 1 = Skilled; 0 = otherwise | 0.31 | 0.46 | 0 | 1 |
| LABORER | 1 = Laborer; 0 = otherwise | 0.03 | 0.17 | 0 | 1 |
| FARMER | 1 = Farmer; 0 = otherwise | 0.04 | 0.19 | 0 | 1 |
| NOT WORKING | 1 = Not working; 0 = otherwise | 0.2 | 0.4 | 0 | 1 |
| Political affiliation | Respondent's political affiliation | | | | |
| DEMOCRATIC | 1 = Democratic; 0 = otherwise | 0.52 | 0.5 | 0 | 1 |
| INDEPENDENT | 1 = Independent; 0 = otherwise | 0.1 | 0.3 | 0 | 1 |
| REPUBLICAN | 1 = Republican; 0 = otherwise | 0.38 | 0.48 | 0 | 1 |
| POLITICAL SOUTH¹ | 1 = Political south; 0 = otherwise | 0.27 | 0.45 | 0 | 1 |

Level-2 Variables
| Variables | Description | N | Min | Max |
|---|---|---|---|---|
| PERIOD | Survey year | 14 | 1952 | 2004 |
| COHORT | Five-year birth cohort | 23 | 1859 | 1986 |

¹Includes the eleven secession states: Alabama, Arkansas, Florida, Georgia, Louisiana, Mississippi, North Carolina, South Carolina, Tennessee, Texas, Virginia.
Note: N = 19,766.
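To model the likelihood of voter turnout in U.S. Presidential elections, we apply the HAPC-CCREM approach and specify the following logistic (logit-link) model, where logit(p) = ln[p/(1 − p)]:
Level 1 or “Within-Cell” Model:
$$\operatorname{logit} \Pr(\text{VOTE}_{ijk} = 1) = \beta_{0jk} + \beta_1\,\text{AGE}_{ijk} + \beta_2\,\text{AGE}^2_{ijk} + \beta_3\,\text{MALE}_{ijk} + \beta_4\,\text{BLACK}_{ijk} + \beta_5\,\text{PROTESTANT}_{ijk} + \beta_6\,\text{CATHOLIC}_{ijk} + \beta_7\,\text{JEW}_{ijk} + \beta_8\,\text{PROFESSIONAL}_{ijk} + \beta_9\,\text{CLERICAL}_{ijk} + \beta_{10}\,\text{SKILLED}_{ijk} + \beta_{11}\,\text{FARMER}_{ijk} + \beta_{12}\,\text{NOWORK}_{ijk} + \beta_{13}\,\text{PSOUTH}_{ijk} + \beta_{14}\,\text{CMARRIED}_{ijk} + \beta_{15}\,\text{DEMOCRATIC}_{ijk} + \beta_{16}\,\text{REPUBLICAN}_{ijk}$$
Level 2 or “Between-Cell” Model:
$$\beta_{0jk} = \gamma_0 + u_{0j} + v_{0k}, \qquad u_{0j} \sim N(0, \tau_u), \quad v_{0k} \sim N(0, \tau_v)$$
Combined Model:
$$\operatorname{logit} \Pr(\text{VOTE}_{ijk} = 1) = \gamma_0 + \beta_1\,\text{AGE}_{ijk} + \beta_2\,\text{AGE}^2_{ijk} + \beta_3\,\text{MALE}_{ijk} + \beta_4\,\text{BLACK}_{ijk} + \beta_5\,\text{PROTESTANT}_{ijk} + \beta_6\,\text{CATHOLIC}_{ijk} + \beta_7\,\text{JEW}_{ijk} + \beta_8\,\text{PROFESSIONAL}_{ijk} + \beta_9\,\text{CLERICAL}_{ijk} + \beta_{10}\,\text{SKILLED}_{ijk} + \beta_{11}\,\text{FARMER}_{ijk} + \beta_{12}\,\text{NOWORK}_{ijk} + \beta_{13}\,\text{PSOUTH}_{ijk} + \beta_{14}\,\text{CMARRIED}_{ijk} + \beta_{15}\,\text{DEMOCRATIC}_{ijk} + \beta_{16}\,\text{REPUBLICAN}_{ijk} + u_{0j} + v_{0k} \tag{12}$$
Because the outcome is binary, there is no separate additive error term on the logit scale; the within-cell variation is binomial.
for i = 1, 2, …, n_jk individuals within cohort j and period k; j = 1, …, 23 birth cohorts; and k = 1, …, 14 time periods (presidential elections).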
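Because the model is specified on the logit scale, its coefficients become predicted probabilities only through the inverse logit. The sketch below evaluates the fixed part of the combined model for a reference respondent (all indicator variables in Table 5 equal to 0, age at the grand mean) and then adds the estimated period effects for 1960 and 1988. It assumes that AGE² is the square of grand-mean-centered age and uses the rounded Table 5 estimates; `predicted_vote_prob` and its arguments are illustrative names.

```python
import math

def inv_logit(eta):
    """Inverse of logit(p) = ln(p / (1 - p))."""
    return 1.0 / (1.0 + math.exp(-eta))

GAMMA0 = -0.49                  # INTERCEPT, Table 5
B_AGE, B_AGE2 = 0.025, -0.001   # AGE and AGE², Table 5 (rounded)

def predicted_vote_prob(age_c=0.0, x_beta=0.0, u=0.0, v=0.0):
    """Predicted Pr(VOTE = 1) from the combined model.

    age_c : age centered at the grand mean (assumes AGE² is the square of centered age)
    x_beta: sum of the remaining beta * x terms (0 for the reference respondent)
    u, v  : cohort and period random effects u_0j and v_0k
    """
    eta = GAMMA0 + B_AGE * age_c + B_AGE2 * age_c ** 2 + x_beta + u + v
    return inv_logit(eta)

reference = predicted_vote_prob()
in_1960 = predicted_vote_prob(v=0.274)    # 1960 period effect, Table 5
in_1988 = predicted_vote_prob(v=-0.259)   # 1988 period effect, Table 5
print(round(reference, 3), round(in_1960, 3), round(in_1988, 3))
```

For this reference respondent the predicted turnout is about 0.38, rising to about 0.45 in 1960 and falling to about 0.32 in 1988, which shows how period effects of a few tenths on the logit scale translate into sizable shifts in probability.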
Table 5. HAPC Models of the ANES Voter Turnout Data, 1952-2004

Fixed Effects
| | coefficient | se | t ratio | p value |
|---|---|---|---|---|
| INTERCEPT | -0.49 | 0.134 | -3.66 | 0.003 |
| AGE | 0.025 | 0.001 | 19.22 | < .001 |
| AGE² | -0.001 | 0.0001 | -13.85 | < .001 |
| MALE | 0.21 | 0.044 | 4.73 | < .001 |
| BLACK | -0.02 | 0.058 | -0.34 | 0.734 |
| PROTESTANT | 0.335 | 0.065 | 5.2 | < .001 |
| CATHOLIC | 0.591 | 0.071 | 8.3 | < .001 |
| JEW | 1.211 | 0.184 | 6.6 | < .001 |
| PROFESSIONAL | 1.414 | 0.105 | 13.45 | < .001 |
| CLERICAL | 1.037 | 0.107 | 9.73 | < .001 |
| SKILLED | 0.285 | 0.098 | 2.9 | 0.004 |
| FARMER | 0.301 | 0.128 | 2.35 | 0.019 |
| NOT WORKING | 0.386 | 0.107 | 3.6 | < .001 |
| POLITICAL SOUTH | -0.607 | 0.04 | -15.23 | < .001 |
| CURRENTLY MARRIED | 0.446 | 0.041 | 10.78 | < .001 |
| DEMOCRATIC | 0.736 | 0.055 | 13.33 | < .001 |
| REPUBLICAN | 0.963 | 0.059 | 16.4 | < .001 |

Random Effects: Cohort
| Cohort | coefficient | se | t ratio | p value |
|---|---|---|---|---|
| 1859-1875 | 0.011 | 0.066 | 0.17 | 0.867 |
| 1876-1880 | -0.01 | 0.064 | -0.16 | 0.871 |
| 1881-1885 | -0.022 | 0.063 | -0.35 | 0.723 |
| 1886-1890 | 0.007 | 0.061 | 0.12 | 0.906 |
| 1891-1895 | -0.026 | 0.059 | -0.45 | 0.655 |
| 1896-1900 | 0.01 | 0.058 | 0.17 | 0.864 |
| 1901-1905 | -0.028 | 0.055 | -0.51 | 0.613 |
| 1906-1910 | -0.047 | 0.053 | -0.89 | 0.376 |
| 1911-1915 | 0.061 | 0.051 | 1.19 | 0.236 |
| 1916-1920 | 0.016 | 0.049 | 0.32 | 0.747 |
| 1921-1925 | 0.05 | 0.048 | 1.03 | 0.303 |
| 1926-1930 | 0.054 | 0.048 | 1.12 | 0.262 |
| 1931-1935 | -0.003 | 0.049 | -0.06 | 0.953 |
| 1936-1940 | -0.033 | 0.05 | -0.66 | 0.511 |
| 1941-1945 | 0.024 | 0.049 | 0.49 | 0.625 |
| 1946-1950 | 0.035 | 0.048 | 0.73 | 0.468 |
| 1951-1955 | 0.027 | 0.049 | 0.55 | 0.581 |
| 1956-1960 | -0.106 | 0.05 | -2.11 | 0.035 |
| 1961-1965 | -0.032 | 0.053 | -0.61 | 0.54 |
| 1966-1970 | -0.014 | 0.057 | -0.24 | 0.811 |
| 1971-1975 | -0.003 | 0.06 | -0.05 | 0.958 |
| 1976-1980 | 0.016 | 0.063 | 0.26 | 0.795 |
| 1981-1986 | 0.013 | 0.065 | 0.2 | 0.838 |

Random Effects: Period
| Period | coefficient | se | t ratio | p value |
|---|---|---|---|---|
| 1952 | -0.029 | 0.076 | -0.39 | 0.7 |
| 1956 | -0.102 | 0.068 | -1.50 | 0.134 |
| 1960 | 0.274 | 0.081 | 3.4 | 0.001 |
| 1964 | 0.066 | 0.072 | 0.91 | 0.361 |
| 1968 | 0.02 | 0.073 | 0.28 | 0.783 |
| 1972 | -0.066 | 0.063 | -1.04 | 0.298 |
| 1976 | -0.049 | 0.067 | -0.74 | 0.459 |
| 1980 | -0.055 | 0.073 | -0.75 | 0.453 |
| 1984 | 0.004 | 0.067 | 0.05 | 0.956 |
| 1988 | -0.259 | 0.067 | -3.85 | < .001 |
| 1992 | 0.108 | 0.066 | 1.63 | 0.104 |
| 1996 | -0.011 | 0.073 | -0.16 | 0.876 |
| 2000 | -0.074 | 0.075 | -0.99 | 0.323 |
| 2004 | 0.174 | 0.087 | 2.01 | 0.045 |

Variance Components
| | variance | se | z value | p value |
|---|---|---|---|---|
| Cohort | 0.004 | 0.003 | 1.33 | 0.16 |
| Period | 0.021 | 0.01 | 2.1 | 0.04 |

Deviance: 94358.54; AIC: 94305.54
Source: 1952-2004 American National Election Study (N = 19,766)
#p<.1; *p<.05; **p<.01; ***p<.001
[Figure 2. Estimated Cohort and Period Effects and 95 Percent Confidence Bounds for NES Voter Turnout Model. Two panels, Cohort Effect (x-axis: Cohort) and Period Effect (x-axis: Period); y-axis: Predicted Probability of Voting, 0.5 to 0.8.]
Table 6. F-Tests for the Presence of Random Effects, ANES Data

| | Cohort Effects | Period Effects | Cohort and Period Effects |
|---|---|---|---|
| Hypothesis | τu = 0 vs. τu > 0 | τv = 0 vs. τv > 0 | τu = τv = 0 vs. τu > 0 or τv > 0 |
| l0 | 9,695 | 9,695 | 9,695 |
| lmax | 9,679 | 9,662 | 9,646 |
| r | 39 | 30 | 53 |
| m | 16 | 16 | 16 |
| NT | 19,766 | 19,766 | 19,766 |
| (l0 − lmax)/(r − m) | 0.6957 | 2.357 | 1.32 |
| l0/(NT − r) | 0.4915 | 0.4912 | 0.4918 |
| F | 1.42 | 4.8 | 2.69 |
| f0.95(r − m, NT − r) | 1.53 | 1.69 | 1.7 |
As in the case of trends in GSS verbal ability, this analysis of Presidential voting turnout finds:
• significant random variance components that reside in all three levels of the APC data: individuals nested within cohorts and periods;
• quadratic age effects that are not explained away by controlling for the effects of individual characteristics and for period and cohort effects;
• significant contextual effects of cohorts and periods on voting in Presidential elections;
• but Presidential voting turnout is mainly a period story.
Application 3: A HAPC-GLMM Analysis of GSS Data on Happiness, 1972-2004 (Yang 2008)
Research Questions:
• Who is happier? Social stratification of subjective well-being
• Do people get happier with age and over time?
• How do social inequalities in happiness vary over the life course and by time?
• Born to be happy? Are there any birth cohort differences in happiness?
Level 1 (Individual-Level) Model:
where y_ijk denotes the ordinal happiness response in the GSS data (very happy, pretty happy, not too happy), modeled with an ordinal logit HAPC-CCREM specification, and X_p denotes a vector of other individual-level variables, such as age-by-sex, age-by-race, and age-by-education interaction variables.
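An ordinal logit specification links a single linear predictor η (individual covariates plus the cohort and period random effects) to all three happiness categories through ordered cutpoints. The sketch below assumes the cumulative (proportional-odds) form Pr(y ≤ c) = inv_logit(θc − η), with categories ordered from "not too happy" to "very happy"; sign conventions differ across software, and the cutpoints shown are hypothetical.

```python
import math

def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))

def ordinal_probs(eta, cutpoints):
    """Category probabilities under a cumulative logit model.

    Pr(y <= c) = inv_logit(theta_c - eta) for increasing cutpoints theta_1 < ... < theta_{C-1}.
    """
    cumulative = [inv_logit(theta - eta) for theta in cutpoints] + [1.0]
    probs, previous = [], 0.0
    for cum in cumulative:
        probs.append(cum - previous)
        previous = cum
    return probs

labels = ["not too happy", "pretty happy", "very happy"]
cutpoints = [-2.0, 1.0]   # hypothetical theta_1, theta_2
for eta in (0.0, 1.0):
    print(eta, dict(zip(labels, (round(p, 3) for p in ordinal_probs(eta, cutpoints)))))
```

Raising η (for example, through a positive cohort effect u0j) shifts probability mass toward "very happy" while the three probabilities still sum to one.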
Level 2 Model:
Some Findings:
81
Some Findings:
82
Some Findings:
83
Part III: Second Research Design: APC Analysis of
Repeated Cross-Section Surveys
As in the case of trends in GSS verbal ability and NES Presidential election voting probabilities, this analysis of the GSS happiness data finds:
• significant random variance components that reside in all three levels of the APC data: individuals nested within cohorts and periods;
• quadratic age effects that are not explained away by controlling for the effects of individual characteristics and for period and cohort effects;
• significant contextual effects of both cohorts and periods on happiness, i.e., interesting stories for both cohorts and periods.
Application 4: An Integration of the Hierarchical Age-Period-Cohort Model with Heteroscedastic Regression to Develop the HAPC-HR Model, Applied to Study Variations in Self-Reported Health Disparities in the U.S., 1984-2007 (Zheng, Yang, and Land 2011)
There are three standard approaches to the study of changes in health disparities:
(1) across the life course (e.g., House et al. 1994; Dannefer 2003),
(2) across cohorts (e.g., Lynch 2003; Warren and Hernandez 2007), and
(3) across time periods (e.g., Pappas et al. 1993; Goesling 2007).
All of these approaches have one thing in common: they focus on changes in health disparities as estimated by conditional expectation functions (regressions) fitted to measured demographic and socioeconomic covariates.
This facilitates the estimation of between-group disparities, i.e., variations in health across groups (between-cell variation) and temporal variations therein, but it ignores possible within-group disparities, i.e., variations in health inside groups (within-cell variation), and variations therein over time.
To examine age-period-cohort variations in both health and health disparities, we intersect the HAPC model with a Heteroscedastic Regression (HR) model.
This allows us both to:
(1) disentangle age, period, and cohort effects, and
(2) separate within-group health disparities from between-group health disparities.
The result is a Hierarchical Age-Period-Cohort Heteroscedastic Regression (HAPC-HR) model.
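The distinction between the two kinds of disparity can be seen in the simplest heteroscedastic regression: one categorical covariate entering both a mean equation, E[y | g] = μg, and a log-variance equation, log Var(y | g) = λg. With one free parameter per group in each equation, the maximum likelihood estimates are just the group means and the divide-by-n group variances. The sketch below (hypothetical self-rated health scores; not the HAPC-HR estimator itself, which adds age, period, and cohort terms to both equations) shows two groups with identical means, so a mean regression finds no disparity, but a fourfold difference in within-group variance.

```python
import math
from statistics import fmean, pvariance

def saturated_hr(groups):
    """ML fit of a saturated heteroscedastic regression with one categorical covariate."""
    return {g: {"mean": fmean(ys), "log_var": math.log(pvariance(ys))}
            for g, ys in groups.items()}

# Hypothetical self-rated health scores (1-5 scale) for two groups.
groups = {"A": [3.0, 4.0, 5.0], "B": [2.0, 4.0, 6.0]}
fit = saturated_hr(groups)
between = fit["B"]["mean"] - fit["A"]["mean"]          # between-group disparity in means
within = fit["B"]["log_var"] - fit["A"]["log_var"]     # log ratio of within-group variances
print(between, round(math.exp(within), 6))
```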
Application to National Health Interview Survey (NHIS) data on self-reported health, 1984-2007. The cells in the Level-1 regression model are defined by individual-level demographic and socioeconomic variables that are established covariates of health:
sex (1 = male, 0 = female),
race (1 = white, 0 = non-white),
marital status (1 = married, 0 = unmarried),
work status (1 = full- or part-time job, 0 = not employed),
education (years of formal education), and
income (in 2007 dollars).
Here are some results.
[Figure 1. Observed Means of Self-Rated Health, NHIS, 1984 to 2007. Lines for the whole sample, men, and women; y-axis: Mean of Self-Rated Health (3.3 to 4); x-axis: Year.]
* The trends are adjusted for sample weights and smoothed by a three-point moving average.
[Figure 2. Observed Variances in Self-Rated Health, NHIS, 1984 to 2007. Lines for the whole sample, men, and women; y-axis: Variance in Self-Rated Health (1 to 1.5); x-axis: Year.]
* The trends are adjusted for sample weights and smoothed by a three-point moving average.
[Figure 5. Variations in Conditional Expected Values of Gender-Specific Self-Rated Health across Age, Cohort and Period, with 95% Confidence Intervals. Three panels, Age (18 to 84), Cohort (1899 to 1985), and Period (1984 to 2007), each plotting men and women; y-axis: Conditional Expected Value of Self-Rated Health.]
[Figure 6. Variations in Predicted Dispersion of Gender-Specific Self-Rated Health across Age, Cohort and Period, with 95% Confidence Intervals. Three panels, Age (18 to 84), Cohort (1899 to 1985), and Period (1984 to 2007), each plotting men and women; y-axis: Predicted Dispersion of Self-Rated Health.]
Part IV: Third Research Design: Cohort Analysis of
Accelerated Longitudinal Panels
References for Part IV:
Miyazaki, Yasuo and Stephen W. Raudenbush. 2000. "Tests for Linkage of Multiple Cohorts in an Accelerated Longitudinal Design." Psychological Methods 5:44-63.
Yang, Yang. 2007. “Is Old Age Depressing? Growth
Trajectories and Cohort Variations in Late Life
Depression.” Journal of Health and Social Behavior
48:16-32.
Accelerated Longitudinal Panel Design
Definition: A longitudinal panel study of an initial
sample of individuals from a broad array of ages (and
thus birth cohorts) interviewed or monitored with three
or more follow-up waves.
The design allows a more rapid accumulation of
information on age and cohort effects than a single
cohort follow-up study.
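The speed-up comes from following several baseline-age groups at once. The sketch below compares the ages observed when a single group aged 65 is followed with those observed when groups aged 65, 70, …, 95 at baseline are followed over the same waves. The wave offsets (0, 3, 6, and 10 years) and the baseline ages are hypothetical choices for illustration.

```python
def ages_observed(baseline_age, wave_offsets):
    """Ages at which one baseline-age group is observed."""
    return [baseline_age + t for t in wave_offsets]

def design_coverage(baseline_ages, wave_offsets):
    """Per-group ages observed, plus the youngest and oldest age covered by the design."""
    per_cohort = {a: ages_observed(a, wave_offsets) for a in baseline_ages}
    all_ages = [age for ages in per_cohort.values() for age in ages]
    return per_cohort, (min(all_ages), max(all_ages))

waves = [0, 3, 6, 10]   # hypothetical follow-up times in years
_, single_span = design_coverage([65], waves)                      # (65, 75)
per_cohort, accelerated_span = design_coverage(range(65, 96, 5), waves)  # (65, 105)
print(single_span, accelerated_span)
```

In ten years of fieldwork the single-cohort study spans ages 65 to 75, while the accelerated design spans 65 to 105. The age ranges of adjacent groups overlap (65 to 75 and 70 to 80, for example), which is what allows their age curves to be compared and linked.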
Data Structure: Accelerated Longitudinal Panel Design
[Figure: data structure of an accelerated longitudinal panel design; axes: Age (Time) and Cohort.]
For this research design, the HAPC model becomes a growth curve model of individual change with cohort interactions, which can:
• assess intra-individual age changes and birth cohort differences simultaneously;
• assess differential cohort patterns in age changes: age-by-cohort interaction effects.
Period effects?
• The time period for an accelerated longitudinal panel study often is short (e.g., a decade or so), so the effects of period usually can be ignored;
• In growth curve models, age and time are the same variable, so the effects of period need not be estimated, and the analysis can focus on the age-by-cohort interactions;
• If period effects are of concern, estimate the HAPC-CCREM.
Application: Cohort Variations in Age Trajectories of
Depression in the Elderly (Yang 2007)
Research Questions
• Does the age growth trajectory show an increase
in depressive symptoms in late life?
• Is there cohort heterogeneity in levels of
depressive symptoms and age growth trajectories
of depressive symptoms?
• What social risk factors are associated with these
effects?
Data
• Established Populations for Epidemiologic Studies of the Elderly (EPESE) in North Carolina: a four-wave panel study of older adults aged 65+ from 1986 to 1996
Model Specification
Level-1 Repeated Observation Model
$$Y_{ti} = \beta_{0i} + \beta_{1i}\,\text{Age}_{ti} + \sum_{p} \beta_{pi} X_{pti} + e_{ti}, \qquad e_{ti} \overset{iid}{\sim} N(0, \sigma^2) \tag{11}$$
where
Y_ti = CES-D for person i at time t, for i = 1, …, n and t = 1, …, T_i;
X_pti = (marital status, economic status, health status, stress and coping resources);
β_0i = expected CES-D for person i;
β_1i = expected growth rate per year of age in CES-D for person i;
β_pi = regression coefficient associated with X_pti.
Level-2 Individual Model
$$\beta_{0i} = \gamma_{00} + \gamma_{01}\,\text{Cohort}_i + \sum_{q} \gamma_{0q} Z_{qi} + r_{0i}$$
$$\beta_{1i} = \gamma_{10} + \gamma_{11}\,\text{Cohort}_i + r_{1i} \tag{12}$$
$$\begin{pmatrix} r_{0i} \\ r_{1i} \end{pmatrix} \overset{iid}{\sim} N\!\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \tau_0 & \tau_{01} \\ \tau_{10} & \tau_1 \end{pmatrix} \right)$$
where
Z_qi = (Female, Black, Education);
γ_00 = expected CES-D for the reference group (at the median age in Cohort 1 at T_1);
γ_01 = main cohort effect coefficient: mean difference in CES-D between cohorts;
γ_0q = regression coefficient associated with Z_qi;
γ_10 = age effect coefficient: expected rate of change in CES-D;
γ_11 = age-by-cohort coefficient: mean difference in the rate of change between cohorts.
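Substituting the Level-2 equations into the Level-1 equation shows that the cohort term enters twice: once shifting the intercept (γ01) and once, through the cross-level interaction γ11, changing each cohort's growth rate. The sketch below evaluates the fixed part of that combined model with the X and Z covariates set to zero and the person-level residuals r0i and r1i set to zero. It uses the Model 1 estimates reported for this application but assumes a hypothetical coding (Cohort = 0 for the reference cohort, increasing by 1 per cohort; age measured in years from the reference age), since the original coding is not shown here.

```python
def expected_cesd(age_c, cohort, g00, g01, g10, g11):
    """Fixed part of the combined growth model (X, Z covariates and r_0i, r_1i set to 0).

    age_c : years from the reference age (hypothetical coding)
    cohort: 0 for the reference cohort, 1, 2, ... for later ones (hypothetical coding)
    """
    intercept = g00 + g01 * cohort   # beta_0i
    slope = g10 + g11 * cohort       # beta_1i: cohort-specific growth rate per year of age
    return intercept + slope * age_c

M1 = dict(g00=2.856, g01=0.244, g10=0.048, g11=-0.019)   # Model 1 estimates
for c in range(5):
    slope = M1["g10"] + M1["g11"] * c
    print(c, round(slope, 3), round(expected_cesd(10, c, **M1), 3))
```

Under this coding, a negative γ11 means that the age slope flattens, and eventually turns negative, for successively coded cohorts.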
Model Estimates

| Fixed Effect | | Model 1 (Total) | Model 7 (Net) |
|---|---|---|---|
| Intercept, β0i | γ00 | 2.856*** | 2.525*** |
| Growth Rate: Age, β1i | γ10 | 0.048*** | -0.018 |
| Cohort | γ01 | 0.244*** | -0.213** |
| Age × Cohort | γ11 | -0.019# | -0.040*** |

| Random Effect (Variance Component) | | Model 1 | Model 7 | % Reduction |
|---|---|---|---|---|
| Level-1: Within person | σ² | 36.987*** | 35.109*** | 5% |
| Level-2: In intercept | τ0 | 6.170*** | 3.763*** | 39% |
| Level-2: In growth rate | τ1 | 0.057*** | 0.051*** | 11% |

| Goodness-of-fit | Model 1 | Model 7 |
|---|---|---|
| AIC (smaller is better) | 51190.5 | 48167.4 |
| BIC (smaller is better) | 51215.6 | 48192.5 |

Note: # p < .10; * p < .05; ** p < .01; *** p < .001.
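The "% Reduction" column is the proportional drop in each variance component when moving from Model 1 to Model 7, a common way of summarizing how much person-level and between-person variation the added covariates account for. The short sketch below recomputes it from the reported components.

```python
def pct_reduction(baseline, adjusted):
    """Proportional reduction (in percent) of a variance component."""
    return 100.0 * (baseline - adjusted) / baseline

components = {  # (Model 1, Model 7) variance components
    "sigma2 (within person)": (36.987, 35.109),
    "tau0 (intercept)": (6.170, 3.763),
    "tau1 (growth rate)": (0.057, 0.051),
}
for name, (m1, m7) in components.items():
    print(f"{name}: {pct_reduction(m1, m7):.0f}%")
```

The recomputed values round to the reported 5%, 39%, and 11%: most of the explained variation is between persons in the level of depressive symptoms, not within persons over time.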
[Figure: Expected Growth Trajectories and Cohort Variations in Depression. Panels: a. Model 1 - Gross Age and Cohort Effects; b. Model 7 - Net Age and Cohort Effects. Each panel plots CES-D (1 to 5) against Age (65 to 95) for all respondents and for cohorts 1 to 5.]
Summary of Findings:
The gross age trajectory of depressive symptoms
during late life is positive and linear;
There is substantial cohort heterogeneity in both
average levels of depressive symptoms and age
growth trajectories of depressive symptoms;
The age growth trajectories of depressive
symptoms are not significant after adjusting for
cohort effects and risk factors associated with
historical trends in education, life course stages,
survival, health decline, stress and coping resources;
Net of all the factors considered, more recent birth
cohorts have higher levels of depression.
Conclusion
A webpage has been developed that contains copies of our papers referenced in this presentation, as well as others:
http://www.unc.edu/~yangy819/apc/index.html
Happy Hunting for Age, Period, and Cohort Effects!