Growth Models - Biostatistics and Risk Assessment Center (BRAC)

Growth Models
Raul Cruz-Cano
HLTH 654 Spring 2013
University of Maryland
What is a Growth Model?
• A way to assess individual stability and change, both growth and decay, over time.
• A two-level, hierarchical model that models (1) within-individual change over time and (2) between-individual differences in patterns of growth.
Also known as:
• Growth Models
• Trajectory Models
• Growth Curve Models
• Latent Growth Models
Why Latent?
• Because we assume that the process underlying the thing we are modeling (or the behavior we observe) is itself unobserved, or latent.
• The characteristics we observe are a manifestation of this latent trajectory.
Why use Growth Models?
• You have longitudinal data and are interested in
change over time.
– You may want to explain those changes.
– You may also believe that not everyone follows the same
path.
Hierarchical Models
• Traditional:
– Level 1: Students
– Level 2: Schools
• Growth Models (a type of HM):
– Level 1: Repeated Observations
– Level 2: Individuals
Unconditional Model
• Level 1: Within Individual
$y_{it} = \alpha_i + \beta_i t + \varepsilon_{it}$
• Level 2: Between Individual
$\alpha_i = \alpha_0 + u_i$
$\beta_i = \beta_0 + v_i$
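A minimal Python sketch of the unconditional model, under stated assumptions: the parameter values (average intercept 3.5, average slope -0.1, and the three standard deviations) are hypothetical and chosen only for illustration. It simulates trajectories from the two levels and then recovers the average intercept and slope by fitting a least-squares line to each individual, which is a simplification of the mixed-model estimation a real analysis would use.

```python
import random


def simulate_growth(n_individuals, times, alpha0, beta0, sd_u, sd_v, sd_e, seed=0):
    """Simulate y_it = alpha_i + beta_i * t + e_it for each individual."""
    rng = random.Random(seed)
    data = {}
    for i in range(n_individuals):
        alpha_i = alpha0 + rng.gauss(0.0, sd_u)  # Level 2: individual intercept
        beta_i = beta0 + rng.gauss(0.0, sd_v)    # Level 2: individual slope
        # Level 1: repeated observations within the individual
        data[i] = [(t, alpha_i + beta_i * t + rng.gauss(0.0, sd_e)) for t in times]
    return data


def ols_line(points):
    """Return (intercept, slope) of the least-squares line through (t, y) points."""
    n = len(points)
    t_bar = sum(t for t, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in points)
    sxx = sum((t - t_bar) ** 2 for t, _ in points)
    slope = sxy / sxx
    return y_bar - slope * t_bar, slope


# Hypothetical values: three waves at years 1, 3, and 5.
data = simulate_growth(2000, times=(1, 3, 5), alpha0=3.5, beta0=-0.1,
                       sd_u=0.5, sd_v=0.2, sd_e=0.3)
fits = [ols_line(points) for points in data.values()]
mean_alpha = sum(a for a, _ in fits) / len(fits)
mean_beta = sum(b for _, b in fits) / len(fits)
print(f"average intercept ~ {mean_alpha:.3f}, average slope ~ {mean_beta:.3f}")
```

The spread of the individual intercepts and slopes around these averages is what the Level 2 terms u_i and v_i describe.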
A Latent Trajectory
[Figure: a latent depression trajectory. Depressive symptoms are plotted against time; α marks the intercept and β the slope of the trajectory.]
Time-Invariant Covariates
• Level 1: Within Individual
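$y_{it} = \alpha_i + \beta_i t + \varepsilon_{it}$
• Level 2: Between Individual
$\alpha_i = \alpha_0 + \alpha_1 x_{i1} + \alpha_2 x_{i2} + \dots + \alpha_k x_{ik} + u_i$
$\beta_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_k x_{ik} + v_i$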
Time-Varying Variables
• Level 1: Within Individual
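$y_{it} = \alpha_i + \beta_i t + \gamma_t w_{it} + \varepsilon_{it}$
(The term $\gamma_t w_{it}$ is the time-varying effect.)
• Level 2: Between Individual
$\alpha_i = \alpha_0 + \alpha_1 x_{i1} + \alpha_2 x_{i2} + \dots + \alpha_k x_{ik} + u_i$
$\beta_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_k x_{ik} + v_i$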
Example
• “Stability and Change in Family Structure and
Maternal Health Trajectories.” Meadows,
McLanahan, & Brooks-Gunn. American Sociological
Review. Forthcoming.
• We wanted to know whether changes in family
structure, including transitions into and out of
coresidential relationships, had impacts on health.
Example: Self-Rated Health
• Mothers in the Fragile Families and Child Wellbeing Study (FFCWS)
• “In general, how is your health?”
– Excellent (5)
– Very Good (4)
– Good (3)
– Fair (2)
– Poor (1)
• Repeated measures one, three, and five
years after birth.
Models
• Unconditional
– Model Fit
• Conditional
– Time-Invariant Covariates
– Time-Varying Covariates
Example (cont.)
• Trajectories of maternal self-rated health and
mental health problems from one year after
birth to five years after birth.
• Two types of measures of family structure
change:
– Level 1: Time-Varying
– Level 2: Time-Invariant
Time-Invariant Covariates
• Age at Baseline
• Education
• Race
• Biological Parents’ Mental Health Problem
• Lived with both Bio Parents at Age 15
• Number of Previous Relationships
• Baseline SRH
• Considered an Abortion
• Positive Marriage Attitude
• Prenatal Variables (medical care, drug and alcohol use, smoking)
• Baseline Marital Status
[Figure: Mothers’ self-rated health trajectories for each baseline marital status.]
Time-Varying Covariate
• Mothers’ Household Income
• Fathers’ Mental Health
• Fathers’ Earnings
[Figure: Mothers’ household income trajectories.]
[Figure: Fathers’ mental health trajectories.]
[Figure: Fathers’ earnings trajectories.]
Example
• Results:
– Transitions, especially exits from marriages,
resulted in declines in mental health problems.
– No growing gap in well-being between mothers who remained stably married and those who remained stably single, as well as mothers who made transitions.
Other topics worth visiting…
PROC TRAJ
• PROC TRAJ is a specialized model that estimates multiple groups within the population, in contrast to a traditional regression or growth curve model that models only one mean within the population (similar to what we do “by hand” when we divide a variable by the groups in a categorical variable).
• It is not part of the base SAS program and must be downloaded separately.
• Addresses research questions focused on describing the trajectory, or pattern, of change over time in the dependent variable, specifically questions concerned with multiple distinct patterns of change over time.
• Estimates a regression model for each discrete group within the population.
• The focus of the Proc Traj procedure is identifying distinct subgroups within the population.
• Does not provide any individual-level information on the pattern of change over time; subjects are grouped and it is assumed that every subject in the group follows the same trajectory.
• There is no random effect capability.
• In order to use Proc Traj you must organize your data in a multivariate, or “wide” format, where there is only one row of data for each subject and multiple observations are included in one line of data.
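The “wide” layout can be illustrated with a short Python sketch. The data and the column names (ID, T1..T3, V1..V3) are hypothetical, chosen to mirror the VAR and INDEP naming used in the PROC TRAJ statements; this only shows the reshaping idea and is not part of the procedure itself.

```python
from collections import defaultdict


def long_to_wide(rows, n_times):
    """Turn (id, occasion, time, outcome) rows into one record per subject.

    Occasions are numbered 1..n_times; missing occasions become None.
    """
    by_subject = defaultdict(dict)
    for subject_id, occasion, time_value, outcome in rows:
        by_subject[subject_id][occasion] = (time_value, outcome)

    wide = []
    for subject_id in sorted(by_subject):
        record = {"ID": subject_id}
        for k in range(1, n_times + 1):
            time_value, outcome = by_subject[subject_id].get(k, (None, None))
            record[f"T{k}"] = time_value  # when the k-th measurement was taken
            record[f"V{k}"] = outcome     # the k-th repeated measurement
        wide.append(record)
    return wide


# Hypothetical "long" data: one row per subject per occasion.
long_rows = [
    (1, 1, 8, 0), (1, 2, 9, 1), (1, 3, 10, 0),
    (2, 1, 8, 0), (2, 2, 9, 0),  # subject 2 has no third observation
]
wide_rows = long_to_wide(long_rows, n_times=3)
for record in wide_rows:
    print(record)
```

Each printed record corresponds to one line of the wide data set: one row per subject, with all repeated observations side by side.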
PROC TRAJ
• The posterior group probabilities are
calculated for each individual based on the
estimated parameters, and the individual is
assigned to a group based on their highest
posterior group probability
• You have to use an iterative process to decide
the best model based on the fit parameters
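The posterior group probability calculation can be sketched in Python with Bayes' rule. This is an illustration under assumptions, not PROC TRAJ's implementation: the two groups, their polynomial coefficients on the logit scale, and the group proportions below are hypothetical, and a binary (LOGIT-style) outcome is assumed.

```python
import math


def inv_logit(eta):
    """Map a value on the logit scale to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def posterior_group_probs(y, t, group_coefs, group_props):
    """Posterior probability of each group given one subject's 0/1 outcomes.

    group_coefs[j] holds polynomial coefficients (b0, b1, ...) for group j;
    group_props[j] is the estimated share of the population in group j.
    """
    weighted = []
    for coefs, prop in zip(group_coefs, group_props):
        log_lik = 0.0
        for y_k, t_k in zip(y, t):
            eta = sum(b * t_k ** power for power, b in enumerate(coefs))
            p = inv_logit(eta)
            log_lik += math.log(p if y_k == 1 else 1.0 - p)
        weighted.append(prop * math.exp(log_lik))
    total = sum(weighted)
    return [w / total for w in weighted]


def assign_group(probs):
    """Assign the subject to the group with the highest posterior probability."""
    return max(range(len(probs)), key=lambda j: probs[j])


# Hypothetical two-group model: a low flat group and a rising group.
coefs = [(-3.0, 0.0), (0.0, 0.5)]
props = [0.7, 0.3]
times = [0, 1, 2, 3]

never = posterior_group_probs([0, 0, 0, 0], times, coefs, props)
always = posterior_group_probs([1, 1, 1, 1], times, coefs, props)
print(assign_group(never), assign_group(always))
```

A subject with no events is assigned to the low group and a subject with events at every occasion to the rising group, even though the low group has the larger prior share.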
Options
• DATA= data for analysis
• OUTPUT NAMES:
– OUT= Group assignments and membership
probabilities, e.g. OUT=OF.
– OUTSTAT= Parameter estimates used by TRAJPLOT
macro, e.g. OUTSTAT=OS.
– OUTPLOT= Trajectory plot data, e.g. OUTPLOT=OP.
Options
• MODEL; Dependent variable distribution (CNORM, ZIP, LOGIT) e.g.
MODEL CNORM;
• VAR; Dependent variables, measured at different times or ages (for
example, hyperactivity score measured at age t,) e.g. VAR V1-V8;
• INDEP; Independent variables (e.g. age, time) when the dependent
(VAR) variables were measured, e.g. INDEP T1-T8;
Note: there is 1 dependent variable and 2 independent variables, which are always ID and time.
• ORDER; Polynomial (0=intercept, 1=linear, 2=quadratic, 3=cubic) for
each group, e.g. ORDER 2 2 2 0; If omitted, cubics are used by
default.
• ID; Variables (typically containing information to identify
observations) to place in the output (OUT=) data set, e.g. ID IDNO;
• WEIGHT; Weight variable for a weighted likelihood function.
Example
• This example uses data from 195 subjects in a
prospective longitudinal survey. Offense
convictions were recorded annually for boys
from age 8 through age 32 (1 = 1 or more
convictions, 0 = no convictions).
PROC TRAJ DATA=CAMBRDGE OUT=OF OUTPLOT=OP OUTSTAT=OS ITDETAIL;
ID ID;
VAR C1-C23;
INDEP T1-T23;
MODEL LOGIT;
NGROUPS 2;
ORDER 1 1;
RUN;
/*Creating Graph*/
%TRAJPLOT(OP,OS,'Offenses vs. Time','Logistic Model','Offenses','Scaled Age')
PROC TRAJ DATA=CAMBRDGE OUT=OF OUTPLOT=OP OUTSTAT=OS;
ID ID;
VAR C1-C23;
INDEP T1-T23;
MODEL LOGIT;
NGROUPS 2;
ORDER 3 3;
RUN;
/*Creating Graph*/
%TRAJPLOT(OP,OS,'Offenses vs. Time','Logistic Model','Offenses','Scaled Age')
Notice the change in AIC.
Now what?
• In any case there are clearly 2 groups of
people:
– Why are they different? Look at the other
independent variables
Example 2: Number of remissions
PROC TRAJ DATA=TRY OUTPLOT=OP OUTSTAT=OS OUT=OF OUTEST=OE;
ID ID; VAR R0-R10; INDEP T0-T10;
MODEL LOGIT; NGROUPS 3; ORDER 1 2 2;
RUN;
%TRAJPLOT(OP,OS,'Remission vs. Time','Logistic Model','Remission','Time')
PROC TRAJ DATA=TRY OUTPLOT=OP OUTSTAT=OS OUT=OF OUTEST=OE;
ID ID; VAR R0-R10; INDEP T0-T10;
MODEL LOGIT; NGROUPS 4; ORDER 0 3 3 3;
RUN;
%TRAJPLOT(OP,OS,'Remission vs. Time','Logistic Model','Remission','Time')
PROC GLIMMIX for Counts
• The GLIMMIX procedure fits statistical models to data with correlations or nonconstant variability and where the response is not necessarily normally distributed.
• These models are known as generalized linear mixed models (GLMM).
• The GLMMs, like linear mixed models, assume normal (Gaussian) random effects.
• Conditional on these random effects, data can have any distribution in the exponential family. The exponential family comprises many of the elementary discrete and continuous distributions.
• The binary, binomial, Poisson, and negative binomial distributions, for example, are discrete members of this family. The normal, beta, gamma, and chi-square distributions are representatives of the continuous distributions in this family.
• In the absence of random effects, the GLIMMIX procedure fits generalized linear models (fit by the GENMOD procedure).
Basic Features
The GLIMMIX procedure enables you to specify a generalized linear mixed model and to
perform confirmatory inference in such models. The syntax is similar to that of the MIXED
procedure and includes CLASS, MODEL, and RANDOM statements.
The following are some of the basic features of PROC GLIMMIX.
• SUBJECT= and GROUP= options, which enable blocking of variance matrices and
parameter heterogeneity
• choice of linearization about expected values or expansion about current solutions of best
linear unbiased predictors
• flexible covariance structures for random and residual random effects, including variance
components, unstructured, autoregressive, and spatial structures
• CONTRAST, ESTIMATE, LSMEANS, and LSMESTIMATE statements, which produce
hypothesis tests and estimable linear combinations of effects
Notation for the Generalized Linear Mixed Model
The GLIMMIX procedure determines the variance function from the DIST= option
in the MODEL statement or from the user-supplied variance function.
The matrix R is a variance matrix specified by the user through the RANDOM
statement.
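For reference, a sketch of the standard GLMM notation that these two statements appear to rely on. It is written here as an assumption following the usual SAS/STAT conventions: Y is the response vector, γ the vector of random effects with covariance matrix G, g the link function, and A the diagonal matrix containing the variance functions.

```latex
\begin{aligned}
E[Y \mid \gamma] &= g^{-1}(X\beta + Z\gamma), \qquad \gamma \sim N(0, G) \\
\operatorname{Var}[Y \mid \gamma] &= A^{1/2} R A^{1/2}
\end{aligned}
```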
PROC GLIMMIX Contrasted with Other SAS Procedures
The GLIMMIX procedure generalizes the MIXED and GENMOD procedures in two
important ways.
First, the response can have a nonnormal distribution. The MIXED procedure assumes that
the response is normally (Gaussian) distributed.
Second, the GLIMMIX procedure incorporates random effects in the model and so allows
for subject-specific (conditional) and population-averaged (marginal) inference. The
GENMOD procedure only allows for marginal inference.
The GLIMMIX and MIXED procedures are closely related.
Example
Researchers investigated the performance of two medical procedures in a
multicenter study.
They randomly selected 5 centers for inclusion.
One of the study goals was to compare the occurrence of side effects for the
procedures.
In each center nA patients were randomly selected and assigned to procedure “A,”
and nB patients were randomly assigned to procedure “B”.
The following DATA step creates the data set for the analysis.
Example
data multicenter;
input center group$ n sideeffect;
datalines;
1 A 32 14
1 B 33 18
2 A 30 4
2 B 28 8
3 A 23 14
3 B 24 9
4 A 8 1
4 B 8 1
5 A 7 1
5 B 8 0
;
The variable group identifies the two procedures, n is the number of patients who received a given procedure in a particular center, and sideeffect is the number of patients who reported side effects.
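As a quick descriptive check before any modeling, the rows shown in the DATA step can be pooled by procedure. The Python sketch below computes crude side-effect proportions that ignore the center effect entirely; the function name `pooled_proportions` is hypothetical, and the figures describe only the ten rows listed above, not the model-based estimates.

```python
# Rows copied from the DATA step: (center, group, n, sideeffect).
rows = [
    (1, "A", 32, 14), (1, "B", 33, 18),
    (2, "A", 30, 4),  (2, "B", 28, 8),
    (3, "A", 23, 14), (3, "B", 24, 9),
    (4, "A", 8, 1),   (4, "B", 8, 1),
    (5, "A", 7, 1),   (5, "B", 8, 0),
]


def pooled_proportions(rows):
    """Sum events and trials per group, then divide (center effects ignored)."""
    totals = {}
    for _center, group, n, events in rows:
        n_sum, e_sum = totals.get(group, (0, 0))
        totals[group] = (n_sum + n, e_sum + events)
    return {group: e_sum / n_sum for group, (n_sum, e_sum) in totals.items()}


pooled = pooled_proportions(rows)
for group in sorted(pooled):
    print(group, round(pooled[group], 4))
```

Because centers differ in size and in baseline risk, these crude proportions need not agree with estimates from a model that includes a random center effect.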
Example
If $Y_{iA}$ and $Y_{iB}$ denote the number of patients in center $i$ who report side effects for procedures A and B, respectively, then, for a given center, these are independent binomial random variables.
To model the probabilities of side effects for the two procedures, $\pi_{iA}$ and $\pi_{iB}$, you need to account for the fixed group effect and the random selection of centers. One possibility is to assume a model that relates group and center effects linearly to the logit of the probabilities:
$\log\left(\frac{\pi_{iA}}{1 - \pi_{iA}}\right) = \beta_0 + \beta_A + \gamma_i$
$\log\left(\frac{\pi_{iB}}{1 - \pi_{iB}}\right) = \beta_0 + \beta_B + \gamma_i$
Here $\gamma_i$ is the random intercept for center $i$.
Example
proc glimmix data=multicenter;
class center group;
model sideeffect/n = group / solution;
random intercept / subject=center;
run;
The PROC GLIMMIX statement invokes the procedure.
The CLASS statement instructs the procedure to treat the variables center and
group as classification variables.
The MODEL statement specifies the response variable as a sample proportion
using the events/trials syntax. In terms of the previous formulas, sideeffect/n
corresponds to YiA/niA for observations from Group A and to YiB/niB for
observations from Group B.
Example
The SOLUTION option in the MODEL statement requests a listing of the solutions
for the fixed-effects parameter estimates.
Note that because of the events/trials syntax, the GLIMMIX procedure defaults
to the binomial distribution, and that distribution’s default link is the logit link.
The RANDOM statement specifies that the linear predictor contains an intercept
term that randomly varies at the level of the center effect. In other words, a
random intercept is drawn separately and independently for each center in the
study.
The results of this analysis are shown on the following pages.
Example
Results from the complete data from 15 centers:
The “Parameter Estimates” table displays the solutions for the fixed effects in the model.
Solutions for Fixed Effects
Effect     group  Estimate  Standard Error  DF  t Value  Pr > |t|
Intercept         -0.8071   0.2514          14  -3.21    0.0063
group      A      -0.4896   0.2034          14  -2.41    0.0305
group      B      0         .               .   .        .
Because of the fixed-effects parameterization used in the GLIMMIX procedure, the “Intercept” effect is an estimate of β0 + βB, and the “A” group effect is an estimate of βA − βB, the log-odds ratio. The associated estimated probabilities of side effects in the two groups are
$\hat{\pi}_{iA} = \frac{1}{1 + \exp\{0.8071 + 0.4896\}} = 0.2147$
$\hat{\pi}_{iB} = \frac{1}{1 + \exp\{0.8071\}} = 0.3085$
There is a significant difference between the two
groups (p=0.0305).
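The two probabilities can be checked by applying the inverse logit to the reported estimates. A minimal Python sketch, using only the two coefficients from the fixed-effects table; the odds ratio at the end is an extra derived quantity, not a figure reported on the slide.

```python
import math


def inv_logit(eta):
    """Inverse of the logit link: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


intercept = -0.8071  # estimate of beta_0 + beta_B
group_a = -0.4896    # estimate of beta_A - beta_B (the log-odds ratio)

p_a = inv_logit(intercept + group_a)  # procedure A, at a center effect of zero
p_b = inv_logit(intercept)            # procedure B, at a center effect of zero
odds_ratio = math.exp(group_a)        # odds of side effects, A relative to B

print(round(p_a, 4), round(p_b, 4), round(odds_ratio, 4))
```

An odds ratio below 1 matches the negative group A coefficient: the estimated odds of side effects are lower for procedure A than for procedure B.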
Example
You can produce the estimates of the average logits in the two groups and their predictions
on the scale of the data with the LSMEANS statement in PROC GLIMMIX.
ods select lsmeans;
proc glimmix data=multicenter;
class center group;
model sideeffect/n = group / solution;
random intercept / subject=center;
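lsmeans group / cl ilink;
run;
The LSMEANS statement requests the least-squares means of the group effect on the logit scale.
The CL option requests their confidence limits.
The ILINK option requests that the estimates also be reported on the probability scale (the “Mean” columns in the output).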
Example
group Least Squares Means
group  Estimate  Standard Error  DF  t Value  Pr > |t|  Alpha  Lower    Upper    Mean
A      -1.2966   0.2601          14  -4.99    0.0002    0.05   -1.8544  -0.7388  0.2147
B      -0.8071   0.2514          14  -3.21    0.0063    0.05   -1.3462  -0.2679  0.3085
The “Estimate” column displays the least-squares mean estimate on the logit scale, and the “Mean”
column represents its mapping onto the probability scale.
The “Lower” and “Upper” columns are 95% confidence limits for the logits in the two groups.
The “Lower Mean” and “Upper Mean” columns are the corresponding confidence limits for the
probabilities of side effects. These limits are obtained by inversely linking the confidence bounds on the
linear scale, and thus are not symmetric about the estimate of the probabilities.
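The asymmetry can be seen by inversely linking the tabulated logit limits. The Python sketch below uses only the Lower, Estimate, and Upper values from the table; the probability-scale limits it prints are derived here for illustration and should be checked against the actual SAS output.

```python
import math


def inv_logit(eta):
    """Inverse of the logit link: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


# (Lower, Estimate, Upper) on the logit scale, from the least-squares means table.
logit_ci = {
    "A": (-1.8544, -1.2966, -0.7388),
    "B": (-1.3462, -0.8071, -0.2679),
}

prob_ci = {}
for group, (lower, estimate, upper) in logit_ci.items():
    prob_ci[group] = (inv_logit(lower), inv_logit(estimate), inv_logit(upper))

for group, (p_lower, p_hat, p_upper) in prob_ci.items():
    print(group,
          f"lower={p_lower:.4f} mean={p_hat:.4f} upper={p_upper:.4f}",
          f"below={p_hat - p_lower:.4f} above={p_upper - p_hat:.4f}")
```

On the logit scale each interval is symmetric about its estimate, but after the nonlinear transformation the distance from the mean to the upper limit exceeds the distance to the lower limit.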
Poisson Distribution
• Poisson distribution is for counts—if events happen at
a constant rate over time, the Poisson distribution
gives the probability of X number of events occurring in
time T.
Poisson Mean and Variance
• Mean: $\mu = \lambda$
• Variance and standard deviation: $\sigma^2 = \lambda$, $\sigma = \sqrt{\lambda}$
where $\lambda$ = expected number of hits in a given time period.
For a Poisson random variable, the variance and mean are the same!
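The equality of mean and variance can be checked numerically from the Poisson probability function P(X = k) = e^(-λ) λ^k / k!. A minimal Python sketch, assuming a hypothetical rate of λ = 3 events per period and truncating the sum at 60 events, beyond which the probabilities are negligible for this rate.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k events when the expected count is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


lam = 3.0            # hypothetical expected number of events per period
support = range(60)  # truncation point; the tail mass beyond it is negligible

total = sum(poisson_pmf(k, lam) for k in support)
mean = sum(k * poisson_pmf(k, lam) for k in support)
variance = sum((k - mean) ** 2 * poisson_pmf(k, lam) for k in support)

print(f"total={total:.6f} mean={mean:.6f} variance={variance:.6f}")
```

When count data show a variance well above the mean (overdispersion), the plain Poisson assumption is doubtful, which is one reason to consider random effects or the negative binomial distribution.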
Example: Poisson
Subjects are HIV+ drug users from Project CLEAR.
Two different outcomes: the number of sex acts in the last 3 months and the number of HIV negative or unknown-status partners in the last 3 months.
Subject = Subject ID number
Act3m = Sex acts, last 3 months
Hpart3m = HIV negative or HIV status unknown partners in the last 3 months
Follow = 0, 3, 6, 9, 15, or 21 months post baseline
Intv = 1 = intervention, 0 = no intervention
Gender = 0 = female, 1 = male
Trade3m = 0 = no, 1 = traded sex for money in the last 3 months
proc glimmix data=poisson;
class subject intv follow gender ethnic trade3m;
model Act3m = follow intv gender ethnic trade3m/ dist=poisson solution;
random int / subject=subject;
run;
Exercise 8: Binomial
• The variable age gives the age group.
• The variable hmo is a binary indicator variable for HMO-insured patients.
• Suppose that we want to determine whether patients with hmo die at a different rate.
Exercise 8: PostDoc Example
• 557 biochemists received doctorates from 106 American universities.
• Variables:
– PDC: Went for post-doc training immediately after PhD
– AGE: Age at PhD completion
– MAR: Married = 1, Unmarried = 0
– DOC: Prestige of doctoral institution
– UND: Selectivity of undergraduate institution
– AG: Agricultural department = 1, 0 otherwise
– ARTS: Number of articles published (outcome variable)
– CITS: Number of citations of published articles
– DOCID: ID of doctoral institution
Raul Cruz-Cano, HLTH653 Spring 2013
References
1. Arrandale VH. An Evaluation of Two Existing
Methods for Analyzing Longitudinal Respiratory
Symptom Data [M.Sc. Thesis]. Vancouver:
University of British Columbia; 2006.
2. Jones BL, Nagin DS, Roeder K. A SAS procedure
based on mixture models for estimating
developmental trajectories. Sociological
Methods & Research 2001;29(3):374-393