Defining Major Depression Using Latent Class Model Diagnosis

Graphical Diagnostic Tools for Evaluating Latent Class Models:
An Application to Depression in the ECA Study
Elizabeth S. Garrett
Department of Biostatistics
Johns Hopkins University
GOAL
1. Provide tools for choosing the most appropriate latent class model.
2. Interpret objective diagnostic methods in reference to the latent class model.
Table of Contents
1. Introduction
2. Previous Work
3. Model Estimation
4. Diagnostic Methods for Latent Class Models
5. Extensions to Latent Class Regression
6. Application to the ECA Study
7. Validating Diagnostic Criteria for Depression Using LCM
8. Discussion and Further Research
Outline
I. Depression in relation to the LCM
II. Approach to Estimation
III. The ECA Study
IV. Predicted Frequency Check Plot
V. Latent Class Estimability Display
VI. Interpretation of Findings
VII. Revisions
Motivating Question
How should we describe “major depression”?
– not depressed, depressed
– none, moderate, severe
– none, mild, moderate, severe
– none, mood symptoms, somatic symptoms, both
How we conceptualize “major depression”
– We use indicators of symptoms such as self-reported presence of sadness, weight change, etc.
– A combination of these indicators is thought to define depression.
– Using these combinations, we commonly seek to categorize individuals into depression classes.
– These classes represent the construct “depression.”
– “Depression” is a latent variable.
– The construct of “Depression” can then be used for classification, description, and prediction.
Depression in the Diagnostic and Statistical Manual of Mental Disorders, 3rd Edition
DSM-III Criteria (generally):
A. Dysphoria for 2 or more weeks
B. Reported symptoms in 4 or more of the following symptom groups:
   1. loss of appetite, weight change
   2. insomnia, hypersomnia
   3. retarded movement, restlessness
   4. disinterest in sex
   5. fatigue
   6. feelings of guilt or worthlessness
   7. trouble concentrating, thoughts slow or mixed
   8. morbid thoughts, suicidal thoughts/attempts
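The two criteria amount to a simple decision rule. A minimal Python sketch, assuming symptoms are coded as the 9-character 0/1 patterns used later in this talk (eight symptom groups followed by dysphoria) and that a reported dysphoria indicator already reflects the 2-week duration. The function name `meets_dsm3` is hypothetical.

```python
def meets_dsm3(pattern: str) -> bool:
    """Apply the simplified DSM-III rule to a 9-character 0/1 pattern.

    Assumed coding: positions 1-8 are symptom groups 1-8 and position 9 is
    dysphoria (so 000000001 means dysphoria only).
    """
    if len(pattern) != 9 or set(pattern) - {"0", "1"}:
        raise ValueError("pattern must be nine characters, each 0 or 1")
    groups = pattern[:8]
    dysphoria = pattern[8] == "1"
    # Criterion A: dysphoria. Criterion B: symptoms in 4 or more groups.
    return dysphoria and groups.count("1") >= 4


print(meets_dsm3("111100001"))  # dysphoria plus four symptom groups
print(meets_dsm3("001000001"))  # dysphoria plus one symptom group
```

A rule like this gives a yes/no classification that can later be set against the classes found by the latent class model.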
Latent Class Model: Main Ideas
– There are M classes of depression (e.g., none, mild, severe). $\pi_m$ represents the proportion of individuals in the population in class m (m = 1, …, M).
– Each person is a member of one of the M classes, but we do not know which. The latent class of individual i is denoted by $\eta_i$.
– Symptom prevalences vary by class. The prevalence of symptom j in class m is denoted by $p_{mj}$.
– Given class membership, the symptoms are independent.
Latent Class Model
[Diagram relating $\pi$, $\eta_i$, $p_{\eta_i}$, and $y_i$]
M: number of classes
$p_{\eta_i}$: vector of symptom probabilities given latent class $\eta_i$
$\pi_m$: probability of being in latent class m, m = 1, …, M
$\eta_i$: the true latent class of individual i
$y_i$: vector of individual i’s report of symptoms
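Taken together, these definitions imply a mixture form for the probability of any symptom pattern. As a worked equation, with $\pi_m$ the class proportions, $p_{mj}$ the prevalence of symptom j in class m, $y_{ij} \in \{0, 1\}$ the report of symptom j by individual i, and J the number of symptoms (J is a symbol introduced here, not used in the slides):

```latex
P(y_i) = \sum_{m=1}^{M} \pi_m \prod_{j=1}^{J} p_{mj}^{\,y_{ij}} \,(1 - p_{mj})^{1 - y_{ij}}
```

The product over symptoms is where the conditional independence assumption enters: within a class, the pattern probability is the product of the individual symptom probabilities.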
Estimation Approach
Bayesian Approach:
Quantify beliefs about $p$, $\pi$, and $\eta$ before and after observing data.
Bayesian Terminology:
Prior Probability: What we believe about unknown
parameters before observing data.
Posterior Probability: What we believe about the
parameters after observing data.
Bayesian Estimation Approach
We estimated the models using a Markov chain Monte Carlo (MCMC) algorithm:
Specify prior probability distribution:
$P(p, \pi, \eta)$
Combine prior with likelihood to obtain posterior distribution:
$P(p, \pi, \eta \mid Y) \propto P(p, \pi, \eta) \times L(Y \mid p, \pi, \eta)$
Estimate posterior distribution for each parameter using iterative procedure, for example by integrating over all parameters other than $\pi_1$:
$P(\pi_1 \mid Y) = \int P(p, \pi, \eta \mid Y)$
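The slides do not describe the sampler itself (the MCMC algorithm is listed under work not included in the talk). As a rough illustration of what an iterative procedure of this kind can look like, here is a minimal Gibbs sampler sketch for a latent class model with binary items. The priors (Dirichlet(1, …, 1) on class proportions, Beta(1, 1) on each prevalence), the function name `gibbs_lcm`, and the toy data are all assumptions made for this example, not the author's settings.

```python
import random


def gibbs_lcm(data, n_classes, n_iter=200, seed=0):
    """Gibbs sampler sketch for a latent class model with 0/1 items.

    Assumed priors (not from the slides): Dirichlet(1, ..., 1) on the class
    proportions and Beta(1, 1) on every symptom prevalence.
    Returns a list of (pi, p) draws, one per iteration.
    """
    rng = random.Random(seed)
    n_items = len(data[0])
    pi = [1.0 / n_classes] * n_classes
    p = [[rng.random() for _ in range(n_items)] for _ in range(n_classes)]
    draws = []
    for _ in range(n_iter):
        # 1. Sample each latent class eta_i given pi, p and y_i.
        counts = [0] * n_classes
        positives = [[0] * n_items for _ in range(n_classes)]
        for y in data:
            weights = []
            for m in range(n_classes):
                w = pi[m]
                for j, y_ij in enumerate(y):
                    w *= p[m][j] if y_ij else 1.0 - p[m][j]
                weights.append(w)
            eta = rng.choices(range(n_classes), weights=weights)[0]
            counts[eta] += 1
            for j, y_ij in enumerate(y):
                positives[eta][j] += y_ij
        # 2. Sample pi from its Dirichlet full conditional via gamma draws.
        gammas = [rng.gammavariate(1 + c, 1.0) for c in counts]
        total = sum(gammas)
        pi = [g / total for g in gammas]
        # 3. Sample each prevalence p_mj from its Beta full conditional.
        p = [
            [rng.betavariate(1 + positives[m][j],
                             1 + counts[m] - positives[m][j])
             for j in range(n_items)]
            for m in range(n_classes)
        ]
        draws.append((pi, p))
    return draws


# Hypothetical toy data: two classes with low and high symptom prevalences.
toy_rng = random.Random(1)
toy = []
for _ in range(300):
    prevalence = 0.1 if toy_rng.random() < 0.7 else 0.8
    toy.append([int(toy_rng.random() < prevalence) for _ in range(6)])
draws = gibbs_lcm(toy, n_classes=2, n_iter=100)
```

The marginal posterior of a single parameter such as $\pi_1$ is then approximated by the collection of its sampled values across iterations, usually after discarding early draws. The sketch does not address label switching or convergence checks.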
The Epidemiologic Catchment Area Study
A. 3481 community-dwelling individuals in Baltimore were interviewed using the NIMH Diagnostic Interview Schedule.
B. 6 month prevalence of symptoms was assessed.
C. 8 self-reported symptom groups were completed for 2938 individuals*.
* those with organic brain disorder were omitted as per DSM-III criterion

Symptom group   Symptoms                                                                prevalence
(dysphoria)     dysphoria                                                               0.12
Group 1         lost appetite, lost weight, weight gain                                 0.06
Group 2         insomnia, hypersomnia                                                   0.11
Group 3         retarded movement, restlessness                                         0.14
Group 4         disinterest in sex                                                      0.07
Group 5         fatigue                                                                 0.04
Group 6         guilt/worthless                                                         0.09
Group 7         trouble concentrating, thoughts slow or mixed                           0.04
Group 8         thoughts of death, wanted to die, suicidal thoughts, suicide attempts   0.06
The Epidemiologic Catchment Area Study

                2 Class Model       3 Class Model                4 Class Model
                Class 1  Class 2    Class 1  Class 2  Class 3    Class 1  Class 2  Class 3  Class 4
class size      0.88     0.12       0.82     0.14     0.02       0.83     0.12     0.04     0.03
weight          0.02     0.42       0.01     0.24     0.77       0.01     0.33     0.21     0.75
sleep           0.06     0.48       0.05     0.36     0.68       0.05     0.33     0.42     0.70
movement        0.07     0.63       0.05     0.50     0.80       0.05     0.49     0.58     0.80
sex             0.02     0.42       0.01     0.25     0.81       0.01     0.12     0.17     0.81
fatigue         0.01     0.20       0.008    0.12     0.36       0.009    0.01     0.19     0.35
guilt           0.04     0.48       0.03     0.35     0.78       0.03     0.20     0.51     0.76
concentration   0.005    0.28       0.004    0.12     0.61       0.003    0.04     0.05     0.65
morbid          0.02     0.40       0.01     0.22     0.80       0.01     0.05     0.11     0.80
dysphoria       0.06     0.51       0.05     0.40     0.77       0.05     0.61     0.23     0.79
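To see how estimates like these translate into a statement about the data, the sketch below plugs the 3 class point estimates from the table above into the latent class mixture and computes the expected frequency of one symptom pattern among 2938 individuals. The function name `pattern_probability` is hypothetical, and using point estimates alone ignores posterior uncertainty.

```python
from math import prod

# Point estimates for the 3 class model, copied from the table above.
# Item order: weight, sleep, movement, sex, fatigue, guilt,
# concentration, morbid, dysphoria.
CLASS_SIZES = [0.82, 0.14, 0.02]
PREVALENCES = [
    [0.01, 0.05, 0.05, 0.01, 0.008, 0.03, 0.004, 0.01, 0.05],
    [0.24, 0.36, 0.50, 0.25, 0.12, 0.35, 0.12, 0.22, 0.40],
    [0.77, 0.68, 0.80, 0.81, 0.36, 0.78, 0.61, 0.80, 0.77],
]


def pattern_probability(pattern, class_sizes, prevalences):
    """Mixture probability of a 0/1 pattern under conditional independence."""
    total = 0.0
    for size, prev in zip(class_sizes, prevalences):
        total += size * prod(
            q if bit == "1" else 1.0 - q for bit, q in zip(pattern, prev)
        )
    return total


n = 2938  # individuals with complete symptom data
expected = n * pattern_probability("001000001", CLASS_SIZES, PREVALENCES)
print(round(expected, 1))
```

For the pattern 001000001 (movement and dysphoria only) this gives an expected frequency of roughly 17. The printed class sizes sum to 0.98, presumably because of rounding, so the plug-in value is slightly low.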
Predicted Frequency Check (PFC) Plot
Compare observed symptom pattern frequencies to
what the model predicts for a new sample of data
from the same population.
Symptom patterns:
» 000000000 no reported symptoms
» 000000001 report dysphoria only
» 111111111 report all symptoms
2^9 = 512 possible patterns
Example:
Pattern 001000001:
» restlessness/retarded movement
» dysphoria
We observed 24 individuals with this symptom pattern:
$X_{001000001} = 24$
Example:
95% confidence interval for frequency?
Non-parametric (saturated model) estimate:
$p_{001000001} = 24 / 2938 \approx 0.008$
[Figure: interval on the frequency scale running from 15 to 34, with the observed frequency 24 marked]
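The slide does not say how the interval from 15 to 34 was computed. As one plausible illustration, a normal-approximation (Wald) interval for the count gives a similar range. The function name `wald_count_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def wald_count_interval(count, n, level=0.95):
    """Normal-approximation interval for a pattern frequency out of n."""
    p_hat = count / n
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sqrt(n * p_hat * (1 - p_hat))
    return count - half_width, count + half_width


low, high = wald_count_interval(24, 2938)
print(round(low, 1), round(high, 1))
```

This yields about 14.4 to 33.6, close to the plotted limits. An exact or bootstrap interval would differ slightly.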
Model Based Estimation
Predicted frequency of pattern 001000001 and prediction interval in the 3 class model:
$P(X_{001000001} = x \mid Y)$
[Figure: distribution of the Predicted Frequency (x), with 9 marked at the 2.5% point, 18 at the center, and 28 at the 97.5% point]
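A prediction interval of this kind can be approximated by simulation: for each posterior draw of the parameters, compute the pattern's probability, simulate the pattern's frequency in a new sample of the same size, and take quantiles of the simulated frequencies. The sketch below assumes the per-draw pattern probabilities have already been computed. The function name `predictive_interval` and the single input probability 0.0058 are illustrative assumptions, not values from the slides.

```python
import random


def predictive_interval(pattern_probs, n, reps_per_draw=1, seed=0):
    """Simulate new-sample frequencies of one pattern and return the
    2.5% and 97.5% quantiles.

    pattern_probs holds the pattern's probability under each posterior draw
    of the model parameters; each draw generates reps_per_draw new samples.
    """
    rng = random.Random(seed)
    simulated = []
    for prob in pattern_probs:
        for _ in range(reps_per_draw):
            # Binomial(n, prob) draw built from n Bernoulli trials.
            simulated.append(sum(rng.random() < prob for _ in range(n)))
    simulated.sort()
    lower = simulated[int(0.025 * (len(simulated) - 1))]
    upper = simulated[int(0.975 * (len(simulated) - 1))]
    return lower, upper


# Hypothetical input: one plug-in probability instead of real posterior draws.
low, high = predictive_interval([0.0058], n=2938, reps_per_draw=400)
print(low, high)
```

With a single fixed probability the interval reflects sampling variability only. Feeding in probabilities from many posterior draws also carries parameter uncertainty into the interval, which tends to widen it.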
Model Based Estimation
Comparison of model based prediction interval to empirical confidence interval:
[Figure: frequency scale with the 2.5% and 97.5% limits marked; values shown are 9, 15, 18, 24, 28, and 34; legend: 4 class model, 3 class model, 2 class model, Observed]
Predicted Frequency Check Plot
[Figure: observed frequency with 2.5% and 97.5% predicted limits for each symptom pattern; horizontal axis: Pattern (in order of prevalence); pattern labels carry observed counts from "= 110" down to "= 5"; annotation: Obsd N = 1982]
Predicted Frequency Check Plot
[Figure: observed frequency with 2.5% and 97.5% predicted limits for each symptom pattern; horizontal axis: Pattern (in order of prevalence); pattern labels carry observed counts from "= 4" down to "= 2"; annotation: Obsd N = 1982]
Latent Class Estimability Display (LCED)
Is there enough data to estimate all of the parameters in the model?
» 2 class model: 19 parameters
» 3 class model: 29 parameters
» 4 class model: 39 parameters
Problems arise when:
» small data set
» small class size
» small data set and small class size
e.g. N = 1000 and class size = 0.01 ⇒ 10 individuals in class to estimate symptom prevalences
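The parameter counts follow from the model structure: one prevalence per class and symptom, plus M − 1 free class sizes because the sizes sum to one. A short sketch that reproduces the counts for 9 symptoms and the class-size example (the function name `lcm_parameter_count` is hypothetical):

```python
def lcm_parameter_count(n_classes, n_items):
    """Free parameters: one prevalence per class and item, plus
    n_classes - 1 class sizes (the sizes sum to one)."""
    return n_classes * n_items + (n_classes - 1)


for m in (2, 3, 4):
    print(m, lcm_parameter_count(m, n_items=9))

# Expected number of individuals in a class of size 0.01 when N = 1000.
print(round(1000 * 0.01))
```

Each added class brings 10 more parameters, while the data available to estimate a small class's prevalences can be as few as 10 individuals.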
Weak “Identifiability”
(Weak Estimability)
Definition: A parameter in a (Bayesian) model is
weakly identified if the posterior distribution of the
parameter is approximately the same as the prior.
$P(\pi_1) \approx P(\pi_1 \mid Y)$
If a model is weakly identified it is still “valid”, but we
cannot make inferences from the data about the
weakly identified parameters.
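One rough way to put a number on "approximately the same as the prior" is to compare a histogram of posterior draws with the prior. The sketch below is an illustration only, not the display proposed in this talk: it assumes a Uniform(0, 1) prior on a probability parameter and computes the histogram overlap, where values near 1 suggest weak identifiability. The function name `prior_posterior_overlap` and the example draws are hypothetical.

```python
def prior_posterior_overlap(posterior_draws, n_bins=10):
    """Histogram overlap between posterior draws on [0, 1] and a
    Uniform(0, 1) prior.

    Returns a value between 1 / n_bins and 1; values near 1 mean the
    posterior looks like the prior.
    """
    counts = [0] * n_bins
    for draw in posterior_draws:
        index = min(int(draw * n_bins), n_bins - 1)  # draw == 1.0 goes in last bin
        counts[index] += 1
    total = len(posterior_draws)
    prior_mass = 1.0 / n_bins
    return sum(min(prior_mass, c / total) for c in counts)


flat = [(i + 0.5) / 100 for i in range(100)]  # spread evenly over [0, 1]
peaked = [0.05] * 100                          # concentrated near 0.05
print(prior_posterior_overlap(flat), prior_posterior_overlap(peaked))
```

The evenly spread draws overlap the prior almost completely, while the concentrated draws overlap it only in a single bin, which is the pattern expected for a well identified parameter.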
Examples
[Figure: two panels, each with a horizontal axis running from 0 to 1.00]
Latent Class Estimability Display
[Figure: one panel each for the 2 Class Model, 3 Class Model, and 4 Class Model, with one column per class; rows: Class Size and Items 1 to 9; annotated with Tau values]
Interpretation

                none     mild     severe
class size      0.82     0.14     0.02
weight          0.01     0.24     0.77
sleep           0.05     0.36     0.68
movement        0.05     0.50     0.80
sex             0.01     0.25     0.81
fatigue         0.008    0.12     0.36
guilt           0.03     0.35     0.78
concentration   0.004    0.12     0.61
morbid          0.01     0.22     0.80
dysphoria       0.05     0.40     0.77

– Depression appears to be ‘dimensional’
  » none
  » mild
  » severe
– 2% of population is in severe class
– 14% in mild class: are they depressed or not?
– How does this compare to the DSM-III definition?
Work Not Included in Talk
1. MCMC Algorithm
2. Log Odds Ratio Check Plot
3. Predicted Class Assignment Display
4. Extensions to Regression
Revisions Already Implemented
1. New example for Chapter 5 (LCRR)
2. Background/justification of latent class model as “gold-standard” in validation
3. Splus programs: on website with a “user’s guide”