Methodological Overview and Assessment of the PAMS Project Dave Osthus October 29, 2012

Outline
- PAMS Survey
- PAMS Methodology
- Results
- Simulation Idea
- Comments/Conclusions
PAMS Survey
The Physical Activity Measurement Survey (PAMS) is a survey designed to obtain information on physical activity patterns of
- Adult women and men (21-70)
- Hispanic and African American populations (limited sample size)
Individuals were sampled from four counties in Iowa: Marshall, Black Hawk, Dallas and Polk.
PAMS Objectives
- Goal to recruit 1200 participants into the survey (recruitment spanned two years)
- Approximately equal number of males and females
- Approximately 10% African American and 10% Hispanic

| Group             | Total (%)  |
|-------------------|------------|
| Females           | 785 (58.3) |
| Males             | 561 (41.7) |
| African Americans | 114 (8.5)  |
| Hispanics         | 53 (3.9)   |
Sampling and Data Collection from Individuals
Sampling
- Systematic sampling of households
- Simple random sampling of individuals within households
What was asked of sampled individuals?
- Individuals wore SenseWear Monitor for 24 hours.
  - Measures heart rate, heat flux, accelerometer (motion, steps), etc.
  - Proprietary algorithm outputs energy expenditure, MET-minutes, time spent at different intensity levels, etc.
- 24-hour activity recall administered via phone the following day.
  - Can calculate energy expenditure, MET-minutes, time spent at different intensity levels, etc.
- Done on two non-consecutive days
PAMS Methodology
Interest in estimating usual (e.g. long-run average) activity
distributions.
Beyler (2010) methodology:
1. Transform monitor and recall MET-mins (MM) to
approximate normality
2. Remove variation due to nuisance factors (e.g. season of year,
first or second day)
3. Estimate group-level (e.g. gender/age) parameters for
measurement error models via method of moments
4. Smooth estimates with a population model describing
relationships among group parameters
5. Back-transform usual activity distributions to original scale
Model for Monitor 24-hr MM on the Normal Scale, $X_{ij}$
(all parameters and random variables have a subscript g suppressed)
$$X_{ij} = \mu + t_i + d_{ij} + u_{ij}$$
$X_{ij}$ = monitor MM for individual i on day j from an unbiased reference instrument
$\mu$ = mean daily MM
$t_i$ = individual i's mean deviation from the mean usual daily MM
$d_{ij}$ = individual i's deviation from his or her mean daily MM on day j
$u_{ij}$ = random measurement error for individual i on day j
$$t_i \sim N(0, \sigma_t^2), \quad d_{ij} \sim N(0, \sigma_d^2), \quad u_{ij} \sim N(0, \sigma_u^2)$$
$$E(X_{ij} \mid i) = \mu + t_i, \qquad \mathrm{Var}(X_{ij}) = \sigma_t^2 + \sigma_d^2 + \sigma_u^2$$
*assuming uncorrelated random effects
Model for Recall 24-hr MM on the Normal Scale, $Y_{ij}$
(all parameters and random variables have a subscript g suppressed)
$$Y_{ij} = \mu_y + \beta_1 (t_i + d_{ij}) + r_i + e_{ij}$$
$Y_{ij}$ = recall MM for individual i on day j from the self-report instrument
$\mu_y - \mu$ = bias for mean usual MM
$\beta_1$ = linear bias related to true activity level
$r_i$ = individual i's deviation from the self-reported mean
$e_{ij}$ = recall measurement error for individual i on day j
$\mu_y + \beta_1 (t_i + d_{ij})$ = systematic bias
$r_i + e_{ij}$ = random measurement error
$$r_i \sim N(0, \sigma_r^2), \quad e_{ij} \sim N(0, \sigma_e^2)$$
$$\mathrm{Var}(Y_{ij}) = \beta_1^2 \sigma_t^2 + \beta_1^2 \sigma_d^2 + \sigma_r^2 + \sigma_e^2$$
*assuming uncorrelated random effects
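Notation
$$Z_i = \begin{pmatrix} \bar{x}_{i.} \\ \bar{y}_{i.} \\ x_{i1} - x_{i2} \\ y_{i1} - y_{i2} \end{pmatrix}, \qquad \bar{x}_{i.} = \frac{x_{i1} + x_{i2}}{2}, \qquad \bar{y}_{i.} = \frac{y_{i1} + y_{i2}}{2}$$
$$E(Z_i) = \begin{pmatrix} \mu \\ \mu_y \\ 0 \\ 0 \end{pmatrix}$$
$$\mathrm{Var}(Z_i) = \begin{pmatrix}
\sigma_t^2 + \tfrac{1}{2}(\sigma_d^2 + \sigma_u^2) & \beta_1(\sigma_t^2 + \tfrac{1}{2}\sigma_d^2) & 0 & 0 \\
 & \beta_1^2(\sigma_t^2 + \tfrac{1}{2}\sigma_d^2) + \sigma_r^2 + \tfrac{1}{2}\sigma_e^2 & 0 & 0 \\
 & & 2(\sigma_d^2 + \sigma_u^2) & 2\beta_1\sigma_d^2 \\
\text{sym} & & & 2(\beta_1^2\sigma_d^2 + \sigma_e^2)
\end{pmatrix}$$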
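The entries of $\mathrm{Var}(Z_i)$ can be checked directly from the two measurement error models. The following is a short worked derivation, assuming (as the models state) that all random effects are uncorrelated and that day-specific terms are independent across the two days:

```latex
\begin{aligned}
\bar{x}_{i.} &= \mu + t_i + \tfrac{1}{2}(d_{i1} + d_{i2}) + \tfrac{1}{2}(u_{i1} + u_{i2})
  &&\Rightarrow\ \mathrm{Var}(\bar{x}_{i.}) = \sigma_t^2 + \tfrac{1}{2}(\sigma_d^2 + \sigma_u^2) \\
\bar{y}_{i.} &= \mu_y + \beta_1 t_i + \tfrac{1}{2}\beta_1(d_{i1} + d_{i2}) + r_i + \tfrac{1}{2}(e_{i1} + e_{i2})
  &&\Rightarrow\ \mathrm{Var}(\bar{y}_{i.}) = \beta_1^2(\sigma_t^2 + \tfrac{1}{2}\sigma_d^2) + \sigma_r^2 + \tfrac{1}{2}\sigma_e^2 \\
\mathrm{Cov}(\bar{x}_{i.}, \bar{y}_{i.}) &= \beta_1\,\mathrm{Var}(t_i) + \tfrac{1}{4}\beta_1\,\mathrm{Var}(d_{i1} + d_{i2})
  &&=\ \beta_1(\sigma_t^2 + \tfrac{1}{2}\sigma_d^2) \\
x_{i1} - x_{i2} &= (d_{i1} - d_{i2}) + (u_{i1} - u_{i2})
  &&\Rightarrow\ \mathrm{Var}(x_{i1} - x_{i2}) = 2(\sigma_d^2 + \sigma_u^2) \\
y_{i1} - y_{i2} &= \beta_1(d_{i1} - d_{i2}) + (e_{i1} - e_{i2})
  &&\Rightarrow\ \mathrm{Var}(y_{i1} - y_{i2}) = 2(\beta_1^2\sigma_d^2 + \sigma_e^2) \\
\mathrm{Cov}(x_{i1} - x_{i2},\, y_{i1} - y_{i2}) &= \beta_1\,\mathrm{Var}(d_{i1} - d_{i2})
  &&=\ 2\beta_1\sigma_d^2 \\
\mathrm{Cov}(d_{i1} + d_{i2},\, d_{i1} - d_{i2}) &= \sigma_d^2 - \sigma_d^2 = 0
  &&\Rightarrow\ \text{means and differences are uncorrelated}
\end{aligned}
```

The person-level terms $t_i$ and $r_i$ cancel in the day-to-day differences, which is why the lower-right block involves only within-person variances.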
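Notation, cont.
$$\mathbf{m}_1 = \begin{pmatrix} m_1 \\ m_2 \\ 0 \\ 0 \end{pmatrix}, \qquad m_1 = \frac{\sum_{i=1}^{n} w_i \bar{x}_{i.}}{\sum_{i=1}^{n} w_i}, \qquad m_2 = \frac{\sum_{i=1}^{n} w_i \bar{y}_{i.}}{\sum_{i=1}^{n} w_i}$$
$$\mathbf{m}_2 = \frac{\sum_{i=1}^{n} w_i (Z_i - \bar{Z})(Z_i - \bar{Z})'}{\sum_{i=1}^{n} w_i} = \begin{pmatrix}
m_{11} & m_{12} & 0 & 0 \\
 & m_{22} & 0 & 0 \\
 & & m_{33} & m_{34} \\
\text{sym} & & & m_{44}
\end{pmatrix}$$
- $w_i$ is the survey weight for individual i
- The estimating equations then are: $\mathbf{m}_1 = E(Z_i)$, $\mathbf{m}_2 = \mathrm{Var}(Z_i)$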
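As a rough illustration of how the weighted sample moments could be computed, here is a minimal Python sketch. The function name `weighted_moments`, the input layout, and the toy numbers are all hypothetical, and the code assumes that $\bar{Z}$ is the survey-weighted mean of the $Z_i$, which the slides do not state explicitly.

```python
def weighted_moments(x, y, w):
    """Survey-weighted mean vector and covariance matrix of Z_i.

    x, y: lists of (day 1, day 2) pairs of monitor and recall values.
    w:    list of survey weights, one per individual.
    Assumption: Z-bar is the weighted mean of the Z_i.
    """
    # Z_i = (x-bar, y-bar, x1 - x2, y1 - y2)
    z = [[(x1 + x2) / 2, (y1 + y2) / 2, x1 - x2, y1 - y2]
         for (x1, x2), (y1, y2) in zip(x, y)]
    wsum = sum(w)
    zbar = [sum(wi * zi[k] for wi, zi in zip(w, z)) / wsum for k in range(4)]
    cov = [[sum(wi * (zi[a] - zbar[a]) * (zi[b] - zbar[b])
                for wi, zi in zip(w, z)) / wsum
            for b in range(4)]
           for a in range(4)]
    return zbar, cov


# Toy data (made up for illustration, not PAMS values).
zbar, cov = weighted_moments(
    x=[(7.9, 7.7), (7.6, 7.8), (8.0, 7.9)],
    y=[(7.8, 7.5), (7.4, 7.6), (7.9, 7.9)],
    w=[1.0, 2.0, 1.5],
)
m1, m2 = zbar[0], zbar[1]
m11, m12, m22 = cov[0][0], cov[0][1], cov[1][1]
m33, m34, m44 = cov[2][2], cov[2][3], cov[3][3]
```

Only the eight moments named on the slide are kept; the off-diagonal blocks, which the model sets to zero, are not used.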
Estimating Model Parameters
- Method of Moments
- 8 population parameters per group:
$$\theta_g = (\mu_g, \mu_{yg}, \beta_{1g}, \sigma_{tg}^2, \sigma_{dg}^2, \sigma_{ug}^2, \sigma_{rg}^2, \sigma_{eg}^2)'$$
Beyler, 2010
- Taylor series approximation used to derive variance matrix for $\hat{\theta}_g$
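With eight moments and eight parameters, the estimating equations can be solved in closed form. The sketch below shows one possible solution order; it is an illustration derived from the model equations, not necessarily the algebra used in Beyler (2010). It assumes independent day-specific errors, so that the variance of the two-day recall mean carries $\sigma_e^2/2$. The function name `mom_estimates` and the numbers in the usage example are hypothetical.

```python
def mom_estimates(m1, m2, m11, m12, m22, m33, m34, m44):
    """Solve the method of moments equations for one group.

    Moment equations assumed (subscript g suppressed):
      m11 = s_t + (s_d + s_u) / 2        m33 = 2 (s_d + s_u)
      m12 = b (s_t + s_d / 2)            m34 = 2 b s_d
      m22 = b^2 (s_t + s_d / 2) + s_r + s_e / 2
      m44 = 2 (b^2 s_d + s_e)
    """
    sigma_t2 = m11 - m33 / 4.0                    # removes (s_d + s_u) / 2
    beta1 = (m12 - m34 / 4.0) / sigma_t2          # m12 - b s_d / 2 = b s_t
    sigma_d2 = m34 / (2.0 * beta1)
    sigma_u2 = m33 / 2.0 - sigma_d2
    sigma_e2 = m44 / 2.0 - beta1 ** 2 * sigma_d2
    sigma_r2 = m22 - beta1 * m12 - sigma_e2 / 2.0  # b * m12 = b^2 (s_t + s_d / 2)
    return {
        "mu": m1, "mu_y": m2, "beta1": beta1,
        "sigma_t2": sigma_t2, "sigma_d2": sigma_d2, "sigma_u2": sigma_u2,
        "sigma_r2": sigma_r2, "sigma_e2": sigma_e2,
    }


# Usage with made-up moments (illustration only).
print(mom_estimates(m1=7.8, m2=7.7, m11=0.0425, m12=0.0288,
                    m22=0.0395, m33=0.05, m34=0.0192, m44=0.0534))
```

Nothing in this solution constrains the variance components to be non-negative, so a small component estimated with noise can come out below zero.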
Steps 1 and 2: Transform to Normality and Nuisance Reg.
- Males/Age Groups
- Passably normal if Shapiro-Wilk p-value greater than 0.10
- Must be same transformation for monitor and recall data
- No power transformation proved sufficient
- Log transformation was chosen
[Figure: two normal Q-Q plots (x-axis: Theoretical Quantiles; y-axis: Sample Quantiles). Left panel, "Normal Q-Q Plot: Recall", Shapiro-Wilk p-value: 0. Right panel, "Normal Q-Q Plot: Monitor", Shapiro-Wilk p-value: 2.1e-05.]
QQ plots of male data. These are actually the QQ plots AFTER the nuisance effects have been removed (Spring and Fall for monitor and Rep for recall).
Step 3: Parameter Estimates
| Interpretation (Parameter)  | Age 1 [21,40), n=163 | Age 2 [40,50), n=121 | Age 3 [50,60), n=142 | Age 4 [60,70], n=135 |
|-----------------------------|----------------------|----------------------|----------------------|----------------------|
| Usual PA ($\mu_g$)          | 7.802 (0.017)        | 7.711 (0.017)        | 7.705 (0.015)        | 7.588 (0.017)        |
| Reported PA ($\mu_{yg}$)    | 7.703 (0.017)        | 7.61 (0.015)         | 7.608 (0.017)        | 7.507 (0.016)        |
| Pop Bias ($\beta_{1g}$)     | 0.82 (0.11)          | 0.62 (0.13)          | 0.7 (0.2)            | 0.74 (0.08)          |

Table: Group-level parameter estimates (SEs) for male MET-Mins using age grouping.
- Expected(?): Constant or monotone estimates across groups.
- Estimates of usual PA and reported PA decrease as age increases.
- Population bias estimates are roughly constant across groups.
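One informal way to check the "roughly constant" reading of the population bias estimates is to compare each pair of age groups with a z-type statistic. The sketch below assumes the four group estimates are independent and approximately normal, which the slides do not claim. The helper name `z_difference` is hypothetical, and the numbers are the male estimates and SEs from the table.

```python
import itertools
import math

# beta_1g estimates and SEs for males, from the table above.
beta1 = {
    "Age 1": (0.82, 0.11),
    "Age 2": (0.62, 0.13),
    "Age 3": (0.70, 0.20),
    "Age 4": (0.74, 0.08),
}


def z_difference(a, b):
    """Difference of two estimates divided by the SE of the difference.

    Assumes the two estimates are independent.
    """
    (est_a, se_a), (est_b, se_b) = a, b
    return (est_a - est_b) / math.sqrt(se_a ** 2 + se_b ** 2)


z_scores = {
    (g, h): z_difference(beta1[g], beta1[h])
    for g, h in itertools.combinations(beta1, 2)
}
for pair, z in z_scores.items():
    print(pair, round(z, 2))
```

The largest gap is between Age 1 and Age 2, at about 1.17 standard errors, so under these assumptions none of the pairwise differences stands out.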
Step 3: Variance Component Estimation
| Interpretation (Parameter)          | Age 1 [21,40), n=163 | Age 2 [40,50), n=121 | Age 3 [50,60), n=142 | Age 4 [60,70], n=135 |
|-------------------------------------|----------------------|----------------------|----------------------|----------------------|
| Usual PA ($\sigma_{tg}^2$)          | 0.0313 (0.0063)      | 0.0197 (0.0055)      | 0.0148 (0.0045)      | 0.0273 (0.0049)      |
| Day to day ($\sigma_{dg}^2$)        | 0.0117 (0.0037)      | 0.017 (0.0055)       | 0.0204 (0.0098)      | 0.0096 (0.0023)      |
| Monitor ME ($\sigma_{ug}^2$)        | 0.0134 (0.004)       | 0.001 (0.0053)       | 0.0038 (0.0082)      | 0.006 (0.0021)       |
| Recall ME ($\sigma_{eg}^2$)         | 0.0189 (0.0036)      | 0.0134 (0.0035)      | 0.017 (0.0063)       | 0.0082 (0.0019)      |
| Person Report ($\sigma_{rg}^2$)     | 0.0068 (0.003)       | 0.0092 (0.0025)      | 0.009 (0.0035)       | 0.0082 (0.0019)      |

Table: Group-level variance component estimates (SEs) for male MET-Mins using age groupings.
- Expected(?): Constant or monotone across groups.
- Large SEs relative to estimates for many variance components.
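To make the size of the SEs concrete, the estimate-to-SE ratio can be tabulated for every variance component. This is simple arithmetic on the male estimates reported in the table; the dictionary layout and the function name `estimate_to_se` are illustrative choices.

```python
# (estimate, SE) per age group, in the order Age 1, Age 2, Age 3, Age 4.
components = {
    "Usual PA":      [(0.0313, 0.0063), (0.0197, 0.0055), (0.0148, 0.0045), (0.0273, 0.0049)],
    "Day to day":    [(0.0117, 0.0037), (0.0170, 0.0055), (0.0204, 0.0098), (0.0096, 0.0023)],
    "Monitor ME":    [(0.0134, 0.0040), (0.0010, 0.0053), (0.0038, 0.0082), (0.0060, 0.0021)],
    "Recall ME":     [(0.0189, 0.0036), (0.0134, 0.0035), (0.0170, 0.0063), (0.0082, 0.0019)],
    "Person Report": [(0.0068, 0.0030), (0.0092, 0.0025), (0.0090, 0.0035), (0.0082, 0.0019)],
}


def estimate_to_se(table):
    """Return estimate / SE for every cell of the table."""
    return {name: [est / se for est, se in cells] for name, cells in table.items()}


ratios = estimate_to_se(components)
for name, values in ratios.items():
    print(name, [round(v, 2) for v in values])
```

The monitor measurement error estimates for Age 2 and Age 3 are smaller than their own SEs (ratios of roughly 0.19 and 0.46), which is where the imprecision is most visible.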
Visual of Variance Components
[Figure: "Group Level Variance Component Comparison, Male Age Groups". Bar chart with error bars in three panels (True Daily EE, Monitor EE, Reported EE). x-axis: Age Groups (<41, 41-50, 51-60, >60); y-axis: 1000*Variance Component; legend (Variance): Usual PA, Day to Day, Monitor ME, Reported ME, Person-Report.]
The height of the error bars represents 2*SE.
- Is there a reason why monitor ME should be larger for males between 21 and 40 than for all other ages?
Step Back
What Methodology Assumed: Normally Distributed Data → MoMs Parameter Estimation → Parameter Estimates (from normal data)
What We Have: Non-Normally Distributed Data → MoMs Parameter Estimation → Parameter Estimates (from actual data)
The Question
- Question: How different are the parameter estimates the methodology assumes (from normally distributed data) and the parameter estimates we have (from non-normally distributed data)?
- If the two sets of parameter estimates are considerably different, then there is evidence that the estimation procedure is sensitive to departures from normality.
  - Concentrate on finding a transformation to normality
- If the two sets of parameter estimates are NOT considerably different, there is evidence that the estimation procedure is NOT sensitive to departures from normality.
  - Think of other reasons why monitor ME (for example) might deviate from what is easily explainable
Simulation Idea
- Problem: We don't have parameter estimates from normally distributed data. We only have parameter estimates from the non-normally distributed data.
- Solution: Do the following:
  1. Simulate “Normally Distributed Data” that is just like the “Non-Normally Distributed Data” we have, only normally distributed.
  2. Run MoMs parameter estimation on the simulated data from the previous step.
- Do the above N times.
- Compare parameter estimates from “Normally Distributed Data” (simulated data) to results produced by “Non-Normally Distributed Data” (actual data).
Simulate Data
Simulate “Normally Distributed Data” just like the “Non-Normally Distributed Data”
$$f(Z_{ig}) = W_{ig} = (x_{i1g}, x_{i2g}, y_{i1g}, y_{i2g})'$$
= data vector for individual i in group g
$$\Phi_g = \frac{1}{n_g} \sum_{i=1}^{n_g} W_{ig}$$
= sample average of vector W for group g
$$\Sigma_g = \frac{1}{n_g - 1} \sum_{i=1}^{n_g} (W_{ig} - \Phi_g)(W_{ig} - \Phi_g)'$$
= sample covariance matrix of vector W for group g
$n_g$ = number of individuals in sample group g
Simulate Data, cont.
For each g = 1, 2, 3, 4, perform the following:
- For each j = 1, 2, ..., N = 1000, perform the following:
  1. Simulate $n_g$ vectors $W_{ig}^{j}$, where $W_{ig}^{j} \sim MVN_4(\Phi_g, \Sigma_g)$, j from 1, 2, ..., N
  2. Let $Z_{ig}^{*j} = f^{-1}(W_{ig}^{j})$, the normally distributed analog to $Z_{ig}$
  3. Estimate $\hat{\theta}_g^{*j}$ with the previously defined method of moments equations
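Steps 1 and 2 of the loop can be sketched in plain Python as follows. This is a minimal illustration, not the code used for the slides: the multivariate normal draw is done through a hand-written Cholesky factor, the function names are hypothetical, and the values of `phi` and `sigma` in the usage lines are made up rather than taken from PAMS.

```python
import math
import random


def cholesky(a):
    """Lower-triangular L with L L' = a, for a symmetric positive-definite a."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def simulate_w(phi, sigma, n, rng):
    """Draw n vectors W ~ MVN(phi, sigma) as phi + L z with z standard normal."""
    L = cholesky(sigma)
    draws = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in phi]
        draws.append([phi[r] + sum(L[r][c] * z[c] for c in range(r + 1))
                      for r in range(len(phi))])
    return draws


def f_inverse(w):
    """Map W = (x1, x2, y1, y2) to Z = (x-bar, y-bar, x1 - x2, y1 - y2)."""
    x1, x2, y1, y2 = w
    return [(x1 + x2) / 2.0, (y1 + y2) / 2.0, x1 - x2, y1 - y2]


# Usage with made-up group moments (illustration only).
phi = [7.8, 7.8, 7.7, 7.7]
sigma = [[0.05, 0.03, 0.02, 0.02],
         [0.03, 0.05, 0.02, 0.02],
         [0.02, 0.02, 0.05, 0.03],
         [0.02, 0.02, 0.03, 0.05]]
rng = random.Random(2012)
z_star = [f_inverse(w) for w in simulate_w(phi, sigma, n=163, rng=rng)]
```

Each simulated data set `z_star` would then be passed to the method of moments estimator, and the whole procedure repeated N times per group.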
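Monte Carlo Average:
$$\bar{\theta}_g^{*} = \frac{1}{N} \sum_{j=1}^{N} \hat{\theta}_g^{*j}$$
Monte Carlo SE:
$$SE(\bar{\theta}_g^{*}) = \sqrt{\mathrm{diag}\left( \frac{1}{N - 1} \sum_{j=1}^{N} (\hat{\theta}_g^{*j} - \bar{\theta}_g^{*})(\hat{\theta}_g^{*j} - \bar{\theta}_g^{*})' \right)}$$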
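Because only the diagonal of the covariance matrix is used, the Monte Carlo SE reduces to the component-wise sample standard deviation of the N simulated estimates. A minimal sketch, with the hypothetical function name `monte_carlo_summary` and toy input:

```python
import math


def monte_carlo_summary(estimates):
    """Component-wise Monte Carlo average and SE.

    estimates: list of N parameter vectors, one per simulated data set.
    Returns (mean, se); se uses the N - 1 divisor, matching the formula above.
    """
    n = len(estimates)
    p = len(estimates[0])
    mean = [sum(e[k] for e in estimates) / n for k in range(p)]
    se = [math.sqrt(sum((e[k] - mean[k]) ** 2 for e in estimates) / (n - 1))
          for k in range(p)]
    return mean, se


# Toy input: three simulated estimates of a two-parameter vector.
mean, se = monte_carlo_summary([[1.0, 10.0], [2.0, 10.0], [3.0, 10.0]])
print(mean, se)
```

Note that this quantity describes the spread of a single simulated estimate across replications; it is not divided by the square root of N.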
Monitor Data
[Figure: two histograms (x-axis: Monitor MM; y-axis: count). Left panel, "Non-Normally Distributed Data": Actual Monitor Data Set, Shapiro-Wilk p-value: 2.1 × 10^-5. Right panel, "Normally Distributed Data": One Simulated Monitor Data Set, Shapiro-Wilk p-value: 0.8605.]
Recall Data
[Figure: two histograms (x-axis: Recall MM; y-axis: count). Left panel, "Non-Normally Distributed Data": Actual Recall Data Set, Shapiro-Wilk p-value: 0. Right panel, "Normally Distributed Data": One Simulated Recall Data Set, Shapiro-Wilk p-value: 0.8428.]
Scatter Plots
[Figure: two scatter plots (x-axis: Monitor MM; y-axis: Recall MM). Left panel, "Non-Normally Distributed Data": Actual Recall vs. Actual Monitor. Right panel, "Normally Distributed Data": Simulated Recall vs. Simulated Monitor.]
Parameter Estimates
| Interpretation (Parameter) | Data Type | Age 1 [21,40), n=163 | Age 2 [40,50), n=121 | Age 3 [50,60), n=142 | Age 4 [60,70], n=135 |
|----------------------------|-----------|----------------------|----------------------|----------------------|----------------------|
| Usual PA ($\mu_g$)         | A         | 7.802 (0.017)        | 7.711 (0.017)        | 7.705 (0.015)        | 7.588 (0.017)        |
|                            | S         | 7.799 (0.022)        | 7.719 (0.021)        | 7.721 (0.021)        | 7.585 (0.021)        |
| Reported PA ($\mu_{yg}$)   | A         | 7.703 (0.017)        | 7.61 (0.015)         | 7.608 (0.017)        | 7.507 (0.016)        |
|                            | S         | 7.704 (0.021)        | 7.606 (0.019)        | 7.609 (0.022)        | 7.501 (0.02)         |
| Pop Bias ($\beta_{1g}$)    | A         | 0.817 (0.111)        | 0.618 (0.128)        | 0.705 (0.199)        | 0.741 (0.08)         |
|                            | S         | 0.856 (0.102)        | 0.541 (0.117)        | 0.767 (0.193)        | 0.749 (0.104)        |

Table: Group-level parameter estimates for male MET-Mins using age grouping. Data Type “A” means “Actual,” “S” means “Simulated.” For the simulated data, Monte Carlo averages (Monte Carlo SEs) are presented.
- Estimation of $\beta_{1g}$ appears sensitive to departures from normality, but look at SEs.
- Underestimation of SEs for $\mu_g$ and $\mu_{yg}$ for actual data
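The "look at SEs" remark can be made concrete by expressing each actual-versus-simulated gap in $\beta_{1g}$ in units of the Monte Carlo SE. Using the Monte Carlo SE as the yardstick is an assumption made here for illustration; the values are taken from the table and the function name `standardized_gap` is hypothetical.

```python
# beta_1g for males: actual estimate, and simulated (Monte Carlo mean, Monte Carlo SE).
beta1 = {
    "Age 1": {"actual": 0.817, "simulated": (0.856, 0.102)},
    "Age 2": {"actual": 0.618, "simulated": (0.541, 0.117)},
    "Age 3": {"actual": 0.705, "simulated": (0.767, 0.193)},
    "Age 4": {"actual": 0.741, "simulated": (0.749, 0.104)},
}


def standardized_gap(actual, sim_mean, sim_se):
    """Actual estimate minus Monte Carlo mean, in Monte Carlo SE units."""
    return (actual - sim_mean) / sim_se


gaps = {
    group: standardized_gap(row["actual"], *row["simulated"])
    for group, row in beta1.items()
}
for group, gap in gaps.items():
    print(group, round(gap, 2))
```

Every gap is smaller than one Monte Carlo SE in absolute value (the largest, for Age 2, is about 0.66), so the apparent differences in $\beta_{1g}$ are within simulation noise under this yardstick.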
Variance Component Estimates
| Interpretation (Parameter)      | Data Type | Age 1 [21,40), n=163 | Age 2 [40,50), n=121 | Age 3 [50,60), n=142 | Age 4 [60,70], n=135 |
|---------------------------------|-----------|----------------------|----------------------|----------------------|----------------------|
| Usual PA ($\sigma_{tg}^2$)      | A         | 0.031 (0.006)        | 0.02 (0.005)         | 0.015 (0.005)        | 0.027 (0.005)        |
|                                 | S         | 0.037 (0.007)        | 0.026 (0.006)        | 0.02 (0.006)         | 0.028 (0.006)        |
| Day to day ($\sigma_{dg}^2$)    | A         | 0.012 (0.004)        | 0.017 (0.006)        | 0.02 (0.01)          | 0.01 (0.002)         |
|                                 | S         | 0.013 (0.004)        | 0.021 (0.009)        | 0.025 (0.04)         | 0.011 (0.004)        |
| Monitor ME ($\sigma_{ug}^2$)    | A         | 0.013 (0.004)        | 0.001 (0.005)        | 0.004 (0.008)        | 0.006 (0.002)        |
|                                 | S         | 0.01 (0.004)         | -0.002 (0.008)       | 0 (0.04)             | 0.006 (0.003)        |
| Recall ME ($\sigma_{eg}^2$)     | A         | 0.019 (0.004)        | 0.013 (0.003)        | 0.017 (0.006)        | 0.008 (0.002)        |
|                                 | S         | 0.015 (0.003)        | 0.015 (0.003)        | 0.023 (0.006)        | 0.009 (0.002)        |
| Person Report ($\sigma_{rg}^2$) | A         | 0.007 (0.003)        | 0.009 (0.003)        | 0.009 (0.004)        | 0.008 (0.002)        |
|                                 | S         | 0.01 (0.004)         | 0.009 (0.003)        | 0.01 (0.004)         | 0.01 (0.003)         |

Table: Group-level variance component estimates for male MET-Mins using age groupings. Data Type “A” means “Actual,” “S” means “Simulated.” For the simulated data, Monte Carlo averages (Monte Carlo SEs) are presented.
- Very similar variance component estimates, across the board
- Same trend across age groups for estimates of monitor ME for both actual and simulated data
Comments/Conclusions
- Conclusion: Departure from normality is not causing the lack of monotone or constant parameter estimates across age groups
- So, do we believe our parameter estimates are correct?
- Knowledge of context / statistical methodology interplay
Thanks