A Bayesian Approach to Model Daily Physical Activity October 10, 2011

advertisement
A Bayesian Approach to Model Daily Physical
Activity
Dave Osthus and Bryan Stanfill
Center for Survey Statistics and Methodology, Iowa State University
October 10, 2011
Page 1 of 41
Outline
Physical Activity Measurement Survey (PAMS) Background
Data Description
Statement of Problem
Models
Model Selection
Results
Model Assessment
Conclusions and Future Work
Page 2 of 41
PAMS Objectives
PAMS is a two-stage survey designed to obtain information on
physical activity patterns of
Adult women and men (21-70)
Hispanic and African American populations
Rural and non-rural adults
Page 3 of 41
PAMS Population
Eligible adults from
Black Hawk, Dallas,
Marshall, and Polk
counties.
Eligibility criteria:
Not pregnant or lactating
Able to complete an interview in either English or Spanish
No physical limitation or medical restrictions preventing the
adult from participating in physical activity
Reside in a household with a land line phone
Page 4 of 41
Stratification and Sampling Design
Stratification
Counties were partitioned into a low minority and high
minority tract group
Done to improve chances of recruiting African Americans and
Hispanics
Total of 8 strata
Two stage sampling within strata
First stage: Households sampled (sampling frame: listing of
land line phone numbers)
Second stage: Adults within household sampled
Page 5 of 41
Data Collection Process
Selected eligible individuals wore SenseWear Monitor for 24
hours on two non-consecutive days.
Minute by minute energy expenditure information is recorded.
Data collection was evenly distributed over two years,
partitioned into eight quarters.
Page 6 of 41
Summary Statistics
Currently, only a portion of the data collected is being utilized.
Variable
Age
BMI
Smokers
Statistic
Q1
Median
Q3
Q1
Median
Q3
Count
Females (n=219)
45.00
54.00
61.00
24.87
29.41
36.04
35
Males (n=175)
38.00
48.00
60.00
26.03
28.96
33.08
41
Page 7 of 41
Definitions
Def: A metabolic equivalent of a task, or MET, is a
concept expressing energy cost as a multiple of resting
metabolic heart rate. 1 MET is considered one’s resting
metabolic rate.
For example, engaging in an activity with a MET value of 3
would require three times the energy that person consumes at
rest.
Def: MET-minutes are simply the MET levels multiplied by
the minutes engaged in that physical activity.
For example, someone who engaged in a physical activity with
a MET value of 3 for 30 minutes engaged in a 3 ∗ 30 = 90
MET-minute physical activity.
Page 8 of 41
CDC Guidelines
The Center for Disease Control states on their website that
adults need at least:
1
2
3
150 minutes of moderate-intensity aerobic activity (3-6 METs)
every week.
75 minutes of vigorous-intensity aerobic activity (over 6
METs) every week.
An equivalent mix of moderate- and vigorous-intensity aerobic
activity every week.
Key: These minutes of physical activity of specified intensity
levels must come in at least 10 continuous minute intervals,
known as bouts.
Page 9 of 41
Operational Definition and Data Examination
Operational Definition of CDC guidelines:
On 5 or more days a week, individuals should engage in 90
MET-minutes of physical activity at an intensity of at least 3
METs observed during bouts of at least 10 minutes.
Daily MET−minutes
200
count
150
100
50
0
0
500
1000
1500
2000
2500
3000
MET−minutes
Page 10 of 41
Data Examination: Gender
Page 11 of 41
Data Examination: BMI
Page 12 of 41
Data Examination: Age
Page 13 of 41
Modeling Objectives
Goal:
Our goal is to develop a statistical model that can plausibly
“produce” the observed data while at the same time, aides us
in answer three primary questions. They are:
1
2
3
What proportion of Iowans engage in at least one bout of
physical activity (PA)?
What proportion of Iowans comply with the CDC PA
guidelines?
What covariates are associated with compliance?
Page 14 of 41
Modeling Considerations
Modeling considerations:
1 MET-minutes are constrained to a portion of the positive
real-line
2 Spike at 0 MET-minutes
3 Long tail / right-skewed
4 Heterogeneous distributions relative to covariates
Daily MET−minutes
200
count
150
100
50
0
0
500
1000
1500
2000
2500
3000
MET−minutes
Page 15 of 41
The Model
Bayesian Mixture Model Approach
Let Yi , i = 1, 2, . . . , n, be a random variable associated with the
MET-minutes of observation i. We will define Yi as follows:
Yi = (1 − πi ) + πi Wi
where,
πi ∼ Bernoulli(pi )
and,
Wi ∼ Shifted Gamma(µi , φ)
Page 16 of 41
The Model, continued
More specifically, if πi ∼ Bernoulli(pi ), then
g (πi |pi ) = piπi (1 − pi )1−πi
E (πi ) = pi
Var (πi ) = pi (1 − pi )
logit(pi ) ≡ xT
i β1
⇒ g (πi |β1 ) =
exp(xT
i β1 )
1 + exp(xT
i β1 )
!πi
exp(xT
i β1 )
1−
1 + exp(xT
i β1 )
!1−πi
π(β1 ) ∝ 1
Page 17 of 41
The Model, continued
For the shifted gamma, if Wi∗ ∼ Gamma(µi , φ) and
1 ∗
w − log (µi )] + c(wi∗ , φ)}
µi i
c(wi∗ , φ) = φlog (φ) − log (Γ(φ)) + (φ − 1)log (wi∗ )
h(wi∗ |µi , φ) = exp{φ[−
E (Wi∗ ) = µi
µ2i
Var (Wi∗ ) =
φ
log (µi ) ≡ xT
i β2
( "
#
)
1
∗
⇒ h(wi∗ |β2 , φ) = exp φ −
wi∗ − xT
i β2 + c(wi , φ)
exp(xT
β
)
2
i
Page 18 of 41
The Model, continued
Then define Wi = Wi∗ + δ, (δ = 30 in our case), where δ is a fixed,
known constant.
( "
#
)
(wi − δ)
h(wi |β2 , φ) = exp φ −
− xT β2 + c(wi , φ)
exp(xT β2 )
c(wi , φ) = φlog (φ) − log (Γ(φ)) + (φ − 1)log (wi − δ)
E (Wi ) = µi + δ
µ2i
Var (Wi ) =
φ
π(β2 ) ∝ 1
π(φ) = Gamma(1.001, 1001) (Diffuse Gamma)
Page 19 of 41
Posterior Formulation
Posterior is proportional to prior times likelihood:
π(β1 , β2 , φ|y) ∝ π(β1 , β2 , φ)L(y|β1 , β2 , φ)
where we assume,
π(β1 , β2 , φ) = π(β1 )π(β2 )π(φ)
and,
L(y|β1 , β2 , φ)
=
n
Y
Li (β1 , β2 , φ|yi )
i=1
Li (β1 , β2 , φ|yi )
=
+
!
exp(xT
i β1 )
I(yi = 0)
1−
1 + exp(xT
i β1 )
( "
#
)
(wi − δ)
T
exp φ −
− xi β2 + c(wi , φ) I(yi 6= 0)
exp(xT
i β2 )
Page 20 of 41
Fitting the Models
We consider (most) combinations of four covariates: BMI,
Age, Gender, Smoking Status
To more easily automate the model fitting process we only
consider models of the form Xi,n×p for i = 1, 2 and p ∈ {2, 3}
giving a parameter space of 5 or 7 dimensions
MCMCpack was used to sample from each joint posterior
using a standard Metropolis algorithm
Chains were spot checked to confirm the limiting distributions
were reached and desirable mixing was achieved
Now that we have 52 fitted models, how do we decide which
is best?
Page 21 of 41
Bayesian Model Selection
A little notation:
Let M = {M1 , . . . , Mm } be a collection of possible models
Denote the marginal likelihood of the data, y, given model i
as f (y|Mi ) where
Z
f (y|Mi ) = f (y|θi , Mi )π(θi |Mi )dθi .
The posterior probability of model i is given by
f (y|Mi )π(Mi )
P(Mi |y) = Pm
j=1 f (y|Mj )π(Mj )
where π(Mi ) is the prior probability Mi is the best model
(usually 1/m)
Page 22 of 41
Model Selection, Continued
One can use this to find the model that maximizes P(Mi |y)
and use it, or one can integrate over the whole model space
M to incorporate the uncertainty in the model selection
process
The latter option is called Bayesian Model Averaging and
produces “more correct” estimates
The quantity of interest must be shared by all models or the
integration won’t make sense
We consider P(Bout) ≡ P(Yi 6= 0) and P(Compliance) ≡
P(Yi ≥ 90)
Page 23 of 41
Bayes Model Averaging
To compute a BMA estimate for β complete the following process:
For i = 1, . . . , m:
Estimate the quantity of interest for each model: β̂i
Compute the “probability of the model”: P(Mi |y)
Take a weighted
average the estimates
Pm
β̂BMA = i=1 P(Mi |y)β̂i
In our case we are interested in a 95% interval for P(Bout) and
P(Compliance) so we estimate the 2.5% and 97.5% quantiles for
each probability in each model then take the (weighted) average.
Page 24 of 41
Bayes Model Averaging Results
A plot of the P(Mi |y) for models 1-52 is below:
The averaged estimates and intervals are in the following table:
P(Bout)
P(Compliance)
Mean
0.67
0.52
95% CI
(0.12, 0.97)
(0.05, 0.86)
Page 25 of 41
Results From Selected Model
Since P(M24 |y) = 99.97% we can use model 24 without much
loss of accuracy
This model takes BMI and Gender as covariates in both
regressions
A table of the parameter estimates for this model is below
Parameter
β10
βBMI 1
βMale1
β20
βBMI 2
βMale2
φ
Mean
0.323
-0.143
1.109
5.099
-0.094
0.792
0.912
SD
0.120
0.015
0.194
0.081
0.008
0.103
0.054
95% CI
(0.090, 0.561)
(-0.173, -0.116)
(0.731, 1.488)
(4.943, 5.261)
(-0.110, -0.078)
(0.587, 0.992)
(0.808, 1.020)
Page 26 of 41
Interpretation of Parameters
According to our model we expect 67% (64%, 69%) of Iowans
engage in some PA but only 52% (49%,55%) comply with
federal guidelines
For a given gender the odds of engaging in 0 bouts are 1.15
times greater for someone with a BMI of x+1 relative to a
BMI of x
For a given BMI the odds a male engaging in at least one
bout are 3.03 times greater than that of a female
For a given gender every one unit increase in BMI corresponds
to 9% decrease in mean MET-minutes per day
On average a male accrues 120% more MET-minutes than a
female at the same BMI
Page 27 of 41
Interpretation of Parameters
According to our model we expect 67% (64%, 69%) of Iowans
engage in some PA but only 52% (49%,55%) comply with
federal guidelines
For a given gender the odds of engaging in 0 bouts are 1.15
times greater for someone with a BMI of x+1 relative to a
BMI of x
For a given BMI the odds a male engaging in at least one
bout are 3.03 times greater than that of a female
For a given gender every one unit increase in BMI corresponds
to 9% decrease in mean MET-minutes per day
On average a male accrues 120% more MET-minutes than a
female at the same BMI
Page 28 of 41
Interpretation of Parameters
According to our model we expect 67% (64%, 69%) of Iowans
engage in some PA but only 52% (49%,55%) comply with
federal guidelines
For a given gender the odds of engaging in 0 bouts are 1.15
times greater for someone with a BMI of x+1 relative to a
BMI of x
For a given BMI the odds a male engaging in at least one
bout are 3.03 times greater than that of a female
For a given gender every one unit increase in BMI corresponds
to 9% decrease in mean MET-minutes per day
On average a male accrues 120% more MET-minutes than a
female at the same BMI
Page 29 of 41
Interpretation of Parameters
According to our model we expect 67% (64%, 69%) of Iowans
engage in some PA but only 52% (49%,55%) comply with
federal guidelines
For a given gender the odds of engaging in 0 bouts are 1.15
times greater for someone with a BMI of x+1 relative to a
BMI of x
For a given BMI the odds a male engaging in at least one
bout are 3.03 times greater than that of a female
For a given gender every one unit increase in BMI corresponds
to 9% decrease in mean MET-minutes per day
On average a male accrues 120% more MET-minutes than a
female at the same BMI
Page 30 of 41
Interpretation of Parameters
According to our model we expect 67% (64%, 69%) of Iowans
engage in some PA but only 52% (49%,55%) comply with
federal guidelines
For a given gender the odds of engaging in 0 bouts are 1.15
times greater for someone with a BMI of x+1 relative to a
BMI of x
For a given BMI the odds a male engaging in at least one
bout are 3.03 times greater than that of a female
For a given gender every one unit increase in BMI corresponds
to 9% decrease in mean MET-minutes per day
On average a male accrues 120% more MET-minutes than a
female at the same BMI
Page 31 of 41
Estimates of Daily Probabilities
P(Bout>0)
P(Compliance)
0.8
0.6
Gender
Probability
Female
Male
Dash
Mean
0.4
Band
0.2
0.0
20
25
30
35
40
45
50
55
20
25
30
35
40
45
50
55
BMI
Page 32 of 41
Comparison of Weekly and Daily Compliance Rates
Female
Male
0.8
0.6
Type
Probability
P(Daily Compliance)
P(Weekly Compliance)
Dash
Mean
Band
0.4
0.2
0.0
20
25
30
35
40
45
50
55
20
25
30
35
40
45
50
55
BMI
Page 33 of 41
Model Assessment: Gender
Female
Male
150
100
Real
count
50
0
150
100
Simulated
50
0
0
1000
2000
3000
4000
5000
0
1000
2000
3000
4000
5000
Data
On average we simulate 33% of the population engages in no PA,
the dataset has the same proportion of zeros.
Page 34 of 41
Model Assessment: Age
[21,42) (n=157)
[42,52) (n=167)
[52,62) (n=178)
[62,70](n=156)
60
50
40
Real
30
20
count
10
0
60
50
Simulated
40
30
20
10
0
0
500 1000 1500 2000 2500 3000 3500
0
500 1000 1500 2000 2500 3000 3500
0
500 1000 1500 2000 2500 3000 3500
0
500 1000 1500 2000 2500 3000 3500
Data
Page 35 of 41
Model Assessment: BMI
[0,25) (n=149)
[25,29) (n=163)
[29,34) (n=175)
BMI 34 and Over (n=171)
100
80
Real
60
40
count
20
0
100
80
Simulated
60
40
20
0
0
1000
2000
3000
4000
5000
0
1000
2000
3000
4000
5000
0
1000
2000
3000
4000
5000
0
1000
2000
3000
4000
5000
Data
Page 36 of 41
Conclusions
Recapping:
Q: What proportion of Iowans engage in at least one bout of
PA?
A: 67% (64%,69%)
Q: What proportion of Iowans comply with the CDC PA
guidelines (daily compliance)?
A: 52% (49%,55%)
What covariates are associated with compliance?
A: BMI and Gender
Page 37 of 41
A Few Key Assumptions
Days are independent
Observations on the same person are independent
Each individual’s Gamma distribution has the same J-shape
Survey weights are negligible
Page 38 of 41
Future Work
There are several things we plan to do from here:
Add more covariates to our data set
Consider other models and covariate combinations
Use survey statistics
Add random variables for multiple observations
Page 39 of 41
Thanks! Questions?!
Page 40 of 41
Download