A Bayesian Approach to Model Daily Physical Activity
Dave Osthus and Bryan Stanfill
Center for Survey Statistics and Methodology, Iowa State University
October 10, 2011
Page 1 of 41

Outline

- Physical Activity Measurement Survey (PAMS) Background
- Data Description
- Statement of Problem
- Models
- Model Selection
- Results
- Model Assessment
- Conclusions and Future Work

PAMS Objectives

PAMS is a two-stage survey designed to obtain information on physical activity patterns of:
- Adult women and men (21-70)
- Hispanic and African American populations
- Rural and non-rural adults

PAMS Population

Eligible adults from Black Hawk, Dallas, Marshall, and Polk counties.

Eligibility criteria:
- Not pregnant or lactating
- Able to complete an interview in either English or Spanish
- No physical limitation or medical restrictions preventing the adult from participating in physical activity
- Reside in a household with a land line phone

Stratification and Sampling Design

Stratification:
- Counties were partitioned into a low minority and high minority tract group
- Done to improve chances of recruiting African Americans and Hispanics
- Total of 8 strata

Two stage sampling within strata:
- First stage: Households sampled (sampling frame: listing of land line phone numbers)
- Second stage: Adults within household sampled

Data Collection Process

- Selected eligible individuals wore SenseWear Monitor for 24 hours on two non-consecutive days.
- Minute by minute energy expenditure information is recorded.
- Data collection was evenly distributed over two years, partitioned into eight quarters.

Summary Statistics

Currently, only a portion of the data collected is being utilized.
                   Age                     BMI                     Smokers
                   Q1     Median  Q3       Q1     Median  Q3       Count
Females (n=219)    45.00  54.00   61.00    24.87  29.41   36.04    35
Males (n=175)      38.00  48.00   60.00    26.03  28.96   33.08    41

Definitions

Def: A metabolic equivalent of task, or MET, expresses the energy cost of an activity as a multiple of the resting metabolic rate. 1 MET is one's resting metabolic rate. For example, an activity with a MET value of 3 requires three times the energy that person expends at rest.

Def: MET-minutes are the MET level multiplied by the minutes spent in that physical activity. For example, someone who engaged in a physical activity with a MET value of 3 for 30 minutes accrued 3 × 30 = 90 MET-minutes.

CDC Guidelines

The Centers for Disease Control and Prevention (CDC) states on its website that adults need at least:
1. 150 minutes of moderate-intensity aerobic activity (3-6 METs) every week, or
2. 75 minutes of vigorous-intensity aerobic activity (over 6 METs) every week, or
3. An equivalent mix of moderate- and vigorous-intensity aerobic activity every week.

Key: These minutes of physical activity at the specified intensity levels must be accumulated in intervals of at least 10 continuous minutes, known as bouts.

Operational Definition and Data Examination

Operational definition of the CDC guidelines: On 5 or more days a week, individuals should accrue 90 MET-minutes of physical activity at an intensity of at least 3 METs, observed during bouts of at least 10 minutes. (For instance, 30 minutes at 3 METs on each of 5 days gives 5 × 90 = 450 MET-minutes, which matches 150 minutes per week at the 3-MET moderate threshold.)

[Figure: histogram of daily MET-minutes; x-axis: MET-minutes (0 to 3000), y-axis: count (0 to 200)]

Data Examination: Gender

[Figure: data examined by gender]

Data Examination: BMI

[Figure: data examined by BMI]

Data Examination: Age

[Figure: data examined by age]

Modeling Objectives

Goal: To develop a statistical model that can plausibly “produce” the observed data while also helping us answer three primary questions.
They are:
1. What proportion of Iowans engage in at least one bout of physical activity (PA)?
2. What proportion of Iowans comply with the CDC PA guidelines?
3. What covariates are associated with compliance?

Modeling Considerations

1. MET-minutes are constrained to a portion of the positive real line
2. Spike at 0 MET-minutes
3. Long tail / right-skewed
4. Heterogeneous distributions relative to covariates

[Figure: histogram of daily MET-minutes; x-axis: MET-minutes (0 to 3000), y-axis: count (0 to 200)]

The Model

Bayesian Mixture Model Approach

Let $Y_i$, $i = 1, 2, \dots, n$, be a random variable associated with the MET-minutes of observation $i$. We define $Y_i$ as

$$Y_i = (1 - \pi_i) \cdot 0 + \pi_i W_i,$$

so that $Y_i = 0$ when $\pi_i = 0$, where

$$\pi_i \sim \text{Bernoulli}(p_i) \quad \text{and} \quad W_i \sim \text{Shifted Gamma}(\mu_i, \phi).$$

The Model, continued

More specifically, if $\pi_i \sim \text{Bernoulli}(p_i)$, then

$$g(\pi_i \mid p_i) = p_i^{\pi_i} (1 - p_i)^{1 - \pi_i}$$
$$E(\pi_i) = p_i$$
$$\text{Var}(\pi_i) = p_i (1 - p_i)$$
$$\text{logit}(p_i) \equiv x_i^T \beta_1$$
$$\Rightarrow g(\pi_i \mid \beta_1) = \left( \frac{\exp(x_i^T \beta_1)}{1 + \exp(x_i^T \beta_1)} \right)^{\pi_i} \left( 1 - \frac{\exp(x_i^T \beta_1)}{1 + \exp(x_i^T \beta_1)} \right)^{1 - \pi_i}$$
$$\pi(\beta_1) \propto 1$$

The Model, continued

For the shifted gamma, if $W_i^* \sim \text{Gamma}(\mu_i, \phi)$, then

$$h(w_i^* \mid \mu_i, \phi) = \exp\left\{ \phi \left[ -\frac{1}{\mu_i} w_i^* - \log(\mu_i) \right] + c(w_i^*, \phi) \right\}$$
$$c(w_i^*, \phi) = \phi \log(\phi) - \log(\Gamma(\phi)) + (\phi - 1) \log(w_i^*)$$
$$E(W_i^*) = \mu_i$$
$$\text{Var}(W_i^*) = \frac{\mu_i^2}{\phi}$$
$$\log(\mu_i) \equiv x_i^T \beta_2$$
$$\Rightarrow h(w_i^* \mid \beta_2, \phi) = \exp\left\{ \phi \left[ -\frac{w_i^*}{\exp(x_i^T \beta_2)} - x_i^T \beta_2 \right] + c(w_i^*, \phi) \right\}$$

The Model, continued

Then define $W_i = W_i^* + \delta$ ($\delta = 30$ in our case), where $\delta$ is a fixed, known constant.
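The pieces defined so far are enough to simulate from the model. The sketch below is a minimal illustration in Python with NumPy, not the authors' code. It assumes an observation equals 0 when $\pi_i = 0$ (matching the spike at 0 MET-minutes) and uses the mean/shape form of the gamma, so that the variance is $\mu_i^2 / \phi$. The function name `simulate_met_minutes` and any coefficient values passed to it are hypothetical.

```python
import numpy as np

def simulate_met_minutes(X, beta1, beta2, phi, delta=30.0, rng=None):
    """Draw one MET-minute value per row of X from the two-part model.

    Assumptions (illustrative, not from the slides): X already contains an
    intercept column, and an observation is 0 whenever pi_i = 0.
    """
    rng = np.random.default_rng() if rng is None else rng
    # logit(p_i) = x_i^T beta1, so p_i is the inverse logit
    p = 1.0 / (1.0 + np.exp(-(X @ beta1)))
    # log(mu_i) = x_i^T beta2
    mu = np.exp(X @ beta2)
    # pi_i ~ Bernoulli(p_i)
    pi = rng.random(p.shape[0]) < p
    # Gamma with shape phi and scale mu/phi has mean mu and variance mu^2/phi
    w_star = rng.gamma(shape=phi, scale=mu / phi)
    # W_i = W_i^* + delta when pi_i = 1, otherwise 0
    return np.where(pi, w_star + delta, 0.0)
```

Every nonzero draw is at least $\delta$, which is how the shift keeps positive values away from the spike at 0.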
The density of the shifted variable is

$$h(w_i \mid \beta_2, \phi) = \exp\left\{ \phi \left[ -\frac{w_i - \delta}{\exp(x_i^T \beta_2)} - x_i^T \beta_2 \right] + c(w_i, \phi) \right\}$$
$$c(w_i, \phi) = \phi \log(\phi) - \log(\Gamma(\phi)) + (\phi - 1) \log(w_i - \delta)$$
$$E(W_i) = \mu_i + \delta$$
$$\text{Var}(W_i) = \frac{\mu_i^2}{\phi}$$
$$\pi(\beta_2) \propto 1$$
$$\pi(\phi) = \text{Gamma}(1.001, 1001) \quad \text{(Diffuse Gamma)}$$

Posterior Formulation

Posterior is proportional to prior times likelihood:

$$\pi(\beta_1, \beta_2, \phi \mid y) \propto \pi(\beta_1, \beta_2, \phi) \, L(y \mid \beta_1, \beta_2, \phi)$$

where we assume

$$\pi(\beta_1, \beta_2, \phi) = \pi(\beta_1) \, \pi(\beta_2) \, \pi(\phi)$$

and

$$L(y \mid \beta_1, \beta_2, \phi) = \prod_{i=1}^{n} L_i(\beta_1, \beta_2, \phi \mid y_i)$$

$$L_i(\beta_1, \beta_2, \phi \mid y_i) = \left( 1 - \frac{\exp(x_i^T \beta_1)}{1 + \exp(x_i^T \beta_1)} \right) I(y_i = 0) + \frac{\exp(x_i^T \beta_1)}{1 + \exp(x_i^T \beta_1)} \exp\left\{ \phi \left[ -\frac{w_i - \delta}{\exp(x_i^T \beta_2)} - x_i^T \beta_2 \right] + c(w_i, \phi) \right\} I(y_i \neq 0)$$

with $w_i = y_i$ whenever $y_i \neq 0$.

Fitting the Models

- We consider (most) combinations of four covariates: BMI, Age, Gender, Smoking Status
- To more easily automate the model fitting process, we only consider models whose design matrices $X_i$ have dimension $n \times p$, for $i = 1, 2$ and $p \in \{2, 3\}$, giving a parameter space of 5 or 7 dimensions ($2p$ regression coefficients plus $\phi$)
- MCMCpack was used to sample from each joint posterior using a standard Metropolis algorithm
- Chains were spot checked to confirm that the limiting distributions were reached and that mixing was adequate
- Now that we have 52 fitted models, how do we decide which is best?

Bayesian Model Selection

A little notation:
- Let $M = \{M_1, \dots, M_m\}$ be a collection of possible models
- Denote the marginal likelihood of the data, $y$, given model $i$ as $f(y \mid M_i)$, where

$$f(y \mid M_i) = \int f(y \mid \theta_i, M_i) \, \pi(\theta_i \mid M_i) \, d\theta_i.$$
- The posterior probability of model $i$ is given by

$$P(M_i \mid y) = \frac{f(y \mid M_i) \, \pi(M_i)}{\sum_{j=1}^{m} f(y \mid M_j) \, \pi(M_j)}$$

  where $\pi(M_i)$ is the prior probability that $M_i$ is the best model (usually $1/m$)

Model Selection, Continued

- One can use this to find the model that maximizes $P(M_i \mid y)$ and use it alone, or one can integrate over the whole model space $M$ to incorporate the uncertainty in the model selection process
- The latter option is called Bayesian Model Averaging (BMA) and produces “more correct” estimates
- The quantity of interest must be shared by all models, or the integration won’t make sense
- We consider $P(\text{Bout}) \equiv P(Y_i \neq 0)$ and $P(\text{Compliance}) \equiv P(Y_i \geq 90)$

Bayes Model Averaging

To compute a BMA estimate for $\beta$, complete the following process:
1. For $i = 1, \dots, m$:
   - Estimate the quantity of interest for each model: $\hat{\beta}_i$
   - Compute the “probability of the model”: $P(M_i \mid y)$
2. Take a weighted average of the estimates:

$$\hat{\beta}_{BMA} = \sum_{i=1}^{m} P(M_i \mid y) \, \hat{\beta}_i$$

In our case we are interested in a 95% interval for P(Bout) and P(Compliance), so we estimate the 2.5% and 97.5% quantiles for each probability in each model and then take the (weighted) average.
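The averaging step can be sketched in a few lines. This is a minimal Python/NumPy illustration under stated assumptions, not the authors' implementation: it assumes the log marginal likelihoods $\log f(y \mid M_i)$ have already been computed by some other means (the slides do not say how), and the function names `posterior_model_probs` and `bma_estimate` are hypothetical.

```python
import numpy as np

def posterior_model_probs(log_marginal, prior=None):
    """P(M_i | y) from log marginal likelihoods, computed stably on the log scale.

    `log_marginal` is assumed to be precomputed; `prior` defaults to 1/m.
    """
    log_marginal = np.asarray(log_marginal, dtype=float)
    m = log_marginal.size
    prior = np.full(m, 1.0 / m) if prior is None else np.asarray(prior, dtype=float)
    log_w = log_marginal + np.log(prior)
    # Subtract the maximum before exponentiating to avoid underflow
    log_w -= log_w.max()
    w = np.exp(log_w)
    return w / w.sum()

def bma_estimate(estimates, probs):
    """Weighted average of per-model estimates (one row or entry per model)."""
    return np.asarray(probs, dtype=float) @ np.asarray(estimates, dtype=float)
```

Passing the per-model 2.5% and 97.5% quantiles as two columns of `estimates` would give the averaged interval endpoints described above. Working on the log scale matters here because marginal likelihoods for several hundred observations are typically far too small to exponentiate directly.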
Bayes Model Averaging Results

A plot of $P(M_i \mid y)$ for models 1-52:

[Figure: posterior model probabilities P(M_i | y) for models 1-52]

The averaged estimates and intervals are in the following table:

                 Mean   95% CI
P(Bout)          0.67   (0.12, 0.97)
P(Compliance)    0.52   (0.05, 0.86)

Results From Selected Model

- Since $P(M_{24} \mid y) = 99.97\%$, we can use model 24 without much loss of accuracy
- This model takes BMI and Gender as covariates in both regressions
- A table of the parameter estimates for this model is below

Parameter   Mean     SD      95% CI
β_10         0.323   0.120   (0.090, 0.561)
β_BMI1      -0.143   0.015   (-0.173, -0.116)
β_Male1      1.109   0.194   (0.731, 1.488)
β_20         5.099   0.081   (4.943, 5.261)
β_BMI2      -0.094   0.008   (-0.110, -0.078)
β_Male2      0.792   0.103   (0.587, 0.992)
φ            0.912   0.054   (0.808, 1.020)

Interpretation of Parameters

- According to our model, we expect 67% (64%, 69%) of Iowans to engage in some PA, but only 52% (49%, 55%) to comply with federal guidelines
- For a given gender, the odds of engaging in 0 bouts for someone with a BMI of x+1 are 1.15 times the odds for someone with a BMI of x
- For a given BMI, the odds of a male engaging in at least one bout are 3.03 times those of a female
- For a given gender, every one-unit increase in BMI corresponds to a 9% decrease in mean MET-minutes per day
- On average, a male accrues 120% more MET-minutes than a female at the same BMI

Estimates of Daily Probabilities

[Figure: estimated probability versus BMI (20 to 55); panels: P(Bout>0) and P(Compliance); series by gender (Female, Male); legend: Dash = Mean, Band]

Comparison of Weekly and Daily Compliance Rates

[Figure: P(Daily Compliance) and P(Weekly Compliance) versus BMI (20 to 55); panels: Female and Male; y-axis: probability; legend: Dash = Mean, Band]

Model Assessment: Gender

[Figure: histograms of real and simulated data; panels: Female and Male; x-axis: Data (0 to 5000), y-axis: count]

On average, 33% of the simulated population engages in no PA; the dataset has the same proportion of zeros.

Model Assessment: Age

[Figure: histograms of real and simulated data by age group: [21,42) (n=157), [42,52) (n=167), [52,62) (n=178), [62,70] (n=156); x-axis: Data (0 to 3500), y-axis: count]

Model Assessment: BMI

[Figure: histograms of real and simulated data by BMI group: [0,25) (n=149), [25,29) (n=163), [29,34) (n=175), 34 and over (n=171); x-axis: Data (0 to 5000), y-axis: count]

Conclusions

Recapping:
- Q: What proportion of Iowans engage in at least one bout of PA? A: 67% (64%, 69%)
- Q: What proportion of Iowans comply with the CDC PA guidelines (daily compliance)? A: 52% (49%, 55%)
- Q: What covariates are associated with compliance? A: BMI and Gender

A Few Key Assumptions

- Days are independent
- Observations on the same person are independent
- Each individual’s Gamma distribution has the same J-shape
- Survey weights are negligible

Future Work

There are several things we plan to do from here:
- Add more covariates to our data set
- Consider other models and covariate combinations
- Use survey statistics
- Add random variables for multiple observations

Thanks! Questions?
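Supplementary check: the multipliers quoted on the Interpretation of Parameters slide follow from exponentiating the posterior means in the parameter table, because the model uses a logit link for $p_i$ and a log link for $\mu_i$. The short Python sketch below reproduces that arithmetic. The dictionary keys and variable names are hypothetical labels, and the calculation uses posterior means only, which is an assumption about how the slide's figures were obtained.

```python
import math

# Posterior means for model 24, copied from the parameter table
post_mean = {"bmi_1": -0.143, "male_1": 1.109, "bmi_2": -0.094, "male_2": 0.792}

# Logit link: odds of zero bouts are (1 - p) / p = exp(-x^T beta1),
# so one extra BMI unit multiplies those odds by exp(-beta_BMI1)
odds_zero_per_bmi = math.exp(-post_mean["bmi_1"])

# Odds of at least one bout, male relative to female at the same BMI
odds_bout_male = math.exp(post_mean["male_1"])

# Log link: relative change in mu_i (the gamma mean before the shift delta)
mean_change_per_bmi = math.exp(post_mean["bmi_2"]) - 1.0
mean_change_male = math.exp(post_mean["male_2"]) - 1.0

print(round(odds_zero_per_bmi, 2), round(odds_bout_male, 2))
print(round(mean_change_per_bmi, 2), round(mean_change_male, 2))
```

Note that the two mean effects apply to $\mu_i$; since $E(W_i) = \mu_i + \delta$, the relative change in expected MET-minutes on active days is slightly smaller in magnitude.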