Methodological Overview and Assessment of the PAMS Project
Dave Osthus
October 29, 2012

Outline
- PAMS Survey
- PAMS Methodology
- Results
- Simulation Idea
- Comments/Conclusions

PAMS Survey
The Physical Activity Measurement Survey (PAMS) is a survey designed to obtain information on the physical activity patterns of:
- Adult women and men (ages 21-70)
- Hispanic and African American populations (limited sample size)
Individuals were sampled from four counties in Iowa: Marshall, Black Hawk, Dallas, and Polk.

PAMS Objectives
- Goal: recruit 1200 participants into the survey (recruitment spanned two years)
- Approximately equal numbers of males and females
- Approximately 10% African American and 10% Hispanic

Achieved sample, total (% of the 1346 participants):
  Females            785 (58.3)
  Males              561 (41.7)
  African Americans  114 (8.5)
  Hispanics           53 (3.9)

Sampling and Data Collection from Individuals
Sampling
- Systematic sampling of households
- Simple random sampling of individuals within households
What was asked of sampled individuals?
- Individuals wore a SenseWear monitor for 24 hours.
  - It measures heart rate, heat flux, acceleration (motion, steps), etc.
  - A proprietary algorithm outputs energy expenditure, MET-minutes, time spent at different intensity levels, etc.
- A 24-hour activity recall was administered by phone the following day.
  - From it we can calculate energy expenditure, MET-minutes, time spent at different intensity levels, etc.
- Both were done on two non-consecutive days.

PAMS Methodology
The interest is in estimating usual (i.e., long-run average) activity distributions.
Beyler (2010) methodology:
1. Transform monitor and recall MET-minutes (MM) to approximate normality.
2. Remove variation due to nuisance factors (e.g., season of year, first or second day).
3. Estimate group-level (e.g., gender/age) parameters of the measurement error models via the method of moments.
4. Smooth the estimates with a population model describing relationships among group parameters.
5. Back-transform usual activity distributions to the original scale.

Model for Monitor 24-hr MM on the Normal Scale, $X_{ij}$ (the group subscript $g$ is suppressed on all parameters and random variables)
$X_{ij} = \mu + t_i + d_{ij} + u_{ij}$
- $X_{ij}$ = monitor MM for individual $i$ on day $j$ from an unbiased reference instrument
- $\mu$ = mean daily MM
- $t_i$ = individual $i$'s mean deviation from the mean usual daily MM
- $d_{ij}$ = individual $i$'s deviation from his or her mean daily MM on day $j$
- $u_{ij}$ = random measurement error for individual $i$ on day $j$
$t_i \sim N(0, \sigma_t^2)$, $d_{ij} \sim N(0, \sigma_d^2)$, $u_{ij} \sim N(0, \sigma_u^2)$, $E(X_{ij} \mid i) = \mu + t_i$
$\mathrm{Var}(X_{ij}) = \sigma_t^2 + \sigma_d^2 + \sigma_u^2$ (assuming uncorrelated random effects)

Model for Recall 24-hr MM on the Normal Scale, $Y_{ij}$ (the group subscript $g$ is suppressed on all parameters and random variables)
$Y_{ij} = \mu_y + \beta_1 (t_i + d_{ij}) + r_i + e_{ij}$
- $Y_{ij}$ = recall MM for individual $i$ on day $j$ from the self-report instrument
- $\mu_y - \mu$ = bias for mean usual MM
- $\beta_1$ = linear bias related to true activity level
- $r_i$ = individual $i$'s deviation from the self-reported mean
- $e_{ij}$ = recall measurement error for individual $i$ on day $j$
- $\mu_y + \beta_1 (t_i + d_{ij})$ = systematic bias component
- $r_i + e_{ij}$ = random measurement error
$r_i \sim N(0, \sigma_r^2)$, $e_{ij} \sim N(0, \sigma_e^2)$
$\mathrm{Var}(Y_{ij}) = \beta_1^2 \sigma_t^2 + \beta_1^2 \sigma_d^2 + \sigma_r^2 + \sigma_e^2$ (assuming uncorrelated random effects)

Notation
$Z_i = (\bar{x}_{i\cdot},\ \bar{y}_{i\cdot},\ x_{i1} - x_{i2},\ y_{i1} - y_{i2})'$, where $\bar{x}_{i\cdot} = (x_{i1} + x_{i2})/2$ and $\bar{y}_{i\cdot} = (y_{i1} + y_{i2})/2$
$E(Z_i) = (\mu,\ \mu_y,\ 0,\ 0)'$
$$\mathrm{Var}(Z_i) = \begin{pmatrix} \sigma_t^2 + \tfrac{1}{2}(\sigma_d^2 + \sigma_u^2) & \beta_1(\sigma_t^2 + \tfrac{1}{2}\sigma_d^2) & 0 & 0 \\ & \beta_1^2(\sigma_t^2 + \tfrac{1}{2}\sigma_d^2) + \sigma_r^2 + \tfrac{1}{2}\sigma_e^2 & 0 & 0 \\ & & 2(\sigma_d^2 + \sigma_u^2) & 2\beta_1\sigma_d^2 \\ \text{sym} & & & 2(\beta_1^2\sigma_d^2 + \sigma_e^2) \end{pmatrix}$$

Notation, cont.
$\mathbf{m}_1 = (m_1,\ m_2,\ 0,\ 0)'$, with $m_1 = \sum_{i=1}^{n} w_i \bar{x}_{i\cdot} / \sum_{i=1}^{n} w_i$ and $m_2 = \sum_{i=1}^{n} w_i \bar{y}_{i\cdot} / \sum_{i=1}^{n} w_i$
$$\mathbf{m}_2 = \begin{pmatrix} m_{11} & m_{12} & 0 & 0 \\ & m_{22} & 0 & 0 \\ & & m_{33} & m_{34} \\ \text{sym} & & & m_{44} \end{pmatrix} = \frac{\sum_{i=1}^{n} w_i (Z_i - \bar{Z})(Z_i - \bar{Z})'}{\sum_{i=1}^{n} w_i}$$
- $w_i$ is the survey weight for individual $i$
- The estimating equations are then $\mathbf{m}_1 = E(Z_i)$ and $\mathbf{m}_2 = \mathrm{Var}(Z_i)$.

Estimating Model Parameters
- Method of moments
- 8 population parameters per group: $\theta_g = (\mu_g, \mu_{yg}, \beta_{1g}, \sigma_{tg}^2, \sigma_{dg}^2, \sigma_{ug}^2, \sigma_{rg}^2, \sigma_{eg}^2)'$ (Beyler, 2010)
  - The 2 mean equations and the 6 nonzero entries of $\mathrm{Var}(Z_i)$ give 8 equations in these 8 unknowns.
- A Taylor series approximation is used to derive the variance matrix for $\hat{\theta}_g$.

Steps 1 and 2: Transform to Normality and Nuisance Regression
- Males / age groups
- Considered passably normal if the Shapiro-Wilk p-value is greater than 0.10
- The same transformation must be used for monitor and recall data
- No power transformation proved sufficient
- A log transformation was chosen
[Figure: Normal Q-Q plots (sample vs. theoretical quantiles) of male data. Monitor: Shapiro-Wilk p-value 2.1e-05. Recall: Shapiro-Wilk p-value 0.]
These Q-Q plots are of the data AFTER the nuisance effects have been removed (Spring and Fall for monitor; Rep for recall).

Step 3: Parameter Estimates
Table: Group-level parameter estimates (SEs) for male MET-minutes using age groupings.
| Interpretation (Parameter) | Age 1 [21, 40), n=163 | Age 2 [40, 50), n=121 | Age 3 [50, 60), n=142 | Age 4 [60, 70], n=135 |
| Usual PA ($\mu_g$) | 7.802 (0.017) | 7.711 (0.017) | 7.705 (0.015) | 7.588 (0.017) |
| Reported PA ($\mu_{yg}$) | 7.703 (0.017) | 7.61 (0.015) | 7.608 (0.017) | 7.507 (0.016) |
| Pop. bias ($\beta_{1g}$) | 0.82 (0.11) | 0.62 (0.13) | 0.7 (0.2) | 0.74 (0.08) |
- Expected(?): constant or monotone estimates across groups.
- Estimates of usual PA and reported PA decrease as age increases.
- Population bias estimates are roughly constant across groups.
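Because the 8 moment equations have 8 unknowns, they can be solved in closed form: $\hat\sigma_t^2 = m_{11} - m_{33}/4$, $\hat\beta_1 = (m_{12} - m_{34}/4)/\hat\sigma_t^2$, $\hat\sigma_d^2 = m_{34}/(2\hat\beta_1)$, $\hat\sigma_u^2 = m_{33}/2 - \hat\sigma_d^2$, $\hat\sigma_e^2 = (m_{44} - \hat\beta_1 m_{34})/2$, $\hat\sigma_r^2 = m_{22} - \hat\beta_1 m_{12} - \hat\sigma_e^2/2$. The Python/NumPy sketch below illustrates that algebra, assuming the data are already log-transformed and nuisance-adjusted; it is not the PAMS or Beyler (2010) code. The helper names `simulate_group` and `mom_estimates` are hypothetical, the parameter values in the usage lines are illustrative only (not PAMS estimates), and the smoothing step and Taylor-series SEs are omitted. Note that nothing constrains the variance estimates to be nonnegative.

```python
import numpy as np


def simulate_group(n, mu, mu_y, beta1, s2_t, s2_d, s2_u, s2_r, s2_e, rng):
    """Hypothetical helper: draw two days of monitor (x) and recall (y)
    data from the measurement error models on the normal scale."""
    t = rng.normal(0.0, np.sqrt(s2_t), size=(n, 1))  # person effect t_i
    d = rng.normal(0.0, np.sqrt(s2_d), size=(n, 2))  # day-to-day d_ij
    u = rng.normal(0.0, np.sqrt(s2_u), size=(n, 2))  # monitor error u_ij
    r = rng.normal(0.0, np.sqrt(s2_r), size=(n, 1))  # person report r_i
    e = rng.normal(0.0, np.sqrt(s2_e), size=(n, 2))  # recall error e_ij
    x = mu + t + d + u                   # X_ij = mu + t_i + d_ij + u_ij
    y = mu_y + beta1 * (t + d) + r + e   # Y_ij = mu_y + b1 (t_i + d_ij) + r_i + e_ij
    return x, y


def mom_estimates(x, y, w=None):
    """Hypothetical helper: solve m1 = E(Z_i), m2 = Var(Z_i) for theta_g.

    x, y: arrays of shape (n, 2) with day-1 and day-2 values per person.
    w: optional survey weights of length n (equal weights if None).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.ones(x.shape[0]) if w is None else np.asarray(w, dtype=float)
    # Z_i = (xbar_i, ybar_i, x_i1 - x_i2, y_i1 - y_i2)'
    z = np.column_stack([x.mean(axis=1), y.mean(axis=1),
                         x[:, 0] - x[:, 1], y[:, 0] - y[:, 1]])
    zbar = w @ z / w.sum()                     # weighted means (m_1, m_2, ., .)
    dev = z - zbar
    m = (w[:, None] * dev).T @ dev / w.sum()  # weighted 4x4 moment matrix
    m11, m12, m22 = m[0, 0], m[0, 1], m[1, 1]
    m33, m34, m44 = m[2, 2], m[2, 3], m[3, 3]
    s2_t = m11 - m33 / 4
    beta1 = (m12 - m34 / 4) / s2_t
    s2_d = m34 / (2 * beta1)
    s2_u = m33 / 2 - s2_d
    s2_e = (m44 - beta1 * m34) / 2
    s2_r = m22 - beta1 * m12 - s2_e / 2
    return {"mu": zbar[0], "mu_y": zbar[1], "beta1": beta1,
            "sigma2_t": s2_t, "sigma2_d": s2_d, "sigma2_u": s2_u,
            "sigma2_r": s2_r, "sigma2_e": s2_e}


# Illustrative values only; a large n makes the estimates land near the truth.
rng = np.random.default_rng(2012)
x, y = simulate_group(200_000, 7.7, 7.6, 0.8, 0.03, 0.015, 0.01, 0.008, 0.015, rng)
print(mom_estimates(x, y))
```

Because the estimator uses only weighted first and second moments, multiplying every weight by the same constant leaves the estimates unchanged.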
Step 3: Variance Component Estimation
Table: Group-level variance component estimates (SEs) for male MET-minutes using age groupings.
| Interpretation (Parameter) | Age 1 [21, 40), n=163 | Age 2 [40, 50), n=121 | Age 3 [50, 60), n=142 | Age 4 [60, 70], n=135 |
| Usual PA ($\sigma_{tg}^2$) | 0.0313 (0.0063) | 0.0197 (0.0055) | 0.0148 (0.0045) | 0.0273 (0.0049) |
| Day to day ($\sigma_{dg}^2$) | 0.0117 (0.0037) | 0.017 (0.0055) | 0.0204 (0.0098) | 0.0096 (0.0023) |
| Monitor ME ($\sigma_{ug}^2$) | 0.0134 (0.004) | 0.001 (0.0053) | 0.0038 (0.0082) | 0.006 (0.0021) |
| Recall ME ($\sigma_{eg}^2$) | 0.0189 (0.0036) | 0.0134 (0.0035) | 0.017 (0.0063) | 0.0082 (0.0019) |
| Person report ($\sigma_{rg}^2$) | 0.0068 (0.003) | 0.0092 (0.0025) | 0.009 (0.0035) | 0.0082 (0.0019) |
- Expected(?): constant or monotone across groups.
- SEs are large relative to the estimates for many variance components.

Visual of Variance Components
[Figure: Group-Level Variance Component Comparison, Male Age Groups. Panels: True Daily EE, Monitor EE, Reported EE. x-axis: age group (<41, 41-50, 51-60, >60); y-axis: 1000 × variance component; components: Usual PA, Day to Day, Monitor ME, Reported ME, Person-Report.]
The height of each error bar represents 2 × SE.
- Is there a reason why monitor ME should be larger for males between 21 and 40 than for all other ages?

Step Back
What the methodology assumed:
  Normally distributed data → MoM parameter estimation → parameter estimates (normal data)
What we have:
  Non-normally distributed data → MoM parameter estimation → parameter estimates (actual data)

The Question
- Question: How different are the parameter estimates from normally distributed data and the parameter estimates from our actual, non-normally distributed data?
- If they are considerably different, that is evidence that the estimation procedure is sensitive to departures from normality.
  - Concentrate on finding a transformation to normality.
- If they are NOT considerably different, that is evidence that the estimation procedure is NOT sensitive to departures from normality.
  - Think of other reasons why monitor ME (for example) might deviate from what is easily explainable.

Simulation Idea
- Problem: We don't have parameter estimates from normally distributed data. We only have parameter estimates from the actual, non-normally distributed data.
- Solution: Do the following:
  1. Simulate "normally distributed data" that is just like the "non-normally distributed data" we have, only normally distributed.
  2. Run MoM parameter estimation on the simulated data from the previous step.
- Repeat the above N times.
- Compare the parameter estimates from the "normally distributed data" (simulated data) to the results produced by the "non-normally distributed data" (actual data).

Simulate Data
Simulate "normally distributed data" just like the "non-normally distributed data":
- $f(Z_{ig}) = W_{ig} = (x_{i1g},\ x_{i2g},\ y_{i1g},\ y_{i2g})'$ = data vector for individual $i$ in group $g$
- $\Phi_g = \frac{1}{n_g}\sum_{i=1}^{n_g} W_{ig}$ = sample average of vector $W$ for group $g$
- $\Sigma_g = \frac{1}{n_g - 1}\sum_{i=1}^{n_g} (W_{ig} - \Phi_g)(W_{ig} - \Phi_g)'$ = sample covariance matrix of vector $W$ for group $g$
- $n_g$ = number of individuals in sample group $g$

Simulate Data, cont.
For each $g = 1, 2, 3, 4$, perform the following:
- For each $j = 1, 2, \ldots, N = 1000$, perform the following:
  1. Simulate $n_g$ vectors $W_{ig}^{*j} \sim \mathrm{MVN}_4(\Phi_g, \Sigma_g)$, $i = 1, \ldots, n_g$.
  2. Let $Z_{ig}^{*j} = f^{-1}(W_{ig}^{*j})$, the normally distributed analog of $Z_{ig}$.
  3. Estimate $\hat{\theta}_g^{*j}$ with the previously defined method-of-moments equations.
Monte Carlo average: $\bar{\theta}_g^{*} = \frac{1}{N}\sum_{j=1}^{N} \hat{\theta}_g^{*j}$
Monte Carlo SE: $\mathrm{SE}(\bar{\theta}_g^{*}) = \sqrt{\mathrm{diag}\left(\frac{1}{N-1}\sum_{j=1}^{N} (\hat{\theta}_g^{*j} - \bar{\theta}_g^{*})(\hat{\theta}_g^{*j} - \bar{\theta}_g^{*})'\right)}$ (element-wise square root)
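The simulation loop can be sketched in Python/NumPy as follows. This is a minimal sketch, assuming the actual data for one group are stored as two $n_g \times 2$ arrays (monitor and recall, days 1 and 2); the name `normal_analog_check` and the `estimator(x, y)` interface are hypothetical. In the PAMS setting `estimator` would be the group-level method-of-moments fit; in the usage lines a toy estimator returning the two grand means stands in, and log-normal stand-in data replace the real MET-minutes. The Monte Carlo SE follows the slide's formula, i.e., the standard deviation of the N replicate estimates (divisor N - 1).

```python
import numpy as np


def normal_analog_check(x, y, estimator, n_rep=1000, rng=None):
    """Hypothetical helper: Monte Carlo average and SE of `estimator` on
    normal data matching the sample mean and covariance of (x, y).

    x, y: arrays of shape (n_g, 2), actual monitor and recall data.
    estimator: callable (x, y) -> 1-D sequence of parameter estimates.
    """
    rng = np.random.default_rng() if rng is None else rng
    w = np.column_stack([x, y])               # W_ig = (x_i1, x_i2, y_i1, y_i2)'
    n_g = w.shape[0]
    phi = w.mean(axis=0)                      # Phi_g: sample mean vector
    sigma = np.cov(w, rowvar=False, ddof=1)   # Sigma_g: divisor n_g - 1
    draws = []
    for _ in range(n_rep):
        w_sim = rng.multivariate_normal(phi, sigma, size=n_g)
        # Split W* back into monitor and recall columns; the estimator then
        # builds Z*, which plays the role of f^{-1}(W*).
        theta = estimator(w_sim[:, :2], w_sim[:, 2:])
        draws.append(np.asarray(theta, dtype=float))
    draws = np.array(draws)                   # shape (n_rep, n_params)
    mc_avg = draws.mean(axis=0)               # Monte Carlo average
    mc_se = draws.std(axis=0, ddof=1)         # SD of replicate estimates
    return mc_avg, mc_se


# Usage with skewed stand-in data and a toy estimator (grand means).
rng = np.random.default_rng(7)
x_act = rng.lognormal(mean=0.0, sigma=0.5, size=(150, 2))
y_act = rng.lognormal(mean=0.0, sigma=0.5, size=(150, 2))


def grand_means(x, y):
    return [x.mean(), y.mean()]


avg, se = normal_analog_check(x_act, y_act, grand_means, n_rep=200,
                              rng=np.random.default_rng(8))
print(avg, se)
```

Matching only $\Phi_g$ and $\Sigma_g$ means the simulated data share the actual data's means and covariances, so differences between the actual-data and simulated-data estimates point to the departure from normality rather than to different first or second moments.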
Monitor Data
[Figure: Histograms of monitor MM (x-axis: Monitor MM; y-axis: count). Non-normally distributed data: actual monitor data set, Shapiro-Wilk p-value 2.1 × 10^-5. Normally distributed data: one simulated monitor data set, Shapiro-Wilk p-value 0.8605.]

Recall Data
[Figure: Histograms of recall MM (x-axis: Recall MM; y-axis: count). Non-normally distributed data: actual recall data set, Shapiro-Wilk p-value 0. Normally distributed data: one simulated recall data set, Shapiro-Wilk p-value 0.8428.]

Scatter Plots
[Figure: Recall MM vs. Monitor MM. Non-normally distributed data: actual recall vs. actual monitor. Normally distributed data: simulated recall vs. simulated monitor.]

Parameter Estimates
Table: Group-level parameter estimates for male MET-minutes using age groupings. Data type "A" means actual, "S" means simulated. For the simulated data, Monte Carlo averages (Monte Carlo SEs) are presented.
| Interpretation (Parameter) | Data | Age 1 [21, 40), n=163 | Age 2 [40, 50), n=121 | Age 3 [50, 60), n=142 | Age 4 [60, 70], n=135 |
| Usual PA ($\mu_g$) | A | 7.802 (0.017) | 7.711 (0.017) | 7.705 (0.015) | 7.588 (0.017) |
| Usual PA ($\mu_g$) | S | 7.799 (0.022) | 7.719 (0.021) | 7.721 (0.021) | 7.585 (0.021) |
| Reported PA ($\mu_{yg}$) | A | 7.703 (0.017) | 7.61 (0.015) | 7.608 (0.017) | 7.507 (0.016) |
| Reported PA ($\mu_{yg}$) | S | 7.704 (0.021) | 7.606 (0.019) | 7.609 (0.022) | 7.501 (0.02) |
| Pop. bias ($\beta_{1g}$) | A | 0.817 (0.111) | 0.618 (0.128) | 0.705 (0.199) | 0.741 (0.08) |
| Pop. bias ($\beta_{1g}$) | S | 0.856 (0.102) | 0.541 (0.117) | 0.767 (0.193) | 0.749 (0.104) |
- Estimation of $\beta_{1g}$ appears sensitive to departures from normality, but look at the SEs: in every age group the actual and simulated estimates differ by less than one SE.
- Underestimation of SEs for $\mu_g$ and $\mu_{yg}$ for the actual data (0.015 to 0.017 vs. 0.019 to 0.022 for the simulated data)

Variance Component Estimates
Table: Group-level variance component estimates for male MET-minutes using age groupings. Data type "A" means actual, "S" means simulated. For the simulated data, Monte Carlo averages (Monte Carlo SEs) are presented.
| Interpretation (Parameter) | Data | Age 1 [21, 40), n=163 | Age 2 [40, 50), n=121 | Age 3 [50, 60), n=142 | Age 4 [60, 70], n=135 |
| Usual PA ($\sigma_{tg}^2$) | A | 0.031 (0.006) | 0.02 (0.005) | 0.015 (0.005) | 0.027 (0.005) |
| Usual PA ($\sigma_{tg}^2$) | S | 0.037 (0.007) | 0.026 (0.006) | 0.02 (0.006) | 0.028 (0.006) |
| Day to day ($\sigma_{dg}^2$) | A | 0.012 (0.004) | 0.017 (0.006) | 0.02 (0.01) | 0.01 (0.002) |
| Day to day ($\sigma_{dg}^2$) | S | 0.013 (0.004) | 0.021 (0.009) | 0.025 (0.04) | 0.011 (0.004) |
| Monitor ME ($\sigma_{ug}^2$) | A | 0.013 (0.004) | 0.001 (0.005) | 0.004 (0.008) | 0.006 (0.002) |
| Monitor ME ($\sigma_{ug}^2$) | S | 0.01 (0.004) | -0.002 (0.008) | 0 (0.04) | 0.006 (0.003) |
| Recall ME ($\sigma_{eg}^2$) | A | 0.019 (0.004) | 0.013 (0.003) | 0.017 (0.006) | 0.008 (0.002) |
| Recall ME ($\sigma_{eg}^2$) | S | 0.015 (0.003) | 0.015 (0.003) | 0.023 (0.006) | 0.009 (0.002) |
| Person report ($\sigma_{rg}^2$) | A | 0.007 (0.003) | 0.009 (0.003) | 0.009 (0.004) | 0.008 (0.002) |
| Person report ($\sigma_{rg}^2$) | S | 0.01 (0.004) | 0.009 (0.003) | 0.01 (0.004) | 0.01 (0.003) |
- Variance component estimates are very similar across the board.
- Monitor ME estimates show the same trend across age groups for both actual and simulated data.

Comments/Conclusions
- Conclusion: Departure from normality is not what causes the lack of monotone or constant parameter estimates across age groups.
- So, do we believe our parameter estimates are correct?
- Interplay between knowledge of the context and the statistical methodology

Thanks