Functional data analysis of accelerometer measurements from a population-based physical activity study

Francesco Sera
MRC Centre of Epidemiology for Child Health
University College London–Institute of Child Health
f.sera@ich.ucl.ac.uk

UCL Partners Biostatistics Network Symposium on Contemporary Statistical Methods in Medical Research
London, 18 October 2012

Outline
1 Introduction
2 Accelerometer data as functional data
3 Functional ANOVA
4 Functional principal component analysis
5 Conclusions

Physical activity: a definition
Habitual physical activity may be viewed as a latent function of activity intensity over time.
Activity intensity can be expressed as the rate of chemical energy expended above resting level, also defined as Physical Activity Energy Expenditure (PAEE).

The accelerometer
PAEE must be measured through instruments that transform intensity values to a feasible scale: e.g., activity questionnaires, pedometer, accelerometer.
An accelerometer quantifies the intensity of movement, in one or more directions, of a body segment to which it is attached.
Movement produces a voltage signal proportional to acceleration which is converted to scalar values ('counts') that are summed over a user-defined period of time ('epoch'), e.g.
5, 15 or 60 seconds.

Accelerometer data
[Figure: example of data collected from one child wearing an Actigraph GT1M accelerometer; counts per minute (0 to 6000) against time of the day (8.5 to 19.5 hours)]
Accelerometers provide epoch-by-epoch measurements of intensity of body movement → 40,000+ observations on 7 days.
Richness of accelerometer data → intensity, duration and patterns of (in)activity as well as quantification of accumulated activity.

Accelerometer output measures
Despite the potential richness of accelerometer data for the examination of physical activity patterns, most studies are based on daily summary statistics, e.g.
Total daily activity
Daily average activity
Mean minutes per day spent in different activity intensities (e.g.
moderate or vigorous)
Summaries can be a good starting point for the analyses.
With functional data analysis (FDA) we can model the functional patterns of physical activity in their entirety.

What is functional data analysis?
High-frequency measurements
Repeated observations
Key assumption is smoothness:
$y_{ij} = x_i(t_{ij}) + \varepsilon_{ij}$,
where $x_i(t)$ is a smooth function, $t$ varies over a continuum (e.g. time), and $\varepsilon_{ij}$ is the error or noise term.
Functional data = the functions $x_i(t)$

What are we interested in?
Representation of the distribution of functions: location, dispersion, covariance
Identifying patterns among the functional data
Relationship of functional data to covariates, responses and other functions
Relationship between derivatives of functions
Timing of events in functions

Objective
To evaluate the application of FDA to analyse physical (in)activity using accelerometers within a large population-based cohort of UK children.
We will use FDA to identify daily periods when children are most or least active, and to model variations in physical activity (e.g. by day of the week or by season).

Dataset
The Millennium Cohort Study (MCS) is a longitudinal study of socioeconomic and health-related characteristics of children who were resident in the UK at age 9 months.
At age 7 years, children were asked to wear Actigraph GT1M accelerometers for seven consecutive days during waking hours.
We analysed 6,709 children contributing ~37,000 daily physical activity profiles (median 6 days, range 1–18 days per child).
We considered days with at least 10 hours of non-zero count measurements between 8:30 and 19:30.
Counts were aggregated into one-minute epochs.

Notation
Let $(t_{ijm}, y_{ijm})$ be the $n_{ij}$ pairs of times ($t$) and counts ($y$) measured by the accelerometer, where $i = 1, \ldots, 6{,}709$ indexes the child, $j = 1, \ldots, n_i$ the day, and $m = 1, \ldots, n_{ij}$ the sampling interval.
Here we consider $n_{ij} = 11 \times 60$, that is, 660 intervals between 8:30 and 19:30.

From discrete to functional data
Represent data recorded at discrete times as a continuous function.
Basis-expansion methods:
$x_{ij}(t) = \boldsymbol{\phi}(t)\,\mathbf{c}_{ij}$,
where $\mathbf{c}_{ij}$ is the vector of coefficients and $\boldsymbol{\phi}(t)$ is a basis function system.
A $K$-dimensional, fourth-order B-spline basis function system was used.

From discrete to functional data
Let $\phi_k(t_m)$ be the value of basis function $\phi_k$ at time $t_m$.
The coefficients in $\mathbf{c}_{ij}$ can be estimated through generalised linear models:
$g(\mu_{ijm}) = \sum_{k=1}^{K} c_{ij,k}\,\phi_k(t_m)$,
where $\mu_{ijm} \equiv E(y_{ijm})$ and $y_{ijm}$ follows an exponential family distribution:
$y_{ijm} \sim \mathrm{Gaussian}(\mu_{ijm}, \sigma^2_{ijm})$
$y_{ijm} \sim \mathrm{Poisson}(\mu_{ijm})$
or quasi-likelihood models with identity and logarithm link functions, and linear and quadratic variance functions of the mean $V(\mu_{ijm})$.

Degree of smoothing
A range of dimensions $K$ was considered and we compared models characterised by their number of bases.
Models were compared using the generalised cross-validation score
$\mathrm{GCV} = \frac{n}{n-K} \cdot \frac{\sum_{ijm} V(\hat{\mu}_{ijm})^{-1}\,[y_{ijm} - \hat{\mu}_{ijm}]^2}{n-K}$
Model selection can also be performed using
$\mathrm{AIC} = 2\,\{-\ell(\mathbf{c}_{ij} \mid y) + K\}$

Smoothing; Gaussian distribution
Model selection (optimal K = 164)
[Figure: GCV against number of bases (0 to 600); observed and smoothed counts per minute against time of the day (8.5 to 19.5 hours)]

Analysis of residuals and test of heteroscedasticity
[Figure: residual diagnostic plots (Residuals vs Fitted, Normal Q-Q, Scale-Location, Residuals vs Leverage with Cook's distance) and variance against mean, with the fitted relation Variance = exp(5.99) × (Mean)^0.99]
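The basis-expansion smoothing and the GCV-based choice of K described in the preceding slides can be illustrated with a minimal sketch. It is not the code used for the study (which was run in R): it assumes the Gaussian case with identity link and $V(\mu) = 1$, equally spaced interior knots on a time axis rescaled to [0, 1], and a synthetic one-day profile. The function names (`bspline_basis`, `fit_bspline`, `gcv_score`), the candidate values of K and the simulated counts are all hypothetical.

```python
import math
import random

def bspline_basis(t, knots, order=4):
    """Evaluate every B-spline basis function at t with the Cox-de Boor recursion."""
    n_int = len(knots) - 1
    # Order-1 (piecewise constant) functions; the last non-empty interval is closed on the right.
    last = max(i for i in range(n_int) if knots[i] < knots[i + 1])
    b = [1.0 if knots[i] <= t < knots[i + 1] or (i == last and t == knots[i + 1]) else 0.0
         for i in range(n_int)]
    for k in range(2, order + 1):
        nxt = []
        for i in range(len(knots) - k):
            d1 = knots[i + k - 1] - knots[i]
            d2 = knots[i + k] - knots[i + 1]
            left = (t - knots[i]) / d1 * b[i] if d1 > 0 else 0.0
            right = (knots[i + k] - t) / d2 * b[i + 1] if d2 > 0 else 0.0
            nxt.append(left + right)
        b = nxt
    return b  # one value per basis function (length K)

def solve(a, rhs):
    """Solve a x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [r] for row, r in zip(a, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_bspline(times, y, n_basis, order=4):
    """Least-squares coefficients and fitted values for a K-dimensional B-spline basis."""
    step = 1.0 / (n_basis - order + 1)
    knots = ([0.0] * order
             + [j * step for j in range(1, n_basis - order + 1)]
             + [1.0] * order)
    phi = [bspline_basis(t, knots, order) for t in times]
    a = [[sum(row[p] * row[q] for row in phi) for q in range(n_basis)]
         for p in range(n_basis)]
    rhs = [sum(row[p] * yi for row, yi in zip(phi, y)) for p in range(n_basis)]
    coef = solve(a, rhs)
    fitted = [sum(c * v for c, v in zip(coef, row)) for row in phi]
    return coef, fitted

def gcv_score(y, fitted, n_basis):
    """GCV = n/(n-K) * SSE/(n-K), with V(mu) = 1 (Gaussian assumption)."""
    n = len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return (n / (n - n_basis)) * (sse / (n - n_basis))

# Synthetic profile: 660 one-minute epochs between 8:30 and 19:30, rescaled to [0, 1].
random.seed(1)
n = 660
times = [m / (n - 1) for m in range(n)]
counts = [600 + 400 * math.sin(6 * math.pi * t) + 300 * math.sin(14 * math.pi * t)
          + random.gauss(0, 200) for t in times]

scores = {}
for k in (5, 10, 20, 40):  # hypothetical candidate dimensions
    _, fitted = fit_bspline(times, counts, k)
    scores[k] = gcv_score(counts, fitted, k)
best_k = min(scores, key=scores.get)
print("K with smallest GCV:", best_k)
```

The normal equations are solved directly to keep the sketch dependency-free; for count data the same loop would wrap an iteratively reweighted fit with the chosen link and variance function.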

Smoothing; quasi-likelihood
Model selection (optimal K = 199)
[Figure: GCV against number of bases; observed and smoothed counts per minute against time of the day (8.5 to 19.5 hours)]

Residual analysis; quasi-likelihood
[Figure: residual diagnostic plots for the quasi-likelihood model (Residuals vs Fitted, Normal Q-Q, Scale-Location, Residuals vs Leverage with Cook's distance)]
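The quasi-likelihood smoother depends on a variance function of the mean, and the heteroscedasticity slide reports a fitted relation of the form Variance = exp(a) × (Mean)^b. How that relation was estimated is not stated in the slides. One common approach, shown here only as a hedged sketch, is to group observations by fitted value, compute the mean squared residual in each group, and regress log variance on log mean. The function names, the number of bins, the dispersion value of 400 and the simulated data are all assumptions made for illustration.

```python
import math
import random

def binned_mean_variance(fitted, residuals, n_bins=10):
    """Sort observations by fitted value, split them into equal-size bins, and
    return the mean fitted value and mean squared residual of each bin."""
    pairs = sorted(zip(fitted, residuals))
    size = len(pairs) // n_bins
    means, variances = [], []
    for b in range(n_bins):
        chunk = pairs[b * size:(b + 1) * size]
        means.append(sum(f for f, _ in chunk) / len(chunk))
        variances.append(sum(r * r for _, r in chunk) / len(chunk))
    return means, variances

def fit_power_variance(means, variances):
    """Ordinary least squares of log(variance) on log(mean).
    Returns (a, b) such that variance is approximately exp(a) * mean**b."""
    lx = [math.log(m) for m in means]
    ly = [math.log(v) for v in variances]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    b = (sum((x - mx) * (y - my) for x, y in zip(lx, ly))
         / sum((x - mx) ** 2 for x in lx))
    a = my - b * mx
    return a, b

# Simulated data with variance proportional to the mean (hypothetical dispersion 400).
rng = random.Random(0)
dispersion = 400.0
fitted = [rng.uniform(100, 1000) for _ in range(5000)]
residuals = [rng.gauss(0.0, math.sqrt(dispersion * mu)) for mu in fitted]

means, variances = binned_mean_variance(fitted, residuals)
a, b = fit_power_variance(means, variances)
print(f"Variance ~ exp({a:.2f}) * Mean^{b:.2f}")
```

An exponent b close to 1 points to a variance that is linear in the mean, while b close to 2 points to a quadratic variance function; these are the two variance functions considered for the quasi-likelihood models.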

FANOVA
The functional
ANOVA model can be written as
$x_{ij,g}(t) = \mu(t) + \alpha_g(t) + \epsilon_{ij}(t)$, with $\sum_{g=1}^{G} \alpha_g(t) = 0$, $\forall t \in \mathbf{t}$,
and where
$g = 1, \ldots, G$ are the levels of a categorical covariate
$x_{ij,g}(t)$ is the functional representation of accelerometer data for child $i$, day $j$ of group $g$
$\mu(t)$ is the grand mean function
$\alpha_g(t)$ is the coefficient function of the $g$-th contrast and represents specific effects by day of the week or season on physical activity profiles
$\epsilon_{ij}(t)$ is the residual function that models unexplained variation

A permutation functional F test can be used to assess the effects of covariates on functional responses.
Let $\hat{x}(t_0)$ be the vector of predicted values of the counts function at time $t = t_0$. The pointwise functional F test statistic is given by
$F(t_0) = \frac{\mathrm{Var}[\hat{x}(t_0)]}{\frac{1}{n}\sum_{ij}\left(x_{ij}(t_0) - \hat{x}_{ij}(t_0)\right)^2}$
and the maximum functional F statistic, $\max_{t_0}\{F(t_0),\ \forall t_0 \in \mathbf{t}\}$, can be used to evaluate a global association.

FANOVA to test gender, weekend and seasonal effects on physical activity profiles
The models:
(gender) $x_{ij,g}(t) = \mu(t) + G_g(t) + \epsilon_{ij,g}(t)$
(weekend) $x_{ij,w}(t) = \mu(t) + W_w(t) + \epsilon_{ij,w}(t)$
(season) $x_{ij,s}(t) = \mu(t) + S_s(t) + \epsilon_{ij,s}(t)$
Constrained by
$\sum_{g=1}^{2} G_g(t) = 0$; $\sum_{w=1}^{2} W_w(t) = 0$; $\sum_{s=1}^{4} S_s(t) = 0$, $\forall t \in \mathbf{t}$

Smoothed mean counts profile for 6,709 children
[Figure: smoothed mean counts per minute (0 to 1600) against hours (8.5 to 19.5)]

Boys and girls contrast functions
[Figure: contrast functions for boys and girls (counts per minute against hours) and F-statistic against hours]

Workdays and weekend contrast functions
[Figure: contrast functions for workdays and weekend (counts per minute against hours) and F-statistic against hours]
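The pointwise F statistic and the maximum-F permutation test defined in the FANOVA slides can be sketched for curves already evaluated on a common time grid. This is an illustration under stated assumptions, not the analysis code of the study: predicted values are taken to be group means at each grid point, the variance of the predicted values uses the n − 1 denominator, and the grid, group sizes, effect size and noise level are invented. The names `pointwise_f` and `max_f_permutation_test` are hypothetical.

```python
import random

def pointwise_f(curves, groups):
    """F(t0) = Var[x_hat(t0)] / ((1/n) * sum of squared residuals), per grid point.
    curves: list of equal-length lists; groups: one label per curve."""
    n, n_times = len(curves), len(curves[0])
    f_values = []
    for m in range(n_times):
        sums, sizes = {}, {}
        for x, g in zip(curves, groups):
            sums[g] = sums.get(g, 0.0) + x[m]
            sizes[g] = sizes.get(g, 0) + 1
        # Predicted value for each curve: the mean of its group at this time.
        fitted = [sums[g] / sizes[g] for g in groups]
        grand = sum(fitted) / n
        var_fitted = sum((v - grand) ** 2 for v in fitted) / (n - 1)
        mse = sum((x[m] - v) ** 2 for x, v in zip(curves, fitted)) / n
        f_values.append(var_fitted / mse)
    return f_values

def max_f_permutation_test(curves, groups, n_perm=200, seed=0):
    """Compare the observed max F with its distribution under label permutation."""
    rng = random.Random(seed)
    observed = max(pointwise_f(curves, groups))
    labels = list(groups)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if max(pointwise_f(curves, labels)) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)

# Synthetic example: 20 workday and 20 weekend profiles on a coarse 12-point grid,
# with a hypothetical weekend increase of 400 counts per minute at grid points 4 to 7.
rng = random.Random(42)
n_times = 12
groups = ["workday"] * 20 + ["weekend"] * 20
curves = []
for g in groups:
    bump = 400.0 if g == "weekend" else 0.0
    curves.append([600.0 + (bump if 4 <= m <= 7 else 0.0) + rng.gauss(0.0, 50.0)
                   for m in range(n_times)])

f_curve = pointwise_f(curves, groups)
observed, p_value = max_f_permutation_test(curves, groups)
print("time index of max F:", f_curve.index(max(f_curve)))
print("permutation p-value:", p_value)
```

Permuting whole curves keeps the within-curve dependence intact, which is why the maximum over time can be referred to a single permutation distribution rather than adjusted point by point. With several days per child, permuting at the child level would be the more cautious choice; the sketch ignores that clustering.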

Calendar seasons contrast functions
[Figure: contrast functions for winter, spring, summer and autumn (counts per minute against hours) and F-statistic against hours]

Functional PCA
Multivariate PCA uses the eigen-decomposition
$\Sigma = U D U' = \sum_{q=1}^{p} d_q\, u_q u_q'$, with $u_q' u_r = I(q = r)$.
Instead of the covariance matrix $\Sigma$ we have the covariance function $\sigma(s, t)$.
For the covariance function,
$\sigma(s, t) = \sum_{q=1}^{\infty} d_q\, \xi_q(s)\, \xi_q(t)$, with $\int \xi_q(t)\, \xi_r(t)\, dt = I(q = r)$.
The $\xi_q(t)$ maximise $\mathrm{Var}\left[\int \xi_q(t)\, x_{ij}(t)\, dt\right]$
$d_q = \mathrm{Var}\left[\int \xi_q(t)\, x_{ij}(t)\, dt\right]$
$d_q / \sum_q d_q$ is the proportion of variance explained
Principal component scores are
$f_{ij,q} = \int \xi_q(t)\, x_{ij}(t)\, dt$

Estimated eigenfunctions (harmonics)
[Figure: weights (−1.0 to 1.0) against hours (8.5 to 18.5) for Harmonic 1 (7.9%), Harmonic 2 (4.7%), Harmonic 3 (4.2%) and Harmonic 4 (3.7%)]

Mean and standard deviation of the first 4 functional principal component scores in boys and girls

         Boys               Girls              t-test
         Mean     SD        Mean     SD        p value
fPCA1    126.1    761.0     -124.1   658.4     <0.0001
fPCA2     47.0    614.1      -46.2   493.5     <0.0001
fPCA3     25.5    574.4      -25.1   469.5     <0.0001
fPCA4     20.2    544.2      -19.9   442.1     <0.0001

Conclusions
We have shown the potential of functional data analysis to analyse a large database of physical activity trajectories.
These methods allow exploiting the richness of the information
gathered.
FDA can be used to evaluate temporal effects at different levels, e.g. child level (obesity) and geographical level.
All analyses were performed using 64-bit R on a Unix-alike machine with 10 GB of RAM; the fda package and the glm function were used.

Acknowledgments
CHARGE group: Lucy J Griffiths, Mario Cortina-Borja, Marco Geraci, Theodora Pouliou, Carly Rich, Carol Dezateux
We are grateful to the children who participated in this study and their families.
This work was funded by the Wellcome Trust Grant WT084686.