Functional data analysis of accelerometer measurements from a population-based physical activity study

Francesco Sera
MRC Centre of Epidemiology for Child Health
University College London–Institute of Child Health
f.sera@ich.ucl.ac.uk
UCL Partners Biostatistics Network Symposium on
Contemporary Statistical Methods in Medical Research
London, 18 October 2012
Outline
1 Introduction
2 Accelerometer data as functional data
3 Functional ANOVA
4 Functional principal component analysis
5 Conclusions
Physical activity: a definition
Habitual physical activity may be viewed as a latent
function of activity intensity over time
Activity intensity can be expressed as the rate of chemical
energy expended above resting level, also defined as
Physical Activity Energy Expenditure (PAEE)
The accelerometer
PAEE must be measured through instruments that
transform intensity values to a feasible scale: e.g., activity
questionnaires, pedometer, accelerometer
An accelerometer quantifies the intensity of movement, in
one or more directions, of a body segment to which it is
attached
Movement produces a voltage signal proportional to
acceleration which is converted to scalar values (’counts’)
that are summed over a user-defined period of time
(’epoch’), e.g. 5, 15 or 60 seconds
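A minimal sketch of how counts recorded at a short epoch can be summed into a longer one. The function name and the example values are illustrative assumptions, not taken from the study.

```python
def aggregate_epochs(counts, epoch_s, target_s):
    """Sum counts recorded every `epoch_s` seconds into `target_s`-second epochs."""
    if target_s % epoch_s != 0:
        raise ValueError("target epoch must be a multiple of the recording epoch")
    step = target_s // epoch_s
    # Drop a trailing partial epoch, if any, so every output value covers a full epoch
    usable = len(counts) - len(counts) % step
    return [sum(counts[i:i + step]) for i in range(0, usable, step)]


# Hypothetical 15-second counts covering three minutes
fifteen_second_counts = [10, 0, 25, 5, 0, 0, 0, 0, 300, 280, 310, 295]
per_minute = aggregate_epochs(fifteen_second_counts, epoch_s=15, target_s=60)
print(per_minute)  # [40, 0, 1185]
```

Summing (rather than averaging) keeps the output on the counts-per-epoch scale that accelerometer software reports.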
Accelerometer data
Figure: Example of data collected from one child wearing an Actigraph GT1M accelerometer (x-axis: time of the day, 8.5 to 19.5 hours; y-axis: counts per minute, 0 to 6000)
Accelerometers provide epoch-by-epoch measurements of the intensity of body
movement → 40,000+ observations over 7 days
Richness of accelerometer data → intensity, duration and patterns of (in)activity,
as well as quantification of accumulated activity
Accelerometer output measures
Despite the potential richness of accelerometer data for
the examination of physical activity patterns, most studies
are based on daily summary statistics, e.g.
Total daily activity
Daily average activity
Mean minutes per day spent in different activity intensities
(e.g. moderate or vigorous)
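The three summaries listed above can be sketched for a single day of one-minute counts. The cut-point of 2000 counts per minute is a hypothetical threshold chosen for illustration only; the talk does not state which intensity cut-points were used.

```python
def daily_summaries(cpm, mvpa_cutpoint):
    """Summarise one day of counts per minute (cpm)."""
    total = sum(cpm)
    return {
        "total_counts": total,                      # total daily activity
        "mean_cpm": total / len(cpm),               # daily average activity
        "mvpa_minutes": sum(1 for c in cpm if c >= mvpa_cutpoint),
    }


# Hypothetical six-minute excerpt and an assumed cut-point
example_day = [0, 150, 2400, 3100, 800, 0]
summary = daily_summaries(example_day, mvpa_cutpoint=2000)
print(summary)
```

Each summary collapses the whole profile into one number, which is exactly the loss of timing information that a functional approach avoids.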
Accelerometer output measures
Summaries can be a good starting point for the analyses
With functional data analysis (FDA) we can model the
physical activity functional patterns in their entirety
What is functional data analysis?
High-frequency measurements
Repeated observations
Key assumption is smoothness
$y_{ij} = x_i(t_{ij}) + \varepsilon_{ij}$
where $x_i(t)$ is a smooth function, $t$ varies over a continuum
(e.g. time), and $\varepsilon_{ij}$ is the error or noise term
Functional data = the functions $x_i(t)$
What are we interested in?
Representation of distribution of functions
location
dispersion
covariance
Identify patterns among the functional data
Relationship of functional data to
covariates
responses
other functions
Relationship between derivatives of functions
Timing of events in functions
Objective
To evaluate the application of FDA to analyse physical
(in)activity using accelerometers within a large
population-based cohort of UK children
We will use FDA to identify daily periods when children are
most or least active, and to model variations in physical
activity (e.g. by day of the week or by season)
Dataset
The Millennium Cohort Study (MCS) is a longitudinal study
of socioeconomic and health–related characteristics of UK
children who were resident in the UK at age 9 months
At age 7 years, children were asked to wear Actigraph
GT1M accelerometers for seven consecutive days during
waking hours
We analysed 6,709 children contributing approximately 37,000 daily
physical activity profiles (median 6 days, range 1–18 days
per child)
We considered days with at least 10 hours of non-zero
count measurements between 8:30 and 19:30
Counts were aggregated into one-minute epochs
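One possible reading of the day-inclusion rule, sketched below, is that a day is kept when at least 600 of the 660 one-minute epochs between 8:30 and 19:30 have a non-zero count. This interpretation, and all names in the code, are assumptions rather than the study's documented procedure.

```python
WINDOW_MINUTES = 11 * 60  # one-minute epochs between 8:30 and 19:30


def is_valid_day(cpm_window, min_nonzero_minutes=10 * 60):
    """Return True if the window holds enough non-zero one-minute counts."""
    if len(cpm_window) != WINDOW_MINUTES:
        raise ValueError("expected one count per minute for the 8:30-19:30 window")
    nonzero = sum(1 for c in cpm_window if c > 0)
    return nonzero >= min_nonzero_minutes


# Hypothetical days: exactly 600 and 599 non-zero minutes
print(is_valid_day([5] * 600 + [0] * 60))  # True
print(is_valid_day([5] * 599 + [0] * 61))  # False
```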
Notation
Let $(t_{ijm}, y_{ijm})$ be the $n_{ij}$ pairs of times ($t$) and counts ($y$)
measured by the accelerometer, where $i = 1, \ldots, 6{,}709$ indexes
the child, $j = 1, \ldots, n_i$ the day and $m = 1, \ldots, n_{ij}$
the sampling interval. Here we consider $n_{ij} = 11 \times 60$, that
is, 660 one-minute intervals between 8:30 and 19:30
From discrete to functional data
Represent data recorded at discrete times as a continuous
function
Basis-expansion methods
$x_{ij}(t) = \boldsymbol{\phi}(t)^{\top} \mathbf{c}_{ij}$,
where $\mathbf{c}_{ij}$ is the vector of coefficients and $\boldsymbol{\phi}(t)$ is a basis
function system
A $K$-dimensional, fourth-order (cubic) B-spline basis function
system was used
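A minimal sketch of a fourth-order B-spline basis and of evaluating $x(t) = \sum_k c_k \phi_k(t)$, using the Cox-de Boor recursion. Equally spaced interior knots, the small value K = 8 and the coefficient values are assumptions made for illustration; the talk does not describe the knot placement.

```python
def make_knots(a, b, n_basis, order):
    """Clamped knot vector on [a, b] with equally spaced interior knots."""
    n_interior = n_basis - order
    if n_interior < 0:
        raise ValueError("need at least `order` basis functions")
    interior = [a + (b - a) * j / (n_interior + 1) for j in range(1, n_interior + 1)]
    return [a] * order + interior + [b] * order


def bspline_basis(t, knots, order):
    """Values at t of all B-spline basis functions of the given order."""
    # Order 1: indicator function of each knot span
    b = [1.0 if knots[i] <= t < knots[i + 1] else 0.0 for i in range(len(knots) - 1)]
    if t == knots[-1]:
        # Right end point: assign it to the last non-empty span
        for i in range(len(knots) - 2, -1, -1):
            if knots[i] < knots[i + 1]:
                b[i] = 1.0
                break
    # Cox-de Boor recursion: build order d from order d - 1
    for d in range(2, order + 1):
        new = []
        for i in range(len(knots) - d):
            left_den = knots[i + d - 1] - knots[i]
            right_den = knots[i + d] - knots[i + 1]
            left = (t - knots[i]) / left_den * b[i] if left_den > 0 else 0.0
            right = (knots[i + d] - t) / right_den * b[i + 1] if right_den > 0 else 0.0
            new.append(left + right)
        b = new
    return b


# Hypothetical example: K = 8 cubic basis functions on the 8.5-19.5 hour window
knots = make_knots(8.5, 19.5, n_basis=8, order=4)
coef = [3.0] * 8  # made-up coefficients; equal values give a constant function
x_at_noon = sum(c * phi for c, phi in zip(coef, bspline_basis(12.0, knots, 4)))
print(x_at_noon)
```

Because the basis functions sum to one at every point of the interval, equal coefficients reproduce a constant curve, which is a convenient sanity check.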
From discrete to functional data
Let $\phi_k(t_m)$ be the value of basis function $\phi_k$ at time $t_m$
The coefficients in $\mathbf{c}_{ij}$ can be estimated through
generalised linear models:
$g(\mu_{ijm}) = \sum_{k=1}^{K} c_{ij,k}\, \phi_k(t_m)$,
where $\mu_{ijm} \equiv E(y_{ijm})$ and
$y_{ijm} \sim$ exponential family distribution, e.g.
$y_{ijm} \sim \text{Gaussian}(\mu_{ijm}, \sigma^2_{ijm})$
$y_{ijm} \sim \text{Poisson}(\mu_{ijm})$
or quasi-likelihood models with identity and logarithm link
functions, and linear and quadratic variance functions of
the mean, $V(\mu_{ijm})$
Degree of smoothing
A range of dimensions $K$ was considered, and we
compared models characterised by their number of basis functions
Models were compared using the generalised cross-validation
score
$\mathrm{GCV} = \frac{n}{n-K} \cdot \frac{\sum_{ijm} V(\hat{\mu}_{ijm})^{-1} [y_{ijm} - \hat{\mu}_{ijm}]^2}{n-K}$
Model selection can also be performed using
$\mathrm{AIC} = 2\left[-\ell(\mathbf{c}_{ij} \mid y) + K\right]$
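The GCV score above can be sketched as a small function of the observations, the fitted means, the number of basis functions and a variance function $V(\mu)$. The function name, the default $V(\mu) = 1$ and the numbers in the example are illustrative assumptions.

```python
def gcv_score(y, mu_hat, n_basis, variance=lambda mu: 1.0):
    """Generalised cross-validation score for a fit with `n_basis` coefficients."""
    n = len(y)
    if n_basis >= n:
        raise ValueError("the number of basis functions must be smaller than n")
    # Pearson-type sum of squares: squared residuals scaled by V(mu_hat)
    pearson = sum((yi - mi) ** 2 / variance(mi) for yi, mi in zip(y, mu_hat))
    return (n / (n - n_basis)) * pearson / (n - n_basis)


# Hypothetical observations and fitted values
y_obs = [1.0, 2.0, 3.0, 4.0, 5.0]
fitted = [1.5, 2.0, 2.5, 4.0, 5.5]
gcv_gaussian = gcv_score(y_obs, fitted, n_basis=2)
# Linear variance function V(mu) = mu, as in a Poisson-type quasi-likelihood
gcv_linear = gcv_score(y_obs, fitted, n_basis=2, variance=lambda mu: mu)
print(gcv_gaussian, gcv_linear)
```

In practice the score would be computed for each candidate K and the K with the smallest value retained.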
Smoothing; Gaussian distribution
Model selection (optimal K = 164)
Figure: observed and smoothed counts per minute against time of the day (8.5 to 19.5 hours), and GCV against number of basis functions (0 to 600)
Analysis of residuals and test of heteroscedasticity
Figure: diagnostic plots (Residuals vs Fitted, Normal Q-Q, Scale-Location, Residuals vs Leverage with Cook's distance), and variance against mean with the fitted relation $\text{Variance} = \exp(5.99)\,(\text{Mean})^{0.99}$
Smoothing; quasi–likelihood
Model selection (optimal K = 199)
Figure: observed and smoothed counts per minute against time of the day (8.5 to 19.5 hours), and GCV against number of bases
Residual analysis; quasi–likelihood
Figure: diagnostic plots (Residuals vs Fitted, Normal Q-Q, Scale-Location, Residuals vs Leverage with Cook's distance)
FANOVA
The functional ANOVA model can be written as
$x_{ij,g}(t) = \mu(t) + \alpha_g(t) + \varepsilon_{ij}(t)$,
with $\sum_{g=1}^{G} \alpha_g(t) = 0$ for all $t$ in the observation window, and where
$g = 1, \ldots, G$ are the levels of a categorical covariate
$x_{ij,g}(t)$ is the functional representation of accelerometer
data for child $i$, day $j$ of group $g$
$\mu(t)$ is the grand mean function
$\alpha_g(t)$ is the coefficient function of the $g$-th contrast and
represents specific effects of day of the week or season on
physical activity profiles
$\varepsilon_{ij}(t)$ is the residual function that models unexplained
variation
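A pointwise sketch of this decomposition, assuming curves evaluated on a common grid: under the sum-to-zero constraint, the least-squares estimate of $\mu(t)$ is the unweighted mean of the group mean functions, and each $\alpha_g(t)$ is the deviation of a group mean from it. The function name and the toy curves are hypothetical.

```python
def fanova_contrasts(curves_by_group):
    """Estimate mu(t) and alpha_g(t) on a grid under sum_g alpha_g(t) = 0.

    curves_by_group maps a group label to a list of curves, each a list of
    values on the same time grid.
    """
    group_means = {}
    for g, curves in curves_by_group.items():
        n_t = len(curves[0])
        group_means[g] = [sum(c[m] for c in curves) / len(curves) for m in range(n_t)]
    n_groups = len(group_means)
    n_t = len(next(iter(group_means.values())))
    # Grand mean function: unweighted mean of the group mean functions
    mu = [sum(gm[m] for gm in group_means.values()) / n_groups for m in range(n_t)]
    alpha = {g: [gm[m] - mu[m] for m in range(n_t)] for g, gm in group_means.items()}
    return mu, alpha


# Hypothetical curves on a three-point grid (unbalanced groups on purpose)
toy = {
    "boys": [[100.0, 200.0, 300.0], [120.0, 220.0, 320.0]],
    "girls": [[80.0, 160.0, 240.0]],
}
mu_hat, alpha_hat = fanova_contrasts(toy)
print(mu_hat)     # [95.0, 185.0, 275.0]
print(alpha_hat)  # boys: [15.0, 25.0, 35.0], girls: [-15.0, -25.0, -35.0]
```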
A permutation functional F test can be used to assess the
effects of covariates on functional responses
Let $\hat{x}(t_0)$ be the vector of predicted values of the counts
function at time $t = t_0$; the pointwise functional F test
statistic is given by
$F(t_0) = \frac{\mathrm{Var}[\hat{x}(t_0)]}{\frac{1}{n} \sum_{ij} (x_{ij}(t_0) - \hat{x}_{ij}(t_0))^2}$
and the maximum functional F statistic,
$\max_{t_0} \{F(t_0)\}$ over all $t_0$ in the observation window, can be used to evaluate a global
association
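A minimal sketch of the permutation test for a single categorical covariate, assuming curves on a common grid and group means as predicted values. The sample variance (divisor n − 1) is used for the numerator, which is a choice made here; the simulated curves, the seed and the number of permutations are illustrative assumptions.

```python
import random
from statistics import mean, variance


def pointwise_f(curves, labels):
    """F(t0) at every grid point: Var of fitted values over mean squared residual."""
    n = len(curves)
    groups = sorted(set(labels))
    f_vals = []
    for m in range(len(curves[0])):
        values = [c[m] for c in curves]
        group_mean = {g: mean(v for v, l in zip(values, labels) if l == g) for g in groups}
        fitted = [group_mean[l] for l in labels]
        resid_ms = sum((v - f) ** 2 for v, f in zip(values, fitted)) / n
        f_vals.append(variance(fitted) / resid_ms)
    return f_vals


def permutation_max_f(curves, labels, n_perm, rng):
    """Observed max F and its permutation p-value under shuffled group labels."""
    observed = max(pointwise_f(curves, labels))
    exceed = 0
    shuffled = list(labels)
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if max(pointwise_f(curves, shuffled)) >= observed:
            exceed += 1
    return observed, (1 + exceed) / (1 + n_perm)


# Simulated data: group B has a bump of 60 counts at grid points 4 to 7
rng = random.Random(2012)
curves, labels = [], []
for i in range(20):
    group = "A" if i < 10 else "B"
    bump = 60.0 if group == "B" else 0.0
    curves.append([100.0 + (bump if 4 <= m <= 7 else 0.0) + rng.gauss(0.0, 10.0)
                   for m in range(12)])
    labels.append(group)

observed_max_f, p_value = permutation_max_f(curves, labels, n_perm=199, rng=rng)
print(observed_max_f, p_value)
```

Using the maximum over the grid gives a single global test while the pointwise curve F(t0) shows where in the day the groups differ.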
FANOVA to test gender, weekend and seasonal effects on
physical activity profiles
The models:
(gender) $x_{ij,g}(t) = \mu(t) + G_g(t) + \varepsilon_{ij,g}(t)$
(weekend) $x_{ij,w}(t) = \mu(t) + W_w(t) + \varepsilon_{ij,w}(t)$
(season) $x_{ij,s}(t) = \mu(t) + S_s(t) + \varepsilon_{ij,s}(t)$
Constrained by
$\sum_{g=1}^{2} G_g(t) = 0$; $\sum_{w=1}^{2} W_w(t) = 0$; $\sum_{s=1}^{4} S_s(t) = 0$, for all $t$ in the observation window
Figure: Smoothed mean counts profile for 6,709 children (x-axis: hours, 8.5 to 19.5; y-axis: counts per minute, 0 to 1600)
Boys and girls contrast functions
Figure: contrast functions for Boys and Girls (counts per minute against hours, 8.5 to 19.5) and pointwise F-statistic against hours
Workdays and weekend contrast functions
Figure: contrast functions for Workdays and Weekend (counts per minute against hours, 8.5 to 19.5) and pointwise F-statistic against hours
Calendar seasons contrast functions
Figure: contrast functions for Winter, Spring, Summer and Autumn (counts per minute against hours, 8.5 to 19.5) and pointwise F-statistic against hours
Functional PCA
Multivariate PCA uses the eigen-decomposition
$\Sigma = U D U' = \sum_{q=1}^{p} d_q u_q u_q'$
and $u_q' u_r = I(q = r)$
Instead of the covariance matrix $\Sigma$ we have the covariance
function $\sigma(s, t)$
For the covariance function
$\sigma(s, t) = \sum_{q=1}^{\infty} d_q \xi_q(s) \xi_q(t)$
for $\int \xi_q(t) \xi_r(t)\,dt = I(q = r)$
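$\sigma(s, t) = \sum_{q=1}^{\infty} d_q \xi_q(s) \xi_q(t)$
The $\xi_q(t)$ maximize $\mathrm{Var}\left[\int \xi_q(t) x_{ij}(t)\,dt\right]$
$d_q = \mathrm{Var}\left[\int \xi_q(t) x_{ij}(t)\,dt\right]$
$d_q / \sum_q d_q$ is the proportion of variance explained
Principal component scores are
$f_{ij,q} = \int \xi_q(t) x_{ij}(t)\,dt$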
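A discretised sketch of the first harmonic, assuming curves on an equally spaced grid with spacing h, integrals replaced by Riemann sums, and curves centred by the mean function before scores are computed. Power iteration stands in here for a full eigen-decomposition; it is a swapped-in technique for illustration, and the toy curves are made up.

```python
import math
from statistics import variance


def first_harmonic(curves, h, n_iter=200):
    """Leading eigenvalue d_1, eigenfunction xi_1, scores and variance share."""
    n, n_t = len(curves), len(curves[0])
    mean_curve = [sum(c[m] for c in curves) / n for m in range(n_t)]
    centred = [[c[m] - mean_curve[m] for m in range(n_t)] for c in curves]

    def apply_cov(u):
        # (h * S) u, with S the sample covariance matrix, without forming S
        proj = [sum(r * x for r, x in zip(row, u)) for row in centred]
        return [h * sum(p * row[m] for p, row in zip(proj, centred)) / (n - 1)
                for m in range(n_t)]

    u = [1.0 / math.sqrt(n_t)] * n_t  # arbitrary unit-length start vector
    for _ in range(n_iter):
        v = apply_cov(u)
        norm = math.sqrt(sum(x * x for x in v))
        u = [x / norm for x in v]

    d = sum(a * b for a, b in zip(u, apply_cov(u)))  # Rayleigh quotient
    xi = [x / math.sqrt(h) for x in u]               # so that h * sum(xi^2) = 1
    scores = [h * sum(x * r for x, r in zip(xi, row)) for row in centred]
    total = h * sum(r * r for row in centred for r in row) / (n - 1)
    return d, xi, scores, d / total


# Made-up curves: a dominant sine-shaped mode plus a weaker cosine-shaped one
h = 0.1
grid = [m * h for m in range(11)]
amp_sin = [4.0, 2.0, 1.0, -1.0, -3.0]
amp_cos = [0.3, -0.2, 0.1, -0.4, 0.2]
toy_curves = [[a * math.sin(math.pi * t) + b * math.cos(math.pi * t) for t in grid]
              for a, b in zip(amp_sin, amp_cos)]

d1, xi1, f1, share1 = first_harmonic(toy_curves, h)
print(d1, share1)
```

The variance of the returned scores equals d_1, mirroring the identity $d_q = \mathrm{Var}[\int \xi_q(t) x_{ij}(t)\,dt]$ on the grid.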
Estimated eigenfunctions (harmonics)
Figure: weights against hours (8.5 to 19.5) for Harmonic 1 (7.9 %), Harmonic 2 (4.7 %), Harmonic 3 (4.2 %) and Harmonic 4 (3.7 %)
Mean and standard deviation of the first 4 functional principal
component scores in boys and girls
        Boys             Girls            t-test
        Mean    SD       Mean     SD      p value
fPCA1   126.1   761.0    -124.1   658.4   <0.0001
fPCA2    47.0   614.1     -46.2   493.5   <0.0001
fPCA3    25.5   574.4     -25.1   469.5   <0.0001
fPCA4    20.2   544.2     -19.9   442.1   <0.0001
Conclusions
We have shown the potential of functional data analysis to
analyse a large database of physical activity trajectories
These methods make it possible to exploit the richness of the
information gathered
FDA can be used to evaluate temporal effects at different
levels, e.g., child level (obesity) and geographical level
All analyses were performed using 64-bit R on a
Unix-alike machine with 10 GB of RAM; the fda package and the
glm function were used
Acknowledgments
CHARGE group: Lucy J Griffiths, Mario Cortina-Borja,
Marco Geraci, Theodora Pouliou, Carly Rich, Carol
Dezateux
We are grateful to the children who participated in this
study and their families
This work was funded by the Wellcome Trust Grant
WT084686.