Functional Regression Analysis for Big Medical Movement Data
Jian Qing SHI
School of Mathematics & Statistics, Newcastle University, UK
j.q.shi@ncl.ac.uk
http://www.staff.ncl.ac.uk/j.q.shi
Joint work with Dr. Yafeng Cheng and Prof. Janet Eyre
UCL Workshop on the Theory of Big Data, London
08/01/2016
J. Q. Shi (Newcastle University)
Movement Data and Functional LARS
08/01/2016
1 / 23
Outline
1. Movement data: project of Limbs-Alive
2. The model and function-valued variable selection
  - A nonlinear mixed-effects scalar-on-function regression model
  - How to represent a function-valued variable
  - Correlation between function-valued variables
  - Functional least angle regression (fLARS)
3. Numerical results
4. Conclusions
Limbs-Alive: Monitoring of upper limb rehabilitation and recovery after stroke
Worldwide: an estimated 50-60 million stroke survivors at present.
UK: 1.1 million stroke survivors (2012).
Hemiparesis occurs in 80% of survivors:
  - 80% regain the ability to walk independently, but
  - 50-70% experience persisting upper-limb functional impairments.
Substantial improvement of upper-limb function has been demonstrated to be possible many years after stroke, but only with intense therapeutic input.
The Limbs-Alive way (a system developed by the Newcastle team):
  - Rehabilitation tool: patients play a video game at home.
  - Reduces cost and puts patients in charge of their own recovery.
  - Monitoring: therapists monitor patients' upper-limb rehabilitation remotely.
The rehabilitation system – Monitoring
  - Patients play an assessment game at home (comprising 38 movements).
  - The data are collected and transferred to our central server.
  - Data analysis: calibration, standardization, registration and then modeling.
  - The resulting information is relayed to the therapists' location, where therapists monitor patients' recovery and adjust therapy non-visually.
Evaluation of upper limb function
Current (most often used) method: the Chedoke Arm and Hand Activity Inventory (CAHAI).
Method in our project:
  - Designed 38 movements.
  - Collected data using wireless controllers.
  - It is objective and relatively cheap and, more importantly, it can be used for remote monitoring.
[Figure: wireless controller, labelled "three orthogonal coils" and "two of the receivers"]
Big data: evaluation of upper limb function
Data collected from the controllers (60 Hz):
  - Position data: x(t), y(t), z(t)
  - Orientation data: qx, qy, qz, qw (quaternions)
but:
  - the data are noisy;
  - raw data: about 15 GB;
  - sample size 187, but there are 273 scalar variables and 504 function-valued variables for each patient;
  - modeling is very challenging.
Preprocessing
Preliminary data analysis: Segmentation, Calibration,
Standardization, Smoothing and Registration
[Figure: examples of preprocessed movement curves; panels right_x, left_x, right_y, left_y, each plotted against time]
Modeling
Functional regression/classification/latent variable models:
  - nonlinearity and heterogeneity;
  - variable selection: a large number of scalar and function-valued variables;
  - model validation.
Available data
Aim: a model for $y_{ij}$.
Notation: i indexes patients; j indexes time-related replicates.
  - Impairment level (CAHAI): $y_{ij}$
  - Patient-related information: scalars, e.g. age, gender, initial impairment level ...
  - Kinematic variables: scalars including velocity, fluency, synchrony and accuracy; 266 variables in total
  - Function-valued variables: 504 variables in total
Nonlinear mixed-effects scalar-on-function regression model
$$
y_i = \gamma_0 + z_i\gamma + \sum_{j=1}^{J} \int x_{i,j}(t)\,\beta_j(t)\,dt + g(v_i) + \varepsilon_i, \qquad (1)
$$
$$
g(v) \sim \mathrm{GP}\big(0, \kappa(v, v'; \theta)\big), \qquad \varepsilon_i \sim N(0, \sigma^2).
$$
Features of the model:
  - A mixed-effects scalar-on-function model.
  - GPR (Gaussian process regression) models the nonlinear random effects and can cope with multi-dimensional covariates.
  - Provides a natural framework for simultaneously modeling the common mean structure across subjects and the covariance structure of each individual.
Challenges: variable selection for both scalar and function-valued variables.
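A minimal simulation sketch of model (1) in Python/NumPy may help fix ideas. Everything specific in it is an assumption for illustration, not a choice from the slides: the squared-exponential kernel standing in for κ(v, v'; θ), the J = 2 sine-wave covariates, the weight functions β_j(t), and all parameter values. The integral is approximated by the trapezoidal rule on a dense grid, and the GP draw uses a Cholesky factor with a small jitter.

```python
import numpy as np

def se_kernel(v1, v2, scale=1.0, length=0.5):
    """Squared-exponential kernel, an assumed choice for kappa(v, v'; theta)."""
    d2 = np.sum((v1[:, None, :] - v2[None, :, :]) ** 2, axis=-1)
    return scale * np.exp(-0.5 * d2 / length ** 2)

def simulate_model1(n=100, n_grid=101, sigma=0.1, seed=0):
    """Draw one data set from model (1) with J = 2 hypothetical curves per subject."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, n_grid)
    # Trapezoidal weights so that f(t) @ w approximates the integral over [0, 1]
    w = np.full(n_grid, t[1] - t[0])
    w[[0, -1]] /= 2
    # Hypothetical function-valued covariates x_{i,j}(t): randomly shifted sine waves
    shift = rng.uniform(size=(n, 2, 1))
    x = np.sin(2 * np.pi * (t + shift))          # shape (n, 2, n_grid)
    beta = np.stack([t, 1.0 - t])                # hypothetical beta_1(t), beta_2(t)
    func_term = ((x * beta) @ w).sum(axis=1)     # sum_j int x_{i,j}(t) beta_j(t) dt
    # Scalar covariates z_i and hypothetical fixed effects gamma_0, gamma
    z = rng.normal(size=(n, 3))
    gamma0, gamma = 1.0, np.array([0.5, -0.3, 0.2])
    # Nonlinear random effect g(v_i) ~ GP(0, kappa), drawn via a jittered Cholesky factor
    v = rng.uniform(size=(n, 2))
    K = se_kernel(v, v) + 1e-6 * np.eye(n)
    g = np.linalg.cholesky(K) @ rng.normal(size=n)
    eps = rng.normal(scale=sigma, size=n)        # eps_i ~ N(0, sigma^2)
    y = gamma0 + z @ gamma + func_term + g + eps
    return {"t": t, "x": x, "z": z, "v": v, "y": y}
```

Calling `simulate_model1()` returns the grid, the curves, the scalar covariates, the GP inputs and the responses, which is enough to try any of the estimation or selection ideas that follow on data with a known truth.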
Represent function-valued variables
The integral
$$
u = \int x(t)\,\beta(t)\,dt
$$
can be represented:
  - by representative data points (RDP), say $\mathbf{x} = (x(t_1), x(t_2), \ldots, x(t_k))$, if k is large enough;
  - by basis functions (BF), e.g. $x(t) = \sum_{k=1}^{K} c_k\,\phi_k(t)$;
  - by Gaussian quadrature (GQ):
$$
\int_{-1}^{1} f(t)\,dt \approx \sum_{i=1}^{k} w_i\, f(t_i),
$$
    where k is usually quite small.
A unified expression:
$$
\int x(t)\,\beta(t)\,dt = X W C_\beta^{\top}, \qquad (2)
$$
$$
\int [\beta''(t)]^2\,dt = C_\beta W_2 C_\beta^{\top}. \qquad (3)
$$
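The RDP and GQ representations, and a penalty of the form (3), can be compared numerically with a short Python/NumPy sketch. It assumes the functions live on [0, 1] (so the Gauss-Legendre nodes are mapped from [-1, 1]) and uses the hypothetical examples x(t) = sin(πt) and β(t) = t, whose exact integral is 1/π. Reading the GQ version of (2) as X W b, with X the values x(t_i), W = diag(w_i) and b the values β(t_i), and using a monomial basis to build W_2, are our assumptions rather than the slides' exact construction.

```python
import numpy as np

def gauss_legendre_01(k):
    """k Gauss-Legendre nodes and weights, mapped from [-1, 1] to [0, 1]."""
    nodes, weights = np.polynomial.legendre.leggauss(k)
    return (nodes + 1.0) / 2.0, weights / 2.0

def integral_rdp(x, beta, k=1001):
    """RDP: values on k equispaced points combined with trapezoidal weights."""
    t = np.linspace(0.0, 1.0, k)
    w = np.full(k, 1.0 / (k - 1))
    w[[0, -1]] /= 2
    return np.sum(w * x(t) * beta(t))

def integral_gq(x, beta, k=8):
    """GQ: the integral as X W b with X = x(t_i), W = diag(w_i), b = beta(t_i)."""
    t, w = gauss_legendre_01(k)
    return x(t) @ np.diag(w) @ beta(t)

def roughness_gram(K, k=10):
    """W2[a, b] = int_0^1 phi_a''(t) phi_b''(t) dt for the monomial basis phi_a(t) = t**a."""
    t, w = gauss_legendre_01(k)
    a = np.arange(K)[:, None]
    d2 = a * (a - 1) * t[None, :] ** np.maximum(a - 2, 0)   # phi_a''(t_i), shape (K, k)
    return (d2 * w) @ d2.T

def x_example(t):
    """Hypothetical x(t)."""
    return np.sin(np.pi * t)

def beta_example(t):
    """Hypothetical beta(t); int_0^1 t sin(pi t) dt = 1/pi."""
    return t
```

With only k = 8 nodes, `integral_gq(x_example, beta_example)` agrees with 1/π to near machine precision, while the 1001-point RDP version is off by an amount on the order of 1e-7, which illustrates why a small k can suffice for GQ. For β(t) = t², i.e. C_β = (0, 0, 1, 0) in the basis (1, t, t², t³), `C @ roughness_gram(4) @ C` gives 4 = ∫ 2² dt.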
Correlation between random vectors
Canonical correlation looks for the linear correlation between two random vectors $Z_1$ and $Z_2$. For each pair of coefficient vectors a, b, define
$$
\rho(Z_1, Z_2 \mid a, b) = \frac{\operatorname{cov}(a^{\top} Z_1,\ b^{\top} Z_2)}{\sqrt{\operatorname{Var}(a^{\top} Z_1)\,\operatorname{Var}(b^{\top} Z_2)}}. \qquad (4)
$$
The first canonical correlation is defined as
$$
\rho(Z_1, Z_2) = \max_{a, b}\ \rho(Z_1, Z_2 \mid a, b).
$$
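A sample version of (4) and of the first canonical correlation takes a few lines of NumPy. The sketch uses the standard construction (our choice; the slides do not specify a computation): whiten each block with the inverse square root of its sample covariance, take the largest singular value of the whitened cross-covariance, and map the leading singular vectors back to obtain a and b. The function names are hypothetical.

```python
import numpy as np

def inv_sqrt(S):
    """Inverse symmetric square root of a positive-definite matrix."""
    vals, vecs = np.linalg.eigh(S)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T

def first_canonical_correlation(Z1, Z2):
    """Sample rho(Z1, Z2): returns (rho, a, b) with a, b maximising (4)."""
    Z1 = Z1 - Z1.mean(axis=0)
    Z2 = Z2 - Z2.mean(axis=0)
    n = Z1.shape[0]
    S11 = Z1.T @ Z1 / (n - 1)
    S22 = Z2.T @ Z2 / (n - 1)
    S12 = Z1.T @ Z2 / (n - 1)
    R11, R22 = inv_sqrt(S11), inv_sqrt(S22)
    # Singular values of the whitened cross-covariance are the canonical correlations
    U, s, Vt = np.linalg.svd(R11 @ S12 @ R22)
    a = R11 @ U[:, 0]        # a^T Z1 has unit sample variance
    b = R22 @ Vt[0]          # b^T Z2 has unit sample variance
    return s[0], a, b
```

Because a and b are normalized to unit variance, the sample correlation of Z1 a and Z2 b equals the returned ρ, which gives a simple check of the computation.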
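Functional canonical correlation
The functional canonical correlation is defined as follows:
$$
\rho(X_1(t), X_2(t) \mid \beta_1(t), \beta_2(t)) = \frac{\operatorname{cov}\left(\int X_1(t)\beta_1(t)\,dt,\ \int X_2(t)\beta_2(t)\,dt\right)}{\sqrt{\operatorname{Var}\left(\int X_1(t)\beta_1(t)\,dt\right)\operatorname{Var}\left(\int X_2(t)\beta_2(t)\,dt\right)}}.
$$
A roughness penalty needs to be applied, e.g. $\mathrm{PEN} = \int [\beta''(t)]^2\,dt$. Then
$$
\rho(X_1(t), X_2(t)) = \rho(X_1(t), X_2(t) \mid \tilde\beta_1(t), \tilde\beta_2(t)),
$$
with
$$
(\tilde\beta_1(t), \tilde\beta_2(t)) = \arg\max_{\beta_1(t),\,\beta_2(t)} \left\{ \rho(X_1(t), X_2(t) \mid \beta_1(t), \beta_2(t)) - \lambda_1 \int [\beta_1''(t)]^2\,dt - \lambda_2 \int [\beta_2''(t)]^2\,dt \right\},
$$
where $\lambda_1$ and $\lambda_2$ are two tuning parameters.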
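A discretized sketch of a penalized functional canonical correlation is given below. It uses a different but commonly used formulation from the one above: the penalty enters the variance normalization (maximize the covariance subject to Var + λ·PEN = 1) instead of being subtracted from ρ, which yields a closed-form solution via an SVD. Further assumptions: both sets of curves are observed on the same equispaced grid t (the RDP idea), integrals use trapezoidal weights, β'' is approximated by second differences, and all function names are hypothetical.

```python
import numpy as np

def trapezoid_weights(t):
    """Weights w such that f(t) @ w approximates int f(t) dt on the grid t."""
    h = np.diff(t)
    w = np.zeros_like(t)
    w[:-1] += h / 2
    w[1:] += h / 2
    return w

def roughness_matrix(t):
    """P such that beta @ P @ beta approximates int [beta''(t)]^2 dt (equispaced t)."""
    h = t[1] - t[0]
    D2 = np.diff(np.eye(len(t)), n=2, axis=0)   # each row is a second difference
    return D2.T @ D2 / h ** 3

def penalized_fcca(X1, X2, t, lam1=1e-6, lam2=1e-6):
    """First penalised functional canonical correlation of curves X1, X2 (n x m)."""
    w = trapezoid_weights(t)
    U1 = X1 * w - (X1 * w).mean(axis=0)         # U1 @ beta1 ~ int X1(t) beta1(t) dt, centred
    U2 = X2 * w - (X2 * w).mean(axis=0)
    n, m = U1.shape
    S11, S22, S12 = U1.T @ U1 / (n - 1), U2.T @ U2 / (n - 1), U1.T @ U2 / (n - 1)
    P, jitter = roughness_matrix(t), 1e-10 * np.eye(m)
    LA = np.linalg.cholesky(S11 + lam1 * P + jitter)
    LB = np.linalg.cholesky(S22 + lam2 * P + jitter)
    # Whitened cross-covariance; its leading singular pair gives the penalised optimum
    M = np.linalg.solve(LA, S12) @ np.linalg.inv(LB).T
    U, s, Vt = np.linalg.svd(M)
    beta1 = np.linalg.solve(LA.T, U[:, 0])
    beta2 = np.linalg.solve(LB.T, Vt[0])
    # Report the ordinary (unpenalised) correlation at the penalised optimum
    rho = np.corrcoef(U1 @ beta1, U2 @ beta2)[0, 1]
    return rho, beta1, beta2
```

Larger λ values push β1 and β2 toward straight lines (the null space of the second-difference penalty); in practice the tuning parameters would be chosen by, for example, cross-validation.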
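Canonical correlation between one function-valued variable and one scalar random variable
$$
\rho(X(t), y \mid \beta(t), b) = \frac{\operatorname{cov}\left(\int X(t)\beta(t)\,dt,\ b\,y\right)}{\sqrt{\operatorname{Var}\left(\int X(t)\beta(t)\,dt\right)\operatorname{Var}(b\,y)}},
$$
and $\rho(X(t), y) = \rho(X(t), y \mid \tilde\beta(t), \tilde b)$ with
$$
(\tilde\beta(t), \tilde b) = \arg\max_{\beta(t),\,b} \left\{ \rho(X(t), y \mid \beta(t), b) - \lambda_1 \int [\beta''(t)]^2\,dt \right\}.
$$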
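For a scalar y the factor b only fixes the sign of the correlation, so the problem reduces to choosing β(t). Under the constrained formulation (maximize the covariance subject to Var + λ·PEN = 1, an assumption that differs from subtracting the penalty from ρ), the maximizer has the closed form β ∝ (S_UU + λP)⁻¹ S_Uy. The NumPy sketch below assumes curves on a shared equispaced grid, trapezoidal weights for the integral and a second-difference penalty; the function names are hypothetical.

```python
import numpy as np

def trapezoid_weights(t):
    """Weights w such that f(t) @ w approximates int f(t) dt on the grid t."""
    h = np.diff(t)
    w = np.zeros_like(t)
    w[:-1] += h / 2
    w[1:] += h / 2
    return w

def roughness_matrix(t):
    """P such that beta @ P @ beta approximates int [beta''(t)]^2 dt (equispaced t)."""
    h = t[1] - t[0]
    D2 = np.diff(np.eye(len(t)), n=2, axis=0)   # each row is a second difference
    return D2.T @ D2 / h ** 3

def fcca_scalar(X, y, t, lam=1e-6):
    """Penalised correlation between curves X (n x m, on grid t) and a scalar y."""
    U = X * trapezoid_weights(t)                # U @ beta approximates int X(t) beta(t) dt
    U = U - U.mean(axis=0)
    yc = y - y.mean()
    n, m = U.shape
    A = U.T @ U / (n - 1) + lam * roughness_matrix(t) + 1e-10 * np.eye(m)
    # Maximiser of cov(U beta, y) subject to beta' A beta = 1, up to a positive scale
    beta = np.linalg.solve(A, U.T @ yc / (n - 1))
    rho = np.corrcoef(U @ beta, yc)[0, 1]
    return rho, beta
```

This closed form is cheap enough to evaluate many times, which matters when the correlation has to be recomputed repeatedly inside a selection algorithm.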
Functional LARS (fLARS) Algorithm
1. Normalize the response and all variables. Set all coefficients to 0.
2. Find the variable with the maximum canonical correlation with the response and put it in the selected set A.
3. The iterations start here. In iteration k, find the canonical variable of all the variables in A against the current residual $r_k$; call it $p_k$. For each $j \notin A$, find $\alpha_{k,j}$ from
$$
\operatorname{cor}(r_k - \alpha_{k,j}\,p_k,\ p_k)^2 = \operatorname{fcca}(r_k - \alpha_{k,j}\,p_k,\ x_j(t))^2, \qquad j \notin A,
$$
   where $\alpha_{k,j}$ is the distance $p_k$ must move to meet variable $x_j(t)$.
4. In iteration k, move along $p_k$ by $\alpha_{k,j_k} = \min_{j \notin A} \alpha_{k,j}$, and move $j_k$ into A.
5. Repeat until the stopping rule is met.
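The steps above can be sketched in Python as follows. This is a simplified, hypothetical implementation (the names `flars_sketch` and `fcca_ridge` are ours), not the authors' code. Assumptions: each candidate variable has already been reduced to an n × m matrix (for a function-valued variable, for example its values at quadrature nodes multiplied by the weights; for a scalar variable, a single column); a small ridge penalty stands in for the roughness penalty; each α is found by bisection; only the selection order and the distances α are tracked, not the coefficients; and the loop stops when the distance to move falls below a threshold.

```python
import numpy as np

def _standardize(v):
    """Centre and scale (the columns of) v to mean 0 and standard deviation 1."""
    v = v - v.mean(axis=0)
    return v / v.std(axis=0)

def fcca_ridge(r, U, lam=1e-6):
    """Correlation between scalar r and its best linear combination U @ b, with a
    small ridge penalty; returns (correlation, standardised canonical variate)."""
    Uc, rc = U - U.mean(axis=0), r - r.mean()
    S = Uc.T @ Uc / len(r) + lam * np.eye(U.shape[1])
    b = np.linalg.solve(S, Uc.T @ rc / len(r))
    u = Uc @ b
    return np.corrcoef(u, rc)[0, 1], _standardize(u)

def flars_sketch(y, variables, max_steps=10, alpha_tol=1e-3, lam=1e-6):
    """Hypothetical fLARS-style selection; returns (selection order, alphas)."""
    r = _standardize(y)                                # step 1: normalise everything
    V = [_standardize(U) for U in variables]
    first = int(np.argmax([fcca_ridge(r, U, lam)[0] for U in V]))
    active, alphas = [first], []                       # step 2: most correlated variable
    for _ in range(max_steps):                         # steps 3-5
        inactive = [j for j in range(len(V)) if j not in active]
        if not inactive:
            break
        # p_k: canonical variate of the active set against the current residual
        _, p = fcca_ridge(r, np.hstack([V[j] for j in active]), lam)
        alpha_max = np.mean((r - r.mean()) * p)        # cor(r - alpha * p, p) = 0 here
        best_j, best_alpha = None, np.inf
        for j in inactive:
            def gap(a, j=j):
                res = r - a * p
                return np.corrcoef(res, p)[0, 1] ** 2 - fcca_ridge(res, V[j], lam)[0] ** 2
            lo, hi = 0.0, alpha_max
            if gap(lo) <= 0:                           # x_j already as correlated as p_k
                a_j = 0.0
            else:
                for _ in range(50):                    # bisection for the crossing point
                    mid = 0.5 * (lo + hi)
                    lo, hi = (mid, hi) if gap(mid) > 0 else (lo, mid)
                a_j = 0.5 * (lo + hi)
            if a_j < best_alpha:
                best_j, best_alpha = j, a_j
        alphas.append(best_alpha)
        if best_alpha < alpha_tol:                     # stopping rule: alpha below threshold
            break
        r = r - best_alpha * p                         # step 4: move along p_k
        active.append(best_j)
    return active, alphas
```

On simulated data in which only the first two of five candidate variables drive y, with the first having the larger effect, this sketch should select variable 0 first and variable 1 second.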
Functional LARS Algorithm
Two new stopping rules have been developed:
  - when the distance to move, α, is smaller than a threshold;
  - when AD reaches its maximum.
[Figure: the distance to move, α, plotted against iteration]
Simulation study
Model:
$$
y = \mu + \sum_k z_k\gamma_k + \sum_j \int x_j(t)\,\beta_j(t)\,dt
$$
True model: includes three function-valued variables $x_1(t), x_2(t), x_3(t)$ and three scalar variables $z_1, z_2, z_3$.
Candidate variables: 50 function-valued variables and 50 scalar variables.
Simulation study: function variables used in the true model
[Figure: panels x1(t), x2(t), x3(t) and β1, β2, β3 for the true model, values plotted against time or index]
Simulation study: numerical results

Method          RMSE     Correctly selected (%)   False selected (%)   Time (sec)
fLARS, RDP      0.0639   99.58                    0.15                 14.3802
fLARS, GQ       0.0721   99.03                    0.50                 3.6457
fLARS, BF       0.0622   99.85                    0.26                 5.5291
GroupLassoP     0.1103   100                      43.56                1659.4159
GroupLassoB     0.3662   54.71                    14.05                9.3553
Numerical results for movement data

                         Acute             Chronic
Model                    RMSE     SE       RMSE     SE
Scalar only: lm          6.985    0.229    3.904    0.106
Scalar only: ME          6.614    0.179    4.047    0.105
fLARS: fFE               6.200    0.186    3.698    0.090
fLARS: fME               6.061    0.176    3.854    0.095

Table: RMSEs between predictions and observations over 1000 replications.
Movement data: the observations and predictions (cross-validation)
[Figure: predictions plotted against observed CAHAI, two panels]
Figure: Left: using scalar variables only. Right: using both scalar and function-valued variables.
Conclusions
Proposed a complex nonlinear scalar-on-function regression model:
  - nonlinear random effects modeled by a GPR;
  - variable selection for a large number of scalar and function-valued variables: the new fLARS works efficiently and accurately (in the simulation study, at least 99% of true variables correctly selected with at most 0.5% falsely selected);
  - very good performance on the big medical movement data: lower prediction RMSEs than the scalar-only models.
Further research:
  - consider different correlation coefficients for function-valued variables;
  - different ways of representing function-valued variables: replace GQ by another method that keeps more information about x(t);
  - a nonlinear scalar-on-function model (nonlinear in the function-valued variable x(t); a GP prior may be used).
Thank you
References:
Cheng, Y., Shi, J. Q. and Eyre, J. (2015). Nonlinear mixed-effects scalar-on-function regression models and variable selection for movement data.
Serradilla, J., Shi, J. Q., Cheng, Y., Morgan, G., Lambden, C. and Eyre, J. A. (2014). Automatic Assessment of Upper Limb Function During Play of the Action Video Game, Circus Challenge: Validity and Sensitivity to Change. SEGAH 2014. (Best paper award at the IEEE 3rd International Conference on Serious Games and Applications for Health, held in Rio de Janeiro, Brazil, May 14-16, 2014.)
Shi, J. Q. and Cheng, Y. (2014). R-package: GPFDA (Gaussian Process Function Data Analysis). https://www.staff.ncl.ac.uk/j.q.shi/ps/gpfda.pdf.
Shi, J. Q. and Choi, T. (2011). Gaussian Process Regression Analysis for Functional Data. Chapman & Hall/CRC.