Functional Regression Analysis for Big Medical Movement Data

Jian Qing SHI
School of Mathematics & Statistics, Newcastle University, UK
j.q.shi@ncl.ac.uk
http://www.staff.ncl.ac.uk/j.q.shi

Joint work with Dr. Yafeng Cheng and Prof. Janet Eyre

UCL Workshop on the Theory of Big Data, London
08/01/2016

J. Q. Shi (Newcastle University) Movement Data and Functional LARS 08/01/2016 1 / 23

Outline

1 Movement data: project of Limbs-Alive
2 The model and function-valued variable selection
   - A nonlinear mixed-effects scalar-on-function regression model
   - How to represent a function-valued variable
   - Correlation between function-valued variables
   - Functional least angle regression (fLARS)
3 Numerical results
4 Conclusions

Limbs-Alive: Monitoring of upper limb rehabilitation and recovery after stroke

Worldwide: an estimated 50-60 million stroke survivors currently.
UK: 1.1 million stroke survivors (2012).
Hemiparesis occurs in 80% of survivors:
   - 80% regain the ability to walk independently, but
   - 50-70% experience persisting upper-limb functional impairments.
Substantial improvement of upper limb function has been demonstrated to be possible many years after stroke, but only with intense therapeutic input.
The Limbs-Alive way (a system developed by the Newcastle team):
   - Rehabilitation tool: patients play a video game at home.
   - Reduces cost and puts patients in charge of their own recovery.
   - Monitoring: therapists monitor patients' upper-limb rehabilitation remotely.

The rehabilitation system: Monitoring

Patients play an assessment game at home (including 38 movements).
The data are collected and transferred to our central server.
Data analysis: calibration, standardization, registration and then modeling.
Relay information is sent to the therapists' location.
Therapists monitor patients' recovery and adjust therapy non-visually.

Evaluation of upper limb function

Current (most often used) method: Chedoke Arm and Hand Activity Inventory (CAHAI).
Method in our project:
   - Designed 38 movements;
   - Collected data using wireless controllers;
   - It is objective, relatively cheap and, more importantly, it can be used for remote monitoring.
[Figure: wireless controller hardware, labelled "THREE ORTHOGONAL COILS" and "TWO OF THE RECEIVERS"]

Big data: evaluation of upper limb function

Data collected from the controllers (60 Hz):
   - Position data: $x(t)$, $y(t)$, $z(t)$
   - Orientation data: $q_x$, $q_y$, $q_z$, $q_w$ (quaternions)
but
   - the data are noisy;
   - the raw data are about 15 GB;
   - the sample size is 187, but there are 273 scalar variables and 504 function-valued variables for each patient;
   - modeling is very challenging.

Preprocessing

Preliminary data analysis: Segmentation, Calibration, Standardization, Smoothing and Registration
[Figure: four panels (left_x, right_x, left_y, right_y), each plotted against time]

Modeling

Functional regression/classification/latent variable model
   - Nonlinearity and heterogeneity
   - Variable selection: large number of scalar and function-valued variables
Model validation.

Available data

Aim: model for $y_{ij}$
Notation: $i$ for patients; $j$ for time-related replicates
Impairment level (CAHAI): $y_{ij}$
Patient-related information: scalars, e.g. age, gender, initial impairment level, ...
Kinematic variables: scalars including velocity, fluency, synchrony and accuracy, 266 variables in total
Function-valued variables: 504 variables in total

Nonlinear mixed-effects scalar-on-function regression model

\[ y_i = \gamma_0 + z_i \gamma + \sum_{j=1}^{J} \int x_{i,j}(t)\,\beta_j(t)\,dt + g(v_i) + \epsilon_i \qquad (1) \]
\[ g(v) \sim GP(0, \kappa(v, v'; \theta)), \qquad \epsilon_i \sim N(0, \sigma^2). \]

Features of the model
   - A mixed-effects scalar-on-function model.
   - GPR models nonlinear random effects and can cope with multi-dimensional covariates.
   - Provides a natural framework for simultaneously modeling the common mean structure across different subjects and the covariance structure for each individual.
   - Challenges: variable selection of both scalar and function-valued variables.

Represent function-valued variables

\[ u = \int x(t)\,\beta(t)\,dt \]
   - by representative data points (RDP), say $x = (x(t_1), x(t_2), \ldots, x(t_k))$, if $k$ is large enough.
   - by basis functions (BF), e.g. $x(t) = \sum_{k=1}^{K} c_k \phi_k(t)$.
   - by Gaussian quadrature (GQ):
     \[ \int_{-1}^{1} f(t)\,dt \approx \sum_{i=1}^{k} w_i f(t_i), \]
     where $k$ is usually quite small.

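A minimal sketch of the Gaussian quadrature representation of $u = \int x(t)\beta(t)\,dt$, assuming Gauss-Legendre nodes and weights from NumPy. The function `gq_integral` and the example curves `x` and `beta` are hypothetical choices for illustration, not taken from the talk. A rule with $k$ nodes is exact for polynomials of degree up to $2k - 1$, which is why $k = 3$ suffices for the cubic integrand here.

```python
import numpy as np


def gq_integral(f, k, a=-1.0, b=1.0):
    """Approximate the integral of f over [a, b] with k Gauss-Legendre nodes."""
    nodes, weights = np.polynomial.legendre.leggauss(k)  # rule on [-1, 1]
    t = 0.5 * (b - a) * nodes + 0.5 * (b + a)            # map nodes to [a, b]
    return 0.5 * (b - a) * np.sum(weights * f(t))


def x(t):
    """Hypothetical function-valued variable."""
    return t ** 2


def beta(t):
    """Hypothetical coefficient function."""
    return 1.0 + t


# u = integral of x(t) * beta(t): the integrand t^2 + t^3 has degree 3 <= 2*3 - 1.
u = gq_integral(lambda t: x(t) * beta(t), k=3)                  # over [-1, 1]
u01 = gq_integral(lambda t: x(t) * beta(t), k=3, a=0.0, b=1.0)  # over [0, 1]
print(u, u01)
```

On $[-1, 1]$ the exact value is $2/3$ and on $[0, 1]$ it is $7/12$; for non-polynomial curves the sum is only an approximation and $k$ may need to be larger.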
A unified expression

\[ \int x(t)\,\beta(t)\,dt = X W C_\beta^T, \qquad (2) \]
\[ \int [\beta''(t)]^2\,dt = C_\beta W_2 C_\beta^T. \qquad (3) \]

Correlation between random vectors

Canonical correlation looks for the linear correlation between two random vectors. For each pair of vectors $a$, $b$, define
\[ \rho(Z_1, Z_2 \mid a, b) = \frac{cov(a^T Z_1, b^T Z_2)}{\sqrt{Var(a^T Z_1)\,Var(b^T Z_2)}} \qquad (4) \]
The first canonical correlation is defined as
\[ \rho(Z_1, Z_2) = \max_{a, b} \frac{cov(a^T Z_1, b^T Z_2)}{\sqrt{Var(a^T Z_1)\,Var(b^T Z_2)}} \]

Functional canonical correlation

The functional canonical correlation is defined in the following way:
\[ \rho(X_1(t), X_2(t) \mid \beta_1(t), \beta_2(t)) = \frac{cov\left(\int X_1(t)\beta_1(t)\,dt, \int X_2(t)\beta_2(t)\,dt\right)}{\sqrt{\left[Var\left(\int X_1(t)\beta_1(t)\,dt\right)\right]\left[Var\left(\int X_2(t)\beta_2(t)\,dt\right)\right]}} \]
A roughness penalty needs to be applied, e.g. $PEN = \int [\beta''(t)]^2\,dt$. Then
\[ \rho(X_1(t), X_2(t)) = \rho(X_1(t), X_2(t) \mid \tilde\beta_1(t), \tilde\beta_2(t)) \]
with
\[ (\tilde\beta_1(t), \tilde\beta_2(t)) = \operatorname*{argmax}_{\beta_1(t), \beta_2(t)} \left\{ \rho(X_1(t), X_2(t) \mid \beta_1(t), \beta_2(t)) - \lambda_1 \int [\beta_1''(t)]^2\,dt - \lambda_2 \int [\beta_2''(t)]^2\,dt \right\} \]
where $\lambda_1$ and $\lambda_2$ are two tuning parameters.

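For the vector case in equation (4), the first canonical correlation can be computed from sample covariances. The sketch below is one standard route (whitening with Cholesky factors, then a singular value decomposition), not necessarily the computation used in the talk; the function name `first_canonical_correlation` and the simulated data are assumptions for illustration. It does not include the roughness penalty of the functional version.

```python
import numpy as np


def first_canonical_correlation(Z1, Z2):
    """Return (rho, a, b): first canonical correlation and its weight vectors.

    Z1 has shape (n, p) and Z2 has shape (n, q); rows are observations.
    """
    n = Z1.shape[0]
    Z1c = Z1 - Z1.mean(axis=0)
    Z2c = Z2 - Z2.mean(axis=0)
    S11 = Z1c.T @ Z1c / (n - 1)
    S22 = Z2c.T @ Z2c / (n - 1)
    S12 = Z1c.T @ Z2c / (n - 1)
    # Cholesky factors S11 = L1 L1^T and S22 = L2 L2^T whiten the two blocks.
    L1 = np.linalg.cholesky(S11)
    L2 = np.linalg.cholesky(S22)
    # M = L1^{-1} S12 L2^{-T}; its largest singular value is the correlation.
    M = np.linalg.solve(L1, np.linalg.solve(L2, S12.T).T)
    U, s, Vt = np.linalg.svd(M)
    a = np.linalg.solve(L1.T, U[:, 0])  # weights for Z1
    b = np.linalg.solve(L2.T, Vt[0])    # weights for Z2
    return s[0], a, b


# Hypothetical data: Z2 depends linearly on two columns of Z1 plus noise.
rng = np.random.default_rng(1)
Z1 = rng.normal(size=(200, 3))
mix = np.array([[1.0, 0.5], [0.0, 1.0]])
Z2 = Z1[:, :2] @ mix + 0.5 * rng.normal(size=(200, 2))

rho, a, b = first_canonical_correlation(Z1, Z2)
# The correlation of the two canonical variables reproduces rho.
check = np.corrcoef(Z1 @ a, Z2 @ b)[0, 1]
print(rho, check)
```

Under a representation such as (2), the functional problem reduces to a similar finite-dimensional one, with the penalty in (3) entering as an extra quadratic term.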
Canonical correlation between one function-valued variable and one scalar random variable

\[ \rho(X(t), y \mid \beta(t), b) = \frac{cov\left(\int X(t)\beta(t)\,dt, by\right)}{\sqrt{\left[Var\left(\int X(t)\beta(t)\,dt\right)\right]\left[Var(by)\right]}} \]
and $\rho(X(t), y) = \rho(X(t), y \mid \tilde\beta(t), \tilde b)$ with
\[ (\tilde\beta(t), \tilde b) = \operatorname*{argmax}_{\beta(t), b} \left\{ \rho(X(t), y \mid \beta(t), b) - \lambda_1 \int [\beta''(t)]^2\,dt \right\}. \]

Functional LARS (fLARS) Algorithm

1 Normalize the response and all variables. Set all coefficients to 0.
2 Find the variable with the maximum canonical correlation with the response. Put it in the selected set $A$.
3 Iterations start from here. In iteration $k$, find the canonical variable of all the variables in $A$ against the current residual $r_k$; call it $p_k$. Then find $\alpha_{k,j}$ for each $j \notin A$ from
   \[ cor(r_k - \alpha_{k,j} p_k, p_k)^2 = fcca(r_k - \alpha_{k,j} p_k, x_j(t))^2, \quad j \notin A, \]
   where $\alpha_{k,j}$ is the distance $p_k$ has to move to meet variable $x_j(t)$.
4 In iteration $k$, $p_k$ moves by $\min_{j \notin A} \alpha_{k,j}$, and the index $j_k$ attaining the minimum is moved into $A$.
5 Repeat until the stopping rule is met.

Functional LARS Algorithm

Two new stopping rules have been developed:
   - when the distance to move, $\alpha$, is smaller than a threshold;
   - when AD reaches the maximum.
[Figure: value of $\alpha$ against iteration]

Simulation study

Model: $y = \mu + \sum_k z_k \gamma_k + \sum_j \int x_j(t)\,\beta_j(t)\,dt$
True model: three function-valued variables $x_1(t)$, $x_2(t)$, $x_3(t)$ and three scalar variables $z_1$, $z_2$, $z_3$.
Candidate variables: 50 function-valued variables and 50 scalar variables.

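A rough sketch of step 2 of the algorithm on simulated data, in the spirit of the simulation setup but much smaller. Everything here is an assumption for illustration: the sizes, the sine-based curves, the single true variable, the tuning value `lam`, and above all the estimator. The smoothed correlation is obtained from a ridge-type penalized regression on representative data points with a second-difference roughness penalty, which is a simple surrogate and not the authors' fcca estimator.

```python
import numpy as np

rng = np.random.default_rng(0)
n, J, k = 100, 5, 50            # samples, candidate curves, grid points (hypothetical)
t = np.linspace(0.0, 1.0, k)
dt = t[1] - t[0]

# Candidate curves x_j(t): random combinations of four sine functions.
basis = np.stack([np.sin(m * np.pi * t) for m in range(1, 5)])  # shape (4, k)
X = rng.normal(size=(J, n, 4)) @ basis                          # shape (J, n, k)

# Only curve 0 enters the (hypothetical) true model, through beta(t) = sin(2 pi t).
beta_true = np.sin(2.0 * np.pi * t)
y = (X[0] * beta_true).sum(axis=1) * dt + 0.05 * rng.normal(size=n)

# Second-difference matrix: (D b)_i = b_i - 2 b_{i+1} + b_{i+2}, a discrete roughness measure.
D = np.diff(np.eye(k), n=2, axis=0)                             # shape (k - 2, k)


def smoothed_correlation(Xj, y, lam=1e-3):
    """Correlation between y and a roughness-penalized projection of Xj."""
    Xw = (Xj - Xj.mean(axis=0)) * dt        # centred curves times quadrature weight
    yc = y - y.mean()
    A = Xw.T @ Xw + lam * D.T @ D           # penalized normal equations
    beta_hat = np.linalg.lstsq(A, Xw.T @ yc, rcond=None)[0]
    u = Xw @ beta_hat                       # approximates the integral of x(t) beta(t)
    return np.corrcoef(u, yc)[0, 1], beta_hat


rho = np.array([smoothed_correlation(X[j], y)[0] for j in range(J)])
selected = int(np.argmax(rho ** 2))         # variable entering the selected set first
print(selected, np.round(rho, 3))
```

In the full algorithm this selection would be followed by the iterations of step 3, where the distance $\alpha_{k,j}$ is solved for each variable outside the selected set; that part is not sketched here.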
Simulation study: function variables used in the true model

[Figure: top row, coefficient functions $\beta_1$, $\beta_2$, $\beta_3$ (value against time); bottom row, $x_1(t)$, $x_2(t)$, $x_3(t)$ (against index)]

Simulation study: numerical results

Method        RMSE     Correctly selected (%)   False selected (%)   Time (sec)
fLARS, RDP    0.0639   99.58                    0.15                 14.3802
fLARS, GQ     0.0721   99.03                    0.50                 3.6457
fLARS, BF     0.0622   99.85                    0.26                 5.5291
GroupLassoP   0.1103   100                      43.56                1659.4159
GroupLassoB   0.3662   54.71                    14.05                9.3553

Numerical results for movement data

                 Only scalar         fLARS
                 lm       ME         fFE      fME
Acute    RMSE    6.985    6.614      6.200    6.061
         SE      0.229    0.179      0.186    0.176
Chronic  RMSE    3.904    4.047      3.698    3.854
         SE      0.106    0.105      0.090    0.095

Table: RMSEs comparing the predictions and the observations with 1000 replications.

Movement data: the observations and predictions (cross validation)

[Figure: predictions against CAHAI. Left: using scalar variables only. Right: using both scalar and function-valued variables.]

Conclusions

Proposed a complex nonlinear scalar-on-function regression model
   - Nonlinear random effects: modeled by a GPR
   - Variable selection for a large number of scalar and function-valued variables: the new fLARS works efficiently and accurately
   - Very good performance for the big medical movement data
Further research
   - Consider different correlation coefficients for function-valued variables.
   - Different ways of representing function-valued variables: replace GQ by another method which can keep more information about $x(t)$.
   - A nonlinear scalar-on-function model (nonlinear in terms of the function-valued variable $x(t)$; may use a GP prior).

Thank you

References:

Cheng, Y., Shi, J. Q. and Eyre, J. (2015). Nonlinear mixed-effects scalar-on-function regression models and variable selection for movement data.

Serradilla, J., Shi, J. Q., Cheng, Y., Morgan, G., Lambden, C. and Eyre, J. A. (2014). Automatic Assessment of Upper Limb Function During Play of the Action Video Game, Circus Challenge: Validity and Sensitivity to Change. SEGAH 2014. (Best paper winner at the IEEE 3rd International Conference on Serious Games and Applications for Health, held in Rio de Janeiro, Brazil, May 14-16, 2014.)

Shi, J. Q. and Cheng, Y. (2014). R package: GPFDA (Gaussian Process Function Data Analysis). https://www.staff.ncl.ac.uk/j.q.shi/ps/gpfda.pdf.

Shi, J. Q. and Choi, T. (2011). Gaussian Process Regression Analysis for Functional Data. Chapman & Hall/CRC.