Functional Data Analysis
in Matlab and R
James Ramsay, Professor, McGill U., Montreal
Hadley Wickham, Grad student, Iowa State, Ames, IA
Spencer Graves, Statistician, PDF Solutions, San José, CA
1
Outline
• What is Functional Data Analysis?
• FDA and Differential Equations
• Examples:
– Squid Neurons
– Continuously Stirred Tank Reactor (CSTR)
• Conclusions
• References
2
What is FDA?
• Functional data analysis is a collection of
techniques to model data from dynamic
systems
– possibly governed by differential equations
– in terms of some set of basis functions
• The ‘fda’ package supports the use of 8
different types of basis functions: constant,
monomial, polynomial, polygonal, B-splines,
power, exponential, and Fourier.
3
Observations of different lengths
• Observation vectors of different lengths
can be mapped to coordinates of a fixed
basis set
• All examples in the ‘fda’ package have the
same numbers of observations
• No conceptual obstacles to handling
observation vectors of different lengths
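The idea of mapping observation vectors of different lengths to coordinates of a fixed basis can be sketched in a few lines. This is only an illustration, not the 'fda' package: it assumes a three-term monomial basis, ordinary least squares via the normal equations, and a made-up test curve; all function names are hypothetical.

```python
def monomial_basis(t, nbasis):
    """Evaluate the monomial basis 1, t, t^2, ... at a single time t."""
    return [t ** k for k in range(nbasis)]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_coefficients(times, values, nbasis=3):
    """Least-squares coefficients of the observations on the monomial basis."""
    rows = [monomial_basis(t, nbasis) for t in times]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(nbasis)]
           for i in range(nbasis)]
    xty = [sum(r[i] * y for r, y in zip(rows, values)) for i in range(nbasis)]
    return solve(xtx, xty)


def curve(t):
    """Made-up smooth curve used to generate both observation vectors."""
    return 1.0 + 2.0 * t - t ** 2


# Two observation vectors of different lengths on the same interval [0, 1].
times_a = [i / 4 for i in range(5)]     # 5 observations
times_b = [i / 10 for i in range(11)]   # 11 observations
coef_a = fit_coefficients(times_a, [curve(t) for t in times_a])
coef_b = fit_coefficients(times_b, [curve(t) for t in times_b])
print(len(coef_a), len(coef_b))         # both vectors have one entry per basis function
```

Both series end up as coefficient vectors of the same length, so they can be compared or averaged even though the raw vectors cannot.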
4
Time Warping
• “start” and “stop” are sometimes
determined by certain transitions
• Example: growth spurts in the life cycle of
various species do not occur at exactly the
same ages in different individuals (even
within the same species)
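A minimal sketch of why timing differences matter, assuming a made-up bell-shaped "spurt" curve per individual and the simplest possible warp (a time shift that moves each spurt to a common age). It is not the registration method of the 'fda' package; the curve shape, the spurt ages, and all names are hypothetical.

```python
import math


def spurt(age, peak_age):
    """Hypothetical bell-shaped growth-spurt curve centred at peak_age."""
    return math.exp(-0.5 * (age - peak_age) ** 2)


ages = [8.0 + 0.1 * i for i in range(81)]    # evaluation grid, ages 8 to 16
peak_ages = [10.5, 11.5, 12.0, 12.5, 13.5]    # made-up spurt ages, one per individual
target = sum(peak_ages) / len(peak_ages)      # common age to align the spurts to


def cross_sectional_mean(warps):
    """Average the curves pointwise after applying one time warp per individual."""
    return [
        sum(spurt(warp(age), peak) for warp, peak in zip(warps, peak_ages))
        / len(peak_ages)
        for age in ages
    ]


# No registration: every curve is evaluated on the original clock.
identity_warps = [lambda age: age for _ in peak_ages]
# Shift registration: each individual's clock is moved so the spurt lands at `target`.
shift_warps = [lambda age, peak=peak: age + (peak - target) for peak in peak_ages]

unregistered = cross_sectional_mean(identity_warps)
registered = cross_sectional_mean(shift_warps)
print(round(max(unregistered), 3), round(max(registered), 3))
```

The peak of the unregistered mean is noticeably lower than the peak of any individual curve, while the mean of the shifted curves keeps the full height.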
5
10 Girls: Berkeley Growth Study
• Tuddenham, R. D., and Snyder, M. M. (1954)
"Physical growth of California boys and girls
from birth to age 18", _University of California
Publications in Child Development_, 1, 183-364.
[Figure: heights of 10 girls plotted against age; x-axis: age, y-axis: Height (cm.)]
6
Acceleration
• Growth spurts occur at different ages
• Average shows the basic trend, but features are
damped by improper registration
[Figure: two panels plotted against age: Height (cm.) and Growth acceleration (cm/year^2)]
7
Registration
• register.fd all to the mean
• Not perfect, but better
[Figure: two panels of Growth acceleration (cm/year^2) plotted against age]
8
A Stroll Along the Beach
• Light intensity over 365 days at each of
190*143 = 27170 pixels was
– smoothed
– analyzed with functional principal components
• http://www.stat.berkeley.edu/~wickham/userposter.pdf
9
Other fda capabilities
• Correlations
– even with series of different lengths!
• Phase plane plots
– good estimates of derivatives
[Figures: Montreal average daily temp deviation from average (C) by Month; axis labels Acceleration, Mean Temperature, Temperature (C); script labels afda-ch03.R, fda-ch01.R, fda-ch02.R]
10
Script files for fda books
• Ramsay and Silverman
– (2002) Applied Functional Data Analysis
(Springer)
– (2006) Functional Data Analysis, 2nd ed.
(Springer)
• ~R\library\fda\scripts
– Some but not all data sets discussed in the
books are in the ‘fda’ package
– Script files are available to reproduce some but
not all of the analyses in the books.
– plus CSTR demo
11
FDA and Differential Equations
• Many dynamic systems are believed to
follow processes where output changes are
a function of the outputs, x, and inputs, u
(and unknown parameters θ):
$$\dot{x}(t) = f(x, u, t \mid \theta), \quad t \in [0, T]$$
• Matlab was designed in part for these types
of models
12
Squid Neurons
• FitzHugh (1961) - Nagumo et al. (1962) Equations:
Estimate a, b and c in:
$$\dot{V} = c\left(V - \frac{V^3}{3} + R\right)$$
$$\dot{R} = -\frac{1}{c}\left(V - a + bR\right)$$
– V: Voltage across Axon Membrane
– R: Recovery via Outward Currents
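A minimal sketch of solving the FitzHugh-Nagumo equations forward in time for given a, b and c, written as dV/dt = c (V - V^3/3 + R) and dR/dt = -(V - a + bR)/c. It uses a fixed-step fourth-order Runge-Kutta integrator in plain Python rather than a Matlab or R solver, and the parameter values, initial state, and step size are illustrative assumptions, not values from the slides.

```python
def fitzhugh_nagumo(state, a, b, c):
    """Right-hand side of the FitzHugh-Nagumo equations for state (V, R)."""
    v, r = state
    dv = c * (v - v ** 3 / 3.0 + r)
    dr = -(v - a + b * r) / c
    return (dv, dr)


def rk4_step(rhs, state, dt):
    """Advance `state` by one classical fourth-order Runge-Kutta step."""
    def shifted(k, scale):
        return tuple(s + scale * ki for s, ki in zip(state, k))

    k1 = rhs(state)
    k2 = rhs(shifted(k1, 0.5 * dt))
    k3 = rhs(shifted(k2, 0.5 * dt))
    k4 = rhs(shifted(k3, dt))
    return tuple(
        s + dt / 6.0 * (p + 2.0 * q + 2.0 * u + w)
        for s, p, q, u, w in zip(state, k1, k2, k3, k4)
    )


def simulate(a, b, c, state=(-1.0, 1.0), t_end=20.0, dt=0.01):
    """Return the list of (V, R) states on a fixed time grid from 0 to t_end."""
    def rhs(s):
        return fitzhugh_nagumo(s, a, b, c)

    path = [state]
    for _ in range(round(t_end / dt)):
        state = rk4_step(rhs, state, dt)
        path.append(state)
    return path


# Illustrative parameter values and initial state (assumptions).
path = simulate(a=0.2, b=0.2, c=3.0)
voltages = [v for v, _ in path]
print(len(path), round(min(voltages), 2), round(max(voltages), 2))
```

Estimating a, b and c then amounts to choosing the values for which a simulated (or smoothed) V and R best match the observed ones.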
13
Tank Reactions
• Continuously Stirred Tank Reactor (CSTR)
[Figure: panels labeled Temperature and Concentration]
14
Functional Data Analysis Process
1. Select Basis Set
2. Select Smoothing Operator
– e.g., differential equation
– equivalent to a Bayesian prior over coefficients
to estimate
3. Estimate coefficients to optimize some
objective function
4. Model criticism, residual plots, etc.
5. Hypothesis testing
15
Inputs to Tank Reaction Simulation
16
Computations: Nonlinear ODE
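$$\frac{dC}{dt} = -\beta_{CC}(T, F_{in})\,C + F_{in} C_{in}$$
$$\frac{dT}{dt} = -\beta_{TT}(F_{co}, F_{in})\,T + \beta_{TC}(T, F_{in})\,C + F_{in} T_{in} + \alpha(F_{co})\,T_{co}$$
$$\beta_{CC}(T, F_{in}) = \kappa \exp\left[-10^4 \tau \left(\frac{1}{T} - \frac{1}{T_{ref}}\right)\right] + F_{in}$$
$$\beta_{TT}(F_{co}, F_{in}) = \alpha(F_{co}) + F_{in}$$
$$\beta_{TC}(T, F_{in}) = 130\,\beta_{CC}(T, F_{in})$$
$$\alpha(F_{co}) = \frac{a F_{co}^{b+1}}{F_{co} + a F_{co}^{b}/2}$$
4 parameters: κ, τ, a, b
• Compute Input vectors
• Define functions
• Call differential equation solver
• Summarize, plot
• Estimate parameters (κ, τ, a, b)
[Figure: panels labeled Temperature and Concentration]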
17
Three problems
• Estimate (κ, τ, a, b) to minimize SSE in
Temperature only

Software  function           SSE      SSE-min
Matlab    lsqnonlin          5.09888  0.00236
R         nls                5.09652  0
R         optim Nelder-Mead  5.09652  0
R         optim BFGS         5.09652  0
R         optim CG           5.09900  0.00248
R         optim SANN         5.17504  0.07852
R         nlminb             5.09652  0
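The SSE-min column is simply each optimizer's SSE minus the smallest SSE reached by any of them. A short check of that arithmetic, using only the SSE values reported above (the dictionary keys are just labels chosen here):

```python
# SSE reached by each optimizer, as reported in the comparison.
sse = {
    "Matlab lsqnonlin": 5.09888,
    "R nls": 5.09652,
    "R optim Nelder-Mead": 5.09652,
    "R optim BFGS": 5.09652,
    "R optim CG": 5.09900,
    "R optim SANN": 5.17504,
    "R nlminb": 5.09652,
}

best = min(sse.values())
# Excess over the best SSE, rounded to the 5 decimals used in the slides.
excess = {name: round(value - best, 5) for name, value in sse.items()}
for name, gap in sorted(excess.items(), key=lambda item: item[1]):
    print(f"{name:22s} SSE - min = {gap:.5f}")
```

Four of the seven runs reach the same minimum to five decimals; SANN is furthest from it.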
18
SSE(Temp, Conc)
• Matlab: lsqnonlin
• R: nls

Median absolute relative error
               Matlab     R
Concentration  1.149E-03  1.145E-03
Temperature    2.640E-04  2.636E-04

[Figures: Concentration C(t) and Temperature T(t) plotted against time from 0 to 60, one pair of panels per program (red = true, blue = estimated)]
19
R vs. Matlab
• Gave comparable answers
• R code for CSTR slightly more accurate but
requires much more compute time
– coded by different people
• R has helper functions not so easily
replicated in Matlab
– summary.nls
– confint.nls
– profile.nls
        Estimate  StdErr  t      Pr(>|t|)
kref    0.466     0.004   113.0  < 2e-16 ***
EoverR  0.840     0.009   94.7   < 2e-16 ***
a       1.720     0.232   7.4    8.2e-13 ***
b       0.496     0.050   10.0   < 2e-16 ***
20
confint.nls
• Likelihood-based confidence intervals:
generally more accurate than Wald intervals
– Wald subject to parameter effects curvature
– Likelihood: only affected by intrinsic curvature
> confintNlsFit
        2.5%   97.5%
kref    0.458  0.474
EoverR  0.823  0.858
a       1.300  2.222
b       0.401  0.599
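To see how the two kinds of interval differ in practice, the sketch below computes Wald intervals (estimate plus or minus a normal quantile times the standard error) from the reported estimates and standard errors, and prints them next to the likelihood-based limits from confint. The normal quantile is an assumption made for simplicity; a t quantile with the residual degrees of freedom would be slightly wider.

```python
from statistics import NormalDist

# (estimate, standard error, likelihood-based 2.5% and 97.5% limits),
# all copied from the nls summary and confint output in the slides.
fit = {
    "kref":   (0.466, 0.004, (0.458, 0.474)),
    "EoverR": (0.840, 0.009, (0.823, 0.858)),
    "a":      (1.720, 0.232, (1.300, 2.222)),
    "b":      (0.496, 0.050, (0.401, 0.599)),
}

z = NormalDist().inv_cdf(0.975)  # about 1.96; normal approximation (assumption)

# Wald interval: symmetric about the estimate by construction.
wald = {name: (est - z * se, est + z * se) for name, (est, se, _) in fit.items()}

for name, (est, se, profile) in fit.items():
    low, high = wald[name]
    print(f"{name:7s} Wald ({low:.3f}, {high:.3f})  likelihood {profile}")
```

For kref the two intervals agree to three decimals, while for a the likelihood-based interval (1.300, 2.222) is visibly asymmetric about the estimate 1.720, which a Wald interval cannot reproduce.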
21
plot.profile.nls
• for a plot showing the sqrt(log(LR))
[Figure: profile plots for kref, EoverR, a and b, with confidence levels marked at 50, 80, 90, 95 and 99]
22
Conclusions
• R and Matlab give comparable answers
• R:nls has helper functions absent from
Matlab:lsqnonlin
• Functional data analysis tools are key for
– estimating derivatives and
– working with differential operators
23
References
• www.functionaldata.org
• Ramsay and Silverman (2006) Functional Data
Analysis, 2nd ed. (Springer)
• Ramsay and Silverman (2002) Applied Functional Data
Analysis (Springer)
• Ramsay, J. O., Hooker, G., Cao, J. and
Campbell, D. (2007) Parameter estimation for
differential equations: A generalized smoothing
approach (with discussion). Journal of the Royal
Statistical Society, Series B. To appear.
24
NOT free-knot splines
• For this, see
– DierckxSpline package
– Companion to Dierckx, P. (1993). Curve and
Surface Fitting with Splines. Oxford Science
Publications, New York.
• R package by Sundar Dorai-Raj
– links to Fortran code by Dierckx available
from www.netlib.org/dierckx
• soon to appear on CRAN
25