Learning Dynamic Models from Unsequenced Data
Jeff Schneider
School of Computer Science
Carnegie Mellon University
joint work with Tzu-Kuo Huang, Le Song
Learning Dynamic Models
• Hidden Markov Models, e.g. for speech recognition [source: Wikimedia Commons]
• Dynamic Bayesian Networks, e.g. for protein/gene interaction [source: SISL ARLUT]
• System Identification, e.g. for control [source: UAV ETHZ] [Bagnell & Schneider, 2001]
• Key Assumption: SEQUENCED observations
• What if observations are NOT SEQUENCED?
When are Observations not Sequenced?
Galaxy evolution
• dynamics are too slow to watch [image: Hubble Ultra Deep Field; source: STAGES]
Slow-developing diseases
• Alzheimer's
• Parkinson's [source: Getty Images]
Biological processes
• measurements are often destructive [source: Bryan Neff Lab, UWO]
How can we learn dynamic models for these?
Outline
• Linear Models
[Huang and Schneider, ICML, 2009]
• Nonlinear Models
[Huang, Song, Schneider, AISTATS, 2010]
• Combining Sequence and Unsequenced Data
[Huang and Schneider, NIPS, 2011]
Problem Description
Estimate the dynamics matrix A from an unsequenced sample of states x_i (no time stamps, no ordering).
Doesn't seem impossible …
Identifiability Issues
A Maximum Likelihood Approach
suppose we knew the dynamic
model and the predecessor of
each point …
$$p(X \mid A, \sigma, \tilde{X}) \;=\; \prod_{i=1}^{n} \frac{\exp\!\left( -\dfrac{\|x_i - A\tilde{x}_i\|^2}{2\sigma^2} \right)}{(2\pi\sigma^2)^{p/2}}$$
where $\tilde{x}_i$ denotes the predecessor of $x_i$.
Likelihood (continued)
• we don't know the time either, so also integrate out over time
• then use the empirical density as an estimate for the resulting marginal distribution
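One plausible rendering of the resulting objective (a reconstruction from the slides' Gaussian model, with the empirical distribution standing in for the predecessor's marginal; the paper's exact notation may differ):

$$\hat{p}(x_i \mid A, \sigma) \;\approx\; \frac{1}{n-1} \sum_{j \neq i} \frac{\exp\!\left( -\dfrac{\|x_i - A x_j\|^2}{2\sigma^2} \right)}{(2\pi\sigma^2)^{p/2}}, \qquad (\hat{A}, \hat{\sigma}) \;=\; \arg\max_{A,\sigma} \sum_{i=1}^{n} \log \hat{p}(x_i \mid A, \sigma)$$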
Unordered Method (UM): Estimation
Expectation Maximization
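A minimal NumPy sketch of what this EM might look like under the Gaussian model above; `um_em`, its update order, and the column convention x_i ≈ A x_j are illustrative assumptions, not the authors' code:

```python
import numpy as np

def um_em(X, A, sigma, n_iter=50):
    # Sketch of EM for the UM likelihood: the predecessor of each point is a
    # latent variable ranging over the other points, with x_i ~ N(A x_j, sigma^2 I).
    n, p = X.shape
    for _ in range(n_iter):
        # E-step: responsibility r[i, j] that x_j is the predecessor of x_i
        diff = X[:, None, :] - X[None, :, :] @ A.T        # diff[i, j] = x_i - A x_j
        D = (diff ** 2).sum(axis=-1)
        np.fill_diagonal(D, np.inf)                       # a point cannot precede itself
        R = np.exp(-(D - D.min(axis=1, keepdims=True)) / (2 * sigma ** 2))
        R /= R.sum(axis=1, keepdims=True)
        # M-step: weighted least squares for A, then the residual variance
        w = R.sum(axis=0)                                 # total weight on each predecessor
        B = (X.T * w) @ X                                 # sum_j w_j x_j x_j^T
        M = X.T @ R @ X                                   # sum_ij r_ij x_i x_j^T
        A = np.linalg.solve(B, M.T).T                     # A = M B^{-1}
        diff = X[:, None, :] - X[None, :, :] @ A.T
        sigma = np.sqrt((R * (diff ** 2).sum(axis=-1)).sum() / (n * p))
    return A, sigma
```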
Sample Synthetic Result
[figures: input vs. output]
Partial Order Approximation (PM)
Perform estimation by alternating maximization
• Replace UM's E-step with a maximum spanning tree on the complete graph over data points (sketched below)
  - weight on each edge is the probability of one point being generated from the other, given A and σ
  - enforces a global consistency on the solution
• M-step is unchanged: weighted regression
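A rough sketch of how the spanning-tree E-step could be implemented with SciPy; the symmetrization and the cost transform are assumptions, not the paper's exact procedure:

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree

def pm_estep(X, A, sigma):
    # Maximum spanning tree over the complete graph on the data points,
    # with edge weights given by the (log-)probability of one point being
    # generated from the other under x_i ~ N(A x_j, sigma^2 I).
    diff = X[:, None, :] - X[None, :, :] @ A.T            # diff[i, j] = x_i - A x_j
    logp = -(diff ** 2).sum(axis=-1) / (2 * sigma ** 2)   # log-prob up to a constant
    W = np.maximum(logp, logp.T)                          # best of the two directions
    cost = (W.max() - W) + 1.0                            # positive costs; max weight -> min cost
    np.fill_diagonal(cost, 0.0)                           # csgraph treats 0 as "no edge"
    tree = minimum_spanning_tree(cost)                    # min-cost tree == max-weight tree
    rows, cols = tree.nonzero()
    return list(zip(rows.tolist(), cols.tolist()))        # edges of the recovered partial order
```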
Learning Nonlinear Dynamic Models
[Huang, Song, Schneider, AISTATS, 2010]
An important issue
• A linear model provides a severely restricted space of models
  - we know a model is wrong because the regression yields large residuals and low likelihoods
• Nonlinear models are too powerful; they can fit anything!
• Solution: restrict the space of nonlinear models (sketched below)
  1. form the full kernel matrix
  2. use a low-rank approximation of the kernel matrix
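A small sketch of steps 1-2, assuming an RBF kernel and an eigendecomposition-based truncation (both illustrative choices; the paper may use a different kernel or a Nyström-style approximation):

```python
import numpy as np

def low_rank_kernel(X, rank, gamma=1.0):
    # Step 1: full kernel matrix over the data (RBF kernel assumed here).
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    K = np.exp(-gamma * sq)
    # Step 2: keep only the top-`rank` eigen-directions of K.
    vals, vecs = np.linalg.eigh(K)                        # ascending eigenvalues
    U, s = vecs[:, -rank:], vals[-rank:]
    return (U * s) @ U.T                                  # rank-`rank` approximation of K
```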
Synthetic Nonlinear Data: Lorenz Attractor
[figure: estimated gradients by kernel UM]
Ordering by Temporal Smoothing
Evaluation Criteria
• cosine score
• normalized error
Results: 3D-1
Results: 3D-2
3D-1: Algorithm Comparison
3D-2: Algorithm Comparison
Methods for Real Data
1. Run k-means to cluster the data
2. Find an ordering of the cluster centers, via either
   • TSP on pairwise L1 distances (TSP+L1), or
   • the Temporal Smoothing Method (TSM)
3. Learn a dynamic model for the cluster centers
4. Initialize UM/PM with the learned model (pipeline sketched below)
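An illustrative sketch of the four steps; the greedy nearest-neighbor tour stands in for a real TSP solver, and `init_from_unsequenced` and `k` are assumed names and values:

```python
import numpy as np
from sklearn.cluster import KMeans

def init_from_unsequenced(X, k=50):
    # 1. cluster the unsequenced data
    centers = KMeans(n_clusters=k, n_init=10).fit(X).cluster_centers_
    # 2. order the centers: TSP on pairwise L1 distances (greedy heuristic here)
    D = np.abs(centers[:, None, :] - centers[None, :, :]).sum(axis=-1)
    order, remaining = [0], set(range(1, k))
    while remaining:
        nxt = min(remaining, key=lambda j: D[order[-1], j])
        order.append(nxt)
        remaining.remove(nxt)
    C = centers[order]
    # 3. learn a linear dynamic model on the ordered centers: c_t A ~ c_{t+1}
    A = np.linalg.lstsq(C[:-1], C[1:], rcond=None)[0]
    return A                                              # 4. initialize UM/PM with A
```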
Gene Expression in Yeast Metabolic Cycle
Results on Individual Genes
Results over the whole space
Cosine score in high dimensions
[plot: probability of a random direction achieving a cosine score > 0.5, vs. dimension]
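As a quick sanity check of this curve, one can estimate the probability by Monte Carlo (a sketch; the slide's exact computation is not shown):

```python
import numpy as np

def p_cosine_above(d, t=0.5, n=200_000, seed=0):
    # P(cosine(random direction, fixed direction) > t) in d dimensions,
    # estimated by Monte Carlo; normalized Gaussians are uniform on the sphere.
    rng = np.random.default_rng(seed)
    v = rng.standard_normal((n, d))
    cos = v[:, 0] / np.linalg.norm(v, axis=1)             # cosine with e_1, by symmetry
    return float((cos > t).mean())
```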
Suppose we have some sequenced data

linear dynamic model:
$$x_t = x_{t-1} A + \epsilon_t, \qquad \epsilon_t \sim N(0, \sigma^2 I)$$

perform a standard regression:
$$\min_A \; \|Y - XA\|_F^2, \qquad X = [x_1, x_2, \ldots, x_{n-1}], \quad Y = [x_2, x_3, \ldots, x_n]$$

what if the amount of data is not enough to regress reliably?
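For concreteness, the regression above in a few lines of NumPy (a sketch; `fit_sequenced` is an assumed name):

```python
import numpy as np

def fit_sequenced(x):
    # x: (n, p) array of sequenced states; X = [x_1..x_{n-1}], Y = [x_2..x_n]
    X, Y = x[:-1], x[1:]
    return np.linalg.lstsq(X, Y, rcond=None)[0]           # argmin_A ||Y - XA||_F^2
```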
Regularization for Regression

add regularization to the regression:

ridge regression:
$$\min_A \; \|Y - XA\|_F^2 + \lambda \|A\|_F^2$$

lasso:
$$\min_A \; \|Y - XA\|_F^2 + \lambda \|A\|_1$$

can the unsequenced data be used in regularization?
Lyapunov Regularization

The Lyapunov equation relates the dynamic model to the steady-state distribution:
$$A^T Q A + \sigma^2 I = Q$$
Q – covariance of the steady-state distribution

$$\min_A \; \|Y - XA\|_F^2 + \lambda \, \|A^T \hat{Q} A + \hat{\sigma}^2 I - \hat{Q}\|_F^2$$

1. estimate Q from the unsequenced data!
2. optimize via gradient descent, using the unpenalized or the ridge regression solution as the initial point
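A gradient-descent sketch of this optimization, assuming the penalized objective above; `lam`, `lr`, and `n_iter` are illustrative values the slides don't specify:

```python
import numpy as np

def lyapunov_regularized(X, Y, Q_hat, sigma2_hat, lam=1.0, lr=1e-4, n_iter=1000):
    # Minimize ||Y - XA||_F^2 + lam * ||A^T Q A + sigma^2 I - Q||_F^2
    # by gradient descent, starting from the unpenalized solution.
    p = X.shape[1]
    A = np.linalg.lstsq(X, Y, rcond=None)[0]              # initial point
    I = np.eye(p)
    for _ in range(n_iter):
        R = A.T @ Q_hat @ A + sigma2_hat * I - Q_hat      # Lyapunov residual
        grad = -2.0 * X.T @ (Y - X @ A) + 4.0 * lam * Q_hat @ A @ R
        A = A - lr * grad
    return A
```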
Lyapunov Regularization: Toy Example
• 2-d linear system, $\sigma = 1$
• $A = \begin{bmatrix} -0.428 & 0.572 \\ -1.043 & -0.714 \end{bmatrix}$
• 2nd column of A fixed at the correct value
• given 4 sequenced points
• given 20 unsequenced points
Results on Synthetic Data
Random 200-dimensional sparse (1/8) stable system
Work in Progress …
• cell cycle data from [Zhou, Li, Yan, Wong, IEEE Trans. on Inf. Tech. in Biomedicine, 2009]: a set of 100 sequenced images
• 49 features on protein subcellular location
• a tracking algorithm identified 34 sequences having a full cycle and length at least 30
• another 11,556 images are unsequenced
• use the 34 sequences as ground truth and train on the unsequenced data
Preliminary Results: Protein Subcellular Location Dynamics
[plots: cosine score and normalized error]
Conclusions and Future Work
• Demonstrated the ability to learn (non)linear dynamic models from unsequenced data
• Demonstrated a method to use sequenced and unsequenced data together
• Continuing efforts on real scientific data
• Can we do this with hidden states?
EXTRA SLIDES
Real Data: Swinging Pendulum Video
Results: Swinging Pendulum Video