Ambient Intelligence Lab
Carnegie Mellon
Spatiotemporal Data Mining for Monitoring Ocean Objects
Yang Cai, Ph.D.
Ambient Intelligence Lab
Carnegie Mellon University
ycai@cmu.edu
Collaborators
Karl Fu, Carnegie Mellon
Xavier Boutonnier, Carnegie Mellon
Daniel Chung, Carnegie Mellon
Richard Stumpf, NOAA
Timothy Wynne, NOAA
Mitchell Tomlinson, NOAA
James Acker, GSFC
Cynthia Heil, FWRI
Y. Hu, LaRC
The spatiotemporal dynamics of surface objects are part of everyday life:
“red tide”, river plumes, vessels, floods, urban growth, and urban traffic
congestion.
Scientific Questions
Tracking:
Given an object in an image sequence (t = 1, ..., n), how do we find
the object at t = n+1 and beyond?
Prediction:
Given databases of historical data and current physical and
biochemical conditions, how do we predict the occurrence of the
object of interest at a particular time and location?
Why do we need data mining here?
1. Large historical archives exist (8 years of SeaWiFS imagery and
40 years of cell count and wind data).
2. Current models haven’t been successful.
3. Data mining appears to be inexpensive.
4. Most domain experts still spend 80% of their time
on ‘manual’ data mining.
Challenges to Data Mining Technologies
• visual or hyperspectral content
• space and time
• deformation + transport
• missing data (80% clouds in images)
• multiple databases
• multi-physics
Spatiotemporal Data Mining System for
Tracking and Modeling Ocean Object Movement
Sponsored by NASA ESTO-AIST-QRS-04-3031
Objective
• To track the movement of ocean objects that
have been identified
• To predict the movement of identified objects.
spatiotemporal object motion tracking
Approach
• Computer vision and visualization
• Statistical spatiotemporal data mining.
• Case studies with SeaWiFS datasets.
Key Milestones
• Motion tracking model: 6 mo/1 yr
• Case studies with plume/HAB: 6 mo/1 yr
• Spatiotemporal motion model: 6 mo/2 yr
• Spatial frequency pattern model: 6 mo/2 yr
Co-I/Partner
Co-I, Richard Stumpf, NOAA
TRLin = 4
V.M.S.V. Methodology
Vision:
• Spatial Density Filter
• Correlation Filter and Particle Filter
• Mutual Information
Mining:
• Spatiotemporal Neural Network
• Spatiotemporal Bayesian Model
• Periodicity Transform
Simulation:
• Cellular Automata
Visualization:
• Pseudo Color, Mapping, Animation...
Case Study:
Harmful Algal Blooms
Images above show a harmful algal bloom (HAB), highlighted as a chlorophyll anomaly, drifting
along the southwest Florida coast in December 2001.
Correlation Study
Use chlorophyll as a surrogate for Karenia brevis blooms (NOAA)
Reference:
Tomlinson, M.C., R.P. Stumpf, V. Ransibrahmanakul, E.W. Truby, G.J. Kirkpatrick, B.A. Pederson, G.A.
Vargo, and C.A. Heil, 2004. Evaluation of the use of SeaWiFS imagery for detecting Karenia brevis harmful
algal blooms in the eastern Gulf of Mexico. Remote Sensing of Environment, v. 91, pp. 293-303.
Chlorophyll channel
Anomaly channel
Image Data Preprocessing
• Remove noise from the satellite images
• Group objects
• Recover the missing data
Spatial Density Clustering
1. For each dot, take all the neighboring dots within a distance Dis as its test set.
2. If the number of dots in the test set is greater than Min, the dot is a core point.
3. Remove the non-core points.
4. Go to step 1 and repeat until no more points are removed.
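The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not the lab's implementation: the function name `spatial_density_filter`, the parameter names `dis` and `min_pts`, the choice not to count a point as its own neighbor, and the stopping rule (iterate until nothing is removed) are all assumptions.

```python
from math import hypot

def spatial_density_filter(points, dis, min_pts):
    """Iteratively drop points that have too few neighbors within `dis`."""
    pts = list(points)
    while True:
        core = []
        for i, (x, y) in enumerate(pts):
            # Step 1: count the other points within distance `dis`.
            n = sum(
                1
                for j, (u, v) in enumerate(pts)
                if j != i and hypot(x - u, y - v) <= dis
            )
            # Step 2: a point with more than `min_pts` neighbors is a core point.
            if n > min_pts:
                core.append((x, y))
        # Steps 3 and 4: keep only core points and repeat until stable.
        if len(core) == len(pts):
            return core
        pts = core

# A dense 3x3 block of dots plus two isolated noise dots.
cluster = [(x, y) for x in range(3) for y in range(3)]
noise = [(10, 10), (20, 0)]
kept = spatial_density_filter(cluster + noise, dis=1.5, min_pts=2)
```

In this toy input the isolated dots have no neighbors and are removed on the first pass, while every dot in the block keeps at least three neighbors and survives.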
Sample of SDC vs. Binary Morphology
noisy image
SDC output
Morphology output
Missing Data Recovery
1. Which is which?
2. Concavity of objects.
Interpolation of a convex object
We take all the points on the
contours of the marginal objects
and calculate the position of each
interpolated point by linear
interpolation.
The convex hull of the interpolated
points gives the contour of the
interpolated convex object.
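A rough sketch of this interpolation follows. The slide does not say how contour points of the two marginal objects are paired, so the sketch assumes every point of one contour is blended with every point of the other; the names `convex_hull` and `interpolate_convex` and the blend weight `t` are hypothetical. The hull is computed with Andrew's monotone chain, a standard algorithm swapped in here for illustration.

```python
def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Positive when o -> a -> b makes a left (counter-clockwise) turn.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def half(seq):
        chain = []
        for p in seq:
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
        return chain[:-1]  # last point is the first point of the other half

    return half(pts) + half(reversed(pts))

def interpolate_convex(contour_a, contour_b, t):
    """Blend every pair of contour points linearly, then take the hull."""
    blended = [
        ((1 - t) * ax + t * bx, (1 - t) * ay + t * by)
        for ax, ay in contour_a
        for bx, by in contour_b
    ]
    return convex_hull(blended)

# Two observed convex objects, with the frame halfway between them missing.
square_before = [(0, 0), (2, 0), (2, 2), (0, 2)]
square_after = [(4, 0), (8, 0), (8, 4), (4, 4)]
midway = interpolate_convex(square_before, square_after, t=0.5)
```

With these two squares, `midway` is a square of side 3 whose lower-left corner sits at (2, 0), halfway between the two inputs in both position and size.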
Work around concavity
1. First we extract the concavities.
2. Then we interpolate the object and the
concavities.
3. Then we remove the part
corresponding to the interpolated
concavities from the interpolated object.
Results
Shape Grouping Results
Correctly Marked Surface Objects (HAB)
Object Tracking with Correlation Filter
Shape Correlation = IFFT(FFT(a) .* conj(FFT(b)))
where
a is the test image,
b is the reference object in the previous image to be tracked,
FFT(x) is the discrete fast Fourier transform of x,
IFFT(x) is the inverse discrete fast Fourier transform of x, and
.* denotes element-wise multiplication.
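The formula above maps directly onto NumPy's FFT routines. The sketch below is illustrative only: the helper names `correlation_surface` and `estimate_shift` and the synthetic 32x32 frames are assumptions, and reading the peak location as the object's displacement is one common way to use the correlation surface rather than a detail taken from the slides.

```python
import numpy as np

def correlation_surface(a, b):
    """IFFT(FFT(a) .* conj(FFT(b))) for a test image a and a reference b."""
    spectrum = np.fft.fft2(a) * np.conj(np.fft.fft2(b))
    # The inputs are real, so the imaginary part is only rounding error.
    return np.real(np.fft.ifft2(spectrum))

def estimate_shift(a, b):
    """Row and column offset of the correlation peak."""
    corr = correlation_surface(a, b)
    row, col = np.unravel_index(np.argmax(corr), corr.shape)
    return int(row), int(col)

# Reference frame: a small rectangular "object" on an empty background.
reference = np.zeros((32, 32))
reference[5:9, 6:11] = 1.0
# Test frame: the same object moved 3 rows down and 7 columns right.
test = np.roll(reference, shift=(3, 7), axis=(0, 1))
shift = estimate_shift(test, reference)
```

Because the FFT treats images as periodic, the recovered offset is circular (modulo the image size); padding the frames would be one way to avoid wrap-around in real imagery.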
Tracking Results
Tracking HAB with Correlation Filter within 4-day interval
Particle-filter tracking of a bloom that has split into 2 pieces,
sampled at an interval of 4 days
River plume tracking with Mutual Information
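The slides do not describe how mutual information is used for tracking, so the following is only a generic sketch of the idea: score each candidate window in the new frame by its mutual information with a template from the previous frame, and keep the best one. The functions `mutual_information` and `best_offset`, the exhaustive window search, and the toy integer images are all assumptions.

```python
from collections import Counter
from math import log2

def mutual_information(patch_a, patch_b):
    """Mutual information (in bits) of two equally sized, discretized patches."""
    a = [v for row in patch_a for v in row]
    b = [v for row in patch_b for v in row]
    n = len(a)
    joint = Counter(zip(a, b))   # joint histogram of co-located pixel values
    pa, pb = Counter(a), Counter(b)
    return sum(
        (c / n) * log2((c / n) / ((pa[x] / n) * (pb[y] / n)))
        for (x, y), c in joint.items()
    )

def best_offset(template, frame):
    """Top-left corner of the frame window that shares the most information."""
    th, tw = len(template), len(template[0])
    best = None
    for r in range(len(frame) - th + 1):
        for c in range(len(frame[0]) - tw + 1):
            window = [row[c:c + tw] for row in frame[r:r + th]]
            score = mutual_information(template, window)
            if best is None or score > best[0]:
                best = (score, (r, c))
    return best[1]

# Template from the previous frame; in the new frame the same pattern
# appears at row 1, column 2 with remapped intensities (0->5, 1->3, 2->9).
template = [[0, 1],
            [1, 2]]
frame = [[5, 5, 5, 5],
         [5, 5, 5, 3],
         [5, 5, 3, 9],
         [5, 5, 5, 5]]
found = best_offset(template, frame)
```

The toy frame shows why mutual information can be attractive for this kind of matching: it rewards a consistent relationship between intensities rather than equal intensities, so the remapped pattern is still found.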
Neural network spatiotemporal prediction model
[Diagram labels: raw image sequence; Kohonen Network; clustered points** (r, θ); point in frame t-1; point in frame t; Radial Basis Function; predicted image for t+1]
** from Cartesian to polar coordinates
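The Cartesian-to-polar step noted in the diagram can be illustrated as follows. This is a small sketch under the assumption that (r, θ) are measured from the centroid of the clustered points; the slides do not state the origin, and the function name `to_polar` is hypothetical.

```python
from math import atan2, hypot, pi

def to_polar(points, center=None):
    """Convert (x, y) points to (r, theta) about `center` (default: centroid)."""
    if center is None:
        cx = sum(x for x, _ in points) / len(points)
        cy = sum(y for _, y in points) / len(points)
    else:
        cx, cy = center
    # theta is in radians in (-pi, pi], measured from the positive x-axis.
    return [(hypot(x - cx, y - cy), atan2(y - cy, x - cx)) for x, y in points]

# Four cluster centers on the corners of a square around the origin.
square = [(1, 1), (-1, 1), (-1, -1), (1, -1)]
polar = to_polar(square)
```

Each corner ends up at the same radius with angles a quarter turn apart, which is the kind of compact shape description a radial model can work with.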
Neural Network Prediction Model Example
• The first 3 images are input shapes (zero wind speed)
• The last image is the predicted shape overlaid on top of the ground truth.
• The blue dots are the clustered data points that represent the shape.
** Reference: Mitchell, T. Machine learning, McGraw-Hill, 1997
Prediction model results
Prediction model results
[Plot: Cluster Resolution vs. Time; x-axis: Cluster Resolution (clusters), 0 to 2500; y-axis: Time (CPU time), 0.00 to 500.00]
Periodicity Transform
[Figure: Space-time coordinates of the measurements; axes: longitude, latitude, date]
Energy = \|x\|^2 = \frac{1}{p} \sum_{i=0}^{p-1} x(i)^2
[Figure: Power of the components of the periodicity transform of a medium set of data; axes: periodicity, power]
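To make the energy measure concrete, the sketch below computes the share of a signal's energy captured by each candidate period. It assumes a common formulation in which a signal is projected onto p-periodic sequences by averaging the samples that share an index modulo p; the slides do not spell out the projection, and the function names are hypothetical.

```python
def energy(x):
    """Mean squared value: (1/p) * sum of x(i)^2 over the p samples of x."""
    return sum(v * v for v in x) / len(x)

def project_periodic(x, p):
    """Average samples sharing an index modulo p, then tile the averages."""
    usable = len(x) - len(x) % p          # keep whole periods only
    means = [sum(x[i:usable:p]) / (usable // p) for i in range(p)]
    return [means[i % p] for i in range(usable)]

def periodic_power(x, p):
    """Fraction of the signal's energy explained by period p."""
    usable = len(x) - len(x) % p
    return energy(project_periodic(x, p)) / energy(x[:usable])

# A signal that repeats exactly every 3 samples.
signal = [1, 0, -1] * 4
powers = {p: periodic_power(signal, p) for p in (2, 3, 4)}
```

For this signal the period-3 component carries all of the energy, while periods 2 and 4 average out to zero, so a plot of power against periodicity would show a single peak at 3.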
Experiments with cell count data
Zoomed result
Super-zoomed results
Cellular Automata
CA is a two-dimensional simulation of surface physics, chemistry, or
biology. It is simple; however, it can be computationally expensive
for large problems.
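A toy two-dimensional cellular automaton gives a feel for both the simplicity and the cost. The growth rule here (an empty cell becomes occupied when enough of its four neighbors are occupied) is purely hypothetical and stands in for whatever physical, chemical, or biological rule a real model would use.

```python
def step(grid, min_neighbors=1):
    """Advance the automaton one time step and return the new grid."""
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]
    for r in range(rows):
        for c in range(cols):
            if grid[r][c]:
                continue  # occupied cells stay occupied in this toy rule
            # Count occupied cells among the up/down/left/right neighbors.
            n = sum(
                grid[r + dr][c + dc]
                for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
                if 0 <= r + dr < rows and 0 <= c + dc < cols
            )
            if n >= min_neighbors:
                new[r][c] = 1
    return new

# A single occupied cell in the middle of a 5x5 surface.
grid = [[0] * 5 for _ in range(5)]
grid[2][2] = 1
after_one = step(grid)
after_two = step(after_one)
```

Every step visits every cell, so the work grows with the grid area times the number of time steps, which is where the expense for large problems comes from.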
Visualization of the prediction model
3-D Stereo Projection Lab
Conclusions
1. Data mining without domain expertise is like fishing in the dark.
Multiple experts are needed.
2. The Vision-Mining-Simulation-Visualization (VMSV) method puts
human experts in the loop, which increases the chance of
success.
3. Vision algorithms enable automated image-based data mining.
4. The neural network model shows promise in compressing the
shape information by orders of magnitude. However, it has
limitations in long-term prediction.
5. Bayesian prediction shows promise in long-term prediction and
in computational speed.
6. How to couple multi-physics models with data mining models
remains a big challenge.
Publications
1. Y. Cai, R. Stumpf, et al., Spatiotemporal Data Mining for Prediction of Harmful Algal
Blooms, International Harmful Algae Conference, Copenhagen, September 8-12, 2006
2. Y. Cai, Y. Hu, Onboard Inverse Physics from Sensor Web, Proceedings of Space Missions
and Challenges, SMC-IT, JPL, 2006
3. Y. Cai and K. Fu, Spatiotemporal Data Mining with Cellular Automata, Proceedings of
International Conference of Computational Science, ICCS 2006, May 30, UK
4. Y. Cai, D. Chung, K. Fu, R. Stumpf, T. Wynne, M. Tomlinson, Spatiotemporal Data Mining with
Micro Visual Interaction, submitted to Journal of Knowledge and Information Systems
5. Y. Cai, K. Fu, R. Stumpf, T. Wynne, M. Tomlinson, Spatiotemporal Data Mining for Monitoring
Ocean Objects, submitted to NASA Data Mining Workshop, JPL, 2006
6. Y. Cai, Y. Hu, Sensory Stream Data Mining on Chip, submitted to NASA Data Mining
Workshop, JPL, 2006
7. Y. Cai (editor), Special Issue of Visual Data Mining, Journal of Information Visualization, to be
published by Elsevier, 2006
8. Y. Cai and J. Abascal (editors), Ambient Intelligence in Everyday Life, Lecture Notes in
Artificial Intelligence, LNAI 3864, to be published by Springer, April 2006
Acknowledgement
The study is supported by NASA ESTO grant AIST-QRS-04-3031. We are indebted to our collaborators at NOAA, FWRI, and
GSFC. The authors appreciate the comments and suggestions
from Karen Meo, Kai-Dee Chu, Steven Smith, Gene Carl
Feldman, and James Acker from NASA. Also, many thanks to
Christos Faloutsos and Mel Siegel from CMU and Andrew
Moore from Google Research Center in Pittsburgh.