Spatiotemporal Data Mining for Monitoring Ocean Objects
Yang Cai, Ph.D.
Ambient Intelligence Lab, Carnegie Mellon University
ycai@cmu.edu

Collaborators
• Karl Fu, Carnegie Mellon
• Xavier Boutonnier, Carnegie Mellon
• Daniel Chung, Carnegie Mellon
• Richard Stumpf, NOAA
• Timothy Wynne, NOAA
• Mitchell Tomlinson, NOAA
• James Acker, GSFC
• Cynthia Heil, FWRI
• Y. Hu, LaRC

The spatiotemporal dynamics of surface objects are part of everyday life: "red tide", river plumes, vessels, floods, urban growth, and urban traffic congestion.

Scientific Questions
• Tracking: Given an object in an image sequence (t = 1, ..., n), how do we find the object at t = n+1 and beyond?
• Prediction: Given databases of historical data and current physical and biochemical conditions, how do we predict the occurrence of the object of interest at a particular time and location?

Why do we need data mining here?
1. Lots of historical data (8 years of SeaWiFS imagery and 40 years of cell-count and wind databases).
2. Current models haven't been successful.
3. Data mining appears to be inexpensive.
4. Most domain experts still spend 80% of their time on 'manual' data mining.

Challenges to Data Mining Technologies
• visual or hyperspectral content
• space and time
• deformation + transport
• missing data (80% clouds in images)
• multiple databases
• multi-physics

Spatiotemporal Data Mining System for Tracking and Modeling Ocean Object Movement
Sponsored by NASA ESTO-AIST-QRS-04-3031

Objective
• To track the movement of ocean objects that have been identified
• To predict the movement of identified objects

spatiotemporal object motion tracking

Approach
• Computer vision and visualization
• Statistical spatiotemporal data mining
• Case studies with SeaWiFS datasets
Key Milestones
• Motion tracking model: 6 mo/1 yr
• Case studies with plume/HAB: 6 mo/1 yr
• Spatiotemporal motion model: 6 mo/2 yr
• Spatial frequency pattern model: 6 mo/2 yr

Co-I/Partner
Co-I: Richard Stumpf, NOAA
TRLin = 4

V.M.S.V. Methodology
Vision:
• Spatial Density Filter
• Correlation Filter and Particle Filter
• Mutual Information
Mining:
• Spatiotemporal Neural Network
• Spatiotemporal Bayesian Model
• Periodicity Transform
Simulation:
• Cellular Automata
Visualization:
• Pseudo Color, Mapping, Animation...

Case Study: Harmful Algal Blooms
The images show a harmful algal bloom (HAB), highlighted as a chlorophyll anomaly, drifting along the southwest Florida coast in December 2001.

Correlation Study
Use chlorophyll as a surrogate for Karenia brevis blooms (NOAA).
Reference: Tomlinson, M.C., R.P. Stumpf, V. Ransibrahmanakul, E.W. Truby, G.J. Kirkpatrick, B.A. Pederson, G.A. Vargo, and C.A. Heil, 2004. Evaluation of the use of SeaWiFS imagery for detecting Karenia brevis harmful algal blooms in the eastern Gulf of Mexico. Remote Sensing of Environment, v. 91, pp. 293-303.

Chlorophyll channel

Anomaly channel

Image Data Preprocessing
• Remove noise from the satellite images
• Group objects
• Recover the missing data

Spatial Density Clustering
1. For each dot, collect all the neighboring dots within a distance (Dis) as one test set.
2. If the number of dots in the test set is greater than Min, the dot is a core point.
3. Remove the non-core points.
4. Go to step 1.

Sample of SDC vs. Binary Morphology
Panels: noisy image, SDC output, morphology output.

Missing Data Recovery
1. Which is which?
2. Concavity of objects.
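The four Spatial Density Clustering steps (Dis, Min) listed earlier on the preprocessing slides can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name, the Euclidean distance, excluding a dot from its own neighbor count, and stopping once every remaining dot is a core point are all assumptions, since the slide only says "Go to step 1".

```python
import numpy as np

def spatial_density_clustering(points, dis, min_pts, max_iter=100):
    """Iteratively drop dots that have too few neighbors within `dis`.

    `dis` and `min_pts` play the roles of Dis and Min on the slide;
    `max_iter` is a hypothetical safety cap on the number of passes.
    """
    pts = np.asarray(points, dtype=float)
    for _ in range(max_iter):
        if len(pts) == 0:
            break
        # Step 1: pairwise Euclidean distances define each dot's test set.
        diff = pts[:, None, :] - pts[None, :, :]
        dist = np.sqrt((diff ** 2).sum(axis=-1))
        # Count neighbors within `dis`, excluding the dot itself (assumption).
        neighbor_count = (dist <= dis).sum(axis=1) - 1
        # Step 2: a dot is a core point if it has more than `min_pts` neighbors.
        core = neighbor_count > min_pts
        if core.all():
            break  # nothing left to remove, so stop repeating step 1
        # Step 3: remove the non-core points, then repeat (step 4).
        pts = pts[core]
    return pts

# Usage: a dense 3x3 block of dots plus one isolated noise dot.
grid = [(x, y) for x in range(3) for y in range(3)]
cleaned = spatial_density_clustering(grid + [(50, 50)], dis=1.5, min_pts=2)
print(len(cleaned))  # 9: the isolated dot at (50, 50) is removed
```

The pairwise distance matrix keeps the sketch short but costs memory quadratic in the number of dots, so a spatial index would be a more sensible choice for full satellite images.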
Interpolation of a convex object
We take all the points on the contours of the marginal objects and calculate the position of each interpolated point by linear interpolation. The convex hull of the interpolated points gives the contour of the interpolated convex object.

Work around concavity
1. First, extract the concavity.
2. Then interpolate the object and the concavities.
3. Finally, remove the part corresponding to the interpolated concavity from the interpolated object.

Results

Shape Grouping Results

Correctly Marked Surface Objects (HAB)

Object Tracking with Correlation Filter
Shape Correlation = IFFT(FFT(a) .* conj(FFT(b)))
where a is the test image, b is the reference object in the previous image to be tracked, FFT(x) is the discrete fast Fourier transform, and IFFT(x) is the inverse discrete fast Fourier transform.

Tracking Results

Tracking HAB with Correlation Filter within a 4-day interval

Tracking of a bloom that has split into 2 pieces, sampled at an interval of 4 days, using a particle filter

River plume tracking with Mutual Information

Neural network spatiotemporal prediction model
Diagram labels: raw image sequence; Kohonen Network; clustered points**; r, θ; point in frame t; point in frame t-1; Radial Basis Function; predicted image for t+1.
** from Cartesian to polar coordinates

Neural Network Prediction Model Example
• The first 3 images are input shapes (zero wind speed).
• The last image is the predicted shape overlaid on top of the ground truth.
• The blue dots are the clustered data points that represent the shape.
** Reference: Mitchell, T. Machine Learning, McGraw-Hill, 1997

Prediction model results

Prediction model results: Cluster Resolution vs. Time
Plot: CPU time (vertical axis) against cluster resolution in clusters (horizontal axis).

Periodicity Transform
Energy = \|x\|^2 = \frac{1}{p} \sum_{i=0}^{p-1} x(i)^2
Plot: space-time coordinates of the measurements (axes: longitude, latitude, date).
Plot: power of the components of the periodicity transform of a medium set of data (axes: periodicity, power).

Experiments with cell count data

Zoomed result

Super-zoomed results

Cellular Automata
A cellular automaton (CA) is a two-dimensional simulation of surface physics, chemistry, or biology. It is simple; however, it can be computationally expensive for large problems.

Visualization of the prediction model

3-D Stereo Projection Lab

V.M.S.V. Methodology
Vision:
• Spatial Density Filter
• Correlation Filter and Particle Filter
• Mutual Information
Mining:
• Spatiotemporal Neural Network
• Spatiotemporal Bayesian Model
• Periodicity Transform
Simulation:
• Cellular Automata
Visualization:
• Pseudo Color, Mapping, Animation...

Conclusions
1. Data mining without domain expertise is like fishing in the dark. Multiple experts are needed.
2. The Vision-Mining-Simulation-Visualization (VMSV) method puts human experts in the loop, which increases the chance of success.
3. The vision algorithm enables automated image-based data mining.
4. The neural network model shows promise in compressing the shape information by orders of magnitude. However, it has limitations in long-term prediction.
5. Bayesian prediction shows promise in long-term prediction and computational efficiency.
6. Coupling the multi-physics models with data mining models is a big challenge.

Publications
1. Y. Cai, R. Stumpf, et al., Spatiotemporal Data Mining for Prediction of Harmful Algal Blooms, International Harmful Algae Conference, Copenhagen, September 8-12, 2006
2. Y. Cai, Y. Hu, Onboard Inverse Physics from Sensor Web, Proceedings of Space Missions and Challenges, SMC-IT, JPL, 2006
3. Y. Cai and K. Fu, Spatiotemporal Data Mining with Cellular Automata, Proceedings of International Conference of Computational Science, ICCS 2006, May 30, UK
4. Y. Cai, D. Chung, K. Fu, R. Stumpf, T. Wynne, M. Tomlinson, Spatiotemporal Data Mining with Micro Visual Interaction, submitted to Journal of Knowledge and Information Systems
5. Y. Cai, K. Fu, R. Stumpf, T. Wynne, M. Tomlinson, Spatiotemporal Data Mining for Monitoring Ocean Objects, submitted to NASA Data Mining Workshop, JPL, 2006
6. Y. Cai, Y. Hu, Sensory Stream Data Mining on Chip, submitted to NASA Data Mining Workshop, JPL, 2006
7. Y. Cai (editor), Special Issue of Visual Data Mining, Journal of Information Visualization, to be published by Elsevier, 2006
8. Y. Cai and J. Abascal (editors), Ambient Intelligence in Everyday Life, Lecture Notes in Artificial Intelligence, LNAI 3864, to be published by Springer, April 2006

Acknowledgement
The study is supported by NASA ESTO grant AIST-QRS-04-3031. We are indebted to our collaborators at NOAA, FWRI, and GSFC.
The authors appreciate the comments and suggestions from Karen Meo, Kai-Dee Chu, Steven Smith, Gene Carl Feldman, and James Acker from NASA. Many thanks also to Christos Faloutsos and Mel Siegel from CMU and Andrew Moore from Google Research Center in Pittsburgh.
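
Supplementary sketch: the tracking expression from the correlation filter slide, Shape Correlation = IFFT(FFT(a) .* conj(FFT(b))), can be illustrated in Python with NumPy. This is a hedged sketch rather than the code used in the study. The function names, the zero-padding of the reference object to the test image size, and taking the correlation peak as the object's new position are assumptions made for illustration.

```python
import numpy as np

def shape_correlation(a, b):
    """Circular cross-correlation of test image `a` with reference `b`."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Zero-pad the reference to the test image size so both spectra
    # have the same shape (assumption: b is no larger than a).
    padded = np.zeros_like(a)
    padded[:b.shape[0], :b.shape[1]] = b
    # IFFT(FFT(a) .* conj(FFT(b))), evaluated in two dimensions.
    corr = np.fft.ifft2(np.fft.fft2(a) * np.conj(np.fft.fft2(padded)))
    return corr.real  # inputs are real, so the imaginary part is round-off

def locate(a, b):
    """Return the (row, col) offset in `a` where `b` correlates best."""
    corr = shape_correlation(a, b)
    return np.unravel_index(np.argmax(corr), corr.shape)

# Usage: place a small reference shape in an empty frame and find it again.
ref = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]], dtype=float)
frame = np.zeros((16, 16))
frame[5:8, 7:10] = ref
row, col = locate(frame, ref)
print(int(row), int(col))  # 5 7
```

Because the FFT product gives a circular correlation, an object near the image border can wrap around to the opposite side; padding the frame is one common way to avoid that.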