Anand Rangarajan and Sanjay Ranka
Dept. of CISE
University of Florida
Remote Sensing: Big Spatial Data
Sensors by axis of improvement (B = spectral bands):
• Landsat-1 (MSS), 1970s: 80 m, 4 B, 18-day revisit
• Spectral: AVIRIS (20 m, 224 B; on demand, airborne, 700 km/hr), ARIES (30 m, 32 B, 7 day)
• Spatial: 1 m (SPOT, IKONOS, WorldView); 2000: sub-meter (aerial, …)
• Temporal: AVHRR (1 km, 5 B, 1 day), MODIS (250 m to 1 km, 36 B, 1 to 2 days)
• 5 TB/day of heterogeneous data
Challenge
• How can informal settlements (slums, barrios, shanty towns, …) be mapped efficiently?
[Figure: very high resolution imagery of formal and informal settlements]
• Objects may be the same (e.g., buildings, roads, …), but the spatial patterns (neighborhoods) are not.
Sample-based ML versus Spatiotemporal ML
Expert labels
Sample-based classification
Segmentation
Classification
Unsupervised learning
Permutation (acid test)
Spatial image classification
Sample-based: K-Means clustering, support vector machine, pixel dictionary; invariant to permutation
Spatiotemporal: graph partitioning, ????, superpixel dictionary; not invariant to permutation
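One reading of the "permutation (acid test)" is that a sample-based method treats an image as a bag of pixels, so rearranging pixel positions leaves its features unchanged, while a spatial method depends on which pixels are neighbors. The sketch below illustrates that reading only; the toy image, the two feature functions, and the scrambling rule are hypothetical and not part of the framework.

```python
from collections import Counter

def bag_of_pixels(image):
    """Sample-based feature: intensity counts, ignoring pixel positions."""
    return Counter(v for row in image for v in row)

def neighbor_contrast(image):
    """Spatial feature: total absolute difference between horizontal neighbors."""
    return sum(abs(row[j] - row[j + 1]) for row in image for j in range(len(row) - 1))

def permute_pixels(image, stride=3):
    """Deterministically scramble pixel positions (stride must be coprime
    with the pixel count so that every pixel is used exactly once)."""
    flat = [v for row in image for v in row]
    n = len(flat)
    scrambled = [flat[(stride * i) % n] for i in range(n)]
    width = len(image[0])
    return [scrambled[i:i + width] for i in range(0, n, width)]

# Toy 4 x 4 image: left half dark (0), right half bright (9).
image = [[0, 0, 9, 9] for _ in range(4)]
permuted = permute_pixels(image)

print(bag_of_pixels(image) == bag_of_pixels(permuted))    # positions do not matter
print(neighbor_contrast(image), neighbor_contrast(permuted))  # positions do matter
```

The histogram passes the test (it is invariant), whereas the neighbor-based feature changes as soon as the pixels are rearranged.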
Novel Framework
• Superpixel formation
• Superpixel descriptor
• Building the superpixel graph
• Initializing the graph labels
• Label refinement
• Parallelization
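As an illustration of the graph-building step, the sketch below assumes the superpixels are given as a 2D label map and connects two superpixels whenever they share a pixel boundary (4-neighborhood). The text does not say how edges or edge weights are defined, so `build_superpixel_graph` and the adjacency rule are assumptions.

```python
from collections import defaultdict

def build_superpixel_graph(labels):
    """Return {superpixel_id: set of adjacent superpixel ids} from a 2D label map."""
    graph = defaultdict(set)
    rows, cols = len(labels), len(labels[0])
    for r in range(rows):
        for c in range(cols):
            a = labels[r][c]
            graph[a]  # touch the key so isolated superpixels still become nodes
            # Look only right and down; every pixel boundary is then visited once.
            for dr, dc in ((0, 1), (1, 0)):
                rr, cc = r + dr, c + dc
                if rr < rows and cc < cols:
                    b = labels[rr][cc]
                    if a != b:
                        graph[a].add(b)
                        graph[b].add(a)
    return dict(graph)

# Toy label map with four superpixels (hypothetical).
labels = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [2, 2, 2, 3],
    [2, 2, 2, 3],
]
graph = build_superpixel_graph(labels)
print(graph)
```

Because the graph has one node per superpixel rather than per pixel, later steps operate on a far smaller structure than the raw image.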
Superpixel Descriptor
• Intensity histograms
• Corner density
• Size of superpixels
• Binary feature based on coarse-scale tessellation
All these features are concatenated to obtain a 54-dimensional descriptor for each superpixel.
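The concatenation can be sketched as follows. The split of the 54 dimensions is not given in the text; the sketch assumes 3 bands with 17 histogram bins each (51 values) plus corner density, size, and one binary flag, chosen only so that the total comes to 54. The function names and the `in_coarse_cell` flag are hypothetical placeholders.

```python
def intensity_histogram(values, bins=17, max_value=255):
    """Normalized histogram of one band's intensities inside a superpixel."""
    counts = [0] * bins
    for v in values:
        counts[min(v * bins // (max_value + 1), bins - 1)] += 1
    total = len(values) or 1
    return [c / total for c in counts]

def superpixel_descriptor(band_values, corner_count, in_coarse_cell):
    """Concatenate histograms, corner density, size, and a binary feature."""
    size = len(band_values[0])                 # number of pixels in the superpixel
    descriptor = []
    for values in band_values:                 # one histogram per spectral band
        descriptor.extend(intensity_histogram(values))
    descriptor.append(corner_count / size)     # corner density
    descriptor.append(float(size))             # superpixel size in pixels
    descriptor.append(1.0 if in_coarse_cell else 0.0)  # assumed coarse-tessellation flag
    return descriptor

# Toy superpixel with 4 pixels and 3 bands (hypothetical values).
bands = [[10, 12, 200, 255], [0, 0, 0, 0], [128, 129, 130, 131]]
d = superpixel_descriptor(bands, corner_count=2, in_coarse_cell=True)
print(len(d))
```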
Superpixel formation
• Parcellation into homogeneous regions
• Huge reduction in data complexity
Semi-supervised Learning: Initializing the graph labels
• An expert labels very few pixels (ground truth).
• This sparse ground truth is used to learn a classifier.
• Several approaches are used:
  – K-nearest neighbors (based on k-d trees)
  – SVM (we use a linear kernel in our framework)
• Once the classifier is trained, it assigns rudimentary labels to all the superpixels.
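The initialization step can be sketched with a small k-nearest-neighbor classifier. For brevity this sketch uses a brute-force search instead of the k-d tree mentioned above, and the 2-D descriptors, class names, and superpixel ids are hypothetical.

```python
from collections import Counter
import math

def knn_label(query, labeled, k=3):
    """Majority label among the k labeled descriptors closest to `query`."""
    nearest = sorted(labeled, key=lambda item: math.dist(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical descriptors of superpixels covered by expert labels.
labeled = [
    ((0.10, 0.20), "forest"), ((0.20, 0.10), "forest"), ((0.15, 0.15), "forest"),
    ((0.90, 0.80), "slum"), ((0.80, 0.90), "slum"), ((0.85, 0.85), "slum"),
]
# Superpixels without expert labels.
unlabeled = {"sp_7": (0.12, 0.18), "sp_8": (0.88, 0.79)}

initial_labels = {name: knn_label(desc, labeled) for name, desc in unlabeled.items()}
print(initial_labels)
```

Each superpixel is classified independently of its neighbors here, which is why the resulting labels are only rudimentary.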
Label refinement
• Labels obtained in the previous step are rudimentary.
• This can cause artifacts (neighboring regions belonging to the same class can get different labels).
• Laplacian propagation is used to smooth the labels obtained in the previous step.
Laplacian propagation
• Minimize the following cost function separately for each category:
  – $Y_i$ is 1 if superpixel $i$ belongs to the category and 0 otherwise.
  – $X_i$ is relaxed to take real values.
• The category with the maximum value of $X_i$ is used as the label of superpixel $i$.
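The cost function itself does not appear in the text. The sketch below therefore assumes a common form of Laplacian smoothing, E(X) = sum_i (X_i - Y_i)^2 + lam * sum over graph edges (i, j) of (X_i - X_j)^2, which may differ from the one actually used. Setting the gradient to zero gives X_i = (Y_i + lam * sum of neighboring X_j) / (1 + lam * degree_i), which is iterated to convergence. The names `propagate`, `refine_labels`, and `lam` are hypothetical.

```python
def propagate(graph, Y, lam=1.0, iterations=200):
    """Jacobi iteration for the assumed cost:
    sum_i (X_i - Y_i)^2 + lam * sum_edges (X_i - X_j)^2."""
    X = dict(Y)
    for _ in range(iterations):
        X = {
            i: (Y[i] + lam * sum(X[j] for j in graph[i])) / (1.0 + lam * len(graph[i]))
            for i in graph
        }
    return X

def refine_labels(graph, initial, categories, lam=1.0):
    """Propagate each category separately, then keep the category with the largest X_i."""
    scores = {
        c: propagate(graph, {i: 1.0 if initial[i] == c else 0.0 for i in graph}, lam)
        for c in categories
    }
    return {i: max(categories, key=lambda c: scores[c][i]) for i in graph}

# Chain of five superpixels; node 2 carries a stray initial label.
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
initial = {0: "urban", 1: "urban", 2: "forest", 3: "urban", 4: "urban"}
refined = refine_labels(graph, initial, ["urban", "forest"])
print(refined)
```

In this toy chain the isolated "forest" label is outvoted by its neighbors after smoothing; a smaller `lam` would keep the refined labels closer to the initial ones.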
Performance and Accuracy
• On a 1000 x 1000 image patch, our approach labels approximately 90% of the pixels correctly in 20 minutes, whereas Citation KNN takes 27.8 hours and GMIL takes 3.1 hours at a fixed grid size.
• Our approach does not require experimentation to find the right grid size, which is an unsolved problem.
• Our segment boundaries correspond to natural object boundaries.
Parallelization
• Most of the steps are easily parallelizable using data decomposition techniques (parallel versions using MPI and Apache Spark are in development).
• We expect to be able to process 200 sq km per core per day.
• Using 1000 cores should therefore allow processing of 200,000 sq km per day.
Manu Sethi, Yupeng Yan, Anand Rangarajan, Ranga Raju Vatsavai, Sanjay Ranka: An efficient computational framework for labeling large scale spatiotemporal remote sensing datasets. IC3 2014: 635-640
Manu Sethi, Yupeng Yan, Anand Rangarajan, Ranga Raju Vatsavai, Sanjay Ranka: Scalable Machine Learning Approaches for Terrain Classification datasets. Submitted.
Tessellation of Rio
10.8K * 9.8K
2K * 2K patches
Combine results
35000 superpixels
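The tessellation into patches can be sketched as below, assuming that 10.8K x 9.8K and 2K x 2K are pixel dimensions (10800 x 9800 and 2000 x 2000) and that edge patches are simply allowed to be smaller. `tile_image` is a hypothetical helper, and which dimension is the height is also an assumption.

```python
def tile_image(height, width, patch=2000):
    """Yield (row_start, row_end, col_start, col_end) for each patch.
    Patches on the bottom and right edges may be smaller than `patch`."""
    for r in range(0, height, patch):
        for c in range(0, width, patch):
            yield r, min(r + patch, height), c, min(c + patch, width)

tiles = list(tile_image(10800, 9800))
print(len(tiles))   # number of patches that can be processed independently
print(tiles[-1])    # bounds of the last, partial patch
```

Each tuple could be handed to a separate worker, with the per-patch labels stitched back together in the "combine results" step.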
Results
Superpixel descriptor: Corner Density
Corner density is negligible in forest regions but quite high in urban and slum regions. We use the Harris corner detector to obtain the corners.
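Given corner locations, the per-superpixel density can be sketched as follows. The sketch assumes the corners have already been detected (for example by a Harris detector from an image library) and that density means corners per pixel of superpixel area; the exact normalization is not stated in the text, and `corner_density` is a hypothetical helper.

```python
from collections import Counter

def corner_density(labels, corners):
    """Corners per pixel for each superpixel in a 2D label map.
    `corners` is a list of (row, col) positions from a corner detector."""
    area = Counter(v for row in labels for v in row)
    hits = Counter(labels[r][c] for r, c in corners)
    return {sp: hits[sp] / area[sp] for sp in area}

# Toy label map: superpixel 0 (left) has no corners, superpixel 1 (right) has three.
labels = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
corners = [(0, 2), (0, 3), (1, 3)]
density = corner_density(labels, corners)
print(density)
```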
Source: SPIE
Image Patches
3D, 4D and 5D blocks
Mixtures of HOSVDs
Research Q: Generalization to non-uniform tessellations?
IEEE Trans. Image Proc. 2008, IEEE T-PAMI 2013
Joint work with Arunava Banerjee
(Joint work with Paul Gader)
The hyperspectral realm: Numerous bands at each pixel
Segmentation, Unmixing and
Classification: Spatiotemporal ML
Image
Segmentation
Endmember
Extraction
GRENDEL (Graph-based Endmember Extraction and Labeling)
Abundance
Estimation
Unmixing
Simulation data containing vortices
Physics-based feature detectors combined via machine learning into compound classifier (CC)
Machine learning Physics
Spatio-temporal machine learning for turbulent flow?
Computer Graphics Forum, 2014
CC has lower error rate and better sensitivity and specificity
Representation of Multilevel Structures
• Quantize classical system → linear Schrödinger equation
• Wave magnitude: probability; phase: curve geometry
• Applications: correspondence, indexing, classification
Research Direction: Can other classical problems like shortest paths on manifolds be “quantized”?
NSF 1143963: IEEE CVPR 2012, SIAM Math Anal. 2012