Knowledge Extraction from Scientific Data
Roy Williams
California Institute of Technology
roy@caltech.edu
KE Tools
SDMIV
24 October 2002
Edinburgh
Scientific Data
• Datacubes
   – N-dimensional array
   – spectrum, time-series
   – image, voxels, hyperspectral image



Concentration
Pattern matching
Integration
• Event Sets
   – Often derived from pattern matching
   – A set of events is a table
   – Integrating Event Sets
   – Clustering
Knowledge Extraction
• Concentration
   – principal components
   – cluster/outlier finding
• Datacube → Eventset: pattern matching
   – from theory or from training set
• Integration
   – registration of datacubes
   – join / crossmatch of eventsets

Datacube
Some stars from the DPOSS survey
Datacube
An AVIRIS image of San Francisco Bay
atmospheric absorption
400-2500 nm in 224 bands
R. Green, JPL
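
As a rough illustration of a hyperspectral image as a datacube, the sketch below builds a synthetic rows x columns x bands array with 224 bands over 400-2500 nm and slices it into a spectrum and an image. It assumes NumPy, evenly spaced band centres, random pixel values, and an arbitrary 32 x 32 scene; none of it is real AVIRIS data.

```python
import numpy as np

N_BANDS = 224
# Assumption: band centres evenly spaced across 400-2500 nm.
wavelengths_nm = np.linspace(400.0, 2500.0, N_BANDS)

rng = np.random.default_rng(0)
# Synthetic stand-in for a small scene: rows x columns x bands.
cube = rng.random((32, 32, N_BANDS))

# Slicing a datacube yields lower-dimensional datacubes.
pixel_spectrum = cube[10, 20, :]          # 1-D: the spectrum at one pixel

# Band nearest 1400 nm (wavelength chosen only for illustration).
band = int(np.argmin(np.abs(wavelengths_nm - 1400.0)))
band_image = cube[:, :, band]             # 2-D: the image in one band

print(pixel_spectrum.shape, band_image.shape, round(wavelengths_nm[band], 1))
```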
Concentrating Information
• e.g. Principal Component Analysis
• Given a set of vectors
   – Compute dot products (same as correlations)
   – Diagonalize
   – Throw out weaker (noise) components

Information concentration
Principal Component Analysis
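
The three steps named on the previous slide (dot products, diagonalize, discard weak components) can be sketched as follows. This is a minimal illustration assuming NumPy and synthetic data; the function name `pca` and the choice to centre the vectors first (so that the dot products form a covariance matrix) are assumptions, not details taken from the talk.

```python
import numpy as np

def pca(vectors, n_keep):
    """Project vectors onto their n_keep strongest principal components."""
    X = np.asarray(vectors, dtype=float)
    X = X - X.mean(axis=0)                 # centre each coordinate
    C = X.T @ X / (len(X) - 1)             # dot products -> covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)   # diagonalize (eigenvalues ascending)
    order = np.argsort(eigvals)[::-1]      # strongest component first
    components = eigvecs[:, order[:n_keep]]
    return X @ components, eigvals[order]  # throw out the weaker components

rng = np.random.default_rng(0)
t = rng.normal(size=500)
# Synthetic 3-D vectors lying close to a single line, plus weak noise.
data = np.outer(t, [1.0, 2.0, 3.0]) + 0.05 * rng.normal(size=(500, 3))

reduced, variances = pca(data, n_keep=1)
print(reduced.shape, variances / variances.sum())
```

Here almost all of the variance sits in the first component, so keeping one coordinate per vector loses little information.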
Event Sets
• Created by pattern matching
• from a known rule
• from a training set
• by finding clusters
Event Set = Table
(10^3? columns, 10^8? rows)

name       ID          longitude
content    key         Earth coordinate
units      none        degrees
datatype   char        double
display                f6.2

           E3948547    43.4
           E3948545    87.2
           E3943766    83.2
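
One way to hold such a table in code is as a list of columns, each carrying its own metadata. The sketch below uses only the Python standard library; the `Column` class and `rows` helper are hypothetical names introduced for illustration, and the values are the three example rows from the slide.

```python
from dataclasses import dataclass, field

@dataclass
class Column:
    """One column of an event set, with its metadata (hypothetical structure)."""
    name: str
    content: str
    units: str
    datatype: str
    display: str = ""
    values: list = field(default_factory=list)

event_set = [
    Column("ID", "key", "none", "char",
           values=["E3948547", "E3948545", "E3943766"]),
    Column("longitude", "Earth coordinate", "degrees", "double",
           display="f6.2", values=[43.4, 87.2, 83.2]),
]

def rows(columns):
    """Yield one dict per event, pairing column names with values."""
    names = [c.name for c in columns]
    for record in zip(*(c.values for c in columns)):
        yield dict(zip(names, record))

for event in rows(event_set):
    print(event)
```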
Gravitational Lenses
Pattern matching finds events in datacubes
A. Szalay, Johns Hopkins
Black hole collisions
LIGO: Laser Interferometer Gravitational-Wave Observatory
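
To illustrate how pattern matching can turn a datacube into an event set, the sketch below slides a known template along a noisy time series and records each strong local peak of the normalized correlation as an event. It assumes NumPy; the waveform, noise level, threshold, and the name `find_events` are made up for illustration and are not the search actually used by any experiment.

```python
import numpy as np

def find_events(series, template, threshold):
    """Return (start_index, score) for peaks of the normalized correlation."""
    tpl = template - template.mean()
    tpl = tpl / np.linalg.norm(tpl)
    n = len(tpl)
    scores = np.zeros(len(series) - n + 1)
    for i in range(len(scores)):
        win = series[i:i + n] - series[i:i + n].mean()
        norm = np.linalg.norm(win)
        if norm > 0:
            scores[i] = win @ tpl / norm
    events = []
    for i, score in enumerate(scores):
        lo, hi = max(0, i - n), min(len(scores), i + n + 1)
        # Keep only the strongest score within one template length.
        if score >= threshold and score == scores[lo:hi].max():
            events.append((i, float(score)))
    return events

rng = np.random.default_rng(1)
# Hypothetical template: two windowed sine cycles.
template = np.sin(np.linspace(0, 4 * np.pi, 50)) * np.hanning(50)
series = 0.1 * rng.normal(size=1000)
for start in (200, 700):                  # inject two copies of the pattern
    series[start:start + 50] += template

events = find_events(series, template, threshold=0.8)
print(events)
```

The resulting list of (index, score) pairs is itself a small table, that is, an event set.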
Creating Event Sets
Supervised Classification
Given a set of volcanoes, find a lot more volcanoes
Here we use Singular Value Decomposition
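
The slide names Singular Value Decomposition but not the details, so the following is only one plausible way to use it: build a low-dimensional basis from labelled example patches, then score a candidate by how far it lies from that subspace. It assumes NumPy and synthetic radial bumps as stand-ins for labelled examples; `fit_basis`, `residual`, and `synthetic_cone` are hypothetical names.

```python
import numpy as np

def fit_basis(training, k):
    """Return the mean and the top-k right singular vectors of the training rows."""
    mean = training.mean(axis=0)
    _, _, vt = np.linalg.svd(training - mean, full_matrices=False)
    return mean, vt[:k]

def residual(sample, mean, basis):
    """Distance from a sample to the subspace spanned by the basis."""
    centred = sample - mean
    return float(np.linalg.norm(centred - basis.T @ (basis @ centred)))

rng = np.random.default_rng(2)
yy, xx = np.mgrid[-1:1:16j, -1:1:16j]

def synthetic_cone(width):
    # Stand-in for a labelled example patch: a radial bump plus noise.
    patch = np.exp(-(xx**2 + yy**2) / width) + 0.02 * rng.normal(size=xx.shape)
    return patch.ravel()

training = np.array([synthetic_cone(w) for w in rng.uniform(0.1, 0.4, 40)])
mean, basis = fit_basis(training, k=3)

similar = synthetic_cone(0.25)            # resembles the training set
different = rng.normal(size=256)          # pure-noise patch
print(residual(similar, mean, basis), residual(different, mean, basis))
```

A small residual means the candidate looks like the training examples; thresholding the residual gives a simple classifier.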
Multiparameter data
[Figure: colour-colour-fX/fopt plots of all sources, split into high, medium and low fX/fopt. Symbols: X-ray source counterparts; contours: all optical objects. Labelled populations: stellar, galaxy, compact galaxy, BLAGN, NELGs, active dM stars, F/G stars?, possible hi-z quasar, normal galaxies?]
Mike Watson, Leicester University
Integrating Datacubes
Find a mapping from one domain to the other
Registration of DPOSS and Hubble Deep Field
Datacube Registration
Movement of ice inferred from registration
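
In the simplest case the mapping between two datacubes is a pure translation, which can be estimated by phase correlation. The sketch below assumes NumPy, a synthetic image, and an integer cyclic shift; real registration (sky surveys, ice imagery) generally also needs rotation, scale, and distortion terms that this does not handle.

```python
import numpy as np

def estimate_shift(reference, moved):
    """Estimate the integer (row, col) shift that maps reference onto moved."""
    f_ref = np.fft.fft2(reference)
    f_mov = np.fft.fft2(moved)
    cross = f_mov * np.conj(f_ref)
    cross /= np.maximum(np.abs(cross), 1e-12)   # keep the phase only
    corr = np.fft.ifft2(cross).real             # peaks at the shift
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Peaks past the half-way point correspond to negative shifts.
    return tuple(int(p - s) if p > s // 2 else int(p)
                 for p, s in zip(peak, corr.shape))

rng = np.random.default_rng(3)
reference = rng.random((64, 64))
moved = np.roll(reference, shift=(5, -3), axis=(0, 1))

print(estimate_shift(reference, moved))   # (5, -3)
```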
Integrating Event Sets
• Database Join
• Fuzzy Join
   – e.g. astronomical crossmatch
• Distributed Join
   – does the Grid do databases?
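
A fuzzy join matches rows whose keys are close rather than equal. For an astronomical crossmatch the key is sky position, as in this brute-force sketch using only the standard library. The catalogs, identifiers, and 2 arcsecond radius are invented for illustration, and a production crossmatch would use a spatial index rather than comparing every pair.

```python
import math

def angular_separation(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(h)))

def crossmatch(catalog_a, catalog_b, radius_deg):
    """Pair each source in A with its nearest source in B within radius_deg."""
    matches = []
    for id_a, ra_a, dec_a in catalog_a:
        best_id, best_sep = None, radius_deg
        for id_b, ra_b, dec_b in catalog_b:
            sep = angular_separation(ra_a, dec_a, ra_b, dec_b)
            if sep <= best_sep:
                best_id, best_sep = id_b, sep
        if best_id is not None:
            matches.append((id_a, best_id, best_sep))
    return matches

ARCSEC = 1.0 / 3600.0
# Hypothetical catalogs: (id, right ascension, declination) in degrees.
optical = [("A1", 10.0000, 20.0000), ("A2", 10.5000, 20.5000), ("A3", 200.0, -45.0)]
infrared = [("B1", 10.0002, 20.0001), ("B2", 10.5010, 20.4990), ("B3", 150.0, 5.0)]

matches = crossmatch(optical, infrared, radius_deg=2 * ARCSEC)
for id_a, id_b, sep in matches:
    print(id_a, id_b, f"{sep / ARCSEC:.2f} arcsec")
```

Only A1 and B1 fall within the radius; A2 and B2 are about 5 arcseconds apart and stay unmatched.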
Integration of Star Catalogs
Visualizing Event Sets
Unsupervised clustering
50000 stars in color-color space
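
The slide does not say which clustering method produced the picture, so as a stand-in the sketch below runs plain k-means on synthetic points in a two-colour plane. It assumes NumPy; the two blobs, their positions, and the sample size of 1000 (rather than 50000) are arbitrary choices for illustration.

```python
import numpy as np

def kmeans(points, k, n_iter=50, seed=0):
    """Plain k-means: returns (labels, centres)."""
    rng = np.random.default_rng(seed)
    centres = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance from every point to every centre.
        dists = np.linalg.norm(points[:, None, :] - centres[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centres = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
            for j in range(k)
        ])
        if np.allclose(new_centres, centres):
            break
        centres = new_centres
    return labels, centres

rng = np.random.default_rng(4)
# Two synthetic populations in a colour-colour plane.
blob_a = rng.normal([0.5, 0.3], 0.1, size=(500, 2))
blob_b = rng.normal([1.5, 1.2], 0.1, size=(500, 2))
colours = np.vstack([blob_a, blob_b])

labels, centres = kmeans(colours, k=2)
print(np.round(centres[np.argsort(centres[:, 0])], 2))
```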
A Grid of Services
Human gets Data
Understood by human
Further processing after format change
Network of Services
Grid of pipes and engines
Switches and actuators
data flow
Example Grid of Services
Catalog Service
Query Check Service
Query Estimator
DPOSS Service
Crossmatch Service
User’s code
2MASS Service
Storage Service
flexible complex metadata AND broadband binary
Computing Challenges
• High-dimensional
   – Clustering & Classification
   – Visualization
   – Outlier Detection
• Visualization of 10^10 points
• Database access to 10^10 points
• Large Distributed Join
Standards needed
• Bundling diverse objects together with code and references
• Referencing data resources on the Grid: local, remote, replicated, ...
Problem Solving Environment
Catalog Service
Query Check Service
Query Estimator
DPOSS Service
Crossmatch Service
User’s code
Storage Service
2MASS Service
• Plumbing (big data) and electrical (control, metadata)
• Web service and workflow
• Finding service classes/implementations by semantics
• GUI / Executive / IO adapters / Algorithms