CROSS DISCIPLINARY APPLICATIONS OF MULTIPLEX
OBSERVATIONAL AND COMPUTATIONAL DATASETS USING
FOR ARCHIVING AND
HIGH PERFORMANCE PROCESSING.
Marcel Ritter, Werner Benger, Joseph Stoeckl,
Donna Delparte, Mike Folk, Quincey Koziol,
Frank Steinbacher and Markus Aufleger
Outline
• Motivation
• Requirements on a Data Format
• Introduction to HDF5
• F5
  – Introduction
  – Examples of Data Sets
• Application Example:
  – The Hawaiian Geospatial Data Repository
• Conclusion
Motivation: Scientific Collaboration
• Workgroup A: Software Tool 1, File Format 1
• Workgroup B: Software Tool 2, File Format 2
• Data exchange between the two workgroups
Motivation
File Format 1, File Format 2, File Format 3, File Format 4, File Format 5, …, File Format N
Direct conversion between every pair of formats: O(N²) converters
Huge implementation effort
Motivation
File Format 1, File Format 2, File Format 3, File Format 4, File Format 5, …, File Format N
Conversion through one Common Data Format: O(N) converters
Less implementation effort
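The O(N²) versus O(N) argument can be made concrete by counting converters. A minimal sketch, assuming each converter translates exactly one source format into one target format; the function names are illustrative only.

```python
def pairwise_converters(n: int) -> int:
    """Direct translation: one converter for every ordered pair of distinct formats."""
    return n * (n - 1)


def hub_converters(n: int) -> int:
    """Common data format: one converter into the hub and one out of it per format."""
    return 2 * n


for n in (3, 5, 10, 50):
    print(f"N={n:>2}  pairwise={pairwise_converters(n):>4}  common format={hub_converters(n):>3}")
```

For N = 10 formats this gives 90 direct converters versus 20 through a common format. The counts are equal at N = 3, and direct conversion is cheaper only for N ≤ 2. Counting bidirectional converters instead halves both numbers and leaves the O(N²) versus O(N) conclusion unchanged.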
Motivation
Workgroup A (Software Tool 1), Workgroup B (Software Tool 2), Workgroup C (Software 3) and Workgroup D (Software 4) all exchange data through one Common Data Format:
• Easier collaboration
• More time for science
Requirements on a Data Format
• Easy access
• Sustainable (>10 years)
• Well documented, with a user community
• Self-descriptive
• Fast and efficient
• Huge data (terabytes)
• Huge variety of data
Hierarchical Data Format 5
http://www.hdfgroup.org/HDF5
HDF5 - A Few Analogies
• File system (in a file)
• Binary XML file
• PDF for numerical data
• Database (container for array variables)
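HDF5 - Relationships
Figure: an HDF5 file organized as a tree under the root group "/". Groups contain datasets and other groups; attributes attach metadata to groups and datasets. Example objects shown:
• Timestep: 36,000
• A table:
  lat | lon | temp
  ----|-----|-----
  12  | 23  | 3.1
  15  | 24  | 4.2
  17  | 21  | 3.6
• Parameters: 10; 100; 1000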
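The "file system in a file" analogy can be illustrated with a tiny in-memory model of groups, datasets and attributes. This is a hypothetical sketch, not HDF5 itself. Real bindings such as h5py expose the same ideas through `create_group`, `create_dataset` and an `attrs` mapping. The classes below only mimic the shape of that structure, do no file I/O, and use illustrative names such as "simulation" and "table".

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Union


@dataclass
class Dataset:
    """A named array of values plus its own attributes (metadata)."""
    data: List[Any]
    attrs: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Group:
    """A container of named groups and datasets, like a directory."""
    members: Dict[str, Union["Group", Dataset]] = field(default_factory=dict)
    attrs: Dict[str, Any] = field(default_factory=dict)

    def create_group(self, path: str) -> "Group":
        """Create any missing groups along a '/'-separated path and return the last one."""
        node = self
        for name in (p for p in path.split("/") if p):
            child = node.members.setdefault(name, Group())
            if not isinstance(child, Group):
                raise ValueError(f"{name!r} is a dataset, not a group")
            node = child
        return node

    def __getitem__(self, path: str) -> Union["Group", Dataset]:
        """Resolve a path such as '/simulation/Parameters', like a file system lookup."""
        node: Union[Group, Dataset] = self
        for name in (p for p in path.split("/") if p):
            if not isinstance(node, Group):
                raise KeyError(path)
            node = node.members[name]
        return node


# Hypothetical layout loosely following the figure: one group holding a
# "Timestep" attribute, a lat/lon/temp table and a "Parameters" dataset.
root = Group()
sim = root.create_group("/simulation")
sim.attrs["Timestep"] = 36000
sim.members["table"] = Dataset(
    data=[(12, 23, 3.1), (15, 24, 4.2), (17, 21, 3.6)],
    attrs={"columns": ("lat", "lon", "temp")},
)
sim.members["Parameters"] = Dataset(data=[10, 100, 1000])

print(root["/simulation/Parameters"].data)
print(root["/simulation"].attrs["Timestep"])
```

The tree keeps data and its description together. Any reader that can walk the paths also finds the metadata, which is what makes the format self-descriptive.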
HDF5 - What Users Get
• A multi-platform library and tools, built on over 10 years of experience in large-data handling from the high performance computing (HPC) community.
• A capability that:
  – Lets them organize large and/or complex collections of data
  – Gives them efficient and scalable data storage and access
  – Lets them integrate a wide variety of types of data and data sources
  – Guarantees long-term data integrity and preservation
• Shapefiles: HDF5 as container format
  Figure: browser application showing the stored pixel data, vector data and attribute data
HDF5 - More Applications
• Earth Science (Earth Observing System)
  – Satellites: Aqua, Aura, Terra
  – Instruments: TES, HIRDLS, MLS, OMI, CERES, MISR, MODIS, MOPITT, AMSR
• Big simulations: billions of elements with dozens of associated values
• Flight testing
• Movie making
Fiber Bundle Data Model
http://www.fiberbundle.net
• Based on HDF5
• Inspired by concepts of:
– Topology
– Differential Geometry
– Geometric Algebra
• Separation of geometry (grids) from data fields (fields)
• Hierarchical structure, from outermost to innermost level:
  Fiber Bundle / Time Slice / Grid / Topology / Coordinates / Field
  Figure: part of this hierarchy is marked as visible to the end user.
• Multi Channel – Multi Resolution Images:
  Levels: Time / Grid / Topology / Representation / Field  [Datatype]
  /1.4
    /Satellite
      /VertexRefinement1x1
        /Cartesian
          /Positions  [uniform-grid]
          /RGB        [byte,byte,byte]
          /N-IR       [float64]
          /T-IR       [float64]
      /VertexRefinement2x2
        /Cartesian
          /Positions
          /RGB
          /N-IR
          /T-IR
  /1.6/ …
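The Time / Grid / Topology / Representation / Field levels map directly onto HDF5 group paths. The following is a minimal sketch under two assumptions. First, every field path has exactly these five levels. Second, time-slice groups are named by their numeric time value, as "1.4" and "1.6" appear to be. `F5Path` and `sorted_time_slices` are hypothetical helpers, not part of the F5 library.

```python
from typing import List, NamedTuple


class F5Path(NamedTuple):
    """One field location, split into the five F5 levels (outermost first)."""
    time: str
    grid: str
    topology: str
    representation: str
    field: str

    def to_path(self) -> str:
        """Join the levels into an HDF5 path such as /1.4/Satellite/.../Positions."""
        return "/" + "/".join(self)

    @classmethod
    def parse(cls, path: str) -> "F5Path":
        """Split an HDF5 path back into its five levels."""
        parts = [p for p in path.split("/") if p]
        if len(parts) != 5:
            raise ValueError(f"expected 5 levels, got {len(parts)}: {path!r}")
        return cls(*parts)


def sorted_time_slices(names: List[str]) -> List[str]:
    """Order time-slice group names numerically; plain string sorting puts "10.0" before "2.0"."""
    return sorted(names, key=float)


# List every field of the 1.4 time slice at both resolutions.
for topology in ("VertexRefinement1x1", "VertexRefinement2x2"):
    for fld in ("Positions", "RGB", "N-IR", "T-IR"):
        print(F5Path("1.4", "Satellite", topology, "Cartesian", fld).to_path())
```

Because every resolution level is just another Topology group, a reader can pick the coarse or fine version of the same channel by changing a single path component.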
• Full Waveform LIDAR:
  Figure: a laser pulse emitted at t_emission and its echoes returning at t1, t2 and t3
• Full Waveform LIDAR - Laser Data
  Levels: Time / Grid / Topology / Representation / Field  [Datatype]
  /CorseTime
    /LASER
      /POINTS
        /CartesianCoords
          /Positions     [point3D]
          /TimeStamp     [float64]
          /Waveform      [uint16,uint16]
          /Reflectance   [float32]
      /SHOTS
        /SHOTSAsPOINTS
          /Positions     vlen[uint32]
          /Origin        [point3D]
          /Direction     [vector3D]
          /EmissionTime  [float64]
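For each shot the layout stores an Origin, a Direction and an EmissionTime, and the figure shows echoes at t1, t2 and t3. The sketch below reconstructs echo positions under several assumptions. Times are in seconds and coordinates are in metres. Each echo time is a round-trip time. The pulse travels in a straight line at the vacuum speed of light, so range = c·(t − t_emission)/2. The function name is hypothetical, and this is not presented as the authors' processing chain.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]

SPEED_OF_LIGHT = 299_792_458.0  # m/s in vacuum; real processing would correct for air


def echo_positions(origin: Vec3, direction: Vec3, emission_time: float,
                   echo_times: Sequence[float]) -> List[Vec3]:
    """Place each echo along the shot ray at half the round-trip distance."""
    norm = math.sqrt(sum(d * d for d in direction))
    unit = tuple(d / norm for d in direction)  # normalize in case Direction is not unit length
    points = []
    for t in echo_times:
        rng = SPEED_OF_LIGHT * (t - emission_time) / 2.0  # one-way range in metres
        points.append(tuple(o + rng * u for o, u in zip(origin, unit)))
    return points


# A shot fired straight down from 1000 m altitude: an echo after a round trip of
# about 6.67 microseconds lies roughly at ground level (z close to 0).
round_trip = 2 * 1000.0 / SPEED_OF_LIGHT
print(echo_positions((0.0, 0.0, 1000.0), (0.0, 0.0, -1.0), 0.0, [round_trip]))
```

Storing the raw Waveform alongside the shot geometry means such positions can be recomputed later, for example with a different echo-detection method.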
• Full Waveform LIDAR - Airplane Data
  /CorseTime
    /PLANE
      /POINTS
        /CartesianCoords
          /Positions   [point3D]
          /Rotation    [rotor3D]
          /TimeStamps  [float64]
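Each PLANE sample pairs a position with a Rotation of type rotor3D. In 3D geometric algebra a unit rotor is equivalent to a unit quaternion. The sketch assumes rotor3D can be read as a (w, x, y, z) unit quaternion, although the actual component order in F5 is not given here. Under that assumption, the attitude at a time stamp rotates a laser direction from the aircraft frame into the world frame. The helper names are hypothetical.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]
Quat = Tuple[float, float, float, float]  # assumed (w, x, y, z), unit length


def cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def rotate(q: Quat, v: Vec3) -> Vec3:
    """Rotate v by the unit quaternion q, i.e. compute q v q* without building a matrix."""
    w, qv = q[0], (q[1], q[2], q[3])
    t = tuple(2.0 * c for c in cross(qv, v))
    u = cross(qv, t)
    return tuple(v[i] + w * t[i] + u[i] for i in range(3))


def axis_angle(axis: Vec3, angle: float) -> Quat:
    """Unit quaternion for a rotation by `angle` radians about a unit `axis`."""
    s = math.sin(angle / 2.0)
    return (math.cos(angle / 2.0), axis[0] * s, axis[1] * s, axis[2] * s)


# Aircraft yawed 90 degrees about the vertical axis: a laser pointing along the
# body x axis ends up pointing (up to rounding) along the world y axis.
q = axis_angle((0.0, 0.0, 1.0), math.pi / 2)
print(rotate(q, (1.0, 0.0, 0.0)))
```

Storing orientation as a rotor rather than a matrix keeps each sample at four numbers and composes cleanly when interpolating attitude between time stamps.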
• Bringing together in F5:
– Satellite data
– LIDAR
– Shapefiles
Benefits
• Features of HDF5:
  – Sustainable storage
  – Metadata
  – Compression
  – Parallel I/O
  – Hyperslab access
• Consistent data organization of simple and complex spatio-temporal data
• Easy handling of time series of data
• Tools from other disciplines become applicable to the geoscience community, for example astrophysics image-mosaicking tools such as Montage (http://montage.ipac.caltech.edu) for satellite data
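Hyperslab access means reading or writing a regular sub-block of a dataset without touching the rest of it. In the HDF5 C API, `H5Sselect_hyperslab` describes such a selection per dimension with start, stride, count and block. In h5py, NumPy-style slicing such as `dset[0:6:2, 0:3]` requests the same kind of selection. The pure-Python sketch below only computes which elements a selection covers and performs no I/O. The helper names and the 6×8 example grid are illustrative.

```python
from itertools import product
from typing import List, Sequence, Tuple


def hyperslab_indices_1d(start: int, stride: int, count: int, block: int) -> List[int]:
    """Indices along one dimension: `count` blocks of `block` elements, `stride` apart."""
    if block > stride and count > 1:
        # Overlapping blocks are not a valid hyperslab, so reject them.
        raise ValueError("blocks would overlap: block must not exceed stride")
    return [start + i * stride + j for i in range(count) for j in range(block)]


def hyperslab_indices(start: Sequence[int], stride: Sequence[int],
                      count: Sequence[int], block: Sequence[int]) -> List[Tuple[int, ...]]:
    """An N-dimensional selection is the Cartesian product of the per-dimension selections."""
    per_dim = [hyperslab_indices_1d(*args) for args in zip(start, stride, count, block)]
    return list(product(*per_dim))


# Every second row and the first three columns of a 6x8 array (e.g. a small lat/lon grid).
grid = [[r * 8 + c for c in range(8)] for r in range(6)]
sel = hyperslab_indices(start=(0, 0), stride=(2, 1), count=(3, 3), block=(1, 1))
print([grid[r][c] for r, c in sel])
```

Because the selection is described by four small integers per dimension, the library can read just those elements from a terabyte-scale dataset instead of loading the whole array.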
Application Example
http://www.epscor.hawaii.edu
Goal:
  Centralized, integrative capability to store and manage access to massive (terabyte-scale) research datasets
Users:
  • University of Hawaii research teams
  • Broad statewide research community
Objectives:
  • Collect, store and manage access to data
  • Utilize user portals
  • Utilize and link to the Maui High Performance Computing Center (MHPCC)
Mission:
  Discovery, manipulation, fusion and visualization
Geospatial Information and Mass Storage
How to manage and store large, complex datasets?
Conclusion
• A common data format eases collaboration and reduces time wasted on data conversions.
• Data formats for sustainable, transparent storage of huge and complex data already exist; one just has to use them.
• F5 captures observational and simulation data consistently.
• Geoscience repositories, such as the Hawaiian Geospatial Data Repository, can be built upon this format.
References:
http://www.hdfgroup.org/HDF5
http://www.fiberbundle.net
http://www.epscor.hawaii.edu
http://montage.ipac.caltech.edu
http://sciviz.cct.lsu.edu
HDFView
Figure: screenshot of shapefiles in HDFView
Geospatial Information and Mass Storage
• Weather station data
• Marine buoy sensor data
• GPS data collection
• Database datasets, Excel files
• Spatial data: imagery, LiDAR, GIS
• Upload and download capability
• Metadata search capability
• Visualization of spatial and nonspatial datasets
• Geoweb application services: WMS, WFS, WPC
• Database management
• Data streaming
• Data storage of statewide datasets
• Access to HPC services
• Real-time modeling and analysis
• Grid
  – Manifold describing the base space
• Topology
• Refinement level
• Coordinate representation
• Vertex positions in representation