CROSS-DISCIPLINARY APPLICATIONS OF MULTIPLEX OBSERVATIONAL AND COMPUTATIONAL DATASETS USING FOR ARCHIVING AND HIGH PERFORMANCE PROCESSING

Marcel Ritter, Werner Benger, Joseph Stoeckl, Donna Delparte, Mike Folk, Quincey Koziol, Frank Steinbacher and Markus Aufleger
ASTRO@U Center for

Outline
• Motivation
• Requirements on a Data Format
• Introduction to HDF5
• F5
  – Introduction
  – Examples of Data Sets
• Application Example:
  – The Hawaiian Geospatial Data Repository
• Conclusion

Motivation: Scientific Collaboration
Workgroup A uses Software Tool 1 and File Format 1; Workgroup B uses Software Tool 2 and File Format 2. Working together requires data exchange between the two formats.

Motivation: Converting Between Formats
• With N file formats (File Format 1, File Format 2, …, File Format N), converting directly between every pair requires O(N²) converters: a huge implementation effort.
• With a common data format, each format only needs a converter to and from the common format, i.e. O(N) converters: less implementation effort.

Motivation: A Common Data Format
Workgroups A, B, C and D, each with its own software tool (Software Tools 1 to 4), exchange data through one common data format:
• Easier collaboration
• More time for science

Requirements on a Data Format
• Easy access
• Sustainable (>10 years)
• Well documented, with a user community
• Self-descriptive
• Fast and efficient
• Huge data (terabytes)
• Huge variety of data

Hierarchical Data Format 5 (HDF5)
http://www.hdfgroup.org/HDF5

A Few Analogies
• File system (in a file)
• Binary XML file
• PDF for numerical data
• Database (container for array variables)

Relationships
Groups, datasets and attributes. Example from the figure: "Timestep: 36,000", "Parameters: 10; 100; 1000" and a table dataset:

lat | lon | temp
----|-----|-----
 12 |  23 |  3.1
 15 |  24 |  4.2
 17 |  21 |  3.6

What Users Get…
• A multi-platform library and tools built on over 10 years of experience in large-data handling in the high-performance computing (HPC) community.
• A capability that:
  – Lets them organize large and/or complex collections of data
  – Gives them efficient and scalable data storage and access
  – Lets them integrate a wide variety of data types and data sources
  – Guarantees long-term data integrity and preservation

• Shapefiles: HDF5 as container format
  [Figure: browser application showing the pixel data, vector data and attribute data of shapefiles stored in HDF5]

More Applications
• Earth science (Earth Observing System): satellites Aqua, Aura and Terra, with instruments such as TES, HIRDLS, MLS, OMI, CERES, MISR, MODIS, MOPITT and AMSR
• Big simulations: billions of elements with dozens of associated values
• Flight testing
• Movie making

Fiber Bundle Data Model
http://www.fiberbundle.net
• Based on HDF5
• Inspired by concepts of:
  – Topology
  – Differential Geometry
  – Geometric Algebra
• Separation of geometry (Grids) and data fields (Fields)
• Hierarchical structure: Fiber Bundle → Time Slice → Grid → Topology → Coordinates → Field (figure annotation: "Visible to the end user")

• Multi-Channel, Multi-Resolution Images
  Path levels: Time / Grid / Topology / Representation / Field, followed by [Datatype]

/1.4/Satellite/VertexRefinement1x1/Cartesian/Positions  [uniform-grid]
                                            /RGB        [byte,byte,byte]
                                            /N-IR       [float64]
                                            /T-IR       [float64]
              /VertexRefinement2x2/Cartesian/Positions  [uniform-grid]
                                            /RGB        [byte,byte,byte]
                                            /N-IR       [float64]
                                            /T-IR       [float64]
/1.6/…

• Full-Waveform LIDAR
  [Figure: a laser pulse emitted at t_emission returns echoes at t1, t2 and t3]

  Laser data:
/CorseTime/LASER/POINTS/CartesianCoords/Positions    [point3D]
                                       /TimeStamp    [float64]
                                       /Waveform     [uint16,uint16]
                                       /Reflectance  [float32]
                /SHOTS/SHOTSAsPOINTS/Positions       vlen[uint32]
                                    /Origin          [point3D]
                                    /Direction       [vector3D]
                                    /EmissionTime    [float64]

  Airplane data:
/CorseTime/PLANE/POINTS/CartesianCoords/Positions    [point3D]
                                       /Rotation     [rotor3D]
                                       /TimeStamps   [float64]

• Bringing together in F5:
  – Satellite data
  – LIDAR
  – Shapefiles
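The group/dataset/attribute relationship from the HDF5 slides can be mimicked in a few lines of plain Python. This is only a toy, in-memory sketch to show how file-system-like paths and attributes fit together; it does not read or write HDF5 files (with the real h5py package the corresponding pieces would be `File.create_group`, `Group.create_dataset` and the `.attrs` mapping). The names `results` and `temperature`, and which object carries which attribute, are hypothetical choices, since the figure does not specify them.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Dict, List, Union


@dataclass
class Dataset:
    """Array-like payload plus its own attributes (toy stand-in for an HDF5 dataset)."""
    data: List[Any]
    attrs: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Group:
    """Named members (groups or datasets) plus attributes (toy stand-in for an HDF5 group)."""
    members: Dict[str, Union[Group, Dataset]] = field(default_factory=dict)
    attrs: Dict[str, Any] = field(default_factory=dict)

    def get(self, path: str) -> Union[Group, Dataset]:
        """Resolve a '/'-separated path, like a path in a file system."""
        node: Union[Group, Dataset] = self
        for part in filter(None, path.split("/")):
            if not isinstance(node, Group):
                raise KeyError(f"{part!r}: parent is a dataset, not a group")
            node = node.members[part]
        return node


# Hypothetical layout reusing the values shown on the slide.
root = Group()
results = Group(attrs={"Timestep": 36000})
results.members["temperature"] = Dataset(
    data=[(12, 23, 3.1), (15, 24, 4.2), (17, 21, 3.6)],  # rows of (lat, lon, temp)
    attrs={"columns": ("lat", "lon", "temp"), "Parameters": [10, 100, 1000]},
)
root.members["results"] = results

print(root.get("/results").attrs["Timestep"])    # 36000
print(root.get("/results/temperature").data[1])  # (15, 24, 4.2)
```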
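The F5 listings address every field by a five-level path: Time / Grid / Topology / Representation / Field. The sketch below is a hypothetical helper (it is not part of the F5 library) that splits such paths into named levels and rebuilds them, then regenerates the multi-resolution image layout and groups its fields by refinement level. It assumes every field path has exactly five levels, which holds for all paths shown on the slides.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class F5Path(NamedTuple):
    """One field path split into the five levels used on the slides (hypothetical helper)."""
    time: str
    grid: str
    topology: str
    representation: str
    field: str

    @classmethod
    def parse(cls, path: str) -> "F5Path":
        parts = [p for p in path.split("/") if p]
        if len(parts) != 5:
            raise ValueError(f"expected 5 levels, got {len(parts)} in {path!r}")
        return cls(*parts)

    def __str__(self) -> str:
        return "/" + "/".join(self)


# Rebuild the multi-channel, multi-resolution image layout for time step 1.4.
paths = [
    str(F5Path("1.4", "Satellite", f"VertexRefinement{k}x{k}", "Cartesian", name))
    for k in (1, 2)
    for name in ("Positions", "RGB", "N-IR", "T-IR")
]

# Group field names by topology, i.e. by refinement level here.
by_level: Dict[str, List[str]] = defaultdict(list)
for p in paths:
    q = F5Path.parse(p)
    by_level[q.topology].append(q.field)

print(paths[0])                         # /1.4/Satellite/VertexRefinement1x1/Cartesian/Positions
print(by_level["VertexRefinement2x2"])  # ['Positions', 'RGB', 'N-IR', 'T-IR']
```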
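One plausible reading of the full-waveform LIDAR layout is that each shot stores a variable-length list (`vlen[uint32]`) of indices into the shared POINTS arrays, one index per return (t1, t2, t3 in the figure). Under that assumption, which the slides do not state explicitly, the sketch below links shots to their returns and derives ranges from the standard two-way time-of-flight relation, range = c · (t_return - t_emission) / 2. The `Shot` class, its field names and the sample numbers are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Point3D = Tuple[float, float, float]
C = 299_792_458.0  # speed of light in vacuum, m/s


@dataclass
class Shot:
    origin: Point3D       # sensor position at emission
    direction: Point3D    # beam direction (normalized below)
    emission_time: float  # seconds
    point_ids: List[int]  # variable length: indices into the POINTS arrays (assumption)


def ranges_of(shot: Shot, point_times: List[float]) -> List[float]:
    """Range of each return from its two-way travel time."""
    return [C * (point_times[i] - shot.emission_time) / 2.0 for i in shot.point_ids]


def positions_of(shot: Shot, point_times: List[float]) -> List[Point3D]:
    """Place each return along the beam: origin + range * unit direction."""
    ox, oy, oz = shot.origin
    dx, dy, dz = shot.direction
    norm = math.sqrt(dx * dx + dy * dy + dz * dz)
    ux, uy, uz = dx / norm, dy / norm, dz / norm
    return [(ox + r * ux, oy + r * uy, oz + r * uz) for r in ranges_of(shot, point_times)]


# Hypothetical data: shared POINTS timestamps, two shots with 3 and 1 returns.
point_times = [6.0e-6, 6.2e-6, 6.67e-6, 1.0e-3 + 6.5e-6]
shots = [
    Shot((0.0, 0.0, 1000.0), (0.0, 0.0, -1.0), 0.0, [0, 1, 2]),
    Shot((5.0, 0.0, 1000.0), (0.0, 0.0, -1.0), 1.0e-3, [3]),
]

for s in shots:
    print([round(r, 2) for r in ranges_of(s, point_times)])
```

Storing returns once in POINTS and letting shots refer to them by index keeps per-point fields (waveform, reflectance) in flat arrays while still allowing a different number of returns per shot.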
Features of HDF5
• Sustainable storage
• Metadata
• Compression
• Parallel I/O
• Hyperslab access

Benefits
• Consistent data organization of simple and complex spatio-temporal data
• Handle time series of data easily
• Make tools from other disciplines applicable to the geoscience community, e.g. astrophysics image-mosaicking tools such as Montage (http://montage.ipac.caltech.edu) for satellite data

Application Example
http://www.epscor.hawaii.edu
Goal: A centralized, integrative capability to store and manage access to massive (terabyte-scale) research datasets
Users:
• University of Hawaii research teams
• Broad statewide research community
Objectives:
• Collect, store and manage access to data
• Utilize user portals
• Utilize and link to the Maui High Performance Computing Center (MHPCC)
Mission: Discovery, manipulation, fusion and visualization

Geospatial Information and Mass Storage
How to manage and store large, complex datasets?

Conclusion
• A common data format eases collaboration and reduces the time wasted on data conversions.
• Data formats for sustainable, transparent storage of huge and complex data exist; one just has to use them –
• captures observational and simulation data consistently.
• Geoscience repositories, such as the can be built upon this format.
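The conclusion's point that a common format saves conversion work restates the O(N²) versus O(N) argument from the motivation. A quick count makes it concrete, assuming one converter per direction: direct translation between N formats needs N(N - 1) converters, while routing everything through a common format needs one reader and one writer per format, 2N in total.

```python
def pairwise_converters(n: int) -> int:
    """One direct converter for every ordered pair of distinct formats."""
    return n * (n - 1)


def hub_converters(n: int) -> int:
    """One reader and one writer per format, all going through the common format."""
    return 2 * n


for n in (3, 5, 10, 50):
    print(f"N={n:>2}: direct={pairwise_converters(n):>4}, via common format={hub_converters(n):>3}")
```

At N = 3 both approaches need 6 converters; for any larger N the common format needs fewer, e.g. 20 instead of 90 at N = 10.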
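Hyperslab access, listed among the HDF5 features, selects a regular sub-region of a dataset through four per-dimension parameters: start, stride, count and block (the arguments of `H5Sselect_hyperslab` in the HDF5 C API), so that only that region has to be read, for example every n-th step of a time series. The pure-Python sketch below only reproduces the index arithmetic on an in-memory 2-D list; it does not call HDF5, and the helper names are hypothetical.

```python
from typing import List


def hyperslab_indices(start: int, stride: int, count: int, block: int) -> List[int]:
    """Indices picked along one dimension: `count` blocks of `block` elements, `stride` apart."""
    return [start + i * stride + j for i in range(count) for j in range(block)]


def select_2d(data: List[List[int]], rows: List[int], cols: List[int]) -> List[List[int]]:
    """Extract the sub-array given by row and column index lists."""
    return [[data[r][c] for c in cols] for r in rows]


# 6 x 8 test array whose values encode their position: 10 * row + column.
grid = [[10 * r + c for c in range(8)] for r in range(6)]

rows = hyperslab_indices(start=1, stride=2, count=2, block=1)  # rows 1 and 3
cols = hyperslab_indices(start=0, stride=4, count=2, block=2)  # columns 0, 1, 4, 5
print(select_2d(grid, rows, cols))  # [[10, 11, 14, 15], [30, 31, 34, 35]]
```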
References:
• http://www.hdfgroup.org/HDF5
• http://www.fiberbundle.net
• http://www.epscor.hawaii.edu
• http://montage.ipac.caltech.edu
• http://sciviz.cct.lsu.edu

[Figure: HDFView screenshot of shapefiles]

Geospatial Information and Mass Storage
• Weather station data
• Marine buoy sensor data
• GPS data collection
• Database datasets, Excel files
• Spatial data: imagery, LiDAR, GIS
• Upload and download capability
• Metadata search capability
• Visualization of spatial and non-spatial datasets
• Geoweb application services: WMS, WFS, WPC
• Database management
• Data streaming
• Data storage of statewide datasets
• Access to HPC services
• Real-time modeling and analysis

• Grid: manifold describing the base space
  – Topology
  – Refinement level
  – Coordinate representation
  – Vertex positions in representation
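The repository feature list includes metadata search. Because HDF5 objects carry attributes, one simple way to picture such a search is a recursive walk that collects every object whose attributes match the query. The nested-dictionary layout, the attribute names (`kind`, `year`) and the values below are hypothetical and only stand in for a real repository index.

```python
from typing import Any, Dict, Iterator, List, Tuple

Node = Dict[str, Any]  # {"attrs": {...}, "members": {name: Node, ...}}


def walk(node: Node, path: str = "") -> Iterator[Tuple[str, Dict[str, Any]]]:
    """Yield (path, attributes) for a node and, depth-first, for all its descendants."""
    yield path or "/", node.get("attrs", {})
    for name, child in node.get("members", {}).items():
        yield from walk(child, f"{path}/{name}")


def search(root: Node, **criteria: Any) -> List[str]:
    """Paths of all objects whose attributes equal every key/value pair in `criteria`."""
    return [
        path
        for path, attrs in walk(root)
        if all(attrs.get(key) == value for key, value in criteria.items())
    ]


# Hypothetical repository index.
repo: Node = {
    "members": {
        "weather": {"attrs": {"kind": "weather station", "year": 2009}},
        "buoys": {"attrs": {"kind": "marine buoy", "year": 2010}},
        "lidar": {
            "attrs": {"kind": "LiDAR", "year": 2009},
            "members": {"tile_01": {"attrs": {"kind": "LiDAR", "year": 2009}}},
        },
    }
}

print(search(repo, year=2009))  # ['/weather', '/lidar', '/lidar/tile_01']
```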