The HDF Group
Introduction to HDF5
Barbara Jones, The HDF Group
The 13th HDF & HDF-EOS Workshop
November 3-5, 2009
November 3-5, 2009 HDF/HDF-EOS Workshop XIII www.hdfgroup.org

Before We Begin …
• HDF-EOS Home Page: http://hdfeos.org/
• Workshop Info: http://hdfeos.org/workshops/ws13/workshop_thirteen.php
• The HDF Group Page: http://hdfgroup.org/
• HDF5 Home Page: http://hdfgroup.org/HDF5/
• HDF Helpdesk: help@hdfgroup.org
• HDF Mailing Lists: http://hdfgroup.org/services/support.html

HDF = Hierarchical Data Format
HDF5 is the second HDF format
• Development started in 1996
• First release was in 1998
HDF4 is the first HDF format
• Originally called HDF
• Development started in 1987
• Still supported by The HDF Group

HDF5 is like…
[Illustrations not preserved]

HDF5 is designed …
• for high volume and/or complex data
• for every size and type of system (portable)
• for flexible, efficient storage and I/O
• to enable applications to evolve in their use of HDF5 and to accommodate new models
• to support long-term data preservation

HDF5 Technology
HDF5 is a data model, library, and file format for managing data.

HDF5 Technology
• HDF5 (Abstract) Data Model
  - Defines the “building blocks” for data organization and specification
  - Files, Groups, Datasets, Attributes, Datatypes, Dataspaces, …
• HDF5 Library (C, Fortran 90, C++ APIs)
  - Also Java Language Interface and High Level Libraries
• HDF5 Binary File Format
  - Bit-level organization of HDF5 file
  - Defined by HDF5 File Format Specification
• Tools For Accessing Data in HDF5 Format
  - h5dump, h5repack, HDFView, …

HDF5 Abstract Data Model
(a.k.a. HDF5 Logical Data Model, a.k.a. HDF5 Data Model)

HDF5 File
lat | lon | temp
----|-----|-----
12  | 23  | 3.1
15  | 24  | 4.2
17  | 21  | 3.6
An HDF5 file is a container that holds data objects.

HDF5 Groups and Links
HDF5 groups and links organize data objects.
[Figure: a tree rooted at “/” with objects labeled SimOut and Viz, the lat/lon/temp table above, and a note reading “Experiment Notes: Serial Number: 99378920, Date: 3/13/09, Configuration: Standard 3”]

HDF5 Objects
The two primary HDF5 objects are:
• HDF5 Group: a grouping structure containing zero or more HDF5 objects
• HDF5 Dataset: raw data elements, together with information that describes them
(There are other HDF5 objects that help support Groups and Datasets.)

HDF5 Groups
• Used to organize collections
• Every file starts with a root group
• Similar to UNIX directories
• Path to object defines it
• Objects can be shared: /A/k and /B/l are the same
[Figure: a tree rooted at “/” with groups A, B, and C; the link k in A and the link l in B point to one shared dataset; a dataset named temp. Legend: Group, Dataset]

HDF5 Datasets
HDF5 Datasets organize and contain your “raw data values”.
They consist of:
• Your raw data
• Metadata describing the data:
  - The information to interpret the data (Datatype)
  - The information to describe the logical layout of the data elements (Dataspace)
  - Characteristics of the data (Properties)
  - Additional optional information that describes the data (Attributes)

HDF5 Dataset
Metadata:
• Dataspace: Rank = 3; Dimensions: Dim_1 = 4, Dim_2 = 5, Dim_3 = 7
• Datatype: Integer
• Properties: Chunked, Compressed
• Attributes (optional): Time = 32.4, Pressure = 987, Temp = 56
Data: [Figure: the 4 x 5 x 7 array of data elements]

HDF5 Dataspaces
An HDF5 Dataspace describes the logical layout of the data elements:
• Array
  - multiple elements in the dataset, organized in a multi-dimensional (rectangular) array
  - the maximum number of elements in each dimension may be fixed or unlimited
• NULL
  - no elements in the dataset
• Scalar
  - a single element in the dataset

HDF5 Dataspaces
Two roles:
• The dataspace contains spatial information (logical layout) about a dataset stored in a file
  - Rank and dimensions
  - Permanent part of the dataset definition
  [Figure: Rank = 2, Dimensions = 4x6]
• Partial I/O: the dataspace describes the application's data buffer and the data elements participating in I/O
  [Figure: Rank = 1, Dimension = 10]

HDF5 Datatypes
The HDF5 datatype describes how to interpret individual data elements.
HDF5 datatypes include:
− integer, float, unsigned, bitfield, …
− user-definable (e.g., 13-bit integer)
− variable-length types (e.g., strings)
− references to objects/dataset regions
− enumerations: names mapped to integers
− opaque
− compound (similar to C structs)

HDF5 Dataset
[Figure: a 5 x 3 array of data elements]
Datatype: 16-byte integer
Dataspace: Rank = 2, Dimensions = 5 x 3

HDF5 Properties
• Properties (also known as Property Lists) are characteristics of HDF5 objects that can be modified
• Default properties handle most needs
• By changing properties, one can take advantage of the more powerful features in HDF5

Storage Properties
• Contiguous (default): data elements stored physically adjacent to each other
• Chunked: better access time for subsets; extensible
• Chunked & Compressed: improves storage efficiency, transmission speed

HDF5 Attributes (optional)
• An HDF5 attribute has a name and a value
• Attributes typically contain user metadata
• Attributes may be associated with
  - HDF5 groups
  - HDF5 datasets
  - HDF5 named datatypes
• An attribute's value is described by a datatype and a dataspace
• Attributes are analogous to datasets except…
  - they are NOT extensible
  - they do NOT support compression or partial I/O

HDF5 Abstract Data Model Summary
• The objects in the data model are the “building blocks” for data organization and specification
• Files, Groups, Links, Datasets, Datatypes, Dataspaces, Attributes, …
• Projects using HDF5 “map” their data concepts to these HDF5 objects

HDF5 Software

HDF5 Software Layers & Storage
[Figure: HDF5 software layers. API: High Level APIs; Language Interfaces (C, Fortran, C++); Java Interface; tools (h5dump, h5repack, HDFView). Library internals: HDF5 data model objects (Groups, Datasets, Attributes, …); tunable properties (Chunk Size, I/O Driver, …); memory management, datatype conversion, filters, chunked storage, version compatibility, and so on. Virtual File Layer with I/O drivers (Posix I/O, Split Files, MPI I/O, Custom). Storage in the HDF5 File Format (File, Split Files, File on Parallel Filesystem, Other)]

HDF5 API and Applications
[Figure: applications (a climate model, the EOS library, MATLAB, …) map their domain data objects onto the HDF5 Library, which manages storage]

HDF5 Home Page
HDF5 home page: http://hdfgroup.org/HDF5/
• Two releases: HDF5 1.8 and HDF5 1.6
HDF5 source code:
• Written in C; includes optional C++, Fortran 90, and High Level APIs
• Contains command-line utilities (h5dump, h5repack, h5diff, …) and compile scripts
HDF5 pre-built binaries:
• When possible, include C, C++, F90, and High Level libraries. Check the ./lib/libhdf5.settings file.
• Built with, and require, the SZIP and ZLIB external libraries

Useful Tools For New Users
• h5dump: tool to “dump” or display the contents of HDF5 files
• h5cc, h5c++, h5fc: scripts to compile applications
• HDFView: Java browser to view HDF4 and HDF5 files
  http://www.hdfgroup.org/hdf-java-html/hdfview/

h5dump Utility
h5dump [options] [file]
  -H, --header    Display header only; no data
  -d <names>      Display the specified dataset(s)
  -g <names>      Display the specified group(s) and all members
  -p              Display properties
<names> is one or more appropriate object names.
Example of h5dump Output
    HDF5 "dset.h5" {
    GROUP "/" {
       DATASET "dset" {
          DATATYPE { H5T_STD_I32BE }
          DATASPACE { SIMPLE ( 4, 6 ) / ( 4, 6 ) }
          DATA {
             1, 2, 3, 4, 5, 6,
             7, 8, 9, 10, 11, 12,
             13, 14, 15, 16, 17, 18,
             19, 20, 21, 22, 23, 24
          }
       }
    }
    }
[Callouts: “/” marks the root group; ‘dset’ marks the dataset]

HDF5 Compile Scripts
• h5cc: HDF5 C compiler command
• h5fc: HDF5 F90 compiler command
• h5c++: HDF5 C++ compiler command
To compile:
    % h5cc h5prog.c
    % h5fc h5prog.f90

Compile option: -show
-show: displays the compiler commands and options without executing them
    % h5cc -show Sample_c.c
• Shows the correct paths and libraries used by the installed HDF5 library.
• Shows the correct flags to specify when building an application with that HDF5 library.

Browsing HDF5 Files with HDFView

HDFView
[Screenshot: structure of the file and contents of a dataset]

HDFView File Menu
[Screenshot not preserved]

HDF-EOS5 File in HDFView
[Screenshot not preserved]

Introduction to HDF5 Programming Model and APIs

Operations Supported by the API
• Create objects (groups, datasets, attributes, complex data types, …)
• Assign storage and I/O properties to objects
• Perform complex subsetting during read/write
• Use a variety of I/O “devices” (parallel, remote, etc.)
• Transform data during I/O
• Make inquiries on file and object structure, content, properties

General Programming Paradigm
• Properties of the object are optionally defined
  - Creation properties
  - Access properties
• Object is opened or created
• Object is accessed, possibly many times
• Object is closed

Order of Operations
• An order is imposed on operations by argument dependencies.
  For example, a file must be opened before a dataset, because the dataset open call requires a file handle as an argument.
• Objects can be closed in any order.

The General HDF5 API
• Currently C, Fortran 90, Java, and C++ bindings.
• C routines begin with the prefix H5?, where ? is a character corresponding to the type of object the function acts on.
Example functions:
• H5D: Dataset interface, e.g., H5Dread
• H5F: File interface, e.g., H5Fopen
• H5S: dataSpace interface, e.g., H5Sclose

HDF5 Defined Types
For portability, the HDF5 library has its own defined types:
• hid_t: object identifiers (native integer; negative on failure)
• hsize_t: size used for dimensions (unsigned long or unsigned long long)
• herr_t: function return value (negative on failure)
• hvl_t: variable length datatype
For C, include hdf5.h in your HDF5 application.
The HDF5 API
• For flexibility, the API is extensive: 300+ functions
  [Image: Victorinox Swiss Army CyberTool 34]
• This can be daunting… but there is hope
  - A few functions can do a lot
  - Start simple
  - Build up knowledge as more features are needed

Basic Functions
• H5Fcreate (H5Fopen): create (open) File
• H5Screate_simple / H5Screate: create dataSpace
• H5Dcreate (H5Dopen): create (open) Dataset
• H5Dread, H5Dwrite: access Dataset
• H5Dclose: close Dataset
• H5Sclose: close dataSpace
• H5Fclose: close File
NOTE: The order specified above is not required; only argument dependencies impose an order.

Other Common Functions
• DataSpaces: H5Sselect_hyperslab (Partial I/O), H5Sselect_elements (Partial I/O), H5Dget_space
• Groups: H5Gcreate, H5Gopen, H5Gclose
• Attributes: H5Acreate, H5Aopen_name, H5Aclose, H5Aread, H5Awrite
• Property lists: H5Pcreate, H5Pclose, H5Pset_chunk, H5Pset_deflate

High Level APIs
• Included along with the HDF5 library
• Simplify steps for creating, writing, and reading objects
• Do not entirely ‘wrap’ the HDF5 library

Example HDF5 Code

Steps to Create a File
1. Decide on properties the file should have and create them if necessary:
   • Creation properties, like size of user block
   • Access properties (improve performance)
   • Use default properties (H5P_DEFAULT)
2. Create the file
3. Close the file and the property lists, as needed

Code: Create a File
    hid_t  file_id;
    herr_t status;

    file_id = H5Fcreate("file.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
    status = H5Fclose(file_id);
[Figure: the new file contains only the root group “/”]
Note: Return codes are not checked for errors in code samples.
Dataset Components
Metadata:
• Dataspace: Rank = 3; Dimensions: Dim_1 = 4, Dim_2 = 5, Dim_3 = 7
• Datatype: Integer
• Properties: Chunked, Compressed
Data: [Figure: the array of data elements]

Steps to Create a Dataset
1. Define dataset characteristics:
   a) Datatype: integer
   b) Dataspace: 4x6
   c) Properties if needed, or use H5P_DEFAULT
2. Decide where to put it:
   • Obtain a location ID:
     - a group ID puts it in that group
     - a file ID puts it in the root group
3. Create the dataset in the file
4. Close everything
[Figure: the root group “/” containing the new dataset A]

HDF5 Pre-defined Datatype Identifiers
HDF5 defines* a set of datatype identifiers per HDF5 session. For example:
C Type | HDF5 File Type                 | HDF5 Memory Type
-------|--------------------------------|------------------
int    | H5T_STD_I32BE, H5T_STD_I32LE   | H5T_NATIVE_INT
float  | H5T_IEEE_F32BE, H5T_IEEE_F32LE | H5T_NATIVE_FLOAT
double | H5T_IEEE_F64BE, H5T_IEEE_F64LE | H5T_NATIVE_DOUBLE
* The value of a datatype identifier is NOT fixed; it is assigned when the library is initialized.

Pre-defined File Datatype Identifiers
Examples:
• H5T_IEEE_F64LE: eight-byte, little-endian, IEEE floating-point
• H5T_STD_I32LE: four-byte, little-endian, signed two's complement integer
In each name, the part after H5T_ is the architecture* (IEEE, STD) and the rest is the programming type (F64LE, I32LE).
NOTE: These are what you see in the file. The name is the same everywhere and explicitly defines a datatype.
*STD = “An architecture with a semi-standard type like 2's complement integer, unsigned integer…”

Pre-defined Native Datatypes
Examples of predefined native types in C:
• H5T_NATIVE_INT (int)
• H5T_NATIVE_FLOAT (float)
• H5T_NATIVE_UINT (unsigned int)
• H5T_NATIVE_LONG (long)
• H5T_NATIVE_CHAR (char)
NOTE: Memory types. Different for each machine. Used for reading/writing.
Code: Create a Dataset
    hid_t   file_id, dataset_id, dataspace_id;
    hsize_t dims[2];
    herr_t  status;

    file_id = H5Fcreate("file.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);

    /* Define a dataspace: rank 2, current dims 4 x 6
     * (NULL max dims: maximum dims = current dims) */
    dims[0] = 4;
    dims[1] = 6;
    dataspace_id = H5Screate_simple(2, dims, NULL);

    /* Where to put it (file_id: the root group), name, datatype,
     * size & shape (dataspace), and properties
     * (link creation, dataset creation, and dataset access) */
    dataset_id = H5Dcreate(file_id, "A", H5T_STD_I32BE, dataspace_id,
                           H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

    /* Terminate access to the dataset, dataspace, and file */
    status = H5Dclose(dataset_id);
    status = H5Sclose(dataspace_id);
    status = H5Fclose(file_id);
Example Code: H5Dwrite
    status = H5Dwrite(dataset_id,      /* dataset ID from H5Dcreate/H5Dopen */
                      H5T_NATIVE_INT,  /* memory datatype */
                      H5S_ALL,         /* memory dataspace */
                      H5S_ALL,         /* file dataspace (disk) */
                      H5P_DEFAULT,     /* data transfer property list (MPI I/O, transformations, …) */
                      wdata);          /* buffer holding the data to write */

Partial I/O
The third and fourth arguments of H5Dwrite are the memory dataspace and the file dataspace (disk). H5S_ALL for both transfers the entire dataset. To modify a dataspace selection, use:
• H5Sselect_hyperslab
• H5Sselect_elements

Example Code: H5Dread
    status = H5Dread(dataset_id, H5T_NATIVE_INT, H5S_ALL, H5S_ALL,
                     H5P_DEFAULT, rdata);

High Level APIs: HDF5 Lite (H5LT)
    #include "hdf5_hl.h"
    . . .
    file_id = H5Fcreate("file.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
    /* H5LTmake_dataset uses its type argument as both the file type and the
     * memory type of `data`, so pass the native type of the buffer. */
    status = H5LTmake_dataset(file_id, "A", 2, dims, H5T_NATIVE_INT, data);
    status = H5Fclose(file_id);

Steps to Create a Group
1. Decide where to put it (e.g., the root group):
   • Obtain the location ID
2. Define properties or use H5P_DEFAULT
3. Create the group in the file
4. Close the group

Example: Create a Group
[Figure: file.h5, whose root group “/” contains dataset A (a 4x6 array of integers) and the new group B]

Code: Create a Group
    hid_t  file_id, group_id;
    herr_t status;
    ...
    /* Open "file.h5" */
    file_id = H5Fopen("file.h5", H5F_ACC_RDWR, H5P_DEFAULT);

    /* Create group "/B" in the file */
    group_id = H5Gcreate(file_id, "B", H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

    /* Close group and file */
    status = H5Gclose(group_id);
    status = H5Fclose(file_id);

HDF5 Tutorial and Examples
• HDF5 Tutorial: http://www.hdfgroup.org/HDF5/Tutor/
• HDF5 Example Code: http://www.hdfgroup.org/ftp/HDF5/examples/examples-by-api/

Thank You!

Acknowledgements
This work was supported by cooperative agreement number NNX08AO77A from the National Aeronautics and Space Administration (NASA). Any opinions, findings, conclusions, or recommendations expressed in this material are those of the author[s] and do not necessarily reflect the views of the National Aeronautics and Space Administration.

Questions/comments?