Hierarchical Data Format

Hierarchical Data Format
Filename extension: .hdf, .h4, .hdf4, .he2, .h5, .hdf5, .he5
Developed by: The HDF Group
Latest release: 5-1.8.15 (May 15, 2015)
Type of format: scientific data format
Website: www.hdfgroup.org

Hierarchical Data Format (HDF) is a set of file formats (HDF4, HDF5) designed to store and organize large amounts of data. Originally developed at the National Center for Supercomputing Applications, it is supported by The HDF Group, a non-profit corporation whose mission is to ensure continued development of HDF5 technologies and the continued accessibility of data stored in HDF.

In keeping with this goal, the HDF libraries and associated tools are available under a liberal, BSD-like license for general use. HDF is supported by many commercial and non-commercial software platforms, including Java, MATLAB, Scilab, Octave, IDL, Python, R, and Julia. The freely available HDF distribution consists of the library, command-line utilities, test suite source, Java interface, and the Java-based HDF Viewer (HDFView).[1]

The current version, HDF5, differs significantly in design and API from the major legacy version HDF4.

Contents

  • 1 Early history
  • 2 HDF4
  • 3 HDF5
  • 4 Interfaces
    • 4.1 Officially supported APIs
    • 4.2 Third-party bindings
  • 5 See also
  • 6 References
  • 7 External links
    • 7.1 Tools

Early history

The quest for a portable scientific data format, originally dubbed AEHOO (All Encompassing Hierarchical Object Oriented format), began in 1987 with the Graphics Foundations Task Force (GFTF) at the National Center for Supercomputing Applications (NCSA). NSF grants received in 1990 and 1992 were important to the project. Around this time NASA investigated 15 different file formats for use in the Earth Observing System (EOS) project. After a two-year review process, HDF was selected as the standard data and information system.[2]

HDF4

HDF4 is the older version of the format, although it is still actively supported by The HDF Group. It supports multiple data models, including multidimensional arrays, raster images, and tables. Each model defines a specific aggregate data type and provides an API for reading, writing, and organizing the data and metadata. New data models can be added by the HDF developers or users.

HDF is self-describing, allowing an application to interpret the structure and contents of a file with no outside information. One HDF file can hold a mix of related objects which can be accessed as a group or as individual objects. Users can create their own grouping structures called "vgroups."

The HDF4 format has many limitations.[3][4] It lacks a clear object model, which makes continued support and improvement difficult. Supporting many different interface styles (images, tables, arrays) leads to a complex API. Support for metadata depends on which interface is in use; SD (Scientific Dataset) objects support arbitrary named attributes, while other types only support predefined metadata. Perhaps most importantly, the use of 32-bit signed integers for addressing limits HDF4 files to a maximum of 2 GB, which is unacceptable in many modern scientific applications.

HDF5

The HDF5 format is designed to address some of the limitations of the HDF4 library, and to address current and anticipated requirements of modern systems and applications. In 2002 it won an R&D 100 Award.[5]

HDF5 simplifies the file structure to include only two major types of object:

  • Datasets, which are multidimensional arrays of a homogeneous type
  • Groups, which are container structures which can hold datasets and other groups

This results in a truly hierarchical, filesystem-like data format. In fact, resources in an HDF5 file are even accessed using the POSIX-like syntax /path/to/resource. Metadata is stored in the form of user-defined, named attributes attached to groups and datasets. More complex storage APIs representing images and tables can then be built up using datasets, groups and attributes.
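As a concrete illustration, the two object types and the path-style access can be sketched with the third-party h5py binding for Python (the file, group, and dataset names below are arbitrary examples, not part of the HDF5 specification):

```python
import h5py
import numpy as np

# Create a file containing nested groups and one dataset.
with h5py.File("example.h5", "w") as f:
    run = f.create_group("experiment/run1")      # intermediate groups are created automatically
    temps = run.create_dataset("temps", data=np.arange(12.0).reshape(3, 4))
    temps.attrs["units"] = "kelvin"              # metadata as a named attribute

# Reopen and address the dataset by its POSIX-like path.
with h5py.File("example.h5", "r") as f:
    d = f["/experiment/run1/temps"]
    print(d.shape, d.attrs["units"])
```

Groups here behave like directories and datasets like files, which is why tools can browse an HDF5 file much as a filesystem browser would.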

In addition to these advances in the file format, HDF5 includes an improved type system, and dataspace objects which represent selections over dataset regions. The API is also object-oriented with respect to datasets, groups, attributes, types, dataspaces and property lists.
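The dataspace selection machinery can be sketched via h5py, where NumPy-style slicing is translated into an HDF5 hyperslab selection so that only the requested region is read from disk (file and dataset names are illustrative):

```python
import h5py
import numpy as np

with h5py.File("selection_demo.h5", "w") as f:
    f["grid"] = np.arange(100).reshape(10, 10)   # implicit dataset creation

with h5py.File("selection_demo.h5", "r") as f:
    # The slice becomes a hyperslab selection over the dataset's dataspace:
    # only rows 2-3, columns 5-7 are transferred.
    block = f["grid"][2:4, 5:8]
    print(block)
```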

The latest version of NetCDF, version 4, is based on HDF5.

Because it uses B-trees to index table objects, HDF5 works well for time-series data such as stock price series, network monitoring data, and 3D meteorological data. The bulk of the data goes into straightforward arrays (the table objects) that can be accessed much more quickly than the rows of an SQL database, but B-tree access is available for non-array data. The HDF5 data storage mechanism can be simpler and faster than an SQL star schema.
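The append-heavy time-series pattern can be sketched with h5py's chunked, resizable datasets, where the chunk index (a B-tree in the default file layout) lets the array grow along an unlimited axis (names and batch sizes below are illustrative):

```python
import h5py
import numpy as np

with h5py.File("ticks.h5", "w") as f:
    # Chunked layout with an unlimited first axis lets the series grow over time.
    prices = f.create_dataset("prices", shape=(0,), maxshape=(None,),
                              chunks=(1024,), dtype="f8")
    for batch in (np.random.rand(500), np.random.rand(700)):
        n = prices.shape[0]
        prices.resize((n + len(batch),))   # extend the dataset, then write the new slice
        prices[n:] = batch

with h5py.File("ticks.h5", "r") as f:
    print(f["prices"].shape)               # (1200,) after both appends
```

Chunking is also what makes per-chunk compression and partial I/O possible, which is why this layout is the usual choice for growing datasets.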

Interfaces

Officially supported APIs

  • C
  • C++
  • CLI (.NET)
  • Fortran, Fortran 90
  • HDF5 Lite (H5LT) – a light-weight interface for C
  • HDF5 Image (H5IM) – a C interface for images or rasters
  • HDF5 Table (H5TB) – a C interface for tables
  • HDF5 Packet Table (H5PT) – interfaces for C and C++ to handle "packet" data, accessed at high speed
  • HDF5 Dimension Scale (H5DS) – allows dimension scales to be added to HDF5 (introduced in the HDF5 1.8 release)
  • Java

Third-party bindings

  • CGNS uses HDF5 as main storage
  • Common Lisp library hdf5-cffi
  • D offers bindings to the C API, with a high-level h5py style D wrapper under development
  • Erlang, Elixir, and LFE may use the bindings for BEAM languages
  • GNU Data Language
  • Go - kisielk's go-hdf5 package is based on sbinet's go-hdf5 package.
  • Huygens Software uses HDF5 as primary storage format since version 3.5
  • IDL
  • IGOR Pro offers full support of HDF5 files.
  • JHDF5,[6] an alternative Java binding that takes a different approach from the official HDF5 Java binding which some users find simpler
  • JSON through hdf5-json.
  • Julia provides HDF5 support through the HDF5 package.
  • LabVIEW can gain HDF support through third-party libraries, such as h5labview and lvhdf5.
  • Lua through the lua-hdf5 library.
  • MATLAB, Scilab, and Octave use HDF5 as the primary storage format in recent releases
  • Mathematica[7] offers immediate analysis of HDF and HDF5 data
  • Perl[8]
  • Python supports HDF5 via h5py (both high- and low-level access to HDF5 abstractions) and via pytables (a high-level interface with advanced indexing and database-like query capabilities). HDF4 is available via Python-HDF4 and/or PyHDF for both Python 2 and Python 3.
  • R offers support in the rhdf5 package.

See also

References

  1. ^ Java-based HDF Viewer (HDFView)
  2. ^ "History of HDF Group". Retrieved 15 July 2014. 
  3. ^ How is HDF5 different from HDF4?
  4. ^ Are there limitations to HDF4 files?
  5. ^ R&D 100 Awards Archives
  6. ^ JHDF5 library
  7. ^ HDF Import and Export Mathematica documentation
  8. ^ PDL::IO::HDF5

External links

  • Official website
  • What is HDF5?
  • An example NASA HDF file, with its structure generated and shown online as a Creative Commons image

Tools

  • HDF Explorer – a data visualization program that reads the HDF, HDF5, and netCDF data file formats
  • HDFView – a browser and editor for HDF files
  • ViTables – a browser and editor for HDF5 and PyTables files, written in Python

This article is based on material taken from the Free On-line Dictionary of Computing prior to 1 November 2008 and incorporated under the "relicensing" terms of the GFDL, version 1.3 or later.

This article was sourced from content available under the Creative Commons Attribution-ShareAlike License; additional terms may apply.