ESMPy: The Python Interface to the Earth System Modeling Framework

Ryan O’Kuinghttons, Robert Oehmke, Cecelia DeLuca, Gerhard Theurich, Peggy Li, Joseph Jacob
Cooperative Institute for Research in Environmental Sciences
NOAA Environmental Software Infrastructure and Interoperability Project

American Meteorological Society Annual Meeting
Atlanta, Georgia
February 4, 2014

Introduction
• The Earth System Modeling Framework (ESMF) is open source software for building modeling components and coupling them together to form weather prediction, climate, coastal, and other applications.
  • Provides a full Fortran interface and limited C and Python interfaces
• ESMF provides a mature, high performance regridding package
  • Transforms data from one grid to another by generating and applying interpolation weights
  • Supports structured and unstructured, global and regional, 2D and 3D grids, with many options
  • Fully parallel and highly scalable
• The Python interface to ESMF (ESMPy) offers access to the regridding functionality and other related features of ESMF.

ESMP vs. ESMPy
ESMP
• Thin layer on top of the C interfaces to ESMF regridding
• Grid objects created in memory
• Basic masking capability
• Simple testing layer
ESMPy
• Python package surrounding ESMP and other related modules
• Grid objects created in memory or from file
• Field derived from MaskedArray
• Regrid testing framework
• Nose testing
[Figure: ESMPy package diagram. Labels: ESMPy, ESMP, Contributed Package Candidates: Time Manager, OCGIS]

What does ESMF gain with a Python Interface?
• Enables ESMF regridding to be used with very little effort, in an object oriented way:
  • Regridding applied as a callable Python object
  • Numpy array access to distributed data
  • Grid creation from NetCDF files in standard formats
  • Some users report computation times reduced from hours to minutes
• Enables ESMF regridding to be used in other scientific packages with Python-based workflows – current users include:
  • UV-CDAT (PCMDI) – Ultrascale Visualization Climate Data Analysis Tools
  • PyFerret (NOAA) – Python based interactive visualization and analysis environment
  • Iris (Met Office) – Python library for visualizing meteorological and oceanographic data sets
  • Community Surface Dynamics Modeling System (CU-Boulder) – tools for hydrological and other surface modeling processes

Supported Grids and Methods
• Spherical coordinates – bilinear, higher order patch [1,2], or first order conservative regridding with:
  • Global or regional 2D logically rectangular grids
  • 2D unstructured meshes composed of triangles or quadrilaterals
• Cartesian (x,y) coordinates:
  • Bilinear, higher order patch or first order conservative regridding between any pair of:
    • 2D meshes composed of triangles and quadrilaterals
    • 2D logically rectangular grids composed of a single patch
  • Bilinear and first order conservative regridding between any pair of:
    • 3D meshes composed of hexahedrons
    • 3D logically rectangular grids composed of a single patch
[Figures: 2D Unstructured Mesh (from www.ngdc.noaa.gov); FIM Unstructured Grid; Regional Grid]

ESMPy Classes
• Manager (esmpymanager module)
  • Initialize and Finalize
  • Logging
  • Virtual Machine (parallel distribution)
• Grid
  • Logically rectangular discretization object
• Mesh
  • Unstructured mesh discretization object
• Field
  • Grid or Mesh plus data, mask and metadata
  • Derived type of the Numpy MaskedArray
• Regrid
  • Callable object which operates on two Fields to compute and apply interpolation weights

A Few Words on Data Conventions…
The following data
conventions make it easier to use data and tools. ESMPy grid files follow standard data file formats:
• Climate and Forecast (CF) grid conventions
  • UGRID – candidate CF convention for unstructured grids [3]
  • GRIDSPEC – accepted CF convention for logically rectangular grids [4]
• SCRIP – Spherical Coordinate Remapping and Interpolation Package [5]
  • Legacy format for 2D logically rectangular or 2D unstructured grids
• ESMF
  • Custom format for unstructured grids, more efficient storage than SCRIP when used with ESMF codes

From-file Grid/Mesh creation
GRID
• SCRIP format
• GRIDSPEC is not fully supported, expected in a patch release by the end of February 2014
• Examples:
  • Latitude longitude
  • Gaussian
MESH
• SCRIP (2D only)
• UGRID (2D and 3D)
• ESMFMESH (2D and 3D)
• Examples:
  • Irregular cubed sphere
  • Icosahedral
BOTH
• Regional and global
• Parallel implementation supported for basic regridding only (no masking or coordinate retrieval)

Code – Grid
Create a Grid from a SCRIP formatted NetCDF file:
    import ESMF
    grid = ESMF.Grid(filename="ll2.5deg_grid.nc", filetype=ESMF.FileFormat.SCRIP)
OPTIONS:
• is_sphere – set to False for a regional grid
• add_corner_stagger – set to True to add the corner stagger location defined in the file; this is needed for conservative regridding
• add_user_area – set to True to read cell areas from the grid file, otherwise they will be calculated by ESMF internally
• add_mask – set to True to generate the mask from the missing value attribute defined in 'varname'
• varname – name of the variable whose missing value defines the mask
• **coord_names – two element array containing the names of the latitude and longitude variables in a GRIDSPEC file, for the case when multiple coordinates are defined

Code – Mesh
Create a Mesh from a UGRID formatted NetCDF file:
    import ESMF
    mesh = ESMF.Mesh(filename="FVCOM_grid2d_20120314.nc", filetype=ESMF.FileFormat.UGRID, meshname='fvcom_mesh')
OPTIONS:
• convert_to_dual – set to False to NOT calculate the dual mesh
• add_user_area – set to True to read cell
areas from the file, otherwise they will be calculated by ESMF internally
• meshname – name of the mesh metadata variable in a UGRID file
• add_mask – set to True to generate the mask from the missing value attribute defined in 'varname'
• varname – name of the variable whose missing value defines the mask

Code – Regridding
Conservative regridding:
    import ESMF
    from ESMF import Regrid
    r1to2 = Regrid(field1, field2, regrid_method=ESMF.RegridMethod.CONSERVE)
    destination_field = r1to2(source_field)
OPTIONS:
• src_mask_values – numpy array of values to use for a mask on the source field
• dst_mask_values – numpy array of values to use for a mask on the destination field
• regrid_method – RegridMethod.BILINEAR (default), PATCH, or CONSERVE
• pole_method – specifies the type of artificial pole to construct on the source grid: PoleMethod.NONE (default for conservative regridding), ALLAVG (default for non-conservative regridding), NPNTAVG, or TEETH
• regridPoleNPnts – number of points to use with PoleMethod.NPNTAVG
• unmapped_action – specifies which action to take if a destination point is found which does not map to any source points: UnmappedAction.ERROR (default) or IGNORE
• src_frac_field – returned numpy array with weights corresponding to the fraction of each source field value which contributes to the total mass of the source field
• dst_frac_field – returned numpy array with weights corresponding to the fraction of each destination field value which contributes to the total mass of the destination field

Regridding Results
    r1to2 = Regrid(field1, field2, regrid_method=RegridMethod.CONSERVE)
where:
    f(phi,theta) = 2 + cos(theta)**2 * cos(2*phi)
Source grid: fv1.9x2.5_050503.nc - 1.9x2.5 CAM finite volume grid
Destination grid: wr50a_090614.nc - Regional 205x275 grid
Mean relative error    = 3.19E-03
Maximum relative error = 1.93E-02
Conservation error     = 7.11E-15

Requirements, Supported Platforms, Limitations, etc...
Requirements:
- Python 2.6, 2.7
- Numpy 1.6.1/2 (ctypes)
- ESMF installation (with NetCDF)
Testing:
- Regression tested nightly on 5 platforms
Supported Platforms:
- Linux, Darwin, and Cray
- Gfortran
- OpenMPI
Limitations:
- No object for collections of Fields
- No access to Field bounds
Installation:
- python setup.py build --ESMFMKFILE=<path_to_esmf.mk> install

Status and Future Work
• ESMPy is still in beta, production release expected later this year
• ESMP is in production and fully supported (fewer features)
• Later in 2014:
  • Python layer functionality of Grid/Mesh created from file
  • OpenClimateGIS
  • Data type for observational data streams, and regridding to/from
  • Time management, calendar
  • Components?

OpenClimateGIS Overview
https://www.earthsystemcog.org/projects/openclimategis/
• Developed by the NESII Group in association with the NCPP Project under funding provided by the NOAA Climate Program Office.
• Python package designed to ease the “localization” and accessibility of high-dimensional scientific datasets
• Primary Features: geospatial subsetting, standardized calculation, bundling, and format conversion
• Could benefit from ESMPy conservative regridding
• ESMPy could use subsetting capability and access/conversion to/from GIS data formats
• Will introduce a number of new dependencies:
  • GDAL, Shapely, Fiona, netCDF4-Python
https://www.earthsystemcog.org/projects/openclimategis/dependencies
https://github.com/NCPP/ocgis

Questions?
Please contact esmf_support@list.woc.noaa.gov with questions or feature requests
Download: http://earthsystemcog.org/projects/ESMPy/releases

References:
1. Khoei S.A., Gharehbaghi A.R., The superconvergent patch recovery technique and data transfer operators in 3d plasticity problems. Finite Elements in Analysis and Design, 43(8), 2007.
2. Hung K.C., Gu H., Zong Z., A modified superconvergent patch recovery method and its application to large deformation problems. Finite Elements in Analysis and Design, 40(5-6), 2004.
3.
UGRID wiki: http://publicwiki.deltares.nl/display/NETCDF/Deltares+CF+proposal+for+Unstructured+Grid+data+model
4. GridSpec wiki: https://ice.txcorp.com/trac/modave/wiki/CFProposalGridspec
5. Jones, P.W. SCRIP: A Spherical Coordinate Remapping and Interpolation Package. http://www.acl.lanl.gov/climate/software/SCRIP. Los Alamos National Laboratory Software Release LACC 98-45

ctypes bindings to ESMF
Interfacing with ctypes:
    _ESMF.ESMC_GridGetCoord.restype = ctypes.POINTER(ctypes.c_void_p)
    _ESMF.ESMC_GridGetCoord.argtypes = [ctypes.c_void_p, ctypes.c_int, ctypes.c_uint,
                                        numpy.ctypeslib.ndpointer(dtype=numpy.int32),
                                        numpy.ctypeslib.ndpointer(dtype=numpy.int32),
                                        ctypes.POINTER(ctypes.c_int)]
    gridCoordPtr = _ESMF.ESMC_GridGetCoord(grid.struct.ptr, coordDim, staggerloc,
                                           exclusiveLBound, exclusiveUBound,
                                           ctypes.byref(lrc))
    # adjust bounds to be 0 based
    exclusiveLBound = exclusiveLBound - 1
Allocating Numpy array buffers for memory allocated in ESMF:
    buffer = numpy.core.multiarray.int_asbuffer(
        ctypes.addressof(pointer.contents),
        numpy.dtype(ESMF2PythonType[self.type]).itemsize*size)
    array = numpy.frombuffer(buffer, ESMF2PythonType[self.type])
Switching between Fortran and C array striding:
    array = numpy.reshape(array, self.size_local[stagger], order='F')
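
The two steps above (wrapping externally allocated memory in a Numpy array without copying, then reinterpreting it with Fortran striding) can be tried without an ESMF installation. The following is a minimal sketch under stated assumptions, not the ESMPy implementation: the ctypes array `raw` is a hypothetical stand-in for memory owned by ESMF, and `numpy.ctypeslib.as_array` is swapped in for `int_asbuffer`, which belongs to the Python 2 era and, as far as we know, is not available in current Numpy releases.

```python
import ctypes
import numpy as np

# Hypothetical stand-in for memory owned by ESMF: a flat buffer of 6 doubles
# holding a 2x3 array stored in Fortran (column-major) order.
nrows, ncols = 2, 3
raw = (ctypes.c_double * (nrows * ncols))(1.0, 2.0, 3.0, 4.0, 5.0, 6.0)
pointer = ctypes.cast(raw, ctypes.POINTER(ctypes.c_double))

# Wrap the memory without copying. A bare pointer carries no size
# information, so the shape has to be supplied explicitly.
flat = np.ctypeslib.as_array(pointer, shape=(nrows * ncols,))

# Reinterpret the flat data with Fortran striding, as on the slide.
array = np.reshape(flat, (nrows, ncols), order='F')

# No copy was made, so a write through the Numpy array lands in the
# underlying buffer: element [0, 1] is flat index 2 in column-major order.
array[0, 1] = 30.0
print(array)
print(raw[2])
```

Because the array is only a view, it is valid only as long as the owner of the memory keeps it allocated; in this sketch that means keeping `raw` alive.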