Community infrastructure for building and coupling high performance climate, weather, and coastal models
Cecelia DeLuca, NOAA / CIRES, University of Colorado, Boulder
cecelia.deluca@noaa.gov
ESMF website: www.earthsystemmodeling.org
Building Community Codes for Effective Scientific Research on HPC Platforms, September 6-7, 2012

Outline
• Vision and Context
• Technical Overview
• User Community
• Values, Processes and Governance
• Future

Vision
• Earth system models that can be built, assembled and reconfigured easily, using shared toolkits and standard interfaces.
• A growing pool of Earth system modeling components that, through their broad distribution and ability to interoperate, promote the rapid transfer of knowledge.
• Earth system modelers who are able to work more productively, focusing on science rather than technical details.
• An Earth system modeling community with cost-effective, shared infrastructure development and many new opportunities for scientific collaboration.
• Accelerated scientific discovery and improved predictive capability through the social and technical influence of ESMF.

Computational Context
• Teams of specialists, often at different sites, contribute scientific or computational components to an overall modeling system
• Components may exist at multiple levels: individual physical processes (e.g. atmospheric chemistry), physical realms (e.g. atmosphere, ocean), and members of single- or multi-model ensembles (e.g. "MIP" experiments)
• Components contributed by multiple teams must be coupled together, often requiring transformations of data in the process (e.g. grid remapping and interpolation, merging, redistribution)
• Transformations most frequently involve 2D data, but 3D is becoming more common
• There is an increasing need for cross-disciplinary and inter-framework coupling for climate impacts
• Running on tens of thousands of processors is fairly routine; utilizing hundreds of thousands of processors or GPUs is less common
• Modelers will tolerate virtually no framework overhead, and they seek fault tolerance and bit reproducibility
• Provenance collection is increasingly important for climate simulations
Architecture
• The Earth System Modeling Framework (ESMF) provides a component architecture, or superstructure, for assembling geophysical components into applications.
• ESMF provides an infrastructure that modelers use to
  – Generate and apply interpolation weights
  – Handle metadata, time management, I/O and communications, and other common functions
• The ESMF distribution does not include scientific models
[Diagram: the ESMF "sandwich" architecture – the ESMF superstructure (components layer: Gridded Components, Coupler Components) above the model layer (user code), with the ESMF infrastructure (fields and grids layer, low level utilities) and external libraries (MPI, NetCDF, ...) below]

Components
• ESMF is based on the idea of components – sections of code that are wrapped in standard interfaces
• Components can be arranged hierarchically, helping to organize the structure of complex models
• Different modeling groups may create different kinds or levels of components
[Figure: some of the ESMF components in the GEOS-5 atmospheric GCM]

ESMF as an Information Layer
Applications of the information layer:
• Parallel generation and application of interpolation weights
• Run-time compliance checking of metadata and time behavior
• Fast parallel I/O
• Redistribution and other parallel communications
• Automated documentation of models and simulations
• Ability to run components in workflows and as web services
[Diagram: structured model information stored in ESMF wrappers – ESMF data structures (Component, Field, Grid, Clock) carry standard metadata (Attributes following CF conventions, ISO standards, and the METAFOR Common Information Model); user data from native model data structures (modules, grids, fields, timekeeping) is referenced or copied into ESMF structures]

Standard Interfaces
• All ESMF components have the same three standard methods:
  – Initialize
  – Run
  – Finalize
• Each standard method has the same simple interface:

  call ESMF_GridCompRun(myComp, importState, exportState, clock, …)

  where myComp points to the component, importState is a structure containing input fields, exportState is a structure containing output fields, and clock contains timestepping information
• Interfaces are wrappers and can often be set up in a non-intrusive way

Steps to adopting ESMF
• Divide the application into components (without ESMF)
• Copy or reference component input and output data into ESMF data structures
• Register components with ESMF (see the sketch below)
• Set up ESMF couplers for data exchange
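To make the registration step concrete, the sketch below shows a component module in the standard ESMF SetServices idiom. This is a minimal illustration rather than code from the talk: the module and routine names (myCompMod, myInit, myRun, myFinal) are hypothetical, and names such as ESMF_METHOD_RUN follow ESMF 5.x conventions, which differ in earlier versions.

  module myCompMod
    use ESMF
    implicit none
    private
    public SetServices
  contains

    ! Called once by the driver via ESMF_GridCompSetServices; registers
    ! the component's Initialize, Run, and Finalize entry points.
    subroutine SetServices(gcomp, rc)
      type(ESMF_GridComp)  :: gcomp
      integer, intent(out) :: rc
      call ESMF_GridCompSetEntryPoint(gcomp, ESMF_METHOD_INITIALIZE, userRoutine=myInit, rc=rc)
      call ESMF_GridCompSetEntryPoint(gcomp, ESMF_METHOD_RUN, userRoutine=myRun, rc=rc)
      call ESMF_GridCompSetEntryPoint(gcomp, ESMF_METHOD_FINALIZE, userRoutine=myFinal, rc=rc)
    end subroutine SetServices

    ! All three standard methods share the same simple interface.
    subroutine myInit(gcomp, importState, exportState, clock, rc)
      type(ESMF_GridComp)  :: gcomp
      type(ESMF_State)     :: importState, exportState
      type(ESMF_Clock)     :: clock
      integer, intent(out) :: rc
      rc = ESMF_SUCCESS
      ! ... reference or copy native model fields into the states ...
    end subroutine myInit

    subroutine myRun(gcomp, importState, exportState, clock, rc)
      type(ESMF_GridComp)  :: gcomp
      type(ESMF_State)     :: importState, exportState
      type(ESMF_Clock)     :: clock
      integer, intent(out) :: rc
      rc = ESMF_SUCCESS
      ! ... advance the native model over one coupling interval ...
    end subroutine myRun

    subroutine myFinal(gcomp, importState, exportState, clock, rc)
      type(ESMF_GridComp)  :: gcomp
      type(ESMF_State)     :: importState, exportState
      type(ESMF_Clock)     :: clock
      integer, intent(out) :: rc
      rc = ESMF_SUCCESS
      ! ... clean up native model resources ...
    end subroutine myFinal

  end module myCompMod

Because the native model is only referenced from inside these wrappers, this pattern is what allows adoption with few or no changes to the scientific modules themselves.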
Data Structures
• ESMF represents data as Fields that can be built on four discretization types:
  – Logically rectangular grids, which may be connected at the edges
  – Unstructured meshes
  – Observational data streams
  – Exchange grids – grids that are the union of the grids of the components being coupled, and that reference the data on the original grids
• ESMF can transform data among these representations using an underlying finite element unstructured mesh engine

Component Overhead
[Chart: overhead of ESMF-wrapped native CCSM4 components]
• For this example, ESMF wrapping required NO code changes to scientific modules
• No significant performance overhead (< 3% is typical)
• Few code changes are needed for codes that are already modular
• Platform: IBM Power 575 (bluefire) at NCAR
• Model: Community Climate System Model (CCSM)
• Versions: CCSM_4_0_0_beta42 and ESMF_5_0_0_beta_snapshot_01
• Resolution: 1.25 degree x 0.9 degree global grid with 17 vertical levels for both the atmosphere and land models, i.e. a 288x192x17 grid; the data resolution for the ocean model is 320x384x60

ESMF Regridding
ESMF offers extremely fast parallel regridding with many options (regional or global; bilinear, higher order, and first order conservative methods; logically rectangular grids and unstructured meshes; pole options; 2D or 3D; invocation as an offline application or during a model run).
Summary of features: http://www.earthsystemmodeling.org/esmf_releases/non_public/ESMF_5_3_0/esmf_5_3_0_regridding_status.htm
[Figures of ESMF supported grids: HOMME cubed sphere grid with pentagons (courtesy Mark Taylor of Sandia), FIM unstructured grid, and a regional grid]
IMPACT: "use of the parallel ESMF offline regridding capability has reduced the time it takes to create CLM surface datasets from hours to minutes" - Mariana Vertenstein, NCAR

Regridding Modes
• ESMF Offline RegridWeightGen Application:
  – A separate application that is built as part of ESMF and can be used independently
  – Generates a netCDF weight file from two netCDF grid files
  – Supports SCRIP, GRIDSPEC, UGRID, and custom ESMF unstructured formats

  mpirun -np 32 ESMF_RegridWeightGen -s src_grid.nc -d dst_grid.nc -m bilinear -w weights.nc

• Regridding during model execution:
  – ESMF library subroutine calls that do interpolation during the model run
  – Can return the weights or feed them directly into the ESMF parallel sparse matrix multiply
  – Can be used without ESMF components

  call ESMF_FieldRegridStore(srcField=src, dstField=dst, regridMethod=ESMF_REGRID_METHOD_BILINEAR, routehandle=rh)
  call ESMF_FieldRegrid(srcField=src, dstField=dst, routehandle=rh)
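Expanding on the two library calls above, a typical run-time usage pattern is sketched below: compute the weights once, apply them every step, and release them at the end. This is an illustrative fragment rather than code from the talk; it assumes src and dst are ESMF_Fields already built on their grids, that an ESMF_Clock named clock drives the run, and the exact regrid-method flag name varies by ESMF version.

  subroutine regrid_each_step(src, dst, clock)
    use ESMF
    implicit none
    type(ESMF_Field)       :: src, dst   ! fields already built on source/destination grids
    type(ESMF_Clock)       :: clock      ! application clock, created elsewhere
    type(ESMF_RouteHandle) :: rh
    integer                :: rc

    ! One-time setup: compute interpolation weights and store them in a routehandle
    call ESMF_FieldRegridStore(srcField=src, dstField=dst, &
      regridMethod=ESMF_REGRID_METHOD_BILINEAR, routehandle=rh, rc=rc)
    if (rc /= ESMF_SUCCESS) call ESMF_Finalize(endflag=ESMF_END_ABORT)

    ! Each timestep: apply the stored weights (a parallel sparse matrix multiply)
    do while (.not. ESMF_ClockIsStopTime(clock))
      call ESMF_FieldRegrid(srcField=src, dstField=dst, routehandle=rh, rc=rc)
      ! ... run model components using the regridded data ...
      call ESMF_ClockAdvance(clock, rc=rc)
    end do

    ! Release the precomputed weights
    call ESMF_FieldRegridRelease(routehandle=rh, rc=rc)
  end subroutine regrid_each_step

Separating the expensive weight computation (Store) from the cheap repeated application (Regrid) is what makes in-model regridding affordable at every coupling step.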
ESMP – Python Interface
• A flexible way to use ESMF parallel regridding
• Separate download: http://esmfcontrib.cvs.sourceforge.net/viewvc/esmfcontrib/python/ESMP/
• Requirements: Python, NumPy, ctypes
• Limited platform support: Linux/Darwin, GCC (g++/gfortran), OpenMPI
• Data type: ESMP_Field
• Grid types:
  – Single-tile 2D logically rectangular type: ESMP_Grid
  – Unstructured type: ESMP_Mesh
• Support for all ESMF interpolation options

ESMF Web Services for Climate Impacts
GOAL: Develop a two-way coupled, distributed, service-based modeling system composed of an atmospheric climate model and a hydrological model, utilizing standard component interfaces from each domain.
IDEA: Bring the climate model to local applications.
• Two-way technical coupling completed during 2012 using CAM (with active land) and SWAT (Soil and Water Assessment Tool)
• Utilized a switch in CAM's ESMF component wrapper that enables a web service interface
• The coupled system is run using the OpenMI configuration editor, a web service driver developed in the hydrological community
• Next steps: using WRF within the CESM framework and an updated hydrological model

Summary of Features
• Components with multiple coupling and execution modes for flexibility, including a web service execution mode
• Fast parallel remapping with many features
• Core methods are scalable to tens of thousands of processors
• Supports hybrid (threaded/distributed) programming for optimal performance on many computer architectures; works with codes that use OpenMP and OpenACC
• Time management utility with many calendars, forward/reverse time operations, alarms, and other features (see the clock sketch following this list)
• Metadata utility that enables comprehensive, standard metadata to be written out in standard formats
• Runs on 30+ platform/compiler combinations, with an exhaustive nightly regression test suite (4500+ tests) and documentation
• Couples Fortran or C-based model components
• Open source license
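As an illustration of the time management utility listed above, the sketch below builds a calendar and a clock and marches through a run loop. It is a hedged example, not code from the talk: the dates and timestep are arbitrary placeholders, and names such as ESMF_CALKIND_GREGORIAN follow ESMF 5.x conventions.

  subroutine run_with_clock()
    use ESMF
    implicit none
    type(ESMF_Calendar)     :: gregorian
    type(ESMF_Time)         :: startTime, stopTime
    type(ESMF_TimeInterval) :: step
    type(ESMF_Clock)        :: clock
    integer                 :: rc

    ! Gregorian calendar; ESMF also supports Julian, no-leap, 360-day, and others
    gregorian = ESMF_CalendarCreate(ESMF_CALKIND_GREGORIAN, name="gregorian", rc=rc)

    call ESMF_TimeSet(startTime, yy=2012, mm=9, dd=6, calendar=gregorian, rc=rc)
    call ESMF_TimeSet(stopTime,  yy=2012, mm=9, dd=7, calendar=gregorian, rc=rc)
    call ESMF_TimeIntervalSet(step, m=30, rc=rc)   ! 30-minute coupling step

    clock = ESMF_ClockCreate(timeStep=step, startTime=startTime, &
                             stopTime=stopTime, name="application clock", rc=rc)

    ! March the clock forward; components see it through their Run methods
    do while (.not. ESMF_ClockIsStopTime(clock))
      call ESMF_ClockAdvance(clock, rc=rc)
    end do

    call ESMF_ClockDestroy(clock, rc=rc)
    call ESMF_CalendarDestroy(gregorian, rc=rc)
  end subroutine run_with_clock

Because every component receives the same clock through its standard interface, timestepping stays consistent across the coupled system, and features such as alarms and reverse time operations hang off this same structure.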
Major Users
ESMF components:
• NOAA National Weather Service operational weather models (GFS, Global Ensemble, NEMS)
• NASA atmospheric general circulation model GEOS-5
• Navy and related atmospheric, ocean and coastal research and operational models – COAMPS, NOGAPS, HYCOM, WaveWatch, others
• Hydrological modelers at Delft Hydraulics, space weather modelers at NCAR and NOAA
ESMF regridding and Python libraries:
• NCAR/DOE Community Earth System Model (CESM)
• Analysis and visualization packages: NCAR Command Language, Ultrascale Visualization - Climate Data Analysis Tools (UV-CDAT), PyFerret users
• Community Surface Dynamics Modeling System

Usage Metrics
• Updated ESMF component listing at: http://www.earthsystemmodeling.org/components/
• Includes 85 components with ESMF interfaces and 12 coupled cross-agency modeling systems in space weather, climate, weather, hydrology, and coastal prediction, for operational and research use
• About 4500 registered downloads

Values and Principles
• Community driven development and community ownership
• Openness of project processes, management, code and information
• Correctness
• Commitment to a globally distributed and diverse development and customer base
• Simplicity
• Efficiency
• User engagement
• Environmental stewardship
Web link for detail: http://www.esmf.ucar.edu/about_us/values.shtml

Agile Development vs Community Values
In general, Agile processes lend themselves to community development. However, there are areas of potential conflict:
• Agile emphasis on co-location – "most agile methodologies or approaches assume that the team is located in a single team room" (from Miller, Distributed Agile Development at Microsoft, 2008) – vs ESMF emphasis on distributed co-development
• Agile emphasis on a single product owner vs ESMF emphasis on multi-party ownership

Making Distributed Co-Development Work
This hinges on asynchronous, all-to-all communication patterns: everybody must have the information.
• Archived email list where all development correspondence gets cc'd
• Minutes for all telecons
• Web browsable repositories (main and contributions), mail summary on check-ins
• Daily, publicly archived test results
• Monthly archived metrics
• Public archived trackers (bugs, feature requests, support requests, etc.)
Discouraged: IMing, one-to-one correspondence or calls – the medium matters.

Change Review Board
• The CRB was established as a vehicle for shared ownership through user task prioritization and release content decisions
• Consists of technical leads from key user communities
• Not led by the development team!
• Sets the schedule and expectations for future functionality enhancements in ESMF internal and public distributions
  – Based on broad user community and stakeholder input
  – Constrained by available developer resources
  – Updated quarterly to reflect current realities
• The CRB reviews releases after delivery for adherence to the release plan

Governance Highlights
Management of ESMF required governance that recognized social and cultural factors as well as technical factors.
Main practical objectives of governance:
• Enabling stakeholders to fight and criticize in a civilized, contained, constructive way
• Enabling people to make priority decisions based on resource realities
Observations:
• Sometimes just getting everyone equally dissatisfied and ready to move on is a victory
• Thorough, informed criticism is the most useful input a project can get
• Governance changes and evolves over the life span of a project

Governance Functions
• Prioritize development tasks in a manner acceptable to major stakeholders and the broader community, and define development schedules based on realistic assessments of resource constraints (CRB)
• Deliver a product that meets the needs of critical applications, including adequate and correct functionality, satisfactory performance and memory use, ... (Core)
• Support users via prompt responses to questions, training classes, minimal code changes for adoption, thorough documentation, ... (Core)
• Encourage community participation in design and implementation decisions frequently throughout the development cycle (JST)
• Leverage contributions of software from the community when possible (JST)
• Create frank and constructive mechanisms for feedback (Advisory Board)
• Enable stakeholders to modify the organizational structure as required (Executive Board)
• Coordinate and communicate at many levels in order to create a knowledgeable and supportive network that includes developers, technical management, institutional management, and program management (IAWG and other bodies)

Governance
[Diagram: governance structure]
Executive management:
• Executive Board (meets annually) – strategic direction, organizational changes, board appointments
• Interagency Working Group – reporting, external projects coordination
• Advisory Board – stakeholder liaison, programmatic assessment and feedback, general guidance and evaluation
Working project:
• Change Review Board (quarterly) – development priorities, release review and approval
• Joint Specification Team (weekly) – functionality change requests, requirements definition, design and code reviews, external code contributions, resource constraints, implementation schedule
• Core Development Team (daily) – project management, software development, testing and maintenance, distribution and user support, collaborative design, beta testing
Evolution
Phase 1: 2002-2005
NASA's Earth Science Technology Office ran a solicitation to develop an Earth System Modeling Framework (ESMF). A multi-agency collaboration (NASA/NSF/DOE/NOAA) won the award. The core development team was located at NCAR. A prototype ESMF software package (version 2r) demonstrated feasibility.
Phase 2: 2005-2010
New sponsors included the Department of Defense and NOAA. A multi-agency governance plan including the CRB was created: http://www.earthsystemmodeling.org/management/paper_1004_projectplan.pdf
Many new applications and requirements were brought into the project, motivating a complete redesign of framework data structures (version 3r).
Phase 3: 2010-2015 (and beyond)
The core development team moved to NOAA/CIRES for closer alignment with federal models. Basic framework development was completed with version 5r (ports, bugs, feature requests, user support, etc. still require resources). The focus is on increasing adoption and creating a community of interoperable codes.

Technical Evolution: National Unified Operational Prediction Capability (NUOPC)
• ESMF allows for many levels of components, types of components, and types of connections
• In order to achieve greater interoperability, usage and content conventions and component templates are needed
• A tri-agency collaboration (NOAA, Navy, Air Force) is building a "NUOPC Layer" that constrains how ESMF is used, and introduces metadata and other content standards, along with inheritable templates for different usage scenarios
• A production version of the NUOPC Layer is scheduled for delivery at the end of 2012

CoG Collaboration Environment
[Screenshot: a CoG project page – templated content in the central navigation, freeform content in the central section, auto-generated navigation of project freeform content on the left, services on the right]
The CoG environment exposes and collates the information needed for distributed, multi-project development, including project repositories, trackers, and governance processes. It does this in an environment that is linked to data search, metadata, and visualization services, and set up to enable component comparisons.

DCMIP on CoG
Atmospheric Dynamical Core Model Intercomparison Project: http://www.earthsystemcog.org/projects/dcmip-2012/
Planned MIPs:
• 2013 Downscaling
• 2014 Atmosphere-surface hydrology
• 2014 Frameworks

Future
Governance and process largely follow established patterns.
Active development:
• Python API redesign and addition of features
• Preparation for a NUOPC Layer major release
• Web services for climate impacts
• Regridding options – higher order conservative methods, filling in gaps
• Advanced fault tolerance
• Performance optimizations and GPUs
• CoG collaboration environment

ESMF and related development is supported by NASA, NOAA, NSF, and the Department of Defense.