An Overview of the Earth System Bridge Project (and a bit of background about CSDMS)

Scott D. Peckham
Senior Research Scientist at INSTAAR
Lead PI for Earth System Bridge
Former Chief Software Architect for CSDMS
University of Colorado, Boulder
April 21, 2015

Earth System Bridge, CoG page
http://www.earthsystemcog.org/projects/earthsystembridge

Background: Community Surface Dynamics Modeling System (CSDMS)
http://csdms.colorado.edu
http://csdms.colorado.edu/wiki/Model_download_portal
CSDMS: 1289 Members, 5 Working Groups, 6 Focus Research Groups

Linking Component-based Models: How Can Two Models Differ?
• Programming language (C, C++, Fortran, Java, Python, etc.)
  Solution: Babel and Bocca (CCA toolchain)
• Computational grid (triangles, rectangles, Voronoi, points, etc.)
  Solution: ESMF regridder (parallel, spatial interpolation)
• Timestepping scheme (fixed, adaptive, local)
  Solution: Temporal interpolation tool
• Variable names
  Need some means of “semantic mediation”
  Solution: CSDMS Standard Names
• Variable units
  Solution: UDUNITS (Unidata)

Language Interoperability with Babel
Language interoperability is a powerful feature of the CSDMS framework. Components written in different languages can be rapidly linked in HPC applications with very little performance cost. This allows us to “shop” for open-source solutions (e.g. libraries), gives us access to both procedural and object-oriented strategies (legacy and modern code), and allows us to add graphics and GUIs at will.
CSDMS Regridding Tools
• ESMF Regrid (multi-processor, Fortran)
• OpenMI Regrid (single-processor, Java)
• Custom-built regridders

Model Coupling Metadata and the CSDMS Standard Names
http://csdms.colorado.edu/wiki/CSDMS_Standard_Names
http://csdms.colorado.edu/wiki/CSN_Metadata_Names

Semantic Matching for Model Variables
Hydro Model A, output variables:
• streamflow
• rainrate
Hydro Model B, input variables:
• discharge
• precip_rate
CSDMS Standard Names:
• channel_exit_water_x-section__volume_flow_rate
• atmosphere_water__rainfall_volume_flux
Goal: Remove ambiguity so that the framework can automatically match outputs to inputs.

The CSDMS Standard Names
Data models like RDF and EAV use triples like:
  Subject + Predicate + Object, and
  Entity/Object + Attribute + Value (object-oriented)
CSDMS Standard Names use a similar pattern for creating unambiguous and easily understood standard variable names or “preferred labels” according to a set of rules. These are then used to retrieve values and metadata.
The pattern is: Object name + [Operation name] + Quantity name
Examples:
  atmosphere_carbon-dioxide__partial_pressure
  atmosphere_water__rainfall_volume_flux
  earth_ellipsoid__equatorial_radius
  soil__saturated_hydraulic_conductivity
We have also started building a set of standard Attribute and Process Names.
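The semantic matching described above can be sketched in a few lines of Python. This is only an illustration, not CSDMS code: the two mapping tables, `split_standard_name` and `match_outputs_to_inputs` are hypothetical names, the pairing of internal names to standard names is the one implied by the Hydro Model A/B slide, and the split assumes (as the examples suggest) that a double underscore separates the object part from the operation and quantity part.

```python
# Hypothetical mapping tables: internal variable name -> CSDMS Standard Name.
# In practice each model would supply its own mapping.
MODEL_A_OUTPUTS = {
    "streamflow": "channel_exit_water_x-section__volume_flow_rate",
    "rainrate": "atmosphere_water__rainfall_volume_flux",
}
MODEL_B_INPUTS = {
    "discharge": "channel_exit_water_x-section__volume_flow_rate",
    "precip_rate": "atmosphere_water__rainfall_volume_flux",
}


def split_standard_name(name):
    """Split a standard name at the double underscore into (object, quantity)."""
    object_part, sep, quantity_part = name.partition("__")
    if not sep or not object_part or not quantity_part:
        raise ValueError(f"not of the form object__quantity: {name!r}")
    return object_part, quantity_part


def match_outputs_to_inputs(outputs, inputs):
    """Return {user input name: provider output name} for shared standard names."""
    by_standard_name = {std: internal for internal, std in outputs.items()}
    return {
        user_name: by_standard_name[std]
        for user_name, std in inputs.items()
        if std in by_standard_name
    }


# Model B's inputs are linked to Model A's outputs without either model
# knowing the other's internal names.
links = match_outputs_to_inputs(MODEL_A_OUTPUTS, MODEL_B_INPUTS)
parts = split_standard_name("soil__saturated_hydraulic_conductivity")
```

Because both models map to the same hub vocabulary, neither needs to know the other's internal names; the match is made purely on the standard names.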
Standard Assumption Names

Assumption Type              Example
Boundary conditions:         no_slip_boundary_condition
Conserved quantities:        momentum_conserved
Coordinate system:           cartesian_coordinate_system
Angle conventions:           clockwise_from_north_convention
Dimensionality:              2_dimensional
Equations used:              navier_stokes_equation
Closures:                    eddy_viscosity_turbulence_closure
Flow-type assumptions:       laminar_flow
Fluid-type assumptions:      herschel_bulkley_fluid
Geometry assumptions:        trapezoid_shaped
Named model assumptions:     green_ampt_infiltration_model
Thermodynamic processes:     isenthalpic_process
Approximations:              boussinesq_approximation
Averaging methods:           reynolds_averaged
Numerical methods used:      arakawa_c_grid
State of matter:             liquid_phase

The CSDMS Standard Names
The CSDMS Standard Names can be viewed as a lingua franca that provides a bridge for mapping variable names between models. They play an important role in the Basic Model Interface (BMI).
Model developers are asked to provide a BMI implementation that includes a mapping of their model's internal variable names to CSDMS Standard Names, and a Model Coupling Metadata (MCM) file that provides model assumptions and other information.
IMPORTANT: Model developers continue to use whatever variable names they want in their code, but then "map" each internal variable name to the appropriate CSDMS standard name in their BMI implementation.

Main Page:             csdms.colorado.edu/wiki/CSDMS_Standard_Names
Basic Rules:           csdms.colorado.edu/wiki/CSN_Basic_Rules
Object Names:          csdms.colorado.edu/wiki/CSN_Object_Templates
Operation Names:       csdms.colorado.edu/wiki/CSN_Operation_Templates
Quantity Names:        csdms.colorado.edu/wiki/CSN_Quantity_Templates
Process Names:         csdms.colorado.edu/wiki/CSN_Process_Names
Assumption Names:      csdms.colorado.edu/wiki/CSN_Assumption_Names
Metadata Names:        csdms.colorado.edu/wiki/CSN_Metadata_Names
Model Metadata Files:  csdms.colorado.edu/wiki/CSN_MMF_Example

Standard Name Crosswalks, Etc.
Develop crosswalks between the CSDMS Standard Names (2573 names) and the:
• CUAHSI VariableName CV (786 names, 50%)
• CF Convention Standard Names (2630 names, 60%)
  CF name:    tendency_of_mass_concentration_of_ammonia_in_air
  CSDMS name: atmosphere_air_ammonia__time_derivative_of_mass_concentration
Work with the deep Earth process modeling community to develop new CSDMS Standard Names (PSU meeting).
Work with the hydrologic modeling community (CUAHSI) to further develop CSDMS Standard Names (NFIE meeting).

Metadata for Model: TopoFlow – Meteorology
Author:    Scott D. Peckham
Email:     Scott.Peckham@colorado.edu
Domains:   Hydrology, meteorology
Language:  Python
License:   MIT
Purpose:   TopoFlow is a spatially-distributed hydrologic model that consists of many stand-alone components. This component provides meteorology variables, such as precipitation rate, temperature, relative humidity, and shortwave and longwave radiation.
MCM Tool: A prototype or mock-up of a smartphone app based on Model Coupling Metadata (MCM) for describing a model. (under development)

The Basic Model Interface (BMI)
http://csdms.colorado.edu/wiki/BMI_Description
https://github.com/csdms/bmi/blob/master/bmi.sidl

BMI requires a developer to make some relatively simple, noninvasive, and framework-independent changes to their model source code, mostly by adding some new functions.
Functions can be grouped into:
• Model Control Functions (initialize, update, finalize)
• Model Information Functions (e.g. time-stepping scheme)
• Variable Information Functions (e.g. dimensions, data type, units)
• Variable Getters and Setters
• Grid Information Functions
These functions make the model self-describing (so the framework can “see” and reconcile model differences) and fully controllable.
CSDMS can automatically wrap BMI-enabled models to provide them with a CMI interface, which allows them to be used in the CSDMS framework and to gain other new capabilities, including NetCDF output.
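To make the function groups above concrete, here is a minimal Python sketch of a toy model that exposes BMI-style functions. It is an illustration under stated assumptions, not the BMI specification: `ToyRunoffModel`, its formula, its internal names `P` and `Q`, the unit strings, and the keyword arguments to `initialize` are all hypothetical simplifications, and the grid information functions are omitted. Only the two standard names are taken from the slides.

```python
class ToyRunoffModel:
    """Hypothetical toy model: discharge = runoff_coeff * rain_rate * area."""

    # Map standard names (from the slides) to this model's internal names.
    _name_map = {
        "atmosphere_water__rainfall_volume_flux": "P",
        "channel_exit_water_x-section__volume_flow_rate": "Q",
    }
    # Unit strings are assumptions chosen for this example.
    _units = {"P": "m s-1", "Q": "m3 s-1"}

    # --- Model control functions ---
    def initialize(self, area=1.0e6, runoff_coeff=0.5, dt=3600.0):
        self.area = area
        self.runoff_coeff = runoff_coeff
        self.dt = dt
        self.time = 0.0
        self.state = {"P": 0.0, "Q": 0.0}

    def update(self):
        # Advance one time step using the current input value.
        self.state["Q"] = self.runoff_coeff * self.state["P"] * self.area
        self.time += self.dt

    def finalize(self):
        self.state = None

    # --- Model and variable information functions ---
    def get_time_step(self):
        return self.dt

    def get_current_time(self):
        return self.time

    def get_input_var_names(self):
        return ("atmosphere_water__rainfall_volume_flux",)

    def get_output_var_names(self):
        return ("channel_exit_water_x-section__volume_flow_rate",)

    def get_var_units(self, name):
        return self._units[self._name_map[name]]

    # --- Variable getters and setters ---
    def get_value(self, name):
        return self.state[self._name_map[name]]

    def set_value(self, name, value):
        self.state[self._name_map[name]] = value


# A caller drives the model through the interface only.
model = ToyRunoffModel()
model.initialize(area=2.0e6, runoff_coeff=0.5, dt=3600.0)
model.set_value("atmosphere_water__rainfall_volume_flux", 1.0e-6)
model.update()
discharge = model.get_value("channel_exit_water_x-section__volume_flow_rate")
elapsed = model.get_current_time()
model.finalize()
```

The caller never touches `P` or `Q` directly; everything goes through standard names, which is what lets a framework treat different models uniformly.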
How BMI and CMI Work

The BMI Breakthrough: For Modelers
• Requires minimal effort.
• Is noninvasive: adds no dependencies on CSDMS data structures or code, and does not interfere with the developer’s design.
• Allows the model to still be used as before in a stand-alone mode.
• Requires no new code intended to accommodate the needs of other models (unlike OpenMI 1.4).
• Is framework agnostic and requires no modeling-framework-specific knowledge (e.g. the CCA concept of ports); developers do not “code to” the framework.
• Only requires the developer to do things that would be necessary for the model to be used in any modeling framework.
• Provides added value, e.g. the ability to couple to other models and to write output variables to standardized NetCDF files.

The BMI Breakthrough: For Software Engineers (SE) at CSDMS
• One “CMI wrapper” (code) can be used for any BMI-enabled model, instead of unique wrapper code for each model.
• Makes it possible to largely automate the conversion of contributed process models to CSDMS components.
• Dramatically reduces code maintenance time.
• Addresses the “frozen code” problem, i.e. only minimal additional effort is needed to bring a new version of the same model (e.g. with enhancements or bug fixes) into the framework.
• Has minimal impact on the performance of the model.
• Makes it possible for the CSDMS framework to automatically accommodate differences between models by calling service components when needed.
• The software engineer no longer needs to learn all about each model.

Earth System Bridge – BMI Adapters
Frameworks: CSDMS, OMS, ESMF, OpenMI, Pyre
BMI Adapters will allow a BMI-enabled model to be used in any component-based modeling framework.
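The claim that one wrapper can serve any BMI-enabled model can be illustrated with a small generic driver. The sketch below is not the CMI wrapper or an actual BMI Adapter: `ConstantRain`, `ToyRunoff` and `run_coupled` are hypothetical, the models are trivial, and the driver assumes both models share a grid, units and time step, so it calls no regridding, unit-conversion or time-interpolation services.

```python
RAIN = "atmosphere_water__rainfall_volume_flux"
FLOW = "channel_exit_water_x-section__volume_flow_rate"


class ConstantRain:
    """Hypothetical provider model: emits a fixed rainfall rate."""

    def initialize(self):
        self.values = {RAIN: 2.0e-6}

    def update(self):
        pass  # nothing changes between steps in this toy model

    def finalize(self):
        pass

    def get_input_var_names(self):
        return ()

    def get_output_var_names(self):
        return (RAIN,)

    def get_value(self, name):
        return self.values[name]

    def set_value(self, name, value):
        self.values[name] = value


class ToyRunoff:
    """Hypothetical user model: discharge = rain_rate * area."""

    def initialize(self):
        self.area = 1.0e6
        self.values = {RAIN: 0.0, FLOW: 0.0}

    def update(self):
        self.values[FLOW] = self.values[RAIN] * self.area

    def finalize(self):
        pass

    def get_input_var_names(self):
        return (RAIN,)

    def get_output_var_names(self):
        return (FLOW,)

    def get_value(self, name):
        return self.values[name]

    def set_value(self, name, value):
        self.values[name] = value


def run_coupled(provider, user, n_steps):
    """Generic driver: relies only on the shared interface functions."""
    provider.initialize()
    user.initialize()
    # Variables to pass along are found by matching standard names.
    shared = set(provider.get_output_var_names()) & set(user.get_input_var_names())
    for _ in range(n_steps):
        provider.update()
        for name in shared:
            user.set_value(name, provider.get_value(name))
        user.update()
    results = {name: user.get_value(name) for name in user.get_output_var_names()}
    provider.finalize()
    user.finalize()
    return results


results = run_coupled(ConstantRain(), ToyRunoff(), n_steps=3)
```

`run_coupled` contains no model-specific code, so a new version of either model, or a different model exposing the same functions, can be swapped in without changing the driver.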
Earth System – Framework Description Language (ES-FDL)
https://earthsystemcog.org/projects/es-fdl/

Earth System Bridge Demonstration Projects
Demonstrate the new capabilities provided by Earth System Bridge technology by using them to couple operational/federal atmosphere-ocean models at the global-to-regional scale to academic (or operational) hydrologic/land/coastal models at the regional-to-local scale.
• ESMF: Couple the MPIPOM-TC ocean model to a local-scale hydrologic or inundation model. (MPIPOM-TC = Message Passing Interface Princeton Ocean Model for Tropical Cyclones)
• CSDMS: Complete the implementation and testing of a Basic Model Interface (BMI) for the WRF model (Weather Research and Forecasting) and link it within the CSDMS framework to the local-scale, spatial hydrologic TopoFlow model.

For More Information
• Peckham, S.D., E.W.H. Hutton and B. Norris (2013) A component-based approach to integrated modeling in the geosciences: The Design of CSDMS, Computers & Geosciences, special issue: Modeling for Environmental Change, 53, 3-12.
• Peckham, S.D. (2014) The CSDMS Standard Names: Cross-domain naming conventions for describing process models, data sets and their associated variables, Proceedings of the 7th Intl. Congress on Env. Modelling and Software, International Environmental Modelling and Software Society (iEMSs), San Diego, CA. (Eds. D.P. Ames, N.W.T. Quinn, A.E. Rizzoli).
• Peckham, S.D. (2014) EMELI 1.0: An experimental smart modeling framework for automatic coupling of self-describing models, Proceedings of HIC 2014, 11th International Conf. on Hydroinformatics, New York, NY.

Model Coupling Metadata
Models are based on a number of real-world objects, e.g. atmosphere, river channel. (Think object-oriented.)
There are “assumptions” that apply to the model, its computational grid, its objects and the attributes/quantities that describe those objects.
MCM uses the CSDMS Standard Assumption Names and the CSDMS Standard Variable Names. The assumption names are used to “decorate” or “tag” the other model entities (above).

Taming Heterogeneity with Interfaces
Before:
• Each resource is unique.
• Own ways of doing things.
• Respond differently.
• Can become unstable.
• Difficult to control.
After:
• Uniform outward appearance.
• Respond to same commands.
• Interchangeable units.
• Have a chain of command.
• Work as a team.

Reconciling Differences with Standards
If we reconcile differences between the resources in a pairwise manner, the amount of work, maintenance, etc. grows fast: Cost(N) = N(N-1)/2 ~ N^2.
If we instead introduce a new, generic or standard representation (the “hub”) and map resources to and from it, the amount of work, maintenance, etc. drops to: Cost(N) = N.

Building a Modeling Framework
CSDMS has integrated a variety of powerful, open-source tools to build its modeling framework, such as:
• Babel – Language interoperability (C, C++, Java, Python, Fortran)
• Bocca – Component preparation and project management
• Ccaffeine – Low-level model coupling (parallel environment)
• ESMF Regrid – Multi-processor spatial regridding
• OpenMI Regrid – Single-processor spatial regridding
• NetCDF – Scientific data format (self-describing, etc.)
• VisIt – Visualization of large data sets (multi-processor)
We developed our own interface standards, BMI and CMI, and greatly extended the original Ccaffeine GUI to create our CSDMS Web Modeling Tool for interactive model coupling.

Some Key Concepts and Terms
Model: State variables for some system of interest are discretized in space (on a computational grid) and new values are computed from previous values by marching forward in time according to a set of rules (e.g. laws of physics).
Model Component: A model that has been specially prepared for plug-and-play reusability (i.e. given a standard interface and compiled for language interoperability).
Interface: A standardized set of functions, which a caller uses to interact with a resource such as a model, database or service. Heterogeneous resources wrapped with a standard interface then operate in a “familiar” way.
Modeling Framework: A software container in which pre-compiled model components are instantiated, configured and dynamically linked to rapidly create a customized model. (With a “bird's-eye view” of all components.)
Workflow: Software that allows a chain of steps to be saved, modified and re-executed. Each step uses an application to create an intermediate product that is passed along to the next step, often as a file.

How Can a Model be Converted to a Reusable, Plug-and-Play Component?
CSDMS has developed two model interfaces for this purpose: the Basic Model Interface (BMI) and the Component Model Interface (CMI).
BMI requires a developer to make some relatively simple, noninvasive, and framework-independent changes to their model source code, mostly by adding some new functions. These provide a caller with 3 key things: (1) fine-grained control (i.e. initialize, update, finalize, or IUF), (2) variable getters and setters and (3) model metadata. The model can still be used in “stand-alone mode,” just as before.
CSDMS automatically wraps BMI-enabled models to provide them with a CMI interface. CMI makes function calls to (1) the framework, (2) service components (the “reconcilers”), (3) the BMI of the wrapped model and (4) the CMI of other plug-and-play components. CMI gives a model many new capabilities, including NetCDF output.
MCM Tool mock-up screens (Model: TopoFlow – Meteorology)

Objects for Model: TopoFlow – Meteorology
  atmosphere_aerosol_dust
  atmosphere_air-column_water~vapor
  atmosphere_bottom_air
  atmosphere_bottom_air_flow
  atmosphere_bottom_air_land_heat~net~latent
  atmosphere_bottom_air_land_heat~net~sensible
  atmosphere_bottom_air_water~vapor
  atmosphere_water
  earth
  land_surface
  model_grid
  physics
  snowpack
  water

Quantities for Object: land_surface
  Quantity            SI Units   Grid   Type
  albedo              1          G1     config, output
  aspect_angle        radians    G1     config, output
  emissivity          1          G2     output
  geodetic_latitude   degrees    G1     config, output
  longitude           degrees    G1     config, output
  slope_angle         radians    G1     output
  temperature         deg_C      G2     output

Assumptions for Quantity: aspect_angle
  Sign_and_Angle_Conventions: counter-clockwise_from_east

Assumptions for Model: TopoFlow – Meteorology
  Equations: some_named_equation