From “lab books” to computational Earth science. – Chris Hill, MIT

Chris Hill, MIT – cnh@mit.edu
Edinburgh, July 2007
Lab books
A lab notebook is a primary record of research. Researchers
use a lab notebook to document their hypotheses,
experiments and initial analysis or interpretation of these
experiments. The notebook serves as an organizational tool,
a memory aid, and can also have a role in protecting any
intellectual property that comes from the research.
The guidelines for lab notebooks vary widely between
institutions and between individual labs, but some guidelines
are fairly common. The lab notebook is usually written in as
the experiments progress, rather than at a later date. Many say
that a lab notebook should be thought of as a diary of activities
that are described in sufficient detail to allow another scientist
to follow the same steps.
To ensure that data cannot be easily altered, notebooks with
permanently bound pages are often recommended.
Researchers are often encouraged to write only with
unerasable pen, to sign and date each page, and to have
their notebooks inspected periodically by another scientist
who can read and understand them. All of these guidelines can
be useful in proving exactly when a discovery was made, in
the case of a patent dispute.
Several companies now offer electronic lab notebooks. This
format has gained some popularity, especially in large
pharmaceutical companies, which have large numbers of
researchers and great need to document their experiments.
(Source: Wikipedia)
Lab books
• physical, chemical and biological scientists are taught
lab-book discipline from an early age.
– reproducible results are the foundation of scientific and
engineering disciplines, e.g. the Michelson-Morley experiment.
– there is even an infamous “Journal of Irreproducible Results”.
• in computational science the “lab book” discipline is
not so ubiquitous, maybe because
– a program is a formal statement of applied mathematical
axioms
– axioms are deterministic
– therefore reproducibility is not an issue
– however, a program, i.e. a complex collection of simple
elemental statements, is hard to comprehend. If details are
not recorded, reproducibility may well be an issue.
Some example computational Earth science experiments.
• Aqua-planet.
• Eddying North Atlantic.
• Global ocean with eddies and seaice.
• IPCC.
A simple GFD configuration
Water-covered planet. Atmosphere-ocean-seaice.
(Jean-Michel Campin and David Ferreira)
• Some factors that affect the solution:
– Initial conditions.
– Atmosphere: clouds, radiation, dynamics, boundary layer, temporal and spatial discretization, …
– Seaice: thermodynamics, aging, stress-strain relation, …
– Ocean: dynamics, coordinate system, vertical/horizontal friction and mixing, …
– Coupling: time stepping, energetics.
– External forcings: solar insolation, reference profiles.
An eddying, ocean-only configuration
[Figure: ocean-only run forced with atmospheric reanalysis for Jan-Mar. Red/blue shading: ocean heating/cooling. Cyan/magenta line: +/-17.5°C at 200 m. Streaks: wind stress. Green thickness: ocean mixed layer depth.]
• Some factors that affect the solution:
– Initial conditions.
– Atmosphere fluxes: planetary boundary layer scheme.
– Ocean: dynamics, coordinate system, vertical/horizontal friction and mixing, …
– Coupling: time stepping, energetics.
– External forcings: solar insolation, reference profiles, atmospheric reanalysis.
– Non-linear/turbulent flow, so bitwise reproducibility is subject to FP round-off, parallel reduction operations etc.
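The last point can be illustrated with a minimal sketch, assuming a reduction in which each process sums its own chunk and the partial sums are then combined in a fixed order. The "ranks" and chunk sizes are hypothetical stand-ins for a domain decomposition: changing the decomposition changes the order of floating-point additions, and therefore the bits of the answer, even though every individual operation is deterministic.

```python
import math
from typing import List, Sequence


def serial_sum(values: Sequence[float]) -> float:
    """Accumulate left to right, as a single-process loop would."""
    total = 0.0
    for v in values:
        total += v
    return total


def chunked_sum(values: Sequence[float], chunk_sizes: List[int]) -> float:
    """Emulate a parallel reduction: each hypothetical 'rank' sums its own
    contiguous chunk, then the partial sums are combined in rank order."""
    if sum(chunk_sizes) != len(values):
        raise ValueError("chunk sizes must cover every value exactly once")
    partials = []
    start = 0
    for size in chunk_sizes:
        partials.append(serial_sum(values[start:start + size]))
        start += size
    return serial_sum(partials)


values = [0.1, 0.2, 0.3]
one_rank = serial_sum(values)             # (0.1 + 0.2) + 0.3
two_ranks = chunked_sum(values, [1, 2])   # 0.1 + (0.2 + 0.3)

print(one_rank == two_ranks)              # False: same data, different bits
print(math.isclose(one_rank, two_ranks))  # True: the difference is round-off
```

Recording the decomposition (processor count, tiling) alongside the run is therefore part of what a bitwise "lab book" entry would need.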
Global eddying ocean, seaice decadal ensemble. 50+ members.
• Ensemble perturbations:
– Numerical formulation
– Ocean parameters
– Seaice parameters
– Initial conditions
– Boundary conditions
[Figure: IPCC ocean ACC transports (Sv, 0 to 400) for CNRM-CM3, INM-CM3.0, MRI-CGCM2.3.2a, CCCMA-CGCM3.1, MIROC3.2(hires), MIROC3.2(medres), GISS-AOM, GISS-ER and an observational estimate.]
Couples atmosphere, ocean, seaice, land, vegetation, chemistry etc.
Could I make this plot without too much difficulty? Yes.
Could I rerun an IPCC scenario (possibly with some parameter change)? No.
Diagnosing these results is possible today (PCMDI/ESG archives) for the broad
scientific community. Rerunning experiments (with or without small changes)
is still very hard.
Factors affecting the solution range from bottom drag to land-surface
formulation to emissions profiles.
Examples summary
• To reproduce an experiment
– a significant quantity of information needs to be stored, spanning broad
“big-picture” information (water-covered planet, atmos+ocean+seaice) to minute
details (bitwise reproducibility may require a record of the compiler, OS etc.).
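A minimal electronic record spanning both ends of that range might look like the sketch below. The configuration keys (`planet`, `components`, `build`) and the compiler string are hypothetical; only Python's standard `platform`, `json` and `hashlib` modules are used. Note that `platform.python_compiler()` reports the compiler that built Python itself, not a model's Fortran compiler, so a real model record would have to capture that separately (here it sits in the hypothetical `build` entry).

```python
import hashlib
import json
import platform
from datetime import datetime, timezone


def experiment_record(config: dict) -> dict:
    """Bundle an experiment's 'big-picture' configuration with low-level
    details of the environment it ran in (a minimal electronic lab-book entry)."""
    environment = {
        "os": platform.system(),
        "os_release": platform.release(),
        "machine": platform.machine(),
        # Compiler that built *this Python*; a model record would also need
        # the model's own compiler, version and flags.
        "python_compiler": platform.python_compiler(),
        "python_version": platform.python_version(),
    }
    # Canonical JSON (sorted keys) so equal configurations hash identically.
    canonical = json.dumps(config, sort_keys=True).encode("utf-8")
    return {
        "config": config,
        "config_sha256": hashlib.sha256(canonical).hexdigest(),
        "environment": environment,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


# Hypothetical configuration keys, for illustration only.
aqua_planet = {
    "planet": "water-covered",
    "components": ["atmosphere", "ocean", "seaice"],
    "build": {"compiler": "example-f90 1.0", "flags": "-O2"},
}
print(json.dumps(experiment_record(aqua_planet), indent=2))
```

The hash gives each configuration a stable identifier, so two runs can be checked for "same big-picture setup" before their environments are compared.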
Way Forward
• A hand-written record is neither practical nor ideal
(i.e. it is not as potentially useful as an electronic record).
• Electronic information should be stored so as to be amenable to
machine reasoning.
– requires defined vocabularies, precise formal structure, pattern
matching, rules etc.
– W3C/semantic web technologies, e.g. XML, RDF.
• In theory, we could describe model systems using XML, RDF etc. and
so enable reruns for extra outputs (e.g. transport of S3 by flow) or
derived runs (e.g. a modified air-sea coupling coefficient or
formulation).
• In practice this is hard work!
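A toy version of the "derived run" idea can be sketched with Python's standard `xml.etree.ElementTree`. The run description below is invented for illustration (its element and parameter names follow no published schema); the derived run keeps a pointer to its parent and differs from it only in the overridden coupling coefficient.

```python
import xml.etree.ElementTree as ET

# Hypothetical run description; element and parameter names are invented
# for this sketch and do not follow any published schema.
BASE_RUN = """<run id="base">
  <component name="ocean">
    <parameter name="bottom_drag">0.001</parameter>
  </component>
  <coupling>
    <parameter name="air_sea_coefficient">0.001</parameter>
  </coupling>
</run>"""


def derive_run(base_xml: str, new_id: str, overrides: dict) -> ET.Element:
    """Return a new run description that records its parent and differs
    from it only in the overridden parameters."""
    run = ET.fromstring(base_xml)
    parent_id = run.get("id")
    run.set("id", new_id)
    run.set("derived_from", parent_id)
    for name, value in overrides.items():
        matches = run.findall(f".//parameter[@name='{name}']")
        if len(matches) != 1:
            raise KeyError(f"expected exactly one parameter named {name!r}")
        matches[0].text = repr(value)
    return run


derived = derive_run(BASE_RUN, "stronger_coupling", {"air_sea_coefficient": 0.0012})
print(ET.tostring(derived, encoding="unicode"))
```

The hard part the slide alludes to is everything this toy omits: agreed vocabularies, so that `air_sea_coefficient` means the same thing to every tool that reads it.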
Baby steps toward a computational Earth science “model repository”.
• What is working today: PCMDI/ESG
• Steps toward the future: ESC
PCMDI
• Archive of all IPCC model outputs.
• Stored in a common format (netCDF with standard metadata).
• Stored on a common mesh. This simplifies things, but can (and does) degrade
information and can even mislead (e.g. conservation in one coordinate system
may be inexact in another).
• Very limited model metadata is held.
• Very successful and technically impressive; its societal utility is a
function of model quality!
Schmittner et al. (2005, GRL)
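How moving to a common mesh can degrade information is easy to see in one dimension. The sketch below is a simplification under stated assumptions (uniform, nested 1-D grids on the unit interval); it does not reproduce PCMDI's regridding procedure. It compares sampling the field at coarse-cell centres with averaging over each coarse cell: only the latter preserves the integral of a narrow feature.

```python
from typing import List


def integral(values: List[float], domain_length: float = 1.0) -> float:
    """Integral of a piecewise-constant field on uniform cells spanning the domain."""
    return sum(values) * domain_length / len(values)


def remap_by_sampling(fine: List[float], n_coarse: int) -> List[float]:
    """Take the fine-cell value at each coarse-cell centre (not conservative)."""
    n_fine = len(fine)
    coarse = []
    for j in range(n_coarse):
        centre = (j + 0.5) / n_coarse          # position on the unit interval
        coarse.append(fine[int(centre * n_fine)])
    return coarse


def remap_conservative(fine: List[float], n_coarse: int) -> List[float]:
    """Average the fine cells inside each coarse cell (nested grids only)."""
    if len(fine) % n_coarse:
        raise ValueError("this sketch assumes the coarse grid nests the fine one")
    r = len(fine) // n_coarse
    return [sum(fine[j * r:(j + 1) * r]) / r for j in range(n_coarse)]


fine = [0.0, 0.0, 6.0, 0.0, 0.0, 0.0]           # a narrow feature on 6 cells
print(integral(fine))                           # original total
print(integral(remap_by_sampling(fine, 2)))     # feature missed entirely
print(integral(remap_conservative(fine, 2)))    # total preserved
```

Even the conservative remap smears the feature across a whole coarse cell, which is the "degrade information" half of the point above.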
Earth System Curator (ESC)
Can we (for better or worse!) do for
models what PCMDI does for
datasets?
PCMDI datasets are data “wrapped”
in a common/standard container
(netCDF).
The PCMDI container is “self-describing”.
This means we can query and even
combine (to some degree) the
PCMDI datasets.
A container analogy for modeling
technology is the “component
architecture” supported by systems
like ESMF.
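What "self-describing" buys can be sketched without netCDF itself. In the minimal Python model below, the classes and the model names are hypothetical; only the CF-style attribute names (`standard_name`, `units`) reflect real conventions. Variables are found by their metadata rather than by each model's private variable name, and the metadata is also what decides whether two datasets may be combined.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Variable:
    data: List[float]
    attrs: Dict[str, str]      # CF-style attributes: standard_name, units, ...


@dataclass
class Dataset:
    attrs: Dict[str, str]      # global attributes, e.g. "source" (model name)
    variables: Dict[str, Variable]


def query(datasets: List[Dataset], standard_name: str) -> List[Tuple[str, Variable]]:
    """Find variables by their metadata, whatever each model called them."""
    return [
        (ds.attrs.get("source", "unknown"), var)
        for ds in datasets
        for var in ds.variables.values()
        if var.attrs.get("standard_name") == standard_name
    ]


def multi_model_mean(hits: List[Tuple[str, Variable]]) -> List[float]:
    """Combine only when the metadata says it is safe (same units, same mesh size)."""
    if not hits:
        raise ValueError("no matching variables")
    if len({var.attrs.get("units") for _, var in hits}) != 1:
        raise ValueError("units differ; convert before combining")
    if len({len(var.data) for _, var in hits}) != 1:
        raise ValueError("data are not on a common mesh")
    n = len(hits)
    return [sum(var.data[i] for _, var in hits) / n for i in range(len(hits[0][1].data))]


sst = {"standard_name": "sea_surface_temperature", "units": "K"}
a = Dataset({"source": "model_a"}, {"tos": Variable([280.0, 290.0], sst)})
b = Dataset({"source": "model_b"}, {"sst": Variable([282.0, 292.0], sst)})
print(multi_model_mean(query([a, b], "sea_surface_temperature")))  # [281.0, 291.0]
```

ESC's question is whether component descriptions can play the same role for models that these attributes play for data.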
Building a coupled model oriented solution –
modeling system as a component tree
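• Some mathematics: component M
$M : \psi \rightarrow \psi'$
– no side-effects
– possible persistent internal state, $\psi = \{\psi_e, \psi_i\}$
• Supports representation as a DAG such that
$\Phi_{P,n} = \{\phi_{P,n},\ \Phi_{P,n,m} : m = 1, \ldots, n_c\}$
e.g. the tree $M_0 \rightarrow \{M_{0,1}, M_{0,2}\}$, $M_{0,1} \rightarrow \{M_{0,1,1}, M_{0,1,2}\}$, $M_{0,2} \rightarrow \{M_{0,2,1}, M_{0,2,2}, M_{0,2,3}\}$, for which
$\Phi_{0,2} = \{\phi_{0,2},\ \Phi_{0,2,1}, \Phi_{0,2,2}, \Phi_{0,2,3}\}$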
Example of actual component tree.
(Suarez et al.)
• Tree of components from the GEOS-5 modeling system.
• Each box is an ESMF component.
• Components adhere to DAG semantics.
Individual components in ESMF
• ESC builds on an ESMF-like component model.
– ESMF Component
• Container for a sequence of computation that implements a
particular algorithm (a physics simulation, e.g. a Navier-Stokes
solver, or a technical function, e.g. a history manager). An ESMF
component exposes its external interfaces through an ESMF
state.
– ESMF State
• Container data type to transport data between components.
– ESMF Field
• Container data type that can be used to push n-dimensional data,
with an associated mesh, into an ESMF State or pop it from one.
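The division of labour among Component, State and Field can be mimicked in a few lines of Python. This is a hypothetical sketch of the pattern, not ESMF's actual API: the class and method names, the mesh label and the toy bulk heat-flux formula are all invented for illustration. The component touches the outside world only through the import state it reads and the export state it writes.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Field:
    """n-dimensional data (flattened to a list here) plus the mesh it lives on."""
    name: str
    data: List[float]
    mesh: str                        # hypothetical mesh identifier


class State:
    """Container that carries Fields between components."""

    def __init__(self) -> None:
        self._fields: Dict[str, Field] = {}

    def push(self, f: Field) -> None:
        self._fields[f.name] = f

    def pop(self, name: str) -> Field:
        return self._fields.pop(name)


class BulkHeatFluxComponent:
    """Toy 'physics' component. Its only interface to the outside world is
    the import state it reads and the export state it writes."""

    def __init__(self, coefficient: float) -> None:
        self.coefficient = coefficient   # illustrative transfer coefficient

    def run(self, import_state: State, export_state: State) -> None:
        sst = import_state.pop("sst")
        air = import_state.pop("air_temperature")
        if sst.mesh != air.mesh:
            raise ValueError("fields must share a mesh; regrid first")
        flux = [self.coefficient * (ta - ts) for ts, ta in zip(sst.data, air.data)]
        export_state.push(Field("heat_flux", flux, sst.mesh))


imports, exports = State(), State()
imports.push(Field("sst", [280.0, 290.0], "ocean_grid"))
imports.push(Field("air_temperature", [281.0, 289.0], "ocean_grid"))
BulkHeatFluxComponent(coefficient=10.0).run(imports, exports)
print(exports.pop("heat_flux").data)   # [10.0, -10.0]
```

Because the interface is only the two states, a description of the component can list exactly which fields go in and out, which is what the ESC schemas set out to capture.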
Given a component model, like the
ESMF paradigm, ESC…
• Describes a component in terms of
– parameters that control the computation sequence.
– states and fields that are passed into/out of the
component.
• Provides two levels of description
– potential and specific.
– Potential is a list of all possible parameters and fields.
It is a virtualized description in that it is not describing
a specific instance.
– Specific is a description of an instantiated component
in which parameters are bound to specific values and
fields and states are bound to specific values.
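The potential/specific distinction can be sketched as two Python types: one listing a component's possible parameters and fields, and one produced by binding every parameter to a value. The ocean parameters, their ranges and the field names below are hypothetical, and for brevity only parameters (not fields and states) are bound in this sketch.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class ParameterSpec:
    name: str
    lower: float
    upper: float


@dataclass(frozen=True)
class PotentialComponent:
    """Everything a component *could* be: its parameter space and field names."""
    name: str
    parameters: Tuple[ParameterSpec, ...]
    field_names: Tuple[str, ...]

    def instantiate(self, **values: float) -> "SpecificComponent":
        specs = {p.name: p for p in self.parameters}
        unknown = set(values) - set(specs)
        missing = set(specs) - set(values)
        if unknown or missing:
            raise ValueError(f"unknown={sorted(unknown)}, missing={sorted(missing)}")
        for name, value in values.items():
            spec = specs[name]
            if not spec.lower <= value <= spec.upper:
                raise ValueError(f"{name}={value} outside [{spec.lower}, {spec.upper}]")
        return SpecificComponent(self, dict(values))


@dataclass
class SpecificComponent:
    """One instance: every parameter bound to a concrete value."""
    potential: PotentialComponent
    values: Dict[str, float]


# Hypothetical ocean component; names and ranges are illustrative only.
ocean = PotentialComponent(
    "ocean",
    (ParameterSpec("timestep_s", 60.0, 7200.0), ParameterSpec("viscosity_m2_s", 0.0, 1.0e5)),
    ("sst", "heat_flux"),
)
run_a = ocean.instantiate(timestep_s=1800.0, viscosity_m2_s=5.0e3)
print(run_a.values)
```

Keeping a reference from each specific instance back to its potential description means an archived run can be checked, or re-instantiated with different values, against the same parameter space.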
ESC component descriptions are expressed as XML schemas.
• Curator-NMM
– Describes numerical model parameters, e.g. timestep, and system requirements.
• Gridspec
– Describes the numerical mesh.
• Curator-CIAO
– Describes component inputs and outputs.
• Curator-complete
– Describes the wiring together of components.
– A coupled component is also a component, i.e. the schema is recursive.
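The consequence of a recursive schema is that one element type describes both a single component and a whole coupled model. The markup below is hypothetical (it is not the Curator schema); the sketch walks it with Python's standard `xml.etree.ElementTree` to list the innermost components and the nesting depth.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup, not the actual Curator schema. The point is the
# recursion: a <component> may contain <component>s, so a coupled model
# is described with exactly the same element as a single component.
COUPLED = """<component name="coupled_model">
  <component name="atmosphere">
    <component name="dynamics"/>
    <component name="physics"/>
  </component>
  <component name="ocean"/>
  <component name="seaice"/>
</component>"""


def leaf_components(elem: ET.Element) -> list:
    """Names of the components that contain no further components."""
    children = elem.findall("component")
    if not children:
        return [elem.get("name")]
    names = []
    for child in children:
        names.extend(leaf_components(child))
    return names


def depth(elem: ET.Element) -> int:
    """Nesting depth of the description (a single component has depth 1)."""
    return 1 + max((depth(child) for child in elem.findall("component")), default=0)


root = ET.fromstring(COUPLED)
print(leaf_components(root))   # ['dynamics', 'physics', 'ocean', 'seaice']
print(depth(root))             # 3
```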
Some details (more at http://www.earthsystemcurator.org)…
Curator-NMM
• The Curator-NMM schema describes model
components, their content, and their connections. It is a
superset of the NMM schema. The main constructs in
the Curator-NMM schema are component, potential
model, and model. Components are "composable"
pieces of code that can be coupled together in various
arrangements to form different models. A potential
model consists of a group of components, and describes
the set of possible models that can be built from those
components. A model is a fully specified application
based on a potential model and configuration choices.
Mosaic Grid Specification
• The Mosaic Grid Specification is a standardized description of multi-patch,
structured grids, being developed in coordination with CF activities.
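One thing a multi-patch description makes checkable is whether the patches actually fit together. The sketch below is a hypothetical, much-simplified model of tiles and edge contacts (it does not follow the Mosaic file layout, and it ignores index ranges and orientation); it only verifies that glued edges have matching cell counts.

```python
from dataclasses import dataclass
from typing import Dict, List

EDGES = ("north", "south", "east", "west")


@dataclass(frozen=True)
class Tile:
    """One logically rectangular patch of a multi-patch structured grid."""
    name: str
    nx: int
    ny: int

    def edge_length(self, edge: str) -> int:
        if edge not in EDGES:
            raise ValueError(f"unknown edge {edge!r}")
        return self.nx if edge in ("north", "south") else self.ny


@dataclass(frozen=True)
class Contact:
    """Edge of tile1 that is glued to an edge of tile2."""
    tile1: str
    edge1: str
    tile2: str
    edge2: str


def check_mosaic(tiles: Dict[str, Tile], contacts: List[Contact]) -> List[str]:
    """Return problems found; an empty list means the contacts are consistent."""
    problems = []
    for c in contacts:
        if c.tile1 not in tiles or c.tile2 not in tiles:
            problems.append(f"unknown tile in {c}")
            continue
        n1 = tiles[c.tile1].edge_length(c.edge1)
        n2 = tiles[c.tile2].edge_length(c.edge2)
        if n1 != n2:
            problems.append(f"{c.tile1}.{c.edge1} has {n1} cells, {c.tile2}.{c.edge2} has {n2}")
    return problems


tiles = {"t1": Tile("t1", 48, 48), "t2": Tile("t2", 48, 48), "t3": Tile("t3", 48, 24)}
contacts = [Contact("t1", "east", "t2", "west"), Contact("t2", "east", "t3", "west")]
print(check_mosaic(tiles, contacts))
```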
Component-component compatibility checking.
• ESC can describe coupled (multi-component)
systems.
• In principle ESC could support recombination of
components from coupled systems e.g. couple
component A (atmosphere dynamics) with
component B (land-surface).
• Ideally, for this, compatibility constraints need to
be expressed in a standard way.
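A minimal form of such a standard compatibility constraint is sketched below: each component's interface is a map from CF-style standard names to units, and a consumer is compatible with a provider when every field it imports is exported with the same units. The interfaces are hypothetical, and the plain string comparison of units is a deliberate simplification (a real checker would use a units library so that, say, Pa and hPa are recognized as convertible).

```python
from typing import Dict, List

# standard_name -> units; CF-style names, but the interfaces are made up.
Interface = Dict[str, str]


def compatibility_problems(provider_exports: Interface, consumer_imports: Interface) -> List[str]:
    """List the reasons a consumer cannot be driven by a provider's exports."""
    problems = []
    for name, units in sorted(consumer_imports.items()):
        if name not in provider_exports:
            problems.append(f"missing field: {name}")
        elif provider_exports[name] != units:
            problems.append(f"units differ for {name}: {provider_exports[name]} vs {units}")
    return problems


atmosphere_dynamics = {"air_temperature": "K", "surface_air_pressure": "Pa"}
land_surface = {"air_temperature": "K", "precipitation_flux": "kg m-2 s-1"}
print(compatibility_problems(atmosphere_dynamics, land_surface))
# ['missing field: precipitation_flux']
```

In this toy case the atmosphere-dynamics component alone cannot drive the land surface; some other component (e.g. physics) would have to supply precipitation, which is exactly the kind of gap a recombination tool would need to report.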
Service architectures
• Standards → services
– Developing standardized descriptions is a well-proven method toward a
service-oriented approach, e.g.
Some useful URLs (an incomplete list): component models, metadata & standards
http://www.esmf.ucar.edu
http://maplcode.org
http://www.earthsystemcurator.org
http://ncas-cms.nerc.ac.uk/NMM/
http://www.earthsystemgrid.org/
http://www.cgd.ucar.edu/cms/eaton/cfmetadata/
http://sbml.org/index.psp
http://cml.sourceforge.net/wiki/index.php/Main_Page
http://www.w3.org/
Summary
• The Earth System Curator project is developing schemas and tools to capture
“semantic” information about models.
– Such information provides a basis for formally recording numerical
experiments: a computational Earth science “lab book”.
– It also provides the basis for a formal approach to reproducible numerical
results, i.e. fewer “Journal of Irreproducible Results” candidates.
• Other efforts, SBML (systems biology) and CML (chemistry), are already used
for “uploads” accompanying Science submissions.
• Maybe soon a computational Earth science challenge will become how to stop
people doing dumb things with easy-to-use modeling services, rather than how
to get people to use obtuse legacy modeling systems. Maybe!
ESC collaboration
• NCAR (Cecelia DeLuca, Julien Chastang), MIT (Chris Hill, Constantinos
Evangelinos), Georgia Tech (Spencer Rugaber, Rocky Dunlap, Angela), GFDL
(Balaji, Sergey), Reading UK (Lois Steenman-Clark, Katherine Boughton),
PRISM (Sophie Valcke).