Metadata Semantics and
the Earth System Curator
Rocky Dunlap
Earth System Curator
Georgia Tech
Earth System Curator



3-year, NSF-funded project
Funded Collaborators:
• Cecelia DeLuca (NCAR, PI)
• Balaji (GFDL, Co-PI)
• Don Middleton (NCAR, Co-PI)
• Chris Hill (MIT, Co-PI)
• Spencer Rugaber (Ga Tech, Co-PI)
• Leo Mark (Ga Tech)
• Julien Chastang (NCAR)
• Sergey Nikonov (GFDL)
• Angela Navarro (Ga Tech)
• Me (Ga Tech)
Also working with:
• Lois and Katherine (NMM)
• Sophie Valcke (PRISM/OASIS)
• Others...
Curator Doctrine




Currently a gap in the way we treat models and
datasets (are they really so different?)
Best description of a dataset is a
comprehensive description of the model run
that created the dataset (+ post processing)
Model components are data objects for exchange
Metadata-centric view


Don’t start with a dataset and try to find the
metadata... Start with good metadata that leads
you to the datasets you want—even if they don’t
yet exist! (No, really, that’s how we think.)
Haiku are a valid form of model metadata
Earth System Curator
Applications (Proofs of Concept)

• Catalog of modeling components along with comprehensive metadata
  • CDP Curator (Michael B., Don, Luca, Julien)
• Demonstrate compatibility checking of components
  • Primarily “technical” compatibility: platforms, compilers, required fields, field data types, calendar/time
• Demonstrate auto-generation of coupler component based on metadata
• Demonstrate automation of workflow tasks
  • Model assembly, execution, archive, postprocessing
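A minimal sketch of what a "technical" compatibility check between two components might look like, assuming each component's metadata has already been reduced to a simple record. The `ComponentMeta` class, its field names, and the sample values are hypothetical illustrations and are not taken from any Curator or NMM schema.

```python
from dataclasses import dataclass, field


@dataclass
class ComponentMeta:
    """Hypothetical, simplified component metadata record."""
    name: str
    platforms: set
    compilers: set
    calendar: str
    exports: dict = field(default_factory=dict)  # field name -> data type
    imports: dict = field(default_factory=dict)  # required field name -> data type


def check_compatibility(a, b):
    """Return a list of human-readable problems; an empty list means compatible."""
    problems = []
    if not a.platforms & b.platforms:
        problems.append("no common platform")
    if not a.compilers & b.compilers:
        problems.append("no common compiler")
    if a.calendar != b.calendar:
        problems.append(f"calendar mismatch: {a.calendar} vs {b.calendar}")
    # Every field one component requires must be exported by the other
    # with the same data type.
    for src, dst in ((a, b), (b, a)):
        for fname, ftype in dst.imports.items():
            if fname not in src.exports:
                problems.append(f"{dst.name} requires '{fname}', not exported by {src.name}")
            elif src.exports[fname] != ftype:
                problems.append(f"type mismatch for '{fname}': {src.exports[fname]} vs {ftype}")
    return problems


atm = ComponentMeta("atm", {"linux"}, {"ifort"}, "gregorian",
                    exports={"surface_temperature": "float64"},
                    imports={"sea_surface_temperature": "float64"})
ocn = ComponentMeta("ocn", {"linux", "aix"}, {"ifort", "xlf"}, "noleap",
                    exports={"sea_surface_temperature": "float32"})
report = check_compatibility(atm, ocn)
```

Returning a list of problems rather than a single yes/no keeps the check useful for a catalog interface, which can show why two components do not fit.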
Schema Development Fun

To accomplish these goals, we need:
Comprehensive descriptions of climate models: model metadata
• Includes both “semantic” and “syntactic” elements (“discovery” vs. “use”)
  • Semantic: component name, type, owner, description, source code location, component architecture of model, platform, framework
  • Syntactic: parameter settings, input datasets, boundary conditions, coupling details, grid coordinates
Lots of schemata...
• Component (NMM)
• Potential Model (NMM/Curator)
• Model (NMM)
• PMIOD/SMIOC (PRISM coupling spec)
• CRE/Curator Complete (workflow)
• Application (NMM)
• Gridspec

Reminiscing on Metadata
Development

Observations:

(It seems) much of the community is in
support of metadata development
• Although there are different opinions on levels of
comprehensiveness

People use metadata for different reasons:
• Annotate large datasets for retrieval
• Inform analysis tools
• Archiving of modeling components
• Automation of workflow (runtime environ.)
• Exchange datasets
Each application requires different (but
often overlapping) metadata
How should we think about
schemata?

Schemata are typically written for applications:



I have a particular task I want to accomplish
What metadata do I need to accomplish it?
Write a schema.
But...

Now we have lots of schemata sitting around
• They may contain overlapping information
• Different ways of expressing the same information
• Each schema is used for a small number of tasks and
understood by a small number of applications
• May need to reference elements in another schema,
or aggregate elements from multiple schemata
A Unified View of Metadata

Given all of the current metadata
development efforts, Curator is promoting
a unified view of metadata
• Metadata reuse must be a priority
• Metadata aggregation is key: schemata built (generated!) from a repository of existing metadata elements (let’s call them types)
• We must think conceptually first and then syntactically; ideally, all groups will agree at both levels

What’s In a Schema?
XML Schema (e.g., gridspec.xsd)
XML Types: GridTile, ContactRegion, GridDescriptor, Boundary
These are syntactic and conceptual constructs
Re-using schema elements
How do I best use/re-use metadata elements from (multiple) schema(ta) to accomplish my particular application?
You need:
• A conceptual understanding of the “types” (concepts) in the schema → Glossary
• The syntactic representation of that type (so you can actually use it in implementations) → XML Type Library
WE ARE HERE
Multi-Schema Semantic
Glossary


Community-wide glossary of metadata
types/concepts from multiple schemata
Concepts aggregated into a centralized glossary




Schema authors and users can get
explanations/definitions of metadata elements.
Examples:
What does the contact_region tag mean in the
Gridspec schema?
What goes under the intent tag in the PMIOD?
What is a potential model anyway?
Multi-Schema Semantic
Glossary

For each metadata concept provide:
• Human-readable definition
• Source schema
• Example usage
• Change notes/provenance
• Semantic relationships with other concepts (e.g., broader than, narrower than, part of, parent of, synonym, etc.)
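As a rough illustration, the per-concept fields listed above could be held in a record like the one below. The `GlossaryEntry` class and its attribute names are hypothetical; the sample values are taken from the NMM "model" entry shown later in these slides.

```python
from dataclasses import dataclass, field


@dataclass
class GlossaryEntry:
    """Hypothetical in-memory form of one glossary concept."""
    uri: str
    label: str
    definition: str
    source_schema: str
    examples: list = field(default_factory=list)
    change_notes: list = field(default_factory=list)
    # relationship name -> list of related concept identifiers
    relations: dict = field(default_factory=dict)

    def relate(self, relation, other):
        """Record a semantic relationship such as 'related' or 'broader'."""
        self.relations.setdefault(relation, []).append(other)


entry = GlossaryEntry(
    uri="http://purl.oclc.org/NMM/Model/011/#model",
    label="model",
    definition="The root element of an NMM Model description.",
    source_schema="http://purl.oclc.org/NMM/Model",
    examples=["UK Met Office Unified Model"],
    change_notes=["The label 'model' was changed from NMM_Model."],
)
entry.relate("related", "esc:PotentialModel")
```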

Glossary Design

Schema authors embed descriptions directly inside each XML schema
• Keep the human-readable definitions close to the formal syntactic definitions
• When schema is updated, it is easy to update glossary
Glossary entries from distributed schemata are harvested (nightly?) and placed into centralized glossary (alternatively, live access?)
• Simple interface allows users to query glossary for concepts

Glossary Design

Simple Knowledge Organization System (SKOS) data model for glossary entries
• http://www.w3.org/2004/02/skos/
• SKOS supports knowledge organization systems like glossaries, thesauri, taxonomies, etc.
• RDF based – move the community toward languages with higher semantics (eventually get down to dataset level)

Sample SKOS RDF (Basic)
<skos:Concept rdf:about="http://.../schema/1.0#PotentialModel">
<skos:prefLabel>potential model</skos:prefLabel>
<skos:definition>
A set of components at the source code level that can
potentially form an executable model.
...
</skos:definition>
</skos:Concept>
Where should glossary
entries be stored?
Example Annotated Schema
...
<xsd:complexType name="PotentialModel">
<xsd:annotation>
<xsd:documentation>
<skos:Concept rdf:about="http://.../schema/1.0#PotentialModel">
<skos:prefLabel>potential model</skos:prefLabel>
<skos:definition>
A set of components at the source code level that can
potentially form an executable model.
</skos:definition>
</skos:Concept>
</xsd:documentation>
</xsd:annotation>
<!-- rest of complexType definition goes here -->
</xsd:complexType>
...
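A minimal sketch of how a harvester might pull the embedded glossary entries out of an annotated schema, assuming the SKOS markup sits inside `xsd:documentation` as in the example. The `harvest` function and the dictionary keys are hypothetical, and the sample schema is a cut-down copy of the annotated type with namespace declarations added so that it parses; the elided `http://.../` URI is kept as written.

```python
import xml.etree.ElementTree as ET

NS = {
    "xsd": "http://www.w3.org/2001/XMLSchema",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}

ANNOTATED_SCHEMA = """\
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
            xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <xsd:complexType name="PotentialModel">
    <xsd:annotation>
      <xsd:documentation>
        <skos:Concept rdf:about="http://.../schema/1.0#PotentialModel">
          <skos:prefLabel>potential model</skos:prefLabel>
          <skos:definition>
            A set of components at the source code level that can
            potentially form an executable model.
          </skos:definition>
        </skos:Concept>
      </xsd:documentation>
    </xsd:annotation>
  </xsd:complexType>
</xsd:schema>
"""


def harvest(schema_text):
    """Collect every skos:Concept embedded in xsd:documentation elements."""
    root = ET.fromstring(schema_text)
    entries = []
    for concept in root.iterfind(".//xsd:documentation/skos:Concept", NS):
        label = concept.findtext("skos:prefLabel", default="", namespaces=NS)
        definition = concept.findtext("skos:definition", default="", namespaces=NS)
        entries.append({
            "about": concept.get(f"{{{NS['rdf']}}}about"),
            "prefLabel": label.strip(),
            # Collapse the whitespace introduced by line wrapping in the schema.
            "definition": " ".join(definition.split()),
        })
    return entries


entries = harvest(ANNOTATED_SCHEMA)
```

Because the entries travel with the schema, a nightly harvest only has to re-run a function like this over each registered schema file.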
Sample SKOS RDF Triples
esc:PotentialModel  rdf:type  skos:Concept
esc:PotentialModel  skos:prefLabel  ‘potential model’
esc:PotentialModel  skos:definition  ‘A set of components at the source code level that can potentially form an executable model.’
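To make the triple view concrete, here is a small sketch that stores the three statements about `esc:PotentialModel` as plain tuples and looks them up by pattern. The `match` helper is hypothetical and only stands in for what an RDF store does; it is not how the glossary itself is implemented.

```python
DEFINITION = ("A set of components at the source code level that can "
              "potentially form an executable model.")

# (subject, predicate, object)
TRIPLES = [
    ("esc:PotentialModel", "rdf:type", "skos:Concept"),
    ("esc:PotentialModel", "skos:prefLabel", "potential model"),
    ("esc:PotentialModel", "skos:definition", DEFINITION),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples that agree with every position that is not None."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


labels = [o for _, _, o in match(TRIPLES, p="skos:prefLabel")]
concepts = [s for s, _, _ in match(TRIPLES, p="rdf:type", o="skos:Concept")]
```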
Other SKOS Fields
<skos:Concept rdf:about="http://purl.oclc.org/NMM/Model/011/#model">
<skos:prefLabel>model</skos:prefLabel>
<skos:definition>
The root element of an NMM Model description. There is one model per XML file.
This model can have one or more related component configurations.
</skos:definition>
<skos:altLabel>simulation</skos:altLabel>
<skos:altLabel>job</skos:altLabel>
<skos:altLabel>run</skos:altLabel>
<skos:example>UK Met Office Unified Model</skos:example>
<skos:related rdf:resource="http://...NMMPotentialModel/1.0/#PotentialModel"/>
<skos:changeNote rdf:parseType="Resource">
<rdf:value>The label 'model' was changed from NMM_Model.</rdf:value>
<dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">
<foaf:Person xmlns:foaf="http://xmlns.com/foaf/0.1/">
<foaf:name>Katherine Bouton</foaf:name>
<foaf:mbox rdf:resource="mailto:..."/>
</foaf:Person>
</dc:creator>
<dc:date xmlns:dc="http://purl.org/dc/elements/1.1/">2007-02-02</dc:date>
</skos:changeNote>
<dc:source rdf:resource="http://purl.oclc.org/NMM/Model"/>
</skos:Concept>
Semantic Relationships
[Diagram: concept graph relating nmm:Model, esc:PotentialModel, nmm:Component, and prism:Model through skos:related, skosx:childOf, and skos:synonym links]
Putting it all Together
[Diagram: glossary architecture, steps 1 to 5]
• Namespace schemata (e.g., NMM, Curator-NMM, Gridspec, ESG), marked up with glossary metadata (terms, definitions, relationships)
• Glossary metadata harvested nightly
• Aggregate Glossary RDF
• Joseki RDF Server, answering SPARQL queries
• Glossary Web Application on Tomcat (www.earthsystemcurator.org/glossary)
• Client Web Browser: search for terms, view relationships, etc.
More info:
http://glossary.earthsystemcurator.org/
http://www.earthsystemcurator.org/index.php?option=com_content&task=view&id=54&Itemid=84
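For illustration, a search in the glossary could translate into a SPARQL query along these lines. This is only a sketch: the slides do not show the queries the web application actually sends, so the `build_label_query` helper and the query shape are assumptions, and no server endpoint is assumed.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"


def build_label_query(term):
    """Build a SPARQL SELECT for concepts whose prefLabel matches `term`.

    The term is used as a case-insensitive regular expression.
    """
    # Escape backslashes and double quotes so the term stays inside the literal.
    safe = term.replace("\\", "\\\\").replace('"', '\\"')
    return f"""\
PREFIX skos: <{SKOS}>
SELECT ?concept ?label ?definition
WHERE {{
  ?concept a skos:Concept ;
           skos:prefLabel ?label .
  OPTIONAL {{ ?concept skos:definition ?definition }}
  FILTER regex(str(?label), "{safe}", "i")
}}
"""


query = build_label_query("potential model")
```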
Glossary Interface
[Screenshot: glossary web interface with a search box, the list of schemata to include, a concept list, concept details, and links to related concepts]
Syntactic Metadata Re-use




So, if we agree on the concepts, what about the
syntax? (i.e., XML representation)
Concept = XML Type
How do we share XML types from multiple
schemata across the community?
One idea: XML Type Library (or Catalog or
Repository)


“Preliminary Research”
This is NOT the same thing as a single complex
schema that describes everything – types are first
class objects and can be manipulated individually
How does an XML Type
Library work?

Operations (web service?)
• Submit an XML type
• Get a list of all types
• Query for types
• Validate a type (Is my XML fragment a valid X?)
• Type membership (What types does my XML fragment fit?)
• Generate an XML Schema
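The operations above can be sketched as a toy in-memory library. This is a deliberately simplified stand-in: instead of real XML Schema types it registers, for each type, a root element name and a set of required child elements, so `validate` is only a structural check. The class, its method names, and the `VerticalCoordSystem` entry are hypothetical; the `horizontal_coord_system` fragment is the one used in the validation use case.

```python
import xml.etree.ElementTree as ET


class TypeLibrary:
    """Toy type library; a real one would store XML Schema type definitions."""

    def __init__(self):
        self._types = {}

    def submit(self, name, root_tag, required_children, description=""):
        """Register a type under `name`."""
        self._types[name] = {
            "root_tag": root_tag,
            "required_children": set(required_children),
            "description": description,
        }

    def list_types(self):
        """Names of all registered types."""
        return sorted(self._types)

    def query(self, keyword):
        """Names of types whose name or description mentions `keyword`."""
        kw = keyword.lower()
        return sorted(n for n, t in self._types.items()
                      if kw in n.lower() or kw in t["description"].lower())

    def validate(self, name, fragment):
        """Is `fragment` a valid instance of type `name` (structural check only)?"""
        spec = self._types[name]
        elem = ET.fromstring(fragment)
        children = {child.tag for child in elem}
        return elem.tag == spec["root_tag"] and spec["required_children"] <= children

    def membership(self, fragment):
        """All registered types that `fragment` fits."""
        return [n for n in self.list_types() if self.validate(n, fragment)]


library = TypeLibrary()
library.submit("HorizontalCoordSystem", "horizontal_coord_system",
               ["x_axis", "y_axis"], "horizontal coordinate system of a grid")
library.submit("VerticalCoordSystem", "vertical_coord_system",
               ["z_axis"], "vertical coordinate scheme")

fragment = """\
<horizontal_coord_system type="cartesian">
  <x_axis>...</x_axis>
  <y_axis>...</y_axis>
</horizontal_coord_system>
"""
```

Schema generation is left out here; it would amount to writing the selected type definitions into a new `xsd:schema` document.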

How does an XML Type
Library work?

What metadata is available per type?
• Definition (e.g., XML Schema complexType)
• SKOS Glossary entry (for queries)
• Example usage scenarios
• Dependencies on other types
• Versioning metadata
• Available operations/web services
  • “If you have an XML fragment of type X, you can use the following services...”
Use Case: Submit Type
Existing Schemata → Extract Types → Submit to Type Library
[Diagram: several overlapping copies of the annotated type below, one per extracted type]
<xsd:complexType name="PotentialModel">
<xsd:annotation>
<xsd:documentation>
<skos:Concept rdf:about="http://.../schema/1.0#PotentialModel">
<skos:prefLabel>potential model</skos:prefLabel>
<skos:definition>A set of components at the source code...
</skos:definition>
</skos:Concept>
</xsd:documentation>
</xsd:annotation>
<!-- rest of complexType definition goes here -->
</xsd:complexType>
Use Case: Validation
XML Fragment:
<horizontal_coord_system type="cartesian">
<x_axis>...</x_axis>
<y_axis>...</y_axis>
</horizontal_coord_system>
→ Type Library (Validate) → “Valid” or “Invalid”
Use Case: Find Services
XML Fragment:
<horizontal_coord_system type="cartesian">
<x_axis>...</x_axis>
<y_axis>...</y_axis>
</horizontal_coord_system>
→ Type Library (Find Services) → List of available services based on type of fragment
• Interpolate_Service()
• Extract_Variable()
• Massage_Data()
• Another_Operation()
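A sketch of the lookup behind "Find Services", assuming the library keeps a registry from fragment type to service names. The registry contents are invented for illustration (the slides do not say which service accepts which type), and the fragment's root element name stands in for its type.

```python
import xml.etree.ElementTree as ET

# Hypothetical registry: root element name of a fragment -> services accepting it.
SERVICES_BY_TYPE = {
    "horizontal_coord_system": ["Interpolate_Service", "Extract_Variable"],
    "vertical_coord_system": ["Interpolate_Service"],
}


def find_services(fragment):
    """List the services registered for the fragment's root element."""
    root_tag = ET.fromstring(fragment).tag
    return SERVICES_BY_TYPE.get(root_tag, [])


fragment = """\
<horizontal_coord_system type="cartesian">
  <x_axis>...</x_axis>
  <y_axis>...</y_axis>
</horizontal_coord_system>
"""
services = find_services(fragment)
```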
Some Conclusions
• With a large amount of metadata activity already in progress, metadata re-use must be a priority
• Conceptual understanding is essential
  • Adoption of a glossary of concepts
• Syntactic agreement is desirable
  • Concepts assigned concrete XML types and stored in a library
Some Haiku
Retile the Shower
Tessellated Mosaic
First Write a Gridspec

Forever summer
questions and answers
Curator complete

Potential Model
Like a cool autumn breeze
Potentially mad
Extra Slides...
Example Gridspec
Applications

Not written for one particular application – general grid metadata has many potential uses
• IPCC Model Documentation table
• Moving variables to common grid for analysis
• Regridding vertical from 24 to 40 levels
There are two levels: conceptual and syntactic – ideally, we would agree at both of these levels!
• If we only have conceptual agreement, we can still interoperate, but must do transformations
Type Reuse Scenario
Application: NARCCAP Vertical Interpolation
[Diagram: a partial schema, the description of the vertical coordinate scheme, is taken from the full schema Gridspec.xsd. It holds the metadata required for the NARCCAP experiment: interpolate from 24 to 40 vertical levels.]
Schema Aggregation Scenario
[Diagram: XML types drawn from Schema A, Schema B, Schema C, and Schema D are combined into one Application Schema]
Application: Component Compatibility Checking
[Diagram: the Application Schema aggregates technical details (e.g., supported platforms) from NMM Component, required coupling fields from Coupling Spec (PMIOD), and the horizontal grid descriptor from Gridspec: all metadata required for compatibility checking of two components]
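A minimal sketch of schema aggregation, assuming types can be identified by source schema and type name. The three source schemata below are empty stand-ins, the `aggregate` function is hypothetical, and the type names `Platform`, `Owner`, and `CouplingField` are invented for illustration (`GridDescriptor` and `Boundary` are the Gridspec types named earlier).

```python
import copy
import xml.etree.ElementTree as ET

XSD = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xsd", XSD)  # serialize with the usual xsd: prefix

# Stand-ins for three source schemata.
SOURCES = {
    "component": f'<xsd:schema xmlns:xsd="{XSD}"><xsd:complexType name="Platform"/>'
                 f'<xsd:complexType name="Owner"/></xsd:schema>',
    "pmiod": f'<xsd:schema xmlns:xsd="{XSD}">'
             f'<xsd:complexType name="CouplingField"/></xsd:schema>',
    "gridspec": f'<xsd:schema xmlns:xsd="{XSD}"><xsd:complexType name="GridDescriptor"/>'
                f'<xsd:complexType name="Boundary"/></xsd:schema>',
}


def aggregate(sources, wanted):
    """Build one xsd:schema holding only the requested (source, type name) pairs.

    Name clashes between source schemata are not handled in this sketch.
    """
    app_schema = ET.Element(f"{{{XSD}}}schema")
    for source_name, type_name in wanted:
        root = ET.fromstring(sources[source_name])
        for ctype in root.findall(f"{{{XSD}}}complexType"):
            if ctype.get("name") == type_name:
                app_schema.append(copy.deepcopy(ctype))
    return app_schema


app = aggregate(SOURCES, [("component", "Platform"),
                          ("pmiod", "CouplingField"),
                          ("gridspec", "GridDescriptor")])
names = [t.get("name") for t in app]
text = ET.tostring(app, encoding="unicode")
```

The application schema contains only the types the compatibility check needs, which is the point of treating types as first-class objects rather than importing whole schemata.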