Persistent Identification of Agents and
Objects of Global Change: Progress in
the Global Change Information System
Peter Fox, RPI
Curt Tilmes, NASA
Xiaogang (Marshall) Ma, RPI
Anne Waple, NOAA
Stephan Zednik, RPI
Jin Zheng, RPI
pfox@cs.rpi.edu,
ctilmes@usgcrp.gov
www.globalchange.gov
The Global Change Research Act and
USGCRP
• USGCRP was mandated by Congress in
the Global Change Research Act (GCRA)
of 1990 (P.L. 101-606)
“To provide for development and
coordination of a comprehensive and
integrated United States Research Program
which will assist the Nation and the world to
understand, assess, predict, and respond to
human-induced and natural processes of
global change.”
U.S. Global Change Research Program
The Program:
• Coordinates Federal research to
better understand and prepare
the nation for global change
• Prioritizes and supports cutting
edge scientific work in global
change
• Assesses the state of scientific
knowledge and the Nation’s
readiness to respond to global
change
• Communicates research findings
to inform, educate, and engage
the global community
Global Change Information System
(GCIS)
Vision:
A unified web based source of authoritative,
accessible, usable, and timely information
about climate and global change for use by
scientists, decision makers, and the public.
Global Change Research Act (1990), Section 106
…not less frequently than every 4 years, the
Council… shall prepare… an assessment which–
• integrates, evaluates, and interprets the findings
of the Program and discusses the scientific
uncertainties associated with such findings;
• analyzes the effects of global change on the
natural environment, agriculture, energy
production and use, land and water resources,
transportation, human health and welfare,
human social systems, and biological diversity;
and
• analyzes current trends in global change, both
human-induced and natural, and projects major
trends for the subsequent 25 to 100 years.
Previous National Climate Assessments
Climate Change Impacts on the
United States (2000)
Global Climate Change Impacts
in the United States (2009)
http://nca2009.globalchange.gov
Target date for next NCA: 2013
NCA 2009
http://nca2009.globalchange.gov
Prototype Use Case (UC-1)
Name
Discover and visit data center website of dataset used to generate report figure.
Goal
The NCA Report reader sees a figure and wants to know where the data came from.
Summary
A reader of the NCA is browsing the content via the website. He/she sees a figure and wants to know where the data
came from. A reference to the publication in which the figure originated appears in the figure caption. Selecting the link
to the source publication displays a page of information about the publication including, if available, the publication
DOI. The page also includes references to the datasets cited in the publication. Following each of the dataset reference links
presents a page of information about the dataset, including links back to the agency/data center webpage describing the
dataset in more detail and making the actual data available for order or download.
Actors
Primary Actor - reader of the NCA
Preconditions
Reader is viewing the NCA online report
Post Conditions
Reader visits the data center dataset website
Normal Flow
1) System presents the NCA report to the reader on a web site. The presentation includes a report figure whose caption
references the source publication.
2) Reader selects publication reference in figure caption
3) System displays information about publication, including DOI (if available).
4) Publication information includes publication dataset citations.
5) Reader selects a dataset cited by the publication.
6) System displays information about dataset including links to agency / data center webpages where more information
and (potentially) data download links are available.
7) Reader selects the data center link and is redirected to data center dataset webpage.
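The normal flow above amounts to following three links: figure to source publication, publication to cited datasets, and dataset to the data center page. A minimal Python sketch of that traversal, assuming a hypothetical in-memory catalog (the class names, titles, and the example.gov URL are illustrative and are not actual GCIS records or code):

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Dataset:
    name: str
    data_center_url: str  # agency / data center page for the dataset (step 6)


@dataclass
class Publication:
    title: str
    doi: Optional[str] = None  # displayed only if available (step 3)
    datasets: List[Dataset] = field(default_factory=list)  # citations (step 4)


@dataclass
class Figure:
    caption: str
    source: Publication  # reference shown in the figure caption (step 1)


def data_center_links(figure: Figure) -> List[str]:
    """Follow figure -> publication -> datasets -> data center pages."""
    return [dataset.data_center_url for dataset in figure.source.datasets]


# Hypothetical records, for illustration only.
snow = Dataset("Example snow cover dataset",
               "http://datacenter.example.gov/snow-cover")
paper = Publication("Example source paper", doi=None, datasets=[snow])
fig = Figure("Example figure caption", source=paper)

print(data_center_links(fig))
```

In this sketch the reader's final click (step 7) corresponds to opening one of the returned URLs.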
NCA links to GCIS entities
Key Message & Traceable Account
Key Message vs. “General” Message
(early draft)
GCIS
GCIS
• Create an entity from the
structured metadata
about each thing – tag
with related concepts.
• Identify it with a
persistent, controlled
identifier.
• Present with a human
readable web page and a
machine interface.
• Represent all
relationships between
items.
GCIS and W3C Prov
For GCIS, we have agents (people, projects, agencies, data
centers, publishers, etc.) who are associated with activities
(measuring, deriving, modeling, analyzing, authoring,
publishing, archiving, distributing, visualizing, etc.) that
produce or use the entities (software, data, images, figures,
papers, reports, etc.) related to global change.
We assign local identifiers to each (so we can persistently
resolve them) and capture and represent their relationships.
If possible, we link with external authorities:
agency data centers, journal publishers,
Researcher ID (researcherid.com) or ORCID (orcid.org).
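W3C PROV (starting points)
[Diagram: the three PROV core classes and the relations between them]
• ENTITY wasDerivedFrom ENTITY
• ENTITY wasGeneratedBy ACTIVITY
• ENTITY wasAttributedTo AGENT
• ACTIVITY used ENTITY
• ACTIVITY wasInformedBy ACTIVITY
• ACTIVITY wasAssociatedWith AGENT
• ACTIVITY startedAtTime, endedAtTime
• AGENT actedOnBehalfOf AGENT
Diagram from W3C PROV group and Ivan Herman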
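The PROV relations named on this slide can be recorded as plain subject-relation-object statements. The sketch below is only an illustration in Python: the relation names follow W3C PROV, but every identifier (figure, activity, dataset, person, organization) is hypothetical, and the in-memory list stands in for whatever store GCIS actually uses.

```python
from typing import List, NamedTuple


class Statement(NamedTuple):
    subject: str
    relation: str  # a PROV relation name
    obj: str


# Hypothetical local identifiers, for illustration only.
statements: List[Statement] = [
    Statement("figure/example-figure", "wasGeneratedBy", "activity/example-analysis"),
    Statement("activity/example-analysis", "used", "dataset/example-dataset"),
    Statement("activity/example-analysis", "wasAssociatedWith", "person/example-author"),
    Statement("figure/example-figure", "wasAttributedTo", "person/example-author"),
    Statement("figure/example-figure", "wasDerivedFrom", "dataset/example-dataset"),
    Statement("person/example-author", "actedOnBehalfOf", "organization/example-agency"),
]


def related(subject: str, relation: str) -> List[str]:
    """Return every object linked to `subject` by `relation`."""
    return [s.obj for s in statements
            if s.subject == subject and s.relation == relation]


def inputs_of(entity: str) -> List[str]:
    """Entities used by the activities that generated `entity`."""
    result: List[str] = []
    for activity in related(entity, "wasGeneratedBy"):
        result.extend(related(activity, "used"))
    return result


print(inputs_of("figure/example-figure"))
```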
Prototype Use Case (UC-2)
Name
Find Latest Datasets by Keyword
Goal
Search for datasets associated with the keyword “snow” and list the results by publication date, most recent first.
Summary
User story:
I want to look for information concerning “snow.” I don’t know whether it is a CLEAN word or a GCMD word, or even
what GCMD or CLEAN is. How would I do it, and what would I see on my monitor during the process?
Assumptions
The reader is not assumed to have knowledge regarding the GCMD Keywords (or other) vocabulary.
Actors
Primary Actor - reader of the NCA
Preconditions
TBD
Post Conditions
Reader is presented with a list of datasets associated with the keyword “snow” sorted by dataset publication date.
Normal Flow
TBD
Notes
We are looking into two user interface options for dataset selection by keyword
1) As a free-text search where the user inputs “snow”.
2) As a faceted browse interface with a vocabulary facet that presents the user with terms from a
structured vocabulary. The user can manually select the term(s) which match or contain “snow”.
We intend to implement prototypes of both.
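The two interface options can be compared with a small Python sketch. Everything here is assumed for illustration: the catalog records, their dates, and the keyword terms are made up, and the matching is a plain case-insensitive substring test rather than whatever search technology the prototypes would use.

```python
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class DatasetRecord:
    title: str
    published: date
    keywords: FrozenSet[str]  # terms from a structured vocabulary


# Hypothetical catalog entries.
CATALOG = [
    DatasetRecord("Example snow depth analysis", date(2011, 3, 1),
                  frozenset({"Snow Depth"})),
    DatasetRecord("Example sea ice extent", date(2012, 6, 15),
                  frozenset({"Sea Ice"})),
    DatasetRecord("Example snow cover record", date(2012, 1, 10),
                  frozenset({"Snow Cover"})),
]


def newest_first(records: Iterable[DatasetRecord]) -> List[DatasetRecord]:
    """Sort by dataset publication date, most recent first."""
    return sorted(records, key=lambda r: r.published, reverse=True)


def free_text_search(query: str) -> List[DatasetRecord]:
    """Option 1: the user types a word; match titles and keywords."""
    q = query.lower()
    hits = [r for r in CATALOG
            if q in r.title.lower() or any(q in k.lower() for k in r.keywords)]
    return newest_first(hits)


def facet_terms(fragment: str) -> List[str]:
    """Option 2, first step: list vocabulary terms that contain the fragment."""
    all_terms = {k for r in CATALOG for k in r.keywords}
    return sorted(t for t in all_terms if fragment.lower() in t.lower())


def faceted_search(selected_terms: Iterable[str]) -> List[DatasetRecord]:
    """Option 2, second step: datasets tagged with any selected term."""
    selected = set(selected_terms)
    return newest_first(r for r in CATALOG if r.keywords & selected)


print([r.title for r in free_text_search("snow")])
print(facet_terms("snow"))
```

In both options the reader never needs to know which vocabulary a term comes from; the faceted variant simply makes the candidate terms visible before the search runs.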
NCA links to GCIS entities
Traceable accounts…
Interagency Information Integration
GCIS can use relationships between all relevant
information about global change across the agencies:
o From observations to datasets to research papers to models to
analyses to organizations to people to synthesized reports to
human impacts...
o Determine agency interdependencies -- An EPA analysis uses a
NOAA model dependent on observations from a NASA satellite.
o Can present unique interagency metrics, e.g. "How many papers
referenced datasets from a specific satellite?"
o Direct users back to agency data centers for more detailed
information and the actual content and data.
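The interdependency example (an EPA analysis that uses a NOAA model that depends on NASA satellite observations) is a walk over "depends on" links. A minimal Python sketch, assuming hypothetical item identifiers and a hand-written dependency table in place of real GCIS relationships:

```python
from typing import Dict, List, Set

# Hypothetical "depends on" edges between catalogued items.
DEPENDS_ON: Dict[str, List[str]] = {
    "analysis/example-epa-analysis": ["model/example-noaa-model"],
    "model/example-noaa-model": ["dataset/example-satellite-observations"],
    "dataset/example-satellite-observations": [],
}

# Agency responsible for each item (illustrative assignment).
AGENCY: Dict[str, str] = {
    "analysis/example-epa-analysis": "EPA",
    "model/example-noaa-model": "NOAA",
    "dataset/example-satellite-observations": "NASA",
}


def upstream_agencies(item: str) -> Set[str]:
    """Agencies whose products `item` depends on, directly or indirectly."""
    seen: Set[str] = set()
    stack = list(DEPENDS_ON.get(item, []))
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # already visited; also guards against cycles
        seen.add(node)
        stack.extend(DEPENDS_ON.get(node, []))
    return {AGENCY[n] for n in seen} - {AGENCY[item]}


print(sorted(upstream_agencies("analysis/example-epa-analysis")))
```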
GCIS Data Mining
Structured information with relationships allows integrated
data mining, searching, metrics.
o What projects provided data used to produce figures that were
referenced in the 2013 NCA section about coastal sea level rise
impacts?
o Which data centers hold data referenced by papers related to
forests in the midwest?
o Which agencies have people working on projects related to societal
impacts of extreme weather events?
o Show me the latest papers about health impacts of air quality in
California. Which datasets were used in the analysis of air quality
in California?
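Questions like these become chained lookups once the relationships are stored in structured form. The Python sketch below answers a simplified version of the second question (which data centers hold data referenced by papers on a topic) with a tiny triple matcher. The predicate names hasTopic, cites, and heldBy are placeholders invented for this example and are not terms from the GCIS ontology; all identifiers are hypothetical.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical statements, for illustration only.
TRIPLES: List[Triple] = [
    ("paper/example-forest-paper", "hasTopic", "topic/forests"),
    ("paper/example-forest-paper", "cites", "dataset/example-forest-inventory"),
    ("dataset/example-forest-inventory", "heldBy", "datacenter/example-center"),
    ("paper/example-air-paper", "hasTopic", "topic/air-quality"),
    ("paper/example-air-paper", "cites", "dataset/example-air-dataset"),
    ("dataset/example-air-dataset", "heldBy", "datacenter/other-center"),
]


def match(s: Optional[str] = None, p: Optional[str] = None,
          o: Optional[str] = None) -> List[Triple]:
    """Return triples matching the pattern; None acts as a wildcard."""
    return [t for t in TRIPLES
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def data_centers_for_topic(topic: str) -> List[str]:
    """Data centers holding datasets cited by papers tagged with `topic`."""
    centers = set()
    for paper, _, _ in match(p="hasTopic", o=topic):
        for _, _, dataset in match(s=paper, p="cites"):
            for _, _, center in match(s=dataset, p="heldBy"):
                centers.add(center)
    return sorted(centers)


print(data_centers_for_topic("topic/forests"))
```

A production system would more likely express the same chain as a query against an RDF store; the nested loops here only show the shape of the join.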
Questions and Comments
For more information, visit http://www.globalchange.gov and
http://tw.rpi.edu/web/project/GCIS-IMSAP
GCIS Benefits
NCA web portal, GCIS prototype
• NCA content available online
• Searchable, linkable
• Complete provenance, traceability
• Links back to source information including
agency sources, scenarios, technical input
• Link to associated and applicable information
and tools
• Ensure authoritative and appealing design and
accessibility
• Incorporates initial indicators of change,
impact and response
• Access to information about NCA process
(transparency)
• Facilitates collaboration across segments of the
climate science and applications community
• Construct, prototype and test the initial
framework
• Use constrained scope and dedicated staff to
accomplish a lot in a short time
• Ensure the system design is extensible and
able to grow to meet long term GCIS needs
GCIS
• A single web site can lead back to
agency global change information
across the program
• A friendly, accessible entry into
global change information for non-scientists
• Global, persistent, reusable
identifiers for each item
• Integrated data catalog provides
interagency metrics, data mining,
searching, etc.
• Interagency relationships allow
discovery of interdependencies
and increase collaboration
opportunities
• Agency information mapped into a
common, consistent model with a
standard vocabulary
• Concept tagging and linking
improves search results for agency
products
URI Schema
• URI for NCA instances consists of 3 parts:
domain name, type of instance, identifier
– Domain name: data.globalchange.gov
– Types: Person, Project, Organization,
Publication, etc.
– Identifiers: depending on the instance’s type, we
either assign a unique ID number or construct an
identifier based on the instance’s unique
property value.
More Examples
Person
http://data.globalchange.gov/person/<unique name>
Publication
http://data.globalchange.gov/publication/doi/<doi>
Project
http://data.globalchange.gov/project/ACMAP
Topic
http://data.globalchange.gov/topic/Human-health
Image
http://data.globalchange.gov/image/<uuid>
Figure
http://data.globalchange.gov/report/<reportid>/figure/<figureid>
Chapter
http://data.globalchange.gov/report/<reportid>/chapter/<chapterid>
Organization
http://data.globalchange.gov/organization/NASA
Model
http://data.globalchange.gov/model/<model_name>
Dataset
http://data.globalchange.gov/dataset/doi/<doi>
Platform
http://data.globalchange.gov/platform/<platform_name>
Instrument
http://data.globalchange.gov/instrument/<instrument_name>
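The URI patterns listed above can be assembled from their parts. A minimal Python sketch: the function names are invented for this example, the report, figure, and chapter identifiers in the usage lines are placeholders, and identifiers are assumed to be URL-safe already (no escaping is applied).

```python
BASE = "http://data.globalchange.gov"  # domain name from the URI schema


def instance_uri(instance_type: str, identifier: str) -> str:
    """<domain>/<type>/<identifier>, e.g. a project or organization."""
    return f"{BASE}/{instance_type}/{identifier}"


def doi_uri(instance_type: str, doi: str) -> str:
    """<domain>/<type>/doi/<doi>, used for publications and datasets."""
    return f"{BASE}/{instance_type}/doi/{doi}"


def report_part_uri(report_id: str, part_type: str, part_id: str) -> str:
    """<domain>/report/<reportid>/<figure|chapter>/<id>."""
    return f"{BASE}/report/{report_id}/{part_type}/{part_id}"


print(instance_uri("project", "ACMAP"))
print(instance_uri("organization", "NASA"))
print(doi_uri("dataset", "10.5067/MEASURES/GSSTF/DATA308"))
# "example-report" and "example-figure" are placeholder identifiers.
print(report_part_uri("example-report", "figure", "example-figure"))
```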
GCIS Ontology for NCA (subset)
Provenance Modeling Example
Linked Data Principles
1. Use URIs as names for things.
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful
information, using the standards.
4. Include links to other URIs, so that they can discover
more things.
http://www.w3.org/DesignIssues/LinkedData.html
Linked Open Data
http://5stardata.info
Data Identifiers
• NASA Earth Science Data Systems Working Group and ESIP
Federation study resulted in dataset identifier
recommendations (Duerr et al. [1]).
• DOI – Digital Object Identifiers provide a well-defined
mechanism to attach an identifier to a digital object.
Recommendation adopted by NASA for EOSDIS:
http://earthdata.nasa.gov/wiki/main/index.php/Digital_Object_Identifiers_(DOIs)_for_EOSDIS
doi:10.5067/MEASURES/GSSTF/DATA308
http://dx.doi.org/10.5067/MEASURES/GSSTF/DATA308
Identifier Resolution
doi:10.5067/MEASURES/GSSTF/DATA308
A common, persistent, citable reference to that dataset.
We build GCIS-specific identifiers from those:
http://data.globalchange.gov/doi/10.5067/MEASURES/GSSTF/DATA308
Then we can resolve it (with content negotiation) on our site,
and link it with identifiers for our other resources, including
asserting equivalence and linking with the data center
responsible for stewardship and distribution of the actual
data. We can also refer and link to other repositories of
information about those resources.
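Deriving both the GCIS identifier and the DOI resolver link from the same DOI can be sketched as follows. The two URL prefixes are the ones shown on these slides; the function names and the "same_as" key are invented for this Python example, and how GCIS actually records the equivalence is not specified here.

```python
from typing import Dict


def parse_doi(identifier: str) -> str:
    """Strip the 'doi:' scheme prefix and return the bare DOI."""
    prefix = "doi:"
    if not identifier.lower().startswith(prefix):
        raise ValueError(f"not a doi: identifier: {identifier!r}")
    return identifier[len(prefix):]


def doi_resolver_url(doi: str) -> str:
    """Link resolved by the DOI system to the data center's page."""
    return "http://dx.doi.org/" + doi


def gcis_url(doi: str) -> str:
    """GCIS-specific identifier built from the same DOI."""
    return "http://data.globalchange.gov/doi/" + doi


def equivalence_links(identifier: str) -> Dict[str, str]:
    """Pair the GCIS identifier with the external one it is equivalent to."""
    doi = parse_doi(identifier)
    return {"gcis": gcis_url(doi), "same_as": doi_resolver_url(doi)}


print(equivalence_links("doi:10.5067/MEASURES/GSSTF/DATA308"))
```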
Content Negotiation
http://data.globalchange.gov/doi/10.5067/MEASURES/GSSTF/DATA308
The server response from the URI depends on what you
ask for:
• A traditional browser will ask for HTML, and receive and
render a human-readable description of the resource.
• Web services can request formal, structured XML or RDF
metadata about the resource.
Our goal is to provide a curated collection of authoritative
global change information, but always link back to the data
center or publisher responsible for the long term
stewardship of the resource.
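Server-side, content negotiation comes down to reading the HTTP Accept header and picking a representation. The Python sketch below is a simplified illustration, not the GCIS server logic: the list of supported media types is an assumption, and the parser handles q-values and the `*/*` wildcard but ignores partial wildcards such as `text/*` and the specificity rules of full HTTP negotiation.

```python
from typing import List, Optional, Tuple

# Assumed representations; the first entry is the default for "*/*".
SUPPORTED = ["text/html", "application/rdf+xml", "application/xml"]


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Parse an Accept header into (media type, q) pairs, best first."""
    prefs: List[Tuple[str, float]] = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        prefs.append((media_type, q))
    # sorted() is stable, so equal q-values keep their header order.
    return sorted(prefs, key=lambda p: p[1], reverse=True)


def choose_representation(accept: str) -> Optional[str]:
    """Return the media type to serve, or None if nothing is acceptable."""
    for media_type, q in parse_accept(accept):
        if q <= 0:
            continue
        if media_type in SUPPORTED:
            return media_type
        if media_type == "*/*":
            return SUPPORTED[0]
    return None


# A browser-style header versus a web service asking for RDF.
print(choose_representation("text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"))
print(choose_representation("application/rdf+xml"))
```

When `choose_representation` returns None, a server would typically answer with HTTP status 406 (Not Acceptable) or fall back to a default representation.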
CLEAN Vocabulary