eXtended Metadata Registry (XMDR)
International Ecoinformatics Technical Collaboration
Berkeley, California, October 24, 2006
Bruce Bargmeyer, Lawrence Berkeley National Laboratory, University of California
Tel: +1 510-495-2905, bebargmeyer@lbl.gov

Topics
- Challenges to address
- A brief tutorial on semantics
- Semantic computing and where XMDR fits
- Semantic computing technologies
- Traditional data administration
- The XMDR project
- Test bed demonstrations

The Internet Revolution
A worldwide web of diverse content: the information glut is nothing new, but access to it is astonishing.

Challenge: Find and process nonexplicit data
For example, patient data on drugs contains brand names (e.g., Tylenol, Anacin-3, Datril, …), but we want to study patients taking analgesic agents.
[Figure: concept hierarchy linking Analgesic Agent, Non-Narcotic Analgesic, Analgesic and Antipyretic, Nonsteroidal Antiinflammatory Drug, and Acetaminophen to the brand names Tylenol, Anacin-3, and Datril.]

Challenge: Specify and compute across relations
For example, relations within a food web in an Arctic ecosystem: an organism is connected to another organism for which it is a source of food energy and material by an arrow representing the direction of biomass transfer.
Source: http://en.wikipedia.org/wiki/Food_web#Food_web (from SPIRE)

Challenge: Combine Data, Metadata & Concept Systems
Inference search query: “find water bodies downstream from Fletcher Creek where chemical contamination was over 10 micrograms per liter between December 2001 and March 2003”

Data:
ID   Date       Temp   Hg
A    06-09-13   4.4    4
B    06-09-13   9.3    2
X    06-09-13   6.7    78

Concept system:
Contamination
  Biological
  Chemical: mercury, lead, cadmium
  Radioactive

Metadata:
Name   Datatype   Definition                       Units
ID     text       Monitoring Station Identifier    not applicable
Date   date       Date                             yy-mm-dd
Temp   number     Temperature (to 0.1 degree C)    degrees Celsius
Hg     number     Mercury contamination            micrograms per liter

Challenge: Use data from systems that record the same facts with different terms
[Figure: the same common content (e.g., a Country Identifier) appears in database catalogs, ISO 11179 registries, UDDI registries, OASIS/ebXML registries, CASE tool repositories, software component registries, ontological registries, and Dublin Core registries. Depending on the system it is called a data element, table column, business specification, XML tag, attribute, business object, term hierarchy, or coverage.]

Challenge: Draw information together from a broad range of studies, databases, reports, etc.
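The first challenge above can be sketched as queries over an is-a graph. In that challenge, patient records name Tylenol, Anacin-3, or Datril, but the question is about analgesic agents. The reasoning slides later in the tutorial pose similar queries.

The sketch is a minimal illustration under stated assumptions:
- The edges are reconstructed from the concept labels on the drug slides. They are not an extract of the NCI Thesaurus.
- The patient records are hypothetical.

It shows three things: transitive is-a inference, a least-common-ancestor query, and a sibling query.

```python
from collections import deque

# ASSUMED is-a edges (child -> parents), reconstructed from the slide labels;
# not an extract of the NCI Thesaurus. Brand names point to their generic drug.
IS_A = {
    "Tylenol": ["Acetaminophen"],
    "Anacin-3": ["Acetaminophen"],
    "Datril": ["Acetaminophen"],
    "Acetaminophen": ["Analgesic and Antipyretic"],
    "Analgesic and Antipyretic": ["Non-Narcotic Analgesic"],
    "Nonsteroidal Antiinflammatory Drug": ["Non-Narcotic Analgesic"],
    "Non-Narcotic Analgesic": ["Analgesic Agent"],
    "Morphine Sulfate": ["Opiate"],
    "Codeine Phosphate": ["Opiate"],
    "Opiate": ["Opioid"],
    "Opioid": ["Analgesic Agent"],
    "Analgesic Agent": [],
}


def ancestors(concept):
    """Map concept and every concept above it to its is-a distance (breadth-first)."""
    dist = {concept: 0}
    queue = deque([concept])
    while queue:
        node = queue.popleft()
        for parent in IS_A.get(node, []):
            if parent not in dist:
                dist[parent] = dist[node] + 1
                queue.append(parent)
    return dist


def is_a(concept, general):
    """True if concept is general itself or a (transitive) kind of it."""
    return general in ancestors(concept)


def least_common_ancestor(a, b):
    """Shared ancestor with the smallest combined distance from a and b.

    In a tree this is the unique least common ancestor; in a graph with
    multiple parents there may be several equally close candidates.
    """
    da, db = ancestors(a), ancestors(b)
    common = da.keys() & db.keys()
    if not common:
        return None
    return min(common, key=lambda c: da[c] + db[c])


def siblings(concept):
    """Concepts that share at least one direct parent with concept."""
    parents = set(IS_A.get(concept, []))
    return sorted(c for c, ps in IS_A.items() if c != concept and parents & set(ps))


# Hypothetical patient records that store only the drug name as text.
patients = {"p1": "Tylenol", "p2": "Datril", "p3": "Codeine Phosphate"}

print(sorted(p for p, drug in patients.items() if is_a(drug, "Acetaminophen")))  # ['p1', 'p2']
print(least_common_ancestor("Acetaminophen", "Morphine Sulfate"))  # Analgesic Agent
print(siblings("Tylenol"))  # ['Anacin-3', 'Datril']
```

The records never mention "analgesic"; the match comes entirely from walking the is-a edges. That is the point of the challenge.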
Challenge: Gain Common Understanding of meaning between Data Creators and Data Users
A common interpretation of what the data represents.
[Figure: data creators (EEA, USGS, DoD, EPA, and other information systems) produce data labelled with English terms (environ, agriculture, climate, human health, industry, tourism, soil, water, air) or Spanish terms (ambiente, agricultura, tiempo, salud humana, industria, turismo, tierra, agua, aire). Data users must then interpret those labels.]

Semantic Computing and XMDR
We are laying the foundation for a quantum leap toward a substantially new way of computing: semantic computing.
- How can we make use of semantic computing for the environment and health?
- What do environmental agencies need to do to prepare for and stimulate semantic computing?
- What are the ecoinformatics challenges?

Coming: A Semantic Revolution
- Searching and ranking
- Pattern analysis
- Knowledge discovery
- Question answering
- Reasoning
- Semi-automated decision making

The Nub of It
- Processing that takes “meaning” into account
- Processing based on the relations between things, not just computing about the things themselves
- Processing that takes people out of the processing, reducing the human toil of data access, extraction, mapping, translation, formatting, validation, inferencing, …
- Delivering higher-level results that are more helpful for the user’s thought and action

A Brief Tutorial on Semantics
- What is meaning?
- What are concepts?
- What are relations?
- What are concept systems?
- What is “reasoning”?

Meaning: The Semiotic Triangle
[Figure: the semiotic triangle. A symbol (e.g., the word “Rose” or a clip-art image of a rose) symbolises a thought or reference (the concept). The concept refers to a referent, and the symbol stands for that referent.] After C. K. Ogden and I. A. Richards, The Meaning of Meaning.

Semiotic Triangle: Concepts, Definitions and Signs
[Figure: the same triangle, with the symbol labelled as a sign and a definition attached to the concept.]

Forms of Definitions
A concept can be defined by:
- Essence and differentia
- Relations
- Axioms
[Figure: the semiotic triangle, with the definition attached to the concept.]

Definition of Concept “Rose”: Dictionary (Essence & Differentia)
1. any of the wild or cultivated, usually prickly-stemmed, pinnate-leaved, showy-flowered shrubs of the genus Rosa. Cf. rose family.
2. any of various related or similar plants.
3. the flower of any such shrub, of a red, pink, white, or yellow color.
--Random House Webster’s Unabridged Dictionary (2003)

Definitions in the EPA Environmental Data Registry
Mailing Address: http://www.epa/gov/edr/sw/AdministeredItem#MailingAddress
The exact address where a mail piece is intended to be delivered, including urban-style address, rural route, and PO Box.
State USPS Code: http://www.epa/gov/edr/sw/AdministeredItem#StateUSPSCode
The U.S. Postal Service (USPS) abbreviation that represents a state or state equivalent for the U.S.
or Canada.
Mailing Address State Name: http://www.epa/gov/edr/sw/AdministeredItem#StateName
The name of the state where mail is delivered.

Definition of Concept “Rose”: Relations to Other Concepts
[Figure: the semiotic triangle, with the concept Rose related to the concepts Love, Romance, and Marriage.]

SNOMED: Terms Defined by Relations

Definition of Concept “Rose”: Defined by Axioms in OWL
[Figure: the semiotic triangle, with the concept Rose defined by the OWL axioms rdfs:subClassOf, owl:equivalentClass, and owl:disjointWith.]

Class Axiom (Definitions): Class Description is the Building Block of a Class Axiom
From the W3C OWL Web Ontology Language Reference (“this document” below refers to that Reference):
A class description is the term used in this document (and in the OWL Semantics and Abstract Syntax) for the basic building blocks of class axioms (informally called class definitions in the Overview and Guide documents). A class description describes an OWL class, either by a class name or by specifying the class extension of an unnamed anonymous class.
OWL distinguishes six types of class descriptions:
1. a class identifier (a URI reference)
2. an exhaustive enumeration of individuals that together form the instances of a class
3. a property restriction
4. the intersection of two or more class descriptions
5. the union of two or more class descriptions
6. the complement of a class description
The first type is special in the sense that it describes a class through a class name (syntactically represented as a URI reference). The other five types of class descriptions describe an anonymous class by placing constraints on the class extension. Class descriptions of type 2-6 describe, respectively:
- a class that contains exactly the enumerated individuals (2nd type);
- a class of all individuals which satisfy a particular property restriction (3rd type);
- a class that satisfies boolean combinations of class descriptions (4th, 5th and 6th type).
Intersection, union and complement can be respectively seen as the logical AND, OR and NOT operators.
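The six types of class description listed above can be read as set operations over a domain of individuals. The sketch below evaluates nested descriptions written as Python tuples.

It rests on these assumptions:
- The domain, the class extensions, and the hasColor assertions are hypothetical.
- The complement is computed against a known, closed domain. This is a simplification: OWL itself uses open-world semantics, so a real reasoner does not compute complements this way.

```python
# Hypothetical, closed domain of individuals (r = rose, d = daffodil, t = tulip).
DOMAIN = frozenset({"r1", "r2", "d1", "t1"})

# Type 1 descriptions: named classes, given here directly by their (assumed) extensions.
NAMED = {
    "Rose": frozenset({"r1", "r2"}),
    "Daffodil": frozenset({"d1"}),
}

# Hypothetical property assertions (subject, value) for a "hasColor" property.
HAS_COLOR = {("r1", "red"), ("r2", "white"), ("d1", "yellow"), ("t1", "red")}


def extension(desc):
    """Return the individuals described by a (possibly nested) class description.

    Descriptions are tuples whose first element names one of the six OWL types.
    """
    kind, *args = desc
    if kind == "class":      # 1. class identifier
        return NAMED[args[0]]
    if kind == "oneOf":      # 2. enumeration of individuals
        return frozenset(args[0])
    if kind == "hasValue":   # 3. property restriction (value restriction)
        prop, value = args
        return frozenset(s for s, v in prop if v == value)
    if kind == "and":        # 4. intersection: logical AND
        return frozenset.intersection(*(extension(d) for d in args))
    if kind == "or":         # 5. union: logical OR
        return frozenset.union(*(extension(d) for d in args))
    if kind == "not":        # 6. complement: logical NOT (closed-world simplification)
        return DOMAIN - extension(args[0])
    raise ValueError(f"unknown class description type: {kind}")


red_roses = ("and", ("class", "Rose"), ("hasValue", HAS_COLOR, "red"))
neither = ("not", ("or", ("class", "Rose"), ("class", "Daffodil")))
print(sorted(extension(red_roses)))  # ['r1']
print(sorted(extension(neither)))    # ['t1']
```

Because `extension` calls itself on sub-descriptions, arbitrarily nested descriptions evaluate the same way as flat ones.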
The four latter types of class descriptions lead to nested class descriptions and can thus in theory lead to arbitrarily complex class descriptions. In practice, the level of nesting is usually limited.

Class Descriptions -> Class Axiom
Class descriptions form the building blocks for defining classes through class axioms. The simplest form of a class axiom is a class description of type 1. It just states the existence of a class, using owl:Class with a class identifier. For example, the following class axiom declares the URI reference #Human to be the name of an OWL class:

<owl:Class rdf:ID="Human"/>

This is correct OWL, but does not tell us very much about the class Human. Class axioms typically contain additional components that state necessary and/or sufficient characteristics of a class. OWL contains three language constructs for combining class descriptions into class axioms:
- rdfs:subClassOf allows one to say that the class extension of a class description is a subset of the class extension of another class description.
- owl:equivalentClass allows one to say that a class description has exactly the same class extension as another class description.
- owl:disjointWith allows one to say that the class extension of a class description has no members in common with the class extension of another class description.

Computable Meaning
[Figure: the semiotic triangle, with the concept defined by rdfs:subClassOf, owl:equivalentClass, and owl:disjointWith axioms.]
Suppose “rose” is owl:disjointWith “daffodil”. A computer can then determine that an assertion stating that a rose is also a daffodil (e.g., in a knowledge base) is invalid.

What are Relations?
[Figure: Merced River, Fletcher Creek, and Merced Lake, each linked to WaterBody by an isA relation.]
Concepts and relations can be represented as nodes and edges in formal graph structures, e.g., “is-a” hierarchies.
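The rose/daffodil example on the Computable Meaning slide can be sketched as a small consistency check:
1. Propagate each individual's asserted classes up rdfs:subClassOf (an is-a hierarchy like the one on the What are Relations? slide).
2. Flag any individual that ends up in two classes declared owl:disjointWith.

This is a hand-rolled illustration over a hypothetical knowledge base, not how an OWL reasoner works internally. TeaRose, Flower, Plant, and the individuals are invented for the example.

```python
# Hypothetical rdfs:subClassOf edges: subclass -> set of direct superclasses.
SUBCLASS_OF = {
    "TeaRose": {"Rose"},
    "Rose": {"Flower"},
    "Daffodil": {"Flower"},
    "Flower": {"Plant"},
}

# owl:disjointWith axioms, stored as unordered pairs.
DISJOINT = [frozenset({"Rose", "Daffodil"})]


def superclasses(cls):
    """cls plus every class it is a (transitive) subclass of."""
    seen, stack = {cls}, [cls]
    while stack:
        for sup in SUBCLASS_OF.get(stack.pop(), ()):
            if sup not in seen:
                seen.add(sup)
                stack.append(sup)
    return seen


def inferred_types(asserted):
    """Expand an individual's asserted classes with all of their superclasses."""
    types = set()
    for cls in asserted:
        types |= superclasses(cls)
    return types


def violations(kb):
    """List (individual, class_a, class_b) for each disjointness axiom an individual breaks."""
    found = []
    for individual, asserted in kb.items():
        types = inferred_types(asserted)
        for pair in DISJOINT:
            if pair <= types:
                a, b = sorted(pair)
                found.append((individual, a, b))
    return found


# Hypothetical knowledge base: individual -> asserted classes.
kb = {
    "rose1": {"Rose"},
    "flower2": {"TeaRose", "Daffodil"},  # a rose (via TeaRose) that is also a daffodil
}
print(violations(kb))  # [('flower2', 'Daffodil', 'Rose')]
```

The error is caught even though flower2 is never asserted to be a Rose directly. Subclass inference is what exposes it.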
Concept Systems have Nodes and may have Relations
- Nodes represent concepts.
- Lines (arcs) represent relations.
[Figure: a small graph of nodes A, 1, 2, a, b, c, d connected by arcs.]
Concept systems can be represented and queried as graphs.

A More Complex Concept Graph
[Figure: concept lattice of inland water features. Attributes (linear, non-linear, large, small, deep, shallow, natural, artificial, flowing, stagnant) and their combinations (large linear, small linear, large non-linear, small non-linear) classify the features river, stream, canal, reservoir, lake, marsh, and pond.]
From “Supervaluation Semantics for an Inland Water Feature Ontology”, Paulo Santos and Brandon Bennett, http://ijcai.org/papers/1187.pdf#search=%22terminology%20water%20ontology%22

Types of Concept System Graph Structures
Trees, partially ordered trees, ordered trees, faceted classifications, directed acyclic graphs, partially ordered graphs, lattices, bipartite graphs, directed graphs, cliques, compound graphs.

Types of Concept System Graph Structures
[Figure: example drawings of a tree, partial order tree, ordered tree, partial order graph, bipartite graph, faceted classification, the powerset of a 3-element set, directed acyclic graph, clique, and compound graph.]

Graph Taxonomy
[Figure: taxonomy of graph types. Graph divides into directed graph and undirected graph. Further specializations include directed acyclic graph, bipartite graph, clique, partial order graph, faceted classification, lattice, partial order tree, tree, and ordered tree.]
Note: not all bipartite graphs are undirected.

What Kind of Relations are There? Lots!
Example, the Relationship class: “A particular type of connection existing between people related to or having dealings with each other.” Its properties include:
- acquaintanceOf - A person having more than slight or superficial knowledge of this person but short of friendship.
- ambivalentOf - A person towards whom this person has mixed feelings or emotions.
- ancestorOf - A person who is a descendant of this person.
- antagonistOf - A person who opposes and contends against this person.
- apprenticeTo - A person to whom this person serves as a trusted counselor or teacher.
- childOf - A person who was given birth to or nurtured and raised by this person.
- closeFriendOf - A person who shares a close mutual friendship with this person.
- collaboratesWith - A person who works towards a common goal with this person.
- …

Example of relations in a food web in an Arctic ecosystem
An organism is connected to another organism for which it is a source of food energy and material by an arrow representing the direction of biomass transfer.
Source: http://en.wikipedia.org/wiki/Food_web#Food_web (from SPIRE)

Ontologies are a type of Concept System
Ontology: explicit formal specifications of the terms in the domain and relations among them (Gruber 1993).
An ontology defines a common vocabulary for researchers who need to share information in a domain. It includes machine-interpretable definitions of basic concepts in the domain and relations among them. Why would someone want to develop an ontology? Some of the reasons are:
- To share common understanding of the structure of information among people or software agents
- To enable reuse of domain knowledge
- To make domain assumptions explicit
- To separate domain knowledge from the operational knowledge
- To analyze domain knowledge
http://www.ksl.stanford.edu/people/dlm/papers/ontology101/ontology101-noy-mcguinness.html

What is Reasoning? Inference
[Figure: Infectious Disease and Chronic Disease are each is-a Disease. Polio and Smallpox are is-a Infectious Disease; Diabetes and Heart disease are is-a Chronic Disease. Inferred is-a relationships (e.g., Polio is-a Disease) are marked.]

Reasoning: Taxonomies & partonomies can be used to support inference queries
E.g., if a database contains information on events by city, we could query that database for events that happened in a particular county or state, even though the event data does not contain explicit state or county codes.
[Figure: part-of hierarchy. Oakland and Berkeley are part-of Alameda County; San Jose and Santa Clara are part-of Santa Clara County; both counties are part-of California.]

Reasoning: Relationship metadata can be used to infer non-explicit data
For example: (1) patient data on drugs currently being taken contains brand names (e.g.,
Tylenol, Anacin-3, Datril, …); (2) a concept system connects different drug types and names with one another (via is-a, part-of, and other relationships); (3) so patient data can be linked and searched by inferred terms like “acetaminophen” and “analgesic”. It can also still be searched by the trade names explicitly stored as text strings in the database.
[Figure: concept hierarchy linking Analgesic Agent, Non-Narcotic Analgesic, Analgesic and Antipyretic, Nonsteroidal Antiinflammatory Drug, and Acetaminophen to the brand names Tylenol, Anacin-3, and Datril.]

Reasoning: Least Common Ancestor Query
What is the least common ancestor concept in the NCI Thesaurus for Acetaminophen and Morphine Sulfate? (Answer: Analgesic Agent.)
[Figure: NCI Thesaurus fragment. Under Analgesic Agent, one branch runs through Opioid and Opiate to Morphine Sulfate and Codeine Phosphate. The other branch runs through Non-Narcotic Analgesic, with Analgesic and Antipyretic, Nonsteroidal Antiinflammatory Drug, and Acetaminophen below it.]

Reasoning: Example “sibling” queries (concepts that share a common ancestor)
- Environmental: “siblings” of Wetland (in the NASA SWEET ontology)
- Health: siblings of ERK1 finds all 700+ other kinase enzymes; siblings of Novastatin finds all other statins
- 11179 metadata: sibling values in an enumerated value domain

Reasoning: More complex “sibling” queries (concepts with multiple ancestors)
- Health: find all the siblings of Breast Neoplasm, which has two parents (breast disorders and site neoplasms).
  [Figure: siblings under site neoplasms include Eye Neoplasm and Respiratory System Neoplasm; siblings under breast disorders include Non-Neoplastic Breast Disorder.]
- Environmental: find all chemicals that are a carcinogen (cause cancer), a toxin (are poisonous), and teratogenic (cause birth defects).

End of Tutorial about concept systems
Where does ISO/IEC 11179 fit?

Data Generation and Use: Cost vs. Coordination
[Figure: cost ($) of data creation plotted against the level of coordination: autonomous, reporting, community of interest, full control.]

Data Generation and Use: Cost vs. Coordination
[Figure: the same chart with the cost of data use added.]

ISO/IEC 11179 Metadata Registries Reduce Cost of Data Creation and Use
[Figure: the cost-versus-coordination chart for data creation and data use.]

Metadata Registries Increase the Benefit from Data (Strategic Effectiveness)
[Figure: benefit plotted against the level of coordination (autonomous, reporting, community of interest, full control), with an MDR.]

What Can an ISO/IEC 11179 MDR Do?
Traditional data management (11179 Edition 2):
- Register metadata that describes data in databases, applications, XML Schemas, data models, flat files, and paper
- Assist in harmonizing, standardizing, and vetting metadata
- Assist data engineering
- Provide a source of well-formed data designs for system designers
- Record reporting requirements
- Assist data generation by describing the meaning of data entry fields and their potential valid values
- Register provenance information that can be provided to end users of data
- Assist with information discovery by pointing to systems where particular data is maintained

Traditional MDR: Manage Code Sets
Data Element Concept
  Name: Country Identifiers
  Unique ID: 5769
  Other attributes: Context, Definition, Conceptual Domain, Maintenance Org., Steward, Classification, Registration Authority, others
  Conceptual domain: Algeria, Belgium, China, Denmark, Egypt, France, ..., Zimbabwe
Data Elements
  Unique ID: 4572
  Other attributes: Name, Context, Definition, Value Domain, Maintenance Org., Steward, Classification, Registration Authority, others
  Value domains:
  ISO 3166 English Name   ISO 3166 French Name   ISO 3166 2-Alpha Code   ISO 3166 3-Alpha Code   ISO 3166 3-Numeric Code
  Algeria                 L'Algérie              DZ                      DZA                     012
  Belgium                 Belgique               BE                      BEL                     056
  China                   Chine                  CN                      CHN                     156
  Denmark                 Danemark               DK                      DNK                     208
  Egypt                   Egypte                 EG                      EGY                     818
  France                  La France              FR                      FRA                     250
  ...                     ...                    ...                     ...                     ...
  Zimbabwe                Zimbabwe               ZW                      ZWE                     716

What Can XMDR Do?
- Support a new generation of semantic computing
- Concept system management
- Harmonizing and vetting concept systems
- Linkage of concept systems to data
- Interrelation of multiple concept systems
- Grounding ontologies and RDF in agreed-upon semantics
- Reasoning across XMDR content
- Provision of semantic services

Coming: A Semantic Revolution
- Searching and ranking
- Pattern analysis
- Knowledge discovery
- Question answering
- Reasoning
- Semi-automated decision making
[Figure: these capabilities set against the coordination levels autonomous, reporting, community of interest, and full control.]

We are trying to manage semantics in an increasingly complex content space
Structured data, semi-structured data, and unstructured data: text, pictographic, graphics, multimedia, voice, video.

11179-3 (E3) Increases MDR Benefit
When communities create information according to a common vocabulary, the value of the resulting information increases dramatically.
[Figure: benefit plotted against the level of coordination (autonomous, reporting, community of interest, full control), with an MDR.]

Example
Combining concept systems, data, and metadata to answer queries.

Linking Concepts: Text Document
Title 40--Protection of Environment
CHAPTER I--ENVIRONMENTAL PROTECTION AGENCY
PART 141--NATIONAL PRIMARY DRINKING WATER REGULATIONS
40 CFR Ch. I (7–1–02 Edition), § 141.62 Maximum contaminant levels for inorganic contaminants.
(a) [Reserved]
(b) The maximum contaminant levels for inorganic contaminants specified in paragraphs (b)(2)–(6), (b)(10), and (b)(11)–(16) of this section apply to community water systems and non-transient, non-community water systems. The maximum contaminant level specified in paragraph (b)(1) of this section only applies to community water systems. The maximum contaminant levels specified in (b)(7), (b)(8), and (b)(9) of this section apply to community water systems; non-transient, non-community water systems; and transient non-community water systems.

Contaminant       MCL (mg/l)
(1) Fluoride      4.0
(2) Asbestos      7 Million Fibers/liter (longer than 10 μm)
(3) Barium        2
(4) Cadmium       0.005
(5) Chromium      0.1
(6) Mercury       0.002
(7) Nitrate       10 (as Nitrogen)

Thesaurus Concept System (From GEMET)
Chemical contamination
Definition: The addition or presence of chemicals to, or in, another substance to such a degree as to render it unfit for its intended purpose.
Broader term: contamination
Narrower terms: cadmium contamination, lead contamination, mercury contamination
Related terms: chemical pollutant, chemical pollution
Deutsch: Chemische Verunreinigung; English (US): chemical contamination; Español: contaminación química
Source: General Multi-Lingual Environmental Thesaurus (GEMET)

Concept System (Thesaurus)
[Figure: Contamination has the narrower concepts Biological, Chemical, and Radioactive. Chemical has the narrower concepts cadmium, lead, and mercury, and is related to chemical pollutant and chemical pollution.]

Chemicals in EPA Environmental Data Registry
[Screenshot: Environmental Data Registry search results with columns for name, type, CAS number, TSN, ICTV, and EPA ID. Entries include:
- Mercury (chemical; CAS 7439-97-6)
- Mercury, bis(acetato.kappa.O)(benzenamine)- (chemical; CAS 63549-47-3)
- Mercury, (acetato.kappa.O)phenyl-, mixt. with phenylmercuric propionate (chemical; no CAS number)
- Acalypha ostryifolia (organism; TSN 28189)
EPA IDs shown include E17113275 and E965269.]

Data
[Map: monitoring station X on Fletcher Creek, B on Merced River, and A on Merced Lake.]

Monitoring Stations
Name   Latitude   Longitude   Location
A      41.45 N    125.99 W    Merced Lake
B      43.23 N    120.50 W    Merced River
X      39.45 N    118.12 W    Fletcher Creek

Measurements
ID   Date         Temp   Hg
A    2006-09-13   4.4    4
B    2006-09-13   9.3    2
X    2006-09-15   5.2    3
X    2006-09-13   6.7    78

Metadata
Contaminants table:
Contaminant   Threshold
mercury       5
lead          42?
cadmium       250?
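The inference search query that this example builds toward can be sketched by joining the pieces above:
- The concept system expands “chemical” contamination to mercury, lead, and cadmium.
- The metadata says that column Hg records mercury in micrograms per liter.
- The feeds-into relations give the water bodies downstream. These are Fletcher Creek into Merced River and Merced River into Merced Lake, from the “Relations among Inland Bodies of Water” slide.
- The measurements supply the values.

The dictionary layout and function names are assumptions made for illustration. “Downstream” is taken to exclude Fletcher Creek itself.

```python
from collections import deque

# Concept system (as on the thesaurus slides): narrower terms of each concept.
NARROWER = {
    "contamination": ["biological", "chemical", "radioactive"],
    "chemical": ["mercury", "lead", "cadmium"],
}

# Metadata: which measurement column records which contaminant, and in what unit.
COLUMN_MEANING = {"Hg": ("mercury", "micrograms per liter")}

# Data: monitoring stations and measurements from the tables above.
STATION_LOCATION = {"A": "Merced Lake", "B": "Merced River", "X": "Fletcher Creek"}
MEASUREMENTS = [
    {"ID": "A", "Date": "2006-09-13", "Temp": 4.4, "Hg": 4},
    {"ID": "B", "Date": "2006-09-13", "Temp": 9.3, "Hg": 2},
    {"ID": "X", "Date": "2006-09-15", "Temp": 5.2, "Hg": 3},
    {"ID": "X", "Date": "2006-09-13", "Temp": 6.7, "Hg": 78},
]

# Relations among inland bodies of water (feeds into).
FEEDS_INTO = {"Fletcher Creek": ["Merced River"], "Merced River": ["Merced Lake"]}


def narrower_closure(term):
    """term plus every concept below it in the concept system."""
    result, stack = {term}, [term]
    while stack:
        for child in NARROWER.get(stack.pop(), []):
            if child not in result:
                result.add(child)
                stack.append(child)
    return result


def downstream(source):
    """Water bodies reachable from source by following feeds-into relations."""
    seen, queue = set(), deque(FEEDS_INTO.get(source, []))
    while queue:
        body = queue.popleft()
        if body not in seen:
            seen.add(body)
            queue.extend(FEEDS_INTO.get(body, []))
    return seen


def query(source, concept, threshold, start=None, end=None):
    """Water bodies downstream of source with any reading of concept above threshold.

    threshold is in each column's registered unit; start/end are optional ISO dates.
    """
    contaminants = narrower_closure(concept)
    columns = [col for col, (what, _unit) in COLUMN_MEANING.items() if what in contaminants]
    targets = downstream(source)
    hits = set()
    for row in MEASUREMENTS:
        body = STATION_LOCATION[row["ID"]]
        if body not in targets:
            continue
        if (start and row["Date"] < start) or (end and row["Date"] > end):
            continue  # ISO 8601 date strings compare correctly as text
        if any(row[col] > threshold for col in columns):
            hits.add(body)
    return sorted(hits)


print(query("Fletcher Creek", "chemical", 2))   # ['Merced Lake']
print(query("Fletcher Creek", "chemical", 10))  # []
```

The two query slides use different thresholds:
- With the 10 micrograms per liter threshold from the first version of the query, no downstream reading qualifies.
- With the 2 parts per billion threshold (roughly 2 micrograms per liter in dilute water), only Merced Lake does.

Adding the query's December 2001 to March 2003 window returns nothing, because the sample rows are from September 2006. For example, `query("Fletcher Creek", "chemical", 2, "2001-12-01", "2003-03-31")` gives an empty list.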
Metadata table:
System                Data Element   Definition                          Units                  Precision
Measurements          ID             Monitoring Station Identifier       not applicable         not applicable
Measurements          Date           Date sample was collected           not applicable         not applicable
Measurements          Temp           Temperature                         degrees Celsius        0.1
Measurements          Hg             Mercury contamination               micrograms per liter   0.004
Monitoring Stations   Name           Monitoring Station Identifier
Monitoring Stations   Latitude       Latitude where sample was taken
Monitoring Stations   Longitude      Longitude where sample was taken
Monitoring Stations   Location       Body of water monitored
Contaminants          Contaminant    Name of contaminant
Contaminants          Threshold      Acceptable threshold value

Relations among Inland Bodies of Water
Fletcher Creek feeds into Merced River (equivalently, Merced River is fed from Fletcher Creek), and Merced River feeds into Merced Lake.

Combining Data, Metadata & Concept Systems
Inference search query: “find water bodies downstream from Fletcher Creek where chemical contamination was over 2 parts per billion between December 2001 and March 2003”
(The data, concept system, and metadata shown on this slide are the same as on the earlier “Challenge: Combine Data, Metadata & Concept Systems” slide.)

Example: Environmental Text Corpus
- Idea: develop an environmental research corpus that could attract R&D efforts.
- Include the reports and other material from over $1 billion of EPA-sponsored research.
- Prepare the corpus and make it available: research results from years of ORD R&D.
- Publish the associated metadata and concept systems in XMDR.
- Use open source software for EPA testing.

Extraction Engines
- Find concepts and relations between concepts in text, tables, data, audio, video, …
- Produce databases (relational tables, graph structures) and other output
Functions:
- Segment: find text snippets (boundaries are important)
- Classify: determine the database field for a text segment
- Associate: determine which text segments belong together
- Normalize: put information into a standard form
- Deduplicate: collapse redundant information

Metadata Registries are Useful
- Registered semantics for “training” extraction engines:
  - The “Normalize” function can make use of standard code sets that map between representation forms.
  - The “Classify” function can interact with pre-established concept systems.
- Provenance: extraction has high precision for proper nouns but lower precision (e.g., 70%) for other concepts. This affects downstream processing, so precision needs to be tracked.

Normalize: Need Registered and Mapped Concepts/Code Sets
[Figure: the Country Identifiers data element concept (Unique ID 5769) and its data elements (Unique ID 4572). The value domains are ISO 3166 English name, French name, 2-alpha, 3-alpha, and 3-numeric codes, as on the “Traditional MDR: Manage Code Sets” slide.]

Information Extraction & Semantic Computing
[Figure: pipeline in three stages, producing actionable information for decision support.
- An extraction engine segments, classifies, associates, normalizes, and deduplicates.
- Semantic computing then discovers patterns, selects models, fits parameters, draws inferences, and reports results.
- 11179-3 (E3) and XMDR support both stages.]

Example: 11179-3 (E3) Supports Semantic Web Applications
XMDR may be used to “ground” the semantics of an RDF statement. Take the statement “The address state code is ‘AB’.” It can be expressed as a directed graph, i.e., an RDF statement: the subject node (Address) is connected by a predicate edge (State Code) to the object node (AB).

Example: Grounding RDF Nodes and Relations: URIs Reference a Metadata Registry
[Figure: a two-step graph. Node dbA:e0139 is linked by the edge ai:MailingAddress to node dbA:ma344. Node dbA:ma344 is linked by the edge ai:StateUSPSCode to the literal “AB”^^ai:StateCode.]
Prefixes: dbA: “http:/www.epa.gov/databaseA”; ai: “http://www.epa.gov/edr/sw/AdministeredItem#”

Definitions in the EPA Environmental Data Registry
Mailing Address: http://www.epa/gov/edr/sw/AdministeredItem#MailingAddress
The exact address where a mail piece is intended to be delivered, including urban-style address, rural route, and PO Box.
State USPS Code: http://www.epa/gov/edr/sw/AdministeredItem#StateUSPSCode
The U.S. Postal Service (USPS) abbreviation that represents a state or state equivalent for the U.S.
or Canada.
Mailing Address State Name: http://www.epa/gov/edr/sw/AdministeredItem#StateName
The name of the state where mail is delivered.

Ontologies for Data Mapping
Ontologies can help to capture and express semantics.
[Figure: an ontology in which Geographic Area, Geographic Sub-Area, and Country relate to Country Identifier. Country Identifier has two forms:
- Country Name: Short Name, Long Name, Mailing Address Country Name, Distributor Country Name.
- Country Code: ISO 3166 2-Character Code, ISO 3166 3-Character Code, ISO 3166 3-Numeric Code, FIPS Code.]

Example: Content Mapping Service
Data comes from many sources, and the files contain data that records the same facts with different terms. E.g., one system responds with Danemark, another with DK, another with DNK, and another with 208; all should map to Denmark. XMDR could accept XML files with data from different code sets and return a result mapped to a single code set.

Ecoinformatics: Concept System Store
Concept systems (keywords, controlled vocabularies, thesauri, taxonomies, ontologies, axiomatized ontologies) are essentially graphs (node-relation-node) plus axioms.
[Figure: a metadata registry holding concept systems (thesauri, themes, ontologies, GEMET) alongside structured metadata and data standards.]

Ecoinformatics: Management of Concept Systems
[Figure: the same metadata registry.] For concept systems, the registry supports:
- Registration
- Harmonization
- Standardization
- Acceptance (vetting)
- Mapping (correspondences)

Ecoinformatics: Life Cycle Management
[Figure: the same metadata registry.] Life cycle management of data and of concept systems (ontologies).

Ecoinformatics: Grounding Semantics in Metadata Registries
[Figure: Semantic Web RDF triples, each a subject (node URI), verb (relation URI), and object (node URI). These are grounded in the ontologies, concept systems (thesauri, themes, GEMET), structured metadata, and data standards held in a metadata registry.]

XMDR Project Collaboration
- Collaborative, interagency effort: EPA, USGS, NCI, Mayo Clinic, DoD, LBNL, … and others
- Draws on and contributes to
interagency/International Cooperation on Ecoinformatics
- Involves Ecoterm, along with international, national, state, and local government agencies and other organizations, as content providers and potential users
- Interacts with many organizations around the world through ISO/IEC standards committees
- Only loosely aligned with Ecoinformatics Cooperation

XMDR Project
- High-risk R&D; the sponsor expects a likelihood of failure
- Targeted toward leading-edge semantics applications in a highly strategic environment
- Conceptualization of new capabilities, creation of designs (expressed as standards), and development of a software architecture and prototype system for demonstrating capabilities and testing designs
- Reasoning, inference, linkage of concepts to data, …
- Demonstration of fundamental semantic management capabilities for metadata registries, and understanding of the potential applications that could be built in-house

Results to Date
- Completed the first version of designs for next-generation metadata registries, expressed as figures in a UML model that is proposed for the next edition of the ISO/IEC 11179 standard
- Developed the XMDR prototype, available as open source software
- Content loaded in the prototype: a broad range of traditional metadata and concept systems
- Designs and prototype are being explored and used in several locations
- Potential for facilitating development and sharing of content by a wide diversity of users
- Starting the next version of designs, taking on more challenging content and capabilities

Status of Project
- NSF has funded a three-year project, providing a funding base
- Strong emphasis on the computer science R&D results and collaboration with the EU and Asia
- Limited staffing
- Proposing further high-risk R&D
- Developing proposals for collaborative efforts to demonstrate capabilities, especially in the area of water
- Opportunity to collaborate with the JRC and projects under the European Commission 7th Framework Program

Ecoinformatics Test Bed
- Proposed in Brussels in September 2004
- Purpose: project direction and statement developed; research and technical informatics to investigate metadata management techniques; a practical experiment for testing usability
- Initial focus: use metadata and semantic technologies for the health effects of air quality (transportation)
- Potential for extension to other areas
- Need for engaging ongoing operations and/or indicators

Ecoinformatics Test Bed
- Extend the original charter to water
- Use water as example content
- Look for opportunities to coordinate with EU projects (metadata and concept systems; WISE; EC 7th Framework Program)
- Identify and propose possible demonstrations