Next Generation Semantic Data Environments
(or Linked Data, Semantics, and Standards in Scientific Applications)

Deborah L. McGuinness
Tetherless World Senior Constellation Chair
Professor of Computer and Cognitive Science
Web Science Research Center Director
Rensselaer Polytechnic Institute, Troy, NY
With thanks to the extended RPI Tetherless World Team

OMG Semantics: From Research to Reality: Implementing the Semantic Web
March 20, 2013, Reston, VA

Trends: More Data & More Diversity
• More data
  – More open data
  – More authoritative data
  – More interest in and generation of metadata
  – More enthusiast generated / maintained data
  – More vocabularies, taxonomies, ontologies
• More diversity
  – Broader human participation
    • Trained scientists, citizens, enthusiasts, indigenous, …
  – More locations – mobile as well as global
  – More sensors – human, robots, implants, …
  – Real time feeds
  – Social sources – Twitter, Facebook, …

Increasing Requirements
• Data and data environments should:
  – Support usability – not just by original authors
  – Include (usable) documentation – metadata concerning collection methods, sources, recency, assumptions, …
  – Provide accessibility with transparent access policies
  – Include schema / ontology information – including mapping information used in integration, along with rationales, …
  – Support queries (with usable and understandable interfaces)
  – Document verification and curation methods, including access to tools
  – Support AND encourage interactions; users should be able to comment, question, contribute, discuss, …
Path moves from Portal -> Virtual Observatory -> Online Community
Next: examples, foundations, and discussion

Semantic Environmental and Ecological Monitoring
• Enable/Empower citizens & scientists to explore pollution sites, facilities, regulations, and health impacts along with provenance
• Demonstrates semantic monitoring possibilities
• Extend to endangered species and resource mgr issues
• Explanations and Provenance available
http://was.tw.rpi.edu/swqp/map.html and http://aquarius.tw.rpi.edu/projects/semantaqua
1. Map view of analyzed results
2. Explanation of pollution
3. Possible health effect of contaminant (from EPA)
4. Filtering by facet to select type of data
5. Link for reporting problems
6. Extended with input from USGS, with population counts for birds & fish

Example Workflow (SemantAqua)
[Workflow diagram; labels: Publish, CSV2RDF4LOD Direct, visualize, derive, archive, CSV2RDF4LOD Enhance, Archive]

Reusable Ontologies
• Pollution ontology describes the relationship between a regulation violation (a measurement), a polluted thing, and a polluted site
• Combined with other ontologies (e.g. W3C Geo), users can ask “Tell me all of the polluted things within 1 mile of my location”

Ontologies
• Water quality ontology extends pollution to describe water-related pollution
• Further extended by regulation ontologies to provide “regulation violation” inference
• Allows the reasoner to match specific regulations to measurements that violate them

Interface

Semantic Methodology and Semantic Application Evolution
SemantAqua -> SemantEco -> DataOne: modularizing, broadening, provenance, interaction
VSTO -> SESDI -> SPCDIS: modularizing, provenance, broadening, interaction
Originally developed for Virtual Observatories (in solar terrestrial), now in water quality, sea ice, volcanology, mycology, …
McGuinness, Fox, West, Garcia, Cinquini, Benedict, Middleton. The Virtual Solar-Terrestrial Observatory: A Deployed Semantic Web Application Case Study for Scientific Research. Proc. 19th Conf. on Innovative Applications of Artificial Intelligence (IAAI-07). http://www.vsto.org

Population Sciences Grid: Interventions, Behaviors, and Policy
• Extensible Mashups via Linked Data
• Diverse datasets from NIH
• Exploring interventions along with correlations with behavior changes – in this case, tobacco interventions and smoking prevalence
• Accountable Mashups via Provenance
• Award winning paper on multi-dimensional analysis

An Example: Hawaii
Changes in cigarette use viewed against policy changes
We link each state's record from year to year, so that the same state can be followed across time, adding data for each year.

Ontology as API: Adding Dimensions
This RDF graph creates this visual.
[Figure; labels: dataset, x axis, y axis]

Social Observatory – First Responder effort (NIST funded)
Social Media use is on the rise. Every day, we write:
• 294 billion emails
• 2 million blog posts
• Over 40 Million Tweets*

Finding Users
First Responders, including Emergency Medical Personnel, Firefighters, and Police Officers, have active online communities on Social Media websites.
How can we leverage Social Media sites
… to gather requirements for active First Responders?
… to identify stakeholders within those First Responder communities?
Finding Topics

Web Data “Challenge Response” Enablers
– HHS Award winning platform
– Target questions: “good hospital for my context”
– Prizm, DataCube Explorer, …

Open Government Data
TWC – Intl Open Government Data Sets

Mobile, Distributed, and Context-Aware Computing

Rensselaer Tetherless World Constellation
Web Observatory Foundations & Directions
THEMES
Multi-Dimensional Data Portals
Observatories: Science, Open Government, Health and Life Science, Social
Web Science Research Foundations
Making Data Transparent and Actionable
Provenance
Semantic Methodology
Social Network Analysis
Semantically-Enabled Visualization
Web Data "Challenge Response" Enablers
• Open Data Workflow
• International Open Government Data Sets
• Health and Human Services Data Challenge
• Semantic eScience Data Portals
• Social Media: Reasoning on Graph Database
• First Responder Network

Foundations: Web Layer Cake
[Layer cake diagram; annotations:]
– Visualization APIs, S2S
– Govt Data
– Inference Web, Proof Markup Language, W3C Provenance Working group formal model, W3C incubator group, …
– OWL 1 & 2 WG: Edited main OWL Docs, quick reference, OWL profiles (OWL RL); Earlier languages: DAML, DAML+OIL, Classic
– Inference Web, IW Trust, AIR + Trust
– DL, KIF, CL, N3Logic
– Ontology repositories (Ontolingua), Ontology Evolution env: Chimaera, Semantic eScience Ontologies, MANY other ontologies
– RIF WG, AIR accountability tool
– SPARQL WG, earlier QL – OWL-QL, Classic’ QL, …
– Govt metadata search, Linked Open Govt Data
– SPARQL to XQuery translator
– RDFS materialization (Billion triple winner)
– Transparent Accountable Datamining Initiative (TAMI)

Inference Web: Making Data Transparent and Actionable Using Semantic Technologies
• How and when does it make sense to use smart system results & how do we interact with them?
[Figure; labels:]
– Cognitive Asst -> CPOF & SIRI
– Knowledge Provenance in Virtual Observatories
– (Mobile) Intelligent Agents
– Intelligence Analyst Tools -> Watson
– Hypothesis Investigation / Policy Advisors
– NSF Interops: SONET, SSIII – Sea Ice

Moving to the Next Generation
Some focus areas to move to the next generation:
• Provenance – e.g., not just the sources and dates, but enough to know when to depend on something
• Policy – balance between sharing data, getting credit, and making data accessible to all (or all willing to follow the rules)
• Social aspects – incentives, rewards, evolution, customization
• Distributed, Mobile, and Context-aware
• Education – scientific method – promote creating testable hypotheses, how to verify / replicate, etc.
• Broadly usable semantic methodology
• Moving to truly integrated communities

Discussion
• Semantic foundations are being used in a wide range of areas.
• They are not just for semantic practitioners any more
• Open as well as commercial software available
• Come join us!
• And if you are already there…
  – What do you want from evolving observatory / collaboratory infrastructure?
  – What do you need from provenance and explanation infrastructures?
  – Do you have tools, tool templates, and/or tool requirements?
  – Do you have use cases?
  – Are you using our (or another) semantic methodology?
More info – Deborah McGuinness dlm@cs.rpi.edu

Extra

Semantic Web (RPI) 2013
[Figure; labels: Research, Innovation, RDFa]

What is an Ontology?
[Ontology spectrum figure; labels, in order: Catalog/ID; Terms/glossary; Thesauri “narrower term” relation; Informal is-a; Formal is-a; Formal instance; Frames (properties); Value Restrs.; Disjointness, Inverse, part-of…; General Logical constraints]
Ontologies Come of Age, McGuinness, 2001, and
From AAAI Panel 99 – McGuinness, Welty, Uschold, Gruninger, Lehmann
Plus basis of Ontologies Come of Age – McGuinness, 2003

Interface

Core and Framework Semantics
Multi-tiered interoperability used by