Introduction to eScience and Semantic Web
Professor Deborah McGuinness
TA – Katie Chastain
Other lectures from tetherless world grad students Jim McCusker and Amar Viswanathan and possibly others from http://tw.rpi.edu/web/People
CSCI 6962 - 01, 26868, CSCI 4969 - 01, 27716
ITWS 6960 - 01, 27640, ITWS 4969 - 01, 27717
Week 1, August 27, 2012

Admin info (keep/print this slide)
• Class:
  – CSCI 6962 - 01, 26868, CSCI 4969 - 01, 27716
  – ITWS 6960 - 01, 27640, ITWS 4969 - 01, 27717
• Hours: 1pm-3:50pm Mondays (except after Columbus Day, when we meet on Tuesday)
• Class Location: Winslow 1140
• Instructors: Deborah McGuinness, TA Katie Chastain, Guests: Jim McCusker, Amar Viswanathan, Patrice Seyed
• Contacts: dlm@cs.rpi.edu, chastk@rpi.edu, mccusj@rpi.edu, kannaa@rpi.edu, seyeda2@rpi.edu
• Contact locations: Winslow 2104 (DLM), 2nd floor Winslow kitchen
• Wiki: http://tw.rpi.edu/web/Courses/SemanticeScience/2012
• Twed: http://tw.rpi.edu/web/TWed Wed - 7-9 starting Sept 12

Introductions
• Who are we?
• Who are you?
• Why are you here?
• What do you want to get out of the class?
• Will you make the class (on time) each week and do you have any other conflicts or issues we should know about?

“Knowledge is the common wealth of humanity”*
In the Earth and space sciences and elsewhere, ready and open access to the vast and growing collections of cross-disciplinary digital information is the key to understanding and responding to complex Earth system phenomena that influence human survival. We have a shared responsibility to create and implement strategies to realise the full potential of digital information and services for present and future generations.
*Adama Samassekou, Convener of the UN World Summit on the Information Society

Background
People should be able to access a global, distributed knowledge base of scientific data that:
• appears to be integrated
• appears to be locally available
But… data is obtained by multiple means, using various protocols, in differing vocabularies, using (sometimes unstated) assumptions, with inconsistent (or non-existent) meta-data. It may be inconsistent, incomplete, evolving, and distributed.
And… there often exist significant levels of semantic heterogeneity, large-scale data, complex data types, legacy systems, inflexible and unsustainable implementation technology…

What do we need to achieve Semantic eScience? (in-class brainstorming exercise)
White board exercise…. What do we need to achieve this vision?

What do we need to achieve Semantic eScience? (in-class brainstorming exercise)
• organization, leadership, management strategies, roles and assignment of roles
• dissemination strategy
• communication of ideas - machine level - human level
• conflict resolution
• cross-disciplinary collaboration
• flexible
• adaptable, feedback
• extensible
• ability to filter information
• usage/application of resources, optimization
• facts, knowledge (domain knowledge)
• context, domain, scope
• goals, use cases
• metadata - data to describe data
• ability to link information
• ability to understand information
• ability to capture and represent conflicting ideas
• provenance - where data come from
• trust - reliable
• ability to capture intent (humanitarian aspect / responsibility)
• credibility of information
• interesting and appealing
• standardization
• education and outreach
• methods and metrics
• criteria for evaluation

Outline of the course
• Topics for Semantic e-Science/ Foundations:
  – Semantic Methodologies
  – Knowledge Representation for e-Science
  – Ontology Engineering and Re-Use for e-Science
  – Knowledge Integration for e-Science
  – Semantic Data Integration
  – Semantic Web Languages, Tools and Services
  – Knowledge Provenance for e-Science
  – Semantic Infrastructure and Architecture for e-Science
  – Semantic Grid Middleware
  – Ontology Evolution for e-Science
  – Knowledge Management for e-Science
  – e-Science Workflow Management
  – Data life-cycle for e-Science

Contents
• Outline of the course
• Background
• e-Science
• Examples
• Informatics
• Semantics
• Elements of Semantic e-Science (SeS)
• What we expect
• Logistics summary

The Information Era: Interoperability
Modern information and communications technologies are creating an “interoperable” information era in which ready access to data and information can be truly universal. Open access to data and services enables us to meet the new challenges of understanding the Earth and its space environment as a complex system:
• managing and accessing large data sets
• higher space/time resolution capabilities
• rapid response requirements
• data assimilation into models
• crossing disciplinary boundaries.

Information products have Lots of Audiences
But data has Lots of Audiences: SCIENTISTS TOO
[Figure: audiences arranged from More Strategic to Less Strategic]
From “Why EPO (Education and Public Outreach)?”, a NASA internal report on science education, 2005

Shifting the Burden from the User to the Provider
Fox CI and X-informatics - CSIG 2008, Aug 11

e-Science
• Emphasis is on Science
• Original narrative: One of the key drivers behind the search for such new scientific tools is the imminent deluge of data from new generations of scientific experiments and surveys (*). In order to exploit and explore the petabytes of scientific data that will arise from these high-throughput experiments, supercomputer simulations, sensor networks, and satellite surveys, scientists will need assistance from specialized search engines, data mining tools, and data visualization tools that make it easy to ask questions and understand answers.
To create such tools, the data will need to be annotated with relevant "metadata" giving information as to provenance, content, conditions, and so on; and, in many instances, the sheer volume of data will dictate that this process be automated. Scientists will create vast distributed digital repositories of scientific data requiring management services similar to those of more conventional digital libraries, as well as other data-specific services. The ability to search, access, move, manipulate, and mine such data will be a central requirement for this new generation of collaborative science software applications.
Hey and Trefethen, 2005

Evolving Science
• Thousand years ago: science was empirical, describing natural phenomena
• Last few hundred years: theoretical branch, using models, generalizations
• Last few decades: a computational branch, simulating complex phenomena
• Today: data exploration (eScience), synthesizing theory, experiment and computation with advanced data management and statistics; new algorithms!
• eScience that “understands” meaning of terms: Semantic eScience
[Equation shown on slide: $\left(\frac{\dot{a}}{a}\right)^2 = \frac{4\pi G\rho}{3} - K\frac{c^2}{a^2}$]

Living in an Exponential World
• Scientific data doubles every year
  – caused by successive generations of inexpensive sensors + exponentially faster computing
• Changes the nature of scientific computing
• Cuts across disciplines (eScience)
• It becomes increasingly hard to extract knowledge
• 20% of the world’s servers go into huge data centers by the “Big 5”
  – Google, Microsoft, Yahoo, Amazon, eBay
• So it is not only the scientific data!
[Chart: logarithmic vertical axis from 0.1 to 1000, years 1970 to 2000; series labeled CCDs and Glass]

Collecting Data
• Very extended distribution of data sets: data on all scales!
• Most datasets are small, and manually maintained (Excel spreadsheets)
• Total amount of data dominated by the other end (large multi-TB archive facilities)
• Most bytes today are collected via electronic sensors

Making Discoveries
• Where are discoveries made?
  – At the edges and boundaries, by inspecting deeper or more data
• Metcalfe’s law
  – Utility of computer networks grows as the number of possible connections: O(N^2)
• Federating data (the connections!!)
  – Federation of N archives has utility O(N^2)
  – Possibilities for new discoveries grow as O(N^2)
• Many examples
  – Sky surveys – galaxy zoo… Very early discoveries from Sloan Digital Sky Survey (http://www.sdss.org/), Two Micron Sky Survey (http://www.ipac.caltech.edu/2mass/), Palomar Digital Sky Survey (http://www.astro.caltech.edu/~george/dposs/)
  – Genomics+proteomics – Alzheimers article in reading

Data Delivery: Hitting a Wall
FTP and GREP are not adequate
• You can GREP 1 MB in a second
• You can GREP 1 GB in a minute
• You can GREP 1 TB in 2 days
• You can GREP 1 PB in 3 years
• Oh!, and 1PB ~4,000 disks
• You can FTP 1 MB in 1 sec
• You can FTP 1 GB / min (~1 $/GB)
• … 2 days and 1K$
• … 3 years and 1M$
• At some point you need indices to limit search, and parallel data search and analysis
• This is where databases can help
• Take the analysis to the data!!

Mind the Gap!
• As a result of finding out who is doing what, sharing experience/expertise, and substantial coordination:
• There is/was still a gap between science and the underlying infrastructure and technology that is available
• Cyberinfrastructure is the new research environment(s) that support advanced data acquisition, data storage, data management, data integration, data mining, data visualization and other computing and information processing services over the Internet.

Informatics - information science includes the science of (data and) information, the practice of information processing, and the engineering of information systems. Informatics studies the structure, behavior, and interactions of natural and artificial systems that store, process and communicate (data and) information. It also develops its own conceptual and theoretical foundations. Since computers, individuals and organizations all process information, informatics has computational, cognitive and social aspects, including study of the social impact of information technologies. Wikipedia.

Progression after progression
[Diagram labels: Informatics; IT; Cyber Infrastructure; Cyber Informatics; Core Informatics; Science Informatics, aka Xinformatics; Science, Societal Benefit Areas]

World-Wide Emerging Technology Trends
• Innovation will come from parts of the world other than the U.S.
• The Chinese have skipped the Internet first generation.
• Growth is occurring in Asia, and decreasing in previous hot areas such as Western Europe.
• U.S. Industry is compulsively outsourcing abroad.
• Software is moving from forms-based applications to business processes.
• Networks are migrating to internet protocol and optical networking technologies.

Cyberinfrastructure
• Data curation and storage
• Federated access
• Collaboration
• New uses in High Performance Computing
• Databases
• Web servers, services (software as service)
• Wiki
• Visualization
• All discipline neutral

Semantic Web Methodology and Technology Development Process
• Establish and improve a well-defined methodology vision for Semantic Technology based application development
• Leverage controlled vocabularies, etc.
[Process diagram labels: Use Case; Small Team, mixed skills; Analysis; Develop model/ontology; Use Tools; Science/Expert Review & Iteration; Adopt Technology Approach; Leverage Technology Infrastructure; Rapid Prototype; Open World: Evolve, Iterate, Redesign, Redeploy; Evaluation]

Ex. 1: Virtual Observatories
Make data and tools quickly and easily accessible to a wide audience.
Operationally, virtual observatories need to find the right balance of data/model holdings, portals and client software that researchers can use without effort or interference, as if all the materials were available on his/her local computer using the user’s preferred language: i.e. appear to be local and integrated.
Likely to provide controlled vocabularies that may be used for interoperation in appropriate domains along with database interfaces for access and storage -> thus part Information Technology (IT), part Cyber Infrastructure (CI), part Informatics and all about doing new science

SemantEco
• Water Quality Portal Example from previous classes
• http://inferenceweb.org/wiki/Semantic_Water_Quality_Portal
• We will come back to this later… but will go over now at a high level.
• Next Motivated by the Virtual Solar Terrestrial Observatory

Mediation Layer
• Ontology - capturing concepts of Parameters, Instruments, Date/Time, Data Product (and associated classes, properties) and Service Classes
• Maps queries to underlying data
• Generates access requests for metadata, data
• Allows queries, reasoning, analysis, new hypothesis generation, testing, explanation, etc.
[Architecture diagram labels: Education, clearinghouses, other services, disciplines, etc.; Semantic mediation layer - mid-upper-level; Virtual Observatory Portal; Semantic interoperability; Semantic query, hypothesis and inference; Web Serv.; VO API; Query, access and use of data; Semantic mediation layer - VSTO - low level; Metadata, schema, data; Data Bases DB1, DB2, DB3, …, DBn; "Added value" marked at each layer]

Science and technical use cases
Find data which represents the state of the neutral atmosphere anywhere above 100km and toward the arctic circle (above 45N) at any time of high geomagnetic activity.
  – Extract information from the use-case - encode knowledge
  – Translate this into a complete query for data - inference and integration of data from instruments, indices and models
Provide semantically-enabled, smart data query services via a Simple Object Access Protocol (SOAP) web service for the Virtual Ionosphere-Thermosphere-Mesosphere Observatory that retrieve data, filtered by constraints on Instrument, Date-Time, and Parameter in any order and with constraints included in any combination.

Inferred plot type and return required axes data

Semantic Web Benefits
• Unified/abstracted query workflow: Parameters, Instruments, Date-Time
• Decreased input requirements for query: in one case reducing the number of selections from eight to three
• Generates only syntactically correct queries: which could not always be ensured in previous implementations without semantics
• Semantic query support: by using background ontologies and a reasoner, our application has the opportunity to expose only coherent queries (portal and services)
• Semantic integration: in the past users had to remember (and maintain codes) to account for numerous different ways to combine and plot the data, whereas now semantic mediation provides the level of sensible data integration required, exposed as smart web services
  – understanding of coordinate systems, relationships, data synthesis, transformations, etc.
  – returns independent variables and related parameters
• A broader range of potential users (PhD scientists, students, professional research associates and those from outside the fields)

Remembering…. data has Lots of Audiences… Also lay people
[Figure: audiences arranged from More Strategic to Less Strategic]

What is a Non-Specialist Use Case?
Teacher accesses the internet, goes to an Educational Virtual Observatory and enters a search for “Aurora”.
Someone should be able to query a virtual observatory without having specialist knowledge

What should the User Receive?
Teacher receives four groupings of search results:
1) Educational materials: http://www.meted.ucar.edu/topics_spacewx.php and http://www.meted.ucar.edu/hao/aurora/
2) Research, data and tools: via research VOs but the search for brightness, or green/red line emission is mediated for them
3) Did you know?: Aurora is a phenomenon of the upper terrestrial atmosphere (ionosphere) also known as Northern Lights
4) Did you mean?: Aurora Borealis or Aurora Australis, etc.

Semantic Information Integration: Concept map for educational use of science data in a lesson plan

Semantic Web Basics
• The triple: {subject-predicate-object}
  Interferometer is-a optical instrument
  Optical instrument has focal length
  An ontology is a representation of this knowledge
• W3C is the primary (but not sole) governing organization for languages, specifications, best practices, etc.
  – RDF - Resource Description Framework
  – OWL 1.0 - Web Ontology Language (OWL 2.0 on the way)
• Encode the knowledge in triples, in a triple-store; software is built to traverse the semantic network, which can be queried or reasoned upon
• Put semantics between/in your interfaces, i.e. between layers and components in your architecture, i.e. between ‘users’ and ‘information’ to mediate the exchange

Terminology
• Semantic Web
  – An extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation, www.semanticweb.org
  – Primer: http://www.ics.forth.gr/isl/swprimer/
• Semantic Grid
  – Semantic services to use the resources of many computers connected by a network to solve large scale computational/data problems
• Provenance
  – origin or source from which something comes, intention for use, who/what generated for, manner of manufacture, history of subsequent owners, sense of place and time of manufacture, production or discovery, documented in detail sufficient to allow reproducibility.
• Service-oriented architecture
  – Provision of a capability over the internet via a ‘remote-procedure-call’ using prescribed input, output and pre-conditions
• Ontology (n.d.). The Free On-line Dictionary of Computing. http://dictionary.reference.com/browse/ontology
  – An explicit formal specification of how to represent the objects, concepts and other entities that are assumed to exist in some area of interest and the relationships that hold among them.
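To make the triple idea from the Semantic Web Basics slide concrete, here is a minimal sketch in plain Python (no RDF library). It stores the two example statements as subject-predicate-object tuples, answers pattern queries, and walks is-a links to find properties stated on a more general class. The identifier spellings (`is-a`, `has`, `OpticalInstrument`, `focalLength`) and the helper names are assumptions made for illustration, and the "inherit properties along is-a" rule is a simplification chosen for this example, not the semantics of RDF or OWL.

```python
from typing import Iterator, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# The two statements from the slide; the identifier spellings are illustrative.
TRIPLES: Set[Triple] = {
    ("Interferometer", "is-a", "OpticalInstrument"),
    ("OpticalInstrument", "has", "focalLength"),
}


def match(triples: Set[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> Iterator[Triple]:
    """Yield every triple that fits the pattern; None acts as a wildcard."""
    for t in triples:
        if ((s is None or t[0] == s)
                and (p is None or t[1] == p)
                and (o is None or t[2] == o)):
            yield t


def superclasses(triples: Set[Triple], cls: str) -> Set[str]:
    """Follow is-a edges transitively, starting from cls."""
    found: Set[str] = set()
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for _, _, parent in match(triples, s=current, p="is-a"):
            if parent not in found:
                found.add(parent)
                frontier.append(parent)
    return found


def inherited_properties(triples: Set[Triple], cls: str) -> Set[str]:
    """Properties stated on cls or on any class reachable through is-a.

    This inheritance rule is a simplifying assumption for the sketch.
    """
    classes = {cls} | superclasses(triples, cls)
    return {o for c in classes for _, _, o in match(triples, s=c, p="has")}


if __name__ == "__main__":
    # Nothing states directly that an interferometer has a focal length;
    # the answer comes from traversing the is-a link.
    print(sorted(inherited_properties(TRIPLES, "Interferometer")))
```

The point of the sketch is the traversal: the answer for `Interferometer` is derived from two separate statements rather than looked up, which is the kind of work a triple store plus reasoner does at much larger scale.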
Terminology
• Closed World - where complete knowledge is known (encoded), AI relied on this
• Open World - where knowledge is incomplete/evolving, SW promotes this
• Languages
  – OWL - Web Ontology Language (W3C)
  – RDF - Resource Description Framework (W3C)
  – OWL-S/SWSL - Web Services (W3C)
  – WSMO/WSML - Web Services (EC/W3C)
  – SWRL - Semantic Web Rule Language, RIF - Rule Interchange Format
  – PML - Proof Markup Language
• Editors: Protégé, SWOOP, Medius, SWeDE, …
• Reasoners
  – Pellet, Racer, Medius KBS, FACT++, fuzzyDL, KAON2, MSPASS, QuOnto
• Query Languages
  – SPARQL, XQUERY, SeRQL, OWL-QL, RDFQuery
• Other Tools for Semantic Web
  – Search: SWOOGLE swoogle.umbc.edu
  – Collaboration: www.planetont.org
  – Other: Jena, SeSAME/SAIL, Mulgara, Eclipse, KOWARI
  – Semantic wiki: OntoWiki, SemanticMediaWiki
• Emerging Semantic Standards for Earth Science
  – SWEET, VSTO, MMI, GeoSciML

Semantic Web Layers
http://www.w3.org/2003/Talks/1023-iswc-tbl/slide26-0.html, http://flickr.com/photos/pshab/291147522/

Application Areas for Semantics
• Smart search
• Annotation (even simple forms), smart tagging
• Geospatial
• Implementing logic (rules), e.g. in workflows
• Data integration
• Verification
• Web services
• Web content mining with natural language parsing
• User interface development (portals)
• Semantic desktop
• Wikis - OntoWiki, SemanticMediaWiki
• Sensor Web
• Software engineering
• Explanation
…. and the list goes on

2007-2008 Hype Cycle for Emerging Semantic Web Technologies v0.6
[Figure: hype cycle curve. Vertical axis: Visibility. Horizontal axis: Time, through the phases Technology trigger, Peak of Inflated Expectations, Trough of Disillusionment, Slope of Enlightenment, Plateau of Productivity. Legend, estimated years to mainstream adoption in Earth science: < 2 years; 2-5 years; 5-10 years; > 10 years; Obsolete before plateau.]
Technologies plotted:
• Semantic Web Services
• Triple stores, e.g. Jena, Sesame, Mulgara, Oracle Spatial
• Semantic Wiki
• Smart search, e.g. NOESIS
• Rules/Logic, SWRL
• Query Lang, SPARQL
• Ontology editor, SWOOP
• Concept map, Cmap
• RDF
• OWL 1.0
• Tagging / annotation
• Mid-level ES domain ontologies, e.g GEON
• Protégé
• XML
• DL Reasoners, e.g. Pellet, Racer
• SKOS
• Species Validators
• Query Lang, OWL-QL
• FOAF
• Upper level ontologies, e.g ABC, DOLCE, SUMO
• Mid-level ES domain ontologies, e.g SWEET
• OWL 1.1
• Natural Language Ontologies
• Query Lang, Commercial and embedded QL
• Managing modular ontologies (ES and general)
Produced for NASA TIWG semantic web subgroup
April 2008

Semantic Web Roadmap
Each row runs Current -> Near Term (0-2 yrs) -> Mid Term (2-5 yrs) -> Long Term (5+ yrs).
Results
• Outcome: Improved Information Sharing -> Increased Collaboration & Interdisciplinary Science -> Acceleration of Knowledge Production -> Revolutionizing how science is done
• Output: Geospatial semantic services established -> Geospatial semantic services proliferate -> Scientific semantic assisted services -> Autonomous inference of science results
Capability
• Assisted Discovery & Mediation: Some common vocabulary based product search and access -> Semantic geospatial search & inference, access -> Semantic agent-based searches -> Semantic agent-based integration
• Interoperable Information Infrastructure: Local processing + data exchange -> Basic data tailoring services (data as service), verification/validation -> Interoperable geospatial services (analysis as service), results explanation service -> Metadata-driven data fusion (semantic service chaining), trust
Technology
• Vocabulary: SWEET core 1.0 based on GCMD/CF -> SWEET core 2.0 based on best practices decided from community -> SWEET 3.0 with semantic callable interfaces via standard programming languages -> Reasoners able to utilize SWEET 4.0
• Languages/Reasoning: RDF, OWL, OWL-S -> Geospatial reasoning, OWL-Time -> Numerical reasoning -> Scientific reasoning

Semantic Web Roadmap (capability)
April 2008
Each row runs Current -> Near Term (0-2 yrs) -> Mid Term (2-5 yrs) -> Long Term (5+ yrs).
• Assisted Discovery & Mediation: Some common vocabulary based product search and access -> Semantic geospatial search & inference, access -> Semantic agent-based searches -> Semantic agent-based integration
• Assisted Knowledge Building: Some metadata and limited provenance available -> Ontologies for data mining, visualization and analysis emerging/maturing -> Common terminology captured in ontologies, crossing domains -> Provenance/annotation with ontologies in user tools
• Verifiable Information Quality: Verification is manual with minimal tool support -> Ontologies for information quality developed -> Domain and range properties in ontologies used in tools -> Service ontologies carry quality provenance
• Responsive Information Delivery: Services must be hardwired and service agreements established -> Services annotated with resource descriptions -> Dynamic service discovery and mediation, and data scheduling -> Semantic markup of data latency (time lags) which adapt dynamically
• Interoperable Information services: Local processing + data exchange -> Basic data tailoring services (data as service), verification/validation -> Interoperable geospatial services (analysis as service), results explanation service -> Metadata-driven data fusion (semantic service chaining), trust
• Interactive Data Analysis: Limited metadata passed to analysis applications -> Tag properties, non-jargon vocabulary for non-specialist use -> Shared terminology for the visual properties of interface objects and graph types... -> Semantic fields to describe tag key modal functions.
• Seamless Data Access: Access mediated by agreed standard vocabularies, hard-wired connections -> Access mediated by common ontologies -> Mediation aided by services with domain/range properties -> Key data access services are semantically mediated

Roadmap - from near-term to mid-term
Each row gives the Near Term (0-2 yrs) capability, what the step requires, and the Mid Term (2-5 yrs) capability.
• Assisted Discovery & Mediation: Semantic geospatial search & inference, access -> requires agent development and vocabulary for agent characterization -> Semantic agent-based searches
• Assisted Knowledge Building: Ontologies for data mining, visualization and analysis emerging/maturing -> requires mature (domain and data-type) ontologies with community endorsement and governance and a robust integration framework -> Common terminology captured in ontologies, crossing domains
• Verifiable Information Quality: Ontologies for information quality developed -> requires mature quality and uncertainty ontologies with domain and range properties added and populated -> Domain and range properties in ontologies used in tools
• Responsive Information Delivery: Services annotated with resource descriptions -> requires semantic service (ontology) registry -> Dynamic service discovery and mediation, and data scheduling
• Interoperable Information services: Basic data tailoring services (data as service), verification/validation -> requires service to implement v/v, new descriptions of analyses, developing explanation -> Interoperable geospatial services (analysis as service), results explanation service
• Interactive Data Analysis: Tag properties, non-jargon vocabulary for non-specialist use -> requires development of portal modal function vocabulary and ontology, link to domain context and data structure -> Shared terminology for the visual properties of interface objects and graph types...
• Seamless Data Access: Access mediated by common ontologies -> requires adding properties to classes in ontologies and populating instances with expert agreement -> Mediation aided by services with domain/range properties

Selected Technical Benefits
1. Integrating Multiple Data Sources
2. Semantic Drill Down / Focused Perusal
3. Statements about Statements
4. Inference
5. Translation
6. Smart (Focused) Search
7. Smarter Search … Configuration
8. Proof and Trust
Updated material reused from “The Substance of the Web”. McGuinness and Dean. Semantic Web Applications for National Security. May, 2005. http://www.schafertmd.com/swans/agenda.html

1: Integrating Multiple Data Sources
• The Semantic Web lets us merge statements from different sources
• The RDF Graph Model allows programs to use data uniformly regardless of the source
• Figuring out where to find such data is a motivator for Semantic Web Services
[Graph: #Ionosphere with edges hasCoordinates (#magnetic), name (“Terrestrial Ionosphere”), hasLowerBoundaryValue (“100”), hasLowerBoundaryUnit (“km”). Different line & text colors represent different data sources]

2: Drill Down / Focused Perusal
• The Semantic Web uses Uniform Resource Identifiers (URIs) to name things
• These can typically be resolved to get more information about the resource
• This essentially creates a web of data analogous to the web of text created by the World Wide Web
• Ontologies are represented using the same structure as content
  – We can resolve class and property URIs to learn about the ontology
[Graph labels: …#NeutralTemperature; measuredby; ...#ISR; ...#FPI; type; operatedby; ...#MillstoneHill; …#EISCAT; locatedIn; …#Norway; Internet]

3: Statements about Statements
• The Semantic Web allows us to make statements about statements
  – Timestamps
  – Provenance / Lineage
  – Authoritativeness / Probability / Uncertainty
  – Security classification
  – …
• This is an unsung virtue of the Semantic Web
[Graph labels: #Danny’s; #Aurora; hasSource; hasDateTime; hascolor; 20031031; Red]
Ontologies Workshop, APL May 26, 2006

4: Inference
• The formal foundations of the Semantic Web allow us to infer additional (implicit) statements that are not explicitly made
• Unambiguous semantics allow question answerers to infer that objects are the same, objects are related, objects have certain restrictions, …
• SWRL allows us to make additional inferences beyond those provided by the ontology
[Graph labels: #Millstone Hill; OperatesInstrument; #Interferometer; hasInstrument; isOperatedBy; Measures; hasTypeofData; hasOperatingMo; hasMeasuredData; #VerticalMeans]

5: Translation
• While encouraging sharing, the Semantic Web allows multiple URIs to refer to the same thing
• There are multiple levels of mapping
  – Classes
  – Properties
  – Instances
  – Ontologies
• OWL supports equivalence and specialization; SWRL allows more complex mappings
[Graph labels: #precipitation, name ont1:Precipitation, ont1:EduLevel VO:Scientist; #precipitation, name ont2:Rain, ont2:EduLevel EduVO:K-12]

6: Smart (Focused) Search
• The Semantic Web associates 1 or more classes with each object
• We can use ontologies to enhance search by:
  – Query expansion
  – Sense disambiguation
  – Type with restrictions
  – ….

7: Smarter Search / Configuration

GEONGRID Ontology Search and Data Integration Example
Uses emerging web standards to enable smart web applications
Given an upper-level domain choice
• Ecology
Illustrate or list contained concepts/hierarchy
• VegetationCover, TreeRings, etc.
Retrieve some specific options from web
• Maps, tree-ring data,
• Info: https://portal.geongrid.org:8443/gridsphere/gridsphere

8: Proof
• The logical foundations of the Semantic Web allow us to construct proofs that can be used to improve transparency, understanding, and trust
• Proof and Trust are on-going research areas for the Semantic Web: e.g., See PML and Inference Web
[Graph: #Critical Dataset hasCalibration #FlatField; hasPeerReview #Solar Physics Paper]
“Critical Dataset has been calibrated with a flat field program that is published in the peer reviewed literature.”

Inference Web
Framework for explaining reasoning tasks by storing, exchanging, combining, annotating, filtering, segmenting, comparing, and rendering proofs and proof fragments provided by multiple distributed reasoners.
• OWL-based Proof Markup Language (PML) specification as an interlingua for proof interchange
• IWExplainer for generating and presenting interactive explanations from PML proofs providing multiple dialogues and abstraction options
• IWBrowser for displaying (distributed) PML proofs
• IWBase distributed repository of proof-related meta-data such as inference engines/rules/languages/sources
• Integrated with theorem provers, text analyzers, web services, …
http://iw.rpi.edu

Inference Web Infrastructure
(McGuinness, et al., 2004 http://www.ksl.stanford.edu/KSL_Abstracts/KSL-04-03.html)
[Diagram labels, sources: Files/WWW; Semantic Discovery Service (DAML/SNRC); OWL-S/BPEL; CWM (NSF TAMI); N3; JTP (DAML/NIMD); KIF; SPARK (DARPA CALO); SPARK-L; UIMA (DTO NIMD Exp Aggregation); Text Analytics]
[Diagram labels, core: Proof Markup Language (PML): Trust, Justification, Provenance]
[Diagram labels, Toolkit: IWTrust (Trust computation); IW Explainer/Abstractor (End-user friendly visualization); IWBrowser (Expert friendly visualization); IWSearch (search engine based publishing); IWBase (provenance registration)]
Framework for explaining question answering tasks by
• abstracting, storing, exchanging,
• combining, annotating, filtering, segmenting,
• comparing, and rendering proofs and proof fragments provided by question answerers.

SW Questions & Answers
Users can explore extracted entities and relationships, create new hypotheses, ask questions, browse answers and get explanations for answers.
[Screenshot labels: A question; An answer; A context for explaining the answer; An abstracted explanation]
(this graphical interface done by Battelle supported by Stanford KSL)

Summary
• Semantics are a key ingredient for progress in informatics and eScience
• A sustained involvement of key inter-disciplinary team members is very important -> leads to incentives, rewards, etc. and a balance of research and production
• This is what we will be teaching you in this class

Semantic Web Methodology and Technology Development Process
• Establish and improve a well-defined methodology vision for Semantic Technology based application development
• Leverage controlled vocabularies, etc.
[Process diagram labels: Use Case; Small Team, mixed skills; Analysis; Develop model/ontology; Use Tools; Science/Expert Review & Iteration; Adopt Technology Approach; Leverage Technology Infrastructure; Rapid Prototype; Open World: Evolve, Iterate, Redesign, Redeploy; Evaluation]

Outline of the course
• Topics for Semantic e-Science/ Foundations:
  – Semantic Methodologies
  – Knowledge Representation for e-Science
  – Ontology Engineering and Re-Use for e-Science
  – Knowledge Integration for e-Science
  – Semantic Data Integration
  – Semantic Web Languages, Tools and Services
  – Semantic Infrastructure and Architecture for e-Science
  – Semantic Grid Middleware
  – Ontology Evolution for e-Science
  – Knowledge Management for e-Science
  – e-Science Workflow Management
  – Data life-cycle for e-Science
  – Data Mining and Knowledge Discovery

SeS Applications and Ontologies
• Semantic Web for Health Care and Life Science
• Semantic Web for Bio-Med-informatics
• Semantic Web for System and Integrated Biology
• Semantic Web for Sun, Earth, Environment and Climate
• Semantic Web for Chemistry, Physics and Astronomy
• Semantic Web for Engineering
• Semantic Web and Digital Libraries and Scientific Publications

SeS Project options
• Configuration and Deployment of Semantic Virtual Observatories
  – Oceanography, astronomy, geology
  – Particularly convenient ones – around water quality, first responder data
• Semantic Advisors – e.g., Semantic Sommelier
• Ontology Merging and Validation Test-bed
• Semantic Language and Tool Use and Evaluation
• Semantic eScience Implementation Evaluation
• Semantic Collaboration Case Studies
• Semantic Application Development and Demonstration

Schedule - wiki
• Reading assignments
• Assignments
  – Individual
  – Group
• Written assessments
• Presentation assessments
• Group assessments

What we expect
• Attend class, complete assignments
• Participate
• Ask questions – be honest with yourself and others about what you do and do not know
• Work both individually and in a group
• Work constructively in group and class sessions

Logistics summary
• Class - Monday 1-3:50pm
• Office hours – By Appointment along with a regular time to be determined for TA (probably before and tetherless night – Twed)
• This week’s assignment:
  – Reading - Ontologies 101* - this one is very important, Semantic Web, e-Science, RDFS
  – Turn in a one page description of one of your favorite papers AND WHY from the reading list
• Next class (week 2 – two weeks from today - note Labor Day):
  – Foundations I: Methodologies, Knowledge Representation
  – Use Cases
• If you have a background that you think needs some extra background reading, talk to us.

Extra