- Tetherless World Constellation


Introduction to eScience and Semantic Web

Professor Deborah McGuinness

TA – Weijing Chen

Other lectures from Professor Joanne Luciano, grad student Jim McCusker, and possibly others from http://tw.rpi.edu/web/People

CSCI 6962 - 01, 86933, CSCI 4969 - 01, 87927

ITWS 6960 - 01, 87198, ITWS 4969 - 01, 87928

Week 1, initially August 29, 2011

Moved because of Hurricane Irene to Wednesday August 31, 2011

1

Admin info (keep/ print this slide)

• Class:

– CSCI 6962 - 01, 86933, CSCI 4969 - 01, 87927

– ITWS 6960 - 01, 87198, ITWS 4969 - 01, 87928

• Hours: 1pm-3:50pm Mondays (except after Columbus Day)

• Class Location: Winslow 1140

• Instructors: Deborah McGuinness, TA Weijing Chen, Guests: Joanne Luciano, Jim McCusker

• Contacts: dlm@cs.rpi.edu, chenw8@rpi.edu, jluciano@cs.rpi.edu, mccusj@rpi.edu

• Contact locations: Winslow 2104 (DLM), 2143 (JSL)

2

For each class

• Titanpad – this week http://twc.titanpad.com/147

• Scribe for each class – this week Weijing

• After class – scribe copies notes over to the class page

• Class Page:

http://tw.rpi.edu/web/Courses/SemanticeScience/2011

• You will need an account on our site so that you can upload your homeworks and presentations – contact Patrick West – who is in class

• See http://tw.rpi.edu/web/Help/UploadLinkToMedia for uploading instructions

3

Quick hints (from patrick)

• It's just a matter of adding a tag to the body of the drupal page: <document href="SemanteScience2011Assignment00.pdf" alt="Semantic eScience 2011 Assignment 00"/>

• When you save the page, next to the title, you'll see an Upload link. Click on that, upload the document, and when you click "Upload" the page will be changed from an Upload link to a Download link.

• To upload a new version of the document go to http://tw.rpi.edu/media/submit.php

4

Introductions

• Who are we?

• Who are you?

• Why are you here?

• What do you want to get out of the class?

• Will you make the class (on time) each week and do you have any other conflicts or issues we should know about?

5

“Knowledge is the common wealth of humanity”*

In the Earth and space sciences and elsewhere, ready and open access to the vast and growing collections of cross-disciplinary digital information is the key to understanding and responding to complex Earth system phenomena that influence human survival.

We have a shared responsibility to create and implement strategies to realise the full potential of digital information and services for present and future generations.

* Adama Samassekou, Convener of the UN World Summit on the Information Society

Brainstorming

• What do you think we need to address to start to realize the vision on the previous viewgraph?

7

Contents

• Outline of the course

• Background

• e-Science

• Examples

• Informatics

• Semantics

• Elements of Semantic e-Science (SeS)

• What we expect

• Logistics summary

8

Outline of the course

• Topics for Semantic e-Science/ Foundations:

– Semantic Methodologies

– Knowledge Representation for e-Science

– Ontology Engineering and Re-Use for e-Science

– Knowledge Integration for e-Science

– Semantic Data Integration

– Semantic Web Languages, Tools and Services

– Semantic Infrastructure and Architecture for e-Science

– Semantic Grid Middleware

– Ontology Evolution for e-Science

– Knowledge Management for e-Science

– e-Science Workflow Management

– Data life-cycle for e-Science

– Data Mining and Knowledge Discovery

9

Background

People (scientists) should be able to access a global, distributed knowledge base of (scientific) data that:

• appears to be integrated

• appears to be locally available

But… data is obtained by multiple means, using various protocols, in differing vocabularies, using (sometimes unstated) assumptions, with inconsistent (or non-existent) meta-data. It may be inconsistent, incomplete, evolving, and distributed.

And… there often exist significant levels of semantic heterogeneity, large-scale data, complex data types, legacy systems, inflexible and unsustainable implementation technology…

10

What do we need to achieve Semantic eScience?

(in-class brainstorming exercise (2010))

• organization, leadership, management strategies, roles and assignment of roles

• dissemination strategy

• communication of ideas

– machine level

– human level

• conflict resolution

• cross-disciplinary collaboration

• flexible

• adaptable, feedback

• extensible

• ability to filter information

• usage/application of resources, optimization

• facts, knowledge (domain knowledge)

• context, domain, scope

• goals, use cases

• metadata - data to describe data

• ability to link information

• ability to understand information

• ability to capture and represent conflicting ideas

• provenance - where data come from

• trust - reliable

• ability to capture intent (humanitarian aspect / responsibility)

• credibility of information

• interesting and appealing

• standardization

• education and outreach

• methods and metrics

• criteria for evaluation

The Information Era: Interoperability

Modern information and communications technologies are creating an “interoperable” information era in which ready access to data and information can be truly universal. Open access to data and services enables us to meet the new challenges of understanding the Earth and its space environment as a complex system:

• managing and accessing large data sets

• higher space/time resolution capabilities

• rapid response requirements

• data assimilation into models

• crossing disciplinary boundaries.

12

But data has Lots of Audiences

[Figure: audiences for data products, arranged from more strategic to less strategic; scientists too]

13

From “Why EPO (Education and Public Outreach)?”, a NASA internal report on science education, 2005

Shifting the Burden from the User to the Provider

Fox CI and X-informatics - CSIG 2008, Aug 11

14

e-Science

• Emphasis is on Science

• Original narrative: One of the key drivers behind the search for such new scientific tools is the imminent deluge of data from new generations of scientific experiments and surveys (*). In order to exploit and explore the petabytes of scientific data that will arise from these high-throughput experiments, supercomputer simulations, sensor networks, and satellite surveys, scientists will need assistance from specialized search engines, data mining tools, and data visualization tools that make it easy to ask questions and understand answers. To create such tools, the data will need to be annotated with relevant "metadata" giving information as to provenance, content, conditions, and so on; and, in many instances, the sheer volume of data will dictate that this process be automated.

Scientists will create vast distributed digital repositories of scientific data requiring management services similar to those of more conventional digital libraries, as well as other data-specific services. The ability to search, access, move, manipulate, and mine such data will be a central requirement for this new generation of collaborative science software applications.

(Hey and Trefethen, 2005)

15

Evolving Science

• A thousand years ago: science was empirical, describing natural phenomena

• Last few hundred years: theoretical branch using models, generalizations

• Last few decades: a computational branch simulating complex phenomena

• Today: data exploration (eScience) synthesizing theory, experiment and computation with advanced data management and statistics

→ new algorithms!

$$\left(\frac{\dot{a}}{a}\right)^{2} = \frac{4\pi G\rho}{3} - K\frac{c^{2}}{a^{2}}$$

Living in an Exponential World

• Scientific data doubles every year

– caused by successive generations of inexpensive sensors + exponentially faster computing

• Changes the nature of scientific computing

• Cuts across disciplines (eScience)

• It becomes increasingly hard to extract knowledge

• 20% of the world’s servers go into huge data centers by the “Big 5”

– Google, Microsoft, Yahoo, Amazon, eBay

• So it is not only the scientific data!

[Figure: logarithmic plot (0.1 to 1000) over the years 1970 to 2000, comparing CCDs and glass]

Collecting Data

• Very extended distribution of data sets: data on all scales!

• Most datasets are small, and manually maintained (Excel spreadsheets)

• Total amount of data dominated by the other end (large multi-TB archive facilities)

• Most bytes today are collected via electronic sensors

Making Discoveries

• Where are discoveries made?

– At the edges and boundaries

– Going deeper, collecting more data, using more colors….

• Metcalfe’s law

– Utility of computer networks grows as the number of possible connections: O(N^2)

• Federating data (the connections!!)

– Federation of N archives has utility O(N^2)

– Possibilities for new discoveries grow as O(N^2)

• Many examples

– Sky surveys – galaxy zoo… Very early discoveries from SDSS, 2MASS, DPOSS

– Genomics+proteomics

– Alzheimer's article in the reading
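The O(N^2) growth quoted for Metcalfe's law can be checked with a few lines of Python. This is only an illustrative sketch: it assumes that "possible connections" means distinct pairs of archives, and the function name is made up for this example.

```python
def possible_connections(n: int) -> int:
    """Number of distinct pairs among n federated archives: n * (n - 1) / 2."""
    return n * (n - 1) // 2


# Growing the federation tenfold grows the pair count roughly a hundredfold.
for n in (2, 10, 100):
    print(n, possible_connections(n))
```

The pair count is N(N-1)/2, which is quadratic in N; that is the sense in which the utility of a federation is O(N^2).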

Data Delivery: Hitting a Wall

FTP and GREP are not adequate

• You can GREP 1 MB in a second

• You can GREP 1 GB in a minute

• You can GREP 1 TB in 2 days

• You can GREP 1 PB in 3 years

• You can FTP 1 MB in 1 sec

• You can FTP 1 GB / min (~1 $/GB)

• … 2 days and 1K$

• … 3 years and 1M$

• Oh!, and 1PB ~4,000 disks

• At some point you need indices to limit search, plus parallel data search and analysis

• This is where databases can help

• Take the analysis to the data !!
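The scaling argument on this slide can be reproduced as simple arithmetic. The sketch below assumes an idealized, constant scan rate of 1 GB per minute (taken from the slide) and decimal units; the helper name is hypothetical. At that rate 1 TB takes about 0.7 days and 1 PB about 1.9 years, the same order of magnitude as the slide's rounder "2 days" and "3 years".

```python
SECONDS_PER_DAY = 86_400


def scan_seconds(size_bytes: float, bytes_per_second: float) -> float:
    """Time for one sequential pass over the data at a constant rate."""
    return size_bytes / bytes_per_second


# Assumption: 1 GB per minute, the rate quoted on the slide.
RATE = 1e9 / 60

for label, size in (("1 GB", 1e9), ("1 TB", 1e12), ("1 PB", 1e15)):
    days = scan_seconds(size, RATE) / SECONDS_PER_DAY
    print(f"{label}: {days:.3f} days")
```

Because the cost of a full scan grows linearly with data size, only indices (which avoid the full scan) or moving the analysis to the data change the picture.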

Mind the Gap!

• As a result of finding out who is doing what, sharing experience/expertise, and substantial coordination:

Informatics - information science includes the science of (data and) information, the practice of information processing, and the engineering […] communicate (data and) information. It also […] visualization and other computing and information processing services over the […]

21

World-Wide Emerging Technology Trends

• Innovation will come from parts of the world other than the U.S.

• The Chinese have skipped the Internet first generation.

• Growth will occur in Asia, and continue to decrease in Western Europe.

• U.S. Industry is compulsively outsourcing abroad.

• Software is moving from forms-based applications to business processes.

• Networks are migrating to IP and optical networking technologies.

Cyberinfrastructure

• Data curation and storage

• Federated access

• Collaboration

• New uses in High Performance Computing

• Databases

• Web servers, services (software as service)

• Wiki

• Visualization

• All discipline neutral

Semantic Web Methodology and Technology Development Process

• Establish and improve a well-defined methodology vision for Semantic Technology based application development

• Leverage controlled vocabularies, etc.

[Figure: development-cycle diagram with the stages Use Case; Small Team, mixed skills; Analysis; Develop model/ontology; Use Tools; Science/Expert Review & Iteration; Adopt Technology Approach; Leverage Technology Infrastructure; Rapid Prototype; Open World: Evolve, Iterate, Redesign, Redeploy; and Evaluation]

24

SemantEco

• Water Quality Portal Example from 2010

• http://inferenceweb.org/wiki/Semantic_Water_Quality_Portal

25

Ex. 1: Virtual Observatories

Make data and tools quickly and easily accessible to a wide audience.

Operationally, virtual observatories need to find the right balance of data/model holdings, portals and client software that researchers can use without effort or interference, as if all the materials were available on their local computer in their preferred language, i.e. appear to be local and integrated.

Likely to provide controlled vocabularies that may be used for interoperation in appropriate domains, along with database interfaces for access and storage -> thus part IT, part CI, part Informatics, and all about doing new science

26

[Figure: layered virtual observatory architecture. Labels include: added value (education, clearinghouses, other services, disciplines, etc.); semantic mediation layer, mid-upper level; semantic interoperability; VO portal, web services and VO API, each with added value; semantic query, hypothesis and inference; semantic mediation layer, VSTO, low level; metadata, schema, data; query, access and use of data; databases DB 1 … DB n]

Mediation Layer

• Ontology - capturing concepts of Parameters, Instruments, Date/Time, Data Product (and […]) Classes

• Maps queries to underlying data

27

Science and technical use cases

Find data which represents the state of the neutral atmosphere anywhere above 100km and toward the arctic circle (above 45N) at any time of high geomagnetic activity.

– Extract information from the use-case - encode knowledge

– Translate this into a complete query for data - inference and integration of data from instruments, indices and models

Provide semantically-enabled, smart data query services via a SOAP web service for the Virtual Ionosphere-Thermosphere-Mesosphere Observatory that retrieve data, filtered by constraints on Instrument, Date-Time, and Parameter in any order and with constraints included in any combination.
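The technical use case above (constraints on Instrument, Date-Time and Parameter, in any order and any combination) can be sketched as a filter in which every constraint is optional. This is a minimal sketch, not the observatory's actual service: the `Record` type, the `query` function and the sample records are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class Record:
    instrument: str
    parameter: str
    time: datetime


def query(records: List[Record],
          instrument: Optional[str] = None,
          parameter: Optional[str] = None,
          start: Optional[datetime] = None,
          end: Optional[datetime] = None) -> List[Record]:
    """Return the records that satisfy every constraint that was supplied."""
    result = []
    for r in records:
        if instrument is not None and r.instrument != instrument:
            continue
        if parameter is not None and r.parameter != parameter:
            continue
        if start is not None and r.time < start:
            continue
        if end is not None and r.time > end:
            continue
        result.append(r)
    return result


# Hypothetical sample data, for illustration only.
RECORDS = [
    Record("FPI", "NeutralTemperature", datetime(2003, 10, 31)),
    Record("ISR", "ElectronDensity", datetime(2003, 10, 30)),
    Record("FPI", "NeutralWind", datetime(2004, 1, 1)),
]

print(query(RECORDS, parameter="NeutralTemperature", instrument="FPI"))
```

Keyword arguments make the order of constraints irrelevant, and omitting a constraint simply leaves that dimension unfiltered.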

28

Inferred plot type and return required axes data

29

Semantic Web Benefits

• Unified/ abstracted query workflow: Parameters, Instruments, Date-Time

• Decreased input requirements for query: in one case reducing the number of selections from eight to three

• Generates only syntactically correct queries, which could not always be ensured in previous implementations without semantics

• Semantic query support: by using background ontologies and a reasoner, our application can expose only coherent queries (portal and services)

• Semantic integration: in the past users had to remember (and maintain codes) to account for numerous different ways to combine and plot the data, whereas now semantic mediation provides the level of sensible data integration required and is exposed as smart web services

– understanding of coordinate systems, relationships, data synthesis, transformations, etc.

– returns independent variables and related parameters

• A broader range of potential users (PhD scientists, students, professional research associates and those from outside the fields)

30

But data has Lots of Audiences

[Figure: audiences for data, arranged from more strategic to less strategic]

From “Why EPO?”, a NASA internal report on science education, 2005

31

What is a Non-Specialist Use Case?

Someone should be able to query a virtual observatory without having specialist knowledge.

A teacher accesses the internet, goes to an Educational Virtual Observatory, and enters a search for “Aurora”.

32

What should the User Receive?

Teacher receives four groupings of search results:

1) Educational materials: http://www.meted.ucar.edu/topics_spacewx.php and http://www.meted.ucar.edu/hao/aurora/

2) Research, data and tools: via research VOs but the search for brightness, or green/red line emission is mediated for them

3) Did you know?: Aurora is a phenomenon of the upper terrestrial atmosphere (ionosphere) also known as Northern Lights

4) Did you mean?: Aurora Borealis or Aurora Australis, etc.

33

Semantic Information Integration:

Concept map for educational use of science data in a lesson plan

Fox CI and X-informatics - CSIG 2008, Aug 11

34


35

Ex 2 – SemantEco / SemantAqua

• Water Quality Portal Example from 2010

• http://inferenceweb.org/wiki/Semantic_Water_Quality_Portal

• Came from hw assignment, proposed in class

• Generated papers in

– Environmental Information Management 2011

– Intl Semantic Web Conference 2011 (main conference and possibly poster session as well)

– American Geophysical Union 2011

– Plus invited presentations for water, health, etc.

36

Semantic Web Basics

• The triple: { subject - predicate - object }

Interferometer is-a optical instrument

Optical instrument has focal length

An ontology is a representation of this knowledge

• W3C is the primary (but not sole) governing organization for languages, specifications, best practices, etc.

– RDF - Resource Description Framework

– OWL 1.0 - Web Ontology Language (OWL 2.0 on the way)

• Encode the knowledge in triples and store them in a triple-store; software is built to traverse the semantic network, which can then be queried or reasoned upon

• Put semantics between/in your interfaces, i.e. between layers and components in your architecture, i.e. between ‘users’ and ‘information’ to mediate the exchange
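The idea of encoding knowledge as triples and then querying them can be sketched without any Semantic Web library. The example below uses the two statements from this slide; the identifier spellings and the `match` helper are assumptions made for illustration, and a real triple store would use URIs and a query language such as SPARQL.

```python
# Each statement is a (subject, predicate, object) tuple.
TRIPLES = {
    ("Interferometer", "is-a", "OpticalInstrument"),
    ("OpticalInstrument", "has", "FocalLength"),
}


def match(triples, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# "What kinds of things are declared with is-a?"
print(match(TRIPLES, p="is-a"))
```

Pattern matching with wildcards is the basic operation that triple-store query languages build on.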

37

Terminology

• Semantic Web

– An extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation, www.semanticweb.org

– Primer: http://www.ics.forth.gr/isl/swprimer/

• Semantic Grid

– Semantic services to use the resources of many computers connected by a network to solve large scale computational/ data problems

• Provenance

– origin or source from which something comes, intention for use, who/what generated for, manner of manufacture, history of subsequent owners, sense of place and time of manufacture, production or discovery, documented in detail sufficient to allow reproducibility.

• Service-oriented architecture

– Provision of a capability over the internet via a ‘remote-procedure-call’ using prescribed input, output and pre-conditions

• Ontology (n.d.). The Free On-line Dictionary of Computing. http://dictionary.reference.com/browse/ontology

– An explicit formal specification of how to represent the objects, concepts and other entities that are assumed to exist in some area of interest and the relationships that hold among them.

38

Terminology

• Closed World - where complete knowledge is known (encoded), AI relied on this

• Open World - where knowledge is incomplete/ evolving, SW promotes this

• Languages

– OWL - Web Ontology Language (W3C)

– RDF - Resource Description Framework (W3C)

– OWL-S/SWSL - Web Services (W3C)

– WSMO/WSML - Web Services (EC/W3C)

– SWRL - Semantic Web Rule Language, RIF - Rule Interchange Format

– PML - Proof Markup Language

– Editors: Protégé, SWOOP, Medius, SWeDE, …

• Reasoners

– Pellet, Racer, Medius KBS, FACT++, fuzzyDL, KAON2, MSPASS, QuOnto

• Query Languages

– SPARQL, XQUERY, SeRQL, OWL-QL, RDFQuery

• Other Tools for Semantic Web

– Search: SWOOGLE swoogle.umbc.edu

– Collaboration: www.planetont.org

– Other: Jena, SeSAME/SAIL, Mulgara, Eclipse, KOWARI

– Semantic wiki: OntoWiki, SemanticMediaWiki

• Emerging Semantic Standards for Earth Science

– SWEET, VSTO, MMI, GeoSciML

39

Semantic Web Layers

http://www.w3.org/2003/Talks/1023-iswc-tbl/slide26-0.html, http://flickr.com/photos/pshab/291147522/

40

Application Areas for Semantics

• Smart search

• Annotation (even simple forms), smart tagging

• Geospatial

• Implementing logic (rules), e.g. in workflows

• Data integration

• Verification

• Web services

• Web content mining with natural language parsing

• User interface development (portals)

• Semantic desktop

• Wikis - OntoWiki, SemanticMediaWiki

• Sensor Web

• Software engineering

• Explanation

• … and the list goes on

41

2007-2008 Hype Cycle for Emerging Semantic Web Technologies v0.6

[Figure: hype-cycle curve with the phases Technology trigger, Peak of Inflated Expectations, Trough of Disillusionment, Slope of Enlightenment, and Plateau of Productivity. Each technology is marked with its estimated years to mainstream adoption in Earth science: < 2 years, 2-5 years, 5-10 years, > 10 years, or obsolete before plateau.]

Technologies plotted on the curve:

• Semantic Web Services

• Semantic Wiki

• Smart search, e.g. NOESIS

• Rules/Logic, SWRL

• Query Lang, OWL-QL

• SKOS, FOAF

• OWL 1.1

• Natural Language Ontologies

• Query Lang, Commercial and embedded QL

• Managing modular ontologies (ES and general)

• Query Lang, SPARQL

• Tagging / annotation

• Species Validators

• Ontology editor, SWOOP

• Mid-level ES domain ontologies, e.g. GEON

• Upper level ontologies, e.g. ABC, DOLCE, SUMO

• OWL 1.0

• Concept map, Cmap

• Protégé

• RDF

• DL Reasoners, e.g. Pellet, Racer

• Mid-level ES domain ontologies, e.g. SWEET

• Triple stores, e.g. Jena, Sesame, Mulgara, Oracle Spatial

• XML

Produced for NASA TIWG semantic web subgroup

Semantic Web Roadmap

April 2008

Each group below traces one row of the roadmap across the columns Current, Near Term (0-2 yrs), Mid Term (2-5 yrs) and Long Term (5+ yrs).

• Current: Improved Information Sharing

– Near term: Increased Collaboration & Interdisciplinary Science

– Mid term: Acceleration of Knowledge Production

– Long term: Revolutionizing how science is done

• Current: Geospatial semantic services established

– Near term: Geospatial semantic services proliferate

– Mid term: Scientific semantic assisted services

– Long term: Autonomous inference of science results

• Current: Some common vocabulary based product search and access

– Near term: Semantic geospatial search & inference, access

– Mid term: Semantic agent-based searches

– Long term: Semantic agent-based integration

• Current: Local processing + data exchange

– Near term: Basic data tailoring services (data as service), verification/validation

– Mid term: Interoperable geospatial services (analysis as service), results explanation service

– Long term: Metadata-driven data fusion (semantic service chaining), trust

• Current: SWEET core 1.0 based on GCMD/CF

– Near term: SWEET core 2.0 based on best practices decided from community

– Mid term: SWEET 3.0 with semantic callable interfaces via standard programming languages

– Long term: Reasoners able to utilize SWEET 4.0

• Current: RDF, OWL, OWL-S

– Near term: Geospatial reasoning, OWL-Time

– Mid term: Numerical reasoning

– Long term: Scientific reasoning

43

Semantic Web Roadmap (capability)

April 2008

Each group below traces one row of the roadmap across the columns Current, Near Term (0-2 yrs), Mid Term (2-5 yrs) and Long Term (5+ yrs).

• Current: Some common vocabulary based product search and access

– Near term: Semantic geospatial search & inference, access

– Mid term: Semantic agent-based searches

– Long term: Semantic agent-based integration

• Current: Some metadata and limited provenance available

– Near term: Ontologies for data mining, visualization and analysis emerging/maturing

– Mid term: Common terminology captured in ontologies, crossing domains

– Long term: Provenance/annotation with ontologies in user tools

• Current: Verification is manual with minimal tool support

– Near term: Ontologies for information quality developed

– Mid term: Domain and range properties in ontologies used in tools

– Long term: Service ontologies carry quality provenance

• Current: Services must be hardwired and service agreements established

– Near term: Services annotated with resource descriptions

– Mid term: Dynamic service discovery and mediation, and data scheduling

– Long term: Semantic markup of data latency (time lags) which adapt dynamically

• Current: Local processing + data exchange

– Near term: Basic data tailoring services (data as service), verification/validation

– Mid term: Interoperable geospatial services (analysis as service), results explanation service

– Long term: Metadata-driven data fusion (semantic service chaining), trust

• Current: Limited metadata passed to analysis applications

– Near term: Tag properties, non-jargon vocabulary for non-specialist use

– Mid term: Shared terminology for the visual properties of interface objects and graph types...

– Long term: Semantic fields to describe tag key modal functions.

• Current: Access mediated by agreed standard vocabularies, hard-wired connections

– Near term: Access mediated by common ontologies

– Mid term: Mediation aided by services with domain/range properties

– Long term: Key data access services are semantically mediated

44

Roadmap - from near-term to mid-term

Each group below pairs a Near Term (0-2 yrs) capability with the Mid Term (2-5 yrs) capability it leads to, and what that step requires.

• Near term: Semantic geospatial search & inference, access

-> requires agent development and vocabulary for agent characterization

– Mid term: Semantic agent-based searches

• Near term: Ontologies for data mining, visualization and analysis emerging/maturing

-> requires mature (domain and data-type) ontologies with community endorsement and governance and a robust integration framework

– Mid term: Common terminology captured in ontologies, crossing domains

• Near term: Ontologies for information quality developed

-> requires mature quality and uncertainty ontologies with domain and range properties added and populated

– Mid term: Domain and range properties in ontologies used in tools

• Near term: Services annotated with resource descriptions

-> requires semantic service (ontology) registry

– Mid term: Dynamic service discovery and mediation, and data scheduling

• Near term: Basic data tailoring services (data as service), verification/validation

-> requires service to implement v/v, new descriptions of analyses, developing explanation

– Mid term: Interoperable geospatial services (analysis as service), results explanation service

• Near term: Tag properties, non-jargon vocabulary for non-specialist use

-> requires development of portal modal function vocabulary and ontology, link to domain context and data structure

– Mid term: Shared terminology for the visual properties of interface objects and graph types...

• Near term: Access mediated by common ontologies

-> requires adding properties to classes in ontologies and populating instances with expert agreement

– Mid term: Mediation aided by services with domain/range properties

45

Selected Technical Benefits

1. Integrating Multiple Data Sources

2. Semantic Drill Down / Focused Perusal

3. Statements about Statements

4. Inference

5. Translation

6. Smart (Focused) Search

7. Smarter Search … Configuration

8. Proof and Trust

Updated material reused from “The Substance of the Web”. McGuinness and Dean. Semantic Web Applications for National Security. May, 2005. http://www.schafertmd.com/swans/agenda.html

46

1: Integrating Multiple Data Sources

• The Semantic Web lets us merge statements from different sources

• The RDF Graph Model allows programs to use data uniformly regardless of the source

• Figuring out where to find such data is a motivator for Semantic Web Services

[Figure: RDF graph around #Ionosphere with the statements hasCoordinates #magnetic, name “Terrestrial Ionosphere”, hasLowerBoundaryValue “100” and hasLowerBoundaryUnit “km”. Different line & text colors represent different data sources.]

47
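Merging statements from different sources can be sketched as a set union over triples, using the #Ionosphere statements from this slide. Which statement comes from which source is an assumption made for the example (the slide distinguishes sources only by color), and `describe` is a hypothetical helper.

```python
# Assumed split of the slide's statements across two sources.
SOURCE_A = {
    ("#Ionosphere", "name", "Terrestrial Ionosphere"),
    ("#Ionosphere", "hasCoordinates", "#magnetic"),
}
SOURCE_B = {
    ("#Ionosphere", "hasLowerBoundaryValue", "100"),
    ("#Ionosphere", "hasLowerBoundaryUnit", "km"),
}

# Because both sources use the same graph model, merging is a plain union.
merged = SOURCE_A | SOURCE_B


def describe(graph, subject):
    """Collect every property of a subject, regardless of the source."""
    return {p: o for s, p, o in graph if s == subject}


print(describe(merged, "#Ionosphere"))
```

A program reading `merged` does not need to know where each statement originated, which is the uniformity the RDF graph model provides.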

2: Drill Down / Focused Perusal

• The Semantic Web uses Uniform Resource Identifiers (URIs) to name things

• These can typically be resolved to get more information about the resource

• This essentially creates a web of data analogous to the web of text created by the World Wide Web

• Ontologies are represented using the same structure as content

– We can resolve class and property URIs to learn about the ontology

[Figure: web of linked resources reached over the Internet, with nodes …#NeutralTemperature, …#FPI, …#ISR, …#MillstoneHill, …#EISCAT and …#Norway, connected by the properties measuredby, type, operatedby and locatedIn]

48

3: Statements about Statements

• The Semantic Web allows us to make statements about statements

– Timestamps

– Provenance / Lineage

– Authoritativeness / Probability / Uncertainty

– Security classification

– …

• This is an unsung virtue of the Semantic Web

[Figure: graph for #Aurora with the properties hasSource (#Danny’s), hasDateTime (20031031) and hascolor (Red)]

Ontologies Workshop, APL May 26, 2006
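Statements about statements can be sketched by using a whole triple as the key for its own metadata. The example reads this slide's figure as a color observation of #Aurora annotated with a source and a timestamp; that reading, the dictionary layout and the `provenance` helper are assumptions for illustration, not RDF reification syntax.

```python
# The base statement.
statement = ("#Aurora", "hasColor", "Red")

# Metadata attached to the statement itself rather than to #Aurora.
annotations = {
    statement: {"hasSource": "#Danny's", "hasDateTime": "20031031"},
}


def provenance(stmt):
    """Return what is recorded about a statement (empty if nothing is)."""
    return annotations.get(stmt, {})


print(provenance(statement))
```

Timestamps, lineage, certainty or a security classification can all be attached in the same way without changing the base statement.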

49

4: Inference

• The formal foundations of the Semantic Web allow us to infer additional (implicit) statements that are not explicitly made

• Unambiguous semantics allow question answerers to infer that objects are the same, objects are related, objects have certain restrictions, …

• SWRL allows us to make additional inferences beyond those provided by the ontology

[Figure: graph linking #Millstone Hill, #Interferometer and #VerticalMeans through the properties OperatesInstrument, hasInstrument, isOperatedBy, hasTypeofData, Measures, hasOperatingMode and hasMeasuredData]
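Inference of implicit statements can be sketched as a small forward-chaining loop. The two rules below (an inverse property and subclass membership) and the sample facts are assumptions chosen to echo the property names on this slide; a real system would take such rules from an OWL ontology and run a reasoner.

```python
# Assumed ontology fragments, for illustration only.
INVERSE = {"operatesInstrument": "isOperatedBy"}
SUBCLASS = {"Interferometer": "OpticalInstrument"}


def infer(facts):
    """Apply both rules repeatedly until no new statement appears."""
    inferred = set(facts)
    changed = True
    while changed:
        new = set()
        for s, p, o in inferred:
            if p in INVERSE:
                new.add((o, INVERSE[p], s))          # inverse property
            if p == "type" and o in SUBCLASS:
                new.add((s, "type", SUBCLASS[o]))    # subclass membership
        changed = not new <= inferred
        inferred |= new
    return inferred


# Hypothetical explicit statements.
FACTS = {
    ("#MillstoneHill", "operatesInstrument", "#FPI"),
    ("#FPI", "type", "Interferometer"),
}

for triple in sorted(infer(FACTS) - FACTS):
    print("inferred:", triple)
```

Neither inferred statement was asserted explicitly; both follow from the rules plus the explicit facts.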

50

5: Translation

• While encouraging sharing, the Semantic Web allows multiple URIs to refer to the same thing

• There are multiple levels of mapping

– Classes

– Properties

– Instances

– Ontologies

• OWL supports equivalence and specialization; SWRL allows more complex mappings

[Figure: the concept #precipitation is named ont1:Precipitation at education level VO:Scientist and ont2:Rain at education level EduVO:K-12]
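Translation between vocabularies can be sketched as a lookup through a shared concept. The pairing of each term with an education level follows one reading of this slide's figure and is an assumption, as are the table layout and the `translate` helper.

```python
# One shared concept, with the term each audience's ontology uses for it.
LABELS = {
    "#precipitation": {
        "VO:Scientist": "ont1:Precipitation",
        "EduVO:K-12": "ont2:Rain",
    },
}


def translate(term, to_level):
    """Map a term to the equivalent term for another audience, if known."""
    for by_level in LABELS.values():
        if term in by_level.values():
            return by_level.get(to_level, term)
    return term  # unknown terms pass through unchanged


print(translate("ont1:Precipitation", "EduVO:K-12"))
```

In OWL the same link would be stated with equivalence or specialization axioms rather than a lookup table.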

51

6: Smart (Focused) Search

• The Semantic Web associates 1 or more classes with each object

• We can use ontologies to enhance search by:

– Query expansion

– Sense disambiguation

– Type with restrictions

– ….
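Query expansion, the first technique listed above, can be sketched by walking a class hierarchy. The hierarchy below borrows the Aurora terms from the earlier non-specialist use case; its structure and the `expand` helper are assumptions for illustration.

```python
# Assumed fragment of an ontology: each term maps to its narrower terms.
NARROWER = {
    "Aurora": ["Aurora Borealis", "Aurora Australis"],
}


def expand(term, narrower):
    """Return the term plus all narrower terms (assumes no cycles)."""
    expanded = [term]
    for child in narrower.get(term, []):
        expanded.extend(expand(child, narrower))
    return expanded


print(expand("Aurora", NARROWER))
```

A search for the expanded list also finds documents that mention only a narrower term.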

52

7: Smarter Search / Configuration

53

GEONGRID Ontology Search and Data Integration Example

Uses emerging web standards to enable smart web applications

Given an upper-level domain choice

• Ecology

Illustrate or list contained concepts/hierarchy

• VegetationCover, TreeRings, etc.

Retrieve some specific options from web

• Maps, tree-ring data

Info: https://portal.geongrid.org:8443/gridsphere/gridsphere

54

55

56

8: Proof

• The logical foundations of the Semantic Web allow us to construct proofs that can be used to improve transparency, understanding, and trust

• Proof and Trust are ongoing research areas for the Semantic Web: e.g., see PML and Inference Web

[Figure: #Critical Dataset hasCalibration #FlatField; #FlatField hasPeerReview #Solar Physics Paper]

“Critical Dataset has been calibrated with a flat field program that is published in the peer reviewed literature.”
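A proof in this sense is a conclusion together with the statements that justify it. The sketch below derives the quoted conclusion from the two statements in this slide's figure and returns them as the justification. The identifier spellings and the function are hypothetical, and the output is a plain list, not PML.

```python
FACTS = {
    ("#CriticalDataset", "hasCalibration", "#FlatField"),
    ("#FlatField", "hasPeerReview", "#SolarPhysicsPaper"),
}


def prove_peer_reviewed_calibration(dataset, facts):
    """Return the chain of supporting statements, or None if there is none."""
    for s, p, o in sorted(facts):
        if s == dataset and p == "hasCalibration":
            for s2, p2, o2 in sorted(facts):
                if s2 == o and p2 == "hasPeerReview":
                    return [(s, p, o), (s2, p2, o2)]
    return None


proof = prove_peer_reviewed_calibration("#CriticalDataset", FACTS)
for step in proof:
    print("because:", step)
```

Returning the supporting statements, rather than just True, is what lets a user inspect why the conclusion holds.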

57

Inference Web

Framework for explaining reasoning tasks by storing, exchanging, combining, annotating, filtering, segmenting, comparing, and rendering proofs and proof fragments provided by multiple distributed reasoners.

• OWL-based Proof Markup Language (PML) specification as an interlingua for proof interchange

• IWExplainer for generating and presenting interactive explanations from PML proofs providing multiple dialogues and abstraction options

• IWBrowser for displaying (distributed) PML proofs

• IWBase distributed repository of proof-related meta-data such as inference engines/rules/languages/sources

• Integrated with theorem provers, text analyzers, web services, …

http://iw.rpi.edu

58

Inference Web Infrastructure

(McGuinness et al., 2004, http://www.ksl.stanford.edu/KSL_Abstracts/KSL-04-03.html)

[Figure: Inference Web infrastructure. Sources: Semantic Discovery Service (DAML/SNRC) with OWL-S/BPEL; CWM (NSF TAMI) with N3; JTP (DAML/NIMD) with KIF; SPARK (DARPA CALO) with SPARK-L; UIMA (DTO NIMD Text Analytics Exp Aggregation); Files/WWW. Proof Markup Language (PML): Trust, Justification, Provenance. Toolkit: IWTrust (trust computation), IW Explainer/Abstractor (end-user friendly visualization), IWBrowser (expert friendly visualization), IWSearch (search engine based publishing), IWBase (provenance registration).]

Framework for explaining question answering tasks by

• abstracting, storing, exchanging,

• combining, annotating, filtering, segmenting,

• comparing, and rendering proofs and proof fragments provided by question answerers.

59

SW Questions & Answers

Users can explore extracted entities and relationships, create new hypotheses, ask questions, browse answers and get explanations for answers.

A question

An answer

A context for explaining the answer

An abstracted explanation

60

(this graphical interface done by Battelle supported by Stanford KSL)

Summary

• Semantics is a key ingredient for progress in informatics and eScience

• A sustained involvement of key inter-disciplinary team members is very important -> leads to incentives, rewards, etc. and a balance of research and production

• This is what we will be teaching you in this class

61

Semantic Web Methodology and Technology Development Process

• Establish and improve a well-defined methodology vision for Semantic Technology based application development

• Leverage controlled vocabularies, etc.

[Figure: development-cycle diagram with the stages Use Case; Small Team, mixed skills; Analysis; Develop model/ontology; Use Tools; Science/Expert Review & Iteration; Adopt Technology Approach; Leverage Technology Infrastructure; Rapid Prototype; Open World: Evolve, Iterate, Redesign, Redeploy; and Evaluation]

62

Outline of the course

• Topics for Semantic e-Science/ Foundations:

– Semantic Methodologies

– Knowledge Representation for e-Science

– Ontology Engineering and Re-Use for e-Science

– Knowledge Integration for e-Science

– Semantic Data Integration

– Semantic Web Languages, Tools and Services

– Semantic Infrastructure and Architecture for e-Science

– Semantic Grid Middleware

– Ontology Evolution for e-Science

– Knowledge Management for e-Science

– e-Science Workflow Management

– Data life-cycle for e-Science

– Data Mining and Knowledge Discovery

63

SeS Applications and Ontologies

• Semantic Web for Health Care and Life Science

• Semantic Web for Bio-Med-informatics

• Semantic Web for System and Integrated Biology

• Semantic Web for Sun, Earth, Environment and Climate

• Semantic Web for Chemistry, Physics and Astronomy

• Semantic Web for Engineering

• Semantic Web and Digital Libraries and Scientific Publications

64

SeS Project options

• Configuration and Deployment of Semantic Virtual Observatories

– Oceanography, astronomy, geology

• Ontology Merging and Validation Test-bed

• Semantic Language and Tool Use and Evaluation

• Semantic eScience Implementation Evaluation

• Semantic Collaboration Case Studies

• Semantic Application Development and Demonstration

65

Schedule – web page

• Reading assignments

• Assignments

– Individual

– Group

• Written assessments

• Presentation assessments

• Group assessments

66

What we expect

• Attend class, complete assignments

• Participate

• Ask questions – be honest with yourself and others about what you do and do not know

• Work both individually and in a group

• Work constructively in group and class sessions

67

Logistics summary

• Class - Monday 1-3:50pm

• Office hours – By Appointment along with a regular time to be determined and tetherless night

• This week's assignment:

– Reading - Ontologies 101*, Semantic Web, e-Science, RDFS

– Turn in a one page description of one of your favorite papers from the reading list AND WHY

• Next class (week 2 – September 12; note Labor Day):

– Foundations I: Methodologies, Knowledge Representation

• If you have a background that you think needs some extra background reading, talk to us.

• Questions?

68

Extra

69

Digital natives expect services to accommodate their preferences.

• Information online, not “in line”

• Information on-demand, free of place or time

• Blended classroom and online experience

• Flexible schedule for working students

• Relevant and timely content

• More team collaboration

• More content from multiple sources

• Interactive content from voice, video and data

• Ability to contribute, as well as consume, content/knowledge

• Leads to virtual access…

Progression after progression

[Figure: progression from IT and Cyber Infrastructure through Informatics (Cyber Informatics, Core Informatics, and Science Informatics, aka Xinformatics) to Science and Societal Benefit Areas]

71

71

Summary

• The data and information challenges are (almost) being identified as increasingly common

• Data and information science is becoming the ‘fourth’ column (along with theory, experiment and computation)

• Informatics is playing a key role in filling the gap between science (and the spectrum of non-expert) use and generation and the underlying cyberinfrastructure – evident due to the emergence of Xinformatics (world-wide)

• Informatics is a profession and a community activity and requires efforts in all 3 sub-areas (science, core, cyber) and must be synergistic

72

Background

Scientists should be able to access a global, distributed knowledge base of scientific data that:

• appears to be integrated

• appears to be locally available

But… data is obtained by multiple means, using various protocols, in differing vocabularies, using (sometimes unstated) assumptions, with inconsistent (or non-existent) meta-data. It may be inconsistent, incomplete, evolving, and distributed.

And… there often exist significant levels of semantic heterogeneity, large-scale data, complex data types, legacy systems, inflexible and unsustainable implementation technology…

73
