- Tetherless World Constellation

Introduction to eScience and
Semantic Web
Professor Deborah McGuinness
TA – Katie Chastain
Other lectures from tetherless world grad students Jim McCusker and Amar
Viswanathan and possibly others from http://tw.rpi.edu/web/People
CSCI 6962 - 01, 26868 , CSCI 4969 - 01, 27716
ITWS 6960 - 01, 27640 , ITWS 4969 - 01, 27717
Week 1, August 27, 2012
Admin info (keep/ print this slide)
• Class:
– CSCI 6962 - 01, 26868 , CSCI 4969 - 01, 27716
– ITWS 6960 - 01, 27640 , ITWS 4969 - 01, 27717
• Hours: 1pm-3:50pm Mondays (except after Columbus Day,
when we meet on Tuesday)
• Class Location: Winslow 1140
• Instructors: Deborah McGuinness, TA Katie Chastain,
Guests: Jim McCusker, Amar Viswanathan, Patrice Seyed
• Contacts: dlm@cs.rpi.edu, chastk@rpi.edu , mccusj@rpi.edu
, kannaa@rpi.edu, seyeda2@rpi.edu
• Contact locations: Winslow 2104 (DLM), 2nd floor Winslow
kitchen
• Wiki: http://tw.rpi.edu/web/Courses/SemanticeScience/2012
• TWed: http://tw.rpi.edu/web/TWed Wed - 7-9 starting Sept 12
Introductions
• Who are we?
• Who are you?
• Why are you here?
• What do you want to get out of the class?
• Will you make the class (on time) each week,
and do you have any other conflicts or issues
we should know about?
“Knowledge is the common
wealth of humanity”*
In the Earth and space sciences and elsewhere,
ready and open access to the vast and growing
collections of cross-disciplinary digital information
is the key to understanding and responding to
complex Earth system phenomena that influence
human survival.
We have a shared responsibility to create and
implement strategies to realise the full potential of
digital information and services for present and
future generations.
*Adama Samassekou, Convener of the UN World Summit on the Information Society
Background
People should be able to access a global, distributed
knowledge base of scientific data that:
• appears to be integrated
• appears to be locally available
But… data is obtained by multiple means, using
various protocols, in differing vocabularies, using
(sometimes unstated) assumptions, with
inconsistent (or non-existent) meta-data. It may be
inconsistent, incomplete, evolving, and distributed
And… there often exist significant levels of semantic
heterogeneity, large-scale data, complex data
types, legacy systems, inflexible and unsustainable
implementation technology…
What do we need to achieve Semantic eScience?
(in-class brainstorming exercise)
White board exercise…. What do we need to achieve this vision?
What do we need to achieve Semantic eScience?
(in-class brainstorming exercise)
organization, leadership, management strategies, roles and
assignment of roles
dissemination strategy
communication of ideas
- machine level
- human level
conflict resolution
cross-disciplinary
collaboration
flexible
adaptable, feedback
extensible
ability to filter information
usage/application of resources, optimization
facts, knowledge (domain knowledge)
context, domain, scope
goals, use cases
metadata - data to describe data
ability to link information
ability to understand information
ability to capture and represent conflicting ideas
provenance - where data come from
trust - reliable
ability to capture intent (humanitarian aspect / responsibility)
credibility of information
interesting and appealing
standardization
education and outreach
methods and metrics
criteria for evaluation
Outline of the course
• Topics for Semantic e-Science/ Foundations:
– Semantic Methodologies
– Knowledge Representation for e-Science
– Ontology Engineering and Re-Use for e-Science
– Knowledge Integration for e-Science
– Semantic Data Integration
– Semantic Web Languages, Tools and Services
– Knowledge Provenance for e-Science
– Semantic Infrastructure and Architecture for e-Science
– Semantic Grid Middleware
– Ontology Evolution for e-Science
– Knowledge Management for e-Science
– e-Science Workflow Management
– Data life-cycle for e-Science
Contents
• Outline of the course
• Background
• e-Science
• Examples
• Informatics
• Semantics
• Elements of Semantic e-Science (SeS)
• What we expect
• Logistics summary
The Information Era: Interoperability
Modern information and communications
technologies are creating an
“interoperable” information era in which
ready access to data and information can
be truly universal. Open access to data
and services enables us to meet the new
challenges of understanding the Earth and
its space environment as a complex
system:
• managing and accessing large data sets
• higher space/time resolution capabilities
• rapid response requirements
• data assimilation into models
• crossing disciplinary boundaries.
But data has Lots of Audiences
[Figure: audiences for information and information products, ranked from More Strategic to Less Strategic; SCIENTISTS TOO]
From “Why EPO (Education and Public Outreach)?”, a NASA internal report on science education, 2005
Shifting the Burden from the User
to the Provider
Fox CI and X-informatics - CSIG 2008, Aug 11
e-Science
• Emphasis is on Science
• Original narrative: One of the key drivers behind the search for such new
scientific tools is the imminent deluge of data from new generations of
scientific experiments and surveys (*). In order to exploit and explore the
petabytes of scientific data that will arise from these high-throughput
experiments, supercomputer simulations, sensor networks, and satellite
surveys, scientists will need assistance from specialized search engines,
data mining tools, and data visualization tools that make it easy to ask
questions and understand answers. To create such tools, the data will
need to be annotated with relevant "metadata" giving information as to
provenance, content, conditions, and so on; and, in many instances, the
sheer volume of data will dictate that this process be automated.
Scientists will create vast distributed digital repositories of scientific data
requiring management services similar to those of more conventional
digital libraries, as well as other data-specific services. The ability to
search, access, move, manipulate, and mine such data will be a central
requirement for this new generation of collaborative science software
applications. Hey and Trefethen, 2005
Evolving Science
• Thousand years ago:
science was empirical
describing natural phenomena
• Last few hundred years:
theoretical branch
using models, generalizations
• Last few decades:
a computational branch
simulating complex phenomena
• Today:
data exploration (eScience)
synthesizing theory, experiment and
computation with advanced data
management and statistics
-> new algorithms!
• eScience that “understands” meaning of terms
Semantic eScience
[Equation shown on slide: $\left(\frac{\dot{a}}{a}\right)^2 = \frac{4\pi G\rho}{3} - K\frac{c^2}{a^2}$]
Living in an Exponential World
• Scientific data doubles every year
– caused by successive generations
of inexpensive sensors +
exponentially faster computing
• Changes the nature of scientific computing
• Cuts across disciplines (eScience)
• It becomes increasingly harder to extract knowledge
• 20% of the world’s servers go into huge data centers
by the “Big 5”
– Google, Microsoft, Yahoo, Amazon, eBay
• So it is not only the scientific data!
[Figure: log-scale plot (0.1 to 1000) over the years 1970-2000, with series labeled CCDs and Glass]
Collecting Data
• Very extended distribution of data sets:
data on all scales!
• Most datasets are small, and manually
maintained (Excel spreadsheets)
• Total amount of data dominated by the other
end
(large multi-TB archive facilities)
• Most bytes today are collected via electronic
sensors
Making Discoveries
• Where are discoveries made?
– At the edges and boundaries, by inspecting deeper or more data
• Metcalfe’s law
– Utility of computer networks grows as the number of
possible connections: O(N^2)
• Federating data (the connections!!)
– Federation of N archives has utility O(N^2)
– Possibilities for new discoveries grow as O(N^2)
• Many examples
– Sky surveys – galaxy zoo… Very early discoveries from Sloan Digital Sky Survey
(http://www.sdss.org/ ), Two Micron Sky Survey
(http://www.ipac.caltech.edu/2mass/ ), Palomar Digital Sky Survey
(http://www.astro.caltech.edu/~george/dposs/ )
– Genomics+proteomics
– Alzheimer’s article in the reading
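The O(N^2) claim can be checked with a little arithmetic. The sketch below is only an illustration: it assumes "utility" is counted as the number of distinct pairs of archives that could be cross-matched, and the function name is made up for this example.

```python
def possible_connections(n: int) -> int:
    """Number of distinct pairs among n federated archives: n * (n - 1) / 2."""
    return n * (n - 1) // 2


# Doubling the number of archives roughly quadruples the number of pairs.
for n in (5, 10, 20, 40):
    print(n, possible_connections(n))
```

Going from 10 to 20 archives raises the pair count from 45 to 190, which is why federation pays off faster than simply adding archives one at a time.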
Data Delivery: Hitting a Wall
FTP and GREP are not adequate
• You can GREP 1 MB in a second
• You can GREP 1 GB in a minute
• You can GREP 1 TB in 2 days
• You can GREP 1 PB in 3 years
• Oh!, and 1PB ~4,000 disks
• You can FTP 1 MB in 1 sec
• You can FTP 1 GB / min (~1 $/GB)
• … 2 days and 1K$
• … 3 years and 1M$
• At some point you need
indices to limit search
parallel data search and analysis
• This is where databases can help
• Take the analysis to the data!!
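A back-of-the-envelope sketch of why a sequential scan hits a wall, and why indices and parallelism help. It assumes a constant scan rate of 1 GB per minute (taken from the slide) and decimal units; the function names, the selectivity figure and the worker count are illustrative assumptions. The slide's round numbers (2 days, 3 years) are somewhat more pessimistic than this linear extrapolation.

```python
GB_PER_MINUTE = 1.0  # assumed constant scan rate ("1 GB in a minute")
MINUTES_PER_DAY = 60 * 24


def scan_days(size_gb: float, rate_gb_per_min: float = GB_PER_MINUTE) -> float:
    """Days needed to read size_gb sequentially at a fixed rate."""
    return size_gb / rate_gb_per_min / MINUTES_PER_DAY


def indexed_parallel_days(size_gb: float, selectivity: float, workers: int) -> float:
    """Days needed when an index limits the scan to a fraction (selectivity)
    of the data and the remaining work is split evenly across workers."""
    return scan_days(size_gb * selectivity) / workers


one_tb, one_pb = 1_000.0, 1_000_000.0
print(f"1 TB full scan: {scan_days(one_tb):.2f} days")
print(f"1 PB full scan: {scan_days(one_pb) / 365:.2f} years")
# Hypothetical: the index narrows the search to 0.1% of the data, 10 workers.
print(f"1 PB indexed + parallel: {indexed_parallel_days(one_pb, 0.001, 10) * 24:.2f} hours")
```

The full scan grows linearly with the archive size, while the indexed, parallel version depends only on how much data the query actually touches.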
Mind the Gap!
• As a result of finding out who is doing what,
sharing experience/ expertise, and
substantial coordination:
• There is/ was still a gap between science
and the underlying infrastructure and
technology that is available
• Cyberinfrastructure is the new research
environment(s) that support advanced data
acquisition, data storage, data management,
data integration, data mining, data
visualization and other computing and
information processing services over the
Internet.
Informatics - information science includes the
science of (data and) information, the practice
of information processing, and the engineering
of information systems. Informatics studies the
structure, behavior, and interactions of natural
and artificial systems that store, process and
communicate (data and) information. It also
develops its own conceptual and theoretical
foundations. Since computers, individuals and
organizations all process information,
informatics has computational, cognitive and
social aspects, including study of the social
impact of information technologies. Wikipedia.
Progression after progression
[Figure: progression of labels: IT, Cyber Infrastructure, Cyber Informatics, Core Informatics, Science Informatics (aka Xinformatics), Science, Societal Benefit Areas; the whole progression is labeled Informatics]
World-Wide Emerging Technology
Trends
• Innovation will come from parts of the world
other than the U.S.
• The Chinese have skipped the Internet first
generation.
• Growth is occurring in Asia, and decreasing in
previous hot areas such as Western Europe.
• U.S. Industry is compulsively outsourcing abroad.
• Software is moving from forms-based applications
to business processes.
• Networks are migrating to internet protocol and
optical networking technologies.
Cyberinfrastructure
• Data curation and storage
• Federated access
• Collaboration
• New uses in High Performance Computing
• Databases
• Web servers, services (software as service)
• Wiki
• Visualization
• All discipline neutral
Semantic Web Methodology and
Technology Development Process
• Establish and improve a well-defined methodology vision for
Semantic Technology based application development
• Leverage controlled vocabularies, etc.
[Figure: iterative development cycle with the stages Use Case; Small Team, mixed skills; Analysis; Develop model/ontology; Use Tools; Science/Expert Review & Iteration; Adopt Technology Approach; Leverage Technology Infrastructure; Rapid Prototype; Open World: Evolve, Iterate, Redesign, Redeploy; and Evaluation]
Ex. 1: Virtual Observatories
Make data and tools quickly and easily accessible to a
wide audience.
Operationally, virtual observatories need to find the
right balance of data/model holdings, portals and
client software that researchers can use without
effort or interference, as if all the materials were
available on their local computer in their
preferred language: i.e. appear to be local
and integrated
Likely to provide controlled vocabularies that may be
used for interoperation in appropriate domains
along with database interfaces for access and
storage -> thus part Information Technology (IT),
part Cyber Infrastructure (CI), part Informatics and
all about doing new science
SemantEco
• Water Quality Portal Example from previous
classes
• http://inferenceweb.org/wiki/Semantic_Water_Quality_Portal
• We will come back to this later… but will go
over now at a high level.
• Next Motivated by the Virtual Solar Terrestrial
Observatory
Mediation Layer
• Ontology - capturing concepts of Parameters,
Instruments, Date/Time, Data Product (and
associated classes, properties) and Service
Classes
• Maps queries to underlying data
• Generates access requests for metadata,
data
• Allows queries, reasoning, analysis, new
hypothesis generation, testing, explanation, etc.
[Figure: layered architecture. Top: added value (education, clearinghouses, other services, disciplines, etc.). Semantic mediation layer - mid-upper-level: semantic interoperability across Virtual Observatory, Portal, Web Serv., VO API. Semantic mediation layer - VSTO - low level: semantic query, hypothesis and inference; query, access and use of data. Bottom: metadata, schema, data in DB 1, DB2, DB3, …, DB n. Each layer adds value.]
Science and technical use cases
Find data which represents the state of the neutral
atmosphere anywhere above 100km and toward the
arctic circle (above 45N) at any time of high
geomagnetic activity.
– Extract information from the use-case - encode knowledge
– Translate this into a complete query for data - inference and
integration of data from instruments, indices and models
Provide semantically-enabled, smart data query services
via a Simple Object Access Protocol (SOAP) web
service for the Virtual Ionosphere-Thermosphere-Mesosphere
Observatory that retrieve data, filtered by
constraints on Instrument, Date-Time, and Parameter
in any order and with constraints included in any
combination.
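The "any order, any combination" requirement can be pictured as a query function in which every constraint is optional. This is a minimal sketch, not the VSTO service itself: the `Record` type, the `query` function and the catalog entries are hypothetical and invented for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Record:
    instrument: str
    parameter: str
    day: date


def query(records: Iterable[Record],
          instrument: Optional[str] = None,
          parameter: Optional[str] = None,
          start: Optional[date] = None,
          end: Optional[date] = None) -> List[Record]:
    """Return the records that satisfy every constraint that was supplied.
    Constraints left as None are ignored, so they can be combined freely."""
    matches = []
    for r in records:
        if instrument is not None and r.instrument != instrument:
            continue
        if parameter is not None and r.parameter != parameter:
            continue
        if start is not None and r.day < start:
            continue
        if end is not None and r.day > end:
            continue
        matches.append(r)
    return matches


# Hypothetical catalog entries, invented for illustration only.
CATALOG = [
    Record("FPI", "NeutralTemperature", date(2003, 10, 31)),
    Record("ISR", "ElectronDensity", date(2003, 10, 31)),
    Record("FPI", "NeutralTemperature", date(2004, 1, 15)),
]

print(query(CATALOG, instrument="FPI", end=date(2003, 12, 31)))
```

Keyword arguments make the order of constraints irrelevant, which mirrors the use case's wording.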
Inferred plot type
and return required
axes data
Semantic Web Benefits
• Unified/ abstracted query workflow: Parameters, Instruments, Date-Time
• Decreased input requirements for query: in one case reducing the
number of selections from eight to three
• Generates only syntactically correct queries: which could not always be
ensured in previous implementations without semantics
• Semantic query support: by using background ontologies and a
reasoner, our application has the opportunity to expose only coherent
queries (portal and services)
• Semantic integration: in the past users had to remember (and maintain
codes) to account for numerous different ways to combine and plot the
data whereas now semantic mediation provides the level of sensible data
integration required, and exposed as smart web services
– understanding of coordinate systems, relationships, data synthesis,
transformations, etc.
– returns independent variables and related parameters
• A broader range of potential users (PhD scientists, students, professional
research associates and those from outside the fields)
Remembering….data has Lots of
Audiences… Also lay people
More Strategic
Less Strategic
What is a Non-Specialist Use Case?
A teacher accesses the internet, goes
to an Educational Virtual
Observatory and enters a
search for “Aurora”.
Someone should be able to query a
virtual observatory without having
specialist knowledge
What should the User Receive?
Teacher receives four groupings of search
results:
1) Educational materials:
http://www.meted.ucar.edu/topics_spacewx.php
and http://www.meted.ucar.edu/hao/aurora/
2) Research, data and tools: via research VOs
but the search for brightness, or green/red line
emission is mediated for them
3) Did you know?: Aurora is a phenomenon of
the upper terrestrial atmosphere (ionosphere)
also known as Northern Lights
4) Did you mean?: Aurora Borealis or Aurora
Australis, etc.
Semantic Information Integration:
Concept map for educational use of
science data in a lesson plan
Semantic Web Basics
• The triple: {subject-predicate-object}
Interferometer is-a optical instrument
Optical instrument has focal length
An ontology is a representation of this knowledge
• W3C is the primary (but not sole) governing organization for
languages, specifications, best practices, etc.
– RDF - Resource Description Framework
– OWL - Web Ontology Language (OWL 1 in 2004; OWL 2 has been a W3C Recommendation since 2009)
• Encode the knowledge in triples, in a triple-store; software is
built to traverse the semantic network, which can then be queried or
reasoned upon
• Put semantics between/ in your interfaces, i.e. between layers
and components in your architecture, i.e. between ‘users’ and
‘information’ to mediate the exchange
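To make the triple idea concrete, here is a minimal in-memory sketch using plain Python tuples rather than a real triple store. The two triples come from the slide; the `match` helper and the identifier spellings are assumptions made for this illustration.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

STORE: Set[Triple] = {
    ("Interferometer", "is-a", "OpticalInstrument"),
    ("OpticalInstrument", "has", "FocalLength"),
}


def match(store: Set[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> List[Triple]:
    """Return all triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# "What do we know about optical instruments?"
print(match(STORE, s="OpticalInstrument"))
# "Which things are a kind of something else?"
print(match(STORE, p="is-a"))
```

Pattern matching with wildcards is the core idea behind triple-store query languages such as SPARQL, which add variables, joins and filters on top.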
Terminology
• Semantic Web
– An extension of the current web in which information is given well-defined
meaning, better enabling computers and people to work in cooperation,
www.semanticweb.org
– Primer: http://www.ics.forth.gr/isl/swprimer/
• Semantic Grid
– Semantic services to use the resources of many computers connected by a
network to solve large scale computational/ data problems
• Provenance
– origin or source from which something comes, intention for use, who/what
generated for, manner of manufacture, history of subsequent owners, sense
of place and time of manufacture, production or discovery, documented in
detail sufficient to allow reproducibility.
• Service-oriented architecture
– Provision of a capability over the internet via a ‘remote-procedure-call’ using
prescribed input, output and pre-conditions
• Ontology (n.d.). The Free On-line Dictionary of Computing.
http://dictionary.reference.com/browse/ontology
– An explicit formal specification of how to represent the objects, concepts and
other entities that are assumed to exist in some area of interest and the
relationships that hold among them.
Terminology
• Closed World - where complete knowledge is known (encoded), AI relied on this
• Open World - where knowledge is incomplete/ evolving, SW promotes this
• Languages
– OWL - Web Ontology Language (W3C)
– RDF - Resource Description Framework (W3C)
– OWL-S/SWSL - Web Services (W3C)
– WSMO/WSML - Web Services (EC/W3C)
– SWRL - Semantic Web Rule Language, RIF - Rule Interchange Format
– PML - Proof Markup Language
– Editors: Protégé, SWOOP, Medius, SWeDE, …
• Reasoners
– Pellet, Racer, Medius KBS, FACT++, fuzzyDL, KAON2, MSPASS, QuOnto
• Query Languages
– SPARQL, XQUERY, SeRQL, OWL-QL, RDFQuery
• Other Tools for Semantic Web
– Search: SWOOGLE swoogle.umbc.edu
– Collaboration: www.planetont.org
– Other: Jena, SeSAME/SAIL, Mulgara, Eclipse, KOWARI
– Semantic wiki: OntoWiki, SemanticMediaWiki
• Emerging Semantic Standards for Earth Science
– SWEET, VSTO, MMI, GeoSciML
Semantic Web Layers
http://www.w3.org/2003/Talks/1023-iswc-tbl/slide26-0.html, http://flickr.com/photos/pshab/291147522/
Application Areas for Semantics
• Smart search
• Annotation (even simple forms), smart tagging
• Geospatial
• Implementing logic (rules), e.g. in workflows
• Data integration
• Verification …. and the list goes on
• Web services
• Web content mining with natural language parsing
• User interface development (portals)
• Semantic desktop
• Wikis - OntoWiki, SemanticMediaWiki
• Sensor Web
• Software engineering
• Explanation
2007-2008 Hype Cycle for Emerging
Semantic Web Technologies v0.6
[Figure: hype cycle plotting Visibility against Time through the phases Technology trigger, Peak of Inflated Expectations, Trough of Disillusionment, Slope of Enlightenment and Plateau of Productivity. Legend: estimated years to mainstream adoption in Earth science (< 2 years, 2-5 years, 5-10 years, > 10 years, obsolete before plateau). Technologies plotted include: Semantic Web Services; triple stores (e.g. Jena, Sesame, Mulgara, Oracle Spatial); Semantic Wiki; smart search (e.g. NOESIS); Rules/Logic (SWRL); query languages (SPARQL, OWL-QL, commercial and embedded QL); ontology editors (SWOOP, Protégé); concept maps (Cmap); RDF; OWL 1.0; OWL 1.1; tagging/annotation; mid-level ES domain ontologies (e.g. GEON, SWEET); XML; DL reasoners (e.g. Pellet, Racer); SKOS; species validators; FOAF; upper level ontologies (e.g. ABC, DOLCE, SUMO); natural language; managing modular ontologies (ES and general).]
Produced for NASA TIWG semantic web subgroup, April 2008
Semantic Web Roadmap
Timeframes: Current; Near Term (0-2 yrs); Mid Term (2-5 yrs); Long Term (5+ yrs)
Results
• Outcome
– Current: Improved Information Sharing
– Near Term: Increased Collaboration & Interdisciplinary Science
– Mid Term: Acceleration of Knowledge Production
– Long Term: Revolutionizing how science is done
• Output
– Current: Geospatial semantic services established
– Near Term: Geospatial semantic services proliferate
– Mid Term: Scientific semantic assisted services
– Long Term: Autonomous inference of science results
Capability
• Assisted Discovery & Mediation
– Current: Some common vocabulary based product search and access
– Near Term: Semantic geospatial search & inference, access
– Mid Term: Semantic agent-based searches
– Long Term: Semantic agent-based integration
• Interoperable Information Infrastructure
– Current: Local processing + data exchange
– Near Term: Basic data tailoring services (data as service), verification/ validation
– Mid Term: Interoperable geospatial services (analysis as service), results explanation service
– Long Term: Metadata-driven data fusion (semantic service chaining), trust
Technology
• Vocabulary
– Current: SWEET core 1.0 based on GCMD/CF
– Near Term: SWEET core 2.0 based on best practices decided from community
– Mid Term: SWEET 3.0 with semantic callable interfaces via standard programming languages
– Long Term: Reasoners able to utilize SWEET 4.0
• Languages/ Reasoning
– Current: RDF, OWL, OWL-S
– Near Term: Geospatial reasoning, OWL-Time
– Mid Term: Numerical reasoning
– Long Term: Scientific reasoning
Semantic Web Roadmap (capability)
April 2008
Timeframes: Current; Near Term (0-2 yrs); Mid Term (2-5 yrs); Long Term (5+ yrs)
• Assisted Discovery & Mediation
– Current: Some common vocabulary based product search and access
– Near Term: Semantic geospatial search & inference, access
– Mid Term: Semantic agent-based searches
– Long Term: Semantic agent-based integration
• Assisted Knowledge Building
– Current: Some metadata and limited provenance available
– Near Term: Ontologies for data mining, visualization and analysis emerging/ maturing
– Mid Term: Common terminology captured in ontologies, crossing domains
– Long Term: Provenance/ annotation with ontologies in user tools
• Verifiable Information Quality
– Current: Verification is manual with minimal tool support
– Near Term: Ontologies for information quality developed
– Mid Term: Domain and range properties in ontologies used in tools
– Long Term: Service ontologies carry quality provenance
• Responsive Information Delivery
– Current: Services must be hardwired and service agreements established
– Near Term: Services annotated with resource descriptions
– Mid Term: Dynamic service discovery and mediation, and data scheduling
– Long Term: Semantic markup of data latency (time lags) which adapt dynamically
• Interoperable Information services
– Current: Local processing + data exchange
– Near Term: Basic data tailoring services (data as service), verification/ validation
– Mid Term: Interoperable geospatial services (analysis as service), results explanation service
– Long Term: Metadata-driven data fusion (semantic service chaining), trust
• Interactive Data Analysis
– Current: Limited metadata passed to analysis applications
– Near Term: Tag properties, non-jargon vocabulary for non-specialist use
– Mid Term: Shared terminology for the visual properties of interface objects and graph types...
– Long Term: Semantic fields to describe tag key modal functions.
• Seamless Data Access
– Current: Access mediated by agreed standard vocabularies, hard-wired connections
– Near Term: Access mediated by common ontologies
– Mid Term: Mediation aided by services with domain/ range properties
– Long Term: Key data access services are semantically mediated
Roadmap - from near-term to mid-term
• Assisted Discovery & Mediation
– Near Term (0-2 yrs): Semantic geospatial search & inference, access
-> requires agent development and vocabulary for agent characterization
– Mid Term (2-5 yrs): Semantic agent-based searches
• Assisted Knowledge Building
– Near Term: Ontologies for data mining, visualization and analysis emerging/ maturing
-> requires mature (domain and data-type) ontologies with community endorsement and governance and a robust integration framework
– Mid Term: Common terminology captured in ontologies, crossing domains
• Verifiable Information Quality
– Near Term: Ontologies for information quality developed
-> requires mature quality and uncertainty ontologies with domain and range properties added and populated
– Mid Term: Domain and range properties in ontologies used in tools
• Responsive Information Delivery
– Near Term: Services annotated with resource descriptions
-> requires semantic service (ontology) registry
– Mid Term: Dynamic service discovery and mediation, and data scheduling
• Interoperable Information services
– Near Term: Basic data tailoring services (data as service), verification/ validation
-> requires service to implement v/v, new descriptions of analyses, developing explanation
– Mid Term: Interoperable geospatial services (analysis as service), results explanation service
• Interactive Data Analysis
– Near Term: Tag properties, non-jargon vocabulary for non-specialist use
-> requires development of portal modal function vocabulary and ontology, link to domain context and data structure
– Mid Term: Shared terminology for the visual properties of interface objects and graph types...
• Seamless Data Access
– Near Term: Access mediated by common ontologies
-> requires adding properties to classes in ontologies and populating instances with expert agreement
– Mid Term: Mediation aided by services with domain/ range properties
Selected Technical Benefits
1. Integrating Multiple Data Sources
2. Semantic Drill Down / Focused Perusal
3. Statements about Statements
4. Inference
5. Translation
6. Smart (Focused) Search
7. Smarter Search … Configuration
8. Proof and Trust
Updated material reused from “The Substance of the Web”. McGuinness and Dean. Semantic Web Applications for National
Security. May, 2005. http://www.schafertmd.com/swans/agenda.html
1: Integrating Multiple Data
Sources
• The Semantic Web lets us merge
statements from different sources
• The RDF Graph Model allows
programs to use data uniformly
regardless of the source
• Figuring out where to find such
data is a motivator for Semantic
Web Services
[Figure: RDF graph for #Ionosphere with the statements hasCoordinates #magnetic; name “Terrestrial Ionosphere”; hasLowerBoundaryValue “100”; hasLowerBoundaryUnit “km”. Different line & text colors represent different data sources]
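Merging statements from different sources can be sketched as a set union over triples. This is a simplified illustration in plain Python rather than an RDF library; the statements are the ones in the slide's figure, but which statement comes from which source is an assumption, and `describe` is a made-up helper.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]

# Assumed split of the figure's statements across two hypothetical sources.
source_a: Set[Triple] = {
    ("#Ionosphere", "name", "Terrestrial Ionosphere"),
    ("#Ionosphere", "hasCoordinates", "#magnetic"),
}
source_b: Set[Triple] = {
    ("#Ionosphere", "hasLowerBoundaryValue", "100"),
    ("#Ionosphere", "hasLowerBoundaryUnit", "km"),
}

# Because both sources use the same identifier for the ionosphere,
# merging is just a union of their statements.
merged = source_a | source_b


def describe(store: Set[Triple], subject: str) -> Dict[str, str]:
    """Collect every property known about a subject, whatever its source."""
    return {p: o for s, p, o in store if s == subject}


print(describe(merged, "#Ionosphere"))
```

The merge is trivial only because the sources agree on the identifier; when they do not, the mappings discussed under Translation are needed first.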
2: Drill Down / Focused
Perusal
• The Semantic Web uses Uniform
Resource Identifiers (URIs) to
name things
• These can typically be resolved
to get more information about the
resource
• This essentially creates a web of
data analogous to the web of text
created by the World Wide Web
• Ontologies are represented using
the same structure as content
– We can resolve class and
property URIs to learn about the
ontology
[Figure: graph resolved over the Internet, with nodes …#NeutralTemperature, ...#FPI, ...#ISR, ...#MillstoneHill, …#EISCAT and …#Norway, and edges measuredby, type, operatedby and locatedIn]
3: Statements about Statements
• The Semantic Web allows us to
make statements about
statements
– Timestamps
– Provenance / Lineage
– Authoritativeness / Probability /
Uncertainty
– Security classification
– …
• This is an unsung virtue of the
Semantic Web
[Figure: the statement #Aurora hascolor Red, annotated with hasSource #Danny’s and hasDateTime 20031031]
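One way to picture statements about statements is to treat a whole triple as something that can itself carry properties. The sketch below is an assumed reading of the slide's figure (a source and a date attached to "#Aurora hascolor Red"); the `Statement` type and the `statements_from` helper are hypothetical names, and real RDF offers its own mechanisms such as reification and named graphs.

```python
from typing import Dict, List, NamedTuple


class Statement(NamedTuple):
    subject: str
    predicate: str
    obj: str


aurora_is_red = Statement("#Aurora", "hascolor", "Red")

# Metadata keyed by the statement itself: who said it, and when.
ANNOTATIONS: Dict[Statement, Dict[str, str]] = {
    aurora_is_red: {"hasSource": "#Danny's", "hasDateTime": "20031031"},
}


def statements_from(source: str) -> List[Statement]:
    """Return every statement attributed to the given source."""
    return [s for s, meta in ANNOTATIONS.items() if meta.get("hasSource") == source]


print(statements_from("#Danny's"))
print(ANNOTATIONS[aurora_is_red]["hasDateTime"])
```

Keeping the annotation separate from the statement lets a consumer filter by provenance or timestamp without changing the underlying data.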
Ontologies Workshop, APL May 26, 2006
4: Inference
• The formal foundations of
the Semantic Web allow
us to infer additional
(implicit) statements that
are not explicitly made
• Unambiguous semantics
allow question answerers
to infer that objects are
the same, objects are
related, objects have
certain restrictions, …
• SWRL allows us to make
additional inferences
beyond those provided by
the ontology
[Figure: graph around #Millstone Hill and #Interferometer with the properties OperatesInstrument, hasInstrument, isOperatedBy, Measures, hasTypeofData, hasOperatingMo (label truncated on the slide), hasMeasuredData, and the node #VerticalMeans]
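A minimal sketch of inferring implicit statements. It assumes, for illustration only, that isOperatedBy is declared as the inverse of OperatesInstrument (the slide's figure shows both property names but not their definitions); the rule table and the `infer` function are hypothetical, and a real system would use an OWL reasoner.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]

# Assumed ontology axiom: each key is the inverse of its value.
INVERSES: Dict[str, str] = {"OperatesInstrument": "isOperatedBy"}

explicit: Set[Triple] = {
    ("#MillstoneHill", "OperatesInstrument", "#Interferometer"),
}


def infer(store: Set[Triple]) -> Set[Triple]:
    """Apply the inverse-property rule until no new statements appear."""
    inferred = set(store)
    changed = True
    while changed:
        changed = False
        for s, p, o in list(inferred):
            if p in INVERSES:
                new = (o, INVERSES[p], s)
                if new not in inferred:
                    inferred.add(new)
                    changed = True
    return inferred


closure = infer(explicit)
# The statement below was never written down, yet it can be queried.
print(("#Interferometer", "isOperatedBy", "#MillstoneHill") in closure)
```

The loop repeats until a fixed point is reached, which is the general shape of forward-chaining inference even though this rule set needs only one pass.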
5: Translation
• While encouraging sharing,
the Semantic Web allows
multiple URIs to refer to the
same thing
• There are multiple levels of
mapping
– Classes
– Properties
– Instances
– Ontologies
• OWL supports equivalence
and specialization; SWRL
allows more complex
mappings
[Figure: two descriptions of #precipitation. In one, its name is ont1:Precipitation and ont1:EduLevel is VO:Scientist; in the other, its name is ont2:Rain and ont2:EduLevel is EduVO:K-12]
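The mapping idea can be sketched as a lookup from vocabulary-specific terms to a shared concept. The terms come from the slide's figure; treating both as names for #precipitation follows the figure, while the table and function names are assumptions made for this illustration. In OWL, such links would be stated with equivalence or subclass axioms rather than a dictionary.

```python
from typing import Dict

# Assumed mapping table: each ontology-specific term points at a shared concept.
CANONICAL: Dict[str, str] = {
    "ont1:Precipitation": "#precipitation",  # term used for scientists
    "ont2:Rain": "#precipitation",           # term used for K-12 audiences
}


def canonical(term: str) -> str:
    """Translate a term to its shared concept; unknown terms map to themselves."""
    return CANONICAL.get(term, term)


def same_concept(a: str, b: str) -> bool:
    """True when two terms from different vocabularies denote the same concept."""
    return canonical(a) == canonical(b)


print(same_concept("ont1:Precipitation", "ont2:Rain"))
print(same_concept("ont2:Rain", "ont1:Temperature"))
```

A query written with the K-12 term can then be answered from data annotated with the scientist's term, which is the translation benefit the slide describes.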
6: Smart (Focused) Search
• The Semantic Web
associates 1 or more
classes with each
object
• We can use ontologies
to enhance search by:
– Query expansion
– Sense disambiguation
– Type with restrictions
– ….
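Query expansion can be illustrated with a small class hierarchy: a search for a general term is widened to include its more specific terms. The hierarchy below is hypothetical, borrowing the Aurora terms from the non-specialist use case, and `expand` is a made-up helper.

```python
from typing import Dict, List

# Hypothetical subclass hierarchy: general term -> more specific terms.
SUBCLASSES: Dict[str, List[str]] = {
    "Aurora": ["AuroraBorealis", "AuroraAustralis"],
}


def expand(term: str, hierarchy: Dict[str, List[str]]) -> List[str]:
    """Return the term followed by all of its direct and indirect subclasses."""
    seen: List[str] = []
    stack = [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guards against cycles in the hierarchy
        seen.append(current)
        # Reverse so that subclasses are visited in their listed order.
        stack.extend(reversed(hierarchy.get(current, [])))
    return seen


print(expand("Aurora", SUBCLASSES))
```

A search engine would then match documents tagged with any of the expanded terms, so a query for "Aurora" also finds material labeled only as Aurora Borealis.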
7: Smarter Search / Configuration
GEONGRID Ontology Search
and Data Integration Example
Uses emerging web standards to enable smart web
applications
Given an upper-level domain choice
• Ecology
Illustrate or list contained concepts/hierarchy
• VegetationCover, TreeRings, etc.
Retrieve some specific options from web
• Maps, tree-ring data,
Info: https://portal.geongrid.org:8443/gridsphere/gridsphere
8: Proof
• The logical foundations
of the Semantic Web
allow us to construct
proofs that can be used
to improve transparency,
understanding, and trust
• Proof and Trust are on-going
research areas for
the Semantic Web: e.g.,
See PML and Inference Web
[Figure: #Critical Dataset hasCalibration #FlatField; #FlatField hasPeerReview #Solar Physics Paper. “Critical Dataset has been calibrated with a flat field program that is published in the peer reviewed literature.”]
Inference Web
Framework for explaining reasoning tasks by storing,
exchanging, combining, annotating, filtering, segmenting,
comparing, and rendering proofs and proof fragments
provided by multiple distributed reasoners.
• OWL-based Proof Markup Language (PML) specification as
an interlingua for proof interchange
• IWExplainer for generating and presenting interactive
explanations from PML proofs providing multiple dialogues
and abstraction options
• IWBrowser for displaying (distributed) PML proofs
• IWBase distributed repository of proof-related meta-data such
as inference engines/rules/languages/sources
• Integrated with theorem provers, text analyzers, web
services, …
http://iw.rpi.edu
Inference Web Infrastructure
(McGuinness, et al., 2004 http://www.ksl.stanford.edu/KSL_Abstracts/KSL-04-03.html )
[Figure: Inference Web infrastructure. Sources and reasoners: Files/WWW; Semantic Discovery Service (DAML/SNRC); CWM (NSF TAMI); JTP (DAML/NIMD); SPARK (DARPA CALO); UIMA (DTO NIMD Text Analytics Exp Aggregation). Formats shown: OWL-S/BPEL, N3, KIF, SPARK-L. Core: Proof Markup Language (PML) covering Trust, Justification and Provenance. Toolkit: IWTrust (trust computation); IW Explainer/ Abstractor (end-user friendly visualization); IWBrowser (expert friendly visualization); IWSearch (search engine based publishing); IWBase (provenance registration)]
Framework for explaining question answering tasks by
• abstracting, storing, exchanging,
• combining, annotating, filtering, segmenting,
• comparing, and rendering proofs and proof fragments
provided by question answerers.
SW Questions & Answers
Users can explore extracted entities and relationships, create new
hypotheses, ask questions, browse answers and get explanations for
answers.
[Figure: screenshot showing a question, an answer, a context for explaining the answer, and an abstracted explanation]
(this graphical interface done by Battelle supported by Stanford KSL)
Summary
• Semantics is a key ingredient for progress in
informatics and eScience
• Sustained involvement of key inter-disciplinary
team members is very important -> leads to
incentives, rewards, etc. and a balance of research
and production
• This is what we will be teaching you in this class
Semantic Web Methodology and
Technology Development Process
• Establish and improve a well-defined methodology vision for
Semantic Technology based application development
• Leverage controlled vocabularies, etc.
[Figure: iterative development cycle with the stages Use Case; Small Team, mixed skills; Analysis; Develop model/ontology; Use Tools; Science/Expert Review & Iteration; Adopt Technology Approach; Leverage Technology Infrastructure; Rapid Prototype; Open World: Evolve, Iterate, Redesign, Redeploy; and Evaluation]
Outline of the course
• Topics for Semantic e-Science/ Foundations:
– Semantic Methodologies
– Knowledge Representation for e-Science
– Ontology Engineering and Re-Use for e-Science
– Knowledge Integration for e-Science
– Semantic Data Integration
– Semantic Web Languages, Tools and Services
– Semantic Infrastructure and Architecture for e-Science
– Semantic Grid Middleware
– Ontology Evolution for e-Science
– Knowledge Management for e-Science
– e-Science Workflow Management
– Data life-cycle for e-Science
– Data Mining and Knowledge Discovery
SeS Applications and Ontologies
• Semantic Web for Health Care and Life Science
• Semantic Web for Bio-Med-informatics
• Semantic Web for System and Integrated Biology
• Semantic Web for Sun, Earth, Environment and
Climate
• Semantic Web for Chemistry, Physics and
Astronomy
• Semantic Web for Engineering
• Semantic Web and Digital Libraries and Scientific
Publications
SeS Project options
• Configuration and Deployment of Semantic Virtual
Observatories
– Oceanography, astronomy, geology
– Particularly convenient ones – around water quality, first
responder data
• Semantic Advisors – e.g., Semantic Sommelier
• Ontology Merging and Validation Test-bed
• Semantic Language and Tool Use and Evaluation
• Semantic eScience Implementation Evaluation
• Semantic Collaboration Case Studies
• Semantic Application Development and
Demonstration
Schedule - wiki
• Reading assignments
• Assignments
– Individual
– Group
• Written assessments
• Presentation assessments
• Group assessments
What we expect
• Attend class, complete assignments
• Participate
• Ask questions – be honest with yourself and
others about what you do and do not know
• Work both individually and in a group
• Work constructively in group and class
sessions
Logistics summary
• Class - Monday 1-3:50pm
• Office hours – By Appointment along with a regular time to be
determined for TA (probably before and tetherless night –
Twed)
• This week's assignment:
– Reading - Ontologies 101* (this one is very important),
Semantic Web, e-Science, RDFS
– Turn in a one page description of one of your favorite papers
from the reading list AND WHY
• Next class (week 2 – two weeks from today - note Labor Day):
– Foundations I: Methodologies, Knowledge Representation
– Use Cases
• If you have a background that you think needs some extra
background reading, talk to us.
Extra