SW_PromTech_CurrentApp_FutureDir.ppt

advertisement
Semantic Web: Promising Technologies,
Current Applications & Future Directions
Invited and Colloquia talks at: Swinburne Institute of Technology –Melbourne (July
18), University of Adelaide-Adelaide (July 23), University of Melbourne- Melbourne
(July 31), Victoria University- Melbourne
Australia, 2008
Amit P. Sheth
amit.sheth@wright.edu
Kno.e.sis Center, Comp. Sc & Engg
Wright State University, Dayton OH, USA
Thanks Kno.e.sis team and collaborators
Knowledge Enabled Information and Services Science
Outline
• Semantic Web –key capabilities and
technlologies
• Real-world Applications demonstrating
benefit of semantic web technologies
• Exciting on-going research
Knowledge Enabled Information and Services Science
Evolution of the Web
Web as an oracle / assistant /
partner
- “ask the Web”: using semantics
to leverage text + data + services
2007
- Powerset
Web of people
- social networks, user-created casual
content
- Twine, GeneRIF, Connotea
Web of resources
- data = service = data, mashups
- ubiquitous computing
1997
Web of databases
- dynamically generated pages
- web query interfaces
Web of pages
- text, manually created links
- extensive navigation
Knowledge Enabled Information and Services Science
1
2
3
of
Semantic Web
Knowledge Enabled Information and Services Science
1
• Ontology: Agreement with a common
vocabulary/nomenclature, conceptual
models and domain Knowledge
• Schema + Knowledge base
• Agreement is what enables interoperability
• Formal description - Machine processability
is what leads to automation
Knowledge Enabled Information and Services Science
2
• Semantic Annotation (Metadata Extraction):
Associating meaning with data, or labeling
data so it is more meaningful to the system
and people.
• Can be manual, semi-automatic (automatic
with human verification), automatic.
Knowledge Enabled Information and Services Science
3
• Reasoning/Computation: semantics
enabled search, integration, answering
complex queries, connections and analyses
(paths, sub graphs), pattern finding, mining,
hypothesis validation, discovery,
visualization
Knowledge Enabled Information and Services Science
Different foci
• TBL – focus on data: Data Web (“In a way, the
Semantic Web is a bit like having all the databases out
there as one big database.”)
• Others focus on reasoning and intelligent processing
Knowledge Enabled Information and Services Science
SW Stack: Architecture, Standards
Knowledge Enabled Information and Services Science
From Syntax to Semantics
Deep semantics
Shallow semantics
Knowledge Enabled Information and Services Science
a little bit about ontologies
Knowledge Enabled Information and Services Science
Many Ontologies Available Today
Open Biomedical Ontologies
Open Biomedical Ontologies, http://obo.sourceforge.net/
Knowledge Enabled Information and Services Science
From simple ontologies
Knowledge Enabled Information and Services Science
Drug Ontology Hierarchy
(showing is-a relationships)
non_drug_
reactant
interaction_
property
formulary_
property
formulary
indication
monograph
_ix_class
prescription
_drug_
property
cpnum_
group
property
indication_
property
brandname_
individual
brandname_
undeclared
prescription
_drug_
brand_name
brandname_
composite
generic_
composite
prescription
_drug
prescription
_drug_
generic
owl:thing
interaction
interaction_
with_prescri
ption_drug
generic_
individual
Knowledge Enabled Information and Services Science
interaction_
with_non_
drug_reactant
interaction_
with_mono
graph_ix_cl
ass
to complex ontologies
Knowledge Enabled Information and Services Science
N-Glycosylation metabolic pathway
N-glycan_beta_GlcNAc_9
GNT-I
attaches GlcNAc at position 2
N-acetyl-glucosaminyl_transferase_V
N-glycan_alpha_man_4
GNT-V
attaches
GlcNAc at position 6
UDP-N-acetyl-D-glucosamine + alpha-D-Mannosyl-1,3-(R1)-beta-D-mannosyl-R2
<=>
UDP + N-Acetyl-$beta-D-glucosaminyl-1,2-alpha-D-mannosyl-1,3-(R1)-beta-D-mannosyl-$R2
UDP-N-acetyl-D-glucosamine + G00020 <=> UDP + G00021
Knowledge Enabled Information and Services Science
A little bit about semantic metadata
extractions and annotations
Knowledge Enabled Information and Services Science
Extraction for Metadata Creation
WWW, Enterprise
Repositories
Nexis
UPI
AP
Feeds/
Documents
Digital Videos
...
...
Data Stores
Digital Maps
...
Digital Images
Create/extract as much (semantics)
metadata automatically as possible;
Use ontlogies to improve and enhance
extraction
Digital Audios
EXTRACTORS
METADATA
Knowledge Enabled Information and Services Science
Automatic Semantic Metadata
Extraction/Annotation
Knowledge Enabled Information and Services Science
Semantic Web in Action
Supporting Clinical Decision Making
Knowledge Enabled Information and Services Science
1. Supporting Clinical Decision Making
• Status: In use today
• Where: Athens Heart Center
• What: Use of Semantic Web technologies
for clinical decision support
Knowledge Enabled Information and Services Science
Operational Since January 2006
Knowledge Enabled Information and Services Science
Active Semantic Electronic Medical Records
(ASEMR)
Goals:
• Increase efficiency with decision support
• formulary, billing, reimbursement
• real time chart completion
• automated linking with billing
• Reduce Errors, Improve Patient Satisfaction & Reporting
• drug interactions, allergy, insurance
• Improve Profitability
Technologies:
• Ontologies, semantic annotations & rules
• Service Oriented Architecture
Thanks -- Dr. Agrawal, Dr. Wingeth, and others. ISWC2006 paper
Knowledge Enabled Information and Services Science
ASEMR - Demonstration
Click to Launch
Knowledge Enabled Information and Services Science
Further Opportunity:
Clinical and Biomedical Data
binary
text
Scientific
Literature
Health
Information
Services
PubMed
300 Documents
Published Online
each day
Elsevier
iConsult
NCBI
User-contributed
Content (Informal) Public Datasets
GeneRifs
Genome,
Protein DBs
new sequences
daily
Clinical Data
Personal
health history
Laboratory
Data
Lab tests,
RTPCR,
Mass spec
Search, browsing, complex query, integration, workflow,
analysis, hypothesis validation, decision support.
Knowledge Enabled Information and Services Science
Semantic Web in Action
Querying Integrated Data Sources
Knowledge Enabled Information and Services Science
2. Querying Integrated Data Sources
• Status: Completed research
• Where: NIH
• What: Querying Integrated Data Sources
– Enriching data with ontologies for integration, querying,
and automation
– Ontologies beyond vocabularies : the power of
relationships
Knowledge Enabled Information and Services Science
Using Data to Test Hypothesis
Link between glycosyltransferase activity and
congenital muscular dystrophy?
Gene name
Interactions
Glycosyltransferase
GO
gene
Sequence
PubMed
OMIM
Congenital muscular dystrophy
Adapted from: Olivier Bodenreider, presentation at HCLS Workshop, WWW07
Knowledge Enabled Information and Services Science
On the World Wide Web
(GeneID: 9215)
has_associated_disease
Congenital muscular
dystrophy,
type 1D
has_molecular_function
Adapted from: Olivier Bodenreider, presentation at HCLS Workshop, WWW07
Knowledge Enabled Information and Services Science
Acetylglucosaminyltransferase activity
With the Semantically Enhanced Data
SELECT DISTINCT ?t ?g ?d {
?t is_a GO:0016757 .
GO:0016757
?g has molecular functionglycosyltransferase
?t .
?g has_associated_phenotype ?b2 .
isa
?b2 has_textual_description ?d .
FILTER (?d, “muscular distrophy”, “i”) . GO:0008194
FILTER (?d, “congenital”,GO:0016758
“i”)
}
GO:0008375
acetylglucosaminyltransferase
GO:0008375
acetylglucosaminyltransferase
MIM:608840
Muscular dystrophy,
congenital, type 1D
has_molecular_function
LARGE
EG:9215
has_associated_phenotype
From medinfo paper.
Adapted from: Olivier Bodenreider, presentation at HCLS Workshop, WWW07
Knowledge Enabled Information and Services Science
Semantic Web in Action
Industry Examples
Knowledge Enabled Information and Services Science
•
•
•
•
•
•
Zemanta
Twine
Digger
Calais – Reuters Thompson
Powerset
Talis
Knowledge Enabled Information and Services Science
Emerging Research Areas
Knowledge Enabled Information and Services Science
Fact Extraction and Schema
Creation
Knowledge Extraction from
Community-Generated Content
Knowledge Enabled Information and Services Science
Fact Extraction From Community Content
• Search helps us find relevant pages/articles
• But: It doesn’t answer questions.
Knowledge Enabled Information and Services Science
Fact Extraction From Community Content
• Fact Extraction is the first step towards
answering questions.
• Famous new company that does fact
extraction from Wikipedia is Powerset
http://www.powerset.com
Knowledge Enabled Information and Services Science
Fact Extraction From Community Content
• Problem: without a guiding schema,
extracted predicates are just terms
•  useful for humans, but not for machines
Knowledge Enabled Information and Services Science
Fact Extraction From Community Content
• Expert-created schemas are expensive and
usually very restricted
Knowledge Enabled Information and Services Science
Fact Extraction From Community Content
• Solution: Have a community-generated
schema
•  Wikipedia hierarchy for terms and
concepts
Knowledge Enabled Information and Services Science
Hierarchy Creation
Wikigraph-Based expansion
Graph Search
Query: “cognition”
Seed Query
Fulltext Concept
Search
Graph Search
Graph Search
B
Wikipedia
Hierarchy Creation
Knowledge Enabled Information and Services Science
Fact Extraction From Community Content
• Solution: Have a community-generated
schema
Relationship
•  Wikipedia
types hierarchy for terms and
concepts
– See Automatic Domain Model creation
•  Wikipedia Infoboxes for relationship
types
Knowledge Enabled Information and Services Science
Learn Patterns that indicate Relationships
Query:
Australia 
Sidney
Australia
 New South Wales, Australia
•in Sydney,
•in Sydney, New South Wales, Australia
Canberra
•Sydney is the most populous city in Australia
•Sydney is the most populous city in Australia
We know
that countries
•Canberra,
the Australian
capitalhave
city capitals.
•Canberra, the Australian capital city
Which
one
is Australia’s?
•Canberra
is the
capital
city of the Commonwealth of
•Canberra is the capital city of the Commonwealth of
Australia
Australia
•Canberra, the Australian capital
•Canberra, the Australian capital
Knowledge Enabled Information and Services Science
Add relationships
Knowledge Enabled Information and Services Science
Fact Extraction From Community Content
• The accumulation of many pattern
occurrences give the necessary support
– Canberra, Australia  minimal positive support
– The Australian capital of Canberra 
additional major support
Knowledge Enabled Information and Services Science
Summary
• Create Domain models from seed queries
or seed concepts
• Connect the concepts in the created
domain models with valid relationships
– Learn pertinent patterns for relationships
– Find evidence for relationships in text
• Wikipedia
• WWW
Knowledge Enabled Information and Services Science
Discovering Undiscovered
Knowledge
Connecting the Dots
Knowledge Enabled Information and Services Science
How are Harry Potter and Dan Brown related?
mentioned_in
Nicolas Flammel
Harry Potter
mentioned_in
Nicolas Poussin
member_of
The Hunchback of
Notre Dame
painted_by
written_by
cryptic_motto_of
Et in Arcadia Ego
Victor Hugo
Holy Blood, Holy Grail
member_of
Priory of Sion
mentioned_in
displayed_at
member_of
The Da Vinci code
mentioned_in
painted_by
Leonardo Da Vinci
The Louvre
The Mona Lisa
painted_by
displayed_at
The Last Supper
painted_by
displayed_at
The Vitruvian man
Santa Maria delle
Grazie
Knowledge Enabled Information and Services Science
Motivation
• Undiscovered Public Knowledge
[Swanson 89]
– Hidden connections in text
• Our objective: build mechanisms to reveal
these connections
• Our approach:
o Populate existing ontology schemas via
information extraction from text
o Use the extracted information to
o Support browsing
o Text retrieval
o Knowledge discovery
Knowledge Enabled Information and Services Science
Discovering the Undiscovered Knowledge
• Swanson’s
discoveries – Associations
Magne
leads to between
loss Migraine and
Magnesium [Hearst99]
•
•
•
•
•
•
•
•
stress is associated with migraines
sium
stress can lead to loss of magnesium
Stress
calcium channel blockers prevent some migraines
magnesium is a natural calcium channel blocker
Spreading
spreading cortical depression (SCD) is implicated
in some migraines
high levels of magnesium inhibit SCD
Cortical
migraine patients have high platelet aggregability
Depression
magnesium can suppress platelet aggregability
platelet
aggregability
• Data sets generated
using these entities (marked red above) as
boolean keyword queries against pubmed
Calcium
Channel
Blockers
prevents
Migraine
• Bidirectional breadth-first search used to find paths in resulting RDF
Knowledge Enabled Information and Services Science
Background Knowledge Used
• UMLS – A high level schema of the biomedical
domain
– 136 classes and 49 relationships
– Synonyms of all relationship – using variant lookup
(tools from NLM)
– 49 relationship + their synonyms = ~350 mostly
verbs
T147—effect
T147—induce
T147—etiology
T147—cause
T147—effecting
T147—induced
• MeSH
– 22,000+ topics organized as a forest of 16 trees
– Used to query PubMed
• PubMed
– Over 16 million abstract
– Abstracts annotated with one or more MeSH terms
Knowledge Enabled Information and Services Science
Method – Parse Sentences in PubMed
SS-Tagger (University of Tokyo)
SS-Parser (University of Tokyo)
• Entities (MeSH terms) in sentences occur in modified forms
• “adenomatous”
modifies
“hyperplasia”
(TOP (S
(NP (NP (DT An)
(JJ excessive)
(ADJP (JJ endogenous) (CC or) (JJ
• “An excessive
endogenous
or exogenous
modifies
exogenous)
) (NN stimulation)
) (PP
(IN by) (NPstimulation”
(NN estrogen)
) ) ) (VP (VBZ
“estrogen”
induces)
(NP (NP (JJ adenomatous) (NN hyperplasia) ) (PP (IN of) (NP (DT
• Entities
can also occur) as
of 2 or more other entities
the)
(NN endometrium)
) ) composites
)))
• “adenomatous hyperplasia” and “endometrium” occur as “adenomatous
hyperplasia of the endometrium”
Knowledge Enabled Information and Services Science
Knowledge Enabled Information and Services Science
Utilizing Extracted Knowledge
Supporting browsing, querying and
knowledge discovery
– Semantic Browser
– Query semi-structured representations
• SPARQL
• Hypothesis-Driven Retrieval
– Discovery complex connection patterns
• Knowledge Discovery operators
Knowledge Enabled Information and Services Science
Example - Evaluating Hypotheses
Migraine
affects
Magnesium
Stress
inhibit
Patient
isa
Calcium Channel
Blockers
Complex
Query
Keyword query: Migraine[MH] + Magnesium[MH]
PubMed
Supporting
Document
sets
retrieved
Knowledge Enabled Information and Services Science
Example - Semantic Browser
Click to Launch
Knowledge Enabled Information and Services Science
Web 2.0
Man Meets Machine
Knowledge Enabled Information and Services Science
Putting the man back in Semantics
Semantic Web focuses on artificial agents
“Web 2.0 is made of people” (Ross Mayfield)
“Web 2.0 is about systems that harness
collective intelligence.” (Tim O’Reilly)
The relationship web combines the skills of
humans and machines
Knowledge Enabled Information and Services Science
Putting the man back in Semantics
Semantic Web focuses on artificial agents
“Web 2.0 is made of
people” (Ross Mayfield)
“Web 2.0 is about
systems that
harness collective
intelligence.”
(Tim O’Reilly)
The relationship web combines the skills of humans and machines
Knowledge Enabled Information and Services Science
Going places …
Formal
Powerful
Implicit
Social,
Informal
Knowledge Enabled Information and Services Science
A Community’s Pulse
• Wealth of information available in blogs, social
networks, chats etc.
• Free medium of self-expression makes mass
opinions / interests available
• Polling for popular culture opinions is easier
• Social Production undeniably affects markets
• Results of analysis more effectively tailored to
specific audience : geo-specific retail ads,
demographic interests in music
Knowledge Enabled Information and Services Science
Buzz on MySpace
Mining artist popularity from chatter on
MySpace
- Lists close to listeners preferences
User Comments:
vs.
BB
May 07
- Bill Boards
Rihanna
Rihanna
User Comments:
Jun 07
Rihanna
Biffy Clyro
Winehouse
Winehouse
Twang
Maroon 5
Maroon 5
Maroon 5
Mccartney
Mccartney
McCartney
Biffy Clyro
Biffy Clyro
Winehouse
Twang
Rascal
Rascal
Rascal
Twang
Knowledge Enabled Information and Services Science
The How
Metadata Extraction from Comments
 Artist, Track name in comments are common words
 “Keep your smile on Lil.” (Artist: Lilly Allen, Track: Smile)
 Necessitate a combination of linguistic, statistical, domain knowledge and
domain specific rules to do well
Detecting and discarding Spam
 Accurate popularity estimates
Transliterating Slang
 I say: “Your music is wicked”
 What I really mean: “Your music is good”
Hypercube: Demographics' of users who post, nonspam positive and negative sentiment comment counts
 Lets one ask questions like “Who is the most popular artists among the 19
year olds in New York?”
Knowledge Enabled Information and Services Science
Opportunities
 Casual Text more and more pervasive
 Extracting Semantic Metadata a whole different
problem
 What works for a news article, scientific literature
does not work well for content that does not follow
rules of edited text
 Need to systematically understand differences
in these types of text in order to improve
enablers like entity extraction
Knowledge Enabled Information and Services Science
Event Web and the Semantic Sensor
Web
Time, Space and Theme
Knowledge Enabled Information and Services Science
Events – Spatial, Temporal and Thematic
Spatial
Temporal
Thematic
Knowledge Enabled Information and Services Science
Events and STT Dimensions
E2:Soldier
E4:Address
located_at
lives_at
located_at
lives_at
E6:Address
Georeferenced Coordinate
Space
(Spatial Regions)
E1:Soldier
assigned_to
occurred_at
E7:Battle
participates_in
E8:Military_Unit
E8:Military_Unit
participates_in
E5:Battle
assigned_to
occurred_at
Residency
Battle Participation
Named Places
Spatial Occurrents
Dynamic Entities
Knowledge Enabled Information and Services Science
E3:Soldier
Scenario: Sensor Data Fusion and Analysis
High-level Sensor
Low-level Sensor
How do we determine if the
three images depict …
• the same time and same
place?
• the same entity?
• a serious threat?
Knowledge Enabled Information and Services Science
90
Data Pyramid
Sensor Data Pyramid
Ontology
Metadata
Knowledge
Entity Metadata
Information
Feature Metadata
Raw Sensor (Phenomenological) Data
Data
Knowledge Enabled Information and Services Science
What is Sensor Web Enablement?
http://www.opengeospatial.org/projects/groups/sensorweb
Knowledge Enabled Information and Services Science
92
SWE Components - Languages
Information
Model for
Observations
and Sensing
Sensor and
Processing
Description
Language
Observations &
Measurements
(O&M)
GeographyML
(GML)
Common Model
for Geographical
Information
SensorML
(SML)
TransducerML
(TML)
Sam Bacharach, “GML by OGC to AIXM 5 UGM,”
OGC, Feb. 27, 2007.
Knowledge Enabled Information and Services Science
Real Time
Streaming
Protocol
SWE Components – Web Services
Discover
Services
Sensors
Providers
Data
Sensor Planning Service:
Command and Task
Sensor Systems
Sensor
Observation
Service:
Access Sensor
Description
and Data
SOS
SPS
SAS
Sensor Alert
Service
Dispatch
Sensor Alerts
to registered
Users
Catalog
Service
Sam Bacharach,
“GML by OGC to AIXM 5 UGM,”
OGC, Feb. 27, 2007.
Clients
Accessible from various
types of clients from PDAs
and Cell Phones to high
end Workstations
Knowledge Enabled Information and Services Science
Semantic Sensor Web
Knowledge Enabled Information and Services Science
95
Conclusion
Knowledge Enabled Information and Services Science
Take Home Points
Semantics - from documents, to entities, to relationships
– Richer, meaningful representations offer more insight,
powerful reasoning capabilities
Semantics and Web technologies for integration of
information from disparate sources, often created for very
different purposes with lesser human involvement
Semantic Web is highly interdisciplinary – uses IR, AI, KR,
DB, DC, ...
Increasing mesh of Semantics, Services, People for better
exploitation of resources (data, sensors, services, people)
Knowledge Enabled Information and Services Science
Kno.e.sis Labs (3rd floor, Joshi)
Semantic Sciences Lab (Dr Sheth)
Bioinformatics Lab (Dr Raymer)
Semantic Web Lab (Dr Sheth + Dr. S.Wang)
Service Research Lab (Dr Sheth)
Metadata and Languages Lab (Dr Prasad)
Data Mining Lab (Dr Dong)
Joint Proposals With Each
Sensor Networking Bin Wang
Knowledge Enabled Information and Services Science
Kno.e.sis Members – a subset
Knowledge Enabled Information and Services Science
References
Projects: http://knoesis.org/research/
Demos at:
http://knoesis.wright.edu/library/demos/
Publications: http://knoesis.wright.edu/library
Rest: http://knoesis.org
Thanks to our key sponsors: National Science
Foundation, National Institute of Health, AFRL
and industry partners.
Knowledge Enabled Information and Services Science
Download