Research in Semantics and Services Science October 26, 2006

Knowledge Enabled Information and Services Science (kno.e.sis)

Where do we fit in Computer Science?
• Semantic Web
• Service Oriented Computing
• Business Process Management
• Data Management and Mining
• Bioinformatics

What capabilities do we have?
• ontology management and multi-ontology environments
• integration and analysis of heterogeneous data (structured, semistructured, unstructured)
• advanced and intelligent search, browsing, querying, mining, analysis and knowledge discovery
• semantic annotation of documents, scientific data and services, involving entity and relationship extraction/disambiguation
• semantic enhancement of Web 2.0, including social search and lightweight services; semantic middleware and semantics-enabled networking
• semantic Web services and processes, including semantics-based publication, discovery, composition and dynamic binding of services

Where are our application areas?
• e-Science
  - bioinformatics, biomedicine, health care
• Web-based Information Management
  - search and business intelligence
• National and Homeland Security
  - intelligence analysis
Overview
• Semantic Web Services and Processes (METEOR-S)
• Entity/Relationship Extraction, Disambiguation and Annotation (Semantic Middleware)
• Semantic Analytics (SemDis)
• Semantics for Life Sciences (Bioinformatics for Glycan Expression)

Semantic Web Services and Processes
Challenges
“Each enterprise will measure and aspire to its own unique level of dynamism based on its individual purpose. It is about being nimble and adaptable. A fully integrated business platform can respond faster, and completely, to change. Whether it involves fulfilling a new mandate or embracing a new market opportunity. Some organizations will push the envelope, automating event-triggered responses for highly integrated closed-loop processes, setting the stage for self-optimizing systems.”
Sandra Rogers, White Paper: Business Forces Driving Adoption of Service Oriented Architecture, sponsored by SAP AG

• Business/Organizational Challenges
  - How to effectively create new business solutions using a global workforce
  - How to make IT more responsive to business strategy
• Technical/Tactical Challenges
  - How to add more dynamism in business process creation
  - How to make processes adapt to changing environments
Ontologies to Describe Service Semantics
(ontologies are about agreements)
[Figure: ontologies characterized along three axes: the aspect of agreement (People, Organization, Technical), the scope of agreement (Task/App, Domain Industry, Gen. Purpose, Broad Based, Common Sense), and the kind of semantics agreed on (Data/Info., Functional, Non Functional, Execution)]

Autonomic Web Process*
[Figure: layered view of an autonomic Web process: Strategy Layer (Corporate Strategy and Goals), Operational Layer (Modeling Business Process to provide business services), IT Layer (SOA Based IT Processes and Services) and Implementation Layer (Databases, OS, etc.). Desired properties: Self Healing, Agile, Self Optimizing, Self Configuring. Example requirements: "Only provide customer support to gold customers"; "If cost > $$$$, customer = gold"]
*it’s about the business, not just computing resources
Semantics for Technical Services
• Data/Information Semantics
  - What: (semi-)formal definition of data in the input and output messages of a Web service
  - Why: for discovery and interoperability
  - How: by annotating the input/output data of Web services using ontologies
• Functional Semantics
  - (semi-)formally representing the capabilities of a Web service
  - for discovery and composition of Web services
  - by annotating the operations of Web services and providing their preconditions and effects
• Execution Semantics
  - (semi-)formally representing the execution or flow of services in a process, or of operations in a service
  - for analysis (verification), validation (simulation) and execution (exception handling) of the process models
  - using state machines, Petri nets, activity diagrams, etc.
• Non Functional Semantics (WS-*)
  - (semi-)formally representing qualitative and quantitative measures of a Web process
  - non-quantitative measures include security and transactions
  - quantitative measures include cost, time, etc.
  - business constraints and inter-service dependencies (domain and application ontologies)
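To illustrate data and functional semantics, here is a minimal sketch of discovery by matching ontology-annotated inputs and outputs: a service qualifies if it accepts what the requester can provide and returns the requested concept or a subclass of it. The toy ontology, concept names and service names are hypothetical, and the dictionary walk stands in for the subsumption reasoning an ontology reasoner would perform.

```python
from typing import Dict, List, Optional

# Hypothetical toy ontology: concept -> direct superclass (None marks the root).
SUPERCLASS: Dict[str, Optional[str]] = {
    "Thing": None,
    "Order": "Thing",
    "PurchaseOrder": "Order",
    "Invoice": "Thing",
}

# Hypothetical services whose input/output messages are annotated with ontology concepts.
SERVICES: Dict[str, Dict[str, str]] = {
    "placeOrder": {"input": "PurchaseOrder", "output": "Invoice"},
    "archive": {"input": "Thing", "output": "Thing"},
}


def is_subclass_of(concept: str, ancestor: str) -> bool:
    """True if `concept` is `ancestor` or inherits from it."""
    current: Optional[str] = concept
    while current is not None:
        if current == ancestor:
            return True
        current = SUPERCLASS.get(current)
    return False


def discover(provided_input: str, wanted_output: str) -> List[str]:
    """Services that accept what we can provide and return what we want (or something more specific)."""
    return [
        name
        for name, sig in SERVICES.items()
        if is_subclass_of(provided_input, sig["input"])
        and is_subclass_of(sig["output"], wanted_output)
    ]


print(discover("PurchaseOrder", "Invoice"))  # ['placeOrder']
```

The generic `archive` service accepts a PurchaseOrder but is rejected because its output (Thing) is more general than the requested Invoice.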
Semantics for Technical Services
[Figure: Web service lifecycle stages, with related standards and the METEOR-S component for each:
• Development / Description / Annotation: WSDL, WSDL-S, SAWSDL, WSMO, OWL-S; METEOR-S (MWSAF)
• Publication / Discovery: (Semantic) UDDI; METEOR-S (MWSDI)
• Composition, Configuration and Negotiation: BPEL, WS-Agreement, WS-Policy; METEOR-S (MWSCF)
• Execution, Adaptation and Mediation: BPWS4J, activeBPEL, WSMX; METEOR-S]
Dynamic Process Configuration
• Operations Research has been used in industry for business process optimization
• There is often a lot of domain knowledge in business process optimization
  - in the minds of analysts/experts
  - hidden in databases/texts
• We try to explicitly capture domain knowledge and link it with IT systems

Dynamic Process Configuration
Find optimal partners for the process based on process constraints: cost, supply time, etc.
Conceptual Approach
1. Create a framework to capture and represent domain knowledge
2. Represent constraints on the domain knowledge
3. Reason on the constraints and configure the process
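The three-step approach can be sketched as a tiny constraint-satisfaction search: candidate partners (assumed already found by discovery) are combined, combinations that violate an inter-service dependency or the supply-time constraint are discarded, and the cheapest survivor is returned. The suppliers, costs, supply days and compatibility pairs are hypothetical, and exhaustive search is a stand-in chosen for clarity; the slides do not say which solver METEOR-S uses, and a constraint or integer-programming solver would replace the brute force at realistic sizes.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple

# Hypothetical candidates returned by discovery: activity -> [(supplier, cost_usd, supply_days)].
CANDIDATES: Dict[str, List[Tuple[str, float, int]]] = {
    "motherboard": [("MB-A", 120.0, 3), ("MB-B", 100.0, 6)],
    "ram": [("RAM-X", 40.0, 2), ("RAM-Y", 30.0, 5)],
}

# Hypothetical inter-service dependency: (motherboard supplier, RAM supplier) pairs that are compatible.
COMPATIBLE = {("MB-A", "RAM-X"), ("MB-A", "RAM-Y"), ("MB-B", "RAM-Y")}


def configure(max_supply_days: int) -> Optional[Tuple[float, Dict[str, str]]]:
    """Cheapest compatible choice of partners that meets the supply-time constraint, or None."""
    activities = list(CANDIDATES)  # insertion order: motherboard, ram
    best = None
    for choice in product(*(CANDIDATES[a] for a in activities)):
        names = tuple(name for name, _, _ in choice)
        if names not in COMPATIBLE:
            continue  # violates the inter-service dependency
        if max(days for _, _, days in choice) > max_supply_days:
            continue  # some part would arrive too late
        cost = sum(c for _, c, _ in choice)
        if best is None or cost < best[0]:
            best = (cost, dict(zip(activities, names)))
    return best


print(configure(5))  # (150.0, {'motherboard': 'MB-A', 'ram': 'RAM-Y'})
```

Relaxing the deadline to 10 days makes the cheaper but slower MB-B/RAM-Y pair optimal, while a 1-day deadline leaves no feasible configuration.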
Dynamic Process Configuration
Research Challenges
• Capturing functional and non-functional requirements of the Web process (abstract process specification)
• Discovering service partners based on functional requirements (Semantic Web service discovery)
• Choosing optimal partners that satisfy non-functional requirements (constraint analysis)
K. Verma, R. Akkiraju, R. Goodwin, P. Doshi, J. Lee, On Accommodating Inter Service Dependencies in Web Process Flow, AAAI Spring Symposium on Semantic Web Services, 2004.
R. Aggarwal, K. Verma, J. A. Miller, Constraint Driven Composition in METEOR-S, SCC 2004.
K. Verma, K. Gomadam, J. Miller and A. Sheth, Configuration and Execution of Dynamic Web Processes, LSDIS Lab Technical Report, 2005.
Process Adaptation
• Ability to adapt processes in response to failures and unexpected events
• Two kinds of failures
  - Failures of physical components such as services, processes and the network: services can be replaced using dynamic configuration
  - Logical failures such as violations of SLA constraints/agreements, e.g. delay in delivery or partial fulfillment of an order: additional decision-making capabilities are needed
K. Verma, A. Sheth, Autonomic Web Processes, ICSOC 2005.
K. Verma, P. Doshi, K. Gomadam, A. Sheth, J. Miller, Optimal Adaptation of Web Processes with Coordination Constraints, ICWS 2006.

Process Adaptation
Research Challenges
• Creating a model to recover from failures and handle future events
• The model must deal with two important factors
  - Uncertainty about when a failure occurs
  - Cost-based recovery
Scenario
• After the orders for the motherboard (MB) and RAM are placed, they may get delayed
• The manufacturer may incur severe costs if assembly is halted
• It must evaluate whether it is cheaper to cancel/return and reorder, or to take the penalty of the delay
• Caveat: the reordered goods may be delayed too
Proposed Solution
• Modeling the decision-making capabilities of Service Managers as Markov Decision Processes (MDPs)
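A minimal sketch of the MDP idea for the scenario above: a service manager whose order is delayed chooses between waiting (and paying the cost of halted assembly) and cancelling to reorder (paying a fixed cost, with some chance the new order is late too, per the caveat). Value iteration minimizes expected discounted cost. All probabilities, costs and the discount factor are hypothetical, and the published model, which also coordinates several service managers, is richer than this two-state example.

```python
from typing import Dict, List, Tuple

# state -> action -> [(probability, next_state, cost)]
MDP = Dict[str, Dict[str, List[Tuple[float, str, float]]]]


def build_mdp(halt_cost: float, reorder_cost: float = 120.0) -> MDP:
    """Hypothetical model of a service manager facing a delayed order (all numbers made up)."""
    return {
        "delayed": {
            # Wait: the late goods arrive with probability 0.2 per step; assembly stays halted meanwhile.
            "wait": [(0.2, "received", halt_cost), (0.8, "delayed", halt_cost)],
            # Cancel/return and reorder: pay a fixed cost, and the new order may be delayed too.
            "reorder": [(0.8, "received", reorder_cost), (0.2, "delayed", reorder_cost)],
        },
        "received": {"wait": [(1.0, "received", 0.0)]},  # absorbing, no further cost
    }


def q_value(outcomes, values, gamma):
    """Expected discounted cost of taking an action with the given outcomes."""
    return sum(p * (cost + gamma * values[nxt]) for p, nxt, cost in outcomes)


def value_iteration(mdp: MDP, gamma: float = 0.95, tol: float = 1e-9):
    """Minimise expected discounted cost; returns state values and the greedy policy."""
    values = {s: 0.0 for s in mdp}
    while True:
        delta = 0.0
        for state, actions in mdp.items():
            best = min(q_value(o, values, gamma) for o in actions.values())
            delta = max(delta, abs(best - values[state]))
            values[state] = best
        if delta < tol:
            break
    policy = {
        state: min(actions, key=lambda a: q_value(actions[a], values, gamma))
        for state, actions in mdp.items()
    }
    return values, policy


print(value_iteration(build_mdp(halt_cost=50.0))[1])  # {'delayed': 'reorder', 'received': 'wait'}
print(value_iteration(build_mdp(halt_cost=10.0))[1])  # {'delayed': 'wait', 'received': 'wait'}
```

Lowering the per-step cost of halted assembly from 50 to 10 flips the optimal decision from reordering to waiting, which is the cost-based trade-off the scenario describes.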
SWAPS: Use of Semantics in Agreement Matching
• An agreement is a collection of alternatives: A = {Alt1, Alt2, …, AltN}
• An alternative is a collection of guarantees: Alt = {G1, G2, …, GN}
• A guarantee is defined as a collection: G = {Scope, Obligated, SLO, Qualifying Condition, Business Value}
• “requirement(Alt, G)” returns true if G is a requirement of Alt
• “capability(Alt, G)” returns true if G is an assurance of Alt
• “scope(G)” returns the scope of G
• “obligation(G)” returns the obligated party of G
• “satisfies(Gj, Gi)” returns true if the SLO of Gj is equivalent to or stronger than the SLO of Gi
There is a potential match between provider and consumer alternatives if, for each requirement in one alternative, there is a capability in the other alternative which has the same scope and the same obligation, and the SLO of the capability satisfies the request.
An alternative Alt1 is a suitable match for Alt2 if:
$$\forall G_i \Big[\big(G_i \in Alt_1 \wedge \mathrm{requirement}(Alt_1, G_i)\big) \Rightarrow \exists G_j \big(G_j \in Alt_2 \wedge \mathrm{capability}(Alt_2, G_j) \wedge \mathrm{scope}(G_i) = \mathrm{scope}(G_j) \wedge \mathrm{obligation}(G_i) = \mathrm{obligation}(G_j) \wedge \mathrm{satisfies}(G_j, G_i)\big)\Big]$$
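The matching rule can be sketched directly in code. This version assumes an SLO is a single numeric threshold on a named metric, so "equivalent or stronger" becomes a numeric comparison; SWAPS itself uses ontologies to compare SLOs, which this sketch does not attempt. The `Guarantee` fields mirror G = {Scope, Obligated, SLO, ...} but omit qualifying conditions and business values, and the operation name `getQuote` is hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Guarantee:
    scope: str            # what the guarantee is about, e.g. a service operation
    obligated: str        # party responsible for meeting it, e.g. "provider"
    metric: str           # SLO metric, e.g. "responseTime"
    op: str               # "<" (smaller is better) or ">" (larger is better)
    bound: float          # SLO threshold
    is_requirement: bool  # True: requirement of the alternative; False: capability (assurance)


def satisfies(gj: Guarantee, gi: Guarantee) -> bool:
    """True if the SLO of gj is equivalent to or stronger than the SLO of gi (numeric sketch)."""
    if gj.metric != gi.metric or gj.op != gi.op:
        return False
    return gj.bound <= gi.bound if gi.op == "<" else gj.bound >= gi.bound


def suitable_match(alt1: List[Guarantee], alt2: List[Guarantee]) -> bool:
    """Every requirement of alt1 is met by a capability of alt2 with the same scope and obligation."""
    return all(
        any(
            not gj.is_requirement
            and gj.scope == gi.scope
            and gj.obligated == gi.obligated
            and satisfies(gj, gi)
            for gj in alt2
        )
        for gi in alt1
        if gi.is_requirement
    )


consumer = [Guarantee("getQuote", "provider", "responseTime", "<", 2.0, True)]
fast_provider = [Guarantee("getQuote", "provider", "responseTime", "<", 1.5, False)]
slow_provider = [Guarantee("getQuote", "provider", "responseTime", "<", 3.0, False)]
print(suitable_match(consumer, fast_provider), suitable_match(consumer, slow_provider))  # True False
```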
WS-Agreement Definition and Ontology
• An agreement consists of a collection of guarantee terms.
• A guarantee term has a scope, e.g. an operation of a service.
• A guarantee term may have a collection of service level objectives (SLOs), e.g. responseTime < 2 seconds.
• A guarantee term may have a qualifying condition for the SLOs to hold, e.g. numRequests < 100.
• There might be business values associated with each guarantee term. Business values include importance, confidence, penalty and reward, e.g. Penalty 5 USD.
[Figure: OWL ontology for WS-Agreement. An Agreement hasGuaranteeTerm GuaranteeTerm; a GuaranteeTerm hasScope Scope, hasCondition Qualifying Condition, hasObjective ServiceLevelObjective and hasBusinessValue BusinessValue; a BusinessValue hasImportance Importance, hasPenalty Penalty and hasReward Reward. Objectives and conditions are built from a Predicate, a Parameter (e.g. responseTime), a Value and a Unit; Penalty and Reward have a ValueUnit, a ValueExpression and an Assessment Interval (TimeInterval or Count).]
Agreement represented as an instance of the ontology
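As a sketch of what the agreement instance enables, the code below evaluates one guarantee term per assessment interval: the SLO is only in force while its qualifying condition holds, and each violation costs the penalty. It uses the slide's example values (responseTime < 2 seconds, numRequests < 100, penalty 5 USD) but represents the term as a plain Python object rather than an OWL instance; the scope name and the measurements are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class GuaranteeTerm:
    scope: str                # e.g. an operation of the service
    slo_parameter: str        # SLO holds while this parameter < slo_limit
    slo_limit: float
    condition_parameter: str  # qualifying condition holds while this parameter < condition_limit
    condition_limit: float
    penalty_usd: float        # business value: penalty per violating assessment interval


def assess(term: GuaranteeTerm, measurements: Dict[str, float]) -> float:
    """Penalty owed for one assessment interval (0 if the SLO holds or is not in force)."""
    if measurements[term.condition_parameter] >= term.condition_limit:
        return 0.0  # qualifying condition false: the SLO does not apply
    if measurements[term.slo_parameter] < term.slo_limit:
        return 0.0  # SLO met
    return term.penalty_usd


term = GuaranteeTerm("getQuote", "responseTime", 2.0, "numRequests", 100, 5.0)
intervals = [
    {"responseTime": 1.2, "numRequests": 40},   # SLO met
    {"responseTime": 2.5, "numRequests": 60},   # SLO violated while in force
    {"responseTime": 3.0, "numRequests": 150},  # load too high, SLO not in force
]
print(sum(assess(term, m) for m in intervals))  # 5.0
```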
Semantic Middleware
• Investigating fundamental issues in entity/relationship extraction, disambiguation (matching and mapping) and annotation
• Three fundamental steps
• Semantic tagging of resources (simplest form)
  - Entity identification
  - Entity disambiguation
  - Annotation
[Figure: semantic annotation pipeline. Documents to annotate go through entity identification / metadata creation, which uses a world model, lexical analysis, natural language processing and additional linguistic resources (thesaurus, dictionary: synonyms, common variations) to look entities up in a knowledge base. If multiple matches are found during lookup, entity disambiguation is applied; otherwise the match is used directly. The result is semantic annotation of the selected documents, producing annotated documents.]
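The lookup-then-disambiguate flow can be sketched as follows: surface forms from a lexicon are spotted in the text, a single knowledge-base match is annotated directly, and multiple matches are resolved by context-word overlap. The lexicon, entity ids, context words and the two `Hypothetical#` classes are invented for illustration, and context overlap is only one simple stand-in for the disambiguation step, which the slides do not specify.

```python
import re
from typing import Dict, List, Set, Tuple

# Hypothetical knowledge base: entity id -> (ontology class, context words that usually co-occur).
KNOWLEDGE_BASE: Dict[str, Tuple[str, Set[str]]] = {
    "drug:Bextra": ("DrugOntology#prescription_drug_brandname", {"drug", "prescription"}),
    "cond:SulfaAllergy": ("DrugOntology#interaction_with_physical_condition", {"allergy", "patients"}),
    "cond:CommonCold": ("Hypothetical#common_cold", {"symptoms", "patients"}),
    "wx:Cold": ("Hypothetical#cold_weather", {"weather", "winter"}),
}
# Hypothetical lexicon: surface form (synonyms/variants already expanded) -> candidate entities.
LEXICON: Dict[str, List[str]] = {
    "bextra": ["drug:Bextra"],
    "sulfa allergy": ["cond:SulfaAllergy"],
    "cold": ["cond:CommonCold", "wx:Cold"],
}


def annotate(text: str) -> List[Tuple[str, str, str]]:
    """Return (surface form, entity id, class) for every lexicon entry found in the text."""
    lowered = text.lower()
    words = set(re.findall(r"[a-z]+", lowered))
    annotations = []
    for surface, candidates in LEXICON.items():
        if not re.search(r"\b" + re.escape(surface) + r"\b", lowered):
            continue
        if len(candidates) == 1:
            chosen = candidates[0]  # single match: annotate directly
        else:
            # multiple matches: pick the candidate whose context words overlap the text most
            chosen = max(candidates, key=lambda e: len(KNOWLEDGE_BASE[e][1] & words))
        annotations.append((surface, chosen, KNOWLEDGE_BASE[chosen][0]))
    return annotations


print(annotate("Patients with a sulfa allergy should not take Bextra."))
print(annotate("Patients reported cold symptoms."))
```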

Entities in a drug advisory annotated with
concepts and relationships from a Drug
Ontology
Excerpt of Drug Ontology
Excerpt of Drug Ontology
Sample Created Metadata
<Entity id="122805" class="DrugOntology#prescription_drug_brandname">
  Bextra
  <Relationship id="442134" class="DrugOntology#has_interaction">
    <Entity id="14280" class="DrugOntology#interaction_with_physical_condition">sulfa allergy
    </Entity>
  </Relationship>
</Entity>
Disambiguation
• Functionality
  - Merging two databases/ontologies: multiple references point to the same logical entity
  - Adding new instances to an ontology: a similar entity already exists and has to be merged with the new one
  - Example: merging person instances recorded in a government ontology with an incoming ChoicePoint person entity
• Challenges
  - Varying information content in entities
  - Differences in schema
  - Variations in representation: use of abbreviations, misspellings, different naming conventions, representation formats changing over time, etc.
  - Insufficient information while merging two entities
• Exploiting relationships and other/previous reconciliation decisions
Schema class: Person. Conflicting instances:

Attribute                 Tim Robins     Timothy Wallace Robinson
SSN                       889889889      889889889
TelNumber                 7065434567     7062123443
FirstName                 Tim            Timothy
MiddleName                               Wallace
LastName                  Robins         Robinson
Generation
Marital Status            Single         Married
Applicant
dependent of
spouse of                                person12332
works for                 People Soft    Oracle
affiliated with
foreign influence event   event7823      event099
address                   place23        place23

• The nature of an attribute indicates its relative importance: SSN is given a high weight in disambiguating person entities
• String similarity metrics are applied to attribute values
• Some attributes are recognized as time sensitive
• Reconciling Oracle and PeopleSoft indicates that the two person entities work for the same organization
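A minimal sketch of weighted, attribute-level matching for the two person instances above: each shared attribute contributes a string-similarity score (here Python's `difflib.SequenceMatcher`, one of many possible metrics), weighted so that SSN dominates and attributes assumed to be time sensitive count little. The weights, the choice of telephone number and marital status as the time-sensitive attributes, and the use of a weighted average are all assumptions; relationship-based evidence such as the Oracle/PeopleSoft reconciliation is not modelled here.

```python
from difflib import SequenceMatcher
from typing import Dict

# Hypothetical weights: unique identifiers weigh most; attributes assumed time sensitive weigh least.
WEIGHTS: Dict[str, float] = {
    "SSN": 5.0,
    "FirstName": 1.0,
    "LastName": 1.5,
    "address": 1.0,
    "TelNumber": 0.5,      # assumed time sensitive
    "MaritalStatus": 0.2,  # assumed time sensitive
}


def similarity(a: str, b: str) -> float:
    """String similarity in [0, 1]; 'Tim' vs 'Timothy' scores 0.6."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_score(e1: Dict[str, str], e2: Dict[str, str]) -> float:
    """Weighted average similarity over the attributes both entities have values for."""
    shared = [k for k in WEIGHTS if e1.get(k) and e2.get(k)]
    total = sum(WEIGHTS[k] for k in shared)
    if total == 0:
        return 0.0
    return sum(WEIGHTS[k] * similarity(e1[k], e2[k]) for k in shared) / total


tim = {"SSN": "889889889", "FirstName": "Tim", "LastName": "Robins", "TelNumber": "7065434567",
       "MaritalStatus": "Single", "address": "place23"}
timothy = {"SSN": "889889889", "FirstName": "Timothy", "LastName": "Robinson", "TelNumber": "7062123443",
           "MaritalStatus": "Married", "address": "place23"}
print(round(match_score(tim, timothy), 2))
```

Because the identical SSN carries most of the weight, the score stays high even though the telephone numbers and marital status disagree.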
Application: Disambiguating entities from two domains
• DBLP vs. DBLP
• FOAF vs. FOAF
• DBLP vs. FOAF
[Figure: the FOAF and DBLP schemas. foaf:Person has properties such as foaf:mbox, foaf:mbox_sha1sum, foaf:homepage, foaf:workplacepage, foaf:schoolpage, foaf:firstName, foaf:surname, foaf:nickName, foaf:depiction and foaf:knows; dblp:Researcher has properties dblp:has_label, dblp:has_homepage, dblp:has_no_of_co_authors, dblp:has_no_of_publications, dblp:has_coauthor, dblp:has_iswcLocation, dblp:has_iswc_type and dblp:has_iswc_affiliation; most values are rdfs:Literal]
Exploiting relationships and propagating reconciliation decisions
• Syntactic matches (string similarity)
• Attribute weights
• Relationship with other entities
  - Presupposition: some coauthors could also be your friends
• Propagating decisions
X. Dong, A. Halevy, J. Madhavan, Reference Reconciliation in Complex Information Spaces, Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data.
Syntactic matches
[Figure: a DBLP Researcher labelled "Amit P Sheth" and a FOAF Person labelled "Amit Sheth". Attributes shown include mbox_shasum 9c1dfd993ad7d1852e80ef8c87fac30e10776c0c (on both entities), homepage http://lsdis.cs.uga.edu/~amit (on both), the DBLP page http://www.informatik.uni-trier.de/~ley/db/indices/a-tree/s/Sheth:Amit_P=.html, workplace homepages http://www.semagix.com and http://lsdis.cs.uga.edu, title Professor and iswc_affiliation UGA; the coauthors and friends shown are Marek Rusinkiewicz, Carole Goble, Steffen Staab, Ramesh Jain, John Miller and John A. Miller. The labels "Amit P Sheth" and "Amit Sheth" are a syntactic (string similarity) match.]
Attribute weights
• The uniqueness property of the mailbox and homepage values gives those attributes more weight
[Figure: the DBLP and FOAF entities for Amit P Sheth / Amit Sheth, highlighting the shared mbox_shasum 9c1dfd993ad7d1852e80ef8c87fac30e10776c0c and homepage http://lsdis.cs.uga.edu/~amit]
Relationship with other entities
[Figure: the DBLP and FOAF entities for Amit P Sheth / Amit Sheth, highlighting John Miller / John A. Miller as a coauthor who is also a friend]
Propagating decisions
• If John Miller and John A. Miller are found to be the same entity, there is more support for reconciling the entities Amit P. Sheth and Amit Sheth (based on our presupposition that some coauthors could also be your friends)
[Figure: the DBLP and FOAF entities for Amit P Sheth / Amit Sheth, with the reconciled John Miller / John A. Miller pair strengthening the match between them]
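The propagation idea can be sketched in a few lines: coauthor/friend pairs are reconciled first by name similarity, and each reconciled pair adds support to the main Amit P Sheth / Amit Sheth pair. The threshold, the fixed boost and the additive scoring are hypothetical simplifications; Dong, Halevy and Madhavan propagate evidence through a dependency graph rather than with a single boost.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple

NAME_THRESHOLD = 0.85  # hypothetical cut-off for reconciling two people by name alone
BOOST = 0.1            # hypothetical extra support per reconciled coauthor/friend pair


def name_sim(a: str, b: str) -> float:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def reconcile(researcher: Dict, person: Dict) -> Tuple[float, float, List[Tuple[str, str]]]:
    """Score a DBLP researcher / FOAF person pair, propagating decisions about their neighbours."""
    base = name_sim(researcher["label"], person["label"])
    # Step 1: reconcile neighbouring entities first (coauthors on one side, friends on the other).
    neighbour_pairs = [
        (c, f)
        for c in researcher["coauthors"]
        for f in person["friends"]
        if name_sim(c, f) >= NAME_THRESHOLD
    ]
    # Step 2: propagate: every reconciled neighbour pair adds support to the main pair.
    score = min(1.0, base + BOOST * len(neighbour_pairs))
    return base, score, neighbour_pairs


dblp = {"label": "Amit P Sheth", "coauthors": ["Marek Rusinkiewicz", "John A. Miller"]}
foaf = {"label": "Amit Sheth", "friends": ["Carole Goble", "John Miller"]}
base, score, pairs = reconcile(dblp, foaf)
print(pairs)  # [('John A. Miller', 'John Miller')]
```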
Results / Evaluation
Attributes - weights and thresholds
Properties of the dataset and results
Semantic Analytics
NSF-funded Semantic Discovery: Discovering Complex Relationships in the Semantic Web (SemDis): 4 faculty members, 7 PhD students
http://lsdis.cs.uga.edu/semdis
Semantic Discovery
From finding things to finding out about things: relationships!
Semantic Discovery (SemDis) Overview
[Figure: SemDis architecture. A user asks how entity E1 (Reviewer) is related to entity E7 (Submission) in a graph of Person, Paper, Reviewer and Submission entities connected by author_of and knows relationships. Ontology schema(s) and sources in text, XML, HTML and RDBMS form an aggregated RDF instance base, over which semantic analytics provide semantic association discovery and ranking, subgraph discovery and browsing.]
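To make the E1-to-E7 question concrete, here is a minimal sketch of semantic association discovery as enumeration of all simple paths of up to k hops, following properties in both directions (the k-hop limited search that the BRAHMS slides benchmark). The triples are hypothetical and are not read off the figure; treating edges as traversable in either direction is an assumption.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical triples in the spirit of the Reviewer/Submission example (not read off the figure).
TRIPLES = [
    ("E1", "author_of", "E2"),
    ("E3", "author_of", "E2"),
    ("E3", "knows", "E5"),
    ("E5", "author_of", "E7"),
    ("E1", "author_of", "E4"),
    ("E6", "author_of", "E4"),
    ("E6", "author_of", "E7"),
]

Step = Tuple[str, str, str, str]  # (from, property, direction, to)


def build_index(triples) -> Dict[str, List[Tuple[str, str, str]]]:
    """Adjacency lists that let a search follow each property forwards and backwards."""
    adj = defaultdict(list)
    for s, p, o in triples:
        adj[s].append((p, o, "->"))
        adj[o].append((p, s, "<-"))
    return adj


def associations(adj, start: str, end: str, max_hops: int) -> List[List[Step]]:
    """All simple paths from start to end with at most max_hops edges (depth-first search)."""
    results: List[List[Step]] = []

    def dfs(node: str, path: List[Step], visited: frozenset) -> None:
        if node == end:
            results.append(path)
            return
        if len(path) == max_hops:
            return
        for prop, nxt, direction in adj[node]:
            if nxt not in visited:
                dfs(nxt, path + [(node, prop, direction, nxt)], visited | {nxt})

    dfs(start, [], frozenset({start}))
    return results


def show(path: List[Step]) -> str:
    parts = [path[0][0]]
    for _, prop, direction, nxt in path:
        parts.append(f"-{prop}->" if direction == "->" else f"<-{prop}-")
        parts.append(nxt)
    return " ".join(parts)


adj = build_index(TRIPLES)
for p in sorted(associations(adj, "E1", "E7", max_hops=4), key=len):
    print(show(p))
# E1 -author_of-> E4 <-author_of- E6 -author_of-> E7
# E1 -author_of-> E2 <-author_of- E3 -knows-> E5 -author_of-> E7
```

Even on this toy graph the number of associations grows with the hop limit, which is why enumeration has to be paired with ranking.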
Semantic Associations: Concepts and Definitions
• Semantic Connectivity
[Figure: resources &r1 and &r6, with literal values including "Matt", "Perry", "LSDIS Lab" and "The University of Georgia", are semantically connected through a path of intermediate resources]
• Semantic Similarity
[Figure: two paths, each of the form Passenger purchased Ticket paidby Corporate Account (resources &r1 to &r9; literals "Bill", "Fred", "Smith", "Jones", the last two via lname), are semantically similar]
Battling Information Overload
• Enumeration and Ranking
• Subgraph Discovery
Ranking Semantic Associations [5]
[Figure: ranking criteria combined into an overall Association Rank: Association Length, Rarity, Subsumption (e.g. the increasingly specific classes Organization, Political Organization, Democratic Political Organization), Context, Trust and Popularity]
5. Boanerges Aleman-Meza, Christian Halaschek-Wiener, Budak Arpinar, Cartic Ramakrishnan, and Amit Sheth. “Ranking Complex Relationships on the Semantic Web”, IEEE Internet Computing, 9(3), 37-44, 2005.
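As a sketch, assume the overall association rank is a weighted sum of per-criterion scores, each already normalized to [0, 1] and oriented so that higher is preferred; changing the weights shifts the ranking between, for example, short and popular associations and rare, specific ones. The criterion scores and both weight profiles are hypothetical.

```python
from typing import Dict, List

CRITERIA = ["length", "rarity", "subsumption", "context", "trust", "popularity"]


def rank(associations: List[Dict], weights: Dict[str, float]) -> List[Dict]:
    """Order associations by a weighted sum of per-criterion scores, each assumed to lie in [0, 1]."""
    def score(a: Dict) -> float:
        return sum(weights.get(c, 0.0) * a[c] for c in CRITERIA)
    return sorted(associations, key=score, reverse=True)


# Hypothetical per-criterion scores for two associations (higher = preferred on that criterion).
short_common = {"id": "short_common", "length": 0.9, "rarity": 0.1, "subsumption": 0.2,
                "context": 0.3, "trust": 0.8, "popularity": 0.9}
long_specific = {"id": "long_specific", "length": 0.4, "rarity": 0.8, "subsumption": 0.9,
                 "context": 0.9, "trust": 0.7, "popularity": 0.2}

# Two hypothetical weight profiles.
favour_specific = {"length": 0.2, "rarity": 0.2, "subsumption": 0.2, "context": 0.2,
                   "trust": 0.1, "popularity": 0.1}
favour_short_popular = {"length": 0.5, "popularity": 0.5}
print([a["id"] for a in rank([short_common, long_specific], favour_specific)])       # ['long_specific', 'short_common']
print([a["id"] for a in rank([short_common, long_specific], favour_short_popular)])  # ['short_common', 'long_specific']
```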
Ranking Semantic Associations (SemRank)
• Modulative Ranking
  - Relevance: Search Mode + Predictability
• Refraction Count
  - How much does the result vary from what is expected from the schema?
• Information Gain
  - How much information does a user gain by being informed about a result?
• S-Match
  - Best semantic match with user need (if provided)
Kemafor Anyanwu, Angela Maduko, Amit Sheth, SemRank: Ranking Complex Relationship Search Results on the Semantic Web, The 14th International World Wide Web Conference (WWW2005), Chiba, Japan, May 10-14, 2005
[Figure: adjustable search mode, ranging from results with low information gain, low refraction count and high S-Match to results with high information gain, high refraction count and high S-Match]
Subgraph Discovery
• The idea is to summarize the important connections between two resources
• Tied with visualization (what is the best set of associations that can be visually comprehended at one time?)
• Given: RDF graph, budget b
• Find: the best set of associations which pass through at most b different nodes
Cartic Ramakrishnan, William Milnor, Matthew Perry, Amit Sheth. "Discovering Informative Connection Subgraphs in Multi-relational Graphs", SIGKDD Explorations Special Issue on Link Mining, Volume 7, Issue 2, December 2005
[Figure: example connection subgraph over political entities (Bill Clinton, Edward Kennedy, Maria Shriver, Zell Miller, George H W Bush, Arnold Schwarzenegger) linked through the 1988, 1992 and 2000 Democratic National Conventions, the 2004 Republican National Convention, the 1992 National Presidential Election and the Council of Physical Fitness, via edges such as spoke_at, nominated_at, won, lost, relative_of, spouse_of, started and leader_of]
Subgraph Discovery
• Approach
  - Heuristic algorithm
  - Puts weights on the edges based on the semantics of node and edge types and on structural properties of the graph
  - Models the graph as an electrical circuit (weights are conductances)
  - Uses a greedy algorithm to maximize current flow and minimize the number of nodes
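The circuit view can be sketched as follows: fix the voltage at the source at 1 and at the sink at 0, relax every other node to the conductance-weighted average of its neighbours (Gauss-Seidel iteration), and measure the current flowing through each node. The greedy step here is a simplification (keep the interior nodes carrying the most current, up to budget b) and is not the published algorithm; unlike a path-by-path greedy selection, it can return nodes from disconnected paths. The node names come from the example figure, but which edges exist and their conductances are made up.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical edges and conductances among entities from the example figure.
EDGES: List[Tuple[str, str, float]] = [
    ("Bill Clinton", "1992 Democratic Natl Conv", 1.0),
    ("Zell Miller", "1992 Democratic Natl Conv", 1.0),
    ("Zell Miller", "2004 Republican Natl Conv", 1.0),
    ("Arnold Schwarzenegger", "2004 Republican Natl Conv", 1.0),
    ("Bill Clinton", "1992 Natl Presidential Election", 1.0),
    ("George H W Bush", "1992 Natl Presidential Election", 1.0),
    ("George H W Bush", "Council of Physical Fitness", 0.5),
    ("Arnold Schwarzenegger", "Council of Physical Fitness", 0.5),
    ("Bill Clinton", "1988 Democratic Natl Conv", 0.2),
    ("Edward Kennedy", "1988 Democratic Natl Conv", 0.2),
    ("Edward Kennedy", "Maria Shriver", 0.2),
    ("Maria Shriver", "Arnold Schwarzenegger", 0.2),
]


def node_currents(edges, source: str, sink: str, iterations: int = 2000) -> Dict[str, float]:
    """Voltages by Gauss-Seidel relaxation (V(source)=1, V(sink)=0), then current through each node."""
    nbrs: Dict[str, Dict[str, float]] = defaultdict(dict)
    for u, v, c in edges:
        nbrs[u][v] = nbrs[u].get(v, 0.0) + c
        nbrs[v][u] = nbrs[v].get(u, 0.0) + c
    volt = {n: 0.0 for n in nbrs}
    volt[source] = 1.0
    for _ in range(iterations):
        for n in nbrs:
            if n in (source, sink):
                continue
            volt[n] = sum(c * volt[m] for m, c in nbrs[n].items()) / sum(nbrs[n].values())
    # Current through a node = half the sum of |current| on its incident edges.
    return {n: 0.5 * sum(c * abs(volt[n] - volt[m]) for m, c in nbrs[n].items()) for n in nbrs}


def connection_subgraph(edges, source: str, sink: str, budget: int) -> Set[str]:
    """Greedy sketch: keep the endpoints plus the interior nodes that carry the most current."""
    flow = node_currents(edges, source, sink)
    interior = sorted((n for n in flow if n not in (source, sink)), key=flow.get, reverse=True)
    return {source, sink, *interior[: max(0, budget - 2)]}


print(sorted(connection_subgraph(EDGES, "Bill Clinton", "Arnold Schwarzenegger", budget=5)))
```

With budget 5, the three well-conducting convention nodes on the Zell Miller path win; the low-conductance Kennedy/Shriver path only enters at larger budgets.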
Spatiotemporal and Thematic Semantic Analytics
Matthew Perry, Farshad Hakimpour, Amit Sheth. "Analyzing Theme, Space and Time: An Ontology-based Approach", Fourteenth International Symposium on Advances in Geographic Information Systems (ACMGIS '06), Arlington, VA, November 10-11, 2006
From thematic analytics to spatio-temporal-thematic (STT) analytics (e.g. bioterrorism)
[Figure: example STT graph for a bioterrorism scenario with entities E1, E8 and E13 (Soldier), E2 (Symptom), E3 (Disease), E4 (Chemical), E5 (Terrorist), E6 (Attack), E7 and E12 (Platoon), E9 (Base), E10 (Doctor), E11 (Location) and E14 (Battle), linked by assigned_to, stationed_at, member_of, participated_in, spotted_at, carried_out, used_in, causes, sign_of and exhibits, with time intervals such as [0, 2], [0, 10], [3, 5], [4, 6] and [8, 10]; highlighted associations: "Spotted Before and Close in Time", "After the Battle" and "Near in Space"]
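Proposed Model: 3 Dimensions (Thematic, Geospatial, Temporal)
[Figure: upper-level classes Dynamic Entity, Named Place and Event. A Dynamic Entity is located_at a Named Place and an Event occurred_at a Named Place, each over a time interval [ti : tj]; arbitrary user-defined classes and relationships are attached through subClassOf (isA) and may also carry intervals [ti : tj]. Named Places have a Footprint (a spatial geometry representation) and are related by part_of, contains, overlaps and adjacent_to. Notation: [ti : tj] is the time interval of a relationship.]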
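Because every relationship carries an interval [ti : tj], association semantics can depend on how those intervals relate. A minimal sketch that classifies two closed intervals into a coarse subset of Allen's interval relations (before, after, equal, during, contains, overlaps); the interval values come from the example figure, and folding the remaining Allen relations (such as meets) into "overlaps" is a simplification.

```python
from typing import Tuple

Interval = Tuple[float, float]  # closed interval [ti, tj] with ti <= tj


def relate(a: Interval, b: Interval) -> str:
    """Coarse temporal relation of interval a to interval b."""
    (a1, a2), (b1, b2) = a, b
    if a2 < b1:
        return "before"
    if b2 < a1:
        return "after"
    if a1 == b1 and a2 == b2:
        return "equal"
    if b1 <= a1 and a2 <= b2:
        return "during"
    if a1 <= b1 and b2 <= a2:
        return "contains"
    return "overlaps"  # any remaining case where the intervals share at least one point


print(relate((0, 2), (4, 6)), relate((3, 5), (4, 6)), relate((8, 10), (0, 10)))  # before overlaps during
```

Two works_for relationships of the same person could be compared the same way, distinguishing overlapping from disjoint employment periods.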
Thematic Context for Spatial Extent
• The spatial extent of non-spatial entities is derived from thematic context
• Context: a path expression connecting a dynamic entity type to a static entity type or event
• Example context, residency of co-workers: works_for.works_for.lives_at
[Figure: in a georeferenced coordinate space, the dynamic entities Fred Smith and Bill Allen work for the University of Georgia and live at the named places 15 Spring Street and 150 Elm Street (coordinates (x1, y1), (x2, y2), (x3, y3)), giving each person a spatial extent in the context of employment and in the context of residency]
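A sketch of evaluating a context path expression over thematic facts. The slide writes the co-worker context as works_for.works_for.lives_at without direction markers; this sketch assumes the second works_for is traversed backwards (from the organization to its employees). Which person lives at which address, and the coordinates, are hypothetical.

```python
from typing import List, Set, Tuple

# Hypothetical thematic facts (subject, predicate, object).
FACTS = [
    ("Fred Smith", "works_for", "University of Georgia"),
    ("Bill Allen", "works_for", "University of Georgia"),
    ("Fred Smith", "lives_at", "150 Elm Street"),
    ("Bill Allen", "lives_at", "15 Spring Street"),
]
# Hypothetical footprints of named places in a georeferenced coordinate space.
FOOTPRINT = {"150 Elm Street": (1.0, 1.0), "University of Georgia": (2.0, 2.0), "15 Spring Street": (3.0, 3.0)}


def step(nodes: Set[str], predicate: str, inverse: bool) -> Set[str]:
    """Follow one predicate from every node (object to subject when inverse is True)."""
    if inverse:
        return {s for s, p, o in FACTS if p == predicate and o in nodes}
    return {o for s, p, o in FACTS if p == predicate and s in nodes}


def evaluate(start: str, path: List[Tuple[str, bool]]) -> Set[str]:
    nodes = {start}
    for predicate, inverse in path:
        nodes = step(nodes, predicate, inverse)
    return nodes


# Residency of co-workers: works_for, then works_for traversed backwards (assumption), then lives_at.
CO_WORKER_RESIDENCY = [("works_for", False), ("works_for", True), ("lives_at", False)]
places = sorted(evaluate("Fred Smith", CO_WORKER_RESIDENCY))
print(places, [FOOTPRINT[p] for p in places])
# ['15 Spring Street', '150 Elm Street'] [(3.0, 3.0), (1.0, 1.0)]
```

The employment context is the one-step path [("works_for", False)], which places Fred Smith at the University of Georgia's footprint instead.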
Queries based on Spatiotemporal Contexts
• Basic ST Query
• ST Range Query
• ST Behavior Query
• ST Relationship Query
Example queries:
• When was the 3rd Armored Division within Iraq?
• Where were bombing targets of the US Air Force in April 2003?
• How did the distribution of US airstrips in Iraq change during March 2003?
• Show the dates and locations of battles of the 101st Airborne Division
• How does the battle pattern of the 3rd Armored Division compare to the pattern of the 1st Armored Division?
• When and where were the 101st Airborne and the 82nd Airborne likely to have interacted?
Spatiotemporal Semantic Associations
• Define a setting as a region of space in combination with an interval of time
• How is entity X related to spatial setting S? (ρ(entity, setting))
[Figure: entities Group 1, Account_1234, Fred, Jim and 125 Broad Street, and the Attack Site]
How is Group 1 connected to the setting of the expected attack?

Spatiotemporal Semantic Associations
• How are entity X and entity Y related with respect to spatial setting S? (ρ(entity, entity, setting))
[Figure: entities Group 1, Group 2, Account_1234, Fred, Jim and 125 Broad Street]
How are Group 1 and Group 2 connected with respect to the attack site?
Spatiotemporal Semantic Associations
• Idea of virtual links between entities based on spatiotemporal information
• Possible rules to define a virtual link type
  - Collaboration: entities X and Y are in close ST proximity more often than a given threshold
  - Knows: entities X and Y are in close ST proximity regularly
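The Collaboration rule can be sketched by counting how often two entities' observations fall within a time window and a distance of each other. The tracks, proximity thresholds and count threshold are hypothetical; the names Fred and Jim come from the earlier example figure. The Knows rule would additionally need a notion of regularity, which is not attempted here.

```python
from itertools import combinations
from math import hypot
from typing import Dict, List, Tuple

Observation = Tuple[float, float, float]  # (time, x, y)


def close(a: Observation, b: Observation, max_dt: float, max_dist: float) -> bool:
    """True if two observations are within max_dt in time and max_dist in space."""
    return abs(a[0] - b[0]) <= max_dt and hypot(a[1] - b[1], a[2] - b[2]) <= max_dist


def virtual_links(tracks: Dict[str, List[Observation]], threshold: int,
                  max_dt: float = 1.0, max_dist: float = 0.5) -> List[Tuple[str, str, int]]:
    """'Collaboration' links: pairs seen in close ST proximity more than `threshold` times."""
    links = []
    for x, y in combinations(sorted(tracks), 2):
        count = sum(close(a, b, max_dt, max_dist) for a in tracks[x] for b in tracks[y])
        if count > threshold:
            links.append((x, y, count))
    return links


tracks = {
    "Fred": [(0.0, 0.0, 0.0), (5.0, 1.0, 1.0), (10.0, 2.0, 2.0)],
    "Jim": [(0.5, 0.1, 0.0), (5.2, 1.0, 1.2), (10.0, 2.1, 2.0)],
    "Ann": [(0.0, 5.0, 5.0), (5.0, 6.0, 6.0)],
}
print(virtual_links(tracks, threshold=2))  # [('Fred', 'Jim', 3)]
```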
Other Aspects
• How do temporal relationships affect association semantics?
  - e.g. two works_for relationships (overlapping times, disjoint times, etc.)
• Complex queries based on all 3 dimensions
  - Which location is the most likely storage facility for exfiltrated weapon material?
    Thematic (correct capabilities, linked to correct people)
    Spatial (where was the material last seen)
    Temporal (how long can the material stay out of storage)
REmBRANDTS: Retrieval, Browsing, Analytics and knowledge Discovery from Text using Semantics
Cartic Ramakrishnan
LSDIS Lab, University of Georgia, Athens, GA
SEMANTICS, SEMANTICS ….. SEMANTICS
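Overview
[Figure: the UMLS schema relates the classes Biologically active substance, Lipid and Disease or Syndrome through affects, causes and complicates. The MeSH terms Fish Oils and Raynaud’s Disease are instance_of Lipid and Disease or Syndrome respectively, and the relationship between them is unknown (???). PubMed document counts shown: 9284, 5 and 4733 documents.]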
About the data used
• UMLS: a high-level schema of the biomedical domain
  - 136 classes and 49 relationships
  - Synonyms of all relationships, obtained by variant lookup (tools from NLM), e.g. for T147: effect, induce, etiology, cause, effecting, induced
• MeSH
  - Terms already asserted as instances of one or more classes in UMLS
• PubMed
  - Abstracts annotated with one or more MeSH terms
Method – Parse Sentences in PubMed
SS-Tagger (University of Tokyo)
SS-Parser (University of Tokyo)
(TOP (S (NP (NP (DT An) (JJ excessive) (ADJP (JJ endogenous) (CC or) (JJ
exogenous) ) (NN stimulation) ) (PP (IN by) (NP (NN estrogen) ) ) ) (VP (VBZ
induces) (NP (NP (JJ adenomatous) (NN hyperplasia) ) (PP (IN of) (NP (DT
the) (NN endometrium) ) ) ) ) ) )
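As a naive sketch of the next step, the code below reads a bracketed parse tree like the one above and pulls out a (subject, verb, object) triple: the sentence's top-level NP, the verb in the VP, and the VP's NP object. It does not call SS-Tagger or SS-Parser; it only consumes their printed output, and it ignores the modifier and composite-entity handling described on the next slide.

```python
import re
from typing import List, Tuple, Union

Tree = Union[str, list]  # a leaf word, or [label, child, child, ...]

TREE = (
    "(TOP (S (NP (NP (DT An) (JJ excessive) (ADJP (JJ endogenous) (CC or) (JJ exogenous) ) "
    "(NN stimulation) ) (PP (IN by) (NP (NN estrogen) ) ) ) (VP (VBZ induces) (NP (NP "
    "(JJ adenomatous) (NN hyperplasia) ) (PP (IN of) (NP (DT the) (NN endometrium) ) ) ) ) ) )"
)


def parse(text: str) -> Tree:
    """Parse a bracketed (Penn Treebank style) tree into nested lists."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node() -> Tree:
        nonlocal pos
        if tokens[pos] == "(":
            label = tokens[pos + 1]
            pos += 2
            children = []
            while tokens[pos] != ")":
                children.append(node())
            pos += 1  # consume ")"
            return [label] + children
        word = tokens[pos]
        pos += 1
        return word

    return node()


def leaves(t: Tree) -> List[str]:
    return [t] if isinstance(t, str) else [w for c in t[1:] for w in leaves(c)]


def child(t: list, label: str) -> list:
    """First child subtree whose label matches (or starts with) `label`."""
    return next(c for c in t[1:] if isinstance(c, list) and c[0].startswith(label))


def triple(tree: Tree) -> Tuple[str, str, str]:
    s = child(tree, "S")
    vp = child(s, "VP")
    subject, verb, obj = child(s, "NP"), child(vp, "VB"), child(vp, "NP")
    return " ".join(leaves(subject)), leaves(verb)[0], " ".join(leaves(obj))


print(triple(parse(TREE)))
# ('An excessive endogenous or exogenous stimulation by estrogen', 'induces', 'adenomatous hyperplasia of the endometrium')
```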
Method: Identify entities and relationships in the parse tree
• Modifiers
• Modified entities
• Composite entities
Result of Extraction
• Semantic metadata
  - Represented in RDF
  - With complex entities and relationships connecting them
• Provenance of extracted facts
  - Pointers to the original document and sentence
Current results
• ~2MB RDF for the Migraine-Magnesium subset of PubMed
• ~150MB RDF for all documents pertaining to the Neoplasms subtree of MeSH
Use of Generated Semantic Metadata
• Semantic browsing of PubMed based on named relationships between MeSH terms
• Corpus-based hypothesis validation
  - Path/hypothesis-based document retrieval
• Knowledge discovery from literature
  - Corpus-based complex relationship discovery and ranking
  - Corpus-based relevant connection subgraph discovery
Corpus-based Hypothesis Validation
[Figure: a complex query built from Magnesium, Migraine, Stress, Patient and Calcium Channel Blockers and the relationships affectedBy, inhibit and isa is evaluated against PubMed, and the supporting document sets are retrieved]
Discovering Complex Relationships
[Figure: unknown relationships (?) among Stress, Migraine, Calcium Channel Blockers, Cortical Spreading Depression and Magnesium, to be discovered from PubMed]
• Possibly thousands of paths
• Need a corpus-based relevance model for paths and subgraphs
Discovering Maximally Relevant Connection Subgraphs
[Figure: among the connection subgraphs between Migraine and Magnesium found in PubMed, the subgraph with maximal support is selected]
Computing Semantic Associations
Graph-based, Main-Memory RDF Processing
BRAHMS: Design Goals
• Offer high performance for basic operations used in graph traversal algorithms
• Capable of handling big ontologies (100s of MB to many GB)
• Handle RDF/RDFS
• Distinguish between schema and instance level
• Provide a framework for testing different semantic association discovery algorithms
Maciej Janik, Krys Kochut, "BRAHMS: A WorkBench RDF Store And High Performance Memory System for Semantic
Association Discovery" In the Proceedings of the 4th International Semantic Web Conference (ISWC2005), November
2005, Galway Ireland, pp. 431-445.
BRAHMS
• Performance requirements
  - use main memory for storage: fastest access
  - create indexes for operations used in graph traversal algorithms
  - use C/C++ in the implementation instead of Java
• Design decisions
  - compact knowledge base to minimize memory usage; no memory fragmentation: use contiguous memory blocks
  - make it read-only
  - create a snapshot of memory structures for fast start-up (parse once, use many times)
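The compact, contiguous, read-only layout can be sketched in Python with a compressed-sparse-row (CSR) index: terms are interned to integer ids, triples are sorted by subject, and each node's out-edges occupy one contiguous slice of two flat arrays. This only illustrates the layout idea under our own assumptions; BRAHMS itself is written in C/C++ and its actual structures are not described here, and for simplicity predicates also receive node slots.

```python
from array import array
from typing import Dict, List, Tuple


def build_csr(triples: List[Tuple[str, str, str]]):
    """Pack triples into contiguous integer arrays; the result is meant to be read-only."""
    ids: Dict[str, int] = {}

    def intern(term: str) -> int:
        return ids.setdefault(term, len(ids))

    edges = sorted((intern(s), intern(p), intern(o)) for s, p, o in triples)
    n = len(ids)
    offsets = array("l", [0] * (n + 1))  # out-edges of node i live in [offsets[i], offsets[i + 1])
    for s, _, _ in edges:
        offsets[s + 1] += 1
    for i in range(n):
        offsets[i + 1] += offsets[i]  # prefix sums turn counts into offsets
    preds = array("l", (p for _, p, _ in edges))
    objs = array("l", (o for _, _, o in edges))
    terms = sorted(ids, key=ids.get)  # id -> term
    return ids, terms, offsets, preds, objs


def neighbours(store, term: str) -> List[Tuple[str, str]]:
    """(predicate, object) pairs for a subject: one contiguous slice, no per-edge objects."""
    ids, terms, offsets, preds, objs = store
    i = ids[term]
    return [(terms[preds[k]], terms[objs[k]]) for k in range(offsets[i], offsets[i + 1])]


store = build_csr([("E1", "author_of", "E2"), ("E1", "author_of", "E4"), ("E6", "author_of", "E4")])
print(neighbours(store, "E1"))  # [('author_of', 'E2'), ('author_of', 'E4')]
```

Because the arrays are flat, a snapshot for fast start-up could be written and reloaded with `array.tofile` and `array.fromfile`.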
BRAHMS Results
• Speed
  - outperforms Sesame, Jena and Redland in k-hop limited semantic association searches using the main-memory RDF model
  - big impact on large datasets, where other datastores either perform slowly or cannot execute the algorithm at all
• Handling datasets
  - size limited by main memory (physical) and/or the system (32 vs. 64 bit)
  - able to efficiently run algorithms on large datasets that other RDF stores cannot handle using their memory model
  - tested: SWETO [255 MB], Lehigh University Univ(50, 0) [556 MB], synthetic [9 GB] (64-bit machine)
Future of BRAHMS
• SPARQL: most of its functionality is currently implemented over BRAHMS
  - create a querying extension for regular expressions on graphs
• Distributed storage [4] (current work)
  - handle very large datasets (10s of GB+) partitioned across a cluster of computers
  - efficient distributed SPARQL query model and implementation
Matthew Perry, Maciej Janik, Cartic Ramakrishnan, Conrad Ibanez, Ismailcem Budak Arpinar, Amit Sheth. "Peer-to-Peer
Discovery of Semantic Associations", Second International Workshop on Peer-to-Peer Knowledge Management, San
Diego, CA, July 17, 2005
Semantics for Life Sciences Applications
Semantic Bioinformatics in Glycoproteomics
Acknowledgement: NCRR-funded Bioinformatics of Glycan Expression; collaborators and partners at CCRC (Dr. William S. York); and Christopher Thomas, Cory Henson, Prateek Jain
Outline
• Semantic integration of large distributed data sources
• Glycoproteomics ontologies
• Biological resources based on a services architecture
• GLYDE: XML-based representation standard
Integrated Semantic Information and Knowledge System (Isis)
[Figure: Isis. A user asks "Have I made an error? Is the result erroneous? Give me all result files from a similar organism, cell, preparation, mass spectrometric conditions and compare results." through a SPARQL query-based user interface backed by the ProPreO ontology; experimental data receive semantic annotation, and the resulting semantic metadata are recorded in a metadata registry alongside the data files]
[Figure: proteomics workflow over PROTEOMECOMMONS experimental data: Raw data are converted by Raw2mzXML to mzXML, by mzXML2Pkl to Pkl peak lists and by Pkl2pSplit to pSplit files; MASCOT Search produces MASCOT results, which are stored in ProVault]
N-Glycosylation Process (NGP)
[Figure: NGP workflow. A cell culture is extracted to a glycoprotein fraction; proteolysis yields a glycopeptides fraction, which separation technique I splits into n glycopeptides fractions; PNGase converts these to peptide fractions, which separation technique II splits into n*m peptide fractions; mass spectrometry produces ms data and ms/ms data; data reduction produces ms and ms/ms peaklists; later steps include binning, signal integration into an N-dimensional array, peptide identification into a peptide list, data correlation, and glycopeptide identification and quantification]
Ontologies
• GlycO
• An ontology for structure and function of Glycopeptides
• 573 classes, 113 relationships
• Published through the National Center for Biomedical
Ontology (NCBO)
• ProPreO
• An ontology for capturing process and lifecycle information
related to proteomic experiments
• 398 classes, 32 relationships
• 3.1 million instances
• Published through the National Center for Biomedical
Ontology (NCBO) and Open Biomedical Ontologies (OBO)
Zooming in a little …
[Figure: Reaction R05987, catalyzed by enzyme 2.4.1.145, adds_glycosyl_residue N-glycan_b-D-GlcpNAc_13]
The N-Glycan with KEGG ID 00015 is the substrate of the reaction R05987, which is catalyzed by an enzyme of the class EC 2.4.1.145. The product of this reaction is the Glycan with KEGG ID 00020.
Semantic annotation of Scientific Data
<ms_ms_peak_list>
  <parameter
    instrument="micromass_QTOF_2_quadropole_time_of_flight_mass_spectrometer"
    mode="ms/ms"/>
  <parent_ion_mass>830.9570</parent_ion_mass>
  <total_abundance>194.9604</total_abundance>
  <z>2</z>
  <mass_spec_peak m_z="580.2985" abundance="0.3592"/>
  <mass_spec_peak m_z="688.3214" abundance="0.2526"/>
  <mass_spec_peak m_z="779.4759" abundance="38.4939"/>
  <mass_spec_peak m_z="784.3607" abundance="21.7736"/>
  <mass_spec_peak m_z="1543.7476" abundance="1.3822"/>
  <mass_spec_peak m_z="1544.7595" abundance="2.9977"/>
  <mass_spec_peak m_z="1562.8113" abundance="37.4790"/>
  <mass_spec_peak m_z="1660.7776" abundance="476.5043"/>
</ms_ms_peak_list>
Annotated ms/ms peaklist data
Semantic Biological Web Service Registry
Semantic Web Service
GLYDE-CT: GLYcan Data Exchange Based on a Connection Table Format
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE GlydeCT SYSTEM "http://glycomics.ccrc.uga.edu/GLYDE-CT/GLYDE-CT_v2.11.DTD">
<GlydeCT xmlns:GlydeCT="http://glycomics.ccrc.uga.edu/GLYDE-CT/GLYDE-CT_v2.11">
  <structure type="molecule" id="molecule_1" name="GP1">
    <part type="moiety" id="moiety_1" ref="some_file#GNGS" name="GNGS"/>
    <part type="moiety" id="moiety_2" ref="some_file#Man3" name="Man3GlcNAc2"/>
    <link from="moiety_2" to="moiety_1">
      <link from="residue_1" to="residue_2">
        <link from="C1" to="N4"/>
      </link>
    </link>
  </structure>
</GlydeCT>
[Figure: glycopeptide GP1, with moiety_2 (the Man3GlcNAc2 glycan) linked to moiety_1, the peptide Gly 1 | Asn 2 | Gly 3 | Ser 4]
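A consumer of GLYDE-CT can walk the nested link elements with Python's standard `xml.etree.ElementTree`. The sketch embeds the example above without its DOCTYPE line, so nothing depends on fetching the DTD; reading the three nesting levels as moiety-, residue- and atom-level connections is our interpretation of the example.

```python
import xml.etree.ElementTree as ET
from typing import Iterator, Tuple

GLYDE_CT = """<?xml version="1.0" encoding="ISO-8859-1"?>
<GlydeCT xmlns:GlydeCT="http://glycomics.ccrc.uga.edu/GLYDE-CT/GLYDE-CT_v2.11">
  <structure type="molecule" id="molecule_1" name="GP1">
    <part type="moiety" id="moiety_1" ref="some_file#GNGS" name="GNGS"/>
    <part type="moiety" id="moiety_2" ref="some_file#Man3" name="Man3GlcNAc2"/>
    <link from="moiety_2" to="moiety_1">
      <link from="residue_1" to="residue_2">
        <link from="C1" to="N4"/>
      </link>
    </link>
  </structure>
</GlydeCT>"""


def links(element: ET.Element, depth: int = 0) -> Iterator[Tuple[int, str, str]]:
    """Yield (nesting depth, from, to) for every <link>, outermost first."""
    for link in element.findall("link"):
        yield depth, link.get("from"), link.get("to")
        yield from links(link, depth + 1)


root = ET.fromstring(GLYDE_CT.encode("iso-8859-1"))  # bytes, so the encoding declaration applies
structure = root.find("structure")
parts = {p.get("id"): p.get("name") for p in structure.findall("part")}
print(parts)  # {'moiety_1': 'GNGS', 'moiety_2': 'Man3GlcNAc2'}
for depth, src, dst in links(structure):
    print("  " * depth + f"{src} -> {dst}")
# moiety_2 -> moiety_1
#   residue_1 -> residue_2
#     C1 -> N4
```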
GLYDE-CT: Collaborative GlycoInformatics
Evolving collaboration between:
• LSDIS/CCRC: Will York, Amit Sheth, Michael Pierce
• EUROCarbDB (German Cancer Research Center): Willi von der Lieth
• Consortium for Functional Glycomics (CFG): Rahul Raman, Ram Sasisekharan, Thomas Lütteke
• N.D. Zelinsky Institute of Organic Chemistry (Moscow): Yuriy Knirel
• Mitsui Knowledge Industry (Japan): Hisashi Narimatsu, Norihiro Kikuchi
• Kyoto Encyclopedia of Genes and Genomes (KEGG): Minoru Kanehisa, Kiyoko F. Aoki-Kinoshita
• Palo Alto Research Center (PARC): David Goldberg
Moving Forward
• Dr. Amit Sheth will take up his new position as LexisNexis Eminent Scholar at Wright State University starting January 2, 2007
• Dr. Sheth, along with 10 of his Ph.D. students and existing and newly selected faculty at WSU, will form the kno.e.sis (Knowledge Enabled Information and Services Science) lab
• Collaborative research will continue between the newly formed kno.e.sis lab at WSU and the remaining members of LSDIS at UGA
http://lsdis.cs.uga.edu
http://knoesis.org
http://lsdis.cs.uga.edu/projects/asdoc/
http://lsdis.cs.uga.edu/projects/glycomics/