Research Problems in Semantic Web Search
____________________________
Varish Mulwad
1
Agenda
____________________________
• Introduction
• Swoogle
• Swoogle’s Competition –
  • Sindice
  • Semantic Web Search Engine (SWSE)
  • Watson
  • Falcon
• Research Problems and Issues with Swoogle
• References
2
Introduction
____________________________
[Figure: your agent, the Web, and Dr. Finin’s FOAF profile]
This is possible because the data is in a machine-understandable form such as RDF or OWL.
But how will an agent find all this data? Search engines?
3
Introduction
____________________________
[Figure: traditional search engine results compared with Semantic Web search engine results]
4
Swoogle
____________________________
• Swoogle is a crawler-based indexing and retrieval system for the Semantic Web
• Swoogle crawls and discovers documents written in RDF and OWL
• Swoogle classifies each Semantic Web Document (SWD) as either –
  • a Semantic Web Ontology (SWO), which defines new terms, or
  • a Semantic Web Database (SWDB), which makes assertions about individuals
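One way to picture the SWO/SWDB split is to look at what a document's typed resources are. The sketch below is a simplified heuristic, not Swoogle's published rule: it counts how many typed resources are classes or properties and calls the document an SWO when that ratio reaches an assumed threshold of 0.8. The function name `classify_swd` and the threshold are hypothetical.

```python
# Hypothetical heuristic: classify an SWD as SWO or SWDB from its triples.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Types whose instances are term definitions (classes and properties).
DEFINING_TYPES = {
    "http://www.w3.org/2002/07/owl#Class",
    "http://www.w3.org/2000/01/rdf-schema#Class",
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#Property",
    "http://www.w3.org/2002/07/owl#ObjectProperty",
    "http://www.w3.org/2002/07/owl#DatatypeProperty",
}

def classify_swd(triples, threshold=0.8):
    """Return 'SWO' if most typed resources are defined terms, else 'SWDB'.

    `threshold` is an assumed cutoff, not a value taken from Swoogle.
    """
    typed = {}  # subject -> set of its rdf:type values
    for s, p, o in triples:
        if p == RDF_TYPE:
            typed.setdefault(s, set()).add(o)
    if not typed:
        return "SWDB"
    defined = sum(1 for types in typed.values() if types & DEFINING_TYPES)
    return "SWO" if defined / len(typed) >= threshold else "SWDB"

ontology = [
    ("ex:Person", RDF_TYPE, "http://www.w3.org/2002/07/owl#Class"),
    ("ex:knows", RDF_TYPE, "http://www.w3.org/2002/07/owl#ObjectProperty"),
]
instance_data = [("ex:alice", RDF_TYPE, "ex:Person"), ("ex:bob", RDF_TYPE, "ex:Person")]
print(classify_swd(ontology))       # SWO
print(classify_swd(instance_data))  # SWDB
```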
5
Swoogle
____________________________
SWOOGLE DEMO
6
Swoogle Architecture
____________________________
7
Swoogle Architecture
____________________________
SWD Discovery Component
• A Google crawler that uses the Google web service
  • Searches for files with the extensions “.rdf”, “.owl”, and “.n3”
  • Google returns at most 1,000 results per query
• A focused crawler
  • Crawls documents within a given website
  • Applies extension and focus constraints
• A Swoogle crawler
  • A Jena-based crawler
  • Follows semantic links between SWDs
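The focused crawler's two constraints can be sketched as a breadth-first crawl that only follows links on the seed's host (focus constraint) and only keeps URLs ending in “.rdf”, “.owl”, or “.n3” (extension constraint). As an assumption, the hypothetical `links` dictionary stands in for fetching and parsing real pages, so the sketch runs offline.

```python
from collections import deque
from urllib.parse import urlparse

SW_EXTENSIONS = (".rdf", ".owl", ".n3")

def passes_constraints(url, seed_host):
    """Extension constraint plus focus constraint (stay on the seed's host)."""
    parts = urlparse(url)
    return parts.netloc == seed_host and parts.path.lower().endswith(SW_EXTENSIONS)

def focused_crawl(seed, links, max_docs=100):
    """Breadth-first crawl from `seed` over the link graph `links`.

    `links` maps a URL to the URLs it points to; a real crawler would fetch
    and parse each page instead (this in-memory stand-in is an assumption).
    """
    seed_host = urlparse(seed).netloc
    seen, queue, found = {seed}, deque([seed]), []
    while queue and len(found) < max_docs:
        url = queue.popleft()
        if passes_constraints(url, seed_host):
            found.append(url)
        for nxt in links.get(url, []):
            # Only follow links that stay inside the given website.
            if nxt not in seen and urlparse(nxt).netloc == seed_host:
                seen.add(nxt)
                queue.append(nxt)
    return found

web = {
    "http://example.org/": ["http://example.org/people.rdf",
                            "http://example.org/about.html",
                            "http://other.org/onto.owl"],
    "http://example.org/about.html": ["http://example.org/schema.owl"],
}
print(focused_crawl("http://example.org/", web))
# ['http://example.org/people.rdf', 'http://example.org/schema.owl']
```

HTML pages such as `about.html` are traversed but not collected, which is how SWDs linked only from ordinary pages can still be found.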
8
Swoogle Architecture
____________________________
Metadata Creation
• Basic metadata
  • Encoding – “RDF/XML”, “N-Triples”, “N3”
  • Language – RDF, RDFS, OWL, DAML+OIL
  • OWL species – OWL Lite, OWL DL, OWL Full
• Relations among SWDs
  • Reference relationships among SWDs
  • Inter-ontology relationships
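The slides do not say how the encoding and language metadata are detected. As an illustration only, the hypothetical helpers below guess the encoding from surface syntax and the language from which well-known namespace URIs appear in the text. These are rough heuristics assumed for the sketch; a real system would try each parser instead.

```python
def guess_encoding(text):
    """Hypothetical heuristic: guess the syntax of an SWD from its content."""
    body = text.lstrip()
    if body.startswith("<?xml") or body.startswith("<rdf:RDF"):
        return "RDF/XML"
    if "@prefix" in body or "@base" in body:
        return "N3"
    lines = [line.strip() for line in body.splitlines()
             if line.strip() and not line.strip().startswith("#")]
    # N-Triples: one triple per line, each starting with a URI or blank node.
    if lines and all(line.startswith(("<", "_:")) and line.endswith(".")
                     for line in lines):
        return "N-Triples"
    return "unknown"

# Checked in order, most expressive language first.
NAMESPACES = [
    ("OWL", "http://www.w3.org/2002/07/owl#"),
    ("DAML+OIL", "http://www.daml.org/2001/03/daml+oil#"),
    ("RDFS", "http://www.w3.org/2000/01/rdf-schema#"),
]

def guess_language(text):
    """Hypothetical heuristic: the first known namespace found decides."""
    for lang, ns in NAMESPACES:
        if ns in text:
            return lang
    return "RDF"

nt = '<http://ex.org/a> <http://xmlns.com/foaf/0.1/name> "Alice" .\n'
n3 = '@prefix foaf: <http://xmlns.com/foaf/0.1/> .\n<#a> foaf:name "Alice" .\n'
print(guess_encoding(nt))  # N-Triples
print(guess_encoding(n3))  # N3
```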
9
Swoogle Architecture
____________________________
Data analysis component
• Classification of each SWD as an SWO or an SWDB
• Computation of each SWD’s rank
Web-based interface
• Human user interface – http://swoogle.umbc.edu
• Web services through a REST interface
• Agent service
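The slides do not spell out how an SWD's rank is computed. As an assumed stand-in, the sketch below runs plain PageRank over a graph of references between SWDs, so documents referenced by many others, such as widely used ontologies, rank higher. Swoogle's own ranking may weight links differently; the document names are hypothetical.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Plain PageRank over a graph {doc: [docs it references]}.

    Documents with no outgoing references spread their rank evenly.
    """
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            targets = graph.get(v, [])
            if targets:
                share = damping * rank[v] / len(targets)
                for t in targets:
                    new[t] += share
            else:  # dangling document: distribute its rank uniformly
                for t in nodes:
                    new[t] += damping * rank[v] / n
        rank = new
    return rank

# Hypothetical reference graph: two instance documents use one ontology.
refs = {
    "foaf.rdf": ["foaf-ontology.owl"],
    "alice.rdf": ["foaf-ontology.owl", "foaf.rdf"],
    "foaf-ontology.owl": [],
}
ranks = pagerank(refs)
print(max(ranks, key=ranks.get))  # foaf-ontology.owl
```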
10
Sindice
____________________________
• Created at the Digital Enterprise Research Institute (DERI)
• Key features of Sindice include –
  • Sindice collects SWDs and indexes them on resource URIs, Inverse Functional Properties (IFPs), and keywords
  • Sindice uses the Hadoop parallel architecture
11
Sindice
____________________________
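Inverse Functional Property (IFP) – an OWL property characteristic: a given value identifies at most one resource, so two resources with the same value for an IFP (e.g. foaf:mbox) denote the same thing
Sindice uses three indexes –
• URI index
• IFP index
• Keyword index
Benefit – faster retrieval of data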
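The three indexes can be pictured as dictionaries from a lookup key to the set of documents containing it. This toy version is a sketch under stated assumptions: the class `SWDIndex`, its method names, and the rule that a term is a URI if it starts with `http://`, `https://`, or `mailto:` are all hypothetical, not Sindice's API. It shows why an IFP index helps: two documents using the same foaf:mbox value are found together because they describe the same person.

```python
import re
from collections import defaultdict

URI_SCHEMES = ("http://", "https://", "mailto:")  # assumed test for "is a URI"

class SWDIndex:
    """Toy URI, IFP, and keyword indexes over Semantic Web documents."""

    def __init__(self, ifps):
        self.ifps = set(ifps)                  # properties declared inverse functional
        self.uri_index = defaultdict(set)      # URI -> documents mentioning it
        self.ifp_index = defaultdict(set)      # (IFP, value) -> documents
        self.keyword_index = defaultdict(set)  # word in a literal -> documents

    def add(self, doc, triples):
        for s, p, o in triples:
            for term in (s, p, o):
                if term.startswith(URI_SCHEMES):
                    self.uri_index[term].add(doc)
                else:  # literal: index each lowercase word
                    for word in re.findall(r"\w+", term.lower()):
                        self.keyword_index[word].add(doc)
            if p in self.ifps:
                self.ifp_index[(p, o)].add(doc)

    def docs_for_ifp(self, prop, value):
        """Documents describing the resource identified by (prop, value)."""
        return sorted(self.ifp_index.get((prop, value), set()))

MBOX = "http://xmlns.com/foaf/0.1/mbox"
NAME = "http://xmlns.com/foaf/0.1/name"
idx = SWDIndex(ifps=[MBOX])
idx.add("http://a.org/foaf.rdf", [
    ("http://a.org/#me", MBOX, "mailto:alice@example.org"),
    ("http://a.org/#me", NAME, "Alice Smith"),
])
idx.add("http://b.org/people.rdf", [
    ("http://b.org/#alice", MBOX, "mailto:alice@example.org"),
])
print(idx.docs_for_ifp(MBOX, "mailto:alice@example.org"))
# ['http://a.org/foaf.rdf', 'http://b.org/people.rdf']
```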
12
Sindice
____________________________
Hadoop architecture is used in the following manner –
• Sindice employs Hadoop/Nutch to distribute the crawling job across multiple machines
• Collected data is stored in HBase, a distributed column-based store
• Large datasets are handled efficiently across the cluster using a MapReduce implementation
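To show the MapReduce idea behind building such an index, the sketch below simulates the map, shuffle, and reduce phases in a single Python process. It does not use Hadoop's API; on a real cluster each mapper and reducer would run on a different machine and the framework would perform the shuffle. The function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain

def map_phase(doc_id, text):
    """Mapper: emit (keyword, doc_id) for every word in one document."""
    for word in text.lower().split():
        yield word, doc_id

def shuffle(pairs):
    """Group mapper output by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(keyword, doc_ids):
    """Reducer: collapse the postings for one keyword into a sorted list."""
    return keyword, sorted(set(doc_ids))

def build_keyword_index(docs):
    mapped = chain.from_iterable(map_phase(d, t) for d, t in docs.items())
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())

docs = {"d1": "Semantic Web search", "d2": "web crawler"}
index = build_keyword_index(docs)
print(index["web"])  # ['d1', 'd2']
```

Because each mapper only sees its own document and each reducer only sees one keyword, both phases can be spread across machines without coordination.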
13
Sindice
____________________________
SINDICE DEMO
14
SWSE
____________________________
• The Semantic Web Search Engine (SWSE) is another Semantic Web search engine created at the Digital Enterprise Research Institute (DERI)
• SWSE uses “Multicrawler”, a pipelined architecture for crawling
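A pipelined crawler overlaps its stages: while one document is being parsed, the next can already be fetched. The sketch below connects stages with queues and runs each in its own thread. The stage functions `fetch`, `parse`, and `count` are hypothetical stand-ins (the slide names only a "pipelined architecture", not Multicrawler's actual stages), and `fetch` returns a fake triple instead of downloading anything.

```python
import queue
import threading

STOP = object()  # sentinel telling the next stage to shut down

def stage(func, inbox, outbox):
    """Run one pipeline stage: apply `func` to each item and pass it on."""
    while True:
        item = inbox.get()
        if item is STOP:
            outbox.put(STOP)
            return
        outbox.put(func(item))

def run_pipeline(items, funcs):
    """Chain `funcs` as concurrent stages connected by FIFO queues."""
    queues = [queue.Queue() for _ in range(len(funcs) + 1)]
    threads = [threading.Thread(target=stage, args=(f, queues[i], queues[i + 1]))
               for i, f in enumerate(funcs)]
    for t in threads:
        t.start()
    for item in items:
        queues[0].put(item)
    queues[0].put(STOP)
    results = []
    while (out := queues[-1].get()) is not STOP:
        results.append(out)
    for t in threads:
        t.join()
    return results

# Hypothetical stages; a real fetch stage would download the document.
def fetch(url):
    return url, f'<{url}> <http://example.org/p> "x" .'

def parse(doc):
    url, text = doc
    return url, [line for line in text.splitlines() if line.strip()]

def count(parsed):
    url, triples = parsed
    return url, len(triples)

print(run_pipeline(["http://a.org/x.rdf", "http://b.org/y.rdf"], [fetch, parse, count]))
# [('http://a.org/x.rdf', 1), ('http://b.org/y.rdf', 1)]
```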
15
Watson
____________________________
• Created at the Knowledge Media Institute (KMi) of the Open University, UK
• Major design principles –
  • Considers explicit and implicit relations between ontologies
  • Ranks ontologies with a focus on quality rather than popularity
16
Watson
____________________________
WATSON DEMO
17
Falcon
____________________________
• Falcon is a Semantic Web search engine created at the Institute of Web Science in China
• Falcon allows keyword-based queries on –
  • Objects
  • Concepts
  • Documents
• Falcon performs class subsumption reasoning
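Class subsumption reasoning means that a query for a class also returns instances of its subclasses, following rdfs:subClassOf transitively. The minimal sketch below assumes a small in-memory class hierarchy and typing table (all `ex:` names are hypothetical). It is a generic illustration of subsumption, not Falcon's implementation.

```python
def superclasses(subclass_of, cls):
    """All transitive superclasses of `cls`, given direct rdfs:subClassOf edges."""
    seen, stack = set(), [cls]
    while stack:
        for parent in subclass_of.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def instances_of(cls, types, subclass_of):
    """Objects typed with `cls` or with any (transitive) subclass of it."""
    return sorted(
        obj for obj, obj_classes in types.items()
        if any(c == cls or cls in superclasses(subclass_of, c) for c in obj_classes)
    )

# Hypothetical hierarchy and instance data.
subclass_of = {
    "ex:Professor": ["ex:FacultyMember"],
    "ex:FacultyMember": ["ex:Person"],
    "ex:Student": ["ex:Person"],
}
types = {
    "ex:alice": ["ex:Professor"],
    "ex:bob": ["ex:Student"],
    "ex:umbc": ["ex:University"],
}
print(instances_of("ex:Person", types, subclass_of))  # ['ex:alice', 'ex:bob']
```

Without the reasoning step, a search for ex:Person would return nothing here, since no object is typed with ex:Person directly.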
18
Falcon
____________________________
FALCON DEMO
19
Summary
____________________________
Swoogle
• Keyword-based search
• Searches ontologies and instance data
Others
• Sindice – indexes on URIs, IFPs, and keywords; uses the Hadoop architecture
• SWSE – pipelined architecture for crawling
• Watson – implicit relations between SWDs
• Falcon – class subsumption reasoning
20
Issues
____________________________
Crawling
• Swoogle’s crawler runs as a single thread on one machine
• This limits the number of SWDs discovered and revisited
Possible Solutions
• Use of the Hadoop architecture
• Use of Grub, a distributed web crawler
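Both proposed solutions spread crawling over many machines, which requires deciding which machine crawls which URL. A common generic technique, shown here as an assumption rather than how Hadoop/Nutch or Grub actually do it, is to hash each URL's host name: every page of a site then lands on the same worker, which keeps per-site politeness limits local. The helpers `assign_worker` and `partition` are hypothetical.

```python
import hashlib
from urllib.parse import urlparse

def assign_worker(url, num_workers):
    """Map a URL to a crawler machine by hashing its host name.

    A stable hash (SHA-1) is used instead of Python's built-in hash(),
    which is randomized per process and would disagree across machines.
    """
    host = urlparse(url).netloc.lower()
    digest = hashlib.sha1(host.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_workers

def partition(urls, num_workers):
    """Split a URL frontier into one bucket per worker."""
    buckets = [[] for _ in range(num_workers)]
    for url in urls:
        buckets[assign_worker(url, num_workers)].append(url)
    return buckets

urls = [
    "http://a.org/x.rdf",
    "http://a.org/y.owl",
    "http://b.org/people.rdf",
    "http://c.org/schema.n3",
]
for i, bucket in enumerate(partition(urls, 3)):
    print(i, bucket)
```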
21
Other Issues
____________________________
• Crawling large structured datasets such as DBpedia
• Supporting more reasoning
• Providing more services
22
References
____________________________
• Li Ding et al., “Swoogle: A Search and Metadata Engine for the Semantic Web”, Proceedings of the Thirteenth ACM Conference on Information and Knowledge Management, November 2004.
• P. Mika and G. Tummarello, “Web Semantics in the Clouds”, IEEE Intelligent Systems, Volume 23, Issue 5, September 2008.
• E. Oren, R. Delbru, M. Catasta, R. Cyganiak, H. Stenzhorn, and G. Tummarello, “Sindice.com: A Document-Oriented Lookup Index for Open Linked Data”, International Journal of Metadata, Semantics and Ontologies, 3(1), 2008.
• Mathieu d’Aquin et al., “Watson: A Gateway for the Semantic Web”, Poster Session of the European Semantic Web Conference (ESWC 2007), 2007.
• Gong Cheng, Weiyi Ge, Honghan Wu, and Yuzhong Qu, “Searching Semantic Web Objects Based on Class Hierarchies”, WWW 2008 Workshop on Linked Data on the Web, 2008.
23
Questions?
____________________________
24