YAGO-NAGA

YAGO-NAGA Project
Presented By:
Mohammad Dwaikat
To:
Dr. Yuliya Lierler
CSCI 8986 – Fall 2012
Agenda
• What is YAGO-NAGA?
• Why YAGO-NAGA?
• How YAGO-NAGA Works?
• Demonstration
• YAGO-NAGA Sub-Projects
What is YAGO-NAGA?
• Harvesting, Searching, and Ranking Knowledge
from the Web.
• Building a conveniently searchable, large-scale,
highly accurate knowledge base of common facts
in a machine-processable representation.
• Harvesting knowledge about millions of entities,
and facts about their relationships, from
Wikipedia and WordNet, with careful integration
of these two sources.
What is YAGO-NAGA?
• Its vision is a confluence of Semantic Web
(Ontologies), Social Web (Web 2.0), and
Statistical Web (Information Extraction) assets
towards a comprehensive repository of
human knowledge.
YAGO
• Yet Another Great Ontology (YAGO) Knowledge base.
• It is a huge semantic knowledge base, derived from
Wikipedia, WordNet, and GeoNames.
• It knows almost 10 million entities (e.g. persons,
organizations, cities), and 120 million facts about these
entities.
• It has a manually confirmed accuracy of 95%.
• YAGO is an ontology that is anchored in time and
space.
– It attaches a temporal dimension and a spatial dimension
to many of its facts and entities.
YAGO
• It contains all the entities and ontological facts
extracted from Wikipedia (from 2010-08-17),
with categories mapped to the WordNet class
hierarchy.
• It also contains multi-lingual data from the
Universal WordNet (UWN).
YAGO
• It contains all the entities and facts from
GeoNames (from a dump of August 2010).
• It also contains textual and structural data
from Wikipedia:
– All links and anchor texts between the YAGO entities.
– All Wikipedia category names.
– The titles of references.
YAGO
• It is particularly suited for disambiguation
purposes, as it contains a large number of names
for entities. It also knows the gender of people.
• YAGO is the resulting knowledge base; its facts
are represented as RDF (Resource Description
Framework) triples.
• Methods and prototype systems have been
developed for querying, ranking, and exploring
knowledge.
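The triple representation mentioned above can be illustrated with a minimal Python sketch. The `Triple` type, the `match` helper, and the three toy facts are assumptions made for illustration only; they are not YAGO's actual data model or API.

```python
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    """One fact in subject-predicate-object form (illustrative type)."""
    subject: str
    predicate: str
    obj: str


# Hand-written toy facts, not taken from a YAGO dump.
FACTS = {
    Triple("Max_Planck", "type", "physicist"),
    Triple("Max_Planck", "bornIn", "Kiel"),
    Triple("Kiel", "locatedIn", "Germany"),
}


def match(facts, subject: Optional[str] = None,
          predicate: Optional[str] = None, obj: Optional[str] = None):
    """Return the facts that fit the pattern, sorted; None acts as a wildcard."""
    return sorted(
        t for t in facts
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    )
```

For example, `match(FACTS, predicate="bornIn")` returns the single fact about Max Planck's birthplace.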
NAGA
• Not Another Google Answer (NAGA) is a new semantic
search engine which provides ranked answers to
queries based on statistical models.
• It can operate on knowledge bases that are organized
as graphs with labeled nodes and edges, so-called
relationship graphs.
• As of now, NAGA uses a projection of YAGO as its
knowledge base.
• The underlying query language supports keyword
search for the casual user as well as graph-based
queries with regular expressions for the expert user.
Consider These Questions
• Which German Nobel laureate survived both
world wars and outlived all four of his children?
– The answer is Max Planck.
• Which politicians are also accomplished
scientists?
– The German chancellor Angela Merkel and Benjamin
Franklin.
• How are Max Planck, Angela Merkel, Jim Gray,
and the Dalai Lama related?
– All four have doctoral degrees from German
universities.
Why YAGO-NAGA?
• Three major research directions:
– Semantic-Web-style knowledge repositories.
• Such as SUMO, OpenCyc, and WordNet.
– Large-scale information extraction.
– Social tagging and Web 2.0 communities that constitute
the social Web.
• Wikipedia is another example of the Social Web paradigm.
• The challenge is how to extract the important facts
from the Web and organize them into an explicit
knowledge base that captures entities and semantic
relationships among them.
How YAGO-NAGA Works?
• For querying, YAGO-NAGA adopts concepts from SPARQL
(SPARQL Protocol and RDF Query Language), the
standardized query language for RDF data, but extends
them through more expressive pattern matching and ranking.
• The prototype system that implements these
features is NAGA.
Query for the YAGO Knowledge Base
Which big US city has two airports, one named after a World
War II hero and one named after a World War II battlefield?
Structured Knowledge Queries
• Which big US city has two airports, one named
after a World War II hero and one named after
a World War II battlefield?
SELECT DISTINCT ?c WHERE {
  ?c  type City .        ?c  locatedIn USA .
  ?a1 type Airport .     ?a2 type Airport .
  ?a1 locatedIn ?c .     ?a2 locatedIn ?c .
  ?a1 namedAfter ?p .    ?p  type WarHero .
  ?a2 namedAfter ?b .    ?b  type BattleField .
}
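To make the semantics of such a conjunctive query concrete, here is a minimal Python sketch that evaluates the same triple patterns by backtracking over a small in-memory fact list. The `solve` function and the toy facts (including the entity names) are assumptions written for this illustration; this is not how NAGA or any SPARQL engine is actually implemented.

```python
def is_var(term):
    """Variables are written with a leading '?', as in the query above."""
    return term.startswith("?")


def solve(patterns, facts, binding=None):
    """Yield every variable binding that satisfies all triple patterns."""
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for fact in facts:
        candidate = dict(binding)
        ok = True
        for term, value in zip(first, fact):
            if is_var(term):
                # Bind the variable, or check it against its earlier binding.
                if candidate.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from solve(rest, facts, candidate)


# Hand-written toy facts for illustration only.
FACTS = [
    ("Chicago", "type", "City"), ("Chicago", "locatedIn", "USA"),
    ("Springfield", "type", "City"), ("Springfield", "locatedIn", "USA"),
    ("OHare_Airport", "type", "Airport"), ("Midway_Airport", "type", "Airport"),
    ("OHare_Airport", "locatedIn", "Chicago"),
    ("Midway_Airport", "locatedIn", "Chicago"),
    ("OHare_Airport", "namedAfter", "Butch_OHare"),
    ("Butch_OHare", "type", "WarHero"),
    ("Midway_Airport", "namedAfter", "Battle_of_Midway"),
    ("Battle_of_Midway", "type", "BattleField"),
]

QUERY = [
    ("?c", "type", "City"), ("?c", "locatedIn", "USA"),
    ("?a1", "type", "Airport"), ("?a2", "type", "Airport"),
    ("?a1", "locatedIn", "?c"), ("?a2", "locatedIn", "?c"),
    ("?a1", "namedAfter", "?p"), ("?p", "type", "WarHero"),
    ("?a2", "namedAfter", "?b"), ("?b", "type", "BattleField"),
]


def distinct_cities():
    """Project the bindings onto ?c and remove duplicates (SELECT DISTINCT ?c)."""
    return sorted({b["?c"] for b in solve(QUERY, FACTS)})
```

On these toy facts, `distinct_cities()` keeps only the city that has both kinds of airport; the second city is filtered out because no airport is located in it.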
Growing the Knowledge Base
[Figure: pipeline diagram. Labels: WordNet + Wikipedia; YAGO Core Extractors; YAGO Core Checker; YAGO Core (knows all entities); Web sources; YAGO Gatherer; Hypotheses; YAGO Scrutinizer; Growing YAGO (focus on facts).]
Information Extraction from Wikipedia
Subj.                  Pred.           Obj.
Stanford University    type            Private University
Stanford University    hasPresident    J.L.Hennessy
Stanford University    hasStudents     15,319
Stanford University    foundedBy       L.Stanford
Stanford University    foundedIn       1891
…                      …               …
YAGO Knowledge Base
• Combine knowledge from WordNet & Wikipedia.
• Additional Gazetteers (geonames.org).
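The subject-predicate-object rows above can be thought of as the output of mapping infobox attributes to relations. The following Python sketch shows that idea under stated assumptions: the attribute names (`president`, `students`, `founder`, `established`, `website`) and the mapping table are hypothetical, and the real YAGO extractors are considerably more involved.

```python
# Hypothetical mapping from infobox attribute names to relation names.
ATTRIBUTE_TO_RELATION = {
    "president": "hasPresident",
    "students": "hasStudents",
    "founder": "foundedBy",
    "established": "foundedIn",
}


def infobox_to_triples(entity, infobox):
    """Turn infobox attribute/value pairs into (subject, predicate, object) triples."""
    triples = []
    for attribute, value in infobox.items():
        relation = ATTRIBUTE_TO_RELATION.get(attribute)
        if relation is not None:  # attributes without a known mapping are skipped
            triples.append((entity, relation, value))
    return triples


# Toy infobox using the values shown in the table above; "website" has no mapping.
STANFORD_INFOBOX = {
    "president": "J.L.Hennessy",
    "students": "15,319",
    "founder": "L.Stanford",
    "established": "1891",
    "website": "stanford.edu",
}

STANFORD_TRIPLES = infobox_to_triples("Stanford_University", STANFORD_INFOBOX)
```

Keeping the mapping in a table rather than in code makes it easy to see which attributes are covered and which are silently dropped.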
Searching & Ranking RDF Graphs in NAGA
Ranking based on confidence, compactness, and relevance.
[Figure: three example query graphs. Recoverable node and edge labels:]
• Discovery queries:
– Nodes: Kiel, scientist, Nobel prize, $x, $y, $a, $b
– Edges: bornIn, type, hasWon, hasSon, diedOn; comparison: >
• Connectedness queries:
– Nodes: German novelist, $x, Thomas Mann, Goethe
– Edges: type, *
• Queries with regular expressions:
– Nodes: Ling, Beng Chin Ooi, Zhejiang, scientist, $x, $y
– Edges: hasFirstName | hasLastName, (coAuthor | advisor)*, type, worksFor, locatedIn*
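Edge labels such as locatedIn* and (coAuthor | advisor)* in the queries with regular expressions can be read as "zero or more steps along edges with one of these labels". The Python sketch below illustrates that reading with a breadth-first search; the `reachable` helper, the toy edges, and the `Researcher_*` names are assumptions for illustration and do not reflect NAGA's actual query processor.

```python
from collections import deque


def reachable(edges, start, labels):
    """Nodes reachable from start via zero or more edges whose label is in labels."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for subject, label, obj in edges:
            if subject == node and label in labels and obj not in seen:
                seen.add(obj)
                queue.append(obj)
    return seen


# Hand-written toy graph; the researcher names are placeholders.
EDGES = [
    ("Zhejiang_University", "locatedIn", "Hangzhou"),
    ("Hangzhou", "locatedIn", "Zhejiang"),
    ("Zhejiang", "locatedIn", "China"),
    ("Researcher_A", "advisor", "Researcher_B"),
    ("Researcher_B", "coAuthor", "Researcher_C"),
    ("Researcher_A", "worksFor", "Zhejiang_University"),
]

# locatedIn* starting at the university
PLACES = reachable(EDGES, "Zhejiang_University", {"locatedIn"})

# (coAuthor | advisor)* starting at Researcher_A
PEOPLE = reachable(EDGES, "Researcher_A", {"coAuthor", "advisor"})
```

Because the star allows zero steps, the start node is always part of the result.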
YAGO Server: UI & API
YAGO-UI
– Interactive online demo
– RDF with time, space & provenance annotations
– SPARQL + keywords
YAGO-API
Two basic WebServices:
– processQuery(String query)
– getYagoEntitiesByNames(String[] names)
www.mpi-inf.mpg.de/yago-naga/demo.html
YAGO
• Browse through the YAGO knowledge base.
– https://d5gate.ag5.mpi-sb.mpg.de/webyagospotlx/Browser
• Ask queries on YAGO using SPOTLX patterns.
View the results on a map and timeline.
– https://d5gate.ag5.mpi-sb.mpg.de/webyagospotlx/WebInterface
YAGO-NAGA Sub-Projects
• YAGO-NAGA has more than 13 sub-projects.
• AIDA is a method, implemented in an online
tool, for disambiguating mentions of named
entities that occur in natural-language text or
Web tables.
– https://d5gate.ag5.mpi-sb.mpg.de/webaida/
Names, Surface Patterns & Paraphrases
Which chemist was born in London?
– Part-of-speech tags: chemist/NN, was/VBD, born/VBN, in/IN, London/NNP/LOC
• (I) Named entity disambiguation
– chemist → wordnet_chemist, wordnet_pharmacist
– born → Bertran_de_Born, Born_Identity_(Movie), Born_(Album)
– London → London_UK, London_Arkansas, Antonio_London
• (II) Mapping surface patterns onto semantic relations
– <person> was_born_in <location> → bornIn(<person>, <location>)
– <person> was_born_in <date> → bornOn(<person>, <date>)
• (III) Paraphrases of questions
– <person> [was] born in <location>
– <location>-born <person>
– Both map to bornIn(<person>, <location>)
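Steps (II) and (III) above can be sketched with a few regular expressions that map surface patterns and their paraphrases onto relations. This is a simplified illustration, not the project's method: the `PATTERNS` table and `to_fact` function are hypothetical, dates are approximated by a four-digit year, and no entity disambiguation (step I) is attempted.

```python
import re

# Ordered list of (surface pattern, relation); the date pattern is tried first
# so that a year is not mistaken for a location.
PATTERNS = [
    (re.compile(r"^(?P<person>.+?) was born in (?P<value>\d{4})$"), "bornOn"),
    (re.compile(r"^(?P<person>.+?) (?:was )?born in (?P<value>.+)$"), "bornIn"),
    (re.compile(r"^(?P<value>.+)-born (?P<person>.+)$"), "bornIn"),
]


def to_fact(sentence):
    """Map a sentence or phrase to (relation, person, value), or None if no pattern fits."""
    text = sentence.strip().rstrip(".")
    for regex, relation in PATTERNS:
        found = regex.match(text)
        if found:
            return (relation, found.group("person"), found.group("value"))
    return None
```

With these patterns, "William Perkin was born in London." and "London-born William Perkin" both yield the same bornIn fact, while "William Perkin was born in 1838." yields a bornOn fact.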
References
• YAGO-NAGA Project:
– http://www.mpi-inf.mpg.de/yago-naga/
• YAGO:
– http://yago-knowledge.org
• NAGA:
– http://www.mpi-inf.mpg.de/yago-naga/naga/demo.html