GDM@Fudan

GDM@FUDAN http://gdm.fudan.edu.cn
Graph Data Management Lab, School of Computer Science
YAGO
Reporter: Qi Liu
Email: zerup123@gmail.com
What is YAGO?
 A semantic web
 A knowledge base
 A combination of WordNet and Wikipedia
Semantic web
 Advocated by W3C (World Wide Web Consortium)
 Aimed at reconstructing the WWW
 A standard framework: RDF (Resource Description Framework)
Knowledge base
 What it is:
• A special database for knowledge management
 What it does:
• Provides a means for collecting, organising, searching and utilising information
 Three types:
• Machine-readable knowledge bases (DBpedia)
• Human-readable knowledge bases (Wikipedia)
• Knowledge base analysis and design
WordNet
 What it is:
• A lexical database of English, developed since 1985
 What it does:
• Groups words into synsets
• Provides short, general definitions
• Records the semantic relations between these synsets
 25 basic noun groups & 15 verb groups
Key Concepts
 Ontology vs Taxonomy
 Lexicon: the bridge between a language and the knowledge expressed in that language
 Syntactic (there vs their)
 Semantic (sight vs site)
 Pragmatic (infer vs imply)
Semantics of YAGO
 Five relations:
• Domain
• Range
• subRelationOf
• Type
• subClassOf
 Entities:
• Domain
• Relation
• Range
• Literal
• ......
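As a rough illustration of how these relations fit together, the sketch below stores facts as (subject, relation, object) triples and follows subClassOf transitively to list every class an entity belongs to. The example facts, the constant `FACTS` and the helper `classes_of` are assumptions made for illustration; they are not taken from YAGO itself.

```python
# Toy fact store: each fact is a (subject, relation, object) triple.
# The facts are illustrative only, not an excerpt of YAGO.
FACTS = {
    ("AlbertEinstein", "type", "physicist"),
    ("physicist", "subClassOf", "scientist"),
    ("scientist", "subClassOf", "person"),
    ("bornInYear", "domain", "person"),
    ("bornInYear", "range", "year"),
}


def classes_of(entity, facts):
    """Return every class of `entity`, following subClassOf transitively."""
    found = set()
    # Start from the classes the entity is directly typed with.
    frontier = [o for (s, r, o) in facts if s == entity and r == "type"]
    while frontier:
        cls = frontier.pop()
        if cls in found:
            continue
        found.add(cls)
        # Climb one level up the class hierarchy.
        frontier.extend(o for (s, r, o) in facts if s == cls and r == "subClassOf")
    return found
```

Calling `classes_of("AlbertEinstein", FACTS)` collects the direct type plus all of its superclasses.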
Axiomatic rules
Reasoning rules
 Correctness and completeness
The YAGO system
 Knowledge extraction
 YAGO storage
 Enriching YAGO
Knowledge extraction
 TYPE relation
 SUBCLASSOF relation
 MEANS relation
 Other relations
 Meta-relations
TYPE relation extraction
 The Wikipedia Category System
• Types: conceptual, administrative, relational, thematic
 Identifying Conceptual Categories
• Conceptual categories → TYPE
• Administrative and relational categories: excluded by hand
• Employ shallow linguistic parsing (Noun Group Parser) on the remaining two types (conceptual and thematic)
• E.g. Naturalized citizens of the United States
• Domain and range extracted at the same time
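The parsing step can be sketched roughly as follows: find the head noun of the category name and treat the category as conceptual when that head looks plural. This is a deliberately crude stand-in for the Noun Group Parser, not its actual implementation; the preposition list, the irregular-plural list and the function names are assumptions made for illustration.

```python
# Hypothetical word lists; a real parser would use proper linguistic resources.
PREPOSITIONS = {"of", "in", "from", "by", "at", "for", "with", "on"}
IRREGULAR_PLURALS = {"people", "men", "women", "children"}


def head_noun(category):
    """Return the last word before the first preposition (crude head finder)."""
    words = category.split()
    for i, word in enumerate(words):
        if i > 0 and word.lower() in PREPOSITIONS:
            return words[i - 1]
    # No preposition: assume the head is the final word.
    return words[-1]


def looks_conceptual(category):
    """Guess that a category is conceptual when its head noun looks plural."""
    head = head_noun(category).lower()
    if head in IRREGULAR_PLURALS:
        return True
    # Naive plural test: ends in "s" but not "ss" (e.g. "class").
    return head.endswith("s") and not head.endswith("ss")
```

For "Naturalized citizens of the United States" the head is "citizens", so the category is treated as conceptual. The naive plural test will misjudge singular nouns ending in "s", which is one reason a real parser is needed.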
SUBCLASSOF relation extraction
 Wikipedia categories
• DAG (directed acyclic graph)
• Reflect merely the thematic structure
• Use only the leaf categories of Wikipedia
 Integrating WordNet Synsets
• Where a Wikipedia category matches a WordNet synset, prefer WordNet
 Establishing subClassOf
• E.g. American people in Japan
 Exceptions
• Correct manually
MEANS relation extraction
 Exploiting WordNet Synsets
• A synset {urban center, metropolis, city}
• Attach a class to the synset ‘city’
 Exploiting Wikipedia Redirects
• Search “Einstein, Albert”, redirected to “Albert Einstein”
 Parsing Person Names
• givenNameOf subRelationOf means
• familyNameOf subRelationOf means
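A minimal sketch of how redirects and person names could be turned into MEANS-style facts is shown below. The function names and the triple layout are assumptions for illustration, and the name splitter assumes a simple "Given ... Family" word order, which does not hold for every name.

```python
def redirect_facts(redirects):
    """Turn a {redirect title: target page} mapping into means facts."""
    return [(title, "means", target) for title, target in redirects.items()]


def name_facts(full_name, entity):
    """Derive means, givenNameOf and familyNameOf facts from a person's name.

    Assumes the first word is the given name and the last word the family
    name; single-word names only yield the plain means fact.
    """
    parts = full_name.split()
    facts = [(full_name, "means", entity)]
    if len(parts) >= 2:
        facts.append((parts[0], "givenNameOf", entity))
        facts.append((parts[-1], "familyNameOf", entity))
    return facts
```

Because givenNameOf and familyNameOf are declared as sub-relations of means, a lookup on "Einstein" alone could still reach the entity.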
Other relations extraction
 BornInYear & DiedInYear
 EstablishedIn & LocatedIn
 WrittenInYear
 PoliticianOf
 HasWonPrize
 Filtering the Results
Meta-relations extraction
 Descriptions
• Individual DESCRIBES URL
 Witness
• Fact FoundIn URL (of its witness page)
• ExtractedBy
 Context
• Links between A and B: A Context B
YAGO storage
 Model independent of storage
 Storage:
• Text files, XML, database tables, RDF
Enriching YAGO
 Add the fact (x, r, y)
• Map x, y to existing entities (word sense disambiguation)
• If mapping failed, add a new entity
• Map r to the YAGO ontology
• If mapping succeeded, add a FoundIn relation
• If mapping failed, add a new fact!
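The steps above can be sketched as a single function. This is one possible reading of the slide rather than the actual YAGO procedure: it assumes that a candidate whose relation is unknown is skipped, that "mapping succeeded" means the fact is already stored (so only a new FoundIn witness is recorded), and that exact name matching stands in for word sense disambiguation. The dictionary layout of `kb` and the name `enrich` are hypothetical.

```python
def enrich(kb, x, r, y, source_url):
    """Add the candidate fact (x, r, y), found at source_url, to kb.

    kb is a dict with keys "entities" (set), "relations" (set),
    "facts" (set of triples) and "found_in" (list of (fact, url) pairs).
    Returns True if the fact was new, False otherwise.
    """
    # Assumption: candidates with a relation outside the ontology are skipped.
    if r not in kb["relations"]:
        return False
    for name in (x, y):
        # Stand-in for word sense disambiguation: exact name match only.
        if name not in kb["entities"]:
            kb["entities"].add(name)
    fact = (x, r, y)
    is_new = fact not in kb["facts"]
    if is_new:
        kb["facts"].add(fact)
    # Record the witness page whether the fact is new or already known.
    kb["found_in"].append((fact, source_url))
    return is_new
```

Seeing the same fact on a second page therefore adds a second witness without duplicating the fact.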
Summary on YAGO1
 1M entities & 5M facts
 Accuracy around 95%
YAGO2: In Time, Space and Many Languages
 YAGO: about 100 manually defined relations
 YAGO2 architecture is built on four kinds of rules:
• Factual rules
• E.g. exceptions, definitions of all relations, domains, ranges and classes
• Implication rules
• Infer new facts from the facts in the database
• Replacement rules
• Normalize numbers, tags and other formats
• Extraction rules
• Extract facts from a given source text
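To make one of the four rule kinds concrete, the sketch below models replacement rules as ordered (pattern, replacement) pairs applied to source text before extraction. The three rules shown and the names `REPLACEMENT_RULES` and `normalize` are hypothetical examples, not the rules YAGO2 actually ships with.

```python
import re

# Hypothetical replacement rules, applied in order.
REPLACEMENT_RULES = [
    (re.compile(r"(?<=\d),(?=\d{3})"), ""),  # drop thousands separators
    (re.compile(r"<[^>]+>"), ""),            # strip markup tags
    (re.compile(r"\s+"), " "),               # collapse runs of whitespace
]


def normalize(text):
    """Apply every replacement rule in order and trim the result."""
    for pattern, replacement in REPLACEMENT_RULES:
        text = pattern.sub(replacement, text)
    return text.strip()
```

Keeping the rules as data rather than code means new normalizations can be added without touching the extraction logic, which is the point of a rule-driven architecture.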
Temporal Dimension
 People: wasBornOnDate & diedOnDate
 Groups: wasCreatedOnDate & wasDestroyedOnDate
 Artifacts (buildings, songs, cities): [same as above]
 Events: startedOnDate & endedOnDate
=> startExistingOnDate & endExistingOnDate
 Facts
 Entities in a fact
=> subjectStartRelation & objectStartRelation
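The idea that the specific date relations all reduce to a generic existence span can be sketched as below. The grouping of relations follows the list above; the set names, the helper `existence_span` and the ISO date strings are assumptions for illustration.

```python
# Specific relations that mark the start or end of an entity's existence.
EXISTENCE_START = {"wasBornOnDate", "wasCreatedOnDate", "startedOnDate"}
EXISTENCE_END = {"diedOnDate", "wasDestroyedOnDate", "endedOnDate"}


def existence_span(entity, facts):
    """Return (start, end) for `entity`; either value is None if unknown."""
    start = end = None
    for subject, relation, value in facts:
        if subject != entity:
            continue
        if relation in EXISTENCE_START:
            start = value
        elif relation in EXISTENCE_END:
            end = value
    return start, end
```

For example, with the facts `("AlbertEinstein", "wasBornOnDate", "1879-03-14")` and `("AlbertEinstein", "diedOnDate", "1955-04-18")`, the function returns both dates as the existence span.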




GEO-SPATIAL Dimension
 All physical objects have a location in space!
 Define it with geographical coordinates, i.e. latitude and longitude
=>yagoGeoCoordinates,
=>hasGeoCoordinates
 Two sources:
• Wikipedia
• GeoNames
• locatedIn & hasGeoCoordinates & <location, TYPE, class>
Textual Dimension
 hasWikipediaAnchorText
 hasWikipediaCategory
 hasCitationTitle
 subClassOf hasContext
 Integrating UWN (Universal WordNet) to cover 200 languages