Finding Patterns in a Knowledge Base using Keywords to Compose Table Answers

Mohan Yang
Bolin Ding Surajit Chaudhuri
Kaushik Chakrabarti
Raghunath Thota
Mounika Mudireddy
Outline

Introduction

Model

Indexing

Searching

Experiments

Conclusion
Introduction

Answers to queries about multiple entities are difficult to
interpret when presented as a ranked list

Information about a set of entities is better
presented as a table

Google Tables and Microsoft PowerQuery use HTML tables

Generate table answers based on patterns in knowledge
base

Knowledge bases contain information about entities and
the attributes that represent relationships between them

Model knowledge base as knowledge graph

Knowledge Graph: Directed Graph with nodes as entities
and edges as attributes

Tree Pattern: Possible interpretation of a query

Subtrees are found from the knowledge graph based on
the keywords.

Results with different tree patterns are aggregated into
different tables
Model


Knowledge Graph is represented as G = (V, ε, Τ, α)

V is a collection of entities

ε contains directed edges between the entities in V

Τ(v) gives the type of the entity v in V

α(e) gives the type of the edge e in ε
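
The definition above can be pictured with a small data structure. This is only a sketch: the class name KnowledgeGraph, its field names, and the sample entities are assumptions made for illustration, not part of the original model.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeGraph:
    # T: maps each entity v in V to its type
    node_type: dict = field(default_factory=dict)
    # E with alpha: each edge is stored as (source, edge type, target)
    edges: list = field(default_factory=list)

    def add_entity(self, v, type_):
        self.node_type[v] = type_

    def add_edge(self, u, attr, v):
        # attr plays the role of alpha(e), the type of the edge
        self.edges.append((u, attr, v))

    def out_edges(self, u):
        # all (edge type, target) pairs leaving entity u
        return [(a, v) for (s, a, v) in self.edges if s == u]


# Hypothetical sample data
g = KnowledgeGraph()
g.add_entity("SQL Server", "Software")
g.add_entity("Microsoft", "Company")
g.add_edge("SQL Server", "Developer", "Microsoft")
```

Storing the edge type inside the edge triple keeps the sketch short; a larger graph would probably use an adjacency map instead of scanning the edge list.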
Valid Subtrees:

Exactly one path exists from the root to each leaf

Each keyword in the query appears in the text description of
node/node type/edge type

Every leaf entity or the edge pointing to the leaf must be
mapped to a keyword

Path Patterns: represented as the concatenation of
node/edge types on the path from root to keyword

Length of path pattern = Number of nodes present in the
path from root to a keyword

Tree Patterns: represented as a vector containing path
pattern of each keyword in the query


Height of Tree pattern = maximum length of Path patterns
The tree patterns of two valid subtrees are identical
if, for each keyword, the path patterns in the two
subtrees are identical
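
A minimal sketch of these definitions follows. It assumes a path is stored as a tuple that alternates node, edge type, node, and that every path ends on a node (the case where a keyword matches an edge type is left out). The function names and the sample entities are hypothetical.

```python
# Hypothetical node types for the example
NODE_TYPE = {
    "SQL Server": "Software", "Microsoft": "Company",
    "Oracle DB": "Software", "Oracle": "Company",
}


def path_pattern(path):
    # Replace nodes (even positions) by their types; edge types stay as-is
    return tuple(NODE_TYPE[x] if i % 2 == 0 else x for i, x in enumerate(path))


def pattern_length(pattern):
    # Number of nodes on the path: positions 0, 2, 4, ...
    return (len(pattern) + 1) // 2


def tree_pattern(subtree, keywords):
    # One path pattern per keyword, in query order
    return tuple(path_pattern(subtree[w]) for w in keywords)


def height(tp):
    # Maximum length over the path patterns
    return max(pattern_length(p) for p in tp)


keywords = ["software", "company"]
t1 = {"software": ("SQL Server",),
      "company": ("SQL Server", "Developer", "Microsoft")}
t2 = {"software": ("Oracle DB",),
      "company": ("Oracle DB", "Developer", "Oracle")}

# Different entities, identical tree pattern
same = tree_pattern(t1, keywords) == tree_pattern(t2, keywords)
```

Here t1 and t2 share one tree pattern, so they would contribute to the same table answer.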

Converting tree patterns into table answers

Valid subtrees with the same tree pattern are placed in the same table

Each valid subtree is represented as a row in the table

Column names are the types of the nodes/edges on
the path from the root to the respective keyword

Column values are the nodes, i.e. the entities

Repeated columns are kept only once
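
The conversion can be sketched as below. The column naming ("Developer: Company" for a node reached through a Developer edge) and the input layout are assumptions for illustration; the slides do not fix an exact format.

```python
def to_tables(subtrees):
    """Group subtrees by tree pattern; one table per pattern, one row per subtree.

    Each subtree maps a keyword to its root-to-keyword path, given as a
    list of (column name, entity) pairs.
    """
    tables = {}
    for st in subtrees:
        keywords = sorted(st)
        pattern = tuple(tuple(col for col, _ in st[kw]) for kw in keywords)
        row = {}
        for kw in keywords:
            for col, entity in st[kw]:
                row.setdefault(col, entity)  # a repeated column is kept once
        header, rows = tables.setdefault(pattern, (tuple(row), []))
        rows.append(tuple(row.values()))
    return tables


# Hypothetical subtrees for the query "software company"
s1 = {"software": [("Software", "SQL Server")],
      "company": [("Software", "SQL Server"), ("Developer: Company", "Microsoft")]}
s2 = {"software": [("Software", "Oracle DB")],
      "company": [("Software", "Oracle DB"), ("Developer: Company", "Oracle")]}
s3 = {"software": [("Book", "SQL Basics"), ("Subject: Software", "SQL Server")],
      "company": [("Book", "SQL Basics"), ("Publisher: Company", "Example Press")]}

tables = to_tables([s1, s2, s3])
for header, rows in tables.values():
    print(header, rows)
```

s1 and s2 land in one two-row table, while s3 has a different tree pattern and gets a table of its own.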


Relevance score of tree pattern

Scoring functions are used to measure the relevance of tree
patterns

Relevance score of tree pattern is aggregation of relevance
score of valid subtrees satisfying that tree pattern

The aggregation can favor tree patterns with many
valid subtrees, or tree patterns with a few
highly relevant individual subtrees
Relevance score of individual valid subtree

Size of the tree (smaller size)

Importance of nodes in the subtree (higher PageRank scores)

How well the keywords match the text description in
subtree
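
The slides list the scoring factors without a formula, so the following is only one plausible way to combine them. The particular combination (mean PageRank times mean match quality, divided by tree size) and both function names are assumptions, not the authors' scoring function.

```python
def subtree_score(size, pageranks, match_scores):
    # Illustrative only: smaller trees, more important nodes and
    # better keyword matches all raise the score.
    importance = sum(pageranks) / len(pageranks)
    match = sum(match_scores) / len(match_scores)
    return importance * match / size


def pattern_score(subtree_scores, mode="sum"):
    # "sum" favors patterns with many valid subtrees;
    # "max" favors patterns with one highly relevant subtree.
    return sum(subtree_scores) if mode == "sum" else max(subtree_scores)


scores = [subtree_score(2, [0.5, 0.5], [1.0]),
          subtree_score(1, [0.5], [1.0])]
by_count = pattern_score(scores, "sum")
by_best = pattern_score(scores, "max")
```

Switching between the two modes changes which tree patterns rank highest: a pattern with many mediocre subtrees wins under "sum", a pattern with a single strong subtree can win under "max".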
Indexing

For each keyword w in query, all paths from a root
following a specific pattern and ending with w are fetched

Pattern-First path index: Sorted by patterns followed by
roots

Root-First path index: Sorted by roots followed by
patterns

Size of the index is bounded by:

Total number of paths with length d

Size of text on entities and attributes
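
Both indexes can be sketched with nested dictionaries. This is a simplified illustration: nested maps group entries rather than keeping them physically sorted, keywords are matched only against the name and type of the last node on a path, and the sample graph and all function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample graph
NODE_TYPE = {"SQL Server": "Software", "Microsoft": "Company"}
EDGES = {"SQL Server": [("Developer", "Microsoft")]}  # source -> [(edge type, target)]


def paths_from(root, d):
    # Yield every simple path from root with at most d nodes
    stack = [(root,)]
    while stack:
        path = stack.pop()
        yield path
        if (len(path) + 1) // 2 < d:
            for attr, target in EDGES.get(path[-1], []):
                if target not in path[::2]:
                    stack.append(path + (attr, target))


def words(node):
    # Words of the node's name and type, used as matchable keywords
    return set((node + " " + NODE_TYPE[node]).lower().split())


def build_indexes(d):
    # pattern_first[w][pattern][root] and root_first[w][root][pattern]
    pattern_first = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    root_first = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for root in NODE_TYPE:
        for path in paths_from(root, d):
            pattern = tuple(NODE_TYPE[x] if i % 2 == 0 else x
                            for i, x in enumerate(path))
            for w in words(path[-1]):
                pattern_first[w][pattern][root].append(path)
                root_first[w][root][pattern].append(path)
    return pattern_first, root_first


pattern_first, root_first = build_indexes(2)
```

Both indexes hold the same paths; only the nesting order differs, which is what lets one algorithm iterate by pattern and the other by root.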
Searching: Pattern Enumeration Join

A tree pattern contains m path patterns if the query contains m
keywords

Uses pattern-first path index to fetch all paths of the
given pattern

Joins them at the root to find the valid subtrees

Computes the score for the tree pattern
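
The steps above can be sketched as follows, assuming a pattern-first index laid out as index[keyword][pattern][root] -> list of paths. The check that the joined paths really form a tree is omitted, and the number of subtrees stands in for the score; both are simplifications, and the sample index is hypothetical.

```python
from itertools import product


def pattern_enum_join(pattern_first, keywords):
    if not keywords:
        return {}
    results = {}
    per_keyword = [pattern_first.get(w, {}) for w in keywords]
    # Enumerate every combination of one path pattern per keyword
    for combo in product(*per_keyword):
        root_maps = [per_keyword[i][p] for i, p in enumerate(combo)]
        # Join at the root: keep roots that appear for every keyword
        shared = set(root_maps[0]).intersection(*root_maps[1:])
        subtrees = [paths for r in sorted(shared)
                    for paths in product(*(m[r] for m in root_maps))]
        if subtrees:
            results[combo] = subtrees
    return results


# Hypothetical pattern-first index
index = {
    "software": {
        ("Software",): {"SQL Server": [("SQL Server",)],
                        "Oracle DB": [("Oracle DB",)]},
    },
    "company": {
        ("Software", "Developer", "Company"): {
            "SQL Server": [("SQL Server", "Developer", "Microsoft")],
            "Oracle DB": [("Oracle DB", "Developer", "Oracle")],
        },
        ("Company",): {"Microsoft": [("Microsoft",)],
                       "Oracle": [("Oracle",)]},
    },
}

results = pattern_enum_join(index, ["software", "company"])
scores = {tp: len(subtrees) for tp, subtrees in results.items()}
```

Note that the second pattern combination, (Software) with (Company), is enumerated but joins to nothing; wasted work of this kind is the cost of enumerating patterns first.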
Searching: Linear-Time Enumeration

Candidate roots can be obtained by using root-first index

Roots(w) returns all roots that can reach keyword w

The intersection of the root sets of all keywords
gives the candidate roots

For each candidate root, we fetch the patterns to reach
each keyword using the function Patterns(w,r)

From the fetched patterns, get the paths which can form
valid subtrees
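
A sketch of this procedure follows, assuming a root-first index laid out as index[keyword][root][pattern] -> list of paths. The functions roots and patterns mirror Roots(w) and Patterns(w, r) from the slide; the tree-validity check is again omitted and the sample index is hypothetical.

```python
from collections import defaultdict
from itertools import product


def roots(index, w):
    # Roots(w): all roots that can reach keyword w
    return set(index.get(w, {}))


def patterns(index, w, r):
    # Patterns(w, r): path patterns leading from root r to keyword w
    return index[w][r]


def linear_enum(index, keywords):
    if not keywords:
        return {}
    candidates = set.intersection(*(roots(index, w) for w in keywords))
    results = defaultdict(list)
    for r in sorted(candidates):
        per_kw = [patterns(index, w, r) for w in keywords]
        for combo in product(*per_kw):  # one pattern per keyword at this root
            for paths in product(*(per_kw[i][p] for i, p in enumerate(combo))):
                results[combo].append(paths)
    return dict(results)


# Hypothetical root-first index
index = {
    "software": {
        "SQL Server": {("Software",): [("SQL Server",)]},
        "Oracle DB": {("Software",): [("Oracle DB",)]},
    },
    "company": {
        "SQL Server": {("Software", "Developer", "Company"):
                       [("SQL Server", "Developer", "Microsoft")]},
        "Oracle DB": {("Software", "Developer", "Company"):
                      [("Oracle DB", "Developer", "Oracle")]},
        "Microsoft": {("Company",): [("Microsoft",)]},
    },
}

results = linear_enum(index, ["software", "company"])
```

Because work starts from roots that reach every keyword, each pattern combination visited in this sketch yields at least one subtree, unlike the pattern-first join, which may enumerate combinations that join to nothing.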
Experiments

Experiment 1: Varying height threshold d and number of
tree patterns

Experiment 2: Varying number of valid subtrees

Experiment 3: Varying size of data set
Conclusion

Formal models of tree patterns are defined

Path-based indexes and two efficient algorithms are
proposed to find tree patterns

Tree patterns are used to compose table answers for
keyword queries

The d-height tree pattern problem and a sampling-based
approach are introduced to improve the accuracy and
efficiency of the search
Thank You