Finding Patterns in a Knowledge Base using Keywords to Compose Table Answers

Mohan Yang, Bolin Ding, Surajit Chaudhuri, Kaushik Chakrabarti, Raghunath Thota, Mounika Mudireddy

Outline
- Introduction
- Model
- Indexing
- Searching
- Experiments
- Conclusion

Introduction
- Results for queries about multiple entities are difficult to interpret in the form of a ranked list
- Information related to sets of entities is better represented as a table
- Google Tables and Microsoft PowerQuery use HTML tables
- Generate table answers based on patterns in a knowledge base
- Knowledge bases contain information about entities and the attributes representing relationships between them
- Model the knowledge base as a knowledge graph
- Knowledge Graph: directed graph with entities as nodes and attributes as edges
- Tree Pattern: a possible interpretation of a query
- Subtrees are found in the knowledge graph based on the keywords
- Results with different tree patterns are aggregated into different tables

Model
- The knowledge graph is represented as G = (V, ε, Τ, α)
- V is a collection of entities
- ε contains directed edges between the entities in V
- Τ(v) gives the type of the entity v in V
- α(e) gives the type of the edge e in ε
- Valid Subtrees:
  - Only one path exists from the root to each leaf
  - Each keyword in the query appears in the text description of a node, a node type, or an edge type
  - Every leaf entity, or the edge pointing to the leaf, must be mapped to a keyword
- Path Patterns: represented as the concatenation of node/edge types on the path from the root to a keyword
- Length of a path pattern = number of nodes on the path from the root to a keyword
- Tree Patterns: represented as a vector containing the path pattern of each keyword in the query
- Height of a tree pattern = maximum length of its path patterns
- The tree patterns of two valid subtrees are identical if the path patterns for each keyword in the respective subtrees are identical

Converting tree patterns into table answers
- Valid subtrees with the same tree pattern are represented in the same table
- Each valid subtree is represented as a row in the table
- Column names are the types of the nodes/edges on the path from the root to the respective keyword
- Column values are the nodes, i.e., the entities
- A repeated column is kept only once

Relevance score of a tree pattern
- Scoring functions are used to measure the relevance of tree patterns
- The relevance score of a tree pattern is an aggregation of the relevance scores of the valid subtrees satisfying that tree pattern
- The aggregation can favor tree patterns with many valid subtrees, or tree patterns with highly relevant individual subtrees
- Relevance score of an individual valid subtree:
  - Size of the tree (smaller is better)
  - Importance of the nodes in the subtree (higher PageRank scores)
  - How well the keywords match the text descriptions in the subtree

Indexing
- For each keyword w in the query, all paths that start from a root, follow a specific pattern, and end with w are fetched
- Pattern-First path index: sorted by patterns, then by roots
- Root-First path index: sorted by roots, then by patterns
- The size of the index is bounded by:
  - Total number of paths with length d
  - Size of the text on entities and attributes

Searching: Pattern Enumeration Join
- A tree pattern contains m path patterns if the query contains m keywords
- Uses the pattern-first path index to fetch all paths of the given pattern
- Joins them at the root to find the valid subtrees
- Computes the score for the tree pattern

Searching: Linear-Time Enumeration
- Candidate roots are obtained using the root-first index
- Roots(w) returns all roots that can reach w
- The intersection of the root sets of all keywords gives the candidate roots
- For each candidate root r, the patterns that reach each keyword are fetched using the function Patterns(w, r)
- From the fetched patterns, get the paths that can form valid subtrees

Experiments
- Experiment 1: Varying the height threshold d and the number of tree patterns
- Experiment 2: Varying the number of valid subtrees
- Experiment 3: Varying the size of the data set

Conclusion
- Formal models of tree patterns are defined
- Path-based indexes and two efficient algorithms are proposed to find tree patterns
- Tree patterns are used to compose table answers for keyword queries
- The d-height tree pattern problem and a sampling-based approach are introduced to improve search accuracy and efficiency

Thank You
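
As a rough illustration of the root-first path index and the Linear-Time Enumeration search described in the slides, the following is a minimal Python sketch. The toy graph, the function names `build_root_first_index` and `search`, and the nested-dictionary index layout are assumptions made for this example and are not taken from the slides. The sketch also simplifies two things: it does not verify that the combined paths form a tree beyond sharing a root, and it ranks a tree pattern only by its number of subtrees rather than by a full relevance score.

```python
from collections import defaultdict
from itertools import product

# Hypothetical toy knowledge graph: node id -> (type, text description).
NODES = {
    "sqlserver": ("Software", "SQL Server"),
    "office": ("Software", "Office"),
    "microsoft": ("Company", "Microsoft"),
    "redmond": ("City", "Redmond"),
}
# Directed edges as (source, attribute type, destination).
EDGES = [
    ("sqlserver", "Developer", "microsoft"),
    ("office", "Developer", "microsoft"),
    ("microsoft", "Headquarters", "redmond"),
]


def build_root_first_index(nodes, edges, d):
    """Return index[word][root][path_pattern] -> list of paths.

    A path is a tuple of node ids with at most d nodes; a path pattern is
    the tuple of node/edge types from the root to the matched keyword.
    """
    out = defaultdict(list)
    for src, attr, dst in edges:
        out[src].append((attr, dst))
    index = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))

    def walk(root, path, pattern):
        node_type, text = nodes[path[-1]]
        # A keyword may match the node's text or the node's type.
        for word in set(text.lower().split()) | {node_type.lower()}:
            index[word][root][pattern + (node_type,)].append(path)
        if len(path) == d:
            return
        for attr, dst in out[path[-1]]:
            if dst in path:  # skip cycles
                continue
            step = pattern + (node_type, attr)
            # A keyword may also match the type of the edge into the leaf.
            for word in attr.lower().split():
                index[word][root][step].append(path + (dst,))
            walk(root, path + (dst,), step)

    for root in nodes:
        walk(root, (root,), ())
    return index


def search(index, keywords):
    """Group candidate subtrees by tree pattern (one table per pattern)."""
    words = [w.lower() for w in keywords]
    # Roots(w) for each keyword, intersected to get the candidate roots.
    roots = set.intersection(*(set(index.get(w, {})) for w in words))
    tables = defaultdict(list)
    for root in sorted(roots):
        per_word = [index[w][root] for w in words]  # Patterns(w, r)
        for tree_pattern in product(*per_word):
            path_lists = [per_word[i][p] for i, p in enumerate(tree_pattern)]
            for paths in product(*path_lists):
                tables[tree_pattern].append(paths)  # one row per subtree
    return dict(tables)


index = build_root_first_index(NODES, EDGES, d=3)
tables = search(index, ["software", "microsoft"])
# Rank tree patterns by how many subtrees (rows) they aggregate.
for pattern, rows in sorted(tables.items(), key=lambda kv: -len(kv[1])):
    print(pattern, len(rows))
```

In this sketch each key of `tables` plays the role of a tree pattern and each list entry corresponds to one row of the resulting table answer. A pattern-first index for the Pattern Enumeration Join could be built from the same walk by swapping the order of the root and pattern keys.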
