Finding Patterns in a Knowledge Base using Keywords to Compose Table Answers
Mohan Yang, Bolin Ding, Surajit Chaudhuri, Kaushik Chakrabarti
Presented by Raghunath Thota and Mounika Mudireddy

Outline
- Introduction
- Model
- Indexing
- Searching
- Experiments
- Conclusion

Introduction
- Queries about multiple entities are difficult to answer with a ranked list of results.
- Information about a set of entities is better presented as a table.
- Google Tables and Microsoft PowerQuery answer such queries with existing HTML tables.
- This work instead generates table answers from patterns in a knowledge base.
- A knowledge base contains entities and the attributes that represent relationships between them.
- The knowledge base is modeled as a knowledge graph: a directed graph whose nodes are entities and whose edges are attributes.
- Tree pattern: one possible interpretation of a query.
- Subtrees that match the query keywords are found in the knowledge graph, and results with different tree patterns are aggregated into different tables.

Model
Knowledge graph G = (V, ε, Τ, α):
- V is the set of entities.
- ε contains the directed edges between entities in V.
- Τ(v) gives the type of entity v in V.
- α(e) gives the type of edge e in ε.

Valid subtrees:
- Exactly one path exists from the root to each leaf.
- Each query keyword appears in the text description of a node, a node type, or an edge type in the subtree.
- Every leaf entity, or the edge pointing to the leaf, must be mapped to a keyword.

Path patterns:
- A path pattern is the concatenation of the node and edge types on the path from the root to a keyword.
- Length of a path pattern = number of nodes on the path from the root to the keyword.

Tree patterns:
- A tree pattern is a vector containing the path pattern of each query keyword.
- Height of a tree pattern = maximum length of its path patterns.
- Two valid subtrees have identical tree patterns if, for every keyword, their path patterns are identical.

Converting tree patterns into table answers
- Valid subtrees with identical tree patterns are placed in the same table.
- Each valid subtree is one row of that table.
- Column names are the types of the nodes/edges on the path from the root to each keyword.
- Column values are the corresponding nodes, i.e., entities.
- A column that would be repeated (e.g., the shared root) is kept only once.

Relevance score of a tree pattern
- Scoring functions measure the relevance of tree patterns.
- The score of a tree pattern aggregates the scores of the valid subtrees that satisfy it.
- The aggregation can reward either tree patterns with many valid subtrees or tree patterns with highly relevant individual subtrees.

Relevance score of an individual valid subtree
- Size of the tree (smaller is better)
- Importance of the nodes in the subtree (higher PageRank scores are better)
- How well the keywords match the text descriptions in the subtree

Indexing
- For each query keyword w, all paths that start at a root, follow a specific pattern, and end with w are fetched.
- Pattern-first path index: sorted by pattern, then by root.
- Root-first path index: sorted by root, then by pattern.
- The size of the index is bounded by:
  - the total number of paths of length at most d
  - the size of the text on entities and attributes

Searching: Pattern Enumeration-Join
- If the query contains m keywords, each tree pattern contains m path patterns.
- For each combination of path patterns, the pattern-first path index is used to fetch all paths of each pattern.
- The paths are joined at the root to find valid subtrees.
- The score of each resulting tree pattern is computed.

Searching: Linear-Time Enumeration
- Candidate roots are obtained from the root-first path index.
- Roots(w) gives all roots that can reach w.
- The intersection of the root sets of all keywords gives the candidate roots.
- For each candidate root r, Patterns(w, r) fetches the patterns that lead from r to each keyword w.
- From the fetched patterns, the paths that can form valid subtrees are collected.

Experiments
- Experiment 1: varying the height threshold d and the number of tree patterns
- Experiment 2: varying the number of valid subtrees
- Experiment 3: varying the size of the data set

Conclusion
- Formal models of tree patterns are defined.
- Path-based indexes and two efficient algorithms are proposed to find tree patterns.
- Tree patterns are used to compose table answers for keyword queries.
- The d-height tree pattern problem is introduced, together with a sampling-based approach that improves the efficiency of the search and the accuracy of its results.
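The Model definitions (G = (V, ε, Τ, α), valid subtrees, path patterns, tree patterns, and height) can be made concrete with a minimal Python sketch. The `KnowledgeGraph` class, the helper names, the substring matching rule, and the Microsoft example are illustrative assumptions, not the authors' implementation; edge-type keyword matches are left out for brevity.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class KnowledgeGraph:
    """Toy version of G = (V, ε, Τ, α): entity types plus typed directed edges."""
    types: dict[str, str]  # Τ(v): entity -> entity type (the keys form V)
    edges: dict[tuple[str, str], str] = field(default_factory=dict)  # (u, v) -> α(e)

    def add_edge(self, u: str, v: str, attribute: str) -> None:
        self.edges[(u, v)] = attribute

    def matches(self, keyword: str, v: str) -> bool:
        # Assumption: a keyword matches a node if it occurs in the entity's text
        # or in its type; edge-type matches are omitted from this sketch.
        k = keyword.lower()
        return k in v.lower() or k in self.types[v].lower()


def path_pattern(g: KnowledgeGraph, path: tuple[str, ...]) -> tuple[str, ...]:
    """Concatenate node and edge types along a root-to-keyword path."""
    pattern = [g.types[path[0]]]
    for u, v in zip(path, path[1:]):
        pattern += [g.edges[(u, v)], g.types[v]]
    return tuple(pattern)


def tree_pattern(g: KnowledgeGraph, paths: list[tuple[str, ...]]) -> tuple[tuple[str, ...], ...]:
    """One path pattern per keyword, in query order."""
    return tuple(path_pattern(g, p) for p in paths)


def height(paths: list[tuple[str, ...]]) -> int:
    """Path length counts nodes, so the height is the largest node count."""
    return max(len(p) for p in paths)


def is_valid_subtree(g: KnowledgeGraph, keywords: list[str], paths: list[tuple[str, ...]]) -> bool:
    """paths[i] runs from the shared root to the node matching keywords[i]."""
    if len(keywords) != len(paths):
        return False
    root = paths[0][0]
    parent: dict[str, str | None] = {}
    for keyword, path in zip(keywords, paths):
        if path[0] != root or not g.matches(keyword, path[-1]):
            return False
        for i, v in enumerate(path):
            p = path[i - 1] if i > 0 else None
            if p is not None and (p, v) not in g.edges:
                return False  # consecutive nodes must be joined by an edge
            if parent.setdefault(v, p) != p:
                return False  # a second parent would give a second root-to-node path
    return True


g = KnowledgeGraph({"Microsoft": "Company", "Bill Gates": "Person", "Redmond": "City"})
g.add_edge("Microsoft", "Bill Gates", "Founder")
g.add_edge("Microsoft", "Redmond", "Headquarters")

keywords = ["gates", "city"]
paths = [("Microsoft", "Bill Gates"), ("Microsoft", "Redmond")]
print(is_valid_subtree(g, keywords, paths))  # True
print(tree_pattern(g, paths))  # (('Company', 'Founder', 'Person'), ('Company', 'Headquarters', 'City'))
print(height(paths))  # 2
```

Tracking one parent per node is what enforces the "exactly one path from the root to each leaf" condition when the keyword paths are merged.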
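The step of converting tree patterns into table answers can be sketched as follows, assuming each valid subtree is given as its tree pattern plus one root-to-keyword path per keyword. `compose_tables` and `column_header` are hypothetical names, the `EdgeType:NodeType` column naming is an assumption, and the three subtrees are toy data for a query such as "person city".

```python
from collections import defaultdict

# Hypothetical types: a path pattern is a tuple of alternating node/edge types,
# a tree pattern is a tuple of path patterns, and a path is a tuple of entities.
PathPattern = tuple[str, ...]
TreePattern = tuple[PathPattern, ...]
Path = tuple[str, ...]


def column_header(prefix: PathPattern) -> str:
    """Name a column by its node type, prefixed with the incoming edge type."""
    return prefix[0] if len(prefix) == 1 else f"{prefix[-2]}:{prefix[-1]}"


def compose_tables(subtrees: list[tuple[TreePattern, list[Path]]]) -> dict:
    """Group valid subtrees by tree pattern: one table per pattern, one row per subtree."""
    groups = defaultdict(list)
    for pattern, paths in subtrees:
        groups[pattern].append(paths)

    tables = {}
    for pattern, members in groups.items():
        # One column per distinct pattern prefix ending at a node. A prefix shared by
        # several keywords (e.g. the root) is emitted once; its value is read from the
        # first keyword path containing it (assumes such paths share that entity).
        columns = []  # (prefix, index of the keyword path that supplies the value)
        for k, pp in enumerate(pattern):
            for end in range(1, len(pp) + 1, 2):  # node types sit at even offsets
                if all(pp[:end] != seen for seen, _ in columns):
                    columns.append((pp[:end], k))
        headers = [column_header(prefix) for prefix, _ in columns]
        # Node i on a path corresponds to a pattern prefix of length 2*i + 1.
        rows = [[paths[k][len(prefix) // 2] for prefix, k in columns] for paths in members]
        tables[pattern] = (headers, rows)
    return tables


A = (("Company", "Founder", "Person"), ("Company", "Headquarters", "City"))
B = (("Person",), ("Person", "BornIn", "City"))
subtrees = [
    (A, [("Microsoft", "Bill Gates"), ("Microsoft", "Redmond")]),
    (A, [("Microsoft", "Paul Allen"), ("Microsoft", "Redmond")]),
    (B, [("Bill Gates",), ("Bill Gates", "Seattle")]),
]
tables = compose_tables(subtrees)
for headers, rows in tables.values():
    print(headers, rows)
```

Here the two Microsoft subtrees share pattern A and become two rows of one table with columns Company, Founder:Person, Headquarters:City. The Bill Gates subtree forms a separate two-column table, because the Person column that both of its keyword paths start with is kept only once.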
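The relevance-score slides name three factors for a valid subtree and two ways of aggregating subtree scores per tree pattern, but give no formula. The sketch below picks one hypothetical combination (inverse size × mean PageRank × mean Jaccard keyword overlap) and aggregates with either `sum` (rewarding many subtrees) or `max` (rewarding one highly relevant subtree). The function names, the formula, and the PageRank values are all made up for illustration.

```python
from collections import defaultdict


def jaccard(keyword: str, text: str) -> float:
    """Token overlap between a keyword and the text it matched (a stand-in measure)."""
    a, b = set(keyword.lower().split()), set(text.lower().split())
    return len(a & b) / len(a | b)


def subtree_score(nodes, matches, pagerank) -> float:
    """Combine the three factors from the slide into one hypothetical score."""
    size_factor = 1.0 / len(nodes)                             # smaller trees score higher
    importance = sum(pagerank[v] for v in nodes) / len(nodes)  # mean PageRank of the nodes
    text_match = sum(jaccard(k, t) for k, t in matches) / len(matches)
    return size_factor * importance * text_match


def pattern_scores(subtrees, pagerank, mode: str = "sum") -> dict:
    """Aggregate subtree scores per tree pattern.

    "sum" rewards patterns with many valid subtrees; "max" rewards patterns that
    contain a single highly relevant subtree.
    """
    if mode not in ("sum", "max"):
        raise ValueError(f"unknown mode: {mode}")
    per_pattern = defaultdict(list)
    for pattern, nodes, matches in subtrees:
        per_pattern[pattern].append(subtree_score(nodes, matches, pagerank))
    aggregate = sum if mode == "sum" else max
    return {p: aggregate(scores) for p, scores in per_pattern.items()}


# Made-up PageRank values and subtrees for the query "gates city";
# "A" and "B" label two tree patterns.
pagerank = {"Microsoft": 0.9, "Bill Gates": 0.3, "Redmond": 0.1,
            "Seattle": 0.2, "Melinda Gates": 0.2, "Dallas": 0.2}
subtrees = [
    ("A", ["Microsoft", "Bill Gates", "Redmond"], [("gates", "Bill Gates"), ("city", "City")]),
    ("B", ["Bill Gates", "Seattle"], [("gates", "Bill Gates"), ("city", "City")]),
    ("B", ["Melinda Gates", "Dallas"], [("gates", "Melinda Gates"), ("city", "City")]),
]
for mode in ("sum", "max"):
    scores = pattern_scores(subtrees, pagerank, mode)
    print(mode, sorted(scores, key=scores.get, reverse=True))
```

With these numbers, pattern B ranks first under `sum` because it has two moderately relevant subtrees, while pattern A ranks first under `max` because its single subtree scores highest.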
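The two path indexes from the Indexing slide can be sketched over a toy graph. Every simple path with at most d nodes is enumerated once, and each word of the path's end entity gets one entry ordered pattern-first and one ordered root-first. Keeping the lists sorted lets `patterns` locate a root's contiguous block by binary search. The graph, the tokenization, and the names `build_indexes`, `roots`, and `patterns` (modeled on the slide's Roots(w) and Patterns(w, r)) are assumptions; words from edge types are not indexed here.

```python
from bisect import bisect_left
from collections import defaultdict

# Hypothetical toy knowledge graph: Τ(v) for each entity, and (u, v, α(e)) edges.
TYPES = {"Microsoft": "Company", "Bill Gates": "Person", "Paul Allen": "Person",
         "Redmond": "City", "Seattle": "City"}
EDGES = [("Microsoft", "Bill Gates", "Founder"), ("Microsoft", "Paul Allen", "Founder"),
         ("Microsoft", "Redmond", "Headquarters"), ("Bill Gates", "Seattle", "BornIn"),
         ("Paul Allen", "Seattle", "BornIn")]


def enumerate_paths(types, edges, d):
    """Yield (root, pattern, path) for every simple path with at most d nodes."""
    out = defaultdict(list)
    for u, v, attribute in edges:
        out[u].append((v, attribute))
    stack = [((r,), (types[r],)) for r in types]
    while stack:
        path, pattern = stack.pop()
        yield path[0], pattern, path
        if len(path) < d:
            for v, attribute in out[path[-1]]:
                if v not in path:
                    stack.append((path + (v,), pattern + (attribute, types[v])))


def build_indexes(types, edges, d):
    """Return (pattern_first, root_first): word -> sorted list of index entries."""
    pattern_first, root_first = defaultdict(list), defaultdict(list)
    for root, pattern, path in enumerate_paths(types, edges, d):
        end = path[-1]
        # Assumption: the words of the end entity's name and type are indexed.
        for w in set(end.lower().split()) | set(types[end].lower().split()):
            pattern_first[w].append((pattern, root, path))
            root_first[w].append((root, pattern, path))
    for index in (pattern_first, root_first):
        for entries in index.values():
            entries.sort()
    return pattern_first, root_first


def roots(root_first, w):
    """Roots(w): every root from which some indexed path reaches w."""
    return {root for root, _, _ in root_first.get(w, [])}


def patterns(root_first, w, r):
    """Patterns(w, r): binary-search the root-first list for r's contiguous block."""
    entries = root_first.get(w, [])
    i = bisect_left(entries, (r,))
    found = []
    while i < len(entries) and entries[i][0] == r:
        if entries[i][1] not in found:
            found.append(entries[i][1])
        i += 1
    return found


pattern_first, root_first = build_indexes(TYPES, EDGES, d=3)
print(len(pattern_first["city"]))  # 7
print(sorted(roots(root_first, "gates")))  # ['Bill Gates', 'Microsoft']
print(patterns(root_first, "city", "Microsoft"))
```

Both indexes hold the same entries in a different order, so their size grows with the number of paths of at most d nodes times the words indexed per path.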
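Pattern enumeration-join can be sketched on top of a pattern-first index stored as nested dictionaries (word → pattern → root → paths). For m keywords it tries every m-tuple of path patterns, joins the corresponding path lists on their root, and keeps the combinations that yield at least one valid subtree. The toy graph and the names `pattern_first_index`, `is_tree`, and `pattern_enum_join` are hypothetical, and scoring is left out.

```python
from collections import defaultdict
from itertools import product

# Hypothetical toy knowledge graph: Τ(v) for each entity, and (u, v, α(e)) edges.
TYPES = {"Microsoft": "Company", "Bill Gates": "Person", "Paul Allen": "Person",
         "Redmond": "City", "Seattle": "City"}
EDGES = [("Microsoft", "Bill Gates", "Founder"), ("Microsoft", "Paul Allen", "Founder"),
         ("Microsoft", "Redmond", "Headquarters"), ("Bill Gates", "Seattle", "BornIn"),
         ("Paul Allen", "Seattle", "BornIn")]


def pattern_first_index(types, edges, d):
    """word -> path pattern -> root -> paths (each path has at most d nodes)."""
    out = defaultdict(list)
    for u, v, attribute in edges:
        out[u].append((v, attribute))
    index = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    stack = [((r,), (types[r],)) for r in types]
    while stack:
        path, pattern = stack.pop()
        end = path[-1]
        # Assumption: only words of the end entity's name and type are indexed.
        for w in set(end.lower().split()) | set(types[end].lower().split()):
            index[w][pattern][path[0]].append(path)
        if len(path) < d:
            for v, attribute in out[end]:
                if v not in path:
                    stack.append((path + (v,), pattern + (attribute, types[v])))
    return index


def is_tree(paths):
    """The union of root-to-keyword paths is a tree iff no node gets two parents."""
    parent = {}
    for path in paths:
        for i, v in enumerate(path):
            p = path[i - 1] if i > 0 else None
            if parent.setdefault(v, p) != p:
                return False
    return True


def pattern_enum_join(index, keywords):
    """Enumerate every combination of path patterns and join their paths on the root."""
    per_keyword = [index[w] for w in keywords]  # pattern -> root -> paths
    results = {}
    for combo in product(*(list(m) for m in per_keyword)):  # one pattern per keyword
        by_root = [per_keyword[i][p] for i, p in enumerate(combo)]
        common = set(by_root[0]).intersection(*by_root[1:])  # join on the root
        subtrees = [paths
                    for r in sorted(common)
                    for paths in product(*(m[r] for m in by_root))
                    if is_tree(paths)]
        if subtrees:  # many combinations may join to nothing
            results[combo] = subtrees
    return results


index = pattern_first_index(TYPES, EDGES, d=3)
for tree_pattern, subtrees in pattern_enum_join(index, ["gates", "city"]).items():
    print(tree_pattern, len(subtrees))
```

For the query "gates city" with d = 3 this reports three tree patterns; the remaining five of the eight pattern combinations share no root and join to nothing, which is the wasted work the linear-time enumeration described above avoids by starting from candidate roots.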
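Linear-time enumeration can be sketched with a root-first index kept as sorted lists of (root, pattern, path) entries. Roots(w) is computed for each keyword and intersected to get the candidate roots; for each candidate root r, Patterns(w, r) is read with a binary search, and only the pattern combinations that actually occur at r are expanded. The graph and the names `root_first_index`, `roots`, `patterns`, `is_tree`, and `linear_enum` are hypothetical, and this sketch makes no formal running-time guarantee.

```python
from bisect import bisect_left
from collections import defaultdict
from itertools import product

# Hypothetical toy knowledge graph: Τ(v) for each entity, and (u, v, α(e)) edges.
TYPES = {"Microsoft": "Company", "Bill Gates": "Person", "Paul Allen": "Person",
         "Redmond": "City", "Seattle": "City"}
EDGES = [("Microsoft", "Bill Gates", "Founder"), ("Microsoft", "Paul Allen", "Founder"),
         ("Microsoft", "Redmond", "Headquarters"), ("Bill Gates", "Seattle", "BornIn"),
         ("Paul Allen", "Seattle", "BornIn")]


def root_first_index(types, edges, d):
    """word -> list of (root, pattern, path) entries sorted by root, then pattern."""
    out = defaultdict(list)
    for u, v, attribute in edges:
        out[u].append((v, attribute))
    index = defaultdict(list)
    stack = [((r,), (types[r],)) for r in types]
    while stack:
        path, pattern = stack.pop()
        end = path[-1]
        # Assumption: only words of the end entity's name and type are indexed.
        for w in set(end.lower().split()) | set(types[end].lower().split()):
            index[w].append((path[0], pattern, path))
        if len(path) < d:
            for v, attribute in out[end]:
                if v not in path:
                    stack.append((path + (v,), pattern + (attribute, types[v])))
    for entries in index.values():
        entries.sort()
    return index


def roots(index, w):
    """Roots(w): all roots that can reach w."""
    return {root for root, _, _ in index.get(w, [])}


def patterns(index, w, r):
    """Patterns(w, r): pattern -> paths from root r to w, found by binary search on r."""
    entries = index.get(w, [])
    i = bisect_left(entries, (r,))
    found = defaultdict(list)
    while i < len(entries) and entries[i][0] == r:
        _, pattern, path = entries[i]
        found[pattern].append(path)
        i += 1
    return found


def is_tree(paths):
    """The union of root-to-keyword paths is a tree iff no node gets two parents."""
    parent = {}
    for path in paths:
        for i, v in enumerate(path):
            p = path[i - 1] if i > 0 else None
            if parent.setdefault(v, p) != p:
                return False
    return True


def linear_enum(index, keywords):
    """Visit only candidate roots and only the pattern combinations present there."""
    candidates = set.intersection(*(roots(index, w) for w in keywords))
    results = defaultdict(list)
    for r in sorted(candidates):
        per_keyword = [patterns(index, w, r) for w in keywords]
        for combo in product(*(list(m) for m in per_keyword)):
            for paths in product(*(per_keyword[i][p] for i, p in enumerate(combo))):
                if is_tree(paths):
                    results[combo].append(paths)
    return dict(results)


index = root_first_index(TYPES, EDGES, d=3)
print(sorted(set.intersection(roots(index, "gates"), roots(index, "city"))))  # ['Bill Gates', 'Microsoft']
for tree_pattern, subtrees in linear_enum(index, ["gates", "city"]).items():
    print(tree_pattern, len(subtrees))
```

On this toy data only two roots survive the intersection, and every pattern combination expanded at those roots produces at least one valid subtree.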