Generating Query Substitutions

Alicia Wood

What is the problem to be solved?

Problem
• Imperfect description of need
• Search engine not able to retrieve documents matching query
• Need accurate and related query substitutions

Problem (cont.)
• Given a query
• Want to generate modified query (related)
  – Improvements (specification)
  – Neutral (spelling change, synonym)
  – Loss of original meaning (generalization)

Who cares about this problem and why?

Who cares?
• User typing the query
• Want correct results with imperfect query

What have others done to solve this problem and why is this inadequate?

Previous Work
• Relevance/Pseudo relevance feedback
• Query term deletion
• Substituting query terms with related terms
• Latent Semantic Indexing (LSI)

Relevance/Pseudo relevance feedback
• Submit query for initial retrieval
• Processing resulting documents
• Modify the query by expanding with additional terms from documents
• Perform second retrieval with modified query
• Can cause query drift
• Computationally expensive

Query term deletion
• Loss of specificity from original query

Substituting query terms
• Relies on an initial retrieval

Latent Semantic Indexing (LSI)
• Identify patterns in relationships between terms and concepts in unstructured collection of text
• Computationally expensive

What is the proposed solution to the problem?

Solution
• Query modification based on precomputed query and phrase similarity
  – Ranking proposed queries
  – Similar queries/phrases derived from user query sessions
  – Learned models used to re-rank
• Based on similarity of new query to original query

Contributions
1. Identification of new source of data to identify similar queries and phrases
2. The definition of a scheme for scoring query suggestions
3. An algorithm to combine query and phrase suggestions
   – Finds highly and broadly relevant phrases
4. Identification of features that are predictive of highly relevant query suggestions

Classes of Suggestion Relevance
• Precise rewriting
  – Match user’s intent, preserve core meaning
    automobile insurance <-> automotive insurance
• Approximate rewriting
  – Direct close relationship to topic, scope narrowed or broadened
    Apple music player <-> ipod shuffle
• Possible rewriting
  – Categorical relationship to initial query, complementary product but distinct
    Eye glasses <-> contact lenses
• Clear mismatch
  – No clear relationship
    Jaguar xj6 <-> os x jaguar

Classes of Rewriting
• Specific Rewriting (1+2)
  – Closely related query
  – Highly relevant
• Broad Rewriting (1+2+3)
  – Query expansion
  – Relevant to user interests

Substitutables
• Initial query -> generate relevant queries
  – Replace query as whole or phrases
  – Segment query into phrases
  – Find query pairs where one segment has changed
    • (britney spears) (mp3s) -> (britney spears) (lyrics)
• Pair Independence Hypothesis Likelihood Ratio
  – High value = strong dependence between two terms

Validation
• 1000 initial queries
  – Generate single suggestion (qj) for each
• Evaluate accuracy of approaches
• Train machine learned classifier
• Evaluate ability to produce higher quality suggestions
  – Word distance, normalized edit distance, number of substitutions
• Suggestions criteria:
  – Some words from initial query
  – Modifications shouldn’t be made at start of query

Future Work
• Build semantic classifier
  – Predict semantic class of rewriting
• Take inspiration from machine translation techniques
• Introduce language model
  – Avoid producing nonsensical queries
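
The two steps listed under Substitutables (finding query pairs where one segment has changed, then scoring the pair with a likelihood ratio) can be sketched roughly as follows. The slides do not give the formula, so this sketch assumes the standard binomial log-likelihood ratio test for independence; the function names and all counts are hypothetical and chosen only for illustration.

```python
import math


def changed_segment(segments_a, segments_b):
    """Return the (old, new) phrase if exactly one segment differs, else None.

    Both arguments are lists of phrases from already-segmented queries.
    """
    if len(segments_a) != len(segments_b):
        return None
    diffs = [(a, b) for a, b in zip(segments_a, segments_b) if a != b]
    return diffs[0] if len(diffs) == 1 else None


def _log_binomial(k, n, p):
    """Log-likelihood of k successes in n trials (binomial coefficient omitted,
    since it cancels in the ratio)."""
    eps = 1e-12  # clamp to avoid log(0)
    p = min(max(p, eps), 1.0 - eps)
    return k * math.log(p) + (n - k) * math.log(1.0 - p)


def likelihood_ratio(c12, c1, c2, n):
    """Return -2 log(lambda) for the pair independence hypothesis.

    Assumed meaning of the counts (hypothetical, not from the slides):
      c12: times phrase 1 was rewritten to phrase 2
      c1:  times phrase 1 occurred (requires 0 < c1 < n)
      c2:  times phrase 2 occurred
      n:   total number of observed pairs
    A high value means strong dependence between the two phrases.
    """
    p = c2 / n                    # independence: one shared probability
    p1 = c12 / c1                 # P(phrase 2 | phrase 1)
    p2 = (c2 - c12) / (n - c1)    # P(phrase 2 | not phrase 1)
    log_lambda = (
        _log_binomial(c12, c1, p) + _log_binomial(c2 - c12, n - c1, p)
        - _log_binomial(c12, c1, p1) - _log_binomial(c2 - c12, n - c1, p2)
    )
    return -2.0 * log_lambda


pair = changed_segment(["britney spears", "mp3s"], ["britney spears", "lyrics"])
independent = likelihood_ratio(c12=10, c1=100, c2=100, n=1000)
dependent = likelihood_ratio(c12=80, c1=100, c2=100, n=1000)
```

With these made-up counts, the pair extracted from the slide example is ("mp3s", "lyrics"), and the score for the dependent case is much larger than for the independent case, which matches the "high value = strong dependence" reading on the slide.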
