Statistical Translation and Web Search Ranking
Jianfeng Gao
Natural Language Processing, MSR
July 22, 2011

Who should be here?
• Interested in statistical machine translation and Web search ranking
• Interested in modeling technologies
• Looking for topics for your master's/PhD thesis
– A difficult topic: very hard to beat a simple baseline
– An easy topic: others cannot beat it either

Outline
• Probability
• Statistical Machine Translation (SMT)
• SMT for Web search ranking

Probability (1/2)
• Probability space: x ∈ X
– P(x) ∈ [0, 1]
– Σ_{x∈X} P(x) = 1
– Cannot say P(x) > P(y) if x ∈ X but y ∉ X
• Joint probability: P(x, y)
– Probability that x and y are both true
• Conditional probability: P(y|x)
– Probability that y is true when we already know x is true
• Independence: P(x, y) = P(x)P(y)
– x and y are independent

Probability (2/2)
• H: assumptions on which the probabilities are based
• Product rule – from the definition of conditional probability
– P(x, y|H) = P(x|y, H)P(y|H) = P(y|x, H)P(x|H)
• Sum rule – a rewrite of the definition of marginal probability
– P(x|H) = Σ_y P(x, y|H) = Σ_y P(x|y, H)P(y|H)
• Bayes rule – from the product rule
– P(y|x, H) = P(x|y, H)P(y|H) / P(x|H)

An example: Statistical Language Modeling

Statistical Language Modeling (SLM)
• Model form – capture language structure via a probabilistic model
– Pr(W|M) = P_θ(W|M) = P(W|M, θ)
• Model parameters – estimation of the free parameters using training data D
– θ* = argmax_θ P(D|M, θ)

Model Form
• How to incorporate language structure into a probabilistic model
• Task: next-word prediction
– Fill in the blank: "The dog of our neighbor ___"
• Starting point: word n-gram model
– Very simple, yet surprisingly effective
– Words are generated from left to right
– Assumes no structure other than the words themselves

Word N-gram Model
• Word-based model
– Use the chain rule on a word's history (= the preceding words)
P(the dog of our neighbor barks)
= P(the|<s>) × P(dog|<s> the) × P(of|<s> the dog) × …
× P(barks|<s> the dog of our neighbor)
× P(</s>|<s> the dog of our neighbor barks)
P(w_1, w_2, …, w_n)
= P(w_1|<s>) × P(w_2|<s> w_1) × P(w_3|<s> w_1 w_2) × …
× P(w_n|<s> w_1 w_2 … w_{n−1}) × P(</s>|<s> w_1 w_2 … w_n)

Word N-gram Model
• How do we get probability estimates?
– Get text and count!  P(w_2|w_1) = Count(w_1, w_2) / Count(w_1)
• Problem with using the whole history
– Rare events: unreliable probability estimates
– Assuming a vocabulary of 20,000 words:
model                             # parameters
unigram   P(w_1)                  20,000
bigram    P(w_2|w_1)              400 million
trigram   P(w_3|w_1 w_2)          8 × 10^12
fourgram  P(w_4|w_1 w_2 w_3)      1.6 × 10^17
(From Manning and Schütze 1999: 194)

Word N-gram Model
• Markov independence assumption
– A word depends only on the N−1 preceding words
– N = 3 → word trigram model
• Reduces the number of parameters in the model
– By forming equivalence classes
• Word trigram model
– P(w_i|<s> w_1 w_2 … w_{i−2} w_{i−1}) = P(w_i|w_{i−2} w_{i−1})
P(w_1 w_2 … w_n)
= P(w_1|<s>) × P(w_2|<s> w_1) × P(w_3|w_1 w_2) × …
× P(w_n|w_{n−2} w_{n−1}) × P(</s>|w_{n−1} w_n)

Model Parameters
• Bayesian estimation paradigm
• Maximum likelihood estimation (MLE)
• Smoothing in N-gram language models

Bayesian Paradigm
• P(model|data) = P(data|model) P(model) / P(data)
– P(model|data) – posterior probability
– P(data|model) – likelihood
– P(model) – prior probability
– P(data) – marginal probability
• Likelihood versus probability: P(D|θ, H)
– For fixed θ, P defines a probability over D
– For fixed D, P defines the likelihood of θ
• Never say "the likelihood of the data"
• Always say "the likelihood of the parameters given the data"
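As a minimal illustration of the count-and-normalize recipe above, the following Python sketch estimates MLE trigram probabilities from a toy corpus (the corpus and the function name are made up for illustration):

```python
from collections import Counter

def mle_ngram_probs(sentences, n=3):
    """Count-and-normalize (MLE) n-gram estimates:
    P(w_i | w_{i-n+1} ... w_{i-1}) = Count(history, w_i) / Count(history).
    Sentences are padded with <s> and </s> boundary symbols."""
    ngram_counts, history_counts = Counter(), Counter()
    for sent in sentences:
        words = ["<s>"] * (n - 1) + sent.split() + ["</s>"]
        for i in range(n - 1, len(words)):
            history = tuple(words[i - n + 1:i])
            ngram_counts[history + (words[i],)] += 1
            history_counts[history] += 1
    return {ng: c / history_counts[ng[:-1]] for ng, c in ngram_counts.items()}

# Toy corpus (hypothetical)
corpus = ["the dog of our neighbor barks", "the dog barks"]
probs = mle_ngram_probs(corpus, n=3)
print(probs[("the", "dog", "of")])   # 0.5 = Count(the dog of) / Count(the dog)
```

Unsmoothed MLE assigns zero probability to any unseen n-gram, which is exactly the sparse-data problem discussed below.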
Maximum Likelihood Estimation (MLE)
• θ: model parameters; D: data
• θ* = argmax_θ P(θ|D) = argmax_θ P(D|θ)P(θ)/P(D)
– Assume a uniform prior: P(θ) = constant
– P(D) is independent of θ, and is dropped
• θ* = argmax_θ P(θ|D) ≈ argmax_θ P(D|θ)
– where P(D|θ) is the likelihood of the parameters
• Key difference between MLE and Bayesian estimation
– MLE assumes that θ is fixed but unknown
– Bayesian estimation assumes that θ itself is a random variable with a prior distribution P(θ)

MLE for Trigram LM
• P_MLE(w_3|w_1 w_2) = Count(w_1 w_2 w_3) / Count(w_1 w_2)
• P_MLE(w_2|w_1) = Count(w_1 w_2) / Count(w_1)
• P_MLE(w) = Count(w) / N
• It is easy – let us get some real text and start to count :-)
• But why is this the MLE solution?

Derivation of MLE for N-gram
• Homework – an interview question at MSR :-)
• Hints
– This is a constrained optimization problem
– Use the log likelihood as the objective function
– Assume a multinomial distribution for the LM
– Introduce Lagrange multipliers for the constraints
• Σ_{x∈X} P(x) = 1, and P(x) ≥ 0

Sparse Data Problem
• Say our vocabulary size is |V|
• There are |V|^3 parameters in the trigram LM
– |V| = 20,000 → 20,000^3 = 8 × 10^12 parameters
• Most trigrams have a zero count even in a large text corpus
– Count(w_1 w_2 w_3) = 0
– P_MLE(w_3|w_1 w_2) = Count(w_1 w_2 w_3) / Count(w_1 w_2) = 0
– P(W) = P_MLE(w_1) P_MLE(w_2|w_1) ∏_{i=3…n} P(w_i|w_{i−2} w_{i−1}) = 0
– oops…

Smoothing: Adding One
• Add-one smoothing (from the Bayesian paradigm)
– e.g., P(w_3|w_1 w_2) = (Count(w_1 w_2 w_3) + 1) / (Count(w_1 w_2) + |V|)
• But it works very badly – do not use this
• Add-delta smoothing: add δ < 1 instead of 1
• Still very bad – do not use this

Smoothing: Backoff
• Back off from trigram to bigram, and from bigram to unigram, e.g.
– P_bo(w_3|w_1 w_2) = (Count(w_1 w_2 w_3) − D) / Count(w_1 w_2), if Count(w_1 w_2 w_3) > 0
– P_bo(w_3|w_1 w_2) = α(w_1 w_2) P_bo(w_3|w_2), otherwise
• D ∈ (0, 1) is a discount constant – absolute discounting
• α is calculated so that the probabilities sum to 1 (homework :-))
• Simple and effective – use this one!

Outline
• Probability
• SMT and translation models
• SMT for Web search ranking

SMT
C: 救援 人员 在 倒塌的 房屋 里 寻找 生还者
E: Rescue workers search for survivors in collapsed houses
• Direct model: E* = argmax_E P(E|C)
• Noisy-channel model: E* = argmax_E P(C|E)P(E)
• The component models P(C|E) and P(E|C)
• Log-linear model: P(E|C) = (1/Z) exp Σ_i λ_i h_i(C, E)

Translation process (generative story)
• C is broken into translation units
• Each unit is translated into English
• Glue the translated units to form E

Translation models
• Word-based models
• Phrase-based models
• Syntax-based models

Generative Modeling
• Art: story
• Science: math
• Engineering: code

Generative Modeling for P(E|C)
• Story making – how a target sentence is generated from a source sentence, step by step
• Mathematical formulation – modeling each generation step in the generative story using a probability distribution
• Parameter estimation – implementing an effective way of estimating the probability distributions from training data

Word-Based Models: IBM Model 1
• First, choose the length I of the target sentence according to the distribution P(I|C).
• Then, for each position i (i = 1 … I) in the target sentence, choose a position j in the source sentence from which to generate the i-th target word e_i, according to the distribution P(j|C).
• Finally, generate the target word by translating c_j according to the distribution P(e_i|c_j).

Mathematical Formulation
• Assume that the choice of the length is independent of C and I
– P(I|C) = ε
• Assume that all positions in the source sentence are equally likely to be chosen
– P(j|C) = 1/(J+1)
• Assume that each target word is generated independently given C
– P(E|C) = P(I|C) ∏_{i=1}^{I} P(e_i|C)

Parameter Estimation
• Model form
– P(E|C) = ε/(J+1)^I ∏_{i=1}^{I} Σ_{j=0}^{J} P(e_i|c_j)
• MLE on word-aligned training data
– P(e|c) = N(c, e) / Σ_{e'} N(c, e')
• Don't forget smoothing
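A minimal Python sketch of the IBM Model 1 scoring formula above; the lexical translation table t and the toy sentence pair are hypothetical (in practice t(e|c) is estimated from aligned query/sentence data as just described):

```python
import math

def ibm1_log_prob(target, source, t, epsilon=1.0):
    """log P(E|C) = log eps - I*log(J+1) + sum_i log sum_j t(e_i | c_j).
    `source` is extended with a NULL token (position 0), so the inner
    sum runs over j = 0 .. J as in the model form above."""
    src = ["NULL"] + source
    J, I = len(source), len(target)
    logp = math.log(epsilon) - I * math.log(J + 1)
    for e in target:
        logp += math.log(sum(t.get((e, c), 1e-9) for c in src))
    return logp

# Hypothetical translation table t(e|c), for illustration only.
t = {("rescue", "救援"): 0.8, ("workers", "人员"): 0.7,
     ("search", "寻找"): 0.6, ("for", "寻找"): 0.2,
     ("survivors", "生还者"): 0.9}
C = ["救援", "人员", "寻找", "生还者"]
E = ["rescue", "workers", "search", "for", "survivors"]
print(ibm1_log_prob(E, C, t))
```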
Phrase-Based Models

Mathematical Formulation
• Assume a uniform probability over segmentations
– P(E|C) ∝ Σ_{(S,T,M) ∈ B(C,E)} P(T|C, S) · P(M|C, S, T)
• Use the maximum approximation to the sum
– P(E|C) ≈ max_{(S,T,M) ∈ B(C,E)} P(T|C, S) · P(M|C, S, T)
• Assume each phrase is translated independently, and use a distance-based reordering model
– P(E|C) ∝ max_{(S,T,M) ∈ B(C,E)} ∏_{k=1}^{K} P(t_k|s_k) · d(start_k − end_{k−1} − 1)

Parameter Estimation
• MLE: P(t|s) = N(s, t) / Σ_{t'} N(s, t')
• Don't forget smoothing

Syntax-Based Models

Story
• Parse an input Chinese sentence into a parse tree
• Translate each Chinese constituent into English
– VP → (PP 寻找 NP, search for NP PP)
• Glue these English constituents into a well-formed English sentence

Other Two Tasks?
• Mathematical formulation
– Based on a synchronous context-free grammar (SCFG)
• Parameter estimation
– Learning the SCFG from data
• Homework :-)
• Let us go through an example (thanks to Michel Galley)
– Hierarchical phrase model
– Linguistically syntax-based models

An example:
救援 人员 在 倒塌 的 房屋 里 寻找 生还者
Rescue workers search for survivors in collapsed houses
• 倒塌 的 房屋 ↔ collapsed houses
• 在 倒塌 的 房屋 里 寻找 生还者 ↔ search for survivors in collapsed houses

A synchronous rule
X → ⟨在 X1 里 寻找 X2, search for X2 in X1⟩
• Phrase-based translation unit
• Discontinuous translation unit
• Control on reordering

A synchronous grammar
X → ⟨在 X1 里 寻找 X2, search for X2 in X1⟩
X → ⟨倒塌 的 房屋, collapsed houses⟩
X → ⟨生还者, survivors⟩
Context-free derivation:
⟨X, X⟩
⇒ ⟨在 X1 里 寻找 X2, search for X2 in X1⟩
⇒ ⟨在 倒塌 的 房屋 里 寻找 X2, search for X2 in collapsed houses⟩
⇒ ⟨在 倒塌 的 房屋 里 寻找 生还者, search for survivors in collapsed houses⟩

A synchronous grammar
Recognizes:
– search for survivors in collapsed houses
– search for collapsed houses in survivors
– search for survivors collapsed houses in
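To make the synchronous-grammar derivation concrete, here is a small Python sketch that applies the three toy rules above to rewrite the source and target sides in parallel (the rule encoding is an assumption made for illustration, not the decoder's actual data structure):

```python
# Toy synchronous grammar from the slides; each rule is a
# (source side, target side) pair, with X1/X2 marking linked nonterminals.
rules = {
    "R1": ("在 X1 里 寻找 X2", "search for X2 in X1"),
    "R2": ("倒塌 的 房屋", "collapsed houses"),
    "R3": ("生还者", "survivors"),
}

def derive(rule_name, children=()):
    """Expand a rule, substituting derived (source, target) pairs for X1, X2.
    Because X1 and X2 are linked across the two sides, the same child fills
    the same slot on both sides -- this enforces the reordering
    'search for X2 in X1' encoded by rule R1."""
    src, tgt = rules[rule_name]
    for i, (child_src, child_tgt) in enumerate(children, start=1):
        src = src.replace(f"X{i}", child_src)
        tgt = tgt.replace(f"X{i}", child_tgt)
    return src, tgt

# Derivation: R1 applied to the results of R2 and R3.
pair = derive("R1", (derive("R2"), derive("R3")))
print(pair[0])  # 在 倒塌 的 房屋 里 寻找 生还者
print(pair[1])  # search for survivors in collapsed houses
```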
[Figure: a word-aligned Chinese–English sentence pair, 救援 人员 在 倒塌 的 房屋 里 寻找 生还者 ↔ "Rescue workers search for survivors in collapsed houses", with word-by-word glosses (rescue staff / in / collapse of / house / in / search / survivors) and the English parse tree (POS tags NNS, VBP, IN, JJ, NN; constituents NP, VP, PP, S), from which syntax-based rules are extracted.]

SCFG rule (extracted from the aligned tree):
VP-234 → ⟨PP-32 寻找 NP-57, search for NP-57 PP-32⟩

Outline
• Probability
• SMT and translation models
• SMT for Web search ranking

Web Documents and Search Queries
• cold home remedy
• cold remeedy
• flu treatment
• how to deal with stuffy nose?

Map Queries to Documents
• Fuzzy keyword matching
– Q: cold home remedy
– D: best home remedies for cold and flu
• Spelling correction
– Q: cold remeedies
– D: best home remedies for cold and flu
• Query alteration
– Q: flu treatment
– D: best home remedies for cold and flu
• Query/document rewriting
– Q: how to deal with stuffy nose
– D: best home remedies for cold and flu
• Where are we now?

Research Agenda (Gao et al. 2010, 2011)
• Model documents and queries as different languages (Gao et al., 2010)
• Cast mapping queries to documents as bridging the language gap via translation
• Leverage statistical machine translation (SMT) technologies and infrastructure to improve search relevance

Are Queries and Docs Just Different Languages?
• A large-scale analysis, extending (Huang et al. 2010)
• Divide the web collection into different fields, e.g., queries, anchor text, titles, etc.
• Develop a set of language models, each built on an n-gram dataset from a different field
• Measure the language difference between fields (queries/docs) via perplexity

Microsoft Web N-gram Model Collection (cutoff = 0)
• Microsoft web n-gram services: http://research.microsoft.com/web-ngram

Perplexity Results
• Test set
– 733,147 queries from the May 2009 query log
• Summary
– The query LM is the most predictive of test queries
– Title is better than anchor at lower orders but worse at higher orders
– Body is in a different league

SMT for Document Ranking
• Given a query q, a document d can be ranked by how likely it is that q is rewritten from d: P(q|d)
– e.g., q = "how to deal with stuffy nose?"
• An example: phrasal statistical translation for Web document ranking

Phrasal Statistical Translation for Ranking
d: "cold home remedies"            title
S: ["cold", "home remedies"]       segmentation
T: ["stuffy nose", "deal with"]    translation
M: (1 → 2, 2 → 1)                  permutation
q: "deal with stuffy nose"         query
• Uniform probability over S:
– P(q|d) ≈ Σ_{(S,T,M)} P(T|d, S) · P(M|d, S, T)
• Maximum approximation:
– P(q|d) ≈ max_{(S,T,M) ∈ B(d,q)} P(T|d, S) · P(M|d, S, T)
• Max probability assignment via dynamic programming:
– P(q|d) ≈ max_{(S,T,M) ∈ B(d,q,A*)} P(T|d, S), where P(T|d, S) = ∏_{k=1…K} P(q_k|w_k)
• Model training on query-doc pairs

Mine Query-Document Pairs from User Logs
• A search session:
– "how to deal with stuffy nose?" – NO CLICK
– "stuffy nose treatment" – NO CLICK
– "cold home remedies" – click on http://www.agelessherbs.com/BestHomeRemediesColdFlu.html
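A minimal sketch of this kind of log mining, assuming a simple session format and pairing every query in a click-terminated session with the clicked title (both are assumptions for illustration, not necessarily the exact pipeline used here):

```python
from typing import Iterable, List, Tuple

def mine_query_title_pairs(sessions: Iterable[List[dict]]) -> List[Tuple[str, str]]:
    """Extract (query, title) training pairs from click logs.
    Each session is a list of events like
    {"query": "...", "clicked_title": "..." or None}.
    Every query in a session is paired with the title of the page
    clicked later in that session (an illustrative choice)."""
    pairs = []
    for session in sessions:
        clicked = [e["clicked_title"] for e in session if e["clicked_title"]]
        if not clicked:
            continue
        title = clicked[-1]
        for event in session:
            pairs.append((event["query"], title))
    return pairs

# The session from the slide:
session = [
    {"query": "how to deal with stuffy nose?", "clicked_title": None},
    {"query": "stuffy nose treatment", "clicked_title": None},
    {"query": "cold home remedies",
     "clicked_title": "best home remedies for cold and flu"},
]
for q, t in mine_query_title_pairs([session]):
    print(q, "->", t)
```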
QUERY (Q)                               TITLE (T)
how to deal with stuffy nose            best home remedies for cold and flu
stuffy nose treatment                   best home remedies for cold and flu
cold home remedies                      best home remedies for cold and flu
……                                      ……
go israel                               forums goisrael community
skate at wholesale at pr                wholesale skates southeastern skate supply
breastfeeding nursing blister baby      clogged milk ducts babycenter
thank you teacher song                  lyrics for teaching educational children s music
immigration canada lacolle              cbsa office detailed information
• 178 million pairs from a 0.5-year log

Evaluation Methodology
• Measurement: NDCG, t-test
• Test set:
– 12,071 English queries sampled from a 1-year log
– 5-level relevance label for each query-doc pair
– A tail document set (the click field is empty)
• Training data for translation models:
– 82,834,648 query-title pairs

Baseline: Word-Based Models (Berger & Lafferty, 1999)
• Basic model: P(q|d) = ∏_{t∈q} Σ_{w∈d} P(t|w) P(w|d)
• Mixture model: interpolate the basic model with a background (unigram) language model
• Learning translation probabilities from clickthrough data
– IBM Model 1 with EM

Results
(NDCG results figure)

Sample IBM-1 word translation probabilities after EM training on the query-title pairs
(table figure)

Bilingual Phrases
• Notice that with context information, we have less ambiguous translations
(example bilingual phrase table)

Results
• Ranking results
– All features
– Only phrase translation features
(NDCG results figure)

Why Do Bi-Phrases Help?
• Length distribution
• Good/bad examples

Generative Topic Models
Q: stuffy nose treatment    D: cold home remedies    (linked via a hidden topic)
• Probabilistic latent semantic analysis (PLSA)
– P(Q|D) = ∏_{q∈Q} Σ_z P(q|φ_z) P(z|D, θ)
– d is assigned a single, most likely topic vector
– q is generated from the topic vectors
• Latent Dirichlet allocation (LDA) generalizes PLSA
– a posterior distribution over topic vectors is used
– PLSA = LDA with MAP inference

Bilingual Topic Model
• For each topic z: φ_z^q, φ_z^d ~ Dir(β)
• For each q-d pair: θ ~ Dir(α)
• Each query term q is generated by z ~ θ and q ~ φ_z^q
• Each document word w is generated by z ~ θ and w ~ φ_z^d

Log-likelihood of LDA Given Data
• φ and θ: distributions over distributions
• LDA requires integrating over φ and θ
• Using point estimates of φ and θ instead gives the MAP approximation to LDA

MAP Estimation via EM
• Estimate (θ, φ^q, φ^d) by maximizing the joint log likelihood of the q-d pairs and the parameters
• E-step: compute posterior probabilities
– P(z|q, θ_{q,d}), P(z|w, θ_{q,d})
• M-step: update parameters using the posterior probabilities
– P(q|φ_z^q), P(w|φ_z^d), P(z|θ_{q,d})

Posterior Regularization (PR)
• q and its clicked d are relevant, thus they
– Share the same prior distribution over topics (MAP)
– Weight each topic similarly (PR)
• Model training via a modified EM
– E-step: for each q-d pair, project the posterior topic distributions onto a constrained set, where the expected fraction of each topic is equal in q and d
– M-step: update parameters using the projected posterior probabilities

Topic Models for Doc Ranking

Evaluation Methodology
• Measurement: NDCG, t-test
• Test set:
– 16,510 English queries sampled from a 1-year log
– Each query is associated with 15 docs
– 5-level relevance label for each query-doc pair
• Training data for translation models:
– 82,834,648 query-title pairs

Topic Model Results
(NDCG results figure)
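As a minimal sketch of how such a topic model is used for ranking, the following computes log P(Q|D) = Σ_{q∈Q} log Σ_z P(q|φ_z) P(z|θ_D) from the PLSA formula above, with hypothetical hand-set parameters (real ones would come from the EM training just described):

```python
import math

def topic_model_score(query_words, doc_topic, topic_word):
    """log P(Q|D) = sum over query words of log sum_z P(q|z) * P(z|D)."""
    score = 0.0
    for q in query_words:
        p = sum(doc_topic[z] * topic_word[z].get(q, 1e-9) for z in doc_topic)
        score += math.log(p)
    return score

# Hypothetical parameters for illustration (two topics).
topic_word = {
    "health": {"stuffy": 0.05, "nose": 0.05, "treatment": 0.04, "cold": 0.06,
               "remedies": 0.03, "flu": 0.04},
    "travel": {"israel": 0.05, "flight": 0.04, "hotel": 0.04},
}
doc_topic = {"health": 0.9, "travel": 0.1}   # P(z|D) for "cold home remedies"

print(topic_model_score(["stuffy", "nose", "treatment"], doc_topic, topic_word))
```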
Summary
• Probability
– Basics
– A case study of a probabilistic model: the N-gram language model
• Statistical Machine Translation (SMT)
– Generative modeling (story → math → code)
– Word/phrase/syntax-based models
• SMT for web search ranking
– View query and doc as different languages
– Doc ranking via P(q|d)
– Word/phrase/topic-based models
• Slides/doc will be available at http://research.microsoft.com/~jfgao/

Main References
• Berger, A., and Lafferty, J. 1999. Information retrieval as statistical translation. In SIGIR, pp. 222-229.
• Gao, J., He, X., and Nie, J-Y. 2010. Clickthrough-based translation models for web search: from word models to phrase models. In CIKM, pp. 1139-1148.
• Gao, J., Toutanova, K., and Yih, W-T. 2011. Clickthrough-based latent semantic models for web search. In SIGIR.
• Huang, J., Gao, J., Miao, J., Li, X., Wang, K., and Behr, F. 2010. Exploring web scale language models for search query processing. In WWW, pp. 451-460.
• MacKay, D. J. C. 2003. Information Theory, Inference and Learning Algorithms. Cambridge University Press.
• Manning, C., and Schütze, H. 1999. Foundations of Statistical Natural Language Processing. MIT Press.
• Koehn, P. 2009. Statistical Machine Translation. Cambridge University Press.