
I300 Midterm Preview

1) Completely label a diagram of an information retrieval system.

2) Suppose that Si is the event that document i is retrieved in answer to a certain query. Suppose that Ri is the event that document i is a relevant answer to a certain query.
   a) Explain what the following probability means: Pr(Ri | Si)

3) Would you expect the value of Pr(Ri | Si) to be smaller than, equal to, or bigger than Pr(Ri)? (I include the following "hint" but worry that it might be more confusing than helpful. Hint: Pr(Ri) is the same as the quantity P(rel) = n / N that was in the book.) Explain your answer.

4) We have talked about document descriptions. Where do document descriptions come from? (The best answer begins by identifying two sources, and then proceeds to discuss each.)

In the vector model we could think that we are working with lines and dimensions (in a graph, each axis defines a dimension).

5) Explain what a line (vector) is in the vector model. (Be sure to discuss both the length and direction of a line.)

6) What is a dimension (axis) in the vector model?

6a) The vector model is directly related to the cosine similarity measure (that is, a measure of the similarity of a document and a query). Explain in words what the cosine measure is and why it goes with the vector model.

7) In the Boolean model we can think of the description as being points and circles. Describe in words:
   a) What does a circle mean in the Boolean model?
   b) What does a point mean in the Boolean model?
   c) Describe one major advantage of the Boolean model.
   d) Describe one major disadvantage of the Boolean model.

10) What is a “stop word”?

11) Why do we care about stop words in information retrieval?

12) Zipf’s Law states that Rank times Frequency equals a Constant.
   a) Explain what Rank and Frequency mean in this situation. That is, how would you get the values?
   b) Draw and clearly label a graph showing Zipf's Law. Be sure to label the axes.

13) We have introduced the terms "type" and "token". Define each of these clearly.
   13a) type
   13b) token

16) The handout on data models describes two ways of looking at document description data. Clearly describe each.
   a) The matrix model
   b) The index model
   c) Why do we have two different models?
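
Study aid for questions 12 and 13: a minimal Python sketch, assuming that tokens are obtained simply by lowercasing a text and splitting it on whitespace (a simplification). The function name rank_frequency_table and the sample sentence are made up for illustration and do not come from the course materials. The sketch counts the tokens and types in a short text, ranks the types by frequency, and lists rank times frequency, the product that Zipf's Law says should be roughly constant.

```python
from collections import Counter


def rank_frequency_table(text):
    """Return (rank, word, frequency, rank * frequency) rows, most frequent word first."""
    tokens = text.lower().split()   # every running word in the text is one token
    counts = Counter(tokens)        # each distinct word is one type
    # Sort by descending frequency; break ties alphabetically so the order is stable.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(rank, word, freq, rank * freq)
            for rank, (word, freq) in enumerate(ordered, start=1)]


sample = "the cat and the dog and the bird"   # illustrative text, not from the course
rows = rank_frequency_table(sample)
n_tokens = len(sample.split())
n_types = len(rows)

for row in rows:
    print(row)
print(n_tokens, "tokens,", n_types, "types")
```

On a sample this small the product will not look constant; the pattern is only expected to show up on a large body of text.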