Discussion Class 4
Latent Semantic Indexing

Discussion Classes

Format:
Question
Ask a member of the class to answer.
Provide opportunity for others to comment.

When answering:
Stand up.
Give your name. Make sure that the TA hears it.
Speak clearly so that all the class can hear.

Suggestions:
Do not be shy about presenting partial answers.
Differing viewpoints are welcome.

Question 1: Basics
(a) Explain the name "latent semantic analysis".
(b) What problems is latent semantic analysis attempting to solve?
(c) What criteria were used in selecting singular-value decomposition?

Question 2
[Figure: terms, documents, and a query plotted in the factor space; legend: term, document, query, cosine > 0.9]

Question 3: Rank Reduction
(a) Explain the matrices in the singular value decomposition: $X = T_0 S_0 D_0'$
(b) The rank reduction method is to keep the first k (largest) diagonal elements of $S_0$ and set the others to zero. This gives:
$X \approx \hat{X} = T S D'$
What has this to do with latent semantics?

Q4: Experimental Results: 100 Factors
(a) LSI-100 does better on the right of this graph than on the left. What has this to do with synonymy and polysemy?
(b) Why were the authors surprised that TERM and SMART gave similar results?

Question 5: Experimental Results
(a) Describe the methodology of the MED experiment.
(b) What conclusions can you draw from this experiment?
(c) The results of the CISI experiment were disappointing. What are some possible explanations?
(d) This is a new method. What comes next?

Question 6: Number of Factors
What data does this graph plot?
What conclusions can you draw from this graph?

Question 7: Performance
What does the paper say about the following?
(a) Storage requirements
(b) Efficiency of searching
(c) Updating of indexes
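To make the rank reduction in Question 3 concrete, here is a minimal sketch using NumPy's `np.linalg.svd`. The five-term, four-document corpus, the helper names `lsi`, `cosine`, and `fold_in`, and the choice k = 2 are hypothetical illustrations, not taken from the paper. Placing the query in factor space as $q' T S^{-1}$ is the usual LSI way of handling a new query; treat it as an assumption here.

```python
import numpy as np

# Hypothetical toy corpus (not from the paper): rows are terms, columns are documents.
terms = ["car", "automobile", "engine", "flower", "petal"]
docs = ["d1", "d2", "d3", "d4"]
# Raw term counts: d1 = "car engine", d2 = "automobile engine",
# d3 = "flower petal", d4 = "flower".
X = np.array([
    [1, 0, 0, 0],  # car
    [0, 1, 0, 0],  # automobile
    [1, 1, 0, 0],  # engine
    [0, 0, 1, 1],  # flower
    [0, 0, 1, 0],  # petal
], dtype=float)


def lsi(X, k):
    """Truncated SVD: keep the k largest singular values, drop the rest."""
    # X = T0 diag(S0) D0'; NumPy returns D0' directly, sorted by decreasing S0.
    T0, S0, D0t = np.linalg.svd(X, full_matrices=False)
    T, S, D = T0[:, :k], S0[:k], D0t[:k, :].T
    X_hat = T @ np.diag(S) @ D.T  # rank-k approximation of X
    return T, S, D, X_hat


def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def fold_in(q, T, S):
    """Assumed folding-in: map a term-space vector q to q' T S^-1 (a pseudo-row of D)."""
    return q @ T / S


T, S, D, X_hat = lsi(X, k=2)

q = np.array([1, 0, 0, 0, 0], dtype=float)  # query: "car"
q_hat = fold_in(q, T, S)

# Cosines in the original term space.
raw = [cosine(q, X[:, j]) for j in range(X.shape[1])]
# Cosines in factor space: scaled query (q_hat S) against scaled documents (rows of D S).
reduced = [cosine(q_hat * S, D[j] * S) for j in range(D.shape[0])]

print("raw term-space cosines:", np.round(raw, 3))
print("rank-2 LSI cosines:    ", np.round(reduced, 3))
print("X_hat[automobile, d1] =", round(X_hat[1, 0], 3))
```

In raw term space the query "car" has cosine 0 with d2 ("automobile engine"), but after keeping two factors it lands next to d2, and $\hat{X}$ gives "automobile" a weight of 0.5 in d1 even though d1 never uses that word. This is one way to see how rank reduction can address synonymy (Question 4a).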