Information retrieval
Lab 4: Improving the index

In this lab you will improve the index you made in the last lab by adding facilities to deal with punctuation, numbers and dates, and to do stemming.

Part 1: Stemming

Experiment on the ‘ordinary’ words from the Moby common words list, using three different stemmers (from https://pypi.python.org/pypi/stemming/1.0 and UEAlite), for example Lovins, Porter and UEAlite. For each stemmer, build a list of original and stemmed words, then answer the following questions.

What is the vocabulary size for each stemmer?
How many terms are being conflated? (Estimate from a sample if necessary.)
What sort of terms are they?
How much difference might it make for retrieval?

Part 2: Punctuation, numbers and dates

Working with the subset of the Reuters-21578 corpus on Blackboard, write Python functions to:

1. Count and strip punctuation.
2. Identify numbers and numeric expressions.
3. Identify dates and times.
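
As a starting point, the measurements asked for in both parts might be sketched as follows. This is a minimal sketch under stated assumptions, not a model answer. All function names are hypothetical. `toy_stem` is a stand-in suffix stripper, not Lovins, Porter or UEAlite; pass any callable that maps a word to its stem in its place. "Conflated" is assumed here to mean an original word that shares its stem with at least one other word. The regular expressions cover only a few number, date and time shapes (for example 1,234,567, 12.5%, 26-FEB-1987, March 3, 1987 and 15:01:01.79) and will need extending for the corpus.

```python
import re
import string
from collections import Counter, defaultdict


def conflation_stats(words, stem):
    """Return (vocabulary size after stemming, number of conflated words, groups)."""
    groups = defaultdict(set)
    for word in words:
        groups[stem(word)].add(word)
    # Assumption: a word is "conflated" if its stem is shared with another word.
    conflated = sum(len(g) for g in groups.values() if len(g) > 1)
    return len(groups), conflated, dict(groups)


def toy_stem(word):
    """Hypothetical stand-in stemmer: strips a few common suffixes."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word


def count_and_strip_punctuation(text):
    """Return (text without ASCII punctuation, Counter of punctuation marks)."""
    counts = Counter(ch for ch in text if ch in string.punctuation)
    stripped = "".join(ch for ch in text if ch not in string.punctuation)
    return stripped, counts


# Integers, comma-grouped thousands, decimals, optional sign and percent sign.
NUMBER_RE = re.compile(
    r"(?<![\w.])[-+]?(?:\d{1,3}(?:,\d{3})+|\d+)(?:\.\d+)?%?(?!\w)"
)

MONTH = (r"(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|June?|"
         r"July?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|"
         r"Dec(?:ember)?)")

DATE_RE = re.compile("|".join([
    r"\b\d{1,2}-" + MONTH + r"-\d{4}\b",          # 26-FEB-1987
    r"\b" + MONTH + r" \d{1,2}(?:, \d{4})?\b",    # March 3 or March 3, 1987
    r"\b\d{1,2} " + MONTH + r"(?: \d{4})?\b",     # 3 March or 3 March 1987
]), re.IGNORECASE)

# hh:mm with optional seconds and fractional seconds.
TIME_RE = re.compile(r"\b\d{1,2}:\d{2}(?::\d{2}(?:\.\d+)?)?\b")


def find_numbers(text):
    """Return numeric expressions found in text (dates and times are not excluded)."""
    return NUMBER_RE.findall(text)


def find_dates(text):
    """Return date expressions matching the patterns above."""
    return DATE_RE.findall(text)


def find_times(text):
    """Return time expressions matching the pattern above."""
    return TIME_RE.findall(text)
```

Note the order of operations: stripping punctuation first would turn 1,234,567 into 1234567 and 15:01 into 1501, so it is probably safer to identify numbers, dates and times before punctuation is removed. `find_numbers` will also pick up the digits inside dates and times, so when all three are needed it may help to remove the date and time matches from the text before looking for numbers.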