Old Business

    # Factorial, defined recursively
    def fact(x):
        if x <= 1:
            return 1
        else:
            return x * fact(x - 1)

    # n choose k; // keeps the result an integer
    def binomial_coef(n, k):
        return fact(n) // (fact(k) * fact(n - k))

    # Print the first nine rows of Pascal's triangle
    for n in range(9):
        print([binomial_coef(n, k) for k in range(n + 1)])

    [1]
    [1, 1]
    [1, 2, 1]
    [1, 3, 3, 1]
    [1, 4, 6, 4, 1]
    [1, 5, 10, 10, 5, 1]
    [1, 6, 15, 20, 15, 6, 1]
    [1, 7, 21, 35, 35, 21, 7, 1]
    [1, 8, 28, 56, 70, 56, 28, 8, 1]

Homework for next time
• What are the 10 most common words in Moby Dick?
  – Make a concordance of the 3rd most common word
• Do the same for https://jshare.johnshopkins.edu/kchurch4/public_html/teaching/103/Spring2011/
  – What are the 10 most common words on this page?
  – Make a concordance of the 3rd most common word
• No need to buy the book
  – Free online at http://www.nltk.org/book
• Read Chapter 1
  – http://nltk.googlecode.com/svn/trunk/doc/book/ch01.html
• Install NLTK (see next slide)
  – Warning: It might not be easy (and it might not be your fault)
  – Let us know how it goes
    • (both positive and negative responses are appreciated)

Installing NLTK
http://nltk.googlecode.com/svn/trunk/doc/book/ch01.html
• Chapter 01: pp. 1 - 4
  – Python
  – NLTK
  – Data

Concordances

Python Objects

Lists

    >>> sent1
    ['Call', 'me', 'Ishmael', '.']
    >>> type(sent1)
    <type 'list'>
    >>> sent1[0]                     # First
    'Call'
    >>> sent1[1:len(sent1)]          # Rest
    ['me', 'Ishmael', '.']

Strings

    >>> sent1[0]
    'Call'
    >>> type(sent1[0])
    <type 'str'>
    >>> sent1[0][0]                  # First
    'C'
    >>> sent1[0][1:len(sent1[0])]    # Rest
    'all'

Types & Tokens

Polymorphism
[Figure labels: Tokens, Types]

FreqDist

URLs (Chapter 3)
HTML
Works with almost any URL!

    >>> url = "https://jshare.johnshopkins.edu/kchurch4/public_html/teaching/103/Lecture07/WebProgramming/javascript_example_with_sounds.html"
    >>> def url2text(url):
    ...     html = urlopen(url).read()         # fetch the raw HTML
    ...     raw = nltk.clean_html(html)        # strip the markup
    ...     tokens = nltk.word_tokenize(raw)   # split into word tokens
    ...     return nltk.Text(tokens)
    ...
    >>> text = url2text(url)
    >>> text.concordance('Nonsense')

Hints for Homework

    import nltk
    from urllib import urlopen   # Python 2; in Python 3 urlopen lives in urllib.request

    def url2text(url):
        html = urlopen(url).read()         # fetch the raw HTML
        raw = nltk.clean_html(html)        # strip the markup
        tokens = nltk.word_tokenize(raw)   # split into word tokens
        return nltk.Text(tokens)

    url = "https://jshare.johnshopkins.edu/kchurch4/public_html/teaching/103/Spring2011/"
    t = url2text(url)
    t.concordance("web")
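
The Types & Tokens and FreqDist slides can be illustrated without NLTK. The sketch below is only an assumption about what those slides cover: tokens are counted with len, types with len(set(...)), and collections.Counter stands in for nltk.FreqDist. It is written for Python 3, and the token list and the helper names count_tokens_and_types and most_common_words are made up for illustration.

```python
from collections import Counter

# Toy token list in the style of sent1 (illustrative, not from the course data).
tokens = ['Call', 'me', 'Ishmael', '.', 'Call', 'me', 'back', '.']


def count_tokens_and_types(tokens):
    """Return (number of tokens, number of distinct types)."""
    return len(tokens), len(set(tokens))


def most_common_words(tokens, n=10):
    """Return the n most frequent tokens as (token, count) pairs."""
    return Counter(tokens).most_common(n)


n_tokens, n_types = count_tokens_and_types(tokens)
top = most_common_words(tokens, 3)

print(n_tokens, n_types)
print(top)
```

Every occurrence counts as a token, while repeated words count once as a type, so the token count is never smaller than the type count.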
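
For the homework, here is a rough Python 3 sketch of the same pipeline that needs neither NLTK nor a network connection. It is not the course's url2text: html.parser from the standard library stands in for nltk.clean_html, a regular expression stands in for nltk.word_tokenize, and the names TextExtractor, html2tokens, concordance and the sample HTML string are all hypothetical.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ('script', 'style'):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ('script', 'style') and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def html2tokens(html):
    """Strip markup, then split into word and punctuation tokens."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return re.findall(r"\w+|[^\w\s]", ' '.join(parser.parts))


def concordance(tokens, word, width=3):
    """Return (left context, match, right context) for each hit, ignoring case."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok.lower() == word.lower():
            left = ' '.join(tokens[max(0, i - width):i])
            right = ' '.join(tokens[i + 1:i + 1 + width])
            hits.append((left, tok, right))
    return hits


# Made-up page standing in for the HTML that urlopen(url).read() would return.
sample_html = ("<html><body><h1>Web pages</h1>"
               "<p>The web is big. I like the web.</p>"
               "<script>var x = 1;</script></body></html>")

tokens = html2tokens(sample_html)
words = [t.lower() for t in tokens if t.isalpha()]
top_words = Counter(words).most_common(10)
hits = concordance(tokens, 'web')

print(top_words)
for left, match, right in hits:
    print('%20s  %s  %s' % (left, match, right))
```

To run this on a real page in Python 3, the HTML string would come from urllib.request.urlopen(url).read(), decoded with the page's character encoding.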