Hmwk21 1. Okay: 6, 10, 12, 15 2. 40 tokens, 31 terms The the are two tokens, but one term 3. 4. 5. 6. a. okay, b. organ <> organization, one <> on Okay n/a Just stating “1 token” is not enough, show some detail. What did you try; what was the result? 02/17/15.1 Hmwk22 • Exercise 20.1: Why is it better to partition hosts (rather than individual URLs) between the nodes of a distributed crawl system? • This allows for “politeness” measures to be implemented, preventing excessive calls to a particular machine in short periods of time, by caching data from one node. • Consider a web hosting company with multiple websites, i.e. a pool of ip addresses. Grouping hosts by ip address could also support “politeness”. 02/17/15.2 Hmwk23 • Exercise 20.2: Why should the host splitter precede the Duplicate URL Eliminator (DUE)? • Host splitters can be cached, allowing the host splitter to discard duplicate URLs, thereby reducing network traffic. The same goes for DUE but the DUE requires significant RAM storage on each machine. Using the host splitter first, slows the growth rate of the cache and reduces redundant RAM usage. 02/17/15.3