CMU 15-505: Internet Search Technologies

15-505 Internet Search Technologies
• Instructors:
  – Alona Fyshe
  – Scott Larsen
  – Chris Monson
  – Kamal Nigam
• http://www.cs.cmu.edu/~knigam/15-505

What does it take to build a world-class search engine and related services?
• Lots of computer science
• Massively parallel computation
• Special-purpose data storage
• Information retrieval
• Machine learning
• Language analysis
• User interface design

• Study each of these topics in narrow but deep fashion
• Format: small seminar, readings, interactive discussions, programming practicum
• Grading:
  – 55% programming homework
  – 30% reading response
  – 15% class participation

What are reading responses?
• Practice for reading and thinking about computer science research papers
• Meant to be open-ended, fairly short (1 page)
• Can be:
  – Summary of paper
  – Critique of theory, experiments, approach
  – Suggestions for follow-on studies

Collaboration and Cheating
• Please collaborate on ideas, approaches, diagnosing problems
  – use the mailing list
• All words and code must be your own
• Disclose all collaborations
• Clarify any doubts

What will make this class enjoyable?
• Interactive
• Flexibility to explore fun domains and data
• Early feedback to us about what works and doesn’t

Problems in Internet Search Technology:
• Huge Problems
  – E.g. what changed in the web since this time yesterday?
• Classic Problems
  – E.g. sorting a gazillion numbers fast
• New Problems
  – E.g. making sense of dynamic Cyrillic web pages
• Practical Problems
  – E.g. how do we make both advertisers and consumers happier at the same time?
• Non-practical Problems
  – E.g. what do you see if you zoom all the way in on the moon?
• Beautiful Problems
  – And Fun Problems

A Taste
• Sorting
  – Scaling size up
  – Scaling time requirements down
• Matrix Operations
  – Thinking about the problem in a blend of old ways and new ways

Classic Sorting Algorithms
• Quick
• Merge
• Selection
• Shell
• Heap
• Radix
• Bucket
• ….

Enlarge the Problem:
• 1,000x too many keys for a single machine
• 1024 machines to use

Sorting: Parallel
• How would you do it?
  – Quick?
  – Merge?
  – Selection?
  – Shell?
  – Heap?
  – Radix?
  – Bucket?
  – ….

Bitonic Sort: Batcher (1968)
• Bitonic Sequence: <a_0, a_1, …, a_{n-1}>
  – Exists i such that <a_0 .. a_i> is monotonically increasing and <a_{i+1} .. a_{n-1}> is monotonically decreasing
  – Or: there exists a cyclic shift of indices such that the above is satisfied
  – E.g. <8, 9, 2, 1, 0, 4> is a bitonic sequence

Bitonic Merging Network
(figure)

Bitonic Merge on a Hypercube
(figure)

Bitonic Sort
(figure)

Bitonic Sort
  Procedure BitonicSort
    for i = 0 to d - 1
      for j = i downto 0
        if (i + 1)st bit of iproc <> jth bit of iproc
          comp_exchange_max(j, item)
        else
          comp_exchange_min(j, item)
        endif
      endfor
    endfor
• comp_exchange_max and comp_exchange_min compare and exchange the item with the neighbor on the jth dimension

Bitonic Sort Demo
• http://www.inf.fh-flensburg.de/lang/algorithmen/sortieren/bitonic/bitonicen.htm

Parallel Sort: Beauty or a Beast?
• What does it take to implement this?

Bitonic Sort: Why?
• O(n log^2(n))
• Data independent
• Resource needs are perfectly defined
• Very parallel friendly

Matrix Multiplication
c_ij = Σ_k a_ik b_kj

  0.75  0.25  0.0   0.0       0.75  0.25  0.0   0.0       0.5625  0.375   0.0625  0.0
  0.25  0.75  0.0   0.0   *   0.0   0.75  0.25  0.0   =   0.1875  0.625   0.1875  0.0
  0.0   0.75  0.25  0.0       0.0   0.0   0.75  0.25      0.0     0.5625  0.375   0.0625
  0.25  0.0   0.0   0.75      0.25  0.0   0.0   0.75      0.375   0.0625  0.0     0.5625

Matrix Pipeline
c_ij = Σ_k a_ik b_kj
(figure: one output entry accumulated from its partial products, 0.5625 + 0.0625 + 0.0 + 0.0 = 0.625)

Visualization
(figures: the product A * B = C shown as images)

Matrix Multiplication
• A cube of processors
• Each does a chunk of the computation
  – Each needs different (and overlapping) portions of the input
  – Each passes intermediate results to certain neighbors
• Result is stored across multiple machines
• Seems kinda heavy for a simple algorithm!
• Look up Fox’s algorithm and Cannon’s algorithm
  – Very pretty at one level
  – Gory at another level

A Different View
(image courtesy http://www.unrealtournament3.com/)
• Multiplication (*): multi-texturing
• Addition (+): blending

Graphics Pipeline
• Multiply, Multiply, Add
• Image (Frame Buffer)

How the Algorithm Works
(figures: the operand images are multiplied (*) and the partial results added (+) in the frame buffer)

Performance
Multiplication of 2 Matrices 1024x1024
(bar chart: time in seconds, axis from 0 to 0.4; bars for NVIDIA 7800GTX (450,1250) and 7900GTX (670, 820), Dual Opteron 280 Dual Core (2.3GHz), P4 Dual Core (3.2GHz), and Dual Xeon Dual Core (3.6GHz), with the CPUs measured at 1, 2, and 4 threads)

GPU Sorting

Problems in Internet Search Technology:
• Huge Problems
• Classic Problems
• New Problems
• Practical Problems
• Non-practical Problems
• Beautiful Problems
• Fun Problems

Questions?
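
As a small check of the Matrix Multiplication slides, the sketch below computes c_ij = Σ_k a_ik b_kj for the 4x4 example matrices and then repeats the computation in blocks. The function names `matmul` and `blocked_matmul` and the block size `bs` are illustrative assumptions, not part of the course material. The blocked loop is only a sequential stand-in for the "cube of processors" idea, where each (bi, bj, bk) triple is the chunk one processor might own; it is not Fox's or Cannon's algorithm.

```python
# Example operands from the Matrix Multiplication slide.
A = [
    [0.75, 0.25, 0.0, 0.0],
    [0.25, 0.75, 0.0, 0.0],
    [0.0, 0.75, 0.25, 0.0],
    [0.25, 0.0, 0.0, 0.75],
]
B = [
    [0.75, 0.25, 0.0, 0.0],
    [0.0, 0.75, 0.25, 0.0],
    [0.0, 0.0, 0.75, 0.25],
    [0.25, 0.0, 0.0, 0.75],
]


def matmul(a, b):
    """Plain product: c[i][j] is the sum over k of a[i][k] * b[k][j]."""
    inner = len(b)
    return [
        [sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def blocked_matmul(a, b, bs):
    """Same product for square matrices, computed chunk by chunk.

    Each (bi, bj, bk) triple stands for the work of one hypothetical
    processor in the cube: it reads a block of `a` and a block of `b`
    and adds its partial result into the matching block of `c`.
    """
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for bi in range(0, n, bs):
        for bj in range(0, n, bs):
            for bk in range(0, n, bs):
                for i in range(bi, min(bi + bs, n)):
                    for j in range(bj, min(bj + bs, n)):
                        c[i][j] += sum(
                            a[i][k] * b[k][j] for k in range(bk, min(bk + bs, n))
                        )
    return c


if __name__ == "__main__":
    for row in matmul(A, B):
        print(row)
    print(blocked_matmul(A, B, 2) == matmul(A, B))
```

The first row of `matmul(A, B)` is `[0.5625, 0.375, 0.0625, 0.0]`, and the middle entry of the second row works out to 0.25 * 0.25 + 0.75 * 0.75 = 0.625. All entries here are multiples of 1/16, so the floating-point comparison between the two variants is exact; with general inputs a tolerance would be needed.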

CMU 15-505: Internet Search Technologies
– Kamal Nigam (knigam@google.com)
– Chris Monson (shiblon@google.com)
– Alona Fyshe (alonaf@google.com)
– Scott Larsen (esl@google.com)

Bitonic Rearranging (cycling)
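
The cyclic-shift definition of a bitonic sequence and the BitonicSort procedure can be tried out with the sketch below. It is a sequential simulation under stated assumptions: the names `is_bitonic` and `bitonic_sort_hypercube` are made up for illustration, "monotonically" is read as non-strict, bits of `iproc` are counted from 0 at the least significant end, and each of the 2^d simulated processors holds exactly one item. A real implementation would exchange items between machines instead of indexing one list.

```python
def is_bitonic(seq):
    """True if some cyclic shift of seq rises and then falls."""
    seq = list(seq)
    n = len(seq)
    for shift in range(n):
        rot = seq[shift:] + seq[:shift]
        k = 0
        # Walk the non-decreasing prefix, then the non-increasing suffix.
        while k + 1 < n and rot[k] <= rot[k + 1]:
            k += 1
        while k + 1 < n and rot[k] >= rot[k + 1]:
            k += 1
        if k >= n - 1:
            return True
    return n == 0


def bitonic_sort_hypercube(items):
    """Simulate BitonicSort with one item per processor (ascending result)."""
    items = list(items)
    n = len(items)
    d = n.bit_length() - 1
    if n == 0 or (1 << d) != n:
        raise ValueError("number of items must be a power of two")
    for i in range(d):
        for j in range(i, -1, -1):
            nxt = items[:]
            for iproc in range(n):
                # Neighbor along dimension j of the hypercube.
                partner = iproc ^ (1 << j)
                bit_i1 = (iproc >> (i + 1)) & 1
                bit_j = (iproc >> j) & 1
                if bit_i1 != bit_j:
                    # comp_exchange_max: keep the larger of the pair.
                    nxt[iproc] = max(items[iproc], items[partner])
                else:
                    # comp_exchange_min: keep the smaller of the pair.
                    nxt[iproc] = min(items[iproc], items[partner])
            items = nxt
    return items


if __name__ == "__main__":
    print(is_bitonic([8, 9, 2, 1, 0, 4]))
    print(bitonic_sort_hypercube([8, 9, 2, 1, 0, 4, 7, 3]))
```

For the slide's example <8, 9, 2, 1, 0, 4>, the shift that starts at 0 gives <0, 4, 8, 9, 2, 1>, which rises and then falls, so `is_bitonic` returns `True`. Note that the sequence of compare-exchange steps in `bitonic_sort_hypercube` depends only on d, never on the data, which is the "data independent" property claimed for the algorithm.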