COSC 6397 Big Data Analytics
Recap for the final quiz
Edgar Gabriel
Spring 2015

Quiz
• Will be held on April 30, 2:30-3:45 pm
• You may bring 2 pages of handwritten notes
  – You can write on both sides, but the notes have to be handwritten (no photocopies or printed notes)
• Topics are listed on the next slide
• The quiz contains knowledge questions and some simple exercises
  – Keep answers short and precise
  – Exercises are mostly based on simple/trivial code samples to complete, to find errors in, etc.
  – Simple calculations (speedup, Amdahl's Law, etc.)

Relevant material
• Fundamentals
• Master Worker Programming Pattern
• MPI (concept only; there will be no programming questions on MPI in this quiz)
• MapReduce (I), (II), (III) and Advanced MapReduce programming
• Distributed File Systems (I) and (II)
• Data Formats (II)
• Yarn
• Graph Algorithms and Giraph
• Inverted Index

What is not part of the quiz
• Introduction and Organizational issues
• Spark
• Data Formats (I): HDF5 and RDF
• Pig and Hive
• Reliability
• Fuzzy clustering
• Expectation Maximization
• Mahout

Paper literature
• Focus mostly on the material presented in the slides
• Additional reading that could be useful for the quiz:
  1. Jeffrey Dean and Sanjay Ghemawat, “MapReduce: Simplified Data Processing on Large Clusters”, http://research.google.com/archive/mapreduce-osdi04.pdf
  2. Andrew Pavlo, Erik Paulson, Alexander Rasin, “A Comparison of Approaches to Large-Scale Data Analysis”, http://database.cs.brown.edu/sigmod09/benchmarks-sigmod09.pdf
  3. http://hadoop.apache.org/docs/r0.18.0/hdfs_design.pdf
  4. Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, Robert E. Gruber, “Bigtable: A Distributed Storage System for Structured Data”, http://static.googleusercontent.com/media/research.google.com/en/us/archive/bigtable-osdi06.pdf
  5. G. Malewicz, M.H. Austern, A.J.C. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski, “Pregel: A System for Large-Scale Graph Processing”, http://kowshik.github.io/JPregel/pregel_paper.pdf
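The quiz outline mentions simple calculations with speedup and Amdahl's Law. The slides here do not spell out the formulas, so the following is only a minimal sketch using the standard textbook definitions. The function names are hypothetical, and the code assumes that f denotes the strictly sequential fraction of the runtime (some texts use the parallel fraction instead; check the course notation).

```python
def speedup(t_serial, t_parallel):
    """Speedup S = T_serial / T_parallel."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, p):
    """Parallel efficiency E = S / p for p processes."""
    return speedup(t_serial, t_parallel) / p


def amdahl_speedup(f, p):
    """Amdahl's Law: upper bound on speedup with p processes.

    Assumption: f is the fraction of the runtime that cannot be
    parallelized (0 <= f <= 1).
    """
    return 1.0 / (f + (1.0 - f) / p)


if __name__ == "__main__":
    # A job that takes 100 s sequentially and 25 s on 8 processes.
    print(speedup(100.0, 25.0))        # 4.0
    print(efficiency(100.0, 25.0, 8))  # 0.5

    # 10% sequential code on 8 processes.
    print(round(amdahl_speedup(0.1, 8), 2))  # 4.71
```

Under this assumption the bound approaches 1/f as p grows, so a code with a 10% sequential part can never run more than 10 times faster, no matter how many processes are used.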