Recap for the final quiz

COSC 6397
Big Data Analytics
Recap for the final quiz
Edgar Gabriel
Spring 2015
Quiz
• Will be held on April 30, 2:30-3:45 pm
• You can have 2 pages of handwritten notes
– You may write on both sides, but the notes must be handwritten (no
photocopies or printed notes)
• Topics discussed on next slide
• Quiz contains knowledge questions and some simple
exercises
– Keep answers short and precise
– Exercises are mostly based on simple code samples to complete,
to find errors in, etc.
– Simple calculations (speedup, Amdahl's Law, etc.)
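As a refresher for the calculation exercises, the following is a minimal sketch of the standard speedup, efficiency, and Amdahl's Law formulas. The function names and the sample numbers (10% serial fraction, 8 processes) are assumptions chosen for illustration and are not taken from the lecture material.

```python
def speedup(t_serial, t_parallel):
    """Speedup S = T_serial / T_parallel."""
    return t_serial / t_parallel


def efficiency(speedup_value, num_procs):
    """Parallel efficiency E = S / p."""
    return speedup_value / num_procs


def amdahl_speedup(serial_fraction, num_procs):
    """Amdahl's Law: S(p) = 1 / (f + (1 - f) / p),
    where f is the fraction of the runtime that cannot be parallelized."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / num_procs)


if __name__ == "__main__":
    # Illustrative values only: 10% serial code, 8 processes.
    s = amdahl_speedup(0.1, 8)
    print(f"Amdahl speedup: {s:.2f}")                   # prints 4.71
    print(f"Efficiency:     {efficiency(s, 8):.2f}")    # prints 0.59
    # Upper bound for p -> infinity is 1 / f.
    print(f"Upper bound:    {1.0 / 0.1:.1f}")           # prints 10.0
```

Note that with a 10% serial fraction the speedup can never exceed 10, no matter how many processes are used.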
Relevant material
• Fundamentals
• Master Worker Programming Pattern
• MPI (concepts only; there will be no programming
questions on MPI in this quiz)
• MapReduce (I), (II), (III) and Advanced MapReduce
programming
• Distributed File Systems (I) and (II)
• Data Formats (II)
• Yarn
• Graph Algorithms and Giraph
• Inverted Index
What is not part of the quiz
• Introduction and Organizational issues
• Spark
• Data Formats (I): HDF5 and RDF
• Pig and Hive
• Reliability
• Fuzzy clustering
• Expectation Maximization
• Mahout
Paper literature
• Focus mostly on material presented in the slides
• Additional reading that could be useful for the quiz:
1. Jeffrey Dean and Sanjay Ghemawat, “MapReduce: Simplified Data
Processing on Large Clusters”
http://research.google.com/archive/mapreduce-osdi04.pdf
2. Andrew Pavlo, Erik Paulson, Alexander Rasin, et al., “A Comparison of
Approaches to Large-Scale Data Analysis”,
http://database.cs.brown.edu/sigmod09/benchmarks-sigmod09.pdf
3. http://hadoop.apache.org/docs/r0.18.0/hdfs_design.pdf
4. Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah
A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, Robert E.
Gruber, “Bigtable: A Distributed Storage System for Structured Data”,
http://static.googleusercontent.com/media/research.google.com/en/us/archive/bigtable-osdi06.pdf
5. G. Malewicz, M. H. Austern, A. J. C. Bik, J. C. Dehnert, I. Horn, N.
Leiser, and G. Czajkowski, “Pregel: A System for Large-Scale Graph
Processing”, http://kowshik.github.io/JPregel/pregel_paper.pdf