COP5725 Advanced Database Systems
Spring 2016
Final Review
Tallahassee, Florida, 2016

Final Exam
• Time: Thursday 04/28/2016, 12:30pm to 2:30pm
• Venue: LOV 103, in-class exam
• Closed book/notes, but you can bring a one-page cheat sheet (A4, double-sided)
  – Plan your strategy well
• No calculators or other electronic devices
  – Laptops, iPads, smartphones, etc. are prohibited
• Any form of cheating in the examination will result in a zero grade and will be reported to the university

Final Exam
• Bring your FSU ID to attend the final exam
• 35% of your final score
• Coverage
  1. All materials taught in class and in the textbook, from the Introduction through LSH
  2. Six required reading papers

Format
• One set of true/false questions with brief answers
  – e.g., The MapReduce model is a better model for large-scale data than parallel databases
  – Answer: False. Because ……
• Short-answer questions
  – e.g., What is the nested-loop join? What is the complexity of this join algorithm?
• Several more questions
  – e.g., Dynamic programming for optimal join order selection
• 100 points
• I believe you have enough time (120 minutes)

Suggested Method for Study
• Go over the lecture slides and study the textbook
• Reread the required reading papers
• Work independently on problems in HW/lectures/exercises in the textbook
  – Any practice problem: work it out before checking solutions
• Questions?
  – Discuss with people in the class
  – Office hours
    • Monday: 2pm-4pm

Advanced DB Systems
[Figure: DBMS architecture diagram. Labels: User/Web Forms/Applications/DBA; query, transaction, DDL commands; Query Parser, Query Rewriter, Query Optimizer, Query Executor; Transaction Manager, Concurrency Control, Logging & Recovery, Lock Tables; DDL Processor; Records, Indexes; Buffer Manager; Storage Manager; Main Memory buffer (data, indexes, log, etc.); Storage (data, metadata, indexes, log, etc.)]

Results are Impressive

Why Such Great Achievements?

And Many More Behind the Scenes
• The next one is You!
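Practice sketch for the sample short-answer question on the nested-loop join (see Format): the snippet below is a minimal, tuple-at-a-time illustration in Python, not the textbook's block-based algorithm. It assumes both relations fit in memory as lists of dicts; the function name `nested_loop_join` and the sample relations are hypothetical. Because every tuple of R is compared with every tuple of S, the number of comparisons is |R| · |S|.

```python
def nested_loop_join(r, s, key):
    """Tuple-based nested-loop equijoin of relations r and s on `key`.

    Assumption: r and s are in-memory lists of dicts (illustration only).
    """
    result = []
    for r_row in r:                      # outer loop: every tuple of R
        for s_row in s:                  # inner loop: every tuple of S
            if r_row[key] == s_row[key]:
                # Merge the two matching tuples into one output tuple.
                result.append({**r_row, **s_row})
    return result


# Hypothetical sample relations for the usage example.
students = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
grades = [{"id": 1, "grade": "A"}, {"id": 1, "grade": "B"}, {"id": 3, "grade": "C"}]

joined = nested_loop_join(students, grades, "id")
for row in joined:
    print(row)
```

With 2 tuples in `students` and 3 in `grades`, the inner comparison runs 2 × 3 = 6 times, which is the quadratic behaviour the question asks about.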
Data Storage and Representation
• Memory Hierarchy
  – Speed vs. Size vs. Cost
• Disk
  – Latency = seek + rotation + transfer
  – I/O cost
    • Random I/O vs. Sequential I/O
• Data Representation in RDB Systems
• Database Addresses
  – Pointer swizzling
• Record Modification
• Row Store vs. Column Store

Indexing
• What is indexing and different types of indices
• B/B+ Trees
• Inverted Index and Boolean Queries
  – Query optimization
• Multidimensional Indices and Queries
  – kd-tree
  – quad-tree
  – R-tree
• Bitmap Index

Query Processing
• Logical vs. Physical Operators
  – Iterator model
  – Materialization vs. pipelining
• One-pass algorithms
  – Nested-loop join
  – ……
• Two-pass algorithms
  – Sort based
  – Hash based
• Index based algorithms

Query Optimization
• Algebraic Laws
• Rule Based Optimization
  – Heuristic rules for selection
• Cost Based Optimization
  – Dynamic programming
• Size Estimation

MapReduce
• What is MapReduce
  – General ideas
  – Map
  – Reduce
  – Combiner: local aggregation for optimization
• Distributed File Systems
• RDB vs. MapReduce
• Relational Algebra in MapReduce

Data Mining
• Data Mining and Knowledge Discovery from Data
• Frequent Pattern Mining
  – Association rules
  – Closed patterns and maximal patterns
  – Apriori algorithms
• Finding Similar Patterns
  – Shingles
  – Jaccard similarity and Minhashing
  – Locality sensitive hashing

Break a Leg!
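Practice sketch for the Data Mining topics (shingles, Jaccard similarity, minhashing): the Python snippet below is one possible illustration, not the lecture's implementation. The helper names, the choice of character shingles, the use of `zlib.crc32` to map shingles to integers, and the hash family h(x) = (a·x + b) mod p with p = 2^61 − 1 are all assumptions made for this example. The fraction of signature positions on which two sets agree gives an estimate of their Jaccard similarity.

```python
import random
import zlib

PRIME = (1 << 61) - 1  # assumed modulus: the Mersenne prime 2^61 - 1


def shingles(text, k=3):
    """Return the set of k-character shingles of `text`."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def jaccard(a, b):
    """Exact Jaccard similarity |a ∩ b| / |a ∪ b| of two sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def make_hash_params(n, seed=0):
    """Draw n random (a, b) pairs for h(x) = (a*x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(n)]


def minhash_signature(items, params):
    """Minhash signature of a non-empty set of strings.

    Each shingle is mapped to an integer with crc32 (stable across runs),
    then each hash function contributes the minimum hashed value.
    """
    ids = [zlib.crc32(s.encode("utf-8")) for s in items]
    return [min((a * x + b) % PRIME for x in ids) for a, b in params]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions on which the two signatures agree."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


# Usage: compare the exact similarity with the minhash estimate.
params = make_hash_params(200)
doc_a = shingles("abcd", k=2)   # {"ab", "bc", "cd"}
doc_b = shingles("bcde", k=2)   # {"bc", "cd", "de"}
exact = jaccard(doc_a, doc_b)
approx = estimate_jaccard(minhash_signature(doc_a, params),
                          minhash_signature(doc_b, params))
print(exact, approx)
```

More hash functions tighten the estimate at the cost of longer signatures; locality sensitive hashing then bands these signatures so that only likely-similar pairs are compared.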