COP5725 Advanced Database Systems Final Review Tallahassee, Florida, 2016

COP5725
Advanced Database Systems
Spring 2016
Final Review
Tallahassee, Florida, 2016
Final Exam
• Time: Thursday 04/28/2016 12:30pm --- 2:30pm
• Venue: LOV 103, in-class exam
• Closed book and closed notes, but you may bring a one-page
cheat sheet (A4, double-sided)
– Plan your strategy well
• No calculators or other electronic devices
– Laptops, iPads, smartphones, etc. are prohibited
• Any form of cheating in the examination will result in
a grade of zero and will be reported to the university
Final Exam
• Bring your FSU ID to attend the final exam
• 35% of your final score
• Coverage
1. All material taught in class and in the textbook, from the
Introduction through LSH
2. Six required reading papers
Format
• One set of true/false questions with brief answers
– e.g., The MapReduce model is a better model for large-scale data than
parallel databases
– Answer: False. Because ……
• Short-answer questions
– e.g., What is the nested-loop join? What is the complexity of this
join algorithm?
• Several more questions
– e.g., Dynamic programming for optimal join order selection
• 100 points
• I believe you have enough time (120 minutes)
Suggested Method for Study
• Go over the lecture slides and study the textbook
• Reread the required reading papers
• Work independently on problems from the homework, the lectures,
and the textbook exercises
– For any practice problem, work it out before checking the solutions
• Questions?
– Discuss with people in the class
– Office hours
• Monday: 2pm-4pm
Final Exam
Advanced DB Systems
[Figure: DBMS architecture diagram]
• Inputs from User/Web Forms/Applications/DBA: query, transaction, DDL commands
• Query path: Query Parser, Query Rewriter, Query Optimizer, Query Executor
• Transaction path: Transaction Manager, Concurrency Control (Lock Tables), Logging & Recovery
• DDL path: DDL Processor
• Access and storage layers: Records, Indexes, Buffer Manager, Storage Manager
• Main Memory: Buffer (data, indexes, log, etc.)
• Storage: data, metadata, indexes, log, etc.
Results are Impressive
Why Such Great Achievements?
And Many More Behind the Scenes
• The next one is
You!
Data Storage and Representation
• Memory Hierarchy
– Speed vs. Size vs. Cost
• Disk
– Latency = seek + rotation + transfer
– I/O cost
• Random I/O vs. Sequential I/O
• Data Representation in RDB Systems
• Database Addresses
– Pointer swizzling
• Record Modification
• Row Store vs. Column Store
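A small worked example of the latency formula above, and of why random I/O costs so much more than sequential I/O. This is only a sketch: the function names and the disk parameters (9 ms seek, 7200 RPM, 100 MB/s, 4 KB blocks) are illustrative assumptions, not figures from the course.

```python
def random_read_ms(seek_ms, rpm, transfer_mb_per_s, block_kb):
    """Latency of one random block read: seek + rotation + transfer."""
    rotation_ms = 0.5 * 60_000 / rpm  # on average, half a revolution
    transfer_ms = block_kb / 1024 / transfer_mb_per_s * 1000
    return seek_ms + rotation_ms + transfer_ms


def sequential_read_ms(seek_ms, rpm, transfer_mb_per_s, block_kb, n_blocks):
    """n contiguous blocks: pay seek and rotation once, then stream."""
    rotation_ms = 0.5 * 60_000 / rpm
    transfer_ms = n_blocks * block_kb / 1024 / transfer_mb_per_s * 1000
    return seek_ms + rotation_ms + transfer_ms


# Assumed, illustrative disk parameters (not from the slides).
disk = dict(seek_ms=9, rpm=7200, transfer_mb_per_s=100, block_kb=4)
random_total = 1000 * random_read_ms(**disk)            # 1000 scattered blocks
sequential_total = sequential_read_ms(n_blocks=1000, **disk)  # 1000 adjacent blocks
print(f"random: {random_total:.0f} ms, sequential: {sequential_total:.0f} ms")
```

Under these assumptions the transfer time of a single block is tiny compared with seek and rotation, so the random pattern is dominated by mechanical delay.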
Indexing
• What indexing is, and the different types of indices
• B/B+ Trees
• Inverted Index and Boolean Queries
– Query optimization
• Multidimensional Indices and Queries
– kd-tree
– quad-tree
– R tree
• Bitmap Index
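To make the inverted index and Boolean query bullets concrete, here is a minimal sketch of an AND query over sorted posting lists. It assumes whitespace tokenization and integer document IDs; the function names are hypothetical. The "query optimization" shown is the common heuristic of intersecting the shortest posting lists first, which may or may not be the exact variant covered in lecture.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to a sorted posting list of document IDs."""
    index = defaultdict(list)
    for doc_id in sorted(docs):
        for term in set(docs[doc_id].lower().split()):
            index[term].append(doc_id)
    return index


def intersect(a, b):
    """Merge-style intersection of two sorted posting lists."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def boolean_and(index, terms):
    """AND query; process the shortest posting lists first."""
    postings = sorted((index.get(t, []) for t in terms), key=len)
    if not postings:
        return []
    result = postings[0]
    for p in postings[1:]:
        if not result:  # nothing left to match, stop early
            break
        result = intersect(result, p)
    return result


docs = {1: "b tree index", 2: "bitmap index", 3: "b tree join"}
index = build_inverted_index(docs)
print(boolean_and(index, ["b", "tree", "index"]))
```

Starting from the shortest list keeps every intermediate result no larger than that list, so later intersections stay cheap.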
Query Processing
• Logical vs. Physical Operators
– Iterator model
– Materialization vs. pipelining
• One-pass algorithms
– Nested-loop join
– ……
• Two-pass algorithms
– Sort based
– Hash based
• Index based algorithms
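As a study aid for the join algorithms listed above, the following sketch contrasts a tuple-based nested-loop join with an in-memory hash join. It assumes both relations fit in memory as lists of tuples and that the join is an equality join; it ignores blocks and buffers, so it shows the logic rather than the I/O cost model from the textbook.

```python
from collections import defaultdict


def nested_loop_join(r, s, r_key, s_key):
    """Compare every tuple of R with every tuple of S: O(|R| * |S|) comparisons."""
    return [(tr, ts) for tr in r for ts in s if tr[r_key] == ts[s_key]]


def hash_join(r, s, r_key, s_key):
    """Build a hash table on R, then probe it with S: O(|R| + |S| + output)."""
    table = defaultdict(list)
    for tr in r:
        table[tr[r_key]].append(tr)
    return [(tr, ts) for ts in s for tr in table.get(ts[s_key], [])]


# Toy relations; position 0 is the join attribute in both.
R = [(1, "a"), (2, "b"), (2, "c")]
S = [(2, "x"), (3, "y")]
print(nested_loop_join(R, S, 0, 0))
print(sorted(hash_join(R, S, 0, 0)))
```

Both functions return the same set of matching pairs, possibly in a different order; only the amount of work differs.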
Query Optimization
• Algebraic Laws
• Rule Based Optimization
– Heuristic rules for selection
• Cost Based Optimization
– Dynamic programming
• Size Estimation
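The dynamic programming idea for join ordering can be sketched as follows. This is a simplified illustration restricted to left-deep plans, with assumptions of its own: the cost of a plan is the sum of the sizes of its intermediate results (base relations and the final result are not counted), and result sizes come from hypothetical per-pair selectivities, with a missing pair treated as a cross product. The function name and the statistics are invented for the example.

```python
from itertools import combinations


def best_left_deep_order(sizes, selectivity):
    """Return (cost, result_size, order) for the cheapest left-deep join order.

    sizes:       {relation_name: cardinality}
    selectivity: {frozenset({a, b}): fraction}; missing pairs default to 1.0
    """
    rels = sorted(sizes)
    # best[subset] = (cost, estimated result size, join order)
    best = {frozenset([r]): (0.0, float(sizes[r]), (r,)) for r in rels}
    for k in range(2, len(rels) + 1):
        for subset in map(frozenset, combinations(rels, k)):
            for r in sorted(subset):  # r is the relation joined last
                rest = subset - {r}
                cost, size, order = best[rest]
                out = size * sizes[r]
                for q in rest:
                    out *= selectivity.get(frozenset((q, r)), 1.0)
                # The result of `rest` is an intermediate only if it is a join.
                new_cost = cost + (size if len(rest) > 1 else 0.0)
                if subset not in best or new_cost < best[subset][0]:
                    best[subset] = (new_cost, out, order + (r,))
    return best[frozenset(rels)]


# Hypothetical statistics: R joins S, S joins T, and R and T share no predicate.
sizes = {"R": 1000, "S": 100, "T": 10}
selectivity = {frozenset(("R", "S")): 0.01, frozenset(("S", "T")): 0.1}
cost, size, order = best_left_deep_order(sizes, selectivity)
print(cost, size, order)
```

With these numbers the cheapest plan joins S and T first, because that pair produces the smallest intermediate result, and brings in R last.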
MapReduce
• What is MapReduce
– General ideas
– Map
– Reduce
– Combiner: local aggregation for optimization
• Distributed File Systems
• RDB vs. MapReduce
• Relational Algebra in MapReduce
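The map, reduce, and combiner roles can be illustrated with a single-process simulation of word count. This is a toy sketch, not a real MapReduce framework: `run_job` is a hypothetical driver, each inner list stands for the input split of one map task, and the count of emitted pairs stands in for shuffle traffic.

```python
from collections import defaultdict


def map_fn(_key, line):
    """Map: emit (word, 1) for every word in the line."""
    for word in line.split():
        yield word, 1


def reduce_fn(word, counts):
    """Reduce: sum all counts for one word. Also usable as the combiner."""
    yield word, sum(counts)


def run_job(splits, mapper, reducer, combiner=None):
    """Simulate one job; return (result, number of pairs sent to the shuffle)."""
    shuffled = defaultdict(list)
    emitted = 0
    for split in splits:  # one map task per split
        local = defaultdict(list)
        for i, line in enumerate(split):
            for k, v in mapper(i, line):
                local[k].append(v)
        if combiner:  # local aggregation before the shuffle
            pairs = [kv for k, vs in local.items() for kv in combiner(k, vs)]
        else:
            pairs = [(k, v) for k, vs in local.items() for v in vs]
        emitted += len(pairs)
        for k, v in pairs:
            shuffled[k].append(v)
    result = dict(kv for k, vs in shuffled.items() for kv in reducer(k, vs))
    return result, emitted


splits = [["a b a", "a"], ["b b"]]
plain, sent_plain = run_job(splits, map_fn, reduce_fn)
combined, sent_combined = run_job(splits, map_fn, reduce_fn, combiner=reduce_fn)
print(plain, sent_plain, sent_combined)
```

The reducer can double as the combiner here because summation is associative and commutative; that reuse is not valid for every reduce function.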
Data Mining
• Data Mining and Knowledge Discovery from Data
• Frequent Pattern Mining
– Association rules
– Closed patterns and maximal patterns
– Apriori algorithms
• Finding Similar Patterns
– Shingles
– Jaccard similarity and Minhashing
– Locality sensitive hashing
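For the similarity topics above, here is a minimal sketch of character shingling, exact Jaccard similarity, and a minhash estimate of it. The choices in it are assumptions made for illustration: shingles are character k-grams, shingles are hashed with `zlib.crc32`, and the hash family is `(a*x + b) mod p` with `p = 2^61 - 1`, which only approximates a random permutation. The banding step of LSH is not shown.

```python
import random
import zlib

PRIME = (1 << 61) - 1  # a Mersenne prime, larger than any crc32 value


def shingles(text, k=3):
    """Set of all length-k character substrings of text."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def jaccard(a, b):
    """|A ∩ B| / |A ∪ B|; defined here as 1.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def make_hash_params(n, seed=0):
    """n random (a, b) pairs, one per simulated permutation."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(PRIME)) for _ in range(n)]


def minhash_signature(items, params):
    """For each hash function, keep the minimum hash value over the set."""
    codes = [zlib.crc32(x.encode()) for x in items]
    return [min((a * c + b) % PRIME for c in codes) for a, b in params]


def estimate_similarity(sig1, sig2):
    """Fraction of signature positions where the two minhashes agree."""
    return sum(x == y for x, y in zip(sig1, sig2)) / len(sig1)


s1 = shingles("advanced database systems")
s2 = shingles("advanced database system")
params = make_hash_params(200)
est = estimate_similarity(minhash_signature(s1, params),
                          minhash_signature(s2, params))
print(jaccard(s1, s2), est)
```

The estimate approaches the true Jaccard similarity as the number of hash functions grows; 200 is an arbitrary choice for the example.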
Break a Leg!