Cleveland State University Department of Electrical and Computer Engineering

CIS 611/711 Enterprise Databases and Data Warehouse
Catalog Data: CIS 611/711 Enterprise Database Systems and Data Warehouse (3-0-3).
Prerequisites: CIS 505 and CIS 530. Detailed study of modern
enterprise-level database systems and the data warehouses behind their
decision support and data analytics systems. Topics include theoretical
and practical approaches to logical and physical database design with
normalization theory, elimination of update anomalies, functional
dependency, lossless and dependency-preserving decompositions, file
systems, index techniques, and access paths. The course focuses on query
processing strategies, query execution techniques, and query optimization
strategies, and extends the study to the design and implementation of
data analytics applications with data warehouses and online analytical
processing (OLAP). It continues with an exploration of integrated
enterprise data management systems with parallel data warehouses and
non-relational database systems, and of the latest advances in database
research through selected papers.
Textbooks:
Fundamentals of Database Systems, by Elmasri / Navathe. 7th Edition.
Addison Wesley Pub Co.
Database Management Systems, 3rd Edition, by Raghu Ramakrishnan and
Johannes Gehrke. Ed. McGraw-Hill, 2002
Data Mining: Concepts and Techniques, 3rd Edition, by Jiawei Han /
Micheline Kamber, Morgan Kaufmann Publishers, 2011
Lecture Notes Taken from Selected Database Research Papers and
Industry Database System Design Documentation
References:
The Theory of Relational Databases, by D. Maier, Ed. Comp. Sc. Press,
1983
Coordinator: Dr. Sunnie S. Chung
Outcomes:
Upon successful completion of the course, the student will be able to:
• Create, or reengineer, well-designed relational databases mostly free
from redundancy and update anomalies;
• Design physical databases with comprehensive knowledge of
performance optimization and indexing;
• Develop a query optimizer with query processing strategies and execution
techniques for practical database problems;
• Understand complex query optimization strategies of modern
enterprise-level database systems and develop new optimization strategies
to solve practical database problems;
• Analyze database system performance;
• Develop query optimization strategies to solve practical database
problems;
• Develop enterprise level database applications with Parallel Data
Warehouse and OLAP for practical data analytics problems;
• Gain exposure to recent database research, in particular integrated big
data management systems with Parallel Data Warehouse, columnar
databases, and Hadoop-based NoSQL systems.
Topics and Lecture Hours:
1. Introduction to Modern Enterprise Database Systems: 3 hours
   Parallel Database System Architecture, SQL Query Processing, View Processing
2. Relational Data Model and Database Constraints: 3 hours
   Relational Algebra, Relational Tuple Calculus
3. Database Design, Normalization Theory: 3 hours
   Attribute Closure, XClosure Algorithms
4. Functional Dependency: 3 hours
   Lossless Joins, and Dependency Preserving Decompositions
5. Relational Database File System, Disk Storage: 3 hours
6. Physical Database Design: 3 hours
   Index: Primary Index, Secondary Index, Clustering Index, Multilevel Index,
   B/B+ Tree, Hash Index, Access Path
7. Query Processing Strategies, Query Execution Techniques: 3 hours
8. Evaluation of Relational Operators: 3 hours
   Projection, Index Loop Join, Sort Merge Join, Hash Join, Group By Aggregation
9. External Sorting, External Hashing Techniques: 3 hours
10. Cost Analysis of Query Processing Performance: 3 hours
    Join Algorithms and Analysis
11. Advanced Query Optimization techniques for Complex Queries: 3 hours
    Advanced Join Types, Correlated Subquery Processing, Partitioned Group By,
    Query Rewrites Optimizations
12. Data Warehouse and On Line Analytical Processing (OLAP): 3 hours
    Multi-Dimensional Data Warehouse Design
    OLAP Aggregation Operators: Cube, Roll Up, Drill Down
    Implementation of Data Warehouse and OLAP
13. Data Mining Process with Enterprise Data Warehouse: 3 hours
    Multi-Dimensional eXpressions (MDX), Data Mining eXpressions (DMX)
14. Advanced Research literature review and Presentations: 3 hours
15. Exams and Reviews: 3 hours
Total: 45 hours
Grading: The course grade is based on a student's overall performance throughout the entire
semester. The final grade is distributed among the following components:
• Exams (Midterm & Final) 40% (15% for Midterm, 25% for Final)
• Computer Labs 30% (about 3-4 Assignments)
• 1 Project 20% (2 person group project): Project Specifications in detail will be
given in class
• Research Topic Presentation: 10%
Additional Requirements for CIS711 Students:
• Doctoral students who take CIS711 must select a project to work on
• Doctoral students who take CIS711 must work on the project individually (instead
of 2 person group)
• The list of projects and research papers for doctoral students will be given
separately in class. A tentative selection of research projects and a paper list
are given at the end of this syllabus
• In each exam, one additional problem is designed to be completed by doctoral
students only
Computer Software Required:
1. Visual Studio 2012/2013 or higher
2. SQL Server 2012/2014 or higher
3. Microsoft SQL Server Data Tools (SSDT) for Analysis Service
4. Business Intelligence (SSDT BI) for Visual Studio 2012/2014 or higher
5. Adventure Works (a Sample Data Warehouse) for SQL Server 2012/2014 or higher
Tentative List of Research Papers and Projects for CIS 711 Doctoral Students:
CIS 711 doctoral students should choose one of the following research topics, give a 30-minute
presentation on the associated papers, and complete a project related to the topic.
The paper list and project specification for each research topic below will be given in class.
Examples of Selected Current Database Research Topics in Modern Enterprise Database Systems
(The topics and the paper list may vary every year.)
1. Integrated Big Data Processing Systems
   Orca: A Modular Query Optimizer Architecture for Big Data (SIGMOD 2014)
   Petabyte Scale Databases and Storage Systems Deployed at Facebook (SIGMOD 2013)
   Integrating Hadoop and parallel DBMS (SIGMOD 2010)
2. Big Data Processing System with Parallel Data Warehouse (PDW)
   Query Optimization in Microsoft SQL Server Parallel Data Warehouse (SIGMOD 2012)
3. Information Retrieval System: Semantic Content Based Approaches
   Bigtable: A Distributed Storage System for Structured Data, by Fay Chang, Jeffrey Dean,
   Sanjay Ghemawat, Google, Inc. in the Proceedings of OSDI 2006
4. Enterprise Big Data Processing System with Cloud: Google Cloud, Amazon Cloud, Microsoft Azure
   SQL Azure as a Self-Managing Database Service: Lessons Learned and Challenges Ahead,
   by Kunal Mukerjee, et al (Microsoft) in the proceedings of IEEE Computer Society
   Technical Committee on Data Engineering 2011
5. Enterprise Big Data Processing System with Parallel Data Warehouse (PDW) on Cloud
6. Columnar Databases
   A Storage Advisor for Hybrid Store Databases, by SAP (SIGMOD 2014)
   Efficient Transaction Processing in SAP HANA Database: The End of a Column Store Myth,
   by SAP (SIGMOD 2012)