Cleveland State University
Department of Electrical and Computer Engineering
CIS 611/711 Enterprise Databases and Data Warehouse

Catalog Data:
CIS 611/711 Enterprise Database Systems and Data Warehouse (3-0-3). Prerequisites: CIS 505 and CIS 530. Detailed study of modern enterprise-level database systems and data warehouses for decision support and data analytics systems. Topics include theoretical and practical approaches to logical and physical database design with normalization theory, elimination of update anomalies, functional dependency, lossless and dependency-preserving decompositions, file systems, index techniques, and access paths. The course focuses on query processing strategies, query execution techniques, and query optimization strategies, and extends the study to the design and implementation of data analytics applications with data warehouses and online analytical processing (OLAP). It continues with an exploration of integrated enterprise data management systems with parallel data warehouses and non-relational database systems, and of the latest advances in database research through selected papers.

Textbooks:
Fundamentals of Database Systems, 7th Edition, by Elmasri and Navathe. Addison Wesley.
Database Management Systems, 3rd Edition, by Raghu Ramakrishnan and Johannes Gehrke. McGraw-Hill, 2002.
Data Mining: Concepts and Techniques, 3rd Edition, by Jiawei Han and Micheline Kamber. Morgan Kaufmann Publishers, 2011.
Lecture notes taken from selected database research papers and industry database system design documentation.

References:
The Theory of Relational Databases, by D. Maier. Computer Science Press, 1983.

Coordinator: Dr. Sunnie S. Chung

Outcomes: Upon successful completion of the course, the student will be able to:
• Create or reengineer well-designed relational databases largely free from redundancy and update anomalies;
• Design physical databases with comprehensive knowledge of performance optimization and indexing;
• Develop a query optimizer with query processing strategies and execution techniques for practical database problems;
• Understand complex query optimization strategies of modern enterprise-level database systems and develop new optimization strategies to solve practical database problems;
• Analyze database system performance;
• Develop query optimization strategies to solve practical database problems;
• Develop enterprise-level database applications with Parallel Data Warehouse and OLAP for practical data analytics problems;
• Gain exposure to recent database research, in particular integrated big data management systems with Parallel Data Warehouse, columnar databases, and Hadoop-based NoSQL systems.

Topics                                                      Lecture Hours
1. Introduction to Modern Enterprise Database Systems             3
   Parallel Database System Architecture, SQL Query Processing, View Processing
2. Relational Data Model and Database Constraints,                3
   Relational Algebra, Relational Tuple Calculus
3. Database Design, Normalization Theory                          3
   Attribute Closure, XClosure Algorithms
4. Functional Dependency,                                         3
   Lossless Joins, and Dependency Preserving Decompositions
5. Relational Database File System, Disk Storage                  3
6. Physical Database Design:                                      3
   Indexes: Primary Index, Secondary Index, Clustering Index, Multilevel Index, B/B+ Tree, Hash Index, Access Path
7. Query Processing Strategies, Query Execution Techniques        3
8. Evaluation of Relational Operators:                            3
   Projection, Index Loop Join, Sort Merge Join, Hash Join, Group By Aggregation
9. External Sorting, External Hashing Techniques                  3
10. Cost Analysis of Query Processing Performance                 3
   Join Algorithms and Analysis
11. Advanced Query Optimization Techniques for Complex Queries    3
   Advanced Join Types, Correlated Subquery Processing, Partitioned Group By, Query Rewrite Optimizations
12. Data Warehouse and Online Analytical Processing (OLAP)        3
   Multi-Dimensional Data Warehouse Design
   OLAP Aggregation Operators: Cube, Roll Up, Drill Down
   Implementation of Data Warehouse and OLAP
13. Data Mining Process with Enterprise Data Warehouse            3
   Multi-Dimensional eXpressions (MDX), Data Mining eXpressions (DMX)
14. Advanced Research Literature Review and Presentations         3
15. Exams and Reviews                                             3
Total                                                            45

Grading:
The course grade is based on a student's overall performance throughout the semester. The final grade is distributed among the following components:
• Exams (Midterm and Final): 40% (15% for Midterm, 25% for Final)
• Computer Labs: 30% (about 3-4 assignments)
• 1 Project: 20% (2-person group project). Detailed project specifications will be given in class.
• Research Topic Presentation: 10%

Additional Requirements for CIS 711 Students:
• Doctoral students who take CIS 711 must select a project to work on.
• Doctoral students who take CIS 711 must work on the project individually (instead of in a 2-person group).
• The list of projects and research papers for doctoral students will be given separately in class. A tentative example of the research project selection and paper list is given at the end of this course schedule.
• In each exam, one additional problem is designed to be completed by doctoral students only.

Computer Software Required:
1. Visual Studio 2012/2013 or higher
2. SQL Server 2012/2014 or higher
3. Microsoft SQL Server Data Tools (SSDT) for Analysis Services
4. Business Intelligence (SSDT BI) for Visual Studio 2012/2014 or higher
5. Adventure Works (a sample data warehouse) for SQL Server 2012/2014 or higher

Tentative List of Research Papers and Projects for CIS 711 Doctoral Students:
CIS 711 doctoral students should choose one of the following research topics, give a 30-minute presentation on the papers (which will be given in class), and complete a project related to the subject. The paper list and project specification for each research topic below will be given in class.

Examples of Selected Current Database Research Topics in Modern Enterprise Database Systems
(The subjects and the paper list may vary every year.)
1. Integrated Big Data Processing Systems
   Orca: A Modular Query Optimizer Architecture for Big Data (SIGMOD 2014)
   Petabyte Scale Databases and Storage Systems Deployed at Facebook (SIGMOD 2013)
   Integrating Hadoop and Parallel DBMS (SIGMOD 2010)
2. Big Data Processing System with Parallel Data Warehouse (PDW)
   Query Optimization in Microsoft SQL Server Parallel Data Warehouse (SIGMOD 2012)
3. Information Retrieval System: Semantic Content Based Approaches
   Bigtable: A Distributed Storage System for Structured Data, by Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Google, Inc., in the Proceedings of OSDI 2006
4. Enterprise Big Data Processing System with Cloud: Google Cloud, Amazon Cloud, Microsoft Azure
   SQL Azure as a Self-Managing Database Service: Lessons Learned and Challenges Ahead, by Kunal Mukerjee, et al. (Microsoft), in the proceedings of IEEE Computer Society Technical Committee on Data Engineering 2011
5. Enterprise Big Data Processing System with Parallel Data Warehouse (PDW) on Cloud
6. Columnar Databases
   A Storage Advisor for Hybrid Store Databases, by SAP (SIGMOD 2014)
   Efficient Transaction Processing in SAP HANA Database: The End of a Column Store Myth, by SAP (SIGMOD 2012)