ISAM 5332 DATA WAREHOUSING AND DATA MINING
SPRING 2016

PREREQUISITE: ISAM 5331 – Database Design and SQL or equivalent knowledge.

CLASSROOM & TIME: SSB 3310/3.201.02 (MIS Database Lab); Mondays, 7:00 PM – 9:50 PM

INSTRUCTOR: Mohammad A. Rob, Ph.D.
Office: SSB Suite 3-202-9
Voice: (281) 283-3191
E-mail: rob@uhcl.edu
Course Website: https://mis.uhcl.edu/rob
Office Hours: Mondays, 5-7 PM & Wednesdays, 2-4 PM; also by walk-in and appointment.
Teaching Assistant: Check course website.

COURSE REQUIREMENTS:
Required Text: Data Warehousing Fundamentals: A Comprehensive Guide for IT Professionals, Author: Paulraj Ponniah, Second Edition, John Wiley & Sons; ISBN: 978-0-470-46207-2.
Required Material: Instructor’s Class Notes, available on the course website.
Required Material: Create a Pivot table in Excel 2013, Online Version from Microsoft. Available through Instructor’s website.
Required Material: Microsoft SQL Server 2012 Analysis Services Multidimensional Modeling Step by Step, Online Version from Microsoft. Available through Instructor’s website.
Recommended Material: Microsoft SQL Server 2012 Analysis Services Data Mining Step by Step, Online Version from Microsoft. Available through Instructor’s website.
Required Tool: Microsoft SQL Server 2012 Analysis Services (available in the database lab).
Recommended Text: Data Mining Concepts and Techniques, Authors: Jiawei Han and Micheline Kamber, Morgan Kaufmann Publishers/Academic Press.

COURSE FORMAT: FACE TO FACE

COURSE DESCRIPTION, GOALS, AND LEARNING OUTCOMES: This course will acquaint students with the fundamentals of data warehouse modeling and design. It will also teach students the tools and techniques of data warehouse development in a corporate business environment. It will further familiarize students with the tools and techniques for interrogating warehouse data using Pivot tables and OLAP (Online Analytical Processing) methodology.
It will prepare students for future careers in data warehouse planning, analysis, design, and implementation, as well as in business forecasting using decision-making tools. It will help students pass a data warehousing certification exam. If time permits, there will be discussions on data mining concepts and techniques.

COURSE ACTIVITIES: The course will contain lectures, homework assignments, a group project, group presentations, a test, a research paper, and two reports. At least 50% of the activities will be hands-on practice by students using Microsoft tools such as Excel, SQL Server Data Warehouse, Analysis Services, and Visual Studio.

RESEARCH PAPER: Students will write a paper examining the current job prospects of a data warehouse analyst. Refer to the course website for the details of the paper requirements.

GROUP PROJECT: Students will work in groups to model, design, and develop a data warehouse. Each group will choose a business industry such as airline, education, retail, financial, insurance, hospitality, investment, or healthcare. They will then develop or collect day-to-day business data in various formats, which can be found in files, databases, spreadsheets, or text documents. Next, they will develop a strategy to transfer these data, in a common format and in summarized form, to a data warehouse developed in Microsoft Access. They will then transfer that data into a SQL Server data warehouse. Finally, students will apply OLAP tools to extract meaningful business intelligence reports on customers, products, purchases, and so on. Refer to the course website for details. Each group needs to submit a final report as outlined on the website.

GROUP PRESENTATION: Each student group will make 2-3 presentations on the project mentioned above.
The presentations will follow a schedule and will cover (i) project definition and planning, (ii) data warehouse modeling and expected business intelligence reports, and finally (iii) a demonstration of the OLAP implementation and reports. Each student must participate in the presentation.

ATTENDANCE: Students are expected to be physically present in the class and to participate in the discussion of presentations by others. The roll may be called at any time.

EVALUATION/GRADING POLICY:
Midterm: 50%
Homework: 10%
Group Project - Presentation: 10%
Group Project - Final Report: 10%
Individual Report – Analysis Services/Excel Tabular Modeling: 5%
Discussion/Attendance: 5%
Research Paper: 10%
Total: 100%

CERTIFICATION: Students who perform poorly on the test may improve their score by passing a professional certification exam such as Microsoft Exam 70-463: Implementing a Data Warehouse with Microsoft SQL Server 2012. At most 10% of the overall course grade can be earned through the certification. Refer to the course website for details on certification.

GRADE DISTRIBUTION:
A = 94 – 100, A– = 90 – 93
B+ = 87 – 89, B = 84 – 86, B– = 80 – 83
C+ = 77 – 79, C = 74 – 76, C– = 70 – 73
D+ = 67 – 69, D = 64 – 66, D– = 60 – 63
F = 59 and below

OTHER INFORMATION:
A. Class attendance: Regular class attendance is required and the roll may be called at any time.
B. UHCL Software: http://software.uhcl.edu
C. Missing Tests and Assignments: Missing tests and laboratories will be counted as zero. Make-up tests and late submissions of assignments will be accepted only in extreme emergencies.
D.
Academic Honesty: The Academic Honesty Policy at UHCL (found on the Dean of Students’ website, the Faculty Handbook, the Student Handbook, the Senior Vice President and Provost’s website, the Graduate Catalog, and the Undergraduate Catalog) states: Academic honesty is the cornerstone of the academic integrity of the university. It is the foundation upon which the student builds personal integrity and establishes a standard of personal behavior. Because honesty and integrity are such important factors in the professional community, you should be aware that failure to perform within the bounds of these ethical standards is sufficient grounds to receive a grade of "F" in this course and be recommended for suspension from UHCL. The Honesty Code of UHCL states "I will be honest in all my academic activities and will not tolerate dishonesty." COPYING FROM ONLINE BOOKS & FROM EACH OTHER WILL NOT BE TOLERATED. ABSOLUTELY NO CELL PHONES WILL BE ALLOWED DURING THE TEST.
E. Special Academic Accommodations: If you believe you have a disability requiring an accommodation, contact Disability Services at 281-283-2648 or disability@uhcl.edu as soon as possible and complete their registration process. The University of Houston System complies with Section 504 of the Rehabilitation Act of 1973 and the Americans with Disabilities Act of 1990, pertaining to the provision of reasonable academic adjustments/auxiliary aids for students with a disability. In accordance with Section 504 and ADA guidelines, each University within the System strives to provide reasonable academic adjustments/auxiliary aids to students who request and require them.
F. Incomplete Grade: A grade of “I” (Incomplete) will be given only in an extreme, verifiable emergency in which the student is unable to complete some minor portion of the course work due to circumstances beyond his/her control, provided the student is passing the course.
G. LAST DAY TO DROP/WITHDRAW FROM THE SEMESTER: APRIL 12, 2016.
COURSE SCHEDULE (SUBJECT TO CHANGE IF DEEMED NECESSARY)

Date-2016: Lecture & Skill/Activity; Due Dates

January 25: Syllabus Review & Introduction
February 1: Background and definitions of data warehouse, data marts, and data mining (Chapter 1 of Ponniah)
February 8: The data warehouse architecture (Chapters 2, 6, and 7 of Ponniah)
February 15: The principles of dimensional modeling (Chapters 5 and 10 of Ponniah)
    Submit Review Questions - Chapters 1 & 2
February 22: Advanced topics in dimensional modeling (Chapter 11 of Ponniah)
    Start of First Group Presentations – Problem Definition and Planning
February 29: Online Analytical Processing (Chapter 15 of Ponniah)
    Submit DW Analyst Paper
March 7: Introduction to SQL Server Database and Create a New Database
    Submit Review Questions - Chapters 5 & 10
March 14: Spring Holiday
March 21: Create a Pivot Table in Excel using Contoso Database
March 28: Test-I: Essay/short answers - Chapters 1, 2, 5, 10, 11 and Class Notes
    Submit Review Questions - Chapters 11 & 15
April 4: SQL Server 2012 Analysis Services: Lesson 1
    Start of Second Group Presentation – Multidimensional Data Modeling
April 11: SQL Server 2012 Analysis Services: Lesson 2
    Submit Completed Contoso Pivot Table in the Web Folder
April 18: SQL Server 2012 Analysis Services: Lesson 3
    Submit Individual Report on Analysis Services (see below)
April 25: Implementing the Group Project
    Start of Third Group Presentation – Demonstration and OLAP Reports
May 2: Continue working on the Group Project
May 9: Finishing the Group Project
    Formation of Groups (3 students)
    Submit Final Project Report (see below)

Individual Report on OLAP Test Project

After learning to develop cubes through SQL Server Analysis Services, each student will be required to document, through text and screenshots, the steps of test cube development and the results of at least three OLAP queries from the Excel Pivot Table.
In this case, students will use Visual Studio/SQL Server 2012 along with a sample SQL Server AdventureWorks data warehouse to create a business intelligence project and go through the steps in the note: SQL Server 2012 Analysis Services Multidimensional Modeling Step by Step. Refer to the course website for a sample report. Your report should be about 10-15 pages long and should contain 3-5 OLAP reports along with your analysis and conclusion.

Group Project Final Report

You are to write a report on the group project you have done during the semester. The paper should clearly describe the concepts and purpose of data warehousing; the business scenario of your project and its proposed outcome; the approach to dimensional modeling; the design of the data warehouse; the research methodology, including data gathering, data cleansing, and data transfer; and the development of the data warehouse and OLAP cube using a specific tool such as SQL Server Analysis Services 2012. The implementation part should describe the dimensions and facts as well as the reasons for choosing them. The results should provide appropriate strategic information for your business. Use appropriate text and graphics to describe and display the necessary activities and data.

The following is a typical outline of the paper; you are free to change the titles of the sections. However, the sections must follow the typical steps of your project development, beginning with the business scenario and ending with the explanation of OLAP results along with figures and graphs:

Title
Name of the Students and affiliation
Abstract
Business Scenario: Problem statement, data origins, and expected value
Why Data Warehousing?
Methodology
Defining raw data format
Data cleansing
Data Transfer (Text file, Excel, Access/SQL Server)
Dimensional Modeling: Defining dimensions, facts, attributes, and hierarchies, database schema
DW implementation (in Access/SQL Server): Screen capture of relationships, dimensions, facts
Cube Implementation and OLAP in SQL Server
Results: Screen capture of drill-down, roll-up, slicing and dicing, graphs, along with explanations
Conclusion and Discussion
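
The OLAP operations named under Results (drill-down, roll-up, slicing, and dicing) can be illustrated outside SQL Server with a small sketch. The Python example below is only an illustration and not part of the course materials: the fact rows, the dimension names, and the helper functions aggregate and dice are all hypothetical. An actual project would run these queries against an Analysis Services cube through an Excel Pivot Table.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, quarter, region, product, sales_amount).
# The data and dimension names are invented for illustration only.
FACTS = [
    (2015, "Q1", "East", "Bikes", 100.0),
    (2015, "Q1", "West", "Bikes", 150.0),
    (2015, "Q2", "East", "Helmets", 40.0),
    (2015, "Q2", "West", "Bikes", 200.0),
    (2016, "Q1", "East", "Bikes", 120.0),
    (2016, "Q1", "West", "Helmets", 60.0),
]
DIMENSIONS = ("year", "quarter", "region", "product")


def aggregate(facts, group_by):
    """Sum the sales measure for each combination of the given dimensions."""
    positions = [DIMENSIONS.index(name) for name in group_by]
    totals = defaultdict(float)
    for row in facts:
        key = tuple(row[i] for i in positions)
        totals[key] += row[-1]  # the last column is the measure
    return dict(totals)


def dice(facts, **allowed):
    """Keep only rows whose dimension values are in the allowed sets."""
    kept = []
    for row in facts:
        if all(row[DIMENSIONS.index(name)] in values
               for name, values in allowed.items()):
            kept.append(row)
    return kept


# Roll-up: total sales per year (the other dimensions are summed away).
by_year = aggregate(FACTS, ["year"])

# Drill-down: move from year to the finer year/quarter level.
by_year_quarter = aggregate(FACTS, ["year", "quarter"])

# Slice: fix a single dimension to one value (region = East).
east_by_product = aggregate(dice(FACTS, region={"East"}), ["product"])

# Dice: restrict two dimensions at once (year = 2015, product = Bikes).
bikes_2015_by_region = aggregate(
    dice(FACTS, year={2015}, product={"Bikes"}), ["region"]
)

for label, result in [
    ("Roll-up by year", by_year),
    ("Drill-down by year and quarter", by_year_quarter),
    ("Slice: East, by product", east_by_product),
    ("Dice: 2015 Bikes, by region", bikes_2015_by_region),
]:
    print(label, result)
```

Each result is a dictionary keyed by a tuple of dimension values, which loosely mirrors how a Pivot Table shows one total per row/column combination. A slice is treated here as a dice on a single dimension with a single allowed value.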