Chapter 3 and Module C
DATABASES AND DATA WAREHOUSES
Supporting the Analytics-Driven Organization

Opening Case: The Digitization of Content
• In 2010, more than half of all music was in digital form; physical music will likely never again be the majority.
• What else can be digitized? Pictures, movies, books.
• What about education? Can a course be digitized and canned?

INTRODUCTION
• Business intelligence (BI)
  ◦ Knowledge about your customers, competitors, business partners, environment, and internal operations to make effective, important, and strategic business decisions
• Analytics
  ◦ Fact-based decision-making
  ◦ Integrated use of IT and statistical techniques to create BI
  ◦ E.g., if I run a coffee shop and most of my customers are male and between 18 and 30, what can I do with this information?

THE RELATIONAL DATABASE MODEL
• There are many types of databases
• The relational database model is the most popular
• Relation = Table
• Relational database

Database Characteristics
1. Collections of information
2. Created with logical structures
3. Include logical ties within the information
4. Include built-in integrity constraints

2.
Database – Logical Structure
Levels of the logical structure, from smallest to largest: Character, Field, Record, File (Table), Database, Data Warehouse. The sample tables below illustrate each level.

Advisor
Advisor ID   ALastName   AFirstName
101          Leonard     Lori
102          Aurigemma   Sal
103          Bajaj       Akhilesh
104          Platner     Steve
105          McCrary     Mike

Class
Class Synonym   Class Prefix   Class No   Class Section
10342           MIS            3003       3
10344           MIS            1123       2
10359           MIS            4133       2
10450           MIS            1123       1
10578           MIS            2013       3
10643           MIS            4053       1

Student
Student ID   SLastName   SFirstName   Advisor ID
1011         Berry       Jeff         101
1012         Smith       Tom          103
1013         Sanders     Tally        101
1014         Anderson    Cindy        103
1015         Whitman     Amy          102
1016         Jones       Kelsi        105
1017         Phillips    Susan        104

Student-Class
Student ID   Class Synonym
1011         10342
1011         10643
1013         10578
1014         10342
1014         10359
1014         10450
1015         10578
1016         10342
1017         10344
1017         10450

Database – Physical Structure
• Database tables are stored in the operating system as files, but we don’t worry about the files: when we open the database, we see the tables.
• Providing this table-centric view is the job of the DBMS (database management system).
• Common examples of DBMSs: MS Access, LibreOffice Base, MySQL, MariaDB, Oracle, MS SQL Server

Databases – Logical Structures
• Databases have many tables
• In databases, the row number is irrelevant; this is not true in spreadsheet software
• In databases, column names are very important
• Column names are created in the data dictionary
• Data dictionary – contains the logical structure for the information in a database
• Before you can enter information into a database, you must define the data dictionary for all the tables and their fields.
• For example, when you create the Truck table, you must specify that it will have three pieces of information and that Date of Purchase is a field in Date format.

3. Databases – Logical Ties Within the Information
• Logical ties must exist between the tables or files in a database
• Logical ties are created with primary and foreign keys
  ◦ Primary key (PK)
  ◦ Foreign key (FK)
• Customer Number is the primary key for Customer, and it also appears in Order as a foreign key.
• A foreign key means that the value MUST exist in the Customer table first, before it can exist in the Order table.
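The Customer/Order tie above can be tried out with a minimal sketch using Python's built-in sqlite3 module as a stand-in DBMS. The table name Orders, the column names, and the sample customer are assumptions made for illustration, not part of the slides. SQLite checks foreign keys only after `PRAGMA foreign_keys = ON`, so the sketch turns that on first.

```python
import sqlite3

def build_demo():
    """Create an in-memory database with a Customer table and an Orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.execute(
        """CREATE TABLE Customer (
               customer_number INTEGER PRIMARY KEY,  -- PK: unique per customer
               name            TEXT NOT NULL
           )"""
    )
    conn.execute(
        """CREATE TABLE Orders (
               order_number    INTEGER PRIMARY KEY,
               customer_number INTEGER NOT NULL
                   REFERENCES Customer(customer_number)  -- FK to Customer
           )"""
    )
    return conn

def try_insert_order(conn, order_number, customer_number):
    """Return True if the order is accepted, False if a constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO Orders VALUES (?, ?)", (order_number, customer_number)
        )
        return True
    except sqlite3.IntegrityError:
        return False

conn = build_demo()
conn.execute("INSERT INTO Customer VALUES (1, 'Hypothetical Customer')")
ok_existing = try_insert_order(conn, 100, 1)   # customer 1 exists
ok_missing = try_insert_order(conn, 101, 99)   # customer 99 does not exist
print(ok_existing, ok_missing)
```

The second insert is rejected because customer 99 is not yet in the Customer table, which is the "must exist first" rule in action.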
Separate Tables that Link Tables
• Example: if an order can have many customers, and a customer can be linked to many orders, how do we capture which customers are on an order?
• Can we add multiple columns to the Orders table, one for each customer ID?
• Or can we add multiple columns to the Customers table, one for each order?
• Or: create a new table with customer ID and order ID? Call it CustomerOrders?
  ◦ What is the primary key for this table?
  ◦ What about order date? Customer feedback for that order?

4. Databases – Built-In Integrity Constraints
• Integrity constraints – rules that help ensure the quality of the information
  ◦ Primary key: value must be unique in the main table
  ◦ Foreign key: value must already exist in the main table
  ◦ Column constraints: sales price cannot be negative, phone number must have an area code
  ◦ PK and FK constraints and many column constraints can be created once, when the tables are built; after that, all data entered is first checked for violations.

Steps in Developing a Database
Step 1: Decide on the tables, columns, column types, primary keys, and foreign keys.
Step 2: Use a data definition language to create your database.
• Prior to developing the database, we can plan out the design using a diagram called an entity-relationship (ER) diagram.
• In it, we look at the business requirements and list the entities (objects or events) and the links between them (relationships).
• Then we create tables, with primary and foreign keys, for each entity and relationship, except that some relationships do not get their own table.
• E.g., if there is only one customer per order, we just add the customer ID to the Orders table.

Fun In-Class Project: University Database
• Objects: courses, course sections, professors, students (graduate and undergraduate), classrooms, buildings
  ◦ Example course numbers: MIS3053, MIS4233, MIS3023
  ◦ Example course section identifiers: MIS3053Fall2008A, MIS4233Fall2008A
  ◦ Example StudentID: 0918512
  ◦ Example FacultyID: 0918452
  ◦ Example BuildingID: HELM, OLIP
  ◦ Example classroom ID: HELM316
• Events: a student takes a course section and gets a grade; a professor teaches a course section and gets a rating for that course section; a graduate student may TA a course section and also get a rating for it.
  ◦ Example grade: 'A'
  ◦ Example professor rating: 'Excellent'
  ◦ Example GA rating: 'Excellent'
• Design tables so that information is not duplicated and is properly linked. Show primary and foreign keys. Do this in 2 stages:
1. Table names, columns, PKs, and links.
2. Table names, columns, PKs, and FKs.

DATABASE MANAGEMENT SYSTEM TOOLS
5 Components of a DBMS
1. DBMS engine: software that talks to the operating system and shows us the table interface
2. Data definition subsystem: lets us create the tables and define PK, FK, and other constraints
3. Data manipulation subsystem: lets us ask questions of existing information
  ◦ Views
  ◦ Report generators
  ◦ QBE tools
  ◦ SQL
4. Application generation subsystem: lets us create screens that link to tables, so users can input and see information
5. Data administration subsystem: lets us create users and grant and revoke privileges
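As a rough illustration of the data definition and data manipulation subsystems listed above, the following sketch again uses Python's built-in sqlite3 module as a stand-in DBMS. The rows are copied from the Advisor and Student sample tables earlier in these slides; the snake_case column names are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

# Data definition: create the tables and declare the PK and FK constraints.
conn.executescript(
    """
    CREATE TABLE Advisor (
        advisor_id   INTEGER PRIMARY KEY,
        a_last_name  TEXT NOT NULL,
        a_first_name TEXT NOT NULL
    );
    CREATE TABLE Student (
        student_id   INTEGER PRIMARY KEY,
        s_last_name  TEXT NOT NULL,
        s_first_name TEXT NOT NULL,
        advisor_id   INTEGER NOT NULL REFERENCES Advisor(advisor_id)
    );
    """
)

# Data manipulation: add information to the tables...
conn.executemany(
    "INSERT INTO Advisor VALUES (?, ?, ?)",
    [(101, "Leonard", "Lori"), (103, "Bajaj", "Akhilesh")],
)
conn.executemany(
    "INSERT INTO Student VALUES (?, ?, ?, ?)",
    [
        (1011, "Berry", "Jeff", 101),
        (1012, "Smith", "Tom", 103),
        (1013, "Sanders", "Tally", 101),
    ],
)

# ...and ask a question of it: which students does advisor 101 advise?
rows = conn.execute(
    """
    SELECT s.s_last_name, a.a_last_name
    FROM Student AS s
    JOIN Advisor AS a ON a.advisor_id = s.advisor_id
    WHERE a.advisor_id = ?
    ORDER BY s.student_id
    """,
    (101,),
).fetchall()
print(rows)  # [('Berry', 'Leonard'), ('Sanders', 'Leonard')]
```

The SELECT statement says which rows are wanted, not how to find them; the DBMS engine decides how to carry out the join.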
View
• View – allows you to see the contents of a database file, make changes, and query it to find information

Report Generator
• Report generator – helps you quickly define formats of reports and what information you want to see in a report

Query-by-Example Tool
• QBE tool – helps you graphically design the answer to a question

Structured Query Language
• SQL – standardized fourth-generation query language found in most DBMSs
• Sentence-structure equivalent to QBE
• Mostly used by IT professionals
• Non-procedural language, which makes it different from other programming languages

OLTP, OLAP, and Business Intelligence
Data Processing
• Online transaction processing (OLTP)
  ◦ The gathering and processing of transaction information, and the updating of existing information to reflect the transaction
  ◦ Databases support OLTP
  ◦ Operational database – a database that supports OLTP and some limited OLAP
  ◦ Day-to-day transactions are recorded, one individual transaction at a time
• Online analytical processing (OLAP)
  ◦ The manipulation of information to support decision making
  ◦ Databases can support some OLAP
  ◦ Data warehouses support only OLAP, not OLTP
  ◦ Data warehouses are special forms of databases that support decision making and help build BI. They hold summarized information, such as all sales for each product line in each store on each day.

DATA WAREHOUSES AND DATA MINING
• Data warehouses support OLAP and decision making
• Data warehouses do not support OLTP
• Data warehouse
• Data mart
• Data mining

Data Warehouse Example
• Among customers who are female and between 30 and 45, what percentage of camera sales occurred after radio advertising in the North Territory?

Data Mart Example

Data-Mining Tools
• Data mining: https://www.youtube.com/watch?v=f2Kji24833Y
• Digital dashboard: https://www.youtube.com/watch?v=h9BUlaTlHCE

Data Warehouse Considerations
• Do you really need one, or does your database environment support all your functions?
• Do all employees need a big data warehouse or a smaller data mart?
• How up-to-date must the information be?
• What data-mining tools do you need?

INFORMATION OWNERSHIP
• Information is a resource you must manage and organize to help the organization meet its goals and objectives
• You need to consider
  ◦ Strategic management support
  ◦ Sharing information with responsibility
  ◦ Information cleanliness

Strategic Management Support
• Data administration – function that plans for, oversees the development of, and monitors the actual data/information. It sets policy.
• Database administration – function responsible for the more technical and operational aspects of managing the DBMS platform and the database applications. It executes policy.

Sharing Information
• Everyone can share information without consuming it
• But someone must “own” it by accepting responsibility for its quality and accuracy

Information Cleanliness
• Related to ownership and responsibility for quality and accuracy
• No duplicate information
• No redundant records with slightly different data, such as different spellings of a customer name
• GIGO (garbage in, garbage out) – if you put garbage information in, you get garbage information out for decision making
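One possible way to hunt for redundant records with slightly different spellings is sketched below with Python's standard difflib module. The customer names and the 0.85 similarity threshold are hypothetical choices for illustration; a real cleanup would need a threshold tuned to the data and a person to review each flagged pair.

```python
from difflib import SequenceMatcher
from itertools import combinations

def near_duplicates(names, threshold=0.85):
    """Return pairs of names whose similarity ratio meets the threshold."""
    pairs = []
    for a, b in combinations(names, 2):
        # ratio() is 1.0 for identical strings and falls toward 0.0 as they differ
        ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        if ratio >= threshold:
            pairs.append((a, b))
    return pairs

# Hypothetical customer list with one likely misspelling
customers = ["Kelsi Jones", "Kelsie Jones", "Tom Smith"]
suspects = near_duplicates(customers)
print(suspects)  # [('Kelsi Jones', 'Kelsie Jones')]
```

A flagged pair is only a candidate: whoever owns the information still has to decide whether the two records describe the same customer.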