INFORMATION AND DATABASES
Part 2

Entities: Identification
• A key is an attribute, or a group of attributes, that takes a unique value for each member of an entity (Student ID, SSN, Driver's License number).
• Why are First Name and Last Name NOT valid keys?
• A group of attributes that together uniquely identifies a member of an entity is called a composite key.
• A secondary key is an attribute whose values divide the members of an entity into useful subgroups (Major, Gender, etc.).

Relationships: Degree
• The degree of a relationship is the number of entities involved in it (according to a business rule):
  – Recursive (unary), binary, ternary
• A relationship may carry its own data (e.g., a date).
• Recursive relationship: members of the same entity are related to one another.
  [Diagram: INDIVIDUAL (ID, Name) related to itself through Marry (Date), 1:1 with (0,1) on each side; STUDENT (StudentID, StudentName) related to itself through Be Friend, M:N with (0,M) and (0,N)]
• Binary relationship:
  [Diagram: EMPLOYEE (Emp_ID, Emp_Name, Emp_Title) and PROJECT (Project_ID, Proj_Name, Proj_Due) linked by Lead (Date); 1 (1,1) on the EMPLOYEE side, M (0,M) on the PROJECT side]
• Ternary relationship:
  [Diagram: EMPLOYEE (EmpID, Emp_Name, Emp_Title) M (1,M), PROJECT (ProjectID, Proj_Name, Proj_Due) N (1,N), and TASK (TaskID, TaskName) P (1,P), all linked by Assign (Date)]

Relationships: Cardinalities
• Cardinalities document how many members of one entity can relate to a single member of another entity in a relationship.
• They are stated as the minimum and maximum number of members.
• They reflect business policies or general business practices (e.g., how many classes a student can take, how many students a class can hold).
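The cardinality rules above can be checked mechanically against actual data. The following is a minimal Python sketch, assuming the Enroll example's (25, 40) bound means students per class and (1, 5) means classes per student (the reading that matches the business examples); the student and class IDs are hypothetical. Every student and class is checked, even one with no enrollments, so a minimum of 1 also catches members that are related to nothing.

```python
from collections import Counter

# Assumed reading of the Enroll cardinalities (min, max):
CLASSES_PER_STUDENT = (1, 5)    # a student takes 1 to 5 classes
STUDENTS_PER_CLASS = (25, 40)   # a class holds 25 to 40 students


def cardinality_violations(enrollments, students, classes):
    """Return (kind, id, count) for every member whose count is outside (min, max)."""
    per_student = Counter(s for s, _ in enrollments)
    per_class = Counter(c for _, c in enrollments)
    violations = []
    lo, hi = CLASSES_PER_STUDENT
    for s in students:                      # includes students with no enrollments
        if not lo <= per_student[s] <= hi:
            violations.append(("student", s, per_student[s]))
    lo, hi = STUDENTS_PER_CLASS
    for c in classes:
        if not lo <= per_class[c] <= hi:
            violations.append(("class", c, per_class[c]))
    return violations


# Hypothetical data: 30 students take ACC101; the first 10 also take MIS201.
students = [f"S{i:02d}" for i in range(30)]
classes = ["ACC101", "MIS201"]
enrollments = [(s, "ACC101") for s in students] + [(s, "MIS201") for s in students[:10]]

violations = cardinality_violations(enrollments, students, classes)
print(violations)  # [('class', 'MIS201', 10)]
```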
[Diagram: Student and Class linked by Enroll, M:N; (25, 40) on the Student side and (1, 5) on the Class side, i.e., a class holds 25 to 40 students and a student takes 1 to 5 classes]

One-to-One
• One-to-one (1:1): a relationship between two entities in which an instance of entity A can be related to only one instance of entity B, and an instance of entity B can be related to only one instance of entity A.
  [Diagram: Sales and Cash Collections linked by Pay; 1 (1,1) on each side]
  Ex: Cash sales

One-to-Many
• One-to-many (1:M): a relationship between two entities in which an instance of entity A can be related to zero, one, or more instances of entity B, while an instance of entity B can be related to only one instance of entity A.
  [Diagram: Sales 1 (1,1), Pay, Cash Collections M (1,M)]
  Ex 1: Installment payments (one sale is paid off by several collections)
  [Diagram: Sales M (1,M), Pay, Cash Collections 1 (1,1)]
  Ex 2: Paying several credit purchases in full with one payment

Many-to-Many
• Many-to-many (M:N): a relationship between two entities in which an instance of entity A can be related to zero, one, or more instances of entity B, and an instance of entity B can be related to zero, one, or more instances of entity A.
  [Diagram: Sales M (1,M), Pay, Cash Collections N (1,N)]
  Ex: Paying credit purchases with partial payments over several months

Data Modeling & DB Design
• Database design
  – Must be organized
  – Few or no redundancies
• Data model: what information we need to keep and how the pieces relate to one another
• Keys
  – Primary key: for identification (Student ID, SSN)
  – Combination primary key (composite key)
  – Secondary key: for grouping (Major, Gender)
  – Foreign key: to link one table to another

Dealing with Many-to-Many Relationships
• The relational data model cannot handle many-to-many relationships directly.
  – It is limited to one-to-one and one-to-many relationships.
• Many-to-many relationships must be replaced with a collection of one-to-many relationships (cf. #63).

Composite Entities
• Composite entities: entities that exist to represent the M:N relationship between two other entities
• Example:
  – There is a many-to-many relationship between ITEM and ORDER.
  – An ORDER can contain many ITEMs, and over time the same ITEM can appear on many ORDERs.
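A minimal sketch of the ITEM/ORDER composite entity, using Python's built-in `sqlite3` module with an in-memory database. The table and column names `ORDERS`, `ORDER_ITEM`, `Quantity` and the sample rows are illustrative assumptions; `ORDERS` is used instead of `ORDER` because ORDER is a reserved word in SQL. The composite key (OrderID, ItemID) turns the M:N relationship into two 1:M relationships, and each part of that key is also a foreign key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when this is on
conn.executescript("""
CREATE TABLE ITEM   (ItemID  INTEGER PRIMARY KEY, ItemName  TEXT);
CREATE TABLE ORDERS (OrderID INTEGER PRIMARY KEY, OrderDate TEXT);

-- Composite entity: one row per (order, item) pair, so the M:N relationship
-- becomes ORDERS 1:M ORDER_ITEM and ITEM 1:M ORDER_ITEM.
CREATE TABLE ORDER_ITEM (
    OrderID  INTEGER NOT NULL REFERENCES ORDERS(OrderID),  -- FK to the ORDERS side
    ItemID   INTEGER NOT NULL REFERENCES ITEM(ItemID),     -- FK to the ITEM side
    Quantity INTEGER,
    PRIMARY KEY (OrderID, ItemID)                          -- composite key
);
""")
conn.executemany("INSERT INTO ITEM VALUES (?, ?)", [(1, "Pen"), (2, "Notebook")])
conn.executemany("INSERT INTO ORDERS VALUES (?, ?)", [(10, "2024-01-05"), (11, "2024-01-06")])
conn.executemany("INSERT INTO ORDER_ITEM VALUES (?, ?, ?)",
                 [(10, 1, 3), (10, 2, 1), (11, 1, 5)])

# One order contains many items ...
items_on_order_10 = [name for (name,) in conn.execute(
    "SELECT i.ItemName FROM ORDER_ITEM oi JOIN ITEM i ON i.ItemID = oi.ItemID "
    "WHERE oi.OrderID = 10 ORDER BY i.ItemName")]

# ... and one item appears on many orders.
orders_with_pen = [oid for (oid,) in conn.execute(
    "SELECT OrderID FROM ORDER_ITEM WHERE ItemID = 1 ORDER BY OrderID")]

# Each part of the composite key acts as a foreign key:
# a line for a nonexistent order (99) is rejected.
try:
    conn.execute("INSERT INTO ORDER_ITEM VALUES (99, 1, 2)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(items_on_order_10, orders_with_pen, orphan_rejected)
```

The rejected insert is the referential-integrity rule at work: a line item can only point to an order and an item that already exist.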
[Figure: Entity-Relationship Diagram Model]

Database Design
• Relational data model
  – Primary key (PK): identifies each record (CUSTOMER, ORDER)
  – Foreign key (FK): implements a 1:M relationship; it sits on the M side (ORDER) and links to the 1 side (the CUSTOMER who places the orders)
  – Associative table (junction table) with a composite key (CK): implements an M:N relationship

Foreign Keys in Relational Database
• A foreign key (FK) in entity E1 (CustID in ORDER) is the primary key of another entity E2 (CustID in CUSTOMER); it identifies (links) the 1:M relationship between E2 and E1 (CUSTOMER and ORDER).
• The foreign key is placed on the many side: a CUSTOMER has many ORDERs, so ORDER carries CustID as an FK to show which customer placed each order.
  [Diagram: CUSTOMER (CustomerID: primary key) 1:M ORDER (OrderID: primary key; CustomerID: foreign key)]
• In an M:N relationship, an associative (junction) table with a composite key captures the relationship.
• An ORDER involves many PRODUCTs, and a PRODUCT is involved in many ORDERs. The composite key ProductID-OrderID of LINE_ITEM indicates which product is involved in which sale; each part of the composite key also serves as a foreign key.
• Sometimes a “surrogate” key (RecordNo) is used as the primary key to simplify record identification.
  [Diagram: ORDER (OrderID) M:N PRODUCT (ProductID), resolved by the junction table LINE_ITEM (RecordNo, OrderID, ProductID); OrderID and ProductID together form the composite key]

Database Integrity
• Entity integrity: an identifier (primary key) must be unique so that it identifies a specific member of the entity.
• Referential integrity: a foreign key value in a many-side table must match a primary key value in the one-side table (an ORDER can be created only for an existing CUSTOMER; a new customer must be added before we do business with him or her).
• Domain integrity: every field value must fall within its allowed range and data type; a value outside them is an error.

Database Design (continued)
• Data dictionary
  – Provides information about each attribute in the database, including:
    - Name (what is the data about?)
    - Key (is it a key or part of a key?)
    - Data type (date, alphanumeric, numeric, etc.)
    - Valid values (the format or numbers allowed)
  – Can be used to enforce business rules that prevent illegal or illogical values from entering the database (e.g., who has authority to enter certain kinds of data; characters cannot be entered in a numeric field)
• Data dictionary (continued)
  – Data type (especially the data types of keys)
  – Data size (especially the data sizes of keys)
  – Description (what the attribute is for)
  – Authorization (who can create or update it)

Data Dictionary

EMPLOYEE
Attribute | Type | Size/Format | Description | Authorization
EmpID | Numeric | 6 | Identifier | HR Manager
EmpFirstName | Text | 10 | Employee first name | HR Manager
EmpLastName | Text | 10 | Employee last name | HR Manager
Address | Text | 50 | Employee address | HR Manager
City | Text | 10 | Employee city | HR Manager
State | Text | 2 | Employee state | HR Manager
Zip | Text | XXXXX | Employee ZIP code | HR Manager
Phone | Text | XXX-XXX-XXXX | Employee phone number | HR Manager
Date Hired | Date | MM/DD/YY | Date the employee was hired | HR Manager
Position | Text | 15 | Position of employee | HR Manager

EXPENSE
Attribute | Type | Size/Format | Description | Authorization
EntryNumber | Numeric | 6 | Identifier | Project Manager
EntryDate | Date | MM/DD/YY | Date of entry | Project Manager
HoursWorked | Numeric | 2 | Hours per task | Project Manager
CostOfHotel | Currency | 3 | Funds spent on hotel | Project Manager
CostOfTravel | Currency | 3 | Funds spent on travel | Project Manager
CostOfMeals | Currency | 3 | Funds spent on food | Project Manager
Approved | Y/N | 1 | Approved / not yet | Project Manager

From Logical Data Model
Entity-Relationship Diagram:
[Diagram: Customer (Cust No) 1:M Order (Order No) through “place”; Order M:N Product (Product No) through “contain”]
Relational Data Model:
CUSTOMER (Cust No, …)
ORDER (Order No, Cust No, …)
PRODUCT (Product No, …)
ORDER-PRODUCT (Order No, Product No, …)
…to “MS Access” Implementation.

Another Example: Enrollment
Entity-Relationship Diagram:
[Diagram: Student M:N Class through Enroll; Class N:M Instructor through Assign]
Relational Data Model:
STUDENT (Student ID, …)
CLASS (Course ID, …)
INSTRUCTOR (Instructor ID, …)
ENROLLMENT (Student ID, Course ID, …)
ASSIGNMENT (Course ID, Instructor ID, …)

Data Analysis with DBMS: Queries
• Structured Query Language (SQL)
• Query by Example (QBE)

How Organizations Get the Most from Their Data
• Data warehousing
  – A logical collection of information, gathered from many different operational databases, that supports business analysis activities and decision-making tasks
  – Integrates multiple large databases into a single repository
  – Supports queries, analysis, and processing
  – Purpose: put key business information into the hands of decision makers
• Data marts
  – Instead of one large data warehouse, many organizations create multiple data marts.
  – A data mart is a small data warehouse designed for the end-user needs of a strategic business unit (SBU) or a department.
  – Each contains a subset of the data: finance, inventory, personnel.
  – Each data mart is customized for particular decision support system (DSS) applications.
  [Figure: Data marts]

Performing Business Analysis with Data Marts
• Extraction, transformation, and loading (ETL): a process that extracts information from internal and external databases, transforms it using a common set of enterprise definitions, and loads it into a data warehouse

How Organizations Get the Most from Their Data (continued)
• Data mining
  – Information on customers, products, markets, etc.
    from historical data
  – Drill down: go from summary data to more detailed data
  – Sort and extract information
  – Trends, correlations, forecasting, statistics

Data Warehousing
• Data warehouses are organized by business dimension or subject.
• Data warehouses are multidimensional.
• Data warehouses are historical.
• Data warehouses use online analytical processing (OLAP).
[Figure: Data warehouse framework & views]

Benefits of Data Warehousing
• End users can access data quickly and easily via Web browsers because the data are located in one place.
• End users can conduct extensive analysis with data in ways that may not have been possible before.
• End users have a consolidated view of organizational data.

Business Intelligence
• Improving the quality of business decisions has a direct impact on costs and revenue.
• BI enables business users to receive data for analysis that is:
  – Reliable
  – Consistent
  – Understandable
  – Easily manipulated
[Figure: BI can answer tough questions]

Multidimensional Analysis
• Databases contain information in a series of two-dimensional tables.
• In a data warehouse or data mart, information is multidimensional: it contains layers of columns and rows.
• Dimension: a particular attribute of information
• Cube: common term for the representation of multidimensional information
[Figure: Cubes of information]

Information Cleansing
• Information cleansing (scrubbing): a process that weeds out and fixes or discards inconsistent, incorrect, or incomplete information

Uncovering Trends and Patterns with Data Mining
• Data mining: the process of analyzing data to extract information not offered by the raw data alone
• Data-mining tools use a variety of techniques to find patterns and relationships in large volumes of information:
  – Classification
  – Estimation
  – Affinity grouping
  – Clustering
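Affinity grouping looks for items that tend to occur together, the idea behind the association detection listed among the analysis capabilities. A minimal Python sketch with hypothetical shopping baskets: it counts co-occurring item pairs by brute force and computes support and confidence, the usual measures for such rules. The 40% support threshold is an arbitrary assumption, and real data-mining tools use more scalable algorithms (e.g., Apriori); this is only an illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market baskets (one set of items per sales transaction).
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "chips"},
    {"bread", "butter"},
    {"beer", "chips", "milk"},
]

item_counts = Counter(item for basket in baskets for item in basket)
pair_counts = Counter(pair for basket in baskets
                      for pair in combinations(sorted(basket), 2))


def rule(antecedent, consequent):
    """Support and confidence of the rule 'antecedent -> consequent'."""
    together = pair_counts[tuple(sorted((antecedent, consequent)))]
    support = together / len(baskets)                # share of all baskets with both items
    confidence = together / item_counts[antecedent]  # share of antecedent baskets with both
    return support, confidence


# Pairs bought together in at least 40% of baskets (assumed threshold).
MIN_SUPPORT = 0.4
frequent_pairs = {pair for pair, n in pair_counts.items() if n / len(baskets) >= MIN_SUPPORT}

print(sorted(frequent_pairs))
print(rule("beer", "chips"))  # (0.4, 1.0)
```

Here every basket with beer also contains chips (confidence 1.0), while only two of the three bread baskets contain milk.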
• Structured data: data already in a database or a spreadsheet
• Unstructured data: data that does not exist in a fixed location; it can include text documents, PDFs, voice messages, and emails
• Text mining: analyzes unstructured data to find trends and patterns in words and sentences
• Web mining: analyzes unstructured data associated with websites to identify consumer behavior and website navigation
• Common forms of data-mining analysis capabilities include:
  – Cluster analysis
  – Association detection
  – Statistical analysis
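Cluster analysis groups records so that members of a cluster are more similar to one another than to members of other clusters. One common technique is k-means; below is a minimal pure-Python sketch, assuming two clusters, hand-picked starting centroids, and hypothetical customer data (visits per month, average spend in dollars).

```python
import math


def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then move
    each centroid to the mean of its assigned points; repeat until stable."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda k: math.dist(p, centroids[k]))
            clusters[nearest].append(p)
        new_centroids = []
        for k, members in enumerate(clusters):
            if members:
                # mean of each coordinate over the cluster's members
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[k])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break                                   # assignments no longer change
        centroids = new_centroids
    return centroids, clusters


# Hypothetical customers: (visits per month, average spend in $)
customers = [(1, 20), (2, 25), (1, 22), (8, 90), (9, 100), (10, 95)]
centroids, clusters = kmeans(customers, centroids=[(1, 20), (10, 95)])
print(centroids)
print(clusters)
```

The sketch separates occasional low spenders from frequent high spenders. In practice the attributes would usually be standardized first, since the spend column dominates the distance here, and library implementations (e.g., scikit-learn's `KMeans`) choose starting centroids automatically.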