IS 630: Accounting Information Systems
http://www.csun.edu/~dn58412/IS530/IS530_F15.htm

Relational Databases & Data Modeling with ERD
Lecture 4

Learning Objectives
• Limitations of traditional application approaches to managing data.
• Advantages of the centralized database approach.
• The REAL framework to capture relevant business data.
• Data modeling with Entity-Relationship Diagrams (ERD).
• Advanced database applications in decision support and knowledge management.

IS 530 : Lecture 4

Why Databases?
Business information systems are built on databases of business event data. Accounting information is one of many outputs of business event data. Larger organizations store information in data warehouses in ways that let managers analyze it to gain important insights. Sophisticated reporting systems, based on data warehouses and business event databases, help managers make better decisions.

Application Approach to Business Event Processing

Database Approach to Business Event Processing

Difficulties of Non-Relational Data Files
• Update anomaly: failing to change every occurrence of a data item that is stored in many places.
• Insert anomaly: having to add an invalid (null) record to the database.
• Delete anomaly: failing to remove all information (stored in many places) about a deleted record.

Difficulties with the Applications Approach
• Each application collects and manages its own data in dedicated, separate, physically distinguishable files.
• Data redundancy leads to inconsistencies (loss of integrity) among copies of the same data in different files.
• Storing multiple versions of the same data in different files increases costs.
• Data residing in separate files are not shareable, because each data file has a fixed record layout created for a particular application.

Centralized Database Approach
Facts about events are stored in relational database tables instead of separate files.
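A minimal sketch of the update anomaly described above, in Python: two hypothetical application files (billing and shipping) each keep their own copy of a customer's address, so updating one copy leaves the other stale. A single shared table, standing in for the centralized database, stores the fact once. The file names, customer ID, and addresses are made up for illustration.

```python
# Hypothetical data: two applications keep separate copies of the same customer record.
billing_file = {"C001": {"name": "Acme Mfg.", "address": "12 Main St"}}
shipping_file = {"C001": {"name": "Acme Mfg.", "address": "12 Main St"}}


def billing_change_address(cust_id: str, new_address: str) -> None:
    """Applications approach: the billing program updates only its own file."""
    billing_file[cust_id]["address"] = new_address


billing_change_address("C001", "99 Oak Ave")
# The shipping file still holds the old address: an update anomaly.
files_disagree = billing_file["C001"]["address"] != shipping_file["C001"]["address"]
print("Files disagree after update:", files_disagree)  # True

# Centralized approach: one shared table that every application reads.
customer_table = {"C001": {"name": "Acme Mfg.", "address": "12 Main St"}}


def read_address(cust_id: str) -> str:
    """Used by both billing and shipping in the centralized approach."""
    return customer_table[cust_id]["address"]


customer_table["C001"]["address"] = "99 Oak Ave"  # one update, in one place
print("Billing sees:", read_address("C001"))   # 99 Oak Ave
print("Shipping sees:", read_address("C001"))  # 99 Oak Ave
```

Because the centralized version stores the address exactly once, there is no second copy that can fall out of step.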
The centralized approach improves efficiency, eliminates data redundancies, and improves data integrity. It enables integrated business information systems that include data about all of a company’s operations. Multiple users from throughout the organization can view and aggregate event data in the manner most conducive to their needs.

Database Management Systems
A database management system (DBMS) is a set of integrated programs designed to simplify the tasks of creating, accessing, and managing a centralized database. It integrates a collection of files that are independent of application programs and are available to satisfy a number of different processing needs. It supports normal data processing needs and provides data useful to managers.

Key DBMS Concepts
• Data independence: data are decoupled from the system applications, making them independent of any application or other users.
• Three-tier architecture: presentation (user interface), logic (applications), and data (database).
• Query language: a programming language used to create and access a database and to produce inquiry reports.
• SQL (Structured Query Language): the standard query language for DBMSs.

Advantages of DBMS
• Eliminating data redundancy
• Ease of maintenance
• Reduced labor and storage costs
• Data integrity
• Data independence
• Privacy

Disadvantages of DBMS
• Expensive to implement.
• Expertise is needed.
• If the DBMS fails, all of the organization’s information processing halts.
• Increased potential for damage from unauthorized access to a central location.
• Database recovery and contingency planning are more important than in the applications approach.

Disadvantages of DBMS (continued)
• “Contention” or “concurrency” problems when more than one user attempts to access data at the same time.
• Territorial disputes over “data ownership,” i.e., who is responsible for data maintenance.
• A CIO and/or database administrator function is needed to deal with these and other problems.
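To make the query-language idea concrete, here is a small sketch that uses Python's built-in `sqlite3` module as a stand-in DBMS (an illustrative choice, not a product the lecture prescribes). SQL statements create a table, record business events, and produce an inquiry report; the `sale` table, its columns, and the amounts are hypothetical.

```python
import sqlite3

# An in-memory SQLite database stands in for a centralized DBMS.
conn = sqlite3.connect(":memory:")

# Create: define a table for sales events (hypothetical schema).
conn.execute(
    """
    CREATE TABLE sale (
        sale_id  INTEGER PRIMARY KEY,
        customer TEXT NOT NULL,
        amount   REAL NOT NULL
    )
    """
)

# Access: record business events.
conn.executemany(
    "INSERT INTO sale (sale_id, customer, amount) VALUES (?, ?, ?)",
    [(1, "Acme Mfg.", 250.0), (2, "First Corp.", 100.0), (3, "Acme Mfg.", 50.0)],
)

# Inquiry report: total sales per customer, largest first.
report = conn.execute(
    "SELECT customer, SUM(amount) AS total "
    "FROM sale GROUP BY customer ORDER BY total DESC"
).fetchall()

for customer, total in report:
    print(f"{customer:<12}{total:>10.2f}")
```

Because the table definition lives in the database rather than inside one program, another application could issue a different `SELECT` against the same `sale` table without changing the data.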
Evolution of Database Systems
• File management (flat file) systems
• Hierarchical databases
• Network databases
• Relational databases
• Object-oriented databases
• Data warehouses

File Management Systems
[Figure: Employee Update, Employee Report, and Check-Writing programs, each with its own file description (FD), working against an Employee Master File and a Timecard File]

Hierarchical Databases
[Figure: a car broken down hierarchically into parts such as Engine, Body, Chassis, Left Door, Right Door, Hood, Roof, Handle, Window, and Lock]

Hierarchical Database Model
Hierarchical database model: records are organized in a pyramid structure.
• Child records: records that are included in a record one level above them (a parent record). A child may have only one parent record. Records are linked through “pointers.”
• Parent records: include the lower-level child records.
• Cannot sustain complex data structures.

Network Databases
[Figure: CUSTOMERS (Acme Mfg., First Corp.) and PRODUCTS (Size 4 Widget, 4D Bolt) linked through ORDERS #11231 to #11235]

Network Database Model
Network database model: child records can have more than one parent record.
• Overcomes problems of the hierarchical model.
• Eclipsed by relational databases.

Relational Databases
[Figure: CUSTOMERS (CUST ID) and PRODUCTS (PRODUCT ID), each in a 1:M relationship with ORDERS (ORDER #, CUST ID, PRODUCT ID, QUANTITY)]

Relational Database Model
Relational database model: data are logically organized into two-dimensional tables (i.e., “relations”).
• Improvement over the hierarchical and network database models.
• Able to handle complex queries (information from many tables/files).
• Allows only text and numerical data to be stored.
• Does not allow the inclusion of complex object types such as graphics, audio, video, or geographic information.
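The relational model's ability to answer a query that draws on several tables can be sketched with `sqlite3` and the CUSTOMERS, PRODUCTS, and ORDERS tables of the relational database figure. The customer names, product names, and order numbers are borrowed from the network database figure; which customer ordered which product, and the quantities, are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (cust_id INTEGER PRIMARY KEY, cust_name TEXT);
    CREATE TABLE products  (product_id INTEGER PRIMARY KEY, description TEXT);
    CREATE TABLE orders (
        order_no   INTEGER PRIMARY KEY,
        cust_id    INTEGER REFERENCES customers (cust_id),
        product_id INTEGER REFERENCES products (product_id),
        quantity   INTEGER
    );
    INSERT INTO customers VALUES (1, 'Acme Mfg.'), (2, 'First Corp.');
    INSERT INTO products  VALUES (10, 'Size 4 Widget'), (20, '4D Bolt');
    INSERT INTO orders VALUES (11231, 1, 10, 5), (11232, 1, 20, 100), (11233, 2, 10, 3);
    """
)

# One query combines three tables by matching key values; no pointers are needed.
rows = conn.execute(
    """
    SELECT o.order_no, c.cust_name, p.description, o.quantity
    FROM orders AS o
    JOIN customers AS c ON c.cust_id = o.cust_id
    JOIN products  AS p ON p.product_id = o.product_id
    ORDER BY o.order_no
    """
).fetchall()

for row in rows:
    print(row)
```

Unlike the hierarchical and network models, the links here are just shared column values (CUST ID, PRODUCT ID), so new query paths can be added without restructuring stored records.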
Object-Oriented Databases
[Figure: CUSTOMERS (CUST ID, CUST NAME, ADDRESS; Add Customer, Drop Customer, Change Customer), PRODUCTS (PRODUCT ID, PRICE, QTY-ON-HAND; New Product, Buy Product, Sell Product), and ORDERS (ORDER #, CUST ID, PRODUCT ID, QUANTITY; Take Order, Update Order)]

Object-Oriented Database Model
Object-oriented database model: allows the storage of both simple and complex objects.
• An object can store attributes and instructions for actions (methods) that can be performed on the object or its attributes. It is a complete “application with its own data.”
• Objects are reusable.
Object-relational databases: include a relational DBMS framework with the capability to store complex data types.

Data Warehouse

What Info to Keep? REAL Framework
• Resources
• Events
• Agents
• Locations

A Model of a Business Event
[Figure: a business event linked to its resources, internal agents, external agents, and location]
• What happened?
• When did it happen?
• Who was involved?
• What resources were involved?
• Where did it occur?

REAL Framework
[Figure: Event 1 and Event 2, each linked to a resource, an internal agent, an external agent, and a location]

REAL Model for a Retailing Business
[Figure: the events Sell Merchandise and Receive Customer Payment, linked to Merchandise, Cash, Salesperson, Customer, and Counter]

Entities
• An entity is a group of attributes corresponding to the same conceptual thing about which we need to capture and store data (in a file/table).
• An entity is a set of instances/members (records) of the object that it represents.
• An entity must have a unique name, a unique identifier, and at least one attribute (the identifier itself is sufficient).

Entities: Attributes
• An attribute is a descriptive property or characteristic of interest of an entity; it is also called a field.
• The data type for an attribute defines what type of data can be stored in that attribute.
• The domain of an attribute defines what values the attribute can legitimately take on.
• The default value for an attribute is the value that will be recorded if the user does not specify one.

Entities: Identification
• A key is an attribute, or a group of attributes, that assumes a unique value for each entity member (Student ID, SSN, driver’s license number).
• Why are First Name and Last Name NOT valid keys?
• A group of attributes that uniquely identifies a member of an entity is called a composite key.

Alternative ERD Notation
[Figure: Entity 1 and Entity 2 drawn with their attributes (Attribute 1 to Attribute 5) attached, connected by a 1:N relationship]

Entities (continued)
[Figure: entity notation, a box with ENTITY NAME, the entity id, and attribute 1 through attribute n; example: CUSTOMER with Customer_ID, Cust_Name, Cust_Address, Cust_Phone]

Relationships: Degree
The degree of a relationship defines how many entities are involved in the relationship (according to a business rule):
• Recursive (unary), binary, or ternary.
• A relationship may carry its own specific data.

Relationships: Degree (continued)
Recursive relationship: members of the same entity have a relationship with one another.
[Figure: INDIVIDUAL (ID, Name) related to itself by Marry, with attribute Date; STUDENT (StudentID, StudentName) related to itself by Be Friend]

Relationships: Degree (continued)
Binary relationship
[Figure: EMPLOYEE (Emp_ID, Emp_Name, Emp_Title) related to PROJECT (Project_ID, Proj_Name, Proj_Due) by Lead, with attribute Date]

Relationships: Degree (continued)
Ternary relationship
[Figure: EMPLOYEE (EmpID, Emp_Name, Emp_Title), PROJECT (ProjectID, Proj_Name, Proj_Due), and TASK (TaskID, TaskName) joined by Assign, with attribute Date]

Relationships: Cardinalities
Cardinalities document how many members of one entity can relate to a single member of another entity in a relationship.
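Minimum and maximum cardinalities are rules that can be checked against actual relationship data. A minimal Python sketch, assuming a hypothetical Enroll relationship stored as (student, class) pairs and made-up limits: for each member of one entity, it counts the related members of the other entity and reports any count outside the allowed range. The function name `cardinality_violations` and all data are hypothetical.

```python
from collections import Counter


def cardinality_violations(pairs, members, side, min_count, max_count):
    """Return {member: count} for every member whose number of related
    partners lies outside [min_count, max_count].

    pairs   -- (student, class) tuples, one per Enroll instance
    members -- all members of the entity being checked
    side    -- 0 counts classes per student, 1 counts students per class
    """
    counts = Counter(pair[side] for pair in pairs)
    # Counter returns 0 for members with no pairs, so a member with no
    # relationship at all is caught by a minimum cardinality of 1 or more.
    return {m: counts[m] for m in members if not min_count <= counts[m] <= max_count}


# Hypothetical data and limits, kept small so the result is easy to follow.
students = ["S1", "S2", "S3"]
classes = ["ClassA", "ClassB"]
enroll = [("S1", "ClassA"), ("S1", "ClassB"), ("S2", "ClassA")]

# Rule: a student takes 1 to 2 classes.
print(cardinality_violations(enroll, students, side=0, min_count=1, max_count=2))
# Rule: a class holds 2 to 3 students.
print(cardinality_violations(enroll, classes, side=1, min_count=2, max_count=3))
```

Here S3 violates the minimum (enrolled in no class) and ClassB violates the minimum (only one student); a maximum violation would show up the same way with a count above `max_count`.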
Cardinalities specify the maximum and minimum number of members and reflect business policies or general business practices (e.g., how many classes a student can take; how many students a class can hold).
[Figure: Student and Class linked by Enroll, with cardinalities (16, 37) and (1, 5)]

Max Cardinalities
• One-to-one (1:1) binary relationship, Sales pay Cash Collections. Example: cash sales.
• One-to-many (1:M) binary relationship, Sales pay Cash Collections. Example: installment payments.

Max Cardinalities (continued)
• Many-to-one (M:1) binary relationship, Sales pay Cash Collections. Example: paying for many credit purchases in full with a single payment.
• Many-to-many (M:N) binary relationship, Sales pay Cash Collections. Example: paying for credit purchases with partial payments over several months.

Data Modeling & DB Design
• Data modeling: what information we need to keep and how the pieces relate to one another.
• Database design: tables must be organized with few or no redundancies (normalization).
Keys in a relational DB:
• Primary key: for identification (Student ID).
• Combination primary key (composite key).
• Foreign key: to link one table to another.
• Surrogate key: a single-value key used as an alternative to a composite key.
• [Secondary key: for grouping (major, gender).]
• [Candidate key: an alternative attribute that could be used as an identifier (SSN, driver’s license number).]

Database Design
Relational data model (data schema):
• Primary key (PK): for record identification, e.g., (Customer ID), (Order ID).
• Foreign key (FK): for a 1:M relationship, placed on the M side (Orders) to link to the 1 side (the Customer who places Orders).
• Associative table (junction table) with a composite key (CK) for M:N relationships.

Foreign Keys in Relational Database
A foreign key (FK) in entity E1 (CustID in ORDER) is the primary key of another entity E2 (CustID in CUSTOMER) and is used to identify (link) a 1:M relationship between E2 and E1 (CUSTOMER and ORDER).
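A minimal sketch of this 1:M link using Python's `sqlite3` module: `orders.cust_id` is a foreign key referencing `customer.cust_id`. SQLite enforces foreign keys only when `PRAGMA foreign_keys = ON` is set on the connection; with it on, an order for a nonexistent customer is rejected. Table names, column names, and data are assumptions for illustration, and the order table is called `orders` because `ORDER` is an SQL keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite checks foreign keys only when this pragma is on for the connection.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript(
    """
    CREATE TABLE customer (
        cust_id   INTEGER PRIMARY KEY,          -- primary key on the 1 side
        cust_name TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        cust_id  INTEGER NOT NULL
                 REFERENCES customer (cust_id)  -- foreign key on the M side
    );
    INSERT INTO customer VALUES (1, 'Acme Mfg.');
    INSERT INTO orders VALUES (100, 1), (101, 1);  -- one customer, many orders
    """
)


def try_insert_order(order_id: int, cust_id: int) -> str:
    """Insert an order and report whether the DBMS accepted it."""
    try:
        conn.execute("INSERT INTO orders VALUES (?, ?)", (order_id, cust_id))
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected ({exc})"


print(try_insert_order(102, 1))   # customer 1 exists: accepted
print(try_insert_order(103, 99))  # no customer 99: rejected
```

Placing `cust_id` in `orders`, rather than a list of order IDs in `customer`, keeps every column single-valued: each order row names exactly one customer, while a customer can appear in any number of order rows.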
The foreign key is placed on the many side (a CUSTOMER has many ORDERS, so ORDER carries CustID as an FK to show which customer placed that order).

Foreign Key
[Figure: CUSTOMER (CustomerID, primary key) in a 1:M relationship with ORDER (OrderID, primary key; CustomerID, foreign key)]

Foreign Keys in Relational Database (continued)
In an M:N relationship, an associative (junction) table with a composite key is used to capture the relationship.
• An ORDER involves many PRODUCTS, and a PRODUCT is involved in many ORDERS. The composite key ProductID-OrderID in LINE ITEM indicates which product is involved in which sale.
• Each part of the composite key serves as a foreign key.
• Sometimes a “surrogate” key (RecordNo) is used as the primary key to simplify the identification of records.

Composite Key
[Figure: ORDER (OrderID, primary key) and PRODUCT (ProductID, primary key) in an M:N relationship resolved by the junction table LINE_ITEM (RecordNo, OrderID, ProductID; composite key)]

Database Integrity
• Entity integrity: an identifier (primary key) must be unique in order to identify a specific member of the entity.
• Referential integrity: a foreign key value in a many-side table must match a primary key value in the one-side table (create an ORDER only for an existing CUSTOMER; a customer must be added before we do business with him or her).
• Domain integrity: an error exists when a field value is outside the allowed range or type.

Data Dictionary

EMPLOYEE
Attribute     Type      Size          Description            Authorization
EmpID         Numeric   6             Identifier             HR Manager
EmpFirstName  Text      10            Employee first name    HR Manager
EmpLastName   Text      10            Employee last name     HR Manager
Address       Text      50            Employee address       HR Manager
City          Text      10            Employee city          HR Manager
State         Text      2             Employee state         HR Manager
Zip           Text      XXXXX         Employee ZIP code      HR Manager
Phone         Text      XXX-XXX-XXXX  Employee phone         HR Manager
Date Hired    Date      MM/DD/YY      Date employee hired    HR Manager
Position      Text      15            Position of employee   HR Manager

EXPENSE
Attribute     Type      Size          Description            Authorization
EntryNumber   Numeric   6             Identifier             Project Manager
EntryDate     Date      MM/DD/YY      Date of entry          Project Manager
HoursWorked   Numeric   2             Hours per task         Project Manager
CostOfHotel   Currency  3             Funds spent on hotel   HR Clerk
CostOfTravel  Currency  3             Funds spent on travel  HR Clerk
CostOfMeals   Currency  3             Funds spent on food    HR Clerk
Approved      Y/N       1             Approved / not yet     Project Manager

From REAL Model...
• Resources: Product, Cash
• Events: Sales, Cash Collection
• Agents: Salesperson, Customer, Cashier

From Logical Data Model
Entity-relationship diagram: Customer (Cust No) places Order (Order No); Order contains Product (Product No).
Relational data model (data schema):
CUSTOMER (Cust No, ...)
ORDER (Order No, Cust No, ...)
PRODUCT (Product No, ...)
ORDER-PRODUCT (OrderNo, ProductNo, ...)

... to Physical Implementation with MS Access

Elements of Relational Databases
• Tables: the place to store data.
• Queries: tools that allow users to access the data stored in various tables and to transform data into information.
• Forms: onscreen presentations that allow users to view data in tables, or collected by queries from one or more tables, and to input new data.
• Reports: printed lists and summaries of data stored in tables or collected by queries from one or more tables.

Elements of Relational Databases (continued)
[Figure: database front-end tools (Form Builder, Report Writer, Interactive Query Tool, Application Program) working through the Database Engine on the Database, with connections to other computer systems and, through a Database Gateway, to other DBMS brands]

Database Normalization
Normalization: a technique for making complex databases more efficient and more easily handled by the DBMS.
• Eliminates data redundancy.
• Each entity stores information about one thing/object only.
The structure of tables must comply with several rules called normal forms; normalization transforms data tables that are not in normal form into tables that comply with the rules. Failure to normalize results in anomalies: errors when adding, changing, or deleting data stored in the database.

Normalization
• First normal form (1NF): an entity whose attributes have no more than one value for a single instance of that entity. Any attributes that can have multiple values actually describe a separate entity, possibly an entity and relationship.
• Second normal form (2NF): an entity in 1NF whose non-primary-key attributes are dependent on the full primary key. Any nonkey attributes that are dependent on only part of the primary key should be moved to an entity where that partial key is actually the full key. This may require creating a new entity and relationship on the model.
• Third normal form (3NF): an entity in 2NF whose non-primary-key attributes are not dependent on any other non-primary-key attributes. Any nonkey attributes that are dependent on other nonkey attributes must be moved or deleted. Again, new entities and relationships may have to be added to the data model.

Normalization in Plain English!!!
First normal form (1NF):
• No repeating group of the same attribute (multi-valued attribute).
• If not: create a new entity/record for this group.
Second normal form (2NF):
• Attributes should depend on the whole (composite) key, not on part of it (partial functional dependency).
• If not: create a new entity for these partially dependent attributes.
Third normal form (3NF):
• Attributes should depend on the (primary) key only, not on each other, i.e., not on a non-key attribute (transitive dependency).
• If not: create a new entity for these transitively dependent attributes.

Unnormalized Relation
Observation: repeating groups / multi-valued attributes!!!

Relation in 1NF
Observation: attributes depend on a part of the key!!!

Relations in 2NF
Observation: attributes depend on a non-key attribute!!!

Relations in 3NF
Observation: each table stores data about one thing only.

Example of Relational Database

Example of Relational Database (continued)

Data Warehouses for Data Mining
• Data warehousing: using IT/IS to collect, organize, integrate, and store entity-wide data, giving users easy access to large quantities of varied data from across the organization to improve decision-making capabilities.
• Data mart: a subset of a data warehouse that stores special-purpose data.
• Metadata is an index of the DB: what data, in what format, and where.
• Data mining: exploration, aggregation, and analysis of data in data warehouses using analytical tools and exploratory techniques.
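Returning to the plain-English normalization rules above, a small Python sketch can apply them step by step to a hypothetical order form. The unnormalized input has a repeating group of order lines; the final result has the shape of the CUSTOMER, ORDER, PRODUCT, and ORDER-PRODUCT (junction) tables of the relational data model shown earlier. All field names and values are invented, and the sketch assumes the input is consistent (the same product number always carries the same description and price).

```python
# Hypothetical unnormalized order forms: "lines" is a repeating group of
# (product_no, description, unit_price, quantity) tuples.
unnormalized_orders = [
    {"order_no": 1001, "order_date": "2015-09-01", "cust_no": "C1",
     "cust_name": "Acme Mfg.",
     "lines": [("P1", "Widget", 2.50, 10), ("P2", "Bolt", 0.10, 200)]},
    {"order_no": 1002, "order_date": "2015-09-02", "cust_no": "C1",
     "cust_name": "Acme Mfg.",
     "lines": [("P1", "Widget", 2.50, 4)]},
]

# 1NF: remove the repeating group, giving one flat row per order line.
# The key is now the composite (order_no, product_no).
first_nf = [
    {"order_no": o["order_no"], "order_date": o["order_date"],
     "cust_no": o["cust_no"], "cust_name": o["cust_name"],
     "product_no": p, "description": d, "unit_price": price, "quantity": q}
    for o in unnormalized_orders
    for (p, d, price, q) in o["lines"]
]

# 2NF: attributes that depend on only part of the composite key move out.
orders_2nf = {r["order_no"]: (r["order_date"], r["cust_no"], r["cust_name"])
              for r in first_nf}                      # depends on order_no only
products = {r["product_no"]: (r["description"], r["unit_price"])
            for r in first_nf}                        # depends on product_no only
order_lines = {(r["order_no"], r["product_no"]): r["quantity"]
               for r in first_nf}                     # junction table: needs the whole key

# 3NF: cust_name depends on cust_no, a non-key attribute (transitive
# dependency), so it moves to its own CUSTOMER table.
customers = {cust_no: name for (_, cust_no, name) in orders_2nf.values()}
orders = {no: (date, cust_no) for no, (date, cust_no, _) in orders_2nf.items()}

for name, table in [("CUSTOMER", customers), ("ORDER", orders),
                    ("PRODUCT", products), ("ORDER_LINE", order_lines)]:
    print(name, table)
```

Each dictionary is keyed by its table's primary key, so a repeated key simply overwrites an identical value. With inconsistent input (say, two different prices for P1) the last row would silently win, which is exactly the kind of redundancy problem that normalization is meant to expose and remove.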
Data Warehouse

Knowledge Management (KM)
• Explicit knowledge: anything that can be documented, archived, or codified, often with the help of information systems.
• Tacit knowledge: the processes and procedures for effectively performing a particular task, stored in a person’s mind.
• Knowledge assets: all underlying skills, routines, practices, principles, formulas, methods, heuristics, and intuitions, whether explicit or tacit.
• Knowledge management (KM): the process an organization uses to gain the greatest value from its knowledge assets.

Decision Aids
Decision aids: information systems that help decision makers with aggregate information, what-if analyses, and so on. They include:
• Decision support systems
• Executive information systems
• Expert systems
• Intelligent agents

Decision Support Systems (DSS)
Decision support systems (DSS): information systems that assist managers with unstructured decisions by retrieving data and generating information.
• Possess interactive capabilities (what-if analyses).
• Can answer ad hoc inquiries.
• Provide data modeling facilities.
A DSS can imitate human decision making (i.e., artificial intelligence) when confronting complex and ambiguous situations (tacit knowledge, underlying nonlinear relationships in historical data).

Executive Information Systems (EIS)
Executive information systems (EIS) / executive support systems (ESS): information systems, often considered a subset of DSS, that combine information from the organization and the environment, organize and analyze the information, and present it to the manager in an aggregate form to assist decision making.
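Presenting event data in an aggregate form usually amounts to grouping and summing detail records. A minimal `sqlite3` sketch, assuming a hypothetical `sales_event` table: the `GROUP BY` query rolls sales up by month and region, and a view restricted to one region stands in for a data mart (a subset of warehouse data, as described earlier). The table, column, and region names and all amounts are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sales_event (
        event_id INTEGER PRIMARY KEY,
        month    TEXT,   -- 'YYYY-MM'
        region   TEXT,
        amount   REAL
    );
    INSERT INTO sales_event (month, region, amount) VALUES
        ('2015-09', 'West', 1200.0), ('2015-09', 'West', 800.0),
        ('2015-09', 'East',  500.0), ('2015-10', 'West', 900.0),
        ('2015-10', 'East',  700.0);
    -- Stand-in for a data mart: only the West region's events.
    CREATE VIEW west_sales AS SELECT * FROM sales_event WHERE region = 'West';
    """
)

# Roll detail events up to one summary row per month and region.
summary = conn.execute(
    """
    SELECT month, region, SUM(amount), COUNT(*)
    FROM sales_event
    GROUP BY month, region
    ORDER BY month, region
    """
).fetchall()
for row in summary:
    print(row)

west_total = conn.execute("SELECT SUM(amount) FROM west_sales").fetchone()[0]
print("West total:", west_total)
```

A real warehouse would hold far more detail and history, but the principle is the same: managers see the summarized rows, and the detail events remain available for drill-down.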
Group Support Systems (GSS)
Group support systems (GSS) / group decision support systems (GDSS): computer-based systems that support collaborative intellectual work such as idea generation, elaboration, analysis, synthesis, information sharing, and decision making. They support brainstorming (a method for freely and creatively generating as many ideas as possible without undue regard for their practicality or realism).

Expert Systems (ES) and Neural Networks (NN)
• Expert systems (ES): decision support systems for complex decisions where consistency is desirable, used to minimize time and maximize quality. They emulate the problem-solving techniques of human experts.
• Neural networks (NN): computer hardware and software systems that mimic the human brain’s ability to recognize patterns or predict outcomes using less-than-complete information.

Intelligent Agents (IA)
Intelligent agent (IA): a software program that may be integrated into a DSS or other software tools (such as word processing, spreadsheet, or database packages). Once set in motion, these so-called “bots,” or “robots,” continue to perform their tasks without further direction from the user.

Business Intelligence (BI)
• Business intelligence (BI): uses state-of-the-art information technologies for storing and analyzing data to help managers make the best possible decisions for their companies.
• BI systems are specifically designed to support managers in making tactical and strategic decisions.
• BI is often installed into an existing ERP system as an additional module.