Databases and Information Management
Reading: Laudon & Laudon, chapter 5
Additional Reading: O'Brien & Marakas, chapters 3-4
COMP 5131

Outline
• Database Approach to Data Management
• Database Management Systems
• Improving Business Performance and Decision Making
  - Data Warehouse
  - Data Marts
  - Business Intelligence
• Managing Data Resources

Database Approach
• Database
  - Collection of related files containing records on people, places, or things
  - Prior to digital databases, businesses used file cabinets with paper files
• Entity
  - Generalized category representing a person, place, or thing on which we store and maintain information
  - Example → SUPPLIER, PART
• Attributes
  - Specific characteristics of each entity
  - Example → SUPPLIER: name, address; PART: description, unit price, supplier

Relational Database
• Organizes Data into 2D Tables
  - Tables → Relations with columns and rows
  - One table for each entity
  - Example → CUSTOMER, SUPPLIER, PART, SALES
  - Fields (columns) store data representing an attribute
  - Rows store data for separate records
  - Key field: Uniquely identifies each record
  - Primary key: One field in each table; cannot be duplicated; provides a unique identifier for all information in any row
• Relational Database Table
  A relational database organizes data in the form of two-dimensional tables. Illustrated here is a table for the entity SUPPLIER showing how it represents the entity and its attributes. Supplier_Number is the key field.
• Part Table
  Data for the entity PART have their own separate table. Part_Number is the primary key and Supplier_Number is the foreign key, enabling users to find related information from the SUPPLIER table about the supplier for each part.
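The SUPPLIER and PART tables described above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not the textbook's own database: the sample rows (Acme Supply, Hinge, Latch) and the exact column list are invented for the example. It shows that a primary key value cannot be duplicated and that the foreign key in PART leads back to the matching SUPPLIER row. SQLite only checks foreign keys when the foreign_keys pragma is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite checks foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("""
    CREATE TABLE SUPPLIER (
        Supplier_Number INTEGER PRIMARY KEY,   -- key field
        Supplier_Name   TEXT NOT NULL
    )""")
conn.execute("""
    CREATE TABLE PART (
        Part_Number     INTEGER PRIMARY KEY,
        Part_Name       TEXT NOT NULL,
        Unit_Price      REAL,
        Supplier_Number INTEGER NOT NULL
            REFERENCES SUPPLIER (Supplier_Number)   -- foreign key
    )""")

# Invented sample rows, for illustration only.
conn.execute("INSERT INTO SUPPLIER VALUES (1, 'Acme Supply')")
conn.execute("INSERT INTO PART VALUES (101, 'Hinge', 2.50, 1)")


def try_insert(sql: str) -> bool:
    """Return True if the row is accepted, False if a key rule rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


# A primary key value cannot be duplicated.
duplicate_ok = try_insert("INSERT INTO SUPPLIER VALUES (1, 'Another Supplier')")
# A part cannot point at a supplier that does not exist.
orphan_ok = try_insert("INSERT INTO PART VALUES (102, 'Latch', 4.00, 99)")

# Follow the foreign key from a part to its supplier.
supplier_name = conn.execute(
    """SELECT s.Supplier_Name
       FROM PART AS p JOIN SUPPLIER AS s
         ON s.Supplier_Number = p.Supplier_Number
       WHERE p.Part_Number = ?""",
    (101,),
).fetchone()[0]

print(duplicate_ok, orphan_ok, supplier_name)
```

Both rejected inserts raise sqlite3.IntegrityError, which the helper turns into False so the script can continue to the lookup.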

Relational Database
• Establishing Relationships
  - Entity-relationship diagram: used to clarify table relationships in a relational database
  - Relational database tables may have:
    One-to-one relationship
    One-to-many relationship
    Many-to-many relationship → Requires creating a table (join table, intersection relation) that links the two tables to join information
• A Simple Entity-Relationship Diagram
  Relationship between SUPPLIER and PART
• Sample Order Report
  The shaded areas show which data came from the SUPPLIER, LINE_ITEM, and ORDER tables. The database does not maintain data on Extended Price or Order Total because they can be derived from other data in the tables.
• Final Database Design with Sample Records
  The final design of the database for suppliers, parts, and orders has four tables. The LINE_ITEM table is a join table that eliminates the many-to-many relationship between ORDER and PART.
• Entity-Relationship Diagram for the Database with Four Tables
  This diagram shows the relationship between the entities SUPPLIER, PART, LINE_ITEM, and ORDER.
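The four-table design can be sketched as follows, again with sqlite3. The column names, dates, prices, and quantities are assumptions made up for the illustration; only the table names and the role of LINE_ITEM as a join table come from the slides. The sketch shows two points from above: LINE_ITEM lets one part appear on many orders and one order contain many parts, and Extended Price and Order Total are computed when needed rather than stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE SUPPLIER (Supplier_Number INTEGER PRIMARY KEY, Supplier_Name TEXT);
    CREATE TABLE PART (
        Part_Number     INTEGER PRIMARY KEY,
        Part_Name       TEXT,
        Unit_Price      REAL,
        Supplier_Number INTEGER REFERENCES SUPPLIER (Supplier_Number)
    );
    -- ORDER is a reserved word in SQL, so the table name is quoted.
    CREATE TABLE "ORDER" (Order_Number INTEGER PRIMARY KEY, Order_Date TEXT);
    -- Join table: one row per (order, part) pair.
    CREATE TABLE LINE_ITEM (
        Order_Number  INTEGER REFERENCES "ORDER" (Order_Number),
        Part_Number   INTEGER REFERENCES PART (Part_Number),
        Part_Quantity INTEGER NOT NULL,
        PRIMARY KEY (Order_Number, Part_Number)
    );
    -- Invented sample rows, for illustration only.
    INSERT INTO SUPPLIER VALUES (1, 'Acme Supply');
    INSERT INTO PART VALUES (101, 'Hinge', 22.50, 1), (102, 'Latch', 12.00, 1);
    INSERT INTO "ORDER" VALUES (5001, '2024-01-15'), (5002, '2024-01-16');
    INSERT INTO LINE_ITEM VALUES (5001, 101, 4), (5001, 102, 2), (5002, 101, 1);
""")

# Extended Price is derived per line: quantity * unit price.
lines = conn.execute(
    """SELECT li.Part_Number, p.Part_Name, li.Part_Quantity, p.Unit_Price,
              li.Part_Quantity * p.Unit_Price AS Extended_Price
       FROM LINE_ITEM AS li JOIN PART AS p ON p.Part_Number = li.Part_Number
       WHERE li.Order_Number = ?
       ORDER BY li.Part_Number""",
    (5001,),
).fetchall()

# Order Total is derived by summing the extended prices.
order_total = sum(line[4] for line in lines)

# The same part can appear on several orders (many-to-many via LINE_ITEM).
orders_with_hinge = [
    row[0]
    for row in conn.execute(
        "SELECT Order_Number FROM LINE_ITEM WHERE Part_Number = 101 "
        "ORDER BY Order_Number"
    )
]

print(lines, order_total, orders_with_hinge)
```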

Relational Database
• Normalization
  Process of streamlining complex groups of data to:
  - Minimize redundant data elements
  - Minimize awkward many-to-many relationships
  - Increase stability and flexibility
• Referential Integrity Rules
  - Used by relational databases to ensure that relationships between coupled tables remain consistent
  - Example → When one table has a foreign key that points to another table, you may not add a record to the table with the foreign key unless there is a corresponding record in the linked table

Database Management Systems
• DBMS
  - Specific type of software for creating, storing, organizing, and accessing data from a database
  - Separates the logical and physical views of the data
    Logical view → How end users view data
    Physical view → How data are actually structured and organized
  - Examples of DBMS → Microsoft Access, DB2, Oracle Database, Microsoft SQL Server, MySQL (open source)
• Human Resources Database with Multiple Views
  - Tables are combined to deliver the data that users need
  - Requirement → The two tables must share a common data element
• Operations of a Relational DBMS
  - Select: Creates a subset of all records meeting stated criteria
  - Join: Combines relational tables to present the user with more information than is available from individual tables
  - Project: Creates a subset consisting of columns in a table; permits the user to create new tables containing only the desired information
• Three Basic Operations of a Relational DBMS
  The select, project, and join operations enable data from two different tables to be combined and only selected attributes to be displayed.
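The three operations can be sketched in SQL, run here through Python's sqlite3 module. The tables and rows are invented for the illustration. Each operation is shown on its own, then all three are combined in one query, which is how they are usually used together.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE SUPPLIER (Supplier_Number INTEGER PRIMARY KEY, Supplier_Name TEXT);
    CREATE TABLE PART (Part_Number INTEGER PRIMARY KEY, Part_Name TEXT,
                       Unit_Price REAL, Supplier_Number INTEGER);
    -- Invented sample rows, for illustration only.
    INSERT INTO SUPPLIER VALUES (1, 'Acme Supply'), (2, 'Bolt Works');
    INSERT INTO PART VALUES (101, 'Hinge', 2.50, 1),
                            (102, 'Latch', 4.00, 2),
                            (103, 'Handle', 6.25, 1);
""")


def rows(sql: str) -> list:
    """Run a query and return all result rows as a list of tuples."""
    return conn.execute(sql).fetchall()


# Select: the subset of records (rows) meeting stated criteria.
selected = rows(
    "SELECT * FROM PART WHERE Part_Number IN (101, 103) ORDER BY Part_Number"
)

# Project: the subset of columns.
projected = rows("SELECT Part_Number, Part_Name FROM PART ORDER BY Part_Number")

# Join: combine two tables through their common data element.
joined = rows(
    """SELECT p.*, s.Supplier_Name
       FROM PART AS p JOIN SUPPLIER AS s
         ON s.Supplier_Number = p.Supplier_Number
       ORDER BY p.Part_Number"""
)

# All three together: chosen rows, chosen columns, from both tables.
combined = rows(
    """SELECT p.Part_Number, p.Part_Name, s.Supplier_Name
       FROM PART AS p JOIN SUPPLIER AS s
         ON s.Supplier_Number = p.Supplier_Number
       WHERE p.Part_Number IN (101, 103)
       ORDER BY p.Part_Number"""
)

print(combined)
```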

Database Management Systems
• Capabilities of DBMS
  - Data Definition Capabilities: Specify the structure of the content of the database
  - Data Directory: Automated or manual file storing definitions of data elements and their characteristics
  - Query and Data Reporting
    Data manipulation language → Structured Query Language (SQL)
    Microsoft Access query-building tools
    Report generation, example → Crystal Reports
• Access Data Directory Features
• Example of SQL Query
• An Access Query
• Object-Oriented Database
  - Relational DBMS are designed for structured data in rows and columns
  - Not well suited to graphics-based or multimedia applications
  - Object-oriented DBMS (OODBMS) → Stores data and the procedures that act on those data as objects to be retrieved and shared
  - Usage → Managing multimedia components, Java applets for the Web
  - Relatively slow compared to relational DBMS
  - Hybrid object-relational DBMS → Provides capabilities of both types
• Databases Improve Performance and Support Better Decisions
  Tools:
  - Data warehousing
  - Multidimensional data analysis
  - Data mining
  - Web interfaces to databases

Using Database to Improve Performance
• Data Warehouse
  - Database that stores current and historical data that may be of interest to decision makers
  - Consolidates and standardizes data from many systems, operational and transactional databases
  - Data can be accessed but not altered
• Data Marts
  - Subset of a data warehouse that is highly focused and isolated for a specific population of users
  - Can be constructed more quickly and at lower cost
  - Example → A company might develop marketing and sales data marts to deal with customer information
• Components of Data Warehouse
The data are combined with data from external sources and reorganized into a central database
designed for management reporting and analysis. The information directory provides users with information about the data available in the warehouse.
• Business Intelligence
  Tools for consolidating, analyzing, and providing access to large amounts of data to improve decision making:
  - Software for database reporting and querying
  - Tools for multidimensional data analysis (online analytical processing)
  - Data mining
• Data Mining
  - Finds hidden patterns and relationships in large databases and infers rules from them to predict future behavior
  - Types of information:
    Associations → Occurrences linked to a single event. Example → When chips are purchased, Coke is also purchased 65% of the time, rising to 85% when Coke is on promotion
    Sequences → Events linked over time. Example → After a house purchase, a new refrigerator is bought within two weeks 65% of the time, and an oven within one month 45% of the time
    Classifications → Patterns describing the group an item belongs to. Example → Characteristics of customers who are likely to leave, so that campaigns can be devised to retain them
    Clusters → Discovering as yet unclassified groupings
    Forecasting → Uses a series of values to forecast future values
• Data Mining Applications
  - Applications for all functional areas of business, government, and science
  - Usage patterns in customer data → Identifying profitable customers, or targets for one-to-one marketing campaigns
  - Predictive analysis → Using data mining techniques, historical data, and assumptions about future conditions to predict outcomes of events, such as the probability that a customer will respond to an offer or purchase a specific product
• Privacy Concerns
  - Data mining can be used to create a detailed data image of each individual

Case Study: DNA Databases
• Crime-Fighting Weapon or Threat to Privacy?
• Questions
  - What are the benefits of DNA databases?
  - What problems do DNA databases pose?
  - Who should be included in a national DNA database? Should it be limited to convicted felons?
    Explain your answer.
  - Who should be able to use DNA databases?

Using Database to Improve Performance
• Databases and the Web
  - Information from internal databases → Customers view the product catalog and place orders
  - Requests arrive as HTML commands → Translated into SQL for processing by the DBMS (database server)
  - Software that makes this possible:
    Web server
    Application server or CGI scripts
    Database server
  - Advantages of using the Web to access internal databases:
    Much less training required for employees
    Few or no changes to the internal databases
    Savings over redesigning and rebuilding legacy systems

Managing Data Resources
• Policies and Procedures for Data Management
  - Information Policy
    Organization's rules → Sharing, disseminating, acquiring, classifying, and inventorying information
    Example → Who has the right to change or view sensitive employee data
  - Data Administration
    Database design and management group responsible for defining and organizing the structure and content of the database, and for maintaining the database
    Specific policies and procedures for data management
    Responsibilities → Developing information policy, defining and organizing the structure and content of the database, planning for data, developing the data directory, overseeing logical database design
• Ensuring Data Quality
  - Poor Data Quality
    Major obstacle to successful customer relationship management
    About 20% of US mail and packages are returned because of incorrect names or addresses
  - Why Data Quality Problems?
    Redundant and inconsistent data produced by multiple systems
    Data input errors → A major cause of data quality problems
  - Data Quality Audit
    Structured survey of the accuracy and completeness of data
  - Data Cleansing
    Detects and corrects incorrect, incomplete, improperly formatted, and redundant data
    Specialized data cleansing software → Automatically surveys data files, corrects errors in the data, and integrates data into a company-wide format
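The audit and cleansing steps described above can be sketched in a few lines of Python. This is a toy illustration, not how any particular data cleansing product works: the record layout, the sample customers, and the rule that a valid postcode is exactly five digits are all assumptions made for the example. The sketch corrects formatting, sets aside incomplete or badly formatted records for review, and drops redundant copies.

```python
import re

# Invented sample records, for illustration only.
raw_records = [
    {"name": "  alice  WONG ", "postcode": "12345"},
    {"name": "Alice Wong", "postcode": "12345"},  # redundant once normalised
    {"name": "Bob Lee", "postcode": ""},          # incomplete
    {"name": "Carol Tan", "postcode": "1234"},    # improperly formatted
]


def normalise(record: dict) -> dict:
    """Bring a record into one company-wide format."""
    name = " ".join(record["name"].split()).title()
    return {"name": name, "postcode": record["postcode"].strip()}


def find_issues(record: dict) -> list:
    """Audit one record; return a list of problems (empty if none)."""
    if not record["name"] or not record["postcode"]:
        return ["incomplete"]
    # Assumption for this sketch: a valid postcode is exactly five digits.
    if not re.fullmatch(r"\d{5}", record["postcode"]):
        return ["bad format"]
    return []


def cleanse(records: list) -> tuple:
    """Return (clean records, rejected records with issues, duplicate count)."""
    clean, rejected, seen = [], [], set()
    duplicates = 0
    for record in map(normalise, records):
        issues = find_issues(record)
        if issues:
            rejected.append((record, issues))
            continue
        key = (record["name"], record["postcode"])
        if key in seen:
            duplicates += 1  # redundant copy of a record already kept
            continue
        seen.add(key)
        clean.append(record)
    return clean, rejected, duplicates


clean, rejected, duplicates = cleanse(raw_records)
print(clean, rejected, duplicates)
```

Rejected records are returned with the reason attached rather than silently dropped, so that someone can correct them at the source.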