Organizing Data and Information for Use in Decision Making (MIS) Views of Data • The Physical View – Concerned with where things are stored at and what they are stored on. • The Logical View – Concerned with how data is represented so that it will be meaningful to users. – What are the relationships between various data elements? FILE in Access TABLE in Access Bit: a single zero or one • Entity class (usually called simply Entity) – It is a concept that relates to the items about which you wish to store information--these “items” can include conceptual ideas such as “philosophers’ thoughts,” people such as “student,” things such as “refrigerators” or places such as “city.” – Generic In Nature – EMPLOYEES Table • Instance – An occurrence of an Entity Class is called an Instance of that class. Manhattan is an instance of the entity class city. – A More Specific Example – A record in Access • Attribute – Characteristics of our records • Employee #, Last Name, First Name, Hire Date, Dept. Number – The columns in Access • A PRIMARY KEY is a field in a database file that uniquely describes each record. – ID Number Social Security Number Four database models • Hierarchical – XML data is stored in a hierarchical format that you can expand and collapse. – The XML instance document can be thought of as a “database” • Network • Relational (the focus of our class) • Object-Oriented Relational Database Model • Uses a series of two-dimensional tables called relations to store information relating to a particular entity class. – Connects or relates data in different tables through the use of a key field or common elements. • Tables = Relation: A specific entity class • Rows = Records or individual file folders. • Columns = Attributes or field names • The relational database model is the widely used database model at the present time. Spreadsheet Data Used to create a database Steps to Create a Relational Database • Define entity classes and primary keys • Define relationships among entity classes • Define attributes (fields) for each relation • Use a data definition language to create the database. Step One: Define Entity Classes and Primary Keys Need a database that keeps track of the following information about your health club: The first name, last name, and id number for each of your members The address (street, city, state) where your members live The gender of each member and what class(es) they are taking The course name, number, and time that each of course meets The first name, last name, and id number for each of your employees. The home phone number for each employee, as well as their salary amount and the date that they were hired. Data Assumptions/Traits • Member Number is unique. Each member will have only one Member Number. There is one Member for each Member Number. • At this time, each member can enroll in only one course (at a time) that is offered by your health club. Therefore, each member will have only one class number recorded at any time. (At some point in your assignment, you will make it possible for your members to take more than one course at a time) • Class Number is the unique identifier for each class that your health club offers. There is only one course for each class number. • Employee Number is the unique identifier for each employee that works at your health club. There is only one employee for each employee number. Data Assumptions/Traits • • • • • A member can take only one course at a time. Only one instructor can teach a particular course. Only one section of a course is taught at any one time. A course may be taken by many different students. An instructor can teach more than one course at any one time. • THEREFORE WE HAVE THE FOLLOWING ENTITY CLASSES: – MEMBERSHIP - primary key is Member Number – COURSES - primary key is Class Number – EMPLOYEES – primary key is Employee Number Step 2: Define relationships between entity classes. Using an Entity Relationship (E-R) Model Diagram, we can represent the entity classes and their relationships: MEMBERSHIP 1:M Key: Member Number COURSES Key: Class Number COURSES Key: Class Number 1:M EMPLOYEES Key: Employee Number STEP 3: Define the attributes in each Entity which has been converted into a relation. Normalization • A technique used to make complex databases more efficient • Break one large table into several smaller tables – Eliminate all repeating groups in records – Eliminate redundant data (duplicate copies) – Assure that each field in the relation depends only on the primary key of that relation Normalized Data: tables are linked together by a common field found in more than one table. A limitation of this current example is that each member can enroll in only one course at a time. That is highly inefficient and it is not a good way to run your business. Creating a 4th table allows you to store information in that table related to all of the different courses that your members are enrolling in. •Class Number is removed from the Membership table and placed in the Enrollment table. •Neither Member Number nor Class Number is unique by itself in the Enrollment table. However, you can create a composite primary key that joins those two fields together, and that can be your unique identifier. Step #4: Use a Data Definition Language to create the database You need a database management system (DBMS) to do this Microsoft Access is a software program (database management system) that creates a database that follows the principles found in the Relational Database Model What Is a Database Management System An integrated set of programs that provides all the necessary capabilities for building and maintaining database files, extracting information required for making decisions, and formatting the information into structured reports. DIFFERENT FROM A DATABASE DATABASE a collection of information that you organize and access according to the logical structure of that information. A database is actually composed of two parts: 1. The information itself / the files that are logically associated 2. The logical structure of the information which is called the data dictionary. The DATABASE STRUCTURE – The data dictionary contains the logical properties that describe information in a database. DATABASE MANAGEMENT SYSTEM (DBMS) the software you use to specify the logical organization for a database and access it. A DBMS contains 5 software components: Database Management System Engine accepts logical requests from the various other DBMS subsystems, converts them to their physical equivalent, and actually accesses the database and data dictionary as they exist on a storage device. You can work with the information logically without having to worry about where it is physically stored or what it is stored on. DATA DEFINITION SUBSYSTEM helps you create and maintain the data dictionary and define the structure of the files in a database. • A data dictionary serves the same purpose as an XML schema. • You use this subsystem to define the information logical structure when you first create a database. • Once you’ve created a database, you use this subsystem to define new fields, delete fields, or change field properties. What might be found in a data dictionary in Access? • Field Names and their data types, as well as the primary key. • INTEGRITY CONSTRAINTS are rules that help assure the quality of the information in a database. (Validation Rules in Access) – Setting a maximum or minimum value for a field – A birth date can be now or in the past: not in the future. – A registration database at your school includes integrity constraints concerning prerequisites for certain classes. – Validation Text: error message seen in Access. What might be found in a data dictionary in Access? • Input Masks and Field Sizes • Formats, Default Values, and Captions • With Referential Integrity, you create a feature that will prevent you from typing invalid data into your table. – When two different tables are linked/joined by a relationship where referential integrity is enforced, you cannot type data into a linked field if that same data does not already exist in the original table. – You can’t assign something to an item that doesn’t exist. DATA MANIPULATION SUBSYSTEM helps you add, change, and delete information in a database and mine it for valuable information. • This subsystem is most often the primary interface between you as a user and the information contained in a database. • Tools in this subsystem include views, report generators, query-by-example tools, and structured query language. DATA MANIPULATION TOOLS • VIEW - allows you to see the content of a database file, make whatever changes you want, perform simple sorting, and search to find the location of specific information. – Add records. – Delete records – Sort records. – Search for records. DATA MANIPULATION TOOLS • REPORT GENERATOR - helps you quickly define formats of reports and what information you want to see in a report. You can specify exactly what you want to see and where you want to see it. – Report Wizard – Report Design Screen DATA MANIPULATION TOOLS • QUERY-BY-EXAMPLE (QBE) TOOL - helps you graphically design the answer to a question. You create this query by pointing, clicking, and dragging. • You specify the conditions or criteria that you want the database searched for, and the query tool will return records that match the conditions or criteria that you specified. • Queries are used to extract useful information from a database. Query Design screen (above) and Query Result (below) DATA MANIPULATION TOOLS • STRUCTURED QUERY LANGUAGE (SQL) a standardized fourth-generation language found in most database environments. SQL is the same as QBE, except that you perform a query by creating a statement instead of pointing, clicking, dragging. – SQL is a form of computer programming. – Uses • SELECT • FROM • WHERE APPLICATION GENERATION SUBSYSTEM contains facilities to help you develop transaction-intensive applications. This subsystem includes: • Tools for creating data entry screens. • Application buttons that are used to execute a certain task. • Uses a programming language specific to the database management system that you are using. DATA ADMINISTRATION SUBSYSTEM helps you manage the overall database environment by providing facilities for: • Backup and recovery • Security management – Who has access to what? – What type of access do you have? • Concurrency control ensures the validity of database updates. • Change management allows you to assess the impact of proposed structural changes. One of the goals of a database management system is to provide easy and logical access to data, while at the same time avoiding redundancy. If you have multiple files, each containing the same information, then you will most likely find redundant elements and erroneous data. Relational Database Model • Uses a series of two-dimensional tables called relations. • Connects or relates data in different files through the use of a key field or common elements. • Based on mathematical principles which allow for more logical manipulation of data. • Most flexible type of organization. THE CONCEPT OF KEYS • A KEY is a field or combination of fields used to identify records so they can be easily retrieved and processed. • A PRIMARY KEY is a field in a database file that uniquely describes each record. – ID Number Social Security Number • You can create relationships between tables/files through common fields that are found in more than one table/file. Student and Department Student Entity Student Number Address . . Department ID Department Entity Department ID Address . . This way, if we wanted to know all the students that belong to a particular department, we could get the listing by joining the two entities on the Department ID value. Relational Database Model State DMV Database Driver’s license file/table Street City State Zip 1XYZ234 Driver’s Expiration Name address J.Doe license no. date Car owner Car Model Car A405261 file/table License No. year make Street City St. Zip J. Doe address Name Moving violation citation file/table Moving Date Citation violation Fines paid/ 1XYZ234 Driver’s License No. number cited not paid type Parking violation citation file/table Parking Date Citation violation Fines paid/ Car License Number A405261 number cited not paid (MIS) type Creating relationships between database files. – Common field found in more than one table/file. – You might have to physically draw the connection between two tables/files. This is usually achieved by drawing a connecting line between two fields. INFORMATION STORED SEPARATELY BUT CAN BE RELATED THROUGH KEY JOINS Another example: which movies are provided by which distributors? Organizational Databases • Organizational databases are constantly changing as the organizational transaction processing systems are constantly updating and changing the contents of the organizational database to reflect current business activities. – Organizational databases support day-to-day operations. Constantly being updated and changed. – Not very useful for decision making involving the analysis of historical data. • Organizational databases rarely maintain historical data, just information related to current operations. Data Warehouses • A logical collection of information gathered from many different operational databases. – Contains historical data that has been extracted from many different operational databases. – Historical data is used for decision making. • Operational databases support transaction processing (OLTP). • Data warehouses support analytical processing (OLAP) because historical data is analyzed in order to make a decision. • Data Mining tools are the software tools used to query information in data warehouse. – Use advanced statistical techniques to search for patterns and anomalies in the data. – Attempts to find answers to questions the user did not even think to ask. • A relational database stores information in a series of two dimensional tables. • Data warehouses are multidimensional, containing layers of rows and columns. Each dimension is an attribute of information. Data-mining agents perform multidimensional analysis in data warehouses • Cube – common term for the representation of multidimensional information (layers, rows, columns) Applications of Data Mining • Market Segmentation: identifying common characteristics of customers who buy the same products from your company. • Customer Churn: predicting which customers are likely to leave your company and go to a competitor. • Fraud Detection: identifying which transactions are most likely to be fraudulent. • Direct Marketing: identifying the best prospects in order to obtain the highest response rate. • Market Basket Analysis: trying to understand which products are commonly purchased together. • Trend Analysis: trying to reveal differences between one period and another period. • Info in an Excel spreadsheet and a relational database (Access) appears in the form of a two dimensional table of rows and columns. • By adding a Page Field to a Pivot Table, you can add another dimension of information: 3-D (rows and columns and layers). – Creating a 3-dimensional Pivot Table in Excel is a means of conceptually building a data warehouse. Page fields represent the depth layer • Pivot Tables can help you see relationships in the data The End Recent Database Developments • The Data Warehouse and Data Mining • Object-Oriented Database Management Systems OBJECT-ORIENTED DATABASE MODEL • A database model that brings together, stores, and allows you to work with both information and the procedures that act on that information. • An object is a software module containing – information that describes an entity class – along with a list of procedures that can act on the information describing the entity class. – Audio, video, and graphics can be stored in an object. OBJECT-ORIENTED DATABASE MODEL • Basic principles of Object-Oriented are different: – Hierarchical, network, and especially Relational Database Structures try to separate the data from the programs which manipulate them. – Object-Oriented model does the opposite: they store the Entity together with the Procedures (code) that work on the Entity data. Object-Oriented Database Model Bank Account Object Attributes •Name •Acct. #’s •Address •Phone Operations •Get Accounts Checking Acct. Object Attributes •Acct. #’s •Balance •Credit Line •Monthly Stmt. Operations •Calc. Int. Owed •Print Mthly. Stmt. Savings Acct. Object Attributes •Account # •Balance Operations •Calc. Int. Paid •Print Qtrly. Stmt. The ObjectOriented database model is gaining in popularity and some day may be more popular than our current relational database model. • Object-Oriented Databases Technologies Are Becoming More Popular because we / businesses need to store things that are not just words. • The Object-Oriented Model supports unique and complex data types (because it does not limit data storage to two dimensional tables). You can store and manipulate different aspects of: – graphics – video – text drawings sound music photographs TO SUMMARIZE • How we view information: – The physical view of information deals with how information is physically arranged, stored, and accessed on some type of secondary storage device. – The logical view of information focuses on how you need to arrange and access information to meet your particular business needs. • A database is a collection of information that you organize and access according to the logical structure of that information. • The data dictionary contains the logical structure of information in a database. Database Models • Relational Database Model – All the data is arranged in a series of related tables. Student, Class, Instructor – Tables are linked together by common fields. • Object-Oriented Database Model – Allows you to store the data and the procedures used to manipulate that data together. – Can be used to store text, sound, video, and images. – Stores data, computes GPA, and creates a transcript. Normalization • Used to create tables in a relational database. • Break one large table into several smaller tables – Eliminate redundant data (duplicate copies) – With redundancy removed, you only need to make a change once and all linkages are automatically updated. Proper Database Design • Field: Name: Brian Kovar • Field: Address: – 123 North Main, Manhattan, KS 66502 •Field: Last Name: Kovar •Field: First Name: Brian •Field: Address: 123 North Main •Field: City: Manhattan •Field: State: KS •Field: Zip Code: 66502 Hierarchical Database Model • The oldest database structure – Started by IBM in the mid 1960s • Its structure resembles an Organizational Chart structure applied to entities in a database structure. • Parent/child relationship between entities. – A “parent entity” can have many “children entities” but a “child entity” can have only one “parent entity” – Allows 1:1 and 1:M relationships only. • Only one access path to any particular data element. • Very rigid structure. Very difficult to add new fields to a database. Kansas State University College of Business Accounting MANGT 366 College of Arts & Science MIS Marketing MANGT 367 George W. Bush Bill Clinton George Bush Ronald Reagan College of Engineering Finance MANGT 656 Management MANGT 670 The Hierarchical Database Model NETWORK DATABASE MODEL • The second oldest model • Characteristics – It allows a “parent” entity to have many “children entities” – A “child entity” is allowed to have many “parent entities” • Can represent 1:1, 1:M, and M:M relationships. • Different branches of data can have different relationships. Can access data by more than one path. Network Database Model A college class scheduling system Courses Instructors Students Journalism 101 Film 200 TV 210 D. Barry R. DeNiro D. Rather Student A Student B Student C Student D Student E Database Models • Hierarchical Database Model – Structure Resembles an Organizational Chart – Tree with Branches: Must travel up and down set paths in order to reach a destination. • Network Database Model – Allows for numerous interconnecting relationships. One student can have many teachers. One teacher can have many different students. One teacher can teach many different classes. One class can have many different students. Traditional Approach to Data Management Traditional Approach to Data Management • Separate data files are created by or for different application programs. Seen from a file perspective. • Weaknesses of this approach – Data redundancy: the same data can be found in more than one location. – Data redundancy conflicts with the integrity of the overall data. – Program-data dependence: programs and data that are developed for one application are incompatible with programs and data developed for another application. – Inflexible: Very difficult to make changes The Database Approach • All of the data used by all of the applications is stored in one location: the database – Removes data redundancy – Helps insure data integrity. Hierarchical Database Model A cruise ship reservations system Ports of Departure Miami Los Angeles New York Names of Ships QE 2 The Love Boat The Oriana Sailing Dates April 15 May 30 July 15 Cabin Numbers A-1 A-2 A-3 Organizing Data and Information: Chapter 3 (MIS) • What view of data is of most interest to most people in business (non-computer types of occupations such as accountants, marketers, managers, and people in finance)? – The Logical View of Data – Concerned with how data is represented so that it will be meaningful to users. What are the relationships between various data elements? • Primary Key • Attribute Instance Entity Class • The wording of “Cities in Kansas” is – An entity class • Manhattan would be an _____________ – Instance of the “Cities in Kansas” Entity Class • An attribute of Manhattan is ____________ • Give an example of a good primary key Organizing Data and Information: Chapter 3 (MIS) • The Relationship line is based on a common field found in more than one table (everything about it must be identical). – What is the purpose of the relationship line? – It is used to link together your tables (also referred to as the “join line.”)