Chapter 9 Database Management Systems Accounting Information Systems, 5th edition James A. Hall 1 Flat-File Versus Database Environments Computer processing involves two components: data and instructions (programs). Conceptually, there are two methods for designing interface between programs and data: file-oriented (flat-file) processing: A specific data file was created for each application [legacy systems] data-oriented processing: Create a single data repository to support numerous applications [current systems] Disadvantages of file-oriented processing include redundant data and programs and varying formats for storing the redundant data. File-Oriented (Flat-File) Environment User 1(Sales) Transactions Data Program 1 A,B,C User 2 (A/R) Transactions Program 2 X,B,Y User 3 (Payroll) Transactions Program 3 L,B,M Where B is a customer record. Assume B’s address changes. Each user that works with the B record must modify the address, increasing chance for error. Database Approach User 1 (Sales) Transactions Database Program 1 User 2 (A/R) Transactions Program 2 User 3 (Payroll) Transactions D B M S A, B, C, X, Y, L, M Program 3 With the database approach, any User could change B’s address - it would be changed for all users. changes. [Assume all users have authority to change the address for this example.] Data Redundancy & Flat-File Problems Data Storage - creates excessive storage costs of paper documents and/or magnetic form Data Updating - any changes or additions must be performed multiple times (see example of B’s address) Currency of Information - potential problem of failing to update all affected files Task-Data Dependency - user’s inability to obtain additional information as his or her needs change Advantages of the Database Approach Data sharing/centralized database resolves flat-file problems: No data redundancy - Data stored only once, eliminating data redundancy and reducing storage costs. Single update - Because data are in only one place, requires only single update procedure, reducing time/cost of keeping database current. Current values - Change to database made by any (authorized) user yields current data values for all other (authorized) users. Task-data independence - As users’ information needs expand beyond immediate domain, new needs can be more easily satisfied than under flat-file approach. 6 Disadvantages of the Database Approach Can be costly to implement additional hardware, software, storage, and network resources are required – yet these costs continue to drop… Can only run in certain operating environments may be unsuitable for some existing system configurations (e.g., legacy systems) 7 Internal Controls and DBMS1 Purpose of DBMS is to provide controlled access to database. DBMS is special software programmed to know which data elements each user is authorized to access and deny unauthorized requests of data. 1 DBMS = database management system 8 Query Language Query capability permits both end users and professional programmers to access data (information) in database without writing conventional programs. 9 Three Steps in Designing a Database 1. Prepare conceptual model 2. Specify logical design 3. Identify entities Identify relationships between entities Prepare ER diagram (ERD) Select logical database model (which will always be relational nowadays) Transform conceptual data model using logical database model Implement physical design Physical structures Access methods 10 Phase 1 Prepare the Conceptual Model Draw an ERD to capture the process. 11 Example of a Relationship Linking Two Entities places CUSTOMER Name ORDER Order Number Cust Number Example of 1:M relationship, using Crow’s Feet notation. We usually don’t cover the “0” relationship. Date Phase 2 Specify Logical Design Create relational tables. 13 Logical Data Structures Objective is to develop structure efficiently so data can be accessed quickly and easily. Four types of database structures are: hierarchical (tree structure) network relational object-oriented Concentrate on relational DBs. They are the systems being created at the current time. 14 The Relational Model Relational model portrays data in form of two dimensional tables (looks like Excel worksheet): relation - database table attributes/fields (data elements) - columns tuples (records) - rows data - intersection of rows and columns 15 Properly Designed Relational Tables No repeating values – Primary Key cannot repeat in a table. Attribute values in any column must all be of same class. [can’t put text in a date field] Each column (field name) in a table must be uniquely named. [can’t have 2 address fields both named Address – use Addr1 and Addr2] Each row (record) in a table must be unique in at least one attribute (primary key) 16 Relational Model Data Linkages (>1 table) No explicit pointers are present – data are viewed as collection of independent tables. “Relations” formed by attribute/field that is common to both tables in relation. This field is a “foreign key.” How to assign foreign keys: if 1:1 association, either of table’s primary keys may be foreign key. if 1:M association, primary key of the ONE side is embedded as foreign key in the MANY side. if M:M (M:N) association, create separate linking table. 17 Three Types of Anomalies (anomalies are found in Unnormalized tables) Insertion Anomaly: New item cannot be added to table until at least one entity uses particular attribute item. Deletion Anomaly: If attribute item used by only one entity is deleted, all information about that attribute item is lost. Update Anomaly: Modification on attribute must be made in each of rows in which attribute appears. Anomalies can be corrected by creating relational tables. 18 Relational Tables Various items of interest (customers, inventory, sales) stored in SEPARATE tables in database. Advantages: Removes all three anomalies Efficient use of space. Flexible. Users can form ad hoc relationships for queries. 19 Normalization Process Process that breaks up large, complex tables into smaller tables that meet two conditions: all nonkey attributes (fields) in table are dependent on primary key (PK) all nonkey attributes (fields) are independent of other nonkey attributes (fields) When unnormalized tables are split and reduced to third normal form, they are linked together by foreign keys (secondary keys). 20 all nonkey attributes (fields) in table are dependent on primary key (PK) Student ID Course Instructor ID 17256 ACCT3321 G001 ACCT 17535 ACCT3320 G003 Maria ACCT 17004 MKT3311 G005 Henry Hank MKT 17155 MKT3311 G005 80020 Hernandez Barbara CIS 17200 CIS2320 G010 80020 Hernandez Barbara CIS 17155 MKT3311 G005 80033 Johnson Kris ECON 16256 ACCT3321 S001 80033 Johnson Kris ECON 17005 ECON3315 S005 80033 Johnson Kris ECON 16500 CIS3545 M001 Lname Fname Major 80012 Garcia Maria ACCT 80012 Garcia Maria 80012 Garcia 80014 CRN This is an unnormalized table. Only Lname & Fname depend on the PK of Student ID. Other fields belong in another table(s). 21 Steps in Normalization Unnormalized Table with repeating groups First normal form (1NF) Second normal form (2NF) Third normal form (3NF) 1. Remove repeating groups Does field depend on PK? If no, remove and put in another table. Only concerns tables with composite PKs. See Fig. 9-38 When field depends on another non-key field in table. See Fig. 9-40 2. Remove partial dependencies 3. Remove transitive dependencies Accountants and Data Normalization Update anomalies can generate conflicting and obsolete database values. Insertion anomalies can result in unrecorded transactions and incomplete audit trails. Deletion anomalies can cause loss of accounting records and destruction of audit trails. Accountants should understand data normalization process and be able to determine whether database is properly normalized. Phase 3 Implement Physical Design Decide about software and hardware. 24 Distributed Data Processing 25 President CENTRALIZED COMPUTER SERVICES FUNCTION VP Marketing VP Computer Services Systems Development New Systems Development Database Administration DISTRIBUTED ORGANIZATIONAL STRUCTURE IPU Data Control Systems Maintenance VP Marketing VP Operations VP Finance Data Processing Data Computer Preparation Operations Data Library President VP Finance VP Administration Treasurer Controller IPU IPU VP Operations Manager Plant X IPU IPU Manager Plant Y IPU26 Characteristics of DDP1 Advantages: Cost reductions in hardware and data entry tasks Improved cost control responsibility Improved user satisfaction because control is closer to user level Backup of data can be improved through use of multiple data storage sites 1 DDP = Distributed Data Processing Disadvantages Loss of control (organization-wide) Mismanagement of organization-wide resources Hardware/software incompatibility Redundant tasks/data Incompatible tasks may be consolidated Lack of standards 27 Centralized Databases in DDP Environment Data are retained in central location. Remote IPUs (workstations) send requests for data. Central site services needs of remote workstations. Actual data processing is performed at remote workstation 28 Data Currency Can be an issue in DDP with centralized database During transaction processing, data will temporarily be inconsistent as record is being read and updated. Database lockout procedures are necessary to keep workstations from reading inconsistent data and/or from writing over a transaction being written by another workstation. 29 Distributed Databases: Partitioning Splits central database into segments that are distributed to their primary users Advantages: users’ control is increased by having data stored at local sites transaction processing response time is improved volume of transmitted data between workstations is reduced Reduces potential data loss from a disaster Distributed Databases: Replication Duplication of entire database for multiple workstations Effective for situations with a high degree of data sharing, but no primary user supports read-only queries. Data traffic between sites is reduced considerably. The Deadlock Phenomenon Especially a problem with partitioned databases Occurs when multiple sites lock each other out of data that they are currently using Special software needed to analyze and resolve conflicts. 32 The Deadlock Phenomenon Locked A, waiting for C Locked E, waiting for A A,B E, F C,D Locked C, waiting for E 34