Chapter 1: The Database Environment and Development Process

Objectives
◼ Define terms
◼ Name limitations of conventional file processing
◼ Explain advantages of databases
◼ Identify costs and risks of databases
◼ List components of database environment
◼ Identify categories of database applications
◼ Describe database system development life cycle
◼ Explain prototyping and agile development approaches
◼ Explain roles of individuals
◼ Explain the three-schema architecture for databases

Definitions
◼ Database: organized collection of logically related data
◼ Data: stored representations of meaningful objects and events
  ◼ Structured: numbers, text, dates
  ◼ Unstructured: images, video, documents
◼ Information: data processed to increase knowledge in the person using the data
◼ Metadata: data that describes the properties and context of user data

Figure 1-1a Data in context
Context helps users understand data

Figure 1-1b Summarized data
Graphical displays turn data into useful information that managers can use for decision making and interpretation

Metadata: descriptions of the properties or characteristics of the data, including data types, field sizes, allowable values, and data context

Database Examples
◼ Databases are involved almost everywhere in our world. For example, when we:
  ◼ Go to a bank to deposit or withdraw money
  ◼ Make a hotel or airline reservation
  ◼ Purchase something online

Database Applications
◼ The examples above are what we call traditional database applications
◼ More recent applications:
  ◼ Geographic Information Systems (GIS): store and analyze maps, weather data, and satellite images
  ◼ Multimedia databases: store images, audio clips, and video streams digitally
  ◼ Data warehouses: used in many companies to extract and analyze useful business information from very large databases to support decision making
  ◼ Real-time and active database technology: used to control industrial and manufacturing processes
  ◼ Database search techniques: applied to the World Wide Web to improve the search for information

◼ A database can be of any size and complexity. For example:
  ◼ A list of names and addresses
  ◼ The U.S. Internal Revenue Service (IRS), the federal tax authority: assuming 100 million taxpayers, each filing 5 forms with 400 characters of information per form, one year of returns takes 100,000,000 × 5 × 400 bytes = 200 GB; keeping the current return plus the past three gives the 800 GB total
  ◼ Amazon.com: about 15 million visitors per day; about 100 people are responsible for database updates

Database System
◼ Data: known facts that can be recorded and that have an implicit meaning
◼ Database: a collection of related data
◼ Database Management System (DBMS): a computerized system that enables users to create and maintain (add or drop) a database
  ◼ A software package/system that facilitates defining, constructing, manipulating, and sharing a computerized database

Typical DBMS Functionality
◼ Define: specify the data types, structures, and constraints of the data to be stored
◼ Construct or load: store the initial database contents on a secondary storage medium
◼ Manipulate the database:
  ◼ Retrieval: querying, generating reports
  ◼ Modification: insertions, deletions, and updates to its content
  ◼ Accessing the database through Web applications
◼ Share: allow multiple users and programs to access the database simultaneously

Data, Information and Knowledge (1 of 3)
◼ Data items: elementary descriptions of things, events, activities, and transactions that are recorded, classified, and stored, but not organized to convey any specific meaning.
◼ Data items can be numeric, alphanumeric, sounds, or images.
◼ E.g. a student's grade in a class.
◼ A database consists of stored data items organized for retrieval.
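
The Define, Construct, and Manipulate functions listed under Typical DBMS Functionality can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the grade table, its columns, and the sample rows are hypothetical and do not come from the slides, and SQLite stands in for a DBMS in general. The Share function is not shown, since it needs several concurrent connections.

```python
import sqlite3

# In-memory database so the sketch leaves no files behind.
conn = sqlite3.connect(":memory:")

# Define: declare the structure, data types, and constraints.
# (Table and column names are illustrative assumptions.)
conn.execute(
    """
    CREATE TABLE grade (
        student TEXT NOT NULL,
        course  TEXT NOT NULL,
        score   INTEGER NOT NULL CHECK (score BETWEEN 0 AND 100),
        PRIMARY KEY (student, course)
    )
    """
)

# Construct: load the initial database contents.
conn.executemany(
    "INSERT INTO grade (student, course, score) VALUES (?, ?, ?)",
    [("Smith", "Database", 91), ("Brown", "Database", 78), ("Smith", "Math", 85)],
)

# Manipulate (modification): update one stored data item.
conn.execute(
    "UPDATE grade SET score = 95 WHERE student = ? AND course = ?",
    ("Smith", "Database"),
)

# Manipulate (retrieval): query the stored data items.
database_grades = conn.execute(
    "SELECT student, score FROM grade WHERE course = ? ORDER BY student",
    ("Database",),
).fetchall()

# The constraint declared in the Define step is enforced by the DBMS,
# not by the application: a score of 150 is rejected.
try:
    conn.execute("INSERT INTO grade VALUES ('Brown', 'Math', 150)")
    constraint_enforced = False
except sqlite3.IntegrityError:
    constraint_enforced = True

conn.commit()
print(database_grades, constraint_enforced)
```

Each stored score is a data item in the sense of the slide above; the query result is a first step toward information.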
Data, Information and Knowledge (2 of 3)
◼ Information: data that has been organized so that it has meaning and value to the recipient.
◼ The recipient interprets the meaning and draws conclusions and implications from the information.
◼ It may convey a trend in the environment, or indicate a pattern of sales for a given period of time.
◼ Data items are processed into information by means of an application.
◼ E.g. a student's grade point average (GPA).

Data, Information and Knowledge (3 of 3)
◼ Knowledge: data and/or information that has been organized and processed to convey understanding, experience, accumulated learning, and expertise as they apply to a current problem or activity.
◼ The knowledge possessed by each individual is a product of their experience, and encompasses the norms by which they evaluate new inputs from their surroundings.
◼ E.g. the GPA of a student applying to a graduate school tells an admissions officer how good the student is only in comparison with the GPAs of other students and schools.
◼ Wisdom is the top of the data-information-knowledge-wisdom (DIKW) hierarchy. To get there, we must answer questions such as "why do something" and "what is best". In other words, wisdom is knowledge applied in action.

Organizing Data in a Computer System (1 of 2)
◼ A computer system organizes data in a hierarchy that starts with bits and bytes and progresses to fields, records, files, and databases.

Organizing Data in a Computer System (2 of 2)
◼ Bit: the smallest unit of data a computer can process (0 or 1).
◼ Byte: a group of eight bits, representing a single character, which can be a letter, a number, or a symbol.
◼ Field: a logical grouping of characters into a word, a group of words, or a complete number. For example, a student's name would appear in the name field.
◼ Record: a logical group of related fields, such as a customer's name, product sold, and hours worked.
◼ File: a logical group of related records. For example, the student records in a single course would constitute a data file for that course.
◼ Database: a logical group of related files. All students' course files could be grouped with files on students' personal histories and financial backgrounds to create a students' database.

Databases Improve Business Performance and Decision Making
◼ Businesses use their databases to:
  ◼ Keep track of basic transactions, such as paying suppliers, processing orders, keeping track of customers, and paying employees.
  ◼ Provide information that will help the company run the business more efficiently, and help managers and employees make better decisions.
◼ If a company wants to know which product is the most popular or who is its most profitable customer, the answer lies in the data.

Big Data
◼ There has been an explosion of data from Web traffic, e-mail messages, and social media content (tweets, status messages), as well as machine-generated data from sensors (used in smart meters, manufacturing sensors, and electrical meters) or from electronic trading systems.
◼ The term big data describes datasets with volumes in the petabyte and exabyte range: billions to trillions of records, all from different sources.

Databases and The Web
◼ Every time customers use the Web to place an order or view a product catalog, they are using a Web site linked to an internal corporate database.
◼ Many companies now use the Web to make some of the information in their internal databases available to customers and business partners.
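
The claim that "the answer lies in the data" (which product is most popular, who is the most profitable customer) can be made concrete with two aggregate queries. The sketch below is illustrative only: the sale table, its columns, and the rows are invented for the example, "popular" is assumed to mean most units sold, and "profit" is assumed to be quantity times the price-minus-cost margin.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical sales records: each row is one record made of several fields.
conn.execute(
    """
    CREATE TABLE sale (
        customer   TEXT,
        product    TEXT,
        quantity   INTEGER,
        unit_price REAL,
        unit_cost  REAL
    )
    """
)
conn.executemany(
    "INSERT INTO sale VALUES (?, ?, ?, ?, ?)",
    [
        ("Ada", "Desk", 2, 300.0, 200.0),
        ("Ada", "Chair", 4, 80.0, 50.0),
        ("Bo", "Chair", 10, 80.0, 50.0),
        ("Cy", "Lamp", 3, 40.0, 25.0),
    ],
)

# Most popular product, assuming popularity means total units sold.
most_popular = conn.execute(
    """
    SELECT product, SUM(quantity) AS units
    FROM sale
    GROUP BY product
    ORDER BY units DESC
    LIMIT 1
    """
).fetchone()

# Most profitable customer, assuming profit = quantity * (price - cost).
most_profitable = conn.execute(
    """
    SELECT customer, SUM(quantity * (unit_price - unit_cost)) AS profit
    FROM sale
    GROUP BY customer
    ORDER BY profit DESC
    LIMIT 1
    """
).fetchone()

print(most_popular, most_profitable)
```

With these sample rows the customer who buys the most units (Bo) is not the most profitable one (Ada), which is the kind of distinction the raw transaction records alone do not make obvious.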
Database System
◼ Database system: DBMS + database

[Figure: Simplified database system environment]

Database Management System
◼ Manages interaction between end users and the database
Source: Database Systems: Design, Implementation, & Management (Rob & Coronel); S511 Session 2, IU-SLIS

Database System Environment
▪ Hardware
▪ Software
  - OS
  - DBMS
  - Applications
▪ People
▪ Procedures
▪ Data

Types of Databases
◼ Databases can be categorized according to:
  1. Number of users
  2. Location
  3. Data sensitivity
  4. Data structure
  5. Others

Types of Databases: by number of users
❑ Single-user database: supports one user at a time
  ❑ Desktop database: runs on a PC
❑ Multiuser database: supports multiple users at the same time
  ❑ Workgroup database: supports a small number of users or a specific department (2 ≤ N < 50 users); associated with one department
  ❑ Enterprise database: supports many users across many departments (50 ≤ N < M users); associated with more than one department

Types of Databases (continued): by location
◼ Centralized: supports data located at a single site
◼ Distributed: supports data distributed across several sites
◼ Cloud: created and maintained using cloud data services that provide defined performance measures for the database

[Figure: Centralized, distributed, and cloud databases]

Types of Databases (continued): by data sensitivity
❑ Operational database: designed to support a company's day-to-day operations
  ❑ Transactional DB
  ❑ Production DB
❑ Analytical database: stores historical data and business metrics used exclusively for tactical or strategic decision making
  ❑ Data warehouse: stores data in a format optimized for decision support

Types of Databases (continued): by degree of structure
❑ Unstructured data: exists in its original state
❑ Structured data: results from formatting
  ❑ Structure is applied based on the type of processing to be performed
❑ Semistructured data: has been processed to some extent

Types of Databases (continued): others
❑ Extensible Markup Language (XML): represents data elements in textual format
❑ General-purpose databases: contain a wide variety of data used in multiple disciplines
❑ Discipline-specific databases: contain data focused on specific subject areas

Outline
◼ Database Introduction
◼ An Example
◼ Characteristics of the Database
◼ Actors on the Scene
◼ Advantages of using the DBMS approach

A UNIVERSITY example
◼ A UNIVERSITY database maintains information concerning students, courses, and grades in a university environment. We have:
  ◼ STUDENT file: stores data on each student
  ◼ COURSE file: stores data on each course
  ◼ SECTION file: stores data on each section of each course
  ◼ GRADE_REPORT file: stores the grades that students receive
  ◼ PREREQUISITE file: stores the prerequisites of each course

[Figure: Example of a simple database]

Database manipulation
◼ Database manipulation involves querying and updating
◼ Examples of querying:
  ◼ Retrieve a transcript
  ◼ List the prerequisites of the "Database" course
◼ Examples of updating:
  ◼ Enter a grade of "A" for "Smith" in the "Database" course

Database System vs. File System

Database vs. File
◼ In the database approach, a single repository of data is maintained that is defined once and then accessed by various users
◼ The major differences between the database approach and the file approach are:
  1. Self-describing nature of a database system
  2. Insulation between programs and data
  3. Support of multiple views of the data
  4. Sharing of data and multiuser transaction processing

Self-describing nature of a database system
◼ A database system contains not only the database itself but also a complete definition of the database structure and constraints
◼ This definition is stored in the DBMS catalog. The information stored in the catalog is called metadata (data about data), and it describes the structure of the primary database.

[Figure: Example of simplified metadata]

Insulation between programs and data
◼ In file processing, any change to the structure of a file may require changing all programs that access the file
◼ In a database system, the structure of data files is stored in the DBMS catalog, separately from the access programs
◼ This is called program-data independence

Program-data independence
◼ For example, a file access program may be written in such a way that it can access only STUDENT records of the structure shown in Figure 1.4. If we want to add another piece of data to each STUDENT record, say the Birth_date, such a program will no longer work and must be changed.
◼ By contrast, in a DBMS environment, we only need to change the description of STUDENT records in the catalog
◼ The characteristic that allows program-data independence and program-operation independence is called data abstraction

Support of multiple views of the data
◼ Each user may see a different view of the database, which describes only the data of interest to that user
◼ A view is a subset of the database

[Figure: Support of multiple views of the data]

Sharing of data and multi-user transaction processing
◼ A multiuser DBMS allows a set of concurrent users to retrieve from and update the database.
◼ Concurrency control within the DBMS guarantees that each transaction is either executed correctly or aborted
◼ For example, when several reservation clerks try to assign a seat on an airplane flight, the DBMS must ensure that each seat is assigned to only one passenger. Applications of this type are generally called online transaction processing (OLTP).

Disadvantages of file processing
a) Data redundancy and inconsistency: redundancy means duplication of data; inconsistency means that the duplicated values differ.
b) Integrity problems: data integrity means that the data values in the database are accurate in the sense that they satisfy certain rules.
c) Security problems: data security means preventing unauthorized users from accessing data.
d) Difficulty in accessing data: arises whenever there is no application program for a specific task.
e) Data isolation: arises because data is scattered across various files in various formats.
These disadvantages of earlier data processing systems created the need for a more effective approach, and the DBMS concept emerged to meet that need in a large number of organizations.

Database Users
◼ Database administrators:
  ◼ Responsible for authorizing access to the database, coordinating and monitoring its use, acquiring software and hardware resources, controlling its use, and monitoring the efficiency of operations.
◼ Database designers:
  ◼ Responsible for defining the content, the structure, the constraints, and the functions or transactions against the database. They must communicate with the end users and understand their needs.
◼ End users:
  ◼ Access the database for querying, updating, and generating reports
◼ System analysts:
  ◼ Determine the requirements of end users

Outline
◼ Database Introduction
◼ An Example
◼ Characteristics of the Database
◼ Actors on the Scene
◼ DBMS Functions
◼ Advantages of using the DBMS approach

DBMS Functions
◼ A DBMS performs functions that guarantee the integrity and consistency of data
◼ Data dictionary management
  ◼ Defines data elements and their relationships
◼ Data storage management
  ◼ Stores data and related data entry forms, report definitions, etc.

DBMS Functions (continued)
◼ Multiuser access control
  ◼ Uses sophisticated algorithms to ensure that multiple users can access the database concurrently without compromising the integrity of the database
◼ Security management
  ◼ Enforces user security and data privacy within the database
◼ Backup and recovery management
  ◼ Provides backup and data recovery procedures

DBMS Functions (continued)
◼ Database access languages and application programming interfaces
  ◼ Provide data access through a query language
◼ Database communication interfaces
  ◼ Allow the database to accept end-user requests via multiple, different network environments

1.5 Workers behind the Scene
◼ DBMS system designers and implementers design and implement the DBMS modules and interfaces as a software package.
◼ Tool developers design and implement tools: the software packages that facilitate database modeling and design, database system design, and improved performance.
◼ Operators and maintenance personnel (system administration personnel) are responsible for the actual running and maintenance of the hardware and software environment for the database system.
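
Data dictionary management and the self-describing catalog discussed earlier can be observed directly in a small DBMS. The sketch below uses Python's sqlite3 module as a stand-in; the STUDENT column names are assumptions made for illustration (the slides do not reproduce Figure 1.4), and `column_names` and `majors` are hypothetical helpers. It reads the table description back from the catalog, adds a birth_date column, and shows that an existing query keeps working unchanged, which is the program-data independence argument in miniature.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Illustrative STUDENT structure; the column names are assumptions.
conn.execute(
    """
    CREATE TABLE student (
        name           TEXT,
        student_number INTEGER PRIMARY KEY,
        class_year     INTEGER,
        major          TEXT
    )
    """
)
conn.execute("INSERT INTO student VALUES ('Smith', 17, 1, 'CS')")


def column_names(connection, table):
    """Read the table description from the catalog (metadata)."""
    # PRAGMA table_info returns one row per column; index 1 is the name.
    return [row[1] for row in connection.execute(f"PRAGMA table_info({table})")]


def majors(connection):
    """An 'application program' that names only the columns it needs."""
    return connection.execute(
        "SELECT name, major FROM student ORDER BY name"
    ).fetchall()


before_columns = column_names(conn, "student")
before_result = majors(conn)

# Change the stored description of STUDENT records, not the program.
conn.execute("ALTER TABLE student ADD COLUMN birth_date TEXT")

after_columns = column_names(conn, "student")
after_result = majors(conn)

print(before_columns, after_columns)
print(before_result == after_result)
```

A program that read fixed-layout STUDENT records straight from a file would instead need to be rewritten after the same change.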
Controlling Redundancy
◼ Controlling redundancy is one of the most important reasons to use a DBMS
◼ In the traditional file approach, each group independently keeps its own files.
  ◼ For example, the accounting office keeps data on registration and billing information, whereas the registration office keeps track of registration, student courses, and grades.
◼ This redundancy in storing the same data multiple times leads to several problems:
  1. A single logical update must be performed several times
  2. Storage space is wasted
  3. The files that represent the same data may become inconsistent

Other Advantages of using the DBMS approach
◼ Restricting unauthorized access to data (users or user groups are given account numbers protected by passwords)
◼ Providing storage structures for efficient query processing; auxiliary files called indexes are often used for this purpose
◼ Providing backup and recovery services
◼ Providing multiple interfaces to different classes of users
◼ Representing complex relationships among data
◼ Providing persistent storage for program objects and data structures

When Not to Use a DBMS
◼ Overhead costs:
  ◼ High initial investment in hardware, software, and training.
  ◼ The generality that a DBMS provides for defining and processing data.
  ◼ Overhead for providing security, concurrency control, recovery, and integrity functions.
◼ Additional problems may arise:
  ◼ If the database designers and DBA do not properly design the database
  ◼ If database system applications are not implemented properly
◼ It may be preferable to use regular files under the following circumstances:
  ◼ The database and applications are simple, well defined, and not expected to change.
  ◼ There are stringent real-time requirements for some programs that may not be met because of DBMS overhead.
  ◼ Multiple-user access to data is not required.

Disadvantages of File Processing
◼ Program-data dependence
  ◼ All programs maintain metadata for each file they use
◼ Duplication of data
  ◼ Different systems/programs have separate copies of the same data
◼ Limited data sharing
  ◼ No centralized control of data
◼ Lengthy development times
  ◼ Programmers must design their own file formats
◼ Excessive program maintenance
  ◼ 80% of information systems budget

Problems with Data Dependency
◼ Each application programmer must maintain his/her own data
◼ Each application program needs to include code for the metadata of each file
◼ Each application program must have its own processing routines for reading, inserting, updating, and deleting data
◼ Lack of coordination and central control
◼ Non-standard file formats

[Figure: Duplicate data]

Problems with Data Redundancy
◼ Waste of space to have duplicate data
◼ Causes more maintenance headaches
◼ The biggest problem:
  ◼ Data changes in one file could cause inconsistencies
  ◼ Compromises in data integrity

SOLUTION: The DATABASE Approach
◼ Central repository of shared data
◼ Data is managed by a controlling agent
◼ Stored in a standardized, convenient form
Requires a Database Management System (DBMS)

Database Management System
◼ A software system that is used to create, maintain, and provide controlled access to user databases
[Figure: the order filing, invoicing, and payroll systems all reach a central database through the DBMS. The central database contains employee, order, inventory, pricing, and customer data.]
DBMS manages data resources like an operating system manages hardware resources

Advantages of the Database Approach
◼ Program-data independence
◼ Planned data redundancy
◼ Improved data consistency
◼ Improved data sharing
◼ Increased application development productivity
◼ Enforcement of standards
◼ Improved data quality
◼ Improved data accessibility and responsiveness
◼ Reduced program maintenance
◼ Improved decision support

Costs and Risks of the Database Approach
◼ New, specialized personnel
◼ Installation and management cost and complexity
◼ Conversion costs
◼ Need for explicit backup and recovery
◼ Organizational conflict

Elements of the Database Approach
◼ Data models
  ◼ Graphical system capturing nature and relationship of data
  ◼ Enterprise data model: high-level entities and relationships for the organization
  ◼ Project data model: more detailed view, matching data structure in database or data warehouse
◼ Entities
  ◼ Noun form describing a person, place, object, event, or concept
  ◼ Composed of attributes
◼ Relationships
  ◼ Between entities
  ◼ Usually one-to-many (1:M) or many-to-many (M:N)
◼ Relational databases
  ◼ Database technology involving tables (relations) representing entities and primary/foreign keys representing relationships

Figure 1-3 Comparison of enterprise and project level data models
◼ Segment of an enterprise data model
◼ Segment of a project-level data model
◼ One customer may place many orders, but each order is placed by a single customer → one-to-many relationship
◼ One order has many order lines; each order line is associated with a single order → one-to-many relationship
◼ One product can be in many order lines; each order line refers to a single product → one-to-many relationship
◼ Therefore, one order involves many products and one product is involved in many orders → many-to-many relationship

Figure 1-5 Components of the Database Environment

Components of the Database Environment
◼ CASE tools: computer-aided software engineering
◼ Repository: centralized storehouse of metadata
◼ Database Management System (DBMS): software for managing the database
◼ Database: storehouse of the data
◼ Application programs: software
  using the data
◼ User interface: text and graphical displays to users
◼ Data/database administrators: personnel responsible for maintaining the database
◼ System developers: personnel responsible for designing databases and software
◼ End users: people who use the applications and databases

The Range of Database Applications
◼ Personal databases
◼ Two-tier client/server databases
◼ Multitier client/server databases
◼ Enterprise applications
  ◼ Enterprise resource planning (ERP) systems
  ◼ Data warehousing implementations

Figure 1-6 Two-tier database with local area network

Figure 1-7 Three-tiered client/server database architecture

Enterprise Database Applications
◼ Enterprise Resource Planning (ERP)
  ◼ Integrates all enterprise functions (manufacturing, finance, sales, marketing, inventory, accounting, human resources)
◼ Data warehouse
  ◼ Integrated decision support system derived from various operational databases

Figure 1-8a Evolution of database technologies

Enterprise Data Model
◼ First step in database development
◼ Specifies scope and general content
◼ Overall picture of organizational data at high level of abstraction
◼ Entity-relationship diagram
◼ Descriptions of entity types
◼ Relationships between entities
◼ Business rules

FIGURE 1-9 Example business function-to-data entity matrix

Two Approaches to Database and IS Development
◼ SDLC
  ◼ System Development Life Cycle
  ◼ Detailed, well-planned development process
  ◼ Time-consuming, but comprehensive
  ◼ Long development cycle
◼ Prototyping
  ◼ Rapid application development (RAD)
  ◼ Cursory attempt at conceptual data modeling
  ◼ Define database during development of initial prototype
  ◼ Repeat implementation and maintenance activities with new prototype versions

Systems Development Life Cycle (see also Figure 1-10)
◼ Planning
◼ Analysis
◼ Logical Design
◼ Physical Design
◼ Implementation
◼ Maintenance
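
As a concrete picture of what the design phases eventually hand to the DBMS, the customer, order, order line, and product relationships described earlier in the chapter could be defined roughly as follows. This is a hedged sketch, not the textbook's schema: the table names, column names, and sample rows are assumptions, and SQLite (via Python's sqlite3) stands in for the target DBMS. Primary and foreign keys carry the one-to-many relationships, and the order_line table resolves the many-to-many relationship between orders and products.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

# Illustrative schema; all names are assumptions.
conn.executescript(
    """
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE product (
        product_id  INTEGER PRIMARY KEY,
        description TEXT NOT NULL
    );
    -- One customer places many orders; each order has one customer (1:M).
    CREATE TABLE customer_order (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id)
    );
    -- Each order line belongs to one order and one product, so this table
    -- resolves the many-to-many relationship between orders and products.
    CREATE TABLE order_line (
        order_id   INTEGER NOT NULL REFERENCES customer_order(order_id),
        product_id INTEGER NOT NULL REFERENCES product(product_id),
        quantity   INTEGER NOT NULL,
        PRIMARY KEY (order_id, product_id)
    );
    INSERT INTO customer VALUES (1, 'Ada');
    INSERT INTO product VALUES (1, 'Desk'), (2, 'Chair');
    INSERT INTO customer_order VALUES (100, 1), (101, 1);
    INSERT INTO order_line VALUES (100, 1, 1), (100, 2, 4), (101, 2, 2);
    """
)

# One order involves many products...
products_in_order_100 = conn.execute(
    """
    SELECT p.description
    FROM order_line AS ol
    JOIN product AS p ON p.product_id = ol.product_id
    WHERE ol.order_id = 100
    ORDER BY p.description
    """
).fetchall()

# ...and one product is involved in many orders.
orders_with_chair = conn.execute(
    "SELECT order_id FROM order_line WHERE product_id = 2 ORDER BY order_id"
).fetchall()

# The foreign key rejects an order placed by a customer that does not exist.
try:
    conn.execute("INSERT INTO customer_order VALUES (102, 999)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(products_in_order_100, orders_with_chair, orphan_rejected)
```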
Systems Development Life Cycle (see also Figure 1-10) (cont.)
◼ Planning
  ◼ Purpose: preliminary understanding
  ◼ Deliverable: request for study
  ◼ Database activity: enterprise modeling and early conceptual data modeling
◼ Analysis
  ◼ Purpose: thorough requirements analysis and structuring
  ◼ Deliverable: functional system specifications
  ◼ Database activity: thorough and integrated conceptual data modeling
◼ Logical Design
  ◼ Purpose: information requirements elicitation and structure
  ◼ Deliverable: detailed design specifications
  ◼ Database activity: logical database design (transactions, forms, displays, views, data integrity and security)
◼ Physical Design
  ◼ Purpose: develop technology and organizational specifications
  ◼ Deliverable: program/data structures, technology purchases, organization redesigns
  ◼ Database activity: physical database design (define database to DBMS, physical data organization, database processing programs)
◼ Implementation
  ◼ Purpose: programming, testing, training, installation, documenting
  ◼ Deliverable: operational programs, documentation, training materials
  ◼ Database activity: database implementation, including coded programs, documentation, installation and conversion
◼ Maintenance
  ◼ Purpose: monitor, repair, enhance
  ◼ Deliverable: periodic audits
  ◼ Database activity: database maintenance, performance analysis and tuning, error corrections

Prototyping Database Methodology (Figure 1-11)

Database Schema
◼ External schema
  ◼ User views
  ◼ Subsets of conceptual schema
  ◼ Can be determined from business-function/data entity matrices
  ◼ DBA determines schema for different users
◼ Conceptual schema
  ◼ E-R models: covered in Chapters 2 and 3
◼ Internal schema
  ◼ Logical structures: covered in Chapter 4
  ◼ Physical structures: covered in Chapter 5

Figure 1-12 Three-schema architecture
◼ Different people have different views of the database; these are the external schema
◼ The internal schema is the underlying design and implementation

Managing Projects
◼ Project: a planned undertaking of related activities to reach an objective that has a beginning and an end
◼ Involves use of review points for:
  ◼ Validation of satisfactory progress
  ◼ Step back from detail to overall view
  ◼ Renew commitment of stakeholders
◼ Incremental commitment: review of systems development project after each development phase with rejustification after each phase

Managing Projects: People Involved
◼ Business analysts
◼ Systems analysts
◼ Database analysts and data modelers
◼ Users
◼ Programmers
◼ Database architects
◼ Data administrators
◼ Project managers
◼ Other technical experts

FIGURE 1-13 Computer System for Pine Valley Furniture Company

All rights reserved.
No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of the publisher. Printed in the United States of America. Copyright © 2011 Pearson Education, Inc. Publishing as Prentice Hall