IS 503 DATABASE CONCEPTS AND APPLICATIONS

IS 503: GRADING & OTHER…
25% Homework
25% Midterm
25% Final
20% Project
Contact: betincan@metu.edu.tr
The course materials are at ODTUClass. You are expected to upload your assignments there.
Course book: Ramez Elmasri, ‘Fundamentals of Database Systems’, 6th Edition, 2010
Slides are mostly based on Elmasri’s presentations.

SYLLABUS
Chapter 1: Introduction to Conceptual Modeling
Chapter 2: Database System Concepts and Architecture
Chapter 3: Database modeling using the Entity-Relationship (ER) and Extended ER (EER)
Chapter 4: The Relational Data Model and Relational Database Constraints
Chapter 5: Relational Database Design by ER-to-Relational Mapping
Chapter 6: EER-to-Relational Mapping
Chapter 7: The Relational Algebra and Calculus
Chapter 8: SQL
Chapter 9: SQL Programming Lab and Examples
Chapter 10: Functional Dependencies and Normalization for Relational Databases
Chapter 11: Relational Database Design Algorithms and Further Dependencies
Chapter 12: Introduction to Transaction Processing Concepts and Theory
Slide 1-3

CHAPTER 1
Introduction and Conceptual Modeling
Copyright © 2004 Pearson Education, Inc.

BASIC DEFINITIONS
Data: Known facts that can be recorded and have an implicit meaning.
Database: A collection of related data.
Mini-world: Some part of the real world about which data is stored in a database. For example, student grades and transcripts at a university.
Database Management System (DBMS): A software package/system to facilitate the creation and maintenance of a computerized database.
Database System: The DBMS software together with the data itself. Sometimes, the applications are also included.
Slide 1-5

EXAMPLE OF A DATABASE (WITH A CONCEPTUAL DATA MODEL)
Mini-world for the example: Part of a UNIVERSITY environment.
Some mini-world entities:
  STUDENTs
  COURSEs
  SECTIONs (of COURSEs)
  (academic) DEPARTMENTs
  INSTRUCTORs
Slide 1-6

EXAMPLE
A database that stores student information

Name   StudentNumber  Class  Major
Smith  17             1      CS
Brown  8              2      CS
Slide 1-7

EXAMPLE
Suppose we have the following information in our database:

Student
Name   StudentNumber  Class  Major
Smith  17             1      CS
Brown  8              2      CS

Section
SectionIdentifier  CourseNumber  Semester  Year  Instructor
85                 MATH2410      Fall      98    King
92                 CS1310        Fall      98    Anderson
102                CS3320        Spring    99    Knuth
112                MATH2410      Fall      99    Chang
119                CS1310        Fall      99    Anderson
135                CS3380        Fall      99    Stone

Grade Report
StudentNumber  SectionIdentifier  Grade
17             112                B
17             119                C
8              85                 A
8              92                 A
8              102                B
8              135                A
Slide 1-8

EXAMPLE OF A DATABASE (WITH A CONCEPTUAL DATA MODEL)
Some mini-world relationships:
  SECTIONs are of specific COURSEs
  STUDENTs take SECTIONs
  COURSEs have prerequisite COURSEs
  INSTRUCTORs teach SECTIONs
  COURSEs are offered by DEPARTMENTs
  STUDENTs major in DEPARTMENTs
Note: The above could be expressed in the ENTITY-RELATIONSHIP data model.
Slide 1-9

TYPICAL DBMS FUNCTIONALITY
Define a database: in terms of data types, structures and constraints
Construct or load the database on a secondary storage medium
Manipulate the database: querying, generating reports, insertions, deletions and modifications to its content
Concurrent processing and sharing by a set of users and programs, while keeping all data valid and consistent
Slide 1-10

TYPICAL DBMS FUNCTIONALITY
Protection or security measures to prevent unauthorized access
Protection against hardware and software malfunction (crashes)
Security against unauthorized or malicious access
Presentation and visualization of data
Slide 1-11

MAIN CHARACTERISTICS OF THE DATABASE APPROACH
Self-describing nature of a database system: A DBMS catalog stores the description of the database. The description is called metadata. This allows the DBMS software to work with different databases.
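
As a minimal sketch of the self-describing idea, the snippet below uses Python's built-in sqlite3 module (SQLite is an assumption here; the slides do not name a specific DBMS). It builds the STUDENT table from the example and then reads the table and column descriptions back from SQLite's own catalog instead of hard-coding them. The helper names `build_student_db` and `describe_database` are hypothetical, chosen only for this illustration.

```python
import sqlite3


def build_student_db():
    """Create an in-memory database holding the STUDENT example table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE STUDENT ("
        "Name TEXT, StudentNumber INTEGER, Class INTEGER, Major TEXT)"
    )
    conn.executemany(
        "INSERT INTO STUDENT VALUES (?, ?, ?, ?)",
        [("Smith", 17, 1, "CS"), ("Brown", 8, 2, "CS")],
    )
    conn.commit()
    return conn


def describe_database(conn):
    """Return {table_name: [(column_name, declared_type), ...]} from the catalog."""
    catalog = {}
    # sqlite_master is SQLite's catalog table: one row per table, index, view, ...
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table_name,) in tables:
        # PRAGMA table_info yields rows (cid, name, type, notnull, dflt_value, pk).
        columns = conn.execute(f'PRAGMA table_info("{table_name}")').fetchall()
        catalog[table_name] = [(col[1], col[2]) for col in columns]
    return catalog


if __name__ == "__main__":
    for table, columns in describe_database(build_student_db()).items():
        print(table, columns)
```

Because `describe_database` knows nothing about STUDENT in advance, the same function would work unchanged against any other SQLite database, which is the point the slide makes about the catalog.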
Slide 1-12

A database that stores student information

Name   StudentNumber  Class  Major
Smith  17             1      CS
Brown  8              2      CS

Internal storage format for a STUDENT record

Data Item Name  Starting Position in Record  Length in Characters (bytes)
Name            1                            30
StudentNumber   31                           4
Class           35                           4
Major           39                           4
Slide 1-13

MAIN CHARACTERISTICS OF THE DATABASE APPROACH
Self-describing nature of a database system: A DBMS catalog stores the description of the database. The description is called metadata. This allows the DBMS software to work with different databases.
Insulation between programs and data: Called program-data independence. Allows changing data storage structures and operations without having to change the DBMS access programs.
Slide 1-14

MAIN CHARACTERISTICS OF THE DATABASE APPROACH
Data Abstraction: A data model is used to hide storage details and present the users with a conceptual view of the database.
Support of multiple views of the data: Each user may see a different view of the database, which describes only the data of interest to that user.
Slide 1-15

EXAMPLE
Suppose we have the following information in our database:

Student
Name   StudentNumber  Class  Major
Smith  17             1      CS
Brown  8              2      CS

Section
SectionIdentifier  CourseNumber  Semester  Year  Instructor
85                 MATH2410      Fall      98    King
92                 CS1310        Fall      98    Anderson
102                CS3320        Spring    99    Knuth
112                MATH2410      Fall      99    Chang
119                CS1310        Fall      99    Anderson
135                CS3380        Fall      99    Stone

Grade Report
StudentNumber  SectionIdentifier  Grade
17             112                B
17             119                C
8              85                 A
8              92                 A
8              102                B
8              135                A
Slide 1-16

EXAMPLE
One view for those who want to see the transcript of the students

Student Transcript
StudentName  CourseNumber  Grade  Semester  Year  SectionId
Smith        CS1310        C      Fall      99    119
             MATH2410      B      Fall      99    112
Brown        MATH2410      A      Fall      98    85
             CS1310        A      Fall      98    92
             CS3320        B      Spring    99    102
             CS3380        A      Fall      99    135
Slide 1-17

MAIN CHARACTERISTICS OF THE DATABASE APPROACH
Sharing of data and multiuser transaction processing: allowing a set of concurrent users to retrieve and to update the database.
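
The Student Transcript view from the EXAMPLE slide can be sketched with Python's built-in sqlite3 module (SQLite is an assumption; the slides do not prescribe a DBMS). The table names STUDENT, SECTION and GRADE_REPORT, the view name TRANSCRIPT, and the helper functions are hypothetical names chosen for this illustration; the rows are the ones shown on the slides.

```python
import sqlite3


def build_university_db():
    """Create the three example tables plus a TRANSCRIPT view over them."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE STUDENT (
            Name TEXT, StudentNumber INTEGER, Class INTEGER, Major TEXT);
        CREATE TABLE SECTION (
            SectionIdentifier INTEGER, CourseNumber TEXT,
            Semester TEXT, Year INTEGER, Instructor TEXT);
        CREATE TABLE GRADE_REPORT (
            StudentNumber INTEGER, SectionIdentifier INTEGER, Grade TEXT);

        -- The view stores no data of its own; it is a named query.
        CREATE VIEW TRANSCRIPT AS
            SELECT s.Name AS StudentName, sec.CourseNumber, g.Grade,
                   sec.Semester, sec.Year,
                   sec.SectionIdentifier AS SectionId
            FROM STUDENT s
            JOIN GRADE_REPORT g ON g.StudentNumber = s.StudentNumber
            JOIN SECTION sec ON sec.SectionIdentifier = g.SectionIdentifier;
        """
    )
    conn.executemany(
        "INSERT INTO STUDENT VALUES (?, ?, ?, ?)",
        [("Smith", 17, 1, "CS"), ("Brown", 8, 2, "CS")],
    )
    conn.executemany(
        "INSERT INTO SECTION VALUES (?, ?, ?, ?, ?)",
        [
            (85, "MATH2410", "Fall", 98, "King"),
            (92, "CS1310", "Fall", 98, "Anderson"),
            (102, "CS3320", "Spring", 99, "Knuth"),
            (112, "MATH2410", "Fall", 99, "Chang"),
            (119, "CS1310", "Fall", 99, "Anderson"),
            (135, "CS3380", "Fall", 99, "Stone"),
        ],
    )
    conn.executemany(
        "INSERT INTO GRADE_REPORT VALUES (?, ?, ?)",
        [(17, 112, "B"), (17, 119, "C"), (8, 85, "A"),
         (8, 92, "A"), (8, 102, "B"), (8, 135, "A")],
    )
    conn.commit()
    return conn


def transcript_for(conn, student_name):
    """Return one student's transcript rows, ordered by section identifier."""
    return conn.execute(
        "SELECT CourseNumber, Grade, Semester, Year, SectionId "
        "FROM TRANSCRIPT WHERE StudentName = ? ORDER BY SectionId",
        (student_name,),
    ).fetchall()


if __name__ == "__main__":
    conn = build_university_db()
    for name in ("Smith", "Brown"):
        print(name, transcript_for(conn, name))
```

A user who only needs transcripts queries TRANSCRIPT and never has to know how the data is split across the three underlying tables.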
Concurrency control within the DBMS guarantees that each transaction is correctly executed or completely aborted.
OLTP (Online Transaction Processing) is a major part of database applications.
Slide 1-18

MAIN CHARACTERISTICS OF THE DATABASE APPROACH
Transaction: an executing program or process that includes one or more database accesses, such as reading or updating of database records.
The isolation property ensures that each transaction appears to execute in isolation from other transactions, even though hundreds of transactions may be executing concurrently.
Slide 1-19

Why use a database system instead of a file?
Due to characteristics of the database approach:
  Self-describing nature of a database system
  Insulation between programs and data
  Sharing of data and multiuser transaction processing
  Support of multiple views of the data
Slide 1-20

ADVANTAGES OF USING THE DATABASE APPROACH
Enforcing integrity constraints on the database. E.g.:
  The value of the Class data item within each STUDENT record must be an integer between 1 and 5.
  The value of Name must be a string of no more than 30 alphabetic characters.
Slide 1-25

ADVANTAGES OF USING THE DATABASE APPROACH
Controlling redundancy in data storage and in development and maintenance efforts.
Redundancy is where the same data is stored in more than one file, leading to a waste of space and possible integrity errors. It is the duplication of data in different files.
Hazards of Redundancy
1. Duplication of space and maintenance effort, e.g. an update to “grade” should be reflected in all places where grade is stored
2. Prone to inconsistencies
Slide 1-26

ADVANTAGES OF USING THE DATABASE APPROACH
Controlling redundancy in data storage and in development and maintenance efforts.
Redundancy is where the same data is stored in more than one file.
Hazards of Redundancy
1. Duplication of space and maintenance effort
2. Prone to inconsistencies
   May forget to update in all places where “grade” is stored
   Still may be inconsistent because updates are applied independently by each user group. E.g. group1 enters the grade as A and group2 enters the grade erroneously as B.
Store each logical data item in only one place: data normalization
Slide 1-27

ADVANTAGES OF USING THE DATABASE APPROACH
Controlling redundancy in data storage and in development and maintenance efforts.
Redundancy
Duplication is where more than one copy of the same record occurs or there is duplication of at least one attribute value.
If the duplicated data can be removed without causing a loss of information, then this duplication may be acceptable.
When the duplication occurs in different files it is called Data Redundancy.
Slide 1-28

ADVANTAGES OF USING THE DATABASE APPROACH
Controlling redundancy in data storage and in development and maintenance efforts.
Redundancy
Duplication
Data Integrity (or consistency) is the problem of ensuring that the data is accurate.
An inconsistency between two entries that are intended to represent the same "fact" is known as an integrity error. This can only arise when there is data redundancy or, worse still, data duplication.
Data Integrity is brought under control through elimination of Data Redundancy.
Slide 1-29

ADVANTAGES OF USING THE DATABASE APPROACH
Enforcing integrity constraints on the database.
Controlling redundancy in data storage and in development and maintenance efforts.
Sharing of data among multiple users.
Restricting unauthorized access to data.
Providing persistent storage for program objects.
Providing storage structures for efficient query processing.
Slide 1-30

ADVANTAGES OF USING THE DATABASE APPROACH
Providing backup and recovery services.
Providing multiple interfaces to different classes of users.
Representing complex relationships among data.
Drawing inferences and actions using rules, e.g. determine when students are on probation.
Slide 1-31

ADDITIONAL IMPLICATIONS OF USING THE DATABASE APPROACH
Potential for enforcing standards: this is crucial for the success of database applications in large organizations. Standards refer to data item names, display formats, screens, report structures, metadata (description of data), etc.
Reduced application development time: the incremental time to add each new application is reduced.
Slide 1-32

ADDITIONAL IMPLICATIONS OF USING THE DATABASE APPROACH
Flexibility to change data structures: the database structure may evolve as new requirements are defined.
Availability of up-to-date information: very important for on-line transaction systems such as airline, hotel, and car reservations.
Economies of scale: by consolidating data and applications across departments, wasteful overlap of resources and personnel can be avoided.
Slide 1-33

EXTENDING DATABASE CAPABILITIES
New functionality is being added to DBMSs in the following areas:
  Scientific applications
  Image storage and management
  Audio and video data management
  Data mining
  Spatial data management
  Time series and historical data management
The above gives rise to new research and development in incorporating new data types, complex data structures, new operations, and storage and indexing schemes in database systems.
Slide 1-36

WHEN NOT TO USE A DBMS
Main inhibitors (costs) of using a DBMS:
  High initial investment and possible need for additional hardware.
  Overhead for providing generality, security, concurrency control, recovery, and integrity functions.
When a DBMS may be unnecessary:
  If the database and applications are simple, well defined, and not expected to change.
  If there are stringent real-time requirements that may not be met because of DBMS overhead.
  If access to data by multiple users is not required.
Slide 1-37

WHEN NOT TO USE A DBMS
When no DBMS may suffice:
  If the database system is not able to handle the complexity of data because of modeling limitations
  If the database users need special operations not supported by the DBMS
Slide 1-38
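
As a closing sketch for this chapter, the snippet below illustrates two ideas from the earlier slides using Python's built-in sqlite3 module (SQLite is an assumption; the slides do not name a DBMS): integrity constraints declared once in the database (Class between 1 and 5, Name no longer than 30 characters), and a transaction that is either fully applied or completely aborted. The helper names and the two extra student rows in the usage part are hypothetical, invented only for this illustration, and the "alphabetic characters" part of the Name rule is left out to keep the sketch short.

```python
import sqlite3


def make_student_table():
    """Create an in-memory STUDENT table with declarative integrity constraints."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE STUDENT (
            Name          TEXT    NOT NULL CHECK (length(Name) <= 30),
            StudentNumber INTEGER PRIMARY KEY,
            Class         INTEGER NOT NULL CHECK (Class BETWEEN 1 AND 5),
            Major         TEXT
        )
        """
    )
    conn.commit()
    return conn


def insert_students(conn, rows):
    """Insert all rows as one transaction.

    Returns True if every row was accepted and the transaction committed,
    False if any row violated a constraint and the whole batch was rolled back.
    """
    try:
        # Used as a context manager, the connection commits on success and
        # rolls back if an exception leaves the block.
        with conn:
            conn.executemany(
                "INSERT INTO STUDENT (Name, StudentNumber, Class, Major) "
                "VALUES (?, ?, ?, ?)",
                rows,
            )
        return True
    except sqlite3.IntegrityError:
        return False


def count_students(conn):
    """Return the number of rows currently stored in STUDENT."""
    return conn.execute("SELECT COUNT(*) FROM STUDENT").fetchone()[0]


if __name__ == "__main__":
    conn = make_student_table()
    print(insert_students(conn, [("Smith", 17, 1, "CS"), ("Brown", 8, 2, "CS")]))
    # Hypothetical rows: the second one has Class = 9, which violates the CHECK.
    print(insert_students(conn, [("Lee", 21, 3, "CS"), ("Kaya", 22, 9, "CS")]))
    print(count_students(conn))
```

In the second call the valid row is not kept either: the constraint violation aborts the whole transaction, so the table still holds only the two rows from the first call. The rule lives in the database rather than in each application program, so every program that writes to STUDENT is checked the same way.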