ITEC 3220A Using and Designing Database Systems Instructor: Prof. Z. Yang Course Website: http://people.math.yorku.ca/~zyang/itec 3220a.htm Office: TEL 3049 Course Objective • Examine databases, trends in database management systems and their application in a wide range of organizational areas • Provide an overview of database processing, both historical and discussion of recent trends in database management • Provide the student with exposure to a range of tools, including a relational DBMS as well as an object-oriented DBMS 2 Textbooks • Database Systems. Design, Implementation, and Management, 10th Edition - Peter Rob & Carlos Coronel 3 Marking Scheme • Final exam (closed book) - 50% Midterm (closed book) - 35% Assignments (2 assignments) - 15% • Lecture notes will be made available at: http://people.yorku.ca/~zyang/itec3220a.htm 4 Schedule • Week 1 Database concepts and the relational database model • Week 2 Entity relationship model • Week 3 Normalization • Week 4 SQL • Week 5 SQL + lab • Week 6 Advanced SQL + lab 5 Schedule (Cont’d) • Week 7 Midterm • Week 8 Database design & case study • Week 9 Transaction management and concurrent control • Week 10 Transaction management and concurrent control (Cont’d) ; Data warehousing • Week 11 Objected-Oriented database • Week 12 Review 6 Introduction Database Systems and Data Models Basic Definition • Data: raw facts – Constitute building blocks of information • Information: is produced by processing data and reveals meaning of data – Good, timely, relevant information key to decision making – Good decision making key to organizational survival • Database: shared, integrated computer structure housing: – End user data – Metadata 8 An Example • Converting data to information 9 An Example (Cont’d) • Metadata 10 What is a Database Management System (DBMS) • A collection of programs that manages the database structure and controls access to the data stored in the database – Possible to share data among multiple applications or users • Example: bank and its ATM machines – Makes data management more efficient and effective • End users have better access to more and bettermanaged data 11 DBMS Manages Interaction 12 File and File System • Terminology – Data • Raw Facts – Field • Group of characters with specific meaning – Record • Logically connected fields that describe a person, place, or thing – File • Collection of related records 13 Example 14 Disadvantages of File Processing • Data Dependence – Change in file’s data characteristics requires modification of data access programs – Lengthy development time – Excessive program maintenance • Structural Dependence – Change in file structure requires modification of related programs 15 Example 16 Disadvantages of File Processing (Cont’d) • Data Redundancy – – Different and conflicting versions of same data Results of uncontrolled data redundancy • Data anomalies – Modification – Insertion – Deletion • Data inconsistency – Lack of data integrity 17 Solution: Database Approach • Database consists of logically related data stored in a single repository • Advantages of database approach – Structural and data independence – Minimal data redundancy •Reduces inconsistency, data anomalies – Improves data sharing and data quality – Stores data structures, relationships, and access paths 18 Database vs. File Systems 19 Database System Environment • Hardware: all the system's physical devices • Software – Operating system software – DBMS software – Application programs and utility software • People • Procedures • Data 20 Database Models • Collection of logical constructs used to represent data structure and relationships within the database – Conceptual models: logical nature of data representation – Implementation models: emphasis on how the data are represented in the database 21 Database Models: Historic Overview • Flat files - 1960s - 1980s • Hierarchical – 1970s - 1990s • Network – 1970s - 1990s • Relational – 1980s - present • Object-oriented – 1990s - present • Object-relational – 1990s - present • Data warehousing – 1980s - present • Web-enabled – 1990s - present 22 Hierarchical Database Model • Logically represented by an upside down tree – Each parent can have many children. – Each child has only one parent. 23 Hierarchical Database Model (Cont’d) • Advantages – – – – Conceptual simplicity Database security and integrity Data independence Efficiency • Disadvantages – – – – – Complex implementation Difficult to manage and lack of standards Lacks structural independence Application programming and use complexity Implementation limitations 24 Network Database Model • Each record can have multiple parents – Composed of sets – Each set has owner record and member record. – Member may have several owners. 25 Network Database Model (Cont’d) • Advantages – – – – – – Conceptual simplicity Handles more relationship types Data access flexibility Promotes database integrity Data independence Conformance to standards • Disadvantages – System complexity – Lack of structural independence 26 Relational Database Model • Perceived by user as a collection of tables for data storage • Tables are a series of row/column intersections. • Tables related by sharing common entity characteristic(s). 27 Relational Database Model (Cont’d) 28 Relational Database Model (Cont’d) • Schema for the table – Graphical representation AGENT AGENT_C ODE AGENT_LN AGENT_FN AME AME AGENT_INI AGENT_AREA AGENT_PH TIAL CODE ONE – Text description AGENT(AGENT_CODE, AGENT_LNAME, AGENT_FNAME, AGENT_INITIAL, AGENT_AREACODE, AGETN_PHONE) 29 Relational Database Model (Cont’d) • Advantages – Structural independence – Improved conceptual simplicity – Easier database design, implementation, management, and use – Ad hoc query capability with SQL – Powerful database management system • Disadvantages – Substantial hardware and system software overhead – Poor design and implementation is made easy – May promote “islands of information” problems 30 Object-Oriented Database Model • Objects or abstractions of real-world entities are stored – Attributes describe properties – Collection of similar objects is a class •Methods represent real world actions of classes •Classes are organized in a class hierarchy – Inheritance is ability of object to inherit attributes and methods of classes above it 31 OO Data Model • Advantages – – – – Adds semantic content Visual presentation includes semantic content Database integrity Both structural and data independence • Disadvantages – – – – Lack of OODM Complex navigational data access Steep learning curve High system overhead slows transactions 32 Costs and Risks of the Database Approach • Up-front costs: – Installation Management Cost and Complexity – Conversion Costs • Ongoing Costs – Requires New, Specialized Personnel – Need for Explicit Backup and Recovery • Organizational Conflict 33 Review • Basic concepts: data, information, database, DBMS, file, conceptual model, implementation model, etc • Why database and its importance, cost and risk • Different database models – definition – advantage – disadvantage 34 Chapter 3 The Relational Database Model In this chapter, you will learn: • Basic components of the relational database model – Entities and their attributes – Relationships among entities • Relational algebra • Relationship in relational database • Data redundancy 36 Basic Definition • Entities and Attributes – Entity is a person, place, event, or thing about which data is collected – Attributes are characteristics of the entity • Tables – Holds related entities or entity set – Also called relations – Comprised of rows and columns 37 Table Characteristics • • • • • • • • Two-dimensional structure with rows and columns Rows (tuples) represent single entity Columns represent attributes Row/column intersection represents single value Tables must have an attribute to uniquely identify each row Column values all have same data format Each column has range of values called attribute domain Order of the rows and columns is immaterial to the DBMS 38 Example Tables 39 Terminology for Relational Database Table-Oriented Set-oriented Record-Oriented Table Relation Record type Row Tuple Record Column Attribute Field 40 Key • Consists of one or more attributes that determine other attributes • Primary key (PK) is an attribute (or a combination of attributes) that uniquely identifies any given entity (row). • Key’s role is based on determination – If you know the value of attribute A, you can look up (determine) the value of attribute B 41 Keys (Cont’d) • Composite key – Composed of more than one attribute • Key attribute – Any attribute that is part of a key • Superkey – Any key that uniquely identifies each entity • Candidate key – A superkey without redundancies 42 Keys (Cont’d) • Foreign key (FK) – An attribute whose values match primary key values in the related table • Referential integrity – FK contains a value that refers to an existing valid tuple (row) in another relation • Secondary key – Key used strictly for data retrieval purposes 43 Simple Relational Database 44 Controlled Redundancy • Makes the relational database work • Tables within the database share common attributes that enable us to link tables together. • Multiple occurrences of values in a table are not redundant when they are required to make the relationship work. • Redundancy is unnecessary duplication of data 45 Integrity Rules 46 Integrity Rules (cont’d) 47 Exercises Table name: TRUCK Table name: BASE Table name: TYPE 48 Exercises (Cont’d) • For each table, identify the primary key and the foreign keys. • Do the tables exhibit entity integrity? Explain • So the tables exhibit referential integrity? Explain • Identify the TRUCK table’s candidate key (s). • For each table, identify a super key and a secondary key 49