00-DBS201 Definitions Part 1

A DBMS (Database Management System) is a program, or collection of programs, through which users interact with a database. It both stores and manipulates data, which is held in structures called "tables".

Table
A table is a two-dimensional representation of data in rows and columns.

Database
A database is a structure that can store information about multiple entities, the attributes of those entities, and the relationships between those entities. Each entity is stored in a table, and its attributes are the columns of that table. Many separate, related tables make up a database.

Database Design
The process of designing entities, their attributes, and the relationships between the entities.

Database Application
User-oriented programs used to enter, update, and delete stored data.

Form, Report, or View
Screen or print objects used to view, print, and maintain data from a database.

Database Models
There are 4 types of database models for organizing the database structure:
1. Network
2. Hierarchical
3. Relational (this DBS201 course focuses on the relational database model)
4. Object-oriented

The Relational Database Model
A relational database is a collection of relations (see the "relation" definition below). The relational database model consists of four components:

1. Entity (Chapter 1, page 6)
An entity represents a real-world object. It is a person, place, thing, or event for which we intend to store and process data. Entities are represented in the relational database model as tables (or relations).

Table (Relation) (Chapter 2, page 30)
A relation is a table (a two-dimensional representation of data in rows and columns) that also has the following properties:
a) All data entries in the table are single-valued, i.e., each "cell" (the intersection of a row and a column) holds one, and only one, value. That value must be broken down into its smallest component.
b) Each column (attribute) must have a distinct name.
c) All data values in a column must be of the same data type (text, number, or date).
d) Each row must be distinct (unique) and is identified by a single column (attribute) value or a combination of columns, called the Primary Key (PK). There is only 1 PK, even if it is made up of several attributes.
e) The row order is not important.
f) The column order is not important.

The Primary Key column (attribute), or combination of columns, uniquely identifies each row of a relation (table). We must make sure that we choose appropriate columns as the PK.

Candidate Key
An attribute, or group of attributes, that could be chosen as the Primary Key. A candidate key that is not chosen as the Primary Key is usually passed over for design reasons such as data security or table efficiency.

Example: Every working person in Canada has a Social Insurance Number (SIN). We could use the SIN to identify our employees, and it would identify them uniquely. However, a SIN is confidential information, just as a STUDENT NUMBER is confidential. Therefore, we don't use SIN or STUNUM as the PK. Instead, we create another identifier column, such as EMP_ID or STU_ID, and give each person a unique number.

Example: FirstName + LastName may look like a candidate key for an EMPLOYEE table, but it identifies each row only as long as no two employees share a name. Because name duplication is possible, it is an inappropriate choice. Therefore, we choose an EmpID column as the Primary Key. EmpID becomes the Primary Key, and FirstName + LastName remains, at best, a candidate key that was not chosen.

2. Attribute
Attributes are columns of a table. Each column, or attribute, represents a characteristic of an entity (that is, a piece of information about an entity). See the definition of "relation" above for the attribute rules.

A Derived Attribute is an attribute whose values can be calculated (or generated) from other attributes. In general, we do NOT store derived attributes in our relations (entities); instead, we store the attributes that are used to calculate the derived value.

3. Record
A record is a row of a table, also called a tuple.

4. Relationship (Chapter 1, page 6)
A relationship is an association between entities. Associations (relationships) between entities are formed when one entity's Primary Key attribute is copied as an attribute of a second entity. In the second entity, this attribute is called the Foreign Key attribute.

The Foreign Key attribute is used to establish a relationship between two tables. The Foreign Key column is an attribute whose value must be in the range of Primary Key values of another entity: a value entered into a Foreign Key "cell" must already exist as a Primary Key value in the other entity. It is through this common value that tables can be joined together.

Associations can be of three types:
(a) 1:1 - one to one
One instance of the first entity can be related to only one instance of the second entity. Put another way, the first entity's Primary Key value can be found only once as a Foreign Key value in the second entity. This does not happen very often.
(b) 1:M - one to many
One instance of the first entity can be related to many instances of the second entity. Put another way, the first entity's Primary Key value can be found many times as a Foreign Key value in the second entity. This is NOT true in the other direction.
(c) M:N - many to many
One instance of the first entity can be related to many instances of the second entity, and one instance of the second entity can be related to many instances of the first entity. Put another way, the first entity's Primary Key value can be found many times as a Foreign Key value in the second entity, AND the second entity's Primary Key value can be found many times as a Foreign Key value in the first entity. Many-to-many relationships are hard to implement directly; in practice, they are implemented through the use of another table, described below.
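
The Primary Key, Foreign Key, and relationship rules above can be illustrated with a minimal sketch using Python's built-in sqlite3 module. All table and column names here (department, employee, student, course, enrollment) and all sample rows are hypothetical and chosen only for illustration; only the course code DBS201 comes from these notes. The department/employee pair shows a 1:M relationship, and the enrollment table is the extra table that carries an M:N relationship between student and course.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with one 1:M and one M:N relationship."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        -- 1:M: one department row can be referenced by many employee rows.
        CREATE TABLE department (
            dept_id   INTEGER PRIMARY KEY,
            dept_name TEXT NOT NULL
        );
        CREATE TABLE employee (
            emp_id   INTEGER PRIMARY KEY,
            emp_name TEXT NOT NULL,
            dept_id  INTEGER NOT NULL REFERENCES department(dept_id)
        );
        -- M:N: student and course are linked through a third table.
        CREATE TABLE student (stu_id INTEGER PRIMARY KEY, stu_name TEXT NOT NULL);
        CREATE TABLE course (course_code TEXT PRIMARY KEY, title TEXT NOT NULL);
        CREATE TABLE enrollment (
            stu_id      INTEGER NOT NULL REFERENCES student(stu_id),
            course_code TEXT    NOT NULL REFERENCES course(course_code),
            PRIMARY KEY (stu_id, course_code)  -- one PK made of two attributes
        );
    """)
    # Hypothetical sample rows.
    conn.executemany("INSERT INTO department VALUES (?, ?)",
                     [(10, "Sales"), (20, "IT")])
    conn.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                     [(1, "Ann", 10), (2, "Bob", 10), (3, "Cy", 20)])
    conn.executemany("INSERT INTO student VALUES (?, ?)",
                     [(1, "Dana"), (2, "Eli")])
    conn.executemany("INSERT INTO course VALUES (?, ?)",
                     [("DBS201", "Database Design"), ("XYZ100", "Placeholder")])
    conn.executemany("INSERT INTO enrollment VALUES (?, ?)",
                     [(1, "DBS201"), (1, "XYZ100"), (2, "DBS201")])
    conn.commit()
    return conn


def employees_in(conn, dept_id):
    """Names of employees whose Foreign Key points at the given department."""
    rows = conn.execute(
        "SELECT emp_name FROM employee WHERE dept_id = ? ORDER BY emp_id",
        (dept_id,))
    return [r[0] for r in rows]


def students_in(conn, course_code):
    """Names of students joined to a course through the enrollment table."""
    rows = conn.execute(
        "SELECT s.stu_name FROM student s "
        "JOIN enrollment e ON e.stu_id = s.stu_id "
        "WHERE e.course_code = ? ORDER BY s.stu_id",
        (course_code,))
    return [r[0] for r in rows]


def fk_rejected(conn):
    """Return True if a Foreign Key value with no matching Primary Key is refused."""
    try:
        conn.execute("INSERT INTO employee VALUES (?, ?, ?)", (99, "Zed", 999))
    except sqlite3.IntegrityError:
        return True
    return False


if __name__ == "__main__":
    db = build_demo_db()
    print(employees_in(db, 10))
    print(students_in(db, "DBS201"))
    print(fk_rejected(db))
```

In this sketch the same dept_id value appears in several employee rows (1:M), while each enrollment row pairs exactly one student with exactly one course, so the M:N relationship is reduced to two 1:M relationships.
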

A Bridge entity/table, or Composite entity/table, is an entity/table in the relational database model that is required to implement many-to-many relationships between entities.

The Normalization Steps
Normalization is a process that tries to minimize the problems that occur when we store or manipulate data (add, change, and delete). Problems occur most often when data is stored in more than one location. This is called redundancy. (Book Chapter 1, page 3)

Redundancy occurs when data has been duplicated within a single table or between 2 or more tables. Some of the problems it causes are:
(1) wasted storage space,
(2) cumbersome and time-consuming data changes, and
(3) inconsistencies.

An Inconsistency occurs when the same piece of data is stored in more than one place with more than one spelling or format. Resolving it requires complex SQL operations whenever data is updated, inserted, or deleted.

UNF (Unnormalized Form Relation)
A table (relation) that has one or more repeating groups is said to be in unnormalized form. (Chapter 02, page 32) (Chapter 05, page 145)
Each row and column intersection should store a single piece of data. Problems will occur if there are multiple entries of data for a row and column; such a set of entries is called a repeating group.

When creating UNF tables:
- Place brackets around repeating groups.
- Remove calculated (derived) attributes.
- Choose the identifier that best reflects the information in the view (an ID, CODE, or other identifier, if one exists).

1NF (First Normal Form Relation)
A table (relation) is in first normal form (1NF) if it does not contain repeating groups.
1. All key (prime) attributes are defined.
2. There are no repeating groups in the table's composite key.
3. All attributes are dependent on the primary key.

When creating 1NF tables:
- Start with the most embedded group and join its columns and identifier to the identifier of the parent group.
  This join creates a bridge/composite table whose Primary Key combines the identifiers of two or more groups. This table implements an M:N relationship between the two identified groups of attributes.
- If a proper identifier (Primary Key) does not exist, then you must create an appropriate identifier.

2NF (Second Normal Form Relation)
A table (relation) is in second normal form (2NF) if it is in first normal form and no nonkey attribute is dependent on only a portion of the primary key.
1. The table is in 1NF.
2. The table has no partial dependencies.

When creating 2NF tables:
- Split apart tables whose Primary Key is made up of 2 or more attributes, and either assign each column to the new table whose key it depends on or leave the column in the composite table.
- Break all combined columns into their smallest forms, such as a person's name becoming Fname and Lname, or an address becoming Street, City, Province, and Postal.

3NF (Third Normal Form Relation)
A table is in third normal form (3NF) if it is in second normal form and if the only determinants it contains are candidate keys. Any column (or collection of columns) that determines another column is called a determinant.
1. The table is in 2NF.
2. The table has no transitive dependencies. Transitive dependencies are broken out into separate tables.
3. The primary key, and nothing but the primary key, defines each non-key attribute.

When creating 3NF tables:
- When you move a transitively dependent group of attributes into its own 3NF table, its identifier is left behind in the original table as a Foreign Key. This implements a 1:M relationship.
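
The normalization steps above can be traced on a small example. The following is a minimal sketch in plain Python, assuming a hypothetical order view with the attributes order_no, order_date, cust_id, cust_name and a repeating group of (prod_id, prod_desc, qty); none of these names or values come from the course text. Each table is modelled as a dictionary keyed by its Primary Key, so a composite key appears as a tuple.

```python
# UNF: each order carries a repeating group of items (hypothetical data).
unf_orders = [
    {"order_no": 1, "order_date": "2024-01-05", "cust_id": 7, "cust_name": "Ann",
     "items": [{"prod_id": "P1", "prod_desc": "Pen", "qty": 3},
               {"prod_id": "P2", "prod_desc": "Pad", "qty": 1}]},
    {"order_no": 2, "order_date": "2024-01-06", "cust_id": 7, "cust_name": "Ann",
     "items": [{"prod_id": "P1", "prod_desc": "Pen", "qty": 10}]},
]


def to_1nf(unf):
    """Remove the repeating group: its rows move to a table keyed by
    (parent identifier, group identifier)."""
    orders, order_lines = {}, {}
    for rec in unf:
        orders[rec["order_no"]] = {
            "order_date": rec["order_date"],
            "cust_id": rec["cust_id"],
            "cust_name": rec["cust_name"],
        }
        for item in rec["items"]:
            order_lines[(rec["order_no"], item["prod_id"])] = {
                "prod_desc": item["prod_desc"],
                "qty": item["qty"],
            }
    return orders, order_lines


def to_2nf(order_lines):
    """Remove the partial dependency: prod_desc depends on prod_id alone,
    not on the whole (order_no, prod_id) key."""
    products, lines = {}, {}
    for (order_no, prod_id), attrs in order_lines.items():
        products[prod_id] = {"prod_desc": attrs["prod_desc"]}
        lines[(order_no, prod_id)] = {"qty": attrs["qty"]}
    return products, lines


def to_3nf(orders):
    """Remove the transitive dependency: cust_name depends on cust_id,
    which in turn depends on order_no. cust_id stays behind as a Foreign Key."""
    customers, slim_orders = {}, {}
    for order_no, attrs in orders.items():
        customers[attrs["cust_id"]] = {"cust_name": attrs["cust_name"]}
        slim_orders[order_no] = {
            "order_date": attrs["order_date"],
            "cust_id": attrs["cust_id"],
        }
    return customers, slim_orders


if __name__ == "__main__":
    orders_1nf, lines_1nf = to_1nf(unf_orders)
    products, order_lines = to_2nf(lines_1nf)
    customers, orders = to_3nf(orders_1nf)
    for name, table in [("CUSTOMER", customers), ("ORDER", orders),
                        ("PRODUCT", products), ("ORDER_LINE", order_lines)]:
        print(name, table)
```

In this sketch the final ORDER_LINE table plays the role of the bridge/composite table between ORDER and PRODUCT, and the cust_id left in ORDER is the Foreign Key that implements the 1:M relationship from CUSTOMER to ORDER. The description "Pen" and the name "Ann" are each stored once after the split, which is the redundancy reduction the steps aim for.
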