Chapter 1 - Database Systems
Monday, 22 January 2018 20:32

1.2 Data Versus Information

Raw Data
Raw facts, or facts that have not yet been processed to reveal their meaning to the end user.

Information
The result of processing raw data to reveal its meaning. Information consists of transformed data and facilitates decision making.

Knowledge
The body of information and facts about a specific subject. Knowledge implies familiarity, awareness and understanding of information as it applies to an environment. A key characteristic is that new knowledge can be derived from old knowledge.

1.3 Introducing the Database

Data Management
A process that focuses on data collection, storage and retrieval. Common data management functions include addition, deletion, modification and listing.

Database
A shared, integrated computer structure that houses a collection of data. A database contains two types of data: end user data (raw data) and metadata.

Metadata
Data about data; that is, data about data characteristics and relationships.

Database Management System (DBMS)
The collection of programs that manages the database structure and controls access to the data stored in the database.

DBMS Advantages
• Improved data sharing.
• Improved data security.
• Better data integration.
• Minimized data inconsistency.
• Improved data access.
• Improved decision making.
• Increased end-user productivity.

UNISA Page 1

Data Inconsistency
A condition in which different versions of the same data yield different (inconsistent) results.

Query
A question or task asked by an end user of a database in the form of SQL code. A specific request for data manipulation issued by the end user or the application to the DBMS.

Ad Hoc Query
A “spur-of-the-moment” question.

Query Result Set
The collection of data rows returned by a query.

Data Quality
A comprehensive approach to ensuring the accuracy, validity, and timeliness of data.
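The Query, Query Result Set, and Metadata entries above can be tried out with a small sketch. It uses Python's built-in sqlite3 module as a stand-in DBMS; the customer table, its columns, and the sample rows are assumptions made up for illustration and do not come from the notes.

```python
import sqlite3

# Hypothetical in-memory database; table, columns, and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (cus_id INTEGER PRIMARY KEY, cus_name TEXT, cus_city TEXT)"
)
conn.executemany(
    "INSERT INTO customer (cus_id, cus_name, cus_city) VALUES (?, ?, ?)",
    [(1, "Ramas", "Pretoria"), (2, "Dunne", "Durban"), (3, "Smith", "Pretoria")],
)

# An ad hoc query: a spur-of-the-moment question put to the DBMS.
# The rows that come back form the query result set.
result_set = conn.execute(
    "SELECT cus_name FROM customer WHERE cus_city = ? ORDER BY cus_name",
    ("Pretoria",),
).fetchall()

# Metadata: data about the data, here each column's name and declared type.
metadata = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(customer)")]

print(result_set)
print(metadata)
```

The end user data lives in the rows of customer, while the column names and types are kept by the DBMS itself, which is the distinction the Database entry draws between end user data and metadata.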
Types of Databases
• Single-user database - A database that supports only one user at a time.
• Desktop database - A single-user database that runs on a personal computer.
• Multiuser database - A database that supports multiple concurrent users.
• Workgroup database - A multiuser database that usually supports fewer than 50 users or is used for a specific department in an organization.
• Enterprise database - The overall company data representation, which provides support for present and expected future needs.
• Centralized database - A database located at a single site.
• Distributed database - A logically related database that is stored in two or more physically independent sites.
• Cloud database - A database that is created and maintained using cloud services, such as Microsoft Azure or Amazon AWS.
• General-purpose database - A database that contains a wide variety of data used in multiple disciplines.
• Discipline-specific database - A database that contains data focused on specific subject areas.
• Operational database - A database designed primarily to support a company’s day-to-day operations. Also known as a transactional database, OLTP database, or production database.
• Analytic database - A database focused primarily on storing historical data and business metrics used for tactical or strategic decision making.

Data Warehouse
A specialized database that stores historical and aggregated data in a format optimized for decision support.

Online Analytical Processing (OLAP)
A set of tools that provide advanced data analysis for retrieving, processing, and modelling data from the data warehouse.

Business Intelligence
A set of tools and processes used to capture, collect, integrate, store, and analyse data to support business decision making.
Unstructured Data
Data that exists in its original, raw state; that is, in the format in which it was collected.

Structured Data
Data that has been formatted to facilitate storage, use, and information generation.

Semistructured Data
Data that has already been processed to some extent.

Extensible Markup Language (XML)
A metalanguage used to represent and manipulate data elements. Unlike other markup languages, XML permits the manipulation of a document’s data elements. XML facilitates the exchange of structured documents such as orders and invoices over the Internet.

XML Database
A database system that stores and manages semistructured XML data.

Social Media
Web and mobile technologies that enable “anywhere, anytime, always on” human interactions.

NoSQL
A new generation of database management systems that is not based on the traditional relational database model.

1.4 Why Database Design Is Important

Database Design
The process that yields the description of the database structure and determines the database components. The second phase of the Database Life Cycle.

1.5 Evolution of File System Data Processing

Data Processing (DP) Specialist
The person responsible for developing and managing a computerized file processing system.

1.6 Problems with File System Data Processing

Structural Dependence
A data characteristic in which a change in the database schema affects data access, thus requiring changes in all access programs.

Structural Independence
A data characteristic in which changes in the database schema do not affect data access.

Data Dependence
A data condition in which data representation and manipulation are dependent on the physical data storage characteristics.

Data Independence
A condition in which data access is unaffected by changes in the physical data storage characteristics.

Logical Data Format
The way a person views data within the context of a problem domain.
Physical Data Format
The way a computer “sees” (stores) data.

Islands of Information
In the old file system environment, pools of independent, often duplicated, and inconsistent data created and managed by different departments.

Data Redundancy
Exists when the same data is stored unnecessarily at different places.

Data Integrity
In a relational database, a condition in which the data in the database complies with all entity and referential integrity constraints.

Data Anomaly
A data abnormality in which inconsistent changes have been made to a database. For example, an employee moves, but the address change is not corrected in all files in the database.

1.7 Database Systems

Database System
An organization of components that defines and regulates the collection, storage, management, and use of data in a database environment.

Data Dictionary
A DBMS component that stores metadata (data about data). The data dictionary contains data definitions as well as data characteristics and relationships. May also include data that is external to the DBMS.

Performance Tuning
Activities that make a database perform more efficiently in terms of storage and access speed.

Query Language
A nonprocedural language that is used by a DBMS to manipulate its data. An example of a query language is SQL.

Structured Query Language (SQL)
A powerful and flexible relational database language composed of commands that enable users to create database and table structures, perform various types of data manipulation and data administration, and query the database to extract useful information.

Chapter 2 - Data Models
Tuesday, 23 January 2018 21:35

2.1 Data Modeling and Data Models

Data Modeling
The process of creating a specific data model for a determined problem domain.

Data Model
A representation, usually graphic, of a complex “real-world” data structure.
Data models are used in the database design phase of the Database Life Cycle.

2.3 Data Model Basic Building Blocks

Entity
A person, place, thing, concept, or event for which data can be stored. See also attribute.

Attribute
A characteristic of an entity or object. An attribute has a name and a data type.

Relationship
An association between entities.

One-to-Many (1:M or 1..*) Relationship
Associations among two or more entities that are used by data models. In a 1:M relationship, one entity instance is associated with many instances of the related entity.

Many-to-Many (M:N or *..*) Relationship
Association among two or more entities in which one occurrence of an entity is associated with many occurrences of a related entity and one occurrence of the related entity is associated with many occurrences of the first entity.

One-to-One (1:1 or 1..1) Relationship
Associations among two or more entities that are used by data models. In a 1:1 relationship, one entity instance is associated with only one instance of the related entity.

Constraints
A restriction placed on data, usually expressed in the form of rules. For example, “A student’s GPA must be between 0.00 and 4.00.” Constraints are important because they help to ensure data integrity.

2.4 Business Rules

Business Rule
A description of a policy, procedure, or principle within an organization. For example, a pilot cannot be on duty for more than 10 hours during a 24-hour period, or a professor may teach up to four classes during a semester.

2.5 The Evolution of Data Models

Hierarchical Model
An early database model whose basic concepts and characteristics formed the basis for subsequent database development. This model is based on an upside-down tree structure in which each record is called a segment. The top record is the root segment. Each segment has a 1:M relationship to the segment directly below it.

Segment
In the hierarchical data model, the equivalent of a file system’s record type.
Network Model
An early data model that represented data as a collection of record types in 1:M relationships.

Schema
A logical grouping of database objects, such as tables, indexes, views, and queries, that are related to each other.

Subschema
The portion of the database that interacts with application programs.

Data Manipulation Language (DML)
The set of commands that allows an end user to manipulate the data in the database, such as SELECT, INSERT, UPDATE, DELETE, COMMIT, and ROLLBACK.

Data Definition Language (DDL)
The language that allows a database administrator to define the database structure, schema, and subschema.

Relational Model
Developed by E. F. Codd of IBM in 1970, the relational model is based on mathematical set theory and represents data as independent relations. Each relation (table) is conceptually represented as a two-dimensional structure of intersecting rows and columns. The relations are related to each other through the sharing of common entity characteristics (values in columns).

Table (Relation)
A logical construct perceived to be a two-dimensional structure composed of intersecting rows (entities) and columns (attributes) that represents an entity set in the relational model.

Tuple
In the relational model, a table row.

Relational Database Management System (RDBMS)
A collection of programs that manages a relational database. The RDBMS software translates a user’s logical requests (queries) into commands that physically locate and retrieve the requested data.

Relational Diagram
A graphical representation of a relational database’s entities, the attributes within those entities, and the relationships among the entities.

Entity Relationship (ER) Model (ERM)
A data model that describes relationships (1:1, 1:M, and M:N) among entities at the conceptual level with the help of ER diagrams. The model was developed by Peter Chen.
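To make the difference between the Data Definition Language and Data Manipulation Language entries above concrete, here is a minimal sketch using Python's built-in sqlite3 module. The agent table and its rows are hypothetical; the point is only to show one DDL statement followed by DML statements, including the effect of COMMIT and ROLLBACK.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the structure of a (hypothetical) AGENT table.
conn.execute(
    "CREATE TABLE agent (agent_code INTEGER PRIMARY KEY, agent_name TEXT NOT NULL)"
)

# DML: an INSERT followed by COMMIT makes the change permanent.
conn.execute("INSERT INTO agent VALUES (501, 'Alby')")
conn.commit()

# DML: an INSERT followed by ROLLBACK is undone.
conn.execute("INSERT INTO agent VALUES (502, 'Hahn')")
conn.rollback()

# DML: SELECT returns the tuples (rows) currently stored in the relation.
tuples = conn.execute("SELECT agent_code, agent_name FROM agent").fetchall()
print(tuples)
```

Only the committed row survives, so the table holds a single tuple after the rollback.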
Entity Relationship Diagram (ERD)
A diagram that depicts an entity relationship model’s entities, attributes, and relations.

Entity Instance (Entity Occurrence)
A row in a relational table.

Entity Set
A collection of like entities.

Connectivity
The type of relationship between entities. Classifications include 1:1, 1:M, and M:N.

Crow's Foot Notation
A representation of the entity relationship diagram that uses a three-pronged symbol to represent the “many” sides of the relationship.

Class Diagram Notation
The set of symbols used in the creation of class diagrams.

Object-Oriented Data Model (OODM)
A data model whose basic modeling structure is an object.

Object
An abstract representation of a real-world entity that has a unique identity, embedded properties, and the ability to interact with other objects and itself.

Object-Oriented Database Management System (OODBMS)
Data management software used to manage data in an object-oriented database model.

Semantic Data Model
The first of a series of data models that more closely represented the real world, modeling both data and their relationships in a single structure known as an object. The SDM, published in 1981, was developed by M. Hammer and D. McLeod.

Class
A collection of similar objects with shared structure (attributes) and behaviour (methods). A class encapsulates an object’s data representation and a method’s implementation. Classes are organized in a class hierarchy.

Method
In the object-oriented data model, a named set of instructions to perform an action. Methods represent real-world actions, and are invoked through messages.

Class Hierarchy
The organization of classes in a hierarchical tree in which each parent class is a superclass and each child class is a subclass. See also inheritance.
Inheritance
In the object-oriented data model, the ability of an object to inherit the data structure and methods of the classes above it in the class hierarchy.

Unified Modeling Language (UML)
A language based on object-oriented concepts that provides tools such as diagrams and symbols to graphically model a system.

Class Diagram
A diagram used to represent data and their relationships in UML object notation.

Extended Relational Data Model (ERDM)
A model that includes the object-oriented model’s best features in an inherently simpler relational database structural environment.

Object/Relational Database Management System (O/R DBMS)
A DBMS based on the extended relational model (ERDM). The ERDM, championed by many relational database researchers, constitutes the relational model’s response to the OODM. This model includes many of the object-oriented model’s best features within an inherently simpler relational database structure.

Big Data
A movement to find new and better ways to manage large amounts of web-generated data and derive business insight from it, while simultaneously providing high performance and scalability at a reasonable cost.

3 Vs
Three basic characteristics of Big Data databases: volume, velocity, and variety.

Hadoop
A Java-based, open-source, high-speed, fault-tolerant distributed storage and computational framework. Hadoop uses low-cost hardware to create clusters of thousands of computer nodes to store and process data.

Hadoop Distributed File System (HDFS)
A highly distributed, fault-tolerant file storage system designed to manage large amounts of data at high speeds.

Name Node
One of three types of nodes used in the Hadoop Distributed File System (HDFS). The name node stores all the metadata about the file system.

Data Node
One of three types of nodes used in the Hadoop Distributed File System (HDFS).
The data node stores fixed-size data blocks (that could be replicated to other data nodes).

Client Node
One of three types of nodes used in the Hadoop Distributed File System (HDFS). The client node acts as the interface between the user application and the HDFS.

MapReduce
An open-source application programming interface (API) that provides fast data analytics services; one of the main Big Data technologies that allows organizations to process massive data stores.

Key-Value
A data model based on a structure composed of two data elements: a key and a value, in which every key has a corresponding value or set of values. The key-value data model is also called the associative or attribute-value data model.

Sparse Data
A case in which the number of table attributes is very large but the number of actual data instances is low.

Eventual Consistency
A model for database consistency in which updates to the database will propagate through the system so that all data copies will be consistent eventually.

American National Standards Institute (ANSI)
The group that accepted the DBTG recommendations and augmented database standards in 1975 through its SPARC committee.

External Model
The application programmer’s view of the data environment. Given its business focus, an external model works with a data subset of the global database schema.

External Schema
The specific representation of an external view; the end user’s view of the data environment.

Conceptual Model
The output of the conceptual design process. The conceptual model provides a global view of an entire database and describes the main data objects, avoiding details.

Conceptual Schema
A representation of the conceptual model, usually expressed graphically.

Software Independence
A property of any model or application that does not depend on the software used to implement it.
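The MapReduce and Key-Value entries above describe the idea only in outline, so a toy sketch may help. The code below is plain single-machine Python, not the Hadoop API; the function names map_phase, shuffle, and reduce_phase and the sample documents are assumptions chosen for illustration. It counts words by emitting key-value pairs, grouping them by key, and reducing each group.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (key, value) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group all emitted values by key, as a framework would between the phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


# Hypothetical input; in a real cluster each document could sit on a different data node.
documents = ["big data big insight", "data at scale"]

pairs = [pair for doc in documents for pair in map_phase(doc)]
word_counts = dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
print(word_counts)
```

Because each map call looks at one document and each reduce call at one key, both phases could in principle run in parallel on separate nodes, which is what makes the approach suitable for very large data stores.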
Hardware Independence
A condition in which a model does not depend on the hardware used in the model’s implementation. Therefore, changes in the hardware will have no effect on the database design at the conceptual level.

Logical Design
A stage in the design phase that matches the conceptual design to the requirements of the selected DBMS and is therefore software-dependent. Logical design is used to translate the conceptual design into the internal model for a selected database management system, such as DB2, SQL Server, Oracle, IMS, Informix, Access, or Ingres.

Internal Model
In database modeling, a level of data abstraction that adapts the conceptual model to a specific DBMS model for implementation. The internal model is the representation of a database as “seen” by the DBMS. In other words, the internal model requires a designer to match the conceptual model’s characteristics and constraints to those of the selected implementation model.

Internal Schema
A representation of an internal model using the database constructs supported by the chosen database.

Logical Independence
A condition in which the internal model can be changed without affecting the conceptual model. (The internal model is hardware-independent because it is unaffected by the computer on which the software is installed. Therefore, a change in storage devices or operating systems will not affect the internal model.)

Physical Model
A model in which physical characteristics such as location, path, and format are described for the data. The physical model is both hardware- and software-dependent.

Physical Independence
A condition in which the physical model can be changed without affecting the internal model.

Chapter 3 - The Relational Database Model
Tuesday, 30 January 2018 21:49

3.1 A Logical View of Data

Predicate Logic
Used extensively in mathematics to provide a framework in which an assertion (statement of fact) can be verified as either true or false.
Set Theory
A part of mathematical science that deals with sets, or groups of things, and is used as the basis for data manipulation in the relational model.

Tuple
In the relational model, a table row.

Domain
In data modeling, the construct used to organize and describe an attribute’s set of possible values.

Primary Key (PK)
In the relational model, an identifier composed of one or more attributes that uniquely identifies a row. Also, a candidate key selected as a unique entity identifier.

3.2 Keys

Key
One or more attributes that determine other attributes.

Determination
The role of a key. In the context of a database table, the statement “A determines B” indicates that knowing the value of attribute A means that the value of attribute B can be looked up.

Functional Dependence
Within a relation R, an attribute B is functionally dependent on an attribute A if and only if a given value of attribute A determines exactly one value of attribute B. The relationship “B is dependent on A” is equivalent to “A determines B” and is written as A → B.

Determinant
Any attribute in a specific row whose value directly determines other values in that row.

Dependent
An attribute whose value is determined by another attribute.

Full Functional Dependence
A condition in which an attribute is functionally dependent on a composite key but not on any subset of the key.

Composite Key
A multiple-attribute key.

Key Attributes
The attributes that form a primary key.

Superkey
An attribute or attributes that uniquely identify each entity in a table.

Candidate Key
A minimal superkey; that is, a key that does not contain a subset of attributes that is itself a superkey.

Entity Integrity
The property of a relational table that guarantees each entity has a unique value in a primary key and that the key has no null values.

Null
The absence of an attribute value. Note that a null is not a blank.
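The Functional Dependence and Superkey entries above can be checked mechanically against sample rows. The sketch below is one possible way to do that in Python; the helper names determines and is_superkey and the student rows are assumptions for illustration. Note that sample data can only refute a dependency: a check that passes on three rows does not prove the dependency holds for the relation in general.

```python
def determines(rows, lhs, rhs):
    """Return True if, in these rows, each value of lhs maps to exactly one value of rhs
    (that is, the sample is consistent with lhs -> rhs)."""
    seen = {}
    for row in rows:
        key = tuple(row[attr] for attr in lhs)
        value = tuple(row[attr] for attr in rhs)
        # setdefault stores the first value seen for this key and returns it afterwards.
        if seen.setdefault(key, value) != value:
            return False
    return True


def is_superkey(rows, attrs):
    """A superkey uniquely identifies each row, so no combination of values repeats."""
    keys = [tuple(row[attr] for attr in attrs) for row in rows]
    return len(set(keys)) == len(keys)


# Hypothetical STUDENT rows.
students = [
    {"STU_NUM": 1, "STU_LNAME": "Bowser", "STU_GPA": 2.84},
    {"STU_NUM": 2, "STU_LNAME": "Smith", "STU_GPA": 3.27},
    {"STU_NUM": 3, "STU_LNAME": "Smith", "STU_GPA": 2.26},
]

print(determines(students, ["STU_NUM"], ["STU_LNAME", "STU_GPA"]))
print(determines(students, ["STU_LNAME"], ["STU_GPA"]))
print(is_superkey(students, ["STU_NUM"]))
print(is_superkey(students, ["STU_NUM", "STU_LNAME"]))
```

In this sample, STU_NUM and the pair (STU_NUM, STU_LNAME) are both superkeys, but only STU_NUM is a candidate key, because the pair contains a subset that is itself a superkey.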
Foreign Key (FK)
An attribute or attributes in one table whose values must match the primary key in another table or whose values must be null.

Referential Integrity
A condition by which a dependent table’s foreign key must have either a null entry or a matching entry in the related table.

Secondary Key
A key used strictly for data retrieval purposes. For example, customers are not likely to know their customer number (primary key), but the combination of last name, first name, middle initial, and telephone number will probably match the appropriate table row.

Flags
Special codes implemented by designers to trigger a required response, alert end users to specified conditions, or encode values. Flags may be used to prevent nulls by bringing attention to the absence of a value in a table.

Relational Algebra
A set of mathematical principles that form the basis for manipulating relational table contents; the eight main functions are SELECT, PROJECT, JOIN, INTERSECT, UNION, DIFFERENCE, PRODUCT, and DIVIDE.

Relvar
Short for relation variable, a variable that holds a relation. A relvar is a container (variable) for holding relation data, not the relation itself.

Closure
A property of relational operators that permits the use of relational algebra operators on existing tables (relations) to produce new relations.

SELECT
In relational algebra, an operator used to select a subset of rows. Also known as RESTRICT.

RESTRICT
Same as SELECT.

PROJECT
In relational algebra, an operator used to select a subset of columns.

UNION
In relational algebra, an operator used to merge (append) two tables into a new table, dropping the duplicate rows. The tables must be union-compatible.
A ∪ B

Union-Compatible
Two or more tables that have the same number of columns and the corresponding columns have compatible domains.

INTERSECT
In relational algebra, an operator used to yield only the rows that are common to two union-compatible tables.
A ∩ B

DIFFERENCE
In relational algebra, an operator used to yield all rows from one table that are not found in another union-compatible table.
A − B

PRODUCT
In relational algebra, an operator used to yield all possible pairs of rows from two tables. Also known as the Cartesian product.

JOIN
In relational algebra, a type of operator used to yield rows from two tables based on criteria. There are many types of joins, such as natural join, theta join, equijoin, and outer join.
c⨝a

Natural Join
A relational operation that yields a new table composed of only the rows with common values in their common attribute(s).

Join Columns
Columns that are used in the criteria of join operations. The join columns generally share similar values.

Equijoin
A join operator that links tables based on an equality condition that compares specified columns of the tables.

Theta Join
A join operator that links tables using an inequality comparison operator (<, >, <=, >=) in the join condition.

Inner Join
A join operation in which only rows that meet a given criterion are selected. The join criterion can be an equality condition (natural join or equijoin) or an inequality condition (theta join). The inner join is the most commonly used type of join. Contrast with outer join.

Outer Join
A relational algebra join operation that produces a table in which all unmatched pairs are retained; unmatched values in the related table are left null. Contrast with inner join.

Left Outer Join
In a pair of tables to be joined, a join that yields all the rows in the left table, including those that have no matching values in the other table. For example, a left outer join of CUSTOMER with AGENT will yield all of the CUSTOMER rows, including the ones that do not have a matching AGENT row.
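Several of the relational algebra operators defined above map directly onto Python's set operations, which gives a quick way to experiment with them. The sketch below is an illustration under stated assumptions, not how a DBMS implements these operators: relations are modelled as sets or lists of tuples, and the CUSTOMER and AGENT rows are invented sample data.

```python
# Two union-compatible relations, modelled as sets of tuples (name, code).
A = {("Anne", 10), ("Jorge", 20)}
B = {("Jorge", 20), ("Dennis", 30)}

union = A | B                             # UNION: all rows, duplicates dropped
intersect = A & B                         # INTERSECT: rows common to both tables
difference = A - B                        # DIFFERENCE: rows in A not found in B
product = {a + b for a in A for b in B}   # PRODUCT: every possible pairing of rows

# Hypothetical CUSTOMER(cus_name, agent_code) and AGENT(agent_code, agent_name) rows.
customer = [("Walker", 231), ("Adares", 125), ("Vanloo", None)]
agent = [(125, "Okon"), (231, "Farriss"), (333, "Smith")]

# Equijoin on agent_code: keep only the pairs whose join columns are equal.
inner_join = [
    (cus_name, code, agent_name)
    for cus_name, code in customer
    for agent_code, agent_name in agent
    if code == agent_code
]

# Left outer join: keep every CUSTOMER row; unmatched rows get None (null) for the agent.
agent_by_code = dict(agent)
left_outer_join = [
    (cus_name, code, agent_by_code.get(code)) for cus_name, code in customer
]

print(union, intersect, difference, len(product))
print(inner_join)
print(left_outer_join)
```

The customer Vanloo has no agent, so the row disappears from the inner join but is retained, padded with a null, in the left outer join.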
Right Outer Join
In a pair of tables to be joined, a join that yields all of the rows in the right table, including the ones with no matching values in the other table. For example, a right outer join of CUSTOMER with AGENT will yield all of the AGENT rows, including the ones that do not have a matching CUSTOMER row.

DIVIDE
In relational algebra, an operator that answers queries about one set of data being associated with all values of data in another set of data.

3.5 The Data Dictionary and the System Catalog

Data Dictionary
A DBMS component that stores metadata (data about data). Thus, the data dictionary contains the data definitions as well as their characteristics and relationships. A data dictionary may also include data that are external to the DBMS. Also known as an information resource dictionary.

System Catalog
A detailed system data dictionary that describes all objects in a database.

Homonym
The use of the same name to label different attributes. Homonyms generally should be avoided. Some relational software automatically checks for homonyms and either alerts the user to their existence or automatically makes the appropriate adjustments.

3.6 Relationships within the Relational Database

Synonym
The use of different names to identify the same object, such as an entity, an attribute, or a relationship; synonyms should generally be avoided.

Composite Entity
An entity designed to transform an M:N relationship into two 1:M relationships. The composite entity’s primary key comprises at least the primary keys of the entities that it connects. Also known as a bridge entity or associative entity.

Linking Table
In the relational model, a table that implements an M:N relationship.

3.8 Indexes

Index
An ordered array of index key values and row ID values (pointers). Indexes are generally used to speed up and facilitate data retrieval. Also known as an index key.
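The Composite Entity and Linking Table entries above describe how an M:N relationship is broken into two 1:M relationships. A minimal sketch of that idea follows, using Python's built-in sqlite3 module; the student, class, and enroll tables, their columns, and the index name are hypothetical. The sketch also creates a unique index, the subject of the entry that follows, so that its effect can be observed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE student (stu_num INTEGER PRIMARY KEY, stu_lname TEXT);
CREATE TABLE class (class_code INTEGER PRIMARY KEY, class_name TEXT);
-- Composite (bridge) entity: its primary key combines the keys of both parents.
CREATE TABLE enroll (
    stu_num INTEGER REFERENCES student(stu_num),
    class_code INTEGER REFERENCES class(class_code),
    enroll_grade TEXT,
    PRIMARY KEY (stu_num, class_code)
);
-- Unique index: each index key value may point to only one row.
CREATE UNIQUE INDEX idx_class_name ON class (class_name);
""")

conn.executemany("INSERT INTO student VALUES (?, ?)", [(1, "Bowser"), (2, "Smith")])
conn.executemany(
    "INSERT INTO class VALUES (?, ?)",
    [(10014, "Accounting 1"), (10018, "Intro to Databases")],
)
conn.executemany(
    "INSERT INTO enroll VALUES (?, ?, ?)",
    [(1, 10014, "C"), (1, 10018, "B"), (2, 10018, "A")],
)

# The unique index rejects a second class with an existing name.
try:
    conn.execute("INSERT INTO class VALUES (10021, 'Accounting 1')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Each side of the M:N relationship is now a 1:M relationship to ENROLL.
classes_per_student = conn.execute(
    "SELECT stu_num, COUNT(*) FROM enroll GROUP BY stu_num ORDER BY stu_num"
).fetchall()
students_per_class = conn.execute(
    "SELECT class_code, COUNT(*) FROM enroll GROUP BY class_code ORDER BY class_code"
).fetchall()
print(duplicate_rejected, classes_per_student, students_per_class)
```

One student appears in many enroll rows and one class appears in many enroll rows, which is exactly the pair of 1:M relationships the composite entity is meant to provide.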
Unique Index
An index in which the index key can have only one associated pointer value (row).

Chapter 4 - Entity Relationship (ER) Modeling
Sunday, 04 February 2018 11:34

4.1 The Entity Relationship Model (ERM)

Required Attributes
In ER modeling, an attribute that must have a value. In other words, it cannot be left empty.

Optional Attributes
In ER modeling, an attribute that does not require a value; therefore, it can be left empty.

Identifiers
One or more attributes that uniquely identify each entity instance.

Relational Schema
The organization of a relational database as described by the database administrator.

Composite Identifier
In ER modeling, a key composed of more than one attribute.

Composite Attribute
An attribute that can be further subdivided to yield additional attributes. For example, a phone number such as 615-898-2368 may be divided into an area code (615), an exchange number (898), and a four-digit code (2368).

Simple Attribute
An attribute that cannot be subdivided into meaningful components.

Single-valued Attribute
An attribute that can have only one value.

Multivalued Attribute
An attribute that can have many values for a single entity occurrence. For example, an EMP_DEGREE attribute might store the string “BBA, MBA, PHD” to indicate three different degrees held.

Derived Attribute
An attribute that does not physically exist within the entity and is derived via an algorithm. For example, the Age attribute might be derived by subtracting the birth date from the current date.

Participants
An ER term for entities that participate in a relationship. For example, in the relationship “PROFESSOR teaches CLASS,” the teaches relationship is based on the participants PROFESSOR and CLASS.

Connectivity
The classification of the relationship between entities. Classifications include 1:1, 1:M, and M:N.
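The Derived Attribute entry above mentions computing Age from the birth date. A short sketch of one such algorithm is given below; the function name derive_age, the attribute names, and the dates are assumptions for illustration. Only the birth date is stored, and the age is calculated on demand.

```python
from datetime import date


def derive_age(birth_date, today):
    """Derive age in whole years; subtract one if this year's birthday has not yet occurred."""
    had_birthday = (today.month, today.day) >= (birth_date.month, birth_date.day)
    return today.year - birth_date.year - (0 if had_birthday else 1)


# Hypothetical employee row: the age itself is never stored.
employee = {"EMP_NUM": 101, "EMP_DOB": date(1990, 6, 15)}

print(derive_age(employee["EMP_DOB"], date(2018, 2, 4)))
```

Storing the birth date and deriving the age avoids having to update every row each time a birthday passes, at the cost of a small calculation whenever the value is needed.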
Cardinality
A property that assigns a specific value to connectivity and expresses the range of allowed entity occurrences associated with a single occurrence of the related entity.

Existence-dependent
A property of an entity whose existence depends on one or more other entities. In such an environment, the existence-independent table must be created and loaded first because the existence-dependent key cannot reference a table that does not yet exist.

Existence-independent
A property of an entity that can exist apart from one or more related entities. Such a table must be created before any existence-dependent table that references it.

Strong Entity
An entity that is existence-independent, that is, it can exist apart from all of its related entities. Also called a regular entity.

Weak (non-identifying) Relationship
A relationship in which the primary key of the related entity does not contain a primary key component of the parent entity.

Strong (identifying) Relationship
A relationship that occurs when two entities are existence-dependent; from a database design perspective, this relationship exists whenever the primary key of the related entity contains the primary key of the parent entity.

Weak Entity
An entity that displays existence dependence and inherits the primary key of its parent entity. For example, a DEPENDENT requires the existence of an EMPLOYEE.

Optional Participation
In ER modeling, a condition in which one entity occurrence does not require a corresponding entity occurrence in a particular relationship.

Mandatory Participation
A relationship in which one entity occurrence must have a corresponding occurrence in another entity. For example, an EMPLOYEE works in a DIVISION. (A person cannot be an employee without being assigned to a company’s division.)

Relationship Degree
The number of entities or participants associated with a relationship.
A relationship degree can be unary, binary, ternary, or higher.

Unary Relationship
An ER term used to describe an association within an entity. For example, an EMPLOYEE might manage another EMPLOYEE.

Binary Relationship
An ER term for an association (relationship) between two entities. For example, PROFESSOR teaches CLASS.

Ternary Relationship
An ER term used to describe an association (relationship) between three entities. For example, a DOCTOR prescribes a DRUG for a PATIENT.

Recursive Relationship
A relationship found within a single entity type. For example, an EMPLOYEE is married to an EMPLOYEE or a PART is a component of another PART.

4.2 Developing an ER Diagram

Iterative Process
A process based on repetition of steps and procedures.

Chapter 5 - Advanced Data Modeling
Monday, 05 February 2018 21:38

5.1 The Extended Entity Relationship Model

Extended Entity Relationship Model (EERM)
Sometimes referred to as the enhanced entity relationship model; the result of adding more semantic constructs, such as entity supertypes, entity subtypes, and entity clustering, to the original entity relationship (ER) model.

EER Diagram (EERD)
The entity relationship diagram resulting from the application of extended entity relationship concepts that provide additional semantic content in the ER model.

Entity Supertype
In a generalization/specialization hierarchy, a generic entity type that contains the common characteristics of entity subtypes.

Entity Subtype
In a generalization/specialization hierarchy, a subset of an entity supertype. The entity supertype contains the common characteristics and the subtypes contain the unique characteristics of each entity.

Specialization Hierarchy
A hierarchy based on the top-down process of identifying lower-level, more specific entity subtypes from a higher-level entity supertype. Specialization is based on grouping unique characteristics and relationships of the subtypes.
Inheritance
In the EERD, the property that enables an entity subtype to inherit the attributes and relationships of the entity supertype.

Subtype Discriminator
The attribute in the supertype entity that determines to which entity subtype each supertype occurrence is related.

Disjoint Subtype
In a specialization hierarchy, a unique and nonoverlapping subtype entity set.

Overlapping Subtype
In a specialization hierarchy, a condition in which each entity instance (row) of the supertype can appear in more than one subtype.

Completeness Constraint
A constraint that specifies whether each entity supertype occurrence must also be a member of at least one subtype. The completeness constraint can be partial or total.

Partial Completeness
In a generalization/specialization hierarchy, a condition in which some supertype occurrences might not be members of any subtype.

Total Completeness
In a generalization/specialization hierarchy, a condition in which every supertype occurrence must be a member of at least one subtype.

Specialization
In a specialization hierarchy, the grouping of unique attributes into a subtype entity.

Generalization
In a specialization hierarchy, the grouping of common attributes into a supertype entity.

5.2 Entity Clustering

Entity Cluster
A "virtual" entity type used to represent multiple entities and relationships in the ERD. An entity cluster is formed by combining multiple interrelated entities into a single abstract entity object. An entity cluster is considered "virtual" or "abstract" because it is not actually an entity in the final ERD.

5.3 Entity Integrity: Selecting Primary Keys

Natural Key (natural identifier)
A generally accepted identifier for real-world objects. As its name implies, a natural key is familiar to end users and forms part of their day-to-day business vocabulary.
Surrogate Key
A system-assigned primary key, generally numeric and auto-incremented.

5.4 Design Cases: Learning Flexible Database Design

Time-Variant Data
Data whose values are a function of time. For example, time-variant data can be seen at work when a company's history of all administrative appointments is tracked.

Design Trap
A problem that occurs when a relationship is improperly or incompletely identified and therefore is represented in a way that is not consistent with the real world. The most common design trap is known as a fan trap.

Fan Trap
A design trap that occurs when one entity is in two 1:M relationships with other entities, thus producing an association among the other entities that is not expressed in the model.

Chapter 6 - Normalization of Database Tables
Tuesday, 06 February 2018 21:12

6.1 Database Tables and Normalization

Normalization
A process that assigns attributes to entities so that data redundancies are reduced or eliminated.

Denormalization
A process by which a table is changed from a higher-level normal form to a lower-level normal form, usually to increase processing speed. Denormalization potentially yields data anomalies.

Prime Attribute
A key attribute; that is, an attribute that is part of a key or is the whole key.

Key Attributes
The attributes that form a primary key.

Nonprime Attribute
An attribute that is not part of a key.

6.3 The Normalization Process

Partial Dependency
A condition in which an attribute is dependent on only a portion (subset) of the primary key.

Transitive Dependency
A condition in which an attribute is dependent on another attribute that is not part of the primary key.

Repeating Group
In a relation, a characteristic describing a group of multiple entries of the same type for a single key attribute occurrence. For example, a car can have multiple colours for its top, interior, bottom, trim and so on.
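The partial and transitive dependencies defined above can be checked against sample rows. The following is a minimal sketch, not a method from the notes: the ASSIGNMENT table, its column names (PROJ_NUM, EMP_NUM, EMP_NAME, JOB_CLASS, CHG_HOUR, HOURS) and the helper `determines` are all hypothetical, and a check on sample data only shows that a dependency is consistent with those rows, not that it holds in general.

```python
def determines(rows, lhs, rhs):
    """Return True if, in these rows, equal lhs values always imply equal rhs values."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it afterwards
        if seen.setdefault(key, val) != val:
            return False
    return True


# Hypothetical ASSIGNMENT table; the composite primary key is (PROJ_NUM, EMP_NUM).
rows = [
    {"PROJ_NUM": 15, "EMP_NUM": 103, "EMP_NAME": "Ann", "JOB_CLASS": "Engineer", "CHG_HOUR": 85, "HOURS": 20},
    {"PROJ_NUM": 15, "EMP_NUM": 101, "EMP_NAME": "Ben", "JOB_CLASS": "Analyst", "CHG_HOUR": 96, "HOURS": 10},
    {"PROJ_NUM": 18, "EMP_NUM": 103, "EMP_NAME": "Ann", "JOB_CLASS": "Engineer", "CHG_HOUR": 85, "HOURS": 5},
    {"PROJ_NUM": 18, "EMP_NUM": 104, "EMP_NAME": "Cy", "JOB_CLASS": "Analyst", "CHG_HOUR": 96, "HOURS": 12},
]

# HOURS depends on the whole primary key.
full_key_dep = determines(rows, ["PROJ_NUM", "EMP_NUM"], ["HOURS"])
# EMP_NAME depends on only part of the key: a partial dependency.
partial_dep = determines(rows, ["EMP_NUM"], ["EMP_NAME"])
# HOURS is not determined by EMP_NUM alone.
hours_by_emp = determines(rows, ["EMP_NUM"], ["HOURS"])
# CHG_HOUR depends on JOB_CLASS, a nonkey attribute: a transitive dependency.
transitive_dep = determines(rows, ["JOB_CLASS"], ["CHG_HOUR"])

# Splitting the table along those dependencies gives one table per determinant.
employee = {r["EMP_NUM"]: (r["EMP_NAME"], r["JOB_CLASS"]) for r in rows}
job = {r["JOB_CLASS"]: r["CHG_HOUR"] for r in rows}
assignment = {(r["PROJ_NUM"], r["EMP_NUM"]): r["HOURS"] for r in rows}
```

After the split, each employee name and each hourly charge is stored once, which is the redundancy reduction that normalization aims for.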
Dependency Diagram
A representation of all data dependencies (primary key, partial, or transitive) within a table.

First Normal Form (1NF)
The first stage in the normalization process. It describes a relation depicted in tabular format, with no repeating groups and a primary key identified. All nonkey attributes in the relation are dependent on the primary key.

Second Normal Form (2NF)
The second stage in the normalization process, in which a relation is in 1NF and there are no partial dependencies (dependencies on only part of the primary key).

Determinant
Any attribute in a specific row whose value directly determines other values in that row.

Third Normal Form (3NF)
A table is in 3NF when it is in 2NF and no nonkey attribute is functionally dependent on another nonkey attribute; that is, it cannot include transitive dependencies.

6.4 Improving the Design

Atomic Attribute
An attribute that cannot be further subdivided to produce meaningful components. For example, a person's last name attribute cannot be meaningfully subdivided.

Atomicity
The property of not being divisible into smaller units.

Granularity
The level of detail represented by the values stored in a table's row. Data stored at its lowest level of granularity is said to be atomic data.

6.6 Higher-Level Normal Forms

Boyce-Codd Normal Form (BCNF)
A special type of third normal form (3NF) in which every determinant is a candidate key. A table in BCNF must be in 3NF.

6.7 Normalization and Database Design

Fourth Normal Form (4NF)
A table is in 4NF if it is in 3NF and contains no multiple independent sets of multivalued dependencies.

Chapter 7 - Introduction to Structured Query Language (SQL)
Sunday, 11 February 2018 11:32

7.1 Introduction to SQL

[Table: SQL Data Definition Commands]
[Table: SQL Data Manipulation Commands]

7.2 Data Definition Commands

Authentication
The process through which a DBMS verifies that only registered users can access the database.
Schema
A logical grouping of database objects, such as tables, indexes, views, and queries, that are related to each other. Usually, a schema belongs to a single user or application.

[Table: Some Common SQL Data Types]

CREATE TABLE
A SQL command that creates a table's structures using the characteristics and attributes given.

Reserved Words
Words used by a system that cannot be used for any other purpose. For example, in Oracle SQL, the word INITIAL cannot be used to name tables or columns.

CREATE INDEX
A SQL command that creates indexes on the basis of a selected attribute or attributes.

DROP INDEX
A SQL command that deletes an existing index.

INSERT
A SQL command that allows the insertion of one or more data rows into a table, either from a list of values or from a subquery.

COMMIT
The SQL command that permanently saves data changes to a database.
COMMIT;

SELECT
A SQL command that yields the values of all rows or a subset of rows in a table. The SELECT statement is used to retrieve data from tables.

Wildcard Character
A symbol that can be used as a general substitute for: (1) all columns in a table (*) when used in the attribute list of a SELECT statement, or (2) zero or more characters (%) or exactly one character (_) in a SQL LIKE clause condition.

FROM
A SQL clause that specifies the table or tables from which data is to be retrieved.

UPDATE
A SQL command that allows attribute values to be changed in one or more rows of a table.

ROLLBACK
A SQL command that restores the database table contents to the condition that existed after the last COMMIT statement.
ROLLBACK;

DELETE
A SQL command that allows data rows to be deleted from a table.

Subquery
A query that is embedded (or nested) inside another query. Also known as a nested query or an inner query.

Nested Query
In SQL, a query that is embedded in another query.

Inner Query
A query that is embedded or nested inside another query. Also known as a nested query or a subquery.
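The data definition and manipulation commands above can be tried end to end. The sketch below is illustrative only: it assumes SQLite (through Python's standard sqlite3 module) as a stand-in DBMS, and the PRODUCT_BACKUP table, the index name P_PRICE_NDX, the columns P_CODE and P_DESCRIPT, and the sample rows are hypothetical. Other systems, such as Oracle, differ in data types and transaction details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
cur = conn.cursor()

# CREATE TABLE and CREATE INDEX (data definition).
cur.execute("CREATE TABLE PRODUCT (P_CODE TEXT PRIMARY KEY, P_DESCRIPT TEXT NOT NULL, P_PRICE REAL NOT NULL)")
cur.execute("CREATE TABLE PRODUCT_BACKUP (P_CODE TEXT PRIMARY KEY, P_DESCRIPT TEXT NOT NULL, P_PRICE REAL NOT NULL)")
cur.execute("CREATE INDEX P_PRICE_NDX ON PRODUCT (P_PRICE)")

# INSERT from a list of values, then INSERT from a subquery.
cur.executemany(
    "INSERT INTO PRODUCT VALUES (?, ?, ?)",
    [("A1", "Hammer", 10.0), ("B2", "Saw", 20.0), ("C3", "Drill", 60.0)],
)
cur.execute("INSERT INTO PRODUCT_BACKUP SELECT * FROM PRODUCT")
conn.commit()  # COMMIT: the inserted rows are now permanent

# UPDATE and DELETE, then ROLLBACK to the state after the last COMMIT.
cur.execute("UPDATE PRODUCT SET P_PRICE = P_PRICE * 1.10 WHERE P_CODE = 'A1'")
cur.execute("DELETE FROM PRODUCT WHERE P_CODE = 'B2'")
conn.rollback()

row_count = cur.execute("SELECT COUNT(*) FROM PRODUCT").fetchone()[0]
backup_count = cur.execute("SELECT COUNT(*) FROM PRODUCT_BACKUP").fetchone()[0]
price_a1 = cur.execute("SELECT P_PRICE FROM PRODUCT WHERE P_CODE = 'A1'").fetchone()[0]

# Subquery (inner query): products priced above the average price.
cur.execute(
    "SELECT P_CODE FROM PRODUCT "
    "WHERE P_PRICE > (SELECT AVG(P_PRICE) FROM PRODUCT) ORDER BY P_CODE"
)
above_avg = [r[0] for r in cur.fetchall()]

# DROP INDEX removes only the index; the PRODUCT table is untouched.
cur.execute("DROP INDEX P_PRICE_NDX")
conn.close()
```

Because the UPDATE and DELETE were rolled back, PRODUCT still holds the three committed rows at their original prices.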
7.4 SELECT Queries

WHERE
A SQL clause that adds conditional restrictions to a SELECT statement, limiting the rows returned by the query.

[Table: Comparison Operators]

Alias
An alternative name for a column or table in a SQL statement.

[Table: The Arithmetic Operators]

Rules of Precedence
Basic algebraic rules that specify the order in which operations are performed. For example, operations within parentheses are executed first, so in the expression 2 + (3 × 5), the multiplication portion is calculated first, making the correct answer 17.

OR
The SQL logical operator used to link multiple conditional expressions in a WHERE or HAVING clause. It requires only one of the conditional expressions to be true.

AND
The SQL logical operator used to link multiple conditional expressions in a WHERE or HAVING clause. It requires that all conditional expressions evaluate to true.

Boolean Algebra
A branch of mathematics that uses the logical operators OR, AND, and NOT.

NOT
A SQL logical operator that negates a given predicate.

BETWEEN
In SQL, a special comparison operator used to check whether a value is within a range of specified values.

IS NULL
In SQL, a comparison operator used to check whether an attribute lacks a value (that is, whether it is null).

LIKE
In SQL, a comparison operator used to check whether an attribute's text value matches a specified string pattern.
• % means any and all following or preceding characters are eligible.
• _ means any one character may be substituted for the underscore.

IN
In SQL, a comparison operator used to check whether a value is among a list of specified values.

EXISTS
In SQL, a comparison operator that checks whether a subquery returns any rows.

7.5 Additional Data Definition Commands

ALTER TABLE
The SQL command used to make changes to table structure. When the command is followed by a keyword (ADD or MODIFY), it adds a column or changes column characteristics.
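ALTER TABLE and the WHERE-clause operators from section 7.4 can be exercised together, since a newly added column is null in every existing row. This is a minimal sketch assuming SQLite through Python's sqlite3 module. The column P_DISCOUNT, the helper `codes`, and the sample rows are hypothetical. SQLite accepts ALTER TABLE ... ADD COLUMN but not the MODIFY keyword, which is Oracle-style syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE PRODUCT (P_CODE TEXT PRIMARY KEY, P_DESCRIPT TEXT, P_PRICE REAL)")
cur.executemany(
    "INSERT INTO PRODUCT VALUES (?, ?, ?)",
    [
        ("A1", "Claw hammer", 10.0),
        ("B2", "Hand saw", 20.0),
        ("C3", "Power drill", 60.0),
        ("D4", "Sledge hammer", 35.0),
    ],
)

# ALTER TABLE ... ADD: existing rows get NULL in the new column.
cur.execute("ALTER TABLE PRODUCT ADD COLUMN P_DISCOUNT REAL")
cur.execute("UPDATE PRODUCT SET P_DISCOUNT = 0.05 WHERE P_CODE = 'C3'")


def codes(where):
    """Return the P_CODE values, in order, of rows matching a WHERE condition."""
    cur.execute(f"SELECT P_CODE FROM PRODUCT WHERE {where} ORDER BY P_CODE")
    return [r[0] for r in cur.fetchall()]


no_discount = codes("P_DISCOUNT IS NULL")              # rows with no value
mid_price = codes("P_PRICE BETWEEN 15 AND 40")         # inclusive range
hammers = codes("P_DESCRIPT LIKE '%hammer'")           # % matches any run of characters
second = codes("P_CODE LIKE '_2'")                     # _ matches exactly one character
listed = codes("P_CODE IN ('A1', 'C3')")               # membership in a list
either = codes("P_PRICE < 15 OR P_PRICE > 50")         # one condition is enough
both = codes("P_PRICE > 30 AND NOT (P_DESCRIPT LIKE '%drill')")  # all conditions must hold
```

Parentheses around the negated LIKE condition make the intended precedence explicit rather than relying on each DBMS's operator rules.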
DROP TABLE
A SQL command that deletes a table and its contents from the database.

7.6 Additional SELECT Query Keywords

ORDER BY
A SQL clause that is useful for ordering the output of a SELECT query (for example, in ascending or descending order).

Cascading Order Sequence
A nested ordering sequence for a set of rows, such as a list in which all last names are alphabetically ordered and, within the last names, all first names are ordered.

DISTINCT
A SQL clause that produces only a list of values that are different from one another.

[Table: Some Basic Aggregation Functions]

COUNT
A SQL aggregate function that outputs the number of rows containing non-null values for a given column or expression, sometimes used in conjunction with the DISTINCT clause.

MAX
A SQL aggregate function that yields the maximum attribute value in a given column.

MIN
A SQL aggregate function that yields the minimum attribute value in a given column.

SUM
A SQL aggregate function that yields the sum of all values for a given column or expression.

AVG
A SQL aggregate function that outputs the arithmetic mean for a specified column or expression.
SELECT AVG(P_PRICE) FROM PRODUCT;

GROUP BY
A SQL clause used to create frequency distributions when combined with any of the aggregate functions in a SELECT statement.

HAVING
A clause applied to the output of a GROUP BY operation to restrict selected rows.

7.7 Joining Database Tables

Recursive Query
A nested query that joins a table to itself.
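The aggregate functions, GROUP BY, HAVING, cascading ORDER BY, and a table joined to itself can be seen working on one small table. This is a sketch under stated assumptions: SQLite via Python's sqlite3 module, and a hypothetical EMPLOYEE table whose columns (EMP_NUM, EMP_LNAME, DEPT, SALARY, EMP_MGR) and rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    "CREATE TABLE EMPLOYEE ("
    "EMP_NUM INTEGER PRIMARY KEY, EMP_LNAME TEXT, DEPT TEXT, SALARY REAL, "
    "EMP_MGR INTEGER REFERENCES EMPLOYEE (EMP_NUM))"
)
cur.executemany(
    "INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?, ?)",
    [
        (100, "Kolmycz", "SALES", 5000.0, None),  # no manager: EMP_MGR is NULL
        (101, "Lewis", "SALES", 3000.0, 100),
        (102, "Vandam", "IT", 4000.0, 100),
        (103, "Jones", "IT", 6000.0, 102),
        (104, "Smith", "HR", 3500.0, 100),
    ],
)

# Aggregate functions over the whole table.
summary = cur.execute(
    "SELECT COUNT(*), MIN(SALARY), MAX(SALARY), SUM(SALARY), AVG(SALARY) FROM EMPLOYEE"
).fetchone()

# DISTINCT: each department listed once.
depts = [r[0] for r in cur.execute("SELECT DISTINCT DEPT FROM EMPLOYEE ORDER BY DEPT")]

# GROUP BY with HAVING: only departments with more than one employee.
big_depts = cur.execute(
    "SELECT DEPT, COUNT(*), AVG(SALARY) FROM EMPLOYEE "
    "GROUP BY DEPT HAVING COUNT(*) > 1 ORDER BY DEPT"
).fetchall()

# Cascading order sequence: by department, then by last name within each department.
cascade = cur.execute("SELECT DEPT, EMP_LNAME FROM EMPLOYEE ORDER BY DEPT, EMP_LNAME").fetchall()

# Table joined to itself through aliases E (employee) and M (manager).
pairs = cur.execute(
    "SELECT E.EMP_LNAME, M.EMP_LNAME FROM EMPLOYEE E "
    "JOIN EMPLOYEE M ON E.EMP_MGR = M.EMP_NUM ORDER BY E.EMP_NUM"
).fetchall()
conn.close()
```

The employee with a null EMP_MGR does not appear in `pairs`, because an inner join keeps only rows that find a matching manager.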