Uploaded by tsitsimessi

INF2603 summary

Chapter 1 - Database Systems
Monday, 22 January 2018
20:32
1.2 Data Versus Information
Raw Data
Raw facts, or facts that have not yet been processed to reveal their meaning to the end user.
Information
The result of processing raw data to reveal its meaning. Information consists of transformed data
and facilitates decision making.
Knowledge
The body of information and facts about a specific subject. Knowledge implies familiarity, awareness
and understanding of information as it applies to an environment. A key characteristic is that new
knowledge can be derived from old knowledge.
1.3 Introducing the Database
Data Management
A process that focusses on data collection, storage and retrieval. Common data management
functions include addition, deletion, modification and listing.
Database
A shared, integrated computer structure that houses a collection of data. A database contains two
types of data: end user data (raw data) and metadata.
Metadata
Data about data; that is, data about data characteristics and relationships.
Database Management System (DBMS)
The collection of programs that manages the database structure and controls access to the data
stored in the database.
DBMS Advantages
• Improved data sharing.
• Improved data security.
• Better data integration.
• Minimized data inconsistency.
• Improved data access.
• Improved decision making.
• Increased end-user productivity.
UNISA Page 1
Data Inconsistency
A condition in which different versions of the same data yield different (inconsistent) results.
Query
A question or task asked by an end user of a database in the form of SQL code. A specific request for
data manipulation issued by the end user or the application to the DBMS.
Ad Hoc Query
A “spur-of-the-moment” question.
Query Result Set
The collection of data rows returned by a query.
Data Quality
A comprehensive approach to ensuring the accuracy, validity, and timeliness of data.
Types of Databases
• Single-user database - A database that supports only one user at a time.
• Desktop database - A single-user database that runs on a personal computer.
• Multiuser database - A database that supports multiple concurrent users.
• Workgroup database - A multiuser database that usually supports fewer than 50 users or is
used for a specific department in an organization.
• Enterprise database - The overall company data representation, which provides support for
present and expected future needs.
• Centralized database - A database located at a single site.
• Distributed database - A logically related database that is stored in two or more physically
independent sites.
• Cloud database - A database that is created and maintained using cloud services, such as
Microsoft Azure or Amazon AWS.
• General-purpose database - A database that contains a wide variety of data used in multiple
disciplines.
• Discipline-specific database - A database that contains data focused on specific subject areas.
• Operational database - A database designed primarily to support a company’s day-to-day
operations. Also known as a transactional database, OLTP database, or production database.
• Analytic database - A database focused primarily on storing historical data and business
metrics used for tactical or strategic decision making.
Data Warehouse
A specialized database that stores historical and aggregated data in a format optimized for decision
support.
Online Analytical Processing (OLAP)
A set of tools that provide advanced data analysis for retrieving, processing, and modelling data
from the data warehouse.
Business Intelligence
A set of tools and processes used to capture, collect, integrate, store, and analyse data to support
business decision making.
Unstructured Data
Data that exists in its original, raw state; that is, in the format in which it was collected.
Structured Data
Data that has been formatted to facilitate storage, use, and information generation.
Semistructured Data
Data that has already been processed to some extent.
Extensible Markup Language (XML)
A metalanguage used to represent and manipulate data elements. Unlike other markup languages,
XML permits the manipulation of a document’s data elements. XML facilitates the exchange of
structured documents such as orders and invoices over the Internet.
XML Database
A database system that stores and manages semistructured XML data.
Social Media
Web and mobile technologies that enable “anywhere, anytime, always on” human interactions.
NoSQL
A new generation of database management systems that is not based on the traditional relational
database model.
1.4 Why Database design is important
Database Design
The process that yields the description of the database structure and determines the database
components. The second phase of the Database Life Cycle.
1.5 Evolution of File System Data Processing
Data Processing (DP) Specialist
The person responsible for developing and managing a computerized file processing system.
1.6 Problems with File System Data Processing
Structural Dependence
A data characteristic in which a change in the database schema affects data access, thus requiring
changes in all access programs.
Structural Independence
A data characteristic in which changes in the database schema do not affect data access.
Data Dependence
A data condition in which data representation and manipulation are dependent on the physical data
storage characteristics.
Data Independence
A condition in which data access is unaffected by changes in the physical data storage
characteristics.
Logical Data Format
The way a person views data within the context of a problem domain.
Physical Data Format
The way a computer “sees” (stores) data.
Islands of Information
In the old file system environment, pools of independent, often duplicated, and inconsistent data
created and managed by different departments.
Data Redundancy
Exists when the same data is stored unnecessarily at different places.
Data Integrity
In a relational database, a condition in which the data in the database complies with all entity and
referential integrity constraints.
Data Anomaly
A data abnormality in which inconsistent changes have been made to a database. For example, an
employee moves, but the address change is not corrected in all files in the database.
1.7 Database Systems
Database System
An organization of components that defines and regulates the collection, storage, management, and
use of data in a database environment.
Data Dictionary
A DBMS component that stores metadata—data about data. The data dictionary contains data
definitions as well as data characteristics and relationships. May also include data that is external to
the DBMS.
Performance Tuning
Activities that make a database perform more efficiently in terms of storage and access speed.
Query Language
A nonprocedural language that is used by a DBMS to manipulate its data. An example of a query
language is SQL.
Structured Query Language (SQL)
A powerful and flexible relational database language composed of commands that enable users to
create database and table structures, perform various types of data manipulation and data
administration, and query the database to extract useful information.
Chapter 2 - Data Models
Tuesday, 23 January 2018
21:35
2.1 Data Modeling and Data Models
Data Modeling
The process of creating a specific data model for a determined problem domain.
Data Model
A representation, usually graphic, of a complex “real-world” data structure. Data models are used in
the database design phase of the Database Life Cycle.
2.3 Data Model Basic Building Blocks
Entity
A person, place, thing, concept, or event for which data can be stored. See also attribute.
Attribute
A characteristic of an entity or object. An attribute has a name and a data type.
Relationship
An association between entities.
One-to-Many (1:M or 1..*) Relationship
Associations among two or more entities that are used by data models. In a 1:M relationship, one
entity instance is associated with many instances of the related entity.
Many-to-Many (M:N or *..*) Relationship
Association among two or more entities in which one occurrence of an entity is associated with
many occurrences of a related entity and one occurrence of the related entity is associated with
many occurrences of the first entity.
One-to-One (1:1 or 1..1) Relationship
Associations among two or more entities that are used by data models. In a 1:1 relationship, one
entity instance is associated with only one instance of the related entity.
Constraints
A restriction placed on data, usually expressed in the form of rules. For example, “A student’s GPA
must be between 0.00 and 4.00.” Constraints are important because they help to ensure data
integrity.
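The GPA rule above can be sketched as a CHECK constraint. This is only an illustration using Python's built-in sqlite3 module; the STUDENT table and its column names are assumptions made for the example, not part of the source notes.

```python
import sqlite3

# In-memory database; the student table and its columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE student (
        stu_num INTEGER PRIMARY KEY,
        stu_gpa REAL NOT NULL CHECK (stu_gpa BETWEEN 0.00 AND 4.00)
    )
    """
)

conn.execute("INSERT INTO student VALUES (1, 3.25)")  # satisfies the rule

try:
    conn.execute("INSERT INTO student VALUES (2, 4.70)")  # violates the rule
    rejected = False
except sqlite3.IntegrityError:
    rejected = True  # the DBMS refused the out-of-range GPA

row_count = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
```

The constraint lives in the table definition, so every program that writes to the table is held to the same rule.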
2.4 Business Rules
Business Rule
A description of a policy, procedure, or principle within an organization. For example, a pilot cannot
be on duty for more than 10 hours during a 24-hour period, or a professor may teach up to four
classes during a semester.
2.5 The Evolution of Data Models
Hierarchical Model
An early database model whose basic concepts and characteristics formed the basis for subsequent
database development. This model is based on an upside-down tree structure in which each record
is called a segment. The top record is the root segment. Each segment has a 1:M relationship to the
segment directly below it.
Segment
In the hierarchical data model, the equivalent of a file system’s record type.
Network Model
An early data model that represented data as a collection of record types in 1:M relationships.
Schema
A logical grouping of database objects, such as tables, indexes, views, and queries, that are related
to each other.
Subschema
The portion of the database that interacts with application programs.
Data Manipulation Language (DML)
The set of commands that allows an end user to manipulate the data in the database, such as
SELECT, INSERT, UPDATE, DELETE, COMMIT, and ROLLBACK.
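A minimal sketch of the DML commands in use, assuming Python's built-in sqlite3 module and a hypothetical PRODUCT table. COMMIT and ROLLBACK are issued through the connection's commit() and rollback() methods, and sqlite3 opens the transaction implicitly before the first data change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# CREATE TABLE is DDL, shown only to set up the example.
conn.execute("CREATE TABLE product (p_code TEXT PRIMARY KEY, p_price REAL)")

conn.execute("INSERT INTO product VALUES ('A1', 10.0)")
conn.execute("INSERT INTO product VALUES ('B2', 20.0)")
conn.execute("UPDATE product SET p_price = 12.5 WHERE p_code = 'A1'")
conn.commit()  # COMMIT: make the inserts and the update permanent

conn.execute("DELETE FROM product WHERE p_code = 'B2'")
conn.rollback()  # ROLLBACK: undo the uncommitted delete

rows = conn.execute(
    "SELECT p_code, p_price FROM product ORDER BY p_code"
).fetchall()
```

After the rollback both rows are still present, and A1 keeps its updated price because that change was committed.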
Data Definition Language
The language that allows a database administrator to define the database structure, schema, and
subschema.
Relational Model
Developed by E. F. Codd of IBM in 1970, the relational model is based on mathematical set theory
and represents data as independent relations. Each relation (table) is conceptually represented as a
two-dimensional structure of intersecting rows and columns. The relations are related to each other
through the sharing of common entity characteristics (values in columns).
Table (Relation)
A logical construct perceived to be a two-dimensional structure composed of intersecting rows
(entities) and columns (attributes) that represents an entity set in the relational model.
Tuple
In the relational model, a table row.
Relational Database Management System (RDBMS)
A collection of programs that manages a relational database. The RDBMS software translates a
user’s logical requests (queries) into commands that physically locate and retrieve the requested
data.
Relational Diagram
A graphical representation of a relational database’s entities, the attributes within those entities,
and the relationships among the entities.
Entity Relationship (ER) Model (ERM)
A data model that describes relationships (1:1, 1:M, and M:N) among entities at the conceptual level
with the help of ER diagrams. The model was developed by Peter Chen.
Entity Relationship Diagram (ERD)
A diagram that depicts an entity relationship model’s entities, attributes, and relations.
Entity Instance (Entity Occurrence)
A row in a relational table.
Entity Set
A collection of like entities.
Connectivity
The type of relationship between entities. Classifications include 1:1, 1:M, and M:N.
Crow's Foot Notation
A representation of the entity relationship diagram that uses a three-pronged symbol to represent
the “many” sides of the relationship.
Class Diagram Notation
The set of symbols used in the creation of class diagrams.
Object-Oriented Data Model (OODM)
A data model whose basic modeling structure is an object.
Object
An abstract representation of a real-world entity that has a unique identity, embedded properties,
and the ability to interact with other objects and itself.
Object-Oriented Database Management System (OODBMS)
Data management software used to manage data in an object-oriented database model.
Semantic Data Model
The first of a series of data models that more closely represented the real world, modeling both data
and their relationships in a single structure known as an object. The SDM, published in 1981, was
developed by M. Hammer and D. McLeod.
Class
A collection of similar objects with shared structure (attributes) and behaviour (methods). A class
encapsulates an object’s data representation and a method’s implementation. Classes are organized
in a class hierarchy.
Method
In the object-oriented data model, a named set of instructions to perform an action. Methods
represent real-world actions, and are invoked through messages.
Class Hierarchy
The organization of classes in a hierarchical tree in which each parent class is a superclass and each
child class is a subclass. See also inheritance.
Inheritance
In the object-oriented data model, the ability of an object to inherit the data structure and methods
of the classes above it in the class hierarchy.
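Inheritance can be illustrated with two small Python classes. The class names and attributes are assumptions chosen for the example; the sketch shows the general object-oriented idea, not any particular OODBMS.

```python
class Person:
    """Superclass: attributes and methods shared by every subclass."""

    def __init__(self, name):
        self.name = name

    def describe(self):
        return f"{self.name} ({type(self).__name__})"


class Student(Person):
    """Subclass: inherits name and describe(), and adds its own attribute."""

    def __init__(self, name, gpa):
        super().__init__(name)  # reuse the superclass data structure
        self.gpa = gpa


s = Student("Ann", 3.5)
description = s.describe()  # method inherited from Person
```

Student defines no describe() method of its own, yet the call works because the method is found one level up in the class hierarchy.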
Unified Modeling Language (UML)
A language based on object-oriented concepts that provides tools such as diagrams and symbols to
graphically model a system.
Class Diagram
A diagram used to represent data and their relationships in UML object notation.
Extended Relational Data Model (ERDM)
A model that includes the object-oriented model’s best features in an inherently simpler relational
database structural environment.
Object/Relational Database Management System (O/R DBMS)
A DBMS based on the extended relational model (ERDM). The ERDM, championed by many
relational database researchers, constitutes the relational model’s response to the OODM. This
model includes many of the object-oriented model’s best features within an inherently simpler
relational database structure.
Big Data
A movement to find new and better ways to manage large amounts of web-generated data and
derive business insight from it, while simultaneously providing high performance and scalability at a
reasonable cost.
3 Vs
Three basic characteristics of Big Data databases: volume, velocity, and variety.
Hadoop
A Java-based, open-source, high-speed, fault-tolerant distributed storage and computational
framework. Hadoop uses low-cost hardware to create clusters of thousands of computer nodes to
store and process data.
Hadoop Distributed File System (HDFS)
A highly distributed, fault-tolerant file storage system designed to manage large amounts of data at
high speeds.
Name Node
One of three types of nodes used in the Hadoop Distributed File System (HDFS). The name node
stores all the metadata about the file system.
Data Node
One of three types of nodes used in the Hadoop Distributed File System (HDFS). The data node
stores fixed-size data blocks (that could be replicated to other data nodes).
Client Node
One of three types of nodes used in the Hadoop Distributed File System (HDFS). The client node acts
as the interface between the user application and the HDFS.
MapReduce
An open-source application programming interface (API) that provides fast data analytics services;
one of the main Big Data technologies that allows organizations to process massive data stores.
Key-Value
A data model based on a structure composed of two data elements: a key and a value, in which
every key has a corresponding value or set of values. The key-value data model is also called the
associative or attribute-value data model.
Sparse Data
A case in which the number of table attributes is very large but the number of actual data instances
is low.
Eventual Consistency
A model for database consistency in which updates to the database will propagate through the
system so that all data copies will be consistent eventually.
American National Standards Institute (ANSI)
The group that accepted the DBTG recommendations and augmented database standards in 1975
through its SPARC committee.
External Model
The application programmer’s view of the data environment. Given its business focus, an external
model works with a data subset of the global database schema.
External Schema
The specific representation of an external view; the end user’s view of the data environment.
Conceptual Model
The output of the conceptual design process. The conceptual model provides a global view of an
entire database and describes the main data objects, avoiding details.
Conceptual Schema
A representation of the conceptual model, usually expressed graphically.
Software Independence
A property of any model or application that does not depend on the software used to implement it.
Hardware Independence
A condition in which a model does not depend on the hardware used in the model’s
implementation. Therefore, changes in the hardware will have no effect on the database design at
the conceptual level.
Logical Design
A stage in the design phase that matches the conceptual design to the requirements of the selected
DBMS and is therefore software-dependent. Logical design is used to translate the conceptual design
into the internal model for a selected database management system, such as DB2, SQL Server,
Oracle, IMS, Informix, Access, or Ingres.
Internal Model
In database modeling, a level of data abstraction that adapts the conceptual model to a specific
DBMS model for implementation. The internal model is the representation of a database as “seen”
by the DBMS. In other words, the internal model requires a designer to match the conceptual
model’s characteristics and constraints to those of the selected implementation model.
Internal Schema
A representation of an internal model using the database constructs supported by the chosen
database.
Logical Independence
A condition in which the internal model can be changed without affecting the conceptual model.
(The internal model is hardware-independent because it is unaffected by the computer on which the
software is installed. Therefore, a change in storage devices or operating systems will not affect the
internal model.)
Physical Model
A model in which physical characteristics such as location, path, and format are described for the
data. The physical model is both hardware- and software-dependent.
Physical Independence
A condition in which the physical model can be changed without affecting the internal model.
Chapter 3 - The Relational Database Model
Tuesday, 30 January 2018
21:49
3.1 A Logical View of Data
Predicate logic
Used extensively in mathematics to provide a framework in which an assertion (statement of fact)
can be verified as either true or false.
Set Theory
A part of mathematical science that deals with sets, or groups of things, and is used as the basis for
data manipulation in the relational model.
Tuple
In the relational model, a table row.
Domain
In data modeling, the construct used to organize and describe an attribute’s set of possible values.
Primary Key (PK)
In the relational model, an identifier composed of one or more attributes that uniquely identifies a
row. Also, a candidate key selected as a unique entity identifier.
3.2 Keys
Key
One or more attributes that determine other attributes.
Determination
The role of a key. In the context of a database table, the statement “A determines B” indicates that
knowing the value of attribute A means that the value of attribute B can be looked up.
Functional Dependence
Within a relation R, an attribute B is functionally dependent on an attribute A if and only if a given
value of attribute A determines exactly one value of attribute B. The relationship “B is dependent on
A” is equivalent to “A determines B,” and is written as A → B.
Determinant
Any attribute in a specific row whose value directly determines other values in that row.
Dependent
An attribute whose value is determined by another attribute.
Full Functional Dependence
A condition in which an attribute is functionally dependent on a composite key but not on any
subset of the key.
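Functional dependence and full functional dependence can be tested against sample rows. The sketch below is an assumption-laden illustration: the function names and the enrollment data are hypothetical, and sample data can only refute a dependency, never prove that it holds for all possible data.

```python
from itertools import combinations


def determines(rows, lhs, rhs):
    """Return True if the attributes in lhs functionally determine rhs in rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it
        if seen.setdefault(key, value) != value:
            return False
    return True


def fully_dependent(rows, composite_key, rhs):
    """True if rhs depends on the whole composite key but on no proper subset."""
    if not determines(rows, composite_key, rhs):
        return False
    for size in range(1, len(composite_key)):
        for subset in combinations(composite_key, size):
            if determines(rows, subset, rhs):
                return False
    return True


# Hypothetical enrollment data
rows = [
    {"stu_num": 1, "class_code": "C1", "grade": "A", "stu_name": "Ann"},
    {"stu_num": 1, "class_code": "C2", "grade": "B", "stu_name": "Ann"},
    {"stu_num": 2, "class_code": "C1", "grade": "B", "stu_name": "Ben"},
    {"stu_num": 2, "class_code": "C2", "grade": "A", "stu_name": "Ben"},
]
key = ["stu_num", "class_code"]
```

Here stu_num → stu_name holds, so stu_name is only partially dependent on the composite key, while grade is fully dependent on (stu_num, class_code).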
Composite Key
A multiple-attribute key.
Key Attributes
The attributes that form a primary key.
Superkey
An attribute or attributes that uniquely identify each entity in a table.
Candidate Key
A minimal superkey; that is, a key that does not contain a subset of attributes that is itself a
superkey.
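The difference between a superkey and a candidate key can be sketched in a few lines. The helper names and the student rows are hypothetical, and the check only reflects the sample rows given, not the rules of a real schema.

```python
from itertools import combinations


def is_superkey(rows, attrs):
    """True if no two rows share the same values for attrs."""
    projected = [tuple(row[a] for a in attrs) for row in rows]
    return len(set(projected)) == len(projected)


def is_candidate_key(rows, attrs):
    """True if attrs is a superkey and no proper subset of attrs is one."""
    if not is_superkey(rows, attrs):
        return False
    return not any(
        is_superkey(rows, subset)
        for size in range(1, len(attrs))
        for subset in combinations(attrs, size)
    )


students = [
    {"stu_num": 1, "stu_email": "ann@example.com", "stu_lname": "Smith"},
    {"stu_num": 2, "stu_email": "ben@example.com", "stu_lname": "Smith"},
    {"stu_num": 3, "stu_email": "cat@example.com", "stu_lname": "Jones"},
]
```

(stu_num, stu_lname) identifies every row and is therefore a superkey, but it is not minimal because stu_num alone already does the job.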
Entity Integrity
The property of a relational table that guarantees each entity has a unique value in a primary key
and that the key has no null values.
Null
The absence of an attribute value. Note that a null is not a blank.
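A short sketch of why a null is not a blank, assuming Python's built-in sqlite3 module and a hypothetical CUSTOMER table. Python's None is stored as SQL NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (cus_code INTEGER PRIMARY KEY, cus_initial TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [(1, "K"), (2, ""), (3, None)],  # row 2 holds a blank, row 3 holds a null
)

blanks = conn.execute(
    "SELECT cus_code FROM customer WHERE cus_initial = ''"
).fetchall()
nulls = conn.execute(
    "SELECT cus_code FROM customer WHERE cus_initial IS NULL"
).fetchall()
# A comparison with = NULL never evaluates to true, so no rows come back.
equals_null = conn.execute(
    "SELECT cus_code FROM customer WHERE cus_initial = NULL"
).fetchall()
```

The blank and the null are found by different conditions, and a null has to be tested with IS NULL rather than with the equals sign.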
Foreign Key (FK)
An attribute or attributes in one table whose values must match the primary key in another table or
whose values must be null.
Referential Integrity
A condition by which a dependent table’s foreign key must have either a null entry or a matching
entry in the related table.
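Referential integrity can be demonstrated with a foreign key that accepts a matching value or a null but rejects anything else. This sketch assumes Python's built-in sqlite3 module; the AGENT and CUSTOMER columns and values are hypothetical. SQLite only enforces foreign keys after the PRAGMA shown below is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforcement is off by default in SQLite
conn.execute("CREATE TABLE agent (agent_code INTEGER PRIMARY KEY)")
conn.execute(
    """
    CREATE TABLE customer (
        cus_code   INTEGER PRIMARY KEY,
        agent_code INTEGER REFERENCES agent (agent_code)  -- foreign key, nullable
    )
    """
)
conn.execute("INSERT INTO agent VALUES (501)")
conn.execute("INSERT INTO customer VALUES (10010, 501)")   # matching entry
conn.execute("INSERT INTO customer VALUES (10011, NULL)")  # null entry is allowed

try:
    conn.execute("INSERT INTO customer VALUES (10012, 999)")  # no such agent
    violation_caught = False
except sqlite3.IntegrityError:
    violation_caught = True

customer_count = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
```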
Secondary Key
A key used strictly for data retrieval purposes. For example, customers are not likely to know their
customer number (primary key), but the combination of last name, first name, middle initial, and
telephone number will probably match the appropriate table row.
Flags
Special codes implemented by designers to trigger a required response, alert end users to specified
conditions, or encode values. Flags may be used to prevent nulls by bringing attention to the
absence of a value in a table.
Relational Algebra
A set of mathematical principles that form the basis for manipulating relational table contents; the
eight main functions are SELECT, PROJECT, JOIN, INTERSECT, UNION, DIFFERENCE, PRODUCT, and
DIVIDE.
Relvar
Short for relation variable, a variable that holds a relation. A relvar is a container (variable) for
holding relation data, not the relation itself.
Closure
A property of relational operators that permits the use of relational algebra operators on existing
tables (relations) to produce new relations.
SELECT
In relational algebra, an operator used to select a subset of rows. Also known as RESTRICT.
RESTRICT
Same as SELECT.
PROJECT
In relational algebra, an operator used to select a subset of columns.
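SELECT and PROJECT can be sketched over a list of rows. The function names and product data are assumptions for the example, and dropping duplicate rows in project() follows the set-based view of a relation rather than anything stated in the definitions above.

```python
def select(relation, predicate):
    """SELECT (RESTRICT): keep the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def project(relation, attrs):
    """PROJECT: keep only the named columns and drop duplicate rows."""
    result = []
    for row in relation:
        reduced = {a: row[a] for a in attrs}
        if reduced not in result:
            result.append(reduced)
    return result


# Hypothetical product rows
products = [
    {"p_code": "A1", "p_descript": "Hammer", "price": 9.95},
    {"p_code": "B2", "p_descript": "Saw", "price": 14.50},
    {"p_code": "C3", "p_descript": "Hammer", "price": 4.25},
]

cheap = select(products, lambda row: row["price"] < 10)   # subset of rows
descriptions = project(products, ["p_descript"])          # subset of columns
cheap_codes = project(cheap, ["p_code"])                  # closure: output feeds another operator
```

Because each operator returns a relation, the result of one can be passed straight into another, which is the closure property in action.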
UNION
In relational algebra, an operator used to merge (append) two tables into a new table, dropping the
duplicate rows. The tables must be union-compatible.
A∪B
Union-Compatible
Two or more tables that have the same number of columns and the corresponding columns have
compatible domains.
INTERSECT
In relational algebra, an operator used to yield only the rows that are common to two
union-compatible tables.
A∩B
DIFFERENCE
In relational algebra, an operator used to yield all rows from one table that are not found in another
union-compatible table.
A-B
PRODUCT
In relational algebra, an operator used to yield all possible pairs of rows from two tables. Also known
as the Cartesian product.
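UNION, INTERSECT, DIFFERENCE, and PRODUCT map closely onto Python's set operations when each row is written as a tuple. The relations below are made-up sample data.

```python
from itertools import product

# Two union-compatible relations: same number of columns, compatible domains.
a = {("Ann", "Smith"), ("Ben", "Jones")}
b = {("Ben", "Jones"), ("Cat", "Brown")}

union = a | b        # all rows from both tables, duplicates dropped
intersect = a & b    # only the rows common to both tables
difference = a - b   # rows in a that are not found in b

# PRODUCT pairs every row of one relation with every row of the other.
sizes = {("S",), ("M",)}
pairs = {left + right for left, right in product(a, sizes)}
```

The product of a two-row table with another two-row table yields four rows, one for each possible pairing.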
JOIN
In relational algebra, a type of operator used to yield rows from two tables based on criteria. There
are many types of joins, such as natural join, theta join, equijoin, and outer join.
c⨝a
Natural Join
A relational operation that yields a new table composed of only the rows with common values in
their common attribute(s).
Join Columns
Columns that are used in the criteria of join operations. The join columns generally share similar
values.
Equijoin
A join operator that links tables based on an equality condition that compares specified columns of
the tables.
Theta Join
A join operator that links tables using an inequality comparison operator (<, >, <=, >=) in the join
condition.
Inner Join
A join operation in which only rows that meet a given criterion are selected. The join criterion can be
an equality condition (natural join or equijoin) or an inequality condition (theta join). The inner join
is the most commonly used type of join. Contrast with outer join.
Outer Join
A relational algebra join operation that produces a table in which all unmatched pairs are retained;
unmatched values in the related table are left null. Contrast with inner join.
Left Outer Join
In a pair of tables to be joined, a join that yields all the rows in the left table, including those that
have no matching values in the other table. For example, a left outer join of CUSTOMER with AGENT
will yield all of the CUSTOMER rows, including the ones that do not have a matching AGENT row.
Right Outer Join
In a pair of tables to be joined, a join that yields all of the rows in the right table, including the ones
with no matching values in the other table. For example, a right outer join of CUSTOMER with
AGENT will yield all of the AGENT rows, including the ones that do not have a matching CUSTOMER
row.
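The inner, left outer, and right outer joins of CUSTOMER with AGENT can be compared side by side. This is a sketch using Python's built-in sqlite3 module; the column names and sample rows are assumptions. The right outer join is written as a left join with the tables swapped, because older SQLite versions do not support RIGHT JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE agent (agent_code INTEGER PRIMARY KEY, agent_name TEXT);
    CREATE TABLE customer (cus_code INTEGER PRIMARY KEY, agent_code INTEGER);
    INSERT INTO agent VALUES (501, 'Alby'), (502, 'Hahn');
    INSERT INTO customer VALUES (1, 501), (2, NULL);
    """
)

# Inner join: only customers that have a matching agent.
inner = conn.execute(
    "SELECT c.cus_code, a.agent_code FROM customer c "
    "JOIN agent a ON c.agent_code = a.agent_code ORDER BY c.cus_code"
).fetchall()

# Left outer join: every CUSTOMER row, matched or not.
left_outer = conn.execute(
    "SELECT c.cus_code, a.agent_code FROM customer c "
    "LEFT JOIN agent a ON c.agent_code = a.agent_code ORDER BY c.cus_code"
).fetchall()

# Right outer join of CUSTOMER with AGENT: every AGENT row, matched or not.
right_outer = conn.execute(
    "SELECT c.cus_code, a.agent_code FROM agent a "
    "LEFT JOIN customer c ON c.agent_code = a.agent_code ORDER BY a.agent_code"
).fetchall()
```

Unmatched values come back as None, which is how sqlite3 reports the nulls left in the result.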
DIVIDE
In relational algebra, an operator that answers queries about one set of data being associated with
all values of data in another set of data.
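DIVIDE is the least intuitive of the operators, so a small sketch may help. The divide() function and the course data are hypothetical; the dividend is a set of (x, y) pairs and the divisor is a set of y values.

```python
def divide(dividend, divisor):
    """Return each x that is paired in dividend with every y in divisor."""
    candidates = {x for x, _ in dividend}
    return {x for x in candidates if all((x, y) in dividend for y in divisor)}


# Hypothetical (student, course) pairs and the list of required courses
completed = {
    ("Ann", "DB"), ("Ann", "OS"),
    ("Ben", "DB"),
    ("Cat", "DB"), ("Cat", "OS"),
}
required = {"DB", "OS"}

result = divide(completed, required)  # students who completed every required course
```

Ben is left out because he is associated with only one of the two required courses.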
3.5 The Data Dictionary and the System Catalog
Data Dictionary
A DBMS component that stores metadata— data about data. Thus, the data dictionary contains the
data definition as well as their characteristics and relationships. A data dictionary may also include
data that are external to the DBMS. Also known as an information resource dictionary.
System Catalog
A detailed system data dictionary that describes all objects in a database.
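Most DBMSs let the catalog be queried like any other table. As one illustration, SQLite keeps its catalog in a table named sqlite_master and exposes column metadata through PRAGMA table_info. The VENDOR table and index below are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vendor (v_code INTEGER PRIMARY KEY, v_name TEXT NOT NULL)")
conn.execute("CREATE INDEX vendor_name_ndx ON vendor (v_name)")

# sqlite_master holds one row per table, index, view, or trigger.
objects = conn.execute(
    "SELECT type, name FROM sqlite_master ORDER BY name"
).fetchall()

# PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(vendor)")]
```

Both results are metadata: they describe the database's objects and their characteristics rather than end-user data.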
Homonym
The use of the same name to label different attributes. Homonyms generally should be avoided.
Some relational software automatically checks for homonyms and either alerts the user to their
existence or automatically makes the appropriate adjustments.
Synonym
The use of different names to identify the same object, such as an entity, an attribute, or a
relationship; synonyms should generally be avoided.
3.6 Relationships within the Relational Database
Composite Entity
An entity designed to transform an M:N relationship into two 1:M relationships. The composite
entity’s primary key comprises at least the primary keys of the entities that it connects. Also known
as a bridge entity or associative entity.
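A composite entity can be sketched as a table whose primary key combines the keys of the two entities it connects. The STUDENT, CLASS, and ENROLL tables below are assumptions for the example, built with Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE student (stu_num INTEGER PRIMARY KEY);
    CREATE TABLE class (class_code TEXT PRIMARY KEY);
    -- Composite (bridge) entity: its primary key combines both parents' keys.
    CREATE TABLE enroll (
        stu_num      INTEGER REFERENCES student (stu_num),
        class_code   TEXT REFERENCES class (class_code),
        enroll_grade TEXT,
        PRIMARY KEY (stu_num, class_code)
    );
    INSERT INTO student VALUES (1), (2);
    INSERT INTO class VALUES ('DB'), ('OS');
    INSERT INTO enroll VALUES (1, 'DB', 'A'), (1, 'OS', 'B'), (2, 'DB', 'C');
    """
)

# One student, many classes (1:M from STUDENT to ENROLL)
classes_of_student_1 = [
    r[0] for r in conn.execute(
        "SELECT class_code FROM enroll WHERE stu_num = 1 ORDER BY class_code"
    )
]
# One class, many students (1:M from CLASS to ENROLL)
students_in_db = [
    r[0] for r in conn.execute(
        "SELECT stu_num FROM enroll WHERE class_code = 'DB' ORDER BY stu_num"
    )
]
```

The M:N relationship between STUDENT and CLASS never appears directly; it is carried entirely by the two 1:M relationships into ENROLL.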
Linking Table
In the relational model, a table that implements an M:N relationship.
3.8 Indexes
Index
An ordered array of index key values and row ID values (pointers). Indexes are generally used to
speed up and facilitate data retrieval. Also known as an index key.
Unique Index
An index in which the index key can have only one associated pointer value (row).
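A unique index both speeds up lookups and blocks duplicate key values. This sketch assumes Python's sqlite3 module and a hypothetical EMPLOYEE table. The wording of EXPLAIN QUERY PLAN output differs between SQLite versions, so the code only looks for the index name in it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_num INTEGER PRIMARY KEY, emp_email TEXT)")
conn.execute("CREATE UNIQUE INDEX emp_email_ndx ON employee (emp_email)")
conn.execute("INSERT INTO employee VALUES (1, 'ann@example.com')")

try:
    # Same index key value again: a unique index allows only one row per key.
    conn.execute("INSERT INTO employee VALUES (2, 'ann@example.com')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Ask SQLite how it plans to run a lookup on the indexed column.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT emp_num FROM employee WHERE emp_email = ?",
    ("ann@example.com",),
).fetchall()
uses_index = any("emp_email_ndx" in str(row[-1]) for row in plan)
```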
Chapter 4 - Entity Relationship (ER) Modeling
Sunday, 04 February 2018
11:34
4.1 The Entity Relationship Model (ERM)
Required Attributes
In ER modeling, an attribute that must have a value. In other words, it cannot be left empty.
Optional Attributes
In ER modeling, an attribute that does not require a value; therefore, it can be left empty.
Identifiers
One or more attributes that uniquely identify each entity instance.
Relational Schema
The organization of a relational database as described by the database administrator.
Composite Identifier
In ER modeling, a key composed of more than one attribute.
Composite Attribute
An attribute that can be further subdivided to yield additional attributes. For example, a phone
number such as 615-898-2368 may be divided into an area code (615), an exchange number (898),
and a four-digit code (2368).
Simple Attribute
An attribute that cannot be subdivided into meaningful components.
Single-valued Attribute
An attribute that can have only one value.
Multivalued Attribute
An attribute that can have many values for a single entity occurrence. For example, an EMP_DEGREE
attribute might store the string “BBA, MBA, PHD” to indicate three different degrees held.
Derived Attribute
An attribute that does not physically exist within the entity and is derived via an algorithm. For
example, the Age attribute might be derived by subtracting the birth date from the current date.
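The Age example can be sketched as a function that computes the value on demand instead of storing it. The function name age_on is an assumption; only the birth date would be stored in the entity.

```python
from datetime import date


def age_on(birth_date, today):
    """Whole years between birth_date and today."""
    years = today.year - birth_date.year
    # Subtract one year if the birthday has not yet occurred this year.
    if (today.month, today.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


age_before_birthday = age_on(date(1990, 6, 15), date(2018, 2, 4))
age_on_birthday = age_on(date(1990, 2, 4), date(2018, 2, 4))
```

Computing the value each time keeps it current, at the cost of a small calculation on every read.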
Participants
An ER term for entities that participate in a relationship. For example, in the relationship
“PROFESSOR teaches CLASS,” the teaches relationship is based on the participants PROFESSOR and
CLASS.
Connectivity
The classification of the relationship between entities. Classifications include 1:1, 1:M, and M:N.
Cardinality
A property that assigns a specific value to connectivity and expresses the range of allowed entity
occurrences associated with a single occurrence of the related entity.
Existence-dependent
A property of an entity whose existence depends on one or more other entities. In such an
environment, the existence-independent table must be created and loaded first because the
existence-dependent key cannot reference a table that does not yet exist.
Existence-independent
A property of an entity that can exist apart from one or more related entities. Such a table must be
created first when referencing an existence-dependent table.
Strong Entity
An entity that is existence-independent, that is, it can exist apart from all of its related entities. Also
called a regular entity.
Weak (Non-identifying) Relationship
A relationship in which the primary key of the related entity does not contain a primary key
component of the parent entity.
Strong (identifying) Relationship
A relationship that occurs when two entities are existence-dependent; from a database design
perspective, this relationship exists whenever the primary key of the related entity contains the
primary key of the parent entity.
Weak Entity
An entity that displays existence dependence and inherits the primary key of its parent entity. For
example, a DEPENDENT requires the existence of an EMPLOYEE.
Optional Participation
In ER modeling, a condition in which one entity occurrence does not require a corresponding entity
occurrence in a particular relationship.
Mandatory Participation
A relationship in which one entity occurrence must have a corresponding occurrence in another
entity. For example, an EMPLOYEE works in a DIVISION. (A person cannot be an employee without
being assigned to a company’s division.)
Relationship Degree
The number of entities or participants associated with a relationship. A relationship degree can be
unary, binary, ternary, or higher.
Unary Relationship
An ER term used to describe an association within an entity. For example, an EMPLOYEE might
manage another EMPLOYEE.
Binary Relationship
An ER term for an association (relationship) between two entities. For example, PROFESSOR teaches
CLASS.
Ternary Relationship
An ER term used to describe an association (relationship) between three entities. For example, a
DOCTOR prescribes a DRUG for a PATIENT.
Recursive Relationship
A relationship found within a single entity type. For example, an EMPLOYEE is married to an
EMPLOYEE or a PART is a component of another PART.
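A recursive (unary) relationship such as "EMPLOYEE manages EMPLOYEE" is commonly implemented with a foreign key that points back to the same table. The sketch below uses Python's sqlite3 module; the column names and sample employees are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE employee (
        emp_num     INTEGER PRIMARY KEY,
        emp_name    TEXT,
        emp_manager INTEGER REFERENCES employee (emp_num)  -- points to the same table
    );
    INSERT INTO employee VALUES (1, 'Dana', NULL), (2, 'Eli', 1), (3, 'Fay', 1);
    """
)

# Join the table to itself: alias e is the employee, alias m is the manager.
pairs = conn.execute(
    "SELECT e.emp_name, m.emp_name FROM employee e "
    "JOIN employee m ON e.emp_manager = m.emp_num ORDER BY e.emp_num"
).fetchall()
```

Dana has no manager, so her emp_manager is null and she appears only on the manager side of the result.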
4.2 Developing an ER Diagram
Iterative Process
A process based on repetition of steps and procedures.
Chapter 5 - Advanced Data Modeling
Monday, 05 February 2018
21:38
5.1 The Extended Entity Relationship Model
Extended Entity Relationship Model (EERM)
Sometimes referred to as the enhanced entity relationship model; the result of adding more
semantic constructs, such as entity supertypes, entity subtypes, and entity clustering, to the original
entity relationship (ER) model.
EER Diagram (EERD)
The entity relationship diagram resulting from the application of extended entity relationship
concepts that provide additional semantic content in the ER model.
Entity Supertype
In a generalization/specialization hierarchy, a generic entity type that contains the common
characteristics of entity subtypes.
Entity Subtype
In a generalization/specialization hierarchy, a subset of an entity supertype. The entity supertype
contains the common characteristics and the subtypes contain the unique characteristics of each
entity.
Specialization Hierarchy
A hierarchy based on the top-down process of identifying lower-level, more specific entity subtypes
from a higher-level entity supertype. Specialization is based on grouping unique characteristics and
relationships of the subtypes.
Inheritance
In the EERD, the property that enables an entity subtype to inherit the attributes and relationships of
the entity supertype.
Subtype Discriminator
The attribute in the supertype entity that determines to which entity subtype each supertype
occurrence is related.
Disjoint Subtype
In a specialization hierarchy, a unique and nonoverlapping subtype entity set.
Overlapping Subtype
In a specialization hierarchy, a condition in which each entity instance (row) of the supertype can
appear in more than one subtype.
Completeness Constraint
A constraint that specifies whether each entity supertype occurrence must also be a member of at
least one subtype. The completeness constraint can be partial or total.
Partial Completeness
In a generalization/specialization hierarchy, a condition in which some supertype occurrences might
not be members of any subtype.
Total Completeness
In a generalization/specialization hierarchy, a condition in which every supertype occurrence must
be a member of at least one subtype.
Specialization
In a specialization hierarchy, the grouping of unique attributes into a subtype entity.
Generalization
In a specialization hierarchy, the grouping of common attributes into a supertype entity.
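One possible way to map the supertype/subtype terms above onto tables is sketched here. It assumes SQLite through Python's sqlite3 module, and the PILOT and MECHANIC subtypes, the EMP_TYPE discriminator and all sample values are hypothetical. A single-valued discriminator gives disjoint subtypes, and allowing it to be NULL gives partial completeness.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Supertype: common attributes plus the subtype discriminator EMP_TYPE.
CREATE TABLE EMPLOYEE (
    EMP_NUM   INTEGER PRIMARY KEY,
    EMP_LNAME TEXT NOT NULL,
    EMP_TYPE  TEXT CHECK (EMP_TYPE IN ('P', 'M'))  -- NULL allowed: no subtype
);
-- Subtypes: unique attributes only; the key is inherited from the supertype.
CREATE TABLE PILOT (
    EMP_NUM     INTEGER PRIMARY KEY REFERENCES EMPLOYEE (EMP_NUM),
    PIL_LICENSE TEXT NOT NULL
);
CREATE TABLE MECHANIC (
    EMP_NUM   INTEGER PRIMARY KEY REFERENCES EMPLOYEE (EMP_NUM),
    MEC_TITLE TEXT NOT NULL
);
INSERT INTO EMPLOYEE VALUES (100, 'Mokoena', 'P');
INSERT INTO EMPLOYEE VALUES (101, 'Lewis', 'M');
INSERT INTO EMPLOYEE VALUES (102, 'Naidoo', NULL);
INSERT INTO PILOT VALUES (100, 'ATP');
INSERT INTO MECHANIC VALUES (101, 'Airframe');
""")

# Inheritance: a pilot's common attributes come from the supertype row.
pilots = conn.execute("""
    SELECT E.EMP_LNAME, P.PIL_LICENSE
    FROM EMPLOYEE E JOIN PILOT P ON E.EMP_NUM = P.EMP_NUM
""").fetchall()

# Partial completeness: supertype rows that belong to no subtype.
no_subtype = conn.execute(
    "SELECT EMP_LNAME FROM EMPLOYEE WHERE EMP_TYPE IS NULL"
).fetchall()
```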
5.2 Entity Clustering
Entity Cluster
A “virtual” entity type used to represent multiple entities and relationships in the ERD. An entity
cluster is formed by combining multiple interrelated entities into a single abstract entity object. An
entity cluster is considered “virtual” or “abstract” because it is not actually an entity in the final ERD.
5.3 Entity Integrity: Selecting Primary Keys
Natural Key (natural identifier)
A generally accepted identifier for real-world objects. As its name implies, a natural key is familiar to
end users and forms part of their day-to-day business vocabulary.
Surrogate Key
A system-assigned primary key, generally numeric and autoincremented.
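A surrogate key can be sketched as follows, assuming SQLite through Python's sqlite3 module. The EVENT table and its columns are hypothetical, and AUTOINCREMENT is SQLite's keyword; other DBMSs use sequences or identity columns instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE EVENT (
    EVENT_ID   INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
    EVENT_DATE TEXT NOT NULL,
    ROOM       TEXT NOT NULL,
    UNIQUE (EVENT_DATE, ROOM)                      -- natural identifier stays unique
)
""")

# The DBMS assigns the key; the application never supplies it.
first_id = conn.execute(
    "INSERT INTO EVENT (EVENT_DATE, ROOM) VALUES ('2018-02-05', 'A1')"
).lastrowid
second_id = conn.execute(
    "INSERT INTO EVENT (EVENT_DATE, ROOM) VALUES ('2018-02-05', 'B2')"
).lastrowid
```

Keeping a UNIQUE constraint on the natural identifier is a design choice: the surrogate key simplifies foreign keys, while the constraint still prevents duplicate real-world rows.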
5.4 Design Cases: Learning Flexible Database Design
Time-Variant Data
Data whose values are a function of time. For example, time-variant data can be seen at work when
a company’s history of all administrative appointments is tracked.
Design Trap
A problem that occurs when a relationship is improperly or incompletely identified and therefore is
represented in a way that is not consistent with the real world. The most common design trap is
known as a fan trap.
Fan Trap
A design trap that occurs when one entity is in two 1:M relationships with other entities, thus
producing an association among the other entities that is not expressed in the model.
Chapter 6 - Normalization of Database Tables
Tuesday, 06 February 2018
21:12
6.1 Database Tables and Normalization
Normalization
A process that assigns attributes to entities so that data redundancies are reduced or eliminated.
Denormalization
A process by which a table is changed from a higher-level normal form to a lower-level normal form,
usually to increase processing speed. Denormalization potentially yields data anomalies.
Prime Attribute
A key attribute; that is, an attribute that is part of a key or is the whole key.
Key Attributes
The attributes that form a primary key.
Nonprime Attribute
An attribute that is not part of a key.
6.3 The Normalization Process
Partial Dependency
A condition in which an attribute is dependent on only a portion (subset) of the primary key.
Transitive Dependency
A condition in which an attribute is dependent on another attribute that is not part of the primary
key.
Repeating Group
In a relation, a characteristic describing a group of multiple entries of the same type for a single key
attribute occurrence. For example, a car can have multiple colours for its top, interior, bottom, trim
and so on.
Dependency Diagram
A representation of all data dependencies (primary key, partial, or transitive) within a table.
First Normal Form (1NF)
The first stage in the normalization process. It describes a relation depicted in tabular format, with
no repeating groups and a primary key identified. All nonkey attributes in the relation are dependent
on the primary key.
Second Normal Form (2NF)
The second stage in the normalization process, in which a relation is in 1NF and there are no partial
dependencies (dependencies on only part of the primary key).
Determinant
Any attribute in a specific row whose value directly determines other values in that row.
Third Normal Form (3NF)
A table is in 3NF when it is in 2NF and no nonkey attribute is functionally dependent on another
nonkey attribute; that is, it cannot include transitive dependencies.
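The dependency terms above can be explored on sample rows with a small helper. This is a rough sketch in plain Python: the `determines` function, the ASSIGNMENT table with primary key (PROJ_NUM, EMP_NUM) and all values are hypothetical. Sample data can only disprove a dependency; whether one really holds is a business rule.

```python
def determines(rows, lhs, rhs):
    """Return True if, in these rows, equal lhs values always imply equal rhs values."""
    seen = {}
    for row in rows:
        key = tuple(row[name] for name in lhs)
        value = tuple(row[name] for name in rhs)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True

# Hypothetical ASSIGNMENT rows; the primary key is (PROJ_NUM, EMP_NUM).
rows = [
    {"PROJ_NUM": 15, "EMP_NUM": 103, "PROJ_NAME": "Evergreen",
     "JOB_CLASS": "Engineer", "CHG_HOUR": 85, "HOURS": 23},
    {"PROJ_NUM": 15, "EMP_NUM": 101, "PROJ_NAME": "Evergreen",
     "JOB_CLASS": "Analyst", "CHG_HOUR": 96, "HOURS": 19},
    {"PROJ_NUM": 18, "EMP_NUM": 103, "PROJ_NAME": "Amber Wave",
     "JOB_CLASS": "Engineer", "CHG_HOUR": 85, "HOURS": 12},
    {"PROJ_NUM": 18, "EMP_NUM": 114, "PROJ_NAME": "Amber Wave",
     "JOB_CLASS": "Analyst", "CHG_HOUR": 96, "HOURS": 24},
]

# Partial dependency: PROJ_NAME depends on only part of the key (violates 2NF).
partial = determines(rows, ["PROJ_NUM"], ["PROJ_NAME"])
# Transitive dependency: a nonkey attribute determines another (violates 3NF).
transitive = determines(rows, ["JOB_CLASS"], ["CHG_HOUR"])
# HOURS needs the whole key: PROJ_NUM alone does not determine it.
hours_by_project = determines(rows, ["PROJ_NUM"], ["HOURS"])
whole_key = determines(rows, ["PROJ_NUM", "EMP_NUM"], ["HOURS"])
```

Moving PROJ_NAME into a PROJECT table and CHG_HOUR into a JOB table would remove both dependencies and leave ASSIGNMENT in 3NF.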
6.4 Improving the Design
Atomic Attribute
An attribute that cannot be further subdivided to produce meaningful components. For example, a
person's last name attribute cannot be meaningfully subdivided.
Atomicity
The property of not being divisible into smaller units.
Granularity
The level of detail represented by the values stored in a table's row. Data stored at its lowest level of
granularity is said to be atomic data.
6.6 Higher-Level Normal Forms
Boyce-Codd Normal Form (BCNF)
A special type of third normal form (3NF) in which every determinant is a candidate key. A table in
BCNF must be in 3NF.
6.7 Normalization and Database Design
Fourth Normal Form (4NF)
A table is in 4NF if it is in 3NF and contains no multiple independent sets of multivalued
dependencies.
Chapter 7 - Introduction to Structured Query Language (SQL)
Sunday, 11 February 2018
11:32
7.1 Introduction to SQL
SQL Data Definition Commands
SQL Data Manipulation Commands
7.2 Data Definition Commands
Authentication
The process through which a DBMS verifies that only registered users can access the database.
Schema
A logical grouping of database objects, such as tables, indexes, views, and queries, that are related
to each other. Usually, a schema belongs to a single user or application.
Some Common SQL Data Types
CREATE TABLE
A SQL command that creates a table’s structures using the characteristics and attributes given.
Reserved Words
Words used by a system that cannot be used for any other purpose. For example, in Oracle SQL, the
word INITIAL cannot be used to name tables or columns.
CREATE INDEX
A SQL command that creates indexes on the basis of a selected attribute or attributes.
DROP INDEX
A SQL command that permanently deletes an index.
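CREATE INDEX and DROP INDEX can be tried out as follows. The sketch assumes SQLite through Python's sqlite3 module; the index name P_PRICE_NDX and the `index_names` helper are hypothetical, and `sqlite_master` is SQLite's own catalog table (other DBMSs expose their data dictionary differently).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE PRODUCT (P_CODE TEXT PRIMARY KEY, P_DESCRIPT TEXT, P_PRICE REAL)"
)
conn.execute("CREATE INDEX P_PRICE_NDX ON PRODUCT (P_PRICE)")

def index_names(connection):
    """List user-created indexes, skipping the ones SQLite creates automatically."""
    rows = connection.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'index' AND name NOT LIKE 'sqlite_%'"
    )
    return [name for (name,) in rows]

before = index_names(conn)
conn.execute("DROP INDEX P_PRICE_NDX")
after = index_names(conn)
```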
INSERT
A SQL command that allows the insertion of one or more data rows into a table, either by listing the
values directly or by using a subquery.
COMMIT
The SQL command that permanently saves data changes to a database.
COMMIT;
SELECT
A SQL command that yields the values of all rows or a subset of rows in a table. The SELECT
statement is used to retrieve data from tables.
Wildcard Character
A symbol that can be used as a general substitute for: (1) all columns in a table (*) when used in an
attribute list of a SELECT statement or, (2) zero or more characters in a SQL LIKE clause condition
(% and _).
FROM
A SQL clause that specifies the table or tables from which data is to be retrieved.
UPDATE
A SQL command that allows attribute values to be changed in one or more rows of a table.
ROLLBACK
A SQL command that restores the database table contents to the condition that existed after the last
COMMIT statement.
ROLLBACK;
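The interplay of the data manipulation commands with COMMIT and ROLLBACK can be sketched as below. It assumes SQLite through Python's sqlite3 module with its default transaction handling, where `conn.commit()` and `conn.rollback()` issue COMMIT and ROLLBACK; the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PRODUCT (P_CODE TEXT PRIMARY KEY, P_PRICE REAL)")
conn.execute("INSERT INTO PRODUCT VALUES ('A1', 10.0)")
conn.execute("INSERT INTO PRODUCT VALUES ('B2', 20.0)")
conn.commit()    # COMMIT: both rows are now permanent

conn.execute("UPDATE PRODUCT SET P_PRICE = 99.0 WHERE P_CODE = 'A1'")
conn.execute("DELETE FROM PRODUCT WHERE P_CODE = 'B2'")
conn.rollback()  # ROLLBACK: undo everything since the last COMMIT

rows = conn.execute(
    "SELECT P_CODE, P_PRICE FROM PRODUCT ORDER BY P_CODE"
).fetchall()
```

After the rollback the table holds exactly what it held at the last commit, so neither the UPDATE nor the DELETE has any lasting effect.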
DELETE
A SQL command that allows data rows to be deleted from a table.
Subquery
A query that is embedded (or nested) inside another query. Also known as a nested query or an
inner query.
Nested Query
In SQL, a query that is embedded in another query.
Inner Query
A query that is embedded or nested inside another query. Also known as a nested query or a
subquery.
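Two common uses of a subquery are sketched below: inside a WHERE clause, and as the row source of an INSERT. The sketch assumes SQLite through Python's sqlite3 module; the PREMIUM table and all sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PRODUCT (P_CODE TEXT PRIMARY KEY, P_PRICE REAL);
CREATE TABLE PREMIUM (P_CODE TEXT PRIMARY KEY, P_PRICE REAL);
INSERT INTO PRODUCT VALUES ('A1', 10.0);
INSERT INTO PRODUCT VALUES ('B2', 20.0);
INSERT INTO PRODUCT VALUES ('C3', 60.0);
""")

# The inner query runs first; the outer query compares each row to its result.
above_average = conn.execute("""
    SELECT P_CODE FROM PRODUCT
    WHERE P_PRICE > (SELECT AVG(P_PRICE) FROM PRODUCT)
""").fetchall()

# INSERT with a subquery: copy the selected rows into another table.
conn.execute(
    "INSERT INTO PREMIUM SELECT P_CODE, P_PRICE FROM PRODUCT WHERE P_PRICE >= 20"
)
premium = conn.execute("SELECT P_CODE FROM PREMIUM ORDER BY P_CODE").fetchall()
```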
7.4 SELECT Queries
WHERE
A SQL clause that adds conditional restrictions to a SELECT statement that limit the rows returned by
the query.
Comparison Operators
Alias
An alternative name for a column or table in a SQL statement.
The Arithmetic Operators
Rules of Precedence
Basic algebraic rules that specify the order in which operations are performed. For example,
operations within parentheses are executed first, so in the expression 2 + (3 × 5), the multiplication
portion is calculated first, making the correct answer 17.
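Aliases, arithmetic operators and the rules of precedence can be seen together in a computed column. The sketch assumes SQLite through Python's sqlite3 module; the P_QOH column, the TOTAL_VALUE alias and the sample row are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PRODUCT (P_CODE TEXT PRIMARY KEY, P_QOH INTEGER, P_PRICE REAL);
INSERT INTO PRODUCT VALUES ('A1', 4, 10.0);
""")

# Multiplication is performed before addition unless parentheses say otherwise.
precedence = conn.execute("SELECT 2 + 3 * 5, (2 + 3) * 5").fetchone()

# A computed column given a readable name with an alias.
cursor = conn.execute(
    "SELECT P_CODE, P_QOH * P_PRICE AS TOTAL_VALUE FROM PRODUCT"
)
column_names = [column[0] for column in cursor.description]
total_row = cursor.fetchone()
```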
OR
The SQL logical operator used to link multiple conditional expressions in a WHERE or HAVING clause.
It requires only one of the conditional expressions to be true.
AND
The SQL logical operator used to link multiple conditional expressions in a WHERE or HAVING clause.
It requires that all conditional expressions evaluate to true.
Boolean Algebra
A branch of mathematics that uses the logical operators OR, AND, and NOT.
NOT
A SQL logical operator that negates a given predicate.
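The logical operators follow their own precedence: AND is evaluated before OR, so parentheses can change which rows a WHERE clause returns. The sketch assumes SQLite through Python's sqlite3 module; the V_CODE column, the `codes` helper and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PRODUCT (P_CODE TEXT PRIMARY KEY, P_PRICE REAL, V_CODE INTEGER);
INSERT INTO PRODUCT VALUES ('A1', 10.0, 21);
INSERT INTO PRODUCT VALUES ('B2', 20.0, 25);
INSERT INTO PRODUCT VALUES ('C3', 60.0, 21);
INSERT INTO PRODUCT VALUES ('D4', 5.0, 30);
""")

def codes(where_clause):
    """Return the product codes that satisfy the given WHERE condition."""
    sql = "SELECT P_CODE FROM PRODUCT WHERE " + where_clause + " ORDER BY P_CODE"
    return [code for (code,) in conn.execute(sql)]

# Read as: V_CODE = 21 OR (V_CODE = 25 AND P_PRICE > 15)
without_parens = codes("V_CODE = 21 OR V_CODE = 25 AND P_PRICE > 15")
# Parentheses force the OR to be evaluated first.
with_parens = codes("(V_CODE = 21 OR V_CODE = 25) AND P_PRICE > 15")
# NOT negates the condition that follows it.
not_vendor_21 = codes("NOT (V_CODE = 21)")
```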
BETWEEN
In SQL, a special comparison operator used to check whether a value is within a range of specified
values.
IS NULL
In SQL, a comparison operator used to check whether an attribute is null (that is, has no value).
LIKE
In SQL, a comparison operator used to check whether an attribute’s text value matches a specified
string pattern.
• % matches any sequence of zero or more characters, before or after the given text.
• _ matches exactly one character in the position of the underscore.
IN
In SQL, a comparison operator used to check whether a value is among a list of specified values.
EXISTS
In SQL, a comparison operator that checks whether a subquery returns any rows.
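The special operators above can be compared side by side on one small data set. The sketch assumes SQLite through Python's sqlite3 module; the VENDOR table, the column names, the `first_column` helper and all sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE VENDOR (V_CODE INTEGER PRIMARY KEY);
CREATE TABLE PRODUCT (
    P_CODE TEXT PRIMARY KEY, P_DESCRIPT TEXT, P_PRICE REAL, V_CODE INTEGER
);
INSERT INTO VENDOR VALUES (21);
INSERT INTO VENDOR VALUES (25);
INSERT INTO VENDOR VALUES (30);
INSERT INTO PRODUCT VALUES ('A1', 'Claw hammer', 10.0, 21);
INSERT INTO PRODUCT VALUES ('B2', 'Hand saw', 20.0, 25);
INSERT INTO PRODUCT VALUES ('C3', 'Power drill', 60.0, NULL);
INSERT INTO PRODUCT VALUES ('D4', 'Hammer drill', 45.0, 21);
""")

def first_column(sql):
    """Run a query and return the first column of every row."""
    return [row[0] for row in conn.execute(sql)]

# BETWEEN includes both end points of the range.
in_range = first_column(
    "SELECT P_CODE FROM PRODUCT WHERE P_PRICE BETWEEN 20 AND 50 ORDER BY P_CODE")
# IS NULL finds rows where the attribute has no value.
no_vendor = first_column("SELECT P_CODE FROM PRODUCT WHERE V_CODE IS NULL")
# LIKE with %: any characters may precede 'drill'.
drills = first_column(
    "SELECT P_CODE FROM PRODUCT WHERE P_DESCRIPT LIKE '%drill' ORDER BY P_CODE")
# LIKE with _: exactly one character, then '2'.
second_slot = first_column("SELECT P_CODE FROM PRODUCT WHERE P_CODE LIKE '_2'")
# IN checks membership in a list of values.
listed = first_column(
    "SELECT P_CODE FROM PRODUCT WHERE V_CODE IN (21, 25) ORDER BY P_CODE")
# EXISTS: vendors for which the subquery returns at least one product.
active_vendors = first_column("""
    SELECT V_CODE FROM VENDOR
    WHERE EXISTS (SELECT * FROM PRODUCT WHERE PRODUCT.V_CODE = VENDOR.V_CODE)
    ORDER BY V_CODE
""")
```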
7.5 Additional Data Definition Commands
ALTER TABLE
The SQL command used to make changes to table structure. When the command is followed by a
keyword (ADD or MODIFY), it adds a column or changes column characteristics.
DROP TABLE
A SQL command that permanently deletes a table, and therefore its data, from the database.
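ALTER TABLE and DROP TABLE can be tried out as follows. The sketch assumes SQLite through Python's sqlite3 module, which differs from the Oracle-style syntax in the notes: SQLite writes ADD COLUMN and has no MODIFY keyword. The P_SALECODE column is hypothetical, and `PRAGMA table_info` and `sqlite_master` are SQLite-specific ways of inspecting the structure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PRODUCT (P_CODE TEXT PRIMARY KEY, P_PRICE REAL)")

# Add a column to the existing table; existing rows would get NULL in it.
conn.execute("ALTER TABLE PRODUCT ADD COLUMN P_SALECODE TEXT")
columns = [row[1] for row in conn.execute("PRAGMA table_info(PRODUCT)")]

# Remove the table and everything stored in it.
conn.execute("DROP TABLE PRODUCT")
remaining = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' AND name = 'PRODUCT'"
).fetchall()
```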
7.6 Additional SELECT Query Keywords
ORDER BY
A SQL clause that is useful for ordering the output of a SELECT query (for example, in ascending or
descending order).
Cascading Order Sequence
A nested ordering sequence for a set of rows, such as a list in which all last names are alphabetically
ordered and, within the last names, all first names are ordered.
DISTINCT
A SQL clause that produces only a list of values that are different from one another.
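ORDER BY with a cascading order sequence, and DISTINCT, can be sketched as below. The sketch assumes SQLite through Python's sqlite3 module; the column names and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMPLOYEE (EMP_LNAME TEXT, EMP_FNAME TEXT);
INSERT INTO EMPLOYEE VALUES ('Smith', 'Zoe');
INSERT INTO EMPLOYEE VALUES ('Adams', 'Mark');
INSERT INTO EMPLOYEE VALUES ('Smith', 'Alan');
INSERT INTO EMPLOYEE VALUES ('Adams', 'Beth');
""")

# Cascading order: last names first, then first names within each last name.
cascading = conn.execute(
    "SELECT EMP_LNAME, EMP_FNAME FROM EMPLOYEE ORDER BY EMP_LNAME, EMP_FNAME"
).fetchall()

# DISTINCT removes duplicate values; DESC reverses the sort order.
last_names = conn.execute(
    "SELECT DISTINCT EMP_LNAME FROM EMPLOYEE ORDER BY EMP_LNAME DESC"
).fetchall()
```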
Some Basic Aggregation Functions
COUNT
A SQL aggregate function that outputs the number of rows containing non-null values for a given
column or expression, sometimes used in conjunction with the DISTINCT clause.
MAX
A SQL aggregate function that yields the maximum attribute value in a given column.
MIN
A SQL aggregate function that yields the minimum attribute value in a given column.
SUM
A SQL aggregate function that yields the sum of all values for a given column or expression.
AVG
A SQL aggregate function that outputs the arithmetic mean for a specified column or expression.
SELECT AVG(P_PRICE) FROM PRODUCT;
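The five aggregate functions can be run in a single query. The sketch assumes SQLite through Python's sqlite3 module; the V_CODE column and the sample rows are hypothetical. One row has no vendor, which shows how COUNT treats nulls.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PRODUCT (P_CODE TEXT PRIMARY KEY, P_PRICE REAL, V_CODE INTEGER);
INSERT INTO PRODUCT VALUES ('A1', 10.0, 21);
INSERT INTO PRODUCT VALUES ('B2', 20.0, 25);
INSERT INTO PRODUCT VALUES ('C3', 60.0, 21);
INSERT INTO PRODUCT VALUES ('D4', 5.0, NULL);
INSERT INTO PRODUCT VALUES ('E5', 30.0, 25);
""")

# COUNT(*) counts rows; COUNT(column) skips nulls; DISTINCT skips repeats too.
totals = conn.execute("""
    SELECT COUNT(*), COUNT(V_CODE), COUNT(DISTINCT V_CODE),
           MIN(P_PRICE), MAX(P_PRICE), SUM(P_PRICE), AVG(P_PRICE)
    FROM PRODUCT
""").fetchone()
```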
GROUP BY
A SQL clause used to create frequency distributions when combined with any of the aggregate
functions in a SELECT statement.
HAVING
A clause applied to the output of a GROUP BY operation to restrict selected rows.
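GROUP BY and HAVING can be sketched together: GROUP BY produces one output row per group, and HAVING then filters those groups, much as WHERE filters individual rows. The sketch assumes SQLite through Python's sqlite3 module; the V_CODE column and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PRODUCT (P_CODE TEXT PRIMARY KEY, P_PRICE REAL, V_CODE INTEGER);
INSERT INTO PRODUCT VALUES ('A1', 10.0, 21);
INSERT INTO PRODUCT VALUES ('B2', 20.0, 25);
INSERT INTO PRODUCT VALUES ('C3', 60.0, 21);
INSERT INTO PRODUCT VALUES ('D4', 5.0, 30);
INSERT INTO PRODUCT VALUES ('E5', 30.0, 25);
""")

# One row per vendor: how many products, and their average price.
per_vendor = conn.execute("""
    SELECT V_CODE, COUNT(*), AVG(P_PRICE)
    FROM PRODUCT
    GROUP BY V_CODE
    ORDER BY V_CODE
""").fetchall()

# HAVING keeps only the groups whose aggregate meets the condition.
pricey_vendors = conn.execute("""
    SELECT V_CODE, COUNT(*), AVG(P_PRICE)
    FROM PRODUCT
    GROUP BY V_CODE
    HAVING AVG(P_PRICE) > 20
    ORDER BY V_CODE
""").fetchall()
```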
7.7 Joining Database Tables
Recursive Query
A nested query that joins a table to itself.
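Joining a table to itself requires two aliases so that the DBMS can tell the two copies apart. The sketch assumes SQLite through Python's sqlite3 module; the EMP_MANAGER column, the aliases E and M, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMPLOYEE (
    EMP_NUM     INTEGER PRIMARY KEY,
    EMP_LNAME   TEXT NOT NULL,
    EMP_MANAGER INTEGER REFERENCES EMPLOYEE (EMP_NUM)
);
INSERT INTO EMPLOYEE VALUES (1, 'Ndlovu', NULL);
INSERT INTO EMPLOYEE VALUES (2, 'Botha', 1);
INSERT INTO EMPLOYEE VALUES (3, 'Khumalo', 1);
""")

# E is the employee's row, M is the same table read as the manager's row.
employee_and_manager = conn.execute("""
    SELECT E.EMP_LNAME, M.EMP_LNAME
    FROM EMPLOYEE E JOIN EMPLOYEE M ON E.EMP_MANAGER = M.EMP_NUM
    ORDER BY E.EMP_NUM
""").fetchall()
```

The top manager does not appear in the result because an inner join drops rows whose EMP_MANAGER is null.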