Uploaded by brentonjust

Database Management Systems: Week 1 Notes

Data constitutes the building blocks of information
Information is produced by processing data
Information is used to reveal the meaning of data
Accurate, relevant, and timely information is the key to good decision making
Good decision making is the key to organisation survival in a global environment
Data management: a process that focuses on data collection, storage and retrieval.
Common data management functions include addition, deletion, modification and listing.
Database: a shared, integrated computer structure that houses a collection of related data.
A database contains two types of data: end-user data (raw data) and metadata.
Metadata: data about data; that is, data about data characteristics and relationships.
Database management system (DBMS): a collection of programs that manages the database
structure and controls access to the data stored in the database. A database resembles a
very well-organised electronic filing cabinet in which powerful software (the DBMS) helps
manage the cabinet's contents. The DBMS is the intermediary between the user and the database.
Data are integrated and stored in the form of a database (DB).
A DB keeps not only end-user data but also metadata.
End users cannot directly access any data stored in a DB; they have to work through a DBMS to
interact with the DB.
End users usually do not work directly with a DBMS either; they use application programs
to request the information they need from the DB. These applications may be written
by programmers and act as the link between the DBMS and end users.
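The split between end-user data and metadata can be seen in a small sketch. It assumes Python's built-in sqlite3 module as a stand-in DBMS; the table and column names (customer, cust_id, cust_name) are made up for illustration.

```python
import sqlite3

# Hypothetical in-memory database; names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, cust_name TEXT NOT NULL)"
)
conn.execute("INSERT INTO customer (cust_name) VALUES ('Ada Lovelace')")

# End-user data: the raw facts stored in the table.
end_user_data = conn.execute("SELECT cust_id, cust_name FROM customer").fetchall()

# Metadata: the DBMS's own description of the table.
# PRAGMA table_info returns (cid, name, type, notnull, dflt_value, pk) per column.
metadata = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(customer)")]

print(end_user_data)  # the stored row
print(metadata)       # column names and declared types
```

The application (this script) never touches the stored file format; it only sends requests to the DBMS, which is the intermediary role described above.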


A DBMS has the following advantages:
o Improved data sharing
o Improved data security
o Better data integration
o Minimised data inconsistency
o Improved data access
o Improved decision making
o Increased end-user productivity
Data inconsistency – a condition in which different versions of the same data yield different
(inconsistent) results
Query – a question or task asked by an end user of a database in the form of SQL code. A
specific request for data manipulation issued by the end user or the application to the DBMS
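The four common data management functions (addition, modification, deletion and listing) each correspond to a kind of request sent to the DBMS. A minimal sketch, assuming Python's sqlite3 module and a hypothetical product table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (prod_code TEXT PRIMARY KEY, price REAL)")

# Addition: insert new rows.
conn.executemany(
    "INSERT INTO product VALUES (?, ?)",
    [("P1", 9.5), ("P2", 20.0), ("P3", 3.25)],
)

# Modification: change an existing row.
conn.execute("UPDATE product SET price = 10.0 WHERE prod_code = 'P1'")

# Deletion: remove a row.
conn.execute("DELETE FROM product WHERE prod_code = 'P3'")

# Listing: a query that asks the DBMS for the current contents.
listing = conn.execute(
    "SELECT prod_code, price FROM product ORDER BY prod_code"
).fetchall()
print(listing)
```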
1-3b





















Single-user database – a database that supports only one user at a time
Desktop database – a single-user database that runs on a personal computer
Multiuser database – a database that supports multiple concurrent users
Workgroup database – a multiuser database that usually supports fewer than 50 users or is
used for a specific department in an organisation
Enterprise database – the overall company data representation, which provides support for
present and expected future needs
Centralised database – a database that is located at a single site
Distributed database – a logically related database that is stored in two or more physically
independent sites
Cloud database – a database that is created and maintained using cloud services, such as MS
Azure or Amazon AWS
General purpose database – a database that contains a wide variety of data used in multiple
disciplines
Discipline specific database – a database that contains data focused on specific subject
areas.
Operational database – a database designed primarily to support a company’s day-to-day
operations. Also known as a transactional database, OLTP database, or production database
Online transaction processing (OLTP) database – see operational database
Transactional database – see operational database
Production database – see operational database
Analytical database – a database focused primarily on storing historical data and business
metrics used for tactical or strategic decision making
Data warehouse – a specialised database that stores historical and aggregated data in a
format optimised for decision support
Online analytical processing (OLAP) – a set of tools that provide advanced data analysis for
retrieving, processing and modelling data from the data warehouse
Business intelligence – a set of tools and processes used to capture, collect, integrate, store,
and analyse data to support business decision making
Unstructured data – data that exist in their original, raw state; that is, in the format in which they
were collected.
Structured data – data that have been formatted to facilitate storage, use, and information
generation
Semi-structured data – data that have already been processed to some extent.



Extensible Markup Language (XML) – a metalanguage used to represent and manipulate
data elements. Unlike other markup languages, XML permits the manipulation of a
document’s data elements
XML database – a database system that stores and manages semi-structured XML data
NoSQL – a new generation of DBMS that is not based on the traditional relational database
model.
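To make "manipulation of a document's data elements" concrete, here is a small sketch using Python's standard xml.etree.ElementTree module. The document content (orders, item, qty) is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical semi-structured document: tags describe the data they hold.
doc = (
    "<orders>"
    "<order id='1'><item>Widget</item><qty>2</qty></order>"
    "<order id='2'><item>Gadget</item><qty>5</qty></order>"
    "</orders>"
)
root = ET.fromstring(doc)

# Read individual data elements by name.
quantities = {o.get("id"): int(o.findtext("qty")) for o in root.findall("order")}

# Manipulate one data element in place.
root.find("order[@id='2']/qty").text = "6"
updated_qty = root.find("order[@id='2']/qty").text

print(quantities)
print(updated_qty)
```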
1-7




Database system – an organisation of components that defines and regulates the collection,
storage, management, and use of data in a database environment
Data dictionary – a DBMS component that stores metadata (data about data). The data
dictionary contains data definitions as well as data characteristics and relationships. It may also
include data that are external to the DBMS
Performance tuning – activities that make a database perform more efficiently in terms of
storage and access speed.
Disadvantages of database systems
o Increased costs
o Management complexity
o Maintaining currency
o Vendor dependence
o Frequent upgrade/replacement cycles
1-4
1-6










Structural dependence – a data characteristic in which a change in the database schema
affects data access, thus requiring changes in all access programs
Structural independence – a data characteristic in which changes in the database schema do
not affect data access
Data type – defines the kind of values that can be used or stored. Also used in programming
languages and database systems to determine the operations that can be applied to such data.
Data dependence – a data condition in which data representation and manipulation are
dependent on the physical data storage characteristics
Data independence – a condition in which data access is unaffected by changes in the
physical data storage characteristics
Logical data format – the way a person views data within the context of a problem domain
Physical data format – the way a computer “sees” (stores) data.
Islands of information – in the old file system environment, pools of independent, often
duplicated, and inconsistent data created and managed by different departments
Data redundancy – exists when the same data is stored unnecessarily at different places
Data integrity – in a relational database, a condition in which the data in the database
complies with all entity and referential integrity constraints.


The critical problems of the file system are structural dependence, data dependence, data
redundancy, and data anomalies.
Evolution of the file system









Components of a database system
o Hardware
o Software: operating system software, DBMS software, and application programs and utility software
o People
o Procedures
o Data

Data anomaly – a data abnormality in which inconsistent changes have been made to a
database. Eg, an employee moves but the address change is not made in all files in the
database.
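The address example can be reproduced in a few lines. This is a sketch in plain Python, assuming two hypothetical departmental files (payroll_file and hr_file) that each keep their own copy of the same employee's address, which is exactly the data redundancy that makes anomalies possible.

```python
# Two hypothetical "files" holding redundant copies of the same data.
payroll_file = {"E100": {"name": "Kim Lee", "address": "12 Oak St"}}
hr_file = {"E100": {"name": "Kim Lee", "address": "12 Oak St"}}

# The employee moves, but only one department records the change.
payroll_file["E100"]["address"] = "7 Elm Ave"


def find_anomalies(file_a, file_b, field):
    """Return the IDs whose value for `field` differs between the two files."""
    shared_ids = file_a.keys() & file_b.keys()
    return sorted(k for k in shared_ids if file_a[k][field] != file_b[k][field])


anomalies = find_anomalies(payroll_file, hr_file, "address")
print(anomalies)  # IDs with inconsistent addresses
```

Storing the address once, in a single shared database, removes the second copy and with it the chance of this kind of inconsistency.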

Data modelling – the process of creating a specific data model for a determined problem
domain
Data model – a representation, usually graphic, of a complex “real-world” data structure.
Data models are used in the database design phase of the database life cycle
2-1

2-2

2-3
Entity – a person, place, thing, concept, or event for which data can be stored.






Attribute – a characteristic of an entity or object. An attribute has a name and a data type
Relationship – an association between entities
One-to-many (1:M or 1..*) relationship – associations among two or more entities that are used by
data models. In a 1:M relationship, one entity instance is associated with many instances
of the related entity
Many-to-many (M:N or *..*) relationship – associations among two or more entities in
which one occurrence of an entity is associated with many occurrences of a related
entity and one occurrence of the related entity is associated with many occurrences of
the first entity.
One-to-one (1:1 or 1..1) relationship – associations among two or more entities that are
used by data models. In a 1:1 relationship, one entity instance is associated with only
one instance of the related entity
Constraint – a restriction placed on data, usually expressed in the form of rules. Eg, a
student’s GPA must be between 0 and 4.
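The GPA rule can be handed to the DBMS so that it rejects bad data on its own. A minimal sketch, assuming Python's sqlite3 module and a hypothetical student table with a CHECK constraint:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student ("
    " stu_id INTEGER PRIMARY KEY,"
    " gpa REAL CHECK (gpa BETWEEN 0 AND 4))"  # the constraint from the notes
)

conn.execute("INSERT INTO student VALUES (1, 3.6)")  # within the allowed range

try:
    conn.execute("INSERT INTO student VALUES (2, 4.7)")  # violates the rule
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
print(rejected, row_count)
```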
2-5















Hierarchical model – an early database model whose basic concepts and characteristics
formed the basis for subsequent database models
Segment – in the hierarchical data model, the equivalent of a file system’s record type
Network model – an early data model that represented data as a collection of record types
in 1:M relationships
Schema – a logical grouping of database objects, such as tables, indexes, views, and queries,
that are related to each other.
Subschema – the portion of the database that interacts with application programs
Data manipulation language (DML) – the set of commands that allows an end user to
manipulate the data in the database, such as SELECT, INSERT, UPDATE, DELETE, COMMIT,
and ROLLBACK
Data definition language (DDL) – the language that allows a database administrator to define
the database structure, schema, and subschema.
Relational model – based on mathematical set theory and represents data as independent
relations. Each relation (table) is conceptually represented as a two-dimensional structure of
intersecting rows and columns. The relations are related to each other through the sharing
of common entity characteristics (values in columns)
Table (relation) – a logical construct perceived to be a 2D structure composed of intersecting
rows(entities) and columns (attributes) that represent an entity set in the relational model
Tuple – in the relational model, a table row
Relational database management system (RDBMS) – a collection of programs that manages a
relational database. The RDBMS software translates a user’s logical requests (queries) into
commands that physically locate and retrieve the requested data
Relational diagram – a graphical representation of a relational database’s entities, attributes
within those entities, and the relationships among entities.
Entity relationship model (ERM) – a data model that describes relationships (1:1, 1:M, and
M:N) among entities at the conceptual level with the help of ER diagrams
Entity relationship diagram (ERD) – a diagram that depicts an entity relationship model’s entities,
attributes, and relationships
Entity instance (entity occurrence) – a row in a relational table
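The DDL and DML definitions above can be tried out together. The sketch below assumes Python's sqlite3 module and a hypothetical account table. It relies on the module's default behaviour of opening a transaction implicitly before a data-changing statement, so an uncommitted UPDATE can be undone with a rollback.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the structure.
conn.execute(
    "CREATE TABLE account (acc_id INTEGER PRIMARY KEY, balance REAL NOT NULL)"
)

# DML: add a row and make the change permanent.
conn.execute("INSERT INTO account VALUES (1, 100.0)")
conn.commit()

# DML: a change we decide not to keep.
conn.execute("UPDATE account SET balance = balance - 500 WHERE acc_id = 1")
conn.rollback()  # undo everything since the last commit

balance = conn.execute(
    "SELECT balance FROM account WHERE acc_id = 1"
).fetchone()[0]
print(balance)
```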





Entity set – a collection of like entities
Connectivity – the type of relationship between entities. Classifications include 1:1, 1:M, and
M:N.
Chen notation – the original ER diagramming notation, introduced by Peter Chen; see entity relationship model
Crow’s foot notation – a representation of the entity relationship diagram that uses a three-pronged symbol to represent the “many” sides of the relationship
Class diagram notation – the set of symbols used in the creation of class diagrams.
3-1



















Predicate logic – used extensively in maths to provide a framework in which an assertion
(statement of fact) can be verified as either true or false
Set theory – a part of mathematical science that deals with sets, or groups of things, and is
used as the basis for data manipulation in the relational model.
Tuple – in the relational model, a table row
Attribute domain – in data modelling, the construct used to organise and describe an
attribute’s set of possible values
Primary key (PK) – in the relational model, an identifier composed of one or more
attributes that uniquely identifies a row. Also, a candidate key selected as a unique entity
identifier
Key – one or more attributes that determine other attributes
Determination – the role of a key. In the context of a database table, the statement “A
determines B” indicates that knowing the value of attribute A means that the value of
attribute B can be looked up
Functional dependence – within a relation R, an attribute B is functionally dependent on
an attribute A if and only if a given value of attribute A determines exactly one value of
attribute B. The relationship “B is dependent on A” is equivalent to “A determines B” and is
written as A→B
Determinant – any attribute in a specific row whose value directly determines other values
in that row
Dependent – an attribute whose value is determined by another attribute
Full functional dependence – a condition in which an attribute is functionally dependent on
a composite key but not any subset of the key
Composite key – a multiple-attribute key
Key attribute – an attribute that is part of a primary key
Superkey – an attribute or attributes that uniquely identify each entity in a table
Candidate key – a minimal superkey; that is, a key that does not contain a subset of
attributes that is itself a superkey
Entity integrity – the property of a relational table that guarantees each entity has a unique
value in a primary key and that the key has no null values
Null – the absence of an attribute value. Note that a null is not blank
Foreign key – an attribute or attributes in one table whose values must match the primary key in another
table or whose values must be null
Referential integrity – a condition by which a dependent table’s foreign key must have either
a null entry or a matching entry in the related table
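Entity integrity and referential integrity can both be enforced by the DBMS. The sketch below assumes Python's sqlite3 module and hypothetical department and employee tables. Two SQLite specifics matter here: foreign keys are only enforced after PRAGMA foreign_keys = ON, and a non-integer primary key needs an explicit NOT NULL to forbid nulls.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute(
    "CREATE TABLE department (dept_code TEXT PRIMARY KEY NOT NULL, dept_name TEXT)"
)
conn.execute(
    "CREATE TABLE employee ("
    " emp_id INTEGER PRIMARY KEY,"
    " dept_code TEXT REFERENCES department (dept_code))"  # foreign key
)
conn.execute("INSERT INTO department VALUES ('ACC', 'Accounting')")


def accepted(sql):
    """Run one statement; return True if the DBMS accepts it, False if rejected."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


results = {
    # Referential integrity: the FK must match a PK value or be null.
    "matching_fk": accepted("INSERT INTO employee VALUES (1, 'ACC')"),
    "null_fk": accepted("INSERT INTO employee VALUES (2, NULL)"),
    "dangling_fk": accepted("INSERT INTO employee VALUES (3, 'XYZ')"),
    # Entity integrity: the PK must be unique and not null.
    "duplicate_pk": accepted("INSERT INTO department VALUES ('ACC', 'Copy')"),
    "null_pk": accepted("INSERT INTO department VALUES (NULL, 'No code')"),
}
print(results)
```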

Secondary key – a key used strictly for data retrieval purposes. Eg, customers are not likely to
know their customer ID (primary key), but the combination of last name, first name, middle
initial, and telephone number will probably match the appropriate table row.
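Functional dependence (A→B), defined earlier in this list, can be tested against sample rows: every value of A must appear with exactly one value of B. A rough sketch in plain Python follows; the function name determines and the student rows are invented for illustration. Note that sample data can only disprove a dependence; whether it really holds is a business rule.

```python
def determines(rows, a, b):
    """Return True if, in `rows`, each value of attribute `a` maps to one value of `b`."""
    seen = {}
    for row in rows:
        # setdefault stores the first B value seen for this A value.
        if seen.setdefault(row[a], row[b]) != row[b]:
            return False  # same A value, different B value: A does not determine B
    return True


students = [
    {"stu_num": 1, "stu_lname": "Smith", "stu_gpa": 3.2},
    {"stu_num": 2, "stu_lname": "Smith", "stu_gpa": 2.9},
    {"stu_num": 3, "stu_lname": "Nguyen", "stu_gpa": 3.2},
]

num_determines_gpa = determines(students, "stu_num", "stu_gpa")
lname_determines_gpa = determines(students, "stu_lname", "stu_gpa")
print(num_determines_gpa, lname_determines_gpa)
```

Here stu_num behaves like a key, while stu_lname does not, because two students named Smith have different GPAs.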







Flags – special codes implemented by designers to trigger a required response, alert end
users to specified conditions, or encode values. Flags may be used to prevent nulls by bringing
attention to the absence of a value in a table.
Index – an ordered array of index key values and row ID values (pointers). Indexes are generally
used to speed up and facilitate data retrieval. Also known as an index key.
Unique index – an index in which the index key can have only one associated pointer value
(row)
Composite entity – an entity designed to transform an M:N relationship into two 1:M
relationships. The composite entity’s primary key comprises at least the primary keys of the
entities that it connects. Also known as a bridge entity or associative entity.
Linking table – in the relational model, a table that implements an M:N relationship.
Domain – the possible set of values for a given attribute
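The composite (bridge) entity described above might be implemented as follows. This is a sketch assuming Python's sqlite3 module; the student, course and enroll tables are hypothetical. The enroll table's primary key is made up of the primary keys of the two entities it connects, which turns one M:N relationship into two 1:M relationships.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE student (stu_id INTEGER PRIMARY KEY, stu_name TEXT);
CREATE TABLE course (course_id INTEGER PRIMARY KEY, course_title TEXT);
-- Bridge entity: its composite PK combines the PKs of the entities it links.
CREATE TABLE enroll (
    stu_id INTEGER REFERENCES student (stu_id),
    course_id INTEGER REFERENCES course (course_id),
    PRIMARY KEY (stu_id, course_id)
);
INSERT INTO student VALUES (1, 'Ana'), (2, 'Ben');
INSERT INTO course VALUES (10, 'Databases'), (20, 'Networks');
INSERT INTO enroll VALUES (1, 10), (1, 20), (2, 10);
""")

# One student, many courses.
courses_for_ana = [row[0] for row in conn.execute(
    "SELECT c.course_title FROM enroll e JOIN course c ON c.course_id = e.course_id "
    "WHERE e.stu_id = 1 ORDER BY c.course_title")]

# One course, many students.
students_in_databases = [row[0] for row in conn.execute(
    "SELECT s.stu_name FROM enroll e JOIN student s ON s.stu_id = e.stu_id "
    "WHERE e.course_id = 10 ORDER BY s.stu_name")]

# The composite PK also stops the same enrolment being recorded twice.
try:
    conn.execute("INSERT INTO enroll VALUES (1, 10)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(courses_for_ana, students_in_databases, duplicate_rejected)
```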





















Required attribute – in ER modelling, an attribute that must have a value. In other words, it
cannot be left empty
Optional attribute – in ER modelling, an attribute that does not require a value; therefore, it
can be left empty
Identifier – one or more attributes that uniquely identify each entity instance
Relational schema – the organisation of a relational database as described by the database
administrator
Composite identifier – in ER modelling, a key composed of more than one attribute.
Composite attribute – an attribute that can be further subdivided to yield additional
attributes. Eg, a phone number such as 615-898-2368 may be divided into an area code
(615), an exchange number (898) and a four digit code (2368).
Simple attribute – an attribute that cannot be subdivided into meaningful components.
Single-valued attribute – an attribute that can have only one value.
Multivalued attribute – an attribute that can have many values for a single entity occurrence.
Eg, an EMP_DEGREE attribute might store the string “BBA, MBA, PHD” to indicate three
different degrees held.
Derived attribute – an attribute that does not physically exist within the entity and is derived
via an algorithm. Eg, the Age attribute might be derived by subtracting the birth date from
the current date.
Participants – an ER term for entities that participate in a relationship. Eg, in the relationship
“PROFESSOR teaches CLASS”, the teaches relationship is based on the participants
PROFESSOR and CLASS.
Connectivity – the classification of the relationship between entities. Classifications include
1:1, 1:M, and M:N
Cardinality – a property that assigns a specific value to connectivity and expresses the range
of allowed entity occurrences associated with a single occurrence of the related entity
Existence-dependent – a property of an entity whose existence depends on one or more
other entities. In such an environment, the existence-independent table must be created
and loaded first because the existence-dependent key cannot reference a table that does
not yet exist.
Existence-independent – a property of an entity that can exist apart from one or more related
entities. Such a table must be created first when it is referenced by an existence-dependent table
Strong entity – an entity that is existence-independent; that is, it can exist apart from all of its
related entities.
Weak (non-identifying) relationship – a relationship in which the primary key of the related
entity does not contain a primary key component of the parent entity
Strong (identifying) relationship – a relationship that occurs when two entities are existence-dependent; from a database design perspective, this relationship exists whenever the
primary key of the related entity contains the primary key of the parent entity.
Weak entity – an entity that displays existence dependence and inherits the primary key of its
parent entity. Eg, DEPENDENT requires the existence of an EMPLOYEE
Optional participation – in ER modelling, a condition in which one entity occurrence does not
require a corresponding entity occurrence in a particular relationship
Mandatory participation – a relationship in which one entity occurrence must have a
corresponding occurrence in another entity. Eg, an EMPLOYEE works in a DIVISION. (A
person cannot be an employee without being assigned to a division.)
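The derived attribute example (Age computed from the birth date) can be written as a short function, so the value is calculated when needed instead of being stored. A sketch in plain Python; the function name derive_age and the dates are made up for illustration.

```python
from datetime import date


def derive_age(birth_date, today):
    """Derive age in whole years from the stored birth date."""
    years = today.year - birth_date.year
    # Subtract one year if this year's birthday has not happened yet.
    if (today.month, today.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


age_before_birthday = derive_age(date(1990, 6, 15), date(2024, 6, 14))
age_on_birthday = derive_age(date(1990, 6, 15), date(2024, 6, 15))
print(age_before_birthday, age_on_birthday)
```

Only the birth date is stored; the age never goes out of date because it is recomputed each time.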



















Relationship degree – the number of entities or participants associated with a relationship. A
relationship degree can be unary, binary, ternary, or higher.
Unary relationship – an ER term used to describe an association within an entity. Eg, an EMPLOYEE
might manage another EMPLOYEE
Binary relationship – an ER term for an association (relationship) between two entities. Eg,
PROFESSOR teaches CLASS.
Ternary relationship – an ER term used to describe an association (relationship) between
three entities. Eg, DOCTOR prescribes a DRUG for a PATIENT.
Recursive relationship – a relationship found within a single entity type. Eg, an EMPLOYEE is
married to an EMPLOYEE or a PART is a component of another PART.
Iterative process – a process based on repetition of steps and procedures.
Extended entity relationship model (EERM) – sometimes referred to as the enhanced entity
relationship model; the result of adding more semantic constructs, such as entity supertypes,
subtypes, and entity clustering, to the original entity relationship (ER) model
EER diagram (EERD) – the entity relationship diagram resulting from the application of extended
entity relationship concepts that provide additional semantic content in the ER model
Entity supertype – in a generalisation or specialisation hierarchy, a generic entity type that
contains the common characteristics of entity subtypes
Entity subtype – in a generalisation or specialisation hierarchy, a subset of an entity supertype.
The entity supertype contains the common characteristics and the subtypes contain the
unique characteristics of each entity.
Specialisation hierarchy – a hierarchy based on the top-down process of identifying lower-level, more specific entity subtypes from a higher-level entity supertype. Specialisation is
based on grouping unique characteristics and relationships of the subtypes
Inheritance – in the EERD, the property that enables an entity subtype to inherit the
attributes and relationships of the entity supertype
Subtype discriminator – the attribute in the supertype entity that determines to which
entity subtype each supertype occurrence is related.
Disjoint subtypes – in a specialisation hierarchy, these are unique and nonoverlapping
subtype entity sets.
Overlapping subtypes – in a specialisation hierarchy, a condition in which each entity
instance (row) of the supertype can appear in more than one subtype
Completeness constraint – a constraint that specifies whether each entity supertype
occurrence must also be a member of at least one subtype. The completeness constraint can
be partial or total
o Partial completeness – in a generalisation or specialisation hierarchy, a condition in
which some supertype occurrences might not be members of any subtype
o Total completeness – in a generalisation or specialisation hierarchy, a condition in
which every supertype occurrence must be a member of at least one subtype
Specialisation – in a specialisation hierarchy, the grouping of unique attributes into a subtype
entity
Generalisation – in a specialisation hierarchy, the grouping of common attributes into a
supertype entity
Entity cluster – a “virtual” entity type used to represent multiple entities and relationships in the
ERD. An entity cluster is formed by combining multiple interrelated entities into a single
abstract entity object. An entity cluster is considered “virtual” or “abstract” because it is not
actually an entity in the final ERD.
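One common way to implement a supertype with a subtype is sketched below, assuming Python's sqlite3 module. The employee and pilot tables and the discriminator codes 'P' and 'M' are hypothetical. The subtype table reuses the supertype's primary key, and the discriminator column is left nullable here to model partial completeness.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Supertype: common attributes plus the subtype discriminator.
CREATE TABLE employee (
    emp_num INTEGER PRIMARY KEY,
    emp_lname TEXT NOT NULL,
    emp_type TEXT CHECK (emp_type IN ('P', 'M'))  -- NULL allowed: no subtype
);
-- Subtype: unique attributes; its PK is also an FK to the supertype.
CREATE TABLE pilot (
    emp_num INTEGER PRIMARY KEY REFERENCES employee (emp_num),
    pil_license TEXT
);
INSERT INTO employee VALUES (100, 'Ortiz', 'P'), (101, 'Lewis', NULL);
INSERT INTO pilot VALUES (100, 'ATP');
""")

# Inheritance: a pilot row gains the supertype's attributes through the join.
pilot_rows = conn.execute(
    "SELECT e.emp_num, e.emp_lname, p.pil_license "
    "FROM employee e JOIN pilot p ON p.emp_num = e.emp_num"
).fetchall()

# Partial completeness: some supertype occurrences belong to no subtype.
no_subtype = conn.execute(
    "SELECT emp_num FROM employee WHERE emp_type IS NULL"
).fetchall()

print(pilot_rows, no_subtype)
```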





Natural key (natural identifier) – a generally accepted identifier for real-world objects. As its
name implies, a natural key is familiar to end users and forms part of their day-to-day
business vocabulary.
Surrogate key – a system-assigned primary key, generally numeric and auto-incremented
Time-variant data – data whose values are a function of time. Eg, time-variant data can be
seen at work when a company’s history of all administrative appointments is tracked.
Design trap – a problem that occurs when a relationship is improperly identified and
therefore is represented in a way that is not consistent with the real world. The most
common design trap is the fan trap
Fan trap – a design trap that occurs when one entity is in two 1:M relationships with other
entities, thus producing an association among the other entities that is not expressed in the
model
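The natural key and surrogate key defined earlier in this list can sit side by side in one table. A sketch assuming Python's sqlite3 module; the customer table and the tax_number column are invented for illustration. The surrogate key is assigned by the system, while the natural key stays unique so end users can still search by it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer ("
    " cust_id INTEGER PRIMARY KEY AUTOINCREMENT,"  # surrogate key, system-assigned
    " tax_number TEXT NOT NULL UNIQUE,"            # natural key, known to end users
    " cust_name TEXT)"
)

# The application never supplies cust_id; the DBMS generates it.
conn.execute("INSERT INTO customer (tax_number, cust_name) VALUES ('TN-001', 'Ana')")
conn.execute("INSERT INTO customer (tax_number, cust_name) VALUES ('TN-002', 'Ben')")

surrogate_ids = [
    row[0] for row in conn.execute("SELECT cust_id FROM customer ORDER BY cust_id")
]
print(surrogate_ids)
```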