
Data Normalization

Carbaquil, Carmela Dawn P.
May 27, 2023
What has to be broken
before you can use it?
E__
How many months of the
year have 28 days?
__
What is full of holes but
still holds water?
S_ON_E
Relation
It is a two-dimensional table of data consisting of rows
(records) and columns (attributes or fields)
A relation (entity) must have a unique name
Every attribute value must be atomic (not multivalued,
not composite)
Every row must be unique (can’t have two rows with
exactly the same values for all their fields)
Attributes (columns) in tables must have unique
names
The order of the columns (field names) is irrelevant
The order of the rows (records) is irrelevant
Integrity Constraints
Domain Constraints – allowable
values for an attribute
Entity Integrity – no primary key
attribute may be null. All primary key
fields MUST have data
Referential Integrity – a foreign key in
one relation must match a primary
key value in another relation or the
foreign key value must be null
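The three constraints can be tried out in SQLite through Python's built-in sqlite3 module. This is a minimal sketch: the department and employee tables and their columns are hypothetical, and SQLite only enforces foreign keys after PRAGMA foreign_keys = ON.

```python
import sqlite3

# Hypothetical DEPARTMENT/EMPLOYEE tables in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

conn.execute("""
    CREATE TABLE department (
        dept_id   TEXT PRIMARY KEY NOT NULL,  -- entity integrity (SQLite needs the explicit NOT NULL)
        dept_name TEXT NOT NULL
    )""")
conn.execute("""
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        salary  INTEGER CHECK (salary >= 0),          -- domain constraint
        dept_id TEXT REFERENCES department (dept_id)  -- referential integrity; NULL is allowed
    )""")

conn.execute("INSERT INTO department VALUES ('D1', 'Finance')")
conn.execute("INSERT INTO employee VALUES (1, 50000, 'D1')")  # accepted
conn.execute("INSERT INTO employee VALUES (2, 40000, NULL)")  # accepted: the foreign key is null


def violates(sql):
    """Return True if SQLite rejects the statement with an integrity error."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


print(violates("INSERT INTO employee VALUES (3, -10, 'D1')"))     # True: domain constraint
print(violates("INSERT INTO department VALUES (NULL, 'Sales')"))  # True: entity integrity
print(violates("INSERT INTO employee VALUES (4, 30000, 'D9')"))   # True: referential integrity
```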
Data Normalization
Primarily a tool to validate and improve a logical
design so that it satisfies certain constraints that avoid
unnecessary duplication of data
Process of decomposing relations (with anomalies) to
produce smaller, well-structured relations
Developed by E. F. Codd in 1972
Well-Structured Relations
A relation that contains minimal data redundancy and allows
users to insert, delete, and update rows without causing data
inconsistencies
The goal is to avoid anomalies:
a) Insertion Anomaly
b) Deletion Anomaly
c) Modification Anomaly
Anomalies
Insertion Anomaly – adding new rows forces the user to create
duplicate data
Deletion Anomaly – deleting rows may cause a loss of data that
would be needed for other future rows
Modification Anomaly – changing data in a row forces changes to
other rows because of duplication
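A small Python sketch of the three anomalies on a hypothetical denormalized employee table, where every row repeats its department's name:

```python
# Hypothetical denormalized EMPLOYEE rows: the department name is repeated per employee.
rows = [
    {"emp_id": 1, "emp_name": "Ana",  "dept_id": "D1", "dept_name": "Finance"},
    {"emp_id": 2, "emp_name": "Ben",  "dept_id": "D1", "dept_name": "Finance"},
    {"emp_id": 3, "emp_name": "Cara", "dept_id": "D2", "dept_name": "Sales"},
]

# Insertion anomaly: a new Finance employee forces another copy of "Finance".
rows.append({"emp_id": 4, "emp_name": "Dan", "dept_id": "D1", "dept_name": "Finance"})
finance_copies = sum(r["dept_name"] == "Finance" for r in rows)
print(finance_copies)  # 3

# Deletion anomaly: removing the only D2 employee also loses the fact that D2 is "Sales".
rows = [r for r in rows if r["emp_id"] != 3]
print(any(r["dept_id"] == "D2" for r in rows))  # False

# Modification anomaly: renaming D1 in one row only leaves two names for one department.
rows[0]["dept_name"] = "Accounting"
names_for_d1 = {r["dept_name"] for r in rows if r["dept_id"] == "D1"}
print(sorted(names_for_d1))  # ['Accounting', 'Finance']
```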
Functional Dependency
Functional Dependency is a constraint between two attributes in
which the value of one attribute is determined by the value of
another attribute
For any relation R, attribute B is functionally dependent on attribute
A (written A → B) if the value of A uniquely determines the value of B
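The definition can be tested against sample rows. The Python sketch below uses hypothetical book-loan rows and reports whether A → B is satisfied. A sample can only show that a dependency is violated, never prove that it always holds, because a functional dependency is a rule about all valid data.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs is satisfied by these rows (a list of dicts).

    The dependency is violated as soon as two rows agree on every lhs
    attribute but differ on some rhs attribute.
    """
    seen = {}
    for row in rows:
        determinant = tuple(row[a] for a in lhs)
        dependent = tuple(row[b] for b in rhs)
        if seen.setdefault(determinant, dependent) != dependent:
            return False
    return True


# Hypothetical book-loan rows.
books = [
    {"ISBN": "111", "Title": "Databases", "Borrower": "Ana"},
    {"ISBN": "111", "Title": "Databases", "Borrower": "Ben"},
    {"ISBN": "222", "Title": "Networks",  "Borrower": "Ana"},
]

print(fd_holds(books, ["ISBN"], ["Title"]))      # True
print(fd_holds(books, ["Title"], ["Borrower"]))  # False
```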
The attributes on the left side of the arrow in a functional
dependency (e.g., SSSNo, ISBN) are called determinants, while the
attributes on the right side are the dependents
A candidate key is a unique identifier made up of one or more attributes.
One of the candidate keys becomes the primary key. Every candidate key
must satisfy two properties: (a) unique identification (its value
identifies exactly one row) and (b) nonredundancy (no attribute can be
removed from it without losing unique identification).
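Candidate keys can be found from a set of functional dependencies by computing attribute closures: an attribute set qualifies if its closure is the whole relation (unique identification) and it contains no smaller key (nonredundancy). A brute-force Python sketch using a hypothetical EMPLOYEE(EmpID, SSSNo, Name) relation; it tries every attribute subset, so it only suits small examples.

```python
from itertools import combinations


def closure(attrs, fds):
    """All attributes functionally determined by attrs under the FDs (lhs, rhs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(all_attrs, fds):
    """Minimal attribute sets whose closure is the whole relation."""
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            if any(set(k) <= set(combo) for k in keys):
                continue  # contains a smaller key, so it fails nonredundancy
            if closure(combo, fds) == set(all_attrs):  # unique identification
                keys.append(combo)
    return keys


# Hypothetical EMPLOYEE(EmpID, SSSNo, Name): each employee has one SSS number and vice versa.
attrs = ["EmpID", "SSSNo", "Name"]
fds = [(("EmpID",), ("SSSNo", "Name")), (("SSSNo",), ("EmpID",))]
print(candidate_keys(attrs, fds))  # [('EmpID',), ('SSSNo',)]
```

Either key could be chosen as the primary key; the other remains a candidate key.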
Steps in Normalization
Normal Form – a state of a relation in which certain rules
regarding relationships between attributes (functional
dependencies) are satisfied.
What can you break, even if
you never pick it up or touch it?
PR_MI_E
What goes up but never comes
down?
A_E
First Normal Form (1NF)
A relation that has a primary key and in which there are no
repeating groups
Attributes are atomic (simple) and single-valued. Hence,
multivalued attributes are eliminated
A primary key has been identified
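Removing a multivalued attribute can be sketched in Python as follows; the student rows and the phones attribute are hypothetical. Each value of the repeating group gets its own row, and a primary key is then identified.

```python
# Hypothetical unnormalized rows: "phones" is a multivalued attribute (a repeating group).
unnormalized = [
    {"student_id": 1, "name": "Ana", "phones": ["0917", "0918"]},
    {"student_id": 2, "name": "Ben", "phones": ["0920"]},
]


def to_1nf(rows, multivalued, atomic_name):
    """Return one row per value of the multivalued attribute, so every cell is atomic."""
    flat = []
    for row in rows:
        base = {k: v for k, v in row.items() if k != multivalued}
        for value in row[multivalued]:
            flat.append({**base, atomic_name: value})
    return flat


flat = to_1nf(unnormalized, "phones", "phone")
for r in flat:
    print(r)
# {'student_id': 1, 'name': 'Ana', 'phone': '0917'}
# {'student_id': 1, 'name': 'Ana', 'phone': '0918'}
# {'student_id': 2, 'name': 'Ben', 'phone': '0920'}

# (student_id, phone) now identifies each row, so it can serve as the primary key.
assert len({(r["student_id"], r["phone"]) for r in flat}) == len(flat)
```

In the result, name depends on student_id alone, which is only part of the key (student_id, phone): the kind of partial dependency that second normal form removes.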
Second Normal Form (2NF)
A relation in first normal form in which every non-key attribute is fully
functionally dependent on the primary key
A relation without partial dependencies on the primary key
(these can only arise when the primary key is composite)
Partial Dependency occurs when a non-key attribute
(dependent) depends on only part of the primary key (one of the
attributes of a composite primary key)
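Given the functional dependencies, partial dependencies can be found by checking what each proper subset of the composite key determines. A Python sketch based on attribute closure; the ORDER_LINE relation and its dependencies are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """All attributes functionally determined by attrs under the FDs (lhs, rhs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def partial_dependencies(primary_key, non_key, fds):
    """(part of key, attribute) pairs where a non-key attribute is determined
    by a proper subset of a composite primary key."""
    found = []
    for size in range(1, len(primary_key)):  # proper, non-empty subsets only
        for part in combinations(primary_key, size):
            determined = closure(part, fds)
            found.extend((part, attr) for attr in non_key if attr in determined)
    return found


# Hypothetical ORDER_LINE(order_id, product_id, order_date, product_name, quantity).
key = ("order_id", "product_id")
non_key = ["order_date", "product_name", "quantity"]
fds = [
    (("order_id",), ("order_date",)),
    (("product_id",), ("product_name",)),
    (("order_id", "product_id"), ("quantity",)),
]
print(partial_dependencies(key, non_key, fds))
# [(('order_id',), 'order_date'), (('product_id',), 'product_name')]
```

quantity is not reported because it needs the whole key, so it is fully functionally dependent.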
Steps to convert relation with partial dependencies into second
normal form:
1. For each primary key attribute (or combination of attributes) that is a
determinant in a partial dependency, create a new relation in which that
attribute (or combination) serves as the primary key.
2. Move the non-key attributes that depend on that primary key
attribute from the old relation to the new relation.
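The two steps can be sketched on hypothetical ORDER_LINE rows, using a relational projection (keep some columns, drop duplicate rows) written in Python:

```python
def project(rows, attrs):
    """Relational projection: keep only attrs and drop duplicate rows (order preserved)."""
    seen, result = set(), []
    for row in rows:
        values = tuple(row[a] for a in attrs)
        if values not in seen:
            seen.add(values)
            result.append(dict(zip(attrs, values)))
    return result


# Hypothetical ORDER_LINE rows; key = (order_id, product_id), and product_id -> product_name.
order_line = [
    {"order_id": 1, "product_id": "P1", "product_name": "Pen",    "quantity": 3},
    {"order_id": 1, "product_id": "P2", "product_name": "Pencil", "quantity": 5},
    {"order_id": 2, "product_id": "P1", "product_name": "Pen",    "quantity": 1},
]

# Step 1: product_id, the determinant of the partial dependency, keys a new PRODUCT relation.
# Step 2: product_name moves into PRODUCT; the old relation keeps only fully dependent attributes.
product = project(order_line, ["product_id", "product_name"])
order_line_2nf = project(order_line, ["order_id", "product_id", "quantity"])

print(product)
# [{'product_id': 'P1', 'product_name': 'Pen'}, {'product_id': 'P2', 'product_name': 'Pencil'}]
print(len(order_line_2nf))  # 3
```

"Pen" is now stored once in PRODUCT instead of once per order line.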
Third Normal Form (3NF)
A relation that is in second normal form and has no transitive
dependencies
Transitive Dependency is a functional dependency between the
primary key and one or more attributes that are dependent on the
primary key via another nonkey attribute
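Assuming the relation is already in 2NF (so the primary key determines every non-key attribute), a transitive dependency shows up as one non-key attribute determining another. A Python sketch with a hypothetical EMPLOYEE relation in which emp_id → dept_id → dept_name:

```python
def closure(attrs, fds):
    """All attributes functionally determined by attrs under the FDs (lhs, rhs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def transitive_dependencies(non_key, fds):
    """(via, attr) pairs where one non-key attribute determines another.

    Assumes the relation is already in 2NF, so the primary key determines
    every non-key attribute; key -> via -> attr is then a transitive dependency.
    """
    found = []
    for via in non_key:
        determined = closure([via], fds)
        found.extend((via, attr) for attr in non_key if attr != via and attr in determined)
    return found


# Hypothetical EMPLOYEE(emp_id, salary, dept_id, dept_name).
non_key = ["salary", "dept_id", "dept_name"]
fds = [
    (("emp_id",), ("salary", "dept_id")),
    (("dept_id",), ("dept_name",)),
]
print(transitive_dependencies(non_key, fds))  # [('dept_id', 'dept_name')]
```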
Steps to convert relation with transitive dependencies into third normal
form:
1. For each non-key attribute (or set of attributes) that is a determinant in
the relation, create a new relation and make that attribute (or set) its primary key.
2. Move the attributes that are functionally dependent on that determinant
into the new relation.
3. Keep the determinant in the old relation as a foreign key, so that the two
relations remain associated.
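The three steps, applied to a hypothetical EMPLOYEE example (emp_id → dept_id → dept_name), can be sketched in SQLite through Python's sqlite3 module; the table and column names are assumptions. Joining on the foreign key recovers the original wide rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Steps 1-2: the non-key determinant dept_id becomes the primary key of a new
# department relation, and its dependent dept_name moves there.
conn.execute("CREATE TABLE department (dept_id TEXT PRIMARY KEY NOT NULL, dept_name TEXT)")

# Step 3: dept_id stays in employee as a foreign key to department.
conn.execute("""
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        salary  INTEGER,
        dept_id TEXT REFERENCES department (dept_id)
    )""")
conn.executemany("INSERT INTO department VALUES (?, ?)", [("D1", "Finance"), ("D2", "Sales")])
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                 [(1, 50000, "D1"), (2, 42000, "D1"), (3, 39000, "D2")])

# Joining on the foreign key reproduces the original (emp_id, salary, dept_id, dept_name) rows.
rows = conn.execute("""
    SELECT e.emp_id, e.salary, e.dept_id, d.dept_name
    FROM employee AS e JOIN department AS d ON e.dept_id = d.dept_id
    ORDER BY e.emp_id""").fetchall()
for row in rows:
    print(row)
# (1, 50000, 'D1', 'Finance')
# (2, 42000, 'D1', 'Finance')
# (3, 39000, 'D2', 'Sales')
```

Renaming a department now changes a single row in department instead of every matching employee row.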
It belongs to you, but other people use
it more than you do. What is it?
N_ME
Boyce-Codd Normal Form (BCNF)
A relation is in BCNF if every determinant is a candidate key.
Most third normal form relations are also BCNF relations.
A third normal form relation can fail to be in BCNF only if all of the following hold:
A. The candidate keys in the relation are composite keys (they are not
single attributes),
B. There is more than one candidate key in the relation, and
C. The keys are not disjoint, that is, some attributes in the keys are
common.
In the above table, the functional dependencies are as follows:
1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
Candidate key: {EMP_ID, EMP_DEPT}
The table is not in BCNF because neither EMP_DEPT nor EMP_ID
alone is a key.
Candidate keys:
For the first table: EMP_ID
For the second table: EMP_DEPT
For the third table: {EMP_ID, EMP_DEPT}
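A Python sketch that checks the EMPLOYEE example against the BCNF condition (every determinant must be a key of the table it applies to). It checks only the two listed dependencies, which is enough for this example but is not a general BCNF test; the names given to the three decomposed tables are hypothetical labels.

```python
def closure(attrs, fds):
    """All attributes functionally determined by attrs under the FDs (lhs, rhs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(table_attrs, fds):
    """Listed FDs that apply to this table but whose determinant is not a superkey of it."""
    table_attrs = set(table_attrs)
    violations = []
    for lhs, rhs in fds:
        applies = set(lhs) <= table_attrs and bool((set(rhs) & table_attrs) - set(lhs))
        if applies and not table_attrs <= closure(lhs, fds):
            violations.append((lhs, rhs))
    return violations


# Functional dependencies from the EMPLOYEE example above.
fds = [
    (("EMP_ID",), ("EMP_COUNTRY",)),
    (("EMP_DEPT",), ("DEPT_TYPE", "EMP_DEPT_NO")),
]
employee = ["EMP_ID", "EMP_COUNTRY", "EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO"]
print(len(bcnf_violations(employee, fds)))  # 2: neither determinant is a key

# The three decomposed tables (the table names are hypothetical labels).
decomposed = {
    "EMP_COUNTRY": ["EMP_ID", "EMP_COUNTRY"],
    "EMP_DEPT": ["EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO"],
    "EMP_DEPT_MAPPING": ["EMP_ID", "EMP_DEPT"],
}
for name, cols in decomposed.items():
    print(name, bcnf_violations(cols, fds))  # [] for every table
```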
Fourth Normal Form (4NF)
A relational table is in the fourth normal form (4NF) if it is in BCNF
and all multivalued dependencies are also functional dependencies.
Fourth normal form (4NF) is based on the concept of multivalued
dependencies.
A multivalued dependency occurs in a relational table with at least
three columns when a single value of one column is associated with
multiple values of another column, and those values are independent
of the values in the remaining column(s)
The given STUDENT table is in 3NF, but
COURSE and HOBBY are two independent
entities. Hence, there is no relationship
between COURSE and HOBBY.
In the STUDENT relation, the student with STU_ID 21
has two courses, Computer and Math, and
two hobbies, Dancing and Singing. So there is a
multivalued dependency on STU_ID, which leads
to unnecessary repetition of data.
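The multivalued dependency and its 4NF fix can be sketched in Python. The rows below are hypothetical, built from the values mentioned above (STU_ID 21, courses Computer and Math, hobbies Dancing and Singing), and the check assumes the column order (STU_ID, COURSE, HOBBY).

```python
from itertools import product

# Hypothetical STUDENT rows built from the values above: every course of student 21
# is paired with every hobby, because COURSE and HOBBY are independent.
student = [
    (21, "Computer", "Dancing"),
    (21, "Computer", "Singing"),
    (21, "Math", "Dancing"),
    (21, "Math", "Singing"),
]


def mvd_holds(rows):
    """Check STU_ID ->> COURSE on (stu_id, course, hobby) rows: for each student,
    the stored (course, hobby) pairs must be every combination of the two sets."""
    pairs_by_student = {}
    for stu_id, course, hobby in rows:
        pairs_by_student.setdefault(stu_id, set()).add((course, hobby))
    for pairs in pairs_by_student.values():
        courses = {c for c, _ in pairs}
        hobbies = {h for _, h in pairs}
        if pairs != set(product(courses, hobbies)):
            return False
    return True


# 4NF decomposition: one table per independent multivalued fact.
student_course = sorted({(s, c) for s, c, _ in student})
student_hobby = sorted({(s, h) for s, _, h in student})
rejoined = {(s, c, h) for s, c in student_course for s2, h in student_hobby if s == s2}

print(mvd_holds(student))        # True
print(student_course)            # [(21, 'Computer'), (21, 'Math')]
print(student_hobby)             # [(21, 'Dancing'), (21, 'Singing')]
print(rejoined == set(student))  # True: the decomposition is lossless
```

Adding a third hobby for student 21 would take two new STUDENT rows (one per course) but only one new row in the hobby table.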
Fifth Normal Form (5NF)
A table is in fifth normal form (5NF) if it is in 4NF and it cannot be
losslessly decomposed into smaller tables, other than by decompositions
implied by its candidate keys.
Fifth normal form is based on the concept of join dependency.
Join dependency means that a table, after it has been decomposed
into three or more smaller tables, must be capable of being joined
again on common keys to form the original table.
A table in 5NF cannot be decomposed further without losing information
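A join dependency over three tables can be checked by projecting a table onto its three column pairs and joining them back. A Python sketch with hypothetical (SUBJECT, LECTURER, SEMESTER) rows, not the table from the slide: if the rejoined rows match, the three-way split loses nothing; if the join invents rows, the table must stay whole. Like any check on sample data, this can only show that a dependency fails, not that it holds for all future data.

```python
def joins_back(rows):
    """True if (subject, lecturer, semester) rows equal the natural join of
    their three two-column projections, i.e. the 3-way decomposition is lossless."""
    subject_lecturer = {(s, l) for s, l, _ in rows}
    lecturer_semester = {(l, t) for _, l, t in rows}
    subject_semester = {(s, t) for s, _, t in rows}
    rejoined = {
        (s, l, t)
        for s, l in subject_lecturer
        for l2, t in lecturer_semester
        if l == l2 and (s, t) in subject_semester
    }
    return rejoined == set(rows)


# Hypothetical rows (not the table from the slide).
lossless_rows = [
    ("Computer", "John", "Semester 1"),
    ("Math", "John", "Semester 1"),
    ("Math", "Akash", "Semester 2"),
]
lossy_rows = [
    ("Math", "John", "Semester 1"),
    ("Physics", "John", "Semester 2"),
    ("Math", "Akash", "Semester 2"),
]
print(joins_back(lossless_rows))  # True
print(joins_back(lossy_rows))     # False: the join invents ("Math", "John", "Semester 2")
```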
In the above table, John takes both the Computer
and Math classes for Semester 1, but he doesn't
take the Math class for Semester 2. In this case,
the combination of all these fields is required to
identify valid data.
Suppose we add a new semester, Semester 3,
but do not yet know the subject or who will
be taking it, so we leave Lecturer and
Subject as NULL. However, all three columns together
act as the primary key, so we cannot leave the other two
columns blank.
What has a head and a tail
but no body?
C__N
Codd’s Rules
Dr. Edgar F. Codd did extensive research on the relational
model of database systems and came up with twelve rules that a
database must obey in order to be a true relational database.
These rules apply to a database system that is capable
of managing its stored data using only its relational capabilities.
This requirement is the foundation rule (Rule 0), which provides
the base on which the other rules rest.
Rule 1: Information rule
This rule states that all information (data) stored in the database must be
a value of some table cell. Everything in a database must be stored in table
format. This information can be user data or metadata.
Rule 2: Guaranteed Access rule
This rule states that every single data element (value) is guaranteed to be
logically accessible through a combination of table name, primary key (row value), and
attribute name (column value). No other means, such as pointers, can be used to
access data.
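Rule 2 in practice: in SQL, a single value is addressed by table name, primary-key value and column name, with no pointers involved. A sketch using Python's sqlite3 with a hypothetical employee table; get_value is an illustrative helper, not a library function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, salary INTEGER)")
conn.execute("INSERT INTO employee VALUES (1, 'Ana', 50000), (2, 'Ben', 42000)")


def get_value(table, key_column, key_value, column):
    """Hypothetical helper: fetch one cell from table name, primary-key value and column name."""
    # Identifiers cannot be bound as SQL parameters, so they are quoted into the text;
    # they must come from trusted code, never from user input.
    sql = f'SELECT "{column}" FROM "{table}" WHERE "{key_column}" = ?'
    row = conn.execute(sql, (key_value,)).fetchone()
    return None if row is None else row[0]


print(get_value("employee", "emp_id", 2, "salary"))  # 42000
print(get_value("employee", "emp_id", 9, "name"))    # None: no such row
```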
Rule 3: Systematic Treatment of NULL values
This rule states that NULL values in the database must be given systematic,
uniform treatment, because a NULL can have several meanings: the data is
missing, the data is not known, the data is not applicable, and so on.
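SQL's handling of NULL is one concrete form of this systematic treatment: a comparison with NULL yields unknown rather than true or false. A short check in SQLite through Python's sqlite3; the contact table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# NULL = NULL is unknown (returned as None); IS NULL is the proper test.
comparison = conn.execute("SELECT NULL = NULL, NULL IS NULL, 1 + NULL").fetchone()
print(comparison)  # (None, 1, None)

conn.execute("CREATE TABLE contact (name TEXT, phone TEXT)")
conn.execute("INSERT INTO contact VALUES ('Ana', '0917'), ('Ben', NULL)")

eq_null = conn.execute("SELECT COUNT(*) FROM contact WHERE phone = NULL").fetchone()[0]
is_null = conn.execute("SELECT COUNT(*) FROM contact WHERE phone IS NULL").fetchone()[0]
counts = conn.execute("SELECT COUNT(*), COUNT(phone) FROM contact").fetchone()
print(eq_null, is_null, counts)  # 0 1 (2, 1): COUNT(phone) skips the NULL
```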
Rule 4: Active online catalog
This rule states that the structural description of the whole database must be stored in
an online catalog (the data dictionary), which can be accessed by authorized
users. Users can query the catalog with the same query language they
use to access the database itself.
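SQLite illustrates the idea, if only partly: its schema is kept in the sqlite_master table, and column details are exposed through the pragma_table_info table-valued function (SQLite 3.16 and later), both read with ordinary SELECT statements. A sketch with hypothetical tables:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE department (dept_id TEXT PRIMARY KEY NOT NULL, dept_name TEXT)")

# The catalog is queried with the same SELECT statements used for user data.
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
columns = [r[0] for r in conn.execute(
    "SELECT name FROM pragma_table_info('employee') ORDER BY cid")]
print(tables)   # ['department', 'employee']
print(columns)  # ['emp_id', 'name']
```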
Rule 5: Comprehensive data sub-language rule
This rule states that a database must support a language with a
linear syntax that is capable of data definition, data manipulation, and
transaction management operations. The database can be accessed by means of this
language only, either directly or through an application. If the database can
be accessed or manipulated in some way without this language, that is
a violation.
Rule 6: View updating rule
This rule states that all views of database, which can theoretically be updated,
must also be updatable by the system.
Rule 7: High-level insert, update and delete rule
This rule states that the database must support high-level insertion, update,
and deletion. These operations must not be limited to a single row; that is, the database must also support
union, intersection, and minus operations that yield sets of data records.
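A sketch of set-at-a-time operations in SQLite via Python's sqlite3; the employee and trainee tables are hypothetical. SQL spells the minus operation EXCEPT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, dept TEXT, salary INTEGER)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                 [(1, "Finance", 50000), (2, "Finance", 42000), (3, "Sales", 39000)])
conn.execute("CREATE TABLE trainee (emp_id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO trainee VALUES (?)", [(2,), (3,), (4,)])

# One statement changes every matching row: a set, not a single record.
updated = conn.execute(
    "UPDATE employee SET salary = salary + 1000 WHERE dept = 'Finance'").rowcount
print(updated)  # 2

# Union, intersection and minus (EXCEPT in SQL) each return a set of rows.
query = "SELECT emp_id FROM employee {op} SELECT emp_id FROM trainee ORDER BY emp_id"
results = {op: [r[0] for r in conn.execute(query.format(op=op))]
           for op in ("UNION", "INTERSECT", "EXCEPT")}
print(results)  # {'UNION': [1, 2, 3, 4], 'INTERSECT': [2, 3], 'EXCEPT': [1]}
```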
Rule 8: Physical data independence
This rule states that the application should not have any concern about how the
data is physically stored. Also, any change in the physical structure must not have
any impact on the application.
Rule 9: Logical data independence
This rule states that the logical data must be independent of the user’s view
(application). Any change in the logical data must not require any change in the
application using it. For example, if two tables are merged or one is split into two
different tables, the change should have no impact on the user application. This is
one of the most difficult rules to apply.
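One common way to approach logical data independence is a view: after a table is split, a view with the old table's name and columns lets existing queries run unchanged. A sketch in SQLite via Python's sqlite3 with hypothetical table names; in SQLite such a view is read-only unless INSTEAD OF triggers are defined.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The former employee table has been split into two (hypothetical) tables.
conn.execute("CREATE TABLE emp_core (emp_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""CREATE TABLE emp_pay (
    emp_id INTEGER PRIMARY KEY REFERENCES emp_core (emp_id),
    salary INTEGER)""")
conn.execute("INSERT INTO emp_core VALUES (1, 'Ana'), (2, 'Ben')")
conn.execute("INSERT INTO emp_pay VALUES (1, 50000), (2, 42000)")

# A view with the old name and columns hides the split from the application.
conn.execute("""CREATE VIEW employee AS
    SELECT c.emp_id, c.name, p.salary
    FROM emp_core AS c JOIN emp_pay AS p ON c.emp_id = p.emp_id""")

# The application's original query runs unchanged.
row = conn.execute("SELECT name, salary FROM employee WHERE emp_id = 2").fetchone()
print(row)  # ('Ben', 42000)
```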
Rule 10: Integrity independence
This rule states that the database must be independent of the application using it.
All its integrity constraints can be independently modified without the need of any
change in the application. This rule makes database independent of the front-end
application and its interface.
Rule 11: Distribution independence
This rule states that the end user must not be able to see that the data is
distributed over various locations; users should always see the data as if it were
located at a single site. This rule is regarded as a foundation of distributed
database systems.
Rule 12: Non-subversion rule
This rule states that if a system has an interface that provides access to low level
records, this interface then must not be able to subvert the system and bypass
security and integrity constraints.
Why do we need to learn data
normalization?
How can you say that a
database is well structured?
Group Assignment
Answer the data normalization
activity attached in MS
Teams.
To be submitted on or before the next
meeting.