Section 6 Databases theory

N Kay 07/01/04
CP5
Database Theory
Databases
DATABASE: an organised collection of related files of data, together with the
software to organise it and control access to it. It is a collection of
non-redundant data shareable between applications.
• A database stores and organises data.
• A DBMS (database management system) is a program which controls access to it.
Criticisms of Single and Integrated filing systems (Flatfiles)
• Data was duplicated, which was wasteful of memory.
• Data might be updated in one file but not in another, so the data became
  inconsistent.
• Data was not shareable.
• Data and programs were dependent, e.g. if you add a new field then every
  program that uses that data file has to be changed to handle the field.
• Managers do not have an overall picture of what is happening in the firm.
Databases
Very large data handling systems (held on disc).
Used when there is a very large volume of processing and many applications need
to access the same central pool of data, e.g. a large corporation whose
accounting, warehousing, stock control and billing systems all need to access
data about customer orders.
General model of a database
[Diagram: groups of users work through applications (App 1, App 2, App 3); every
application reaches the central pool of data only through the DBMS.]
What makes databases secure?
• Password hierarchy
• Data validation techniques
• Data is stored separately from programs so different programs cannot
  overwrite data.
Database Approach
Each application has its own view of the data, containing only what is relevant
to it. The way the data is organised is known as the data model.
Types of data model:
• Hierarchical
• Network
• Relational: based upon relational links between key fields
Relational databases: a relation (table) is a set of entities that have the same
attributes, i.e. each entity (row) in a table has the same attributes as every
other entity in that table. Different parts of an organisation have a different
view of the central data. The primary keyfield uniquely identifies a record.
Functional Dependency (on primary key) means there should be a unique
association between the primary key and each attribute (part of a record, e.g. a
field): each value of the primary key determines exactly one value of the
attribute. The primary key can uniquely identify a record if functional
dependency exists.
Transitive Dependency
If A depends upon B
and B depends upon C
then A depends upon C.
Usually to be avoided, because deleting a record can then also delete data that
is still needed (an unnecessary deletion).
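A dependency can be checked mechanically: one field depends upon another if no value of the second field ever appears alongside two different values of the first. The sketch below is illustrative only. The helper name `depends_on`, the field names and the sample rows are all assumptions made for the example.

```python
def depends_on(rows, determinant, attribute):
    """Return True if `attribute` depends upon `determinant`, i.e. each
    determinant value is paired with exactly one attribute value."""
    seen = {}
    for row in rows:
        key, value = row[determinant], row[attribute]
        if seen.setdefault(key, value) != value:
            return False  # same determinant value, two different attribute values
    return True


# Hypothetical sample data: one row per course.
courses = [
    {"course_no": "CO4876", "lecturer_no": 1845, "lecturer_name": "R Jones"},
    {"course_no": "BI0945", "lecturer_no": 1945, "lecturer_name": "D Evans"},
    {"course_no": "BI0946", "lecturer_no": 1945, "lecturer_name": "D Evans"},
]

# lecturer_name depends upon lecturer_no, and lecturer_no depends upon
# course_no, so lecturer_name depends upon course_no transitively.
name_on_lecturer = depends_on(courses, "lecturer_no", "lecturer_name")
lecturer_on_course = depends_on(courses, "course_no", "lecturer_no")
name_on_course = depends_on(courses, "course_no", "lecturer_name")

# The reverse does not hold: lecturer 1945 teaches two different courses.
course_on_lecturer = depends_on(courses, "lecturer_no", "course_no")
```

A check like this can only show that a dependency is absent from the sample data or consistent with it. Whether the dependency really holds is a design decision about what the data means.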
In a Logical data model (LDM) we would have:
• Data dictionary: a record of all the data in the system. It lists all the data
  entities (tables) of a system and all of their attributes, any restrictions on
  the data (format, length) and the relationships between them. It also records
  which programs can access the data and whether they can only read it or can
  also edit it. E.g.
    • Name of each data item
    • Names of tables and their fields
    • Data types of all fields
    • Any field formatting required
    • Field validation rules
    • Any relationships between tables
  Example entry:
  Name  Description              Type  Size  Min & Max permitted values
  DOB   Date of student’s birth  Date  10    01/01/1978 - 31/12/2001
• ERDs (entity relationship diagrams)
• Relational data analysis (RDA), which involves the techniques of
  normalisation.
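A data dictionary entry can drive validation directly. The following is a minimal sketch, assuming the DOB entry has type Date, size 10 and a permitted range of 01/01/1978 to 31/12/2001. The class name `DictionaryEntry`, its field names and the `validate` method are invented for illustration and are not part of any real DBMS.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass
class DictionaryEntry:
    """One row of a data dictionary (field names are illustrative)."""
    name: str
    description: str
    data_type: str
    size: int
    min_value: date
    max_value: date

    def validate(self, text):
        """Check a DD/MM/YYYY string against the size and range rules."""
        if len(text) > self.size:
            return False
        try:
            value = datetime.strptime(text, "%d/%m/%Y").date()
        except ValueError:
            return False  # not a real calendar date
        return self.min_value <= value <= self.max_value


dob = DictionaryEntry(
    name="DOB",
    description="Date of student's birth",
    data_type="Date",
    size=10,
    min_value=date(1978, 1, 1),
    max_value=date(2001, 12, 31),
)

in_range = dob.validate("12/09/1982")      # inside the permitted range
too_early = dob.validate("01/01/1977")     # before the minimum
impossible = dob.validate("31/31/2001")    # there is no month 31
```

Keeping the rules in one dictionary entry means every application that uses the field applies the same validation.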
Advantages of using databases
1. Avoids data duplication
   • data stored once
   • linked by keyfields
   • all data available via relational links in keyfields
2. Controlled redundancy
   • minimises data duplication
3. Ensures consistency of data
   • the same data is available to all users
4. Data independence
   • data is stored separately from programs, so new fields can be added
     because the data is independent of the applications which use it
5. Increased security (what makes databases secure?)
   • ID and authentication
   • Password hierarchy
   • Authorisation to files and processing: the DBMS can set permissions
     (read only, write only etc.)
   • Users are only allowed to view the data they are permitted to, so there is
     less risk of accidental or deliberate destruction
   • Data validation techniques
   • Data is stored separately from programs so different programs cannot
     overwrite data
6. Data integrity
   • constraints can be specified on the data to ensure it is in the correct
     format and range
7. Easy to add new applications
   • without affecting stored data files
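The data integrity point can be sketched with Python's built-in `sqlite3` module, used here purely as a convenient stand-in for a DBMS. The table, the column names and the particular constraints are assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE student (
        student_no TEXT PRIMARY KEY CHECK (length(student_no) = 4),
        name       TEXT NOT NULL,
        sex        TEXT CHECK (sex IN ('M', 'F'))
    )
    """
)
conn.execute("INSERT INTO student VALUES ('0485', 'F Smith', 'M')")

try:
    # Violates the CHECK on sex, so the DBMS refuses to store the row.
    conn.execute("INSERT INTO student VALUES ('9234', 'K Peters', 'X')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

stored = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
```

Because the constraint lives in the database rather than in each program, every application that writes to the table is held to the same rule.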
Disadvantages
• Complex to set up and maintain; needs a team of programmers to maintain it.
• Database software is large, complex and expensive, and requires powerful
  computers.
• All applications which access the data will be affected if the database fails.
  As the DBMS is the only access to operational data, a system failure can have
  serious consequences.
Important Terminology
Database Management Systems (DBMS)
A DBMS is a program which controls access to the data. Its functions are:
• Data storage, retrieval and update (create, edit and search)
• Creation and maintenance of the data dictionary
• Managing facilities for sharing data, e.g. when two people simultaneously try
  to update the same data (locking out other users)
• Backup and recovery of data
• Security: checking passwords and access rights
• Allowing applications to access the data and allowing new applications to be
  added
SQL (Structured Query Language): a data manipulation language used to perform
searches, sorts etc.
A query:
• Combines into one table the data from several others
• Selects the fields which are to be shown in the answer
• Specifies criteria for searching or sorting
• Can be saved so that it can be re-run
• Saves the answer table so it can be re-used in future reports
Example
SELECT Name, Address, "Tel No", DOB
FROM Address
WHERE DOB < '1999-12-31'
ORDER BY Name;
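To see a query of this kind actually run, the sketch below uses Python's built-in `sqlite3` module with a small in-memory table. The sample rows are invented. It assumes dates are stored as ISO text (YYYY-MM-DD) so that a plain `<` comparison sorts them correctly, and the column name containing a space is written in double quotes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE Address (Name TEXT, Address TEXT, "Tel No" TEXT, DOB TEXT)'
)
conn.executemany(
    "INSERT INTO Address VALUES (?, ?, ?, ?)",
    [
        # Hypothetical sample records, deliberately not in name order.
        ("Smith", "1 Mill Lane", "01234 111111", "1982-09-12"),
        ("Young", "3 Park Road", "01234 333333", "2000-01-05"),
        ("Peters", "2 High Street", "01234 222222", "1981-10-19"),
    ],
)

rows = conn.execute(
    """
    SELECT Name, Address, "Tel No", DOB
    FROM Address
    WHERE DOB < '1999-12-31'
    ORDER BY Name
    """
).fetchall()

names = [row[0] for row in rows]
```

Here `names` holds Peters and then Smith: Young is filtered out by the WHERE clause and the remaining rows come back sorted by the ORDER BY clause.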
Database Administrator
Responsible for
• Design of the database and monitoring its performance
• Keeping users informed of changes in the database structure which will affect
  them
• Maintenance of the data dictionary, implementing access rights and privileges
• Allocating passwords
• Training users on how to access the database
• Ensuring adequate backup and recovery procedures
Report generator: a facility to output data in a variety of formats, styles and
reports.
Client server databases
DBMS server software runs on a network server. It processes requests for
searches, reports etc. from client software on network stations.
The conceptual data model describes how the data elements in the database are
grouped.
Entity: a thing of interest to the organisation about which data is to be held,
e.g. Customer, Employee, Stock item, Supplier.
Attribute: a property or characteristic of an entity, e.g. customer No, customer
name etc.
Relationship: a link or association between entities, e.g. a customer places an
order; a doctor has many patients but a patient has only one doctor.
Type of relationships
One to One: e.g. one husband has one wife, one employee has one job.
[Diagram: Employee - Job]
One to many: e.g. a mother has many children, a borrower has many library books.
[Diagram: Ward - Patients]
Many to many: e.g. students and courses.
[Diagram: Students - Courses]
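The three relationship types can be told apart by counting how many partners each entity has on either side of the link. The sketch below is illustrative only. The function name `relationship_type` and the sample pairs are assumptions, and it can only report what the sample data shows, not what the designer intends.

```python
from collections import defaultdict


def relationship_type(pairs):
    """Classify the link between two entities from (a, b) pairs.

    Read left to right: "one to many" means one `a` is linked to many `b`.
    """
    b_for_a = defaultdict(set)
    a_for_b = defaultdict(set)
    for a, b in pairs:
        b_for_a[a].add(b)
        a_for_b[b].add(a)
    a_has_many_b = any(len(group) > 1 for group in b_for_a.values())
    b_has_many_a = any(len(group) > 1 for group in a_for_b.values())
    if a_has_many_b and b_has_many_a:
        return "many to many"
    if a_has_many_b:
        return "one to many"
    if b_has_many_a:
        return "many to one"
    return "one to one"


# Hypothetical sample links.
employee_job = [("Ann", "Clerk"), ("Bob", "Driver")]
ward_patient = [("Ward 1", "P1"), ("Ward 1", "P2"), ("Ward 2", "P3")]
student_course = [("0485", "CO4876"), ("0485", "BI0945"), ("9234", "BI0945")]

kinds = [
    relationship_type(employee_job),
    relationship_type(ward_patient),
    relationship_type(student_course),
]
```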
Example
Product   Barcode        One to one
Member    Hires videos   One to many
Pupils    Teachers       Many to many
Draw the ERD diagram for the above table
Entity Relationship Diagrams (ERD)
[Diagram: ENTITY A (Doctor) linked to ENTITY B (Patient), each with its own list
of ATTRIBUTES. A second diagram shows the entities Doctor, Patients and
Hospital.]
Draw an ERD for the following situation: College enrolment
What are the entities?
One to one relationships?
One to many relationships?
Many to many relationships?
Normalisation
This is the process undertaken to ensure that a database has no redundant or
inconsistent data.
Normalisation is the process which ensures that data held in a database has
• Eliminated redundancy
• Achieved consistency
• Minimised duplication
thus allowing the accurate processing of data, and that the database has
referential integrity, i.e. it will remain error free and robust when data is
added, deleted or changed.
To solve the problem of data duplication, the data is stored once and linked by
its keyfields. Therefore all data can be accessed via the keyfields.
Tables are linked together via relationships between foreign keys, i.e. fields
common to both tables.
Each database will have
• Primary keys, which uniquely identify a record
• Foreign keys, which link to another table
Standard ways of writing them down:
• Table in CAPITALS, e.g. STUDENT
• Fields in brackets, e.g. (Student No, Name ….)
• Key field underlined, e.g. (Student No, Name ….) with Student No underlined
• Foreign fields in italics with a line above, e.g. (Student No, Exam No,
Why might the primary key be more than one field?
e.g. for a visit to a doctor, the unique patient code (NHS No) would not be
sufficient on its own, because one patient can make many visits.
Need to use a multiple (composite) key: NHS No + date
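A composite key of this kind can be sketched with Python's built-in `sqlite3` module. The table name `visit`, its columns and the sample values are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE visit (
        nhs_no     TEXT,
        visit_date TEXT,
        notes      TEXT,
        PRIMARY KEY (nhs_no, visit_date)  -- key made of two fields
    )
    """
)
conn.execute("INSERT INTO visit VALUES ('A123', '2004-01-07', 'check-up')")
# Same patient on a different date: allowed.
conn.execute("INSERT INTO visit VALUES ('A123', '2004-02-11', 'follow-up')")

try:
    # Same patient AND same date: the composite key would be duplicated.
    conn.execute("INSERT INTO visit VALUES ('A123', '2004-01-07', 'again')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

visits = conn.execute("SELECT COUNT(*) FROM visit").fetchone()[0]
```

Neither field is unique on its own, but the pair is, which is what lets it identify a single visit. With this key a patient seen twice on the same day would still clash, so a real design might add a time or a visit number.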
[Diagram: two linked tables]
CUSTOMER: Customer ID # (primary key), Name, Address, Tel
ORDER: Order ID # (primary key), Customer ID (foreign key), Details of order,
Date of order
Standard way of writing these down (primary key underlined, foreign key with a
line above):
CUSTOMER TABLE (Customer ID, Name, Address, Tel)
ORDER TABLE (Order ID, Details of order, Date of order, Customer ID)
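The customer and order tables can be sketched in Python's built-in `sqlite3` module to show the foreign key doing its job. This is an illustration under stated assumptions: the order table is named `customer_order` because ORDER is a reserved word in SQL, the sample rows are invented, and SQLite only enforces foreign keys once the `foreign_keys` pragma is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.execute(
    """
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT,
        address     TEXT,
        tel         TEXT
    )
    """
)
conn.execute(
    """
    CREATE TABLE customer_order (
        order_id    INTEGER PRIMARY KEY,
        details     TEXT,
        order_date  TEXT,
        customer_id INTEGER NOT NULL REFERENCES customer (customer_id)
    )
    """
)
conn.execute("INSERT INTO customer VALUES (1, 'F Smith', '1 Mill Lane', '01234 111111')")
conn.execute("INSERT INTO customer_order VALUES (100, '2 videos', '2004-01-07', 1)")

try:
    # Customer 99 does not exist, so this order would point at nothing.
    conn.execute("INSERT INTO customer_order VALUES (101, '1 video', '2004-01-08', 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Follow the foreign key from order 100 back to the customer who placed it.
name = conn.execute(
    """
    SELECT customer.name
    FROM customer_order
    JOIN customer ON customer.customer_id = customer_order.customer_id
    WHERE customer_order.order_id = 100
    """
).fetchone()[0]
```

The customer's name and address are stored once in `customer`, and each order carries only the key needed to find them.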
There are 3 stages
First normal form: no repeating attributes or groups
• Each column must contain only a single value
• Each row must have an item in every column
First normal form means the removal of repeating groups. Repeating groups are
identified and a second entity is created with an appropriate primary key.
Example
Student No  Student name  Date of birth  Sex  Course No  Course name        Lecturer No  Lecturer Name
0485        F Smith       12/09/82       M    CO4876     Computing A level  1845         R Jones
9234        K Peters      19/10/81       F    BI0945     Biology A level    1945         D Evans
0485        F Smith       12/09/82       M    BI0945     Biology A level    1945         D Evans
Suitable tables of attributes could be
STUDENT (Student No, Student name, Date of birth, Sex)
COURSE (Course No, Course Name, Lecturer No, Lecturer name)
How can the relationship between the two tables be shown?
They need to be linked by a common field
BUT this is a many to many relationship.
Adding Course No to the STUDENT table is no good because each student does many
courses.
Adding Student No to the COURSE table is no good because each course has many
students.
We could set up lots of fields in the STUDENT table, i.e. one for each course
STUDENT (Student No, Student name, Date of birth, Sex, Course1, Course2,
Course3)
but this would duplicate data on courses, i.e. it is a repeating attribute.
To put the table in first normal form the repeating attribute must be removed
and the field Course No becomes part of the key of the STUDENT table.
STUDENT (Student No #, Student name, Date of birth, Sex, Course No #)
COURSE (Course No #, Course Name, Lecturer No, Lecturer name)
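Removing a repeating attribute amounts to turning each filled-in course column into a row of its own. The sketch below shows that step in plain Python. The function name `to_first_normal_form`, the dictionary field names and the sample rows are assumptions made for illustration.

```python
# Hypothetical unnormalised rows: course1..course3 form a repeating group.
unnormalised = [
    {"student_no": "0485", "name": "F Smith",
     "course1": "CO4876", "course2": "BI0945", "course3": None},
    {"student_no": "9234", "name": "K Peters",
     "course1": "BI0945", "course2": None, "course3": None},
]


def to_first_normal_form(rows, repeating=("course1", "course2", "course3")):
    """Return one row per student per course, each column holding one value."""
    result = []
    for row in rows:
        for column in repeating:
            course_no = row[column]
            if course_no is not None:
                result.append({
                    "student_no": row["student_no"],
                    "name": row["name"],
                    "course_no": course_no,
                })
    return result


first_nf = to_first_normal_form(unnormalised)

# The key is now the pair (student_no, course_no); collect it to check uniqueness.
keys = [(row["student_no"], row["course_no"]) for row in first_nf]
```

The result has three rows and no empty columns, but the student's name is now repeated on every row for that student. That repetition is what second normal form goes on to remove.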
Second normal form: tables contain no partial dependencies
• A partial dependency is where an attribute depends on only part of the
  keyfield.
• So we create another table that links to the original.
We consider all tables and take out the attributes that do not depend upon the
whole keyfield, creating separate tables for them to avoid data duplication.
Student name is dependent only upon Student No and not on Course No. To put the
tables into second normal form we need to add a third table
STUDENT (Student No #, Student name, Date of birth, Sex)
STUDENT TAKES (Student No #, Course No #)
COURSE (Course No #, Course Name, Lecturer No, Lecturer name)
Whenever you're dealing with many to many relationships you will always need a
link table in the middle!!
[Diagram: Students (many to many) Courses becomes Students (one to many)
Student takes (many to one) Courses]
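A link table can be sketched with Python's built-in `sqlite3` module. The table and column names below are lower-case versions of the STUDENT, STUDENT TAKES and COURSE tables, and the two helper functions are invented for the example. Only a few columns are included, to keep the sketch short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE student (student_no TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE course  (course_no TEXT PRIMARY KEY, course_name TEXT);
    CREATE TABLE student_takes (
        student_no TEXT REFERENCES student (student_no),
        course_no  TEXT REFERENCES course (course_no),
        PRIMARY KEY (student_no, course_no)
    );
    INSERT INTO student VALUES ('0485', 'F Smith'), ('9234', 'K Peters');
    INSERT INTO course VALUES
        ('CO4876', 'Computing A level'), ('BI0945', 'Biology A level');
    INSERT INTO student_takes VALUES
        ('0485', 'CO4876'), ('0485', 'BI0945'), ('9234', 'BI0945');
    """
)


def courses_for(student_no):
    """List the names of the courses one student takes, via the link table."""
    rows = conn.execute(
        """
        SELECT course.course_name
        FROM student_takes
        JOIN course ON course.course_no = student_takes.course_no
        WHERE student_takes.student_no = ?
        ORDER BY course.course_name
        """,
        (student_no,),
    )
    return [name for (name,) in rows]


def students_on(course_no):
    """List the names of the students on one course, via the same link table."""
    rows = conn.execute(
        """
        SELECT student.name
        FROM student_takes
        JOIN student ON student.student_no = student_takes.student_no
        WHERE student_takes.course_no = ?
        ORDER BY student.name
        """,
        (course_no,),
    )
    return [name for (name,) in rows]
```

For example, `courses_for('0485')` and `students_on('BI0945')` each return two names. The link table serves both directions of the many to many relationship while every student and every course is stored only once.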
Third Normal Form
• Tables contain no non-key dependencies.
• Data items are dependent on the keyfield only.
• Data items are dependent upon the whole key.
• No transitive dependencies.
Transitive dependencies can be removed by creating a new table.
New tables are created for attributes which do not depend upon their candidate
key but which depend instead upon other non-key attributes in the table.
In the COURSE table, Lecturer name is dependent upon Lecturer No and not upon
Course No. Therefore it needs to be removed from the COURSE table.
How?
Create a new table for lecturer.
In third normal form the tables now look as follows:
STUDENT (Student No #, Student name, Date of birth, Sex)
STUDENT TAKES (Student No #, Course No #)
COURSE (Course No #, Course Name, Lecturer No)
LECTURER (Lecturer No #, Lecturer Name)
Lecturer No in the COURSE table is a foreign key linking to the LECTURER table.
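The benefit of moving lecturer details into their own table shows up when a name changes. The sketch below uses Python's built-in `sqlite3` module. The lower-case table and column names, the extra course BI0946 and the changed surname are all invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE lecturer (lecturer_no INTEGER PRIMARY KEY, lecturer_name TEXT);
    CREATE TABLE course (
        course_no   TEXT PRIMARY KEY,
        course_name TEXT,
        lecturer_no INTEGER REFERENCES lecturer (lecturer_no)
    );
    INSERT INTO lecturer VALUES (1845, 'R Jones'), (1945, 'D Evans');
    INSERT INTO course VALUES
        ('CO4876', 'Computing A level', 1845),
        ('BI0945', 'Biology A level', 1945),
        ('BI0946', 'Biology AS level', 1945);
    """
)

# The name is stored once, so a single UPDATE corrects it for every course.
conn.execute(
    "UPDATE lecturer SET lecturer_name = 'D Evans-Hughes' WHERE lecturer_no = 1945"
)

names = conn.execute(
    """
    SELECT course.course_no, lecturer.lecturer_name
    FROM course
    JOIN lecturer ON lecturer.lecturer_no = course.lecturer_no
    ORDER BY course.course_no
    """
).fetchall()
```

Both biology courses pick up the new name from the one updated record. Had the name been held in the course table it would have needed changing in two rows, with the risk of the two copies disagreeing.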
Put into third normal form
1. There is a list of pupils
2. There is a list of exams they could take
3. There is a list of exam entries
4. There is a list of rooms in which exams are taken
First normal form
PUPIL (                    )
EXAMS (                    )
Second normal form
PUPIL (                    )
EXAM ENTRIES (                    )
EXAMS (                    )
Third normal form
PUPIL (                    )
EXAM ENTRIES (                    )
EXAMS (                    )
EXAM ROOMS (                    )
Example 2
Customer Record
(Customer No, Customer Firstname, Customer Surname, Address, Tel No,
Supplier No, Supplier name, Supplier address, Stock No, Stock item, Stock cost,
Description, Supplier Tel No)
Put into first Normal Form
(Customer No #, Customer Firstname, Customer Surname, Address, Tel No,
Stock No #)
(Stock No #, Stock item, Stock cost, Description, Supplier No #, Supplier name,
Supplier address)
Put into Second Normal Form
(Customer No #, Customer Firstname, Customer Surname, Address, Tel No)
(Stock No #, Customer No #)
(Stock No #, Stock item, Stock cost, Description, Supplier No #, Supplier name,
Supplier address)
Put into Third Normal Form
(Customer No #, Customer Firstname, Customer Surname, Address, Tel No)
(Stock No #, Customer No #)
(Supplier No #, Supplier name, Supplier address, Stock No #)
(Stock No #, Stock item, Stock cost)
Data which is not normalised
• Wastes memory
• Danger of inconsistency
• Loss of information
• Extra work: in a flatfile system rather than a database, if one data item
  changes in one file it must also be updated in the other files.
Normalised ERDs
Entities: Order, Customer, Stock, Supplier
• Customer places an order (only 1 in this case): One to One
• There are many orders requiring many stock: Many to Many
• One supplier provides many items of stock: One to Many
Summary Normalisation - the method
Start with all the items of data in any order in one big table.
First Normal Form
Group the data into separate tables to remove any data that is repeated. Data
must be present at the atomic level.
Second Normal Form
Check to see if all the data in each separate table belongs to, or is uniquely
identified by, the keyfield of that table. Split the data again so that data has
its own sensible keyfield.
Third Normal Form
Check that all fields in the records of the table are really uniquely identified
by the keyfield AND are independent of one another. Split the data yet again so
that the fields of all records belong to their keyfield only. This will likely
mean moving some fields to a new table and creating a new keyfield for them.
That's the method. Sometimes Second and Third Normal Form are shown reversed,
but the end result is the same.
Normalisation reduces repetition to a minimum, so that a record is stored only in one
place. When it is updated, that one record is updated and it prevents two or more
versions of the same data existing. This is sometimes described as maintaining data
integrity.