N Kay 07/01/04 CP5 Database Theory

CP5 Databases

DATABASE: an organised collection of related files of data, together with the software to organise it and control access to it. It is a collection of non-redundant data shareable between applications. The database stores and organises the data; the DBMS is a program which controls access to it.

Criticisms of single and integrated filing systems (flat files)
- Data is duplicated, which is wasteful of memory.
- Data may be updated in one file but not in another, so the data becomes inconsistent.
- Data was not shareable.
- Data and programs were dependent, e.g. if you add a new field then every program that uses that data file has to load the new field.
- Managers do not have an overall picture of what is happening in the firm.

Databases
Very large data handling systems held on disc. Used when there is a very large volume of processing and many applications need to access the same central pool of data, e.g. a large corporation whose accounts, warehousing, stock control and billing systems all need to access data about customer orders.

General model of a database
[Diagram: users work through applications (App 1, App 2, App 3); the applications access the central pool of data through the DBMS.]

What makes databases secure?
- Password hierarchy
- Data validation techniques
- Data is stored separately from programs, so different programs cannot overwrite data.

Database Approach
Each application has its own view of the data which is relevant to it; this is known as the data model.
- Hierarchical
- Network
- Relational: a data model based upon relational links between key fields

Relational databases
A relation is a set of entities that have the same attributes: each entity in a table has the same attributes as the other entities in that table.
Different parts of an organisation have a different view of the central data.
The primary key field uniquely identifies a record.

Functional Dependency (on primary key) means there should be a unique association between the primary key and the attribute (part of a record, e.g. a field). The primary key can uniquely identify a record if functional dependency exists.

Transitive Dependency: if A depends upon B and B depends upon C then A depends upon C. Transitive dependencies should usually be avoided if unnecessary deletions are to be avoided.

In a Logical Data Model (LDM) we would have:

Data dictionary: a record of all the data in the system, i.e.
- Records of all tables
- A list of all the data entities of the system and all of their attributes
- Any restrictions on the data: format, length and the relationships between them
- What programs can access the data and whether they can read only or edit
E.g.
- Name of each data item
- Names of tables and their fields
- Data types of all fields
- Any field formatting required
- Field validation rules
- Any relationships between tables

Name   Description               Type   Size   Min & Max permitted values
DOB    Date of student’s birth   Date   10     01/01/1978 - 31/12/2001

ERDs

Relational data analysis (RDA), which involves the techniques of normalisation.

Advantages of using databases
1. Avoids data duplication: data is stored once and linked by key fields; all data is available via relational links in key fields.
2. Controlled redundancy: minimises data duplication.
3. Ensures consistency of data to all users.
4. Data independence: data is stored separately from programs, so new fields can be added because the data is independent of the applications which use it.
5. Increased security: hierarchy of passwords.
   What makes databases secure?
   - ID and password hierarchy (authentication)
   - The DBMS can control permissions (authorisation) to files and processing: read only, write only etc.
   - Users are only allowed to view the data they are permitted to see, so there is less risk of accidental or deliberate destruction.
   - Data validation techniques
   - Data is stored separately from programs, so different programs cannot overwrite data.
6. Data integrity: constraints are specified on the data to ensure it is in the correct format and range.
7. Easy to add new applications without affecting stored data files.

Disadvantages
- Complex to set up and maintain; needs a team of programmers to maintain it.
- Database software is large, complex and expensive, and requires powerful computers.
- All applications which access the data will be affected if the database fails. As the DBMS is the only access to operational data, a system failure can have serious consequences.

Important Terminology

Database Management Systems (DBMS)
The DBMS is a program which controls access to the data. It provides:
- Data storage, retrieval and update (create, edit and search)
- Creation and maintenance of the data dictionary
- Facilities for managing the sharing of data, e.g. when two people simultaneously try to update the same data (locking out other users)
- Backup and recovery of data
- Security: checking passwords and access rights
- Access to the data for applications, including new applications

SQL
Structured Query Language: a data manipulation language used to perform searches, sorts etc.

Queries
- Combine into one table the data from several others
- Select the fields which are to be shown in the answer
- Specify criteria for searching or sorting
- A query can be saved so that it can be re-run
- The answer table can be saved so it can be re-used in future reports

Example
    SELECT Name, Address, "Tel No", DOB
    FROM Address
    WHERE DOB < '1999-12-31'
    ORDER BY Name;

Database Administrator
Responsible for:
- Design of the database and monitoring its performance
- Keeping users informed of changes in the database structure which will affect them
- Maintenance of the data dictionary, implementing access rights and privileges
- Allocating passwords
- Training users on how to access the database
- Ensuring adequate backup and recovery procedures

Report generator: a facility to output data in a variety of format styles and reports.

Client server databases
DBMS server software runs on a network server.
This processes requests for searches, reports etc. from client software on network stations.

The conceptual data model describes how the data elements in the database are grouped.

Entity: a thing of interest to the organisation about which data is to be held, e.g. customer, employee, stock item, supplier.
Attribute: a property or characteristic of an entity, e.g. customer no, customer name etc.
Relationship: a link or association between entities, e.g. a customer places an order; a doctor has many patients but a patient has only one doctor.

Types of relationships
- One to one, e.g. one husband has one wife; one employee, one job (diagram: Employee, Job)
- One to many, e.g. a mother has many children; a borrower has many library books (diagram: Ward, Patients)
- Many to many, e.g. students and courses (diagram: Courses, Students)

Example

One to one   One to many     Many to many
Product      Member          Pupils
Barcode      Hires videos    Teachers

Draw the ERD diagram for the above table.

Entity Relationship Diagrams (ERD)
[Diagram: ENTITY A (Doctor) linked to ENTITY B (Patient), each with its ATTRIBUTES.]
[Diagram: Doctor, Patients, Hospital.]

Draw an ERD for the following situation: College enrolment
- What are the entities?
- One to one relationships?
- One to many relationships?
- Many to many relationships?

Normalisation
This is the process undertaken to ensure that a database has no redundant or inconsistent data. Normalisation is the process which ensures that data held in a database has
- eliminated redundancy
- achieved consistency
- minimised duplication
thus allowing the accurate processing of data, and that the database has referential integrity, i.e. it will remain error free and robust when data is added, deleted or changed.

To solve the problem of data duplication the data is stored once and linked by its key fields. Therefore all data can be accessed via the key fields. Tables are linked together via relationships between foreign key fields, i.e. fields common to each table.

Each database will have standard ways of writing these down:
- Table name in CAPITALS
- Fields in brackets
- Key field underlined, e.g. STUDENT (Student No, Name ….)
- Foreign fields in italics with a line above, e.g. (Student No, Exam No,

Primary keys uniquely identify a record. Foreign keys link to another table.

Why might the primary key be more than one field? E.g. for a visit to a doctor, a unique patient code (NHS No) would not be sufficient; a multiple key is needed: NHS No + date.

CUSTOMER: Customer ID # (primary key), Name, Address, Tel
ORDER: Order ID # (primary key), Customer ID (foreign key), Details of order, Date of order

Standard way of writing these down:
CUSTOMER TABLE (Customer ID, Name, Address, Tel)
ORDER TABLE (Order ID, Details of order, Date of order, Customer ID)

There are 3 stages of normalisation.

First normal form: no repeating attributes or groups
- Each column must contain only a single value
- Each row must have an item in every column
First normal form is the removal of repeating groups. Repeating groups are identified and a second entity is created with an appropriate primary key.

Example

Student No  Student name  Date of birth  Sex  Course No  Course name        Lecturer No  Lecturer name
0485        F Smith       12/09/82       M    CO4876     Computing A level  1845         Jones, R
9234        K Peters      19/10/81       F    BI0945     Biology A level    1945         D Evans
0485        F Smith       12/09/82       M    BI0945     Biology A level    1945         D Evans

Suitable tables of attributes could be
STUDENT (Student No, Student name, Date of birth, Sex)
COURSE (Course No, Course name, Lecturer No, Lecturer name)

How can the relationship between the two tables be shown? They need to be linked by a common field, BUT this is a many to many relationship.
- Adding Course No to the STUDENT table is no good because many students do many courses.
- Adding Student No to the COURSE table is no good because each course has many students.

We could set up lots of fields in the STUDENT table, i.e. one for each course:
STUDENT (Student No, Student name, Date of birth, Sex, Course1, Course2, Course3)
but this would duplicate data on courses, i.e. it is a repeating attribute.

To put the table in first normal form the repeating attribute must be removed, and the field Course No becomes part of the STUDENT table.
STUDENT (Student No #, Student name, Date of birth, Sex, Course No #)
COURSE (Course No #, Course name, Lecturer No, Lecturer name)

Second normal form: tables contain no partial dependencies
A partial dependency is where an attribute depends only on part of the key field. So we create another table that links to the others. We consider all tables and all attributes that do not depend upon the whole key field and create separate tables for them to avoid data duplication.

Student name is dependent only upon Student No and not on Course No. To put the tables into second normal form we need to add a third table:
STUDENT (Student No #, Student name, Date of birth, Sex)
STUDENT TAKES (Student No #, Course No #)
COURSE (Course No #, Course name, Lecturer No, Lecturer name)

Whenever you're dealing with many to many relationships you will always need a link table in the middle!
[Diagram: Students many to many Courses becomes Students, Student takes, Courses, with the link table in the middle.]

Third normal form: tables contain no non-key dependencies
- Data items are dependent on the key field only.
- Data items are dependent upon the whole key.
- No transitive dependencies. Transitive dependencies can be removed by creating a new table.
New tables are created for attributes which do not depend upon their candidate key but which depend instead upon other non-key attributes in the table.

In the COURSE table, Lecturer name is dependent upon the Lecturer No and not the Course No. Therefore this dependency needs to be removed. How?
Create a new table for lecturer.

In third normal form the tables now look as follows:
STUDENT (Student No #, Student name, Date of birth, Sex)
STUDENT TAKES (Student No #, Course No #)
COURSE (Course No #, Course name, Lecturer No)
LECTURER (Lecturer No #, Lecturer name)
Lecturer No in COURSE is a foreign key linking to LECTURER.

Put into third normal form
1. There is a list of pupils
2. There is a list of exams they could take
3. There is a list of exam entries
4. There is a list of rooms in which exams are taken

First normal form
PUPIL ( )
EXAMS ( )

Second normal form
PUPIL ( )
EXAM ENTRIES ( )
EXAMS ( )

Third normal form
PUPIL ( )
EXAM ENTRIES ( )
EXAMS ( )
EXAM ROOMS ( )

Example 2

Customer Record
Customer No, Customer Firstname, Customer Surname, Address, Tel No, Supplier No, Supplier name, Supplier address, Stock No, Stock item, Stock cost, Description, Supplier Tel No

Put into First Normal Form
(Customer No #, Customer Firstname, Customer Surname, Address, Tel No, Stock No #)
(Stock No #, Stock item, Stock cost, Description, Supplier No #, Supplier name, Supplier address)

Put into Second Normal Form
(Customer No #, Customer Firstname, Customer Surname, Address, Tel No)
(Stock No #, Customer No #)
(Stock No #, Stock item, Stock cost, Description, Supplier No #, Supplier name, Supplier address)

Put into Third Normal Form
(Customer No #, Customer Firstname, Customer Surname, Address, Tel No)
(Stock No #, Customer No #)
(Supplier No #, Supplier name, Supplier address, Stock No #)
(Stock No #, Stock item, Stock cost)

Data which is not normalised
- Wastes memory
- Danger of inconsistency
- Loss of information
- Extra work: if one data item changes in one file it must also be updated in the other files, if the data is held in a flat file system rather than in a database.
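The danger of inconsistency can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The table names `enrolment_flat` and `lecturer`, the column names, and the replacement name 'D Evans-Hughes' are hypothetical and chosen only for illustration; the rows are taken from the student and course example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Flat (unnormalised) table: the lecturer name is repeated on every row.
# Table and column names are assumptions made for this sketch.
cur.execute(
    "CREATE TABLE enrolment_flat ("
    "student_no TEXT, course_no TEXT, lecturer_no TEXT, lecturer_name TEXT)"
)
cur.executemany(
    "INSERT INTO enrolment_flat VALUES (?, ?, ?, ?)",
    [
        ("0485", "CO4876", "1845", "Jones, R"),
        ("9234", "BI0945", "1945", "D Evans"),
        ("0485", "BI0945", "1945", "D Evans"),
    ],
)

# A careless update changes only one of the two rows for lecturer 1945.
cur.execute(
    "UPDATE enrolment_flat SET lecturer_name = 'D Evans-Hughes' "
    "WHERE lecturer_no = '1945' AND student_no = '9234'"
)
flat_names = {
    row[0]
    for row in cur.execute(
        "SELECT lecturer_name FROM enrolment_flat WHERE lecturer_no = '1945'"
    )
}

# Normalised: the name is stored once, so a single update is enough.
cur.execute(
    "CREATE TABLE lecturer (lecturer_no TEXT PRIMARY KEY, lecturer_name TEXT)"
)
cur.executemany(
    "INSERT INTO lecturer VALUES (?, ?)",
    [("1845", "Jones, R"), ("1945", "D Evans")],
)
cur.execute(
    "UPDATE lecturer SET lecturer_name = 'D Evans-Hughes' "
    "WHERE lecturer_no = '1945'"
)
normalised_names = {
    row[0]
    for row in cur.execute(
        "SELECT lecturer_name FROM lecturer WHERE lecturer_no = '1945'"
    )
}

print("Flat table holds", len(flat_names), "versions of the name")
print("Normalised table holds", len(normalised_names), "version of the name")
```

In the flat table the same lecturer ends up with two different names, whereas the normalised table can only ever hold one.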
Normalised ERDs
Entities: Order, Customer, Stock, Supplier
- Customer places an order (only 1 in this case): One to One
- There are many orders requiring many stock: Many to Many
- One supplier provides many items of stock: One to Many

Summary
Normalisation - the method

Start with all the items of data in any order in one big table.

First Normal Form
Group the data into separate tables to remove any data that is repeated. Data must be present at the atomic level.

Second Normal Form
Check to see if all the data in each separate table belongs to, or is uniquely identified by, the key field of that table. Split the data again so that data has its own sensible key field.

Third Normal Form
Check that all fields in the records of the table are really uniquely identified by the key field AND are independent of one another. Split the data yet again so that the fields of all records belong to their key field only. This will likely mean moving some fields to a new table and creating a new key field for them.

That's the method. Sometimes Second and Third Normal Form are shown reversed, but the end result is the same.

Normalisation reduces repetition to a minimum, so that a record is stored only in one place. When it is updated, that one record is updated, which prevents two or more versions of the same data existing. This is sometimes described as maintaining data integrity.
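As a worked illustration, the third normal form tables from the student example (STUDENT, STUDENT TAKES, COURSE, LECTURER) could be built as follows. This is only a sketch using Python's built-in sqlite3 module: the lower-case table and column names, the TEXT data types and the made-up course code 'XX0000' are assumptions, not part of the notes. It shows that the original single table can be rebuilt through the key fields, and that the DBMS rejects a record that would break referential integrity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this setting is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# One table per entity, plus the link table for the many to many relationship.
conn.executescript("""
CREATE TABLE lecturer (
    lecturer_no   TEXT PRIMARY KEY,
    lecturer_name TEXT NOT NULL
);
CREATE TABLE course (
    course_no   TEXT PRIMARY KEY,
    course_name TEXT NOT NULL,
    lecturer_no TEXT NOT NULL REFERENCES lecturer(lecturer_no)
);
CREATE TABLE student (
    student_no    TEXT PRIMARY KEY,
    student_name  TEXT NOT NULL,
    date_of_birth TEXT,
    sex           TEXT
);
CREATE TABLE student_takes (
    student_no TEXT NOT NULL REFERENCES student(student_no),
    course_no  TEXT NOT NULL REFERENCES course(course_no),
    PRIMARY KEY (student_no, course_no)
);
""")

# Each fact from the example table is stored exactly once.
conn.executemany("INSERT INTO lecturer VALUES (?, ?)",
                 [("1845", "Jones, R"), ("1945", "D Evans")])
conn.executemany("INSERT INTO course VALUES (?, ?, ?)",
                 [("CO4876", "Computing A level", "1845"),
                  ("BI0945", "Biology A level", "1945")])
conn.executemany("INSERT INTO student VALUES (?, ?, ?, ?)",
                 [("0485", "F Smith", "12/09/82", "M"),
                  ("9234", "K Peters", "19/10/81", "F")])
conn.executemany("INSERT INTO student_takes VALUES (?, ?)",
                 [("0485", "CO4876"), ("9234", "BI0945"), ("0485", "BI0945")])

# Rebuild the original one-big-table view by following the key fields.
rows = conn.execute("""
    SELECT s.student_no, s.student_name, c.course_name, l.lecturer_name
    FROM student_takes AS t
    JOIN student  AS s ON s.student_no  = t.student_no
    JOIN course   AS c ON c.course_no   = t.course_no
    JOIN lecturer AS l ON l.lecturer_no = c.lecturer_no
    ORDER BY s.student_no, c.course_no
""").fetchall()
for row in rows:
    print(row)

# Referential integrity: an entry for a course that does not exist is refused.
try:
    conn.execute("INSERT INTO student_takes VALUES ('0485', 'XX0000')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print("Invalid entry rejected:", rejected)
```

The composite primary key on the link table mirrors STUDENT TAKES (Student No #, Course No #), so the same student cannot be entered for the same course twice.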