A database is a collection of organised data. The data has structure.
A database can be paper-based, but it is more common to talk about electronic (computer-based) databases.

A flat file database is a simple database that stores all data in a single table. A flat file database can be stored in a text file, such as a tab-delimited file; in a spreadsheet; or in a database file that contains one or more unrelated tables.
Flat files are useful for simple lists:
› Address book
› CD collection
But they cause many problems:
Data redundancy
› Data is duplicated in many different files.
› This makes data entry slower.
› It uses more disk space.
Data inconsistency
› The same data held in different files has to be updated in each separate file when it changes. If one copy is missed, the files disagree.
Program-data dependence
› Every computer program has to specify exactly which data fields make up a record in the file being processed.
› Changes in the data structure therefore result in changes to the programs.
Lack of flexibility
› It is difficult and time-consuming to assemble the data from the various files and write new programs to produce the required non-routine reports.
Data was not shareable
› If one department had data that was required by another department, it was awkward to obtain it.

A relational database is a more complex database that stores data in multiple tables that are interrelated.
RDBMS (Relational Database Management System) benefits:
› Reduced redundancy
› Improved data consistency
› Improved data integrity
› Better security
› Program-data independence

A file is a collection of sets of similar data called records. Each piece of data within a record is called an item. Items are stored in fields.
A database consists of a series of related files called tables.
Table › Records › Fields

Imagine a file of student details in a school. The data held may include: name, address, emergency contact, form, form teacher, form room, subjects studied... Each of these is a field, and the data placed in a field is an item.
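Reading a flat file with a program shows the file/record/field/item structure directly. This is a minimal sketch, assuming a made-up tab-delimited student file (the field names and values are hypothetical): each line after the header becomes a record, each column is a field, and each cell is an item. Because everything sits in one table, the form teacher is stored again in every record for that form, so changing it in only one record leaves the file inconsistent.

```python
import csv
import io

# Hypothetical flat file: one table, tab-delimited, first line holds the field names.
FLAT_FILE = (
    "Name\tForm\tFormTeacher\tFormRoom\n"
    "Jenkins\t10A\tMr Patel\tR12\n"
    "Okafor\t10A\tMr Patel\tR12\n"
    "Lewis\t10B\tMs Hughes\tR7\n"
)

def read_flat_file(text):
    """Return a list of records; each record maps field name -> item."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return list(reader)

records = read_flat_file(FLAT_FILE)
print(len(records), "records")        # every line after the header is one record
print(list(records[0].keys()))        # the fields
print(records[0]["Name"])             # a single item

# Redundancy: the same form teacher is stored once per student in that form.
print([r["FormTeacher"] for r in records if r["Form"] == "10A"])

# Inconsistency: amending only one copy leaves the file disagreeing with itself.
records[0]["FormTeacher"] = "Mrs Evans"
print(sorted({r["FormTeacher"] for r in records if r["Form"] == "10A"}))
```

The last line prints two different teachers for the same form, which is exactly the inconsistency problem listed above.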
For the student example, this gives:
› A file, containing all the data about the students in the school.
› Records, each with the same sort of contents and each relating to a specific student.
› Fields, which may or may not have an item in them. The field 'Name' will always have an item in it; 'Additional notes' may be blank.
Blank fields can cause problems when the data is interpreted at a later date. Does a blank mean that there is no information, or that someone forgot to enter it? If information is unavailable, it is better to enter a standard response, for example N/A or Unknown.
Some fields may be unique. It is possible that all students have different names, but this is unlikely!
Stored fields can be used to mail merge letters to a particular group, e.g. all people living in a specific village.
Some field items will be repeated in record after record, e.g. form name, form room and form teacher.
Field items can be of different lengths, which can cause problems...

A file where all the records are the same length is said to have fixed length records.
› Some fields are the same length in every record, e.g. a postcode stored in a fixed 7-character field.
› Other fields may need to be 'padded out' to the correct length. If 15 characters are allowed for Surname, then Jenkins would be stored as 'JENKINS        ' (7 characters + 8 spaces).
Advantage: access is fast, because the computer can calculate where each record starts (record number × record length, counting records from 0).
Disadvantage: fixed length records are usually larger, so they need more storage space and are slower to transfer (load or save).

With variable length records, one or more of the fields can be of differing lengths in each record.
Advantages:
› The records are smaller and need less storage space.
› The records load faster.
Disadvantages:
› The computer cannot calculate where each record starts, so it has to read through the records in order, and processing them is slower.
When such a record is stored, each field has a field terminator byte stored at the end of it, and there is often a record terminator at the end of the whole record.
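The difference between the two layouts can be sketched in a few lines. This is an illustration under assumptions: each record holds just a Surname (15 characters when fixed, upper-cased as in the notes) and a Postcode (7 characters when fixed), the sample data is made up, and * and % serve as the field and record terminators, the same symbols the notes use for the stored example record. The helper names are hypothetical.

```python
SURNAME_LEN = 15                       # assumed fixed width of the Surname field
POSTCODE_LEN = 7                       # assumed fixed width of the Postcode field
RECORD_LEN = SURNAME_LEN + POSTCODE_LEN

def to_fixed(surname, postcode):
    """Fixed length: upper-case the surname, pad each field with spaces (truncate if too long)."""
    return (surname.upper().ljust(SURNAME_LEN)[:SURNAME_LEN]
            + postcode.ljust(POSTCODE_LEN)[:POSTCODE_LEN])

def fixed_record(data, n):
    """Jump straight to record n (counting from 0): it starts at n * RECORD_LEN."""
    start = n * RECORD_LEN
    return data[start:start + RECORD_LEN]

def to_variable(fields):
    """Variable length: each field ends with '*', the whole record ends with '%'."""
    return "".join(field + "*" for field in fields) + "%"

def variable_record(data, n):
    """No arithmetic shortcut: scan from the start, skipping n record terminators."""
    start = 0
    for _ in range(n):
        start = data.index("%", start) + 1
    end = data.index("%", start) + 1
    return data[start:end]

people = [("Jenkins", "CF101AA"), ("Li", "SA11AB")]
fixed = "".join(to_fixed(s, p) for s, p in people)
variable = "".join(to_variable([s, p]) for s, p in people)

print(repr(fixed_record(fixed, 0)))
print(variable_record(variable, 1))    # Li*SA11AB*%
print(len(fixed), len(variable))       # 44 28
```

The fixed layout spends extra space on padding (44 characters against 28 here) but finds any record with one multiplication; the terminated layout is smaller but has to be scanned.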
Using terminators, the first record of the example would be stored as shown below, where * is a field terminator and % is a record terminator:
[Figure: the first example record stored with field and record terminators]
This record requires 33 bytes of storage, but each record will be a different size.

Alternatively, each field starts with a byte giving the length of the field, and the whole record starts with a byte giving the size of the record. The first record of the example would then be stored as:
[Figure: the first example record stored with length bytes]
This record requires 34 bytes of storage, but again each record would be a different size.

Each record in a file must be identifiable, so one field must be unique. This is known as the primary key (or key field).
Terminology:
› File = Table
› Record = Tuple
› Field = Attribute

Normalisation
Databases are collections of data arranged into related tables. There are many ways of arranging the data in tables, and each arrangement can be given a label according to how it has been arranged. These labels are called normal forms.
Normalisation is the process undertaken to make sure a database has no redundant data or inconsistencies. Tables should be organised so that:
› No data is unnecessarily duplicated.
› Data is consistent throughout the database.
› The structure of each table is flexible enough to allow you to enter as many or as few items as you want.
› The structure enables a user to make all kinds of complex queries relating data from different tables.

First normal form (1NF)
"A table is in 1NF if it contains no repeating attributes or groups of attributes."
For example, a student can take several courses, and each course has several students attending. The relationship can be represented by an ER diagram:
[Figure: ER diagram of the many-to-many relationship between STUDENT and COURSE]
The attributes in these tables would look something like this:
STUDENT (StudentID, StudentName, DoB, Gender)
COURSE (CourseNumber, CourseName, LecturerID, LecturerName)
Consider the problem of creating a relationship between these two tables. A link has to be made between common fields... but there are no common fields!
We could link the tables by copying an attribute from one into the other, but whichever attribute we pick, repeating attributes will be created (which is unacceptable in 1NF!), as shown below:
STUDENT (StudentID, StudentName, DoB, Gender, CourseNumber)
No good: the student takes several courses, so which one would be recorded?
COURSE (CourseNumber, CourseName, LecturerID, LecturerName, StudentID)
No good: each course has more than one student taking it.
How about allowing space for 3 courses on each student record?
STUDENT (StudentID, StudentName, DoB, Gender, Course1, Course2, Course3)
This is no good either: we have created a repeating attribute! The field CourseNumber is repeated 3 times, so the table is not in first normal form. In standard notation, this would be shown by a line above the repeating attribute.
To achieve 1NF, we store one course per record and make CourseNumber part of the primary key of the STUDENT table:
STUDENT (StudentID, CourseNumber, StudentName, DoB, Gender)
By grouping StudentID and CourseNumber together as a composite key, each record identifies one student taking one course, without any repeating attributes = 1NF.

Second normal form (2NF)
2NF only needs checking for tables that have a composite key! (A 1NF table whose key is a single attribute cannot have partial dependencies, so it is already in 2NF.)
"A table is in 2NF if it is in 1NF and it contains no partial dependencies."
At the moment, our STUDENT table is not in 2NF because it contains attributes that depend on only part of the primary key. To be in 2NF, every non-key attribute must depend on the whole primary key.
All very well... but what does it mean?
The primary key of our STUDENT table is a composite key, made up of both StudentID and CourseNumber. The attribute StudentName depends on StudentID (one specific StudentID refers to only 1 student), but it does not depend on CourseNumber at all (we cannot identify an individual student from a CourseNumber). So StudentName is only partially dependent on the primary key, and the table is therefore not in 2NF.
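The partial dependency can be checked against sample data. This sketch uses made-up rows for the 1NF STUDENT table, and the helper `determines` is hypothetical: it tests whether each value of a set of attributes always goes with exactly one value of another attribute. A check on sample rows can only show that a dependency is consistent with those rows; whether it really holds comes from what the data means.

```python
from collections import defaultdict

# Hypothetical rows for STUDENT (StudentID, CourseNumber, StudentName, DoB, Gender),
# whose composite key is (StudentID, CourseNumber).
ROWS = [
    {"StudentID": "S1", "CourseNumber": "C10", "StudentName": "Ann Jones", "DoB": "2008-03-01", "Gender": "F"},
    {"StudentID": "S1", "CourseNumber": "C20", "StudentName": "Ann Jones", "DoB": "2008-03-01", "Gender": "F"},
    {"StudentID": "S2", "CourseNumber": "C10", "StudentName": "Ben Lee", "DoB": "2007-11-15", "Gender": "M"},
]

def determines(rows, key_attrs, attr):
    """True if, in these rows, each combination of key_attrs goes with exactly one value of attr."""
    seen = defaultdict(set)
    for row in rows:
        seen[tuple(row[a] for a in key_attrs)].add(row[attr])
    return all(len(values) == 1 for values in seen.values())

full_key = ("StudentID", "CourseNumber")
print(determines(ROWS, full_key, "StudentName"))          # True: depends on the whole key...
print(determines(ROWS, ("StudentID",), "StudentName"))     # True: ...but also on part of it
print(determines(ROWS, ("CourseNumber",), "StudentName"))  # False: C10 goes with two names
```

Because StudentName is already determined by StudentID alone, "Ann Jones" is stored once per course she takes; this is the redundancy that moving to 2NF removes.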
To achieve 2NF, we introduce a third table to link the two entities:
STUDENT (StudentID, StudentName, DoB, Gender)
COURSE (CourseNumber, CourseName, LecturerID, LecturerName)
STUDENT_TAKES (StudentID, CourseNumber)

Third normal form (3NF)
"A table is in 3NF if it is in 2NF and contains no non-key dependencies", i.e. no non-key attribute depends on another non-key attribute.
The COURSE table contains an attribute for LecturerID and also one for LecturerName. LecturerName depends on LecturerID (not on CourseNumber)... so we need a new table for the entity LECTURER!
STUDENT (StudentID, StudentName, DoB, Gender)
COURSE (CourseNumber, CourseName, LecturerID)
STUDENT_TAKES (StudentID, CourseNumber)
LECTURER (LecturerID, LecturerName)
This is the optimum way of holding this data without any duplication. All tables in a relational database should be in 3NF!

Each table contains one special attribute by which tuples can be identified, because it is unique: the primary key. It is shown by underlining its name within the bracketed list of attributes.
A primary key is one or more attributes which uniquely identify an entity occurrence.
Sometimes a single attribute is not sufficient to identify each occurrence of an entity uniquely. In these instances we must combine two or more attributes to create a composite key. For example, a person's name by itself will not necessarily be enough to identify an individual; a person's name combined with their address may be more appropriate.
A key in one table that also occurs in another table is called a foreign key. It is used to link the two tables together.

Every entity has a name and a primary key. Most entities will also have a number of non-identifying attributes. The convention used for defining attributes is:
EntityName (IdentifyingAttribute1, NonIdentifyingAttribute1, ...)
The name of the entity is followed by a list of its attributes in brackets. The identifying attribute(s) (the primary or composite key) come first and are underlined.
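The final 3NF design can be tried out with Python's built-in sqlite3 module. This is a sketch under assumptions: the column types, sample rows and query are made up for illustration, and SQLite only enforces foreign keys once `PRAGMA foreign_keys = ON` has been issued. STUDENT_TAKES has the composite primary key (StudentID, CourseNumber), each half of which is a foreign key back to STUDENT or COURSE, and COURSE.LecturerID is a foreign key to LECTURER.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when this is on

conn.executescript("""
CREATE TABLE LECTURER (
    LecturerID   TEXT PRIMARY KEY,
    LecturerName TEXT NOT NULL
);
CREATE TABLE COURSE (
    CourseNumber TEXT PRIMARY KEY,
    CourseName   TEXT NOT NULL,
    LecturerID   TEXT REFERENCES LECTURER(LecturerID)
);
CREATE TABLE STUDENT (
    StudentID   TEXT PRIMARY KEY,
    StudentName TEXT NOT NULL,
    DoB         TEXT,
    Gender      TEXT
);
-- Link table: resolves the many-to-many relationship using a composite key.
CREATE TABLE STUDENT_TAKES (
    StudentID    TEXT REFERENCES STUDENT(StudentID),
    CourseNumber TEXT REFERENCES COURSE(CourseNumber),
    PRIMARY KEY (StudentID, CourseNumber)
);
""")

conn.execute("INSERT INTO LECTURER VALUES (?, ?)", ("L1", "Dr Price"))
conn.executemany("INSERT INTO COURSE VALUES (?, ?, ?)",
                 [("C10", "Databases", "L1"), ("C20", "Networks", "L1")])
conn.executemany("INSERT INTO STUDENT VALUES (?, ?, ?, ?)",
                 [("S1", "Ann Jones", "2008-03-01", "F"), ("S2", "Ben Lee", "2007-11-15", "M")])
conn.executemany("INSERT INTO STUDENT_TAKES VALUES (?, ?)",
                 [("S1", "C10"), ("S1", "C20"), ("S2", "C10")])

# Joining on the foreign keys rebuilds the full picture; each name is stored only once.
rows = conn.execute("""
    SELECT s.StudentName, c.CourseName, l.LecturerName
    FROM STUDENT_TAKES t
    JOIN STUDENT  s ON s.StudentID    = t.StudentID
    JOIN COURSE   c ON c.CourseNumber = t.CourseNumber
    JOIN LECTURER l ON l.LecturerID   = c.LecturerID
    ORDER BY s.StudentName, c.CourseName
""").fetchall()
for row in rows:
    print(row)

# The composite key rejects a duplicate enrolment; the foreign key rejects an unknown course.
for bad in [("S1", "C10"), ("S2", "C99")]:
    try:
        conn.execute("INSERT INTO STUDENT_TAKES VALUES (?, ?)", bad)
    except sqlite3.IntegrityError as err:
        print(bad, "rejected:", err)
```

Changing a lecturer's name now means updating one row in LECTURER, rather than every course or enrolment record that mentions them.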
In this notation, when not all of the attributes are yet known, this can be shown by a row of dots.

An Entity-Relationship (ER) diagram shows what information is stored and how it is related, i.e. it models the structure of the data.
There are 3 main concepts in an ER diagram:
› Entities: things, usually nouns, e.g. 'Student'
› Attributes: properties of things, e.g. 'Name', 'StudentID'
› Relationships: connections between things, e.g. Student 'studies' Course
An entity is a real-world object about which data is to be recorded. Attributes are properties, or characteristics, of entities. Relationships are associations between entities.
An ER model consists of:
› A diagram showing entities and the relationships between them.
› Formal descriptions of each entity in terms of its attributes.
› Descriptions of the meanings of relationships.
› Descriptions of any constraints on the system and of any assumptions made.
Diagram conventions for an ER diagram
Entities are shown as rectangles with the name of the entity inside.
When choosing an entity name:
› Use singular nouns, e.g. 'Student' not 'Students'.
› Start with an upper-case letter and concatenate words, e.g. DegreeScheme.
› Choose distinct names.
Three degrees of relationship can be represented:
› One-to-one (1:1)
› One-to-many (1:n)
› Many-to-many (m:n)
[Figure: example ER diagram]

Database Management System
A database can very quickly become complicated. It requires something to control it and to control access to it. It needs to control the amending of data to ensure that all the rules remain unbroken, and the addition and deletion of data must also be controlled. This software is called a Database Management System (DBMS).
Data storage, retrieval and update
› A DBMS must allow users to store, retrieve and update information as easily as possible, without having to be aware of the internal structure of the database.
Creation and maintenance of the data dictionary
Managing the facilities for sharing the database
› Ensure that problems do not arise when two people access a record at the same time and try to update it.
Backup and recovery
› Provide the ability to recover the database in the event of system failure.
Security
› Handle password allocation and checking, and the 'view' of the database that a given user is allowed.

A DBMS includes a Data Description Language (DDL), also called a Data Definition Language. The DDL is used to define the tables in the database, including:
› Data types
› Data structures within the database
› Any constraints on the data
The design that is created is called a schema. Each user of the database will use it for different things and will be allowed to see different parts of it, so each is given their own subschema, which sets out how they see the data.
Users of the database will be given different rights:
› The database administrator allocates users to groups of one or more and assigns each group a set of privileges or permissions.
› Permissions determine whether a user can view, modify, execute or update.
› Each user or group has its own username.
› Each user has an individual password (which can and should be changed regularly).
Some users will need to manipulate data (amend, delete or insert new data). This is done using a Data Manipulation Language (DML).
Tools provided by the DBMS, such as Query By Example (QBE), can simplify the use of the DDL and DML.

Data Dictionary
The DBMS maintains a file of descriptions of the data and of the structure of storage, known as the data dictionary. The data dictionary is a 'database about the database'. It will contain information such as:
› What tables and columns are included in the present structure;
› The names of the current tables and columns;
› The characteristics of each item of data, such as its length and data type;
› Any restrictions on the value of certain columns;
› The meaning of any data fields that are not self-evident, e.g. a field such as 'course type';
› The relationships between items of data;
› Which programs access which items of data, and whether they merely read the data or change it.

Various tools allow the DBMS to present differing views of the data held within the database.
Internal level (1st level)
› A view of the entire database as it is stored in the system.
› The level at which data is organised by access method: random access, indexed, sequential...
› It is hidden from the user by the DBMS.
Conceptual level (2nd level)
› Gives a single, usable view of all the data in the database.
External level (3rd level)
› Where data is arranged according to user requirements and rights.
› Different users will get different views of the data.
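DDL, DML, the data dictionary and an external view can all be seen in one small sketch using Python's built-in sqlite3 module. This is an illustration under assumptions, not how every DBMS works: SQLite keeps its own catalogue of tables and views in a table called `sqlite_master` and reports column details through `PRAGMA table_info`, and it has no user accounts or permissions, so a view stands in for one user's external view. The table, view and data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define a table, with data types and constraints.
conn.execute("""
    CREATE TABLE STUDENT (
        StudentID        TEXT PRIMARY KEY,
        StudentName      TEXT NOT NULL,
        Form             TEXT,
        EmergencyContact TEXT
    )
""")

# DML: insert, amend and delete data.
conn.executemany("INSERT INTO STUDENT VALUES (?, ?, ?, ?)", [
    ("S1", "Ann Jones", "10A", "01234 111111"),
    ("S2", "Ben Lee", "10B", "01234 222222"),
    ("S3", "Cara Diaz", "10A", "01234 333333"),
])
conn.execute("UPDATE STUDENT SET Form = '10B' WHERE StudentID = 'S3'")
conn.execute("DELETE FROM STUDENT WHERE StudentID = 'S2'")

# External level: a view showing one kind of user only the columns they need
# (no emergency contact numbers).
conn.execute("CREATE VIEW FORM_LIST AS SELECT StudentName, Form FROM STUDENT")
print(conn.execute("SELECT * FROM FORM_LIST ORDER BY StudentName").fetchall())

# Data dictionary: SQLite's catalogue describes the tables and views themselves.
for obj_type, name in conn.execute(
        "SELECT type, name FROM sqlite_master WHERE type IN ('table', 'view') ORDER BY name"):
    print(obj_type, name)

# Column-level details: name, declared type, NOT NULL flag, primary key position.
for cid, name, col_type, notnull, default, pk in conn.execute("PRAGMA table_info(STUDENT)"):
    print(name, col_type, "NOT NULL" if notnull else "", "PK" if pk else "")
```

The same catalogue queries work for any table the DDL creates, which is what makes the data dictionary a "database about the database".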