An Introduction to Databases
Dr Stephen Swift
The Intelligent Data Analysis Group
Brunel University

An Introduction to Databases
• Databases
• The Parts of a Database
• A Brief Description of SQL
• Examples Using Microsoft Access

What is a Database? (1)
• A Database System is a Computerised Record Keeping System
• Rather Like an Electronic Filing Cabinet
• The Data can be Added to, Deleted, Modified etc…
• The Data Contained is of the Same Type
• Would Not Have a Database Containing Patient Records and the Sales Records of a Pet Shop, For Example

What is a Database? (2)
• In Large Organisations, a Database System is Usually a Subsystem of a Larger Information System
• An Information System Supports the Information Handling Requirements of an Organisation
• Smaller Organisations Might Just Have a Single Database
• A Database Management System (DBMS) is a Software System that Enables Users to Define, Create, Maintain and Control Access to a Database

Why Are Databases Needed?
• A Huge Amount of Data is Being Collected Every Second of the Day
• The Data:
  – Is Often Complex
  – Is Large in Size
  – Requires Sophisticated Manipulation
• Databases and DBMSs are Essential to Successfully Manage Such Data

An MS Access Database
An MDB File Contains:
• Tables
• Queries
• Forms & Reports
• Macros & Modules

Microsoft Access
• A Stand Alone Database System
• All Aspects of the Database are Contained in a Single MDB File
• Slow When Handling Huge Volumes of Data
• Can be Used to Create Database Applications

Tables
Labelled on the Diagram: Rows, Column Name, Primary Key, Foreign Key, Domains

The Patient Table

  Patient No.  Patient Surname  Patient Forename  Sex     Date of Birth  Ward No.
  923          Moneybags        Maurice           Male    23/7/53        10
  109          Foot             Ivor              Male    3/4/41         11
  854          Hare             Susan             Female  13/11/61       10
  231          Knee             Boris             Male    4/2/31         7
  459          Legg             Brian             Male    10/2/70        10

Domains (One per Column: Patient No., Surname, Forename, Sex, Date of Birth, Ward No.), e.g.
  – Sex: Male, Female
  – Ward No.: 1–12

The Ward Table

  Ward No.  Ward Name    Type      Number of Beds
  3         Nightingale  Medical   8
  11        Fleming      Medical   12
  10        Barnard      Surgical  21

Table Properties (1)
• Rows (or Records)
  – Each Row Shows One Occurrence of a Patient
  – Each Row Must be Uniquely Identifiable
  – The Order of the Rows MUST NOT be Significant

Table Properties (2)
• Columns (or Fields)
  – Each Column has a Type, e.g. Number, Text, Boolean, Multimedia, etc…
  – The Order of the Columns MUST NOT be Significant
  – Only One Value Should be Associated With Each Column/Row Intersection in the Table

Table Properties (3)
• Domain
  – A Pool of Possible Values From Which the Actual Values Appearing in the Columns of the Table are Drawn
    • e.g. The Domain of Patient Numbers Includes all of the Possible Patient Numbers, Not Just the Ones Currently in Hospital
  – Very Important for Comparing Values from Different Tables

The Primary Key
• A Special Type of Field
• Not All Tables Have a Primary Key
• Usually a Number or String, e.g. Patient Number
• Used to Relate Data Between Tables

Worked Examples
• Check That Microsoft Access Loads
• Check That You Can See Four Files:
  – “Functions.xls”
  – “Gene ID.xls”
  – “spellman_yeast_alpha.xls”
  – “annette2004.ppt”

Worked Example (1)
• We Will:
  – Create a Microsoft Access Database
  – Import Some Data
  – Make Sure the Fields are the Correct Type
  – Create Three Tables
  – Look at the Tables (Datasheet View)

Queries (1)
• A Query Selects or Modifies a Subset of One or More Tables
• E.g. All Female Patients Under 18 Years Old
• A Query is Often Expressed in a Special Language Called SQL

SQL
• “Structured Query Language”
• Originally a Proprietary Language from IBM
• Now an International Standard High Level Language Supported by Most Database Products
• Used to Query and Modify Data Within a Database

Data Manipulation
• Data is Manipulated by Rows and Columns
• A Subset of Data is Selected and then Modified
• The Selection is Made by the User, Usually as a Set of Requirements (Conditions)
• E.g. Select All Female Patients Under 18 Years Old and Delete All Their Records

Queries (2)
A SELECT Query Selects a Subset of One or More Tables

  SELECT <Fields> FROM <Table> WHERE <Condition>;

  SELECT Alpha.* FROM Alpha WHERE Alpha.alpha63="NULL";

Queries (3)
A Make Table Query Selects a Subset of One or More Tables and Puts the Results Into a New Table. If the Destination Table Already Exists, it is Replaced

  SELECT <Fields> INTO <Destination Table> FROM <Source Table> WHERE <Condition>;

  SELECT Alpha.* INTO Temp FROM Alpha WHERE Alpha.ORF Like "YP*";

Queries (4)
An Update Query Changes the Values of One or More Fields in One or More Tables

  UPDATE <Table> SET <Fields to Values> WHERE <Condition>;

  UPDATE Alpha SET Alpha.alpha63 = "0" WHERE Alpha.alpha63="NULL";

Queries (5)
An Append Query Selects a Subset of One Table and Adds it to Another Table

  INSERT INTO <Destination Table> SELECT <Fields> FROM <Source Table> WHERE <Condition>;

  INSERT INTO Temp SELECT Alpha.* FROM Alpha WHERE Alpha.alpha63="NULL";

Queries (6)
A Delete Query Removes a Subset of Rows From One or More Tables in the Database

  DELETE <Rows> FROM <Table> WHERE <Condition>;

  DELETE Alpha.* FROM Alpha WHERE Alpha.alpha63="NULL";

Queries (7)
A Crosstab Query is Very Complex and Will Therefore Not be Covered!

Worked Example (2)
• We Have Some Import Errors
• We Must Locate Which Fields are in Error
• We Must Then Use an UPDATE Query to Modify the Erroneous Data

Forms
Forms are Used to View/Add/Manipulate Data

Data Entry (1)
• The User Should Only be Able to Enter Values From the Domain of a Field on a Form
• E.g. If There are Only 10 Wards in a Hospital, They Should Only be Able to Enter 1–10 in the Wards Field
• In the Example Above, Allowing Any Number Would Increase the Chance of Data Errors

Data Entry (2)
• Pick Lists and Check Boxes Can Help to Maintain Data Integrity
• Validation Rules on Form Fields Can Prevent the User From Entering Invalid Data
• Minimise Free Text Entry to Fields
• The Application Should Help the User in Completing Forms Correctly

Reports
Reports are Used to Display Data

Macros and Modules
• A Macro is a User-Defined List of Database Actions to be Carried Out
• Usually Commonly Performed Tasks
• A Module Contains Functions and Subroutines that Carry Out More Complex Tasks
• Modules are Constructed Using a Form of Visual Basic

Joins
• A Join Combines Two Tables into One Virtual Table
• Tables are Joined Together Based on a Common Value in a Field
• The Field That the Two Tables are Joined on Must be the Same Type

Worked Example (3)
• We Are Going to Join Our Tables Together
• Using “Tools-Relationships”
• Add the Three Tables We Imported
• Join “Alpha-ORF” and “Gene ID-ORF”
• Join “Gene ID-SGD” and “Function-SGD”

Worked Example (4)
• Now Look at the Effect on:
  – Building a SELECT Query on All of the Tables
  – The Datasheet View For One of the Tables
• Without Joins it Would be Very Difficult to Relate and/or Compare Data From Different Tables
  – Why is This Important?

Normalising a Table (1)
Normalisation is:
“The Organisation of a System's Attributes into a Set of Compact and Meaningful Tables”

Normalising a Table (2)
Well Normalised Tables Avoid:
  – Unnecessary Duplication of Data
    • i.e. No Redundant Data
  – Problems With Modifying, Inserting and Deleting Data
    • N.B. Sometimes Referred to as “Update Anomalies”

Stages of Normalisation (1)
• Normalisation Takes Place in Stages
• Each Stage is Known as a Normal Form
• Each Stage is a Development From the Previous Stage

Stages of Normalisation (2)
• Un-Normalised Form
• First Normal Form
• Second Normal Form
• Third Normal Form

Un-Normalised Form
• Column Headings (Field Names) Should be Meaningful
• Choice of Primary Key
  – Must be Unique for the Particular Data Source
  – May Require Two or More Fields
  – Use the Smallest Number of Fields Possible
  – Avoid Textual Keys (They Degrade Speed)

1st, 2nd and 3rd Normal Form
• 1st: Separate any Repeating Groups of Fields to Other/New Tables
• 2nd: Separate Fields that Only Depend Upon Part of the Key to Other/New Tables
• 3rd: Separate any Fields That are Not Directly and Fully Dependent on the Key to Other/New Tables

Sample Source of Data

DRUG CARD
  Patient No.: 923    Surname: Moneybags    Forename: Maurice
  Ward No.: 10        Ward Name: Barnard

  Drugs Prescribed

  Date     Drug Code  Drug Name   Dosage                       Length of Treatment
  20/5/88  CO2355P    Cortisone   2 pills 3 x day after meals  14 days
  20/5/88  MO3416T    Morphine    Injection every 4 hours      5
  25/5/88  MO3416T    Morphine    Injection every 8 hours      3
  26/5/88  PE8694N    Penicillin  1 pill 3 x day               7

  for additional drugs continue on another card

After Normalisation

  SYSTEM: Hospital    DATE: / /    AUTHOR:
  Source ID No.:      Name of Source: Drug Card

UNF
  – Patient Number, Patient Surname, Patient Forename, Ward Number, Ward Name, Prescription Date, Drug Code, Drug Name, Dosage, Length of Treatment

1NF
  – Patient Number, Patient Surname, Patient Forename, Ward Number, Ward Name
  – Patient Number, Prescription Date, Drug Code, Drug Name, Dosage, Length of Treatment

2NF
  – Patient Number, Patient Surname, Patient Forename, Ward Number, Ward Name
  – Patient Number, Prescription Date, Drug Code, Dosage, Length of Treatment
  – Drug Code, Drug Name

3NF
  – Patient Number, Patient Surname, Patient Forename, * Ward Number
  – Ward Number, Ward Name
  – Patient Number, Prescription Date, Drug Code, Dosage, Length of Treatment
  – Drug Code, Drug Name

Tables as a Logical Data Structure

Patient

  Pat No  Surname    Forename  Wd No
  923     Moneybags  Maurice   10
  109     Foot       Ivor      11

Ward

  Wd No  Ward Name
  10     Barnard
  11     Fleming

Prescription

  Pat No  Prescr Date  Drug Code  Dosage                       Trt Lgth
  923     20/5/88      CO2355P    2 pills 3 x day after meals  14
  923     20/5/88      MO3416T    Injection every 4 hours      5
  923     25/5/88      MO3416T    Injection every 8 hours      3
  923     26/5/88      PE8694N    1 pill 3 x day               7
  109     15/5/88      AS473A     2 pills 3 x day after meals  7
  109     20/5/88      VA231M     2 per day                    5

Drug

  Drug Code  Drug Name
  CO2355P    Cortisone
  MO3416T    Morphine
  PE8694N    Penicillin
  AS473A     Aspirin
  VA231M     Valium

Worked Example (5)
• Create a SELECT Query that Just Displays the Functional Groups
• Check that it Contains What We are After
• Change the SELECT Query to a MAKE TABLE Query

References
• Further Reading and Source for this Presentation:
  – “Database Systems: A Practical Approach to Design, Implementation and Management”, 3rd Edition, T. Connolly and C. Begg, Addison Wesley, 2001
  – “An Introduction to Database Systems”, 8th Edition, C. J. Date, Addison Wesley, 2004
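
Supplementary Sketch: The Query Types and a Join in Code

The five query types from the Queries slides (select, make table, update, append, delete) and the join from the Joins slide can also be tried outside Access. The sketch below is a minimal illustration, not the method used in the worked examples. It assumes Python's standard-library sqlite3 module and the SQLite dialect, which differs from Access SQL: SQLite has no SELECT ... INTO, so the make table step uses CREATE TABLE ... AS SELECT instead. The table and column names (patient, ward, temp_ward10 and so on) are hypothetical, modelled on the Patient and Ward tables, and moving patient 231 to ward 3 is an invented change used only to show an update.

```python
import sqlite3

# In-memory SQLite database (an assumption; the deck itself uses Access).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical tables modelled on the Patient and Ward slides.
# patient.ward_no plays the role of the field the two tables are joined on.
cur.execute("CREATE TABLE ward (ward_no INTEGER PRIMARY KEY, ward_name TEXT)")
cur.execute(
    "CREATE TABLE patient ("
    " patient_no INTEGER PRIMARY KEY,"
    " surname TEXT, forename TEXT, sex TEXT, ward_no INTEGER)"
)
cur.executemany(
    "INSERT INTO ward VALUES (?, ?)",
    [(3, "Nightingale"), (10, "Barnard"), (11, "Fleming")],
)
cur.executemany(
    "INSERT INTO patient VALUES (?, ?, ?, ?, ?)",
    [
        (923, "Moneybags", "Maurice", "Male", 10),
        (109, "Foot", "Ivor", "Male", 11),
        (854, "Hare", "Susan", "Female", 10),
        (231, "Knee", "Boris", "Male", 7),
        (459, "Legg", "Brian", "Male", 10),
    ],
)

# SELECT query: a subset of one table chosen by a condition.
females = cur.execute(
    "SELECT surname FROM patient WHERE sex = 'Female'"
).fetchall()

# Make table query: SQLite's equivalent of Access's SELECT ... INTO.
cur.execute("CREATE TABLE temp_ward10 AS SELECT * FROM patient WHERE ward_no = 10")
ward10_count = cur.execute("SELECT COUNT(*) FROM temp_ward10").fetchone()[0]

# UPDATE query: change field values in the rows matching a condition.
cur.execute("UPDATE patient SET ward_no = 3 WHERE ward_no = 7")

# Append query: add a selected subset of one table to another table.
cur.execute("INSERT INTO temp_ward10 SELECT * FROM patient WHERE ward_no = 11")
appended_count = cur.execute("SELECT COUNT(*) FROM temp_ward10").fetchone()[0]

# DELETE query: remove the rows matching a condition.
cur.execute("DELETE FROM temp_ward10 WHERE sex = 'Male'")
remaining = cur.execute("SELECT surname FROM temp_ward10").fetchall()

# Join: combine patient and ward on the common ward_no value.
joined = cur.execute(
    "SELECT p.surname, w.ward_name"
    " FROM patient AS p JOIN ward AS w ON p.ward_no = w.ward_no"
    " ORDER BY p.patient_no"
).fetchall()

conn.commit()
print(females)    # [('Hare',)]
print(remaining)  # [('Hare',)]
for surname, ward_name in joined:
    print(surname, ward_name)
```

Because the join matches rows on ward_no, each patient appears alongside the name of their ward without that name being stored in the patient table, which is the arrangement the 3NF column of the normalisation form arrives at.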