Lecture 1 – Introduction to Databases
Rebecca McCready
Faculty of Medical Sciences, Newcastle University
http://fms-itskills.ncl.ac.uk

Introduction to Databases
• What is a database?
• Types of databases.
• Differences between them.
• What is normalisation?
• What are primary and foreign keys?

What is a database?
• “Organised collection of data” (Wikipedia, www.wikipedia.org)
• Using a database you can:
  • Access data in an organised fashion.
  • Filter data for analysis.
  • Record and change data.

Types of databases (1)
• Flat file:
  • Each line is a single entry.
  • Each column is necessary; each row is unique.
  • Simple structure.
  • Data should be non-repetitive.
  • e.g. a text file, a spreadsheet or a table of data.

Working example: a good flat file
• Questionnaire results.
• Experiment results.
• Data that cannot be merged or split.

Date of Birth   Age (years)   % Fat
05/07/1969      23            9.5
08/09/1932      61            34.5
09/11/1965      27            7.8
19/03/1953      39            31.4
03/01/1966      27            17.8
10/04/1934      58            33
24/08/1936      56            32.5
12/05/1972      21            31.1
29/12/1939      53            34.7
18/10/1938      54            29.1
31/03/1935      57            30.3
20/02/1970      23            27.9
27/02/1935      58            33.8
06/02/1940      53            42

Flat files – the pros
• Good for non-repetitive data.
• Good for data in a simple structure.
• Good for describing single instances.
• Should be easy to analyse.
• Simple to create and maintain.

Flat files – the cons
• Difficult to store complex or repetitive data.
• Difficult to analyse complex data stored in single lines.
• Can be time-consuming to maintain if the data is complex.
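Because each line of a flat file is one self-contained entry, reading and filtering it takes only a few lines of code. The sketch below is a minimal illustration, assuming the good flat file above has been saved as CSV; it uses only the first five rows of the table, Python's standard `csv` module, and hypothetical column names (`date_of_birth`, `age_years`, `percent_fat`) chosen for the example.

```python
import csv
import io

# First five rows of the "good flat file" table, as CSV text.
# Column names are hypothetical; a real file would be read from disk.
FLAT_FILE = """date_of_birth,age_years,percent_fat
05/07/1969,23,9.5
08/09/1932,61,34.5
09/11/1965,27,7.8
19/03/1953,39,31.4
03/01/1966,27,17.8
"""


def load_rows(text):
    """Parse the flat file: one line per entry, one column per field."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {
            "date_of_birth": row["date_of_birth"],
            "age_years": int(row["age_years"]),
            "percent_fat": float(row["percent_fat"]),
        }
        for row in reader
    ]


def mean_fat(rows, min_age):
    """Filter entries by age, then return the mean % fat (None if no match)."""
    selected = [r["percent_fat"] for r in rows if r["age_years"] >= min_age]
    return sum(selected) / len(selected) if selected else None


rows = load_rows(FLAT_FILE)
print(len(rows), "entries")
print("Mean % fat, aged 50 and over:", mean_fat(rows, 50))
```

Filtering here is just a loop over independent rows, which is why flat files are easy to analyse while the data stays simple and non-repetitive.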

Working example: a bad flat file

Patient ID  Date of Birth  Gender  Patient Age  Operation                        Operation Date  Hospital                          Consultant
1           04-Feb-69      Female  30           Nephrectomy (any)                25/02/1999      Southmead Hospital, Bristol       Vadanan
2           20-Jun-80      Female  19           Cadaver donor nephrectomy        14/01/1999      Southmead Hospital, Bristol       Holland
3           05-May-76      Male    23           Upper polar partial nephrectomy  06/08/1999      Southmead Hospital, Bristol       Roysam
5           04-Dec-53      Female  46           Nephroureterectomy (any)         25/02/1999      Southmead Hospital, Bristol       Sanderson
14          01-Feb-83      Female  15           Nephrectomy (any)                18/10/1998      Frimley Park Hospital, Camberley  Jones
16          07-Jun-85      Male    13           Enucleation of renal tumour      10/09/1998      Southmead Hospital, Bristol       Whiteaway
16          07-Jun-85      Male    13           Upper polar partial nephrectomy  17/10/1998      Frimley Park Hospital, Camberley  Jones

• Blue highlighting: multiple fields of repeated data.
• Green and grey highlighting: closely related data.
• If this is true then…

Types of databases (2)
• Relational databases:
  • Made of several tables.
  • Each table should relate to at least one other table.
  • Complex data is broken down into simple tables.
  • Each entry in each table has a unique identifier.
  • Based on set theory in mathematics, where members of a set share characteristics.
  • e.g. Database.

Working example: a good database
[Diagram: tables linked to one another through their ID fields, with relationships labelled ‘Belongs’, ‘Performed on’, ‘Host’ and ‘Responsible for’.]

What is normalisation?
• The process of applying design rules to a database.
• Three normal forms (NF) are usually enough in practice, although five are commonly defined.
First normal form:
• No duplicated rows, each cell holds a single value, and each table has a designated primary key.
Second normal form:
• PLUS every non-key column depends on the whole primary key, not just part of it.
Third normal form:
• PLUS every non-key column depends directly on the primary key, not on another non-key column.

What are Primary and Foreign Keys?
• A primary key uniquely identifies each record in a database table.
  • E.g.
    Patient hospital numbers, NI numbers, student IDs, etc.
• A foreign key is a field that refers to the primary key of another table; a matching value links a record to the related record in that table.

Working example: requires complex front end

Record ID  Operation Date  Patient ID  Patient Age  Operation ID  Hospital ID  Consultant ID
1          10/09/1998      16          13           14            117          20
2          17/10/1998      2           26           3             89           19
3          12/01/1999      14          54           10            117          5

A complex front end is required to make sense of this data and to allow easy data input:
• Who is patient number ‘16’, or consultant number ‘5’?
• What operation is number ‘3’?

Relational databases – the pros
• Excellent for storing complex data.
• Excellent if data becomes difficult to manage or analyse in flat file form.
• Excellent if data becomes repetitive.
• Excellent if you have multiple questions to ask in your study.

Relational databases – the cons
• Difficult to create.
• Often difficult to separate data properly (‘normalisation’).
• Require complex ‘front ends’ to manage them easily and make sense of the data.
• Require a greater level of knowledge and skill to use.

So how do you decide?
• Is your data repetitive?
• Do you have complex queries to run?
• Are you unsure of what queries you might want to ask of your data?
• Do you have complex groupings and relationships between data fields?
• Are you able to put in the time and effort to create one? Do you have the confidence to do so?
• If YES to any, use a relational DB.

To conclude
• Flat file:
  • Single lines of data.
  • Unrelated to each other.
• Relational database:
  • Many tables of single lines.
  • Relationships between tables.
  • Shared characteristics.
• Choose the type most suitable for your data and for you.
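To make normalisation, primary keys and foreign keys concrete, the sketch below splits patient 16's two operations from the bad flat file into separate tables using Python's built-in `sqlite3` module. It is a minimal illustration, not the lecture's own database: the table and column names and the small ID values for hospitals, consultants and operations are hypothetical, and patient age is left out on the assumption that it can be derived from date of birth and operation date. The `JOIN` query plays the role of a very simple front end, turning ID numbers back into names.

```python
import sqlite3

# Hypothetical normalised schema for the "bad flat file" example.
# Table names, column names and ID values are illustrative assumptions.
SCHEMA = """
CREATE TABLE patient (
    patient_id    INTEGER PRIMARY KEY,   -- primary key: one row per patient
    date_of_birth TEXT NOT NULL,
    gender        TEXT NOT NULL
);
CREATE TABLE hospital (hospital_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE consultant (consultant_id INTEGER PRIMARY KEY, surname TEXT NOT NULL);
CREATE TABLE operation_type (operation_id INTEGER PRIMARY KEY, description TEXT NOT NULL);
-- Each operation record points at the other tables through foreign keys.
CREATE TABLE operation_record (
    record_id      INTEGER PRIMARY KEY,
    operation_date TEXT NOT NULL,
    patient_id     INTEGER NOT NULL REFERENCES patient(patient_id),
    operation_id   INTEGER NOT NULL REFERENCES operation_type(operation_id),
    hospital_id    INTEGER NOT NULL REFERENCES hospital(hospital_id),
    consultant_id  INTEGER NOT NULL REFERENCES consultant(consultant_id)
);
"""


def build_database():
    """Create an in-memory database holding patient 16's two operations."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    # The patient's details are stored once, however many operations they have.
    conn.execute("INSERT INTO patient VALUES (16, '1985-06-07', 'Male')")
    conn.executemany("INSERT INTO hospital VALUES (?, ?)",
                     [(1, "Southmead Hospital, Bristol"),
                      (2, "Frimley Park Hospital, Camberley")])
    conn.executemany("INSERT INTO consultant VALUES (?, ?)",
                     [(1, "Whiteaway"), (2, "Jones")])
    conn.executemany("INSERT INTO operation_type VALUES (?, ?)",
                     [(1, "Enucleation of renal tumour"),
                      (2, "Upper polar partial nephrectomy")])
    # Dates are stored as ISO strings (YYYY-MM-DD) so they sort correctly.
    conn.executemany("INSERT INTO operation_record VALUES (?, ?, ?, ?, ?, ?)",
                     [(1, "1998-09-10", 16, 1, 1, 1),
                      (2, "1998-10-17", 16, 2, 2, 2)])
    return conn


def operations_for_patient(conn, patient_id):
    """A minimal 'front end': join on the keys to turn IDs back into names."""
    query = """
        SELECT r.operation_date, o.description, h.name, c.surname
        FROM operation_record AS r
        JOIN operation_type AS o ON o.operation_id = r.operation_id
        JOIN hospital AS h ON h.hospital_id = r.hospital_id
        JOIN consultant AS c ON c.consultant_id = r.consultant_id
        WHERE r.patient_id = ?
        ORDER BY r.operation_date
    """
    return conn.execute(query, (patient_id,)).fetchall()


conn = build_database()
for row in operations_for_patient(conn, 16):
    print(row)

# The foreign key rejects a record for a patient who does not exist.
try:
    conn.execute("INSERT INTO operation_record "
                 "VALUES (3, '1999-01-12', 99, 1, 1, 1)")
except sqlite3.IntegrityError as err:
    print("Rejected:", err)
```

SQLite only enforces foreign keys when `PRAGMA foreign_keys = ON` has been set on the connection, so the final insert for the non-existent patient 99 is rejected because of that line; other database systems enforce foreign keys by default.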