Lecture 1 - IT Skills and Digital Literacy

Lecture 1 – Introduction to Databases
Rebecca McCready
Faculty of Medical Sciences
Newcastle University
http://fms-itskills.ncl.ac.uk
Introduction to Databases
• What is a database?
• Types of databases.
• Differences between them.
• What is normalisation?
• What are primary and foreign keys?
What is a database?
• “Organised collection of data” (Wikipedia, www.wikipedia.org)
• Using a database you can:
• Access data in an organised fashion.
• Filter data for analysis.
• Record and change data.
Types of databases (1)
• Flat file:
• Each line is a single entry.
• Each column is necessary, each row is unique.
• Simple structure.
• Should be non-repetitive data.
• e.g. text file, spreadsheet, table of data.
Working example: a good flat file
• Questionnaire results.
• Experiment results.
• Data that cannot be merged or split.
Date of Birth | Age (years) | % Fat
05/07/1969 | 23 | 9.5
08/09/1932 | 61 | 34.5
09/11/1965 | 27 | 7.8
19/03/1953 | 39 | 31.4
03/01/1966 | 27 | 17.8
10/04/1934 | 58 | 33
24/08/1936 | 56 | 32.5
12/05/1972 | 21 | 31.1
29/12/1939 | 53 | 34.7
18/10/1938 | 54 | 29.1
31/03/1935 | 57 | 30.3
20/02/1970 | 23 | 27.9
27/02/1935 | 58 | 33.8
06/02/1940 | 53 | 42
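As a rough illustration of why a flat file like this is easy to analyse, the sketch below reads the first four rows of the table as CSV text and filters them by age. The column names (`date_of_birth`, `age_years`, `percent_fat`) and the helper functions are assumptions made for this example; they are not part of the original data.

```python
import csv
import io

# First four rows of the table above, stored as a flat file (CSV text).
# The column names are hypothetical labels chosen for this sketch.
FLAT_FILE = """date_of_birth,age_years,percent_fat
05/07/1969,23,9.5
08/09/1932,61,34.5
09/11/1965,27,7.8
19/03/1953,39,31.4
"""

def load_rows(text):
    """Read each line of the flat file as one entry (one dict per row)."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {
            "date_of_birth": row["date_of_birth"],
            "age_years": int(row["age_years"]),
            "percent_fat": float(row["percent_fat"]),
        }
        for row in reader
    ]

def filter_by_min_age(rows, min_age):
    """Keep only the entries whose age is at least min_age."""
    return [row for row in rows if row["age_years"] >= min_age]

rows = load_rows(FLAT_FILE)
over_30 = filter_by_min_age(rows, 30)
mean_fat = sum(r["percent_fat"] for r in over_30) / len(over_30)
print(len(over_30), "entries aged 30 or over; mean % fat:", round(mean_fat, 2))
```

Because every line is a complete, independent entry, filtering and averaging need nothing more than a loop over the rows.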
Flat files – the pros
• Good for non-repetitive data.
• Good for data in a simple structure.
• Good for describing single instances.
• Should be easy to analyse.
• Simple to create and maintain.
Flat files – the cons
• Difficult to store complex or repetitive data.
• Difficult to analyse complex data stored in single lines.
• Can be time-consuming to maintain if data is complex.
Working example: a bad flat file
Patient ID | Date of Birth | Gender | Patient Age | Operation | Operation Date | Hospital | Consultant
1 | 04-Feb-69 | Female | 30 | Nephrectomy (any) | 25/02/1999 | Southmead Hospital, Bristol | Vadanan
2 | 20-Jun-80 | Female | 19 | Cadaver donor nephrectomy | 14/01/1999 | Southmead Hospital, Bristol | Holland
3 | 05-May-76 | Male | 23 | Upper polar partial nephrectomy | 06/08/1999 | Southmead Hospital, Bristol | Roysam
5 | 04-Dec-53 | Female | 46 | Nephroureterectomy (any) | 25/02/1999 | Southmead Hospital, Bristol | Sanderson
14 | 01-Feb-83 | Female | 15 | Nephrectomy (any) | 18/10/1998 | Frimley Park Hospital, Camberley | Jones
16 | 07-Jun-85 | Male | 13 | Enucleation of renal tumour | 10/09/1998 | Southmead Hospital, Bristol | Whiteaway
16 | 07-Jun-85 | Male | 13 | Upper polar partial nephrectomy | 17/10/1998 | Frimley Park Hospital, Camberley | Jones
• Blue: Multiple fields of repeated data.
• Green and Grey: Closely related data.
• If this is true then…
Types of databases (2)
• Relational databases:
• Made of several tables.
• Each table should relate to another.
• Complex data is broken down into simple tables.
• Each entry in each table has a unique identifier.
• Based on set theory in mathematics, where members of a set share characteristics.
• e.g. Database.
Working example: a good database
[Diagram: several tables linked through their ID fields. Relationship labels: Belongs, Performed on, Host, Responsible for.]
What is normalisation?
• Process of applying design rules to a database to reduce repeated data.
• Three normal forms (NF) are usually enough, although five are commonly described.
First normal form:
• No duplicated rows, each cell has a single value and each table has a designated primary key.
Second normal form:
• First normal form PLUS every non-key column depends on the whole primary key, not just part of it.
Third normal form:
• Second normal form PLUS every non-key column depends directly on the primary key, not on another non-key column.
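To make the idea concrete, here is a minimal sketch of one normalisation step, using three rows from the "bad flat file" example (patient 16 appears twice). Patient details are moved into their own table so they are stored only once, and each operation row keeps just the patient ID. The function name, the dictionary layout and the `record_id` counter are assumptions made for this sketch, not part of the lecture material.

```python
# Three rows taken from the "bad flat file" example: patient 16 appears twice.
flat_rows = [
    {"patient_id": 14, "date_of_birth": "01-Feb-83", "gender": "Female",
     "operation": "Nephrectomy (any)", "operation_date": "18/10/1998"},
    {"patient_id": 16, "date_of_birth": "07-Jun-85", "gender": "Male",
     "operation": "Enucleation of renal tumour", "operation_date": "10/09/1998"},
    {"patient_id": 16, "date_of_birth": "07-Jun-85", "gender": "Male",
     "operation": "Upper polar partial nephrectomy", "operation_date": "17/10/1998"},
]

def split_into_tables(rows):
    """Separate patient details (stored once per patient) from operations."""
    patients = {}    # keyed by patient_id, which acts as the primary key
    operations = []  # one row per operation, pointing back at the patient
    for record_id, row in enumerate(rows, start=1):
        patients[row["patient_id"]] = {
            "date_of_birth": row["date_of_birth"],
            "gender": row["gender"],
        }
        operations.append({
            "record_id": record_id,           # hypothetical identifier for this row
            "patient_id": row["patient_id"],  # link back to the patients table
            "operation": row["operation"],
            "operation_date": row["operation_date"],
        })
    return patients, operations

patients, operations = split_into_tables(flat_rows)
print(len(patients), "patients,", len(operations), "operations")
```

After the split, the date of birth and gender of patient 16 are held in one place, so a correction only has to be made once.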
What are Primary and Foreign Keys?
• A Primary Key uniquely identifies each record in a database table.
• E.g. patient hospital numbers, NI numbers, student IDs etc.
• A Foreign Key is a field that holds the value of a primary key field in another table, linking each record to its matching record in that table.
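The sketch below shows one way these two kinds of key can be declared, using Python's built-in `sqlite3` module and an in-memory database. The table and column names are hypothetical, and patient number 99 is invented purely to show what happens when a foreign key has no matching primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this setting is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical tables: patient_id is the primary key of patient,
# and operation.patient_id is a foreign key that refers to it.
conn.execute("""
    CREATE TABLE patient (
        patient_id INTEGER PRIMARY KEY,
        gender     TEXT
    )
""")
conn.execute("""
    CREATE TABLE operation (
        record_id      INTEGER PRIMARY KEY,
        operation_date TEXT,
        patient_id     INTEGER NOT NULL REFERENCES patient(patient_id)
    )
""")

conn.execute("INSERT INTO patient VALUES (16, 'Male')")
conn.execute("INSERT INTO operation VALUES (1, '10/09/1998', 16)")  # patient 16 exists

try:
    # Patient 99 is not in the patient table, so this row has nothing to link to.
    conn.execute("INSERT INTO operation VALUES (2, '17/10/1998', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print("Unmatched foreign key rejected:", rejected)
```

The database itself refuses the second operation, which is how a foreign key keeps the two tables consistent with each other.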
Working example: requires complex front end
Record ID | Operation Date | Patient ID | Patient Age | Operation ID | Hospital ID | Consultant ID
1 | 10/09/1998 | 16 | 13 | 14 | 117 | 20
2 | 17/10/1998 | 2 | 26 | 3 | 89 | 19
3 | 12/01/1999 | 14 | 54 | 10 | 117 | 5

A complex front end is required to make sense of this data and allow easy input of data:
• Who is patient number ‘16’ or consultant number ‘5’?
• What operation is number ‘3’?
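One way a front end could answer questions like these is to look each ID up in its own table and show the names instead. The sketch below does this for record 1 with a SQL join, using Python's built-in `sqlite3` module. The table names, the `describe_record` helper and the names attached to operation 14, hospital 117 and consultant 20 are assumptions for illustration only; the slide does not say what those IDs stand for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE operation_type (operation_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE hospital       (hospital_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE consultant     (consultant_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE record (
        record_id      INTEGER PRIMARY KEY,
        operation_date TEXT,
        patient_id     INTEGER,
        operation_id   INTEGER REFERENCES operation_type(operation_id),
        hospital_id    INTEGER REFERENCES hospital(hospital_id),
        consultant_id  INTEGER REFERENCES consultant(consultant_id)
    );
    -- Assumed lookup values, for illustration only.
    INSERT INTO operation_type VALUES (14, 'Enucleation of renal tumour');
    INSERT INTO hospital       VALUES (117, 'Southmead Hospital, Bristol');
    INSERT INTO consultant     VALUES (20, 'Whiteaway');
    -- Record 1 from the table above (patient age left out for brevity).
    INSERT INTO record VALUES (1, '10/09/1998', 16, 14, 117, 20);
""")

def describe_record(connection, record_id):
    """Replace the ID numbers in one record with the names they stand for."""
    return connection.execute("""
        SELECT r.operation_date, r.patient_id, o.name, h.name, c.name
        FROM record AS r
        JOIN operation_type AS o ON o.operation_id = r.operation_id
        JOIN hospital       AS h ON h.hospital_id = r.hospital_id
        JOIN consultant     AS c ON c.consultant_id = r.consultant_id
        WHERE r.record_id = ?
    """, (record_id,)).fetchone()

print(describe_record(conn, 1))
```

The stored record stays compact and free of repetition, while the query rebuilds a readable line for the person using the database.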
Relational databases – the pros
• Excellent for storing complex data.
• Excellent if data becomes difficult to manage or analyse in flat file form.
• Excellent if data becomes repetitive.
• Excellent if you have multiple questions to ask in your study.
Relational databases – the cons
• Difficult to create.
• Often difficult to separate data properly: ‘normalisation’.
• Require complex ‘front ends’ to manage them easily and make sense of data.
• Require a greater level of knowledge and skill to use.
So how do you decide?
• Is your data repetitive?
• Do you have complex queries to run?
• Are you unsure of what queries you might want to ask of your data?
• Do you have complex groupings and relationships between data fields?
• Are you able to put the time and effort in to create one? Do you have the confidence to do so?
• If YES to any, use a Relational DB.
To conclude
• Flat file:
• Single lines of data.
• Unrelated to each other.
• Relational database:
• Many tables of single lines.
• Relationships between tables.
• Shared characteristics.
• Choose the most suitable for your data and you.