Uploaded by Maunten Mangi

Week 2 Relational Database

Introduction to Database
Lecture 2
Topics of the Lecture
• The components of a relational database:
2.1 An example database: the Beijing Olympic Games
2.2 Tables
2.3 Tables are Sets
2.4 Relationships & Relational Maps
2.5 Domains
2.6 Missing data and “null” values
2.7 Integrity Constraints
© from notes by james d. reye @qut
2.1 An example database: the Beijing Olympic Games

Tables in the Beijing Olympic Games Database
• Created to illustrate the ideas of relational databases
– Only singles swimming events, and final results
– Only 4 tables: Countries, Competitors, Events & Results
• Only partial views of the tables are shown here
• Countries: one row for each country.
• Competitors: one row for each competitor (“athlete”).
Countries
CountryCode  CountryName
AFG          Afghanistan
AHO          Netherlands Antilles
ALB          Albania
ALG          Algeria
AND          Andorra
ANG          Angola
ANT          Antigua and Barbuda
ARG          Argentina
ARM          Armenia
ARU          Aruba
ASA          American Samoa
AUS          Australia
AUT          Austria
AZE          Azerbaijan
BAH          Bahamas

Competitors
CompetitorNum  GivenName  FamilyName   Gender  DateOfBirth  CountryCode
210375         Leisel     Jones        female  1985-08-30   AUS
210413         Andrew     Lauterstein  male    1987-05-22   AUS
210444         Linda      Mackenzie    female  1983-12-14   AUS
210491         Alice      Mills        female  1986-05-23   AUS
210500         Kenrick    Monk         male    1988-01-01   AUS
210512         Patrick    Murphy       male    1984-02-22   AUS
210565         Adam       Pine         male    1976-02-28   AUS
210588         Shayne     Reese        female  1982-09-15   AUS
210592         Stephanie  Rice         female  1988-06-17   AUS
210596         Brenton    Rickard      male    1983-10-19   AUS
210638         Jessicah   Schipper     female  1986-11-19   AUS
210639         Melanie    Schlanger    female  1986-08-31   AUS
210674         Christian  Sprenger     male    1985-12-19   AUS
210675         Nicholas   Sprenger     male    1985-05-14   AUS
210680         Craig      Stevens      male    1980-07-23   AUS
210690         Eamon      Sullivan     male    1985-08-30   AUS
• Events: one row for each swimming final event.

Events
EventId  EventGender  Distance  Style              DateHeld   StartTime
SWM054   men          400       individual medley  10-Aug-08  10:03
SWM014   men          400       freestyle          10-Aug-08  10:24
SWW054   women        400       individual medley  10-Aug-08  10:42
SWW411   women        400       freestyle relay    10-Aug-08  11:21
SWW021   women        100       butterfly          11-Aug-08  10:24
SWM031   men          100       breaststroke       11-Aug-08  10:30
SWW014   women        400       freestyle          11-Aug-08  11:17
SWM411   men          400       freestyle relay    11-Aug-08  11:26
SWM012   men          200       freestyle          12-Aug-08  10:16
SWW041   women        100       backstroke         12-Aug-08  10:23
SWM041   men          100       backstroke         12-Aug-08  10:31
SWW031   women        100       breaststroke       12-Aug-08  10:48
SWW012   women        200       freestyle          13-Aug-08  10:14
SWM022   men          200       butterfly          13-Aug-08  10:21
SWW052   women        200       individual medley  13-Aug-08  11:12

• Results: one row for each competitor in each swimming final event.
– So, if there were 8 competitors in each of 10 events, then there would be 80 rows in this table.

Results
EventId  CompetitorNum  Place  Lane  ElapsedTime  Note
SWM054   221565         1      4     243.84       World Record
SWM054   232061         2      5     246.16       European Record
SWM054   221213         3      6     248.09
SWM054   207531         4      7     252.16
SWM054   207546         5      3     252.47
SWM054   232071         6      2     252.84
SWM054   211784         7      1     253.38
SWM054   201943         8      8     255.40
SWW010   202658         1      3     24.06        Olympic Record
SWW010   222009         2      4     24.07        Americas Record
SWW010   210153         3      5     24.17
SWW010   244639         4      2     24.25
SWW010   217815         5      6     24.26
SWW010   247253         6      1     24.63
SWW010   217807         7      7     24.65
SWW010   217283         8      8     24.77
2.2 Tables (“relations”)
• All data is stored in tables (aka “relations”).
• Although the terms “table” and “relation” are often used to mean the same thing in a relational database, strictly speaking a relation is a (two-dimensional) table with the following six rules:
1. no two rows are exactly the same, i.e. no duplicate rows;
2. the order of the rows has no effect on the meaning of the data;
3. the order of the columns has no effect on the meaning of the data;
4. each cell in the table contains only a single value, i.e. you can’t put a list of values in a single cell (such as a list of a person’s children);
5. each column has a distinct name, i.e. a name which is not the same as another column in the same table;
6. all the values in each column are sensible, given the column’s name, e.g. you can’t put phone numbers in a date-of-birth column.
[Table: Competitors, as shown in section 2.1]
2.3 Tables are Sets
• Recall, a set is a collection of things that (typically) have
something in common.
• E.g. a shopping list is a collection of items that we want to
buy.
• Every set follows two standard mathematical rules:
– no duplicates, i.e. a set cannot contain the same thing more
than once. E.g. the list of numbers “1, 2, 2, 3” is not a set,
but “1, 2, 3” is.
– the order of the things has no effect on the meaning of the
set. E.g. the set {John, Sue, Kila, Ben} is the same as the
set {Ben, John, Sue, Kila}.
• When talking about databases, we say that a table is a set of
rows. E.g., we can say that the Events table is a set of events.
• Because a table is a set of rows, the first two rules for tables
(in section 2.2) are essentially the same as the two rules given
just above.
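The two set rules can be tried out directly with Python's built-in set type. This is only a minimal sketch: the variable names, and the choice to model each table row as a tuple, are illustrative assumptions rather than anything prescribed by the lecture.

```python
# Rule 1: no duplicates. A list may repeat a value, a set may not.
numbers = [1, 2, 2, 3]
unique_numbers = set(numbers)   # the repeated 2 is kept only once

# Rule 2: order does not matter.
names_a = {"John", "Sue", "Kila", "Ben"}
names_b = {"Ben", "John", "Sue", "Kila"}
same_set = names_a == names_b

# A table modelled as a set of rows. Each row is a tuple of
# (EventId, EventGender, Distance, Style) taken from the Events table.
events = {
    ("SWM054", "men", 400, "individual medley"),
    ("SWM014", "men", 400, "freestyle"),
    ("SWW054", "women", 400, "individual medley"),
}

# Adding a row that is already present leaves the table unchanged.
events_after = events | {("SWM014", "men", 400, "freestyle")}

print(unique_numbers == {1, 2, 3})
print(same_set)
print(len(events), len(events_after))
```

Using tuples for rows is a convenience here: tuples are hashable, so Python can detect that two rows are exactly the same.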
2.4 Relationships and Relational Maps
• A database is a collection of tables.
But it’s not just an arbitrary collection of tables: there
must be some kind of relationship between the tables,
i.e. they must have something in common.
• E.g., the Countries and Competitors tables each contain a
column named CountryCode.
• This shared name is a good indication that the two
columns both contain the same kind of data, and this is
confirmed by looking at the actual values in each column,
e.g. “AUS”, “CAN” and “NZL”.
• Knowing this allows us to match up information in one
table with another.
[Tables: Countries and Competitors, as shown in section 2.1]
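Matching up the two tables through the shared CountryCode column can be sketched as follows. The rows are copied from the partial tables in section 2.1; the dictionary and tuple layout and the function name country_name_of are assumptions made for this illustration only.

```python
# Countries, keyed by CountryCode (a few rows from section 2.1).
countries = {
    "AUS": "Australia",
    "AUT": "Austria",
    "BAH": "Bahamas",
}

# Competitors as (CompetitorNum, GivenName, FamilyName, CountryCode).
competitors = [
    (210375, "Leisel", "Jones", "AUS"),
    (210413, "Andrew", "Lauterstein", "AUS"),
]


def country_name_of(competitor):
    """Use the competitor's CountryCode to find the matching Countries row."""
    country_code = competitor[3]
    return countries[country_code]


# Combine information from both tables for each competitor.
matched = [
    (given, family, country_name_of((num, given, family, code)))
    for num, given, family, code in competitors
]

for row in matched:
    print(row)
```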
• For any database, having a diagram (relational map)
showing these relationships makes it easier to understand
the data in each table, as well as the overall structure of the
database.
• Note that each relationship is drawn in a special way, with:
– a straight line ( | ) at one end; and
– a crow’s foot (/|\) at the other end.
The straight line marks the “one” end and the crow’s foot marks the “many” end.
This is called a one-to-many relationship, a very
important concept in working with databases.
[Tables: Events and Results, as shown in section 2.1]
• If you go looking for them, you’ll find that one-to-many
relationships are very common in the real world, e.g.
– a university has many campuses;
– a house has many inhabitants;
– a CD has many tracks, and so on.
• Because most databases contain information about things
in the real world, it is not surprising that one-to-many
relationships occur between the rows in database tables.
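A one-to-many relationship between rows can be made visible by grouping the “many” rows under the “one” row they belong to. The sketch below uses a few Results rows from section 2.1; the tuple layout and the name results_by_event are assumptions for illustration.

```python
from collections import defaultdict

# A few Results rows as (EventId, CompetitorNum, Place).
results = [
    ("SWM054", 221565, 1),
    ("SWM054", 232061, 2),
    ("SWM054", 221213, 3),
    ("SWW010", 202658, 1),
    ("SWW010", 222009, 2),
]

# Group the results under their event: one event, many results.
results_by_event = defaultdict(list)
for event_id, competitor_num, place in results:
    results_by_event[event_id].append((competitor_num, place))

for event_id, rows in results_by_event.items():
    print(event_id, "has", len(rows), "result rows")
```

Each result row carries exactly one EventId, while each EventId may appear in many result rows, which is what the line and crow's foot express on the relational map.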
2.5 Domains
• As stated previously (section 2.2), all the values in each
column must be sensible, given the column’s name.
– The range of allowable values for a given column is
called a domain.
• Domains may allow only a small number of values, e.g. a
Gender domain may only allow the values “male” and
“female”.
• Or, a domain may allow a large number of values, e.g. a
StudentNumber domain may allow any number in the
range 1 to 99,999,999.
• The concepts of columns and domains are closely related, but
they are not the same. There are two major differences.
– Firstly, a domain is a set of allowable values, whereas a
column is a set of values that are actually being used (at
any point in time).
– Secondly, the column’s values are actually stored as part of
each table in the database, whereas the domain’s values are
specified by rules, rather than by storing each possible
value.
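The difference between a domain (a rule) and a column (stored values) can be sketched in a few lines. The function names below are hypothetical, and the sample Gender column is taken from the first three Competitors rows in section 2.1.

```python
def in_gender_domain(value):
    """Domain rule: only the values 'male' and 'female' are allowed."""
    return value in {"male", "female"}


def in_student_number_domain(value):
    """Domain rule: any whole number in the range 1 to 99,999,999."""
    is_whole_number = isinstance(value, int) and not isinstance(value, bool)
    return is_whole_number and 1 <= value <= 99_999_999


# A column holds the values actually in use at this point in time.
gender_column = ["female", "male", "female"]
values_in_use = set(gender_column)

# Every stored value must satisfy the domain rule.
column_is_valid = all(in_gender_domain(v) for v in gender_column)

print(values_in_use)
print(column_is_valid)
print(in_student_number_domain(12345678), in_student_number_domain(0))
```

Note that the StudentNumber domain allows almost one hundred million values, yet the rule stores none of them; only the column stores values.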
2.6 Missing data and “null” values: a short introduction
• Sometimes, in the real world, when we come to insert a
new row in a table, the values of one or more columns are
“missing”, i.e. we don’t have values for them.
• Depending on the circumstances, we may want to:
(a) use a default value, e.g. if an applicant for a new credit card doesn't
ask for a specific credit limit, then they get a default limit;
(b) record that a value is not applicable, e.g. some people don't have
middle-names, home phone numbers, etc;
(c) record that the value is unknown to the person entering the data, e.g.
a person may forget to put their date-of-birth on a job application
form.
• Apart from forgetfulness, there can be many reasons why
information is unknown. For example:
(i) it may relate to a future event, e.g. a competitor’s time
in a race before the race has taken place.
(ii) there may be insufficient resources (such as time or
equipment) to obtain the information immediately, e.g.
when a patient is admitted to an emergency-ward, the
cause of their illness may initially be unknown.
• Using a default value -- case (a) -- is not a problem for
databases, as we get a value to put in the database, even if
it is just the default value.
• But when the value is inapplicable or unknown -- cases (b) and (c) --
we don’t really have anything to put in the database.
• To handle this, relational databases allow the special value,
null, to be used to indicate that information is missing.
• Null is not an ordinary value (like 0, 1, 2 or 3), but rather is
used to show the lack of a specific value. For this reason, a
null value cannot be displayed on the screen like an ordinary
value. As a substitute, programs often display things such as:
“null”, “?” or “-”.
• Be aware that null values can be used in all types of columns,
i.e. columns that contain numbers, names, dates, times, etc.
(when allowed by the person who creates the database).
• Using null values has the advantages that:
– it is a clear, uniform way of recording that data is
missing; and
– the DBMS is aware that the data is missing and so can
process it appropriately.
• But, although null values are a significant improvement
over reserved values (ordinary values, such as 0 or “N/A”, that are
set aside to mean “missing”), they introduce new complexities (as
we will see later).
• For this reason, it is best to use null values sparingly, only
allowing them in columns where they are definitely
needed, rather than allowing them in every column.
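As a small taste of how a DBMS treats null differently from ordinary values, the sketch below uses SQLite through Python's standard sqlite3 module. SQLite is an assumed stand-in here, since the lecture does not name a DBMS. The third row pretends, purely for illustration, that one elapsed time has not yet been recorded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Results (EventId TEXT, CompetitorNum INTEGER, ElapsedTime REAL)"
)
conn.executemany(
    "INSERT INTO Results VALUES (?, ?, ?)",
    [
        ("SWM054", 221565, 243.84),
        ("SWM054", 232061, 246.16),
        ("SWM054", 221213, None),  # hypothetical: time unknown, stored as null
    ],
)

# COUNT(*) counts rows; COUNT(column) and AVG(column) skip nulls.
total_rows, known_times, average_time = conn.execute(
    "SELECT COUNT(*), COUNT(ElapsedTime), AVG(ElapsedTime) FROM Results"
).fetchone()

# Comparing with "= NULL" never matches; "IS NULL" is the test for missing data.
equals_null = conn.execute(
    "SELECT COUNT(*) FROM Results WHERE ElapsedTime = NULL"
).fetchone()[0]
is_null = conn.execute(
    "SELECT COUNT(*) FROM Results WHERE ElapsedTime IS NULL"
).fetchone()[0]
conn.close()

print(total_rows, known_times, equals_null, is_null)
```

The DBMS knows the third time is missing, so it leaves it out of the average instead of treating it as zero. The fact that "= NULL" matches nothing is one example of the complexities that nulls introduce.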
2.7 Integrity Constraints
• Primary keys
• Foreign keys
• Referential Integrity

2.7.1 Primary Keys
• As mentioned previously (section 2.2), two rows in a table
cannot be exactly the same, i.e. no duplicate rows.
• To ensure this, each table must have a primary key. A
primary key consists of one or more columns such that no
two rows have the same value in those columns, e.g.

Table Name    Table’s Primary Key
Countries     CountryCode
Competitors   CompetitorNum
Events        EventId
Results       EventId, CompetitorNum
[Tables: Countries, Competitors, Events and Results, as shown in section 2.1]
• To avoid problems, there are actually three properties that
a primary key must have:
1. Uniqueness: the primary key has a different value for
each row.
2. Irreducibility: if a column is removed from the
primary key, then the remaining columns do not have
the uniqueness property.
3. Entity Integrity: columns in a primary key cannot
contain null values.
• Two other important things about primary keys:
1. Being a primary key in one table does not mean that the
same column is a primary key in another table.
– E.g. CountryCode is a primary key in the Countries
table, but is not a primary key in the Competitors
table.
2. The uniqueness property must hold for all current rows
and any possible future rows.
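The three properties can be checked against sample rows, as sketched below. All function names are hypothetical. The last row is a made-up entry, added so that the same competitor appears in two events. Such a check can only show that a candidate key fails on the rows at hand; it cannot prove uniqueness for future rows, which depends on what the data means.

```python
from itertools import combinations


def is_unique(rows, key_columns):
    """True if no two rows share the same values in key_columns."""
    seen = set()
    for row in rows:
        key = tuple(row[column] for column in key_columns)
        if key in seen:
            return False
        seen.add(key)
    return True


def is_irreducible(rows, key_columns):
    """True if removing any one column from the key loses uniqueness."""
    smaller_keys = combinations(key_columns, len(key_columns) - 1)
    return not any(is_unique(rows, smaller) for smaller in smaller_keys)


def has_entity_integrity(rows, key_columns):
    """True if no key column holds a null (None) in any row."""
    return all(row[column] is not None for row in rows for column in key_columns)


results = [
    {"EventId": "SWM054", "CompetitorNum": 221565, "Place": 1},
    {"EventId": "SWM054", "CompetitorNum": 232061, "Place": 2},
    {"EventId": "SWW010", "CompetitorNum": 202658, "Place": 1},
    {"EventId": "SWM014", "CompetitorNum": 221565, "Place": 1},  # hypothetical row
]
key = ("EventId", "CompetitorNum")

print(is_unique(results, key))
print(is_irreducible(results, key))
print(has_entity_integrity(results, key))
```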
– A primary key is called:
• simple, if it consists of a single column;
• composite, if it consists of two or more columns.
Both types occur frequently in real-world databases.
[Tables: Countries and Competitors, as shown in section 2.1]
– A table only ever has one primary key. If there is more
than one possibility, the choice is typically made by:
• preferring narrower columns to wider ones, and
• preferring simple primary keys to composite
primary keys.
– The other possible primary keys are called alternate
keys, and are used much like primary keys, e.g. to
check that duplicates don’t occur. (However, they are
not quite the same as primary keys.)

• Although seeing actual rows helps us to understand the tables,
it is useful to have a more concise notation for listing the
names of the tables and their columns.
• In relational notation, the four tables are:
Countries (CountryCode, CountryName)
Competitors (CompetitorNum, GivenName, FamilyName,
Gender, DateOfBirth, CountryCode)
Events (EventId, EventGender, Distance, Style,
DateHeld, StartTime)
Results (EventId, CompetitorNum, Place, Lane,
ElapsedTime, Note)
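The relational notation maps quite directly onto table definitions in a real DBMS. The sketch below uses SQLite through Python's standard sqlite3 module as an assumed stand-in. The column types are guesses made for illustration, and the primary keys are the ones listed in section 2.7.1. The second insert is a made-up duplicate, used to show the primary key rejecting it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Countries (
    CountryCode TEXT PRIMARY KEY,
    CountryName TEXT
);
CREATE TABLE Competitors (
    CompetitorNum INTEGER PRIMARY KEY,
    GivenName TEXT, FamilyName TEXT, Gender TEXT,
    DateOfBirth TEXT, CountryCode TEXT
);
CREATE TABLE Events (
    EventId TEXT PRIMARY KEY,
    EventGender TEXT, Distance INTEGER, Style TEXT,
    DateHeld TEXT, StartTime TEXT
);
CREATE TABLE Results (
    EventId TEXT, CompetitorNum INTEGER,
    Place INTEGER, Lane INTEGER, ElapsedTime REAL, Note TEXT,
    PRIMARY KEY (EventId, CompetitorNum)  -- composite primary key
);
""")

conn.execute(
    "INSERT INTO Results VALUES ('SWM054', 221565, 1, 4, 243.84, 'World Record')"
)
try:
    # Hypothetical second row with the same EventId and CompetitorNum.
    conn.execute(
        "INSERT INTO Results VALUES ('SWM054', 221565, 2, 5, 246.16, NULL)"
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

table_names = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
print(table_names)
print(duplicate_rejected)
```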
2.7.2 Foreign Keys
• Look at the column names on the relational map for the
Olympics database: all these column names are primary
keys of the tables at the “one” end of the one-to-many
relationships.
• There is a special term for the columns at the “many” end
of these relationships. These are called foreign keys.
• A foreign key is a column (or group of columns) whose
values match the values held in a primary key, e.g.
Table Name    Table’s Foreign Keys   Referenced Primary Key
Countries     none                   none.
Competitors   CountryCode            CountryCode in the Countries table.
Events        none                   none.
Results       (i) EventId;           (i) EventId in the Events table;
              (ii) CompetitorNum     (ii) CompetitorNum in the Competitors table.

• Foreign keys are another vital part of relational databases,
for two major reasons:
1. “They are the glue that holds the database together.”
(Chris Date)
2. They prevent junk from getting into the database, by
allowing “referential integrity” constraints (see next
section), i.e. ensuring that the data in one table is
consistent with the data in another table.
• Three other important things about foreign keys:
1. They can be composite, just like primary keys. (There
are no examples of this in the Olympics database, but
there are some in the lab sessions.)
2. While foreign keys usually have the same column
names as the referenced primary key, this is not a hard
and fast rule. The designer of the database can choose
different names for the foreign key columns (at the
risk of confusing the users of the database).
3. While the foreign key and referenced primary key are
usually in different tables, this is not always the case.
For example, consider the table:

Employees
StaffNumber  StaffName         StaffNumberOfManager
1            Susan Boss        –
2            John Supervisor   1
3            Joe Labourer      2
4            Wendy Worker      2
5            Mary Leader       1
6            Peter Programmer  5
7            Harry Hacker      5

[Figure: the management hierarchy of the staff in the Employees table, and the relational map for Employees with its StaffNumberOfManager relationship]
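Because StaffNumberOfManager refers back to StaffNumber in the same table, a row can be followed upwards to its manager, then to that manager's manager, and so on. The sketch below assumes the “–” shown for Susan Boss stands for a null, represented as None. The dictionary layout and the function name chain_of_command are illustrative choices.

```python
# Employees keyed by StaffNumber: (StaffName, StaffNumberOfManager).
employees = {
    1: ("Susan Boss", None),  # no manager, so the foreign key is null
    2: ("John Supervisor", 1),
    3: ("Joe Labourer", 2),
    4: ("Wendy Worker", 2),
    5: ("Mary Leader", 1),
    6: ("Peter Programmer", 5),
    7: ("Harry Hacker", 5),
}


def chain_of_command(staff_number):
    """Follow StaffNumberOfManager upwards until a row with no manager."""
    names = []
    while staff_number is not None:
        name, manager_number = employees[staff_number]
        names.append(name)
        staff_number = manager_number
    return names


# Every non-null foreign key value must match a StaffNumber in the same table.
all_managers_exist = all(
    manager is None or manager in employees
    for _, manager in employees.values()
)

print(chain_of_command(7))
print(all_managers_exist)
```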
2.7.3 Referential Integrity
• Hand-in-hand with the idea of having foreign keys is the
concept of referential integrity.
• A database has referential integrity if all the foreign key
values match values in the referenced primary keys (or
are possibly null).
• As previously mentioned, referential integrity is important
for preventing junk data entering the database,
– e.g. preventing a Competitor row having “XYZ” as the
CountryCode, because there is no such CountryCode in
the Countries table.
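The “XYZ” example can be tried with SQLite through Python's standard sqlite3 module, used here as an assumed stand-in DBMS. The Competitors table is cut down to three columns for brevity, and the two rows passed to try_insert are made up for the demonstration. Note that SQLite only enforces foreign keys after the PRAGMA shown below is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this check off by default
conn.execute(
    "CREATE TABLE Countries (CountryCode TEXT PRIMARY KEY, CountryName TEXT)"
)
conn.execute("""
    CREATE TABLE Competitors (
        CompetitorNum INTEGER PRIMARY KEY,
        FamilyName TEXT,
        CountryCode TEXT REFERENCES Countries (CountryCode)
    )
""")
conn.execute("INSERT INTO Countries VALUES ('AUS', 'Australia')")
conn.execute("INSERT INTO Competitors VALUES (210375, 'Jones', 'AUS')")


def try_insert(competitor_num, family_name, country_code):
    """Return True if the DBMS accepts the row, False if it is rejected."""
    try:
        conn.execute(
            "INSERT INTO Competitors VALUES (?, ?, ?)",
            (competitor_num, family_name, country_code),
        )
        return True
    except sqlite3.IntegrityError:
        return False


junk_accepted = try_insert(999999, "Nobody", "XYZ")  # no such CountryCode
null_accepted = try_insert(999998, "Unknown", None)  # null foreign key

print(junk_accepted, null_accepted)
```

The row with “XYZ” is refused because no Countries row has that code, while the row with a null CountryCode is accepted, matching the “or are possibly null” part of the definition.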
[Tables: Countries and Competitors, as shown in section 2.1]