CTFS Workshop
Shameema Esufali
Asian data coordinator and technical resource for the network
shameemaesufali@gmail.com
Why relational databases?
Why MySQL?
What about R?
Relational Theory
In order to work with MySQL it is necessary to understand the basics of relational theory, i.e. how and why data is stored and managed in a relational database.
The guiding principle behind a relational database is to store data once and only once.
What is a Relation?
A table. Columns are fields (attributes) of data related to other fields on the same row (tuple).
Primary Key
Identifies the row of a table without duplicates.
Tells you what the row contains
E.g. if treeid is the primary key, then the row holds information about that tree.
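A minimal sketch of how a primary key behaves. It uses Python's built-in sqlite3 module rather than MySQL so that it runs without a server (an assumption of this example, not the workshop setup); the tree table and its tag and species columns are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# treeid is the primary key: it identifies exactly one row.
conn.execute(
    "CREATE TABLE tree (treeid INTEGER PRIMARY KEY, tag TEXT, species TEXT)"
)
conn.execute("INSERT INTO tree VALUES (1, '1234', 'SHORME')")

# A second row with the same treeid is refused by the database.
try:
    conn.execute("INSERT INTO tree VALUES (1, '5678', 'SHORTR')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Looking up by primary key returns the one row about that tree.
row = conn.execute(
    "SELECT tag, species FROM tree WHERE treeid = 1"
).fetchone()
print(duplicate_rejected, row)
```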
Candidate Primary Key
Any attribute(s) which together would serve as the primary key.
Must uniquely identify a row of data.
Each part of the key must be essential to unique identification. No redundancy.
Foreign Key
A foreign key is a column in a table that matches the primary key column of another table. Its function is to link the basic data of two entities on demand, i.e. when two tables are joined using the common key.
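A short sketch of a foreign key at work, again using Python's sqlite3 module as a stand-in for MySQL (an assumption made so the example runs anywhere). The genus and species tables follow the names used later in this workshop; the GenusID value 99 is a made-up value with no parent row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE genus (genusid INTEGER PRIMARY KEY, genus TEXT)")
conn.execute(
    """CREATE TABLE species (
           speciesid INTEGER PRIMARY KEY,
           species TEXT,
           genusid INTEGER REFERENCES genus(genusid)
       )"""
)
conn.execute("INSERT INTO genus VALUES (1, 'Acacia')")
# genusid 1 exists in the parent table, so this row is accepted.
conn.execute("INSERT INTO species VALUES (1, 'melanoceras', 1)")

# genusid 99 has no matching primary key in genus, so it is refused.
try:
    conn.execute("INSERT INTO species VALUES (2, 'triloba', 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
print(orphan_rejected)
```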
First Normal Form
One piece of information per column. No repeated rows. Eliminate fused data, e.g. Code1,Code2.
Wrong!
Tag   Species  Code
1234  SHORME   A, BA
Right
Tag   Species  Code
1234  SHORME   A
1234  SHORME   BA
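The First Normal Form fix for tag 1234 can be sketched in a few lines of plain Python: each fused code becomes its own row. The variable names are illustrative only.

```python
# One row with two codes fused into a single column (not in First Normal Form).
fused_rows = [("1234", "SHORME", "A, BA")]

# Split the fused column so that every row carries exactly one code.
normalized = [
    (tag, species, code.strip())
    for tag, species, codes in fused_rows
    for code in codes.split(",")
]
print(normalized)
```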
Second Normal Form
Each column depends on the entire primary key.
Wrong
Tag   Census  Species  Seedsize  X     Y     DBH
1234  1       SHORTR   Medium    11.3  15.4  12
Right
Tag   Species  Seedsize  X     Y
1234  SHORTR   Medium    11.3  15.4
Third Normal Form
Each column depends ONLY on the primary key. i.e. there are no transitive dependencies
Wrong
Tag   Species  Seedsize  X     Y
1234  SHORTR   Medium    11.3  15.4
Right
Tag   Species  X     Y
1234  SHORTR   11.3  15.4
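One way to see why the transitive dependency matters: Seedsize depends on Species, not on Tag. The sketch below assumes Seedsize is moved to a separate species table (the slide does not show where it goes), uses sqlite3 in place of MySQL, and adds a hypothetical second tree, tag 1235.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed layout: seed size is stored once, in a table keyed by species.
conn.execute("CREATE TABLE species (species TEXT PRIMARY KEY, seedsize TEXT)")
conn.execute(
    """CREATE TABLE tree (
           tag TEXT PRIMARY KEY,
           species TEXT REFERENCES species(species),
           x REAL,
           y REAL
       )"""
)
conn.execute("INSERT INTO species VALUES ('SHORTR', 'Medium')")
conn.executemany(
    "INSERT INTO tree VALUES (?, ?, ?, ?)",
    [("1234", "SHORTR", 11.3, 15.4), ("1235", "SHORTR", 12.0, 16.1)],
)

# Correcting the seed size is a single update, because it is stored only once.
conn.execute("UPDATE species SET seedsize = 'Large' WHERE species = 'SHORTR'")

rows = conn.execute(
    """SELECT t.tag, s.seedsize
       FROM tree t JOIN species s ON s.species = t.species
       ORDER BY t.tag"""
).fetchall()
print(rows)
```

Every tree of that species picks up the corrected value through the join, so no row can be left holding a stale seed size.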
Fourth Normal Form
The table must contain no more than one multi-valued dependency
Tag   DBH  Code
1234  10   A
1234  11   A
1234  11   BA
Entity Relationship diagram (ERD)
Shows in a diagram how entities (tables) are related to one another.
One to One
One to many
Many to many
One to one
Extension of number of attributes in a single table
Rarely required
[Diagram: Tree linked to More tree attributes]
One to Many
Most common
Requires two tables, linked by a Foreign Key
[Diagram: Parent (Family, Genus) linked to Child (Species)]
Many to many
Need to break down to one to many
Requires three tables
Associative table provides common key
[Diagram labels: Measurement, Tree, Code]
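A sketch of the three-table layout, assuming a measurement table, a code table and an associative table named measurement_code (all table and column names here are illustrative, and sqlite3 stands in for MySQL). The data are the DBH and code values shown for tag 1234 under Fourth Normal Form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE measurement (
        measurementid INTEGER PRIMARY KEY, tag TEXT, dbh INTEGER);
    CREATE TABLE code (codeid INTEGER PRIMARY KEY, code TEXT);
    -- Associative table: one row per (measurement, code) pairing.
    CREATE TABLE measurement_code (
        measurementid INTEGER REFERENCES measurement(measurementid),
        codeid INTEGER REFERENCES code(codeid),
        PRIMARY KEY (measurementid, codeid));
    INSERT INTO measurement VALUES (1, '1234', 10), (2, '1234', 11);
    INSERT INTO code VALUES (1, 'A'), (2, 'BA');
    INSERT INTO measurement_code VALUES (1, 1), (2, 1), (2, 2);
    """
)

# Two one-to-many joins through the associative table rebuild the pairs.
rows = conn.execute(
    """SELECT m.dbh, c.code
       FROM measurement m
       JOIN measurement_code mc ON mc.measurementid = m.measurementid
       JOIN code c ON c.codeid = mc.codeid
       ORDER BY m.dbh, c.code"""
).fetchall()
print(rows)
```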
Reassembling data
Data was broken down into tables to preserve integrity
How can we put it together to derive information?
Use Structured Query Language (SQL) to JOIN tables using a common attribute
Two tables may be joined when they share at least one common attribute
Joins
Genus (Parent table; GenusID is the Primary Key in Parent)
GenusID  Genus      FamilyID
1        Acacia     4
2        Acalypha   3
3        Adelia     3
4        Aegiphila  3
5        Alchornea  3
The Primary key of the Parent table is stored in the Child table as a cross reference. This is called a Foreign Key.
Species (Child table; GenusId is the Foreign Key in Child)
SpeciesID  Species        GenusId
1          melanoceras    1
2          diversifolia   2
3          macrostachya   2
4          triloba        3
5          panamensis     4
6          costaricensis  5
7          latifolia      5
Table joined on Foreign Key
SpeciesID  Species        GenusId  ⇿  GenusID  Genus      FamilyID
1          melanoceras    1        ⇿  1        Acacia     4
2          diversifolia   2        ⇿  2        Acalypha   3
3          macrostachya   2        ⇿  2        Acalypha   3
4          triloba        3        ⇿  3        Adelia     3
5          panamensis     4        ⇿  4        Aegiphila  3
6          costaricensis  5        ⇿  5        Alchornea  3
7          latifolia      5        ⇿  5        Alchornea  3
The GenusID in the Species table is used to pick up the information for the corresponding Genus. The join looks for the Genus row with the matching Primary Key.
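The join can be sketched as a single SQL statement. This example loads the Genus and Species values from the slides into an in-memory SQLite database (used here instead of MySQL so that it runs without a server; the lower-case table and column names are this sketch's choice).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE genus (genusid INTEGER PRIMARY KEY, genus TEXT, familyid INTEGER)"
)
conn.execute(
    """CREATE TABLE species (
           speciesid INTEGER PRIMARY KEY,
           species TEXT,
           genusid INTEGER REFERENCES genus(genusid)
       )"""
)
conn.executemany(
    "INSERT INTO genus VALUES (?, ?, ?)",
    [(1, "Acacia", 4), (2, "Acalypha", 3), (3, "Adelia", 3),
     (4, "Aegiphila", 3), (5, "Alchornea", 3)],
)
conn.executemany(
    "INSERT INTO species VALUES (?, ?, ?)",
    [(1, "melanoceras", 1), (2, "diversifolia", 2), (3, "macrostachya", 2),
     (4, "triloba", 3), (5, "panamensis", 4), (6, "costaricensis", 5),
     (7, "latifolia", 5)],
)

# For each species row, pick up the genus row whose primary key
# matches the foreign key stored in species.genusid.
joined = conn.execute(
    """SELECT s.speciesid, g.genus, s.species, g.familyid
       FROM species s
       JOIN genus g ON g.genusid = s.genusid
       ORDER BY s.speciesid"""
).fetchall()
for line in joined:
    print(line)
```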
Extend to join many tables
With SQL you can join as many tables as you need in order to get the set of information you want. Thus the previous example can be extended to include Family, which is a parent table of Genus, and/or extended in the other direction to include Tree, which is a child of Species, as long as there is a linking attribute.
This attribute is called a Foreign Key.
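A sketch of the extended join across Family, Genus, Species and Tree. Several details are assumptions made for illustration: sqlite3 stands in for MySQL, the family and tree table layouts are guessed, FamilyID 4 is given the name Fabaceae (the family Acacia belongs to; the slides do not name the families), and tags 1234 and 1235 are hypothetical trees.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE family (familyid INTEGER PRIMARY KEY, family TEXT);
    CREATE TABLE genus (
        genusid INTEGER PRIMARY KEY, genus TEXT,
        familyid INTEGER REFERENCES family(familyid));
    CREATE TABLE species (
        speciesid INTEGER PRIMARY KEY, species TEXT,
        genusid INTEGER REFERENCES genus(genusid));
    CREATE TABLE tree (
        treeid INTEGER PRIMARY KEY, tag TEXT,
        speciesid INTEGER REFERENCES species(speciesid));
    INSERT INTO family VALUES (4, 'Fabaceae');
    INSERT INTO genus VALUES (1, 'Acacia', 4);
    INSERT INTO species VALUES (1, 'melanoceras', 1);
    INSERT INTO tree VALUES (1, '1234', 1), (2, '1235', 1);
    """
)

# Each JOIN follows one foreign key from child to parent:
# tree -> species -> genus -> family.
rows = conn.execute(
    """SELECT t.tag, f.family, g.genus, s.species
       FROM tree t
       JOIN species s ON s.speciesid = t.speciesid
       JOIN genus g ON g.genusid = s.genusid
       JOIN family f ON f.familyid = g.familyid
       ORDER BY t.tag"""
).fetchall()
print(rows)
```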