Zoology 955, Spring 2008, 2/16/2016
Creating a Database
Luke Winslow
Resources
Database tutorial:
http://www.webmonkey.com/webmonkey/backend/databases/tutorials/tutorial3.html
SQL tutorial:
http://w3schools.com/sql/default.asp
http://www.sql-tutorial.net/SQL-tutorial.asp
Microsoft Access Alternatives:
http://www.openoffice.org/product/base.html
http://www.koffice.org/kexi/
Concepts
1. Files, tables, fields, records
2. Normalization
3. Relationships and joins
4. Queries, views, forms, reports
5. SQL
6. Indexing
Terminology
Table - A collection of data organized along columns and rows. Usually contains any number of
rows but a fixed number of columns.
Record (row) - Single item in a table. Made up of one or more fields.
Field - A space allocated for a single piece of information in a row.
Database - A collection of tables. In Access, a database is stored in a single file. This is not true
for all database systems.
Query - Code to retrieve a certain set of data from a database.
Relationship - Links keys together. Generally, relates a set of rows to another set of rows.
Normalization - Process of reducing redundancy to reduce size and prevent data anomalies.
SQL - Structured Query Language. The standard for interacting with data stored in most
relational database systems.
Index - A data structure used to speed up access to table data. Indexes can also be used to enforce
uniqueness.
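The terms above can be seen together in a short sketch. It uses Python's built-in sqlite3 module rather than Access, and the lake table, its columns, and the depth values are made up for illustration only.

```python
import sqlite3

# Database: here an in-memory SQLite database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Table: a fixed set of columns (fields) and any number of rows (records).
conn.execute(
    "CREATE TABLE lake ("
    " lake_id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " max_depth_m REAL)"
)

# Index: speeds up lookups by name and, being UNIQUE, enforces uniqueness.
conn.execute("CREATE UNIQUE INDEX idx_lake_name ON lake (name)")

# Records: each tuple becomes one row (hypothetical values).
conn.executemany(
    "INSERT INTO lake (name, max_depth_m) VALUES (?, ?)",
    [("Lake A", 25.0), ("Lake B", 35.0), ("Lake C", 20.0)],
)

# Query: SQL that retrieves a certain set of data.
deep_lakes = conn.execute(
    "SELECT name FROM lake WHERE max_depth_m > ? ORDER BY name", (22.0,)
).fetchall()

# A single record is returned as a tuple of its fields.
first_record = conn.execute(
    "SELECT lake_id, name, max_depth_m FROM lake WHERE name = ?", ("Lake A",)
).fetchone()

# The unique index rejects a second row with the same name.
try:
    conn.execute("INSERT INTO lake (name, max_depth_m) VALUES (?, ?)", ("Lake A", 1.0))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

Other database systems use the same SQL ideas, though type names and index syntax may differ slightly.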
Figure - Database software, database file, tables, fields, and records.
Process
There are many different ways to organize a dataset into a useful database structure. No
structure is 100% right or wrong, but there are good practices that can help you avoid
common pitfalls.
Below is a simple outline of one possible process for creating a database structure to store your
data.
1. Decide what data you want the database to hold
a. It is often easiest to design a well-structured database if you can identify most of
the data storage requirements up front. Changes to the requirements tend to slow
down and complicate the process.
2. Identify logical entities
a. Mapping tables onto physical and obvious logical entities can make your data
model easier to understand and use.
b. Using highly specific entities can add complexity with little or no functionality
gain. For example, you could define PI, Grad Student, and Undergrad all as
separate entities, but for most systems, Researcher would probably be sufficient.
3. Identify required attributes
a. Identify the attributes you’d like to associate with each entity. Attributes can
usually be easily added later with simple column additions if you have good
entities selected, so don’t spend too much time on this point. Remember, you can
make some attributes optional if you want more flexibility.
4. Identify relationships
a. Determine how your entities fit together. This is most likely an easy step as a
decent mental model is already required in defining entities. This is a good point
to choose what types of relationships to use.
5. Normalize if required
a. Try to eliminate redundancy. This can make the data model easier to update,
smaller to store, and more flexible, but may make data querying and insertion
more complex. Not storing derived values eliminates redundancy and eases
updates when values change (e.g., store only the date rather than date, year,
month, and day).
6. Denormalize if required
a. Use common sense. If practical issues dictate, reducing the number of linked
tables and adding derived columns may make future queries easier.
7. Populate database
a. Import already collected data and create interfaces or forms for entering more
complex data if required.
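The steps above can be sketched with Python's built-in sqlite3 module. This is only one possible layout under stated assumptions: the researcher and sample tables, their columns, the phone attribute, and the sample rows are all hypothetical, and a view is used here as a middle ground that offers derived columns to queries without storing them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Steps 2-3: one general Researcher entity; the role is an attribute,
# and email is optional (NULL is allowed).
conn.execute(
    "CREATE TABLE researcher ("
    " researcher_id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " role TEXT NOT NULL CHECK (role IN ('PI', 'Grad Student', 'Undergrad')),"
    " email TEXT)"
)

# Steps 4-5: each sample links to one researcher and stores only the full
# date (ISO 8601 text); year and month are not stored separately.
conn.execute(
    "CREATE TABLE sample ("
    " sample_id INTEGER PRIMARY KEY,"
    " researcher_id INTEGER NOT NULL REFERENCES researcher (researcher_id),"
    " sampled_on TEXT NOT NULL)"
)

# Step 3, later on: an attribute can be added with a simple column addition.
conn.execute("ALTER TABLE researcher ADD COLUMN phone TEXT")
columns = [row[1] for row in conn.execute("PRAGMA table_info(researcher)")]

# Step 7: populate with already collected data (hypothetical rows).
conn.execute(
    "INSERT INTO researcher (researcher_id, name, role) VALUES (1, 'A. Example', 'PI')"
)
conn.executemany(
    "INSERT INTO sample (researcher_id, sampled_on) VALUES (?, ?)",
    [(1, "2008-02-16"), (1, "2008-03-01")],
)

# Step 6: a view computes year and month at query time, which makes
# queries easier without adding redundant stored columns.
conn.execute(
    "CREATE VIEW sample_by_date AS"
    " SELECT sample_id, sampled_on,"
    " CAST(strftime('%Y', sampled_on) AS INTEGER) AS year,"
    " CAST(strftime('%m', sampled_on) AS INTEGER) AS month"
    " FROM sample"
)
rows = conn.execute(
    "SELECT year, month FROM sample_by_date ORDER BY sample_id"
).fetchall()
```

If queries on year or month turn out to be slow or awkward in practice, storing them as real columns is the denormalized alternative described in step 6.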
Reference/Tips
Common Data types:
- Actual implementations vary between database systems. Please consult database
documentation. For example, Integer is from -2,147,483,648 to 2,147,483,647 in
MySQL but is between -32,768 and 32,767 in Microsoft Access.
1) Text
a. Varchar – variable number of text characters, usually includes a user-defined limit.
b. Char – fixed number of text characters, usually pads spaces onto the end when not
full. (I don’t recommend this)
c. Text, Memo – Very large text fields. Not as efficient to store as Varchar.
2) Decimal numbers
a. Single, Double, Real – Floating points of various precision and range
b. Decimal, Numeric – Numeric values with fixed number of decimal places
3) Integers
a. Tinyint, Integer, Long – Integers of various range
4) Date
a. DateTime – Usually represents a full date/time timestamp. You rarely need to
store year, date, and time individually.
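The differences between these types can be illustrated outside any particular database. The sketch below uses plain Python, assuming the Access Integer is a 16-bit signed value and the MySQL Integer a 32-bit signed value, which matches the ranges quoted above; the helper name signed_range is made up for this example.

```python
from datetime import datetime
from decimal import Decimal


def signed_range(bits):
    """Return (min, max) of a two's-complement signed integer with this many bits."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


# Integer ranges depend on how many bits the database system allocates.
access_integer = signed_range(16)
mysql_integer = signed_range(32)

# Floating point (Single, Double) stores binary approximations,
# so some decimal fractions are not represented exactly.
float_is_exact = (0.1 + 0.2) == 0.3

# Fixed-point decimals keep the decimal places exactly.
decimal_is_exact = (Decimal("0.1") + Decimal("0.2")) == Decimal("0.3")

# One full timestamp already contains year, month, day, and time.
stamp = datetime(2008, 2, 16, 14, 30)
parts = (stamp.year, stamp.month, stamp.day, stamp.hour, stamp.minute)
```

This is why a Decimal type is usually the safer choice for values that must round-trip exactly, and why a single DateTime field is usually enough.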
Access data types to avoid (Access-specific types that are difficult to translate to other systems).
Other, more general types are usually sufficient.
- OLE Object
- Hyperlink
- Currency
Common Relationship Types:
One to One – Used when one entity relates directly to another. Example: one buoy relates
to one lake.
Figure 1 - One-to-one relationship example.
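One common way to express a one-to-one relationship is a foreign key that is also unique. The sketch below uses Python's sqlite3 module; the table and column names and the sample rows are assumptions for illustration, not taken from the figure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked

conn.execute("CREATE TABLE lake (lake_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# UNIQUE on lake_id means a lake can appear in at most one buoy row,
# which turns the foreign key into a one-to-one link.
conn.execute(
    "CREATE TABLE buoy ("
    " buoy_id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " lake_id INTEGER NOT NULL UNIQUE REFERENCES lake (lake_id))"
)

conn.execute("INSERT INTO lake VALUES (1, 'Lake A')")
conn.execute("INSERT INTO buoy VALUES (1, 'Buoy 1', 1)")

# A second buoy on the same lake violates the UNIQUE constraint.
try:
    conn.execute("INSERT INTO buoy VALUES (2, 'Buoy 2', 1)")
    second_buoy_allowed = True
except sqlite3.IntegrityError:
    second_buoy_allowed = False

# A join follows the relationship from buoy to lake.
pair = conn.execute(
    "SELECT buoy.name, lake.name FROM buoy"
    " JOIN lake ON lake.lake_id = buoy.lake_id"
).fetchone()
```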
One to Many – Used when one entity relates to many other entities, or many entities
relate to one entity. Example: one buoy may have many sensors.
Figure 2 - One-to-many relationship example.
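A one-to-many relationship is typically stored as a foreign key on the "many" side. The following sketch, again using Python's sqlite3 module with made-up table names and rows, puts a buoy_id column on each sensor row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked

conn.execute("CREATE TABLE buoy (buoy_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# Each sensor points at exactly one buoy; buoy_id is not UNIQUE here,
# so many sensors may share the same buoy.
conn.execute(
    "CREATE TABLE sensor ("
    " sensor_id INTEGER PRIMARY KEY,"
    " kind TEXT NOT NULL,"
    " buoy_id INTEGER NOT NULL REFERENCES buoy (buoy_id))"
)

conn.executemany("INSERT INTO buoy VALUES (?, ?)", [(1, "Buoy 1"), (2, "Buoy 2")])
conn.executemany(
    "INSERT INTO sensor (kind, buoy_id) VALUES (?, ?)",
    [("temperature", 1), ("oxygen", 1), ("wind", 1)],
)

# A sensor that refers to a buoy that does not exist is rejected.
try:
    conn.execute("INSERT INTO sensor (kind, buoy_id) VALUES ('pH', 99)")
    orphan_allowed = True
except sqlite3.IntegrityError:
    orphan_allowed = False

# LEFT JOIN keeps buoys that have no sensors yet.
sensors_per_buoy = conn.execute(
    "SELECT buoy.name, COUNT(sensor.sensor_id) FROM buoy"
    " LEFT JOIN sensor ON sensor.buoy_id = buoy.buoy_id"
    " GROUP BY buoy.buoy_id ORDER BY buoy.buoy_id"
).fetchall()
```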
Many to Many – Used when multiple entities may relate to multiple other entities. This
cannot be done in most database systems without using a linking table. Example:
buoys and buoy users. Each user may use multiple buoys and each buoy may be
used by multiple users.
Figure 3 - Many-to-many relationship example using third linking table.
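The linking table described above can be sketched as follows with Python's sqlite3 module. The names app_user and buoy_access and all of the rows are hypothetical; the point is that each row of the linking table pairs one buoy with one user.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked

conn.execute("CREATE TABLE buoy (buoy_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("CREATE TABLE app_user (user_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# Linking table: one row per (buoy, user) pair. The composite primary key
# prevents the same pair from being recorded twice.
conn.execute(
    "CREATE TABLE buoy_access ("
    " buoy_id INTEGER NOT NULL REFERENCES buoy (buoy_id),"
    " user_id INTEGER NOT NULL REFERENCES app_user (user_id),"
    " PRIMARY KEY (buoy_id, user_id))"
)

conn.executemany("INSERT INTO buoy VALUES (?, ?)", [(1, "Buoy 1"), (2, "Buoy 2")])
conn.executemany("INSERT INTO app_user VALUES (?, ?)", [(1, "Ann"), (2, "Ben")])
conn.executemany(
    "INSERT INTO buoy_access (buoy_id, user_id) VALUES (?, ?)",
    [(1, 1), (2, 1), (1, 2)],  # Ann uses both buoys; Ben uses Buoy 1
)

# All buoys used by one user (user_id 1, Ann).
buoys_for_ann = [
    row[0]
    for row in conn.execute(
        "SELECT buoy.name FROM buoy"
        " JOIN buoy_access ON buoy_access.buoy_id = buoy.buoy_id"
        " WHERE buoy_access.user_id = ? ORDER BY buoy.name",
        (1,),
    )
]

# All users of one buoy (buoy_id 1).
users_of_buoy_1 = [
    row[0]
    for row in conn.execute(
        "SELECT app_user.name FROM app_user"
        " JOIN buoy_access ON buoy_access.user_id = app_user.user_id"
        " WHERE buoy_access.buoy_id = ? ORDER BY app_user.name",
        (1,),
    )
]
```

Extra attributes of the pairing itself, such as the date access was granted, would also belong on the linking table.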