Zoology 955, Spring 2008, 2/16/2016

Creating a Database
Luke Winslow

Resources

Database tutorial:
   http://www.webmonkey.com/webmonkey/backend/databases/tutorials/tutorial3.html
SQL tutorial:
   http://w3schools.com/sql/default.asp
   http://www.sql-tutorial.net/SQL-tutorial.asp
Microsoft Access Alternatives:
   http://www.openoffice.org/product/base.html
   http://www.koffice.org/kexi/

Concepts

1. Files, tables, fields, records
2. Normalization
3. Relationships and joins
4. Queries, views, forms, reports
5. SQL
6. Indexing

Terminology

Table - A collection of data organized into columns and rows. A table usually has a fixed number of columns but can hold any number of rows.
Record (row) - A single item in a table, made up of one or more fields.
Field - The space allocated for a single piece of information in a row.
Database - A collection of tables. In Access, a database is stored in a single file. This is not true for all database systems.
Query - Code that retrieves a specific set of data from a database.
Relationship - A link between keys. Generally, it relates a set of rows to another set of rows.
Normalization - The process of reducing redundancy to reduce size and prevent data anomalies.
SQL - Structured Query Language. The standard for interacting with data stored in most relational database systems.
Index - A data structure used to speed up access to table data. Indexes can also be used to enforce uniqueness.

[Figure: database software, database file, tables, fields, records]

Process

There are many ways to organize a dataset into a useful database structure, and no structure is 100% right or wrong. There are, however, good practices that help avoid common pitfalls. Below is a simple outline of one potential process for creating a database structure to store your data.

1. Decide what data you want the database to hold
   a. It is often easiest to design a well-structured database if you can identify most of the data storage requirements up front. Requirement changes tend to slow down and complicate the process.
2. Identify logical entities
   a. Mapping tables onto physical and obvious logical entities can make your data model easier to understand and use.
   b. Using highly specific entities can add complexity with little or no gain in functionality. For example, you could define PI, Grad Student, and Undergrad as separate entities, but for most systems, Researcher would probably be sufficient.
3. Identify required attributes
   a. Identify the attributes you’d like to associate with each entity. If you have chosen good entities, attributes can usually be added later with simple column additions, so don’t spend too much time on this point. Remember, you can make some attributes optional if you want more flexibility.
4. Identify relationships
   a. Determine how your entities fit together. This is most likely an easy step, since defining entities already requires a decent mental model. This is a good point to choose what types of relationships to use.
5. Normalize if required
   a. Try to eliminate redundancy. This can make the data model easier to update, smaller to store, and more flexible, but may make data querying and insertion more complex. Not storing derived values eliminates redundancy and eases updates if values change (e.g., storing year, month, and day alongside the date they are derived from is redundant).
6. Denormalize if required
   a. Use common sense. If practical issues dictate, reducing the number of linked tables and adding derived columns may make future queries easier.
7. Populate database
   a. Import already collected data and, if required, create interfaces or forms for entering more complex data.

Reference/Tips

Common Data types:
- Actual implementations vary between database systems. Please consult database documentation.
  For example, Integer ranges from -2,147,483,648 to 2,147,483,647 in MySQL but from -32,768 to 32,767 in Microsoft Access.

1) Text
   a. Varchar – A variable number of text characters, usually with a user-defined limit.
   b. Char – A fixed number of text characters, usually padded with trailing spaces when not full. (I don’t recommend this.)
   c. Text, Memo – Very large text fields. Not as efficient to store as Varchar.
2) Decimal numbers
   a. Single, Double – Floating-point numbers of various precision and range.
   b. Decimal, Numeric – Numeric values with a fixed number of decimal places. (Real, despite its name, is a floating-point type in most systems.)
3) Integers
   a. Tinyint, Integer, Long – Integers of various ranges.
4) Date
   a. DateTime – Usually represents a full date/time timestamp. You rarely need to store year, date, and time individually.

Access data types to avoid (Access-specific types that are difficult to translate to other systems). Other, more general types are usually sufficient.
- OLE Object
- Hyperlink
- Currency

Common Relationship Types:

One to One – Used when one entity relates directly to one other entity. Example: one buoy relates to one lake.

Figure 1 - One-to-one relationship example.

One to Many – Used when one entity relates to many other entities, or many entities relate to one entity. Example: one buoy may have many sensors.

Figure 2 - One-to-many relationship example.

Many to Many – Used when multiple entities may relate to multiple other entities. Most database systems cannot represent this without a linking table. Example: buoys and buoy users. Each user may use multiple buoys, and each buoy may be used by multiple users.

Figure 3 - Many-to-many relationship example using third linking table.
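
The three relationship types can be sketched in a small runnable example. The block below is only an illustration, not the handout's own schema: it uses Python's built-in sqlite3 module rather than Access, and every table name, column name, and data value (lake, buoy, sensor, researcher, buoy_user, "North", "Ana", and so on) is a hypothetical choice made to match the buoy examples above. One common way to express a one-to-one relationship is a foreign key with a UNIQUE constraint, which is the approach assumed here.

```python
import sqlite3

# Hypothetical schema: all table and column names are assumptions chosen to
# mirror the buoy examples in the text.
SCHEMA = """
CREATE TABLE lake (
    lake_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL
);
-- One to one: UNIQUE on the foreign key allows at most one buoy per lake.
CREATE TABLE buoy (
    buoy_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    lake_id INTEGER NOT NULL UNIQUE REFERENCES lake(lake_id)
);
-- One to many: many sensor rows may point at the same buoy.
CREATE TABLE sensor (
    sensor_id INTEGER PRIMARY KEY,
    kind      TEXT NOT NULL,
    buoy_id   INTEGER NOT NULL REFERENCES buoy(buoy_id)
);
CREATE TABLE researcher (
    researcher_id INTEGER PRIMARY KEY,
    name          TEXT NOT NULL
);
-- Many to many: the linking table holds one row per (buoy, researcher) pair.
CREATE TABLE buoy_user (
    buoy_id       INTEGER NOT NULL REFERENCES buoy(buoy_id),
    researcher_id INTEGER NOT NULL REFERENCES researcher(researcher_id),
    PRIMARY KEY (buoy_id, researcher_id)
);
"""


def build_example_db():
    """Create an in-memory database populated with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO lake VALUES (?, ?)",
                     [(1, "Lake A"), (2, "Lake B")])
    conn.executemany("INSERT INTO buoy VALUES (?, ?, ?)",
                     [(1, "North", 1), (2, "South", 2)])
    conn.executemany("INSERT INTO sensor VALUES (?, ?, ?)",
                     [(1, "temperature", 1), (2, "oxygen", 1),
                      (3, "temperature", 2)])
    conn.executemany("INSERT INTO researcher VALUES (?, ?)",
                     [(1, "Ana"), (2, "Ben")])
    conn.executemany("INSERT INTO buoy_user VALUES (?, ?)",
                     [(1, 1), (1, 2), (2, 1)])
    conn.commit()
    return conn


def sensors_on_buoy(conn, buoy_name):
    """One-to-many lookup: all sensor kinds attached to one buoy."""
    rows = conn.execute(
        "SELECT sensor.kind FROM sensor "
        "JOIN buoy ON buoy.buoy_id = sensor.buoy_id "
        "WHERE buoy.name = ? ORDER BY sensor.kind",
        (buoy_name,),
    ).fetchall()
    return [kind for (kind,) in rows]


def users_of_buoy(conn, buoy_name):
    """Many-to-many lookup: join through the linking table."""
    rows = conn.execute(
        "SELECT researcher.name FROM researcher "
        "JOIN buoy_user ON buoy_user.researcher_id = researcher.researcher_id "
        "JOIN buoy ON buoy.buoy_id = buoy_user.buoy_id "
        "WHERE buoy.name = ? ORDER BY researcher.name",
        (buoy_name,),
    ).fetchall()
    return [name for (name,) in rows]
```

In this sketch, a many-to-many query always needs two joins (researcher to buoy_user, then buoy_user to buoy), which illustrates the extra query complexity that a linking table introduces. Trying to insert a second buoy for a lake that already has one should fail with an integrity error because of the UNIQUE constraint.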