Database Design for the Non-Technical Researcher
University of Illinois Library
Eric Johnson - 2012

Goals for today
• Learn how databases are structured
• Learn to organize data into tables
• Learn basic SQL
• Use Access to graphically create queries that can be used in any database

Limits of this class
• We will cover only the basics
• The focus is on storing and retrieving data for your thesis
• While not a software class, we will use Access for some examples
• Queries (SQL) created in Access can be used in other databases (with a bit of editing)

Prerequisites for this class
• Know how to download files using a web browser
• Know how to navigate to the downloaded file and open it

Introductions
• Name
• Department
• A sentence about why you are here

Ways to record data
• A big notebook (or several) (image: http://www.flickr.com/photos/urbansheep/5124116/)
• Index cards, edge notched for sorting (image: http://www.guardian.co.uk/technology/blog/2008/jun/18/whenkeepingrecordshadaned)
• Spreadsheets
• Word processor files
• Camera images
• Etc.

Plan for data collection
Being organized helps you find the information you collected
• For citations: use a bibliography manager database (Zotero, Endnote, Mendeley, Refworks, etc.)
• For concepts: keep an organized journal
• For data: use spreadsheets or a database
• Data can be text, numbers, or times/dates

Why a spreadsheet or database?
• Organized to help you find the facts you collected
• Easy to sort
• Ready for statistical analysis
• Software can make maps from the data

Spreadsheet vs. database
• Spreadsheets are sortable, but cannot answer queries
  – Useful for small amounts of information
• Flat file databases are like spreadsheets with the ability to answer queries
• Relational databases connect the spreadsheet-like tables together
• As you get more information, a database will be able to sort through it quickly

Some Tools
• Spreadsheets: Microsoft Excel, OpenOffice, AppleWorks, Numbers (Mac)
• Databases: Access, MySQL, PostgreSQL, Oracle (enterprise level), AppleWorks
• Geographic databases: ArcMap, PostGIS
• Statistics: SPSS, STATA, SAS

Getting the Tools
• Webstore: Microsoft Office (Excel and Access), ArcMap
• Online, free: MySQL, PostgreSQL, PostGIS
• At most campus computer labs: SPSS, SAS, STATA (ATLAS & ACES)

Design Considerations
• What is the purpose of your data?
• What questions might you want to ask?
• How much data will you have?
• How complex are the data relationships?

Spreadsheet Table
• Each table or spreadsheet page is a major type of information or topic
• The table topic is usually a noun like “people” or “place”
• The table collects data about the topic
• Spreadsheet pages can be converted to database tables and vice versa
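The conversion from a spreadsheet page to a database table can be sketched in a few lines. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module rather than Access, and the PLACE table, its column names, and its three sample rows are hypothetical stand-ins for a page of your own data.

```python
import csv
import io
import sqlite3

# A spreadsheet page saved as CSV text. The table name PLACE, the column
# names, and these rows are hypothetical examples, not workshop data.
SHEET = """PlaceName,Region,Country
Urbana,Illinois,USA
Haworth,West Yorkshire,England
Hannibal,Missouri,USA
"""


def load_sheet(csv_text):
    """Copy the spreadsheet rows into an in-memory database table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE PLACE (PlaceName TEXT, Region TEXT, Country TEXT)"
    )
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [(r["PlaceName"], r["Region"], r["Country"]) for r in reader]
    conn.executemany("INSERT INTO PLACE VALUES (?, ?, ?)", rows)
    return conn


def places_in(conn, country):
    """Ask a question of the table: which places are in this country?"""
    cur = conn.execute(
        "SELECT PlaceName FROM PLACE WHERE Country = ? ORDER BY PlaceName",
        (country,),
    )
    return [row[0] for row in cur]


conn = load_sheet(SHEET)
print(places_in(conn, "USA"))
```

The spreadsheet page maps directly onto one table (one topic, one row per item), and once the rows are in a table, a query can both filter and sort them in one step.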
Columns and database Fields
• Each column or field is a topic detail
• BOOK table has columns for Title, AuthorID, and NumberOfPages
• AUTHOR table has columns for FullName, BirthDate, and HomeTown
• [Figures: picture of the BOOK table; picture of the AUTHOR table]

Connecting Columns
• Each table has an index column
• Every row will have an index value
• The index can be a number or words
• The index value isn’t repeated within a table
• The BOOK table uses AuthorID to point to rows in the AUTHOR table
• AUTHOR table has an index or “Key” column for linking to other tables
• MOVIE table points to AUTHOR and BOOK
• Advantages:
  – The complete author information doesn’t need to be written down for each book
  – Easy to find information because it is categorized

More columns
• Include the various information you collect
• A column could list the original source or citation
• A column could tell you where in your notes you put additional information (not everything can fit in a database)

Relationships
• Tables will have relationships with each other
• Relationships are usually verbs
• AUTHOR wrote BOOK
• BOOK was made into a MOVIE

Entity-Relationship (E-R) Diagram
• Can help you understand and clarify how your data is connected (image: http://www.flickr.com/photos/nomanson/6156454223/)

Information is in only one place
• Avoid redundancy. Don’t put the same data in multiple places.
  – You don’t have to repeat data entry
  – Takes less space
  – Easier to read
  – Reduces typing errors
  – Easier to correct

Use Meaningful names
• File, folder, table, and column names should each tell you what they contain
• Make it easy for you to find the data in the future
• CamelCase and underscored_words can be used instead of spaces

Decide on a naming scheme
• Pick some method that will work for you
• Be consistent in using it
• Write down your scheme so you will remember how to use it later

Names
• What does SFAP1900 mean?
• Science Fiction Authors Pre-1900
• Perhaps SciFiAuthPre1900 would be better

Only 1 Thing
• When designing a survey, each question should ask only one thing
• Each cell in a spreadsheet or database table should also contain only one thing
• Make 2 rows if you have 2 things
• This keeps the data sortable and findable
• [Figures: a “Bad” example and a “Good” example of one thing per cell]

SQL
• The language of databases
• Used to collect your desired subset of the information
• Can do math like counting and finding averages

SQL words
• Required: SELECT, FROM, ;
• Optional: WHERE, GROUP BY, HAVING, ORDER BY, CREATE

Tables for the next section
• Table names are capitalized
• Fields are CamelCase

What is your question?
• List Mark Twain’s books and his full name

SELECT
• Which columns do you want to view?
• SELECT FullName, BookTitle (not yet a complete statement; FROM comes next)
• Every SQL statement ends with ;

FROM
• Tells the database from which tables to select the columns
• SELECT FullName, BookTitle FROM AUTHOR, BOOK;
• (At this point we will get every possible combination of rows from the two tables)

JOIN
• Limits the possible combinations to what is useful
• Indicates the columns that connect the tables
• SELECT FullName, BookTitle FROM AUTHOR JOIN BOOK ON AUTHOR.AuthorID = BOOK.AuthorID;

WHERE
• Used to select the rows in which you have interest
• SELECT FullName, BookTitle FROM AUTHOR JOIN BOOK ON AUTHOR.AuthorID = BOOK.AuthorID WHERE AUTHOR.ShortName = 'Mark Twain';

GROUP BY
• Used to collapse duplicate rows that appear once some of the columns are left out
• Every selected column that is not a calculation should be listed in GROUP BY
• SELECT FullName, BookTitle FROM AUTHOR JOIN BOOK ON AUTHOR.AuthorID = BOOK.AuthorID WHERE AUTHOR.ShortName = 'Mark Twain' GROUP BY FullName, BookTitle;

HAVING
• Similar to WHERE, but happens after grouping
• SELECT FullName, BookTitle FROM AUTHOR JOIN BOOK ON AUTHOR.AuthorID = BOOK.AuthorID GROUP BY AUTHOR.ShortName, FullName, BookTitle HAVING AUTHOR.ShortName = 'Mark Twain';

ORDER BY
• Used to sort the result rows by one or more columns
• SELECT FullName, BookTitle FROM AUTHOR JOIN BOOK ON AUTHOR.AuthorID = BOOK.AuthorID GROUP BY AUTHOR.ShortName, FullName, BookTitle HAVING AUTHOR.ShortName = 'Mark Twain' ORDER BY BookTitle;

CREATE TABLE … AS
• Saves the results into a new table
• CREATE TABLE TWAIN AS SELECT FullName, BookTitle FROM AUTHOR JOIN BOOK ON AUTHOR.AuthorID = BOOK.AuthorID GROUP BY AUTHOR.ShortName, FullName, BookTitle HAVING AUTHOR.ShortName = 'Mark Twain' ORDER BY BookTitle;

The Procedure
1. Decide what information you want
2. Pick the columns you need
3. Remove unwanted rows
4. Group the information
5. Do any math needed
6. Remove unwanted rows again
7. Sort the results

Get a database file
• Download the database file: Workshop-database.accdb
• Open the database file in Access
• You may need to click “enable content”

Access layout
• Tables are listed on the left
• Queries will be below the tables
• The right is for displaying data and designing queries

Look at a table
• Double click on the AUTHOR table
• It opens up and shows data
• “AuthorID” is an index column that helps a database keep track of data
• Each row is an item (a person)
• Each column is a characteristic of that person

Table Fields
• Click on View under the Home tab in the upper left and switch to Design view
• Each Field Name is listed
• The type of data in each field is specified (text, number, date/time, etc.)
• At the bottom, see if the selected field is “indexed”
• In big databases, indexing helps queries run faster

Quiz
• In the AUTHOR table, is AuthorID a text or number field?
• Is AuthorID indexed?
• Is AuthorID a key? (Keys are used to automatically link tables)

Designing a Query: What is Mark Twain’s home town?
• Close the AUTHOR table (right click on the AUTHOR tab and select Close)
• Click on the “Create” tab
• Click on “Query Design”
• In the Show Table dialog, select the AUTHOR table
• Then select Add, then Close
• Drag “ShortName” from the field list to a column in the grid below
• In the “Criteria” line under ShortName, type Mark Twain
• Drag “HomeTown” to the next column
• Select the red “Run” exclamation point in the upper left
• The result is a table that lists only the information you wanted

Designing a Query: What is Mark Twain’s birth date?
• In the upper left, click “View” to return to the query design view
• Drag BirthDate to the grid
• Delete the HomeTown column, in one of 2 ways:
  – In the grid, click just above “HomeTown” (the column will turn black), then press the Delete key
  – Or select “Delete Columns” at the top, in the Query Setup part of the ribbon
• Run the query

Designing a Query: What is Emily Bronte’s birth date?
• Return to query design mode by clicking the View button
• Change Mark Twain to Emily Bronte
• Run the query

Save and close query
• Right click on the tab for Query1 and select Close
• Select Yes, you want to save changes
• Give the query a meaningful name so you can find it later

Save Query
• Right click on the Query1 tab and select Save
• Give the query a name so you can find it later

Designing a Query: How many pages are in each book?
• In the Create tab select Query Design
• Select the BOOK table, Add it, and close the Show Table window
• Drag BookTitle and NumberOfPages to the design grid
• Run the query
• Notice that the result is not sorted

Designing a Query: How many pages are in each book, sorted by number of pages?
• Back in design view, under NumberOfPages, in the Sort row select Ascending
• Run the query
• The result is now sorted

Multiple tables
• Each query can draw from multiple tables
• Drag the AUTHOR table from the left to the field list workspace
• Notice that the AuthorIDs are linked
• The AuthorIDs were automatically linked because AuthorID was set as a key
• You can also link fields by dragging a field from one list to another
• The links reduce the results to only the items that match in that column in both tables

Designing a Query: What are Emily Bronte’s books?
• Drag ShortName to the grid
• Delete NumberOfPages from the grid
• Set the criteria for ShortName to Emily Bronte
• Run the query

Designing a Query: How many of Emily’s books are in the database?
• In design view, select the “Totals” sigma in the upper right
• In the grid “Total:” row under BookTitle, change the entry from “Group By” to “Count”
• Run the query

Designing a Query: Show books made into movies
• Under BookTitle, reset Count to Group By
• Add the MOVIE table to the field list
• Delete ShortName from the grid
• Drag MovieTitle to the grid
• Run the query (it will be wrong)
• What went wrong? The query selected every combination of BookTitle and MovieTitle
• To fix this, in design view, drag the BookID from BOOK to the BookID in MOVIE
• Run the query (this time it will be right)

Designing a Query: Show Author, Movie title and Release year, sorted by year
• Remove the BOOK table from the design space (right click on it and select Remove Table)
• Make the grid have ShortName, MovieTitle and ReleaseYear
• ReleaseYear should be sorted Ascending
• Run the query

Make a table: Show Author, Movie title and Release year, sorted by year
• The grid is the same as before
• In the upper left, select “Make Table” and name it “MOVIE_DATES”
• Run the query. Let Access paste rows
• Open the new table to see if it is what you expected

Look at the SQL
• Close the MOVIE_DATES table (right click on the tab and select Close)
• In the Query design for Query1, at the lower right, select “SQL”
• OR in the upper left under View, select “SQL View”
• This shows the SQL statement. You can copy, paste, and change this to make new queries.

Questions?

Survey
• What other database skills would you like to learn?
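Try it outside Access
For anyone who wants to run the class queries without Access, the exercises above can be sketched as a short script. This is a rough sketch, not the workshop's own code: it assumes SQLite through Python's built-in sqlite3 module, the table layouts are guessed from the column names used in the slides, and the rows are a small hand-typed sample rather than the contents of Workshop-database.accdb. The author filter uses ShortName, as the Access exercises do.

```python
import sqlite3


def build_db():
    """Create AUTHOR, BOOK and MOVIE tables with a few hand-typed sample rows.

    The layouts are assumptions based on the column names in the slides.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE AUTHOR (AuthorID INTEGER PRIMARY KEY, ShortName TEXT,
                             FullName TEXT, HomeTown TEXT);
        CREATE TABLE BOOK   (BookID INTEGER PRIMARY KEY, BookTitle TEXT,
                             AuthorID INTEGER);
        CREATE TABLE MOVIE  (MovieTitle TEXT, ReleaseYear INTEGER,
                             BookID INTEGER, AuthorID INTEGER);
        INSERT INTO AUTHOR VALUES
            (1, 'Mark Twain', 'Samuel Langhorne Clemens', 'Hannibal'),
            (2, 'Emily Bronte', 'Emily Jane Bronte', 'Haworth');
        INSERT INTO BOOK VALUES
            (1, 'The Adventures of Tom Sawyer', 1),
            (2, 'Adventures of Huckleberry Finn', 1),
            (3, 'Wuthering Heights', 2);
        INSERT INTO MOVIE VALUES
            ('The Adventures of Tom Sawyer', 1938, 1, 1),
            ('Wuthering Heights', 1939, 3, 2);
    """)
    return conn


def books_by(conn, short_name):
    """JOIN + WHERE + ORDER BY: an author's full name and book titles."""
    return conn.execute(
        "SELECT FullName, BookTitle "
        "FROM AUTHOR JOIN BOOK ON AUTHOR.AuthorID = BOOK.AuthorID "
        "WHERE AUTHOR.ShortName = ? ORDER BY BookTitle;",
        (short_name,),
    ).fetchall()


def book_count(conn, short_name):
    """GROUP BY + COUNT + HAVING: how many books an author has."""
    return conn.execute(
        "SELECT ShortName, COUNT(BookTitle) "
        "FROM AUTHOR JOIN BOOK ON AUTHOR.AuthorID = BOOK.AuthorID "
        "GROUP BY ShortName HAVING ShortName = ?;",
        (short_name,),
    ).fetchall()


def movies_by_year(conn):
    """Author, movie title and release year, sorted by year."""
    return conn.execute(
        "SELECT ShortName, MovieTitle, ReleaseYear "
        "FROM AUTHOR JOIN MOVIE ON AUTHOR.AuthorID = MOVIE.AuthorID "
        "ORDER BY ReleaseYear;"
    ).fetchall()


conn = build_db()
for row in books_by(conn, "Mark Twain"):
    print(row)
print(book_count(conn, "Emily Bronte"))
print(movies_by_year(conn))
```

Each function follows The Procedure: pick the columns, join the tables on their key columns, remove unwanted rows, group and count if needed, then sort.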
For more help…
• Scholarly Commons for one-on-one consultation: http://www.library.illinois.edu/sc/
• ATLAS data services: http://www.atlas.illinois.edu/services/stats/workshops/registration/
• Database classes at University of Illinois: BADM 352, BADM 554, CS 105 (covers other areas also), CS 411, LIS 490DB / LIS 490DBL, STAT 440
• Links to numeric and spatial data: http://www.library.illinois.edu/datagis/