Database Design for the Non-Technical Researcher
University of Illinois Library
Eric Johnson - 2012
Goals for today
• Learn how databases are structured
• Learn to organize data into tables
• Learn basic SQL
• Use Access to graphically create queries that can be used in any database
Limits of this class
• We will cover only the basics
• The focus is on storing and retrieving data for
your thesis
• While not a software class, we will use Access
for some examples
• Queries (SQL) created in Access can be used in
other databases (with a bit of editing)
Prerequisites for this class
• Know how to download files using a web
browser
• Know how to navigate to the downloaded file
and open it
Introductions
• Name
• Department
• A sentence about why you are here
Ways to record data
• A big notebook (or several)
http://www.flickr.com/photos/urbansheep/5124116/
• Index cards (edge notched for sorting)
http://www.guardian.co.uk/technology/blog/2008/jun/18/whenkeepingrecordshadaned
• Spreadsheets
• Word processor files
• Camera images
• Etc.
Plan for data collection
Being organized helps you find the
information you collected
• For Citations – use a bibliography manager database (Zotero, EndNote, Mendeley, RefWorks, etc.)
• For Concepts – keep an organized journal
• For Data – use spreadsheets or a database
• Data can be text, numbers, or times/dates
Why a spreadsheet or database?
• Organized to help you to find the facts
you collected
• Easy to sort
• It is ready for statistical analysis
• Software can make maps from the data
Spreadsheet vs. database
• Spreadsheets are sortable but can’t answer queries. They are useful for small amounts of information
• Flat file databases are like spreadsheets with the ability to answer queries
• Relational databases connect the spreadsheet-like tables together
• As you collect more information, a database will be able to sort through it quickly
Some Tools
• Spreadsheets – Microsoft Excel, OpenOffice, AppleWorks, Apple Numbers
• Databases – Access, MySQL, PostgreSQL, Oracle (enterprise level), AppleWorks
• Geographic Databases – ArcMap, PostGIS
• Statistics – SPSS, Stata, SAS
Getting the Tools
• Webstore: Microsoft Office (Excel and
Access), ArcMap
• Online free: MySQL, PostgreSQL, PostGIS
• At most campus computer labs: SPSS, SAS, Stata (ATLAS & ACES)
Design Considerations
• What is the purpose of your data?
• What questions might you want to ask?
• How much data will you have?
• How complex are the data relationships?
Spreadsheet Table
• Each table or spreadsheet page is a
major type of information or topic
• The table topic is usually a noun like
“people”, or “place”
• The table collects data about the topic
• Spreadsheet pages can be converted to
database tables and vice versa
Columns and database Fields
• Each column or field is a topic detail
• BOOK table has columns for BookTitle,
AuthorID, and NumberOfPages
• AUTHOR table has columns for
FullName, BirthDate, and HomeTown
[Figure: pictures of the BOOK table and the AUTHOR table]
Connecting Columns
• Each table has an Index column
• Every row will have an index value
• The index can be a number or words
• The index value isn’t repeated in a table
Connecting Columns
• The BOOK table uses AuthorID to point to
rows in the AUTHOR table
• AUTHOR table has an index or “Key”
column for linking to other tables
• MOVIE table points to AUTHOR and BOOK
Connecting Columns
• Advantages:
• The complete author information
doesn’t need to be written down for
each book
• Easy to find information because it is
categorized
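As a rough sketch of the linking described above (not part of the original slides), the script below uses Python's built-in sqlite3 module instead of Access. The rows are sample data only, and storing AuthorID as words such as 'Mark Twain' is an assumption made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # temporary database held in memory
conn.executescript("""
CREATE TABLE AUTHOR (
    AuthorID TEXT PRIMARY KEY,  -- index (key) column: values never repeat
    FullName TEXT,
    HomeTown TEXT
);
CREATE TABLE BOOK (
    BookID    INTEGER PRIMARY KEY,
    BookTitle TEXT,
    AuthorID  TEXT REFERENCES AUTHOR(AuthorID)  -- points to a row in AUTHOR
);
INSERT INTO AUTHOR VALUES ('Mark Twain', 'Samuel Langhorne Clemens', 'Hannibal');
INSERT INTO BOOK VALUES (1, 'The Adventures of Tom Sawyer', 'Mark Twain');
INSERT INTO BOOK VALUES (2, 'Adventures of Huckleberry Finn', 'Mark Twain');
""")

# The author's details are stored once and looked up for every book.
rows = conn.execute("""
    SELECT BookTitle, FullName, HomeTown
    FROM BOOK
    JOIN AUTHOR ON AUTHOR.AuthorID = BOOK.AuthorID
    ORDER BY BookID
""").fetchall()
for row in rows:
    print(row)

# A key value cannot be repeated: a second 'Mark Twain' row is rejected.
try:
    conn.execute("INSERT INTO AUTHOR VALUES ('Mark Twain', 'Someone Else', 'Elsewhere')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print("Duplicate key rejected:", duplicate_rejected)
```

Each book row carries only the short AuthorID value; the full name and home town come from the single AUTHOR row it points to.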
More columns
• Include various information you collect
• A column could list the original source or
citation
• A column could tell you where in your
notes you put additional information
(Not everything can fit in a database)
Relationships
• Tables will have relationships with each
other
• Relationships are usually verbs
• AUTHOR wrote BOOK
• BOOK was made into a MOVIE
Entity-Relationship (E-R) Diagram
• Can help you understand and clarify how
your data is connected
http://www.flickr.com/photos/nomanson/6156454223/
Information is in only one place
• Avoid redundancy. Don’t put the same data in multiple places.
• You don’t have to repeat data entry
• Takes less space
• Easier to read
• Reduces typing errors
• Easier to correct
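To illustrate "easier to correct", here is a small sketch using Python's built-in sqlite3 module (an assumption; the workshop itself uses Access). The misspelled home town is sample data entered on purpose.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE AUTHOR (AuthorID TEXT PRIMARY KEY, HomeTown TEXT);
CREATE TABLE BOOK (BookTitle TEXT, AuthorID TEXT);
INSERT INTO AUTHOR VALUES ('Mark Twain', 'Hanibal');  -- typo on purpose
INSERT INTO BOOK VALUES ('The Adventures of Tom Sawyer', 'Mark Twain');
INSERT INTO BOOK VALUES ('Adventures of Huckleberry Finn', 'Mark Twain');
""")

# One correction in the AUTHOR table ...
conn.execute("UPDATE AUTHOR SET HomeTown = 'Hannibal' WHERE AuthorID = 'Mark Twain'")

# ... is seen by every book that points to that author.
rows = conn.execute("""
    SELECT BookTitle, HomeTown
    FROM BOOK
    JOIN AUTHOR ON AUTHOR.AuthorID = BOOK.AuthorID
""").fetchall()
home_towns = {home_town for _, home_town in rows}
print(home_towns)
```

Had the home town been typed into every book row, the same fix would have needed one edit per book.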
Use Meaningful names
• File, Folder, Table and Column names
should each tell you what they contain
• Make it easy for you to find the data in
the future
• CamelCase and underscored_words can
be used instead of spaces
Decide on a naming scheme
• Pick some method that will work for you
• Be consistent in using it
• Write down your scheme so you will
remember how to use it later
Names
• What does SFAP1900 mean?
• Science Fiction Authors Pre-1900
• Perhaps SciFiAuthPre1900 would be
better
Only 1 Thing
• When designing a survey, each question should ask only one thing
• Each cell in a spreadsheet or database table should also contain only one thing
• Make 2 rows if you have 2 things
• This keeps the data sortable and findable
[Figure: "Bad" and "Good" example tables]
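A hedged sketch of why one thing per cell matters, using Python's built-in sqlite3 module. The table names BAD and GOOD and their contents are hypothetical and only stand in for the kind of example the slide describes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE BAD  (AuthorID TEXT, BookTitles TEXT);  -- two titles in one cell
CREATE TABLE GOOD (AuthorID TEXT, BookTitle  TEXT);  -- one title per row
INSERT INTO BAD  VALUES ('Mark Twain',
    'The Adventures of Tom Sawyer; Adventures of Huckleberry Finn');
INSERT INTO GOOD VALUES ('Mark Twain', 'The Adventures of Tom Sawyer');
INSERT INTO GOOD VALUES ('Mark Twain', 'Adventures of Huckleberry Finn');
""")

title = 'Adventures of Huckleberry Finn'

# Searching for one exact title fails when the cell holds two titles.
bad_hits = conn.execute(
    "SELECT * FROM BAD WHERE BookTitles = ?", (title,)).fetchall()
good_hits = conn.execute(
    "SELECT * FROM GOOD WHERE BookTitle = ?", (title,)).fetchall()

# Sorting by title only makes sense when each row has a single title.
good_sorted = [row[0] for row in conn.execute(
    "SELECT BookTitle FROM GOOD ORDER BY BookTitle")]

print(len(bad_hits), len(good_hits), good_sorted)
```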
SQL
• The language of databases
• Used to pull out just the subset of
the information that you want
• Can do math like counting and finding
averages
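As a minimal illustration of SQL doing math, the sketch below counts rows and averages a column with Python's built-in sqlite3 module. The book titles and page counts are placeholders, not values from the workshop database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE BOOK (BookTitle TEXT, NumberOfPages INTEGER);
INSERT INTO BOOK VALUES ('Book A', 100);  -- placeholder values
INSERT INTO BOOK VALUES ('Book B', 200);
INSERT INTO BOOK VALUES ('Book C', 300);
""")

# COUNT(*) counts the rows; AVG() averages the values in a column.
book_count, average_pages = conn.execute(
    "SELECT COUNT(*), AVG(NumberOfPages) FROM BOOK"
).fetchone()
print(book_count, average_pages)
```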
SQL words
Required
• SELECT
• FROM
• ;
Optional
• WHERE
• GROUP BY
• HAVING
• ORDER BY
• CREATE
Tables for the next section
• Table names are capitalized
• Fields are CamelCase
What is your question?
• List Mark Twain’s books and his full name
SELECT
• Which columns do you want to view?
• SELECT FullName, BookTitle;
• Every SQL statement ends with ;
FROM
• Tells the database from which tables to
select the columns
• SELECT FullName, BookTitle
FROM AUTHOR, BOOK;
• (At this point we will get every possible
combination of the tables)
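The "every possible combination" effect can be seen in a small sketch with Python's built-in sqlite3 module. The two authors and three books are sample rows chosen for the illustration, and the text-valued AuthorID is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE AUTHOR (AuthorID TEXT PRIMARY KEY, FullName TEXT);
CREATE TABLE BOOK (BookID INTEGER PRIMARY KEY, BookTitle TEXT, AuthorID TEXT);
INSERT INTO AUTHOR VALUES ('Mark Twain', 'Samuel Langhorne Clemens');
INSERT INTO AUTHOR VALUES ('Emily Bronte', 'Emily Jane Bronte');
INSERT INTO BOOK VALUES (1, 'The Adventures of Tom Sawyer', 'Mark Twain');
INSERT INTO BOOK VALUES (2, 'Adventures of Huckleberry Finn', 'Mark Twain');
INSERT INTO BOOK VALUES (3, 'Wuthering Heights', 'Emily Bronte');
""")

# Listing two tables with nothing connecting them pairs every author
# with every book: 2 authors x 3 books = 6 rows.
rows = conn.execute("SELECT FullName, BookTitle FROM AUTHOR, BOOK").fetchall()
print(len(rows))

# Some of those pairs are simply wrong, such as this one.
mismatch = ('Emily Jane Bronte', 'The Adventures of Tom Sawyer')
print(mismatch in rows)
```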
JOIN
• Limits the possible combinations to what
is useful
• Indicates columns that connect tables
• SELECT FullName, BookTitle
FROM AUTHOR
JOIN BOOK on
AUTHOR.AuthorID = BOOK.AuthorID;
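A sketch of the JOIN above, run with Python's built-in sqlite3 module on sample rows (the data and the text-valued AuthorID are assumptions for illustration). Only the author and book pairs whose AuthorID values match are kept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE AUTHOR (AuthorID TEXT PRIMARY KEY, FullName TEXT);
CREATE TABLE BOOK (BookID INTEGER PRIMARY KEY, BookTitle TEXT, AuthorID TEXT);
INSERT INTO AUTHOR VALUES ('Mark Twain', 'Samuel Langhorne Clemens');
INSERT INTO AUTHOR VALUES ('Emily Bronte', 'Emily Jane Bronte');
INSERT INTO BOOK VALUES (1, 'The Adventures of Tom Sawyer', 'Mark Twain');
INSERT INTO BOOK VALUES (2, 'Adventures of Huckleberry Finn', 'Mark Twain');
INSERT INTO BOOK VALUES (3, 'Wuthering Heights', 'Emily Bronte');
""")

# The ON condition keeps only pairs where the book's AuthorID
# matches the author's AuthorID, so each book appears once.
rows = conn.execute("""
    SELECT FullName, BookTitle
    FROM AUTHOR
    JOIN BOOK ON AUTHOR.AuthorID = BOOK.AuthorID
""").fetchall()
for row in sorted(rows):
    print(row)
```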
WHERE
• Selects only the rows you are interested in
• AuthorID is in both tables, so say which
table you mean: AUTHOR.AuthorID
• SELECT FullName, BookTitle
FROM AUTHOR
JOIN BOOK on
AUTHOR.AuthorID = BOOK.AuthorID
WHERE AUTHOR.AuthorID = 'Mark Twain';
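The sketch below runs a WHERE filter with Python's built-in sqlite3 module. It assumes, as the slide's query implies, that AuthorID holds words such as 'Mark Twain'; the rows are sample data. It also shows that SQLite refuses a bare AuthorID here because the column exists in both tables, which is why the table name is written in front of it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE AUTHOR (AuthorID TEXT PRIMARY KEY, FullName TEXT);
CREATE TABLE BOOK (BookID INTEGER PRIMARY KEY, BookTitle TEXT, AuthorID TEXT);
INSERT INTO AUTHOR VALUES ('Mark Twain', 'Samuel Langhorne Clemens');
INSERT INTO AUTHOR VALUES ('Emily Bronte', 'Emily Jane Bronte');
INSERT INTO BOOK VALUES (1, 'The Adventures of Tom Sawyer', 'Mark Twain');
INSERT INTO BOOK VALUES (2, 'Adventures of Huckleberry Finn', 'Mark Twain');
INSERT INTO BOOK VALUES (3, 'Wuthering Heights', 'Emily Bronte');
""")

join_sql = """
    SELECT FullName, BookTitle
    FROM AUTHOR
    JOIN BOOK ON AUTHOR.AuthorID = BOOK.AuthorID
"""

# Keep only Mark Twain's rows; text values go in straight single quotes.
rows = conn.execute(join_sql + " WHERE AUTHOR.AuthorID = 'Mark Twain'").fetchall()
print(sorted(rows))

# Without the table name, the database cannot tell which AuthorID is meant.
try:
    conn.execute(join_sql + " WHERE AuthorID = 'Mark Twain'")
    ambiguous_rejected = False
except sqlite3.OperationalError:
    ambiguous_rejected = True
print("Bare AuthorID rejected:", ambiguous_rejected)
```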
GROUP BY
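• Combines rows that share the same values
in the grouped columns into one row, which
removes duplicates left after some columns
are dropped
• In most databases (including Access), every
column in SELECT must appear in GROUP BY
or be inside a math function such as COUNT
• SELECT FullName, BookTitle
FROM AUTHOR
JOIN BOOK on
AUTHOR.AuthorID = BOOK.AuthorID
WHERE AUTHOR.AuthorID = 'Mark Twain'
GROUP BY AUTHOR.AuthorID, FullName, BookTitle;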
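A small sketch of grouping, using Python's built-in sqlite3 module and sample rows (assumptions for illustration). Selecting only FullName from the joined tables repeats an author once per book; GROUP BY collapses the repeats and allows a count per group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE AUTHOR (AuthorID TEXT PRIMARY KEY, FullName TEXT);
CREATE TABLE BOOK (BookID INTEGER PRIMARY KEY, BookTitle TEXT, AuthorID TEXT);
INSERT INTO AUTHOR VALUES ('Mark Twain', 'Samuel Langhorne Clemens');
INSERT INTO AUTHOR VALUES ('Emily Bronte', 'Emily Jane Bronte');
INSERT INTO BOOK VALUES (1, 'The Adventures of Tom Sawyer', 'Mark Twain');
INSERT INTO BOOK VALUES (2, 'Adventures of Huckleberry Finn', 'Mark Twain');
INSERT INTO BOOK VALUES (3, 'Wuthering Heights', 'Emily Bronte');
""")

# Without grouping: one row per book, so Twain's name appears twice.
ungrouped = conn.execute("""
    SELECT FullName
    FROM AUTHOR
    JOIN BOOK ON AUTHOR.AuthorID = BOOK.AuthorID
""").fetchall()

# With grouping: one row per author, plus a count of that author's books.
grouped = conn.execute("""
    SELECT FullName, COUNT(*)
    FROM AUTHOR
    JOIN BOOK ON AUTHOR.AuthorID = BOOK.AuthorID
    GROUP BY FullName
    ORDER BY FullName
""").fetchall()

print(len(ungrouped), grouped)
```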
HAVING
• Similar to WHERE but happens after
grouping
• SELECT FullName, BookTitle
FROM AUTHOR
JOIN BOOK on
AUTHOR.AuthorID = BOOK.AuthorID
GROUP BY AUTHOR.AuthorID, FullName, BookTitle
HAVING AUTHOR.AuthorID = 'Mark Twain';
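Because HAVING runs after grouping, it can test the result of math done on each group, which WHERE cannot. The sketch below (Python's built-in sqlite3 module, sample rows, both assumptions for illustration) keeps only authors with more than one book.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE AUTHOR (AuthorID TEXT PRIMARY KEY, FullName TEXT);
CREATE TABLE BOOK (BookID INTEGER PRIMARY KEY, BookTitle TEXT, AuthorID TEXT);
INSERT INTO AUTHOR VALUES ('Mark Twain', 'Samuel Langhorne Clemens');
INSERT INTO AUTHOR VALUES ('Emily Bronte', 'Emily Jane Bronte');
INSERT INTO BOOK VALUES (1, 'The Adventures of Tom Sawyer', 'Mark Twain');
INSERT INTO BOOK VALUES (2, 'Adventures of Huckleberry Finn', 'Mark Twain');
INSERT INTO BOOK VALUES (3, 'Wuthering Heights', 'Emily Bronte');
""")

# Group the books by author, count each group, then keep only the
# groups whose count is greater than 1.
rows = conn.execute("""
    SELECT FullName, COUNT(*)
    FROM AUTHOR
    JOIN BOOK ON AUTHOR.AuthorID = BOOK.AuthorID
    GROUP BY FullName
    HAVING COUNT(*) > 1
""").fetchall()
print(rows)
```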
ORDER BY
• Sorts the result rows by the column(s) you name
• SELECT FullName, BookTitle
FROM AUTHOR
JOIN BOOK on
AUTHOR.AuthorID = BOOK.AuthorID
GROUP BY AUTHOR.AuthorID, FullName, BookTitle
HAVING AUTHOR.AuthorID = 'Mark Twain'
ORDER BY BookTitle;
CREATE table … AS
• Saves the results into a new table
• CREATE table TWAIN AS
SELECT FullName, BookTitle
FROM AUTHOR
JOIN BOOK on
AUTHOR.AuthorID = BOOK.AuthorID
GROUP BY AUTHOR.AuthorID, FullName, BookTitle
HAVING AUTHOR.AuthorID = 'Mark Twain'
ORDER BY BookTitle;
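The following sketch saves query results into a new table using Python's built-in sqlite3 module, which accepts CREATE TABLE ... AS. Other database systems may spell this step differently, so treat it as an illustration on sample rows rather than a statement that will run unchanged everywhere.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE AUTHOR (AuthorID TEXT PRIMARY KEY, FullName TEXT);
CREATE TABLE BOOK (BookID INTEGER PRIMARY KEY, BookTitle TEXT, AuthorID TEXT);
INSERT INTO AUTHOR VALUES ('Mark Twain', 'Samuel Langhorne Clemens');
INSERT INTO AUTHOR VALUES ('Emily Bronte', 'Emily Jane Bronte');
INSERT INTO BOOK VALUES (1, 'The Adventures of Tom Sawyer', 'Mark Twain');
INSERT INTO BOOK VALUES (2, 'Adventures of Huckleberry Finn', 'Mark Twain');
INSERT INTO BOOK VALUES (3, 'Wuthering Heights', 'Emily Bronte');
""")

# Save Mark Twain's books into a new table named TWAIN.
conn.execute("""
    CREATE TABLE TWAIN AS
    SELECT FullName, BookTitle
    FROM AUTHOR
    JOIN BOOK ON AUTHOR.AuthorID = BOOK.AuthorID
    WHERE AUTHOR.AuthorID = 'Mark Twain'
    ORDER BY BookTitle
""")

# The new table can now be queried like any other table.
saved_titles = [row[0] for row in conn.execute(
    "SELECT BookTitle FROM TWAIN ORDER BY BookTitle")]
print(saved_titles)
```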
The Procedure
1. Decide what information you want
2. Pick the columns you need
3. Remove unwanted rows
4. Group the information
5. Do any math needed
6. Remove unwanted rows again
7. Sort the results
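The steps above can be sketched as a single query, with each step marked in a comment. This uses Python's built-in sqlite3 module; the authors, books, and page counts are placeholders invented for the illustration. Step 1 here is the question "For authors with at least two books that have a page count, how many such books are there and what is their average length?"

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE AUTHOR (AuthorID TEXT PRIMARY KEY, FullName TEXT);
CREATE TABLE BOOK (BookTitle TEXT, AuthorID TEXT, NumberOfPages INTEGER);
INSERT INTO AUTHOR VALUES ('A1', 'Author One');   -- placeholder data
INSERT INTO AUTHOR VALUES ('A2', 'Author Two');
INSERT INTO BOOK VALUES ('Book A', 'A1', 100);
INSERT INTO BOOK VALUES ('Book B', 'A1', 300);
INSERT INTO BOOK VALUES ('Book C', 'A1', NULL);   -- page count not recorded
INSERT INTO BOOK VALUES ('Book D', 'A2', 200);
""")

query = """
    SELECT FullName,                            -- 2. pick the columns
           COUNT(*)           AS Books,         -- 5. do any math needed
           AVG(NumberOfPages) AS AveragePages
    FROM AUTHOR
    JOIN BOOK ON AUTHOR.AuthorID = BOOK.AuthorID
    WHERE NumberOfPages IS NOT NULL             -- 3. remove unwanted rows
    GROUP BY FullName                           -- 4. group the information
    HAVING COUNT(*) >= 2                        -- 6. remove unwanted rows again
    ORDER BY Books DESC                         -- 7. sort the results
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Note that the clauses are written in a fixed order (SELECT, FROM, WHERE, GROUP BY, HAVING, ORDER BY) even though the thinking steps start with the question.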
Get a database file
• Download the database file:
Workshop-database.accdb
• Open the database file in Access
• You may need to click “enable content”
Access layout
• Tables are listed on the left
• Queries will be below the tables
• The right is for displaying data
and designing queries
Look at a table
• Double-click the AUTHOR table
• It opens up and shows data
• “AuthorID” is an index column that helps
a database keep track of data
• Each row is an item (a person)
• Each column is a characteristic of that
person
Table Fields
• Click on View under the Home tab in the
upper left and switch to Design view.
• Each Field Name is listed
• The type of data in each field is specified
(text, number, date/time, etc.)
• At the bottom, see if the field selected is
“indexed”
• In big databases, indexing helps queries
run faster
Quiz
• In the AUTHOR table, is AuthorID a text
or number field?
• Is AuthorID indexed?
• Is AuthorID a key? (Keys are used to
automatically link tables)
Designing a Query:
What is Mark Twain’s home town?
• Close the AUTHOR table (right click on
AUTHOR tab and select close)
• Click on the “Create” tab
• Click on “Query Design”
• In the Show Table dialog, select the
AUTHOR Table
• Then select Add then Close
Designing a Query:
What is Mark Twain’s home town?
• Drag “ShortName” from the field list to a
column in the grid below
• In the “Criteria” line under ShortName,
type Mark Twain
• Drag “HomeTown” to the next column
• Select the red “Run” exclamation point in
the upper left
Designing a Query:
What is Mark Twain’s home town?
• The result is a table that lists only the
information you wanted
Designing a Query:
What is Mark Twain’s birth date?
• In the upper left, click “View” to return
to the query design view
• Drag BirthDate to the grid
Designing a Query:
What is Mark Twain’s birth date?
• Delete the HomeTown column in one of 2 ways:
o In the grid, click just above “HomeTown”
(the column will turn black), then press the
Delete key
o Or select “Delete Columns” at the top in the
Query Setup ribbon
• Run the query
Designing a Query:
What is Emily Bronte’s birth date?
• Return to query design mode by clicking
the View button
• Change Mark Twain to Emily Bronte
• Run the query
Save and close query
• Right click on the tab for Query1 and
select Close
• Select Yes, you want to save changes
• Give the query a meaningful name so
you can find it later
Save Query
• Right click on the Query1 tab and select
Save
• Give the query a name so you can find it
later
Designing a Query:
How many pages are in each book?
• In the Create tab select Query Design
• Select the BOOK table, Add it and close
the Show Table window
• Drag BookTitle and NumberOfPages to
the design grid
• Run the query
• Notice that the order is not sorted
Designing a Query:
How many pages are in each book,
sorted by number of pages?
• Back in design view, under
NumberOfPages, select Ascending in the
Sort row
• Run the query
• The result is now sorted
Multiple tables
• Each query can draw from multiple
tables
• Drag the AUTHOR table from the left to
the field list workspace
• Notice that the AuthorID fields are linked
Multiple tables
• The AuthorID fields were automatically linked
because AuthorID was set as a key
• You can also link fields by dragging a field
from one list to another
• The links reduce the results to only items
that match in that column in both tables
Designing a Query:
What are Emily Bronte’s books?
• Drag ShortName to the grid
• Delete NumberOfPages from the grid
• Set the criteria for ShortName to Emily
Bronte
• Run the query
Designing a Query:
How many of Emily’s books are in the
database?
• In design view, select the “Totals” sigma
in the upper right
• In the grid “Total:” row under BookTitle,
change the entry from “Group By” to
“Count”
• Run the query
Designing a Query:
Show books made into movies.
• Under BookTitle, reset Count to Group By
• Add the MOVIE table to the field list
• Delete ShortName from the grid
• Drag MovieTitle to the grid
• Run the query (it will be wrong)
Designing a Query:
Show books made into movies.
• What went wrong?
• The query selected every combination of
BookTitle and MovieTitle
• To fix this, in design view, drag the
BookID from BOOK to the BookID in
MOVIE
• Run the query (this time it will be right)
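What dragging BookID from BOOK to MOVIE does can be sketched in SQL. The script below uses Python's built-in sqlite3 module instead of Access, and the table contents are sample rows, not the workshop database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE BOOK (BookID INTEGER PRIMARY KEY, BookTitle TEXT);
CREATE TABLE MOVIE (MovieTitle TEXT, BookID INTEGER);
INSERT INTO BOOK VALUES (1, 'The Adventures of Tom Sawyer');
INSERT INTO BOOK VALUES (2, 'Adventures of Huckleberry Finn');
INSERT INTO BOOK VALUES (3, 'Wuthering Heights');
INSERT INTO MOVIE VALUES ('Tom Sawyer', 1);          -- sample rows
INSERT INTO MOVIE VALUES ('Wuthering Heights', 3);
""")

# No link between the tables: every book is paired with every movie.
wrong = conn.execute(
    "SELECT BookTitle, MovieTitle FROM BOOK, MOVIE").fetchall()

# Linking BookID to BookID keeps only the matching pairs.
right = conn.execute("""
    SELECT BookTitle, MovieTitle
    FROM BOOK
    JOIN MOVIE ON BOOK.BookID = MOVIE.BookID
    ORDER BY BookTitle
""").fetchall()

print(len(wrong), right)
```

The book with no movie row drops out of the linked result, which is the behavior the question "show books made into movies" asks for.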
Designing a Query:
Show Author, Movie title and Release
year, sorted by year.
• Remove the BOOK table from the design
space (right click on it and select remove
table)
• Make the grid have ShortName,
MovieTitle and ReleaseYear
• ReleaseYear should be sorted Ascending
• Run the query
Make a table:
Show Author, Movie title and Release
year, sorted by year.
• Grid is like before
• In the upper left, select “Make Table” and
name it “MOVIE_DATES”
• Run the query and let Access paste the rows when it asks
• Open the new table to see if it is what you
expected
Look at the SQL
• Close the MOVIE_DATES table (right click
on the tab and select Close)
• In the Query design for Query1, at the
lower right, select “SQL”
• OR in the upper left under View, select
“SQL View”
• This shows the SQL statement. You can
copy and paste and change this to make
new queries.
Questions?
Survey
• What other database skills would you like to
learn?
For more help…
• Scholarly Commons for one-on-one consultation
http://www.library.illinois.edu/sc/
• ATLAS data services
http://www.atlas.illinois.edu/services/stats/workshops/registration/
• Database classes at University of Illinois: BADM
352, BADM 554, CS 105 (covers other areas also),
CS 411, LIS 490DB / LIS 490DBL, STAT 440
• Links to numeric and spatial data
http://www.library.illinois.edu/datagis/