DBMS notes - Tom Kleen


Databases

Updated 2015.10.27

Sample data: Open the following two files

AlleghenyCountyEducationalAttainment.xlsx

AlleghenyCountyTracts.xlsx

These two files can be used to show how and why we would join data from two tables. The tracts table has information about the census tracts and comes from the Census Bureau. The Educational Attainment file has information about education in each census tract.

Load them and set them to 11-point Calibri, and make all rows 15 points tall. View them side-by-side and compare. We want to join the data. This is what a database management system can do.

A Geographic Information System is a combination of two programs: (1) a database and (2) a map-drawing program. You can understand geographic information systems better if you have a basic understanding of databases.

A video store example

Back in the good old days, people could rent DVDs at a movie rental store. These stores, like almost all other businesses, kept track of all of their business information on computers. Today we will consider WHAT they kept track of and WHY they kept it on computers.

Keeping the data in a spreadsheet

In order for a computer to do anything useful with the information that is stored in it, the information must be organized. This usually means keeping the data in tables. To see how information on a computer is organized, let's look at an example:

A video rental store needs to keep track of data on members, inventory, and rentals.

Each type of data goes in a table. Look at the Movie Rentals Database.xlsx.

• MEMBERS Table: first name, last name, membership date, member number, address, city, state, zip, phone, etc.

• VIDEOS Table: name, date purchased, number, cost, supplier, etc.

• RENTALS Table: video number, member number, rental date, return date, charge, etc.

• ZIP Table: ZIP code, city.

Consider how an employee might use this database to notify customers when they have a movie that is 5 days or more overdue:

1. Sort the RENTAL table using the due date as a key field.

2. Highlight (select) the values that are 5 days or more overdue. Write down the member ID of the first member with an overdue movie.

3. Go to the MEMBER table and look up the member name and phone number.

4. Call the member. The member asks which movie?

10/25/2015 Документ1 Page 1 of 6

5. Go back to the RENTAL table and look up the movie ID.

6. Go to the MOVIE table and look up the name of the movie.

7. Go back to step 1 and repeat for the second customer on the list, etc., etc.
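The lookup steps above are the kind of work a database management system can do in a single query. Here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for a real database. The table layouts, column names, sample rows, and the "today" date are all made up for illustration; they are not the contents of the actual Movie Rentals file.

```python
import sqlite3
from datetime import date

# An in-memory database with three tiny, made-up tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE member (member_id INTEGER PRIMARY KEY, name TEXT, phone TEXT);
CREATE TABLE movie  (movie_id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE rental (movie_id INTEGER, member_id INTEGER,
                     due_date TEXT, return_date TEXT);
INSERT INTO member VALUES (1, 'Ann Lee', '555-0101'), (2, 'Bob Ray', '555-0102');
INSERT INTO movie  VALUES (10, 'Casablanca'), (11, 'Vertigo');
INSERT INTO rental VALUES (10, 1, '2015-10-01', NULL),
                          (11, 2, '2015-10-24', NULL);
""")

today = date(2015, 10, 25)  # assumed "current" date for the example

# One query replaces steps 1 through 7: find unreturned rentals that are
# 5 or more days past due, and look up the member and the movie title.
overdue = conn.execute("""
    SELECT member.name, member.phone, movie.title
    FROM rental
    JOIN member ON member.member_id = rental.member_id
    JOIN movie  ON movie.movie_id  = rental.movie_id
    WHERE rental.return_date IS NULL
      AND julianday(?) - julianday(rental.due_date) >= 5
    ORDER BY rental.due_date
""", (today.isoformat(),)).fetchall()

print(overdue)
```

Only the first rental qualifies here, because the second one is just one day past due.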

All of this looking things up is a pain. It would be nice if the computer could do all of the looking up for us. Let's look at a simpler example. Look at the MEMBER table. Each member has a ZIP code telling us where he lives. But if we don't have ZIP codes memorized, we have to look them up. See the ZIP table.

The names of the city and state are NOT stored in the MEMBER table because that would cause a lot of repetition. So we only store the ZIP code. It would be nice if the computer could join these two tables together, matching up ZIP codes and temporarily adding the city and state to the end of the MEMBER table.

This is exactly what a database management system is good at doing. All we have to do is tell the database management system which columns to match up: the ZIP field in the MEMBER table and the ZIP Code field in the ZIP CODE table.
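To see what "matching up" two columns means, here is a small sketch in plain Python, with no database at all. The member names, ZIP codes, and cities are invented for illustration. For each member row we look up the row in the ZIP table that has the same ZIP code and tack its city and state onto the end.

```python
# Made-up rows standing in for the MEMBER and ZIP CODE tables.
members = [
    {"name": "Ann Lee", "zip": "11111"},
    {"name": "Bob Ray", "zip": "22222"},
    {"name": "Cy Dunn", "zip": "11111"},
]
zip_codes = [
    {"zip": "11111", "city": "Springfield", "state": "XX"},
    {"zip": "22222", "city": "Riverton", "state": "YY"},
]

# Index the ZIP table by its key so each lookup is a single step.
zip_row_by_code = {row["zip"]: row for row in zip_codes}

# The "join": copy each member row and add the matching city and state.
joined = []
for member in members:
    match = zip_row_by_code[member["zip"]]
    joined.append({**member, "city": match["city"], "state": match["state"]})

for row in joined:
    print(row)
```

The result has one row per member, and the city and state are stored only once in the ZIP table no matter how many members share a ZIP code.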

However, Excel is NOT a database management system. Excel is NOT very good at joining data from one worksheet with data from another worksheet. But a database management system is. Probably the most common database management system in the world today is Microsoft's Access.

Keeping the data in a database

Open the Access file VideoRentalsDatabase.mdb (it's zipped; you'll have to unzip it). Note that MDB is the file extension for Access database files. This is the old extension. The newer (since Office 2007) file extension is ACCDB. You will see these often in your geodatabases.

Examine the tables. Note:

• Double-click on each table to open it. So far there's nothing different from Excel.

• Every table is a rectangle.

• Each column has the same type of fact in each cell from top to bottom.

• There can be only one entry for each fact.

• No two rows are exactly the same.

• The order of the columns is not important.

Design view

Right-click on the MEMBER table and choose "Design View". This is where we can do things like:

• Change field names

• Add new fields

• Tell Access the type of data that each field holds:

    Integer: byte, integer (2 bytes), and long integer (4 bytes)

    Real: single, double

    Currency: no round-off errors

    Boolean: Yes/No

    Date/Time: date/time

    String: text, memo

    Object

    Hyperlink

• Specify a key field.

Excel doesn't know anything about data types or key fields.
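Here is a rough sketch of what declaring data types and a key field looks like when a table is created with SQL instead of Design View. It uses Python's built-in sqlite3 module rather than Access, so the type names differ from the Access ones listed above, and the table and column names are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Each column gets a declared type; member_id is the key field.
conn.execute("""
    CREATE TABLE member (
        member_id  INTEGER PRIMARY KEY,  -- key field: no two rows may share it
        first_name TEXT NOT NULL,
        last_name  TEXT NOT NULL,
        joined_on  TEXT,     -- SQLite has no date type; ISO text is a common choice
        balance    NUMERIC,
        is_active  INTEGER   -- 0 or 1, playing the role of Yes/No
    )
""")

conn.execute("INSERT INTO member VALUES (1, 'Ann', 'Lee', '2015-01-15', 0, 1)")

# A second row with the same key value is refused by the database.
try:
    conn.execute("INSERT INTO member VALUES (1, 'Bob', 'Ray', '2015-02-01', 0, 1)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)
```

A spreadsheet would accept the duplicate member number without complaint; a database with a key field does not.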

Joining tables

Access can join two tables for us if we tell it which fields to match up for the join:

• Click on the Create tab. Then click on the Query Design button.

• The Show Table dialog box will appear.

• Click on the MEMBER table and click on the Add button.

• Click on the ZIP CODE table and click on the Add button.

• Click on the Close button.

• This database has already been set up to show that the ZIP field in the MEMBER table is the same as the ZIP Code field in the ZIP CODE table. That is why there is a line connecting the fields.

• To create such a link yourself, click and drag from the ZIP field in the MEMBER table TO the ZIP Code field in the ZIP CODE table. This tells Access that if we try to join these tables, these are the columns that need to be matched up.

• Now click on the Run button (the exclamation mark) on the left end of the Query Tools | Design tab. You will get an error message that says "Query must have at least one destination field." Click on OK.

• We need to tell Access which columns from each table to show after it joins the tables. The easy way is just to double-click on the asterisk at the top of each table. This tells Access to select ALL fields from each table.

• Now click on the Run button again. This time the join will take place, and Access will show us all of the resulting rows. There should be 43 rows, one for each row in the MEMBER table.
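Behind the scenes, a query built this way corresponds to an SQL join. The sketch below shows the general idea with Python's built-in sqlite3 module instead of Access. The table names, column names, and three sample members are assumptions for illustration, so the row count here is 3 rather than 43.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE member   (member_id INTEGER PRIMARY KEY, last_name TEXT, zip TEXT);
CREATE TABLE zip_code (zip_code TEXT PRIMARY KEY, city TEXT);
INSERT INTO member VALUES (1, 'Lee', '11111'), (2, 'Ray', '22222'), (3, 'Dunn', '11111');
INSERT INTO zip_code VALUES ('11111', 'Springfield'), ('22222', 'Riverton');
""")

# The ON clause plays the role of the line drawn between the two fields;
# the two asterisks ask for ALL fields from each table.
rows = conn.execute("""
    SELECT member.*, zip_code.*
    FROM member INNER JOIN zip_code ON member.zip = zip_code.zip_code
    ORDER BY member.member_id
""").fetchall()

member_count = conn.execute("SELECT COUNT(*) FROM member").fetchone()[0]

for row in rows:
    print(row)
print(len(rows), member_count)
```

Because every member's ZIP code appears exactly once in the ZIP table, the join produces one row per member.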

Joins frequently have to be performed when you are working on a geodatabase and you have two different tables of data, but they both refer to the same geographical area.

Selecting records

In ArcMap, we have been filtering records by going to the Attribute table for a layer and clicking on the Options button and then choosing Select by Attributes.

To select records, we must provide a rule. Most databases use a special language called SQL (Structured Query Language) to select records from a table. You can do a lot of different things with SQL, but we are only concerned with selecting records that meet certain criteria. To do so, you use a SELECT statement. The syntax for a SELECT statement is:

SELECT fieldnames FROM tablename WHERE condition

The words in upper case are reserved words, and the lower case words must be replaced by your field names, your table name, and your condition(s). The field names and table name are easy. The interesting part is writing the condition. Let's try to write some SQL queries in Access.

Click on the Create tab. Then click on the Query Design button. In the Show Table dialog box, click on the Close button (without adding any tables at all). On the Query Tools | Design tab, click on the SQL button (it is the leftmost button on the ribbon; if it is not visible, click on the down arrow on the leftmost button and choose SQL). Type the following in the window:

SELECT * FROM MEMBER

Click on the Run button (the exclamation mark). Access will run the query and return a "table" (a temporary table) with all of the records from the MEMBER table (the asterisk means "all columns").

On the Home tab, click on the Views button and choose SQL from the drop-down list (far left side). This will return us to the SQL editor. Another query:

SELECT [First Name], [Last Name] FROM MEMBER

First Name and Last Name are the names of two of the columns in the table. There are a lot of computer programs that do not allow you to put blanks in the name of a field (e.g. you would have to name a field FirstName or LastName, without blanks separating the two words). Access allows you to leave blanks in the middle of a field name, but if you do, you must enclose the field name in square brackets. You can list as many fields as you want as long as you separate them with commas.
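If you want to try these two queries without Access, the sketch below runs them with Python's built-in sqlite3 module, which also accepts square brackets around field names that contain blanks. The MEMBER table here is a tiny made-up stand-in with only three columns; the real table has more.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MEMBER ([Member ID] INTEGER PRIMARY KEY, "
    "[First Name] TEXT, [Last Name] TEXT)"
)
conn.executemany(
    "INSERT INTO MEMBER VALUES (?, ?, ?)",
    [(1, "John", "Baker"), (2, "Mary", "Stone")],
)

# The asterisk returns every column of every row.
all_columns = conn.execute(
    "SELECT * FROM MEMBER ORDER BY [Member ID]"
).fetchall()

# Listing field names, separated by commas, returns only those columns.
names_only = conn.execute(
    "SELECT [First Name], [Last Name] FROM MEMBER ORDER BY [Member ID]"
).fetchall()

print(all_columns)
print(names_only)
```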

SQL conditions

You can also specify conditions, so that only some of the records are selected.

Example:

SELECT * FROM MEMBER WHERE [First Name] = 'John'

Since First Name is a text field, the constant value that we compare it to (like 'John') must be enclosed in apostrophes. Try leaving the apostrophes out and see what happens. Access will ask you to enter a parameter value because, without the apostrophes, it treats John as the name of a field or parameter rather than as text.

If you are looking at a numeric field, apostrophes are not required (and will cause a "Data type mismatch" error if you do put them in).

Example:

SELECT * FROM MEMBER WHERE ID = 40
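Here is a small sketch of the two kinds of comparison, again using Python's built-in sqlite3 module as a stand-in for Access. The sample rows are invented, and the numeric field is assumed to be named Member ID.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MEMBER ([Member ID] INTEGER PRIMARY KEY, [First Name] TEXT)"
)
conn.executemany(
    "INSERT INTO MEMBER VALUES (?, ?)",
    [(39, "Mary"), (40, "John"), (41, "John")],
)

# Text constant: enclosed in apostrophes.
johns = conn.execute(
    "SELECT * FROM MEMBER WHERE [First Name] = 'John' ORDER BY [Member ID]"
).fetchall()

# Numeric constant: no apostrophes.
member_40 = conn.execute(
    "SELECT * FROM MEMBER WHERE [Member ID] = 40"
).fetchall()

print(johns)
print(member_40)
```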

You can also supply multiple conditions using the words AND or OR.

Example:

SELECT * FROM MEMBER WHERE
[First Name] = 'John' OR
[First Name] = 'James'

"OR" means "either" so SQL will retrieve any record where either the condition [First Name] = 'John' is true, or where the condition [First Name] = 'James' is true.

NOTE: You must provide a complete condition on each side of the word "OR". That is, you may not do this:

SELECT * FROM MEMBER WHERE

[First Name] = 'John' OR 'James'

which is the way that you might say it in English. Also note that SQL does NOT give you an error if you do it incorrectly! Instead, it retrieves ALL of the records! So be careful!
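The sketch below runs the correct form of the query, with a complete condition on each side of OR, using Python's built-in sqlite3 module and a few made-up rows. It also shows the standard SQL keyword IN, which is not covered in these notes but gives a shorter way to say "equal to any of these values".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MEMBER ([Member ID] INTEGER PRIMARY KEY, [First Name] TEXT)"
)
conn.executemany(
    "INSERT INTO MEMBER VALUES (?, ?)",
    [(1, "John"), (2, "Mary"), (3, "James"), (4, "Susan")],
)

# A complete condition on each side of OR.
with_or = conn.execute(
    "SELECT * FROM MEMBER "
    "WHERE [First Name] = 'John' OR [First Name] = 'James' "
    "ORDER BY [Member ID]"
).fetchall()

# The same selection written with IN and a list of values.
with_in = conn.execute(
    "SELECT * FROM MEMBER "
    "WHERE [First Name] IN ('John', 'James') "
    "ORDER BY [Member ID]"
).fetchall()

print(with_or)
print(with_in)
```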

Example:

To select records where a given field is within a specific range of values, use AND:

SELECT * FROM MEMBER WHERE
[Member ID] >= 35 AND
[Member ID] <= 40

"AND" means "both" so SQL will retrieve any record where BOTH conditions are true. If only one of the conditions is true, the record will not be retrieved.
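A quick sketch of the range query, using Python's built-in sqlite3 module and made-up member IDs from 33 through 42. It also shows the standard SQL keyword BETWEEN, which is not covered in these notes; it includes both end values, so it selects the same rows as the two conditions joined by AND.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MEMBER ([Member ID] INTEGER PRIMARY KEY, [First Name] TEXT)"
)
conn.executemany(
    "INSERT INTO MEMBER VALUES (?, ?)",
    [(n, f"Member {n}") for n in range(33, 43)],  # IDs 33 through 42
)

# Both conditions must be true for a row to be selected.
with_and = [row[0] for row in conn.execute(
    "SELECT [Member ID] FROM MEMBER "
    "WHERE [Member ID] >= 35 AND [Member ID] <= 40 "
    "ORDER BY [Member ID]"
)]

# BETWEEN covers the same range, end values included.
with_between = [row[0] for row in conn.execute(
    "SELECT [Member ID] FROM MEMBER "
    "WHERE [Member ID] BETWEEN 35 AND 40 "
    "ORDER BY [Member ID]"
)]

print(with_and)
print(with_between)
```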

SQL uses the following relational operators to build conditions:

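= (equal to)

> (greater than)

< (less than)

>= (greater than or equal to)

<= (less than or equal to)

<> (not equal to)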

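SQL also lets you use wild card characters when creating a condition that looks at a text field. The examples here use %, the standard SQL wild card; in Access's default query mode the equivalent wild card is the asterisk (*).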

Example:

SELECT * FROM MEMBER WHERE

[Last Name] LIKE '%ER%'

This will retrieve any row where the last name has the letters "ER" anywhere in the name.

The first "%" means that any number of characters (including 0 characters) can precede the letters "ER" and the last "%" means that any number of characters can follow the letters "ER".

We can also just use the wild card character at the end of the string.

Example:

SELECT * FROM MEMBER WHERE

[First Name] LIKE 'J%'

This will retrieve any row where the first name BEGINS WITH the letter 'J'.


Or we can use the wild card at the beginning of the string.

Example:

SELECT * FROM MEMBER WHERE [Last Name] LIKE '%ER'

This will retrieve any row where the last name ENDS WITH 'ER'.
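The three wild card patterns can be tried out with the sketch below, which uses Python's built-in sqlite3 module and five made-up members. SQLite uses the % wild card and, by default, ignores upper and lower case for ordinary letters in LIKE, so 'ER' also matches 'er'. Other database systems may treat case differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MEMBER ([Member ID] INTEGER PRIMARY KEY, "
    "[First Name] TEXT, [Last Name] TEXT)"
)
conn.executemany(
    "INSERT INTO MEMBER VALUES (?, ?, ?)",
    [
        (1, "John", "Baker"),
        (2, "Jane", "Ernst"),
        (3, "Mary", "Miller"),
        (4, "Al", "Stone"),
        (5, "Jo", "Perez"),
    ],
)

def matching_ids(condition):
    """Return the Member IDs of rows that satisfy the given WHERE condition."""
    sql = f"SELECT [Member ID] FROM MEMBER WHERE {condition} ORDER BY [Member ID]"
    return [row[0] for row in conn.execute(sql)]

contains_er = matching_ids("[Last Name] LIKE '%ER%'")   # ER anywhere
starts_with_j = matching_ids("[First Name] LIKE 'J%'")  # begins with J
ends_with_er = matching_ids("[Last Name] LIKE '%ER'")   # ends with ER

print(contains_er, starts_with_j, ends_with_er)
```

Note that Ernst and Perez match '%ER%' but not '%ER', because their "er" is not at the end of the name.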

You can also create a field that is computed using existing fields.

Example:

There really are no fields in the Video Rentals database that make sense to use in a computed field, so we'll do one that doesn't make sense. We will add 1000 to the Member ID number.

SELECT [Member ID] + 1000 AS BigID FROM MEMBER
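Here is the computed-field query run against three made-up member IDs, using Python's built-in sqlite3 module as a stand-in for Access. The word AS gives the computed column its name, and the MEMBER table itself is not changed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MEMBER ([Member ID] INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO MEMBER VALUES (?)", [(1,), (2,), (40,)])

cursor = conn.execute(
    "SELECT [Member ID] + 1000 AS BigID FROM MEMBER ORDER BY [Member ID]"
)

# The result has one column, named by the AS clause.
column_name = cursor.description[0][0]
big_ids = [row[0] for row in cursor.fetchall()]

# The stored values are untouched; only the query result was computed.
stored_ids = [row[0] for row in conn.execute(
    "SELECT [Member ID] FROM MEMBER ORDER BY [Member ID]"
)]

print(column_name, big_ids, stored_ids)
```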

