Slides - New York University

INFORMATION TECHNOLOGY IN BUSINESS AND SOCIETY
SESSION 15 – RELATIONAL DATABASES
SEAN J. TAYLOR
ADMINISTRIVIA
• Assignment 3:
Due tonight at midnight
(AdSense in a week)
• Midterm back on Thursday
• Database tutorial led by Varun
• Assignment 4:
Posted Thursday, due Friday 3/30
ADMINISTRIVIA II
• Groups:
I will email a form for you to list up to five
classmates and then choose the groups.
You can list 0-5.
• 2-way feedback:
1. I will send you an anonymous survey.
2. I will send you a brief summary of your
current grade.
LEARNING OBJECTIVES
1. Understand what relational databases are
(or, why text files and Excel are not enough)
2. Identify and distinguish between the following
parts of a relational database: tables, records,
fields, field values
3. Understand three types of anomalies that arise
from un-normalized data
4. Understand how primary keys and foreign keys
are used to link tables.
WHY ARE DATA VALUABLE?
RELATIONAL DATABASES
• Store data (insert)
• Retrieve data (query)
• Software applications
• Operations
• Analyze data (reporting capabilities)
WHY NOT STORE DATA LIKE THIS?
Order# | Date | Customer ID | Last Name | First Name | Address | ISBN | Book Name | Author | Price
1 | 9/1/03 | C1001 | Bezos | Jeff | 1 Amazon Plaza | #0465039138 | Code and other laws of cyberspace | Lessig, Lawrence | $25.00
2 | 9/2/03 | C1004 | Sproull | Lee | Dean's Office, Stern School, New York | #1573928895 | Digital Copyright: Protecting Intellectual Property on the Internet | Litman, Jessica | $55.00
3 | 9/3/03 | C1002 | Student | Pat | Tisch LC-12, New York | #0072952849 | MIS in the Information Age | Haag, Stephen | $98.75
4 | 9/4/03 | C1003 | Gates | Bill | Microsoft Corporation, Redmond | #0738206679 | Linked: The New Science of Networks | Barabasi, Albert-Laszlo | $34.95
5 | 9/5/03 | C1003 | Gates | Bill | Microsoft Corporation, Redmond | #0738206083 | Smart Mobs: The Next Social Revolution | Rheingold, Howard | $29.95
6 | 9/6/03 | C1001 | Bezos | Jeff | 1 Amazon Plaza | #0738206083 | Smart Mobs: The Next Social Revolution | Rheingold, Howard | $29.95
7 | 9/7/03 | C1002 | Student | Pat | Tisch LC-12, New York | #1573928895 | Digital Copyright: Protecting Intellectual Property on the Internet | Litman, Jessica | $55.00
8 | 9/8/03 | C1001 | Bezos | Jeff | 1 Amazon Plaza | #0738206083 | Smart Mobs: The Next Social Revolution | Rheingold, Howard | $29.95
RELATIONAL DATABASES
• Databases that use a series of logically related two-dimensional tables to store their information
• Tables are made up of records (rows) and fields (columns); each record holds one field value per field
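Table: Student
Last Name | SS# | DOB | Major
Smith | 100201122 | 06/11/84 | IS
Kim | 200202222 | 1/1/85 | FIN
Davis | 300201232 | 12/31/81 | MKT
Pat | 999132212 | 3/3/88 | ACC
(Callouts on the slide: Table = the whole grid, Field = a column, Record = a row, Field value = a single cell.)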
RELATIONAL DATABASES
Relational Database
Tables
Records
Fields
Field values
Bytes, bits
ADVANTAGES
1. Consistency
• We can restrict the values of certain fields (e.g. dates, integers)
• We can impose other kinds of constraints (all costs must be positive, last names must be included, orders must have addresses)
• Data look the same to all users at the same time.
2. Centralization
• Many different users can edit and view the data simultaneously. Efficient sharing of information.
3. Efficient Querying
• SQL and other query languages can be used to create complex reports quickly
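The consistency point can be sketched in a few lines. This is only an illustration using Python's built-in sqlite3 module; the orders table and its column names are hypothetical, chosen to mirror the constraints named above (positive costs, required last names, required addresses).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: the constraints are declared once, in the schema.
conn.execute("""
    CREATE TABLE orders (
        order_id  INTEGER PRIMARY KEY,
        last_name TEXT NOT NULL,                 -- last names must be included
        address   TEXT NOT NULL,                 -- orders must have addresses
        cost      REAL NOT NULL CHECK (cost > 0) -- all costs must be positive
    )
""")

def try_insert(row):
    """Return True if the database accepts the row, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO orders (last_name, address, cost) VALUES (?, ?, ?)", row
            )
        return True
    except sqlite3.IntegrityError:
        return False

accepted = try_insert(("Bezos", "1 Amazon Plaza", 25.00))
rejected_negative_cost = try_insert(("Gates", "Microsoft Corporation, Redmond", -5.00))
rejected_missing_name = try_insert((None, "Tisch LC-12, New York", 55.00))
print(accepted, rejected_negative_cost, rejected_missing_name)
```

Because the rules live in the table definition, every user and every application writing to the table is held to the same checks, which a shared spreadsheet does not guarantee by default.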
PROBLEMS WITH EXCEL?
When should you use a database instead of Excel?
– Insertion anomalies
– Deletion anomalies
– Update anomalies
(All three are data quality problems.)
Should we just create multiple workbooks in Excel?
– The real power of a database: Querying
– How would you answer the following question in Excel?
– Find customers that spend on average $50 per book order, that live on the West Coast or on the East Coast (but not in the Midwest) and whose annual income is at least $150K
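For comparison, here is a rough sketch of how that question might look as a single SQL query, run through Python's sqlite3 module. Everything about the schema is an assumption: the slides do not define region or annual_income fields, the sample rows are made up purely for illustration, and "spend on average $50" is read here as "at least $50 on average".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables and made-up rows, for illustration only.
    CREATE TABLE customers (customer_id TEXT PRIMARY KEY, region TEXT, annual_income REAL);
    CREATE TABLE orders (order_no INTEGER PRIMARY KEY, customer_id TEXT, price REAL);
    INSERT INTO customers VALUES
        ('X1', 'West Coast', 200000),
        ('X2', 'Midwest',    300000),
        ('X3', 'East Coast', 100000),
        ('X4', 'East Coast', 160000);
    INSERT INTO orders (customer_id, price) VALUES
        ('X1', 60), ('X1', 50), ('X2', 90), ('X3', 80), ('X4', 30), ('X4', 40);
""")

QUERY = """
    SELECT c.customer_id
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.customer_id
    WHERE c.region IN ('West Coast', 'East Coast')   -- coasts only, not Midwest
      AND c.annual_income >= 150000                  -- income at least $150K
    GROUP BY c.customer_id
    HAVING AVG(o.price) >= 50                        -- average spend per order
    ORDER BY c.customer_id
"""

matching_customers = [row[0] for row in conn.execute(QUERY)]
print(matching_customers)
```

The filter on customers, the link to orders, and the per-customer average all sit in one statement, which is the kind of question that gets awkward across multiple workbooks.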
INSERTION ANOMALIES
• Inability to insert a piece of information about an object without
having to insert a (bogus) piece of information about something
else
• Example: Adding a new customer/book before it is ordered
How can you add the book “Harry Potter” in the file below?
(Same orders file as on the slide "WHY NOT STORE DATA LIKE THIS?")
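A minimal sketch of the insertion anomaly, using Python's sqlite3 module. The table layouts are assumptions (a cut-down flat orders table next to a separate books table), and the ISBN is a placeholder, not the real one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Flat design: one row per order, so the order and customer fields are required.
conn.execute("""
    CREATE TABLE orders_flat (
        order_no    INTEGER NOT NULL,
        order_date  TEXT NOT NULL,
        customer_id TEXT NOT NULL,
        isbn        TEXT,
        book_name   TEXT
    )
""")
# Separate design: books live in their own table, independent of any order.
conn.execute("CREATE TABLE books (isbn TEXT PRIMARY KEY, book_name TEXT NOT NULL)")

new_book = ("ISBN-PLACEHOLDER", "Harry Potter")  # placeholder ISBN, not a real one

try:
    with conn:
        conn.execute("INSERT INTO orders_flat (isbn, book_name) VALUES (?, ?)", new_book)
    flat_insert_ok = True
except sqlite3.IntegrityError:
    # Rejected: there is no order or customer to attach the book to.
    flat_insert_ok = False

with conn:
    conn.execute("INSERT INTO books (isbn, book_name) VALUES (?, ?)", new_book)
books_count = conn.execute("SELECT COUNT(*) FROM books").fetchone()[0]
print(flat_insert_ok, books_count)
```

In the flat layout the only way to record the book is to invent a bogus order; with a books table it can be added on its own.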
DELETION ANOMALIES
• The loss of a piece of information about one object when a
piece of information about a different object is deleted
• Example: Deleting order 2 => deleting customer Lee Sproull
• Example: Deleting order 1 => deleting book “Code…”
(Same orders file as on the slide "WHY NOT STORE DATA LIKE THIS?")
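The two deletion examples can be replayed on a few rows of the orders file. This is a plain-Python sketch; the tuple layout and helper names are made up for illustration, and book titles are shortened.

```python
# A few rows from the flat orders file (only the columns needed here).
flat_orders = [
    # (order_no, customer_id, last_name, isbn, book_name)
    (1, "C1001", "Bezos",   "0465039138", "Code and other laws of cyberspace"),
    (2, "C1004", "Sproull", "1573928895", "Digital Copyright"),
    (6, "C1001", "Bezos",   "0738206083", "Smart Mobs"),
    (7, "C1002", "Student", "1573928895", "Digital Copyright"),
]

def delete_order(rows, order_no):
    """Return the rows that remain after removing one order."""
    return [row for row in rows if row[0] != order_no]

def known_customers(rows):
    return {row[1] for row in rows}

def known_books(rows):
    return {row[3] for row in rows}

after_deleting_2 = delete_order(flat_orders, 2)
after_deleting_1 = delete_order(flat_orders, 1)

# Deleting order 2 removes every trace of customer C1004 (Sproull).
print("C1004" in known_customers(after_deleting_2))
# Deleting order 1 removes every trace of the book "Code and other laws of cyberspace".
print("0465039138" in known_books(after_deleting_1))
```

Customer C1001 and book 1573928895 survive these deletions only because they happen to appear in other orders.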
UPDATE ANOMALIES
• A need to change the same piece of information about an
object multiple times
• Example: Changing Jeff Bezos's address in order 1 leaves
orders 6 and 8 unchanged…
(Same orders file as on the slide "WHY NOT STORE DATA LIKE THIS?")
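A small plain-Python sketch of the update anomaly. The dictionary layout is an assumption, and the new address is a made-up value used only to show the effect.

```python
NEW_ADDRESS = "2 Example Street"  # hypothetical new address, for illustration

# Flat file: the address is copied into every order row for customer C1001.
flat_orders = [
    {"order_no": 1, "customer_id": "C1001", "address": "1 Amazon Plaza"},
    {"order_no": 6, "customer_id": "C1001", "address": "1 Amazon Plaza"},
    {"order_no": 8, "customer_id": "C1001", "address": "1 Amazon Plaza"},
]
flat_orders[0]["address"] = NEW_ADDRESS  # only order 1 gets updated
flat_addresses = {row["address"] for row in flat_orders if row["customer_id"] == "C1001"}

# Separate tables: the address is stored once, and orders only refer to the customer.
customers = {"C1001": {"last_name": "Bezos", "address": "1 Amazon Plaza"}}
orders = [
    {"order_no": 1, "customer_id": "C1001"},
    {"order_no": 6, "customer_id": "C1001"},
    {"order_no": 8, "customer_id": "C1001"},
]
customers["C1001"]["address"] = NEW_ADDRESS  # one change covers every order
linked_addresses = {customers[order["customer_id"]]["address"] for order in orders}

print(len(flat_addresses), len(linked_addresses))
```

The flat file now holds two conflicting addresses for the same customer, while the linked layout holds exactly one.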
MODELING DATA WITH ENTITY-RELATIONSHIP DIAGRAMS
ENTITY RELATIONSHIP DIAGRAM
• The aim of an ERD is to model the data within the
Information System.
• Provides a CONCEPTUAL DATA MODEL:
a concept of the system, independent of
implementation
1. What data should be stored?
2. What relationships exist between items of data?
ENTITIES
An actual, real thing or person about which data
might be stored is referred to as an entity.
An entity can be uniquely identified.
Organizations collect and store data about entities:
• if a bank stores data about you - you are an entity
• if a business stores a piece of paper called an invoice - the
invoice is an entity
• a library stores data about a particular book - the book is
an entity
RELATIONSHIPS
• Entities are associated with each other via
relationships.
• A relationship is a named association
between two or more entity types:
[Player] – plays for – [Team]
DEFINING RELATIONSHIPS
Entity-relationship (E-R) diagram:
a graphic method of representing entity classes
and their relationships.
• Rectangle – entity class
• Dotted line – relationship
• | – single relationship
• O – zero or optional relationship
• Crow’s foot – multiple relationship
The types of relationships reflect the business rules
applicable to the entities
SIMPLE HOSPITAL EXAMPLE
In a hospital system, each ward has
many patients who are cared for by
nurses assigned to the specific ward.
Patients may require treatment by
more than one specialist doctor. A
patient belongs to only one ward.
SIMPLE HOSPITAL SYSTEM
• Ward has many patients (1:N)
• Patients are cared for by nurses (N:M)
• Ward has many nurses assigned (1:N)
• Patients require treatment by one or
more doctors (N:M)
SIMPLE HOSPITAL SYSTEM ERD
WARD – has assigned – NURSE
WARD – accommodates – PATIENT
NURSE – cares for – PATIENT
DOCTOR – treats – PATIENT
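One possible way to carry these relationships into tables, sketched with Python's sqlite3 module. This is an assumption about how the ERD could be implemented, not part of the slides: a 1:N relationship becomes a ward_id column on the "many" side, and each N:M relationship gets its own linking table. All table names, column names, and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ward   (ward_id   INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE doctor (doctor_id INTEGER PRIMARY KEY, name TEXT NOT NULL);

    -- 1:N: each nurse and each patient row points at exactly one ward.
    CREATE TABLE nurse   (nurse_id   INTEGER PRIMARY KEY, name TEXT NOT NULL,
                          ward_id    INTEGER NOT NULL REFERENCES ward(ward_id));
    CREATE TABLE patient (patient_id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                          ward_id    INTEGER NOT NULL REFERENCES ward(ward_id));

    -- N:M: one row per (patient, nurse) or (patient, doctor) pairing.
    CREATE TABLE cares_for (patient_id INTEGER REFERENCES patient(patient_id),
                            nurse_id   INTEGER REFERENCES nurse(nurse_id),
                            PRIMARY KEY (patient_id, nurse_id));
    CREATE TABLE treats    (patient_id INTEGER REFERENCES patient(patient_id),
                            doctor_id  INTEGER REFERENCES doctor(doctor_id),
                            PRIMARY KEY (patient_id, doctor_id));

    -- Made-up rows for illustration.
    INSERT INTO ward    VALUES (1, 'Ward A');
    INSERT INTO doctor  VALUES (1, 'Doctor X'), (2, 'Doctor Y');
    INSERT INTO patient VALUES (1, 'Patient P', 1), (2, 'Patient Q', 1);
    INSERT INTO treats  VALUES (1, 1), (1, 2), (2, 1);
""")

# One patient treated by several doctors (N:M via the treats table).
doctors_for_patient_p = [row[0] for row in conn.execute(
    "SELECT d.name FROM treats AS t JOIN doctor AS d ON d.doctor_id = t.doctor_id "
    "WHERE t.patient_id = 1 ORDER BY d.name"
)]
# One ward accommodating several patients (1:N via patient.ward_id).
patients_in_ward_a = conn.execute(
    "SELECT COUNT(*) FROM patient WHERE ward_id = 1"
).fetchone()[0]
print(doctors_for_patient_p, patients_in_ward_a)
```

Because ward_id is a single NOT NULL column on patient, the rule "a patient belongs to only one ward" is built into the layout.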
UNIVERSITY DATABASE EXAMPLE
A department has many Professors. A Professor
belongs to only one department. The department
offers many different courses, and many
Professors can teach a single course. A Professor
can also teach more than one course. Students
enroll for many courses and courses have many
students. A course belongs to only one
department.
FITTING DATA INTO THE RELATIONAL MODEL
NORMALIZATION
NORMALIZING AMAZON’S DATA
• The process of assuring that a database can be implemented
effectively as a set of two-dimensional tables
• Unlike Excel though, the tables are connected
• Prevents insertion, deletion and update anomalies
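As a rough illustration of the idea, the flat orders file can be split into three connected tables. This plain-Python sketch uses three rows from the orders file; the choice of tables (customers, books, orders) and the decision to store the price with the book are assumptions made here, not something the slides prescribe.

```python
flat_rows = [
    # (order_no, date, customer_id, last_name, first_name, isbn, book_name, price)
    (1, "9/1/03", "C1001", "Bezos", "Jeff", "0465039138",
     "Code and other laws of cyberspace", 25.00),
    (5, "9/5/03", "C1003", "Gates", "Bill", "0738206083",
     "Smart Mobs: The Next Social Revolution", 29.95),
    (6, "9/6/03", "C1001", "Bezos", "Jeff", "0738206083",
     "Smart Mobs: The Next Social Revolution", 29.95),
]

customers = {}  # keyed by Customer ID
books = {}      # keyed by ISBN
orders = []     # one entry per order, pointing at a customer and a book

for order_no, date, customer_id, last, first, isbn, book_name, price in flat_rows:
    # Each customer and each book is stored once, however many orders mention it.
    customers[customer_id] = {"last_name": last, "first_name": first}
    books[isbn] = {"book_name": book_name, "price": price}
    # The order keeps only the keys that connect it to the other two tables.
    orders.append({"order_no": order_no, "date": date,
                   "customer_id": customer_id, "isbn": isbn})

print(len(customers), len(books), len(orders))
```

Three flat rows become two customers, two books, and three orders; the repeated names and titles are gone, and the orders stay connected to them through Customer ID and ISBN.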
CONNECTING TABLES TOGETHER
Primary keys
– A field (or group of fields in some cases) that uniquely identifies each record in a table
– Each record should have a unique primary key
– Examples: Customer ID, ISBN, Order#
Foreign keys
– A field that is a primary key in one table and appears in a different table (though not as the primary key)
– Examples: Customer ID in Orders
Integrity constraints
– Rules that help ensure data quality
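A short sketch of how keys act as integrity constraints, using Python's sqlite3 module. The customers and orders tables are cut-down, hypothetical versions of the bookstore data, and customer C9999 is a made-up ID that does not exist. Note that SQLite only checks foreign keys once PRAGMA foreign_keys is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE customers (customer_id TEXT PRIMARY KEY, last_name TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE orders (
        order_no    INTEGER PRIMARY KEY,
        customer_id TEXT NOT NULL REFERENCES customers(customer_id)  -- foreign key
    )
""")
conn.execute("INSERT INTO customers VALUES ('C1001', 'Bezos')")
conn.commit()

def accepts(sql, params):
    """Return True if the statement succeeds, False if a key constraint rejects it."""
    try:
        with conn:
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False

# The order refers to an existing customer: accepted.
valid_order = accepts("INSERT INTO orders VALUES (?, ?)", (1, "C1001"))
# The order refers to a customer that is not in the customers table: rejected.
unknown_customer = accepts("INSERT INTO orders VALUES (?, ?)", (2, "C9999"))
# A second customer row with the same primary key: rejected.
duplicate_key = accepts("INSERT INTO customers VALUES (?, ?)", ("C1001", "Bezos"))
print(valid_order, unknown_customer, duplicate_key)
```

The primary key keeps each customer unique, and the foreign key keeps every order pointing at a customer that actually exists.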
DATABASE SCHEMA
Summary of the logical structure of your database
– The tables in your database, along with each of their fields, keys
– The relationships between the tables
(Figure: schema diagram with three primary-key callouts and two foreign-key callouts.)
NEXT CLASS: SQL
• Do the SQL tutorial at
http://sqlzoo.net/
(at least sections 0-3)
• Download the “Facebook”
database from Blackboard
and make sure you can open
it in MS Access
• Interest in OSS databases?
• Bring a laptop if you want.