The Relational Model CS 186, Spring 2007, Lecture 2 Mary Roth

The Relational Model
CS 186, Spring 2007, Lecture 2
Cow book Section 1.5, Chapter 3
Mary Roth
Administrivia
• Homework 0
– Due next Tuesday, Jan 23 10 p.m.
– Submission instructions added to
homework description
– Class account forms here if you need them
• Discussion sections will meet today
• Questions?
Outline
• What we learned last time
– What is and what good is a DBMS anyway?
– Components of a DBMS
• New stuff
– A brief history of databases
– The relational data model
Review: What is a database?
• A collection of data organized for rapid
search and retrieval
– Data collection has some logical meaning,
and some reason for it to be organized in a
particular way.
Review: What is a DBMS?
• A software system designed to manage
a database.
– Think big and lots of data
• 300,000,000 bank accounts
– Think mission critical
• 1,000,000 transactions a day
• You’d need a DBMS to:
– Help you find things fast
– Help you keep track of what’s going on
Review: ACID properties
• A DBMS ensures a database has ACID properties:
• Atomicity – nothing is ever half baked; database
changes either happen or they don’t.
• Consistency – you can’t peek at the data until it is
baked; database changes aren’t visible until they are
committed
• Isolation – concurrent operations have an
explainable outcome; multiple users can operate
on a database without conflicting
• Durability – what’s done is done; once a database
operation completes, it remains even if the database
crashes
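Atomicity can be sketched with a small transfer between two accounts. This is only an illustration under stated assumptions: it uses SQLite through Python's sqlite3 module rather than the course DBMS, and the Accounts table, its columns, the account owner Sam, and the transfer helper are all hypothetical names that do not appear in the slides.

```python
import sqlite3

# Hypothetical schema for illustration; the slides do not define the bank tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Accounts (owner TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO Accounts VALUES (?, ?)",
                 [("Frodo", 2000), ("Sam", 500)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money between two accounts as one all-or-nothing change."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE Accounts SET balance = balance - ? WHERE owner = ?",
                (amount, src))
            row = conn.execute(
                "SELECT balance FROM Accounts WHERE owner = ?",
                (src,)).fetchone()
            if row is None or row[0] < 0:
                # The debit above is undone by the rollback.
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE Accounts SET balance = balance + ? WHERE owner = ?",
                (amount, dst))
        return True
    except ValueError:
        return False


def balances(conn):
    """Return {owner: balance} for every account."""
    return dict(conn.execute("SELECT owner, balance FROM Accounts"))
```

A failed transfer leaves both balances exactly as they were, even though the debit had already been executed before the failure was detected.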
Review: DBMS components
• A DBMS is like an ogre; it has layers
– We’re going to learn about these layers all semester
– We’re going to build several layers in our homework projects
Query Optimization
and Execution
Relational Operators
Files and Access Methods
Buffer Management
Disk Space Management
DB
Review: DBMS components
Database application
• Talks to DBMS to manage data for a specific task
-> e.g. app to withdraw/deposit money or provide a history of the account
Query Optimization and Execution
• Figures out the best way to answer a question
-> There is always more than 1 way to skin a cat…!
Relational Operators
• Provides generic ways to combine data
-> Do you want a list of customers and accounts or the total account balance of all customers?
Access Methods
• Provides efficient ways to extract data
-> Do you need 1 record or a bunch?
Buffer Management
• Makes efficient use of RAM
-> Think 1,000,000 simultaneous requests!
Disk Space Management
• Makes efficient use of disk space
-> Think 300,000,000 accounts!
DB
Review: How does a DBMS work?
Database app
Query in: e.g. “Select min(account balance)”
Query Optimization and Execution
Relational Operators
Access Methods
Buffer Management
Disk Space Management
Customer accounts stored on disk
Data out: e.g. 2000
Review: Typical architecture for DB applications
1. Enter queries, etc. by typing text
Command line -> DBMS
2. Graphically compose queries, look at data
GUI -> DBMS
3. Embed database access in a program
app -> JDBC/ODBC -> DBMS
4. Embed database access in a web application
Web browser -> App server -> JDBC/ODBC -> DBMS
Summary: Benefits of a DBMS
1. Data independence
– Applications worry about what data they want, not how it is stored
2. Efficient data access
– DBMS is smart about how to retrieve data
3. Data integrity and security
– DBMS won’t let you corrupt data
4. Centralized administration
– Store data on a single server and let people specialize in managing it
5. Concurrent access
– Handles multiple users efficiently and recoverably
6. Reduced application development time
– Derived from 1-5
Minibase is a Java-based DBMS
• Database application: Homework 5
• Query Optimization and Execution: Homework 3 (Pencil-work)
• Relational Operators: Homework 4
• Access Methods: Homework 2
• Buffer Management: Homework 1
• Disk Space Management: Provided for you
DB
Intermission
• Get up and stretch
• Ask a quick question
• Get a drink of water
A brief history of databases
• Birth of the DBMS parallels the adoption of
computers over the 1960s and 1970s
• 1960s: IBM introduced IMS
– 36 years old!
– ‘Legacy’ technology, but still important!
• 100,000,000 bank transactions a day move
money through IMS system
• A bank manages over 300,000,000 online bank
accounts on IMS
• One production IMS system has been running
for over 8 years without down time or a crash
A brief history of databases
• 1970: Ted Codd introduced the
relational data model
– Revolutionary idea that spurred a flurry of
DBMS activity
– …at IBM (System R, DB2)
– …at Universities like Berkeley (Ingres)
– …at Oracle (it was born!!)
• Ted Codd won the Turing Award in 1981
• Larry Ellison became a gillionaire
So what’s the big deal about the
relational data model?
• What is the first benefit of a DBMS?
– Data independence
• A Data Model is key to data
independence
– It’s the link that provides an
abstraction between user’s
view of the world and bits
stored in computer
• User’s view: Student (sid: string, name: string,
login: string, age: integer, gpa: real)
• Bits stored in computer: 1010111101
So what’s the big deal about the
relational data model?
• It is now the most widely used data model.
• Before 1970, there were other data models…
– Network
– Hierarchical (IMS)
• But they didn’t really provide data independence
– If the data layout changed, the application had to change
– If you wanted to change the layout, you often had to bring the
whole system down
– Changes had to occur over scheduled system down time.
• Slow! Annoying! Expensive!
• The relational model changed all that.
Relational Database: Definitions
• Relational database: a set of relations.
• Relation: made up of 2 parts:
– Schema : specifies name of relation, plus name
and type of each column.
• e.g.
Students(sid: string, name: string, login: string, age:
integer, gpa: real)
– Instance : a table, with rows and columns.
• #rows = cardinality
• #fields = degree / arity
• You can think of a relation as a set of rows or
tuples. (It’s basically a spreadsheet!)
– i.e., all rows are distinct
Ex: Instance of Students Relation
sid    name   login       age  gpa
53666  Jones  jones@cs    18   3.4
53688  Smith  smith@eecs  18   3.2
53650  Smith  smith@math  19   3.8
• Cardinality = 3, arity = 5 , all rows distinct
• Do all values in each column of a relation instance
have to be distinct?
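Cardinality, arity, and row distinctness can be checked mechanically. The sketch below is an assumption-laden illustration: it loads the Students instance into an in-memory SQLite database via Python's sqlite3 module (not the course DBMS), and the variable names are chosen here for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Students "
    "(sid CHAR(20), name CHAR(20), login CHAR(10), age INTEGER, gpa FLOAT)")
conn.executemany(
    "INSERT INTO Students VALUES (?, ?, ?, ?, ?)",
    [
        ("53666", "Jones", "jones@cs", 18, 3.4),
        ("53688", "Smith", "smith@eecs", 18, 3.2),
        ("53650", "Smith", "smith@math", 19, 3.8),
    ])

cursor = conn.execute("SELECT * FROM Students")
arity = len(cursor.description)   # number of fields (columns)
instance = cursor.fetchall()
cardinality = len(instance)       # number of rows

# Whole rows are distinct, but a single column may repeat values.
all_rows_distinct = len(set(instance)) == cardinality
distinct_names = conn.execute(
    "SELECT COUNT(DISTINCT name) FROM Students").fetchone()[0]
```

Here the name column holds only two distinct values across three rows, so individual columns need not be distinct even though every row is.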
SQL - A language for Relational DBs
• SQL (a.k.a. “Sequel”), standard
language
• Data Definition Language (DDL)
– create, modify, delete relations
– specify constraints
– administer users, security, etc.
• Data Manipulation Language (DML)
– Specify queries to find tuples that satisfy
criteria
– add, modify, remove tuples
SQL Overview
• CREATE TABLE <name> ( <field> <domain>, … )
• INSERT INTO <name> (<field names>)
VALUES (<field values>)
• DELETE FROM <name>
WHERE <condition>
• UPDATE <name>
SET <field name> = <value>
WHERE <condition>
• SELECT <fields>
FROM <name>
WHERE <condition>
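The five statement forms above can be tried end to end. This is a minimal sketch, assuming SQLite via Python's sqlite3 module as a stand-in for the course DBMS; the specific conditions (age, gpa thresholds) are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# CREATE TABLE <name> ( <field> <domain>, ... )
conn.execute(
    "CREATE TABLE Students "
    "(sid CHAR(20), name CHAR(20), login CHAR(10), age INTEGER, gpa FLOAT)")

# INSERT INTO <name> (<field names>) VALUES (<field values>)
conn.execute(
    "INSERT INTO Students (sid, name, login, age, gpa) "
    "VALUES ('53666', 'Jones', 'jones@cs', 18, 3.4)")
conn.execute(
    "INSERT INTO Students (sid, name, login, age, gpa) "
    "VALUES ('53650', 'Smith', 'smith@math', 19, 3.8)")

# UPDATE <name> SET <field name> = <value> WHERE <condition>
conn.execute("UPDATE Students SET age = 19 WHERE sid = '53666'")

# DELETE FROM <name> WHERE <condition>
conn.execute("DELETE FROM Students WHERE gpa > 3.5")

# SELECT <fields> FROM <name> WHERE <condition>
result = conn.execute(
    "SELECT name, age FROM Students WHERE age > 18").fetchall()
```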
Creating Relations in SQL
• Creates the Students relation.
– Note: the type of each field is specified,
and enforced by the DBMS whenever
tuples are added or modified.
CREATE TABLE Students
(sid CHAR(20),
name CHAR(20),
login CHAR(10),
age INTEGER,
gpa FLOAT)
Table Creation (continued)
• Another example: the Enrolled table
holds information about courses
students take.
CREATE TABLE Enrolled
(sid CHAR(20),
cid CHAR(20),
grade CHAR(2))
Adding and Deleting Tuples
• Can insert a single tuple using:
INSERT INTO Students (sid, name, login, age, gpa)
VALUES ('53688', 'Smith', 'smith@ee', 18, 3.2)
• Can delete all tuples satisfying some condition
(e.g., name = Smith):
DELETE FROM Students S
WHERE S.name = 'Smith'
Powerful variants of these commands are available;
more later!
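A DELETE removes every tuple that matches its condition, not just one. The sketch below illustrates this with the Students data, assuming SQLite via Python's sqlite3 module; the table alias S from the slide is dropped here because alias support in DELETE varies between systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Students "
    "(sid CHAR(20), name CHAR(20), login CHAR(10), age INTEGER, gpa FLOAT)")
conn.executemany(
    "INSERT INTO Students VALUES (?, ?, ?, ?, ?)",
    [
        ("53666", "Jones", "jones@cs", 18, 3.4),
        ("53650", "Smith", "smith@math", 19, 3.8),
    ])

# Insert a single tuple, as on the slide.
conn.execute(
    "INSERT INTO Students (sid, name, login, age, gpa) "
    "VALUES ('53688', 'Smith', 'smith@ee', 18, 3.2)")

# Delete all tuples satisfying the condition: both Smiths match.
deleted = conn.execute(
    "DELETE FROM Students WHERE name = 'Smith'").rowcount
remaining = conn.execute("SELECT sid, name FROM Students").fetchall()
```

To remove only one of the two Smiths, the condition would have to name a field that identifies a single tuple, such as sid.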
Keys
• Keys are a way to associate tuples in
different relations
Enrolled (sid is a FOREIGN Key)
sid    cid          grade
53666  Carnatic101  C
53666  Reggae203    B
53650  Topology112  A
53666  History105   B

Students (sid is the PRIMARY Key)
sid    name   login       age  gpa
53666  Jones  jones@cs    18   3.4
53688  Smith  smith@eecs  18   3.2
53650  Smith  smith@math  19   3.8
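The primary and foreign keys pictured here can be declared in the DDL so that the DBMS enforces them. This is a sketch under assumptions: it uses SQLite via Python's sqlite3 module, where foreign key checks must be switched on with a PRAGMA; the try_insert helper and the rejected sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: off by default

conn.execute("""
    CREATE TABLE Students (
        sid CHAR(20) PRIMARY KEY,
        name CHAR(20), login CHAR(10), age INTEGER, gpa FLOAT)""")
conn.execute("""
    CREATE TABLE Enrolled (
        sid CHAR(20), cid CHAR(20), grade CHAR(2),
        FOREIGN KEY (sid) REFERENCES Students (sid))""")

conn.executemany("INSERT INTO Students VALUES (?, ?, ?, ?, ?)", [
    ("53666", "Jones", "jones@cs", 18, 3.4),
    ("53688", "Smith", "smith@eecs", 18, 3.2),
    ("53650", "Smith", "smith@math", 19, 3.8),
])
conn.executemany("INSERT INTO Enrolled VALUES (?, ?, ?)", [
    ("53666", "Carnatic101", "C"),
    ("53666", "Reggae203", "B"),
    ("53650", "Topology112", "A"),
    ("53666", "History105", "B"),
])
conn.commit()


def try_insert(sql):
    """Run one INSERT; return False if a key constraint rejects it."""
    try:
        with conn:
            conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


# A second student with an existing sid violates the primary key.
duplicate_ok = try_insert(
    "INSERT INTO Students VALUES ('53666', 'Other', 'other@cs', 20, 3.0)")
# An enrollment for an sid not in Students violates the foreign key.
orphan_ok = try_insert(
    "INSERT INTO Enrolled VALUES ('99999', 'CS186', 'A')")
enrolled_count = conn.execute(
    "SELECT COUNT(*) FROM Enrolled").fetchone()[0]
```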
Keys are the key to data independence!
• Big improvement over the hierarchical
model
– Relationships are determined by field value,
not physical pointers!
Keys are the key to data independence!
• Let’s enroll Smith (smith@eecs) in CS186
– With hierarchical model
53688 Smith smith@eecs 18 3.2 -> CS186 A
• IMS requires
– A change to add a field for CS186
– A change to Smith’s record to have him point to the new field
Keys are the key to data independence!
• Let’s enroll Smith (smith@eecs) in CS186
– With relational model
Enrolled
sid    cid          grade
53666  Carnatic101  C
53666  Reggae203    B
53650  Topology112  A
53666  History105   B
53688  CS186        A

Students
sid    name   login       age  gpa
53666  Jones  jones@cs    18   3.4
53688  Smith  smith@eecs  18   3.2
53650  Smith  smith@math  19   3.8
• Relational model only requires
– A data change to add a new row to Enrolled table
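The relational version of this enrollment is a single INSERT, and the link back to the student is recovered by matching sid values. The sketch below assumes SQLite via Python's sqlite3 module; the JOIN query and the PRAGMA table_info check are additions for illustration and are not part of the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Students "
    "(sid CHAR(20), name CHAR(20), login CHAR(10), age INTEGER, gpa FLOAT)")
conn.execute(
    "CREATE TABLE Enrolled (sid CHAR(20), cid CHAR(20), grade CHAR(2))")
conn.executemany("INSERT INTO Students VALUES (?, ?, ?, ?, ?)", [
    ("53666", "Jones", "jones@cs", 18, 3.4),
    ("53688", "Smith", "smith@eecs", 18, 3.2),
    ("53650", "Smith", "smith@math", 19, 3.8),
])
conn.executemany("INSERT INTO Enrolled VALUES (?, ?, ?)", [
    ("53666", "Carnatic101", "C"),
    ("53666", "Reggae203", "B"),
    ("53650", "Topology112", "A"),
    ("53666", "History105", "B"),
])

# The only change needed: one new row in Enrolled.
conn.execute(
    "INSERT INTO Enrolled (sid, cid, grade) VALUES ('53688', 'CS186', 'A')")

# The relationship is found by comparing field values, not by following pointers.
roster = conn.execute("""
    SELECT S.name, S.login, E.grade
    FROM Students S JOIN Enrolled E ON S.sid = E.sid
    WHERE E.cid = 'CS186'""").fetchall()

# The Students schema is untouched by the enrollment.
student_columns = [row[1] for row in conn.execute("PRAGMA table_info(Students)")]
```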
Let’s return to our bank…
• Can we apply a relational model to our
bank spreadsheet?
Exercises to test your understanding…
• Write the DDL for our bank tables.
– Include primary and foreign key definitions
• Write a SQL query (DML) that returns
the names and account balances for all
customers that have an account balance
> 2500.
• Write a SQL query (DML) that
withdraws $300 from Frodo’s account.