PPT - NYU Stern School of Business

C20.0046: Database Management Systems
Lecture #1
Matthew P. Johnson
Stern School of Business, NYU
Spring, 2005
M.P. Johnson, DBMS, Stern/NYU, Spring 2005
Personnel

Instructor: Matthew P. Johnson
  - mjohnson@stern
  - Office hours: tba, 8-171, KMC
    - please visit!
Tutor/TF/grader: tba…
Communications

Web page: http://pages.stern.nyu.edu/~mjohnson/dbms/
  - syllabus
  - course policies
  - may move in the future…
Blackboard web site
  - Some materials will be available here
  - Discussion board
    - send general-interest messages here to benefit all!
  - Go to http://sternclasses.nyu.edu
  - Click on C20.0046
Acknowledgements

Thanks to Ramesh, Ullman, et al., Raghu and Johannes, Dan Suciu, Arthur Keller, David Kuijt for course materials

See classpage for other related, antecedent DBMS courses
What Is a Database?


A very large, integrated collection of data.
Models real-world enterprise.
  - Entities
    - students, courses, instructors, TAs
  - Relationships
    - George is currently taking C20.0046
    - Dick is currently teaching C20.0046
    - Condi is currently TA-ing C20.0046 but took it last semester
A Database Management System (DBMS) is a software package designed to store and manage databases.
Databases are everywhere: Ordering a pizza
Databases involved?
1. Pizza Hut’s DB
   - stores previous orders by customer
   - stores previous credit cards used
2. Credit card records
   - huge databases of (attempted) purchases
   - location, date, amount, parties
3. Got approved by credit-report companies
4. Phone company’s records
   - Local Usage Details (“Pull his LUDs, Lenny.”)
5. Caller ID
   - ensures reported address matches destination
Your wallet is full of DB records








Driver’s license
Credit cards
NYUCard
Medical insurance card
Social security card
Gym membership
Money (serial numbers)
Maybe even photos (ids on back)
Databases are everywhere


Q: Websites backed by DBMSs?
  - retail: Amazon, etc.
  - data-mining: Page You Made
  - search engines: Google, etc.
  - directories: Internic, etc.
  - searchable DBs: IMDB, tvguide.com, etc.
Q: Non-web examples of DBMSs?
  - criminal/terrorist: TIA
  - airline bookings
  - NYPD’s CompStat
    - all serious crime stats by precinct
  - Retailers: Wal-Mart, etc.
    - when to re-order, purchase patterns, data-mining
  - Genomics!
Example of a Traditional DB App
Suppose we are building a system to store the information about:
  - checking accounts
  - savings accounts
  - account holders
  - state of each person’s accounts
Can we do it without a DBMS?
Sure we can! Start by storing the data in files:
checking.txt
savings.txt
customers.txt
Now write C or Java programs to implement
specific tasks
Doing it without a DBMS...

Transfer $100 from George’s savings to checking:
Write a C program to do the following:
  Read savings.txt
  Find & update the record “George”
    balance -= 100
  Write savings.txt
  Read checking.txt
  Find & update the record “George”
    balance += 100
  Write checking.txt
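
The steps above can be sketched as a short program. This is only an illustration, written in Python rather than the C the slide mentions. The "name,balance" line format, the helper names and the starting balances are all assumptions, not part of the original slides.

```python
import os
import tempfile


def read_balances(path):
    """Load 'name,balance' lines from a flat file into a dict (assumed format)."""
    balances = {}
    with open(path) as f:
        for line in f:
            name, amount = line.strip().split(",")
            balances[name] = int(amount)
    return balances


def write_balances(path, balances):
    """Rewrite the whole flat file from the dict."""
    with open(path, "w") as f:
        for name, amount in balances.items():
            f.write(f"{name},{amount}\n")


def transfer(savings_path, checking_path, name, amount):
    """Move `amount` from savings to checking, one file at a time."""
    savings = read_balances(savings_path)
    savings[name] -= amount
    write_balances(savings_path, savings)
    # Nothing ties the two file updates together: a crash at this point
    # leaves the money debited from savings but never credited to checking.
    checking = read_balances(checking_path)
    checking[name] += amount
    write_balances(checking_path, checking)


# Hypothetical starting data, stored in a temporary directory.
workdir = tempfile.mkdtemp()
savings_path = os.path.join(workdir, "savings.txt")
checking_path = os.path.join(workdir, "checking.txt")
write_balances(savings_path, {"George": 500, "Dick": 300})
write_balances(checking_path, {"George": 50, "Dick": 75})

transfer(savings_path, checking_path, "George", 100)
final_savings = read_balances(savings_path)
final_checking = read_balances(checking_path)
```

Note that every transfer rereads and rewrites both files in full, and nothing protects the gap between the two writes.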
Problems without a DBMS...
1. System crashes:
   Read savings.txt
   Find & update the rec “George”
   Write savings.txt
   CRASH!
   Read checking.txt
   Find & update the rec “George”
   Write checking.txt
   Q: What is the problem?
   A: George lost his $100
   Same problem even if reordered
2. Simultaneous access by many users
   - George and Dick visit ATMs at same time
   - Lock checking.txt before each use – what is the problem?
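
One way simultaneous access can go wrong is a "lost update". The sketch below replays a fixed interleaving by hand instead of using real threads, so the outcome is deterministic. It assumes, purely for illustration, that both ATM sessions withdraw from the same account; the amounts and helper names are hypothetical.

```python
# A single shared account, standing in for one record in checking.txt.
account = {"balance": 100}


def atm_read(acct):
    """An ATM session reads the current balance."""
    return acct["balance"]


def atm_write(acct, value):
    """An ATM session writes back a new balance."""
    acct["balance"] = value


# Interleaved schedule: both sessions read before either one writes.
george_seen = atm_read(account)        # George's ATM sees 100
dick_seen = atm_read(account)          # Dick's ATM also sees 100
atm_write(account, george_seen - 30)   # George withdraws 30
atm_write(account, dick_seen - 50)     # Dick withdraws 50, overwriting George's write
interleaved_balance = account["balance"]

# Serial schedule: each session reads and writes before the next one starts.
account["balance"] = 100
atm_write(account, atm_read(account) - 30)
atm_write(account, atm_read(account) - 50)
serial_balance = account["balance"]
```

In the interleaved schedule 80 was handed out but the balance dropped by only 50; the serial schedule leaves the correct balance of 20.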
Problems without a DBMS...
3. Large data sets (say 500GB)
   - Why is this a problem?
   - No indices
     - Finding “George” in huge flatfile is expensive
   - Modifications intractable without better data structures
     - “George” → “Georgie” is very expensive
     - Deletions are very expensive
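
To make the cost of "no indices" concrete, the sketch below compares a linear scan of a flat list of records with a lookup through an in-memory index. The data is synthetic, and the index is a plain Python dict (a hash table) chosen for brevity; it is not a claim about how any particular DBMS builds its indices.

```python
# Synthetic flat "file": 100,000 records, with George stored last.
records = [(f"customer{i}", i) for i in range(100_000)]
records.append(("George", 42))


def scan_lookup(rows, name):
    """Linear scan: compare against every record until a match is found."""
    comparisons = 0
    for row_name, balance in rows:
        comparisons += 1
        if row_name == name:
            return balance, comparisons
    return None, comparisons


# Build the index once: name -> position of the record in the list.
index = {name: pos for pos, (name, _) in enumerate(records)}


def index_lookup(rows, idx, name):
    """Indexed lookup: jump straight to the record's position."""
    pos = idx.get(name)
    return None if pos is None else rows[pos][1]


scan_balance, scan_steps = scan_lookup(records, "George")
indexed_balance = index_lookup(records, index, "George")
```

Both lookups return the same balance, but the scan touches every record first while the indexed lookup goes directly to the right position.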
Problems without a DBMS...
5. Security?
   - File system may be insecure
   - File system security may be coarse
6. Application programming interface (API)?
   - suppose other apps need to access the DB
7. How to interact with other DBMSs?
General problems to solve

In building our own system, many Qs arise:
  - how do we store the data? (file organization, etc.)
  - how do we query the data? (write programs…)
  - make sure that updates don’t mess things up?
    - leave the DB “consistent”
  - provide different views on the data?
    - e.g., ATM user’s view v. bank teller’s view
  - how do we deal with crashes?

Too hard! Go buy a DBMS!

Q: How does a DBMS solve these problems?
A: See third part of course

Big issue: Transaction processing


Grouping of several queries (or other database actions) into one transaction
ACID properties
  - Atomicity
    - all or nothing
  - Consistency
    - constraints on relationships
  - Isolation
    - concurrency control
    - Simulated solipsism
  - Durability
    - Crash recovery
Atomicity & Durability


Saw how George lost $100 with makeshift software
A DBMS prevents this outcome
  - xacts are all or nothing
One idea: Keep a log (history) of all actions in set of xacts
  - Durability: Use log to redo or undo certain ops in crash recovery
  - Atomicity: don’t really commit changes until end
    - Then, all at once
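
A minimal sketch of all-or-nothing behavior, using Python's built-in sqlite3 module as a stand-in DBMS. The table layout and balances are assumptions, and the "crash" is simulated with an exception followed by an explicit rollback. A real crash would instead be handled by the DBMS's recovery procedure on restart, which this sketch does not model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE savings (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("CREATE TABLE checking (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO savings VALUES ('George', 500)")
conn.execute("INSERT INTO checking VALUES ('George', 50)")
conn.commit()


def transfer(conn, name, amount, crash_midway=False):
    """Run both updates as one transaction; undo everything on failure."""
    try:
        conn.execute(
            "UPDATE savings SET balance = balance - ? WHERE name = ?",
            (amount, name),
        )
        if crash_midway:
            raise RuntimeError("simulated crash between the two updates")
        conn.execute(
            "UPDATE checking SET balance = balance + ? WHERE name = ?",
            (amount, name),
        )
        conn.commit()    # both changes become permanent together
    except RuntimeError:
        conn.rollback()  # the half-finished debit is undone


def balances(conn, name):
    """Return (savings balance, checking balance) for one customer."""
    s = conn.execute("SELECT balance FROM savings WHERE name = ?", (name,)).fetchone()[0]
    c = conn.execute("SELECT balance FROM checking WHERE name = ?", (name,)).fetchone()[0]
    return s, c


transfer(conn, "George", 100, crash_midway=True)
after_crash = balances(conn, "George")      # unchanged: nothing was committed

transfer(conn, "George", 100)
after_success = balances(conn, "George")    # both updates applied
```

After the interrupted attempt George still has his original balances; only the completed transfer changes both accounts.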
Isolation

Concurrent execution is essential for performance.
  - Frequent, slow disk accesses
    - don’t waste CPU – keep running
Interleaving actions of different user programs can lead to inconsistency:
  - e.g., two programs simultaneously withdraw from the same account
DBMS ensures such problems don’t arise:
  - users can pretend they are using a single-user system.
Isolation

Contrast with a file in two Notepads




Strategy: ignore multiple users
whichever saves last wins
first save is overwritten
Contrast with a file in two Words



Strategy: blunt isolation
One can edit
To the other it’s read-only
Consistency

Each xact (on a consistent DB) must leave it in a consistent state
  - can define integrity constraints
  - DBMS checks that the defined claims about the data remain true
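
As an illustration of an integrity constraint, the sketch below declares that a balance may never go negative and lets the DBMS enforce it. It uses Python's sqlite3 module; the table, the rule and the amounts are assumptions chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE checking (
        name    TEXT PRIMARY KEY,
        balance INTEGER NOT NULL CHECK (balance >= 0)  -- the declared claim
    )
    """
)
conn.execute("INSERT INTO checking VALUES ('George', 50)")
conn.commit()

try:
    # Withdrawing 100 from a balance of 50 would violate the CHECK constraint.
    conn.execute("UPDATE checking SET balance = balance - 100 WHERE name = 'George'")
    conn.commit()
    rejected = False
except sqlite3.IntegrityError:
    conn.rollback()
    rejected = True

balance_after = conn.execute(
    "SELECT balance FROM checking WHERE name = 'George'"
).fetchone()[0]
```

The update is refused and the balance stays at 50, so the database remains in a consistent state without the application checking the rule itself.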
Data Models



Any DBMS uses a data model: collection of concepts for describing data
Schema: description of particular set of data, using some data model
Relational data model: most widely used (by far) data model
  - Oracle, DB2, SQL Server, other SQL DBMSs
  - main concept: relation ~ table of rows & columns
  - a rel’s schema defines its fields
Example: university database

Conceptual schema:
  - Students(ssn: string, name: string, login: string, age: int, gpa: real)
  - Courses(cid: string, cname: string, credits: int)
  - Enrolled(sid: string, cid: string, grade: string)
Physical schema:
  - Relations stored as unordered text files.
  - Indices on first column of each rel
External Schema (View):
  - Course_info(ssn: string, name: string)
  - My_courses(cname: string, grade: string)
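
A sketch of how the conceptual schema and one external view might be declared, using Python's sqlite3 module. The slides give only the view's signature, so the definition of My_courses below is an assumption (it restricts the data to one student and assumes Enrolled.sid holds that student's ssn). All sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Conceptual schema, following the relation signatures on the slide.
    CREATE TABLE Students (ssn TEXT PRIMARY KEY, name TEXT, login TEXT,
                           age INTEGER, gpa REAL);
    CREATE TABLE Courses  (cid TEXT PRIMARY KEY, cname TEXT, credits INTEGER);
    CREATE TABLE Enrolled (sid TEXT, cid TEXT, grade TEXT);

    -- Hypothetical sample rows.
    INSERT INTO Students VALUES ('123456789', 'Howard', 'howard', 20, 3.5);
    INSERT INTO Courses  VALUES ('C20.0046', 'Databases', 3);
    INSERT INTO Enrolled VALUES ('123456789', 'C20.0046', 'A');

    -- External schema: one student's view of their own courses (assumed definition).
    CREATE VIEW My_courses AS
        SELECT c.cname, e.grade
        FROM Enrolled e JOIN Courses c ON e.cid = c.cid
        WHERE e.sid = '123456789';
    """
)

my_courses = conn.execute("SELECT cname, grade FROM My_courses").fetchall()
```

A program that queries My_courses sees only course names and grades. It does not need to know how Students, Courses and Enrolled are laid out or stored.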
How the programmer sees the DBMS

Start with DDL to create tables:
CREATE TABLE Students (
  Name CHAR(30),
  SSN CHAR(9) PRIMARY KEY NOT NULL,
  Category CHAR(20)
) ...

Continue with DML to populate tables:
INSERT INTO Students
VALUES('Howard', '123456789', 'undergraduate')
. . . .
How the programmer sees the DBMS

Tables:

Students:
SSN          Name    Category
123-45-6789  Howard  undergrad
234-56-7890  Wesley  grad
…

Takes:
SSN          CID       Semester
123-45-6789  C20.0046  Spring, 2004
123-45-6789  C20.0056  Spring, 2004
234-56-7890  C20.0046  Fall, 2003
…

Courses:
CID       CName
C20.0046  Databases
C20.0056  Advanced Software

Still implemented as files, but behind the scenes can be quite complex
“data independence” = separate logical view from physical implementation
Querying: Structured Query Language

Find all the students who have taken C20.0046:
  SELECT SSN
  FROM Takes
  WHERE CID='C20.0046'

Find all the students who took C20.0046 last fall:
  SELECT SSN
  FROM Takes
  WHERE CID='C20.0046' AND Semester='Fall, 2003'

Find the students’ names:
  SELECT Name
  FROM Students, Takes
  WHERE Students.SSN=Takes.SSN AND
        CID='C20.0046' AND Semester='Fall, 2003'

Query processor does this efficiently.
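
The three queries can be tried against the sample rows from the earlier table slide. The sketch below loads them into an in-memory SQLite database through Python's sqlite3 module. The Takes table definition and the column widths are assumptions (the slides show DDL only for Students), and an ORDER BY is added to the first query only to make the result order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Students (SSN CHAR(11) PRIMARY KEY NOT NULL,
                           Name CHAR(30), Category CHAR(20));
    CREATE TABLE Takes (SSN CHAR(11), CID CHAR(8), Semester CHAR(20));

    INSERT INTO Students VALUES ('123-45-6789', 'Howard', 'undergrad');
    INSERT INTO Students VALUES ('234-56-7890', 'Wesley', 'grad');

    INSERT INTO Takes VALUES ('123-45-6789', 'C20.0046', 'Spring, 2004');
    INSERT INTO Takes VALUES ('123-45-6789', 'C20.0056', 'Spring, 2004');
    INSERT INTO Takes VALUES ('234-56-7890', 'C20.0046', 'Fall, 2003');
    """
)

# Everyone who has ever taken C20.0046.
ever_taken = conn.execute(
    "SELECT SSN FROM Takes WHERE CID = 'C20.0046' ORDER BY SSN"
).fetchall()

# Only those who took it in Fall 2003.
last_fall = conn.execute(
    "SELECT SSN FROM Takes WHERE CID = 'C20.0046' AND Semester = 'Fall, 2003'"
).fetchall()

# Their names, found by joining Students with Takes on SSN.
names = conn.execute(
    """
    SELECT Name
    FROM Students, Takes
    WHERE Students.SSN = Takes.SSN
      AND CID = 'C20.0046' AND Semester = 'Fall, 2003'
    """
).fetchall()
```

With this data the first query returns both SSNs, the second only 234-56-7890, and the join resolves that SSN to Wesley.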
Database Industry

Relational databases are a great success of theoretical ideas.
  - based on most “theoretical” type of math there is: set theory
DBMS companies are among the largest software companies in the world.
  - Oracle, IBM (with DB2), Microsoft (SQL Server, Microsoft Access), etc.
  - Also open source: MySQL, PostgreSQL, etc.
  - $20B+ industry.
XML (“semi-structured data”) also important
  - New lingua franca for exchanging data
The Study of DBMS

Several aspects:
  - Modeling and design of databases
  - DBMS programming: querying and update
  - DBMS implementation
This course covers all three
  - though more time on first two
Also will look at some more advanced areas
  - XML, data-mining, LDAP?
Databases are used by

DB app programmers
  - desktop app programmers
  - web developers
Database administrators (DBAs)
  - design schemas
  - security/authorization
  - crash recovery
  - tuning
  - better paid than programmers!
Everyone else (perhaps indirectly)

“You may not be interested in databases, but databases are interested in you.” - Trotsky
Course outline

Database design:
  - Entity/Relationship models
  - Modeling constraints
The relational model:
  - Relational algebra
  - Transforming E/R models to relational schemas
SQL
  - Views and triggers
Outline (Continued)

Connecting to a database from a programming language
Storage and indexing
Transactions
XML
Advanced topics

May change as course progresses
  - partly in response to audience
Textbook


Database Management Systems
by Raghu Ramakrishnan, Johannes Gehrke
3rd edition (August 14, 2002)
Available:
  - NYU bookstore
  - Amazon/BN (may be cheaper)
  - Amazon.co.uk (may be cheaper still)
  - Links on class page

SQL Readings

Optional reference: SQL in a Nutshell

Online (free) SQL tutorials include:

A Gentle Introduction to SQL (http://sqlzoo.net/)

SQL for Web Nerds
(http://philip.greenspun.com/sql/)
Grading

Prerequisites: Programming experience
  - presumably C/C++/Java
Work & Grading:
  - Homework 30%: O(4)
  - Project: 30% - see below.
  - Midterm (closed book/notes): 15%
  - Final (closed book/notes): 20%
  - Class participation: 5%
  - Stern Curve
Class attendance is required
  - Absences will seriously affect your total grade
The Project: design end-to-end DB app

data model
  - Identify entities (and fields), relationships
  - Identify resulting relations (tables)
creation of DB in Oracle/MySQL
  - Insertion of real(istic) data
(web) app for accessing/modifying data
  - Identification of “interesting” questions to ask
  - Production of DBMS interface

Work in pairs/threes (start forming now)

Choose topic on your own: previous e.g.s online
Start forming your group today!

Collaboration






Homework and exams done individually
Project done with your team members only
Non-cited use of others’ problem solutions, code, etc. = plagiarism
See Stern’s stern academic honesty policy
Contact me if you’re at all unclear before a particular case
Cite any materials used if you’re at all unclear after a particular case
On-going Feedback

Don’t wait until the end-of-semester course evals to complain or give feedback on how to improve course. (It’s too late for you then!)

Come see me early on during my office hours
  - or send me email with your concerns
“We’re in touch, so you be in touch.”
Summary

DBMS used to maintain, query large datasets.

Benefits include recovery from system crashes, concurrent access, data integrity, security, and quick application development.

Database skills are critical in financial services, marketing and other business areas!
So what is this course about, really?
A bit of everything!
  - Languages: SQL, XPath, XQuery
  - Data modeling
  - Some theory!
    - Functional dependencies, normal forms
    - e.g., how to find most efficient schema for data
  - Algorithms and data structures (in the third part)
    - e.g., indices make data much faster to find – but how?
  - Lots of implementation and hacking for the project
  - Business DBMS examples/cases
  - Most importantly: how to meet real-world needs
For next time

Get the book

Read chapters 1 & 2
For right now: written survey





name
previous cs/is/math/logic courses
previous programming experience
career plans: programmer, DBA, MBA, etc.
why taking class