Introduction to Database Systems

Introduction to Database Design
Donghui Zhang
CCIS, Northeastern University
Outline
• Database and DBMS
• Architecture of Database Applications
• Database Design
• Database Application Programming
Database, DBMS
• A database is a very large, integrated collection of data.
• A Database Management System (DBMS) is software designed to store and manage databases.
• A database application is software that enables users to access the database.
Why DBMS?
• We live in a world experiencing an information explosion.
• Managing such huge amounts of data requires a DBMS.
• The total RDBMS market in 2003 was about $7 billion in license revenue.
• Much more money was spent on developing database applications.
[Figure: RDBMS new license revenue (millions of dollars) by vendor, 2002 vs. 2003: Others, Microsoft, NCR Teradata, Oracle, IBM. Total revenue: $7.1 billion in 2003.]
The worldwide database management software market saw double-digit growth in 2004. The five-year forecast calls for a compound annual growth rate of nearly 6 percent, bringing the market to $12.7 billion in new license revenue by 2009.
Title: Forecast: Database Management Systems Software, Worldwide, 2003-2009
Author: Colleen Graham, Gartner
Date: April 21, 2005
A DBMS Can Provide …
• Data independence and efficient access.
• Reduced application development time.
• Data integrity and security.
• Uniform data administration.
• Concurrent access and recovery from crashes.
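The "data integrity" and "recovery" points can be made concrete with a small sketch. It uses Python's built-in sqlite3 module as a stand-in DBMS (an assumption; the slides do not prescribe one), and the Courses schema is an illustrative choice modeled on the Courses table in the ER section. Constraints reject bad data, and a transaction that fails midway leaves no partial changes behind.

```python
import sqlite3

# Illustrative schema (assumption), modeled on the Courses table.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Courses (
        cid   TEXT PRIMARY KEY,          -- integrity: no duplicate course ids
        title TEXT NOT NULL,
        unit  INTEGER CHECK (unit > 0)   -- integrity: units must be positive
    )""")

def insert_all(rows):
    """Insert rows as ONE transaction: either all are stored or none are."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.executemany("INSERT INTO Courses VALUES (?, ?, ?)", rows)
        return True
    except sqlite3.IntegrityError as err:
        print("rejected:", err)
        return False

def count_courses():
    return conn.execute("SELECT COUNT(*) FROM Courses").fetchone()[0]

insert_all([("CSU430", "Database Design", 4)])
# The second row violates the CHECK constraint, so the whole batch is undone,
# including the otherwise valid CSG131 row.
insert_all([("CSG131", "Transaction Processing", 4),
            ("CSG339", "Data Mining", -1)])
print(count_courses())  # 1
```

The application never has to undo the half-finished batch itself; the DBMS guarantees the all-or-nothing behavior.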
DBMS Historical Milestones
• Early 1960s: Turing Award winner Charles Bachman developed the first DBMS.
• 1970: Turing Award winner Edgar Codd proposed the relational data model.
• 1970s: IBM researchers developed SQL (originally called SEQUEL); it became an ANSI standard in 1986.
Components of Data-Intensive Systems
Three separate types of functionality:
• Data management
• Application logic
• Presentation
Example: Course Enrollment
Build a system through which students can enroll in courses:
• Data management: student info, course info, instructor info, course availability, prerequisites, etc.
• Application logic: logic to add a course, drop a course, create a new course, etc.
• Presentation: log in different users (students, staff, faculty), display forms and human-readable output.
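To illustrate how the three kinds of functionality separate, here is a minimal Python sketch of the course-enrollment example, with sqlite3 standing in for the DBMS. The function names (make_db, add_course, show) and the seats column are hypothetical, introduced only for illustration.

```python
import sqlite3

# --- Data management: the DBMS stores courses and enrollments ---
def make_db():
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE Courses (cid TEXT PRIMARY KEY, title TEXT, seats INTEGER);
        CREATE TABLE Enroll  (ssn TEXT, cid TEXT, PRIMARY KEY (ssn, cid));
        INSERT INTO Courses VALUES ('CSU430', 'Database Design', 1);
    """)
    return db

# --- Application logic: the rules for adding a course ---
def add_course(db, ssn, cid):
    """Enroll ssn in cid if a seat is left; return a status string."""
    row = db.execute("SELECT seats FROM Courses WHERE cid = ?", (cid,)).fetchone()
    if row is None:
        return "no such course"
    taken = db.execute("SELECT COUNT(*) FROM Enroll WHERE cid = ?",
                       (cid,)).fetchone()[0]
    if taken >= row[0]:
        return "course full"
    with db:  # commit the new enrollment
        db.execute("INSERT INTO Enroll VALUES (?, ?)", (ssn, cid))
    return "enrolled"

# --- Presentation: turn results into human-readable output ---
def show(ssn, cid, status):
    return f"Student {ssn}, course {cid}: {status}"

db = make_db()
print(show("1111", "CSU430", add_course(db, "1111", "CSU430")))  # ... enrolled
print(show("2222", "CSU430", add_course(db, "2222", "CSU430")))  # ... course full
```

Note that the check-then-insert in add_course is not safe if two clients enroll at the same moment; that is the kind of problem a DBMS's concurrency control is meant to solve.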
The Three-Tier Architecture
• Presentation tier: client program (web browser)
• Middle tier: application server
• Data management tier: database system
Example: What We Use
• Presentation tier: client program (web browser)
• Middle tier: application server (Apache, JSP)
• Data management tier: database system (MySQL)
HTML: An Example
<html>
  <head></head>
  <body>
    <h1>Barns and Nobble Internet Bookstore</h1>
    Our inventory:
    <h3>Science</h3>
    <b>The Character of Physical Law</b>
    <ul>
      <li>Author: Richard Feynman</li>
      <li>Published 1980</li>
      <li>Hardcover</li>
    </ul>
    <h3>Fiction</h3>
    <b>Waiting for the Mahatma</b>
    <ul>
      <li>Author: R.K. Narayan</li>
      <li>Published 1981</li>
    </ul>
    <b>The English Teacher</b>
    <ul>
      <li>Author: R.K. Narayan</li>
      <li>Published 1980</li>
      <li>Paperback</li>
    </ul>
  </body>
</html>
HTML: Static vs. Dynamic
• Static: you create an HTML file, which is sent to the client’s web browser upon request. E.g.:
  • your CCIS login is ‘donghui’,
  • your HTML file is /home/donghui/.www/index.html,
  • the URL is http://www.ccs.neu.edu/home/donghui
• Dynamic: the HTML is generated on the fly by server-side code, e.g., your JSP (or ASP.NET) code.
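A dynamic page is simply a program that builds the HTML text from data at request time. The sketch below regenerates the bookstore page in Python rather than JSP or ASP.NET (a substitution for illustration only). The BOOKS list is hardcoded here, whereas a real middle tier would fetch it from the database, and render_inventory is a hypothetical name.

```python
from html import escape

# Hypothetical inventory rows, assumed sorted by category.
BOOKS = [
    ("Science", "The Character of Physical Law", "Richard Feynman", 1980, "Hardcover"),
    ("Fiction", "Waiting for the Mahatma", "R.K. Narayan", 1981, None),
    ("Fiction", "The English Teacher", "R.K. Narayan", 1980, "Paperback"),
]

def render_inventory(books):
    """Build the bookstore page from data instead of a hand-written file."""
    parts = ["<html>", "<head></head>", "<body>",
             "<h1>Barns and Nobble Internet Bookstore</h1>", "Our inventory:"]
    current = None
    for category, title, author, year, binding in books:
        if category != current:               # start a new category heading
            parts.append(f"<h3>{escape(category)}</h3>")
            current = category
        parts.append(f"<b>{escape(title)}</b>")
        parts.append("<ul>")
        parts.append(f"<li>Author: {escape(author)}</li>")
        parts.append(f"<li>Published {year}</li>")
        if binding:                           # binding is optional
            parts.append(f"<li>{escape(binding)}</li>")
        parts.append("</ul>")
    parts += ["</body>", "</html>"]
    return "\n".join(parts)

print(render_inventory(BOOKS))
```

Escaping every data value keeps characters such as "&" or "<" in a title from breaking the generated markup.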
Another View
• Client machines: client browsers 1, 2, and 3
• Machine 2: Apache running your JSP code
• Machine 1: MySQL storing your database
Client-Server Architecture
• Data management: DBMS at the server.
• Presentation: client program.
• Application logic: can go either way.
  • If combined with the server: thin-client architecture.
  • If combined with the client: thick-client architecture.
Thin-Client Architecture
• The database server and the web server are too tightly coupled.
• E.g., the application logic cannot access multiple databases on different servers.
Thick-Client Architecture
• No central place to update the business logic.
• Security issues: the server needs to trust clients.
• Does not scale beyond a few hundred clients.
Advantages of the Three-Tier Architecture
• Heterogeneous systems
  • Tiers can be independently maintained, modified, and replaced.
• Thin clients
  • Only the presentation layer runs at clients (web browsers).
• Integrated data access
  • Several database systems can be handled transparently at the middle tier.
  • Central management of connections.
• Scalability
  • Replication at the middle tier permits scalability of the business logic.
• Software development
  • Code for business logic is centralized.
  • Interaction between tiers goes through well-defined APIs, so standard components can be reused at each tier.
ER-Model
• Entity: a real-world object distinguishable from other objects. E.g., Students, Courses.
• An entity has multiple attributes. E.g., Students have ssn, name, phone.
• Entities have relationships with each other. E.g., Students enroll in Courses.
Example of ER Diagram
[ER diagram: entity set Students (ssn, name, phone) linked through relationship set Enroll (time) to entity set Courses (cid, title, unit).]
To implement the above design, store three tables in the database.

Students
ssn   name    phone
1111  John    617-373-5120
2222  Alice   781-322-6084
3333  Victor  617-442-7798

Courses
cid     title                   unit
CSU430  Database Design         4
CSG131  Transaction Processing  4
CSG339  Data Mining             4

Enroll
ssn   cid     time
1111  CSU430  Fall’03
1111  CSG339  Spring’04
2222  CSG131  Winter’03
2222  CSG339  Spring’04
3333  CSU430  Winter’01
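The three tables can be created and queried as follows. This is a sketch using Python's sqlite3 module (an assumption; the course stack uses MySQL). Choosing (ssn, cid) as the primary key of Enroll follows the usual ER-to-relational translation and assumes a student enrolls in a given course at most once. students_in is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Students (ssn TEXT PRIMARY KEY, name TEXT, phone TEXT);
    CREATE TABLE Courses  (cid TEXT PRIMARY KEY, title TEXT, unit INTEGER);
    -- The Enroll relationship becomes a table holding the keys of both
    -- participating entity sets plus the relationship's own attribute (time).
    CREATE TABLE Enroll (
        ssn  TEXT REFERENCES Students(ssn),
        cid  TEXT REFERENCES Courses(cid),
        time TEXT,
        PRIMARY KEY (ssn, cid)
    );
""")
with conn:  # load the sample rows in one transaction
    conn.executemany("INSERT INTO Students VALUES (?, ?, ?)", [
        ("1111", "John", "617-373-5120"),
        ("2222", "Alice", "781-322-6084"),
        ("3333", "Victor", "617-442-7798")])
    conn.executemany("INSERT INTO Courses VALUES (?, ?, ?)", [
        ("CSU430", "Database Design", 4),
        ("CSG131", "Transaction Processing", 4),
        ("CSG339", "Data Mining", 4)])
    conn.executemany("INSERT INTO Enroll VALUES (?, ?, ?)", [
        ("1111", "CSU430", "Fall'03"), ("1111", "CSG339", "Spring'04"),
        ("2222", "CSG131", "Winter'03"), ("2222", "CSG339", "Spring'04"),
        ("3333", "CSU430", "Winter'01")])

def students_in(title):
    """Join the three tables to list who enrolled in the named course."""
    rows = conn.execute("""
        SELECT S.name
        FROM Students S, Enroll E, Courses C
        WHERE S.ssn = E.ssn AND E.cid = C.cid AND C.title = ?
        ORDER BY S.name""", (title,))
    return [name for (name,) in rows]

print(students_in("Data Mining"))  # ['Alice', 'John']
```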
Key Constraint in ER Diagram
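[ER diagram: entity set Students (ssn, name, phone) linked through relationship set BelongsTo to entity set Departments (did, dname, address). The key constraint means each student belongs to at most one department.]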
A many-to-one relationship does not need to be implemented as a separate table: store the department’s key (did) as a column of Students.
Students
ssn   name    phone         did
1111  John    617-373-5120  1
2222  Alice   781-322-6084  1
3333  Victor  617-442-7798  3

Departments
did  dname                   address
1    Computer Science        #161 Cullinane
2    Electrical Engineering  #300 Egan
3    Physics                 #112 Richard
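The many-to-one design translates directly into SQL. The sketch below again runs through Python's sqlite3 as a stand-in for MySQL (an assumption). The REFERENCES clause makes the DBMS reject a student whose did matches no department. dept_of and add_student are hypothetical helpers, and the extra test student is made-up data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when enabled
conn.executescript("""
    CREATE TABLE Departments (did INTEGER PRIMARY KEY, dname TEXT, address TEXT);
    -- BelongsTo is folded into Students: a single did column means each
    -- student can reference at most one department (the key constraint).
    CREATE TABLE Students (
        ssn   TEXT PRIMARY KEY,
        name  TEXT,
        phone TEXT,
        did   INTEGER REFERENCES Departments(did)
    );
    INSERT INTO Departments VALUES
        (1, 'Computer Science', '#161 Cullinane'),
        (2, 'Electrical Engineering', '#300 Egan'),
        (3, 'Physics', '#112 Richard');
    INSERT INTO Students VALUES
        ('1111', 'John', '617-373-5120', 1),
        ('2222', 'Alice', '781-322-6084', 1),
        ('3333', 'Victor', '617-442-7798', 3);
""")

def dept_of(ssn):
    """Follow the many-to-one link from a student to its department."""
    row = conn.execute("""
        SELECT D.dname FROM Students S JOIN Departments D ON S.did = D.did
        WHERE S.ssn = ?""", (ssn,)).fetchone()
    return row[0] if row else None

def add_student(ssn, name, phone, did):
    """Return False if the DBMS rejects the row (e.g. an unknown did)."""
    try:
        with conn:
            conn.execute("INSERT INTO Students VALUES (?, ?, ?, ?)",
                         (ssn, name, phone, did))
        return True
    except sqlite3.IntegrityError:
        return False

print(dept_of("3333"))                       # Physics
print(add_student("4444", "Test", None, 9))  # False: department 9 does not exist
```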
Some Other Design Concepts
• Primary key
• Participation constraint
• Normal forms (BCNF, 3NF, etc.)
• IS-A hierarchy
• Ternary relationships
SQL Query
Find the students in the Computer Science department.
• If we know that its did is 1:
SELECT S.name
FROM Students S
WHERE S.did=1
• Otherwise:
SELECT S.name
FROM Students S, Departments D
WHERE D.did=S.did AND
D.dname='Computer Science'
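Both queries can be checked against the sample Students and Departments tables. The sketch below uses Python's sqlite3 as a stand-in DBMS (an assumption), runs the two queries unchanged, and shows that they return the same names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Departments (did INTEGER PRIMARY KEY, dname TEXT, address TEXT);
    CREATE TABLE Students (ssn TEXT PRIMARY KEY, name TEXT, phone TEXT, did INTEGER);
    INSERT INTO Departments VALUES
        (1, 'Computer Science', '#161 Cullinane'),
        (2, 'Electrical Engineering', '#300 Egan'),
        (3, 'Physics', '#112 Richard');
    INSERT INTO Students VALUES
        ('1111', 'John',   '617-373-5120', 1),
        ('2222', 'Alice',  '781-322-6084', 1),
        ('3333', 'Victor', '617-442-7798', 3);
""")

Q_KNOWN_DID = """
    SELECT S.name
    FROM Students S
    WHERE S.did=1"""

Q_BY_NAME = """
    SELECT S.name
    FROM Students S, Departments D
    WHERE D.did=S.did AND
          D.dname='Computer Science'"""

def names(query):
    # Neither query has ORDER BY, so sort before comparing the results.
    return sorted(name for (name,) in conn.execute(query))

print(names(Q_KNOWN_DID))  # ['Alice', 'John']
print(names(Q_BY_NAME))    # ['Alice', 'John']
```

The second query is more robust: it keeps working even if Computer Science is later given a different did.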
SQL in Application Code
• SQL commands can be called from within a program written in a host language (e.g., C++, Java).
• Two main integration approaches:
  • Embed SQL in the host language (Embedded SQL, SQLJ).
  • Use a special API to call SQL commands (JDBC).
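Python's DB-API (PEP 249) plays roughly the role that JDBC plays for Java, so the call-level approach can be sketched with the sqlite3 module. As with a JDBC PreparedStatement, the ? placeholder binds a value separately from the SQL text. find_students is a hypothetical helper, and the tables are trimmed to the columns the query needs.

```python
import sqlite3

def find_students(conn, dname):
    """Return the sorted names of students in department `dname`."""
    cur = conn.execute(
        "SELECT S.name FROM Students S, Departments D "
        "WHERE D.did = S.did AND D.dname = ? ORDER BY S.name",
        (dname,),  # bound parameter: the value never becomes part of the SQL text
    )
    return [name for (name,) in cur.fetchall()]

# Minimal setup so the sketch runs on its own (sample rows, trimmed columns).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Departments (did INTEGER PRIMARY KEY, dname TEXT);
    CREATE TABLE Students (ssn TEXT PRIMARY KEY, name TEXT, did INTEGER);
    INSERT INTO Departments VALUES (1, 'Computer Science'), (3, 'Physics');
    INSERT INTO Students VALUES ('1111', 'John', 1), ('2222', 'Alice', 1),
                                ('3333', 'Victor', 3);
""")
print(find_students(conn, "Computer Science"))  # ['Alice', 'John']
# A malicious-looking input is treated as a plain string value:
print(find_students(conn, "x' OR '1'='1"))       # []
```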
Implementation of Database Systems: Introduction
Donghui Zhang
Partially using Prof. Hector Garcia-Molina’s slides (Notes01)
http://www-db.stanford.edu/~ullman/dscb.html
Isn’t Implementing a Database System Simple?
[Diagram: Relations, Statements → Results]
Introducing the Database Management System
• The latest from Megatron Labs
• Incorporates the latest relational technology
• UNIX compatible
Megatron 3000 Implementation Details
• Relations are stored in ASCII files, e.g., relation R is in /usr/db/R:
Smith # 123 # CS
Jones # 522 # EE
...
Megatron 3000 Implementation Details
• A directory file (ASCII) in /usr/db/directory:
R1 # A # INT # B # STR …
R2 # C # STR # A # INT …
...
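Reading these two file formats takes only a few lines of code. The sketch below is a hypothetical Python reconstruction, not Megatron's actual code. It parses in-memory strings instead of /usr/db/directory and /usr/db/R, and the directory entry for R (A STR, B INT, C STR) is an assumption consistent with the sample sessions.

```python
# Hypothetical file contents following the slides' '#'-separated format.
DIRECTORY = "R # A # STR # B # INT # C # STR\n"
R_FILE = "Smith # 123 # CS\nJones # 522 # EE\n"

def split_fields(line):
    """Split one '#'-separated line and trim surrounding blanks."""
    return [field.strip() for field in line.split("#")]

def parse_directory(text):
    """Map each relation name to its list of (attribute, type) pairs."""
    schemas = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, *rest = split_fields(line)
        # rest alternates attribute, type, attribute, type, ...
        schemas[name] = list(zip(rest[0::2], rest[1::2]))
    return schemas

def parse_relation(text, schema):
    """Turn each line of a relation file into a dict, converting INT fields."""
    tuples = []
    for line in text.splitlines():
        if not line.strip():
            continue
        values = split_fields(line)
        tuples.append({attr: int(v) if typ == "INT" else v
                       for (attr, typ), v in zip(schema, values)})
    return tuples

schema = parse_directory(DIRECTORY)["R"]
print(schema)  # [('A', 'STR'), ('B', 'INT'), ('C', 'STR')]
print(parse_relation(R_FILE, schema))
```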
Megatron 3000 Sample Sessions
% MEGATRON3000
Welcome to MEGATRON 3000!
&
...
& quit
%
Megatron 3000 Sample Sessions
& select *
  from R #
Relation R
A      B    C
SMITH  123  CS
&
Megatron 3000 Sample Sessions
& select A,B
  from R,S
  where R.A = S.A and S.C > 100 #
A    B
123  CAR
522  CAT
&
Megatron 3000

To execute “select * from R where
condition”:
(1) Read directory file to get R attributes
(2) Read R file, for each line:
(a) Check condition
(b) If OK, display
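These selection steps amount to a full scan of the file. In this hypothetical Python sketch the schema is given directly (as if already read from the directory file), the condition is passed as a function instead of being parsed from SQL, and "display" means collecting the matching tuples.

```python
# Hypothetical schema (as if read from the directory file) and R file lines.
R_SCHEMA = [("A", "STR"), ("B", "INT"), ("C", "STR")]
R_LINES = ["Smith # 123 # CS", "Jones # 522 # EE"]

def select_star(schema, lines, condition):
    """select * from R where condition, executed by a full file scan."""
    result = []
    for line in lines:                          # (2) read the R file line by line
        values = [v.strip() for v in line.split("#")]
        tup = {a: int(v) if t == "INT" else v   # use the directory's types
               for (a, t), v in zip(schema, values)}
        if condition(tup):                      # (a) check the condition
            result.append(tup)                  # (b) "display" = collect here
    return result

print(select_star(R_SCHEMA, R_LINES, lambda t: t["B"] > 200))
```

Every query reads every line, no matter how few tuples qualify; this is the cost the later slides attack.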
Megatron 3000

To execute “select A,B from R,S where
condition”:
(1) Read directory file to get R,S attributes
(2) Read R file, for each line:
(a) Read S file, for each line:
(i) Create join tuple
(ii) Check condition
(iii) Display if OK
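The two-file procedure is a nested-loop join. The sketch below is hypothetical Python, with made-up R and S contents chosen to reproduce the sample session's output. It also counts how many join tuples get built: every R line forces a full pass over S.

```python
# Hypothetical relations for: select A,B from R,S where R.A = S.A and S.C > 100
R = [{"A": 123, "B": "CAR"}, {"A": 522, "B": "CAT"}, {"A": 777, "B": "DOG"}]
S = [{"A": 123, "C": 150}, {"A": 522, "C": 300}, {"A": 777, "C": 50}]

def nested_loop_join(r, s, condition, columns):
    """For each R tuple, rescan all of S; |R| * |S| join tuples are built."""
    result, built = [], 0
    for tr in r:                                     # (2) for each line of R
        for ts in s:                                 # (a) read S, line by line
            joined = {**{"R." + k: v for k, v in tr.items()},
                      **{"S." + k: v for k, v in ts.items()}}  # (i) join tuple
            built += 1
            if condition(joined):                    # (ii) check condition
                result.append(tuple(joined[c] for c in columns))  # (iii) output
    return result, built

rows, built = nested_loop_join(
    R, S,
    lambda t: t["R.A"] == t["S.A"] and t["S.C"] > 100,
    ["R.A", "R.B"])
print(rows)   # [(123, 'CAR'), (522, 'CAT')]
print(built)  # 9
```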
What’s wrong with the Megatron 3000 DBMS?
• Expensive update and search, e.g.:
  - To locate an employee with a given SSN, we must scan the whole file.
  - To change “Cat” to “Cats”, we must rewrite the complete file.
• Solution: Indexing!
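The difference between a file scan and an index lookup shows up clearly when counting the records examined. In this sketch a Python dict stands in for an index (an assumption; a real DBMS would typically keep a B+-tree or hash index on disk), and the employee records are synthetic.

```python
# Hypothetical employee file: (ssn, name) records stored in file order.
EMPLOYEES = [(f"{i:09d}", f"emp{i}") for i in range(10_000)]

def scan_lookup(rows, ssn):
    """Without an index: examine records one by one until the SSN matches."""
    examined = 0
    for row in rows:
        examined += 1
        if row[0] == ssn:
            return row, examined
    return None, examined

def build_index(rows):
    """Built once: map each SSN to the record's position in the file."""
    return {row[0]: pos for pos, row in enumerate(rows)}

def index_lookup(rows, index, ssn):
    """With the index: one probe gives the position, one record is read."""
    pos = index.get(ssn)
    return (rows[pos] if pos is not None else None), 1

index = build_index(EMPLOYEES)
print(scan_lookup(EMPLOYEES, "000009999"))          # (('000009999', 'emp9999'), 10000)
print(index_lookup(EMPLOYEES, index, "000009999"))  # (('000009999', 'emp9999'), 1)
```

The index costs extra space and must be maintained on every update, which is the trade-off a DBMS manages for you.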
What’s wrong with the Megatron 3000 DBMS?
• Brute-force query processing, e.g.:
  select *
  from R,S
  where R.A = S.A and S.B > 1000
  - Should we apply the selection first?
  - Is there a more efficient join?
• Solution: Query optimization!
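Both questions can be checked with a small experiment. This sketch uses synthetic relations (an assumption) and projects only (A, B) for brevity. It compares the number of R-S pairs examined when the selection runs after versus before the join, and adds a hash join as one example of a more efficient join algorithm.

```python
# Hypothetical relations: R.A and S.A both run 0..99, and S.B = 20 * A,
# so S.B > 1000 holds only for A = 51..99 (49 tuples).
R = [{"A": i} for i in range(100)]
S = [{"A": i, "B": 20 * i} for i in range(100)]

def join_then_select(r, s):
    """Brute force: build every pair, then test both predicates."""
    out, pairs = [], 0
    for tr in r:
        for ts in s:
            pairs += 1
            if tr["A"] == ts["A"] and ts["B"] > 1000:
                out.append((tr["A"], ts["B"]))
    return out, pairs

def select_then_join(r, s):
    """Push the selection S.B > 1000 below the join, then join fewer tuples."""
    s_small = [ts for ts in s if ts["B"] > 1000]
    out, pairs = [], 0
    for tr in r:
        for ts in s_small:
            pairs += 1
            if tr["A"] == ts["A"]:
                out.append((tr["A"], ts["B"]))
    return out, pairs

def select_then_hash_join(r, s):
    """Also replace the nested loop with a hash join on A."""
    buckets = {}
    for ts in s:
        if ts["B"] > 1000:
            buckets.setdefault(ts["A"], []).append(ts)
    # One hash probe per R tuple instead of a pass over S.
    return [(tr["A"], ts["B"]) for tr in r for ts in buckets.get(tr["A"], [])]

naive, naive_pairs = join_then_select(R, S)
pushed, pushed_pairs = select_then_join(R, S)
print(naive_pairs, pushed_pairs)                       # 10000 4900
print(naive == pushed == select_then_hash_join(R, S))  # True
```

All three plans return the same answer; a query optimizer's job is to pick the cheapest one automatically.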
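What’s wrong with the Megatron 3000 DBMS?
• No concurrency control or reliability, e.g.:
  - If two client programs both read your bank balance ($5000) and each add $1000 to it, the final balance can be $6000 instead of $7000: one update is lost.
  - A crash in the middle of an update can leave the data inconsistent.
• Solution: Transaction management!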
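The bank-balance problem is the classic lost update. The first function below replays the bad interleaving by hand. The Account class then uses a Python lock as a simplified stand-in for a DBMS's concurrency control (real systems use protocols such as two-phase locking or multiversioning, plus logging for crash recovery, none of which is shown here). All names are hypothetical.

```python
import threading

def interleaved_without_control(balance, amount):
    """Replay the bad schedule by hand: both clients read before either writes."""
    read_by_a = balance           # client A reads the balance
    read_by_b = balance           # client B reads the same balance
    balance = read_by_a + amount  # A writes its result
    balance = read_by_b + amount  # B overwrites it: A's deposit is lost
    return balance

class Account:
    """Stand-in for a row protected by the DBMS's concurrency control."""
    def __init__(self, balance):
        self.balance = balance
        self._lock = threading.Lock()

    def deposit(self, amount):
        with self._lock:  # the read-modify-write runs as one isolated unit
            current = self.balance
            self.balance = current + amount

def run_clients(account, amount, n_clients=2):
    """Let several client threads deposit concurrently, then report the balance."""
    threads = [threading.Thread(target=account.deposit, args=(amount,))
               for _ in range(n_clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return account.balance

print(interleaved_without_control(5000, 1000))  # 6000: one deposit is lost
print(run_clients(Account(5000), 1000))         # 7000
```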