CS3431: Database Systems I
Introduction
Instructor: Elke A. Rundensteiner
rundenst@cs.wpi.edu
Rundensteiner-CS3431
What is a Database System?
• Database: a large collection of related data
• Usually too large to fit in computer memory at once
• Usually accessed by many users, all of whom expect fast access
• Focus: information and knowledge, rather than computation
Database Applications
Have you ever used a database application?
• E-commerce: books etc. at Amazon, B&N
• Banks: your valuable $$ and ATM transactions
• Airlines: manage flights to get you places
• Universities: manage student enrollment
• GIS (maps): find restaurants closest to WPI
• WWW (World Wide Web): blogs, wikis, etc.?
• Bioinformatics (genome data)
Data sets increasing in diversity and volume are everywhere!
Why use a DBMS, and not files?
• Data independence (robustness under change)
• Efficient access even on huge data sets
• Reduced application development time
• Data integrity: ensures consistency of data even with multiple users
• Recovery from crashes, security, etc.
Basic Terminology

• Data Model: a collection of “types” used for describing data
• Data Schema: describes structures for a particular application, using the given model
• Database: collection of actual data that conforms to the given schema
• Database Management System: software that allows us to create, use, and maintain a database (conforming to the given model)
Relational Data Models

• The relational model of data: the most widely used model today.
• Main concept: relation, basically a table with rows and columns.
• Every relation has a schema, which describes the columns, or fields.
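A minimal sketch of the difference between a relation's schema and its rows, using Python's built-in sqlite3 module. SQLite is an assumption here (the slides name no particular DBMS), and the column types and the primary key are guesses made for illustration.

```python
import sqlite3

# In-memory SQLite database; SQLite is an assumption, the slides name no DBMS.
conn = sqlite3.connect(":memory:")

# The schema of the Flight relation: column names and (assumed) types.
conn.execute(
    "CREATE TABLE Flight ("
    " flightNo INTEGER PRIMARY KEY,"
    " start TEXT,"
    " destination TEXT,"
    " miles INTEGER)"
)

# The schema exists even while the relation holds no rows.
columns = [row[1] for row in conn.execute("PRAGMA table_info(Flight)")]
row_count = conn.execute("SELECT COUNT(*) FROM Flight").fetchone()[0]

# Rows are the data; each row supplies one value per column.
conn.execute("INSERT INTO Flight VALUES (101, 'BOS', 'LAX', 3000)")
rows = conn.execute("SELECT * FROM Flight").fetchall()

print(columns)
print(row_count, rows)
```

The column list comes from the schema alone, while the row count changes as data is inserted.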

Example Database : Relational
Tabular View of Data: Airline System
Flight
flightNo  start  destination  miles
101       BOS    LAX          3000
102       PVD    LAX          2900

Passenger
pName  ffNumber  DoB   milesEarned
Joe    1001      1980  12000
Mary   1002      1981  11000

FlewIn
flightNo  ffNumber  date
101       1001      Jan 4
102       1002      Jan 5

The tabular view of data is called the Relational Model.
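As a rough sketch, the three example tables can be loaded into an in-memory SQLite database and combined with a join. The column types, the primary keys, and the REFERENCES clauses are assumptions added for illustration; the slide only gives the column names and the sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Flight (flightNo INTEGER PRIMARY KEY, start TEXT,
                     destination TEXT, miles INTEGER);
CREATE TABLE Passenger (pName TEXT, ffNumber INTEGER PRIMARY KEY,
                        DoB INTEGER, milesEarned INTEGER);
-- The REFERENCES clauses are assumed; the slide does not state any keys.
CREATE TABLE FlewIn (flightNo INTEGER REFERENCES Flight(flightNo),
                     ffNumber INTEGER REFERENCES Passenger(ffNumber),
                     date TEXT);
INSERT INTO Flight VALUES (101, 'BOS', 'LAX', 3000), (102, 'PVD', 'LAX', 2900);
INSERT INTO Passenger VALUES ('Joe', 1001, 1980, 12000),
                             ('Mary', 1002, 1981, 11000);
INSERT INTO FlewIn VALUES (101, 1001, 'Jan 4'), (102, 1002, 'Jan 5');
""")

# FlewIn links passengers to flights through the shared columns.
trips = conn.execute("""
    SELECT P.pName, F.start, F.destination, W.date
    FROM Passenger P
    JOIN FlewIn W ON W.ffNumber = P.ffNumber
    JOIN Flight F ON F.flightNo = W.flightNo
    ORDER BY P.pName
""").fetchall()

for trip in trips:
    print(trip)
```

FlewIn stores no names or airports of its own; it only records which ffNumber was on which flightNo, and the join recovers the rest.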
Levels of Abstraction
• External schema (view): describes how users see the data
• Logical schema: describes the logical structures used
• Physical schema: describes files and indexes
[Figure: View1, View2, View3 → Logical Schema → Physical Schema → disk]
Levels of Abstraction: Example

• Logical (Conceptual) Schema:
  • Flight, Passenger, FlewIn tables
• Physical Schema:
  • Flight table stored as a sorted file
  • Index on flightNo attribute for Flight relation
• Views (External Schema):
  • NoOfPassengers (flightNo, date, numPassengers)
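The three levels in this example might look roughly as follows in SQLite. The slide gives only the signature of NoOfPassengers, so the view definition (a count per flight and date) is an assumed reading; the index name and the second passenger on flight 101 are made up for the sketch. SQLite offers no direct way to request a sorted file, so only the index stands in for the physical level.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Logical schema (only the two tables this sketch needs).
CREATE TABLE Flight (flightNo INTEGER, start TEXT, destination TEXT,
                     miles INTEGER);
CREATE TABLE FlewIn (flightNo INTEGER, ffNumber INTEGER, date TEXT);

-- Physical schema: an index on flightNo (the index name is made up).
CREATE INDEX idx_flight_no ON Flight(flightNo);

-- External schema: passengers counted per flight and date (assumed definition).
CREATE VIEW NoOfPassengers AS
    SELECT flightNo, date, COUNT(*) AS numPassengers
    FROM FlewIn
    GROUP BY flightNo, date;

-- Sample rows; the second passenger on flight 101 is invented for the sketch.
INSERT INTO FlewIn VALUES (101, 1001, 'Jan 4'), (101, 1002, 'Jan 4'),
                          (102, 1002, 'Jan 5');
""")

# Users of the view never mention FlewIn or the index.
counts = conn.execute(
    "SELECT flightNo, date, numPassengers FROM NoOfPassengers "
    "ORDER BY flightNo, date"
).fetchall()
print(counts)
```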
Data Independence

• Applications are insulated from how data is structured and stored.
• Logical data independence:
  • Protection from changes in the logical structure of data.
  • The logical schema can change, but views need not change.
• Physical data independence:
  • Protection from changes in the physical structure of data.
  • The physical schema (e.g., indexes) can change, but the logical schema need not change.
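A small sketch of logical data independence, under assumptions: the view name FrequentFlyer and the split of Passenger into Person and Miles are hypothetical, chosen only to show an application query that keeps working after the logical schema changes. The view is redefined by whoever maintains the schema; the application code is untouched.

```python
import sqlite3

def app_query(conn):
    # The application only knows the view, not the underlying tables.
    return conn.execute(
        "SELECT pName, milesEarned FROM FrequentFlyer ORDER BY pName"
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Passenger (pName TEXT, ffNumber INTEGER, DoB INTEGER,
                        milesEarned INTEGER);
INSERT INTO Passenger VALUES ('Joe', 1001, 1980, 12000),
                             ('Mary', 1002, 1981, 11000);
-- FrequentFlyer is a hypothetical view name.
CREATE VIEW FrequentFlyer AS SELECT pName, milesEarned FROM Passenger;
""")
before = app_query(conn)

# Logical schema change: move milesEarned into its own (hypothetical) table.
conn.executescript("""
DROP VIEW FrequentFlyer;
CREATE TABLE Person (pName TEXT, ffNumber INTEGER, DoB INTEGER);
CREATE TABLE Miles (ffNumber INTEGER, milesEarned INTEGER);
INSERT INTO Person SELECT pName, ffNumber, DoB FROM Passenger;
INSERT INTO Miles SELECT ffNumber, milesEarned FROM Passenger;
DROP TABLE Passenger;
CREATE VIEW FrequentFlyer AS
    SELECT p.pName, m.milesEarned
    FROM Person p JOIN Miles m ON m.ffNumber = p.ffNumber;
""")
after = app_query(conn)

print(before == after)
```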
Efficient access

• Indexing:
  • Indexes give direct access to the “necessary” portion of the data, as opposed to sequential access in files.
• Costing:
  • Estimate expected execution times
• Query optimization:
  • Automatically determine and prepare optimal access plans for getting to the data
• Optimizer = “The Bread and Butter of a DBMS!”
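One way to watch an optimizer choose an access plan is SQLite's EXPLAIN QUERY PLAN statement, sketched below. The 10,000 filler rows and the index name are made up, and the exact wording of the plan text differs between SQLite versions, so the sketch prints the plans rather than promising a particular string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Flight (flightNo INTEGER, start TEXT, destination TEXT,"
    " miles INTEGER)"
)
# Filler rows so the table is not trivially small (made-up data).
conn.executemany(
    "INSERT INTO Flight VALUES (?, 'BOS', 'LAX', 3000)",
    [(n,) for n in range(1, 10001)],
)

query = "SELECT miles FROM Flight WHERE flightNo = 101"

def plan(conn, sql):
    # The last column of each EXPLAIN QUERY PLAN row is a readable description.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql)
    return " | ".join(row[-1] for row in rows)

plan_without_index = plan(conn, query)  # expected to be a sequential scan
conn.execute("CREATE INDEX idx_flight_no ON Flight(flightNo)")
plan_with_index = plan(conn, query)     # expected to search via the index

print(plan_without_index)
print(plan_with_index)
```

The query text is identical in both calls; only the plan chosen by the DBMS changes once the index exists.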
Reduced application development time
• Higher level of data abstraction
• Queries are written in a high-level language tailored for database applications
Example Query:
SELECT P.pName
FROM Passenger P, FlewIn F
WHERE P.ffNumber = F.ffNumber
AND F.flightNo = 101
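To make the abstraction gap concrete, the sketch below answers "which passengers flew on flight 101?" twice: once with hand-written loops over plain Python lists, and once with a single declarative query in SQLite. In the example tables flightNo is a column of FlewIn, so the query joins Passenger with FlewIn. The variable names are made up for the illustration.

```python
import sqlite3

passengers = [("Joe", 1001, 1980, 12000), ("Mary", 1002, 1981, 11000)]
flew_in = [(101, 1001, "Jan 4"), (102, 1002, "Jan 5")]

# Hand-written version: the programmer spells out how to find the answer.
by_hand = []
for flight_no, ff_number, _date in flew_in:
    if flight_no == 101:
        for p_name, p_ff_number, _dob, _miles in passengers:
            if p_ff_number == ff_number:
                by_hand.append(p_name)

# Declarative version: the query states what is wanted; the DBMS decides how.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Passenger (pName TEXT, ffNumber INTEGER, DoB INTEGER,"
    " milesEarned INTEGER)"
)
conn.execute("CREATE TABLE FlewIn (flightNo INTEGER, ffNumber INTEGER, date TEXT)")
conn.executemany("INSERT INTO Passenger VALUES (?, ?, ?, ?)", passengers)
conn.executemany("INSERT INTO FlewIn VALUES (?, ?, ?)", flew_in)
by_query = [
    row[0]
    for row in conn.execute(
        "SELECT P.pName FROM Passenger P "
        "JOIN FlewIn F ON F.ffNumber = P.ffNumber "
        "WHERE F.flightNo = 101"
    )
]

print(by_hand, by_query)
```

The loop version hard-codes an access strategy (scan FlewIn, then scan Passenger), whereas the query leaves that choice to the DBMS.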
Data Integrity

• DBMS ensures data is consistent under concurrent access
  • E.g., multiple airline staff trying to reserve a seat for different customers.
• Concepts:
  • Transactions: grouping multiple instructions (reads/writes) into one atomic unit
  • Locks: locking of resources (tables)
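A minimal sketch of a transaction as one atomic unit, using the seat reservation scenario. The Seat table and the reserve function are hypothetical and not part of the example schema. The sketch uses a single connection, so it shows atomicity (all or nothing) and does not demonstrate locking between concurrent users.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Seat is a hypothetical table, not part of the example schema.
CREATE TABLE Seat (flightNo INTEGER, seatNo TEXT, ffNumber INTEGER,
                   PRIMARY KEY (flightNo, seatNo));
CREATE TABLE FlewIn (flightNo INTEGER, ffNumber INTEGER, date TEXT);
INSERT INTO Seat VALUES (101, '12A', NULL);
""")

def reserve(conn, flight_no, seat_no, ff_number, date):
    """Record the trip and claim the seat as one atomic unit."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "INSERT INTO FlewIn VALUES (?, ?, ?)",
                (flight_no, ff_number, date),
            )
            claimed = conn.execute(
                "UPDATE Seat SET ffNumber = ? "
                "WHERE flightNo = ? AND seatNo = ? AND ffNumber IS NULL",
                (ff_number, flight_no, seat_no),
            ).rowcount
            if claimed != 1:
                raise RuntimeError("seat already taken")
        return True
    except RuntimeError:
        return False

first = reserve(conn, 101, "12A", 1001, "Jan 4")
second = reserve(conn, 101, "12A", 1002, "Jan 4")
trips = conn.execute("SELECT ffNumber FROM FlewIn").fetchall()

print(first, second, trips)
```

The second attempt inserts its FlewIn row before discovering the seat is taken; the rollback removes that row again, so no half-finished reservation remains.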
Recovery from Crashes

• If the system crashes in the middle of a transaction, recovery must be provided:
  • Cannot afford to lose data
  • Ideas: logging, commit/rollback of transactions
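The commit/rollback idea can be sketched with a file-backed SQLite database. Closing the connection without committing is only a stand-in for a crash (an assumption of this sketch); a real DBMS relies on its log to undo unfinished work after an actual failure. The file name is arbitrary.

```python
import os
import sqlite3
import tempfile

# A database file in a fresh temporary directory (the file name is arbitrary).
path = os.path.join(tempfile.mkdtemp(), "airline.db")

conn = sqlite3.connect(path)
conn.execute("CREATE TABLE FlewIn (flightNo INTEGER, ffNumber INTEGER, date TEXT)")
conn.execute("INSERT INTO FlewIn VALUES (101, 1001, 'Jan 4')")
conn.commit()  # this row is now durable

# Start a second transaction but "crash" before committing it.
conn.execute("INSERT INTO FlewIn VALUES (102, 1002, 'Jan 5')")
conn.close()  # closing without commit() discards the pending transaction

# After the restart, only the committed work is visible.
conn = sqlite3.connect(path)
survivors = conn.execute("SELECT flightNo, ffNumber FROM FlewIn").fetchall()
conn.close()

print(survivors)
```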
Who uses databases?
• End users
• DB application programmers
• Database administrators
  • Database design
  • Security, authorization
  • Data availability, crash recovery
  • Database tuning (for performance)
Summary : Why study DBMS?

• The need to process large amounts of data is increasing
  • Video, WWW, computer games, geographic information systems (GIS), genome data, digital libraries, etc.
• DB administrators and programmers hold rewarding jobs.
• DBMS research is one of the most exciting areas in Computer Science!