Database Management Systems

Database Management Systems: Introduction
CS 178 Database Management Systems
Fall 2004
Instructor: Bhagi Narahari
  narahari@gwu.edu
  Office Hours: Tues, Wed 4-6pm, Fri 1-2pm
TAs:
  Lab: Ali Khoshgozaran alikosh@gwu.edu
  HWs: Stefan

Course Information - URL
All course material will be placed at the URL:
  www.seas.gwu.edu/~narahari/cs178/
lab materials will be on
  www.seas.gwu.edu/~cs178/
All course announcements will be placed on the web -- check once a week!

Course Info on Web
Notes in PDF or HTML
  Will post material for each topic on the day that we cover the material in class
Homeworks in PDF or HTML
Project description and requirements
Oracle “Getting started” and Demo files
Exam and project schedule will be placed
All announcements - schedule, changes, etc.

Course Requirements: Grading
Homeworks 15%
Programming assignments 15%
  late submissions incur a penalty of 10% per day late, up to a maximum penalty of 60%
Two Exams (in class): 45-50%
Final Project (demos required): 20-25%
Course Requirements & Rules
Homeworks
  written (theory)
    relational algebra, file structures and indexing, query opt.
  programming assignments using Oracle
    SQL, PL/SQL, Oracle Forms, JDBC/Java
Project
  2 person Team
  Final project due by last week of Class
  Grading criteria explained on project page.
  Will use Oracle or MySQL -- TBD

Academic Integrity Policy
www.cs.gwu.edu/academics/integrity.html
  details and FAQ
No collaboration (of any sort) on homeworks
No collaboration among teams
  within team each team member must have clear role -- i.e., clearly partitioned tasks for each team member
violation of integrity policy -- default is maximum punishment (at least F for course)
Project
A set of “applications” will be posted on the web site -- after teams are finalized (after next class).
A clear set of “minimum” requirements will be specified -- note that meeting minimum requirements does not imply an A grade on the project.
A portion of the project grade will be “competitive”
Clear deadlines for specific steps in the project will be posted.

Lab Sections and TA
Lab sections conducted by TA (Ali)
  Does Tuesday 2pm work for all ??
Lab sections will cover
  Intro Oracle: SQL, PL/SQL, Forms, JDBC
  Short tutorials -- including application development
  Clarifications on Programming Assignments
  Help with analysis of Project (but not in the design of the project)
There will be another TA for homework and lecture questions -- details coming soon.
Prerequisites
CS 141/151 Programming and Data Structures
  Languages: Java is required (can do project using Oracle Forms and Reports)
CS 52/156 Operating system basics
CS 133 Discrete Math/Logic
CS 52/136 Computer Architecture/Organization

Accounts, Team Partners etc.
For Oracle Account:
  Email your name and your hobbes username to the lab TA (Ali)
Submit the requested background info to the TA during the first lab
  You will NOT be allowed to work on the project unless you have submitted the background info.
Outline
Introduction to DBMS
Introduction to Relational DBMS
Logical level design of Relational Databases
Formal Query Languages: Rel. algebra
Query languages: SQL
Relational Schema Design and Normal Forms, Tuning
Physical Database Design
Storage, Indexing, File Structures
Query Processing and optimization, i.e., how things work
Concurrency and Recovery; Intro to transaction processing
Overview of Performance modelling
Advanced Topics (time permitting): Security and Privacy, GIS, Data Mining, OLAP
What Is a DBMS?
A database is a very large, integrated collection of data.
Models real-world enterprise.
  Entities (e.g., students, courses)
  Relationships (e.g., Jimmy Page is taking CS178)
A Database Management System (DBMS) is the software to store/retrieve and manage databases.

Why Use a DBMS?
Data independence and efficient access.
Reduced application development time.
Data integrity and security.
Uniform data administration.
Concurrent access, recovery from crashes.

Why Study Databases??
Nothing to do on Mon, Wed 2-3pm!
Shift from computation to information processing
  at the “low end”: scramble to webspace (a mess!)
  at the “high end”: scientific applications
Datasets increasing in diversity and volume.
  Digital libraries, interactive video, Human Genome project, GIS... need for DBMS exploding
DBMS encompasses most of CS
  OS, languages, theory, “A”I, multimedia, logic

Why Study Databases?? (continued)
Information gathering is first step to analysis
  tons of data can be collected easily using current technology
To effectively analyze data, must
  collect relevant data
  store in manner amenable to efficient access
  provide infrastructure for ease of programming
Data analysis methods are current emphasis in the market

Data Analysis
Data Warehousing
  nothing but a big database (remember: you can charge your client more if you say warehouse instead of DB!!!).
OLAP (on-line analytical processing)
  multidimensional view of the data
    (car types, month, number of sales per month for each car type) can be viewed as 3-dimensional data
  can hypothesize better with different data view

Data Analysis: Data Mining
Data mining: finding ‘hidden patterns’ in data; i.e., patterns and relationships that are not ‘obvious’
  purchasing patterns of supermarket customers
    Pattern: 40% of Customers who buy beer also buy diapers.
    How do you use the above pattern/knowledge to improve your marketing strategy ??
    Leave it to the Business Majors to worry about!!
Data mining is “engine” behind Personalization software

Okay, back to CS178
Why the discussion on Data mining etc.?
  Analysis is important to make informed decisions
  efficient analysis requires efficient storage & design
  efficient storage & design requires study of DBMS!
Data Mining and other data analysis tools are current trends
  require solid background in database design and analysis!!
and remember: DBMS is basic backbone in Transaction Processing systems!

Course Outline: Schedule of Topics
“logical level” - Part 1
  how is data represented at the logical level
  how to design a good logical database for an application
    what is a good design?
‘physical level’ - Part 2
  how to store data on disks and memory
  how to efficiently implement logical level operators at ‘machine level’
    note similarity to programming language implementation
Overview of Performance Modelling

Data Models
A data model is a collection of concepts for describing data.
A schema is a description of a particular collection of data, using a given data model.
The relational model of data is the most widely used model today.
  Main concept: relation, basically a table with rows and columns.
  Every relation has a schema, which describes the columns, or fields.
Other data models:
  Network
  Hierarchical
  OO

Schemas and Instances
Schema: overall design of database
  schema is defined at DB design time
instance: collection of info stored at particular time
  data changes but schema does not
var cust1: customer
  cust1 is variable of type customer; structure of customer is schema, value of cust1 is instance

Levels of Abstraction
Many views, single conceptual (logical) schema and physical schema.
  Views describe how users see the data.
  Conceptual schema defines logical structure
  Physical schema describes the files and indexes used.
[Figure: View 1, View 2, View 3; Conceptual Schema; Physical Schema]
* Schemas are defined using Data Definition Language (DDL);
* data is modified/queried using Data Manipulation Language (DML).

Levels of Abstraction..
Another approach is three schema architecture
  Physical or Internal Level: how data is stored
  Conceptual Level: describes what data
    also Record Level
  View Level: describes only part of data
    ATM machine
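
The schema/instance distinction and the DDL/DML split can be tried out in a few lines. This is only a sketch using Python's built-in sqlite3 module rather than the course's Oracle setup; the customer table and its columns are made up for illustration, following the cust1 example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: defines the schema (fixed at DB design time).
# The table and column names here are hypothetical.
conn.execute("CREATE TABLE customer (cid INTEGER PRIMARY KEY, name TEXT, balance REAL)")

def schema(conn):
    # Column names and declared types, read from SQLite's table metadata.
    return [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(customer)")]

def instance(conn):
    # The rows stored at this particular time.
    return conn.execute("SELECT cid, name, balance FROM customer ORDER BY cid").fetchall()

schema_before = schema(conn)
instance_before = instance(conn)

# DML: changes the instance, not the schema.
conn.execute("INSERT INTO customer VALUES (1, 'cust1', 100.0)")
conn.execute("UPDATE customer SET balance = balance - 25.0 WHERE cid = 1")

schema_after = schema(conn)
instance_after = instance(conn)
```

After the DML statements the instance has changed (one row with balance 75.0) while the schema read back from the metadata is the same as before.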

Example: University Database
Conceptual schema:
  Students(sid: string, name: string, login: string, age: integer, gpa: real)
  Courses(cid: string, cname: string, credits: integer)
  Enrolled(sid: string, cid: string, grade: string)
Physical schema:
  Relations stored as unordered files.
  Index on first column of Students.
External Schema (View):
  Course_info(cid: string, enrollment: integer)

Structure of a DBMS
A typical DBMS has a layered architecture.
The figure does not show the concurrency control and recovery components.
This is one of several possible architectures; each system has its own variations.
[Figure: DBMS layers: Query Optimization and Execution; Relational Operators; Files and Access Methods; Buffer Management; Disk Space Management; DB. These layers must consider concurrency control and recovery.]
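
As a rough illustration, the three schemas of the University Database example can be written down with Python's sqlite3 module (an assumption; the course itself uses Oracle). The index name, the sample rows, and the way Course_info is computed (counting Enrolled rows per course) are guesses made for this sketch, since the slide gives only the view's columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Conceptual schema: the three relations from the slide
    CREATE TABLE Students (sid TEXT, name TEXT, login TEXT, age INTEGER, gpa REAL);
    CREATE TABLE Courses  (cid TEXT, cname TEXT, credits INTEGER);
    CREATE TABLE Enrolled (sid TEXT, cid TEXT, grade TEXT);

    -- Physical schema: index on the first column of Students
    CREATE INDEX students_sid_idx ON Students (sid);

    -- External schema: a view (its definition is an assumption)
    CREATE VIEW Course_info AS
        SELECT c.cid AS cid, COUNT(e.sid) AS enrollment
        FROM Courses c LEFT JOIN Enrolled e ON e.cid = c.cid
        GROUP BY c.cid;
""")

# Hypothetical sample data.
conn.executemany("INSERT INTO Students VALUES (?, ?, ?, ?, ?)",
                 [("s1", "Ann", "ann@gwu", 19, 3.6), ("s2", "Bob", "bob@gwu", 20, 2.9)])
conn.executemany("INSERT INTO Courses VALUES (?, ?, ?)",
                 [("CS178", "Database Management Systems", 3), ("CS133", "Discrete Math", 3)])
conn.executemany("INSERT INTO Enrolled VALUES (?, ?, ?)",
                 [("s1", "CS178", "A"), ("s2", "CS178", "B")])

# A user of the view sees only cid and enrollment, not the underlying tables.
course_info = conn.execute("SELECT cid, enrollment FROM Course_info ORDER BY cid").fetchall()
```

With this sample data, course_info lists CS133 with 0 students and CS178 with 2.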

Key concepts
Data independence
  Program/data independence
Concurrency control
Recovery from failure
Supports Transaction processing

Data Independence
Applications insulated from how data is structured and stored.
Ability to modify a schema definition without affecting next level => Application/data independence
Logical data independence: Protection from changes in logical structure of data.
Physical data independence: Protection from changes in physical structure of data.
Why is this an advantage??
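
One way to see why data independence is an advantage: the same application query keeps working, unchanged, across a physical change (adding an index) and a logical change (splitting a table and hiding the split behind a view). This is a sketch using Python's sqlite3 module; the table contents, the index name, and the particular split are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (sid TEXT, name TEXT, gpa REAL)")
conn.executemany("INSERT INTO Students VALUES (?, ?, ?)",
                 [("s1", "Ann", 3.6), ("s2", "Bob", 2.9), ("s3", "Cy", 3.8)])

# The "application": it never changes in this example.
APP_QUERY = "SELECT name FROM Students WHERE gpa > 3.5 ORDER BY name"

before = conn.execute(APP_QUERY).fetchall()

# Physical change: add an index. Only the access path may change.
conn.execute("CREATE INDEX students_gpa_idx ON Students (gpa)")
after_index = conn.execute(APP_QUERY).fetchall()

# Logical change: split Students into two tables, then expose the
# old shape as a view with the old name.
conn.executescript("""
    CREATE TABLE StudentNames (sid TEXT, name TEXT);
    CREATE TABLE StudentGpas  (sid TEXT, gpa REAL);
    INSERT INTO StudentNames SELECT sid, name FROM Students;
    INSERT INTO StudentGpas  SELECT sid, gpa  FROM Students;
    DROP TABLE Students;
    CREATE VIEW Students AS
        SELECT n.sid, n.name, g.gpa
        FROM StudentNames n JOIN StudentGpas g ON g.sid = n.sid;
""")
after_view = conn.execute(APP_QUERY).fetchall()
```

All three results are identical, so the application did not need to be rewritten for either change.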

Data Definition and Manipulation Languages
Data definition language (DDL) to specify database schema
Data manipulation language (DML) allows users to access or manipulate data as organized by data model
  procedural DMLs: require user to specify what data and how to get it
  non-procedural DMLs: require user to specify what data is needed without specifying how to get it.

Query Languages
Formal query languages: Relational algebra, Relational Calculus, Domain calculus
Commercial query languages: SQL, QUEL
SQL: “descendant” of SEQUEL; mostly relational algebra and some aspects of relational calculus
  has procedural and non-procedural aspects
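
The difference between procedural and non-procedural data manipulation can be sketched by answering the same question twice. The first function spells out how to find the answer (scan, test, collect); the second only states what is wanted in SQL and leaves the how to the DBMS. The Students rows and function names are made up, and Python's sqlite3 module stands in for a real DBMS.

```python
import sqlite3

# Hypothetical data: (sid, name, gpa)
rows = [("s1", "Ann", 3.6), ("s2", "Bob", 2.9), ("s3", "Cy", 3.8)]

def honor_roll_procedural(records):
    # Procedural: the user says what data AND how to get it.
    result = []
    for sid, name, gpa in records:
        if gpa > 3.5:
            result.append(name)
    return sorted(result)

def honor_roll_declarative(conn):
    # Non-procedural: the user says only what data is needed.
    cur = conn.execute("SELECT name FROM Students WHERE gpa > 3.5 ORDER BY name")
    return [name for (name,) in cur]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (sid TEXT, name TEXT, gpa REAL)")
conn.executemany("INSERT INTO Students VALUES (?, ?, ?)", rows)

procedural_answer = honor_roll_procedural(rows)
declarative_answer = honor_roll_declarative(conn)
```

Both give the same answer; only the declarative version would benefit automatically if an index on gpa were added later.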

Concurrency Control
Concurrent execution of user programs essential for good DBMS performance.
  Because disk accesses are frequent, and relatively slow, it is important to keep the CPU humming by working on several user programs concurrently.
Interleaving actions of different user programs can lead to inconsistency: e.g., check is cleared while account balance is being computed.
DBMS ensures such problems don’t arise: users can pretend they are using a single-user system.

Transaction: An Execution of a DB Program
Key concept is transaction, which is an atomic sequence of database actions (reads/writes).
Each transaction, executed completely, must leave the DB in a consistent state if DB is consistent when the transaction begins.
  Thus, ensuring that a transaction (run alone) preserves consistency is ultimately the user’s responsibility!
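
The inconsistency caused by interleaving can be reproduced with a toy model. This is not how a DBMS schedules anything; it is a hand-written simulation in Python in which two hypothetical transactions, T1 (deposit 100) and T2 (a check for 30 clears), each read the balance and later write back their own result. The names, amounts, and starting balance are assumptions.

```python
def run(schedule, balance=500):
    """Execute a schedule of (transaction, action) steps on one shared balance."""
    db = {"balance": balance}
    local = {}                          # each transaction's private copy of the balance
    amounts = {"T1": +100, "T2": -30}   # T1 deposits, T2 clears a check
    for txn, action in schedule:
        if action == "read":
            local[txn] = db["balance"]
        elif action == "write":
            db["balance"] = local[txn] + amounts[txn]
    return db["balance"]

serial_t1_t2 = [("T1", "read"), ("T1", "write"), ("T2", "read"), ("T2", "write")]
serial_t2_t1 = [("T2", "read"), ("T2", "write"), ("T1", "read"), ("T1", "write")]
# Both transactions read before either writes: T1's deposit is overwritten.
interleaved = [("T1", "read"), ("T2", "read"), ("T1", "write"), ("T2", "write")]

serial_results = {run(serial_t1_t2), run(serial_t2_t1)}
interleaved_result = run(interleaved)
```

Either serial order ends with 570, while this interleaving ends with 470: the deposit is lost. Preventing schedules like this one is the job of concurrency control.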

Scheduling Concurrent Transactions
DBMS ensures that execution of a set of concurrent transactions {T1, ... , Tn} is equivalent to some serial execution T1’ ... Tn’.
  Maintain consistency -- transactions should not interfere with each other
  avoid conflicts for resources -- design protocol that ensures “mutual exclusion”
    examples ???
DBMS ensures atomicity (all-or-nothing property) even if system crashes in the middle of a transaction.

Recovery System
Ability to recover from system failures
  Loss of main memory data (from power failures)
  Disk failure
Recovery system must ensure that database is in a consistent state after failure
  recover from system crashes -- log file methods
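
The all-or-nothing property can be observed with a small experiment. The sketch below uses Python's sqlite3 module and a made-up accounts table; it shows a transaction being rolled back after a failed step, which is a much simpler situation than a real crash and says nothing about the log-based recovery methods themselves.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; the CHECK constraint forbids overdrafts.
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money as one transaction: both updates happen, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                         (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))

ok = transfer(conn, "A", "B", 30)        # succeeds
after_ok = balances(conn)
failed = transfer(conn, "A", "B", 500)   # debit would overdraw A, so the credit to B is undone
after_failed = balances(conn)
```

In the failed transfer the credit to B has already been executed when the debit from A is rejected; the rollback removes it, so the balances are the same as before the attempt.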

What Next
Start with Data Models
  Relational Model with a little ER model intro
Formal query languages - Relational algebra
SQL
Database schema design: how to design a “good” schema, how to measure “good”?
  Normal Forms (3NF, BCNF)
Demonstrate concepts learnt on Commercial DBMS - Oracle

A little history
File Processing: Earliest form of information storage systems
  each user keeps files needed for specific application
    one user keeps track of students' fees and payments
    second user keeps files on student grades
In DBMS: single instance of data maintained and accessed by different users

File Processing and Database
Existence of catalog/data dictionary in DBMS; in file processing this is part of application
data abstraction provided by DBMS - provide conceptual view of data without details on how it is stored
program-data independence - DBMS programs written independent of specific data
support of multiple user-views

Progression of Database Systems
Early 1950’s: file processing, IBM’s RAMAC system
1960’s: first generalized DBMS, IBM Sabre
1970’s: Relational model proposed, INGRES, System R, query languages SEQUEL (SQL), QUEL
1980’s: DBMS for PCs, commercial RDBMS: Oracle, Sybase, Informix
1990’s: Object-relational DBMS, Multimedia, Spatial/GIS, Data Mining/OLAP, Dist. DB
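
The catalog/data dictionary point from the File Processing and Database slide can be seen directly: in a DBMS the description of the data is itself stored as queryable data, instead of being buried in each application. A small sketch with Python's sqlite3 module, where the catalog is the built-in sqlite_master table; the fees and grades tables are hypothetical, echoing the two users in the history slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fees (sid TEXT, amount_due REAL)")
conn.execute("CREATE TABLE grades (sid TEXT, cid TEXT, grade TEXT)")

# The catalog lists every table together with the DDL that defined it.
catalog = conn.execute(
    "SELECT name, sql FROM sqlite_master WHERE type = 'table' ORDER BY name"
).fetchall()
table_names = [name for name, _ in catalog]
```

Any program (or user) can discover the structure of the database by querying the catalog, without reading another application's source code.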

Summary
DBMS used to maintain, query large datasets.
Benefits include recovery from system crashes, concurrent access, quick application development, data integrity and security.
Levels of abstraction give data independence.
A DBMS typically has a layered architecture.
DBAs hold responsible jobs and are well-paid!☺
DBMS use in emerging applications such as Human Genome project is exploding

Data Models
Conceptual or object-based logical models: ER model, OO; concepts close to user perception
Record-based/Representational models: provide concepts understood by users but not too far from physical -- Relational, Network, Hierarchical
Physical data models: describe details of how data stored, record formats...

Entity Relationship Model
Based on collection of real world objects or concepts called entities; ex: employee, student
attribute represents properties of entity; ex: social security number
relationship represents interaction between entities
constraints to which database must conform
overall logical structure represented by ER diagram representing entity sets, relationships, attributes

Record-based Logical Models
Relational model: represent data and their relationships by collection of tables. (Recall: relation is a collection of tuples)
Network model: collection of records, relationship between them by links or pointers
Hierarchical model: similar to network but organized as collection of trees
note: relational model does not use pointers
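
The contrast between the relational and network models (values versus pointers) can be sketched in plain Python. Everything here is invented for illustration: the customer/account data, the class names, and the use of Python object references to stand in for the network model's links.

```python
from dataclasses import dataclass, field

# Relational style: relations are collections of tuples; a relationship is
# expressed by matching values (the owner's cid), not by pointers.
customers = {("c1", "Ann"), ("c2", "Bob")}                           # (cid, name)
accounts = {("a1", "c1", 500), ("a2", "c1", 75), ("a3", "c2", 20)}   # (acct_id, owner_cid, balance)

def accounts_of_relational(cid):
    return sorted(acct for acct, owner, _ in accounts if owner == cid)

# Network style: records hold links (here, object references) to related records.
@dataclass
class AccountRecord:
    acct_id: str
    balance: int

@dataclass
class CustomerRecord:
    cid: str
    name: str
    accounts: list = field(default_factory=list)  # links to AccountRecord objects

ann = CustomerRecord("c1", "Ann")
ann.accounts.extend([AccountRecord("a1", 500), AccountRecord("a2", 75)])

def accounts_of_network(customer):
    # Follow the links from the customer record.
    return sorted(a.acct_id for a in customer.accounts)

relational_answer = accounts_of_relational("c1")
network_answer = accounts_of_network(ann)
```

Both lookups return the same accounts, but the relational version finds them by comparing values while the network version follows links stored in the record.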