Introduction - Joe Meehean

CS370
Database Management Systems
Joe Meehean
Persistent Data

What if we want our data to last?
◦ beyond single run of program
◦ beyond life of machine
◦ beyond life of building
◦ beyond life of company
Persistent Data

What do we want from a persistent data store?
◦ minimize data size
◦ fast
◦ multiple users (at the same time)
◦ flexible
◦ low maintenance
◦ easy to use
Class Registration System

Things
◦ Students
◦ Faculty
◦ Courses
◦ Offerings

Actions
◦ Courses are offered
◦ Students enroll in courses
◦ Faculty teach courses

Limitations
◦ Class size limits
◦ Faculty cannot teach two courses at the same time
◦ Students cannot take two courses at the same time
◦ Prerequisites
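The limitations above are rules the system must enforce before it accepts an enrollment. Below is a minimal C sketch of the class-size and time-conflict rules; the `Offering` struct, its fields, and the idea that every offering meets once at a single `start_hour` are simplifying assumptions for illustration, not part of the original design.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified offering: meets once, at start_hour. */
typedef struct {
    int offering_id;
    int faculty_id;
    int start_hour;   /* e.g., 9 means 9:00 */
    int capacity;     /* class size limit */
    int enrolled;     /* students currently enrolled */
} Offering;

/* True if a student already taking `current` may enroll in `target`.
   Checks class size and student time conflicts; prerequisites are
   omitted because they need each student's course history. */
bool can_enroll(const Offering *target,
                const Offering *current[], size_t n_current)
{
    if (target->enrolled >= target->capacity)
        return false;                          /* class is full */
    for (size_t i = 0; i < n_current; i++)
        if (current[i]->start_hour == target->start_hour)
            return false;                      /* student time conflict */
    return true;
}

/* Faculty cannot teach two offerings at the same time. */
bool faculty_conflict(const Offering *a, const Offering *b)
{
    return a->faculty_id == b->faculty_id && a->start_hour == b->start_hour;
}
```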
DISCUSSION
BREAK!!!

Course registration system
◦ how do we store this stuff?
◦ how do we access it?
DISCUSSION
BREAK!!!


What if we need to add a new field to the student record?
What if a second system wants to share our data?
◦ meal plan wants to share student records

How do we manage a single student record for all campus data?
◦ payroll
◦ student loans
◦ parking
Relational Databases

Solves many of these problems
◦ persistent
◦ shared
◦ separates data access from data storage
◦ allows data reuse
◦ data management can be decomposed
Focus of this class
◦ database creation and management
◦ writing applications that use DBs
Vocabulary

DataBase Management System (DBMS)
◦ program or program suite
◦ create, access, and store databases
◦ Oracle, MS Access, SQL Server, DB2

Relational DataBase (RDB)
◦ business or application data
◦ organized using the entity-relationship model
◦ stored in a DBMS
◦ e.g., student records, payroll, bank records
When to use Relational DBs

Well-structured data
◦ course registration
◦ payroll

Need transactions (consistency)
◦ ATM
◦ class enrollment
◦ payroll deductions

Need moderate speed
When NOT to use Relational DBs

Need blazing speed
◦ databases can be slow (tens of milliseconds)

Need to support thousands of users
◦ databases do not scale well

Unstructured data
◦ more on this later

No common data access patterns
Why learn databases at all?
You will write software that uses them
New persistent storage systems are a response to relational databases
◦ understand their shortcomings
◦ understand the advantages of the new techniques
New persistent storage systems use the same techniques
◦ kept what they needed, threw the rest away
TANGENT!!!
People to develop good relationships with
Database Administrators
◦ they are experts
◦ they will help you to avoid doing something stupid
◦ they can make accessing your data easy or hard
TANGENT!!!
People to develop good relationships with
System Administrators and Administrative Assistants
◦ everything you do depends on something they did for you
◦ they are often overworked
◦ every time they do something for you, it is a favor. Say thank you.
Entity Relationship Model
RDBs are built on a simple data model
Entities
◦ things, stuff, concepts
◦ e.g., students, courses, offerings, grades
Relationships
◦ connections between entities
◦ e.g., students take courses, courses have offerings
Similar to the object-oriented programming model
ER Model in RDBs


Entities translated into tables

ID    First Name  Last Name  Major  GPA
1354  Phil        Park       CS     3.5
5467  Samantha    Small      Econ   4.0
3549  Terry       Berry      Math   2.5

Relationships connect tables
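Roughly speaking, each table behaves like an array of structs, and a relationship such as "students take courses" can itself be stored as a table of ID pairs. The C sketch below uses the student rows from the table above plus a hypothetical `Enrollment` table with made-up course IDs, and follows the relationship with a nested loop (a naive join):

```c
#include <stddef.h>

typedef struct {
    int id;
    const char *first_name;
    const char *last_name;
    const char *major;
    double gpa;
} Student;

/* Relationship table: one row per (student, course) pair. */
typedef struct {
    int student_id;
    int course_id;
} Enrollment;

static const Student students[] = {
    {1354, "Phil", "Park", "CS", 3.5},
    {5467, "Samantha", "Small", "Econ", 4.0},
    {3549, "Terry", "Berry", "Math", 2.5},
};

/* Hypothetical course IDs, for illustration only. */
static const Enrollment enrollments[] = {
    {1354, 370}, {5467, 370}, {3549, 210},
};

/* Follow the relationship: store the last name of every student enrolled
   in course_id into out (at most max entries); return how many were found. */
size_t students_in_course(int course_id, const char *out[], size_t max)
{
    size_t found = 0;
    size_t n_enr = sizeof enrollments / sizeof enrollments[0];
    size_t n_stu = sizeof students / sizeof students[0];
    for (size_t e = 0; e < n_enr; e++) {
        if (enrollments[e].course_id != course_id)
            continue;
        /* Match the enrollment's student_id against the student table. */
        for (size_t s = 0; s < n_stu && found < max; s++)
            if (students[s].id == enrollments[e].student_id)
                out[found++] = students[s].last_name;
    }
    return found;
}
```

Only IDs are stored in the relationship table; names and majors live in one place, so updating a student never touches the enrollment rows.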
How to store table data?

Disk Geometry
◦ seek time: move the disk arm (8 ms)
◦ rotational latency: wait for the data to spin under the disk head (2-4 ms)
◦ data transfer: read data from disk (negligible)
◦ sequential reads and writes are faster than random reads and writes

Place data that will be used together close together
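A back-of-the-envelope cost model makes the last point concrete. This C sketch uses the numbers above, assumes 3 ms as a midpoint for rotational latency, and treats transfer time as zero (the "negligible" simplification, which stops holding for very large reads):

```c
/* Rough cost model from the slide: seek ~8 ms, rotational latency 2-4 ms
   (3 ms assumed here), transfer time treated as negligible. */
#define SEEK_MS   8.0
#define ROTATE_MS 3.0

/* Records scattered on disk: pay seek + rotation for every record. */
double random_read_ms(int n_records)
{
    return n_records * (SEEK_MS + ROTATE_MS);
}

/* Records stored next to each other: pay seek + rotation once, then stream. */
double sequential_read_ms(int n_records)
{
    return n_records > 0 ? SEEK_MS + ROTATE_MS : 0.0;
}
```

Under these assumptions, reading 1,000 scattered records costs about 11 s, while reading 1,000 adjacent records costs about 11 ms.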
How to store table data?
struct student {
    int id;
    char* f_name;
    char* l_name;
    char* major;
    float gpa;
};



Assume we are building a database management system
Need to write the student structs to disk
But what is the best way?
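One possible answer, sketched in C under stated assumptions: the `char*` fields in `struct student` hold memory addresses, so writing that struct to disk as-is would save pointers rather than names. The sketch swaps in a hypothetical fixed-size `StudentRecord` (field widths chosen arbitrarily) so that record `i` always starts at byte `i * sizeof(StudentRecord)`. It ignores portability issues such as struct padding and byte order.

```c
#include <stdio.h>
#include <string.h>

/* Fixed-size on-disk record: char arrays instead of char* so the bytes
   written are the names themselves, not pointers into memory.
   Field widths are illustrative assumptions. */
typedef struct {
    int id;
    char f_name[32];
    char l_name[32];
    char major[16];
    float gpa;
} StudentRecord;

/* Append one record to the end of the file; returns 1 on success. */
int write_student(FILE *f, const StudentRecord *s)
{
    if (fseek(f, 0, SEEK_END) != 0) return 0;
    return fwrite(s, sizeof *s, 1, f) == 1;
}

/* Read the record in slot `index`. Because every record has the same size,
   its offset is simply index * sizeof(StudentRecord). Returns 1 on success. */
int read_student_at(FILE *f, long index, StudentRecord *out)
{
    if (fseek(f, index * (long)sizeof *out, SEEK_SET) != 0) return 0;
    return fread(out, sizeof *out, 1, f) == 1;
}

/* Build a zero-filled record; copying at most size-1 bytes keeps each
   string NUL-terminated even if the input is too long. */
StudentRecord make_record(int id, const char *first, const char *last,
                          const char *major, float gpa)
{
    StudentRecord r;
    memset(&r, 0, sizeof r);
    r.id = id;
    strncpy(r.f_name, first, sizeof r.f_name - 1);
    strncpy(r.l_name, last, sizeof r.l_name - 1);
    strncpy(r.major, major, sizeof r.major - 1);
    r.gpa = gpa;
    return r;
}
```

Fixed-size records trade wasted space (short names are padded) for the ability to jump straight to any record.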
DISCUSSION
BREAK!!!
We are building an RDB to store thousands of students
Organize file layout of student structs
How do we find student info quickly?
◦ struct student fetchStudent(int student_id){…}
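If the records are kept sorted by student ID, `fetchStudent` can binary-search instead of scanning thousands of records. A minimal in-memory sketch using the C standard library's `bsearch`; the sample table and the `id == -1` "not found" convention are assumptions, since the slide leaves the body elided:

```c
#include <stdlib.h>

struct student {
    int id;
    char *f_name;
    char *l_name;
    char *major;
    float gpa;
};

/* Table kept sorted by id so lookups take O(log n) comparisons.
   Sample rows from the slides. */
static struct student table[] = {
    {1354, "Phil", "Park", "CS", 3.5f},
    {3549, "Terry", "Berry", "Math", 2.5f},
    {5467, "Samantha", "Small", "Econ", 4.0f},
};

/* Compare a key (int id) with a table element by id. */
static int compare_ids(const void *key, const void *elem)
{
    int k = *(const int *)key;
    int e = ((const struct student *)elem)->id;
    return (k > e) - (k < e);
}

/* Returns the matching student, or a record with id -1 if none exists
   (an assumed convention for "not found"). */
struct student fetchStudent(int student_id)
{
    const struct student *hit = bsearch(&student_id, table,
                                        sizeof table / sizeof table[0],
                                        sizeof table[0], compare_ids);
    if (hit == NULL) {
        struct student missing = {-1, NULL, NULL, NULL, 0.0f};
        return missing;
    }
    return *hit;
}
```

The same idea works on a file of fixed-size records: seek to the middle record, compare IDs, and halve the range, so about 12 reads suffice for 4,000 students instead of up to 4,000.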
DISCUSSION
BREAK!!!

Organize file layout of student structs
◦ One line per student?
1354, Phil, Park, CS, 2015
3549, Terry, Berry, Math, 2013
5467, Samantha, Small, Econ, 2012
◦ One line per data field?
1354, 3549, 5467
Phil, Terry, Samantha
Park, Berry, Small
CS, Math, Econ
2015, 2013, 2012
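One way to compare the two layouts is to ask where a given field of a given student lands in the file. The C sketch below assumes fixed-width fields (the byte widths are illustrative, not from the slides) and computes byte offsets for both layouts:

```c
#include <stddef.h>

/* Hypothetical fixed field widths in bytes. */
enum { F_ID, F_FIRST, F_LAST, F_MAJOR, F_GPA, N_FIELDS };
static const size_t field_size[N_FIELDS] = {4, 32, 32, 16, 4};

/* Total bytes in one student record. */
static size_t record_size(void)
{
    size_t total = 0;
    for (int f = 0; f < N_FIELDS; f++) total += field_size[f];
    return total;
}

/* One line per student: the fields of a record sit next to each other. */
size_t row_offset(size_t record, int field)
{
    size_t off = record * record_size();
    for (int f = 0; f < field; f++) off += field_size[f];
    return off;
}

/* One line per data field: all values of one field sit next to each other,
   so the column for `field` starts after every earlier column. */
size_t column_offset(size_t n_records, size_t record, int field)
{
    size_t off = 0;
    for (int f = 0; f < field; f++) off += n_records * field_size[f];
    return off + record * field_size[field];
}
```

With these widths, consecutive GPAs are 88 bytes apart in the row layout but only 4 bytes apart in the column layout, so scanning one field reads far less data in the column layout; fetching one whole student, however, touches five separate regions of the column file.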
DISCUSSION
BREAK!!!
Organize file layout of student structs
What if we want to calculate average GPA?
What if we want a sorted list of last names?
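Both questions need only one field per student. A C sketch of answering them from the column layout; the two arrays are hypothetical in-memory copies of the GPA and last-name lines, with values taken from the earlier student table:

```c
#include <stdlib.h>
#include <string.h>

/* Two columns of the three sample students, one array per field. */
#define N_STUDENTS 3
static const char *last_names[N_STUDENTS] = {"Park", "Small", "Berry"};
static const double gpas[N_STUDENTS] = {3.5, 4.0, 2.5};

/* Average GPA touches only the gpa column: one contiguous scan. */
double average_gpa(void)
{
    double sum = 0.0;
    for (int i = 0; i < N_STUDENTS; i++) sum += gpas[i];
    return sum / N_STUDENTS;
}

/* qsort comparator for an array of C strings. */
static int cmp_names(const void *a, const void *b)
{
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Sorted last names: copy just the last-name column and sort the copy. */
void sorted_last_names(const char *out[N_STUDENTS])
{
    for (int i = 0; i < N_STUDENTS; i++) out[i] = last_names[i];
    qsort(out, N_STUDENTS, sizeof out[0], cmp_names);
}
```

With one line per student, the same two queries would have to read every field of every record just to reach the GPAs or last names.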

DISCUSSION
BREAK!!!

Assume we built a library to find and read students
struct student fetchStudent(int student_id){…}
Registrar uses the library
What if the dining hall wants to add an account balance field?
How can we avoid changing the interface whenever the data fields change?

Three Schema Architecture

Internal Schema
◦ How the data is stored on disk

Conceptual Schema
◦ Entities and relationships

External Schema
◦ Application specific views of the data
Three Schema Architecture
External schema: registrar view
ID    First Name  Last Name  Major  GPA
1354  Phil        Park       CS     3.5
5467  Samantha    Small      Econ   4.0
3549  Terry       Berry      Math   2.5

External schema: dining hall view
ID    First Name  Last Name  Account
1354  Phil        Park       $23.45
5467  Samantha    Small      $45.16
3549  Terry       Berry      $0.53

Conceptual schema: all fields
ID    First Name  Last Name  Major  GPA  Account
1354  Phil        Park       CS     3.5  $23.45
5467  Samantha    Small      Econ   4.0  $45.16
3549  Terry       Berry      Math   2.5  $0.53

Internal schema: one line per data field on disk
1354, 3549, 5467
Phil, Terry, Samantha
Park, Berry, Small
CS, Math, Econ
2015, 2013, 2012
$23.45, $0.53, $45.16
Three Schema Architecture

Database management system maps from one schema to another
◦ e.g., removes the account balance from the registrar view of the data

Provides data independence
◦ changes in one schema do not affect the structure of the others
◦ may change performance
◦ may add functionality
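In C terms, the conceptual schema is like one complete record type, and each external schema is a narrower view the DBMS produces from it. The sketch below is hypothetical: the struct and function names are made up, and the balance is a `double` in dollars only for brevity (real systems usually store money as integer cents).

```c
#include <string.h>

/* Conceptual schema: every field any application needs. */
typedef struct {
    int id;
    char first[32], last[32], major[16];
    double gpa;
    double account;   /* dining hall balance, in dollars */
} StudentAll;

/* External schema 1: registrar view, no account balance. */
typedef struct {
    int id;
    char first[32], last[32], major[16];
    double gpa;
} RegistrarView;

/* External schema 2: dining hall view, no major or GPA. */
typedef struct {
    int id;
    char first[32], last[32];
    double account;
} DiningView;

/* Map the full record to the registrar's view (drops the balance). */
RegistrarView to_registrar(const StudentAll *s)
{
    RegistrarView v;
    v.id = s->id;
    memcpy(v.first, s->first, sizeof v.first);
    memcpy(v.last, s->last, sizeof v.last);
    memcpy(v.major, s->major, sizeof v.major);
    v.gpa = s->gpa;
    return v;
}

/* Map the full record to the dining hall's view (drops major and GPA). */
DiningView to_dining(const StudentAll *s)
{
    DiningView v;
    v.id = s->id;
    memcpy(v.first, s->first, sizeof v.first);
    memcpy(v.last, s->last, sizeof v.last);
    v.account = s->account;
    return v;
}
```

Adding `account` changed only `StudentAll` and the new dining view; registrar code written against `RegistrarView` is untouched.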
Three Schema Architecture
Application 1        Application 2        Application 3
External Schema 1    External Schema 2    External Schema 3
                     Conceptual Schema
                     Internal Schema
Accessing and updating data

Applications care about the data, not how it is stored (external schema)
◦ should not need to know the internal schema

Operations on data require complex procedures
◦ iteration (loops)
◦ lookups
◦ coalescing and shifting data
Accessing and updating data

Declarative data language
◦ non-procedural access
◦ describe what data we need
◦ not how to get it

Structured Query Language (SQL)
◦ create database
◦ store data
◦ access data
◦ aggregate data statistics
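To see the difference, here is a procedural C version of one question, "what is the average GPA of CS majors?", with an equivalent declarative SQL query in a comment. The `Student` struct and the `students` table name are assumptions for illustration:

```c
#include <stddef.h>
#include <string.h>

typedef struct {
    int id;
    const char *major;
    double gpa;
} Student;

/* Procedural: the code spells out HOW to get the answer (loop, compare,
   accumulate). A declarative query states only WHAT it wants, e.g.:
       SELECT AVG(gpa) FROM students WHERE major = 'CS';             */
double average_gpa_for_major(const Student *rows, size_t n, const char *major)
{
    double sum = 0.0;
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (strcmp(rows[i].major, major) == 0) {
            sum += rows[i].gpa;
            count++;
        }
    }
    /* No matches: return 0.0 here (SQL's AVG would return NULL instead). */
    return count ? sum / count : 0.0;
}
```

The C loop also bakes in one storage layout (an array of structs); the SQL query leaves the DBMS free to pick the layout, index, or scan order.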
End result of the success of RDBs

Application developers need to know…
◦ entity-relationship data model
◦ three schema architecture
◦ SQL

Advantages of relational databases
◦ DBMSs are well tested
◦ DBMSs come with experts
◦ RDBs allow unplanned combinations of data
◦ standard across companies
Course Roadmap
Application, External Schema → Using RDBs
Conceptual Schema → Relational Database Design
Internal Schema → Database Management System Design
Questions?