CS370 Database Management Systems Joe Meehean 1 Persistent Data What if we want our data to last? ◦ ◦ ◦ ◦ beyond single run of program beyond life of machine beyond life of building beyond life of company 2 Persistent Data What do we want from a persistent data store? 3 Persistent Data What do we want from a persistent data store? ◦ ◦ ◦ ◦ ◦ ◦ minimize data size fast multiple users (at the same time) flexible low maintenance easy to use 4 Class Registration System Things ◦? Actions ◦? Limitations ◦? 5 Class Registration System Things ◦ ◦ ◦ ◦ Students Faculty Courses Offerings Actions ◦ Courses are offered ◦ Students enroll in courses ◦ Faculty teach courses Limitations ◦ ◦ ◦ ◦ Class size limits Faculty cannot teach two courses at the same time Students cannot take two courses at the same time Prerequisites 6 DISCUSSION BREAK!!! Course registration system ◦ how do we store this stuff? ◦ how do we access it? 7 DISCUSSION BREAK!!! What if we need to add a new field to the student record? What if a second system wants to share our data? ◦ meal plan wants to share student records How do we manage a single student record for all campus data? ◦ payroll ◦ student loans ◦ parking 8 Relational Databases Solves many of these problems ◦ ◦ ◦ ◦ ◦ persistent shared separates data access from data storage allows data reuse data management can be decomposed Focus of this class ◦ database creation and management ◦ writing applications that use DBs 9 Vocabulary DataBase Management System (DBMS) ◦ program or program suite ◦ create, access, and store databases ◦ Oracle, MS Access, SQL Server, DB2 Relational DataBase (RDB) ◦ ◦ ◦ ◦ business or application data organized using the entity-relationship model stored in DBMS student records, payroll, bank records 10 When to use Relational DBs Well structured data ◦ course registration ◦ payroll Need transactions (consistency) ◦ ATM ◦ class enrollment ◦ payroll deductions Needs to be moderately fast 11 When NOT to use Relational DBs Need blazing speed ◦ databases can be slow (tens of milliseconds) Need to support thousands of users ◦ databases do not scale well Unstructured data ◦ more on this later No common data access patterns 12 Why learn Databases at all? You will write software that uses them New persistent storage response to relational databases ◦ understand their shortcomings ◦ understand advantages of new techniques New persistent storage uses same techniques ◦ kept what they needed, threw the rest away 13 TANGENT!!! People to develop good relationships with Database Administrators ◦ they are experts ◦ they will help you to avoid doing something stupid ◦ they can make accessing your data easy or hard 14 TANGENT!!! People to develop good relationships with System Administrators and Administrative Assistants ◦ everything you do depends on something they did for you ◦ they are often overworked ◦ every time they do something for you, it is a favor. Say thank you. 15 Entity Relationship Model RDBs built on simple data model Entities ◦ things, stuff, concepts ◦ e.g., students, course, offerings, grades Relationships ◦ connections between entities ◦ e.g., students take courses, courses have offerings, Similar to object-oriented programming model 16 ER Model in RDBs Entities translated into tables ID First Name Last Name Major GPA 1354 Phil Park CS 3.5 5467 Samantha Small Econ 4.0 3549 Terry Berry Math 2.5 Relationships connect tables 17 How to store table data? Disk Geometry ◦ seek time: move disk arm (8ms) ◦ rotational Latency: data to spin under disk head (2-4ms) ◦ data transfer: read data from disk (negible) ◦ sequential reads and writes are faster than random R/W Place data that will be used together close together 18 How to store table data? struct student{ int id; char* f_name; char* l_name; char* major; float gpa; } Assume we are building a database management system Need to write the student structs to disk But what is the best way? 19 DISCUSSION BREAK!!! We are building a RDB to store thousands of students Organize file layout of student structs How do we find student info quickly? ◦ struct student fetchStudent(int student_id){…} 20 DISCUSSION BREAK!!! Organize file layout of student structs ◦ One line per student? 1354, Phil, Park, CS, 2015 3549, Terry, Berry, Math, 2013 5467, Samantha, Small, Econ, 2012 ◦ One line per data field? 1354, 3549, 5467 Phil, Terry, Samantha Park, Berry, Small CS, Math, Econ 2015, 2013, 2012 21 DISCUSSION BREAK!!! Organize file layout of student structs What if we want to calculate average GPA? What if we want a sorted list of last names? 22 DISCUSSION BREAK!!! Assume we built a library to find and read students struct student fetchStudent(int student_id){…} Registrar uses the library What if the dining hall wants to add a account balance field? How can we prevent changing the interface due to changes in data fields? 23 Three Schema Architecture Internal Schema ◦ How the data is stored on disk Conceptual Schema ◦ Entities and relationships External Schema ◦ Application specific views of the data 24 Three Schema Architecture ID First Name Last Name Major GPA ID First Name Last Name Account 1354 Phil Park CS 3.5 1354 Phil Park $23.45 5467 Samantha Small Econ 4.0 5467 Samantha Small $45.16 3549 Terry Berry Math 2.5 3549 Terry Berry $0.53 ID First Name Last Name Major GPA Account 1354 Phil Park CS 3.5 $23.45 5467 Samantha Small Econ 4.0 $45.16 3549 Terry Berry Math 2.5 $0.35 1354, 3549, 5467 Phil, Terry, Samantha Park, Berry, Small CS, Math, Econ 2015, 2013, 2012 $23.45, $0.53, $45.16 25 Three Schema Architecture Database management system maps from one schema to the other ◦ e.g., removes account balance from registrar view of data Provides data independence ◦ changes in one schema do not affect structure of others ◦ may change performance ◦ may add functionality 26 Three Schema Architecture Application 1 Application 2 Application 3 External Schema 1 External Schema 2 External Schema 3 Conceptual Schema Internal Schema 27 Accessing and updating data Applications care about data not how its stored (external schema) ◦ should not need to know internal schema Operations on data require complex procedures ◦ iteration (loops) ◦ lookups ◦ coalescing and shifting data 28 Accessing and updating data Declarative data language ◦ non-procedural access ◦ describe what data we need ◦ not how to get it Structured Query Language (SQL) ◦ ◦ ◦ ◦ create database store data access data aggregate data statistics 29 End result of success of RDBs Application developers need to know… ◦ entity-relationship data model ◦ three schema architecture ◦ SQL Advantages of relational databases ◦ ◦ ◦ ◦ DBMSs are well tested DBMSs come with experts RDBs allow unplanned combination of data standard across all companies 30 Course Roadmap Application External Schema Using RDBs Relational Database Design Conceptual Schema Internal Schema Database Management System Design 31 Questions? 32