Chapter 3 The Relational Model, Additional Notes By David Shao CS 157B Fall 2003 Instructor: Dr. Lee Overview Codd’s Original Paper Alternative: Network Model IBM Develops System R How to Fund Basic Research? Codd’s Original Paper “A Relational Model of Data for Large Shared Data Banks” Communications of the ACM, Volume 13, Number 6, June 1970 Lower level (basement) of the new Martin Luther King, Jr. Library Get to roll the shelves apart to access journals. Codd’s Reasons Data independence from database implementation such as machine representation Natural structure of data Can be analyzed mathematically (Codd was a mathematician by training) Alternative: Network Model Charles A. Bachman 1973 ACM Turing Award Lecture “The Programmer as Navigator” Communications of the ACM, Volume 16, Number 11, November 1973, pp. 653-657 Bachman’s Perspective Compared his network model to the work of Copernicus Derided “computer-centric” databases, proposed a future where programmers navigate an “n-dimensional data space” CODASYL CODASYL: Conference on Data Systems Languages Formed 1959, helped develop COBOL 1971 Report of the CODASYL Data Base Task Group (DBTG) Bachman one of DBTG’s founding members Network: Navigating Links Unlike the relational model where everything is a table in the database, entities and relationships are not treated in a uniform fashion. Sample Example Fragment DBTG initial specification used COBOL with promises to add Fortran later. RECORD NAME IS ELECTION LOCATION MODE IS VIA ALL-ELECTIONS-88 WITHIN PRESIDENTIAL-AREA 02 ELECTION-YEAR PIC “9999” 02 ELECTION-WON-ELECTORAL-VOTES PIC “999” Relational vs. Network Reference: Entire issue of Computing Surveys, Volume 8, Number 1, March 1976 IBM competitors DEC, Univac, Xerox, and Honeywell offered commercial DBTG databases on their hardware, and Cullinane Corp. offered one for IBM’s System/360 As of 1976, the relational model was unproven. IBM Develops System R “The 1995 SQL Reunion: People, Projects, and Politics”, edited by Paul McJones http://gatekeeper.dec.com/pub/DEC/SRC/te chnical-notes/SRC-1997-018html/sqlr95.html “A History and Evaluation of System R”, Chamberlin et al., CACM 24 (10), 1981 Codd Not Involved with System R Codd was NOT part of the development team and not an author in the CACM paper Codd tried to form a team to investigate a language to more closely resemble the mathematics behind the first order predicate calculus. Express notions such as x y f(x, y) System R Timeline Phase Zero: 1974-1975, throw-away prototype Phase One: 1976-1977, full scale implementation including other features of a “real” database Phase Two: 1978-1979, testing System R internally and with trusted customers System R Features Ability to process SQL commands Support for transactions Support for multiple users Transactions ACID: Atomicity, Consistency, Isolation, Durability Don’t lose data when for example the hard drive crashes or the power goes out. Basic idea: Keep a copy of the old data in a separate place and keep a log of all changes since the last known good state. Coordinating Multiple Users Users want access to the same resource Potential problems when one user writes and another user writes or reads at the same time. Resource-Sharing Example Suppose users A and B are attempting to share resource X at the same time in a multi-programming environment Different results depending on which order A and B are scheduled to run: A writes X; B reads X; B writes X B reads X; A writes X; B writes X B reads X; B writes X; A writes X Resource-Sharing Implementation Locks—reserve a resource for one’s usage, at least prohibiting someone else from changing the resource while one is using it. Many pitfalls—whole books have been written on potential problems such as deadlocks, starvation, and thrashing. Database Implementation Reference Title: Transaction Processing: Concepts and Techniques Jim Gray and Andreas Reuter Jim Gray is one of the authors of the System R paper previously cited. System R Support for SQL Need for speed—universal answer back then was to write a compiler. Pre-compiled code eliminates time spent reparsing SQL and figuring out the best way to access the data. Used B-trees to support indexing. Compiling SQL Queries Compiled SQL queries into pre-optimized machine code Discovered about 100 machine code fragments were sufficient. Not too much impact on users when interactive queries were compiled and not interpreted. B-Trees Key idea—one node distinguishes many Many Children for One Node? Data is pulled into memory from secondary storage in pages—relatively large pieces Accessing data from the hard drive is incredibly slower than from memory— maybe a million times slower! Conclusion: Number of pages accessed needs to be kept as small as possible. B-Tree Advantages If each internal node can reference 200 children, in two accesses 200 * 200 = 40,000 branches can be distinguished. A tree has an intrinsic notion of an ordering as opposed to a hash. Maybe data can be arranged so that related items will be on the same page as one recently accessed. Join Example Using B-Trees Suppose we want to perform a join Bob 7 Tim 3 5 7 3 Mary Anne Sue Bob 7 Anne Tim 3 Sue Using B-Tree Indexing Use an idea similar to merge sort Index the first table on its column 2, index the second table on its column 1, then merge on identical values Avoid m * n comparisons Illustration of Merging Good Bad 3 3 3 3 5 7 5 7 7 7 Who Made Winning Possible Researchers—without Codd’s insight relational models don’t happen Programmers—without System R relational databases and SQL are unproven Management—without Larry Ellison and the other founders of Oracle, relational databases do not win commercially Or Was It a Debacle? Researchers lost—no one wants to read mathematics, and corporate funding for basic research has been slashed Programmers lost—ordinary users don’t want SQL, they want simple Excel-like spreadsheets. And SQL commoditizes programmers. IBM Lost? IBM funded the basic research in relational databases, they employed Codd and the System R programmers But it was Larry Ellison who made tens of billions of dollars. IBM previously had a near-monopoly with IMS in business databases, but products like Oracle help IBM competitors like Sun. How to Fund Basic Research? “Funding a Revolution: Government Support for Computing Research” Copyright 1999 by the National Academy of Sciences http://www.nap.edu/readingroom/books/far/ Chapter 6 “The Rise of Relational Databases”