Real SQL Programming
Embedded SQL
Java Database Connectivity
Stored Procedures

Three-Tier Architecture
A common environment for using a database has three tiers of processors:
1. Web servers: talk to the user.
2. Application servers: execute the business logic.
3. Database servers: get what the app servers need from the database.

Example: Amazon
Database holds the information about products, customers, etc.
Business logic includes things like “what do I do after someone clicks ‘checkout’?”
Answer: Show the “how will you pay for this?” screen.

Environments, Connections, Queries
The database is, in many DB-access languages, an environment.
Database servers maintain some number of connections, so app servers can ask queries or perform modifications.
The app server issues statements: queries and modifications, usually.

Diagram to Remember
Environment -> Connection -> Statement

Real SQL Programming
Embedded SQL: SQL written with “markup” and mixed with host-language code.
Call-Level Interface: host-level programming via an SQL API (Java, C++, Python, Ruby, etc.).
Stored Procedures: user-defined procedures/functions that become part of the schema (server level).

Embedded SQL
Key idea: A preprocessor turns SQL statements into procedure calls that fit with the surrounding host-language code.
All embedded SQL statements begin with EXEC SQL, so the preprocessor can find them easily.

Shared Variables
To connect SQL and the host-language program, the two parts must share some variables.
Declarations of shared variables are bracketed by:
    EXEC SQL BEGIN DECLARE SECTION;      /* always needed */
        <host-language declarations>
    EXEC SQL END DECLARE SECTION;        /* always needed */

Use of Shared Variables
In SQL, the shared variables must be preceded by a colon.
    They may be used as constants provided by the host-language program.
    They may get values from SQL statements and pass those values to the host-language program.
In the host language, shared variables behave like any other variable.

Example: Looking Up Prices
We’ll use C with embedded SQL to sketch the important parts of a function that obtains a beer and a bar, and looks up the price of that beer at that bar.
Assumes the database has our usual Sells(bar, beer, price) relation.

Example: C Plus SQL
    EXEC SQL BEGIN DECLARE SECTION;
        char theBar[21], theBeer[21];   /* 21-char arrays needed for 20 chars + endmarker */
        float thePrice;
    EXEC SQL END DECLARE SECTION;
    /* obtain values for theBar and theBeer */
    EXEC SQL SELECT price INTO :thePrice    /* SELECT-INTO as in PSM */
        FROM Sells
        WHERE bar = :theBar AND beer = :theBeer;
    /* do something with thePrice */

Embedded Queries
Embedded SQL has the same limitations as PSM regarding queries:
    SELECT-INTO works only for a query that produces a single tuple.
    Otherwise, you have to use a cursor.

Cursor Statements
Declare a cursor c with:
    EXEC SQL DECLARE c CURSOR FOR <query>;
Open and close cursor c with:
    EXEC SQL OPEN c;
    EXEC SQL CLOSE c;
Fetch from c by:
    EXEC SQL FETCH c INTO <variable(s)>;
A macro such as NOT_FOUND (assumed to be defined by the program, e.g. by testing SQLSTATE) is true if and only if the FETCH fails to find a tuple.

Example: Print Joe’s Menu
Let’s write C + SQL to print Joe’s menu: the list of beer-price pairs that we find in Sells(bar, beer, price) with bar = Joe’s Bar.
A cursor will visit each Sells tuple that has bar = Joe’s Bar.

Example: Declarations
    EXEC SQL BEGIN DECLARE SECTION;
        char theBeer[21];
        float thePrice;
    EXEC SQL END DECLARE SECTION;
    EXEC SQL DECLARE c CURSOR FOR
        SELECT beer, price FROM Sells
        WHERE bar = 'Joe''s Bar';
The cursor declaration goes outside the declare-section.

Example: Executable Part
    EXEC SQL OPEN c;
    while(1) {                              /* the C style of breaking loops */
        EXEC SQL FETCH c INTO :theBeer, :thePrice;
        if (NOT_FOUND) break;
        /* format and print theBeer and thePrice */
    }
    EXEC SQL CLOSE c;

Need for Dynamic SQL
Most applications use specific queries and modification statements to interact with the database.
The DBMS compiles EXEC SQL … statements into specific procedure calls and produces an ordinary host-language program that uses a library.
What about queries whose text is not known until run time? That is the job of dynamic SQL.

Dynamic SQL
Preparing a query:
    EXEC SQL PREPARE <query-name> FROM <text of the query>;
Executing a query:
    EXEC SQL EXECUTE <query-name>;
“Prepare” = optimize query.
Prepare once, execute many times.

Example: A Generic Interface
    EXEC SQL BEGIN DECLARE SECTION;
        char query[MAX_LENGTH];
    EXEC SQL END DECLARE SECTION;
    while(1) {
        /* issue SQL> prompt */
        /* read user's query into array query */
        EXEC SQL PREPARE q FROM :query;
        EXEC SQL EXECUTE q;
    }
q is an SQL variable representing the optimized form of whatever statement is typed into :query.

Execute-Immediate
If we are only going to execute the query once, we can combine the PREPARE and EXECUTE steps into one.
Use:
    EXEC SQL EXECUTE IMMEDIATE <text>;

Example: Generic Interface Again
    EXEC SQL BEGIN DECLARE SECTION;
        char query[MAX_LENGTH];
    EXEC SQL END DECLARE SECTION;
    while(1) {
        /* issue SQL> prompt */
        /* read user's query into array query */
        EXEC SQL EXECUTE IMMEDIATE :query;
    }

Host/SQL Interfaces Via Libraries
Another approach to connecting databases to conventional languages is to use library calls.
1. C + CLI
2. Java + JDBC
3. PHP + PEAR/DB
4. Python + PyGreSQL

JDBC
Java Database Connectivity (JDBC) is a library with Java as the host language.

Making a Connection
    import java.sql.*;                         // the JDBC classes
    Class.forName("org.postgresql.Driver");    // the driver for Postgres; others exist
    Connection myCon =
        DriverManager.getConnection(…);        // uses the driver loaded by forName
The URL of the database, your name, and password go in the getConnection call.

Statements
JDBC provides two classes:
1. Statement = an object that can accept a string that is a SQL statement and can execute such a string.
2. PreparedStatement = an object that has an associated SQL statement ready to execute.

Creating Statements
The Connection class has methods to create Statements and PreparedStatements.
    Statement stat1 = myCon.createStatement();
    PreparedStatement stat2 =
        myCon.prepareStatement(
            "SELECT beer, price FROM Sells " +
            "WHERE bar = 'Joe''s Bar' "
        );
createStatement (no argument) returns a Statement; prepareStatement (one argument, the SQL text) returns a PreparedStatement.

Executing SQL Statements
JDBC distinguishes queries from modifications, which it calls “updates.”
Statement and PreparedStatement each have methods executeQuery and executeUpdate.
    For Statements: one argument, the query or modification to be executed.
    For PreparedStatements: no argument.

Example: Update
stat1 is a Statement.
We can use it to insert a tuple as:
    stat1.executeUpdate(
        "INSERT INTO Sells " +
        "VALUES('Brass Rail','Bud',3.00)"
    );

Example: Query
stat2 is a PreparedStatement holding the query "SELECT beer, price FROM Sells WHERE bar = 'Joe''s Bar' ".
executeQuery returns an object of type ResultSet, which we examine on the following slides.
The query:
    ResultSet menu = stat2.executeQuery();

Accessing the ResultSet
An object of type ResultSet is something like a cursor.
Method next() advances the “cursor” to the next tuple.
    The first time next() is applied, it gets the first tuple.
    If there are no more tuples, next() returns the value false.

Accessing Components of Tuples
When a ResultSet is referring to a tuple, we can get the components of that tuple by applying certain methods to the ResultSet.
Method getX(i), where X is some type and i is the component number, returns the value of that component.
    The value must have type X.

Example: Accessing Components
menu = the ResultSet for the query "SELECT beer, price FROM Sells WHERE bar = 'Joe''s Bar' ".
Access beer and price from each tuple by:
    while ( menu.next() ) {
        theBeer = menu.getString(1);
        thePrice = menu.getFloat(2);
        /* something with theBeer and thePrice */
    }

Stored Procedures
PSM, or “persistent stored modules,” allows us to store procedures as database schema elements.
PSM = a mixture of conventional statements (if, while, etc.) and SQL.
Lets us do things we cannot do in SQL alone.

Basic PSM Form: PL/pgSQL
    CREATE FUNCTION <name> (<parameter list>)
    RETURNS <type> AS $$
    [ DECLARE declarations ]
    BEGIN
        statements
    END;
    $$ LANGUAGE plpgsql;

Example: Stored Procedure
Let’s write a procedure that takes two arguments b and p, and adds a tuple to Sells(bar, beer, price) that has bar = 'Joe''s Bar', beer = b, and price = p.
    Used by Joe to add to his menu more easily.

Kinds of PSM Statements - (1)
RETURN <expression> sets the return value of a function.
    Unlike C, etc., RETURN in standard PSM does not terminate function execution. (In PL/pgSQL, RETURN does end the function immediately.)
DECLARE <name> <type> is used to declare local variables.
BEGIN . . . END for groups of statements.
    Separate statements by semicolons.

Kinds of PSM Statements - (2)
Assignment statements: <variable> := <expression>;
    Example: b := 'Bud';
Statement labels: give a statement a label by prefixing a name and a colon.

Example: IF
Let’s rate bars by how many customers they have, based on Frequents(drinker, bar).
    <100 customers: ‘unpopular’.
    100-199 customers: ‘average’.
    >= 200 customers: ‘popular’.
Function Rate(b) rates bar b.

Example: IF (continued)
    CREATE FUNCTION Rate (b CHAR(20))
    RETURNS CHAR(10) AS $$
    DECLARE cust INTEGER;                        -- number of customers of bar b
    BEGIN
        SELECT COUNT(*) INTO cust
            FROM Frequents WHERE bar = b;
        IF cust < 100 THEN RETURN 'unpopular';
        ELSEIF cust < 200 THEN RETURN 'average';  -- nested IF statement
        ELSE RETURN 'popular';
        END IF;
    END;
    $$ LANGUAGE plpgsql;
In standard PSM the return occurs at END, not at one of the RETURN statements; PL/pgSQL returns as soon as a RETURN executes.

Loops
Basic form:
    <loop name>: LOOP
        <statements>
    END LOOP;
Exit from a loop by:
    LEAVE <loop name>
(PL/pgSQL writes the label as <<loop name>> before LOOP and exits with EXIT <loop name>.)

Example: Exiting a Loop
    loop1: LOOP
        ...
        LEAVE loop1;
        ...
    END LOOP;
If the LEAVE statement is executed . . .
Control winds up here, immediately after END LOOP.

Breaking Cursor Loops - (4)
The structure of a cursor loop is thus:
    cursorLoop: LOOP
        …
        FETCH c INTO … ;
        IF NotFound THEN LEAVE cursorLoop;
        END IF;
        …
    END LOOP;

Example: Cursor
Let’s write a function that examines Sells(bar, beer, price), and raises by $1 the price of all beers at Joe’s Bar that are under $3.
Returns a count of the number of changes made.

The Needed Declarations
    CREATE FUNCTION JoeGouge() RETURNS int AS $$
    DECLARE theBeer CHAR(20);    -- theBeer and thePrice hold beer-price pairs
    DECLARE thePrice REAL;       -- when fetching through cursor c
    DECLARE cnt INT;
    DECLARE c CURSOR FOR         -- returns Joe's menu
        (SELECT beer, price FROM Sells
         WHERE bar = 'Joe''s Bar');

The Function Body
    BEGIN
        OPEN c;
        cnt := 0;
        <<menuLoop>> LOOP
            FETCH c INTO theBeer, thePrice;
            -- check if the recent FETCH failed to get a tuple
            IF NOT FOUND THEN EXIT menuLoop; END IF;
            -- if Joe charges less than $3 for the beer,
            -- raise its price at Joe's Bar by $1
            IF thePrice < 3.00 THEN
                UPDATE Sells SET price = thePrice + 1.00
                WHERE bar = 'Joe''s Bar' AND beer = theBeer;
                cnt := cnt + 1;
            END IF;
        END LOOP;
        CLOSE c;
        RETURN cnt;
    END;
    $$ LANGUAGE plpgsql;

Disks and Files
Record Formats, Files & Indexing

Disks, Memory, and Files
The BIG picture… (DBMS layers, top to bottom)
    Query Optimization and Execution
    Relational Operators
    Files and Access Methods
    Buffer Management
    Disk Space Management
    DB

Hierarchy of Storage
Cache is faster and more expensive than...
RAM is faster and more expensive than...
DISK is faster and more expensive than...
DVD is faster and more expensive than...
Tape is faster and more expensive than...
If there is more information than will fit at one level, some of it has to be stored at the next level.

Why Not Store Everything in Main Memory?
Costs too much. $100 will buy you either 4GB of RAM or 1TB of disk today.
Main memory is volatile. We want data to be saved between runs. (Obviously!)
Typical storage hierarchy:
    Main memory (RAM) for currently used data.
    Disk for the main database (secondary storage).
    Tapes for archiving older versions of the data (tertiary storage).

Disks and Files
DBMS stores information on (“hard”) disks.
This has major implications for DBMS design!
    READ: transfer data from disk to main memory (RAM).
    WRITE: transfer data from RAM to disk.
    Both are high-cost operations, relative to in-memory operations, so they must be planned carefully!

Disks
Secondary storage device of choice.
Main advantage over tapes: random access vs. sequential.
Data is stored and retrieved in units called disk blocks or pages.
Unlike RAM, time to retrieve a disk page varies depending upon location on disk.
    Therefore, relative placement of pages on disk has a major impact on DBMS performance.

Pages and Blocks
Data files are decomposed into pages.
    A page is a fixed-size piece of contiguous information in the file.
    It is the unit of exchange between disk and main memory.
Disk is divided into page-size blocks of storage.
    A page can be stored in any block.
Application’s request for data is satisfied by:
    Read data page to DBMS buffer in main memory.
    Transfer requested data from buffer to application.
Application’s request to change data is satisfied by:
    Update DBMS buffer.
    (Eventually) copy buffer page to page on disk.

Components of a Disk
[Figure: disk drive with labels for the platters, tracks, sector, disk head, arm assembly, and arm movement]
The platters spin (7200rpm).
The arm assembly is moved in or out to position a head on a desired track. Tracks under heads make a cylinder (imaginary!).
Only one head reads/writes at any one time.
Block size is a multiple of sector size (which is fixed).
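
The 7200 rpm spindle speed quoted above fixes how long one revolution takes, which in turn bounds how long a head may wait for a block to come around. A minimal sketch of that arithmetic, assuming the wanted block is equally likely to be anywhere on the track (so the average wait is half a revolution). The class and method names are illustrative only and do not come from the slides.

```java
public class RotationalLatency {
    /** Time for one full platter revolution, in milliseconds. */
    public static double fullRotationMs(double rpm) {
        // 60,000 ms per minute divided by revolutions per minute
        return 60_000.0 / rpm;
    }

    /**
     * Average wait for a block to rotate under the head.
     * Assumption: the block's starting position is uniformly random,
     * so on average the platter must turn half a revolution.
     */
    public static double averageDelayMs(double rpm) {
        return fullRotationMs(rpm) / 2.0;
    }

    public static void main(String[] args) {
        double rpm = 7200; // spindle speed quoted on the slide
        System.out.printf("full rotation: %.2f ms%n", fullRotationMs(rpm));
        System.out.printf("average rotational delay: %.2f ms%n", averageDelayMs(rpm));
    }
}
```

At 7200 rpm one revolution takes 60000 / 7200, about 8.33 ms, so the wait ranges from 0 to about 8.33 ms and averages about 4.17 ms under the stated assumption.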

I/O Time to Access a Page
Seek latency: time to position heads over the cylinder containing the page (~10-20 ms).
Rotational latency: additional time for platters to rotate so that the start of the page is under the head (~5-10 ms).
Transfer time: time for the platter to rotate over the page (depends on the size of the page).
Latency = seek latency + rotational latency
Goal: minimize average latency, reduce number of page transfers.

Accessing a Disk Page
Time to access (read/write) a disk block:
    seek time (moving arms to position disk head on track)
    rotational delay (waiting for block to rotate under head)
    transfer time (actually moving data to/from disk surface)
Seek time and rotational delay dominate.
    Seek time varies from about 1 to 20 msec.
    Rotational delay varies from 0 to 10 msec.
    Transfer rate is about 1 msec per 4KB page.

Reducing Latency
Store pages containing related information close together on disk.
    Justification: if an application accesses x, it will next access data related to x with high probability.
Page size tradeoff:
    Large page size: data related to x is stored in the same page; hence an additional page transfer can be avoided.
    Small page size: reduces transfer time and reduces buffer size in main memory.
    Typical page size: 4096 bytes.

Arranging Pages on Disk
'Next' block concept:
    blocks on same track, followed by
    blocks on same cylinder, followed by
    blocks on adjacent cylinder
Blocks in a file should be arranged sequentially on disk (by 'next'), to minimize seek and rotational delay.
For a sequential scan, pre-fetching several pages at a time is a big win!

RAID
Disk Array: arrangement of several disks that gives the abstraction of a single, large disk.
Goals: increase performance and reliability.
Two main techniques:
Data striping: data is partitioned; the size of a partition is called the striping unit. Partitions are distributed over several disks.
Redundancy: More disks => more failures. Redundant information allows reconstruction of data if a disk fails.
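
The two RAID techniques can be made concrete with a small sketch. The slides do not say how partitions are assigned to disks or what form the redundant information takes, so both choices here are assumptions: striping units are placed round-robin, and redundancy is a single XOR parity block. The class name, method names, and sample bytes are hypothetical.

```java
import java.util.Arrays;

public class RaidSketch {
    /** Round-robin striping (assumed policy): disk that holds striping unit number `unit`. */
    public static int diskFor(int unit, int numDisks) {
        return unit % numDisks;
    }

    /** Position of that striping unit on its disk, counted in striping units. */
    public static int offsetOnDisk(int unit, int numDisks) {
        return unit / numDisks;
    }

    /** Byte-by-byte XOR of equal-length blocks. */
    public static byte[] xor(byte[]... blocks) {
        byte[] out = new byte[blocks[0].length];
        for (byte[] block : blocks) {
            for (int i = 0; i < out.length; i++) {
                out[i] ^= block[i];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int numDisks = 3; // hypothetical array of three data disks
        for (int unit = 0; unit < 6; unit++) {
            System.out.println("unit " + unit + " -> disk " + diskFor(unit, numDisks)
                    + ", offset " + offsetOnDisk(unit, numDisks));
        }

        // One striping unit per data disk, plus a parity block kept on an extra disk.
        byte[] d0 = {1, 2, 3};
        byte[] d1 = {4, 5, 6};
        byte[] d2 = {7, 8, 9};
        byte[] parity = xor(d0, d1, d2);

        // Suppose the disk holding d1 fails: XOR the survivors with the parity block.
        byte[] rebuilt = xor(d0, d2, parity);
        System.out.println("rebuilt d1 correctly: " + Arrays.equals(rebuilt, d1));
    }
}
```

Round-robin placement lets a sequential scan keep all disks busy at once, which is where the performance gain comes from. XOR parity works because XOR-ing a block into a value twice cancels it out, so any one lost block equals the XOR of the parity block with the surviving blocks.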