Lecture 9: Query processing and optimization
Lena Strömbäck, September 2007
Email: lestr@ida.liu.se

For the final lecture
• There will be some time for clarification of hard topics.
• Please send me examples, topics etc. that you want me to address.

Today's lecture
• Query processing
• Semantic query trees and canonical form
• Heuristic optimisation
• Query plans and code generation

[Figure: Users 1-4 send queries and updates to the database management system and receive answers. The DBMS handles processing of queries and updates and access to stored data in the physical database. Further labels: Real World, Model.]

[Figure: query processing pipeline]
1. SQL query
2. Parsing & Validating → intermediate form of query
3. Query Optimizer → execution plan (access plan)
4. Query Code Generator → code to execute the query
5. Runtime DB processor → query result
The database consists of the System Catalog / DD with meta data (application schema naming & structure information) and the stored database with application data.

Example used in the figure:
    SELECT ORDER_ID, ENTRY_DATE
    FROM ORDER
    WHERE ENTRY_DATE > '2001-08-30'
Intermediate form: π_{ORDER_ID, ENTRY_DATE}(σ_{ENTRY_DATE > '2001-08-30'}(ORDER)) → << RESULT TABLE >>

Parsing and Validation

Example
    StarsIn( movieTitle, movieYear, starName )
    MovieStar( name, address, gender, birthdate )

    SELECT movieTitle
    FROM StarsIn
    WHERE starName IN ( SELECT name
                        FROM MovieStar
                        WHERE birthdate LIKE '%1960');

Grammar
    <Query>     ::= SELECT <SelList> FROM <FromList> WHERE <Condition>
    <SelList>   ::= <Attribute>, <SelList>
    <SelList>   ::= <Attribute>
    <FromList>  ::= <Relation>, <FromList>
    <FromList>  ::= <Relation>
    <Condition> ::= <Condition> AND <Condition>
    <Condition> ::= <Tuple> IN (<Query>)
    <Condition> ::= <Attribute> = <Attribute>
    <Condition> ::= <Attribute> LIKE <Pattern>
    <Tuple>     ::= <Attribute>

Syntax tree
    <Query>
        SELECT
        <SelList>
            <Attribute> movieTitle
        FROM
        <FromList>
            <RelName> StarsIn
        WHERE
        <Condition>
            <Tuple>
                <Attribute> starName
            IN
            (
            <Query>
                SELECT
                <SelList>
                    <Attribute> name
                FROM
                <FromList>
                    <RelName> MovieStar
                WHERE
                <Condition>
                    <Attribute> birthdate
                    LIKE
                    <Pattern> '%1960'
            )

Semantic control
1. Control of used relations
   • Have to be declared in FROM
   • Must exist in the database
2. Control and resolve attributes
   • Attributes must exist in the relations
3. Type checking
   • Attributes that are compared must be of the same type

Semantic tree/Relational algebra
    π_{movieTitle}
        ⋈_{starName=name}
            StarsIn
            π_{name}
                σ_{birthdate LIKE '%1960'}
                    MovieStar

Execution plan/Access plan
    one-pass hash-join, 102 buffers
        IndexScan(StarsIn, IndexR)
        Filter(birthdate LIKE '%1960')
            TableScan(MovieStar)

Generated code (very very simplified)
    ...
    for i = 1 to nTuples(MovieStar)
        tuple = read(MovieStar, i)
        if tuple.birthdate LIKE '%1960'
            add tuple to iresult
    ...
    for i = 1 to nTuples(iresult)
        tuple = read(iresult, i)
        if tuple.name = IStarsIn[starName]
            add tuple to result
    ...

Query trees and canonical form

Basic Relational Algebra
• Select, σ
  • Selects the tuples of a relation that satisfy a condition
  • σ_{<selection condition>}(R)
• Project, π
  • Projects a list of attributes from a relation
  • π_{<attribute list>}(R)
• Join operations
  • R ⋈ S
  • R × S

Write as relational algebra:
    SELECT COURSE.NAME, TEACHES.NAME
    FROM COURSE, TEACHES
    WHERE COURSE.CODE=TEACHES.COURSE AND COURSE.PERIOD=VT2

Canonical form
The easiest way of generating a query tree from an SQL query:
1. Make a large table of all tables in the join using cross product.
2. On this table, use the WHERE clause to make a selection.
3. On this result, make a projection to pick out the attributes listed in the SELECT clause of the query.

Heuristic query optimization

Cost Components
• Access cost to secondary storage
  • access structure, ordering of blocks
• Storage cost
  • storing intermediate results on disk
• Computation cost
  • in-memory searching, sorting, computation
• Memory usage cost
  • memory buffers needed in the server
• Communication cost
  • remote connection cost, network transfer cost

Cost estimation:
• Disc accesses are expensive
• Estimate the disc accesses by estimating the amount of data that needs to be handled when computing the query

Sample Query Tree Execution - projection first
σ_{ENTRY_DATE > '2001-08-30'}(π_{ORDER_ID, ENTRY_DATE}(ORDER))
    σ_{ENTRY_DATE > '2001-08-30'}    n = 2 tuples of 4+27 (= 31) bytes, total: 62 bytes
    π_{ORDER_ID, ENTRY_DATE}         n = 6 tuples of 4+27 (= 31) bytes, total: 186 bytes
    ORDER                            n = 6 tuples of 4+4+27 (= 35) bytes, total: 210 bytes

Sample Query Tree Execution - selection first
π_{ORDER_ID, ENTRY_DATE}(σ_{ENTRY_DATE > '2001-08-30'}(ORDER))
    π_{ORDER_ID, ENTRY_DATE}         n = 2 tuples of 4+27 (= 31) bytes, total: 62 bytes
    σ_{ENTRY_DATE > '2001-08-30'}    n = 2 tuples of 4+4+27 (= 35) bytes, total: 70 bytes
    ORDER                            n = 6 tuples of 4+4+27 (= 35) bytes, total: 210 bytes

JOIN with selection example
    SELECT *
    FROM ol_order_line, it_item
    WHERE ol_item_id = it_item_id AND ol_order_id = 1001
1) Selection after the join:
    σ_{ol_order_id=1001}(ol_order_line ⋈_{ol_item_id = it_item_id} it_item)
2) Selection before the join:
    σ_{ol_order_id=1001}(ol_order_line) ⋈_{ol_item_id = it_item_id} it_item

Heuristic optimisation
Idea: Do selection and projection first, join as late as possible.
Algorithm:
• Break up conjunctive selects into cascades
• Move selects down as far as possible in the tree
• Rearrange select operations, most restrictive first
• Convert cross products to joins, taking the appropriate join condition from a selection
• Move project operations down as far as possible in the tree
• Identify subtrees that can be executed by a single algorithm

Example:
STUDENT (5000 tuples)
    Attribute:  Pnum  Name  Address  Phone  Email  Program  Enrollment
    Size:       10    30    30       20     20     5        6
COURSE (200 tuples)
    Attribute:  Code  Department  Examiner  Description  Period
    Size:       6     5           10        200          5
STUDENTCOURSE (100 000 tuples)
    Attribute:  SPNum  Ccode
    Size:       10     6

    SELECT name, pnum, examiner
    FROM student, course, studentcourse
    WHERE code = "tddb38" and code=ccode and spnum=pnum
400 students have taken the course.

The System Catalog
• Contains useful information to predict which selections to move down in the tree.
Columns: REL_NAME, ATTR_NAME, ATTR_TYPE, DATA_LEN, NUM_DIST, MEMB_PK, MEM_FK, FK_REL, LOW_VAL, HIGH_VAL

Transformation of algebra expressions
1. A conjunctive selection can be broken up into a sequence of selections.
2. Selection is commutative.
3. Only the last projection in a sequence is necessary.
4. Projection commutes with selection.
5. Join (and cross product) are commutative.
6. a. If all the attributes in a selection involve only one relation in a join, then the select can be pushed into the join.
   b. If the selection condition can be written c1 AND c2, where each of the conditions concerns only one relation, c1 and c2 can be pushed down.
7. Projection operations can be pushed into a join, each attribute to the relation it concerns. If the join condition contains additional attributes, these attributes must be added to the projections on the join's children in the tree.
8. Union and intersection are commutative. Set difference is not.
9. Join, cross product, union and intersection are associative.
10. Selection commutes with union, intersection and set difference.
11. Projection commutes with union.
12. Combinations of selection and cross product can be converted into join operations.

Relational algebra
    π_{movieTitle}
        ⋈_{starName=name}
            StarsIn
            π_{name}
                σ_{birthdate LIKE '%1960'}
                    MovieStar

Execution plan
    one-pass hash-join, 102 buffers
        IndexScan(StarsIn, IndexR)
        Filter(birthdate LIKE '%1960')
            TableScan(MovieStar)

Some Heuristics

Algorithms and code generation

Basic Algorithms for Executing Query Operations
(Primitives in node operations of query trees)
• Sorting
  • External sorting (ORDER BY, pre-processing for efficient joins)
  • Sort-merge strategy
• The Select Operation
  • Data scan: linear search, binary search
  • Index: primary on =, primary on range, secondary (B+tree index)
  • Conjunctive selections: index + test, composite index, record pointer intersection
• The JOIN Operation
  • Nested-loop join, single-loop join, sort-merge join, hash join
• PROJECT and set operations
  • π: straightforward, plus duplicate elimination
  • Union, Intersection, Difference: sort-merge plus duplicate elimination

Sort-Merge
• Sorting algorithm suitable for files that do not fit in memory
• Sorting is divided into two phases:
  • Sorting
    • The file is divided into "runs" that fit into the available buffers.
    • nr_of_initruns = ceiling(blocks / blocks_in_buffer)
  • Merging
    • The sorted runs are merged during one or several "passes".
    • The degree of merging is the number of runs that can be merged in each pass.
    • degree_of_merging = min(blocks_in_buffer - 1, nr_of_initruns)
    • number_of_passes = ceiling(log_{degree_of_merging}(nr_of_initruns))

Sort-Merge: Example
blocks_in_buffer = 5, blocks = 1024
→ nr_of_initruns = ceiling(1024 / 5) = 205
degree_of_merging = min(5 - 1, 205) = 4
    Pass 0: 205 runs
    Pass 1: 52 runs
    Pass 2: 13 runs
    Pass 3: 4 runs
    Pass 4: 1 run
Four merge passes are needed to sort-merge the file.
cost = (2 * blocks) + (2 * (blocks * ceiling(log_{degree_of_merging}(nr_of_initruns))))
Example: cost = 2 * 1024 + 2 * 1024 * 4 = 10240

Select Operation
• Linear Search
  • Retrieve and test every record
• Binary Search
  • If the selection involves an equality comparison on a key attribute used for file ordering
• Primary or Secondary Index
  • Use the index, possibly for several elements in an interval
• Index + Test
• Composite Index
• Record Pointer Intersection

Implementing "Project"
• If the attribute list contains the key
  • No problem, duplicates will not occur
• Otherwise
  • Must remove duplicates

Implementing Joins
• Nested-loop
  • For every record t in R, retrieve every record s from S and test the join condition.
• Single-loop
  • For every record t in R, retrieve all matching records s from S using an index.
• Sort-Merge
  • Each file of records, sorted on the join attribute, is scanned once.
• Hash
  • Hash the records of the smaller file R into buckets. Then hash the records of S and combine each record with all matching records from R in the same bucket. The smaller file must fit in memory!
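
As a rough illustration of the hash join described above, the following Python sketch builds in-memory buckets from the smaller input and probes them with the larger one. It assumes records are plain dictionaries and that the smaller input fits in memory; the function name `hash_join`, its parameters and the sample rows are made up for this example and are not from the lecture.

```python
from collections import defaultdict


def hash_join(smaller, larger, small_key, large_key):
    """Equi-join two lists of dict records (hypothetical helper).

    Build phase: hash every record of the smaller input into buckets.
    Probe phase: look up each record of the larger input in the buckets.
    """
    buckets = defaultdict(list)
    for rec in smaller:
        buckets[rec[small_key]].append(rec)

    result = []
    for rec in larger:
        # Only records in the matching bucket can satisfy the join condition.
        for match in buckets.get(rec[large_key], []):
            result.append({**match, **rec})
    return result


# Made-up sample rows using the attribute names of the lecture's schema.
movie_star = [
    {"name": "Star A", "birthdate": "01-01-1960"},
    {"name": "Star B", "birthdate": "05-05-1960"},
]
stars_in = [
    {"movieTitle": "Movie 1", "starName": "Star A"},
    {"movieTitle": "Movie 2", "starName": "Star C"},
    {"movieTitle": "Movie 3", "starName": "Star A"},
]

joined = hash_join(movie_star, stars_in, "name", "starName")
print([row["movieTitle"] for row in joined])
```

Each input is read once, which is why this variant is attractive when one of the files is small enough to keep in memory.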

Summary
• Query processing steps
• Relational algebra
• Heuristic optimization
• Basic algorithms for executing query operations
• Cost components

Heuristic Optimization - SQL-example query
    SELECT E.LNAME
    FROM EMPLOYEE E, WORKS_ON W, PROJECT P
    WHERE P.PNAME = 'Aquarius' AND P.PNUMBER = W.PNO AND W.ESSN = E.SSN AND E.BDATE > '1957-12-31'

Heuristic Optimization - Canonical Form
    π_{LNAME}
        σ_{PNAME='Aquarius' AND PNUMBER=PNO AND ESSN=SSN AND BDATE>'1957-12-31'}
            ×
                ×
                    EMPLOYEE
                    WORKS_ON
                PROJECT

Heuristic Optimization - Move Select Down
    π_{LNAME}
        σ_{PNUMBER=PNO}
            ×
                σ_{ESSN=SSN}
                    ×
                        σ_{BDATE>'1957-12-31'}
                            EMPLOYEE
                        WORKS_ON
                σ_{PNAME='Aquarius'}
                    PROJECT

Heuristic Optimization - Apply Most Restrictive Select First
    π_{LNAME}
        σ_{ESSN=SSN}
            ×
                σ_{PNUMBER=PNO}
                    ×
                        σ_{PNAME='Aquarius'}
                            PROJECT
                        WORKS_ON
                σ_{BDATE>'1957-12-31'}
                    EMPLOYEE

Heuristic Optimization - Convert Cartesian Product/Select with Join
    π_{LNAME}
        ⋈_{ESSN=SSN}
            ⋈_{PNUMBER=PNO}
                σ_{PNAME='Aquarius'}
                    PROJECT
                WORKS_ON
            σ_{BDATE>'1957-12-31'}
                EMPLOYEE

Heuristic Optimization - Move Projections Down the Tree
    π_{LNAME}
        ⋈_{ESSN=SSN}
            π_{ESSN}
                ⋈_{PNUMBER=PNO}
                    π_{PNUMBER}
                        σ_{PNAME='Aquarius'}
                            PROJECT
                    π_{ESSN,PNO}
                        WORKS_ON
            π_{SSN,LNAME}
                σ_{BDATE>'1957-12-31'}
                    EMPLOYEE
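
To make the effect of the transformations above concrete, the following Python sketch evaluates the example query twice: once in canonical form (cross product, then selection, then projection) and once with selections and projections pushed down and the cross products replaced by joins. The sample rows are invented for illustration, relations are modelled as lists of dictionaries, and the function names `canonical` and `optimized` are hypothetical. Each function also reports the size of its largest intermediate result as a crude stand-in for cost.

```python
from itertools import product

# Invented sample rows; only the attribute names come from the example query.
EMPLOYEE = [
    {"SSN": 1, "LNAME": "Smith", "BDATE": "1965-01-09"},
    {"SSN": 2, "LNAME": "Wong", "BDATE": "1955-12-08"},
    {"SSN": 3, "LNAME": "Zelaya", "BDATE": "1968-07-19"},
]
WORKS_ON = [
    {"ESSN": 1, "PNO": 10},
    {"ESSN": 2, "PNO": 10},
    {"ESSN": 3, "PNO": 20},
]
PROJECT = [
    {"PNUMBER": 10, "PNAME": "Aquarius"},
    {"PNUMBER": 20, "PNAME": "Gemini"},
]


def canonical():
    """Canonical form: cross product, one big selection, final projection."""
    crossed = [{**p, **w, **e} for p, w, e in product(PROJECT, WORKS_ON, EMPLOYEE)]
    selected = [
        t for t in crossed
        if t["PNAME"] == "Aquarius" and t["PNUMBER"] == t["PNO"]
        and t["ESSN"] == t["SSN"] and t["BDATE"] > "1957-12-31"
    ]
    # Projection on LNAME with duplicate elimination (set semantics).
    return sorted({t["LNAME"] for t in selected}), len(crossed)


def optimized():
    """Selections and projections pushed down, cross products turned into joins."""
    p = [{"PNUMBER": t["PNUMBER"]} for t in PROJECT if t["PNAME"] == "Aquarius"]
    e = [{"SSN": t["SSN"], "LNAME": t["LNAME"]}
         for t in EMPLOYEE if t["BDATE"] > "1957-12-31"]
    # Join PROJECT with WORKS_ON on PNUMBER = PNO, keeping only ESSN.
    pw = [{"ESSN": w["ESSN"]} for x in p for w in WORKS_ON if x["PNUMBER"] == w["PNO"]]
    # Join with EMPLOYEE on ESSN = SSN.
    joined = [{**a, **b} for a in pw for b in e if a["ESSN"] == b["SSN"]]
    largest = max(len(p), len(e), len(pw), len(joined))
    return sorted({t["LNAME"] for t in joined}), largest


print(canonical())   # same answer, largest intermediate result: 18 tuples
print(optimized())   # same answer, largest intermediate result: 2 tuples
```

Both plans return the same last names, but the canonical plan materialises the full cross product while the rewritten plan never holds more than a couple of tuples. The joins here are simple nested loops; the point of the sketch is the shape of the plan, not the join algorithm.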