Lecture 9
Query processing and optimization
Lena Strömbäck
October 2008

Announcements
• Remember to register for the exam!
• On Friday 11/10 and Monday 13/10 teachers will be available to answer questions. See the home page for more information.

For the final lecture:
• There will be some time for clarification of hard topics.
• Please send me, by Monday, examples, topics etc. that you want me to address.
• Email: lestr@ida.liu.se

[Figure: users 1-4 send queries and updates, based on a model of the real world, to the database management system and receive answers. The DBMS processes the queries and updates and accesses the stored data in the physical database.]

Today's lecture
• Query processing
• Semantic query trees and canonical form
• Heuristic optimisation
• Query plans and code generation

[Figure: query processing pipeline. An SQL query goes through Parsing & Validating, which uses application schema naming and structure information from the System Catalog / DD with meta data, and yields an intermediate form of the query. The Query Optimizer turns this into an execution plan (access plan), the Query Code Generator produces code to execute the query, and the Runtime DB processor runs that code against the stored database with application data to produce the query result.]

Example used in the figure:
    SELECT ORDER_ID, ENTRY_DATE
    FROM ORDER
    WHERE ENTRY_DATE > '2001-08-30'
Intermediate form and execution plan:
    π_{ORDER_ID, ENTRY_DATE}(σ_{ENTRY_DATE > '2001-08-30'}(ORDER))

Example
    StarsIn(movieTitle, movieYear, starName)
    MovieStar(name, address, gender, birthdate)

    SELECT movieTitle
    FROM StarsIn
    WHERE starName IN (
        SELECT name
        FROM MovieStar
        WHERE birthdate LIKE '%1960');

Semantic control
1. Control of used relations
   • Have to be declared in FROM
   • Must exist in the database
2. Control and resolve attributes
   • Attributes must exist in the relations
3. Type checking
   • Attributes that are compared must be of the same type

Semantic tree/Relational algebra
    π_{movieTitle}(StarsIn ⋈_{starName = name} π_{name}(σ_{birthdate LIKE '%1960'}(MovieStar)))

Execution plan/Access plan
    one-pass hash-join, 102 buffers
        IndexScan(StarsIn, IndexR)
        Filter(birthdate LIKE '%1960')
            TableScan(MovieStar)

Generated code (very simplified)
    …
    for i = 1 to nTuples(MovieStar)
        tuple = read(MovieStar, i)
        if tuple.birthdate LIKE '%1960'
            add tuple to iresult
    …
    for i = 1 to nTuples(iresult)
        tuple = read(iresult, i)
        if tuple.name = IStarsIn[starName]
            add tuple to result
    …

Query trees and canonical form
Institutionen för datavetenskap (IDA), Linköpings universitet, 2008-10-03

Relational algebra
• Selection, σ
  • Selects tuples from a relation
  • σ_{<selection condition>}(R)
  • SELECT * FROM R WHERE <selection condition>
• Projection, π
  • Selects attributes from a relation
  • π_{<attribute list>}(R)
  • SELECT <attribute list> FROM R
• Sets
  • R ∪ S
  • R - S
  • R ∩ S
• Cross product
  • R × S
  • SELECT * FROM R, S
• Join
  • R ⋈_{<condition>} S
  • SELECT * FROM R, S
    WHERE <condition>
• For the set operations, R and S must have the same attributes and arity.

Relational algebra
• Combine
  • π_{FNAME, LNAME, SALARY}(σ_{DNO=5}(EMPLOYEE))
  • SELECT FNAME, LNAME, SALARY FROM EMPLOYEE WHERE DNO=5
• Rename
  • ρ_{S(B1, B2, …, Bn)}(R)
  • ρ_{S}(R)
  • ρ_{(B1, B2, …, Bn)}(R)
  • FROM R AS S(B1, B2, …, Bn)
• Aggregates
  • _{<group attributes>} F _{<functions>}(R)
  • Example: _{DNO} F _{COUNT SSN, AVERAGE SALARY}(EMPLOYEE)
  • SELECT COUNT(SSN), AVG(SALARY) FROM EMPLOYEE GROUP BY DNO

Write as relational algebra:
    SELECT COURSE.NAME, TEACHES.NAME
    FROM COURSE, TEACHES
    WHERE COURSE.CODE = TEACHES.COURSE
      AND COURSE.PERIOD = 'VT2'

Canonical form
The easiest way of generating a query tree from an SQL query:
1. Make a large table of all tables in the join using cross product.
2. On this table, use the WHERE clause to make a selection.
3. On this result, make a projection to pick out the attributes pointed out by the SELECT clause of the query.
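
The three canonical-form steps above can be tried out on small in-memory relations. The following is a minimal sketch, not how a DBMS implements it: the helper names `cross`, `select` and `project` and the sample rows are assumptions made for illustration, while the relation and attribute names come from the COURSE/TEACHES exercise.

```python
from itertools import product

# Hypothetical sample rows; only the relation and attribute names come from the exercise.
COURSE = [
    {"CODE": "C1", "NAME": "Databases", "PERIOD": "VT2"},
    {"CODE": "C2", "NAME": "Compilers", "PERIOD": "HT1"},
]
TEACHES = [
    {"NAME": "Ann", "COURSE": "C1"},
    {"NAME": "Bob", "COURSE": "C2"},
]


def cross(r, r_name, s, s_name):
    """Step 1: cross product. Attributes are prefixed with the relation name."""
    return [
        {**{f"{r_name}.{k}": v for k, v in t.items()},
         **{f"{s_name}.{k}": v for k, v in u.items()}}
        for t, u in product(r, s)
    ]


def select(rel, predicate):
    """Step 2: keep the tuples that satisfy the WHERE condition."""
    return [t for t in rel if predicate(t)]


def project(rel, attrs):
    """Step 3: keep only the attributes named in the SELECT clause."""
    return [tuple(t[a] for a in attrs) for t in rel]


# Canonical form: projection over selection over cross product.
big = cross(COURSE, "COURSE", TEACHES, "TEACHES")
filtered = select(
    big,
    lambda t: t["COURSE.CODE"] == t["TEACHES.COURSE"]
    and t["COURSE.PERIOD"] == "VT2",
)
result = project(filtered, ["COURSE.NAME", "TEACHES.NAME"])
print(len(big), result)
```

The cross product is materialised in full before anything is filtered, which is exactly why the canonical form is a starting point for optimisation rather than a plan to execute.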
Cost components
• Access cost to secondary storage
  • access structure, ordering of blocks
• Storage cost
  • storing intermediate results on disk
• Computation cost
  • in-memory searching, sorting, computation
• Memory usage cost
  • memory buffers needed in the server
• Communication cost
  • remote connection cost, network transfer cost

Heuristic query optimization

Cost estimation
• Disk accesses are expensive.
• Estimate the disk accesses by estimating the amount of data that needs to be handled when computing the query.

Sample query tree execution - projection first
    σ_{ENTRY_DATE > '2001-08-30'}(π_{ORDER_ID, ENTRY_DATE}(ORDER))
• ORDER: n = 6 tuples of 4+4+27 (= 35) bytes each, total 210 bytes
• After π_{ORDER_ID, ENTRY_DATE}: n = 6 tuples of 4+27 (= 31) bytes each, total 186 bytes
• After σ_{ENTRY_DATE > '2001-08-30'}: n = 2 tuples of 4+27 (= 31) bytes each, total 62 bytes

Sample query tree execution - selection first
    π_{ORDER_ID, ENTRY_DATE}(σ_{ENTRY_DATE > '2001-08-30'}(ORDER))
• ORDER: n = 6 tuples of 4+4+27 (= 35) bytes each, total 210 bytes
• After σ_{ENTRY_DATE > '2001-08-30'}: n = 2 tuples of 4+4+27 (= 35) bytes each, total 70 bytes
• After π_{ORDER_ID, ENTRY_DATE}: n = 2 tuples of 4+27 (= 31) bytes each, total 62 bytes
Doing the selection first shrinks the intermediate result from 186 to 70 bytes.

JOIN with selection example
    SELECT *
    FROM ol_order_line, it_item
    WHERE ol_item_id = it_item_id
      AND ol_order_id = 1001
Two equivalent query trees:
• Selection after the join: σ_{ol_order_id = 1001}(ol_order_line ⋈_{ol_item_id = it_item_id} it_item)
• Selection before the join: σ_{ol_order_id = 1001}(ol_order_line) ⋈_{ol_item_id = it_item_id} it_item

Heuristic optimisation
Idea: Do selection and projection first, join as late as possible.
Algorithm:
• Break up conjunctive selects into cascades.
• Move selects down as far as possible in the tree.
• Rearrange select operations - most restrictive first.
• Convert cross products to joins, taking the appropriate join condition from a selection.
• Move project operations down as far as possible in the tree.
• Identify subtrees that can be executed by a single algorithm.

Example:
The number after each attribute is the value given for it on the slide (presumably its length).
    STUDENT(Pnum 10, Name 30, Address 30, Phone 20, Email 20, Program 5, Enrollment 6)
    COURSE(Code 6, Department 5, Examiner 10, Description 200, Period 5)
    STUDENTCOURSE(SPNum 10, Ccode 6)
STUDENT relation: 5000 tuples; COURSE relation: 200 tuples; STUDENTCOURSE relation: 100 000 tuples.

    SELECT name, pnum, examiner
    FROM student, course, studentcourse
    WHERE code = 'tddb38' AND code = ccode AND spnum = pnum

400 students have taken the course.

The System Catalog
• Contains useful information to predict which selections to move down in the tree.
• Columns shown on the slide: REL_NAME, ATTR_NAME, ATTR_TYPE, DATA_LEN, NUM_DIST, MEMB_PK, MEM_FK, FK_REL, LOW_VAL, HIGH_VAL

Transformation of algebra expressions
1. A conjunctive selection can be broken up into a sequence of selections.
2. Selection is commutative.
3. Only the last projection in a sequence is necessary.
4. Projection commutes with selection, provided the selection condition only uses attributes in the projection list.
5. Join (and cross product) are commutative.
6. a. If all the attributes in a selection involve only one relation in a join, then the select can be pushed into the join, onto that relation.
   b. If the selection condition can be written c1 AND c2, where each of the conditions only concerns one relation, c1 and c2 can be pushed down.
7. Projection operations can be pushed into a join, each attribute to the relation it concerns. If the join condition contains additional attributes, these attributes must be added to the projections on the join's children in the tree.
8. Union and intersection are commutative. Set difference is not.
9. Join, cross product, union and intersection are associative.
10. Selection commutes with union, intersection and set difference.
11. Projection commutes with union.
12. Combinations of selection and cross product can be converted into join operations.

Relational algebra
    π_{movieTitle}(StarsIn ⋈_{starName = name} π_{name}(σ_{birthdate LIKE '%1960'}(MovieStar)))

Execution plan
    one-pass hash-join, 102 buffers
        IndexScan(StarsIn, IndexR)
        Filter(birthdate LIKE '%1960')
            TableScan(MovieStar)

Some Heuristics

Algorithms and code generation

Basic Algorithms for Executing Query Operations
(Primitives in node operations of query trees)
• External sorting (ORDER BY, pre-processing for efficient joins)
  • sort-merge strategy
• The SELECT operation
  • Data scan: linear search, binary search
  • Index: primary on =, primary on range, secondary (B+-tree index)
  • Conjunctive selections: index + test, composite index, record pointer intersection
• The JOIN operation
  • Nested-loop join, single-loop join, sort-merge join, hash join
• PROJECT and set operations
  • π: straightforward, plus duplicate elimination
  • Union, intersection, difference: sort-merge plus duplicate elimination

Sort-Merge
• Sorting algorithm suitable for files that do not fit in memory
• Sorting is divided into two phases:
  • Sorting
    • The file is divided into "runs" that fit into the available buffers; each run is sorted in memory and written back to disk.
    • nr_of_initruns = ceiling(blocks / blocks_in_buffer)
  • Merging
    • The sorted runs are merged during one or several "passes".
    • The degree of merging is the number of runs that can be merged in each pass.
    • degree_of_merging = min(blocks_in_buffer - 1, nr_of_initruns)
    • number of passes = ceiling(log_{degree_of_merging}(nr_of_initruns))

Sort-Merge
Example: blocks_in_buffer = 5, blocks = 1024
→ nr_of_initruns = ceiling(1024 / 5) = 205
→ degree_of_merging = min(5 - 1, 205) = 4
    Pass 0: 205 runs
    Pass 1: 52 runs
    Pass 2: 13 runs
    Pass 3: 4 runs
    Pass 4: 1 run
Four merge passes are needed to sort-merge the file, which agrees with ceiling(log_{4}(205)) = 4.
cost = (2 * blocks) + (2 * blocks * ceiling(log_{degree_of_merging}(nr_of_initruns)))
Example cost = 2 * 1024 + 2 * 1024 * 4 = 10240 block accesses

Select Operation
• Linear search
  • Retrieve and test every record.
• Binary search
  • Applicable if the selection involves an equality comparison on a key attribute used for file ordering.
• Primary or secondary index
  • Use the index, possibly for several elements in an interval.
• Index + test
• Composite index
• Record pointer intersection

Implementing Joins
• Nested-loop
  • For every record t in R, retrieve every record s from S and test the join condition.
• Single-loop
  • For every record t in R, retrieve all matching records s from S using an index.
• Sort-merge
  • Each sorted file of records is scanned once.
• Hash
  • Hash the records of the smaller file R into buckets. Then hash the records of S and combine each record with all matching records from R in the same bucket. The hashed file R must fit in memory!

Implementing "Project"
• If the attribute list contains the key
  • No problem, duplicates will not occur.
• Otherwise
  • Duplicates must be removed.
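
To make the difference between the nested-loop and hash strategies above concrete, here is a minimal in-memory sketch of an equi-join done both ways. It is an illustration under stated assumptions, not a DBMS implementation: the function names and the sample rows are hypothetical, the relations are plain Python lists, and a Python dictionary plays the role of the hash buckets.

```python
from collections import defaultdict


def nested_loop_join(r, s, r_key, s_key):
    """For every record t in r, scan every record u in s and test the condition."""
    out = []
    for t in r:
        for u in s:
            if t[r_key] == u[s_key]:
                out.append({**t, **u})
    return out


def hash_join(r, s, r_key, s_key):
    """Equi-join; r is assumed to be the smaller relation and to fit in memory."""
    buckets = defaultdict(list)
    for t in r:  # build phase: hash the records of r on the join attribute
        buckets[t[r_key]].append(t)
    out = []
    for u in s:  # probe phase: each record of s only meets its own bucket
        for t in buckets.get(u[s_key], []):
            out.append({**t, **u})
    return out


# Hypothetical rows; the attribute names follow the MovieStar/StarsIn example.
movie_star = [
    {"name": "A", "birthdate": "1/1/1960"},
    {"name": "B", "birthdate": "2/2/1970"},
]
stars_in = [
    {"movieTitle": "M1", "movieYear": 1990, "starName": "A"},
    {"movieTitle": "M2", "movieYear": 1995, "starName": "A"},
    {"movieTitle": "M3", "movieYear": 2000, "starName": "C"},
]

nl = nested_loop_join(movie_star, stars_in, "name", "starName")
hj = hash_join(movie_star, stars_in, "name", "starName")
print([row["movieTitle"] for row in hj])
```

The nested-loop version compares every pair of records, while the hash version reads each input once; the price is that the build side has to be held in memory, which is the restriction noted for the hash join above.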
Summary
• Query processing steps
• Relational algebra
• Heuristic optimization
• Basic algorithms for executing query operations
• Cost components

Heuristic Optimization
SQL example query:
    SELECT E.LNAME
    FROM EMPLOYEE E, WORKS_ON W, PROJECT P
    WHERE P.PNAME = 'Aquarius'
      AND P.PNUMBER = W.PNO
      AND W.ESSN = E.SSN
      AND E.BDATE > '1957-12-31'

Heuristic Optimization - Canonical Form
    π_{LNAME}(σ_{PNAME='Aquarius' AND PNUMBER=PNO AND ESSN=SSN AND BDATE>'1957-12-31'}((EMPLOYEE × WORKS_ON) × PROJECT))

Heuristic Optimization - Move Select Down
    π_{LNAME}(σ_{PNUMBER=PNO}(σ_{ESSN=SSN}(σ_{BDATE>'1957-12-31'}(EMPLOYEE) × WORKS_ON) × σ_{PNAME='Aquarius'}(PROJECT)))

Heuristic Optimization - Apply Most Restrictive Select First
    π_{LNAME}(σ_{ESSN=SSN}(σ_{PNUMBER=PNO}(σ_{PNAME='Aquarius'}(PROJECT) × WORKS_ON) × σ_{BDATE>'1957-12-31'}(EMPLOYEE)))

Heuristic Optimization - Convert Cartesian Product/Select with Join
    π_{LNAME}((σ_{PNAME='Aquarius'}(PROJECT) ⋈_{PNUMBER=PNO} WORKS_ON) ⋈_{ESSN=SSN} σ_{BDATE>'1957-12-31'}(EMPLOYEE))

Heuristic Optimization - Move Projections Down the Tree
    π_{LNAME}(π_{ESSN}(π_{PNUMBER}(σ_{PNAME='Aquarius'}(PROJECT)) ⋈_{PNUMBER=PNO} π_{ESSN, PNO}(WORKS_ON)) ⋈_{ESSN=SSN} π_{SSN, LNAME}(σ_{BDATE>'1957-12-31'}(EMPLOYEE)))
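
The effect of the rewriting steps above can be checked on toy data. The sketch below evaluates the example query twice, once in canonical form and once with the selections applied before the joins, and reports how many tuples each plan has to build along the way. The rows and the function names are hypothetical and chosen only for illustration; the attribute names follow the example query, and the joins are written as simple in-memory loops.

```python
from itertools import product

# Hypothetical toy data; only the attribute names come from the example query.
EMPLOYEE = [
    {"SSN": 1, "LNAME": "Smith", "BDATE": "1965-01-09"},
    {"SSN": 2, "LNAME": "Wong", "BDATE": "1955-12-08"},
    {"SSN": 3, "LNAME": "Zelaya", "BDATE": "1968-07-19"},
]
WORKS_ON = [
    {"ESSN": 1, "PNO": 10},
    {"ESSN": 2, "PNO": 10},
    {"ESSN": 3, "PNO": 20},
]
PROJECT = [
    {"PNUMBER": 10, "PNAME": "Aquarius"},
    {"PNUMBER": 20, "PNAME": "Orion"},
]


def canonical_plan():
    """Cross product of all relations, one big selection, then projection."""
    combined = [{**e, **w, **p}
                for e, w, p in product(EMPLOYEE, WORKS_ON, PROJECT)]
    selected = [
        t for t in combined
        if t["PNAME"] == "Aquarius"
        and t["PNUMBER"] == t["PNO"]
        and t["ESSN"] == t["SSN"]
        and t["BDATE"] > "1957-12-31"  # ISO dates compare correctly as strings
    ]
    return [t["LNAME"] for t in selected], len(combined)


def optimized_plan():
    """Selections first, then the two joins, then the projection."""
    proj = [p for p in PROJECT if p["PNAME"] == "Aquarius"]
    emp = [e for e in EMPLOYEE if e["BDATE"] > "1957-12-31"]
    pw = [{**p, **w} for p in proj for w in WORKS_ON
          if p["PNUMBER"] == w["PNO"]]
    pwe = [{**t, **e} for t in pw for e in emp if t["ESSN"] == e["SSN"]]
    return [t["LNAME"] for t in pwe], max(len(pw), len(pwe))


names_a, largest_a = canonical_plan()
names_b, largest_b = optimized_plan()
print(names_a, largest_a)
print(names_b, largest_b)
```

Both plans return the same answer, but the largest intermediate result differs by roughly an order of magnitude even on three-row tables, and the gap grows with the product of the relation sizes.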