Course: M0184 / Pengolahan Data Distribusi (Distributed Data Processing)
Year: 2005
Version: Session 9

QUERY OPTIMIZATION

OBJECTIVES
• Definition of query optimization
• Query optimization process
• Essential aspects of query processing in a distributed environment
• The importance of query optimization

Query Optimization
• Query optimization refers to the process of producing a query execution plan (QEP), which represents an execution strategy for the query.
• Query optimization is the process of ensuring that either the total cost or the total response time for a query is minimized.
• A query optimizer, the software module that performs query optimization, is usually seen as consisting of three components: a search space, a cost model, and a search strategy.

Query Optimization Process
[Figure: Input query → search space generation (using transformation rules) → equivalent QEPs → search strategy (using the cost model) → best QEP. Source: Principles of Distributed Database Systems]

Query Optimization Process (cont'd)
• The SEARCH SPACE is the set of alternative execution plans that represent the input query. The search space is obtained by applying transformation rules.
• The COST MODEL predicts the cost of a given execution plan. To be accurate, the cost model must have good knowledge of the distributed execution environment.
• The SEARCH STRATEGY explores the search space and selects the best plan using the cost model.

Essential Aspects of Query Processing in a Distributed Environment
• Data and messages have to be transmitted across communication lines, which tends to slow down the whole process.
• The existence of multiple processors in the network means that there are opportunities for parallel processing and data transformation, which raises the possibility of speeding up responses.

The Importance of Query Optimization
There can be very substantial savings in the cost of executing a query or in the length of time the user has to wait for a response.
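The three optimizer components described above (search space, cost model, search strategy) can be illustrated with a toy Python sketch that also shows how large the savings from a good plan can be. It is not the algorithm of any particular system and rests on stated assumptions: the search space is taken to be every left-deep join order (all permutations of the input relations), the cost model counts only nested-loop join comparisons (n * m per join) and ignores transmission, and the join selectivity of 5e-4 is an invented value. The labels SURVEY_r and HOSP_r are hypothetical names for the restricted SURVEY and HOSPITALIZATION relations of this session's example; their sizes follow that example.

```python
from itertools import permutations

# Hypothetical labels and sizes: SURVEY_r and HOSP_r stand for the restricted
# SURVEY and HOSPITALIZATION relations of the example in this session.
CARDINALITY = {"PATIENT": 10_000, "SURVEY_r": 1_000, "HOSP_r": 200}

# Assumed join selectivity (fraction of compared tuple pairs that match).
SELECTIVITY = 5e-4


def search_space(relations):
    """Search space generation: each left-deep join order is an equivalent QEP."""
    return list(permutations(relations))


def cost(plan):
    """Toy cost model: a nested-loop join of n and m tuples costs n * m comparisons.

    Transmission cost is ignored in this sketch.
    """
    size = CARDINALITY[plan[0]]
    total = 0
    for rel in plan[1:]:
        m = CARDINALITY[rel]
        total += size * m
        # Estimated size of the intermediate result fed into the next join.
        size = max(1, round(size * m * SELECTIVITY))
    return total


def search_strategy(plans):
    """Exhaustive search strategy: keep the plan with the lowest predicted cost."""
    return min(plans, key=cost)


plans = search_space(["PATIENT", "SURVEY_r", "HOSP_r"])
best = search_strategy(plans)
print(best, cost(best))              # ('SURVEY_r', 'HOSP_r', 'PATIENT') 1200000
print(max(cost(p) for p in plans))   # 11000000
```

With these made-up numbers the worst join order costs about nine times as much as the best (11,000,000 vs 1,200,000 comparisons). Real optimizers usually prune the search space instead of enumerating all n! join orders.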
• Execution-cost optimization aims to minimize the total system resources used by a query and hence reduce its cost.
• Response time can also be taken to represent a cost to an organization.

Variation in Ways of Executing Queries
• Two basic tools are used in optimization:
  1. Query transformation: re-expressing the query in an equivalent form, for example by reordering relational operators such as JOIN and PROJECT.
  2. Query mapping: executing each relational operator using low-level algorithms and access devices such as pointers.

EXAMPLE
SITE            RELATION                                                  CARDINALITY
Hospital        HOSPITALIZATION (Pat-Name, DOB, Admit, Discharge, Dept)   200,000
Health Center   PATIENT (Pat-Name, DOB, GP)                               10,000
Community Care  SURVEY (Pat-Name, DOB, Weight, Type-of-work)              100,000

EXAMPLE (cont'd)
• Query: Find the names and GPs of all patients who weigh over 100 kg and have been treated in the orthopedic department since 1 January 2005.

EXAMPLE – OPTION 1
• Move the PATIENT relation to Community Care, join it with the restricted SURVEY relation, and move the result to the query site, where it is joined with the restricted HOSPITALIZATION relation shipped from the Hospital.
  t: cost of transmitting one tuple
  c: processing cost per pair of tuples compared in a join
  Cost: (10000 * 1000c) for joining PATIENT with the restricted SURVEY (restricted SURVEY and the join result each assumed to hold 1000 tuples) + (200 * 1000c) for joining this result with the restricted HOSPITALIZATION (200 tuples)
  Transmit: 200t + 10000t + 1000t

EXAMPLE – OPTION 2
• Send the restricted HOSPITALIZATION relation to Community Care and join it with the restricted SURVEY relation there. Then join the result (assumed to hold 100 tuples) with PATIENT at the Health Center.
  Cost: 1000 * 200c at Community Care + 100 * 10000c at the Health Center
  Transmit: 200t + 100t + 1000t
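The arithmetic behind the two options can be checked with a short Python sketch. It assumes, as the cost formulas imply, that each join is a nested-loop join costing n * m tuple comparisons, and it combines processing and communication into one figure: comparisons * c + tuples shipped * t. The class name Strategy and the unit costs C and T are hypothetical; only the tuple counts come from the slides.

```python
from dataclasses import dataclass


@dataclass
class Strategy:
    name: str
    joins: list[tuple[int, int]]  # (n, m) tuple counts of each nested-loop join
    shipped: list[int]            # tuples transmitted between sites

    def comparisons(self) -> int:
        # A nested-loop join of n and m tuples compares n * m pairs.
        return sum(n * m for n, m in self.joins)

    def tuples_shipped(self) -> int:
        return sum(self.shipped)

    def total_cost(self, c: float, t: float) -> float:
        # Processing cost plus communication cost.
        return self.comparisons() * c + self.tuples_shipped() * t


# Tuple counts taken from the two options on the slides.
option1 = Strategy("Option 1", joins=[(10_000, 1_000), (200, 1_000)], shipped=[200, 10_000, 1_000])
option2 = Strategy("Option 2", joins=[(1_000, 200), (100, 10_000)], shipped=[200, 100, 1_000])

# Hypothetical unit costs: shipping a tuple is assumed to cost 1000x one comparison.
C, T = 1.0, 1_000.0

for s in (option1, option2):
    print(f"{s.name}: {s.comparisons():,}c + {s.tuples_shipped():,}t = {s.total_cost(C, T):,.0f}")
# Option 1: 10,200,000c + 11,200t = 21,400,000
# Option 2: 1,200,000c + 1,300t = 2,500,000
```

Option 2 needs fewer than one-eighth of Option 1's comparisons and fewer than one-eighth of its tuple transmissions, so it is cheaper for any positive values of c and t; the choice of C and T above only changes the size of the gap.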