Session – 9 QUERY OPTIMATIZATION Matakuliah : M0184 / Pengolahan Data Distribusi

advertisement
Matakuliah
Tahun
Versi
: M0184 / Pengolahan Data Distribusi
: 2005
:
Session – 9
QUERY OPTIMATIZATION
OBJECTIVE
• Definition of Query optimization
• Query Optimization Process
• Essential aspects of query processing in a
distributed environment
• The importance of query optimization
Query Optimization
• Query optimization refers to the process of
producing a query execution plan (QEP) which
represent an execution strategy for the query.
• Query optimization is the process of ensuring
that either total cost or the total response time
for a query are minimized.
• A query optimizer, the software module that
perform query optimization, is usually seen as
three components : search space, a cost model
and search strategy
Query Optimization Process
INPUT QUERY
SEARCH SPACE
GENERATION
TRANSFORMATION
RULES
EQUIVALENT QEP
SEARCH STRATEGY
COST
MODEL
BEST QEP
Source : Principle of Distributed Database System
Query Optimization Process Cont’d
• SEARCH SPACE, is the set of alternative
execution plans to represent the input query.
Search Space is obtained by applying
transformation rules.
• COST MODEL predicts the cost of a given
execution plan. To be accurate the cost model
must have good knowledge about the distributed
execution environment
• The SEARCH STRATEGY explores the search
space and select the best plan using the cost
model
Essential Aspect of Query
Processing in a Distributed
environment
• Data and message have to be transmitted
across communications lines, which has a
tendency to slow down the whole process
• The existence of multiple processor in the
network means that there are
opportunities for parallel processing and
data transformation, which raises the
possibility of speeding up responses.
The importance of Query
Optimization
There can be very substantial savings in
the cost of execution a query or in the
length of time the user has to wait for a
response.
• Execution cost optimizer is to minimize the
use of total system resources for a query
and hence reduce its cost.
• Response time can be taken to represent
a cost to an organization
Variation in Ways of Executing
Queries
• Two basic tools for use in optimization :
1. Query Transformation, the relational
operator such as JOIN and PROJECT
2. Query Mapping, execution relational
operator using low-level algorithms and
access devices such as pointer.
EXAMPLE
SITE
Hospital
Health Center
Community
Care
RELATION
HOSPITALIZATION
(Pat-Name, DOB, admit, Discharge,
Dept)
Cardinality : 200000
PATIENT
(Pat-Name, DOB, GP)
Cardinality : 10000
SURVEY
(Pat-Name, DOB, Weight, Type-ofwork)
Cardinality : 100000
EXAMPLE Cont’d
• Query : Find the names and GP all
patients who weight over 100Kg and have
been treated in the orthopedic department
since 1 January 2005.
EXAMPLE – OPTION 1
• Move PATIENT relation to Community Care,
JOIN it with SURVEY relation and move the
result to the query site for joining with the other
relation, shipped from the Hospital
t : cost for transmitting a tuple
c : overall cost
Cost : (10000 * 1000c)  for PATIENT and
restricted SURVEY (assume 1000 tuple result)
+ (200 * 1000c)  for joining this result and
restricted HOSPITALIZATION
Transmit : 200t + 10000t + 1000t
EXAMPLE – OPTION 1
• Send the restricted HOSPITALIZATION
relation to Community Care, Join it with
Restricted SURVE. Join the result with
PATIENT.
Cost : 1000 * 200c  Community Care
+ 100 * 10000c  Health Care
Transmit : 200t + 100t + 1000t
Download