DB Design, Optimization and Indexing, and Monitoring and Tuning

advertisement
Database Performance Topics:
- DB Design
- Optimization & Indexing
- Monitoring and Tuning
Joe Carola
Siemens Medical Solutions, Health Services
Joe Carola, Siemens HS

Bio
• 30+ years in Information Technology
– 26 of them dedicated to Relational Database, covering all areas of
database design, implementation, support, performance, etc.
• Full History:
–
–
–
–
–
–
–
–
Development: Prog Trainee, Prog/Analyst, Sys Analyst
DBA trainee, DBA, Mgr-DBA (“Actor on the Scene”)
Lead DB Consultant for Codd and Date Consulting
Director-DBA
Technical Database Architect (Currently)
DB2, Microsoft SQL Server, Oracle, Sybase SQL Server
Mainframe, Unix, Wintel
1993 recipient of an International Database User Group Award for
Information Excellence, based on his contributions in the area of
Relational Database technology, and has presented locally and
internationally on a variety of Relational Database topics.
– Chairman, Delaware Valley DB2 User Group
2
Agenda – “Practical Stuff”

DB Design
• In the simplest terms - Logical to Physical

Optimization and Indexing
• Optimizer
• Index Types and how they are used

Monitoring and Tuning
• Monitoring Process and what to monitor
• Tuning Steps
The above is where
I see that the rubber
meets the road……
3
DB Design
DB Design

Logical Design
• Provides complete understanding of data and it usage
• Defining data entities and their attributes
• Provides primary and foreign key definitions

Physical Design
• Based on information gathered during Logical design
– Data must be understood to do this correctly and efficiently
• Provides physical aspects to enhance data usage
– Data types, data lengths, row sizes,
• Provides precise access paths (Indexes) to rows of data
– Support primary and foreign keys
– Secondary indexes

Poor Logical and Physical Database design can be the
largest reason for performance issues
• The price for poor DB Design must be paid at execution time
5
DB Design

Normalization: A synthesis of data design
1st Normal Form – Data is dependent on…………..The Key
2nd Normal Form – Data is dependent on The whole Key
3rd Normal Form - “ “
“ And nothing but the Key,
“so help me Codd”
• Edgar F. (Ted) Codd – Developed the Relational Model
“A Relational Model for Large Shared Data Banks” (1970)
– Solid, yet complex, mathematical foundation
• Relational Algebra
• Domains, Attributes, Tuples, and Relations (OMG!)
– Re-stated to simpler terms…..
• Simple to understand tables, rows, and columns
• The Simplicity is partially the reason for the performance issues
being addressed every day
– Too many shortcuts are taken
– Too many non-experienced data designers are designing and
implementing database applications
6
DB Design

3rd Normal Form is basically 1st cut physical
• Next step after 3rd NF in Physical DB Design is a very important step for
Performance, Concurrency, Operations, etc.
– De-Normalization takes place here
• Storing of data in summary or derived format
• If it doesn’t happen, it takes place at execution time
Result:
High processing costs – Materialization of the result data
Administration costs – Maintenance of the data
Low Currency – Concurrent Access to the data
• However…….
– Anomalies are created as a result of De-normalization
• Insert, Delete, Update
• They all cost extra processing also
– Must strike a balance based on requirements on performance,
availability, storage, administration
7
Optimization
&
Indexing
Optimization and Indexing

Must understand the basics of indexes and
performance statistics.
• As a general rule, indexes should be kept as narrow as
possible, most likely following a business use
requirement, to reduce the amount of processing
overhead associated with each query.

Being familiar with how optimization works will
improve the accuracy of your decision making
when designing indexes
• Understanding how the optimizer works is the first step
toward the establishment of a truly optimized database
environment

As the sophistication of your database
implementation increases…….
• The need to optimize performance will also increase.
9
Optimization and Indexing


SQL Query tuning is one of the most important tasks
to improve application performance
Biggest bang for the performance dollar over
everything else (“IMO”):
• Network, Storage, Memory, Processor

Should be done in the design and testing phases
• However, no amount of Database tuning or SQL statement
tuning can make up for inefficient application
design/coding
– 60% to 80% of Application Problems come from poorly written
SQL or the code around it
i.e. Prog101 abuse can wreck an application too!!
10
Optimizer


Responsible for choosing the least costly
way to execute SQL (DML).
Creates an access path with it’s decision
• Performed at plan compilation time
• Determines Access Methods
– Index Usage
– Table Scan
– Join Method
– Sort
• Determines if Data and/or Index pages can
be read in advance
– Asynchronous Pre-fetch
11
The Importance of Statistics

Statistics provide the optimizer with the
information to make decisions
Table
Indexspace
Generation
Tablespace
RDBMS
Catalog
Or
Dictionary
As the data in a column changes, index and column statistics can
become out-of-date and cause the query optimizer to make less than
optimal decisions on how to process a query.
12
Statistical Terms/Concepts

Cardinality
• Measures how many unique values exist in the table

Density
• Measures the uniqueness of values within a table.
• Helps the optimizer determine how many rows will be
returned for a given key value
• Indexes with high densities will likely be ignored by the
optimizer
– i.e. the index is highly non-unique

Selectivity
• Measures the number of rows that will be returned by a
particular query.
• Needed by Optimizer to calculate the relative cost of a
query plan
13
From Request to Response
REQUEST
RESPONSE
STAGE 2
PREDICATES
RELATIONAL DATA SERVICES
STAGE 2 - Evaluated after data
retrieval via the relational (NONSARGABLE, Residual) data
services which is more expensive
than the Data Manager.
DATA MANAGER
PREDICATE
ANY OTHER
WITH INDEX (ES)
INDEX KEY
Non Indexed
PREDICATE APPLIES
BUFFER MANAGER
I/O
STAGE 1
PREDICATES
STAGE 1 - Evaluated at the time the
data rows are retrieved
(SARGABLE). Performance
advantage in using STAGE 1
PREDICATES because this stage
eliminates ROWS passed to STAGE
2 via the Data Manager.
REQUESTED
DATA
Indexing

A very necessary part of Successful
Database Implementation
I wonder what queries will
be run ?
What indexes will be needed?
What columns will be used as
predicates?
What ORDER BY will be
used most often?
Why do some of
my queries run so
slow!
15
Indexes are a good thing to add, however
there is something to avoid…..
“Thanks for fixing
my query, what did
you do?”
“Great! Then add
indexes to all the
columns in my table
”I added an index to
one of the columns
#!*#!!!
Types of Indexes


There are two types of indexes: clustered and nonclustered, each with unique advantages depending on the
data set.
Clustered index
• Dictates the storage order of the data in a table. Because the data is
sorted, clustered indexes are more efficient on columns of data that
are most often searched for ranges of values. This index type also
excels at finding a specific row when the indexed value is unique.

Non-clustered index
• Similar to an index in a textbook where the data is stored in one
place and the data value in another. A query searches for the data
value by first searching the non-clustered index to find the location
of the data value in the table and then retrieves the data directly
from that location. The non-clustered index is useful for queries
resulting in exact matches.
17
Basic Index Usage
Matching Index Scan
1
Root Page
2
NonLeaf
Page
3
Leaf Page
Data
Page
Data
Page
Data
Page
Select * From TABLE1 Where INDEXED_COL1 = 12345
18
Basic Index Usage
Non-Matching Index Scan
Root Page
NonLeaf
Page
2
Leaf Page
1
Data
Page
Data
Page
Data
Page
Select * From TABLE1 Where INDEXED_COL1 > 00001
19
Basic Index Usage
Index Only
Root Page
Non-Leaf
Page
Leaf Page
1
Select COL1 From TABLE1 Where INDEXED_COL1 > 00001
20
Join Methods

Nested Loop Join
SELECT A,B,X,Y
FROM OUTER, INNER
WHERE A=10 AND B=X
Tables:
Columns:
OUTER
A
B
10
3
10
1
10
10
2
6
10
1
INNER
X
5
3
2
1
2
9
7
Y
A
B
C
D
E
F
G
1.) Scan the outer table,
For each qualifying row……… 2.) find all matching rows
in the inner table, via table
space scan or index
access.
21
COMPOSITE
A B
X
10 3
3
10 1
1
10 2
2
10 2
2
10 1
1
Y
B
D
C
E
D
The nested loop join
produces this result
Join Methods

Merge Scan Join
SELECT A,B,X,Y
FROM OUTER, INNER
WHERE A=10 AND B=X
1.) Condense and sort the outer
table, or access it through an
index on column B…...
Tables:
OUTER
Columns:
A
B
10
1
10
1
10
2
10
3
10
6
Condense and sort the
inner table.
INNER
X
Y
1
D
2
C
2
E
3
B
5
A
7
G
9
F
2.) Scan the outer table,
For each qualifying row….… 3.) Scan a group if
matching rows in the
inner table.
22
COMPOSITE
A B
X
10 1
1
10 1
1
10 2
2
10 2
2
10 3
3
Y
D
D
C
E
B
The merge scan join
produces this result
Join Methods
Hybrid / Hash Join
SELECT C2,C33

FROM OUTER, INNER
WHERE C1 = A AND C2 = C22
1.) Apply local predicates and organize qualifying
rows in join column sequence by either sorting or
accessing via join column index….
INNER
OUTER
R
R
C22 C33
O C1 C2
I
W
D
RID LIST
1
2
3
4
5
6
A
A
A
A
A
.
1
1
2
3
6
.
1
2
2
3
5
7
D
C
E
B
A
G
P1
P2
P3
P4
P5
P6
P1 P1 P2
2.) Obtain only inner table RIDs via index
access using sequenced join column key
23
values...….
P3 P4
4.) List Prefetch inner table
rows and complete partial
rows
PARTIAL ROWS
RESULT
C2
RID
C2 C33
1
1
2
2
3
P1
P1
P2
P3
P4
1
1
2
2
3
3.) Create partial rows, and
sort in RID sequence...….
D
D
C
E
B
An Ounce of Prevention….,

Make your queries simple and efficient, ensuring the
least costly access path available.
• Try not to overload your tables with indexes
• Try not to overload your indexes
• Try not to overload your queries

Keep the Database healthy
• Reorganization
– Eliminates empty space, and fragmentation
– Reduces I/O

Generate Statistics (if they are not automatic)
• The Optimizer is very smart, but data attributes are always
changing
– DB Size/Volume, Data Skewness, Data Content

Analyze SQL Query and access path selection prior to
implementing into a production environment.
• Execute the Explain Plan periodically to determine what
method the Optimizer is selecting for an access path.
24
“Explain” Plan / SHOWPLAN


Phase of the optimizer that captures information
used in selecting the query access plan
Why use an Explain Plan?
• Gives clues as to why the optimizer made access
decisions
• Can be used in advance of execution
• Can be used to maintain a history of problem query
access
– Before/After new indexes additions
– Before/After Statistics are Generated/Re-Generated
– Before/After Data additions/changes/deletions
• Problem determination is easier by comparing reference
plans
25
Example

Graphical SHOWPLAN
26
Monitoring and Tuning
Monitor and Tuning

A Constant Process
• A very necessary part of successful database
implementation
• Must be there to guarantee ongoing, optimal
Database Performance
Design Data
Object Data
Activity Data
Repeat
3.) Consider Fixes
1.) Collect Data
4.) Apply Fixes
2.) Analyze Data
28
Redesign
Tune
Real time
Periodic
Historical
Monitoring and Tuning

What to monitor
• Healthiness of Database Objects
– Growth
– Fragmentation
• Exists when TS and/or indexes have pages in
which the logical ordering, based on key or link
value, does not match the physical ordering of
the pages inside the file
• Causes additional I/O and additional storage
• Causes of Fragmentation
– DML (Insert, Delete, Update)
– Inserts/Updates cause Page Splits
29 cause holes
– Delete/Updates
Monitoring and Tuning

What to monitor
• Fragmentation illustrated
Uniform pages in order
Index 1
Page 1
Index 1
Page 2
Index 1
Page 3
Index 1
Page 4
Index 1
Page 5
Index 1
Page 6
Index 1
Page 7
Index 1
Page 8
Index 2
Page 2
Index 1
Page 4
Index 1
Page 5
Index 3
Page 1
Index 1
Page 8
Non-uniform pages, out of order
Index 1
Page 1
Index 1
Page 2
Index 2
Page 1
• Reorganization
– Reorders pages, compresses entries on a page
• Always be sure to run new Statistics collection
(for the Optimizer) 30
Monitoring and Tuning

What to monitor
• Object Usage
– Access Patterns (Random, Sequential, Indexed, Non-Indexed)
– I/O (Volume, Latency)
– They tend to change over time as users learn the application
• Memory Usage
– Buffer Hit Ratio
– Data/Index pages in the Buffer will avoid an I/O
• Processing Activity
– CPU utilization
• Will indicate excessive searching and/or sorting
– Parallel, Non-Parallel
• Can speed up large searches
• Can also monopolize all the processors
• Locking
– Timeouts
– Deadlocks
31
Monitoring and Tuning

How to monitor – Tool usage
SQL Request
Tool to
Collect &
Interpret
DBMS Statistical
Generation
Alerts
Performance
DB
Result
Reports
32
Monitoring and Tuning
Steps
• Find the statements that consume the most resources
– “Heavy Hitters”
• Physical Reads will indicate SQL requiring disk access to get queries
– Most expensive part of a Query!!!
• Buffer Gets indicate the amount of searching going on within a query
High Buffer Gets = Lots of Searching = Lots of Processing
• Sorts information will indicate if SQL is doing an excessive amount of
sorting
• Find the offending statements without adding to the
performance problem
– Use simple top down approach
• Avoid heavy tracing
• Know the Database Design and Usage
• Run Explain Plan on SQL
33
Additional
Questions?
Download