What is partitioning?

Introduction to SQL Server Partitioning
Kendra Little
This work is by Kendra Little and is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License
About Kendra
Index
1. A sample case.
You are here
2. What is partitioning?
3. When is partitioning helpful?
4. What’s the fine print?
5. Revisiting our sample case.
Should this client use partitioning?
2. What is partitioning?
All tables have at least one partition.
One Partition
“In SQL Server, all tables and indexes in a database are considered partitioned, even if they are made up of only one partition. Essentially, partitions form the basic unit of organization in the physical architecture of tables and indexes. This means that the logical and physical architecture of tables and indexes comprised of multiple partitions mirrors that of single-partition tables and indexes.”
(Partitioned Table and Index Concepts, MSDN)
“Partitioning” actually means “horizontal partitioning”
Horizontal partitioning takes groups of rows in a single table and allocates them in semi-independent physical sections.
SQL Server’s horizontal partitioning is RANGE based.
Horizontal ranges are based on a partition key.
• A single column in the table.
    • Just one!
    • Use a computed column if you must, but make sure it performs well as a criterion and works for joins.
• Typically a date or integer value
• Consider:
    • A column you will join on
    • A column you can always use as a criterion
I must choose wisely.
Ranges of data are defined by a partition function which uses the key.
The partition function defines your boundary points and can use either RANGE LEFT or RANGE RIGHT.
• LEFT: the first boundary value is the UPPER boundary point of partition #1 (the value itself falls in partition #1)
• RIGHT: the first boundary value is the LOWER boundary point of partition #2 (the value itself falls in partition #2)
Keep to the right. It’s easier.
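RIGHT based partition function for Doll Orders keyed on OrderDate
• Partition 1: OrderDate before 1/1/2008
• Partition 2: 1/1/2008 up to, but not including, 1/1/2009
• Partition 3: 1/1/2009 up to, but not including, 1/1/2010
• Partition 4: 1/1/2010 up to, but not including, 1/1/2011
• Partition 5: 1/1/2011 and later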
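The boundary rules can be modelled outside the database. The following is a minimal Python sketch, not SQL Server code: `partition_number` is a hypothetical helper, and the boundary dates are the ones from the Doll Orders example. It assumes only the rule stated on the previous slide: with RANGE RIGHT a boundary value falls in the partition to its right, and with RANGE LEFT in the partition to its left.

```python
from bisect import bisect_left, bisect_right
from datetime import date

# Boundary points from the Doll Orders example: four boundaries, five partitions.
BOUNDARIES = [date(2008, 1, 1), date(2009, 1, 1), date(2010, 1, 1), date(2011, 1, 1)]


def partition_number(value, boundaries, range_right=True):
    """Return the 1-based partition a key value lands in (hypothetical model)."""
    if range_right:
        # RANGE RIGHT: each boundary is the LOWER bound of the partition to its right.
        return bisect_right(boundaries, value) + 1
    # RANGE LEFT: each boundary is the UPPER bound of the partition to its left.
    return bisect_left(boundaries, value) + 1


# Compare where a few order dates land under each option.
for order_date in (date(2007, 6, 1), date(2008, 1, 1), date(2010, 12, 30)):
    right = partition_number(order_date, BOUNDARIES, range_right=True)
    left = partition_number(order_date, BOUNDARIES, range_right=False)
    print(order_date, "RIGHT ->", right, "LEFT ->", left)
```

In this model the two options differ only for a value that equals a boundary exactly. With yearly boundaries at midnight on 1/1, RANGE RIGHT keeps each whole calendar year in one partition, which is one reason "keep to the right" is the easier choice for dates.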
RIGHT based partition function keyed on PartName (effectively LIST)
Partition 1
Boundary Point 1: BODY
Partition 2
Boundary Point 2: SHOE
Partition 3
Question: how do we get rows into Partition 1?
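One way to reason about the question is to write out the range each partition covers. The sketch below is a hypothetical Python model (`describe_ranges` is not a SQL Server function) that turns a list of RANGE RIGHT boundary points into inclusive lower and exclusive upper bounds. It ignores collation, which decides how strings actually sort in SQL Server.

```python
def describe_ranges(boundaries):
    """Return (partition, lower_inclusive, upper_exclusive) for RANGE RIGHT boundaries.

    None means the range is unbounded on that side.
    """
    lowers = [None] + list(boundaries)  # partition 1 has no lower bound
    uppers = list(boundaries) + [None]  # the last partition has no upper bound
    return [(number, low, high)
            for number, (low, high) in enumerate(zip(lowers, uppers), start=1)]


# Boundary points from the PartName example.
for number, low, high in describe_ranges(["BODY", "SHOE"]):
    print(f"Partition {number}: lower={low!r} (inclusive), upper={high!r} (exclusive)")
```

Under this model, Partition 1 only receives rows whose PartName sorts before BODY. If BODY is the lowest part name in use, Partition 1 stays empty.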
Filegroups are mapped to the partition function using a partition scheme.
Slow,
Read-only
FG_A
FG_B
Partition 1:
Compressed
1/1/2008
Partition 2:
Compressed
1/1/2009
Partition 3
1/1/2010
FG_C
Partition 4
1/1/2011
FG_D
Partition 5
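A partition scheme is essentially an ordered list of filegroups, one per partition. The sketch below models that idea in Python. `build_scheme` is a hypothetical helper, and the assignment of partitions to FG_A through FG_D is an assumed reading of the diagram, since the extracted slide text does not show which partition sits on which filegroup. Requiring exactly one filegroup per partition is also a simplification made for this sketch.

```python
# Boundary points from the Doll Orders example (ISO date strings sort correctly).
BOUNDARIES = ["2008-01-01", "2009-01-01", "2010-01-01", "2011-01-01"]

# ASSUMPTION: this partition-to-filegroup assignment is illustrative only.
FILEGROUPS = ["FG_A", "FG_A", "FG_B", "FG_C", "FG_D"]


def build_scheme(boundaries, filegroups):
    """Map 1-based partition numbers to filegroup names (hypothetical model)."""
    expected = len(boundaries) + 1  # N boundary points define N + 1 partitions
    if len(filegroups) != expected:
        raise ValueError(f"need {expected} filegroups, got {len(filegroups)}")
    return dict(enumerate(filegroups, start=1))


scheme = build_scheme(BOUNDARIES, FILEGROUPS)
for number, filegroup in scheme.items():
    print(f"Partition {number} -> {filegroup}")
```

Putting older partitions on a slow, read-only filegroup, as the diagram suggests, then comes down to which name appears at which position in the list.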
Objects are created on the partition scheme.
Table (and indexes)
• Created on partition scheme.
Partition Scheme
• Maps partitions defined by the partition function to physical filegroups
Partition Function
• Boundary points
• Defines ranges
• Defines an algorithm the engine will use to know where to put rows
Indexes can be created on the partition scheme. Or not.
Aligned indexes
• Located on your partitioning scheme (or an identical partitioning scheme)
• Must contain the partitioning key.
• If the partitioning key is not specified, it will be added for you. Note: this affects your primary key for the table!
• Indexes are aligned by default unless otherwise specified at creation time.
• Perform better for aggregations and when partition elimination can be used.
Nonaligned indexes
• Physically located elsewhere: either nonpartitioned or on a non-identical partitioning scheme
• May perform better with single-record lookups
• Allow unique indexes (because they do not have to contain the partitioning key)
• However, the presence of these precludes partition switching!
Switching
• Requires all indexes to be aligned.
• Compatible with filtered indexes
• Data may be switched in or out only within the same filegroup.
• Is a metadata-only operation requiring a schema modification lock. This can be blocked by DML operations, which require a schema stability lock.
• Is an exceptionally fast way to load or remove a large amount of data from a table!
Creating the partition function
Our hero.
Creating filegroups
We left the Primary FG default on purpose!
Creating the partition scheme
The partition scheme can map each partition to a specific filegroup, or all partitions to the PRIMARY filegroup.
Where the rubber meets the road.
Query FGs mapped to the partition function via the partition scheme
This gets a little complicated.
Creating a table on the partition scheme and adding some rows.
A partitioned heap: you can totally do that.
Let’s have a look at that heap.
We’ll use this query again, but not show it on every slide for obvious reasons.
Adding indexes
Someone’s not in line.
Notice that aligned indexes always have the clustering key
That’s not usually there!
Adding another partition
We now have a full staging table and empty partition on dailyFG4
Switching in!
Don’t forget to drop ordersDaily20101230: your staging table is still there, it’s just empty now.
And you’re gonna have to rebuild that non-aligned NC if you want it back.
3. When is partitioning helpful?
Is maintenance a significant problem for availability?
YES
• Partitioning may be what you are looking for.
• Keep checking other factors.
NO
• You may have other reasons to partition, but one of its big benefits is to help with this.
Maintenance includes index rebuilds, loading data, and deleting data.
Are query patterns defined by regions?
YES
• Finding regions of data which are queried together and have a good partitioning key is important to good query performance.
• This is the basis of partition elimination.
NO
• You may not have a good partitioning key.
• Keep looking at the query patterns for your workload and evaluating different partitioning keys.
Data regions may be dates, integers, or codes.
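Partition elimination can be pictured as working out, from a predicate on the partitioning key, which partitions could possibly hold matching rows. The Python sketch below is a simplified model of that idea, not the optimizer's actual logic: `partitions_touched` is a hypothetical helper, and it assumes a RANGE RIGHT function with the yearly Doll Orders boundaries and an inclusive `low <= key <= high` predicate.

```python
from bisect import bisect_right
from datetime import date

# Yearly RANGE RIGHT boundaries, as in the Doll Orders example.
BOUNDARIES = [date(2008, 1, 1), date(2009, 1, 1), date(2010, 1, 1), date(2011, 1, 1)]


def partitions_touched(boundaries, low, high):
    """Return the 1-based partitions that can hold keys with low <= key <= high."""
    first = bisect_right(boundaries, low) + 1   # partition containing the low end
    last = bisect_right(boundaries, high) + 1   # partition containing the high end
    return list(range(first, last + 1))


# A query for one month reads a single partition ...
print(partitions_touched(BOUNDARIES, date(2010, 3, 1), date(2010, 3, 31)))
# ... while a query spanning two years reads three.
print(partitions_touched(BOUNDARIES, date(2008, 6, 1), date(2010, 6, 1)))
```

A query with no predicate on the partitioning key gives the engine nothing to eliminate, so in this model every partition would have to be read.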
Can applications and queries be optimized for partitioning?
YES
• This means you will be able to rewrite some queries and procedures as needed to take advantage of partition elimination.
NO
• If you do not have the ability to tune user and application queries, some will likely perform very poorly.
Some assembly required.
Do you have resources to support the partitioned system?
• Can your disk configuration be optimized?
• Is enough buffer pool available for what will need to be read into memory concurrently?
• Will you be able to tune and configure parallelism appropriately for the workload?
• Do you have a system you can test with a production-like workload, or a suitable rollback plan?
4. What’s the fine print?
Editions with partitioning
Enterprise
Datacenter
Developer
Evaluation
Support for HOW MANY partitions?
• 15,000 partitions are available in SQL Server 2008 with SP2 applied.
• SQL Server 2005, 2008 (before SP2), and 2008 R2 (for now) are limited to 1,000 partitions. This is less than 3 years for daily partitioning.
What problems could happen with lots of partitions?
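The "less than 3 years" figure is simple arithmetic, shown here as a short Python sketch. `years_of_daily_partitions` is a hypothetical helper, and the 365.25-day year is an assumption used to average over leap years.

```python
DAYS_PER_YEAR = 365.25  # assumption: average year length, including leap years


def years_of_daily_partitions(partition_limit):
    """Years of history that fit when every day gets its own partition."""
    return partition_limit / DAYS_PER_YEAR


for limit in (1000, 15000):
    years = years_of_daily_partitions(limit)
    print(f"{limit} daily partitions cover about {years:.1f} years")
```

With the 1,000-partition limit this comes to roughly 2.7 years of daily partitions. The 15,000-partition limit stretches that to roughly 41 years.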
Parallelism
• In 2005, a query touching more than one partition typically had only one thread per partition.
• In 2008, the Partitioned Table Parallelism improvement allows multiple threads to be used on each partition for parallel plans.
[Figure: worker threads, two per partition, calling out “Partition 1!”, “Partition 2!”, and “Partition 3!”]
Lock escalation AUTO
• Lock escalation can be set to AUTO for a table. If the table is partitioned, locks will escalate to the partition level rather than the table level.
• What’s awesome: greater concurrency!
Partition level deadlocks are not awesome. Test your workload (like with any feature).
Partition aware seeks
• In SQL 2008, the optimizer has been made more clever and has a greater chance at achieving partition elimination. This has been done by:
    • Changing the internal representation of a partitioned table to be more optimized for seeking on the PartitionID (even when the table’s CX is on another column)
    • A “skip scan” operation has been added to allow the optimizer greater flexibility.
More optimized optimizin.
Be careful with your statistics
• Statistics are not maintained per partition; they are maintained for the entire index or column. Since there is a limit to the number of steps in the histogram, the statistics can become inaccurate, and on very large tables they may take a long time to update.
• Filtered statistics can be used to help with this in 2008: you can create new filtered statistics for your new partition.
This sounds like work.
Index rebuilds and compression
• Individual partitions cannot be rebuilt online.
• The entirety of a partitioned index can be rebuilt online.
• Individual partitions can be compressed.
For fact tables with archive data, older partitions can be rebuilt once with compression. Their filegroups can then be made read-only.
I’d better check my maintenance jobs.
Switching Feature Compatibility
• Works with replication in 2008 and later
    • Some subscribers can have the partitioning scheme, others don’t have to
    • This means you can have some subscribers on Standard.
• Works with Change Data Capture (with some special steps)
• Does not work with Change Tracking
@SQLFool replicates her partitioned tables, check out her blog.
5. Revisiting our sample case.
So, should this client use partitioning?
Resources / Contact
There is a very large amount of documentation online for horizontal table partitioning. Get my recommendations here:
http://littlekendra.com/resources/partition/
This presentation would not have been possible without whitepapers and blogs by Kimberly Tripp, Michelle Ufford, and Ron Talmage.
• Twitter: @kendra_little
• Email: littlekendra@gmail.com
• LinkedIn: http://www.linkedin.com/in/kendralittle