Using DataSizer

DataSizer
The DataSizer spreadsheet is a tool that helps estimate the size of tables in SQL Server 7.0.
It contains two worksheets, labeled Clustered and Heap. The Clustered sheet gives formulas to
estimate the size of a clustered index table and its nonclustered indexes. The Heap sheet gives
formulas to estimate the size of a heap table (any table that does not have a clustered index) and
the associated nonclustered indexes.
The formulas used in the spreadsheet are exact, but they are not complete (see “Not Covered by
the Worksheets” next). Also, the worksheets include detailed information about the space used in
each data and index row. You may not need that much detail and can go directly to the Table
Size and NC Index Size totals after you have entered the data for your table.
Not Covered by the Worksheets
The formulas to estimate the size of the row and rows per page are based on the actual
algorithms and structures in SQL Server. Despite this, the results of the spreadsheet are still only
estimates. The formulas do not include some of the minor overhead factors that determine the
exact size of a table. Specifically, the formulas do not include space for the following:
Allocation structures (Index Allocation Maps, Page Free Space Maps)
The effects of random inserts and free space scans (which affect data placement)
Whether rows are loaded with bcp or with INSERT statements
The order in which rows are inserted (which affects storage efficiency for clustered indexes)
Additional factors include overhead for text columns, uniqifier columns (added automatically to
nonunique clustering keys), back pointers, and record forwarding.
If you use the worksheets to estimate the size of a table, you should allow for a small margin of
error. With larger tables, this margin will be so small as to be insignificant. If you have a lot of
small tables you may want to add a 5 percent allowance.
The tool does not include a formula to estimate the size of a table that has text columns. Each
non-NULL text value consumes 16 bytes in the data row and has a minimum size of 84 bytes on the
text page. Text values are packed onto text pages with the same algorithm as data rows, so it
should be possible to estimate the size of text data storage using the Heap worksheet if
you know the average size of your text values.
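The two figures quoted above can be turned into a rough byte count. The following is only a sketch: the function name and its parameters are hypothetical, it applies the 84-byte minimum to the average value size rather than to each value, and it returns raw bytes before any packing onto pages.

```python
# Figures taken from the paragraph above.
TEXT_IN_ROW_BYTES = 16      # bytes each non-NULL text value uses in the data row
MIN_TEXT_VALUE_BYTES = 84   # minimum bytes a text value uses on a text page


def text_storage_bytes(non_null_values, avg_value_bytes):
    """Return (in_row_bytes, text_page_bytes) for a text column.

    Hypothetical helper, not part of DataSizer. Page headers and
    packing losses are not included.
    """
    if non_null_values < 0 or avg_value_bytes < 0:
        raise ValueError("counts and sizes must be non-negative")
    in_row = non_null_values * TEXT_IN_ROW_BYTES
    on_text_pages = non_null_values * max(MIN_TEXT_VALUE_BYTES, avg_value_bytes)
    return in_row, on_text_pages


# 10,000 non-NULL values averaging 200 bytes each.
in_row, on_pages = text_storage_bytes(10_000, 200)
print(in_row, on_pages)
```

The second figure could then be fed into the Heap worksheet as if it were ordinary row data, as the paragraph above suggests.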
Using DataSizer
To use the DataSizer tool, fill out the Clustered and Heap worksheets in DataSizer.xls as
described below.
Clustered Worksheet
The formulas in DataSizer are driven by the values you place in the worksheet's green cells. For
example, on the Clustered worksheet you fill in the following values first.
Rows in table                                     10,000
Fill Factor                                       95%
Data Row Fixed Len Col Size                       40
Number of Columns in data row                     12
Number of Variable Length Columns in data row     8
Max Size of Variable Length Data in data row      60
Clustered Index Key Fixed Len Col Size            6
Number of Columns in clustered Index Key          1
Number of Variable Length Columns in index key    0
Max Size of Variable Length Data in index key     0
If you are not changing the fill factor for your table, leave the value at 95 percent. Add the total
byte count for the fixed length columns and enter that in the cell for Data Row Fixed Len Col
Size. In SQL Server 7.0, a column can be a fixed length and can allow null values. The variable
length types all start with “var”. Now count the variable length columns in your row and enter that
count in the appropriate cell. The Max Size of Variable Length Data in data row is the number
of bytes you would have in variable length columns if they all had as much data as possible. For
example, if you have a varchar(50) and a varbinary(800) in your row, the Max Size would be 850.
Now add the total size of fixed length columns in the clustering key, count the total number of
columns in the clustering key and the number of variable length columns, and then add the Max
Size for variable length columns in the clustering key.
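The counting described above can be sketched in a few lines. This is a minimal illustration, not part of DataSizer: the function name, the dictionary keys, and the sample column list are all hypothetical, and the byte length of each column is assumed to be supplied by the caller.

```python
def worksheet_inputs(columns):
    """Derive the four row-level worksheet inputs from a column list.

    columns: list of (type_name, max_bytes) pairs. A type is treated as
    variable length when its name contains "var" (varchar, varbinary,
    nvarchar), following the naming rule given in the text.
    """
    fixed_bytes = 0
    var_sizes = []
    for type_name, max_bytes in columns:
        if "var" in type_name.lower():
            var_sizes.append(max_bytes)
        else:
            fixed_bytes += max_bytes
    return {
        "fixed_len_col_size": fixed_bytes,
        "num_columns": len(columns),
        "num_var_columns": len(var_sizes),
        "max_var_size": sum(var_sizes),
    }


# Hypothetical row: two fixed length columns and two variable length columns.
row = [("int", 4), ("char", 36), ("varchar", 50), ("varbinary", 800)]
print(worksheet_inputs(row))
```

The same function can be applied to just the clustering key columns to get the four key-level inputs.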
You now have enough data entered to estimate the size of the base table for a clustered index
table. The sizes of the data rows and the index rows for this table are calculated from the
data you entered. You can see how much space is used by each part of the data and index rows,
and how many rows fit on each page. The spreadsheet does not work out how many levels the index
will have; you read that from the page counts. The first level that shows one page is the top
level of the index. If
you see only one page at level 0, that means there is only one index page and the rest of the
pages are data pages. If you have two levels of index pages you will see one level 0 page and
some number of level 1 index pages, and so on. You can easily extend the worksheet to include
more index levels if you are working with very large tables and large index keys.
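The rule of reading down to the first single-page level can be sketched as a loop. This is an illustration under stated assumptions, not the worksheet's own formula: it assumes each index level needs one index row per page on the level beneath it, and the function and parameter names are hypothetical. Both inputs (the page count of the level below and the index rows per page) are values you would read off the worksheet.

```python
import math


def index_level_pages(pages_below, index_rows_per_page):
    """Return the page count of each index level, from the bottom up.

    The list ends at the first level that fits on a single page, which
    is the top level of the index.
    """
    if pages_below < 1:
        raise ValueError("pages_below must be at least 1")
    if index_rows_per_page < 2:
        raise ValueError("index_rows_per_page must be at least 2")
    levels = []
    pages = pages_below
    while True:
        # One index row points at each page on the level beneath.
        pages = math.ceil(pages / index_rows_per_page)
        levels.append(pages)
        if pages == 1:
            return levels


# 1,000 data pages with 100 index rows per index page.
print(index_level_pages(1000, 100))
```

The length of the returned list is the number of index levels, so a long list is a sign that the worksheet needs extra level rows.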
The next section of the worksheet lets you estimate the size of a nonclustered index for your
clustered index table. Enter the following values for each nonclustered index:
Fill Factor                                       95%
Index Key Fixed Len Col Size                      8
Number of Columns in Index Key                    2
Number of Variable Length Columns in index key    1
Max Size of Variable Length Data in index key     30
This is the same type of information as you calculated for the clustered index. You may not need
to specify a different fill factor. Enter the fixed length size, column count, variable length column
count, and maximum size of variable length columns. The formulas on the worksheet will show
you how much space is used for each nonclustered index row.
Heap Worksheet
The Heap worksheet has the same basic structure as the Clustered worksheet. The data
required to estimate the size of a heap is a subset of the data needed by the Clustered
worksheet. The main difference is that the Heap worksheet does not ask for index data for the
base table.