DataSizer

The DataSizer spreadsheet is a tool that helps estimate the size of tables in SQL Server 7.0. It contains two worksheets, labeled Clustered and Heap. The Clustered sheet gives formulas to estimate the size of a clustered index table and its nonclustered indexes. The Heap sheet gives formulas to estimate the size of a heap table (any table that does not have a clustered index) and the associated nonclustered indexes. The formulas used in the spreadsheet are exact, but they are not complete (see “Not Covered by the Worksheets” next). Also, the worksheets include detailed information about the space used in each data and index row. You may not need that much detail and can go directly to the Table Size and NC Index Size totals after you have entered the data for your table.

Not Covered by the Worksheets

The formulas to estimate the size of the row and rows per page are based on the actual algorithms and structures in SQL Server. Despite this, the results of the spreadsheet are still only estimates. The formulas do not include some of the minor overhead factors that determine the exact size of a table. Specifically, the formulas do not include space for the following:

- Allocation structures (Index Allocation Maps, Page Free Space Maps)
- The effects of random inserts and free space scan (affects data placement)
- Whether you are performing bcp or insert
- The order in which rows are inserted (which affects storage efficiency for clustered indexes)

Additional factors include overhead for text columns, uniqifier columns (added automatically to nonunique clustering keys), back pointers, and record forwarding.

If you use the worksheets to estimate the size of a table, you should allow for a small margin of error. With larger tables, this margin will be so small as to be insignificant. If you have a lot of small tables, you may want to add a 5 percent allowance.

The tool does not include the formula to estimate the size of a table that has Text columns.
Non-NULL text values consume 16 bytes in the data row and have a minimum size of 84 bytes on the text page. Text values are packed onto text pages with the same algorithm as data rows, so it should be possible to estimate the size of text data storage using the Heap worksheet if you know the average size of your text values.

Using DataSizer

To use the DataSizer tool, fill out the Clustered and Heap worksheets in DataSizer.xls as described.

Clustered Worksheet

The formulas in DataSizer are driven by the values you place in the worksheet's green cells. For example, on the Clustered worksheet you fill in the following values first.

Rows in table                                        10,000
Fill Factor                                          95%
Data Row Fixed Len Col Size                          40
Number of Columns in data row                        12
Number of Variable Length Columns in data row        8
Max Size of Variable Length Data in data row         60
Clustered Index Key Fixed Len Col Size               6
Number of Columns in clustered Index Key             1
Number of Variable Length Columns in index key       0
Max Size of Variable Length Data in index key        0

If you are not changing the fill factor for your table, leave the value at 95 percent.

Add up the byte counts of the fixed length columns and enter the total in the cell for Data Row Fixed Len Col Size. In SQL Server 7.0, a column can be fixed length and still allow null values. The variable length types all have “var” in their names (varchar, nvarchar, varbinary). Now count the variable length columns in your row and enter that count in the appropriate cell. The Max Size of Variable Length Data in data row is the number of bytes you would have in variable length columns if they all held as much data as possible. For example, if you have a varchar(50) and a varchar(800) in your row, the Max Size would be 850.

Now repeat the process for the clustering key: add up the total size of its fixed length columns, count the total number of columns in the key and the number of variable length columns, and then enter the Max Size for the variable length columns in the key.
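
The counting steps above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not part of DataSizer.xls: the function name, the dictionary keys, and the four-column example table are all hypothetical, and the caller is assumed to supply each column's size in bytes and whether it is variable length.

```python
# Hypothetical helper, not part of DataSizer.xls: it only adds up the
# values that the worksheet asks you to type into the green cells.

def worksheet_inputs(columns):
    """Summarize columns into the per-row values the worksheet asks for.

    Each column is a (name, max_bytes, is_variable) tuple. For a fixed
    length column, max_bytes is its storage size; for a variable length
    column, it is the largest number of bytes the column can hold.
    """
    fixed_size = 0
    variable_count = 0
    variable_max = 0
    for _name, max_bytes, is_variable in columns:
        if is_variable:
            variable_count += 1
            variable_max += max_bytes
        else:
            fixed_size += max_bytes
    return {
        "fixed_len_col_size": fixed_size,
        "number_of_columns": len(columns),
        "number_of_variable_length_columns": variable_count,
        "max_size_of_variable_length_data": variable_max,
    }


# Illustrative table: an int (4 bytes), a datetime (8 bytes),
# a varchar(50), and a varchar(800).
example_columns = [
    ("id", 4, False),
    ("created", 8, False),
    ("title", 50, True),
    ("body", 800, True),
]
inputs = worksheet_inputs(example_columns)
print(inputs)
```

For the example columns, the variable length maximum comes out to 850 bytes, matching the varchar(50) plus varchar(800) case in the text. The same function can be run on just the clustering key columns to get the four key values.
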
You now have enough data entered to estimate the size of the base table for a clustered index table. The worksheet shows the actual sizes of the data rows and the index rows for this table, based on the data you entered. You can see how much space is used by each part of the data and index rows, and how many rows fit on each page.

The spreadsheet is not very smart about calculating how many levels the index will have, so you need to read this from the page counts: the first level that shows one page is the top level of the index. If you see only one page at level 0, that means there is only one index page and the rest of the pages are data pages. If you have two levels of index pages, you will see one level 0 page and some number of level 1 index pages, and so on. You can easily extend the worksheet to include more index levels if you are working with very large tables and large index keys.

The next section of the worksheet lets you estimate the size of a nonclustered index for your clustered index table. Enter the following values for each nonclustered index:

Fill Factor                                          95%
Index Key Fixed Len Col Size                         8
Number of Columns in Index Key                       2
Number of Variable Length Columns in index key       1
Max Size of Variable Length Data in index key        30

This is the same type of information as you calculated for the clustered index. You may not need to specify a different fill factor. Enter the fixed length size, column count, variable length column count, and maximum size of variable length columns. The formulas on the worksheet will show you how much space is used for each nonclustered index row.

Heap Worksheet

The Heap worksheet has the same basic structure as the Clustered worksheet. The data required to estimate the size of a heap is a subset of the data needed by the Clustered worksheet. The main difference is that the Heap worksheet does not ask for index data for the base table.
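
The way index levels are read from the worksheet (level 0 is the first level that fits on one page) can be illustrated with a small Python sketch. It is a simplification rather than the worksheet's own formula: it assumes each index level needs one index row per page of the level below it, and it takes the number of lowest-level pages and the index rows per page as given. The function name and the example numbers are hypothetical.

```python
import math

# Hypothetical sketch, not the DataSizer formulas: it assumes every page
# in one level is referenced by exactly one index row in the level above.

def index_level_pages(lowest_level_pages, index_rows_per_page):
    """Return the page count of each index level, top level (level 0) first."""
    if lowest_level_pages < 1:
        raise ValueError("lowest_level_pages must be at least 1")
    if index_rows_per_page < 2:
        raise ValueError("index_rows_per_page must be at least 2")
    levels = []
    pages = lowest_level_pages
    while True:
        # Pages needed to hold one index row per page of the level below.
        pages = math.ceil(pages / index_rows_per_page)
        levels.append(pages)
        if pages == 1:
            break  # the first level with a single page is the top level
    levels.reverse()  # list level 0 (the top) first, as the worksheet does
    return levels


# Illustrative numbers only: 10,000 data pages, 400 index rows per page.
print(index_level_pages(10_000, 400))
# A smaller table needs just one index page above its data pages.
print(index_level_pages(250, 400))
```

With the illustrative numbers, the first call yields two index levels (one level 0 page and 25 level 1 pages) and the second yields a single level 0 page. Because the loop runs until a level fits on one page, it also covers the very large tables for which the worksheet would need extra level rows.
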