The Server – Basics

Design Considerations for RS Servers
Disclaimer
This document is intended as a guide for design considerations when planning to implement Recovery
Solution into an environment. As such, it must be clear that these are simply considerations and actual
requirements will vary dependent upon the environment, configuration, and supporting infrastructure.
All metrics discussed within this document are for demonstration purposes ONLY.
Contents
Disclaimer
The Server – Basics
Server OS Requirements
SQL Requirements
SQL TempDB database and log file
Application/User Databases and Log File/s
RS Storage Requirements
RS Activity Requirements
Overview of Basics
The Server – Advanced
Operating System Standard
SQL Installation Standards
Physical vs Logical Disks
Hard Disk/Hard Disk Controller Speed
CPU
The Server – Basics
When implementing Recovery Solution in an environment, there are three primary elements that need
to be evaluated in order to optimize performance. These three key criteria are:
• Hardware – CPU/RAM
• File/Network I/O Bandwidth
• Storage
In order to simplify the design process, it is often easier to evaluate these criteria against the four key
elements of a Recovery Solution implementation:
• Server OS requirements
• SQL requirements
• RS Storage requirements
• RS Activity requirements
We will primarily be using the built-in Windows PerfMon tool to gauge the eligibility of a server as an
RS Cluster host candidate.
Server OS Requirements
Before any RS implementation can occur, an x86 Windows-based server must be available to host
the process. Using the key criteria listed previously, review the candidate server and all software
running on it (e.g. antivirus, monitoring/management software). In order to pass as an eligible
candidate for the next phase of evaluation, the server should:
• Have sufficient hardware resources so that at peak load of ALL[1] functionality, the server is not
approaching maximum load on CPU or memory.
• Show Disk Queue Length metrics in PerfMon that do not reach sustained high levels during
peak activity.
• Have ample disk storage to meet current and foreseeable demand for the existing
implementation.
One key element that does require tuning for optimal performance is the Windows paging file.
Refer to the published recommendations on the configuration of the paging file.
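The "sustained high levels" condition on disk queue length can be checked mechanically once PerfMon samples have been exported. The following is a minimal sketch only. It assumes the samples (for example, values of the PhysicalDisk "Avg. Disk Queue Length" counter) have already been read into a list of numbers, and the threshold and run length are hypothetical values chosen for demonstration, not recommendations.

```python
from typing import Iterable, List, Tuple


def sustained_runs(samples: Iterable[float], threshold: float,
                   min_consecutive: int) -> List[Tuple[int, int]]:
    """Return (start_index, length) for every run of consecutive samples
    above `threshold` that lasts at least `min_consecutive` samples."""
    values = list(samples)
    runs: List[Tuple[int, int]] = []
    run_start = None  # index where the current above-threshold run began

    for index, value in enumerate(values):
        if value > threshold:
            if run_start is None:
                run_start = index
        else:
            # The run (if any) ended just before this sample.
            if run_start is not None and index - run_start >= min_consecutive:
                runs.append((run_start, index - run_start))
            run_start = None

    # Close a run that is still open at the end of the capture.
    if run_start is not None and len(values) - run_start >= min_consecutive:
        runs.append((run_start, len(values) - run_start))
    return runs


if __name__ == "__main__":
    # Hypothetical queue-length samples taken at a fixed interval.
    queue_length = [0.4, 0.9, 3.1, 4.0, 3.6, 0.7, 2.8, 0.5]
    # Assumed demo values: flag 3 or more consecutive samples above 2.0.
    print(sustained_runs(queue_length, threshold=2.0, min_consecutive=3))
```

A short spike (a single sample above the threshold) is ignored by design; only runs that persist for the chosen number of samples are reported, which mirrors the distinction between momentary and sustained queueing.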
SQL Requirements
The primary element of an RS Cluster is the SQL database housing the information associated with the
protected data that the RS Clients have provided. As the SQL database is the focal point of RS
activities, the degree of optimization that can be achieved will be directly reflected in the performance
of the RS Cluster attached to it.
At the basic level, SQL consists of four components:
• SQL executable and associated program files
• SQL System databases and log files
• SQL TempDB database and log file
• Application/User database/s and log file/s
[1] ALL activities include items such as AV scans, DR backup, Windows/application patching, and
3rd party applications co-residing on the candidate server.
A good general rule for SQL implementations (whether or not RS is in use) is to store all data
files and transaction log files away from the paging file unless you are sure the server will never page,
and to separate data files from transaction log files. With RS as the focus, the TempDB and the
Application/User databases are the elements whose functionality needs to be considered in order to
truly optimize the SQL instance.
SQL TempDB database and log file
The TempDB database is an extremely important component and should be given much greater
consideration than it generally receives.
The TempDB is responsible for the following activities in SQL Server 2005:
• Global (##temp) or local (#temp) temporary tables, temporary table indexes,
temporary stored procedures, table variables, tables returned in table-valued
functions or cursors.
• Database Engine objects to complete a query such as work tables to store
intermediate results for spools or sorting from particular GROUP BY, ORDER BY,
or UNION queries.
• Row versioning values for online index processes, Multiple Active Result Sets
(MARS) sessions, AFTER triggers and index operations (SORT_IN_TEMPDB).
• DBCC CHECKDB work tables.
• Large object (varchar(max), nvarchar(max), varbinary(max), text, ntext, image,
xml) data type variables and parameters.
Many of the internal functions of an RS Cluster implementation rely heavily upon the
TempDB. As such, the following recommendations should be considered:
• Ensure the disk drives TempDB resides on have RAID protection (RAID 1, 1+0, or 5)
in order to prevent a single disk failure from shutting down SQL Server. Keep in
mind that if TempDB is not available then SQL Server cannot operate.
• If SQL Server system databases are installed on the system partition, at a
minimum move the TempDB database from the system partition to another set of
disks to optimize I/O performance.
• Size the TempDB database appropriately. For example, if you use the
SORT_IN_TEMPDB option when you rebuild indexes, be sure to have sufficient
free space in TempDB to store sorting operations, and ensure that no other process has
the potential to consume that space and so ‘starve’ the TempDB of room to grow.
Application/User Databases and Log File/s
The RS Cluster database (AeXRSDatabase) is the focal point for all activity of the RS Cluster. Many of
the same considerations that are put in place for the TempDB need similar, but separate,
consideration for the AeXRSDatabase. Primary among these are:
• Ensure the disk drives AeXRSDatabase resides on have RAID protection (RAID 1, 1+0, or 5)
in order to prevent a single disk failure from shutting down SQL Server. Keep in
mind that if AeXRSDatabase is not available then the RS Cluster cannot operate and
the primary service will shut down.
• If SQL Server system databases are installed on the system partition, at a
minimum move the AeXRSDatabase from the system partition to another set of
disks to optimize I/O performance. If possible, keep the AeXRSDatabase and the
TempDB separated in order to optimize I/O performance.
Once these components have been considered, their impact upon the overall server functionality must
be assessed. As with the first stage, the server should:
• Have sufficient hardware resources so that at peak load of ALL[2] functionality, the server is not
approaching maximum load on CPU or memory.
• Show Disk Queue Length metrics in PerfMon that do not reach sustained high levels during
peak activity.
• Have ample disk storage to meet current and foreseeable demand for the existing
implementation. Ensure that no other system activities or functions have the potential to
encroach upon the disk space made available for the AeXRSDatabase and TempDB.
A tool that can be used to simulate SQL activity and allow a more robust review of the overall server
performance is the Microsoft SQLIOSim utility.
NOTE: While RS does have capacity to be serviced by a database remote to the actual RS Cluster host,
for the purposes of this document focus will be placed on ‘on-box’ implementation. Where the choice to
host the RS database remotely is made, these same considerations are applicable with the additional
encumbrance of network connectivity between the RS Server and the corresponding SQL instance.
RS Storage Requirements
There are a number of elements that must be considered when planning the storage area for an RS
Cluster implementation. These are:
• Number of clients to be supported by the RS Cluster
• Nature of files in-scope of the client snapshot
• Retention period configured for snapshot data
An extremely valuable tool in estimating the amount of space that will be required by an RS Cluster is
the Recovery Solution Storage Estimator. While this utility can be used as a guide for storage
estimation, for long-term sustainability it is highly recommended that the storage be given sufficient
excess capacity to accommodate shifts in user data patterns as well as the introduction of
new/upgraded applications into the environment that may invalidate the prior estimate.
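To make the interaction of client count, retention period, and excess capacity concrete, the sketch below shows a deliberately simple back-of-envelope calculation. It is not the method used by the Recovery Solution Storage Estimator. The formula, the parameter names, and every figure in it are assumptions made purely for illustration, and it ignores any savings from data shared between clients.

```python
def estimate_storage_gb(clients: int, protected_gb_per_client: float,
                        daily_change_rate: float, retention_days: int,
                        headroom: float = 0.25) -> float:
    """Rough storage estimate in GB (illustrative model only).

    clients                 -- number of clients served by the cluster
    protected_gb_per_client -- assumed average in-scope data per client
    daily_change_rate       -- assumed fraction of that data changing per day
    retention_days          -- retention period for snapshot data
    headroom                -- extra fraction reserved for future growth
    """
    if min(clients, protected_gb_per_client, daily_change_rate,
           retention_days, headroom) < 0:
        raise ValueError("inputs must be non-negative")

    # Initial full copy of the in-scope data for every client.
    baseline_gb = clients * protected_gb_per_client
    # Changed data that must be kept for the whole retention period.
    retained_change_gb = baseline_gb * daily_change_rate * retention_days
    # Add excess capacity for shifts in user data patterns.
    return (baseline_gb + retained_change_gb) * (1.0 + headroom)


if __name__ == "__main__":
    # Hypothetical inputs: 100 clients, 10 GB each, 1% daily change, 30 days.
    print(estimate_storage_gb(100, 10.0, 0.01, 30, headroom=0.25))
```

Varying `retention_days` or `headroom` in a calculation like this shows quickly how sensitive the total is to each planning element, which is the main reason to keep generous excess capacity.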
RS Activity Requirements
A pivotal component of RS activity is the AeXCRTemp folder that is created as part of the RS Server
installation. In order to understand how this folder can be optimized, it is first necessary to understand
how it is used.
[2] Beyond the prior definition of ALL, database activity from other user databases as well as SQL backup
activity needs to be brought into consideration at this stage.
1. When a client machine commences a snapshot, comparisons of files from the prior snapshots
are made at a local level. Files that have been modified or added are entered in a list on the
local client and file hashes are created based upon name, size, and modification date. The
hashes are then sent to the server on a file-by-file basis and compared against the hashes
already protected by the RS Cluster.
• If the file hash does not obtain a match, the entire file will be sent to the server. Files sent to the
server are sent in encrypted form to the AeXCRTemp directory where the RS engine will break
them down into data chunks and create a new blob%.dat file to store them in (as well as creating
appropriate database entries to facilitate the retrieval of this file).
• If the file hash does obtain a match, the block hashes are sent back to the client to ensure a
match. Where the block hashes for the file do not match, the missing blocks are transferred to
the server (to the AeXCRTemp folder) and stored in a similar manner as above.
2. During the compaction phase of the SSM job, BLOB files identified for compaction will be
decompiled to the AeXCRTemp directory as data chunks. These data chunks are subsequently
written to the protected storage area in a new BLOB file. Once the write process is completed, a
checksum is run against the files and the old BLOB file is deleted.
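The snapshot decision described in step 1 can be sketched as follows. This is a simplified illustration, not the RS engine's implementation. The use of SHA-256, the tiny block size, the `server_index` structure, and all function names are assumptions introduced for the example, and encryption and the AeXCRTemp staging step are omitted.

```python
import hashlib
from typing import Dict, List, Tuple

BLOCK_SIZE = 4  # hypothetical, tiny so the demo data stays readable


def file_hash(name: str, size: int, mtime: float) -> str:
    """File-level hash built from name, size, and modification date.
    SHA-256 is an assumption; the text does not name the algorithm."""
    return hashlib.sha256(f"{name}|{size}|{mtime}".encode("utf-8")).hexdigest()


def block_hashes(data: bytes) -> List[str]:
    """Hash each fixed-size block of the file content."""
    return [hashlib.sha256(data[i:i + BLOCK_SIZE]).hexdigest()
            for i in range(0, len(data), BLOCK_SIZE)]


def plan_transfer(fhash: str, data: bytes,
                  server_index: Dict[str, List[str]]) -> Tuple[str, List[int]]:
    """Decide what the client must send.

    server_index maps file hashes already protected by the server to the
    block hashes stored for that file. Returns ("full", all block indexes)
    when the file hash is unknown, otherwise ("blocks", indexes of the
    blocks whose hashes differ from those held by the server)."""
    local = block_hashes(data)
    if fhash not in server_index:
        return "full", list(range(len(local)))
    known = server_index[fhash]
    missing = [i for i, h in enumerate(local)
               if i >= len(known) or known[i] != h]
    return "blocks", missing


if __name__ == "__main__":
    content = b"abcdefgh"
    fh = file_hash("report.doc", len(content), 1.0)
    print(plan_transfer(fh, content, {}))                       # unknown file
    print(plan_transfer(fh, content, {fh: block_hashes(b"abcdXXXX")}))
```

Either branch ends with data being written to the temporary folder on the server, which is why the I/O characteristics of that folder matter for every snapshot.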
From these two processes alone, the implications of poor I/O performance of the drive hosting the
AeXCRTemp directory can be easily understood. A slow drive will degrade both snapshot and system
maintenance performance. As these tasks slow, other system bottlenecks may be aggravated, because
other processes that are delayed by them may start moving to the page file and thus create further
I/O requests.
Beyond having sufficient I/O bandwidth to accommodate peak loading, another prerequisite of the
AeXCRTemp directory is that it has sufficient space to accommodate the largest single file that is either
within the protected storage OR offered by a client as part of a snapshot. If the available space for the
AeXCRTemp directory is insufficient, the file will be unable to complete its transition to this folder and
the checksum will fail.
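The space prerequisite above lends itself to a simple automated check. The sketch below is one possible approach under stated assumptions: it treats the largest file found under a scanned directory tree as the size to accommodate, the paths in the demo are hypothetical placeholders, and the safety factor is an illustrative value. Files that clients have not yet offered cannot be measured this way and still need to be allowed for.

```python
import os
import shutil


def largest_file_bytes(root: str) -> int:
    """Size in bytes of the largest file under `root` (0 if none found)."""
    largest = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                size = os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished or is unreadable; skip it
            largest = max(largest, size)
    return largest


def temp_has_room(free_bytes: int, largest_bytes: int,
                  safety_factor: float = 1.0) -> bool:
    """True when free space covers the largest file times a safety factor."""
    return free_bytes >= largest_bytes * safety_factor


def check_temp_folder(temp_dir: str, storage_root: str,
                      safety_factor: float = 1.0) -> bool:
    """Compare free space on the volume holding `temp_dir` with the
    largest file found under `storage_root`."""
    free = shutil.disk_usage(temp_dir).free
    return temp_has_room(free, largest_file_bytes(storage_root), safety_factor)


if __name__ == "__main__":
    # Hypothetical locations; here the current directory stands in for both.
    print(check_temp_folder(".", ".", safety_factor=1.5))
```

Running a check of this kind on a schedule gives early warning when other activity on the same drive starts to erode the space the temporary folder depends on.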
While in an optimal environment the AeXCRTemp folder would reside upon a physical drive of its own, in
practice this is not generally feasible. For optimal performance, ensure that this folder is on a drive
where I/O and size will not be impinged upon by other server activities.
Overview of Basics
While installation of an RS Cluster is a relatively simple activity, the planning and configuration of an
optimized server is far from simple. Each element of the server needs to be understood and optimized
individually so that, where full optimization is not feasible, the implications are understood and
proved acceptable.
I have provided diagrams below of variations of an RS Cluster implementation to illustrate the benefits
and pitfalls of different configurations.
Diagram 1 – Poor Scalability Design
In the design above, the RS Cluster is built upon a server with a single physical volume. Although there is
sufficient physical space to accommodate the components, there is only one physical I/O point through
which all of these elements run. While this configuration may be the lowest cost, it offers very little in
terms of sustained load capability and would only be viable in extremely small (<100 users) or lab
environments.
Diagram 2 – Fully Optimized Design
In Diagram 2 an example of a fully optimized RS Cluster can be seen. Each component has been
separated into a separate physical array so as to optimize disk I/O to a peak level. While this model
allows for optimal performance, implementation of this design may prove cost prohibitive to all but
the largest of environments and offer limited expansion capacity in terms of further storage arrays.
Diagram 3 – Hybridized Optimal Design
Diagram 3 provides an example of a hybrid of the optimal model. After assessing requirements
for the environment, it was found that the I/O requirements of the separate components could be
accommodated with far fewer physical disk I/O points. A strong point of this model is a significantly
lower cost to implement than the optimized model of Diagram 2 while still maintaining a separation of
I/O activity at an acceptable level. Should the I/O patterns of this server change significantly in the
future, however, the aggregation of I/O that produced the cost savings may become an issue if the
preliminary testing did not leave sufficient spare I/O bandwidth.
The Server – Advanced
Beyond the elements already discussed, there are further advanced elements that can provide for
improved performance and scalability of an RS Cluster.
Operating System Standard
RAM provides for the base functionality of the applications and processes run on any computer
system. While ‘more RAM’ is often considered the first option when applications are found to be lagging,
this may not always be possible.
In the Windows environment, the choice of operating system standard will set in place limitations on
how much memory can actually be used. See Memory Limits for Windows Releases for further details of
these limits.
SQL Installation Standards
In line with the operating system memory limitations, the SQL implementation is limited in the amount
of memory that can be allocated to it. This functionality is controlled through Address Windowing
Extensions (AWE). For details of the limits and capabilities of enabling AWE, please see AWE enabled
Option.
Physical vs Logical Disks
A quite common problem encountered in RS Clusters suffering from poor performance comes from the
disk layout. A logical disk is a partition of a physical disk array, and there may be 2 or more logical disks
defined on any particular array. A logical disk is purely a way of logically separating files on a disk array;
it is NOT a separation of I/O activity. Where an activity is accessing data held on two separate logical
disks, for the purposes of I/O discussion the performance can be worse than if the data were stored on a
common physical disk (due primarily to head seek delays).
Wherever possible, avoid the use of logical disks on an RS Cluster Server unless the full implications of the
I/O of the components housed in these partitions have been assessed and found to meet performance
expectations.
An alternative to traditional hard disk arrays on the server is the use of SAN devices. While a SAN uses
very different technology from that of the hard disk array, for the scope of this document a SAN-connected
LUN is considered as a physical disk.
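Because separate drive letters can hide a shared physical array, it can help to write the layout down and check it explicitly. The sketch below is illustrative only: the component names come from this document, but the drive letters, array identifiers, and the two mapping dictionaries are hypothetical inputs that would have to be filled in by hand from the server's actual disk configuration.

```python
from collections import defaultdict
from typing import Dict, List


def shared_arrays(component_drive: Dict[str, str],
                  drive_array: Dict[str, str]) -> Dict[str, List[str]]:
    """Report physical arrays that host more than one component.

    component_drive -- component name -> logical drive it is stored on
    drive_array     -- logical drive -> physical array (or LUN) behind it
    """
    by_array: Dict[str, List[str]] = defaultdict(list)
    for component, drive in component_drive.items():
        by_array[drive_array[drive]].append(component)
    # Only arrays carrying two or more components share an I/O path.
    return {array: sorted(components)
            for array, components in by_array.items()
            if len(components) > 1}


if __name__ == "__main__":
    # Hypothetical layout: four drive letters carved from two arrays.
    components = {
        "paging file": "C",
        "TempDB": "D",
        "AeXRSDatabase": "E",
        "AeXCRTemp": "F",
    }
    arrays = {"C": "array0", "D": "array0", "E": "array1", "F": "array1"}
    for array, sharing in shared_arrays(components, arrays).items():
        print(f"{array}: {', '.join(sharing)} share one physical I/O path")
```

In this hypothetical layout every component has its own drive letter, yet the report shows that the paging file competes with TempDB and the database competes with the temporary folder, which is exactly the situation the text warns about.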
Hard Disk/Hard Disk Controller Speed
Very much in line with the adage ‘You get what you pay for’, hard disk and hard disk controller speeds
will impact greatly upon the performance of an RS Cluster server. While many manufacturers offer
high-speed controllers in server equipment, the hard disks purchased to be attached to the array are
rarely of a speed to fully leverage the high-speed controller. Regardless of how new and high-speed
capable a controller may be, if the disks attached to it are slow (e.g. 5400 rpm), the speed of the disk
will remain the limiting factor.
CPU
Generally a server will have more bottlenecks occurring at the I/O level than at the CPU level. A
minimum of 2 multi-core CPUs will generally suffice for all but the largest of implementations with regard
to Recovery Solution.
It is worth noting that there has been much discussion on the effect of Hyper-Threaded CPUs on SQL
performance. While limited evidence exists on either side of this discussion at this stage, it is
recommended that HT be disabled on an RS Cluster server.
BLOB File Size
BLOB file size has been reviewed as a further element of optimization. RS Engineering have determined
that, as a general rule, 200 MB BLOB files have shown the greatest performance in the majority of usage
cases.
Server Task Scheduling
When considering server performance under peak load, attention should be paid to the other tasks that
are conducted on the server. Elements such as anti-virus scans, monitoring software, drive
maintenance, OS/application patching, and additional 3rd party applications added after configuration
can cause significant changes to the expected performance of the server. This leaves two alternatives
available to prevent diminished performance:
• Design the server from the outset to carry sufficient excess capacity for demands that MAY arise
in the future, and incur higher implementation costs for an undefined possibility.
• Establish close lines of communication with all branches of IT that will have impact upon the
server and the software contained upon it so as to impart a full understanding of the server
purpose, the routine task schedule that it runs under, and the maintenance windows that are
available for other activities.
Conclusion
The intent of this document has been to discuss the elements that can directly affect the performance of
an RS Cluster installation. Specific metrics have been intentionally avoided as each environment and
configuration is different as are infrastructure models and budgetary limitations.
Through awareness of the components and their implications for a stable RS model, the planning,
testing, and implementation of Recovery Solution in a production capacity can be conducted in an
informed fashion.
For further information or design assistance, please contact your Symantec sales channel.