SnapProtect Sizing Guide
SnapProtect v10
Mike Braden, NetApp
March 2014
Version 1.0

TABLE OF CONTENTS

1 Introduction
  1.1 SnapProtect Overview
  1.2 Supported Versions
2 Scalability for a Single CommCell
  2.1 CommServe Scalability
  2.2 Active Job Scalability
  2.3 CommCell Scalability
  2.4 CommCell Performance Tuning
  2.5 OnCommand Unified Manager Scalability
3 System Requirements
  3.1 SnapProtect System Requirements
  3.2 SnapProtect Deployment on Virtualized Servers
4 Large CommCell Environment Optimization
  4.1 CommServe Server Optimization
  4.2 Media Agent Optimization
5 Backup Planning
  5.1 Snapshot Utilization
  5.2 Snapshot Expiration and Removal
  5.3 Backup Schedule Planning
6 Index Planning
  6.1 Index Cache Planning
7 Replication Planning
  7.1 Initial Replication Job
  7.2 Concurrent Streams
  7.3 Replication Schedules
  7.4 Impact of Volume Layout
  7.5 Replication Tuning
  7.6 Replication Throttle
8 Capacity Planning
  8.1 SnapMirror
  8.2 SnapVault
  8.3 Deduplication
  8.4 VSS Requirements
  8.5 Restore Staging
  8.6 Media Agent Index Cache
9 Other Planning
  9.1 Backup to Tape Using NDMP
10 Virtual Clients
  10.1 Virtual Systems
11 NAS Clients
  11.1 NAS Clients
  11.2 NDMP Tape Backup
12 Conclusions

LIST OF TABLES

Table 1) Active job streams
Table 2) CommCell client scalability
Table 3) CommCell Job Controller scalability
Table 4) CommCell Snapshot scalability
Table 5) CommServe server and Media Agent requirements by class
Table 6) Maximum number of concurrent streams

1 Introduction

The intent of this document is to provide sizing guidelines for architecting SnapProtect® solutions for a broad range of environments. The focus of the information is on SnapProtect in particular; however, because SnapProtect provides management of the integrated data protection features of Data ONTAP® and leverages APIs from both Data ONTAP and OnCommand® Unified Manager/DFM/Protection Manager, this document also provides some recommendations for those products. References are provided to existing documents that offer details and best practices for features in Data ONTAP and Protection Manager; the information in those references supersedes the information in this guide unless otherwise noted.

The recommendations in this document are not necessarily hard limits. When architecting a solution it is important to consider as many aspects of the backup environment as possible. A conservative approach is recommended: take growth for a given implementation into account, and build in allowances for times when systems are operating at a higher than expected load or when operation of the systems is affected by adverse events such as failures or outages in the environment.

1.1 SnapProtect Overview

SnapProtect software fully integrates the data protection features of Data ONTAP into a complete and modern backup solution. Snapshots and replication offer very fast backup times, with the added advantage of efficient block-level incremental data transfers provided by SnapMirror and SnapVault.

1.2 Supported Versions

This document applies to SnapProtect v10. Any combination of SnapProtect, operating systems, and applications should be verified as supported in the Interoperability Matrix Tool located on the NetApp Support site: http://support.netapp.com/NOW/products/interoperability/

2 Scalability for a Single CommCell

The following areas of design have been identified as scalability thresholds within a single CommCell group. Threshold considerations are divided by deployment type as follows:

- Workgroup-class configurations
- Datacenter-class configurations
- Enterprise-class configurations

The deployment classifications of Workgroup, Datacenter, and Enterprise are used to correlate the server requirements with specific workload categories. For example, a server that meets the Datacenter-class hardware requirements can handle the number of jobs in a 24-hour window shown in the Job Scalability table.
The class can be used to correlate requirements in both directions. For example, if there is a known requirement for 300 clients, the CommCell Client Scalability table shows that a Datacenter class or higher is appropriate, and the CommServe server hardware would need to meet the Datacenter-level requirements or higher.

CommCell administrators may receive warnings when approaching the scalability limits referenced in this document. The warning message advises the administrator to modify current CommCell settings or to configure the entities that exceed the scalability guidelines within a different CommCell. Although these soft limits are not strictly enforced, it is strongly recommended not to exceed them.

2.1 CommServe Scalability

Using recommendations based on best practices for implementing a CommCell hierarchy, the following parameters need to be observed for a single CommCell.

2.2 Active Job Scalability

Active job scalability refers to the number of concurrently running jobs for the CommCell, including jobs in a Waiting or Pending status.

Table 1) Active job streams

CommCell Class   Running Job Streams
Workgroup        1 to 100
Datacenter       101 to 300
Enterprise       301 to 1,000

Another consideration for job scalability is the type of tasks a job will create for the storage controllers and associated systems. A single job in SnapProtect may create many tasks for a storage controller. The best way to understand the impact is to break down what a single job does and consider the additional tasks. For example, one backup job may include many volumes on a given storage controller. When the job is executed, it will create Snapshots for each of the volumes and possibly perform indexing for each volume. To avoid exceeding the limits of the storage controller, use care in configuring the backup subclient and associated schedule so that load is distributed as much as possible. In this case, it is important to follow the entire contents of this guide and review the materials recommended throughout as references.

2.3 CommCell Scalability

A CommCell is composed of the systems protected by a single CommServe and its associated Media Agents. The following counts apply to a single CommCell.

Number of Supported Clients

Table 2) CommCell client scalability

CommCell Class   Client Count
Workgroup        1 to 100
Datacenter       101 to 400
Enterprise       401 to 5,000

A CommCell can handle up to 10,000 servers. By default, the software supports 5,000 servers out of the box. Factors that determine the scalability include:

- Number of jobs running in a 24-hour period
- Number of jobs completed in a single attempt
- Type of jobs
- Geographical client locations (LAN versus WAN)
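To make the correlation between planned workload and deployment class concrete, the following minimal Python sketch selects the smallest class whose limits (from Table 1 and Table 2) accommodate a planned client count and concurrent job count. The function and names are illustrative only; they are not part of SnapProtect.

    # Illustrative sketch: choose the smallest CommCell class that
    # accommodates a planned client count and concurrent job count,
    # using the thresholds from Tables 1 and 2.
    CLASS_LIMITS = [
        # (class, max running job streams, max clients)
        ("Workgroup", 100, 100),
        ("Datacenter", 300, 400),
        ("Enterprise", 1000, 5000),
    ]

    def smallest_class(clients, concurrent_jobs):
        for name, max_jobs, max_clients in CLASS_LIMITS:
            if clients <= max_clients and concurrent_jobs <= max_jobs:
                return name
        return None  # exceeds a single CommCell; consider multiple CommCells

    print(smallest_class(300, 120))  # -> Datacenter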
Number of Media Agents

It is possible that in larger SnapProtect environments the number of Media Agents could be high while all other aspects of scalability are within Datacenter or Enterprise limits. SnapProtect utilizes Media Agents for indexing and for storage appliance (array) communications. While a SnapProtect configuration could include a high number of Media Agents, the workload differs from a traditional backup environment in which the Media Agents stream data for many clients. With SnapProtect, the lower workload associated with Snapshot backups can allow for a much higher number of Media Agents without exceeding the scalability limits of a CommCell.

Total Number of Jobs in Job Controller

Table 3) CommCell Job Controller scalability

CommCell Class   Total Active Job Count
Workgroup        1 to 100
Datacenter       101 to 300
Enterprise       301 to 1,000

Note: In order to maintain control over the number of active jobs, stagger submitted schedules. An effective means to accomplish staggered schedules is to use multiple schedule policies (operating on different client groups in some cases). The timing of each schedule policy submission may be adjusted to allow for job "tuning" to achieve optimized scalability.
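As a simple illustration of staggering, the following Python sketch spaces schedule policy submission times evenly across a backup window. The window and policy names are hypothetical; actual scheduling is configured through SnapProtect schedule policies.

    # Illustrative sketch: spread schedule policy submission times evenly
    # across a backup window so that jobs are not all submitted at once.
    from datetime import datetime, timedelta

    def staggered_starts(window_start, window_hours, policies):
        step = timedelta(hours=window_hours) / max(len(policies), 1)
        return [(name, window_start + i * step)
                for i, name in enumerate(policies)]

    window = datetime(2014, 3, 1, 20, 0)  # hypothetical 8:00 pm window start
    for name, start in staggered_starts(window, 6, ["grp_a", "grp_b", "grp_c"]):
        print(name, start.strftime("%H:%M"))  # 20:00, 22:00, 00:00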
Maximum Number of Snapshots

The maximum number of Snapshots supported on a storage appliance is dictated by Data ONTAP. SnapProtect does not have a default Snapshot limit. It is up to the administrator to define the schedules and retention such that the number of Snapshots per volume does not exceed the limits of the version of Data ONTAP running on the storage controller.

Table 4) CommCell Snapshot scalability

Maximum Snapshot Count Per Volume   255

Also take into consideration that when an NDMP backup is performed, it may also create a Snapshot. In the case of backing up a particular SnapProtect copy to tape, a named Snapshot will be used. However, when performing a direct NDMP dump using SnapProtect, the NDMP process will create a point-in-time Snapshot that will be used for the copy to tape.

Maximum Number of Storage Policies

There is no limit to the maximum number of storage policies in an Enterprise-class deployment. The limiting factor for storage policies can be practical administration concerns or, if tape media is used, media consumption: policies that utilize tape should be consolidated in order to reduce the amount of media required for a CommCell. Tape media is dedicated to a storage policy and cannot be shared between storage policies.

2.4 CommCell Performance Tuning

See the books online documentation for the current recommendations for performance tuning. From the books online main view click the Server menu, then choose CommCell Management. In the contents pane on the left side locate Best Practices, then choose CommCell Performance Tuning.
http://support.netapp.com/NOW/knowledge/docs/snapprotect/relsnap_protect100sp4/21508433_A0_books_online_100sp4/books_online_1/english_us/features/performance_tuning/tunable_parameters.htm

2.5 OnCommand Unified Manager Scalability

SnapProtect utilizes one or more OnCommand Unified Manager (OCUM) servers to provision storage for replication relationships for SnapMirror and SnapVault. Using more than one OnCommand Unified Manager server allows for delegation of resources and the ability to divide workloads based on business or environmental requirements. However, only one OCUM server should be used to manage a storage controller and the storage controllers in replication relationships with it.

Note: OnCommand Unified Manager was previously referred to as Operations Manager, Provisioning Manager, and Protection Manager. There are also references to Data Fabric Manager (DFM), which is a component of Operations Manager.

Environments using Data ONTAP operating in 7-Mode require OnCommand Unified Manager 5.x. Environments using clustered Data ONTAP require OnCommand Unified Manager 6.x. When using both 7-Mode and clustered Data ONTAP, separate servers running OCUM 5.x and 6.x are required to manage the respective Data ONTAP systems.

OCUM best practices are located at:

OCUM 5.x Best Practices Guide
https://fieldportal.netapp.com/DirectLink.aspx?documentID=108087&contentID=181060

OCUM 5.x Sizing Guide
https://fieldportal.netapp.com/DirectLink.aspx?documentID=85549&contentID=109831

OCUM 6.0 Sizing Guide
https://fieldportal.netapp.com/DirectLink.aspx?documentID=101148&contentID=145888

OCUM 6.x Maximums

Maximum Cluster Count Per vAppliance   24

OCUM Recommended Maximums

Replication Relationships   3,000
Datasets                    300
Concurrent Jobs             100

Note: It is recommended to disable the performance monitor when configuring an OCUM server that will be used for SnapProtect. OCUM servers should be dedicated to SnapProtect. If performance monitoring is required, it is recommended to use a separate OCUM server for that functionality.

3 System Requirements

The following table shows the recommended system requirements for the CommServe server and Media Agents, as defined by class of deployment: Workgroup, Datacenter, and Enterprise. The classes are based on current scalability requirements with SnapProtect.

3.1 SnapProtect System Requirements

CommServe and Media Agent system requirements are for dedicated physical hardware. For most implementations it is recommended to have dedicated systems for the CommServe and DFM server. For larger implementations it may also be advisable to segment the storage systems managed to allow the use of multiple DFM servers.

When sizing Media Agent servers, the recommendations below are for Media Agents that will be streaming data to other tiers of storage, such as copying data to tape. Media Agents are also utilized for communications to storage controllers for the backup (Snapshot) process, including indexing backups. The load on many SnapProtect Media Agents can be lower than the load on a Media Agent that is moving data, because they would likely be used only for the backup operation of creating and indexing Snapshots. In those cases it is possible to utilize servers of a lower class.

Table 5) CommServe server and Media Agent requirements by class

Module          Class                           Processor                     Memory        Disk Space Used**
CommServe       Workgroup                       2 CPU cores                   8 to 16 GB
                Datacenter                      4 CPU cores                   32 GB
                Enterprise                      12 CPU cores                  32 GB
Media Agent     Workgroup                       4 cores                       12 to 16 GB   200 GB
                Datacenter                      8 cores                       16 to 32 GB   500 GB
                Enterprise                      12 cores                      28 to 48 GB   1 TB
OCUM 5 Server   1 - 25 storage controllers      1 CPU x64, 2 GHz or greater   4 GB          40 GB*
                26 or more storage controllers  1 CPU x64, 2 GHz or greater   12 GB         60 GB*
OCUM 6 Server                                   4 vCPUs                       12 GB         152 GB

*OnCommand Unified Manager (OCUM) disk requirements change depending on the type and configuration of storage controllers. See the best practices or the product documentation for OCUM for environments with specific requirements.

**The Disk Space Used column is the additional disk space expected to be available for indexing and database files where appropriate. It does not include the minimum disk space required for installing the product. See the release notes for more information about installation requirements.

Note: See the SnapProtect documentation on the NetApp Support site for the most up-to-date requirements for the CommServe and Media Agent, located under the Server menu, CommCell Management, System Requirements.

Note: The IMT on the NetApp Support site contains the latest information for operating systems and supported software.
Verify any configurations using the IMT for SnapProtect, OCUM, and Data ONTAP. For a complete list of hardware and software requirements for individual iDataAgents, see the SnapProtect books online on the NetApp Support site.

3.2 SnapProtect Deployment on Virtualized Servers

This section describes using virtual machines for the SnapProtect servers. For information about backing up a virtualized environment, see the later sections for VMware® or Microsoft Hyper-V.

CommServe Server

If the CommServe server is configured on a virtual machine (VM), it typically operates at roughly 60% of the efficiency of a comparable physical server. In this deployment model, scalability limits are reduced accordingly. Virtualized CommCell guests should run on VMware vSphere 4.0 or above with resources comparable to a physical server, as shown in the CommServe server requirements by class table.

Media Agent Server

If the Media Agent is configured on a VM, it also typically operates at roughly 60% of the efficiency of a comparable physical server, and scalability limits are reduced accordingly. For Media Agents deployed on virtual machines, the maximum number of concurrent streams supported by a single Media Agent is effectively 60% of the specification for a physical host. Virtualized Media Agent guest servers running in a VMware environment should run on VMware vSphere 4.0 or later with the latest version and patches applied.
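The 60% figure above can be applied directly when planning virtualized Media Agents. A minimal Python sketch, assuming the figure holds for the workload in question:

    # Illustrative sketch: derate a physical Media Agent stream
    # specification by the ~60% efficiency expected for a VM deployment.
    import math

    VM_EFFICIENCY = 0.60  # from the guidance in this section

    def effective_concurrent_streams(physical_spec, virtualized=True):
        if not virtualized:
            return physical_spec
        return math.floor(physical_spec * VM_EFFICIENCY)

    print(effective_concurrent_streams(100))  # 100-stream physical spec -> 60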
4 Large CommCell Environment Optimization

The following settings and processes are recommended for use in large CommCell environments in order to support optimized operations.

4.1 CommServe Server Optimization

There are several options that can be tuned in SnapProtect to optimize performance depending on the environment.

Note: Registry key optimizations utilized in SnapProtect 9 are no longer required in SnapProtect 10. They are available as configuration settings in the GUI.

Subclient Count Reduction Benefits

Reducing the number of subclients brings a commensurate reduction in the amount of unnecessary tracking information associated with each server. Subclient count optimization allows for easier management of daily backup operations for the SnapProtect administrator. It is recommended to perform a periodic review of CommCell subclients to determine whether any redundant or unneeded subclients can be removed from the CommCell configuration.

Recommended Parameter Adjustments

SnapProtect offers some parameters that can be tuned to optimize the backup environment. The following are examples:

- Increase Chunk Size
- Increase Block Size
- Job Manager Update Interval
- Network Agents
- Job Scheduling

The procedures for changing these parameters are now located in the books online. From the menu choose Server, then select CommCell Management. Scroll the contents pane on the left down to the Best Practices section. See CommCell Performance Tuning for parameters and links to details.

Note: Some parameters such as Block Size apply specifically to streaming backups to tape and may not apply if doing only Snapshots and replication.

4.2 Media Agent Optimization

Tape Block Size

It is recommended to use a block size of 256K for tape media. For most operating systems this is configured automatically by the operating system's tape driver. Check the documentation for the tape device driver to determine the block size and details for modifying it.

The block size may also be modified using the Data Path properties. To access the setting, select the Storage Policy, right-click the copy you wish to change, and choose Properties. Click the Data Paths tab, select the Data Path, click the Properties button, and use the block size setting at the bottom of the window.

Windows Test Unit Ready

Ensure that the Test Unit Ready registry key is set properly on all Microsoft Windows servers that have visibility to SAN-attached tape drives. Additional detail regarding this registry key is available at: http://support.microsoft.com/default.aspx?scid=kb;en-us;842411&Product=w

Dynamic Drive Sharing

In a Dynamic Drive Sharing (DDS) environment, configure between two and six drives controlled through each MediaAgent. This should allow jobs to meet the backup window while not overloading a single MediaAgent on the network. Backups should be associated evenly with Storage Policies/MediaAgents to balance the data protection workload activity.

5 Backup Planning

In SnapProtect terminology, the backup is a Snapshot of the primary volume. With the exception of NAS backups, all backups are performed with the file system or application in backup mode for a consistent state. When applications are in hot backup mode, the corresponding Data ONTAP Snapshot will be in an application-consistent state. SnapProtect must create the initial backup before any other SnapProtect operations, such as SnapMirror/SnapVault or NDMP dump to tape, can be utilized.

Snapshots created by SnapProtect use a specific naming convention to produce unique and identifiable names. The names start with "SP_" and include the job ID for correlation with SnapProtect backup operations.

5.1 Snapshot Utilization

The limit on the number of Snapshots for a volume in Data ONTAP is 255. Care should be exercised when planning backup schedules and retention in order to avoid running out of Snapshots. When a volume reaches the allowed number of Snapshots, subsequent backups will fail. When configuring backup retention and schedules, there is no warning if the configuration will result in more Snapshots than the system allows.

5.2 Snapshot Expiration and Removal

Care should be taken when sizing for the number of Snapshots per volume. Snapshot consumption can vary with the time the aging process runs and the time the backup job completed. Depending on the configured retentions, more Snapshots may exist on a volume at a given time than the retention alone would suggest.

The process for removing expired Snapshots is configured to run daily at noon (12:00 pm) by default. It is possible to change the time and frequency of the aging process so that Snapshots are removed closer to the time they expire. To change the default schedule, right-click the CommServe and select View Schedules, then right-click Data Aging and choose Edit to modify the schedule.

Caution should be exercised when planning retentions and backup schedules to account for extra Snapshots. For example, if you configure backups every 2 hours and set the retention to one day, you will have a minimum of 12 Snapshots per 24-hour period. When taking into account the data aging process and removal of expired Snapshots, the number could be more than 12. The aging process runs at 12:00 (noon) daily, so if you check the system at midnight you will have 12 Snapshots retained, plus up to 6 more Snapshots taken after the reclamation process ran.
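The example above can be generalized. The following minimal Python sketch estimates how many Snapshots exist on a volume at a given time after the last aging run, under the same assumptions (fixed backup interval, fixed retention, periodic aging); it is a planning aid only.

    # Illustrative sketch: estimate Snapshots present on a volume, given a
    # backup interval, a retention period, and the hours elapsed since the
    # last data aging run (expired Snapshots linger until the next run).
    def snapshots_on_volume(backup_interval_hours, retention_hours,
                            hours_since_aging):
        retained = retention_hours // backup_interval_hours
        lingering = hours_since_aging // backup_interval_hours
        return int(retained + lingering)

    # Example from the text: backups every 2 hours, 1-day retention,
    # aging at noon. Checked at midnight (12 hours after aging):
    print(snapshots_on_volume(2, 24, 12))  # -> 18 (12 retained + 6 extra)
    # Worst case, just before the next daily aging run:
    print(snapshots_on_volume(2, 24, 24))  # -> 24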
It is recommended to change the default aging schedule so that the aging process runs at an off-peak time for the environment. The aging process deletes Snapshots that have passed their retention period. If many volumes on a single storage controller contain expired Snapshots, the deletion process will place extra load on the storage controller. The number of Snapshots being deleted could also be much higher if there are incremental jobs that maintain more Snapshots per cycle.

5.3 Backup Schedule Planning

Backup jobs in SnapProtect are application consistent. Application consistency can add to the time it takes to perform the application backup. Care should be taken to schedule backup jobs when systems are idle or operating during periods of low activity, in order to reduce the time it takes to place the application in a quiescent state.

6 Index Planning

Indexing a file system or NAS share is the process of collecting metadata about the files located on the file system, including the structure of the data being indexed, permissions, dates, and other details. Indexing is often performed on the primary storage and occurs during the backup process. A backup is not complete until the indexing phase of the backup job is complete. It is possible to defer indexing for some operations so that the load created by the indexing operation can be scheduled separately and, if required, performed on a copy of the data rather than on the primary storage. Deferred indexing allows administrators to manage the storage workload so that backup-related work can be performed on non-primary storage, reducing the impact on production systems.

The term indexing in SnapProtect is interchangeable with the term cataloging: both refer to the process of collecting metadata that describes the contents of a backup copy, while index or catalog refer to the location where the information is stored. SnapProtect does not have a single monolithic location to store all index information. Indexing metadata in SnapProtect is typically located with the backup copy and is cached on Media Agents when performing data management operations.

NAS Data

Indexing of NAS volumes on FAS storage controllers is performed using APIs to collect information about the contents of the Snapshot from Data ONTAP. Indexing for NAS is disabled by default. When indexing is enabled for volumes containing large numbers of files, the backup job will take longer to complete. Indexing time is based on the number of files and directories in a volume and is not affected by the size of the files. To reduce the time it takes to index volumes with high file counts, it is possible to perform a mix of full and incremental backups. Incremental indexing of NAS volumes uses the SnapDiff functionality in Data ONTAP to determine the files changed since the previous backup (Snapshot). Full backups also utilize the SnapDiff API; however, they collect the complete list of files from the current Snapshot. Data ONTAP limits the number of concurrent SnapDiff sessions to 16 per controller. When planning for NAS backups that perform indexing, it is recommended to distribute the schedules so that no more than 16 jobs are running concurrently on a single controller (see the sketch below).
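The following minimal Python sketch shows one way to plan such a distribution: it groups indexed NAS backup jobs by controller and splits them into batches of at most 16, to be run one batch at a time per controller. The job and controller names are hypothetical.

    # Illustrative sketch: batch indexed NAS backup jobs so that no more
    # than 16 concurrent SnapDiff sessions run on any single controller.
    from collections import defaultdict

    MAX_SNAPDIFF_SESSIONS = 16  # per-controller limit in Data ONTAP

    def batches_per_controller(jobs):
        """jobs: iterable of (job_name, controller) pairs."""
        by_controller = defaultdict(list)
        for job, controller in jobs:
            by_controller[controller].append(job)
        return {controller: [names[i:i + MAX_SNAPDIFF_SESSIONS]
                             for i in range(0, len(names),
                                            MAX_SNAPDIFF_SESSIONS)]
                for controller, names in by_controller.items()}

    jobs = [("nas_vol%02d" % i, "controller-a") for i in range(40)]
    for controller, batches in batches_per_controller(jobs).items():
        print(controller, [len(b) for b in batches])  # controller-a [16, 16, 8]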
Application Data

Application and File System iDAs always perform indexing.

Virtual Machine Data

Indexing of VMware virtual machines is performed at the level of the datastore in which the virtual machine is located, allowing full restore of the virtual machine or its virtual disks. For Windows virtual machines it is also possible to enable granular indexing, which indexes the files in the virtual machine.

While the NetApp Snapshot occurs very quickly, indexing and placing applications into backup mode add time to backup jobs, and possibly significant time. Indexing is required to search and restore at the file level, with the exception of the live browse feature, which allows browsing of a backup when the indexing operation was not performed at the time of the backup.

6.1 Index Cache Planning

It is important to consider the disk space required specifically for the index cache on a Media Agent. The index cache is located in the Media Agent installation directory. The cache is used to temporarily store index-related metadata from backup and restore jobs. The space required can vary widely with the type of data being indexed. The following sizes should be used as a guideline. The system should be monitored using the reporting capabilities of SnapProtect as well as the operating system monitoring facilities, such as Windows Resource and Performance Monitor or Linux system reporting scripts.

MediaAgent Class   Estimated Data Backed Up (Per Week)   Estimated Index Cache   Recommended IOPs (8 Worker Threads)
Large              40 - 60 TB                            1 TB                    400
Medium             20 - 40 TB                            500 GB                  300
Small              Up to 20 TB                           200 GB                  250

Factors That Affect the Index Size

The following factors must be considered, as they impact the size of the index cache:

- Index retention settings
- Number of files that are backed up, and other properties such as file name length
- Collection of extra properties such as ACLs and additional attributes
- Agents that require more space, namely:
  - SharePoint Server iDataAgent - Documents
  - File System iDataAgent with Analytics and/or ACL information
  - Virtual Server iDataAgent
  - Exchange Mailbox iDataAgent
  - OnePass Agents

7 Replication Planning

Replication is a key feature of SnapProtect. Snapshots maintain point-in-time copies of data on primary storage; for more complete data protection, however, those Snapshots must be replicated. Replication in Data ONTAP uses either SnapMirror or SnapVault, and the replicated point-in-time copies are called auxiliary copies in SnapProtect terminology.

A replication relationship is configured between two storage controllers, typically called the primary storage and secondary storage. These storage systems may also be referred to as the source and destination for replication. With SnapProtect it is possible to have three levels of cascade for replication relationships: data can be replicated from the primary storage controller to a secondary, then to a tertiary storage controller.

Replication relationships are created (provisioned) in SnapProtect using APIs in OnCommand Unified Manager or Protection Manager when a Snapshot copy is configured. SnapProtect also schedules and initiates updates to SnapMirror or SnapVault through APIs in OCUM or DFM and Protection Manager, and manages retention of the Snapshots. Although replication relationships are created in SnapProtect, you should adhere to any recommendations in the appropriate version of the Data ONTAP documentation located on the NetApp Support site (NSS). It is highly recommended to start by becoming familiar with the information contained in the Data Protection Online Backup and Recovery Guide for all versions of Data ONTAP that will be managed by SnapProtect.
For example, for Data ONTAP 8.2 operating in 7-Mode, see the Data ONTAP 8.2 Data Protection Online Backup and Recovery Guide for 7-Mode.

SnapMirror Overview

SnapMirror is a physical replication technology. Blocks that have changed between Snapshots are replicated. Snapshots on the source volume are replicated to a target volume, which provides symmetrical Snapshots between the source and target volumes. For example, if the source has 100 local Snapshots, when the volumes are in sync the target will have the same 100 Snapshots. The target volume of a SnapMirror relationship provides read-only access to the data in the volume, and this data is available to restore using SnapProtect. SnapMirror in Data ONTAP is available in several variations, including synchronous SnapMirror, semi-synchronous SnapMirror, qtree SnapMirror, and volume SnapMirror. SnapProtect utilizes only asynchronous volume SnapMirror.

SnapVault Overview

SnapVault is a logical replication technology. It replicates a single Snapshot from the source to the target location, which allows for asymmetrical Snapshots between the source and target volumes. The source could have 100 Snapshots in use while the target contains data from only 10 of them. The advantage is that Snapshots on the vault destination can be retained for much longer than the Snapshots on the primary. SnapVault is useful, for example, when the primary data requires frequent Snapshots but the destination does not: the primary volume could take Snapshots every 2 hours, while the vault saves Snapshots daily or weekly.

For more information about SnapMirror and SnapVault, see the Data Protection Online Backup and Recovery Guide, which is part of the standard Data ONTAP documentation set available on the NetApp Support site.

7.1 Initial Replication Job

When the first replication job runs for a new relationship, a baseline transfer occurs in which all of the data from the Snapshot that originated with the backup is transferred from the source volume to the secondary or target volume. Depending on the connectivity and the amount of data in the volume, the baseline process may take significantly longer than subsequent replication jobs; slow links and large amounts of data increase the time the initial replication job takes. After the initialization of the replication is complete, further updates perform only incremental copies of changed blocks, allowing for significantly faster replication jobs.

7.2 Concurrent Streams

There is a limit to the number of concurrent streams that Data ONTAP will support. The limit is based on the storage controller platform and the version of Data ONTAP. Concurrent streams can be limited by setting the number of Device Streams in the Storage Policy; it is recommended to use this feature to reduce the number of active streams. Additional information, including the specific number of streams for each platform, is listed in the section "Maximum number of concurrent replication operations" in the Data Protection Online Backup and Recovery Guide.

Data ONTAP 8.2 7-Mode Example

The following table is an example for Data ONTAP 8.2 operating in 7-Mode.

Note: The table includes values for replication options that may not be supported by SnapProtect. The NearStore option in Data ONTAP also has an effect on the number of concurrent streams.
Note: When sizing a Data ONTAP storage controller, pay careful attention to other operations occurring on the controller. The maximum concurrent stream values do not take into account other operations that may be occurring, such as other backup operations.

The maximum numbers of concurrent replication operations without the NearStore option enabled are as follows.

Table 6) Maximum number of concurrent streams

            Volume SnapMirror       SnapVault
FAS Model   Source   Destination    Source   Destination
2220        50       50             64       64
2240        50       50             64       64
3140        50       50             64       64
3160        50       50             64       64
3170        50       50             64       64
3210        50       50             64       64
3220        50       50             64       64
3240        50       50             64       64
3250        50       50             64       64
3270        50       50             64       64
6040A       100      100            96       96
6080        150      150            128      128
6210        150      150            128      128
6220        150      150            128      128
6240        150      150            128      128
6250        150      150            128      128
6280        150      150            128      128
6290        150      150            128      128
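When planning replication schedules against these limits, a simple check such as the following Python sketch can help. The values are taken from Table 6 (source and destination limits are identical for the models shown), and the planned stream counts are hypothetical.

    # Illustrative sketch: verify that the planned number of simultaneous
    # transfers stays within the Table 6 limits for a platform.
    STREAM_LIMITS = {
        # platform: (volume SnapMirror streams, SnapVault streams)
        "FAS3250": (50, 64),
        "FAS6280": (150, 128),
    }

    def within_limits(platform, planned_snapmirror, planned_snapvault):
        sm_max, sv_max = STREAM_LIMITS[platform]
        return planned_snapmirror <= sm_max and planned_snapvault <= sv_max

    print(within_limits("FAS3250", 40, 70))    # False: 70 SnapVault > 64
    print(within_limits("FAS6280", 120, 100))  # True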
7.3 Replication Schedules

It is recommended to stagger replication schedules so that vault and mirror updates are not running at the same time for a given volume. In SnapProtect, backup jobs and replication jobs are independent; the replication job is not automatically triggered by the completion of the backup job. Automatic schedules can be used to update replication relationships close to the backup completion time. Replication schedules should also take into account the number of streams, as noted in the previous section.

7.4 Impact of Volume Layout

Multiple datasets from the same volume produce multiple copies when replicated. For example, when using vApp affinity, which applies to a subset of the virtual machines in a datastore and its corresponding volume, it is possible to create more than one replication relationship for the volume. As another example, when a volume contains LUNs for multiple client systems (configured in SnapProtect as more than one client), the backup of each client generates a new Snapshot that is replicated independently of the other clients to a different destination volume.

It is recommended to have a single type of data per volume as appropriate. For example, when backing up LUNs, it is recommended to have the single LUN, or all LUNs for a client, in one volume and included in a single subclient. Whenever possible, a single subclient should map to one or more whole volumes; backing up a subclient and then replicating the data produces the best results with this layout. When a volume includes multiple clients and is placed in more than one subclient, each subclient becomes a backup job that creates an associated Snapshot. In the simplest form, many Snapshots will exist for the one volume. When the volume is replicated, however, each subclient creates a separate replication relationship, which can result in multiple baseline transfers requiring more capacity on the destination.

For more information, review the SnapProtect SE Training presentation located on the NetApp Field Portal.

7.5 Replication Tuning

There are many aspects of replication that can be tuned or optimized for a given implementation. For example, configuring the TCP window size parameters can further optimize SnapMirror. Some examples of tunable options:

- SnapMirror TCP Window Size
- SnapMirror Network Compression

Details on tunable options and recommended settings are available in the following document:

SnapMirror Network Compression Technical Report (TR-3790)
https://fieldportal.netapp.com/DirectLink.aspx?documentID=76486&contentID=87708

7.6 Replication Throttle

SnapProtect does not provide a throttle setting for replication. It is possible to schedule replication in order to reduce the amount of traffic; however, SnapProtect does not configure throttling schedules. In order to limit replication, it is possible to configure the system-wide throttle option in Data ONTAP. To enable system-wide throttling from the console on the FAS device, use the following option: replication.throttle.enable

For additional details on using the system-wide throttle, see the following document:

SnapMirror Async Overview and Best Practices Guide (TR-3446)
https://fieldportal.netapp.com/DirectLink.aspx?documentID=49607&contentID=64018

Note: Protection Manager schedules are not used, and Protection Manager throttle schedules do not apply to SnapProtect.

8 Capacity Planning

Information about planning capacity is included in the NetApp Data ONTAP documentation on the NetApp Support site. See the Data Protection Online Backup and Recovery Guide for details about SnapMirror and SnapVault.

SnapProtect creates and manages SnapMirror and SnapVault relationships from the SnapProtect GUI. Destination storage is allocated in Protection Manager as a resource pool, which is made available to SnapProtect. When configuring replication in SnapProtect (Snapshot Copy or Aux Copy), the administrator is shown the resources configured in Protection Manager for use as a destination for SnapMirror and SnapVault relationships. The SnapProtect GUI shows basic information about the resources, including the names and available capacity. SnapProtect has a dependency on Protection Manager for provisioning and management of replication.

8.1 SnapMirror

SnapProtect uses only asynchronous volume SnapMirror. Capacity planning for SnapMirror from the SnapProtect perspective is for the destination volume; see the Data ONTAP documentation as noted above. Refer to the following guides for best practices on using SnapMirror:

SnapMirror Async Overview and Best Practices Guide (TR-3446)
https://fieldportal.netapp.com/DirectLink.aspx?documentID=49607&contentID=64018

Replication Between 32-bit and 64-bit Aggregates Using Volume SnapMirror
https://fieldportal.netapp.com/DirectLink.aspx?documentID=79930&contentID=97134

SnapMirror Configuration and Best Practices Guide for Clustered Data ONTAP (TR-4015)
https://fieldportal.netapp.com/DirectLink.aspx?documentID=69125&contentID=73752

8.2 SnapVault

Information on capacity planning and requirements for SnapVault is available in the Data ONTAP documentation as noted above. Also refer to the following guides for best practices on using SnapVault:

SnapVault Best Practices Guide for 7-Mode (TR-3487)
http://www.netapp.com/us/library/technical-reports/tr-3487.html

SnapVault Best Practices Guide for Clustered Data ONTAP (TR-4183)
https://fieldportal.netapp.com/DirectLink.aspx?documentID=99772&contentID=142356

8.3 Deduplication

Deduplication is a feature of Data ONTAP that can be enabled for a SnapVault destination volume. SnapProtect does not control deduplication on primary volumes.
Primary volumes that are configured for deduplication will be mirrored in a deduplicated state when mirror replication is configured in SnapProtect. SnapVault destinations can be configured for deduplication using the deduplication policy when configuring SnapVault in the SnapProtect GUI. For more information on deduplication, see the Storage Management Guide and the Storage Efficiency Management Guide in the Data ONTAP documentation.

8.4 VSS Requirements

Volume Shadow Copy Service (VSS) is a Microsoft technology that provides the backup infrastructure required to make consistent point-in-time copies: it creates shadow copies of data, coordinated with applications or file systems, before backing them up. SnapProtect utilizes the hardware VSS provider included with the SnapProtect software distribution. See the appropriate documentation from Microsoft for details on space requirements for VSS.

8.5 Restore Staging

Some restore operations require a staging area, such as restoring files backed up from a virtual machine. An example of a staging area is a CIFS share used to restore files from virtual machines that have not been configured to allow direct restore. In this case, a staging area should be configured with the capacity of the largest amount of data expected to be restored, including concurrent restore operations.

8.6 Media Agent Index Cache

The storage space required for the index cache entries of a MediaAgent is estimated at approximately 4 percent of the total data protected. This recommendation is applicable for most CommCell deployments. For environments that have more specific requirements for index cache sizing, the following provides a more detailed explanation of the index cache space requirements.

The estimation of index cache size is based on a number of factors, which include the following:

- Average base name (that is, last level of path) length, or the number of characters in the file/folder name (for example, the file expense_report.xls has a length of 18 characters; the folder jan_sales has a length of 9 characters)
- Average file size
- Average incremental percentage of the files being backed up
- The length of the backup cycle

If you apply these factors to file system or file system-like data, you might require the index cache size to be 0.72% of the total data, based on the following assumptions:

- The average base name size of the files is 24 bytes (UTF-8 encoding, which means non-English language characters are more than one byte each)
- The average file size is 50,000 bytes
- On average, 10% of the data is backed up by the incremental backups in a cycle
- Using a one-week cycle of 1 full backup followed by 6 incremental backups
- Keeping 2 backup cycles in the index cache

If you apply these factors to database objects (such as Exchange Mailbox data, where the base names are significantly larger), you might require the index cache size to be 2.5% of the total data, based on the following assumptions:

- The average base name size of the files is 90 bytes
- The average file size is 20,000 bytes
- On average, 10% of the data is backed up by the incremental backups in a cycle
- Using a one-week cycle of 1 full backup followed by 6 incremental backups
- Keeping 2 backup cycles in the index cache
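The arithmetic behind these figures can be reproduced approximately. In the Python sketch below, the per-entry metadata overhead is a fitted assumption (not a published SnapProtect constant), chosen so that the file system example lands near the 0.72% figure; the database example then comes out near the 2.5% figure.

    # Illustrative sketch: approximate the index cache as a fraction of
    # protected data, from the assumptions in this section.
    PER_ENTRY_OVERHEAD = 88  # bytes per index entry beyond the base name
                             # (fitted assumption, not a published value)

    def index_cache_fraction(avg_name_bytes, avg_file_bytes,
                             incrementals_per_cycle=6, incremental_rate=0.10,
                             cycles_kept=2):
        entries_per_file = 1 + incrementals_per_cycle * incremental_rate
        entry_bytes = avg_name_bytes + PER_ENTRY_OVERHEAD
        return cycles_kept * entries_per_file * entry_bytes / avg_file_bytes

    print("%.2f%%" % (100 * index_cache_fraction(24, 50000)))  # ~0.72%
    print("%.2f%%" % (100 * index_cache_fraction(90, 20000)))  # ~2.85%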
The index size can be much larger if, for example, the following conditions are different:

- Base names are extremely long
- Average file size is much smaller (meaning more files per gigabyte backed up)
- The Create New Index on Full Backup option is not selected for the subclient (though this tends to mean fewer index versions in the cache)
- The full backup cycle is much longer than 7 backups
- The incremental rate of change is greater than 10%
- The number of cycles in the cache is larger than 2
- Only differential backups are run instead of incremental backups

Best Practices for Maintaining Index Cache

Because the calculation of index cache size is based on several assumptions that can change over time in any given environment, we recommend the following best practices for maintaining the index cache:

- If possible, use a file system dedicated to index cache data only, so that non-index data does not grow and encroach on the index cache capacity.
- Use a liberal rather than a conservative estimate when allocating space for the index cache.
- Revisit the space allocated for the index cache in the following cases:
  - When clients are added to a MediaAgent
  - When the backup cycle is modified
  - When factors that affect the composition of backups change, such as data size or file names

9 Other Planning

9.1 Backup to Tape Using NDMP

Information about the configuration and operation of NDMP is located in the Data ONTAP product documentation available on the NetApp Support site in the Data Protection Tape Backup and Recovery Guide at http://support.netapp.com/.

10 Virtual Clients

10.1 Virtual Systems

This section provides sizing guidelines for backup of virtual machines. In SnapProtect the Virtual Server Agent (VSA) provides functionality for backup of virtual environments. The VSA can be either a physical system or a virtualized system; each type has advantages depending on the customer requirements. For example, when backing up virtual machines to tape, a physical VSA allows direct connection to a Fibre Channel or SAS tape library. When backup of virtual machines does not require direct-attached tape, it is recommended to use a virtualized VSA.

Larger virtual machine environments can utilize more than one VSA. Increasing the number of VSAs allows for the distribution of index loads. Increasing the number of data readers per VSA in the environment can also improve overall indexing performance. In order to reduce the load on a production hypervisor, it is recommended to dedicate one or more hypervisors for indexing, backup copy, and restore operations.

VMware

Backup of VMware environments can vary greatly with the type of virtual machines being backed up. As an example, the requirements for Virtual Desktop Infrastructure (VDI) can differ from what is required when backing up virtualized application servers.

When backing up multiple datastores, it is possible to schedule backups so that they run in parallel. For the fastest backup times it is recommended to use more datastores with fewer virtual machines in each; large datastores with many VMs can be time consuming to back up.

The normal backup process first quiesces the operating system and any applications running in the VM. In Windows VMs this is accomplished by VMware using Microsoft VSS. It is possible to exclude VSS writers that are not required; refer to the SnapProtect books online in the VMware advanced configuration section for more information on tuning VSS.

SnapProtect also provides a feature that speeds up backup of virtual machines by allowing a hardware-only Snapshot, commonly referred to as a crash-consistent Snapshot.
This feature offers the fastest backup time for virtual machines; however, it is not considered a consistent backup, because the operating system is not quiesced before the Snapshot is taken.

11 NAS Clients

A NAS Client is a single FAS controller running Data ONTAP. For HA pairs, each controller is a separate NAS Client.

Snapshot Consistency

NAS Clients use a Data ONTAP consistent Snapshot. Any Snapshot taken in Data ONTAP is consistent with respect to the WAFL file system; however, a Snapshot taken with the NAS Client will not be application consistent. For a Snapshot to be application consistent, the backup must be performed using the application iDA.

11.1 NAS Clients

When configuring subclients for a NAS Client, it is recommended to select the Backup Content Path at the volume level, as all Snapshot operations take place at the volume level. In order to perform backup operations in parallel, it is recommended to reduce the number of volumes defined in the Backup Content of a single subclient. This allows for the creation of multiple subclients for each NAS Client, which allows SnapProtect to execute backups of the volumes in parallel.

Indexing

Indexing for NAS backups is performed on the primary storage controller. The indexing performance of NAS is dependent on SnapDiff. The SnapDiff API provides a way to perform indexing without requiring a mount of the NAS share, and also provides a performance increase when performing incremental backups with indexing.

It is also possible to skip the indexing for NAS backups and use the live browse feature. The trade-off for this method is that while the backup will be significantly faster when backing up volumes with large numbers of files, the restore process will take longer, because the index for the backup is created at the time the browse occurs. This feature works best for environments that require fast backup times and perform few file- or directory-level restores.

Multiple Backup Sets

NAS Clients allow multiple backup sets to be created. When creating multiple backup sets, use caution not to configure the same volume in more than one backup set; each backup set gets its own corresponding dataset in OnCommand Unified Manager. When you back up the same volume in two different backup sets, the volume consumes Snapshots for each backup set. When that data is replicated using vault or mirror copies, each backup set gets a full copy of the data on the destination system, with the result that more capacity is used on the destination system than on the source.
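A quick way to catch this misconfiguration during planning is to scan the backup set definitions for volumes that appear more than once, as in the following minimal Python sketch (the backup set and volume names are hypothetical):

    # Illustrative sketch: flag volumes configured in more than one backup
    # set, which would consume extra Snapshots and duplicate replicas.
    from collections import defaultdict

    def overlapping_volumes(backup_sets):
        """backup_sets: dict mapping backup set name -> list of volumes."""
        seen = defaultdict(list)
        for backup_set, volumes in backup_sets.items():
            for volume in volumes:
                seen[volume].append(backup_set)
        return {vol: sets for vol, sets in seen.items() if len(sets) > 1}

    sets = {"daily": ["/vol/cifs1", "/vol/cifs2"],
            "weekly": ["/vol/cifs2", "/vol/nfs1"]}
    print(overlapping_volumes(sets))  # {'/vol/cifs2': ['daily', 'weekly']}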
11.2 NDMP Tape Backup

NDMP performance is dependent on Data ONTAP and the platform of the storage controller. Refer to the following technical report for information about NDMP performance:

SMTape and NDMP Performance Report - Data ONTAP 8.2 (TR-4200i)
https://fieldportal.netapp.com/DirectLink.aspx?documentID=99881&contentID=142585

12 Conclusions

Data growth creates significant challenges for data protection, and continues to exceed the capabilities of traditional backup technologies that require moving full data sets. Data ONTAP offers capabilities that allow customers to address ever-shrinking backup windows by creating full copies of data in which the only data movement is the incremental block changes. SnapProtect leverages those features of Data ONTAP to provide a solution to the challenge of growing data sets.

Refer to the Interoperability Matrix Tool (IMT) on the NetApp Support site to validate that the exact product and feature versions described in this document are supported for your specific environment. The NetApp IMT defines the product components and versions that can be used to construct configurations that are supported by NetApp. Specific results depend on each customer's installation in accordance with published specifications.

NetApp provides no representations or warranties regarding the accuracy, reliability, or serviceability of any information or recommendations provided in this publication, or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS, and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.

© 2013 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of NetApp, Inc. Specifications are subject to change without notice. NetApp, the NetApp logo, Go further, faster, xxx, and xxx are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such. TRXXXX-MMYR