Transactional Replication Performance Tuning and Optimization

Authors: Bren Newman, Xavier Schildwachter, Greg Yvkoff

The information contained in this document represents the current view of Microsoft Corporation on the issues discussed as of the date of publication. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information presented after the date of publication. This White Paper is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, AS TO THE INFORMATION IN THIS DOCUMENT.

Complying with all applicable copyright laws is the responsibility of the user. Without limiting the rights under copyright, no part of this document may be reproduced, stored in or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of Microsoft Corporation.

Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document. Except as expressly provided in any written license agreement from Microsoft, the furnishing of this document does not give you any license to these patents, trademarks, copyrights, or other intellectual property.

Unless otherwise noted, the example companies, organizations, products, domain names, e-mail addresses, logos, people, places and events depicted herein are fictitious, and no association with any real company, organization, product, domain name, e-mail address, logo, person, place or event is intended or should be inferred.

© 2001 Microsoft Corporation. All rights reserved. Microsoft and Windows are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries. The names of actual companies and products mentioned herein may be the trademarks of their respective owners.

Table of Contents

Introduction
Improving Replication Performance
Improving the Performance of Transactional Replication
Improving Performance in Applying the Initial Snapshot
Using -MaxBCPThreads
Using -UseInprocLoader
Using Compressed Snapshots
Using Concurrent Snapshot Processing
Transactional Replication Performance Examples
Cost of Transactional Replication at the Publisher
Transactional Replication with Filters
Transactional Replication with Indexed Views
Transactional Replication with Transformable Subscriptions
Transactional Replication with Subscribers Running Earlier Versions of SQL Server
Transactional Replication with Updatable Subscriptions
Adding Hardware in Transactional Replication
Effects of Network Connection Speeds on Transactional Replication
Replication Command Types
CALL, MCALL, and XCALL
Log Reader Agent Properties
Distribution Agent Properties
Transactional Replication Scalability
Transactional Subscriber Latency Rates
Factors Affecting Transactional Delivery Rates
Conclusion

Introduction

Transactional replication is a type of replication provided by Microsoft® SQL Server™ 2000 that allows data modifications to be propagated incrementally between servers in a distributed environment. Transactional replication can be used for many different applications, from reporting servers and data warehousing environments to Web servers and e-commerce applications. Transactional replication is used at many of the predominant Web sites on the Internet that run SQL Server, including MSN.com, Passport.com, BarnesandNoble.com, and Buy.com. Transactional replication is a scalable and reliable solution for distributing data in high-performance environments.

This paper examines performance in transactional replication and demonstrates ways in which you can improve the performance of your applications. This paper is based on the results of tests conducted using a variety of hardware configurations and replication environments. Based on the test results, recommendations are made in areas such as applying the initial snapshot, optimizing replication settings, and replication scalability.

Improving Replication Performance

You can enhance the general performance for all types of replication in your application and on your network by:

- Optimizing your database design to include replication considerations.
- Setting a minimum amount of memory allocated to SQL Server 2000 (see the example following this list).
- Using a separate disk drive for the transaction log for all databases involved in replication.
- Adding memory to servers used in replication.
- Using multiprocessor computers.
- Publishing only the amount of data required.
- Running the Snapshot Agent only when necessary and at off-peak times.
- Placing the snapshot folder on a drive not used to store database or log files.
- Using a single snapshot folder per publication.
- Considering the use of compressed snapshot files.
- Considering the use of pull or anonymous subscriptions.
- Reducing the verbose level of replication agents to 0, except during initial testing, monitoring, or debugging.
- Considering the use of the -UseInprocLoader parameter of the Distribution Agent.

For more information about enhancing replication performance, see SQL Server Books Online.
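As a minimal sketch of the memory recommendation above, the following T-SQL reserves a fixed amount of memory for SQL Server 2000 by raising the min server memory option. The 512 MB figure is only an illustrative value, not a recommendation drawn from the tests in this paper; size it for your own workload.

-- Reserve a minimum working set for SQL Server so replication agents and
-- queries are not starved when other processes compete for memory.
-- The value (in MB) is illustrative only.
EXEC sp_configure 'show advanced options', 1
RECONFIGURE
EXEC sp_configure 'min server memory', 512
RECONFIGURE
GO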
Improving the Performance of Transactional Replication

You can enhance the performance of transactional replication in your application and on your network by:

- Running agents continuously instead of on frequent schedules.
- Setting the distribution database to a fixed size that can handle the transaction volume and retention period without frequent autogrowth.
- Reducing the distribution frequency when replicating to numerous Subscribers.
- Configuring the Distributor on a dedicated server.
- Increasing memory on the Distributor.
- Subscribing to all articles in a publication.
- Using stored procedure replication when a large number of rows are affected.
- Minimizing the retention period for transactions and history.
- Increasing the read batch size for the Log Reader Agent.
- Minimizing the log history and retention period.
- Using custom stored procedures for inserts, updates, and deletes at Subscribers.
- Avoiding horizontal filtering.

Improving Performance in Applying the Initial Snapshot

Applying the initial snapshot can take a significant amount of time if you are transferring a large amount of data over the network, or if you have a slow link. To address this situation, transfer the snapshot using a removable disk or use the performance optimization features of SQL Server 2000. The following examples demonstrate snapshot performance improvements when using the optimization features of -MaxBCPThreads, -UseInprocLoader, compressed snapshots, and concurrent snapshot processing.

Using -MaxBCPThreads

In transactional replication, the -MaxBCPThreads parameter can be passed to the Snapshot Agent and the Distribution Agent. This parameter specifies the number of bulk-copy operations that can be performed in parallel. The maximum number of threads and ODBC connections that can exist simultaneously is the value of -MaxBCPThreads or the number of bulk-copy requests that appear in the synchronization transaction at the distribution database, whichever is lower. -MaxBCPThreads must have a value greater than 0, and it has no hard-coded upper limit. The default is 1. When used with the Snapshot Agent, -MaxBCPThreads affects the time it takes to generate a snapshot. When used with the Distribution Agent, -MaxBCPThreads affects the time it takes to apply the snapshot at the Subscriber.

Because the Snapshot Agent bulk copies the contents of all the articles in a publication, the Snapshot Agent writes the entire publication to the snapshot folder. Therefore, the faster the disk subsystem can read and write data to the disk or disks, the faster the snapshot is completed. This also applies to the Distribution Agent applying the snapshot at the Subscriber. For the numbers provided in the following table, the snapshot data is written to and read from a three-disk array (RAID 0) and written to a subscription database spread across a three-disk array (RAID 0) with the database log on a separate disk.

The performance benefit from using -MaxBCPThreads also depends on the number of processors on the server. Specifying a high number for -MaxBCPThreads can overburden the system, because the system must spend too much time managing threads. Using more threads than the total number of articles provides no additional benefit.

In the following example, the publication has seven articles, totaling 228 megabytes (MB) of database storage space.
Publication Articles 1

Articles | Total rows | Reserved size (KB) | Index size (KB)
CUSTOMER | 120,000 | 19,984 | 4,032
PAYMENT | 120,000 | 11,280 | 2,848
ORDERS | 374,000 | 82,208 | 22,416
NAMES | 120,000 | 7,056 | 32
CUSTOMER_HISTORY | 120,000 | 23,744 | 64
PAYMENT_HISTORY | 120,000 | 8,448 | 64
ORDERS_HISTORY | 374,000 | 75,376 | 192
TOTAL | 1,348,000 | 228,096 | 29,648

Generating the Initial Snapshot with the Snapshot Agent

The following data shows that on a dual-processor 450-megahertz (MHz) Xeon with 256 MB of memory, using a value of 7 for -MaxBCPThreads results in snapshot generation that is 1.6 times faster than it is when using a value of 1. On a single processor, using a value of 7 for -MaxBCPThreads results in snapshot generation that is 1.27 times faster than it is when using a value of 1. Given that the CPU becomes the bottleneck on the single processor, a value of 7 provides no more benefit than a value of 3.

Figure 1: Effect of -MaxBCPThreads Setting on Initial Snapshot Generation

Processors | -MaxBCPThreads=1 | -MaxBCPThreads=3 | -MaxBCPThreads=7
Dual processor | 122 seconds | 84 seconds | 76 seconds
Single processor | 122 seconds | 96 seconds | 96 seconds

Applying the Initial Snapshot with the Distribution Agent

The following data shows that on a dual-processor 450-MHz Xeon with 256 MB of memory, using a value of 7 for -MaxBCPThreads results in snapshot application that is 1.3 times faster than it is when using a value of 1. On a single processor, the CPU is again the bottleneck and increasing this value provides little performance improvement. Using a value of 7 for -MaxBCPThreads is only 1.03 times faster than using a value of 1, and using a value of 7 provides no additional benefit over using a value of 3. Using dual processors clearly provides a large performance gain; the initial snapshot application is 1.57 times faster with dual processors than it is with a single processor.

Figure 2: Effect of -MaxBCPThreads Setting on Initial Snapshot Application

Processors | -MaxBCPThreads=1 | -MaxBCPThreads=3 | -MaxBCPThreads=7
Dual processor | 120 seconds | 98 seconds | 92 seconds
Single processor | 148 seconds | 144 seconds | 144 seconds

Using -UseInprocLoader

The -UseInprocLoader parameter can be passed to the Distribution Agent when applying the initial snapshot at the Subscriber. When you use this parameter, the Distribution Agent will use the in-process BULK INSERT operation, decreasing the amount of time taken to apply the snapshot. To further enhance performance, use -UseInprocLoader in conjunction with -MaxBCPThreads. The following example uses a publication containing 10 articles, totaling 46 MB of data.

Publication Articles 2

Articles | Total rows | Reserved size (KB) | Index size (KB)
CUSTOMER | 60,000 | 7,944 | 1,968
PAYMENT | 60,000 | 5,640 | 1,424
ORDERS | 187,000 | 29,896 | 11,144
NAMES | 5,765 | 328 | 16
PRODUCTS | 10,000 | 904 | 264
INTERESTED_IN | 6,000 | 1,216 | 752
STATE | 200 | 64 | 48
SHIPPERS | 51 | 40 | 32
SHIP_TYPE | 11 | 40 | 32
REGION | 2 | 40 | 32
TOTAL | 329,029 | 46,112 | 15,712

When you use only the -UseInprocLoader parameter, snapshot application is 1.4 times faster than without this parameter. When -UseInprocLoader is combined with -MaxBCPThreads=5, snapshot application is 2.1 times faster.

Time to Apply Snapshot at Subscriber

Standard | -UseInprocLoader | -UseInprocLoader and -MaxBCPThreads=5
36 seconds | 25 seconds | 17 seconds

In most cases, you will see a performance gain. By default, this parameter is not used because it is affected by line quality and speed, the amount of available memory on the subscription database, the type of data transferred, and the number of articles. It is recommended that you test the performance gain using your publication.
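Both parameters are passed on the agent command line, which for a push subscription lives in the Distribution Agent's SQL Server Agent job step. The following T-SQL is a sketch only: the job name, servers, databases, and step number are placeholders, and sp_update_jobstep replaces the whole command, so preserve your agent's existing arguments when you apply this idea.

-- Illustrative only: names are hypothetical, and the step that runs the agent
-- may have a different step_id on your server (it is often the "Run agent." step).
USE msdb
GO
DECLARE @cmd nvarchar(500)
SET @cmd = N'-Publisher PUBSRV -PublisherDB SalesDB -Publication SalesPub '
         + N'-Distributor DISTSRV -Subscriber SUBSRV -SubscriberDB SalesSub '
         + N'-UseInprocLoader -MaxBCPThreads 5'
EXEC dbo.sp_update_jobstep
    @job_name = N'PUBSRV-SalesDB-SalesPub-SUBSRV-0',   -- Distribution Agent job (placeholder)
    @step_id = 2,                                      -- verify which step runs the agent
    @command = @cmd
GO

The same -MaxBCPThreads switch can be appended to the Snapshot Agent's job step in the same way to speed up snapshot generation.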
Using Compressed Snapshots

This option is recommended when you are using a pull or remote push Subscriber. It also provides additional benefits when you are using FTP support. Compressing snapshot files in the alternate snapshot folder can reduce snapshot disk storage requirements, and in some cases it can significantly improve performance when you are transferring snapshot files over a slow connection. However, compressing the snapshot requires additional processing by the Snapshot and Distribution agents while the snapshot files are generated and applied. This may slow down overall snapshot generation and increase the time it takes to apply a snapshot.

Using the articles listed earlier in the Publication Articles 2 table, the Snapshot Agent generates 20 files, including schema files and data files, with a total size of approximately 130 MB. When you use a compressed snapshot, it also generates a .cab file with a size of approximately 65 MB; a Subscriber loading a compressed snapshot across a slower link has only half as much data to copy. However, compressed snapshots require more storage space and more set-up time. A compressed snapshot can use more space on the Distributor (the process optionally maintains both the compressed and uncompressed data), and it takes more than 4.5 times longer to generate than an uncompressed snapshot because of the time required to compress the snapshot. Consider these tradeoffs carefully during planning.

Using Concurrent Snapshot Processing

When you use the default settings for snapshot generation, SQL Server places shared locks for the duration of snapshot generation on all tables published as part of replication. This prevents updates from being made on the publishing tables. Concurrent snapshot processing (which is available only with transactional replication) places shared locks for only a short time while SQL Server 2000 creates initial snapshot files, allowing users to continue working uninterrupted. When you create a new publication using transactional replication and indicate that all Subscribers will be instances of SQL Server 7.0 or SQL Server 2000, concurrent snapshot processing is available.

After replication begins, the Snapshot Agent places shared locks on the publication tables. The locks prevent changes until a record indicating the start of the snapshot is written to the transaction log. After this is done, the shared locks are released, and data modifications at the database can continue. The duration for holding the locks is only a few seconds, even if a large amount of data is being copied.

At this point, the Snapshot Agent starts to build the snapshot files. When the snapshot is complete, a second record indicating the end of the snapshot process is written to the log. Any transactions that affect the tables while the snapshot is being generated are captured between these beginning and ending tokens and forwarded to the distribution database by the Log Reader Agent. When the snapshot is applied at the Subscriber, the Distribution Agent first applies the snapshot files (schema and data files). It then reconciles each captured transaction to see whether it has already been delivered to the Subscriber. During this reconciliation process, the tables on the Subscriber are locked, and transactions that occur during the lock are again captured in the log and applied after the locks are released.
If a high number of transactions are captured at the Publisher while the tables are locked, the snapshot takes longer to apply at the Subscriber. Although concurrent snapshot processing allows updates to continue on publishing tables, the additional I/O required to write the snapshot files to disk may affect performance. Whenever possible, you should generate the snapshot during periods of low activity. For more information about concurrent snapshot processing, see SQL Server 2000 Books Online.

Note: In SQL Server 2000, concurrent snapshot processing is not recommended if the publishing table has a unique index that is not the primary key or the clustering key. If data modifications are made to the clustering key while a concurrent snapshot is being generated, replication can fail with a duplicate key error when applying the snapshot to a Subscriber. With SQL Server 2000 Service Pack 1 (SP1), there are no longer any restrictions on using concurrent snapshot processing.
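Concurrent snapshot processing is selected through the publication's synchronization method. As a hedged sketch (the database and publication names are invented for illustration, and the remaining sp_addpublication defaults are assumed to be acceptable), a transactional publication can request it by setting @sync_method to 'concurrent' for native bulk copy or 'concurrent_c' for character-mode bulk copy:

-- Sketch only: run in the publication database; names are placeholders.
USE SalesDB
GO
EXEC sp_addpublication
    @publication = N'SalesPub',
    @sync_method = N'concurrent',   -- do not hold shared locks for the full duration of snapshot generation
    @repl_freq = N'continuous',
    @status = N'active'
GO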
Transactional Replication Performance Examples

For the numeric data in this section, gain and cost are measured as a percentage of the base throughput numbers returned by the replication agents. Throughput numbers are expressed in commands per second; a command is a unit of operation for a replication agent. The numbers are based on a simple scenario in a fixed environment and are given to show either a cost or a gain in performance. Many factors affect the amount of cost or gain in a real-world scenario, including network traffic, background Microsoft Windows® services that are running, client Net-Libraries, hardware, and so on. Therefore, the numbers provided here are not to be used specifically as benchmarks.

All the examples, unless specified otherwise, use the following environment:

Hardware

The test environment consisted of a Publisher, a remote Distributor, and a Subscriber. These three servers had identical hardware and software: Dell Precision 610, dual-processor Pentium II Xeon 450 MHz; 256 MB memory with a 512-kilobyte (KB) L2 cache; a 100-Mbps Ethernet network card; and four SCSI disks with Windows 2000 Server, SQL Server 2000 Standard Edition, the database data file, and the database log files each on their own disk.

Note: In almost all circumstances, the database log file should be placed on its own separate disk or disks and the database data files should be striped across multiple disks with a stripe size of 64 KB or a multiple thereof.

Replication environment

The replication environment consisted of a publication that contained one article, the HALFTYPES table. This table contained 21 columns, representing every data type except text, ntext, image, sql_variant, and bigint. The row size varied, with an average size of approximately 1,024 bytes. In the test scenario, a push subscription was created, and then the Snapshot Agent, the Log Reader Agent, and the Distribution Agent were run at the Distributor. Three different tests tracked insert, update, and delete operations. Each test consisted of 30,000 commands, with 10 commands per transaction, executed at the Publisher.

Note: UPDATE commands do not update the primary key, because primary key updates are much less common than updates that do not involve the primary key. Because a primary key update is propagated as a DELETE command followed by an INSERT command during replication, it can skew the resulting data.

Performance was measured in throughput. To obtain the highest throughput for a given replication agent, in most cases each agent was run separately, rather than concurrently. For example, the commands were first executed at the Publisher, then the Log Reader Agent was run, and finally the Distribution Agent was run. The Distribution Agent used the autogenerated custom stored procedures created for each article, which is the default behavior.

Cost of Transactional Replication at the Publisher

One of the first questions asked when deciding whether to use transactional replication is: "How does adding transactional replication affect my OLTP server?" When you add transactional replication to an online transaction processing (OLTP) environment, the OLTP server becomes the Publisher. This incurs the overhead cost of the Log Reader Agent querying the database log. In a local Distributor environment, the server also incurs the costs of writing to the distribution database, running the Distribution Agents (in a push scenario), and having the Distribution Agents read changes from the distribution database.

To measure the minimum overhead costs, the following times were recorded:

- The time it took to apply changes at the Publisher without replication configured
- The time it took to apply changes at the Publisher with the Log Reader Agent running with both a local and a remote Distributor

Using the HALFTYPES example, the time it took each of 20 simultaneous client connections to execute 150 transactions consisting of 10 commands per transaction was measured. In total, there were 3,000 transactions or 30,000 commands. At the time of execution, CPU usage was at 90-100 percent for INSERT commands, 50-60 percent for UPDATE commands, and 25-35 percent for DELETE commands. Then a remote Distributor was added to the environment and the tests were run again with the Log Reader Agent running. Finally, the tests were run again using a local Distributor.

Cost of Transactional Replication at the Publisher

Command | Number of commands | Replication not configured | Log Reader Agent running, remote Distributor | Log Reader Agent running, local Distributor
INSERT | 30,000 | 50 seconds | 54 seconds | 58 seconds
UPDATE | 30,000 | 20 seconds | 22 seconds | 25 seconds
DELETE | 30,000 | 20 seconds | 20 seconds | 22 seconds

As indicated in "Cost of Transactional Replication at the Publisher," there is a cost in adding replication to an OLTP server; however, the cost under stress conditions can be as low as 8-10 percent when using a remote Distributor, and somewhat higher when using a local Distributor. Furthermore, if the publishing server has enough CPU capacity, the cost can be insignificant: When using a four-processor or eight-processor server, the cost of replication is often less than 3 percent.

Log Reader Agent vs. Distribution Agent Throughput

Using a total of 30,000 commands with 10 commands per transaction, the Log Reader Agent processes many more INSERT and UPDATE commands per second than the Distribution Agent. DELETE commands are processed exceptionally quickly by both agents: the Log Reader Agent only has to formulate and write a small delete string based on the primary key, and the Distribution Agent benefits by reading and distributing such a small string. In most production environments, when both agents are running in continuous mode, the Log Reader Agent can write more commands to the distribution database than the Distribution Agent can deliver to the subscribing database.
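Throughput and latency figures like those that follow can be observed in your own environment from the agent history tables in the distribution database. The query below is a sketch only; it assumes the standard distribution database name and the delivery_rate, delivery_latency, and delivered_commands columns recorded by the Distribution Agent in MSdistribution_history (a similar query against MSlogreader_history shows Log Reader Agent rates).

-- Sketch: recent Distribution Agent history, most recent rows first.
-- Assumes the distribution database is named "distribution".
USE distribution
GO
SELECT TOP 20
    h.agent_id,
    h.time,                 -- when the history row was logged
    h.delivered_commands,   -- commands delivered in the session
    h.delivery_rate,        -- commands per second
    h.delivery_latency,     -- latency, in milliseconds
    h.comments
FROM dbo.MSdistribution_history AS h
ORDER BY h.time DESC
GO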
Even when the Subscriber is processing large amounts of data, it can be as little as 1-5 seconds behind the Publisher, giving a latency of about 5 seconds.

Log Reader Agent and Distribution Agent Throughput in Commands per Second (cmds/sec)

Command | Log Reader Agent (cmds/sec) | Distribution Agent (cmds/sec)
INSERT | 2,080 | 1,230
UPDATE | 2,660 | 1,570
DELETE | 3,890 | 3,950

Transactional Replication with Filters

Partitioning replicated data using filters allows you to reduce the amount of data sent over the network, reduce the amount of storage space required at the Subscriber, and customize publications and applications based on individual Subscriber requirements. Because less data is transferred from the Publisher to the Subscriber, the performance of the Distribution Agent improves.

For example, adding column filters improves throughput because it minimizes the volume of data that is propagated across the network. Fewer columns propagated means smaller INSERT, UPDATE, and DELETE statements are created and written to and from the distribution database. Also, because the statements are sent across the network in batches, smaller statements mean smaller batches. Smaller batches move across the network more quickly, improving performance and reducing latency. After adding column filters to the HALFTYPES publication, Log Reader Agent results were 1.14-1.35 times faster than they were without column filters. Distribution Agent results were 1.05-2.25 times faster. Of course, the performance difference depends largely on the size of the row, the number of columns being filtered, and the size of the columns being filtered.

Row filters add overhead to the Log Reader Agent and the Publisher's CPU, because the Log Reader Agent must evaluate each row filter against the transaction log record for the articles. If multiple filters exist, each filter is evaluated independently and a separate command is entered in the distribution database for each filter that qualifies. Filter overhead depends on three things: the complexity of the filter; the type of joins, functions, or comparisons being used in the filter; and whether or not the filter uses indexes on the publishing database. However, in a Publisher using a remote Distributor, the Publisher will probably have enough CPU capacity to compensate for this extra overhead. Generally the CPU costs are not large and, depending on the filter, should not add more than 20 percent overhead.
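Column and row filters are defined on the article. As a sketch under assumed names (the HALFTYPES article and IndexCol column are used the way the surrounding examples use them; the publication and the dropped column are invented), a row filter can be supplied through @filter_clause on sp_addarticle, and a column can be removed from the article with sp_articlecolumn:

-- Sketch only: publication, article, and column names are illustrative.
USE SalesDB
GO
-- Row filter: publish only the rows that satisfy the WHERE clause.
EXEC sp_addarticle
    @publication = N'SalesPub',
    @article = N'HALFTYPES',
    @source_table = N'HALFTYPES',
    @filter_clause = N'IndexCol % 4 = 0'
GO
-- Column filter: drop a column from the article so it is not replicated.
EXEC sp_articlecolumn
    @publication = N'SalesPub',
    @article = N'HALFTYPES',
    @column = N'LargeDescription',   -- hypothetical column
    @operation = N'drop'
GO
-- A new snapshot (and regenerated custom procedures) is needed after changing filters.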
Transactional Replication with Indexed Views

An alternative to adding a row filter at the article level is to publish an indexed view based on the same WHERE clause that the filter would use. When you use an indexed view rather than a row filter, the Log Reader Agent does not need to evaluate statements against filters because the article, which is now the indexed view rather than the base table, is already filtered. However, the overall performance of the Log Reader Agent is still slower than an article without filters, because every modification performed on a table with an indexed view is logged twice, once for the indexed view and again for the table itself. This doubles the number of log records the Log Reader Agent must traverse, which affects Log Reader Agent performance.

The cost of maintaining the indexed view at the Publisher can be high, so indexed views work best if the underlying data is infrequently updated. If the underlying data is frequently updated, the cost of maintaining the indexed view data may outweigh any performance benefits of using the indexed view. Indexed views usually do not improve performance under the following scenarios: OLTP systems with many writes; databases with many UPDATE operations; or using queries that do not involve aggregations or joins.

The following table shows the results of a test comparing execution times on identical tables with and without indexed views. No replication was involved in this test. The indexed view was:

SELECT * FROM HALFTYPES WHERE IndexCol % 4 = 0

This represents all columns but only 25 percent of all the rows, or a total of 7,500 rows. IndexCol had a nonclustered index.

Cost of Using an Indexed View

Operation | Number of operations | Table without indexed view | Table with indexed view
INSERT | 30,000 | 50 seconds | 180 seconds
UPDATE | 30,000 | 44 seconds | 330 seconds
DELETE | 30,000 | 44 seconds | 207 seconds

INSERT operations (not published for replication) took 3.5 times longer to apply to a table with an indexed view than to one without an indexed view. DELETE operations took 3 times longer, and UPDATE operations took 7.5 times longer. The performance issues with using indexed views are not replication issues; they are a function of the way indexed views are handled in the server, but they can clearly affect performance in a replication environment. Although you should exercise caution when publishing indexed views, they can be very useful in some situations, such as replicating summary or aggregated data to a large number of Subscribers. For more information about indexed views, see SQL Server 2000 Books Online.
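For reference, an indexed view is an ordinary view created WITH SCHEMABINDING and materialized by a unique clustered index. The following sketch shows the general pattern over a hypothetical subset of the HALFTYPES columns; the column names and index name are invented. An indexed view cannot actually use SELECT *, must reference its base table by a two-part name, and needs the unique clustered index before it can be published as an article.

-- Sketch only: column list and index key are illustrative.
SET QUOTED_IDENTIFIER ON
SET ANSI_NULLS ON
GO
CREATE VIEW dbo.HALFTYPES_Subset
WITH SCHEMABINDING
AS
SELECT IndexCol, Col1, Col2, Col3    -- hypothetical columns; SELECT * is not allowed
FROM dbo.HALFTYPES                   -- two-part name required with SCHEMABINDING
WHERE IndexCol % 4 = 0
GO
-- The unique clustered index materializes the view so it can be published.
CREATE UNIQUE CLUSTERED INDEX IX_HALFTYPES_Subset
ON dbo.HALFTYPES_Subset (IndexCol)
GO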
Transactional Replication with Transformable Subscriptions

Transforming published data leverages the data movement, transformation mapping, and filtering capabilities of Data Transformation Services (DTS). Using transformable subscriptions allows you to customize and send published data based on the requirements of individual Subscribers. Examples of how you can use transformable subscriptions include:

- Creating data transformations such as data type mappings (for example, integer to real data type), column manipulations (for example, concatenating first name and last name columns), string manipulations, and function-based transformations.
- Creating custom data partitions. You can create column and row filters of published data on a per-Subscriber basis.

Although using DTS packages provides rich features and flexibility, it affects throughput for both the Log Reader Agent and the Distribution Agent. DTS requires the parameters of the stored procedures to be specified using XCALL syntax for UPDATE and DELETE statements and CALL syntax for INSERT statements (for more information, see "CALL, MCALL, and XCALL" later in this paper). Using XCALL passes values for all columns, whether changed or not, plus the previous value in the column. This increases the size of the command written to the distribution database and creates more processing work for the Log Reader Agent and the Distribution Agent. However, the Log Reader Agent is only minimally affected; in most cases the effect should be less than 10 percent.

Often, row-level and/or column-level filtering operations are performed within the DTS package and throughput is negatively affected, but only for the Distribution Agent, which calls the package. Depending upon how much processing is performed within the DTS package, the cost of using a DTS package can be high in terms of CPU and memory usage, and it can reduce the number of commands per second that can be delivered to a Subscriber. Therefore, you should exercise caution when using a highly concurrent push model in which many Distribution Agents are concurrently instantiating DTS packages. Generally, adding a DTS package to a publication reduces the number of commands the Distribution Agent is able to distribute by 50 percent, but it can greatly enhance functionality.

Transactional Replication with Subscribers Running Earlier Versions of SQL Server

There are no additional overhead costs for push subscriptions with Subscribers running SQL Server 7.0. Distribution Agent throughput remains the same for all tests.

Transactional Replication with Updatable Subscriptions

If the Subscriber must be updatable within a transactional replication environment, three supported options are available: bidirectional replication, immediate updating subscriptions, and queued updating subscriptions. The following example measures the performance impact of making changes at the Subscriber using immediate updating subscriptions and queued updating subscriptions.

With immediate updating subscriptions, local Subscriber triggers are fired when an insert, update, or delete operation occurs. These triggers call remote procedures (RPCs) using the two-phase commit protocol (2PC), which in turn uses the Microsoft Distributed Transaction Coordinator (MS DTC), to attempt to commit the transaction at the Publisher. If the transaction can be committed at the Publisher, it is then also committed at the Subscriber. The overhead cost of the immediate updating option is the firing of the RPCs and the related MS DTC service across the network.

With queued updating subscriptions, triggers are fired when a local insert, update, or delete operation occurs. These triggers build and write the insert, update, or delete statement to either a SQL Server queue, which is implemented as a local table in SQL Server, or Microsoft Message Queuing (also known as MSMQ) and immediately commit. A Queue Reader Agent, a process that resides on the Distributor, reads the queue and applies it to the Publisher. The overhead cost to the Subscriber of the queued updating option (using the SQL Server queue or Message Queuing) is imposed by the underlying replication triggers that write to the queue. For both immediate and queued updating subscriptions, using Message Queuing as the queue imposes additional costs associated with using MS DTC to commit transactions, adding to the overhead cost of the entire process (the time measured from the commit at the Subscriber to the commit at the Publisher).

To determine the minimum overhead cost of adding the updating subscriptions option at the Subscriber, the amount of time it took to apply commands at the Subscriber with and without the updating subscription option was measured.

Transactional Replication with Immediate Updating Subscriptions

Command | Number of commands | Replication not configured | Immediate updating
INSERT | 30,000 | 50 seconds | 191 seconds
UPDATE | 30,000 | 20 seconds | 177 seconds
DELETE | 30,000 | 20 seconds | 118 seconds

The cost in time taken can be significant, because the transaction must be committed locally and across the network to the Publisher, managed by MS DTC. Using queued updating subscriptions provides the advantage of being able to continue modifying data at the Subscriber while disconnected, because these modifications are queued up and later written to the Publisher by the Queue Reader Agent.
It is important to note that this is a multiple-step and disconnected process, and the times shown in the following table represent only the first step; changes have not yet been applied to the Publisher, only the Subscriber. To determine the overhead cost of adding queued updating subscriptions at the Subscriber, trigger overhead was measured using both a SQL Server queue and Message Queuing.

The Cost of Queued Updatable Subscriptions at the Subscriber

Command | Number of commands | Replication not configured | Trigger overhead using a SQL Server queue | Trigger overhead using Message Queuing
INSERT | 30,000 | 50 seconds | 127 seconds | 141 seconds
UPDATE | 30,000 | 20 seconds | 99 seconds | 144 seconds
DELETE | 30,000 | 20 seconds | 64 seconds | 107 seconds

There is a substantial cost in adding triggers that record transactions to a queue. Writing to a SQL Server queue is faster than writing to Message Queuing. SQL Server queues, which are the default, are also simpler to set up. However, Message Queuing is more scalable because SQL Server queues require the Queue Reader Agent to poll all the Subscribers' queues periodically, whereas Message Queuing automatically transmits changes made at the Subscribers to a centralized queue located at the Distributor. The single queue at the Distributor also provides centralized queue monitoring.

To compare the performance of SQL Server queues and Message Queuing, throughput was measured during dequeuing (the propagation of data from the queue to the Publisher).

Dequeuing Throughput Using SQL Server Queues and Message Queuing

Command | Number of commands | SQL Server queue (cmds/sec) | Message Queuing (cmds/sec)
INSERT | 30,000 | 211 | 117
UPDATE | 30,000 | 173 | 102
DELETE | 30,000 | 196 | 124

To determine the total time required to replicate data from the Subscriber to the Publisher, two times were measured: the amount of time it took to apply commands at the Subscriber and, for queued updating subscriptions, the amount of time it took for the Queue Reader Agent to read the queue and apply it to the Publisher.

Transactional Replication with Updatable Subscriptions (Total Time)

Command | Number of commands | Immediate updating | Queued updating using SQL Server queues | Queued updating using Message Queuing
INSERT | 30,000 | 191 seconds | 279 seconds | 406 seconds
UPDATE | 30,000 | 177 seconds | 283 seconds | 448 seconds
DELETE | 30,000 | 118 seconds | 222 seconds | 358 seconds

Immediate updating subscriptions, using MS DTC, offer the fastest overall time, but they do not offer the key benefit of being able to handle offline scenarios or the built-in conflict handling of queued updating subscriptions. For more information about queued updating subscriptions, see SQL Server 2000 Books Online.

In general, using a SQL Server queue is faster than using Message Queuing: Not only are the changes written to a queue faster, but the Queue Reader Agent is able to read the queue and apply changes faster at the Publisher. As mentioned earlier in this section, you should weigh the functional differences and advantages of using Message Queuing against those of using a SQL Server table as a queue. Furthermore, if you choose Message Queuing, you must be running Windows 2000 or later, because SQL Server 2000 replication requires Message Queuing version 2.0.
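The updating option is chosen per subscription. As a hedged sketch (publication, Subscriber, and database names are invented), the @update_mode parameter of sp_addsubscription selects read-only, immediate updating ('sync tran'), or queued updating ('queued tran') behavior for a push subscription:

-- Sketch only: names are placeholders. The publication must have been created
-- with the corresponding @allow_sync_tran / @allow_queued_tran option enabled.
USE SalesDB
GO
EXEC sp_addsubscription
    @publication = N'SalesPub',
    @subscriber = N'SUBSRV',
    @destination_db = N'SalesSub',
    @subscription_type = N'push',
    @update_mode = N'queued tran'   -- or N'sync tran' for immediate updating; N'read only' is the default
GO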
Adding Hardware in Transactional Replication

Adding another processor to your replication environment can improve throughput performance for both the Log Reader Agent and the Distribution Agent. Using the HALFTYPES example in a remote Distributor environment, there is an average 26 percent increase in Log Reader Agent throughput and an average 3 percent increase in Distribution Agent throughput when adding an extra processor to the remote Distributor. Although the Distribution Agent shows minimal performance gain when adding another processor, the HALFTYPES example has only one subscription with one Distribution Agent, and it has only minimal impact on the CPU. In a real-world scenario, a publication usually has multiple subscriptions with multiple Distribution Agents running, so the performance gains are larger and more obvious because both agents use multiple threads. The Log Reader Agent, which is only slightly more CPU-intensive than the Distribution Agent, also sees higher throughput when another processor is added to a remote Distributor that has multiple Log Reader Agents running.

When you are running a local Distributor, so that the Distribution Agent shares the CPU with the Log Reader Agent and the OLTP load, the Distribution Agent shows a much larger increase in throughput of 15 percent when a second processor is added. Log Reader Agent throughput increases a substantial 18 percent when a second processor is added. Furthermore, in a high-traffic OLTP environment, adding another processor reduces the contention between the server activity and the Log Reader Agent, improving server performance and Log Reader Agent throughput. Other factors that can affect performance include available memory, the size and location of the database log (including autogrow), and any row filtering of articles.

Effects of Network Connection Speeds on Transactional Replication

Although the Publisher and Distributor generally communicate on a fast WAN or LAN, connections to Subscribers are often not as fast. The following test data shows Distribution Agent performance in terms of commands per second, using various connection speeds and rows for each command that average approximately 1,024 bytes. Each test consisted of 3,000 transactions with 10 commands per transaction, for a total of 30,000 commands. The environment consisted of a Publisher and a remote Distributor connected across a fast link and a Subscriber connected at various speeds using a network throttle, a device that can emulate slower connection speeds. Note that emulated connections are somewhat slower than real dialup or leased-line connections, because there is no packet optimization.

Figure 3: Distribution Rate Using Connection Speeds of 28.8 Kbps, 56 Kbps, and 100 Kbps

Distribution throughput is almost linear as greater network bandwidth is provided; if a Subscriber is upgraded from a 28.8-kilobits-per-second (Kbps) dialup line to a 100-Kbps connection, three times as many commands can be delivered in the same period. However, as indicated by the following graph, eventually the network is no longer the bottleneck, and increasing the network bandwidth does not provide additional performance gain.

Figure 4: Distribution Rate Using Connection Speeds of 1 Megabit per Second (Mbps), 10 Mbps, and 100 Mbps

SQL Server 2000 introduces new Net-Libraries to be used for highly reliable, fast, efficient data transfer between servers in the same data center. These new Net-Libraries contain functionality for different hardware sets based on the Virtual Interface Architecture (VIA). Currently, SQL Server 2000 supports hardware from Giganet and Servernet.
In tests, using Servernet over a 1-Gbps connection between the Distributor and the Subscriber improved the delivery of INSERT commands by 12 percent and UPDATE commands by 8 percent when compared to a 100-Mbps connection. These increases would have been larger if larger amounts of data had been created and distributed or if there had been multiple Subscribers. This is because the 1-Gbps environment provides the possibility of scaling to a very large number of Subscribers, each receiving large volumes of data. Bottlenecks in this scenario could include writing the commands to disk at the Subscriber and the Subscriber's physical processing power, so it is important to have an adequate disk subsystem and CPU at the Subscriber.

Replication Command Types

By default, the Distribution Agent applies transactions at Subscribers using autogenerated custom stored procedures. The procedures themselves are created on the Subscriber, and the calls to them are written to the MSrepl_commands table in the distribution database by the Log Reader Agent. For example, instead of applying the original INSERT statement that created the INSERT on the Publisher, the Distribution Agent executes an INSERT stored procedure at the Subscriber to perform the same action. These stored procedures can be further customized, for example, for actions such as maintaining aggregate tables, which is generally better than adding Subscriber-specific logic in triggers.

Using custom stored procedures can provide performance improvements for the Distribution Agent because the stored procedure's plan is cached and reused at the Subscriber, and in most cases the amount of data passed over the network can be smaller. Generally, Log Reader Agent performance is the same whether dynamic SQL or custom stored procedures are used; however, UPDATE commands may see throughput increase if the table column names being updated are very long and those column names are provided in the UPDATE statement.

Using the HALFTYPES example, executing custom stored procedures provided better performance than using dynamic SQL. The greatest performance benefit is with UPDATE commands, which ran 1.8 times faster.

Distribution Agent Speed (Dynamic vs. Default Custom Stored Procedures)

Command | Number of commands | Dynamic SQL (cmds/sec) | Default custom stored procedure (cmds/sec)
INSERT | 30,000 | 1,118 | 1,230
UPDATE | 30,000 | 882 | 1,570
DELETE | 30,000 | 3,950 | 3,950

CALL, MCALL, and XCALL

When used in custom stored procedures, the CALL, MCALL, and XCALL syntaxes vary in the amount of data propagated to the Subscriber during transactional replication. The CALL syntax, which can be used for INSERT, UPDATE, and DELETE statements, passes all values for all inserted and deleted columns. The MCALL syntax, which can only be used for UPDATE statements, passes values for affected columns, NULL for unaffected columns, and a bitmap parameter, which identifies which columns have been modified. The XCALL syntax, used for UPDATE and DELETE statements, passes values for all columns, whether changed or not, and includes the previous values of all columns.

By default, CALL is used for INSERT and DELETE commands, and MCALL is used for UPDATE commands, because doing so provides the best overall performance for the Log Reader Agent and the Distribution Agent. Unless every column in a table is updated, MCALL provides better performance than CALL and XCALL. Currently, XCALL is used by DTS replication, and it can be used for custom applications that require the previous values. CALL syntax for UPDATE statements exists for backward compatibility, because UPDATE statements used CALL syntax when it was first introduced in Microsoft SQL Server 6.5. MCALL stored procedures support the same types of customization as CALL stored procedures, so MCALL is recommended if backward compatibility is not a concern.
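The syntax used for an article is set when the article is created, or changed later with sp_changearticle. The sketch below changes an existing article's UPDATE delivery syntax; the publication and article names are placeholders, and the procedure name simply follows the sp_MSupd_<table> pattern used in the examples in this section. The same values can be supplied to sp_addarticle through its @ins_cmd, @upd_cmd, and @del_cmd parameters when the article is first created.

-- Sketch only: names are placeholders; a new snapshot or reinitialization
-- may be required before the change takes effect at existing Subscribers.
EXEC sp_changearticle
    @publication = N'SalesPub',
    @article = N'HALFTYPES',
    @property = N'upd_cmd',
    @value = N'XCALL sp_MSupd_HALFTYPES'   -- or N'MCALL sp_MSupd_HALFTYPES' (default) / N'CALL sp_MSupd_HALFTYPES'
GO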
The HALFTYPES example showed the difference in throughput when using dynamic SQL, CALL, MCALL, and XCALL. The results are expressed in commands per second. As mentioned earlier, CALL is the default for INSERT and DELETE statements, and MCALL is the default for UPDATE statements.

Distribution Agent (Dynamic SQL vs. Stored Procedures)

Command | Number of commands | Dynamic SQL | CALL | MCALL | XCALL
INSERT | 30,000 | 1,137 | 1,245 | Not applicable | 1,245
UPDATE | 30,000 | 855 | 1,054 | 1,534 | 736
DELETE | 30,000 | 4,118 | 4,254 | Not applicable | 1,261

The following example shows a generated statement using dynamic SQL, CALL, MCALL, and XCALL stored procedures for an UPDATE statement. For simplicity and to save space, the authors table in the pubs database was used. You can call sp_browsereplcmds from within the distribution database to view the generated statements for your publications. For more information about sp_browsereplcmds, see SQL Server Books Online.

The following code shows an UPDATE statement made at the Publisher:

UPDATE authors SET phone = '425 882-8080' WHERE au_id = '172-32-1176'

The following table shows the UPDATE statement as it is stored at the Distributor.

Equivalent UPDATE Statement Stored Within the Distribution Database

Command type | Command
Dynamic SQL | {UPDATE "authors" SET "phone"='425 882-8080' where "au_id"="172-32-1176"}
MCALL | {CALL sp_MSupd_authors (NULL, NULL, NULL, '425 882-8080', NULL, NULL, NULL, NULL, 0, '172-32-1176', 0x0800)}
CALL | {CALL sp_MSupd_authors ('172-32-1176', 'White', 'Johnson', '408 496-7223', '10932 Bigge Rd.', 'Menlo Park', 'CA', '94025', 1, '172-32-1176')}
XCALL | {CALL sp_MSupd_authors ('172-32-1176', 'White', 'Johnson', '408 496-7223', '10932 Bigge Rd.', 'Menlo Park', 'CA', '94025', 1, '172-32-1176', 'White', 'Johnson', '425 882-8080', '10932 Bigge Rd.', 'Menlo Park', 'CA', '94025', 1)}

sp_scriptdynamicupdproc

Introduced in SQL Server 2000 SP1, the sp_scriptdynamicupdproc stored procedure scripts a custom update stored procedure that performs dynamic updates. The UPDATE statement within the custom stored procedure is built dynamically, based on the MCALL syntax for indicating which columns to change. This approach becomes more beneficial as the number of indexes on the subscribing table increases and a low number of columns is actually being changed. The default MCALL scripting logic includes all columns within the UPDATE statement, using a bitmap to determine which columns were changed. If a column has not changed, the column is set back to itself, which is not an issue in most cases; however, if several columns are indexed, extra processing starts to creep in. If there are several indexes on a subscribing table for which only a few column values are changing, index maintenance overhead increases, and this may limit the rate at which changes can be applied. Building the update statement dynamically at run time includes only the columns that have changed, providing an optimal update string. The tradeoff is extra processing incurred at run time to build the dynamic UPDATE statement.
HALFTYPES Table with a Varying Number of Indexes

Number of indexes | Distributor throughput (cmds/sec)
Default MCALL, 1 clustered index | 1,570
Default MCALL, 1 clustered index, 10 nonclustered indexes | 110
New dynamic MCALL with sp_scriptdynamicupdproc, 1 clustered index, 10 nonclustered indexes | 1,180

Log Reader Agent Properties

The default values for the Log Reader Agent are optimal under many circumstances; however, performance can be enhanced by:

- Reducing the -OutputVerboseLevel property to 0 except during initial testing, monitoring, or debugging. This reduces the amount of output information that is displayed and can improve performance by 5 percent when reducing the value from 2 to 0.
- Reducing the -HistoryVerboseLevel property to 0 except during initial testing, monitoring, or debugging. This eliminates the logging of history and can improve performance by 5 percent when reducing the value from 2 to 0.
- Increasing the -ReadBatchSize property. Although the default value of 500 is optimal, increasing the value twofold to tenfold for tests containing smaller sized transactions (one or two commands) improved performance by 5-15 percent; however, improvements were negligible for tests containing larger sized transactions (10 or more commands).
- Setting the -ReadBatchThreshold property to 0. The default value is 0, which means the Log Reader Agent reads to the end of the log or until it reaches the value set in -ReadBatchSize. (SQL Server Books Online lists an incorrect default value of 100.) Setting it to any other value may reduce performance.
- Reducing the -PollingInterval setting. Reducing the polling interval can improve the latency of transactions from the log to the distribution database because the Log Reader Agent will query the transaction log more often.
- Altering the -MaxCmdsInTran setting to improve elapsed time and latency when handling transactions that contain a large number of commands. This parameter is new in SQL Server 2000 SP1. -MaxCmdsInTran allows the Log Reader Agent to break transactions consisting of a large number of commands into smaller transactions, or chunks, which reduces blocking at the Distribution Agent. The Distribution Agent can start processing early chunks while the Log Reader Agent is working through the later chunks of the same transaction, thus improving parallelism between the two agents. However, using this property also means these chunks are committed at the Subscriber as individual transactions. In theory, using the -MaxCmdsInTran property breaks the atomicity rule, which states that a transaction must be committed as all or nothing. This is not necessarily problematic because the transaction has already been committed at the Publisher, but users should be aware of this aspect of using -MaxCmdsInTran.

As an example of using -MaxCmdsInTran, consider a transaction consisting of 10 million deletes. When this transaction was tested against the HALFTYPES table, the total elapsed time for processing at the Log Reader Agent and the Distribution Agent was one hour and 45 minutes. Log Reader Agent throughput was 5,089 commands per second, and Distribution Agent throughput was 2,466 commands per second. In this case, the Distribution Agent did not start until the Log Reader Agent had completed. Using the -MaxCmdsInTran argument at the Log Reader Agent, set to a value of 10,000, total elapsed time for the two agents was a little under one hour. Instead of replicating one transaction, the Log Reader Agent created 1,000 transactions, each consisting of 10,000 commands.
Log Reader Agent throughput was 3,809 commands per second and Distribution Agent throughput was 2,390 commands per second. Although Log Reader Agent throughput was reduced, performance was improved because overall elapsed time and latency for each transaction was also reduced.

Distribution Agent Properties

The default values for the Distribution Agent are optimal under many circumstances; however, performance can be enhanced by:

- Reducing the -OutputVerboseLevel property to 0 except during initial testing, monitoring, and debugging. This reduces the amount of output information that is displayed and can improve performance by 5 percent.
- Reducing the -HistoryVerboseLevel property to 0 except during initial testing, monitoring, and debugging. This eliminates the logging of history and can improve performance by 5 percent.
- Increasing the -CommitBatchSize and -CommitBatchThreshold properties. Although the default values of 100 for -CommitBatchSize and 1,000 for -CommitBatchThreshold are optimal, in tests run at Microsoft, increasing the values twofold to tenfold improved performance by 5 percent for INSERT commands, 10-15 percent for UPDATE commands, and 30 percent for DELETE commands. In the test scenario, the Distribution Agent ran independently of the Log Reader Agent, but if the two agents run concurrently and the values are set too high, Distribution Agent performance can decrease while the Log Reader Agent is running. (A profile-based sketch for setting these values follows this list.)
- Reducing the -PollingInterval setting. Reducing the polling interval can improve the latency of transactions from the distribution database to the Subscriber, because the Distribution Agent will query the distribution database more frequently. Reducing the polling interval on both the Log Reader Agent and the Distribution Agent can improve latency between Publisher and Subscriber. Using a low value on a slower network connection is not recommended.
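Many of these properties can be supplied either on the agent command line (as in the earlier job step sketch) or through a replication agent profile. The following is a sketch only: it assumes the sp_add_agent_profile and sp_add_agent_parameter procedures in msdb and an agent type code of 3 for the Distribution Agent. After creating the profile, assign it to the agent (for example, from the agent profiles dialog box in SQL Server Enterprise Manager) and restart the agent.

-- Sketch only: the profile name and parameter values are illustrative.
USE msdb
GO
DECLARE @profile_id int
EXEC sp_add_agent_profile
    @profile_id = @profile_id OUTPUT,
    @profile_name = N'Distribution - larger commit batches',
    @agent_type = 3,                    -- 3 = Distribution Agent (assumed code)
    @description = N'Custom profile with larger commit batch settings'
EXEC sp_add_agent_parameter
    @profile_id = @profile_id,
    @parameter_name = N'CommitBatchSize',
    @parameter_value = N'1000'
EXEC sp_add_agent_parameter
    @profile_id = @profile_id,
    @parameter_name = N'CommitBatchThreshold',
    @parameter_value = N'10000'
GO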
Transactional Replication Scalability

Because transactional replication is often used in scale-out scenarios, it is crucial to understand the ways in which the type of agent, Distributor, and subscription can affect scalability. This section of the paper examines the use of remote Distributors, pull subscriptions, and independent agents. It also considers Distributor delivery rates, latency, and dequeuing rates for queued updating subscriptions.

Note: If the number of Subscribers in your topology is very large, or the Subscribers share a fast network but are separated from the Publisher and Distributor by a slow link, design a multi-tiered replication topology. The root server should be the Publisher for middle-tier Subscribers, which in turn should republish the data to lower-level Subscribers. There are some limitations to a topology based on republishing, such as not being able to use updating Subscribers, but republishing can be a good choice for a scale-out scenario. For more information about republishing, see SQL Server Books Online.

In the tests that follow, unless otherwise stated the hardware configuration consisted of two Compaq ProLiants with RAID disk subsystems: the Publisher was a Pentium Xeon 550 MHz quad processor with 640 MB memory and a 512-KB level 2 cache, and the Distributor was a quad-processor Pentium II 200 MHz with 3 GB memory. The publication database and publication contained a single table with a column for every data type (except text, ntext, image, and sql_variant). The replication topology consisted of a Publisher, a remote Distributor, and multiple Subscribers. The subscriptions were distributed evenly between the Subscribers. Network connection was over a 100-Mbps LAN using TCP/IP.

Using a Local or Remote Distributor

If the Publisher is expected to be a busy OLTP server, or if it is already CPU intensive or even I/O intensive, place the Publisher and Distributor on separate computers. This supports future scaling and capacity planning because multiple Publishers can use the same Distributor. Here are some examples of how transactional replication can affect OLTP activity:

- Replication agents require a certain amount of memory while executing. Multiple Log Reader Agents and Distribution Agents can consume significant amounts of memory and CPU cycles.
- The Log Reader Agent writes commands to the distribution database, so multiple Log Reader Agents servicing multiple published databases and writing to the distribution database can consume many CPU cycles and increase disk I/O.
- If there are multiple Distribution Agents, using a local Distributor slows the overall performance of the server.
- The cleanup tasks that are run as a maintenance activity on the distribution database can become expensive and involve significant disk activity.

Using the HALFTYPES example with a single Log Reader Agent and Distribution Agent, there was a performance benefit in using a remote Distributor. Using the throughput of a local Distributor as a baseline, the Log Reader Agent on a remote Distributor was approximately 1.3 times faster. The Distribution Agent on a remote Distributor for the same tests was approximately 1.47, 1.1, and 1.15 times faster for INSERT, UPDATE, and DELETE commands, respectively.

Using Pull Subscriptions

The Distribution Agent runs on the Distributor for push subscriptions and on Subscribers for pull or anonymous subscriptions. Using pull or anonymous subscriptions can increase performance by moving Distribution Agent processing from the Distributor to Subscribers. Anonymous subscriptions, which are especially useful for Internet applications, do not require information about the Subscriber to be stored in the distribution database at the Distributor for transactional replication. Not having to maintain information on Subscribers using anonymous subscriptions reduces the resource demands on the Publisher and Distributor. Anonymous subscriptions are a special category of pull subscriptions. In regular pull subscriptions, the Distribution Agent runs at the Subscriber (thereby reducing the resource demands on the Distributor), but it still stores information at the Publisher.

Using Independent Agents

An independent agent is an agent that services a single publication/subscription pair. Using independent agents reduces latency, because the agent is ready whenever the subscription needs to synchronize. A shared agent, on the other hand, services multiple publication/subscription pairs within a Publisher database and Subscriber database. When multiple subscriptions using the same shared agent need to synchronize, they wait in a queue, and the shared agent services them one at a time. A shared agent is the default for transactional replication, because independent agents cannot guarantee transactional consistency when separate transactions are dependent on each other but are handled by different independent agents. Consider the following example: Transaction T1 updates all rows in article A1 in publication P1, and then transaction T2 in publication P2 bases a query on the results of a SELECT from A1.
If you are using a shared agent, this presents no problem: the shared agent is aware of all transactions and commits them in order. Independent agents, however, are not aware of each other's transactions, so there is no guarantee that T1 will be processed before T2. If there are no dependencies between transactions handled by different agents, independent agents allow you to retain transactional consistency while reducing latency.

Distribution Delivery Rates

In the test discussed in this section, the distribution delivery rate was examined as a function of the number of subscriptions. The transaction rate used was an average of eight transactions per second, with an average of five commands per transaction. This amounts to 1,000,000 to 2,000,000 commands per day, with equal ratios of INSERT, UPDATE, and DELETE commands. All the Distribution Agents were configured as pull, and they were run concurrently. In this scenario, neither the Publisher nor the Distributor was CPU stressed. In fact, because the subscriptions were pull subscriptions, which means that the Distribution Agents ran at the Subscribers, the Distributor seldom ran above 25 percent CPU usage with 128 concurrent Subscribers.

Figure 5: Change in Delivery Rate as the Number of Subscribers Increases

Log Reader Agent performance is essentially unaffected by the number of subscriptions, especially when a remote Distributor is used. When a remote Distributor was used, the cost in throughput to the Log Reader Agent was 30-40 percent. When the number of Subscribers was scaled from 1 to 128, there was only a 6.2 percent drop in commands per second, at a cost of approximately 1 percent per 48 additional Subscribers.

Distribution Delivery Latency

Delivery latency as a function of the number of subscriptions was also examined, using the same scenario as in the distribution delivery rates example.

Figure 6: Delivery Latency

Average latency increased from three to six seconds when 128 concurrent pull Distribution Agents were running, so all 128 Subscribers were, on average, only six seconds behind the publishing database. However, in this scenario the Log Reader Agent and the Distribution Agent were able to keep up with the transaction rate at the Publisher; given a heavier load, a slower network, or slower Subscriber computers, greater latencies can be expected.

Dequeuing Rates for Queued Updating Subscribers

The next scenario tested replication scalability by examining the effect of adding Subscribers on the dequeuing rate. In this test, a single Queue Reader Agent serviced all the queues (both SQL Server queues and Message Queuing) for a given publication. The dequeuing rate (using a SQL Server queue) as a function of the number of subscriptions is examined in the following chart. The publication again consisted of a single table to which multiple Subscribers subscribed. The hardware used for this test was two dual-processor Xeon 550 MHz computers with 512 MB of memory, one for the Publisher and one for the Distributor. The Subscribers were simulated using the same two Xeon computers over a 100-Mbps LAN.

Figure 7: Dequeuing Rate

The dequeuing rate decreased by 13 percent for 20 concurrent Subscribers and by 27 percent for 32 concurrent Subscribers. However, even in this stress situation an average dequeuing rate of 140 commands per second was maintained; under normal conditions it is unlikely that all Subscribers would have such a high number of changes in the local queue. When Message Queuing was used, performance was about 10 percent lower, because Message Queuing uses MS DTC as its transaction coordinator, which adds some overhead.
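For reference, queued updating is enabled on the publication and then selected when each subscription is created. The following T-SQL sketch is for illustration only and is not taken from the test scripts used for this paper; the publication, Subscriber, and database names (TestPub, SUBSRV1, PerfSubDB) are placeholders, and the sketch assumes the default SQL Server queue rather than Message Queuing.

-- Illustrative sketch only; all names are placeholders.
-- At the Publisher: allow queued updating Subscribers on the publication.
EXEC sp_addpublication @publication = 'TestPub',
    @repl_freq = 'continuous',
    @status = 'active',
    @allow_queued_tran = 'true',    -- permit queued updating subscriptions
    @conflict_policy = 'pub wins'   -- resolve conflicting updates in favor of the Publisher

-- At the Publisher: create the subscription with queued updating.
EXEC sp_addsubscription @publication = 'TestPub',
    @subscriber = 'SUBSRV1',
    @destination_db = 'PerfSubDB',
    @update_mode = 'queued tran'    -- changes made at the Subscriber are stored in a queue and
                                    -- applied at the Publisher by the Queue Reader Agent

With Message Queuing instead of a SQL Server queue, the Queue Reader Agent reads the queued Subscriber changes from MSMQ and, as noted above, MS DTC adds some coordination overhead.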
Transactional Subscriber Latency Rates

Using transactional replication, it is possible for a Subscriber to be only a few seconds behind the Publisher. With a latency of only a few seconds, the Subscriber can easily be used as a reporting server, offloading expensive user queries and reporting from the Publisher to the Subscriber. In the following scenario (using the Customer table shown later in this section), the Subscriber was at most four seconds behind the Publisher. Even more impressive, 60 percent of the time the latency was two seconds or less. The time is measured from when the record was inserted or updated at the Publisher until it was actually written to the subscribing database.

Figure 8: Transactional Subscriber Latency

UPDATE commands                      INSERT commands
Latency (seconds)   Number of rows   Latency (seconds)   Number of rows
4                   5,528            4                   1,318
1                   21,359           3                   14,984
3                   30,359           2                   39,563
2                   42,754           1                   44,135
TOTAL               100,000          TOTAL               100,000

This scenario used a separate Publisher, Distributor, and Subscriber across a 10-Mbps LAN, using identical computers: Dell Precision 610, dual-processor 450 MHz, 256 MB of memory, and two SCSI hard drives. Commands were applied to the Subscriber with the autogenerated replication stored procedures, using a pull subscription. Completing 100,000 inserts at the Publisher took 304 seconds, and 100,000 updates took 315 seconds, an average of roughly 325 commands per second. The Subscriber stayed in synchronization within four seconds of the Publisher. During the entire process, CPU utilization rarely rose above 30 percent at the Publisher, 14 percent at the Distributor, and 11 percent at the Subscriber. Given that there was plenty of CPU capacity at the Distributor, and that these are not high-end production servers, more Subscribers could be added with similar latencies.

The Customer Table (Used in the Latency Test)

Column    Data type                    NULL  Default         Typical data
Cust_Id   int                                IDENTITY(0,1)   50000
Lname     varchar(30)                                        HALL
Fname     varchar(30)                                        NEWMAN
DOB       smalldatetime                Yes                   1966-01-01 00:00:00
State     char(2)                      Yes                   WA
Email     varchar(30)                  Yes                   someone@microsoft.com
Tel       varchar(15)                  Yes                   425 555-0100
Zip       varchar(4)                   Yes                   98050
Rating    smallint                     Yes                   20
ROWGUID   uniqueidentifier ROWGUIDCOL        newid()         FF1452C0-EF54-47AA-8CCF-880F7C6F246A
Tran_Dt   datetime                     Yes                   2000-01-01 00:00:00
Updated                                Yes

The primary key is the Cust_Id column, and it is clustered. Both the Publisher and the Subscriber include two nonclustered indexes: one on State (nonunique) and one on ROWGUID (unique).

Factors Affecting Transactional Delivery Rates

In most cases, the Subscriber is the bottleneck, because data cannot be written or applied quickly enough. Factors include the following:

Subscriber's physical computer:
   Slower processor and/or a lower number of processors.
   Low processor availability and/or a high processor load.
   Low amount of available memory.
   Slower disk subsystem.

Subscription database or SQL Server setup:
   Database log not on a separate disk.
   Database on a RAID 5 disk (RAID 10 provides better performance).
   Amount of SQL Server memory available, and whether it is dynamic or fixed. The amount of memory that is appropriate, and whether it should be fixed or dynamically allocated, depends on your application.
   SQL Server network protocol used. TCP/IP is generally slightly faster than other network protocols.
   Use of SQL Server Personal Edition, which is generally slower.
   Use of Windows 98 or Windows Millennium Edition, which are generally slower.
Network speed or connection:
   The Subscriber can become I/O bound when a very fast network (100 Mbps or faster) is used and the Subscriber has a slower disk subsystem or its log is not on a separate disk.
   Reliability of the connection: more retries may be necessary if the connection is unreliable.

Different indexes exist on the Subscriber. Often a reporting server is heavily indexed, and index management results in more I/O. Using the CUSTOMER table shown earlier, the average latency increases to 4 seconds (with a maximum of 6 seconds) when four nonclustered indexes are added at the Subscriber on [Lname, Fname], [DOB], [Email], and [Tel].

User triggers firing at the Subscriber. Subscriber triggers not marked NOT FOR REPLICATION are fired for each relevant operation. Because triggers frequently use the inserted and/or deleted tables and often perform other operations, the costs can be dramatic. By moving the trigger code into a custom stored procedure, some of these costs can be avoided. Using the earlier CUSTOMER example: it takes 88 seconds, or 1,140 commands per second, for the Distribution Agent to deliver 100,000 insert commands to the Subscriber. The Subscriber has the following insert trigger defined:

CREATE TRIGGER CUSTOMER_INS_TRG ON CUSTOMER FOR INSERT
AS
INSERT INTO BADRATINGS (Id, Cust_Id, Rating, Rating_Dt)
SELECT NewId(), Cust_Id, Rating, GetDate()
FROM inserted
WHERE (Cust_Id % 3) = 0

Then the trigger is dropped, and the relevant code is added to the autogenerated insert stored procedure that the Distribution Agent calls:

CREATE PROCEDURE sp_MSins_CUSTOMER...
...
-- @c1 and @c9 are the Cust_Id and Rating values passed to the autogenerated insert procedure
IF ((@c1 % 3) = 0)
   INSERT INTO BADRATINGS (Id, Cust_Id, Rating, Rating_Dt)
   SELECT NewId(), @c1, @c9, GetDate()

It now takes only 52 seconds (1,932 commands per second) for the Distribution Agent to deliver 100,000 commands to the Subscriber. This is 1.7 times faster than using triggers, which dramatically affects latency and throughput. If user triggers are still required (to trap local data changes made by users, for example), they should be marked NOT FOR REPLICATION. The triggers then fire only when local data changes are made by users.

Replicating stored procedure execution. SQL Server can replicate the execution of stored procedures rather than the data changes caused by the execution of those stored procedures. This is useful for replicating the results of maintenance-oriented stored procedures that may affect large amounts of data. Replicating the changes as one stored procedure statement can greatly increase the efficiency of your application, but this feature should be used with care. Each time a published stored procedure is executed at the Publisher, the execution and the parameters passed to it are forwarded to each Subscriber to the publication, and the stored procedure is then executed with those parameters at the Subscriber. This is vastly different from the Log Reader Agent picking up the changes in the log (for possibly thousands of rows), building the SQL statements for each, and then having them applied to the Subscriber. Using the CUSTOMER table example (with an existing 100,000 rows) earlier in this paper, the following stored procedure was executed at the Publisher:

CREATE PROCEDURE PROC_CUSTOMER_ADMIN_RATING @DOB smalldatetime
AS
UPDATE CUSTOMER SET Rating = Rating + 1 WHERE DOB < @DOB

Executing EXEC PROC_CUSTOMER_ADMIN_RATING '1966-01-01' resulted in 59,972 rows being updated, picked up by the Log Reader Agent, and written to the distribution database. The Distribution Agent then applies 59,972 updates to the Subscriber, which takes one minute and 51 seconds to complete. In contrast, when the execution of the stored procedure is replicated, only the actual EXEC statement is written to the distribution database and then executed at the subscribing database; this takes only 1.7 seconds. Therefore, replicating stored procedure execution both reduces the volume of commands that must be forwarded to Subscribers and increases the performance of your application by executing fewer dynamic SQL statements at each Subscriber.
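To publish the execution of a stored procedure such as PROC_CUSTOMER_ADMIN_RATING, the procedure is added to the publication as its own article with a procedure-execution article type. The following T-SQL sketch is for illustration only and is not taken from the test scripts used for this paper; the publication name TestPub is a placeholder, and the procedure must already exist at each Subscriber.

-- Illustrative sketch only; the publication name is a placeholder.
-- Publish the execution of the procedure rather than the row changes it produces.
EXEC sp_addarticle @publication = 'TestPub',
    @article = 'PROC_CUSTOMER_ADMIN_RATING',
    @source_owner = 'dbo',
    @source_object = 'PROC_CUSTOMER_ADMIN_RATING',
    @type = 'proc exec'     -- forward each EXEC and its parameters to Subscribers;
                            -- 'serializable proc exec' forwards the execution only when it
                            -- occurs within a serializable transaction

With an article of this type, executing PROC_CUSTOMER_ADMIN_RATING at the Publisher causes the same EXEC statement, with the same parameter values, to be run at each Subscriber instead of tens of thousands of individual UPDATE commands.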
Conclusion

Transactional replication in SQL Server 2000 is a mature technology that offers high performance and scalability, and it is suitable for the most demanding enterprise applications. Transactional replication performs well with its default behavior and settings, but it can clearly benefit from performance tuning based on the specific needs of your replication topology and applications. Following the examples and suggestions outlined in this paper can help you take transactional replication performance to the next level.