Transactional Replication Performance Tuning and Optimization

Authors: Bren Newman, Xavier Schildwachter, Greg Yvkoff
The information contained in this document represents the current view of Microsoft
Corporation on the issues discussed as of the date of publication. Because Microsoft
must respond to changing market conditions, it should not be interpreted to be a
commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of
any information presented after the date of publication.
This White Paper is for informational purposes only. MICROSOFT MAKES NO
WARRANTIES, EXPRESS OR IMPLIED, AS TO THE INFORMATION IN THIS DOCUMENT.
Complying with all applicable copyright laws is the responsibility of the user. Without
limiting the rights under copyright, no part of this document may be reproduced, stored
in or introduced into a retrieval system, or transmitted in any form or by any means
(electronic, mechanical, photocopying, recording, or otherwise), or for any purpose,
without the express written permission of Microsoft Corporation.
Microsoft may have patents, patent applications, trademarks, copyrights, or other
intellectual property rights covering subject matter in this document. Except as
expressly provided in any written license agreement from Microsoft, the furnishing of
this document does not give you any license to these patents, trademarks, copyrights,
or other intellectual property.
Unless otherwise noted, the example companies, organizations, products, domain
names, e-mail addresses, logos, people, places and events depicted herein are
fictitious, and no association with any real company, organization, product, domain
name, e-mail address, logo, person, place or event is intended or should be inferred.
© 2001 Microsoft Corporation. All rights reserved.
Microsoft and Windows are either registered trademarks or trademarks of Microsoft
Corporation in the United States and/or other countries.
The names of actual companies and products mentioned herein may be the trademarks
of their respective owners.
Table of Contents
Introduction
Improving Replication Performance
Improving the Performance of Transactional Replication
Improving Performance in Applying the Initial Snapshot
Using -MaxBCPThreads
Using -UseInprocLoader
Using Compressed Snapshots
Using Concurrent Snapshot Processing
Transactional Replication Performance Examples
Cost of Transactional Replication at the Publisher
Transactional Replication with Filters
Transactional Replication with Indexed Views
Transactional Replication with Transformable Subscriptions
Transactional Replication with Subscribers Running Earlier Versions of SQL Server
Transactional Replication with Updatable Subscriptions
Adding Hardware in Transactional Replication
Effects of Network Connection Speeds on Transactional Replication
Replication Command Types
CALL, MCALL, and XCALL
Log Reader Agent Properties
Distribution Agent Properties
Transactional Replication Scalability
Transactional Subscriber Latency Rates
Factors Affecting Transactional Delivery Rates
Conclusion
Introduction
Transactional replication is a type of replication provided by Microsoft® SQL Server™
2000 that allows data modifications to be propagated incrementally between servers in a
distributed environment.
Transactional replication can be used for many different applications, from reporting
servers and data warehousing environments to Web servers and e-commerce
applications. Transactional replication is used at many of the predominant Web sites on
the Internet that run SQL Server, including MSN.com, Passport.com,
BarnesandNoble.com, and Buy.com.
Transactional replication is a scalable and reliable solution for distributing data in
high-performance environments. This paper examines performance in transactional
replication and demonstrates ways in which you can improve the performance of your
applications. This paper is based on the results of tests conducted using a variety of
hardware configurations and replication environments. Based on the test results,
recommendations are made in areas such as applying the initial snapshot, optimizing
replication settings, and replication scalability.
Improving Replication Performance
You can enhance the general performance for all types of replication in your application
and on your network by:
• Optimizing your database design to include replication considerations.
• Setting a minimum amount of memory allocated to SQL Server 2000.
• Using a separate disk drive for the transaction log for all databases involved in replication.
• Adding memory to servers used in replication.
• Using multiprocessor computers.
• Publishing only the amount of data required.
• Running the Snapshot Agent only when necessary and at off-peak times.
• Placing the snapshot folder on a drive not used to store database or log files.
• Using a single snapshot folder per publication.
• Considering the use of compressed snapshot files.
• Considering the use of pull or anonymous subscriptions.
• Reducing the verbose level of replication agents to 0, except during initial testing, monitoring, or debugging.
• Considering the use of the -UseInprocLoader parameter of the Distribution Agent.
For more information about enhancing replication performance, see SQL Server Books
Online.
Improving the Performance of Transactional Replication
You can enhance the performance of transactional replication in your application and on
your network by:
• Running agents continuously instead of on frequent schedules.
• Setting the distribution database to a fixed size that can handle the transaction volume and retention period without frequent autogrowth.
• Reducing the distribution frequency when replicating to numerous Subscribers.
• Configuring the Distributor on a dedicated server.
• Increasing memory on the Distributor.
• Subscribing to all articles in a publication.
• Using stored procedure replication when a large number of rows are affected.
• Minimizing the retention period for transactions and history.
• Increasing the read batch size for the Log Reader Agent.
• Minimizing the log history and retention period.
• Using custom stored procedures for inserts, updates, and deletes at Subscribers.
• Avoiding horizontal filtering.
Improving Performance in Applying the Initial Snapshot
Applying the initial snapshot can take a significant amount of time if you are transferring
a large amount of data over the network, or if you have a slow link. To address this
situation, transfer the snapshot using a removable disk or use the performance
optimization features of SQL Server 2000. The following examples demonstrate
snapshot performance improvements when using the optimization features of
-MaxBCPThreads, -UseInprocLoader, compressed snapshots, and concurrent
snapshot processing.
Using -MaxBCPThreads
In transactional replication, the -MaxBCPThreads parameter can be passed to the
Snapshot Agent and the Distribution Agent. This parameter specifies the number of
bulk-copy operations that can be performed in parallel. The maximum number of
threads and ODBC connections that can exist simultaneously is the value of
-MaxBCPThreads or the number of bulk-copy requests that appear in the
synchronization transaction at the distribution database, whichever is lower.
-MaxBCPThreads must have a value greater than 0, and it has no hard-coded upper
limit. The default is 1. When used with the Snapshot Agent, -MaxBCPThreads affects
the time it takes to generate a snapshot. When used with the Distribution Agent,
-MaxBCPThreads affects the time it takes to apply the snapshot at the Subscriber.
Because the Snapshot Agent bulk copies the contents of all the articles in a publication,
the Snapshot Agent writes the entire publication to the snapshot folder. Therefore, the
faster the disk subsystem can read and write data to the disk or disks, the faster the
snapshot is completed. This also applies to the Distribution Agent applying the snapshot
at the Subscriber. For the numbers provided in the following table, the snapshot data is
written to and read from a three-disk array (RAID 0) and written to a subscription
database spread across a three-disk array (RAID 0) with the database log on a separate
disk.
The performance benefit from using -MaxBCPThreads also depends on the number of
processors on the server. Specifying a high number for -MaxBCPThreads can
overburden the system, because the system must spend too much time managing
threads. Using more threads than the total number of articles provides no additional
benefit.
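The parameter is passed on the agent command line, which for scheduled agents is stored in the "Run agent." step of the agent job. The following is a minimal Transact-SQL sketch of setting it there; the job, server, database, and publication names are placeholders rather than the names SQL Server generates in your environment, and in practice you would append -MaxBCPThreads to the existing command text rather than retyping it.

-- Hypothetical sketch: add -MaxBCPThreads to the Snapshot Agent job step.
-- All names below are placeholders.
DECLARE @cmd nvarchar(500)
SET @cmd = N'-Publisher PUBSRV -PublisherDB SalesDB -Publication HALFTYPES_pub '
         + N'-Distributor DISTSRV -MaxBCPThreads 4'
EXEC msdb.dbo.sp_update_jobstep
    @job_name = N'PUBSRV-SalesDB-HALFTYPES_pub-1',   -- placeholder Snapshot Agent job name
    @step_id  = 2,                                   -- the "Run agent." step
    @command  = @cmd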
In the following example, the publication has seven articles, totaling 228 megabytes
(MB) of database storage space.
Publication Articles 1

Articles            Total rows    Reserved size (KB)    Index size (KB)
CUSTOMER            120,000       19,984                4,032
PAYMENT             120,000       11,280                2,848
ORDERS              374,000       82,208                22,416
NAMES               120,000       7,056                 32
CUSTOMER_HISTORY    120,000       23,744                64
PAYMENT_HISTORY     120,000       8,448                 64
ORDERS_HISTORY      374,000       75,376                192
TOTAL               1,348,000     228,096               29,648
Generating the Initial Snapshot with the Snapshot Agent
The following data shows that on a dual-processor 450-megahertz (MHz) Xeon with 256
MB of memory, using a value of 7 for -MaxBCPThreads results in snapshot generation
that is 1.6 times faster than it is when using a value of 1. On a single processor, using a
value of 7 for -MaxBCPThreads results in snapshot generation that is 1.27 times faster
than it is when using a value of 1. Given that the CPU becomes the bottleneck on the
single processor, a value of 7 provides no more benefit than a value of 3.
Figure 1: Effect of -MaxBCPThreads Setting on Initial Snapshot Generation

Processors          -MaxBCPThreads=1    -MaxBCPThreads=3    -MaxBCPThreads=7
Dual processor      122 seconds         84 seconds          76 seconds
Single processor    122 seconds         96 seconds          96 seconds
Applying the Initial Snapshot with the Distribution Agent
The following data shows that on a dual processor 450-MHz Xeon with 256 MB of
memory, using a value of 7 for -MaxBCPThreads results in snapshot application that is
1.3 times faster than it is when using a value of 1. On a single processor, the CPU is
again the bottleneck and increasing this value provides little performance improvement.
Using a value of 7 for -MaxBCPThreads is only 1.03 times faster than using a value of
1, and using a value of 7 provides no additional benefit over using a value of 3. Using
dual processors clearly provides a large performance gain; the initial snapshot
application is 1.57 times faster with dual processors than it is with a single processor.
Figure 2: Effect of -MaxBCPThreads Setting on Initial Snapshot Application

Processors          -MaxBCPThreads=1    -MaxBCPThreads=3    -MaxBCPThreads=7
Dual processor      120 seconds         98 seconds          92 seconds
Single processor    148 seconds         144 seconds         144 seconds
Using -UseInprocLoader
The -UseInprocLoader parameter can be passed to the Distribution Agent when
applying the initial snapshot at the Subscriber. When you use this parameter, the
Distribution Agent will use the in-process BULK INSERT operation, decreasing the
amount of time taken to apply the snapshot. To further enhance performance, use
-UseInprocLoader in conjunction with -MaxBCPThreads. The following example uses
a publication containing 10 articles, totaling 46 MB of data.
Publication Articles 2

Articles         Total rows    Reserved size (KB)    Index size (KB)
CUSTOMER         60,000        7,944                 1,968
PAYMENT          60,000        5,640                 1,424
ORDERS           187,000       29,896                11,144
NAMES            5,765         328                   16
PRODUCTS         10,000        904                   264
INTERESTED_IN    6,000         1,216                 752
STATE            200           64                    48
SHIPPERS         51            40                    32
SHIP_TYPE        11            40                    32
REGION           2             40                    32
TOTAL            329,029       46,112                15,712
When you use only the -UseInprocLoader parameter, snapshot application is 1.4 times
faster than without this parameter. When -UseInprocLoader is combined with
-MaxBCPThreads=5, snapshot application is 2.1 times faster.
Time to Apply Snapshot at Subscriber

Standard      -UseInprocLoader    -UseInprocLoader and -MaxBCPThreads=5
36 seconds    25 seconds          17 seconds
In most cases, you will see a performance gain. By default, this parameter is not used
because it is affected by line quality and speed, the amount of available memory on the
subscription database, the type of data transferred, and the number of articles. It is
recommended that you test the performance gain using your publication.
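As a rough sketch of combining the two parameters (server, database, and job names are placeholders, and in practice the switches are appended to the existing Distribution Agent job step command), the command text might look like this:

-- Hypothetical sketch: Distribution Agent command text using -UseInprocLoader
-- together with -MaxBCPThreads. All names are placeholders.
DECLARE @cmd nvarchar(1000)
SET @cmd = N'-Publisher PUBSRV -PublisherDB SalesDB -Publication HALFTYPES_pub '
         + N'-Distributor DISTSRV -Subscriber SUBSRV -SubscriberDB SalesDB_sub '
         + N'-UseInprocLoader -MaxBCPThreads 5'
EXEC msdb.dbo.sp_update_jobstep
    @job_name = N'PUBSRV-SalesDB-HALFTYPES_pub-SUBSRV-1',   -- placeholder Distribution Agent job name
    @step_id  = 2,
    @command  = @cmd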
Using Compressed Snapshots
This option is recommended when you are using a pull or remote push Subscriber. It
also provides additional benefits when you are using FTP support. Compressing snapshot
files in the alternate snapshot folder can reduce snapshot disk storage requirements,
and in some cases it can significantly improve performance when you are transferring
snapshot files over a slow connection. However, compressing the snapshot requires
additional processing by the Snapshot and Distribution agents while the snapshot files
are generated and applied. This may slow down overall snapshot generation and
increase the time it takes to apply a snapshot.
Using the articles listed earlier in the publication Articles 2 table, the Snapshot Agent
generates 20 files—including schema files and data files—with a total size of
approximately 130 MB. When you use a compressed snapshot, it also generates a .cab
file with a size of approximately 65 MB; a Subscriber loading a compressed snapshot
across a slower link has only half as much data to copy. However, compressed
snapshots require more storage space and more set-up time. A compressed snapshot
can use more space on the Distributor (the process optionally maintains both the
compressed and uncompressed data), and it takes more than 4.5 times longer to
generate than an uncompressed snapshot because of the time required to compress the
snapshot. Consider these tradeoffs carefully during planning.
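Compressed snapshots are enabled per publication by directing the snapshot to an alternate folder and setting the compression option. The following is a minimal sketch using sp_addpublication; the publication name and UNC share are placeholders, and an existing publication can be changed with sp_changepublication instead.

-- Hypothetical sketch: write the snapshot as a compressed .cab file to an
-- alternate snapshot folder. Names and paths are placeholders.
EXEC sp_addpublication
    @publication               = N'HALFTYPES_pub',
    @repl_freq                 = N'continuous',
    @snapshot_in_defaultfolder = N'false',
    @alt_snapshot_folder       = N'\\DISTSRV\ReplSnapshots',
    @compress_snapshot         = N'true'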
Using Concurrent Snapshot Processing
When you use the default settings for snapshot generation, SQL Server places shared
locks for the duration of snapshot generation on all tables published as part of
replication. This prevents updates from being made on the publishing tables. Concurrent
snapshot processing (which is available only with transactional replication) places shared
locks for only a short time while SQL Server 2000 creates initial snapshot files, allowing
users to continue working uninterrupted.
When you create a new publication using transactional replication and indicate that all
Subscribers will be instances of SQL Server 7.0 or SQL Server 2000, concurrent
snapshot processing is available.
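When the publication is created with Transact-SQL rather than the Create Publication Wizard, concurrent snapshot processing corresponds to the @sync_method setting, as in this sketch (the publication name is a placeholder):

-- Hypothetical sketch: request concurrent snapshot processing.
-- 'concurrent' produces native-mode bcp files for SQL Server Subscribers;
-- 'concurrent_c' produces character-mode files.
EXEC sp_addpublication
    @publication = N'HALFTYPES_pub',
    @repl_freq   = N'continuous',
    @sync_method = N'concurrent'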
After replication begins, the Snapshot Agent places shared locks on the publication
tables. The locks prevent changes until a record indicating the start of the snapshot is
written to the transaction log. After this is done, the shared locks are released, and data
modifications at the database can continue. The duration for holding the locks is only a
few seconds, even if a large amount of data is being copied.
At this point, the Snapshot Agent starts to build the snapshot files. When the snapshot is
complete, a second record indicating the end of the snapshot process is written to the
log. Any transactions that affect the tables while the snapshot is being generated are
captured between these beginning and ending tokens and forwarded to the distribution
database by the Log Reader Agent.
When the snapshot is applied at the Subscriber, the Distribution Agent first applies the
snapshot files (schema and data files). It then reconciles each captured transaction to
see whether it has already been delivered to the Subscriber. During this reconciliation
process, the tables on the Subscriber are locked, and transactions that occur during the
lock are again captured in the log and applied after the locks are released. If a high
number of transactions are captured at the publisher while the tables are locked, the
snapshot takes longer to apply at the Subscriber.
Although concurrent snapshot processing allows updates to continue on publishing
tables, the additional I/O required to write the snapshot files to disk may affect
performance. Whenever possible, you should generate the snapshot during periods of
low activity.
For more information about concurrent snapshot processing, see SQL Server 2000 Books
Online.
Note In SQL Server 2000, concurrent snapshot processing is not recommended if
the publishing table has a unique index that is not the primary key or the clustering
key. If data modifications are made to the clustering key while a concurrent snapshot
is being generated, replication can fail with a duplicate key error when applying the
snapshot to a Subscriber. With SQL Server 2000 Service Pack 1 (SP1), there are no
longer any restrictions on using concurrent snapshot processing.
Transactional Replication Performance Examples
For the numeric data in this section, gain and cost are measured as a percentage of the
base throughput numbers returned by the replication agents. Throughput numbers are
expressed in commands per second; a command is a unit of operation for a replication
agent. The numbers are based on a simple scenario in a fixed environment and are
given to show either a cost or a gain in performance. Many factors affect the amount of
cost or gain in a real-world scenario, including network traffic, background Microsoft
Windows® services that are running, client Net-Libraries, hardware, and so on.
Therefore, the numbers provided here are not to be used specifically as benchmarks.
All the examples, unless specified otherwise, use the following environment:
• Hardware
The test environment consisted of a Publisher, a remote Distributor, and a
Subscriber. These three servers had identical hardware and software: Dell Precision
610, dual-processor Pentium II Xeon 450 MHz; 256 MB memory with a 512-kilobyte
(KB) L2 cache; a 100-Mbps Ethernet network card; and four SCSI disks with Windows
2000 Server, SQL Server 2000 Standard Edition, the database data file, and the
database log files each on their own disk.
Note In almost all circumstances, the database log file should be placed on its
own separate disk or disks and the database data files should be striped across
multiple disks with a stripe size of 64 KB or a multiple thereof.
• Replication environment
The replication environment consisted of a publication that contained one article, the
HALFTYPES table. This table contained 21 columns, representing every data type
except text, ntext, image, sql_variant, and bigint. The row size varied, with an
average size of approximately 1,024 bytes.
In the test scenario, a push subscription was created, and then the Snapshot Agent,
the Log Reader Agent, and the Distribution Agent were run at the Distributor. Three
different tests tracked insert, update, and delete operations. Each test consisted of
30,000 commands with 10 commands per second executed at the Publisher.
Note UPDATE commands do not update the primary key, because primary key
updates are much less common than updates that do not involve the primary
key. Because a primary key update is propagated as a DELETE command
followed by an INSERT command during replication, it can skew the resulting
data.
Performance was measured in throughput. To obtain the highest throughput for a
given replication agent, in most cases each agent was run separately, rather than
concurrently. For example, the commands were first executed at the Publisher, then
the Log Reader Agent was run, and finally the Distribution Agent was run. The
Distribution Agent used the autogenerated custom stored procedures created for
each article, which is the default behavior.
Cost of Transactional Replication at the Publisher
One of the first questions asked when deciding whether to use transactional replication
is: "How does adding transactional replication affect my OLTP server?"
When you add transactional replication to an online transactional processing (OLTP)
environment, the OLTP server becomes the Publisher. This incurs the overhead cost of
the Log Reader Agent querying the database log. In a local Distributor environment, the
server also incurs the costs of writing to the distribution database, running the
Distribution Agents (in a push scenario), and having the Distribution Agents read
changes from the distribution database.
To measure the minimum overhead costs, the following times were recorded:
• The time it took to apply changes at the Publisher without replication configured
• The time it took to apply changes at the Publisher with the Log Reader Agent running, with both a local and a remote Distributor
Using the HALFTYPES example, the time it took each of 20 simultaneous client
connections to execute 150 transactions consisting of 10 commands per transaction was
measured. In total, there were 3,000 transactions or 30,000 commands. At the time of
execution, CPU usage was at 90-100 percent for INSERT commands, 50-60 percent for
UPDATE commands, and 25-35 percent for DELETE commands. Then a remote
Distributor was added to the environment and the tests were run again with the Log
Reader Agent running. Finally, the tests were run again using a local Distributor.
Cost of Transactional Replication at the Publisher

Command    Number of commands    Replication not    Log Reader Agent running,    Log Reader Agent running,
                                 configured         remote Distributor           local Distributor
INSERT     30,000                50 seconds         54 seconds                   58 seconds
UPDATE     30,000                20 seconds         22 seconds                   25 seconds
DELETE     30,000                20 seconds         20 seconds                   22 seconds
As indicated in "Cost of Transactional Replication," there is a cost in adding replication to
an OLTP server; however, the cost under stress conditions can be as low as 8-10
percent when using a remote Distributor, and somewhat higher when using a local
Distributor. Furthermore, if the publishing server has enough CPU capacity, the cost can
be insignificant: When using a four-processor or eight-processor server, the cost of
replication is often less than 3 percent.
Log Reader Agent vs. Distribution Agent Throughput
Using a total of 30,000 commands with 10 commands per transaction, the Log Reader
Agent processes many more INSERT and UPDATE commands per second than the
Distribution Agent. DELETE commands are processed exceptionally quickly by both
agents: the Log Reader Agent only has to formulate and write a small delete string
based on the primary key, and the Distribution Agent benefits by reading and
distributing such a small string. In most production environments, when both agents are
running in continuous mode, the Log Reader Agent can write more commands to the
distribution database than the Distribution Agent can deliver to the subscribing
database. Even when the Subscriber is processing large amounts of data, it is typically
only 1-5 seconds behind the Publisher, for a latency of about 5 seconds or less.
Log Reader Agent and Distribution Agent Throughput in Commands per Second (cmds/sec)

Command    Log Reader Agent (cmds/sec)    Distribution Agent (cmds/sec)
INSERT     2,080                          1,230
UPDATE     2,660                          1,570
DELETE     3,890                          3,950
Transactional Replication with Filters
Partitioning replicated data using filters allows you to reduce the amount of data sent
over the network, reduce the amount of storage space required at the Subscriber, and
customize publications and applications based on individual Subscriber requirements.
Because less data is transferred from the Publisher to the Subscriber, the performance
of the Distribution Agent improves. For example, adding column filters improves
throughput because it minimizes the volume of data that is propagated across the
network. Fewer columns propagated means smaller INSERT, UPDATE, and DELETE
statements are created and written to and from the distribution database. Also, because
the statements are sent across the network in batches, smaller statements mean
smaller batches. Smaller batches move across the network more quickly, improving
performance and reducing latency.
After adding column filters to the HALFTYPES publication, Log Reader Agent results
were 1.14-1.35 times faster than they were without column filters. Distribution Agent
results were 1.05-2.25 times faster. Of course, the performance difference depends
largely on the size of the row, the number of columns being filtered, and the size of the
columns being filtered.
Row filters add overhead to the Log Reader Agent and the Publisher’s CPU, because the
Log Reader Agent must evaluate each row filter against the transaction log record for
the articles. If multiple filters exist, each filter is evaluated independently and a separate
command is entered in the distribution database for each filter that qualifies.
Filter overhead depends on three things: the complexity of the filter; the type of joins,
functions, or comparisons being used in the filter; and whether or not the filter uses
indexes on the publishing database. However, in a Publisher using a remote Distributor,
the Publisher will probably have enough CPU capacity to compensate for this extra
overhead. Generally the CPU costs are not large and, depending on the filter, should not
add more than 20 percent overhead.
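As a hedged sketch of how both kinds of filters are defined with Transact-SQL (column, filter, and view names are placeholders, and filters should be in place before subscriptions are initialized):

-- Hypothetical sketch: vertical (column) filter on the HALFTYPES article.
EXEC sp_articlecolumn
    @publication = N'HALFTYPES_pub',
    @article     = N'HALFTYPES',
    @column      = N'notes_col',        -- placeholder column to exclude
    @operation   = N'drop'

-- Hypothetical sketch: horizontal (row) filter on the same article.
EXEC sp_articlefilter
    @publication   = N'HALFTYPES_pub',
    @article       = N'HALFTYPES',
    @filter_name   = N'flt_HALFTYPES',
    @filter_clause = N'IndexCol % 4 = 0'
EXEC sp_articleview
    @publication   = N'HALFTYPES_pub',
    @article       = N'HALFTYPES',
    @view_name     = N'vw_HALFTYPES_flt',
    @filter_clause = N'IndexCol % 4 = 0'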
Transactional Replication with Indexed Views
An alternative to adding a row filter at the article level is to publish an indexed view
based on the same WHERE clause that the filter would use. When you use an indexed
view rather than a row filter, the Log Reader Agent does not need to evaluate
statements against filters because the article—which is now the indexed view, rather
than the base table—is already filtered. However, the overall performance of the Log
Reader Agent is still slower than an article without filters, because every modification
performed on a table with an indexed view is logged twice, once for the indexed view
and again for the table itself. This doubles the number of log records the Log Reader
Agent must traverse, which affects Log Reader Agent performance.
The cost of maintaining the indexed view at the Publisher can be high, so indexed views
work best if the underlying data is infrequently updated. If the underlying data is
frequently updated, the cost of maintaining the indexed view data may outweigh any
performance benefits of using the indexed view. Indexed views usually do not improve
performance under the following scenarios: OLTP systems with many writes; databases
with many UPDATE operations; or using queries that do not involve aggregations or
joins.
The following table shows the results of a test comparing execution times on identical
tables with and without indexed views. No replication was involved in this test. The
indexed view was:
SELECT * FROM HALFTYPES WHERE IndexCol % 4 = 0
This represents all columns but only 25 percent of all the rows or a total of 7,500 rows.
IndexCol had a nonclustered index.
Cost of Using an Indexed View

Operation    Number of operations    Table without indexed view    Table with indexed view
INSERT       30,000                  50 seconds                    180 seconds
UPDATE       30,000                  44 seconds                    330 seconds
DELETE       30,000                  44 seconds                    207 seconds
INSERT operations (not published for replication) took 3.5 times longer to apply to a
table with an indexed view than to one without an indexed view. DELETE operations took
about 4.7 times longer, and UPDATE operations took 7.5 times longer.
The performance issues with using indexed views are not replication issues; they are a
function of the way indexed views are handled in the server, but they can clearly affect
performance in a replication environment. Although you should exercise caution when
publishing indexed views, they can be very useful in some situations, such as replicating
summary or aggregated data to a large number of Subscribers. For more information
about indexed views, see SQL Server 2000 Books Online.
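For reference, an indexed view can be published as a log-based article so that Subscribers receive only the filtered result. The following is a sketch under the assumption that the indexed view already exists; the view, article, and publication names are placeholders.

-- Hypothetical sketch: publish an existing indexed view as a log-based article.
EXEC sp_addarticle
    @publication       = N'HALFTYPES_pub',
    @article           = N'HALFTYPES_subset',
    @source_object     = N'vw_HALFTYPES_subset',   -- placeholder indexed view
    @type              = N'indexed view logbased',
    @destination_table = N'HALFTYPES_subset'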
Transactional Replication with Transformable Subscriptions
Transforming published data leverages the data movement, transformation mapping,
and filtering capabilities of Data Transformation Services (DTS). Using transformable
subscriptions allows you to customize and send published data based on the
requirements of individual Subscribers. Examples of how you can use transformable
subscriptions include:
• Creating data transformations such as data type mappings (for example, integer to real data type), column manipulations (for example, concatenating first name and last name columns), string manipulations, and function-based transformations.
• Creating custom data partitions. You can create column and row filters of published data on a per-Subscriber basis.
Although using DTS packages provides rich features and flexibility, it affects throughput
for both the Log Reader Agent and the Distribution Agent. DTS requires the parameters
of the stored procedures to be specified using XCALL syntax for UPDATE and DELETE
statements and CALL syntax for INSERT statements (for more information, see "CALL,
MCALL, and XCALL" later in this paper). Using XCALL passes values for all columns,
whether changed or not, plus the previous value in the column. This increases the size
of the command written to the distribution database and creates more processing work
for the Log Reader Agent and the Distribution Agent. However, the Log Reader Agent is
only minimally affected—in most cases the effect should be less than 10 percent.
Often, row-level and/or column-level filtering operations are performed within the DTS
package and throughput is negatively affected, but only for the Distribution Agent, which
calls the package. Depending upon how much processing is performed within the DTS
package, the cost of using a DTS package can be high in terms of CPU and memory
usage, and it can reduce the number of commands per second that can be delivered to a
Subscriber. Therefore, you should exercise caution when using a highly concurrent push
model in which many Distribution Agents are concurrently instantiating DTS packages.
Generally, adding a DTS package to a publication reduces the number of commands the
Distribution Agent is able to distribute by 50 percent, but it can greatly enhance
functionality.
Transactional Replication with Subscribers Running Earlier Versions of SQL Server
There are no additional overhead costs for push subscriptions with Subscribers running
SQL Server 7.0. Distribution Agent throughput remains the same for all tests.
Transactional Replication with Updatable Subscriptions
If the Subscriber must be updatable within a transactional replication environment, three
supported options are available: bidirectional replication, immediate updating
subscriptions, and queued updating subscriptions. The following example measures the
performance impact of making changes at the Subscriber using immediate updating
subscriptions and queued updating subscriptions.
With immediate updating subscriptions, local Subscriber triggers are fired when an
insert, update, or delete operation occurs. These triggers make remote procedure calls
(RPCs) using the two-phase commit protocol (2PC), which in turn relies on the Microsoft
Distributed Transaction Coordinator (MS DTC), to attempt to commit the transaction at
the Publisher. If the transaction can be committed at the Publisher, it is then also
committed at the Subscriber. The overhead cost of the immediate updating option comes
from executing the RPCs and coordinating MS DTC across the network.
With queued updating subscriptions, triggers are fired when a local insert, update, or
delete operation occurs. These triggers build and write the insert, update, or delete
statement to either a SQL Server queue, which is implemented as a local table in SQL
Server, or Microsoft Message Queuing (also known as MSMQ) and immediately commit.
A Queue Reader Agent—a process that resides on the Distributor—reads the queue and
applies it to the Publisher. The overhead cost to the Subscriber of the queued updating
option (using the SQL Server queue or Message Queuing) is imposed by the underlying
replication triggers that write to the queue.
For both immediate and queued updating subscriptions, using Message Queuing as the
queue imposes additional costs associated with using MS DTC to commit transactions,
adding to the overhead cost of the entire process (the time measured from the commit
at the Subscriber to the commit at the Publisher).
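The updating option is chosen when the subscription is created. The following sketch assumes the publication was created with @allow_sync_tran and @allow_queued_tran enabled; server and database names are placeholders.

-- Hypothetical sketch: push subscriptions that can be updated at the Subscriber.
-- Immediate updating (2PC back to the Publisher through MS DTC):
EXEC sp_addsubscription
    @publication    = N'HALFTYPES_pub',
    @subscriber     = N'SUBSRV1',
    @destination_db = N'SalesDB_sub',
    @update_mode    = N'sync tran'

-- Queued updating (changes are queued locally and applied later by the Queue Reader Agent):
EXEC sp_addsubscription
    @publication    = N'HALFTYPES_pub',
    @subscriber     = N'SUBSRV2',
    @destination_db = N'SalesDB_sub',
    @update_mode    = N'queued tran'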
To determine the minimum overhead cost of adding the updating subscriptions option at
the Subscriber, the amount of time it took to apply commands at the Subscriber with
and without the updating subscription option was measured.
Transactional Replication with Immediate Updating Subscriptions

Command    Number of commands    Replication not configured    Immediate updating
INSERT     30,000                50 seconds                    191 seconds
UPDATE     30,000                20 seconds                    177 seconds
DELETE     30,000                20 seconds                    118 seconds
The cost in time taken can be significant, because the transaction must be committed
locally and across the network to the Publisher managed by MS DTC.
Using queued updating subscriptions provides the advantage of being able to continue
modifying data at the Subscriber while disconnected, because these modifications are
queued up and later written to the Publisher by the Queue Reader Agent. It is important
to note that this is a multiple-step and disconnected process, and the times shown in the
following table represent only the first step; changes have not yet been applied to the
Publisher, only the Subscriber.
To determine the overhead cost of adding queued updating subscriptions at the
Subscriber, trigger overhead was measured using both a SQL Server queue and Message
Queuing.
The Cost of Queued Updatable Subscriptions at the Subscriber

Command    Number of commands    Replication not    Trigger overhead using    Trigger overhead using
                                 configured         a SQL Server queue        Message Queuing
INSERT     30,000                50 seconds         127 seconds               141 seconds
UPDATE     30,000                20 seconds         99 seconds                144 seconds
DELETE     30,000                20 seconds         64 seconds                107 seconds
There is a substantial cost in adding triggers that record transactions to a queue. Writing
to a SQL Server queue is faster than writing to Message Queuing. SQL Server queues,
which are the default, are also simpler to set up. However, Message Queuing is more
scalable because SQL Server queues require the Queue Reader Agent to poll all the
Subscribers’ queues periodically, whereas Message Queuing automatically transmits
changes made at the Subscribers to a centralized queue located at the Distributor. The
single queue at the Distributor also provides centralized queue monitoring.
To compare the performance of SQL Server queues and Message Queuing, throughput
was measured during dequeuing (the propagation of data from the queue to the
Publisher).
Dequeuing Throughput Using SQL Server Queues and Message Queuing

Command    Number of commands    SQL Server queue (cmd/sec)    Message Queuing (cmd/sec)
INSERT     30,000                211                           117
UPDATE     30,000                173                           102
DELETE     30,000                196                           124
To determine the total time required to replicate data from the Subscriber to the
Publisher, two times were measured: the amount of time it took to apply commands at
the Subscriber and, for queued updating subscriptions, the amount of time it took for
the Queue Reader Agent to read the queue and apply it to the Publisher.
Transactional Replication with Updatable Subscriptions (Total Time)

Command    Number of commands    Immediate updating    Queued updating using    Queued updating using
                                                       SQL Server queues        Message Queuing
INSERT     30,000                191 seconds           279 seconds              406 seconds
UPDATE     30,000                177 seconds           283 seconds              448 seconds
DELETE     30,000                118 seconds           222 seconds              358 seconds
Immediate updating subscriptions, using MS DTC, offer the fastest overall time, but they
do not offer the key benefit of being able to handle offline scenarios or the built-in
conflict handling of queued updating subscriptions. For more information about queued
updating subscriptions, see SQL Server 2000 Books Online.
In general, using a SQL Server queue is faster than using Message Queuing: Not only
are the changes written to a queue faster, but the Queue Reader Agent is able to read
the queue and apply changes faster at the Publisher. As mentioned in "Transactional
Replication with Updatable Subscriptions," you should weigh the functional differences
and advantages of using Message Queuing against those of using a SQL Server table as
a queue. Furthermore, if you choose Message Queuing, you must be running Windows
2000 or later, because SQL Server 2000 replication requires Message Queuing version
2.0.
Adding Hardware in Transactional Replication
Adding another processor to your replication environment can improve throughput
performance for both the Log Reader Agent and the Distribution Agent. Using the
HALFTYPES example in a remote Distributor environment, there is an average 26
percent increase in Log Reader Agent throughput and an average 3 percent increase in
Distribution Agent throughput when adding an extra processor to the remote Distributor.
Although the Distribution Agent shows minimal performance gain when adding another
processor, the HALFTYPES example has only one subscription with one Distribution
Agent, and it has only minimal impact on the CPU. In a real-world scenario, a publication
usually has multiple subscriptions with multiple Distribution Agents running, so the
performance gains are larger and more obvious because both agents use multiple
threads. The Log Reader Agent, which is only slightly more CPU-intensive than the
Distribution Agent, also sees higher throughput when another processor is added to a
remote Distributor that has multiple Log Reader Agents running.
When you are running a local Distributor, so that the Distribution Agent shares the CPU
with the Log Reader Agent and the OLTP load, the Distribution Agent shows a much
larger increase in throughput of 15 percent when a second processor is added. Log
Reader throughput increases a substantial 18 percent when a second processor is
added.
Furthermore, in a high-traffic OLTP environment, adding another processor reduces the
contention between the server activity and the Log Reader Agent, improving server
performance and Log Reader Agent throughput. Other factors that can affect
performance include available memory, the size and location of the database log
(including autogrow), and any row filtering of articles.
Effects of Network Connection Speeds on Transactional Replication
Although the Publisher and Distributor generally communicate on a fast WAN or LAN,
connections to Subscribers are often not as fast. The following test data shows
Distribution Agent performance in terms of commands per second, using various
connection speeds and rows for each command that average approximately 1,024 bytes.
Each test consisted of 3,000 transactions with 10 commands per transaction, for a total
of 30,000 commands. The environment consisted of a Publisher and a remote Distributor
connected across a fast link and a Subscriber connected at various speeds using a
network throttle, a device that can emulate slower connection speeds. Note that
emulated connections are somewhat slower than real dialup or leased-line connections,
because there is no packet optimization.
Figure 3: Distribution Rate Using Connection Speeds of 28.8 Kbps, 56 Kbps, and
100 Kbps
Distribution throughput is almost linear as greater network bandwidth is provided; if a
Subscriber is upgraded from a 28.8-kilobits-per-second (Kbps) dialup line to a 100 Kbps
connection, three times as many commands can be delivered in the same period.
However, as indicated by the following graph, eventually the network is no longer the
bottleneck, and increasing the network bandwidth does not provide additional
performance gain.
Figure 4: Distribution Rate Using Connection Speeds of 1 Megabit Per Second
(Mbps), 10 Mbps, and 100 Mbps
SQL Server 2000 introduces new Net-Libraries to be used for highly reliable, fast,
efficient data transfer between servers in the same data center. These new Net-Libraries
contain functionality for different hardware sets based on the Virtual Interface
Architecture (VIA). Currently, SQL Server 2000 supports hardware from Giganet and
Servernet.
In tests, using Servernet over a 1-Gbps connection between the Distributor and the
Subscriber improved the delivery of INSERT commands by 12 percent and UPDATE
commands by 8 percent when compared to a 100-Mbps connection. These increases would
have been larger if larger amounts of data had been created and distributed or if there
had been multiple Subscribers. This is because the 1-Gbps environment provides the
possibility of scaling to a very large number of Subscribers, each receiving large volumes
of data. Bottlenecks in this scenario could include writing the commands to disk at the
Subscriber and the Subscriber's physical processing power, so it is important to have an
adequate disk subsystem and CPU at the Subscriber.
Replication Command Types
By default, the Distribution Agent applies transactions at Subscribers by executing
autogenerated custom stored procedures. The procedures themselves are created at the
Subscriber; the Log Reader Agent writes the corresponding procedure calls to the
MSrepl_commands table in the distribution database. For example, instead of applying
the original INSERT statement that was executed at the Publisher, the Distribution Agent
executes an INSERT stored procedure at the Subscriber to perform the same action.
These stored procedures can be further customized (for example, to maintain aggregate
tables), which is generally better than adding Subscriber-specific logic in triggers.
Using custom stored procedures can provide performance improvements for the
Distribution Agent because the stored procedure’s plan is cached and reused at the
Subscriber, and in most cases the amount of data passed over the network can be
smaller. Generally, Log Reader Agent performance is the same whether dynamic SQL or
custom stored procedures are used; however, UPDATE commands may see increased
throughput if the column names being updated are very long, because those names
would otherwise have to be written into the dynamic UPDATE statement.
Using the HALFTYPES example, executing custom stored procedures provided better
performance than using dynamic SQL. The greatest performance benefit is with UPDATE
commands, which ran 1.8 times faster.
Distribution Agent Speed (Dynamic vs. Default Custom Stored Procedures)

Command    Number of commands    Dynamic SQL (cmd/sec)    Default custom stored procedure (cmd/sec)
INSERT     30,000                1,118                    1,230
UPDATE     30,000                882                      1,570
DELETE     30,000                3,950                    3,950
CALL, MCALL, and XCALL
When used in custom stored procedures, the CALL, MCALL, and XCALL syntaxes vary in
the amount of data propagated to the Subscriber during transactional replication. The
CALL syntax, which can be used for INSERT, UPDATE, and DELETE statements, passes
all values for all inserted and deleted columns. The MCALL syntax, which can only be
used for UPDATE statements, passes values for affected columns, NULL for unaffected
columns, and a bitmap parameter, which identifies which columns have been modified.
The XCALL syntax, used for UPDATE and DELETE statements, passes values for all
columns, whether changed or not, and includes the previous values of all columns.
By default, CALL is used for INSERT and DELETE commands, and MCALL is used for
UPDATE commands, because doing so provides the best overall performance for the Log
Reader Agent and the Distribution Agent.
Unless every column in a table is updated, MCALL provides better performance than
CALL and XCALL. Currently, XCALL is used by DTS replication, and it can be used for
custom applications that require the previous values. CALL exists for backward
compatibility, because UPDATE statements used CALL syntax when it was first
introduced in Microsoft SQL Server 6.5. MCALL stored procedures support the same
types of customization as CALL stored procedures, so MCALL is recommended if
backward compatibility is not a concern.
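The syntax is chosen per operation when the article is defined (or changed later with sp_changearticle). The following sketch uses placeholder names; the sp_MSins, sp_MSupd, and sp_MSdel procedures are generated automatically when the snapshot is created.

-- Hypothetical sketch: specify the command syntax for each operation.
EXEC sp_addarticle
    @publication   = N'HALFTYPES_pub',
    @article       = N'HALFTYPES',
    @source_object = N'HALFTYPES',
    @ins_cmd       = N'CALL sp_MSins_HALFTYPES',
    @upd_cmd       = N'MCALL sp_MSupd_HALFTYPES',   -- or XCALL when before images are required
    @del_cmd       = N'CALL sp_MSdel_HALFTYPES'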
The HALFTYPES example showed the difference in throughput when using dynamic
SQL, CALL, MCALL, and XCALL. The results are expressed in commands per second. As
mentioned earlier, CALL is the default for INSERT and DELETE statements, and MCALL is
the default for UPDATE statements.
Distribution Agent (Dynamic SQL vs. Stored Procedures)

Command    Number of commands    Dynamic SQL    CALL     MCALL             XCALL
INSERT     30,000                1,137          1,245    Not applicable    1,245
UPDATE     30,000                855            1,054    1,534             736
DELETE     30,000                4,118          4,254    Not applicable    1,261
The following example shows a generated statement using dynamic SQL, CALL, MCALL,
and XCALL stored procedures for an UPDATE statement. For simplicity and to save
space, the authors table in the pubs database was used. You can call
sp_browsereplcmds from within the distribution database to view the generated
statements for your publications. For more information about sp_browsereplcmds, see
SQL Server Books Online.
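As a minimal sketch (the distribution database is assumed to use the default name), the procedure can be run as follows; optional parameters such as @xact_seqno_start, @xact_seqno_end, and @article_id narrow the result set.

-- Hypothetical sketch: list the replicated commands awaiting delivery.
USE distribution
EXEC sp_browsereplcmds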
The following code shows an UPDATE statement made at the Publisher:
UPDATE authors SET phone = '425 882-8080' WHERE au_id = '172-32-1176'
The following table shows the UPDATE statement as it is stored at the Distributor.
Equivalent UPDATE Statement Stored Within the Distribution Database

Dynamic SQL:
{UPDATE "authors" SET "phone"='425 882-8080' where "au_id"="172-32-1176"}

MCALL:
{CALL sp_MSupd_authors (NULL, NULL, NULL, '425 882-8080', NULL, NULL, NULL, NULL, 0, '172-32-1176', 0x0800)}

CALL:
{CALL sp_MSupd_authors ('172-32-1176', 'White', 'Johnson', '408 496-7223', '10932 Bigge Rd.', 'Menlo Park', 'CA', '94025', 1, '172-32-1176')}

XCALL:
{CALL sp_MSupd_authors ('172-32-1176', 'White', 'Johnson', '408 496-7223', '10932 Bigge Rd.', 'Menlo Park', 'CA', '94025', 1, '172-32-1176', 'White', 'Johnson', '425 882-8080', '10932 Bigge Rd.', 'Menlo Park', 'CA', '94025', 1)}
sp_scriptdynamicupdproc
Introduced in SQL Server 2000 SP1, the sp_scriptdynamicupdproc stored procedure
generates a custom update stored procedure in which the UPDATE statement is built
dynamically at run time, based on the MCALL syntax for indicating which columns to
change. This approach becomes more beneficial as the number of indexes on the
subscribing table increases while only a small number of columns actually changes.
The default MCALL scripting logic includes all columns within the UPDATE statement,
using a bitmap to determine which columns were changed. If a column has not changed,
the column is set back to itself, which is not an issue in most cases; however, if several
columns are indexed, extra processing starts to creep in. If there are several indexes on
a subscribing table for which only a few column values are changing, index maintenance
overhead increases, and this may limit the rate at which changes can be applied.
Building the update statement dynamically at run time includes only the columns that
have changed, providing an optimal update string. The tradeoff is extra processing
incurred at run time to build the dynamic UPDATE statement.
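A hedged sketch of generating and deploying the dynamic procedure follows; the article id is a placeholder taken from sysarticles in the published database.

-- Hypothetical sketch: script the dynamic-update version of the custom UPDATE
-- procedure for article id 1, run in the published database at the Publisher.
EXEC sp_scriptdynamicupdproc @artid = 1
-- The procedure returns the CREATE PROCEDURE script as its result set; execute
-- that script in the subscription database so the Distribution Agent calls the
-- new procedure for UPDATE commands.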
HALFTYPES Table with a Varying Number of Indexes

Number of indexes                                              Distributor throughput (cmd/sec)
Default MCALL, 1 clustered index                               1,570
Default MCALL, 1 clustered index, 10 nonclustered indexes      110
New dynamic MCALL with sp_scriptdynamicupdproc,
1 clustered index, 10 nonclustered indexes                     1,180
Log Reader Agent Properties
The default values for the Log Reader Agent are optimal under many circumstances;
however, performance can be enhanced by:
• Reducing the -OutputVerboseLevel property to 0 except during initial testing, monitoring, or debugging. This reduces the amount of output information that is displayed and can improve performance by 5 percent when reducing the value from 2 to 0.
• Reducing the -HistoryVerboseLevel property to 0 except during initial testing, monitoring, or debugging. This eliminates the logging of history and can improve performance by 5 percent when reducing the value from 2 to 0.
• Increasing the -ReadBatchSize property. Although the default value of 500 is optimal, increasing the value twofold to tenfold for tests containing smaller transactions (one or two commands) improved performance by 5-15 percent; the improvement was negligible for tests containing larger transactions (10+ commands).
• Setting the -ReadBatchThreshold property to 0. The default value is 0, which means the Log Reader Agent reads to the end of the log or until it reaches the value set in -ReadBatchSize. (SQL Server Books Online lists an incorrect default value of 100.) Setting it to any other value may reduce performance.
• Reducing the -PollingInterval setting. Reducing the polling interval can improve the latency of transactions from the log to the distribution database because the Log Reader Agent queries the transaction log more often.
• Altering the -MaxCmdsInTran setting to improve elapsed time and latency when handling transactions that contain a large number of commands. This parameter is new in SQL Server 2000 SP1.
-MaxCmdsInTran allows the Log Reader Agent to break transactions consisting of a
large number of commands into smaller transactions, or chunks, which reduces
blocking at the Distribution Agent. The Distribution Agent can start processing early
chunks while the Log Reader Agent is working through the later chunks of the same
transaction, thus improving parallelism between the two agents. However, using this
property also means these chunks are committed at the Subscriber as individual
transactions. In theory, using the -MaxCmdsInTran property breaks the atomicity
rule, which states that a transaction must be committed as all or nothing. This is not
necessarily problematic because the transaction has already been committed at the
Publisher, but users should be aware of this aspect of using -MaxCmdsInTran.
As an example of using -MaxCmdsInTran, consider a transaction consisting of 10
million deletes. When this transaction was tested against the HALFTYPES table, the
total elapsed time for processing at Log Reader and the Distribution Agent was one
hour and 45 minutes. Log Reader Agent throughput was 5,089 commands per
second, and Distribution Agent throughput was 2,466 commands per second. In this
case, the Distribution Agent did not start until the Log Reader Agent had completed.
Using the -MaxCmdsInTran argument at the Log Reader Agent, set to a value of
10,000, total elapsed time for the two agents was a little under one hour. Instead of
replicating one transaction, the Log Reader Agent created 1,000 transactions, each
consisting of 10,000 commands. Log Reader Agent throughput was 3,809 commands
per second and Distribution Agent throughput was 2,390 commands per second.
Although Log Reader Agent throughput was reduced, performance was improved
because overall elapsed time and latency for each transaction was also reduced.
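As a sketch of how several of these parameters combine on the Log Reader Agent command line (server and database names are placeholders; the string is appended to the agent job step command or passed when the agent is run from the command prompt):

-- Hypothetical sketch: Log Reader Agent switches reflecting the tuning above.
DECLARE @logreader_params nvarchar(500)
SET @logreader_params =
      N'-Publisher PUBSRV -PublisherDB SalesDB -Distributor DISTSRV '
    + N'-OutputVerboseLevel 0 -HistoryVerboseLevel 0 '
    + N'-ReadBatchSize 1000 -PollingInterval 3 -MaxCmdsInTran 10000'
PRINT @logreader_params   -- paste into the Log Reader Agent job step command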
Distribution Agent Properties
The default values for the Distribution Agent are optimal under many circumstances;
however, performance can be enhanced by:
• Reducing the -OutputVerboseLevel property to 0 except during initial testing, monitoring, and debugging. This reduces the amount of output information that is displayed and can improve performance by 5 percent.
• Reducing the -HistoryVerboseLevel property to 0 except during initial testing, monitoring, and debugging. This eliminates the logging of history and can improve performance by 5 percent.
• Increasing the -CommitBatchSize and -CommitBatchThreshold properties. Although the default values of 100 for -CommitBatchSize and 1,000 for -CommitBatchThreshold are optimal, in tests run at Microsoft, increasing the values twofold to tenfold improved performance by 5 percent for INSERT commands, 10-15 percent for UPDATE commands, and 30 percent for DELETE commands. In the test scenario, the Distribution Agent ran independently of the Log Reader Agent; if the two agents run concurrently and the values are set too high, Distribution Agent performance can decrease while the Log Reader Agent is running.
• Reducing the -PollingInterval setting. Reducing the polling interval can improve the latency of transactions from the distribution database to the Subscriber, because the Distribution Agent queries the distribution database more frequently. Reducing the polling interval on both the Log Reader Agent and the Distribution Agent can improve latency between the Publisher and the Subscriber. Using a low value on a slower network connection is not recommended.
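As with the Log Reader Agent, these switches are appended to the agent command line. A sketch follows; server, database, and parameter values are placeholders to be validated against your own workload.

-- Hypothetical sketch: Distribution Agent switches reflecting the tuning above.
DECLARE @distrib_params nvarchar(500)
SET @distrib_params =
      N'-Publisher PUBSRV -PublisherDB SalesDB -Distributor DISTSRV '
    + N'-Subscriber SUBSRV -SubscriberDB SalesDB_sub '
    + N'-OutputVerboseLevel 0 -HistoryVerboseLevel 0 '
    + N'-CommitBatchSize 200 -CommitBatchThreshold 2000 -PollingInterval 3'
PRINT @distrib_params     -- paste into the Distribution Agent job step command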
Transactional Replication Scalability
Because transactional replication is often used in scale-out scenarios, it is crucial to
understand the ways in which the type of agent, Distributor, and subscription can affect
scalability. This section of the paper examines the use of remote Distributors, pull
subscriptions, and independent agents. It also considers Distributor delivery rates,
latency, and dequeuing rates for queued updating subscriptions.
Note If the number of Subscribers in your topology is very large, or the
Subscribers share a fast network but are separated from the Publisher and
Distributor by a slow link, design a multi-tiered replication topology. The root server
should be the Publisher for middle-tier Subscribers, which in turn should republish
the data to lower-level Subscribers. There are some limitations to a topology based
on republishing, such as not being able to use updating Subscribers, but republishing
can be a good choice for a scale-out scenario. For more information about
republishing, see SQL Server Books Online.
In the tests that follow, unless otherwise stated the hardware configuration consisted of
two Compaq Proliants with RAID disk subsystems: the Publisher was a Pentium Xeon
550 MHz quad processor with 640 MB memory and a 512-KB level 2 cache, and the
Distributor was a quad processor Pentium II 200 MHz with 3 GB memory. The
publication database and publication contained a single table with a column for every
data type (except text, ntext, image, and sql_variant). The replication topology
consisted of a Publisher, a remote Distributor, and multiple Subscribers. The
subscriptions were distributed evenly between the Subscribers. Network connection was
over a 100-Mbps LAN using TCP/IP.
Using a Local or Remote Distributor
If the Publisher is expected to be a busy OLTP server, or if it is already CPU intensive or
even I/O intensive, place the Publisher and Distributor on separate computers. This
supports future scaling and capacity planning because multiple Publishers can use the
same Distributor. Here are some examples of how transactional replication can affect
OLTP activity:
• Replication agents require a certain amount of memory while executing. Multiple Log Reader Agents and Distribution Agents can consume significant amounts of memory and CPU cycles.
• The Log Reader Agent writes commands to the distribution database, so multiple Log Reader Agents servicing multiple published databases and writing to the distribution database can consume many CPU cycles and increase disk I/O.
• If there are multiple Distribution Agents, using a local Distributor slows the overall performance of the server.
• The cleanup tasks that are run as a maintenance activity on the distribution database can become expensive and involve significant disk activity.
Using the HALFTYPES example with a single Log Reader Agent and Distribution Agent,
there was a performance benefit in using a remote Distributor. Using the throughput of a
local Distributor as a baseline, the Log Reader Agent on a remote Distributor was
approximately 1.3 times faster. The Distribution Agent on a remote Distributor for the
same tests was approximately 1.47, 1.1, and 1.15 times faster for INSERT, UPDATE,
and DELETE commands, respectively.
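As a rough sketch of how a remote Distributor is put in place (server names here are
hypothetical), the Distributor is configured first and the Publisher is then pointed
at it:
-- At the remote Distributor: make this server a Distributor and create the
-- distribution database.
EXEC sp_adddistributor @distributor = N'DISTSERVER', @password = N'StrongPassword'
EXEC sp_adddistributiondb @database = N'distribution'
-- Still at the Distributor: register the Publisher that will use this Distributor.
EXEC sp_adddistpublisher @publisher = N'PUBSERVER', @distribution_db = N'distribution'
-- At the Publisher: point the Publisher at the remote Distributor.
EXEC sp_adddistributor @distributor = N'DISTSERVER', @password = N'StrongPassword'
In practice, these steps are usually performed through the Configure Publishing and
Distribution Wizard in SQL Server Enterprise Manager.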
Using Pull Subscriptions
The Distribution Agent runs on the Distributor for push subscriptions and on Subscribers
for pull or anonymous subscriptions. Using pull or anonymous subscriptions can increase
performance by moving Distribution Agent processing from the Distributor to
Subscribers.
Anonymous subscriptions, which are especially useful for Internet applications, do not
require information about the Subscriber to be stored in the distribution database at the
Distributor for transactional replication. Not having to maintain information on
Subscribers using anonymous subscriptions reduces the resource demands on the
Publisher and Distributor.
Anonymous subscriptions are a special category of pull subscriptions. In regular pull
subscriptions, the Distribution Agent runs at the Subscriber (thereby reducing the
resource demands on the Distributor), but it still stores information at the Publisher.
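For illustration, the following is a minimal sketch of creating an anonymous pull
subscription at the Subscriber. The server, database, and publication names are
hypothetical, and security and scheduling parameters are omitted; the publication must
have been created to allow pull and anonymous subscriptions.
-- Run at the Subscriber, in the subscription database.
EXEC sp_addpullsubscription
    @publisher = N'PUBSERVER',
    @publisher_db = N'SalesDB',
    @publication = N'SalesPub',
    @subscription_type = N'anonymous'
-- Create the Distribution Agent job that runs at the Subscriber.
EXEC sp_addpullsubscription_agent
    @publisher = N'PUBSERVER',
    @publisher_db = N'SalesDB',
    @publication = N'SalesPub',
    @distributor = N'DISTSERVER'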
Using Independent Agents
An independent agent is an agent that services a single publication/subscription pair.
Using independent agents reduces latency, because the agent is ready whenever the
subscription needs to synchronize.
A shared agent, on the other hand, services multiple publication/subscription pairs
within a Publisher database and Subscriber database. When multiple subscriptions using
the same shared agent need to synchronize, they wait in a queue, and the shared agent
services them one at a time.
A shared agent is the default for transactional replication, because independent agents
cannot guarantee transactional consistency when separate transactions are dependent
on each other but are handled by different independent agents. Consider the following
example: transaction T1 updates all rows in article A1 in publication P1, and
transaction T2 in publication P2 then bases its changes on the results of a SELECT
from A1. With a shared agent this presents no problem, because the shared agent is
aware of both transactions and commits them in order. Independent agents, however, are
not aware of each other's transactions, so there is no guarantee that T1 will be
processed before T2. If there are no dependencies between transactions handled by
different agents, independent agents let you retain transactional consistency while
reducing latency.
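The choice of a shared or an independent agent is made when the publication is created.
The following is a minimal sketch (the publication name is hypothetical) of creating a
transactional publication that is serviced by its own Distribution Agent:
-- Run at the Publisher, in the publication database.
EXEC sp_addpublication
    @publication = N'SalesPub',
    @repl_freq = N'continuous',
    @independent_agent = N'true'
When @independent_agent is left at its default of 'false', a single Distribution Agent
is shared by all subscriptions between the same publication database and subscription
database, as described above.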
Distribution Delivery Rates
In the test discussed in this section, the distribution delivery rate as a function of the
number of subscriptions was examined. The transaction rate used was an average of
eight transactions per second, with an average of five commands per transaction. This
amounts to 1,000,000 to 2,000,000 commands per day, with equal ratios of INSERT,
UPDATE, and DELETE commands. All the Distribution Agents were configured as pull,
and they were run concurrently.
In this scenario, neither the Publisher nor the Distributor was CPU stressed. In fact,
because the subscriptions were pull subscriptions, which means that Distribution Agents
were run at the Subscribers, the Distributor seldom ran above 25 percent CPU usage
with 128 concurrent Subscribers.
Figure 5: Change in Delivery Rate As Number of Subscribers Increases
Log Reader Agent performance is essentially unaffected by the number of subscriptions,
especially when a remote Distributor is used. When a remote Distributor was used, the
cost in throughput to the Log Reader was 30-40 percent.
When the number of Subscribers was scaled from 1 to 128, there was only a 6.2 percent
drop in commands per second, at a cost of approximately 1 percent per 48 additional
Subscribers.
Distribution Delivery Latency
Delivery latency as a function of the number of subscriptions was also examined, using
the same scenario as in the distribution delivery rates example.
Figure 6: Delivery latency
Average latency increased from three to six seconds when 128 concurrent pull
Distribution Agents were running, so all 128 subscribers were on average only six
seconds behind the publishing database. However, in the previous scenario, the Log
Reader Agent and the Distribution Agent were able to keep up with the transaction rate
at the Publisher; given a heavier load, a slower network, or slower Subscriber
computers, greater latencies can be expected.
Dequeuing Rates for Queued Updating Subscribers
The next scenario tested replication scalability by examining the effect on the dequeuing
rate of adding additional Subscribers. In this test, a single Queue Reader Agent serviced
all the queues (both SQL Server queues and Message Queuing) for a given publication.
The dequeuing rate (using a SQL Server queue) as a function of the number of
subscriptions is examined in the following chart. The publication again consisted of a
single table to which multiple Subscribers subscribed. Hardware used for this test was two
dual-processor Xeon 550 MHz computers with 512 MB memory for both the Publisher
and the Distributor. The Subscribers were simulated using the two Xeon computers over
a 100-Mbps LAN.
Figure 7: Dequeuing rate
The dequeuing rate decreased by 13 percent for 20 concurrent Subscribers and by 27
percent for 32 concurrent Subscribers. However, in this stress situation an average
dequeuing rate of 140 commands per second was maintained; under normal conditions
it is unlikely that all Subscribers would have such a high number of changes in the local
queue. With Message Queuing, performance was about 10 percent lower because Message
Queuing uses MS DTC as its transaction coordinator, which adds some overhead.
Transactional Subscriber Latency Rates
With transactional replication, a Subscriber can remain only a few seconds behind the
Publisher. At that latency, the Subscriber can easily be used as a reporting server,
offloading expensive user queries and reporting from the Publisher to the Subscriber.
In the following scenario (using the Customer table shown later in this section), the
Subscriber was never more than four seconds behind the Publisher. Even more impressive,
60 percent of the time it had a latency of two seconds or less. The time is measured
from when the record was inserted or updated at the Publisher until it was actually
written to the subscribing database.
Figure 8: Transactional Subscriber Latency

UPDATE commands                         INSERT commands
Latency (seconds)   Number of rows      Latency (seconds)   Number of rows
4                   5,528               4                   1,318
1                   21,359              3                   14,984
3                   30,359              2                   39,563
2                   42,754              1                   44,135
TOTAL               100,000             TOTAL               100,000
This scenario used a separate Publisher, Distributor, and Subscriber across a 10-Mbps
LAN, using identical computers: Dell Precision 610, dual-processor 450 MHz, 256 MB of
memory, and two SCSI hard drives. Commands were applied to the Subscriber with the
autogenerated replication stored procedures, using a pull subscription. Completing
100,000 inserts at the Publisher took 304 seconds, and 100,000 updates took 315
seconds, or roughly 325 commands per second.
The Subscriber was in synchronization within 4 seconds of the Publisher. During the
entire process, the Publisher’s CPU rarely moved above 30 percent utilization, the
Distributor 14 percent, and the Subscriber 11 percent. Given that there was plenty of
CPU capacity at the Distributor and that these are not high-end production servers,
more Subscribers can be added with similar latencies.
The Customer Table (Used in the Latency Test)

Column    Data type          NULL   Default                Typical data
Cust_Id   int                       IDENTITY(0,1)          50000
Lname     varchar(30)                                      HALL
Fname     varchar(30)                                      NEWMAN
DOB       smalldatetime      Yes                           1966-01-01 00:00:00
State     char(2)            Yes                           WA
Email     varchar(30)        Yes                           someone@microsoft.com
Tel       varchar(15)        Yes                           425 555-0100
Zip       varchar(4)         Yes                           98050
Rating    smallint           Yes                           20
ROWGUID   uniqueidentifier          newid() (ROWGUIDCOL)   FF1452C0-EF54-47AA-8CCF-880F7C6F246A
Tran_Dt   datetime           Yes                           2000-01-01 00:00:00
Updated                      Yes
The primary key is the Cust_Id column, and it is clustered. Both the Publisher and
Subscriber include two nonclustered indexes: one on State (nonunique) and one on
ROWGUID (unique).
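For reference, the following is a minimal sketch of the Customer table as described
above; the data type of the Updated column and the exact nullability of Lname and
Fname are assumptions.
CREATE TABLE CUSTOMER
(
    Cust_Id  int IDENTITY(0,1) NOT NULL CONSTRAINT PK_CUSTOMER PRIMARY KEY CLUSTERED,
    Lname    varchar(30) NOT NULL,
    Fname    varchar(30) NOT NULL,
    DOB      smalldatetime NULL,
    State    char(2) NULL,
    Email    varchar(30) NULL,
    Tel      varchar(15) NULL,
    Zip      varchar(4) NULL,
    Rating   smallint NULL,
    ROWGUID  uniqueidentifier ROWGUIDCOL NOT NULL DEFAULT NEWID(),
    Tran_Dt  datetime NULL,
    Updated  datetime NULL  -- data type assumed
)
CREATE NONCLUSTERED INDEX IX_CUSTOMER_STATE ON CUSTOMER (State)
CREATE UNIQUE NONCLUSTERED INDEX IX_CUSTOMER_ROWGUID ON CUSTOMER (ROWGUID)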
Factors Affecting Transactional Delivery Rates
In most cases, the Subscriber is the bottleneck, because data cannot be written or
applied quickly enough. Factors include the following:
• Subscriber's physical computer:
  • Slower processor and/or fewer processors.
  • Low processor availability and/or high processor load.
  • Low amount of available memory.
  • Slower disk subsystem.
• Subscription database or SQL Server setup:
  • Database log not on a separate disk.
  • Database on a RAID 5 disk array (RAID 10 provides better performance).
  • SQL Server memory available, and whether it is dynamic or fixed: the amount of
    memory that is appropriate, and whether it should be fixed or dynamically
    allocated, depends on your application.
  • SQL Server network protocol used: TCP/IP is generally slightly faster than other
    network protocols.
  • Use of SQL Server Personal Edition, which is generally slower.
  • Use of Windows 98 or Windows Millennium Edition, which are generally slower.
• Network speed or connection:
  • The Subscriber can become I/O bound if a very fast network (100 Mbps or faster) is
    used and the Subscriber has a slower disk subsystem or its log is not on a
    separate disk.
  • Reliability of the connection: more retries may be necessary if the connection is
    unreliable.
• Different indexes exist on the Subscriber. Often a reporting server is heavily
  indexed, and index management results in more I/O. Using the CUSTOMER table shown
  earlier, the average latency increases to 4 seconds (with a maximum of 6 seconds)
  when four nonclustered indexes are added at the Subscriber on [Lname, Fname], [DOB],
  [Email], and [Tel].
• User triggers firing at the Subscriber. Subscriber triggers not marked NOT FOR
  REPLICATION are fired for each relevant operation. Because triggers frequently use
  the inserted and/or deleted tables and often perform other operations, the costs can
  be dramatic. By moving the trigger code into a custom stored procedure, some of the
  costs can be avoided. Using the earlier CUSTOMER example: it takes 88 seconds, or
  1,140 commands per second, for the Distribution Agent to deliver 100,000 insert
  commands to the Subscriber. The Subscriber has the following insert trigger defined:
CREATE TRIGGER CUSTOMER_INS_TRG ON CUSTOMER FOR INSERT
AS
INSERT INTO BADRATINGS (Id, Cust_Id, Rating, Rating_Dt)
SELECT NewId(), Cust_Id, Rating, GetDate()
FROM inserted
WHERE (Cust_Id % 3) = 0
Then the trigger is dropped and the relevant code is added to the autogenerated insert
stored procedure, which is called by the Distribution Agent:
CREATE PROCEDURE sp_MSins_CUSTOMER...
...
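-- In the autogenerated procedure, @c1 carries the Cust_Id value and @c9 the Rating
-- value for the row being inserted.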
IF((@c1 % 3) = 0)
INSERT INTO BADRATINGS (Id, Cust_Id, Rating, Rating_Dt)
SELECT NewId(), @c1, @c9, GetDate()
It now takes only 52 seconds (1,932 commands per second) for the Distribution Agent
to deliver 100,000 commands to the Subscriber. This is 1.7 times faster than using the
trigger, dramatically improving latency and throughput.
If user triggers are still required (to trap local data changes made by users, for
example), they should be marked NOT FOR REPLICATION, as shown in the sketch below. The
triggers then fire only when local data changes are made by users.
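For example, the earlier trigger could be re-created with the NOT FOR REPLICATION
option, so that it fires for local user inserts but not for rows applied by the
Distribution Agent:
CREATE TRIGGER CUSTOMER_INS_TRG ON CUSTOMER FOR INSERT
NOT FOR REPLICATION
AS
INSERT INTO BADRATINGS (Id, Cust_Id, Rating, Rating_Dt)
SELECT NewId(), Cust_Id, Rating, GetDate()
FROM inserted
WHERE (Cust_Id % 3) = 0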

• Replicating stored procedure execution. SQL Server can replicate the execution of
stored procedures rather than the data changes caused by the execution of those
stored procedures. This is useful in replicating the results of maintenance-oriented
stored procedures that may affect large amounts of data. Replicating the changes as
one stored procedure statement can greatly increase the efficiency of your
application, but this feature should be used with care.
Each time a published stored procedure is executed at the Publisher, the execution
and the parameters passed to it are forwarded to each Subscriber to the publication.
The stored procedure is then executed with these parameters at the Subscriber. This
is vastly different from the Log Reader Agent picking up the changes in the log (for
possibly thousands of rows), building an SQL statement for each row, and then having
them all applied to the Subscriber.
Using the CUSTOMER table example (with an existing 100,000 rows) earlier in this
paper, the following stored procedure was executed at the Publisher:
CREATE PROCEDURE PROC_CUSTOMER_ADMIN_RATING @DOB smalldatetime
AS
UPDATE CUSTOMER
SET Rating = Rating + 1
WHERE DOB < @DOB
Executing EXEC PROC_CUSTOMER_ADMIN_RATING '1966-01-01' resulted in 59,972
rows being updated, picked up by the Log Reader Agent and written to the
distribution database. The Distribution Agent then applies 59,972 updates to the
Subscriber, which takes one minute and 51 seconds to complete. In contrast, when
replicating the execution of the stored procedure, only the actual EXEC statement is
written to the distribution database and is then executed at the subscribing
database. This takes only 1.7 seconds. Therefore, replicating stored procedure
execution both reduces the volume of commands requiring forwarding to Subscribers
and increases the performance of your application by executing fewer dynamic SQL
statements at each Subscriber.
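As a rough sketch (the publication name is hypothetical), the execution of the stored
procedure is published by adding the procedure to the publication as a
procedure-execution article:
EXEC sp_addarticle
    @publication = N'SalesPub',
    @article = N'PROC_CUSTOMER_ADMIN_RATING',
    @source_object = N'PROC_CUSTOMER_ADMIN_RATING',
    @type = N'serializable proc exec'
The 'serializable proc exec' option replicates the execution only when the procedure
runs inside a serializable transaction and is the safer choice; 'proc exec' replicates
every execution but offers weaker consistency guarantees.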
Conclusion
Transactional replication in SQL Server 2000 is a mature technology that offers high
performance and scalability and is suitable for the most demanding enterprise
applications. Transactional replication performs well with its default behavior and
settings, but it can clearly benefit from performance tuning based on the specific needs
of your replication topology and applications. Following the examples and suggestions
outlined in this paper can help you take transactional replication performance to the next
level.