FAST Search Server 2010 for SharePoint
Capacity Planning
This document is provided "as-is". Information and views expressed in this document, including
URL and other Internet Web site references, may change without notice. You bear the risk of
using it.
Some examples depicted herein are provided for illustration only and are fictitious. No real
association or connection is intended or should be inferred.
This document does not provide you with any legal rights to any intellectual property in any
Microsoft product. You may copy and use this document for your internal, reference purposes.
© 2010 Microsoft Corporation. All rights reserved.

FAST Search Server 2010 for SharePoint
Capacity Planning
Microsoft Corporation
June 2010
Applies to: FAST Search Server 2010 for SharePoint
Summary: This white paper describes specific deployments of FAST Search Server 2010 for
SharePoint, including:
• Test environment specifications, such as hardware, farm topology, and configuration
• The workload that is used for data generation, including the number and class of users, and farm usage characteristics
• Test farm dataset, including search indexes and external data sources
• Health and performance data that is specific to the tested environment
• Test data and recommendations for how to determine the hardware, topology, and configuration that you need to deploy a similar environment, and how to optimize your environment for appropriate capacity and performance characteristics.
Contents
Introduction
Search Overview
  Sizing Approach
  Search and Indexing Lifecycle
    Content feeding
    Query load
    Network traffic
  Web Analyzer performance dimensioning
Scenarios
  Medium FAST Search Farm
    Deployment alternatives
    Specifications
    Test Results
    Overall Takeaways
Troubleshooting performance and scalability
  Raw I/O performance
  Analyzing feed and indexing performance
    The feed and indexing processing chain
    Content SSA
    Content distributors and item processing
    Indexing dispatcher and indexers
  Analyzing query performance
    Query SSA
    QRproxy and QRserver
    Query dispatcher
    Query matching
Introduction
This white paper provides capacity planning information for collaboration environment
deployments of FAST Search Server 2010 for SharePoint, which is referred to as FAST
Search Server. The white paper includes the following information for sample search farm
configurations:
• Test environment specifications, such as hardware, farm topology, and configuration
• The workload that is used for data generation, including the number and class of users and farm usage characteristics
• Test farm dataset, including search indexes and external data sources
• Health and performance data that is specific to the tested environment
The white paper also contains common test data and recommendations for how to determine
the hardware, topology, and configuration that you need to deploy a similar environment, and
how to optimize your environment for appropriate capacity and performance characteristics.
FAST Search Server contains a richer set of features and a more flexible topology model than
the search solution in earlier versions of SharePoint. Before you employ this architecture to
deliver more powerful features and functionality to your users, you must carefully consider the
effect on your farm’s capacity and performance.
When you read this white paper, you will understand how to:
• Define performance and capacity targets for your environment.
• Plan the hardware that is required to support the number and type of users and the features you intend to deploy.
• Design your physical and logical topology for optimum reliability and efficiency.
• Test, validate, and scale your environment to achieve performance and capacity targets.
• Monitor your environment for key indicators.
Before you read this white paper, you should read the following:
• Performance and capacity management (SharePoint Server 2010)
• Plan Farm Topology (FAST Search Server 2010 for SharePoint)
Search overview
Sizing approach
The scenarios in this white paper describe FAST Search Server test farms, with assumptions that
allow you to start planning for the correct capacity for your farm. To choose the right scenario,
you must consider the following questions:
1. Corpus Size: How much content has to be searchable? The total number of items should
include all objects: documents, web pages, list items, and so on.
2. Availability: What are the availability requirements? Do customers need a search
solution that can survive the failure of a particular server?
3. Content Freshness: How fresh do the search results need to be? How long after the
customer modifies the data do you expect searches to provide the updated content in the
results? How often do you expect the content to change?
4. Throughput: How many people will be searching over the content simultaneously? This
includes people typing in a query box, in addition to other hidden queries like Web Parts
automatically searching for data, or Microsoft Outlook 2010 Social Connectors requesting
activity feeds that contain URLs that need security trimming from the search system.
Search and indexing lifecycle
The scenarios allow you to estimate capacity at an early stage of the farm. Farms move through
multiple stages as content is crawled:
• Index acquisition: This is the first stage of data population, and it is characterized by:
  o Full crawls (possibly concurrent) of content.
  o Close monitoring of the crawl system to ensure that hosts being crawled are not a bottleneck for the crawl.
• Index maintenance: This is the most common stage of a farm. It is characterized by:
  o Incremental crawls of all content, detecting new and changed content.
  o For SharePoint content crawls, a majority of the changes that are encountered during the crawl are related to access right changes.
• Index cleanup: This stage occurs when a content change moves the farm out of the index maintenance stage, for example, when a content database or site is moved from one search service application to another. This stage is not covered in the scenario testing behind this white paper, but is triggered when:
  o A content source or start address or both are deleted from a search service application.
  o A host supplying content is not found by the content connector for an extended period of time.
Content feeding
Index acquisition
Feed performance, when new content is being added, is mainly determined by the configured
number of item processing components. Both the number of CPU cores and the speed of each
core affect the results. As a first-order approximation, a 1 GHz CPU core can process one
average-size (about 250 KB) Office document per second. For example, the M4 scenario, which
is discussed later in this white paper, has 48 CPU cores for item processing, each running at
2.26 GHz, for a total estimated throughput of 48 cores × 2.26 GHz ≈ 100 items per second, on
average.
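As a rough planning aid, this rule of thumb can be turned into a few lines of PowerShell. This is only a sketch of the approximation above, not a measured result; the core count and clock speed are the M4 values and should be replaced with the totals for your own item processing servers.

```powershell
# Rule-of-thumb feed rate: ~1 average (250 KB) Office item per second per 1 GHz
# of CPU core capacity available for item processing. Core count and clock
# speed below are the M4 values; replace them with your own totals.
$itemProcessingCores = 48
$coreClockGHz        = 2.26

$estimatedItemsPerSecond = $itemProcessingCores * $coreClockGHz
"Estimated processing rate: {0:N0} items per second" -f $estimatedItemsPerSecond

# Rough duration of a full crawl of a 40 million item corpus at that rate.
$corpusItems = 40e6
"Estimated full crawl time: {0:N0} hours" -f ($corpusItems / $estimatedItemsPerSecond / 3600)
```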
The following crawl rate graph (taken from the SharePoint administration reports) shows one
such crawl. The crawl rate varies depending on the type of the content. Most of the crawl is new
additions (labeled as "modified" in the graph).
Note:
The indicated feed rates might saturate content sources and networks during peak feeding
rate periods in the preceding crawl. See the Troubleshooting performance and scalability
section for further information about how to monitor feeding performance.
Index maintenance
Incremental crawls can consist of various operations.
• Access right (ACL) changes and deletes: These require near zero item processing, but they require high processing load in the indexer. Feed rates are higher than for full crawls.
• Content updates: These require full item processing in addition to more processing by the indexer compared to adding new content. Internally, such an update corresponds to a deletion of the old item and an addition of the new content.
• Additions: Incremental crawls contain newly discovered items. These have the same workload as index acquisition crawls.
Depending on the type of operation, an incremental crawl might be faster or slower than an
initial full crawl. It is faster in the case of mainly ACL updates and deletes, and it is slower in the
case of mainly updated items. Using a backup indexer might slow down the incremental crawl of
updated items further.
In addition to updates from the content sources, the index is also altered by internal operations:
• The FAST Search Server link analysis and click-through log analysis generate additional internal updates to the index. Example: A hyperlink in one item leads to an update of the anchor text information that is associated with the referenced item. Such updates have a load pattern that is similar to the ACL updates.
• At regular intervals, the indexer performs internal reorganization of index partitions and data defragmentation. Defragmentation is started every night at 3:00 A.M., although redistribution across partitions occurs whenever it is needed.
These internal operations imply that you can observe indexing activity outside intervals with
ongoing content crawls.
Query load
Index partitioning and query evaluation
The overall index is partitioned on two levels:
• Index columns: The complete searchable index can be split into multiple disjoint index columns when the complete index is too large to reside on one server. A query is evaluated against all index columns within the search cluster, and the results from each index column are merged into the final query hit list.
• Index partitions: Within each index column, the indexer uses a dynamic partitioning of the index to handle a large number of indexed items with low indexing and query latency. This partitioning is dynamic and handled internally on each index server. When a query is evaluated, each partition runs within a separate thread. The default number of partitions is five. To handle more than 15 million items per server (column), you have to change the number of partitions (and associated query evaluation threads). This is discussed in the Configuration for extended content capacity section.
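The resulting capacity arithmetic can be illustrated with a short PowerShell sketch: how many index columns a given corpus needs under the default configuration versus the extended content capacity configuration described later in this white paper. The per-column limits are the planning figures quoted in this document; the corpus size is an example input.

```powershell
# Index columns needed for a given corpus, using the planning limits quoted in
# this white paper: 15 million items per column with the default configuration
# (5 partitions), 40 million per column with the extended capacity
# configuration (10 partitions). The corpus size is an example input.
$totalItems = 40e6

$columnsDefault  = [math]::Ceiling($totalItems / 15e6)
$columnsExtended = [math]::Ceiling($totalItems / 40e6)

"Default configuration:  $columnsDefault index column(s), 5 partitions each"
"Extended configuration: $columnsExtended index column(s), 10 partitions each"
```

For a 40 million item corpus this gives three columns with the default configuration (as in the M1 scenario) or one column with the extended configuration (as in the M3 scenario).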
Query latency
Evaluation of a single query is schematically illustrated in the following figure.
CPU processing (light blue) is followed by waiting for disk access cycles (white) and actual disk
data read transfers (dark blue), repeated 2-10 times per query. This implies that the query
latency depends on the speed of the CPU, in addition to the I/O latency of the storage
subsystem.
A single query is evaluated separately and in parallel, across multiple index partitions in all index
columns. In the default five-partition configuration, each query is evaluated in five separate
threads.
Query throughput
When query load increases, multiple queries are evaluated in parallel, as indicated in the
following figure.
Because different phases of the query evaluation occur at different times, simultaneous I/O
accesses are not likely to become a bottleneck. CPU processing shows considerable overlap and
is scheduled across the available CPU cores of the node.
In all scenarios that were tested, the query throughput reaches its maximum when all available
CPU cores are 100 percent utilized. This happens before the storage subsystem becomes
saturated. More and faster CPU cores increase the query throughput and eventually make disk
accesses the bottleneck.
Note:
In larger deployments, with many index columns, the network traffic between query
processing and query matching nodes might also become a bottleneck, and you might
consider increasing the network bandwidth for this interface.
Index size effect on query performance
Query latency is somewhat independent of query load up to the CPU starvation point at
maximum throughput. Query latency for each query is a function of the number of items in the
largest index partition.
The following diagram shows query latency on a system, starting out with 5 million items in the
index, with more content being added in batches up to 43 million items. Feeds are not running
continuously, so the feeding effects on query performance at different capacity points can be
seen. Data is taken from the M6 scenario, which is described later in this white paper.
There are three periods in the graph where search has been stopped, rendered as zero latency.
You can also observe that the query latency is slightly elevated when query load is applied after
an idle period. This is due to caching effects.
The query rate is, on average, 10 queries per minute, apart from a query throughput test within
the first day of testing. That test reached about 4,000 queries per minute, making the 10 queries
per minute rate almost invisible in the graph. The graph therefore shows light-load query
latency, not maximum throughput.
The following diagram shows the feed rates during the same interval.
By comparing the two graphs, you can see that an ongoing feed gives some degradation of
query latency. Because this scenario has a search row with a backup indexer, the effect is much
smaller than in systems where search runs on the same nodes as the indexer and item processing.
Percentile based query performance
The graphs presented earlier in this white paper show the average query latency. The
SharePoint administrative reports also provide percentile based reports. This can provide a more
representative performance summary, especially under high load conditions.
The following graph shows the percentile-based query performance for the same system as in
the previous section. While the previous graphs showed average query latencies of about 500-700
ms, the percentile graph shows that the median latency (fiftieth percentile) levels out at about
400 ms when content is added. The high percentiles show larger variations, due to the increased
number of items on the system and the effect of ongoing crawls.
Note:
The percentile-based query performance graph includes the crawl rate of the Query SSA. This
does not show any crawling activity, because the Query SSA only crawls user profile data for
the people search index. People search is not included in the test scenarios in this white
paper. Crawling of all other sources is performed by the FAST Search Server Content SSA.
The query throughput load test during the first day reveals that high query load reduces the
latency for the high percentiles. During low query load and ongoing feed, a large fraction of the
queries hits fresh index generations without caches. When query load increases (within the
maximum throughput capacity), the fraction of cold cache queries goes down. This reduces the
high percentile latencies.
Deployments with indexing and queries on the same row
Because crawls and queries both use CPU resources, deployments with indexing and queries on
the same row show some degradation in query performance during content crawls. Single row
deployments are likely to have indexing, query, and item processing all running on the same
servers.
The following test results are gathered by applying an increasing query load to the system. The
graphs are gathered from the SharePoint administrative reports. The query latency is plotted as
an area versus the left axis, and the query throughput is a light blue line versus the right axis.
In the following diagram there is no ongoing content feed. The colors of the graph are as
follows:
• Red: Back end, that is, time consumed in the FAST Search Server nodes
• Yellow: Object model
• Blue: Server rendering
Query latency remains stable at about 700 ms up to 8 queries per second (~500 queries per
minute). At this point, the server CPU capacity becomes saturated. When even higher loads are
applied, query queues build up and latency increases linearly with the queue length.
In the following diagram, the same query load is applied with ongoing content feeding. This
implies that queries have to utilize CPU capacity from the lower prioritized item processing.
Consequently, query latency starts to increase even at low load and the maximum throughput is
reduced from ~600 to ~500 queries per minute. Note the change in scale on the axis compared
to the previous graph.
Query latency has higher variation during feed. The spikes shown in the graph are due to the
indexer completing larger work batches, leading to new index generations that invalidate the
current query caches.
Using a dedicated search row
You can deploy a dedicated search row to isolate query traffic from indexing and item
processing. This requires twice the number of servers in the search cluster, but gives better and
more consistent query performance. Such a configuration also provides query matching
redundancy.
A dedicated search row implies some additional traffic during crawls when the indexer creates a
new index generation (a new version of the index for a given partition). The new index data is
passed over the network from the indexer node to the query matching node. Given a proper
storage subsystem, the main effect on query performance is a slight degradation when new
generations arrive due to cache invalidation.
Search row combined with backup indexer
You can deploy a backup indexer to handle non-recoverable errors on the primary indexer. You
typically co-locate the backup indexer with a search row. In this configuration you should
generally not deploy item processing to the servers that host the backup indexer and search
row. The backup indexer increases the I/O load on the search row, because there is additional
housekeeping communication between the primary and backup indexer to keep the index data
on the two servers in sync. Both servers also need additional data storage on disk. Make sure
that you dimension your storage subsystem to handle the additional load.
Network traffic
With increased CPU performance on the individual servers, the network connection between the
servers can become a bottleneck. For example, even a small four-node FAST Search Server
farm can process and index more than 100 items per second. If the average item is 250 KB, this
represents about 250 megabits/s average network traffic. Such a load can saturate even a 1
gigabit/s network connection.
The network traffic that is generated by content feeding and indexing is as follows:
• The indexing connector within the Content SSA retrieves the content from the source.
• The Content SSA (within the SharePoint farm) passes the retrieved items in batches to the content distributor component in the FAST Search Server farm.
• Each item batch is sent to an available item processing component, typically located on another server.
• After processing, each batch is passed to the indexing dispatcher, which splits the batches according to index column distribution.
• The indexing dispatcher distributes the processed items to the indexers of each index column.
• The binary index is copied to additional search rows (if deployed).
The accumulated network traffic across all nodes can be more than five times higher than the
content stream itself in a distributed system. A high performance network switch is needed to
interconnect the servers in such a deployment.
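A quick way to sanity-check network dimensioning is to combine the per-item estimate from the Content feeding section with the traffic multiplication described above. The following PowerShell sketch computes the raw content payload only; actual traffic is higher because of protocol and batching overhead, and the accumulated farm traffic can exceed five times the content stream.

```powershell
# Raw content payload generated by feeding, before protocol and batching
# overhead. 100 items/second at 250 KB per item is the example rate used
# earlier in this white paper.
$itemsPerSecond = 100
$averageItemKB  = 250

$contentStreamMbps = $itemsPerSecond * $averageItemKB * 8 / 1000
"Content stream payload: ~$contentStreamMbps megabits/s"

# In a distributed farm the same content crosses the network several times
# (content distributor, item processing, indexing dispatcher, indexers, search
# rows), so accumulated traffic can exceed five times the content stream.
"Accumulated farm traffic: ~{0} megabits/s or more" -f ($contentStreamMbps * 5)
```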
High query throughput also generates high network traffic, especially when using multiple index
columns. Make sure you define the deployment configuration and network configuration to avoid
too much overlap between network traffic from queries and network traffic from content feeding
and indexing.
Web Analyzer performance dimensioning
Performance dimensioning of the Web Analyzer component depends on the number of indexed
items and whether the items contain hyperlinks. Items that contain hyperlinks or that are linked
to, represent the main load on the Web Analyzer.
Database-type content does not typically contain hyperlinks. SharePoint and other types of
Intranet content often contain HTML with hyperlinks. External Web content is almost exclusively
HTML documents with many hyperlinks.
The number of CPU cores and the amount of disk space are vital for performance dimensioning of
the Web Analyzer, but disk space is the most important factor. The following table specifies rule-of-thumb dimensioning recommendations for the Web Analyzer.
Content type            Number of items per CPU core    GB disk per million items
Database                20 million                      2
SharePoint / Intranet   10 million                      6
Public Web content      5 million                       25
The amount of memory that is needed is the same for all types of content, but it depends on the
number of cores used. We recommend planning for 30 MB per million items plus 300 MB per
CPU core.
The link, anchor text, or click-through log analysis is performed only if sufficient disk space is
available. The number of CPU cores affects only the amount of time it takes to update the index
with anchor text and rank data.
Note:
The table provides dimensioning rules for the whole farm. If the Web Analyzer components
are distributed over two servers, the requirement per server is half of the given values.
If the installation contains different types of content, the safest capacity planning strategy is to
use the most demanding content type as the basis for the dimensioning. For example, if the
system contains a mix of database and SharePoint content, we recommend dimensioning the
system as if it contains only SharePoint content.
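As an illustration of these rules, the following PowerShell sketch computes the Web Analyzer CPU, disk, and memory requirements for a hypothetical 40 million item corpus dimensioned as SharePoint/intranet content (the most demanding type in such a mix). The inputs come from the table and memory rule above; substitute your own content volumes.

```powershell
# Web Analyzer dimensioning for a hypothetical 40 million item corpus, sized as
# SharePoint/intranet content (10 million items per core, 6 GB disk per million
# items), plus the memory rule of 30 MB per million items and 300 MB per core.
$totalItemsMillions    = 40
$itemsPerCoreMillions  = 10
$gbDiskPerMillionItems = 6

$cpuCores = [math]::Ceiling($totalItemsMillions / $itemsPerCoreMillions)
$diskGB   = $totalItemsMillions * $gbDiskPerMillionItems
$memoryMB = 30 * $totalItemsMillions + 300 * $cpuCores

"Web Analyzer (whole farm): $cpuCores core(s), $diskGB GB disk, $memoryMB MB memory"
# If the Web Analyzer components are split across two servers, each server
# needs roughly half of these values.
```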
Scenarios
This section describes a typical medium sized search farm. This is a moderate search farm
scenario, where the farm provides search services to other farms. Forty million items are
crawled from SharePoint, Web servers, and file shares.
In addition, larger scenarios are briefly discussed (no test data is provided in this white paper):
• Large FAST Search farm: the large search farm scenario, where the administrator is providing search services to other farms. One hundred million items are crawled from SharePoint, file shares, Web servers, and databases.
• Extra-large FAST Search farm: the same scenario as the "Large FAST Search farm," but with increased capacity up to 500 million indexed items.
Note:
A small farm, typically a single-server FAST Search Server farm with up to 15 million items, is
not covered in this white paper.
Note:
The following scenarios do not include storage sizing for storing a system backup, because
backups would typically not be stored on the FAST Search Server nodes themselves.
Each of the following sub-sections describes the specific scenario. General guidelines follow in
the Recommendations section of this white paper.
Medium FAST Search Server farm
The amount of content over which to provide search is moderate (as many as 40 million items),
and to meet freshness goals, incremental crawls are likely to occur during business hours.
The configuration for the parent SharePoint farm uses two front-end Web servers, two
application servers, and one database server, arranged as follows:
• Application servers and front-end Web servers have disk space only for the operating system and programs. No separate data storage is required.
• Two crawl components for the Content SSA are distributed across the two application servers. This is mainly due to I/O limitations in the test setup (1 gigabit/s network), where a single network adapter would have been a bottleneck.
• One of the application servers also hosts Central Administration for the farm.
• One database server supports the farm, hosting the crawl databases and the FAST Search Server administration databases, in addition to the other SharePoint databases.
Deployment alternatives
The FAST Search Server farm can be deployed in various configurations to suit different
business needs. For the medium farm scenario, the following alternatives have been tested for
the FAST Search Server farm back-end:
M1. One combined administration and Web Analyzer server, and three index column servers with default configuration (4 servers)
M2. Same as M1, but using SAN storage (4 servers)
M3. A single high capacity server that hosts all FAST Search Server components
M4. Same as M1, with the addition of a dedicated search row (7 servers)
M5. Same as M3, with the addition of a dedicated search row (2 servers)
M6. Same as M4, but the search row includes a backup indexer row (7 servers)
M7. Same as M5, but the search row includes a backup indexer row (2 servers)
An extended version of the M2 scenario with a total of 100 million documents has also been
tested to a limited extent, mostly to see whether there would be bottlenecks when SAN is used
for larger setups. This is referred to as M2-100M.
Specifications
This section provides detailed information about the hardware, software, topology, and
configuration of the test environment.
Hardware
FAST Search Server farm servers
All the medium size scenarios are running on similar hardware. Unless otherwise stated, the
following specifications have been used.
Shared specifications:
• Windows Server 2008 R2 x64 Enterprise Edition
• 2x Intel L5520 CPUs
  o Hyper-threading switched on
  o Turbo Boost switched on
• 24 GB memory
• 1 gigabit/s network card
• Storage subsystem
  o Operating system: 2x 146 GB 10,000 RPM SAS disks in RAID1
  o Application: 18x 146 GB 10,000 RPM SAS disks in RAID50 (two parity groups of 9 drives each). Total formatted capacity of 2 terabytes.
  o Disk controller: HP Smart Array P410, firmware 3.00
  o Disks: HP DG0146FARVU, firmware HPD5
Changes for M2:
• Application is hosted on 2-terabyte partitions on a SAN
• SAN used for test
  o 3Par T-400
  o 240 10,000 RPM spindles (400 GB each)
  o Dual ported FC connection to each application server using MPIO without any FC switch. MPIO enabled in the operating system.
Changes for M2-100M:
• Same as M2, but with increased storage space to 16-terabyte partitions on the SAN.
Changes for M3/M5:
• 48 GB memory
• Application is hosted on 22x 300 GB 10,000 RPM SAS drives in RAID50 (two parity groups of 11 spindles each). Total formatted capacity of 6 terabytes.
Changes for M7:
• 2x Intel E5640 CPUs
  o Hyper-threading switched on
  o Turbo Boost switched on
• 48 GB memory
• Dual 1 gigabit/s network card
• Storage subsystem:
  o Application hosted on 12x 1-terabyte 7,200 RPM SAS drives in RAID10. Total formatted capacity of 6 terabytes.
  o Disk controller: Dell PERC H700, firmware 12.0.1-0091
  o Disks: Seagate Constellation ES ST31000424SS, firmware KS65
SharePoint Server 2010 servers
Application and front-end Web servers do not need storage other than for the operating system,
application binaries, and log files.
• Windows Server 2008 R2 x64 Enterprise Edition
• 2x Intel L5420 CPUs
• 16 GB memory
• 1 gigabit/s network card
• Storage subsystem for operating system and programs: 2x 146 GB 10,000 RPM SAS disks in RAID1
Instances of SQL Server
The specification is the same as for the SharePoint Server 2010 servers in the previous section,
with an additional RAID array for SQL Server data: 6x 146 GB 10,000 RPM SAS disks in RAID5.
Topology
This section describes the topology of the environment.
M1 and M2 (same configuration)
M1 and M2 are similar except for the storage subsystem. M1 is running on a local disk, while M2
uses SAN storage. M1 and M2 have a search cluster with three index columns and one search
row. There is one separate administration node that also includes the Web Analyzer
components. Item processing is spread out across all nodes.
These scenarios do not have query matching running on a dedicated search row. This implies
that there will be a noticeable degradation in query performance during content feeds. The
effect can be reduced by running feeds during off-peak hours or by reducing the number of item
processing components to reduce the maximum feed rate.
The following figure shows the M1 deployment alternative. All the tested deployment
alternatives use the same SharePoint Server and Database Server configuration. For the other
deployments only the FAST Search Server farm topology is shown.
The following deployment.xml file is used.
<?xml version="1.0" encoding="utf-8" ?>
<deployment version="14" modifiedBy="contoso\user"
modifiedTime="2009-03-14T14:39:17+01:00" comment="M1"
xmlns="http://www.microsoft.com/enterprisesearch"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.microsoft.com/enterprisesearch deployment.xsd">
<instanceid>M1</instanceid>
<connector-databaseconnectionstring>
[<![CDATA[jdbc:sqlserver://sqlbox.contoso.com\sql:1433;DatabaseName=M1.jdbc]]>
</connector-databaseconnectionstring>
<host name="fs4sp1.contoso.com">
<admin />
<query />
<webanalyzer server="true" link-processing="true" lookup-db="true" max-targets="4"/>
<document-processor processes="12" />
</host>
<host name="fs4sp2.contoso.com">
<content-distributor />
<searchengine row="0" column="0" />
<document-processor processes="12" />
</host>
<host name="fs4sp3.contoso.com">
<content-distributor />
<searchengine row="0" column="1" />
<document-processor processes="12" />
</host>
<host name="fs4sp4.contoso.com">
<indexing-dispatcher />
<searchengine row="0" column="2" />
<document-processor processes="12" />
</host>
<searchcluster>
<row id="0" index="primary" search="true" />
</searchcluster>
</deployment>
M3
The M3 scenario combines all components on one server. The same effect as on M1 and M2 for
running concurrent feed and queries applies. The reduced number of servers implies fewer item
processing components, and thus a lower feed rate, than M1 and M2.
The following figure shows the FAST Search Server farm topology for the M3 deployment
alternative. The SharePoint Server and Database Server configuration is equal to what is
described for M1. For the other deployments only the FAST Search Server farm topology is
shown.
The following deployment.xml file is used.
<?xml version="1.0" encoding="utf-8" ?>
<deployment version="14" modifiedBy="contoso\user"
modifiedTime="2009-03-14T14:39:17+01:00" comment="M3"
xmlns="http://www.microsoft.com/enterprisesearch"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.microsoft.com/enterprisesearch deployment.xsd">
<instanceid>M3</instanceid>
<connector-databaseconnectionstring>
[<![CDATA[jdbc:sqlserver://sqlbox.contoso.com\sql:1433;DatabaseName=M3.jdbc]]>
</connector-databaseconnectionstring>
<host name="fs4sp1.contoso.com">
<admin />
<query />
<content-distributor />
<indexing-dispatcher />
<searchengine row="0" column="0" />
<webanalyzer server="true" link-processing="true" lookup-db="true" max-targets="4"/>
<document-processor processes="12" />
</host>
<searchcluster>
<row id="0" index="primary" search="true" />
</searchcluster>
</deployment>
M4
M4 corresponds to M1 with the addition of a dedicated search row. The search row adds query
throughput capacity, introduces query redundancy, and provides better separation of query and
feeding load. Each of the three servers running the dedicated search row also includes a query
processing component (query). The deployment also includes a query processing component on
the administration node (fs4sp1.contoso.com). The Query SSA does not use this query
processing component during typical operation, but it can be used as a fallback to serve queries
if the entire search row is taken down for maintenance.
The following figure shows the FAST Search Server farm topology for the M4 deployment
alternative. The SharePoint Server and Database Server configuration is equal to what is
described for M1. For the other deployments, only the FAST Search Server farm topology is
shown.
The following deployment.xml file is used.
<?xml version="1.0" encoding="utf-8" ?>
<deployment version="14" modifiedBy="contoso\user"
modifiedTime="2009-03-14T14:39:17+01:00" comment="M4"
xmlns="http://www.microsoft.com/enterprisesearch"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.microsoft.com/enterprisesearch deployment.xsd">
<instanceid>M4</instanceid>
<connector-databaseconnectionstring>
[<![CDATA[jdbc:sqlserver://sqlbox.contoso.com\sql:1433;DatabaseName=M4.jdbc]]>
</connector-databaseconnectionstring>
<host name="fs4sp1.contoso.com">
<admin />
<query />
<webanalyzer server="true" link-processing="true" lookup-db="true" max-targets="4"/>
<document-processor processes="12" />
</host>
<host name="fs4sp2.contoso.com">
<content-distributor />
<searchengine row="0" column="0" />
<document-processor processes="12" />
</host>
<host name="fs4sp3.contoso.com">
<content-distributor />
<indexing-dispatcher />
<searchengine row="0" column="1" />
<document-processor processes="12" />
</host>
<host name="fs4sp4.contoso.com">
<indexing-dispatcher />
<searchengine row="0" column="2" />
<document-processor processes="12" />
</host>
<host name="fs4sp5.contoso.com">
<query />
<searchengine row="1" column="0" />
</host>
<host name="fs4sp6.contoso.com">
<query />
<searchengine row="1" column="1" />
</host>
<host name="fs4sp7.contoso.com">
<query />
<searchengine row="1" column="2" />
</host>
<searchcluster>
<row id="0" index="primary" search="true" />
<row id="1" index="none" search="true" />
</searchcluster>
</deployment>
M5
M5 corresponds to M3 with the addition of a dedicated search row, giving the same benefits as
M4 compared to M1.
The following figure shows the FAST Search Server farm topology for the M5 deployment
alternative. The SharePoint Server and Database Server configuration is equal to what is
described for M1. For the other deployments, only the FAST Search Server farm topology is
shown.
The following deployment.xml file is used.
<?xml version="1.0" encoding="utf-8" ?>
<deployment version="14" modifiedBy="contoso\user"
modifiedTime="2009-03-14T14:39:17+01:00" comment="M5"
xmlns="http://www.microsoft.com/enterprisesearch"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.microsoft.com/enterprisesearch deployment.xsd">
<instanceid>M5</instanceid>
<connector-databaseconnectionstring>
[<![CDATA[jdbc:sqlserver://sqlbox.contoso.com\sql:1433;DatabaseName=M5.jdbc]]>
</connector-databaseconnectionstring>
<host name="fs4sp1.contoso.com">
<admin />
<query />
<content-distributor />
<indexing-dispatcher />
<searchengine row="0" column="0" />
<webanalyzer server="true" link-processing="true" lookup-db="true" max-targets="4"/>
<document-processor processes="16" />
</host>
<host name="fs4sp2.contoso.com">
<query />
<searchengine row="1" column="0" />
</host>
<searchcluster>
<row id="0" index="primary" search="true" />
<row id="1" index="none" search="true" />
</searchcluster>
</deployment>
M6
M6 is the same setup as M4, with an additional backup indexer enabled on the search row. Only
the deployment.xml change compared to M4 is shown in the following code example.
…
<searchcluster>
<row id="0" index="primary" search="true" />
<row id="1" index="secondary" search="true" />
</searchcluster>
…
M7
M7 is the same setup as M5, with an additional backup indexer enabled on the search row. M7
also runs on nodes with more CPU cores (see the hardware specifications), allowing an increase
in the number of item processing components, including on the search row. The following
deployment.xml is used.
<?xml version="1.0" encoding="utf-8" ?>
<deployment version="14" modifiedBy="contoso\user"
modifiedTime="2009-03-14T14:39:17+01:00" comment="M7"
xmlns="http://www.microsoft.com/enterprisesearch"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.microsoft.com/enterprisesearch deployment.xsd">
<instanceid>M7</instanceid>
<connector-databaseconnectionstring>
[<![CDATA[jdbc:sqlserver://sqlbox.contoso.com\sql:1433;DatabaseName=M5.jdbc]]>
</connector-databaseconnectionstring>
<host name="fs4sp1.contoso.com">
<admin />
<query />
<content-distributor />
<indexing-dispatcher />
<searchengine row="0" column="0" />
<webanalyzer server="true" link-processing="true" lookup-db="true" max-targets="4"/>
<document-processor processes="20" />
</host>
<host name="fs4sp2.contoso.com">
<query />
<searchengine row="1" column="0" />
<document-processor processes="8" />
</host>
<searchcluster>
<row id="0" index="primary" search="true" />
<row id="1" index="secondary" search="true" />
</searchcluster>
</deployment>
Configuration for extended content capacity
FAST Search Server has a default configuration that is optimized for handling as many as 15
million items per index column, with a hard limit of 30 million items per index column. Some of
the scenarios described in this white paper use a modified configuration to allow for as many as
40 million items per column. This is referred to as an extended capacity configuration. The
extended content capacity configuration has more index partitions within each server node. In
this way low query latency can be maintained at the expense of reduced maximum queries per
second (QPS).
Note:
The modified indexer configuration is not optimal for deployments with fewer than 15 million
documents, and should only be used when higher capacity is required.
Modifying the indexer configuration has implications for how to perform patch and service pack
upgrades. This is described in the Handling patches and service pack upgrades section that follows.
The extended content capacity configuration requires a change to a configuration file that is
used by the indexer.
Configure the indexer to handle up to 40 million items per column
To reconfigure the indexers you must modify the indexer template configuration file and run the
deployment script to generate and distribute the new configuration.
Note:
The following procedure can only be applied to indexers that do not contain any data.
1. Verify that no crawling is ongoing.
2. Verify that no items are indexed on any of the indexers. Type the following command:
%FASTSEARCH%\bin\indexerinfo -a doccount
All the indexers should report 0 items.
3. On all FAST Search Server nodes, type the following commands:
net stop fastsearchservice
net stop fastsearchmonitoring
4. On the admin server node:
• Save a backup of the original configuration file,
  %FASTSEARCH%\META\config\profiles\default\templates\installer\etc\config_data\RTSearch\clusterfiles\rtsearchrc.xml.win32.template
  You might need this backup at a later stage if this configuration file is modified in any patch or service pack upgrade.
• Modify the following configuration values:
  o Configuration parameter: numberPartitions
    Default setting: 5
    New setting: 10
  o Configuration parameter: docsDistributionMax
    Default setting: 6000000,6000000,6000000,6000000,6000000
    New setting: 6000000,6000000,6000000,6000000,6000000,6000000,6000000,6000000,6000000,6000000
• The deployment file, %FASTSEARCH%\etc\config_data\deployment\deployment.xml, must be modified for the PowerShell cmdlet Set-FASTSearchConfiguration to run the redeployment. You can do that by opening the file in Notepad, adding a space, and saving the file.
• Type the following commands:
  Set-FASTSearchConfiguration
  net start fastsearchservice
5. On all non-admin server nodes, type the following commands:
Set-FASTSearchConfiguration
net start fastsearchservice
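The backup and verification parts of this procedure can be scripted. The following PowerShell sketch, intended for the admin server, copies the template file to a backup location (an example path) and runs the indexerinfo check from step 2; it assumes the FASTSEARCH environment variable is set, as it is on any FAST Search Server node.

```powershell
# Sketch of the backup and verification steps, assuming the FASTSEARCH
# environment variable points at the FAST Search Server installation folder.
$template = Join-Path $env:FASTSEARCH "META\config\profiles\default\templates\installer\etc\config_data\RTSearch\clusterfiles\rtsearchrc.xml.win32.template"
$backup   = "$template.original"   # example backup location

if (-not (Test-Path $backup)) {
    Copy-Item -Path $template -Destination $backup
    "Saved backup to $backup"
}
else {
    "Backup already exists at $backup"
}

# Step 2 of the procedure: every indexer should report 0 items before the
# template is changed. Inspect the output manually.
& "$env:FASTSEARCH\bin\indexerinfo.exe" -a doccount
```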
Handling patches and service pack upgrades
For all future patch or service pack updates, you must verify whether this configuration file is
updated as part of the patch or service pack. Review the readme file thoroughly to look for any
mention of this configuration file. If a patch or service pack involves an update of this
configuration file, the following steps must be followed.
1. Replace the configuration file
%FASTSEARCH%\META\config\profiles\default\templates\installer\etc\config_data\RTSearch\clusterfiles\rtsearchrc.xml.win32.template
with the backup of the original file that you have saved.
2. Perform the patch or service pack upgrade according to the appropriate procedure.
3. Perform the change to the configuration file template as specified earlier. Do not forget to
back up the modified configuration file template!
There are some tradeoffs associated with extending the capacity:
• Query throughput is reduced. Query latency (as long as the throughput limit is not exceeded) is less affected. The reduction in query throughput can be compensated for with multiple search rows, but then the reduction in server count diminishes.
• Indexing requires more resources and more disk accesses.
• More items per column require more storage space per server. The total storage space across the entire farm remains about the same.
• There are fewer nodes for distributing item processing components. The initial feed rate is reduced, because the feed rate depends mainly on the number of available CPU cores. Incremental feeds also have lower throughput because each index column has more work. Initial preproduction bulk feeds can be accelerated by temporarily adding item processing components to any search rows, or by temporarily assigning additional servers to the cluster.
• More hardware resources per server are required. We do not recommend that you use the extended settings on a server with fewer than 16 CPU cores/threads (24 or more is recommended). We recommend 48 GB RAM and a high-performance storage subsystem. See the individual scenarios for tested configurations.
In summary, we recommend the extended content capacity configuration only for deployments
with:
• High content volumes, but where the number of changes over time is low, typically less than 1 million changes per column over 24 hours.
• Low query throughput requirements (not more than 5-10 queries per second, depending on the CPU performance of the servers).
• Search running on a different search row than the primary indexer, because the indexer is expected to be busy most of the time.
Note:
Any content change implies a load to the system, including ACL changes. ACL changes can
appear for many items at a time in case of access right changes to document libraries or sites.
Dataset
This section describes the test farm dataset, including database content and sizes, search
indexes, and external data sources.
Object                                    Value
Search index size (# of items)            42.7 million
Size of crawl database                    138 GB
Size of crawl database log file           11 GB
Size of property database                 <0.1 GB
Size of property database log file        0.3 GB
Size of SSA administration database       <0.1 GB
Note:
The FAST Search Server index does not use any SQL Server-based property database. The
people search index uses the property database, but the test scenarios in this white paper do
not include people search.
The following table specifies which content source types are used to build the index. The
numbers in the table reflect the total number of items per source, including replicated copies.
The difference between the total number of items in the table (43.8 million) and the index size
in the previous table (42.7 million) is due to two factors:
• Items can be disabled from indexing in the content source.
• Items can have a document format type that cannot be indexed.
For SharePoint sources, the size of the respective content database in SQL Server is used as the
raw data size.
Content source            Total items     Raw data size     Average size per item
File share 1 (2 copies)   1.2 million     154 GB            128 KB
File share 2 (2 copies)   29.3 million    6.7 terabytes     229 KB
SharePoint 1              4.5 million     2.0 terabytes     443 KB
SharePoint 2              4.5 million     2.0 terabytes     443 KB
HTML 1                    1.1 million     8.8 GB            8.1 KB
HTML 2                    3.2 million     137 GB            43 KB
Total                     43.8 million    11 terabytes      251 KB
Note:
To reach sufficient content volume in these tests, two replicas of the file shares are added.
Each copy of each document appears as a unique item in the index, but is treated as a
duplicate by the duplicate trimming feature. From a query matching perspective the load is
similar to having all unique documents indexed, but any results from these sources trigger
duplicate detection and collapsing in the search results.
The test scenarios do not include people search data. People search is crawled and indexed in a
separate index within the Query SSA.
Workload
This section describes the workload that is used for data generation, including the number of
concurrent users and farm usage characteristics.
The number of queries per second (QPS) is varied from 1 QPS to about 40 QPS and the latency
is recorded as a function of this.
The query test set consists of 76,501 queries. These queries have the following characteristics:
Query terms    Number of queries    Percentage of test set
1              49,195               64.53
2              24,520               32.16
3              2,411                2.81
4              325                  0.43
5              43                   0.06
7              7                    0.01
0.01
There are two types of multiterm queries used:
1. ALL queries (about 70 percent of the multiterm queries), meaning all terms must appear
in matching items. This includes queries containing an explicit AND, in addition to lists of
terms that are implicitly parsed as an AND statement.
2. ANY queries (about 30 percent of the multiterm queries), meaning at least one of the
terms must appear in matching items (OR).
The queries are chosen by random selection.
The number of agents defines the query load. One agent repeats the following two steps during
the test:
1. Submit a query.
2. Wait for response.
There is no pause between the repetitions of these steps. A new query starts immediately after
a query response is received.
The number of agents increases in steps during the test. The following figure shows a typical
test where the number of agents is increased periodically. In this example, the test runs for 15
minutes at each agent count.
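For reference, the following PowerShell sketch shows the shape of one such closed-loop agent. It is not the test harness used for these measurements: the search center URL and the query terms are illustrative assumptions, and a real test would run many agents in parallel and step up their number over time.

```powershell
# One closed-loop agent: submit a query, wait for the response, submit the next.
# The results page URL and the query terms are illustrative assumptions.
$resultsPageUrl = "http://sharepoint.contoso.com/search/Pages/results.aspx"
$queryTerms     = @("contoso", "expense report", "holiday OR vacation")

$client = New-Object System.Net.WebClient
$client.UseDefaultCredentials = $true     # authenticate as the current user

foreach ($term in $queryTerms) {
    $url = $resultsPageUrl + "?k=" + [uri]::EscapeDataString($term)
    $stopwatch = [System.Diagnostics.Stopwatch]::StartNew()
    $null = $client.DownloadString($url)  # blocks until the full response arrives
    $stopwatch.Stop()
    "{0,-25} {1,6} ms" -f $term, $stopwatch.ElapsedMilliseconds
}
```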
Test results
This section provides data that shows how the farm performed under load.
Feed and indexing performance
Full crawl
The following diagram shows the number of items processed per second for the various
scenarios.
Key takeaways:
• The item processors represent the bottleneck. CPU processing capacity is the limiting factor.
• M1, M2, and M4 have similar performance characteristics because the same number of item processors are available. The same applies to M3 and M5.
• M1, M2, and M4 have four times the item processor capacity compared to M3 and M5. This translates to a four times higher crawl rate during a full crawl.
• Running with backup indexers incurs a performance overhead due to the extra synchronization work required. Typically an installation without backup indexers, such as M4, outperforms one with backup indexers, such as M6.
Incremental crawl
Key takeaways:
• Incremental crawls are faster than full crawls, from slightly faster up to a factor of two or three. This is because incremental crawls mainly consist of partial updates, which only update metadata. This implies that the feed performance is largely the same for all content types.
• The indexers are the bottleneck for incremental crawls, because the item processing load is limited. Typically, disk I/O capacity is the limiting factor. During an incremental update the old version of the item is fetched from disk, modified, persisted to disk, and then indexed. This is more expensive than a full crawl operation, where the item is only persisted and indexed.
Query performance
Scenario M1
The following diagram shows the query latency as a function of QPS for the M1 scenario. For an
increasing number of agents the QPS increases. For low QPS, the query matching component
can handle the increasing QPS, and latency does not increase much. For higher QPS, a
saturation of the system gradually takes place and the query latency increases.
An idle indexer gives the best query performance, with an average latency of less than 0.7
seconds up to approximately 21 QPS. The corresponding number during a full crawl is 10 QPS,
and during an incremental crawl it is 15 QPS.
The previous figure shows that QPS decreases and latency increases if you apply more query
load after the maximum capacity of the system has been reached. This occurs at the point
where the curve starts bending backwards. On the M1 system, the peak QPS is about 28, with
idle indexers. CPU resources are the bottleneck in this scenario. The behavior is also illustrated
in the next diagram, where you can observe that performance decreases when there are more
than 40 simultaneous user agents on an idle system.
Hyper-threading
The following diagram shows the effect of using hyper-threading in the CPU. Hyper-threading
allows more threads to execute in (near) parallel, at the expense of slightly reduced average
performance for single-threaded tasks. FAST Search Server query matching components run in
a single thread when QPS is low and no other tasks are running on the server.
Hyper-threading performs better for all three feeding cases in the M1 scenario. In other
scenarios, with dedicated search rows, disabling hyper-threading gives a small reduction (about
150 ms) in query latency at very light query load.
In general, hyper-threading reduces query latency and allows for higher QPS, especially when
there are multiple components on the same server. Disabling hyper-threading provides only a
small improvement under conditions where the performance is already good. Hence, we
recommend keeping hyper-threading enabled.
Scenario M2
The following diagram shows the query latency as a function of QPS for the M2 scenario. For
increasing number of agents, the QPS increases. For low QPS, the query matching handles
increasing QPS, and latency does not increase much. For higher QPS, a saturation of the system
takes place and the latency increases.
When the indexers are idle, there is a slow increase until the deployment reaches the saturation
point at approximately 20 QPS. For full and incremental crawls the latency increases as
indicated in the graph. This test does not include test data to indicate exactly when the query
latency saturation takes place during crawl.
The following diagram shows the same test data presented as user agents versus latency:
Comparing M1 and M2
The next diagram compares the performance of M1 and M2. The main conclusion is that M1
performs somewhat better than M2. M1 can handle about 3 QPS more than M2 before reaching
the saturation point. The SAN disks used in M2 should be able to match M1's locally attached
disks in terms of I/O operations per second, but the bandwidth toward the disks is somewhat
lower with the SAN configuration.
For full crawl and incremental crawl, the performance was comparable during the light load
tests. During heavy load, M2 showed slightly less effect on search by ongoing indexing, as the
SAN provided more disk spindles to distribute the load.
Scenario M3
M3 (40 million items on a single server) can handle about 10 QPS when no feeding is ongoing.
This is shown in the following diagrams, both as QPS versus latency and as user agents versus
QPS. For comparison, the M1 data is also included.
One characteristic of the single node installation is that the query latency fluctuates more when
getting close to the saturation point.
Under low query load, M3 is almost able to match the performance of M1. During higher load the
limitations become apparent. M1 has three times the number of query matching nodes and the
peak QPS capacity is close to three times as high, 28 versus 10 QPS.
Scenario M4
M4 is an M1 installation with an added dedicated search row. The main benefit of such a
configuration is that the index and the search processes are not directly competing for the same
resources, primarily disk and CPU.
The following diagrams show that the added search row gives a five QPS gain versus M1. In
addition the query latency is improved by about 0.2-0.4 seconds.
Adding search rows, in most cases, improves query performance, but such a deployment also
introduces additional network traffic that can affect the performance.
• The query performance can degrade when search rows are added if you do not have sufficient network capacity. This is the case when the indexers copy large index files to the query matching nodes.
• The index file copying can also affect indexing latency because of the added copying work.
Query performance versus document volume
The following diagram shows the result of running the query test on M4 with 18 million, 28
million, and 43 million documents indexed.
The document volume affects the maximum QPS that the system can deliver. Adding
approximately 10 million documents reduces the maximum throughput by approximately 5 QPS.
Below 23 QPS, the document volume has little effect on the query latency.
Scenario M5
Adding a dedicated search row improves query performance, as has already been illustrated
when comparing M1 and M4. The same is the case when adding a query matching node to the
single node M3 setup to get an M5 deployment.
Scenario M6
The following diagram shows the query performance of the M6 versus the M4 topology.
The difference between M6 and M4 is the addition of a backup indexer row. The backup indexers
compete with query matching for available resources, and they can degrade query performance.
However, in this specific test, that is not the case. The hardware that was used has enough
resources to handle the extra load during typical operations.
The backup indexers use significantly fewer resources than the primary indexers. This is because
the primary indexers perform the actual indexing and distribute the indices to the search rows
and backup indexer row. Note that all indexers perform the regular optimization tasks of internal
data structures between 03:00 A.M. and 05:59 A.M. every night. These tasks can, depending on
the feed pattern, be quite I/O intensive. Testing on M6 has shown that you can see a significant
reduction in query performance during indexer optimization processes. The more update and
delete operations the indexer handles, the more optimization is required.
Disk usage
Index disk usage
The following table shows the combined increase in disk usage on all nodes after the various
content sources have been indexed.
Content source | Raw source data size | FiXML data size | Index data size | Other data size
File share 1 (2 copies) | 154 GB | 18 GB | 36 GB | 5 GB
File share 2 (2 copies) | 6.7 terabytes | 360 GB | 944 GB | 10 GB
SharePoint 1 | 2.0 terabytes | 70 GB | 220 GB | 13 GB
SharePoint 2 | 2.0 terabytes | 66 GB | 220 GB | 17 GB
HTML 1 | 8.8 GB | 27 GB | 58 GB | 8 GB
HTML 2 | 137 GB | 17 GB | 112 GB | 6 GB
Total | 11 terabytes | 558 GB | 1.6 terabytes | 56 GB
• Raw source data size is included only for illustration. This data does not occupy any disk space on the FAST Search Server system.
• FiXML data. The indexer stores the processed items on disk in an XML-based format. The FiXML data serves as input to the indexing process, which builds the indices.
  o Every submitted item is stored in FiXML format. Old versions are removed once a day. The data size given contains only a single version of every item.
• Index data. The set of binary index files used for query matching.
• FAST Search Server keeps a read-only index file set to serve queries while building the next index file set. The worst-case disk space usage for index data is approximately 2.5 times the size of a single index file set; the additional 0.5 factor accounts for various temporary files (a worked example follows this list).
• Other data includes Web Analyzer data, log files, and so on.
• When running with primary and backup indexers, the indexers can consume an additional 50 GB each for synchronization data.
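The 2.5 times rule can be used for a quick per-node disk estimate. The following Windows PowerShell sketch illustrates the arithmetic only; the 500 GB figure for a single index file set is a hypothetical example, not a measured value from the scenarios in this white paper.

$indexFileSetGB = 500                    # hypothetical size of a single index file set on one node
$worstCaseGB = 2.5 * $indexFileSetGB     # current set + next set + ~0.5x temporary files
"Plan for at least $worstCaseGB GB of index disk space on this node"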
Key takeaways:
• The ratio between source data and index data depends strongly on the content type. This is related to the average amount of searchable data in the various data formats.
Web Analyzer disk usage
The following table shows disk usage for the Web Analyzer in a mixed content scenario, where
the data is both file share content and SharePoint items.
Number of items in index | 40,667,601
Number of analyzed hyperlinks | 119,672,298
Average number of hyperlinks per item | 2.52
Peak disk usage during analysis (GB) | 77.51
Disk usage between analyses (GB) | 23.13
Disk usage per 1 million items during peak (GB) | 1.63
The average number of links per item is quite low compared to pure Web content installations,
or pure SharePoint installations. For example, in a pure Web content installation the average
number of links can be as high as 50. Because the Web Analyzer stores only document IDs,
hyperlinks, and anchor texts, the number of links is the dominant factor determining the disk
usage.
The values in the preceding table are somewhat lower than the values that are specified in the
Web Analyzer performance dimensioning section. The values in the preceding table derive from
one specific installation where URLs are fairly short. The performance dimensioning
recommendations are based on experience from several installations.
Overall Takeaways
Query and feeding performance
Performance for feeding new content is mainly determined by the item processing capacity. It is
important that you deploy the item processing component in a way that utilizes spare CPU
capacity across all servers.
Running indexer, item processing and query matching on the same server gives high resource
utilization, but also higher variations in query performance during crawling. For such a
deployment, we recommend that you schedule all crawling outside periods with high query load.
A separate search row is recommended for deployments where low query latency is required at
any time.
You can also combine a separate search row with a backup indexer. This provides short recovery
time in case of a nonrecoverable disk error, with some loss of query performance and
incremental update rates. For the highest query performance requirements, we recommend a
pure search row.
Redundancy
The storage subsystem for a farm must have some level of redundancy, because loss of storage
even in a redundant setup leads to reduced performance during a recovery period that can last
for days. Using a RAID disk set, preferably also with hot spares, is essential to any installation.
A separate search row also provides query redundancy.
Full redundancy for the feeding and indexing chain requires a backup indexer on a separate row,
with increased server count and storage volume. Although this provides the quickest recovery
path from hardware failures, other options might be more attractive when hardware outages are
infrequent:
• Running a full re-crawl of all the content sources after recovery. Depending on the deployment alternative, this can take several days. If you have a separate search row, you can perform the re-crawl while you keep the old index searchable.
• Running regular backups of the index data.
Capacity per node
For deployments with up to 15 million items per node, you should use the default
configuration.
Configuration for extended content capacity can be used for up to 40 million items per node if
you have moderate query performance requirements. Given sufficient storage capacity on the
servers, this enables a substantial cut in the number of servers that are deployed.
Deployments on SAN
FAST Search Server can use SAN storage instead of local disks if this is required for operational
reasons. The requirement for high performance storage still applies. Testing of the M2 scenario
shows that a sufficiently powerful SAN is not a bottleneck. Although the actual workload is
scenario dependent, the following parameters can be used as an estimate of the required SAN
resources for each node in the FAST Search Server farm:
• 2,000 – 3,000 I/O operations per second (IOPS)
• 50 – 100 KB average block size
• Less than 10 ms average read latency
For a farm setup, such as M4 (7 servers), the SAN must be capable of serving 15,000 – 20,000
IOPS to the FAST Search Server farm regardless of any other traffic served by the same storage
system.
Troubleshooting performance and scalability
This section provides recommendations for how to optimize your environment for appropriate
capacity and performance characteristics. It also covers troubleshooting tips for the FAST Search
Server farm servers, and the FAST Search Server specific configuration settings in the Query
and Content SSAs.
Raw I/O performance
FAST Search Server makes extensive use of the storage subsystem. Testing the raw I/O
performance can serve as an early verification that the storage performance is sufficient.
One such test tool is SQLIO
(http://www.microsoft.com/downloads/details.aspx?familyid=9a8b005b-84e4-4f24-8d65-cb53442d9e19).
After installing SQLIO, the first step is to get or generate a suitable test file. Because the
following tests include write operations, the content of this file is partially overwritten. The size
of the file should also be much larger than the available system memory (by a factor of 10) to
avoid most caching effects.
The test file can also be generated by SQLIO itself, although not directly for huge file sizes. We
recommend that you generate a 1 GB file with the command "sqlio.exe -t32 -s1 -b256 1g",
which creates the file named "1g" in the current directory. This file can then be concatenated to
a sufficiently large file such as 256 GB, by the command "copy 1g+1g+1g+…..+1g testfile".
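If you prefer to script the concatenation instead of typing the long copy command, the following sketch builds the same 256 GB test file. It assumes that sqlio.exe is in the current directory and that the current directory is on the volume under test; the /b switch forces a binary copy.

.\sqlio.exe -t32 -s1 -b256 1g
$parts = ((1..256) | ForEach-Object { "1g" }) -join "+"
cmd /c "copy /b $parts testfile"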
The following set of commands represents the most performance critical disk operations in FAST
Search Server. All assume that a file "testfile" exists in the current directory, which should be
located on the disk that is planned to host FAST Search Server:
sqlio.exe -kR -t4 -o25 -b1 -frandom -s300 testfile
sqlio.exe -kR -t4 -o25 -b32 -frandom -s300 testfile
sqlio.exe -kW -t4 -o25 -b32 -frandom -s300 testfile
sqlio.exe -kR -t1 -o1 -b100000 -frandom -s300 testfile
sqlio.exe -kW -t1 -o1 -b100000 -frandom -s300 testfile
The first test measures the maximum number of I/O operations per second for small read
transfers. The second and third tests measure the performance for medium sized random
accesses. The last two tests measure read and write throughput for large transfers. Some
example results are given in the following table, with the recommended minimum values during
typical operation in the topmost row.
Disk layout | 1 KB read [IOPS] | 32 KB read [IOPS] | 32 KB write [IOPS] | 100 MB read [MB/s] | 100 MB write [MB/s]
Recommended minimum | 2000 | 1800 | 900 | 500 | 250
16x SAS 10,000 RPM 2.5" drives, RAID50 in two parity groups | 2952 | 2342 | 959 | 568 | 277
22x SAS 10,000 RPM 2.5" drives, RAID50 in two parity groups | 4326 | 3587 | 1638 | 1359 | 266
22x SAS 10,000 RPM 2.5" drives, RAID50 in two parity groups, with drive failure | 3144 | 2588 | 1155 | 770 | 257
12x SAS 7200 RPM 3.5" drives, RAID50 in two parity groups | 2728 | 1793 | 655 | 904 | 880
12x SAS 7200 RPM 3.5" drives, RAID50 in two parity groups, with drive failure | 1925 | 1242 | 680 | 178 | 306
12x SAS 7200 RPM 3.5" drives, RAID10 | 2165 | 1828 | 1500 | 803 | 767
12x SAS 7200 RPM 3.5" drives, RAID10, with drive failure | 2015 | 1711 | 1498 | 751 | 766
Note:
The numbers in the table reflect a deployment where the disk subsystem is at least 50 percent
utilized in capacity before the test file is added. Testing on freshly formatted disks tends to
produce slightly elevated results, because the test file is then placed in the most optimal
tracks across all spindles.
RAID50 provides better performance during typical operation than RAID10 for most tests other
than small writes. RAID10 has less performance degradation if a drive fails. We recommend
using RAID50 for most deployments, because 32 KB writes is the least critical of the five tests
indicated in the preceding table. RAID50 provides nearly twice the storage capacity compared to
RAID10 on the same number of disks.
If you deploy a backup indexer, 32 KB writes are more frequent. This is because a large amount
of pre-index storage files (FiXML) is passed from the primary to the backup indexer node. In certain
cases, this can lead to a performance improvement by using RAID10. Note that these results are
to a large degree dependent on the disk controller and spindles that are used. All scenarios in
this white paper specify in detail the actual hardware that has been tested.
Analyzing feed and indexing performance
The feed and indexing processing chain
The feed and indexing processing chain in FAST Search Server consists of the following
components, all potentially running on separate nodes:
• Crawler(s): Any node that pushes content into FAST Search Server, in most cases a Content SSA that is hosted in a SharePoint 2010 Server farm.
• Content distributor(s): Receives content in batches and redistributes the batches to the item processing components (document processors).
• Item processing: Converts documents to a unified internal format.
• Indexing dispatcher(s): Schedules which indexer node gets the content batch.
• Primary indexer: Generates the index.
• Backup indexer: Persists a backup of the information in the primary indexer.
Content flows as indicated by arrows 1 through 5 in the preceding figure; the last flow, from the
primary to the backup indexer, is an optional deployment choice. Asynchronous callbacks for
completed processing propagate in the other direction, as indicated by arrows 6 through 9.
Crawlers throttle the feed rate based on the callbacks (9) that they receive for submitted document
batches (1). The overall feed performance is determined by the slowest component in this chain. The
following sections describe how to monitor this.
Monitoring can be done through several tools, for example Performance Monitor in Windows
Server 2008 R2 or System Center Operations Manager.
Content SSA
The most frequently used crawler is the set of indexing connectors that is supported by the
Content SSA. The following statistics are important:
• Batches ready: The number of batches that have been retrieved from the content sources and that are ready to be passed on to the content distributor.
• Batches submitted: The number of batches that have been sent to FAST Search Server and for which a callback is still pending.
• Batches open: The total number of batches in some stage of processing.
The following figure shows these performance counters for a crawl session. Note that "batches
submitted" uses a different scale than the other two counters. The feed starts with "batches
submitted" ramping up until the item processing components are all busy (36 in this case), and
it stays at this level as long as there is available work ("batches ready"). There is a period, from
about 6:45 P.M. to 8:45 P.M., during which the content source can only provide very limited
volumes of data, bringing "batches ready" to near zero in the same period.
For deployments with backup indexer rows, the "batches submitted" tends to exceed the
number of item processing components. These additional batches are content that has been
processed, but that has not yet been persisted in both indexer rows. By default, the Content
SSA throttles feeds to avoid more than 100 "batches submitted".
For large installations, the throttling parameters should be adjusted to allow for more batches to
be in some stage of processing. Tuning is needed only for deployments with at least one of the
following characteristics:
• More than 100 item processing component instances deployed per crawl component in the Content SSA
• More than 50 item processing component instances deployed per crawl component in the Content SSA, in conjunction with a backup indexer row
• More than three index columns per crawl component in the Content SSA
The number of crawl components within the content SSA must be dimensioned properly for
large deployments to avoid network bottlenecks. This scaling often eliminates the need for
further configuration tuning. When one or more of these conditions apply, the feeding
performance can be improved by increasing throttling limits in the Content SSA. These
properties are "MaxSubmittedBatches" (default 100) and "MaxSubmittedPUDocs" (default
1,000), and increased limits can be calculated as shown in the following formulas. Note that these limits
apply for each crawl component within the Content SSA. If you use two crawl components (as in
our tests), the total value is two times the configured value as seen from the FAST Search
Server farm.

a = 1 for deployments without a backup indexer, or 2 for deployments with a backup indexer
b = a × (number of item processing component instances)
c = number of index columns
s = number of crawl components in the Content SSA

MaxSubmittedBatches = (20 × c + b) / s
MaxSubmittedPUDocs = 100 × MaxSubmittedBatches
As an example, the M4 scenario has a=1, b=48, c=3, s=2, resulting in MaxSubmittedBatches =
54 and MaxSubmittedPUDocs = 5,400. The default value (100) for MaxSubmittedBatches does
not need tuning in this case. MaxSubmittedPUDocs (the maximum number of documents with
ACL changes submitted) can be increased if the feed performance is limited by a high rate of
ACL changes. The mentioned configuration parameters have not been changed in the scenarios
covered in this white paper unless explicitly specified.
These throttling limits are configurable through the SharePoint Server 2010 Management Shell
on the SharePoint farm hosting the Content SSA, by the following commands (the following
example sets the default values).
$ssa = Get-SPEnterpriseSearchServiceApplication -Identity "My Content SSA"
$ssa.ExtendedConnectorProperties["MaxSubmittedBatches"] = 100
$ssa.ExtendedConnectorProperties["MaxSubmittedPUDocs"] = 1000
$ssa.Update()
You replace the identity string "My Content SSA" with the name of your Content SSA.
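The following sketch applies the preceding formulas and only raises the limits when the calculated values exceed the defaults. The topology numbers assigned to $a, $b, $c, and $s are illustrative; substitute the values from your own deployment.

$a = 1                                      # 1 without a backup indexer, 2 with a backup indexer
$b = $a * 48                                # number of item processing component instances (illustrative)
$c = 3                                      # number of index columns
$s = 2                                      # number of crawl components in the Content SSA
$maxBatches = [math]::Ceiling((20 * $c + $b) / $s)
$maxPUDocs = 100 * $maxBatches
if ($maxBatches -gt 100) {
    $ssa = Get-SPEnterpriseSearchServiceApplication -Identity "My Content SSA"
    $ssa.ExtendedConnectorProperties["MaxSubmittedBatches"] = $maxBatches
    $ssa.ExtendedConnectorProperties["MaxSubmittedPUDocs"] = $maxPUDocs
    $ssa.Update()
}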
Increasing these limits increases the load on the item processing and indexing components.
When these components consume more of the farm resources, query performance is affected. This is
less of an issue when running with a dedicated search row. Increasing "MaxSubmittedPUDocs"
increases the I/O load on the primary and backup indexers.
The following table shows the most important performance counters for the Content SSA. Note
that these are found on the node or nodes that are hosting the Content SSA crawl components,
under "OSS Search FAST Content Plugin", and not in the FAST Search Server farm.
Performance counter | Apply to object | Notes
Batches open | Content SSA | The total number of batches in some stage of processing.
Batches submitted | Content SSA | The number of batches that have been sent to FAST Search Server and for which a callback is still pending. When zero, nothing has been sent to the FAST Search Server farm backend for processing.
Batches ready | Content SSA | The number of batches that have been retrieved from the content sources and are ready for submission to the content distributor. When zero, the FAST Search Server farm backend is processing content faster than the Content SSA can crawl.
Items Total | Content SSA | The total number of items passed through the Content SSA since the last service restart.
Available MB | Memory | By default, the Content SSA stops aggregating batches that are ready when 80 percent of the system memory has been used.
Processor time | Processor | High CPU load could limit the throughput of the Content SSA.
Bytes Total/sec | Network Interface | High network load might become a bottleneck for the rate of data that can be crawled and pushed to the FAST Search Server farm.
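These counters can also be sampled from a Windows PowerShell prompt on the server that hosts the crawl component. The counter set name "OSS Search FAST Content Plugin" is taken from the table above; the exact counter paths and instance names can vary between installations, so list them first with Get-Counter -ListSet.

Get-Counter -ListSet "OSS Search FAST Content Plugin" | Select-Object -ExpandProperty Paths
Get-Counter -Counter "\OSS Search FAST Content Plugin(*)\Batches ready",
                     "\OSS Search FAST Content Plugin(*)\Batches submitted",
                     "\OSS Search FAST Content Plugin(*)\Batches open" -SampleInterval 15 -MaxSamples 4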
Content distributors and item processing
Each FAST Search Server farm has one or more content distributors. These components receive
all content in batches, which are passed on to the item processing components. You can ensure
good performance by verifying that the following conditions are met:
• Item processing components are effectively utilized.
• Incoming content batches are rapidly distributed for processing.
Maximum throughput can only be achieved when the Content SSA described in the previous
section has a constant queue of "batches ready" that can be submitted. Each item processing
component uses 100 percent of a CPU core when it is busy. Item processing components can be
scaled up to one per CPU core.
With multiple content distributors, the following performance counters should be summed up
across all of them for a total overview of the system.
Performance counter | Apply to object | Notes
Document processors | FAST Search Content Distributor | The number of item processing components that are registered with each content distributor. With multiple content distributors, the item processing components should be almost evenly distributed across the content distributors.
Document processors busy | FAST Search Content Distributor | The number of item processing components that are currently working on a content batch. This should be close to the total number under maximum load.
Average dispatch time | FAST Search Content Distributor | The time needed for the content distributor to send a batch to an item processing component. This should be less than 10 ms. Higher values indicate a congested network.
Average processing time | FAST Search Content Distributor | The time needed for a batch to go through an item processing component. This time can vary depending on content types and batch sizes, but it would typically be less than 60 seconds.
Available MB | Memory | Each item processing component might need as much as 2 GB of memory. Processing throughput is affected under memory starvation.
Processor time | Processor | Item processing components are very CPU intensive. High CPU utilization is thus expected during crawls. Item processing is, however, scheduled with reduced priority and yields CPU resources to other components when needed.
Bytes Total/sec | Network Interface | High network load might become a bottleneck for the rate of data that can be processed by the FAST Search Server nodes.
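Because these counters should be summed across all content distributors, a small script can do the aggregation. The counter set name "FAST Search Content Distributor" is taken from the table above; verify the exact counter path with Get-Counter -ListSet before relying on it.

$samples = (Get-Counter -Counter "\FAST Search Content Distributor(*)\Document processors busy").CounterSamples
$busyTotal = ($samples | Measure-Object -Property CookedValue -Sum).Sum
"Item processing components currently busy (all content distributors): $busyTotal"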
Indexing dispatcher and indexers
Indexers are the most write-intensive components in a FAST Search Server installation, and you
need to ensure high disk performance. High indexing activity can also affect query
matching operations when running on the same row.
Indexers distribute the items across several partitions. Partition 0 and up to three of the other
partitions can have ongoing activity at the same time. During redistribution of items among
partitions, one or more partitions might be waiting for other partitions to reach a
specific checkpoint. In addition to the following performance counters, indexer status is provided
by the "indexerinfo" command, for example "indexerinfo -a status".
Performance counter | Apply to object | Notes
API queue size | FAST Search Indexer | Indexers queue incoming work under high load. This is typical, especially for partial updates. If the API queues never reach zero, not even intermittently, the indexer is the bottleneck. Feeds are paused when the API queue reaches 256 MB in one of the indexers. This can happen if the storage subsystem is not sufficiently powerful. It also happens during a large redistribution of content between partitions, which temporarily blocks more content from being indexed.
FiXML fill rate | FAST Search Indexer | FiXML files are compacted at regular intervals, by default between 3:00 A.M. and 5:00 A.M. every night. A low FiXML fill rate (less than 70 percent) leads to inefficient operation.
Active documents | FAST Search Indexer Partition | Partitions 0 and 1 should have less than 1 million items each, preferably even less to keep indexing latency low. In periods with high item throughput, indexing latency is deprioritized and these partitions become larger, because this is more optimal for overall throughput. Items are automatically rearranged into the higher numbered partitions during periods with lighter load.
% Idle Time | Logical disk | Low disk idle time suggests a saturated storage subsystem.
% Free space | Logical disk | The indexer needs space both for the index generation currently used for search and for the new index generations being built. On a fully loaded system, disk usage varies between 40 percent and near 100 percent for the same number of items, depending on the state of the indexer.
Analyzing query performance
Query SSA
SharePoint administrative reports provide useful statistics for query performance from an end-to-end
perspective. These reports are effective for tracing trends over time and for identifying
where to investigate when performance is not optimal.
The following diagram shows two such events. At about 2:20 A.M., server rendering (blue
graph) has a short spike due to recycling of the application pool. Later, at 3:00 A.M., the FiXML
compaction is starting, which affects the backend latency.
In general, server rendering and object model latencies occur on the nodes that are running
SharePoint. These latencies are also dependent on the performance of the instances of SQL
Server that are backing the SharePoint installation. The backend latency is within the FAST
Search Server nodes, and it is discussed in the following sections.
QRproxy and QRserver
Queries are sent from the Query SSA to the FAST Search Server farm via the QRproxy
component, which resides on the server running the query processing component ("query" in
the deployment file). The performance counters in the following table can be helpful for
correlating the backend latency reported by the Query SSA, and the query matching component
(named "QRServer" in the reports). Neither of these components is likely to represent a
bottleneck. Any difference between the two is due to communication delays or processing in the
QRproxy.
Performance counter | Apply to object | Notes
# Queries/sec | FAST Search QRServer | Current number of queries per second.
# Requests/sec | FAST Search QRServer | Current number of requests per second. In addition to the preceding query load, one request is received every second to check that the QRserver is alive.
Average queries per minute | FAST Search QRServer | Average query load.
Average latency last ms | FAST Search QRServer | Average query latency.
Peak queries per sec | FAST Search QRServer | Peak query load seen by the QRserver since the last restart.
Query dispatcher
The query dispatcher (named "Fdispatch" in the reports) distributes queries across index
columns. There is also a query dispatcher that is located on each query matching node,
distributing queries across index partitions. The query dispatcher can be a bottleneck when
there are huge amounts of data in the query results, leading to network saturation. We
recommend that you keep traffic in and out of fdispatch on network connections that are not
carrying heavy load, for example, from content crawls.
Query matching
The query matching component (named "Fsearch" in the reports) is responsible for performing
the actual matching of queries against the index, computing query relevancy and performing
deep refinement. For each query, it reads the required information from the indices that are
generated by the indexer. Information that is likely to be reused is kept in a memory cache for
future use. Good fsearch performance relies on a powerful CPU and on low latency from small
random disk reads (typically 16-64 KB). The following performance counters are useful for
analyzing a node running the query matching:
Performance counter | Apply to object | Notes
% Idle Time | Logical disk | Low disk idle time suggests a saturated storage subsystem.
Avg. Disk sec/Read | Physical disk | Each query needs a series of disk reads. An average read latency of less than 10 ms is desirable.
Avg. Disk Read Queue Length | Physical disk | On a saturated disk subsystem, read queues build up. Queues affect query latency. An average queue length smaller than 1 is desirable for any node running query components. This is typically exceeded in single row deployments during indexing, negatively affecting search performance.
Processor time | Processor | CPU utilization is likely to become the bottleneck for high query throughput. When fsearch has high processor time (near 100 percent), query throughput cannot increase further.
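A quick way to check a query matching node against these thresholds is to sample the standard Windows disk and processor counters for about a minute and compare the cooked values with the limits described above (read latency below 10 ms, read queue length below 1, processor time well below 100 percent).

$counters = "\PhysicalDisk(*)\Avg. Disk sec/Read",
            "\PhysicalDisk(*)\Avg. Disk Read Queue Length",
            "\Processor(_Total)\% Processor Time"
Get-Counter -Counter $counters -SampleInterval 10 -MaxSamples 6 |
    ForEach-Object { $_.CounterSamples } |
    Format-Table Path, CookedValue -AutoSize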