FAST Search SharePoint 2010 Capacity Planning

FAST Search Server 2010 for SharePoint

Capacity Planning

This document is provided "as-is". Information and views expressed in this document, including

URL and other Internet Web site references, may change without notice. You bear the risk of using it.

Some examples depicted herein are provided for illustration only and are fictitious. No real association or connection is intended or should be inferred.

This document does not provide you with any legal rights to any intellectual property in any

Microsoft product. You may copy and use this document for your internal, reference purposes.

© 2010 Microsoft Corporation. All rights reserved.



FAST Search Server 2010 for SharePoint

Capacity Planning

Microsoft Corporation

November 2010

Applies to: FAST Search Server 2010 for SharePoint

Summary: This document describes specific deployments of FAST Search Server 2010 for

SharePoint, including:

 Test environment specifications, such as hardware, farm topology and configuration;

 The workload used for data generation, including the number and class of users, and farm usage characteristics;

 Test farm dataset, including search indexes and external data sources;

 Health and performance data specific to the tested environment;

 Test data and recommendations for how to determine the hardware, topology and configuration you need to deploy a similar environment, and how to optimize your environment for appropriate capacity and performance characteristics.

Contents

Introduction ...................................................................................................................................................... 4

Search Overview ................................................................................................................................................ 5

Sizing Approach ........................................................................................................................................... 5

Search and Indexing Lifecycle ....................................................................................................................... 5

Content feeding ..................................................................................................................................... 5

Query load ............................................................................................................................................ 7

Network traffic .................................................................................................................................... 14

Web Analyzer performance dimensioning ...................................................................................................... 15

Scenarios ........................................................................................................................................................ 16

Shared specifications across all scenarios ...................................................................................................... 16

Query workload ................................................................................................................................... 16

Notes on measured disk usage .............................................................................................................. 17

Configuration for extended content capacity ............................................................................................ 18

Extra small FAST Search Farm ..................................................................................................................... 20

Deployment alternatives ....................................................................................................................... 21

Specifications ...................................................................................................................................... 21

Test Results ........................................................................................................................................ 26

Small FAST Search Farm ............................................................................................................................. 30

Medium FAST Search Farm.......................................................................................................................... 31


Specifications ...................................................................................................................................... 31

Test Results ........................................................................................................................................ 44

Large FAST Search Farm ............................................................................................................................. 61


Specifications ...................................................................................................................................... 61

Test Results ........................................................................................................................................ 66

Extra-large FAST Search Farm ..................................................................................................................... 67

Overall Takeaways ..................................................................................................................................... 68

Query and feeding performance ............................................................................................................. 68

Redundancy ........................................................................................................................................ 68

Capacity per node ................................................................................................................................ 68

Deployments on storage area networks (SAN) ......................................................................................... 68

Deployments on solid state disks (SSD) .................................................................................................. 69

Troubleshooting performance and scalability ........................................................................................................ 70

Raw I/O performance ................................................................................................................................. 70

Analyzing feeding and indexing performance ................................................................................................. 72

Content SSA ....................................................................................................................................... 72

Content distributors and item processing ................................................................................................ 75

Indexing dispatcher and indexers .......................................................................................................... 76

Analyzing query performance ...................................................................................................................... 77

Query SSA .......................................................................................................................................... 77

QRproxy and QRserver ......................................................................................................................... 78

Query dispatcher ................................................................................................................................. 79

Query matching ................................................................................................................................... 79

Introduction

This document provides capacity planning information for collaboration environment deployments of FAST Search Server 2010 for SharePoint, in the following referred to as

FAST Search Server. It includes the following information for sample search farm configurations:

 Test environment specifications, such as hardware, farm topology and configuration

 The workload used for data generation, including the number and class of users and farm usage characteristics

 Test farm dataset, including search indexes and external data sources

 Health and performance data specific to the tested environment

It also contains common test data and recommendations for how to determine the hardware, topology and configuration you need to deploy a similar environment, and how to optimize your environment for appropriate capacity and performance characteristics.

FAST Search Server contains a richer set of features and a more flexible topology model than the search solution in earlier versions of SharePoint. Before you employ this architecture to deliver more powerful features and functionality to your users, you must carefully consider the impact upon your farm’s capacity and performance.

When you read this document, you will understand how to:

 Define performance and capacity targets for your environment

 Plan the hardware required to support the number and type of users, and the features you intend to deploy





Design your physical and logical topology for optimum reliability and efficiency

Test, validate and scale your environment to achieve performance and capacity targets

 Monitor your environment for key indicators

Before you read this document, you should read the following:

 Performance and capacity management (SharePoint Server 2010)

 Plan Farm Topology (FAST Search Server 2010 for SharePoint)

4

Search Overview

Sizing Approach

The scenarios in this document describe FAST Search Server test farms, with assumptions that allow you to start planning for the correct capacity for your farm. To choose the right scenario, you need to consider the following questions:

1.

Corpus Size: How much content needs to be searchable? The total number of items should include all objects: documents, web pages, list items, etc.

2.

Availability: What are the availability requirements? Do customers need a search solution which can survive the failure of a particular server?

3.

Content Freshness: How "fresh" do you need the search results? How long after the customer modifies the data do you expect searches to provide the updated content in the results? How often do you expect the content to change?

4.

Throughput: How many people will be searching over the content simultaneously? This includes people typing in a query box, as well as other hidden queries like web-parts automatically searching for data, or Microsoft Outlook 2010 Social Connectors requesting activity feeds that contain URLs which need security trimming from the search system.

Search and Indexing Lifecycle

Content feeding

The scenarios allow you to estimate capacity at an early stage of the farm. Farms move through multiple stages as content is crawled:

 Index acquisition This is the first stage of data population, it is characterized by: o Full crawls (possibly concurrent) of content. o Close monitoring of the crawl system, to ensure that hosts being crawled are not a bottleneck for the crawl.

 Index Maintenance This is the most common stage of a farm. It is characterized by: o Incremental crawls of all content, detecting new and changed content o For SharePoint content crawls, a majority of the changes encountered during the crawl are related to access right changes

 Index Cleanup This stage occurs when a content change moves the farm out of the index maintenance stage; for example, when a content database or site is moved from one search service application to another. This stage is not covered in the scenario testing behind this document, but is triggered when: o A content source and/or start address is deleted from a search service application. o A host supplying content is not found by the content connector for an extended period of time.

5

Index acquisition

When adding new content, feed performance is mainly determined by the configured number of item processing components. Both the number of the CPU cores and the speed of each of them will affect the results. As a first order approximation, a 1GHz CPU core will be able to process one average size Office document per second (around 250 kB). For example, the later discussed

M4 scenario has 48 CPU cores for item processing, each being 2.26GHz, providing a total

estimated throughput of 48 cores × 2.26GHz ≈ 100 items per second on average.

The crawl rate graph below is shown from the SharePoint administration reports. The crawl rate varies depending on the type of the content. Most of the crawl is new additions (labeled as

"modified" in the graph).

Note:

The indicated feed rates might saturate content sources and networks during peak feeding

rate periods in the above crawl. See section Troubleshooting performance and scalability for

further information on how to monitor feeding performance.

Index maintenance

Incremental crawls can consist of various operations.

6

 Access right (ACL) changes and deletes: These require near zero item processing, but high processing load in the indexer. Feed rates will be higher than for full crawls.

 Content updates: These require full item processing as well as more processing by the indexer compared to adding new content. Internally, such an update corresponds to a delete of the old item, and an addition of the new content.

 Additions: Incremental crawls will to some extent also contain newly discovered items.

These have the same workload as index acquisition crawls.

Depending on the type of operation, an incremental crawl may be faster or slower than an initial full crawl. It will be faster in the case of mainly ACL updates and deletes, and slower in the case of mainly updated items. Using a backup indexer may slow down the incremental crawl of updated items further.

In addition to updates from the content sources, the index is also altered by internal operations:

 The FAST Search Server link analysis and click-through log analysis generate additional internal updates to the index.

Example: A hyperlink in one item will lead to an update of the anchor text info associated with the referenced item. Such updates have a similar load pattern as the ACL updates.

 At regular intervals, the indexer performs internal reorganization of index partitions and data defragmentation. Defragmentation is started every night at 3am, while redistribution across partitions occurs whenever needed.

These internal operations imply that you may observe indexing activity also outside intervals with ongoing content crawls.

Query load

Index partitioning and query evaluation

The overall index is partitioned on two levels:

 Index columns: When the complete index is too large to reside on one server, it can be split into multiple disjoint index columns. A query will then be evaluated against all index columns within the search cluster, and the results from each index column are merged into the final query hit list.

 Index partitions: Within each index column the indexer uses a dynamic partitioning of the index in order to handle large number of indexed items with low indexing and query latency. This partitioning is dynamic and handled internally on each index server. When a query is evaluated, each partition runs within a separate thread. The default number of partitions is 5. In order to handle more than 15 million items per server (column), you need to change the number of partitions (and associated query evaluation threads). This

is discussed in section Configuration for extended content capacity .

Query latency

Evaluation of a single query is schematically illustrated in the following figure.

CPU processing (light blue) is followed by waiting for disk access cycles (white) and actual disk data read transfers (dark blue); repeated in the order of 2-10 times per query. This implies that

7

the query latency depends on the speed of the CPU, as well as the I/O latency of the storage subsystem.

A single query is evaluated separately, and in parallel, across multiple index partitions in all index columns. In the default five-partition configuration, each query is evaluated in five separate threads within every column.

Query throughput

When query load increase, multiple queries are evaluated in parallel as indicated in the figure below.

As different phases of the query evaluation occurs at different times, simultaneous I/O accesses are not likely to become a bottleneck. CPU processing shows considerable overlap, which will be scheduled across the available CPU cores of the node.

In all scenarios tested, the query throughput reaches its maximum when all available CPU cores are 100% utilized. This happens before the storage subsystem becomes saturated. More and faster CPU cores will increase the query throughput, and eventually make disk accesses the bottleneck.

Note:

In larger deployments with many index columns the network traffic between query processing and query matching nodes may also become a bottleneck, and you may consider increasing the network bandwidth for this interface.

8

Index size impact on query performance

Query latency is to some extent independent of query load up to the CPU starvation point at maximum throughput. Query latency for each query is a function of the number of items in the largest index partition.

The following diagram shows query latency on a system starting out with 5 million items in the index, with more content being added in batches up to 43 million items. The data is taken from the M6 scenario described later. Feeds are not running continuously, in order to see the feeding effects on query performance at different capacity points.

There are three periods in the graph where search has been stopped, rendered as zero latency.

You can also observe that the query latency is slightly elevated when query load is applied after an idle period. This is due to caching effects.

The query rate is on average 10 queries per minute, apart from a test for query throughput within the first day of testing. This reached around 4000 queries per minute, making the 10 qpm query rate graph almost invisible. Thus the graph above shows the light load query latency, and not latency during maximum throughput.

9

The following diagram show the feed rates during the same interval.

By comparing the two graphs, we see that an ongoing feed gives some degradation of query latency. As this scenario has a search row with backup indexer, the effect is anyhow much less than in systems with search running on the same nodes as indexer and item processing.

10

Percentile based query performance

The graphs presented earlier in this document show the average query latency. The SharePoint administrative reports also provide percentile based reports. This can provide a more representative performance summary; especially under high load conditions.

The following graph shows the percentile based query performance for the same system as in the previous section. While the previous graphs showed average query latencies around 500-

700ms, the percentile graph shows that the median latency (50 th percentile) is leveling out around 400ms when content is added. The high percentiles show larger variations, both due to the increased number of items on the system, as well as the impact of ongoing crawls.

Note:

The percentile based query performance graph includes the crawl rate of the Query SSA. This will not show any crawling activity, as the Query SSA will only crawl user profile data for the people search index. People search is not included in the test scenarios in this document.

Crawling of all other sources is performed by the FAST Search Server Content SSA.

The query throughput load test during the first day reveals that high query load will reduce the latency for the high percentiles. During low query load and ongoing feed, a large fraction of the queries will hit fresh index generations without caches. When query load increases (within the

11

maximum throughput capacity), the fraction of cold cache queries goes down. This will reduce the high percentile latencies.

Deployments with indexing and queries on the same row

As crawls and queries both use CPU resources, deployments with indexing and queries on the same row will show some degradation in query performance during content crawls. Single row deployments are likely to have indexing, query and item processing all running on the same servers.

The following test results are gathered by applying an increasing query load to a single row system. The graphs are gathered from the SharePoint administrative reports. The query latency is plotted as an area vs. the left axis, while the query throughput is a light blue line vs. the right axis.

In the following diagram there is no ongoing content feed. The colors of the graph are as follows:

 Red: Backend, that is time consumed in the FAST Search Server nodes

 Yellow: Object model

 Blue: Server rendering

12

Query latency remains stable around 700 ms up to 8 queries per second (~500 queries per minute). At this point the server CPU capacity becomes saturated. When applying even higher load, query queues build up, and latency increases linearly with the queue length.

In the following diagram, the same query load is applied with ongoing content feeding. This implies that queries need to utilize CPU capacity from the lower prioritized item processing.

Consequently, query latency now starts to increase even at low load, and also the maximum throughput is reduced from ~600 to ~500 queries per minute. Note the change in scale on the axis compared to the previous graph.

Query latency will have higher variation during feed. The spikes shown in the graph are due to the indexer completing larger work batches; leading to new index generations which invalidates the current query caches.

13

Using a dedicated search row

You can deploy a dedicated search row to isolate query traffic from indexing and item processing. This requires twice the number of servers in the search cluster, at the benefit of better and more consistent query performance. Such a configuration will also provide query matching redundancy.

A dedicated search row implies some additional traffic during crawls when the indexer creates a new index generation (a new version of the index for a given partition). The new index data is passed over the network from the indexer node to the query matching node. Given a proper storage subsystem, the main effect on query performance is a slight degradation when new generations arrive due to cache invalidation.

Search row combined with backup indexer

You can deploy a backup indexer in order to handle non-recoverable errors on the primary indexer. You will normally co-locate the backup indexer with a search row. For this scenario you should normally not deploy item processing to the combined backup indexer and search row.

The backup indexer increase the I/O load on the search row, as there will be additional housekeeping communication between the primary and backup indexer to keep the index data on the two servers in sync. This also includes additional data storage on disk for both servers.

Make sure that you dimension your storage subsystem to handle the additional load.

Network traffic

With increased CPU performance on the individual servers, the network connection between the servers can become a bottleneck. As an example, even a small 4-node FAST Search Server farm can process and index more than 100 items per second. If the average item is 250 Kbytes, this will represent around 250 Mbit/s average network traffic. Such a load may saturate even a

1Gbit/s network connection.

The network traffic generated by content feeding and indexing can be decomposed as follows:

 The indexing connector within the Content SSA retrieves the content from the source

 The Content SSA (within the SharePoint farm) passes the retrieved items in batches to the content distributor component in the FAST Search Server farm

 Each item batch is sent to an available item processing component, typically located on another server

 After processing, each batch is passed to the indexing dispatcher, which will split the batches according to the index column distribution

 The indexing dispatcher distributes the processed items to the indexers of each index column

 The binary index is copied to additional search rows (if deployed)

The accumulated network traffic across all nodes can be more than five times higher than the content stream itself in a distributed system. A high performance network switch is needed to interconnect the servers in such a deployment.

High query throughput also generates high network traffic, especially when using multiple index columns. Make sure you define the deployment configuration and network configuration to avoid too much overlap between network traffic from queries and network traffic from content feeding and indexing.

14

Web Analyzer performance dimensioning

Performance dimensioning of the Web Analyzer component depends on the number of indexed items and whether the items contain hyperlinks. Items containing hyperlinks, or is linked to, will represent the main load on the Web Analyzer.

Database-type content does normally not contain hyperlinks. SharePoint and other types of

Intranet content will often contain HTML with hyperlinks. External Web content is almost exclusively HTML documents with many hyperlinks.

Although the number of CPU cores and the amount of disk space is vital for performance dimensioning of the Web Analyzer, disk space is the most important. The following table specifies rule-of-thumb dimensioning recommendations for the Web Analyzer.

Content type Number of items per CPU core GB disk per million items

Database

SharePoint / Intranet

Public Web content

20 million

10 million

5 million

2

6

25

Note:

The table provides dimensioning rules for the whole farm. If the Web Analyzer components are distributed over two servers the requirement per server will be half of the given values.

The amount of memory needed is the same for all types of content, but depends on the number of cores used. We recommend planning for 30 MBytes per million items plus 300 MBytes per

CPU core.

The link, anchor text or click through log analysis will only be performed if sufficient disk space is available. The number of CPU cores only impacts the amount of time it takes to update the index with anchor text and rank data.

If the installation contains different types of content, the safest capacity planning strategy is to use the most demanding content type as the basis for the dimensioning. For example; if the system contains a mix of database and SharePoint content it is recommended to dimension the system as if it only contains SharePoint content.

15

Scenarios

This section describes typical deployments for variously sized search farms, with some relevant hardware variations for each scale point. The following scale points are included:

 XS: Extra-small FAST Search farm tested with 1, 5 and 8 million items

 S: Small FAST Search farm with 15 million items (planned for inclusion in future release)

 M: Medium FAST Search farm with 40 million items

 L: Large FAST Search farm with 100 million items

 XL: Extra-large FAST Search farm with 500 million items (planned for inclusion in future release)

For each of these scale points, several scenarios are defined. These are labeled as M1, M2 and so on for the medium scale point, and correspondingly for the others. Content is crawled from

SharePoint, Web servers and file shares.

Note:

The scenarios below does not include storage sizing for storing a system backup, as backups would normally not be stored on the FAST Search Server nodes themselves.

The next subsection describes the specifications shared across all scenarios, while each of the following subsections describes a specific scenario. General guidelines follow in the

"Recommendations" section.

Shared specifications across all scenarios

Note:

The FAST Search Server index does not use any SQL Server based property database. The people search index use the property database, but the test scenarios in this document does not include people search.

Query workload

This section describes the workload used for query profiling. The number of queries per second

(QPS) is varied from 1 QPS to about 40 QPS and the latency is recorded as a function of this.

The query test set consists of 76501 queries. These queries have the following characteristics:

2

3

Query terms

1

Number of queries

49195

24520

2411

4

5

7

325

43

7

There are two types of multi-term queries used:

Percentage of test set

64,53

32,16

2,81

0,43

0,06

0,01

16

1.

ALL queries (70%), meaning all terms must appear in matching items. This includes queries containing the explicit AND, as well as list of terms that implicitly is parsed as an

AND statement.

2.

ANY queries (30%), meaning at least one of the terms must appear in matching items

(OR).

The queries are chosen by random selection.

The number of user agents (simulated users) defines the query load. One agent repeats the following two steps during the test:

1.

Submit a query

2.

Wait for the response

There is no pause between the repetitions of these steps; the agent submits a new query immediately after receiving a query response.

The number of agents increases in steps during the test. For example, the figure below shows a typical test where the number of agents increases periodically, adding two agents every 15 minute.

Notes on measured disk usage

Actual disk usage is listed for all scenarios. Please note that:



 The indexer stores the processed items on disk in an XML based format called FiXML.

FiXML data serves as input to the indexing process which builds the indices. Every submitted item is stored in FiXML format. Old versions are removed once a day. The data size given contains only a single version of every item.



Raw source data size is only included for illustration. These data do not occupy any disk space on the FAST Search Server system.

FAST Search Server keeps a read-only binary index file set to serve queries while building the next index file set. The worst-case disk space usage for index data is approximately 2.5 times the size of a single index file set. The 0.5 factor constitute various temporary files.

17

 When running with primary and backup indexers the indexers may consume an additional

50 GB each for synchronization data for other data including Web Analyzer data, log files, etc.

Note:

The ratio between source data and index data depends strongly on the content type. This is related to the different amount of searchable data in the various data formats.

Configuration for extended content capacity

FAST Search Server has a default configuration that is optimized for handling up to 15 million items per index column, with a hard limit of 30 million items per index column. Some of the scenarios described in this document use a modified configuration to allow for up to 40 million items per column. This is referred to as an extended capacity configuration. The extended content capacity configuration has more index partitions within each server node. In this way low query latency can be maintained at the expense of reduced maximum QPS.

There are some tradeoffs by extending the capacity:

 Query throughput (QPS) is reduced. Query latency (while not exceeding the throughput limitation) is less affected. Query throughput reductions can be compensated with multiple search rows, but then the reduction in server count is diminishing.

 Indexing will require more resources, and also more disk accesses.

 More items per column require more storage space per server. The total storage space across the entire farm is mainly the same.

 There are fewer nodes for distributing item processing components. Initial feed rate will be reduced, as the feed rate is mainly dependent on the number of available CPU cores.

Incremental feeds will also have lower throughput as each index column has more work.

Initial pre-production bulk feeds can be accelerated by temporarily adding item processing components to eventual search rows, or additional servers temporarily assigned to the cluster.

 More hardware resources per server are required. It is not recommended to use the extended settings on a server with less than 16 CPU cores/threads (24 or more is recommended). 48 GB RAM is recommended, and a high-performance storage subsystem. See individual scenarios for tested configurations.

In summary, the extended content capacity configuration is only recommended for deployments with:

 High content volumes, but where the number of changes over time is low, typically less than 1 million changes per column over 24 hours.

 Low query throughput requirements (not more than 5-10 queries per second, depending on CPU performance of the servers).

 Search running on a different search row than the primary indexer, as the indexer is expected to be busy most of the time.

Note:

When estimating the change rate that a farm must be able to consume, keep in mind that any content change implies a load to the system, including ACL changes. ACL changes may appear

18

for many items at a time in case of access right changes to document libraries or sites, resulting in high peak update rates.

Note:

Modifying the indexer configuration has implications on how to perform patch and service pack upgrades. See procedure below.

Enable extended content capacity

In order to reconfigure the indexers to handle up to 40 million items per column, you must modify the indexer template configuration file and run the deployment script to generate and distribute the new configuration.

Note:

Only apply the following procedure to indexers which do not contain any data.

1.

2.

Verify that no crawling is ongoing

Verify that no items are indexed on any of the indexers. Run the following command:

3.

%FASTSEARCH%\bin\indexerinfo –a doccount

All the indexers should report 0 items.

On all FAST Search Server nodes, run the following commands: net stop fastsearchservice

4.

net stop fastsearchmonitoring

On the administration server node: a.

Save a backup of the original configuration file,

%FASTSEARCH%\META\config\profiles\default\templates\installer\etc\config_data

\RTSearch\clusterfiles\rtsearchrc.xml.win32.template

You may need this backup at a later stage if this configuration file is modified in any patch or service pack upgrade. b.

c.

Modify the following values within the original configuration file: i.

ii.

Set the numberPartitions to 10 (the default is 5).

Set the docsDistributionMax to

6000000,6000000,6000000,6000000,6000000,6000000,6000000,6000000

,6000000,6000000

The default value is 6000000,6000000,6000000,6000000,6000000

Modify the deployment file to enable re-deployment. d.

%FASTSEARCH%\etc\config_data\deployment\deployment.xml must be modified in order for the Set-FASTSearchConfiguration PowerShell cmdlet to run the re-deployment. You can do that by opening the file in Notepad, add a space and save the file.

Run the following commands:

Set-FASTSearchConfiguration

19

net start fastsearchservice

5.

On all non-administration server nodes, run the following commands:

Set-FASTSearchConfiguration net start fastsearchservice

Handling patches and service pack upgrades

For all future patch or service pack updates, you need to verify if this configuration file is updated as part of the patch or service pack update. Review the readme file thoroughly to look for any mention of this configuration file. If a patch or service pack involves an update of this configuration file, the following steps must be followed.

1.

Replace the configuration file

%FASTSEARCH%\META\config\profiles\default\templates\installer\etc\config_data\RTSe arch\clusterfiles\rtsearchrc.xml.win32.template with the backup of the original file that you have saved.

2.

Perform the patch or service pack upgrade according to the appropriate procedure.

3.

Perform the change to the configuration file template as specified above. Do not forget to back up the modified configuration file template!

Extra small FAST Search Farm

The extra small FAST Search farm is targeting a smaller test corpus with high query rates. The amount of content is up to 8 million items. There are no off-business hours with reduced query load. Crawls are likely to occur at any point in time. Query performance is measured at 1M, 5M and 8M content volume.

The configuration for the parent SharePoint farm uses four front-end Web servers, one application server and one database server arranged as follows:





One crawl component of the Content SSA is running on a single server.

One of the application servers also hosts Central Administration for the farm.

 One database server hosts the crawl databases, the FAST Search Server administration databases, as well as the other SharePoint databases.

Application server and Web front end servers will only have disk space for operating system and programs. No separate data storage is required.

20

Deployment alternatives

For the extra small farm scenario, the following alternatives have been tested for the FAST

Search Server farm back-end:

XS1.

Single server install hosting all FAST Search Server components using regular disk drives

XS2.

Same as XS1, deployed on single virtual machine

XS3.

Four nodes install running on four virtual machines, all running on the same physical server.

XS4.

Same as XS1, with the addition of a dedicated search row (2 servers)

XS5.

Same as XS1, but with storage on SAS SSD drives

Specifications

This section provides detailed information about the hardware, software, topology, and configuration of the test environment.

Hardware

FAST Search Server farm servers

All the extra small size deployment alternatives are running on similar hardware, although some of the setups using virtualization, others solid state disk (SSD) storage.

Shared specifications:

 Windows Server 2008 R2 x64 Enterprise Edition

 2x Intel L5520 CPUs o Hyper-threading switched on o Turbo Boost switched on





24 GB memory

1 Gbit/s network card

 Storage subsystem o OS: 2x 146GB 10k RPM SAS disks in RAID1 o Application: 7x 146 GB 10k RPM SAS disks in RAID5. Total formatted capacity of

880 GB. o Disk controller: HP Smart Array P410, firmware 3.30 o Disks: HP DG0146FARVU, firmware HPD6

Changes for XS2 and XS3:

 Virtualized servers running under Hyper-V. Host server has same specification as XS1 o 4 CPU cores o 8 GB memory o 800 GB disk on servers with index component

21

Changes for XS5:

 Storage subsystem o Application: 2x 400 GB SSD disks in RAID0. Total formatted capacity of 800 GB. o SSD disks: Stec ZeusIOPS MLC Gen3, part Z16IZF2D-400UCM-MSF

SharePoint Server 2010 servers

Application and Web front end servers do not need storage apart from operating system, application binaries and log files.

 Windows Server 2008 R2 x64 Enterprise edition

 2x Intel L5420 CPUs

 16 GB memory

 1 Gbit/s network card

 Storage subsystem for OS/Programs: 2x 146GB 10k RPM SAS disks in RAID1

SQL servers

Same specification as for SharePoint 2010 servers above, with additional disk RAID for SQL data with 6x 146GB 10k RPM SAS disks in RAID5.

22

Topology

This section describes the topology of the test environment for all deployment alternatives.

XS1

XS1 is a generic single node install using the following deployment.xml file:

<?xml version="1.0" encoding="utf-8" ?>

<deployment version="14" modifiedBy="contoso\user"

modifiedTime="2009-03-14T14:39:17+01:00" comment="XS1"

xmlns=”http://www.microsoft.com/enterprisesearch”

xmlns:xsi=”http://www.w3.org/2001/XMLSchema-instance”

xsi:schemaLocation="http://www.microsoft.com/enterprisesearch deployment.xsd">

<instanceid>XS1</instanceid>

<connector-databaseconnectionstring>

[<![CDATA[jdbc:sqlserver://sqlbox.contoso.com\sql:1433;DatabaseName=XS1.jdbc]]>

</connector-databaseconnectionstring>

<host name="fs4sp1.contoso.com">

<admin />

<query />

<content-distributor />

<indexing-dispatcher />

<searchengine row="0" column="0" />

<webanalyzer server="true" link-processing="true" lookup-db="true" max-targets="4"/>

<document-processor processes="12" />

</host>

<searchcluster>

<row id="0" index="primary" search="true" />

</searchcluster>

</deployment>

XS2

XS2 is a generic single node install without a deployment file, running on a single virtual machine. In practice, the following deployment file would have given the same setup.












<admin />

<query />






</host>

<searchcluster>


</searchcluster>

</deployment>

23

XS3

XS3 is distributed across four virtual machines, getting a comparable hardware footprint to XS1.












<admin />

<query />


</host>




</host>




</host>


<webanalyzer server="true" link-processing="true" lookup-db="true" max-targets="4" />


</host>

<searchcluster>


</searchcluster>

</deployment>

XS4

XS4 is the same as the XS1 deployment, but extended with an additional search row to get search redundancy.












<admin />

<query />




24



</host>


<query />


</host>

<searchcluster>


<row id="1" index="none" search="true" />

</searchcluster>

</deployment>

XS5

XS5 uses the same deployment file as XS1, but with storage on SAS SSD drives.

Dataset

This section describes the test farm dataset, including database content and sizes, search indexes, and external data sources. The overall metrics are shown in the table below.

Object

Search index size (# of items)

Size of crawl database

Size of crawl database log file

Size of property database

Value

5.4 M

16.7 GB

1.0 GB

< 0.1 GB

Size of property database log file

Size of SSA administration database

< 0.1 GB

< 0.1 GB

The table below shows the content source types used to build the index. The numbers in the table reflect the total number of items per source. The difference between the total number of items and the index size above is due to two factors:

 Items may be disabled from indexing in the content source

 The document format type cannot be indexed

For SharePoint sources, the size of the respective content database in SQL is used as the raw data size.

Content source

HTML 1

SharePoint 1

HTML 2

Total

Items

1.1 M

4.5 M

3.2 M

8.8 M

Raw data size Average size per item

8.8 GB 8.1 kB

2.0 TB

137 GB

2.2 TB

443 kB

43 kB

246 kB

The test scenarios do not include people search data. People Search is crawled and indexed in a separate index within the Query SSA.

25

Test Results

This section provides data that shows how the farm performed under load.

Feeding and indexing performance

All configurations apart from XS2 have the same CPU resources available. XS2 is running on a single virtual machine, and is thus limited to four CPU cores, as opposed to 16 for the others.

The following graph shows the average number of items per second for the different content sources during a full crawl:

Overall, XS2 shows 65-70% performance degradation compared to running on physical hardware. This is expected, as the single VM is restricted by available CPU resources. XS3, running four VMs and thus having the same hardware footprint as XS1, results on 35-40% degradation compared to running directly on the host computer. The major degradation stems from the lower IO performance when running in a virtual machine, using a fixed size VHD file.

The spilt of XS3 resources across four virtual machines also infers more server-to-server communication.

Query performance

The following sub sections describe the query performance impact both from having different farm deployments and varying content volume. There is also a separate test section for the effects of tuning for the low document volume in the XS scenarios, combined with solid state storage disks (SSD).

26

Impact of deployment configuration

The above graph shows the query performance of the different scenarios when there is no ongoing feed. XS1 and XS5 show only minor differences, with a slightly better performance for the SSD based XS5 (running with two SSDs versus 7 regular SAS spindles for XS1). As expected, the additional search row in XS4 does not improve query performance under idle crawl conditions. XS4 has the same throughput as XS1/XS5 under high load, but with slightly increased latency. This is due to queries being directed to both search rows, implying a lower cache hit ratio; as well as intra-node communication.

The virtualized scenarios (XS2 and XS3) have a significantly lower query performance, and also with higher variation than the non-virtualized options. As observed for feed performance, this reduction is related to the storage performance, in addition to the search components having maximum four CPU cores at disposal.

27

The situation is somewhat different in the above graph, showing query performance under full crawl. The single server XS1 scenario gets a reduction in query performance under concurrent crawl load. XS5 has less impact due to the improved storage performance, but does still see CPU congestion between item processing and query components. XS4 is least impacted, as this scenario has a dedicated search row. XS4 results vary more at concurrent high query and feed load due to competition for network resources.

The virtualized scenarios are both below 10 QPS maximum throughput under these load conditions. XS1 (native hardware) and XS3 (virtualized) have the same hardware footprint, with the non-virtualized configuration having more than five times the throughput. Some of this difference is due to virtualization overhead, especially storage performance; and some due to the limitations of a virtual machine with regards to how many CPU cores it has available. Under high search load, the query components can use all 16 CPU cores in XS1, while this is restricted to maximum four CPU cores with XS3.

Impact of varying content volume

Even though the XS-scale scenarios are sized for 8M documents, query performance testing was also run at 1M and 5M items indexed. The following graph shows how the content capacity affects query performance:

28

The solid lines show that maximum query capacity improves with less content, with maximum

90 QPS at 1M items, 80 QPS at 5M items, and 64 QPS at 8M items. During feed, the 1M index can still sustain > 40 QPS, although with a lot of variance. This is due to the total index size being relatively small, and most of it being able to fit inside application and OS level caches.

Both 5M and 8M indices have a lower maximum query performance during feed, in the 25-30

QPS range.

Impact of tuning for high performance storage

Even if the XS5 scenario demonstrated improved performance over XS1 with default settings, configuration tuning allows better utilization of the higher IOPS potential in SSDs. This tuning is done the same way as enabling extended content capacity discussed earlier, although by only changing docsDistributionMax setting, and not the number of partitions: docsDistributionMax=”2500000,2500000,2500000,2500000,2500000”

This will reduce the maximum practical capacity per column to 8–9 million items, but also spread the workload across multiple smaller partitions than the default setting. This allows for more parallel query execution, at the expense of more disk operations.

The following graph shows the result of this tuning at full capacity (8M items), which allows the

SSD based XS5 scenario to serve up to 75 QPS, and also reduce the response time under light query load. For example, the response time at 40 QPS at idle crawl is reduced from 0.4 to 0.2 seconds. Further, the response time during crawls is better with this tuning, as well as more consistent. The tuned XS5 scenario is able to deliver around 40 QPS with sub-second latency during crawls, while XS1 only delivered 15 QPS with the same load and latency requirements.

29

In total, using high performance storage provides improved query performance, especially during concurrent content crawls, and thus reduces or even eliminates the performance driven need to run search on dedicated rows. SSDs also provide sufficient performance with a smaller number of disks. In this case two SSDs outperform seven SAS spindles. This is attractive where power, or space restrictions, does not allow for a larger number of disks, for example for blade servers.

Disk usage

The table below shows the combined increase in disk usage on all nodes after the various content sources have been indexed. Note that scenarios using replication of FiXML and/or index data needs additional space.

Content source

HTML 1

SharePoint1

HTML 2

Total

Raw source data size

1.1 M

4.5 M

3.2 M

8.8 M

FiXML data size

6 GB

41 GB

27 GB

74 GB

Index data size

20 GB

108 GB

123 GB

251 GB

Other data size

4 GB

15 GB

22 GB

41 GB

Small FAST Search Farm

This scenario has not yet been tested. Planned capacity is 15 million items per farm.

30

Medium FAST Search Farm

The medium FAST Search Farm is targeting a moderate test corpus. The amount of content is up to 40 million items, and to meet freshness goals, incremental crawls are likely to occur during business hours.

The configuration for the parent SharePoint farm uses two front-end Web servers, two application servers and one database server arranged as follows:

 Two crawl components for the Content SSA are distributed across the two application servers. This is mainly due to I/O limitations in the test setup (1 Gbit/s network), where a single network adapter would have been a bottleneck.

 One of the application servers also hosts Central Administration for the farm.


Application servers and Web front end servers will only have disk space for operating system and programs. No separate data storage is required.


For the medium farm scenario, the following alternatives have been tested for the FAST Search

Server farm back-end:

M1.

One combined administration and Web Analyzer server, and three index column servers with default configuration (4 servers)

M2.

Same as M1, but using SAN storage (4 servers)

M3.

A single high capacity server hosting all FAST Search Server components

M4.

Same as M1, with the addition of a dedicated search row (7 servers)

M5.

Same as M3, with the addition of a dedicated search row (2 servers)

M6.

Same as M4, but where the search row includes a backup indexer row (7 servers)

M7.

Same as M5, but where the search row includes a backup indexer row (2 servers)

M8.

Same as M3, but using solid state drives (1 server)

M9.

Same as M3, but on more powerful hardware (1 server)

M10.

Same as M1, but using solid state drives for indexer/search nodes (4 servers)

Specifications


Hardware


The following hardware specifications have been used for the medium size deployment alternatives.

Shared specifications:


31


 24 GB memory


 Storage subsystem o OS: 2x 146GB 10k RPM SAS disks in RAID1 o Application: 18x 146 GB 10k RPM SAS disks in RAID50 (two parity groups of 9 drives each). Total formatted capacity of 2 TB. o Disk controller: HP Smart Array P410, firmware 3.00 o Disks: HP DG0146FARVU, firmware HPD5

Changes for M2:

 Application is hosted on 2TB partitions on a SAN

 SAN used for test o 3Par T-400 o 240 15k RPM spindles (450GB each) o Dual ported FC connection to each application server using MPIO without any FC switch. MPIO enabled in the operating system.

Changes for M3/M5:

 48 GB memory

 Application is hosted on 22x300GB 10k RPM SAS drives in RAID50 (two parity groups of

11 spindles each). Total formatted capacity of 6TB.

Changes for M7:






48 GB memory

Dual 1 Gbit/s network card

 Storage subsystem: o Application hosted on 12x 1TB 7200 RPM SAS drives in RAID10. Total formatted capacity of 6TB. o Disk controller: Dell PERC H700, firmware 12.0.1-0091 o Disks: Seagate Constellation ES ST31000424SS, firmware KS65

Changes for M8:


 48 GB memory

 Dual 1 Gbit/s network card

 Storage subsystem: o Application: 3x 1280 GB SSD cards in RAID0. Total formatted capacity of 3.6 TB.

32

o SSD cards: Fusion-IO ioDrive Duo 1.28 TB MLC, firmware revision 43284, driver

2.2 build 21459

Changes for M9:

 2x Intel X5670 CPUs o Hyper-threading switched on o Turbo Boost switched on

 48 GB memory


 Storage subsystem: o Application hosted on 12x 600GB 15k RPM SAS drives in RAID50. Total formatted capacity of 6TB. o Disk controller: LSI MegaRAID SAS 9260-8i, firmware 2.90-03-0933 o Disks: Seagate Cheetah 15K.7 ST3600057SS, firmware ES62

Changes for M10:


 48 GB memory


 Storage subsystem (search cluster nodes only): o Application: 1x Fusion-IO ioDrive Duo 1.28 TB MLC SSD card, firmware revision

43284, driver 2.2 build 21459









16 GB memory

1 Gbit/s network card


SQL servers


33

Topology

This section describes the topology of the test environment for all deployment alternatives.

Note:

All the tested deployment alternatives use the same SharePoint Server and Database Server configuration as shown for M1/M2/M10. For the other deployments only the FAST Search

Server farm topology is shown.

M1/M2/M10

M1, M2 and M10 are similar except for the storage subsystem. M1 is running on local disk, while

M2 uses SAN storage and M10 uses solid state storage for the search cluster. All three deployment alternatives have a search cluster with 3 index columns and one search row. There is one separate administration node that also includes the Web Analyzer components. Item processing is spread out across all nodes.

None of these three alternatives have a dedicated search row. This implies that there will be a noticeable degradation in query performance during content feeds. The impact can be reduced by feeding in off-peak hours, or by reducing the number of item processing components to reduce the maximum feed rate.

34

The following figure shows the M1 deployment alternative. M2 and M10 have the same configuration.

35

The following deployment.xml file is used for M1, M2 and M10.



modifiedTime="2009-03-14T14:39:17+01:00" comment="M1"




<instanceid>M1</instanceid>


[<![CDATA[jdbc:sqlserver://sqlbox.contoso.com\sql:1433;DatabaseName=M1.jdbc]]>



<admin />

<query />



</host>





</host>





</host>





</host>

<searchcluster>


</searchcluster>

</deployment>

36

M3

The M3 scenario combines all components on one server. Running concurrent feeding and query load has the same impact as for M1/M2/M10, but in addition, the reduced number of servers implies fewer items processing components, and thus lower feed rate.

The following figure shows the M3 deployment alternative.

The following deployment.xml file is used.












<admin />

<query />






</host>

<searchcluster>


</searchcluster>

</deployment>

37

M4

M4 corresponds to M1/M2/M10 with the addition of a dedicated search row. The search row adds query throughput capacity, introduces query redundancy, and provides better separation of query and feeding load. Each of the three servers running the dedicated search row also includes a query processing component (query). In addition, the deployment includes a query processing component on the administration node (fs4sp1.contoso.com). The Query SSA does not use this query processing component during normal operation, but may be used as a fallback to be able to serve queries if the entire search row is taken down for maintenance.


38













<admin />

<query />



</host>





</host>






</host>





</host>


<query />


</host>


<query />


</host>


<query />


</host>

<searchcluster>



</searchcluster>

</deployment>

39

M5

M5 corresponds to M3 with the addition of a dedicated search row, giving the same benefits as

M4 compared to M1/M2/M10.


40













<admin />

<query />






</host>


<query />


</host>

<searchcluster>



</searchcluster>

</deployment>

M6

M6 has the same setup as M4 with an additional backup indexer enabled on the search row. The backup indexer is deployed by modifying the M4 deployment .xml file as shown below.

…

<searchcluster>


<row id="1" index="secondary" search="true" />

</searchcluster>

…

41

M7

M7 has the same setup as M5, with an additional backup indexer enabled on the search row. M7 is also running on nodes with more CPU cores (see hardware specifications), allowing to increase the number of item processing components in the farm; also running on the search row. The following deployment.xml is used.












<admin />

<query />






</host>


<query />



</host>

<searchcluster>



</searchcluster>

</deployment>

42

M8/M9

The M8/M9 deployment alternatives combine all components on one server, just like M3. The differences are that M8/M9 are running on hardware with better performance, especially for the disk subsystem, and that they have an increased number of CPU cores that allows for more item processing components.

M8 uses solid state storage. M9 has more CPU power (X5670 vs. L5520/L5640 used on most other M-scale tests) and the fastest disk spindles readily available (12x 15k RPM SAS disks).

The following deployment.xml file is used for both M8 and M9.












<admin />

<query />






</host>

<searchcluster>


</searchcluster>

</deployment

Dataset

This section describes the test farm dataset, including database content and sizes, search indexes, and external data sources. The below table shows the key metrics data:

Object




Value

42.7 million

138 GB

11 GB



<0.1 GB

0.3 GB

Size of SSA administration database <0.1 GB

The table below specifies which content source types that are used to build the index. The numbers in the table reflect the total number of items per source, including replicated copies.

The difference between the total number of items below (43.8 million) and the index size above

(42.7 million) is due to two factors:


43



Content source items

File share 1 (2 copies)


SharePoint 1

SharePoint 2

HTML 1

HTML 2

Total

1.2 M

29.3 M

4.5 M

4.5 M

1.1 M

3.2 M

43.8 M


154 GB 128 kB

6.7 TB 229 kB

2.0 TB 443 kB

2.0 TB 443 kB

8.8 GB 8.1 kB

137 GB 43 kB

11 TB 251 kB

Note:

To reach sufficient content volume in the testing of the medium scenario, two replicas of the file shares were added. Each copy of each document would then appear as a unique item in the index, but treated as duplicates by the duplicate trimming feature. From a query matching perspective the load would be similar to having all unique documents indexed, but any results from these sources would trigger duplicate detection and collapsing in the search results.


Test Results


Feeding and indexing performance

Test results from feeding M1 through M6 are included below. The others not included as those do not show significantly different feed performance than their respective deployment alternatives.

44

Full crawl

The following diagram shows the average number of items processed per second for the various deployment alternatives.

For full crawls, the item processors represent the bottleneck. The limiting factor is the CPU processing capacity.

M1, M2 and M4 have similar performance characteristics due to having the same number of item processors available, the same applies to M3 and M5, but note that M1, M2 and M4 have four times higher crawl performance during a full crawl. The reason for this is that they have four times the item processor capacity compared to M3 and M5. Comparing M6 to M4 it also becomes apparent that running with backup indexers incurs a performance overhead due to the extra synchronization work required. Typically an installation without backup indexers, like M4, will outperform one with, like M6.

Incremental crawl

The following diagram shows the average number of items processed per second for the various deployment alternatives.

45

Incremental crawls are faster than full crawls, from slightly faster up to a factor of 2 or 3. This is mainly due to the fact that incremental crawls mostly consist of partial updates, which only updates metadata. This also implies that the feeding performance is largely the same for all content types.

For incremental crawls it is the indexers that are the bottleneck since the item processing load is limited. Typically disk I/O capacity is the limiting factor. During an incremental update the old version of the item is fetched from disk, modified, persisted to disk and then indexed. This is more expensive than a full crawl operation where the item is only persisted and indexed.

Note:

M1 through M5 was tested with content sources having less performance than the other scenarios in this document. The performance numbers can thus not be directly compared, as

M1 through M5 tests were to some extent limited by the bandwidth of the content sources.

46


M1

The following diagram shows the query latency as a function of QPS for the M1 deployment alternative.

An idle indexer gives best query performance, with an average latency less than 0.7 until approximately 21 QPS. The corresponding numbers when doing a full crawl is 10 QPS. The corresponding numbers when doing an incremental crawl is 15 QPS.

Note that the latency is not impacted by higher QPS until you reach max system capacity. The figure shows that QPS will decrease and latency will increase if you apply more query load after the maximum capacity of the system has been reached. This occurs at the point where the curve starts bending "backwards". On the M1 system the peak QPS is about 28, with idle indexers. CPU resources are the bottleneck in this scenario. The behavior is also illustrated in the next diagram where you can observe that performance decrease when having more than 40 simultaneous user agents on an idle system.

47

48

Hyper-Threading

CPU resources are the bottleneck in the M1 deployment alternative. Enabling hyper-threading allows more threads to execute in (near) parallel, at the expense of slightly reduced average performance for single-threaded tasks. Note that the query matching components will run in a single thread when QPS is low and no other tasks are running on the server.

Hyper-threading performs better for all the three feeding cases in the M1 deployment alternative. In the deployment alternative with dedicated search rows, a small reduction (around

150ms) in query latency is observed when running at very light query load.

The following diagram shows the impact of using hyper-threading in the CPU.

In general, hyper-threading reduces query latency and allows for higher QPS, especially when having multiple components on the same server. Disabling hyper-threading only provides a small improvement under conditions where the performance already is good. Hence, having hyper-threading enabled is recommended.

49

M2

The following diagram shows the query latency as a function of QPS for the M2 deployment alternative.

Note that the latency is not impacted by higher QPS until you reach max system capacity. When the indexers are idle, there is a slow latency increase until the deployment reaches the saturation point at approximately 20 QPS. For full and incremental crawls the latency increases as indicated in the graph. This test does not include test data to indicate exactly when the query latency saturation takes place during the crawl.

The following diagram shows the same test data presented as user agents versus latency:

50

Comparing M1 and M2

The next diagram compares the performance of M1 and M2. The main conclusion is that M1 performs somewhat better than M2. M1 is able to handle about 3 QPS more than M2 before reaching the saturation point. The SAN disks used on M2 should be able to match M1’s locally attached disks in terms of I/O operations per second, but the bandwidth towards the disks is somewhat lower with the SAN configuration.

For full crawl and incremental crawl the performance was comparable during the light load tests.

During heavy load, ongoing indexing had less impact on M2 as the SAN provided more disk spindles to distribute the load.

51

M3

The following diagram shows how the M3 deployment alternative (40 million items on a single server) is able to handle about 10 QPS with idle feeding. This is shown in the diagram below as

QPS versus latency. For comparison the M1 data is also included.

One characteristic of the single node installation is that the query latency fluctuates more when getting close to the saturation point.

Under low query load, M3 is almost able to match the performance of M1, but during higher load the limitations become apparent. M1 has three times the number of query matching nodes and the peak QPS capacity is close to three times as high, 28 versus 10 QPS.

52

M4

M4 is an M1 deployment with an added dedicated search row. The main benefit is that the index and the search processes are not directly competing for the same resources, primarily disk and

CPU.

The diagrams below show that the added search row in M4 gives a 5 QPS gain versus M1. In addition, the query latency is improved by about 0.2-0.4 seconds.

Adding search rows will in most cases improve query performance, but at the same time it introduces additional network traffic that may impact the performance.

The query performance may degrade when adding search rows if you do not have sufficient network capacity. This is the case when the indexers copies large index files to the query

53

matching nodes. The index file copying may also impact index latency performance due to the added need for copying of indices.

The following diagram shows the result of running the query test on M4 having 18, 28 and 43 million documents indexed.

The document volume impacts the maximum QPS the system is able to deliver. Adding ~10 million documents gives a ~5 max QPS reduction. Below 23 QPS the document volume has low impact on the query latency.

54

M5

The diagram below shows the query performance of the M5 versus the M3 topology.

As illustrated when comparing M1 and M4, adding a dedicated search row improves query performance. The same is the case when adding a query matching node to the single node M3 setup in order to get an M5 deployment.

M6

The diagram below shows the query performance of the M6 versus the M4 topology.

The difference between M6 and M4 is the addition of a backup indexer row. The backup indexers will compete with query matching for available resources, and may degrade query performance.

However, in this specific test, that was not the case. The hardware used had enough resources to handle the extra load during normal operations.

55

The backup indexers use significantly less resources than the primary indexer. This is due to the fact that the primary indexers performs the actual indexing and distributes the indices to the search rows and backup indexer row.

Note:

All indexers perform regular optimization tasks of internal data structures between 03.00 AM and 05.59 AM every night. These tasks may, depending on the feed pattern, be quite I/O intensive. Testing on M6 has shown that you may see a significant reduction in query performance during indexer optimization processes. The more update and delete operations the indexer handles, the more optimization is required.

M7

The following diagram shows the query latency as a function of QPS for the M7 deployment alternative compared to M5.

M7 is very similar to the M5 scenario, but it is running on servers with more powerful CPUs and more memory. On the other hand, it has a disk subsystem not capable of the same amount of

I/O operations per second (IOPS). The M7 storage subsystem has more bulk capacity but less performance compared to M5.

56

The main difference in results compared to M5 is a slightly increased QPS rate before the system becomes saturated. M5 is saturated around 10 QPS, while M7 provides roughly 12 QPS. This is due to the increased CPU performance and added memory, although partly counterbalanced by the weaker disk configuration.

M8

M8 has the same extended capacity application configuration as M3 and the following M9, but it is using solid state storage with much higher IOPS and throughput capabilities. This system is only limited by the available CPU processing power. Thus a more powerful CPU configuration, e.g. with quad CPU sockets, should be able to get linearly performance improvements with the added CPU resources.

Query performance results for M8 are discussed together with M9 below.

M9

The M9 deployment alternative has the same extended capacity application configuration as M3 and M8, but has improved CPU performance (X5670) and high end disk spindles (15k RPM SAS).

M9 is thus an example of the achievable performance gains by using high end components, keeping regular disk spindles for storage.

The improved CPU performance implies 20-30% increased crawl speeds for M9 over M8 (both with 20 item processor components), and even more compared M3 (which had 12 item processing components). Note that M3 ran with a less powerful server for the content sources, and was more often limited by the sources than M8 and M9. M9 achieved >50 document per second for all content sources.

The following graph shows the query performance for M8 and M9 under varying load patterns, compared to M3 and M5:

57

The following observations can be made:

 Both M8 and M9 perform better during idle feed than M5. M5 performance is only shown under feed, but as M5 has a dedicated search row, the query performance is relatively constant irrespective of ongoing feed or not. The main contribution to peak query rate improvements are the additional CPU resources on M8 and M9 compared to M5.

 During idle feed, M9 will get slightly better QPS than M8 due to the more powerful CPU.

Under overload conditions (>1 second latency), M9 although degrades in performance due to an overloaded storage subsystem (just like M5), while M8 can sustain the peak rate with its solid state storage.

 M3 (M5 without the search row) saturates already at 5 QPS. M9 does provide higher QPS rates. M9 has higher latency than M3 at low QPS during feed, as M9 has >50% higher feed rates than M3 (due to more item processors and faster CPU). M9 with feed rates reduced to M3 levels would have given better query performance than M3 also at low

QPS.

 During feeds, M8 query performance is degraded <20% compared to idle, dominantly due to CPU congestion. Thus the storage subsystem on M8 makes it possible to maintain good query performance during feed without doubling the hardware footprint with a search row. Adding more CPU resources would allow for further increase in query performance, as the storage subsystem still has spare resources in the current setup.

 On M9, query latency roughly doubles during feed. M9 can still deliver acceptable performance under low QPS loads with concurrent feed, but is much more affected than

M8. This is due to the storage subsystem on M9 having slower read accesses when combined with write traffic from feeding and indexing.

58

M10

The M10 deployment alternative has the same configuration as M1 and M2 but improves performance by using solid state storage. M10 is using the same amount of storage as M8, but spreading this across three search cluster servers to get more CPU power.

It is most interesting to compare M10 to M4, as both setups try to achieve a combination of high crawl rate and query performance at the same time. In M4, this is done by splitting the application storage across two search rows, each with 3 columns with 18 SAS disk per server.

M10 only has a single row and is replacing the application disk spindles with solid state storage.

The search cluster totals are thus (both deployment alternatives have an additional administration server):

 M4: 6 servers, 108 disk spindles

 M10: 3 servers, 3 solid state storage cards

With idle content crawls (solid lines), M4 achieves around 23 qps, before degrading to around 20

QPS under overload conditions, with IO becoming the bottleneck. M10 is able to deliver 30 QPS, at which point it becomes limited by the throughput CPU. Using faster or more CPUs would have increased this benefit even more than the measured 30% gain.

During content crawling, M4 has no significant changes in query performance compared to idle.

It is achieving crawl and query load separation by using an additional set of servers in a dedicated search row. M10 gets some degradation, as content processing and queries compete for the same CPU resources. Still, M10 achieves the same 20 QPS as M4 under the highest load conditions. Also note than the content crawling rate on M10 is 20% higher than on M4 during this test, as the increased IO performance allows for better handling of the concurrent operations.

59

Disk usage

Index disk usage

The table below shows the combined increase in disk usage on all nodes after the various content sources have been indexed.

Content source



SharePoint 1

SharePoint 2

HTML 1

HTML 2

Total


154 GB

6.7 TB

2.0 TB

2.0 TB

8.8 GB

137 GB

11 TB

FiXML data size

18 GB

360 GB

70 GB

66 GB

8 GB

31 GB

553 GB

Index data size

36 GB

944 GB

220 GB

220 GB

20 GB

112 GB

1.6 TB

Other data size

5 GB

10 GB

13 GB

17 GB

8 GB

6 GB

56 GB

Web Analyzer disk usage

The following table shows disk usage for the Web Analyzer in a mixed content scenario, where the data is both file share content and SharePoint items.

Number of items in index

Number of analyzed hyperlinks

Average number of hyperlinks per items

Peak disk usage during analysis (GB)

Disk usage between analysis (GB)

Disk usage per 1 million items during peak (GB)

40,667,601

119,672,298

2.52

77.51

23.13

1.63

Note:

The values in the table above are somewhat lower than the values specified for Web Analyzer performance dimensioning in the search overview earlier in this document. The values above derive from one specific installation where URLs are fairly short. The performance dimensioning recommendations are based on experience from several installations.

The average number of links per item is quite low compared to pure Web content installations, or pure SharePoint installations. For example, in a pure Web content installation the average number of links can be as high as 50. Since the Web Analyzer only stores document IDs, hyperlinks and anchor texts, the number of links is the dominant factor determining the disk usage.

60

Large FAST Search Farm

The large FAST Search Farm is targeting a moderate test corpus. The amount of content is up to

100 million items, and to meet freshness goals, incremental crawls are likely to occur during business hours.

The configuration for the parent SharePoint farm uses two front-end Web servers, two application servers and one database server arranged as follows:

 Two crawl components for the Content SSA are distributed across the two application servers. This is mainly due to I/O limitations in the test setup (1 Gbit/s network), where a single network adapter would have been a bottleneck.

 One of the application servers also hosts Central Administration for the farm.


Application servers and Web front end servers will only have disk space for operating system and programs. No separate data storage is required.


For the large farm scenario, the following alternatives have been tested for the FAST Search

Server farm back-end:

L1.

Single row, six column setup, with an additional administration node (7 servers)

L2.

Same as L1, with the addition of a dedicated search row (13 servers)

L3.

Same as L2, but where the search row includes a backup indexer row (13 servers)

Specifications


Hardware


All the large size deployment alternatives are running on similar hardware. The following specifications have been used.



 24 GB memory


 Storage subsystem o OS: 2x 146GB 10k RPM SAS disks in RAID1 o Application: 12x 146 GB 10k RPM SAS disks in RAID50 (two parity groups of 6 drives each). Total formatted capacity of 2 TB. o Disk controller: HP Smart Array P410, firmware 3.30

61

o Disks: HP DG0146FARVU, firmware HPD6





 16 GB memory



SQL servers


62

Topology

This section describes the topology of the test environment.

L1

L1 is a single row, six column setup with an additional administration node. The following deployment .xml file is used:



modifiedTime="2009-03-14T14:39:17+01:00" comment="L1"




<instanceid>L1</instanceid>


[<![CDATA[jdbc:sqlserver://sqlbox.contoso.com\sql:1433;DatabaseName=L1.jdbc]]>



<admin />

<query />



</host>


<query />



</host>





</host>





</host>





</host>





</host>





</host>

<searchcluster>


</searchcluster>

</deployment>

63

L2

L2 corresponds to L1 with the addition of a dedicated search row. The search row adds query throughput capacity, introduces query redundancy, and provides better separation of query and feeding load. Three servers running in the dedicated search row also includes a query processing component (query). The deployment also includes a query processing component on the administration node (fs4sp1.contoso.com). The Query SSA does not use this query processing component during normal operation, but may be used as a fallback to be able to serve queries if the entire search row is taken down for maintenance.



modifiedTime="2009-03-14T14:39:17+01:00" comment="L2"




<instanceid>L2</instanceid>


[<![CDATA[jdbc:sqlserver://sqlbox.contoso.com\sql:1433;DatabaseName=L2.jdbc]]>



<admin />

<query />



</host>




</host>





</host>





</host>





</host>





</host>





</host>

<host name=" fs4sp8.contoso.com ">

<query />


64

</host>

<host name="fs4sp9.contoso.com ">

<query />


</host>


<query />


</host>



</host>



</host>

<host name="fs4sp13.contoso.com"


</host>

<searchcluster>



</searchcluster>

</deployment>

L3

L3 has the same setup as L2 with an additional backup indexer enabled on the search row. The backup indexer is deployed by modifying the L2 deployment .xml file as shown below.

…

<searchcluster>



</searchcluster>

…

Dataset

This section describes the test farm dataset, including database content and sizes, search indexes, and external data sources.

Object






Size of SSA administration database

Value

103 million

358 GB

65 GB

<0.1 GB

0.6 GB

<0.1 GB

65

The table below specifies the content source types used to build the index. The numbers in the table reflect the total number of items per source, including replicated copies. The difference between the total number of items below and the index size above is due to two factors:




Content source items



SharePoint 1 (4 copies)

SharePoint 2 (3 copies)

HTML 1 (3 copies)

HTML 2 (3 copies)

Total

Note:

2.4 M

58.6 M

18.1 M

13.6 M

3.2 M

9.5 M

105.5 M


308 GB 128 kB

13.4 TB 229 kB

8.0 TB 443 kB

6.0 TB 443 kB

26 GB 8.1 kB

411 GB 43 kB

28 TB 268 kB

To reach sufficient content volume in these tests, replicas of the data sources were added.

Each copy of each document would appear as a unique item in the index, but treated as duplicates by the duplicate trimming feature. From a query matching perspective the load would be similar as having all unique documents indexed, but any results from these sources would trigger duplicate detection and collapsing in the search results.


Test Results


Feed and indexing performance

All the large scenario deployment alternatives were limited by the bandwidth of the content sources; L1 through L3 all achieved around 200 items per second feed rates.


66

L1, L2 and L3 are scaled up versions of M1, M4 and M6 respectively. More columns are added to

the farm to be able to index more content while maintaining the query performance. The

following graph shows the query performance for L1 through L3, with and without ongoing crawls. L2 and L3 are giving roughly the same performance, and also the same performance as

the corresponding smaller scale M4 and M6.

L1 shows a slightly different performance pattern compared to M1. Given 1 second maximum allowable latency, M1 achieved 27 and 16 QPS under respectively idle and running feed. The

same numbers for L1 is 18 and 9 QPS. The additional columns in L1 compared to M1 infer that

more servers are involved, where the slowest one for any given query at any given time will be the determining factor for the query latency. This effect is much less visible for the L2 and L3 deployments with dedicated search rows, as these machines do not have other components competing for resources.

Disk usage

The table below shows the combined disk usage on all nodes in the L1 deployment alternative.

L2 and L3 uses further disk space for replication of FiXML and index files on the second row.

Content source

Total


28 TB

FiXML data size

1.1 TB

Index data size

3.8 TB

Other data size

104 GB

Extra-large FAST Search Farm

This scenario has not yet been tested. Planned capacity is 500 million items per farm.

67

Overall Takeaways

Query and feeding performance

Performance for feeding of new content is mainly determined by the item processing capacity. It is therefore important that you deploy the item processing component in a way that utilizes spare CPU capacity across all servers.

Running indexer, item processing and query matching on the same server will give high resource utilization, but also higher variations in query performance during crawling. For such a deployment it is recommended to schedule all crawling outside periods with high query load.

A separate search row is recommended for deployments where low query latency is required at any time.

You can also combine a separate search row with a backup indexer. This will provide short recovery time in case of a non-recoverable disk error, with some loss of query performance and incremental update rates. For the highest query performance requirements, a pure search row is recommended.

Redundancy

The storage subsystem for a farm must have some level of redundancy, as loss of storage even in a redundant setup will lead to reduced performance during a recovery period that can last for days. Using a RAID disk set, preferably also with hot spares, is essential to any install.

A separate search row will also provide query redundancy.

Full redundancy for the feeding and indexing chain requires a backup indexer on a separate row, with increased server count and storage volume. While this provides the quickest recovery path from hardware failures, other options might be more attractive when hardware outages are infrequent:

 Running full re-crawl of all the content sources after recovery. Depending on deployment alternative this may take several days. If you have a separate search row you can perform the re-crawl while keeping the old index searchable.

 Run regular backup of the index data.

Capacity per node

For deployments with up to 15 million items per node you should use the default configuration.

Configuration for extended content capacity can be used for up to 40 million items per node if

you have moderate query performance requirements. Given sufficient storage capacity on the servers, this will enable a substantial cut in number of servers deployed.

Deployments on storage area networks (SAN)

FAST Search Server can use SAN storage instead of local disks if this is required for operational reasons. The requirement for high performance storage still applies. Testing of the M2 deployment alternative shows that a sufficiently powerful SAN will not be a bottleneck. Although the actual workload is scenario dependent, the following parameters could be used as estimation for the required SAN resources for each node in the FAST Search Server farm:

 2000 – 3000 I/O operations per second (IOPS)

 50 – 100 kB average block size

68

 Less than 10 ms average read latency

For example, for a farm setup like M4 (7 servers), the SAN must be capable of serving 15.000 –

20.000 IOPS to the FAST Search Server farm regardless of any other traffic served by the same storage system.

Deployments on solid state disks (SSD)

FAST Search Server can take advantage of the increased IO performance of SSDs. The XS5 deployment alternative had low content volume and high QPS. With regular spindles, a high number of disks would be needed, while only two SSDs were sufficient. This allows for using blade servers with local disks on the blade itself, and still having sufficient IO performance.

The M8 and M10 deployment alternatives used SSDs with even higher storage capacity and IO performance. Both of these configurations were entirely limited by the CPU performance. The

SSDs made it feasible to have high query performance without a dedicated search row, thus roughly halving the server count needed for a certain performance target count per server also yields lower power consumption.

1 . The reduced disk

1 Removing the search row does although remove redundancy in case of server failures.

69

Troubleshooting performance and scalability

This section provides recommendations for how to optimize the capacity and performance of your system environment.

It also covers troubleshooting tips for the FAST Search Server farm servers, and the FAST

Search Server specific configuration settings found in the Query and Content SSAs.

Raw I/O performance

FAST Search Server has extensive use of the storage subsystem. Testing the raw I/O performance can be used as an early verification of having sufficient performance.

One such test tool is SQLIO

( http://www.microsoft.com/downloads/details.aspx?familyid=9a8b005b-84e4-4f24-8d65cb53442d9e19 ).

After installing SQLIO, the first step is to get or generate a suitable test file. The tests below include write operations, thus the content of this file will be partially overwritten. The size of the file should also be much larger than the available system memory (by a factor of 10) to avoid most caching effects.

The test file can also be generated by SQLIO itself, although not directly for huge file sizes. It is recommended to generate a 1 GB file with the command "sqlio.exe -t32 -s1 -b256 1g" which will create the file named "1g" in the current directory. This file can then be concatenated to a sufficiently large file like 256GB, by the command "copy 1g+1g+1g+…..+1g testfile". To ensure that caching during the test file preparation do not skew the results, a server reboot is recommended before continuing with the specified tests.

The following set of commands is representative for the most performance critical disk operations in FAST Search Server. All assume that a file "testfile" exists in the current directory, which should be located on the disk planned to host FAST Search Server. Each test runs for 300 seconds: sqlio.exe -kR -t4 -o25 -b1 -frandom -s300 testfile sqlio.exe -kR -t4 -o25 -b32 -frandom -s300 testfile sqlio.exe -kW -t4 -o25 -b32 -frandom -s300 testfile sqlio.exe -kR -t1 -o1 -b100000 -frandom -s300 testfile sqlio.exe -kW -t1 -o1 -b100000 -frandom -s300 testfile

The first test measures the maximum number of I/O operations per second for small read transfers. The second and third tests measure the performance for medium sized random accesses. The two last tests measures read and write throughput for large transfers. Some example results are given in the following table, with minimum recommendations during normal operation in the topmost row.

70

Disk layout

Recommended minimum

16x SAS 10k RPM 2.5" drives

RAID50 in two parity groups

22x SAS 10k RPM 2.5" drives


With drive failure

12x SAS 7200 RPM 3.5" drives


With drive failure

12x SAS 7200 RPM 3.5" drives

RAID10

With drive failure

12x SAS 15k RPM 3.5” drives


2x ZeusIOPS 400GB

MLC 2.5” drives

RAID0

1x ioDrive 640GB MLC

1x ioDrive Duo 1280GB MLC

3x ioDrive Duo 1280GB MLC RAID0


Non-default option: 4kB block size


With card failure

Note:

1kB read

[IOPS]

2000

2952

4326

3144

1844

1424

1682

1431

4533

52709

83545

160663

162317

181593 3

188284

126469

32kB read

[IOPS]

1800

2342

3587

2588

1315

982

1134

925

3665

14253

21875

42647

83661

86396

87270

48564

32kB write

[IOPS]

900

959

1638

1155

518

531

1169

1154

848

2717 2

17687

32574

44420

47423

11800

10961

100MB read

[MB/s]

500

568

1359

770

677

220

762

213

501

360

676

1309

2382

2340

2459

716

100MB write

[MB/s]

250

277

122

533

664

1412

1631

545

202

The numbers in the table reflects a deployment where the disk subsystem is at least 50% utilized in capacity before adding the test file. Testing on empty disks tends to get elevated results, as the test file is then placed in the most optimal tracks across all spindles (shortstroking), which can give 2-3x higher performance.

Numbers in rows highlighted in red are measured with a forced drive failure.

RAID50 provides better performance during normal operation than RAID10 for most tests apart from small writes. RAID10 has less performance degradation if a drive should fail. We recommend using RAID50 for most deployments, as 32kB writes is the least critical of the five

266

257

780

477

692

220

235

2 This is the average IOPS over the standard 300 second test period. It although starts out as

~3500 IOPS, degrading to a sustained ~1700 IOPS after 3-4 minutes.

3 Tested with 4kB block reads due to the different block size formatting

71

tests indicated in the table above. RAID50 provides near twice the storage capacity compared to

RAID10 on the same number of disks.

If you deploy a backup indexer, 32kB writes are more frequent. This is due to the fact that a large amount of pre-index storage files (FiXML) are passed from the primary to the backup indexer node. This may in certain cases lead to a performance improvement by using RAID10.

Note:

These results are to a large degree dependent on the disk controller and spindles used. All scenarios in this document specify in detail the actual hardware that has been tested.

Analyzing feeding and indexing performance

The content processing chain in FAST Search Server consists of the following components, all potentially running on separate nodes:

 Crawler(s): Any node pushing content into FAST Search Server, in most cases a Content

SSA hosted in a SharePoint 2010 farm.

 Content distributor(s): Receives content in batches and redistribute them to item processing in document processors









Item processing: Converts documents to a unified internal format

Indexing dispatcher(s): Schedules a indexer node for each content batch

Primary indexer: Generates the index

Backup indexer: Persists a backup of the information in the primary indexer

Content flows as indicated by arrows 1–5 in the figure above, with the last flow from primary to backup indexer is an optional deployment choice. Asynchronous callbacks for completed processing are propagating in the other direction as indicated by arrows 6 through 9. Crawlers will be throttling the feed rate based on the callbacks (9) received for document batches (1).

The overall feed performance will be determined by the slowest component in this chain. The following sections will describe how to monitor this.

Monitoring can be done through several tools; for example the "Performance monitor" of

Windows Server 2008 [R2], or on Systems Center Operations Manager (SCOM).

Content SSA

The most frequently used crawler is the set of indexing connectors supported by the Content

SSA. The following statistics are important:

72

 Batches ready: The number of batches that has been retrieved from the content sources, and that are ready for passing on to the content distributor.

 Batches submitted: The number of batches that has been sent to FAST Search Server, and for which a callback is still pending.

 Batches open: The total number of batches in some stage of processing.

The figure below shows these performance counters for a crawl session. Note that there is different scale used in "batches submitted" and the other two. Feed starts with "batches submitted" ramping up until the item processing components are all busy (36 in this case), and will stay at this level as long as there are available work ("batches ready"). There is a period around 6:45 to 8:45 where the content source is only able to provide very limited volumes of data, bringing "batches ready" to near zero in the same period.

For deployments with backup indexer rows, the "batches submitted" tend to exceed the number of item processing components. These "additional" batches are content that has been processed, but which has not yet been persisted in both indexer rows. The Content SSA will by default throttle feeds in order to avoid more than 100 "batches submitted".

For large installations, the throttling parameters should be adjusted to allow for more batches to be in some stage of processing. Tuning is only needed for deployments with at least one of the following characteristics:

 More than 100 item processing component instances deployed per crawl component in the content SSA

 More than 50 item processing component instances deployed per crawl component in the content SSA, in conjunction with a backup indexer row

 More than 3 index columns per crawl component in the content SSA

73

The number of crawl components within the content SSA must be dimensioned properly for large deployments to avoid network bottlenecks. This scaling will often eliminate the need for further configuration tuning. When one or more of the above mentioned conditions apply, the feeding performance can be improved by increasing throttling limits in the Content SSA. These properties are "MaxSubmittedBatches" (default 100) and "MaxSubmittedPUDocs" (default 1000), and increased limits can be calculated as given below.

Note:

These limits apply for each crawl component within the Content SSA. If you use two crawl components (as in some of the scenario tests), the maximum total number of batches submitted will be two times the configured value.

𝑎 = {

1 for deployments without backup indexer

2 for deployments with backup indexer 𝑏 = 𝑎 ∗ Number of item processor instances 𝑐 = Number of index columns 𝑠 = Number of crawl components in the Content SSA

MaxSubmittedBatches =

20 ∗ 𝑐 + 𝑏 𝑠

MaxSubmittedPUBatches = 100 ∗ MaxSubmittedBatches

For example, the M4 scenario will have a=1, b=48, c=3, s=2; resulting in MaxSubmittedBatches

= 54 and MaxSubmittedPUDocs = 5400. The default value (100) for MaxSubmittedBatches does not need tuning in this case. MaxSubmittedPUDocs (the maximum number of documents with

ACL changes submitted) may be increased if the feed performance is limited by a high rate of

ACL changes. These configuration parameters have not been changed in any of the scenarios covered in this document.

These throttling limits are configurable through the SharePoint 2010 Management Shell on the

SharePoint farm hosting the Content SSA. The following commands set the default values:

$ssa = Get-SPEnterpriseSearchServiceApplication -Identity "My Content SSA"

$ssa.ExtendedConnectorProperties["MaxSubmittedBatches"] = 100

$ssa.ExtendedConnectorProperties["MaxSubmittedPUDocs"] = 1000

$ssa.Update()

You need to replace the identity string "My Content SSA" with the name of your Content SSA.

Increasing these limits will increase the load on the item processing and indexing components.

When these consume more of the farm resources, query performance will be impacted. This is less of an issue when running without a dedicated search row. Increasing

"MaxSubmittedPUDocs" will increase the I/O load on primary and backup indexers.

The table below shows the most important performance counters for the Content SSA. Note that these are found on the node(s) hosting the Content SSA crawl components, under "OSS Search

FAST Content Plugin", and not in the FAST Search Server farm.

74

Performance counter

Apply to object

Notes

Batches open Content

SSA

Batches submitted

Batches ready

Items Total

Available

Mbytes

Processor time

Bytes

Total/sec

Content

SSA

Content

SSA

Content

SSA

Memory

The total number of batches in some stage of processing.

The number of batches that has been sent to FAST Search

Server, and for which a callback is still pending. When zero, nothing has been sent to the FAST Search Server farm backend for processing.

The number of batches that has been retrieved from the content sources and that are ready for submitting to the content distributor. When zero, the FAST Search Server farm backend is processing content faster than the Content SSA is able to crawl.

The total number of items passed through the Content SSA since last service restart.

The total amount of available memory on the computer. The

Content SSA will by default stop aggregating batches ready when 80% system memory has been used.

Processor Overall CPU usage on the computer. High CPU load could limit the throughput of the Content SSA.

Network

Interface

Overall network usage on the computer. High network load might become a bottleneck for the rate of data that can be crawled and pushed to the FAST Search Server farm.

Content distributors and item processing

Each FAST Search Server farm has one or more content distributors. These components receive all content in batches, which are passed on to the item processing components. You can ensure good performance by verifying that the following conditions are met:

 Item processing components are effectively utilized

 Incoming content batches are rapidly distributed for processing

Maximum throughput can only be achieved when the Content SSA described in the previous section has a constant queue of "bathes ready" that can be submitted. Each item processing component will use 100% of a CPU core when busy. Item processing components can be scaled up to one per CPU core.

When having multiple content distributors, the below performance counters should be summed up across all of them for a total overview of the system.


Apply to object

Notes

Document processors

FAST Search

Content

Distributor

The number of item processing components registered with each content distributor. When having multiple content distributors, the item processing components will be evenly distributed across the content distributors.

75

Document processors busy

Average dispatch time

Average processing time

Available

Mbytes

Processor time

Bytes

Total/sec

FAST Search

Content

Distributor

FAST Search

Content

Distributor

FAST Search

Content

Distributor

Memory

Processor

Network

Interface

The number of item processing components that are currently working on a content batch. This should be close to the total number of item processing components under maximum load.

The time needed for the content distributor to send a batch to an item processing component. This should be less than

10ms. Higher values indicate a congested network.

The time needed for a batch to go through an item processing component. This time can vary depending on content types and batch sizes, but would normally be less than 60 seconds.

The total amount of available memory on the computer.

Each item processing component might need up to 2GB of memory. Processing throughput will be impacted under memory starvation.

Overall CPU usage on the computer. Item processing components are very CPU intensive. High CPU utilization is expected during crawls, but item processing is scheduled with reduced priority and will yield CPU resources to other components when needed.

Overall network usage on the computer. High network load might become a bottleneck for the rate of data that can be processed by the FAST Search Server nodes.

Indexing dispatcher and indexers

Indexers are the most write intensive component in a FAST Search Server installation, and you need to ensure that you high disk performance. High indexing activity can also affect query matching operations when running on the same row.

Indexers distribute the items across several partitions. Partition 0, and up to three of the other partitions can have ongoing activity at the same time. During redistribution of items among partitions, one or more partitions might be in a state waiting for other partitions to reach a specific checkpoint. In addition to the performance counters below, indexer status is provided by the "indexerinfo" command, for example "indexerinfo –a status".


Apply to object

Notes

76

Current queue size

FAST

Search

Indexer

Status

Indexers queues incoming work under high load. This is normal, especially for partial updates. If API queues never

(intermittently) reaches zero, the indexer is the bottleneck.

Feeds will be paused when the queue reaches 256MB in one of the indexers.

This can happen if the storage subsystem is not sufficiently powerful. It will also happen during large redistribution of content between partitions, which temporarily blocks more content from being indexed.

FiXML files are compacted at regular intervals, by default between 3am and 5am every night. Low FiXML fill rate (<70%) will lead to inefficient operation.

FiXML fill rate

FAST

Search

Indexer

Active documents

FAST

Search

Indexer

Partition

Partitions 0 and 1 should have less than 1 million items each, preferably even less in order to keep indexing latency low. In periods with high item throughput, indexing latency will be reduced and these partitions will be larger, as this is more optimal for overall throughput. Items will although automatically be rearranged into the higher numbered partitions during periods with lighter load.

% Idle Time Logical disk Low disk idle time suggest a saturated storage subsystem.

% Free space

Logical disk Indexers need space for both the index generation currently used for search, as well as new index generations that are under processing. On a fully loaded system, disk usage will vary between 40% and near 100% for the same number of items, depending on the state of the indexer.

Analyzing query performance

Query SSA

SharePoint administrative reports provide useful statistics for query performance from an endto-end perspective. These reports are effective for tracing trends over time, as well as identifying where to investigate when performance is not optimal.

The diagram below shows two such events. Around 2:20am, server rendering (blue graph) has a short spike due to recycling of the application pool. Later, at 3:00am, the FiXML compaction is starting, impacting the backend latency.

77

In general, server rendering and object model latencies occurs on the nodes running SharePoint.

These latencies are also dependent on the performance of the SQL server(s) backing the

SharePoint installation. The backend latency is within the FAST Search Server nodes, and will be discussed in the following sections.

QRproxy and QRserver

Queries are sent from the Query SSA to the FAST Search Server farm via the QRproxy component which resides on the server running the query processing component ("query" in the deployment file). The performance counters in the table below can be helpful for correlating the backend latency reported by the Query SSA, and the query matching component (named

"QRServer" in the reports). Neither of these components is likely to represent a bottleneck. Any difference between the two is due to communication delays or processing in the QRproxy.

Performance counter Apply to object

Notes

# Queries/sec Current number of queries per second

# Requests/sec

Average queries per minute

FAST Search

QRServer

FAST Search

QRServer

FAST Search

QRServer

Current number of requests per second. In addition to the query load, one internal request is received every second to check that QRserver is alive.

Average query load

78

Average latency last - ms

Peak queries per sec

FAST Search

QRServer

FAST Search

QRServer

Average query latency

Peek query load seen by the QRserver since last restart

Query dispatcher

The query dispatcher (named "Fdispatch" in the reports) distributes queries across index columns. There is also a query dispatcher located on each query matching node, distributing queries across index partitions. Both query dispatchers may be a bottleneck when there are huge amounts of data in the query results, leading to network saturation. It is recommended to keep traffic in and out of fdispatch on network connections that are not carrying heavy load from e.g. content crawls.

Query matching

The query matching (component named "Fsearch" in the reports) is responsible for performing the actual matching of queries against the index, computing query relevancy and performing deep refinement. For each query, it reads the required information from the indices generated by the indexer. Information that is likely to be reused will be kept in a memory cache. Good query matching performance is relying on a powerful CPU as well as low latency from small random disk reads (typically 16-64 kB). The below performance counters are useful for analyzing a node running the query matching:

Performance counter Apply to object

Notes

% Idle Time

Avg. Disk sec/Read

Avg. Disk Read Queue

Length

Processor time

Logical disk Low disk idle time suggest a saturated storage subsystem

Physical disk Each query will need a series of disk reads. An average read latency of less than 10 ms is desirable.

Physical disk On a saturated disk subsystem, read queues will build up. Queues will affect query latency. An average queue length smaller than 1 is desirable for any node running query components. This will typically be exceeded in single row deployments during indexing, negatively impacting search performance.

Processor CPU utilization is likely to become the bottleneck for high query throughput. When query matching has high processor time (near 100%), query throughput will not be able to increase further.

79