FAST Search Server 2010 for SharePoint Capacity Planning

Microsoft Corporation
June 2010

Applies to: FAST Search Server 2010 for SharePoint

This document is provided "as-is". Information and views expressed in this document, including URL and other Internet Web site references, may change without notice. You bear the risk of using it. Some examples depicted herein are provided for illustration only and are fictitious. No real association or connection is intended or should be inferred. This document does not provide you with any legal rights to any intellectual property in any Microsoft product. You may copy and use this document for your internal, reference purposes.

© 2010 Microsoft Corporation. All rights reserved.

Summary: This white paper describes specific deployments of FAST Search Server 2010 for SharePoint, including:

- Test environment specifications, such as hardware, farm topology, and configuration
- The workload that is used for data generation, including the number and class of users, and farm usage characteristics
- Test farm dataset, including search indexes and external data sources
- Health and performance data that is specific to the tested environment
- Test data and recommendations for how to determine the hardware, topology, and configuration that you need to deploy a similar environment, and how to optimize your environment for appropriate capacity and performance characteristics
Contents

Introduction
Search Overview
    Sizing Approach
    Search and Indexing Lifecycle
        Content feeding
        Query load
        Network traffic
    Web Analyzer performance dimensioning
Scenarios
    Medium FAST Search Farm
        Deployment alternatives
        Specifications
        Test Results
        Overall Takeaways
Troubleshooting performance and scalability
    Raw I/O performance
    Analyzing feed and indexing performance
        The feed and indexing processing chain
        Content SSA
        Content distributors and item processing
        Indexing dispatcher and indexers
    Analyzing query performance
        Query SSA
        QRproxy and QRserver
        Query dispatcher
        Query matching

Introduction

This white paper provides capacity planning information for collaboration environment deployments of FAST Search Server 2010 for SharePoint, which is referred to as FAST Search Server. The white paper includes the following information for sample search farm configurations:

- Test environment specifications, such as hardware, farm topology, and configuration
- The workload that is used for data generation, including the number and class of users and farm usage characteristics
- Test farm dataset, including search indexes and external data sources
- Health and performance data that is specific to the tested environment

The white paper also contains common test data and recommendations for how to determine the hardware, topology, and configuration that you need to deploy a similar environment, and how to optimize your environment for appropriate capacity and performance characteristics.

FAST Search Server contains a richer set of features and a more flexible topology model than the search solution in earlier versions of SharePoint. Before you employ this architecture to deliver more powerful features and functionality to your users, you must carefully consider the effect on your farm's capacity and performance.

When you read this white paper, you will understand how to:

- Define performance and capacity targets for your environment.
- Plan the hardware that is required to support the number and type of users and the features you intend to deploy.
- Design your physical and logical topology for optimum reliability and efficiency.
- Test, validate, and scale your environment to achieve performance and capacity targets.
- Monitor your environment for key indicators.
Before you read this white paper, you should read the following:

- Performance and capacity management (SharePoint Server 2010)
- Plan Farm Topology (FAST Search Server 2010 for SharePoint)

Search overview

Sizing approach

The scenarios in this white paper describe FAST Search Server test farms, with assumptions that allow you to start planning for the correct capacity for your farm. To choose the right scenario, you must consider the following questions:

1. Corpus size: How much content has to be searchable? The total number of items should include all objects: documents, web pages, list items, and so on.
2. Availability: What are the availability requirements? Do customers need a search solution that can survive the failure of a particular server?
3. Content freshness: How fresh do the search results need to be? How long after the customer modifies the data do you expect searches to provide the updated content in the results? How often do you expect the content to change?
4. Throughput: How many people will be searching over the content simultaneously? This includes people typing in a query box, in addition to hidden queries such as Web Parts automatically searching for data, or Microsoft Outlook 2010 Social Connectors requesting activity feeds that contain URLs that need security trimming from the search system.

Search and indexing lifecycle

The scenarios allow you to estimate capacity at an early stage of the farm. Farms move through multiple stages as content is crawled:

- Index acquisition: This is the first stage of data population, and it is characterized by:
  - Full crawls (possibly concurrent) of content.
  - Close monitoring of the crawl system to ensure that hosts being crawled are not a bottleneck for the crawl.
- Index maintenance: This is the most common stage of a farm. It is characterized by:
  - Incremental crawls of all content, detecting new and changed content.
  - For SharePoint content crawls, a majority of the changes that are encountered during the crawl are related to access right changes.
- Index cleanup: This stage occurs when a content change moves the farm out of the index maintenance stage, for example, when a content database or site is moved from one search service application to another. This stage is not covered in the scenario testing behind this white paper, but it is triggered when:
  - A content source, a start address, or both are deleted from a search service application.
  - A host supplying content is not found by the content connector for an extended period of time.

Content feeding

Index acquisition

Feed performance, when new content is being added, is mainly determined by the configured number of item processing components. Both the number of CPU cores and the speed of each core affect the results. As a first order approximation, a 1 GHz CPU core can process one average size (about 250 KB) Office document per second. For example, the M4 scenario, which is discussed later in this white paper, has 48 CPU cores for item processing, each running at 2.26 GHz, for a total estimated throughput of 48 cores × 2.26 GHz ≈ 100 items per second, on average.

The following crawl rate graph (taken from the SharePoint administration reports) shows one such crawl. The crawl rate varies depending on the type of the content. Most of the crawl is new additions (labeled as "modified" in the graph).

Note: The indicated feed rates might saturate content sources and networks during peak feeding rate periods in the preceding crawl. See the Troubleshooting performance and scalability section for further information about how to monitor feeding performance.

Index maintenance

Incremental crawls can consist of various operations:

- Access right (ACL) changes and deletes: These require near zero item processing, but they place a high processing load on the indexer. Feed rates are higher than for full crawls.
- Content updates: These require full item processing, in addition to more processing by the indexer compared to adding new content. Internally, such an update corresponds to a deletion of the old item and an addition of the new content.
- Additions: Incremental crawls contain newly discovered items. These have the same workload as index acquisition crawls.

Depending on the type of operation, an incremental crawl might be faster or slower than an initial full crawl. It is faster in the case of mainly ACL updates and deletes, and it is slower in the case of mainly updated items. Using a backup indexer might slow down the incremental crawl of updated items further.

In addition to updates from the content sources, the index is also altered by internal operations:

- The FAST Search Server link analysis and click-through log analysis generate additional internal updates to the index. For example, a hyperlink in one item leads to an update of the anchor text information that is associated with the referenced item. Such updates have a load pattern that is similar to the ACL updates.
- At regular intervals, the indexer performs internal reorganization of index partitions and data defragmentation. Defragmentation starts every night at 3:00 A.M., although redistribution across partitions occurs whenever it is needed.

These internal operations imply that you can observe indexing activity outside intervals with ongoing content crawls.

Query load

Index partitioning and query evaluation

The overall index is partitioned on two levels:

- Index columns: The complete searchable index can be split into multiple disjoint index columns when the complete index is too large to reside on one server. A query is evaluated against all index columns within the search cluster, and the results from each index column are merged into the final query hit list.
- Index partitions: Within each index column, the indexer uses a dynamic partitioning of the index to handle a large number of indexed items with low indexing and query latency. This partitioning is dynamic and handled internally on each index server.

When a query is evaluated, each partition runs within a separate thread. The default number of partitions is five. To handle more than 15 million items per server (column), you have to change the number of partitions (and associated query evaluation threads). This is discussed in the Configuration for extended content capacity section.

Query latency

Evaluation of a single query is schematically illustrated in the following figure. CPU processing (light blue) is followed by waiting for disk access cycles (white) and actual disk data read transfers (dark blue), repeated 2-10 times per query. This implies that the query latency depends on the speed of the CPU, in addition to the I/O latency of the storage subsystem. A single query is evaluated separately, and in parallel, across multiple index partitions in all index columns. In the default five-partition configuration, each query is evaluated in five separate threads.

Query throughput

When query load increases, multiple queries are evaluated in parallel, as indicated in the following figure. Because the different phases of query evaluation occur at different times, simultaneous I/O accesses are not likely to become a bottleneck. CPU processing shows considerable overlap, and it is scheduled across the available CPU cores of the node. In all scenarios that were tested, the query throughput reaches its maximum when all available CPU cores are 100 percent utilized. This happens before the storage subsystem becomes saturated. More and faster CPU cores increase the query throughput and eventually make disk accesses the bottleneck.
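The CPU-bound throughput ceiling described above can be expressed as a simple back-of-envelope model. The function below is an illustrative sketch, not a product formula; the core count and the ~2 CPU-seconds of aggregate work per query are hypothetical values chosen to mirror the single-row saturation point (about 8 queries per second) reported later in this white paper.

```python
def max_query_throughput(logical_cores, cpu_seconds_per_query):
    """Rough ceiling on queries/second when CPU, not disk, is the bottleneck.

    Assumes query evaluation spreads evenly across all logical cores and that
    each query consumes a fixed amount of aggregate CPU time across its
    partition threads.
    """
    return logical_cores / cpu_seconds_per_query

# Hypothetical example: 16 logical cores (2x quad-core CPUs with
# hyper-threading) and ~2 CPU-seconds of aggregate work per query.
print(max_query_throughput(16, 2.0))  # 8.0 queries/second (~480 per minute)
```

Beyond this point, additional queries only queue up, which is why latency grows linearly with queue length once the cores are saturated.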
Note: In larger deployments with many index columns, the network traffic between query processing and query matching nodes might also become a bottleneck, and you might consider increasing the network bandwidth for this interface.

Index size effect on query performance

Query latency is largely independent of query load up to the CPU starvation point at maximum throughput. Query latency for each query is a function of the number of items in the largest index partition.

The following diagram shows query latency on a system that starts out with 5 million items in the index, with more content being added in batches up to 43 million items. Feeds are not running continuously, so the effects of feeding on query performance at different capacity points can be seen. Data is taken from the M6 scenario, which is described later in this white paper. There are three periods in the graph where search has been stopped, rendered as zero latency. You can also observe that the query latency is slightly elevated when query load is applied after an idle period. This is due to caching effects.

The query rate is, on average, 10 queries per minute, apart from a query throughput test within the first day of testing. That test reached about 4,000 queries per minute, making the 10 queries per minute rate almost invisible in the graph. Thus, the graph shows light-load query latency and not maximum throughput.

The following diagram shows the feed rates during the same interval. By comparing the two graphs, you can see that an ongoing feed causes some degradation of query latency. Because this scenario has a search row with a backup indexer, the effect is much smaller than in systems where search runs on the same nodes as indexing and item processing.

Percentile-based query performance

The graphs presented earlier in this white paper show the average query latency. The SharePoint administrative reports also provide percentile-based reports.
This can provide a more representative performance summary, especially under high load conditions. The following graph shows the percentile-based query performance for the same system as in the previous section. While the previous graphs showed average query latencies of about 500-700 ms, the percentile graph shows that the median latency (fiftieth percentile) levels out at about 400 ms as content is added. The high percentiles show larger variations, due to the increased number of items on the system and the effect of ongoing crawls.

Note: The percentile-based query performance graph includes the crawl rate of the Query SSA. This does not show any crawling activity, because the Query SSA only crawls user profile data for the people search index. People search is not included in the test scenarios in this white paper. Crawling of all other sources is performed by the FAST Search Server Content SSA.

The query throughput load test during the first day reveals that high query load reduces the latency for the high percentiles. During low query load and ongoing feed, a large fraction of the queries hits fresh index generations without warm caches. When query load increases (within the maximum throughput capacity), the fraction of cold-cache queries goes down. This reduces the high percentile latencies.

Deployments with indexing and queries on the same row

Because crawls and queries both use CPU resources, deployments with indexing and queries on the same row show some degradation in query performance during content crawls. Single row deployments are likely to have indexing, query, and item processing all running on the same servers.

The following test results were gathered by applying an increasing query load to the system. The graphs are taken from the SharePoint administrative reports. The query latency is plotted as an area against the left axis, and the query throughput is a light blue line against the right axis. In the following diagram there is no ongoing content feed.
The colors of the graph are as follows:

- Red: Back end, that is, time consumed in the FAST Search Server nodes
- Yellow: Object model
- Blue: Server rendering

Query latency remains stable at about 700 ms up to 8 queries per second (~500 queries per minute). At this point, the server CPU capacity becomes saturated. When even higher loads are applied, query queues build up and latency increases linearly with the queue length.

In the following diagram, the same query load is applied with ongoing content feeding. This implies that queries have to take CPU capacity from the lower prioritized item processing. Consequently, query latency starts to increase even at low load, and the maximum throughput is reduced from ~600 to ~500 queries per minute. Note the change in scale on the axis compared to the previous graph.

Query latency shows higher variation during feed. The spikes shown in the graph are due to the indexer completing larger work batches, leading to new index generations that invalidate the current query caches.

Using a dedicated search row

You can deploy a dedicated search row to isolate query traffic from indexing and item processing. This requires twice the number of servers in the search cluster, at the benefit of better and more consistent query performance. Such a configuration also provides query matching redundancy.

A dedicated search row implies some additional traffic during crawls when the indexer creates a new index generation (a new version of the index for a given partition). The new index data is passed over the network from the indexer node to the query matching node. Given a proper storage subsystem, the main effect on query performance is a slight degradation when new generations arrive, due to cache invalidation.

Search row combined with backup indexer

You can deploy a backup indexer to handle non-recoverable errors on the primary indexer. You typically co-locate the backup indexer with a search row.
For this scenario, you should generally not deploy item processing to the backup indexer (with search row). The backup indexer increases the I/O load on the search row, because there is additional housekeeping communication between the primary and backup indexer to keep the index data on the two servers in sync. This also implies additional data storage on disk for both servers. Make sure that you dimension your storage subsystem to handle the additional load.

Network traffic

With increased CPU performance on the individual servers, the network connection between the servers can become a bottleneck. For example, even a small four-node FAST Search Server farm can process and index more than 100 items per second. If the average item is 250 KB, this represents about 250 megabits/s of average network traffic. Such a load can saturate even a 1 gigabit/s network connection.

The network traffic that is generated by content feeding and indexing is as follows:

1. The indexing connector within the Content SSA retrieves the content from the source.
2. The Content SSA (within the SharePoint farm) passes the retrieved items in batches to the content distributor component in the FAST Search Server farm.
3. Each item batch is sent to an available item processing component, typically located on another server.
4. After processing, each batch is passed to the indexing dispatcher, which splits the batches according to index column distribution.
5. The indexing dispatcher distributes the processed items to the indexers of each index column.
6. The binary index is copied to additional search rows (if deployed).

The accumulated network traffic across all nodes can be more than five times higher than the content stream itself in a distributed system. A high performance network switch is needed to interconnect the servers in such a deployment. High query throughput also generates high network traffic, especially when using multiple index columns.
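The feed rate and network figures in this section follow from the white paper's first-order approximation (one average ~250 KB Office document per second per 1 GHz of item processing CPU). The short script below is an illustrative sizing sketch of that arithmetic, not a measured result:

```python
def feed_rate_items_per_sec(item_processing_cores, ghz_per_core):
    """First-order approximation: one average Office document per second
    per 1 GHz of item processing CPU."""
    return item_processing_cores * ghz_per_core

def payload_mbit_per_sec(items_per_sec, avg_item_kb):
    """Raw content payload on the wire, in megabits/second."""
    return items_per_sec * avg_item_kb * 8 / 1000

# M4-style estimate: 48 item processing cores at 2.26 GHz.
print(feed_rate_items_per_sec(48, 2.26))  # 108.48 items/s, quoted as ~100

# 100 items/s of 250 KB items: 200 Mbit/s of raw payload; with protocol
# overhead this approaches the ~250 Mbit/s cited above.
print(payload_mbit_per_sec(100, 250))     # 200.0

# Accumulated farm-internal traffic can be >5x the content stream, which
# saturates a single 1 gigabit/s link.
print(5 * payload_mbit_per_sec(100, 250)) # 1000.0
```

This is why the text recommends a high performance switch: one content stream that fits comfortably on a 1 Gbit/s adapter still multiplies into farm-internal traffic that does not.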
Make sure that you define the deployment configuration and network configuration to avoid too much overlap between network traffic from queries and network traffic from content feeding and indexing.

Web Analyzer performance dimensioning

Performance dimensioning of the Web Analyzer component depends on the number of indexed items and whether the items contain hyperlinks. Items that contain hyperlinks, or that are linked to, represent the main load on the Web Analyzer. Database-type content does not typically contain hyperlinks. SharePoint and other types of intranet content often contain HTML with hyperlinks. External Web content consists almost exclusively of HTML documents with many hyperlinks.

The number of CPU cores and the amount of disk space are vital for performance dimensioning of the Web Analyzer, but disk space is the most important. The following table specifies rule-of-thumb dimensioning recommendations for the Web Analyzer.

Content type             Number of items per CPU core   GB disk per million items
Database                 20 million                     2
SharePoint / Intranet    10 million                     6
Public Web content       5 million                      25

The amount of memory that is needed is the same for all types of content, but it depends on the number of cores used. We recommend planning for 30 MB per million items plus 300 MB per CPU core.

The link, anchor text, or click-through log analysis is performed only if sufficient disk space is available. The number of CPU cores affects only the amount of time it takes to update the index with anchor text and rank data.

Note: The table provides dimensioning rules for the whole farm. If the Web Analyzer components are distributed over two servers, the requirements per server are half of the given values.

If the installation contains different types of content, the safest capacity planning strategy is to use the most demanding content type as the basis for the dimensioning.
For example, if the system contains a mix of database and SharePoint content, we recommend dimensioning the system as if it contains only SharePoint content.

Scenarios

This section describes a typical medium sized search farm. This is a moderate search farm scenario, where the farm provides search services to other farms. Forty million items are crawled from SharePoint, Web servers, and file shares.

In addition, larger scenarios are briefly discussed (no test data is provided in this white paper):

- Large FAST Search farm: The large search farm scenario, where the administrator provides search services to other farms. One hundred million items are crawled from SharePoint, file shares, Web servers, and databases.
- Extra-large FAST Search farm: The same scenario as the large FAST Search farm, but with increased capacity, up to 500 million indexed items.

Note: A small farm, typically a single-server FAST Search Server farm with up to 15 million items, is not covered in this white paper.

Note: The following scenarios do not include storage sizing for storing a system backup, because backups would typically not be stored on the FAST Search Server nodes themselves.

Each of the following sub-sections describes a specific scenario. General guidelines follow in the Recommendations section of this white paper.

Medium FAST Search Server farm

The amount of content over which to provide search is moderate (as many as 40 million items), and to meet freshness goals, incremental crawls are likely to occur during business hours. The configuration for the parent SharePoint farm uses two front-end Web servers, two application servers, and one database server, arranged as follows:

- Application servers and front-end Web servers only have disk space for the operating system and programs. No separate data storage is required.
- Two crawl components for the Content SSA are distributed across the two application servers.
This is mainly due to I/O limitations in the test setup (1 gigabit/s network), where a single network adapter would have been a bottleneck.

- One of the application servers also hosts Central Administration for the farm.
- One database server supports the farm, hosting the crawl databases and the FAST Search Server administration databases, in addition to the other SharePoint databases.

Deployment alternatives

The FAST Search Server farm can be deployed in various configurations to suit different business needs. For the medium farm scenario, the following alternatives have been tested for the FAST Search Server farm back-end:

M1. One combined administration and Web Analyzer server, and three index column servers with default configuration (4 servers)
M2. Same as M1, but using SAN storage (4 servers)
M3. A single high capacity server that hosts all FAST Search Server components
M4. Same as M1, with the addition of a dedicated search row (7 servers)
M5. Same as M3, with the addition of a dedicated search row (2 servers)
M6. Same as M4, but the search row includes a backup indexer row (7 servers)
M7. Same as M5, but the search row includes a backup indexer row (2 servers)

An extended version of the M2 scenario, with a total of 100 million items, has also been tested to a limited extent, mostly to see whether there would be bottlenecks when SAN is used for larger setups. This is referred to as M2-100M.

Specifications

This section provides detailed information about the hardware, software, topology, and configuration of the test environment.

Hardware

FAST Search Server farm servers

All the medium size scenarios run on similar hardware. Unless otherwise stated, the following specifications have been used.
Shared specifications:

- Windows Server 2008 R2 x64 Enterprise Edition
- 2x Intel L5520 CPUs
  - Hyper-threading switched on
  - Turbo Boost switched on
- 24 GB memory
- 1 gigabit/s network card
- Storage subsystem:
  - Operating system: 2x 146 GB 10,000 RPM SAS disks in RAID1
  - Application: 18x 146 GB 10,000 RPM SAS disks in RAID50 (two parity groups of 9 drives each). Total formatted capacity of 2 terabytes.
  - Disk controller: HP Smart Array P410, firmware 3.00
  - Disks: HP DG0146FARVU, firmware HPD5

Changes for M2:

- Application is hosted on 2-terabyte partitions on a SAN
- SAN used for test:
  - 3Par T-400
  - 240 10,000 RPM spindles (400 GB each)
  - Dual ported FC connection to each application server, using MPIO without any FC switch. MPIO enabled in the operating system.

Changes for M2-100M:

- Same as M2, but with increased storage space: 16-terabyte partitions on the SAN

Changes for M3/M5:

- 48 GB memory
- Application is hosted on 22x 300 GB 10,000 RPM SAS drives in RAID50 (two parity groups of 11 spindles each). Total formatted capacity of 6 terabytes.

Changes for M7:

- 2x Intel E5640 CPUs
  - Hyper-threading switched on
  - Turbo Boost switched on
- 48 GB memory
- Dual 1 gigabit/s network cards
- Storage subsystem:
  - Application hosted on 12x 1-terabyte 7,200 RPM SAS drives in RAID10. Total formatted capacity of 6 terabytes.
  - Disk controller: Dell PERC H700, firmware 12.0.1-0091
  - Disks: Seagate Constellation ES ST31000424SS, firmware KS65

SharePoint Server 2010 servers

Application and front-end Web servers do not need storage beyond the operating system, application binaries, and log files.

- Windows Server 2008 R2 x64 Enterprise Edition
- 2x Intel L5420 CPUs
- 16 GB memory
- 1 gigabit/s network card
- Storage subsystem for operating system and programs: 2x 146 GB 10,000 RPM SAS disks in RAID1

Instances of SQL Server

The specification is the same as for the servers with SharePoint Server 2010 in the previous section, with an additional disk RAID for SQL Server data: 6x 146 GB 10,000 RPM SAS disks in RAID5.
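As a sanity check on the storage figures above, standard RAID arithmetic reproduces the quoted formatted capacities. The helper below is a simplified sketch (it ignores filesystem and vendor formatting overhead, so real formatted capacity lands slightly lower than the raw usable figure):

```python
def usable_gb(disks, disk_gb, level, parity_groups=1):
    """Approximate usable capacity for common RAID levels.

    RAID50 is modeled as striping over `parity_groups` RAID5 groups,
    losing one disk's worth of capacity per group.
    """
    if level in ("raid1", "raid10"):
        return (disks // 2) * disk_gb      # mirrored: half the raw capacity
    if level == "raid5":
        return (disks - 1) * disk_gb       # one parity disk
    if level == "raid50":
        return (disks - parity_groups) * disk_gb
    raise ValueError("unsupported RAID level: " + level)

print(usable_gb(18, 146, "raid50", parity_groups=2))  # 2336 GB, ~2 TB (shared spec)
print(usable_gb(22, 300, "raid50", parity_groups=2))  # 6000 GB, ~6 TB (M3/M5)
print(usable_gb(12, 1000, "raid10"))                  # 6000 GB, ~6 TB (M7)
print(usable_gb(6, 146, "raid5"))                     # 730 GB (SQL Server data)
```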
Topology

This section describes the topology of the environment.

M1 and M2 (same configuration)

M1 and M2 are similar except for the storage subsystem. M1 runs on local disk, while M2 uses SAN storage.

M1 and M2 have a search cluster with three index columns and one search row. There is one separate administration node that also includes the Web Analyzer components. Item processing is spread out across all nodes.

These scenarios do not have query matching running on a dedicated search row. This implies that there will be a noticeable degradation in query performance during content feeds. The effect can be reduced by running feeds during off-peak hours, or by reducing the number of item processing components to reduce the maximum feed rate.

The following figure shows the M1 deployment alternative. All the tested deployment alternatives use the same SharePoint Server and Database Server configuration. For the other deployments, only the FAST Search Server farm topology is shown.

The following deployment.xml file is used.
<?xml version="1.0" encoding="utf-8" ?>
<deployment version="14" modifiedBy="contoso\user"
    modifiedTime="2009-03-14T14:39:17+01:00" comment="M1"
    xmlns="http://www.microsoft.com/enterprisesearch"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.microsoft.com/enterprisesearch deployment.xsd">
  <instanceid>M1</instanceid>
  <connector-databaseconnectionstring>
    <![CDATA[jdbc:sqlserver://sqlbox.contoso.com\sql:1433;DatabaseName=M1.jdbc]]>
  </connector-databaseconnectionstring>
  <host name="fs4sp1.contoso.com">
    <admin />
    <query />
    <webanalyzer server="true" link-processing="true" lookup-db="true" max-targets="4" />
    <document-processor processes="12" />
  </host>
  <host name="fs4sp2.contoso.com">
    <content-distributor />
    <searchengine row="0" column="0" />
    <document-processor processes="12" />
  </host>
  <host name="fs4sp3.contoso.com">
    <content-distributor />
    <searchengine row="0" column="1" />
    <document-processor processes="12" />
  </host>
  <host name="fs4sp4.contoso.com">
    <indexing-dispatcher />
    <searchengine row="0" column="2" />
    <document-processor processes="12" />
  </host>
  <searchcluster>
    <row id="0" index="primary" search="true" />
  </searchcluster>
</deployment>

M3

The M3 scenario combines all components on one server. The same effects of running concurrent feed and queries as described for M1 and M2 apply. The reduced number of servers implies fewer item processing components, and thus a lower feed rate than M1 and M2.

The following figure shows the FAST Search Server farm topology for the M3 deployment alternative. The SharePoint Server and Database Server configuration is equal to what is described for M1. For the other deployments, only the FAST Search Server farm topology is shown.

The following deployment.xml file is used.
<?xml version="1.0" encoding="utf-8" ?>
<deployment version="14" modifiedBy="contoso\user"
    modifiedTime="2009-03-14T14:39:17+01:00" comment="M3"
    xmlns="http://www.microsoft.com/enterprisesearch"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.microsoft.com/enterprisesearch deployment.xsd">
  <instanceid>M3</instanceid>
  <connector-databaseconnectionstring>
    <![CDATA[jdbc:sqlserver://sqlbox.contoso.com\sql:1433;DatabaseName=M3.jdbc]]>
  </connector-databaseconnectionstring>
  <host name="fs4sp1.contoso.com">
    <admin />
    <query />
    <content-distributor />
    <indexing-dispatcher />
    <searchengine row="0" column="0" />
    <webanalyzer server="true" link-processing="true" lookup-db="true" max-targets="4" />
    <document-processor processes="12" />
  </host>
  <searchcluster>
    <row id="0" index="primary" search="true" />
  </searchcluster>
</deployment>

M4

M4 corresponds to M1 with the addition of a dedicated search row. The search row adds query throughput capacity, introduces query redundancy, and provides better separation of query and feeding load. Each of the three servers in the dedicated search row also includes a query processing component (query). The deployment also includes a query processing component on the administration node (fs4sp1.contoso.com). The Query SSA does not use this query processing component during typical operation, but it can serve queries as a fallback if the entire search row is taken down for maintenance.

The following figure shows the FAST Search Server farm topology for the M4 deployment alternative. The SharePoint Server and Database Server configuration is the same as described for M1.

The following deployment.xml file is used.
<?xml version="1.0" encoding="utf-8" ?>
<deployment version="14" modifiedBy="contoso\user"
    modifiedTime="2009-03-14T14:39:17+01:00" comment="M4"
    xmlns="http://www.microsoft.com/enterprisesearch"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.microsoft.com/enterprisesearch deployment.xsd">
  <instanceid>M4</instanceid>
  <connector-databaseconnectionstring>
    <![CDATA[jdbc:sqlserver://sqlbox.contoso.com\sql:1433;DatabaseName=M4.jdbc]]>
  </connector-databaseconnectionstring>
  <host name="fs4sp1.contoso.com">
    <admin />
    <query />
    <webanalyzer server="true" link-processing="true" lookup-db="true" max-targets="4" />
    <document-processor processes="12" />
  </host>
  <host name="fs4sp2.contoso.com">
    <content-distributor />
    <searchengine row="0" column="0" />
    <document-processor processes="12" />
  </host>
  <host name="fs4sp3.contoso.com">
    <content-distributor />
    <indexing-dispatcher />
    <searchengine row="0" column="1" />
    <document-processor processes="12" />
  </host>
  <host name="fs4sp4.contoso.com">
    <indexing-dispatcher />
    <searchengine row="0" column="2" />
    <document-processor processes="12" />
  </host>
  <host name="fs4sp5.contoso.com">
    <query />
    <searchengine row="1" column="0" />
  </host>
  <host name="fs4sp6.contoso.com">
    <query />
    <searchengine row="1" column="1" />
  </host>
  <host name="fs4sp7.contoso.com">
    <query />
    <searchengine row="1" column="2" />
  </host>
  <searchcluster>
    <row id="0" index="primary" search="true" />
    <row id="1" index="none" search="true" />
  </searchcluster>
</deployment>

M5

M5 corresponds to M3 with the addition of a dedicated search row, giving the same benefits as M4 compared to M1.

The following figure shows the FAST Search Server farm topology for the M5 deployment alternative. The SharePoint Server and Database Server configuration is the same as described for M1.

The following deployment.xml file is used.
<?xml version="1.0" encoding="utf-8" ?>
<deployment version="14" modifiedBy="contoso\user"
    modifiedTime="2009-03-14T14:39:17+01:00" comment="M5"
    xmlns="http://www.microsoft.com/enterprisesearch"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.microsoft.com/enterprisesearch deployment.xsd">
  <instanceid>M5</instanceid>
  <connector-databaseconnectionstring>
    <![CDATA[jdbc:sqlserver://sqlbox.contoso.com\sql:1433;DatabaseName=M5.jdbc]]>
  </connector-databaseconnectionstring>
  <host name="fs4sp1.contoso.com">
    <admin />
    <query />
    <content-distributor />
    <indexing-dispatcher />
    <searchengine row="0" column="0" />
    <webanalyzer server="true" link-processing="true" lookup-db="true" max-targets="4" />
    <document-processor processes="16" />
  </host>
  <host name="fs4sp2.contoso.com">
    <query />
    <searchengine row="1" column="0" />
  </host>
  <searchcluster>
    <row id="0" index="primary" search="true" />
    <row id="1" index="none" search="true" />
  </searchcluster>
</deployment>

M6

M6 is the same setup as M4, with an additional backup indexer enabled on the search row. Only the deployment.xml change compared to M4 is shown in the following code example.

…
<searchcluster>
  <row id="0" index="primary" search="true" />
  <row id="1" index="secondary" search="true" />
</searchcluster>
…

M7

M7 is the same setup as M5, with an additional backup indexer enabled on the search row. M7 also runs on nodes with more CPU cores (see the hardware specifications), allowing an increase in the number of item processing components, including on the search row. The following deployment.xml is used.
<?xml version="1.0" encoding="utf-8" ?>
<deployment version="14" modifiedBy="contoso\user"
    modifiedTime="2009-03-14T14:39:17+01:00" comment="M7"
    xmlns="http://www.microsoft.com/enterprisesearch"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.microsoft.com/enterprisesearch deployment.xsd">
  <instanceid>M7</instanceid>
  <connector-databaseconnectionstring>
    <![CDATA[jdbc:sqlserver://sqlbox.contoso.com\sql:1433;DatabaseName=M7.jdbc]]>
  </connector-databaseconnectionstring>
  <host name="fs4sp1.contoso.com">
    <admin />
    <query />
    <content-distributor />
    <indexing-dispatcher />
    <searchengine row="0" column="0" />
    <webanalyzer server="true" link-processing="true" lookup-db="true" max-targets="4" />
    <document-processor processes="20" />
  </host>
  <host name="fs4sp2.contoso.com">
    <query />
    <searchengine row="1" column="0" />
    <document-processor processes="8" />
  </host>
  <searchcluster>
    <row id="0" index="primary" search="true" />
    <row id="1" index="secondary" search="true" />
  </searchcluster>
</deployment>

Configuration for extended content capacity

FAST Search Server has a default configuration that is optimized for handling as many as 15 million items per index column, with a hard limit of 30 million items per index column. Some of the scenarios described in this white paper use a modified configuration that allows for as many as 40 million items per column. This is referred to as an extended capacity configuration.

The extended content capacity configuration uses more index partitions within each server node. In this way, low query latency can be maintained at the expense of a reduced maximum number of queries per second (QPS).

Note: The modified indexer configuration is not optimal for deployments with fewer than 15 million documents, and should only be used when higher capacity is required. Modifying the indexer configuration also has implications for how you perform patch and service pack upgrades, as described later in this section.
The extended content capacity configuration requires a change to a configuration file that is used by the indexer.

Configure the indexer to handle up to 40 million items per column

To reconfigure the indexers, you must modify the indexer template configuration file and run the deployment script to generate and distribute the new configuration.

Note: The following procedure can only be applied to indexers that do not contain any data.

1. Verify that no crawling is ongoing.
2. Verify that no items are indexed on any of the indexers. Type the following command: %FASTSEARCH%\bin\indexerinfo -a doccount. All the indexers should report 0 items.
3. On all FAST Search Server nodes, type the following commands:
   net stop fastsearchservice
   net stop fastsearchmonitoring
4. On the admin server node:
   a. Save a backup of the original configuration file, %FASTSEARCH%\META\config\profiles\default\templates\installer\etc\config_data\RTSearch\clusterfiles\rtsearchrc.xml.win32.template. You might need this backup at a later stage if this configuration file is modified by a patch or service pack upgrade.
   b. Modify the following configuration values:
      - numberPartitions: change the default setting of 5 to 10.
      - docsDistributionMax: change the default setting of 6000000,6000000,6000000,6000000,6000000 to 6000000,6000000,6000000,6000000,6000000,6000000,6000000,6000000,6000000,6000000.
   c. The deployment file, %FASTSEARCH%\etc\config_data\deployment\deployment.xml, must be modified for the Windows PowerShell cmdlet Set-FASTSearchConfiguration to run the redeployment. You can do that by opening the file in Notepad, adding a space, and saving the file.
   d. Type the following commands:
      Set-FASTSearchConfiguration
      net start fastsearchservice
5.
On all non-admin server nodes, type the following commands:
   Set-FASTSearchConfiguration
   net start fastsearchservice

Handling patches and service pack upgrades

For all future patch or service pack updates, you must verify whether this configuration file is updated as part of the patch or service pack. Review the readme file thoroughly for any mention of this configuration file. If a patch or service pack updates this configuration file, follow these steps:

1. Replace the configuration file %FASTSEARCH%\META\config\profiles\default\templates\installer\etc\config_data\RTSearch\clusterfiles\rtsearchrc.xml.win32.template with the backup of the original file that you saved.
2. Perform the patch or service pack upgrade according to the appropriate procedure.
3. Reapply the change to the configuration file template as specified earlier. Do not forget to back up the modified configuration file template.

There are some tradeoffs associated with extending the capacity:

- Query throughput is reduced. Query latency (as long as the throughput limit is not exceeded) is less affected. Reduced query throughput can be compensated for with multiple search rows, but that diminishes the reduction in server count.
- Indexing requires more resources and more disk accesses.
- More items per column require more storage space per server. The total storage space across the entire farm stays mainly the same.
- There are fewer nodes for distributing item processing components. The initial feed rate is reduced, because the feed rate depends mainly on the number of available CPU cores. Incremental feeds also have lower throughput, because each index column has more work to do. Initial preproduction bulk feeds can be accelerated by temporarily adding item processing components to any search rows, or by temporarily assigning additional servers to the cluster.
- More hardware resources per server are required.
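As a rough illustration of the capacity limits discussed above, the following Python sketch (the helper name is hypothetical; the 15, 30, and 40 million thresholds are those stated in this section) estimates how many index columns a given corpus needs under the default and extended configurations:

```python
import math

# Items-per-column limits described in this white paper.
DEFAULT_OPTIMAL = 15_000_000     # default configuration, optimized target
DEFAULT_HARD_LIMIT = 30_000_000  # default configuration, hard limit
EXTENDED_MAX = 40_000_000        # extended content capacity configuration

def columns_needed(total_items: int, per_column_limit: int) -> int:
    """Minimum number of index columns to stay within a per-column limit."""
    return math.ceil(total_items / per_column_limit)

corpus = 42_700_000  # index size from the dataset section of this paper
print(columns_needed(corpus, DEFAULT_OPTIMAL))  # default configuration
print(columns_needed(corpus, EXTENDED_MAX))     # extended configuration
```

This matches the tested scenarios: M1 spreads the corpus over three columns at default settings, while M3 and M5 hold roughly 40 million items in a single column with the extended configuration.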
We do not recommend that you use the extended settings on a server with fewer than 16 CPU cores/threads (24 or more is recommended). We recommend 48 GB RAM and a high-performance storage subsystem. See the individual scenarios for tested configurations.

In summary, we recommend the extended content capacity configuration only for deployments with:

- High content volumes, but where the number of changes over time is low, typically less than 1 million changes per column over 24 hours.
- Low query throughput requirements (not more than 5-10 queries per second, depending on the CPU performance of the servers).
- Search running on a different search row than the primary indexer, because the indexer is expected to be busy most of the time.

Note: Any content change implies load on the system, including ACL changes. ACL changes can affect many items at a time when access rights change on document libraries or sites.

Dataset

This section describes the test farm dataset, including database content and sizes, search indexes, and external data sources.

Object                               Value
Search index size (# of items)       42.7 million
Size of crawl database               138 GB
Size of crawl database log file      11 GB
Size of property database            <0.1 GB
Size of property database log file   0.3 GB
Size of SSA administration database  <0.1 GB

Note: The FAST Search Server index does not use any SQL Server-based property database. The people search index uses the property database, but the test scenarios in this white paper do not include people search.

The following table specifies which content source types are used to build the index. The numbers in the table reflect the total number of items per source, including replicated copies. The difference between the total number of items in the table (43.8 million) and the index size in the previous table (42.7 million) is due to two factors:

- Items can be disabled from indexing in the content source.
- The document format type cannot be indexed.
For SharePoint sources, the size of the respective content database in SQL Server is used as the raw data size.

Content source            Total items    Raw data size   Average size per item
File share 1 (2 copies)   1.2 million    154 GB          128 KB
File share 2 (2 copies)   29.3 million   6.7 terabytes   229 KB
SharePoint 1              4.5 million    2.0 terabytes   443 KB
SharePoint 2              4.5 million    2.0 terabytes   443 KB
HTML 1                    1.1 million    8.8 GB          8.1 KB
HTML 2                    3.2 million    137 GB          43 KB
Total                     43.8 million   11 terabytes    251 KB

Note: To reach sufficient content volume in these tests, two replicas of the file shares are added. Each copy of each document appears as a unique item in the index, but is treated as a duplicate by the duplicate trimming feature. From a query matching perspective the load is similar to having all unique documents indexed, but any results from these sources trigger duplicate detection and collapsing in the search results.

The test scenarios do not include people search data. People search is crawled and indexed in a separate index within the Query SSA.

Workload

This section describes the workload that is used for data generation, including the number of concurrent users and farm usage characteristics. The number of queries per second (QPS) is varied from 1 QPS to about 40 QPS, and the latency is recorded as a function of this.

The query test set consists of 76,501 queries with the following characteristics:

Query terms   Number of queries   Percentage of test set
1             49,195              64.53
2             24,520              32.16
3             2,411               2.81
4             325                 0.43
5             43                  0.06
7             7                   0.01

Two types of multiterm queries are used:

1. ALL queries (about 70 percent of the multiterm queries), meaning all terms must appear in matching items. This includes queries containing an explicit AND, in addition to lists of terms that are implicitly parsed as an AND statement.
2. ANY queries (about 30 percent of the multiterm queries), meaning at least one of the terms must appear in matching items (OR).

The queries are chosen by random selection.
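A test query mix with the ALL/ANY split described above can be sketched in a few lines of Python. This is illustrative only: the vocabulary is hypothetical, and the generic AND/OR keyword syntax stands in for whatever query syntax the load tool actually submits.

```python
import random

def make_multiterm_query(terms, rng):
    """Build one multiterm test query: ~70% ALL (AND) and ~30% ANY (OR)."""
    n = rng.randint(2, 5)                    # number of terms in the query
    words = rng.sample(terms, n)             # random selection without repeats
    op = " AND " if rng.random() < 0.7 else " OR "
    return op.join(words)

rng = random.Random(42)                      # fixed seed: reproducible test set
vocabulary = ["contoso", "report", "budget", "sales", "policy", "archive", "draft"]
queries = [make_multiterm_query(vocabulary, rng) for _ in range(10)]
for q in queries:
    print(q)
```

A fixed seed matters here: the QPS-versus-latency curves in the test results are only comparable across scenarios if every run replays the same query set.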
The number of agents defines the query load. One agent repeats the following two steps during the test:

1. Submit a query.
2. Wait for the response.

There is no pause between repetitions: a new query starts immediately after a query response is received. The number of agents increases in steps during the test. The following figure shows a typical test where the number of agents increases periodically. In this example the test runs for 15 minutes at each number of agents.

Test results

This section provides data that shows how the farm performed under load.

Feed and indexing performance

Full crawl

The following diagram shows the number of items processed per second for the various scenarios. Key takeaways:

- The item processors are the bottleneck; CPU processing capacity is the limiting factor.
- M1, M2, and M4 have similar performance characteristics because the same number of item processors is available. The same applies to M3 and M5.
- M1, M2, and M4 have four times the item processor capacity of M3 and M5, which translates to four times higher crawl performance during a full crawl.
- Running with backup indexers incurs a performance overhead due to the extra synchronization work required. Typically, an installation without backup indexers, such as M4, outperforms one with backup indexers, such as M6.

Incremental crawl

Key takeaways:

- Incremental crawls are faster than full crawls, from slightly faster up to a factor of two or three. This is because incremental crawls mainly consist of partial updates, which only update metadata. This implies that the feed performance is largely the same for all content types.
- The indexers are the bottleneck for incremental crawls, because the item processing load is limited. Typically, disk I/O capacity is the limiting factor. During an incremental update the old version of the item is fetched from disk, modified, persisted to disk, and then indexed.
This is more expensive than a full crawl operation, where the item is only persisted and indexed.

Query performance

Scenario M1

The following diagram shows the query latency as a function of QPS for the M1 scenario. As the number of agents increases, the QPS increases. At low QPS, the query matching component can handle the increasing QPS, and latency does not increase much. At higher QPS, the system gradually saturates and the query latency increases.

An idle indexer gives the best query performance, with an average latency of less than 0.7 seconds up to approximately 21 QPS. The corresponding number during a full crawl is 10 QPS, and during an incremental crawl, 15 QPS.

The previous figure shows that QPS decreases and latency increases if you apply more query load after the maximum capacity of the system has been reached. This occurs at the point where the curve starts bending backwards. On the M1 system, the peak QPS is about 28 with idle indexers. CPU resources are the bottleneck in this scenario. The behavior is also illustrated in the next diagram, where you can observe that performance decreases when there are more than 40 simultaneous user agents on an idle system.

Hyper-threading

The following diagram shows the effect of using hyper-threading in the CPU. Hyper-threading allows more threads to execute in (near) parallel, at the expense of slightly reduced average performance for single-threaded tasks. FAST Search Server query matching components run in a single thread when QPS is low and no other tasks are running on the server.

Hyper-threading performs better for all three feeding cases in the M1 scenario. In other scenarios, with dedicated search rows, disabling hyper-threading gives a small reduction (about 150 ms) in query latency at very light query load. In general, hyper-threading reduces query latency and allows for higher QPS, especially when there are multiple components on the same server.
Disabling hyper-threading provides only a small improvement under conditions where performance is already good. Hence, we recommend keeping hyper-threading enabled.

Scenario M2

The following diagram shows the query latency as a function of QPS for the M2 scenario. As the number of agents increases, the QPS increases. At low QPS, query matching handles the increasing QPS, and latency does not increase much. At higher QPS, the system saturates and the latency increases. When the indexers are idle, there is a slow increase until the deployment reaches the saturation point at approximately 20 QPS. For full and incremental crawls the latency increases as indicated in the graph. This test does not include data to indicate exactly when query latency saturates during a crawl.

The following diagram shows the same test data presented as user agents versus latency.

Comparing M1 and M2

The next diagram compares the performance of M1 and M2. The main conclusion is that M1 performs somewhat better than M2: M1 can handle about 3 QPS more than M2 before reaching the saturation point. The SAN disks used on M2 should be able to match M1's locally attached disks in terms of I/O operations per second, but the bandwidth to the disks is somewhat lower with the SAN configuration. For full and incremental crawls, the performance was comparable during the light load tests. During heavy load, M2 showed slightly less impact on search from ongoing indexing, because the SAN provided more disk spindles to distribute the load.

Scenario M3

M3 (40 million items on a single server) can handle about 10 QPS when no feeding is ongoing. This is shown in the following diagrams, both as QPS versus latency and as user agents versus QPS. For comparison, the M1 data is also included. One characteristic of the single node installation is that the query latency fluctuates more when getting close to the saturation point.
Under low query load, M3 almost matches the performance of M1. Under higher load the limitations become apparent: M1 has three times the number of query matching nodes, and its peak QPS capacity is close to three times as high, 28 versus 10 QPS.

Scenario M4

M4 is an M1 installation with an added dedicated search row. The main benefit of such a configuration is that the indexing and search processes do not directly compete for the same resources, primarily disk and CPU. The following diagrams show that the added search row gives a five QPS gain over M1. In addition, the query latency improves by about 0.2-0.4 seconds.

Adding search rows, in most cases, improves query performance, but such a deployment also introduces additional network traffic that can affect performance. Query performance can degrade when search rows are added if you do not have sufficient network capacity. This happens when the indexers copy large index files to the query matching nodes. The index file copying can also affect indexing latency because of the added work of copying indices.

Query performance versus document volume

The following diagram shows the result of running the query test on M4 with 18 million, 28 million, and 43 million documents indexed. The document volume affects the maximum QPS that the system can deliver: adding approximately 10 million documents reduces the maximum QPS by approximately 5. Below 23 QPS, the document volume has little effect on query latency.

Scenario M5

Adding a dedicated search row improves query performance, as already illustrated by the comparison of M1 and M4. The same applies when adding a query matching node to the single node M3 setup to get an M5 deployment.

Scenario M6

The following diagram shows the query performance of the M6 versus the M4 topology. The difference between M6 and M4 is the addition of a backup indexer row.
The backup indexers compete with query matching for available resources, and they can degrade query performance. However, in this specific test that is not the case: the hardware that was used has enough resources to handle the extra load during typical operations. The backup indexers use significantly fewer resources than the primary indexers, because the primary indexers perform the actual indexing and distribute the indices to the search rows and the backup indexer row.

Note that all indexers perform regular optimization of internal data structures between 03:00 A.M. and 05:59 A.M. every night. These tasks can, depending on the feed pattern, be quite I/O intensive. Testing on M6 has shown that query performance can drop significantly during indexer optimization. The more update and delete operations the indexer handles, the more optimization is required.

Disk usage

Index disk usage

The following table shows the combined increase in disk usage on all nodes after the various content sources have been indexed.

Content source            Raw source data size   FiXML data size   Index data size   Other data size
File share 1 (2 copies)   154 GB                 18 GB             36 GB             5 GB
File share 2 (2 copies)   6.7 terabytes          360 GB            944 GB            10 GB
SharePoint 1              2.0 terabytes          70 GB             220 GB            13 GB
SharePoint 2              2.0 terabytes          66 GB             220 GB            17 GB
HTML 1                    8.8 GB                 27 GB             58 GB             8 GB
HTML 2                    137 GB                 17 GB             112 GB            6 GB
Total                     11 terabytes           558 GB            1.6 terabytes     56 GB

Raw source data size is included only for illustration; these data do not occupy any disk space on the FAST Search Server system.

FiXML data: The indexer stores the processed items on disk in an XML-based format. FiXML data serves as input to the indexing process, which builds the indices. Every submitted item is stored in FiXML format. Old versions are removed once a day; the data size given contains only a single version of every item.

Index data: The set of binary index files used for query matching.
FAST Search Server keeps a read-only index file set to serve queries while building the next index file set. The worst-case disk space usage for index data is therefore approximately 2.5 times the size of a single index file set; the extra 0.5 accounts for various temporary files.

Other data includes Web Analyzer data, log files, and so on. When running with primary and backup indexers, the indexers can consume an additional 50 GB each for synchronization data.

Key takeaways: The ratio between source data and index data depends strongly on the content type. This is related to the average amount of searchable data in the various data formats.

Web Analyzer disk usage

The following table shows disk usage for the Web Analyzer in a mixed content scenario, where the data is both file share content and SharePoint items.

Number of items in index                           40,667,601
Number of analyzed hyperlinks                      119,672,298
Average number of hyperlinks per item              2.52
Peak disk usage during analysis (GB)               77.51
Disk usage between analyses (GB)                   23.13
Disk usage per 1 million items during peak (GB)    1.63

The average number of links per item is quite low compared to pure Web content installations or pure SharePoint installations. For example, in a pure Web content installation the average number of links can be as high as 50. Because the Web Analyzer stores only document IDs, hyperlinks, and anchor texts, the number of links is the dominant factor determining disk usage.

The values in the preceding table are somewhat lower than the values specified in the Web Analyzer performance dimensioning section. The values in the preceding table derive from one specific installation where URLs are fairly short; the performance dimensioning recommendations are based on experience from several installations.

Overall Takeaways

Query and feeding performance

Performance for feeding new content is mainly determined by the item processing capacity.
It is important that you deploy the item processing components in a way that utilizes spare CPU capacity across all servers. Running the indexer, item processing, and query matching on the same server gives high resource utilization, but also higher variations in query performance during crawling. For such a deployment, we recommend that you schedule all crawling outside periods with high query load.

A separate search row is recommended for deployments where low query latency is required at all times. You can also combine a separate search row with a backup indexer. This provides short recovery time in case of a nonrecoverable disk error, at the cost of some query performance and incremental update rate. For the highest query performance requirements, we recommend a pure search row.

Redundancy

The storage subsystem for a farm must have some level of redundancy, because loss of storage even in a redundant setup leads to reduced performance during a recovery period that can last for days. Using a RAID disk set, preferably also with hot spares, is essential for any installation. A separate search row also provides query redundancy. Full redundancy for the feeding and indexing chain requires a backup indexer on a separate row, with increased server count and storage volume. Although this provides the quickest recovery path from hardware failures, other options might be more attractive when hardware outages are infrequent:

- Run a full re-crawl of all the content sources after recovery. Depending on the deployment alternative, this can take several days. If you have a separate search row, you can perform the re-crawl while you keep the old index searchable.
- Run regular backups of the index data.

Capacity per node

For deployments with up to 15 million items per node, you should use the default configuration. The configuration for extended content capacity can be used for up to 40 million items per node if you have moderate query performance requirements.
Given sufficient storage capacity on the servers, this enables a substantial cut in the number of servers that are deployed.

Deployments on SAN

FAST Search Server can use SAN storage instead of local disks if this is required for operational reasons. The requirement for high-performance storage still applies. Testing of the M2 scenario shows that a sufficiently powerful SAN is not a bottleneck. Although the actual workload is scenario dependent, the following parameters can be used to estimate the required SAN resources for each node in the FAST Search Server farm:

- 2,000 – 3,000 I/O operations per second (IOPS)
- 50 – 100 KB average block size
- Less than 10 ms average read latency

For a farm setup such as M4 (7 servers), the SAN must be capable of serving 15,000 – 20,000 IOPS to the FAST Search Server farm, regardless of any other traffic served by the same storage system.

Troubleshooting performance and scalability

This section provides recommendations for how to optimize your environment for appropriate capacity and performance characteristics. It also covers troubleshooting tips for the FAST Search Server farm servers, and the FAST Search Server specific configuration settings in the Query and Content SSAs.

Raw I/O performance

FAST Search Server makes extensive use of the storage subsystem. Testing the raw I/O performance can serve as an early verification that performance is sufficient. One such test tool is SQLIO (http://www.microsoft.com/downloads/details.aspx?familyid=9a8b005b-84e4-4f24-8d65-cb53442d9e19).

After installing SQLIO, the first step is to get or generate a suitable test file. Because the following tests include write operations, the content of this file is partially overwritten. The size of the file should also be much larger than the available system memory (by a factor of 10) to avoid most caching effects. The test file can also be generated by SQLIO itself, although not directly for huge file sizes.
We recommend that you generate a 1 GB file with the command "sqlio.exe -t32 -s1 -b256 1g", which creates a file named "1g" in the current directory. This file can then be concatenated into a sufficiently large file, such as 256 GB, with the command "copy 1g+1g+1g+…..+1g testfile".

The following set of commands represents the most performance-critical disk operations in FAST Search Server. All assume that a file "testfile" exists in the current directory, which should be located on the disk that is planned to host FAST Search Server:

sqlio.exe -kR -t4 -o25 -b1 -frandom -s300 testfile
sqlio.exe -kR -t4 -o25 -b32 -frandom -s300 testfile
sqlio.exe -kW -t4 -o25 -b32 -frandom -s300 testfile
sqlio.exe -kR -t1 -o1 -b100000 -frandom -s300 testfile
sqlio.exe -kW -t1 -o1 -b100000 -frandom -s300 testfile

The first test measures the maximum number of I/O operations per second for small read transfers. The second and third tests measure the performance for medium-sized random accesses. The last two tests measure read and write throughput for large transfers.

Some example results are given in the following table, with minimum recommendations during typical operation in the topmost row.

Disk layout                                                  1 KB read [IOPS]  32 KB read [IOPS]  32 KB write [IOPS]  100 MB read [MB/s]  100 MB write [MB/s]
Recommended minimum                                          2000              1800               900                 500                 250
16x SAS 10,000 RPM 2.5" drives, RAID50 in two parity groups  2952              2342               959                 568                 277
22x SAS 10,000 RPM 2.5" drives, RAID50 in two parity groups  4326              3587               1638                1359                266
  With drive failure                                         3144              2588               1155                770                 257
12x SAS 7200 RPM 3.5" drives, RAID50 in two parity groups    2728              1793               655                 904                 880
  With drive failure                                         1925              1242               680                 178                 306
12x SAS 7200 RPM 3.5" drives, RAID10                         2165              1828               1500                803                 767
  With drive failure                                         2015              1711               1498                751                 766

Note: The numbers in the table reflect a deployment where the disk subsystem is at least 50 percent utilized in capacity before the test file is added.
Testing on freshly formatted disks tends to produce slightly elevated results, because the test file is then placed in the most optimal tracks across all spindles. RAID50 provides better performance during typical operation than RAID10 for most tests other than small writes. RAID10 has less performance degradation if a drive fails. We recommend RAID50 for most deployments, because the 32 KB write test is the least critical of the five tests in the preceding table, and RAID50 provides nearly twice the storage capacity of RAID10 on the same number of disks. If you deploy a backup indexer, 32 KB writes are more frequent, because a large number of pre-index storage files (FiXML) are passed from the primary to the backup indexer node. In certain cases, this can lead to a performance improvement from using RAID10. Note that these results depend to a large degree on the disk controller and spindles that are used. All scenarios in this white paper specify in detail the actual hardware that was tested.

Analyzing feed and indexing performance

The feed and indexing processing chain

The feed and indexing processing chain in FAST Search Server consists of the following components, all potentially running on separate nodes:

Crawler(s): Any node pushing content into FAST Search Server, in most cases a Content SSA that is hosted in a SharePoint 2010 Server farm.
Content distributor(s): Receives content in batches, which are redistributed to item processing running in document processors.
Item processing: Converts documents to a unified internal format.
Indexing dispatcher(s): Schedules which indexer node gets the content batch.
Primary indexer: Generates the index.
Backup indexer: Persists a backup of the information in the primary indexer.

Content flows as indicated by arrows 1–5 in the preceding figure; the last flow, from the primary to the backup indexer, is an optional deployment choice.
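Because content moves through these components in sequence, the end-to-end feed rate is capped by the slowest stage. A minimal sketch of that reasoning (stage names and throughput numbers are illustrative assumptions, not measurements from the white paper):

```python
def feed_bottleneck(stage_throughput):
    """Return (slowest stage, its items/sec), which bounds the overall feed rate."""
    return min(stage_throughput.items(), key=lambda kv: kv[1])

# Hypothetical sustained throughput per stage, in items per second.
stages = {
    "crawler": 120.0,         # items/sec pushed by the Content SSA
    "item_processing": 90.0,  # scales with the number of document processors
    "indexing": 150.0,
}
print(feed_bottleneck(stages))  # ('item_processing', 90.0)
```

In this hypothetical case, adding item processing components (up to one per CPU core) is what would raise the overall feed rate.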
Asynchronous callbacks for completed processing propagate in the other direction, as indicated by arrows 6 through 9. Crawlers throttle the feed rate based on the callbacks (9) received for document batches (1). The overall feed performance is determined by the slowest component in this chain. The following sections describe how to monitor this. Monitoring can be done through several tools, for example Performance Monitor in Windows Server 2008 R2, or System Center Operations Manager.

Content SSA

The most frequently used crawler is the set of indexing connectors that is supported by the Content SSA. The following statistics are important:

Batches ready: The number of batches that have been retrieved from the content sources and are ready to be passed on to the content distributor.
Batches submitted: The number of batches that have been sent to FAST Search Server and for which a callback is still pending.
Batches open: The total number of batches in some stage of processing.

The following figure shows these performance counters for a crawl session. Note that a different scale is used for "batches submitted" than for the other two. The feed starts with "batches submitted" ramping up until the item processing components are all busy (36 in this case), and it stays at this level as long as there is available work ("batches ready"). There is a period from about 6:45 P.M. to 8:45 P.M. during which the content source can only provide very limited volumes of data, bringing "batches ready" to near zero in the same period. For deployments with backup indexer rows, "batches submitted" tends to exceed the number of item processing components. These additional batches are content that has been processed, but that has not yet been persisted in both indexer rows. By default, the Content SSA throttles feeds to avoid more than 100 "batches submitted".
For large installations, the throttling parameters should be adjusted to allow more batches to be in some stage of processing. Tuning is needed only for deployments with at least one of the following characteristics:

More than 100 item processing component instances deployed per crawl component in the Content SSA
More than 50 item processing component instances deployed per crawl component in the Content SSA, in conjunction with a backup indexer row
More than three index columns per crawl component in the Content SSA

The number of crawl components within the Content SSA must be dimensioned properly for large deployments to avoid network bottlenecks. This scaling often eliminates the need for further configuration tuning. When one or more of these conditions apply, the feeding performance can be improved by increasing the throttling limits in the Content SSA. These properties are "MaxSubmittedBatches" (default 100) and "MaxSubmittedPUDocs" (default 1,000), and increased limits can be calculated as follows. Note that these limits apply for each crawl component within the Content SSA. If you use two crawl components (as in our tests), the total value as seen from the FAST Search Server farm is two times the configured value.

a = 1 for deployments without a backup indexer, or 2 for deployments with a backup indexer
b = a × (number of item processing component instances)
c = number of index columns
s = number of crawl components in the Content SSA

MaxSubmittedBatches = (20 × c + b) / s
MaxSubmittedPUDocs = 100 × MaxSubmittedBatches

As an example, the M4 scenario has a=1, b=48, c=3, s=2, resulting in MaxSubmittedBatches = 54 and MaxSubmittedPUDocs = 5,400. The default value (100) for MaxSubmittedBatches does not need tuning in this case. MaxSubmittedPUDocs (the maximum number of documents with ACL changes submitted) can be increased if the feed performance is limited by a high rate of ACL changes.
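The calculation above is mechanical and easy to script. The following sketch (an illustration, not a tool shipped with the product) computes both limits and reproduces the M4 example:

```python
def throttle_limits(item_processors, index_columns, crawl_components,
                    backup_indexer=False):
    """Compute (MaxSubmittedBatches, MaxSubmittedPUDocs) per crawl component,
    using the formulas from the text above."""
    a = 2 if backup_indexer else 1          # doubled when a backup indexer row exists
    b = a * item_processors                 # effective item processing instances
    max_batches = (20 * index_columns + b) // crawl_components
    return max_batches, 100 * max_batches

# M4 scenario: no backup indexer (a=1), 48 item processors, 3 index columns,
# 2 crawl components in the Content SSA.
print(throttle_limits(48, 3, 2))  # (54, 5400)
```

As the text notes, a computed MaxSubmittedBatches below the default of 100 means the default does not need tuning.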
The configuration parameters mentioned here have not been changed in the scenarios covered in this white paper unless explicitly specified. These throttling limits are configurable through the SharePoint Server 2010 Management Shell on the SharePoint farm hosting the Content SSA, by the following commands (the example sets the default values):

$ssa = Get-SPEnterpriseSearchServiceApplication -Identity "My Content SSA"
$ssa.ExtendedConnectorProperties["MaxSubmittedBatches"] = 100
$ssa.ExtendedConnectorProperties["MaxSubmittedPUDocs"] = 1000
$ssa.Update()

Replace the identity string "My Content SSA" with the name of your Content SSA. Increasing these limits increases the load on the item processing and indexing components. When these consume more of the farm resources, query performance is affected. This is less of an issue when running with a dedicated search row. Increasing "MaxSubmittedPUDocs" also increases the I/O load on the primary and backup indexers.

The following table shows the most important performance counters for the Content SSA. Note that these are found on the node or nodes that host the Content SSA crawl components, under "OSS Search FAST Content Plugin", and not in the FAST Search Server farm.

Performance counter | Apply to object | Notes
Batches open | Content SSA | The total number of batches in some stage of processing.
Batches submitted | Content SSA | The number of batches that have been sent to FAST Search Server and for which a callback is still pending. When zero, nothing has been sent to the FAST Search Server farm backend for processing.
Batches ready | Content SSA | The number of batches that have been retrieved from the content sources and are ready for submission to the content distributor. When it is zero, the FAST Search Server farm backend is processing content faster than the Content SSA can crawl.
Items Total | Content SSA | The total number of items passed through the Content SSA since the last service restart.
Available MB | Memory | By default, the Content SSA stops aggregating batches that are ready when 80 percent of the system memory has been used.
Processor time | Processor | High CPU load can limit the throughput of the Content SSA.
Bytes Total/sec | Network Interface | High network load can become a bottleneck for the rate of data that can be crawled and pushed to the FAST Search Server farm.

Content distributors and item processing

Each FAST Search Server farm has one or more content distributors. These components receive all content in batches, which are passed on to the item processing components. You can ensure good performance by verifying that the following conditions are met:

Item processing components are effectively utilized
Incoming content batches are rapidly distributed for processing

Maximum throughput can only be achieved when the Content SSA described in the previous section has a constant queue of "batches ready" that can be submitted. Each item processing component uses 100 percent of a CPU core when it is busy. Item processing components can be scaled up to one per CPU core. With multiple content distributors, the following performance counters should be summed across all of them for a total overview of the system.

Performance counter | Apply to object | Notes
Document processors | FAST Search Content Distributor | The number of item processing components that are registered with each content distributor. With multiple content distributors, the item processing components should be almost evenly distributed across them.
Document processors busy | FAST Search Content Distributor | The number of item processing components that are currently working on a content batch. This should be close to the total number under maximum load.
Average dispatch time | FAST Search Content Distributor | The time needed for the content distributor to send a batch to an item processing component. This should be less than 10 ms. Higher values indicate a congested network.
Average processing time | FAST Search Content Distributor | The time needed for a batch to go through an item processing component. This time can vary depending on content types and batch sizes, but it would typically be less than 60 seconds.
Available MB | Memory | Each item processing component might need as much as 2 GB of memory. Processing throughput suffers under memory starvation.
Processor time | Processor | Item processing components are very CPU intensive, so high CPU utilization is expected during crawls. Item processing is, however, scheduled with reduced priority, and yields CPU resources to other components when needed.
Bytes Total/sec | Network Interface | High network load can become a bottleneck for the rate of data that can be processed by the FAST Search Server nodes.

Indexing dispatcher and indexers

Indexers are the most write-intensive components in a FAST Search Server installation, so you must ensure high disk performance. High indexing activity can also affect query matching operations when they run on the same row. Indexers distribute the items across several partitions. Partition 0 and up to three of the other partitions can have ongoing activity at the same time. During redistribution of items among partitions, one or more partitions might be waiting for other partitions to reach a specific checkpoint. In addition to the following performance counters, indexer status is provided by the "indexerinfo" command, for example "indexerinfo -a status".

Performance counter | Apply to object | Notes
API queue size | FAST Search Indexer | Indexers queue incoming work under high load. This is typical, especially for partial updates. If the API queues never reach zero, not even intermittently, the indexer is the bottleneck. Feeds are paused when the API queue reaches 256 MB in one of the indexers. This can happen if the storage subsystem is not sufficiently powerful. It also happens during a large redistribution of content between partitions, which temporarily blocks more content from being indexed.
FiXML fill rate | FAST Search Indexer | FiXML files are compacted at regular intervals, by default between 3:00 A.M. and 5:00 A.M. every night. A low FiXML fill rate (less than 70 percent) leads to inefficient operation.
Active documents | FAST Search Indexer Partition | Partitions 0 and 1 should have less than 1 million items each, preferably much less, to keep indexing latency low. In periods with high item throughput, indexing latency increases and these partitions grow larger, because this is more optimal for overall throughput. Items are automatically rearranged into the higher numbered partitions during periods with lighter load.
% Idle Time | Logical disk | Low disk idle time suggests a saturated storage subsystem.
% Free space | Logical disk | The indexer needs space both for the index generation currently used for search and for new index generations that are under processing. On a fully loaded system, disk usage varies between 40 percent and near 100 percent for the same number of items, depending on the state of the indexer.

Analyzing query performance

Query SSA

SharePoint administrative reports provide useful statistics for query performance from an end-to-end perspective. These reports are effective for tracing trends over time and for identifying where to investigate when performance is not optimal. The following diagram shows two such events. At about 2:20 A.M., server rendering (blue graph) has a short spike due to recycling of the application pool. Later, at 3:00 A.M., the FiXML compaction starts, which affects the backend latency. In general, server rendering and object model latencies occur on the nodes that are running SharePoint. These latencies also depend on the performance of the instances of SQL Server that back the SharePoint installation.
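The indexer rules of thumb (FiXML fill rate of at least 70 percent, fewer than 1 million active documents in partitions 0 and 1, and disk idle time that is not persistently low) can be turned into a simple health check. The sketch below is illustrative; the function name, the input counters, and the 10 percent disk-idle cutoff are assumptions, not thresholds defined by the product:

```python
def indexer_warnings(fixml_fill_rate_pct, active_docs_partition0,
                     active_docs_partition1, disk_idle_pct):
    """Evaluate sampled indexer counters against the rules of thumb above."""
    warnings = []
    if fixml_fill_rate_pct < 70:
        warnings.append("low FiXML fill rate: inefficient operation")
    if active_docs_partition0 >= 1_000_000 or active_docs_partition1 >= 1_000_000:
        warnings.append("partitions 0/1 above 1 million items: indexing latency suffers")
    if disk_idle_pct < 10:  # illustrative cutoff for "low disk idle time"
        warnings.append("low disk idle time: storage subsystem may be saturated")
    return warnings

# A healthy indexer node produces no warnings.
print(indexer_warnings(85, 400_000, 250_000, 40))  # []
```

A non-empty result points at which of the counters in the table deserves closer investigation.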
The backend latency is incurred within the FAST Search Server nodes, and it is discussed in the following sections.

QRproxy and QRserver

Queries are sent from the Query SSA to the FAST Search Server farm via the QRproxy component, which resides on the server running the query processing component ("query" in the deployment file). The performance counters in the following table can be helpful for correlating the backend latency reported by the Query SSA with the query matching component (named "QRServer" in the reports). Neither of these components is likely to represent a bottleneck. Any difference between the two is due to communication delays or processing in the QRproxy.

Performance counter | Apply to object | Notes
# Queries/sec | FAST Search QRServer | Current number of queries per second.
# Requests/sec | FAST Search QRServer | Current number of requests per second. In addition to the query load, one request is received every second to check that the QRserver is alive.
Average queries per minute | FAST Search QRServer | Average query load.
Average latency last ms | FAST Search QRServer | Average query latency.
Peak queries per sec | FAST Search QRServer | Peak query load seen by the QRserver since the last restart.

Query dispatcher

The query dispatcher (named "Fdispatch" in the reports) distributes queries across index columns. There is also a query dispatcher located on each query matching node, distributing queries across index partitions. The query dispatcher can become a bottleneck when there are huge amounts of data in the query results, leading to network saturation. We recommend that you keep traffic in and out of fdispatch on network connections that are not carrying heavy load, for example from content crawls.

Query matching

The query matching component (named "Fsearch" in the reports) is responsible for performing the actual matching of queries against the index, computing query relevancy, and performing deep refinement.
For each query, it reads the required information from the indices that are generated by the indexer. Information that is likely to be reused is kept in a memory cache for future use. Good fsearch performance relies on a powerful CPU and on low latency for small random disk reads (typically 16-64 KB). The following performance counters are useful for analyzing a node running query matching:

Performance counter | Apply to object | Notes
% Idle Time | Logical disk | Low disk idle time suggests a saturated storage subsystem.
Avg. Disk sec/Read | Physical disk | Each query needs a series of disk reads. An average read latency of less than 10 ms is desirable.
Avg. Disk Read Queue Length | Physical disk | On a saturated disk subsystem, read queues build up. Queues affect query latency. An average queue length smaller than 1 is desirable for any node running query components. This is typically exceeded in single-row deployments during indexing, negatively affecting search performance.
Processor time | Processor | CPU utilization is likely to become the bottleneck for high query throughput. When fsearch has high processor time (near 100 percent), query throughput cannot increase further.
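The thresholds in the table (average read latency under 10 ms, average read queue length under 1, and CPU not saturated) can likewise be checked programmatically against sampled counters. This is an illustrative sketch; the function name and sample values are assumptions:

```python
def query_node_ok(avg_read_latency_ms, avg_read_queue_len, cpu_pct):
    """True when sampled counters meet the query-matching rules of thumb above."""
    return (avg_read_latency_ms < 10     # desirable average disk read latency
            and avg_read_queue_len < 1   # desirable average disk read queue length
            and cpu_pct < 100)           # fsearch CPU not fully saturated

print(query_node_ok(6.5, 0.4, 85))   # True: healthy query node
print(query_node_ok(14.0, 2.3, 99))  # False: saturated disks, e.g. during indexing
```

The second sample reflects the single-row case described in the table, where indexing activity drives the read queue above 1 and degrades search performance.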