White Paper: Working with large lists in Office SharePoint® Server 2007
Author: Steve Peschka
Date published: August 2007
Summary:
Microsoft tested Microsoft® Office SharePoint® Server 2007 to determine the performance
characteristics of large SharePoint lists under different loads and modes of operation. This
white paper presents the findings.
The information contained in this document represents the current view of Microsoft Corporation
on the issues discussed as of the date of publication. Because Microsoft must respond to
changing market conditions, it should not be interpreted to be a commitment on the part of
Microsoft, and Microsoft cannot guarantee the accuracy of any information presented after the
date of publication.
This White Paper is for informational purposes only. MICROSOFT MAKES NO WARRANTIES,
EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS DOCUMENT.
Complying with all applicable copyright laws is the responsibility of the user. Without limiting the
rights under copyright, no part of this document may be reproduced, stored in or introduced into a
retrieval system, or transmitted in any form or by any means (electronic, mechanical,
photocopying, recording, or otherwise), or for any purpose, without the express written permission
of Microsoft Corporation.
Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual
property rights covering subject matter in this document. Except as expressly provided in any
written license agreement from Microsoft, the furnishing of this document does not give you any
license to these patents, trademarks, copyrights, or other intellectual property.
© 2007 Microsoft Corporation. All rights reserved.
Microsoft, SQL Server, Windows, SharePoint, and Active Directory are either registered
trademarks or trademarks of Microsoft Corporation in the United States and/or other countries.
The names of actual companies and products mentioned herein may be the trademarks of their
respective owners.
Table of Contents
Goals
Test results and findings
Test characteristics
Data access methods
Browser
SPList with For/Each
SPList with SPQuery
SPList with DataTable
SPListItems with DataTable
Lists Web service
Search
PortalSiteMapProvider
Test harness
WinForm test application
WebPart and JavaScript
Web Part
Test results
Browser-based viewing and page size
The baseline test
Testing with a very large list
Comparing results with an indexed column
Comparing an indexed column to an ID column
Analyzing the results
Search
PortalSiteMapProvider
SPList
Data maintenance considerations
Data locking
Crawl times
Related content
Goals
The test results in this white paper are intended to demonstrate the difference in the performance
characteristics of SharePoint lists containing large numbers of items when different data access
types are used to present list contents. Test results in this white paper show how to optimize list
performance through limits on the number of items that appear in a list, and by choosing the most
appropriate method of retrieving list contents.
The tests upon which the results in this white paper are based were conducted by using artificially
created test data and simulated users. Real-world results may vary depending on hardware,
number of concurrent users, farm configuration, and user operations being performed.
Test results and findings
There is documented guidance for Microsoft® Office SharePoint® Server 2007 regarding the
maximum size of lists and list containers. For typical customer scenarios in which the standard
Office SharePoint Server 2007 browser-based user interface is used, the recommendation is that
a list should not have more than 2,000 items per list container. A container in this case
means the root of the list, as well as any folders in the list; a folder is a container because other
list items are stored within it. A folder can contain items from the list as well as other folders, and
each subfolder can contain more of each, and so on. For example, you could have a list with
1,990 items in the root of the list, 10 folders that each contain 2,000 items, and so
on. The maximum number of items supported in a list with recursive folders is 5 million items.
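As a rough illustration of the arithmetic behind this guidance, the following sketch estimates how many folders a list needs so that no folder holds more than the recommended number of items. The module and function names are hypothetical, and the sketch assumes a flat layout in which every item is stored in a folder directly under the root of the list.

```vb
Imports System

Module ContainerMath
    ' Hypothetical helper: smallest number of single-level folders needed so
    ' that no folder holds more than itemsPerContainer items.
    Function MinFolders(ByVal totalItems As Integer, ByVal itemsPerContainer As Integer) As Integer
        If totalItems < 0 OrElse itemsPerContainer <= 0 Then
            Throw New ArgumentOutOfRangeException()
        End If
        ' Integer ceiling division.
        Return (totalItems + itemsPerContainer - 1) \ itemsPerContainer
    End Function

    Sub Main()
        ' 100,000 items at 2,000 items per folder.
        Console.WriteLine(MinFolders(100000, 2000))
    End Sub
End Module
```

For 100,000 items and the 2,000-item guideline, this works out to 50 folders, and those 50 folders themselves fit well within the limit for the root container.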
In Office SharePoint Server 2007, virtually all end-user data is stored in a list. A document library,
for example, is just a specialized list. The same is true for calendars, contacts, and other
interfaces; they are all just customized versions of the basic SharePoint list, also referred to as an
SPList. The individual items in the list are referred to as list items generally, or an SPListItem in
an SPListItemCollection in the Office SharePoint Server 2007 object model. The findings in this
white paper are equally important across all of the ways in which you store and work with data in an
Office SharePoint Server 2007 site.
There are some scenarios in which you want to take advantage of the features of Office
SharePoint Server 2007, but need to exceed the limit of 2,000 items per container. If you write
your own interface for managing and retrieving the data, it’s quite possible that you can go past
this limit without an adverse impact on farm performance. You may be able to manage larger lists
to some extent by using views within Office SharePoint Server 2007 that are filtered such that
there are never more than 2,000 items returned. Filtered views provide better performance than
just trying to view one large flat list, but are not as efficient as breaking down the list into different
containers if you are using the predefined browser-based Office SharePoint Server 2007
interface.
If you develop your own interface, there are several different ways to retrieve list data, each with
different performance characteristics. Some data access methods perform very well, but are only
useful in a limited number of scenarios. Finally, there are also performance tradeoffs that need to
be made with other data maintenance tasks in addition to data retrieval.
Test characteristics
The tests in this white paper were conducted on a relatively underpowered Microsoft Virtual
Server 2005 R2 image to show a comparison of farm performance characteristics when different
data access types are used to manipulate list data. The goal of these tests was not to establish a
new arbitrary limit, or to deliver a “requests per second” type number that is typically used in a
load style test to show raw throughput capacity. The virtual server image was running Office
SharePoint Server 2007 Enterprise Edition and had 1 gigabyte (GB) of allocated RAM. Virtual
Server was running on a host machine with a 2 gigahertz (GHz) dual-core processor and 2 GB of
RAM.
Baseline tests were done first with a list containing 1,500 items. The list schema looked like this:
Title: Single line of text
Expense Category: Choice (Meals, Travel, Hotel, Supplies)
Amount: Currency
Deductible: Yes/No
Created By: Person or Group
Modified By: Person or Group
In the baseline tests, no columns were indexed; measurements were taken just to provide a
relative value that could be used after the number of items in the list exceeded recommended
boundaries. In the tests against a very large list, one set was done with no columns being
indexed and a second round was done after configuring the Expense Category column to be
indexed. The query that was executed in each one of the tests used a WHERE clause against the
Expense Category field looking for the first 100 items that contained “Supplies.”
To provide another point of comparison, the data being selected was based on ID value in the
tests against the very large list. The ID is a built-in numeric indexed field in all SharePoint lists
that is well suited to queries. The query in this case was constructed with a WHERE clause that
retrieved items where the ID ranged from 44,500 through 44,599.
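For illustration, the two WHERE clauses described here could be expressed as CAML along the following lines. This is a sketch rather than the exact query text used in the tests. The module and function names are hypothetical, and the Counter value type for the ID field is an assumption based on how the built-in ID column is normally typed.

```vb
Imports System

Module TestQueries
    ' Hypothetical helper: CAML that matches one value of the Expense Category
    ' column. Assumes the value contains no XML special characters.
    Function CategoryEquals(ByVal category As String) As String
        Return "<Where><Eq><FieldRef Name='Expense_x0020_Category'/>" & _
               "<Value Type='Text'>" & category & "</Value></Eq></Where>"
    End Function

    ' Hypothetical helper: CAML that matches items whose ID lies between
    ' firstId and lastId, inclusive.
    Function IdBetween(ByVal firstId As Integer, ByVal lastId As Integer) As String
        Return "<Where><And>" & _
               "<Geq><FieldRef Name='ID'/><Value Type='Counter'>" & firstId.ToString() & "</Value></Geq>" & _
               "<Leq><FieldRef Name='ID'/><Value Type='Counter'>" & lastId.ToString() & "</Value></Leq>" & _
               "</And></Where>"
    End Function

    Sub Main()
        Console.WriteLine(CategoryEquals("Supplies"))
        Console.WriteLine(IdBetween(44500, 44599))
    End Sub
End Module
```

Either string would be assigned to the Query property of an SPQuery object, with RowLimit set to 100 for the Expense Category query.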
Some tests were also run with the site under load. To create the load during the testing process,
a LoadTest was created in the Microsoft Visual Studio® .NET 2005 development system to stress
test the site. Instead of targeting a specific number of users in the test, it was configured as a
goal-based test, or a test in which a target value is defined for a particular measurement, and the
test determines the number of requests required to achieve the target. In this case, the goal that
was configured for the test was to keep CPU utilization on the Office SharePoint Server 2007
computer consistently between 60 and 80 percent.
Data access methods
Each test consisted of retrieving a subset of data from the list using one of a number of different
data access methods. This section shows the different methods that were tested.
Note: The code samples included in the following sections are intended to show the process
used to conduct tests. The code may not comply with coding best practices, and should not be
used in a production environment without careful review and testing.
Browser
The list was viewed using a browser and the predefined Office SharePoint Server 2007 interface.
A special tool, which is described in the Test Harness section later in this white paper, was
developed to accurately capture how long it takes to view that information and browse through
pages of data.
SPList with For/Each
The Office SharePoint Server 2007 object model (OM) was used to retrieve the list into an SPList
object. Each item in the list was then enumerated with a For/Each loop until items were found
that matched the search criteria.
The following sample code was used for this method.
'get the site
Dim curSite As SPSite = New SPSite("http://myPortal")
'get the web
Dim curWeb As SPWeb = curSite.OpenWeb()
'get our list
Dim curList As SPList = curWeb.Lists(New Guid("myListGUID"))
'get the collection of items in the list
Dim curItems As SPListItemCollection = curList.Items
'enumerate the items in the list
For Each curItem As SPListItem In curItems
'do some comparison in here to see if it's an item we need
Next
SPList with SPQuery
The OM was used to create an SPQuery object that contained the query criteria. That object was
then run against an instance of the list in an SPList object. The results of the query were
returned by calling the GetItems method on the SPList object.
The following sample code was used for this method.
'get the site
Dim curSite As SPSite = New SPSite("http://myPortal")
'get the web
Dim curWeb As SPWeb = curSite.OpenWeb()
'create our query
Dim curQry As SPQuery = New SPQuery()
'configure the query
curQry.Query = "<Where><Eq><FieldRef Name='Expense_x0020_Category'/>" & _
    "<Value Type='Text'>Hotel</Value></Eq></Where>"
curQry.RowLimit = 100
'get our list
Dim curList As SPList = curWeb.Lists(New Guid("myListGUID"))
'get the collection of items in the list
Dim curItems As SPListItemCollection = curList.GetItems(curQry)
'enumerate the items in the list
For Each curItem As SPListItem In curItems
'do something with each match
Next
SPList with DataTable
This is one of two methods that test using a Microsoft ADO.NET DataTable to work with the data.
In this case an instance of the list is obtained with an SPList object. The data from it is then
retrieved into a DataTable by calling the GetDataTable() method on the Items property (for
example, SPList.Items.GetDataTable()). The DataTable’s DefaultView has a property called
RowFilter that was then set to find the items. To keep the methodology between data access
methods consistent, the DataTable was not cached between tests; it was filled each time by
calling the GetDataTable() method. In a real-world scenario this test would have performed better
had the DataTable been cached after the data was first retrieved, but it serves as a valuable point
of comparison for the cost of this approach versus retrieving a DataTable from a
selection of data that’s already filtered.
The following sample code was used for this method.
'get the site
Dim curSite As SPSite = New SPSite("http://myPortal")
'get the web
Dim curWeb As SPWeb = curSite.OpenWeb()
'get our list
Dim curList As SPList = curWeb.Lists(New Guid("myListGUID"))
'get the item in a datatable
Dim dt As DataTable = curList.Items.GetDataTable()
'get a dataview for filtering
Dim dv As DataView = dt.DefaultView
dv.RowFilter = "Expense_x0020_Category='Hotel'"
'enumerate matches
For rowNum As Integer = 0 To dv.Count - 1
'do something with each match
Next
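The tests deliberately did not cache the DataTable. As a minimal sketch of what caching it might look like, the following helper stores the table in the ASP.NET cache and reloads it only when no cached copy exists. The module name, function name, cache key format, and five-minute lifetime are all assumptions made for illustration; none of this was part of the test harness.

```vb
Imports System
Imports System.Data
Imports System.Web
Imports System.Web.Caching
Imports Microsoft.SharePoint

Module ListDataCache
    ' Hypothetical helper: returns the list contents as a DataTable, loading
    ' them from SharePoint only when no cached copy exists.
    Function GetListTable(ByVal list As SPList) As DataTable
        Dim key As String = "ListTable_" & list.ID.ToString()
        Dim dt As DataTable = TryCast(HttpRuntime.Cache(key), DataTable)
        If dt Is Nothing Then
            ' Expensive call: loads every item in the list.
            dt = list.Items.GetDataTable()
            ' GetDataTable can return Nothing for an empty collection,
            ' and the cache does not accept Nothing as a value.
            If dt IsNot Nothing Then
                ' The five-minute lifetime is an arbitrary illustrative value.
                HttpRuntime.Cache.Insert(key, dt, Nothing, _
                    DateTime.UtcNow.AddMinutes(5), Cache.NoSlidingExpiration)
            End If
        End If
        Return dt
    End Function
End Module
```

Callers would normally create their own DataView over the returned table (for example, New DataView(dt) with its RowFilter set) rather than changing the DefaultView of a table that is shared between requests. The first call still pays the full cost of loading the list.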
SPListItems with DataTable
This method is similar to the SPList with DataTable method, but with a twist. An instance of the
list is retrieved through an SPList object. An SPQuery object is created to build a query, and that
query is executed against the SPList object, which returns an SPListItemCollection. The data
from that collection is then retrieved into a DataTable by using the GetDataTable() method on the
SPListItemCollection.
The following sample code was used for this method.
'get the site
Dim curSite As SPSite = New SPSite("http://myPortal")
'get the web
Dim curWeb As SPWeb = curSite.OpenWeb()
'create our query
Dim curQry As SPQuery = New SPQuery()
'configure the query
curQry.Query = "<Where><Eq><FieldRef Name='Expense_x0020_Category'/>" & _
    "<Value Type='Text'>Hotel</Value></Eq></Where>"
curQry.RowLimit = 100
'get our list
Dim curList As SPList = curWeb.Lists(New Guid("myListGUID"))
'get the collection of items in the list
Dim curItems As SPListItemCollection = curList.GetItems(curQry)
'get the item in a datatable
Dim dt As DataTable = curItems.GetDataTable()
'enumerate matches
For Each dr As DataRow In dt.Rows
'do something with each match
Next
Lists Web service
The Lists Web service, which comes with Windows SharePoint Services 3.0 and Office
SharePoint Server 2007, was used to retrieve the data. A Collaborative Application Markup
Language (CAML) query was created and submitted along with the list identifier, and an XML
result set was returned from the Lists Web service.
The following sample code was used for this method.
'create a new xml doc we can use to create query nodes
Dim xDoc As New XmlDocument
'create our query node
Dim xQry As XmlNode = xDoc.CreateNode(XmlNodeType.Element, "Query", "")
'set the query constraints
xQry.InnerXml = "<Where><Eq><FieldRef Name='Expense_x0020_Category'/>" & _
    "<Value Type='Text'>Hotel</Value></Eq></Where>"
'create the Web service proxy that is mapped to Lists.asmx
Using ws As New wsLists.Lists()
    'configure it
    ws.Credentials = System.Net.CredentialCache.DefaultCredentials
    ws.Url = "http://myPortal/_vti_bin/lists.asmx"
    'create the optional elements
    Dim xView As XmlNode = xDoc.CreateNode(XmlNodeType.Element, "ViewFields", "")
    Dim xQryOpt As XmlNode = xDoc.CreateNode(XmlNodeType.Element, "QueryOptions", "")
    'query the server
    Dim xNode As XmlNode = ws.GetListItems("myListID", "", xQry, xView, "", xQryOpt, "")
    'enumerate returned items
    For nodeCount As Integer = 0 To xNode.ChildNodes.Count - 1
        'do something with each match
    Next
End Using
Search
The OM was used to execute a query against the Office SharePoint Server 2007 search engine
and return the results as a ResultTableCollection. That was then further distilled down into an
ADO.NET DataTable via the ResultTable of ResultType.RelevantResults from the
ResultTableCollection.
The following sample code was used for this method.
'get the site
Dim curSite As SPSite = New SPSite("http://myPortal")
'get the web
Dim curWeb As SPWeb = curSite.OpenWeb()
'get our list
Dim curList As SPList = curWeb.Lists(New Guid("myListGUID"))
Dim qry As New FullTextSqlQuery(curSite)
Dim SQL As String = "SELECT Title, Rank, Size, Description, " & _
    "Write, Path, Deductible, ExpenseCategory, ID, Vendor, Amount FROM " & _
    "portal..scope() WHERE CONTAINS " & _
    "(""URL"",'""#SITEURL#Lists/#LISTURL#*""') #DEFAULT# ORDER BY " & _
    """Rank"""
'do token replacement
SQL = SQL.Replace("#SITEURL#", "http://myPortal/")
SQL = SQL.Replace("#LISTURL#", curList.Title)
SQL = SQL.Replace("#DEFAULT#", "AND FREETEXT " & _
    "(""ExpenseCategory"",'""Hotel""')")
qry.QueryText = SQL
qry.RowLimit = 100
qry.ResultTypes = ResultType.RelevantResults
'execute the query
Dim rtc As ResultTableCollection = qry.Execute()
Dim rt As ResultTable = rtc(ResultType.RelevantResults)
Dim dt As New DataTable()
dt.Load(rt, LoadOption.OverwriteChanges)
'enumerate matches
For Each dr As DataRow In dt.Rows
'do something with each match
Next
PortalSiteMapProvider
One approach to retrieving list data in Office SharePoint Server 2007 that is not very well known is
the use of the PortalSiteMapProvider class. It was originally created to help cache content for
navigation. However, it also provides an automatic caching infrastructure for retrieving list
data. The class includes a method called GetCachedListItemsByQuery that was used in this
test. This method takes an SPQuery object as a parameter and first looks in its cache to see
whether the results for that query already exist. If they do, the method returns the cached results;
if not, it queries the list, stores the results in the cache, and returns them from the method call.
The following sample code was used for this method. Note that it is different from all of the
previous examples in that you cannot use the PortalSiteMapProvider class in Windows Forms
applications.
'get the current web
Dim curWeb As SPWeb = SPControl.GetContextWeb(HttpContext.Current)
'create the query
Dim curQry As New SPQuery()
curQry.Query = "<Where><Eq><FieldRef Name='Expense_x0020_Category'/>" & _
    "<Value Type='Text'>Hotel</Value></Eq></Where>"
'get the portal map provider stuff
Dim ps As PortalSiteMapProvider = PortalSiteMapProvider.WebSiteMapProvider
Dim pNode As PortalWebSiteMapNode = _
    TryCast(ps.FindSiteMapNode(curWeb.ServerRelativeUrl), PortalWebSiteMapNode)
'get the items
Dim pItems As SiteMapNodeCollection = _
    ps.GetCachedListItemsByQuery(pNode, "myListName_NotID", curQry, curWeb)
'enumerate all matches
For Each pItem As PortalListItemSiteMapNode In pItems
    'do something with each match
Next
Test harness
All of the tests were executed through one of three different test harnesses. Each one is
described in more detail below.
WinForm test application
The WinForm test application was used for the majority of the tests. It was written in the Microsoft
Visual Basic .NET development system, and runs on the Office SharePoint Server 2007 computer
itself so that it can use the OM to retrieve data from Office SharePoint Server 2007. It used the
new Stopwatch class of the Microsoft .NET Framework version 2.0 to capture the elapsed
milliseconds that each test took to complete both retrieving the data and enumerating the results.
The test results were enumerated and the values of two fields of data were retrieved from each
item so that if any data access method caused some additional processing time in the retrieval of
those items, it would get recorded along with the results. This was done to give a more realistic
representation of how the data would be used in a real-world scenario.
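The measurement approach described here can be sketched as follows. This is not the test application's actual code; the module and function names are hypothetical, and the choice of Title and Amount as the two fields that are read is an assumption.

```vb
Imports System
Imports System.Diagnostics
Imports Microsoft.SharePoint

Module TimedQuery
    ' Hypothetical helper: runs a query and returns the elapsed milliseconds,
    ' including the time spent enumerating the results.
    Function TimeQuery(ByVal list As SPList, ByVal qry As SPQuery) As Long
        Dim sw As Stopwatch = Stopwatch.StartNew()
        Dim items As SPListItemCollection = list.GetItems(qry)
        For Each item As SPListItem In items
            ' Read two field values so that any per-item retrieval cost
            ' is included in the measurement.
            Dim title As Object = item("Title")
            Dim amount As Object = item("Amount")
        Next
        sw.Stop()
        Return sw.ElapsedMilliseconds
    End Function
End Module
```

Stopping the timer only after the loop finishes matters because some data access methods defer part of their work until the results are enumerated.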
WebPart and JavaScript
Monitoring the time it takes for the predefined Office SharePoint Server 2007 browser interface to
render a page was more difficult. In order to capture that information a custom ASP.NET server
control (a Web Part) was developed. In the OnInit event for the Web Part, the current time down to the
millisecond is recorded. When Render is called, that time is output along with some JavaScript
onto the page. The JavaScript registers a function that the Web Part creates so that it is called
when the browser document’s onreadystatechange event fires. That function checks the document’s
readyState property and if it is "complete", the function gets the current time, subtracts the time
that was captured during the Web Part’s OnInit event, and displays the difference. The value that
is displayed represents how long it took from when the Web Part was first initialized until the page
was completely finished loading.
Web Part
A second Web Part was written to use the PortalSiteMapProvider application programming
interface (API). This Web Part requires a valid HTTP context and so it would not work in the
WinForms test harness. The process it used was very similar to the WinForms application,
however: in the Render method it calls the GetCachedListItemsByQuery method on the
PortalSiteMapProvider class instance and uses the Stopwatch class to track the elapsed
milliseconds, which it outputs to the page.
Test results
Before reviewing each of the data points in the testing process it’s also important to understand
what each data point represents. Each point on the graph represents the average of a number
of tests. For example, most of the test results consist of five data points. Each data point
represents the average time for five tests, so all five data points are the result of 25 tests. The
only exception is the tests for the browser-based rendering times — they used a smaller dataset
than the other tests. The following sections describe the individual test results. All timed results
are measured in milliseconds, so smaller numbers are better.
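The averaging described here can be sketched as follows, assuming a simple arithmetic mean of the elapsed times. The module and function names are hypothetical, and the work delegate stands in for whichever data access method is being measured.

```vb
Imports System
Imports System.Diagnostics

Module RunAverager
    ' Hypothetical helper: mean of a set of recorded timings, in milliseconds.
    Function Mean(ByVal timings As Long()) As Double
        If timings Is Nothing OrElse timings.Length = 0 Then
            Throw New ArgumentException("At least one timing is required.")
        End If
        Dim total As Long = 0
        For Each t As Long In timings
            total += t
        Next
        Return total / timings.Length
    End Function

    ' Hypothetical helper: times the given number of executions of the work
    ' delegate and returns their mean, which corresponds to one data point.
    Function DataPoint(ByVal runs As Integer, ByVal work As Action) As Double
        Dim timings(runs - 1) As Long
        For i As Integer = 0 To runs - 1
            Dim sw As Stopwatch = Stopwatch.StartNew()
            work.Invoke()
            sw.Stop()
            timings(i) = sw.ElapsedMilliseconds
        Next
        Return Mean(timings)
    End Function
End Module
```

Calling DataPoint with runs set to 5 once for each of five data points gives the 25 test executions mentioned above.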
Browser-based viewing and page size
One test was done to determine how the number of items displayed for a list on a page affects the
time it takes to render that page. The goal was to understand whether showing more items on a
page caused linear growth in response times, or response times that got exponentially worse. The
testing was done against a list with 1,500 items, with the number of items displayed on a
page set to 100, 300, and 500. As shown in the following graph, increasing the number of items
displayed per page results in a fairly linear increase in display time.
The baseline test
The goal for the next set of tests was to establish our baseline numbers. Here are the results of
the different data access methods against a list with 1,500 items. Only the most common data
access methods were included in the baseline testing, so test results for the
PortalSiteMapProvider class were not included.
What stands out clearly in this set of results is that viewing the data using the predefined Office
SharePoint Server 2007 browser interface is the slowest data access method by far. This is one
of the reasons why guidance has been delivered to restrict list sizes to no more than 2,000 items
per container. It’s also why we recommend that you don’t consider going above the 2,000 items
per container unless you are developing an alternative interface to work with the data.
Testing with a very large list
The next test shows what happens when you dramatically increase the number of
items in the list beyond the recommended guideline. In this case, the list contained 100,000 items.
The list did not have an index on the Expense Category column, and the site was under load.
The following version of the previous chart omits the two slowest data retrieval methods for ease
of comparison between the other methods.
Using the For/Each enumeration to find items within the list is clearly not a good choice for
working with large amounts of data. In addition, there was tremendous overhead in loading all of
the list data into an ADO.NET DataTable and then using its filtering capabilities to find the desired
data. However, as stated earlier, if you cached the DataTable instead of loading the list data into
it on each request, the results would probably have been significantly different. There still would
be a very significant hit the first time the list data is loaded into the DataTable, however.
Another point to note here is just how well the PortalSiteMapProvider class performed. It was
lightning fast in these tests, and significantly outperformed the other data access methods.
Because the PortalSiteMapProvider and other tested methods performed substantially better
than the For/Each, SPList with DataTable and Page Load in Browser methods, the latter methods
were not included in any subsequent test results.
Also, for the Page Load in Browser test, the page was configured to display 100 items per page.
Comparing results with an indexed column
The goal of this test was to determine how much of a performance gain is realized when
configuring the column used in the WHERE clause for the test query to be indexed.
These results demonstrate that if you are using the SPList class as part of your data access
strategy, you will benefit greatly from indexing the columns used in WHERE clauses. For other
data access methods, indexing will likely give you only nominal benefit, if at all. Adding a column
index actually reduced performance when using the PortalSiteMapProvider class.
Comparing an indexed column to an ID column
This test was conducted to compare the performance differences when using a WHERE clause in
the query that relied on an item’s ID rather than the value of an indexed field.
What’s interesting about these results is that they are essentially the inverse of the previous test.
That is, when using ID as the filter field criteria, data access methods that do not use the SPList
class perform much better. However, data access methods that rely on the SPList class still work
much more quickly when they are using an indexed column rather than item IDs.
Analyzing the results
The test results in this white paper validate the fact that with proper testing in your own
environment, it is quite possible that you can use more than 2,000 items in a container without an
adverse impact on performance. The best results will be obtained if you write your own user
interface to work with the data in the list, and make some carefully considered choices about what
data access method works best for your requirements. The data access method you choose may
very well impact other aspects of your site or list implementation.
For example, data access methods that require the SPList class benefit greatly from
indexing the columns used in a WHERE clause. However, the benefit of indexing these columns is
marginal if the data is retrieved using the Search service, the Lists Web service, or the
PortalSiteMapProvider class. Conversely, if you are not using the SPList class for data
retrieval, data access will likely be much faster if you are able to retrieve data based on the ID of
items, rather than the value of a specific column in a list.
Search
Search performed well across all of the scenarios. One drawback to using Search is that it cannot
retrieve data until indexing has completed, so if immediate data retrieval is a requirement, Search
may not be the best choice. You will probably also need to configure Search further to support
your query requirements. For example, these tests required the ability to use a structured query
language (SQL) statement that retrieved a very specific set of fields from a list, as well as use the
ID and Expense Category field in the WHERE clause. For this solution to work, Managed
Properties must be configured in Search to retrieve the custom properties from the list and to use
criteria against them. Implementing Search as it was used in this testing requires Office
SharePoint Server 2007.
PortalSiteMapProvider
The PortalSiteMapProvider class was one of the best performing data access methods in every
scenario. However, there are a couple of limitations in using it. First, because of the way in which
the data is cached, use of the PortalSiteMapProvider class is going to be most useful if the data
you are retrieving is not significantly different over time. If you are trying to frequently retrieve
different data sets, the PortalSiteMapProvider class will incur the overhead of constantly reading
from the database, inserting data into the cache and then returning it from the method call.
Clearly, the advantage of the PortalSiteMapProvider class is when it can read data directly from
the cache.
Also, the amount of memory the PortalSiteMapProvider class has available to use may be
somewhat constrained. It uses the site collection object cache to store data; by default, the object
cache is only 100 megabytes (MB). You can increase the size of the site collection object cache
on the Object cache settings page for the site collection. You can change the Max. Cache Size
(MB) value on that page. However, remember that whatever amount of memory you assign to the
object cache comes out of the same shared memory available to the application pool. If you are
running the 32-bit version of Office SharePoint Server 2007, the most memory you can assign to
a single application pool is 2 GB, and you immediately lose roughly 500 MB when the .NET
Framework and base Office SharePoint Server 2007 DLLs and assemblies are loaded.
Therefore, you need to balance the object cache size with how much memory you have available
on your Web servers in addition to the processor architecture, other loaded programs used by
Office SharePoint Server 2007, etc. The PortalSiteMapProvider class is only available on Office
SharePoint Server 2007.
SPList
Using the SPList class gives you several options to retrieve data: a For/Each enumeration, the
Items collection, the GetDataTable method of an SPListItemCollection, and an SPQuery
object to filter data. Some of those methods, specifically GetItems and GetDataTable
on the results of GetItems, performed well in most scenarios. However, there are
some limitations. For example, the GetItems method won’t work across folders in a single list
unless the ViewAttributes property of your SPQuery object includes Scope="Recursive".
For that matter, it won’t work across lists if you want to query data from multiple lists or subsites.
It also requires that all code runs directly on the Office SharePoint Server 2007 computer. Other
options, like the Lists Web service and the Search Web service (not the Search methodology that
was used in these tests), can retrieve the data while running on remote computers.
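As a minimal sketch of the folder limitation described here, the following helper builds an SPQuery whose ViewAttributes property requests a recursive scope. The module and function names are hypothetical; the CAML string and row limit are supplied by the caller.

```vb
Imports Microsoft.SharePoint

Module RecursiveQuery
    ' Hypothetical helper: builds a query that looks in every folder of a list.
    Function BuildRecursiveQuery(ByVal caml As String, ByVal rowLimit As UInteger) As SPQuery
        Dim qry As New SPQuery()
        qry.Query = caml
        qry.RowLimit = rowLimit
        ' By default a query looks in a single folder only. This attribute
        ' asks for matching items from all folders in the list.
        qry.ViewAttributes = "Scope=""Recursive"""
        Return qry
    End Function
End Module
```

The returned object is passed to SPList.GetItems in the same way as in the earlier samples. Keeping RowLimit set remains important, because a recursive query can otherwise touch a very large number of items.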
Data maintenance considerations
There are a few other issues to consider when creating lists with more than 2,000 items per
container. One is the cost of other common operations such as adding or deleting items from the
list. We did some additional tests to measure the impact of those kinds of operations against our
very large list. The results show that as the list gets quite large, those operations begin to slow
down considerably.
The results show that when the site is not under load, adding a single new item does not have a
significant impact on performance. However, although indexing a column improves query
performance, it also may negatively impact the performance of adding new records. Also,
performance would obviously degrade when multiple items are being added and the site is under
load.
Performance for deleting items degrades significantly when a list becomes very large. Deleting a
single item from a very large list takes much more time than deleting an item from a smaller list.
In the test case, a single item was deleted from a site that was not under load. As the data shows,
whether there was an indexed column or not, performance when changing list items degrades as
the size of the list grows. It’s more likely that a batch process would need to be built to delete
items during off-peak periods. If that is not an option, the performance of delete functionality
alone could conceivably force you to abandon plans to use very large lists in Office SharePoint
Server 2007.
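One possible shape for such a batch process is sketched below. It is an assumption about how the work could be split up, not something that was measured in these tests, and the module and function names are hypothetical. Each call deletes a bounded number of matching items, so a scheduled job could call it repeatedly during off-peak hours until it returns zero.

```vb
Imports System.Collections.Generic
Imports Microsoft.SharePoint

Module BatchDelete
    ' Hypothetical helper: deletes at most batchSize items that match the
    ' CAML query and returns the number of items deleted.
    Function DeleteBatch(ByVal list As SPList, ByVal caml As String, ByVal batchSize As UInteger) As Integer
        Dim qry As New SPQuery()
        qry.Query = caml
        qry.RowLimit = batchSize

        ' Collect the IDs first so that the collection is not modified
        ' while it is being enumerated.
        Dim ids As New List(Of Integer)
        For Each item As SPListItem In list.GetItems(qry)
            ids.Add(item.ID)
        Next

        For Each id As Integer In ids
            list.GetItemById(id).Delete()
        Next
        Return ids.Count
    End Function
End Module
```

Each delete still carries the per-item cost shown in the test results; the sketch only spreads that cost over time and keeps each pass small.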
Data locking
Another important consideration when using large lists is the concept of the locks that Microsoft
SQL Server™ places on data tables that contain list information. Virtually all data for all Office
SharePoint Server 2007 lists is contained within a single table in SQL Server. This table contains
data for all the lists in all the site collections whose data is stored in that content database. When
you attempt to update data on a list item, whether that is adding, editing or deleting a list item,
SQL Server will attempt to lock other items (rows to SQL Server) for that particular list.
However, there is a limit to the number of individual rows that SQL Server will try to lock down. If
you try to select approximately 5,000 items or more simultaneously for reading or update, SQL
Server will typically lock the entire table for the duration of that change. In this event, all other
reads and writes for all lists in all site collections are queued until the previous transaction is
complete and the lock is released. If your query retrieves data across multiple folders within the
list, the locking behavior occurs even when the items are nested in folders so that no individual
container holds more than 2,000 items. To ensure that you don’t encounter this
locking behavior, make sure the number of items you retrieve in a single request is well below this
threshold. For example, you can control the number of records returned by setting the RowLimit
on the SPQuery class.
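When more items are needed than can safely be retrieved in one request, the results can be read one page at a time. The following sketch uses RowLimit together with the ListItemCollectionPosition property of the query and of the returned collection. The module and function names are hypothetical, and the page size is left to the caller, who should keep it well below the threshold described above.

```vb
Imports Microsoft.SharePoint

Module PagedReader
    ' Hypothetical helper: walks a result set one page at a time and
    ' returns the total number of items seen.
    Function CountInPages(ByVal list As SPList, ByVal caml As String, ByVal pageSize As UInteger) As Integer
        Dim qry As New SPQuery()
        qry.Query = caml
        qry.RowLimit = pageSize

        Dim total As Integer = 0
        Do
            Dim page As SPListItemCollection = list.GetItems(qry)
            total += page.Count
            ' Process the items in this page here.

            ' Carry the position forward; Nothing means there are no more pages.
            qry.ListItemCollectionPosition = page.ListItemCollectionPosition
        Loop While qry.ListItemCollectionPosition IsNot Nothing
        Return total
    End Function
End Module
```

Each call to GetItems requests at most pageSize rows, so no single request asks for more items than the page size.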
Crawl times

Another consideration with very large lists is crawl time and crawl time-outs. As a list gets
larger, the chance of the indexer timing out when crawling the contents of that list increases.
This is an issue that should be carefully monitored and tested in a lab environment before
rolling out any large list in production. If the indexer is timing out when crawling large lists,
you can increase the time-out value with the following steps:
1. In Central Administration, on the Application Management tab, in the Search section,
click Manage search service.
2. On the Manage Search Service page, in the Farm-Level Search Settings section,
click Farm-level search settings.
3. In the Timeout Settings section, in the Connection time and Request
acknowledgement time boxes, enter the desired number of seconds.
Related content
For more detailed information about the factors involved in performance and capacity planning for
Office SharePoint Server 2007 lists, see the following resource:

Plan for software boundaries (Office SharePoint Server)
(http://go.microsoft.com/fwlink/?LinkID=95115&clcid=0x409). This article provides a starting
point for planning the performance and capacity of your system, including performance and
capacity testing results and guidelines for acceptable performance.