FILESTREAM Design and Implementation Considerations

SQL Server Technical Article
Writer: Shaun Tinline-Jones
Technical Reviewers: Prem Mehra, Chuck Heinzelman, Nicholas Dritsas, Peter Carlin
Technical Editor: Tali Smith
Published: February 2011
Applies to: SQL Server 2008 R2
Summary: Much of the data used today is unstructured, including text documents, images, and
videos. This unstructured data is typically stored outside a relational database and separate
from structured data. This separation can make data management much more complex, and if
the data is associated with structured storage, the separation can limit performance and file
streaming capabilities.
Microsoft SQL Server 2008 includes an enhancement to data storage called FILESTREAM,
which lets you store unstructured binary large object (BLOB) data directly in the file
system. With FILESTREAM you can take advantage of the rich set of Win32 streaming
application programming interfaces (APIs) for better streaming performance. FILESTREAM also
provides transactional consistency, so structured and unstructured data are always in sync;
additionally, you can use Transact-SQL statements to insert, update, query, search, and back
up FILESTREAM data.
This white paper is a companion to the information about FILESTREAM found on TechNet. This
paper delves deeply into selected topics that should be considered when implementing a
solution that uses FILESTREAM, including design considerations, maintenance, and
management of a FILESTREAM environment.
Copyright
This document is provided “as-is”. Information and views expressed in this document, including
URL and other Internet Web site references, may change without notice. You bear the risk of
using it.
This document does not provide you with any legal rights to any intellectual property in any
Microsoft product. You may copy and use this document for your internal, reference purposes.
© 2011 Microsoft. All rights reserved.
Contents
Introduction
Introducing FILESTREAM
  When to Use FILESTREAM
Considerations for Enabling and Configuring FILESTREAM
  Enabling FILESTREAM
  Failover Clusters
  FILESTREAM Storage Considerations
    Considerations for Setting Up FILESTREAM Storage Volumes
    Special Consideration for Large Environments
    RAID Considerations
Security Considerations
  Encryption
Management Considerations
  Maintenance Tasks
  Index Management
  Filegroup Management
  Backup and Restore
  FILESTREAM Transaction Log
  Garbage Collector
  Log Shipping
  Database Mirroring
  Database Snapshot
  AlwaysON (HADRON)
Considerations for Table Creation
Data Access Considerations
  Transact-SQL Access
  File System Streaming Access
  Statement Model
  Storage Namespace
  Transacted File System Access
  Transactional Durability
  Isolation Semantics
  Partial Updates
  Write-Through from Remote Clients
Migration Considerations
Conclusion
For More Information
Introduction
Today’s computer-driven business world generates data at an incredible rate. If this data is to
be useful, organizations must store it in a controlled and efficient way so that it is readily
accessible.
Prior to Microsoft SQL Server 2008, storing unstructured data such as text documents, images,
and videos posed many challenges, such as how to maintain transactional consistency between
the structured and unstructured data, how to manage backup and restore, and how to achieve
storage performance and scalability. Architects of applications that required the storage of
binary large object (BLOB) data could either store the data in the database or store it outside
of the database with a reference stored in the database. This decision was never easy to make
because each option had its own benefits and frustrating limitations. Information about the
tradeoffs can be found in the white paper, FILESTREAM Storage in SQL Server 2008
(http://msdn.microsoft.com/en-us/library/cc949109(SQL.100).aspx).
Microsoft SQL Server 2008 R2 and the current Community Technology Preview (CTP) release
(SQL Server code-named “Denali”) expand the available options for storing BLOB data to
include:
• In-database
• FILESTREAM
• Remote BLOB Storage (RBS)
• FileTable
This white paper focuses on design and implementation considerations for using FILESTREAM
storage. Additionally, notes in the paper describe relevant RBS and FileTable considerations.
The paper consolidates information from many resources, and adds content from Microsoft
development team members.
Note: This paper is intended for system architects, IT professionals, and database
administrators (DBAs) tasked with evaluating or implementing data storage. It is assumed that
the reader is very familiar with Windows Server and SQL Server and has at least a basic
knowledge of database concepts.
Introducing FILESTREAM
FILESTREAM is a SQL Server 2008 feature that lets you store unstructured BLOB data directly
in the file system. FILESTREAM is not a data type; it is an attribute applied to a varbinary(max)
column to indicate that the data is to be stored directly in the file system, while SQL Server
maintains transactional consistency between the file data and the rest of the row.
Non-FILESTREAM storage uses the buffer pool when the data pages are accessed, whereas
FILESTREAM uses the NT system cache to cache the file data. This approach helps reduce
the effect that FILESTREAM data has on Database Engine performance. However, although the
buffer pool is relieved of managing the varbinary(max) data pages, the virtual address space
(VAS) is still shared between FILESTREAM data and SQL Server data.
When using FILESTREAM, it is important to differentiate between traditional data (called row
data) and FILESTREAM data.
Note: Other storage options include RBS and FileTable.
RBS is a library API set incorporated as an add-on feature pack for Microsoft SQL Server. RBS
is designed to move storage of BLOB data from database servers to external Content
Addressable Stores (CAS). A reference to the BLOB is stored in the database. An application
stores and accesses the BLOB data by calling into the RBS client library. RBS manages the
lifecycle of the BLOB; for example, RBS performs garbage collection when needed. For more
information, see Microsoft SQL Server Remote Blob Storage (RBS) Samples
(http://sqlrbs.codeplex.com/).
FileTable builds on existing FILESTREAM capabilities, providing applications with nontransactional access to a special table (the FileTable) that contains unstructured data. For more
information, see FileTable Overview (http://msdn.microsoft.com/en-us/library/ff929068(v=SQL.110).aspx).
Note that unlike RBS, FILESTREAM is constrained to local volumes. RBS can store BLOB data
on a variety of remote storage devices.
When to Use FILESTREAM
In SQL Server, BLOBs can be standard varbinary(max) objects (data is stored in tables) or
FILESTREAM varbinary(max) objects (data is stored in the file system). The size and use of the
data determines whether you should use database storage or file system storage.
If any of the following conditions are true, you should consider using FILESTREAM:
• The objects to be stored are, on average, larger than 1 megabyte (MB).
• Fast read access is important.
• You are developing applications that use a middle tier for application logic.
You should generally store BLOBs smaller than 256 kilobytes (KB) inside the database, and
store BLOBs larger than 1 MB outside the database. For BLOBs sized between 256 KB and 1
MB, the more efficient storage solution depends on the read:write ratio of the data and on the
rate of “overwrite” (see FILESTREAM Storage in SQL Server 2008,
http://msdn.microsoft.com/en-us/library/cc949109(SQL.100).aspx). Generally, storing them as
varbinary(max) BLOBs in the database provides better streaming performance than storing
them outside of the database.
Use these considerations as a starting point for deciding if BLOBs should be stored outside of
the database. If you plan to store the BLOBs outside the database, you can then evaluate
whether RBS or FILESTREAM is the most appropriate solution.
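To apply these size guidelines to existing data, it can help to measure how large the stored BLOBs actually are. The following Transact-SQL is a minimal sketch, assuming a hypothetical table dbo.Documents with a varbinary(max) column named Content; it groups the values into the size bands discussed above and then reports the overall average size.

```sql
-- Sketch: size distribution of existing BLOB values.
-- dbo.Documents and Content are hypothetical names; substitute your own.
SELECT
    SizeBucket,
    COUNT(*)              AS BlobCount,
    AVG(SizeBytes) / 1024 AS AvgSizeKB
FROM (
    SELECT
        CAST(DATALENGTH(Content) AS bigint) AS SizeBytes,
        CASE
            WHEN DATALENGTH(Content) < 262144   THEN '1: under 256 KB'
            WHEN DATALENGTH(Content) <= 1048576 THEN '2: 256 KB to 1 MB'
            ELSE '3: over 1 MB'
        END AS SizeBucket
    FROM dbo.Documents
    WHERE Content IS NOT NULL
) AS s
GROUP BY SizeBucket
ORDER BY SizeBucket;

-- Overall average size in MB (the first FILESTREAM criterion above).
SELECT AVG(CAST(DATALENGTH(Content) AS bigint)) / 1048576.0 AS AvgSizeMB
FROM dbo.Documents
WHERE Content IS NOT NULL;
```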
For more information, see Chapter 7 (“Special Storage”) of Microsoft SQL Server 2008
Internals by Kalen Delaney et al. (Redmond, WA: Microsoft Press, 2009; ISBN 0735626243). This
chapter provides points to consider when comparing in-database and file system storage.
While not required for implementing FILESTREAM, the discussion in the book is useful when
selecting the most efficient storage medium based on BLOB size and the impact on database
management operations.
Note: SQL Server stores Character Large Objects (CLOBs) in varchar(max) and
nvarchar(max) columns, while BLOBs are stored in varbinary(max) and image columns.
Considerations for Enabling and Configuring FILESTREAM
In simple terms, to use FILESTREAM, you must create or modify a database to contain a
special type of filegroup and then create or modify a table so that it contains a varbinary(max)
column with the FILESTREAM attribute. After you complete these tasks, you can use Transact-SQL
and Win32 to manage the FILESTREAM data. The following sections describe some
considerations for enabling and configuring FILESTREAM storage.
Enabling FILESTREAM
FILESTREAM is not automatically enabled when you install or upgrade SQL Server; before you
can start using FILESTREAM, you must enable it on the instance of the SQL Server Database
Engine.
There are several different ways to enable FILESTREAM. You can enable FILESTREAM by
using SQL Server Configuration Manager, by using Transact-SQL, by using SQL Server
Management Studio, or by enabling FILESTREAM during SQL Server 2008 installation. Note
that while enablement is typically performed by a Windows system administrator, it is critical that
the administrator be able to assign appropriate levels of access to the storage.
Note: Step-by-step guidance for enabling FILESTREAM can be found on TechNet at How to:
Enable FILESTREAM (http://technet.microsoft.com/en-us/library/cc645923.aspx) or on MSDN
at How to: Enable FILESTREAM (http://msdn.microsoft.com/en-us/library/cc645923.aspx).
When enabling FILESTREAM using SQL Server Configuration Manager, you should in most
cases select all of the check boxes in the FILESTREAM tab of the SQL Server (SQL 2008)
Properties pane (Enable FILESTREAM for Transact-SQL access, Enable FILESTREAM for file
I/O streaming access, and Allow remote clients to have streaming access to FILESTREAM
data). You should also specify a Windows share name.
The share name must not already exist. When selecting a name, pay careful attention to
naming conventions, especially if multiple instances reside on the server and each has
FILESTREAM enabled. If the application will run on the same server as SQL Server
(an uncommon configuration), there is no need to enable remote connections.
Microsoft recommends that you only enable the features you need. It is therefore important to
communicate clearly with the Windows system administrator who is responsible for
configuration. For example, without clear guidance, an administrator might configure the
Windows properties to allow only a Transact-SQL connection when the desired configuration is
to include Win32 access.
After the service is configured through SQL Server Configuration Manager, you must also
enable FILESTREAM at the instance level by running sp_configure. In SQL Server
Management Studio, open a new query window (the Query Editor) and type the following
Transact-SQL code to enable FILESTREAM for both Transact-SQL and Win32 streaming
access.
-- Access level 2: enable FILESTREAM for Transact-SQL and Win32 streaming access.
EXEC sp_configure FILESTREAM_access_level, 2
RECONFIGURE WITH OVERRIDE
GO
SQL Server 2008 supports three levels of FILESTREAM access:
• Access level 0: FILESTREAM support for the instance is disabled.
• Access level 1: FILESTREAM for Transact-SQL access is enabled.
• Access level 2: FILESTREAM for Transact-SQL and Windows streaming access is enabled.
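Because FILESTREAM is enabled both at the Windows level (SQL Server Configuration Manager) and at the instance level (sp_configure), it can be useful to confirm what is actually in effect. The following is a minimal sketch that reads the documented FilestreamShareName, FilestreamConfiguredLevel, and FilestreamEffectiveLevel server properties and the sys.configurations catalog view; if the configured and effective levels differ, a restart may still be pending.

```sql
-- Sketch: compare the configured and effective FILESTREAM settings.
SELECT
    SERVERPROPERTY('FilestreamShareName')       AS ShareName,
    SERVERPROPERTY('FilestreamConfiguredLevel') AS ConfiguredLevel,
    SERVERPROPERTY('FilestreamEffectiveLevel')  AS EffectiveLevel;

-- The instance-level value set through sp_configure.
SELECT name, value, value_in_use
FROM sys.configurations
WHERE name = 'filestream access level';
```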
After FILESTREAM is enabled, the SQL Server instance is ready to hold databases that contain
FILESTREAM filegroups. Because FILESTREAM uses this special type of filegroup, you must
specify the CONTAINS FILESTREAM clause for at least one filegroup when you create the
FILESTREAM-enabled database. Note that it is not possible to point multiple databases to a
common directory for their FILESTREAM requirements.
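As an illustration, the following sketch creates a database with one row data filegroup and one FILESTREAM filegroup. The database name (DocumentStore), logical file names, filegroup name (DocumentFS), and drive paths are all hypothetical. For the FILESTREAM file, FILENAME names the data container directory: its parent folder must already exist, but the container folder itself must not, because SQL Server creates it.

```sql
-- Sketch: database with a FILESTREAM filegroup (all names and paths hypothetical).
CREATE DATABASE DocumentStore
ON PRIMARY
    (NAME = DocumentStore_Data,
     FILENAME = 'D:\SQLData\DocumentStore_Data.mdf'),
-- CONTAINS FILESTREAM marks this filegroup as a FILESTREAM filegroup.
-- F:\FSData must exist; DocumentStoreFS must not (SQL Server creates it).
FILEGROUP DocumentFS CONTAINS FILESTREAM
    (NAME = DocumentStore_FS,
     FILENAME = 'F:\FSData\DocumentStoreFS')
LOG ON
    (NAME = DocumentStore_Log,
     FILENAME = 'E:\SQLLogs\DocumentStore_Log.ldf');
GO
```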
Note: FileTable is enabled through the FILESTREAM configuration settings. Within the database,
however, the process differs from that for FILESTREAM because FileTable is enabled through
database options, not through the definition of a filegroup.
Failover Clusters
FILESTREAM is fully supported with failover clustering. To use FILESTREAM in a failover
cluster, all nodes in the cluster must have FILESTREAM enabled at the Windows level, and the
FILESTREAM data container(s) must be placed on shared storage so the data is available to all
nodes. If you plan to enable I/O streaming, be sure to use the same Windows share name on all
nodes; you should see the share name appear as a cluster resource.
Note: For step-by-step guidance, see the MSDN article How to: Set Up FILESTREAM on a
Failover Cluster (http://msdn.microsoft.com/en-us/library/cc645886.aspx).
FILESTREAM Storage Considerations
With FILESTREAM storage, a varbinary(max) column holds the data, which is stored as BLOBs
in the file system. When a BLOB is stored in the database, it cannot be larger than 2 GB; in
FILESTREAM storage, the sizes of the BLOBs are limited only by the volume size of the file
system.
Note: Microsoft SharePoint Server, which uses RBS, also enforces a 2 GB file size limit. This
limitation is neither a FILESTREAM nor an RBS limitation, but is a constraint that is enforced at
the application tier.
FILESTREAM data can only be stored in FILESTREAM filegroups. A FILESTREAM filegroup is
a special filegroup that refers to file system directories, called data containers. Data containers
serve as the interface between Database Engine storage and file system storage.
To ensure that a column stores data on the file system, be sure to specify the FILESTREAM
attribute. This attribute causes the Database Engine to store all data for that column on the file
system rather than the database file. (Note that this attribute cannot be set unless
FILESTREAM is enabled for the SQL Server instance and the database contains a filegroup
that is designated as a FILESTREAM destination.)
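A minimal sketch of a table with a FILESTREAM column follows, assuming a database named DocumentStore that already has a FILESTREAM filegroup named DocumentFS (both names hypothetical, as are the table and column names). Besides the FILESTREAM attribute on the varbinary(max) column, a table with FILESTREAM data needs a uniqueidentifier column marked ROWGUIDCOL that is NOT NULL and has a UNIQUE or PRIMARY KEY constraint.

```sql
USE DocumentStore;
GO
-- Sketch: table whose Content column is stored in the FILESTREAM container.
CREATE TABLE dbo.Documents
(
    DocumentId   int IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_Documents PRIMARY KEY,
    -- Required for FILESTREAM: a unique, non-null ROWGUIDCOL column.
    DocumentGuid uniqueidentifier ROWGUIDCOL NOT NULL
        CONSTRAINT DF_Documents_Guid DEFAULT NEWID()
        CONSTRAINT UQ_Documents_Guid UNIQUE,
    FileName     nvarchar(260) NOT NULL,
    -- The FILESTREAM attribute sends this column's data to the file system.
    Content      varbinary(max) FILESTREAM NULL
)
FILESTREAM_ON DocumentFS;
GO

-- Transact-SQL access works like any varbinary(max) column.
INSERT INTO dbo.Documents (FileName, Content)
VALUES (N'readme.txt', CAST('Hello, FILESTREAM' AS varbinary(max)));
```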
A directory is created for each table that has a FILESTREAM-enabled column. Within this
directory, a subdirectory is created for each FILESTREAM-enabled column, and the
FILESTREAM values are stored as files within these subdirectories. All these directories and
files have a Globally Unique Identifier (GUID) as a name. Note that a file name (a GUID) is not
intended to match the GUID value stored in the table. The two can match initially, but any such
correlation is lost once the value is updated.
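To find where a database's data containers live, you can query the catalog views. This sketch lists each FILESTREAM container in the current database together with its filegroup; for FILESTREAM files, physical_name is the container directory under which the GUID-named table and column directories are created. Browsing that directory can help with diagnostics, but the files themselves should not be modified outside SQL Server.

```sql
-- Sketch: FILESTREAM data containers of the current database.
SELECT f.name          AS LogicalName,
       fg.name         AS FilegroupName,
       f.physical_name AS ContainerPath
FROM sys.database_files AS f
JOIN sys.filegroups     AS fg
    ON f.data_space_id = fg.data_space_id
WHERE f.type_desc = 'FILESTREAM';
```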
Considerations for Setting Up FILESTREAM Storage Volumes
When you set up FILESTREAM storage volumes, consider the following best practices:
• Turn off short file names on FILESTREAM computer systems. Short file names take
significantly longer to create. To disable short file names, use the Windows fsutil utility
(see Fsutil [http://technet.microsoft.com/en-us/library/cc753059(WS.10).aspx]).
• Use 64 KB NTFS file system clusters. Compressed volumes must be set to 4 KB NTFS
clusters.
• Disable indexing on FILESTREAM volumes and set disablelastaccess. To set
disablelastaccess, use the Windows fsutil utility.
• For performance reasons, FILESTREAM filegroups and containers should reside on
volumes other than the operating system, SQL Server database, SQL Server log,
tempdb, or paging file.
• Disable antivirus scanning of FILESTREAM volumes. If antivirus scanning is necessary,
avoid setting policies that will automatically delete offending files.
• Set up and tune the RAID level for fault tolerance and the performance that is required
by an application.
• A FILESTREAM filegroup has only one data container. This is unlike row data filegroups,
which can have multiple files (equivalent to data containers) per filegroup. Note that the
next release of SQL Server will remove the one data container limitation.
• FILESTREAM data containers cannot be nested; you cannot place a new FILESTREAM
data container within another one. Two FILESTREAM data containers for the same
database may share a root directory, but data containers from different databases cannot
share a directory.
• When you are using failover clustering, the FILESTREAM filegroups must be on shared
disk resources.
For more information, see FILESTREAM Storage in SQL Server 2008
(http://msdn.microsoft.com/en-us/library/cc949109(SQL.100).aspx).
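Some of these practices can be checked from within SQL Server. The following sketch flags FILESTREAM containers on the instance that share a drive letter with row data or log files of any database, including tempdb. It assumes drive-letter paths; volumes mounted into folders, the operating system volume, and the paging file are not detected by this query.

```sql
-- Sketch: FILESTREAM containers that share a drive with data or log files.
-- Compares only the first two characters of each path (for example, 'F:').
SELECT DB_NAME(fs.database_id)    AS DatabaseName,
       fs.physical_name           AS FilestreamContainer,
       DB_NAME(other.database_id) AS OtherDatabase,
       other.type_desc            AS OtherFileType,
       other.physical_name        AS OtherFile
FROM sys.master_files AS fs
JOIN sys.master_files AS other
    ON UPPER(LEFT(fs.physical_name, 2)) = UPPER(LEFT(other.physical_name, 2))
   AND other.type_desc IN ('ROWS', 'LOG')
WHERE fs.type_desc = 'FILESTREAM'
ORDER BY DatabaseName, OtherDatabase;
```

An empty result suggests, but does not prove, that the containers are isolated from other database files.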
Special Consideration for Large Environments
In very large environments with multiple terabytes of FILESTREAM data, the limit of one data
container per FILESTREAM filegroup introduces complexity and imposes additional design
considerations.
A simple solution is to use a single volume that can meet all the FILESTREAM capacity
demands. This is not always possible, though some volume sizing strategies (including dynamic
volumes and 64 KB clusters) allow a single volume of approximately 256 TB.
A more complex but scalable solution is to partition the FILESTREAM data so that files can be
directed to multiple volumes. This can be challenging because a table with FILESTREAM data
must have a unique index on its GUID value. Because partitioning relies on ranges of values,
an algorithm is required to ensure that a given GUID value will reside on a volume with
available disk space. Data growth, accurate capacity planning, and awareness of data purging
behaviors all contribute to a complex design with demanding administrative tasks.
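The following is a minimal sketch of such a partitioned design, not a complete capacity-planning solution. It assumes two existing FILESTREAM filegroups with hypothetical names FS_Vol1 and FS_Vol2, each with its data container on a different volume, and uses a single boundary value to split new GUIDs roughly evenly between them. The boundary relies on the assumption that SQL Server orders uniqueidentifier values starting with the last six bytes; verify the distribution against your own data. The FILESTREAM partition scheme must use the same partition function as the table's row data.

```sql
-- Sketch: spread one table's FILESTREAM data over two volumes.
-- FS_Vol1 and FS_Vol2 are hypothetical FILESTREAM filegroups.
-- Assumption: uniqueidentifier comparison starts with the last six bytes,
-- so this boundary splits NEWID() values roughly in half.
CREATE PARTITION FUNCTION pfDocumentGuid (uniqueidentifier)
AS RANGE RIGHT FOR VALUES ('00000000-0000-0000-0000-800000000000');
GO
-- Row data for every partition stays in PRIMARY.
CREATE PARTITION SCHEME psDocumentRows
AS PARTITION pfDocumentGuid ALL TO ([PRIMARY]);
GO
-- FILESTREAM data for each partition goes to its own filegroup (volume).
CREATE PARTITION SCHEME psDocumentFS
AS PARTITION pfDocumentGuid TO (FS_Vol1, FS_Vol2);
GO
CREATE TABLE dbo.PartitionedDocuments
(
    DocumentGuid uniqueidentifier ROWGUIDCOL NOT NULL
        CONSTRAINT DF_PartitionedDocuments_Guid DEFAULT NEWID(),
    Content      varbinary(max) FILESTREAM NULL,
    -- The unique GUID index is partitioned on its own key column.
    CONSTRAINT UQ_PartitionedDocuments_Guid UNIQUE (DocumentGuid)
)
ON psDocumentRows (DocumentGuid)
FILESTREAM_ON psDocumentFS;
GO
```

Adding a volume later would mean creating another FILESTREAM filegroup and splitting a range, which is where the capacity-planning complexity described above comes in.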
Note: In the next major release, SQL Server will allow multiple data containers per
FILESTREAM filegroup that can be added without the application being aware.
In large environments, you might have a large number of files under a single directory. A
directory can hold more than 4 billion files, but a directory with more than 300,000 files is
considered to be large (see How NTFS Works [http://technet.microsoft.com/en-us/library/cc781134(WS.10).aspx]).
A good practice, therefore, is to create partitions (FILESTREAM filegroups) that keep the
number of files in each directory at or below roughly 300,000.
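To watch a partitioned design against this guideline, a query such as the following sketch reports the approximate row count per partition of a hypothetical table dbo.Documents. The count is only a proxy for the number of files: rows whose FILESTREAM value is NULL have no file, and files still waiting for the garbage collector add to the real total.

```sql
-- Sketch: approximate rows (a proxy for FILESTREAM files) per partition.
-- index_id 0 (heap) or 1 (clustered index) counts each partition once.
SELECT p.partition_number,
       p.rows AS ApproximateRows
FROM sys.partitions AS p
WHERE p.object_id = OBJECT_ID(N'dbo.Documents')
  AND p.index_id IN (0, 1)
ORDER BY p.partition_number;
```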
RAID Considerations
FILESTREAM volumes are constrained to local volumes, which can include storage area network
(SAN) configurations. BLOB-based solutions are commonly read-heavy, with regular inserts and
few (if any) updates. In fact, if the solution updates BLOBs, each update incurs an insert under
the covers. This information can help you select an appropriate RAID level to use for your
FILESTREAM data container volumes.
RAID levels differ in terms of read/write performance, resilience to failure, and cost.
• RAID 5 is ideal for read-heavy solutions and is relatively low cost. It can handle the failure
of only one drive in the RAID array, however, and it may be unsuitable for write-heavy
workloads.
• RAID 10 provides excellent read and write performance and is preferred for update-heavy
solutions. It can handle multiple drive failures (depending on the degree of
mirroring involved), but it is more expensive, given that at least 50 percent of the drives
in the RAID array are redundant.
RAID level choice might be different for the volume on which each user database is stored, and
it might differ between the volume storing the data files and that storing the log files for a single
database.
If the workload will involve high-performance streaming of FILESTREAM data, you may choose
to have the FILESTREAM data container volume use the RAID level that gives the highest read
performance. However, this approach might not provide a high degree of resilience against
failures. On the other hand, you might choose to use the same RAID level as for the other
volumes that store the data for the database, but this approach might not provide the requisite
performance levels that the workload demands.
It is therefore important to make a carefully considered choice for RAID level for the
FILESTREAM data container volumes after considering the tradeoffs.
Note: For more information, see Physical Database Storage Design
(http://technet.microsoft.com/en-us/library/cc966414.aspx) or RAID Levels and SQL Server
(http://technet.microsoft.com/en-us/library/ms190764.aspx).
Security Considerations
The recommended default configuration allows access to the FILESTREAM files only through
SQL Server: through Transact-SQL, or through a transaction token when using Win32. Microsoft
recommends that no account other than the account running the SQL Server service be granted
NTFS permissions to the FILESTREAM data containers.
It is possible to access the files without going through SQL Server (not recommended), but this
requires an explicit action by the system administrator to modify the security settings applied to
the data container.
FILESTREAM takes advantage of the existing authentication and authorization functionality of
SQL Server for controlled access to the data values; these permissions can be applied at the
columnar level.
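As an illustration of column-level permissions, the following sketch lets one role read both the metadata and the file contents, while another role sees only the metadata. The roles (DocumentReaders, DocumentAuditors) and the table and column names are hypothetical; run it in the database that contains the table.

```sql
-- Sketch: column-level permissions on a table with a FILESTREAM column.
CREATE ROLE DocumentReaders;
CREATE ROLE DocumentAuditors;
GO
-- Readers: metadata plus FILESTREAM content.
GRANT SELECT ON dbo.Documents (DocumentGuid, FileName, Content)
    TO DocumentReaders;

-- Auditors: metadata only; the FILESTREAM column is explicitly denied.
GRANT SELECT ON dbo.Documents (DocumentGuid, FileName) TO DocumentAuditors;
DENY  SELECT ON dbo.Documents (Content) TO DocumentAuditors;
GO
```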
Note: When the RBS provider is FILESTREAM, then FILESTREAM applies SQL Server
security, but all other providers are unaware of SQL Server security.
You can use DBCC CHECKDB to help identify orphans, whether they exist in the table or in the
file system. However, DBCC CHECKDB does not detect file contents that have been tampered
with. For more information, see DBCC CHECKDB (Transact-SQL)
(http://msdn.microsoft.com/en-us/library/ms176064.aspx).
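A minimal example of running the check follows; DocumentStore is a hypothetical database name. NO_INFOMSGS suppresses informational messages so that only problems are reported, and ALL_ERRORMSGS asks for every error found for each object.

```sql
-- Sketch: consistency check that includes FILESTREAM containers.
DBCC CHECKDB (N'DocumentStore') WITH NO_INFOMSGS, ALL_ERRORMSGS;
```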
Encryption
It is possible to store FILESTREAM data on Encrypting File System (EFS) volumes; however,
you should pay careful attention to the nuances of an EFS volume. For more information, see
The Encrypting File System (http://technet.microsoft.com/en-us/library/cc700811.aspx).
Note that a SQL Server backup of FILESTREAM data stores the data unencrypted. Restoring the
backup to a volume that does not use EFS therefore results in unencrypted files.
Management Considerations
Following are some considerations for maintenance and management of FILESTREAM data.
Maintenance Tasks
Because FILESTREAM is implemented as a varbinary(max) column and integrated directly into
the Database Engine, most SQL Server management tools and functions work without
modification for FILESTREAM data.
Index Management
Indexes can become fragmented over time and might need to be rebuilt. Rebuilding indexes
only addresses the Database Engine pages and does not impact the FILESTREAM data. Note
that index rebuilds are less resource intensive and can be completed in a much shorter time
than when BLOBs are stored in the database.
Note: A varbinary(max) column prevents online rebuilds of the table's clustered index. This is a
good reason to keep the table that holds the FILESTREAM column narrow.
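The following sketch checks fragmentation of the row data indexes on a hypothetical table dbo.Documents and then rebuilds them. It operates on Database Engine pages only; fragmentation of the FILESTREAM files themselves is a file system matter.

```sql
-- Sketch: fragmentation of the row data indexes (not the FILESTREAM files).
SELECT i.name AS IndexName,
       ps.index_type_desc,
       ps.avg_fragmentation_in_percent,
       ps.page_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID(N'dbo.Documents'), NULL, NULL, 'LIMITED') AS ps
JOIN sys.indexes AS i
    ON i.object_id = ps.object_id
   AND i.index_id  = ps.index_id;

-- Offline rebuild: in SQL Server 2008 R2 the clustered index of a table
-- that contains varbinary(max) data cannot be rebuilt with ONLINE = ON.
ALTER INDEX ALL ON dbo.Documents REBUILD;
```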
When files on a volume grow, they can also become fragmented, meaning that the collection of
clusters allocated to the file is not contiguous. When the file is read sequentially, the underlying
disk heads need to read all the clusters in sequence, which could mean they have to read
different portions of the disk. Even if files do not grow once they have been created, they could
become fragmented if they were created on a volume where the available free space is not in a
single contiguous chunk.
Fragmentation reduces the sequential read performance; this is similar to index fragmentation
within a database, which can slow down query range scan performance. It is therefore essential
that the volume hosting the FILESTREAM objects be periodically defragmented. Also, if the
volume that will be used to host the FILESTREAM data container was previously used, or if it
still contains other data, the fragmentation level should be checked and fixed if necessary.
Filegroup Management
FILESTREAM filegroups inherit the characteristics of row data filegroups, except that a
FILESTREAM filegroup can have only one data container.
Note: The next major release of SQL Server will allow multiple data containers for
FILESTREAM filegroups.
Many solutions do not take advantage of all of the useful capabilities of filegroups. Because
FILESTREAM filegroups tend to be extremely large, read-only filegroups, partial restores, and
filegroup backups are often very useful, though potentially complex.
Backup and Restore
All recovery models and backup types are available with FILESTREAM data, and the
FILESTREAM data is backed up with the structured data in the database. If you do not want to
back up FILESTREAM data with relational data, you can use filegroup backups to exclude the
FILESTREAM filegroups; a partial backup excludes them only if they are read-only.
Note: Partial backups are very powerful. However, this backup and restore strategy requires a
thorough understanding of backups. The application must also be able to function with the
unrestored filegroups. Do not underestimate the complexity of a partial backup and restore
strategy.
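To make the options concrete, the following sketch shows filegroup backups that separate row data from FILESTREAM data, plus a partial backup. It assumes a hypothetical database DocumentStore using the full recovery model, with a FILESTREAM filegroup named DocumentFS and hypothetical backup paths; under the simple recovery model, individual filegroup backups are essentially limited to read-only filegroups.

```sql
-- Sketch: separate backups for row data and FILESTREAM data (full recovery model).
-- Row data only: the FILESTREAM filegroup is not included.
BACKUP DATABASE DocumentStore
    FILEGROUP = N'PRIMARY'
    TO DISK = N'G:\Backup\DocumentStore_Primary.bak';

-- FILESTREAM filegroup, possibly on a less frequent schedule.
BACKUP DATABASE DocumentStore
    FILEGROUP = N'DocumentFS'
    TO DISK = N'G:\Backup\DocumentStore_FS.bak';

-- Partial backup: PRIMARY plus all read/write filegroups;
-- read-only filegroups are skipped.
BACKUP DATABASE DocumentStore
    READ_WRITE_FILEGROUPS
    TO DISK = N'G:\Backup\DocumentStore_Partial.bak';

-- Log backups are still needed to bring restored filegroups to a consistent point.
BACKUP LOG DocumentStore
    TO DISK = N'G:\Backup\DocumentStore_Log.trn';
```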
It is possible to make a FILESTREAM filegroup READ_ONLY. This can improve the
management experience, including the backup and restore processes. Making a FILESTREAM
filegroup READ_ONLY does, however, introduce the need to understand how to handle data
lifespan. For example, it is common for solutions that manage large unstructured data values to
periodically purge or modify data, and then to have long periods of idle time. You must be sure
to include the ability to manage the transitions between READ_ONLY and READ_WRITE.
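A minimal sketch of such a transition follows, assuming a hypothetical database DocumentStore with a FILESTREAM filegroup named DocumentFS_2010. Changing a filegroup between READ_ONLY and READ_WRITE requires exclusive access to the database, so the sketch switches to single-user mode and rolls back other connections; plan it for a maintenance window. Taking a fresh backup of the filegroup right after it becomes read-only is a common follow-up.

```sql
-- Sketch: mark an idle FILESTREAM filegroup read-only.
USE master;
GO
ALTER DATABASE DocumentStore SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
ALTER DATABASE DocumentStore MODIFY FILEGROUP DocumentFS_2010 READ_ONLY;
ALTER DATABASE DocumentStore SET MULTI_USER;
GO

-- Later, when data in that filegroup must be purged or modified.
ALTER DATABASE DocumentStore SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
ALTER DATABASE DocumentStore MODIFY FILEGROUP DocumentFS_2010 READ_WRITE;
ALTER DATABASE DocumentStore SET MULTI_USER;
GO
```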
Note: It is not possible to back up a single file as a unit of the filegroup or to restore a single file
from the filegroup backup. To recover a single file, you can restore the primary filegroup and then
the FILESTREAM filegroup backup to a separate instance, and then export the file from there.
Backups will pick up all files in the FILESTREAM directory, even if they have not yet been
cleared by the garbage collector (for example, after a deferred update). Refer to the
“FILESTREAM Transaction Log” and “Garbage Collector” sections of this white paper for more
detail. If you want to keep differential and full backups as lean as possible, be sure to first run
transaction log backups and let the garbage collector clear the files.
FILESTREAM Transaction Log
FILESTREAM maintains transactional consistency between the structured and unstructured
data at all times; consistency is maintained automatically by SQL Server and does not require
any custom logic in the application. When a value is written to a FILESTREAM-enabled column,
a file is created in the main directory. FILESTREAM maintains the equivalent of a database
transaction log, with many of the same management requirements. The combination of the
database transaction log and the FILESTREAM transaction log lets the FILESTREAM and
structured data be recovered correctly.
A significant advantage of using FILESTREAM instead of storing the BLOB in the database is
the reduction in the size of the transaction log. Under the full recovery model, inserts into a
non-FILESTREAM varbinary(max) column are fully logged; if the column is FILESTREAM enabled,
the transaction log does not contain the BLOB.
The directory that holds the BLOB files has a folder called $FSLOG, which acts like a
transaction log. Unlike the transaction log of the row data files, however, it does not store a
copy of the FILESTREAM file. The algorithms of this transaction log keep its space
consumption minimal (almost negligible).
The following operations can cause growth similar to that of a row data transaction log:
• With an INSERT operation, the new file is created and, in simple terms, very small
(approximately 12-byte) text files are created to track which files are new.
• A DELETE operation has no effect on the file system.
• An UPDATE operation is not performed in place. The original file is therefore retained,
and a new file is created with the updated values, along with small files in the $FSLOG
directory.
Frequently updated environments with updates that are equal to or larger than the original value
can consume disk space very quickly. The files are cleared from the file system when they
qualify to be removed by the garbage collector.
Garbage Collector
Files that are no longer needed are removed by a garbage collection process. This process is
automatic, unlike that in Windows SharePoint Services, where garbage collection must be
implemented manually on the external BLOB store. FILESTREAM garbage collection is a
background task that is triggered by the database checkpoint process. A checkpoint is run
automatically when enough transaction log records have been generated. For most
implementations, an administrator simply needs to know that the process exists and that it is
the mechanism that removes deleted FILESTREAM files.
Note the following considerations:
• The garbage collector runs approximately every five minutes and requires the system to be
idle.
• The garbage collector first flags files as “To be Deleted,” and then removes them.
• CHECKPOINT is the only way to manually initiate the garbage collector, but it does not
guarantee an immediate garbage collector response.
To manage space consumed by the deferred-update behavior of FILESTREAM data, make sure
that a CHECKPOINT and a backup have been run, followed about 10 minutes later by another
backup.
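The CHECKPOINT-and-backup sequence above could be scripted as follows. This is a sketch, not a guaranteed procedure: it assumes a hypothetical database named FILESTREAMDemo that uses the full recovery model, plus hypothetical backup paths. Because CHECKPOINT does not force an immediate garbage collector response, the 10-minute wait simply follows the guidance above.

```sql
-- Hypothetical database name and backup paths; adjust for your environment.
USE FILESTREAMDemo;
GO
-- A checkpoint triggers the FILESTREAM garbage collector (not necessarily immediately).
CHECKPOINT;
GO
-- First backup. Under the full recovery model this is a transaction log backup.
BACKUP LOG FILESTREAMDemo
    TO DISK = N'D:\Backups\FILESTREAMDemo_log_1.trn';
GO
-- Give the garbage collector time to act (about 10 minutes, per the guidance above).
WAITFOR DELAY '00:10:00';
GO
-- Second backup, as recommended above.
BACKUP LOG FILESTREAMDemo
    TO DISK = N'D:\Backups\FILESTREAMDemo_log_2.trn';
GO
```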
Log Shipping
Log shipping supports FILESTREAM. Both the primary and secondary servers must be running
SQL Server 2008 or a later version and must have FILESTREAM enabled.
Database Mirroring
As of SQL Server 2008 R2, Database Mirroring is not yet supported. However, the upcoming
release of SQL Server, “code-named” Denali, will provide this functionality.
Database Snapshot
It is possible to create a snapshot of a database that contains FILESTREAM filegroups;
however, the FILESTREAM filegroups themselves cannot be part of the snapshot definition. In
other words, the FILESTREAM data cannot participate in the database snapshot. Keep this in
mind in migration scenarios, where a snapshot can provide a fast rollback option for the row
data but not for the FILESTREAM data.
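In practice, the snapshot definition lists only the row data files. A sketch with hypothetical names: the source database FILESTREAMDemo is assumed to have a single row data file whose logical name is FILESTREAMDemo_Data, and the snapshot path is illustrative.

```sql
-- Hypothetical names and path. List every non-FILESTREAM data file of the
-- source database; FILESTREAM containers are left out of the ON clause.
CREATE DATABASE FILESTREAMDemo_Snapshot
ON ( NAME = FILESTREAMDemo_Data,
     FILENAME = N'D:\Snapshots\FILESTREAMDemo_Data.ss' )
AS SNAPSHOT OF FILESTREAMDemo;
GO
```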
AlwaysOn (HADRON)
SQL Server code-named “Denali,” currently available as a Community Technology Preview (CTP),
will support FILESTREAM in the High Availability Disaster Recovery–AlwaysOn (HADRON)
configuration.
Considerations for Table Creation
The basics of table creation are well documented in SQL Server Books Online (for example,
Creating and Modifying Tables [http://msdn.microsoft.com/en-us/library/ms189614.aspx]) and
guidance can be found in many articles on the Internet. The following section addresses
advanced design-related considerations.
A basic statement to create a table with a FILESTREAM column is:
CREATE TABLE dbo.FILESTREAMTable
(
    [BlobID] uniqueidentifier NOT NULL ROWGUIDCOL UNIQUE
   ,[Blob]   varbinary(MAX) FILESTREAM NOT NULL
)
GO
When a table contains a FILESTREAM column, each row must have a non-null unique
ROWGUIDCOL value.
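Because every row needs a non-null, unique ROWGUIDCOL value, one common pattern is to let SQL Server generate it with a default. This is a sketch against the dbo.FILESTREAMTable definition above; the constraint name is hypothetical, and NEWSEQUENTIALID() could be used instead of NEWID() if sequential values suit your indexing.

```sql
-- Hypothetical constraint name. NEWID() supplies a unique ROWGUIDCOL value per row.
ALTER TABLE dbo.FILESTREAMTable
    ADD CONSTRAINT DF_FILESTREAMTable_BlobID DEFAULT (NEWID()) FOR [BlobID];
GO
-- BlobID can now be omitted; the default fills it in.
INSERT INTO dbo.FILESTREAMTable ([Blob])
VALUES (CAST('sample content' AS varbinary(max)));
GO
```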
When the solution is small and simple, the data model can store the metadata along with the
BLOB. When the data set becomes very large, a more efficient strategy is to keep a narrow table
that hosts only the FILESTREAM data values (the GUID and varbinary(max) values). Avoid
maintaining foreign key relationships between this table and the metadata table, because they
add complexity when you implement and maintain table partitioning.
The value of this strategy becomes clear when you manage large sets of data and accommodate
partitioning strategies that align with filegroup strategies. For example, when you switch a
partition from one table into another, the indexes and foreign key relationships of the two
tables must be identical.
The following example creates a table that aligns with a partitioning strategy:
CREATE TABLE [dbo].[FILESTREAMTable]
(
    [BlobID] uniqueidentifier ROWGUIDCOL NOT NULL
   ,[Blob]   varbinary(MAX) FILESTREAM NOT NULL
   ,CONSTRAINT [UQ_CDX_dboFILESTREAMTable_BlobID] UNIQUE CLUSTERED ([BlobID] ASC)
) ON [ps_RowData] ([BlobID]) FILESTREAM_ON [ps_FILESTREAM];
GO
Note that a FILESTREAM-enabled table always requires at least two filegroups: one for the row
data and one for the FILESTREAM data. If the filegroups are not explicitly stated, the filegroups
marked as default are used.
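The partitioned CREATE TABLE statement above assumes that the partition schemes ps_RowData and ps_FILESTREAM already exist. The following sketch shows one way they might be defined. The database, filegroup, file, and function names, the paths, and the single boundary value are all hypothetical. Both schemes use the same partition function, which is what keeps the row data and the FILESTREAM data aligned.

```sql
-- Hypothetical database; FILESTREAM must already be enabled on the instance.
ALTER DATABASE FILESTREAMDemo ADD FILEGROUP FG_Row1;
ALTER DATABASE FILESTREAMDemo ADD FILEGROUP FG_Row2;
ALTER DATABASE FILESTREAMDemo ADD FILEGROUP FG_FS1 CONTAINS FILESTREAM;
ALTER DATABASE FILESTREAMDemo ADD FILEGROUP FG_FS2 CONTAINS FILESTREAM;
GO
-- Row data files are ordinary .ndf files.
ALTER DATABASE FILESTREAMDemo ADD FILE
    (NAME = Row1, FILENAME = N'D:\Data\FILESTREAMDemo_Row1.ndf') TO FILEGROUP FG_Row1;
ALTER DATABASE FILESTREAMDemo ADD FILE
    (NAME = Row2, FILENAME = N'D:\Data\FILESTREAMDemo_Row2.ndf') TO FILEGROUP FG_Row2;
-- A FILESTREAM "file" is a container directory: its parent folder must exist,
-- and the final folder must not.
ALTER DATABASE FILESTREAMDemo ADD FILE
    (NAME = FS1, FILENAME = N'E:\FS\FILESTREAMDemo_FS1') TO FILEGROUP FG_FS1;
ALTER DATABASE FILESTREAMDemo ADD FILE
    (NAME = FS2, FILENAME = N'E:\FS\FILESTREAMDemo_FS2') TO FILEGROUP FG_FS2;
GO
USE FILESTREAMDemo;
GO
-- One boundary value gives two partitions. SQL Server compares uniqueidentifier
-- values starting with the last byte group, so this boundary splits on that group.
CREATE PARTITION FUNCTION [pf_BlobID] (uniqueidentifier)
AS RANGE RIGHT FOR VALUES ('00000000-0000-0000-0000-800000000000');
GO
-- Same function for both schemes keeps row data and FILESTREAM data aligned.
CREATE PARTITION SCHEME [ps_RowData]
AS PARTITION [pf_BlobID] TO ([FG_Row1], [FG_Row2]);
CREATE PARTITION SCHEME [ps_FILESTREAM]
AS PARTITION [pf_BlobID] TO ([FG_FS1], [FG_FS2]);
GO
```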
Data Access Considerations
After you store data in a FILESTREAM column, you can access the files by using Transact-SQL
transactions or by using Win32 application programming interfaces (APIs). Win32 access
requires obtaining a transaction token before the file can be opened. Transact-SQL can access
the data as if it were stored in the database.
Note: It is possible to access and interact with the FILESTREAM files without obtaining a token,
but doing so breaks strict security guidelines and is not recommended. It places the data at
high risk of corruption while also negatively affecting application experiences that are unrelated
to the SQL Server engine. Microsoft strongly recommends that you access the FILESTREAM
files only by using the recommended approaches.
Several resources provide empirical performance data that shows that the most efficient way to
access FILESTREAM data is through the Win32 API. See the following resources for more
information:
• Managing FILESTREAM Data by Using Win32 (http://msdn.microsoft.com/en-us/library/cc645940.aspx)
• Managing FILESTREAM Data by Using Transact-SQL (http://msdn.microsoft.com/en-us/library/cc645962.aspx)
Note: Unlike FILESTREAM, FileTable does allow for in-place updating of BLOB values.
Transact-SQL Access
Transact-SQL access is not the most efficient way to access FILESTREAM data. However, it lets
you introduce FILESTREAM into a solution without requiring the application to be aware of the
new storage format.
When you use Transact-SQL, you can insert, update, and delete FILESTREAM data. Following
are some considerations for using Transact-SQL:
• Large amounts of data are streamed into a file more efficiently by using Win32 interfaces.
• When a FILESTREAM field is set to NULL, the BLOB data associated with the field is
deleted.
• You cannot use a Transact-SQL chunked update, implemented with the UPDATE .WRITE()
clause, to perform partial updates to the data.
• Deleting a row finds the respective individual file and marks it as ready to be deleted. Disk
space is not freed until the garbage collector removes the file.
• A truncate operation marks the table’s directory as ready to be deleted and creates a
new directory.
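The operations above can be exercised from Transact-SQL alone. This sketch runs against the dbo.FILESTREAMTable defined earlier, and the sample values are arbitrary. Because the Blob column in that definition is NOT NULL, the sketch deletes the row rather than setting the value to NULL, and the unsupported .WRITE partial update appears only as a comment.

```sql
DECLARE @id uniqueidentifier = NEWID();

-- INSERT: SQL Server creates a new file in the FILESTREAM container.
INSERT INTO dbo.FILESTREAMTable ([BlobID], [Blob])
VALUES (@id, CAST('first version' AS varbinary(max)));

-- UPDATE: the whole value is replaced; a new file is written and the old one
-- waits for the garbage collector.
UPDATE dbo.FILESTREAMTable
SET [Blob] = CAST('second version' AS varbinary(max))
WHERE [BlobID] = @id;

-- Reads look exactly like reads of an in-database varbinary(max) value.
SELECT [BlobID], CAST([Blob] AS varchar(max)) AS BlobText
FROM dbo.FILESTREAMTable
WHERE [BlobID] = @id;

-- Not supported for FILESTREAM columns (chunked partial update):
-- UPDATE dbo.FILESTREAMTable SET [Blob].WRITE(0x00, 0, 1) WHERE [BlobID] = @id;

-- DELETE: the file is marked for deletion; disk space is freed only after
-- the garbage collector removes it.
DELETE FROM dbo.FILESTREAMTable
WHERE [BlobID] = @id;
GO
```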
File System Streaming Access
The Win32 streaming support works in the context of a SQL Server transaction. It is important to
open a transaction before requesting a token, because no token is provided without an open
transaction.
Within a transaction, you can use FILESTREAM functions to obtain a logical Universal Naming
Convention (UNC) file system path. You can then use the OpenSqlFilestream API to obtain
a file handle. This handle can be used by Win32 file streaming interfaces, such as ReadFile()
and WriteFile(), to access and update the file by way of the file system.
Because file operations are transactional, you cannot delete or rename FILESTREAM files
through the file system.
For more information and examples of using Win32 APIs to access FILESTREAM data, see
Managing FILESTREAM Data by Using Win32 (http://msdn.microsoft.com/en-us/library/cc645940.aspx).
Statement Model
The FILESTREAM file system access models a Transact-SQL statement by using file open and
close. The statement starts when a file handle is opened and ends when the handle is closed.
For example, when a write handle is closed, any AFTER trigger that is registered on the table
fires as if an UPDATE statement had completed.
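To illustrate the statement model, the following hypothetical audit trigger would, according to the behavior described above, fire when a Win32 write handle on a Blob value is closed, just as it fires after a Transact-SQL UPDATE. The audit table and trigger names are assumptions made for this sketch.

```sql
-- Hypothetical audit table.
CREATE TABLE dbo.BlobAudit
(
    [BlobID]    uniqueidentifier NOT NULL,
    [ChangedAt] datetime2        NOT NULL DEFAULT (SYSDATETIME())
);
GO
-- Fires after a Transact-SQL UPDATE, and also when a Win32 write handle is closed.
CREATE TRIGGER dbo.trg_FILESTREAMTable_AfterUpdate
ON dbo.FILESTREAMTable
AFTER UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    -- Record which rows changed; only the key column is read from inserted.
    INSERT INTO dbo.BlobAudit ([BlobID])
    SELECT [BlobID] FROM inserted;
END;
GO
```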
Storage Namespace
In FILESTREAM, the Database Engine controls the BLOB physical file system namespace. A
new intrinsic function, PathName, provides the logical UNC path of the BLOB that corresponds
to each FILESTREAM cell in the table. The application uses this logical path to obtain the
Win32 handle and operate on the BLOB data by using regular Win32 file system interfaces. The
function returns NULL if the value of the FILESTREAM column is NULL.
Transacted File System Access
A new intrinsic function, GET_FILESTREAM_TRANSACTION_CONTEXT(), provides the token
that represents the current transaction that the session is associated with. The transaction must
have been started and not yet aborted or committed. By obtaining a token, the application binds
the FILESTREAM file system streaming operations with a started transaction. The function
returns NULL if no transaction has been explicitly started.
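The Transact-SQL side of file system streaming access follows the pattern described in the preceding sections: start a transaction, read the logical path and the transaction token in one query, hand both to the client’s OpenSqlFilestream call, close the handle, and only then commit. This sketch assumes the dbo.FILESTREAMTable defined earlier; the Win32 steps are shown only as comments because they run in the client application on the same connection.

```sql
-- File system access supports only READ COMMITTED, so make it explicit.
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;

-- Outside an explicit transaction the token function returns NULL.
SELECT GET_FILESTREAM_TRANSACTION_CONTEXT() AS TokenOutsideTransaction;

BEGIN TRANSACTION;

-- PathName() returns the logical UNC path (NULL if the column value is NULL);
-- GET_FILESTREAM_TRANSACTION_CONTEXT() returns the token bound to this transaction.
-- TOP (1) picks an arbitrary row, for illustration only.
SELECT TOP (1)
       [BlobID],
       [Blob].PathName()                    AS LogicalPath,
       GET_FILESTREAM_TRANSACTION_CONTEXT() AS TransactionToken
FROM dbo.FILESTREAMTable;

-- Client application (not Transact-SQL): pass LogicalPath and TransactionToken
-- to OpenSqlFilestream, stream with ReadFile/WriteFile, then close the handle.
-- Every handle must be closed before the COMMIT below.

COMMIT TRANSACTION;
```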
All file handles must be closed before the transaction commits or aborts. If a handle is left open
beyond the transaction scope, additional reads against the handle will cause a failure; additional
writes against the handle will succeed, but the actual data will not be written to disk. Similarly, if
the database or instance of the Database Engine shuts down, all open handles are invalidated.
Transactional Durability
With FILESTREAM, upon transaction commit, the Database Engine ensures transaction
durability for FILESTREAM BLOB data that is modified from the file system streaming access.
Isolation Semantics
The isolation semantics are governed by Database Engine transaction isolation levels. Only the
read-committed isolation level is supported for file system access. Repeatable read,
serializable, and snapshot isolation are supported when the FILESTREAM data is accessed by
using Transact-SQL. Dirty reads are not supported.
The file system access open operations do not wait for locks. Instead, the open operations fail
immediately if they cannot access the data because of transaction isolation. The streaming API
calls fail with ERROR_SHARING_VIOLATION if the open operation cannot continue because
of isolation violation.
Partial Updates
To make partial updates, the application can issue a file system control code (FSCTL),
FSCTL_SQL_FILESTREAM_FETCH_OLD_CONTENT, to fetch the old content into the file that the
opened handle references. This triggers a server-side copy of the old content.
For better application performance and to avoid running into potential time-outs when you are
working with very large files, Microsoft recommends that you use asynchronous I/O.
If the FSCTL is issued after the handle has been written to, the last write operation will persist.
For more information, see FSCTL_SQL_FILESTREAM_FETCH_OLD_CONTENT
(http://technet.microsoft.com/en-us/library/cc627407.aspx).
Write-Through from Remote Clients
Remote file system access to FILESTREAM data is enabled over the Server Message Block
(SMB) protocol. If the client is remote, no write operations are cached by the client side. The
write operations will always be sent to the server. The data can be cached on the server side.
Microsoft recommends that applications running on remote client computers consolidate small
write operations into fewer, larger writes.
Creating memory mapped views (memory mapped I/O) by using a FILESTREAM handle is not
supported. If memory mapping is used for FILESTREAM data, the Database Engine cannot
guarantee consistency and durability of the data or the integrity of the database.
Migration Considerations
Prior to SQL Server 2008, solutions stored BLOB data either within the database as varbinary
objects or externally in a file system. You can use several techniques to migrate from these
options to a FILESTREAM format. Although the details of these migration strategies are out of
scope for this white paper, the following sections describe some migration considerations.
Several factors influence which migration technique is most appropriate:
• The total size of the BLOB data.
• The current source of the BLOBs.
• The availability requirements of the solution (allowed downtime for migration).
• The application’s ability, and the business’s tolerance, to handle temporarily unavailable
BLOB values.
• The complexity involved in determining the destination of each BLOB.
A simple solution for migrating data that currently resides in a database is to use
INSERT INTO … SELECT to copy it into the FILESTREAM-enabled table. If the data resides in a
file system, a process or application is required to read the values and insert them into the
FILESTREAM-enabled database.
Note: It is not possible to simply point FILESTREAM to existing locations. In all cases, a BLOB
must pass through the Database Engine, because internal structures maintain references
between the table row and the location on the file system.
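For data already stored in a varbinary(max) column, the simple approach is a set-based copy. This sketch assumes a hypothetical source table, dbo.LegacyBlobs, with BlobID and Blob columns, and the dbo.FILESTREAMTable defined earlier. The NOT EXISTS filter is an illustrative choice that lets the copy be rerun without duplicating rows.

```sql
-- Hypothetical source table; created here only so the sketch is self-contained.
IF OBJECT_ID(N'dbo.LegacyBlobs', N'U') IS NULL
    CREATE TABLE dbo.LegacyBlobs
    (
        [BlobID] uniqueidentifier NOT NULL PRIMARY KEY,
        [Blob]   varbinary(max)   NULL
    );
GO
-- Each value passes through the Database Engine, which writes the FILESTREAM file.
INSERT INTO dbo.FILESTREAMTable ([BlobID], [Blob])
SELECT s.[BlobID], s.[Blob]
FROM dbo.LegacyBlobs AS s
WHERE s.[Blob] IS NOT NULL          -- the target Blob column is NOT NULL
  AND NOT EXISTS (SELECT 1          -- skip rows copied by an earlier run
                  FROM dbo.FILESTREAMTable AS t
                  WHERE t.[BlobID] = s.[BlobID]);
GO
```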
The simple solution is not adequate if the source data is many terabytes in size, or if the
application (or business) cannot tolerate the time required to transfer the data. Also, a
high-performing system generally stores BLOBs smaller than 1 MB in the database and larger
BLOBs outside of the database. You might also want the application to create and store
thumbnails as part of the migration process. Each of these factors makes the migration more
complex, so carefully consider migration during the planning phase of your solution.
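If you follow the guidance of keeping BLOBs smaller than 1 MB in the database, the migration has to route each value by size. This sketch assumes a hypothetical source table dbo.LegacyBlobs, a hypothetical dbo.SmallBlobs table with an ordinary varbinary(max) column, and the dbo.FILESTREAMTable defined earlier. The 1 MB threshold comes from the text above and should be tuned for your workload.

```sql
-- Hypothetical source table; created here only so the sketch is self-contained.
IF OBJECT_ID(N'dbo.LegacyBlobs', N'U') IS NULL
    CREATE TABLE dbo.LegacyBlobs
    (
        [BlobID] uniqueidentifier NOT NULL PRIMARY KEY,
        [Blob]   varbinary(max)   NULL
    );
-- Hypothetical table for small BLOBs kept inside the database.
IF OBJECT_ID(N'dbo.SmallBlobs', N'U') IS NULL
    CREATE TABLE dbo.SmallBlobs
    (
        [BlobID] uniqueidentifier NOT NULL PRIMARY KEY,
        [Blob]   varbinary(max)   NOT NULL
    );
GO
-- Threshold from the guidance above: 1 MB = 1,048,576 bytes.
DECLARE @threshold bigint = 1048576;

-- Values under the threshold stay in an ordinary varbinary(max) column.
-- DATALENGTH of a NULL value is NULL, so NULL BLOBs match neither filter.
INSERT INTO dbo.SmallBlobs ([BlobID], [Blob])
SELECT [BlobID], [Blob]
FROM dbo.LegacyBlobs
WHERE DATALENGTH([Blob]) < @threshold;

-- Larger values go to the FILESTREAM-enabled table.
INSERT INTO dbo.FILESTREAMTable ([BlobID], [Blob])
SELECT [BlobID], [Blob]
FROM dbo.LegacyBlobs
WHERE DATALENGTH([Blob]) >= @threshold;
GO
```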
Note: FileTable eases migration because you can copy and paste; however, it is still important
to carefully consider your migration strategy.
Conclusion
There are many factors to consider when deciding which solution is best for storing BLOBs. This
white paper serves as a companion to the information about FILESTREAM found in many
sources. It delves deeply into selected topics that IT professionals should consider when
implementing a solution that uses FILESTREAM. The links that follow provide further
information.
For More Information
http://www.microsoft.com/sqlserver/: SQL Server Web site
http://technet.microsoft.com/en-us/sqlserver/: SQL Server TechCenter
http://msdn.microsoft.com/en-us/sqlserver/: SQL Server DevCenter
http://www.microsoft.com/sqlserver/en/us/product-info/future-editions.aspx: Future SQL Server
Editions
http://msdn.microsoft.com/en-us/library/cc949109(SQL.100).aspx: FILESTREAM Storage in
SQL Server 2008
http://technet.microsoft.com/en-us/library/bb933993.aspx: FILESTREAM Overview
http://msdn.microsoft.com/en-us/library/bb895334.aspx: Using FILESTREAM with Other SQL
Server Features
http://technet.microsoft.com/en-us/library/cc645886.aspx: How to: Set Up FILESTREAM on a
Failover Cluster
http://technet.microsoft.com/en-us/library/dd206979.aspx: FILESTREAM Best Practices
http://msdn.microsoft.com/en-us/library/ff929068(v=sql.110).aspx: FileTable Overview
http://msdn.microsoft.com/en-us/library/ff929144(v=sql.110).aspx: Using FileTables to Manage
Unstructured FILESTREAM Data
http://technet.microsoft.com/en-us/library/ee748649.aspx: Overview of Remote BLOB Storage
(SharePoint Server 2010)
http://blogs.msdn.com/b/sqlrbs/: SQL Remote Blob Storage Team Blog
http://sqlrbs.codeplex.com/: Microsoft SQL Remote Blob Storage (RBS) Samples