SQL CLR Demystified A look at the integration of the CLR into SQL Server Matt Whitfield Atlantis Interactive UK Ltd What will be covered • Overview of CLR integration • SqlContext • Stored Procedures • Scalar & Table Functions • CLR DML Triggers • CLR DDL Triggers • Aggregate Functions • CLR Types • Visual Studio Database Projects • A look at permission sets • Examples: – Data maniplation for performance – Environment manipulation for maintenance Overview • CLR integration into SQL Server allows us to run CLR code that conforms to certain restrictions • CLR integration means that we can use common business rule validation code in the middle tier and the database • For some operations, CLR integration allows us to extract maximum performance • CLR integration allows us to perform maintenance tasks that would have been very difficult or impossible before CLR Disabled by Default • CLR integration is disabled by default in SQL Server, and therefore must be enabled: EXEC sp_configure ‘clr enabled’, 1; RECONFIGURE WITH OVERRIDE; GO • This can also be done using the Surface Area Configuration tool. What Framework version? • The CLR in SQL Server is always loaded as the 2.0 runtime • Until .NET 4.0, a single CLR host process could only load a single version of the .NET runtime • It is possible future versions of SQL Server may allow more up-to-date runtime versions to be loaded • We can confirm the loaded version with: select * from sys.dm_clr_properties • However, we can use types from .NET 3.0 and 3.5, as these releases did not update mscorlib Some Things To Note • When SQL Server loads assemblies, they are cached in memory. When the O/S signals memory pressure to SQL Server, explicit garbage collection may be run, and the assemblies may be unloaded. This can cause performance issues if it happens frequently. • SQL CLR code cannot be costed accurately by the query optimiser, as it does not look at what the code actually does – this can affect execution plans. • SQL CLR code can sometimes prevent parallelism, and SQL CLR procedures are usually single threaded. Sometimes this can hurt performance, other times you may find you achieve the same amount of work in the same time using 1 thread instead of 8. SqlContext • SqlContext is a class that gives us access to four members: – IsAvailable – Pipe – TriggerContext – WindowsIdentity • IsAvailable indicates whether the other members can be used SqlContext.Pipe • Pipe is the main point of interaction with the SQL Server when returning result sets or messages • Messages can be sent to the client using the Send(string) function: SqlContext.Pipe.Send(“Hello World”); • Single row result sets can be sent using the Send(SqlDataRecord) function. Multi-row Result Sets • SendResultsStart is used to specify the schema of the result set SendResultsStart(SqlDataRecord) – No data is sent by the SendResultsStart call • 0 or more calls to SendResultsRow are then sent with data SendResultsRow(SqlDataRecord) – Empty result sets can be sent by making no calls to SendResultsRow • SendResultsEnd marks the end of the result set SendResultsEnd() SqlContext.WindowsIdentity • This returns a standard System.Security.Principal.WindowsIdentity object • The property can only be used under EXTERNAL_ACCESS or UNSAFE permission sets • For 99% of CLR work, using this property is not required SqlContext.TriggerContext • TriggerContext gives you access to: – ColumnCount – number of columns in the target table – EventData – SqlXml that returns the same as the EVENTDATA T-SQL function – TriggerAction – a TriggerAction enumeration value that specifies which action caused the trigger to fire – IsUpdatedColumn – method that determines if a column was updated – same as the T-SQL UPDATE() function – …but what about the inserted and deleted virtual tables that are available in normal triggers? Context Connection • To access data from within CLR code, the special ‘context connection’ connection string is used… using (SqlConnection connection = new SqlConnection("context connection=true")) { connection.Open(); ... } • We can then run any SQL statement as we would normally from .NET • The inserted and deleted virtual tables are available through this method when in a DML trigger Stored Procedures • What is a stored procedure? – A procedure that contains a sequence of operations, which may affect existing data, return result sets or both – CLR Stored Procedures are no different – CLR Stored Procedures are implemented as static members of a class – The containing class is irrelevant to the execution of the procedure, and is an organisational unit only Hello World public partial class StoredProcedures { [Microsoft.SqlServer.Server.SqlProcedure] public static void HelloWorld() { SqlContext.Pipe.Send("Hello World!"); } } • • Procedure is just a method, decorated with the SqlProcedure attribute. Procedure is registered in SQL Server as follows: CREATE PROCEDURE [dbo].[HelloWorld] AS EXTERNAL NAME [SQLAssembly].[StoredProcedures].[HelloWorld] GO Simple Data Access • In part 1: – we will open a context connection – we will execute a command – we will read the results, and use those to populate a Dictionary object • In part 2: – We will create record metadata – We will create a record set – We will populate the record set from the Dictionary • While a very mundane example, and not something you would usually use the CLR for, it shows the basic mechanics of moving data into and out of the CLR Simple Data Access (1/2) public static void RowCounter() { Dictionary<string, int> typeDictionary = new Dictionary<string, int>(); using (SqlConnection connection = new SqlConnection("context connection=true")) { connection.Open(); using (SqlCommand command = new SqlCommand("SELECT [type] from [sys].[objects];", connection)) { using (SqlDataReader reader = command.ExecuteReader()) { while (reader.Read()) { string type = reader.GetString(0); if (typeDictionary.ContainsKey(type)) { typeDictionary[type] = typeDictionary[type] + 1; } else { typeDictionary.Add(type, 1); } } } } } ... Simple Data Access (2/2) ... SqlDataRecord record = new SqlDataRecord(new SqlMetaData[] { new SqlMetaData("Type", SqlDbType.NChar, 2), new SqlMetaData("Count", SqlDbType.Int) }); SqlContext.Pipe.SendResultsStart(record); foreach (KeyValuePair<string, int> kvp in typeDictionary) { record.SetString(0, kvp.Key); record.SetInt32(1, kvp.Value); SqlContext.Pipe.SendResultsRow(record); } SqlContext.Pipe.SendResultsEnd(); } Things to note • That we did not create a new SqlDataRecord object for every row, we just populated the existing object • That the name exposed to SQL Server does not have to match the name of the implementing method • The return type of the stored procedure can be int to return a value, as with T-SQL stored procedures Scalar Functions • What is a scalar function? – A statement or set of statements that return a single value of a specific type based on zero or more parameters – CLR Scalar Functions are no different – CLR Scalar Functions are implemented much the same as stored procedures – a single static method of a class String Length Scalar Function [Microsoft.SqlServer.Server.SqlFunction] public static long fn_LengthTestCLR(string input) { return input.Length; } CREATE FUNCTION [dbo].[fn_LengthTestCLR] (@input nvarchar (MAX)) RETURNS bigint AS EXTERNAL NAME [SQLAssembly].[UserDefinedFunctions].[fn_LengthTestCLR] GO Performance Comparison • Consider the equivalent T-SQL function, and the following SQL Statement: CREATE FUNCTION dbo.fn_LengthTest(@input varchar(MAX)) RETURNS [bigint] AS BEGIN RETURN LEN(@input) END SELECT SUM(dbo.fn_LengthTest(o.[name])) FROM [sys].[objects] [o] CROSS JOIN [sys].[all_columns] [ac] CROSS JOIN [sys].[all_columns] [ac1] CLR would be slower, right? • No! • On a dual core 2GHz CPU – – SQL Function – 9.036 seconds – SQL CLR Function – 3.918 seconds • Overhead of calling a SQL CLR function is lower than that of calling a T-SQL function Table Functions • What is a table function? – A statement or set of statements that return a single result set with a defined schema based on zero or more parameters – CLR Table Functions can only be multi-statement equivalents, there is no concept of an ‘inline CLR table function’ – Implemented as a pair of methods – one that returns an IEnumerable, and one that sets outbound parameters for each object – both are static members Regular Expression Function • In Part 1: – We will implement an object that we can return to hold the data required to fill each row • In Part 2: – We will create a list of our storage object – We will run a regular expression match based on the parameters – We will populate our list with the results of the regular expression – We will return the list • In Part 3: – We will set the values to be stored in each row A Regular Expression Function (1/3) private struct _match { public readonly int MatchNumber; public readonly int GroupNumber; public readonly string CaptureValue; public _match(int matchNumber, int groupNumber, string captureValue) { MatchNumber = matchNumber; GroupNumber = groupNumber; CaptureValue = captureValue; } } A Regular Expression Function (2/3) [SqlFunction(FillRowMethodName = "FillRow", TableDefinition = "MatchNumber int, " + "GroupNumber int, " + "CaptureValue nvarchar(MAX)")] public static IEnumerable GetCaptureGroupValues(string input, string pattern) { List<_match> matchList = new List<_match>(); int matchIndex = 0; foreach (Match m in Regex.Matches(input, pattern)) { int groupIndex = 0; foreach (Group g in m.Groups) { matchList.Add(new _match(matchIndex, groupIndex++, g.Value)); } matchIndex++; } return matchList; } A Regular Expression Function (3/3) public static void FillRow(Object obj, out int MatchNumber, out int GroupNumber, out SqlString CaptureValue) { _match match = (_match)obj; MatchNumber = match.MatchNumber; GroupNumber = match.GroupNumber; CaptureValue = match.CaptureValue; } Creation SQL CREATE FUNCTION [dbo].[GetCaptureGroupValues] (@input [nvarchar] (MAX), @pattern [nvarchar] (MAX)) RETURNS TABLE ([MatchNumber] [int] NULL, [GroupNumber] [int] NULL, [CaptureValue] [nvarchar] (MAX) NULL) AS EXTERNAL NAME [SQLAssembly].[UserDefinedFunctions].[GetCaptureGroupValues] GO • Again, the name exposed to SQL Server does not have to match the name of the main body function • The SQL definition can specify different column names and parameter names to the main method and fill row method Things to note • That we used a struct for the storage object, rather than a class, because of reduced instantiation costs • That our function was decorated with a SqlFunction attribute that named the fill row method and specified the shape of the output table • That the fill row method took an object as it’s first parameter, and had an outbound parameter for each of the columns we wanted to fill CLR Triggers • Known bug in Visual Studio 2005 and 2008 that means SqlTrigger attribute does not allow you to specify the schema of the target table • Fixed in Visual Studio 2010 • Workaround is to comment out the SqlTrigger attribute, and deploy manually DML Triggers • What is a DML Trigger? – A statement or set of statements run when data in a target table is modified – CLR DML Triggers are no different – CLR DML Triggers still have access to the inserted and deleted virtual tables – CLR DML Triggers are implemented as static methods in a class A Simple DML Trigger • We will specify the event type and target object using the SqlTrigger attribute • We will connect to the database using the context connection • We will select the rows from the inserted virtual table, and count them • We will send a message to the SqlContext pipe stating the number of rows A Simple DML Trigger [Microsoft.SqlServer.Server.SqlTrigger(Target="CallLog", Event="FOR UPDATE")] public static void TestDMLTrigger() { int i = 0; using (SqlConnection connection = new SqlConnection("context connection=true")) { connection.Open(); using (SqlCommand command = new SqlCommand("SELECT * from inserted;", connection)) { using (SqlDataReader reader = command.ExecuteReader()) { while (reader.Read()) { i++; } } } } SqlContext.Pipe.Send(i.ToString() + " rows in inserted"); } DDL Triggers • What is a DDL Trigger? – A statement or set of statements run when a schema modification event occurs – CLR DDL Triggers are no different – CLR DDL Triggers have access to the event data via SqlContext.TriggerContext – CLR DDL Triggers are implemented as static methods in a class A Simple DDL Trigger • We will specify the target and the event type using the SqlTrigger attribute • We will create an XmlDocument based on the TriggerContext object from the SqlContext object • We will then use XPath to find the created object’s schema and name • We will send a message to the SqlContext pipe stating the name of the created object A Simple DDL Trigger [Microsoft.SqlServer.Server.SqlTrigger(Target="DATABASE", Event="FOR CREATE_TABLE")] public static void TestDDLTrigger() { XmlDocument document = new XmlDocument(); document.LoadXml(SqlContext.TriggerContext.EventData.Value); string objectName = document.SelectSingleNode("//SchemaName").InnerText + "." + document.SelectSingleNode("//ObjectName").InnerText; SqlContext.Pipe.Send("Table " + objectName + " was created."); } Aggregate Functions • What is an aggregate function? – A function that takes a series of values and accumulates them into a single, aggregated result – CLR Aggregate functions are no different – Implemented as a class with four methods: • • • • Init Accumulate Merge Terminate – Aggregate functions in SQL Server 2005 can take only one parameter, in SQL Server 2008 this restriction is lifted Aggregate Function Methods • Init method is called first, here we initialise our member fields • Accumulate is called to add a value to the aggregate • Merge is called to add together two aggregates (for example, as a result of parallelism) • Terminate is called to return our value A Simple Aggregate • We will use the SqlUserDefinedAggregate attribute to specify the serialisation format of our aggregate • We will track the minimum value in Accumulate, and also track that we have seen values • In Merge, we will check if the other aggregate class has seen values - if it has, then we take the minimum value seen • In Terminate we will return the minimum value we have seen if we have seen any values, or NULL if we have not seen any values A Simple Aggregate (1/2) [Serializable] [Microsoft.SqlServer.Server.SqlUserDefinedAggregate(Format.Native)] public struct MinValue { int minValue; bool hasValues; public void Init() { minValue = Int32.MaxValue; hasValues = false; } public void Accumulate(SqlInt32 Value) { hasValues = true; if (Value.Value < minValue) { minValue = Value.Value; } } ... A Simple Aggregate (2/2) ... public void Merge(MinValue Group) { if (Group.hasValues) { if (Group.minValue < minValue) { minValue = Group.minValue; } hasValues = true; } } public SqlInt32 Terminate() { if (hasValues) { return minValue; } else { return SqlInt32.Null; } } } SqlUserDefinedAggregate • This attribute contains parameters which can affect the query optimiser, and can cause incorrect results if set wrong: – IsInvariantToDuplicates – should only be true if aggregating the same value many times does not affect the result – IsInvariantToNulls – should only be true if aggregating NULL values does not affect the result – IsInvariantToOrder – should only be true if the result is not dependent on the order of the aggregated values – IsNullIfEmpty – should only be true if the aggregate of 0 values is NULL Native and UserDefined Serialisation • If your aggregate contains reference types then you must implement custom serialisation, and specify Format.UserDefined in the SqlUserDefinedAggregate attribute • When specifying UserDefined format, your aggregate must implement the IBinarySerialize interface, and specify it’s maximum size in the SqlUserDefinedAggregate attribute • Maximum byte size can be 1 to 8000 in SQL Server 2005, or -1 for any value between 8001 bytes and 2 GB in SQL Server 2008. • The same serialisation interface applies to both Aggregates and User Defined Types CLR Types • What is a user defined type? – A type based on a system type, which may be bound to old-style rules and defaults – CLR Types are different – CLR Types can be complex, and can have properties and methods – CLR Types can have static methods (and the built in CLR Types in SQL Server 2008 have some) A Simple CLR Type • Part 1: – We will define the serialisation format of the type – We will provide the IsNull property and the Null static property to return a NULL instance • Part 2: – We will define methods to convert the type to and from strings • Part 3: – We will define a property for the symbol, a method to change the symbol and a static method to create a new currency with a defined symbol and value • Part 4: – We will declare our member fields and provide implementation of the IBinarySerialize interface to load and store our type A Simple CLR Type (1/4) [Serializable] [Microsoft.SqlServer.Server.SqlUserDefinedType(Format.UserDefined, MaxByteSize=24)] public struct Currency : INullable, IBinarySerialize { public bool IsNull { get { return _null; } } public static Currency Null { get { Currency h = new Currency(); h._null = true; return h; } } A Simple CLR Type (2/4) public override string ToString() { return _symbol + _value.ToString(); } public static Currency Parse(SqlString s) { if (s.IsNull || string.IsNullOrEmpty(s.Value) || s.Value.Length < 2 || string.Equals(s.Value, "NULL", StringComparison.OrdinalIgnoreCase)) { return Currency.Null; } Currency u = new Currency(); u._symbol = s.Value[0]; u._value = Decimal.Parse(s.Value.Substring(1)); return u; } A Simple CLR Type (3/4) public char Symbol { get { return _symbol; } } [SqlMethod(IsMutator=true)] public void ChangeSymbol(char newSymbol) { _symbol = newSymbol; } public static Currency CreateCurrency(char symbol, Decimal value) { Currency c = new Currency(); c._symbol = symbol; c._value = value; return c; } A Simple CLR Type (4/4) private char _symbol; private Decimal _value; private bool _null; public void Read(BinaryReader r) { _null = r.ReadBoolean(); _symbol = r.ReadChar(); _value = r.ReadDecimal(); } public void Write(BinaryWriter w) { w.Write(_null); w.Write(_symbol); w.Write(_value); } } The CLR Type in use • Using our static method to create an instance: DECLARE @i [Currency] SET @i = [Currency]::[CreateCurrency]('$', 0.53) PRINT CONVERT([varchar], @i) • Using our instance mutator method to change the symbol: DECLARE @i [Currency] SET @i = [Currency]::[CreateCurrency]('$', 0.53) SET @i.ChangeSymbol('£') PRINT CONVERT([varchar], @i) • Retrieving the value of a property: DECLARE @i [Currency] SET @i = [Currency]::[CreateCurrency]('$', 0.53) PRINT @i.Symbol Things to note • We can call static methods and properties on CLR types with [type-name]::[member-name] • IBinarySerialize must be used to serialise types that cannot be serialised natively • A CLR type should always be able to parse it’s own string representation • Methods that change the state of the type must be marked as mutator methods • Although in .NET we can define operator overloads to specify how operators (e.g. +, -, *, /) are applied to our classes, SQL Server does not use these. Permission Sets • Each assembly in SQL Server has an associated ‘permission set’ – one of: – SAFE – EXTENAL_ACCESS – UNSAFE • The permission set assigned to the assembly determines how limited the CLR functionality is • This, in turn, determines how much damage we can do to the stability of the host process (i.e. SQL Server itself) Allowed operations SAFE EXTERNAL_ACCESS UNSAFE Code access security permissions Execute only Execute + access to external resources Unrestricted Programming model restrictions Yes Yes No Verifiability requirement Yes Yes No Local data access Yes Yes Yes Ability to call native code No No Yes Programming model restrictions • A few of the programming model restrictions for SAFE and EXTERNAL_ACCESS assemblies are as follows: – – – – – – No static fields may be used to store information PEVerify type safety checking test is passed No synchronization may be used Finalizer methods may not be used No threading may be used No self-affecting code may be used • There are more requirements that must be met under the above two permission sets – the above is just some of the more common restrictions. Visual Studio Database Projects • Visual Studio offers us the ability to create a ‘CLR Database Project’ which enables simplified deployment and debugging • Some issues present, though these are mostly solved in VS2010 • Database projects handle the deployment of the objects that we create simply, which can be really helpful during implementation Performance Example Phil Factor Speed Phreak Competition 4 The Log Parsing Problem • Producing aggregated results from an IIS log file • Made difficult by the need to determine ‘sessions’ – i.e. groups of rows from the same IP within a set time-span • Requirement to produce three aggregated result sets – visitor summary by day, visitor summary and page summary by day How did the CLR help? • CLR methods can yield excellent results when doing forward-only navigation through result sets, performing aggregation and calculation along the way • Iteration over the log allowed the data to be parsed into ‘visits’ by applying the session timeout • Iteration over collection of visits allowed the data to be aggregated into country and page summaries What was the result? • Execution in 301ms, as opposed to the closest T-SQL approach which took 1779ms. • Efficiency gains through the fact that the CLR procedure would only occupy 1 thread • Maintenance benefits in readability and simplicity Maintenance Example • CLR Procedure to retrieve the free disk space on any specified server – by Tara Kizer • Uses System.Diagnostics namespace to interface with WMI to retrieve results • Requires UNSAFE assembly permission set because PerformanceCounterCategory is synchronised • Simple use of framework classes to retrieve the information required When should we use the CLR? • To improve performance: – In situations where a forward-only scan over a result set requires row by row processing to produce an aggregated result set – In situations where complex logic cannot simply be expressed in T-SQL • To improve maintenance: – In situations where information required can be simply obtained from any .NET process – In situations where interaction with the O/S is required (will require elevated permission set) • For functionality: – In situations where commonly available .NET Framework classes provide desired functionality (classic example is Regular Expression handling – To bring business logic from the middle tier into the database And when should we not? • When the functionality presented offers little over it’s T-SQL equivalent • Instead of regular DML statements (i.e. a CLR procedure to insert a row is not the way to go!) • To replicate row by row update logic from a cursor – conversion to set-based T-SQL yields far higher performance • When the desired operation really does not belong in the database • When the database has to be portable to other types of database server Summary • We learned: – How to implement each type of CLR object – How to get data out of and back into SQL Server – That the only SQL CLR object type which differs significantly from it’s T-SQL equivalent is the CLR User Defined Type – That the CLR can offer significant performance improvements, particularly for forward-only scans and scalar functions – That we are restricted as to what we can achieve in a CLR method based on the permission set of the assembly