SQL Bits 2009 - Manchester
Instrumenting, Monitoring and Auditing of SSIS ETL Solutions
Davide Mauri
dmauri@solidq.com

EXEC sp_help 'Davide Mauri'
• MCDBA, MCAD, MCT
• Microsoft SQL Server MVP
• Has worked with SQL Server since version 6.5
• Has worked on BI since 2003
• President of UGISS (Italian SQL Server UG)
• Mentor @ Solid Quality Mentors, Italian Subsidiary

Agenda
• ETL Story
• Logging SSIS in MS way
• Workarounds
• Logging SSIS in MY way
  – The developer’s corner
• Adding value to log data

© 2007 Solid Quality Mentors

ETL Story
• ETL processes grow in complexity
• Since packages won’t be run from BIDS in production, you need something that helps you to:
  – Understand what went wrong when a package didn’t work as expected
    • And maybe this happens only at nighttime…
  – Monitor the performance of your package to forecast its ability to stay within a given timeframe

Logging SSIS in MS way: Nice things
• Flexibility
  – You can decide what to log and where to log it
  – You have a lot of ready-to-use log providers
  – It is available out of the box
    • Well, you have to remember to activate it

Logging SSIS in MS way: Not-so-nice things
• Logging needs to be set up within the package
  – If you need to change logging, you need to edit the package in VS
• Too little information given
  – No variable values
  – No expression results
  – No information on Data Flow
  – Cannot handle chains of packages (>=2) very well
    • Problems using Parent Package Variables to propagate logging configuration (e.g. log file path)
    • You just lose information when you have more than 2 packages in the chain

Logging SSIS in MS way: Things that don’t work as expected
• DTExec seems to let you control logging at runtime
  – Unfortunately, you need to have a properly configured connection manager in advance

Logging SSIS in MS way: “Improved” things from 2005 to 2008
• Some new features added with SQL 2008
  – SQLDumper
    • Too much detailed information on one hand, and again too little on the other

Logging SSIS in MS way: Conclusions
• Doesn’t really help you understand what went wrong
  – Too little information given
• Hey, I’d like to log Data Flow too! Do I really have to do everything by hand?
  – This can take a lot of time!
• I want to change my logged data. How can I do it without having to open the package in BIDS and release-test-deploy it?
  – You can’t!

Logging SSIS in MS way: Workarounds
• Use a specific task (Script, Custom or Execute SQL) before and after each task you want to instrument
• Create an event handler for each event you want to log (e.g. PreExecute, PostExecute)
  – Better if you then use a tool to create SSIS templates and standardize them
    • Like MDDE (Metadata-driven ETL)
      – http://www.codeplex.com/SQLServerMDDEStudio/

Logging SSIS in MS way: DEMO 1. The usual way

Logging SSIS in MY way: Learn from BIDS
• Basically, I’d like to have all the information that BIDS gives you, but outside BIDS
• Now, if BIDS can, WE can
  – No magic here, you just need to know the APIs!
    • Just a little bit complex… but we’ll simplify things here
• The key is the Execute method of the Package class
  – In particular the overloads that take an IDTSEvents interface parameter
    • Whose documentation is not very rich

Logging SSIS in MY way: Developer’s corner
• IDTSEvents is implemented by the base class DefaultEvents
• We have to create a custom event handler class deriving from DefaultEvents and then override all default event handlers
• Use an instance of the newly created class as a parameter for the Execute method on the Package object
  – Now all events will be intercepted by our event handler!

Logging SSIS in MY way: Developer’s corner
• The event handler methods can call a custom method to log data
  – Beware! The SSIS runtime makes heavy use of threads
  – We have to deal with the fact that our class is used by different threads at the same time
    • We have to be sure that race conditions cannot occur
• We have to be fast to avoid impacting performance too much
  – Log the minimum for all events except errors
  – Log everything we can for errors
    • They should never happen

Logging SSIS in MY way: Developer’s corner
• All containers will raise events
• Inside each event handler method we can access all runtime information for that container
  – Variables
  – Connections
  – Configurations
  – Properties
    • And their expressions

Logging SSIS in MY way: Developer’s corner
• Variables: use the Variables collection available in each container
• Connections: use the ConnectionManagers collection available in the Package class
• Configurations: use the Configurations collection in the Package class
  – The EnableConfigurations property also tells you if a Package will try to look for “default” configurations

Logging SSIS in MY way: Developer’s corner
• Extracting properties is a bit tricky…
  – First we have to ask the container for its properties through the Properties collection of the IDTSPropertiesProvider interface
  – For each property we have to call GetValue on the property, passing the object the property comes from as a parameter (!!!)

Logging SSIS in MY way: Developer’s corner
• Now, for Control Flow, we’re done. What about Data Flows?
• No specific native logging infrastructure... but BIDS is able to show us how many rows flow between components
  – …so this information is available somewhere!
• DataFlow is able to generate events through the FireCustomEvent method

Logging SSIS in MY way: Developer’s corner
• Custom events are described by the EventInfo class
  – Every container has an EventInfos property (a collection of EventInfo)
• The key event here is the “OnPipelineRowsSent” data flow custom event
  – Here we have an array of objects that contains interesting things
    • For this event the array contains 8 entries

Logging SSIS in MY way: Developer’s corner
• OnPipelineRowsSent payload
  – Source Object (e.g. System.__ComObject)
  – DataFlow Object ID (e.g. 140)
  – DataFlow Object Name (e.g. OLE DB Source Output)
  – Object ID (e.g. 134)
  – Object Name (e.g. TransformationName)
  – Input Object ID (e.g. 135)
  – Input Object Name (e.g. Derived Column Input)
  – Row Count (e.g. 744)
• Not documented in EventInfo

Logging SSIS in MY way: Developer’s corner
• So, by filtering on custom events we’re able to profile the entire DataFlow!
  – On a per-buffer basis
• We can also count how many times a DataFlow has been invoked when placed into a For Loop or Foreach Loop container
  – Together with the knowledge of variable values, this gives us information on the impact of each iteration

Logging SSIS in MY way: DEMO 2. Show me the code!

Logging SSIS in MY way: DTLoggedExec
• The result is DTLoggedExec
  – Current version 0.2.1.5 beta
• Logs everything needed
  – Package version
  – Variable values
  – Properties’ expressions
  – Data Flow profiling

Logging SSIS in MY way: DTLoggedExec
• Additional features
  – Handles long package chains correctly
  – Supports the majority of DTExec options
  – Pluggable architecture
    • Easy to create custom Log Providers
    • In the future it will also be possible to add custom Data Flow Profilers
• Supported platforms
  – All platforms and architectures are supported
    • 2005, 2008
    • x86, x64, IA64

Logging SSIS in MY way: DEMO 3. Test it!
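
To make the Data Flow profiling idea concrete, here is a minimal sketch in Python (not the .NET code used by DTLoggedExec) of how the 8-entry OnPipelineRowsSent payload listed earlier could be aggregated into row counts per data flow path. It assumes the payload arrives as a sequence in exactly the order shown on the payload slide. The names RowsSentEvent, parse_rows_sent and PipelineProfiler are hypothetical, introduced only for illustration. A lock guards the counters because, as noted earlier, event handlers may be called from several threads at once.

```python
import threading
from collections import defaultdict
from typing import Dict, NamedTuple, Sequence, Tuple


class RowsSentEvent(NamedTuple):
    """One OnPipelineRowsSent payload; field order is assumed from the slide."""
    source_object: object
    dataflow_object_id: int
    dataflow_object_name: str
    object_id: int
    object_name: str
    input_object_id: int
    input_object_name: str
    row_count: int


def parse_rows_sent(payload: Sequence[object]) -> RowsSentEvent:
    """Turn the raw 8-entry array into a named record (hypothetical helper)."""
    if len(payload) != 8:
        raise ValueError(f"expected 8 entries, got {len(payload)}")
    src, df_id, df_name, obj_id, obj_name, in_id, in_name, rows = payload
    return RowsSentEvent(src, int(df_id), str(df_name), int(obj_id),
                         str(obj_name), int(in_id), str(in_name), int(rows))


class PipelineProfiler:
    """Accumulates rows and buffer counts per (output name, input name) pair."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._rows: Dict[Tuple[str, str], int] = defaultdict(int)
        self._buffers: Dict[Tuple[str, str], int] = defaultdict(int)

    def on_rows_sent(self, payload: Sequence[object]) -> None:
        # Parse outside the lock so the critical section stays short.
        event = parse_rows_sent(payload)
        key = (event.dataflow_object_name, event.input_object_name)
        with self._lock:
            self._rows[key] += event.row_count
            self._buffers[key] += 1  # one event corresponds to one buffer

    def summary(self) -> Dict[Tuple[str, str], Tuple[int, int]]:
        """Return {(output, input): (total rows, number of buffers)}."""
        with self._lock:
            return {k: (self._rows[k], self._buffers[k]) for k in self._rows}


if __name__ == "__main__":
    profiler = PipelineProfiler()
    # Example values taken from the payload slide.
    profiler.on_rows_sent(["System.__ComObject", 140, "OLE DB Source Output",
                           134, "TransformationName", 135,
                           "Derived Column Input", 744])
    for path, (rows, buffers) in profiler.summary().items():
        print(path, rows, buffers)
```

Keeping only two integers per path under the lock follows the "log the minimum" advice: the handler does very little work per buffer, and the detailed output can be written once the package has finished.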

Logging SSIS in MY way: DTLoggedExec DB
• Profiled data from DataFlow packages can be huge… better to put it into a database
• DTLoggedExec comes with a full set of scripts and batch files to create a specific database and bulk load data
  – Currently only data profiled from DataFlows can be imported
  – In the near future data from the CSV log provider will also have its place here
    • 99% done, testing in progress

Logging SSIS in MY way: DEMO 4. Load profiled data

Logging SSIS in MY way: Performance?
• Control Flow
  – Performance is affected by the amount of logging you decide to have
• Data Flow
  – Impact of performing data flow profiling: < 5%
    • DTLoggedExec can be improved to have even less impact if needed
      – Better buffering

Logging SSIS in MY way: Support
• DTLoggedExec is released under a Creative Commons license
  – Anyone can contribute
• Official homepage
  – http://dtloggedexec.davidemauri.it
  – Wiki with documentation
• Download, source code, issues and forum
  – http://dtloggedexec.codeplex.com/

Adding value
• “Native” auditing
  – When, by whom and how was a row imported into my DWH?
• Performance monitoring of a single package
  – Or Dataflow
• Performance monitoring over time
• Easy to monitor discarded rows
  – Very useful in dashboards
• Monitor SLA

Logging SSIS in MY way: DEMO 5. Adding value

DTLoggedExec
Questions & Answers

References
• DTLoggedExec
  – http://dtloggedexec.davidemauri.it
• Jamie Thomson, “Custom Logging Using Event Handlers”
  – http://blogs.conchango.com/jamiethomson/archive/2005/06/11/SSIS_3A00_-Custom-Logging-Using-EventHandlers.aspx
• Andy Leonard, “ETL Instrumentation”
  – http://sqlblog.com/blogs/andy_leonard/archive/tags/ETL+Instrumentation/default.aspx

DTLoggedExec
Thanks!