Slide Deck

SQL Bits 2009 - Manchester
Instrumenting, Monitoring and Auditing of SSIS ETL Solutions
Davide Mauri
dmauri@solidq.com
EXEC sp_help ‘Davide Mauri’
• MCDBA, MCAD, MCT
• Microsoft SQL Server MVP
• Has worked with SQL Server since version 6.5
• Has worked on BI since 2003
• President of UGISS (Italian SQL Server UG)
• Mentor @ Solid Quality Mentors
– Italian Subsidiary
Agenda
• ETL Story
• Logging SSIS in MS way
• Workarounds
• Logging SSIS in MY way
– The developer’s corner
• Adding value to log data
© 2007 Solid Quality Mentors
ETL Story
ETL Story
• ETL processes grow in complexity
• Since packages won’t be run from BIDS in production, you need something that helps you understand
– What went wrong when a package didn’t work as expected
• And maybe this happens only at nighttime…
– How your package performs, so you can forecast its ability to stay within a given timeframe
Logging SSIS in MS way
Nice things
• Flexibility
– You can decide what to log and where to log it
– You have a lot of ready-to-use log providers
– It is available out of the box
• Well, you have to remember to activate it
Logging SSIS in MS way
Not-so-nice things
• Logging needs to be set up within the package
– If you need to change logging, you need to edit the package in VS
• Too little information is given
– No variable values
– No expression results
– No information on the Data Flow
– Chains of packages (>=2) are not handled very well
• Problems using Parent Package Variables to propagate logging configuration (eg: log file path)
• You just lose information when you have more than 2 packages in the chain
Logging SSIS in MS way
Things that don’t work as expected 
• DTExec seems to let you control logging at runtime
– Unfortunately, you need to have a properly configured connection manager in the package in advance
Logging SSIS in MS way
“Improved” things from 2005 to 2008
• Some new features were added with SQL 2008
– SQLDumper
• Too much detailed information on one hand and, again, too little on the other
Logging SSIS in MS way
Conclusions
• Doesn’t really help you understand what went wrong
– Too little information is given
• Hey, I’d also like to log the Data Flow! Do I really have to do everything by hand?
– This can take a lot of time!
• I want to change what gets logged. How can I do it without having to open the package in BIDS and release-test-deploy it again?
– You can’t!
Logging SSIS in MS way
Workarounds
• Use a specific task (Script, Custom or Execute SQL) before and after each task you want to instrument
• Create an event handler for each event you want to log (eg: OnPreExecute, OnPostExecute)
– Better still, use a tool to create SSIS templates and standardize them
• Like MDDE (Metadata-driven ETL)
– http://www.codeplex.com/SQLServerMDDEStudio/
Logging SSIS in MS way
DEMO
1. The usual way
Logging SSIS in MY way
Learn from BIDS
• Basically I’d like to have all the information that BIDS gives you, but outside BIDS.
• Now, if BIDS can, WE can
– No magic here, we just need to know the APIs!
• Just a little bit complex…but we’ll simplify things here
• The key is the Execute method of the Package class
– In particular, the overload that takes an IDTSEvents interface parameter
• Whose documentation is not very rich
Logging SSIS in MY way
Developer’s corner
• IDTSEvents is implemented by the base class DefaultEvents
• We have to create a custom event handler class deriving from DefaultEvents and then override all default event handlers
• Pass an instance of the newly created class as a parameter to the Execute method of the Package object
– Now all events will be intercepted by our event handler!
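The pattern on this slide can be sketched without the SSIS assemblies. The Python below is only an analogy, not the SSIS API: `DefaultEventsSketch`, `LoggingEvents` and `run_package` are hypothetical names. A base class supplies do-nothing handlers, a derived class overrides them, and the runner raises events on whatever listener instance it is handed, which is roughly the role DefaultEvents, the derived class and Package.Execute play in the real thing.

```python
class DefaultEventsSketch:
    """Stand-in for a base class whose handlers do nothing by default."""

    def on_pre_execute(self, source):
        pass

    def on_post_execute(self, source):
        pass

    def on_error(self, source, description):
        pass


class LoggingEvents(DefaultEventsSketch):
    """Overrides every handler and records what it sees."""

    def __init__(self):
        self.records = []

    def on_pre_execute(self, source):
        self.records.append(("PreExecute", source, ""))

    def on_post_execute(self, source):
        self.records.append(("PostExecute", source, ""))

    def on_error(self, source, description):
        self.records.append(("Error", source, description))


def run_package(task_names, events):
    """Hypothetical runner: raises events on the listener it is given."""
    for name in task_names:
        events.on_pre_execute(name)
        if name.startswith("bad"):  # simulate a failing task
            events.on_error(name, "task failed")
        events.on_post_execute(name)


listener = LoggingEvents()
run_package(["load_customers", "bad_lookup"], listener)
for record in listener.records:
    print(record)
```

The runner never needs to know that logging is happening; swapping the listener changes what is captured without touching the package itself.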
Logging SSIS in MY way
Developer’s corner
• The event handler methods can call a custom method to log data
– Beware! The SSIS runtime makes heavy use of threads
– We have to deal with the fact that our class is used by different threads at the same time.
• We have to be sure that race conditions cannot occur
• We have to be fast to avoid hurting performance too much
– Log the minimum for all events except errors
– Log everything we can for errors
• They should never happen
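A minimal sketch of these two constraints, written in Python rather than against the SSIS API. The `EventLog` class and the event names are assumptions made for illustration. A lock guards the shared list so concurrent handlers cannot race, the line is formatted before taking the lock to keep the critical section short, and extra detail is written only for errors.

```python
import threading


class EventLog:
    """Collects log lines from many threads without interleaving writes."""

    def __init__(self):
        self._lock = threading.Lock()
        self._lines = []

    def log(self, event, source, details=None):
        # Format outside the lock so the critical section stays short.
        if event == "Error" and details:
            extra = " " + " ".join(f"{k}={v}" for k, v in sorted(details.items()))
        else:
            extra = ""  # non-error events log the minimum
        line = f"{event} {source}{extra}"
        with self._lock:
            self._lines.append(line)

    def lines(self):
        with self._lock:
            return list(self._lines)


def worker(log, name):
    # Each simulated task raises 100 ordinary events and one error.
    for i in range(100):
        log.log("PostExecute", f"{name}-{i}", {"ignored": "for non-errors"})
    log.log("Error", name, {"code": 42, "variable_x": "abc"})


log = EventLog()
threads = [threading.Thread(target=worker, args=(log, f"task{n}")) for n in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(log.lines()))
```

If even a short lock proves too costly, the same interface could hand lines to a queue drained by a single writer thread.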
Logging SSIS in MY way
Developer’s corner
• All containers raise events
• Inside each event handler method we can access all the runtime information for that container
– Variables
– Connections
– Configurations
– Properties
• And their expressions
Logging SSIS in MY way
Developer’s corner
• Variables: use the Variables collection available in each container
• Connections: use the Connections collection (of ConnectionManager objects) available in the Package class
• Configurations: use the Configurations collection in the Package class
– The EnableConfigurations property also tells you whether a Package will try to look for “default” configurations
Logging SSIS in MY way
Developer’s corner
• Extracting properties is a bit tricky…
– First we have to ask the container for its properties through the Properties collection of the IDTSPropertiesProvider interface
– For each property we have to call GetValue on the property, passing the object the property comes from as a parameter (!!!)
Logging SSIS in MY way
Developer’s corner
• Now, for Control Flow, we’re done. What about
Data Flows?
• No specific native logging infrastructure...but BIDS is able to show us how many rows flow between components
– …so this information is available somewhere!
• The Data Flow is able to generate events through the FireCustomEvent method
Logging SSIS in MY way
Developer’s corner
• Custom events are described by the EventInfo class
– Every container has an EventInfos property (a collection of EventInfo)
• The key event here is the “OnPipelineRowsSent” data flow custom event
– Here we have an array of objects that contains interesting things
• For this event the array contains 8 entries
Logging SSIS in MY way
Developer’s corner
• OnPipelineRowsSent payload
– Source Object (eg: System.__ComObject)
– DataFlow Object ID (eg: 140)
– DataFlow Object Name (eg: OLE DB Source Output)
– Object ID (eg: 134)
– Object Name (eg: TransformationName)
– Input Object Id (eg: 135)
– Input Object Name (eg: Derived Column Input)
– Row Count (eg: 744)
• Not documented in EventInfo
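Because the payload is a positional array, it helps to give each slot a name as soon as the event arrives. The sketch below does that in Python. The `RowsSent` type, its field names and `parse_rows_sent` are hypothetical, and the order simply follows the list on this slide.

```python
from typing import Any, NamedTuple


class RowsSent(NamedTuple):
    """Named view over the 8-entry payload, in the order listed on the slide."""
    source_object: Any
    dataflow_object_id: int
    dataflow_object_name: str
    object_id: int
    object_name: str
    input_object_id: int
    input_object_name: str
    row_count: int


def parse_rows_sent(arguments):
    # Fail loudly if the layout differs from the 8 entries described above.
    if len(arguments) != 8:
        raise ValueError(f"expected 8 entries, got {len(arguments)}")
    return RowsSent(*arguments)


# Sample values taken from the examples on the slide.
sample = [object(), 140, "OLE DB Source Output", 134,
          "TransformationName", 135, "Derived Column Input", 744]
event = parse_rows_sent(sample)
print(event.dataflow_object_name, "->", event.input_object_name, event.row_count)
```

The length check matters because the layout is undocumented, so a silent shift in positions would otherwise corrupt the log.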
Logging SSIS in MY way
Developer’s corner
• So, by filtering on Custom Events we’re able to profile the entire Data Flow!
– On a per-buffer basis
• We can also count how many times a Data Flow has been invoked when placed inside a For Loop or Foreach Loop container
– Combined with the variable values, this tells us the impact of each iteration
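As a rough illustration of what this profiling amounts to, the Python below totals per-buffer row counts for each path and counts how many times a task started. The event tuples are made-up sample data, and using OnPreExecute to count invocations is an assumption of this sketch, not something the slides specify.

```python
from collections import Counter, defaultdict

# Hypothetical captured events: (event name, source name, row count or None).
events = [
    ("OnPreExecute", "Load Orders", None),
    ("OnPipelineRowsSent", "OLE DB Source Output", 744),
    ("OnPipelineRowsSent", "OLE DB Source Output", 256),
    ("OnPostExecute", "Load Orders", None),
    ("OnPreExecute", "Load Orders", None),
    ("OnPipelineRowsSent", "OLE DB Source Output", 100),
    ("OnPostExecute", "Load Orders", None),
]

rows_per_path = defaultdict(int)  # total rows seen on each path
buffers_per_path = Counter()      # how many buffers carried those rows
executions = Counter()            # how many times each task started

for name, source, rows in events:
    if name == "OnPipelineRowsSent":
        rows_per_path[source] += rows
        buffers_per_path[source] += 1
    elif name == "OnPreExecute":
        executions[source] += 1

print(dict(rows_per_path), dict(buffers_per_path), dict(executions))
```

Storing the variable values captured at each start event next to these counters is what lets an expensive iteration be traced back to its inputs.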
Logging SSIS in MY way
DEMO
2. Show me the code!
Logging SSIS in MY way
DTLoggedExec
• The result is DTLoggedExec
– Current version 0.2.1.5 beta
• Logs everything needed
– Package version
– Variable values
– Property expressions
– Data Flow profile
Logging SSIS in MY way
DTLoggedExec
• Additional Features
– Handles long package chains correctly
– Supports the majority of DTExec options
– Pluggable architecture
• Easy to create custom Log Providers
• In the future it will also be possible to add custom Data Flow profilers
• Supported platforms
– All platforms and architectures are supported
• 2005, 2008
• x86, x64, IA64
Logging SSIS in MY way
DEMO
3. Test it!
Logging SSIS in MY way
DTLoggedExec DB
• Profiled data from Data Flow packages can be huge…better to put it into a database
• DTLoggedExec comes with a full set of scripts and batch files to create a dedicated database and bulk load the data
– Currently, only data profiled from Data Flows can be imported
– In the near future, data from the CSV log provider will also have its place here
• 99% done, testing in progress
Logging SSIS in MY way
DEMO
4. Load profiled data
Logging SSIS in MY way
Performance?
• Control Flow
– Performance is affected by the amount of logging you decide to have
• Data Flow
– Impact of performing Data Flow profiling: < 5%
• DTLoggedExec can be improved to have even less impact if needed
– Better buffering
Logging SSIS in MY way
Support
• DTLoggedExec is released under a Creative Commons license
– Anyone can contribute
• Official homepage
– http://dtloggedexec.davidemauri.it
– Wiki with documentation
• Download, source code, issues and forum
– http://dtloggedexec.codeplex.com/
Adding value
• “Native” Auditing
– When, by whom and how was a row imported into my DWH?
• Performance monitoring of a single package
– Or Data Flow
• Performance monitoring over time
• Easy to monitor discarded rows
– Very useful in a dashboard
• Monitor SLAs
Logging SSIS in MY way
DEMO
5. Adding value
DTLoggedExec
Questions & Answers
References
• DTLoggedExec
– http://dtloggedexec.davidemauri.it
• Jamie Thomson, “Custom Logging Using Event Handlers”
– http://blogs.conchango.com/jamiethomson/archive/2005/06/11/SSIS_3A00_-Custom-Logging-Using-EventHandlers.aspx
• Andy Leonard, “ETL Instrumentation”
– http://sqlblog.com/blogs/andy_leonard/archive/tags/ETL+Instrumentation/default.aspx
DTLoggedExec
Thanks!