Real-World SSIS A Survival Guide Tim Mitchell What we’ll cover today • • • • Lessons I’ve learned the hard way Methodologies to solve real problems in SSIS Tools to help out Solutions for SQL 2012 as well as earlier versions • Demos Real World SSIS: A Survival Guide (Tim Mitchell) 2 What we won’t cover • No intro to SSIS • Books Online Real World SSIS: A Survival Guide (Tim Mitchell) 3 Housekeeping • Presentation materials • Lunch / breaks Real World SSIS: A Survival Guide (Tim Mitchell) 4 Housekeeping • • • • Let’s keep it informal Ask questions Ask questions Ask questions Real World SSIS: A Survival Guide (Tim Mitchell) 5 PSA: Community • • • • • Survival is easier in groups Local user groups Events (SQL Saturday, SQL Bits, PASS Summit) Online communities Twitter (#sqlhelp) Real World SSIS: A Survival Guide (Tim Mitchell) 6 About me • • • • Business intelligence consultant Group Principal, Linchpin People SQL Server MVP TimMitchell.net / @Tim_Mitchell Real World SSIS: A Survival Guide (Tim Mitchell) 7 Real World SSIS: A Survival Guide (Tim Mitchell) 8 Texas Dictionary • Whole mess: Bountiful amounts of something, usually referring to excess • More than one way to skin a cat: A petunfriendly phrase to indicate that there are usually multiple ways to solve the same problem • Ya’ll: A subgroup of the current group • All ya’ll: The whole of the current group Real World SSIS: A Survival Guide (Tim Mitchell) 9 survival (noun) The state or fact of continuing to live or exist, typically in spite of an accident, ordeal, or difficult circumstances. Reference: Dictionary.com (http://dictionary.reference.com/browse/survival) Real World SSIS: A Survival Guide (Tim Mitchell) 12 survival (noun) Survival is simply the state of existing. It’s just a small step above being dead. -- Me Photo credit: Elvis Ripley (http://www.flickr.com/photos/elvisripley/315439191/). Used under Creative Commons license. Real World SSIS: A Survival Guide (Tim Mitchell) 13 Elements of Survival The dangers: • The elements • Predators • Foolishness of fellow survivors • The unexpected Real World SSIS: A Survival Guide (Tim Mitchell) 14 Elements of Survival The dangers: • Dirty data • Complex or poorly defined ETL requirements • Unexpected metadata changes • Unstable sources/destinations • Project managers Real World SSIS: A Survival Guide (Tim Mitchell) 15 Elements of Survival Means of survival: • Common sense of self preservation • Tools • Leaning on others • Learning from others’ mistakes Real World SSIS: A Survival Guide (Tim Mitchell) 16 Elements of Survival Means of survival: • Best practices • Consistency • Document • Tools (buy/build) • Community Real World SSIS: A Survival Guide (Tim Mitchell) 17 Survival Tip #1: Plan to Fail Planning to Fail Real World SSIS: A Survival Guide (Tim Mitchell) 19 Planning to Fail Data failures: • Missing or offline sources • Changed metadata • Partial loads • Validation issues • Unexpected domain values Real World SSIS: A Survival Guide (Tim Mitchell) 20 X If it happens… Real World SSIS: A Survival Guide (Tim Mitchell) 21 Planning to Fail Planning for failure in the wild: • Build your shelter before it rains • Layers • Leaves • Bread crumbs Real World SSIS: A Survival Guide (Tim Mitchell) 22 Planning to Fail Planning for failure, the ETL way: • Be a pessimist! • Fail gracefully • Capture error/warning data on failure • Build for restartability (where appropriate) Real World SSIS: A Survival Guide (Tim Mitchell) 23 Failing Gracefully Real World SSIS: A Survival Guide (Tim Mitchell) 24 Planning to Fail Why graceful failure? • Avoid leaving affected systems in an inconsistent state • Avoid repeating wholesale operations • Timely notifications to allow proper response from dev/admin staff Real World SSIS: A Survival Guide (Tim Mitchell) 25 Planning to Fail Graceful failures in SSIS • Control flow: – Event handlers – Precedence constraints • Data flow: – Error row redirection – Lookup failure redirection – Conditional split Real World SSIS: A Survival Guide (Tim Mitchell) 26 Planning to Fail Graceful failures in SSIS • Restartability – SSIS Checkpoints – SSIS transactions – Both methods have shortcomings – Custom restartability can be an option Real World SSIS: A Survival Guide (Tim Mitchell) 27 Planning to Fail Natural failures • Simply stop processing on error • Default behavior • In some cases, can be the right pattern Real World SSIS: A Survival Guide (Tim Mitchell) 28 Demo Designing for failure Real World SSIS: A Survival Guide (Tim Mitchell) 29 Survival Tip #2: Take Notes Take Notes What to note? • Trails, paths, and shortcuts • Water sources • Hazards • Enemy positions • Weather and wildlife patterns • Sunrise/sunset time http://www.flickr.com/photos/defenceimages/4993131844 Real World SSIS: A Survival Guide (Tim Mitchell) 31 Take Notes What to note? • Success and failure of operations • Row counts • Run times • Validation information • Warnings http://www.flickr.com/photos/mezdeathhead/3057797092/ Real World SSIS: A Survival Guide (Tim Mitchell) 32 Take Notes Why? • Know what to expect • Plan for growth • Cover your assets http://www.flickr.com/photos/mezdeathhead/3057797092/ Real World SSIS: A Survival Guide (Tim Mitchell) 33 Take Notes It’s all about the log. • SSIS logging • SQL Server log • Custom logging http://www.flickr.com/photos/rwphoto/3108504086/ Real World SSIS: A Survival Guide (Tim Mitchell) 34 Take Notes SSIS Package Logging • It’s already there • Easy to start • Flexible events and destinations • Can be unwieldy http://www.flickr.com/photos/raver_mikey/4285637209/ Real World SSIS: A Survival Guide (Tim Mitchell) 35 Take Notes SSIS Catalog Logging • Version 2012 only • Easiest to configure – Design time or runtime • Least flexible http://www.flickr.com/photos/raver_mikey/4285637209/ Real World SSIS: A Survival Guide (Tim Mitchell) 36 Take Notes Custom Logging • Roll your own • Most difficult to set up • Infinitely flexible http://www.flickr.com/photos/raver_mikey/4285637209/ Real World SSIS: A Survival Guide (Tim Mitchell) 37 Take Notes Server/engine logging • SQL Engine error log • DMVs • Third party tools • Windows log • PerfMon http://www.flickr.com/photos/raver_mikey/4285637209/ Real World SSIS: A Survival Guide (Tim Mitchell) 38 Demo Take Notes Real World SSIS: A Survival Guide (Tim Mitchell) 39 Survival Tip #3: Perform at your best Perform at your Best http://www.flickr.com/photos/defenceimages/5786942640/ http://www.flickr.com/photos/defenceimages/4627096866/ Real World SSIS: A Survival Guide (Tim Mitchell) 41 Perform at your Best Soldier up! • Recognize and avoid quicksand • React appropriately when you’re stuck • Know your environment Real World SSIS: A Survival Guide (Tim Mitchell) 42 Perform at your Best Soldier up! • Isolate and eliminate the things that slow you down • Recognize design patterns that are detrimental to performance • Look *outside* SSIS (gasp!) Real World SSIS: A Survival Guide (Tim Mitchell) 43 Perform at your Best It’s not just SSIS • The majority of SSIS performance problems have nothing to do with SSIS • Limitations on sources and destinations Real World SSIS: A Survival Guide (Tim Mitchell) 44 Perform at your Best It’s not just SSIS • Don’t just ‘pass the buck’, but do consider other factors: – SQL engine configuration – Disk configuration – Network speed/latency – Physical machine capabilities Real World SSIS: A Survival Guide (Tim Mitchell) 45 Perform at your Best It’s not just SSIS • Proper query techniques for relational sources • Effective indexing for sources and destinations • Using OPTION (FAST <n>) Real World SSIS: A Survival Guide (Tim Mitchell) 46 Perform at your Best Streamline your data flows • Transformations matter! • Know how the blocking properties of transformations Real World SSIS: A Survival Guide (Tim Mitchell) 47 Perform at your Best Streamline your data flows • Nonblocking transforms do not hold buffers – Derived Column – Conditional Split – Row Count Real World SSIS: A Survival Guide (Tim Mitchell) 48 Perform at your Best Streamline your data flows • Partially blocking transforms will queue up buffers as needed – Merge Join – Lookup – Union All Real World SSIS: A Survival Guide (Tim Mitchell) 49 Perform at your Best Streamline your data flows • Fully blocking transforms will not pass any data through until all of the data has been buffered at that transformation – Sort – Aggregate Real World SSIS: A Survival Guide (Tim Mitchell) 50 Perform at your Best Streamline your data flows • Be aware of memory use! – LOB (large object) columns will always spool to disk rather than staying in memory. – [N]VARCHAR(MAX) – Memory buffers may spill over to disk Real World SSIS: A Survival Guide (Tim Mitchell) 51 Perform at your Best Streamline your data flows • Manage your sources – Don’t use table drop down list – specify your query including only the necessary columns – Be mindful of indexes when writing data retrieval queries Real World SSIS: A Survival Guide (Tim Mitchell) 52 Perform at your Best Streamline your data flows • Manage your destinations – Use FAST LOAD for SQL Server destinations – Index management (drop?) Real World SSIS: A Survival Guide (Tim Mitchell) 53 Perform at your Best Go Parallel! • Parallel operations can yield faster data flows Real World SSIS: A Survival Guide (Tim Mitchell) 54 Demo Parallel data flow Real World SSIS: A Survival Guide (Tim Mitchell) 55 Perform at your Best Streamline your data flows • Using lookups – Pay attention to lookup cache mode • Full cache • Partial cache • No cache Real World SSIS: A Survival Guide (Tim Mitchell) 56 Perform at your Best Streamline your data flows • Using lookups – Two-phase lookup strategy: • Commonly accessed data in full cache • Remaining data in a subsequent partial cache Real World SSIS: A Survival Guide (Tim Mitchell) 57 Perform at your Best Streamline your data flows • Using lookups – Cache connection manager • Allow reuse of lookup information across data flows Real World SSIS: A Survival Guide (Tim Mitchell) 58 Demo Lookups Real World SSIS: A Survival Guide (Tim Mitchell) 59 Survival Tip #4: Clean it up Clean it up The greatest danger is in the elements • Chances are that unsanitary conditions will kill you before a predator does – Infection – Spoiled food or water Real World SSIS: A Survival Guide (Tim Mitchell) 61 Clean it up In ETL, the greatest dangers often lie in the small things • Like an infection, bad data can fester for a while until it’s too late • Caught early, problems with dirty data are more easily solved Real World SSIS: A Survival Guide (Tim Mitchell) 62 What is dirty data? Types of dirty data: • Data type mismatches • Domain violations • Semantic violations • Technical errors • Simple inaccuracies Real World SSIS: A Survival Guide (Tim Mitchell) 63 What is dirty data? Data type mismatches • Non-numeric data in numeric fields • Decimal data in integer fields • Incorrect precision / rounding • Truncation Real World SSIS: A Survival Guide (Tim Mitchell) 64 What is dirty data? Domain violations • Invalid dates • Incorrect addresses Semantic violations • Data outside of a reasonable range (such as a person’s age in the thousands of years) • Inconsistent use of NULL, blanks, and zeroes Real World SSIS: A Survival Guide (Tim Mitchell) 65 What is dirty data? Technical errors • Improperly formatted dates • Out-of-alignment flat files • Too many/too few delimiters Real World SSIS: A Survival Guide (Tim Mitchell) 66 What is dirty data? Simple inaccuracies • Misspellings • Duplications • Improper formatting (email addresses, phone numbers) • Case Real World SSIS: A Survival Guide (Tim Mitchell) 67 What causes dirty data? Real World SSIS: A Survival Guide (Tim Mitchell) 68 What is dirty data? Causes of dirty data: • Internal: • Unvalidated user input • Lack of proper database constraints and/or application logic • External: • Import bad data from other systems • ETL errors Real World SSIS: A Survival Guide (Tim Mitchell) 69 Now What? Real World SSIS: A Survival Guide (Tim Mitchell) 70 Clean it up • Test your cleansing logic in stage/test/QA first • Cleanse directly in production • Don’t cleanse at all Real World SSIS: A Survival Guide (Tim Mitchell) 71 Clean it up What to do with unresolvable bad data? • Delete • Update to NULL or unknown member • Mark as suspect • Write to triage • Stop the ETL Real World SSIS: A Survival Guide (Tim Mitchell) 72 Data Cleansing in SSIS Real World SSIS: A Survival Guide (Tim Mitchell) 73 Data Cleansing in SSIS Tools of the trade • Native SSIS components • POTS (Plain Old Transact-SQL) • SQL Server DQS Real World SSIS: A Survival Guide (Tim Mitchell) 74 Data Cleansing in SSIS SSIS Native Components • A versatile approach with more transformation options • A much better choice when data cleansing operations involve multiple and/or non-SQL Server data sources • Extensible through custom code • Third party add-ons Real World SSIS: A Survival Guide (Tim Mitchell) 75 Data Cleansing in SSIS SSIS Native Components • Precision tools include Lookup Transformation, Merge Join • Flexible/inexact cleansing through Conditional Split, Derived Columns transformation, fuzzy tools Real World SSIS: A Survival Guide (Tim Mitchell) 76 Data Cleansing in SSIS Transact-SQL • Fast, simple, effective way to do some cleanup operations • Requires no additional software or configuration • Extensible through the use of UDFs or CLR functions Real World SSIS: A Survival Guide (Tim Mitchell) 77 Data Cleansing in SSIS Data Quality Services • A tool specifically designed for data cleansing • Has its own client interface, or can be used within SSIS for cleansing operations • Limited set of operations in SSIS Real World SSIS: A Survival Guide (Tim Mitchell) 78 Demo Data Cleansing Real World SSIS: A Survival Guide (Tim Mitchell) 79 Survival Tip #5: The Swiss Army Knife Swiss Army Knife When unexpected situations arise, an allpurpose tool can literally be a lifesaver. • Cut up small firewood • Can opener • Make a game trap Real World SSIS: A Survival Guide (Tim Mitchell) 81 Swiss Army Knife Scripting and coding tools • SSIS Expressions • Script task/script component • PowerShell Real World SSIS: A Survival Guide (Tim Mitchell) 82 Swiss Army Knife SSIS Expressions • Built into SSIS • Can be used in most any component or task • No extra moving parts required • Useful for declarative statements Real World SSIS: A Survival Guide (Tim Mitchell) 83 Swiss Army Knife Pros: • Easy to get started – just start expressing yourself • Ubiquity • Relatively easy to use Real World SSIS: A Survival Guide (Tim Mitchell) 84 Swiss Army Knife Cons: • Syntax is <polite> unique </polite> • Complex expressions are difficult • Troubleshooting Real World SSIS: A Survival Guide (Tim Mitchell) 85 Swiss Army Knife SSIS Scripting • .NET Framework • VB.NET or C# • Can use existing external assemblies Real World SSIS: A Survival Guide (Tim Mitchell) 86 Swiss Army Knife Pros: • Swiss Army knife of SSIS • Works great for operations where native SSIS tasks/components can’t easily accomplish goal • Does not require in-depth programming knowledge Real World SSIS: A Survival Guide (Tim Mitchell) 87 Swiss Army Knife Cons: • Does require some familiarity with programming or scripting • Not as simple as native components • Performance (sometimes) Real World SSIS: A Survival Guide (Tim Mitchell) 88 Swiss Army Knife Script Task • Used in the Control Flow • Variety of uses: • Interact with OS • Filesystem operations (archiving) • Manipulate SSIS variables • Call external programs Real World SSIS: A Survival Guide (Tim Mitchell) 89 Swiss Army Knife Script Component • Data Flow pane • Data flow/manipulation • Used for: • Data manipulation in the pipeline that can’t be accomplished otherwise • Advanced branching logic • Shred unconventional input files • Create custom output files Real World SSIS: A Survival Guide (Tim Mitchell) 90 Swiss Army Knife Script Component • Synchronous or asynchronous • Types • Source • Transformation • Destination Real World SSIS: A Survival Guide (Tim Mitchell) 91 Swiss Army Knife • • • • • • Semi-structured files Nonlinear files Multiple lines of text per output row Varying number of columns Dissimilar data types “Record Type” format Real World SSIS: A Survival Guide (Tim Mitchell) 92 Other Scripting Uses • Wait for a file or connection to be available • Set and enforce thresholds for maximum execution time • Custom logging • Custom notifications • Cross-package variable sharing • ????? Real World SSIS: A Survival Guide (Tim Mitchell) 93 Demo Expressions and scripting Real World SSIS: A Survival Guide (Tim Mitchell) 94 Survival Tip #6: Know what’s coming next Know what’s coming next Survivors keep an eye on what to expect in the days/months/years ahead • Weather forecasts • Changing of seasons • Wildlife patterns Real World SSIS: A Survival Guide (Tim Mitchell) 96 Know what’s coming next Know the technical/business landscape • New versions of software • Emerging design patterns Real World SSIS: A Survival Guide (Tim Mitchell) 97 What’s new in SSIS for SQL Server 2012 Real World SSIS: A Survival Guide (Tim Mitchell) 98 Logging Changes Back in the day… • Logging configured at the package level • Inconsistent • Difficult to add logging afterward Real World SSIS: A Survival Guide (Tim Mitchell) 99 Logging Changes … and now: • Logging is configured at the server level (SSIS catalog) • Can be added, changed, or removed at runtime Real World SSIS: A Survival Guide (Tim Mitchell) 100 Logging Changes … and now: • Logging levels: – – – – Basic Performance Verbose None • Native row count logging (and everyone said “Amen”) • Logs to table in SSISDB Real World SSIS: A Survival Guide (Tim Mitchell) 101 Logging Changes … and now: • Built-in reports – Included with SSMS – Detail and aggregate data Real World SSIS: A Survival Guide ETL Head-to-Head: T-SQL vs. SSIS (Tim Mitchell) 102 102 Undo/Redo When I was your age… • Package changes are immediate • Undo = close without saving Real World SSIS: A Survival Guide (Tim Mitchell) 103 Undo/Redo … and now • Full support of Undo and Redo in the designer Real World SSIS: A Survival Guide (Tim Mitchell) 104 Package Parameters Prior Versions: • Sharing of values between packages required the inheritance of parent package variables • Parent packages had no knowledge of expected variables in child packages Real World SSIS: A Survival Guide (Tim Mitchell) 105 Package Parameters Prior Versions: • There was no practical way to configure variables as required (other than failing the package) Real World SSIS: A Survival Guide (Tim Mitchell) 106 Package Parameters SQL Server 2012: • Package parameters! • Required or optional • Accessible through the Execute Package Task in parent packages Real World SSIS: A Survival Guide (Tim Mitchell) 107 Package Parameters Real World SSIS: A Survival Guide (Tim Mitchell) 108 DQS and SSIS Then: • Data quality routines were everywhere, but also completely manual • No standard means of implementation Real World SSIS: A Survival Guide (Tim Mitchell) 109 DQS and SSIS Now: • SSIS has a transformation to leverage DQS (also new) for data cleansing operations • Consumes reusable knowledge base data for reliable, consistent cleansing Real World SSIS: A Survival Guide (Tim Mitchell) 110 DQS and SSIS Real World SSIS: A Survival Guide (Tim Mitchell) 111 Flat File Improvements Old school: • Irregularly shaped flat files could not be natively processed in SSIS • Scripting was usually required to process Real World SSIS: A Survival Guide (Tim Mitchell) 112 Flat File Improvements New school: • New flat file connection allows native processing of files with missing columns Real World SSIS: A Survival Guide (Tim Mitchell) 113 Flat File Improvements Real World SSIS: A Survival Guide ETL Head-to-Head: T-SQL vs. SSIS (Tim Mitchell) 114 Shared Data Sources In days of yore: • “Shared” connections meant configuring a connection in each package, and using package configs for the connection string • Still requires setting up and maintaining connections at the package level Real World SSIS: A Survival Guide (Tim Mitchell) 115 Shared Data Sources Here and now: • Native shared connections allow SSIS projects to use connections common to the entire project • Package-level connections still supported Real World SSIS: A Survival Guide (Tim Mitchell) 116 Shared Data Sources Real World SSIS: A Survival Guide (Tim Mitchell) 117 Script Component Debugging Remember when: • MessageBox.Show() • MessageBox.Show() • MessageBox.Show() • MessageBox.Show() • MessageBox.Show() • MessageBox.Show() • MessageBox.Show() Real World SSIS: A Survival Guide (Tim Mitchell) 118 Script Component Debugging … and now: • Integrated debugging in the script component • Step through code line by line to find issues and test Real World SSIS: A Survival Guide (Tim Mitchell) 119 Demo Script Component Debugging Real World SSIS: A Survival Guide (Tim Mitchell) 120 Name-based metadata mapping Then: • Changing upstream components often causes runtime errors in downstream components • The longest 4-letter word in the English language: VS_NEEDSNEWMETADATA Real World SSIS: A Survival Guide (Tim Mitchell) 121 Name-based metadata mapping Now: • Metadata mapping is based on name • Easier to remap upstream components Real World SSIS: A Survival Guide (Tim Mitchell) 122 Demo Name-based metadata mapping Real World SSIS: A Survival Guide (Tim Mitchell) 123 CDC in SSIS The old: • CDC (Change Data Capture) was present in the DB engine, but required manual T-SQL coding to implement Real World SSIS: A Survival Guide (Tim Mitchell) 124 CDC in SSIS The new: • SSIS now has new task and components to handle CDC processing • CDC Task – metadata (start/end initial load, etc.) • CDC Source – retrieve CDC data • CDC Splitter – break apart results Real World SSIS: A Survival Guide (Tim Mitchell) 125 Environments Environment replace configurations • Collections of related values (ex: Production connection strings, Dev connection strings, etc.) • Multiple environments can be associated with each project or package • Specify for automated job, or easily choose at runtime Real World SSIS: A Survival Guide (Tim Mitchell) 126 10+. Designer Improvements Package annotations • In prior versions, annotations were difficult • SSIS 2012 improvements Real World SSIS: A Survival Guide (Tim Mitchell) 127 Designer Improvements Sort packages by name • Sometimes it’s the little things that matter Real World SSIS: A Survival Guide (Tim Mitchell) 128 Designer Improvements Simplified data viewer Real World SSIS: A Survival Guide (Tim Mitchell) 129 Designer Improvements Universal status indicators Real World SSIS: A Survival Guide (Tim Mitchell) 130 Designer Improvements Variable management • Scope default • Expression management • Static values vs. expression • Expression indicator Real World SSIS: A Survival Guide (Tim Mitchell) 131 Survival Tip #7: Have a bag of tricks Have a bag of tricks Be lazy! • Code once, reuse many • Create a portable system for reusing familiar patterns – Database? – Documentation? Real World SSIS: A Survival Guide (Tim Mitchell) 133 Have a bag of tricks Be lazy! • ETL Framework – Managed execution for multipackage ETL processes – Restartability, consolidated error handling and logging Real World SSIS: A Survival Guide (Tim Mitchell) 134 Have a bag of tricks Be lazy! • Custom SSIS components – Create custom components for commonly used design patterns – Parameterized script packages may substitute in SQL 2012 Real World SSIS: A Survival Guide (Tim Mitchell) 135 Have a bag of tricks Be lazy! • Third party tools – BIDS Helper – SSIS Reporting Pack – SQL Sentry Plan Explorer – Brent Ozar’s SQLBlitz Real World SSIS: A Survival Guide (Tim Mitchell) 136 Have a bag of tricks Be lazy! • Biml – Business Intelligence Markup Language – Package generation tool – Included with BIDS Helper (free) Real World SSIS: A Survival Guide (Tim Mitchell) 137 Demo Biml package generation Real World SSIS: A Survival Guide (Tim Mitchell) 138 Questions? Comments? Standing ovation? Real World SSIS: A Survival Guide (Tim Mitchell) 139 Thanks! TimMitchell.net @Tim_Mitchell tdmitch@gmail.com Real World SSIS: A Survival Guide (Tim Mitchell) 140