INTRODUCTION TO DATA QUALITY SERVICES
Presentation by Tim Mitchell (Artis Consulting)
www.TimMitchell.net

Today’s Agenda
- Overview of DQS
- Structure
  - Knowledge Base
  - DQS Project
- Operations
  - Matching
  - Cleansing
- Administration
- SSIS Component
- Shortcomings

About the Presenter
- Tim Mitchell
- BI Consultant, Artis Consulting
- North Texas SQL Server User Group
- SQL Server MVP
- Contributing author, MVP Deep Dives Vol 2
- Coauthor, SSIS Design Patterns
- TimMitchell.net | twitter.com/Tim_Mitchell

Housekeeping
- Questions
- Surveys

Overview of Data Quality Services

What is DQS?
- DQS is a knowledge-driven data cleansing and matching service
- Built on top of SQL Server 2012
- Simple yet powerful interface
- Replaces manual data quality work you’re already doing
  - Stored procedures
  - Triggers
  - Custom applications

DQS Structure

DQS Structure and Flow
[Diagram: Knowledge Base (Domains, Composite Domains, Matching Policies) feeding multiple Cleansing Projects and Matching Projects]

Knowledge Base
- Starting point for data quality provisioning
- Uses locally customized data stores or marketplace data sources
- Highly reusable and evolutionary
- Key elements:
  - Domains
  - Matching policies
- Created by:
  - Knowledge discovery
  - Domain management
  - Matching rules

Domains
- Domain = data field
- Domain rules
- Composite domains
  - Allow greater flexibility in domain rules

Data Quality Project
- Create interactive projects for data matching and cleansing
- Leverage one or more domains in an existing knowledge base
- Somewhat reusable
- Nondestructive: no changes to the source data being cleansed
- No changes to the knowledge base either
- Separately, DQS project data can be used to improve the knowledge base

DQS Operations
- Cleansing
  - Process data against known entities and domain rules
  - Similar to the Fuzzy Lookup transform in SSIS
- Matching
  - Group data together
  - Similar to the Fuzzy Grouping transform in SSIS

DQS Administration
- Monitor past activity
- Set logging options
- Set confidence thresholds

DQS and SSIS
- SQL Server Integration Services has an integrated hook into DQS
- DQS Cleansing Component
  - Provides automated, noninteractive data cleansing operations

Demos

Shortcomings
- V1 product
- No API: must use the DQS client interactively
- SSIS component only does cleansing

Final Thoughts
- CU1 performance improvements: http://bit.ly/IKmMow
- DQS videos / blogs: http://technet.microsoft.com/en-us/sqlserver/hh780961
- My blog: www.TimMitchell.net
- DQS/MDS virtual chapter: masterdata.sqlpass.org

Questions?
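
Supplementary sketch: cleansing vs. matching

The DQS Operations slide compares cleansing to a fuzzy lookup against known values and matching to fuzzy grouping of similar records. The Python sketch below illustrates only that distinction. It is not DQS and does not call any DQS interface (the deck notes there is no API); the city list, the function names `cleanse` and `match`, and the 0.8 threshold are all hypothetical stand-ins for a knowledge base domain, the two operations, and a confidence threshold.

```python
from difflib import SequenceMatcher, get_close_matches

# Hypothetical domain of known-good values, standing in for a knowledge base.
KNOWN_CITIES = ["Dallas", "Fort Worth", "Houston", "Austin"]


def cleanse(value, known=KNOWN_CITIES, threshold=0.8):
    """Cleansing: map a value to the closest known value, if one is close enough.

    Returns the input unchanged when nothing in the domain meets the threshold.
    """
    hits = get_close_matches(value, known, n=1, cutoff=threshold)
    return hits[0] if hits else value


def match(records, threshold=0.8):
    """Matching: group records that are similar to each other.

    Each record joins the first group whose leading member it resembles
    (case-insensitive similarity ratio >= threshold); otherwise it starts
    a new group.
    """
    groups = []
    for rec in records:
        for group in groups:
            ratio = SequenceMatcher(None, rec.lower(), group[0].lower()).ratio()
            if ratio >= threshold:
                group.append(rec)
                break
        else:
            groups.append([rec])
    return groups


if __name__ == "__main__":
    print(cleanse("Dalas"))
    print(cleanse("Seattle"))
    print(match(["Tim Mitchell", "Tim Mitchel", "Jane Doe"]))
```

Cleansing needs a reference set to correct against, while matching compares the incoming records only with one another. A real knowledge base would also apply domain rules, which this sketch leaves out.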