INTRODUCTION TO
DATA QUALITY SERVICES
Presentation by Tim Mitchell (Artis Consulting)
www.TimMitchell.net
Today’s Agenda
Overview of DQS
Structure
Knowledge Base
DQS Project
Operations
Matching
Cleansing
Administration
SSIS Component
Shortcomings
About the Presenter
Tim Mitchell
BI Consultant, Artis Consulting
North Texas SQL Server User Group
SQL Server MVP
Contributing author, MVP Deep Dives Vol 2
Coauthor, SSIS Design Patterns
TimMitchell.net | twitter.com/Tim_Mitchell
Housekeeping
Questions
Surveys
Overview of Data Quality Services
What is DQS?
DQS is a knowledge-driven data
cleansing and matching service
Built on top of SQL Server 2012
Simple yet powerful interface
What is DQS?
Replaces manual data quality work
you’re already doing
Stored procedures
Triggers
Custom applications
DQS Structure
DQS Structure and Flow
Knowledge Base: Domains, Composite Domains, Matching Policies
Data Quality Projects (several per knowledge base): Cleansing Projects, Matching Projects
Knowledge Base
Starting point for data quality provisioning
Uses locally customized data stores or
marketplace data sources
Highly reusable and evolutionary
Key elements:
Domains
Matching policies
Knowledge Base
Created by:
Knowledge discovery
Domain management
Matching policy
Domains
Domain = data field
Domain rules
Composite domains
Allow greater flexibility in domain rules
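To make the domain and composite domain idea concrete, here is a minimal sketch in plain Python. It is not the DQS object model: the `Domain` and `CompositeDomain` classes, the field names, and the Dallas/TX rule are all hypothetical, chosen only to show how a cross-field rule adds flexibility beyond single-field rules.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Domain:
    """Hypothetical model: one data field plus the rules its values must pass."""
    name: str
    rules: List[Callable[[str], bool]] = field(default_factory=list)

    def is_valid(self, value: str) -> bool:
        return all(rule(value) for rule in self.rules)


@dataclass
class CompositeDomain:
    """Hypothetical model: several domains plus rules that span their fields."""
    name: str
    domains: List[Domain]
    cross_rules: List[Callable[[Dict[str, str]], bool]] = field(default_factory=list)

    def is_valid(self, record: Dict[str, str]) -> bool:
        # Every single-field domain must pass first, then every cross-field rule.
        fields_ok = all(d.is_valid(record[d.name]) for d in self.domains)
        return fields_ok and all(rule(record) for rule in self.cross_rules)


# Single-field domain rules (illustrative only).
city = Domain("City", [lambda v: bool(v.strip())])
state = Domain("State", [lambda v: len(v) == 2 and v.isupper()])

# Cross-field rule (illustrative only): a record for Dallas must carry state TX.
address = CompositeDomain(
    "Address",
    [city, state],
    [lambda r: r["City"] != "Dallas" or r["State"] == "TX"],
)

print(address.is_valid({"City": "Dallas", "State": "TX"}))
print(address.is_valid({"City": "Dallas", "State": "OK"}))
```

The second record passes both single-field rules yet fails as a whole, which is the kind of check a single domain cannot express on its own.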
Data Quality Project
Create interactive projects for data matching and
cleansing
Leverage one or more domains in an existing
knowledge base
Somewhat reusable
Data Quality Project
Nondestructive – no changes to source of data
to be cleansed
No changes to the KB either
Separately, DQS project data can be used to improve
the knowledge base
DQS Operations
Cleansing
Process data against known entities and domain rules
Similar to Fuzzy Lookup transform in SSIS
Matching
Group data together
Similar to Fuzzy Grouping transform in SSIS
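The difference between the two operations can be sketched with Python's standard `difflib` module. This is only an illustration of the general idea, not how DQS or the SSIS fuzzy transforms are implemented; the `KNOWN_CITIES` list, the function names, and the similarity cutoffs are assumptions made for the example.

```python
from difflib import SequenceMatcher, get_close_matches

# Stand-in for known domain values in a knowledge base (assumed data).
KNOWN_CITIES = ["Dallas", "Houston", "Austin"]


def cleanse(value, known=KNOWN_CITIES, cutoff=0.8):
    """Cleansing: map one incoming value onto the closest known value, if any."""
    matches = get_close_matches(value, known, n=1, cutoff=cutoff)
    return matches[0] if matches else None


def match(values, threshold=0.85):
    """Matching: group incoming values that look like the same entity."""
    clusters = []
    for v in values:
        for cluster in clusters:
            # Compare against the first member of each existing cluster.
            score = SequenceMatcher(None, v.lower(), cluster[0].lower()).ratio()
            if score >= threshold:
                cluster.append(v)
                break
        else:
            clusters.append([v])
    return clusters


print(cleanse("Dalas"))
print(cleanse("Chicago"))
print(match(["Tim Mitchell", "Tim Mitchel", "Jane Doe"]))
```

Cleansing compares each value against reference knowledge, while matching compares the incoming values against each other, which is why the two are run as separate project types.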
DQS Administration
Monitor past activity
Set logging options
Set confidence thresholds
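As a rough illustration of what a confidence threshold does, the sketch below sorts a proposed correction into one of three outcomes based on its score. The function name, the outcome labels, and the 0.8 / 0.7 values are assumptions for the example, not settings read from a DQS installation.

```python
def classify(score, auto_correct_min=0.8, suggest_min=0.7):
    """Decide what to do with a proposed correction given its confidence score."""
    if score >= auto_correct_min:
        return "corrected"   # confident enough to apply automatically
    if score >= suggest_min:
        return "suggested"   # shown to a data steward for review
    return "unchanged"       # too uncertain to propose


for s in (0.95, 0.75, 0.40):
    print(s, classify(s))
```

Raising the thresholds trades fewer automatic corrections for more manual review, and lowering them does the reverse.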
DQS and SSIS
SQL Server Integration Services has an integrated
hook into DQS
DQS Cleansing Component
Provides automated, noninteractive data
cleansing operations
Demos
Shortcomings
V1 product
No API – must use DQS client interactively
SSIS component only does cleansing
Final Thoughts
CU1 performance improvements
http://bit.ly/IKmMow
DQS videos / blogs
http://technet.microsoft.com/en-us/sqlserver/hh780961
My blog (www.TimMitchell.net)
DQS/MDS virtual chapter
masterdata.sqlpass.org
Questions?