DAMA0210_Gransee - DAMA-MN

KNIGHTSBRIDGE
Practical Meta Data Solutions
For the Large Data Warehouse
DAMA - MN
October 16, 2002
PERFORMANCE that empowers
www.knightsbridge.com
© copyright 2001 Knightsbridge Solutions LLC
Agenda
• Introduction
• Enterprise meta data strategy
• Data warehousing meta data strategy
• Project approach for a practical solution
• Meta data architecture
• Defining ROI
• Tools/options for moving forward
• Meta data summary
• The data quality cycle
• Questions
Tom Gransee, Knightsbridge
Everyone knows
Meta data is:
– Valuable
– The right thing to do
– Important for long-term success
So why isn’t everyone doing it?
Introduction
Why isn’t meta data being addressed?
• Don’t know how to demonstrate the ROI
• Too complex or we don’t know where to begin
• Can’t agree on what should be done
• Market is not mature enough – we’ll wait until it settles down
• We’ve tried and failed
Where do we start?
How do we justify it?
What is the cost of dirty data?
• “The cost of poor data may be 10-25 percent of total revenues” - Larry English
• Data quality issues torpedoed a $38 million CRM project - Fleet Bank, 1996
• The Data Warehousing Institute (TDWI) estimates that poor quality customer data costs U.S. businesses a staggering $611 billion a year in postage, printing, and staff overhead
Real Life Insurance Example - $10 million annually
• 2 million claims per month with 377 data items each
• Error rate of .001 generates more than 754,000 errors per month and over 9.04 million annually
• If 10% are critical to fix, there are still close to 1 million errors to correct
• Even at a conservative estimate of $10 per error, the company’s risk exposure to poor claim information is about $10 million a year
Source: TDWI Data Quality and the Bottom Line
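The arithmetic behind the insurance example can be checked with a short sketch. It assumes the 2 million claims arrive per month, which is the only reading consistent with 754,000 errors per month; the variable names are illustrative and not from the source.

```python
# Worked arithmetic for the insurance example.
# Assumption: 2 million claims per MONTH (needed to reach 754,000 errors per month).
claims_per_month = 2_000_000
items_per_claim = 377
errors_per_thousand = 1      # error rate of .001
critical_percent = 10        # share of errors that are critical to fix
cost_per_error = 10          # conservative estimate, in dollars

# Integer arithmetic keeps the results exact.
errors_per_month = claims_per_month * items_per_claim * errors_per_thousand // 1000
errors_per_year = errors_per_month * 12
critical_errors_per_year = errors_per_year * critical_percent // 100
annual_exposure = critical_errors_per_year * cost_per_error

print(f"{errors_per_month:,} errors per month")
print(f"{errors_per_year:,} errors per year")
print(f"{critical_errors_per_year:,} critical errors per year")
print(f"${annual_exposure:,} annual exposure")
```

Under these assumptions the exposure comes to roughly $9 million a year, which the slide rounds up to $10 million.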
Understanding Data Quality Issues
Legacy applications
• Limited exposure to the data
• Greatly increased exposure to the data
• The data warehouse and Operational Data Stores have revealed the impact of data quality problems on the business
• Meta data is the component that ties the data warehouse and ODS to the legacy systems and exposes the data quality problems
Setting boundaries
You can’t do it all at once!
Enterprise meta data strategy
Establishing a meta data strategy
[Diagram: candidate areas for a meta data strategy]
• Data Warehouse and Business Intelligence
• Enterprise Architecture (EAI, ERM)
• Clinical Trials (CDISC)
• Clinical Patient Care (HL7)
• Content Management
• Component Management
• Document Management and Portals
• Business Rules
Select a practical starting point and build on your success!
Starting with the data warehouse
Data Quality
Systems
A practical strategy with real business benefits!
Data Warehousing meta data strategy
Why is the DW a good starting point?
• DW typically focuses on the data that most needs to be shared
• DW presents the greatest need to understand the data because it is cross-functional
• Real business benefit can be obtained for a practical investment
• Existing DWs are being re-architected
• Meta data standards and tools are beginning to have an impact in this area
• Challenges are created in a best-of-breed development environment
Built for today - architected for tomorrow
MD integration challenges in the DW architecture
Defining meta data for the DW
The formal approach to managing the processes and information needed by both business and technical associates to define, build, administer and navigate the DW
Building Blocks: Define
– Example: Business Meaning, Calculations, Lineage
– Benefit: Recognition, Understanding, Trust
– User: Casual User, Power User, New User
– Source: Heads and documents, Spreadsheets, ETL mappings
Building Blocks: Build and Administer
– Example: Usage, Key Attributes, Mappings
– Benefit: Performance, Integrity, Scalability
– User: Operator, Modeler, Designer
– Source: ETL Jobs Statistics, Data Model, ETL Tool
Building Blocks: Navigate
– Example: Alias, Canned Reports, Refresh Data
– Benefit: Location, Expedience, Accuracy
– User: New/Casual User, Executive, Frequent User
– Source: Data Model, Business Intelligence, Job Schedules / logs
Robust, integrated meta data solutions will aid in using, developing and operating the data warehouse
source: META Group
Meta data solutions are also referred to as: Information Catalogs or digital DNA
The need for a complete project lifecycle
• Document today’s meta data environment
  – Identify meta data users and its sources
  – Identify the business drivers
    • Problems, opportunities and associated costs
  – Identify requirements
    • How does meta data address the business drivers
    • What are the savings
  – Define objectives and benefits
• Develop a meta data architecture
  – Processes and disciplines
  – Integration requirements
  – Delivery and usage requirements
  – Technology component / tool
  – Change management process priorities
• Develop project plan and cost/benefits
• Build with an Iterative release approach
Building an Architecture
• Process Layer: Processes and disciplines required to generate and sustain complete and accurate meta data
• Integration Layer: Exchange of meta data between multiple tools across the data warehousing framework
• Technology Layer: Automate processes and integration and provide a single Web based view consolidated across tools
Project Approach for a practical solution
What are successful projects addressing?
1. Lineage
– What data is available, where it came from and how it’s transformed
– Including definition, currency and accuracy
2. Appropriate information by user type
– Easy access to meaningful meta data
– “How is it different from what I’m used to seeing?”
3. Impact analysis across tools and platforms
– Impossible to do without a formalized meta data technology solution
4. Versioning
– How has it changed over time?
– Moving from development, to test, to production
5. Live meta data
– Meta data is a natural part of the process
– A function fails if the meta data is not complete and accurate
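Lineage and impact analysis are two directions of the same question over source-to-target mappings. The sketch below is a minimal illustration, assuming the mappings have already been collected into one list; the element names and the functions `impact_of` and `lineage_of` are hypothetical and do not come from any repository tool.

```python
from collections import defaultdict, deque

# Hypothetical source-to-target mappings: (source element, target element).
MAPPINGS = [
    ("claims_file.claim_amt", "staging.claim_amount"),
    ("staging.claim_amount", "dw.fact_claim.paid_amount"),
    ("dw.fact_claim.paid_amount", "report.monthly_claims.total_paid"),
    ("policy_file.policy_no", "dw.dim_policy.policy_number"),
]

def build_graph(mappings, reverse=False):
    """Index the mappings by source (or by target when reverse=True)."""
    graph = defaultdict(list)
    for source, target in mappings:
        if reverse:
            graph[target].append(source)
        else:
            graph[source].append(target)
    return graph

def reachable(graph, start):
    """Breadth-first walk returning every element reachable from start."""
    seen, queue, order = {start}, deque([start]), []
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order

def impact_of(element):
    """Impact analysis: everything downstream of a changed element."""
    return reachable(build_graph(MAPPINGS), element)

def lineage_of(element):
    """Lineage: everything upstream that feeds an element."""
    return reachable(build_graph(MAPPINGS, reverse=True), element)

print(impact_of("claims_file.claim_amt"))
print(lineage_of("report.monthly_claims.total_paid"))
```

The walk only works across tools once the mappings from each tool sit in one place, which is the point item 3 makes about a formalized solution.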
Meta data architecture for the DW
[Diagram: meta data architecture, showing three stages built around a centralized meta data repository]
1. Collection
• Collection points: source systems, staging area / ODS, data warehouse, RDBMS, business requirements
• DW & ETL design: modeling, mappings, ETL
• Auditing: balancing and controls, data to support lineage
• Data cleansing: householding, data to support lineage
• Change management: source control, configuration management
• Data validation and data profiling
• Job, file, migration and systems execution management
• Processes and disciplines, live meta data concepts, system of record
2. Integration
• Centralized meta data repository: manage redundancy, provide one view across tools
• Meta data repository engine and object model
• Physical repository: RDBMS (Oracle, DB2, others)
3. Usage
• Web-based business and technical user interface
• Consolidated view across tools, including BI and data quality
Meta Data Architecture
Components of a well-architected DW solution
• Short- and long-term requirements are well-defined, displaying clear business benefits
• Meta tags required to support lineage, balancing and controls, etc., are built into the DW architecture
• Live meta data concepts are rigorously followed
• A plug-and-play architecture is used
– Support for multiple tools in a category, e.g., Informatica and Ab Initio for ETL
– Simplifies future transitions to new technology and tools
Built for today - architected for tomorrow
How is a meta data investment in the DW justified?
• Reduced total cost of ownership
– Impact analysis -- 50 percent of development efforts are spent assessing what is impacted by the change
– Configuration and migration management
– Eliminate redundant work
• Improved user acceptance
– What’s available and how to access it
– Business user understands the data and where it came from -- how is it different from the operational information systems?
• Risk avoidance
– What’s the impact of not delivering the business benefits used to justify the DW?
• Industry or government regulations
• Best practices
Defining ROI
The CWM standard as an enabler
1. The Common Warehouse Metamodel
2. Model driven architecture
• Based on object oriented modeling and development
• Building blocks: UML, MOF, XMI and OCL
• The model generates:
– Repository data structure changes
– APIs for models to interoperate
– APIs to load and retrieve meta data
– CORBA components
3. Meta data exchange format
• Based on the CWM and Standard DTDs
• XML data streams following the XMI standard
CWM is supported by:
• Oracle
• IBM
• SAS
• Adaptive
• Hyperion
Tools / Options for moving forward
Meta data management repositories
Selection Criteria
• Manage associations across tools
• Search and retrieval across tools – ability to display a single consolidated view
– Lineage
– Impact analysis
– Plain English definitions
– Subject areas
• Ease of Extensibility
• Customizable user interface using an industry-standard web solution
• Reduce integration cost in a best-of-breed development environment
• Enable Plug-and-play tool strategy
• Available automated bi-directional meta data exchange bridges / adapters
• Template driven retrieval of meta data
• Group / role based security
• Interoperability between metamodels
• Support of industry standards – CWM, HL7, etc.
• Support of Federated repositories
• Ability to expand beyond the DW
• Versioning – extracting a time slice
Types of Meta data management repositories
• Enterprise – Supports a broad range of functionality including enterprise architecture, data warehousing, business intelligence, component management and others
  – CA – Advantage (formerly Platinum)
  – ASG – Rochade
  – Adaptive Foundation (formerly Unisys)
• DW Suite solutions – meta data solutions that are integrated into a suite of tools, primarily from a single vendor, designed to build and maintain the complete data warehouse framework
  – Microsoft Repository
    • DWSoft – Navigator web browser
  – SAS – Warehouse Administrator
  – Oracle Warehouse Builder
    • OWB Repository
• Enterprise Data Warehousing and ETL – supports data warehousing, ETL and business intelligence activities, typically in a best-of-breed toolset environment
  – Ab Initio
  – Informatica
    • Data Advantage Group – MetaCenter
  – Ascential – Meta Stage
• Modeling – supports development and versioning activities for ER and object modeling
  – ERwin Suite
  – Rational Rose
  – Oracle Designer
  – Popkin System Architect
Options for moving forward with a DW solution
• Build from scratch using an RDBMS and custom web delivery application
• Implement a repository tool and extend it as needed
– Enterprise repository tool
– Warehouse suite solutions
– ETL tool repository
No complete solution exists today. Establish a foundation and gradually develop a complete solution through a series of iterative releases!
Development lifecycle
Strategy Development (4 - 6 weeks)
• Define business objectives, requirements and benefits
• Understand standards
• Research repository tools
• Define technical objectives/requirements
• Document capabilities
• Relate standards, tools, requirements, architecture
• Develop scope and priorities
Architecture Development (3 - 4 weeks)
• Define meta data sources
• Define repository
• Define hardware platform and software requirements
• Define meta data integration ETL process
• Define meta data delivery/display mechanisms
Design and Construction (6 - 10 months)
• Iterative release approach
• Design and construction of meta data integration ETL
• Design and construct logical and physical meta data models
• Hardware platform implementation
• Test
• Rollout
Pitfalls to avoid
• Selecting a repository tool without defining requirements first
• Expecting the repository tool to solve process problems
• Selecting an architecture that is not extensible or can’t scale
• Underestimating the effort
• Relying too much on manual entry
• Selecting an initial project that does not deliver adequate business benefit
• Selecting an initial project that is too complex to be practical
Many meta data projects fail because they are either too big to be practical or too small to deliver real benefits
Meta data summary
One example of a hybrid solution
[Diagram: hybrid meta data solution]
• Data modeling (ER/Win 3.5): business definitions for DW attributes; logical and physical DB models
• Code and data mapping (Access Basic, Oracle 8i): source-to-target mappings, code mappings, business rules, data quality, contacts and document definitions
• ETL (Ab Initio): field-to-field derivation, DML information
• Business intelligence (OLAP cubes on SQL Server 2000, Pro Clarity): cubes, hierarchy and levels
• Staging area and data warehouse (Oracle 8i): physical RDBMS info through OLE DB; codes frequency, min/max/averages
• Meta model: Microsoft Meta Data Services on SQL Server 2000
• Web based access: DWSoft Web Browser, XML and ASP
Fast, inexpensive, low-risk approach implemented at a major insurance company
The DW meta data solution within the enterprise meta data architecture
[Diagram: enterprise meta data architecture, with meta data and taxonomy flowing between the platforms below]
• Enterprise Architecture: EAI, ERM
• Data Warehouse Platform (Oracle): operational data store, customer information, historical information, security
• Document/Email Platforms: documents, catalogs/digital content, email
• Data sources: operational data, flat files, and unstructured data (catalogs, content, documents, web links; unstructured text and digital assets)
• ETL Data Management Platform: centralized meta model; complex data transformations; meta data repository; real-time extractions; XML, Java, HTML, LDAP
• Knowledge Management Engine: business rules; content management; automatic taxonomy generation; neural net search engine/adaptive learning; text mining; personalization; XML, Java, HTML, LDAP support
– KM Vendors: Autonomy, Intraspect, Documentum, BRS Rule Track
• MicroStrategy BI Platform: analytical processing, graphical visualization, reports
• EIP Engine: browser-based access; personalization; Common Authentication Proxy; automatic taxonomy generation; structured/unstructured info integration; XML, Java, HTML, LDAP support
– EIP Vendors: Viador, Hummingbird, TopTier
• Personalized Portal Access: laptop computer, PDA
Increasing the odds of success
A data warehouse project without a
formalized meta data facility has only a one
in four chance of being highly successful;
still, in the heat of the DW battle, rarely is
meta data seen on the front line!
Is it worth the risk not to do it?
Source: META Group’s industry study: Data Warehouse Scorecard: Cost of Ownership and Successes in Application of Data Warehouse Technology
The data quality cycle
1. Identify business drivers
2. Set scope
3. Define metrics
Cycle stages: Detect; Measure / Make visible; Correct / Repair; Prevent
Defect types:
• Correctness Defects
• Integrity Defects
• Presentation Defects
• Application Defects *
* How the user applies the data
Data quality cycle
Data quality rules progression
Data Quality Rules: content and structure
• Correctness (value based): focus on data content
• Integrity (structure based): focus on data structure
Rules governing the processes
• Inductive Rules: examine and understand data
– Data Profiling: column profiling, dependency profiling, redundancy profiling
• Deductive Rules: change the data
– Data Cleansing and Transformation Rules
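Column profiling, one of the profiling techniques named on this slide, examines the values of a single column to understand its content. A minimal sketch follows, assuming the column has been read into a Python list with `None` marking missing values; the function name `profile_column` and the sample data are illustrative.

```python
from collections import Counter

def profile_column(values):
    """Return simple column-profiling statistics for a list of values (None = missing)."""
    present = [v for v in values if v is not None]
    counts = Counter(present)
    return {
        "rows": len(values),
        "nulls": len(values) - len(present),
        "distinct": len(counts),
        "min": min(present) if present else None,
        "max": max(present) if present else None,
        "most_common": counts.most_common(1)[0] if counts else None,
    }

# Hypothetical sample column.
state_codes = ["MN", "MN", "WI", None, "IL", "MN"]
profile = profile_column(state_codes)
print(profile)
```

Statistics like these are the raw material for inductive rules: the observed value set and null rate suggest what a valid column should look like.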
The data quality cycle
Additional activities to deliver data quality
• Consolidate source systems to reduce collection points and minimize system interfaces
– Consolidate multiple non-integrated legacy applications
– Source independent data marts from the data warehouse
• Consolidate shared data
– Use an ODS to share data across systems
– Use reference tables and keys to logically integrate data that must remain distributed
• Implement a hub / ODS for data integration
– Provide a single source of clean data
– Reduce system interfaces
How meta data supports data quality
• The meta data solution manages many components needed to support a sustained data quality effort
– Navigation meta data
– Lineage
– Plain English Definitions
– Calculations and transformations
– Currency
– Identify owners and stewards
– Business rules
– Help identify redundancy
• The repository captures and exposes data quality statistics to a wide audience
• The repository web interface can provide a mechanism for soliciting feedback
What is the Value?
“Companies that manage their data as a
strategic resource and invest in its quality
are already pulling ahead in terms of
reputation and profitability from those that
fail to do so.”
Source: Global Data Management Survey 2001
PricewaterhouseCoopers
Questions
Meta data usage – development samples
Data lineage – from source or target
List the fields in a source file
Easily drill to:
• Target data
• Transformations
• Additional source data
Example 1a
Meta data usage – development samples
Data lineage – data quality review
Example 1b
Meta data usage – development samples
Data lineage – understanding the data
Atlas Source System
1,016,575 Monthly Billing Codes
57% - Atlas Billing Codes
24% - Total Data Warehouse
Cyberlife IL
437,296 Monthly Billing Codes
57% - Cyberlife IL Billing Codes
10% - Total Data Warehouse
Example 2
Meta data usage – development samples
Impact analysis
Display all the transformations
For a Column in the warehouse
or a field from a source system
Example 3
Meta data usage – development samples
Live meta data
[Diagram: live meta data flows]
• ETL design: DDL → RDBMS (tables, columns) → ETL
• Codes and data mapping application: DDL → RDBMS (code mappings) → ETL
• Suspense: invalid or missing code mappings
Example 4
Meta data usage – development samples
Integrated view of meta data from multiple sources
• Extracted from the repository
• Includes data from Oracle
• Includes data from ERwin
Example 5a
Meta data usage – development samples
Integrated view of meta data from multiple sources
Example 5b
When is meta data not meta data?
Column
Column level meta data should be incorporated into the centralized meta data solution for easy display to a wide audience
• Column definition
• % of records from each source system
• Counts and % of unique values for a code
Row (audit and meta tags)
Row level information should be captured, managed and displayed by the application, i.e., the data warehouse, data mart or other collection points
• Claim transaction
• Source system identifier
• Date updated
• Source system code value
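The column level statistics listed on this slide can be derived from the row level meta tags. The sketch below is an illustration only: it assumes each warehouse row carries a source system tag and a code value, uses the source system names from Example 2, and invents the billing code values and the helper `percent_by`.

```python
from collections import Counter

# Hypothetical warehouse rows: each carries a source-system meta tag and a billing code.
rows = [
    {"source_system": "Atlas", "billing_code": "A1"},
    {"source_system": "Atlas", "billing_code": "A2"},
    {"source_system": "Atlas", "billing_code": "A1"},
    {"source_system": "Cyberlife IL", "billing_code": "C7"},
]

def percent_by(rows, key):
    """Return {value: (count, percent of all rows)} for one column."""
    counts = Counter(row[key] for row in rows)
    total = len(rows)
    return {value: (n, 100 * n / total) for value, n in counts.items()}

by_source = percent_by(rows, "source_system")   # % of records from each source system
by_code = percent_by(rows, "billing_code")      # counts and % for each code value

for value, (n, pct) in by_source.items():
    print(f"{value}: {n} rows, {pct:.0f}%")
```

The row level tags stay in the warehouse; only the aggregated counts and percentages would be published to the repository.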
Example 6
The importance of a meta data repository
Legacy applications
• Limited exposure to the data
• An awareness of data quality issues
• Impact of problems not easy to see
• Few sustained activities to correct the problems

• Greatly increased exposure to the data
• Impact of problems clearly felt
• Few sustained activities to correct data problems
• Awareness of meta data problems
• Few sustained activities to correct meta data

• Greatly increase the exposure to meta data
• Navigation meta data increases data access
• Lineage
• Plain English Definitions
• Calculations and transformations
• Identify owners and stewards
• Business rules
Example 7
ROI for meta data
• Issue: Business analysts told us they spend up to 70% of their time locating the right information source and resolving multiple versions of the truth
– Benefit: At least 20% improvement in the time required to locate and validate information
• Issue: Technical analysts and developers told us they spend up to 80% of their time finding the data needed to satisfy a request
– Benefit: At least 20% improvement of application development and maintenance activities
• Issue: Business and technical analysts told us it requires six to nine months for a new associate to become proficient in using data
– Benefit: Reduced learning curve for new associates by 3 months and lower mentoring required from key resources
• Issue: Lack of complete automated impact analysis within and across tools creates a serious risk when implementing changes
– Benefit: At least 20% reduction in the time required to perform a complete impact analysis and reduced risk of errors when migrating changes to production
Quote from the META Group’s series of white papers addressing application delivery strategies:
“meta data interchange will improve Application Development and maintenance efficiency by up to 30%, and real-time meta data interoperability will enable up to 50% improvement in Application Development and maintenance efficiency.”
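To turn the first percentage pair into hours, a rough calculation helps. The sketch below assumes a 40-hour work week, which is not stated in the source, and takes the 70% and 20% figures at their stated limits.

```python
HOURS_PER_WEEK = 40              # assumption, not from the slide
share_locating_percent = 70      # "up to 70% of their time" locating information
improvement_percent = 20         # "at least 20% improvement"

hours_locating = HOURS_PER_WEEK * share_locating_percent / 100
hours_saved = hours_locating * improvement_percent / 100

print(f"{hours_locating:.1f} hours per week spent locating information")
print(f"{hours_saved:.1f} hours per week saved per analyst")
```

Multiplying the saved hours by the number of analysts and a loaded hourly rate, both of which would have to come from the organization itself, gives a first-cut annual benefit figure.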
Example 8
What does correct mean?
Correctness
Rules
• Accuracy
• Consistency
• Completeness
• Balancing
• Continuity
• Precedence
• Currency
• Duration
• Retention
• Precision
• Granularity
Activities
• Validation: Does it match the rules
• Verification: Does it make sense as applied to other reliable sources
• Accurate: Data can be valid but still not be accurate
• Inspection: Can be as simple as spot checking or as thorough as data-driven discovery inspection using techniques like pattern recognition, classification and probability
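The difference between validation (does it match the rules) and verification (does it make sense against another reliable source) can be sketched as two separate checks. The rule set, field names and control total below are hypothetical, chosen only to show that data can pass validation and still fail verification.

```python
VALID_STATUS_CODES = {"OPEN", "PAID", "DENIED"}   # hypothetical rule set

def validate(claim):
    """Validation: does the record match the rules? Returns a list of problems."""
    problems = []
    if claim["status"] not in VALID_STATUS_CODES:
        problems.append("status not in allowed codes")
    if claim["paid_amount"] < 0:
        problems.append("paid_amount is negative")
    return problems

def verify_balance(warehouse_claims, source_control_total):
    """Verification (balancing): does the loaded total agree with a reliable source?"""
    loaded_total = sum(c["paid_amount"] for c in warehouse_claims)
    return loaded_total == source_control_total

claims = [
    {"status": "PAID", "paid_amount": 1200},
    {"status": "PAID", "paid_amount": 300},
]

print([validate(c) for c in claims])   # every record is valid
print(verify_balance(claims, 1600))    # but the load does not balance to the source total
```

Here both records are valid, yet the load is 100 short of the assumed source control total, so the data is valid but not accurate.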
Example 9