Data Strategy

Data Strategy in Practice
Sid Adelman & Associates
sidadelman@aol.com
818.783.9634
Data Strategy
 Module 1 – Introduction to Data Strategy
 Module 2 – Data Quality
 Module 3 – Metadata
 Module 4 – Organization, Roles & Responsibilities
 Module 5 – Security & Privacy
 Module 6 – Business Intelligence
 Module 7 – Information Integration
 Module 8 – Software/Products
 Module 9 – Performance & Measurement
Copyright Sid Adelman, 2007
Module 1 – Introduction to Data Strategy
 Components of a data strategy
 Why have a data strategy
 Do these problems exist in your organization?
 Gain control
 Support the IT strategy
 Data in the Dark Ages
 Enlightened data strategy
 Critical success factors
 How to implement a data strategy
 Best Practices
Components of a Data Strategy +
 RDBMS - Relational Database Management System
 Data Quality
 Metadata
 Performance
 Data Distribution
 Organization
 Data Ownership
Components of a Data Strategy +
 Security and Privacy
 Total Cost of Ownership
 Subject area databases
 Data modeling
 Data sharing
 Business Intelligence
 Information integration
Components of a Data Strategy +
 Legacy/operational data
 Standards
 Data migration
 Application packages
 Software/products
 Personal/departmental databases
Components of a Data Strategy
 Categorization of data
 Communicating and selling the data strategy
 Measurement
Why Have a Data Strategy
 Capitalize on the data asset
 Support the IT Strategy
 Gain control
Do these problems exist in your organization? +
 Uncontrolled redundant data
 Data not easily accessible by the user
 Lack of knowledge of available data
 Poor data quality
 Each new application designs, builds and populates its own database
 Inconsistent reports
Do these problems exist in your organization?
 Private databases
 No central metadata repository
 Management unclear on the importance of data
 No responsibility for data
 Data standards nonexistent, not understood or not followed
Gain Control
 Consistent security implementation
 Understand, define and assign ownership
 Understand, define and assign stewardship
 Minimize redundancy
 Inventory data
 Develop consistent terminology
Support the IT Strategy
 Provide departments, projects and personnel with guidelines for storing and accessing data
 Minimize the number of RDBMSs
 Establish, disseminate and maintain standards for shared data resources
 Deliver a high level of service
– Performance
– Availability
– Response time
– Responsiveness to user requests
Data in the Dark Ages
 Data is kept locked by each application or department
 Users do not trust the data
 Data is not well understood either by users or by IT
 Data is difficult to access
 Senior Management does not understand the value of data
Enlightened Organization
 Data is shared
 Users trust the accuracy of the data
 Data is inventoried and terminology is clear
 Data is easily accessed by IT and by the users
 Senior Management views data as an asset that is critical to the organization and to decision making
Critical Success Factors
 Data Strategy supports IT plans
 Quality data
 Support of legacy data
 Support of development efforts
 Infrastructure
– Organization
– Skills
– Tools
 Achieve short-term successes
How to Implement a Data Strategy
 Data environment assessment
 Establish a target data environment
 Develop an implementation plan
 Sell Data Strategy within the organization
 Evaluate progress and justify your existence
 Revisit the plan
Best Practices
 Don’t get into the details too soon
 Don’t be seen as a theorist -- your actions must be pragmatic
 Don’t lead with long-term deliverables
 Don’t commit more than you can deliver
 Avoid unproven technology
Module 1 Workshop
Assessment of Existing Organization
Module 2 – Data Quality
 Management Support
 Evaluation/Diagnosis
 Timeliness
 ETL Validation
 Prioritization - Which Data to Clean First
 Cost of Cleansing
 Responsibility for Data Quality
Management Support
 Management awareness of importance of data quality
 Cost justification of data quality initiative
 Ongoing commitment
 Finding a business management sponsor
Evaluation/Diagnosis
 Which source data is most correct
 Valid values (domains)
 Business rules
 Data types (e.g., hex, packed decimal)
 Completeness
 Inappropriate defaults
 Fields used for multiple purposes
 Accuracy
 Quality of historical data
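Several of the checks listed above (valid values, completeness, inappropriate defaults) can be sketched as a small profiling routine. This is an illustrative sketch only: the function name `profile_field`, the sample state codes, and the choice of `XX` as a suspect default are assumptions, not part of the original material.

```python
from collections import Counter


def profile_field(values, valid_values=None, suspect_defaults=()):
    """Summarize completeness, domain violations and suspect defaults for one field."""
    total = len(values)
    # Treat None and the empty string as missing (an assumption for this sketch).
    present = [v for v in values if v is not None and v != ""]
    # Values outside the declared domain, if a domain was supplied.
    invalid = [v for v in present if valid_values is not None and v not in valid_values]
    # Placeholder values that may have been entered just to get past an edit check.
    default_count = sum(1 for v in present if v in suspect_defaults)
    return {
        "completeness": len(present) / total if total else 0.0,
        "invalid_counts": dict(Counter(invalid)),
        "suspect_default_count": default_count,
    }


# Hypothetical sample: state codes pulled from one source system.
states = ["CA", "NY", "", "ZZ", "CA", None, "XX", "XX"]
report = profile_field(states, valid_values={"CA", "NY", "TX"}, suspect_defaults={"XX"})
print(report)
```

Running the same profile against each candidate source is one way to compare which source is "most correct" for a given field.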
Data Timeliness
 Currency of data, e.g., last Friday
 Frequency of update, e.g., daily, weekly, monthly, quarterly
 User awareness – how will the users know?
ETL Validation
 Validation of ETL process
 Tie-outs
– Number of records
– Dollar matching
– Quantitative matching
 Automatic versus manual checking
 Referential integrity?
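The record-count and dollar-matching tie-outs above can be automated with a simple comparison between what was extracted and what was loaded. The sketch below is an assumption-laden illustration: the `tie_out` function, the `amount` key and the sample rows are hypothetical, and a real ETL tool would normally supply these totals itself.

```python
from decimal import Decimal


def tie_out(source_rows, target_rows, amount_key="amount"):
    """Compare record counts and summed amounts between source and target."""
    # Decimal avoids floating-point drift when matching dollar totals.
    src_total = sum((Decimal(str(r[amount_key])) for r in source_rows), Decimal("0"))
    tgt_total = sum((Decimal(str(r[amount_key])) for r in target_rows), Decimal("0"))
    return {
        "count_matches": len(source_rows) == len(target_rows),
        "count_difference": len(source_rows) - len(target_rows),
        "amount_matches": src_total == tgt_total,
        "amount_difference": src_total - tgt_total,
    }


# Hypothetical data: one source record never reached the target.
source = [
    {"id": 1, "amount": "100.10"},
    {"id": 2, "amount": "250.00"},
    {"id": 3, "amount": "19.90"},
]
target = [
    {"id": 1, "amount": "100.10"},
    {"id": 2, "amount": "250.00"},
]
result = tie_out(source, target)
print(result)
```

A count difference with a matching dollar total (or the reverse) usually points to a different kind of problem than both failing together, so it helps to report the two checks separately.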
Triage - Prioritization
 Which data to clean
 Justification for cleansing
 Ease of cleansing
 Possibility of cleansing
 Political support for cleansing
Cost of Cleansing
 Automatic versus manual
– Tools to perform automatic cleansing
– Effort to support use of tools
 Use of defaults
 Knowledge/experience of those performing manual cleansing
Responsibility for Data Quality
 “It’s not enough to say that data quality is everyone’s responsibility.”
 Data Quality Administrator
 Ongoing commitment
 Data ownership responsibility
 Operational versus data warehouse responsibility
Data Quality – Best Practices
 Inventory the quality of your data
 Sell the importance of data quality to management
 Assign data quality responsibility
 Triage the cleansing process
Module 2 Workshop
Data Quality
Module 3 – Metadata
 Management Support
 Metadata as the Keystone
 Which Metadata to Capture
 Responsibility for Capture
 Responsibility for Maintenance
 Business Metadata
 Technical Metadata
 How will Metadata be Used
 Data Inventory
Metadata – Management Support
 IT and the Business
 Management understanding of the importance of metadata
 Impact on project schedules
 Long-term benefit of metadata
 Importance for operational and data warehouse
Metadata as the Keystone
 Single version of the truth
 It’s the inventory of information
 Tears down dysfunctional information fiefdoms
 Opportunities to reduce redundancy
 Opportunities for integration
Which Metadata to Capture
 Don’t boil the ocean
 What metadata is valuable
 Ease and cost of capture
 Political issues relating to capture
Responsibility for Capturing Metadata
 Incentive for capturing
 Management direction
 Automatic and manual
Responsibility for Metadata Maintenance
 Where does Metadata Repository maintenance report?
 Why is maintenance important?
 Long-term commitment
Business Metadata
 Business definitions
 Source of data
 How data was derived (algorithms)
 Lineage (data genealogy)
 Timeliness
 Security
 Ownership
 Quality
Technical Metadata
 Field name
 Database
 Data type
 Source
 Length
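One way to picture how the technical attributes above sit alongside business metadata is a single record per field. The sketch below is hypothetical: the class name `FieldMetadata`, the helper `undocumented`, and the sample values are invented for illustration and do not reflect any particular repository product.

```python
from dataclasses import dataclass


@dataclass
class FieldMetadata:
    # Technical metadata, following the list above.
    field_name: str
    database: str
    data_type: str
    source: str
    length: int
    # A small subset of business metadata, left blank until someone captures it.
    business_definition: str = ""
    owner: str = ""


def undocumented(entries):
    """Return field names that still lack a business definition or an owner."""
    return [e.field_name for e in entries if not e.business_definition or not e.owner]


# Hypothetical repository contents.
entries = [
    FieldMetadata("CUST_ID", "SALES_DW", "CHAR", "order entry system", 10,
                  business_definition="Unique identifier assigned to a customer",
                  owner="Sales Operations"),
    FieldMetadata("ORD_DT", "SALES_DW", "DATE", "order entry system", 8),
]
print(undocumented(entries))
```

Technical attributes can often be harvested automatically from tools, while the business fields typically need a named person to fill them in, which is why the gap report is keyed on those fields.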
How Will Metadata be Captured
 Data modeling tools
 ETL tool
 Access and analysis tool
 Metadata Repository tool
 Data dictionary
 Copybooks
 Home grown application
How Will Metadata be Used
 Business
– Understanding the data
– Understanding the meaning of results
– Avoiding incorrect conclusions
 IT
– Research
– Impact analysis
– Tool interchange
Inventory
 Where is the data?
 How and where is it used?
 Quality of data
 Redundancy
 Ownership
 Documentation
Metadata – Best Practices
 Determine which metadata to capture and use
 Determine how the tools will capture and use metadata
 Sell management on the importance
 Assign metadata responsibility
Module 3 Workshop
Metadata
Module 4 Organization – Data-related Roles & Responsibilities
 Database Administrator
 Data Administrator
 Data Quality Administrator
 Security
 Architect
 Data ownership
Database Administrator
 Database design
 Backup and recovery
 Reorganization
 Monitoring
 Tuning
 Index creation
Data Administrator
 Data modeling
 Source data evaluation
 Enterprise data integration
 Data quality analysis
 Metadata responsibility
Data Quality Administrator
 Uncovering data quality problems
 Communicating data quality problems
 ETL verification
 Responsibility for some cleansing
Security
 Responsibility for who can do what to the data
– Data access
– Data create/update/delete
 Working with those administering the tools that have security capabilities
Architect
 Knowing what the enterprise needs
 Evaluating technical options
 Developing an appropriate architecture
 Selling the architecture
Data Ownership +
 Creation
 Access
 Determine requirements for performance
 Determine requirements for availability
 Determine historical requirements
Creation
 Data Entry process
– Training
– Incentives for quality
 Quality of data
 Data edits
Access
 Need to know
 Opt in/Opt out
 Level of granularity
 By department
 By role
 External access by people outside the organization
Performance Requirements
 Response time
 What is excellent response time worth?
 Timeliness
Availability Requirements
 How many hours and days does the system need to be available?
 What is the availability requirement during scheduled hours?
Historical Requirements
 How far back to keep the data
 How detailed does old data need to be?
 Impact of code changes and organizational changes over time
Organization – Best Practices
 Establish the appropriate organization for your enterprise
 Enumerate roles and responsibilities
 Gain concurrence for roles and responsibilities
– Management
– Those performing the functions
Module 4 Workshop
Organization
Module 5 Security & Privacy
 Categorization for security
 Responsibility for determining
 Mechanism for establishing procedures
 Security audit
 Regulatory issues
 Data sharing
Categorization for Security/Privacy
 Does all data have the same security/privacy requirements?
 Who determines security/privacy requirements of data?
 What are the regulatory requirements for security and privacy?
 Does your organization have a Security Office? What authority do they have?
Responsibility
 Security Office
 Internal auditors?
 Data Owners
 Responsibility for administering
 Testing security and privacy
Mechanism for Establishing Procedures
 Security requirements
– Internal
– Regulatory
 Tools that implement security
 Communicating security requirements to those who implement
Security Audit
 Validating procedures
 Validating training
 Testing and probing
 Recommending mitigation
 Frequency of audits
Regulatory Issues
 Health Care – HIPAA
 Finance
 Brokerage - SEC
 Insurance
 Media – FCC
Data Sharing
 Inhibitors
 Motivation/incentives to share
 Management directives on sharing
Inhibitors
 Power
 Fear of others
 Fear of boss micromanaging
Motivation/incentives to share
 Are there any?
Management Direction on Sharing
 Direction to share must come from the CEO
– Need to know
– Reason for withholding access must be documented
– Access only given when directed
Security & Privacy – Best Practices
 Raise the consciousness of security and privacy requirements
 Connect with your Security Office
 Determine security capabilities of tools
 Assign responsibilities
 Test and validate
Module 5 Workshop
Security & Privacy
Module 6 Business Intelligence
 Goals and Objectives
 Architecture
 Data Mining
 Tools
 Methodology
Goals and Objectives
 Why have a data warehouse?
 Have goals and objectives been identified?
 Have they been communicated?
 Are they measured post-implementation?
Architecture
 Platform
 Tools/products
 How the data flows
Data Mining
 Discovery versus hypothesis testing
 Different tools
 Different people mining the data
Tools
 RDBMS
 Data Modeling
 ETL
 Access and Analysis
 Data quality (Cleansing)
 Measurement
Methodology
 Spiral versus waterfall
 Phasing more appropriate
 Tasks more difficult to estimate
Business Intelligence – Best Practices
 Set goals and objectives
 Set expectations early and often
 Establish cost justification
 Find a terrific sponsor
Module 6 Workshop
Business Intelligence
Module 7 Information Integration
 Integrating business data
 Data redundancy
 Different RDBMSs and their impact
 Data migration
Integrating Business Data
 Understanding the customer
 ERPs
 Supply chain
Data Redundancy
 Goal to reduce data redundancy?
 Inconsistent data
 Single version of the truth
 Cost of data redundancy
Different RDBMSs & Their Impact
 More interface programs
 Less depth in DBA pool
 More product expense
 Integration problems
 Less optimizer capability
Data Migration +
 Should data be dropped?
 Should data be converted?
 Should data be integrated/consolidated?
Should Data be Dropped?
 Is it even being used?
 What’s the cost of maintaining this data?
 Could another database be used in its place?
 Any political issues?
 Any regulatory issues?
Should Data be Migrated?
 Can we consolidate RDBMSs?
 What is the cost of migration?
 What is the impact on other systems?
Should Data be Integrated/Consolidated?
 Why do we want to integrate/consolidate?
 Costs of integration/consolidation
 Savings of integration/consolidation
 Political issues
 Regulatory issues
Information Integration – Best Practices
 Determine information integration benefits and costs
 Sell information integration to management
 Establish and execute priorities
Module 7 Workshop
Information Integration
Module 8 Software/Products
 RDBMS
 Tools/utilities
 Organization standards for products
 Criteria for selection
 Responsibility for Selection
 Single vendor/best of breed
 Deals/Negotiation
 Relationship with vendors
 Application packages
RDBMS
 Which RDBMS is the standard?
 Relation to platform
 What applications is it being used for?
RDBMS Choices
 IBM (DB2, IMS, Informix)
 Microsoft (SQL Server)
 Oracle
 Sybase
 Teradata
Why standardize the RDBMS?
 Minimize the number of RDBMSs
 Less training required
 More leverage on RDBMS vendor
 Flexible assignments
 Fewer interface problems
 Fewer interface programs
Relation to platform
 RDBMS performance impacted by platform
 Platform may dictate (or strongly recommend) RDBMS choice
 Which decision comes first?
What application is RDBMS being used for
 Operational/OLTP
 Data Warehouse/Business Intelligence
Tools/Utilities
 Platform dependent
 RDBMS dependent
 Expensive
 33% on the shelf
 Lots of product duplication
 Necessary?
Organization Standards for Products
 Who sets standards?
 Are the standards known?
 Are they standards or guidelines?
 Who can give dispensation?
Criteria for Selection
 Need
 Cost
 Vendor
– Support
– Reputation
– Financial stability
Responsibility for Selection
 Technical evaluators
 Strategic architect
 Management
Single Vendor vs Best of Breed
 Single vendor
– Possibly a better relationship
– Leverage
– Not always the best products
– Products should all work together
 Best-of-breed
– Need to integrate yourself
– Finger pointing when problems
– Potential incompatibilities
Deals/Negotiations
 Have someone else negotiate
 Don’t let vendor know you have chosen them before you negotiate
 www.dobetterdeals.com (Joe Auer – ComputerWorld)
Relationship with Vendors
 Partnerships
 Money Issues
 Support
 Conferences
 Being a reference
Databases Required by the Application Packages
 Packages do not support all RDBMSs
 Packages do not support all RDBMSs equally well
 Does preferred RDBMS violate organization standard?
 Are support personnel (DBAs) available?
Impact of Package
 Machine Requirements
 Performance
 Availability
Software – Best Practices
 Determine real requirements
 Establish software standards
 Make use of existing software whenever possible
 Talk to organizations who are using the products
Module 8 Workshop
Software/Products
Module 9 – Performance and Measurement
 Categorization for performance
 Capacity Planning
 Monitoring/Measuring
 Service Level Agreements
 Tuning
 Roles and Responsibilities
 Reporting performance
Categorization for Performance
 How good does response time need to be?
 How does it differ from application to application?
 What is the cost-benefit of excellent response time?
 Were performance considerations included in the architecture?
Categorization for Availability
 Scheduled hours (24 X 7, 18 X 6,…)
 Availability during scheduled hours
 How does it differ from system to system?
 Is excellent availability cost justified?
 Was availability included in the architecture?
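The distinction between scheduled hours and availability during those hours is easier to see with numbers. The sketch below is illustrative only: the function name `availability_pct` and the outage figure are assumptions, and it counts only outages that fall inside the scheduled window.

```python
def availability_pct(hours_per_day, days_per_week, outage_hours_in_week):
    """Percent of scheduled time the system was up during one week."""
    scheduled = hours_per_day * days_per_week
    if scheduled <= 0:
        raise ValueError("scheduled hours must be positive")
    if not 0 <= outage_hours_in_week <= scheduled:
        raise ValueError("outage must fall within the scheduled hours")
    return 100.0 * (scheduled - outage_hours_in_week) / scheduled


# The same 1.5-hour outage measured against two hypothetical schedules.
pct_18x6 = round(availability_pct(18, 6, 1.5), 2)  # 108 scheduled hours
pct_24x7 = round(availability_pct(24, 7, 1.5), 2)  # 168 scheduled hours
print(pct_18x6, pct_24x7)
```

The same outage yields a lower percentage against the shorter schedule, so an availability target only means something once the scheduled hours are agreed.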
Capacity Planning
 Database size
 Number of users
 Number of transactions
 Number of queries/reports
 Time and day of usage
 Complexity of transactions/queries/reports
 Proactive response to capacity increase
Monitoring/Measuring
 Response time
 Resource utilization (CPU, disk access, network)
 Who is using the system
 When is the system being used
 Chargebacks
Service Level Agreements
 Response time
 Availability
– Scheduled hours (hours/day, days/week)
– Availability during scheduled hours
 Timeliness of data
 Response to problems
 Response to new requests
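A response-time clause in a service level agreement is often written as a percentile target rather than an average. The sketch below assumes that form; the functions `percentile` and `meets_sla`, the nearest-rank method, the 2-second target and the sample timings are all hypothetical choices made for illustration.

```python
def percentile(values, pct):
    """Nearest-rank percentile; pct is a whole number from 1 to 100."""
    if not values:
        raise ValueError("no measurements supplied")
    ordered = sorted(values)
    # Integer ceiling of pct * n / 100, which avoids floating-point rounding.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]


def meets_sla(response_times_sec, target_sec, pct=95):
    """True if the chosen percentile of response times is within the target."""
    return percentile(response_times_sec, pct) <= target_sec


# Hypothetical measurements in seconds, including one slow outlier.
samples = [0.8, 1.1, 0.9, 1.4, 0.7, 3.9, 1.0, 1.2, 0.95, 1.3]
print(meets_sla(samples, target_sec=2.0, pct=95))
print(meets_sla(samples, target_sec=2.0, pct=90))
```

With only ten samples the single outlier decides the 95th-percentile result, which shows how much the choice of percentile and sample size affects whether an agreement appears to be met.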
Tuning
 Awareness of problems – measurement tools and responsibilities
 Tuning capability of platform, RDBMS, tools
 Responsibility for tuning
Roles and Responsibilities
 DBA - RDBMS
 Application performance
 Systems programmer – operating system
 System Architect
 Capacity Planner
 Performance testing
Reporting performance
 IT
– Who needs to take action
– Who needs to see reports/alerts
 Business
– Matching project agreements
– Expectations
Measurement Tools
 Performance
 Usage
 Resource utilization
 Network
Measurement Usage
 What do you do with the performance measurement information?
Reporting to Management
 High level (not detailed)
 Problems, aberrations
 Frequency
 Form (tables, charts, graphs)
Service Level Agreements
 Response time
 Availability
 Who establishes agreements?
 What’s realistic?
 Incentives to meet SLAs
Performance & Measurement – Best Practices
 Determine what is advantageous to measure
 Assign responsibilities
 Designate tools for measurement
 Report metrics to management
Module 9 Workshop
Performance & Measurement
Overall Data Strategy Best Practices
 Don’t get into the details too soon
 Don’t be seen as a theorist -- your actions must be pragmatic
 Don’t lead with long-term deliverables
 Don’t commit more than you can deliver
 Avoid unproven technology
How to Implement a Data Strategy
 Conduct a data environment assessment
 Establish a target data environment
 Develop an implementation plan
 Sell Data Strategy within the organization
 Evaluate progress and justify your existence
 Revisit the plan
Summary
 Pitch the importance of a data strategy to your CIO and CTO
 Ask to either lead the effort or to be a permanent member of the team