The Core Principles of Information Governance

Brian Kordelski – WW Sales Executive – IBM InfoSphere
12/07/2010
© 2010 IBM Corporation
Governance is no longer an option
“By 2013, 25% of the companies in highly regulated industries will create and staff positions in accounting, human resources, compliance and audit and law that deal explicitly with the management of information via technology.”
– Gartner, Inc., “Organizing for Information Governance,” Debra Logan, November 2009
“[A]n [information management] strategy should incorporate lifecycle information governance practices [to ensure] consistent execution of ... business optimization, agility, and transformation [initiatives].”
– Forrester Research, Inc., “Refresh Your Information Management Strategy to Deliver Business Results,” Rob Karel & James G. Kobielus, August 2009
“If you are going to protect your company's most valuable asset—your data—you will begin to view data security as a component of a more comprehensive information governance strategy.”
– Hurwitz & Associates, “Why you need an information governance strategy for 2010,” Marcia Kaufman, December 2009
Information Governance Council Maturity Model
[Diagram: maturity model whose elements are linked by “Requires,” “Enhances” and “Supports” relationships]
If we don’t proactively manage quality
Increased costs and missed revenue opportunities, impacting both financials and customer relationships, due to a lack of data quality
Incomplete and inaccurate master data created problems in receiving and/or shipping products, sending marketing literature and regulatory mailings, and achieving 360-degree customer visibility.
A small error in the quality of the rating data leads to negative impact for the company and unhappy customers
For a large telecom provider with a massive volume of telephone calls and customers, even a small error in the rating data can mean significant revenue loss or customer turnover.
Data quality issues plague BI initiatives, creating a lack of trust in the data
Several attempts to implement a data warehouse and analytics application at a major retailer had stalled due to data quality issues, which frustrated the project team and left business users without trust in the data.
Requirements to manage the quality of data
Understand & Define
– Discover your data across systems
– Define common vocabulary
– Design your data structures
Develop & Test
– Develop database structures
– Create & refresh test data
– Validate test results
Cleanse & Manage Continuously
– Define rules & cleanse data
– Actively monitor & manage data
– Remediate inconsistencies
Understand your information
• Data can be distributed over multiple applications, databases and platforms
– Where are those databases located?
• Complex, poorly documented data relationships
– Which data is sensitive, and which can be shared?
– Whole and partial sensitive data elements can be found in hundreds of tables and fields
• Data relationships not understood because:
– Corporate memory is poor
– Documentation is poor or nonexistent
– Logical relationships (enforced through application logic or business rules) are hidden
[Diagram: Distributed Data Landscape]
Gain consistent terminology
How does each user (Financial Officer, Business Analyst, Compliance Officer, Sales Lead, Marketing Manager, Business Intelligence Manager, CRM Project Manager, ERP Project Manager, IT Architect, Support Rep) define “Active Subscriber”? Answers include:
• Mobile user who has used “any” service in the mobile network
• User who paid for the service at least once in the past 90 days
• Only post-paid customers, not pre-paid customers
• Mobile user who has a phone plan, but not SMS
• User who makes at least 1 call over a period of 90 days
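To make the cost of inconsistent terminology concrete, here is a minimal sketch that encodes three of the “Active Subscriber” definitions above as predicates and applies them to the same records. The `Subscriber` fields and the four sample records are hypothetical, and reading “paid at least once in the past 90 days” as “last payment no more than 90 days ago” is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Subscriber:
    # Hypothetical fields; a real billing or CRM schema will differ.
    subscriber_id: str
    plan: str                     # "post-paid" or "pre-paid"
    days_since_last_payment: int  # days since the most recent payment
    calls_last_90_days: int       # calls made in the last 90 days


# Three of the competing definitions from the slide, written as predicates.
DEFINITIONS = {
    "paid at least once in the past 90 days": lambda s: s.days_since_last_payment <= 90,
    "post-paid customers only": lambda s: s.plan == "post-paid",
    "at least 1 call in 90 days": lambda s: s.calls_last_90_days >= 1,
}

# Hypothetical sample records.
SUBSCRIBERS = [
    Subscriber("A1", "post-paid", 30, 0),
    Subscriber("A2", "pre-paid", 10, 0),
    Subscriber("A3", "post-paid", 120, 12),
    Subscriber("A4", "post-paid", 200, 0),
]


def count_active(definition: str) -> int:
    """Count the subscribers that satisfy one named definition."""
    predicate = DEFINITIONS[definition]
    return sum(1 for s in SUBSCRIBERS if predicate(s))


for name in DEFINITIONS:
    print(f"{name}: {count_active(name)}")
```

On the same four records the three definitions report 2, 3 and 1 active subscribers. Agreeing on one predicate in a shared business vocabulary, and reusing it everywhere, is what makes reports from different teams comparable.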
Cleanse and continuously manage your data
1. Create reusable quality rules & cleanse your data
– Leverage the knowledge gained during the understand & define steps
– Define what quality means to you
– Design your data quality rules and matching logic
2. Actively monitor & manage your data
– Standardize data formats
– Leverage precisely calibrated matching rules and remove duplicates
– Develop rules & quality metrics for monitoring
– Manage duplicate data, when required
3. Remediate inconsistencies in your data
– Monitor for problems or trends
– Investigate data lineage to find the source of the problem
– Repair the data and the source of the problem
– Maintain monitoring to capture future problems
Make sure there is an owner of data quality AND management sponsorship
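The “standardize data formats” and “matching rules and remove duplicates” steps can be sketched as follows. This is a deliberately simple illustration, not a calibrated matching engine: two records are treated as duplicates when their standardized name and phone number are identical, whereas production matching typically uses weighted or probabilistic comparisons. The field names and sample records are hypothetical.

```python
import re
import unicodedata


def standardize_name(name: str) -> str:
    """Uppercase, strip accents and punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    letters = re.sub(r"[^A-Za-z ]", " ", ascii_only)
    return " ".join(letters.upper().split())


def standardize_phone(phone: str) -> str:
    """Keep digits only; drop a leading US country code on 11-digit numbers."""
    digits = re.sub(r"\D", "", phone)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    return digits


def deduplicate(records: list[dict]) -> list[dict]:
    """Keep the first record seen for each (standardized name, phone) match key."""
    survivors: dict[tuple[str, str], dict] = {}
    for record in records:
        key = (standardize_name(record["name"]), standardize_phone(record["phone"]))
        survivors.setdefault(key, record)
    return list(survivors.values())


# Hypothetical customer records: 1 and 2 differ only in formatting.
customers = [
    {"id": 1, "name": "José  Álvarez", "phone": "(555) 123-4567"},
    {"id": 2, "name": "jose alvarez", "phone": "+1 555 123 4567"},
    {"id": 3, "name": "Mary O'Neil", "phone": "555.987.6543"},
]

print([r["id"] for r in deduplicate(customers)])  # [1, 3]
```

Standardizing first is what lets even an exact-match rule catch records 1 and 2; without it, formatting differences alone would hide the duplicate.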
Monitor quality with integrated data rules
• Create “Checks & Balances” to proactively identify quality concerns throughout the lifecycle
– Build & test rules for common or complex conditions
– Extend profiling through targeted analysis of specific data conditions or conformance to expected rules
– Establish benchmarks and baselines to help track data quality: is it deteriorating or remaining constant?
– Flag bad data for audit
• Examples of rules:
– The Gender field must be populated and must be in the list of accepted values
– The Social Security Number must be numeric and in the format 999-99-9999
– If Date of Birth exists AND Date of Birth > 1900-01-01 AND Date of Birth < TODAY, then Customer Type equals ‘P’
– The Bank Account Branch ID is valid in the Branch Reference master list
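The four example rules translate directly into reusable checks. The sketch below is one possible encoding, not any product's rule syntax: the accepted gender codes and the branch reference list are hypothetical stand-ins for real reference data, and the date-of-birth rule is treated as an implication (rows where the condition does not apply pass). The per-rule pass rate doubles as a baseline for tracking whether quality is deteriorating.

```python
import re
from datetime import date
from typing import Optional

# Hypothetical reference data; substitute your own code lists and master data.
ACCEPTED_GENDERS = {"M", "F", "U"}
BRANCH_REFERENCE = {"BR001", "BR002", "BR003"}
SSN_PATTERN = re.compile(r"\d{3}-\d{2}-\d{4}")


def gender_rule(row: dict) -> bool:
    # Gender must be populated and in the list of accepted values.
    return bool(row.get("gender")) and row["gender"] in ACCEPTED_GENDERS


def ssn_rule(row: dict) -> bool:
    # SSN must be digits in the format 999-99-9999.
    return SSN_PATTERN.fullmatch(row.get("ssn") or "") is not None


def dob_rule(row: dict, today: Optional[date] = None) -> bool:
    # If a date of birth exists and lies between 1900-01-01 and today,
    # the customer type must be 'P'; otherwise the rule does not apply.
    today = today or date.today()
    dob = row.get("dob")
    if dob is not None and date(1900, 1, 1) < dob < today:
        return row.get("customer_type") == "P"
    return True


def branch_rule(row: dict) -> bool:
    # Branch ID must exist in the branch reference master list.
    return row.get("branch_id") in BRANCH_REFERENCE


RULES = {"gender": gender_rule, "ssn": ssn_rule, "dob": dob_rule, "branch": branch_rule}


def evaluate(rows: list[dict]) -> dict[str, float]:
    """Pass rate per rule; store these over time as a quality baseline."""
    return {name: sum(rule(r) for r in rows) / len(rows) for name, rule in RULES.items()}


def flag_for_audit(rows: list[dict]) -> list[tuple[int, str]]:
    """List (row index, rule name) for every failed check."""
    return [(i, name) for i, r in enumerate(rows) for name, rule in RULES.items() if not rule(r)]


rows = [
    {"gender": "F", "ssn": "123-45-6789", "dob": date(1980, 5, 17),
     "customer_type": "P", "branch_id": "BR001"},
    {"gender": "", "ssn": "123456789", "dob": date(1975, 1, 2),
     "customer_type": "B", "branch_id": "BR999"},
]
print(evaluate(rows))        # {'gender': 0.5, 'ssn': 0.5, 'dob': 0.5, 'branch': 0.5}
print(flag_for_audit(rows))  # [(1, 'gender'), (1, 'ssn'), (1, 'dob'), (1, 'branch')]
```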
IBM provides the solutions required to create high-quality information
[Diagram: repeats the Understand & Define, Develop & Test, and Cleanse & Manage Continuously requirements from “Requirements to manage the quality of data”]
Organizational challenges from a lack of data lifecycle management
• New application functionality to meet business needs is not deployed on schedule
– Lack of understanding of the relationships between data objects repeatedly delays projects
– Greater data volumes take longer to clone, test, validate and deploy, which means longer test cycles
• Increased operational and infrastructure costs impact the IT budget
– Cloning databases requires more storage hardware
– Larger databases reduce staff productivity and could mean additional license costs
• Application defects are discovered after deployment
– Costs to resolve defects in production can be 10 to 100 times greater than for those caught in the development environment
• Unintentional disclosure of confidential data kept in test/development environments
“Forrester estimates that 85% of data stored in databases is inactive.”
– Source: Noel Yuhanna, Forrester Research, “Database Archiving Remains An Important Part Of Enterprise DBMS Strategy,” 8/13/07
The data multiplier effect
Production 1 TB + Development 1 TB + Test 1 TB + User Acceptance 1 TB + Backup 1 TB + Disaster Recovery 1 TB = 6 TB total
Actual Data Burden = Size of production database + all replicated clones
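The burden formula is simple arithmetic, and a short sketch shows why trimming production pays off several times over. Assumptions for illustration only: every clone is a full copy of production, and archived data is stored once outside the clone chain (in practice the archive itself may also be backed up). The 85% inactive share is the Forrester estimate quoted earlier in this deck.

```python
def data_burden(production_tb: float, clone_count: int) -> float:
    """Production plus full-size clones (development, test, UAT, backup, DR...)."""
    return production_tb * (1 + clone_count)


def burden_after_archiving(production_tb: float, clone_count: int,
                           inactive_fraction: float) -> float:
    """Archive inactive data once, then clone only what remains in production."""
    active_tb = production_tb * (1 - inactive_fraction)
    archive_tb = production_tb * inactive_fraction  # stored once, not cloned
    return data_burden(active_tb, clone_count) + archive_tb


print(data_burden(1.0, 5))  # 6.0
print(round(burden_after_archiving(1.0, 5, 0.85), 2))
```

With five clones of a 1 TB production database, the burden falls from 6 TB to roughly 1.75 TB under these assumptions, because every terabyte removed from production is also removed from each clone.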
Requirements to manage data across its lifecycle
Discover & Define
– Discover where data resides
– Classify & define data and relationships
– Define policies
Develop & Test
– Develop database structures & code
– Create & refresh test data
– Validate test results
Optimize, Archive & Access
– Enhance performance
– Manage data growth
– Report & retrieve archived data
Consolidate & Retire
– Rationalize application portfolio
– Move only the needed information
– Enable compliance with retention & e-discovery
Implement test data management with masking
[Diagram: Production or Production Clone feeding Development, Test, QA and Training environments]
• Create targeted, right-sized test environments instead of cloning entire production environments
• Mask data to protect privacy
• Compare data pre/post test to identify quality issues
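Two of these ideas, right-sized extraction and pre/post comparison, can be sketched in a few lines. This is an illustration under stated assumptions, not how any particular tool works: the `customers` and `orders` tables, the `region` filter and the simulated bug are all hypothetical, and masking is left out here. The key point is that the subset follows the foreign key from orders to customers so it stays referentially intact.

```python
# Hypothetical production tables: customers and their orders.
PRODUCTION = {
    "customers": [
        {"customer_id": 1, "region": "EAST"},
        {"customer_id": 2, "region": "WEST"},
        {"customer_id": 3, "region": "EAST"},
    ],
    "orders": [
        {"order_id": 10, "customer_id": 1, "amount": 250},
        {"order_id": 11, "customer_id": 2, "amount": 90},
        {"order_id": 12, "customer_id": 3, "amount": 40},
        {"order_id": 13, "customer_id": 1, "amount": 75},
    ],
}


def extract_subset(db: dict, region: str) -> dict:
    """Copy one region's customers plus only the orders that reference them."""
    customers = [dict(c) for c in db["customers"] if c["region"] == region]
    kept_ids = {c["customer_id"] for c in customers}
    # Following the foreign key keeps the subset referentially intact.
    orders = [dict(o) for o in db["orders"] if o["customer_id"] in kept_ids]
    return {"customers": customers, "orders": orders}


def compare(before: list[dict], after: list[dict], key: str) -> dict:
    """Report rows added, removed or changed between two snapshots of a table."""
    old = {row[key]: row for row in before}
    new = {row[key]: row for row in after}
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }


test_db = extract_subset(PRODUCTION, "EAST")
baseline = [dict(o) for o in test_db["orders"]]  # snapshot before the test run
test_db["orders"][0]["amount"] = 0               # simulate a bug during testing
print(compare(baseline, test_db["orders"], key="order_id"))
# {'added': [], 'removed': [], 'changed': [10]}
```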
Archive to manage data growth
[Diagram: Production holds current data; historical data moves to the Archive with its reference data and can be retrieved when needed; archived data remains available for reporting through universal access to application data (applications, mashups, XML, ODBC/JDBC)]
Archiving is an intelligent process for moving inactive or infrequently accessed data that still has value, while providing the ability to search and retrieve the data.
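The two halves of that definition, moving inactive data out and keeping it retrievable, can be illustrated with Python's built-in `sqlite3` module. The `orders` and `orders_archive` tables and the date cutoff are hypothetical, and the archive lives in the same database only to keep the sketch short; a real archive would typically sit on cheaper, separate storage.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, order_date TEXT, amount REAL);
    CREATE TABLE orders_archive (order_id INTEGER PRIMARY KEY, order_date TEXT, amount REAL);
    INSERT INTO orders VALUES (1, '2005-03-14', 120.0), (2, '2009-11-02', 75.5),
                              (3, '2010-06-30', 310.0);
    """
)


def archive_before(conn: sqlite3.Connection, cutoff: str) -> int:
    """Move orders dated before `cutoff` into the archive in one transaction."""
    with conn:  # commits on success, rolls back if either statement fails
        conn.execute(
            "INSERT INTO orders_archive SELECT * FROM orders WHERE order_date < ?", (cutoff,)
        )
        moved = conn.execute("DELETE FROM orders WHERE order_date < ?", (cutoff,)).rowcount
    return moved


def find_order(conn: sqlite3.Connection, order_id: int):
    """Search production first, then the archive, so archived rows stay retrievable."""
    for table in ("orders", "orders_archive"):  # fixed names, never user input
        row = conn.execute(
            f"SELECT order_id, order_date, amount FROM {table} WHERE order_id = ?",
            (order_id,),
        ).fetchone()
        if row:
            return table, row
    return None


print(archive_before(conn, "2010-01-01"))  # 2
print(find_order(conn, 1))  # ('orders_archive', (1, '2005-03-14', 120.0))
```

Doing the copy and the delete inside one transaction means a failure cannot leave a row in both places or in neither.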
Diagnose and solve performance problems
• Identify problems before they impact the business
• Diagnose performance problems quickly & easily
• Implement a permanent solution, not a temporary workaround
• Plan for the future while avoiding past mistakes
When you retire or consolidate applications, don’t move all of the data
• Application portfolio has redundant systems acquired via mergers and acquisitions
• Line of business divested; application is no longer needed
• Legacy technologies not compatible with current IT direction
– Old database and/or application versions no longer supported by the vendor
• Required technical skills or application knowledge no longer available
• Budget pressures: do more with less
In almost ALL cases, access to legacy data MUST be retained while the application and database are eliminated.
IBM provides the solutions required to manage information throughout its lifecycle, from requirement to retirement
[Diagram: repeats the Discover & Define, Develop & Test, Optimize, Archive & Access, and Consolidate & Retire requirements from “Requirements to manage data across its lifecycle”]
The data privacy and protection risk continues
Confidential data inadvertently exposed or otherwise available to unauthorized viewers
February 2010: About 600,000 customers of a major NYC bank received their annual tax documents with their Social Security numbers (combined with other numbers & letters) printed on the outside of the envelope.
SQL injection is fast becoming one of the biggest and most high-profile web security threats
July 2010: Hackers obtained access to the user database and administration panel of a popular website by exploiting several SQL injection vulnerabilities. The exposed data included user names, passwords, e-mail addresses and IP addresses.
Unprotected test data sent to and used by test/development teams as well as third-party consultants
February 2009: An FAA server used for application development & testing was breached, exposing the personally identifiable information of 45,000+ employees.
Confidential data that should be redacted can be hidden or embedded
April 2010: A PDF of a subpoena in the case of “United States vs. Rob Blagojevich” was posted to a public website. However, the “redacted” text simply had black boxes placed on top to hide the content; the actual text was still available.
Can today’s organizations successfully protect their information?
• Where does your sensitive data reside across the enterprise?
• How can your data be protected from both authorized and unauthorized access?
• Can your confidential data in documents be safeguarded while still enabling the necessary business data to be shared?
• How can access to your enterprise databases be protected, monitored and audited?
• Can data in your non-production environments be protected, yet still be usable for training, application development and testing?
“Larry Ponemon, founder of the group that bears his name, said that the survey shows a shift in the way C-level executives think about security software. Investing in data protection, he said, is now seen as less expensive than recovering from a data breach.” – InformationWeek
Requirements to manage the security and protection of data
Discover & Define
– Discover where sensitive data resides
– Classify & define data types
– Define policies & metrics
Secure & Protect
– Protect enterprise data from both authorized & unauthorized access
– Safeguard sensitive data in documents
– De-identify confidential data in non-production environments
Monitor & Audit
– Audit and report for compliance
– Monitor and enforce database access
– Assess database vulnerabilities
Discover where sensitive data may be hidden
Sensitive Relationship Discovery
System A Table 1
Number    Name
4600986   Alex Fulltheim
8150928   Barney Solo
6736304   Bill Alexander
3802468   Bob Smith
5567193   Eileen Kratchman
7409934   Fred Simpson
6123913   Greg Lougainis
5061085   Jamie Slattery
4182715   Jim Johnson
8966020   Martin Aston
Callout: “Patient ID # embedded within another field”
System A Table 15
Patient   Result   Test
3802468   N        53
4182715   N        53
4600986   N        32
5061085   N        53
5567193   N        72
6123913   Y        47
6736304   N        34
7409934   N        34
8150928   N        47
8966020   N        34
System Z Table 25
Code   Name
53     Streptococcus pyogenes
72     Pregnancy
32     Alzheimer Disease
47     H1N1
34     Dermatomycoses
• Relationships and sensitive data can’t always be found just by a simple data scan
– Sensitive data can be embedded within a field
– Sensitive data could be revealed through relationships across fields & systems
• When dealing with hundreds of tables and millions of rows, this search is complex; you need the right solution
Compound sensitive data: test results could potentially be revealed.
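A minimal sketch of why compound sensitive data is dangerous, using the three tables from this slide. It proposes relationships with a simple value-overlap heuristic (the share of distinct values two columns have in common), which is a stand-in chosen for illustration, not the method any specific discovery product uses; it also only matches whole values, so IDs embedded inside other fields would need pattern matching that is not shown. Treating a Result of “Y” as a positive test is an assumption.

```python
from itertools import combinations

# Tables from the slide, stored column by column (names are fictional).
SYSTEM_A_TABLE_1 = {
    "Number": ["4600986", "8150928", "6736304", "3802468", "5567193",
               "7409934", "6123913", "5061085", "4182715", "8966020"],
    "Name": ["Alex Fulltheim", "Barney Solo", "Bill Alexander", "Bob Smith",
             "Eileen Kratchman", "Fred Simpson", "Greg Lougainis",
             "Jamie Slattery", "Jim Johnson", "Martin Aston"],
}
SYSTEM_A_TABLE_15 = {
    "Patient": ["3802468", "4182715", "4600986", "5061085", "5567193",
                "6123913", "6736304", "7409934", "8150928", "8966020"],
    "Result": ["N", "N", "N", "N", "N", "Y", "N", "N", "N", "N"],
    "Test": ["53", "53", "32", "53", "72", "47", "34", "34", "47", "34"],
}
SYSTEM_Z_TABLE_25 = {
    "Code": ["53", "72", "32", "47", "34"],
    "Name": ["Streptococcus pyogenes", "Pregnancy", "Alzheimer Disease",
             "H1N1", "Dermatomycoses"],
}
TABLES = {
    "System A Table 1": SYSTEM_A_TABLE_1,
    "System A Table 15": SYSTEM_A_TABLE_15,
    "System Z Table 25": SYSTEM_Z_TABLE_25,
}


def discover_relationships(tables: dict, threshold: float = 0.8) -> list:
    """Propose column pairs in different tables whose distinct values largely overlap."""
    columns = [(t, c, set(values)) for t, cols in tables.items() for c, values in cols.items()]
    found = []
    for (t1, c1, v1), (t2, c2, v2) in combinations(columns, 2):
        if t1 == t2:
            continue
        overlap = len(v1 & v2) / min(len(v1), len(v2))
        if overlap >= threshold:
            found.append((f"{t1}.{c1}", f"{t2}.{c2}", overlap))
    return found


def reveal_positive_results() -> list:
    """Follow the discovered links to turn IDs and codes into names and diagnoses."""
    names = dict(zip(SYSTEM_A_TABLE_1["Number"], SYSTEM_A_TABLE_1["Name"]))
    tests = dict(zip(SYSTEM_Z_TABLE_25["Code"], SYSTEM_Z_TABLE_25["Name"]))
    rows = zip(SYSTEM_A_TABLE_15["Patient"], SYSTEM_A_TABLE_15["Result"],
               SYSTEM_A_TABLE_15["Test"])
    return [(names[patient], tests[test]) for patient, result, test in rows if result == "Y"]


for link in discover_relationships(TABLES):
    print(link)
print(reveal_positive_results())  # [('Greg Lougainis', 'H1N1')]
```

No single table says who tested positive for what; the sensitive fact only appears once the two discovered links are followed, which is why discovery has to look at relationships and not just individual columns.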
Protecting data is both an external and internal issue
• Prevent “power users” from abusing their access to sensitive data (separation of duties)
– DBA and power users
• Prevent authorized users from misusing sensitive data
– For example, third-party or off-shore developers
• Prevent intrusion and theft of data
– For example, someone walking off with a backup tape
– Hackers
– Database vulnerabilities (user ID with no password or a default password)
Protection of data requires a 360-degree strategy
• Secure sensitive data values
– Across both structured and unstructured data
• De-identify data
– Restrict data sharing with third parties
– Generate fictionalized test data for non-production environments
– Support an off-shore deployment model
• Stop unauthorized data access
– Render data useless via encryption
– Lock down SQL to prevent SQL injection
– Block suspicious network traffic
Security makes it possible for us to take risk, and innovate confidently.
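On the application side, the standard defense against SQL injection is to pass user input as bound parameters instead of splicing it into the SQL text. The sketch below uses Python's built-in `sqlite3` module with a hypothetical `users` table; it shows only this application-level technique, not the database-side SQL lockdown or traffic blocking that the slide also mentions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "alice@example.com"), ("bob", "bob@example.com")],
)

malicious = "nobody' OR '1'='1"


def lookup_unsafe(name: str) -> list:
    # String formatting lets the input rewrite the WHERE clause.
    return conn.execute(f"SELECT email FROM users WHERE name = '{name}'").fetchall()


def lookup_safe(name: str) -> list:
    # A bound parameter is always treated as a value, never as SQL.
    return conn.execute("SELECT email FROM users WHERE name = ?", (name,)).fetchall()


print(len(lookup_unsafe(malicious)))  # 2 -- every row leaks
print(len(lookup_safe(malicious)))    # 0
```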
Protect sensitive data values within documents
• Redact (or remove) sensitive unstructured data found in documents and forms, protecting confidential information while supporting the need to share critical business information
– Support compliance with industry-specific and global data privacy requirements or mandates
• Leverage an automated redaction process for speed, accuracy and efficiency
– Ensure hidden source data (or metadata) within documents is redacted as well
• Prevent unintentional disclosure by using role-based masking to confidently share data
• Ensure multiple file formats are supported, including PDF, text, TIFF and Microsoft Word documents
[Example: redact full name & street address]
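The Blagojevich example earlier in this deck shows why redaction must remove the underlying characters rather than cover them. A minimal sketch on plain text: Social Security numbers are found by pattern, and known values such as a full name and street address (hypothetical here, in practice drawn from master data) are matched literally. Handling PDF, TIFF or Word files and their metadata requires format-specific tooling that this sketch does not attempt.

```python
import re

SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def redact(text: str, known_values: list[str]) -> str:
    """Replace sensitive spans with a marker, so the original characters are gone."""
    text = SSN_PATTERN.sub("[REDACTED]", text)
    for value in known_values:
        # re.escape treats the value literally; matching ignores case.
        text = re.sub(re.escape(value), "[REDACTED]", text, flags=re.IGNORECASE)
    return text


doc = "Customer Jane Doe (SSN 123-45-6789) lives at 12 Elm Street."
print(redact(doc, ["Jane Doe", "12 Elm Street"]))
# Customer [REDACTED] (SSN [REDACTED]) lives at [REDACTED].
```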
De-identify data without impacting test & development
• Mask or de-identify sensitive data elements that could be used to identify an individual
• Ensure masked data is contextually appropriate to the data it replaced, so as not to impede testing
– Data is realistic but fictional
– Masked data is within the permissible range of values
• Support referential integrity of the masked data elements to prevent errors in testing
Example: JASON MICHAELS → ROBERT SMITH. Personally identifiable information is masked with realistic but fictional data for testing & development purposes.
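One common way to keep masked data consistent across tables is deterministic masking: a keyed hash of the real value selects a fictional replacement, so the same input always produces the same output. The sketch below assumes hypothetical name pools and a placeholder secret key; it cannot reproduce the slide's specific JASON MICHAELS to ROBERT SMITH mapping, and with small pools, distinct real names can collide on the same fictional name.

```python
import hashlib
import hmac

# Hypothetical pools of fictional names; a real pool would be much larger.
FIRST_NAMES = ["ROBERT", "MARIA", "DAVID", "LINDA", "JAMES", "SUSAN"]
LAST_NAMES = ["SMITH", "GARCIA", "JONES", "MILLER", "DAVIS", "WILSON"]
# Placeholder secret; keep the real key out of test environments so the
# mapping cannot be recomputed there.
SECRET_KEY = b"replace-with-a-managed-secret"


def mask_name(full_name: str) -> str:
    """Map a real name to a realistic fictional one, the same way every time."""
    digest = hmac.new(SECRET_KEY, full_name.upper().encode("utf-8"), hashlib.sha256).digest()
    first = FIRST_NAMES[digest[0] % len(FIRST_NAMES)]
    last = LAST_NAMES[digest[1] % len(LAST_NAMES)]
    return f"{first} {last}"


customers = [{"customer_id": 7, "name": "JASON MICHAELS"}]
orders = [{"order_id": 1, "customer_name": "Jason Michaels"}]

masked_customers = [{**c, "name": mask_name(c["name"])} for c in customers]
masked_orders = [{**o, "customer_name": mask_name(o["customer_name"])} for o in orders]

# The same input always yields the same output, so joins on the masked value still work.
assert masked_customers[0]["name"] == masked_orders[0]["customer_name"]
print(masked_customers[0]["name"])
```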
What happens with security complacency
• Not being able to report compliance can lead to regulatory fines
– No audit report mechanism
– No fine-grained audit trail of database activities
• Don’t know if there is a data breach until it’s too late
– Lack of awareness of suspicious access patterns
– On-going vs. single-event: problems identifying patterns of unauthorized use
• Not able to monitor super user activity to ensure data security standards
– Unable to detect intentional and unintentional events
“Most organizations do not have mechanisms in place to prevent database administrators and other privileged database users from reading or tampering with sensitive information [in business applications]…Fewer than two out of five respondents said they could prevent such tampering by super users.” – Independent User Group
Streamline and simplify compliance processes
• Alerts of suspicious activity
• Audit reporting and sign-offs
– User activity
– Object creation
– Database configuration
– Entitlements
• Separation of duties: creation of policies vs. reporting on application of policies
• Trace users across applications and databases
• Fine-grained policies
• Sign-off and escalation procedures
• Integration with enterprise security information and event management (SIEM) systems
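Fine-grained policies and alerts can be pictured as rules evaluated over an audit trail. This sketch assumes the events have already been captured; real database activity monitoring collects them at the database or network layer and forwards alerts to a SIEM, which is not shown. The `AuditEvent` fields, the sensitive-table list and the business-hours window are all hypothetical policy inputs.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class AuditEvent:
    user: str
    role: str         # e.g. "DBA" or "APP"
    action: str       # e.g. "SELECT", "UPDATE", "GRANT"
    table: str
    timestamp: datetime


# Hypothetical policy inputs; real policies come from compliance requirements.
SENSITIVE_TABLES = {"CUSTOMER_SSN", "PATIENT_RESULTS"}
BUSINESS_HOURS = range(8, 18)  # 08:00-17:59


def alerts(events: list[AuditEvent]) -> list[str]:
    """Apply simple fine-grained policies to an audit trail and return alert messages."""
    found = []
    for e in events:
        # Separation of duties: privileged users should not read or change sensitive data.
        if e.role == "DBA" and e.table in SENSITIVE_TABLES and e.action in {"SELECT", "UPDATE"}:
            found.append(f"{e.user}: privileged {e.action} on {e.table}")
        # Any access to sensitive data outside business hours is suspicious.
        elif e.table in SENSITIVE_TABLES and e.timestamp.hour not in BUSINESS_HOURS:
            found.append(f"{e.user}: after-hours {e.action} on {e.table}")
        # Entitlement changes always go to the sign-off queue.
        if e.action == "GRANT":
            found.append(f"{e.user}: entitlement change on {e.table}")
    return found


trail = [
    AuditEvent("app_svc", "APP", "SELECT", "CUSTOMER_SSN", datetime(2010, 12, 7, 10, 15)),
    AuditEvent("dba_jim", "DBA", "SELECT", "CUSTOMER_SSN", datetime(2010, 12, 7, 23, 5)),
    AuditEvent("app_svc", "APP", "UPDATE", "PATIENT_RESULTS", datetime(2010, 12, 8, 2, 30)),
    AuditEvent("dba_ann", "DBA", "GRANT", "ORDERS", datetime(2010, 12, 8, 9, 0)),
]

for message in alerts(trail):
    print(message)
```

The routine daytime application read produces no alert, while the privileged read, the after-hours update and the entitlement change each do.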
IBM provides the solutions required to secure and protect data privacy
[Diagram: repeats the Discover & Define, Secure & Protect, and Monitor & Audit requirements from “Requirements to manage the security and protection of data”]
The IBM security strategy: Make security, by design, an enabler of innovative change
• IBM as a trusted partner, delivering secure products and services
• IBM as a trusted security vendor, providing key solutions across all security domains
• 15,000 researchers, developers and SMEs on security initiatives
– Data Security Steering Committee
– Security Architecture Board
– Secure Engineering Framework
• 3,000+ security & risk management patents
• 200+ security customer references and 50+ published case studies
• 40+ years of proven success securing the zSeries environment
• Managing more than 7 billion security events per day for clients
Delivering trusted information for smarter business decisions across your entire information supply chain
[Diagram: information supply chain in which transactional & collaborative applications and external information sources feed Integrate, Manage and Analyze stages (master data, data warehouses, cubes, content, big data, streaming information and business analytics applications), all under Govern: Quality, Lifecycle, Security & Privacy]
Enabling success
IBM Information Governance Unified Process
– Define Business Problem
– Obtain Executive Sponsorship
– Conduct Maturity Assessment
– Build Roadmap
– Establish Organization Blueprint
– Build Data Dictionary
– Understand Data
– Create Metadata Repository
– Appoint Data Stewards
– Create Specialized Centers of Excellence (COE)
– Implement Master Data Management
– Manage Data Quality
– Manage Security & Privacy
– Define Metrics
– Manage Life-cycle
– Measure Results
Legend: steps are marked either “Enable through Process” or “Enable through Technology”
What can you do next?
• Start small with a project; don’t try to do it all at once
– Free workshops and assessments
– Best-of-breed solutions to help you succeed
• Join a movement: www.infogovcommunity.com
– Benchmark your organization online
– Work with others on the Maturity Model
– Compare best practices in online peer reviews
– Be recognized for what you contribute on the leaderboard
• Read the book:
– The IBM Data Governance Unified Process: Driving Business Value with IBM Software and Best Practices
• Visit our web page:
– ibm.com/informationgovernance
Thank you