Shining a Light on Dark Data
Bringing your hidden data to the light.
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
Your Kitchen
Your Kitchen – Well Organized
Baking Pans
Bowls
Pantry
Coffee Cups
Drinking Glasses
Dinner Plates
Flat Ware
Serving Utensils
Cups & Saucers
Pots & Pans
Your Business Organization
Your Organization – Well Organized?
SharePoint
Shared Drives
Records
Business App
Emails
HR
Meeting Minutes
Business App
Business Unit A
Finance
Business App
Agenda Mgmt
Business Unit B
The Kitchen Drawer
My Kitchen Drawer
Informed Business Decision
Keep or (Defensibly) Delete
Object                    Keep   Trash   ReLocate
Hurricane shutter keys    √
Batteries                 √
Screwdriver(s)            √
Box cutter                √
Postage stamps                           √
Money                                    √
Pens, no longer write            √
Glue, frozen cap                 √
Dark Data
Do You Have Any?
Your House – Regularly Scheduled Disposal
Saturday
Wednesday
Bulk
Your Business – Regularly Scheduled Disposal?
Good
Bad
Ugly
Hiding in Plain Sight
Are you overwhelmed by enterprise data?
A majority of an organization’s data is:
• effectively ungoverned
• unmanaged
• some is unseen...
What is Dark Data?
What lies hidden in your enterprise data…the unknown?
Dark Data tends to be:
• Human readable
• Unstructured
• Unindexed
• Un-categorized
• Unmanaged
• Inactive
• Orphaned
Dark Data resides in:
• File servers
• SharePoint
• Email servers
• Desktops
• Mobile Devices
What Is the Size of Your Digital Landfill?
Enterprise Search Engine
The Digital Landfill
Size Does Matter
Just Ask IT
1 Bit = Binary Digit
8 Bits = 1 Byte
1000 Bytes = 1 Kilobyte
1000 Kilobytes = 1 Megabyte
1000 Megabytes = 1 Gigabyte
1000 Gigabytes = 1 Terabyte
1000 Terabytes = 1 Petabyte
1000 Petabytes = 1 Exabyte
1000 Exabytes = 1 Zettabyte
1000 Zettabytes = 1 Yottabyte
1000 Yottabytes = 1 Brontobyte
1000 Brontobytes = 1 Geopbyte
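The unit ladder on this slide can be sketched in a few lines of Python. This is only an illustration, assuming the 1000-based steps shown here; the names `UNITS` and `humanize` are made up for the example and do not come from any product.

```python
# Unit names follow the slide's ladder; each step is a factor of 1000.
UNITS = ["Bytes", "Kilobytes", "Megabytes", "Gigabytes", "Terabytes",
         "Petabytes", "Exabytes", "Zettabytes", "Yottabytes",
         "Brontobytes", "Geopbytes"]


def humanize(num_bytes):
    """Divide a byte count by 1000 until the value drops below 1000."""
    value = float(num_bytes)
    for unit in UNITS[:-1]:
        if value < 1000:
            return f"{value:.1f} {unit}"
        value /= 1000
    # Anything this large is reported in the last unit of the ladder.
    return f"{value:.1f} {UNITS[-1]}"


print(humanize(1_500_000))
```

A helper like this is handy when reporting the size of a file share to non-technical stakeholders.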
Bit: A Bit is the smallest unit of data that a computer uses. It can be used to represent two states of information, such as Yes or No.
Byte: A Byte is equal to 8 Bits. A Byte can represent 256 states of information, for example, numbers or a combination of numbers and letters. 1 Byte
could be equal to one character. 10 Bytes could be equal to a word. 100 Bytes would equal an average sentence.
Kilobyte: A Kilobyte is approximately 1,000 Bytes, actually 1,024 Bytes depending on which definition is used. 1 Kilobyte would be equal to this
paragraph you are reading, whereas 100 Kilobytes would equal an entire page.
Megabyte: A Megabyte is approximately 1,000 Kilobytes. In the early days of computing, a Megabyte was considered to be a large amount of data.
These days with a 500 Gigabyte hard drive on a computer being common, a Megabyte doesn't seem like much anymore. One of those old 3-1/2 inch
floppy disks can hold 1.44 Megabytes or the equivalent of a small book. 100 Megabytes might hold a couple volumes of Encyclopedias. 600
Megabytes is about the amount of data that will fit on a CD-ROM disk.
Gigabyte: A Gigabyte is approximately 1,000 Megabytes. A Gigabyte is still a very common term used these days when referring to disk space or
drive storage. 1 Gigabyte of data is almost twice the amount of data that a CD-ROM can hold. But it's about one thousand times the capacity of a 3-1/2 inch
floppy disk. 1 Gigabyte could hold the contents of about 10 yards of books on a shelf. 100 Gigabytes could hold an entire library floor of
academic journals.
Terabyte: A Terabyte is approximately one trillion bytes, or 1,000 Gigabytes. There was a time that I never thought I would see a 1 Terabyte hard
drive, now one and two terabyte drives are the normal specs for many new computers. To put it in some perspective, a Terabyte could hold about
3.6 million 300 Kilobyte images or maybe about 300 hours of good quality video. A Terabyte could hold 1,000 copies of the Encyclopedia Britannica.
Ten Terabytes could hold the printed collection of the Library of Congress. That's a lot of data.
Petabyte: A Petabyte is approximately 1,000 Terabytes or one million Gigabytes. It's hard to visualize what a Petabyte could hold. 1 Petabyte could
hold approximately 20 million 4-door filing cabinets full of text. It could hold 500 billion pages of standard printed text. It would take about 500 million
floppy disks to store the same amount of data.
Exabyte: An Exabyte is approximately 1,000 Petabytes. Another way to look at it is that an Exabyte is approximately one quintillion bytes or one
billion Gigabytes. There is not much to compare an Exabyte to. It has been said that 5 Exabytes would be equal to all of the words ever spoken by
mankind.
Zettabyte: A Zettabyte is approximately 1,000 Exabytes. There is nothing to compare a Zettabyte to but to say that it would take a whole lot of ones
and zeroes to fill it up.
Yottabyte: A Yottabyte is approximately 1,000 Zettabytes. It would take approximately 11 trillion years to download a Yottabyte file from the Internet
using high-power broadband. You can compare it to the World Wide Web, as the entire Internet takes up about a Yottabyte.
Brontobyte: A Brontobyte is (you guessed it) approximately 1,000 Yottabytes. The only thing there is to say about a Brontobyte is that it is a 1
followed by 27 zeroes!
Geopbyte: A Geopbyte is about 1000 Brontobytes! Not sure why this term was created. I'm doubting that anyone alive today will ever see a Geopbyte
hard drive. One way of looking at a geopbyte is 15267 6504600 2283229 4012496 7031205 376 bytes!
Drowning In Information?
Why?
The perception that storage is cheap
Lack of accountability and responsibility
Compliance Requirements
Let’s keep everything forever
Signs Your Organization is Dealing with Dark Data
Fighting conventional wisdom: common challenges and common responses
Common challenge: Running out of capacity
Common response: Let’s add more disks
Common challenge: Applications are slowing down
Common response: Upgrade infrastructure
Common challenge: Backup takes longer and longer
Common response: Change backup infrastructure
Common challenge: Not sure if compliant
Common response: Implement an archive, DMS, RMS
Common challenge: Retaining information longer than needed
Common response: Keep backup tapes, we keep everything forever
Common challenge: Takes a long time to find, retrieve information
Common response: Look into different sources, recover tapes
The Risk of Ignoring Dark Data
Dark data sitting outside an Information Governance strategy
exposes the organization to risk:
• Spiralling costs
– Expanding information footprint and storage costs
– Litigation and eDiscovery costs (“smoking gun” or inability to deliver)
• Security breaches and reputational damage
– Sensitive information unprotected (Personally Identifiable Information, Privacy, HIPAA
regulations)
– Data leakage and misuse
• Poor business execution and performance
– Incorrect context
– Decisions based on outdated information
– Duplicate effort spent re-creating information
Business Value
Dark Data
Dark, Hidden Data
Understanding the Value of Data
Three zones to simplify information management
What To Do About Dark Data?
Is It OK to Delete?
Get Rid of the ROT!
What is ROT?
Redundant, Obsolete, Trivial and Unknown
Dark Data tends to be:
• Redundant
– Duplicates and unauthorized copies
• Obsolete
– No longer in use or out of date
– Determined through creation, last
modified or accessed date and
retention policy
• Trivial
– File type with no content value
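The three ROT tests above can be expressed as a small rule function. The following is a minimal sketch, not a product feature: the extension list, the seven-year retention window, and the record layout (`path`, `sha256`, `modified`) are all assumptions chosen for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical settings: a real programme would take these from its
# retention schedule and its own list of file types with no content value.
TRIVIAL_EXTENSIONS = (".exe", ".tmp", ".dll")
RETENTION = timedelta(days=7 * 365)


def classify_rot(record, seen_hashes, now):
    """Return 'redundant', 'trivial', 'obsolete', or 'keep' for one file.

    record is a dict with 'path', 'sha256', and 'modified' (a datetime).
    seen_hashes is a set of content hashes already encountered.
    """
    if record["sha256"] in seen_hashes:
        return "redundant"          # identical content was already seen
    seen_hashes.add(record["sha256"])
    if record["path"].lower().endswith(TRIVIAL_EXTENSIONS):
        return "trivial"            # file type with no content value
    if now - record["modified"] > RETENTION:
        return "obsolete"           # older than the assumed retention window
    return "keep"
```

The first copy of a file is never flagged as redundant; only later copies with the same hash are.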
When Is It OK to Delete?
Defensible Deletion
Defensible Deletion
The Role of Information Governance
Information Governance
• Capture
• Create
• Keep
• Destroy
• Find
• Store
• Use
• Protect
• Share
People
Programs
Procedures
Policies
Information Governance
Policy driven management of all enterprise data
Establish clear control for creating, accessing,
retaining and disposing of information
Meet compliance, regulatory and
organizational requirements
Manage for cost
Is it Data or Is it Information?
Information Governance brings business users and IT together
Business users: work with information to achieve business objectives.
IT: manages data for the business.
How the business creates, shares, delivers and discovers information will govern where and how IT stores, protects, secures and preserves data over time.
Diagram labels:
BUSINESS (Information): CREATE, COLLABORATE, DELIVER, TRANSACT, DISCOVER
IT (Data): STORE, PROTECT, SECURE, PRESERVE, DISTRIBUTE
INFORMATION GOVERNANCE
Policies & Processes, Organization, Technology
UNDERSTAND, OPTIMIZE
COST, RISK, VALUE
The Role of Technology
Modern Computers Can Process Data
Read text files
View video files
Listen to audio files
Index all
Form a conceptual
understanding of data
Conceptual Understanding
Concept: Dog
Furry, four-legged creature
Man’s best friend
Comes in flavors like Dalmatian, Chihuahua, Great Dane…
Plays fetch
Dark, Hidden Data
Tagged, Organized Information
Stages of Dark Data Clean-Up
Dark data clean-up process design:
Identify and Index
An inventory of your data holdings
Identify
• Identify data sources
– Common repositories include SharePoint, shared drives and Microsoft Exchange
Index
• Metadata-only index (light index)
– Identifies redundant, obsolete and trivial data
– Provides insight into data aging and business relevance
• Metadata and full content index
– Yields greater insight into business value and context
– Identify personally identifiable information (PII)
– Identify potential business records
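A metadata-only (light) index of a file share can be sketched with the Python standard library. This is an illustrative assumption of what such an index might record, not how any particular product builds one; `build_light_index` and its field names are invented for the example, and reading each file whole to hash it is a simplification that would not suit very large files.

```python
import hashlib
import os


def build_light_index(root):
    """Walk root and collect file-level metadata plus a content hash."""
    index = []
    for folder, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            info = os.stat(path)
            with open(path, "rb") as handle:
                digest = hashlib.sha256(handle.read()).hexdigest()
            index.append({
                "path": path,
                "size": info.st_size,          # bytes
                "modified": info.st_mtime,     # seconds since the epoch
                "accessed": info.st_atime,
                "extension": os.path.splitext(name)[1].lower(),
                "sha256": digest,              # equal hashes mean equal content
            })
    return index
```

Size, dates, extension and hash are enough to spot redundant, obsolete and trivial candidates; a full content index would be needed to find PII or potential business records.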
Analyze
Advanced content analytics to provide understanding and context
Identify
• Common content patterns and groupings
• Sensitive information through eduction (PII, HIPAA)
Visualization of statistics and summary reports
• Based on file level metadata and hashes (light index):
– Redundant data: statistics on duplicates
– Trivial data: based on file types with no content value (e.g. *.exe, system files,
thumbnails)
– Obsolete data: based on date created, modified, accessed & policy
• Based on advanced content analysis:
– Clustering of common content patterns,
– Groupings and category matches
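Spotting sensitive information in content can be illustrated with simple pattern matching. The sketch below is a deliberately naive assumption: the two regular expressions (a US Social Security number shape and an email address shape) and the name `find_pii` are examples only, and real PII detection needs far broader rules and validation.

```python
import re

# Illustrative patterns only; they match the *shape* of the data, nothing more.
PII_PATTERNS = {
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def find_pii(text):
    """Return a dict mapping pattern name to the matches found in text."""
    hits = {}
    for name, pattern in PII_PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            hits[name] = matches
    return hits


print(find_pii("Contact jane.doe@example.com, SSN 123-45-6789."))
```

Documents with any hits could be tagged for review rather than deleted outright.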
Analytics
Advanced content analytics to provide understanding and context
Detailed graphs and linked document grid
• Analytical data by:
– size,
– type,
– age,
– user,
– categories and
– custom fields
• Cluster visualization
• Applied Tags
• Duplicates
Screenshot callouts: Volume and growth of data; Type of files; Candidates to file, delete, etc.
Analyze and Auto-Classify
Organize
Preparing for policy assignment
Assign data to categories
• Assess gaps between “actual” and “established” categories and groupings
• Train categories from real data or records management file plan/classifications
– Filtering, sampling & document inspection
• Tag data into actionable groups (categories) based on analysis
Assign policies to tagged categories
• Apply standard RIM policies for disposition or ongoing management
• Workflow policies to route data through an approval process
• Audit logs of policy application and approvals
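Assigning policies to tagged categories amounts to a lookup plus an audit entry. The sketch below assumes a hypothetical policy table (`POLICIES`) with made-up categories, retention periods and actions; real values would come from the organization's RIM file plan. It compares calendar years only, which is a coarse simplification.

```python
from datetime import date

# Hypothetical category-to-policy table, for illustration only.
POLICIES = {
    "invoice": {"retain_years": 7, "action": "delete"},
    "hr_record": {"retain_years": 10, "action": "review"},
}


def assign_policy(item, today, audit_log):
    """Look up the policy for the item's category and record the decision."""
    policy = POLICIES.get(item["category"])
    if policy is None:
        # A gap between "actual" and "established" categories.
        decision = "uncategorized"
    elif today.year - item["created"].year >= policy["retain_years"]:
        decision = policy["action"]     # retention period has elapsed
    else:
        decision = "retain"
    audit_log.append((item["path"], item["category"], decision))
    return decision
```

Every call appends to the audit log, including items that are simply retained, so the log shows that the policy was applied to everything.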
Organize
Preparing for policy assignment
Screenshot callouts: Tag with reason; Actions; Number and size of files; File list or sample list
Reduce
Cut down on the data volume; don’t keep everything forever.
Provide defensible disposition
• Report on items marked for deletion
• Seek approval from identified owners
• Review and approve workflow
processes
• Execute deletion and de-duplication
of tagged data based on policy
• Maintain audit log for policy
application and execution (defensible
disposition)
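The disposition steps above can be sketched as a routine that deletes only what owners have approved and logs every decision. This is an assumption-laden illustration: the function name `dispose`, the record fields and the log format are invented here, and the `delete` parameter exists so the routine can be exercised as a dry run.

```python
import os
from datetime import datetime, timezone


def dispose(candidates, approvals, audit_log, delete=os.remove):
    """Delete only approved candidates and record every decision.

    candidates: list of dicts with 'path', 'owner', and 'reason'.
    approvals: set of paths the identified owners have signed off on.
    delete: function that removes a path (swap in a stub for a dry run).
    """
    for item in candidates:
        approved = item["path"] in approvals
        if approved:
            delete(item["path"])
        audit_log.append({
            "path": item["path"],
            "owner": item["owner"],
            "reason": item["reason"],
            "action": "deleted" if approved else "held: no approval",
            "timestamp": datetime.now(timezone.utc).isoformat(),
        })
    return audit_log
```

Passing `delete=print` (or any stub) shows what would be removed without touching the file system, which is a sensible first run before executing deletion for real.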
Big Data
Smart Data
Manage/Migrate
The pathway to ongoing information governance
Legacy data clean-up is not just about
deleting redundant, obsolete & trivial data
• Merge valuable legacy data into your current
information governance program.
– Declare as a record, move, secure move, apply a
hold to manage in place.
• Migrate cleaned legacy data between repositories or
tiered storage, e.g. File System to SharePoint,
Exchange to HP Consolidated Archive.
• Move declared legacy data records to the Records
Management system
• Provide lifetime management of new data through
ongoing policy application
Benefits of Managing Dark Data
Make No Mistake About It…
Dark data presents problems in your organization
Storage
Information footprint
• Key initiative: Storage Containment
• “I can’t buy storage until I reduce the amount of information I have”
• “I have content in file shares – I don’t know what it is”
• “SharePoint is prolific, I have many inactive sites costing me money”
Needle in the haystack
• Employees waste too much time “looking” for stuff
• “E-Discovery is costing an arm and a leg”
Compliance requirements
• “I don’t know what’s lurking in my file shares”
• “I have no way to determine my important business content”
• How do I know that I am retaining records in accordance with retention schedules?
• When can I legally and defensibly destroy data?
The Benefits of Dark/Legacy Data Clean-up
It’s not just about reducing risk
Understanding your data allows you to exploit opportunities
and realize benefits:
• Cost savings from defensible disposition of legacy and dark data
– Reduce information footprint and storage costs
– Reduce management overhead (back-up & recovery, system
maintenance)
– Reduce litigation costs (discovery fees, penalties and fines)
• Improved operational efficiencies
– Streamline data management in preparation for the cloud
– Automate processes and reduce errors
• Inform future information governance strategy
– Turn big data into smart data
– Provide insight into current and future business processes
– Gap analysis: Identify “actual” information types and structures,
compare with “established”
Diagram labels: Cost reduction; Risk reduction; Information governance; Efficiency
Managing dark data can deliver significant ROI
Dark Data Clean-up reduces:
• Information footprint
• Storage costs
• Risks
• eDiscovery costs
Chart: Benefits by Category (Reduced Backup; Compliance Management Costs; Reduced Storage Purchase; Reduced Storage Operations). Values shown: $12,168,633; $12,655,378; $8,860,655
Chart: Benefits by Year
Thank you!
Please keep in touch, share your ideas,
concerns, and initiatives with me.
Bill Manago, CRM
Information Governance Solutions Lead
HP, Autonomy
William.manago@hp.com