Shining a Light on Dark Data
Bringing your hidden data to the light.

© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.

Your Kitchen

Your Kitchen – Well Organized
Baking Pans, Bowls, Pantry, Coffee Cups, Drinking Glasses, Dinner Plates, Flat Ware, Serving Utensils, Cups & Saucers, Pots & Pans

Your Business Organization

Your Organization – Well Organized?
SharePoint, Shared Drives, Records, Business App, Emails, HR, Meeting Minutes, Business App, Business Unit A, Finance, Business App, Agenda Mgmt, Business Unit B

The Kitchen Drawer

My Kitchen Drawer

Informed Business Decision
Keep or (Defensibly) Delete
Columns: Object | Keep | Trash | ReLocate
Objects, each marked √ in one column:
• Hurricane shutter keys
• Batteries
• Screwdriver(s)
• Box cutter
• Postage stamps
• Money
• Pens, no longer write
• Glue, frozen cap

Dark Data
Do You Have Any?

Your House – Regularly Scheduled Disposal
Saturday, Wednesday, Bulk
Your Business – Regularly Scheduled Disposal?
Good, Bad, Ugly

Hiding in Plain Sight
Are you overwhelmed by enterprise data?
A majority of an organization’s data is:
• effectively ungoverned
• unmanaged
• some is unseen...

What is Dark Data?
What lies hidden in your enterprise data…the unknown?
Dark Data tends to be:
• Human readable
• Unstructured
• Unindexed
• Un-categorized
• Unmanaged
• Inactive
• Orphaned
Dark Data resides in:
• File servers
• SharePoint
• Email servers
• Desktops
• Mobile Devices

What Is the Size of Your Digital Landfill?
Enterprise Search Engine
The Digital Landfill

Size Does Matter
Just Ask IT
1 Bit = Binary Digit
8 Bits = 1 Byte
1000 Bytes = 1 Kilobyte
1000 Kilobytes = 1 Megabyte
1000 Megabytes = 1 Gigabyte
1000 Gigabytes = 1 Terabyte
1000 Terabytes = 1 Petabyte
1000 Petabytes = 1 Exabyte
1000 Exabytes = 1 Zettabyte
1000 Zettabytes = 1 Yottabyte
1000 Yottabytes = 1 Brontobyte
1000 Brontobytes = 1 Geopbyte

Bit: A Bit is the smallest unit of data that a computer uses. It can be used to represent two states of information, such as Yes or No.

Byte: A Byte is equal to 8 Bits. A Byte can represent 256 states of information, for example, numbers or a combination of numbers and letters. 1 Byte could be equal to one character. 10 Bytes could be equal to a word. 100 Bytes would equal an average sentence.

Kilobyte: A Kilobyte is approximately 1,000 Bytes, or exactly 1,024 Bytes, depending on which definition is used. 1 Kilobyte would be equal to this paragraph you are reading, whereas 100 Kilobytes would equal an entire page.

Megabyte: A Megabyte is approximately 1,000 Kilobytes. In the early days of computing, a Megabyte was considered to be a large amount of data. These days, with a 500 Gigabyte hard drive on a computer being common, a Megabyte doesn't seem like much anymore. One of those old 3-1/2 inch floppy disks can hold 1.44 Megabytes, or the equivalent of a small book. 100 Megabytes might hold a couple of volumes of an encyclopedia. 600 Megabytes is about the amount of data that will fit on a CD-ROM disk.

Gigabyte: A Gigabyte is approximately 1,000 Megabytes. A Gigabyte is still a very common term used these days when referring to disk space or drive storage. 1 Gigabyte of data is almost twice the amount of data that a CD-ROM can hold, and about one thousand times the capacity of a 3-1/2 inch floppy disk. 1 Gigabyte could hold the contents of about 10 yards of books on a shelf. 100 Gigabytes could hold an entire library floor of academic journals.

Terabyte: A Terabyte is approximately one trillion bytes, or 1,000 Gigabytes. There was a time that I never thought I would see a 1 Terabyte hard drive; now one and two Terabyte drives are the normal specs for many new computers. To put it in some perspective, a Terabyte could hold about 3.6 million 300 Kilobyte images or maybe about 300 hours of good quality video. A Terabyte could hold 1,000 copies of the Encyclopedia Britannica. Ten Terabytes could hold the printed collection of the Library of Congress. That's a lot of data.

Petabyte: A Petabyte is approximately 1,000 Terabytes or one million Gigabytes. It's hard to visualize what a Petabyte could hold. 1 Petabyte could hold approximately 20 million 4-drawer filing cabinets full of text. It could hold 500 billion pages of standard printed text.
It would take about 500 million floppy disks to store the same amount of data.

Exabyte: An Exabyte is approximately 1,000 Petabytes. Another way to look at it is that an Exabyte is approximately one quintillion bytes or one billion Gigabytes. There is not much to compare an Exabyte to. It has been said that 5 Exabytes would be equal to all of the words ever spoken by mankind.

Zettabyte: A Zettabyte is approximately 1,000 Exabytes. There is nothing to compare a Zettabyte to but to say that it would take a whole lot of ones and zeroes to fill it up.

Yottabyte: A Yottabyte is approximately 1,000 Zettabytes. It would take approximately 11 trillion years to download a Yottabyte file from the Internet using high-power broadband. You can compare it to the World Wide Web, as the entire Internet takes up almost a Yottabyte.

Brontobyte: A Brontobyte is (you guessed it) approximately 1,000 Yottabytes. The only thing there is to say about a Brontobyte is that it is a 1 followed by 27 zeroes!

Geopbyte: A Geopbyte is about 1,000 Brontobytes! Not sure why this term was created. I'm doubting that anyone alive today will ever see a Geopbyte hard drive. One way of looking at a Geopbyte is 15267 6504 600 2283229 4012496 7031205 376 bytes!

Drowning In Information?
Why?
• The perception that storage is cheap
• Lack of accountability and responsibility
• Compliance requirements
• “Let's keep everything forever”
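
The unit ladder on the "Size Does Matter" slide can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming the decimal (1000x) definition used on the slide and counting a floppy disk as 1.44 million bytes; the names `UNITS`, `bytes_in`, `humanize` and `FLOPPY_BYTES` are illustrative and do not come from the deck. The comparisons in the definitions above are round figures and should be read as orders of magnitude rather than exact counts.

```python
# Unit names in the order given on the "Size Does Matter" slide.
# Assumption: every step is exactly 1000x the previous one (decimal definition).
UNITS = [
    "Byte", "Kilobyte", "Megabyte", "Gigabyte", "Terabyte", "Petabyte",
    "Exabyte", "Zettabyte", "Yottabyte", "Brontobyte", "Geopbyte",
]

# Assumption: a 3-1/2 inch floppy counted as 1.44 million bytes.
FLOPPY_BYTES = 1_440_000


def bytes_in(unit: str) -> int:
    """Number of bytes in one `unit` under the decimal definition."""
    return 1000 ** UNITS.index(unit)


def humanize(n_bytes: int) -> str:
    """Express a byte count in the largest unit that keeps the value at 1 or more."""
    value = float(n_bytes)
    for unit in UNITS:
        # Stop at the first unit where the value drops below 1000,
        # or at the last unit in the ladder.
        if value < 1000 or unit == UNITS[-1]:
            return f"{value:.1f} {unit}s"
        value /= 1000
    raise AssertionError("unreachable: the loop always returns")


def floppies_needed(unit: str) -> int:
    """Whole floppy disks that fit in one `unit`, under the assumptions above."""
    return bytes_in(unit) // FLOPPY_BYTES
```

For example, `bytes_in("Brontobyte")` is a 1 followed by 27 zeroes, matching the slide, and `floppies_needed("Gigabyte")` gives a figure in the hundreds, the same order of magnitude as the "about one thousand" quoted above.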

Signs Your Organization is Dealing with Dark Data
Fighting conventional wisdom: common challenges and common responses
Common challenge | Common response
Running out of capacity | Let’s add more disks
Applications are slowing down | Upgrade infrastructure
Backup takes longer and longer | Change backup infrastructure
Not sure if compliant | Implement an archive, DMS, RMS
Retaining information longer than needed | Keep backup tapes, we keep everything forever
Takes a long time to find, retrieve information | Look into different sources, recover tapes

The Risk of Ignoring Dark Data
Dark data sitting outside an Information Governance strategy exposes the organization to risk:
• Spiralling costs
– Expanding information footprint and storage costs
– Litigation and eDiscovery costs (“smoking gun” or inability to deliver)
• Security breaches and reputational damage
– Sensitive information unprotected (Personally Identifiable Information, Privacy, HIPAA regulations)
– Data leakage and misuse
• Poor business execution and performance
– Incorrect context
– Decisions based on outdated information
– Duplicate effort spent re-creating information

Business Value
Dark Data
Dark, Hidden Data

Understanding the Value of Data
Three zones to simplify information management

What To Do About Dark Data?

Is It OK to Delete?

Get Rid of the ROT!

What is ROT?
Redundant, Obsolete, Trivial and Unknown
Dark Data tends to be:
• Redundant
– Duplicates and unauthorized copies
• Obsolete
– No longer in use or out of date
– Determined through creation, last modified or accessed date and retention policy
• Trivial
– File type with no content value

When Is It OK to Delete?

Defensible Deletion

The Role of Information Governance

Information Governance
People, Programs, Procedures, Policies
• Capture
• Create
• Keep
• Destroy
• Find
• Store
• Use
• Protect
• Share

Information Governance
Policy driven management of all enterprise data
• Establish clear control for creating, accessing, retaining and disposing of information
• Meet compliance, regulatory and organizational requirements
• Manage for cost

Is it Data or Is it Information?
Information Governance brings business users and IT together
Business users: work with information to achieve business objectives.
IT: manages data for the business.
How the business creates, shares, delivers and discovers information will govern where and how IT stores, protects, secures and preserves data over time.
Diagram labels: BUSINESS (Information); IT (Data); INFORMATION GOVERNANCE; VALUE, RISK, COST; Policies & Processes, Organization, Technology; CREATE, COLLABORATE, TRANSACT, DELIVER, DISCOVER, UNDERSTAND, OPTIMIZE, STORE, PROTECT, SECURE, PRESERVE, DISTRIBUTE

The Role of Technology

Modern Computers Can Process Data
• Read text files
• View video files
• Listen to audio files
• Index all
• Form a conceptual understanding of data

Conceptual Understanding
Concept: Dog
• Furry, four-legged creature
• Man’s best friend
• Comes in flavors like Dalmatian, Chihuahua, Great Dane…
• Plays fetch

Dark, Hidden Data

Tagged, Organized Information

Stages of Dark Data Clean-Up
Dark data clean-up process design

Identify and Index
An inventory of your data holdings
Identify
• Identify data sources – common repositories include SharePoint, shared drives and Microsoft Exchange
Index
• Metadata only index (light index)
– Identifies redundant, obsolete and trivial data
– Provides insight into data aging and business relevance
• Metadata and full content index
– Yields greater insight into business value and context
– Identify personally identifiable information (PII)
– Identify potential business records

Analyze
Advanced content analytics to provide understanding and context
Identify
• Common content patterns and groupings
• Sensitive information through education (PII, HIPAA)
Visualization of statistics and summary reports
• Based on file level metadata and hashes (light index):
– Redundant data: statistics on duplicates
– Trivial data: based on file types with no content value (e.g., *.exe, system files, thumbnails)
– Obsolete data: based on date created, modified, accessed & policy
• Based on advanced content analysis:
– Clustering of common content patterns
– Groupings and category matches

Analytics
Advanced content analytics to provide understanding and context
Detailed graphs and linked document grid
• Analytical data by:
– size
– type
– age
– user
– categories and custom fields
• Cluster visualization
• Applied Tags
• Duplicates
Screenshot callouts: Candidates to file, delete, etc.; Volume and growth of data; Type of files

Analyze and Auto-Classify
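
The light-index pass described on the Identify and Index and Analyze slides (duplicates from hashes, trivial data from file types, obsolete data from dates and policy) can be illustrated with a short sketch. This is not the HP product's method; it is a minimal example under stated assumptions: the function name `light_index`, the suffix list `TRIVIAL_SUFFIXES` and the seven-year threshold `OBSOLETE_AFTER_DAYS` are hypothetical, and a real retention policy would supply its own values. The sketch only reports candidates and never deletes anything, since disposition requires review and approval.

```python
import hashlib
import time
from pathlib import Path
from typing import Dict, List, Optional

# Hypothetical policy values, for illustration only.
TRIVIAL_SUFFIXES = {".exe", ".dll", ".tmp"}   # file types assumed to have no content value
OBSOLETE_AFTER_DAYS = 7 * 365                 # assumed retention threshold


def sha256_of(path: Path) -> str:
    """Hash a file's content in chunks so large files are not loaded whole."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def light_index(root: Path, now: Optional[float] = None) -> Dict[str, List[Path]]:
    """Walk `root` and report ROT candidates using file metadata and hashes."""
    now = time.time() if now is None else now
    cutoff = now - OBSOLETE_AFTER_DAYS * 86400
    report: Dict[str, List[Path]] = {"redundant": [], "obsolete": [], "trivial": []}
    first_seen: Dict[str, Path] = {}

    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        # Trivial: decided by file type alone, so no hashing is needed.
        if path.suffix.lower() in TRIVIAL_SUFFIXES:
            report["trivial"].append(path)
            continue
        # Redundant: same content hash as a file already seen.
        digest = sha256_of(path)
        if digest in first_seen:
            report["redundant"].append(path)
        else:
            first_seen[digest] = path
        # Obsolete: last modified before the policy cutoff.
        if path.stat().st_mtime < cutoff:
            report["obsolete"].append(path)
    return report
```

A file can appear under both "redundant" and "obsolete"; the report is meant as input to the tagging and approval steps, not as a deletion list.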

Organize
Preparing for policy assignment
Assign data to categories
• Assess gaps between “actual” and “established” categories and groupings
• Train categories from real data or records management file plan/classifications
– Filtering, sampling & document inspection
• Tag data into actionable groups (categories) based on analysis
Assign policies to tagged categories
• Apply standard RIM policies for disposition or ongoing management
• Workflow policies to route data through an approval process
• Audit logs of policy application and approvals
Screenshot callouts: Tag with reason; Actions; Number and size of files; File list or sample list

Reduce
Cut down on the data volume; don’t keep everything forever.
Provide defensible disposition
• Report on items marked for deletion
• Seek approval from identified owners
• Review and approve workflow processes
• Execute deletion and de-duplication of tagged data based on policy
• Maintain audit log for policy application and execution (defensible disposition)
Diagram labels: Big Data, Smart Data

Manage/Migrate
The pathway to ongoing information governance
Legacy data clean-up is not just about deleting redundant, obsolete & trivial data
• Merge valuable legacy data into your current information governance program.
– Declare as a record, move, secure move, apply a hold to manage in place.
• Migrate cleaned legacy data between repositories or tiered storage, e.g. File System to SharePoint, Exchange to HP Consolidated Archive.
• Move declared legacy data records to the Records Management system
• Provide lifetime management of new data through ongoing policy application

Benefits of Managing Dark Data

Make No Mistake About It…
Dark data presents problems in your organization
Storage: information footprint
• Key initiative: Storage Containment
• “I can’t buy storage until I reduce the amount of information I have”
• “I have content in file shares – I don’t know what it is”
• “SharePoint is prolific, I have many inactive sites costing me money”
Needle in the haystack
• Employees waste too much time “looking” for stuff
• “E-Discovery is costing an arm and a leg”
Compliance requirements
• “I don’t know what’s lurking in my file shares”
• “I have no way to determine my important business content”
• How do I know that I am retaining records in accordance with retention schedules?
• When can I legally and defensibly destroy data?

The Benefits of Dark/Legacy Data Clean-up
It’s not just about reducing risk
Understanding your data allows you to exploit opportunities and realize benefits:
• Cost savings from defensible disposition of legacy and dark data
– Reduce information footprint and storage costs
– Reduce management overhead (back-up & recovery, system maintenance)
– Reduce litigation costs (discovery fees, penalties and fines)
• Improved operational efficiencies
– Streamline data management in preparation for the cloud
– Automate processes and reduce errors
• Inform future information governance strategy
– Turn big data into smart data
– Provide insight into current and future business processes
– Gap analysis: identify “actual” information types and structures, compare with “established”
Diagram labels: Cost reduction, Risk reduction, Information governance, Efficiency

Managing dark data can deliver significant ROI
Dark Data Clean-up reduces:
• Information footprint
• Storage costs
• Risks
• eDiscovery costs
[Charts: Benefits by Category (Reduced Backup, Compliance Management Costs, Reduced Storage Purchase, Reduced Storage Operations) and Benefits by Year; values shown: $12,168,633, $12,655,378, $8,860,655]

Thank you!
Please keep in touch and share your ideas, concerns, and initiatives with me.
Bill Manago, CRM
Information Governance Solutions Lead
HP, Autonomy
William.manago@hp.com