The Data Deluge
“The Growth of Unstructured Data”
Dr Kevin McIsaac, IBRS
www.ibrs.com.au

Overview
- The Impact of Changes in Data Growth Rates
- Exploiting Data Management Technologies
- Taking Control Of E-mail
- Conclusions

© Copyright 2006 IBRS All rights reserved.

The Impact of Changes in Data Growth Rates
- Data growth rates accelerate
- The “unstructured data” tipping point
- How big is the impact?

Data Growth Rates Accelerate
- 92% of all new data is stored on magnetic media, primarily hard disks.
- That data grew about 30% pa between 1999 and 2002.
- Growth is forecast at 60% pa through 2011, i.e., your storage capacity will double every 18 months!
- 2007: First 1TB disk!
[Chart: ANZ Data Growth Rate In Next 12 Months. Source: Computer World/IBRS Data Management Survey]
So What’s New? Data Has Always Grown At High Rates.

The “Unstructured Data” Tipping Point
- What is “Unstructured Data”?
- We have reached a tipping point where
  - More than ½ of all data managed by IT is unstructured
  - Merrill Lynch estimates 85% of business data is unstructured
  - Some of your largest data sets are unstructured, e.g., e-mail
- Unstructured data growth rate of 65%-200%
- But, 38% of ITOs lack a document management system
[Chart: % of Unstructured Data. Source: Computer World/IBRS Data Management Survey]
Data Management Was Traditionally About Managing Structured Data. This Focus Needs to Change.

How Big Is The Impact?
- Office workers spend an average of 9.5 hr/wk searching, gathering and analysing information, with 60% of that on the Internet
- White collar workers spend 30%-40% of their time managing documents
  Sources: Outsell, Gartner
- Our survey highlights
  - Strong concerns with the rate of unstructured data growth
  - Lack of systems to manage this
  - Few concerns with the storage infrastructure
Survey responses (share of respondents agreeing):
  70%  Our unstructured data is growing too rapidly
  65%  We do not have adequate systems to manage our unstructured data
  28%  We don't know our storage costs
  22%  We have problems meeting our compliance requirements
  19%  Our structured data is growing too rapidly
  10%  Provisioning storage takes too long
  10%  We are spending too much on people to manage storage
   4%  We are spending too much on storage hardware
   4%  We are spending too much on storage software
Source: Computer World/IBRS Data Management Survey
IT Must Learn To Manage Unstructured Data As Effectively As It Does Structured Data Today

Exploiting Data Management Technologies
- Advances in Storage Hardware
- Commoditisation of Storage Arrays
- Information Lifecycle Management
- Document Management
- Data Classification
- Disaster Recovery Readiness

Advances in Storage Hardware
- Shugart’s Law: $ per bit of magnetic storage halves every 18 months
  - ~37% pa (10%/Q), recently 50% pa!
  - Flat budget supports ~60% pa growth
- SANs well established & a commodity
  - Fully featured arrays reasonably priced
  - iSCSI taking off as a complement to FC
  - Bolt-on storage virtualisation not gaining traction
- Content Addressable Storage
  - Use for long term archive
  - TCO benefits are in the long term management of data
Shugart’s Law Ensures Drive Costs Are Contained, But What About The System Costs?
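The storage-price figures above can be checked with a little compound-rate arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it assumes smooth compounding, so that a halving period converts to a yearly decline and a constant budget buys capacity in inverse proportion to price. It also checks the earlier claim that 60% pa growth doubles capacity roughly every 18 months.

```python
import math


def annual_price_decline(halving_months: float) -> float:
    """Fraction by which $/bit falls in one year if it halves every `halving_months`."""
    return 1.0 - 0.5 ** (12.0 / halving_months)


def growth_on_flat_budget(annual_decline: float) -> float:
    """Yearly capacity growth that a constant spend can buy at the given price decline."""
    return 1.0 / (1.0 - annual_decline) - 1.0


def doubling_time_months(annual_growth: float) -> float:
    """Months for capacity to double at a compound yearly growth rate."""
    return 12.0 * math.log(2.0) / math.log(1.0 + annual_growth)


if __name__ == "__main__":
    decline = annual_price_decline(18)  # Shugart's Law: halves every 18 months
    print(f"Price decline per year:    {decline:.1%}")
    print(f"Growth a flat budget buys: {growth_on_flat_budget(decline):.1%}")
    print(f"Doubling time at 60% pa:   {doubling_time_months(0.60):.1f} months")
```

Under these assumptions an 18-month halving period works out to about a 37% yearly decline, which lets a flat budget absorb a little under 60% pa growth, and 60% pa growth doubles capacity in just under 18 months. At the more recent 50% pa decline, the same budget would cover 100% pa growth.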
Commodity Storage Arrays
- G1: Monolithic arrays
  - Proprietary & very expensive
- G2: Modular arrays
  - Proprietary with commodity components, moderately expensive
- G3: Commodity based arrays
  - Commodity components, standards based, inexpensive
  - In-box virtualisation for simpler management and lower cost
  - SAS as a high performance, lower cost alternative to FC disk
  - Freely mix SAS and SATA in the same frame
  - Thin provisioning is the next big virtualisation technology
- Potential for new vendors to challenge established players, e.g., Compellent, EqualLogic, 3PAR etc.
Hardware Is Just A Small Part Of The Problem. Data Management Processes Are More Important

Information Lifecycle Management
- Automate the management of your data lifecycle policy
  - Retain, delete, migrate, archive
  - Defining and enforcing policy
- Start with tiered storage
  - Balance price with service levels
- Due to high growth rates focus on unstructured data
- Who sets policy? Who has authority?
  - IT is not the data owner, just the steward!
- Transactional stuff generally OK
- Archival of e-mail and documents
  - Don’t confuse backup & archival!
  - Separate archive from backup
Source: Computer World/IBRS Data Management Survey
While ILM Is The Holy Grail Of Storage Vendors, It Has Not Yet Been Widely Adopted

Document Management
- Document management can eliminate significant wasted time
  - “White collar workers spend 30%-40% of their time managing documents”
  - But, 38% have no DM system and 50% only cover some documents
- Document management needs to include e-mail
  - E-mail is often the largest unstructured data repository
  - But only 12% said document management includes e-mail
Source: Computer World/IBRS Data Management Survey
Document Management and ILM and Archiving Are All Predicated on Data Classification and Policy

Data Classification & Policy
- Only 12% had clear, formal policy.
- Without this:
  - IT can’t act responsibly as a steward
  - ILM is nearly impossible, i.e., no mandate!
  - Data can’t be deleted and archival is difficult
- Few had metadata or taxonomies, which hampers data use and reuse
Survey responses (share of respondents agreeing):
  53%  We have classified some or all of our data.
  35%  IT is a steward, managing data using policies set by the business.
  30%  We have assigned business owners for our data.
  18%  The business has defined the value of our key data.
  12%  We have clear, formal policies for data management.
   6%  We create metadata to help classify data.
   4%  We create taxonomies to help classify data.
Source: Computer World/IBRS Data Management Survey
Businesses Need to Invest in Data Classification & Policy

Disaster Recovery Readiness
- Disaster recovery confidence levels are high, however…
  - 44% said they have not tested their DR plan in the last 12 months.
  - 35% said they had only one limited disaster recovery test in the last 12 months.
[Chart: DR Confidence Levels (Very concerned, Concerned, Confident, Very confident). Source: Computer World/IBRS Data Management Survey]
Without Regular Testing Disaster Recovery Plans Are A Lottery

Taking Control Of E-mail
- The Importance of E-mail
- E-mail Data Management Challenges
- Managing Users’ Mailboxes

The Importance of E-mail
- 80% say e-mail is more important than the telephone.
- 74% said being without e-mail is a greater hardship than losing the telephone.
- A typical business user sends and receives around 600 e-mails per week.
  Sources: META Group, Ferris Research
- The average office worker spends 49 min/day managing e-mail. Upper level managers spend up to 4 hrs/day.
- All that sending & receiving, responding & deleting takes an enormous toll on workplace productivity.
  Source: ePolicy Institute
E-mail Is An Essential Business Tool But E-Mail Data Management Is Still A “Cottage Industry”

E-mail Data Management Challenges
- 57% said managing e-mail was one of their top DM problems
- Top Exchange DM challenges
  - Managing Exchange disaster recovery
  - Managing the size of Message Stores
  - Protecting & searching individual .PST files
  - Restoring individual mailboxes
  - Responding to legal discovery and capturing all email for compliance
  Source: Osterman Research
[Chart: ANZ Forecast E-mail Growth Rates. Source: Computer World/IBRS Data Management Survey]
Managing Users’ Mailboxes Is Key To All These Challenges

Managing Users’ Mailboxes
- The common solution is to use mailbox quotas
  - 40% use PSTs to limit growth but 37% said it caused problems.
- E-mail archival can be a powerful solution but…
  - Only 13% had successfully implemented e-mail archiving
  - Just shift the problem elsewhere
  - Another 13% tried and failed!
  - Needs robust data management policy
- Only 2% implemented an ediscovery/compliance solution!
Source: Computer World/IBRS Data Management Survey
Getting E-mail Under Control Is An Important And Urgent Issue, But Proceed With Great Caution

Conclusions
- We have reached a tipping point, where unstructured data volume and growth exceed that of structured data
- Learn to manage unstructured data as effectively as structured data
- Invest in data classification & policy before applying technology