The Data Deluge
“The Growth of Unstructured Data”
Dr Kevin McIsaac, IBRS
www.ibrs.com.au
Overview




The Impact of Changes in Data Growth Rates
Exploiting Data Management Technologies
Taking Control Of E-mail
Conclusions
© Copyright 2006 IBRS All rights reserved.
The Impact of Changes in Data Growth Rates



Data growth rates accelerate
The “unstructured data” tipping point
How big is the impact?
Data Growth Rates Accelerate



92% of all new data is stored on magnetic media, primarily hard disks.
That data grew about 30% pa between 1999 and 2002.
Growth is forecast at 60% pa through 2011!
    i.e., your storage capacity will double every 18 months!
    2007: First 1TB disk!
[Chart: ANZ Data Growth Rate In Next 12 Months]
Source: Computer World/IBRS Data Management Survey
So What’s New? Data Has Always Grown At High Rates.
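The doubling figure on this slide can be checked with compound-growth arithmetic. The following is a minimal sketch, assuming growth compounds once a year at a constant rate; the function names and the 10 TB starting point are illustrative and do not come from the survey.

```python
import math


def doubling_time_months(annual_growth: float) -> float:
    """Months for capacity to double at a constant compound annual growth rate."""
    if annual_growth <= 0:
        raise ValueError("annual_growth must be positive")
    return 12 * math.log(2) / math.log(1 + annual_growth)


def capacity_after(start_tb: float, annual_growth: float, years: float) -> float:
    """Capacity (TB) after `years` of compound growth from `start_tb`."""
    return start_tb * (1 + annual_growth) ** years


if __name__ == "__main__":
    # Compare the historical 30% pa rate with the forecast 60% pa rate.
    for rate in (0.30, 0.60):
        months = doubling_time_months(rate)
        print(f"{rate:.0%} pa -> doubles every {months:.1f} months")
    # Hypothetical 10 TB estate growing at 60% pa for five years.
    print(f"10 TB at 60% pa for 5 years -> {capacity_after(10, 0.60, 5):.0f} TB")
```

Under these assumptions, 60% pa gives a doubling time of roughly 17.7 months, which agrees with the slide's "every 18 months", while 30% pa gives a little under 32 months.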
The “Unstructured Data” Tipping Point


What is “Unstructured Data”?
We have reached a tipping point where more than ½ of all data managed by IT is unstructured
    Merrill Lynch estimates 85% of business data is unstructured
    Some of your largest data sets are unstructured, e.g., e-mail
    Unstructured data growth rate of 65%-200%
But 38% of ITOs lack a document management system
[Chart: % of Unstructured Data]
Source: Computer World/IBRS Data Management Survey
Data Management Was Traditionally About Managing
Structured Data. This Focus Needs to Change.
How Big Is The Impact?

Office workers spend an average of 9.5 hr/wk searching, gathering and analysing information, with 60% of that on the Internet (Outsell)
White collar workers spend 30%-40% of their time managing documents (Gartner)
Our survey highlights
    Strong concerns with the rate of unstructured data growth
    Lack of systems to manage this
    Few concerns with the storage infrastructure
[Chart: survey responses]
Our unstructured data is growing too rapidly: 70%
We do not have adequate systems to manage our unstructured data: 65%
We don't know our storage costs: 28%
We have problems meeting our compliance requirements: 22%
Our structured data is growing too rapidly: 19%
Provisioning storage takes too long: 10%
We are spending too much on people to manage storage: 10%
We are spending too much on storage hardware: 4%
We are spending too much on storage software: 4%
Source: Computer World/IBRS Data Management Survey
IT Must Learn To Manage Unstructured Data As Effectively
As It Does Structured Data Today
Exploiting Data Management Technologies






Advances in Storage Hardware
Commoditisation of Storage Arrays
Information Lifecycle Management
Document Management
Data Classification
Disaster Recovery Readiness
Advances in Storage Hardware

Shugart’s Law: $ per bit of magnetic storage halves every 18 months
    ~37% pa (10%/Q), recently 50% pa!
    Flat budget supports ~60% pa growth
SANs well established & a commodity
    Fully featured arrays reasonably priced
    iSCSI taking off as a complement to FC
    Bolt-on storage virtualisation not gaining traction
Content Addressable Storage
    Use for long term archive
    TCO benefits are in the long term management of data
Shugart’s Law Ensures Drive Costs Are Contained, But
What About The System Costs?
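The figures quoted for Shugart's Law follow from one another. The sketch below assumes the price per bit falls at a constant compound rate and that the whole budget is spent on raw disk capacity; the function names are illustrative, and the second assumption is exactly what the question about system costs challenges.

```python
def annual_price_decline(halving_months: float = 18.0) -> float:
    """Fractional yearly drop in $ per bit if the price halves every `halving_months`."""
    return 1 - 0.5 ** (12 / halving_months)


def flat_budget_growth(decline: float) -> float:
    """Yearly capacity growth that a flat budget buys at a given yearly price decline."""
    return 1 / (1 - decline) - 1


if __name__ == "__main__":
    decline = annual_price_decline(18)
    print(f"price decline: {decline:.0%} pa")
    print(f"flat budget buys {flat_budget_growth(decline):.0%} pa more capacity")
    # The more recent 50% pa decline quoted on the slide.
    print(f"at 50% pa decline: {flat_budget_growth(0.5):.0%} pa more capacity")
```

Halving every 18 months works out to about a 37% pa price decline, and a flat budget then buys just under 60% more capacity each year, in line with the slide. At a 50% pa decline the same budget would buy twice the capacity each year.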
Commodity Storage Arrays

G1: Monolithic arrays
    Proprietary & very expensive
G2: Modular arrays
    Proprietary with commodity components, moderately expensive
G3: Commodity based arrays
    Commodity components, standards based, inexpensive
In-box virtualisation for simpler management and lower cost
SAS as high performance, lower cost alternative to FC disk
Freely mix SAS and SATA in same frame
Thin provisioning is the next big virtualisation technology
Potential for new vendors to challenge established players
    e.g., Compellent, EqualLogic, 3PAR etc.
Hardware Is Just A Small Part Of The Problem. Data
Management Processes Are More Important
Information Lifecycle Management

Automate the management of your data lifecycle policy
Retain, delete, migrate, archive
Defining and enforcing policy
Start with tiered storage
Balance price with service levels
Due to high growth rates, focus on unstructured data
Who sets policy? Who has authority?
IT is not the data owner, just the steward!
Transactional data is generally OK
Archival of e-mail and documents
Don’t confuse backup & archival!
Separate archive from backup
Source: Computer World/IBRS Data Management Survey
While ILM Is The Holy Grail Of Storage Vendors, It Has
Not Yet Been Widely Adopted
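The four lifecycle actions on this slide (retain, delete, migrate, archive) can be expressed as a small policy table. This is a minimal sketch under stated assumptions: the data classes, the age thresholds and every name in the code are hypothetical, and in practice the periods would be set by the business owner, not by IT.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Policy:
    """Hypothetical per-class lifecycle policy, set by the business owner."""
    migrate_after_days: int   # move to a cheaper storage tier
    archive_after_days: int   # move to the archive (kept separate from backup)
    delete_after_days: int    # end of the retention period


# Illustrative values only; real periods come from business and legal policy.
POLICIES = {
    "email": Policy(90, 365, 7 * 365),
    "document": Policy(180, 730, 10 * 365),
}


def lifecycle_action(data_class: str, last_modified: date, today: date) -> str:
    """Return 'retain', 'migrate', 'archive' or 'delete' for one item."""
    policy = POLICIES.get(data_class)
    if policy is None:
        # Unclassified data has no policy, so the only safe action is to keep it.
        return "retain"
    age = (today - last_modified).days
    if age >= policy.delete_after_days:
        return "delete"
    if age >= policy.archive_after_days:
        return "archive"
    if age >= policy.migrate_after_days:
        return "migrate"
    return "retain"


if __name__ == "__main__":
    today = date(2006, 6, 30)
    samples = [("email", 30), ("email", 400), ("document", 200), ("unknown", 5000)]
    for data_class, age_days in samples:
        action = lifecycle_action(data_class, today - timedelta(days=age_days), today)
        print(f"{data_class:9s} {age_days:5d} days -> {action}")
```

The fallback for unclassified data is a deliberate design choice in this sketch: without a classification and a policy from the data owner, a steward has no authority to delete or move anything.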
Document Management

Document management can eliminate significant wasted time
    “White collar workers spend 30%-40% of their time managing documents”
    But 38% have no DM system and 50% only cover some documents
Document management needs to include e-mail
    E-mail is often the largest unstructured data repository
    But only 12% said document management includes e-mail
Source: Computer World/IBRS Data Management Survey
Document Management and ILM and Archiving Are All
Predicated on Data Classification and Policy
Data Classification & Policy

Only 12% had clear, formal policy. Without this:
    IT can’t act responsibly as a steward
        No mandate!
    ILM is nearly impossible, i.e., data can’t be deleted and archival is difficult.
Few had metadata or taxonomies, which hampers data use and reuse
[Chart: survey responses]
We have classified some or all of our data: 53%
IT is a steward, managing data using policies set by the business: 35%
We have assigned business owners for our data: 30%
The business has defined the value of our key data: 18%
We have clear, formal policies for data management: 12%
We create metadata to help classify data: 6%
We create taxonomies to help classify data: 4%
Source: Computer World/IBRS Data Management Survey
Businesses Need to Invest in Data Classification & Policy
Disaster Recovery Readiness

Disaster recovery confidence levels are high, however…
    44% said they have not tested their DR plan in the last 12 months.
    35% said they had only done a limited disaster recovery test in the last 12 months.
[Chart: DR Confidence Levels; categories: Very concerned, Concerned, Confident, Very confident]
Source: Computer World/IBRS Data Management Survey
Without Regular Testing, Disaster Recovery Plans
Are A Lottery
Taking Control Of E-mail



The Importance of E-mail
E-mail Data Management Challenges
Managing Users’ Mailboxes
The Importance of E-mail

80% say e-mail is more important than the telephone. 74% said being without e-mail is a greater hardship than losing the telephone. (META Group)
A typical business user sends and receives around 600 e-mails per week. (Ferris Research)
The average office worker spends 49 min/day managing e-mail. Upper level managers spend up to 4 hrs/day. All that sending & receiving, responding & deleting takes an enormous toll on workplace productivity. (ePolicy Institute)
E-mail Is An Essential Business Tool But
E-Mail Data Management Is Still A “Cottage Industry”
E-mail Data Management Challenges


57% Said Managing E-mail Was One Of Their Top DM Problems
Top Exchange DM challenges (Osterman Research)
    Managing Exchange disaster recovery
    Managing the size of Message Stores
    Protecting & searching individual .PST files
    Restoring individual mailboxes
    Responding to legal discovery and capturing all e-mail for compliance
[Chart: ANZ Forecast E-mail Growth Rates]
Source: Computer World/IBRS Data Management Survey
Managing Users’ Mailboxes Is Key To All These Challenges
Managing Users’ Mailboxes

The common solution is to use mailbox quotas
    Just shifts the problem elsewhere
    40% use PSTs to limit growth but 37% said it caused problems.
E-mail archival can be a powerful solution but…
    Only 13% had successfully implemented e-mail archiving
    Another 13% tried and failed!
    Needs robust data management policy
Only 2% implemented an e-discovery/compliance solution!
Source: Computer World/IBRS Data Management Survey
Getting E-mail Under Control Is An Important And Urgent
Issue, But Proceed With Great Caution
Conclusions



We have reached a tipping point, where unstructured data volume and growth exceed those of structured data
Learn to manage unstructured data as effectively as structured data
Invest in data classification & policy before applying technology