On Big Data
Patricia Florissi, Ph. D.
VP - Americas & EMEA CTO
April, 2012
© Copyright 2012 EMC Corporation. All rights reserved.
IN 2010 THE DIGITAL UNIVERSE WAS
1.2 ZETTABYTES
1,200,000,000,000,000,000,000
Zetta
Exa
Peta
Tera
Giga
Mega
Kilo
Byte
Source: 2010 IDC Digital Universe Study
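The prefix ladder above can be checked with a few lines of arithmetic. This is a minimal sketch that assumes decimal (SI) prefixes, where each step up is a factor of 1,000; the names PREFIXES and to_bytes are illustrative and do not come from the slide.

```python
from decimal import Decimal

# Prefix ladder from the slide, smallest to largest.
# Assumption: decimal (SI) units, so each step is a factor of 1,000.
PREFIXES = ["Byte", "Kilo", "Mega", "Giga", "Tera", "Peta", "Exa", "Zetta"]


def to_bytes(value: str, prefix: str) -> int:
    """Convert a quantity such as ("1.2", "Zetta") to a whole number of bytes."""
    exponent = 3 * PREFIXES.index(prefix)
    # Decimal avoids the rounding error a float would introduce at this scale.
    return int(Decimal(value) * 10 ** exponent)


digital_universe_2010 = to_bytes("1.2", "Zetta")
print(f"{digital_universe_2010:,} bytes")
```

Passing the value as a string keeps the conversion exact, which matters once the numbers reach 21 digits.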
The Data Deluge This Decade
2009: 0.8 Zettabytes
2020: 35.2 ZB
INFORMATION GROWING 44 TIMES LARGER
WORLDWIDE IT STAFFING WILL GROW BY LESS THAN 50%
Source: IDC Digital Universe Study, sponsored by EMC, 2011
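The figures on this slide are mutually consistent, and a short calculation shows what they imply. This is a sketch under stated assumptions: the growth is treated as smooth compound growth over the 11 years from 2009 to 2020, and staffing is taken at the 50% upper bound quoted on the slide. The variable names are illustrative.

```python
import math

START_ZB = 0.8          # digital universe in 2009, from the slide
GROWTH_FACTOR = 44      # "44 times larger", from the slide
YEARS = 2020 - 2009     # assumption: growth measured over 11 years

end_zb = START_ZB * GROWTH_FACTOR

# Implied compound annual growth rate, assuming smooth exponential growth.
annual_growth = GROWTH_FACTOR ** (1 / YEARS) - 1

# Staffing grows by less than 50%, so data per staff member grows by
# at least GROWTH_FACTOR / 1.5.
STAFF_GROWTH_UPPER_BOUND = 1.5
data_per_staff_factor = GROWTH_FACTOR / STAFF_GROWTH_UPPER_BOUND

print(f"2020 estimate: {end_zb:.1f} ZB")
print(f"Implied annual growth: {annual_growth:.0%}")
print(f"Data per IT staff member grows at least {data_per_staff_factor:.1f}x")
```

The last figure is the point of the slide: information grows far faster than the staff available to manage it.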
From Information Deluge To Big Data
Agenda
• How Did We Get Here?
• What Is Big Data Anyway?
• Does Big Data Matter?
• Who Said It Mattered To Brazil and the USA?
How Did We Get Here?
Waves Of Change
Mainframe
Minicomputer
PC/Microprocessor
Networked/Distributed Computing
You Are Here!
Dramatic x86 Performance Growth
2000% Performance Increase Since 2005
Xeon 3.66 GHz, Single Core 90nm (2005)
Xeon 7040, Dual Core 90nm (2005)
Xeon 7100, Dual Core 65nm (2006)
Xeon 7300, Quad Core 65nm (2007)
Xeon 7400, Six Core 45nm (2008)
Xeon 7500, Eight Core 45nm (2010)
Xeon E7-4800, Ten Core 32nm (2011)
Source: Intel internal OLTP database workload performance estimates as of 15 April 2011. Results have been estimated based on
internal Intel analysis and are provided for informational purposes only. Any difference in system hardware or software design or
configuration may affect actual performance.
Dominant Market Share For x86
x86 As A Percent Of Worldwide Server Shipments
[Chart: x86 Unit Share and x86 Revenue Share, 0% to 100%, 1989 to 2010]
Source: IDC
Flash Fills The Performance Gap
[Chart: operations per second versus latency]
Operations per second: Processors 1,000,000,000s; DRAM Memory 100,000,000s; Flash 100,000s; Hard Disk 100s
Latency axis: Picoseconds, Nanoseconds, Microseconds, Milliseconds, Seconds
Then Came Virtualization
Old World – Physical: Dedicated, Vertical Stacks
New World – Virtual: Dynamic Pools Of Compute & Storage
Enter The Cloud.
Cloud Technologies.
Waves Of Change
Mainframe
Minicomputer
PC/Microprocessor
Networked/Distributed Computing
Cloud Computing
How Is This Related To Big Data?
Now We Are Cloud Ready for The Connected Era!
What’s Driving The Data Deluge?
Social Media: FACEBOOK GROWS BY 250 MILLION PHOTOS / DAY
Smart Grids: READING METERS EVERY 15 MINS. IS 3,000X MORE DATA INTENSIVE
Gene Sequencing: COST TO SEQUENCE ONE GENOME HAS FALLEN FROM $100M IN 2001 TO $10K IN 2011
Video Rendering
Mobile Sensors
Video Surveillance
Medical Imaging
Geophysical Exploration
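Two of the figures on this slide can be sanity-checked with simple arithmetic. The sketch below assumes that the "3,000X" smart-meter claim compares 15-minute readings against one manual reading per month; the slide does not state the baseline, so that is an assumption, as is the 30-day month. All variable names are illustrative.

```python
# Smart meters: one reading every 15 minutes versus an assumed
# baseline of one reading per month (the slide does not give the baseline).
MINUTES_PER_DAY = 24 * 60
DAYS_PER_MONTH = 30                      # assumption: a 30-day month

readings_per_day = MINUTES_PER_DAY // 15
readings_per_month = readings_per_day * DAYS_PER_MONTH

# Gene sequencing: cost per genome in US dollars, from the slide.
cost_2001 = 100_000_000
cost_2011 = 10_000
cost_reduction_factor = cost_2001 // cost_2011

print(f"Readings per month at 15-minute intervals: {readings_per_month:,}")
print(f"Sequencing cost fell by a factor of {cost_reduction_factor:,}")
```

Under that baseline the ratio comes out near the slide's rounded 3,000X figure, and the sequencing cost drop is four orders of magnitude in ten years.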
What Is Big Data Anyway?
Big Data Refers To…
• All Data that comes at high Volume
• All Data that comes at high Velocity
• All Data that comes from a Variety of Sources
• All Data that brings Complexity
• All Data that challenges existing Information Infrastructure Capabilities
• All Data that makes us “Think Different” Today
Big Data Is A Relative Concept
What is Big Today…
May Not Be So Big Tomorrow…
Data Sources Are Expanding
INFORMATION IN THE ENTERPRISE WILL
GROW 50X
IN THE NEXT 10 YEARS
Source: 2011 IDC Digital Universe Study
Big Data Applications
Unstructured Data: Gene Sequencing, Movie Editing, Seismic Study
Semi-Structured Data: Social Media, Clickstream, Productivity
Structured Data: Telco Billing, Retail POS, Sales Forecast
Web Content Storage Services
Hybrid Cloud
The Complexity Of Big Data
Unstructured Data: Gene Sequencing, Movie Editing, Seismic Study
Semi-Structured Data: Social Media, Clickstream, Productivity
Web Content Storage Services
• The service could not have been better
• The service could have been better
• The service could have been better, even if they were dead
Structured Data: Telco Billing, Retail POS, Sales Forecast
Table columns: Last Name, First Name, SSN, DOB, Rate
Rate values: 5, 3, 1
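The three customer comments above illustrate why free text is harder to analyze than a structured Rate column. The sketch below runs a deliberately naive keyword scorer over them. It assumes the ratings 5, 3 and 1 on the slide correspond to the three comments in order, which the slide layout suggests but does not state; POSITIVE_WORDS and naive_score are hypothetical and not part of any real library.

```python
# Hypothetical positive-word lexicon for a naive bag-of-words scorer.
POSITIVE_WORDS = {"better", "good", "great"}

COMMENTS = [
    "The service could not have been better",
    "The service could have been better",
    "The service could have been better, even if they were dead",
]

# Assumption: the slide's Rate values map to the comments in this order.
ASSUMED_RATINGS = [5, 3, 1]


def naive_score(text: str) -> int:
    """Count positive words, ignoring negation, word order and context."""
    words = text.lower().replace(",", " ").split()
    return sum(1 for word in words if word in POSITIVE_WORDS)


scores = [naive_score(comment) for comment in COMMENTS]
for comment, score, rating in zip(COMMENTS, scores, ASSUMED_RATINGS):
    print(f"keyword score={score}  assumed rating={rating}  {comment!r}")
```

Every comment gets the same keyword score because each contains "better" exactly once, while the assumed ratings range from 5 to 1. Telling them apart requires handling negation and context, which is the complexity the slide points at.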
90% OF THE
DIGITAL UNIVERSE IS
UNSTRUCTURED
Source: 2011 IDC Digital Universe Study
Massive Numbers Of Massive Files
Files In The Digital Universe: 500 Quadrillion
Big Data Applications: 5+ TB
Source: 2011 IDC Digital Universe Study, EMC Customers
Record File System IO Performance
[Chart: Single File System; values shown: 1,100,000+, 636,036, 403,326, 190,675]
Record File System Capacity
[Chart: Single File System; values shown: 15 PB, 2 PB, 64 TB, 100 TB]
Source: Vendor Product Specifications
So… All That I Need To Do Is To Manage Big Data, Right?
WRONG!
Big Data Is About
Predictive Analytics
Old Analytic Processes
Administrator Bottleneck
Reactive, Unresponsive
Opaque, No Collaboration
New Analytic Processes Are Different
Self-Service
Iterative, Agile
Transparent, Real Time Collaboration
How Can Big Data Transform Your Business?
New Source of Customer, Product and Operational Insights
Today’s Decision-making | Big Data Decision-making
“Rearview Mirror” hindsight | “Forward-looking” insight
Less than 10% of available data | Exploit all data from diverse sources
Incomplete, disjointed, inaccurate | Real-time, correlated
Big Data Apps Require Big Data Analytics
Your Approach To Business Analytics Must Change
Limited, Pre-Defined → Expansive, Iterative
Slow & Reactive → Agile & Proactive
Limited Insight → Expanded Insight & Correlation
Risky Shadow Repositories → Improved Compliance
Does Big Data Matter?
The White House Big Data R&D Initiative
“The initiative we are launching today promises to transform our ability to use Big Data for
scientific discovery, environmental and biomedical research, education, and national
security.”
White House Big Data Call To Action
March 29th, 2012
From Data To Knowledge To Action
CLOUD TRANSFORMS IT
BIG DATA TRANSFORMS BUSINESS
How Does Big Data Transform Business?
Predictive. Embedded.
Big Data Enables Discovery Of Pre-Salt
Big Data At Every Single Step
Seismic: Pre-stack
Velocity Data
Interpretation
Geologic Model
Navigation
Culture Data
Log Curves
Pressure Data
Seismic: Post-stack
Oil & Gas Going Through Severe Levels of Complexity
Thought Leadership
Bigger data, at Higher quantities, analyzed More scientifically, moving at Wider distances, needing to be Better managed, accessible over Longer lifecycles of time
Innovation Trends:
• Higher precision images
• More measures, more often, more places
• More iterations
• Longer production periods
• More scarcity of even more specialized skills
• Greater collaboration and automation
• More analytical processes over longer
EMC Enables Next Gen Upstream
Upstream: Explore, Develop, Produce
Workflow stages: Seismic Acquisition and Processing; Seismic & Geological Interpretation; Reservoir Modeling; Reservoir Simulation; Reservoir Management
Data products: Pre-stacked Data; Post-stacked Data; Interpreted Image; Modeled Image; Simulation Image; Reservoir Image
EMC Enables Next Gen Interpretation
Seismic & Geological Interpretation
SCIENTIFIC WORKFLOW MANAGEMENT
EMC Defines Next Next Gen Upstream
Explore
Develop
Produce
EMC Next Next Gen Big Data Science
Explore
Develop
Produce
Brazil Develops A Robust Ecosystem
“Intellectual growth should commence at birth and cease only at death.”
UFRJ Campus
Technology Park
EMC R&D Center Rendering: Rev.01
EMC’s Expansion in Brazil
Petrobras R&D Expansion
• EMC to open an R&D Center in the University Technology Park in Rio
– Co-located on the University Campus with Petrobras R&D (CENPES)
– Neighbor to Schlumberger, Landmark/Halliburton & GE
• 30+ Big Data Scientists
– Collaborating with 50 others on Campus
– Partnering with Intel & Cisco
• Research on Oil & Gas Acquisition, Analysis, Collaboration & Visualization of Seismic Data
UFRJ Campus
Technology Park
The Financial Strength To Invest Heavily
2011 Fortune 500 Rank: 152
2011 Revenues: $20B
2011 R&D Investment: $2.3B
2011 R&D As A Percentage Of Revenues: ~11.5%
2011 Free Cash Flow: $4.4B
2011 Total Cash And Investments: $10.8B
Market Value: $54B
All data for calendar year 2011 except 2011 Fortune 500 ranking and market capitalization, which is as of 21 April 2011.
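The R&D percentage follows directly from the two dollar figures on the slide. This is a small sketch using the rounded values as quoted; the variable names are illustrative, and the result inherits the rounding of the inputs.

```python
# Rounded 2011 figures from the slide, in billions of US dollars.
REVENUES_B = 20.0
RD_INVESTMENT_B = 2.3
FREE_CASH_FLOW_B = 4.4

rd_share = RD_INVESTMENT_B / REVENUES_B
fcf_share = FREE_CASH_FLOW_B / REVENUES_B

print(f"R&D as a share of revenues: {rd_share:.1%}")
print(f"Free cash flow as a share of revenues: {fcf_share:.1%}")
```

The first line reproduces the slide's "~11.5%"; the free cash flow ratio is an extra derived figure, not one stated on the slide.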
Our Global Presence
50,000 people
83 countries
[Map of EMC locations, as of June 30, 2011]
Legend: Global Headquarters, Direct Presence, R&D Center, Centers of Excellence, Customer Support Center, Executive Briefing Center, Manufacturing Center, Global Solution and Engineering Center
Locations:
Seattle, WA
Pleasanton, CA
Cork, Ireland
Rotterdam, Netherlands
St. Petersburg, Russia
Burlington, Ontario
Brentford, UK
Pau, France
Durham, NC
Apex, NC
Irvine, CA
Duluth, GA
Hopkinton, MA
Vienna, Austria
Roy, UT
Palo Alto, CA
Santa Clara, CA
Tel Aviv, Israel
Cairo, Egypt
Seoul, S. Korea
Beijing, China
Chengdu, China
Tokyo, Japan
Shanghai, China
Bedford, MA
Franklin, MA
Be'er Sheva, Israel
Bangalore, India
Cambridge, MA
Singapore
Sydney, Australia
Rio de Janeiro, Brazil
Melbourne, Australia
IT
INDUSTRY in TRANSITION
DISRUPTIVE TECHNOLOGY CREATES LASTING CHANGE