Sears - Big Data Paris 2015

Modernizing Business with BIG DATA
Aashish Chandra
Divisional VP, Sears Holdings
Global Head, Legacy Modernization, MetaScale
Big Data fueling Enterprise Agility
• Harvard Business Review cites the Sears Holdings Hadoop use case: "Big Data's Management Revolution"
• "Sears eschews IBM/Oracle for open source and self-build"
• "Sears' Big Data Swap Lesson: Functionality over price?"
• "How banks can benefit from real-time Big Data analytics"
Legacy Rides The Elephant
Hadoop has changed the enterprise big data game. Are you languishing in the past, adopting outdated trends?
Journey to a world with NO mainframes
Drivers: high TCO, inert business practices, mainframe migration, resource crunch.
I. Mainframe Optimization (Optimize)
• 5%-10% MIPS reduction
• Quick wins from low-hanging fruit
II. Mainframe ONLINE (Convert)
• Tool-based conversion
• Convert COBOL & JCL to Java
III. Mainframe BATCH (PiG/Hadoop rewrites)
• ETL modernization
• Move batch processing to Hadoop
Outcomes: cost savings, an open-source platform, simpler and easier code, business agility, business & IT transformation, modernized systems, IT efficiencies.
Why Hadoop and Why Now?
THE ADVANTAGES:
• Cost reduction
• Alleviates performance bottlenecks
• Replaces ETL that has become too expensive and complex
• Moves mainframe and data warehouse processing to Hadoop
THE CHALLENGE:
• Traditional enterprises' lack of awareness
THE SOLUTION:
• Leverage the growing support ecosystem around Hadoop
• Make Hadoop the data hub of the enterprise
• Use Hadoop for batch processing and analytic jobs
The Classic Enterprise Challenge
The challenge: growing data volumes, shortened processing windows, tight IT budgets, latency in data, escalating costs, hitting scalability ceilings, ETL complexity, and demanding business requirements.
The Sears Holdings Approach
Key to our approach:
1) Allowing users to continue to use familiar consumption interfaces
2) Providing inherent high availability
3) Enabling businesses to unlock previously unusable data
The six steps:
1. Implement a Hadoop-centric reference architecture
2. Move enterprise batch processing to Hadoop
3. Make Hadoop the single point of truth
4. Massively reduce ETL by transforming within Hadoop (see the sketch after this list)
5. Move results and aggregates back to legacy systems for consumption
6. Retain, within Hadoop, source files at the finest granularity for re-use
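As a rough illustration of steps 4 through 6, here is a minimal Pig Latin sketch of the pattern: land the source file in Hadoop at full granularity, transform it there instead of in an ETL tool, and store only the aggregate for export back to a legacy consumer. The paths, delimiter, and field names are hypothetical, not Sears' actual schema.

    -- Keep raw detail in Hadoop; transform in place; export only aggregates.
    -- All paths and fields below are invented for illustration.
    raw_sales = LOAD '/data/raw/sales/2015/03' USING PigStorage('|')
        AS (store_id:int, sku:long, sale_date:chararray, qty:int, amount:double);

    -- Transform inside Hadoop: aggregate to the grain the legacy report needs.
    by_store_sku = GROUP raw_sales BY (store_id, sku);
    agg = FOREACH by_store_sku GENERATE
        FLATTEN(group) AS (store_id, sku),
        SUM(raw_sales.qty) AS units,
        SUM(raw_sales.amount) AS revenue;

    -- Only the aggregate leaves Hadoop; the raw file stays for re-use (step 6).
    STORE agg INTO '/data/out/store_sku_agg' USING PigStorage(',');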
The Architecture
• Enterprise solutions using Hadoop must be an eco-system
• Large companies have a complex environment:
• Transactional systems
• Services
• EDW and Data marts
• Reporting tools and needs
• We needed to build an entire solution
The Sears Holdings Architecture
PiG/Hadoop Ecosystem
(Diagram: legacy and modernized application stacks side by side. JQuery/AJAX front ends; J2EE on WebSphere moving to J2EE/JBoss/Spring; Quartz, JAXB, REST APIs, and JDBC/iBATIS in both stacks; data moving from Oracle and DB2 to MySQL and HBase; custom MetaScale products connect enterprise systems; legacy DB2 and Teradata data lands in Hadoop.)
(Diagram: mainframe batch processing, COBOL/JCL over VSAM, moves to Hadoop via Pig, Hive, and Ruby/MapReduce, fronted by JBoss.)
The Learning
Over two years of experience using Hadoop for enterprise legacy workloads.
• We can dramatically reduce batch processing times for the mainframe and EDW
• We can retain and analyze data at a much more granular level, with longer history
• Hadoop must be part of an overall solution and eco-system
• We can reliably meet our production delivery windows by using Hadoop
• We can largely eliminate the use of traditional ETL tools
• New tools allow an improved user experience on very large data sets
• We developed tools and skills; the learning curve is not to be underestimated
• We developed experience in moving workload from expensive, proprietary mainframe and EDW platforms to Hadoop, with spectacular results
Some Examples
Use-Cases at Sears Holdings
The Challenge – Use-Case #1
Scale of the problem:
• Offers: 1.4B SKUs
• Items: 11.3M SKUs
• Sales: 8.9B line items
• Inventory: 1.8B rows
• Stores: 3,200 sites
• Elasticity: 12.6B parameters
• Price sync: daily
• Timing: weekly
The challenge:
• Intensive computational and large storage requirements
• Needed to calculate item price elasticity based on 8 billion rows of sales data (a Pig sketch follows)
• Could only be run quarterly, and on a subset of data; needed more often
• Business need: react to market conditions and new product launches
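A job of this shape maps naturally onto Pig on Hadoop. The sketch below is hypothetical (invented schema and paths): it builds the weekly price/volume observations per store-item pair that an elasticity calculation consumes, and the scan over billions of sales rows parallelizes across the cluster.

    -- Hypothetical input: one row per sales line item.
    sales = LOAD '/data/raw/sales_line_items' USING PigStorage('|')
        AS (store_id:int, sku:long, week:chararray, qty:int, price:double);

    -- One observation per store-item-week: total units at the average price.
    by_store_item_week = GROUP sales BY (store_id, sku, week);
    observations = FOREACH by_store_item_week GENERATE
        FLATTEN(group) AS (store_id, sku, week),
        SUM(sales.qty) AS units_sold,
        AVG(sales.price) AS avg_price;

    STORE observations INTO '/data/model/elasticity_observations';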
The Result – Use-Case #1
Business problem (as above): intensive computation and large storage requirements; store-item price elasticity over 8 billion rows of sales data; runs possible only quarterly, on a subset of data; the business missing the opportunity to react to changing market conditions and new product launches.
With Hadoop:
• Price elasticity calculated weekly
• New business capability enabled
• 100% of the data set, at full granularity
• Meets all SLAs
The Challenge – Use-Case #2
Scale of the problem:
• Data sources: 30+
• Input records: billions
• Mainframe: 100 MIPS consumed on 1% of the data
• Mainframe scalability: unable to scale 100-fold
The challenge:
• Mainframe batch business process would not scale
• Needed to process 100 times more detail to handle business-critical functionality
• Business need required processing billions of records from 30 input data sources
• Complex business logic and financial calculations
• SLA for this cyclic process was 2 hours per run
The Result – Use-Case #2
Business problem (as above): a mainframe batch process that would not scale, needing 100 times more detail to support the rollout of high-value, business-critical functionality; billions of records from 30 input sources; complex business logic and financial calculations; a 2-hour SLA per cyclic run.
With Hadoop:
• Teradata and mainframe data landed on Hadoop
• Implemented Pig for processing (sketched below)
• Java UDFs for the financial calculations
• 6,000 lines of code reduced to 400 lines of Pig
• Processing met a tighter SLA
• Scalable solution delivered in 8 weeks
• $600K annual savings
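The slide names the two pieces of that solution: Pig for the data flow, Java UDFs for the financial calculations. Below is a minimal sketch of how they combine; the jar, class name, and fields are placeholders, not the actual MetaScale code.

    -- Register a (hypothetical) jar of Java UDFs and give one a short alias.
    REGISTER /apps/udfs/finance-udfs.jar;
    DEFINE NetMargin com.example.finance.NetMargin();

    txns = LOAD '/data/raw/transactions' USING PigStorage('|')
        AS (acct:long, source_sys:chararray, amount:double, fees:double);

    -- The Java UDF carries the complex financial calculation per record.
    with_margin = FOREACH txns GENERATE acct, source_sys,
        NetMargin(amount, fees) AS margin;

    -- Pig handles the large-scale grouping and aggregation around it.
    by_acct = GROUP with_margin BY acct;
    rollup = FOREACH by_acct GENERATE group AS acct,
        SUM(with_margin.margin) AS total_margin;

    STORE rollup INTO '/data/out/account_margin';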
The Challenge – Use-Case #3
Scale of the problem:
• Data storage: mainframe DB2 tables
• Price data: 500M records
• Processing window: 3.5 hours
• Mainframe jobs: 64
The challenge: the mainframe was unable to meet SLAs on growing data volume.
The Result – Use-Case #3
Business problem (as above): the mainframe was unable to meet SLAs on growing data volume.
With Hadoop:
• Source data in Hadoop
• Job runs over 100% faster: now 1.5 hours instead of 3.5
• Maintenance improvement: under 50 lines of Pig code (illustrated below)
• $100K in annual savings
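Part of why 64 mainframe job steps can shrink to under 50 lines is that Pig states the whole flow declaratively. A hypothetical fragment of such a job (invented tables and fields):

    -- Two DB2 extracts landed on HDFS as delimited files.
    prices = LOAD '/data/db2_export/price_changes' USING PigStorage('|')
        AS (sku:long, store_id:int, new_price:double, eff_date:chararray);
    items = LOAD '/data/db2_export/item_master' USING PigStorage('|')
        AS (sku:long, dept:int, status:chararray);

    -- Filter to active items and join: what took several mainframe job
    -- steps (sort, match, merge) is two statements here.
    active = FILTER items BY status == 'A';
    joined = JOIN prices BY sku, active BY sku;

    result = FOREACH joined GENERATE prices::sku, prices::store_id,
        prices::new_price, active::dept;
    STORE result INTO '/data/out/active_price_changes';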
The Challenge – Use-Case #4
The existing flow:
• Teradata via Business Objects batch processing
• Output: .CSV files
• Transformation: on Teradata
• History retained: no
• User experience: unacceptable
• New report development: slow
The challenge:
• Needed to enhance the user experience and the ability to perform analytics on granular data
• Restricted availability of data due to space constraints
• Needed to retain granular data
• Needed Excel-style interaction, with agility, on data sources of 100 million+ records
The Result – Use-Case #4
Business problem (as above): an unacceptable user experience, no retained history, space-constrained data availability, slow new-report development, and a need for Excel-style interaction on data sources of 100 million+ records.
With Hadoop:
• Sourcing data directly into Hadoop
• Over 50 data sources retained in Hadoop
• Redundant storage eliminated
• Transformation moved to Hadoop (sketched below)
• Pig scripts to ease code maintenance
• Granular history retained
• Datameer for additional analytics
• User experience expectations met
• The business's single source of truth
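A hypothetical sketch of the new sourcing path (invented feed and fields): the granular history stays on HDFS as the single source of truth, and the report extract is generated from it as CSV for spreadsheet-style tools such as Datameer.

    -- Granular detail, landed once and retained in full on HDFS.
    pos_detail = LOAD '/data/landed/pos_detail' USING PigStorage('|')
        AS (store_id:int, sku:long, day:chararray, units:int, amount:double);

    -- The report extract is just one view over the retained history.
    by_store_day = GROUP pos_detail BY (store_id, day);
    extract = FOREACH by_store_day GENERATE
        FLATTEN(group) AS (store_id, day),
        SUM(pos_detail.units) AS units,
        SUM(pos_detail.amount) AS revenue;

    -- Comma-delimited output for Excel-style consumption.
    STORE extract INTO '/data/reports/store_daily' USING PigStorage(',');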
Summary of Benefits
Cost Savings:
• Significant reduction in ISV costs and mainframe software license fees
• Open-source platform
• Saved ~$2MM annually within 13 weeks through MIPS optimization efforts
• Reduced 1,000+ MIPS by moving batch processing to Hadoop
Business Agility:
• Ancient systems are no longer a bottleneck for the business
• Faster time to market
• Mission-critical "Item Master" application in COBOL/JCL being converted to Java (JOBOL) by our tool
Transform I.T.:
• Modernized COBOL, JCL, DB2, VSAM, IMS, and more
• Reduced batch processing in COBOL/JCL from over 6 hours to under 10 minutes in Pig Latin on Hadoop
• Moved 7,000 lines of COBOL code to under 50 lines of Pig (see the sketch below)
• Simpler, easily maintainable code
• Massively parallel processing
Skills & Resources:
• Readily available resources and commodity skills
• Access to the latest technologies
• IT operational efficiencies
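To make the "7,000 lines of COBOL to under 50 lines of Pig" claim concrete, here is the kind of sort/summarize step that needs a COBOL program plus JCL on the mainframe but only a few Pig statements on Hadoop. The file layout is a made-up example.

    -- A classic mainframe batch pattern: sort, control-break, summarize.
    orders = LOAD '/data/in/orders' USING PigStorage('|')
        AS (region:chararray, order_id:long, amount:double);

    by_region = GROUP orders BY region;
    totals = FOREACH by_region GENERATE group AS region,
        COUNT(orders) AS order_count,
        SUM(orders.amount) AS revenue;

    ranked = ORDER totals BY revenue DESC;
    STORE ranked INTO '/data/out/region_totals';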
Summary
• Hadoop can revolutionize enterprise workloads and make the business agile
• It can reduce strain on legacy platforms
• It can reduce cost
• It can bring new business opportunities
• It must be an eco-system
• It must be part of an overall data strategy
• The learning curve is not to be underestimated
The Horizon – What do we need next?
• Automation tools and techniques that ease the enterprise integration of Hadoop
• Education for traditional enterprise IT organizations about the possibilities and reasons to deploy Hadoop
• Continued development of a reusable framework for legacy workload migration
For more information, visit:
Legacy Modernization Made Easy!
www.metascale.com
Follow us on Twitter @LegacyModernizationMadeEasy
Join us on LinkedIn: www.linkedin.com/company/metascale-llc
Contact: Kate Kostan
National Solutions
Kate.Kostan@MetaScale.com