Modernizing Business with BIG DATA
Aashish Chandra
Divisional VP, Sears Holdings
Global Head, Legacy Modernization, MetaScale

Big Data Fueling Enterprise Agility
In the press:
• Harvard Business Review cites the Sears Holdings Hadoop use case: "Big Data: The Management Revolution"
• "Sears eschews IBM/Oracle for open source and self-build"
• "Sears' Big Data Swap Lesson: Functionality over price?"
• "How banks can benefit from real-time Big Data analytics"

Legacy Rides the Elephant
Hadoop has changed the enterprise big data game. Are you languishing in the past, clinging to outdated trends?

Journey to a World with NO Mainframes
I. Mainframe Optimization – Optimize (pain point: high TCO → cost savings)
• 5%–10% MIPS reduction
• Quick wins from low-hanging fruit
II. Mainframe ONLINE – Convert (pain point: inert business practices → business agility)
• Tool-based mainframe migration: convert COBOL and JCL to Java
• Open-source platform; simpler, easier code; business and IT transformation
III. Mainframe BATCH – Rewrite (pain point: resource crunch → IT efficiencies)
• Pig/Hadoop rewrites; ETL modernization
• Move batch processing to Hadoop; modernized systems

Why Hadoop and Why Now?
THE ADVANTAGES:
• Cost reduction
• Alleviates performance bottlenecks
• Replaces ETL that has become too expensive and complex
• Moves mainframe and data warehouse processing to Hadoop
THE CHALLENGE:
• Traditional enterprises' lack of awareness
THE SOLUTION:
• Leverage the growing support ecosystem for Hadoop
• Make Hadoop the data hub of the enterprise
• Use Hadoop for processing batch and analytic jobs

The Classic Enterprise Challenge
• Growing data volumes
• Shortened processing windows
• Tight IT budgets
• Latency in data
• Escalating costs
• Hitting scalability ceilings
• ETL complexity
• Demanding business requirements

The Sears Holdings Approach
Key to our approach:
1) Allowing users to continue using familiar consumption interfaces
2) Providing inherent high availability
3) Enabling businesses to unlock previously unusable data
Six steps:
1. Implement a Hadoop-centric reference architecture
2. Move enterprise batch processing to Hadoop
3. Make Hadoop the single point of truth
4. Massively reduce ETL by transforming within Hadoop
5. Move results and aggregates back to legacy systems for consumption
6. Retain, within Hadoop, source files at the finest granularity for re-use
(A Pig sketch of steps 4–6 follows the architecture overview below.)

The Architecture
• Enterprise solutions using Hadoop must be an eco-system
• Large companies have a complex environment:
  – Transactional systems
  – Services
  – EDW and data marts
  – Reporting tools and needs
• We needed to build an entire solution

The Sears Holdings Architecture
[Architecture diagram]

Pig/Hadoop Ecosystem
[Diagram: enterprise sources (Oracle, MySQL, DB2, UDB, Teradata; customer, sales, product, price, and legacy data) feed a MetaScale Hadoop hub running Pig, Hive, HBase, SOLR, and Ruby/MapReduce batch processing alongside mainframe COBOL/JCL and VSAM batch; application tiers run on J2EE/WebSphere and JBoss/Spring with REST APIs, JDBC/iBATIS, Quartz, JAXB, and jQuery/AJAX front ends.]
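Before the lessons learned, a minimal Pig Latin sketch of the "transform within Hadoop" pattern from the approach above. The HDFS paths, delimiter, and field names are hypothetical, not from the deck:

    -- Load the full-granularity source extract retained in HDFS (step 6).
    sales = LOAD '/data/raw/sales_daily' USING PigStorage('|')
            AS (store_id:int, sku:long, sale_date:chararray, qty:int, amount:double);

    -- Transform inside Hadoop instead of in a traditional ETL tool (step 4).
    valid  = FILTER sales BY qty > 0 AND amount > 0.0;
    by_sku = GROUP valid BY (store_id, sku);
    agg    = FOREACH by_sku GENERATE
                 FLATTEN(group)    AS (store_id, sku),
                 SUM(valid.qty)    AS units_sold,
                 SUM(valid.amount) AS revenue;

    -- Ship only the compact aggregate back to legacy consumers (step 5).
    STORE agg INTO '/data/out/sales_agg' USING PigStorage('|');

The raw data never leaves Hadoop; only the small result set travels back to the mainframe or EDW for consumption.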
The Learning
Over two years of experience using Hadoop for enterprise legacy workloads. What we learned about its unique value and implementation:
• We can dramatically reduce batch processing times for the mainframe and EDW
• We can retain and analyze data at a much more granular level, with longer history
• Hadoop must be part of an overall solution and eco-system
• We can reliably meet our production delivery windows by using Hadoop
• We can largely eliminate the use of traditional ETL tools
• New tools allow an improved user experience on very large data sets
• We developed tools and skills – the learning curve is not to be underestimated
• We developed experience moving workloads from expensive, proprietary mainframe and EDW platforms to Hadoop, with spectacular results

Some Examples
Use-cases at Sears Holdings

The Challenge – Use-Case #1
Offers: 1.4B SKUs | Sales: 8.9B line items | Items: 11.3M SKUs | Elasticity: 12.6B parameters | Inventory: 1.8B rows | Stores: 3,200 sites | Timing: weekly | Price sync: daily
• Intensive computational and large storage requirements
• Needed to calculate store-item price elasticity based on 8 billion rows of sales data
• Could only be run quarterly, and only on a subset of the data – needed more often
• Business need: react to market conditions and new product launches

The Result – Use-Case #1
Business problem (recap): quarterly, subset-only elasticity runs left the business unable to react to changing market conditions and new product launches.
With Hadoop:
• Price elasticity calculated weekly
• New business capability enabled
• 100% of the data set, at full granularity
• Meets all SLAs

The Challenge – Use-Case #2
Data sources: 30+ | Input records: billions | Mainframe scalability: unable to scale 100-fold | Mainframe: 100 MIPS on 1% of the data
• Mainframe batch business process would not scale
• Needed to process 100 times more detail to roll out high-value, business-critical functionality
• Time-sensitive business need required processing billions of records from 30 input data sources
• Complex business logic and financial calculations
• SLA for this cyclic process was 2 hours per run

The Result – Use-Case #2
Business problem (recap): a mainframe batch process that could not scale to 100× the detail across 30 sources within a 2-hour SLA.
With Hadoop:
• Teradata and mainframe data moved onto Hadoop
• Implemented Pig for processing; met a tighter SLA
• Java UDFs for the financial calculations
• $600K annual savings
• Scalable solution delivered in 8 weeks
• 6,000 lines of code reduced to 400 lines of Pig
(A sketch of this join-plus-UDF pattern follows Use-Case #3 below.)

The Challenge – Use-Case #3
Data storage: mainframe DB2 tables | Price data: 500M records | Processing window: 3.5 hours | Mainframe jobs: 64
• Mainframe unable to meet SLAs on growing data volume

The Result – Use-Case #3
Business problem (recap): the mainframe could not meet SLAs on growing data volume.
With Hadoop:
• Source data in Hadoop
• Job runs over 100% faster – now 1.5 hours instead of 3.5
• $100K in annual savings
• Maintenance improvement – under 50 lines of Pig code
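To illustrate the Use-Case #2 pattern – heterogeneous extracts joined in Pig with a Java UDF doing the financial math – here is a minimal sketch. The jar, UDF class, paths, and schemas are assumptions for illustration, not the actual Sears code:

    -- Register a hypothetical jar of Java UDFs for the financial calculations.
    REGISTER finance-udfs.jar;
    DEFINE NetValue com.example.finance.NetValue();

    txns  = LOAD '/data/mainframe/transactions' USING PigStorage(',')
            AS (acct_id:long, txn_amt:double, txn_code:chararray);
    rates = LOAD '/data/teradata/rates' USING PigStorage(',')
            AS (txn_code:chararray, rate:double);

    -- Join the mainframe and Teradata extracts inside Hadoop.
    joined = JOIN txns BY txn_code, rates BY txn_code;

    -- Apply the Java UDF per record; Pig parallelizes it across the cluster.
    calc = FOREACH joined GENERATE
               txns::acct_id                        AS acct_id,
               NetValue(txns::txn_amt, rates::rate) AS net_value;

    STORE calc INTO '/data/out/net_values' USING PigStorage(',');

Keeping the business math in a small, unit-testable Java class while Pig handles the data flow is the kind of split that makes a 6,000-to-400-line reduction plausible.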
The Challenge – Use-Case #4
Source: Teradata via Business Objects | Batch output: .CSV files | Transformation: on Teradata | History retained: no | User experience: unacceptable | New report development: slow
• Needed to enhance the user experience and the ability to perform analytics on granular data
• Availability of data restricted by space constraints
• Needed to retain granular data
• Needed Excel-style interaction, with agility, on data sources of hundreds of millions of records

The Result – Use-Case #4
Business problem (recap): space-constrained Teradata reporting with no retained history, an unacceptable user experience, and slow report development.
With Hadoop:
• Data sourced directly into Hadoop; over 50 data sources retained there
• Redundant storage eliminated
• Transformation moved to Hadoop; Pig scripts ease code maintenance
• Datameer for additional analytics
• Granular history retained
• User experience expectations met
• The business's single source of truth

Summary of Benefits
Cost savings:
• Significant reduction in ISV costs and mainframe software license fees
• Open-source platform
• Saved ~$2MM annually within 13 weeks through MIPS optimization efforts
• Reduced 1,000+ MIPS by moving batch processing to Hadoop
Business agility:
• Ancient systems no longer a bottleneck for the business
• Faster time to market
Transform I.T.:
• Mission-critical "Item Master" application being converted from COBOL/JCL to Java (JOBOL) by our tool
• Modernized COBOL, JCL, DB2, VSAM, IMS, and so on
• Reduced batch processing from over 6 hours in COBOL/JCL to under 10 minutes in Pig Latin on Hadoop
• Simpler, easily maintainable code; massively parallel processing
• Moved 7,000 lines of COBOL to under 50 lines of Pig (see the closing sketch at the end of this deck)
Skills & resources:
• Readily available resources and commodity skills
• Access to the latest technologies
• IT operational efficiencies

Summary
• Hadoop can revolutionize enterprise workloads and make the business agile
• Can reduce strain on legacy platforms
• Can reduce cost
• Can open new business opportunities
• Must be an eco-system
• Must be part of an overall data strategy
• Not to be underestimated

The Horizon – What Do We Need Next?
• Automation tools and techniques that ease the enterprise integration of Hadoop
• Education for traditional enterprise IT organizations about the possibilities and reasons to deploy Hadoop
• Continued development of a reusable framework for legacy workload migration

For more information, visit:
Legacy Modernization Made Easy!
www.metascale.com
Follow us on Twitter: @LegacyModernizationMadeEasy
Join us on LinkedIn: www.linkedin.com/company/metascale-llc
Contact: Kate Kostan, National Solutions, Kate.Kostan@MetaScale.com
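Closing Sketch – COBOL/JCL to Pig
To make the "7,000 lines of COBOL to under 50 lines of Pig" benefit concrete, here is a minimal, hypothetical sketch of how a multi-step mainframe batch (JCL sort/merge steps plus COBOL match and control-break summarization) collapses into a few Pig statements. The file layouts and paths are illustrative only:

    items = LOAD '/data/item_master' USING PigStorage('|')
            AS (sku:long, dept:int, descr:chararray);
    inv   = LOAD '/data/inventory' USING PigStorage('|')
            AS (sku:long, store_id:int, on_hand:int);

    -- One JOIN replaces the JCL sort/merge steps and the COBOL match logic.
    matched = JOIN inv BY sku, items BY sku;

    -- One GROUP/FOREACH replaces the COBOL control-break summarization.
    by_dept = GROUP matched BY items::dept;
    summary = FOREACH by_dept GENERATE
                  group                     AS dept,
                  SUM(matched.inv::on_hand) AS total_on_hand;

    STORE summary INTO '/data/out/dept_inventory' USING PigStorage('|');

Each relational operator subsumes what was a separate job step on the mainframe, which is where most of the line-count reduction in such rewrites comes from.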