RainStor and Dell DX: Online Structured Data Retention for Data Center Consolidation
Presentation for AFPOA, August 26th, 2011
Ramon.chen@rainstor.com – VP Product Management, RainStor
Craig_Warthen@dell.com – Product Marketing, Product Group, Storage

RainStor + Dell DX
Archival solution for Reduction, Retention & on-demand Retrieval of historical (semi) structured data and big machine generated data at 10x Less TCO

Unified Platform for Data Consolidation
Enterprise Information Archiving (EIA) will become a key infrastructure component by 2013, as the archiving of structured data and unstructured content into a single platform emerges. EIA products that support multiple information types are replacing stand-alone application-specific archiving products.
- Enterprise Information Archiving Transforms the Strategy and Approach for Archiving, June 2010 – Gartner: Kenneth Chin, Sheila Childs
DX Object Storage Platform
• CONSOLIDATE: All Your Static Data on one massively scalable repository with no complexity
• STORE BOTH: Structured and unstructured

Active Transactional Data Become A Fraction Of Total Storage Over Time
Majority becomes Historical Data over time or even all historic when no longer active
[Chart: active vs. static share of transactional data over time. Axes: Data, Time. Labels: Application Performance; Cost $$$ and PAIN; Active; Static; 10%, 30%, 70%, 90%, 100%]

Automated Data Creation Changes the Mix
Machine Generated or Human Generated Immediately Historical Data
• SEC: store every trade and price in every market forever!
• Facebook: over 1 PB of log data
• Smart Grid will generate over 1 EB of data in the US alone!
• US Telco: 750 TBs of CDR data retained today
[Chart: data over time, Static 100% throughout. Axes: Data, Time. Labels: Application Performance; Costs $$$]

Current Technology Approaches for Long Term Structured Data Retention
• RDBMS
• Warehouse / Store
• Tape
• Dev, Test, DR: Operational copies compound the storage problem

Impact of Traditional or Absent Strategies
More Data, More Infrastructure, More Resources, Higher Costs, More Risk
Business Challenges:
• Flat or Reduced IT Budgets
• Limited IT Resources & Need to focus on core business systems
• Increasing compliance retention periods compound data volume and management issues
• Limited access to larger data sets impacts ability to perform deeper analysis
Technical Challenges:
• Multi-terabytes of structured data in traditional RDBMSs or Files
• Challenging to maintain & high cost to manage legacy systems – just for data access!
• Backup windows aren’t being met.
• Traditional RDBMS systems cannot ingest high volume data and store it
• Requires expert resources to provide significant care and feeding
Overburdened Resources

Online Data Retention Requires New Technology
[Diagram labels: Transactional OLTP; Online Data Retention (OLDR); Static; Machine-Generated Data (MGD); Analytical OLAP]

How We Do It
Reduce
• Size: Massive de-dupe ~97% savings in storage
• Hardware: On low-cost Dell servers and DX Object Storage Platform
• Resources: Without specialist DBA support and less storage management
Retain
• Preserved: Massive record volumes in original form
• Immutable: Tamper proofed with audit trail and WORM
• Configurable: With retention & expiry policies
• Massively Scalable: With no complexity
• Long-Term Preservation: Optimized on object based technology platform with metadata
Retrieve
• Standards: SQL & BI tools via ODBC/JDBC, HTTP
• Performant: Fast queries for large complex data sets
• Flexible: With schema evolution & point-in-time access

Disruptive Technology
• Patented
• Data Reduction through value and pattern de-duplication = Highest rate of compression available
• Fast Queries in stored format without re-inflation = Access via SQL and ODBC/JDBC, any BI tool (e.g. Cognos, Business Objects)
[Figure: sample records illustrating value and pattern de-duplication, e.g. Peter Smith, Pharmaceutical, $40,000 and Paul Brown, Finance, $35,000]

Use case: Application Retirement
Confidential

Application Retirement
1000s of Legacy apps using Oracle, SQL Server, Mainframes etc.
Retire apps and store data in optimized repository
Search/Analytics: Same user searches & reports work. No changes needed.

Current Pains                                             Benefits of Dell Solution
Multiple environments, expensive expert IT resources      All legacy data on single platform
High ongoing HW/SW & maintenance costs                    Reduced maintenance = frees up budgets
Biz users still need occasional but rare access to data   Continued access to data = happy biz users

App Retirement TCO Example
300 Legacy apps (250 GB each) = 75TB
25:1 Compression = 3TB
Saving $5.7m/yr
Retire apps and store data in optimized repository
Search/Analytics: Same user searches & reports work. No changes needed.

          Current                              Dell Solution
OPEX      $6m/yr                               $300k/yr
Storage   75TB * $20k/TB = $1.5m               3TB * $20k = $60k
Servers   300 (1 svr/app) * $10k/svr = $3m     4 shared svrs * $10k = $40k
Admin     1 DBA ($200k) per 4 apps = $1.5m     1 Admin ($200k) for entire solution

Use case: Machine Generated Data Retention

Machine Generated Data Retention
Billions of Human Activity or Machine Auto-generated Records
Dell solution acts as primary repository
- Closed payments/Transactions
- Logs, IP Records
- Facilities Management Sensor Data
- Manufacturing Test, QC
Search: Ability to quickly access data, even as data continues to be ingested.
Current Pains                                Benefits of Dell Solution
Daily volumes outstripping RDBMS capacity    Scalable ingestion, storage & query better than RDBMS
Strict compliance and query latency needs    Configurable expiry/purge, low latency access
Significant $$$ spend to support growth      Lowest cost per retained TB (10x less)

MGD Retention TCO Example
20B WAP logs/day. Retained over 3 Months = 2 Petabytes
20:1 Compression = 100TB
Saving $11m/year
Dell solution acts as primary repository
- WAP Logs
Search: Ability to quickly access data, even as data continues to be ingested.

          Current                      Dell Solution
OPEX      $11.8m/yr                    $800k/yr
Storage   2PB * $5k/TB = $10m          100TB * $5k = $500k
Servers   100 svrs * $10k/svr = $1m    10 svrs * $10k = $100k
Admin     4 Admin ($200k) = $800k      1 Admin = $200k

Retention & Compliance TCO Comparison
[Diagram labels: RainStor; Cloud/Hosting Enabled OR Dell DX OR Other Compute; Dell DX Object Storage]

Solution Overview
A solution optimized for long-term preservation that solves “Big Data” storage problems and enables a better way to retire and archive legacy applications.
Integrated Solution Stack
• Integrated SW and HW for maximum optimization of solution
• Services practice
• Dell storage and servers
• Integrated specialized database
• Dell networking platforms
• Dell cloud infrastructure
Use Cases
• “Big Data” Machine Generated Data
• Application Retirement
• Application Archive (Future)
• Data Warehouse Archive (Future)
[Diagram: solution stack. Data sources: Retired Apps, CDR, Smart Meter, Trade, Network Logs. Access: SQL & BI Analytics. Layers: Services Layer (Consulting, Implementation, Support); OLDR Layer; Storage Layer - DX Object Store]

DX Object Storage Platform
Enterprise-class storage for fixed digital content
Solution approach to tiered storage and archive/content management utilizing a common platform
Manageability
• Non-disruptive and simple HW expansions, technology transitions, and retirement
• Self managing, self healing
• Easy to retrieve data, HTTP API
• Selectable WORM capabilities
Scalability
• Peer-scale architecture
• Scales to petabytes
• Near limitless number of files
• Can scale by as little as 1 node at a time
• X86-based modular architecture
TCO
• Reduce cost of data management by 50%
• X86-based architecture
• Power management features
• No backup infrastructure required

The Digital Object
[Diagram labels: Application, Doc, Object Address, UUID, Metadata, CUSTOM ELEMENTS; file data shown as 101000101010100111010101100010110100…110010]
    HTTP/1.1 200 OK
    Date: Thu, 26 Jun 2008 21:26:34 GMT
    Server: Object Cluster/2.2
    Application-Name: MS Word
    Create-Date: 2008-06-26 21:26:14.687000
    System-Cluster: Internet Demo Cluster
    System-Created: Thu, 26 Jun 2008 21:26:20 GMT
    Content-Disposition: inline; filename=Sports %Segment%20626-08.doc
    Content-Length: 8619354
    Content-type: application/doc
    lifepoint: [Thu, 03 Jul 2015 21:26:14 GMT] reps=2, deletable=True
    lifepoint: [] delete
    Replica-Count: 2
Dell Confidential - Restricted
• Object = Metadata + File Data
• Stored together for life of object
• Metadata:
  – Rich, descriptive data about the data
  – Context persisted over time
  – Enables policy-based management

Simplified Expansion and Technology Refresh
Adding capacity is simple
• Rack & Cable
• Power-up
• No config or provisioning
Refresh is just as easy
• Upgrade without interruption
• Retire node or volume
• Replicates data to another node or volume
Recovery & balancing is automatic
• Continuous data availability
• Load and capacity balanced

Enterprise Ready Solution
[Diagram labels: Import, Query]

Key Benefits
Low TCO
• Less Software, Hardware & People to maintain large data sets
Ease of Use
• Very low admin & no tuning needed
• Peer-scale architecture
• No provisioning or configuring
• Backupless environment
Compliant
• Configurable retention rules
• WORM options
• Auto disposition
Performance
• High ingestion rate
• Fast queries
• Linear scale-out performance
• Optimized Massive Data Compression
Consolidated
• One Database, One Platform
Scalable
• Massive scalability with no complexity
• Big Data volumes
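
Worked check of the TCO examples
The two TCO example slides (App Retirement and MGD Retention) are simple arithmetic, and a short Python sketch can reproduce the totals. All figures are copied from the slides. The helper names (compressed_tb, opex) are hypothetical, and decimal units (1 TB = 1000 GB, 1 PB = 1000 TB) are an assumption. The App Retirement admin cost is entered as the slide's stated $1.5m rather than recomputed from the DBA ratio.

```python
# Sketch only: reproduces the OPEX totals from the two TCO example slides.
# Amounts are US dollars per year; decimal storage units are assumed.
K = 1_000


def compressed_tb(raw_tb, ratio):
    """Stored size in TB after compression at ratio:1."""
    return raw_tb / ratio


def opex(storage, servers, admin):
    """Annual OPEX as the sum of the three line items on the slides."""
    return storage + servers + admin


# Application retirement: 300 legacy apps at 250 GB each, 25:1 compression.
app_raw_tb = 300 * 250 / 1000
app_small_tb = compressed_tb(app_raw_tb, 25)
app_current = opex(
    storage=app_raw_tb * 20 * K,    # 75 TB at $20k/TB
    servers=300 * 10 * K,           # one $10k server per app
    admin=1_500 * K,                # taken as stated on the slide
)
app_dell = opex(
    storage=app_small_tb * 20 * K,  # 3 TB at $20k/TB
    servers=4 * 10 * K,             # 4 shared servers
    admin=200 * K,                  # 1 admin for the entire solution
)

# Machine generated data: 2 PB of WAP logs, 20:1 compression.
mgd_raw_tb = 2 * 1000
mgd_small_tb = compressed_tb(mgd_raw_tb, 20)
mgd_current = opex(
    storage=mgd_raw_tb * 5 * K,     # 2 PB at $5k/TB
    servers=100 * 10 * K,
    admin=4 * 200 * K,
)
mgd_dell = opex(
    storage=mgd_small_tb * 5 * K,   # 100 TB at $5k/TB
    servers=10 * 10 * K,
    admin=1 * 200 * K,
)

for name, current, dell in [
    ("App retirement", app_current, app_dell),
    ("MGD retention", mgd_current, mgd_dell),
]:
    print(f"{name}: current ${current:,.0f}/yr, Dell ${dell:,.0f}/yr, "
          f"saving ${current - dell:,.0f}/yr")
```

The totals match the slides: $6m vs. $300k (saving $5.7m/yr) and $11.8m vs. $800k (saving $11m/yr). One caveat on the admin line: 300 apps at one $200k DBA per 4 apps would be 75 DBAs, or $15m, so the stated ratio and the stated $1.5m do not agree. The sketch keeps the slide's $1.5m.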