DB2 User Group Meeting
Big Data, Business Analytics and System z
Shantan Kethireddy, DB2 System z FTSS
shantank@us.ibm.com

Analytics-driven Organizations Can…
– Identify risk …and immediately control it
– Increase system capacity and availability while keeping IT costs flat
– Gain insights into overlapping policies from multiple insurance companies
– Get their reports as much as 70 percent faster

There Is an Explosion in Data and Real-World Events
– 2 billion Internet users by 2011
– 1.3 billion RFID tags in 2005; 30 billion RFID tags by 2010
– Capital market data volumes grew 1,750% from 2003 to 2006
– World Data Centre for Climate: 220 terabytes of Web data and 9 petabytes of additional data
– 4.6 billion mobile phones worldwide
– Twitter processes 7 terabytes of data every day
– Facebook processes 10 terabytes of data every day

eXtreme Analytics
High-volume data arrives from many sources, with auto-correlation and cross-correlation across those sources:
– Online transaction processing systems; clickstream and CRM data
– Claim data (text, picture, video)
– Location tracking (GPS), iPhone, vehicle use data, $ transaction tracking (across borders and IP providers), billions of mobile devices
– Census Bureau data, market data, weather data
– Web buzz data about products and companies (for reputation analysis)
– Sensor data feeds
– Add social networking (reduce the size of online transaction systems)
The information arrives continuously and in high volume: it is evolving, highly auto/cross-variant, and structured, semi-structured, or unstructured, amounting to 100s of TBs or petabytes.
Uses: search, embedded analytics, dashboards, scorecards, financial planning analytics, predictive analytics, correlation, mash-ups.
Deep and wide analytics: fine-grained, down to an individual product and customer at a given time and place.

IBM’s Big Data Platform Vision: Bringing Big Data to the Enterprise
– Client and partner solutions; IBM Big Data solutions
– Big Data user environments for developers, end users, and administrators
– Integration agents
– Big Data enterprise engines:
  – Data Warehouse
  – Warehouse Appliances: Netezza
  – Master Data Management: InfoSphere MDM
  – Database: DB2
  – Content Analytics: ECM
  – Internet-Scale Analytics: open-source foundational components (Hadoop, HBase, Pig, Lucene, Jaql)
  – Information Server
  – Streaming Analytics
  – Business Analytics: Cognos and SPSS
  – Marketing: Unica
  – Data Growth Management: InfoSphere Optim

Insight Analytics: Internet-Scale Analytics
Based on Apache Hadoop and open source:
– Apache Hadoop (HDFS, MapReduce), Jaql (query programming language), Pig, Flume, Hive, Lucene (text search), ZooKeeper (process coordination), Avro (data serialization), HBase (real-time read/write), Oozie
Unique features:
– Improving upon open source: enterprise-scale indexing
– Complex analytics: text analytics
– Enterprise scale: enterprise-class storage and security
– Workflow: orchestration and prioritization
– End-user environments: enterprise console, data explorer
Organizations use BigInsights to process an extreme variety and volume of data, ranging from weather prediction, social media, and multi-channel customer pattern analysis to log analysis across multiple IT systems.
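The MapReduce model at the core of Hadoop splits a job into a map phase that emits key/value pairs, a shuffle that groups those pairs by key, and a reduce phase that aggregates each group. The following is a minimal single-process sketch of that model in Python, applied to counting words in log lines (a toy stand-in for the log-analysis use case above). It does not use Hadoop’s actual Java API; the function names map_phase, shuffle, reduce_phase, and word_count are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map (hypothetical helper): emit (word, 1) for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1

def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group emitted values by key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: aggregate each key's values (here, a sum)."""
    return {key: sum(values) for key, values in groups.items()}

def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Chain the three phases in one process; Hadoop would distribute them."""
    return reduce_phase(shuffle(map_phase(lines)))

if __name__ == "__main__":
    logs = ["disk error on node1", "network error on node2", "disk ok"]
    counts = word_count(logs)
    print(counts["error"], counts["disk"])  # prints: 2 2
```

In a real cluster, the map and reduce functions run in parallel on many nodes and the framework performs the shuffle over the network; the per-phase logic stays this simple.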
Internet-Scale Analytics in Action
– Financial services: improved risk decisions; customer sentiment analysis; anti-money laundering (AML)
– Transportation: impact of weather and traffic on logistics and fuel consumption
– Call centers: voice-to-text mining to understand customer behavior
– Telecommunications: operations and failure analysis from device, sensor, and GPS inputs
– Utilities: analysis of weather impact on power generation; smart meter data analysis
– IT: transaction log analysis across multiple transactional systems
– E-commerce: analysis of Internet behavior and buying patterns
– Digital asset piracy
– Multi-channel integration: integrated customer behavior modeling

Streaming Analytics
Unique features:
– Complex analytics: analysis of structured and unstructured streams (video, audio, geospatial, and other non-relational data)
  • Mining Toolkit to score data models in real time against streaming data
– Fast: clustered runtime for high-performance, extremely low-latency streaming applications
– Enterprise scale: high availability via runtime restart and recovery services
– Scalable
Organizations use streaming analytics for data of extreme velocity and variety, in applications ranging from real-time traffic analysis to predicting stock fluctuations based on the weather.
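Scoring a data model in real time against streaming data means the model is trained offline and then applied to each record as it arrives, without first storing the stream. A minimal Python sketch under stated assumptions: the logistic-regression weights, the feature names (amount, foreign, night), the threshold, and the functions score and score_stream are all hypothetical, and this is not the Mining Toolkit’s API. A sliding window of recent scores illustrates the kind of low-latency running aggregate a streaming runtime maintains.

```python
import math
from collections import deque
from typing import Deque, Dict, Iterable, Iterator, Tuple

# Hypothetical pre-trained model: logistic-regression weights learned offline.
WEIGHTS: Dict[str, float] = {"amount": 0.002, "foreign": 1.5, "night": 0.8}
BIAS = -4.0

def score(record: Dict[str, float]) -> float:
    """Apply the offline model to one record; returns a probability in (0, 1)."""
    z = BIAS + sum(w * record.get(name, 0.0) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))

def score_stream(records: Iterable[Dict[str, float]], window: int = 3,
                 threshold: float = 0.5) -> Iterator[Tuple[float, float, bool]]:
    """Score each record as it arrives.

    Yields (score, mean score over the last `window` records, flagged?).
    Only the last `window` scores are kept, so memory stays constant.
    """
    recent: Deque[float] = deque(maxlen=window)
    for record in records:
        s = score(record)
        recent.append(s)
        yield s, sum(recent) / len(recent), s >= threshold

if __name__ == "__main__":
    events = [{"amount": 50.0}, {"amount": 2000.0, "foreign": 1.0, "night": 1.0}]
    for s, window_mean, flagged in score_stream(events, window=2):
        print(f"score={s:.3f} window_mean={window_mean:.3f} flagged={flagged}")
```

Because score_stream is a generator, it works equally well over an unbounded source; a production runtime would additionally partition the stream across nodes.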
Streaming Analytics in Action
– Stock market: impact of weather on securities prices; analyzing market data at ultra-low latencies
– Natural systems: wildfire management; water management
– Law enforcement, defense and cyber security: real-time multimodal surveillance; situational awareness; cyber security detection
– Transportation: intelligent traffic management
– Fraud prevention: detecting multi-party fraud; real-time fraud prevention
– Manufacturing: process control for microchip fabrication
– e-Science: space weather prediction; detection of transient events; synchrotron atomic research
– Health and life sciences: neonatal ICU monitoring; epidemic early warning system; remote healthcare monitoring
– Other: telephony CDR processing; social analysis; churn prediction; geomapping; smart grid; text analysis; “Who’s talking to whom?”; ERP for commodities; FPGA acceleration

Predictive Analytics: The Power of Social Media to Forecast Auto Sales
– Social media serves as a proxy for people’s opinions (past experiences and current beliefs) about products or services.
– Social media influences people’s buying behavior and, in turn, future sales.
– Social media is therefore a powerful platform for predicting the future.
– COBRA/Cognos Consumer Insight (CCI) harnesses this unstructured information for predictive analytics.
People’s opinions (experiences and beliefs) are reflected in social media, and social media influences people’s buying behavior and future sales; so use social media as a predictive platform.

COBRA (CCI) + SPSS Modeler = Social Predictive Analytics
– Backend (building the system): information extraction from social media, online news, news feeds/wires, targeted websites, lists of blogs and boards, and internal customer data, with persistence and queries
– Frontend (discovery): analyzing influencers, taxonomies, sentiment, and URLs
– COBRA (CCI) ingests targeted social media content, analyzing sentiment and taxonomies.
– SPSS Modeler implements social-media-based prediction models for sales.
– Result: significant correlations between social media measures and auto sales
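The correlation step behind these findings pairs a monthly social-media measure with monthly sales and computes Pearson’s correlation coefficient. A minimal Python sketch, using the sentiment measure M(t) = (P(t) − N(t)) / V(t) (positive minus negative postings, divided by total postings in month t) and its month-over-month change. Assumptions: the function names are hypothetical, each month’s sentiment change is paired with that month’s sales, and the counts and sales figures are made-up toy numbers, not the xyz data. The same monthly_change helper applies to keyword posting counts for the keyword-based analysis.

```python
import math
from typing import List, Sequence

def sentiment_measure(pos: int, neg: int, total: int) -> float:
    """M(t) = (P(t) - N(t)) / V(t) for one month."""
    return (pos - neg) / total

def monthly_change(series: Sequence[float]) -> List[float]:
    """x(t) - x(t-1) for each month after the first (one element shorter)."""
    return [cur - prev for prev, cur in zip(series, series[1:])]

def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's correlation coefficient of two equal-length samples.

    Raises ZeroDivisionError if either sample is constant.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

if __name__ == "__main__":
    # Toy monthly data (made up for illustration).
    pos = [40, 55, 50, 70, 65]
    neg = [30, 25, 35, 20, 30]
    total = [100, 100, 110, 120, 115]
    sales = [900, 980, 940, 1050, 1010]
    m = [sentiment_measure(p, n, v) for p, n, v in zip(pos, neg, total)]
    dm = monthly_change(m)
    # Pair each month's sentiment change with that month's sales (months 2..5).
    r = pearson(dm, sales[1:])
    print(f"r = {r:.3f}")
```

The slides report each coefficient together with p < .05; deciding significance requires a separate test (for example, a t-test on r with n − 2 degrees of freedom), which this sketch leaves out.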
Sentiment Change Correlates with Auto Sales
Inputs from COBRA: the monthly change in sentiment (positive/neutral/negative) about a brand, and the monthly change in frequency (number of postings) of topic keywords about a brand. Input from the company: monthly sales data. Significant correlations are observed between the two.

Finding 1: When the sentiment measure increased (or decreased) from the previous month, auto sales tended to go up (or down).

Sentiment-based correlation analysis: we investigate the correlation between the sales of a target car brand and the sentiment change in social media content related to that brand.

Example postings with positive or negative sentiment:
– “xyz is one of the best selling sedans in the US market.” (positive sentiment)
– “It's hard to know....if the problem is widespread....xyz should fix it. I would probably never buy a new xyz. Today's xyz’s seem over priced, their salesmen act condescending, and well.... truthfully, I want an American car.” (negative sentiment)
– “Hi, I have a 2005 xyz, my question is my rear drum brakes are not self adjusting. Take the car to the shop, have it adjusted and soon after it loosens again.” (negative sentiment)

Results: For car brand xyz (Jan. 2009 to Dec. 2010, monthly sample data), there is a significant correlation between auto sales and sentiment change: Pearson’s correlation coefficient = 0.418 (p < .05).

Note: the sentiment measure M(t) for month t on social media content related to the target car brand is
M(t) = (P(t) − N(t)) / V(t)
where P(t) = number of postings with positive sentiment in month t, N(t) = number of postings with negative sentiment in month t, and V(t) = total number of postings in month t.
The sentiment change for a month is the change in the sentiment measure compared to the previous month: M(t) − M(t − 1).

Predictive Analytics: The Power of Social Media to Predict Auto Sales
© Copyright IBM Corporation 2011

Keyword Frequency Changes Correlate with Auto Sales
Finding 2: When people mentioned terms such as “safety”, “brakes”, “solid”, and “torque” more (or less) than in the previous month, xyz sales tended to go down (or up).

Keyword-based correlation analysis: COBRA text clustering automatically discovers topic keywords in xyz-related social media content. For each keyword, Pearson’s correlation coefficient is computed between its monthly frequency change and xyz sales (Jan. 2009 to Dec. 2010, monthly sample data):
– “safety”: corr = −0.525 (p < .05)
– “brakes”: corr = −0.525 (p < .05)
– “solid”: corr = −0.578 (p < .05)
– “torque”: corr = −0.503 (p < .05)
– …
The keyword frequency change for a month is the change in the number of postings containing the keyword compared to the previous month.

Machine Learning Example: Topic Detection and Evolution
What are people saying on social media about a product?
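Topic detection of this kind is often framed as factorizing a documents × words matrix V into a documents × K-topics matrix W and a K-topics × words matrix H, where each row of H is read as a topic. The slides do not name an algorithm, so the sketch below uses non-negative matrix factorization (NMF) with the Lee–Seung multiplicative updates as one standard choice. It is written in plain Python for self-containment; the function names and the toy posting data are hypothetical.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python product of a (p x q) and b (q x r)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]

def frobenius_error(v: Matrix, w: Matrix, h: Matrix) -> float:
    """Squared reconstruction error ||V - W H||^2."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0])))

def nmf(v: Matrix, k: int, iters: int = 200, seed: int = 0,
        eps: float = 1e-9) -> Tuple[Matrix, Matrix]:
    """Factor V (documents x words) into W (documents x k) and H (k x words)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    # Strictly positive random start; multiplicative updates keep entries non-negative.
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(n)]
    return w, h

def top_words(h: Matrix, vocab: List[str], n: int = 2) -> List[List[str]]:
    """The n highest-weighted words in each topic (row of H)."""
    return [[vocab[j] for j in sorted(range(len(row)), key=lambda j: -row[j])[:n]]
            for row in h]

if __name__ == "__main__":
    vocab = ["brakes", "safety", "price", "dealer"]
    # Toy word counts for four postings (made up for illustration).
    v = [[2.0, 1.0, 0.0, 0.0], [4.0, 2.0, 0.0, 0.0],
         [0.0, 0.0, 1.0, 3.0], [0.0, 0.0, 2.0, 6.0]]
    w, h = nmf(v, k=2)
    print(top_words(h, vocab))
```

For the “evolution” half of the slide title, one option is to fit the factorization per time period (or track each period’s documents in W) and compare how topic weights shift month to month.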
[Figure: the documents × words matrix, stored as sparse (document, word, weight) entries, is factorized into W (documents × K topics) and H (K topics × words).]

We have developed novel technologies, such as IMARS, to automatically recognize semantic categories for diverse visual content. Traditional approaches cover object tracking, face detection, and event composition; IMARS is built on a foundation of large-scale semantic modeling and generalized, visual-feature-based machine learning.
[Figure: example categories, scaling from 10s of specialized categories through 100s, 1K, and 10K to 100K generalized ones. Events and activities: fireworks, parade, earthquake, abandoned bag, flag burning, combat, shoplifting, launch, fire, flood, wreckage. Scenes: bridge, mountains, waterfront, traffic, buildings, cityscape, street scene, monument. People: couple, glasses, face, team photo, person with baby, few, crowd, soldiers. Objects: helicopter, airplane, ferry, police car, vehicle, military ship, humvee, bus, truck.]

DB2 Analytics Accelerator V3: Capitalizing on the Best of Both Worlds, System z and Netezza
What is it?
The IBM DB2 Analytics Accelerator is a workload-optimized appliance add-on that enables the integration of business insights into operational processes to drive winning strategies. It accelerates select queries with unprecedented response times.
How is it different?
– Performance: unprecedented response times enable “train of thought” analyses that are frequently blocked by poor query performance.
– Integration: deep integration with DB2 V9 and V10 provides transparency to all applications.
– Self-managed workloads: queries are executed in the most efficient location.
– Transparency: applications connected to DB2 are entirely unaware of the Accelerator.
– Simplified administration: hands-free appliance operation eliminates most database tuning tasks.
Breakthrough Technology Enabling New Opportunities

DB2 Analytics Accelerator for z/OS
A Netezza appliance connected to System z and accessible only through DB2, blending System z and Netezza technologies to deliver unparalleled mixed-workload performance for complex analytic business needs.
What is the value?
• Fast, predictable response times for “right-time” analysis
• Accelerated analytic and ad hoc query response times
• Improved price/performance for analytic workloads
• Quick ROI
• Ease of deployment
• Minimized need to create data marts for performance
• Highly secure environment for sensitive data analysis
• Transparent to the application and user

OLTP vs. Analytics: Examples
Each example lists OLTP (“transactional”) / Transactional analytics (operational BA) / Deep analytics:
– Withdrawal from a bank account using an ATM / Approve a request to increase a credit line based on credit history and customer profile / Regular reporting to the central bank: sum of transactions by account
– Buying a book at Amazon.com / Propose additional books based on similar purchases by other customers / Which books were bestsellers in Europe over the last 2 months?
– Check-in for a flight at the airport / Offer an upgrade based on the frequent flyer history of all passengers and available seats / Marketing campaign to sell more tickets in off-peak times
– Hand over manufactured printers to an overseas carrier / Optimize shipping by selecting the cheapest and most reliable carrier on demand / Trend of printers sold in emerging countries versus established markets

Business Intelligence Example
Predictive Analytics Example
Data Integration Example