POWERING UP ANALYTICS WITH BIG DATA THE SAS WAY!
Priya Sarathy, Ph.D.
Analytic Sales Consultant, SAS
Copyright © 2012, SAS Institute Inc. All rights reserved.

SALUTE TO THE WORLD RUN BY STATISTICIANS
[Video]

AGENDA
• High Performance Analytics (HPA)
  • Meeting Challenges
  • The What?
  • Understanding the Analytic Paradigm Shift
  • High Performance Analytics: the SAS way
  • What is the business value add?

MEETING CHALLENGES

HIGH PERFORMANCE ANALYTICS: WHAT IS HPA DELIVERING
• What is HPA about?
  • Evolving business needs
• Why does business need it?
  • Leveraging information to compete in the market
  • Raise revenue/profits
  • Reduce costs and inefficiencies

HIGH PERFORMANCE ANALYTICS GREW FROM THE NEED FOR BIG DATA ANALYTICS!
[Figure: quadrant chart. Axes: Analytic Capabilities (Reactive to Proactive) and Data Size (Large to Big). Quadrant labels: BI, Big Data BI, Big Analytics, Big Data Analytics.]

HIGH PERFORMANCE ANALYTICS: HPA IS IMPACTING BUSINESS PERFORMANCE IN MANY AREAS
Areas: Probability of Default on Mortgage; Stress Testing Portfolio; Next Best Offer
• Probability of Default on Mortgage
  • Data Analysis, Variable Selection, Modeling: millions of customers scored in batch
  • Reduce the time to complete all these tasks from 167 hours to 84 seconds!
• Stress Testing Portfolio
  • Market risk solution that simulates market states to derive the value at risk
  • Understand exposures by counterparty/instrument; rapidly respond to crisis and adjust your positions accordingly
  • Recalculate entire risk portfolio in 12 minutes, down from 18 hours!
• Next Best Offer
  • Multiple offers, millions of customers, regional, response history, business rule constraints
  • Optimization across cross-sell and upsell offers can run several hours
  • Speed up computation from 5.5 hours to 2 minutes

HIGH PERFORMANCE ANALYTICS: WHAT CONVERSATIONS ARE YOU INVOLVED IN?
• Fraud Detection
  • More data analyzed for fraud, more quickly and accurately than ever, across all departments from inside a single enterprise data warehouse
  • Trade monitoring (unauthorized trades); commercial fraud (ACH, wire, warranty); customer fraud (payroll, claims fraud)
• Forecasting Inventory Management
  • Multi-level relationships, segments, global markets
  • Accuracy in demand forecasting, daily to weekly forecast updates across several models
  • Promote inventory flow from 24 months by 85%
• Retail Marketing
  • Household targeting, retail bank campaigns, customer acquisition model
  • Data analysis, variable selection, modeling
• Real Time Relationship Marketing
  • Real-time offers: coupons, cross-sell offers
  • Sports retailer: location-based analytics and CLV modeling with real-time updates, pattern and behavioral analysis => 60% increase in response rates
  • Airline operations: 8-10 hours of modeling, lagged data creating suboptimal decisions; faster insights, greater accuracy from multiple iterations, reduced operation cost

WHAT DO OTHERS THINK? DATA MEASUREMENT IS THE MODERN EQUIVALENT OF THE MICROSCOPE*
A 28-year-old assistant
professor at Stanford combined math with political science in his undergraduate and graduate studies, seeing “an opportunity because the discipline is becoming increasingly data-intensive.” His research involves the computer-automated analysis of blog postings, Congressional speeches and press releases, and news articles, looking for insights into how political ideas spread.

It’s not just more streams of data, but entirely new ones: countless digital sensors worldwide in industrial equipment, automobiles, electrical meters and shipping crates measure and communicate location, movement, vibration, temperature, humidity, even chemical changes in the air.

At the World Economic Forum last month in Davos, Switzerland, Big Data was a marquee topic. A report by the forum, “Big Data, Big Impact,” declared data a new class of economic asset, like currency or gold.

* Quote from Professor Brynjolfsson. Source: “The Age of Big Data,” by Steve Lohr, NYT.

THE WHAT?

THE NEW NORMAL: WHAT IS HPA DOING TO ANALYTICS?
The Things you can Think!
• Analyze 100% of data
• More/new variables
• More model iterations
• Manage complex models
• More models (per domain area)
• More questions/ideas/scenarios to evaluate
• Multiple deployment options: batch, real-time
• Continuously monitor model effectiveness and retrain

HIGH PERFORMANCE ANALYTICS: HPA COMBINES THE THREE PILLARS TO DELIVER RESULTS
• Data: Leveraging technology to collect, access and manage data
• Analytics: Adapting to new technology (in-memory, grid, in-database)
• Platform: Positioning analytics within industry leaders' technology solutions
HIGH PERFORMANCE ANALYTICS: ADVANCED ANALYTICS AND FAST COMPUTING CAPABILITIES ARE BROUGHT TOGETHER WITH SAS HPA
• In a recent National Post interview, SAS CEO Jim Goodnight explains it like this: “There's a lot of business processes that will be changing because of the speed at which we can do analytics; using a thousand processes in parallel to do these computations can make it possible to do huge problems that we would never have been able to do before because it would take too long on a single processor.”
• A big part of how HPA gets its speed: it breaks larger problems down into smaller pieces.

HIGH PERFORMANCE ANALYTICS: HPA HELPS REMOVE LIMITATIONS
• From sampling to population analysis
• 50 attributes to 500+ attributes
• Reduce run times from 18 hours to 30 minutes
• Build more complex models
• 3-month lagged modeling to real-time updates
• Structured data to combining unstructured data
• Shortening model lifecycle
• More frequent updates, model iterations
• Real-time scoring impacting business bottom line
You will have more time to think!

UNDERSTANDING THE ANALYTIC PARADIGM SHIFT

MODEL LIFECYCLE: HOW MUCH TIME DO YOU SPEND ON YOUR MODELS?
• Where would you like to spend more time?
[Pie chart]
  Monitoring & Results Reporting: 15%
  Data Analysis: 45%
  Validation & Implementation: 10%
  Model Build: 30%

RESPONSIBILITIES OF AN ANALYST: STATISTICAL MODEL BUILDING PARADIGM SHIFT
• Extract, Transform, Load data
• Data massaging/mining
• Aggregating, normalizing data
• Identifying analytic approach
• Building samples
• Building models
• Creating scoring code
• Validation reports/model documentation
• Implementation for production
• Results monitoring
• Update, refresh, or rebuild model

• IT: shifting responsibilities to
  • EDW/DW
  • Data quality
  • Data integration
  • ODS
  • Production implementation
• Analyst: building models
  • Access to more and better data
  • Need for documentation and transparency
  • Greater number of business solutions
  • Changing market and data dynamics impacting frequency of build and update

MODEL LIFECYCLE: CHANGING ROLES AND RESPONSIBILITIES
• New technology, new tools
• New business processes
• New competitive demands
[Pie chart]
  Monitoring & Results Reporting: 5%
  Data Analysis: 25%
  Validation & Implementation: 10%
  Model Build: 60%

THE FARFALLE MODEL: THE BASIC STRUCTURE OF ANALYTIC FUNCTION
Source: IDC, 2012
• 70% of the effort in analytics is typically on the information management side of the model.
• Analytical teams in the middle are small but crucial for translating the data assets into actionable insights.
• The organization change side highlights the attributes of behavior changes needed by business users.

Working with a Tsunami of data
[Figure labels: Volume, Variety, Velocity, Value; Data Size; Today; The Future]

HIGH PERFORMANCE ANALYTICS - THE SAS WAY
SAS® HIGH-PERFORMANCE ANALYTICS: EMBRACING NEW TECHNOLOGY, BUILDING NEW STRENGTHS
[Figure label: Visual Analytics]

PHYSICAL LAYOUT: SCALABLE ANALYTIC CAPABILITY
[Diagram labels: Client Frame; Mid-Tier; Computing Frame; Data Frame; SAS Metadata Servers; Controller Node (n cores); Node 1, Node 2, ..., Node n; SAS Analytic & Scoring Accelerators; RDBMS; Shared/Clustered File; Hadoop]

HIGH PERFORMANCE ANALYTICS: CHANGING THE WAY ANALYTICS IS DONE BOTTOMS UP
• Data Preparation
  • DS2
  • SORT
• Data Exploration
  • SUMMARY/MEANS
  • FREQ
  • RANK
• Analytics
  • HPLOGISTIC
  • HPREG
  • HPLMIXED
  • HPFOREST
  • HPNEURAL
  • HPREDUCE
  • HPNLIN

SAS® HIGH-PERFORMANCE ANALYTICS SERVER: AREAS OF MODEL DEVELOPMENT THAT BENEFIT
• Predictive Analytics & Data Mining
  • Binary target & continuous number predictions
  • Linear & nonlinear modeling
  • Complex relationships
  • Tree-based classification
• Text Mining
  • Parsing large-scale text collections
  • Extract entities
  • Automatic stemming & synonym detection
  • Topic discovery
• Optimization*
  • Local search optimization
  • Large-scale linear & mixed integer problems
• Econometrics Time Series
  • Probability of an event(s)
  • Severity of random event(s)
* Currently only available for Teradata and EMC Greenplum

IN-MEMORY HIGH PERFORMANCE ANALYTICS
[Figure labels: HPA, VA]

FINANCIAL SERVICES CUSTOMER ACQUISITION USE CASE
Stages: Data Exploration, Model Development, Model Deployment
Current Process
• One algorithm (Neural Network)
• 1 model per day
• 5 hours to process model
• Model lift of 1.6%
High-Performance Process
• Multiple algorithms (e.g. Forest, Logistic Reg., etc.)
• 1 model per 30 minutes
• 3 minutes to process model
• Model lift of 2.5%

84 SECONDS
• Think left and think right and think low and think high. Oh, the thinks you can think up if only you try! Oh the things you can find, if you don't stay behind! Dr. Seuss (On Beyond Zebra!, 1955)

SAS® HIGH-PERFORMANCE ANALYTICS SERVER: KEY DIFFERENTIATORS
• Only in-memory offering in the market delivering high-end analytics, including text mining and optimization
• Addresses the entire model development and deployment lifecycle
• 36 years of proven technology... faster.
Opens up a vast array of possibilities to get value from big data

ADDITIONAL CASE STUDIES

TOP FIVE WAYS HIGH-PERFORMANCE ANALYTICS WILL TRANSFORM MARKETING
• Faster, more sophisticated, effective segmentation
  • Segmentation tests can be run against entire populations in order to determine the best campaign interaction methods.
• Real-time, relevant next-best customer actions or offers
  • This results in a more relevant offer or customer interaction surfacing at the “point of need” in real time.
• 1:1 real-time experiences to bolster brand connections
  • The outcome is more precise, real-time interactions with consumers at the “point of need.”
• Instant deployment and management of marketing models that give you a sustainable advantage
  • companies to quickly and efficiently update their numerous models without submitting a slow overnight batch update process.
• Optimized marketing for broader business impact
  • Now businesses can not only determine the customer and financial impacts of their campaigns faster but also adapt instantaneously to market, competitive and customer changes.

UNITED HEALTHCARE GROUP
“Create Healthier Lives”
BUSINESS ISSUE
• Electronic medical records (EMRs) driving a data explosion
• Utilize all of the unstructured text (records, case notes, emails, transcripts, etc.)
• How to improve quality and cost of care?
SOLUTION
• SAS® High-Performance Analytics Server including HP Text Mining
• Greenplum Data Computing Appliance
RESULTS
• Reduce model processing time from four hours to 10 seconds.
• Reduce misclassification rates from 30% to 10%.
• Historical models improved with more than 10% lift.
• I can now tell that a prescription will harm a patient before you write it…
• I can tell that a customer is dissatisfied before you lose him or her...
• I can now determine that a claim is fraudulent before you pay it…

HEALTHCARE PAYER
“SAS is helping make our member services the best in the industry. In less than one hour, we can load a huge table (169 million row dataset), find the best variables, compare different models and pick the best model.
I would not attempt to model a dataset this large without SAS HPA Server.”
Mark Pitts, Director of Data Science, Solutions and Strategy

SAS HIGH-PERFORMANCE: LEVERAGING DATABASE APPLIANCE FOR HPA
Request is sent to the root node inside the appliance
[Diagram labels: Root Node (Teradata Managed Server); Worker Node 1, Worker Node 2, ..., Worker Node N]

SAS HIGH-PERFORMANCE: ANALYTICAL COMPUTATION AND DATA REQUEST SENT TO THE WORKER NODES
[Diagram labels: Root Node; Worker Node 1, Worker Node 2, ..., Worker Node N]

SAS HIGH-PERFORMANCE: DATA REQUEST SENT TO THE DATABASE. DATA SLICE MOVED INTO MEMORY
[Diagram label: Root Node]

SAS HIGH-PERFORMANCE: ANALYTIC PROCESSING WITH INTERNODE COMMUNICATION
[Diagram label: Root Node]

SAS HIGH-PERFORMANCE: WORKER NODE RETURNED TO THE ROOT NODE. JOB IS COMPLETE.
[Diagram label: Root Node]
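
The root-node and worker-node sequence in the closing slides (request to the root, work sent to the workers, a data slice loaded into each worker's memory, results returned to the root) can be sketched in a few lines of Python. This is an illustrative sketch only, not SAS code and not how SAS HPA Server is implemented: threads stand in for worker nodes, a Python list stands in for the database table, and the function names (split_into_slices, worker_summarize, root_combine, run_job) are hypothetical. Internode communication is left out; each worker simply reduces its own slice to a count, a sum and a sum of squares, and the root combines those into a mean and a population variance.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_slices(rows, n_workers):
    """Root node: give each worker a roughly equal slice of the data."""
    return [rows[i::n_workers] for i in range(n_workers)]


def worker_summarize(data_slice):
    """Worker node: reduce its in-memory slice to three small numbers."""
    count = len(data_slice)
    total = sum(data_slice)
    total_sq = sum(x * x for x in data_slice)
    return count, total, total_sq


def root_combine(partials):
    """Root node: merge the per-worker partial results into one answer."""
    n = sum(p[0] for p in partials)
    s = sum(p[1] for p in partials)
    ss = sum(p[2] for p in partials)
    mean = s / n
    variance = ss / n - mean * mean  # population variance
    return mean, variance


def run_job(rows, n_workers=4):
    """Scatter the slices to the workers, then gather and combine."""
    slices = split_into_slices(rows, n_workers)
    # Threads are only a stand-in for separate worker nodes (an assumption).
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(worker_summarize, slices))
    return root_combine(partials)


if __name__ == "__main__":
    data = [float(x) for x in range(1, 101)]
    mean, variance = run_job(data)
    print(mean, variance)
```

Because each worker returns only three numbers, the amount of data sent back to the root stays the same however large the slices are, which is one way to read "breaks larger problems down into smaller pieces."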