Pushing the Frontiers of Analytics
Brenda Dietrich, IBM Fellow & VP CTO, Business Analytics
© 2012 IBM Corporation

Global Technology Outlook Objectives
– GTO identifies significant technology trends early. It looks for high impact disruptive technologies leading to game changing products and services over a 3-10 year horizon.
– Technology thresholds identified in a GTO demonstrate their influence on clients, enterprises, & industries and have high potential to create new businesses.

Global Technology Outlook 2012
Uncertain data and analytics are major themes
– Managing Uncertain Data at Scale
– Systems of People
– The Future Watson
– Outcome Based Business
– Future of Analytics
– Resilient Business and Services

Managing Uncertain Data at Scale
Trend: Most of the world’s analyzed data will be uncertain
– By 2015, 80% of the world’s data will be uncertain
– Uncertain data management requires new techniques
– These techniques are necessary for real-world Big Data Analytics
Opportunity: Business leadership using Big Data Analytics
– Robust, business-aware uncertain data management
Challenge: Taking Big Data Analytics into an uncertain world
– Analysis of text is highly nuanced; sensor-based data is imprecise
– Use analytics over uncertain web, sensor, and human-generated data
– Enable good business decisions by understanding analysis confidence
– Timely business decisions require efficient large-scale analytics
– It is more difficult to obtain insight about an individual than a group, especially if the source data is uncertain

The fourth dimension of Big Data: Veracity – handling data in doubt
– Volume (Data at Rest): Terabytes to exabytes of existing data to process
– Velocity (Data in Motion): Streaming data, milliseconds to seconds to respond
– Variety (Data in Many Forms): Structured, unstructured, text, multimedia
– Veracity* (Data in Doubt): Uncertainty due to data inconsistency & incompleteness, ambiguities, latency, deception, model approximations
* Truthfulness, accuracy or precision, correctness

Uncertainty arises from many sources
– Process Uncertainty: Processes contain “randomness”
– Data Uncertainty: Data input is uncertain
– Model Uncertainty: All modeling is approximate
[Figure examples: text entry (actual spelling vs. intended spelling); uncertain travel times; fitting a curve to data; GPS uncertainty; testimony; ambiguity ({Paris Airport}); semiconductor yield; contaminated?; rumors; conflicting data ({John Smith, Dallas} vs. {John Smith, Kansas}); forecasting a hurricane (www.noaa.gov)]

By 2015, 80% of all available data will be uncertain
[Chart: Global Data Volume in Exabytes and Aggregate Uncertainty %]
– By 2015 the number of networked devices will be double the entire global population. All sensor data has uncertainty.
– The total number of social media accounts exceeds the entire global population. This data is highly uncertain in both its expression and content.
– Data quality solutions exist for enterprise data like customer, product, and address data, but this is only a fraction of the total enterprise data.
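
Because all sensor data carries uncertainty, a reading is more useful when it travels with an explicit error estimate. The following is a minimal Python sketch, not taken from the slides, of one standard way to combine independent noisy readings of the same quantity: inverse-variance weighting. The `Reading` type, the `fuse` function, and the sample values are hypothetical, and the method assumes independent errors with known standard deviations.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass(frozen=True)
class Reading:
    """A measured value with its 1-sigma uncertainty (hypothetical type)."""
    value: float
    sigma: float  # standard deviation of the measurement error, must be > 0


def fuse(readings):
    """Combine independent readings by inverse-variance weighting.

    More precise readings (smaller sigma) get larger weights, and the
    fused sigma is never larger than the smallest input sigma.
    """
    if not readings:
        raise ValueError("need at least one reading")
    if any(r.sigma <= 0 for r in readings):
        raise ValueError("sigma must be positive")
    weights = [1.0 / (r.sigma ** 2) for r in readings]
    total = sum(weights)
    value = sum(w * r.value for w, r in zip(weights, readings)) / total
    return Reading(value=value, sigma=sqrt(1.0 / total))


if __name__ == "__main__":
    # Two hypothetical sensors measuring the same quantity.
    fused = fuse([Reading(10.0, 2.0), Reading(12.0, 2.0)])
    print(fused)
```

With two equally reliable readings the fused estimate is their average, and its standard deviation shrinks by a factor of the square root of two, which shows how corroborating sources can reduce uncertainty rather than merely report it.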
[Chart time axis: 2005, 2010, 2015. Multiple sources: IDC, Cisco]

Examples: Uncertainty management presents many opportunities

Energy (Smarter Planet: system analytics predict maintenance)
– Downtime costs $M in income loss
– Equipment maintenance needs unpredictable
– Customer contracts impose penalties
– 5% more oil platform production
– 30% less maintenance cost
– Improvements obtained using statistical modeling that combines equipment sensor data with performance history to predict corrective maintenance activities

360˚ customer view (creating profiles from many sources)
– Many inconsistent data sources
– Intent hidden within social media
– Geospatial data is imprecise
– Auto: 35% more satisfied customers by analyzing agent notes
– Telco: 35% better churn prediction using customer SMS messages

Supply chain (process and forecast uncertainty)
– Modeling uncertainties: demand, sales, production, shipment
– Shipping uncertainties: goods damaged, mistakes in shipped goods
– 80% lower price protection costs
– 30% less channel inventory
– 50% fewer returns
– Reductions obtained using an inventory replenishment model that accounts for uncertain price protection

Healthcare (more data from physician notes and tests)
– Structured medical records are incomplete
– “Golden” text notes
– Uncertainty in images must be interpreted
– Drug names
– Relationship types (mtr, sibs, m, paunt)
– Able to identify: Mitral stenosis: 40% more smokers found; 50% more diagnoses; 15% more disease history; 35% misdiagnoses

Also on the slide
– Research
– Reduced time to determine lending risk from weeks to minutes

Condensing data reduces uncertainty by constructing context
– Required: tight integration to maximize context discovery
– Required: common practices followed by multiple standards for representing uncertain data and uncertainty of all types, provenance, and lineage and other metadata
– Required: common APIs to enable sharing across the uncertainty management pipeline
– No such common practices, standards or APIs exist today
[Figure (CONDENSE): customer data (Credit; Loyalty; Michael; San Jose, CA; "Buying DSLR today!"; Influencers; Intent; Customer at Mall; Customer in Store #42; $999; $560; In-Store Pricing and Discounts) is condensed as "data finds data" (Mother; Son; Date; Birthday; $560 OR $999; NY) through fact discovery, spatial reasoning, sense making, temporal reasoning, correlation, corroboration (evidence combination), etc. Caption: Maximum Context For Minimum Uncertainty]

Systems of People
– A shift in value from process optimization to people-centric processes
– A new set of data is made possible by exploiting social business
– A new IT market is emerging
Supporting points:
– Organizations have extracted most of the efficiencies from traditional process automation
– IT enablement opportunities are shifting to Line of Business
– Social business drives new efficiencies and value from people-centric processes
– An opportunity to instrument people-processes
– Provides the basis for addressing a diverse set of problems
– Adaptive social platforms instrumented with knowledge capture, interconnected with enterprise data and processes, and made intelligent through differentiating analytics will transform business

People-centric processes are at the core of a broad range of issues
– Differentiate for Growth: Create winning products, fast, by having the best and most productive knowledge workers
– Drive Sales Productivity: Create superior sales force, drive sales enablement and seller/client alignment
– Grow in Emerging Markets: Re-create organizational footprint in global markets
– Transform Service Delivery: Further grow productivity and enable new delivery models

Optimizing people-centric processes is not the same as optimizing supply chains
[Figure: a status update, "In the last couple of weeks, I’ve talked to ABC bank, XYZ and at a security conference.", annotated with Status: Working; Expert: Security; Status: At conference; Influencer]
“Status updates alone on Facebook amount to more than ten times more words than on all blogs worldwide” - David Kirkpatrick, The Facebook Effect
[Figure: enterprise sources (CRM; Claims; Delivery Records; Patents & Publications) with the attributes they hold: innovation; clients served; work specs; products; products sold; engagements; tasks worked and accomplished; sales patterns; technical leadership; team info; productivity]
– Rich information (e.g. expertise, work patterns, response to incentives, digital reputation) is flowing through on-line collaboration and enterprise systems
– Capturing this information enables analytics to be applied to people-centric processes

Strength of Sales Force Index is an example of what is possible with a rich representation of people
SSFI mines sales force data to understand which attributes of a seller (e.g. skills, experiences), sales team (e.g. team composition, territories) or sales process (e.g.
incentives, coverage model) are driving sales performance (quota attainment, win rates, productivity)
TODAY: Years selling; Job change; Salary band; PBC
FUTURE: True skills and expertise; Disciplines; Clients served; Products sold; Team experiences; Connections; Incentives and responses; Career path; …
SSFI identifies:
– Reasons for performance disparities (at individual or group level), and the best set of actions to drive performance
“Why is our sales force in Region X not performing at par with other regions or competition?”
“What actions can we take to improve sales performance?”
“What are the incentives that truly drive performance?”

Executing on SoP vision depends on three key capabilities
PEOPLE ENABLEMENT
– Incorporate capabilities that adapt content for situations and needs, and enhance communication over many devices, across diverse pools of talent
– context-aware cognitive load management, translation, transcription, text-to-speech, voice…
PEOPLE CONTENT
– Develop capabilities to create a representation of a person’s skills, experiences, preferences, digital reputation…
– In a structured and organized way, so it can be used for the purpose of running a business
PEOPLE ANALYTICS
– Implement capabilities for people-centric process optimization within an analytics platform for rapid, on-demand deployment
– matching, talent cloud crowdsourcing, predictive markets, simulation of workforce trends, performance analytics, behavior modeling…

Future of Analytics
– Explosion of unstructured data
– Consistent, extensible, and consumable analytics platform
– Optimizing across the stack to deploy analytics at scale
Supporting points:
– Creates new analytics opportunities
– Addresses new enterprise needs
– Reduces cost-to-value for enterprises
– Increases analytics solution coverage with limited supply of skills
– Analytics becomes a dominant IT workload and drives HW design
– Opportunity to seamlessly scale from terascale to exascale

Analytics is broadly defined as the use of data and computation to make smart decisions
[Figure labels: Data; Decision point; Possible outcomes; Data instances; Historical; Reports and queries on data aggregates; Predictive models; Option 2; Answers and confidence; Simulated; Feedback and learning; Text; Video, Images; Audio]

The value of analytics grows by incorporating new sources of data, composing a variety of analytic techniques, spanning organizational silos, and enabling iterative, user-driven interaction
[Chart: Sources and types of data (from "Structured or standardized" to "New format or usage of data") against Scope of decision (Low to High). Plotted examples: Sales-based demand forecasting; Price-based demand forecasting (own & competitors); Segmentation-based market impact estimates; Multi-modal demand forecasting; Intent-to-buy trends]

Analytics toolkits will be expanded to support ingestion and interpretation of unstructured data, and enable adaptation and learning
[Figure labels: New Data; Traditional; New Methods]
– Adaptive Analysis: Responding to context
– Continual Analysis: Responding to local change/feedback
– Optimization under Uncertainty: Quantifying or mitigating risk
– Optimization: Decision complexity, solution speed
– Predictive Modeling: Causality, probabilistic, confidence levels
– Simulation: High fidelity, games, data farming
– Forecasting: Larger data sets, nonlinear regression
– Alerts: Rules/triggers, context sensitive, complex events
– Query/Drill Down: In memory data, fuzzy search, geo spatial
– Ad hoc Reporting: Query by example, user defined reports
– Standard Reporting: Real time, visualizations, user interaction
– Entity Resolution: People, roles, locations, things
– Relationship, Feature Extraction: Rules, semantic inferencing, matching
– Annotation and Tokenization: Automated, crowd sourced
Stages: Learn (in the context of the decision process); Decide and Act; Understand and Predict; Report; Collect and Ingest/Interpret (decide what to count; enable accurate counting)
Extended from: Competing on Analytics, Davenport and Harris, 2007

Analytic solutions will apply multiple methods to multiple forms of data
Example: Utility Vegetation Management
– Effective Right of Way vegetation management is critical to streamlined utility operations
– Traditional Right of Way programs are mainly static-scenario driven on a six year cycle
  – Static and rigid models lead to predominantly reactive operations, which are expensive
  – Focus on narrow corridor widths fails to address severe weather impact
– A multimodal analytics approach can overcome these shortcomings
  – Structured data (e.g. transmission line maps) and unstructured data (e.g. LIDAR sensor)
  – Advanced modeling to perform a dynamic scenario-driven analysis
[Figure (Solution Framework): SENSORS, UTILITY DATA, MAPS, and WEATHER each pass through a Preprocessor; components shown are 3-Dimensional Model Recovery, Right-of-Way Dynamic Forecasting Model, Schedule Generator, and Visualization; industries shown are ELECTRIC, TELECOMMUNICATIONS, RAIL ROAD, OIL]

Analytics solution development requires several interacting design steps
– Design steps: Data Evaluation and Fusion; Algorithm Composition and Invention; Testing and Execution Optimization
– Data types: Streaming data; Text data; Multi-dimensional; Time series; Geo spatial; Video & image; Relational; Social network
– Methods: Data mining & statistics; Optimization & simulation; Semantic analysis; Fuzzy matching; Network algorithms
– Pipeline: Data Acquisition; Filtering and Extraction; Validation; New algorithms; Business Rules Engine; Core Analytics; Composition and Packaging; Deployment

An Analytics solution platform will increase enterprise value by supporting both the CxO solution and the CIO infrastructure
[Chart labels: Revenue; Lines of code; With platform; Without platform]
Easier consumption of Analytics solutions
– Have consistent look and feel
– Changes are easier to implement effectively
– Trustworthy solutions are produced
More efficient, less complex development
– Reduces growth of development costs
– Speeds delivery of new functionality
– Expands analytics solution developer population
Reduces client cost of operation
– Seamless integration eases deployment of solutions
– Establishes preferred development path for new solution
– Consistent and coherent infrastructure eases managing solutions
Mandates:
– Expand Mandate: Refine business processes and enhance collaboration
– Transform Mandate: Change the industry value chain through improved relationships
– Leverage Mandate: Streamline operations and increase organizational effectiveness
– Pioneer Mandate: Radically innovate products, markets, business models
The CIO can reduce cost and add value to the use of analytics by supporting collaboration and data/analysis sharing

Optimizing across the stack will enable the deployment of analytics at scale
Systems supporting future analytics will be more data centric, composable and scalable
Systems will support increasingly complex data sets and workflows. Different elements within these complex workflows will require different capabilities within systems.
[Figure: workloads (Predictive Analytics; Modeling, Simulation; Text Analytics; Hadoop Workloads; Network Optimization; Sensitivity Analysis) shown against system designs built from Cores, SCM, Storage, and Network: General Purpose; Integrated Network; Integrated Processing; Integrated Storage; Future System]
– Balanced, reliable, power efficient systems, with integrated software that scales seamlessly
– Integrated analytics, modeling and simulation capabilities to address generation, management and analysis of Big Data for Business Advantage

The Future Watson
Extend Watson technology
– Moves beyond “question-in & answer-out” to always “learning” evidence-based decision support
Lead in new domains
– Addresses the enterprise need to convert growing volumes of information into actionable knowledge
– Demonstrates business value in critical problem spaces, starting with Healthcare
Enable efficient adaptation
– Efficiently adapting and scaling Watson to new domains requires a novel blend of engineering and research

Watson’s real value proposition: Efficient decision support over unstructured (and structured) content
– Goal: Deeper Understanding, Higher Precision and Broader, Timely Coverage at lower costs
– Jeopardy! Challenge
– Shallow Understanding; Low Precision; Broad Coverage
– Unstructured Data: Broad, rich in context; Rapidly growing, current; Invaluable yet under utilized
– Deeper Understanding but Brittle; High Precision at High Cost; Narrow Limited Coverage
– Structured Data: Precise, explicit; Narrow, expensive

Taking Watson beyond Jeopardy!
– Understanding: from Specific Questions to Rich Problem Scenarios. From specific questions to rich, incomplete problem scenarios (e.g. EHR)
– Interacting: from Question-In/Answer-Out to Interactive Dialog. Evidence analysis and look-ahead, drive interactive dialog to refine answers and evidence
– Explaining: from Precise Answers & Accurate Confidences to Comparative Evidence Profiles. Move from quality answers to quality answers and evidence
– Learning: from Batch Training Process to Continuous Training & Learning Process. Scale domain learning and adaptation rate and efficiency
Example question: The type of murmur associated with this condition is harsh, systolic, and increases in intensity with Valsalva
[Figure labels: Entire Medical Record; Input, Responses; Dialog; Refined Answers, Follow-up Questions; Responses, Learning Questions; Answers, Corrections, Judgements; Teach Watson]
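
The move from answers alone to answers with confidences and evidence can be pictured with a small Python sketch. Nothing here is Watson's actual implementation or API: the evidence names, weights, bias, and candidate answers are hypothetical, and a logistic function is assumed only as one common way to map weighted evidence scores to a confidence between 0 and 1.

```python
from math import exp

# Hypothetical weights for illustration only; a real system would
# learn such values from training data rather than hard-code them.
WEIGHTS = {"passage_support": 2.0, "type_match": 1.5, "source_reliability": 1.0}
BIAS = -2.5


def confidence(evidence):
    """Map evidence scores (each assumed in [0, 1]) to a confidence in (0, 1)."""
    z = BIAS + sum(WEIGHTS.get(name, 0.0) * score for name, score in evidence.items())
    return 1.0 / (1.0 + exp(-z))


def rank(candidates):
    """Return (answer, confidence, evidence) tuples, most confident first.

    Keeping the evidence next to each answer lets a caller show why an
    answer was ranked where it was, not only what the answer is.
    """
    scored = [(answer, confidence(ev), ev) for answer, ev in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Made-up candidates and scores.
    candidates = {
        "answer A": {"passage_support": 0.9, "type_match": 1.0, "source_reliability": 0.8},
        "answer B": {"passage_support": 0.3, "type_match": 1.0, "source_reliability": 0.5},
    }
    for answer, conf, ev in rank(candidates):
        print(f"{answer}: confidence {conf:.2f}, evidence {ev}")
```

Returning the evidence profile with each confidence is the design point: it supports comparing candidates side by side and gives a dialog step something concrete to ask follow-up questions about.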