EclipseCon Boston 2013
Combining the power of IBM Watson and OSGi
David Taieb, IBM Watson Core Technologies
Follow us @IBMWatson
© 2013 International Business Machines Corporation

Agenda
• What is IBM Watson and why is it important?
• IBM Watson deep dive and how is IBM putting it to work?
• Overview of the IBM Watson Tooling Platform

Businesses are “dying of thirst in an ocean of data”
• 90% of the world’s data was created in the last two years
• 80% of the world’s data today is unstructured
• 1 trillion connected devices generate 2.5 quintillion bytes of data per day
• 1 in 2 business leaders don’t have access to the data they need
• 83% of CIOs cited BI and analytics as part of their visionary plan
• Top performers are 2.2X more likely to use business analytics

Watson is ushering in a new era of computing . . .
System intelligence over time:
• Tabulation (1900): punch cards, time card readers
• Programmatic (1950): search, deterministic, enterprise data, machine language, simple outputs
• Cognitive (2011): discovery, probabilistic, Big Data, natural language, intelligent options
. . . enabling new opportunities and outcomes

Why is it so hard for computers to understand us?
Welch ran this?

Person        Organization
L. Gerstner   IBM
J. Welch      GE
W. Gates      Microsoft

“If leadership is an art then surely Jack Welch has proved himself a master painter during his tenure at GE.”

What Watson isn't
• Search engine
• New-fangled database system
• Skynet or HAL 9000

What Watson is
• Question answering (QA) system
• Combines information retrieval and natural language processing (NLP)
• Builds its domain knowledge from sources comprising structured and unstructured data
• A core set of technologies that can be customized and targeted to specific industries
• Runs on Apache UIMA (Unstructured Information Management Architecture), a technology based on the OASIS standard

IBM Watson combines transformational technologies
1. Understands natural language and human communication
2. Generates and evaluates evidence-based hypotheses
3. Adapts and learns from user selections and responses
…built on a massively parallel architecture optimized for IBM POWER7

Brief History of IBM Watson
• IBM Research Project (2006 – )
• Jeopardy! Grand Challenge (Feb 2011)
• Watson for Healthcare (Aug 2011 – )
• Watson for Financial Services (Mar 2012 – )
• Watson Industry Solutions (2012 – )
Phases: R&D, Demonstration, Commercialization, Expansion, Cross-industry Applications

On February 14, 2011, IBM Watson made history
Result of IBM Research “Grand Challenge”

Watson enables three classes of cognitive services
Ask
• Leverage vast amounts of data
• Ask questions for greater insights
• Natural language inquiries
• e.g. Next generation Chat
Discover
• Find the rationale for given answers
• Prompt for inputs to yield improved responses
• Inspire considerations of new ideas
• e.g. Next generation Search: Discovery
Decide
• Ingest and analyze domain sources, info models
• Generate evidence based decisions with confidence
• Learn with new outcomes and actions
• e.g. Next generation Apps: Probabilistic Apps

How does IBM Watson Work: Architecture overview
[Diagram: Watson States (Simplified)]
• Stages shown: Question/Topic Analysis, Primary Search, Candidate Answer Generation, Answer Scoring, Filter, Evidence Retrieval, Deep Evidence Scoring, Synthesis, Final Merging & Ranking
• Scoring types: Context Independent Scoring, Context Dependent Scoring
• Other labels: A. Sources, Trained Models, Teach, Train, Q&A, Answer and Confidence

Putting IBM Watson to work: Watson Solutions
Solutions (Watson for Industry), each with sample advisor solutions:
• Watson for Healthcare: Utilization, Research, Oncology, Care Mgt.
• Watson for Financial Services: Banking, Financial Markets, Insurance
• Watson for Client Engagement: Call Center, Help Desk, Knowledge, Technical
Services: ASK, DISCOVER, DECISION
Platform capabilities: Data, NLP & Machine Learning, Analytics, Cloud, Mobile, Workload Optimized Systems
Assets: Content, Tooling, Methods, Algorithms, APIs
Full lifecycle: Ready, Build, Teach, Run

Why is tooling important for a successful Watson implementation?
• Adapting Watson to a new domain cannot be reduced to a simple product install but must follow a rigorous methodology
  • Readiness preparation
  • Building the solution itself
  • Teaching Watson about the industry, use case and data involved
  • Run in production
• New Watson Solutions can leverage a set of core capabilities provided by the Watson platform as a starting point
  • Ingestable content
  • Algorithms for natural language processing, candidate generation, machine learning, etc.
  • Set of customizable tools for teaching, training and accuracy analysis
• For a successful domain adaptation, a team of people with different backgrounds and expertise has to work together using a set of tools that are easy to use and foster collaboration

Tooling Collaboration
• Watson Developer: “I do the development to get Watson implemented and deployed”
• Industry Domain Expert: “I know what Watson needs to know”
• End User: “Watson answers all my questions”
• End UI Developer: “I create the public face of Watson”
• System Administrator: “I never met an install that could defeat me – though many have tried”
• Testing and Accuracy Analyst: “I can get to the bottom of any Watson accuracy problem”

Where is tooling needed
[Diagram: the Watson stages (simplified) from the architecture overview, annotated with the areas that need tooling]
• Version Control
• Ingestion
• Algorithm Dev Tools
• Training Data Generation
• Accuracy Analysis
• Ground Truth
• Pipeline Monitoring
• Pipeline Configuration

Question Analysis and Query building
Example question: “Who is the first person to walk on the moon?”
[Figure: ESG parse tree of the question]

Search and Candidate Generation
Primary search constructs queries and searches many available sources.
• PRISMATIC (relationship search)
• Lucene, Indri (multiple index types)
• Semantic relations (DBpedia)
Candidate answers are generated based on:
• Titles
• Anchor text
• Passages and their parts: headwords, numbers, dates
• Checking candidates against constraints

Scoring
isA(“Neil Armstrong”, “person”) = 0.8
isA(“Eugene Cernan”, “person”) = 0.3
isA(“Astronaut”, “person”) = 0.1

More than 50 scoring components:
• Taxonomic
• Geospatial (location)
• Temporal
• Source reliability
• Name consistency
• Relational
• Passage support
• Theory consistency
Scorers are either context dependent (deep evidence) or context independent. Their outputs are the features for machine learning.

Evidence Feature Scores

Candidate answer   Doc Rank  Pass Rank  Ty Cor  Geo  LFACS
Neil Armstrong     0         1          0.8     0    0.7
Eugene Cernan      1         1          0.3     0    0.4
Astronaut          2         2          0.1     0    0.3
John Young         3                    0.3     0    0
Buzz Aldrin        3                    0.5     0    0
Renegate kids      4         0.0        0       0    0

Supporting Evidence
[Figure: LFACS alignment]
Passage search
• Much like a primary search, but requires the candidate answer as a term
• Further scored to ensure candidate answer context
Shared scoring solutions:
• Passage term match
• Skip-bigram
• Text alignment
• Logical form answer candidate scoring (LFACS)

Final Merger
Merging
• Remove duplicate answers
• Requires normalizing scores per feature before merging
Ranking
• Use of ML and IBM® SPSS® over training data to create the model that ranks future results
• Linear and logistic regression algorithms
• Teach-train-execute cycle
• 10,000 training questions and 2,000 test questions

Watson Tooling Platform Requirements
Development
• Enable multiple teams to work on a common integration platform and modular programming model
• Ensure a high degree of interoperability between tools
• Easy to use high level APIs
  • Persistence
  • REST services
  • Logging
  • …
End User
• Seamless access to all tools from the same unified Web platform
• Tools should be easy to assemble, configure and deploy (zero install)
• Unified login page across all tools
• Consistent UI
• Easy access to lifecycle tools
  • Task management
  • Source control
  • Defect tracking
Solution adopted: OSGi based Web Container

Watson Tooling Platform
[Diagram: tiered architecture of the tooling platform]
Client Tier: Dojo, IBM IDX, Watson JS APIs, DWR JS, JSP Tags, POJO APIs
Web Tier: Watson Tooling UI, OSGi Services, JAX-RS (Wink), OSLC APIs, Watson Runtime
Tools: Accuracy Analysis, Training Data, Data Ingestion, Pipeline Configuration, GroundTruth, Domain Specific Tools (plugin tools)
• Integration into a unified UI Shell
• Security (Authentication/Authorization/Single Sign-on)
• Common Extension Points
• Run on a single app server: WAS, Tomcat, Jetty, ...
Common APIs and utilities: CAS Deserialization, Shell Action, Experiments, Data Models, Common UIMA, Watson Runtime, Welcome Page, Answer Key, File Formats, Type System, Question Action, Administration, Access, Annotator Registry, Instrumentation, Logging, Experiment Action, Configuration, Scheduling
OSGi Web Platform
• Equinox Web Container
• Http Service
• Jetty, Equinox Servlet Bridge
• Logging/Serviceability
Data Tier: OpenJPA, Experiment Output API, Answer Key XML Schema, Canonical Data Model / Data Access API Layer, Unified Database Schema
• Stores: Derby/DB2 (Experiment Store, Answer Keys), Experiment Directory
• Assets: Output Models/ARFFs, Intermediary and Training CASes, ScoreEval, PRA (Annotator tool), domain-specific assets
OSLC Integration of Lifecycle Tools (defect tracking, task, source control)

OSLC: Lifecycle tools Integration
Watson Developer: produces new, improved driver
Accuracy Analyst (accuracy analysis):
1. Retrieve analysis script
2. Store experiment results
3. Create new tasks
4. Enter defect
Industry Domain Expert (training data):
1. Markup annotation
2. Validate ground truth
Jazz Lifecycle Integration Platform (JLIP): OSLC linked data models and RESTful service interfaces form an extensible platform upon which to build the tools that the Watson personas need to access lifecycle data in OSLC based enterprise tools.

Leveraging the Jazz Lifecycle Integration Platform (JLIP)
The value of JLIP is that it provides Watson users with the relevant information they need from the tools, data and people.
Diverse platform participants:
• Ecosystem of tools and adapters
• IBM tools, including Rational practitioner tools built to the platform
• Adapters for 3rd party tools, including Lifecycle Integration Adapters (LIA)
• Homegrown and custom/services-built tools
[Diagram: lifecycle tools connected to the platform through linked lifecycle data (OSLC)]
• Open source, 3rd party and IBM non-OSLC lifecycle tools (via adapters)
• OSLC based tools & solutions (inc. Rational lifecycle tools)
• Homegrown lifecycle tools & solutions (via adapters)
Increasing value
• Harness the power of Big Data
• Connect data and people from disparate, heterogeneous tools
• Identify a single system across complex tools deployments to improve centralized administration
• Enable extensibility and a broad ecosystem with a vibrant community of participants
• Accommodate flexible delivery models and channels such as Cloud and Mobile
Lifecycle management capabilities of the Jazz Lifecycle Integration Platform:
• Product/Variant Planning
• Versioning and Baselining
• Single Sign-on
• Admin Concepts (System Registry and System Health)
• Reviews and Approvals
• Query & Reporting
• Traceability & Impact Analysis
• Notifications and Alerts
• Shared Artifacts (user, project)
• Lifecycle Query

We have only just begun to build a new era of computing powered by cognitive systems
• Transforming how organizations think, act, and operate
• Learning through interactions
• Delivering evidence based responses driving better outcomes

Resources
• IBM Watson: http://www-03.ibm.com/innovation/us/watson/
• IEEE Collection: http://ieeexplore.ieee.org/xpl/tocresult.jsp?isnumber=6177717&punumber=5288520
• Eclipse Equinox: http://www.eclipse.org/equinox/
• JAZZ Platform: http://jazz.net
• OSLC: http://open-services.net/ and http://www.eclipse.org/lyo/
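
Appendix sketch: skip-bigram passage scoring

The Supporting Evidence slide names skip-bigram as one of the shared passage scorers but does not define it. As a rough illustration only, the sketch below treats a skip-bigram as an ordered pair of words with at most `max_skip` words between them, and scores a passage by the fraction of the question's skip-bigrams it also contains. The tokenizer, the `max_skip` window, the overlap formula and the two example passages are all assumptions made for this example; none of this is Watson's actual scorer.

```python
import re


def tokens(text):
    """Lowercase word tokens; a deliberately naive tokenizer (assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def skip_bigrams(words, max_skip=2):
    """Ordered word pairs with at most `max_skip` words between them."""
    pairs = set()
    for i, first in enumerate(words):
        # Partners sit 1 to (max_skip + 1) positions after `first`.
        for second in words[i + 1 : i + 2 + max_skip]:
            pairs.add((first, second))
    return pairs


def skip_bigram_score(question, passage, max_skip=2):
    """Fraction of the question's skip-bigrams that also occur in the passage."""
    q_pairs = skip_bigrams(tokens(question), max_skip)
    if not q_pairs:
        return 0.0
    p_pairs = skip_bigrams(tokens(passage), max_skip)
    return len(q_pairs & p_pairs) / len(q_pairs)


# The question is the one from the Question Analysis slide;
# both passages are invented for this example.
question = "Who is the first person to walk on the moon?"
supporting = "Neil Armstrong was the first person to walk on the moon."
unrelated = "The moon orbits the Earth once every 27 days."

print(skip_bigram_score(question, supporting))
print(skip_bigram_score(question, unrelated))
```

The supporting passage shares most of the question's word pairs, so it scores well above the unrelated one, which is the behaviour a passage-support feature needs before it is handed to the final ranking model.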
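
Appendix sketch: merging and ranking candidate answers

The Final Merger slide describes three steps: remove duplicate answers, normalize scores per feature, and rank with a regression model. The sketch below walks through those steps on a toy scale. The Ty Cor, Geo and LFACS values for the first three candidates are taken from the Evidence Feature Scores table; the lowercase duplicate row, the max-per-feature merge rule, the min-max normalization and the hard-coded `WEIGHTS` and `BIAS` are assumptions chosen for illustration. In the deck's description the model is trained over training questions, not written by hand.

```python
import math

# First three rows: values from the Evidence Feature Scores table.
# Last row: an invented duplicate, added only to exercise the merge step.
candidates = [
    ("Neil Armstrong", {"ty_cor": 0.8, "geo": 0.0, "lfacs": 0.7}),
    ("Eugene Cernan", {"ty_cor": 0.3, "geo": 0.0, "lfacs": 0.4}),
    ("Astronaut", {"ty_cor": 0.1, "geo": 0.0, "lfacs": 0.3}),
    ("neil armstrong", {"ty_cor": 0.6, "geo": 0.0, "lfacs": 0.5}),
]

# Hypothetical weights and bias; a real model would be learned from training data.
WEIGHTS = {"ty_cor": 2.0, "geo": 0.5, "lfacs": 3.0}
BIAS = -2.5


def merge_duplicates(rows):
    """Collapse answers that differ only in case or spacing, keeping the max per feature."""
    merged = {}
    for name, feats in rows:
        key = " ".join(name.lower().split())
        if key not in merged:
            merged[key] = (name, dict(feats))
        else:
            kept = merged[key][1]
            for feature, value in feats.items():
                kept[feature] = max(kept.get(feature, value), value)
    return list(merged.values())


def normalize(rows):
    """Min-max scale each feature to [0, 1] across candidates; constant features become 0."""
    features = sorted({f for _, feats in rows for f in feats})
    out = [(name, dict(feats)) for name, feats in rows]
    for feature in features:
        values = [feats.get(feature, 0.0) for _, feats in rows]
        lo, hi = min(values), max(values)
        for (_, feats), value in zip(out, values):
            feats[feature] = 0.0 if hi == lo else (value - lo) / (hi - lo)
    return out


def confidence(feats):
    """Logistic model: sigmoid of the weighted feature sum."""
    z = BIAS + sum(WEIGHTS[f] * v for f, v in feats.items())
    return 1.0 / (1.0 + math.exp(-z))


def rank(rows):
    """Merge, normalize, score, then sort by descending confidence."""
    scored = [(name, confidence(feats)) for name, feats in normalize(merge_duplicates(rows))]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


for name, conf in rank(candidates):
    print(f"{name}: {conf:.2f}")
```

Normalizing after merging keeps each feature on a comparable scale, so a single weight per feature is meaningful; the output of `rank` corresponds to the "Answer, Confidence" pair shown at the end of the architecture diagram.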