Combining the power of Watson and OSGi

EclipseCon Boston 2013
Combining the power of
IBM Watson and OSGi
David Taieb
IBM Watson Core Technologies
Follow us @IBMWatson
© 2013 International Business Machines Corporation
Agenda
What is IBM Watson and why is it important?
IBM Watson deep dive and
how is IBM putting it to work?
Overview of the IBM Watson Tooling Platform
Businesses are “dying of thirst in an ocean of data”
90% of the world’s data was created in the last two years
80% of the world’s data today is unstructured
1 trillion connected devices generate 2.5 quintillion bytes of data per day
1 in 2 business leaders don’t have access to the data they need
83% of CIOs cited BI and analytics as part of their visionary plan
2.2X more likely that top performers use business analytics
Watson is ushering in a new era of computing . . .
System intelligence by era:
Tabulation (1900): punch cards, time card readers
Programmatic (1950): search, deterministic, enterprise data, machine language, simple outputs
Cognitive (2011): discovery, probabilistic, Big Data, natural language, intelligent options
. . .enabling new opportunities and outcomes
Why is it so hard for computers to understand us?
“If leadership is an art then surely Jack Welch has proved himself a master painter during his tenure at GE.”
Welch ran this?
Person        Organization
L. Gerstner   IBM
J. Welch      GE
W. Gates      Microsoft
What Watson isn't
Search engine
New-fangled database system
Skynet or HAL 9000
What Watson is
Question answering (QA) system
Combines information retrieval and natural language
processing (NLP)
Builds its domain knowledge from sources comprising
structured and unstructured data
A core set of technologies that can be customized and
targeted to specific industries
Runs on Apache UIMA (Unstructured Information
Management Architecture) technology based on the
OASIS standard
IBM Watson combines transformational technologies
1 Understands natural language and human communication
2 Generates and evaluates evidence-based hypotheses
3 Adapts and learns from user selections and responses
…built on a massively parallel
architecture optimized for IBM POWER7
Brief History of IBM Watson
IBM Research Project (2006 – )
Jeopardy! Grand Challenge (Feb 2011)
Watson for Healthcare (Aug 2011 – )
Watson for Financial Services (Mar 2012 – )
Watson Industry Solutions (2012 – )
Phases: R&D, Demonstration, Commercialization, Expansion, Cross-industry Applications
On February 14, 2011, IBM Watson made history
Result of IBM Research “Grand Challenge”
Watson enables three classes of cognitive services
Ask
• Leverage vast amounts of data
• Ask questions for greater insights
• Natural language inquiries
• e.g. - Next generation Chat
Discover
• Find the rationale for given answers
• Prompt for inputs to yield improved responses
• Inspire considerations of new ideas
• e.g. - Next generation Search → Discovery
Decide
• Ingest and analyze domain sources, info models
• Generate evidence-based decisions with confidence
• Learn with new outcomes and actions
• e.g. - Next generation Apps → Probabilistic Apps
How does IBM Watson Work: Architecture overview
[Diagram: the Watson pipeline. Stages shown: Question/Topic Analysis, Primary Search, Candidate Answer Generation, Answer Scoring, Context Independent Scoring, Filter, Evidence Retrieval, Deep Evidence Scoring, Context Dependent Scoring, Synthesis, Final Merging & Ranking. Other labels: A. Sources, Trained Models. Output: Answer, Confidence.]
Watson States (Simplified): Teach, Train, Q&A
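The stages named above can be read as a chain of transformations over a growing set of candidate answers. The following is a minimal Python sketch of that idea only: the `Candidate` class, the three stage functions and the toy scores are hypothetical stand-ins chosen for illustration, not Watson or UIMA APIs, and a real pipeline has many more stages.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Candidate:
    """A candidate answer plus the feature scores collected along the way."""
    text: str
    features: Dict[str, float] = field(default_factory=dict)
    confidence: float = 0.0


# A stage takes the question and the current candidates and returns candidates.
Stage = Callable[[str, List[Candidate]], List[Candidate]]


def generate_candidates(question: str, _: List[Candidate]) -> List[Candidate]:
    # Toy stand-in for primary search and candidate answer generation.
    return [Candidate("Astronaut"), Candidate("Neil Armstrong")]


def score_answers(question: str, candidates: List[Candidate]) -> List[Candidate]:
    # Toy stand-in for answer scoring: one hypothetical feature per candidate.
    toy_type_scores = {"Neil Armstrong": 0.8, "Astronaut": 0.1}
    for candidate in candidates:
        candidate.features["type_match"] = toy_type_scores.get(candidate.text, 0.0)
    return candidates


def merge_and_rank(question: str, candidates: List[Candidate]) -> List[Candidate]:
    # Toy stand-in for final merging and ranking: confidence is the one feature.
    for candidate in candidates:
        candidate.confidence = candidate.features.get("type_match", 0.0)
    return sorted(candidates, key=lambda c: c.confidence, reverse=True)


def run_pipeline(question: str, stages: List[Stage]) -> List[Candidate]:
    """Feed the output of each stage into the next one."""
    candidates: List[Candidate] = []
    for stage in stages:
        candidates = stage(question, candidates)
    return candidates


if __name__ == "__main__":
    ranked = run_pipeline(
        "Who is the first person to walk on the moon?",
        [generate_candidates, score_answers, merge_and_rank],
    )
    for candidate in ranked:
        print(candidate.text, candidate.confidence)
```

Keeping every stage behind the same signature is what lets stages be added, removed or reordered without touching the others, which is the property the tooling slides later rely on.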
Putting IBM Watson to work: Watson Solutions
[Diagram: Watson Solutions stack]
Solutions (Watson for Industry):
• Watson for Healthcare. Sample Advisor Solutions: Utilization, Research, Oncology, Care Mgt.
• Watson for Financial Services. Sample Advisor Solutions: Banking, Financial Markets, Insurance
• Watson for Client Engagement. Sample Advisor Solutions: Call Center, Help Desk, Knowledge, Technical
Services: ASK Services, DISCOVER Services, DECISION Services
Platform Capabilities: Data, NLP & Machine Learning, Analytics, Cloud, Mobile, Workload Optimized Systems
Full Lifecycle: Content, Tooling, Methods, Algorithms, APIs; Ready, Build, Teach, Run
Why is tooling important for a successful Watson implementation?
• Adapting Watson to a new domain cannot be reduced to a simple product install but must follow a rigorous methodology
   • Readiness preparation
   • Building the solution itself
   • Teaching Watson about the industry, use case and data involved
   • Run in production
• New Watson Solutions can leverage a set of core capabilities provided by the Watson platform as a starting point
   • Ingestable content
   • Algorithms for Natural Language Processing, Candidate generation, Machine Learning, etc.
   • Set of customizable tools for teaching, training and accuracy analysis
• For a successful domain adaptation, a team of people with different backgrounds and expertise has to work together using a set of tools that are easy to use and foster collaboration
Tooling Collaboration
Watson Developer: “I do the development to get Watson implemented and deployed”
Industry Domain Expert: “I know what Watson needs to know”
End User: “Watson answers all my questions”
End UI Developer: “I create the public face of Watson”
System Administrator: “I never met an install that could defeat me – though many have tried”
Testing and Accuracy Analyst: “I can get to the bottom of any Watson accuracy problem”
Where is tooling needed?
[Diagram: the Watson pipeline (Question/Topic Analysis, Primary Search, Candidate Answer Generation, Answer Scoring, Context Independent Scoring, Filter, Evidence Retrieval, Deep Evidence Scoring, Context Dependent Scoring, Synthesis, Final Merging & Ranking; A. Sources; Trained Models; Answer, Confidence) and the Watson States (Simplified: Teach, Train, Q&A), annotated with the areas where tooling is needed]
• Version Control
• Ingestion
• Algorithm Dev Tools
• Training Data Generation
• Accuracy Analysis
• Ground Truth
• Pipeline monitoring
• Pipeline Configuration
Question Analysis and Query building
Who is the first person to walk on the moon?
ESG Parse Tree
Search and Candidate Generation
Primary search constructs queries and searches across many available sources:
• PRISMATIC (relationship search)
• Lucene
• Indri (multiple index types)
• Semantic relations (DBpedia)
Candidate answers are generated based on:
• Titles
• Anchor text
• Passages and their parts: headwords, numbers, dates
• Checking candidates against constraints
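As a rough illustration of the list above, the sketch below pulls candidates from search hits (title, anchor text, passage) and checks each one against a constraint. Everything in it is an assumption made for illustration: the `SearchHit` structure, the capitalized-phrase heuristic standing in for passage parts, and the digit-based constraint. It does not reflect how PRISMATIC, Lucene or Indri results are actually represented, and it ignores numbers and dates.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SearchHit:
    """Hypothetical shape of one primary-search result."""
    title: str
    anchor_texts: List[str]
    passage: str


# Crude stand-in for passage parts: runs of two or more capitalized words.
CAPITALIZED_PHRASE = re.compile(r"(?:[A-Z][a-z]+)(?: [A-Z][a-z]+)+")


def generate_candidates(
    hits: List[SearchHit], constraint: Callable[[str], bool]
) -> List[str]:
    """Collect distinct candidates from titles, anchor text and passages,
    keeping only those that satisfy the constraint."""
    candidates: List[str] = []
    for hit in hits:
        raw = [hit.title, *hit.anchor_texts]
        raw += CAPITALIZED_PHRASE.findall(hit.passage)
        for text in raw:
            text = text.strip()
            if text and text not in candidates and constraint(text):
                candidates.append(text)
    return candidates


def looks_like_person_name(text: str) -> bool:
    # Hypothetical constraint: a person's name should not contain digits.
    return not any(ch.isdigit() for ch in text)


if __name__ == "__main__":
    hits = [
        SearchHit(
            title="Apollo 11",
            anchor_texts=["Neil Armstrong"],
            passage="Neil Armstrong and Buzz Aldrin landed on the Moon in 1969.",
        )
    ]
    print(generate_candidates(hits, looks_like_person_name))
```

Generation is deliberately permissive here: the constraint only removes obvious mismatches, and the later scoring stages are left to separate good candidates from bad ones.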
Scoring
Candidate Answers
isA(“Neil Armstrong”, “person”) = 0.8
isA(“Eugene Cernan”, “person”) = 0.3
isA(“Astronaut”, “person”) = 0.1
More than 50 scoring components:
• Taxonomic
• Geospatial (location)
• Temporal
• Source reliability
• Name consistency
• Relational
• Passage support
• Theory consistency
Context dependent (deep evidence)
Context independent
Features for machine learning
Evidence Feature Scores
Candidate        Doc Rank  Pass Rank  TyCor  Geo  LFACS
Neil Armstrong   0         1          0.8    0    0.7
Eugene Cernan    1         1          0.3    0    0.4
Astronaut        2         2          0.1    0    0.3
John Young       3         -          0.3    0    0
Buzz Aldrin      3         -          0.5    0    0
Renegate kids    4         0.0        0      0    0
Supporting Evidence
LFACS Alignment
Passage search
• Much like a primary search, but requires the candidate answer as a term
• Further scored to ensure candidate answer context
• Shared scoring solutions:
   • Passage term match
   • Skip-bigram
   • Text alignment
   • Logical form answer candidate scoring
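Two of the scorers named above, passage term match and skip-bigram, are easy to sketch over plain tokens. The version below is a simplified textbook form written for illustration; the function names, the tokenizer and the `max_skip` window are assumptions, and Watson's own scorers may work on richer structures than surface tokens.

```python
import re
from typing import List, Set, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def passage_term_match(question: str, passage: str) -> float:
    """Fraction of distinct question terms that also occur in the passage."""
    question_terms = set(tokenize(question))
    if not question_terms:
        return 0.0
    return len(question_terms & set(tokenize(passage))) / len(question_terms)


def skip_bigrams(tokens: List[str], max_skip: int = 2) -> Set[Tuple[str, str]]:
    """Ordered token pairs with at most `max_skip` tokens between them."""
    pairs: Set[Tuple[str, str]] = set()
    for i, left in enumerate(tokens):
        for right in tokens[i + 1 : i + 2 + max_skip]:
            pairs.add((left, right))
    return pairs


def skip_bigram_score(question: str, passage: str, max_skip: int = 2) -> float:
    """Fraction of the question's skip-bigrams that also occur in the passage."""
    question_pairs = skip_bigrams(tokenize(question), max_skip)
    if not question_pairs:
        return 0.0
    passage_pairs = skip_bigrams(tokenize(passage), max_skip)
    return len(question_pairs & passage_pairs) / len(question_pairs)


if __name__ == "__main__":
    question = "first person to walk on the moon"
    passage = "Neil Armstrong was the first person to walk on the moon"
    print(passage_term_match(question, passage))
    print(skip_bigram_score(question, passage))
```

Term match ignores word order entirely, while skip-bigrams reward passages that keep the question's terms in roughly the same order, so the two scores complement each other as separate features.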
Final Merger
Merging
• Remove duplicate answers
• Requires normalizing scores per feature before merging
Ranking
• Use of ML and IBM® SPSS® over training data to create the model to rank future results
• Linear and logistic regression algorithms
• Teach-train-execute cycle
• 10,000 training questions and 2000 test questions
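The merge, normalize and rank steps can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the SPSS-based model described above: duplicates are merged by lowercased text and keep the maximum score per feature, normalization is min-max per feature, and the logistic weights and bias are hypothetical values that would in practice be learned from the training questions. The TyCor and LFACS values for the first three candidates are taken from the Evidence Feature Scores slide; the lowercase duplicate entry is invented for the example.

```python
import math
from typing import Dict, List, Tuple

Features = Dict[str, float]


def merge_duplicates(candidates: List[Tuple[str, Features]]) -> Dict[str, Features]:
    """Merge candidates whose text matches after lowercasing and whitespace
    cleanup, keeping the maximum score seen for each feature."""
    merged: Dict[str, Features] = {}
    for text, features in candidates:
        key = " ".join(text.lower().split())
        slot = merged.setdefault(key, {})
        for name, value in features.items():
            slot[name] = max(slot.get(name, value), value)
    return merged


def normalize_per_feature(table: Dict[str, Features]) -> Dict[str, Features]:
    """Min-max normalize every feature across all candidates."""
    names = sorted({name for features in table.values() for name in features})
    normalized: Dict[str, Features] = {key: {} for key in table}
    for name in names:
        values = [features.get(name, 0.0) for features in table.values()]
        low, span = min(values), max(values) - min(values)
        for key, features in table.items():
            raw = features.get(name, 0.0)
            normalized[key][name] = (raw - low) / span if span else 0.0
    return normalized


def rank(table: Dict[str, Features], weights: Features, bias: float = 0.0):
    """Score each candidate with a logistic model and sort by confidence."""
    scored = []
    for key, features in table.items():
        z = bias + sum(weights.get(name, 0.0) * value for name, value in features.items())
        scored.append((key, 1.0 / (1.0 + math.exp(-z))))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    candidates = [
        ("Neil Armstrong", {"tycor": 0.8, "lfacs": 0.7}),
        ("neil armstrong", {"tycor": 0.6, "lfacs": 0.7}),  # invented duplicate
        ("Eugene Cernan", {"tycor": 0.3, "lfacs": 0.4}),
        ("Astronaut", {"tycor": 0.1, "lfacs": 0.3}),
    ]
    hypothetical_weights = {"tycor": 2.0, "lfacs": 1.5}  # not learned values
    table = normalize_per_feature(merge_duplicates(candidates))
    for answer, confidence in rank(table, hypothetical_weights, bias=-1.0):
        print(answer, round(confidence, 3))
```

Normalizing per feature before ranking keeps a feature with a large raw range from dominating the weighted sum simply because of its scale.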

Watson Tooling Platform Requirements
Development
• Enable multiple teams to work on a common integration platform and modular programming model
• Ensure high degree of interoperability between tools
• Easy to use high level APIs
   • Persistence
   • REST Services
   • Logging
   • …
End User
• Seamless access to all tools from the same unified Web platform
• Tools should be easy to assemble, configure and deploy (zero install)
• Unified login page across all tools
• Consistent UI
• Easy access to lifecycle tools
   • Task management
   • Source Control
   • Defect tracking
Solution Adopted: OSGi based Web Container
Watson Tooling Platform
[Diagram: Watson Tooling Platform, by tier]
Client Tier: Dojo, IBM IDX, Watson JS APIs, DWR JS, JSP Tags, POJO APIs
Web Tier: Watson Tooling UI, OSGi Services, JAX-RS Wink, OSLC APIs, Watson Runtime
Plugin Tools (integration into a unified UI Shell): Accuracy Analysis, Training Data, Data Ingestion, Pipeline configuration, GroundTruth, Domain Specific Tools
Security (Authentication/Authorization/Single Sign-on)
Common Extension Points, Common APIs, Common utilities: CAS Deserialization, Shell Action, Experiments, Data Models, Common UIMA, Watson Runtime, Welcome Page, Answer Key, File Formats, Type System, Question Action, Administration Access, Annotator Registry, Instrumentation, Logging, Experiment Action, Configuration, Scheduling
Run on single app server: WAS, Tomcat, Jetty,...
OSGi Web Platform: Equinox Web Container, Http Service, Jetty, Equinox Servlet Bridge
Logging/Serviceability
Data Tier: OpenJPA; Experiment Output API; Answer Key XML Schema; Canonical Data Model / Data Access API Layer; Experiment Directory Output; Models/ARFFS; Intermediary and Training CASes; ScoreEval; PRA (Annotator tool); Domain Specifics assets; Experiment Store, Answer keys; Unified Database Schema; Derby/Db2
OSLC Integration of Lifecycle Tools (Defects tracking, Task, Source Control)
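The plugin model in the diagram rests on a service registry: each tool registers what it contributes, and the UI shell discovers contributions at runtime instead of being compiled against them. The sketch below illustrates that pattern in plain Python. It is an analogy only and not the OSGi API; `ServiceRegistry`, `ShellAction` and the `tier` property are hypothetical names, with the tool titles borrowed from the diagram.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple


class ServiceRegistry:
    """Toy registry: services are published under an interface name with
    optional properties and looked up by name plus property filter."""

    def __init__(self) -> None:
        self._services: Dict[str, List[Tuple[Any, Dict[str, str]]]] = {}

    def register(
        self, interface: str, service: Any, properties: Optional[Dict[str, str]] = None
    ) -> None:
        self._services.setdefault(interface, []).append((service, dict(properties or {})))

    def unregister(self, interface: str, service: Any) -> None:
        entries = self._services.get(interface, [])
        self._services[interface] = [(s, p) for s, p in entries if s is not service]

    def lookup(self, interface: str, **required: str) -> List[Any]:
        """Return services under `interface` whose properties match `required`."""
        return [
            service
            for service, properties in self._services.get(interface, [])
            if all(properties.get(key) == value for key, value in required.items())
        ]


@dataclass
class ShellAction:
    """Hypothetical contribution a tool makes to the unified UI shell."""
    title: str
    path: str


def build_menu(registry: ServiceRegistry) -> List[str]:
    # The shell knows only the interface name, never the concrete tools.
    return [action.title for action in registry.lookup("ShellAction", tier="web")]


if __name__ == "__main__":
    registry = ServiceRegistry()
    accuracy = ShellAction("Accuracy Analysis", "/tools/accuracy")
    registry.register("ShellAction", accuracy, {"tier": "web"})
    registry.register("ShellAction", ShellAction("GroundTruth", "/tools/groundtruth"), {"tier": "web"})
    print(build_menu(registry))
    registry.unregister("ShellAction", accuracy)  # tools can leave at runtime
    print(build_menu(registry))
```

Because the shell depends only on the interface name, a domain-specific tool can be added or removed without changing or redeploying the shell.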
OSLC : Lifecycle tools Integration Improved
Watson Developer
• Produces New Driver
Accuracy Analyst: Accuracy Analysis
1. Retrieve Analysis Script
2. Store Experiment results
3. Create new Tasks
4. Enter Defect
Industry Domain Expert: Training data
1. Markup Annotation
2. Validate Ground Truth
Jazz Lifecycle Integration Platform (JLIP)
OSLC Based Enterprise Tools
OSLC linked data models and RESTful service interfaces form a great extensible platform upon which to build the tools needed by the Watson personas to access lifecycle data
Leveraging the Jazz Lifecycle Integration Platform (JLIP)
The value of JLIP is that it provides Watson users with the relevant information they need from the tools, data and people.
Diverse platform participants:
• Ecosystem of tools and adapters
• IBM tools, including Rational practitioner tools built to the platform
• Adapters for 3rd party tools, including Lifecycle Integration Adapters (LIA)
• Homegrown and custom/services-built tools
Increasing value
• Harness the power of Big Data
• Connect data and people from disparate, heterogeneous tools
• Identify a single system across complex tools deployments to improve centralized administration
• Enable extensibility and a broad ecosystem with a vibrant community of participants
• Accommodate flexible delivery models and channels such as Cloud and Mobile
[Diagram: Jazz Lifecycle Integration Platform]
Tool groups: Open Source, 3rd Party, IBM non-OSLC Lifecycle Tools; OSLC Based Tools & Solutions (inc Rational Lifecycle Tools); Homegrown Lifecycle Tools & Solutions
Adapters
Linked lifecycle data (OSLC)
Lifecycle Management Capabilities: Product/Variant Planning, Versioning and Baselining, Single Sign on, Admin Concepts (System Registry, and System Health), Reviews and Approvals, Query & Reporting, Traceability & Impact Analysis, Notifications and Alerts, Shared Artifacts (user, project), Lifecycle Query
We have only just begun to build a new era of computing powered by cognitive systems
• Transforming how organizations think, act, and operate
• Learning through interactions
• Delivering evidence-based responses driving better outcomes
Resources
• IBM Watson: http://www-03.ibm.com/innovation/us/watson/
• IEEE Collection: http://ieeexplore.ieee.org/xpl/tocresult.jsp?isnumber=6177717&punumber=5288520
• Eclipse Equinox: http://www.eclipse.org/equinox/
• JAZZ Platform: http://jazz.net
• OSLC: http://open-services.net/ and http://www.eclipse.org/lyo/