IBM GTO - Big Data Analytics

Pushing the Frontiers of Analytics
Brenda Dietrich, IBM Fellow & VP
CTO, Business Analytics
© 2012 IBM Corporation
Global Technology Outlook Objectives
GTO identifies significant technology trends early. It looks for high-impact disruptive technologies leading to game-changing products and services over a 3-10 year horizon.
Technology thresholds identified in a GTO demonstrate their influence on clients, enterprises, & industries and have high potential to create new businesses.
Global Technology Outlook 2012
Uncertain data and analytics are major themes
Managing Uncertain Data at Scale
Systems of People
The Future Watson
Outcome Based Business
Future of Analytics
Resilient Business and Services
Managing Uncertain Data at Scale
Trend: Most of the world’s analyzed data will be uncertain
• By 2015, 80% of the world’s data will be uncertain
• Uncertain data management requires new techniques
• These techniques are necessary for real-world Big Data Analytics
Opportunity: Business leadership using Big Data Analytics
• Robust, business-aware uncertain data management
Challenge: Taking Big Data Analytics into an uncertain world
• Analysis of text is highly nuanced; sensor-based data is imprecise
• Use analytics over uncertain web, sensor, and human-generated data
• Enable good business decisions by understanding analysis confidence
• Timely business decisions require efficient large-scale analytics
• It is more difficult to obtain insight about an individual than a group, especially if the source data is uncertain
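The point about individuals versus groups can be illustrated with the standard error of a mean. The sketch below assumes independent observations that all carry the same noise level; the function name and the value of SIGMA are hypothetical and chosen only for illustration.

```python
import math


def standard_error(sigma: float, n: int) -> float:
    """Standard error of the mean of n independent readings, each with noise sigma."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return sigma / math.sqrt(n)


# Hypothetical noise level of a single uncertain observation.
SIGMA = 10.0

individual = standard_error(SIGMA, 1)   # one person, one reading
group = standard_error(SIGMA, 10_000)   # average over a large group

print(f"individual: +/-{individual:.2f}, group mean: +/-{group:.2f}")
```

Under these assumptions the noise in a group average shrinks with the square root of the group size, while a statement about one individual keeps the full noise of the source data.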
The fourth dimension of Big Data: Veracity – handling data in doubt
Volume (Data at Rest): Terabytes to exabytes of existing data to process
Velocity (Data in Motion): Streaming data, milliseconds to seconds to respond
Variety (Data in Many Forms): Structured, unstructured, text, multimedia
Veracity* (Data in Doubt): Uncertainty due to data inconsistency & incompleteness, ambiguities, latency, deception, model approximations
* Truthfulness, accuracy or precision, correctness
Uncertainty arises from many sources
Process Uncertainty: Processes contain “randomness”
Data Uncertainty: Data input is uncertain
Model Uncertainty: All modeling is approximate
Examples: uncertain travel times; semiconductor yield; text entry (actual spelling vs. intended spelling); GPS uncertainty; testimony; rumors; contaminated?; ambiguity ({Paris Airport}); conflicting data ({John Smith, Dallas} vs. {John Smith, Kansas}); fitting a curve to data; forecasting a hurricane (www.noaa.gov)
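Two of the data-uncertainty examples, actual versus intended spelling and conflicting records, can be sketched with Python's standard difflib module. This is only an illustration: the function names, the vocabulary, and the misspelled input are hypothetical, and string similarity is just one of many possible ways to score a text entry.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity between two strings, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_match(entered: str, vocabulary: list[str]) -> tuple[str, float]:
    """Return the vocabulary word closest to the entered text, with its score."""
    score, word = max((similarity(entered, w), w) for w in vocabulary)
    return word, score


def conflicting(rec_a: tuple[str, str], rec_b: tuple[str, str]) -> bool:
    """Same name but different location: flag the conflict instead of hiding it."""
    return rec_a[0] == rec_b[0] and rec_a[1] != rec_b[1]


# Hypothetical text entry with a spelling error.
word, score = best_match("Dalas", ["Dallas", "Kansas", "Paris"])
print(f"intended spelling is probably {word} (similarity {score:.2f})")

# The conflicting records from the example above.
print(conflicting(("John Smith", "Dallas"), ("John Smith", "Kansas")))
```

The score is kept alongside the match so that later steps can weigh how much to trust the corrected value.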
By 2015, 80% of all available data will be uncertain
By 2015 the number of networked devices will be double the entire global population. All sensor data has uncertainty.
The total number of social media accounts exceeds the entire global population. This data is highly uncertain in both its expression and content.
Data quality solutions exist for enterprise data like customer, product, and address data, but this is only a fraction of the total enterprise data.
[Chart: Global Data Volume in Exabytes (0 to 9000) and Aggregate Uncertainty % (0 to 100), 2005 to 2015. Multiple sources: IDC, Cisco]
Examples: Uncertainty management presents many opportunities
System analytics predict maintenance (Smarter Planet)
• Downtime costs $M in income loss
• Equipment maintenance needs unpredictable
• Customer contracts impose penalties
Energy: 5% more oil platform production, 30% less maintenance cost. Improvements obtained using statistical modeling that combines equipment sensor data with performance history to predict corrective maintenance activities.
Creating profiles from many sources (360˚ customer view)
• Many inconsistent data sources
• Intent hidden within social media
• Geospatial data is imprecise
Auto: 35% more satisfied customers by analyzing agent notes
Telco: 35% better churn prediction using customer SMS messages
Reduced time to determine lending risk from weeks to minutes
Process and forecast uncertainty (Supply chain)
• Modeling Uncertainties: demand, sales, production, shipment
• Shipping Uncertainties: goods damaged, mistakes in shipped goods
Research: 80% lower price protection costs, 30% less channel inventory, 50% fewer returns. Reductions obtained using inventory replenishment model that accounts for uncertain price protection.
More data from physician notes and tests (Healthcare)
• Structured medical records are incomplete
• “Golden” text notes must be interpreted
• Uncertainty in images
• Drug names
• Relationship types (mtr, sibs, m, paunt)
Healthcare: Able to identify 40% more smokers found, 50% more diagnoses, 15% more disease history. Mitral stenosis: 35% misdiagnoses.
Condensing data reduces uncertainty by constructing context
Maximum Context For Minimum Uncertainty
Required: tight integration to maximize context discovery
Required: common practices followed by multiple standards for representing uncertain data and uncertainty of all types, provenance, and lineage and other metadata
Required: common APIs to enable sharing across the uncertainty management pipeline
No such common practices, standards or APIs exist today
[Figure: CONDENSE, "Data finds Data". Labels: customer Michael, San Jose, CA (Credit, Loyalty, Influencers, Intent: "Buying a DSLR today!"); Customer at Mall; Customer in Store #42; In-Store Pricing And Discounts ($999 or $560); Mother, Son, Birthday, Date; Fact Discovery, Spatial Reasoning, Temporal Reasoning, Sense Making, Correlation, Corroboration (Evidence Combination), etc.]
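Corroboration (evidence combination) is named above without a specific method. As a minimal sketch, assuming each piece of evidence is independent and carries a confidence between 0 and 1, a noisy-OR rule is one simple way to combine them. The function name and the confidence values are hypothetical.

```python
from math import prod


def combine_independent(confidences: list[float]) -> float:
    """Noisy-OR: probability that at least one independent piece of evidence holds."""
    for c in confidences:
        if not 0.0 <= c <= 1.0:
            raise ValueError(f"confidence out of range: {c}")
    return 1.0 - prod(1.0 - c for c in confidences)


# Hypothetical confidences that a customer intends to buy a camera today.
evidence = {
    "social post mentions buying a DSLR": 0.6,
    "location places the customer in the store": 0.5,
    "upcoming family birthday on record": 0.5,
}

combined = combine_independent(list(evidence.values()))
print(f"combined confidence: {combined:.2f}")
```

Each source alone is weak, but together they support the conclusion more strongly, which is the sense in which added context reduces uncertainty. If the sources are correlated, this rule overstates the combined confidence.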
Systems of People
A shift in value from process optimization to people-centric processes
A new set of data is made possible by exploiting social business
A new IT market is emerging
• Organizations have extracted most of the efficiencies from traditional process automation
• IT enablement opportunities are shifting to Line of Business
• Social business drives new efficiencies and value from people-centric processes
• An opportunity to instrument people-processes
• Provides the basis for addressing diverse set of problems
• Adaptive social platforms instrumented with knowledge capture, interconnected with enterprise data and processes, and made intelligent through differentiating analytics will transform business
People-centric processes are at the core of a broad range of issues
Differentiate for Growth: Create winning products, fast, by having the best and most productive knowledge workers
Drive Sales Productivity: Create superior sales force, drive sales enablement and seller/client alignment
Grow in Emerging Markets: Re-create organizational footprint in global markets
Transform Service Delivery: Further grow productivity and enable new delivery models
Optimizing people-centric processes is not the same as optimizing supply chains
[Figure labels: “In the last couple of weeks, I’ve talked to ABC bank, XYZ and at a security conference.”; Status: Working; Status: At conference; Expert: Security; Influencer. Enterprise sources: CRM, Claims, Delivery Records, Patents & Publications, with attributes such as clients served, products sold, sales patterns, work specs, engagements worked, products, tasks accomplished, team info, innovation, technical leadership, and productivity]
“Status updates alone on Facebook amount to more than ten times more words than on all blogs worldwide” - David Kirkpatrick, The Facebook Effect
• Rich information (e.g. expertise, work patterns, response to incentives, digital reputation) is flowing through on-line collaboration and enterprise systems
• Capturing this information enables analytics to be applied to people-centric processes
Strength of Sales Force Index is an example of what is possible with a rich representation of people
• SSFI mines sales force data to understand which attributes of a seller (e.g. skills, experiences), sales team (e.g. team composition, territories) or sales process (e.g. incentives, coverage model) are driving sales performance (quota attainment, win rates, productivity)
TODAY
• Years selling
• Job change
• Salary band
• PBC
FUTURE
• True skills and expertise
• Disciplines
• Clients served
• Products sold
• Team experiences
• Connections
• Incentives and responses
• Career path
• …
• SSFI identifies:
– Reasons for performance disparities (at individual or group level), and the best set of actions to drive performance
“Why is our sales force in Region X not performing at par with other regions or competition?”
“What actions can we take to improve sales performance?”
“What are the incentives that truly drive performance?”
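The deck does not describe how SSFI is computed. Purely as an illustration of asking which attributes track sales performance, the sketch below ranks seller attributes by their Pearson correlation with quota attainment. The records are invented toy data, the field names are hypothetical, and correlation alone does not establish that an attribute drives performance.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def rank_attributes(records: list[dict], target: str) -> list[tuple[str, float]]:
    """Sort attributes by the strength of their correlation with the target."""
    ys = [r[target] for r in records]
    names = [k for k in records[0] if k != target]
    scored = [(name, pearson([r[name] for r in records], ys)) for name in names]
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)


# Invented toy records: one dict per seller.
sellers = [
    {"years_selling": 2, "clients_served": 5, "quota_attainment": 0.70},
    {"years_selling": 8, "clients_served": 9, "quota_attainment": 0.85},
    {"years_selling": 4, "clients_served": 12, "quota_attainment": 0.95},
    {"years_selling": 10, "clients_served": 15, "quota_attainment": 1.05},
    {"years_selling": 6, "clients_served": 20, "quota_attainment": 1.20},
]

for name, r in rank_attributes(sellers, "quota_attainment"):
    print(f"{name}: r = {r:+.2f}")
```

In this toy data the richer attribute (clients served) tracks performance more closely than years selling, which mirrors the TODAY versus FUTURE contrast above.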
Executing on SoP vision depends on three key capabilities
PEOPLE ENABLEMENT: Incorporate capabilities that adapt content for situations and needs, and enhance communication over many devices, across diverse pools of talent (context-aware, cognitive load management, translation, transcription, text-to-speech, voice…)
PEOPLE CONTENT: Develop capabilities to create a representation of a person’s skills, experiences, preferences, digital reputation… in a structured and organized way, so it can be used for the purpose of running a business
PEOPLE ANALYTICS: Implement capabilities for people-centric process optimization within an analytics platform for rapid, on-demand deployment (matching, talent cloud, crowdsourcing, predictive markets, simulation of workforce trends, performance analytics, behavior modeling…)
Future of Analytics
Explosion of unstructured data
Consistent, extensible, and consumable analytics platform
Optimizing across the stack to deploy analytics at scale
• Creates new analytics opportunities
• Addresses new enterprise needs
• Reduces cost-to-value for enterprises
• Increases analytics solution coverage with limited supply of skills
• Analytics becomes a dominant IT workload and drives HW design
• Opportunity to seamlessly scale from terascale to exascale
Analytics is broadly defined as the use of data and computation to make smart decisions
[Figure labels: Data (Historical, Simulated; Text, Video, Images, Audio); Decision point; Possible outcomes (Option 2)]
• Data instances
• Reports and queries on data aggregates
• Predictive models
• Answers and confidence
• Feedback and learning
The value of analytics grows by incorporating new sources of data, composing a variety of analytic techniques, spanning organizational silos, and enabling iterative, user-driven interaction
[Chart: Sources and types of data (from "Structured or standardized" to "New format or usage of data") against Scope of decision (Low to High). Plotted: Sales-based demand forecasting; Segmentation-based market impact estimates; Price-based demand forecasting (own & competitors); Multi-modal demand forecasting; Intent-to-buy trends]
Analytics toolkits will be expanded to support ingestion and interpretation of unstructured data, and enable adaptation and learning
Legend: New Data, Traditional, New Methods
Adaptive Analysis: Responding to context
Continual Analysis: Responding to local change/feedback
Optimization under Uncertainty: Quantifying or mitigating risk
Optimization: Decision complexity, solution speed
Predictive Modeling: Causality, probabilistic, confidence levels
Simulation: High fidelity, games, data farming
Forecasting: Larger data sets, nonlinear regression
Alerts: Rules/triggers, context sensitive, complex events
Query/Drill Down: In memory data, fuzzy search, geo spatial
Ad hoc Reporting: Query by example, user defined reports
Standard Reporting: Real time, visualizations, user interaction
Entity Resolution: People, roles, locations, things
Relationship, Feature Extraction: Rules, semantic inferencing, matching
Annotation and Tokenization: Automated, crowd sourced
• Learn
In the context of the decision process
• Decide and Act
• Understand and Predict
• Report
• Collect and Ingest/Interpret
Decide what to count; enable accurate counting
Extended from: Competing on Analytics, Davenport and Harris, 2007
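The "Optimization under Uncertainty" entry in the toolkit can be illustrated with a small scenario-based decision. This is a minimal sketch, assuming demand is described by a handful of weighted scenarios and that the costs of ordering too much or too little are known; all names and numbers are hypothetical.

```python
def expected_cost(order_qty: int, scenarios: list[tuple[float, int]],
                  unit_overage: float, unit_underage: float) -> float:
    """Expected cost of an order quantity over (probability, demand) scenarios."""
    return sum(
        p * (unit_overage * max(order_qty - demand, 0)
             + unit_underage * max(demand - order_qty, 0))
        for p, demand in scenarios
    )


def best_order(candidates: list[int], scenarios: list[tuple[float, int]],
               unit_overage: float, unit_underage: float) -> int:
    """Pick the candidate quantity with the lowest expected cost."""
    return min(candidates,
               key=lambda q: expected_cost(q, scenarios, unit_overage, unit_underage))


# Hypothetical demand scenarios: (probability, units demanded).
demand_scenarios = [(0.25, 80), (0.50, 100), (0.25, 120)]

choice = best_order([80, 100, 120], demand_scenarios,
                    unit_overage=1.0, unit_underage=4.0)
print(f"order {choice} units, expected cost "
      f"{expected_cost(choice, demand_scenarios, 1.0, 4.0):.1f}")
```

Because running short is assumed to cost four times as much as holding excess, the lowest expected cost comes from ordering above the most likely demand. Planning for the single most likely scenario would miss this.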
Analytic solutions will apply multiple methods to multiple forms of data
Example: Utility Vegetation Management
• Effective Right of Way vegetation management is critical to streamlined utility operations
• Traditional Right of Way programs are mainly static-scenario driven on a six-year cycle
– Static and rigid models lead to predominantly reactive operations, which are expensive
– Focus on narrow corridor widths fails to address severe weather impact
• A multimodal analytics approach can overcome these shortcomings
– Structured data (e.g. transmission line maps) and unstructured data (e.g. LIDAR sensor)
– Advanced modeling to perform a dynamic scenario-driven analysis
[Figure: Solution Framework. Labels: inputs SENSORS, UTILITY DATA, MAPS, WEATHER, each with a Preprocessor; 3-Dimensional Model; Right-of-Way Dynamic Forecasting Model; Recovery Schedule Generator; Visualization; sectors ELECTRIC, TELECOMMUNICATIONS, RAIL, ROAD, OIL]
Analytics solution development requires several interacting design steps
Design steps: Data Evaluation and Fusion; Algorithm Composition and Invention; Testing and Execution Optimization
Data types: Streaming data; Text data; Multi-dimensional; Time series; Geo spatial; Video & image; Relational; Social network
Algorithms: Data mining & statistics; Optimization & simulation; Semantic analysis; Fuzzy matching; Network algorithms; New algorithms
Pipeline: Data Acquisition; Filtering and Extraction; Validation; Business Rules Engine; Core Analytics; Composition and Packaging; Deployment
An Analytics solution platform will increase enterprise value by supporting both the CxO solution and the CIO infrastructure
• Easier consumption of Analytics solutions
– Have consistent look and feel
– Changes are easier to implement effectively
– Trustworthy solutions are produced
• More efficient, less complex development
– Reduces growth of development costs
– Speeds delivery of new functionality
– Expands analytics solution developer population
• Reduces client cost of operation
– Seamless integration eases deployment of solutions
– Establishes preferred development path for new solution
– Consistent and coherent infrastructure eases managing solutions
[Chart labels: Revenue; Lines of code; With platform; Without platform]
Expand Mandate: Refine business processes and enhance collaboration
Transform Mandate: Change the industry value chain through improved relationships
Leverage Mandate: Streamline operations and increase organizational effectiveness
Pioneer Mandate: Radically innovate products, markets, business models
The CIO can reduce cost and add value to the use of analytics by supporting collaboration and data/analysis sharing
Optimizing across the stack will enable the deployment of analytics at scale
Systems supporting future analytics will be more data centric, composable and scalable
• Systems will support increasingly complex data sets and workflows.
• Different elements within these complex workflows will require different capabilities within systems.
[Figure labels: workloads Predictive Analytics; Modeling, Simulation; Text Analytics; Hadoop Workloads; Optimization; Sensitivity Analysis. Each is shown with a mix of Cores, SCM, Storage, and Network. General Purpose. Future System: Integrated Network, Integrated Processing, Integrated Storage]
• Balanced, reliable, power efficient systems, with integrated software that scales seamlessly
• Integrated analytics, modeling and simulation capabilities to address generation, management and analysis of Big Data for Business Advantage
The Future Watson
Extend Watson technology
• Moves beyond “question-in & answer-out” to always “learning” evidence-based decision support
Lead in new domains
• Addresses the enterprise need to convert growing volumes of information into actionable knowledge
• Demonstrates business value in critical problem spaces, starting with Healthcare
Enable efficient adaptation
• Efficiently adapting and scaling Watson to new domains requires a novel blend of engineering and research
Watson’s real value proposition: Efficient decision support over unstructured (and structured) content
Jeopardy! Challenge: Deeper Understanding, Higher Precision and Broader, Timely Coverage at lower costs
Unstructured Data: Shallow Understanding, Low Precision, Broad Coverage
• Broad, rich in context
• Rapidly growing, current
• Invaluable yet under utilized
Structured Data: Deeper Understanding but Brittle, High Precision at High Cost, Narrow Limited Coverage
• Precise, explicit
• Narrow, expensive
Taking Watson beyond Jeopardy!
Understanding: from Specific Questions to Rich Problem Scenarios. From specific questions to rich, incomplete problem scenarios (e.g. EHR)
Interacting: from Question-In/Answer-Out to Interactive Dialog. Evidence analysis and look-ahead, drive interactive dialog to refine answers and evidence
Explaining: from Precise Answers & Accurate Confidences to Comparative Evidence Profiles. Move from quality answers to quality answers and evidence
Learning: from Batch Training Process to Continuous Training & Learning Process. Scale domain learning and adaptation rate and efficiency
Example question: The type of murmur associated with this condition is harsh, systolic, and increases in intensity with Valsalva
[Figure labels: Entire Medical Record; Dialog; Input, Responses; Answers, Corrections, Judgements; Responses, Learning Questions; Refined Answers, Follow-up Questions; Teach Watson]