Analytics is Driving the Evolution of ECM
ECM Becomes a Key Enabler for Information-Led Transformation

Craig Rhinehart
Director of ECM Product Strategy
The Next Wave of ECM Innovation …
Analyze Your Content with Trusted Content Analytics
© 2009 IBM Corporation
Craig Rhinehart Contact Info
• On my blog this week …
• What happens when we fail to govern enterprise
content properly?
• Email me at
• My blog can be found at
• Follow me on Twitter at
• Introduction to Content Analytics
• How Content Analytics Works
• New Cognos Content Analytics Offering
• Cognos Content Analytics Demo
• New InfoSphere Content Assessment Offering
Trusted Content Analytics Overview
Content Assessment
Empower organizations to identify necessary information and
decommission the unnecessary
Master Content
Deliver trusted content to empower better decision
making about individual customers
Leverage & Exploit
Content Analytics
Deliver insight by visualizing trends, correlations and anomalies
about your overall business from your content
The world is changing and becoming more…
The resulting explosion of information
creates a need for a new kind of intelligence…
… to help build a Smarter Planet
Creating New Business Optimization Opportunities...
What if you could understand
what your customers want
before they ask?
What if you could detect
fraudulent claims before
they’re paid?
What if you could find crime
patterns and apprehend
criminals in real-time?
What if you could make cities
smarter by integrating all
information about a citizen?
Business Optimization Enabled by Content Analytics
Smarter Insurance
Large Claims Third-Party Administrator
Analytics over insurance claim files helps detect
fraud faster, reducing costs for their clients by
$millions and optimizing the claims-handling
Smarter CPG
Kraft Australia
Analytics over online customer postings helps
Kraft target and deliver new branding
campaigns, increasing sales and customer
Smarter Telecommunications
Analytics over Voice of Customer data provides
insight to drive customer-oriented decision
making, boosting loyalty and creating new
Smarter Healthcare Plans
Blue Cross Blue Shield of TN
Analytics over an integrated single view
of plans, patients and providers enables
better negotiations and improves provider
satisfaction to over 90%
Analytics is Driving the Evolution of ECM
ECM Becomes a Key Enabler for Information-Led Transformation
Smarter Business
Content Analytics
BPM  Advanced
Case Management
• Content Analytics
• Content Assessment
• Master Content
• Advanced Workflow
• Activity Monitoring
• Business Rules
Image Management
Office Document Management
Archiving / Records Management
Compliance Lifecycle Mgmt
Every single organization:
1. Keeps too much information and spends too much storing content
because there’s too much to sift through
2. Can’t pinpoint the right content when they need it because it’s
unfindable or hidden away in a departmental silo
3. Can’t trust the content they do find about their customers because
the lifecycle is uncontrolled
4. Needs to deliver better customer service for less, because those with
the best service are rising above the rest in highly competitive
5. Wants to optimize their business by
• anticipating their customers’ purchasing needs
• reducing fraud
• delivering a more complete view of their customers
• gaining early warning on product quality and customer satisfaction
because the answers exist inside their organization, they’re just
buried underneath too much information
Key Enabling Innovation: Content Analytics
Claimant: Soft Tissue Injury Concept
Content Analytics
Based on UIMA, the open, industry-standard
architecture for text analysis pioneered by
IBM and now an OASIS standard and Apache
open-source project
Body Part
Noun Phrase
Prep Phrase
John sprained his ankle on the step
Analyzed Documents with identified concepts
• From each document you can derive:
• New business understanding
• New visibility from content
• Create structure and understanding from a group of words
• Powered by IBM’s unique Dynamic Analysis capability
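To make the slide's example concrete, here is a minimal sketch of how annotations might be layered over the sentence "John sprained his ankle on the step". It is a toy in plain Python, not the UIMA API or the product's analysis: the `Annotation` class, the `BODY_PARTS` and `INJURY_VERBS` lexicons, and the rule that an injury verb followed by a body part forms a "Soft Tissue Injury" concept are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Annotation:
    label: str  # annotation type, e.g. "Body Part"
    begin: int  # start offset in the text
    end: int    # end offset (exclusive)

    def covered_text(self, text: str) -> str:
        return text[self.begin:self.end]


# Hypothetical lexicons, for illustration only.
BODY_PARTS = {"ankle", "knee", "wrist", "neck", "back"}
INJURY_VERBS = {"sprained", "strained", "pulled"}


def tokenize(text: str) -> List[Tuple[str, int, int]]:
    """Split on whitespace and keep each token's character offsets."""
    tokens, offset = [], 0
    for word in text.split():
        begin = text.index(word, offset)
        offset = begin + len(word)
        tokens.append((word, begin, offset))
    return tokens


def annotate(text: str) -> List[Annotation]:
    tokens = tokenize(text)
    found = []
    for i, (word, begin, end) in enumerate(tokens):
        if word.lower() in BODY_PARTS:
            found.append(Annotation("Body Part", begin, end))
        if word.lower() in INJURY_VERBS:
            # Assumed rule: an injury verb followed by a body part forms the concept.
            for later, _, later_end in tokens[i + 1:]:
                if later.lower() in BODY_PARTS:
                    found.append(Annotation("Soft Tissue Injury", begin, later_end))
                    if i > 0:  # treat the word before the verb as the claimant
                        _, prev_begin, prev_end = tokens[i - 1]
                        found.append(Annotation("Claimant", prev_begin, prev_end))
                    break
    return found


text = "John sprained his ankle on the step"
for annotation in annotate(text):
    print(annotation.label, "->", annotation.covered_text(text))
```

Each annotation records only a label and a character span, so the original sentence stays untouched while structure accumulates on top of it.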
Content Analytics enables analysis that was previously impractical
Aggregates conclusions & scales out understanding to large data sets
Claimant: Soft Tissue Injury Concept
Body Part
Automatic Visualization
Noun Phrase
Prep Phrase
Concepts and tagged source
information are visualized in UI
John sprained his ankle on the step
Source Info
(ECM, File, Web, DBMS, ...)
Analyzed Documents
with identified concepts
• Content analytics scales out
document by document content
• Aggregate the conclusions
• Assess volumes of information not
otherwise humanly possible (or
cost effective)
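The aggregation step can be sketched in a few lines. This is an illustrative sketch only: the document records, the `source` and `concepts` field names, and the concept labels are hypothetical stand-ins for the output of per-document analysis.

```python
from collections import Counter
from typing import Dict, List


def aggregate_concepts(analyzed_documents: List[Dict]) -> Counter:
    """Count how many documents mention each concept (once per document)."""
    totals = Counter()
    for document in analyzed_documents:
        totals.update(set(document["concepts"]))
    return totals


# Hypothetical analyzed documents from different source types.
analyzed_documents = [
    {"source": "ECM", "concepts": ["Soft Tissue Injury", "Body Part"]},
    {"source": "File", "concepts": ["Soft Tissue Injury", "Unwitnessed Event"]},
    {"source": "DBMS", "concepts": ["Body Part", "Body Part"]},
]

totals = aggregate_concepts(analyzed_documents)
for concept, count in totals.most_common():
    print(concept, count)
```

Counting each concept once per document keeps a single long document from dominating the aggregate; a different weighting could be used if raw mention counts matter more.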
Dynamic Analysis: Basis for Trusted Content Analytics Solutions
Impractical and overwhelming analyses are now a reality
Aggregate Correlate
IBM’s unique
Dynamic Analysis capability
Aggregate … form collections from multiple
content sources and types unmatched in industry
Correlate … deep analysis of content that surfaces
trends, relationships, patterns, concepts and
anomalous associations
Visualize … easy to use, feature-rich views to
quickly dissect large corpora of content and zero-in
on answers
Explore … freely investigate content with faceted
navigation and drill down to surface new insight
and understanding.
… to enable informed business decisions
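One common way to surface an "anomalous association" between two facets is to compare how often two values co-occur with how often they would co-occur if they were independent (a ratio often called lift). The sketch below uses that measure as an assumption; the slides do not say which statistic Dynamic Analysis applies, and the `product` and `issue` facets and the sample records are invented.

```python
from collections import Counter
from typing import Dict, List, Tuple


def lift_table(documents: List[Dict], facet_a: str, facet_b: str) -> Dict[Tuple, float]:
    """Observed co-occurrence divided by the count expected under independence."""
    n = len(documents)
    count_a = Counter(d[facet_a] for d in documents)
    count_b = Counter(d[facet_b] for d in documents)
    count_ab = Counter((d[facet_a], d[facet_b]) for d in documents)
    return {
        (a, b): observed * n / (count_a[a] * count_b[b])
        for (a, b), observed in count_ab.items()
    }


# Hypothetical documents, each tagged with two facet values.
documents = [
    {"product": "A", "issue": "battery"},
    {"product": "A", "issue": "battery"},
    {"product": "B", "issue": "screen"},
    {"product": "B", "issue": "battery"},
]

lift = lift_table(documents, "product", "issue")
for pair, value in sorted(lift.items(), key=lambda item: -item[1]):
    print(pair, round(value, 2))
```

Values well above 1 mark pairs that appear together more often than chance would suggest; values below 1 mark pairs that appear together less often.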
Result: A Platform for Uncovering New Insights
Separate the valuable
content from the
Determine what
customers will buy
Find early warnings
on product quality
Tells you something you may not know
Identify potentially
fraudulent insurance
Claimant: Soft Tissue Injury Concept
Based on UIMA
Body Part
Noun Phrase
Prep Phrase
Automated Concept
Extraction and Logical
Plug-in Custom Analytics
Automatic Classifier
Multi-word Analytics
Named Entity Extraction
Word Analytics
Identify Language
John sprained his ankle on the step
Visualization UI
UIMA Annotators
UIMA is an open, industrial-strength, scalable and extensible platform for creating, integrating and deploying unstructured
information management solutions from combinations of semantic analysis and search components.
Although UIMA originated at IBM, it is now an OASIS industry standard and an open-source project
that is currently incubating at the Apache Software Foundation.
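The stacked annotators on this slide suggest a pipeline in which each stage adds results to a shared analysis record that later stages can read. The following is a minimal sketch of that idea in plain Python, not the UIMA API: the `Analysis` dictionary, the annotator functions and their toy heuristics are all assumptions chosen to keep the example short.

```python
from typing import Callable, Dict, List

Analysis = Dict[str, object]           # shared record passed between annotators
Annotator = Callable[[Analysis], None]


def identify_language(analysis: Analysis) -> None:
    # Toy heuristic: a few common English words are enough to say "en".
    words = set(str(analysis["text"]).lower().split())
    analysis["language"] = "en" if words & {"the", "his", "on"} else "unknown"


def word_analytics(analysis: Analysis) -> None:
    analysis["tokens"] = str(analysis["text"]).split()


def named_entity_extraction(analysis: Analysis) -> None:
    # Toy rule: capitalized tokens are candidate names.
    analysis["entities"] = [t for t in analysis["tokens"] if t[0].isupper()]


def custom_injury_annotator(analysis: Analysis) -> None:
    # Stand-in for a plug-in custom annotator; the rule is hypothetical.
    tokens = [t.lower() for t in analysis["tokens"]]
    analysis["concepts"] = ["Soft Tissue Injury"] if "sprained" in tokens else []


def run_pipeline(text: str, annotators: List[Annotator]) -> Analysis:
    analysis: Analysis = {"text": text}
    for annotator in annotators:
        annotator(analysis)  # each stage reads earlier results and adds its own
    return analysis


result = run_pipeline(
    "John sprained his ankle on the step",
    [identify_language, word_analytics, named_entity_extraction, custom_injury_annotator],
)
print(result)
```

Because every annotator has the same signature, a custom stage can be appended to the list without changing the stages before it.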
Leverage &
IBM Cognos Content Analytics
Deliver insight about your overall business from your content
I need to improve
my customer sat
Using Dynamic Analysis, Cognos Content Analytics
powers solutions that can:
Drive new business understanding and visibility
leveraging the content & context of unstructured
Enable better business decisions by explaining why
events are occurring
Expose patterns and trends to highlight optimization
opportunities and create differentiation
Create cost savings by uncovering process
inefficiencies and optimization opportunity
All without prior knowledge or
pre-defined queries or reports
The impact:
• Improved customer satisfaction
• Reduced fraud
• Better understanding of market demand and
• Early warning on product quality issues
I need to better
anticipate my
customers’ needs
I need better
visibility into the
I need to
optimize my
claims process
I need to fight
crime faster
I need to get
ahead of product
quality problems
I need to make my
legal team more
I need to assess my
content & take action
to better manage it
I need to
reduce fraud
I need to anticipate
IBM Cognos Content Analytics features…
• Analyze and explore structured and
unstructured information
• Automatic extraction of meaningful
concepts and entities from text
• Open, standard UIMA-based text analysis
• Integration with Cognos for reporting
against unstructured concepts
• Multiple graphical views of the facets
(dimensions) of unstructured content
• Automatic highlighting of interesting
anomalies and correlations in the data
• Support for analysis of over 30 content
sources and over 150 content formats
• Integration with ICM for analysis of
document categories, classes, and clusters
• Highly scalable and extensible
Cognos Content Analytics adds value to…
Retail Customer Care
• Analyzing: Call logs, online media
• For: Brand Reputation Management
• Benefits: Improve customer sat, marketing campaigns
Retail Banking
Customer Care
• Analyzing: Call logs, online media
• For: Buyer Behavior
• Benefits: Improve Customer
satisfaction, marketing campaigns,
find new revenue opportunities
Crime Analytics
• Analyzing: Police records, 911 calls…
• For: Rapid crime solving & crime trend analysis
• Benefits: Safer communities & optimized force deployment
Healthcare Analytics
• Analyzing: Care records
• For: Clinical analysis; treatment protocol optimization
• Benefits: Better management of chronic diseases;
optimized drug formularies; improved patient outcomes
Telco Customer Care
• Analyzing: Call center logs and emails
• For: Churn prediction and FAQ generation
• Benefits: Improved customer retention &
customer satisfaction
...and more!
Automotive Quality Insight
• Analyzing: Tech notes, call logs, online media
• For: Brand Reputation Management
• Benefits: Reduce warranty costs, improve customer
satisfaction, marketing campaigns
Insurance Fraud
• Analyzing: Insurance claims
• For: Detecting Fraudulent activity & patterns
• Benefits: Reduced losses, faster detection,
more efficient claims processes
Insurance Case Study for Fraud Detection and Prediction
Content Analytics
Based Predictive
Fraud Indicators:
Soft Tissue Injury
Unwitnessed Event
Prior Injury
Multiple Claims …
1. Automatically aggregate structured and
unstructured data accumulated over
time from the claims process
2. Correlate text analytics to apply
meaning and understand patterns and
trends … visualize and explore to
uncover new insights into claims
3. Instrument by applying indicators to
“in process” claims to identify
suspicious claims and type of risk
Historical Cross-Claim
Content Analytics
4. Score suspicious claims to predict
probability and impact of fraud and
5. Route high-likelihood and/or high-impact claims for investigation based
on scoring outcomes
6. Continuously improve outcomes
through closed loop optimization
Routing to
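Steps 3 to 5 above (instrument, score, route) can be sketched as a small rule-based scorer. This is an illustration under stated assumptions, not the case study's actual model: the indicator names come from the slide, but the point values, both thresholds, the claim records and the `route` function are hypothetical.

```python
from typing import Dict, List

# Hypothetical point values for the fraud indicators named on the slide.
INDICATOR_POINTS = {
    "Soft Tissue Injury": 20,
    "Unwitnessed Event": 30,
    "Prior Injury": 20,
    "Multiple Claims": 30,
}
LIKELIHOOD_THRESHOLD = 50    # hypothetical cut-off for "high likelihood"
HIGH_IMPACT_AMOUNT = 50_000  # hypothetical cut-off for "high impact"


def score(indicators: List[str]) -> int:
    """Sum the points of the distinct indicators found on a claim."""
    return sum(INDICATOR_POINTS.get(name, 0) for name in set(indicators))


def needs_investigation(claim: Dict) -> bool:
    points = score(claim["indicators"])
    high_likelihood = points >= LIKELIHOOD_THRESHOLD
    # A large claim is only escalated if at least one indicator fired.
    high_impact = points > 0 and claim["amount"] >= HIGH_IMPACT_AMOUNT
    return high_likelihood or high_impact


def route(claims: Dict[str, Dict]) -> Dict[str, List[str]]:
    queues: Dict[str, List[str]] = {"investigate": [], "standard": []}
    for claim_id, claim in claims.items():
        queue = "investigate" if needs_investigation(claim) else "standard"
        queues[queue].append(claim_id)
    return queues


# Hypothetical in-process claims.
claims = {
    "C-1": {"indicators": ["Soft Tissue Injury", "Unwitnessed Event"], "amount": 1_200},
    "C-2": {"indicators": [], "amount": 90_000},
    "C-3": {"indicators": ["Prior Injury"], "amount": 80_000},
    "C-4": {"indicators": ["Soft Tissue Injury"], "amount": 500},
}
queues = route(claims)
print(queues)
```

In a closed-loop setup (step 6), investigation outcomes would feed back into the point values and thresholds; here they are fixed constants for simplicity.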
Partner Solution for Healthcare Fraud Analytics
Accelerating Regulatory Review
Environmental Protection Agency
The Customer Problem:
The Solution:
• EPA tracks chemicals being produced
• Chemical producers submit robust reports of
effects on environment
• EPA has 3,000 of these reports and no
way to analyze the data
The Results:
• Convert documents to XML
• Extract complex chemical structures from the
• Provided toxicological capability to
understand how different chemicals map to
“end effects” (e.g. increase in liver weight)
• Provide ability to analyze chemical structures
in reports and, using patent data, understand
how these chemicals are being used in the
Better Business Outcome: NYPD is Solving More
Crime Faster with New Insight from Content Analytics
Identify and Designate
Trusted Repositories of
Create, Control,
Maintain and Supply
Trusted Content
Consume, Leverage
and Exploit Trusted
Govern The Information Lifecycle … Archive, Record and Preserve
Information and Evidence of Transactions, Processes and Events
• Search and analyze complaints, police
reports, 911 records, arrest records, and
data marts … all stuck in silos of information
• All of these forms of text suffer from the
common problems of call center text, e.g.
abbreviations, misspellings, synonyms
(police-specific examples: perp, ML, FM, MO,
pistol, gun, etc.)
• Find events that keyword search can never
find because they are all described
differently – what keyword to use?
• IBM OmniFind Enterprise Edition with
Content Analytics enables insight and
understanding across all silos
The Results
• Text Analytics can describe events,
categorize them and allow for concept
searches across often unstructured
and at times inaccurate descriptions
• Enables aggregated view of
information beyond silos
• In the first week of deployment, two old
murder cases were solved, which was
directly attributed to being able to
analyze trusted data and content
• Customized with NYPD-specific case
management analytics
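The difference between keyword search and concept search can be illustrated with a small normalization step that maps abbreviations, synonyms and near-miss spellings onto one canonical concept. The sketch below is an assumption-laden toy, not the OmniFind implementation: the `CONCEPT_TERMS` mapping, the sample reports and the 0.8 similarity cut-off are invented, and misspellings are handled with the standard library's `difflib.get_close_matches`.

```python
import difflib
import re
from typing import List, Set

# Hypothetical mapping from surface terms to canonical concepts.
CONCEPT_TERMS = {
    "perp": "perpetrator",
    "perpetrator": "perpetrator",
    "pistol": "firearm",
    "gun": "firearm",
}


def concepts_in(text: str) -> Set[str]:
    """Return the canonical concepts mentioned in a piece of free text."""
    found = set()
    for token in re.findall(r"[a-z]+", text.lower()):
        # Tolerate small misspellings by accepting close matches only.
        match = difflib.get_close_matches(token, list(CONCEPT_TERMS), n=1, cutoff=0.8)
        if match:
            found.add(CONCEPT_TERMS[match[0]])
    return found


def concept_search(reports: List[str], concept: str) -> List[str]:
    return [report for report in reports if concept in concepts_in(report)]


# Hypothetical report snippets with an abbreviation, a synonym and a misspelling.
reports = [
    "Perp fled with a pistl",
    "Suspect showed a gun",
    "Victim reported a stolen bike",
]
hits = concept_search(reports, "firearm")
print(hits)
```

A plain keyword search for "gun" would miss the first report; searching on the normalized concept finds both.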
Accelerating Crime Analysis (Law Enforcement)
• Customer observed “that a too significant part (estimation of
76%) of the analyst’s time is spent in non real analysis tasks
with no real added value for their analysis business”
• “Enable the analysts to cope with the increasingly large
volumes of intelligence information that they are receiving”
• “Automatically extract and find relevant information (facts,
entities, link, etc.) useful for the analysis without having to
spend hours to examine and manually parse data collection.”
• Solution based on Content Analytics with search front-end
built with IBM OmniFind Enterprise Edition on top of an ECM
Europol Example
refinement of user
query, based on
detected concepts
Concepts such as cars,
people, and crime events are
extracted from the
underlying text by
text analysis technology
FDA MedWatch incident reports are one source
of data for medical device manufacturers to
understand problems being reported by
consumers about their products. They contain both
structured and unstructured information.
A manufacturer could also analyze internal
content, such as warranty claims or support
This view shows Deviations
(or anomalies) over time for
all values of the selected
facet, in this case Generic
Device Name
Here we see an unexpectedly
high occurrence of incidents
around Infusion Pumps in April,
2008, so we drill in.
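A deviation view of this kind can be approximated by comparing each month's count for a facet value with that value's own average over the period. The sketch below is one simple way to do that and is not the product's algorithm: the incident counts are invented (they are not MedWatch data), and the `deviations` function and its threshold of twice the average are assumptions.

```python
from collections import Counter
from typing import Dict, List, Tuple


def deviations(incidents: List[Dict], threshold: float = 2.0) -> List[Tuple[str, str, int]]:
    """Flag (device, month) pairs whose count is at least `threshold` times
    that device's average monthly count."""
    counts = Counter((i["device"], i["month"]) for i in incidents)
    months = sorted({i["month"] for i in incidents})
    devices = sorted({i["device"] for i in incidents})
    flagged = []
    for device in devices:
        series = [counts[(device, month)] for month in months]
        mean = sum(series) / len(series)
        for month, count in zip(months, series):
            if mean > 0 and count / mean >= threshold:
                flagged.append((device, month, count))
    return flagged


def make(device: str, month: str, n: int) -> List[Dict]:
    return [{"device": device, "month": month} for _ in range(n)]


# Invented counts: a spike for one device in one month, a flat series for another.
incidents = (
    make("Infusion Pump", "2008-02", 1)
    + make("Infusion Pump", "2008-03", 1)
    + make("Infusion Pump", "2008-04", 6)
    + make("Catheter", "2008-02", 2)
    + make("Catheter", "2008-03", 2)
    + make("Catheter", "2008-04", 2)
    + make("Catheter", "2008-05", 2)
)
flagged = deviations(incidents)
print(flagged)
```

Comparing each value against its own baseline keeps high-volume devices from being flagged simply because they are reported more often overall.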
Switching to the Facets view
of key phrases, we see
frequent mentions of battery
issues in Infusion Pump
incidents reported in April,
2008. We drill down into these
battery issues.
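The drill-down described here amounts to filtering documents by the facet values selected so far and then recounting another facet over what remains. A minimal sketch follows, with hypothetical documents and facet names (`device`, `month`, `key_phrases`); it illustrates faceted navigation in general rather than the product's implementation.

```python
from collections import Counter
from typing import Dict, List


def matches(document: Dict, facet: str, value: str) -> bool:
    field = document.get(facet)
    # Multi-valued facets are stored as lists, single-valued ones as strings.
    return value in field if isinstance(field, list) else field == value


def drill_down(documents: List[Dict], selections: Dict[str, str]) -> List[Dict]:
    """Keep only documents that match every selected facet value."""
    return [d for d in documents if all(matches(d, f, v) for f, v in selections.items())]


def facet_counts(documents: List[Dict], facet: str) -> Counter:
    counts = Counter()
    for document in documents:
        field = document.get(facet)
        if field is not None:
            counts.update(field if isinstance(field, list) else [field])
    return counts


# Hypothetical incident documents.
documents = [
    {"device": "Infusion Pump", "month": "2008-04", "key_phrases": ["battery failure", "alarm"]},
    {"device": "Infusion Pump", "month": "2008-04", "key_phrases": ["battery failure"]},
    {"device": "Infusion Pump", "month": "2008-03", "key_phrases": ["tubing leak"]},
    {"device": "Catheter", "month": "2008-04", "key_phrases": ["kinked tubing"]},
]

april_pumps = drill_down(documents, {"device": "Infusion Pump", "month": "2008-04"})
phrases = facet_counts(april_pumps, "key_phrases")
battery = drill_down(april_pumps, {"key_phrases": "battery failure"})
print(phrases.most_common(), len(battery))
```

Each drill-down returns a plain list of documents, so selections can be chained in any order and any facet can be recounted at each step.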
In the documents view, we can
see the original source
documents about these 154
battery-related infusion pump
Relevant matching text from
the original documents is
Switching to a Brand Name facet
view, we can immediately see a
summary, by frequency and
correlation, of the devices that
are mentioned in these battery-related incidents.
Through Cognos Content
Analytics OLAP/Star Schema
export ability, Cognos BI reports
and dashboards can be created to
monitor and track these issues
over time.
When a potential regulatory, legal, or
compliance issue is identified, the same
Content Analytics interface can be used to
identify internal documents that might be
relevant, gather them, and export them for
archiving into a centralized IBM ECM
The IBM Content Collector
provides a graphical interface for
coordinating the archiving of
these, and other relevant items
(such as related emails).
Emails and documents can be
classified, declared as records and
even have metadata cleansed
prior to becoming a managed or
archived item.
Once gathered into a repository, IBM
eDiscovery tools can be used to
place legal holds on items, and
prepare evidence for legal cases,
audits, or other compliance events.
Retention and Legal holds can be
enforced within the storage
infrastructure if using IBM Information
Specific subsets of evidence
can be marked for further
review to identify the degree
of risk or legal exposure.
Unnecessary Information Eclipses Necessary Information
High Risk
How much of your information is unnecessary?
70%? 80%? 90%?
Content Assessment Enables Content Decommissioning
Bloated Production Systems
with Inefficient Storage
Content Based Systems
Needing Retirement
Content In The Wild
• Semi-automated process separates trusted
from suspected
• Efficiently addresses large-scale problems,
while incorporating the human element
One customer found 1200 copies of the
same policy document across multiple
enterprise file servers
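Finding many copies of the same document is one of the simpler assessment tasks to sketch: hash each file's content and group paths that share a digest. The example below assumes exact byte-for-byte duplicates and uses in-memory content with hypothetical paths; it says nothing about how InfoSphere Content Assessment detects duplicates, and near-duplicates (for example, the same policy saved in another format) would need a different technique.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List


def find_duplicates(files: Dict[str, bytes]) -> List[List[str]]:
    """Group file paths whose content is byte-for-byte identical."""
    by_digest = defaultdict(list)
    for path, content in files.items():
        by_digest[hashlib.sha256(content).hexdigest()].append(path)
    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]


# Hypothetical file contents keyed by path.
files = {
    "//server1/hr/policy.doc": b"Travel policy v3",
    "//server2/shared/policy-copy.doc": b"Travel policy v3",
    "//server2/shared/handbook.doc": b"Employee handbook",
}
duplicates = find_duplicates(files)
print(duplicates)
```

On real file servers the content would be read from disk, ideally in chunks, and the resulting groups would still go to a person for review before anything is decommissioned, in line with the semi-automated process described above.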
IBM InfoSphere Content Assessment
Housekeeping doesn’t have to be a chore.
Dynamically Analyze what you have
Aggregate, Correlate, Visualize and Explore your enterprise information in
new ways to understand virtually all content types from multiple sources.
Make rapid decisions about business value, relevance and disposition.
Decommission what’s unnecessary
Save cost and reduce risk by eliminating obsolete, over-retained, duplicate,
and irrelevant content – and the infrastructure that supports it.
Preserve and Exploit the content that matters
Collect valued content to manage, trust and govern throughout its lifespan in
an enterprise-grade ECM platform. Uncover new business value and insight
by integrating with solutions for eDiscovery, case management, master data
management, business intelligence, predictive analytics and more.
Selling Content Assessment via BVA
Content decommissioning and dynamic collection for eDiscovery lead to measurable ROI
Cost Drivers | Savings After Deployment
Production System Tangible Costs | Storage Management Tangible Savings
• Email / File / SharePoint Storage
• Production System Servers
• Cost of backup media and storage
Production System Productivity Costs | Storage Management Productivity Savings
• Production System Administration | 20% to 80%
• End-User Administration / Classification | 70% to 90%
eDiscovery Costs | eDiscovery Cost Avoidance
• Data Spoliation (fines, lost or settled | Up to 100%
• Labor costs of providing the information | Hours vs. Days
Trusted Content Analytics Summary
Content Assessment
Empower organizations to identify necessary information and
decommission the unnecessary
Master Content
Deliver trusted content to empower better decision
making about individual customers
Leverage & Exploit
Content Analytics
Deliver insight by visualizing trends, correlations and anomalies
about your overall business from your content
Craig Rhinehart
Director of ECM Product Strategy
• Email me at
• My blog can be found at
• Follow me on Twitter at