Data sensing

Having it All
is not Having it All at All!
Problem Formulation in the Face of
Overwhelming Quantities of Data
A journey of discovery… Where’s the fire?
START FROM THE
BEGINNING -- “Before the
beginning of great
brilliance, there must be
Chaos.” -- (I Ching)
“At the beginning of the 21st century,
the population of the Earth [was]
6.300.000.000., who annually
experience a reported 7,000,000 – 8,000,000 fires with 70,000 – 80,000
fire deaths and 500,000 – 800,000 fire
injuries.”
Dr. Ing. Peter Wagner, 2006
Data everywhere.
Who knew?
Gone are the days when there was a single
source of “truth”…
Harvard
R.G. Dun Credit Report Collection
Baker Library
Entries in a book on Australia business owners
About a storekeeper in Halifax
County, N.C. – June 1873:
“purchaser of stolen goods, a
great scamp.”
Entry about one J. B. Alford, who sold
groceries and liquors: June 1870
“This man is said to be in thriving
circumstances. He has some Real &
personal estate & I think it is safe to
trust him.”
Entry on Hannah Griffith, a milliner in
Springfield, Ill., in 1869:
“about to marry a fellow [of] no
account.”
An entry two years later noted, with
some relief, that that plan had fallen
through.
"is not much of a businessman, but had some capital, it is said, advanced by his father,
who is reputed well off" -- About J.D. Rockefeller – who turned out to be a good credit risk;
1863 was the year he set up a refinery that blossomed into Standard Oil.
Hold on… things are
changing.
Framing our case for change…
The Operating Environment
• We all know that the world is changing
• We are aware that change is accelerating
at an unprecedented rate
• We see new types of data, technologies, and
behaviors every day
• More and more, we are tasked with discerning the
discoverable need from the articulated want
The Case for Change
• What has made us successful so far is insufficient
• We now have the ability to succeed… or fail, much faster
• The connectedness of information, and the ways in which
it is changing, are affecting the risk and opportunity space
in ways we are only beginning to understand
Sometimes, a picture is worth a thousand words.
• The largest corpus of data preceded the event
• Most data created about the event had significant and asymmetric latency
• The rate of “data decay” attributable to the participants in the event is significant
Pope Benedict Inauguration
Lately, a thousand pictures are taken in the time
it takes to speak a single word!
• What about the digital footprint of all of the smartphones?
• What about the social networks of the crowd?
• What about the metadata in the photos?
• What are the opportunity costs to other activities?
Pope Francis Inauguration
Asking the right question
What if the
Hokey Pokey
really is what it’s
all about?
How many more
of these silly
questions till the
next slide?
What if there
were no
hypothetical
questions?
How deep would
the ocean be if
sponges didn’t
live there?
Questions about risk and opportunity are at the heart
of our focus.
What about
fraud?
What other risks
should I know about
before doing
business with this
small company?
Who is the right
decision maker at this
company and how do I
effectively reach them?
Should I
extend credit?
What other companies
is this individual
associated with?
I need insights about a
contact to help me
target my messaging
Which
customers
should I call
on next
What is the
right credit
limit?
Which
prospects are
most
promising?
I need
answers!
How do I identify
changes with my
current contact
relationships?
How can I de-dupe
my current customer
base at the contact
level?
What do my
best
customers
look like?
It is extremely important to frame the question in
the right context.
Global
Macro
Regional
Association
Local
Entity
Micro
PITCOB
Disaster
Remediation
Malfeasance
Connected
Supply/Value
Chains
Material
Changes
The right universe of data is often implied by the
scope and context of the question.
Firmographic
Foundational
Telephone
Business Name
SIC
Employee Size
Linkage
Address
Primary Contact
Year Started
Sales Revenue
• Unit of Analysis: Set of matched results
• Response Variables
  • CC = Confidence Code Attribution
  • MG = MatchGrade Attribution
  • WACC = Weighted Average Confidence Code
• Potential Explainable Factors:
  • Cleansing Process – things we do to the Korean text which may cause it to be ‘less matchable’
  • Candidate retrieval methods that we use
  • Evaluation & Decisioning – we may need to adjust our definition of A / B / F for Korea
  • Availability of AME-K data
  • Distribution bias in aggregate file behavior of scoring system
  • MatchGrade mappings
  – Unknown or ignored, potentially explainable, causes of variation
• Rational Subgroups
  • By Confidence Code Cluster
  • By MatchGrade “cousin” cluster within Confidence Code
• Unexplainable
  • Quality of customer input
  • Completeness of customer input
  • Emergence of new jargon/Acronyms
  • New Chinese Idioms
  • Statutory changes
  • Differences in privacy expectations
  • Differences in word order, sound, stroke weight
• Data in hand
• Discoverable data
• Computable data
• Extent, unavailable data (opportunity cost)
• Understanding of cause systems
• Relevant theory
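To make the response variables and rational subgroups above concrete, here is a minimal sketch of a weighted average confidence code computed per subgroup. It assumes the weight is the number of matched records at each confidence code; the slide does not define the weighting, and the record layout, subgroup labels, and sample counts are hypothetical.

```python
from collections import defaultdict


def weighted_average_confidence_code(rows):
    """Return the count-weighted mean confidence code.

    rows: iterable of (confidence_code, record_count) pairs.
    Assumption: weights are record counts (not stated in the source).
    """
    rows = list(rows)
    total = sum(count for _, count in rows)
    if total == 0:
        raise ValueError("no matched records to average")
    return sum(cc * count for cc, count in rows) / total


def wacc_by_subgroup(results):
    """Group (subgroup, confidence_code, record_count) rows and average each group."""
    groups = defaultdict(list)
    for subgroup, cc, count in results:
        groups[subgroup].append((cc, count))
    return {name: weighted_average_confidence_code(rows)
            for name, rows in groups.items()}


# Hypothetical matched results: (subgroup label, confidence code, record count)
matched = [
    ("high", 10, 300),
    ("high", 8, 100),
    ("low", 5, 50),
    ("low", 3, 150),
]
wacc = wacc_by_subgroup(matched)
```

Comparing the per-subgroup values, rather than one file-wide average, is what lets a shift in one cluster stand out instead of being averaged away.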
D&B Proprietary information
Leveraging the “V’s” to
get to the best answer
Volume: How
much data is “too
much” to see the
answer?
Variety: How can
heterogeneous
and unstructured
data inform new
ways of inquiry?
Velocity: Can
the rate of
change of data
itself be part of
the answer?
Veracity: How
do I adjudicate
the truth when
the malfeasants
are learning so
much faster?
A good example can be seen in tracking mergers,
acquisitions, and divestitures.
A typical M&A takes 6-9 months from
announcement to deal completion
• Some take longer, or may never close
• Regulatory requirements sometimes
drive pre- and post- close changes
over years
Family trees updated as the deal completes
• Average update within 10 days
• Linkage updates frequently precede
official registry changes
• Updates include re-linking records, restructuring tree levels, taking entities
to out of business and creating new
entities
Announced restructuring and re-organizations
often take 6 months to 2 years
Traditional analysis of this data can reveal
interesting risks
National Government:
Republic of Venezuela
3 additional subsidiary levels
Propernyn B.V.
Netherlands
PDV AMERICA, INC
Oklahoma, USA
CITGO PETROLEUM CORPORATION
Texas, USA
Combining the articulated want (family tree) with
the discoverable need (what’s really going on)…
The story is true. The
names have been
changed to protect the
innocent.
Monsanto
500 member family
tree
Mediquip
1000 Employees
Largest Genetically
modified food producer
49%
AdvDesigns AG
30 Employees
R&D
Stem Cell Rsrch
Frankfurt, Germany
Medi-Cell
125 Employees
Lab Equip Mfr.
Abayance, FL
30%
Ceramics Inc
50 Employees
Glass Mfr
Wichita, Kansas
Pending Decision:
Underwrite Directors
and Officers Policy
Language, identity, and intention can significantly
impact the complexity of the situation.
Kawasaki (idiom) – “river
beside mountainous terrain”
“Ka-wa-sa-ki”
川崎重工咨询
“Chuanxi Zhonggong zuishin”
(aka Kawasaki Heavy Industries Consulting)
株式会社カワサキモータースジャパン
“Kabushikigaisha Kawasaki Mōtāsu Jyapan”
(aka Kawasaki Motors Japan)
川崎重工業株式会社
“Kawasaki Jūkōgyō Kabushiki-gaisha”
(aka Kawasaki Heavy Industries)
川崎涂料有限公司
“Chuanxi chuliao Youxian Gonxi”
(aka Kawasaki Paint Co, Dongguan)
한국가와사키
“Hanguggawasaki”
(aka Kawasaki Korea)
KAWASAKI KK
(Local electricians in a
suburb of Kawasaki)
People are strange…
Digital natives
vs. digital
immigrants
Multiple names
Privacy and
other statutory
constraint
Overlapping
“identities”
As the boundary between people and small business
becomes increasingly blurred, we continue to focus
on the concept of People In The Context of Business
THE CHALLENGE
#1 – the “John Smith” problem – multiple people with the same name
#2 – the “Ann Taylor” problem – data about businesses named after people
#3 – the “Sybil” problem – one person with multiple persona or names
• Caroline M Smith, 302 N Liberty St., Albion, IA (Addr. Type: Residential)
• Carrie Smith, Meredith Corporation, 1716 Locust St., Des Moines, IA (Addr. Type: Commercial)
• Caroline Smith, University of Iowa, 21 E Market St., Iowa City, IA (Addr. Type: Commercial)
• Carrie Smith, Tenderheart Daycare, 2635 Cleveland Dr., Adel, IA (Addr. Type: Commercial)
Many people connected to one business
Many businesses connected to one person
Businesses connected through people
People connected through associations with other people
THE GOAL
Cleanse, de-dupe, identity resolution and enrichment services for your contact data
Understand when people move from organization to organization
THE VALUE
Sharpen the line between the individual and the business when engaging small businesses
Malfeasance and fraud are perpetrated by people, not by businesses. This solution reveals relationships that will help all of us more effectively identify potential for bad behavior.
A single view of customers and prospects, both in the context of entities and people, will drive key actionable outcomes for your business.
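The “Sybil” problem above can be illustrated with a small sketch that groups the four example contact records into one candidate cluster by normalizing names. This is only an assumption-laden illustration, not the deployed method: the `NICKNAMES` table, the `name_key` helper, and the record layout are hypothetical, and matching on name alone would wrongly merge distinct people (the “John Smith” problem), so the output is a set of candidates for further evidence, not a resolved identity.

```python
from collections import defaultdict

# Hypothetical nickname table; a real system would draw on far richer indicia.
NICKNAMES = {"carrie": "caroline"}


def name_key(full_name):
    """Normalize a name to (canonical first name, last name), ignoring middle parts."""
    parts = full_name.lower().replace(".", "").split()
    first, last = parts[0], parts[-1]
    return (NICKNAMES.get(first, first), last)


def candidate_clusters(records):
    """Group records whose normalized names agree; these are candidates only."""
    clusters = defaultdict(list)
    for rec in records:
        clusters[name_key(rec["name"])].append(rec)
    return dict(clusters)


# The four example records from the slide
records = [
    {"name": "Caroline M Smith", "org": None,
     "city": "Albion, IA", "addr_type": "Residential"},
    {"name": "Carrie Smith", "org": "Meredith Corporation",
     "city": "Des Moines, IA", "addr_type": "Commercial"},
    {"name": "Caroline Smith", "org": "University of Iowa",
     "city": "Iowa City, IA", "addr_type": "Commercial"},
    {"name": "Carrie Smith", "org": "Tenderheart Daycare",
     "city": "Adel, IA", "addr_type": "Commercial"},
]
clusters = candidate_clusters(records)
```

Keeping the grouping step separate from any decision step leaves room to confirm or split a cluster as more evidence (addresses, employers, dates) arrives.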
Creating the foundation for People in the Context of
Business.
• There will be a point of inflection reached whereby we have sufficiency of indicia (by quality
and count) to say we can recognize a “soul”
• Dynamic clustering will allow us to adjust our opinion of existing indicia or an existing Soul as
new Flexible Alternative Indicia are identified
Indicia
Dynamic Clustering
Soul
Indicia
Predictions, predictions…
I’ll bet you knew
this was coming
How do you
predict something
that has no
precedent?
Learning from
the way things
move, even if
you don’t
understand them
fully… seriously?
Commercial signal and proxy are now added to
existing predictive attributes to provide deeper insights
and even more predictive analytics.
[Chart. Axis: Predictive Content (High to Low). Labels: Traditional Business Data; Non-Traditional Insight; Robust Predictive Data Available; Limited Data Available; No Data Available]
Signal & proxy sources add significant decisioning content on small businesses with limited or no traditional predictive data footprint
‘Signals’ aggregated and analyzed over time, correlated
with other data sources expose hard-to-find patterns.
BIG DISPARATE SOURCES OF DATA
• Customer Crossborder Inquiries
• Global Trade Experiences
• Call Center Activity
• Customer Match Inquiries
• Transactional WorldBase Updates
• Intelligence Engine Traffic
• Other Proprietary Sources
• Customer Portfolio Monitoring
• Third Party Exchange
• Phone and Email Connectivity Testing
SIGNAL EXTRACTION
ADVANCED ANALYTICS
PREDICTIVE MODEL GAINS
We’re harnessing the massive flow of data
through our systems and distilling the signals
that describe a company’s behavior.
This is helping to increase levels of precision
in predictive models.
Extending the deployed capability to better
understand malfeasance…
Data Collection & Input
Prevention (at data point of entry)
• Identity verification of the business and authentication of the individual
• Rules-based alerts at point of data entry
Detection (within data maintenance)
• Manual analyst reviews
• Automated, rules-driven detection procedures to reveal suspicious patterns
Recovery and Learning
Investigation
• Investigation of high risk cases by certified fraud examiners
• Situational analysis
• Collaborating with industry groups, forums, customers, and law enforcement to understand evolving needs and trends
Continuous Improvement
• Apply learning and integrate new targeted severe risk prevention and detection rules in data supply processes and platforms
Combining people, linkage, and daily signals to quickly
recognize and analyze patterns and take action…
“Ring Leaders”
In the above use case, with millions of payment experiences a week,
we were able to quickly identify and analyze a suspicious pattern and take action,
not only on all related cases but also on the “three ring leaders”
Data sensing: Advanced analytics also play a
significant role in acquiring new data sources.
Multi-national footprint?
Comprehensive coverage
across all verticals and sizes of
business?
Positive correlation with
trade or other predictors to
serve as a proxy?
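The third screening question above, whether a candidate source correlates positively with trade or other predictors, can be sketched as a simple Pearson correlation check. This is an illustrative assumption, not the evaluation actually used: the `is_candidate_proxy` helper, the 0.5 threshold, and the sample series are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def is_candidate_proxy(source_signal, trade_predictor, threshold=0.5):
    """Flag a source whose signal correlates positively with an existing predictor.

    The threshold is a hypothetical cut-off chosen for illustration.
    """
    return pearson(source_signal, trade_predictor) >= threshold


# Hypothetical per-business values from a candidate source and a trade predictor
candidate = [1.0, 2.0, 3.0, 4.0, 5.0]
trade = [2.1, 3.9, 6.2, 8.0, 9.8]
r = pearson(candidate, trade)
```

A correlation screen like this only addresses the proxy question; footprint and coverage across verticals and business sizes would need their own checks.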
Some current efforts under way to utilize this hybrid
capability…
MATERIAL CHANGE
Helping you stay ahead by anticipating important changes before they occur.
Knowing which businesses are poised for growth, or which may be headed for elevated risk, is valuable foresight.
Signals that predict a change… in traditional predictors… that predict business outcomes
Anticipatory analytics is helping us identify unique drivers, root causes, and sensitivities leading to material change.
• Derive insights from signals over time
• Pinpoint combinations with greatest predictive value
TIER-N SUPPLY CHAIN RISK
Helping you gain visibility into your supplier’s suppliers, from tier 1 to tier N.
With this knowledge you can reduce the risk of being blind-sided by disruption(s) anywhere within your supplier network.
We use analytical methods to build an implied supply chain using our extensive knowledge of buyer-seller relationships.
[Diagram: Buyer/Seller pairs A/B and B/C; Tier 1, Tier 2, Tier …, Tier N]
LINKAGE DISCOVERY ENGINE
Providing you more linked families with a focus on small and medium businesses.
Gain a more comprehensive view of your multi-site business partners, revealing new opportunities and overall risk.
Innovative technology and analytics are efficiently guiding us to potential linkage relationships we had not previously seen.
[Diagram: Ultimate Parent, Parent, Headquarters, Subsidiary, Branch, Branch]
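The idea of an implied supply chain built from buyer-seller relationships can be sketched as a breadth-first walk over (buyer, seller) pairs, assigning each supplier the tier at which it is first reached. This is a minimal sketch under stated assumptions: the `supplier_tiers` function and the pair format are hypothetical, and the A/B/C relationships simply mirror the labels on the slide's diagram.

```python
from collections import defaultdict, deque


def supplier_tiers(relationships, buyer):
    """Map each direct or indirect supplier of `buyer` to its tier (1 = direct).

    relationships: iterable of (buyer, seller) pairs.
    """
    sellers_of = defaultdict(set)
    for b, s in relationships:
        sellers_of[b].add(s)

    tiers = {}
    seen = {buyer}  # guards against cycles such as A buys from B, B buys from A
    queue = deque([(buyer, 0)])
    while queue:
        company, tier = queue.popleft()
        for seller in sorted(sellers_of[company]):
            if seller not in seen:
                seen.add(seller)
                tiers[seller] = tier + 1
                queue.append((seller, tier + 1))
    return tiers


# A buys from B, and B buys from C, so C is a tier-2 supplier of A.
relationships = [("A", "B"), ("B", "C")]
tiers = supplier_tiers(relationships, "A")
```

Breadth-first order means a supplier reachable by several paths is reported at its shallowest tier, which is usually the exposure of most interest.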
We are increasingly faced with information that is rich,
varied, and replete with opportunity – our focus is
shifting from “hunting and gathering” to new challenges.
New Techniques to address Big Data
New approaches to Discovery, Curation, and Synthesis
Data sensing at the “Event Horizon”
“And now we welcome the new year,
full of things that have never been” –
Rainer Maria Rilke