Having it All is not Having it All at All!
Problem Formulation in the Face of Overwhelming Quantities of Data

A journey of discovery…

Where’s the fire?

START FROM THE BEGINNING
-- “Before the beginning of great brilliance, there must be Chaos.” -- (I Ching)

“At the beginning of the 21st century, the population of the Earth [was] 6,300,000,000, who annually experience a reported 7,000,000 – 8,000,000 fires with 70,000 – 80,000 fire deaths and 500,000 – 800,000 fire injuries.”
Dr. Ing. Peter Wagner, 2006

Data everywhere. Who knew?

Gone are the days when there was a single source of “truth”…

Harvard R.G. Dun Credit Report Collection, Baker Library
Entries in a book on Australian business owners
• About a storekeeper in Halifax County, N.C., June 1873: “purchaser of stolen goods, a great scamp.”
• Entry about one J. B. Alford, who sold groceries and liquors, June 1870: “This man is said to be in thriving circumstances. He has some Real & personal estate & I think it is safe to trust him.”
• Entry on Hannah Griffith, a milliner in Springfield, Ill., in 1869: “about to marry a fellow [of] no account.” An entry two years later noted, with some relief, that the plan had fallen through.
• About J.D. Rockefeller, who turned out to be a good credit risk: “is not much of a businessman, but had some capital, it is said, advanced by his father, who is reputed well off.” 1863 was the year he set up a refinery that blossomed into Standard Oil.

Hold on… things are changing.
Framing our case for change…

The Operating Environment
• We all know that the world is changing
• We are aware that the pace of change is increasing at an unprecedented rate
• We see new types of data, technologies, and behaviors every day
• More and more, we are tasked with discerning the discoverable need from the articulated want

The Case for Change
• What has made us successful so far is insufficient
• We now have the ability to succeed… or fail, much faster
• The connectedness of information, and the ways in which it is changing, are affecting the risk and opportunity space in ways we are only beginning to understand

Sometimes, a picture is worth a thousand words.
Pope Benedict Inauguration
• The largest corpus of data preceded the event
• Most data created about the event had significant, and asymmetric, latency
• The rate of “data decay” attributable to the participants in the event is significant

Lately, a thousand pictures are taken in the time it takes to speak a single word!
Pope Francis Inauguration
• What about the digital footprint of all of the smartphones?
• What about the social networks of the crowd?
• What about the metadata in the photos?
• What are the opportunity costs to other activities?

Asking the right question
• What if the Hokey Pokey really is what it’s all about?
• How many more of these silly questions till the next slide?
• What if there were no hypothetical questions?
• How deep would the ocean be if sponges didn’t live there?

Questions about risk and opportunity are at the heart of our focus.
• What about fraud?
• What other risks should I know about before doing business with this small company?
• Who is the right decision maker at this company and how do I effectively reach them?
• Should I extend credit?
• What other companies is this individual associated with?
• I need insights about a contact to help me target my messaging
• Which customers should I call on next?
• What is the right credit limit?
• Which prospects are most promising?
• I need answers!
• How do I identify changes with my current contact relationships?
• How can I de-dupe my current customer base at the contact level?
• What do my best customers look like?

It is extremely important to frame the question in the right context.
[Diagram labels: Global, Macro, Regional, Association, Local, Entity, Micro; PITCOB, Disaster Remediation, Malfeasance, Connected Supply/Value Chains, Material Changes]

The right universe of data is often implied by the scope and context of the question.
[Diagram labels: Firmographic, Foundational; Telephone, Business Name, SIC, Employee Size, Linkage, Address, Primary Contact, Year Started, Sales Revenue]
• Unit of Analysis: Set of matched results
• Response Variables
  • CC = Confidence Code Attribution
  • MG = MatchGrade Attribution
  • WACC = Weighted Average Confidence Code
• Potential Explainable Factors:
  • Cleansing Process – things we do to the Korean text which may cause it to be ‘less matchable’
  • Candidate retrieval methods that we use
  • Evaluation & Decisioning – we may need to adjust our definition of A / B / F for Korea
  • Availability of AME-K data
  • Distribution bias in aggregate file behavior of scoring system
  • MatchGrade mappings – Unknown or ignored, potentially explainable, causes of variation
Rational Subgroups
• By Confidence Code Cluster
• By MatchGrade “cousin” cluster within Confidence Code
Unexplainable
• Quality of customer input
• Completeness of customer input
• Emergence of new jargon/Acronyms
• New Chinese Idioms
• Statutory changes
• Differences in privacy expectations
• Differences in word order, sound, stroke weight
Data
• Data in hand
• Discoverable data
• Computable data
• Extant, unavailable data (opportunity cost)
• Understanding of cause systems
• Relevant theory
D&B Proprietary information

Leveraging the “V’s” to get to the best answer
• Volume: How much data is “too much” to see the answer?
• Variety: How can heterogeneous and unstructured data inform new ways of inquiry?
• Velocity: Can the rate of change of data itself be part of the answer?
• Veracity: How do I adjudicate the truth when the malfeasants are learning so much faster?

A good example can be seen in tracking mergers, acquisitions, and divestitures.
A typical M&A takes 6-9 months from announcement to deal completion
• Some take longer, or may never close
• Regulatory requirements sometimes drive pre- and post-close changes over years
Family trees updated as the deal completes
• Average update within 10 days
• Linkage updates frequently precede official registry changes
• Updates include re-linking records, restructuring tree levels, taking entities to out of business and creating new entities
Announced restructurings and re-organizations often take 6 months to 2 years

Traditional analysis of this data can reveal interesting risks
[Family tree, top to bottom]
• National Government: Republic of Venezuela
• 3 additional subsidiary levels
• Propernyn B.V., Netherlands
• PDV AMERICA, INC, Oklahoma, USA
• CITGO PETROLEUM CORPORATION, Texas, USA

Combining the articulated want (family tree) with the discoverable need (what’s really going on)…
The story is true. The names have been changed to protect the innocent.
[Diagram: entities connected by ownership links labeled 49% and 30%]
• Monsanto: 500 member family tree
• Mediquip: 1000 Employees
• Largest Genetically modified food producer
• AdvDesigns AG: 30 Employees, R&D Stem Cell Rsrch, Frankfurt, Germany
• Medi-Cell: 125 Employees, Lab Equip Mfr., Abayance, FL
• Ceramics Inc: 50 Employees, Glass Mfr, Wichita, Kansas
Pending Decision: Underwrite Directors and Officers Policy

Language, identity, and intention can significantly impact the complexity of the situation.
Kawasaki (idiom): “river beside mountainous terrain”; pronounced “Ka-wa-sa-ki”
• 川崎重工咨询 “Chuanqi Zhonggong Zixun” (aka Kawasaki Heavy Industries Consulting)
• 株式会社カワサキモータースジャパン “Kabushikigaisha Kawasaki Mōtāsu Japan” (aka Kawasaki Motors Japan)
• 川崎重工業株式会社 “Kawasaki Jūkōgyō Kabushiki-gaisha” (aka Kawasaki Heavy Industries)
• 川崎涂料有限公司 “Chuanqi Tuliao Youxian Gongsi” (aka Kawasaki Paint Co, Dongguan)
• 한국가와사키 “Hanguggawasaki” (aka Kawasaki Korea)
• KAWASAKI KK (Local electricians in a suburb of Kawasaki)

People are strange…
• Digital natives vs. digital immigrants
• Multiple names
• Privacy and other statutory constraints
• Overlapping “identities”

As the boundary between people and small business becomes increasingly blurred, we continue to focus on the concept of People In The Context of Business

THE CHALLENGE
• #1 – the “John Smith” problem – multiple people with the same name
• #2 – the “Ann Taylor” problem – data about businesses named after people
• #3 – the “Sybil” problem – one person with multiple personas or names

THE GOAL
Cleanse, de-dupe, identity resolution and enrichment services for your contact data
• Many people connected to one business
• Many businesses connected to one person
• Businesses connected through people
• People connected through associations with other people
• Understand when people move from organization to organization
• Sharpen the line between the individual and the business when engaging small businesses

Example records
• Caroline M Smith, 302 N Liberty St., Albion, IA. Addr. Type: Residential
• Carrie Smith, Meredith Corporation, 1716 Locust St., Des Moines, IA. Addr. Type: Commercial
• Caroline Smith, University of Iowa, 21 E Market St., Iowa City, IA. Addr. Type: Commercial

THE VALUE
Malfeasance and fraud are perpetrated by people, not by businesses. This solution reveals relationships that will help all of us more effectively identify potential for bad behavior.
A single view of customers and prospects, both in the context of entities and people, will drive key actionable outcomes for your business.
Example record: Carrie Smith, Tenderheart Daycare, 2635 Cleveland Dr., Adel, IA. Addr. Type: Commercial

Creating the foundation for People in the Context of Business.
• There will be a point of inflection reached whereby we have sufficiency of indicia (by quality and count) to say we can recognize a “soul”
• Dynamic clustering will allow us to adjust our opinion of existing indicia or an existing Soul as new indicia are identified
[Diagram labels: Flexible Alternative Indicia, Indicia, Dynamic Clustering, Soul]

Predictions, predictions… I’ll bet you knew this was coming
How do you predict something that has no precedent?
Learning from the way things move, even if you don’t understand them fully… seriously?

Commercial signal and proxy are now added to existing predictive attributes to provide deeper insights and even more predictive analytics.
[Chart: Predictive Content (High to Low) for Traditional Business Data and Non-Traditional Insight, across Robust Predictive Data Available, Limited Data Available, and No Data Available]
Signal & proxy sources add significant decisioning content on small businesses with limited or no traditional predictive data footprint

‘Signals’ aggregated and analyzed over time, correlated with other data sources, expose hard-to-find patterns.
[Diagram stages: BIG DISPARATE SOURCES OF DATA, SIGNAL EXTRACTION, ADVANCED ANALYTICS, PREDICTIVE MODEL GAINS]
Sources shown:
• Customer Crossborder Inquiries
• Global Trade Experiences
• Call Center Activity
• Customer Match Inquiries
• Transactional WorldBase Updates
• Intelligence Engine Traffic
• Other Proprietary Sources
• Customer Portfolio Monitoring
• Third Party Exchange
• Phone and Email Connectivity Testing
We’re harnessing the massive flow of data through our systems and distilling the signals that describe a company’s behavior. This is helping to increase levels of precision in predictive models.
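As a rough illustration of what “signals aggregated and analyzed over time” could look like, the sketch below counts events per business per week and flags a business whose latest week departs sharply from its own history. Everything here is an assumption made for illustration: the event shape `(business_id, week)`, the function names, the z-score style threshold, and the floor on the spread are hypothetical and are not taken from the presentation or from any deployed system.

```python
from collections import defaultdict
from statistics import mean, pstdev


def weekly_counts(events):
    """Count events per business per week.

    events: iterable of (business_id, week_index) pairs (hypothetical shape).
    """
    counts = defaultdict(lambda: defaultdict(int))
    for business, week in events:
        counts[business][week] += 1
    return counts


def flag_spikes(events, latest_week, threshold=2.0):
    """Return {business_id: score} for businesses whose latest weekly count
    deviates from their own prior weeks by more than `threshold`.

    The score is (latest - mean of history) / spread, where the spread is
    floored at 1.0 (an arbitrary choice) so flat histories do not divide by zero.
    """
    flagged = {}
    for business, by_week in weekly_counts(events).items():
        history = [by_week.get(week, 0) for week in range(latest_week)]
        if len(history) < 2:
            continue  # not enough history to form a baseline
        baseline = mean(history)
        spread = max(pstdev(history), 1.0)
        score = (by_week.get(latest_week, 0) - baseline) / spread
        if abs(score) > threshold:
            flagged[business] = score
    return flagged


if __name__ == "__main__":
    # Made-up inquiry counts for weeks 0..4; "X" jumps in week 4, "Y" stays flat.
    sample = {"X": [2, 2, 3, 2, 9], "Y": [3, 3, 3, 3, 3]}
    events = [
        (business, week)
        for business, per_week in sample.items()
        for week, n in enumerate(per_week)
        for _ in range(n)
    ]
    print(flag_spikes(events, latest_week=4))
```

A per-business baseline is used here so that a change is judged against that company's own past behavior rather than against the whole population; a production system would presumably combine many such signals before they feed a predictive model.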
Extending the deployed capability to better understand malfeasance…

Prevention (Data Collection & Input, at data point of entry)
• Identity verification of the business and authentication of the individual
• Rules-based alerts at point of data entry
Detection (within data maintenance)
• Manual analyst reviews
• Automated, rules-driven detection procedures to reveal suspicious patterns
Investigation (Recovery and Learning)
• Investigation of high risk cases by certified fraud examiners
• Situational analysis
• Collaborating with industry groups, forums, customers, and law enforcement to understand evolving needs and trends
Continuous Improvement
• Apply learning and integrate new targeted severe risk prevention and detection rules in data supply processes and platforms

Combining people, linkage, and daily signals to quickly recognize and analyze patterns and take action…
[Figure: “Ring Leaders”]
In the above use-case, with millions of payment experiences a week, we were able to quickly identify and analyze a suspicious pattern and take action, not only on all related cases but also on the “three ring leaders”.

Data sensing: Advanced analytics also play a significant role in acquiring new data sources.
• Multi-national footprint?
• Comprehensive coverage across all verticals and sizes of business?
• Positive correlation with trade or other predictors to serve as a proxy?

Some current efforts under way to utilize this hybrid capability…
MATERIAL CHANGE | TIER-N SUPPLY CHAIN RISK | LINKAGE DISCOVERY ENGINE
• Material Change: Helping you stay ahead by anticipating important changes before they occur.
• Linkage Discovery Engine: Providing you more linked families with a focus on small and medium businesses.
• Tier-N Supply Chain Risk: Helping you gain visibility into your supplier’s suppliers, from tier 1 to tier N. With this knowledge you can reduce the risk of being blind-sided by disruption(s) anywhere within your supplier network.
• Linkage Discovery Engine: Gain a more comprehensive view of your multi-site business partners, revealing new opportunities and overall risk. Innovative technology and analytics are efficiently guiding us to potential linkage relationships we had not previously seen.
  [Diagram labels: Ultimate Parent, Parent, Headquarters, Subsidiary, Branch, Branch]
• Material Change: Knowing which businesses are poised for growth, or which may be headed for elevated risk, is valuable foresight. Anticipatory analytics is helping us identify unique drivers, root causes, and sensitivities leading to material change.
  Signals that predict a change… …in traditional predictors… …that predict business outcomes
  • Derive insights from signals over time
  • Pinpoint combinations with greatest predictive value
• Tier-N Supply Chain Risk: We use analytical methods to build an implied supply chain using our extensive knowledge of buyer-seller relationships.
  [Diagram labels: Tier 1, Tier 2, Tier …, Tier N]
  Buyer | Seller
  A     | B
  B     | C

We are increasingly faced with information that is rich, varied, and replete with opportunity – our focus is shifting from “hunting and gathering” to new challenges.
• New Techniques to address Big Data
• New approaches to Discovery, Curation, and Synthesis
• Data sensing at the “Event Horizon”

“And now we welcome the new year, full of things that have never been” – Rainer Maria Rilke
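Appendix sketch: the Tier-N slide describes building an implied supply chain from buyer-seller relationships (A buys from B, B buys from C). One way such tiers could be derived is a breadth-first walk over the buyer-seller pairs, shown below. This is only an illustration under stated assumptions: the function name `supplier_tiers`, the pair format `(buyer, seller)`, and the choice of breadth-first search are hypothetical and are not described in the presentation.

```python
from collections import deque


def supplier_tiers(pairs, root):
    """Map each supplier reachable from `root` to its tier.

    pairs: iterable of (buyer, seller) relationships (hypothetical format).
    Tier 1 = direct suppliers of root, tier 2 = their suppliers, and so on.
    Each company gets the smallest tier at which it is reached.
    """
    sellers_of = {}
    for buyer, seller in pairs:
        sellers_of.setdefault(buyer, set()).add(seller)

    tiers = {}
    seen = {root}  # guards against cycles such as A -> B -> A
    queue = deque([(root, 0)])
    while queue:
        company, depth = queue.popleft()
        for seller in sorted(sellers_of.get(company, ())):
            if seller not in seen:
                seen.add(seller)
                tiers[seller] = depth + 1
                queue.append((seller, depth + 1))
    return tiers


if __name__ == "__main__":
    # The two relationships shown on the slide: A buys from B, B buys from C.
    print(supplier_tiers([("A", "B"), ("B", "C")], "A"))  # {'B': 1, 'C': 2}
```

Breadth-first order is chosen here because it assigns each supplier the shortest path from the root buyer, which matches the intuitive reading of “tier 1 to tier N”; real buyer-seller data would also need entity resolution before any such walk is meaningful.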