3rd Party Data Integration Mania: Ensuring Quality and Transparency
Gerard Broussard | CIMM Advisor
Pre-Meditated Media

Background/Purpose
Data Integrations Are Rapidly Multiplying
• Goal: Enrich and link consumer touch points with more granular target descriptions to improve advertising/marketing effectiveness
  - Cross-platform: audience measurement, targeting and effectiveness
• Potential issue(s): Ensuring data quality standards are met across all sources of integration
Purpose of Initiative
• Educate end users on potential data quality issues across 3rd party sources
• Provide end users with questions to ask of 3rd party data suppliers
• Gauge appetite for standards on data nomenclature and metric definitions

Research Initiative Overview
1-hour interviews were conducted with 21 companies:
• 10 End Users: 2 Advertisers, 2 Media Agencies, 2 Media Networks, 2 MVPDs, 2 Data Intermediaries
• 11 Data Firms: 8 General Purpose, 2 TV Audience, 1 Specialty
The Questions
• Types of data integrations
• Quality assurance procedures
• Assessment of 3rd party data quality
• Key areas of data quality concern
• Data transparency
• Consumer authentication
• Consumer target model creation
• Data naming and definition standards

Defining Data Quality: Underlying Attributes & Integration Process
Recent industry attention has focused on underlying quality and integration procedures (data hygiene is a critical but less visible issue).
Data Hygiene
• Missing Fields
• Missing Values
• Typos
• Value Checks
• Logic Checks
Underlying Attributes
• Source credibility
• Recency
• Consumer authentication
• Collection method
• Representativeness
Integration Process
• Strength of links
• Source dominance
• Modeling strengths

Data Suppliers: Overview
The 3rd party ecosystem is a checkerboard of firms that offer consumer information in nearly every category.
General Purpose
• Core consumer data sourced from credit transactions and census data
• Offer multi-category data
• Direct marketing legacy
• Newer entrants focus on digital behavior
TV Audience
• 3rd party processors of digital set-top box data
• Match TV data with consumer descriptors from general purpose firms
Specialty
• Focus on one or two categories of data, e.g., Hispanic consumers, movers, medical

End Users: Data Quality Generally Considered Good*
• Larger firms are perceived as having higher quality and established QA processes
• No one firm is the undisputed leader across all categories/services
  o Suppliers tend to excel in particular categories, e.g., automotive vs. CPG
• 3rd party data makes advertising/marketing campaigns more effective
*BUT there's room for improvement

End Users: Consumer Segments Are the Top Quality Issue
• Consumer authentication: How do we know it's a male 18-34?
• Household (HH) vs. person-level data: a key issue for authentication and target creation
• Modeled targets
  o How effective is the target at driving sales?
  o Which key variables drive the modeled target?
• Data recency: identifying in-market consumers (about to purchase vs. already purchased)

End Users: Data Transparency for Better Decisions
• The need to know: no data set is perfect, but users must be informed of its limitations to assess the impact on decision making
• Vagueness: details on data source, model components, modeling approach, etc. are often lacking
• Advertiser: "If they're not forthcoming . . . we don't work with them."
• Large 3rd party suppliers are considered more transparent; digital technology platforms less so

End Users: Modeled Target Transparency
• What percentage of the total target consists of actual consumers vs. modeled consumers?
• What are the key model variables?
• What is the ROI of the modeled target vs. the actual consumer target?
[Diagram: Actual Consumer Behavior vs. Modeled Target]

End Users: Compound Data Integrations = Compound Error
Data "mash-ups" often make it hard to understand which data source(s) are the dominant drivers of impact, particularly for integrations in the digital tech space.
[Diagram inputs: Mobile Profile, Syndicated Research, Historical CTR Data, Offline Sales Profile; output: Behavioral Target]
Mobile Quality
- iOS vs. Android metrics
- Phone vs. tablet metrics
- Recency of activity
- Profile source method
- Data collection method
Cookie Quality
- Source credibility
- Recency
- Expected life span
- Selection criteria
- Pool management
- Original vs. look-alike

Data Suppliers: Authenticating Consumers
Many data suppliers use multiple data sources to confirm the correct information about consumers:
• CRM client databases
• Survey research
• Residential change of address
• Automotive registrations
• Digital publisher registrations
• TV viewing data
Consumer A is classified as a 47-year-old male residing at 123 Smith St. with a HH income of $125K+ . . .

Source (Consumer A)   Age   Address           Gender   Income
CRM 2                 47    123 Smith St.     Male     $125K+
CRM 3                 47    123 Smith St.     Male     $125K+
Change of Address     46    123 Smith St.     Male     $110K+
CRM 6                 47    456 Jackson St.   Male     $125K+
Survey                NA    123 Smith St.     Male     NA
Auto                  47    123 Smith St.     Male     NA
Digital Pub           42    456 Jackson St.   Female   $125K+

Data Suppliers: Creating Modeled Targets
One data firm's approach . . .
Screen the variables . . .
1. Variable Sparsity Check: discard variables with few non-zero values
2. Weight of Evidence: remaining variables are assessed for predictive value
. . . then identify the most predictive combination
3. Stepwise Regression: iterative testing of combinations
The process narrows 10,000 target variables down to 25-50 target variables.

Data Suppliers: Validating Modeled Targets
Holdout testing is used to compare modeled target performance to that of verified consumer purchasers.
• Actual consumer behavior: ROI = $1.34
• Modeled target: ROI = $1.25
• Holdout sample (no ad exposure): ROI = $0.80

Data Harmonization: End Users & Data Firms Agree
Both support the notion of a common set of names and definitions, although this would be challenging for custom target segments.
How many different ways should you say the same thing? One!
- Male 25-54
- Men & Age 25-54
- Men 25-54
- M 25-54
- M Age 25-54
• Most expect the process to be facilitated by major industry associations.

Conclusions and Next Steps
Disclosure and transparency will set the table for the creation of data quality standards.
• Establish Principles of Data Disclosure
• Request Transparency
  a. Consumer authentication procedures
  b. Multiple data source descriptions
  c. Modeled target techniques and testing results
  d. Recency of data
  e. Integration techniques
  f. Data labeling
• Create Best Practice Standards
  a. Acceptability guidelines for prepping and integrating multiple sources
  b. Recruit 3rd party members to participate in and shape the guidelines

Gerard Broussard | Advisor
www.cimm-us.org
@CIMM_NEWS
gerard@pre-meditatedmedia.com
@Gbroussard100
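The Data Hygiene checks listed under "Defining Data Quality" (missing fields, missing values, typos, value checks, logic checks) can be illustrated with a minimal Python sketch. The record schema, the allowed gender vocabulary, the 0-120 age range and the "Male 18-34" segment rule are all hypothetical assumptions chosen for illustration, not any supplier's actual QA procedure.

```python
from typing import Any

# Hypothetical schema and rules, for illustration only.
REQUIRED_FIELDS = ("consumer_id", "age", "gender", "segment")
ALLOWED_GENDERS = {"Male", "Female"}


def hygiene_issues(record: dict[str, Any]) -> list[str]:
    """Return the data hygiene problems found in one consumer record."""
    issues: list[str] = []

    # Missing fields: required keys absent from the record.
    for field in REQUIRED_FIELDS:
        if field not in record:
            issues.append(f"missing field: {field}")

    # Missing values: keys present but empty.
    for field, value in record.items():
        if value is None or value == "":
            issues.append(f"missing value: {field}")

    # Typos: values outside the controlled vocabulary.
    gender = record.get("gender")
    if gender and gender not in ALLOWED_GENDERS:
        issues.append(f"possible typo: gender={gender!r}")

    # Value checks: numbers outside a plausible range.
    age = record.get("age")
    if isinstance(age, int) and not 0 < age < 120:
        issues.append(f"value check failed: age={age}")

    # Logic checks: fields that contradict each other.
    if record.get("segment") == "Male 18-34" and isinstance(age, int):
        if not 18 <= age <= 34 or gender != "Male":
            issues.append("logic check failed: segment inconsistent with age/gender")

    return issues


if __name__ == "__main__":
    print(hygiene_issues({"consumer_id": "A", "age": 47, "gender": "Male", "segment": "Male 18-34"}))
```

The logic check mirrors the end-user question "How do we know it's a male 18-34?": a record can pass every single-field check and still carry a segment label its own attributes contradict.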
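The "Authenticating Consumers" slide does not say how conflicting source values are resolved. One simple rule that reproduces Consumer A's classification is a per-field majority vote over the non-missing source values. The sketch below assumes that rule and a strict-majority threshold (both are assumptions, not a supplier's documented method) and uses the per-source values from the slide's Consumer A example, with None standing for NA.

```python
from collections import Counter
from typing import Any, Optional

# Consumer A as reported by each source on the slide; None marks an NA cell.
CONSUMER_A: dict[str, dict[str, Any]] = {
    "CRM 2": {"age": 47, "address": "123 Smith St.", "gender": "Male", "income": "$125K+"},
    "CRM 3": {"age": 47, "address": "123 Smith St.", "gender": "Male", "income": "$125K+"},
    "Change of Address": {"age": 46, "address": "123 Smith St.", "gender": "Male", "income": "$110K+"},
    "CRM 6": {"age": 47, "address": "456 Jackson St.", "gender": "Male", "income": "$125K+"},
    "Survey": {"age": None, "address": "123 Smith St.", "gender": "Male", "income": None},
    "Auto": {"age": 47, "address": "123 Smith St.", "gender": "Male", "income": None},
    "Digital Pub": {"age": 42, "address": "456 Jackson St.", "gender": "Female", "income": "$125K+"},
}


def authenticate(
    sources: dict[str, dict[str, Any]], min_share: float = 0.5
) -> dict[str, Optional[Any]]:
    """Resolve each field by majority vote across sources (an assumed rule).

    A value is accepted only if its share of the non-missing reports is strictly
    greater than min_share; otherwise the field is left unresolved (None).
    """
    fields = sorted({field for record in sources.values() for field in record})
    profile: dict[str, Optional[Any]] = {}
    for field in fields:
        votes = Counter(
            record[field] for record in sources.values() if record.get(field) is not None
        )
        if not votes:
            profile[field] = None
            continue
        value, count = votes.most_common(1)[0]
        profile[field] = value if count / sum(votes.values()) > min_share else None
    return profile


if __name__ == "__main__":
    print(authenticate(CONSUMER_A))
```

Leaving a field unresolved rather than picking a plurality winner is a deliberate choice in this sketch: it surfaces weakly supported attributes instead of silently asserting them, which is the kind of limitation end users asked to have disclosed.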
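Steps 1 and 2 of the modeled-target approach on "Creating Modeled Targets" can be sketched in plain Python. The sparsity threshold, the additive smoothing constant and the use of information value (IV, the sum of WoE-weighted differences between purchaser and non-purchaser distributions) to rank variables are assumptions for illustration. Step 3, stepwise regression, would normally be run with a statistics library and is not shown; the ranked list stands in for its input.

```python
import math
from collections.abc import Sequence


def sparsity_filter(
    columns: dict[str, Sequence[float]], min_nonzero_share: float = 0.05
) -> dict[str, Sequence[float]]:
    """Step 1: drop variables whose share of non-zero values is below the threshold."""
    return {
        name: values
        for name, values in columns.items()
        if len(values) > 0
        and sum(1 for v in values if v != 0) / len(values) >= min_nonzero_share
    }


def information_value(values: Sequence[object], target: Sequence[int], eps: float = 0.5) -> float:
    """Step 2: weight of evidence per category, summed into an information value.

    target holds 1 for a purchaser (event) and 0 otherwise. eps is additive
    smoothing so categories with no events or non-events do not hit log(0).
    """
    categories = set(values)
    k = len(categories)
    events_total = sum(target)
    nonevents_total = len(target) - events_total
    iv = 0.0
    for category in categories:
        events = sum(1 for v, t in zip(values, target) if v == category and t == 1)
        nonevents = sum(1 for v, t in zip(values, target) if v == category and t == 0)
        pct_events = (events + eps) / (events_total + eps * k)
        pct_nonevents = (nonevents + eps) / (nonevents_total + eps * k)
        woe = math.log(pct_events / pct_nonevents)
        iv += (pct_events - pct_nonevents) * woe
    return iv


def rank_variables(
    columns: dict[str, Sequence[float]], target: Sequence[int], top_n: int = 50
) -> list[str]:
    """Screen, then rank surviving variables by IV; the top_n would go on to stepwise testing."""
    kept = sparsity_filter(columns)
    scored = sorted(kept, key=lambda name: information_value(kept[name], target), reverse=True)
    return scored[:top_n]
```

The default top_n of 50 echoes the slide's 25-50 final variables but is only a placeholder; each (p - q) * ln(p / q) term is non-negative, so a variable whose categories split purchasers and non-purchasers identically scores zero.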
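The ROI figures on "Validating Modeled Targets" can be compared with a few lines of arithmetic. Treating the holdout ROI as a no-exposure baseline, and expressing the modeled target's gain over that baseline as a share of the actual-consumer gain, is one possible summary assumed here; the slide itself reports only the three ROI values.

```python
def compare_to_holdout(actual_roi: float, modeled_roi: float, holdout_roi: float) -> dict[str, float]:
    """Summarize modeled-target performance against actual purchasers and a holdout baseline."""
    actual_gain = actual_roi - holdout_roi
    modeled_gain = modeled_roi - holdout_roi
    return {
        "modeled_vs_actual": modeled_roi / actual_roi,
        "actual_gain_over_holdout": actual_gain,
        "modeled_gain_over_holdout": modeled_gain,
        "share_of_actual_gain": modeled_gain / actual_gain,
    }


if __name__ == "__main__":
    # Figures from the slide: actual $1.34, modeled $1.25, holdout $0.80.
    summary = compare_to_holdout(1.34, 1.25, 0.80)
    for name, value in summary.items():
        print(f"{name}: {value:.3f}")
```

With the slide's figures, the modeled target reaches about 93% of the actual-consumer ROI but only about 83% of its gain over the holdout, so the second ratio is the more demanding view of model quality.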
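The label variants on the "Data Harmonization" slide can be collapsed to one name with a small normalizer. The canonical form "Men 25-54" and the regular expression below are hypothetical choices; labels that do not match, such as custom segments, are returned unchanged, reflecting the slide's caveat that custom targets are harder to standardize.

```python
import re

# Hypothetical pattern: a male token, an optional "&", an optional "Age", then an age range.
_MALE_AGE_LABEL = re.compile(
    r"^\s*(?:male|men|m)\b(?:\s*&)?(?:\s*age)?\s*(\d{1,2})\s*-\s*(\d{1,2})\s*$",
    re.IGNORECASE,
)


def harmonize(label: str) -> str:
    """Map male age-range label variants to one canonical name (assumed form: "Men LOW-HIGH")."""
    match = _MALE_AGE_LABEL.match(label)
    if match is None:
        return label  # leave unrecognized or custom segment names untouched
    low, high = match.groups()
    return f"Men {low}-{high}"


if __name__ == "__main__":
    variants = ["Male 25-54", "Men & Age 25-54", "Men 25-54", "M 25-54", "M Age 25-54"]
    print({label: harmonize(label) for label in variants})
```

Anchoring the pattern at both ends keeps it from rewriting labels such as "Women 25-54"; a shared lookup table maintained by an industry body would be the more scalable alternative to per-firm regular expressions.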