How Good is that Big Data_Gerard Broussard_Pre

3rd Party Data Integration Mania:
Ensuring Quality and Transparency
Gerard Broussard | CIMM Advisor
Pre-Meditated Media
Background/Purpose
Data Integrations are Rapidly Multiplying
• Goal: Enrich and link consumer touch points with more granular target
descriptions to improve advertising/marketing effectiveness
- Cross platform: Audience measurement, targeting and effectiveness
• Potential issue(s): Ensuring data quality standards are met across all
sources of integration
Purpose of Initiative
• Educate end users on potential data quality issues across 3rd party sources
• Provide end users with questions to ask of 3rd party data suppliers
• Gauge appetite for data nomenclature and metrics definitions standards
Research Initiative Overview
1-hour interviews were conducted with 21 companies
10 End Users
• 2 Advertisers
• 2 Media Agencies
• 2 Media Networks
• 2 MVPDs
• 2 Data Intermediaries
11 Data Firms
• 8 General Purpose
• 2 TV Audience
• 1 Specialty
The Questions
• Types of Data Integrations
• Quality assurance procedures
• Assessment of 3rd party data quality
• Key areas of Data Quality concern
• Data transparency
• Consumer authentication
• Consumer target model creation
• Data naming and definition standards
Defining Data Quality:
Underlying Attributes & Integration Process
Recent industry attention is focused on underlying quality and integration
procedures (data hygiene a critical but less visible issue)
Data Hygiene
• Missing Fields
• Missing Values
• Typos
• Value Checks
• Logic Checks
Underlying Attributes
• Source credibility
• Recency
• Consumer authentication
• Collection method
• Representativeness
Integration Process
• Strength of links
• Source dominance
• Modeling strengths
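The data hygiene checks named on this slide (missing fields, missing values, value checks, logic checks) can be sketched roughly as follows. This is a minimal illustration, not any supplier's actual QA procedure: the field names, the allowed gender codes, the age range, and the driver's-license rule are all hypothetical.

```python
from typing import Any

# Hypothetical schema: field names and allowed values are illustrative only.
REQUIRED_FIELDS = ("age", "gender", "zip_code")
ALLOWED_GENDERS = {"Male", "Female"}


def hygiene_issues(record: dict[str, Any]) -> list[str]:
    """Return a list of hygiene problems found in one consumer record."""
    issues = []
    for field in REQUIRED_FIELDS:
        if field not in record:
            issues.append(f"missing field: {field}")
        elif record[field] in (None, ""):
            issues.append(f"missing value: {field}")

    # Value checks: each field must fall inside its expected domain.
    age = record.get("age")
    if isinstance(age, int) and not 0 <= age <= 120:
        issues.append("value check: age out of range")

    gender = record.get("gender")
    if gender not in (None, "") and gender not in ALLOWED_GENDERS:
        issues.append("value check: unknown gender code")

    # Logic check: fields must agree with each other (assumed rule).
    if isinstance(age, int) and age < 16 and record.get("has_drivers_license"):
        issues.append("logic check: licensed driver under 16")
    return issues
```

A typo such as "Mle" for "Male" surfaces here as a failed value check, which is one simple way typos can be caught when the set of valid codes is known.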
Data suppliers: Overview
3rd party ecosystem is a checkerboard of firms that offer consumer
information in nearly every category
General Purpose
• Core consumer data sourced from credit transactions and census data
• Offer multi-category data
• Direct marketing legacy
• Newer entrants focus on digital behavior
TV Audience
• 3rd party processors of digital set top box data
• Match TV data with consumer descriptors from general purpose firms
Specialty
• Focus on one or two categories of data; e.g., Hispanic consumers, movers,
medical
End Users: Data Quality Generally Considered Good*
• Larger firms perceived as having higher quality, established QA processes
• No one firm is the undisputed leader across all categories/services
o Suppliers tend to excel in particular categories; e.g., automotive vs. CPG
• 3rd party data makes advertising/marketing campaigns more effective
*BUT there’s room for improvement
End Users: Consumer Segments Top Quality Issue
• Consumer authentication – How do we know it’s a male 18-34?
• HH vs. persons data – a key issue for authentication and target creation
• Modeled targets
o How effective is target for driving sales?
o Key variables driving the modeled target
• Data recency – identifying in-market consumers; about to purchase vs.
already purchased
End Users: Data Transparency for Better Decisions
• The Need To Know – no data set perfect but must be informed of limitations
to assess impact on decision making
• Vagueness – too little detail on data source, model components, model approach, etc.
• Advertiser: “If they’re not forthcoming . . . we don’t work with them.”
• Large 3rd party suppliers considered to be more transparent, digital
technology platforms less so
End Users: Modeled Target Transparency
What % of total target are actual consumers vs. modeled?
What are key model variables?
What is the modeled-target ROI vs. the actual consumer target?
[Diagram: Actual Consumer Behavior vs. Modeled Target]
End Users: Compound Data Integrations = Compound Error
Data “mash-ups” often pose challenges for understanding which data source(s)
are dominant drivers of impact; particularly integrations in digital tech space
Mobile Quality
- iOS vs. Android metrics
- Phone vs. Tablet metrics
- Recency of activity
- Profile source method
- Data collection method
Cookie Quality
- Source credibility
- Recency
- Expected life span
- Selection criteria
- Pool management
- Original vs. look-alike
[Diagram labels: Mobile Profile; Syndicated Research; Historical CTR data; Offline Sales Profile; Behavioral Target]
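One simple way to see why compound integrations can mean compound error: if each linked source is right with some probability and errors are independent, the chance that every link is right is the product of those probabilities. The sketch below assumes independence, and the accuracy rates are hypothetical; neither comes from the deck.

```python
from math import prod


def compound_accuracy(accuracies: list[float]) -> float:
    """Chance that every linked source is right, assuming independent errors."""
    return prod(accuracies)


# Hypothetical accuracy rates for four integrated sources; not from the deck.
example = [0.90, 0.90, 0.90, 0.90]
combined = compound_accuracy(example)
```

Under these assumed rates, four sources that are each 90% accurate yield a combined accuracy of roughly 66%, so each added source lowers the ceiling on the quality of the final target.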
Data suppliers: Authenticating Consumers
Many data suppliers use multiple data sources to confirm the correct
information about consumers
• CRM Client DBs
• Survey Research
• Residential Change of Address
• Automotive Registrations
• Digital Publisher Registrations
• TV Viewing Data
Consumer A is classified as a 47 year old male residing at 123 Smith St.
with a HH income of $125K+ . . .
Consumer A
Source              Age   Address            Gender   Income
CRM 2               47    123 Smith St.      Male     $125K+
CRM 3               47    123 Smith St.      Male     $125K+
CRM 6               46    123 Smith. St.     Male     $110K+
Survey              47    456 Jackson St     Male     $125K+
Change of Address   NA    123 Smith St.      Male     NA
Auto                47    123 Smith St.      Male     NA
Digital Pub         42    456 Jackson St.    Female   $125K+
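The Consumer A example can be read as a consensus across sources. The sketch below applies a simple majority vote to the values reported on the slide. The slide does not say how suppliers actually reconcile conflicts, so the voting rule and the normalization step are assumptions for illustration only.

```python
from collections import Counter

# Values reported for Consumer A by the seven sources on the slide.
# "NA" marks a source that does not carry the attribute.
REPORTED = {
    "age": ["47", "47", "46", "47", "NA", "47", "42"],
    "address": ["123 Smith St.", "123 Smith St.", "123 Smith. St.",
                "456 Jackson St", "123 Smith St.", "123 Smith St.",
                "456 Jackson St."],
    "gender": ["Male", "Male", "Male", "Male", "Male", "Male", "Female"],
    "income": ["$125K+", "$125K+", "$110K+", "$125K+", "NA", "NA", "$125K+"],
}


def normalize(value: str) -> str:
    """Drop periods and extra spaces so '123 Smith. St.' matches '123 Smith St.'."""
    return " ".join(value.replace(".", "").split())


def consensus(values: list[str]) -> tuple[str, float]:
    """Return the most common non-NA value and the share of sources backing it."""
    usable = [normalize(v) for v in values if v != "NA"]
    winner, votes = Counter(usable).most_common(1)[0]
    return winner, votes / len(usable)


profile = {attr: consensus(vals) for attr, vals in REPORTED.items()}
```

With this rule the profile resolves to age 47, 123 Smith St, Male, $125K+, and the returned share gives a rough sense of how strongly the sources agree on each attribute.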
Data suppliers: Creating Modeled Targets
One data firm’s approach . . .
Screen the variables . . .
Start: 10,000 Target Variables
1. Variable Sparsity Check: discard variables with few non-zero values
2. Weight of Evidence: remaining variables assessed for value
. . . then identify most predictive combination
3. Stepwise Regression: iterative testing of combos
Result: 25-50 Target Variables
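The first two screening steps can be sketched as below. This is an illustration under stated assumptions, not the data firm's actual method: the thresholds are hypothetical, variables are treated as binary flags, and weight of evidence is summarized with the standard information value statistic. The stepwise regression step is not implemented here.

```python
import math


def is_sparse(values: list[float], min_nonzero_share: float = 0.05) -> bool:
    """Sparsity check: True when too few rows carry a non-zero value."""
    nonzero = sum(1 for v in values if v != 0)
    return nonzero / len(values) < min_nonzero_share


def information_value(flags: list[int], buyers: list[int]) -> float:
    """Weight-of-evidence score for a binary variable against a 0/1 outcome.

    WoE per bin = ln(share of buyers in bin / share of non-buyers in bin);
    information value sums (difference of shares) * WoE over the bins.
    A 0.5 count is added to each cell so empty cells do not divide by zero.
    """
    total_buy = sum(buyers)
    total_non = len(buyers) - total_buy
    iv = 0.0
    for bin_value in (0, 1):
        buy = sum(1 for f, b in zip(flags, buyers) if f == bin_value and b == 1)
        non = sum(1 for f, b in zip(flags, buyers) if f == bin_value and b == 0)
        p_buy = (buy + 0.5) / (total_buy + 1.0)
        p_non = (non + 0.5) / (total_non + 1.0)
        iv += (p_buy - p_non) * math.log(p_buy / p_non)
    return iv


def screen(variables: dict[str, list[int]], buyers: list[int],
           min_iv: float = 0.1) -> list[str]:
    """Keep variables that pass the sparsity check and clear the IV floor."""
    kept = []
    for name, flags in variables.items():
        if is_sparse(flags):
            continue
        if information_value(flags, buyers) >= min_iv:
            kept.append(name)
    return kept
```

Variables that survive a screen like this would then go into the iterative stepwise stage, where combinations rather than single variables are tested.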
Data suppliers: Validating Modeled targets
Holdout testing is used to compare modeled target performance to that of
verified consumer purchasers
• Actual Consumer Behavior: ROI = $1.34
• Modeled Target: ROI = $1.25
• Holdout sample (no ad exposure): ROI = $0.80
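One way to read these three figures is to measure each target's lift over the unexposed holdout and then ask how much of the actual-consumer lift the modeled target retains. The sketch below uses the ROI values from the slide; treating the three figures as directly comparable and subtractable is an assumption, and the function names are hypothetical.

```python
# ROI figures as shown on the slide.
ROI = {"actual": 1.34, "modeled": 1.25, "holdout": 0.80}


def lift_over_holdout(group: str) -> float:
    """Incremental ROI of a target group relative to the unexposed holdout."""
    return ROI[group] - ROI["holdout"]


def modeled_share_of_lift() -> float:
    """Fraction of the actual-consumer lift that the modeled target retains."""
    return lift_over_holdout("modeled") / lift_over_holdout("actual")
```

On these numbers the modeled target delivers about 93% of the actual-consumer ROI (1.25 / 1.34) and about 83% of its lift over the holdout (0.45 / 0.54).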
Data Harmonization: End Users & Data Firms Agree
Both support the notion of a common set of names and definitions,
though this would be challenging for custom target segments
How many different ways should
you say the same thing? One!
Male 25-54 | Men & Age 25-54 | Men 25-54 | M 25-54 | M Age 25-54
• Most expect the process to be facilitated by major industry associations
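To illustrate what a common naming standard could look like in practice, the sketch below maps the label variants from this slide onto a single canonical form. The canonical format ("M 25-54"), the synonym table, and the pattern are hypothetical choices, not an industry standard, and the female synonyms are assumed for symmetry.

```python
import re

# Hypothetical canonical form: "<M|F> <low>-<high>". The gender synonyms
# below cover the variants on the slide plus assumed female equivalents.
GENDER_CODES = {"m": "M", "male": "M", "men": "M",
                "f": "F", "female": "F", "women": "F"}
LABEL = re.compile(
    r"^\s*([A-Za-z]+)\s*(?:&\s*)?(?:age\s*)?(\d+)\s*-\s*(\d+)\s*$",
    re.IGNORECASE,
)


def harmonize(label: str) -> str:
    """Map a demographic label variant to the canonical form, or raise."""
    match = LABEL.match(label)
    if not match:
        raise ValueError(f"unrecognized label: {label!r}")
    gender, low, high = match.groups()
    code = GENDER_CODES.get(gender.lower())
    if code is None:
        raise ValueError(f"unknown gender term: {gender!r}")
    return f"{code} {low}-{high}"


variants = ["Male 25-54", "Men & Age 25-54", "Men 25-54", "M 25-54", "M Age 25-54"]
canonical = {harmonize(v) for v in variants}
```

A fixed pattern like this works for standard age and gender demos. Custom target segments have no such regular shape, which is consistent with the concern that they would be harder to standardize.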
Conclusions and Next Steps:
Disclosure and transparency will set the table for creation of data quality
standards
• Establish Principles of Data Disclosure
• Request Transparency
a. Consumer authentication procedures
b. Multiple data source descriptions
c. Model target techniques and testing results
d. Recency of data
e. Integration techniques
f. Data labeling
• Create Best Practice Standards
a. Acceptability guidelines for prepping and integrating multiple sources
b. Recruit 3rd party members to participate, shape guidelines
Gerard Broussard | Advisor
www.cimm-us.org
@CIMM_NEWS
gerard@pre-meditatedmedia.com
@Gbroussard100