How R Transformed the Analytics Paradigm at Millward Brown

JUAN MANUEL HERNÁNDEZ
1
help(Millward Brown)
WHAT WE DO FOR OUR CLIENTS
2
library(Millward Brown)
Marketing communications, media, digital and
brand equity research.
A lot of touch points <- attitudinal and behavioural data
We work with 90% of the world’s leading brands.
Categories and consumer profiles <- data of all types
Offices in 56 countries
Global/Regional <- data from almost anywhere.
Brand Strategy
Brand Performance
Creative Development
Channel Optimization
40 years collecting consumer attitudinal data - demographics, social,
economic, and cultural habits & opinions.
3
For more information, go to www.millwardbrown.com
data(BrandZ™)
BrandZ™ is the world's largest
brand equity database. Created
in 1998 and continually updated.
It contains data on brands
gathered from interviews with
over 150,000 people every year
in up to 400 studies around the
world.
BrandZ™ is just one example of the type and magnitude of studies carried
out by MB around the world.
4
MB_2014 <- sum(seq_along(MB))
# THE JOURNEY TO A NEW ANALYTICS PARADIGM
5
library(BrandDynamics)
Measuring brand equity with BrandDynamics™
[Figure: Brand Map. Axes: Presence (Low to High) and Voltage 2.0 (Low to High).]
High Voltage 2.0: Growing Equity (small strong brands); Strong Equity (large strong brands)
Low Voltage 2.0: Little Equity (small weaker brands); Declining Equity (large weaker brands)
Analytics originally written in SPSS!
6
ts(BrandDynamics™)
[Timeline figure. Years marked: 1992, 1996, 1998, 2003, 2005, 2009, 2010, 2012.]
Milestones shown:
• Launch of BrandDynamics
• Launch of BrandZ
• Bonding Factor analysis
• Launch of D&A
• Development of Voltage 2.0
• Meaningfully Different Framework
• Launch of the ‘Paw Print’ analysis
• Development of the Brand Strength Score
• Development of the Value Driver workshops
20 years provide a lot of learning - conceptual, analytical, and
operational. What, why, where, when, and how to analyse brand equity?
7
Error in library(MDf) : no package called ‘MDf’
The new framework required a new
calculation engine.
Traditional software development teams
aren’t usually skilled in high-level statistics.
Real, enterprise software is much more than
making sure calculations are correct.
The original development estimate was 2 years!
8
Wikipedia :: define(“enterprise software”)
“Enterprise software, also known as enterprise software application (ESA), is
purpose-designed computer software used to satisfy the needs of
an organization rather than individual users […] Enterprise software is an integral part of
a (computer based) Information System, and as such includes web site software
production.” - http://en.wikipedia.org/wiki/Enterprise_software
Global, enterprise systems have to consider:
• Systems architecture
• Support
• Deployment
• Source control
• General software dev. best practices
• Development programs (e.g. versioning, agile vs. waterfall development, etc…)
Traditional statisticians have never even heard of enterprise software!
9
install.packages(MB_R) # the New Calculation Engine
Open Source – We could build a free prototype
Statistical Power – Endless, dynamic array of
statistical and data processing capabilities
There were hidden demons - an R script/package is
not the same as an enterprise analytic system
10
“R encountered a fatal error”
Unsupported
Library quality
Resource Management
“Unknown language”
Open-source software, like R, can be difficult to manage for the enterprise.
11
installed.packages() # Developers & R Statisticians
An immediate lack of balance in
new requirements for analytics
systems vs. skillsets available
became apparent.
Many of our solutions would require
a degree of high-level automation of
statistical analyses that our developers
could not deliver fast enough.
[Chart: Requirement vs. Skills for Developers, R, and Stats]
R was an appealing solution, but very little expertise was available.
12
load(R Leap of Faith)
# MB BOLDLY GOES FOR R
13
A Universe of Data <- 40 Years
Messy, “Pathological” Data <-
Survey data is always challenging. Over time, big improvements in data
collection have been made, but many unexpected inconsistencies and
biases remain a constant presence that we need to control.
A difficult mixture of effects on data quality and tidiness makes harnessing so
much data a major challenge – Millward Brown has a lot of data.
Our analyses are cleverly designed to be applied at a global, generic level,
maximizing insight and minimizing noise through dynamic learning.
14
DESCRIPTION
Agile Development
Input Loading
Validation
Calculation Engine
Analytics
Output
Wrapper
Distribution to Users
Internal Infrastructure
We went for it and came up with a plan. An R based enterprise system
would require a wrapper to deliver and supply services to the calculation
engine. R would have to cooperate with other programming languages.
15
sapply(Validation)
Validating inputs guarantees
required input consistency.
Provides (near immediate)
feedback to the user if sufficient
conditions aren’t met for
successful processing.
Business Logic
Data Validation
Happens promptly to avoid
wasting time.
Validation routines allowed us to control and enhance the level of
flexibility in our analyses and systems.
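A minimal sketch of what such a validation routine might look like in base R. The function name `validate_input` and the column names are hypothetical, chosen only for illustration; they are not taken from MB's system. The idea is to collect every problem up front and report back to the user before any calculation starts.

```r
# Hypothetical validator: checks a respondent-level data frame before analysis.
# The required column names below are assumptions for illustration only.
validate_input <- function(df, required = c("respondent_id", "brand", "weight")) {
  if (!is.data.frame(df)) {
    return("input is not a data frame")
  }

  problems <- character(0)

  # Structural check: are all required columns present?
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    problems <- c(problems,
                  paste("missing columns:", paste(missing_cols, collapse = ", ")))
  }

  # Sufficiency check: is there anything to process?
  if (nrow(df) == 0) {
    problems <- c(problems, "no rows to process")
  }

  # Content check: weights must be usable numbers.
  if ("weight" %in% names(df) &&
      (!is.numeric(df$weight) || any(is.na(df$weight)) || any(df$weight < 0))) {
    problems <- c(problems, "weight must be numeric, non-missing, and non-negative")
  }

  problems  # character(0) means the input passed every check
}

good <- data.frame(respondent_id = 1:3, brand = c("A", "B", "A"),
                   weight = c(1, 0.8, 1.2))
bad  <- data.frame(respondent_id = 1:2, weight = c(1, NA))

validate_input(good)  # no problems reported
validate_input(bad)   # reports the missing column and the unusable weights
```

Returning a vector of messages, rather than stopping at the first failure, lets a wrapper show the user everything that needs fixing in one pass.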
16
MB_Analytics <- function(BrandEquity)
• Quickly embraced R’s excellent data
manipulation functionality.
• Implementing any sort of statistical
analysis/model was possible.
• Vibrant open-source community provides
the best possible support, if you know how
to harness it!
• Learning process is difficult but rewarding.
We quickly learned we could do everything we required in R.
17
read/write.output(MB)
R can read in data from almost any data source.
R can generate pretty much any type of output.
We could plug R into a system/architecture that would make the most of
its analytical capabilities.
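As a small illustration of the read-process-write pattern, here is a sketch using only base R and a temporary CSV file. The column names and file paths are made up for the example. SPSS or SAS inputs would typically go through an add-on package such as `foreign` or `haven` instead of `read.csv`, which is an assumption about tooling rather than a description of the Beast.

```r
# Write a small example file so the sketch is self-contained.
input_path <- tempfile(fileext = ".csv")
write.csv(
  data.frame(brand = c("A", "A", "B"), score = c(4, 5, 3)),
  input_path,
  row.names = FALSE
)

# Read respondent-level data, summarise it, and write a report table.
responses   <- read.csv(input_path, stringsAsFactors = FALSE)
brand_means <- aggregate(score ~ brand, data = responses, FUN = mean)

output_path <- tempfile(fileext = ".csv")
write.csv(brand_means, output_path, row.names = FALSE)
```

A surrounding system only needs to agree with R on the input and output locations and formats; the analysis in between stays plain R.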
18
require(Beast)
“’Brand Equity Analytics…’,
get the name to spell ‘Beast’
and you’re onto something.”
– Dale Smith, Global Head of
Analytical Innovations, MB
The Beast is MB’s R-based analytics service.
19
str(Beast)
Inputs: SPSS, CSV, SAS
R
Wrapper
Outputs: enriched respondent-level data (tabs, dashboards); summary reports (XLS, PPT, PDF)
Normative database
Analytics contributed by MB’s statistical community
What started out as a calculation engine quickly evolved into a grand
vision that empowered MB’s statistical minds.
20
print(MB_Beast)
# LESSONS LEARNED, BLOWS TAKEN, REWARDS REAPED
21
Warning: Planning returned NA
Versioning & Source Control
• R, CRAN, & Custom Library versions
• As the team grows, how will several programmers contribute code?
Single Analysis vs. Process
• Processing Time, Memory Usage
• Hardware vs. Code Optimization
Testing & Exception Handling
• Unit, Regression Testing
• Error Handling
Code/Package Structure & Quality
• Code re-usability
• Documentation
• Classes, Methods, Functions, Services
As statisticians, we had to learn a great deal about standard software best
practices that are alien to us by nature!
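To make the testing and exception-handling point concrete, here is a hedged sketch in base R. `mean_score` and `run_safely` are hypothetical names invented for this example, not functions from the Beast. The pattern is that an analysis step fails loudly on bad input, and a thin wrapper turns that failure into a structured result that a calling service can log instead of crashing.

```r
# Hypothetical analysis step: mean score, failing loudly on unusable input.
mean_score <- function(x) {
  if (!is.numeric(x) || length(x) == 0) {
    stop("scores must be a non-empty numeric vector")
  }
  mean(x, na.rm = TRUE)
}

# Wrapper that converts errors into a structured result a calling
# service could log or return, instead of aborting the whole job.
run_safely <- function(step, ...) {
  tryCatch(
    list(ok = TRUE, value = step(...), message = NULL),
    error = function(e) {
      list(ok = FALSE, value = NULL, message = conditionMessage(e))
    }
  )
}

# Minimal unit-test style checks using base R only.
stopifnot(isTRUE(all.equal(mean_score(c(1, 2, 3)), 2)))
stopifnot(!run_safely(mean_score, "not numeric")$ok)
```

In a packaged code base, checks like the last two lines would normally live in a dedicated test suite and run on every change, which is where regression testing comes from.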
22
sessionInfo() # R & the Beast over time
Timeline: January 2013, March 2013, May 2013, August 2013, August 2014
Phase 1: The Beast prototype built. A single, lonely R programmer wrote one epic script, run locally!
Phase 2: The Beast is born! An app-based .Net wrapper is created to deliver functionality. Beast team: 1 R and 1 .Net developer.
Phase 3: 2 developers for each language. The Beast is packaged! The Beast code is structured in line with a service-oriented architecture.
3 R developers! MB Global R Community is born: 70 active members worldwide.
Phase 4 (Beast Community): MB now has 6 R developers in the Global Analytics team.
23
summary(Beast)
[Chart segments: 35%, 31%, 13%, 10%, 6%, 5%]
Today, the Beast is at the heart of much of what we deliver to our clients,
with around 1000 different jobs being processed over the last 12 months
around the world.
24
return(Beast)
Statisticians aren’t software developers, but they
can learn and harness the best of both worlds.
Building enterprise software with R is challenging:
open source languages require you to consider
things you wouldn’t otherwise have to worry about.
R can communicate with all sorts of platforms,
enabling efficient gateways for analytics system
success. Statisticians and developers make
powerful allies!
25