How R Transformed the Analytics Paradigm at Millward Brown
JUAN MANUEL HERNÁNDEZ

help(Millward Brown)
WHAT WE DO FOR OUR CLIENTS

library(Millward Brown)
Marketing communications, media, digital and brand equity research.
A lot of touch points <- attitudinal and behavioural data
We work with 90% of the world’s leading brands.
Categories and consumer profiles <- data of all types
Offices in 56 countries
Global/Regional <- data from almost anywhere.
Brand Strategy, Brand Performance, Creative Development, Channel Optimization
40 years collecting consumer attitudinal data - demographics, social, economic, and cultural habits & opinions.
For more information, go to www.millwardbrown.com

data(BrandZ™)
BrandZ™ is the world's largest brand equity database.
Created in 1998 and continually updated.
It contains data on brands gathered from interviews with over 150,000 people every year in up to 400 studies around the world.
BrandZ™ is just one example of the type and magnitude of studies carried out by MB around the world.

MB_2014 <- sum(seq_along(MB)) # THE JOURNEY TO A NEW ANALYTICS PARADIGM

library(BrandDynamics)
Measuring brand equity with BrandDynamics™ Voltage 2.0
[Figure: Brand Map. Axes: Voltage 2.0 (Low to High) and Presence (Low to High). Quadrants: Growing Equity (small strong brands); Strong Equity (large strong brands); Little Equity (small weaker brands); Declining Equity (large weaker brands).]
Voltage 2.0 analytics were originally written in SPSS!

ts(BrandDynamics™)
[Timeline figure. Year labels: 1992, 1996, 1998, 2003, 2005, 2009, 2010, 2012. Milestones: launch of BrandDynamics; launch of BrandZ; Bonding Factor analysis; launch of D&A; development of Voltage 2.0; launch of the ‘Paw Print’ analysis; development of the Brand Strength Score; development of the Value Driver workshops; Meaningfully Different Framework.]
20 years provide a lot of learning - conceptual, analytical, and operational.
What, why, where, when, and how to analyse brand equity?

Error in library(MDf) : no package called ‘MDf’
The new framework required a new calculation engine.
Traditional software development teams aren’t usually skilled in high-level statistics.
Real enterprise software is much more than making sure calculations are correct.
The original development estimate was 2 years!

Wikipedia :: define(“enterprise software”)
“Enterprise software, also known as enterprise software application (ESA), is purpose-designed computer software used to satisfy the needs of an organization rather than individual users […] Enterprise software is an integral part of a (computer based) Information System, and as such includes web site software production.”
- http://en.wikipedia.org/wiki/Enterprise_software
Global enterprise systems have to consider:
• Systems architecture
• Support
• Deployment
• Source control
• General software dev. best practices
• Development programs (e.g. versioning, agile vs. waterfall development, etc.)
Traditional statisticians have never even heard of enterprise software!

install.packages(MB_R) # the New Calculation Engine
Open Source: We could build a free prototype
Statistical Power: Endless, dynamic array of statistical and data processing capabilities
There were hidden demons - an R script/package is not the same as an enterprise analytic system

“R encountered a fatal error”
• Unsupported
• Library quality
• Resource management
• “Unknown language”
Open-source software, like R, can be difficult to manage for the enterprise.

installed.packages() # Developers & R Statisticians
An imbalance between the new requirements for analytics systems and the available skill sets became immediately apparent.
Many of our solutions would require a degree of high-level statistical automation that our developers could not deliver fast enough.
[Chart: Requirement vs. Skills for Developers, R, and Stats]
R was an appealing solution, but very little expertise was available.

load(R Leap of Faith) # MB BOLDLY GOES FOR R

A Universe of Data <- 40 Years
Messy “Pathological” Data <- Survey data is always challenging.
Over time, big improvements in data collection have been made, but many unexpected inconsistencies and biases are a constant presence we need to control.
A difficult mixture of effects on data quality and tidiness makes harnessing so much data a major challenge, and Millward Brown has a lot of data.
Our analyses are designed to be applied at a global, generic level, maximizing insight and minimizing noise through dynamic learning.

DESCRIPTION
[Diagram labels: Agile Development; Input Loading; Validation; Calculation Engine; Analytics; Output; Wrapper; Distribution to Users; Internal Infrastructure]
We went for it and came up with a plan.
An R-based enterprise system would require a wrapper to deliver and supply services to the calculation engine.
R would have to cooperate with other programming languages.

sapply(Validation)
Validating inputs guarantees the required input consistency.
It provides (near-immediate) feedback to the user if sufficient conditions aren’t met for successful processing.
It happens promptly to avoid wasting time.
[Diagram labels: Business Logic; Data Validation]
Validation routines allowed us to control and enhance the level of flexibility in our analyses and systems.

MB_Analytics <- function(BrandEquity)
• We quickly embraced R’s excellent data manipulation functionality.
• Implementing any sort of statistical analysis/model was possible.
• A vibrant open-source community provides the best possible support, if you know how to harness it!
• The learning process is difficult but rewarding.
We quickly learned we could do everything we required in R.

read/write.output(MB)
R can read in data from almost any data source.
R can generate pretty much any type of output.
We could plug R into a system/architecture that would make the most of its analytical capabilities.
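
The input validation described in the sapply(Validation) slide might look something like the sketch below. This is a minimal illustration only, not the Beast's actual code: the function name `validate_input`, the column names, and the rules are all hypothetical. It collects every problem it finds and returns them together, so the user gets feedback before any expensive calculation starts.

```r
# Hypothetical input validation for respondent-level survey data.
# Column names and rules are illustrative assumptions, not MB's real checks.
validate_input <- function(df,
                           required_cols = c("respondent_id", "brand", "rating")) {
  # Nothing else can be checked if the input is not a data frame
  if (!is.data.frame(df)) {
    return("input is not a data frame")
  }

  problems <- character(0)

  # Data validation: every required column must be present
  missing_cols <- setdiff(required_cols, names(df))
  if (length(missing_cols) > 0) {
    problems <- c(problems,
                  paste("missing columns:", paste(missing_cols, collapse = ", ")))
  }

  # Data validation: there must be something to process
  if (nrow(df) == 0) {
    problems <- c(problems, "no rows to process")
  }

  # Business-logic style rule: ratings must be numeric when present
  if ("rating" %in% names(df) && !is.numeric(df$rating)) {
    problems <- c(problems, "column 'rating' must be numeric")
  }

  problems  # an empty vector means the input passed every check
}

# Example: a file that lacks the rating column
survey <- data.frame(respondent_id = 1:3, brand = c("A", "B", "A"))
issues <- validate_input(survey)
if (length(issues) > 0) {
  message("Validation failed: ", paste(issues, collapse = "; "))
}
```

Returning a vector of messages rather than stopping at the first failure is one possible design choice: it lets a wrapper report all issues to the user in a single round trip.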

require(Beast)
“’Brand Equity Analytics…’, get the name to spell ‘Beast’ and you’re onto something.” – Dale Smith, Global Head of Analytical Innovations, MB
The Beast is MB’s R-based analytics service.

str(Beast)
[Diagram. Inputs: SPSS, CSV, SAS. Core: R inside a wrapper, with analytics contributed by MB’s statistical community. Outputs: enriched respondent-level data (tabs, dashboards); summary reports (XLS, PPT, PDF); normative database.]
What started out as a calculation engine quickly evolved into a grand vision that empowered MB’s statistical minds.

print(MB_Beast) # LESSONS LEARNED, BLOWS TAKEN, REWARDS REAPED

Warning: Planning returned NA
Versioning & Source Control
• R, CRAN, & Custom Library versions
• As the team grows, how will several programmers contribute code?
Single Analysis vs. Process
• Processing Time, Memory Usage
• Hardware vs. Code Optimization
Testing & Exception Handling
• Unit, Regression Testing
• Error Handling
Code/Package Structure & Quality
• Code re-usability
• Documentation
• Classes, Methods, Functions, Services
As statisticians, we had to learn a lot about standard software best practices that are alien to us by nature!

sessionInfo() # R & the Beast over time
Timeline: January 2013, March 2013, May 2013, August 2013, August 2014
• Phase 1: Prototype built. A single, lonely R programmer wrote one epic script, run locally!
• Phase 2: The Beast is born! An app-based .Net wrapper is created to deliver functionality. Beast team: 1 R and 1 .Net developer.
• Phase 3: The Beast is packaged! The Beast code is structured in line with a service-oriented architecture. 2 developers for each language.
• Phase 4, Beast Community: 3 R developers! MB Global R Community is born: 70 active members worldwide. MB now has 6 R developers in the Global Analytics team.

summary(Beast)
[Chart with segments of 35%, 13%, 31%, 10%, 6% and 5%]
Today, the Beast is at the heart of much of what we deliver to our clients, with around 1000 different jobs processed around the world over the last 12 months.

return(Beast)
Statisticians aren’t software developers, but they can learn and harness the best of both worlds.
Building enterprise software with R is challenging: open-source languages require you to consider things you wouldn’t otherwise have to worry about.
R can communicate with all sorts of platforms, enabling efficient gateways for analytics system success.
Statisticians and developers make powerful allies!