Analytics adventures Sandro Catanzaro © 2013 Sandro Catanzaro 1 Where do I come from • • • • • • Mech. Eng undergrad - UBA Entrepreneur MIT Sloan and MIT Aero-Astro Dad Bain & Co alum DataXu co-founder / platform co-inventor • SVP Analytics and Innovation © 2013 Sandro Catanzaro Funny picture about me 2 Founded in 2009; Headquartered in Boston, MA; 300+ employees, 14 offices globally, PetaByte scale dataset Real-time decision system based on MIT research by Bill Simmons and Sandro Catanzaro (first used for NASA Mars Mission project) Easy-to-use software platform enabling brands to use Big Marketing Data 5th fastest growth US company Rated #1 algorithmic optimization technology by Forrester Research Powering blue-chip brands: Ford, Lexus, General Mills, Wells Fargo, Discover DataXu is all about applying data science to the art of marketing © 2011-2013 DataXu, Inc. Privileged & Confidential © 2011-2013 DataXu, Inc. Privileged & Confidential 4 4 Agenda PLEASE MAKE MAGIC HAPPEN MAKING MAGIC WORK SOME ANECDOTES © 2013 Sandro Catanzaro 5 PLEASE MAKE MAGIC HAPPEN © 2013 Sandro Catanzaro 6 What about Analytics Google Search Trends • Analytics popularity driven by its results Search interest trend for “Analytics” • MIT Sloan and IBM survey shows business believes on its value* • Science reliance in massive datasets 2005 2009 2011 2013 MIT Sloan – IBM survey Decision-makers belief analytics is a source of competitive advantage 100% – DNA data mining *Source: Create Business Value with Analytics – MIT Sloan Management Review – Fall 2011” 2007 50% 0% 2010 © 2013 Sandro Catanzaro 2011 7 Just a trend? • Declared the new black – Symptom of shark jumping? • Analytics will stay relevant as long as it continues providing the right answer – Or the right answer more often than not © 2013 Sandro Catanzaro 8 Analytics as a bridge Selection of Error what matters quantification Aggregates Cleaning Technology Raw data Model limits Insight Action oriented question Decision-making As any other bridge, a feat not a science, but of engineering Engineering = art of tradeoffs with just enough information • • • • Goes to a specific point to another specific point Good enough foundations (data and question clarity) Can be used by most people who want to cross it Has explicit durability assumptions © 2013 Sandro Catanzaro 9 Analytics as a system Input Data Data stakeholders Process Output Selection of what matters Technology QA Cleaning Model limits Insight Error quantification Aggregates Easier to manage interface © 2013 Sandro Catanzaro Presentation Insight recipients P&L owner Brittle interface 10 So, there is value on Analytics • Yes, but.. • As you all know, it is not easy • Some of the challenges that make Analytics so much fun © 2013 Sandro Catanzaro 11 Managing Analytics As Analytics Managers at the highest level, we must answer three key questions: 1. How is the business doing? 2. What Drives the business? 3. Who are my customers, what are their needs? To do that though, we rely on a team that can cover domain, business and the analytics process… … and technology Source: http://www.forbes.com/sites/piyankajain/2012/08/19/3-key-analytics-questions-to-ask-big-data/ © 2013 Sandro Catanzaro 12 Analytics Leaders Balancing Act • Valuable insights take into consideration business, domain and analytics process Business need • …with a just-goodenough mindset of insight delivery Domain knowledge Analytics & reasoning Technology © 2013 Sandro Catanzaro 13 A big & difficult challenge Input Process Output Dirty data Scaling up the team Unclear need Too much data New toolset dynamics Unwillingness to pay Far away data Thinking takes time Answer not simple enough Too little data R&D type of risk Output contradicts business Legal implications Market noise – new tools Politics Reputational risk Threat of disruption from output to input NIH – fear of change Trust and proprietary data Proving ROI Did you Notice where most challenges are? Technical stakeholders Little intellectual curiosity Business – Domain stakeholders © 2013 Sandro Catanzaro 14 Types of Analytics 1. Research – Unclear need, unclear answer, high risk 2. Development – Clear answer and method, but some risks on implementation 3. Operations – Good experience with execution. Can be automated – QA for Automation becomes a type 1 problem, again #1 = most fun / most challenging = focus of this slide-set © 2013 Sandro Catanzaro 15 Reporting vs Analytics Just a “report writer software” will do it • Analytics = curated reports – Curation: identifying what matters, editing out what doesn’t – Joining the dots • Analytics is a Bridge to the right insight – Not a pile of needles to replace a pile of hay – Decisions are made with just THE ONE needle – Careful you all detail oriented nation! © 2013 Sandro Catanzaro 16 Do some analytics for me You are smart, you can come up with something interesting, can’t you? • Many problems are not clear – Domain experts and business managers, even you, don’t know boundaries of what can be answered – “catch 22 situation” Unclear need Unwillingness to pay – Even worse, the problem existence is unclear • A solution is identified before the problem is clear (this is really bad) – The myth of big data – and the heavy lifting in data collection • Analytics are perceived as a “nice to have” by less technically oriented stakeholders – Do it on your free time © 2013 Sandro Catanzaro 17 Make sure the answer is X • Clear questions and enough data may still provide the “wrong” answer Output contradicts business Answer not simple enough Politics – May not make business sense – Or not be of the liking of business • Or, may require complex answers difficult to digest by intended audience • Or, politics may be a stronger driver for the decision process © 2013 Sandro Catanzaro 18 Look at the big picture AND focus! • Good analysts who are willing to see the big picture are scarce – Hiring analytics management is harder: Peters Principle • Moving too fast or too slow in tool selection Scaling up the team New toolset dynamics Thinking takes time R&D type of risk – In the middle between bleeding alpha and cobol. Think “second best” • Analysis “gold polishing” – Some of us may be perfectionists • … and finally, there intrinsic risk that the analysis may not work © 2013 Sandro Catanzaro 19 The right data, all in one place • Data is often in the wrong format or has rows that need to be ignored – Cleaning is tedious, often manual • More often than not, there is superfluous data that we keep “just in case” Dirty data Too much data Far away data – Bogs down processing, blows out costs • Data is rarely centralized – May require collaboration: “What is in it for me” © 2013 Sandro Catanzaro 20 FRAMEWORK © 2013 Sandro Catanzaro 21 Where to start Clear and present need “you are lucky” Need is not crisp, but challenge is known “we just need Analytics” Lack of awareness about challenge “Beware” Crisp and clear statement of the “question to answer” good bad Can you tell us how sales correlates with factors A, B, C, D and E; using a quarter by quarter dataset, and data from 2010 to date Please do some attribution modeling © 2013 Sandro Catanzaro 22 The pull model Starts from the end Data collection (data sources, pick, parse cleanup) Is this truly important to you? Dataset Processing (storage technology) (aggregation, processing technology) Analysis output (analysis tools) Are you willing to pay for it? Business problem Audience © 2013 Sandro Catanzaro 23 The pull model Starts from the end… …You would cross the “bridge” 3 times • First time from where you are, to the business stakeholder and problem • Second time from the business problem, all the way back to the data, so that every piece “pulls” the next previous one • With business problem in hand, you will know the minimum analysis output that provides the answer • With the analysis in hand, you will know analysis tools needed to prepare, and also what is the processing and aggregations needed to support it • With the aggregations in hand, you will know the minimum dataset required • .. All the way to data collection • Third time from the data, you will come back as you execute the analysis, step by step to finally arrive to the desired answer © 2013 Sandro Catanzaro 24 Continuous iteration • • • • Share interim results: continue selling on value, continue selling on model belief May modify the “crisp question” – ensure you also reframe time, resources, cost Whether you’ll share the methodology depends on the audience Creates dialog and sense of shared ownership in the answer Stakeholders Incorporate feedback Stakeholders Stakeholders Crisp question statement “the contract” Mockup answer (Answer First) Back of the envelope answer Complete answer Share output Stakeholders (maybe methodology) Nothing fancy here Just risk mitigation / project management 101… © 2013 Sandro Catanzaro 25 Be flexible • At each iteration, the question may move… – Nothing is worse than answering the wrong question – What is valuable may evolve • … yet ensure you “get your victory lap” if the question moves – Thinking evolved thanks to analysis – Re-evaluate timing, resources • Practical signs you’re on the right track: – Experienced stakeholders confirm your findings by ‘gut’ – They ask a LOT of clarifying questions & recommend potential next steps to your analysis/message © 2013 Sandro Catanzaro 26 Best models: 70% belief - 30% math • Simplify • Share with all shareholders – get them to believe in the answer and model • Simplify • Acquire buy in – “prewire meetings” – Must have on board the P&L owner – Also recruit influencers, other stakeholders • Simplify again Be careful on adding accuracy at the cost of complexity – Review and clean up model – Remove unneeded items, keep just enough to prove your point – Maybe add a single cherry (non-requested-by-so-cool-to-know insight) © 2013 Sandro Catanzaro 27 Risk mitigation • Understand your project risks and tackle them in order – The scariest first – even if you don’t solve it, you’ll raise awareness, understand it better, adjust timeline if needed – Leave the easiest for last – even if out of sequence • Ensure you can create full end-to-end passes with increase accuracy each time Nothing fancy here Just risk mitigation / project management 101… © 2013 Sandro Catanzaro 28 Scaling the team • Best bet is for introducing geeks, into big picture thinking • Aim for “T shaped” people – Broad big picture understanding – Deep expertise in one area – Smarts beat expertise • Provide context – Make sure team understands where does the analysis fit • Be willing to let go – Understand costs of potential mistakes – Responsibility is never delegated © 2013 Sandro Catanzaro 29 Accessing and processing data • Data accumulates “close” to the data user (formats, quirks) – Data that is not used is rarely correct, updated on time • Help from data users is critical, yet them may feel threatened – Create quid-pro-quo partnerships. Giving credit can be enough • Avoid changing technologies mid-way a project – If new tech advantages are leaps and bounds above, parallel run both until the new replicates results © 2013 Sandro Catanzaro 30 SOME ANECDOTES © 2013 Sandro Catanzaro 31 Make digital attribution Large banking institution, looking to show innovation: “We want to know the value of the ad sequencing So, please prepare an attribution model for us • Make it say something interesting • Actionable • … and that we haven’t heard before” © 2013 Sandro Catanzaro 32 What can you say about their personal values? Ad agency executive for a large CPG account: “Using the data you are seeing from our ad campaign, what can you tell us about our consumers’ values?” © 2013 Sandro Catanzaro 33 Create wizards: One, two, three, go! Executive to analytics manager and recruiting manager all in one: “We need you to build our new Analytics group from the ground up” © 2013 Sandro Catanzaro 34 Sandro’s Analytics principles 1. Start with the business/domain problem. Never with data 2. Don’t ever ignore humans – especially the one who pays 3. 70% belief – 30% math 4. Teach smart geeks to think big picture & let go 5. Simplify - weight complexity vs. accuracy 6. Quick and good enough wins the day 7. Model and iterate – rinse and repeat 8. Tackle risks from riskiest to least risky 9. Be flexible in scope, but take victory lap and re-budget 10. Never compromise with “the truth” © 2013 Sandro Catanzaro 35 APPENDIX … Because you always need to move 70% of slides to their appendix… © 2013 Sandro Catanzaro 36 No ROI “The project has been cancelled – sorry” • In theory, you should' t care • In practice, is critical to understand the value of your work, and the big picture – Ask questions, understand how your answer fits in • Good fitness means – Project stability – Organizational visibility – Less re-work © 2013 Sandro Catanzaro Thanks to Vishal Kumar for the graphics – stolen from his blog 37 From business to analysis and back • Value created on the intersection of business, domain and analytics Business need Domain knowledge • Just Enough technology • Team skill set should span all three four Analytics & reasoning Technology © 2013 Sandro Catanzaro 38