Data collection

advertisement
Analytics adventures
Sandro Catanzaro
© 2013 Sandro Catanzaro
1
Where do I come from
•
•
•
•
•
•
Mech. Eng undergrad - UBA
Entrepreneur
MIT Sloan and MIT Aero-Astro
Dad
Bain & Co alum
DataXu co-founder / platform
co-inventor
• SVP Analytics and Innovation
© 2013 Sandro Catanzaro
Funny picture
about me
2
Founded in 2009; Headquartered in Boston, MA;
300+ employees, 14 offices globally, PetaByte scale dataset
Real-time decision system based on MIT research by
Bill Simmons and Sandro Catanzaro
(first used for NASA Mars Mission project)
Easy-to-use software platform enabling brands to use
Big Marketing Data
5th fastest growth US company
Rated #1 algorithmic optimization
technology by Forrester Research
Powering blue-chip brands:
Ford, Lexus, General Mills,
Wells Fargo, Discover
DataXu is
all about applying
data science to
the art of marketing
© 2011-2013 DataXu, Inc. Privileged & Confidential
© 2011-2013 DataXu, Inc. Privileged & Confidential
4
4
Agenda
PLEASE MAKE MAGIC HAPPEN
MAKING MAGIC WORK
SOME ANECDOTES
© 2013 Sandro Catanzaro
5
PLEASE MAKE MAGIC HAPPEN
© 2013 Sandro Catanzaro
6
What about Analytics
Google Search Trends
• Analytics popularity driven
by its results
Search interest trend for “Analytics”
• MIT Sloan and IBM survey
shows business believes on
its value*
• Science reliance in massive
datasets
2005
2009
2011
2013
MIT Sloan – IBM survey
Decision-makers belief analytics is
a source of competitive advantage
100%
– DNA data mining
*Source: Create Business Value with Analytics – MIT
Sloan Management Review – Fall 2011”
2007
50%
0%
2010
© 2013 Sandro Catanzaro
2011
7
Just a trend?
• Declared the new black
– Symptom of shark
jumping?
• Analytics will stay relevant
as long as it continues
providing the right answer
– Or the right answer more
often than not
© 2013 Sandro Catanzaro
8
Analytics as a bridge
Selection of Error
what matters quantification
Aggregates
Cleaning
Technology
Raw data
Model
limits
Insight
Action
oriented
question
Decision-making
As any other bridge, a feat not a science, but of engineering
Engineering = art of tradeoffs with just enough information
•
•
•
•
Goes to a specific point to another specific point
Good enough foundations (data and question clarity)
Can be used by most people who want to cross it
Has explicit durability assumptions
© 2013 Sandro Catanzaro
9
Analytics as a system
Input
Data
Data
stakeholders
Process
Output
Selection of
what matters
Technology
QA
Cleaning
Model
limits
Insight
Error
quantification
Aggregates
Easier to manage
interface
© 2013 Sandro Catanzaro
Presentation
Insight
recipients
P&L owner
Brittle
interface
10
So, there is value on Analytics
• Yes, but..
• As you all know, it is not easy
• Some of the challenges that make Analytics so
much fun
© 2013 Sandro Catanzaro
11
Managing Analytics
As Analytics Managers at the highest level, we must
answer three key questions:
1. How is the business doing?
2. What Drives the business?
3. Who are my customers, what are their needs?
To do that though, we rely on a team that can cover
domain, business and the analytics process… … and
technology
Source: http://www.forbes.com/sites/piyankajain/2012/08/19/3-key-analytics-questions-to-ask-big-data/
© 2013 Sandro Catanzaro
12
Analytics Leaders Balancing Act
• Valuable insights take
into consideration
business, domain and
analytics process
Business
need
• …with a just-goodenough mindset of
insight delivery
Domain
knowledge
Analytics &
reasoning
Technology
© 2013 Sandro Catanzaro
13
A big & difficult challenge
Input
Process
Output
Dirty data
Scaling up the team
Unclear need
Too much data
New toolset dynamics
Unwillingness to pay
Far away data
Thinking takes time
Answer not simple enough
Too little data
R&D type of risk
Output contradicts
business
Legal implications
Market noise – new tools
Politics
Reputational risk
Threat of disruption from
output to input
NIH – fear of change
Trust and proprietary data
Proving ROI
Did you Notice where most
challenges are?
Technical
stakeholders
Little intellectual curiosity
Business – Domain
stakeholders
© 2013 Sandro Catanzaro
14
Types of Analytics
1. Research
– Unclear need, unclear answer, high risk
2. Development
– Clear answer and method, but some risks on
implementation
3. Operations
– Good experience with execution. Can be automated
– QA for Automation becomes a type 1 problem, again
#1 = most fun / most challenging = focus of this slide-set
© 2013 Sandro Catanzaro
15
Reporting vs Analytics
Just a “report writer software” will do it
• Analytics = curated reports
– Curation: identifying what matters, editing out what doesn’t
– Joining the dots
• Analytics is a Bridge
to the right insight
– Not a pile of needles
to replace a pile of hay
– Decisions are made with
just THE ONE needle
– Careful you all detail oriented nation!
© 2013 Sandro Catanzaro
16
Do some analytics for me
You are smart, you can come up with
something interesting, can’t you?
• Many problems are not clear
– Domain experts and business managers, even
you, don’t know boundaries of what can be
answered – “catch 22 situation”
Unclear need
Unwillingness to pay
– Even worse, the problem existence is unclear
• A solution is identified before the problem is
clear (this is really bad)
– The myth of big data – and the heavy lifting in
data collection
• Analytics are perceived as a “nice to have” by
less technically oriented stakeholders
– Do it on your free time
© 2013 Sandro Catanzaro
17
Make sure the answer is X
• Clear questions and enough data
may still provide the “wrong”
answer
Output contradicts
business
Answer not simple
enough
Politics
– May not make business sense
– Or not be of the liking of business
• Or, may require complex answers
difficult to digest by intended
audience
• Or, politics may be a stronger
driver for the decision process
© 2013 Sandro Catanzaro
18
Look at the big picture AND focus!
• Good analysts who are willing to see the
big picture are scarce
– Hiring analytics management is harder: Peters
Principle
• Moving too fast or too slow in tool
selection
Scaling up the team
New toolset dynamics
Thinking takes time
R&D type of risk
– In the middle between bleeding alpha and cobol.
Think “second best”
• Analysis “gold polishing”
– Some of us may be perfectionists
• … and finally, there intrinsic risk
that the analysis may not work
© 2013 Sandro Catanzaro
19
The right data, all in one place
• Data is often in the wrong format or has
rows that need to be ignored
– Cleaning is tedious, often manual
• More often than not, there is superfluous
data that we keep “just in case”
Dirty data
Too much data
Far away data
– Bogs down processing, blows out costs
• Data is rarely
centralized
– May require
collaboration:
“What is in it for me”
© 2013 Sandro Catanzaro
20
FRAMEWORK
© 2013 Sandro Catanzaro
21
Where to start
Clear and
present need
“you are lucky”
Need is not crisp,
but challenge is
known
“we just need
Analytics”
Lack of
awareness about
challenge
“Beware”
Crisp and clear statement of the
“question to answer”
good
bad
Can you tell us how sales correlates
with factors A, B, C, D and E; using
a quarter by quarter dataset, and
data from 2010 to date
Please do some
attribution modeling
© 2013 Sandro Catanzaro
22
The pull model
Starts from the end
Data collection
(data sources,
pick, parse
cleanup)
Is this truly
important to you?
Dataset
Processing
(storage
technology)
(aggregation,
processing
technology)
Analysis
output
(analysis
tools)
Are you willing
to pay for it?
Business
problem
Audience
© 2013 Sandro Catanzaro
23
The pull model
Starts from the end…
…You would cross the “bridge” 3 times
• First time from where you are, to the business stakeholder and problem
• Second time from the business problem, all the way back to the data, so that
every piece “pulls” the next previous one
• With business problem in hand, you will know the minimum analysis
output that provides the answer
• With the analysis in hand, you will know analysis tools needed to prepare,
and also what is the processing and aggregations needed to support it
• With the aggregations in hand, you will know the minimum dataset
required
• .. All the way to data collection
• Third time from the data, you will come back as you execute the analysis, step
by step to finally arrive to the desired answer
© 2013 Sandro Catanzaro
24
Continuous iteration
•
•
•
•
Share interim results: continue selling on value, continue selling on model belief
May modify the “crisp question” – ensure you also reframe time, resources, cost
Whether you’ll share the methodology depends on the audience
Creates dialog and sense of shared ownership in the answer
Stakeholders
Incorporate feedback
Stakeholders
Stakeholders
Crisp
question
statement
“the contract”
Mockup
answer
(Answer
First)
Back of
the
envelope
answer
Complete
answer
Share output
Stakeholders
(maybe methodology)
Nothing fancy here
Just risk mitigation /
project management 101…
© 2013 Sandro Catanzaro
25
Be flexible
• At each iteration, the question may move…
– Nothing is worse than answering the wrong question
– What is valuable may evolve
• … yet ensure you “get your victory lap” if the question moves
– Thinking evolved thanks to analysis
– Re-evaluate timing, resources
• Practical signs you’re on the right track:
– Experienced stakeholders confirm your findings by ‘gut’
– They ask a LOT of clarifying questions & recommend potential next steps to
your analysis/message
© 2013 Sandro Catanzaro
26
Best models: 70% belief - 30% math
• Simplify
• Share with all shareholders – get them to believe in the
answer and model
• Simplify
• Acquire buy in – “prewire meetings”
– Must have on board the P&L owner
– Also recruit influencers, other stakeholders
• Simplify again
Be careful on
adding accuracy
at the cost of
complexity
– Review and clean up model
– Remove unneeded items, keep just enough to prove your point
– Maybe add a single cherry (non-requested-by-so-cool-to-know insight)
© 2013 Sandro Catanzaro
27
Risk mitigation
• Understand your project risks and tackle them in order
– The scariest first – even if you don’t solve it, you’ll raise
awareness, understand it better, adjust timeline if needed
– Leave the easiest for last – even if out of sequence
• Ensure you can create full end-to-end passes with increase
accuracy each time
Nothing fancy here
Just risk mitigation /
project management 101…
© 2013 Sandro Catanzaro
28
Scaling the team
• Best bet is for introducing geeks,
into big picture thinking
• Aim for “T shaped” people
– Broad big picture understanding
– Deep expertise in one area
– Smarts beat expertise
• Provide context
– Make sure team understands where
does the analysis fit
• Be willing to let go
– Understand costs of potential mistakes
– Responsibility is never delegated
© 2013 Sandro Catanzaro
29
Accessing and processing data
• Data accumulates “close” to the data user (formats, quirks)
– Data that is not used is rarely correct, updated on time
• Help from data users is critical, yet them may feel threatened
– Create quid-pro-quo partnerships. Giving credit can be enough
• Avoid changing technologies mid-way a project
– If new tech advantages are leaps and bounds above, parallel run both
until the new replicates results
© 2013 Sandro Catanzaro
30
SOME ANECDOTES
© 2013 Sandro Catanzaro
31
Make digital attribution
Large banking institution, looking to show
innovation:
“We want to know the value of the ad sequencing
So, please prepare an attribution model for us
• Make it say something interesting
• Actionable
• … and that we haven’t heard before”
© 2013 Sandro Catanzaro
32
What can you say about
their personal values?
Ad agency executive for a large CPG account:
“Using the data you are seeing from our ad
campaign, what can you tell us about our
consumers’ values?”
© 2013 Sandro Catanzaro
33
Create wizards: One, two, three, go!
Executive to analytics manager and recruiting
manager all in one:
“We need you to build our new Analytics group
from the ground up”
© 2013 Sandro Catanzaro
34
Sandro’s Analytics principles
1. Start with the business/domain problem. Never with data
2. Don’t ever ignore humans – especially the one who pays
3. 70% belief – 30% math
4. Teach smart geeks to think big picture & let go
5. Simplify - weight complexity vs. accuracy
6. Quick and good enough wins the day
7. Model and iterate – rinse and repeat
8. Tackle risks from riskiest to least risky
9. Be flexible in scope, but take victory lap and re-budget
10. Never compromise with “the truth”
© 2013 Sandro Catanzaro
35
APPENDIX
… Because you always need to move
70% of slides to their appendix…
© 2013 Sandro Catanzaro
36
No ROI
“The project has been cancelled – sorry”
• In theory, you should' t care
• In practice, is critical to
understand the value of your
work, and the big picture
– Ask questions, understand
how your answer fits in
• Good fitness means
– Project stability
– Organizational visibility
– Less re-work
© 2013 Sandro Catanzaro
Thanks to Vishal Kumar for the graphics – stolen from his blog
37
From business to analysis and back
• Value created on the
intersection of
business, domain and
analytics
Business
need
Domain
knowledge
• Just Enough technology
• Team skill set should
span all three four
Analytics &
reasoning
Technology
© 2013 Sandro Catanzaro
38
Download