>> Ryan Brush: Terrific, let’s go ahead and get started. Thank you all for coming.
Welcome to the MSR visiting speaker series. Before we get started, let me make a
few notes. First of all, thank you Amy Draves and Michelle Riggen-Ransom for
organizing this event. We’ve got a great speaker today, so I’m really looking forward
to that.
Also, I wanted to make a note. My name is Ryan Brush. I work in GFS, Global
Foundation Services for Microsoft, but I also teach at University of Washington. So,
what’s fun about this is I’m actually going to use part of the book that you see in the
back as a textbook for this upcoming semester. From October through June we
offer a data analysis certificate program through the University of Washington, and it's
basically in the evenings, Mondays and Thursdays.
If any of you are interested in learning more about the program please feel free to
reach out. I'd be happy to share details about that. So as we get started, let me read
the short blurb about John Foreman.
So John Foreman is the chief data scientist for MailChimp.com, where he leads a data
science product development effort called the Email Genome Project. As an analytics
consultant John has created data science solutions for the Coca-Cola Company, Royal
Caribbean International, Intercontinental Hotels Group, the Department of Defense,
the IRS, and the FBI.
So John's book, Data Smart: Using Data Science to Transform Information into Insight,
is now out from Wiley, and it's possibly the most legal fun you can have with
spreadsheets. So I’m looking forward to today’s session, and without further ado I’ll
turn it over to John. Thank you.
>> John Foreman: All right. Thank you. So I’m John and I appreciate the
introduction, Ryan. Like Ryan said, I am a recovering consultant; I did all those
projects he listed off when I was a consultant at a management-consulting firm.
Also, before this talk I had a huge lunch over near building 99, so I’m just going to be
burping through the entire talk, so that’ll just happen. I’ll try to keep it quiet, but the
food was delicious. I’m an operations research guy, so operations research is math
applied to decision making, and it’s kind of a piece of data science before data
science was sexy, right? Operations research has been around for a long time, since
World War II looking at how do we schedule convoys, things like that.
So it’s an analytics practice that’s been around for a very long time. But I left
management consulting and went to MailChimp. So I don’t know how many of y’all
know what MailChimp is, or use MailChimp, but it is the world’s largest -- I see a
thumbs up in the back there. Some of you use MailChimp, yes!
So it’s the world’s largest email service provider. So MailChimp is a website through
which you would upload your email list and create a newsletter to send to that list,
and we help you do that and help you track opens, clicks, and e-commerce transactions
that started in that newsletter. So we send about 10 billion emails every month, and
then we track a bunch of interactions with those emails. And so there’s another 4
billion interactions coming in, opens, clicks, subscribes, reports, all that kind of stuff.
We actually send a lot to Hotmail so y'all get a lot of email from us.
We have about 280 employees in Atlanta. So we’re all in the south, that’s why I say
y’all. You’ll get used to it by the end of this talk. So I was a math guy and was used
to working with large organizations before coming to MailChimp, and so my
experience doing analytics was always that there would be this other analytics team
I would be working with and everyone would be analytics all the time. And when I
went to a startup it was kind of weird. I met Greg. Greg is a designer at MailChimp.
And then I met Fabio, email template designer. Mardov is the designer. And I met
Dave, designer, Kayla designer, Jen designer, Justin our t-shirt designer. He designs
t-shirts like this, and like that shirt. They make things like this billboard outside of
our headquarters. This is the death metal billboard. It was eventually replaced with
an ‘80s metal billboard and these are our action figures for Freddy, our mascot. Our
vinyl figures.
And so this became immediately very confusing for me. I met Aaron, who is our
head of UX. He is a very famous designer; he wrote a book called Designing for Emotion.
And this is Ben, our CEO, a man of many talents. He also has a design background.
And very quickly I realized that most companies are not in the business of analytics,
right? When you hear about data science and analytics these days the same
companies are in the headlines. A lot of it seems to have to do with tracking
personal data and doing ad targeting. I was going to a company that did not do
that.
They had an actual product that people paid monthly fees to use.
What does analytics look like there? What I quickly realized was that all these
people, all these designers, they were after one goal, right? They want to improve
the product by improving the user’s experience of that product. So what I realized is
if I’m going to do analytics at MailChimp, I’m going to have to get behind this. This is
the goal of everyone I kept encountering. So what does that look like for me?
What does it look like for a math guy? To use my skill set, which is not Photoshop, to
improve this website. So I went online and started looking at other websites, right?
So this is one I found. This is a data science product. I thought okay, I can do
something like this. This is from a restaurant review website. I guess they review
other things besides restaurants, but they release this heat map thing where I can
pick a city and Atlanta is not on there, so I picked San Francisco. Hey Seattle made
the list, good for y’all. And then you pick a keyword, but you can’t type in any word,
they’re curated per city, right? So if you pick Paris, like, baguette comes up and if
you pick San Francisco, hipster comes up, and then you click it and it colors a heat
map thing with review data. All the reviews talking about hipsters are all in the
Mission.
Okay, so that's the Mission in San Francisco. And so looking at this you kind of
realize, okay, this actually doesn’t improve my experience of the product, right? All
it does is it sort of flexes analytics muscle that hey, we have all this data, we’ve taken
all this textual data and put it in a heat map to say okay, people talked about
hipsters, now I’m showing it back to you because we think it’s cool, right? And
they've in fact selected the words that I can play with to parrot back something to
me that I already knew to begin with. That's how they curate it. It's like, oh yeah,
that's exactly where hipsters are, isn't that cool?
So I didn’t want to do this. This doesn’t look like it improves the user experience of
the product at all; this is more about impressing maybe the press or someone before you go
public. This is another example. It's from a famous professional social network.
This was their big first data science project that sort of kicked off data science is
sexy a few years ago. So in this social network there is me down here. I can
visualize all my connections in a map, and they’ve graphed them in these clusters.
This is probably using modularity maximization, which I talk about in my book, yay!
And so I can see okay, I’ve got these people in orange, these are people I knew in
Boston. I’ve got these people down here in green. These are people I worked with
at Booz Allen. So I've got these different clusters. I already know why these clusters
exist, right? I can just look at the graph and it’s like yeah, I worked with those
people there and I worked with those people there and I went to school with those
people there. So really all you’re telling me is things I already know and I can
highlight someone’s name and it gives me their name, so now I’ve seen their name
twice. So it’s really sexy looking, right, but when you get into it it’s not improving
the product in any way. It just looks really good.
It takes the data and just sort of parrots it back to you and tells you things you
already know, but it does it in a sexy way. So if this isn’t what I should be doing, I’ll
leave this kind of stuff to the designers, what should I be doing? So I thought okay,
well if I can’t find good examples, maybe I’ll look at the toolset that’s out there.
Maybe if I look at tools or techniques I’ll get some insight into what I should be
working on. So what does the big data / data science landscape look like?
I’ll just look at all the tools. I found this slide, which is a nightmare. But then there’s
version 2.0 of the nightmare, so this is nightmare version 2.0 and this is still a couple
years old actually. I suspect it’s getting even more crowded. It’s like bubbles within
bubbles within bubbles. So what I discovered is that most analytics teams do
something like this. Right? You start by hearing about some tools, okay, we need to
use Hadoop, we have to. I don't know why we have to use Hadoop.
Choose your tools first. You know a fraction of what’s possible because you’ve seen
some other website do something that involved a graph or they wrote a blog post
that was really cool, and then you just kind of flail about looking for something to do,
right?
Trying to figure out what does this mean for me? And then in order to impress your
boss there's another step, you create an infographic, right? So this is the last step in
analytics nirvana. You collected some data and the new tool that you paid a lot of
money for to save it there, even though your data wasn't that big and you probably
could have put it in a SQL database, but now I'm going to create an infographic out of it
to prove that it was all worth it and that I should keep my job.
So what I would propose is there’s another way to go about analytics. And it goes
something like this. Know what’s possible first, so find out what data you have
available to you, both internally and externally if there’s some external data that has
some bearing on your life, but most of the time it’s going to be internal data, right?
So I got these databases, these databases, these databases, someone over there has a
spreadsheet she rarely shares with anyone but I could use that -- so kind of take a
survey of all the things you have around the company. For Microsoft that's going to
be insane, but at least in your small world what are the data sources you have at
your disposal?
What techniques could I possibly use? So a basic understanding of what’s
forecasting, how is that different from predictive modeling, how is that different
from data mining? How is that different from optimization? So just kind of a basic
idea of okay, these are the various things people do with data in order to solve
problems and then understanding some of the technology, right?
So like if I want to do predictive modeling maybe I would use [inaudible] or if I just
want to do some basic data vis maybe I would use R again or Python or
maybe I can even use Excel and just draw a graph. So understanding various
technologies. Once you have all that down then identifying problems for
opportunities, right? So talking with people, figuring out okay, what do you guys need
help with that data could possibly solve?
And actually these first two points are in conversation all the time. Right? You
should always be talking to people, learning what opportunities there are and you
should always be learning about new stuff, the new data sources you didn't know
existed. Oh, that's a finance database I didn't know about. New technologies that
are coming out, new techniques people are publishing. So that's kind of all in
conversation, in tension, all the time.
And then once you know all right, we’re going to solve this problem, this requires
data, and we can solve it, here’s the data we’re going to use. We’re going to use
these sources, here’s the technologies we’re going to use, and here’s the techniques
we’re going to use, and then you solve it.
So it’s sort of a problem centric approach to data science, right? And this is actually
why I wrote this book. So Data Smart, this book is really not sexy. Basically I take
the reader through a bunch of techniques that are sort of the bread and butter of
analytics and data science, and I do it in Excel because that's a great place to learn them.
It's a very simple sort of tool, I mean it is a tool, but the data is always in your face, right?
So you can say these are the inputs and I'm going to put them through this formula
and I'm going to get the outputs, versus using a programming language
like R or Python where you're just going to load up the library and the work is done
for you. You never actually learn anything. So this book is just meant to teach you
like a survey of all the different techniques at your disposal. And it doesn’t just
hammer on one.
A lot of books out there seem to think that artificial intelligence is going to solve
everything; it's a really sexy technique right now, but there are a bunch of other
techniques out there that are also very useful. Outlier detection is one where you
can do artificial intelligence or you could do some other kind of technique that
maybe would be unsupervised. So I try to take people through all of them.
Okay. So that’s my spiel on analytics and how it should be done within a company
or an analytics team or some part of the organization. I want to just give y’all some
examples of what this looks like at MailChimp. So I’m not just kind of blowing
smoke like we actually try to practice this basic philosophy. And the way I’m going
to do this is I’m just going to go through and we break up the data science team into
sort of four things that we do. We deliver insights and capabilities. Insights is just
basic analysis research. So these might be one-off things where the result might be
a report or spreadsheet or a blog post, right? And then we build capabilities, which
are tools.
So these could be internal tools that other teams can use, or external tools that
actually just become part of the application, right? And so a MailChimp customer
can use one of these tools. So we do this internal and external stuff. So we do it for
internal customers, which are essentially other teams. We act as an internal
consultant to other teams and then we do it for external customers, which are our
users that pay us money.
And the way it generally breaks down for us is we spend about 20 percent of our
time just doing basic insights and 80 percent of our time building tools. That’s in
part because tools are just more powerful. If I can build something for the
application that means that all of our users can do analytics versus if I just do it as a
one-off then that one report gets created and that’s it. Furthermore, we spend a lot
of time on capabilities because they are just harder to build.
Okay. So I’m just going to go left to right and top to bottom. So internal insights. I’ll
give you some examples. This is our customer support team. They’re all in Atlanta.
It’s actually about half the company. These folks answer chat and email support,
right? So you just write in and say hey I’m having trouble syncing up my [inaudible]
website with MailChimp, how do I make this happen? These folks are the people
that are going to answer that question. And we’re adding users very quickly.
So in Q1 we added 200,000 new users. And this is actually the night
shift logo. We now have a night shift, yay! We’re hiring about one new support
employee every day, which for a company like ours -- I started there less than three years
ago when we had 80 employees, and now we're at about 300 -- is pretty rapid for us.
It’s been kind of a shock.
But one of the things we’re bumping up against is okay, we’re now at a size with 6
million customers where we have to do some kind of triage or sorting of who gets
support, what does that look like, right? And I think the easy way to do it would be
to say okay, we’ve got forever free accounts for very small users, so these are people
like oh I have a fantasy football league and I want to send a newsletter through
MailChimp. They might use us. Should they get support in the same way that a paid
user does?
So that would be a very simple way to do it. Unfortunately what we’ve found in the
data is that 40 percent of eventually paid users come into customer support before
they pay us money. Right? So if you just give a certain type of support or support
faster for people who have already paid you’re missing a lot of people who would
have paid you money.
So how do we solve that problem? Well, one of the things we could do, you’ll
remember I’m in the insights section, not in the modeling and tools section of this
presentation, I could just build a big AI model to predict paid customers. That’s one
way to go about this. In fact, it's something we toyed with to see how far we could get.
This is a quote that I really like. This is from Robert Holte. He wrote a paper called
Very Simple Classification Rules Perform Well on Most Commonly Used Datasets. And
the basic premise is that you can build a big AI model to do a lot of things, but single
rules often perform very well.
So it’s a single rule. Okay, I have this data that comes in about this user. This user
looks like this. They do this, they do this, they do this. Let me check one thing. And
based on that one check I’m going to say they’re likely to pay us money. So that
would be a single rule. Those actually perform very well, right? A lot of times
people want to jump to the most complex thing they can do. But Holte's contention in
this paper is that complexity must be justified. People never really think about the
fact that they have to justify why they’re using a really complex approach, and in fact
I think Holte was probably saying you have to justify it with better accuracy, you've
got to provide an ROC curve that shows how if we use this complex model we get all
these gains.
You also need to justify the additional organizational complexity. I’ve worked at a lot
of companies, especially as a consultant where my job was just to build this really
complex thing that the client had asked for. And the sad thing about that is that
they'll often get these really complex things where they can tell, like man, we got this
space-age model, but the moment the consultants leave it dies. Right? Because no one
knows how to update it. They ask for too many buttons and knobs and then
immediately forget what they do, and no one wants to read the documentation.
There’s an organizational complexity you can introduce if your model is just
unwieldy. I had to build one forecast model that included standard deviation in it
and I had to describe to every single person who was going to use it what standard
deviation was. And the moment I left, the model was mothballed.
So when we think about these models at MailChimp I have to think okay, am I going
to stick around for this paid-free prediction or is this something I’m handing to
someone else? Who is that person? How do they keep it up-to-date? In this
particular case the customer was just a developer in support right, and they were
not going to be keeping any eyes on this, so we analyzed the data and found you
know what, there actually is not one rule, there are two rules I can hand you that do
a really good job of ordering customers.
One is when you sign up for a MailChimp account what’s your email address? Is the
domain a free mail domain or is the domain your own domain for your business? So
is it a paid domain? If it’s a free mail domain you’re less likely to ever pay us money.
Second rule, do you have a list of email addresses to import into MailChimp? If you
say no, or if you leave this question blank you’re really not likely to ever pay us
money. Those two questions combined were really extremely powerful features.
After those two questions anything else we added to models helped, but wasn’t
nearly as powerful as those two features. So we just left it at that.
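To make that concrete, here is a minimal Python sketch of what a couple of simple rules like that look like in code. The field names and the free mail domain list are made up for illustration; this is the spirit of Holte's point, not MailChimp's actual scoring code.

```python
# Hypothetical sketch of ordering new signups with two simple rules, in the
# spirit of Holte's "simple rules" result -- not MailChimp's actual code.

FREE_MAIL_DOMAINS = {"gmail.com", "hotmail.com", "yahoo.com", "aol.com"}

def likely_to_pay_score(signup):
    """Score a new signup 0-2; higher means more likely to ever pay."""
    score = 0
    domain = signup["email"].split("@")[-1].lower()
    if domain not in FREE_MAIL_DOMAINS:
        score += 1                        # rule 1: business domain, not free mail
    if signup.get("has_list_to_import"):  # rule 2: says they have a list to import
        score += 1
    return score

# Support could then just work the queue in score order, highest first.
signups = [
    {"email": "info@cornerbakery.com", "has_list_to_import": True},
    {"email": "fantasyfootball@gmail.com", "has_list_to_import": False},
]
signups.sort(key=likely_to_pay_score, reverse=True)
print(signups[0]["email"])  # info@cornerbakery.com
```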
We realized we could get a more accurate model by including a bunch of other
things, I mean you could go out to the domain of that person’s email address, look at
it and use BuiltWith to tell what technologies they built their checkout process with, you
could do all of these creative things, but who's going to keep that up-to-date? Do
we think it has a chance of survival? If the answer is no let’s stick with the simple
one. That’s fine.
People need to realize they’re free to do that. Okay, another example of the basic
insight that we created. Someone in support had to schedule everyone who’s doing
chat support, and they had to schedule when they would take their lunch break.
That’s where it says phasing. They’re phasing out of chat. It’s kind of sad, some of
these people are point five. That means they’re half a person. That’s because they
were just hired and they’re not fully trained, so they’re counted as half a person,
which is kind of depressing. But they all become full people eventually after they
take enough chats.
But this is very difficult for someone. This is actually kind of a classic operations
research problem to have demand coming along the bottom here. It's sort of a
forecast of chat demand that comes out of a forecast model, and then you've got to schedule
when are people going to take their lunch breaks. They had all these rules like if you
take a lunch break you should take it with someone else because you don’t want to
be lonely. So we thought why don’t we just do the schedule for you because we
know the math behind it? It's a classic operations research problem. It looks like
this: it's actually just a bunch of inequalities, and then you've got an objective
function; in our case the objective function is to maximize availability subject to
demand.
We code it up in LP format, which looks like this, which is frightening, don't look at that
too closely, and just return to them a schedule back in Excel just how they wanted it.
It says okay, here's when people are on point, meaning they're taking chats.
Here’s when they’re taking their lunch break. Certain days they are on email. And
that’s it.
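For a flavor of what that kind of model looks like, here is a toy lunch-scheduling version in Python using the PuLP solver. The staff names, slots, and forecast numbers are all made up, and the constraints are simplified stand-ins for the real availability-versus-demand rules.

```python
# Toy version of the lunch-break scheduling problem, using PuLP as the solver.
# Staff, slots, and the demand forecast are invented; the constraints are
# simplified stand-ins for the real "maximize availability subject to demand" LP.
import pulp

people = ["Amy", "Ben", "Cal", "Dee"]
slots = range(8)                        # half-hour slots across the lunch window
forecast = [3, 4, 4, 2, 2, 3, 4, 3]     # forecast chat demand per slot

# x[p][s] = 1 if person p takes lunch during slot s
x = pulp.LpVariable.dicts("lunch", (people, slots), cat="Binary")

prob = pulp.LpProblem("chat_lunch_schedule", pulp.LpMinimize)

# Objective: minimize forecast-weighted absences, i.e. keep the most agents
# on chat when forecast demand is highest.
prob += pulp.lpSum(forecast[s] * x[p][s] for p in people for s in slots)

for p in people:                        # everyone gets exactly one lunch slot
    prob += pulp.lpSum(x[p][s] for s in slots) == 1

for s in slots:                         # at most one person at lunch per slot
    prob += pulp.lpSum(x[p][s] for p in people) <= 1

prob.solve()
schedule = {p: next(s for s in slots if x[p][s].value() == 1) for p in people}
print(schedule)                         # e.g. {'Amy': 3, 'Ben': 4, ...}
```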
This is an example where we found that someone was doing something the manual
way, right? They were just kind of playing with people like okay, what if I slide the
lunch break, and we realized we know how to do this with math. This actually isn’t
a big data problem and we can just do it real quick and just hand you back this
artifact. This spreadsheet that’s actually optimal in terms of forecast and chat
demand.
One thing I like to keep in mind while we’re still on this topic is the [inaudible]
example from Star Trek where Kirk has a test he has to take and this particular test is
unbeatable. I think the story actually changes in the new Star Trek movie versus the
old Star Trek, but the basic idea is that there are these Klingon warbirds and they're
coming in and they’re going to destroy you and you actually can’t win. The point of
the test is to see how you do under pressure given that you will fail.
And Kirk actually goes in and changes the rules of the game, right? So he actually
goes into the system and changes it and cheats so that he can win because he refuses
to admit defeat. He’s just like that. Instead he eats an apple and wins.
But this is something to keep in mind as an analytics person because I think the
tendency is to be presented with the very complex game or system to be like oh, I
can solve it with math and data and you need people to go to lunch together so
they’re not lonely, yeah absolutely. But I think that analysts need to feel empowered
to also say, you know what? Maybe we don’t need to do that anymore. It is perfectly
acceptable to win some sort of analytics problem by suggesting to the business that
maybe you should no longer do that at all.
And the nice thing that analysts have and that data scientists have is this ability to
take in the data, describe the problem, model it and say okay what happens if we get
rid of that rule? So what happens if I get rid of this rule that people have to go to
lunch together? Does that somehow make us more available to take chats? And if so
by how much?
So it’s okay to change the rules of the game and it’s really nice when you can
quantify what that change brings the business in terms of revenue or cost savings,
so keep that in mind.
External insights. We’ll go through this one really quickly. These are basically blog
posts, guides, things like that. So a bunch of our users want to know, what are the
demographics of some of these free mail providers? So how does Hotmail compare
with Yahoo, AOL, Gmail? We've got Comcast on here because we send a ton of email
to Comcast. Interestingly enough this is just one graph I stole out of the post, this is
age distribution. So these are first quartile, median, third quartile age, for various
email providers. Comcast is way older, which is kind of interesting and it’s because
you have to have like at least a couple bucks to have a Comcast email address, right?
Because you have to have cable or Internet or something. And then you’ll see AOL
actually slides out about six or seven years past Hotmail and Gmail. So that's kind of
interesting. Hotmail is pretty young. I suspect some of this has to do with the fact
that I need an email address in order to have an Xbox Live account when I'm setting
that up. It's going to be Hotmail or one of a bunch of Microsoft-owned free mail providers
that are suggested. So we see that those age ranges for Hotmail and Gmail are actually
very close to each other. Yahoo is a little bit older.
So it’s just one sort of thing that we did and provided that to our users. We have a
bunch of blog posts like this. And our users will get concerned about a few things.
So a lot of people send email to Gmail. This is my Gmail account here. Gmail
introduced the new promotions tab, right? So they took a lot of email marketing and
put it on this upper tab. And so people were flipping out like my email is in another
tab. What does that mean?
Are people ever going to read my email again? So we were actually able to do
analysis and find out what exactly it meant because there is nothing to be gained by
hiding any of this from folks. Rather than letting the question fester, a data science
team can actually go into the data and just quantify like okay, here’s what the change
actually did. So for Gmail email addresses what we saw is the sort of typical click
rates for weekend and weekday the three weeks before the promotions tab was
introduced and the three weeks after the promotions tab was introduced, so typical
click rates during the week would be 13 percent for a newsletter. After promotions
was introduced it’s down around 12 percent. So you lost about one percent.
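As a rough illustration, the before/after comparison described here boils down to a simple grouped aggregation. The file name, column names, and the launch date used to split the periods are all hypothetical:

```python
# Sketch of the before/after click-rate comparison described above, using
# pandas. "gmail_sends.csv", "send_time", "clicks", and "delivered" are
# hypothetical, as is the launch date used to split the two periods.
import pandas as pd

TAB_LAUNCH = pd.Timestamp("2013-07-01")    # placeholder Promotions tab date

sends = pd.read_csv("gmail_sends.csv", parse_dates=["send_time"])
sends["period"] = (sends["send_time"] >= TAB_LAUNCH).map({True: "after", False: "before"})
sends["daytype"] = sends["send_time"].dt.dayofweek.map(
    lambda d: "weekend" if d >= 5 else "weekday")

# Click rate = total clicks / total delivered, per period and day type.
totals = sends.groupby(["daytype", "period"])[["clicks", "delivered"]].sum()
totals["click_rate"] = totals["clicks"] / totals["delivered"]
print(totals["click_rate"].unstack("period"))
```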
So it’s not the end of the world, but it did get affected. So we’re able to just
communicate that to the user. Here’s about what is going to happen. Another
example. During the government shutdown people wanted to know what's going to
happen to the engagement of email addresses belonging to people who work for the government, right?
So we were actually able to provide some of that analysis as soon as the government
shutdown started happening.
Okay, here's what we're seeing. If you're at the EPA or the SPA then no one is
checking their email because they're not allowed, they're forbidden by law. Some
people are still cheating, you'll notice. Like technically no one from HUD should have
been checking their email and about one-fifth were, versus the SEC, which somehow skated
by with everyone still working. And this was actually when the fall was
ramping up, so there's going to be an increase in email traffic and we should see
actually an increasing engagement, which is why, as a percentage of before the
shutdown, it's actually above 100 percent for the SEC and the State Department.
So we were able to provide this to people so they knew okay, if I’m in the mortgage
industry, if I'm sending a newsletter that people who work at HUD are going to
read, yeah I should expect that my engagement will fall off a cliff. That’s normal.
Well, as much as a government shutdown is normal it’s normal.
So these are just things that we can provide to people. Okay. So now let’s get into
some of the fun stuff, which is building capabilities or tools. I’ll just give you some
examples of these.
I’m going to give you an example from compliance. We have a team that shuts users
down. And the reason why they do this is because we have six million users. We
don't have six million IP addresses that we send email over. So a lot of our users
share an IP address. There are some really good averaging effects that come from
that, but if you get one really terrible user in there who is sending to a lot of dead
mailboxes, people who haven't checked their mail for 10 years, it's going to get
noticed and that IP address could be blocked.
So we need to shut those users down very quickly. So one of the things I looked at
when I first joined MailChimp is who is getting shut down and for what reasons. I
went through all the reasons and created this flow chart. You can get shut down for
multiple accounts, you can get shut down for sending to a bunch of dead addresses,
and then you go through various processes and either be kicked out the door permanently
or make some restitution, keep the good lists, and that's that. But one thing I noticed, I
don’t know if you can read it in the back so I’ll read it for you, about one-sixth of our
users were shut down for loading a large list of email addresses, trying to buy a
high-dollar account, or requesting high volume approval, i.e. approval to send to a
large list.
None of those things are bad, right? They’re just scary. If I’ve got a user who comes
in like, I want to send to 2 million people, we're going to shut you down to ask why do
you want to send to 2 million people? Because if we let that mail out the door and
they’re an abuser, that’s a really terrible situation. So we’re going to manually
review them. But people have this expectation when they sign up for an online
service that they shouldn’t be slowed down, that things should happen seamlessly.
We get a lot of letters like this bullshit review process, I’m sending out press
releases. Press release is in all caps. I don’t know. Yada, yada, yada this is a pain in
the ass. People get really angry. And I understand. When I sign up for any account
online I expect it to kind of work really quickly. It's email. I should just sign up and be
able to send. They don’t realize that spam is something that sort of originated with
email as a big problem, and it’s one we have to control for. And people are signing
up like the day before Black Friday, right?
So it’s high potential for abuse, but they really want to get out their email today.
People want to procrastinate. So how do we solve this problem? Well, instead of
doing a manual review process we create models that will predict when
someone’s going to abuse. And that’s what we tried to do. This is a quote from Paul
Graham about a decade ago. Basically, he says well, spammers, it’s all about the
content. They can’t change their content. They can’t get away from saying Viagra.
They have to say Viagra.
So as long as I can detect Viagra I can shut them down. This is actually wrong these
days, right? Most of our abusers look like this. This is an email that I personally got.
I did not sign up for this list. Dear Avondale Estates resident, I am a realtor, yada,
yada, yada, welcome to the neighborhood, whatever. I never signed up for this. So
there are some ways that this person might be within the law, might be within CAN-SPAM
if they provide certain information about how I can unsubscribe, etc., but from
a MailChimp perspective this is non-permission based email. We would consider
this person a spammer. They just got my email address from the city and blasted
something at me.
But there is nothing in the content that would allow me as an AI model to
necessarily know this is a spammer. There are good real estate agents out there that
only send to actual customers. So how do we tell the difference?
Well, spam is in the eye of the beholder, right? We tell the difference because it’s the
people on your list that are either going to open and click the email or they’re going
to hit the spam button, right? And that’s the signal we’re going after. So really it’s
not about content at all. It’s about your list. Who are these email addresses that
you’re about to send this to? Where have I seen them before? What types of stuff do
they receive? How old are those email addresses? What happened the last time
anyone of our millions of users sent to them?
We've sent to 3 billion email addresses at this point and we save all this data. So we're
able to build a big system that looked at that mostly list based information, as well
as user-based information, a bit of campaign information. But the thing that we
tried to focus on, and this goes back to the earlier slides at the beginning of the talk,
is eliminating risky complexity, right? Even when we build a production system and
we deem this one important enough to actually build a full-fledged AI model instead
of just a simple rule, we want to make sure that we design it in such a way that it’s
going to survive, right?
So most of the data we used internally was already structured, so rather than using
some unstructured database like Hadoop we just chose to use Postgres,
because our data is already structured, so why not just use a sharded SQL
database for it? So we made a lot of design decisions like that. We used random
forest as our AI model. I talk about that one in the book because that one is actually
particularly hard to overtrain. It's pretty robust against that. It's not very finicky.
It’s a pretty good one to use.
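Just to sketch the shape of it, a mostly list-based random forest model along these lines might look something like the following in Python with scikit-learn. The feature names and input file are invented for illustration; this is not MailChimp's production code.

```python
# Minimal sketch of a list-based abuse classifier like the one described above,
# using scikit-learn's random forest. Feature names and the file are made up.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

campaigns = pd.read_csv("campaign_features.csv")   # one row per outgoing campaign
features = [
    "pct_addresses_never_seen",    # share of the list we've never delivered to
    "pct_role_addresses",          # admin@, dba@, info@, ...
    "pct_previously_bounced",      # verifiably dead addresses
    "median_address_age_days",     # how long these addresses have been around
    "prior_complaint_rate",        # spam complaints when others sent to them
]
X_train, X_test, y_train, y_test = train_test_split(
    campaigns[features], campaigns["flagged_abusive"],
    test_size=0.2, random_state=0)

model = RandomForestClassifier(n_estimators=300, random_state=0)
model.fit(X_train, y_train)
print("held-out accuracy:", model.score(X_test, y_test))
```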
So we just made decisions that would be slightly more robust. Okay. But let’s talk
about building tools for external users, right? We’re not talking about building an AI
system. That’s something that’s internal. It can be fairly complex. I’m around to
explain it to my coworkers in person. What does it look like to do analytics where
the customer is someone on the other side of the world? And they’re paying you
maybe 15 dollars a month.
How do I build a tool for them? This is very hard not only because tools are hard to
build, but they're also hard to communicate. Data science tools especially. So we like to
keep it simple. This is probably my favorite thing I’ve gotten to work on. We have
this CAPTCHA you have to fill out when you create an account with us. And this is a
nightmare, right? I mean I don't know at what point Google decided that we were
going to start labeling all the addresses in the world for them. I assume this is a
street address, this like black blob is probably a street address, but I never know the
answer to these things.
I think on average I’m at least failing the first one. So can we just get rid of it? That
was the question. Can we use our data to get rid of it? And in fact what we found is
that for certain email addresses during the sign-up process we could use our anti-abuse
models we'd already built and all the data we were already saving in a large
database called the Email Genome Project -- we actually have data about 3 billion
email addresses that we're constantly updating based on their interactions with the
system -- we could bring that data to bear on this question.
There are certain email addresses, certain people who were signing up for an
account where I just don't know or it's a grey area. You're going to fill out a CAPTCHA. But
there is a large portion of our user base for which we never have to show this. So
we removed it from the account sign up process. We removed it from the process
where you contact customer support.
So the artifact for this project, for me -- if we think about how it's
really hard to communicate data science to people -- there's just nothing. We just
removed something from the application. And that’s it. That’s all we did. And so I
was really happy about that because I didn’t make anyone’s life any harder. All I did
was remove a barrier. And I took great joy in that.
So another example is send time optimization. A really nagging question for our
users is, when do I send my email marketing? They just get so hung up on it. They've
spent all this time creating this newsletter about some sale they're about to have.
They have maybe a million customers they’re going to send it to. And they just don’t
know when to send it and they start freaking out. Then they start going online and
reading anecdotes from people who generally have guru in their job title and they
have no idea what they’re talking about. Maybe they have some small set of data
from one client they worked with and they’re like oh, well people work from 9-5 and
they always take their lunch break at noon, and they’re always stuck in traffic at 6,
so you don't want to send it at noon or 6. And it's just always baloney.
So could we answer this question for people? Well, this is a graph from one
individual email address. We’ve got some timeslots here for sends versus clicks.
And what are we really interested in? We’re interested in when are people engaging
with their email? And that engagement is not being driven necessarily by the
notification they just got that they received the email. So where can I find
engagement that’s not tied to volume, but rather is oh, you were available and you
went back and checked your email.
So we built an optimization model off of that data. Here is actually some
distributions of optimal send times for email addresses by categories. So we’ve got
college-aged folks, folks in their 40s and folks over retirement, and if we look at the
distribution of optimal send times, we’ll actually see that college pushes out to the
right. So it peaks later -- most email addresses in college have an optimal send time
around 1 p.m. -- versus folks in their 40s and after retirement, where it's around 10 a.m.
So it’s really interesting that the data actually pointed that out and you see a similar
trend for a lot of stuff. Bartenders are closer to 1 p.m. Neonatal nurses there really
is no peak. They’re kind of checking their email throughout the day because their
schedules are just weird.
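The core idea, stripped way down, is to look at engagement rates per timeslot rather than raw engagement counts, so the answer isn't just an echo of when senders happen to send. Here is a toy Python sketch with hypothetical fields; the real system is far more involved.

```python
# Toy sketch of the send-time idea: score each hour by the rate of engagement
# given a send, not the raw count, so the answer isn't just an echo of when
# campaigns happen to go out. Field names are hypothetical.
from collections import defaultdict

def optimal_send_hour(history):
    """history: records like {"hour": 0-23, "sends": n, "clicks": m}
    aggregated over one subscriber's past campaigns."""
    sends = defaultdict(int)
    clicks = defaultdict(int)
    for rec in history:
        sends[rec["hour"]] += rec["sends"]
        clicks[rec["hour"]] += rec["clicks"]
    rates = {h: clicks[h] / sends[h] for h in sends if sends[h] > 0}
    return max(rates, key=rates.get)

history = [
    {"hour": 9,  "sends": 40, "clicks": 2},
    {"hour": 13, "sends": 10, "clicks": 3},   # fewer sends, higher engagement
    {"hour": 18, "sends": 25, "clicks": 2},
]
print(optimal_send_hour(history))             # -> 13
```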
So we built this system but it’s got tons of data in it and it looks at data in an
individual level. It’s all stored in [inaudible]. It’s really sophisticated. How are we
going to show this to our users? Well, they care about the optimal send time for
their list. When they go through the campaign set-up process they have to pick a
time to send at. If they want to schedule their campaign we just have a bubble there
that says let MailChimp pick, and then a little check box that says the optimal time to
send is 3 p.m.
So we did all this work but that doesn’t mean I have to build a big graph with lots of
colors and a heat map or anything like that to communicate that work. Right?
That’s not for our users that would be for the press.
So instead if you want to make the user experience better, let’s just make it a simple
bubble. That’s enough, right?
Here’s another example. I’m just loading up examples here. Apologies. What if I
have a list and there is a particular segment I’m really interested in, but I don’t know
all those people? Right? So let’s say MailChimp has a list of 5 million email
addresses of people that have asked to be updated on our app and new things we’re
releasing.
So let’s say I want to find the press in there, the tech press, and I know a few of the
email addresses, like I know that person works for ReadWriteWeb, I know that
person who works for some tech blog, TechCrunch. But I don't know all of the press
that’s on my list. How am I going to find them?
Well, chances are, based on their subscription data and how they interact with our
system, other people in the press live in their neighborhood, in a data neighborhood
sense. So they are subscribing to the same things, they are interacting with the
same things. So what do I really want to know? I want to know who lives as close to
the tech people, the tech writers I know, who lives as close to them as they live to
each other?
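As a toy illustration of that neighborhood idea, you can measure how far the known press people sit from each other in feature space and then pull in everyone else on the list who sits about that close to them. The names, the feature encoding, and the distance metric here are all made up for illustration:

```python
# Toy sketch of the "similar segments" neighborhood idea. Subscribers are
# vectors of invented behavioral flags; we measure how far the known press
# people are from each other, then pull in anyone who sits about that close.
import numpy as np
from sklearn.metrics.pairwise import euclidean_distances

subscribers = ["a@techblog.com", "b@rwweb.com", "c@bakery.com", "d@newsdesk.com"]
X = np.array([          # one row of feature flags per subscriber (illustrative)
    [1, 1, 0, 1, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 1, 0, 0],
    [1, 1, 0, 1, 1],
], dtype=float)

seeds = [0, 1]          # the press people we already know on the list
seed_spread = euclidean_distances(X[seeds]).max()     # how spread out the seeds are

dist_to_seeds = euclidean_distances(X, X[seeds]).min(axis=1)
similar = [subscribers[i] for i in range(len(subscribers))
           if i not in seeds and dist_to_seeds[i] <= seed_spread]
print(similar)          # -> ['d@newsdesk.com']
```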
So we can do that calculation, but how do we expose it within the system? You go in
to the create a segment tab here and you put in some email addresses of people we
know. I know this person [inaudible]. I know this person [inaudible]. These are
people on my list and I just want to find the other people on my list that I might not
be aware of.
All we did is we created a discover similar segments button. So you can either save
this segment or you can save the segment with some other people who look like
them. And that’s it. It’s just a button.
So that’s really the approach we take. We work hard to build these systems that are
very rigorous, but we try to keep things as simple as possible, right? If we can keep
the math simple we keep the math simple. If the math has to be complex we try to
make sure that the way it’s exposed to people, the way that it’s kept up is still
simple.
And so I’ll just leave you with four points really. I think anyone who is doing data
analysis at any company it’s important to align yourself with the goals of the
business and serve your colleagues. You see a lot of articles these days about how data
science is the sexiest job. Just align yourself with everyone else and serve; that
way, when all the hype dies, people see that you are valuable and want to keep you
around.
Data science products should receive no special treatment. So if a designer can
solve a problem that I can solve by just changing the color of a button or making the
app look different or feel different that is fine. There is nothing special necessarily
about using math versus something else. So if something is not working, feel free to
kill it.
Get a goal first, and then get toys. Don’t just flail about. Figure out exactly what you
want to do and then figure out, here's the toy, here are the tools that are going to
solve that problem.
Avoid complexity. That’s really what I’ve talked about for 30 minutes now. So that’s
it. If y’all have questions I think we have time for questions, right?
>>: You mentioned the email genome project [inaudible]?
>> John Foreman: It's not. That's the short answer to that. We're still trying. We're
going really slowly with it. There are a few companies out there that are sort of
charging around like a bull in a china shop with data like this and they're
getting in huge trouble.
We always err on the side of privacy, and these systems were built, first things first, to prevent abuse, right? So we have tons of data on billions of email addresses,
but I don’t know. We always try to figure out how can we share and what can we
share in a way that maintains expectations from recipients, expectations from our
users? So stuff like discover similar segments is powered by EGP; however, it’s only
about stuff on their list.
We might use that 3 billion email address dataset to perform these neighborhood
calculations, but all I’m telling you is you have these email addresses on your list,
these are email addresses like them also on your list.
So everyone is double opted in and everything is above board so we don’t really
share any of that right now.
Yes?
>>: What is your educational background?
>> John Foreman: So my background is pure math in undergrad, so I studied
abstract algebra and stuff, and then my advisor sat me down and said you’re not
going to be a great mathematician. His words were actually something like you
could go to an okay grad school but you’re going to be one of those guys that kind of
toys around with small results for the rest of your life and math will jump forward
really quickly at certain points by geniuses and you’ll be toying around, but if you’re
cool with that and you really like this keep doing it.
So I thought, that sounds like it sucks; I'm going to get a job. So I went to grad school in
applied math operations research, and did that instead and then my wife got
pregnant so I was like well, now I really need a job. So I worked for the government
for a while. My problem with the government is that some of the folks you interact
with just really want to retire.
I like things that are exciting and I just ran into problems with people just like, there
was one person I was interacting with who had a picture of a golf course above their
desk and it was like yeah, I’m going to be there next year when I retire and I’m not
doing anything until then. And it’s just really rough. I’m sure there are other
government organizations where that’s not true, but that was true for me so I got
out of there.
But that’s my background.
Yes?
>>: You talked a lot about for instance, one of your examples with the email, the
subjects or the content of the email being the primary predictor versus the
recipients, but it seemed as though the recipients were largely based on ancillary
data about the recipients, whereas the intrinsic data to the email itself, the only
thing really there is the subject plus some names that you don’t know anything
about.
So the question that I should have gotten to much quicker is how do you think about
the data neighborhood of a person? Is it more important to get the data to
understand the neighborhood or is it more important to get the toys, tools, and
analysis to analyze what you do have?
So if you were looking at emails alone with no further context, how far could you get
with analytic tools and processes versus ancillary data collection and understanding
the connections between multiple --
>> John Foreman: You're right. That becomes extremely important. So in order to
have that abuse model work the main predictors are built on the list, right? So it’s
when we think about an email sent through MailChimp there’s this huge payload
that’s the content and then the list of email addresses and there is your user
metadata. So you're located here. You say your business address is here because
you have to provide a real address for CAN-SPAM. And then your billing address is
here and your Diners Club card, we'll look at all that too.
But with the email addresses you are right. You can get some stuff just from the list,
just as it is. So why is your list 100 percent Gmail and all in caps? Why is everything
like admin@ or dba@? That would be really scary because they're all role addresses.
But usually we have to connect it up. So we want to go into EGP and we have to
have tools to do this. We want to go in and say okay, where have we seen these
email addresses before? How many of them were involved in the Sony hack? How
many of them have been made public? How many of them do we think are
journalists? Why are you emailing 80 percent journalists? That seems scary.
We’ll look at all sorts of stuff like that. How many of them are verifiably dead? So
the last time they were sent to, the inbox was dead. How many of them have soft
bounced a gazillion times in a row, meaning they’re on vacation permanently as far
as we’re concerned? In order to understand all that you’ve got to have this big
network of everything. Also subscription data. You get some lists where it’s
extremely broad, and then other lists where it’s all women in their 20s and that’s it.
So what does it mean that you're sending to a huge group of these people when your
content is kind of odd or you're geo located halfway around the world and you're
sending in a different language?
So yeah, we have to consider that whole thing and get all sorts of tools and
techniques that would help us look at that much data. Yes?
>>: It seems like you're getting good results from all the networks you have built up
and all the rules that you guys have. Just going back in time, when you first started
at MailChimp you had a smaller set of users, was there a reliance on external data
sources to address some of those problems, or how do you reconcile that as a company
group?
>> John Foreman: It never really factored in until -- so the company was quite small,
until we introduced our free plan. Then it exploded. It’s like free marketing. And
one of the great things about free users is they’re essentially a marketing cost, but
they provide all this data that can be used to facilitate building these tools, which
helps keep the system clean for everybody.
I came around after the free account because before it there wasn't really enough data to do
this kind of analysis. And because there wasn’t enough data people just didn’t focus
on it at all. Just didn’t even try.
And so once we had all the data lying around it was like well, we could just operate
as we always have, just charge for accounts, which is what we do and just keep
growing and that’s fine, but they just wanted to make use of this and abuse was
really one of the main things that catalyzed it because if you have a free plan you’re
going to get abusers who are like, yeah, let’s just write a bot to sign up for free
accounts.
So they have to bring in someone like me to solve that problem, but then you get to
do all these other fun things with the data. There's a question online.
>>: How do you design software with data first? Is there a different way to think
about designing a website if I want to collect all this data to model analyze later?
>> John Foreman: So yeah, there's a tension there. Like we try to design MailChimp
in a way that just provides a really fun experience for folks. If you’ve never used the
website before you send an email campaign a little monkey hand comes down above
this big red button that’s kind of shaking and sweat is dropping off it. It’s really fun.
If you start thinking about oh, I want to collect this data and that data, all of a sudden
the user experience is degraded.
So recently I logged into Facebook and it was like, do you work here, or here, or here?
I was like why would you even think that I work for Mary Kay? Like I don’t
understand. What did I say that made you think that? And how do I get past this
screen? So there is this tension with if I want to collect data and use it to then have
some feedback loop where I get better products, is that going to get in the way of the
thing that I want, which is a better product?
So we try to find ways to not bug people, right?
Yes?
>>: If the point is not to bug people, do you have to deal with uncertainty and
ambiguity in your models instead of knowing for certain that that person works at
Mary Kay you say I have a guess that they might?
>> John Foreman: Yeah. And I think that that's actually really good to keep that humility in mind
all the time. Like Shutterfly recently sent an email campaign to a bunch of people
saying congratulations on your new baby! And a lot of those people did not have a
new baby. They thought that their data was correct, and in fact it’s often not correct.
So we try to build tools that will never get us into that situation. And so it’s kind of
nice when you can never be in a situation where we’re like oh yeah, we know that
for sure. Because most of the time you’re wrong.
Yes.
>>: I just wanted to thank you for the simplicity of your book. I just picked it up
from the front desk on foundations for predictive analytics because it had words like
foundations and practical and simple, and on page 10 that’s the math. And this is
the second book. The first one was predict [inaudible] for dummies. It was the same
thing. Nowhere does it start by saying hey, I have two data points and I want to
predict the third. And you can do that, but here’s the limitations of it, so here’s how
you want to get around that, and then another example and that sounds like what
you’re doing.
>> John Foreman: Yeah, that’s why I chose Excel. A lot of books do choose to go like
we're going to do some mathematical notation written in [inaudible] first, then
Greek symbols like the sigmas, and we're going to do some, and just assume not only
do people know how to read that but, if they don't, that they are going to explain it
and that people would even want to read it. And I think that Excel is really nice
because it's this language that people already get because they have done it. I've got
to do the sum, then =SUM, and then drag through this range. You already
know that. And so yeah, I like that kind of stuff but you just get sick of it after a
while. Sometimes you just want something that’s a little bit different.
So I tried to --
>>: Well if you have an operations research background, but for those of us who
don't, we don't have the option.
>> John Foreman: Right, and I think if you’re in school this is like your job is to read
this stuff. If you're working another job you kind of want to cut to the chase a little
faster. Other questions? Yes.
>>: [inaudible]
>> John Foreman: The AB testing they did? The significant AB testing they did?
>>: No, the Obamacare rollout.
>> John Foreman: Oh, oh that actually, just the website falling over? I mean what I
can say about that was having worked in government and worked in consulting
when I heard that that was happening I was like yeah, of course. It was so many
contractors all working on little pieces of it and then smashing it together. The
whole government contracting apparatus is set up for failure, right?
I mean, you look at that and then you look at the F-35, that's [inaudible], it's hard. I
feel bad for the government sometimes. Working at MailChimp is so much nicer
because we can do things really quickly and not build a camel for four years and
then release it and hope it works. Yes?
>>: [inaudible]. From your presentation I think that there are at least two points of
[inaudible] to look at which is the number of emails sent and the number of clicks.
At the same time, with MailChimp if I understand correctly, people can build
arbitrary emails so there is an infinite number of content that they can produce. So
can you walk us through how do you take this fairly complex product or product
that enables people to build complex applications and distill that to a number of
[inaudible] that you can track and also clearly communicate to others.
>> John Foreman: The whole metrics question is kind of funny because I work in a
company that is almost vehemently anti-metric. So for instance, we did a marketing
campaign recently where we took out a bunch of billboards in various cities across
the US and they just had the Freddy head, our little MailChimp logo, and no text on
them whatsoever. So the only people who even understood what these billboards
were were people who were already customers.
And there was no data gathering around how did you hear about us? Did you see
the billboard when you signed up? We did nothing. It was just to make our current
users a little happier. They would take pictures. They would tweet. And so I guess
that's kind of a metric. Did we see any pictures on Twitter of the billboard today?
When we took the one down in Chicago, a local paper wrote an article about how
sad someone across the street from the billboard was that the monkey was gone. It
was all facetious. It was like an Onion article but it was great.
So one of the ways that we handle metrics is we just blatantly ignore them. And
then we do a lot of other things. We do just a lot of customer interviews. So a lot of
what we would do is, rather than looking at what's our [inaudible], are we getting all
these weird metrics that you can design around -- you think they're somehow related
to the success of your company or revenue, but then you end up gaming them.
Then you end up saying I want to increase average revenue per user so I’m going to
do that by increasing wait times in support for users who don’t pay us as much and
making them pissed. Then they leave and all of a sudden my average has gone up
because I just got rid of a bunch of users.
So we just seem to ignore stuff like that, talk to our customers very closely, and don’t
track a lot of things like that. Talk to our customers closely, understand what they
want. If we keep hearing enough of the same thing, enough of the same stories, and
that’s a lot of what the UX team does. They do a lot of in-person video stuff. They
also do a lot of surveys.
And then we’ll build things off of those. So we’ll say okay, people really -- this is the
problem they’re having. They say they want this but we’ve videoed them, watched
what they do and they’re really bumping into stuff like this.
You can’t ignore that sort of basic insight that may not actually come from data. It
comes from watching people.
We take that and we build tools off of that. That’s a lot of the end run we do around
tracking website metrics. We just don’t.
Yes?
>>: You mentioned talking to users. How does your organization go about talking to
users and who is it that actually does it?
>> John Foreman: It is the UX team that does it. So we have a large team at
MailChimp. It’s the user experience team. The data science team works very closely
with them. Essentially the way it works is that they’re sort of a small yappy dog that
can sense threats real quickly and run around and claw at the glass screen door and
then we’re like a really fat slow dog that can do things that are really frightening but
it takes us forever to get started. So they’ll start yapping and wake us up and then
we’ll wander and be like all right, what are we building for y’all?
But they do a lot of in person customer interviews all around the world. Some of
those customers would be sort of bleeding edge using us in really creative ways.
Other customers would just be more of our standard customers. Maybe it’s a mom
and pop shop trying to sell something and they’ve realized a mailing list is really
useful.
And they’ll go out and actually do video of those. They’ll watch people use the
website. So they do a lot of that. They also do a lot of surveys and that’s how they
find people who are at least a little bit engaged, right? You filled out the survey, you
filled out the comment section with something like, I really wish you guys did this! So they would find
people that way too. So they do all of that.
And then actually we’ve started just recently sending one data scientist with them.
So the data science team at MailChimp we tend to hire former consultants. I don’t
know if that’s just because I like former consultants because I was one, but we want
people who just in general can talk to folks, so there's no fear of letting the data
scientist out of the closet. It is perfectly acceptable to send someone with a math
background to talk to a customer.
And so they’ll hang out as a fly on the wall, and then once the data comes up or
science comes up they’ll be like oh yeah, can I ask you some more questions about
AB testing, how do you use it, yada, yada, yada. And they’ll come back with ideas
like we should really try this!
And that’s really nice because if you wait for other people who don’t understand
what you do to translate it sometimes stuff gets lost in translation, right? I mean,
the best translation occurs inside of one brain.
So yeah, it’s a bit about that.
Yes?
>>: So you mentioned that [inaudible].
>> John Foreman: Well, the CEO helped prioritize that, for one. Another great
technique we use is we let people submit requests and we wait two days and we see
which ones survive. It’s amazing the number of requests people give you that the
next day people are like oh, I don’t need that anymore. Okay, good.
But we generally prioritize based on just having broader conversations with the
managers, find out what are the main goals for the year for the company. Okay, this
year we’re thinking about, I can’t tell you what they are because they’re secret, but
we’re thinking about these three things. Which of these products could helps us
achieve these goals and we tackle those first. And we never have enough time so it’s
never like we get to a point where we clean off the list. Some stuff just gets too old
and dies.
Yes?
>>: So when you started the email genome project and you had no data to start
with, how did you decide what to put into your database, everything? Or how did
you go about that?
>> John Foreman: Well, we started with particular products in mind we were going
to build first. And so we knew the first product we were going to build was this
abuse prediction model, right? And so we knew we wanted mailbox interaction data
in order to build that. I need to know about bounces, dead mailboxes, things like
that, spam bounces, all that stuff.
So that’s just going to be first. And so some of those data sets, that one is actually
huge, right? Because we send 10 billion emails every month, all those emails are
going to get some response back from the server saying yeah, we accepted it or in
the case of certain email products they might say yeah, we’ve accepted it and then
two days later they’ll send you a bounce like, haha, we didn’t actually accept it.
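A minimal sketch of how such mailbox interaction events, including a bounce that arrives days after the initial acceptance, might be recorded for later modeling; the field names here are hypothetical and are not MailChimp's actual schema:

from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class MailboxEvent:
    campaign_id: str
    recipient: str
    event_type: str                      # "accepted", "hard_bounce", "spam_complaint", ...
    occurred_at: datetime
    sent_at: Optional[datetime] = None   # lets a late bounce point back to the original send

events = [
    MailboxEvent("c42", "a@example.com", "accepted", datetime(2013, 12, 1, 9, 0)),
    # The same mailbox reports a bounce two days after claiming to accept the message.
    MailboxEvent("c42", "a@example.com", "hard_bounce", datetime(2013, 12, 3, 9, 0),
                 sent_at=datetime(2013, 12, 1, 9, 0)),
]

bounce_count = sum(1 for e in events if e.event_type.endswith("bounce"))
print(bounce_count)   # 1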
So we’ve got to grab all that. So we started with that. So for us it’s always been
about what’s the next thing we’re going to build, and let’s make sure that we start
six months out planning how that data is going to enter into the email genome project. Other people
tackle it in other ways.
I’m a little bit skeptical of people who are like, let’s just get everything first
and then figure out what we’re going to build. They spend years capturing data
and have no idea what they’re going to do with it till it’s all in there. Yes?
>>: [inaudible]
>> John Foreman: The one where we’ve looked a lot at accuracy would be the anti-abuse stuff, because we really cared that we’re only shutting down bad people. And
so for that one what I can say is that there is a portion of people, probably ten
percent of users that are grey area. We can tune the model so that we are extremely
accurate for all these abusers and all these good people. And the cool thing there is if
you are predicted good, verifiably good, we just never talk to you. You just sail
straight through.
But there is this portion in the middle that’s grey. So that would be one place where
we found that. And then the cool thing is this doesn’t end at a research paper
published somewhere saying this is how accurate we got. Instead, what we did is
figure out how to handle the grey people. And the way we do that is through
taste testing. So we essentially say hey, unfortunately this model has detected that
you could be good, you could be bad, here’s why. What we’re going to do is take
your list and send out a taste test first. So we’re going to send to this many email
addresses, wait one hour, and see what kind of feedback comes in. And now we’re
not in the realm of prediction anymore, right?
Now we’re in the realm of reality. If you pass then the rest of your campaign goes
out. So we’ve essentially mitigated some of our risk. It provides a slightly
diminished user experience to that grey area, but we found that it’s necessary. We
can’t just assume the grey area is bad, we can’t assume they’re good either. So that’s
sort of the middle ground. But yes, we do analyze how well we’re doing. That’s part
of what we do.
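A minimal sketch of the triage just described, assuming a hypothetical abuse score between 0 and 1; the thresholds, feedback cutoffs, and function names are all illustrative and do not come from MailChimp:

import time

ABUSE_HI = 0.9   # hypothetical cutoff: at or above this, treat the sender as an abuser
ABUSE_LO = 0.1   # hypothetical cutoff: at or below this, treat the sender as verifiably good

def handle_campaign(abuse_score, send_full, send_sample, get_feedback, block):
    if abuse_score <= ABUSE_LO:
        send_full()                       # predicted good: sail straight through
        return "sent"
    if abuse_score >= ABUSE_HI:
        block()                           # predicted abusive: shut it down
        return "blocked"

    # Grey area: send a small taste test, wait, and judge on real feedback
    # (bounces, spam complaints) instead of the prediction.
    send_sample()
    time.sleep(60 * 60)                   # wait roughly an hour
    feedback = get_feedback()             # e.g. {"bounce_rate": 0.02, "complaint_rate": 0.0}
    if feedback["bounce_rate"] < 0.05 and feedback["complaint_rate"] < 0.001:
        send_full()
        return "sent after taste test"
    block()
    return "blocked after taste test"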
Yes?
>>: We have time for one more.
>> John Foreman: Congratulations!
>>: So you mentioned that the [inaudible] can be answered by the data
science team or the UX team, or sometimes by both. So what is the process
for deciding whether a particular question is to be answered by the data science team
or the UX [inaudible]?
>> John Foreman: I mean, sometimes it’s a matter of priorities and whether we
think there are additional gains to be made beyond just doing the basic UX stuff. If
we can just do a few interviews, look at our survey data, maybe it’s a few SQL queries
that someone on the UX team could do, then yeah, just let them handle it. If it becomes a
bigger situation than that, well, there’s no formal process.
It’s a small company, right? I mean, 280 folks, half of those are in customer support,
so if you just look at people doing operational stuff it’s actually very small. We just
talk to each other and eventually the UX team will say hey, this is above our
mathematical pay grade and it’s very important too. If everyone says this is really
important and it needs to be worked on, then it comes to us: the data science team is quite small, but you
guys handle this one because we can’t crack it with our data.
And the finance team will do that too. They have databases they look at, but
sometimes they will be like, we really have this question about international users, it’s
beyond us, we have to go to the content. We need to make sure we understand the
language the content is written in, yada, yada, yada, and all of a sudden it becomes a data
science question and so it’ll just end up in our court.
And then if the CEO thinks we should do it we just do it regardless. That always
breaks a tie. All right. Thanks y’all. Thanks for coming out.
[applause]