Tubular Labs Accelerates YouTube Audience Growth with Cloudera

Company Overview
Every day, 1.2 million videos are uploaded and more than three billion videos are viewed on
Google-owned YouTube, one of the most recognized and popular video-sharing websites on
the planet. Its global reach and recent emphasis on channel development make YouTube an
appealing vehicle for brands and content publishers to reach relevant audiences worldwide.
Finding those audiences in the massive pool of YouTube viewers can be a challenge, but
Tubular Labs is changing that with an enterprise data hub (EDH) powered by Cloudera. The
real-time insights delivered by Tubular Labs’ platform, via Impala in particular, reveal who is
engaging with the videos, when they are engaging, and how they are sharing them – empowering upwards of 2900 content creators and brand marketers including AwesomenessTV,
Jamie Oliver, Maker Studios, Seventeen Magazine, and Vice to grow and engage relevant
audiences like never before.
The Challenge
Before starting Tubular Labs with former YouTube analyst Allison Stern in June of 2012,
founder and CEO Rob Gabel was an executive at Machinima, one of the first corporate
entities that published content in the YouTube space – in Machinima’s case, for the gaming
community. “Millions of gamers watch uploads of walk-throughs, players, news, and clips
of game play on YouTube on demand, typically following a favorite game, such as Call of
Duty or Halo,” said Gabel. The data generated by this YouTube activity could be mined for
audience insights that would help Machinima develop better products and build even more
momentum within its online community, but tools to help Machinima understand and grow
its YouTube following were lacking. YouTube video “success,” measured in views, didn’t tell
the full story. That critical need became the genesis of Tubular Labs.
When Gabel and team began evaluating technologies for the architecture that would
support their business, they identified several key requirements that would need to be
addressed. The most critical of these: to deliver real-time reporting performance over
enormous data sets. Other key requirements included:
• Ingesting data from multiple sources that accumulate hundreds of millions of data
points each day
• Writing all of that information and reconstituting it into something actionable
• Executing very complex queries of this big data set on the fly
• Scaling with rapidly growing data volumes
CUSTOMER SUCCESS STORY
Key Highlights
Industry
• Digital Media
Locations
• Mountain View, CA, USA
Business Application Supported
• Enterprise data hub empowering YouTube audience insights in real time
Impact
• 56% faster YouTube subscription growth among Tubular customers; 52% faster growth in YouTube views
• Sub-second ad hoc query response via Cloudera Impala
Technologies in Use
• Hadoop Platform: Cloudera Enterprise
• Hadoop Components: Cloudera Impala, Cloudera Manager, HDFS, Hive, MapReduce
• Servers: Amazon Web Services EC2
Big Data Scale
• Millions of data points ingested daily from dozens of data sources
Gabel explained, “Questions about the data change constantly. ‘Which channels have audiences that are 80 percent male?’ ‘Which videos for the latest blockbuster film are getting
the most Facebook likes and shares?’ We wanted to enable very complex queries that could
answer any kind of question on-the-fly, even with our large volume of data. And we wanted
to return those results in under a minute.”
Tubular’s search for technology solutions meeting these criteria surfaced few contenders.
The team knew that traditional database systems wouldn’t provide the performance at scale
that they’d need, especially on the limited budget of a start-up, so they looked into emerging
open source technologies. Dave Koblas, Tubular’s vice president of engineering, said, “We
considered many solutions ranging from Twitter Storm running on Cassandra to some very
elaborate MySQL set-ups. Our original prototypes ran on a MySQL-based platform, but we
quickly realized that it wasn’t a sustainable approach, and couldn’t support the required
performance, analytic flexibility, or scale that we needed.”
With its enterprise data hub approach, Cloudera offered a centralized big data solution
based on Apache Hadoop that uniquely addressed Tubular’s needs for an infrastructure that
would deliver performance and analytic flexibility at scale, integrating tools such as Impala
for real-time SQL-on-Hadoop queries and Cloudera Manager for automated system management. “When we looked at Cloudera Impala and the Apache Hadoop system, it was very
clear that these would grow with us and meet our needs over the long haul,” said Koblas.
The company evaluated other SQL-on-Hadoop options too, but found that most of them
were designed to take queries from running in minutes to tens of seconds, and their
management tools were sub-par. Tubular’s goal was to deliver results in less than two
seconds - to truly deliver a real-time platform with an interactive user interface (UI) that
would ensure the best customer experience. “Impala was one of the only choices that was
trying to deliver on that mission statement, as opposed to just making [Apache] Hive faster,”
explained Koblas.
Tubular adopted Impala before it was released for general availability (GA). Koblas and
team knew they’d be taking a bet on such an early technology, but having Cloudera’s
enterprise-grade support behind the technology gave them confidence. “It was a risk and it has delivered in spades.”
Solution
In August 2013 - just weeks past the one-year anniversary of its founding - Tubular went into
public beta with its AudienceGraph technology. AudienceGraph relies on its EDH to track
and synthesize billions of YouTube video views, comments, and related “events” as well as
social engagements including tweets, comments, and likes. And eight months later, in April
2014, Tubular launched its Tubular Intelligence solution, delivering key enhancements built
for enterprises.
The Tubular team continues to engage with Cloudera Proactive Support regularly to ensure
fast, ongoing innovation and to make sure they’re getting the most out of their platform.
The team relies on Cloudera Manager to manage the Hadoop system. “My experience with
Cloudera Support has been really amazing,” Koblas added, “and if it wasn’t for Cloudera
Manager, I think I would have pulled out all of my hair.”
“Tubular exists to put information recommendations in the hands of our customers,” said
Gabel. Most of Tubular’s clients fall into one of two categories: content marketers and
content publishers.
• Content marketers are selling a brand - a beauty product, an energy drink, or a deodorant
for instance - and engage YouTube audiences with online advertising. “The ads’ viewers
can leave comments and can share YouTube content across social networks,” explained
Gabel. “Effective marketers participate in the conversation by replying to viewer comments
or tweets. These conversations build rapport and positive sentiment between the brand
and consumers, while helping to spread the word about the brand. But they can also deliver
important insights. Our platform helps content marketers understand, for example, how a
product is being received in the market, and by whom.”
• Content publishers include companies such as Machinima, the gaming publisher, and
AwesomenessTV, one of Tubular’s first customers. “These are
the new media companies with on-demand ‘channels’ on YouTube,” said Gabel. “Not
everyone can be a CBS, but companies are making high-quality content that’s engaging,
and that entertains and informs people. They can use YouTube to distribute themselves
as an on-demand channel instantly to 150 countries. It’s a disruptive model.”
For both content marketers and content publishers, the hardest part of YouTube is getting
seen by relevant audiences. That’s where Tubular comes in. A YouTube-certified company
and partner, Tubular uses APIs to ingest billions of social actions from YouTube, as
well as from Twitter, web crawls, and dozens of other publicly available data sources.
Koblas explained, “We normalize all of the data, feed it through the appropriate queuing
systems, then load it into Cloudera and perform further manipulations with Hive scripts to
make the data available for real-time analytics using Impala.”
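The normalization step Koblas describes can be pictured with a small sketch. This is an illustration only, not Tubular's actual code: the Event schema, the raw field names (videoId, kind, ts, video, created_at), and the tab-delimited output layout are all assumptions made for the example. It shows records from two sources being mapped to one shared shape before loading.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class Event:
    """Hypothetical shared schema for one social action."""
    source: str     # e.g. "youtube" or "twitter"
    video_id: str
    action: str     # e.g. "view", "comment", "share"
    timestamp: int  # seconds since the Unix epoch


def normalize(source: str, raw: Dict[str, Any]) -> Event:
    """Map one raw record to the shared schema (raw field names are assumed)."""
    if source == "youtube":
        return Event("youtube", raw["videoId"], raw["kind"].lower(), int(raw["ts"]))
    if source == "twitter":
        # In this sketch every tweet that mentions a video counts as a share.
        return Event("twitter", raw["video"], "share", int(raw["created_at"]))
    raise ValueError(f"unknown source: {source}")


def to_row(event: Event) -> str:
    """Serialize an event as one tab-delimited line for bulk loading."""
    return "\t".join(
        [event.source, event.video_id, event.action, str(event.timestamp)]
    )
```

Once every source is reduced to the same row shape, downstream batch jobs and interactive queries can treat all events uniformly, which is the point of normalizing before loading.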
The company’s AudienceGraph solution was released at approximately the same time that
Cloudera launched Impala 1.0. Tubular enlisted Cloudera Professional Services to tune the
speed, stability, and response time of its Impala installation. “It was fantastic,” said Gabel,
“and gave our customers the responsive interface they were looking for.”
Today, thousands of Tubular customers use Impala to access, query, and explore synthesized, live information in real time. The company’s SaaS-based platform is hosted on
the Amazon EC2 cloud. “For the Impala real-time nodes,” Koblas clarified, “we're using
dedicated, high-storage two-terabyte (TB) Amazon SSD machines which further enhance
our ability to give real-time responses.”
The data is presented in both visual and tabular formats in a dashboard interface. Using
real-time channel views, customers can see numbers of views and engagements, as well
as number of subscribers. And they can run elaborate queries. They can see if anyone
influential has been commenting on their videos. They can look through their entire backlog
and look for questions. “If they’re an enterprise customer,” said Gabel, “they can run complex
queries to search hundreds of millions of videos or millions of channels to find, for example,
the fastest-growing electronic dance music artists in Sweden that have not been signed to a
YouTube network.” And they can get that information in seconds.
[Figure: Sample Tubular Dashboard]
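The enterprise query Gabel describes boils down to a filter followed by a ranking. The sketch below illustrates that logic on in-memory records. It is an assumption-laden example, not Tubular's implementation: the Channel fields, the growth measure (relative subscriber change over a period), and the function names are hypothetical, and on the real platform such a question would be answered by a SQL query through Impala.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Channel:
    """Hypothetical channel record; all field names are assumptions."""
    name: str
    country: str
    genre: str
    network: Optional[str]  # None means not signed to a YouTube network
    subs_start: int         # subscribers at the start of the period
    subs_end: int           # subscribers at the end of the period


def growth_rate(channel: Channel) -> float:
    """Relative subscriber growth over the period."""
    return (channel.subs_end - channel.subs_start) / channel.subs_start


def fastest_growing_unsigned(
    channels: List[Channel], country: str, genre: str, limit: int = 10
) -> List[Channel]:
    """Return unsigned channels for a country and genre, fastest growth first."""
    matches = [
        c for c in channels
        if c.country == country
        and c.genre == genre
        and c.network is None
        and c.subs_start > 0  # skip channels with no baseline to compare against
    ]
    return sorted(matches, key=growth_rate, reverse=True)[:limit]
```

Calling fastest_growing_unsigned(channels, "SE", "edm") would return the unsigned Swedish electronic dance music channels in descending order of growth.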
Impact: Smarter Content, Faster Growth
The future of media, said Gabel, is about content, distribution, and information. “Tubular
is here to help content marketers and content publishers leverage the big data element to
gain competitive intelligence - and deeper insights - that will help them identify, grow, and
engage passionate communities of followers on YouTube more effectively in the future.”
“Video is dominating the new world of content and YouTube presents a massive big data
opportunity for publishers and content marketers to capitalize on, allowing them to better
reach and understand their audiences,” said Doug Cutting, co-founder of Hadoop and chief
architect at Cloudera. “Tubular employs cutting-edge big data analytics to provide new
insights, and we’re proud to be the platform empowering their enterprise data hub.”
In one case, AwesomenessTV was launching a brand new YouTube channel and wanted to
attract and understand its teen audiences. The series turned to Tubular to generate an audience graph that helped answer questions like, “What types of content are teens interested
in? What are their behaviors and interests?”
Tubular was able to provide lists of the channel’s fans, super fans, and most influential fans
that could be leveraged for outreach and activation. Later, when AwesomenessTV lost one
of its show hosts, Tubular was able to recommend replacements - one of which was hired.
AwesomenessTV soon reached a million subscribers on YouTube, and in May 2013 was
purchased by DreamWorks Animation for US$33 million upfront, with additional
payments of up to $117 million.
Overall, Tubular customers are growing their YouTube subscriber base 56% faster than
non-Tubular users, and are growing their views 52% faster.
And, it seems, the startup has created its own loyal following. One survey indicated that 72%
of Tubular customers have referred the company to a friend. “We haven't spent a dollar on
paid media. And beyond speaking at a few conferences, we haven't done any marketing. But
we already have 2900 customers, including dozens of large enterprises,” noted Gabel. “So,
it's working out.”
About Cloudera
Cloudera is revolutionizing enterprise data management by offering the first unified
Platform for Big Data, an enterprise data hub built on Apache Hadoop. Cloudera offers
enterprises one place to store, process and analyze all their data, empowering them to
extend the value of existing investments while enabling fundamental new ways to derive
value from their data. Only Cloudera offers everything needed on a journey to an
enterprise data hub, including software for business critical data challenges such as
storage, access, management, analysis, security and search. As the leading educator of
Hadoop professionals, Cloudera has trained over 40,000 individuals worldwide. Over
1,400 partners and a seasoned professional services team help deliver greater time
to value. Finally, only Cloudera provides proactive and predictive support to run an
enterprise data hub with confidence. Leading organizations in every industry plus top
public sector organizations globally run Cloudera in production. www.cloudera.com.
1-888-789-1488 or 1-650-362-0488
Cloudera, Inc. 1001 Page Mill Road, Palo Alto, CA 94304, USA
© 2015 Cloudera, Inc. All rights reserved. Cloudera and the Cloudera logo are trademarks or registered trademarks of Cloudera Inc. in the USA
and other countries. All other trademarks are the property of their respective companies. Information is subject to change without notice.