Wrangling Customer Usage Data with Hadoop
Clearwire – Thursday, June 27th
Carmen Hall – IT Director
Mathew Johnson – Sr. IT Manager
Starting With…
• …a little ingenuITy!
ingenuITy Day @ Clearwire
• Opportunity for everyone in IT to innovate and present
new and even crazy ideas
• One of those crazy ideas was from Roger Hosto
• Roger had the solution for Clearwire’s Big Data
problem: Hadoop
But Wait!
• Now we had a solution for Big Data
• We needed a Big Data opportunity
• We had just the thing…
The Perfect Problem
• Customer Usage Data – the commodity we deliver to Wholesale
Totally (un)Wired
• Americans used more than 1,304 petabytes of
wireless data in 2012 - an increase of 69.3% over the
previous 12 months' usage (827 TB)
• Clearwire processes over 3B individual usage detail
records each month
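To put that monthly volume in perspective, the short Python sketch below converts it to average rates. It is back-of-the-envelope arithmetic that assumes a 30-day month and records spread evenly over time. Real usage traffic is bursty, so peak rates run higher. The function name `average_rate` is illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_rate(records_per_month: float, days_in_month: int = 30) -> dict:
    """Average records per day, hour, and second.

    Assumptions: a 30-day month by default and an even spread of records;
    real usage traffic is bursty, so peaks exceed these averages.
    """
    per_day = records_per_month / days_in_month
    return {
        "per_day": per_day,
        "per_hour": per_day / 24,
        "per_second": per_day / SECONDS_PER_DAY,
    }


if __name__ == "__main__":
    for unit, value in average_rate(3_000_000_000).items():
        print(f"{unit:>10}: {value:,.0f}")
```

For 3 billion records this works out to 100 million records per day, or about 1,157 records per second on average.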
Shifting Landscape
• The U.S. wireless industry is a $195.5 billion
enterprise - larger than publishing, agriculture, hotels
and lodging, air transportation and movies – just to
name a few
• Prepaid/Pay-As-You-Go services hold a 23.4% share of overall
market penetration, which raises the exposure to lost revenue
when usage delivery is delayed
• In some cases, a customer can consume data faster
than we can bill for it
Anatomy of Latency - Legacy
Diagram: legacy usage delivery timeline (labels: “Up to 90 Minutes”, “1 Hour”, “IT Usage”)
Let’s Talk Numbers
• Assume a 2GB plan
• An HD movie from Netflix consumes 2+ GB per hour
• Assume wholesale price = $6/GB
• Assume the retail price for a GB of data (as a top-up or
overage) ranges from $20 to $100
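These numbers can be turned into a rough estimate of how much usage slips past billing while delivery is delayed. The sketch below is illustrative, not Clearwire's revenue model. It assumes a subscriber who has already used the full 2GB plan and keeps streaming HD video at 2 GB per hour, and it uses the legacy end-to-end delivery time of 2.5 hours reported under Real Results. The function and parameter names are hypothetical.

```python
def unbilled_exposure(delay_hours: float,
                      gb_per_hour: float = 2.0,
                      wholesale_per_gb: float = 6.0,
                      retail_per_gb: tuple = (20.0, 100.0)) -> dict:
    """Usage consumed during a delivery delay and its value.

    Hypothetical model: the subscriber is already at the plan cap and streams
    continuously at `gb_per_hour`, so every GB in the delay window is overage
    that billing has not yet seen.
    """
    gb = delay_hours * gb_per_hour
    low, high = retail_per_gb
    return {
        "gb": gb,
        "wholesale_value": gb * wholesale_per_gb,
        "retail_value_range": (gb * low, gb * high),
    }


if __name__ == "__main__":
    # Legacy end-to-end delivery time of 2.5 hours.
    print(unbilled_exposure(delay_hours=2.5))
```

Under those assumptions, about 5 GB goes unbilled per subscriber in one delay window: $30 at the $6/GB wholesale price and $100 to $500 at the assumed retail prices. Passing a shorter `delay_hours` shows how faster delivery shrinks the exposure.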
As if that wasn’t enough
• Clearwire was locked into a very expensive vendor
contract which handled both network provisioning and
usage delivery needs
• Legacy solution was not adaptable or flexible
• We needed something innovative, reliable, internally
supportable, scalable – and we needed it fast
Putting ingenuITy to Work!
• Roger’s idea was suddenly a project
• We needed to build a platform to ingest, process, and
provide cleaned usage data for downstream
applications – and quickly
• We needed:
• A Hadoop Cluster
• 24x7 Operations
• Code to ingest data and handle a myriad of business rules
• Integration with legacy and new systems
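The ingest-and-clean step in that list can be sketched in a few lines of Python. This is a hypothetical illustration, not Atlas code: the comma-separated record layout (subscriber_id, epoch_seconds, bytes_up, bytes_down) and the validation rules are assumptions chosen to show the pattern of rejecting malformed records and dropping exact duplicates before data goes downstream.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional, Set


@dataclass(frozen=True)
class UsageRecord:
    """One cleaned usage detail record (hypothetical layout)."""
    subscriber_id: str
    epoch_seconds: int
    bytes_up: int
    bytes_down: int

    @property
    def total_bytes(self) -> int:
        return self.bytes_up + self.bytes_down


def parse_record(line: str) -> Optional[UsageRecord]:
    """Parse one CSV line; return None if it fails basic validation."""
    fields = [f.strip() for f in line.strip().split(",")]
    if len(fields) != 4 or not fields[0]:
        return None
    try:
        ts, up, down = (int(f) for f in fields[1:])
    except ValueError:
        return None  # non-numeric timestamp or byte count
    if ts <= 0 or up < 0 or down < 0:
        return None
    return UsageRecord(fields[0], ts, up, down)


def clean(lines: Iterable[str]) -> Iterator[UsageRecord]:
    """Yield valid records, skipping malformed lines and exact duplicates."""
    seen: Set[UsageRecord] = set()
    for line in lines:
        record = parse_record(line)
        if record is None or record in seen:
            continue
        seen.add(record)
        yield record


if __name__ == "__main__":
    sample = [
        "sub-001,1372300000,1024,4096",
        "sub-001,1372300000,1024,4096",  # exact duplicate: dropped
        "sub-002,not-a-time,10,20",      # bad timestamp: dropped
        "sub-003,1372300060,-5,20",      # negative byte count: dropped
    ]
    for record in clean(sample):
        print(record.subscriber_id, record.total_bytes)
```

At Clearwire's volume an in-memory `seen` set would not scale; on a cluster, de-duplication would more naturally be done by grouping on the record key in a MapReduce or Hive step.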
Atlas was Born
• Development work began immediately on Clearwire’s
private cloud infrastructure
• Selected the Apache Bigtop packaging of Apache Hadoop v1.0.1
• Wrote custom code, leveraging Hive and other common tools,
to ingest and process data
• Infrastructure was built
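On Hadoop 1.x, Hive compiles its queries into MapReduce jobs. As an illustration of that processing model (not the Atlas code), the sketch below mimics in plain Python the map, shuffle, and reduce phases behind a per-subscriber usage total, the kind of result a Hive `SUM` with `GROUP BY` produces. The `(subscriber_id, bytes)` record shape and all function names are assumptions.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple

Record = Tuple[str, int]  # (subscriber_id, bytes): hypothetical layout


def map_phase(records: Iterable[Record]) -> Iterator[Tuple[str, int]]:
    """Emit (key, value) pairs; the key is the subscriber."""
    for subscriber_id, nbytes in records:
        yield subscriber_id, nbytes


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group values by key, as Hadoop does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum each subscriber's bytes."""
    return {key: sum(values) for key, values in groups.items()}


def usage_by_subscriber(splits: Iterable[Iterable[Record]]) -> Dict[str, int]:
    """Map each input split, then shuffle and reduce the combined output."""
    mapped = chain.from_iterable(map_phase(split) for split in splits)
    return reduce_phase(shuffle_phase(mapped))


if __name__ == "__main__":
    splits = [[("sub-001", 100), ("sub-002", 50)], [("sub-001", 25)]]
    print(usage_by_subscriber(splits))
```

In the real framework, map tasks run in parallel on the data nodes holding each input split, and the shuffle moves all values for a key to a single reducer over the network.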
Hybrid Approach to Hadoop
• Virtual Edge Nodes
• Leveraged our existing private cloud
• Physical Data Nodes
• Per Unit Cost (Storage & CPU) was lower than
existing infrastructure
• Smaller and more efficient than you think
• 24 data nodes, each with 3TB of usable storage
• Gives us 72TB of usable space
• 3x block replication for production data
• Deployed identical DR/Analytics platform
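The capacity figures can be checked with a little arithmetic. With 3x replication every block is stored three times, so 72TB of usable disk holds roughly 24TB of unique production data. The sketch below is simplified: it assumes all data uses the 3x replication factor (HDFS's default `dfs.replication` value is also 3) and ignores space needed for intermediate job output. The function name is illustrative.

```python
def cluster_capacity(nodes: int, tb_per_node: float, replication: int) -> dict:
    """Raw usable capacity, unique-data capacity, and node-failure tolerance.

    Simplification: ignores space for intermediate MapReduce output,
    non-HDFS files, and operating headroom, so practical capacity is lower.
    """
    raw_tb = nodes * tb_per_node
    return {
        "raw_tb": raw_tb,
        # Every block is stored `replication` times.
        "unique_data_tb": raw_tb / replication,
        # HDFS keeps a block's replicas on distinct DataNodes, so if all blocks
        # are fully replicated, losing replication - 1 nodes leaves a copy.
        "node_failures_tolerated": replication - 1,
    }


if __name__ == "__main__":
    print(cluster_capacity(nodes=24, tb_per_node=3.0, replication=3))
```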
Operational in No Time
• 2.5 months from project approval to production
• Leveraged our existing support organizations
• The solution leveraged common tools and did not require
specialized teams
• Fault tolerance inherent within Hadoop helps us
minimize late night calls
• An endless supply of data was quickly flowing through
the system
• The results were looking good!
Real Results
• 65% improvement in end-to-end delivery times
• From 2.5 hours to 1.3 hours
• Reduced catch-up time from upstream outages by
more than half
• Reduced outage impacts by introducing flexibility to
deliver partial files
• Eliminated 4-hour weekly usage delivery outages tied
to provisioning system maintenance
Anatomy of Latency - Now
Diagram: current usage delivery timeline (labels: “1 Hour”, “Average of 15 Minutes”, “~6 Minutes”, “~9 Minutes”)
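Adding up the stage labels on the two latency diagrams gives a quick consistency check against the Real Results figures. The sketch assumes the labeled stages run one after another, reads the legacy “Up to 90 Minutes” as the worst case, and takes “~6 Minutes” plus “~9 Minutes” as the parts of the 15-minute average. That mapping of labels to stages is an interpretation of the diagrams, not something the slides state. Under it, the legacy path totals 2.5 hours and the current path 1.25 hours, in line with the reported 2.5 and 1.3 hours.

```python
# Stage latencies in minutes, read off the two "Anatomy of Latency" diagrams.
# Assumption: stages run back to back; "~6" + "~9" make up the 15-minute average.
LEGACY_STAGES_MIN = [90, 60]     # "Up to 90 Minutes", "1 Hour"
CURRENT_STAGES_MIN = [60, 6, 9]  # "1 Hour", "~6 Minutes", "~9 Minutes"


def total_hours(stages_min: list) -> float:
    """End-to-end latency in hours when stages run sequentially."""
    return sum(stages_min) / 60


if __name__ == "__main__":
    print(f"legacy:  {total_hours(LEGACY_STAGES_MIN):.2f} h")
    print(f"current: {total_hours(CURRENT_STAGES_MIN):.2f} h")
```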
Real (Financial) Results
• 6 month return on investment
• Delivered at 1/3 the cost of competing solutions
• Foundational – enables the Wholesale support plan for
legacy platform migration
• Saving Clearwire tens of millions of dollars over the life of
the contract while internalizing support and development
The Intangibles
• Proved to internal and external partners that we
deliver what we promise with limited negative impacts
to ongoing business
• This was KEY to the speed at which we were able to
migrate our billing platform
• Delivered more than just a single, targeted process –
delivered an enterprise usage platform to grow from
• Kept true to our innovative spirit and the commitment
to IT professionals that they can make a difference
Evolution – Proving More
• The Atlas Hadoop platform is now a go-to IT solution
• LTE Usage Data – Now in production
• Other Data Sources - ESR Data
• Data Replication and real-time ETL
• Exploring opportunities with the network team to move
closer to usage generation
• Changing mindset of what IT can mean to an