Corona - Hadoop at Facebook

-Nagarjuna K
nagarjuna@outlook.com

1,000 people, technical and non-technical, access the custom-built data infrastructure

> 500 TB/day data arrival
 ad-hoc queries (Hive)
 custom MR
 data pipelines

Largest cluster > 100PB

More than 60,000 queries/day

data warehouse now = 2,500 × data warehouse of the past
Limitations of Hadoop MR Scheduling
Job Tracker responsibilities
• Managing cluster resources
• Scheduling all user jobs

Limitations
• The job tracker was unable to handle these dual responsibilities adequately.
• At peak load, cluster utilization dropped precipitously due to scheduling overhead.
Another problem: pull-based scheduling
• Task trackers provide a heartbeat status to the job tracker in order to get tasks to run.
• This heartbeat is periodic, so smaller jobs waste time waiting for the next one; the sketch below makes this concrete.
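To make the cost of the pull model concrete, here is a minimal, hypothetical Java sketch (names invented, not Hadoop's actual JobTracker code): task trackers only receive work when their periodic heartbeat fires, so a small job submitted just after a heartbeat can wait almost a full interval before its first task even starts.

import java.util.ArrayDeque;
import java.util.Queue;

class PullSchedulingSketch {
    static final long HEARTBEAT_INTERVAL_MS = 3000; // typical order of magnitude

    static final Queue<String> pendingTasks = new ArrayDeque<>();

    // Called by the job tracker when a job is submitted.
    static synchronized void submitJob(String jobId, int numTasks) {
        for (int i = 0; i < numTasks; i++) {
            pendingTasks.add(jobId + "-task-" + i);
        }
        // Nothing happens yet: tasks sit here until a tracker heartbeats in.
    }

    // Called once per heartbeat by each task tracker (the "pull").
    static synchronized String heartbeat(String trackerId) {
        return pendingTasks.poll(); // may return null even if work arrives 1 ms later
    }

    public static void main(String[] args) throws InterruptedException {
        submitJob("small-job", 2);
        // Worst case: the job was submitted just after the last heartbeat,
        // so its first task starts ~HEARTBEAT_INTERVAL_MS late.
        Thread.sleep(HEARTBEAT_INTERVAL_MS);
        System.out.println("tracker-1 got: " + heartbeat("tracker-1"));
    }
}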
Another problem: static slot-based resource management
• A MapReduce cluster is divided into a fixed number of map and reduce slots based on a static configuration (in Hadoop 1.x, the mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum settings).
• Slots are wasted whenever the cluster workload does not fit the static configuration, as the sketch below shows.
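A toy Java illustration of the waste (slot counts and workload numbers are made up for the example): with a map-heavy workload, every reduce slot sits idle while map tasks queue up, capping utilization well below 100%.

class StaticSlotsSketch {
    static final int MAP_SLOTS_PER_NODE = 8;    // fixed by static configuration
    static final int REDUCE_SLOTS_PER_NODE = 4; // fixed by static configuration
    static final int NODES = 100;

    public static void main(String[] args) {
        int pendingMapTasks = 2000;  // map-heavy workload, no reduces yet
        int pendingReduceTasks = 0;

        int usedMapSlots = Math.min(pendingMapTasks, MAP_SLOTS_PER_NODE * NODES);
        int usedReduceSlots = Math.min(pendingReduceTasks, REDUCE_SLOTS_PER_NODE * NODES);
        int totalSlots = (MAP_SLOTS_PER_NODE + REDUCE_SLOTS_PER_NODE) * NODES;

        // All 400 reduce slots sit idle even though 1,200 map tasks are still waiting.
        System.out.printf("utilization: %.0f%%%n",
                100.0 * (usedMapSlots + usedReduceSlots) / totalSlots);
    }
}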
Another problem: the job tracker design required hard downtime (all running jobs are killed) during a software upgrade.
• Every software upgrade therefore resulted in significant wasted computation.
Another problem: traditional analytic databases have had advanced resource-based scheduling for a long time. Hadoop needed this too.




 Better scalability and cluster utilization
 Lower latency for small jobs
 Ability to upgrade without disruption
 Scheduling based on actual task resource requirements rather than a count of map and reduce tasks
CORONA
Cluster Manager
• Tracks the nodes and the free resources in the cluster
Job Tracker
• A dedicated job tracker for each and every job
• Can run either as a client process (for small jobs) or as a separate process in the cluster (sketched below)
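A minimal Java sketch of this split of responsibilities, with invented interface names (not Corona's real API): the cluster manager only hands out resource grants, while a per-job job tracker owns all MapReduce-specific logic.

import java.util.List;

// Hypothetical interfaces for illustration only.
interface ClusterManager {
    // Tracks only nodes and free resources; knows nothing about MapReduce.
    List<ResourceGrant> requestResources(String jobId, List<ResourceRequest> requests);
}

record ResourceRequest(int memoryMb, int cpuCores) {}
record ResourceGrant(String nodeId, ResourceRequest resource) {}

// One job tracker per job: all MapReduce-specific logic lives here.
class PerJobJobTracker {
    private final ClusterManager cm;
    private final String jobId;

    PerJobJobTracker(ClusterManager cm, String jobId) {
        this.cm = cm;
        this.jobId = jobId;
    }

    void run(List<ResourceRequest> taskNeeds) {
        // Ask the cluster manager for resources, then push tasks to the
        // granted nodes directly; the cluster manager never sees job state.
        for (ResourceGrant g : cm.requestResources(jobId, taskNeeds)) {
            System.out.println(jobId + ": push task to " + g.nodeId());
        }
    }
}

class CoronaSplitSketch {
    public static void main(String[] args) {
        // Stub cluster manager that grants one fixed node per request.
        ClusterManager cm = (jobId, requests) ->
                requests.stream().map(r -> new ResourceGrant("node-1", r)).toList();
        new PerJobJobTracker(cm, "job-42").run(List.of(new ResourceRequest(1024, 1)));
    }
}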
Push-based implementation
• The cluster manager gets resource requests from the job tracker.
• The cluster manager pushes resource grants back to the job tracker.
• The job tracker then creates tasks and pushes them to task trackers for execution.
• There is no periodic heartbeat, so scheduling latency is minimized (see the sketch below).
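A hypothetical sketch of the push flow using blocking queues (all names invented): each hop hands work to the next the moment it is ready, with no polling interval anywhere.

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class PushSchedulingSketch {
    // Grants flow CM -> JT the moment resources free up (no polling delay).
    static final BlockingQueue<String> grantsToJobTracker = new LinkedBlockingQueue<>();
    // Tasks flow JT -> task tracker the moment a grant arrives.
    static final BlockingQueue<String> tasksToTracker = new LinkedBlockingQueue<>();

    public static void main(String[] args) throws InterruptedException {
        // Cluster manager: pushes a grant as soon as a node has free resources.
        grantsToJobTracker.put("grant:node-7");

        // Job tracker: reacts immediately to the pushed grant.
        String grant = grantsToJobTracker.take();
        tasksToTracker.put("map-task-0 on " + grant);

        // Task tracker: starts the pushed task right away.
        System.out.println("running " + tasksToTracker.take());
    }
}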
The cluster manager does not track the progress of jobs and is agnostic about MapReduce.
• The job tracker takes care of that.
• Each job tracker now tracks one job, which means less code complexity.
With this change:
• Many jobs can be managed simultaneously
• Better cluster utilization

 Greater scalability
 Lower latency
 No-downtime upgrades
 Better resource management

Average time to refill a slot
 During the given period, MapReduce took around 66 seconds to refill a slot, while Corona took around 55 seconds (an improvement of approximately 17%).

Cluster utilization
 Under heavy workloads, utilization of the Hadoop MapReduce system topped out at 70%; Corona was able to reach more than 95%.

More improvements in:
 Scheduling fairness
 Job latency

http://goo.gl/XJRNN
Why Not YARN?

 Storage: 100 PB of data
 Analyzes: 105 TB every 30 minutes

Facebook eliminated the single point of failure in the HDFS platform using a creation it calls AvatarNode (sketched below).
 Later on, the open-source community came up with the HA NameNode, based on a similar concept.
 More about AvatarNode:
▪ http://gigaom.com/cloud/how-facebook-keeps-100-petabytes-of-hadoop-data-online/
▪ https://www.facebook.com/notes/facebook-engineering/under-the-hood-hadoop-distributed-filesystem-reliability-with-namenode-and-avatarnode/10150888759153920
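A conceptual Java sketch of the hot-standby idea behind AvatarNode, assuming a shared edit log (e.g., on NFS); this is not Facebook's actual code, only the concept: the active node logs every namespace edit, the standby continuously replays the log, so failover needs no cold restart.

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class AvatarSketch {
    static final List<String> sharedEditLog = new ArrayList<>(); // e.g. on NFS

    static class NameNode {
        final Map<String, String> namespace = new ConcurrentHashMap<>();
        int applied = 0;

        void logAndApply(String edit) {  // active role: write-ahead, then apply
            sharedEditLog.add(edit);
            apply(edit);
        }

        void catchUp() {                 // standby role: replay new edits
            while (applied < sharedEditLog.size()) apply(sharedEditLog.get(applied));
        }

        private void apply(String edit) {
            namespace.put("edit-" + applied, edit);
            applied++;
        }
    }

    public static void main(String[] args) {
        NameNode active = new NameNode(), standby = new NameNode();
        active.logAndApply("mkdir /user/fb");
        standby.catchUp();               // standby is already warm
        // Failover: the standby can become active with no cold restart.
        System.out.println("standby entries: " + standby.namespace.size());
    }
}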

But Facebook will soon outgrow this cluster.
 Those 900 million members are perpetually posting new status updates, photos, videos, comments, and so on; you get the picture.
 What if it grows to 10,000 PB?


What if the Hadoop cluster spanned multiple data centers?
 Feasibility:
 Network packets cannot travel between data centers fast enough.
 Limitation of the present architecture:
▪ All the machines of the cluster should be close enough together.

Feasibility:
 Introducing tens of milliseconds of delay would slow down the system (see the back-of-the-envelope sketch below).
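To see why tens of milliseconds matter, here is a back-of-the-envelope Java example; all the numbers are assumptions chosen only to show the scale of the effect on a task that makes many small round trips.

class CrossDcLatencySketch {
    public static void main(String[] args) {
        int rpcsPerTask = 10_000;   // small reads/acks per task (assumed)
        double intraDcMs = 0.5;     // same-building round trip (assumed)
        double crossDcMs = 40.0;    // cross-data-center round trip (assumed)

        // Pure latency cost per task, ignoring bandwidth and computation.
        double intra = rpcsPerTask * intraDcMs / 1000.0; // ~5 seconds
        double cross = rpcsPerTask * crossDcMs / 1000.0; // ~400 seconds
        System.out.printf("intra-DC: %.0f s, cross-DC: %.0f s (%.0fx slower)%n",
                intra, cross, cross / intra);
    }
}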
Prism
Just as a prism refracts a single light ray into multiple rays, Prism replicates and moves data wherever it is needed across a vast network of computing facilities.
The clusters are physically separate but logically the same.

 Can move warehouses around
 Not bound by the limitations of the data center

Still in development
 Not yet deployed

23rd October 2009, The Register on Google: "Google Spanner — instamatic redundancy for 10 million servers?"
 http://www.theregister.co.uk/2009/10/23/google_spanner/

Is Prism similar to Spanner?
 Very little is known about Google Spanner.

Like Spanner, Facebook's Prism could be used to instantly relocate data in the event of a data center meltdown.