YARN - Hadoop

Nagarjuna K
nagarjuna@outlook.com
Requirements
• Reliability
• Availability
• Scalability: clusters of 10,000 machines and 200,000 cores, and beyond.
• Backward (and forward) compatibility: ensure customers’ MapReduce applications run unchanged in the next version of the framework.
• Evolution: ability for customers to control upgrades to the Hadoop software stack.
• Predictable latency: a major customer concern.
• Cluster utilization


Secondary Requirements
• Support for alternate programming paradigms to MapReduce.
• Support for short-lived services.

Need
• Separate the two tasks of the JobTracker:
  ▪ Resource management
  ▪ Job scheduling / management
So, what did we come up with?
• Resource Manager
• Node Manager
• Application Master
• Container

Resource Manager (RM)
Manages the global assignment of compute resources to applications.
• A pure scheduler.
• No monitoring or tracking of application status.
• No guarantee of restarting failed tasks.
• Each client/application may request multiple kinds of resources:
  ▪ Memory
  ▪ Network
  ▪ CPU
  ▪ Disk ..
This is a significant change from the static Mapper/Reducer slot model.
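To make the multi-resource request concrete, here is a minimal Python sketch of a scheduler placing one request on the first node that has enough of every resource. The `Resource` class, its field names and the first-fit `allocate` function are hypothetical illustrations, not the YARN API.

```python
from dataclasses import dataclass
from typing import Dict, Optional


# Hypothetical resource vector mirroring the slide's dimensions
# (memory, CPU, disk, network); this is NOT the YARN Resource class.
@dataclass(frozen=True)
class Resource:
    memory_mb: int
    vcores: int
    disk_mb: int = 0
    network_mbps: int = 0

    def fits_in(self, available: "Resource") -> bool:
        """True if every dimension of this request fits in `available`."""
        return (self.memory_mb <= available.memory_mb
                and self.vcores <= available.vcores
                and self.disk_mb <= available.disk_mb
                and self.network_mbps <= available.network_mbps)

    def minus(self, other: "Resource") -> "Resource":
        """Capacity left after subtracting `other`, dimension by dimension."""
        return Resource(self.memory_mb - other.memory_mb,
                        self.vcores - other.vcores,
                        self.disk_mb - other.disk_mb,
                        self.network_mbps - other.network_mbps)


def allocate(free: Dict[str, Resource], request: Resource) -> Optional[str]:
    """First-fit placement: pick the first node that can satisfy every
    dimension of `request`, deduct the request from that node's free
    capacity, and return the node name (None if nothing fits)."""
    for node, available in free.items():
        if request.fits_in(available):
            free[node] = available.minus(request)
            return node
    return None


free = {"node1": Resource(4096, 2, 10_000, 1000),
        "node2": Resource(8192, 8, 10_000, 1000)}
print(allocate(free, Resource(memory_mb=6144, vcores=4)))  # node2
```

First fit is only the simplest placement policy; a real scheduler also weighs queues, locality and fairness, which this sketch ignores.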

Application Master
• A per-application ApplicationMaster (AM) manages the application’s life cycle (scheduling and coordination).
• An application is either a single job in the classic MapReduce sense or a DAG of such jobs.
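Coordinating a DAG of jobs means the ApplicationMaster may only start a job once every job it depends on has finished. A minimal sketch of that ordering using Kahn's topological sort; the job names and the `job_order` function are hypothetical, and a single classic MapReduce job is just the one-node case.

```python
from collections import deque
from typing import Dict, List


def job_order(deps: Dict[str, List[str]]) -> List[str]:
    """Return an order in which jobs can run so that every job comes after
    the jobs it depends on (Kahn's algorithm).

    `deps` maps each job to the jobs whose output it needs; every job that
    appears as a dependency must also be a key. Raises ValueError on a cycle.
    """
    # Number of unfinished dependencies per job, plus reverse edges.
    pending = {job: len(parents) for job, parents in deps.items()}
    children: Dict[str, List[str]] = {job: [] for job in deps}
    for job, parents in deps.items():
        for parent in parents:
            children[parent].append(job)

    # Start with jobs that have no dependencies (sorted for determinism).
    ready = deque(sorted(job for job, n in pending.items() if n == 0))
    order: List[str] = []
    while ready:
        job = ready.popleft()
        order.append(job)
        # Finishing `job` may unblock the jobs that consume its output.
        for child in children[job]:
            pending[child] -= 1
            if pending[child] == 0:
                ready.append(child)

    if len(order) != len(deps):
        raise ValueError("job graph contains a cycle")
    return order


pipeline = {"extract": [], "lookup": [], "clean": ["extract"],
            "join": ["clean", "lookup"]}
print(job_order(pipeline))  # ['extract', 'lookup', 'clean', 'join']
```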
The ApplicationMaster is responsible for:
• negotiating appropriate resource containers from the Scheduler
• launching tasks
• tracking their status
• monitoring their progress
• handling task failures.
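The responsibilities listed above fit together as a loop. A minimal sketch, assuming two hypothetical callbacks: `grant_containers` plays the Scheduler (it grants up to the number of containers asked for) and `run_task` plays one task attempt inside a container. Neither is a YARN API, and tasks run synchronously here for simplicity.

```python
from typing import Callable, Dict, List, Tuple


def run_application(tasks: List[str],
                    grant_containers: Callable[[int], int],
                    run_task: Callable[[str, int], bool],
                    max_attempts: int = 4,
                    max_rounds: int = 100) -> Tuple[Dict[str, str], List[float]]:
    """Drive tasks to completion the way an ApplicationMaster would.

    grant_containers(n) -> how many of n requested containers the
    (hypothetical) Scheduler grants this round.
    run_task(task, attempt) -> True if that attempt succeeded.
    Returns (final state per task, progress fraction after each round).
    """
    attempts = {t: 0 for t in tasks}
    state = {t: "PENDING" for t in tasks}
    progress: List[float] = []
    for _ in range(max_rounds):
        pending = [t for t in tasks if state[t] == "PENDING"]
        if not pending:
            break
        # 1. Negotiate containers from the Scheduler for the pending tasks.
        granted = max(0, grant_containers(len(pending)))
        # 2. Launch one task per granted container and track its status.
        for task in pending[:granted]:
            attempts[task] += 1
            if run_task(task, attempts[task]):
                state[task] = "SUCCEEDED"
            # 3. Handle task failure: the task stays PENDING (and is retried)
            #    until max_attempts is used up.
            elif attempts[task] >= max_attempts:
                state[task] = "FAILED"
        # 4. Monitor progress: fraction of tasks that have finished.
        finished = sum(s != "PENDING" for s in state.values())
        progress.append(finished / len(tasks))
    return state, progress


# Scheduler grants at most 2 containers per round; t2 fails its first attempt.
state, progress = run_application(
    ["t1", "t2", "t3"],
    grant_containers=lambda n: min(n, 2),
    run_task=lambda task, attempt: not (task == "t2" and attempt == 1))
print(state)  # {'t1': 'SUCCEEDED', 't2': 'SUCCEEDED', 't3': 'SUCCEEDED'}
```

The `max_rounds` bound is a design choice of the sketch so that a Scheduler that never grants containers cannot make the loop spin forever.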

Node Manager
The NodeManager is the per-machine framework agent responsible for launching the applications’ containers, monitoring their resource usage (CPU, memory, disk, network) and reporting the same to the Scheduler.
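The reporting side of the NodeManager can be sketched as a heartbeat that sums the usage of the containers it launched. Everything here (`NodeManagerSketch`, `Usage`, the report's keys) is a hypothetical illustration, not the real NodeManager protocol.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Usage:
    """Resource usage of one container (hypothetical units)."""
    cpu_pct: float = 0.0
    memory_mb: int = 0
    disk_mb: int = 0
    network_mbps: float = 0.0


@dataclass
class NodeManagerSketch:
    """Hypothetical per-machine agent: launches containers, records their
    usage and summarises it in a heartbeat for the Scheduler."""
    node: str
    containers: Dict[str, Usage] = field(default_factory=dict)

    def launch(self, container_id: str) -> None:
        # A real NodeManager would start a process here; we only track it.
        self.containers[container_id] = Usage()

    def record(self, container_id: str, usage: Usage) -> None:
        # Stand-in for OS-level monitoring of the container's processes.
        self.containers[container_id] = usage

    def heartbeat(self) -> dict:
        """Report per-node totals and the list of live containers."""
        live = self.containers.values()
        return {
            "node": self.node,
            "containers": sorted(self.containers),
            "cpu_pct": sum(u.cpu_pct for u in live),
            "memory_mb": sum(u.memory_mb for u in live),
            "disk_mb": sum(u.disk_mb for u in live),
            "network_mbps": sum(u.network_mbps for u in live),
        }


nm = NodeManagerSketch("node1")
nm.launch("c1")
nm.launch("c2")
nm.record("c1", Usage(cpu_pct=50.0, memory_mb=1024, disk_mb=10, network_mbps=5.0))
nm.record("c2", Usage(cpu_pct=25.0, memory_mb=512))
print(nm.heartbeat()["memory_mb"])  # 1536
```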

Gain with New Architecture
• Scalability
• Availability
• Wire-compatibility
• Innovation & Agility
• Cluster Utilization
• Support for programming paradigms other than MapReduce

Scalability
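• Resource management and job management are segregated.
• The Hadoop MapReduce JobTracker spends a very significant portion of its time and effort managing the life cycle of applications; in the new architecture that work moves to the per-application ApplicationMasters.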

Availability
ResourceManager
• Uses ZooKeeper for fail-over.
• When the primary fails, the secondary can quickly take over using the state stored in ZooKeeper.
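The fail-over idea is that the active ResourceManager writes its state to a shared store, so a standby can take over by reading it back. A minimal sketch, assuming a hypothetical in-memory `StateStore` in place of ZooKeeper (it is not the ZooKeeper client API); leader election is not modelled.

```python
from typing import Dict


class StateStore:
    """In-memory stand-in for ZooKeeper; this is NOT the ZooKeeper API."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        self._data[key] = value

    def snapshot(self) -> Dict[str, str]:
        return dict(self._data)


class ResourceManagerSketch:
    """Hypothetical RM that persists application state before acknowledging."""

    def __init__(self, name: str, store: StateStore) -> None:
        self.name = name
        self.store = store
        self.apps: Dict[str, str] = {}
        self.active = False

    def become_active(self) -> None:
        # On fail-over, rebuild in-memory state from the shared store.
        self.apps = self.store.snapshot()
        self.active = True

    def submit(self, app_id: str) -> None:
        if not self.active:
            raise RuntimeError(f"{self.name} is standby")
        # Persist first, so a crash right after this call loses nothing.
        self.store.put(app_id, "ACCEPTED")
        self.apps[app_id] = "ACCEPTED"


store = StateStore()
primary = ResourceManagerSketch("rm1", store)
standby = ResourceManagerSketch("rm2", store)
primary.become_active()
primary.submit("app_1")
primary.submit("app_2")
primary.active = False        # simulate the primary failing
standby.become_active()
print(standby.apps)           # {'app_1': 'ACCEPTED', 'app_2': 'ACCEPTED'}
```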

Application Master
• MapReduce NextGen supports application-specific checkpoint capabilities for the ApplicationMaster.
• The MapReduce ApplicationMaster can recover from failures by restoring itself from state saved in HDFS.
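A minimal sketch of checkpoint-and-restore, assuming a local JSON file as a stand-in for the state the MapReduce ApplicationMaster saves in HDFS; `run_with_recovery` and the checkpoint format are hypothetical. A restarted attempt reads the checkpoint and skips the work the failed attempt already finished.

```python
import json
import os
import tempfile
from typing import Callable, List, Set


def load_checkpoint(path: str) -> Set[str]:
    """Tasks recorded as complete by an earlier AM attempt (empty if none)."""
    if not os.path.exists(path):
        return set()
    with open(path) as f:
        return set(json.load(f)["completed"])


def save_checkpoint(path: str, completed: Set[str]) -> None:
    # Write a temporary file and rename it over the old checkpoint, so a
    # crash mid-write never leaves a half-written checkpoint behind.
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"completed": sorted(completed)}, f)
    os.replace(tmp, path)


def run_with_recovery(tasks: List[str], run_task: Callable[[str], None],
                      path: str) -> List[str]:
    """Run tasks in order, skipping those an earlier attempt finished.
    Returns the tasks actually executed by this attempt."""
    completed = load_checkpoint(path)
    executed: List[str] = []
    for task in tasks:
        if task in completed:
            continue
        run_task(task)
        executed.append(task)
        completed.add(task)
        save_checkpoint(path, completed)  # checkpoint after every task
    return executed


if __name__ == "__main__":
    ckpt = os.path.join(tempfile.mkdtemp(), "am.ckpt")

    def crash_on_t3(task: str) -> None:
        if task == "t3":
            raise RuntimeError("AM attempt 1 crashed")

    try:
        run_with_recovery(["t1", "t2", "t3"], crash_on_t3, ckpt)
    except RuntimeError:
        pass
    # The second attempt restores the checkpoint and only runs t3.
    print(run_with_recovery(["t1", "t2", "t3"], lambda t: None, ckpt))  # ['t3']
```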

Wire-compatibility
• MapReduce NextGen uses wire-compatible protocols to allow different versions of servers and clients to communicate with each other.
• This enables rolling upgrades of the cluster in the future.
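One common way to keep a protocol wire-compatible is a tolerant reader: ignore fields you do not know, and give newly added fields defaults. A minimal sketch using JSON and dataclasses; the message classes are hypothetical, and Hadoop's own RPC layer (built on Protocol Buffers) is not shown.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class SubmitRequestV1:
    """Hypothetical message as an older client/server knows it."""
    app_id: str
    queue: str = "default"


@dataclass
class SubmitRequestV2:
    """Same message after a newer release added an optional field."""
    app_id: str
    queue: str = "default"
    priority: int = 0


def decode(cls, payload: str):
    """Build `cls` from JSON, tolerating version skew in both directions:
    unknown fields are dropped (old reader, new writer) and missing
    optional fields take their defaults (new reader, old writer)."""
    raw = json.loads(payload)
    known = {f.name for f in fields(cls)}
    return cls(**{k: v for k, v in raw.items() if k in known})


new_payload = json.dumps({"app_id": "app_1", "queue": "etl", "priority": 5})
old_payload = json.dumps({"app_id": "app_2"})
print(decode(SubmitRequestV1, new_payload))  # SubmitRequestV1(app_id='app_1', queue='etl')
print(decode(SubmitRequestV2, old_payload))  # SubmitRequestV2(app_id='app_2', queue='default', priority=0)
```

Because neither side rejects the other's messages, servers and clients can be upgraded one at a time, which is what a rolling upgrade needs.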

Innovation & Agility
• The new framework is generic.
• New, non-MapReduce parallel computing techniques can be built on it.
• Different versions of MapReduce can run in parallel on the same cluster.
• End users can upgrade to new MapReduce versions on their own schedule.

Cluster Utilization
• MRv2 uses a general concept of a resource for scheduling and allocating to individual applications.
• A container can be a mapper, or a reducer, or … ?
• The stubborn notion of fixed Mapper and Reducer slots is abolished.
• Result: better cluster utilization.
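A small worked comparison of why generic containers help, assuming one node with 12 GB that is either split into 4 map slots and 2 reduce slots of 2 GB each, or handed out as generic 2 GB containers, while only map tasks are waiting. The numbers and function names are illustrative assumptions, not measurements.

```python
def slot_utilization(map_slots: int, reduce_slots: int,
                     pending_maps: int, pending_reduces: int) -> float:
    """Fraction of fixed slots busy: a map task cannot use a reduce slot."""
    busy = min(map_slots, pending_maps) + min(reduce_slots, pending_reduces)
    return busy / (map_slots + reduce_slots)


def container_utilization(capacity_mb: int, container_mb: int,
                          pending_tasks: int) -> float:
    """Fraction of node memory in use when any task can take any container."""
    max_containers = capacity_mb // container_mb
    used_mb = min(max_containers, pending_tasks) * container_mb
    return used_mb / capacity_mb


# 12 GB node, 2 GB per task, 10 map tasks waiting and no reduce tasks.
print(round(slot_utilization(4, 2, pending_maps=10, pending_reduces=0), 2))  # 0.67
print(container_utilization(12 * 1024, 2 * 1024, pending_tasks=10))          # 1.0
```

With fixed slots the two reduce slots sit idle; with generic containers the same memory runs six map tasks.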

Any doubts?

http://developer.yahoo.com/blogs/hadoop/posts/2011/02/mapreduce-nextgen/