Nagarjuna K
nagarjuna@outlook.com

Requirements
• Reliability
• Availability
• Scalability: clusters of 10,000 machines and 200,000 cores, and beyond.
• Backward (and forward) compatibility: ensure customers' MapReduce applications run unchanged in the next version of the framework.
• Evolution: the ability for customers to control upgrades to the Hadoop software stack.
• Predictable latency: a major customer concern.
• Cluster utilization

Secondary Requirements
• Support for programming paradigms other than MapReduce.
• Support for short-lived services.

Need
Separate the two roles of the JobTracker:
▪ Resource management
▪ Job scheduling / management

So, what did we come up with?
• Resource Manager
• Node Manager
• Application Master
• Container

Resource Manager (RM)
Manages the global assignment of compute resources to applications.

Resource Manager (RM)
A pure scheduler:
• No monitoring or tracking of application status.
• No guarantee of restarting failed tasks.

Resource Manager (RM)
Each client/application may request multiple resource types:
• Memory
• Network
• CPU
• Disk
• ...
This is a significant change from the static Mapper / Reducer slot model.

Application Master
A per-application ApplicationMaster (AM) manages the application's life cycle (scheduling and coordination). An application is either a single job in the classic MapReduce sense or a DAG of such jobs.

Application Master
The ApplicationMaster is responsible for:
• negotiating appropriate resource containers from the Scheduler
• launching tasks
• tracking their status
• monitoring progress
• handling task failures
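The division of labor between the Resource Manager and the Application Master can be illustrated with a minimal Python sketch. Everything in it is an assumption for illustration: the classes `Resource`, `Container`, `ResourceManager` and `ApplicationMaster` and their methods are hypothetical and are not the Hadoop YARN API. The sketch models only two of the resource dimensions (memory and CPU cores), assumes a single application on the cluster, and runs tasks synchronously in rounds. The ResourceManager acts as a pure scheduler that only grants and reclaims containers; the ApplicationMaster negotiates containers, launches tasks, tracks their status and retries failures.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Resource:
    """Hypothetical resource vector; this sketch models only memory and CPU cores."""
    memory_mb: int
    vcores: int

    def fits_in(self, free: "Resource") -> bool:
        return self.memory_mb <= free.memory_mb and self.vcores <= free.vcores

    def __sub__(self, other: "Resource") -> "Resource":
        return Resource(self.memory_mb - other.memory_mb, self.vcores - other.vcores)

    def __add__(self, other: "Resource") -> "Resource":
        return Resource(self.memory_mb + other.memory_mb, self.vcores + other.vcores)


@dataclass(frozen=True)
class Container:
    container_id: int
    node: str
    resource: Resource


class ResourceManager:
    """Pure scheduler: grants and reclaims containers, never tracks or restarts tasks."""

    def __init__(self, nodes: Dict[str, Resource]) -> None:
        self.free: Dict[str, Resource] = dict(nodes)  # node name -> unallocated capacity
        self._next_id = 0

    def allocate(self, request: Resource) -> Optional[Container]:
        # First fit: grant the request on the first node with enough free capacity.
        for node, free in self.free.items():
            if request.fits_in(free):
                self.free[node] = free - request
                self._next_id += 1
                return Container(self._next_id, node, request)
        return None

    def release(self, container: Container) -> None:
        self.free[container.node] = self.free[container.node] + container.resource


# Hypothetical task: (resource request, work function taking the attempt number, returning success).
Task = Tuple[Resource, Callable[[int], bool]]


class ApplicationMaster:
    """Per-application AM: negotiates containers, launches tasks, tracks status, retries failures."""

    def __init__(self, rm: ResourceManager, tasks: Dict[str, Task], max_attempts: int = 2) -> None:
        self.rm = rm
        self.tasks = tasks
        self.max_attempts = max_attempts
        self.status: Dict[str, str] = {}

    def run(self) -> Dict[str, str]:
        attempts = {name: 0 for name in self.tasks}
        pending: List[str] = list(self.tasks)
        while pending:
            # 1. Negotiate: ask the RM for one container per pending task.
            granted = []
            for name in pending:
                container = self.rm.allocate(self.tasks[name][0])
                if container is not None:
                    granted.append((name, container))
            if not granted:
                # All earlier containers were released, so the cluster is idle
                # and these requests can never fit on any node.
                for name in pending:
                    self.status[name] = "UNSCHEDULABLE"
                break
            granted_names = {name for name, _ in granted}
            next_pending = [name for name in pending if name not in granted_names]
            # 2. Launch each task in its container, track the outcome, handle failures.
            for name, container in granted:
                attempts[name] += 1
                succeeded = self.tasks[name][1](attempts[name])
                self.rm.release(container)
                if succeeded:
                    self.status[name] = "SUCCEEDED"
                elif attempts[name] < self.max_attempts:
                    self.status[name] = "RETRYING"
                    next_pending.append(name)
                else:
                    self.status[name] = "FAILED"
            pending = next_pending
        return self.status


if __name__ == "__main__":
    rm = ResourceManager({"node1": Resource(4096, 4), "node2": Resource(2048, 2)})
    tasks = {
        "map-0": (Resource(2048, 1), lambda attempt: True),
        "map-1": (Resource(2048, 1), lambda attempt: attempt >= 2),  # fails once, then succeeds
        "reduce-0": (Resource(4096, 2), lambda attempt: True),
        "huge": (Resource(8192, 1), lambda attempt: True),  # larger than any node
    }
    print(ApplicationMaster(rm, tasks).run())
```

In this run `map-1` fails its first attempt and succeeds when the AM retries it, `reduce-0` waits one round until `node1` has room, and `huge` is reported as `UNSCHEDULABLE`. Note that the retry decision lives entirely in the AM; the RM never learns that a task failed, which mirrors the "pure scheduler" point above.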
Node Manager
The NodeManager is the per-machine framework agent responsible for launching the applications' containers, monitoring their resource usage (CPU, memory, disk, network), and reporting that usage to the Scheduler.

Gain with New Architecture
• Scalability
• Availability
• Wire-compatibility
• Innovation & Agility
• Cluster Utilization
• Support for programming paradigms other than MapReduce

Scalability
Resource management and job management are segregated. The Hadoop MapReduce JobTracker spends a very significant portion of its time and effort managing the life cycle of applications; moving that work into per-application ApplicationMasters leaves the ResourceManager to do scheduling only.

Availability
ResourceManager: uses ZooKeeper for failover. When the primary fails, a secondary can quickly take over using the state stored in ZooKeeper.
ApplicationMaster: MapReduce NextGen supports application-specific checkpointing for the ApplicationMaster. The MapReduce ApplicationMaster can recover from failures by restoring itself from state saved in HDFS.

Wire-compatibility
MapReduce NextGen uses wire-compatible protocols so that different versions of servers and clients can communicate with each other. This opens the way to rolling upgrades of the cluster in the future.

Innovation & Agility
The new framework is generic:
• Parallel computing techniques other than MapReduce can be built on it.
• Different versions of MapReduce can run in parallel.
• End users can upgrade to new MapReduce versions on their own schedule.

Cluster Utilization
MRv2 uses a general concept of a resource for scheduling and allocating to individual applications.
• A container can be a mapper, a reducer, or …?
• The rigid notion of fixed Mapper and Reducer slots is abolished.
• Better cluster utilization.

Any Doubts?

http://developer.yahoo.com/blogs/hadoop/posts/2011/02/mapreduce-nextgen/
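The Cluster Utilization gain can be made concrete with a toy Python comparison. This is a sketch under stated assumptions, not a measurement: the `Node` class, the slot counts, and the per-slot and per-task memory sizes are hypothetical, and "utilization" here simply means memory held by running tasks divided by node memory. Under the old slot model a map task can only use a map slot and a reduce task only a reduce slot; with generic containers any task can use any free memory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical node: MRv1-style slot counts plus a fixed memory size per slot."""
    map_slots: int
    reduce_slots: int
    slot_mb: int

    @property
    def capacity_mb(self) -> int:
        return (self.map_slots + self.reduce_slots) * self.slot_mb


def slot_utilization(node: Node, pending_maps: int, pending_reduces: int) -> float:
    # Fixed slots: map tasks may only use map slots, reduce tasks only reduce slots.
    running = min(node.map_slots, pending_maps) + min(node.reduce_slots, pending_reduces)
    return running * node.slot_mb / node.capacity_mb


def container_utilization(node: Node, pending_maps: int, pending_reduces: int, task_mb: int) -> float:
    # Generic containers: any task may take any free memory on the node.
    fits = node.capacity_mb // task_mb
    running = min(fits, pending_maps + pending_reduces)
    return running * task_mb / node.capacity_mb


if __name__ == "__main__":
    node = Node(map_slots=4, reduce_slots=2, slot_mb=1024)
    # Map-only phase: under the slot model the two reduce slots sit idle.
    print(slot_utilization(node, 10, 0), container_utilization(node, 10, 0, 1024))
```

With 10 pending map tasks and no reduces, the slot model keeps only 4 of the 6 slots busy (about 67% of the node's memory), while the container model fills the node. The same effect appears in reverse during a reduce-heavy phase.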