Cloud MapReduce: a MapReduce Implementation on top of a Cloud Operating System
2011 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing
Authors: Huan Liu, Dan Orban
Accenture Technology Labs
{huan.liu, dan.orban}@accenture.com
Speaker: 童耀民 MA1G0222
2013.06.11

Outline
1. INTRODUCTION
2. CLOUD MAPREDUCE ARCHITECTURE AND IMPLEMENTATION
3. PROS AND CONS OF CLOUD MAPREDUCE
4. EXPERIMENTAL EVALUATION
5. CONCLUSION

INTRODUCTION
• Like a server Operating System (OS), a cloud OS is responsible for managing resources.
• In a server (e.g., a PC), the OS is responsible for managing the various hardware resources, such as CPU, memory, disks and network interfaces, that is, everything inside a server's chassis.
• Instead of managing a single machine's resources, a cloud OS is responsible for managing the cloud infrastructure, hiding the cloud infrastructure details from the application programmers and coordinating the sharing of the limited resources.
• But unlike a traditional OS, a cloud OS is much more complex, not only because it has to manage a much bigger infrastructure, but also because it has to serve many more customers.
• We have implemented the MapReduce [1] programming model using services provided by the Amazon cloud OS.

A. Cloud OS
B. Challenges posed by a cloud OS
C. Advantages of Cloud MapReduce
  o Incremental scalability
  o Symmetry and Decentralization
  o Heterogeneity
D. Contributions

A. Cloud OS
• First, it provides compute services, such as Amazon EC2 and Windows Azure workers.
• Second, it provides storage services, such as Amazon S3 and Windows Azure blob storage.
• Third, a cloud OS provides communication services, such as Amazon's Simple Queue Service (SQS) and Windows Azure queue service. These are similar to a pipe on a UNIX OS: a user pushes messages in at one end and pops them out at the other end.
• Last, a cloud OS also provides persistent storage services, such as Amazon's SimpleDB and Windows Azure table services.

B. Challenges posed by a cloud OS
• A cloud OS's scalability comes at a price.
• It has to be traded off against other desirable system properties.

C. Advantages of Cloud MapReduce
• By using queues, we easily parallelize the Map and the Shuffling stages.
• By using Amazon's visibility timeout mechanism, we easily implement fault tolerance.
• By leveraging a cloud OS's fully distributed implementation, we are able to implement a fully distributed architecture with no single point of failure or scalability bottleneck.

D. Contributions
• First, we propose, implement and evaluate a new architecture for the MapReduce programming model on top of a cloud OS.
• The architecture also uses queues to shuffle results from Map to Reduce.

CLOUD MAPREDUCE ARCHITECTURE AND IMPLEMENTATION
• First, a queue is a synchronization point where workers (a worker is a process running on an instance) can coordinate job assignments.
• Second, a queue serves as a decoupling mechanism to coordinate data flow between different stages.
• Lastly, we use SimpleDB, which serves as the central job coordination point in our fully distributed implementation.

Cloud challenges and our general solution approaches
• Long latency: Since Amazon services are accessed through the network, the latency can be significant.
• In our measurement, SQS latency ranges from 20 ms to 100 ms, even from within EC2.
• Horizontal scaling: Although all Amazon cloud services are based on horizontal scaling, we are only able to observe one concrete manifestation: when using SimpleDB, each SimpleDB domain is only able to sustain a small write throughput.
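The visibility timeout mechanism named under the advantages above can be illustrated with a small in-memory model. This is only a sketch of the general idea, not the SQS API and not the authors' code: the class `VisibilityQueue`, its methods, the `FakeClock` helper and the 30-second timeout are all hypothetical names and values chosen for illustration. A received message is hidden rather than removed; if the worker never deletes it (for example because it crashed), the message becomes visible again and another worker can pick it up.

```python
import uuid


class FakeClock:
    """Manually advanced clock so the example does not need to sleep."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now


class VisibilityQueue:
    """In-memory stand-in for a queue with a visibility timeout (hypothetical, not SQS)."""

    def __init__(self, visibility_timeout, clock):
        self.visibility_timeout = visibility_timeout
        self.clock = clock
        # message id -> [body, time at which the message is visible again]
        self._messages = {}

    def send(self, body):
        msg_id = str(uuid.uuid4())
        self._messages[msg_id] = [body, float("-inf")]
        return msg_id

    def receive(self):
        """Return (id, body) of a visible message and hide it, or None."""
        now = self.clock()
        for msg_id, entry in self._messages.items():
            if entry[1] <= now:
                entry[1] = now + self.visibility_timeout
                return msg_id, entry[0]
        return None

    def delete(self, msg_id):
        """Called by a worker only after it has finished the task."""
        self._messages.pop(msg_id, None)


clock = FakeClock()
queue = VisibilityQueue(visibility_timeout=30.0, clock=clock)
queue.send("map split 0")

first = queue.receive()    # worker A takes the task, then fails before deleting it
hidden = queue.receive()   # the message is invisible while the timeout runs
clock.now += 31.0          # the timeout expires without a delete
retry = queue.receive()    # worker B receives the same task again
queue.delete(retry[0])     # worker B finishes and removes the message
empty = queue.receive()    # nothing is left to process
```

In this model, failure detection needs no heartbeat or master node: an unfinished task simply reappears. The trade-off is that a slow but healthy worker can cause the same task to be processed twice, so duplicate results have to be tolerated or resolved elsewhere.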
Failure detection/recovery and conflict resolution
• We use SQS's visibility timeout mechanism for failure detection and recovery.
• The user-defined Map function must implement the following interface.
• Pull iterator with sorting: In a pull iterator implementation, the user-defined reduce function must implement the following interface.
• The first is the start interface.
• For example, in the word count example, the start function initializes a count variable in object T and sets its value to 0.
• For example, in the word count example, the reduce function converts the string to a numerical value, then adds the value to the count variable stored in T.

PROS AND CONS OF CLOUD MAPREDUCE
• CMR is simpler for several reasons, including the following.
• First, S3 presents a large and reliable file storage abstraction, which relieves us of having to design our own file system.
• Second, SimpleDB presents a high-bandwidth status vault, which can sustain a high read and write (through striping) throughput.
• Third, both S3 and SQS present a single point of contact that is capable of sustaining a high throughput.
• We no longer need to worry about communicating with many nodes at the same time.
• Last, we simply use Amazon's visibility timeout mechanism to handle failure.
• No extra logic is needed to detect and recover from failure.

EXPERIMENTAL EVALUATION

CONCLUSION
• It is far from obvious that we can simplify large-scale systems' design and implementation if we build them on top of a cloud OS.
• Using MapReduce as an example, we have demonstrated that it is possible to overcome the cloud limitations without performance degradation.
• The architecture also uses queues to shuffle results from Map to Reduce.
• Even though a full-scale performance evaluation is beyond the scope of this paper, our preliminary results indicate that CMR is a practical system and its performance is on par with that of Hadoop.
• Our experimental results also indicate that using queues to overlap the map and shuffling stages seems to be a promising approach to improve MapReduce performance.
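Supplementary sketch: the word count walk-through in the implementation slides (a start call that sets a count in state object T to 0, then a reduce call that converts each string value to a number and adds it to that count) can be written out as follows. The interface itself is not reproduced in the slides, so the method names `start`, `next_value` and `complete`, the `WordCountState` class and the `run_reduce` driver are assumptions made for illustration; in particular, the `complete` step that emits the result is a hypothetical addition.

```python
class WordCountState:
    """Plays the role of the per-key state object T (hypothetical class)."""

    def __init__(self):
        self.count = 0


class WordCountReducer:
    def start(self, key, state):
        # Initialize the count variable stored in the state object to 0.
        state.count = 0

    def next_value(self, key, value, state):
        # Convert the string value to a number and add it to the count.
        state.count += int(value)

    def complete(self, key, state, output):
        # Assumed final step: emit the accumulated count for this key.
        output.append((key, state.count))


def run_reduce(reducer, key, values):
    """Hypothetical driver that feeds one key's values to the reducer one at a time."""
    state = WordCountState()
    output = []
    reducer.start(key, state)
    for value in values:
        reducer.next_value(key, value, state)
    reducer.complete(key, state, output)
    return output


result = run_reduce(WordCountReducer(), "cloud", ["1", "1", "1"])
```

Because each value is folded into the state as it arrives, a reducer of this shape never needs to hold all values for a key in memory at once.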