School of Engineering & Design
Electronic & Computer Engineering
M.Sc. Course in Distributed Computing Systems Engineering

Grid Programming with GridGain (WS5)

Team 4
Team members: Vasilios Filippidis, Nicolai Schneider, Michael Plach, Peter Schlumberger
Date: 19/02/2010

CONTENTS

1 Introduction
1.1 Background
1.1.1 Introduction to Grid Computing
1.1.2 Cloud Computing
1.1.3 GridGain
2 Problem Definition
2.1 Image and Video filters
2.2 Gridified movie filter application
3 Project Management
4 Design & Implementation
4.1 Requirements
4.2 Architecture
4.2.1 Input Data
4.2.2 Data distribution
4.2.3 Filters
4.2.4 User Interface
4.3 Implementation
4.3.1 Input Data (possibly: video splitting and merging?)
4.3.2 Data distribution
4.3.3 Execution monitoring and control
4.3.4 Filters
4.3.5 User Interface
5 Testing & Evaluation
5.1 Test specification
5.1.1 Test Environment
5.2 Evaluation
5.2.1 Non-gridified application performance
5.2.2 Gridified application performance
5.2.3 Saturation
6 Conclusion
7 References
8 Appendix

List of Figures
Figure 1 - Movie Frame Filtering
Figure 2 - Distributed Computing for Movie Frame Processing
Figure 3 - Video Processing Grid Application Overview
Figure 4 - Creation of Jobs
Figure 5 - Structure and Content of a Job Object
1 INTRODUCTION

1.1 Background

1.1.1 Introduction to Grid Computing

Grid computing is a special form of parallel computing in which the computing resources are not located in a single machine but spread over a set of connected machines. The number of connected computing resources can change dynamically, and the connected computers do not have to be of the same kind: a Grid can consist of a heterogeneous set of computers, for example a large number of desktop PCs connected via the Internet. While each node on its own has only limited computing power, together they can act like a virtual supercomputer whose combined computing power grows with the number of participating nodes.

A central issue in distributed computing is how computation tasks are distributed to the nodes of the system. To be processed in parallel on several nodes, a computation job has to be split into parts and distributed so that the available computing and communication resources are used efficiently. In real Grid computing systems, resource management is handled entirely by the Grid itself: the user application only needs to specify which resources, such as CPU or memory, a job requires and in what amount. Where these resources are allocated is decided by the Grid and its middleware.

Several computing Grids are currently in operation. One of them is the EGEE¹ Grid, Europe’s largest computing Grid, which is used by a large number of scientists from many fields of science to run computationally intensive and large-scale data analysis jobs [EGEEWS].

1.1.2 Cloud Computing

Cloud computing can be seen as a further development of Grid computing whose main focus is on making Grid-like distributed computing easier and more user-friendly.
The main differences from Grids are that clouds provide an application programming interface with support for service-level agreements, and that the computing resources in a cloud are not necessarily completely abstracted or hidden from the user application.

A number of commercial cloud computing systems can be used by anyone with an Internet connection for a fee. One example is the Google App Engine, which allows users to deploy their applications on the cloud so that they can be accessed over the web, with memory and computing resources provided by the cloud [GAE01]. Another example is the Amazon Elastic Compute Cloud (EC2) together with the Amazon Simple Storage Service (S3): EC2 offers a resizable amount of computing capacity for user applications, while S3 provides storage [AEC2WS], [AS3WS].

¹ The abbreviation EGEE stands for “Enabling Grids for E-Science”.

1.1.3 GridGain

GridGain is an open-source cloud computing middleware for Java which provides the infrastructure for building one’s own cloud computing environment. Each participating computer in a GridGain cloud is called a node. Each node can submit jobs to the cloud and can be assigned jobs by the cloud, or by another node, which it then executes. The GridGain middleware offers an application programming interface (API) with a large number of functions for executing jobs on the cloud as well as for controlling and monitoring the activities of the GridGain computing cloud. Further information about GridGain and the API documentation can be found on the project website [GG01].

In this workshop, the GridGain cloud computing platform is used as the basis for developing and analysing a distributed computing application.
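The job model described above can be illustrated with a small local simulation. The sketch below is plain Java and deliberately does not use the GridGain API: a fixed thread pool stands in for the worker nodes, and each video frame (see Section 2.2) is wrapped in one independent job. The names FrameJob, FrameResult and processVideo, the "invert" filter type and the packed 0xRRGGBB pixel format are hypothetical. Because results may come back in any order, each job carries a sequence ID, which is then used to restore the original frame order.

```java
import java.util.Arrays;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/**
 * Local simulation of the job model: a thread pool stands in for the worker
 * nodes of the cloud. This does NOT use the GridGain API; all names are hypothetical.
 */
public class FrameJobDemo {

    /** Hypothetical job object: sequence ID, filter type and image data (packed 0xRRGGBB pixels). */
    record FrameJob(int sequenceId, String filterType, int[] pixels) {

        /** Runs on a "worker node": applies the filter and tags the result with the sequence ID. */
        FrameResult execute() {
            int[] out = pixels.clone();
            if ("invert".equals(filterType)) {
                for (int i = 0; i < out.length; i++) {
                    out[i] = ~out[i] & 0xFFFFFF; // invert all three colour channels
                }
            }
            // Only inversion is sketched; any other filter type returns the frame unchanged.
            return new FrameResult(sequenceId, out);
        }
    }

    /** Processed frame together with the sequence ID needed for reassembly. */
    record FrameResult(int sequenceId, int[] pixels) {}

    /** Submits one job per frame and reassembles the results in the original frame order. */
    static int[][] processVideo(int[][] frames, String filterType, int nodes) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(nodes);
        try {
            CompletionService<FrameResult> cloud = new ExecutorCompletionService<>(pool);
            for (int seq = 0; seq < frames.length; seq++) {
                FrameJob job = new FrameJob(seq, filterType, frames[seq]);
                cloud.submit(job::execute);
            }
            int[][] video = new int[frames.length][];
            for (int i = 0; i < frames.length; i++) {
                FrameResult result = cloud.take().get(); // completion order, not submission order
                video[result.sequenceId()] = result.pixels(); // sequence ID restores frame order
            }
            return video;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        int[][] frames = {{0x000000, 0xFFFFFF}, {0x123456}, {0xFF0000}};
        int[][] video = processVideo(frames, "invert", 2);
        for (int[] frame : video) {
            System.out.println(Arrays.toString(frame));
        }
    }
}
```

In a GridGain-based version, job distribution and load balancing would be handled by the middleware instead of the local thread pool; reassembly by sequence ID would presumably still be the responsibility of the submitting node.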
2 PROBLEM DEFINITION

2.1 Image and Video filters

Image processing is a computationally intensive task because an image consists of a large number of pixels, each of which has to be processed individually. Some filters are applied to each pixel independently, for example the conversion from colour to greyscale. Other filters, such as the Gaussian filter, also take the surrounding pixels into account when processing a pixel. The implementation of such filters is described in more detail later in this report.

The idea of the project is that video filtering is a suitable task for a Grid: filtering can require a lot of computational resources, and a video can easily be split into independent parts which can be processed in parallel in a distributed computing environment such as a Grid or a cloud. Figure 1 shows the basic idea of frame extraction, frame processing and reassembly of a movie.

Figure 1 - Movie Frame Filtering

2.2 Gridified movie filter application

Since each frame of a movie can be processed independently of all other frames, a distributed approach to rendering and filtering a whole video makes sense. The overall processing effort can be split into many independent parts which are distributed over the computing cloud. Figure 2 shows this idea graphically.

Figure 2 - Distributed Computing for Movie Frame Processing

3 PROJECT MANAGEMENT

ToDo: Enter project time plan here.

4 DESIGN & IMPLEMENTATION

4.1 Requirements

The application shall be able to read video files and extract the frame images of the video. A filter operation, selected via a graphical user interface, shall be applied to the extracted images.
The image processing shall be executed in parallel on several computers using a computing cloud based on the GridGain cloud computing framework. The application shall collect the processed frames and merge them back into a video. Sound is not required, but it could be copied from the original video. The job-submitting node shall monitor the job execution performance of the individual nodes and shall furthermore control load balancing in the computing cloud using functions provided by GridGain. The application shall measure the job execution performance in order to evaluate the performance of the grid application.

4.2 Architecture

Figure 3 shows an overview of the application architecture. ToDo

Figure 3 - Video Processing Grid Application Overview
(Node 1, with the GUI for job submission: extract images from the video; store them in jobs (sequence info, ...); send the image jobs to the GridGain computing cloud; monitor processing progress and control load balancing; receive the processed image jobs; reassemble the processed images into the result video. Nodes 2, 3, ..., n: get image jobs, apply the filter to the image, return the processed image jobs.)

4.2.1 Input Data

ToDo

Figure 4 - Creation of Jobs
(Images are extracted from the video and stored in Job objects Job 1, Job 2, ..., Job n.)

Figure 5 - Structure and Content of a Job Object
(Each Job object contains an execution interface, a sequence ID, the filter type and the image data.)

4.2.2 Data distribution

ToDo

4.2.3 Filters

A grey conversion filter takes a colour image and converts each pixel from its RGB value to a grey value. Image inversion is an operation in which the RGB colour information of each pixel is inverted. A Gaussian filter acts as a low-pass filter in the frequency domain and smooths contrasts in an image: fine structures are lost, while coarse structures remain in the picture.
This filter type is realized by replacing the value of each pixel with a weighted mean of the neighbouring pixels within a defined filter size, where the weights follow a Gaussian distribution centred on the pixel; with equal weights, the operation becomes a simple mean (box) filter.

Filters under consideration: grey conversion, (blue filter,) inversion, tbd… edge filter?

4.2.4 User Interface

ToDo

4.3 Implementation

ToDo

4.3.1 Input Data (possibly: video splitting and merging?)

ToDo

4.3.2 Data distribution

ToDo

4.3.3 Execution monitoring and control

ToDo

4.3.4 Filters

ToDo

4.3.5 User Interface

ToDo: Screenshot

5 TESTING & EVALUATION

5.1 Test specification

ToDo

5.1.1 Test Environment

ToDo

5.2 Evaluation

ToDo

5.2.1 Non-gridified application performance

ToDo

5.2.2 Gridified application performance

ToDo

5.2.3 Saturation

ToDo
- pixel operations (grey / colour filter) -> probably too little computation compared to the network transmission effort for the images
- complex filters (Gauss / edge) -> can probably be sped up well
- transparent image overlay (logo into video) -> probably a very good application for the grid, because it is very computationally intensive

6 CONCLUSION

ToDo

7 REFERENCES

[AEC2WS] Amazon Elastic Compute Cloud website (visited 02.2010), http://aws.amazon.com/ec2/
[AS3WS] Amazon Simple Storage Service website (visited 02.2010), http://aws.amazon.com/s3/
[EGEEWS] EGEE website (visited 02.2010), http://www.eu-egee.org/
[GAE01] Google App Engine website (visited 02.2010), http://code.google.com/intl/de-DE/appengine/
[GG01] GridGain project website (visited 02.2010), http://www.gridgain.com/index.html
[LN01] Lecture Notes “Grid Middleware Technologies” by Dr. Li and Dr. Khan, MSc Course, Module EE5541, 2009/2010
8 APPENDIX
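As an illustration of the filter types described in Section 4.2.3, the listing below sketches grey conversion, inversion and Gaussian smoothing on java.awt.image.BufferedImage. It is a minimal sketch, not the project's implementation: the class FrameFilters and its method names are hypothetical, and the luma weights (0.299, 0.587, 0.114) and the 3×3 kernel (1-2-1 / 2-4-2 / 1-2-1) are assumptions. The first two filters touch only one pixel at a time, while the Gaussian filter reads the 3×3 neighbourhood of each pixel.

```java
import java.awt.image.BufferedImage;

/**
 * Sketch of the filter types from Section 4.2.3. Class and method names are
 * hypothetical; this is not the project's actual filter implementation.
 */
public final class FrameFilters {

    private FrameFilters() {}

    /** Grey conversion: every pixel is replaced by a weighted luminance value. */
    public static BufferedImage toGrey(BufferedImage src) {
        int w = src.getWidth(), h = src.getHeight();
        BufferedImage dst = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int rgb = src.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF, g = (rgb >> 8) & 0xFF, b = rgb & 0xFF;
                // Assumed ITU-R BT.601 luma weights; a plain average would also be possible.
                int grey = (int) Math.round(0.299 * r + 0.587 * g + 0.114 * b);
                dst.setRGB(x, y, (grey << 16) | (grey << 8) | grey);
            }
        }
        return dst;
    }

    /** Inversion: each colour channel c becomes 255 - c (per pixel, no neighbours needed). */
    public static BufferedImage invert(BufferedImage src) {
        int w = src.getWidth(), h = src.getHeight();
        BufferedImage dst = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                dst.setRGB(x, y, ~src.getRGB(x, y) & 0xFFFFFF);
            }
        }
        return dst;
    }

    /**
     * Gaussian smoothing with an assumed 3x3 kernel (1 2 1 / 2 4 2 / 1 2 1, sum 16).
     * Each output pixel depends on its neighbours; coordinates outside the image
     * are clamped to the nearest edge pixel.
     */
    public static BufferedImage gaussian3x3(BufferedImage src) {
        int[] kernel = {1, 2, 1, 2, 4, 2, 1, 2, 1};
        int w = src.getWidth(), h = src.getHeight();
        BufferedImage dst = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int r = 0, g = 0, b = 0;
                for (int dy = -1; dy <= 1; dy++) {
                    for (int dx = -1; dx <= 1; dx++) {
                        int sx = Math.min(w - 1, Math.max(0, x + dx));
                        int sy = Math.min(h - 1, Math.max(0, y + dy));
                        int rgb = src.getRGB(sx, sy);
                        int weight = kernel[(dy + 1) * 3 + (dx + 1)];
                        r += weight * ((rgb >> 16) & 0xFF);
                        g += weight * ((rgb >> 8) & 0xFF);
                        b += weight * (rgb & 0xFF);
                    }
                }
                // Divide by the kernel sum (16) to keep the overall brightness unchanged.
                dst.setRGB(x, y, ((r / 16) << 16) | ((g / 16) << 8) | (b / 16));
            }
        }
        return dst;
    }
}
```

Since all three filters work within a single frame, each of them can run inside one frame job without needing data from other frames.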