Date: 19/02/2010 Contents

School of Engineering & Design
Electronic & Computer Engineering
M.Sc. Course in Distributed Computing Systems Engineering
Grid Programming with GridGain
(WS5)
Team 4
Team members:
Vasilios Filippidis
Nicolai Schneider
Michael Plach
Peter Schlumberger
Date: 19/02/2010
CONTENTS
Table of Contents
Contents
1 Introduction
1.1 Background
1.1.1 Introduction to Grid Computing
1.1.2 Cloud Computing
1.1.3 GridGain
2 Problem Definition
2.1 Image and Video filters
2.2 Gridified movie filter application
3 Project Management
4 Design & Implementation
4.1 Requirements
4.2 Architecture
4.2.1 Input Data
4.2.2 Data distribution
4.2.3 Filters
4.2.4 User Interface
4.3 Implementation
4.3.1 Input Data (possibly: Video splitting and merging?)
4.3.2 Data distribution
4.3.3 Execution monitoring and control
4.3.4 Filters
4.3.5 User Interface
5 Testing & Evaluation
5.1 Test specification
5.1.1 Test Environment
5.2 Evaluation
5.2.1 Non-gridified application performance
5.2.2 Gridified application performance
5.2.3 Saturation
6 Conclusion
7 References
8 Appendix
List of Figures
Figure 1 - Movie Frame Filtering
Figure 2 - Distributed Computing for Movie Frame Processing
Figure 3 - Video Processing Grid Application Overview
Figure 4 - Creation of Jobs
Figure 5 - Structure and Content of a Job Object
1 INTRODUCTION
1.1 Background
1.1.1 Introduction to Grid Computing
Grid computing is a special form of parallel computing in which the computing resources are not located in a single machine but are spread over a set of connected machines. The number of connected computing resources can change dynamically. The connected computers do not have to be of the same kind; a Grid can consist of a heterogeneous set of computers.
For example, a Grid can be composed of a huge number of desktop PCs connected via the Internet. While each node on its own has only limited computing power, together they can act like a virtual supercomputer whose capacity can, in principle, be increased simply by adding more nodes.
A central issue in distributed computing is how the computation tasks are distributed to the different computing nodes of the system. To be processed in parallel on several nodes, a computation job has to be split and distributed in such a way that the available computing and communication resources are used efficiently. In real Grid computing systems, resource management is handled entirely by the Grid itself: the user application only needs to specify which resources, such as CPU time or memory, the job requires and in what amount. Where these resources are allocated is decided by the Grid and its middleware technologies.
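The splitting step can be sketched in Java as follows. This is a minimal illustration under simple assumptions: all work items (for example video frames) cost roughly the same, and each node receives one contiguous range. The class WorkSplitter and its method are hypothetical and are not part of the GridGain API; in a real Grid the middleware decides where each part is executed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not a GridGain API): splits itemCount work items into at
// most partCount contiguous ranges whose sizes differ by at most one item.
final class WorkSplitter {

    // Half-open range [start, end) of item indices.
    record Range(int start, int end) {
        int size() {
            return end - start;
        }
    }

    static List<Range> split(int itemCount, int partCount) {
        if (itemCount < 0 || partCount <= 0) {
            throw new IllegalArgumentException("itemCount >= 0 and partCount > 0 required");
        }
        List<Range> ranges = new ArrayList<>();
        int base = itemCount / partCount;      // minimum number of items per part
        int remainder = itemCount % partCount; // the first 'remainder' parts get one extra item
        int start = 0;
        for (int i = 0; i < partCount && start < itemCount; i++) {
            int size = base + (i < remainder ? 1 : 0);
            ranges.add(new Range(start, start + size));
            start += size;
        }
        return ranges;
    }
}
```

For example, split(10, 3) yields the ranges [0, 4), [4, 7) and [7, 10); if there are fewer items than nodes, some nodes simply receive no part.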
There are several computing Grids currently in operation. One of them is the EGEE¹ Grid, Europe's largest computing grid, which is used by a large number of scientists from many scientific disciplines to run computationally intensive, large-scale data analysis jobs. [EGEEWS]
1.1.2 Cloud Computing
Cloud computing can be seen as a further development of Grid computing whose main focus is on making grid-like distributed computing easier and more user-friendly. The main differences from Grids are that clouds offer an application programming interface with service-level agreement (SLA) support, and that the computing resources in a cloud are not necessarily completely abstracted or hidden from the user application.
There are a number of commercial cloud computing systems that anyone with an Internet connection can use for a fee. One example of a commercial computing cloud is the Google App Engine. It allows users to deploy their applications on the computing cloud so that they can be accessed over the web; the memory and computing resources are provided by the cloud. [GAE01]
Another example is the Amazon Elastic Compute Cloud (EC2) together with the Amazon Simple Storage Service (S3). EC2 offers a resizable amount of computing resources for user applications, and S3 provides the corresponding storage. [AEC2WS], [AS3WS]
¹ The abbreviation EGEE stands for “Enabling Grids for E-Science”.
1.1.3 GridGain
GridGain is an open-source cloud computing middleware for Java that provides the infrastructure for developing one's own cloud computing environment.
Each participating computer in the GridGain cloud is called a node. Each node can submit jobs to the cloud, and it can also be assigned jobs, by the cloud or by another node, which it then has to execute.
The GridGain middleware offers an application programming interface (API) with an extensive set of functions for processing jobs on the cloud as well as for controlling and monitoring the activities of the GridGain computing cloud.
Further information about GridGain and the API documentation can be found on the project website, see [GG01].
In this workshop, the GridGain cloud computing platform is used as the basis for the development and analysis of a distributed computing application.
2 PROBLEM DEFINITION
2.1 Image and Video filters
Image processing is a computationally intensive task because an image consists of a huge number of pixels, each of which has to be processed individually by the processor. Some filters operate on each pixel independently, for example the conversion from colour to greyscale. Other filters, such as the Gauss filter, also take the surrounding pixels into account when processing a pixel. The implementation of such filters is described in detail later. The idea of the project was that video filtering is well suited to a Grid, since the filtering can require a lot of computational resources and a video can easily be split into independent parts that can be processed in parallel in a distributed computing environment such as a Grid or a Cloud.
Figure 1 shows the basic idea of frame extraction, frame processing and reassembly of a movie.
Figure 1 - Movie Frame Filtering
2.2 Gridified movie filter application
Since each frame of a movie can be processed independently of all other frames, a distributed approach to rendering and filtering a whole video is a natural fit. The overall processing effort can be split into many independent parts which can be distributed over the computing cloud. Figure 2 illustrates this idea.
Figure 2 - Distributed Computing for Movie Frame Processing
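The frame-level parallelism can be illustrated locally with a minimal Java sketch. As an assumption for illustration, a thread pool stands in for the nodes of the computing cloud and a frame is represented as an int array of pixels; GridGain would instead send each task to a different machine. The class ParallelFrameFilter is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.UnaryOperator;

// Hypothetical local stand-in for the gridified filter: a thread pool plays the
// role of the cloud nodes, and every frame is processed as an independent task.
final class ParallelFrameFilter {

    static List<int[]> filterAll(List<int[]> frames, UnaryOperator<int[]> filter, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            // Submit one task per frame; no task needs data from any other frame.
            List<Future<int[]>> pending = new ArrayList<>();
            for (int[] frame : frames) {
                pending.add(pool.submit(() -> filter.apply(frame)));
            }
            // Collect the results in submission order, so the frame order is
            // preserved even if the tasks finish in a different order.
            List<int[]> result = new ArrayList<>();
            for (Future<int[]> future : pending) {
                try {
                    result.add(future.get());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting for a frame", e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException("filtering a frame failed", e.getCause());
                }
            }
            return result;
        } finally {
            pool.shutdown();
        }
    }
}
```

Because the frames are independent, the only coordination needed is keeping track of which result belongs to which frame; here that is done by the order of the futures.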
3 PROJECT MANAGEMENT
ToDo
Enter project time plan here
4 DESIGN & IMPLEMENTATION
4.1 Requirements
The application shall be able to read video files and extract the frame images of the video.
A filter operation, selected via a graphical user interface, shall be applied to the extracted images.
The image processing shall be executed in parallel on several computers using a computing cloud based on the GridGain cloud computing framework.
The application shall collect the processed frames and merge them back into a video. Audio is not required, but it could be copied from the original video.
The job-submitting node shall monitor the job execution performance of the individual nodes. Furthermore, it shall control the load balancing of the jobs in the computing cloud using functions provided by GridGain.
The application shall measure the job execution times in order to evaluate the performance of the grid application.
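The monitoring and measurement requirements could be supported by simple bookkeeping on the job-submitting node. The following Java sketch is an assumption about how this might look and is not part of GridGain: the class JobStats and its methods are hypothetical. It accumulates the execution time reported for each job per node and computes the usual speedup S = T_serial / T_grid.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical bookkeeping for the job-submitting node (not a GridGain API):
// records how long each node needed per job so that slow nodes can be identified,
// and computes the overall speedup for the evaluation.
final class JobStats {
    // nodeId -> {number of jobs, total execution time in nanoseconds}
    private final Map<String, long[]> perNode = new HashMap<>();

    synchronized void record(String nodeId, long durationNanos) {
        long[] entry = perNode.computeIfAbsent(nodeId, k -> new long[2]);
        entry[0] += 1;
        entry[1] += durationNanos;
    }

    synchronized long jobCount(String nodeId) {
        long[] entry = perNode.get(nodeId);
        return entry == null ? 0 : entry[0];
    }

    // Average execution time per job on the given node in milliseconds (NaN if no data).
    synchronized double averageMillis(String nodeId) {
        long[] entry = perNode.get(nodeId);
        if (entry == null || entry[0] == 0) {
            return Double.NaN;
        }
        return entry[1] / (double) entry[0] / 1_000_000.0;
    }

    // Speedup S = T_serial / T_grid.
    static double speedup(double serialSeconds, double gridSeconds) {
        return serialSeconds / gridSeconds;
    }
}
```

Comparing the per-node averages shows which nodes are slow; this information could then feed into the load-balancing decision.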
4.2 Architecture
Figure 3 shows an overview of the application architecture. todo
[Figure: Node 1, with the GUI for job submission, extracts the images from the video, stores the sequence information in each job, sends the image jobs to the GridGain computing cloud, monitors the processing progress and controls load balancing, receives the processed image jobs and reassembles the processed images into the result video. Nodes 2 to n get image jobs, apply the filter to the image and return the processed image jobs.]
Figure 3 - Video Processing Grid Application Overview
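Because the processed image jobs may return from the cloud in any order, the sequence information stored in each job is what allows Node 1 to reassemble the video correctly. A minimal Java sketch of this step, assuming integer sequence IDs from 0 to n-1 (the class FrameCollector is hypothetical), buffers the results in a sorted map keyed by sequence ID:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical collector on the job-submitting node. Processed frames arrive in
// arbitrary order; they are buffered by sequence ID and released in frame order.
final class FrameCollector<F> {
    private final int expectedFrames;
    // TreeMap keeps the entries sorted by sequence ID.
    private final SortedMap<Integer, F> received = new TreeMap<>();

    FrameCollector(int expectedFrames) {
        this.expectedFrames = expectedFrames;
    }

    // Stores a processed frame; a repeated ID (e.g. a resubmitted job) overwrites the earlier result.
    synchronized void accept(int sequenceId, F processedFrame) {
        if (sequenceId < 0 || sequenceId >= expectedFrames) {
            throw new IllegalArgumentException("unexpected sequence ID " + sequenceId);
        }
        received.put(sequenceId, processedFrame);
    }

    synchronized boolean isComplete() {
        return received.size() == expectedFrames;
    }

    // Returns all frames ordered by sequence ID, ready to be merged into the result video.
    synchronized List<F> framesInOrder() {
        if (!isComplete()) {
            throw new IllegalStateException((expectedFrames - received.size()) + " frame(s) still missing");
        }
        return new ArrayList<>(received.values());
    }
}
```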
4.2.1 Input Data
ToDo
[Figure: the images are extracted from the video and stored in Job objects Job 1 ... Job n]
Figure 4 - Creation of Jobs
[Figure: a Job object contains an execution interface, the sequence ID, the filter type and the image data]
Figure 5 - Structure and Content of a Job Object
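A job object with the contents shown in Figure 5 might be sketched in Java as below. This is an assumption, not the project's actual class: ImageJob and FilterType are hypothetical names, java.util.concurrent.Callable stands in for the execution interface (in the real application that interface would be defined by GridGain), and the image is stored as packed 0xAARRGGBB pixels. The class is Serializable so that it could be transferred between nodes.

```java
import java.io.Serializable;
import java.util.concurrent.Callable;
import java.util.function.IntUnaryOperator;

// Hypothetical filter types; only inversion is shown, further filters would be added here.
enum FilterType {
    INVERT(argb -> (argb & 0xFF000000) | (~argb & 0x00FFFFFF));

    private final IntUnaryOperator perPixel;

    FilterType(IntUnaryOperator perPixel) {
        this.perPixel = perPixel;
    }

    int apply(int argb) {
        return perPixel.applyAsInt(argb);
    }
}

// Sketch of a job object: sequence ID, filter type and image data, plus an
// execution method (Callable stands in for the grid's execution interface).
final class ImageJob implements Callable<ImageJob>, Serializable {
    private static final long serialVersionUID = 1L;

    final int sequenceId;        // position of the frame in the video
    final FilterType filterType; // filter to apply to the image
    final int width;
    final int height;
    final int[] pixels;          // image data, one packed 0xAARRGGBB value per pixel

    ImageJob(int sequenceId, FilterType filterType, int width, int height, int[] pixels) {
        if (pixels.length != width * height) {
            throw new IllegalArgumentException("pixel count does not match width * height");
        }
        this.sequenceId = sequenceId;
        this.filterType = filterType;
        this.width = width;
        this.height = height;
        this.pixels = pixels;
    }

    // Applies the filter and returns a new job carrying the processed image; the
    // sequence ID is kept so the submitting node can put the frame back in place.
    @Override
    public ImageJob call() {
        int[] out = new int[pixels.length];
        for (int i = 0; i < pixels.length; i++) {
            out[i] = filterType.apply(pixels[i]);
        }
        return new ImageJob(sequenceId, filterType, width, height, out);
    }
}
```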
4.2.2 Data distribution
ToDo
4.2.3 Filters
A grey conversion filter takes a colour image and converts the RGB value of each pixel to a single grey value.
Image inversion is an operation in which the RGB colour information of each pixel is inverted, i.e. each 8-bit colour channel value c is replaced by 255 - c.
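Both per-pixel filters can be sketched in Java on images stored as packed 0xAARRGGBB integers, the format returned by java.awt.image.BufferedImage.getRGB. The class PixelFilters is hypothetical, and using the unweighted mean of R, G and B as the grey value is an assumption; a common alternative is the weighting 0.299 R + 0.587 G + 0.114 B.

```java
import java.util.function.IntUnaryOperator;

// Hypothetical per-pixel filters on packed 0xAARRGGBB pixels; alpha is kept unchanged.
final class PixelFilters {

    // Grey conversion: all three channels are set to the mean of R, G and B.
    static int grey(int argb) {
        int alpha = argb >>> 24;
        int r = (argb >> 16) & 0xFF;
        int g = (argb >> 8) & 0xFF;
        int b = argb & 0xFF;
        int y = (r + g + b) / 3;
        return (alpha << 24) | (y << 16) | (y << 8) | y;
    }

    // Inversion: each 8-bit channel value c becomes 255 - c (bitwise NOT of the low 24 bits).
    static int invert(int argb) {
        return (argb & 0xFF000000) | (~argb & 0x00FFFFFF);
    }

    // Applies a per-pixel filter to a whole image and returns a new pixel array.
    static int[] apply(int[] pixels, IntUnaryOperator filter) {
        int[] out = new int[pixels.length];
        for (int i = 0; i < pixels.length; i++) {
            out[i] = filter.applyAsInt(pixels[i]);
        }
        return out;
    }
}
```

Each output pixel depends only on the corresponding input pixel, so the work per frame is just a few operations per pixel.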
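A Gauss filter is a low-pass filter: it attenuates high spatial frequencies and thereby smooths contrasts in an image. Fine structures are lost while coarse structures remain in the picture. Here, the filter is realized by replacing each pixel value with the mean value of all neighbouring pixels within a defined filter size. Strictly speaking, such an unweighted mean is a box (mean) filter that approximates a Gaussian blur; a true Gauss filter weights each neighbour according to its distance from the centre pixel.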
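The neighbourhood averaging described above can be sketched in Java as follows. The class MeanFilter is hypothetical; it filters a single channel (a colour image would be filtered channel by channel), uses a square window of (2r + 1) x (2r + 1) pixels for radius r, and, as an assumed border strategy, averages only the neighbours that lie inside the image. The work per pixel grows with (2r + 1)^2, so this filter is considerably more expensive than the per-pixel filters.

```java
// Hypothetical mean (box) filter on one image channel stored as channel[y][x].
final class MeanFilter {

    static int[][] apply(int[][] channel, int radius) {
        if (radius < 0 || channel.length == 0 || channel[0].length == 0) {
            throw new IllegalArgumentException("non-empty image and radius >= 0 required");
        }
        int h = channel.length;
        int w = channel[0].length;
        int[][] out = new int[h][w];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                long sum = 0;
                int count = 0;
                // Visit the (2r+1) x (2r+1) window, skipping positions outside the image.
                for (int dy = -radius; dy <= radius; dy++) {
                    int yy = y + dy;
                    if (yy < 0 || yy >= h) {
                        continue;
                    }
                    for (int dx = -radius; dx <= radius; dx++) {
                        int xx = x + dx;
                        if (xx < 0 || xx >= w) {
                            continue;
                        }
                        sum += channel[yy][xx];
                        count++;
                    }
                }
                // Rounded mean of the neighbourhood becomes the new pixel value.
                out[y][x] = (int) Math.round((double) sum / count);
            }
        }
        return out;
    }
}
```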
grey conversion, (blue filter,) inversion, tbd… edge filter?
4.2.4 User Interface
Todo
4.3 Implementation
Todo
4.3.1 Input Data (possibly: Video splitting and merging?)
Todo
4.3.2 Data distribution
todo
4.3.3 Execution monitoring and control
Todo
4.3.4 Filters
todo
4.3.5 User Interface
Todo
Screenshot
5 TESTING & EVALUATION
5.1 Test specification
Todo
5.1.1 Test Environment
todo
5.2 Evaluation
Todo
5.2.1 Non-gridified application performance
Todo
5.2.2 Gridified application performance
Todo
5.2.3 Saturation
Todo
- pixel operations (grey / colour filter)
-> probably too little computation compared to the network transmission effort for the images
- complex filters (gauss / edge)
-> can probably be sped up well
- transparent image overlay (logo into video)
-> probably a very good application for the grid, because it is very computationally intensive
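The reasoning in these notes can be made explicit with a simple cost model. This is an assumption for illustration, not a measured result: let t_comp be the time to filter one frame on one node and t_net the time to send a frame to a worker and receive the result, where all transfers go through the submitting node's single network link. Ignoring any overlap of transfer and computation, filtering F frames takes F * t_comp locally and about F * t_net + F * t_comp / n on n worker nodes, so the speedup t_comp / (t_net + t_comp / n) saturates at t_comp / t_net. For cheap pixel operations t_comp is small relative to t_net, and the grid can even be slower than local processing. The class SaturationModel is hypothetical.

```java
// Hypothetical cost model for the saturation discussion (assumed, not measured).
// tComp: time to filter one frame on one node.
// tNet:  time to transfer one frame to a worker and the result back; all transfers
//        are assumed to share the submitting node's link, so they do not run in parallel.
final class SaturationModel {

    // Speedup of n worker nodes over local processing:
    // (F * tComp) / (F * tNet + F * tComp / n); the frame count F cancels out.
    static double speedup(double tComp, double tNet, int nodes) {
        if (nodes <= 0) {
            throw new IllegalArgumentException("nodes must be positive");
        }
        return tComp / (tNet + tComp / nodes);
    }

    // Upper bound of the speedup for an unlimited number of nodes.
    static double maxSpeedup(double tComp, double tNet) {
        return tComp / tNet;
    }
}
```

With t_comp = 10 and t_net = 1 (arbitrary units), ten nodes give a speedup of 5, and no number of nodes can exceed 10 in this model.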
6 CONCLUSION
todo
7 REFERENCES
[AEC2WS]
Amazon Elastic Compute Cloud website (visited 02.2010)
http://aws.amazon.com/ec2/
[AS3WS]
Amazon Simple Storage Service website (visited 02.2010)
http://aws.amazon.com/s3/
[EGEEWS]
EGEE website (visited 02.2010)
http://www.eu-egee.org/
[GAE01]
Google App Engine website (visited 02.2010)
http://code.google.com/intl/de-DE/appengine/
[GG01]
GridGain project website (visited 02.2010)
http://www.gridgain.com/index.html
[LN01]
Lecture Notes “Grid Middleware Technologies” by Dr. Li and Dr. Khan
MSc Course, Module EE5541, 2009/2010
8 APPENDIX