The Paper

advertisement
Image Processing:
A Study in Pixel Averaging
Building a Resolution Pyramid with Parallel
Computing
By Denise Runnels &
Farnaz Zand
December 12, 2000
Abstract:
A time comparison study for image processing via pixel averaging in a parallel system is made. The
time to average the pixels of an image using one processor is compared to the time taken by multiple
processors. An abbreviated resolution pyramid is the product of this study. We do not emphasize the
quality of the resultant image we merely note the difference from one resolution to another due to the
averaging of pixel values. A Beowulf cluster is used and the results of this study are presented below.
Introduction:
A resolution pyramid is a set of files, each file of which contains the same image although with differing
resolutions. This pyramid is very useful in virtual reality applications, for example, that call for real-time
rendering of images such as a topographical scene. The image files in a pyramid are simply tiles of
that scene. When the application is running the image files/tiles with lower resolution are loaded in for
rendering the horizon while image files/tiles with higher resolution are used to render the point of focus.
These image files with different resolutions can be produced using several image processing
techniques. One such technique is to average adjacent pixels of an image to produce an image with a
lower resolution.
Image processing can be a computationally expensive undertaking, especially when the original image
is a very large image. Processing a digital image requires three basic steps: read the image data in,
manipulate that data, write the manipulated data back out. This process is readily suitable for
multiprocessing which is a natural step in the progress of computing and image processing.
This study will compare the time needed for averaging the pixel data of an image using only one
processor with the time needed to distribute and process this data using several processors. We will
observe the image quality as the resolution decreases to determine the smallest resolution with a
discernible image.
Images in the monochrome targa format will be used for this study. An image will be loaded in, the
pixel data will be averaged using a variable scalar value and an image in the monochrome targa format
will be generated using the new, averaged pixel data.
The goal of this study is to determine if multiple processors process an image more efficiently than one
processor.
1
Experimental Setup:
The system used in this experiment is a Beowulf cluster, which consists of 16 dual CPU Pentium II
(233 MHz) nodes. Each node is connected via 3COM 3c905 10/100 PCI Ethernet using a Bay
Networks 350T fast Ethernet switch.
The main program used in this study is written in C++ with calls to the MPI library for message passing.
When using multiple processors a master-slave relationship is developed so that the master processor
will read in the original image, strip the header information and pass the data in a two-dimensional array
to all of the slave processors. The master also sends the upper left corner values of a particular tile to a
particular slave so that the slave processor will know what tile to work with. A call to the system clock
starts timing the process just before the master’s first send.
Upon receiving the data from the master, the slave processors transfer the data from the particular tile
section of the original image dictated by the master into a temporary two-dimensional array. The
averaging algorithm used sums each value of the temporary array row major order in scalar sized
pieces, storing them temporarily in a result two-dimensional array at the position that will eventually
hold the final averaged value. When all of the sums are stored in the proper result array location all of
the values in the result array are divided by the square of the scalar value to determine the mean
grayscale pixel value.
The slave processor then sends this result array to the master whereupon the master consolidates the
result arrays from each of the slaves into the original image now with a lower resolution. The master
processor then calls the makeTGA function from the genTGA.h header file, which is written in C and
mapped into the main program.
The makeTGA function receives from the master a two-dimensional array of grayscale pixel values, the
upper left corner value, the result resolution and a string indicating the prefix for the .tga filename to be
generated. This function generates a monochrome .tga image file using the naming convention,
prefix_cornerX_cornerY_resolution.tga. Upon return from this function another call to the system clock
stops the timer and reports the time taken in microseconds.
When using only one processor that processor reads in the image and strips away the header. The
same averaging algorithm used by the slave processors on a portion of the original image is used on
the entire original image. The result of the averaging algorithm is sent to the makeTGA function, which
produces a resultant image of lower resolution. The system clock is called just before entering the
averaging algorithm to start the timer and then again upon returning from the makeTGA function.
These different versions are run five times for each of 1, 5, and 17 processors with three different sized
images, 1024 X 1024, 512 X 512, and 256 X 256. Each original image is scaled down to 128 X 128
beginning with a scalar value of 2 and increasing the scalar by a power of two each iteration until the
final 128 X 128 resolution is reached. That is, with scalar = 2 the resolution of the 1024 X 1024 image
is reduced to a resolution of 512 X 512, then when scalar = 4 the resolution is reduced to 256 X 256
and finally with scalar = 8 the resolution is 128 X 128.
The original images are constrained in size so that dimensions are a power of two because graphics
hardware available requires this constraint. Also, for code simplification to ensure uneven edges are
not a factor, we have limited the number of multiple processors to be even powers of two plus one, i.e.
22 + 1 and 24 +1. The maximum number of processors available is 32 so that 17 processors is the
maximum used for this project.
2
Results:
Following are representations of the data collected for each image processed.
Original Image Size
256 X 256
90
80
70
60
50
Time in
Milliseconds 40
Resolution 128
30
20
10
0
1
5
17
Number of Processors
The above chart indicates the results obtained when reducing a 256 X 256 image to an image
with resolution 128 X 128 using 1, 5, and 17 processors. With even this small of an image 5
processors are more efficient than 1, although 17 processors are far less efficient than either 1
processor or 5 processors.
The images below are the original 256 X 256 image and the reduced resolution 128 X 128
image produced. It can be seen easily that merely averaging pixel values is not an adequate means of
reducing resolution of an image. The reduced image quality is far from acceptable for a real
application.
Original Image
256 X 256
Processed Image
128 X 128
3
Original Image Size
512 X 512
250
200
150
Time in
Milliseconds
1
5
17
100
50
17
5
0
1
256
Number of
Processors
128
Reduced Resolution
The original image size 512 X 512 is reduced first to a resolution of 256 X 256 then to a
resolution of 128 X 128. Like the previous chart this indicates that 5 processors are more efficient than
1, and that 17 processors is very inefficient comparatively. It is also noted that in this case reducing the
image to the lowest resolution, i.e. 128 X 128, is faster than reducing the image to a resolution of 256 X
256 regardless of number of processors.
4
Original Image Size
1024 X 1024
900
800
700
600
500
Time in
Milliseconds 400
1
5
17
300
200
100
17
0
512
1
256
Reduced Resolution
5
Number of
Processor
128
Again with an original image resolution of 1024 X 1024 reduced to resolutions of 512 X 512,
256 X 256, and 128 X 128 the results indicate that 5 processors are more efficient than 1, and 17
processors are the least efficient.
In the previous graph, original image size 512 X 512, we observe a consistent pattern of
decrease in time for generating consecutively lower resolution images. However, with an original
image size of 1024 X 1024 this pattern of consistency is not observed. This inconsistency is seen with
the reduction to a resolution of 256 X 256 where the time increases for 5 processors as well as for 17
processors.
Conclusion:
From these results we conclude that processing images with multiple processors can be more
efficient than processing images with only one processor. We find however, that there are a threshold
number of processors after which efficiency decreases due to the message passing overhead.
It is our intent to follow this study with a more in depth look at certain aspects not considered
here. We would like to compare the efficiency of sending the entire original image to each slave
processor as we have done in this study with having the master processor divide the original image into
partitions and send only the partition to the slave. Perhaps the bottleneck up front with the master
sorting out partitions is still more efficient than sending such a large message (the entire original image)
to all of the slave processors.
We would also like to consider ways of utilizing different numbers of processors by maintaining
different constraints on the original image so that we might narrow down what the threshold number of
processors possibilities are.
This research has shown promising results for implementing image processing techniques in a
multiprocessing environment. Further research will undoubtedly reveal even more exciting outcomes.
5
References:
Benner, D (2000). Unofficial delphi developers faq. Graphics.
http://www.gnomehome.demon.nl/uddf/pages/graphics.htm (13 October 2000)
Drakos, N (1997). Date and time functions -- <time.h>.
http://iel.ucdavis.edu/people/cperez/test13/node3.html (22 September 2000)
Seyfarth, R. Wiglaf, USM Beowulf cluster. A Class Handout.
Snir, M. (1994). MPI: the complete reference.
http://netlib2.cs.utk.edu/utk/papers/mpi-book/mpi-book.html (13 September 2000)
6
Download