Understanding Resource Provisioning for ClimatePrediction.net
Malik Shahzad K. Awan and Stephen A. Jarvis
Department of Computer Science,
University of Warwick,
Coventry, UK.
malik@dcs.warwick.ac.uk
Abstract—Peer-to-peer computing, involving the participation
of thousands of general purpose, public computers, has
established itself as a viable paradigm for executing loosely-coupled, complex scientific applications requiring significant
computational resources. ClimatePrediction.net is an excellent
demonstrator of this technology, having attracted hundreds of
thousands of users from more than 200 countries for the
efficient and effective execution of complex climate prediction
models. This paper is concerned with understanding the
underlying compute resources on which ClimatePrediction.net
commonly executes. Such a study is advantageous to three
different stakeholders, namely the application developers, the
project participants and the project administrators. This is the
first study of this kind, and while offering general conclusions
with specific reference to ClimatePrediction.net, it also
illustrates the benefits of such work to scalability analysis of
other peer-to-peer projects.
Keywords: Peer-to-Peer Computing; Performance Analysis;
Climate Prediction
I. INTRODUCTION
Predicting climatic trends remains a computing Grand
Challenge, despite the significant advances in high-performance computing. The development of complex
climate models, coupled with atmospheric/ocean global
circulation models, requires considerable computing
resources for calculating several hundred years of accurate
model data. Historically, supercomputers have been used for
performing the scientific computations involved in
simulating complex climate models [1]. The peer-to-peer
(P2P) computing paradigm, which involves contributions
from (potentially) thousands of personal computers
belonging to the general public, has changed this computing
model. P2P computing provides cheap, general-purpose
computing resources with comparable computing power
(FLOP/s) to an otherwise expensive supercomputer; thus
significant computing resources are now much more readily
available, making possible previously infeasible research [2].
The contribution of P2P computing has been furthered by the
development of software-support for general-purpose
distributed computing. This includes Entropia, which we see
underlying the Great Internet Mersenne Prime Search
(GIMPS) [2, 3] project, for example, and Distributed.net,
which supports research in cryptography [2, 4]. Of
significant value is the Berkeley Open Infrastructure for
Network Computing (BOINC) research, which provides
platform support for a variety of complex scientific
applications in the P2P paradigm [2, 5].
Realizing that executing a non-linear climate
model was beyond the computational capabilities of
available supercomputers [6], the Intergovernmental Panel
on Climate Change (IPCC) in 1999 launched the
ClimatePrediction.net initiative, to explore the possible
contribution of P2P computing [7]. The ClimatePrediction.net project was ported to the BOINC platform in 2004; since then, the project has attracted more
than 220,000 users, deploying more than 429,000 hosts, from
217 different countries. Significantly, the project has achieved an average computational power of 129.694 tera floating-point operations per second (TeraFLOP/s) [8].
The ClimatePrediction.net project has involved host
machines running 200 distinct microprocessors and 21
different operating systems. Despite this wide variety of
supporting resources, there has been little work on understanding its implications for application development, project participants or project administration. This study goes some way to addressing this
issue, and while the results are specifically related to
ClimatePrediction.net, wider conclusions can be drawn with
respect to other P2P projects.
Four of the most widely deployed processors in the
ClimatePrediction.net project have been used in our analysis.
We also consider the impact of the choice of operating
system on the participating computers.
The remainder of the paper is organized as follows:
Section II details the ClimatePrediction.net project; the
description of the publicly available performance data and
the project statistics for ClimatePrediction.net are found in
Section III; Section IV presents the results of our analysis of this performance data; we identify performance challenges
in Section V, while Section VI contains the conclusions and
a description of the future directions of our research.
II. CLIMATEPREDICTION.NET
ClimatePrediction.net is a distributed computing project
producing predictions of the Earth’s climate up to the year
2100 and, at the same time, testing the accuracy of climate
prediction models. Parametric approximations, e.g., for
carbon dioxide, sulphur, etc., are analyzed to improve the
understanding of, and confidence in, the impact of these
parameters on the variety of climate prediction models
available to scientists [13].
We highlight the characteristics of the ClimatePrediction.net project relevant to this work:
A. Scientific Goal
The main scientific goal of the ClimatePrediction.net
project was to improve “methods to quantify uncertainties of
climate projections and scenarios, including long-term
ensemble simulations using complex models” [9].
B. ClimatePrediction Experiments
The project utilizes a number of different experiments. These include: a simplified single-layer slab 'ocean' atmospheric model, for analyzing the suitability of the model for climate prediction; replicating the past climate from 1920-2000 using a full atmosphere-ocean model; and simulating the climate for 2000-2100 with varying values of solar, sulphate and greenhouse forcings.
C. ClimatePrediction Model
The ClimatePrediction.net project utilizes the ‘Hadley
Centre Slab Model 3’ (HadSM3) version of the ‘Unified
Model’ (UM) [10, 11] from the UK MetOffice for the future
climatic behavior investigations [12]. The prediction model divides the world into grid boxes at a 2.75 by 3.75 degree resolution and generates 45-year simulations that require approximately 4-6 weeks of computation on a uniprocessor computer [12].
D. Source Code of the Project
The original UM code, written in FORTRAN 90, consists
of approximately 1 million lines of code and was executed
on a single-processor Unix system [12]. The source code was
later ported to object-oriented C++ and a wrapper code was
developed for providing the necessary infrastructure to the
ClimatePrediction.net project, enabling it to act like a
‘volunteer computing’ application [12]. The project deploys the MySQL database for storage [12].
E. Underlying OS Platform
The first phase of the ClimatePrediction.net project was
based on Windows and took almost 2.5 years of
development. Later, with the adoption of the BOINC
platform in 2004, underlying operating system support was
included for Linux and Apple Macintosh OS X [12].
F. System Requirements
The minimum system requirements for running the simulations on the different versions of the Windows and Linux operating systems include: 1) a 1.6 GHz microprocessor or above; 2) 256 MB of memory; and 3) at least 1 GB of available hard disk space. For a Macintosh platform, a 1.66 GHz microprocessor, 600 MB of available disk space and 192 MB of memory are required to execute the project simulations [14].
G. Simulation Results
The simulations executed on the host machines produce a detailed output of approximately 2 GB [2]; however, only 8 MB of the results (summarizing information concerning temperature, precipitation, clouds, humidity, etc.) are compressed and returned to the ClimatePrediction.net project server [12].
H. Performance Evaluation
The ClimatePrediction.net project uses the synthetic Dhrystone and Whetstone benchmarks to measure the performance of the participating computers. The BOINC platform has an accounting system that defines the notion of ‘credit’, which is a weighted combination of the computations performed, the network transfer and the storage used for computation [2]. The publicly available benchmark results for each participating ‘type’ of computer capture the microprocessor, the operating system, the measured integer performance, the measured floating-point performance, the total disk space, the free disk space, the number of CPUs in the participating computer, the RAM size, the cache size and the swap memory size [8].
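To make the notion of 'credit' concrete, the following is a minimal sketch of a weighted-combination score of the kind described above. The weights and units are illustrative assumptions on our part; they are not the actual BOINC accounting formula.

```python
# Illustrative sketch of a BOINC-style credit score as a weighted combination
# of computation performed, network transfer and storage used. The weights
# and units below are assumptions for illustration, not BOINC's real formula.

def credit_score(gflop_hours: float, mb_transferred: float, mb_stored: float,
                 w_compute: float = 1.0, w_network: float = 0.01,
                 w_storage: float = 0.005) -> float:
    """Combine the three resource contributions into a single credit value."""
    return (w_compute * gflop_hours
            + w_network * mb_transferred
            + w_storage * mb_stored)

# Example: a host that computed 500 GFLOP-hours, returned an 8 MB result file
# and held a ~2 GB working set on disk.
print(credit_score(gflop_hours=500.0, mb_transferred=8.0, mb_stored=2048.0))
```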
I. General Project Statistics
Project statistics from ClimatePrediction.net reveal that
more than 220,000 users, forming 6,961 teams, have
participated with around 429,650 machines from 217
countries across the globe [8]. The top 3 participating
countries of the project in terms of users are the United
States (53,466 users), the United Kingdom (21,056 users)
and Germany (19,478 users) [8].
A total of 200 different microprocessors have been
involved in the ClimatePrediction.net project. The Intel
Pentium platform (27,844 hosts) is reported as being the
most commonly used microprocessor, with the Intel Pentium
4 3.0GHz (22,753 hosts), Intel Pentium 4 2.8GHz (17,548
hosts), Intel Pentium 4 2.4GHz (11,646 hosts) and the Intel
Pentium 4 3.2GHz (11,364 hosts) forming four of the top
five most widely used processors in the set of host machines
[8].
Project participants have used 21 different operating
systems with only one of the operating systems having
unknown (hidden) type and name. Microsoft Windows XP
has been reportedly used in more than 273,869 host
machines, Microsoft Vista has been used on 43,015 and
Linux on 40,512 host machines. The use of FreeBSD and
Microsoft Windows 95 is uncommon [8].
III. PERFORMANCE ANALYSIS
In order to conduct a reliable performance analysis of the
ClimatePrediction.net host resources, we appeal to the five-stage decomposition of this topic proposed in [15].
These are: 1) Instrumentation – providing access to application data for measurement during execution; 2) Measurement – the measurement and storage of the application data during execution; 3) Analysis
– comprising the identification of performance trends and
potential bottlenecks; 4) Presentation – consisting of visual
representation of the measured data for manual or automated
analysis, and 5) Optimization – containing the activities
associated with the removal of identified bottlenecks. This
paper presents our preliminary findings using data supplied
by the Dhrystone and Whetstone benchmarks.
A. Scope of the Study
The scope of this study is limited to the analysis
and presentation stages of the performance analysis. Thus,
we analyze a proportion of the readily available performance
data from the ClimatePrediction.net project. This data was
collected from the publicly available data source for
scientific applications running on the BOINC platform – see
[8] for more details. Part of this study is to statistically assess
this data to determine whether it provides a sufficient
foundation for further research. If this is not the case then
new resource benchmarking tools may be required in this
context.
B. Significance of Analysis
We believe that the results of this analysis will be
significant to three different stakeholders, namely: application
developers, project participants and project administrators.
For application developers, the results provide an insight
into system performance when executing scientific
applications such as those found in ClimatePrediction.net.
The measured performance of the system, with different
parametric values for the supporting hardware components,
will provide the application developers with supporting data
so that future versions of the software can be improved.
Moreover, the results will also enable the application
designers to exploit operating system characteristics.
The general public, interested in participating in P2P
projects, will gain information on effective platforms for
scientific computations involving floating-point and integer
operations. While this may seem inconsequential, it may be
significant to future multi-core resource allocation and
sharing, which if better understood, may increase the number
of participants in the ClimatePrediction.net project.
Finally, we believe that the analysis will assist project
administrators in identifying suitable processor, operating
system and hardware component configuration for best-case
execution. The significance of this is that it may allow a new
form of time-constrained experiments to be supported. In
such a case the middleware may be instructed to postpone
the allocation of work until a more effective architecture
becomes available. This of course represents a trade-off,
balancing the likely occurrence of a more suitable
architecture against the increase in run-time of employing the
first available system. Without supporting performance data
(such as that seen here) such analysis is not possible.
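As a sketch of how such a trade-off might be encoded in middleware, consider the following fragment. The threshold and wait-limit parameters are hypothetical; BOINC exposes no such policy hook, so this is purely an illustration of the decision being described.

```python
# Illustrative decision rule (not actual BOINC middleware): postpone a
# time-constrained work unit unless the candidate host meets a minimum
# measured FP performance, or we have already waited too long for a
# better host to appear.

def should_assign(host_fp_count: float, fp_threshold: float,
                  hours_waited: float, max_wait_hours: float) -> bool:
    """Assign work if the host is fast enough, or the wait budget is spent."""
    return host_fp_count >= fp_threshold or hours_waited >= max_wait_hours

# Example: a host measuring 1045 on the FP benchmark arrives after 12 hours.
print(should_assign(1045.0, fp_threshold=1300.0,
                    hours_waited=12.0, max_wait_hours=24.0))  # False: wait
```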
C. Selected Processors
When selecting the processors for our analysis, we
consider two factors: 1) the rank of the most widely used
processors in the host machines and, 2) the credit earned per
CPU, which represents the amount of work done by a
system. Therefore we consider four (of the top 5) most
widely used processors for this analysis, based on the
average credit earned by the CPUs of this type. These
processors are: 1) the Intel Pentium 4 3.2 GHz (credit per CPU: 27,351.64); 2) the Intel Pentium 4 3.0 GHz (credit per CPU: 27,698.48); 3) the Intel Pentium 4 2.8 GHz (credit per CPU: 24,926.16); and 4) the Intel Pentium 4 2.4 GHz (credit per CPU: 16,867.05). All the processors in this study are from the Intel Pentium 4 family and support the SSE2 instruction set [20]. Such a performance analysis can be
further extended for other candidate processors belonging to
various manufacturers and representing different instruction
set architectures.
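The selection criterion can be stated compactly in code. The sketch below ranks the four candidate processors by the credit-per-CPU figures quoted above.

```python
# Rank the candidate processors by average credit earned per CPU, using the
# figures quoted in the text.

credit_per_cpu = {
    "Intel Pentium 4 3.2 GHz": 27351.64,
    "Intel Pentium 4 3.0 GHz": 27698.48,
    "Intel Pentium 4 2.8 GHz": 24926.16,
    "Intel Pentium 4 2.4 GHz": 16867.05,
}

for cpu, credit in sorted(credit_per_cpu.items(),
                          key=lambda item: item[1], reverse=True):
    print(f"{cpu}: {credit:,.2f} credit per CPU")
```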
IV. RESULTS
We analyze the performance of the processors, the operating systems and their combinations, both at an overall level and in terms of mean performance, together with the factors that affect the floating-point (FP) and integer (INT) performance across platforms and the effect of those factors on the measured FP and INT values. The data used in the analysis
was manually collected and concerns the performance of
four Intel Pentium 4 microprocessors: 3.2 GHz, 3.0 GHz, 2.8
GHz and 2.4 GHz. A total of 240 samples were collected
from [8] for the four processors – 60 data samples for each of
the four processor types. Of the 60 data samples for each
processor type, 30 represented the performance under
Windows XP, while the remaining 30 samples captured the
performance behaviour under the Linux operating system.
Our analysis reveals some interesting performance trends
among the considered processors under different operating
systems. These results are discussed in the following
subsections.
A. Processor Performance Analysis
When analyzing the overall performance, we use quantile-plots, as this aids clarity of results, while bar-charts
are used for analyzing the mean performance of processors
for FP and INT operations.
1) Overall Processor Performance for FP operations:
The quantile-plots, shown in figure 1, represent the overall
processor performance for floating point (FP) operations.
These results suggest that the 3.2 GHz processor achieved the best overall FP performance.
The highest achievable FP count value with the 3.2 GHz
processor, as compared to the 3.0 GHz, 2.8 GHz and 2.4 GHz
processors, was greater by 7.14%, 4.60% and 16.03%
respectively. However, the lowest possible FP value
measured on the 3.2 GHz processor was 4.21% lower than
that achieved using a 3.0 GHz processor, demonstrating
considerable variability in the results. The same processor's lowest value was 404.76% higher than that obtained using a 2.8 GHz processor, but 3.196% lower than the lowest FP value measured using a 2.4 GHz processor.
50% of the performance results were found in the following inter-quartile ranges (IQRs): 1) the 3.2 GHz processor, 1045-1475; 2) the 3.0 GHz processor, 988-1309.67; 3) the 2.8 GHz processor, 809-1276; and 4) the 2.4 GHz processor, 670.60-1254.50. A comparison of the IQRs reveals a significant overlap of the 50% FP values of all the processors; hence, a clear distinction at this level is not possible.
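The quartile statistics reported above can be reproduced with a few lines of NumPy; the sample array below is an illustrative placeholder rather than the paper's data.

```python
# Compute the quartiles, IQR and a simple skew indicator for a set of
# measured FP counts. The sample values are illustrative placeholders.
import numpy as np

fp_samples = np.array([636., 812., 1045., 1190., 1282., 1370., 1475., 1636.])

q1, median, q3 = np.percentile(fp_samples, [25, 50, 75])
print(f"IQR: [{q1:.1f}, {q3:.1f}] (width {q3 - q1:.1f}), median: {median:.1f}")

# A median far from the centre of the IQR indicates a skewed distribution,
# as observed for the 3.2 GHz, 3.0 GHz and 2.4 GHz processors.
print(f"Median position within IQR (0.5 = symmetric): "
      f"{(median - q1) / (q3 - q1):.2f}")
```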
The median value, i.e. the 50th percentile, represents the
central tendency of data and identifies an interesting
distribution trend. The performance instances within the IQR
for 3.2 GHz, 3.0 GHz and 2.4 GHz had a highly skewed
distribution while the values for 2.8 GHz show less of a
trend. The data ranges for the 3.2 GHz, 3.0 GHz, 2.8 GHz
and 2.4 GHz processors were 636-1636, 664-1527, 126-1564
and 657-1410 respectively.
While the maximum and minimum values for each of the
processors lead to a fairly inconclusive comparison (the
exception being the 2.8 GHz processor which exhibits high
variability), there is a clear trend at the 50th percentile. While
the trend itself may be unsurprising, the 3.2 GHz processor
achieves approximately twice the measured FP performance
of that of the 2.4 GHz processor, which clearly represents an
exploitable difference.
Figure 1. Comparison of Processor Performance on Measured FP Count

Figure 2. Comparison of Processor Performance on Measured INT Count
2) Overall Processor Performance for INT operations:
Figure 2 represents the overall processor performance for
INT operations. The quantile-plots reveal that the
performance of the 3.0 GHz processor is marginally better in
terms of achieving high INT values. The highest achievable INT value with the 3.0 GHz processor, as compared to the 3.2 GHz, 2.8 GHz and 2.4 GHz processors, showed a relative increase of 28.72%, 31.97% and 36.24% respectively. Similarly, the
lowest possible INT value measured on a 3.0 GHz processor was better than those of the remaining processors. It was 29.22% higher than that achieved using a 3.2 GHz processor; 256.50% more than that obtained using a 2.8 GHz processor and 7.98% better than that measured on a 2.4 GHz processor.
A comparison of IQR values for 3.2 GHz (1611-2500.50), 3.0 GHz (1561.75-2242), 2.8 GHz (1239-2264.50)
and 2.4 GHz (1838-2422) reveals a significant overlap of the
values of all the processors based on the inter-quartile
ranges. The median value, i.e. the 50th percentile,
representing the central tendency of the data, is less
conclusive than in the case of floating-point arithmetic. The
performance instances within the IQR for 3.2 GHz, 3.0 GHz
and 2.8 GHz processors have slightly skewed distributions;
however, the quantile-plot suggests a highly skewed
distribution for performance samples of the 2.4 GHz
processor lying within the IQR. All the data instances for each of the processors, except for the 3.0 GHz processor, were present within the upper and lower whiskers. Several of the INT values for the 3.0 GHz processor exceeded the upper whisker boundary; these were classified as ‘near outliers’ with respect to the distribution of the majority of data instances and are represented by circles. The data ranges for the 3.2 GHz, 3.0 GHz, 2.8 GHz and 2.4 GHz processors were 869-3011, 1123-3876, 315-2937 and 1040-2845 respectively.
Figure 3. Comparison of Processor Performance on Mean INT and FP Count
3) Average Processor Performance for INT and FP:
Due to the apparent overlap in the quantile-plots representing
the INT and FP performance, we further compare the
average performance values for INT and FP operations to get
a better insight into the overall processor performance for
this application. Figure 3 presents the comparison of the
average performance for both INT and FP operations. For
INT operations, the 2.4 GHz processor gave higher results
than the 3.2 GHz, 3.0 GHz and 2.8 GHz processors by
9.65%, 9.83% and 29.20% respectively. Perhaps unsurprisingly, the 3.2 GHz processor performs better for FP
operations (on average presenting a 24.03%, 32.87% and
31.63% improvement over the 3.0 GHz, 2.8 GHz and the 2.4
GHz processors). As previously stated, knowing that the 3.2
GHz processor performs better is not surprising – however,
knowing the percentage improvement over the other
processors is a feature that we can later exploit.
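The percentage improvements quoted throughout this section follow the usual relative-difference calculation, sketched below with hypothetical mean counts (the paper reports only the derived percentages).

```python
# Relative improvement of one mean benchmark count over another, as used in
# the comparisons above. The example means are hypothetical.

def pct_improvement(mean_a: float, mean_b: float) -> float:
    """Percentage by which mean_a exceeds mean_b."""
    return 100.0 * (mean_a - mean_b) / mean_b

print(f"{pct_improvement(1260.0, 1016.0):.2f}%")  # ~24%, cf. 3.2 vs 3.0 GHz FP
```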
B. Operating System Performance Analysis
The performance of operating systems (OS) for FP and
INT operations using the four processors has been analyzed
at an overall level using quantile-plots and at mean
performance levels using bar-charts. Further details of the
analysis are discussed in the following subsections.
1) Overall OS Performance for FP operations:
The performance of Windows XP and Linux on the 3.2 GHz,
3.0 GHz, 2.8 GHz and 2.4 GHz processors for floating point
operations is shown in figure 4. For the 3.2 GHz processor,
in some cases, Windows XP provides better results than
Linux, with a peak value 6.72% higher for FP operations and a lowest possible value 87.89% higher. Despite the overlap of the IQRs of Windows XP (1366-1589.75) and Linux (757-1469.50), the quantile-plots reveal that Windows XP provides an overall better performance than Linux on a 3.2 GHz processor. The median value (at the 50th percentile) shows the skewed distribution of the performance data present in the IQRs. The data ranges for the Windows XP and Linux operating systems were 1195-1636 and 636-1533 respectively.
For the 3.0 GHz processor, the non-overlapping IQRs in
the quantile-plots reveal that Windows XP (1285-1324.75)
clearly performed better than Linux (985.75-1083.50) for FP
operations, apart from a few values classified as: 1) near
outliers represented by circles; and 2) significant outliers
represented by stars. Windows XP yields a peak value 8.91% higher and a lowest value 53.01% higher than Linux. An approximately normal distribution of
performance data within the IQR for Windows XP is
represented by the median value, while it reports a skewed
distribution for Linux. The performance ranges for FP operations using the Windows XP and Linux operating systems on the 3.0 GHz processor were 1016-1527 and 664-1402 respectively.
For the 2.8 GHz processor, the quantile-plots reveal that
Windows XP clearly performs better than Linux. When
Windows XP is used with a 2.8 GHz processor, an increased
peak value for FP operations, up to 50.67%, was reportedly
achieved. Similarly, for the lowest possible value, usage of
Windows XP yields better values of up to 810.301% relative
to Linux. The median value for Windows XP and Linux
suggests a skewed distribution of performance data for both
operating systems in their respective IQRs. The data ranges
for the Windows XP and Linux operating systems were
1147-1564 and 126-1038 respectively.
For the 2.4 GHz processor, the quantile-plots reveal that
Windows XP clearly performs better than Linux. With
Windows XP, an increased peak value for FP of up to
106.44% over Linux is achieved. Similarly, for the lowest
possible value, usage of Windows XP yielded better values
of up to 59.81% relative to Linux. In terms of Linux running
on a 2.4 GHz processor, the performance values for FP
operations have been found to be very predictable. The data
ranges for the Windows XP and Linux operating systems
were 1050-1410 and 657-683 respectively.
Figure 4. Comparison of OS Performance on Measured FP Count
2) Average OS Performance for FP operations:
A comparison of Windows XP and Linux, shown in figure 5,
in terms of achieving higher average FP performance
suggested a clear performance advantage of Windows XP.
Windows XP provides a 27.76%, 28.53%, 103.33% and
84.42% average performance improvement over Linux on
3.2 GHz, 3.0 GHz, 2.8 GHz and 2.4 GHz processors
respectively.
Figure 5. Comparison of Average OS Performance for FP operations
3) Overall OS Performance for INT operations:
Figure 6 represents the quantile-plots describing the
performance of Windows XP and Linux on the 3.2 GHz, 3.0
GHz, 2.8 GHz and 2.4 GHz processors for INT operations.
In terms of the 3.2 GHz processor, the top 25% of the data instances (those above the 75th percentile) for Windows XP were higher than for Linux, except in a few cases. The performance of both Windows XP and Linux can be largely characterized by the outliers. For INT operations on a 3.2 GHz processor, Windows XP shows a peak value 8.73% lower, and a lowest value 4.20% lower, than Linux. The median values suggest a skewed distribution of performance data for both operating systems. The data ranges for the Windows XP and Linux operating systems were 869-2748 and 907-3011 respectively.
Figure 6. Comparison of OS Performance on Measured INT Count
Overall, on the 3.0 GHz processor, the Linux performance is significantly better than that of Windows XP, apart from a few outlier values. The usage of Windows XP resulted in a peak INT value up to 28.38% lower, and a lowest possible INT value up to 4.43% lower, than Linux. The median values suggested a skewed distribution of performance data for both Windows XP and Linux. The data ranges for the Windows XP and Linux operating systems were 1123-2776 and 1175-3876 respectively.
Windows XP performed relatively better than Linux for
the 2.8 GHz processor. When Windows XP is used with a
2.8 GHz processor, an increased peak value for INT
operations of up to 25.67% was reportedly achieved.
Similarly, for the lowest possible value, the use of Windows
XP yielded better values of up to 232.38% relative to Linux.
At least for 50% of the performance instances, Windows XP
gave higher results than Linux. The 50th percentiles for
Windows XP and Linux revealed a skewed distribution of
performance data. The data ranges for the Windows XP and
Linux operating systems were 1047-2937 and 315-2337
respectively.
The performance of Windows XP was found to be clearly better than that of Linux, apart from a few outlier values, on the 2.4 GHz processor. When Windows XP was used with a 2.4 GHz processor, an increased peak value for INT of up to 50.77% was reportedly achieved. However, for the lowest possible value, usage of Windows XP resulted in a value degraded by up to 40.64% relative to Linux, due to the presence of outliers associated with the Windows XP performance data. For Linux running on a 2.4 GHz processor, the performance values for INT were found to be very predictable, with few outliers. The data ranges for the Windows XP and Linux operating systems were 1040-2845 and 1752-1887 respectively.

4) Average OS Performance for INT operations:
A comparison of Windows XP and Linux for achieving higher average INT performance is shown in figure 7. Windows XP gave better average performance than Linux for INT operations on the 3.2 GHz, 2.8 GHz and 2.4 GHz processors, by 31.17%, 58.75% and 25.67% respectively. However, the average performance of Windows XP was 24.08% lower than that of Linux on the 3.0 GHz processor.
Figure 7. Comparison of Average OS Performance for INT operations
C. Factors affecting the FP and INT performance
The performance analysis reveals that the 3.2 GHz
processor is the most suitable for floating-point intensive
computations with Windows XP, while for integer intensive
computations, the 2.4 GHz processor with Windows XP is
the most suitable amongst those considered. We further investigate the factors affecting the FP and INT performance on these platforms, and the effect that changes in these factors have on the measured FP and INT values.
1) Factors affecting the FP performance:
We perform correlation analysis to identify the factors that
influence the FP performance on the 3.2 GHz processor with
Windows XP. The Pearson coefficient of correlation is
calculated using PASW Statistics 18 for identifying the
influence of RAM, Cache, Swap Memory, Total Disk Size
and Free Disk Size on the measured FP performance. The
analysis, summarized in Table I, identified Cache as the only
factor influencing the FP performance of the system with a
moderate positive strength of 0.363 and a significance value
of 0.048.
TABLE I. CORRELATION ANALYSIS FOR FP PERFORMANCE

                    RAM    Cache    Swap    Total Disk    Free Disk
Pearson            .073     .363    -.002        -.043        -.023
Sig. (2-tailed)    .700     .048     .991         .822         .903
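The same analysis can be reproduced outside PASW; the sketch below computes a Pearson coefficient with a two-tailed significance value using SciPy, on synthetic placeholder data standing in for the 30 Windows XP samples.

```python
# Pearson correlation with two-tailed significance, in the style of the PASW
# analysis above. The cache/FP data here are synthetic placeholders.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
cache_kb = rng.choice([512.0, 1024.0, 2048.0], size=30)
fp_count = 1000.0 + 0.15 * cache_kb + rng.normal(0.0, 120.0, size=30)

r, p_two_tailed = pearsonr(cache_kb, fp_count)
print(f"Pearson = {r:.3f}, Sig. (2-tailed) = {p_two_tailed:.3f}")
```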
To determine the effect of a change in the Cache size on the measured FP value for the 3.2 GHz processor with Windows XP, we performed regression analysis with several different regression models: 1) Linear; 2) Logarithmic; 3) Inverse; 4) Quadratic; 5) Cubic; 6) Compound; 7) Power; 8) S; 9) Growth; and 10) Exponential, to identify the best-suited functional
relationship between the Cache size and the measured FP
value. The R2 values for Linear (0.132), Logarithmic (0.133),
Inverse (0.134), Quadratic (0.145), Cubic (0.145),
Compound (0.143), Power (0.144), S (0.144), Growth
(0.143) and Exponential (0.143) models suggest that no single model can accurately represent the complex relationship between the Cache size and the FP performance of the system. Figure 8 shows the scatter plot and the trend-lines using these models.
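A sketch of this model-comparison step is given below: it fits three of the ten candidate functional forms to illustrative cache-size/FP-count pairs and compares their R2 values. The data points are placeholders, not the paper's samples.

```python
# Fit a few of the candidate functional forms (linear, logarithmic, power)
# and compare R^2 values. The data points are illustrative placeholders.
import numpy as np

cache = np.array([512., 512., 1024., 1024., 2048., 2048.])
fp = np.array([1080., 1150., 1210., 1330., 1360., 1450.])

def r_squared(y, y_hat):
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

fits = {
    # Linear: y = a*x + b
    "Linear": np.polyval(np.polyfit(cache, fp, 1), cache),
    # Logarithmic: y = a*ln(x) + b
    "Logarithmic": np.polyval(np.polyfit(np.log(cache), fp, 1), np.log(cache)),
    # Power: ln(y) = a*ln(x) + b, i.e. y = e^b * x^a
    "Power": np.exp(np.polyval(np.polyfit(np.log(cache), np.log(fp), 1),
                               np.log(cache))),
}

for name, y_hat in fits.items():
    print(f"{name}: R^2 = {r_squared(fp, y_hat):.3f}")
```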
Figure 8. Relationship between Cache Size and FP performance
2) Factors affecting the INT performance:
We performed correlation analysis to identify the factors influencing the INT performance of the 2.4 GHz processor with Windows XP. The Pearson coefficient of correlation is calculated using PASW Statistics 18 for finding the impact of RAM, Cache, Swap Memory, Total Disk Size and Free Disk Size on the measured INT performance. Table II contains the results of the correlation analysis. The analysis again identifies Cache as the only factor influencing the INT performance of the system, with a moderate negative strength of -0.599 and a highly significant value of 0.001.

TABLE II. CORRELATION ANALYSIS FOR INT PERFORMANCE

                    RAM    Cache    Swap    Total Disk    Free Disk
Pearson            .235    -.599    .273          .381         .358
Sig. (2-tailed)    .247     .001    .177          .060         .072

To determine the effect of a change in the Cache size on measured INT values for the 2.4 GHz processor with Windows XP, we again perform regression analysis, given in Figure 9, to best characterize the functional relationship between the Cache size and the measured INT value. The R2 values for Linear (0.359), Logarithmic (0.359), Inverse (0.359), Quadratic (0.359), Cubic (0.359), Compound (0.337), Power (0.337), S (0.337), Growth (0.337) and Exponential (0.337) models suggest once again that none of the models could accurately represent the complex relationship between the Cache size and the INT performance of the system.

Figure 9. Relationship between Cache Size and INT performance

V. CHALLENGES ASSOCIATED WITH PERFORMANCE ANALYSIS

While analyzing the performance data returned by ClimatePrediction.net, we have identified two main challenges. These are briefly discussed in the following subsections.
A. Benchmarks
The Dhrystone and Whetstone benchmarks used for
measuring the performance of the systems participating in
the ClimatePrediction.net project are traditional synthetic
benchmarks, often described in the literature as ‘toy
benchmarks’ [16]. As such, the insight they yield into the underlying resources is limited.
The performance data obtained using these synthetic benchmarks cannot be compared with a standard performance benchmark suite, e.g., SPEC 2000. The performance data collected for the ClimatePrediction.net project are in terms of millions of instructions executed per second. The SPEC 2000 suite, on the other hand, has two components: CINT2000 and CFP2000. The CINT2000 component measures the integer operation speed of a system by running 12 different programs an odd number of times (at least three). The CFP2000 component measures the floating-point operation speed of a system by running 14 different programs an odd number of times (at least three). The results are then converted to ratios with respect to a reference machine (the Sun Ultra 10); the
geometric mean of the individual results is calculated and
published [17].
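The scoring step can be illustrated in a few lines; the runtimes below are invented, but the ratio-then-geometric-mean computation follows the SPEC scheme just described.

```python
# SPEC-style scoring: convert each program's runtime to a ratio against the
# reference (Sun Ultra 10) machine, then publish the geometric mean of the
# ratios. The runtimes below are invented for illustration.
import math

ref_runtimes = [1400.0, 1800.0, 2600.0]   # reference machine, seconds
sys_runtimes = [700.0, 1000.0, 1200.0]    # system under test, seconds

ratios = [ref / sys for ref, sys in zip(ref_runtimes, sys_runtimes)]
score = math.prod(ratios) ** (1.0 / len(ratios))
print(f"SPEC-style score: {score:.2f}")
```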
B. Limitations of BOINC
There are also reported limitations of the BOINC
benchmarking system [18]. These include: 1) instability of benchmarking results across successive runs, yielding significant variations in performance results; 2) credit claims based on the reported benchmarking results for the same work units vary by up to 2X across virtually every BOINC-powered project; 3) no provision is available to prevent ‘cheating’ or ‘cooking’ of the benchmarking results; 4) the BOINC benchmarks do not provide a good characterization of the system; and 5) synthetic benchmarks are not good representatives of the real-world scientific applications running on BOINC.
Recognizing these limitations, efforts are underway to improve the benchmarking system employed [18, 19].
VI. CONCLUSIONS & FUTURE WORK
The aim of this study is to characterize the underlying
resources on which the ClimatePrediction.net project, or
similar projects that are based on the BOINC platform,
execute. By understanding more about the participating
resources we hope to: (i) Support application developers by
providing insights into system performance. The measured
performance of the system, with different parametric values
for the supporting hardware components, will provide the
application developers with supporting data so that future
versions of the software can be improved. Moreover, the
results will also enable the application designers to exploit
operating system characteristics; (ii) Support the participants
in peer-to-peer projects, by informing them as to the amount
of RAM, swap memory and main memory consumed during
execution. While this may at first appear inconsequential, it
may be significant to future multi-core resource allocation
and sharing, which if better understood, may increase the
number of participants in these P2P projects; (iii) Finally we
believe that the analysis will assist project administrators in
identifying suitable processor, operating system and
hardware component configuration for best-case execution.
The significance of this is that it may allow a new form of
time-constrained experiments to be supported. In such a case
the middleware may be instructed to postpone the allocation
of work until a more effective architecture becomes
available.
When selecting the processors for our analysis, we
consider two factors: 1) the rank of the most widely used
processors in the host machines and, 2) the credit earned per
CPU, which represents the amount of work done by a
system. Therefore we consider four (of the top 5) most
widely used processors for ClimatePrediction.net in this
analysis. A total of 240 data samples were collected, with 60
data samples representative of each of the four processors in
question. We also evaluate the impact of operating system on
the floating-point and integer performance.
While trends are apparent in many cases, and may themselves be exploited in later work, the limitations of the Dhrystone and Whetstone benchmarks from which the data is collected remain a constraint. Further work is needed to strengthen the data collection process. This in itself has some difficulties, not least the introduction of further overhead on the contributing machines. We are in the process of developing light-weight resource profiling tools that we hope to contrast with those detailed in [18]. As well as improving the accuracy of the accounting system found in BOINC, there are several other benefits to collecting accurate resource information. We envisage scenarios in which middleware may select resources on the basis of this resource data (semi-real-time analysis, QoS-managed services, etc.), where it may be advantageous to wait for better resources rather than selecting them as and when they become available. This may in itself allow a richer variety of P2P services, and thus broaden the appeal (and application) of this paradigm.
REFERENCES
[1] Stainforth, D., et al., “Distributed Computing for Public Interest Climate Modeling Research”, Computing in Science and Engineering, Vol. 4, Issue 3, May 2002, pp. 82-89.
[2] Anderson, D. P., “BOINC: A System for Public-Resource Computing and Storage”, Proc. of 5th IEEE/ACM International Workshop on Grid Computing, 2004, pp. 4-10.
[3] Entropia PC Grid Computing: http://www.entropia.com/
[4] Distributed.net: http://www.distributed.net/
[5] BOINC: http://boinc.berkeley.edu/
[6] Allen, M. R., “Possible or probable?”, Nature, 425:242, 2003.
[7] Allen, M. R., “Do-it-yourself climate prediction”, Nature, 401:642, 1999.
[8] ClimatePrediction.net Statistics: http://www.allprojectstats.com/po.php?projekt=21
[9] Working Group I, “Climate Change 2001: The Scientific Basis”, Third Assessment Report of the Intergovernmental Panel on Climate Change, Cambridge Univ. Press, Cambridge, UK, 2001.
[10] Cullen, M., “The Unified Forecast/Climate Model”, Meteor. Mag., 122:81-94, 1993.
[11] Williams, K. D., et al., “Transient climate change in the Hadley Centre models: The role of physical processes”, Journal of Climate, 14(12):2659-2674, 2001.
[12] Christensen, C., et al., “The Challenge of Volunteer Computing with Lengthy Climate Model Simulations”, Proc. of the 1st International Conference on e-Science and Grid Computing (e-Science ’05), 2005.
[13] ClimatePrediction.net: http://climateprediction.net/
[14] ClimatePrediction.net Project System Requirements: http://climateapps2.oucs.ox.ac.uk/cpdnboinc/tech_faq_boinc.php
[15] Koehler, C., Curreri, J., and George, A. D., “Performance Analysis Challenges and Framework for High-Performance Reconfigurable Computing”, Parallel Computing, Vol. 34, Issue 4-5, May 2008, pp. 217-230.
[16] Hennessy, J. L., and Patterson, D. A., Computer Architecture – A Quantitative Approach, Morgan Kaufmann Publishers, Inc., San Mateo, California, 1990.
[17] Standard Performance Evaluation Corporation (SPEC): http://www.spec.org
[18] Improved Benchmarking System Using Calibration Concepts: http://www.boincwiki.info/Improved_Benchmarking_System_Using_Calibration_Concepts
[19] New Credit System Design: http://boinc.berkeley.edu/trac/wiki/CreditNew
[20] Intel Pentium 4 Processor: http://pixel01.cps.intel.com/products/processor/pentium4/index.htm