Example: Rumor Performance Evaluation Andy Wang CIS 5930

Example: Rumor
Performance Evaluation
Andy Wang
CIS 5930
Computer Systems
Performance Analysis
Motivation
• Optimistic peer replication is popular
– Intermittent connectivity
– Availability of replicas for concurrent updates
– Convergence and correctness for updates
• Examples: Rumor, Coda, Ficus, Lotus Notes, Outlook Calendar, CVS
Background
• Replication provides high availability
• Optimistic replication allows immediate access to any replicated item, at the risk of permitting concurrent updates
• The reconciliation process makes replicas consistent (i.e., two replicas at a time for peer-to-peer)
Background Continued
• Conflicts occur when different replicas
of the same file are updated subsequent
to the previous reconciliation
Optimistic Replication Example
While connected:
  Log on Desktop: 10:00 Update, 10:25 Update
  Log on Portable: 10:00 Update, 10:25 Update
After disconnecting:
  Log on Desktop: 10:00 Update, 10:25 Update, 10:40 Update
  Log on Portable: 10:00 Update, 10:25 Update, 10:51 Update
Example Continued
While disconnected:
  Log on Desktop: 10:00 Update, 10:25 Update, 10:40 Update
  Log on Portable: 10:00 Update, 10:25 Update, 10:51 Update
After reconnecting:
• Run reconciliation
• Detect a conflict
• Propagate updates
  Log on Desktop: 10:00 Update, 10:25 Update, 10:40 Update, 10:51 Update
  Log on Portable: 10:00 Update, 10:25 Update, 10:40 Update, 10:51 Update
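The example above can be sketched in a few lines of Python. This is a minimal illustration, assuming each replica keeps a log of timestamped updates to a single file. The function name `reconcile` and the log representation are hypothetical and are not taken from Rumor.

```python
def reconcile(log_a, log_b):
    """Reconcile two update logs for one file (illustrative model only).

    Each log is a list of update timestamps as zero-padded "HH:MM" strings.
    Returns (merged_log, conflict), where conflict is True when both
    replicas hold updates that the other has not seen.
    """
    seen_a, seen_b = set(log_a), set(log_b)
    only_a = seen_a - seen_b  # updates made on A since the last reconciliation
    only_b = seen_b - seen_a  # updates made on B since the last reconciliation
    conflict = bool(only_a) and bool(only_b)
    merged = sorted(seen_a | seen_b)  # zero-padded "HH:MM" sorts chronologically
    return merged, conflict


desktop = ["10:00", "10:25", "10:40"]
portable = ["10:00", "10:25", "10:51"]
merged, conflict = reconcile(desktop, portable)
print(merged, conflict)
```

With the logs from the example, both replicas have an update the other lacks, so a conflict is flagged and both logs end up with all four updates. If only one side had new updates, they would simply be propagated.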
Goal
• Understand the cost characteristics of
the reconciliation process for Rumor
Services
• Reconciliation
– Exchange file system states
– Detect new and conflicting versions
• If possible, automatically resolve conflicts
• Else, prompt user to resolve conflicts
– Propagate updates
Outcomes
• Two reconciled replicas become
consistent for all files and directories
• Some files remain inconsistent and
require user to resolve conflicts
Metrics
• Time
– Elapsed time
• From the beginning to the completion of a reconciliation request
– User time (CPU time spent in user mode)
– System time (CPU time spent in the kernel)
• Failure rate
– Number of incomplete reconciliations and infinite loops (none observed)
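The three time metrics can be collected around any command. The sketch below is one way to do it in Python on a Unix-like system. It is not the monitor used in the study, and `time_command` and the stand-in workload are hypothetical.

```python
import os
import subprocess
import sys
import time


def time_command(argv):
    """Run a command and return (elapsed, user, system) in seconds.

    User and system times are the CPU times of the child process, read
    from the children_* fields of os.times(), which are updated once
    the child has been waited for.
    """
    before = os.times()
    start = time.perf_counter()
    subprocess.run(argv, check=True)
    elapsed = time.perf_counter() - start
    after = os.times()
    user = after.children_user - before.children_user
    system = after.children_system - before.children_system
    return elapsed, user, system


# Stand-in workload; a real measurement would run the reconciliation command.
elapsed, user, system = time_command(
    [sys.executable, "-c", "sum(i * i for i in range(200000))"]
)
print(f"elapsed={elapsed:.3f}s user={user:.3f}s system={system:.3f}s")
```

Elapsed time includes waiting on disk and network, so it can be much larger than user time plus system time.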
Metrics not Measured
• Disk access time
– Requires complex instrumentation
• E.g., buffering, logging
• Network and memory resources
– Not heavily used
• Correctness
– Difficult to evaluate
Monitor Implementation
Reconciliation Process
Perl library
Spool-to-dump
C++
Scanner
Recon
Rfindstored
Spool-to-dump
Rrecon
Server
• Top-level Perl time command
Parameters
• System parameters
– CPU (speed of local and remote servers)
– Disk (bandwidth, fragmentation level)
– Network (type, bandwidth, reliability)
– Memory (size, caching effects, speed)
– Operating system (type, version, VM management, etc.)
Parameters (Continued)
• Workload parameters
– Number of replicas
– Number of files and directories
– Number of conflicts and updates
– Size of volumes (file size)
Workloads
• Update characteristics extracted from Geoff Kuenning’s traces
[Diagram: classification of file accesses]
File access
– Read-only access
– Read-write access
  • Nonshared access
    – Read access
    – Write access
  • Shared access
    – 2-way sharing: read access, write access
    – 3+-way sharing: read access, write access
Experimental Settings
• Machine model: Dell Latitude XP
• CPU: x486 100 MHz
• RAM: 36 MB
• Ethernet: 10 Mb/s
• Operating system: Linux 2.0.x
• File system: ext3
Experimental Settings
• Should have documented the following as well
– CPU: L1 and L2 cache sizes
– RAM: brand and type
– Disk: brand, model, capacity, RPM, and the size of on-disk cache
– File system version
Experimental Design
• 2^5 × 5 full factorial design (2^5 configurations, 5 repetitions each)
• Linear regression or multivariate linear regression to model major factors
• Target: 95% confidence interval
2^5 × 5 Full Factorial Design
• Number of replicas: 2 and 6
• Number of files: 10 and 1,000
• File size: 100 and 22,000 bytes
• Number of directories: 10 and 100
• Number of updates: 10 and 450
– Capped at 10 updates for 10 files
• Number of conflicts: 0 /* typical */
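The configurations of this design can be enumerated mechanically. The sketch below uses the factor levels listed above. The names `LEVELS` and `full_factorial` are hypothetical, and the cap is implemented as one reading of "capped at 10 updates for 10 files".

```python
from itertools import product

# Two levels per factor, as listed in the design above.
LEVELS = {
    "replicas": (2, 6),
    "files": (10, 1000),
    "file_size": (100, 22000),  # bytes
    "dirs": (10, 100),
    "updates": (10, 450),
}


def full_factorial(levels):
    """Yield one dict per combination of factor levels."""
    names = list(levels)
    for combo in product(*(levels[name] for name in names)):
        config = dict(zip(names, combo))
        # Assumed reading of the cap: at most 10 updates when there are 10 files.
        if config["files"] == 10:
            config["updates"] = min(config["updates"], 10)
        yield config


configs = list(full_factorial(LEVELS))
print(len(configs))  # 2**5 = 32 configurations
```

Each configuration would then be run once per repetition.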
2^5 × 5 Full Factorial Analysis
• Experiment errors < 3%
[Figures: measured time and predicted time (seconds) vs. experimental number, one panel each for elapsed time, system time, and user time]
Variation of Effects
• All major effects significant at 95% confidence interval
[Figures: % variation explained by the top 5 effects, one bar chart each for elapsed time, user time, and system time. Factor labels appearing across the charts: # files, # updates, # files × # updates, fileSize, fileSize × # files, # dirs, # replicas]
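The percentage of variation attributed to each effect in a 2^k design can be computed with the sign-table method. The sketch below handles the unreplicated case only. With repetitions, the experimental-error sum of squares is a further component of the total. The function name and the sample responses are hypothetical and are not Rumor measurements.

```python
from itertools import combinations, product
from math import prod


def variation_explained(k, y):
    """Return {effect: fraction of variation} for an unreplicated 2^k design.

    y lists one response per run, in the order produced by
    itertools.product((-1, 1), repeat=k), so the last factor varies fastest.
    An effect is keyed by the tuple of factor indices it involves.
    """
    runs = list(product((-1, 1), repeat=k))
    n = len(runs)
    if len(y) != n:
        raise ValueError("expected one response per run")
    effects = {}
    for order in range(1, k + 1):
        for subset in combinations(range(k), order):
            # Sign-table column: product of the signs of the factors involved.
            column = [prod(run[i] for i in subset) for run in runs]
            effects[subset] = sum(s * v for s, v in zip(column, y)) / n
    sst = n * sum(q * q for q in effects.values())  # total variation
    return {subset: n * q * q / sst for subset, q in effects.items()}


# Hypothetical responses for a 2^2 design with factors A (index 0) and B (index 1).
# Run order: (A-, B-), (A-, B+), (A+, B-), (A+, B+)
shares = variation_explained(2, [15, 25, 45, 75])
for effect, share in shares.items():
    print(effect, f"{100 * share:.1f}%")
```

Sorting the resulting shares and keeping the five largest gives a "top 5 effects" ranking of the kind shown in the charts.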
Residuals vs. Predicted Time
• Clusters caused by dominating effects of files
[Figures: residuals (seconds) vs. predicted time (seconds), one panel each for elapsed time, system time, and user time]
Residuals vs. Experiment Numbers
• Residuals are nearly homoscedastic
[Figures: residuals vs. experimental number, one panel each for elapsed time, system time, and user time]
Quantile-Quantile Plot
• Residuals are nearly normally distributed
[Figures: residual quantiles vs. normal quantiles, one panel each for elapsed time, system time, and user time]
Fitted lines:
– Elapsed time: y = 5.6125x + 2E-15, R² = 0.9757
– System time / user time: y = 0.1125x - 2E-18, R² = 0.9863; y = 0.1242x - 3E-16, R² = 0.9524
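A fitted line and R² of the kind reported on a quantile-quantile plot can be computed as follows. This is a sketch, assuming the plotting position (i - 0.5)/n for the i-th smallest residual, which is one common choice. The function name `qq_fit` and the sample residuals are hypothetical.

```python
import random
from statistics import NormalDist, mean


def qq_fit(residuals):
    """Return (slope, intercept, r_squared) of the normal Q-Q line."""
    n = len(residuals)
    ys = sorted(residuals)
    # Normal quantile paired with the i-th smallest residual.
    xs = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    syy = sum((y - y_bar) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r_squared = sxy * sxy / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical residuals: exact normal quantiles scaled by 2, in shuffled order.
sample = [2 * NormalDist().inv_cdf((i - 0.5) / 20) for i in range(1, 21)]
random.Random(0).shuffle(sample)
slope, intercept, r_squared = qq_fit(sample)
print(slope, intercept, r_squared)
```

The slope estimates the standard deviation of the residuals, and an R² close to 1 indicates that the points lie near a straight line, which supports the normality assumption. The near-zero intercepts are expected because least-squares residuals have zero mean.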
Multivariate Regression
• Number of replicas: 2
• Number of files: 4 levels, 10-600
• File size: 22,000 bytes
• Number of directories: 4 levels, 10-60
• Number of updates: 0
• Number of conflicts: 0 /* typical */
• Number of repetitions: 5 per data point
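With two varying factors (number of files and number of directories), a model of the form time = b0 + b1 × files + b2 × dirs can be fitted by least squares. The sketch below solves the normal equations in plain Python. The function names are hypothetical, and both the intermediate factor levels and the timings are made up for illustration.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_linear(rows, y):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2 + ... (normal equations)."""
    design = [[1.0] + [float(v) for v in row] for row in rows]
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * v for d, v in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)


# Hypothetical 4 x 4 grid of (files, dirs). Only the endpoints 10-600 and
# 10-60 come from the setup; the other levels and the timings are assumed.
files_levels = (10, 200, 400, 600)
dirs_levels = (10, 25, 40, 60)
rows = [(f, d) for f in files_levels for d in dirs_levels]
y = [5.0 + 0.2 * f + 0.1 * d for f, d in rows]
b0, b1, b2 = fit_linear(rows, y)
print(b0, b1, b2)
```

Residuals are then the differences between each measured time and the value predicted from the fitted coefficients.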
Multivariate Regression
• Experiment errors < 7%
• All coefficients are significant
[Figures: measured time and predicted time (seconds) vs. experiment number, one panel each for elapsed time, user time, and system time]
Residuals vs. Predicted Time
• Elapsed time shows a bimodal trend
• User time shows an exponential trend
[Figures: residuals (seconds) vs. predicted time (seconds), one panel each for elapsed time, system time, and user time]
Residuals vs. Experiment Numbers
• Not so good for elapsed time and user time
[Figures: residuals vs. experiment number, one panel each for elapsed time, user time, and system time]
Quantile-Quantile Plot
• Residuals are not normally distributed for elapsed time and user time
[Figures: residual quantiles vs. normal quantiles, one panel each for elapsed time, system time, and user time]
Fitted lines:
– Elapsed time: y = 5.6775x - 4E-14, R² = 0.8407
– System time: y = 0.1321x - 2E-15, R² = 0.9789
– User time: y = 0.4811x - 2E-15, R² = 0.9243
Log Transform (User Time)
• ANOVA tests failed miserably
[Figures for user time after the log transform: residuals (seconds) vs. predicted time (seconds); residuals vs. experiment number; residual quantiles vs. normal quantiles with fitted line y = 0.0222x - 1E-15, R² = 0.8709]
Residual Analyses (User Time)
• No indications that transforms can help…
[Figures: standard deviation of residuals vs. mean user time; variance of residuals vs. mean user time; standard deviation of residuals vs. mean user time squared]
Possible Explanations
• i-node related factors
– Number of files per directory block
– Crossing block boundary may cause anomalies
• Caching effects
– Reboot needed across experiments
Linear Regression
• Number of files: 100, 150, 200, 250, 252, 253, 300, 350, 400, 450
– Test for the boundary-crossing condition as the number of files exceeds one block
– Note that Rumor has hidden files
• Number of repetitions: 5 per data point
• Flush cache (reboot) before each run
Linear Regression
• R² > 80%
• All coefficients are significant
[Figures: measured time, predicted time, and 95% confidence interval (seconds) vs. number of files, one panel each for elapsed time, user time, and system time]
Residuals vs. Predicted Time
• Elapsed time shows a bimodal trend
• User time shows an exponential trend
[Figures: residuals (seconds) vs. predicted time (seconds), one panel each for elapsed time, system time, and user time]
Residuals vs. Experiment Numbers
• Elapsed time shows a rising bimodal trend
– Randomization of experiments may help
[Figures: residuals vs. experiment number, one panel each for elapsed time, user time, and system time]
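Randomizing the run order keeps slow drifts (for example, state that accumulates across runs) from lining up with the factor levels. A minimal sketch follows, using the file counts and repetition count from the linear regression setup. The function name `randomized_schedule` and the seed are hypothetical.

```python
import random


def randomized_schedule(levels, repetitions, seed=None):
    """Return every (level, repetition) run in a shuffled order."""
    rng = random.Random(seed)  # a fixed seed makes the schedule reproducible
    runs = [(level, rep) for level in levels for rep in range(1, repetitions + 1)]
    rng.shuffle(runs)
    return runs


# File counts and repetition count from the linear regression setup.
file_counts = [100, 150, 200, 250, 252, 253, 300, 350, 400, 450]
schedule = randomized_schedule(file_counts, repetitions=5, seed=1)
for n_files, rep in schedule[:3]:
    print(f"run with {n_files} files (repetition {rep})")
```

Recording the schedule alongside the measurements makes it possible to plot residuals against the actual execution order afterwards.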
Quantile-Quantile Plot
• Residuals for elapsed time are not normally distributed
– Perhaps piecewise normal
[Figures: residual quantiles vs. normal quantiles, one panel each for elapsed time, system time, and user time]
Fitted lines:
– Elapsed time: y = 5.8218x + 5E-15, R² = 0.878
– System time / user time: y = 0.0976x - 4E-16, R² = 0.9693; y = 0.2134x + 2E-15, R² = 0.9709
Possible Explanations
• i-node related factors: No
• Caching effects: No
• Hidden factors: Maybe
• Bugs: Maybe
Conclusion
• Identified the number of files as the dominating factor for Rumor running time
• Observed the existence of an unknown factor in the Rumor performance model