SYSTEM PERFORMANCE ANALYSIS AND OPTIMIZATION

(Ch.9 from Laplante, 1997)
Recall that response time is the time between receipt of an interrupt
and completion of all associated processing. Time-loading (utilization) is the
percentage of time the CPU is doing “useful” processing. Memory-loading is
the percentage of usable memory that is being used.
RESPONSE TIME CALCULATION
In general, the response time for task I, denoted Ri, is
Ri=Li+Cs+Si+Ai
where Li is the interrupt latency (nanoseconds), Cs is the context save time
(microseconds), Si is the schedule time (microseconds), and Ai is the actual
process time (milliseconds).
For highest-priority tasks, the total interrupt latency can be computed
as
Li=Lp+max(LI, LD)
where Lp is the interrupt latency due to the propagation delay of the interrupt
signal, LI is the longest completion time for an instruction in the interrupted
process, and LD is the maximum time interrupts are deliberately disabled
by lower-priority routines during, for example, context switching or
buffer passing.
For lower-priority tasks, an interrupt cannot be processed until all higher-priority routines have completed. In this case,
Li=LH,
where LH is the time needed to complete all higher-priority routines.
Calculating LH is difficult or impossible for most systems, because process i
itself might be interrupted.
TIME-LOADING AND ITS MEASUREMENT
If Ti is the cycle time (or minimum time between occurrences) for task i,
and Ai is the actual execution time, then the time-loading (utilization) T for n
tasks is:

T = sum(i=1..n) Ai/Ti
SCHEDULING PROBLEMS AND ESTIMATIONS
Most scheduling problems involving real-time systems are NP-complete
(no algorithms faster than exponential time are known for them):
1. When there are mutual exclusion constraints, it is impossible to find a
totally on-line optimal run-time scheduler
2. The problem of deciding whether it is possible to schedule a set of
periodic tasks that use semaphores only to enforce mutual exclusion is
NP-hard (even an exponential-time solution is not known).
3. The multiprocessor scheduling problem with two processors, no
resources, independent tasks, and arbitrary computation times is NP-complete (for unit computation times it is polynomial).
4. The multiprocessor scheduling problem with three or more
processors, one resource, independent tasks, and unit computation times
for each task is NP-complete.
REDUCING RESPONSE TIMES AND TIME-LOADING
1. Compute at Slowest Cycle
All processing should be performed at the slowest rate that can be tolerated.
For example, checking a temperature discrete for a large room more often
than once per second may be wasteful.
2. Scaled Arithmetic
Integer operations are typically faster than floating point operations for
most computers. We can take advantage of that fact in certain systems by
multiplying integers by a scale factor to simulate floating point
operations. This solution was one of the first methods for implementing
real-number operations in the early computers. Here a two’s complement
number is used, the LSB (least significant bit) of which is assigned a
scale factor, which is sometimes called granularity of the number. If the
number is an n-bit two’s complement integer, then the MSB (most
significant bit) of the number acts like a sign bit. The largest number that
can be represented this way is (2^(n-1) - 1)*LSB, and the smallest number that
can be represented is -2^(n-1)*LSB.
Example: Consider the aircraft navigation system in which x, y, and z
accelerometer pulses are converted into actual accelerations by applying
the scale factor of 0.01. The 16-bit number 0000 0000 0001 0011 then
represents a delta velocity of 19*0.01 = 0.19 feet per second. The largest
and smallest delta velocities that can be represented in this scheme are
327.67 and -327.68 feet per second, respectively.
Scaled numbers can be added to and subtracted from each other, and
multiplied or divided by a constant (but not by another scaled number), as
signed integers. Thus computations involving such numbers can be performed
in integer form and converted to floating point only at the last step.
3. Look-Up Tables
Look-up tables rely on the mathematical definition of the derivative:

f'(x) = lim as dx->0 of [f(x+dx) - f(x)] / dx

A generic look-up table is an array of precomputed values of f at points x
taken with step dx. All intermediate values can be interpolated as follows:

f(x') ≈ f(x) + (x' - x) * [f(x+dx) - f(x)] / dx

The choice of dx represents a tradeoff between the size of the table and the
desired resolution of the function.
BASIC OPTIMIZATION THEORY
1. Use of Arithmetic Identities
For example, multiplication by the constant “1” or addition of 0 should
be eliminated from the executable code.
2. Reduction in Strength
This method refers to the use of the fastest macroinstruction to accomplish a
given calculation. For example, many compilers will replace multiplication
of an integer by another integer that is a power of 2 with a series of shift
instructions. Divide instructions usually take longer to execute than
multiply instructions; hence, it may be better to multiply by the reciprocal
of a number than to divide by that number. For example, x*0.5 will be
faster than x/2.0.
3. Common Sub-Expression Elimination
The following Pascal fragment
x:=y+a*b;
y:=a*b+z;
could be replaced with:
t:=a*b;
x:=y+t;
y:=t+z;
eliminating the additional multiplication.
4. Intrinsic Functions
When possible, use intrinsic functions rather than ordinary functions.
Intrinsic functions are simply macros where the actual function call is
replaced by in-line code during compilation:
#define max(A,B) ((A)>(B)?(A):(B))
This improves real-time performance because the need to pass
parameters, create space for local variables, and release that space is
eliminated.
5. Constant Folding
The statement
x:=2.0*x*4.0;
could be optimized by folding 2.0*4.0 to 8.0.
Although the original statement may be more descriptive, a comment can be
provided to explain the optimized expression.
Also, mnemonic names can be used. For example, pi/2 can be precomputed
and stored as a constant named pi_div_2.
6. Loop Invariant Optimization
Consider the following Pascal fragment:
x:=100;
while x>0 do
x:=x-(y+z);
It can be replaced by
x:=100;
t:=y+z;
while x>0 do
x:=x-t;
This moves an instruction outside the loop (decreases time) but increases
memory requirements.
7. Loop Induction Elimination
A variable i is called an induction variable of a loop if, every time the
loop variable changes, i is incremented or decremented by some constant.
Consider the following Pascal fragment:
for i:=1 to 10 do
a[i+1]:=1;
An improved version is
for j:=2 to 11 do
a[j]:=1;
eliminating the extra addition within the loop.
8. Use of Registers and Caches
When programming in assembly language, or when using languages that
support register-type variables, such as C, it is usually advantageous to
perform calculations using registers:
f(register unsigned m, register long n){
register int i;
..
Although most optimizing compilers will cache variables where possible,
the nature of the source-level code affects the compiler's ability to do so.
9. Removal of Dead or Unreachable Code
For example, instead of
if(debug){
..
}
it is better to use
#ifdef DEBUG
{
..
}
#endif
10.Flow Control Optimization
The following pseudo code
goto label11;
label10: y=1;
label11: goto label12;
can be replaced by
goto label12;
label10: y=1;
label11: goto label12;
Such code is not normally generated by programmers but might result
from automatic generation.
11.Constant Propagation
Certain variable assignment statements can be changed to constant
assignments, thereby saving time. For example:
x:=100;
y:=x;
is implemented in two-address assembly by a non-optimizing compiler as
LOAD R1,100
STORE R1,x
LOAD R1,x
STORE R1,y
This could be replaced by
x:=100;
y:=100;
with the associated two-address assembly output:
LOAD R1,100
STORE R1,x
STORE R1,y
12.Dead-Store Elimination
The following Pascal code illustrates dead-store:
t:=y+z;
x:=func(t);
This could be replaced by
x:=func(y+z);
if t is not used in other statements.
13.Dead Variable Elimination
A variable is live at a point in a program if its value can be used
subsequently; otherwise it is dead and subject to removal. The following
fragment illustrates that x is a dead variable:
int main(){
int x;
return 1;
}
After dead variable elimination:
int main(){
return 1;
}
14.Short-Circuiting Boolean Code
if (x>0) and (y>0) then
z:=1;
If x<=0, then there is no need to check y>0:
if x>0 then
if y>0 then
z:=1;
15.Loop Unrolling
Loop unrolling duplicates statements in order to reduce the number of loop
iterations and hence the loop overhead incurred:
for i:=1 to 6 do
a[i]:=a[i]*8;
may be replaced by:
for i:=1 to 6 step 3 do begin
a[i]:=a[i]*8;
a[i+1]:=a[i+1]*8;
a[i+2]:=a[i+2]*8
end;
16.Loop Jamming
Loop jamming, also called loop fusion, combines similar loops into one:
for i:=1 to 100 do
x[i]:=y[i]*8;
for i:=1 to 100 do
z[i]:=x[i]*y[i];
is converted to
for i:=1 to 100 do begin
x[i]:=y[i]*8;
z[i]:=x[i]*y[i]
end;
17.Cross Jump Elimination
If the same code appears in more than one case in a case statement, then
such cases can be combined:
case x of
0: x:=x+1; break;
1: x:=x*2; break;
2: x:=x+1; break;
3: x:=2;
end;
can be replaced by
case x of
0,2: x:=x+1; break;
1: x:=x*2; break;
3: x:=2;
end;
OTHER OPTIMIZATION TECHNIQUES
1. Optimize the most frequently used path
2. Arrange a series of IF statements so that the most likely to fail
condition is tested first
3. Arrange a series of AND conditions so that most likely to fail
condition is tested first (in the case of OR conditions – so that most
likely to succeed condition is tested first)
4. Arrange entries in the table so that the most frequently sought values
are the first to be compared
5. Replace threshold tests on monotone (continuously nondecreasing or
nonincreasing) functions by tests on their parameters. For example,
instead of
if exp(x)<exp(y) then
use
if x<y then
6. Link the most frequently used procedures together to maximize the
locality of reference (for paged and cached systems)
7. Store data elements that are used concurrently together (to increase
locality of reference)
8. Store procedures in sequence so that calling and called procedures
will be loaded together (to increase locality of reference)
ANALYSIS OF MEMORY REQUIREMENTS
Memory is considered as consisting of program, stack, and RAM areas.
The total memory loading is a weighted sum of the individual memory
loadings for the program, stack, and RAM:

MT=MP*PP+MR*PR+MS*PS,

where MP, MR, MS are the memory loadings of the program, RAM, and stack
parts, and PP, PR, PS are the percentages of total memory allocated to the
program, RAM, and stack areas. For example, suppose a computer system has
64 Mb of program memory loaded at 75%, 24 Mb of RAM loaded at 25%, and
12 Mb of stack area loaded at 50%. The total memory loading is

MT=0.75*64/100+0.25*24/100+0.50*12/100=60%
MP=UP/TP
where UP is the number of locations used in the program area and TP is the
total number of available locations in the program area.
MR=UR/TR
where UR is the number of locations used in the RAM area and TR is the
total available number of locations in RAM. The values UP, TP, UR, and
TR are available from the linker.
MS=US/TS
where TS is the total available number of locations in the stack area, and
US=CS*Tmax
where CS is the number of locations allocated for one task, and Tmax is the
maximum number of tasks that can reside simultaneously in the stack
area.
REDUCING MEMORY-LOADING
Memory-loading may be reduced by proper choice of the target area for
variables, by reuse of variables, and by use of self-modifying code (which
is dangerous and not allowed in many cases).
QUEUEING MODELS
Basic Buffer Size Calculation
If data are produced at a rate P(t) and consumed at a rate C(t), and a
burst of data takes place over a period T, then the buffer size can be
calculated as

B=(P-C)*T

if P and C are constants. If they are functions of time and the burst of
data takes place between t1 and t2, then the buffer size is

B = max over (t1, t2) of the integral from t1 to t2 of (P(t)-C(t)) dt
If the rates of production and consumption are random values with some
distribution, we arrive at the queuing model

X/Y/n

where X denotes the arrival-time probability function, Y the service-time
probability function, and n the number of servers. For example, n may be
the number of processors, X the distribution of the times at which
interrupts arise, and Y the distribution of the times needed to handle the
interrupts by the respective processes. Hence, M/M/1 denotes a system with
one processor serving interrupts that arise according to an exponential
distribution and require exponentially distributed service times. The
exponential distribution is given by

f(t) = lambda*e^(-lambda*t)

This means that the probability that a new interrupt arises at a time
instant T inside the time interval (a,b) is

p(a <= T <= b) = integral from a to b of f(t) dt = e^(-lambda*a) - e^(-lambda*b)
a
The value 1/lambda gives the average time between two consecutive
interrupts, and lambda is the average rate of arrival of interrupts. The
respective mean service time we denote by 1/mu. To avoid an infinite
number of interrupts in the queue we require

lambda < mu

that is, the mean rate of interrupts must be less than the rate of serving
them. If we denote rho = lambda/mu, then the average number of customers
(interrupts) in the queue is given by

N = rho/(1 - rho)     (1)

with variance

var(N) = rho/(1 - rho)^2     (2)
(2)
For a random value x with density f(x), the average value x_av is defined as

x_av = integral from -infinity to +infinity of x*f(x) dx

and the variance is

var(x) = integral from -infinity to +infinity of (x - x_av)^2 * f(x) dx
The average time a customer (interrupt) spends in the system is

T = (1/mu)/(1 - rho)     (3)

The probability that at least k customers are in the queue is

P(number of customers >= k) = rho^k     (4)
The buffer size for waiting interrupts is determined by the average queue
size N in (1). With the help of (1) and (2) we can decide what maximal
buffer size should be used. Expression (3) allows us to evaluate response
times. Expression (4) can be used to choose system parameters so that
there will be no more than a specified number of pending requests in the
system.
LITTLE’S LAW
Little's law (which appeared in 1961) states that the average number of
customers in a queuing system, N_av, is equal to the average arrival rate
of the customers to that system, r_av, times the average time spent in
that system, t_av:

N_av = r_av * t_av

If n servers are present, then

N_av = sum(i=1..n) r_i,av * t_i,av

where r_i,av is the average arrival rate of customers to the i-th server,
and t_i,av is the average service time for server i.
For example, a system is known to have periodic interrupts occurring every
10, 20, and 100 milliseconds, and a sporadic interrupt that is known to
occur on average every 1 second. The average processing times for these
interrupts are 3, 8, 25, and 30 milliseconds, respectively. Then by
Little's law the average number of customers in the queue is

N_av = (1/10)*3 + (1/20)*8 + (1/100)*25 + (1/1000)*30 = 0.98