Performance Measurement
CSE, POSTECH

Program Performance
- Recall that program performance is the amount of computer memory and time needed to run a program.
- It can be determined in two ways:
  1. Analytically - performance analysis
  2. Experimentally - performance measurement
- The performance of a program depends on
  - the number and type of operations performed, and
  - the memory access pattern for the data and instructions

Performance Analysis
- Paper and pencil.
- Do NOT need a working computer program or even a computer.

Some Uses of Performance Analysis
- Why do we want to do a performance analysis of algorithms?
  - To determine the practicality of an algorithm
  - To predict run time on large instances
  - To compare two algorithms that have different asymptotic complexity, e.g., O(n) and O(n^2)

Limitations of Performance Analysis
- Does NOT account for constant factors.
- But constant factors may dominate
  - 1000n vs. n^2
  - especially if we are interested only in n < 1000

Memory Hierarchy
- Modern computers have a hierarchical memory organization with different access times for memory at different levels of the hierarchy.
- Levels, nearest the ALU first:

    Level             Size     Access time
    R (registers)     8-32     1C
    L1                32KB     2C
    L2                512KB    10C
    MAIN              512MB    100C

    C = CPU cycle

- Read Sections 4.5.1 & 4.5.2

Limitations of Performance Analysis
- Performance analysis does not account for this difference in memory access times.
- Programs that do more work may take less time than those that do less work.
  - e.g., a program with a large operation count and a small number of accesses to slow memory may take less time than a program with a small operation count and a large number of accesses to slow memory

Performance Measurement
- Concerned with obtaining the actual space and time requirements of a program
- Actual space and time are dependent on
  - Compiler and options
  - Specific computer
- We do not generally consider run-time space requirements (read the reasons on page 122)

Performance Measurement Needs (1)
- programming language
- working program
- computer
- compiler and options to use
  - g++ -O, -O2, -O3 (see manual pages for g++)

Performance Measurement Needs (2)
- data to use for measurement
  1. worst-case data
  2. best-case data
  3. average-case data
- What are the worst-case, best-case, and average-case data for insertionSort, and how do you generate them?
- timing mechanism: clock

Choosing Instance Size
- We decide which values of instance size (n) to use according to two factors:
  1. the amount of timing we want to perform
  2. what we expect to do with the times
- In practice, we generally need the times for more than three values of n (read the reasons on page 123)

Timing in C++

    #include <ctime>   // clock(), clock_t, CLOCKS_PER_SEC

    double clocksPerMillis = double(CLOCKS_PER_SEC) / 1000;
                         // clock ticks per millisecond
    clock_t startTime = clock();
    // code to be timed comes here
    double elapsedMillis = (clock() - startTime) / clocksPerMillis;
                         // elapsed time in milliseconds

Shortcoming
- See Program 4.1 and its execution times in Figure 4.1 (what is wrong with these execution times?)
- The time needed for the worst-case sorts is too small for clock() to measure
- Clock accuracy
  - assume the clock is accurate to within 100 ticks
  - if clock() returns a time of t, the actual time lies between max{0, t-100} and t+100
  - for Figure 4.1, the actual time could be anywhere between 0 and 100 ticks

Shortcoming
- Repeat the work many times to bring the total time to >= 1000 ticks
- See Program 4.2
- What is the difference between Programs 4.1 & 4.2?
- See Figures 4.2 & 4.3
- See Figure 4.4 (overhead measurement)

Accurate Timing

    // clocksPerMillis is defined as in "Timing in C++"
    clock_t startTime = clock();
    long numberOfRepetitions = 0;   // must start at 0 before counting
    do {
       numberOfRepetitions++;
       doSomething();
    } while (clock() - startTime < 1000);
    double elapsedMillis = (clock() - startTime) / clocksPerMillis;
    double timeForCode = elapsedMillis / numberOfRepetitions;

Accuracy
- Now accuracy is 10%.
- First reading may be just about to change to startTime + 100
- Second reading (final value of clock()) may have just changed to finishTime
- so finishTime - startTime is off by 100 ticks

Accuracy
- First reading may have just changed to startTime
- Second reading may be about to change to finishTime + 100
- so finishTime - startTime is off by 100 ticks

Accuracy
- Examining the remaining cases, we get
    trueElapsedTime = finishTime - startTime ± 100 ticks
- To ensure 10% accuracy, require
    elapsedTime = finishTime - startTime >= 1000 ticks

What is wrong with the following measurement?

    long numberOfRepetitions = 0;   // Program 4.3
    clock_t elapsedTime = 0;
    do {
       numberOfRepetitions++;
       clock_t startTime = clock();
       doSomething();
       elapsedTime += clock() - startTime;
    } while (elapsedTime < 1000);
       // repeat until enough time has elapsed

Answer to Ch. 4, Exercise 1
- In each iteration of the do-while loop, the amount added to elapsedTime may deviate from the actual run time of doSomething by up to 100 ms (or 100 ticks).
- This error is additive over the iterations and so does not decline as a fraction of total time.
- For example, suppose that doSomething takes almost 100 ms to execute. In the worst case, the clock reading changes just before each execution of the assignment startTime = clock(), and the amount added to elapsedTime is zero on each iteration of the do-while loop; the do-while loop does not terminate.
- How do we fix this?

Time Measurement in Time Shared Systems
- UNIX
  - time MyProgram
  - See man pages for time
- Do Exercise 4.2
- Read Chapter 4
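
Sketch: timing a worst-case insertion sort
The repeat-until-1000-ticks scheme from the Accurate Timing slide could be packaged roughly as follows. This is only a minimal sketch, not Program 4.2 from the text: the names insertionSort, worstCaseData, Timing, and timeWorstCaseSort are hypothetical, and a strictly decreasing sequence is assumed as the worst-case input. The clock is read once before the loop and once after it, so the clock error stays within the bound derived on the Accuracy slides instead of accumulating per iteration.

```cpp
#include <cassert>
#include <cstddef>
#include <ctime>
#include <vector>

// Plain insertion sort; stands in for the code to be timed.
void insertionSort(std::vector<int>& a) {
    for (std::size_t i = 1; i < a.size(); ++i) {
        int t = a[i];
        std::size_t j = i;
        while (j > 0 && a[j - 1] > t) {   // shift larger elements right
            a[j] = a[j - 1];
            --j;
        }
        a[j] = t;
    }
}

// Assumed worst-case input: n, n-1, ..., 1 (every element moves all the way).
std::vector<int> worstCaseData(std::size_t n) {
    std::vector<int> a(n);
    for (std::size_t i = 0; i < n; ++i) a[i] = static_cast<int>(n - i);
    return a;
}

// Hypothetical result type: how often the sort ran and the average time per run.
struct Timing {
    long repetitions;
    double millisPerSort;
};

// Repeat the sort until at least minTicks clock ticks have elapsed,
// then divide the total elapsed time by the number of repetitions.
Timing timeWorstCaseSort(std::size_t n, std::clock_t minTicks = 1000) {
    const double clocksPerMillis = double(CLOCKS_PER_SEC) / 1000;
    const std::vector<int> input = worstCaseData(n);
    long repetitions = 0;
    std::clock_t startTime = std::clock();   // single start reading
    do {
        ++repetitions;
        std::vector<int> a = input;   // reset: sorted data is no longer worst case
        insertionSort(a);
    } while (std::clock() - startTime < minTicks);
    double elapsedMillis = (std::clock() - startTime) / clocksPerMillis;
    return {repetitions, elapsedMillis / repetitions};
}
```

For example, timeWorstCaseSort(100).millisPerSort gives an estimate for n = 100. The per-iteration copy of the input is included in the measured time; that overhead can be estimated by timing the same loop without the insertionSort call and subtracting.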