Profiling code, Timing Methods
26-Jul-16

Optimization
- Optimization is the process of making a program as fast (or as small) as possible
- Here’s what the experts say about optimization:
  - Sir Tony Hoare: “Premature optimization is the root of all evil.”
  - Donald Knuth: “We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.”
  - W. A. Wulf: “More computing sins are committed in the name of efficiency (without necessarily achieving it) than for any other single reason - including blind stupidity.”
  - Michael A. Jackson: “The First and Second Rules of Program Optimization: 1. Don’t do it. 2. (For experts only!) Don’t do it yet.”
  - Richard Carlson: “Don’t sweat the small stuff.”

When should you optimize?
- Optimize when (and only when):
  - A program you have written is taking too long to run
  - A method you have written will be called billions of times in a highly graphics-intensive application
  - Your code is going into the kernel of an operating system
- One of the major principles of the XP (Extreme Programming) approach is: “Do the simplest thing that could possibly work.”
  - In other words, don’t use a faster method in preference to a simpler one
  - If it turns out to be too slow, you can fix it later
- However:
  - If efficiency is completely ignored in a large project, it may be very difficult to fix it later
  - Individual methods can be improved, but if the overall design of the project is inefficient, redesign can be prohibitively expensive

What should you optimize?
- From Wikipedia: In engineering, a bottleneck is a phenomenon where the performance or capacity of an entire system is limited by a single component
- The 90/10 rule is: 90% of the runtime is taken up by 10% of the code

How do you find the bottleneck?
- The 10% of the code that takes most of the runtime is the bottleneck; this is where significant improvements can be made
- Intuition is surprisingly inaccurate
- A profiler is a program that measures which parts of your program are taking the most time (or using the most memory)
- Use a profiler before you waste your time optimizing the wrong part of the program!

What shouldn’t you optimize?
- Don’t sweat the small stuff
- Virtually any modern compiler is better at optimization than you are
- Consider:

    for (int i = 0; i < array.length; i++) {
        int limit = 100;
        if (array[i] > limit) array[i] = limit;
    }

- Would this be faster if you took the assignment to limit out of the loop?
  - No! The compiler will do that anyway
  - In fact, there is a fair chance that doing so would slow your program down!

How should you optimize?
- The best way to speed up a bottleneck is to avoid doing it altogether!
  - Example: In one case study I read about, one program added debugging information to a dataset; the dataset was sent to a second program, which stripped off the debugging information before using the dataset
  - Eliminating the debugging information doubled the speed of the program
- The second best way is to find a faster algorithm
  - This is what a course in “Analysis of Algorithms” is all about
- Always remember, though, that in “real life” you should use a profiler to find out what to optimize
  - The fastest sorting algorithm in the world won’t help you very much if you’re sorting a 10-element array

Timing without a profiler
- Here’s an easy way to time a section of code:

    long t = System.currentTimeMillis();
    // code to be timed goes here
    t = System.currentTimeMillis() - t;
    System.out.println("Code took " + t + " milliseconds");

- This approach often fails. Why?
    Code took 0 milliseconds

- On modern computers, many things you might want to time take less than one millisecond
- Solution: Put the code to be timed in a loop, and run it a thousand times, or a hundred thousand times, or even more
  - This helps with the “0 milliseconds” problem
  - By itself, this is a bad idea for timing a sorting algorithm (Why?)

Improving timing accuracy
- To improve accuracy:
  - Run the code many times, and take the average
  - Close as many other applications as possible; in fact, run from the command line, rather than from Eclipse or NetBeans
    - currentTimeMillis measures the total elapsed time, which includes time used by all other processes, not just the code you are trying to time
  - Call the code to be timed at least once before you begin timing
    - Many Java implementations don’t compile code to machine code until the first time you use it; this is called JIT, or “Just In Time,” compiling
  - Call System.gc() before starting the timing
    - This is a request for Java to run the garbage collector now, so that (hopefully) it will not run during your timing trials
    - This call is legal in all Java implementations, but they are free to ignore it
    - If the code being timed creates new objects, then the time for that will be (and should be) included in the timing results, but it may trigger additional garbage collections during the timing

Problems with profiling
- Profiling is not as useful for distributed code
  - For example, an application that runs on a server, accesses a database, makes remote calls, does input/output, etc., is too distributed for profiling to be much help
- Environment “noise” can be a problem
  - Profiling is more meaningful in the absence of “alien” code, network access, input/output, etc.
  - Try to isolate the code to be profiled from these factors
- Many Java implementations interpret the code the first time through, rather than compile it
  - Run the code being tested multiple times, rather than just once
- Longer runs are better; ideally, your profiling run should take tens of seconds, or a few minutes
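
Timing sketch
- The timing advice above (warm up first, request a garbage collection, run many times, take the average) can be put together roughly as follows. This is only a sketch under stated assumptions: the class TimingSketch, the method names, and the sorting workload are hypothetical, and System.nanoTime() is swapped in for currentTimeMillis() because it has finer resolution. Each run sorts a fresh copy of the input, since looping over the same array would time a sort of already-sorted data after the first pass.

```java
import java.util.Arrays;
import java.util.Random;

public class TimingSketch {

    // Hypothetical workload: sort a copy, so every run sees the same unsorted input.
    // The cost of making the copy is included in the measured time.
    static void codeToBeTimed(int[] input) {
        int[] copy = Arrays.copyOf(input, input.length);
        Arrays.sort(copy);
    }

    // Returns the average time per run, in milliseconds.
    static double averageMillis(int[] input, int warmupRuns, int timedRuns) {
        // Call the code before timing so the JIT compiler has a chance to compile it.
        for (int i = 0; i < warmupRuns; i++) {
            codeToBeTimed(input);
        }

        // Only a request; the Java implementation is free to ignore it.
        System.gc();

        long start = System.nanoTime();
        for (int i = 0; i < timedRuns; i++) {
            codeToBeTimed(input);
        }
        long elapsedNanos = System.nanoTime() - start;

        // Convert nanoseconds to milliseconds, then average over the runs.
        return elapsedNanos / 1e6 / timedRuns;
    }

    public static void main(String[] args) {
        int[] data = new Random(42).ints(10_000).toArray();
        double avg = averageMillis(data, 200, 1_000);
        System.out.println("Code took " + avg + " milliseconds per run (average)");
    }
}
```

- The result is still elapsed time, so other processes running on the machine will affect it; compare several runs rather than trusting a single number.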