I would like to take this opportunity to thank everyone who has helped me to get this far at CERN. Special thanks go to my supervisor Lorenzo Moneta for his teaching and direction during this project; to Andras Zsenei for his consistent help in the office; and to everyone in my department for their support during this summer. Thanks also to my university tutors Geraint Jones and Andy Wathen for their teaching and references, and to all my friends here at CERN who have made my time here so enjoyable and taught me so many new things about their cultures, especially Jan, Nino, Dana, David, Carlos, John, Stian, Diana, Florian, Laura, Lucia, Cristina, Aoife and Laura, whom I will hopefully visit soon. Finally, I would like to thank my friends and family back in England for always being there for me.
Andrew McLennan – University of Oxford Page 2 of 13 4/16/2020
A large class of problems in many different fields of research can be reduced to the problem of finding the smallest value taken on by a function of one or more variable parameters. For example, the minimum of f(x) = (x − 3)² is zero and is obtained at x = 3.
The classic example of minimization which occurs so often in scientific research here at CERN, however, is the estimation of unknown parameters in a theory by minimizing the difference between theory and experimental data.
Before we can tackle the minimization problem, we have to state what assumptions we are allowed to make. It is assumed that the function F(x) is not known analytically but is defined only by the values it takes at given positions. It is also assumed that we are allowed to specify a range within which the parameters may vary. Any additional information, such as the numerical values of the derivatives dF/dx at a point, should be given when available but should not in general be assumed.
The function is then repeatedly evaluated at different points until its minimum is found, and the method which finds the minimum (within a given tolerance) after the fewest evaluations is generally considered to be the best.*
There are several difficult situations which the minimization methods have to overcome. These include:
- finding the global minimum as opposed to a local minimum;
- finding a minimum located somewhere within a large plateau;
- finding the minimum without having to check every point in the allowable range.
We will test our methods to check whether the above situations are handled correctly.
The goal of my project here at CERN is to examine the relationship between the amount of information we need to supply and whether the minimization methods converge to the correct result, as well as to compare the differences between the minimization methods Minuit, GMinuit and Fumili. Minuit comes from the class TMinuit, and GMinuit is the name of the package for the new C++ version of Minuit. Both Minuit and GMinuit are general minimization methods based on Fletcher's unified approach to Variable Metric Methods (VMM), which combines rank-one and rank-two formulas to deal with different types of minimization problems.** Fumili comes from the class TFumili and is a specialized minimization technique based on chi-squared minimization, designed to work very quickly for certain functions provided we supply good starting information.
My aim has been to produce software tools to systematically analyze the length of time each method takes to run, together with graphical displays of the regions in which each method converges. For backwards compatibility, it is important that Minuit and GMinuit both work when given the same initial conditions, but ideally GMinuit should work faster in general and over a larger range of input parameters. It is also important to ensure that no islands form within the range of input parameters.***
Minimization within ROOT**** can be used in several ways:
- to produce a "line of best fit" on data within a histogram;
- for pure minimization of a function of our choosing.
Given a histogram, the production of a line of best fit is very useful. Minimization methods are designed to take this data and use it to efficiently produce a function which best matches the data points. This is a difficult mathematical problem, which is why many different methods exist. Pure function minimization is what lies behind the production of a line of best fit and hence should also be tested. I therefore restricted my investigation to these two cases, using nontrivial functions to get the most out of the methods.
Initially I didn’t know what results I would produce and so a lot of experimenting was required. This allowed me to become very familiar with the methods I was examining and helped me to evolve and tailor my programs at each stage so that I could produce interesting and useful results to aid in the fixing of bugs within GMinuit.
* Occasionally other considerations may be important, such as the amount of storage required by the method or the amount of computation required to implement it, but normally the dominating factor will be the time spent evaluating the function.
** A general minimization method is one which can handle all functions. More details about these and many more minimization methods can be found on the Minuit website under Documentation: http://www.cern.ch/minuit
*** An island is a region of input parameters of non-convergence for the method, located inside another region of convergence.
**** ROOT was produced at CERN for use in analyzing the data from high-energy physics. See http://root.cern.ch for more details.
The first example of minimization I considered was the fitting of a histogram in order to produce a line of best fit. I constructed a function consisting of many Gaussian peaks together with some linear background noise and used it to populate a one-dimensional histogram.* The function ranged over 0-1000 with the peaks located at random places. This type of histogram is seen regularly here at CERN, so it is a good place to start my testing.
As the histogram was filled using randomly chosen points, I decided that the best way to test the pure efficiency of the fitting algorithms would be to pre-locate each of the peaks using a TSpectrum object. This information was then passed to each method as their initial search parameters and the length of time each one took to converge was recorded.
The output of this program can be seen in Appendix 1 and Table 1 below contains the CPU running times of each method.
As can be seen from the output, all three methods produced a good fit to the data; more importantly, each one produced the same fit, implying that this fit is actually the best fit available.

Table 1: CPU running times
  TMinuit                         0.64 s
  Fumili                          0.15 s
  GMinuit (new interface)         0.47 s
  GMinuit (interface as TMinuit)  0.66 s
Examining the lengths of time each method took to run reveals something interesting.
As suspected, Fumili works by far the fastest. This is because we supplied good initial information and Fumili is a specialized method for problems such as this, while Minuit and GMinuit are general methods. GMinuit ran faster than TMinuit when used in combination with an optimized fitter interface in the ROOT framework, exploiting object oriented features of the new C++ version of Minuit. When the same interface as in TMinuit is used, comparable running times are then obtained.
Further tests are now needed to examine fully the differences in running time and range of convergence between the two methods before Minuit can be retired in favor of the new GMinuit.
From the above program and the general requirements of each minimization method, we can expect that when good information is provided, all methods will converge to the same, correct result. If, however, the information supplied is not good, it is not known whether the methods will fit correctly. To test the methods when different initial information is supplied, I produced a program to cycle systematically through the input parameters in order to see at which points each method works and at which it fails.
To begin with, I decided to start with a simpler example than the one above, using only two peaks located at 8.0 and 11.0 on the range 0-20. The program was created in such a way that changing the method of minimization involved only changing the name given to TVirtualFitter's SetDefaultFitter() method. For each iteration, my program set the initial suggested locations of the means of the two Gaussian peaks before performing the fit. It cycled through suggested means from (-5, -5) to (25, 25) in integer steps. The real means were located at 8.0 and 11.0, so I expected that at least around these regions all three methods should work.
As well as identifying where each method converged correctly, I also produced two histograms of the length of time each method took for a given input parameter: one for when the fit was good and the other for when the method didn't fit the histogram correctly. I chose to display the reciprocals of the actual times so that the places of very fast convergence are easier to see.
The results of this program for Minuit and GMinuit provided very useful information. As can be seen clearly in the two pictures below, Minuit converged to the correct results in far more places than GMinuit did. This suggested there was a major bug which needed to be fixed, otherwise GMinuit wouldn't be fully backwards compatible with the older version. Another worrying result was the existence of islands within areas of convergence, and of bands of convergence and non-convergence. This wasn't expected, especially for such a simple example with parameters so close to the actual minimum.
Looking at the lengths of time each method took to converge, we find that for GMinuit we almost always knew within 2 seconds whether the method had converged correctly or not. Minuit does in some cases work faster than GMinuit, which is a problem, but generally there are more places where Minuit converges much more slowly.
By looking more closely at the points where Minuit performs better than GMinuit, it was possible to track down what was causing some of these problems in the code and hence fix it. Running this program again using the fixed version of GMinuit produced results far more closely matching Minuit, as can be seen below and in Appendix 2b. Islands and bands still exist, but solving this will require a much more in-depth look at the paths the algorithm takes during its execution and the reasons for them; more research into this problem was therefore definitely needed. The full output results of my program for Minuit and GMinuit can be found in Appendix 2a.
Fumili, on the other hand, converged correctly in a far smaller region than either Minuit or GMinuit, but for this function at least no islands or bands existed. The more interesting result, however, is that when Fumili does fit correctly its running time is far less than that of either of the other two algorithms, whereas when bad initial information is supplied its running time can be exceptionally large. Hence, as suspected, provided we supply good information Fumili will work very well for functions such as this, but if we don't have adequate information then a general minimization method will usually work faster. The full output results for Fumili can be found in Appendix 3.
To further my investigation into these methods, I decided that generalizing my code would give me much more freedom in the future when changing the function to be examined. I increased the range of the function to 0-1000, with the peaks now located at 350.0 and 750.0 respectively. I also realized that the programs were taking much too long to run using the standard ROOT interface, and hence I adapted the code so that it could be run as compiled code, which was, in turn, more stable.
The main step I took, however, was to place the comparison of Minuit and GMinuit on the same results output. I decided this would make it easier to spot where the methods differed and hence help my investigation. Locations coloured black indicate parameters where neither Minuit nor GMinuit converged correctly; blue represents areas where both converged correctly; green marks areas where GMinuit outperformed Minuit; and red marks areas where GMinuit failed but the original Minuit worked.
Another feature I added to my program was the ability to add extra random noise to the histogram. Being able to select and compare different percentage levels of noise allowed me to compare the methods under more realistic experimental conditions. My final adaptation was to add two histograms displaying the frequency of convergence at different distances from the actual means. These show that as the initial parameters are moved away from the actual means, it becomes less likely that either method converges.
The above picture represents a section of the output from the updated program. As can be seen, when no extra noise is introduced and even when using the version of GMinuit we had previously fixed, there are still several regions and bands where the method fails.
When we next run the program with different levels of extra noise, we get some very interesting results. Not only are the regions of convergence reduced as the amount of noise is increased, but some points where a method previously didn't converge correctly now do converge. Bands of non-convergence still exist for this function even as the level of noise is increased, but the position of the bands actually changes. For 5% extra noise we see that GMinuit performs with greater uniformity than Minuit, as fewer islands are created. Finally, we notice, for this example at least, that as the amount of extra random noise is increased, GMinuit starts to outperform Minuit. Table 2 shows the actual number of points at which the two methods converge correctly.
Table 2: Number of points of correct convergence
  Noise   Minuit   GMinuit
  0%      847      814
  2.5%    805      806
  5.0%    165      176
Hence, not only must the initial information we supply become more accurate as the noise level increases, but it seems GMinuit was designed to work well as the noise level increases. It would be interesting to carry out further investigations with different levels and types of noise to see whether GMinuit really does work better as noise increases. The full output results for the three different noise levels can be found in Appendix 4.
The final part of my project was to move away from using minimization for fitting histograms and to test the minimization methods Minuit and GMinuit directly. I did this by again adapting my program to deal with more complicated two-dimensional functions, such as Rosenbrock's curved valley problem*, with the aim of being able to change the function under investigation easily. I designed the program so that it was easy to change the viewing range of the function, the amount of detail to print out, the position on which to concentrate the investigation, and whether I wanted just the basic information about this position or the extra graphical outputs. All these parameters can be set at runtime, allowing the user to view general information around the actual minimum and then to target specific regions or points without having to change any of the underlying code.
Running the program without setting any of the input parameters produced the output results in Appendix 5a. As can be seen, there are several places where GMinuit should work but doesn't, some of them quite close to the actual minimum; hence, even though we have fixed one problem in the code, others must exist. Again, the bottom two histograms give the number of correctly and incorrectly minimized positions at different distances from the actual minimum. The second output screen shows the proportion of correctly and incorrectly minimized points with respect to the total number of points available at that distance. It can be clearly seen that as the supplied information diverges from the actual minimum, the number of points which converge correctly tends to zero. This behaves quite well with Minuit, but GMinuit again struggles.
Using this information and the code, it was possible to examine in detail the points where GMinuit failed and to fix two more problems, making three fixes in total. Running the same program again with this fixed version of GMinuit results in the output in Appendix 5b. It can now clearly be seen that GMinuit works well for this function and even outperforms Minuit in several places. This encouraging result shows that my investigation was at least going in the correct direction, and hopefully it should be possible to find all the problems causing the discrepancies seen.
* More information about Rosenbrock’s Curved Valley problem can be found in Appendix 5a.
To determine whether all the differences between Minuit and GMinuit had been found, I tested one further function. Goldstein and Price's function with four minima** is another difficult problem for minimization methods, as it has more than one local minimum. My program was again adapted, allowing me to specify more exactly how much information should be output, and adding an extra indicator for the case where Minuit and GMinuit both failed but converged to different values. I felt this was an important case to consider, as it shows the two methods must have taken different routes to reach their final results. Upon running my program for this function I expected there to be only a few places where the methods failed. Unfortunately, however, it was obvious that major problems remain in GMinuit, given the large quantity of red markers dotted all over the input region. Not only did Minuit outperform GMinuit; both methods left many islands and bands of non-convergence within areas of convergence and vice versa. Also, judging by the second set of output results, the distance from the actual minimum did not seem to affect the probability that a set of input parameters would converge.
To get the most out of my program, it should be run with input parameters. The following is an example invocation:

root Project5GoldsteinAndPriceFunction.C+(range, extra, print, X, Y)

where
- "range" is the distance either side of our chosen point to be displayed,
- "extra" selects either the basic output or the full detailed program output,
- "print" decides how much information each method should display, from 0 to 3,
- "X" and "Y" are the (x, y) coordinates about which my program should look.

The default execution parameters are effectively equivalent to

root Project5GoldsteinAndPriceFunction.C+(10, 0, 0, 0, -1.0)
During this project, I have been able to investigate the properties of different minimization strategies. I have shown graphically where each method works and compared the new and old versions of Minuit. Even though three fixes have resulted from my investigation, I have been able to show that more problems must still be hidden within the code somewhere. The most important result I have found, however, is that both the new and old versions of Minuit fail in places where they were not expected to. It would be very interesting to examine the code in far more detail and follow the routes taken for these islands compared with the surrounding areas, to find the reason for the differences. It may then be possible to produce more rigorous methods for finding minima and fits to histogrammed data.
Thus, to take this project further, I would initially look at fixing the problems which cause GMinuit to fail in so many places where Minuit actually worked for Goldstein and Price's function. This would be done in the same way as before: by looking at the locations which caused anomalies, seeing what results they were actually giving, and using this information, together with stepping through the code, until the problem is discovered.
Next I would adapt my program to higher dimensions, specifically four dimensions, as there exist several difficult test functions for which it would be interesting to see how GMinuit performs. As it is only really possible to display two dimensions effectively, I would want to be able to fix certain dimensions and vary the others; it should therefore be possible to specify in the program's input parameters which dimensions are fixed and at which values. The information from these new test functions would either provide more points to compare (so as to fix bugs within GMinuit) or increase our confidence in its algorithm. Finally, the last extension I would make would be to display both the function and the region of convergence on the same graph, as this would make the investigation easier to visualize. Each comparison point would then be tied directly to an area of the function being minimized.
As this investigation was very open-ended, with no previous information about what results to expect, I had to produce my program so that it could evolve and develop as more information was discovered. This can be seen in the progression of the programs I produced. The final program is thus stable and provides a firm base from which to continue the investigation. In the end I decided to compare only versions of Minuit, leaving the Fumili investigation for another time; however, this would not be hard to incorporate into my program if someone so wished. I felt that comparing the Minuit versions (which are generalized methods) to Fumili (which is a specialized method) would not actually produce any comparable results.
** More information about Goldstein and Price's function with four minima can be found in Appendix 5b.
The output of my program comparing Minuit, Fumili and GMinuit when good initial information is supplied. As can be seen, each method fits the function, but the running times differ greatly. The horizontal axis contains the bins covering the range of the function; the vertical axis gives the number of entries in each bin.
Noise   Minuit   GMinuit
0%      847      814
2.5%    805      806
5.0%    165      176
F(x, y) = 100(y − x²)² + (1 − x)²

Minimum: F(1.0, 1.0) = 0
Possible starting point: F(−1.2, 1.0) = 24.20

This problem is probably the best known test problem for minimization methods. It consists of a narrow parabolic valley with very steep sides. The floor of the valley follows approximately the parabola y = x² + 1/200, and stepping methods tend to perform at least as well as gradient methods for this function.
F(x, y) = [1 + (x + y + 1)²(19 − 14x + 3x² − 14y + 6xy + 3y²)]
          × [30 + (2x − 3y)²(18 − 32x + 12x² + 48y − 36xy + 27y²)]

Local minima: F(1.2, 0.8) = 840, F(1.8, 0.2) = 84, F(−0.6, −0.4) = 30
Minimum: F(0, −1.0) = 3
Possible starting point: F(−0.4, −0.6) = 35
This is another standard test function for minimization methods. It is an eighth-order polynomial in two variables which is well behaved near each minimum, but has four minima. An interesting place to start looking would be at the above starting point as it lies in between the two lowest minima.