Introduction:

This investigation seeks an appropriate mathematical model for the relationship between the gold-medal-winning height in the men's high jump at the Olympic Games and the year in which the Games were held. The Olympic Games follow a four-year cycle, so each Games takes place four years after the previous one. The relationship between the winning heights and the years was modelled analytically with the help of technology such as Autograph and other graphing software.

Table 1 shows the winning heights of the men's high jump gold medallists at the Olympic Games from 1932 to 1980. The heights are in centimetres and range from 197 cm to 236 cm over this period. It is worth mentioning that the Olympic Games did not take place on two occasions, in 1940 and in 1944. Table 1 contains the original data that will be used to build the model.

Table 1:

Year    Height (cm)
1932    197
1936    203
1948    198
1952    204
1956    212
1960    216
1964    218
1968    224
1972    223
1976    225
1980    236

Table 2 shows the years counted from 1896 (for example, 1932 corresponds to 36) according to my adjusted axis. I chose this origin because I wanted the x-coordinates to be positive as far as possible. The height y, which is the dependent variable, is measured in cm. So the two axes are x for the year and y for the height.

Table 2:

Year since 1896 (x)    Height in cm (y)
36                     197
40                     203
52                     198
56                     204
60                     212
64                     216
68                     218
72                     224
76                     223
80                     225
84                     236

Graph 1:

In this kind of model, it is neither possible nor practical to have negative values of y. It would simply not make sense, as a height cannot be below 0 cm.
In a high jump, the athletes are expected to clear at least a few centimetres, if not a few hundred, as is the case at most Games. So the y values in this model are natural (whole) numbers. By contrast, x could take negative values, which would represent years before 1896, so the x values are integers. In this model, however, no negative values will be used, as all the years considered are after 1896.

Developing and Evaluating the Analytical Model:

The scatter plot above shows that the data have minima, maxima and inflection points, so they do not follow a linear pattern but a zigzag one. There is an upward trend from 1932 to 1936 and a downward trend from 1936 to 1948. There is again an upward trend from 1948 to 1968. After 1968 there is a decline, which continues until 1972. From 1972 onwards there is an upward trend again.

The relationship is not quadratic, since the data do not follow a parabolic path: they have more than one maximum or minimum. Nor can it be logarithmic or exponential, because those functions have no maxima or minima, whereas the data have several. A trigonometric function, another widely used type, cannot be applied either, as the trend is not periodic. A piecewise function might well have worked here, but in this assignment polynomial functions will be examined.

I chose a third-order polynomial of the form $f(x) = ax^3 + bx^2 + cx + d$ for the first model function. Four equations are required to calculate the four parameters a, b, c and d. A year and its corresponding height are substituted into each equation.
These four equations are solved with the help of matrices. To improve the accuracy of the model, I selected four data points that are fairly spread out in Table 2. The four equations are presented in Table 3.

Table 3:

Year since 1896 (x)    Equation
36                     $197 = a(36)^3 + b(36)^2 + c(36) + d$
56                     $204 = a(56)^3 + b(56)^2 + c(56) + d$
68                     $218 = a(68)^3 + b(68)^2 + c(68) + d$
72                     $224 = a(72)^3 + b(72)^2 + c(72) + d$

In the above equations, the years 1932 and 1952 were selected to cover the lower half of the curve, and the years 1964 and 1968 were selected to cover the upper half, including points after the inflection in the graph (which in this case is somewhere around 1964).

To solve the equations, a matrix equation was set up in the following manner:

$AX = B$
$A^{-1}AX = A^{-1}B$
$X = A^{-1}B$

where A is the matrix of powers of x, X is the column of unknown parameters and B is the column of heights. Substituting numbers:

$\begin{pmatrix} a \\ b \\ c \\ d \end{pmatrix} = \begin{pmatrix} 36^3 & 36^2 & 36 & 1 \\ 56^3 & 56^2 & 56 & 1 \\ 68^3 & 68^2 & 68 & 1 \\ 72^3 & 72^2 & 72 & 1 \end{pmatrix}^{-1} \begin{pmatrix} 197 \\ 204 \\ 218 \\ 224 \end{pmatrix}$

Therefore the following function describes the mathematical relationship between the winning Olympic high jump heights and the corresponding years:

Function 1: $f(x) \approx -0.000414x^3 + 0.0859x^2 - 4.91x + 284.3$

The Graphics Display Calculator (GDC) has automatically rounded the numbers to 4 significant figures. The function has been plotted on Graph 2.

Graph 2:

At first glance, the analytical model appears to fit the given data quite well. The graph follows the general trend of the data, in which the height increases with time. Four of the points sit on the curve; these are the points that were used to create the model function. Three of the points are above the curve while the other four lie below it. Overall, the points are evenly distributed above and below the trend line. One limitation of this model is that after 1980 the curve continues to rise.
This is not realistic, as there is a limit to how high a person can jump; the heights cannot increase indefinitely. So it can safely be presumed that this function is valid only for the years up to 1980. Another limitation can be seen in the curve before 1932. The curve rises as one goes back in time, which would mean that the winning heights before 1932 were greater, but that was not the case. Therefore the function is valid only for the years from 1932 onwards.

Table 3 below shows the residual for each year. If Δy is 0, the point lies on the curve. The greater the magnitude of Δy, the greater the distance between the point and the curve. A negative Δy means that the point lies below the curve, whereas a positive Δy means that the point is above it. The deviations range in magnitude from 0 cm to 12.7 cm. The standard deviation is 5.934. However, in some years (1936, 1972, 1976 and 1980) the errors are larger than the standard deviation, which suggests that there must be a better fit.

Table 3:

Year since 1896 (x)    Height in cm (y1)    Height predicted by curve in cm (y2)    Δy = y1 - y2 (cm)
36                     197                  197.0                                   0.0
40                     203                  196.5                                   6.5
52                     198                  200.8                                   -2.8
56                     204                  204.0                                   0.0
60                     212                  208.0                                   4.0
64                     216                  212.6                                   3.4
68                     218                  218.0                                   0.0
72                     224                  224.0                                   0.0
76                     223                  230.6                                   -7.6
80                     225                  237.7                                   -12.7
84                     236                  245.3                                   -9.3

Using technology, I also created a natural exponential fit. This curve closely resembles my analytical model, but it does not pass through the same four data points as the analytical model does.
This is visible in Graph 3.

Graph 3:

Although the curve follows the general path of the data, it does not match the crests seen in 1932-1948 and 1948-1964, the troughs (1936-1952 and 1968-1976) or the inflection point (around 1956). It is possible that the data would be better fitted by a polynomial of higher order. As there appear to be approximately 2 crests, 2 troughs and 1 inflection point, a fifth or sixth order polynomial is likely to be a good fit. A fifth order polynomial will be analysed in this paper. The polynomial below was found by regression using technology, and all numbers have been rounded to four significant figures.

Function 3: $f(x) = 0.00005655x^5 - 0.01037x^4 + 0.9877x^3 - 51.52x^2 + 1396x - 0.0001518$

This function has been plotted on Graph 4 using technology.

Graph 4:

The green curve in the graph denotes the 5th order polynomial function and the pink curve denotes the analytical cubic model. The 5th degree polynomial can be seen as a better fit, as it passes through some points (the years 1936, 1956 and 1980) and is closer than the cubic function to others (the years 1948, 1960, 1972 and 1976).
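The matrix step X = A⁻¹B used for the cubic model can be sketched in a few lines of Python. This is only an illustration, not the GDC procedure used above: it takes the four (x, y) pairs (36, 197), (56, 204), (68, 218) and (72, 224) from Table 2, builds the rows [x³, x², x, 1], and solves the system by Gauss-Jordan elimination with exact fractions instead of inverting A explicitly. The function names `solve_linear_system` and `cubic` are my own and purely illustrative.

```python
from fractions import Fraction

# Four (x, y) pairs taken from Table 2: x = years since 1896, y = height in cm.
points = [(36, 197), (56, 204), (68, 218), (72, 224)]


def solve_linear_system(a_rows, b_col):
    """Solve A X = B by Gauss-Jordan elimination using exact fractions."""
    n = len(b_col)
    # Build the augmented matrix [A | B].
    aug = [[Fraction(v) for v in row] + [Fraction(b)]
           for row, b in zip(a_rows, b_col)]
    for col in range(n):
        # Pick a row at or below the diagonal with a non-zero entry in this column.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Scale the pivot row so that the pivot entry becomes 1.
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        # Clear this column in every other row.
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    # The last column now holds the solution X.
    return [row[n] for row in aug]


def cubic(coeffs, x):
    """Evaluate a*x^3 + b*x^2 + c*x + d using Horner's scheme."""
    a, b, c, d = coeffs
    return ((a * x + b) * x + c) * x + d


# Each row of A is [x^3, x^2, x, 1]; B holds the corresponding heights.
A = [[x**3, x**2, x, 1] for x, _ in points]
B = [y for _, y in points]
coeffs = solve_linear_system(A, B)

for name, value in zip("abcd", coeffs):
    print(f"{name} = {float(value):.4g}")
for x, y in points:
    print(f"x = {x}: data {y} cm, curve {float(cubic(coeffs, x)):.1f} cm")
```

Because the arithmetic is exact, the resulting cubic passes through all four chosen points with no rounding error. The values of b, c and d depend on which year is taken as x = 0, so they change if the axis is shifted, while the leading coefficient a does not.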
Predictions Based on Analytical Model:

Using the cubic analytical model (Function 1), the winning heights that would have been expected had the Games been held in 1940 (x = 50 on the adjusted axis) and in 1944 (x = 54 on the adjusted axis) are found below.

Since 1940 was not in the original data, the number 50, which corresponds to 1940, is substituted for x in Function 1:

$f(x) \approx -0.0001302x^3 + 0.0487x^2 - 3.645x + 273.9$
$f(50) \approx -0.0001302(50)^3 + 0.0487(50)^2 - 3.645(50) + 273.9$
$f(50) \approx 197.1$

Similarly, for 1944 the number 54 is substituted for x in Function 1:

$f(54) \approx -0.0001302(54)^3 + 0.0487(54)^2 - 3.645(54) + 273.9$
$f(54) \approx 198.5$

Therefore, going by this model, the men's gold medal winning height would have been 197 cm in 1940 and 199 cm in 1944. Both figures seem achievable and realistic, as a 2 cm increase from one Games to the next is plausible.

The same cubic function (Function 1) was used to predict the winning heights for 1984 (x = 94 on the adjusted axis) and for 2016 (x = 126 on the adjusted axis).

For 1984, the number 94 is substituted for x in Function 1:

$f(94) \approx -0.0001302(94)^3 + 0.0487(94)^2 - 3.645(94) + 273.9$
$f(94) \approx 253.3$

For 2016, the number 126 is substituted for x in Function 1:

$f(126) \approx -0.0001302(126)^3 + 0.0487(126)^2 - 3.645(126) + 273.9$
$f(126) \approx 327.2$

Going by this model, the winning height should have been 253 cm in 1984 and should be 327 cm in 2016. But neither of these figures seems realistic.
The winning height cannot keep rising indefinitely after 1980, nor can it have been greater the further back one goes before 1932.

Interpreting Analytical Model for Additional Data:

Graph 6 includes additional data for the years 1896-1932 (except 1900, 1916 and 1924) and for 1980-2008. These data were compared with the cubic analytical model (Function 1).

Graph 6:

The graph shows that the cubic function is not a good fit for the additional data, because the winning heights before 1932 and after 1980 do not lie on the trend curve. In Table 4, the values of Δy are much larger than those in Table 3. Because the curve turns upward, most of the points lie below it, and therefore most of the values of Δy are negative. The deviations range in magnitude from 0 cm to 71.8 cm. The standard deviation is 24.43, which is approximately 4 times that for the years 1932-1980. The Δy values are much larger than the standard deviation for 1896, 1904, 1908, 1992, 1996, 2000, 2004 and 2008, which means there must be a better fit.

Table 4:

Year since 1890 (x)    Height in cm (y1)    Height predicted by curve in cm (y2)    Δy = y1 - y2 (cm)
6                      190                  253.7                                   -63.7
14                     180                  232.0                                   -52.0
18                     191                  223.3                                   -32.3
22                     193                  215.8                                   -22.8
30                     193                  204.8                                   -11.8
38                     194                  198.5                                   -4.5
42                     197                  197.0                                   0.0
46                     203                  196.5                                   6.5
58                     198                  200.8                                   -2.8
62                     204                  204.0                                   0.0
66                     212                  208.0                                   4.0
70                     216                  212.6                                   3.4
74                     218                  218.0                                   0.0
78                     224                  224.0                                   0.0
82                     223                  230.6                                   -7.6
86                     225                  237.7                                   -12.7
90                     236                  245.3                                   -9.3
94                     235                  253.3                                   -18.3
98                     238                  261.8                                   -23.8
102                    234                  270.5                                   -36.5
106                    239                  279.5                                   -40.5
110                    235                  288.8                                   -53.8
114                    236                  298.3                                   -62.3
118                    236                  307.8                                   -71.8

From 1896 to 1908 there is a trough in the data points. From 1908 to 1936 there is an upward trend, after which there is a dip once again until 1952. From 1952 to 1968 there is again an upward trend, and then another trough is seen from 1968 to 1976. From 1980 onwards the values are almost constant, with minor fluctuations (for example in the years 1988 and 1996).
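The residuals and standard deviations quoted for the two data sets can be checked with a short script. The sketch below copies the observed and predicted heights from Table 4, computes Δy = y1 - y2 for every row, and then takes the standard deviation of the residuals. It assumes that the quoted figures are sample standard deviations (dividing by n - 1), which is what Python's `statistics.stdev` computes; the variable and function names are illustrative only.

```python
from statistics import stdev

# (x, observed height y1, height predicted by the curve y2), copied from Table 4.
# x is the year counted from 1890, so x = 42 is 1932 and x = 90 is 1980.
table_4 = [
    (6, 190, 253.7), (14, 180, 232.0), (18, 191, 223.3), (22, 193, 215.8),
    (30, 193, 204.8), (38, 194, 198.5), (42, 197, 197.0), (46, 203, 196.5),
    (58, 198, 200.8), (62, 204, 204.0), (66, 212, 208.0), (70, 216, 212.6),
    (74, 218, 218.0), (78, 224, 224.0), (82, 223, 230.6), (86, 225, 237.7),
    (90, 236, 245.3), (94, 235, 253.3), (98, 238, 261.8), (102, 234, 270.5),
    (106, 239, 279.5), (110, 235, 288.8), (114, 236, 298.3), (118, 236, 307.8),
]


def residuals(rows):
    """Return the residual y1 - y2 for each (x, y1, y2) row."""
    return [y1 - y2 for _, y1, y2 in rows]


# Residuals for the full extended data set (1896-2008).
all_residuals = residuals(table_4)

# Residuals for the original data set only (1932-1980, i.e. x from 42 to 90).
original_rows = [row for row in table_4 if 42 <= row[0] <= 90]
original_residuals = residuals(original_rows)

# Sample standard deviations (assumption: n - 1 in the denominator).
sd_all = stdev(all_residuals)
sd_original = stdev(original_residuals)

print(f"1932-1980: standard deviation of residuals = {sd_original:.3f}")
print(f"1896-2008: standard deviation of residuals = {sd_all:.2f}")
print(f"ratio = {sd_all / sd_original:.1f}")
```

Under that assumption the script reproduces both quoted values, 5.934 for 1932-1980 and 24.43 for the extended data, which supports the statement that the spread is roughly four times larger once the additional years are included.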
To fit these trends in the data, the model needs to be changed. The cubic model was modified so that the curve passes through four points spread across the extended data set.

Table 5:

Year since 1890 (x)    Equation
6                      $190 = a(6)^3 + b(6)^2 + c(6) + d$
58                     $198 = a(58)^3 + b(58)^2 + c(58) + d$
94                     $235 = a(94)^3 + b(94)^2 + c(94) + d$
118                    $236 = a(118)^3 + b(118)^2 + c(118) + d$

To cover the lower half of the data set, the year 1896 was chosen. To include a point in the middle of the data set, the year 1948 was used. The years 1984 and 2008 were chosen to cover the upper half of the data set.

To solve the equations, a matrix equation was set up in the following fashion:

$AX = B$
$A^{-1}AX = A^{-1}B$
$X = A^{-1}B$

where A is the matrix of powers of x, X is the column of unknown parameters and B is the column of heights. Substituting numbers:

$\begin{pmatrix} a \\ b \\ c \\ d \end{pmatrix} = \begin{pmatrix} 6^3 & 6^2 & 6 & 1 \\ 58^3 & 58^2 & 58 & 1 \\ 94^3 & 94^2 & 94 & 1 \\ 118^3 & 118^2 & 118 & 1 \end{pmatrix}^{-1} \begin{pmatrix} 190 \\ 198 \\ 235 \\ 236 \end{pmatrix}$

Therefore the mathematical relationship between the winning Olympic high jump heights and the years is defined by the function:

Function 4: $f(x) \approx -0.0002354x^3 + 0.04713x^2 - 1.979x + 200.2$

The Graphics Display Calculator (GDC) has automatically rounded the numbers to 4 significant figures. This function has been plotted on Graph 7 using technology.

Graph 7:

In Graph 7, the pink line (Equation 2) represents Function 4, the refined cubic function for the extended data, and the blue line (Equation 1) represents the original cubic function. Function 4 is a much better fit than Function 1, as it follows the general trend of the data, which increases and then gradually decreases. However, this function has its limitations as well. The model implies that the winning heights were greater before 1896 and that they will decrease after 2008, and both of these are highly unlikely.

Conclusion:

This IA attempted to find an analytical curve and a regression curve that model the relationship between the winning height and the year from 1896 to 2008. A cubic function (Function 1) fit the original data given in Table 1.
However, it only followed the general trend and did not capture the nuances (minima, maxima, inflections) in the data. To model these, a 5th order regression polynomial (Function 3) was used, which was a better fit for the overall trend. For the extended data, the original cubic function did not fit very well, and a modified cubic polynomial (Function 4) was generated, which followed the general trend of the data better.
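As a rough check on the modified cubic, Function 4 can be evaluated at the four years it was built from. The sketch below is illustrative only and rests on two assumptions: the linear term of Function 4 is read as -1.979x, and the observed heights for 1896, 1948, 1984 and 2008 are taken from Table 4 (190, 198, 235 and 236 cm). The names `function_4` and `chosen_points` are my own.

```python
def function_4(x):
    """Function 4 with its rounded coefficients; x = years since 1890.

    Assumption: the linear term is -1.979x.
    """
    return ((-0.0002354 * x + 0.04713) * x - 1.979) * x + 200.2


# Observed winning heights (cm) from Table 4 for the four chosen years.
chosen_points = {6: 190, 58: 198, 94: 235, 118: 236}

for x, observed in chosen_points.items():
    predicted = function_4(x)
    print(f"{1890 + x}: observed {observed} cm, "
          f"Function 4 gives {predicted:.1f} cm, "
          f"difference {observed - predicted:+.2f} cm")

# Behaviour just outside the data range, which the limitations above refer to.
print(f"1890 (x = 0):   {function_4(0):.1f} cm")
print(f"2016 (x = 126): {function_4(126):.1f} cm")
```

With the rounded coefficients the curve comes within a fraction of a centimetre of each of the four chosen heights. The last two lines show the values just outside the data range, where the curve is higher before 1896 and lower after 2008.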