Study Advice Service

Mathematics in Decision Making: an introduction for business studies/management students (levels 5 and 6)

Background

This summary sheet is derived from experiences in teaching Management Decision Making at levels 5 and 6 and identifies common problem areas for students.

Decision making is a multidimensional activity, in that it uses both quantitative and qualitative information to aid the process. One approach to this multidimensionality is to use the three-phase approach (Jennings & Wattam, 1998). This identifies three key steps to complete in the process of making decisions about problems and opportunities:

- Problem Identification
- Solution Development
- Solution Selection

In applying this framework to management decisions, we need to understand and model information in a variety of ways. This involves handling both qualitative information (text, values, opinions, evidence etc) and quantitative information (data, trends, percentages, probabilities etc). This short guide is a summary of the key issues to consider in managing quantitative models in particular.

Most examples and activities you will come across use time series data (data which is dependent upon the passage of time). This is usually described by the x variable (independent), and the data against which this varies is the y variable (dependent). Note, however, that you may come across different data sets which do not use time as the independent variable. The procedures for handling such problems are largely the same.

Smoothing Data

Periodic/random variations in data can be smoothed by weighting the importance of the most recent forecast and the most recent actual data observed in different ways. Typical nomenclature in this approach uses Greek symbols:

- α (alpha) – used to denote a smoothing constant (typically given a value between 0.1 and 0.5)

For example:

New Forecast = (chosen fraction * Actual Data value) + ((1 - chosen fraction) * Old Forecast Value)

Where α = the chosen fraction (or smoothing constant). Hence:

New Forecast = (α * Actual Data value) + ((1 - α) * Old Forecast Value)

or:

F4 = αA3 + (1 - α)F3

Where:
F4 = next (needed) forecast
A3 = most recent actual data
F3 = most recent forecast

NOTE – It is very important to understand that there is a hierarchy of brackets (i.e. ( and )) used in calculations involving quantitative methods. You always start with the innermost brackets first: e.g. in the expression below, the part with A is worked out first, then the part with B, then the part with C and then the part with D.

ANSWER = (D*(C+1*(B*10*(A+20))))

Unless you carefully consider this when using Excel, the calculations and formulae used in Excel cells will not generate the correct answer. For example:

ANSWER = (A*B+1-R)

will not give the same result as:

ANSWER = (A*((B+1)-R))

There are also additional equations for smoothing quantitative data forecasts using more smoothing constants (such as β (beta) and γ (gamma) values).
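To see how the smoothing formula behaves, here is a minimal Python sketch of single exponential smoothing. The sales figures and the α of 0.3 are purely illustrative, not data from this guide.

    # Single exponential smoothing: new forecast = alpha*actual + (1 - alpha)*old forecast.
    def smooth(actuals, alpha=0.3):
        """Return forecasts where forecasts[i] is the forecast for period i + 1;
        the first forecast is seeded with the first actual value."""
        forecasts = [actuals[0]]
        for actual in actuals:
            forecasts.append(alpha * actual + (1 - alpha) * forecasts[-1])
        return forecasts

    # Illustrative monthly sales figures only.
    sales = [120, 132, 101, 134, 90, 230, 210]
    print(smooth(sales, alpha=0.3))

A larger α makes the forecast react more quickly to the most recent actual value; a smaller α gives a smoother, slower-moving forecast.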
Shorthand and symbols when dealing with lots of numbers

As this focus on quantitative methods tends to ask you to consider lots of data, another general symbol you will come across is Σ (sigma). This means 'SUM OF', so whatever follows it, you are adding up.

For example, if your data set was called SET 1 and included the values (1, 45, 45, 34, 6, 52), then Σ(SET 1) would be equal to:

Σ(SET 1) = 1 + 45 + 45 + 34 + 6 + 52 = 183

You will also come across symbols such as:

≈ – means 'approximately equal to' (i.e. A ≈ B, if A equals 34.23 and B equals 34.27)

≥ – means whatever is on the left hand side is greater than or equal to whatever is on the right hand side of the inequality symbol (10 ≥ 9 is OK, as is 10 ≥ 10, but not 10 ≥ 11)

≤ – means whatever is on the left hand side is less than or equal to whatever is on the right hand side of the inequality symbol (9 ≤ 10 is OK, as is 9 ≤ 9, but not 9 ≤ 8)

ȳ or x̄ (y-bar or x-bar) – means the mean (or average) of your y or x values

∞ – means 'infinity'

X1, X2 etc – using subscripts such as numbers indicates that the data all belong to one data set (or variable)

ye – this is the estimate of y, typically indicating the forecast you have made as a result of modelling some quantitative data (y is usually the dependent variable)

R² or r² – this is the symbol given to the statistical calculation called the 'coefficient of determination'. It can be found from a dataset (x, y) directly as the square of:

r = (nΣxy - ΣxΣy) / √[(nΣx² - (Σx)²)(nΣy² - (Σy)²)]

It can also be found from the equation:

r² = Σ(ye - ȳ)² / Σ(y - ȳ)²

The nearer your value for r² is to 1, the better your fit. The nearer it is to 0, the poorer your best fit line is.

Methods of handling quantitative data for forecasting (1)

Handling datasets for forecasting purposes usually means looking for trends in the data and then isolating those trends, so as to be able to rearrange them to give a value for some time in the future (if your independent variable is time (t)). How you isolate them has a common methodology, but how you rearrange them needs careful consideration, as you must decide which mathematical relationship best describes the data. For example, the easiest way to do this with two datasets (x and y) is via a mathematical method called linear regression.

NOTE – in choosing to use such a method, you are assuming that the relationship between x and y is linear. This means that equal-sized changes in x always produce equal-sized changes in y, assuming the relationship in the data is only between x and y. Non-linear relationships between x and y do not obey this rule. You can also think of this in terms of linear relationships being described by a straight line, whereas non-linear relationships are described by curves.

Linear regression further assumes that your dependent values (y) have an underlying trend (U), a dominant trend (T), perhaps a seasonal trend (S), and random variations (R). Usually values for R are ignored. The process identifies the values of U, T and S for any given value of x, to provide a forecast of ye. It uses the method of least squares analysis to do this. This method determines the equation of a best fit line, which minimises the sum of the squared (vertical) distances between the line and all known data points.

The equation of a straight line is:

y = mx + c

Where:
y = dependent value
x = independent value
m = gradient of the line (this is also described by T)
c = value at which the line crosses the y axis (this is also described by U)

So, the method of least squares analysis tries to find the values of m and c that describe the most accurate fitting line between the data points.
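To see what 'minimising the squared distances' means in practice, here is a minimal Python sketch. The data points and the two candidate lines are invented purely for illustration.

    # Compare two candidate straight lines y = m*x + c by their sum of squared errors.
    data = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 8.1), (5, 9.8)]   # illustrative (x, y) points

    def sum_squared_error(m, c, points):
        """Total squared vertical distance between the line y = m*x + c and the points."""
        return sum((y - (m * x + c)) ** 2 for x, y in points)

    print(sum_squared_error(2.0, 0.0, data))   # a good candidate line: small total
    print(sum_squared_error(3.0, 0.0, data))   # a poorer candidate line: much larger total

Least squares analysis simply finds the m and c that make this total as small as possible.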
In the examples below, graph (a) has a better 'best fit line' than the one shown in graph (b).

[Graph (a): the actual data plotted (y, dependent variable, against x, independent variable) together with its best fit line.]

The equation of the straight line in graph (a) is y = 1.12x + 1.28. This is the equation of the best fit line for the data plotted: the distance between the values in this dataset (the actual data plotted) and the best fit line is minimised. Hence it is the best fitting line, when compared with the poor fitting line drawn in graph (b).

[Graph (b): the same dataset with both the best fitting line and a poor fitting line.]

This poor fitting line has the equation y = 1.45x + 1.28, and it is easy to see that, despite the similarity of the two equations, the best fit line in graph (a) is superior.

The values of m and c can be determined directly from your original x and y values, using these formulae:

m = (nΣxy - ΣxΣy) / (nΣx² - (Σx)²)

c = (ΣyΣx² - ΣxΣxy) / (nΣx² - (Σx)²)

In these formulae, 'n' denotes the number of values you have (in your original dataset).

A measure of how good a fit your 'best fit line' actually is can be found from this equation (r² is the square of r):

r = (nΣxy - ΣxΣy) / √[(nΣx² - (Σx)²)(nΣy² - (Σy)²)]

Hence, a typical dataset in x and y could look like this (taken from Carlson, 2002):

x       y       xy      x²      y²
1       2       2       1       4
2       5       10      4       25
3       6       18      9       36
4       10      40      16      100
5       16      80      25      256
6       21      126     36      441
7       25      175     49      625
8       26      208     64      676
9       32      288     81      1024
10      35      350     100     1225
Σx=55   Σy=178  Σxy=1297 Σx²=385 Σy²=4412

Sometimes in datasets you may observe a regular repeating change in the value of y. This could be a seasonal pattern (the S values; note that 'seasonal' here does not necessarily refer to the more familiar use of the word). It is better to think of it as a regular repeating cycle which occurs every nth point. For example, if a company's sales were related strongly to the season of the year, then this cycle would have a period of 4 (i.e. every 5th point would be repeating itself, so only 4 points are needed to describe the entire seasonal change). You can moderate your forecasts by accommodating such regular periodic changes.
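Before moving on to non-linear relationships, here is a minimal Python sketch of the least squares formulae above, applied to the dataset in the table. The printed results (roughly m ≈ 3.9, c ≈ -3.4 and an r² close to 1) are my own arithmetic rather than figures quoted in this guide.

    from math import sqrt

    # Dataset from the table above (Carlson, 2002).
    x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    y = [2, 5, 6, 10, 16, 21, 25, 26, 32, 35]

    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    sum_y2 = sum(yi ** 2 for yi in y)

    # The formulae for m, c and r given above, written out directly.
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    c = (sum_y * sum_x2 - sum_x * sum_xy) / (n * sum_x2 - sum_x ** 2)
    r = (n * sum_xy - sum_x * sum_y) / sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))

    print(m, c, r ** 2)   # roughly 3.9, -3.4 and an r-squared close to 1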
Methods of handling quantitative data for forecasting (2)

You may be asked to consider datasets (x, y) which do not seem to have a clear linear relationship. These are called non-linear relationships, and three common types are shown below in graphs (c), (d) and (e). You make this judgement by inspecting your dataset and considering whether you would expect to see such a relationship in your data. It is important to note that equations which describe curves are part of a family of curves, which can look very similar to each other. For example, if you magnified the edge of a circle sufficiently, it would come to resemble a straight line. So you must exercise care when choosing to use a non-linear equation for forecasting.

Some of the more familiar types of non-linear equation are described in graphs (c), (d) and (e). In graph (c), the curve has the equation y = c*(m^x) (or c multiplied by m raised to the power of x). In graph (d), the curve has the equation y = c*(x^m) (or c multiplied by x raised to the power of m); note that the shape of this curve depends on the value of m. In graph (e), the curve has the equation y = c*(exp(-mx)) (or c multiplied by the exponential of minus m multiplied by x). There are lots of other non-linear relationships, but these are not covered by this study guide.

These equations can be used in the same way as forecasting via linear regression and least squares analysis, but there are a few more intermediate steps. These steps are the transformations needed to convert a non-linear relationship into a linear relationship. To do this you need to use logarithms and the laws of logarithms.

[Graph (c): an exponential curve of the form y = c*m^x (y, dependent variable, against x, independent variable).]

For example, graph (c) above has the general equation y = c*m^x. This can be expanded using the laws of logarithms to derive:

y = c*m^x
log y = log(c*m^x)
log y = log c + log(m^x)
(1) log y = log c + x*log m

If you compare this with the general equation of a straight line (y = mx + c), you can see that equation (1) looks like a straight line equation if we let:

Y = log y
C = log c
M = log m

Then:

Y = Mx + C

This has transformed the non-linear equation into a linear equation, to which we can apply the standard least squares analysis method. Remember that in this case you will have a column of log y (which replaces y in your calculations to find M and C). However, when you have derived your values for m and c from least squares analysis, you must remember that these actually refer to M and C, hence you need to transform them back to the original values in the original equation y = c*m^x.

Hence, suppose you find M to equal 0.45 and C to equal 1.4; transforming these back to find m and c means taking the anti-logarithms of M and C. If you are using the log function (this is log base 10), this is simply:

m = 10^M and c = 10^C

That is, the actual gradient of your best fitting curve (your T value) is 10 raised to the power of M, and the intercept on the y axis (the U value) is 10 raised to the power of C.

This same procedure can be applied to the equations described by graphs (d) and (e). That is to say (see the sketch after this list):

- Identify the generic equation you wish to use to model the quantitative data
- Transform that equation into a comparable linear format
- Determine values of M and C (as appropriate) via least squares analysis
- Transform values of M and C (as appropriate) back to find the true values of m and c (as appropriate) in the original equation of the curve you have chosen to use.

[Graph (d): a power curve of the form y = c*x^m (y, dependent variable, against x, independent variable).]

For example, graph (d) above has the general equation y = c*x^m. This can be expanded using the laws of logarithms to derive:

y = c*x^m
log y = log(c*x^m)
log y = log c + log(x^m)
(2) log y = log c + m*log x

If you compare (2) with the general equation of a straight line (y = mx + c), you can see that it looks like a straight line equation if we let:

Y = log y
X = log x
C = log c

Then:

Y = mX + C

This has transformed the non-linear equation into a linear equation, to which we can apply the standard least squares analysis method. Remember that in this case you will have a column of log y (which replaces y in your calculations to find m and C), as well as a column of log x (which replaces your column of x). However, when you have derived your values for m and c from least squares analysis, you must remember that m is already solved, but c actually refers to C, and hence you need to transform this back to the original value in the original equation y = c*x^m.

Hence, suppose you find m to equal 0.45 and C to equal 1.4; transforming this back to find c means taking the anti-logarithm of C. If you are using the log function, this is simply:

c = 10^C

That is, the intercept on the y axis (c, also the U value) is 10 raised to the power of C.
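Here is a minimal Python sketch of this transform, fit and transform-back routine for the graph (d) type of curve. The dataset is invented for illustration (it roughly follows y = 2*x^2), and the least_squares helper simply applies the formulae for m and c given earlier.

    from math import log10

    # Illustrative data that roughly follows y = c * x**m (here roughly 2 * x**2).
    x = [1, 2, 3, 4, 5, 6]
    y = [2.0, 7.8, 18.1, 31.9, 50.2, 71.8]

    def least_squares(xs, ys):
        """Return gradient and intercept of the least squares line through (xs, ys)."""
        n = len(xs)
        sx, sy = sum(xs), sum(ys)
        sxy = sum(a * b for a, b in zip(xs, ys))
        sx2 = sum(a * a for a in xs)
        m = (n * sxy - sx * sy) / (n * sx2 - sx ** 2)
        c = (sy - m * sx) / n
        return m, c

    # Steps 1-3: transform to logs, then fit a straight line Y = m*X + C.
    X = [log10(v) for v in x]
    Y = [log10(v) for v in y]
    m, C = least_squares(X, Y)

    # Step 4: transform back -- m is already the true power, but c = 10**C.
    c = 10 ** C
    print(m, c)    # close to the 2 and 2 used to build the illustrative data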
[Graph (e): an exponential decay curve of the form y = c*exp(-mx) (y, dependent variable, against x, independent variable).]

For example, graph (e) above has the general equation y = c*(exp(-mx)), or y = c*e^(-mx). This can be expanded using the laws of natural logarithms to derive:

y = c*e^(-mx)
ln y = ln(c*e^(-mx))
ln y = ln c + ln(e^(-mx))
(3) ln y = ln c + (-mx)

This happens because ln(exp(X)) always equals X (i.e. the natural logarithm is the inverse of the exponential function). If you compare this with the general equation of a straight line (y = mx + c), you can see that equation (3) looks like a straight line equation if we let:

Y = ln y
C = ln c

Then:

Y = (-m)x + C

This has transformed the non-linear equation into a linear equation, to which we can apply the standard least squares analysis method. Remember that in this case you will have a column of ln y (which replaces y in your calculations to find m and C). However, when you have derived your values for m and c from least squares analysis, you must remember that m is already solved, but c actually refers to C, and hence you need to transform this back to the original value in the original equation y = c*e^(-mx).

Hence, suppose you find m to equal 0.45 and C to equal 1.4; transforming this back to find c means taking the inverse natural logarithm of C. If you are using the ln function, this is simply:

c = exp(C)

That is, c, the intercept on the y axis (the U value), is the exponential of C.

As with linear regression forecasts, you can also moderate your non-linear forecasts with seasonality values if appropriate.

Worked example

Consider the dataset below of average monthly temperature (in degrees Celsius) for the last 41 months in Sao Paulo. Assuming there are no other variables or external factors affecting the observed data, except the time of the year, forecast the temperature for the next year.

Month  Temperature   Month  Temperature   Month  Temperature   Month  Temperature
1      3             13     5             25     8             37     28
2      4             14     5.8           26     8.5           38     29
3      5.5           15     5.9           27     10            39     32
4      9             16     9             28     11            40     35
5      15            17     16            29     17.5          41     47
6      18            18     21            30     22
7      20            19     25            31     27.5
8      21            20     30            32     38
9      17            21     30            33     45
10     15            22     25            34     37
11     10            23     19            35     35
12     8             24     15            36     32

When this data is plotted on an x-y graph (see below), it is difficult to be certain at this juncture which forecasting model (linear or non-linear) to use to predict the temperature for the next 12 months. This would require you to research the context of the data. For the purposes of this guide, we will assume the relationship is non-linear and, based upon the three non-linear equations discussed above, we will use the generic equation:

(4) y = c*(exp(+mx)), or y = c*e^(+mx)

(Note that in this case, as the curve is upward sloping, the m value is given a positive sign.)

[Graph: observed temperature in Sao Paulo (degrees Celsius) plotted against month.]

Using the expansion of equation (4), we will use the transformation ln y = ln c + mx to convert our data into a linear form we can manipulate.
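As a cross-check on the hand calculation that follows, here is a minimal Python sketch of the same fit. It applies the natural log transformation just described to the 41 observed temperatures; the coefficients it prints should come out close to the m ≈ 0.041 and c ≈ 6.78 derived below (small differences arise because the hand calculation uses values rounded to two decimal places).

    from math import log, exp

    # The 41 months of observed temperature data from the table above.
    temps = [3, 4, 5.5, 9, 15, 18, 20, 21, 17, 15, 10, 8,
             5, 5.8, 5.9, 9, 16, 21, 25, 30, 30, 25, 19, 15,
             8, 8.5, 10, 11, 17.5, 22, 27.5, 38, 45, 37, 35, 32,
             28, 29, 32, 35, 47]
    months = list(range(1, len(temps) + 1))

    # Transform: ln y = ln c + m*x, then fit the straight line by least squares.
    Y = [log(t) for t in temps]
    n = len(months)
    sx, sy = sum(months), sum(Y)
    sxy = sum(a * b for a, b in zip(months, Y))
    sx2 = sum(a * a for a in months)

    m = (n * sxy - sx * sy) / (n * sx2 - sx ** 2)
    C = (sy - m * sx) / n
    c = exp(C)          # transform the intercept back to the original equation
    print(m, c)         # roughly 0.041 and 6.8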
Working through the calculation by hand, our dataset would look like this:

x (Month)  y (Temp)  ln y    x*ln y   (ln y)²  x²
1          3.00      1.10    1.10     1.21     1
2          4.00      1.39    2.77     1.92     4
3          5.50      1.70    5.11     2.91     9
4          9.00      2.20    8.79     4.83     16
5          15.00     2.71    13.54    7.33     25
6          18.00     2.89    17.34    8.35     36
7          20.00     3.00    20.97    8.97     49
8          21.00     3.04    24.36    9.27     64
9          17.00     2.83    25.50    8.03     81
10         15.00     2.71    27.08    7.33     100
11         10.00     2.30    25.33    5.30     121
12         8.00      2.08    24.95    4.32     144
13         5.00      1.61    20.92    2.59     169
14         5.80      1.76    24.61    3.09     196
15         5.90      1.77    26.62    3.15     225
16         9.00      2.20    35.16    4.83     256
17         16.00     2.77    47.13    7.69     289
18         21.00     3.04    54.80    9.27     324
19         25.00     3.22    61.16    10.36    361
20         30.00     3.40    68.02    11.57    400
21         30.00     3.40    71.43    11.57    441
22         25.00     3.22    70.82    10.36    484
23         19.00     2.94    67.72    8.67     529
24         15.00     2.71    64.99    7.33     576
25         8.00      2.08    51.99    4.32     625
26         8.50      2.14    55.64    4.58     676
27         10.00     2.30    62.17    5.30     729
28         11.00     2.40    67.14    5.75     784
29         17.50     2.86    83.00    8.19     841
30         22.00     3.09    92.73    9.55     900
31         27.50     3.31    102.74   10.98    961
32         38.00     3.64    116.40   13.23    1024
33         45.00     3.81    125.62   14.49    1089
34         37.00     3.61    122.77   13.04    1156
35         35.00     3.56    124.44   12.64    1225
36         32.00     3.47    124.77   12.01    1296
37         28.00     3.33    123.29   11.10    1369
38         29.00     3.37    127.96   11.34    1444
39         32.00     3.47    135.16   12.01    1521
40         35.00     3.56    142.21   12.64    1600
41         47.00     3.85    157.86   14.82    1681
SUM: 861   814.70    113.83  2626.12  336.27   23821

Hence from this table of values we can determine m and C to be:

m = [41(2626.12) - (861)(113.83)] / [41(23821) - (861)(861)]

C = [(113.83)(23821) - (861)(2626.12)] / [41(23821) - (861)(861)]

Hence:

m = 0.0411 (to 4 decimal places)
C = 1.9140 (to 4 decimal places)

However, as this gives us m and C, and we need m and c, we must transform the C value to obtain the true value of c for our non-linear equation. Hence:

c = exp(C)

Therefore:

m = 0.0411 (to 4 decimal places)
c = 6.7822 (to 4 decimal places)

Therefore the final non-linear equation is given as:

(5) y = 6.7822*e^(0.0411x)

Hence the forecasts for the next 12 months are found by inserting additional values of x into equation (5), which generates table 1 below.

Table 1 – For months 1 to 41 the table repeats the observed temperatures given earlier; for months 42 to 53 it gives the trend forecasts from equation (5):

Month 42: 38.11    Month 46: 44.92    Month 50: 52.95
Month 43: 39.71    Month 47: 46.80    Month 51: 55.17
Month 44: 41.38    Month 48: 48.77    Month 52: 57.48
Month 45: 43.11    Month 49: 50.81    Month 53: 59.89
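A quick way to check the table 1 forecasts is to evaluate equation (5) directly. A minimal sketch (any small differences from table 1 are just rounding of the coefficients):

    from math import exp

    # Trend forecasts from equation (5): y = 6.7822 * exp(0.0411 * x).
    for month in range(42, 54):
        print(month, round(6.7822 * exp(0.0411 * month), 2))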
If you plot these forecasts with the original data, you notice that whilst the trend (T) and underlying trend (U) seem appropriate, we are still missing the seasonal variation (S) in the data forecasts. This is the next step.

Seasonality can be addressed in two ways. The first is called the additive method, the second is called the multiplicative method. In both methods you are working out how much difference there is between your trend (T) forecast and the real data you have. You then adjust your (T) forecast by a suitable amount to end up with a final forecast. Your adjustments are based on the two calculations of:

Final forecast = T + S (additive method)
Final forecast = T * S (multiplicative method)

We shall use the additive method to continue with this worked example.

[Graph (f): temperature in Sao Paulo (degrees Celsius) against month, with the best fit (trend) line; the March points (n = 3, 15, 27, ...) are marked.]

When including seasonality (S) in forecasts, one approach is to consider whether the seasonality (i.e. the change observed in the actual data as it moves around the best fit line) is a regular constant change, or whether that change is increasing or decreasing with the passage of time. By looking at the original graph of the temperature in Sao Paulo over time, we could argue that the regular ups and downs observed in this data seem to repeat every 12th data value point (hence the seasonality has a period of 12), and that these changes at every 12th point appear to be more or less the same when compared with the rising trend (T). For example, if you look at graph (f), the 3rd data value point (n = 3), just after the bottom of the seasonal cycle, seems always to be a little below the trend (T) value for that time. Hence the temperature for each March (n = 3, 15, 27, 39) always seems to be roughly the same distance below the trend line (T).

We therefore need to modify the trend forecast values (T) we have determined from equation (5), by the amount of difference for each point in the seasonal cycle. This is worked out in this example by the simple calculation of (Actual - Forecast (T)). Hence in table 2 below, the difference between the actual data observed and the trend forecast is worked out. As this cycle of change in the data repeats itself with a period of 12, we can take the average of the differences for the same points in the cycle. This is shown in the column of table 2 headed 'Mean difference'. Hence the first value is the mean of the values (-4.07, -6.57, -10.95, -3.03).

Your final forecast is then the addition of the (T) forecast and the seasonal (S) modification. Hence the first final forecast value is found to be (T) + (S), which equals:

Final forecast for January (n = 1) = 7.07 + (-6.15) = 0.91

(The small difference from 0.92 comes from using unrounded values in the calculation.) If you repeat this calculation for all your months, including your new forecasts, you end up with table 2 and graph (g).
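Before looking at table 2, here is a minimal Python sketch of this additive adjustment. The quarterly figures in it are illustrative only, with a period of 4 rather than 12, purely to keep the example short; the logic is exactly the calculation described above.

    def additive_seasonal_forecast(actuals, trend, period):
        """Adjust trend forecasts by the average (actual - trend) difference
        for each position in the seasonal cycle."""
        diffs = [a - t for a, t in zip(actuals, trend)]
        # Average the differences that share the same position in the cycle.
        mean_diff = []
        for pos in range(period):
            same_pos = diffs[pos::period]
            mean_diff.append(sum(same_pos) / len(same_pos))
        # Final forecast = T + S for every trend value, including future ones.
        return [t + mean_diff[i % period] for i, t in enumerate(trend)]

    # Illustrative only: two full cycles of quarterly data plus a trend that
    # extends one cycle into the future (period = 4 instead of the 12 above).
    actuals = [12, 20, 30, 16, 14, 23, 33, 18]
    trend   = [15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37]
    print(additive_seasonal_forecast(actuals, trend, period=4))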
Table 2 – actual data, trend forecast (T), differences and final forecasts. (The ln y, x*ln y, (ln y)² and x² columns are the same as in the earlier table and are not repeated here; months 42 to 53 are future months, so there is no actual data or difference for them.)

Month  Actual  Trend (T)  Actual - T  Mean difference (S)  Final forecast (T + S)
1      3.00    7.07       -4.07       -6.15                0.91
2      4.00    7.36       -3.36       -6.05                1.31
3      5.50    7.67       -2.17       -5.27                2.40
4      9.00    7.99       1.01        -3.41                4.59
5      15.00   8.33       6.67        3.65                 11.98
6      18.00   8.68       9.32        4.95                 13.62
7      20.00   9.04       10.96       8.13                 17.18
8      21.00   9.42       11.58       12.96                22.38
9      17.00   9.82       7.18        13.26                23.08
10     15.00   10.23      4.77        7.53                 17.76
11     10.00   10.66      -0.66       2.43                 13.09
12     8.00    11.11      -3.11       -1.36                9.75
13     5.00    11.57      -6.57       -6.15                5.42
14     5.80    12.06      -6.26       -6.05                6.01
15     5.90    12.56      -6.66       -5.27                7.29
16     9.00    13.09      -4.09       -3.41                9.68
17     16.00   13.64      2.36        3.65                 17.29
18     21.00   14.21      6.79        4.95                 19.16
19     25.00   14.81      10.19       8.13                 22.94
20     30.00   15.43      14.57       12.96                28.39
21     30.00   16.08      13.92       13.26                29.34
22     25.00   16.75      8.25        7.53                 24.28
23     19.00   17.45      1.55        2.43                 19.89
24     15.00   18.19      -3.19       -1.36                16.83
25     8.00    18.95      -10.95      -6.15                12.79
26     8.50    19.74      -11.24      -6.05                13.70
27     10.00   20.57      -10.57      -5.27                15.30
28     11.00   21.44      -10.44      -3.41                18.03
29     17.50   22.34      -4.84       3.65                 25.99
30     22.00   23.27      -1.27       4.95                 28.22
31     27.50   24.25      3.25        8.13                 32.38
32     38.00   25.27      12.73       12.96                38.23
33     45.00   26.33      18.67       13.26                39.59
34     37.00   27.43      9.57        7.53                 34.96
35     35.00   28.58      6.42        2.43                 31.02
36     32.00   29.78      2.22        -1.36                28.42
37     28.00   31.03      -3.03       -6.15                24.88
38     29.00   32.33      -3.33       -6.05                26.28
39     32.00   33.69      -1.69       -5.27                28.41
40     35.00   35.10      -0.10       -3.41                31.70
41     47.00   36.58      10.42       3.65                 40.23
42     -       38.11      -           4.95                 43.06
43     -       39.71      -           8.13                 47.84
44     -       41.38      -           12.96                54.34
45     -       43.11      -           13.26                56.37
46     -       44.92      -           7.53                 52.45
47     -       46.80      -           2.43                 49.24
48     -       48.77      -           -1.36                47.41
49     -       50.81      -           -6.15                44.66
50     -       52.95      -           -6.05                46.90
51     -       55.17      -           -5.27                49.89
52     -       57.48      -           -3.41                54.08
53     -       59.89      -           3.65                 63.55

[Graph (g): forecast and actual temperature in Sao Paulo – the original data, the best fit (trend) line and the final (seasonally adjusted) forecasts plotted against month (degrees Celsius).]

Graph (g) therefore shows a good fit between your final forecasts and the original data, suggesting that future forecasts made with this model should be reasonably appropriate and accurate. To determine how accurate your forecasts are, you now need to determine the errors in them. This is examined below.

As an aside, if you were to apply the multiplicative method to the (T) forecasts in this example, then rather than working out (Actual - Forecast (T)) as above, you would instead determine:

Seasonal proportional change = (Actual / Forecast (T)) * 100%

Hence, as with the calculations above, you end up with a proportional change (a % change to be made to each (T) forecast) so as to end up with the final forecast needed. This method therefore allows you to consider increasing or decreasing rates of change in the seasonality affecting the observed data.

Methods of handling quantitative data for forecasting (3)

In applying any statistical or numerical method to a range of data, it is helpful to know whether your forecasts are good, bad or indifferent. The value of r² has already been introduced, but there are other measures of goodness that can be used to give information about how effective your analysis has been. Some of these are listed below.

Mean Error (ME) – This is the summed difference between your forecasts and the data you have observed, divided by the number of observations. It has the form:

ME = Σ(ye - y) / n

where n = the number of data values of y you have.

Mean Square Error (MSE) – This is the summed, squared difference between your forecasts and the data you have observed, divided by the number of observations. It has the form:

MSE = Σ(ye - y)² / n

where n = the number of data values of y you have. It is a better measure of goodness in your forecasts than ME, as squaring eliminates negative values in the differences between your forecasts and the observed data.

Mean Absolute Deviation (MAD) – This is the summed absolute difference between your forecasts and the data you have observed, divided by the number of observations. It has the form:

MAD = Σ│ye - y│ / n

where n = the number of data values of y you have. The vertical bars denote taking the absolute difference (ignoring any negative signs). It is also a better measure of goodness than ME, as it too eliminates negative values in the differences between your forecasts and the observed data.
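Here is a minimal Python sketch of these three measures. The two short lists are illustrative values only, not the worked example data.

    def forecast_errors(actuals, forecasts):
        """Return the mean error, mean square error and mean absolute deviation."""
        n = len(actuals)
        diffs = [f - a for f, a in zip(forecasts, actuals)]   # forecast minus observed
        me  = sum(diffs) / n
        mse = sum(d ** 2 for d in diffs) / n
        mad = sum(abs(d) for d in diffs) / n
        return me, mse, mad

    # Illustrative values only.
    actuals   = [3, 4, 5.5, 9, 15]
    forecasts = [2.9, 4.6, 5.1, 8.2, 16.0]
    print(forecast_errors(actuals, forecasts))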
Worked example

If we apply the MSE calculation to the worked example in table 2, we need to calculate the sum of the squared differences between our forecasts and the original data, then divide this by the number of data values we have (n = 41). Hence:

MSE = 566.55 / 41
MSE = 13.82 (to 2 decimal places)

Taking the square root of this value, each forecast value is, on average, within about 3.7 degrees of the original data.

In terms of calculating the coefficient of determination:

r² = Σ(ye - ȳ)² / Σ(y - ȳ)²

Using the data in table 2, we can find this to be:

r² = 4611.709 / 5596.76
r² = 0.8239

In other words, the model used has accounted for approximately 82% of the observed variance in the data, representing a good fit with the data.

References:

Carlson, G.A. (2002) 'Least Squares Analysis'. Accessed at http://www.stchas.edu/faculty/gcarlson/physics/docs/LeastSquaresAnalysis.pdf on 24 March 2006.

Jennings, D. and Wattam, S. (1998) 'Decision Making: An Integrated Approach'. FT Prentice Hall, UK.

The information in this leaflet can be made available on request. Telephone 01482 466199.

© 03/2008