Simple Regression Analysis

Dr. H.I. Jimoh
Department of Geography, University of Ilorin, Nigeria

Introduction

Regression analysis is a measure of the predictability of one set of variates from another (Williams and Terry, 1989; Hammond and McCullagh, 1978). Alternatively, it can be regarded as a summary expression of the relationship between two variables, or as a means of highlighting, interpreting or predicting unknown values of one variable from known values of the other. When it is seen as a summary expression of the relationship between two variables, a regression line (line of best fit) is fitted on a scattergram (see Fig. 1). When it is viewed as a measure of prediction, it involves building a simple regression model such as y = a + bx. Thus, a simple regression model can be described as an analytical tool that measures both association and prediction between two data sets.

From the above, two sets of variables are required: one independent and the other dependent. Consider, for instance, the relationship between the depth of erosion and the annual rainfall total in Kogi State, Nigeria. The annual rainfall total is the independent variable and the depth of erosion is the dependent variable, because the depth of erosion depends on the force of the annual rainfall total.

[Fig. 1: Scattergram of X & Y Data. x-axis: Annual Rainfall Total (mm); y-axis: Erosion Rates (cm).]

Further, linear regression is estimated with the formula y = a + bx, where:
y = dependent variable
x = independent variable
a = intercept on the y-axis, the value of y at the point where x = 0
b = slope coefficient.

This definition is also presented in the form of a diagram (see Fig. 2).

In the social sciences, the relevance of simple regression analysis as an analytical tool is enormous.
This is even more crucial when the prime focus of a research project is either on association between issues or events, or on making predictions from available information. It is this appreciation of linear regression as an analytical tool that has prompted the need to examine all aspects of simple regression analysis. The hope is to make the seemingly difficult aspects of this analytical tool accessible to those 20th century scholars in the Third World who believe that rigorous analytical tools are undesirable in research endeavours.

The Need for Simple Regression Analysis

In the social sciences, research efforts vary in content, approach and methodology. However, the appropriate technique of analysis may be decided by the focus of the research, the uses to which the results will be put, the nature of the data, and the method of data acquisition, among others. These criteria notwithstanding, simple regression analysis has been applied with rewarding results across social science research. Examples include examining the future behaviour of a new business line in a mixed economy, the voting pattern among the different strata of a population, the trend of the accounting system in a capitalist economy, and weather forecasting from atmospheric parameters. Such research exercises call for simple regression analysis as a means of appreciating the degree of association between variables.

In addition, empirical research has in recent times dominated most research efforts in the social sciences. Such research calls for serious statements on both associations and predictions, and again the basic tool is simple regression analysis. For instance, Geomorphology (a specialized field in Geography) rests comfortably on the application of simple regression analysis (see Ebisemiju 1976, Oyegun 1980, Iweze 1988, Jimoh 1994).
However, it must be mentioned that the selection of an analytical tool should not depend on the personal interest of the researcher. It should instead be based on the relevance of the tool to the research focus, because a basis must be established for the choice of analytical tools.

[Fig. 2: Characteristic of Linear Regression. The line y = a + bx with the intercept (a) and slope (b) marked. x-axis: Annual Rainfall Total (mm); y-axis: Erosion Rates (cm).]

Computation of Simple Regression

The computation of simple regression analysis follows a well-ordered procedure, which includes:

(a) Formulation of Hypothesis. The hypothesis is the starting point for investigation and is of two types: the Null Hypothesis (Ho) and the Alternative Hypothesis (H1). The Null Hypothesis is a negative proposition, formulated for the purpose of applying a statistical test to a problem under investigation, generally in anticipation of being rejected as false (Hammond and McCullagh 1979). On the rejection of Ho, the Alternative Hypothesis applies; it is a positive proposition, formulated for the same purpose, generally in anticipation of being accepted as true. Hypothesis formulation is important in trend fitting, especially after the significance level has been determined, because it states the level of correctness of the association.

(b) Categorization of Data into Dependent and Independent Variables. This assists the construction of the scattergram, a step necessary for the trend fitting exercise.

(c) Statement of the Formula. This is of the form y = a + bx. To facilitate the application of the linear regression model, the "a" and "b" components of the equation must be defined appropriately. The slope is:

$$b = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^{2} - (\sum x)^{2}}$$

where
b = slope
n = number of observations
Σ = summation sign
x and y = designation of data under investigation.
Note that it is important to compute "b" first because it enters into the computation of "a":

$$a = \frac{\sum y - b\sum x}{n}$$

where
a = intercept
Σ = summation sign
b = slope
x and y = designation of data under investigation.

(d) Calculation. For instance, observations have been made on the rate of sediment yield during rainfall events over a certain period of time (see Table 1).

Table 1: Rainfall and Sediment Yield in a Region.

Number of       X                      Y
observations    Rainfall total (mm)    Sediment yield (g)
1               1.0                    3.0
2               3.0                    5.0
3               4.0                    9.0
4               7.0                    7.0
5               10.0                   8.0
6               12.0                   16.0
7               15.0                   20.0

From Table 1, another table is computed to show the component parts of the linear regression model (Table 2).

Table 2: Component Parts of the Linear Regression Model

N       X       Y       x²       y²       xy
1       1.0     3.0     1.0      9.0      3.0
2       3.0     5.0     9.0      25.0     15.0
3       4.0     9.0     16.0     81.0     36.0
4       7.0     7.0     49.0     49.0     49.0
5       10.0    8.0     100.0    64.0     80.0
6       12.0    16.0    144.0    256.0    192.0
7       15.0    20.0    225.0    400.0    300.0
n = 7   Σx = 52   Σy = 68   Σx² = 544   Σy² = 884   Σxy = 675

$$b = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^{2} - (\sum x)^{2}} = \frac{7 \times 675 - 52 \times 68}{7 \times 544 - 2704} = \frac{4725 - 3536}{3808 - 2704} = \frac{1189}{1104} = 1.08$$

$$a = \frac{\sum y - b\sum x}{n} = \frac{68 - 1.08(52)}{7} = 1.69$$

The simple regression equation therefore becomes y = 1.69 + 1.08x. With this equation, it becomes possible to fit the line of best fit, from which associations can be stated and predictions made between pairs of data sets.

(e) Trend Fitting Stage. Fitting the line of best fit involves the construction of a scattergram after the identification of the dependent and independent variables (Fig. 3). It begins by establishing the point of origin of the line, which is the intercept "a" on the "y" axis: at x = 0, y = 1.69 + 1.08(0) = 1.69 (Equation 1). A second point is obtained from another value of x, for example at x = 16, y = 1.69 + 1.08(16) = 1.69 + 17.28 = 18.97 (Equation 2). The two points given by Equations 1 and 2 (see Fig. 3) are joined to fit the line of best fit. This method of trend fitting is known as the method of least squares. Another method is the semi-averages.
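The calculation in step (d) and the two plotting points in step (e) can be reproduced with a short Python sketch. It is only an illustration under stated assumptions: the function name fit_line is hypothetical, and x = 16 is assumed for the second plotting point because 1.08 × 16 gives the 17.28 that appears in Equation 2.

```python
# Rainfall total (mm) and sediment yield (g) from Table 1.
x = [1.0, 3.0, 4.0, 7.0, 10.0, 12.0, 15.0]
y = [3.0, 5.0, 9.0, 7.0, 8.0, 16.0, 20.0]


def fit_line(x, y):
    """Return (a, b) for y = a + b*x from the summation formulas in step (c)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # slope
    a = (sum_y - b * sum_x) / n  # intercept, using the unrounded slope
    return a, b


a, b = fit_line(x, y)

# The text rounds b to two decimals before computing a; reproduce that step too.
b_rounded = round(b, 2)
a_rounded = round((sum(y) - b_rounded * sum(x)) / len(x), 2)

# Two points for drawing the line; x = 16 is an assumed second point.
point_1 = (0.0, a_rounded + b_rounded * 0.0)
point_2 = (16.0, a_rounded + b_rounded * 16.0)

print(f"unrounded: a = {a:.4f}, b = {b:.4f}")
print(f"as in the text: y = {a_rounded} + {b_rounded}x")
print(f"points for the line: {point_1}, {point_2}")
```

Carrying the unrounded slope through gives an intercept of about 1.71 rather than 1.69. The difference arises only because the worked example rounds b to 1.08 before computing a.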
Fig. 3: BIVARIATE REGRESSION ON SEDIMENT YIELD WITH RAINFALL TOTAL

(f) Determination of the Rate of Deviation of Data Points from the Line of Best Fit. This process is necessary to ascertain that the path of the line of best fit is correct. The observed and predicted values of y, the deviations, and the squares of these deviations are presented in Table 3.

Table 3: Comparing the Observed and Predicted Values for the Least Squares Model

X       Y       Ŷ = 1.69 + 1.08X    (Y - Ŷ)                  (Y - Ŷ)²
1.0     3.0     2.77                3 - 2.77 = 0.23          0.05
3.0     5.0     4.93                5 - 4.93 = 0.07          0.00
4.0     9.0     6.01                9 - 6.01 = 2.99          8.94
7.0     7.0     9.25                7 - 9.25 = -2.25         5.06
10.0    8.0     12.49               8 - 12.49 = -4.49        20.16
12.0    16.0    14.65               16 - 14.65 = 1.35        1.82
15.0    20.0    17.89               20 - 17.89 = 2.11        4.45
Sum of errors = 0.01; SSE = 40.48

From Table 3, the sum of squares of the deviations (SSE) is 40.48, and this value is smaller than that of any other line fitted to the scattergram. This is because the best fitting straight line is defined as the one that satisfies the least squares criterion; this line is called the least squares line, and its equation is called the least squares prediction equation. One can therefore be confident that the path of the trend is correct and that the error term is minimized.

Interpretation/Application

There is no hard and fast rule regarding the interpretation or application of simple regression analysis. What is basic, however, is that a simple regression is a package of information about a research question, and its interpretation or application depends wholly on an in-depth understanding of the mechanism of simple regression analysis. In this view, understanding simple regression in its entirety is the only way in which it can be interpreted or applied to research questions.
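The deviations reported in Table 3 can be checked with a short Python sketch. It is an illustration only: the helper names residuals and sse are hypothetical, and the comparison line y = 2 + x is an arbitrary assumption chosen to show that a different line gives a larger sum of squared deviations.

```python
# Rainfall total (mm) and sediment yield (g) from Table 1.
x = [1.0, 3.0, 4.0, 7.0, 10.0, 12.0, 15.0]
y = [3.0, 5.0, 9.0, 7.0, 8.0, 16.0, 20.0]

# Rounded coefficients of the fitted equation y = 1.69 + 1.08x.
a, b = 1.69, 1.08


def residuals(x, y, a, b):
    """Observed minus predicted y for each observation."""
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


def sse(x, y, a, b):
    """Sum of squared deviations from the line y = a + b*x."""
    return sum(e ** 2 for e in residuals(x, y, a, b))


errors = residuals(x, y, a, b)
sse_fitted = sse(x, y, a, b)

# Arbitrary alternative line (an assumption, for comparison only).
sse_other = sse(x, y, 2.0, 1.0)

print(f"sum of errors = {sum(errors):.2f}")
print(f"SSE for the fitted line = {sse_fitted:.2f}")
print(f"SSE for y = 2 + x = {sse_other:.2f}")
```

The unrounded squared deviations sum to about 40.50, slightly above the 40.48 in Table 3, because the table adds squares already rounded to two decimals. The alternative line gives a larger value, which is consistent with the least squares criterion.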
Appraisal of Simple Regression

In social science research, simple regression analysis has been found to be of paramount importance in the following ways:

(a) It is a useful tool for making valid predictions. For example, one can predict the rate of sediment yield (y axis) on surfaces from rainfall quantities (x axis), that is, predict the value of "y" on "x". The formula is of the form:

$$y - \bar{y} = r\,\frac{s_y}{s_x}\,(x - \bar{x})$$

where
r = product moment correlation coefficient
x̄ and ȳ = the means of x and y
s_x and s_y = the standard deviations of the given values of x and y.

It is also possible to predict rainfall quantities given a particular amount of sediment material, that is, to predict "x" on "y". The formula is of the form:

$$x - \bar{x} = r\,\frac{s_x}{s_y}\,(y - \bar{y})$$

where the components of this formula are as defined above.

(b) It equally assists in understanding the strength of the influence of variables on each other, for instance the influence of rainfall values on the rate of sediment yield from surfaces. This enables the scholar to determine the precise relevance of rainfall to the rate of sediment yield on the surface.

(c) Simple regression analysis is a basic tool that facilitates comparative study.

(d) Finally, it provides a basis for making decisions about the nature of the relationship between a pair of variables, for example the effect of price changes on the quantity of rice purchased over a certain period of time in Nigeria.

Generally, simple regression analysis is a vital analytical tool in social science research, through which problems in making valid decisions, predictions, interpretations and explanations have been surmounted.

Conclusion

Simple regression analysis is a fundamental tool in most social science research. This becomes obvious once a better grip of the analytical tool is achieved, but that grip can only come from its computation, interpretation and application.
References

Ebisemiju, S.F. (1976) "The Structure of the Interrelationship of Drainage Basin Characteristics". Ph.D. Thesis, Dept. of Geography, Ibadan. Unpublished.

Hammond, R. and P.S. McCullagh (1979) Quantitative Techniques in Geography. Oxford: OUP.

Iweze, A.C. (1988) "Rainfall Erosion and Soil Erodibility in West Africa". Proceedings of the 3rd Symposium: Nigerian Department of Meteorology Services, Federal Ministry of Aviation, Lagos, Nigeria.

Jimoh, H.I. (1994) "Effects of Erosion on Foundation of Houses in Okene and its Environs". Ilorin Journal of Business and Social Sciences, vol. 4, pp. 133-142.

Oyegun, R.O. (1980) "The Effects of Tropical Rainfall on Sediment Yield from Different Surfaces in Sub-Urban Ibadan". Ph.D. Thesis, Dept. of Geography, Ibadan. Unpublished.

Williams, M. and S. Terry (1989) A Second Course in Business Statistics: Regression Analysis. New Jersey: Dellen Publishing Company.