Simple Regression Analysis

Dr. H.I. Jimoh
Department of Geography,
University of Ilorin, Nigeria
Introduction
Regression analysis is a measure of the predictability of one set of variates from
another (Williams and Terry, 1989; Hammond and McCullagh, 1979). Alternatively, it can
be looked at as a summary expression of the relationship between two
variables, or as a means of highlighting, interpreting or predicting unknown values of one
variable from known values of the other. Whenever it is seen as a summary expression of
the relationship between two variables, a regression line (line of best fit) is fitted on a
scattergram (see Fig. 1). When viewed as a measure of prediction, it involves the
building of a simple regression model such as y = a + bx. Thus, a simple regression
model can be described as an analytical tool that measures both association and
prediction between two data sets.
From the above, it is important to note that two sets of variables are
required: one independent and the other dependent. Consider, for instance, the relationship
between the depth of erosion and the annual rainfall total in Kogi State, Nigeria. The annual
rainfall total is the independent variable and the depth of erosion is the dependent variable,
because the depth of erosion depends on the annual rainfall total.
[Fig. 1: Scattergram of X & Y Data. y-axis: Erosion Rates (cm); x-axis: Annual Rainfall Total (mm)]
Further, linear regression is estimated with the formula y = a + bx, where:
y = dependent variable
x = independent variable
a = intercept on the y-axis, the value of y at the point where x = 0
b = slope coefficient.
This definition has also been presented in the form of a diagram (see Fig. 2).
In the social sciences, the relevance of simple regression analysis as an
analytical tool is enormous. This is even more so when the prime focus of a research
effort is either the association between issues or events, or the making of predictions
from available information. It is this appreciation of linear
regression as an analytical tool that has provoked the need to examine all aspects of
simple regression analysis. This is in the hope of clarifying the seemingly difficult
aspects of this analytical tool for some of the 20th century scholars in the Third World
who believed that rigorous analytical tools are undesirable in research endeavours.
The Need for Simple Regression Analysis
In the social sciences, research efforts by scholars vary in content, approach and
methodology. However, the appropriate technique of analysis may be decided by the
focus of the research, the uses to which the results will be put, the nature of the data, and
the method of data acquisition, among others. These criteria notwithstanding, the
application of simple regression analysis has yielded rewarding results across
social science research. Examples include the need to examine the future behaviour of a new
business line in a mixed economy, the voting pattern among the different strata of a
population, the trend of the accounting system in a capitalist economy, and weather forecasting
from atmospheric parameters, among others. These research exercises in various aspects
of the social sciences call for the application of simple regression analysis as a means of
appreciating the degree of association between variables. In addition, empirical
research has in recent times dominated most research efforts in the social sciences. Such
research endeavours call for serious statements on both associations and predictions. Again, the
basic tool is simple regression analysis. For instance, Geomorphology (a specialized field in
Geography) rests comfortably on the application of simple regression analysis (see Ebisemiju
1976, Oyegun 1980, Iweze 1988, Jimoh 1994). However, it must be mentioned that the
selection of an analytical tool should not rest on the personal interest of the researcher. It
should be based on the relevance of the tool to the research focus, because a
basis must be established for the choice of analytical tools.
[Fig. 2: Characteristic of Linear Regression. The line y = a + bx, with the intercept (a) and slope (b) marked. y-axis: Erosion Rates (cm); x-axis: Annual Rainfall Total (mm)]
Computation of Simple Regression
There exists in the computation of simple regression analysis a well-ordered
method of procedure which includes:
(a) Formulation of Hypothesis.
The hypothesis is the starting point for investigation, and is of two types: the Null
Hypothesis (H0) and the Alternative Hypothesis (H1). The Null Hypothesis is a negative
proposition, formulated for the purpose of applying a statistical test to a problem under
investigation, generally in anticipation of being rejected as false (Hammond and
McCullagh 1979). The Alternative Hypothesis is the positive proposition that is taken up
when H0 is rejected; it is likewise formulated for the purpose of applying a statistical test
to the problem under study, generally in anticipation of being accepted as true. Hypothesis
formulation is important in trend fitting, especially after the significance level has been
determined, because it indicates how much confidence can be placed in the stated association.
(b) Categorization of Data into Dependent and Independent Variables.
This assists the construction of the scattergram, a step necessary for the trend
fitting exercise.
(c) Statement of the Formula
This is of the form y = a + bx. To facilitate the application of the linear regression
model, the "a" and "b" components of the equation (y = a + bx) must be defined
as follows:
$$b = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2}$$
where
b = slope
n = number of observations
Σ = summation sign
x and y = designation of the data under investigation.
Note that it is important to compute "b" first because it enters into the computation of
"a", thus:
$$a = \frac{\sum y - b\sum x}{n}$$
where
a = intercept
Σ = summation sign
b = slope
n = number of observations
x and y = designation of the data under investigation.
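The two formulas above translate directly into code. The following is a minimal sketch in Python; the function name `fit_line` and the three-point dataset are illustrative assumptions and do not come from the text.

```python
def fit_line(xs, ys):
    """Return (a, b) for the line y = a + b*x using the summation formulas."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # The slope is computed first because it enters into the intercept.
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b


# Hypothetical data lying exactly on y = 1 + 2x (not from the paper).
a, b = fit_line([0, 1, 2], [1, 3, 5])
print(a, b)
```

Because the sample points lie exactly on a straight line, the function should recover that line's intercept and slope, which is a quick way to check the formulas before applying them to field data.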
(d) Calculation
(See the example below.)
For instance, observations have been made on the rate of sediment yield during rainfall events
over a certain period of time (see Table 1).
Table 1: Rainfall and Sediment Yield in a Region.
Number of        X: Rainfall     Y: Sediment
observations     total (mm)      yield (gms)
1                1.0             3.0
2                3.0             5.0
3                4.0             9.0
4                7.0             7.0
5                10.0            8.0
6                12.0            16.0
7                15.0            20.0
From Table 1, another table is computed to show the various
component parts of the linear regression model (Table 2).
Table 2: Component Parts of the Linear Regression Model.
N       X       Y       x2       y2       xy
1       1.0     3.0     1.0      9.0      3.0
2       3.0     5.0     9.0      25.0     15.0
3       4.0     9.0     16.0     81.0     36.0
4       7.0     7.0     49.0     49.0     49.0
5       10.0    8.0     100.0    64.0     80.0
6       12.0    16.0    144.0    256.0    192.0
7       15.0    20.0    225.0    400.0    300.0
n = 7   Σx = 52   Σy = 68   Σx2 = 544   Σy2 = 884   Σxy = 675
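The column totals of Table 2 can be checked with a few lines of code. This is a small sketch in Python using the rainfall and sediment yield values of Table 1; the variable names and the `totals` dictionary are illustrative choices.

```python
# Rainfall totals (x, mm) and sediment yields (y, gms) from Table 1.
x = [1.0, 3.0, 4.0, 7.0, 10.0, 12.0, 15.0]
y = [3.0, 5.0, 9.0, 7.0, 8.0, 16.0, 20.0]

totals = {
    "n": len(x),
    "sum_x": sum(x),
    "sum_y": sum(y),
    "sum_x2": sum(xi ** 2 for xi in x),
    "sum_y2": sum(yi ** 2 for yi in y),
    "sum_xy": sum(xi * yi for xi, yi in zip(x, y)),
}
print(totals)
```

The printed totals should agree with the bottom row of Table 2 (52, 68, 544, 884 and 675 for seven observations).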
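$$b = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2} = \frac{7 \times 675 - 52 \times 68}{7 \times 544 - 2704} = \frac{4725 - 3536}{3808 - 2704} = \frac{1189}{1104} \approx 1.08$$
$$a = \frac{\sum y - b\sum x}{n} = \frac{68 - 1.08(52)}{7} \approx 1.69$$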
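The arithmetic for "b" and "a" can be reproduced as follows. This is a sketch in Python that takes the column totals as given and, like the worked example, rounds the slope to two decimal places before computing the intercept; the variable names are illustrative.

```python
# Column totals from the worked example.
n, sum_x, sum_y, sum_x2, sum_xy = 7, 52, 68, 544, 675

# Slope: 1189 / 1104, rounded to two decimals as in the text.
b_exact = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b = round(b_exact, 2)

# Intercept computed from the rounded slope, as in the text.
a = round((sum_y - b * sum_x) / n, 2)

# For comparison only: the intercept if the slope is not rounded first.
a_unrounded = (sum_y - b_exact * sum_x) / n

print(b, a, round(a_unrounded, 2))
```

Carrying the unrounded slope forward gives an intercept of about 1.71 rather than 1.69; the difference comes only from rounding "b" before it is used, and the rest of the example keeps the rounded values.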
The simple regression equation therefore becomes y = 1.69 + 1.08x. With this
equation, it becomes possible to draw the line of best fit, from which
associations can be stated and predictions made between pairs of datasets.
(e) Trend Fitting Stage
The fitting of the line of best fit involves the construction of a scattergram after the
identification of the dependent and independent variables (Fig. 3).
It begins with establishing the point of origin of the line, which
is usually on the "y" axis. For example, at x = 0, y = 1.69 + 1.08(0) = 1.69
(Equation 1). A second point is obtained by substituting another value of x. For example,
at x = 16, y = 1.69 + 1.08(16) = 1.69 + 17.28 = 18.97 (Equation 2).
Thus, the two points given by Equations 1
and 2 (see Fig. 3) fix the position of the line of best fit.
This method of trend fitting is known as the method of least squares. Another
method is that of semi-averages.
[Fig. 3: Bivariate Regression on Sediment Yield with Rainfall Total]
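Since a straight line is fixed by two points, the trend fitting stage reduces to evaluating the regression equation at two values of x. The sketch below, in Python, does this for y = 1.69 + 1.08x. The function name `predict` is illustrative, and the second value, x = 16, is an assumption inferred from the product 17.28 = 1.08 × 16 that appears in Equation 2.

```python
def predict(x, a=1.69, b=1.08):
    """Predicted sediment yield from the fitted line y = a + b*x."""
    return a + b * x


# Two points are enough to draw the line on the scattergram.
p1 = (0, round(predict(0), 2))    # point on the y-axis (the intercept)
p2 = (16, round(predict(16), 2))  # second point, x = 16 assumed
print(p1, p2)
```

Joining the two points on the scattergram gives the line of best fit.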
(f) Determination of the Deviation of Data Points from the Line of Best Fit.
This process is necessary to ascertain that the path of the line of best fit
is correct.
The observed and predicted values of y, the deviations between them, and the squares
of these deviations are presented in Table 3.
Table 3: Comparing the Observed and Predicted Values for the Least Squares Model
X       Y (observed)    Ŷ = 1.69 + 1.08X    (Y - Ŷ)                 (Y - Ŷ)2
1.0     3.0             2.77                3 - 2.77 = 0.23         0.05
3.0     5.0             4.93                5 - 4.93 = 0.07         0.00
4.0     9.0             6.01                9 - 6.01 = 2.99         8.94
7.0     7.0             9.25                7 - 9.25 = -2.25        5.06
10.0    8.0             12.49               8 - 12.49 = -4.49       20.16
12.0    16.0            14.65               16 - 14.65 = 1.35       1.82
15.0    20.0            17.89               20 - 17.89 = 2.11       4.45
Sum of errors = 0.01                        SSE = 40.48
From Table 3, it is observed that the sum of squares of the deviations (SSE) is
40.48. This value is smaller than the SSE of any other straight line fitted to the
scattergram, because the best fitting straight line has been defined as
the one that satisfies the least squares criterion. This line is called the least squares line,
and its equation is called the least squares prediction equation.
The understanding here is that one can be confident that the path of the trend is
correct and that the error term has been minimized.
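The deviations and their sum of squares can be recomputed directly from the data. The following is a sketch in Python; the helper `sse` and the alternative line y = 2.0 + 1.0x are illustrative assumptions, the latter chosen arbitrarily to show a larger SSE.

```python
# Rainfall totals (x, mm) and sediment yields (y, gms) from Table 1.
x = [1.0, 3.0, 4.0, 7.0, 10.0, 12.0, 15.0]
y = [3.0, 5.0, 9.0, 7.0, 8.0, 16.0, 20.0]


def sse(a, b):
    """Sum of squared deviations of the observed y from the line a + b*x."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


# Deviations (observed minus predicted) for the fitted line y = 1.69 + 1.08x.
residuals = [yi - (1.69 + 1.08 * xi) for xi, yi in zip(x, y)]
sum_of_errors = sum(residuals)

sse_fitted = sse(1.69, 1.08)
sse_other = sse(2.0, 1.0)  # arbitrary alternative line, for comparison only

print(round(sum_of_errors, 2), round(sse_fitted, 2), round(sse_other, 2))
```

Without intermediate rounding the SSE of the fitted line comes to about 40.50; the tabulated 40.48 results from rounding each squared deviation to two decimal places before summing. The alternative line gives a larger SSE, as the least squares criterion leads one to expect.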
Interpretation/Application
There is no hard and fast rule regarding the interpretation or application of
simple regression analysis. What is basic, however, is that a simple regression is a
package of information about a research question, and its interpretation or application rests
wholly on an in-depth understanding of the mechanism of simple regression analysis. In
this view, understanding simple regression in its entirety is the only way through which
it can be interpreted or applied to research questions.
Appraisal of Simple Regression
In social science research, simple regression analysis has been found to be
of paramount importance in the following ways:
(a) It is a useful tool for making valid predictions. For example, one can predict the rate of
sediment yield (y axis) on surfaces from rainfall quantities (x axis). That is, predict the
value of "y" on "x".
The formula is of the form
$$y - \bar{y} = r \frac{s_y}{s_x}(x - \bar{x})$$
where:
r = product moment correlation coefficient
\bar{x} and \bar{y} = the means of x and y
s_x and s_y = the standard deviations of the given values of x and y.
Also, it is possible to predict rainfall quantities, given a particular amount of sediment
materials. This in essence means predicting "x" on "y". The formula is of
the form:
$$x - \bar{x} = r \frac{s_x}{s_y}(y - \bar{y})$$
where the components of this formula are as earlier defined.
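Both prediction formulas can be sketched in code. The Python example below applies them to the rainfall and sediment yield data of Table 1; the function names `predict_y` and `predict_x` are illustrative, and the use of population standard deviations (dividing by n) is an assumption, though the ratio of the two standard deviations is the same when both are computed with n - 1.

```python
from math import sqrt

# Rainfall totals (x, mm) and sediment yields (y, gms) from Table 1.
x = [1.0, 3.0, 4.0, 7.0, 10.0, 12.0, 15.0]
y = [3.0, 5.0, 9.0, 7.0, 8.0, 16.0, 20.0]
n = len(x)

mean_x = sum(x) / n
mean_y = sum(y) / n

# Population standard deviations (divide by n); an assumption, see above.
s_x = sqrt(sum((xi - mean_x) ** 2 for xi in x) / n)
s_y = sqrt(sum((yi - mean_y) ** 2 for yi in y) / n)

# Product moment correlation coefficient.
r = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n * s_x * s_y)


def predict_y(x_value):
    """Predict y on x: y - mean_y = r * (s_y / s_x) * (x - mean_x)."""
    return mean_y + r * (s_y / s_x) * (x_value - mean_x)


def predict_x(y_value):
    """Predict x on y: x - mean_x = r * (s_x / s_y) * (y - mean_y)."""
    return mean_x + r * (s_x / s_y) * (y_value - mean_y)


print(round(r, 3), round(predict_y(10.0), 2), round(predict_x(10.0), 2))
```

The slope of the first formula, r multiplied by s_y / s_x, equals the least squares slope "b" (1189/1104 for these data), so predicting "y" on "x" in this form is the same line as y = a + bx. The line for predicting "x" on "y" is in general a different line.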
(b) It equally assists in understanding the strength of the influence of variables on each other.
For instance, consider a problem relating rainfall values to the rate of sediment
yield from surfaces. Regression enables the scholar to determine the precise
relevance of the rainfall factor to the rate of sediment yield on the surface.
(c) Simple regression analysis is a basic tool that facilitates comparative study.
(d) Finally, it provides a basis for making decisions about the nature of the
relationship between a pair of variables. For example, the effects of price
change on the quantity of rice purchased over a certain period of time in a
Nigerian.
Generally, simple regression analysis is a vital analytical tool in social
science research, through which problems in making valid decisions,
predictions, interpretations, and explanations have been surmounted.
Conclusion
Simple regression analysis is a fundamental tool in most social science
research. This becomes obvious once a firm grasp of the tool is
achieved, and such a grasp can only come through its computation, interpretation and application.
References
Ebisemiju, S.F. (1976) "The Structure of the Interrelationship of Drainage Basin
Characteristics". Ph.D. Thesis, Dept. of Geography, Ibadan. Unpublished.
Hammond, R. and P.S. McCullagh (1979) Quantitative Techniques in Geography.
Oxford: OUP.
Iweze, A.C. (1988) "Rainfall Erosion and Soil Erodibility in West Africa". Proceedings
of the 3rd Symposium: Nigerian Department of Meteorology Services, Federal
Ministry of Aviation, Lagos, Nigeria.
Jimoh, H.I. (1994) "Effects of Erosion on Foundation of Houses in Okene and its
Environs". Ilorin Journal of Business and Social Sciences vol. 4 pp. 133-142.
Oyegun, R.O. (1980) "The Effects of Tropical Rainfall on Sediment Yield from
Different Surfaces in Sub-Urban Ibadan". Ph.D. Thesis, Dept. of Geography,
Ibadan. Unpublished.
Williams, M. and S. Terry (1989) A Second Course in Business Statistics: Regression
Analysis. New Jersey: Dellen Publishing Company.