LINEAR REGRESSION - CLSU Open University

Module 9
Linear Regression
Introduction
The main objective of many statistical investigations is to establish
relationships which make it possible to predict one or more variables in
terms of other variables. For instance, studies are made to predict the
future weight of a person in terms of the number of weeks he or she will
stay on an 800-calorie-per-day diet, family expenditures on medical care in
terms of family income, and per capita consumption of certain food items
in terms of their nutritional value.
The statistical technique used for determining the probable form of
the relationship between variables is called regression analysis. The
ultimate objective when using this method of analysis is usually to predict
or estimate the value of one variable (dependent variable) corresponding
to a given value of another variable (independent variable).
Objectives:
At the end of this module, you should be able to:
1. Fit a simple linear regression model to a given data set.
2. Overlay the graph of the regression line on the scatter diagram.
3. Interpret the predicting equation.
4. Compute the coefficient of determination and interpret the result.
In this section we consider the problem of estimating or predicting
the value of the dependent variable (usually represented by Y) on the
basis of a known measurement of an independent and frequently
controlled variable (usually represented by X). Suppose we wish to predict
a student’s grade in chemistry based on his score on an intelligence test
administered prior to his attending college. To make such a prediction we
first examine the distribution of chemistry grades corresponding to various
intelligence test scores achieved by students in prior years. Denoting an
individual’s chemistry grade by y and his intelligence test score by x, then
the pertinent data of any student in the population can be represented by
the coordinates (x, y). A random sample of size n from the population
might then be designated by the set {(xi, yi); i = 1, 2, …, n}.
Let us consider the distribution of chemistry grades corresponding to
intelligence test scores of 50, 55, 65 and 70. The chemistry grades for a
sample of 12 students having these intelligence test scores are presented
in Table 1.
Table 1
Intelligence Test Scores and Chemistry Grades

Student    Test Score, x    Chemistry Grade, y
1          65               85
2          50               74
3          55               76
4          65               90
5          55               85
6          70               87
7          65               94
8          70               98
9          55               81
10         70               91
11         50               76
12         55               74
Before proceeding any further, let us first consider the following
discussion, which is necessary for answering the above problem.
SIMPLE LINEAR REGRESSION
Of the many equations that can be used to predict values of one
variable, Y, from given values of another variable, X, the simplest and
most widely used is the linear equation in two unknowns, which is
expressed by the simple linear statistical model, also called the simple
linear regression model.
$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \quad i = 1, 2, \ldots, n$$
where
$Y_i$ = ith observed value of the random variable Y
$X_i$ = ith observed value of the random variable X
$\beta_0$ = regression constant; the true Y-intercept
$\beta_1$ = regression coefficient; the slope of the line
$\varepsilon_i$ = random error associated with Y for a given $X_i$
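To make the role of the random error term concrete, the sketch below generates Y values from the model for a few X values. The parameter values (intercept 30.0, slope 0.9, error spread 5.0), the function name, and the use of normally distributed errors are all assumptions chosen for illustration; the module itself does not specify them.

```python
import random

def simulate_simple_linear(xs, beta0, beta1, sigma, seed=0):
    """Generate one Y per x as beta0 + beta1*x + error.

    Assumption: errors are drawn from a normal distribution with mean 0
    and standard deviation sigma. The text only says the error is random.
    """
    rng = random.Random(seed)
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]

# Hypothetical parameter values, for illustration only.
xs = [50, 55, 65, 70]
ys = simulate_simple_linear(xs, beta0=30.0, beta1=0.9, sigma=5.0)
for x, y in zip(xs, ys):
    print(f"x = {x}: line gives {30.0 + 0.9 * x:.1f}, simulated Y = {y:.1f}")
```

Each simulated Y differs from the value on the line by its own random error, which is why observed points scatter around the line rather than falling exactly on it.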
The quantities $\beta_0$ and $\beta_1$ in the model are called the parameters of
the model. Their values can only be determined if the entire population of
(X, Y) values can be obtained. In most practical situations, only estimates of
$\beta_0$ and $\beta_1$ can be calculated. Let $b_0$ represent an estimate of $\beta_0$ and $b_1$ an
estimate of $\beta_1$; $b_0$ and $b_1$ can be determined by either the freehand or the
least squares method. The latter procedure yields the least squares
estimates of $\beta_1$ and $\beta_0$, respectively:

$$b_1 = \frac{\sum xy - \dfrac{(\sum x)(\sum y)}{n}}{\sum x^2 - \dfrac{(\sum x)^2}{n}} \qquad \text{(Eqn. 1)}$$

$$b_0 = \bar{y} - b_1 \bar{x} \qquad \text{(Eqn. 2)}$$
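Eqn. 1 and Eqn. 2 translate almost directly into code. The following is a minimal sketch in plain Python; the function name `fit_simple_linear` and the small three-point data set are illustrative assumptions, not part of the module.

```python
def fit_simple_linear(xs, ys):
    """Return (b0, b1) computed from Eqn. 1 and Eqn. 2 for paired samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length, at least 2")

    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    # Denominator of Eqn. 1; zero when every x is the same value.
    sxx = sum_x2 - sum_x ** 2 / n
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")

    b1 = (sum_xy - sum_x * sum_y / n) / sxx   # Eqn. 1
    b0 = sum_y / n - b1 * sum_x / n           # Eqn. 2
    return b0, b1

# Hypothetical points lying exactly on y = 1 + 2x; the fit recovers that line.
b0, b1 = fit_simple_linear([1, 2, 3], [3, 5, 7])
print(b0, b1)
```

Note that the slope must be computed first, because Eqn. 2 uses $b_1$ to obtain $b_0$.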
These computed values of $b_0$ and $b_1$ determine the equation of the
estimated sample regression line,

$$\hat{y} = b_0 + b_1 x.$$

This equation is referred to as the predicting equation since it is
used to predict the value of y (denoted by $\hat{y}$) for a given value of x.
Example:
Using the data presented in Table 1, where x refers to the
intelligence test score and y to the chemistry grade, we
obtain the following quantities:

$$n = 12, \quad \sum x_i = 725, \quad \sum y_i = 1011, \quad \sum x_i y_i = 61{,}685$$

$$\sum x_i^2 = 44{,}475, \quad \sum y_i^2 = 85{,}905, \quad \bar{x} = 60.417, \quad \bar{y} = 84.250$$

$$b_1 = \frac{61{,}685 - \dfrac{(725)(1011)}{12}}{44{,}475 - \dfrac{(725)^2}{12}} = 0.897$$

$$b_0 = 84.250 - 0.897(60.417) = 30.056$$
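The sums and estimates in this example can be checked with a few lines of Python. This is only a verification sketch using the Table 1 data; the variable names are arbitrary. One point worth noticing: the text rounds $b_1$ to 0.897 before computing $b_0$, which gives 30.056, whereas carrying full precision through gives a $b_0$ of about 30.043. The difference is purely a rounding effect.

```python
# Table 1 data: intelligence test scores (x) and chemistry grades (y).
x = [65, 50, 55, 65, 55, 70, 65, 70, 55, 70, 50, 55]
y = [85, 74, 76, 90, 85, 87, 94, 98, 81, 91, 76, 74]

n = len(x)
sum_x = sum(x)
sum_y = sum(y)
sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(a * a for a in x)
sum_y2 = sum(b * b for b in y)

# Least squares estimates, computed without intermediate rounding.
b1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
b0 = sum_y / n - b1 * sum_x / n

print(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2)
print(f"b1 = {b1:.3f}, b0 = {b0:.3f}")
```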
We can now write the predicting equation as

$$\hat{y} = 30.056 + 0.897x$$

This can be used to predict the chemistry grade of a student with a
known intelligence test score. For x = 50, for instance, we get

$$\hat{y} = 30.056 + 0.897(50) = 74.9$$

Thus a student whose intelligence test score is 50 is predicted to
have a chemistry grade of 74.9, while a student whose intelligence test
score is 70 is predicted to have a chemistry grade of 92.8 (verify this value).
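As a quick check of both predictions, the predicting equation can be wrapped in a small function. The function name `predict_grade` is an illustrative assumption; the coefficients are the rounded values from the example.

```python
def predict_grade(score, b0=30.056, b1=0.897):
    """Predicted chemistry grade for a given intelligence test score."""
    return b0 + b1 * score

for score in (50, 70):
    print(score, round(predict_grade(score), 1))
```

The equation was fitted to test scores between 50 and 70, so predictions for scores well outside that range are extrapolations and should be treated with caution.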
If we would like to assess how good the predicting line is, we
compute the value of $R^2$, the coefficient of determination. ($R^2$ is the
percentage of the variation in the dependent variable y that is explained
by the independent variable x. $R^2$ ranges from 0 to 100%.)
$$R^2 = \frac{b_1\left[\sum xy - \dfrac{(\sum x)(\sum y)}{n}\right]}{\sum y^2 - \dfrac{(\sum y)^2}{n}} \times 100$$

$$R^2 = \frac{0.897\left[61{,}685 - \dfrac{(725)(1011)}{12}\right]}{85{,}905 - \dfrac{(1011)^2}{12}} \times 100$$

$$R^2 = 74.36\%$$
This means that about 74.36% of the total variation in the chemistry
grades is accounted for, or explained, by the intelligence test score.
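The coefficient of determination for the Table 1 data can be recomputed as follows. This is a sketch with arbitrary variable names; it carries full precision through the calculation and arrives at roughly 74.38%, slightly above the 74.36% in the text, which used the rounded slope 0.897.

```python
# Table 1 data: intelligence test scores (x) and chemistry grades (y).
x = [65, 50, 55, 65, 55, 70, 65, 70, 55, 70, 50, 55]
y = [85, 74, 76, 90, 85, 87, 94, 98, 81, 91, 76, 74]
n = len(x)

# Corrected sums of cross products and squares used in the R^2 formula.
sxy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
sxx = sum(a * a for a in x) - sum(x) ** 2 / n
syy = sum(b * b for b in y) - sum(y) ** 2 / n

b1 = sxy / sxx
r2_percent = b1 * sxy / syy * 100

print(f"R^2 = {r2_percent:.2f}%")
```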
Activity
1. Consider the following set of data:
x: 1 2 3 4 5 6
y: 6 4 3 5 4 2
a. Plot the points (x, y) on your graphing paper. Do not connect
the points by means of a line. The resulting graph is known as the
scatter diagram.
b. Find the estimated regression equation, that is, find the values of
$b_1$ and $b_0$ and write the equation for $\hat{y}$.
c. Graph the equation of the line on your scatter diagram.
d. Estimate the value of y when x = 4 and when x = 7.
e. Compute $R^2$ and interpret your results.
2. Look for three studies that made use of simple linear regression
analysis. For each of them, present the following:
a. hypotheses of the study
b. regression equation obtained
c. your interpretation of the given regression equations