Uploaded by Wael Shata

Lecture 7 Simple Regression and correlation-1

Tanta University- Faculty of Commerce- English Section
Second Year 2019-2020
Introductory Statistics,
Week Seven
Chapter 12
Simple Linear Regression and Correlation Analyses
In this lecture we cover descriptive measures of simple linear
regression and correlation analysis; inferential techniques will be
handled in a subsequent course.
Firstly: The Correlation Coefficient
The correlation coefficient is used to analyze the relationship between
two continuous variables.
Step 1 is to plot the data for x and y in a scatter plot. A scatter plot
is a two-dimensional plot: values of x are on the x-axis, values of y are
on the y-axis, and each pair (x, y) is plotted as a point.
Step 2 is to examine the scatter plot. You may obtain one of the
following:
1. Linear relationship: the points trend upward or downward.
a) The plot to the left indicates a direct relationship, meaning that as
the x values increase, the y values also increase.
b) The plot to the right indicates an inverse relationship between X and
Y, meaning that as the x values increase, the y values decrease.
In both plots we can fit a straight line that passes through most points,
so we say that the relationship between X and Y is linear.
2. Curvilinear relationship: the points take the shape of a quadratic
relationship (plot to the right) or a cubic relationship (plot to the
left) between Y and X.
3. No relationship: the points are scattered irregularly, with no
particular pattern for the values of X and Y; as x increases, y sometimes
increases and sometimes decreases.
Example:
The following data give household income and expenditure, in thousands
of pounds, for 10 families:
Income (X):        3   4   4   6   7   6   8   9   9   11
Expenditure (Y):   2   3   4   4   5   5   6   7   7   8
Using Excel, input the data in two columns, select the data, open the
Insert tab and select Scatter. You get a scatter plot in which the points
rise along a straight line.
Thus, a linear relationship exists between income and expenditure.
To measure the strength of this relationship we compute the correlation
coefficient, Pearson's product-moment correlation coefficient.
The coefficient ranges between -1 and +1.
How do we compute the correlation coefficient?
a) The computational formula is:
$$ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}} $$
where
n: the sample size;
∑xy: the sum of the cross products of each value of x and the
corresponding value of y;
∑x: the sum of the column of x (the independent variable);
∑y: the sum of the column of y (the dependent variable);
∑x²: the sum of the squared values of x;
∑y²: the sum of the squared values of y.
Applying this formula to the income and expenditure data above, we form
the following table, where the variables are income (x) and
expenditure (y):
Income (X)   Expenditure (Y)   XY          X²          Y²
3            2                 6           9           4
4            3                 12          16          9
4            4                 16          16          16
6            4                 24          36          16
7            5                 35          49          25
6            5                 30          36          25
8            6                 48          64          36
9            7                 63          81          49
9            7                 63          81          49
11           8                 88          121         64
∑x = 67      ∑y = 51           ∑xy = 385   ∑x² = 509   ∑y² = 293
From the table, we find that:
n = 10, ∑x = 67, ∑y = 51, ∑xy = 385, ∑x² = 509, ∑y² = 293
Inserting those sums in the correlation coefficient equation, we get:
$$ r = \frac{10(385) - (67)(51)}{\sqrt{\left[10(509) - (67)^2\right]\left[10(293) - (51)^2\right]}} = \frac{433}{\sqrt{601 \times 329}} = .9738 $$
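The hand computation above can be cross-checked with a few lines of Python. This is a minimal sketch rather than part of the lecture: the variable names are illustrative, and only the standard library is assumed.

```python
from math import sqrt

# Income (x) and expenditure (y) in thousands of pounds, from the example.
income = [3, 4, 4, 6, 7, 6, 8, 9, 9, 11]
expenditure = [2, 3, 4, 4, 5, 5, 6, 7, 7, 8]

# Column sums, as in the worked table.
n = len(income)
sum_x = sum(income)
sum_y = sum(expenditure)
sum_xy = sum(x * y for x, y in zip(income, expenditure))
sum_x2 = sum(x * x for x in income)
sum_y2 = sum(y * y for y in expenditure)

# Computational formula for Pearson's r.
numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2)
print(round(r, 4))
```

Printing the six sums first makes it easy to compare each one against the table before looking at r itself.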
Interpretation
How do we interpret a correlation coefficient of .9738?
1. First, it is positive, so there is a direct relationship between
income (X) and expenditure (Y).
2. Second, it is very close to 1, so we conclude that the association is
strong; i.e., higher income is closely associated with higher
expenditure.
b) The CORREL function: using Excel, input the data in two columns, then
select the Formulas tab, Statistical, and the CORREL function.
You get a dialog box in which you fill in the cell range of the first
variable in Array1 (C5:C14 in the lecture's worksheet) and the matching
range of the second variable in Array2. The function returns a
correlation coefficient of about .9738, in agreement with the
computational formula.
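Outside Excel, the same quantity can be obtained from the deviation form of Pearson's formula, which is algebraically equivalent to the computational formula. The sketch below is illustrative only: `correl` is a hypothetical helper named after the Excel function and is not Excel's own implementation.

```python
from math import sqrt

def correl(array1, array2):
    """Pearson's r from deviations about the means (hypothetical helper)."""
    if len(array1) != len(array2) or len(array1) < 2:
        raise ValueError("both arrays need the same length, at least 2")
    mean1 = sum(array1) / len(array1)
    mean2 = sum(array2) / len(array2)
    # Sum of cross products of deviations, and the two sums of squares.
    cross = sum((a - mean1) * (b - mean2) for a, b in zip(array1, array2))
    ss1 = sum((a - mean1) ** 2 for a in array1)
    ss2 = sum((b - mean2) ** 2 for b in array2)
    return cross / sqrt(ss1 * ss2)

income = [3, 4, 4, 6, 7, 6, 8, 9, 9, 11]
expenditure = [2, 3, 4, 4, 5, 5, 6, 7, 7, 8]

r = correl(income, expenditure)
print(round(r, 4))
```

Passing the same column twice returns 1, which is a quick way to spot a mistyped range.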
Secondly: Simple Linear Regression
Regression analysis is used to predict the value of a dependent variable
(the effect variable) based on the value of at least one independent
variable (the cause variable). In our example, the effect variable is
"Expenditure" and the cause variable is "Income". In simple regression
analysis there is only one independent variable.
The Population Regression Model
The actual population regression model takes the following form:
$$ y_i = \beta_0 + \beta_1 x_i + \varepsilon_i \qquad (1) $$
where
yi: the dependent variable (observed value) for observation i;
β0: the population y-intercept; it is the value predicted when x = 0;
β1: the regression coefficient in the population; it is the amount of
change in the dependent variable associated with a one-unit increase in
the independent variable;
xi: the value of the independent variable for observation i;
εi: a random error for observation i; it describes the difference
between the observed value and the average (predicted) value.
The population regression line, or prediction line, ŷ, is the mean
(expected) value of y at a given x. It contains only the first two
components of model (1) above; thus, the prediction line is given by:
$$ \hat{y} = \beta_0 + \beta_1 x \qquad (2) $$
Thus, the difference between the actual values y and the predicted
values ŷ is the error term ε:
$$ \varepsilon = y - \hat{y} \qquad (3) $$
as shown in the scatter plot diagram.
Estimating the Regression Equation
The purpose is to estimate equation (2) such that the error of
prediction (equation (3)) is a minimum.
The method used is called the least squares method; it makes the sum of
the squared error terms (equation (3)) a minimum.
Thus, to estimate equation (2) as
$$ \hat{y}_i = b_0 + b_1 x_i $$
such that the sum of squared errors is a minimum, we apply the following
equations:
$$ b_1 = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2} $$
$$ b_0 = \bar{y} - b_1 \bar{x} $$
The dependent variable is the "effect" or response variable, and the
independent variable is the "cause" variable. Applying the coefficient
equations above to the income and expenditure sums, we get:
$$ b_1 = \frac{10(385) - (67)(51)}{10(509) - (67)^2} = \frac{433}{601} = .7205 $$
$$ \bar{y} = \frac{\sum y}{n} = \frac{51}{10} = 5.1, \qquad \bar{x} = \frac{\sum x}{n} = \frac{67}{10} = 6.7 $$
$$ b_0 = \bar{y} - b_1 \bar{x} = 5.1 - .7205 \times 6.7 = .2727 $$
Thus, the regression equation is:
$$ \hat{y} = .2727 + .7205x $$
b) Finding the Slope and Intercept Using Excel
1. The INTERCEPT and SLOPE functions
Using the Excel function INTERCEPT we get the estimated b0, and using
the SLOPE function we get the estimated regression coefficient b1.
Proceeding as we did for CORREL, we obtain both values, and thus
$$ \hat{y} = .2727 + .7205x $$
the same as obtained earlier.
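The least-squares estimates can also be reproduced in Python as a cross-check. This is a sketch under the same data; the function names `least_squares` and `sse` are assumptions made for illustration. Carrying full precision gives an intercept of about .2729; the lecture's .2727 appears to come from using the slope already rounded to .7205.

```python
def least_squares(xs, ys):
    """Return (b0, b1) for the fitted line y-hat = b0 + b1 * x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1

def sse(xs, ys, b0, b1):
    """Sum of squared errors for a candidate line."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

income = [3, 4, 4, 6, 7, 6, 8, 9, 9, 11]
expenditure = [2, 3, 4, 4, 5, 5, 6, 7, 7, 8]

b0, b1 = least_squares(income, expenditure)
print(round(b1, 4), round(b0, 4))

# Moving the intercept or the slope away from the estimates should
# increase the sum of squared errors.
best = sse(income, expenditure, b0, b1)
print(best < sse(income, expenditure, b0 + 0.1, b1))
print(best < sse(income, expenditure, b0, b1 - 0.05))
```

The two comparisons at the end illustrate what "least squares" means: any other line tried here fits the ten points worse than the estimated one.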
Interpretation of the regression equation:
1. The intercept = .2727. This means that expenditure is 272.7 pounds
(.2727 thousand pounds) if income is zero.
2. The slope, or regression coefficient, = .7205 (in thousands). This
means that for every increase in income of one thousand pounds,
expenditure increases by 720.5 pounds on average.
3. The regression coefficient always takes the sign of the correlation
coefficient: either both are positive or both are negative.
4. To use the estimated regression equation for prediction, substitute a
value of x. At income = 3:
$$ \hat{y} = .2727 + .7205(3) = 2.434 $$
Thus, the mean expenditure for families with an income of 3 thousand
pounds is 2.434 thousand pounds.
For families with an income of 4 thousand pounds, replacing x with 4 we
get:
$$ \hat{y} = .2727 + .7205(4) = 3.1547 $$
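The two predictions above, and the prediction error y − ŷ for each observed family, can be sketched as follows. The coefficients are the lecture's rounded values; the function name `predict` is hypothetical and used only for illustration.

```python
# Rounded coefficients from the lecture (thousands of pounds).
B0, B1 = 0.2727, 0.7205

def predict(income_thousands):
    """Mean expenditure predicted by y-hat = B0 + B1 * x (hypothetical helper)."""
    return B0 + B1 * income_thousands

for x in (3, 4):
    print(x, round(predict(x), 4))

# Prediction errors (observed minus predicted) for the ten families.
income = [3, 4, 4, 6, 7, 6, 8, 9, 9, 11]
expenditure = [2, 3, 4, 4, 5, 5, 6, 7, 7, 8]
residuals = [y - predict(x) for x, y in zip(income, expenditure)]
print([round(e, 3) for e in residuals])
```

A negative error means the family spends less than the line predicts for its income; with least-squares coefficients the errors roughly cancel out when summed.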
Questions on Simple Correlation and Regression Analysis
Use Excel functions to get the standard deviations of both variables
used in the example above, and check the relationship between the
correlation coefficient and the regression coefficient in terms of the
standard deviations of both variables.
True/False
1. Pearson's product-moment correlation coefficient is used on
quantitative data only.
2. A correlation coefficient of -1.0 indicates a weak relationship
between the two variables.
3. A correlation of .74 is found between cost and profit; this means
that as cost goes up, profit goes up.
4. When the standard deviations of both variables are equal, the
correlation coefficient and the regression coefficient are equal.
5. A regression coefficient of -2.1 is associated with a positive
correlation coefficient.
MCQ
============================================
Use the following function arguments to answer questions 1 to 5:
1. The number of pairs of {X, Y} is:
a. 12   b. 6   c. 5   d. 10
2. The regression coefficient indicates that:
a. The correlation between {X, Y} is .70
b. The value of X at Y = 0 is .70
c. Y increases by .70 for each one-unit increase in X
d. X increases by .70 for each one-unit increase in Y
3. The value of the regression constant (intercept) is:
a. -1.96   b. 3.64   c. -2.68   d. not enough data
4. Using the regression of y on x, the predicted value of y for x = 15
is:
a. 10.24   b. 5.76   c. 8.55   d. 13.39
5. Given that the observed value at X = 15 is 8, the prediction error
is:
a. -.55   b. -2.24   c. 2.26   d. 5.39
===================================================