Pearson R & Simple Linear Regression
Engr. Maricar M. Navarro
The annual consumer expenditures and annual net incomes of a sample of 10 families in a metropolitan area in 2007 are shown in the table below. Prepare a regression and correlation analysis of their expenditures and net incomes for 2007.
Family   Net Income (x)                 Expenditure (y)
         (in hundred thousand pesos)    (in thousand pesos)
A        10                             23
B        2                              7
C        4                              15
D        6                              17
E        8                              23
F        7                              22
G        4                              10
H        6                              14
I        7                              20
J        6                              19
Pearson R
(Formula 1)
Pearson R
(Formula 2)
ANALYSIS of the Relationship

The figure shows the formula for computing r, which is a unitless quantity. Recall that if r = 1 the relationship is perfectly positive; conversely, if r = -1 the relationship is perfectly negative. When r = 0, there is no linear relationship between the two variables. Substitute the data from the table into the formula.
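As a sketch of the substitution, the two standard forms of Pearson's r are written out here. It is an assumption that they correspond to Formula 1 (deviation form) and Formula 2 (computational form), since the formulas themselves are not reproduced in the text. The sums Σx = 60, Σy = 170, Σxy = 1122, Σx² = 406 and Σy² = 3162 are computed from the table.

```latex
% Deviation form (assumed to be Formula 1)
r = \frac{\sum (x - \bar{x})(y - \bar{y})}
         {\sqrt{\sum (x - \bar{x})^2 \, \sum (y - \bar{y})^2}}

% Computational form (assumed to be Formula 2), with n = 10
r = \frac{n\sum xy - \sum x \sum y}
         {\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}
  = \frac{10(1122) - (60)(170)}
         {\sqrt{\left[10(406) - 60^2\right]\left[10(3162) - 170^2\right]}}
  = \frac{1020}{\sqrt{(460)(2720)}}
  \approx 0.912
```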

The computed value r = 0.91 is close to +1, which indicates a very high positive correlation between net income and expenditures.

The linear relationship is very strong. The value r² = 0.83 means that 83% of the total variation in expenditure is explained by net income and 17% is not. The unexplained 17% suggests that other variables also contribute to the determination of expenditures. For example, the amount of expenditure is expected to depend on the number of family members and on where the family resides.
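The values of r and r² quoted above can be checked with a short script. This is a minimal sketch using only the Python standard library; the helper name `pearson_r` is hypothetical and the deviation form of the formula is assumed.

```python
import math

# Data from the table: net income (x) and expenditure (y) for families A to J.
x = [10, 2, 4, 6, 8, 7, 4, 6, 7, 6]
y = [23, 7, 15, 17, 23, 22, 10, 14, 20, 19]


def pearson_r(xs, ys):
    """Pearson's r from sums of squared deviations (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    syy = sum((b - mean_y) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(x, y)
r_squared = r ** 2

print(f"r = {r:.3f}")              # r = 0.912
print(f"r^2 = {r_squared:.4f}")    # r^2 = 0.8315
print(f"explained: {r_squared:.0%}, unexplained: {1 - r_squared:.0%}")
```

The last line reports the split of the total variation in expenditure into the part explained by net income (about 83%) and the part that is not (about 17%).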
The value of r shows the degree of the relationship quantitatively, but we also want to know whether that relationship is statistically significant. Thus, we conduct a test of significance for the relationship between net income and expenditures, stating the null hypothesis and the alternative hypothesis.

Test Hypothesis
In hypothesis testing, the significance level is the criterion used for
rejecting the null hypothesis. The significance level is used in hypothesis
testing as follows: First, the difference between the results of the
experiment and the null hypothesis is determined. Then, assuming the
null hypothesis is true, the probability of a difference that large or larger
is computed. Finally, this probability is compared to the significance
level. If the probability is less than or equal to the significance level, then
the null hypothesis is rejected and the outcome is said to be statistically
significant. Traditionally, experimenters have used either the 0.05 level
(sometimes called the 5% level) or the 0.01 level (1% level), although
the choice of levels is largely subjective. The lower the significance level,
the more the data must diverge from the null hypothesis to be
significant. Therefore, the 0.01 level is more conservative than the 0.05
level. The Greek letter alpha (α) is sometimes used to indicate the
significance level.
Null Hypothesis (no correlation)
Ho: ρ = 0. There is no significant relationship between the families' net income and their expenditures.
Alternative Hypothesis (correlated)
Ha: ρ ≠ 0. There is a significant relationship between the families' net income and their expenditures.
Method 1: Using a P-value to make a decision
P-value = 0.000
If P-value < 0.05, the correlation is statistically significant.
Significance level: an α of 0.05 indicates that the risk of concluding that a correlation exists when, actually, no correlation exists is 5%.
Calculation notes for the P-value
Minitab was used to calculate the P-value. The P-value is the probability of obtaining a result at least as extreme as the current one if the correlation coefficient were in fact zero (the null hypothesis). If this probability is lower than the conventional 5% (P < 0.05), the correlation coefficient is called statistically significant.
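The P-value in the source comes from Minitab. As a rough cross-check only, the sketch below approximates a two-tailed P-value by integrating the Student's t density numerically with Simpson's rule, using nothing beyond the Python standard library. It assumes n = 10 (the ten families in the data table, so df = 8), r = 0.912 and r² = 0.8315. The helper names `t_pdf` and `two_tailed_p` are hypothetical, and this is not a description of how Minitab computes the value.

```python
import math


def t_pdf(u, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + u * u / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=2000):
    """Two-tailed P-value, 1 - P(-|t| < T < |t|), via Simpson's rule.

    steps must be even.
    """
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    half_area = total * h / 3          # integral of the density from 0 to |t|
    return 1 - 2 * half_area


r, r_squared = 0.912, 0.8315           # values reported in the analysis
n = 10                                 # assumption: the ten families in the table
df = n - 2
t = r * math.sqrt(df / (1 - r_squared))

p = two_tailed_p(t, df)
alpha = 0.05
print(f"P-value = {p:.3f}")            # P-value = 0.000
print("reject Ho" if p < alpha else "fail to reject Ho")
```

Rounded to three decimals the result prints as 0.000, which matches the way the P-value is reported above; the unrounded value is small but not exactly zero.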
Tabulation (Method 1): P-value
Sources: Family's Net Income (x) and Family's Expenditures (y)
Hypotheses: Ho: ρ = 0 (no correlation); H1: ρ ≠ 0 (significantly correlated)
Significance level: α = 0.05
P-value: 0.000
Criterion to reject the null hypothesis: P-value < α
Comparison: 0.000 < 0.05
Decision: Reject Ho and conclude that there is a significant correlation between family's income and expenditures.
Verbal interpretation: Reject the null hypothesis. There is sufficient evidence to conclude that there is a significant linear relationship between x (family's income) and y (expenditures) because the computed P-value of 0.000 is less than the significance level of 0.05.
Method 2: Using the Table of Pearson R Critical Values to make a decision
r = 0.912; critical values = ±0.632
Number line: -1 | -0.632 | 0 | +0.632 | r = 0.912 | +1
Analysis for family's income and expenditures (r = 0.912, n = 10, df = 10 - 2 = 8): the critical values are -0.632 and +0.632. Since r = 0.912 lies outside the interval between the critical values, r is significant and the regression line can be used for prediction.
Method 3: Using the Table of t Critical Values to make a decision
Given: n = 10, r = 0.912, r² = 0.8315, α = 0.05 (two-tailed, α/2 = 0.025), df = 10 - 2 = 8
$$ t = r\sqrt{\frac{n-2}{1-r^2}} = 0.912\sqrt{\frac{8}{0.1685}} = 0.912\sqrt{47.48} = 0.912 \times 6.89 \approx 6.28 $$
Critical value: t(0.025, 8) = 2.306
Decision rule: reject Ho if |t| > 2.306.
Since 6.28 > 2.306, reject Ho. This leads to the conclusion that there is a significant relationship between the families' income and their expenditures.
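The t test can be sketched in a few lines. The sketch takes n = 10 from the ten families in the data table (so df = 8), uses r = 0.912 and r² = 0.8315 as reported in the analysis, and hard-codes the two-tailed critical value 2.306 from a standard t table for α = 0.05 and df = 8. It also shows that the critical r used in the table-lookup method follows from the same critical t, so both methods necessarily agree.

```python
import math

r = 0.912            # sample correlation from the analysis
r_squared = 0.8315   # coefficient of determination from the analysis
n = 10               # number of families in the data table
df = n - 2           # degrees of freedom for the test of Ho: rho = 0

# Test statistic: t = r * sqrt((n - 2) / (1 - r^2)).
t_value = r * math.sqrt(df / (1 - r_squared))

# Two-tailed critical value for alpha = 0.05 and df = 8 (standard t table).
t_critical = 2.306

# Equivalent critical r, obtained by inverting the t formula.
r_critical = t_critical / math.sqrt(t_critical ** 2 + df)

print(f"t = {t_value:.2f}, critical t = {t_critical}")   # t = 6.28, critical t = 2.306
print(f"critical r = {r_critical:.3f}")                  # critical r = 0.632
print("reject Ho" if abs(t_value) > t_critical else "fail to reject Ho")
```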
Simple Linear Regression (Method 1)
x = Net Income (in hundred thousand pesos); y = Expenditure (in thousand pesos)

Family   x     y     xy     x²
A        10    23    230    100
B        2     7     14     4
C        4     15    60     16
D        6     17    102    36
E        8     23    184    64
F        7     22    154    49
G        4     10    40     16
H        6     14    84     36
I        7     20    140    49
J        6     19    114    36
Sum      60    170   1122   406

n = 10, x̄ = 6, ȳ = 17

$$ b = \frac{\sum xy - n\,\bar{x}\,\bar{y}}{\sum x^2 - n\,\bar{x}^2} = \frac{1122 - 10(6)(17)}{406 - 10(36)} = \frac{102}{46} \approx 2.217 $$

$$ a = \bar{y} - b\,\bar{x} = 17 - (2.2174)(6) \approx 3.696 $$
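The slope and intercept can be reproduced directly from the column sums. This is a minimal sketch of the same least-squares formulas, b = (Σxy - n·x̄·ȳ) / (Σx² - n·x̄²) and a = ȳ - b·x̄; the variable names are chosen here for illustration.

```python
# Data from the table: net income (x) and expenditure (y) for families A to J.
x = [10, 2, 4, 6, 8, 7, 4, 6, 7, 6]
y = [23, 7, 15, 17, 23, 22, 10, 14, 20, 19]

n = len(x)
mean_x = sum(x) / n                              # 6.0
mean_y = sum(y) / n                              # 17.0
sum_xy = sum(xi * yi for xi, yi in zip(x, y))    # 1122
sum_x2 = sum(xi * xi for xi in x)                # 406

# Least-squares slope and intercept.
b = (sum_xy - n * mean_x * mean_y) / (sum_x2 - n * mean_x ** 2)
a = mean_y - b * mean_x

print(f"b = {b:.4f}, a = {a:.4f}")               # b = 2.2174, a = 3.6957
```

Carrying four decimals for b before computing a avoids the rounding drift that appears when the slope is truncated to 2.21.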
Simple Linear Regression (Method 2)
x = Net Income (in hundred thousand pesos); y = Expenditure (in thousand pesos)
ŷ = a + bx is the predicted value; e = y - ŷ is the error of prediction (residual).

Family   x    y    a       b       ŷ = a + bx   e = y - ŷ   ŷ (in pesos)   e (in pesos)
A        10   23   3.696   2.217   25.87        -2.87       25,869.57      -2,869.57
B        2    7    3.696   2.217   8.13         -1.13       8,130.43       -1,130.43
C        4    15   3.696   2.217   12.57        2.43        12,565.22      2,434.78
D        6    17   3.696   2.217   17.00        0.00        17,000.00      0.00
E        8    23   3.696   2.217   21.43        1.57        21,434.78      1,565.22
F        7    22   3.696   2.217   19.22        2.78        19,217.39      2,782.61
G        4    10   3.696   2.217   12.57        -2.57       12,565.22      -2,565.22
H        6    14   3.696   2.217   17.00        -3.00       17,000.00      -3,000.00
I        7    20   3.696   2.217   19.22        0.78        19,217.39      782.61
J        6    19   3.696   2.217   17.00        2.00        17,000.00      2,000.00
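The predicted values and residuals in the table can be regenerated from the fitted line. The sketch below uses the coefficients a = 3.6957 and b = 2.2174 as given by the trend-line equation, so results agree with the table to two decimals; the list names are chosen here for illustration.

```python
# Data from the table: net income (x) and expenditure (y) for families A to J.
families = "ABCDEFGHIJ"
x = [10, 2, 4, 6, 8, 7, 4, 6, 7, 6]
y = [23, 7, 15, 17, 23, 22, 10, 14, 20, 19]

a, b = 3.6957, 2.2174    # intercept and slope of the fitted line

# Predicted value and residual (error of prediction) for each family.
predicted = [a + b * xi for xi in x]
residuals = [yi - y_hat for yi, y_hat in zip(y, predicted)]

for name, y_hat, e in zip(families, predicted, residuals):
    print(f"{name}: predicted = {y_hat:6.2f}, residual = {e:6.2f}")

# For a least-squares line the residuals sum to (approximately) zero;
# the small remainder here comes from rounding a and b.
print(f"sum of residuals = {sum(residuals):.3f}")
```

A residual sum near zero is a quick sanity check that the intercept and slope were computed consistently.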
[Figure: Correlation between Family's Income and Expenditures. Scatter plot of Expenditure against Net Income with a linear trend line, y = 2.2174x + 3.6957, R² = 0.8315.]
To reproduce the chart: highlight the Net Income and Expenditure values, choose a scatter diagram, right-click and add a trend line, choose Linear, and click OK.
[Figure: Actual Expenditures vs. Predicted Expenditures. Expenditures (actual value) and predicted value (regression model) for families A to J.]