Pearson R & Simple Linear Regression
Engr. Maricar M. Navarro

The annual consumer expenditures and annual net incomes of a sample of 10 families in a metropolitan area in 2007 are shown in the table below. Prepare a regression and correlation analysis of their expenditures and net incomes for 2007.

| Family | Net Income (x), in hundred thousand pesos | Expenditure (y), in thousand pesos |
|---|---|---|
| A | 10 | 23 |
| B | 2 | 7 |
| C | 4 | 15 |
| D | 6 | 17 |
| E | 8 | 23 |
| F | 7 | 22 |
| G | 4 | 10 |
| H | 6 | 14 |
| I | 7 | 20 |
| J | 6 | 19 |

Pearson R (Formula 1)

Pearson R (Formula 2)

ANALYSIS of the Relationship

The formulas give the value of r, which is a unitless quantity. If r = 1, the linear relationship is perfectly positive; conversely, if r = -1, the relationship is perfectly negative. When r = 0, there is no linear relationship between the two variables.

Substituting the data from the table (n = 10, Σx = 60, Σy = 170, Σxy = 1122, Σx² = 406, Σy² = 3162, so that the means are x̄ = 6 and ȳ = 17) into the computational form of the formula gives

$$
r = \frac{\sum xy - n\bar{y}\bar{x}}{\sqrt{\left(\sum x^2 - n\bar{x}^2\right)\left(\sum y^2 - n\bar{y}^2\right)}}
  = \frac{1122 - 10(17)(6)}{\sqrt{\left(406 - 10(36)\right)\left(3162 - 10(289)\right)}}
  = \frac{102}{\sqrt{(46)(272)}} \approx 0.91
$$

The computed value r = 0.91 is close to +1, which indicates a very high positive correlation between net income and expenditures. The linear relationship is very strong.

The value r² = 0.83 means that 83% of the total variation in expenditure is explained by net income and 17% is not. The unexplained 17% indicates that other variables may also contribute to the determination of expenditures. For example, the amount of expenditure is expected to depend on the number of family members and the location where the family resides.

The value of r shows the degree of the relationship quantitatively, but we also want to know whether that relationship is statistically significant. Thus, we conduct a test of significance for the relationship between net income and expenditures, and state the null hypothesis and the alternative hypothesis.

Test Hypothesis

In hypothesis testing, the significance level is the criterion used for rejecting the null hypothesis.
The significance level is used in hypothesis testing as follows. First, the difference between the results of the experiment and the null hypothesis is determined. Then, assuming the null hypothesis is true, the probability of a difference that large or larger is computed. Finally, this probability is compared to the significance level. If the probability is less than or equal to the significance level, then the null hypothesis is rejected and the outcome is said to be statistically significant.

Traditionally, experimenters have used either the 0.05 level (sometimes called the 5% level) or the 0.01 level (1% level), although the choice of level is largely subjective. The lower the significance level, the more the data must diverge from the null hypothesis to be significant. Therefore, the 0.01 level is more conservative than the 0.05 level. The Greek letter alpha (α) is sometimes used to indicate the significance level.

Test Hypothesis

Null hypothesis (no correlation), Ho: ρ = 0. There is no significant relationship between a family's net income and its expenditures.

Alternative hypothesis (correlated), Ha: ρ ≠ 0. There is a significant relationship between a family's net income and its expenditures.

Method 1: Using a P-value to make a decision

P-value = 0.000 (to three decimal places)

Decision rule: if P-value < 0.05, the correlation is statistically significant.

Significance level: an α of 0.05 indicates that the risk of concluding that a correlation exists when, actually, no correlation exists is 5%.

Calculation notes for the P-value: Minitab was used to calculate the P-value. The P-value is the probability of obtaining a result at least as extreme as the current one if the correlation coefficient were in fact zero (the null hypothesis). If this probability is lower than the conventional 5% (P < 0.05), the correlation coefficient is called statistically significant.
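The P-value quoted above came from Minitab. As a rough cross-check, the sketch below recomputes r from the ten families in the data table and derives a two-sided P-value in plain Python. It assumes the usual t test for a correlation coefficient, t = r·sqrt(n - 2)/sqrt(1 - r²) with n - 2 degrees of freedom, and it integrates the t density numerically with Simpson's rule rather than calling a statistics library. The function names (`pearson_r`, `t_density`, `two_sided_p`) are illustrative and do not come from the slides.

```python
import math

# Data from the table: net income x (hundred thousand pesos),
# expenditure y (thousand pesos), families A to J.
x = [10, 2, 4, 6, 8, 7, 4, 6, 7, 6]
y = [23, 7, 15, 17, 23, 22, 10, 14, 20, 19]


def pearson_r(xs, ys):
    """Pearson correlation coefficient from sums of squares and products."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum(a * b for a, b in zip(xs, ys)) - n * mean_x * mean_y
    sxx = sum(a * a for a in xs) - n * mean_x ** 2
    syy = sum(b * b for b in ys) - n * mean_y ** 2
    return sxy / math.sqrt(sxx * syy)


def t_density(u, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (
        math.sqrt(df * math.pi) * math.gamma(df / 2)
    )
    return coeff * (1 + u * u / df) ** (-(df + 1) / 2)


def two_sided_p(t_stat, df, steps=2000):
    """Two-sided P-value via Simpson's rule on [0, |t|] (steps must be even)."""
    upper = abs(t_stat)
    h = upper / steps
    total = t_density(0.0, df) + t_density(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    central_area = total * h / 3  # P(0 <= T <= |t|)
    return 1 - 2 * central_area


n = len(x)
r = pearson_r(x, y)
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
p_value = two_sided_p(t_stat, n - 2)

print(f"r = {r:.3f}, r^2 = {r * r:.3f}")
print(f"t = {t_stat:.2f} on {n - 2} degrees of freedom")
print(f"two-sided P-value = {p_value:.4f}")
```

A P-value that prints as 0.000 at three decimal places is not exactly zero; it only means the value is below 0.0005, which is still far under α = 0.05.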
Tabulation (Method 1)

| Item | Entry |
|---|---|
| Sources | Family's Net Income (X), Family's Expenditures (Y) |
| Significance level | α = 0.05 |
| P-value | 0.000 |
| Criteria to reject the null hypothesis | Reject Ho if P-value < α; do not reject Ho if P-value > α |
| Hypothesis | Ho: ρ = 0 (no correlation); H1: ρ ≠ 0 (significantly correlated) |
| Comparison | 0.000 < 0.05 |
| Decision | Reject Ho and conclude that there is a significant correlation between family income and expenditures. |
| Verbal interpretation | There is sufficient evidence to conclude that there is a significant linear relationship between x (family income) and y (expenditures), because the computed P-value of 0.000 is less than the significance level of 0.05. |

Method 2: Using a Table of Pearson R Critical Values to make a decision

Analysis for family income and expenditures: r = 0.912, n = 10, df = 10 - 2 = 8.

At α = 0.05 (two-tailed) with df = 8, the critical values are -0.632 and +0.632. On the number line from -1 to +1, these two values bound the non-rejection region around 0, and r = 0.912 falls to the right of +0.632.

Since r = 0.912 lies outside the interval between the critical values, r is significant, and the regression line can be used for prediction.

Method 3: Using a Table of t Critical Values to make a decision

Given: n = 10, r = 0.91, r² = 0.83, α = 0.05, α/2 = 0.025, df = 10 - 2 = 8, and the critical value t(0.025, 8) = 2.306.

$$
t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} = 0.912\sqrt{\frac{8}{0.1685}} = 0.912 \times 6.890 \approx 6.28
$$

Decision rule: reject Ho if |t| > 2.306. Since 6.28 > 2.306, reject Ho. This leads to the conclusion that there is a significant relationship between family income and expenditures.

Simple Linear Regression (Method 1)

| Family | Net Income (x), in hundred thousand pesos | Expenditure (y), in thousand pesos | xy | x² |
|---|---|---|---|---|
| A | 10 | 23 | 230 | 100 |
| B | 2 | 7 | 14 | 4 |
| C | 4 | 15 | 60 | 16 |
| D | 6 | 17 | 102 | 36 |
| E | 8 | 23 | 184 | 64 |
| F | 7 | 22 | 154 | 49 |
| G | 4 | 10 | 40 | 16 |
| H | 6 | 14 | 84 | 36 |
| I | 7 | 20 | 140 | 49 |
| J | 6 | 19 | 114 | 36 |
| Total | 60 | 170 | 1122 | 406 |

With n = 10, the means are x̄ = 6 and ȳ = 17.

$$
b = \frac{\sum xy - n\bar{y}\bar{x}}{\sum x^2 - n\bar{x}^2} = \frac{1122 - 10(17)(6)}{406 - 10(36)} = \frac{102}{46} = 2.217
$$

$$
a = \bar{y} - b\bar{x} = 17 - (2.217)(6) = 3.696
$$

Simple Linear Regression (Method 2)

In a spreadsheet, highlight the Net Income and Expenditure values, choose a scatter diagram, right-click the plotted points, choose Add Trendline, select Linear, and click OK. The fitted trend line is y = 2.2174x + 3.6957 with R² = 0.8315.

Using a = 3.696 and b = 2.217 for every family, the predicted values ŷ = a + bx and the prediction errors (residuals) e = y - ŷ are:

| Family | x | y | ŷ (thousand pesos) | e (thousand pesos) | ŷ (pesos) | e (pesos) |
|---|---|---|---|---|---|---|
| A | 10 | 23 | 25.87 | -2.87 | 25,869.57 | -2,869.57 |
| B | 2 | 7 | 8.13 | -1.13 | 8,130.43 | -1,130.43 |
| C | 4 | 15 | 12.57 | 2.43 | 12,565.22 | 2,434.78 |
| D | 6 | 17 | 17.00 | 0.00 | 17,000.00 | 0.00 |
| E | 8 | 23 | 21.43 | 1.57 | 21,434.78 | 1,565.22 |
| F | 7 | 22 | 19.22 | 2.78 | 19,217.39 | 2,782.61 |
| G | 4 | 10 | 12.57 | -2.57 | 12,565.22 | -2,565.22 |
| H | 6 | 14 | 17.00 | -3.00 | 17,000.00 | -3,000.00 |
| I | 7 | 20 | 19.22 | 0.78 | 19,217.39 | 782.61 |
| J | 6 | 19 | 17.00 | 2.00 | 17,000.00 | 2,000.00 |

[Figure: scatter plot "Correlation between Family's Income and Expenditures", Net Income on the horizontal axis and Expenditure on the vertical axis, with the linear trend line y = 2.2174x + 3.6957, R² = 0.8315]

[Figure: "Actual Expenditures vs. Predicted Expenditures" for families A to J, plotting the actual values y against the predicted values ŷ from the regression model listed in the table above]
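The regression arithmetic above can be reproduced with a few lines of plain Python. This is a minimal sketch of the least-squares formulas b = (Σxy - n·x̄·ȳ)/(Σx² - n·x̄²) and a = ȳ - b·x̄ applied to the ten families in the table; the helper name `fit_line` is illustrative and not part of the original material.

```python
# Data from the table: net income x (hundred thousand pesos),
# expenditure y (thousand pesos), families A to J.
families = ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J"]
x = [10, 2, 4, 6, 8, 7, 4, 6, 7, 6]
y = [23, 7, 15, 17, 23, 22, 10, 14, 20, 19]


def fit_line(xs, ys):
    """Return (intercept a, slope b) of the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum(p * q for p, q in zip(xs, ys)) - n * mean_x * mean_y
    sxx = sum(p * p for p in xs) - n * mean_x ** 2
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


a, b = fit_line(x, y)

# Predicted expenditure and residual (actual minus predicted) per family.
predicted = [a + b * xi for xi in x]
residuals = [yi - pi for yi, pi in zip(y, predicted)]

# Coefficient of determination: share of variation in y explained by x.
mean_y = sum(y) / len(y)
ss_res = sum(e * e for e in residuals)
ss_tot = sum((yi - mean_y) ** 2 for yi in y)
r_squared = 1 - ss_res / ss_tot

print(f"y-hat = {a:.4f} + {b:.4f} x, R^2 = {r_squared:.4f}")
for name, xi, yi, pi, ei in zip(families, x, y, predicted, residuals):
    print(f"{name}: x={xi:2d}  y={yi:2d}  predicted={pi:6.2f}  residual={ei:6.2f}")
```

The residuals of a least-squares line with an intercept sum to zero (up to rounding), which is a quick sanity check on a hand-computed table like the one above.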