Tanta University - Faculty of Commerce - English Section
Second Year 2019-2020
Introductory Statistics, Week Seven

Chapter 12
Simple Linear Regression and Correlation Analyses

In this lecture we cover the descriptive measures of simple linear regression and correlation analysis; inferential techniques will be handled in a subsequent course.

Firstly: The Correlation Coefficient

The correlation coefficient is used to analyze the relationship between two continuous variables.

Step 1 is to plot the data for x and y in a scatter plot. A scatter plot is a two-dimensional plot: values of x are on the x-axis, values of y are on the y-axis, and each pair (x, y) is plotted as a point.

Step 2 is to examine the scatter plot. You may obtain one of the following:

1. Linear relationship: the points go upward or downward.
   a) The plot to the left indicates a direct relationship, meaning that as the x values increase the y values also increase.
   b) The plot to the right indicates an inverse relationship between X and Y, meaning that as the x values increase the y values decrease.
   In both plots we can fit a straight line that passes through most of the points, so we say that the relationship between X and Y is linear.
2. Curvilinear relationship: the points take the shape of a quadratic relationship (plot to the right) or a cubic relationship (plot to the left) between Y and X.
3. No relationship: the points are scattered irregularly, with no particular pattern in the values of X and Y; as x increases, y sometimes increases and sometimes decreases.
Example: The following data give household income and expenditure, in thousands of pounds, for 10 families:

Income        3   4   4   6   7   6   8   9   9   11
Expenditure   2   3   4   4   5   5   6   7   7    8

Using Excel, input the data in two columns, select the data, open the Insert tab and select Scatter. The resulting scatter plot is linear. Thus, a linear relationship exists between income and expenditure.

To measure the strength of this relationship we compute Pearson's product-moment correlation coefficient, r. The coefficient ranges between -1 and +1.

How to compute the correlation coefficient?

a) The computational formula is:

$$ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}} $$

where
n: the sample size;
∑xy: the sum of the cross products of each value of y times the corresponding value of x;
∑x: the sum of the column of x (the independent variable);
∑y: the sum of the column of y (the dependent variable);
∑x²: the sum of the squared values of x;
∑y²: the sum of the squared values of y.

Applying this formula to the income and expenditure data, we form the following table, where the variables are income (x) and expenditure (y):

Income (X)   Expenditure (Y)    XY     X²     Y²
     3              2            6      9      4
     4              3           12     16      9
     4              4           16     16     16
     6              4           24     36     16
     7              5           35     49     25
     6              5           30     36     25
     8              6           48     64     36
     9              7           63     81     49
     9              7           63     81     49
    11              8           88    121     64
  ∑x=67          ∑y=51      ∑xy=385  ∑x²=509  ∑y²=293

From the table, we find that:
n = 10, ∑x = 67, ∑y = 51, ∑xy = 385, ∑x² = 509, ∑y² = 293

Inserting these sums in the correlation coefficient equation, we get:

$$ r = \frac{10(385) - (67)(51)}{\sqrt{\left[10(509) - (67)^2\right]\left[10(293) - (51)^2\right]}} = \frac{433}{\sqrt{(601)(329)}} = 0.9738 $$

Interpretation
How do we interpret a correlation coefficient of 0.9738?
1. First, it is positive, so there is a direct relationship between income (X) and expenditure (Y).
2. Second, it is very close to 1, so we conclude that the association is strong; i.e., higher income is strongly associated with higher expenditure.

b) The CORREL function: using Excel, input the data in two columns, then select the Formulas tab, Statistical, and the "CORREL" function.

You get a dialog box in which you fill in the cell range of the first variable (Array1) and the cell range of the second variable (Array2), each covering the ten data rows (rows 5 to 14 in this worksheet). The function returns a correlation coefficient of about 0.9738, in agreement with the hand calculation.

Secondly: Simple Linear Regression

Regression analysis is used to predict the value of a dependent variable (the effect variable) based on the value of at least one independent variable (the cause variable). In our example, the effect variable is "Expenditure" and the cause variable is "Income". There is only one independent variable in simple regression analysis.

The Population Regression Model

The actual population regression model takes the following form:

$$ y_i = \beta_0 + \beta_1 x_i + \varepsilon_i \qquad (1) $$

where
y_i: the dependent variable (observed value) for observation i;
β0: the population y-intercept; it is the value predicted when x = 0;
β1: the regression coefficient in the population; it is the amount of change in the dependent variable associated with a one-unit increase in the independent variable;
x_i: the value of the independent variable for observation i;
ε_i: a random error for observation i; it describes the difference between the observed value and the average (predicted) value.
The population regression line, or prediction line, is the mean (expected) value of y at a given x. It contains only the first two components of model (1) above; thus, the prediction line is given by:

$$ \hat{y} = \beta_0 + \beta_1 x \qquad (2) $$

The difference between the actual values y and the predicted values ŷ is the error term ε:

$$ \varepsilon = y - \hat{y} \qquad (3) $$

as shown in the scatter plot diagram.

Estimating the Regression Equation

The purpose is to estimate equation (2) such that the error of prediction (equation (3)) is a minimum. The method used is called the least squares method; it makes the sum of the squared error terms (equation (3)) a minimum. Thus, to estimate equation (2) by

$$ \hat{y}_i = b_0 + b_1 x_i $$

such that the sum of squared errors is a minimum, we apply the following equations:

$$ b_1 = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2} $$

$$ b_0 = \bar{y} - b_1 \bar{x} $$

The dependent variable is the "effect" or "response" variable, and the independent variable is the "cause" variable.

Applying these equations to the sums from the correlation table, we get:

$$ b_1 = \frac{10(385) - (67)(51)}{10(509) - (67)^2} = \frac{433}{601} = 0.7205 $$

$$ \bar{y} = \frac{\sum y}{n} = \frac{51}{10} = 5.1, \qquad \bar{x} = \frac{\sum x}{n} = \frac{67}{10} = 6.7 $$

$$ b_0 = \bar{y} - b_1 \bar{x} = 5.1 - 0.7205(6.7) = 0.2727 $$

Thus, the regression equation is:

$$ \hat{y} = 0.2727 + 0.7205x $$

b) Finding the Slope and Intercept Using Excel

1. The "INTERCEPT" and "SLOPE" functions
Using the Excel function "INTERCEPT" we get the estimated b0, and using the "SLOPE" function we get the estimated regression coefficient b1. Proceeding as earlier, we get

$$ \hat{y} = 0.2727 + 0.7205x $$

the same as obtained earlier.

Interpretation of the regression equation:
1. The intercept = 0.2727. This means that expenditure is 272.7 pounds (0.2727 thousand pounds) if income is zero.
2.
The slope, or regression coefficient, = 0.7205 (in thousands). This means that for every increase in income of one thousand pounds, expenditure increases by 720.5 pounds on average.
3. The regression coefficient always takes the sign of the correlation coefficient: either both are positive or both are negative.
4. To use the estimated regression equation, substitute the income value for x. At income = 3:

$$ \hat{y} = 0.2727 + 0.7205(3) = 2.434 $$

Thus, the mean expenditure for families with an income of 3 thousand pounds is 2.434 thousand pounds. For families with an income of 4 thousand pounds, replacing x by 4 we get:

$$ \hat{y} = 0.2727 + 0.7205(4) = 3.1547 $$

Questions on Simple Correlation and Regression Analysis

Use Excel functions to get the standard deviations of both variables used in the example above, and check the relationship between the correlation coefficient and the regression coefficient in terms of the standard deviations of both variables.

True/False
1. The Pearson product-moment correlation coefficient is used on quantitative data only.
2. A correlation coefficient of -1.0 indicates a weak relationship between the two variables.
3. A correlation of .74 is found between cost and profit; this means that as cost goes up, profit goes up.
4. When the standard deviations of both variables are equal, the correlation coefficient and the regression coefficient are equal.
5. A regression coefficient of -2.1 is associated with a positive correlation coefficient.

MCQ
============================================
Use the following function argument and answer 1 to 5
1. The number of pairs of {X, Y} is:
a. 12   b. 6   c. 5   d. 10
2. The regression coefficient indicates that:
a. The correlation between {X, Y} is .70
b. The value of X at Y=0 is .70
c. Y increases by .70 for each one unit increase in X
d.
X increases by .70 for each one unit increase in Y
3. The value of the regression constant (intercept) is:
a. -1.96   b. 3.64   c. -2.68   d. not enough data
4. Using the regression of y on x, the predicted value of y for x = 15 is:
a. 10.24   b. 5.76   c. 8.55   d. 13.39
5. Given that the observed value at X=15 is 8, the prediction error is:
a. -.55   b. -2.24   c. 2.26   d. 5.39
===================================================
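
The hand calculations in the income and expenditure example of this lecture can be checked with a short script. The following is a minimal Python sketch; it is not part of the original lecture, which uses Excel, and all variable and function names are illustrative assumptions. It recomputes the column sums, the correlation coefficient r, the slope b1 and the intercept b0 from the computational formulas. Because the sketch does not round the slope before computing the intercept, b0 comes out near 0.2729 rather than the 0.2727 obtained in the text from the rounded slope 0.7205.

```python
import math

# Data from the lecture example (thousands of pounds).
income = [3, 4, 4, 6, 7, 6, 8, 9, 9, 11]        # x, independent variable
expenditure = [2, 3, 4, 4, 5, 5, 6, 7, 7, 8]    # y, dependent variable

# Column sums, as in the worked table.
n = len(income)
sum_x = sum(income)
sum_y = sum(expenditure)
sum_xy = sum(x * y for x, y in zip(income, expenditure))
sum_x2 = sum(x * x for x in income)
sum_y2 = sum(y * y for y in expenditure)

# Building blocks shared by the formulas for r and b1.
sxy = n * sum_xy - sum_x * sum_y      # numerator of r and of b1
sxx = n * sum_x2 - sum_x ** 2         # denominator of b1
syy = n * sum_y2 - sum_y ** 2

# Pearson correlation coefficient (computational formula).
r = sxy / math.sqrt(sxx * syy)

# Least squares estimates of the slope and the intercept.
b1 = sxy / sxx
b0 = sum_y / n - b1 * (sum_x / n)


def predict(x):
    """Predicted mean expenditure at income x (hypothetical helper)."""
    return b0 + b1 * x


print("sums:", sum_x, sum_y, sum_xy, sum_x2, sum_y2)
print("r  =", round(r, 4))
print("b1 =", round(b1, 4))
print("b0 =", round(b0, 4))
print("prediction at income 3:", round(predict(3), 3))
```

Replacing the two lists with other paired data gives the corresponding r, b1 and b0, which can then be compared with the results of Excel's CORREL, SLOPE and INTERCEPT functions.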