Module 9 Linear Regression

Introduction

The main objective of many statistical investigations is to establish relationships that make it possible to predict one or more variables in terms of other variables. For instance, studies are made to predict the following:

- the future weight of a person in terms of the number of weeks he or she will stay on an 800-calorie-per-day diet;
- family expenditures on medical care in terms of family income;
- per capita consumption of certain food items in terms of their nutritional value.

The statistical technique used for determining the probable form of the relationship between variables is called regression analysis. The ultimate objective of this method of analysis is usually to predict or estimate the value of one variable (the dependent variable) corresponding to a given value of another variable (the independent variable).

Objectives:

At the end of this module, you should be able to:
1. Fit a simple linear regression model to a given data set.
2. Overlay the graph of the regression line on the scatter diagram.
3. Interpret the predicting equation.
4. Compute the coefficient of determination and interpret the result.

In this section we consider the problem of estimating or predicting the value of the dependent variable (usually represented by Y) on the basis of a known measurement of an independent and frequently controlled variable (usually represented by X). Suppose we wish to predict a student's grade in chemistry based on his score on an intelligence test administered before he entered college. To make such a prediction, we first examine the distribution of chemistry grades corresponding to various intelligence test scores achieved by students in prior years. Denote an individual's chemistry grade by y and his intelligence test score by x. The pertinent data of any student in the population can then be represented by the coordinates (x, y).
A random sample of size n from the population might then be designated by the set $\{(x_i, y_i);\ i = 1, 2, \ldots, n\}$. Let us consider the distribution of chemistry grades corresponding to intelligence test scores of 50, 55, 65 and 70. The chemistry grades for a sample of 12 students having these intelligence test scores are presented in Table 1.

Table 1 Intelligence Test Scores and Chemistry Grades

Student              1   2   3   4   5   6   7   8   9  10  11  12
Test Score, x       65  50  55  65  55  70  65  70  55  70  50  55
Chemistry Grade, y  85  74  76  90  85  87  94  98  81  91  76  74

Before proceeding any further, let us first consider the following discussion, which we will need in order to answer the above problem.

SIMPLE LINEAR REGRESSION

Many equations can be used to predict values of one variable, Y, from given values of another variable, X. The simplest and most widely used is the linear equation in two unknowns. It is expressed by the simple linear statistical model, also called the simple linear regression model:

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \qquad i = 1, 2, \ldots, n$$

where
$Y_i$ = $i$th observed value of the random variable Y
$X_i$ = $i$th observed value of the independent variable X
$\beta_0$ = regression constant; the true Y intercept
$\beta_1$ = regression coefficient; the slope of the line
$\varepsilon_i$ = random error associated with Y for a given $X_i$

The quantities $\beta_0$ and $\beta_1$ are called the parameters of the model. Their values can be determined only if the entire population of (X, Y) values can be obtained. In most practical situations, only estimates of $\beta_0$ and $\beta_1$ can be calculated. Let $b_0$ represent an estimate of $\beta_0$ and $b_1$ an estimate of $\beta_1$. The values of $b_0$ and $b_1$ can be determined by either the freehand method or the method of least squares. The least-squares estimates are:

$$b_1 = \frac{\sum x_i y_i - \dfrac{(\sum x_i)(\sum y_i)}{n}}{\sum x_i^2 - \dfrac{(\sum x_i)^2}{n}} \qquad \text{(Eqn. 1)}$$

$$b_0 = \bar{y} - b_1 \bar{x} \qquad \text{(Eqn. 2)}$$

These computed values of $b_0$ and $b_1$ determine the equation of the estimated sample regression line:

$$\hat{y} = b_0 + b_1 x$$
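As a check on hand computation, Eqn. 1 and Eqn. 2 can be coded directly from the raw sums. The following is a minimal Python sketch and is not part of the original module. The function name `least_squares` and the list names are hypothetical. The sketch is applied to the Table 1 data used in the Example.

```python
# Least-squares estimates b1 (Eqn. 1) and b0 (Eqn. 2) from raw sums.
# The function and variable names here are illustrative, not from the module.

def least_squares(xs, ys):
    """Return (b0, b1) for the fitted line y-hat = b0 + b1*x."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # Eqn. 1: corrected sum of products over corrected sum of squares of x
    b1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    # Eqn. 2: intercept, so the line passes through (x-bar, y-bar)
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1

# Table 1: intelligence test scores (x) and chemistry grades (y)
test_score = [65, 50, 55, 65, 55, 70, 65, 70, 55, 70, 50, 55]
chem_grade = [85, 74, 76, 90, 85, 87, 94, 98, 81, 91, 76, 74]

b0, b1 = least_squares(test_score, chem_grade)
print(f"b1 = {b1:.3f}, b0 = {b0:.3f}")  # b1 = 0.897, b0 = 30.043

# Predicted chemistry grades from the predicting equation
for x in (50, 70):
    print(f"x = {x}: predicted grade {b0 + b1 * x:.1f}")
```

This script does not round $b_1$ before computing $b_0$, so it gives $b_0 \approx 30.043$. The hand calculation, which uses $b_1$ rounded to 0.897, gives 30.056. Both versions predict 74.9 at x = 50 and 92.8 at x = 70.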
The equation $\hat{y} = b_0 + b_1 x$ is referred to as the predicting equation, since it is used to predict the value of y (denoted by $\hat{y}$) for a given value of x.

Example: In the data presented in Table 1, x is the intelligence test score and y is the chemistry grade. From these data we obtain the following quantities:

$$n = 12,\quad \sum x_i = 725,\quad \sum y_i = 1011,\quad \sum x_i y_i = 61{,}685$$

$$\sum x_i^2 = 44{,}475,\quad \sum y_i^2 = 85{,}905,\quad \bar{x} = 60.417,\quad \bar{y} = 84.250$$

$$b_1 = \frac{61{,}685 - \dfrac{(725)(1011)}{12}}{44{,}475 - \dfrac{(725)^2}{12}} = \frac{603.75}{672.917} = 0.897$$

$$b_0 = 84.250 - 0.897(60.417) = 30.056$$

We can now write the predicting equation as

$$\hat{y} = 30.056 + 0.897x$$

This equation can be used to predict the chemistry grade of a student with a known intelligence test score. For x = 50, for instance, we get

$$\hat{y} = 30.056 + 0.897(50) = 74.9$$

Thus a student whose intelligence test score is 50 is predicted to have a chemistry grade of 74.9. For a student whose intelligence test score is 70, the predicted chemistry grade is 92.8 (verify this value).

To assess how good the predicting line is, we compute $R^2$, the coefficient of determination. $R^2$ is the percentage of the total variation in the dependent variable y that is explained by its linear relationship with the independent variable x. It ranges from 0 to 100%.

$$R^2 = \frac{b_1\left[\sum x_i y_i - \dfrac{(\sum x_i)(\sum y_i)}{n}\right]}{\sum y_i^2 - \dfrac{(\sum y_i)^2}{n}} \times 100$$

$$R^2 = \frac{0.897\left[61{,}685 - \dfrac{(725)(1011)}{12}\right]}{85{,}905 - \dfrac{(1011)^2}{12}} \times 100 = \frac{541.564}{728.25} \times 100 \approx 74.37\%$$

This means that about 74.37% of the total variation in the chemistry grades is accounted for, or explained by, the intelligence test scores.

Activity

1. Consider the following set of data:

x: 1 2 3 4 5 6
y: 6 4 3 5 4 2

a. Plot the points (x, y) on your graphing paper. Do not connect the points with a line; the resulting graph is known as the scatter diagram.
b. Find the estimated regression equation; that is, find the values of $b_1$ and $b_0$ and write the equation for $\hat{y}$.
c. Graph the regression line on your scatter diagram.
d. Estimate the value of y when x = 4. When x = 7.
e. Compute $R^2$ and interpret your result.

2. Look for three studies that made use of simple linear regression analysis. For each of them, present the following:
a. the hypotheses of the study
b. the regression equation obtained
c. your interpretation of the regression equation
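The coefficient of determination in the Example can also be checked in code. The following minimal Python sketch is not part of the original module, and the function name `r_squared` is hypothetical. It computes $R^2 = b_1 S_{xy} / S_{yy}$, where $S_{xy}$ is the bracketed numerator and $S_{yy}$ is the denominator of the $R^2$ formula. It reports the value both with the unrounded slope and with the rounded slope $b_1 = 0.897$ used in the hand calculation.

```python
# Coefficient of determination R^2 = b1 * Sxy / Syy, as in the Example.
# Names are illustrative; the formula follows the module's R^2 expression.

def r_squared(xs, ys, b1=None):
    """Return R^2 as a fraction; pass b1 to reuse a (possibly rounded) slope."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    # Corrected sums of products and squares
    sxy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    sxx = sum(x * x for x in xs) - sum_x ** 2 / n
    syy = sum(y * y for y in ys) - sum_y ** 2 / n
    if b1 is None:
        b1 = sxy / sxx  # unrounded least-squares slope (Eqn. 1)
    return b1 * sxy / syy

# Table 1: intelligence test scores (x) and chemistry grades (y)
test_score = [65, 50, 55, 65, 55, 70, 65, 70, 55, 70, 50, 55]
chem_grade = [85, 74, 76, 90, 85, 87, 94, 98, 81, 91, 76, 74]

r2_exact = r_squared(test_score, chem_grade)
r2_rounded = r_squared(test_score, chem_grade, b1=0.897)
print(f"R^2 (unrounded b1) = {100 * r2_exact:.2f}%")   # 74.38%
print(f"R^2 (b1 = 0.897)   = {100 * r2_rounded:.2f}%")  # 74.37%
```

The small difference between the two printed values comes only from rounding $b_1$. Either way, roughly three quarters of the variation in chemistry grades is explained by the test scores. The same function can be used to check an answer to Activity 1e.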