Math 52 Linear Regression Instructions TI-83

Use the following data to study the relationship between average hours spent per week studying and overall QPA.

The idea behind linear regression is to determine if two variables have a linear relationship, and to find the equation of the line that best fits the data. The first question is: does the data appear to have a linear relationship? A scatterplot of the data usually helps determine if a relationship appears to exist. If the relationship appears to be linear, you will want to determine the line of best fit and the correlation coefficient. “Eyeballing” the raw numbers is usually useless for judging the linearity of the data; some kind of scatterplot is your best bet.

Average Weekly Study Hours = AWSH

AWSH    G.P.A.
0       2.00
2       1.75
3       1.95
3.5     3.8
4       2.5
5       2.0
6.5     2.5
7       3.0
9       3.5
10      3.5
11      4.0
15      3.0

1. DRAW A SCATTERPLOT

You can either draw a scatterplot by hand or use your calculator. Enter the data into your calculator as you normally would, except now you have to enter the x values into one list and the y values into another list. Put the x’s (AWSH) into L1 and the y’s (G.P.A.) into L2.

The easiest way to plot the data using the calculator is to do the following:

1. Turn the STAT plot on by pressing [2nd] and [Y=].
2. Activate Plot 1 by pressing [1] or [ENTER].
3. Use the blue arrow keys to highlight the On choice; once the cursor is on it, press [ENTER] to activate it.
4. Pick the scatterplot from the list of plot types; it is the first choice.
5. Enter the lists that hold your x and y values (for our example these are L1 and L2).
6. Press the blue [GRAPH] key.

Note: if your graph does not appear, press the blue [ZOOM] key and scroll down until you see the choice 9:ZoomStat. This readjusts the window dimensions, and most likely you will see the graph now.

2. CALCULATE r (the CORRELATION COEFFICIENT)

To get the summary statistics for calculating the value of r (the correlation coefficient), run the 2-var stats for x and y. 2-var stats can be found in the same menu as 1-var stats; it is the second choice. Press [ENTER] and scroll through the output list to read off the sums. For our data:

n = 12, Σx = 76, Σy = 33.5, Σx² = 684.5, Σy² = 100.305, Σxy = 235.4

We can calculate the value of r by using

$$ r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}} $$

$$ r = \frac{12(235.4) - (76)(33.5)}{\sqrt{12(684.5) - 76^2}\,\sqrt{12(100.305) - 33.5^2}} = \frac{278.8}{(49.37610758)(9.02274927)} = .6258 $$

We can also get the calculator to calculate r for us. Go to the TESTS menu (you can find it under the main STAT menu) and scroll down to choice E, LinRegTTest. Pick choice E and set up the screen as follows:

1. The x and y lists should correspond to the lists where you entered your data, so here they should be L1 and L2.
2. The Freq choice will usually be 1.
3. For the β & ρ: ≠0 <0 >0 row, highlight the ≠0 choice.
4. Next to RegEq we want to enter Y1. To do so, place the cursor next to RegEq and press [VARS], then move the cursor to highlight the Y-Vars menu. The first choice should be 1:Function; press [1] or [ENTER]. The new menu lists the y-vars, and the first choice should be 1:Y1; press [ENTER]. You should return to the RegEq line with Y1 in place. (You should not have to do this step again unless you erase the calculator memory.)

Now put the cursor on the Calculate choice and press [ENTER], then scroll down to see the rest of the output. More information is given than we need at the moment, but we will go back and use the rest. Notice that the value for r is the same as the one we calculated by hand.

3. TEST r FOR STATISTICAL SIGNIFICANCE

Once r is calculated, we need to determine if r is statistically significant. If r is statistically significant, then we will proceed to find the regression line (or line of best fit).
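Before testing r, it can be reassuring to cross-check the sums and the value of r from step 2 away from the calculator. The following is only a minimal Python sketch under the assumption that the data pairs are read from the AWSH / G.P.A. table as (hours, G.P.A.); the helper names `two_var_sums` and `correlation` are made up for this illustration and are not calculator commands.

```python
from math import sqrt

# Data from the AWSH / G.P.A. table (x = weekly study hours, y = G.P.A.)
x = [0, 2, 3, 3.5, 4, 5, 6.5, 7, 9, 10, 11, 15]
y = [2.00, 1.75, 1.95, 3.8, 2.5, 2.0, 2.5, 3.0, 3.5, 3.5, 4.0, 3.0]

def two_var_sums(x, y):
    """Return n and the five sums used in the formula for r (hypothetical helper)."""
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    return n, sum_x, sum_y, sum_x2, sum_y2, sum_xy

def correlation(x, y):
    """Correlation coefficient r computed from the sums, as in step 2."""
    n, sum_x, sum_y, sum_x2, sum_y2, sum_xy = two_var_sums(x, y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator

r = correlation(x, y)
print(round(r, 4))  # prints 0.6258
```

If the printed value does not match the calculator, the most likely cause is a typing mistake in one of the two lists.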
There are two ways to test the significance of r. Both test the hypotheses

H0: ρ = 0 (there is no significant linear relationship)
H1: ρ ≠ 0 (there is a significant linear relationship)

Method 1: Using Table A-6

1. Find the absolute value of r.
2. Determine your level of significance, either 0.05 or 0.01.
3. Go to the row that corresponds to n.
4. If the absolute value of your r is greater than the value from the table, your r is statistically significant, and there is a linear correlation.

Method 2: Using the t-test for r

1. Calculate the test statistic

$$ t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} $$

   The degrees of freedom are n − 2.
2. Find the critical t value from Table A-3, using row n − 2 and the column that corresponds to your choice of α.
3. Determine if your test statistic falls in the rejection or acceptance region.

(Notice that the t value, as well as the p-value for the test, is calculated when you run the LinRegTTest.)

If r is statistically significant, we can proceed and find the line of best fit.

4. FIND REGRESSION LINE (or LINE OF BEST FIT)

We have already found all the info we need to calculate the line of best fit when we found the 2-var stats. The line of best fit has the form $\hat{y} = b_0 + b_1 x$, where

$$ b_0 = \frac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{n(\sum x^2) - (\sum x)^2} = \text{y-intercept} $$

and

$$ b_1 = \frac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2} = \text{slope} $$

In this case we find that

$$ b_0 = \frac{(33.5)(684.5) - (76)(235.4)}{12(684.5) - (76)^2} = \frac{5040.35}{2438} = 2.06741 $$

$$ b_1 = \frac{12(235.4) - (76)(33.5)}{12(684.5) - (76)^2} = \frac{278.8}{2438} = .11436 $$

So our line of best fit is $\hat{y} = 2.067 + .1144x$.

Notice that the calculator calculated these values when we ran the LinRegTTest; on the calculator, b0 is the value of “a” and b1 is the value of “b”.
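The test statistic from Method 2 and the coefficients from step 4 can be checked the same way. This is a rough Python sketch, assuming the data pairs are read from the AWSH / G.P.A. table; the variable names are chosen for this illustration only. It prints b0, b1, and t so they can be compared with the “a”, “b”, and “t” lines of the LinRegTTest output.

```python
from math import sqrt

# Data from the AWSH / G.P.A. table (x = weekly study hours, y = G.P.A.)
x = [0, 2, 3, 3.5, 4, 5, 6.5, 7, 9, 10, 11, 15]
y = [2.00, 1.75, 1.95, 3.8, 2.5, 2.0, 2.5, 3.0, 3.5, 3.5, 4.0, 3.0]

# The same sums that the 2-var stats report
n = len(x)
sum_x = sum(x)
sum_y = sum(y)
sum_x2 = sum(a * a for a in x)
sum_y2 = sum(b * b for b in y)
sum_xy = sum(a * b for a, b in zip(x, y))

# Intercept and slope share the denominator n*sum(x^2) - (sum x)^2
denominator = n * sum_x2 - sum_x ** 2
b0 = (sum_y * sum_x2 - sum_x * sum_xy) / denominator
b1 = (n * sum_xy - sum_x * sum_y) / denominator

# Correlation coefficient, then the t statistic with n - 2 degrees of freedom
r = (n * sum_xy - sum_x * sum_y) / (
    sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
)
t = r / sqrt((1 - r ** 2) / (n - 2))

print(round(b0, 3), round(b1, 4), round(t, 2))  # prints 2.067 0.1144 2.54
```

The t value printed here is the one to compare against the critical value from Table A-3 in row n − 2 = 10.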
Graph the line of best fit over the scatterplot of the data set. To get the line in your graph, just press the blue [GRAPH] key again; the scatterplot should appear, but this time the regression line should appear as well. (This happens because you entered Y1 next to RegEQ in the LinRegTTest; if you had not done this, the line would not appear now.)
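If you drew the scatterplot by hand instead, two points are enough to draw the line of best fit on it. The sketch below is one possible way to get them in Python, assuming the coefficients from step 4 and using the smallest and largest AWSH values in the data (0 and 15); the helper name `predict` is hypothetical.

```python
# Coefficients from step 4, kept as the fractions computed there
b0 = 5040.35 / 2438  # y-intercept ("a" on the calculator)
b1 = 278.8 / 2438    # slope ("b" on the calculator)

def predict(hours):
    """Predicted G.P.A. on the line of best fit (hypothetical helper)."""
    return b0 + b1 * hours

# Evaluate the line at the smallest and largest study hours in the data set
for hours in (0, 15):
    print(hours, round(predict(hours), 2))
# prints:
# 0 2.07
# 15 3.78
```

Plot the points (0, 2.07) and (15, 3.78) on your hand-drawn scatterplot and connect them with a ruler.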