Chapter 3: Describing Relationships
Section 3.1 Scatterplots and Correlation

Explanatory and Response Variables

Most statistical studies examine data on more than one variable. In many of these settings, the two variables play different roles.

Definition: A response variable (y) measures an outcome of a study. An explanatory variable (x) may help explain or influence changes in a response variable.

Displaying Relationships: Scatterplots

The most useful graph for displaying the relationship between two quantitative variables is a scatterplot.

Definition: A scatterplot shows the relationship between two quantitative variables measured on the same individuals. The values of one variable appear on the horizontal axis, and the values of the other variable appear on the vertical axis. Each individual in the data appears as a point on the graph.

How to Make a Scatterplot
1. Decide which variable should go on each axis.
   • Remember, the eXplanatory variable goes on the X-axis!
2. Label and scale your axes.
3. Plot individual data values.

NOTE: The axes need not intersect at (0,0). For each axis, choose a scale whose minimum and maximum are convenient values and that covers every value to be plotted.

Make a scatterplot of the relationship between body weight and pack weight. Since body weight is our eXplanatory variable, be sure to place it on the X-axis!

Body weight (lb)        120   187   109   103   131   165   158   116
Backpack weight (lb)     26    30    26    24    29    35    31    28

Interpreting Scatterplots

For the distribution of a single quantitative variable, “shape, outliers, center, spread” (SOCS) has been a useful summary.

How to Examine a Scatterplot
As in any graph of data, look for the overall pattern and for striking departures from that pattern.
• You can describe the overall pattern of a scatterplot by the direction, form, and strength of the relationship.
• An important kind of departure is an outlier, an individual value that falls outside the overall pattern of the relationship. (Ask yourself, “Is there a striking exception to the overall pattern?”)

DESCRIBING STRENGTH
Describe the strength of the relationship. If the points cluster closely around an imaginary line, the association is strong. If the points are scattered farther from the line, the association is weak.

DESCRIBING DIRECTION
Definition: Two variables have a positive association when above-average values of one tend to accompany above-average values of the other, and when below-average values also tend to occur together. (That is, the y values tend to increase as the x values increase.) Two variables have a negative association when above-average values of one tend to accompany below-average values of the other. (That is, the y values tend to decrease as the x values increase.)

Outlier: There is one possible outlier. The hiker with a body weight of 187 pounds seems to be carrying relatively less weight than the other group members.
Strength, direction, form: There is a moderately strong, positive, linear relationship between body weight and pack weight. It appears that lighter students are carrying lighter backpacks.

Consider the SAT example from page 144. Interpret the scatterplot.
Strength, direction, form: There is a moderately strong, negative, curved relationship between the percent of students in a state who take the SAT and the mean SAT math score. Further, there are two distinct clusters of states and two possible outliers that fall outside the overall pattern.

EXAMPLE
A sample of one-way Greyhound bus fares from Rochester, NY to cities less than 750 miles away was taken from Greyhound’s website. The following table gives the destination city, the distance, and the one-way fare. Which variable is the explanatory variable?
Destination City       Distance (miles)   Standard One-Way Fare ($)
Albany, NY             240                39
Baltimore, MD          430                81
Buffalo, NY             69                17
Chicago, IL            607                96
Cleveland, OH          257                61
Montreal, QU           480                70.5
New York City, NY      340                65
Ottawa, ON             467                82
Philadelphia, PA       335                67
Potsdam, NY            239                47
Syracuse, NY            95                20
Toronto, ON            178                35
Washington, DC         496                87

Response?

EXAMPLE SCATTERPLOT
[Figure: Greyhound Bus Fares vs. Distance. Vertical axis: Standard One-Way Fare (dollars). Horizontal axis: Distance from Rochester, NY (miles).]
Verify the plot on your graphing calculator.
Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc.

FURTHER COMMENTS
It is possible that two points might have the same x value with different y values. Notice that Potsdam (239) and Albany (240) come very close to having the same x value but the y values are $8 apart. Clearly, the value of y is not determined solely by the x value (there are factors other than distance that affect the fare).

In this example, the y value tends to increase as x increases. We say that there is a positive relationship between the variables distance and fare.

It appears that the y value (fare) could be predicted reasonably well from the x value (distance) by finding a line that is close to the points in the plot.
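
The positive relationship between distance and fare can also be checked against the definition of positive association given earlier: above-average distances should tend to go with above-average fares. The following is a minimal Python sketch of that check, not part of the original slides. The names FARES, association_counts, and plot_fares are hypothetical, and the plotting helper assumes the matplotlib library is installed.

```python
from statistics import mean

# Data copied from the fare table: (city, distance in miles, fare in dollars).
FARES = [
    ("Albany, NY", 240, 39.0),
    ("Baltimore, MD", 430, 81.0),
    ("Buffalo, NY", 69, 17.0),
    ("Chicago, IL", 607, 96.0),
    ("Cleveland, OH", 257, 61.0),
    ("Montreal, QU", 480, 70.5),
    ("New York City, NY", 340, 65.0),
    ("Ottawa, ON", 467, 82.0),
    ("Philadelphia, PA", 335, 67.0),
    ("Potsdam, NY", 239, 47.0),
    ("Syracuse, NY", 95, 20.0),
    ("Toronto, ON", 178, 35.0),
    ("Washington, DC", 496, 87.0),
]

# Distance is the explanatory variable (x); fare is the response (y).
distances = [d for _, d, _ in FARES]
fares = [f for _, _, f in FARES]


def association_counts(xs, ys):
    """Return (same_side, opposite_side) counts.

    same_side: points where x and y are both above or both below their means.
    opposite_side: points where one is above its mean and the other is below.
    """
    x_bar, y_bar = mean(xs), mean(ys)
    same = opposite = 0
    for x, y in zip(xs, ys):
        product = (x - x_bar) * (y - y_bar)
        if product > 0:
            same += 1
        elif product < 0:
            opposite += 1
    return same, opposite


def plot_fares():
    """Draw the scatterplot (hypothetical helper; requires matplotlib)."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.scatter(distances, fares)  # explanatory variable on the x-axis
    ax.set_xlabel("Distance from Rochester, NY (miles)")
    ax.set_ylabel("Standard one-way fare ($)")
    ax.set_title("Greyhound Bus Fares vs. Distance")
    plt.show()


same, opposite = association_counts(distances, fares)
print(same, opposite)
```

With this data, 12 of the 13 cities fall on the same side of both means, which agrees with a positive association. The one exception is Cleveland, which has a below-average distance and a slightly above-average fare. Calling plot_fares() draws the scatterplot itself, which can be compared with the graphing calculator plot.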