Algebra 1 Chapter 6 Math Notes

Scatterplots
A scatterplot is a graph of plotted points that shows data from two variables. The pattern of the points lets us see whether there is any correlation between the two variables. The data points are not connected to one another.

Line of Best Fit
A line of best fit is a line drawn through the center of the data on a scatterplot. Its equation is written in the form y = mx + b. To find the equation of the line of best fit, choose two points that fall on the line and find the equation of the line through those two (x, y) points.

Describing Association
The association in a scatterplot can be described using the acronym DOFS. DOFS stands for Direction, Outlier, Form, and Strength:
Direction: Describes the general direction of the data
    Positive Association: as one variable increases, the other variable also increases
    Negative Association: as one variable increases, the other variable decreases
    No Association: no apparent pattern
Outlier: A piece of data that does not seem to fit the overall pattern
Form: Describes the shape of the data
    Linear or non-linear
Strength: Describes how close together or far apart the data points are
    Strong: The data points are very close together
    Moderate: The data points are somewhat close together
    Weak: The data points are widely scattered

Interpreting Slope and y-intercept on Scatterplots
Slope: the amount of change we expect in the dependent variable (y-variable) when the independent variable (x-variable) increases by one unit
y-intercept: the predicted value of the dependent variable (y) when the independent variable (x) is zero
Note: Be careful about extrapolating (making predictions that are beyond the range of the data)! Our predictions become less reliable when we go outside the range of our data. Keep in mind that this is just a prediction!
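The two-point method for a line of best fit, and the way slope and y-intercept are read, can be sketched in a few lines of Python. This is only an illustration: the function names line_through and predict and the two points (2, 7) and (6, 15) are made up for the example and do not come from any real data set.

```python
def line_through(p1, p2):
    """Return (m, b) for the line y = m*x + b through two (x, y) points."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("vertical line: slope is undefined")
    m = (y2 - y1) / (x2 - x1)   # slope: change in y per one-unit change in x
    b = y1 - m * x1             # y-intercept: predicted y when x is zero
    return m, b


def predict(m, b, x):
    """Predicted y-value from the line y = m*x + b."""
    return m * x + b


# Hypothetical points that fall on a hand-drawn line of best fit.
m, b = line_through((2, 7), (6, 15))
print(m, b)              # 2.0 3.0, so the line is y = 2x + 3
print(predict(m, b, 4))  # 11.0, a prediction inside the range of the data
```

Here a slope of 2 means y is expected to rise by 2 for each one-unit increase in x, and the y-intercept of 3 is the predicted y when x = 0. Calling predict with an x far outside the range of the original data would be extrapolation, so that result should be trusted less.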

Residuals
We measure how far a prediction made by our model (line of best fit or LSRL) is from the actual value with a residual:
Residual = actual – predicted
A residual has the same units as the y-variable. A residual can be graphed as a vertical segment that extends from the data point to the line of best fit. The length of the segment is the size (absolute value) of the residual. A positive residual means that the actual value is greater than the predicted value. A negative residual means that the actual value is less than the predicted value.

Least Squares Regression Line
A unique line of best fit for a set of data can be found by finding the line that makes the sum of the squared residuals as small as possible. We call this line the Least Squares Regression Line and abbreviate it LSRL. A calculator can find the LSRL quickly. There is exactly one LSRL for any set of data. Like any line of best fit, the LSRL can be used to make predictions from the data.
We model data because:
1. The trend in the data can easily be described without giving a list of data points.
2. Predictions can be made about points for which we do not have actual data.

Residual Plots
A residual plot is created to judge whether a linear model is appropriate for the data. The x-axis of a residual plot is the same as the independent variable for the data. The y-axis of a residual plot is the residual of each data point (which has the same units as the y-variable of the original data). If a linear model fits the data well, the residuals will show no interesting pattern; they will look randomly scattered. That is because a line that fits the data well goes through the middle of all of the data.

Correlation Coefficient
The correlation coefficient, r, is a measure of how much or how little the data is scattered around the LSRL. (In other words, it is a measure of the strength of a linear association.) The correlation coefficient can take any value from –1 to 1, inclusive.
If r = 1 or r = –1, the association is perfectly linear, and there is no scatter at all.
If r = 0, there is no linear association: the data is extremely scattered around the LSRL.
A positive correlation coefficient means the trend is increasing (the slope is positive).
A negative correlation coefficient means the trend is decreasing (the slope is negative).
A correlation coefficient of zero means the LSRL is horizontal (its slope is zero) and there is no linear association at all between the variables.
The correlation coefficient does not have units, so it can be used to compare scatter from different situations.
The correlation coefficient does not have a real-world meaning other than as a measure of the strength of a linear association.
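
To tie the residual, LSRL, and correlation coefficient ideas together, here is a small Python sketch. It uses the standard textbook formulas for the least squares slope and intercept and for r, not any particular calculator's routine. The function names and the five data points are hypothetical, chosen only so the arithmetic is easy to follow.

```python
from math import sqrt


def lsrl(xs, ys):
    """Return (m, b) for the least squares regression line y = m*x + b."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    m = sxy / sxx            # slope that minimizes the sum of squared residuals
    b = y_bar - m * x_bar    # the LSRL passes through (x_bar, y_bar)
    return m, b


def residuals(xs, ys, m, b):
    """Residual = actual - predicted, one value per data point."""
    return [y - (m * x + b) for x, y in zip(xs, ys)]


def correlation(xs, ys):
    """Correlation coefficient r, a unitless value from -1 to 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)


# Hypothetical data set (not from the notes).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

m, b = lsrl(xs, ys)                # slope about 0.6, y-intercept about 2.2
res = residuals(xs, ys, m, b)      # about -0.8, 0.6, 1.0, -0.6, -0.2
r = correlation(xs, ys)            # about 0.77: a positive linear association

print(round(m, 2), round(b, 2))
print([round(e, 2) for e in res])
print(round(r, 2))
```

Plotting each value in res against its x-value gives the residual plot. Positive entries are points above the line and negative entries are points below it. For the LSRL the residuals always add up to zero (apart from rounding), which matches the idea that the line runs through the middle of the data.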