2.1C Describing Location in a Distribution
Name __________________________________

What we know when exploring quantitative data:
1. Plot the data (the graph is usually a dotplot, stemplot, or histogram).
2. Look for and describe the overall pattern (Shape, Center, Spread) and departures from that pattern (Outliers).
3. Calculate a numerical summary to briefly describe center and spread (mean, std dev, 5 number summary, etc.)

What we are going to learn today about exploring quantitative data:
4. For large data sets, the pattern can be so regular that we can describe it by a smooth curve.

Density Curves

Characteristics of density curves:
1. They are always on or above the horizontal axis.
2. The total area underneath the curve is equal to 1.

Density curves describe the overall pattern of a distribution. The area under the curve and above any interval of values on the horizontal axis is the proportion of all observations that fall in that interval.

Consider a relative frequency histogram: the horizontal axis is broken into classes, and the vertical axis measures the percentage/proportion of observations in each class. See the example below from a previous class.

What proportion of teams averaged between 100 and <105 points per game?
What proportion of teams averaged between 95 and <105 points per game?

[Histogram: Relative Frequency of PTSG. Horizontal axis: PTSG, 90 to 115. Vertical axis: relative frequency, 0 to 0.6.]
This histogram, if you recall, shows the relative frequencies of average points per game scored by the 30 NBA teams in the 09-10 season.

What proportion of teams scored between 100 and <103 points per game?

That last question highlights a limitation of the histogram. By getting a smooth curve, a density curve, to approximate the data, we are able to ask and answer a more comprehensive set of questions.

Important notes about density curves:
1. Outliers are not described by the density curve.
2. Density curves are approximations.
They will never mimic the actual data perfectly; they will, however, be accurate enough for practical use, and often easier to use.

EX] BATTING AVERAGES
The first histogram below shows the distribution of batting average (proportion of hits) for the 432 Major League Baseball players with at least 100 plate appearances in the 2009 season. The smooth curve shows the overall shape of the distribution.
In the middle graph, the more heavily shaded bars on the right represent the proportion of players who had batting averages of at least 0.270. There are 177 such players out of a total of 432, for a proportion of 0.410.
In the third graph below, the area under the curve to the right of 0.270 is shaded more heavily. This area is 0.3974, only 0.0126 away from the actual proportion of 0.410.

Describing Density Curves

Median of a density curve: A median is a point that has half the observations on either side. Since the area under the curve now represents proportions of the total number of observations, the median will cut the total area in half: 0.5 to the left, and 0.5 to the right.

Mean of a density curve: A mean can be described as a balancing point (think about the see-saw analogies I gave you when we talked about standard deviations the first time, and how the fulcrum of the balanced see-saw was the mean). The mean does not cut the data in half like the median, because values far away from the fulcrum pull harder! Density curves can be skewed, and the mean will be pulled in the direction of the skew (toward the long tail).

Symmetric density curve: A density curve that is perfectly symmetric will have identical mean and median.

Mean and Standard Deviation: Unlike sample data sets, which use x̄ and s_x as the mean and standard deviation, respectively, density curves will use μ (mu) and σ (sigma).

Understanding Check
1. Explain why this is a legitimate density curve.
2. About what proportion of observations lie between 7 and 8?
3. Mark the approximate location of the median.
4. Mark the approximate location of the mean. Explain why the median and the mean have the relationship that they do in this case.

Pg 108: 27, 31, 39
Read pages 110-119
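
If you want to check the ideas from Describing Density Curves on a computer, here is a rough sketch in Python. The curve f(x) = 2x on the interval from 0 to 1 is a made-up example (it is not one of the curves from these notes), and the helper names area, mean, and median are just illustrative. The sketch confirms that the total area is 1, finds the median as the point with area 0.5 to its left, finds the mean as the balance point, and re-checks the batting average arithmetic from the example above.

```python
def density(x):
    """Hypothetical density curve (an assumption, not from the notes):
    f(x) = 2x on [0, 1] and 0 everywhere else. It is skewed to the left."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0


def area(f, a, b, steps=10_000):
    """Approximate the area under f between a and b (midpoint rule)."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width


def mean(f, a, b, steps=10_000):
    """Balance point of the curve: the area under x * f(x)."""
    return area(lambda x: x * f(x), a, b, steps)


def median(f, a, b, tol=1e-6):
    """Point with area 0.5 to its left, found by repeated halving."""
    lo, hi = a, b
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if area(f, a, mid) < 0.5:
            lo = mid  # not enough area yet, so the median is to the right
        else:
            hi = mid  # too much area, so the median is to the left
    return (lo + hi) / 2


# Characteristic 2: the total area under a density curve equals 1.
total_area = area(density, 0.0, 1.0)

# Median cuts the area in half; mean is the balance point.
curve_median = median(density, 0.0, 1.0)
curve_mean = mean(density, 0.0, 1.0)

# Batting averages example: 177 of 432 players hit at least 0.270.
actual_proportion = round(177 / 432, 3)
curve_proportion = 0.3974  # area under the smooth curve, from the notes
gap = round(actual_proportion - curve_proportion, 4)

print("total area:", round(total_area, 4))
print("median:", round(curve_median, 3), "mean:", round(curve_mean, 3))
print("actual:", actual_proportion, "curve:", curve_proportion, "gap:", gap)
```

For this made-up curve the median is about 0.707 and the mean is about 0.667. The mean sits to the left of the median because the long tail is on the left, which is the same "pulled toward the skew" behavior described above.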