Chapter 2: Exploring Distributions

The goal of this chapter is to provide you with the basic concepts and tools for exploring distributions of ____________ data (data which comes from observing a single variable). Univariate data can be grouped into two categories:

1. Categorical (qualitative) Variables
   The observations specify the group (category) to which an individual belongs.
   e.g. the brand of calculator students own, whether a mammal is considered wild or not.

2. Numerical (quantitative) Variables
   The observations assign a numerical value or measurement to each individual.
   e.g. gestation period of elephants, maximum longevity of smokers.

2.1 Visualizing Distributions: Shape, Center, and Spread

This first section introduces various distribution shapes and asks you to estimate some summary values visually. Later we will compute summary values numerically.

The AP Statistics Exam frequently asks you to describe or compare distributions. Although the question may not mention each of these specifically, when discussing numerical variables you must always include three things (categorical variables will be covered later):

1. Shape
   The common shapes of a distribution are:
      Uniform
      _________
      Skewed (left or right)
      Symmetric
      Bimodal
   If a distribution does not look like any of these, it is said to be “indistinct”.
   Any other features, such as __________, clusters or gaps, should also be mentioned when discussing the shape.

2. Center
   The two common measures of center are:
      Mean (average value)
      Median (middle value)
   The ________, or most common value, is not very useful except when talking about bimodal distributions.

3. Spread
   The spread measures the ___________ in the values. The two common measures of spread are:
      Interquartile range
      Standard deviation

Distributions come in a variety of shapes, and the following is a summary of the four most common.

Uniform (Rectangular) Distributions

In a uniform distribution all values occur __________ often.
The shape is rectangular.
e.g. The last digit of Social Security numbers is fairly uniform. To summarize this distribution, you might write “The distribution of the last digits of 50 Social Security numbers is approximately uniform, with about 5 out of 50 (10%) per digit.”

[Dot plot: Social Security Digits; x-axis: SS_Last_Digit, 0 to 10]

Normal Distributions

This is one of the most important shapes, but you must be careful: not every distribution that is bell-shaped and symmetric should be called normal.

In a normal distribution the shape is symmetric. There is a single peak at the line of symmetry, and the curve drops off smoothly on both sides, flattening toward the x-axis but never quite reaching it and stretching infinitely far in both directions. The mean, median and mode are all at the line of symmetry.

[Figure label: ___________ point (where the curve changes from concave up to concave down)]

Skewed Distributions

Distributions which are bunched at one end and have a long “______” at the other end are called skewed. The direction of the tail tells whether the distribution is:
   Skewed right: tail stretches right
   Skewed left: tail stretches left
e.g. The distribution of the GPAs of 62 students is skewed left.

[Dot plot: Grade-Point Averages of 62 students (each dot represents two points); x-axis: GPA, 0.5 to 4.0]

Typically the median is used to describe the center of a skewed distribution. To estimate the median from a dot plot, locate the value that divides the dots into two halves. Use the interquartile range to indicate the spread.

Interquartile range = upper quartile - lower quartile
   Upper quartile: the value that divides the upper half into two halves.
   Lower quartile: the value that divides the lower half into two halves.

Bimodal Distributions

A distribution that has two peaks is bimodal. These peaks should be definite and separated by some distance before calling the distribution bimodal.
It doesn’t make sense to discuss the center of the whole distribution, but you could talk about the location of the two peaks and give a reason for the two modes.

Other Features: Outliers, Gaps, and Clusters

An ___________ is a value that stands apart from the bulk of the data (later we will look at a rule to help identify an outlier). There is no formal definition of a gap or a cluster; just use common sense.
e.g. Lord Rayleigh noticed two clusters separated by a gap when he measured the density of nitrogen in samples collected from the atmosphere (the cluster on the right) and from a chemical procedure (the cluster on the left). This led him to discover inert gases in the atmosphere.

[Dot plot: Rayleigh's Nitrogen Data; x-axis: Density, 2.298 to 2.312]

Example: Describe the distribution below in terms of shape, center, and spread.

[Dot plot: Los Angeles Rainfall 1899-1999; x-axis: Rainfall (inches), 0 to 35]

Solution:
   Shape:
   Center:
   Spread:
   Other Features:
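
The median and interquartile range described earlier for skewed distributions can also be computed rather than estimated by eye. The following is a minimal Python sketch, assuming the common "median of halves" convention in which the middle value is left out of both halves when the number of observations is odd (these notes do not say how to handle that case). The function names and the sample values are hypothetical, made up for illustration; they are not the rainfall data from the example.

```python
from statistics import median


def quartiles(values):
    """Return (lower quartile, median, upper quartile).

    Assumption: quartiles are the medians of the lower and upper halves,
    and the middle value is excluded from both halves when n is odd.
    """
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower = data[:half]
    # Skip the middle observation when the count is odd.
    upper = data[half:] if n % 2 == 0 else data[half + 1:]
    return median(lower), median(data), median(upper)


def iqr(values):
    """Interquartile range = upper quartile - lower quartile."""
    q1, _, q3 = quartiles(values)
    return q3 - q1


# Hypothetical, made-up observations with a long right tail.
sample = [1, 2, 2, 3, 4, 5, 6, 7, 9, 12, 20]
q1, med, q3 = quartiles(sample)
print(f"median = {med}, lower quartile = {q1}, "
      f"upper quartile = {q3}, IQR = {iqr(sample)}")
# prints: median = 5, lower quartile = 2, upper quartile = 9, IQR = 7
```

Other textbooks and software use slightly different quartile rules, so a calculator or spreadsheet may report values that differ a little from this sketch.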