Chapter 3: Distributions

Continuous Random Variables
• Are numerical variables whose values fall within a range or interval
• Are measurements
• Can be described by density curves

Density Curves
• Are always on or above the horizontal axis
• Have an area of exactly 1 underneath them
• Often describe an overall distribution
• Describe what proportion of the observations falls within each range of values

Unusual Density Curves
• Can be any shape
• Are generic continuous distributions
• Probabilities are calculated by finding the area under the curve

[Figure: density curve; vertical axis marked .25 and .5, horizontal axis marked 1 to 5]
How do you find the area of a triangle?
P(X < 2) = (2)(.25)/2 = .25

What is the area of a line segment?
P(X = 2) = 0, so P(X ≤ 2) = .25

In continuous distributions, P(X < 2) and P(X ≤ 2) give the same answer. Is this different from discrete distributions?

The shape is a trapezoid. How long are the bases? Here b1 = .5, b2 = .375, and h = 1, with $\text{Area} = \frac{(b_1 + b_2)h}{2}$.
P(X > 3) = .5(.375 + .5)(1) = .4375
P(1 < X < 3) = .5(.125 + .375)(2) = .5

[Figure: a second density curve; vertical axis marked 0.25 and 0.50, horizontal axis marked 1 to 4]
P(X > 1) = .5(2)(.25) + (2)(.25) = .25 + .5 = .75
P(0.5 < X < 1.5) = .5(.25 + .375)(.5) + (.5)(.25) = .15625 + .125 = .28125

Special Continuous Distributions

Uniform Distribution
• Is a continuous distribution that is evenly (or uniformly) distributed
• Has a density curve in the shape of a rectangle
• Probabilities are calculated by finding the area under the curve
How do you find the area of a rectangle?
$\mu_x = \frac{a + b}{2}$ and $\sigma_x^2 = \frac{(b - a)^2}{12}$, where a and b are the endpoints of the uniform distribution.

The Citrus Sugar Company packs sugar in bags labeled 5 pounds. However, the packaging isn't perfect, and the actual weights are uniformly distributed with a mean of 4.98 pounds and a range of .12 pounds.
a) Construct the uniform distribution. What shape does a uniform distribution have? How long is this rectangle? What is the height of this rectangle?
[Figure: rectangle from 4.92 to 5.04 with height 1/.12; the mean 4.98 is marked]
• What is the probability that a randomly selected bag will weigh more than 4.97 pounds?
What is the length of the shaded region?
P(X > 4.97) = .07(1/.12) = .5833

• Find the probability that a randomly selected bag weighs between 4.93 and 5.03 pounds. What is the length of the shaded region?
P(4.93 < X < 5.03) = .1(1/.12) = .8333

The time it takes for students to drive to school is evenly distributed with a minimum of 5 minutes and a range of 35 minutes.
a) Draw the distribution. Where should the rectangle end? What is the height of the rectangle?
[Figure: rectangle from 5 to 40 with height 1/35]
b) What is the probability that it takes less than 20 minutes to drive to school?
P(X < 20) = (15)(1/35) = .4286
c) What are the mean and standard deviation of this distribution?
μ = (5 + 40)/2 = 22.5
σ² = (40 − 5)²/12 = 102.083
σ = 10.104

Density Curves
A density curve is similar to a histogram, but there are several important distinctions.
1. A smooth curve is used to represent the data rather than bars. More importantly, a density curve describes the proportion of the observations that fall in each range rather than the actual number of observations.
2. The scale is adjusted so that the total area under the curve is exactly 1. This represents the proportion 1 (or 100%).
3. While a histogram represents actual data (i.e., a sample), a density curve represents an idealized sample or population distribution: it describes the proportion of the observations.
4. A density curve is always on or above the horizontal axis.
5. We will still use μ for the mean and σ for the standard deviation.

Density Curves: Mean & Median
Three points that have been made previously are especially relevant to density curves.
1. The median is the "equal areas" point. Likewise, the quartiles can be found by dividing the area under the curve into 4 equal parts.
2. The mean is the "balance" point.
3. The mean and median are the same for a symmetric density curve.
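Every uniform calculation above reduces to "width of the shaded region times the height 1/(b − a)", plus the mean and variance formulas. A minimal Python sketch of that idea, assuming X is uniform on [a, b]; the function names `uniform_prob` and `uniform_mean_sd` are illustrative, not from any library:

```python
import math


def uniform_prob(a, b, low, high):
    """P(low < X < high) when X is uniform on [a, b].

    The density is a rectangle of height 1/(b - a), so the probability is
    the width of the shaded region (the part of [low, high] inside [a, b])
    times that height. Use -math.inf or math.inf for one-sided questions.
    """
    if b <= a:
        raise ValueError("need a < b")
    width = max(0.0, min(high, b) - max(low, a))
    return width * (1 / (b - a))


def uniform_mean_sd(a, b):
    """Mean (a + b)/2 and standard deviation sqrt((b - a)^2 / 12)."""
    mean = (a + b) / 2
    variance = (b - a) ** 2 / 12
    return mean, math.sqrt(variance)


# Sugar bags: mean 4.98 and range .12, so the endpoints are 4.92 and 5.04.
print(round(uniform_prob(4.92, 5.04, 4.97, math.inf), 4))  # P(X > 4.97), about .5833
print(round(uniform_prob(4.92, 5.04, 4.93, 5.03), 4))      # about .8333

# Drive times: minimum 5 and range 35, so the endpoints are 5 and 40.
print(round(uniform_prob(5, 40, -math.inf, 20), 4))        # P(X < 20), about .4286
mean, sd = uniform_mean_sd(5, 40)
print(mean, round(sd ** 2, 3), round(sd, 3))               # 22.5 102.083 10.104
```

Clipping [low, high] to [a, b] means a question like "more than 4.97 pounds" needs no special case: the shaded width simply stops at the right endpoint of the rectangle.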
Shapes of Density Curves
• We have mostly discussed right-skewed, left-skewed, and roughly symmetric distributions that look like this:
[Figure: right-skewed, left-skewed, and roughly symmetric density curves]

Bimodal Distributions
We could have a bimodal distribution. For instance, think of counting the number of tires owned by a two-person family. Most two-person families probably have 1 or 2 vehicles and therefore own 4 or 8 tires. Some, however, have a motorcycle, or maybe more than 2 cars. Still, the distribution will most likely have a "hump" at 4 and another at 8, making it "bimodal."

Uniform Distributions
We could have a uniform distribution. Consider the number of cans in six-packs: every pack has exactly 6 cans. Or think of repeatedly drawing a card from a complete deck: one-fourth of the cards should be hearts, one-fourth should be diamonds, and so on.

Other Distributions
Many other distributions exist, and some do not clearly fall under a single label. These are frequently the most interesting, and we will discuss many of them.

#1 RULE: ALWAYS MAKE A PICTURE
It is the only way to see what is really going on!

Normal Distributions
• Symmetrical, bell-shaped (unimodal) density curve
• Above the horizontal axis
• N(μ, σ)
• The transition points occur at μ ± σ
• Probability is calculated by finding the area under the curve
• As σ increases, the curve flattens and spreads out
• As σ decreases, the curve gets taller and thinner
How is this done mathematically?

Normal Curves
• Curves that are symmetric, single-peaked, and bell-shaped are often called normal curves and describe normal distributions.
• All normal distributions have the same overall shape. They may be "taller" or more spread out, but the idea is the same.
What does it look like?

Normal Curves: μ and σ
• The "control factors" are the mean μ and the standard deviation σ.
• Changing only μ moves the curve along the horizontal axis.
• The standard deviation σ controls the spread of the distribution. Remember that a large σ implies that the data are spread out.
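To see numerically how μ and σ control the normal curve, here is a small Python sketch using the standard library's `statistics.NormalDist`, whose `pdf` method evaluates the normal density $f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-(x-\mu)^2/(2\sigma^2)}$. The helper names `second_derivative` and `curve_facts` are hypothetical. The sketch checks that the peak height is $1/(\sigma\sqrt{2\pi})$ (so a larger σ gives a flatter curve), that shifting μ leaves the height unchanged, and that a finite-difference estimate of the curvature changes sign at μ ± σ, the transition points.

```python
import math
from statistics import NormalDist


def second_derivative(f, x, h=1e-4):
    """Central-difference estimate of f''(x); its sign shows the curvature."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / h ** 2


def curve_facts(mu, sigma):
    """Peak height and curvature checks for the normal curve N(mu, sigma)."""
    pdf = NormalDist(mu, sigma).pdf
    peak = pdf(mu)                                      # equals 1/(sigma*sqrt(2*pi))
    at_sigma = second_derivative(pdf, mu + sigma)       # close to 0: inflection point
    inside = second_derivative(pdf, mu + 0.5 * sigma)   # negative: curving downward
    outside = second_derivative(pdf, mu + 1.5 * sigma)  # positive: curving upward
    return peak, at_sigma, inside, outside


for mu, sigma in [(0, 1), (0, 3), (6, 1)]:
    peak, at_sigma, inside, outside = curve_facts(mu, sigma)
    print(f"N({mu}, {sigma}): peak {peak:.4f}, f'' at mu+sigma {at_sigma:+.1e}, "
          f"concave down inside: {inside < 0}, concave up outside: {outside > 0}")
```

By symmetry the same sign change happens at μ − σ, so only the right-hand transition point is checked.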
Finding μ and σ
• You can locate the mean μ by finding the middle of the distribution. Because the curve is symmetric, the mean is at the peak.
• The standard deviation σ can be found by locating the points where the graph changes curvature (inflection points). These points are located a distance σ from the mean.

[Figure: two normal curves, A and B, on a horizontal axis marked 6]
Do these two normal curves have the same mean? If so, what is it? Yes, 6.
Which normal curve has a standard deviation of 3? B
Which normal curve has a standard deviation of 1? A

The 68-95-99.7 (Empirical) Rule
In a normal distribution with mean μ and standard deviation σ:
• 68% of the observations are within σ of the mean μ.
• 95% of the observations are within 2σ of the mean μ.
• 99.7% of the observations are within 3σ of the mean μ.
[Figure: 68-95-99.7 Rule chart]

Why Use the Normal Distribution?
1. Normal distributions occur frequently in large data sets (all SAT scores), in repeated measurements of the same quantity, and in biological populations (lengths of roaches).
2. They are often good approximations to chance outcomes (like coin flipping).
3. We can apply what we learn in studying normal distributions to other distributions.

Heights of Young Women
• The distribution of heights of young women aged 18 to 24 is approximately normal with mean μ = 64.5 inches and standard deviation σ = 2.5 inches.
Use the 68-95-99.7 Rule chart:
• Where do the middle 95% of heights fall?
• What percent of the heights are above 69.5 inches?
• A height of 62 inches is at what percentile?
• What percent of the heights are between 62 and 67 inches?
• What percent of the heights are less than 57 inches?

Example
• Suppose that, on average, it takes you 20 minutes to drive to school, with a standard deviation of 2 minutes, and that a normal model is appropriate for the distribution of driving times.
– How often will you arrive at school in less than 20 minutes?
– How often will it take you more than 24 minutes?
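The height and driving-time questions above can be answered with the 68-95-99.7 Rule alone. A minimal Python sketch, assuming every boundary sits a whole number of standard deviations (at most 3) from the mean, which is the only case the rule covers. The cumulative proportions in the hypothetical table `RULE_CDF` come from splitting the leftover 32%, 5%, and 0.3% equally between the two tails; the function names are illustrative.

```python
# Cumulative proportions P(Z < k) implied by the 68-95-99.7 Rule:
# e.g. 68% within 1 sigma leaves 32% in the tails, 16% on each side.
RULE_CDF = {-3: 0.0015, -2: 0.025, -1: 0.16, 0: 0.5, 1: 0.84, 2: 0.975, 3: 0.9985}


def rule_below(x, mu, sigma):
    """Approximate P(X < x) by the empirical rule.

    Only works when x is 0, 1, 2, or 3 standard deviations from mu.
    """
    z = (x - mu) / sigma
    k = round(z)
    if abs(z - k) > 1e-9 or k not in RULE_CDF:
        raise ValueError(f"{x} is not a whole number (at most 3) of SDs from {mu}")
    return RULE_CDF[k]


def rule_between(lo, hi, mu, sigma):
    """Approximate P(lo < X < hi) as a difference of two cumulative proportions."""
    return rule_below(hi, mu, sigma) - rule_below(lo, mu, sigma)


mu, sigma = 64.5, 2.5  # heights of young women, in inches
print("middle 95%:", mu - 2 * sigma, "to", mu + 2 * sigma, "inches")
print(f"above 69.5: {1 - rule_below(69.5, mu, sigma):.2%}")
print(f"62 inches: about the {100 * rule_below(62, mu, sigma):.0f}th percentile")
print(f"between 62 and 67: {rule_between(62, 67, mu, sigma):.2%}")
print(f"below 57: {rule_below(57, mu, sigma):.2%}")

mu, sigma = 20, 2  # driving times, in minutes
print(f"less than 20 minutes: {rule_below(20, mu, sigma):.2%}")
print(f"more than 24 minutes: {1 - rule_below(24, mu, sigma):.2%}")
```

The rule is an approximation: the exact normal area below 62 inches, for example `NormalDist(64.5, 2.5).cdf(62)` from Python's `statistics` module, is about .1587 rather than exactly .16.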
Suppose that the height of male students at BHS is normally distributed with a mean of 71 inches and a standard deviation of 2.5 inches. What is the probability that the height of a randomly selected male student is more than 73.5 inches?
Since 73.5 = 71 + 2.5 is one standard deviation above the mean, 68% of heights fall between 68.5 and 73.5 inches. The remaining 1 − .68 = .32 is split equally between the two tails, so P(X > 73.5) = .32/2 = 0.16.

Suppose you take the SAT and the ACT. Without using the chart they provide, can you directly compare your SAT Math score to your ACT Math score? Why or why not?
We need to standardize these scores so that we can compare them.

Standard Normal Density Curves
Always have μ = 0 and σ = 1.
To standardize (you must have this memorized!):
$z = \frac{x - \mu}{\sigma}$

Let's explore... So what does the z-score tell you?
Suppose the mean and standard deviation of a distribution are μ = 50 and σ = 5.
If the x-value is 55, what is the z-score? 1
If the x-value is 45, what is the z-score? −1
If the x-value is 60, what is the z-score? 2

What do these z-scores mean?
−2.3: 2.3 standard deviations below the mean
1.8: 1.8 standard deviations above the mean
6.1: 6.1 standard deviations above the mean
−4.3: 4.3 standard deviations below the mean

Jonathan wants to work at Utopia Landfill. He must take a test to see if he is qualified for the job. The test scores have a normal distribution with μ = 45 and σ = 3.6. To qualify for the job, a person cannot score more than 2.5 standard deviations below the mean (a z-score below −2.5). Jonathan scores 35 on this test. Does he get the job?
No: z = (35 − 45)/3.6 = −2.78, so he scored 2.78 standard deviations below the mean.

Sally is taking two different math achievement tests with different means and standard deviations. The mean score on test A was 56 with a standard deviation of 3.5, while the mean score on test B was 65 with a standard deviation of 2.8. Sally scored 62 on test A and 69 on test B. On which test did Sally score better?
She did better on test A: z = (62 − 56)/3.5 = 1.71 on test A, but only z = (69 − 65)/2.8 = 1.43 on test B.

Strategies for finding probabilities or proportions in normal distributions
1. State the probability statement
2. Draw a picture
3. Calculate the z-score
4. Look up the probability (proportion) in the table

The lifetime of a certain type of battery is normally distributed with a mean of 200 hours and a standard deviation of 15 hours. What proportion of these batteries can be expected to last less than 220 hours?
Write the probability statement: P(X < 220)
Draw and shade the curve.
Calculate the z-score: z = (220 − 200)/15 = 1.33
Look up the z-score in the table: P(X < 220) = .9082

The lifetime of a certain type of battery is normally distributed with a mean of 200 hours and a standard deviation of 15 hours. What proportion of these batteries can be expected to last more than 220 hours?
z = (220 − 200)/15 = 1.33
P(X > 220) = 1 − .9082 = .0918

The lifetime of a certain type of battery is normally distributed with a mean of 200 hours and a standard deviation of 15 hours. How long must a battery last to be in the top 5%?
P(X > ?) = .05, so the area to the left is .95. Look up .95 in the table to find the z-score: z = 1.645.
1.645 = (x − 200)/15, so x = 224.675 hours.

The heights of the female students at PWSH are normally distributed with a mean of 65 inches. What is the standard deviation of this distribution if 18.5% of the female students are shorter than 63 inches?
P(X < 63) = .185. What is the z-score for 63? From the table, z ≈ −0.9.
−0.9 = (63 − 65)/σ, so σ = 2/0.9 = 2.22 inches.

Will my calculator do any of this normal stuff?
• Normalpdf: use for graphing ONLY
• Normalcdf: finds the probability (area) between a lower bound and an upper bound
• Invnorm (inverse normal): finds the z-score (or, when μ and σ are given, the x-value) for a cumulative probability

The lifetime of a certain type of battery is normally distributed with a mean of 200 hours and a standard deviation of 15 hours. What proportion of these batteries can be expected to last less than 220 hours?
N(200, 15)
P(X < 220) = Normalcdf(−∞, 220, 200, 15) = .9088
(The table answer .9082 differs slightly because it rounds z to 1.33.)

The lifetime of a certain type of battery is normally distributed with a mean of 200 hours and a standard deviation of 15 hours. What proportion of these batteries can be expected to last more than 220 hours?
N(200, 15)
P(X > 220) = Normalcdf(220, ∞, 200, 15) = .0912

The lifetime of a certain type of battery is normally distributed with a mean of 200 hours and a standard deviation of 15 hours. How long must a battery last to be in the top 5%?
P(X > ?) = .05, so the area to the left is .95.
Invnorm(.95, 200, 15) = 224.673 (using the table value z = 1.645 gives 224.675)

The heights of female teachers at PWSH are normally distributed with a mean of 65.5 inches and a standard deviation of 2.25 inches. The heights of male teachers are normally distributed with a mean of 70 inches and a standard deviation of 2.5 inches.
• Describe the distribution of differences of heights (male − female) of teachers.
Assuming the two heights are independent, the differences are normally distributed with μ = 70 − 65.5 = 4.5 and σ = √(2.5² + 2.25²) = 3.3634.
• What is the probability that a randomly selected male teacher is shorter than a randomly selected female teacher?
P(X < 0) = Normalcdf(−∞, 0, 4.5, 3.3634) = .0905

Ways to Assess Normality
• Use graphs (dotplots, boxplots, or histograms)
• Normal probability (quantile) plot

Normal Probability (Quantile) Plots
• The observations (x) are plotted against known normal z-scores
• If the points on the quantile plot lie close to a straight line, then the data are approximately normally distributed
• Deviations from a straight line on the quantile plot indicate nonnormal data
• Points far away from the overall pattern of the plot indicate outliers
• Vertical stacks of points (repeated observations of the same number) are called granularity

Consider a random sample with n = 5. To find the appropriate z-scores for a sample of size 5, divide the standard normal curve into 5 equal-area regions. Why are these regions not the same width?
Next, find the median z-score for each region. Why is the median not in the "middle" of each region?
These would be the z-scores (from the standard normal curve) that we would use to plot our data against: −1.28, −.524, 0, .524, 1.28

Normal Scores
The values of the normal scores depend on the sample size n. The normal scores when n = 10 are below:
−1.539, −1.001, −0.656, −0.376, −0.123, 0.123, 0.376, 0.656, 1.001, 1.539

Let's construct a normal probability plot. Suppose we have the following observations of widths of contact windows in integrated circuit chips:
3.21, 2.49, 2.94, 4.38, 3.62, 3.30, 2.85, 3.34, 4.02, 3.81
Sketch a scatterplot by pairing the smallest normal score with the smallest observation from the data set, and so on. What should happen if our data are normally distributed?

Widths of Contact Windows
Notice that the boxplot is approximately symmetrical and that the normal probability plot is approximately linear.

Notice that the boxplot is approximately symmetrical except for the outlier and that the normal probability plot shows the outlier.

Notice that the boxplot is skewed left and that the normal probability plot shows this skewness.

Are these approximately normally distributed?
50 48 54 47 51 52 46 53 52 51 48 48 54 55 57 45 53 50 47 49 50 56 53 52
What is this called?
Both the histogram and the boxplot are approximately symmetrical, so these data are approximately normal. The normal probability plot is approximately linear, so these data are approximately normal.
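The normal-score construction above can be sketched in Python with `statistics.NormalDist().inv_cdf`, which plays the role of Invnorm for the standard normal curve. Region i of n equal-area regions spans cumulative areas (i − 1)/n to i/n, so its median sits at cumulative area (i − 0.5)/n; for n = 5 this reproduces −1.28, −.524, 0, .524, 1.28. The n = 10 scores listed above appear to come from a table built slightly differently (they match expected values of normal order statistics), so this sketch pairs the contact-window widths with those listed values directly. The correlation coefficient used to summarize how straight the plot is, and the helper names, are additions for illustration, not part of the slides.

```python
from statistics import NormalDist, fmean


def region_median_scores(n):
    """Median z-score of each of n equal-area regions under the standard normal curve.

    Region i (i = 1..n) covers cumulative areas (i - 1)/n to i/n, so its
    median is the z-score with cumulative area (i - 0.5)/n.
    """
    standard = NormalDist()  # mu = 0, sigma = 1
    return [standard.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]


def pearson_r(xs, ys):
    """Correlation coefficient; values near 1 mean the plotted points are nearly linear."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


print([round(z, 3) for z in region_median_scores(5)])
# [-1.282, -0.524, 0.0, 0.524, 1.282]

# Contact-window widths paired with the n = 10 normal scores listed above.
widths = [3.21, 2.49, 2.94, 4.38, 3.62, 3.30, 2.85, 3.34, 4.02, 3.81]
scores_n10 = [-1.539, -1.001, -0.656, -0.376, -0.123,
              0.123, 0.376, 0.656, 1.001, 1.539]
pairs = list(zip(scores_n10, sorted(widths)))  # smallest with smallest, and so on
for score, width in pairs:
    print(f"{score:+.3f}  {width:.2f}")
print("r =", round(pearson_r(scores_n10, sorted(widths)), 3))
```

Plotting `pairs` as (normal score, width) points gives the normal probability plot; a correlation close to 1, as here, matches the slide's observation that the plot is approximately linear.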