Chapter 2 The Normal Distribution

Up to this point we have been developing a strategy for exploring data on a single quantitative variable.

To review:

• Start with a graph (e.g., dot plot, stemplot, or histogram)

• Look for an overall shape or pattern; then look for deviations from this pattern

• Last, but not least, choose a numerical summary to describe center and spread

Here comes a fast ball

We will now add that sometimes the overall pattern of a LARGE number of observations is so regular that we can describe it by a smooth curve

Below is a histogram of the vocabulary scores of all seventh grade students in Gary, Indiana

The histogram is approximately symmetric, and both tails fall off smoothly from the single peak. There are no large gaps or obvious outliers. Note that the smooth curve drawn provides a reasonable description of the overall pattern of the data. Now let's use it!

The shaded area represents the proportion of scores that are less than or equal to 6. The correct answer from the histogram is 0.303

Using the smooth curve the proportion of scores less than or equal to 6 is calculated to be 0.293. Close enough???

Here comes a curve ball!!!

In most cases the curve is easier to work with because the histogram depends on your choice of classes, while the curve does not if we do the following:

• Use the smooth curve to describe what proportion of the observations fall within each RANGE of values; NOT the counts of observations (relative frequency, not frequency)

• Adjust the dimensions of the curve so that the area under the curve represents the proportion of the observations

• Further adjust the scale of the graph so that the total area under the curve is exactly 1 (representing 100% of the data)

The resulting curve is a DENSITY CURVE. It always lies on or above the horizontal axis, the total area beneath it is exactly 1, and the area under the curve above any range of values equals the proportion of observations falling in that range.
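As a rough illustration of the rescaling steps listed above, the following Python sketch turns class counts into density heights and checks that the total area comes out to 1. The counts and the class width are made-up values chosen only for illustration; they are not the Gary vocabulary data.

```python
# Hypothetical class counts for a histogram with equal-width classes.
# These numbers are invented for illustration only.
counts = [2, 5, 9, 6, 3]
class_width = 2.0

n = sum(counts)

# Step 1: relative frequency (proportion) in each class, not the raw count.
proportions = [c / n for c in counts]

# Step 2: bar height chosen so that bar AREA (height * width) is the proportion.
density_heights = [p / class_width for p in proportions]

# Step 3: the total area under the rescaled histogram should be 1.
total_area = sum(h * class_width for h in density_heights)

print("proportions:", proportions)
print("density heights:", density_heights)
print("total area:", total_area)
```

Changing the class width changes the bar heights but not the total area, which is why areas (not heights) are the quantity to read from a density curve.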

The shaded area under the density curve is the proportion of observations taking values between 7 and 8

Note that no real set of data is exactly described by a density curve. The curve is an approximation that is easy to use and accurate enough for our use.

The median and mean of a symmetric density curve

The median and mean of a right-skewed density curve

EXAMPLE

Sketch a density curve that is symmetric but not bell-shaped

The figure is a density curve of a UNIFORM DISTRIBUTION. Using this curve, answer the following: a) What is the total area under the curve? b) What percent of the observations lie above 0.8? c) What percent of the observations lie between 0.25 and 0.75? d) What percent of the observations lie between 0.8 and 1.75?

For the density curve above find the proportion of observations within the interval: a) 0.6 ≤ X ≤ 0.8 b) 0 ≤ X ≤ 0.4
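Questions like these come down to rectangle areas, which can be checked with a few lines of Python. The sketch below ASSUMES the uniform density curve runs from 0 to 1 with constant height 1; the figure is not reproduced here, so treat that as an assumption. The function name uniform_proportion is ours, not part of any library.

```python
def uniform_proportion(a, b, low=0.0, high=1.0):
    """Proportion of observations between a and b for a uniform density
    on [low, high]. ASSUMPTION: the curve in the figure is uniform on [0, 1]."""
    # Only the part of [a, b] that overlaps [low, high] has any area above it.
    left = max(a, low)
    right = min(b, high)
    width = max(0.0, right - left)
    # Constant height chosen so that the total area is exactly 1.
    height = 1.0 / (high - low)
    return width * height

print("total area:          ", uniform_proportion(0.0, 1.0))
print("above 0.8:           ", round(uniform_proportion(0.8, 1.0), 4))
print("between 0.25 and 0.75:", round(uniform_proportion(0.25, 0.75), 4))
print("0.6 <= X <= 0.8:     ", round(uniform_proportion(0.6, 0.8), 4))
print("0 <= X <= 0.4:       ", round(uniform_proportion(0.0, 0.4), 4))
```

Note that any part of an interval lying outside the range of the curve contributes no area.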

Density curves that are symmetric, single-peaked, and bell-shaped are called NORMAL CURVES and they describe NORMAL DISTRIBUTIONS. All normal distributions have the same overall shape. The exact density curve for a particular normal distribution can be described by giving its mean (µ) and standard deviation (σ).

Two normal curves showing the mean and standard deviation

The 68-95-99.7 Rule (of thumb)

In the normal distribution with mean (µ) and standard deviation (σ):

• 68% of the observations fall within σ of the mean

• 95% of the observations fall within 2σ of the mean

• 99.7% of the observations fall within 3σ of the mean

The distribution of heights of young women aged 18 to 24 is approximately normal with mean 64.5 in and standard deviation 2.5 in. What percentage of young women have heights: a) less than 64.5 in b) less than 69.5 in c) greater than 62 in d) greater than 72 in
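To see how close the rule of thumb is to the exact normal areas, here is a minimal Python sketch using NormalDist from the standard library's statistics module. It treats the heights as exactly normal with mean 64.5 and standard deviation 2.5, which is an idealization; the variable names are ours.

```python
from statistics import NormalDist

standard = NormalDist()                     # N(0, 1)
heights = NormalDist(mu=64.5, sigma=2.5)    # idealized model of the heights

# Exact areas within 1, 2, and 3 standard deviations of the mean,
# for comparison with 68%, 95%, and 99.7%.
for k in (1, 2, 3):
    area = standard.cdf(k) - standard.cdf(-k)
    print(f"within {k} standard deviation(s): {area:.4f}")

# The four height questions, computed from the model.
below_64_5 = heights.cdf(64.5)        # 64.5 is the mean
below_69_5 = heights.cdf(69.5)        # 69.5 is 2 standard deviations above
above_62 = 1 - heights.cdf(62.0)      # 62 is 1 standard deviation below
above_72 = 1 - heights.cdf(72.0)      # 72 is 3 standard deviations above

print(f"less than 64.5 in:  {below_64_5:.4f}")
print(f"less than 69.5 in:  {below_69_5:.4f}")
print(f"greater than 62 in: {above_62:.4f}")
print(f"greater than 72 in: {above_72:.4f}")
```

The rule-of-thumb answers (50%, 97.5%, 84%, and 0.15%) differ slightly from the computed values because 68, 95, and 99.7 are themselves rounded.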

Normal distributions are so common in the real world that a shorthand notation has been developed to describe them. The normal distribution with mean µ and standard deviation σ is referred to as N(µ, σ).

Normal distributions are important in statistics because:

• The distribution of many real world data sets can be described by the normal distribution (e.g., SAT scores)

• Normal distributions are good approximations of many kinds of chance outcomes (e.g., tossing a coin)

• Many statistical procedures based on normal distributions work well for other roughly symmetric distributions.

The Army reports that the distribution of head circumference among male soldiers is approximately N(22.8, 1.1). a) What percent of soldiers have head circumference greater than 23.9 in? b) A head circumference of 23.9 in would be what percentile? c) What percentage of soldiers have head circumferences between 21.7 and 23.9 in?

The length of human pregnancies from conception to birth varies according to a distribution that is approximately N(266, 16), measured in days. a) Between what values do the lengths of the middle 95% of all pregnancies fall? b) How short are the shortest 2.5% of all pregnancies? c) How long are the longest 2.5% of all pregnancies?
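Both of these exercises can be worked with the 68-95-99.7 rule and symmetry alone. The sketch below encodes that reasoning in plain Python; the helper names interval and one_tail are ours, and the results are rule-of-thumb values rather than exact normal areas.

```python
# Rule-of-thumb areas within k standard deviations of the mean.
RULE = {1: 0.68, 2: 0.95, 3: 0.997}

def interval(mu, sigma, k):
    """Endpoints of the interval from mu - k*sigma to mu + k*sigma."""
    return (mu - k * sigma, mu + k * sigma)

def one_tail(k):
    """Area in ONE tail beyond k standard deviations.
    By symmetry, the area outside the interval splits evenly in two."""
    return (1 - RULE[k]) / 2

# Head circumference, N(22.8, 1.1): 21.7 and 23.9 are one sigma from the mean.
low, high = interval(22.8, 1.1, 1)
pct_above_23_9 = one_tail(1)          # area above mu + sigma
percentile_23_9 = 1 - one_tail(1)     # area at or below mu + sigma
pct_between = RULE[1]                 # area within one sigma

print(f"one-sigma interval: {low:.1f} to {high:.1f} in")
print(f"above 23.9 in: {pct_above_23_9:.0%}")
print(f"23.9 in is roughly the {percentile_23_9:.0%} point")
print(f"between 21.7 and 23.9 in: {pct_between:.0%}")

# Pregnancy length, N(266, 16): the middle 95% lies within two sigma.
preg_low, preg_high = interval(266, 16, 2)
print(f"middle 95% of pregnancies: {preg_low} to {preg_high}")
print(f"area in each tail: {one_tail(2):.1%}")
```

The shortest 2.5% and longest 2.5% of pregnancies are the two tails outside the two-sigma interval, so the same endpoints answer parts b) and c).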

THE STANDARD NORMAL DISTRIBUTION

All normal distributions share the same overall shape. In fact, all normal distributions are exactly the same if we measure the data in units of σ about the center µ.

STANDARDIZING AND Z-SCORES

If X is an observation from a distribution that has mean µ and standard deviation σ, the standardized value (sometimes called the z-value) of X is the difference between the value and the mean, divided by the standard deviation:

Z = (X - µ)/σ

A standardized observation tells us how many standard deviations the original observation falls away from the mean and in which direction. Observations larger than the mean give positive z-values and observations smaller than the mean give negative z-values. Observations equal to the mean give a z-value of zero.

The standard normal distribution is the normal distribution N(0,1) with mean of 0 and standard deviation of 1. If a variable X has ANY normal distribution N(µ, σ), then the standardized variable Z = (X - µ)/σ has the standard normal distribution.
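A quick numerical check of this claim, sketched in Python: the area to the left of x under N(µ, σ) should match the area to the left of the standardized value under N(0, 1). The function name z_score and the two test values (68 and 615) are ours, picked only for illustration; NormalDist comes from the standard library's statistics module.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Standardized value: how many standard deviations x lies from mu."""
    return (x - mu) / sigma

standard = NormalDist(0, 1)

# Two different normal distributions and one illustrative x value for each.
examples = [(64.5, 2.5, 68.0), (505, 110, 615.0)]

for mu, sigma, x in examples:
    z = z_score(x, mu, sigma)
    area_original = NormalDist(mu, sigma).cdf(x)   # area left of x under N(mu, sigma)
    area_standard = standard.cdf(z)                # area left of z under N(0, 1)
    print(f"N({mu}, {sigma}), x = {x}: z = {z:.2f}, "
          f"areas {area_original:.4f} and {area_standard:.4f}")
```

Because the two areas agree, a single table for N(0, 1) is enough to answer questions about any normal distribution.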

Table A, inside the front cover, gives areas under the standard normal curve. The table entry for each z-value is the area under the curve to the left of z.

What proportion of young women are less than 68 inches tall?

The distribution of heights of young women aged 18 to 24 was approximately N(64.5,2.5). The standardized height becomes

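Z = (height – 64.5)/2.5

For a height of 68 inches, Z = (68 – 64.5)/2.5 = 1.4, so the proportion we want is the area under the standard normal curve to the left of Z = 1.4. Table A gives 0.9192 for Z = 1.40, so about 92% of young women are less than 68 inches tall.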
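The height question can also be checked without the printed table. In the Python sketch below, table_a is our own stand-in for a Table A lookup (it rounds z to two decimals and the area to four, the way such tables are usually laid out); it is not a real library function.

```python
from statistics import NormalDist

def table_a(z):
    """Stand-in for a Table A lookup: area to the left of z under N(0, 1),
    with z rounded to 2 decimals and the area rounded to 4 decimals."""
    return round(NormalDist().cdf(round(z, 2)), 4)

# Standardize a height of 68 inches using N(64.5, 2.5).
z = (68 - 64.5) / 2.5
proportion = table_a(z)

print(f"z = {z:.2f}")
print(f"proportion of young women shorter than 68 in: {proportion}")
```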

The level of cholesterol in the blood is important because high cholesterol levels may increase the risk of heart disease. The distribution of blood cholesterol levels in a large population of people of the same age and sex is roughly normal. For 14-year-old boys the mean is 170 milligrams of cholesterol per deciliter of blood (mg/dL) and the standard deviation is 30 mg/dL. Levels above 240 mg/dL may require medical attention.

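What percent of 14 year old boys have more than 240 mg/dL of cholesterol?

X > 240

(X-170)/30 > (240-170)/30

Z > 2.33

Area above 2.33

= 1 – area below 2.33

= 1 – 0.9901

= 0.0099

Therefore, about 1% of 14 year old boys have cholesterol levels above 240 mg/dL.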

What percent of 14 year old boys have blood cholesterol between 170 and 240 mg/dL?

170 ≤ X ≤ 240

(170-170)/30 ≤ (X-170)/30 ≤ (240-170)/30

0 ≤ Z ≤ 2.33

Using Table A, the area between 0 and 2.33 is the area below 2.33 minus the area below 0:

Area between 0 and 2.33

= area below 2.33 – area below 0.00

= 0.9901 – 0.5000

= 0.4901

Therefore, about 49% of 14 year old boys have cholesterol levels between 170 and 240 mg/dL.
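The cholesterol calculations can be reproduced in a few lines of Python. This is only a sketch: it rounds z to two decimals to mimic a Table A lookup, and it uses NormalDist from the standard library in place of the printed table.

```python
from statistics import NormalDist

mu, sigma = 170, 30          # cholesterol for 14 year old boys, in mg/dL
standard = NormalDist()      # N(0, 1)

# Standardize the two cut-off values; z is rounded as it would be for Table A.
z_240 = round((240 - mu) / sigma, 2)   # 2.33
z_170 = (170 - mu) / sigma             # 0.0, since 170 is the mean

area_below_240 = standard.cdf(z_240)
area_below_170 = standard.cdf(z_170)

# "More than 240" is the area to the RIGHT of z = 2.33.
above_240 = 1 - area_below_240

# "Between 170 and 240" is a difference of two left-hand areas.
between = area_below_240 - area_below_170

print(f"area below z = {z_240}: {area_below_240:.4f}")
print(f"above 240 mg/dL:         {above_240:.4f}")
print(f"between 170 and 240:     {between:.4f}")
```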

What if the z-value we are interested in falls outside the range covered by the table? For example, what if we are interested in the area to the left of Z = -4? The table only goes out to Z = -3.4. The desired area is less than the entry for Z = -3.4, which is 0.0003. There is very little area outside the range covered by the table. Therefore, you can usually take this area to be zero with little loss in accuracy.
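To get a feel for how small these tail areas are, the following Python sketch computes them directly with NormalDist from the standard library rather than reading them from a table.

```python
from statistics import NormalDist

standard = NormalDist()  # N(0, 1)

# Area to the left of the last z-value covered by the table.
area_at_table_edge = standard.cdf(-3.4)

# Area to the left of a z-value beyond the table.
area_beyond = standard.cdf(-4.0)

print(f"area left of z = -3.4: {area_at_table_edge:.6f}")
print(f"area left of z = -4.0: {area_beyond:.6f}")
```

Both values are small enough that treating the area beyond the table as zero changes an answer only in the fourth decimal place or later.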

What if we want to find the observed value with a given proportion of observations above it or below it?

To do this one must find the desired area in Table A and work backward to find the corresponding observed value.

Example: Scores on the SAT verbal test in recent years follow approximately the N(505,110) distribution. How high must a student score in order to place in the top 10% of all students taking the SAT?

Go to Table A and find the value of Z such that 90% of the area falls to the left.

The closest entry to 0.9 is 0.8997. This entry corresponds to a Z-value of 1.28

Therefore,

Z = 1.28

(X-505)/110 = 1.28

X = 1.28(110) + 505

X = 645.8

Consequently, a student must score at least 646 to place in the highest 10%
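Working backward from an area to a value is what an inverse normal calculation does. The Python sketch below uses the inv_cdf method of NormalDist from the standard library in place of a reverse Table A lookup. Because it does not round z to two decimals, the score it produces differs slightly from 645.8 before rounding up.

```python
import math
from statistics import NormalDist

sat = NormalDist(mu=505, sigma=110)   # SAT verbal scores, N(505, 110)

# Top 10% means 90% of the area lies to the LEFT of the cut-off.
z = NormalDist().inv_cdf(0.90)

# Undo the standardization: X = Z * sigma + mu.
x = z * sat.stdev + sat.mean

# Round up so that the score is at or above the cut-off.
cutoff = math.ceil(x)

print(f"z with 90% of the area to its left: {z:.4f}")
print(f"corresponding SAT score: {x:.1f}")
print(f"minimum whole-number score: {cutoff}")
```

The same value comes out of sat.inv_cdf(0.90) in one step; the two-step version is shown because it mirrors the hand calculation.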

Example: Use Table A to find the area under the standard normal curve for each of the following: a) Z < 2.85 b) Z > 2.85 c) Z ≥ 2.85 d) Z = 2.85. Then find e) the point Z with 25% of the observations falling below it.
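Answers read from Table A can be compared against a direct calculation. The sketch below uses NormalDist from the standard library; the variable names a through e simply follow the lettering of the exercise.

```python
from statistics import NormalDist

standard = NormalDist()  # N(0, 1)

a = standard.cdf(2.85)        # area to the left of 2.85
b = 1 - a                     # area to the right of 2.85
c = b                         # same as b: including the endpoint adds no area
d = 0.0                       # a single point has zero width, so zero area
e = standard.inv_cdf(0.25)    # z-value with 25% of the area below it

print(f"a) P(Z < 2.85)  = {a:.4f}")
print(f"b) P(Z > 2.85)  = {b:.4f}")
print(f"c) P(Z >= 2.85) = {c:.4f}")
print(f"d) P(Z = 2.85)  = {d:.4f}")
print(f"e) z with 25% below it = {e:.2f}")
```

Parts b) through d) illustrate a general feature of density curves: only intervals have area, so it makes no difference whether an endpoint is included.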

Is It Normal????

Many of the statistical calculations we will use in later chapters assume the distribution of the data is normal. Therefore you will need to be able to verify normality in order to properly use these statistical techniques.

1. Plot the data as a histogram, stemplot or dotplot. Look for a symmetric bell-shaped curve. Then verify that the 68-95-99.7 rule applies.

2. Construct a normal probability plot. Enter data into a list and then hit StatPlot Icon 6 with data (list) and axis X. If the graph is linear (or roughly so), the data are approximately normally distributed.

Note that the program calculates the percentile for each data point and its corresponding Z-value. It then plots the Z-value on the Y-axis and the corresponding X-value (original observed value) on the X-axis.

Example: Does the following data on hysterectomies performed per year by male doctors in Switzerland approximate a normal distribution?

27 50 33 25 86 25 85 31 37 44 20 36 59 34 28
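For readers without the calculator, here is a rough Python sketch of both checks applied to these data. Two things in it are our assumptions and not part of the notes: the percentile of the i-th smallest value is taken to be (i - 0.5)/n (calculators and software use slightly different conventions), and the correlation between the sorted data and the z-values is used as a numerical stand-in for judging linearity by eye.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

data = [27, 50, 33, 25, 86, 25, 85, 31, 37, 44, 20, 36, 59, 34, 28]

# Check 1: compare the data with the 68-95-99.7 rule.
m, s = mean(data), stdev(data)   # sample mean and sample standard deviation
within_1 = sum(1 for x in data if abs(x - m) <= s)
within_2 = sum(1 for x in data if abs(x - m) <= 2 * s)
n = len(data)
print(f"mean = {m:.2f}, standard deviation = {s:.2f}")
print(f"within 1 sd: {within_1} of {n} ({within_1 / n:.0%})")
print(f"within 2 sd: {within_2} of {n} ({within_2 / n:.0%})")

# Check 2: coordinates for a normal probability plot.
xs = sorted(data)
# ASSUMPTION: percentile of the i-th smallest value is (i - 0.5) / n.
zs = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

for x, z in zip(xs, zs):
    print(f"x = {x:3d}   z = {z:6.2f}")

def correlation(u, v):
    """Pearson correlation, written out so no extra library is needed."""
    mu_u, mu_v = mean(u), mean(v)
    num = sum((a - mu_u) * (b - mu_v) for a, b in zip(u, v))
    den = sqrt(sum((a - mu_u) ** 2 for a in u) * sum((b - mu_v) ** 2 for b in v))
    return num / den

# A value near 1 suggests the plotted points fall close to a straight line.
r = correlation(xs, zs)
print(f"correlation between x and z: {r:.3f}")
```

Plotting the (x, z) pairs, with x on the horizontal axis, gives the same picture the calculator draws. A few large values that pull away from the line at the upper end are a sign of right skew.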
