3.2 Notes

Chapter 3
Distributions
Continuous random variables
• Are numerical variables
whose values fall within a
range or interval
• Are measurements
• Can be described by
density curves
Density curves
• Are always on or above the
horizontal axis
• Have an area exactly equal to one
underneath them
• Often describe an overall
distribution
• Describe what proportion of the
observations falls within each range
of values
Unusual density curves
• Can be any shape
• Are generic continuous
distributions
• Probabilities are calculated
by finding the area under
the curve
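How do you find the area of a triangle?
[Figure: density curve with .25 and .5 marked on the vertical axis and 1 through 5 on the horizontal axis; the region X < 2 is shaded]
P(X < 2) = 2(.25)/2 = .25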
What is the area of a line segment?
[Figure: density curve with a vertical line segment drawn at X = 2]
P(X = 2) = 0
P(X < 2) = .25
In continuous distributions,
P(X < 2) & P(X ≤ 2)
are the same answer.
Hmmmm… Is this different
than discrete distributions?
Shape is a trapezoid.
How long are the bases?
b1 = .5
b2 = .375
h = 1
Area = (b1 + b2)h/2
[Figure: density curve with .25 and .5 marked on the vertical axis; the region X > 3 is shaded]
P(X > 3) = .5(.375+.5)(1) = .4375
P(1 < X < 3) =.5(.125+.375)(2) =.5
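P(X > 1) = .75
Triangle: .5(2)(.25) = .25
Rectangle: (2)(.25) = .5
[Figure: density curve with 0.25 and 0.50 marked on the vertical axis and 1 through 4 on the horizontal axis; the region X > 1 is shaded]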
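P(0.5 < X < 1.5) = .28125
Trapezoid: .5(.25+.375)(.5) = .15625
Rectangle: (.5)(.25) = .125
[Figure: density curve with 0.25 and 0.50 marked on the vertical axis and 1 through 4 on the horizontal axis; the region 0.5 < X < 1.5 is shaded]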
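The area arithmetic on the slides above can be checked with a short Python sketch. The helper names are illustrative and not part of the notes; the bases and heights are the values read off the slides.

```python
# Hypothetical helpers for the geometric areas used on the slides.
def triangle_area(base, height):
    return 0.5 * base * height

def rectangle_area(base, height):
    return base * height

def trapezoid_area(b1, b2, h):
    return 0.5 * (b1 + b2) * h

# Each probability is the area of the shaded region under the density curve.
p_less_2 = triangle_area(2, 0.25)
p_greater_3 = trapezoid_area(0.375, 0.5, 1)
p_1_to_3 = trapezoid_area(0.125, 0.375, 2)
p_greater_1 = triangle_area(2, 0.25) + rectangle_area(2, 0.25)
p_half_to_1_5 = trapezoid_area(0.25, 0.375, 0.5) + rectangle_area(0.5, 0.25)

for name, p in [("P(X < 2)", p_less_2), ("P(X > 3)", p_greater_3),
                ("P(1 < X < 3)", p_1_to_3), ("P(X > 1)", p_greater_1),
                ("P(0.5 < X < 1.5)", p_half_to_1_5)]:
    print(name, "=", p)
```

Splitting a shaded region into a rectangle plus a triangle or trapezoid, as the slides do, keeps every piece a shape with a known area formula.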
Special Continuous
Distributions
Uniform Distribution
• Is a continuous distribution that is
evenly (or uniformly) distributed
• Has a density curve in the shape of a
rectangle
• Probabilities are calculated by finding
the area under the curve
a b
x 
2
 x2

b a

12
2
How
do ayou
the
Where:
& bfind
are the
area
endpoints
ofof
thea
rectangle?
uniform
distribution
The Citrus Sugar Company packs sugar in
bags labeled 5 pounds. However, the
packaging isn’t perfect and the actual
weights are uniformly distributed with a
mean of 4.98 pounds and a range of .12
pounds.
a) Construct the uniform distribution above.
What shape does a uniform distribution have? A rectangle.
How long is this rectangle? .12 (from 4.92 to 5.04)
What is the height of this rectangle? 1/.12
[Figure: rectangle of height 1/.12 from 4.92 to 5.04, with 4.98 marked at the center]
• What is the probability that a
randomly selected bag will weigh
more than 4.97 pounds?
What is the length of the shaded region? 5.04 - 4.97 = .07
P(X > 4.97) = .07(1/.12) = .5833
[Figure: rectangle of height 1/.12 from 4.92 to 5.04 with the region above 4.97 shaded]
• Find the probability that a
randomly selected bag weighs
between 4.93 and 5.03 pounds.
What is the length of the shaded region? 5.03 - 4.93 = .1
P(4.93 < X < 5.03) = .1(1/.12) = .8333
[Figure: rectangle of height 1/.12 from 4.92 to 5.04 with the region from 4.93 to 5.03 shaded]
The time it takes for students to drive
to school is evenly distributed with a
minimum of 5 minutes and a range of 35
minutes.
a) Draw the distribution
Where should the rectangle end? At 5 + 35 = 40
What is the height of the rectangle? 1/35
[Figure: rectangle of height 1/35 from 5 to 40]
b) What is the probability that it takes
less than 20 minutes to drive to
school?
P(X < 20) = (15)(1/35) = .4286
c) What are the mean and standard
deviation of this distribution?
μ = (5 + 40)/2 = 22.5
σ² = (40 - 5)²/12 = 102.083
σ = 10.104
Density Curves
A density curve is similar to a histogram, but there are
several important distinctions.
1. Obviously, a smooth curve is used to represent data
rather than bars. However, a density curve describes
the proportions of the observations that fall in each
range rather than the actual number of observations.
2. The scale should be adjusted so that the total area
under the curve is exactly 1. This represents the
proportion 1 (or 100%).
Density Curves
3. While a histogram represents actual data (i.e., a
sample set), a density curve represents an idealized
sample or population distribution. (describes the
proportion of the observations)
4. Always on or above the horizontal axis
5. We will still utilize mu (μ) for mean and sigma (σ) for
standard deviation.
Density Curves: Mean &
Median
Three points that have been previously made are
especially relevant to density curves.
1. The median is the "equal areas" point. Likewise, the
quartiles can be found by dividing the area under the
curve into 4 equal parts.
2. The mean of the data is the "balancing" point.
3. The mean and median are the same for a symmetric
density curve.
Shapes of Density Curves
• We have mostly discussed right skewed, left
skewed, and roughly symmetric distributions
that look like this:
Bimodal Distributions
We could have a bi-modal distribution. For
instance, think of counting the number of tires
owned by a two-person family. Most two-person
families probably have 1 or 2 vehicles, and
therefore own 4 or 8 tires. Some, however, have
a motorcycle, or maybe more than 2 cars. Yet,
the distribution will most likely have a “hump”
at 4 and at 8, making it “bi-modal.”
Uniform Distributions
We could have a uniform distribution. Consider
the number of cans in all six packs. Each pack
uniformly has 6 cans. Or, think of repeatedly
drawing a card from a complete deck. One-fourth of the cards should be hearts, one-fourth
of the cards should be diamonds, etc.
Other Distributions
Many other distributions exist, and some do not
clearly fall under a certain label. Frequently
these are the most interesting, and we will
discuss many of them.
#1 RULE – ALWAYS MAKE A
PICTURE
It is the only way to see what is really going on!
Normal Distributions
• Symmetrical bell-shaped (unimodal) density curve
• Above the horizontal axis
• N(μ, σ)
• The transition points occur at μ ± σ
• Probability is calculated by finding the area under
the curve (How is this done mathematically?)
• As σ increases, the curve flattens &
spreads out
• As σ decreases, the curve gets
taller and thinner
Normal Curves
• Curves that are symmetric, single-peaked, and
bell-shaped are often called normal curves and
describe normal distributions.
• All normal distributions have the same overall
shape. They may be "taller" or more spread
out, but the idea is the same.
What does it look like?
Normal Curves: μ and σ
• The "control factors" are the mean μ and the
standard deviation σ.
• Changing only μ will move the curve along the
horizontal axis.
• The standard deviation σ controls the spread
of the distribution. Remember that a large σ
implies that the data is spread out.
Finding μ and σ
• You can locate the mean μ by finding the
middle of the distribution. Because it is
symmetric, the mean is at the peak.
• The standard deviation σ can be found by
locating the points where the graph changes
curvature (inflection points). These points are
located a distance σ from the mean.
[Figure: two normal curves, A and B, drawn over the same horizontal axis with 6 marked on it]
Do these two normal curves have the same mean?
If so, what is it?
YES
Which normal curve has a standard deviation of 3?
B
Which normal curve has a standard deviation of 1?
A
The 68-95-99.7 (Empirical) Rule
In a NORMAL DISTRIBUTION with mean μ
and standard deviation σ:
• About 68% of the observations are within σ of the
mean μ.
• About 95% of the observations are within 2σ of the
mean μ.
• About 99.7% of the observations are within 3σ of the
mean μ.
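The three percentages in the rule are rounded areas under the standard normal curve. A minimal Python sketch, using the standard library's `statistics.NormalDist` (Python 3.8 or later), shows where they come from; the helper name `within` is illustrative only.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

def within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return Z.cdf(k) - Z.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} sigma: {within(k):.4f}")
```

The unrounded values are a little above 68% and 95%, which is why the rule is best read as an approximation.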
The 68-95-99.7 Rule
Why Use the Normal
Distribution???
1. They occur frequently in large data sets (all
SAT scores), repeated measurements of the
same quantity, and in biological populations
(lengths of roaches).
2. They are often good approximations to chance
outcomes (like coin flipping).
3. We can apply things we learn in studying
normal distributions to other distributions.
Heights of Young Women
• The distribution of heights of young women
aged 18 to 24 is approximately normally
distributed with mean μ = 64.5 inches and
standard deviation σ = 2.5 inches.
The 68-95-99.7 Rule
Use the previous chart...
• Where do the middle 95% of heights fall?
• What percent of the heights are above 69.5
inches?
• A height of 62 inches is what percentile?
• What percent of the heights are between 62
and 67 inches?
• What percent of heights are less than 57 in.?
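One way to check answers to these questions is to compute the areas directly. The sketch below assumes the N(64.5, 2.5) model stated for young women's heights and uses Python's `statistics.NormalDist`; the 68-95-99.7 rule gives rounded versions of the same numbers (2.5%, 16th percentile, 68%, 0.15%).

```python
from statistics import NormalDist

heights = NormalDist(mu=64.5, sigma=2.5)

# Middle 95%: within 2 standard deviations of the mean (by the 68-95-99.7 rule).
middle_95 = (heights.mean - 2 * heights.stdev, heights.mean + 2 * heights.stdev)

p_above_69_5 = 1 - heights.cdf(69.5)                  # 69.5 is 2 sigma above the mean
percentile_62 = heights.cdf(62)                       # 62 is 1 sigma below the mean
p_62_to_67 = heights.cdf(67) - heights.cdf(62)        # within 1 sigma of the mean
p_below_57 = heights.cdf(57)                          # 57 is 3 sigma below the mean

print("middle 95%:", middle_95)
print(f"above 69.5: {p_above_69_5:.4f}")
print(f"percentile of 62: {percentile_62:.4f}")
print(f"between 62 and 67: {p_62_to_67:.4f}")
print(f"below 57: {p_below_57:.5f}")
```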
Example
• Suppose, on average, it takes you 20 minutes
to drive to school, with a standard deviation of
2 minutes. Suppose a normal model is
appropriate for the distribution of driving
times.
– How often will you arrive at school in less than 20
minutes?
– How often will it take you more than 24 minutes?
Suppose that the height of male
students at BHS is normally
distributed with a mean of 71 inches
and standard deviation of 2.5 inches.
What is the probability that the
height of a randomly selected male
student is more than 73.5 inches?
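73.5 inches is one standard deviation above the mean of 71,
and 68% of heights fall within one standard deviation of the mean.
1 - .68 = .32 falls outside that range, half in each tail.
P(X > 73.5) = .32/2 = 0.16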
Suppose you take the SAT test and the
ACT test. Not using the chart they
provide, can you directly compare your
SAT Math score to your ACT math score?
Why or why not?
We need to standardize these scores so
that we can compare them.
Standard Normal Density
Curves
Always has μ = 0 & σ = 1
To standardize:
z = (x - μ)/σ
Must have this memorized!
Let’s explore . . . So what does the z-score
tell you?
Suppose the mean and standard deviation of a
distribution are μ = 50 & σ = 5.
If the x-value is 55, what is the z-score?
1
If the x-value is 45, what is the z-score?
-1
If the x-value is 60, what is the z-score?
2
What do these z-scores mean?
-2.3: 2.3 σ below the mean
1.8: 1.8 σ above the mean
6.1: 6.1 σ above the mean
-4.3: 4.3 σ below the mean
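The standardizing formula z = (x - μ)/σ and the reading of a z-score as "standard deviations from the mean" can be sketched in a few lines of Python. The function names `z_score` and `describe` are illustrative, not from the notes.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations x lies from the mean."""
    return (x - mu) / sigma

def describe(z):
    """Put a z-score into words."""
    if z == 0:
        return "at the mean"
    side = "above" if z > 0 else "below"
    return f"{abs(z):g} σ {side} the mean"

# The examples with μ = 50 and σ = 5:
for x in (55, 45, 60):
    z = z_score(x, 50, 5)
    print(x, "->", z, "->", describe(z))

for z in (-2.3, 1.8, 6.1, -4.3):
    print(z, "->", describe(z))
```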
Jonathan wants to work at Utopia
Landfill. He must take a test to see
if he is qualified for the job. The
test has a normal distribution with
μ = 45 and σ = 3.6. In order to
qualify for the job, a person can not
score lower than 2.5 standard
deviations (z score) below the mean.
Jonathan scores 35 on this test.
Does he get the job?
No, he scored 2.78 SD below the mean
Sally is taking two different math
achievement tests with different means
and standard deviations. The mean score
on test A was 56 with a standard deviation
of 3.5, while the mean score on test B was
65 with a standard deviation of 2.8. Sally
scored a 62 on test A and a 69 on test B.
On which test did Sally score the best?
She did better on test A.
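Both answers above come from comparing z-scores. A short Python sketch of that arithmetic follows; the helper `z_score` and the variable names are illustrative.

```python
def z_score(x, mu, sigma):
    return (x - mu) / sigma

# Jonathan: qualifies only if his score is no lower than 2.5 SD below the mean.
jonathan_z = z_score(35, 45, 3.6)
jonathan_qualifies = jonathan_z >= -2.5

# Sally: the test with the larger z-score is the one she did relatively better on.
sally_a = z_score(62, 56, 3.5)
sally_b = z_score(69, 65, 2.8)
better_test = "A" if sally_a > sally_b else "B"

print(f"Jonathan z = {jonathan_z:.2f}, qualifies: {jonathan_qualifies}")
print(f"Sally z on A = {sally_a:.2f}, z on B = {sally_b:.2f}, better on test {better_test}")
```

Raw scores (62 versus 69) point the other way, which is why the scores are standardized before comparing.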
Strategies for finding probabilities
or proportions in normal
distributions
1. State the probability
statement
2. Draw a picture
3. Calculate the z-score
4. Look up the probability
(proportion) in the table
The lifetime of a certain type of battery
is normally distributed with a mean of
200 hours and a standard deviation of 15
hours. What proportion of these
batteries can be expected to last less
than 220 hours?
Write the probability statement: P(X < 220)
Draw & shade the curve
Calculate z-score: z = (220 - 200)/15 = 1.33
Look up z-score in table: P(X < 220) = .9082
The lifetime of a certain type of battery
is normally distributed with a mean of
200 hours and a standard deviation of 15
hours. What proportion of these
batteries can be expected to last more
than 220 hours?
z = (220 - 200)/15 = 1.33
P(X > 220) = 1 - .9082 = .0918
The lifetime of a certain type of battery
is normally distributed with a mean of
200 hours and a standard deviation of 15
hours. How long must a battery last to be
in the top 5%?
P(X > ?) = .05
Look up 0.95 in table to find z-score: z = 1.645
1.645 = (x - 200)/15
x = 224.675
[Figure: normal curve with area .95 to the left of z = 1.645 and .05 to the right]
The heights of the female students at
PWSH are normally distributed with a
mean of 65 inches. What is the
standard deviation of this distribution
if 18.5% of the female students are
shorter than 63 inches?
P(X < 63) = .185
What is the z-score for the 63? z = -.9
-.9 = (63 - 65)/σ
σ = -2/-.9 = 2.22
[Figure: normal curve with the area to the left of 63 (z = -0.9) shaded]
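The same "work backward from the z-score" step can be sketched in Python. This version uses `statistics.NormalDist().inv_cdf` in place of the table lookup, so its z-score is not rounded to -.9 and the result differs slightly from the slide's 2.22.

```python
from statistics import NormalDist

mu = 65          # mean height in inches
x = 63           # cutoff height
p_below = 0.185  # proportion of students shorter than x

# z-score with area .185 to its left (a table gives about -.9).
z = NormalDist().inv_cdf(p_below)

# Rearranging z = (x - mu) / sigma gives sigma = (x - mu) / z.
sigma = (x - mu) / z

print(f"z = {z:.4f}, sigma = {sigma:.3f}")
```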
Will my calculator do any
of this normal stuff?
• Normalpdf – use for graphing ONLY
• Normalcdf – will find probability of
area from lower bound to upper
bound
• Invnorm (inverse normal) – will find
z-score for probability
The lifetime of a certain type of battery
is normally distributed with a mean of
200 hours and a standard deviation of 15
hours. What proportion of these
batteries can be expected to last less
than 220 hours?
N(200,15)
P(X < 220) =
Normalcdf(-∞,220,200,15) = .9088 (the table value .9082 uses z rounded to 1.33)
The lifetime of a certain type of battery
is normally distributed with a mean of
200 hours and a standard deviation of 15
hours. What proportion of these
batteries can be expected to last more
than 220 hours?
N(200,15)
P(X>220) =
Normalcdf(220,∞,200,15) = .0912 (the table value .0918 uses z rounded to 1.33)
The lifetime of a certain type of battery
is normally distributed with a mean of
200 hours and a standard deviation of 15
hours. How long must a battery last to be
in the top 5%?
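P(X > ?) = .05
Invnorm(.95,200,15) = 224.673 (the table value z = 1.645 gives 224.675)
[Figure: normal curve with area .95 below the cutoff and .05 above it]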
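For readers without a graphing calculator, Python's standard library offers rough equivalents: `NormalDist.cdf` plays the role of Normalcdf and `NormalDist.inv_cdf` the role of Invnorm. The sketch below reworks the three battery questions; because it does not round z to two decimals, its answers can differ from the table-based ones in the third or fourth decimal place.

```python
from statistics import NormalDist

battery = NormalDist(mu=200, sigma=15)  # N(200, 15)

# Like Normalcdf(-∞, 220, 200, 15): area to the left of 220.
p_less_220 = battery.cdf(220)

# Like Normalcdf(220, ∞, 200, 15): area to the right of 220.
p_more_220 = 1 - battery.cdf(220)

# Like Invnorm(.95, 200, 15): lifetime with 95% of the area below it.
top_5_cutoff = battery.inv_cdf(0.95)

print(f"P(X < 220) = {p_less_220:.4f}")
print(f"P(X > 220) = {p_more_220:.4f}")
print(f"top 5% cutoff = {top_5_cutoff:.3f} hours")
```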
The heights of female teachers at
PWSH are normally distributed with
mean of 65.5 inches and standard
deviation of 2.25 inches. The heights
of male teachers are normally
distributed with mean of 70 inches and
standard deviation of 2.5 inches.
•Describe the distribution of differences
of heights (male – female) teachers.
Normal distribution with
μ = 70 - 65.5 = 4.5 & σ = √(2.5² + 2.25²) = 3.3634
(the variances add, assuming the two heights are independent)
• What is the probability that a
randomly selected male teacher is
shorter than a randomly selected
female teacher?
P(X < 0) =
Normalcdf(-∞,0,4.5,3.3634) = .0905
[Figure: normal curve centered at 4.5 with the area below 0 shaded]
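The difference of two normal variables can be sketched with `statistics.NormalDist`, which supports subtracting one distribution from another. This relies on an assumption the slide leaves implicit: the male and female teachers are selected independently, so the variances add.

```python
from statistics import NormalDist

male = NormalDist(mu=70, sigma=2.5)
female = NormalDist(mu=65.5, sigma=2.25)

# Distribution of (male - female), assuming independent selections:
# means subtract, variances add.
diff = male - female

# The male teacher is shorter exactly when the difference is negative.
p_male_shorter = diff.cdf(0)

print(f"mean = {diff.mean}, sd = {diff.stdev:.4f}")
print(f"P(male shorter than female) = {p_male_shorter:.4f}")
```

A table lookup with z rounded to -1.34 gives .0901; the unrounded area is slightly larger.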
Ways to Assess Normality
• Use graphs (dotplots,
boxplots, or histograms)
• Normal probability
(quantile) plot
Normal Probability (Quantile) plots
• The observation (x) is plotted against known
normal z-scores
• If the points on the quantile plot lie close
to a straight line, then the data are approximately
normally distributed
• Deviations from a straight line on the quantile
plot indicate nonnormal data
• Points far away from the line indicate
outliers
• Vertical stacks of points (repeated
observations of the same number) are called
granularity
Consider a random sample with n = 5.
To find the appropriate z-scores for a
sample of size 5, divide the standard
normal curve into 5 equal-area regions.
Why are these regions not the same width?
Next, find the median z-score for each region.
These would be the z-scores (from the
standard normal curve) that we would
use to plot our data against.
Why is the median not in the “middle” of each region?
-1.28  -.524  0  .524  1.28
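The median z-score of each equal-area region can be computed rather than read from a table. In the sketch below, each of the n regions holds area 1/n, so the median of region i has cumulative area (i - 0.5)/n to its left; `median_z_scores` is an illustrative name, and `statistics.NormalDist().inv_cdf` stands in for the table lookup.

```python
from statistics import NormalDist

def median_z_scores(n):
    """Median z-score of each of n equal-area regions under the standard normal curve."""
    Z = NormalDist()
    return [Z.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

scores = median_z_scores(5)
print([round(z, 3) for z in scores])
```

For n = 5 the cumulative areas are .1, .3, .5, .7 and .9, which reproduces the five z-scores on the slide.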
Normal Scores
Let’s construct a normal probability plot.
Suppose we have the following observations of widths of contact
windows in integrated circuit chips:
3.21 2.49 2.94 4.38 4.02
3.62 3.30 2.85 3.34 3.81
The values of the normal scores depend on the sample size n. The normal
scores when n = 10 are below:
-1.539 -1.001 -0.656 -0.376 -0.123
0.123 0.376 0.656 1.001 1.539
Sketch a scatterplot by pairing the smallest normal score with the
smallest observation from the data set & so on.
What should happen if our data set is normally distributed?
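The pairing step can be sketched in Python: sort the observations, then match them in order with the n = 10 normal scores given above. The correlation at the end is an extra check that is not on the slides; a value near 1 corresponds to a plot that is close to a straight line.

```python
import math

widths = [3.21, 2.49, 2.94, 4.38, 4.02, 3.62, 3.30, 2.85, 3.34, 3.81]
normal_scores = [-1.539, -1.001, -0.656, -0.376, -0.123,
                 0.123, 0.376, 0.656, 1.001, 1.539]

# Smallest normal score goes with the smallest observation, and so on.
pairs = list(zip(normal_scores, sorted(widths)))
for score, width in pairs:
    print(f"{score:7.3f}  {width:.2f}")

def correlation(xs, ys):
    """Pearson correlation, computed from its definition."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

r = correlation(normal_scores, sorted(widths))
print(f"r = {r:.3f}")
```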
Widths of Contact Windows
Notice that the boxplot
is approximately
symmetrical and that
the normal probability
plot is approximately
linear.
Notice that the boxplot is
approximately
symmetrical except for
the outlier and that
the normal probability
plot shows the outlier.
Notice that the boxplot
is skewed left and
that the normal
probability plot shows
this skewness.
Are these approximately normally
distributed?
50 48 54 47 51 52 46 53
52 51 48 48 54 55 57 45
53 50 47 49 50 56 53 52
What is this called?
Both the histogram & boxplot
are approximately
symmetrical, so these data
are approximately normal.
The normal probability
plot is approximately
linear, so these data are
approximately normal.