Ch 2 Notes - msmatthewsschs

Chapter 2 Notes
Describing Location in a Distribution
2.1 – Measures of Relative Standing and Density Curves
Where do I stand compared to others?
NOTATION:
             Mean              Standard deviation
Sample       $\bar{x}$         $s_x$ or $s$
Population   $\mu$ (“mu”)      $\sigma$ (“sigma”)
Note: Never use $\sigma_x$ on the calc.
Sally wants to go to college. She recently took a
really hard AP Stats test and got her score back.
She went to several of her friends to see their
scores.
Chapter 1 Test Results
90 72 79 85 85 84 76 86 89
69 60 80 54 48 49 72 82 90
Stemplot:
4 | 89
5 | 4
6 | 09
7 | 2269
8 | 0245569
9 | 00
Standardizing Values and Z-Scores
We are going to see how a test score compares with
the rest of the data: we measure its distance from
the center of the data (the mean) in units of the
variability of the data (the standard deviation).
$z = \frac{x - \text{mean}}{\text{standard deviation}}$
This tells us how many standard deviations
the test score is away from the mean.
Standardizing Values and Z-Scores
• Standardized values have no units.
• z-scores measure the distance of each
data value from the mean in standard
deviations.
• A negative z-score tells us that the data
value is below the mean, while a positive
z-score tells us that the data value is above
the mean.
Benefits of Standardizing
• Standardized values have been converted
from their original units to the standard
statistical unit of standard deviations from
the mean.
• Thus, we can compare values that are
measured on different scales, with different
units, or from different populations.
Back to z-scores
• Standardizing data into z-scores shifts the
data by subtracting the mean and rescales
the values by dividing by their standard
deviation.
– Standardizing into z-scores does not change
the shape of the distribution.
– Standardizing into z-scores changes the center
by making the mean 0.
– Standardizing into z-scores changes the
spread by making the standard deviation 1.
To standardize:
Convert the mean to 0.
Convert the standard deviation to 1.
When Is a z-score BIG?
• A z-score gives us an indication of how
unusual a value is because it tells us how
far it is from the mean.
• A data value that sits right at the mean has
a z-score equal to 0.
• A z-score of 1 means the data value is 1
standard deviation above the mean.
• A z-score of –1 means the data value is 1
standard deviation below the mean.
Example #1
Find Sally’s z-score and comment on how she did on
the test. (Sally scored 80.)
90 72 79 85 85 84 76 86 89
69 60 80 54 48 49 72 82 90
$z = \frac{x - \text{mean}}{\text{standard deviation}} = \frac{80 - 75}{13.86} = \frac{5}{13.86} \approx 0.3607$
She is barely (only 0.36 standard deviations)
above the average test score. She is slightly
above average.
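As a cross-check on Example #1, here is a minimal Python sketch that recomputes the mean, the sample standard deviation, and Sally's z-score from the 18 scores. The helper name `z_score` is made up for illustration; `stdev` from the standard library divides by n - 1, which matches Sx on the calculator.

```python
from statistics import mean, stdev

# Chapter 1 test scores from the notes (18 students).
scores = [90, 72, 79, 85, 85, 84, 76, 86, 89,
          69, 60, 80, 54, 48, 49, 72, 82, 90]

def z_score(x, center, spread):
    """Number of standard deviations that x lies from the center."""
    return (x - center) / spread

x_bar = mean(scores)             # sample mean
s = stdev(scores)                # sample standard deviation (n - 1 divisor)
sally_z = z_score(80, x_bar, s)  # 80 is the score used in the worked formula

print(f"mean = {x_bar:.2f}, s = {s:.2f}, z = {sally_z:.4f}")
```

Because the code keeps s unrounded, its last digit can differ slightly from a hand calculation that uses s = 13.86.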
Example #2
The distribution of the duration of human pregnancies (i.e. the
number of days between conception and birth) has been found to be
approximately normal with mean μ = 266 and standard deviation σ = 16.
a. What is the Z-Score for a human pregnancy of 266 days?
$Z = \frac{x - \mu}{\sigma} = \frac{266 - 266}{16} = \frac{0}{16} = 0$
That’s the mean!
b. What is the Z-Score for a human pregnancy of 250 days?
$Z = \frac{x - \mu}{\sigma} = \frac{250 - 266}{16} = \frac{-16}{16} = -1$
One standard deviation below the mean
Example #3
Adult female Dalmatians weigh an average of 50 pounds with a
standard deviation of 3.3 pounds. Adult female Boxers weigh an
average of 57.5 pounds with a standard deviation of 1.7 pounds.
Mike owns an underweight Dalmatian and an underweight
Boxer. The Dalmatian weighs 45 pounds and the Boxer weighs
52 pounds. Which dog is more underweight? Explain.
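Dalmatian: $Z = \frac{x - \mu}{\sigma} = \frac{45 - 50}{3.3} = \frac{-5}{3.3} \approx -1.515$
Boxer: $Z = \frac{x - \mu}{\sigma} = \frac{52 - 57.5}{1.7} = \frac{-5.5}{1.7} \approx -3.235$
The Boxer’s weight is VERY low compared with other Boxers, so the Boxer is more underweight.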
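The comparison in Example #3 can be sketched in a few lines of Python. The function and variable names here are illustrative only; the numbers are the ones given in the example.

```python
def z_score(x, mu, sigma):
    """Standardize x using the given mean and standard deviation."""
    return (x - mu) / sigma

# Weights, means, and standard deviations from Example #3 (pounds).
dalmatian_z = z_score(45, 50, 3.3)
boxer_z = z_score(52, 57.5, 1.7)

# The dog with the more negative z-score is further below its breed's mean.
more_underweight = "Boxer" if boxer_z < dalmatian_z else "Dalmatian"

print(f"Dalmatian z = {dalmatian_z:.3f}, Boxer z = {boxer_z:.3f}")
print("More underweight:", more_underweight)
```

Comparing z-scores rather than raw pounds is what lets us put two breeds with different means and spreads on the same scale.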
Hmm…so how unusual is -3.235?
Percentile: Percent of the observations at or
below a value
Chebyshev’s Inequality
In any distribution, the percent of
observations falling within k standard
deviations of the mean is at least
$100\left(1 - \frac{1}{k^2}\right)\%$
[Figure: distribution marked at -sd, mean, and sd]
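To get a feel for the bound, here is a small Python sketch that tabulates it for k = 1 to 4. The function name `chebyshev_min_percent` is made up for this illustration.

```python
def chebyshev_min_percent(k):
    """Minimum percent of observations within k standard deviations
    of the mean, for any distribution (Chebyshev's inequality)."""
    if k <= 0:
        raise ValueError("k must be positive")
    # For k < 1 the formula goes negative, so the bound is just 0%.
    return max(0.0, 100 * (1 - 1 / k**2))

bounds = {k: chebyshev_min_percent(k) for k in (1, 2, 3, 4)}

for k, pct in bounds.items():
    print(f"within {k} sd: at least {pct:.2f}%")
```

Note that these are guaranteed minimums for any shape of distribution, so real data sets usually have more than this within each interval.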
Example #4
Use Chebyshev’s inequality to find the minimum percent of observations within:
a. One standard deviation of the mean
$100\left(1 - \frac{1}{1^2}\right) = 100(1 - 1) = 100(0) = 0\%$
b. Two standard deviations of the mean
$100\left(1 - \frac{1}{2^2}\right) = 100(1 - 0.25) = 100(0.75) = 75\%$
c. Three standard deviations of the mean
$100\left(1 - \frac{1}{3^2}\right) = 100(1 - 0.111) = 100(0.889) = 88.9\%$
d. Four standard deviations of the mean
$100\left(1 - \frac{1}{4^2}\right) = 100(1 - 0.0625) = 100(0.9375) = 93.75\%$
Density Curve:
• Mathematical model that describes a set of data.
• Always on or above the x-axis
• Total area under curve = 1
Example #5
Using the following uniform density curve, answer the question:
a. Verify that this is a density curve.
A = bh
A = (8)(0.125)
A=1
b. What is the probability that the random
variable has a value less than 3?
A = bh
A = (3)(0.125)
A = 0.375
c. What is the probability that the random
variable has a value between 3 and 5?
A = bh
A = (2)(0.125)
A = 0.25
d. What is the percentile for the variable that has
a value of 6?
A = bh
A = (6)(0.125)
A = 0.75
e. What value for the variable is in the 25th percentile?
A = bh
0.25 = (b)(0.125)
2=b
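All five parts of Example #5 are rectangle areas, so they are easy to check in Python. This sketch assumes the curve runs from 0 to 8 with height 0.125 (the picture is not reproduced here; the base of 8 and height of 0.125 come from part a, and the starting point of 0 is an assumption). The function names are illustrative.

```python
# Assumed uniform density curve: height 0.125 on the interval [0, 8].
LOW, HIGH, HEIGHT = 0.0, 8.0, 0.125

def area(a, b):
    """Area under the uniform density between a and b (A = base * height)."""
    a, b = max(a, LOW), min(b, HIGH)   # no area outside [LOW, HIGH]
    return max(0.0, b - a) * HEIGHT

def value_at_percentile(p):
    """Value with proportion p of the area to its left."""
    return LOW + p / HEIGHT

print(area(0, 8))                  # part a: total area
print(area(0, 3))                  # part b: P(X < 3)
print(area(3, 5))                  # part c: P(3 < X < 5)
print(area(0, 6))                  # part d: percentile of 6, as a proportion
print(value_at_percentile(0.25))   # part e: 25th percentile
```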
In a density curve:
•The mean, median, and quartiles can be located by eye.
•The mean, μ, is the balance point of the curve, if it
were made of solid material.
•The median, M, divides the area under the curve
in half.
•The quartiles with the median divide the curve
into quarters.
The mean and the median are the same when
the distribution is symmetrical. The median is a
measure of center that is resistant to skew and
outliers. The mean is not: it is pulled toward the long tail.
[Figure: three density curves, each with the mean and median marked]
Example #6
A group of 78 third-grade students in a Midwestern elementary
school took a “self-concept” test that measured how well they felt
about themselves. Higher scores indicate more positive
self-concepts. A histogram of these students’ self-concept scores is
given below. Draw an appropriate density curve for summarizing
the histogram on the graph. How would you describe the
shape of this density curve?
[Figure: histogram of SelfConcept scores (axis from 15 to 85) against Frequency (axis from 0 to 20)]
Answer: The density curve is skewed left.
Example #7
Label A, B, and C as either the mean, median, or mode
for each picture.
Picture 1: A = mode, B = median, C = mean
Picture 2: A = mean, median, and mode; B = n/a; C = n/a
Picture 3: A = mean, B = median, C = mode
Example #8
For the density curve below, which of the following is true?
a. The mean and median are equal.
b. The mean is greater than the median.
c. The mean is less than the median.
d. The mean could be either greater than or less than the median.
e. The mean is 0.5.
2.2 – Normal Distributions
Think of how stupid the average person is, and
realize half of them are stupider than that.
Normal Curves:
Density curve that is symmetric, unimodal, and
bell-shaped
It is used to compare information from
different populations or to find the percentile
of a certain value.
N(μ, σ), where μ = mean and σ = standard
deviation of the population.
The mean locates the center of the curve, and the
standard deviation measures the spread around the mean.
The NORMAL (or BELL-SHAPED) DISTRIBUTION
describes many different data sets, like:
• Scores on a midterm exam
• Weights of people
• Calorie consumption per day
• Lengths of pregnancies
• Heights of people
• IQ
• The # of M&M’s in a 1 lb bag
The 68-95-99.7 Rule (Empirical Rule):
In the normal distribution with mean μ and standard
deviation σ:
• 68% of the observations fall within σ of the
mean μ, or μ ± σ
• 95% of the observations fall within 2σ of μ, or
μ ± 2σ
• 99.7% of the observations fall within 3σ of μ,
or μ ± 3σ
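The three percentages in the rule are rounded areas under the standard Normal curve. As a quick check, this Python sketch uses `statistics.NormalDist` from the standard library; the helper name `within` is made up for illustration.

```python
from statistics import NormalDist

Z = NormalDist()  # standard Normal: mean 0, standard deviation 1

def within(k):
    """Proportion of a Normal distribution within k standard deviations
    of the mean."""
    return Z.cdf(k) - Z.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} sd: {100 * within(k):.2f}%")
```

The unrounded areas are slightly different from 68, 95, and 99.7, which is why the rule is an approximation to remember, not an exact table.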
Example #5
Find the area between each of the deviations.
Why are these not percentiles?
So, how do you know if it is normal?
Assessing Normality: Looks can be deceiving!
Method #1: Make a histogram
1. Find the mean and sd
2. Measure the intervals for 1, 2, and 3 sd
3. Count how many observations fall
between these standard deviations
4. Compare to the 68-95-99.7 Rule
Remember the Dalmatians?
Example #9: Here is a sample of 25 Dalmatians from a
local vet’s office. Determine if their weights follow a
normal distribution.
53 48 52 51 52 46 45 47 48 52 51 55 51
48 46 47 54 51 51 46 55 56 50 44 49
Mean = 49.92 lbs
S.D. = 3.34 lbs
Interval (lbs)     Count
39.90 to 43.24     0
43.24 to 46.58     5
46.58 to 49.92     6
49.92 to 53.26     10
53.26 to 56.60     4
56.60 to 59.94     0
Within 1 sd of the mean: 16 of 25 = 64%
Within 2 sd of the mean: 25 of 25 = 100%
Gosh, tough call!
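The four steps of Method #1 can be sketched in Python using the 25 Dalmatian weights. This is only an illustration of the counting; the function name `fraction_within` is made up, and the final judgment against 68-95-99.7 is still yours to make.

```python
from statistics import mean, stdev

# The 25 Dalmatian weights from Example #9 (pounds).
weights = [53, 48, 52, 51, 52, 46, 45, 47, 48, 52, 51, 55, 51,
           48, 46, 47, 54, 51, 51, 46, 55, 56, 50, 44, 49]

# Step 1: find the mean and sd.
x_bar = mean(weights)
s = stdev(weights)

# Steps 2 and 3: count the observations within k sd of the mean.
def fraction_within(k):
    lo, hi = x_bar - k * s, x_bar + k * s
    return sum(lo <= w <= hi for w in weights) / len(weights)

observed = {k: fraction_within(k) for k in (1, 2, 3)}

# Step 4: compare to the 68-95-99.7 Rule.
for k, target in zip((1, 2, 3), (68, 95, 99.7)):
    print(f"within {k} sd: {100 * observed[k]:.0f}% (rule says {target}%)")
```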
Method #2: Normal Probability Plot
1. Plug data into a list
2. Make a Normal Probability Plot
3. If the data are approximately normal, then the plot
will be close to a straight line
[Figure: Normal probability plots (axes: data and z-score) for four shapes: Normal, Left Skew, Right Skew, Bimodal]
Calculator Tip: Normal Probability Plot
Statplot – Normal Probability Plot
Back to the dogs…
Example #10: Make a normal probability plot
of the weights of dalmatians.
Example #11
The plot shown at the right is a Normal probability plot for a
set of data. The data value is plotted on the x axis, and the
standardized value is plotted on the y axis. Which statement is
true for these data?
(a) The data are clearly Normally distributed.
(b) The data are approximately Normally distributed.
(c) The data are clearly skewed to the left.
(d) The data are clearly skewed to the right.
(e) There is insufficient information to determine the shape of
the distribution.
Day 2 of 2.2!
The Standard Normal Curve:
The requirement is that the data must be
normally distributed. We will use TABLE A. (In
the text, you have TABLE A in the front 2 pages
of the book.)
Example #12
Let’s use TABLE A to find the following. Draw the picture first,
shade the region you want, and look up the z-score in Table A to find the
proportion (probability, or percentage) to the left of that z-score. The
proportion is also the probability that the value of a
particular member of the population will fall in the given interval.
a. P(Z < -2.20) = 0.0139
[Figure: standard Normal curve (μ = 0, σ = 1), shaded to the left of Z = -2.20]
b. P(Z > 2.20) = 1 – P(Z < 2.20) = 1 – 0.9861 = 0.0139
[Figure: standard Normal curve (μ = 0, σ = 1), shaded to the right of Z = 2.20]
c. P(Z > -0.95) = 1 – P(Z < -0.95) = 1 – 0.1711 = 0.8289
[Figure: standard Normal curve (μ = 0, σ = 1), shaded to the right of Z = -0.95]
d. P(Z < 1.25) = 0.8944
[Figure: standard Normal curve (μ = 0, σ = 1), shaded to the left of Z = 1.25]
e. P(-1.04 < Z < 3.01) = P(Z < 3.01) – P(Z < -1.04)
= 0.9987 – 0.1492
= 0.8495
[Figure: standard Normal curve (μ = 0, σ = 1), shaded between Z = -1.04 and Z = 3.01]
f. P(0.15 < Z < 1.41) = P(Z < 1.41) – P(Z < 0.15)
= 0.9207 – 0.5596
= 0.3611
[Figure: standard Normal curve (μ = 0, σ = 1), shaded between Z = 0.15 and Z = 1.41]
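The Table A lookups in Example #12 can be reproduced in Python. In this sketch, `NormalDist().cdf(z)` from the standard library plays the role of Table A (area to the left of z); the variable names are made up for illustration.

```python
from statistics import NormalDist

Z = NormalDist()  # standard Normal: mean 0, standard deviation 1

p_a = Z.cdf(-2.20)                  # a. P(Z < -2.20)
p_b = 1 - Z.cdf(2.20)               # b. P(Z > 2.20)
p_c = 1 - Z.cdf(-0.95)              # c. P(Z > -0.95)
p_d = Z.cdf(1.25)                   # d. P(Z < 1.25)
p_e = Z.cdf(3.01) - Z.cdf(-1.04)    # e. P(-1.04 < Z < 3.01)
p_f = Z.cdf(1.41) - Z.cdf(0.15)     # f. P(0.15 < Z < 1.41)

for label, p in zip("abcdef", (p_a, p_b, p_c, p_d, p_e, p_f)):
    print(f"{label}. {p:.4f}")
```

Every "greater than" or "between" question is still built from areas to the left, exactly as with the table.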
Calculator Tip: Catalog Help
Apps – CtlgHelp – Enter – Enter
Calculator Tip: Z-score Probabilities
2nd Dist – normalcdf(lowerbound, upperbound, mean, sd)
Example #13
Suppose that the fuel efficiency (in miles per gallon) of a Beetle
varies with each tank of gas according to a normal distribution
with mean μ = 34 and standard deviation σ = 3.5 miles per gallon.
a. What proportion of all tanks would get 29 miles per gallon or
less?
$Z = \frac{x - \mu}{\sigma} = \frac{29 - 34}{3.5} = \frac{-5}{3.5} \approx -1.4286$
P(Z < -1.43) = 0.0764
[Figure: Normal curve (μ = 34, σ = 3.5) shaded to the left of x = 29, matched to the standard Normal curve (μ = 0, σ = 1) shaded to the left of Z = -1.43]
b. What proportion of all tanks would get 40 miles per gallon or
more?
$Z = \frac{x - \mu}{\sigma} = \frac{40 - 34}{3.5} = \frac{6}{3.5} \approx 1.7143$
P(Z > 1.71) = 1 – P(Z < 1.71) = 1 – 0.9564 = 0.0436
[Figure: Normal curve (μ = 34, σ = 3.5) shaded to the right of x = 40, matched to the standard Normal curve (μ = 0, σ = 1) shaded to the right of Z = 1.71]
c. What proportion of all tanks would get between 27 and 42 miles
per gallon?
$Z = \frac{x - \mu}{\sigma} = \frac{27 - 34}{3.5} = \frac{-7}{3.5} = -2$
$Z = \frac{x - \mu}{\sigma} = \frac{42 - 34}{3.5} = \frac{8}{3.5} \approx 2.2857$
P(-2 < Z < 2.29) = P(Z < 2.29) – P(Z < -2)
= 0.9890 – 0.0228
= 0.9662
[Figure: Normal curve (μ = 34, σ = 3.5) shaded between x = 27 and x = 42, matched to the standard Normal curve (μ = 0, σ = 1) shaded between Z = -2 and Z = 2.29]
d. What proportion of all tanks would get 47 miles per gallon or
more? Less than 47 miles per gallon?
$Z = \frac{x - \mu}{\sigma} = \frac{47 - 34}{3.5} = \frac{13}{3.5} \approx 3.7143$
47 or more: P(Z > 3.71) ≈ 0
Less than 47: P(Z < 3.71) ≈ 1
[Figure: standard Normal curve (μ = 0, σ = 1) with Z = 3.71 far out in the right tail]
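Example #13 can also be done without standardizing by hand, much like normalcdf on the calculator. This Python sketch uses `statistics.NormalDist` with the mean and standard deviation from the example; the variable names are made up. Since it does not round Z to two decimal places first, its answers can differ from the Table A answers in the third or fourth decimal place.

```python
from statistics import NormalDist

# Fuel efficiency model from Example #13: Normal with mean 34, sd 3.5 mpg.
mpg = NormalDist(mu=34, sigma=3.5)

p_29_or_less = mpg.cdf(29)                 # part a
p_40_or_more = 1 - mpg.cdf(40)             # part b
p_27_to_42 = mpg.cdf(42) - mpg.cdf(27)     # part c
p_47_or_more = 1 - mpg.cdf(47)             # part d
p_less_than_47 = mpg.cdf(47)               # part d

print(f"a. {p_29_or_less:.4f}")
print(f"b. {p_40_or_more:.4f}")
print(f"c. {p_27_to_42:.4f}")
print(f"d. {p_47_or_more:.4f} and {p_less_than_47:.4f}")
```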
Example #14: Golf courses have a wide range of difficulty.
Similarly, players differ in ability. In order to adjust for
variations between players, they are often assigned a handicap
score. To adjust for variations between courses, a handicapper
decides to compare the golfer’s score against the data from the
course. Suppose course A plays at a mean score of 76 with a
standard deviation of 8 strokes with a normal distribution of
scores. The mean score for course B is 80 with a standard
deviation of 6 strokes and the scores are normally distributed.
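If a golfer regularly shoots an 80 on course A, what should be
the comparable score on course B?
Course A: μ = 76, σ = 8, x = 80
Course B: μ = 80, σ = 6, x = ?
Course A: $Z = \frac{x - \mu}{\sigma} = \frac{80 - 76}{8} = \frac{4}{8} = 0.5$
Course B: $Z = \frac{x - \mu}{\sigma}$, so $0.5 = \frac{x - 80}{6}$
3 = x – 80
83 = x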
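The two-step idea in Example #14 (standardize on one scale, then un-standardize on the other) can be sketched as a small Python function. The name `equivalent_score` and its parameter names are made up for illustration.

```python
def equivalent_score(x, mu_from, sigma_from, mu_to, sigma_to):
    """Score on the second scale with the same z-score as x on the first."""
    z = (x - mu_from) / sigma_from   # standardize on the first scale
    return mu_to + z * sigma_to      # un-standardize on the second scale

# Course A: mean 76, sd 8. Course B: mean 80, sd 6.
score_b = equivalent_score(80, 76, 8, 80, 6)
print(score_b)
```

The same function works for any pair of scales, as long as it makes sense to say the two scores should sit the same number of standard deviations from their means.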
What if I know the percentile, but not X?
Inverse Normal Probability Calculations: Un-standardizing
Example #15: What value(s) of Z cut off the region described?
a. The lowest 11%
Z = -1.23
[Figure: standard Normal curve (μ = 0, σ = 1) with area 0.11 shaded to the left of Z]
b. The highest 30%
Z = 0.52
[Figure: standard Normal curve (μ = 0, σ = 1) with area 0.30 shaded to the right of Z]
c. The highest 7%
Z = 1.48
[Figure: standard Normal curve (μ = 0, σ = 1) with area 0.07 shaded to the right of Z]
d. The middle 50% (from the 25th percentile to the 75th percentile)
Z = -0.67 and Z = 0.67
[Figure: standard Normal curve (μ = 0, σ = 1) with the middle 50% shaded between the two Z values]
Want Z?
Calculator Tip: Z-score given probabilities
2nd Dist – invNorm(%)
Want X?
Calculator Tip: X-value given probabilities
2nd Dist – invNorm(%, μ, σ)
Steps for the Inverse Probability Calculation:
• Draw a picture
• Identify the z-value from the given value of the proportion –
look up the proportion in the MIDDLE of Table A.
• Solve for x:
$Z = \frac{x - \mu}{\sigma}$, so $x = Z\sigma + \mu$
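The inverse lookups from Example #15 can be checked in Python. In this sketch, `NormalDist().inv_cdf(p)` from the standard library plays the role of invNorm (it takes the area to the left), and `unstandardize` is a made-up helper name for the last step.

```python
from statistics import NormalDist

Z = NormalDist()  # standard Normal: mean 0, standard deviation 1

# inv_cdf takes the area to the LEFT, so "highest 30%" means 0.70 to the left.
lowest_11 = Z.inv_cdf(0.11)
highest_30 = Z.inv_cdf(1 - 0.30)
highest_7 = Z.inv_cdf(1 - 0.07)
middle_50 = (Z.inv_cdf(0.25), Z.inv_cdf(0.75))

def unstandardize(z, mu, sigma):
    """Turn a z-score back into an x value: x = z * sigma + mu."""
    return z * sigma + mu

print(f"{lowest_11:.2f} {highest_30:.2f} {highest_7:.2f}")
print(f"{middle_50[0]:.2f} and {middle_50[1]:.2f}")
```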
Example # 16
A British company called Molebegon removes unwanted moles
from gardens. In 1995, the European Union announced that the
tiny moles are just too difficult to catch. They will not attempt to
catch the smallest 10%. Molebegon’s past records indicate that
weights of moles are normally distributed with a mean of 150
grams and a standard deviation of 32 grams. What is the cut off
weight for the moles they will catch?
$x = Z\sigma + \mu$, with μ = 150, σ = 32, and Z = -1.28 (area 0.10 to the left)
x = (-1.28)(32) + 150
x = -40.96 + 150
x = 109.04 grams
Example #17
ACT versus SAT, I There are two major tests of readiness for college,
the ACT and the SAT. ACT scores are reported on a scale from 1 to 36.
The distribution of ACT scores in recent years has been roughly
Normal with mean = 20.9 and standard deviation = 4.8. SAT scores
(prior to 2005) were reported on a scale from 400 to 1600. SAT scores
have been roughly Normal with mean = 1026 and standard deviation
= 209. The following exercises are based on this information.
a. Jose scores 1287 on the SAT. Assuming that
both tests measure the same thing, what score
on the ACT is equivalent to Jose's SAT score?
Explain.
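SAT: μ = 1026, σ = 209, x = 1287
ACT: μ = 20.9, σ = 4.8, x = ?
SAT: $Z = \frac{x - \mu}{\sigma} = \frac{1287 - 1026}{209} = \frac{261}{209} \approx 1.25$
ACT: $Z = \frac{x - \mu}{\sigma}$, so $1.25 = \frac{x - 20.9}{4.8}$
5.99 = x – 20.9 (using the unrounded Z = 261/209)
26.89 = x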
b. Reports on a student's ACT or SAT usually give the
percentile as well as the actual score. Tonya scores 1318
on the SAT. What is her percentile? Show your method.
$Z = \frac{x - \mu}{\sigma} = \frac{1318 - 1026}{209} = \frac{292}{209} \approx 1.40$
P(Z < 1.40) = 0.9192
The 91st percentile
[Figure: Normal curve (μ = 1026, σ = 209) shaded to the left of x = 1318]
c. The quartiles of any distribution are the values with
cumulative proportions 0.25 and 0.75. What are the quartiles of
the distribution of ACT scores? Show your method.
$x = Z\sigma + \mu$, with μ = 20.9 and σ = 4.8
First quartile (cumulative proportion 0.25, Z = -0.67):
x = (-0.67)(4.8) + 20.9
x = -3.216 + 20.9
x = 17.684
Third quartile (cumulative proportion 0.75, Z = 0.67):
x = (0.67)(4.8) + 20.9
x = 3.216 + 20.9
x = 24.116
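All three parts of Example #17 can be checked with one short Python sketch. It uses `statistics.NormalDist` with the SAT and ACT means and standard deviations from the example; the variable names are made up. Because it keeps Z unrounded, the quartiles come out slightly different from the hand answers that use Z = ±0.67.

```python
from statistics import NormalDist

sat = NormalDist(mu=1026, sigma=209)
act = NormalDist(mu=20.9, sigma=4.8)

# a. Jose's SAT score of 1287, converted to the ACT scale via his z-score.
jose_z = (1287 - sat.mean) / sat.stdev
jose_act = act.mean + jose_z * act.stdev

# b. Tonya's percentile: proportion of SAT scores below 1318.
tonya_percentile = sat.cdf(1318)

# c. Quartiles of the ACT distribution (cumulative proportions 0.25, 0.75).
act_q1 = act.inv_cdf(0.25)
act_q3 = act.inv_cdf(0.75)

print(f"a. equivalent ACT score: {jose_act:.2f}")
print(f"b. proportion below 1318: {tonya_percentile:.4f}")
print(f"c. Q1 = {act_q1:.2f}, Q3 = {act_q3:.2f}")
```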