DobbinChapter1 Sec1

CHAPTER 1, Section 1.3 Revised Feb 2, 2012
If the overall pattern of a large number of observations is quite regular, we choose to
describe it by a smooth curve called a density curve.
A density curve has the following properties:
1. It is on or above the horizontal axis.
2. The total area under the curve is 1.0000, or 100.00%.
3. The area under the curve between any two values on the horizontal
axis represents the percent or fraction of all observations that fall in that
range (the probability of occurrence).
4. Because density curves describe continuous distributions, the chance of any
exact value occurring is 0; only an interval has a percent or a probability
of occurring.
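As a rough numerical illustration of properties 2 to 4, the sketch below uses a made-up density curve, f(x) = 2x between 0 and 1 (an assumption, chosen only because its areas are easy to check by hand), and approximates areas with a midpoint sum. The function names are hypothetical.

```python
def density(x):
    """Hypothetical density curve: f(x) = 2x on [0, 1], 0 elsewhere."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0


def area(a, b, steps=10_000):
    """Approximate the area under the density between a and b (midpoint rule)."""
    if a == b:
        return 0.0  # an interval of zero width has zero area
    width = (b - a) / steps
    return sum(density(a + (i + 0.5) * width) for i in range(steps)) * width


total = area(0.0, 1.0)      # property 2: the total area is 1
middle = area(0.25, 0.75)   # property 3: fraction of observations in a range
single = area(0.5, 0.5)     # property 4: an exact value has probability 0

print(round(total, 4), round(middle, 4), single)
```

By hand, the area between 0.25 and 0.75 is 0.75² – 0.25² = 0.5, so half of the observations fall in that range.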
The median of a density curve is the equal-areas (equal-counts) point: half of the area lies on each side of it.
The mean of a density curve is the balance point.
For a left-skewed density curve, the mean is less than the median.
For a symmetric density curve, the mean equals the median.
For a right-skewed density curve, the mean is greater than the median.
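The same ordering of mean and median shows up in skewed samples. The sketch below uses three small made-up data sets (not from the course) to illustrate the direction of each inequality.

```python
from statistics import mean, median

# Made-up samples, used only to illustrate the direction of the inequality.
right_skewed = [1, 2, 2, 3, 3, 3, 4, 5, 9, 20]   # long tail to the right
left_skewed = [-x for x in right_skewed]          # mirror image: tail to the left
symmetric = [1, 2, 3, 4, 5, 6, 7]

for name, data in [("right-skewed", right_skewed),
                   ("left-skewed", left_skewed),
                   ("symmetric", symmetric)]:
    print(f"{name:>12}: mean = {mean(data):.2f}, median = {median(data):.2f}")
```

The long tail pulls the mean (the balance point) toward it, while the median stays where the counts split in half.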
A density curve is an idealized model for a distribution of data. It is often used to
describe the entire population of interest, and in this context the mean of the
population is designated as µ, and the standard deviation as σ. When we take
actual observations (generally a sample) we distinguish the mean of the sample
observations as x̄ and the standard deviation as s.
Lecture 3, Section 1.3
Page 1
Normal Distributions are a particularly important class of density curves. These
density curves are symmetric, unimodal, and bell-shaped.
They have the following properties:
• They are all symmetric.
• Their mean is equal to their median.
• The standard deviation σ controls the spread of a normal curve. We can
actually locate σ by eye on a normal curve: the inflection points of the curve
lie directly above µ – σ and µ + σ on the horizontal scale, so σ is the distance
from the center to either of those points.
• Changing the mean, µ, without changing the standard deviation, σ, shifts the
normal curve along the horizontal axis without changing the spread.
• Changing σ without changing µ changes only the spread of the normal
distribution.
• The Normal density curve can be fully described by giving its mean, µ, and
standard deviation, σ. The values µ and σ are parameters of the curve.
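A minimal sketch of these properties, using Python's standard-library NormalDist with illustrative values (mean 5, standard deviation 0.2). The helper name curvature is hypothetical; it estimates whether the curve bends downward or upward at a point, which is how the inflection points one standard deviation from the mean can be checked numerically.

```python
from statistics import NormalDist

mu, sigma = 5.0, 0.2            # example values, chosen for illustration
curve = NormalDist(mu, sigma)


def curvature(x, h=1e-4):
    """Central second difference of the density: < 0 concave down, > 0 concave up."""
    return (curve.pdf(x + h) - 2 * curve.pdf(x) + curve.pdf(x - h)) / h**2


inside = curvature(mu + 0.9 * sigma)    # just inside one standard deviation
outside = curvature(mu + 1.1 * sigma)   # just outside one standard deviation
print(inside < 0, outside > 0)          # the bend changes sign in between

# Shifting the mean moves the curve without changing its shape;
# doubling the standard deviation halves the peak height (wider, flatter curve).
shifted = NormalDist(mu + 3, sigma)
wider = NormalDist(mu, 2 * sigma)
print(curve.pdf(mu), shifted.pdf(mu + 3), wider.pdf(mu))
```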
The standard notation is N(µ, σ). Given that X has a normal distribution with mean
µ = 5 and standard deviation σ = 0.2, we write it as follows:
X ~ N(5, 0.2)
Common properties of Normal density curves:
The 68-95-99.7 Rule:
In the normal distribution with mean µ and standard deviation σ:
• Approximately 68% of the observations fall within 1σ of the mean µ.
• Approximately 95% of the observations fall within 2σ of µ.
• Approximately 99.7% of the observations fall within 3σ of µ.
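The three percentages can be checked numerically. The sketch below uses Python's standard-library NormalDist with a made-up mean of 10 and standard deviation of 2; any other choice gives the same three areas.

```python
from statistics import NormalDist

mu, sigma = 10, 2                      # made-up values; the result does not depend on them
dist = NormalDist(mu, sigma)

within = {}
for k in (1, 2, 3):
    # Area under the curve within k standard deviations of the mean.
    within[k] = dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)
    print(f"within {k} standard deviation(s): {within[k]:.4f}")
```

The areas come out to about 0.6827, 0.9545, and 0.9973, which the rule rounds to 68%, 95%, and 99.7%.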
Example: Checking account balances, X, are approximately normal with a mean
of 1325 and a standard deviation of 25.
1. What is the notation for this distribution?
X ~ N(1325, 25)
2. Between what numbers do 68% of the balances fall?
1300 and 1350 (within 1σ of the mean)
3. Above what number do 2.5% of the balances lie?
1375 (2σ above the mean; 95% lie within 2σ, leaving 2.5% in each tail)
4. Approximately what percent of balances are between 1250 and 1400?
99.7% (within 3σ of the mean)
What if you need different probabilities for X ~ N(µ, σ)?
1. We use the Standard Normal Distribution, a normal distribution with
mean µ = 0 and standard deviation σ = 1, written as N(0,1).
2. We can use the fact that all normal distributions are the same if we express
the location of any point on the horizontal scale in terms of the center, µ, plus or
minus a certain number of units of σ.
3. We can convert any normal distribution to a Standard Normal Distribution by
the formula listed below. If we do this we can use the Standard Normal Table
(Table A in the front cover of your book) for any variable which can be described
as a Normal Distribution.
You convert X ~ N(µ, σ) to Z ~ N(0,1). Convert/standardize using:
z = (x – µ) / σ
A standardized value is often called a z-score. The z-score effectively describes
how many standard deviations any x is from the X distribution mean, and in what
direction.
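Standardizing is a one-line calculation. A minimal sketch (the function name z_score is hypothetical), applied to the N(5, 0.2) distribution used as the notation example earlier:

```python
def z_score(x, mu, sigma):
    """Standardize x: how many standard deviations x lies from the mean mu."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma


# Using the N(5, 0.2) distribution mentioned earlier in these notes:
print(z_score(5.3, 5, 0.2))   # positive: x is above the mean
print(z_score(4.9, 5, 0.2))   # negative: x is below the mean
```

The sign gives the direction (above or below the mean) and the size gives the distance in standard deviations.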
Z-scores are what you need in order to use the Standard Normal Table (Table A in
the front cover of your book). In the table:
• Z-scores run down the left-most column of the table. The 2nd decimal place
of the z-score runs across the top-most row of the table.
• The inner numbers are the probability that you are at or lower than your z-score.
• The first page of Table A has negative z-scores; the second page has positive
z-scores.
• P(Z = z-score) = 0. Only intervals have probabilities.
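If the book is not at hand, a table lookup can be imitated in software. The sketch below assumes the usual table layout (z rounded to 2 decimal places, probabilities to 4); the function name table_a is hypothetical and this is not the book's table itself.

```python
from statistics import NormalDist


def table_a(z):
    """Mimic a Table A lookup: P(Z <= z), with z rounded to 2 decimals
    and the probability rounded to 4 decimals (the layout assumed here)."""
    z = round(z, 2)
    return round(NormalDist(0, 1).cdf(z), 4)


print(table_a(-2.20))   # row -2.2, column .00
print(table_a(1.96))    # row 1.9, column .06
```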
To find a probability if you have X ~ N(µ, σ) and a sample score, x, to
work with:
1. Convert x to a z-score: z = (x – µ) / σ.
2. Rearrange the inequality (if necessary) so that it uses < or ≤. This uses:
P(Z > z-score) = 1 – P(Z < z-score).
3. Look up the probability for your z-score in Table A.
4. If the z-score is between 2 table points, use the closest value.
5. P(a < Z < b) = P(Z < b) – P(Z < a).
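The procedure above can be sketched as three small helpers. The function names are hypothetical, and the standard-library NormalDist stands in for Table A, so no rounding to the closest table value is needed. The N(100, 15) numbers in the usage lines are made up for illustration.

```python
from statistics import NormalDist

Z = NormalDist(0, 1)   # the standard normal distribution


def prob_less(x, mu, sigma):
    """P(X < x): convert x to a z-score, then take the area to its left."""
    z = (x - mu) / sigma
    return Z.cdf(z)


def prob_greater(x, mu, sigma):
    """P(X > x) = 1 - P(X < x)."""
    return 1 - prob_less(x, mu, sigma)


def prob_between(a, b, mu, sigma):
    """P(a < X < b) = P(X < b) - P(X < a)."""
    return prob_less(b, mu, sigma) - prob_less(a, mu, sigma)


# Made-up illustration with X ~ N(100, 15):
print(round(prob_less(85, 100, 15), 4))          # z = -1
print(round(prob_greater(130, 100, 15), 4))      # z = 2
print(round(prob_between(85, 115, 100, 15), 4))  # z from -1 to 1
```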
If you are given the probability and know X ~ N(µ, σ), but don’t know the
sample’s score, you will need to work the problem backwards. Find the appropriate
z-score and convert it to x with x = µ + zσ. See part f in the example below.
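Working backwards can be sketched the same way. Here NormalDist.inv_cdf plays the role of reading Table A from the inside out; the function name and the N(100, 15) numbers are made up for illustration.

```python
from statistics import NormalDist


def x_for_lower_area(p, mu, sigma):
    """Return x such that P(X < x) = p, using x = mu + z * sigma."""
    z = NormalDist(0, 1).inv_cdf(p)   # z-score with area p to its left
    return mu + z * sigma


# Made-up illustration: the value that cuts off the lowest 97.5% of N(100, 15).
print(round(x_for_lower_area(0.975, 100, 15), 1))
```

For an upper-tail question such as "the top 10%", pass the area to the left of the cutoff (0.90), since the table and inv_cdf both work with areas to the left.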
Examples:
1. Checking account balances are ~ N(1325, 25).
a. Bill has a balance of $1270. What is Bill’s standardized balance (his z-score)?
b. What is the probability an account will have less money than Bill’s?
c. What is the probability an account balance will be more than $1380?
d. What is the probability an account balance will be exactly $1380?
e. What is the probability that an account will have between $1310 and $1390?
f. What account balance would be the beginning of the top 10% of all
balances?
2. The Beanstalks Club is a social club for tall people. To join the Beanstalks,
women must be at least 70” tall and men must be at least 74” tall. The National
Health Survey reports that:
Height of adult women in the U.S. ~ N(63.6, 2.5) and
Height of adult men in the U.S. ~ N(69, 2.8)
a. What fraction of the adult female population of the U.S. could qualify for
membership in the Beanstalks?
b. What fraction of the adult male population of the U.S. could qualify for
membership in the Beanstalks?
c. If the Beanstalks Club wanted to be more exclusive for males, above what
height would only 1% of males qualify?
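A sketch for checking answers to this problem. Instead of converting to z-scores by hand, it lets NormalDist standardize internally; the variable names are hypothetical and the results are exact areas rather than Table A readings.

```python
from statistics import NormalDist

women = NormalDist(63.6, 2.5)   # heights in inches
men = NormalDist(69, 2.8)

frac_women = 1 - women.cdf(70)          # a. fraction of women at least 70" tall
frac_men = 1 - men.cdf(74)              # b. fraction of men at least 74" tall
top_1_percent_men = men.inv_cdf(0.99)   # c. height with 99% of men below it

print(f"a. {frac_women:.4f}")
print(f"b. {frac_men:.4f}")
print(f"c. {top_1_percent_men:.1f} inches")
```

Both fractions are upper-tail areas, so each is 1 minus the area to the left of the height requirement.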
3. A physical fitness association is including the mile run in its secondary
school fitness test for boys. The time for this event for boys in secondary
school is ~ N(450, 40), in seconds.
a. What fraction of the secondary school boys would run the mile in less than
7 minutes (less than 420 seconds)?
b. What fraction of the secondary school boys run the mile in 7 to 8 minutes
(420 to 480 seconds)?
c. If the association wants to designate the fastest 10% as “excellent”, what
time should the association set for this criterion?
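A sketch for checking this problem. The point to notice in part c is that the fastest runners have the lowest times, so the "excellent" cutoff is the 10th percentile of the time distribution, not the 90th. Variable names are hypothetical.

```python
from statistics import NormalDist

times = NormalDist(450, 40)   # mile times in seconds

under_7_min = times.cdf(420)                        # a. P(X < 420)
from_7_to_8_min = times.cdf(480) - times.cdf(420)   # b. P(420 < X < 480)
excellent_cutoff = times.inv_cdf(0.10)              # c. fastest 10% = lowest 10% of times

print(f"a. {under_7_min:.4f}")
print(f"b. {from_7_to_8_min:.4f}")
print(f"c. {excellent_cutoff:.1f} seconds or less")
```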
How do you determine if your data are normally distributed?
Two methods:
1. Graph the data with a histogram or a stemplot and determine if the data are
unimodal, symmetric, and approximately normally distributed (use the 68-95-99.7
rule to check).
2. Generate a normal quantile plot.
One makes a normal quantile plot using the following steps:
• Arrange the data, each value of x, from smallest to largest.
• Record what percentile of the data each value represents.
• Convert that percentile to an expected z-score.
• Convert the z-score to an expected value.
• Plot each observed x value against its expected value.
Example: Bob’s last 20 golf scores,
69 73 77 77 80 76 75 77 78 78 77 81 82 75 79 76 83 77 80 84
Put the data in ascending order:
69 73 75 75 76 76 77 77 77 77 77 78 78 79 80 80 81 82 83 84
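The normal quantile plot steps can be sketched for Bob's scores as follows. Two choices here are assumptions, not taken from these notes: the percentile for the i-th smallest of n values is taken as (i – 0.5)/n (software packages use slightly different conventions), and the expected value is computed from the sample mean and sample standard deviation.

```python
from statistics import NormalDist, mean, stdev

scores = [69, 73, 77, 77, 80, 76, 75, 77, 78, 78,
          77, 81, 82, 75, 79, 76, 83, 77, 80, 84]

ordered = sorted(scores)                       # step 1: smallest to largest
n = len(ordered)
x_bar, s = mean(ordered), stdev(ordered)       # sample mean and standard deviation

rows = []
for i, x in enumerate(ordered, start=1):
    percentile = (i - 0.5) / n                 # step 2 (assumed convention)
    z = NormalDist(0, 1).inv_cdf(percentile)   # step 3: expected z-score
    expected = x_bar + z * s                   # step 4: expected value
    rows.append((x, percentile, z, expected))

# Step 5: the (observed, expected) pairs below are the points to plot.
for x, percentile, z, expected in rows:
    print(f"{x:>3}  {percentile:5.3f}  {z:6.2f}  {expected:6.1f}")
```

If the scores are roughly normal, the observed and expected columns should rise together at about the same rate.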
If the points on a normal quantile plot lie close to a straight line, the plot indicates
that the data are normal. Systematic deviations from a straight line indicate a
non-normal distribution. Outliers appear as points that are far away from the overall
pattern of the plot. A Q-Q plot shows observed and expected values, as above. A
P-P plot shows observed and expected cumulative probabilities.
For SPSS: Enter your variable data. Then select Analyze > Descriptive
Statistics > Q-Q Plots, move the variable name to the Variables box, and select OK.