Normal Distribution Lecture Notes

Professor Richard Blecksmith
richard@math.niu.edu
Dept. of Mathematical Sciences
Northern Illinois University
Math 101 Website: http://math.niu.edu/~richard/Math101
Section 2 Website: http://math.niu.edu/~richard/Math101/fall06
1. Normal Distribution Curve
[Figure: normal curve with the horizontal axis marked at µ − 2σ, µ − σ, µ, µ + σ, µ + 2σ. Areas from left to right: 2.5%, 13.5%, 34%, 34%, 13.5%, 2.5%.]
In a normal distribution
• Fact 1. Center = mean = median
• Fact 2. The data lies equally distributed on each side of the center.
– 50% of the data lies to the left of µ and
– 50% of the data lies to the right of µ.
2. The 68 – 95 – 99 Rule
• Fact 3.
– 68% of the data lies within 1 standard deviation of the mean
– 95% of the data lies within 2 standard deviations of the mean
– 99.7% (roughly 99%) of the data lies within 3 standard deviations of the mean
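The three percentages in Fact 3 can be checked numerically. The following is a minimal sketch, assuming Python and the standard identity that the fraction of a normal distribution within k standard deviations of the mean is erf(k/√2); the function name within_k_sd is illustrative and not from the text.

```python
import math

def within_k_sd(k: float) -> float:
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

# Compare against the 68%, 95%, and 99% figures quoted in the rule.
for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {within_k_sd(k):.1%}")
```

The values quoted in the rule are rounded versions of these fractions.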
3. Standardizing Data
Suppose the data are normally distributed, with mean µ and standard deviation σ.
If x is a data point, we wish to know:
• how many standard deviations is x to the right (or left) of the center?
• That is, write x = µ + z · σ and solve for z:
µ + z · σ = x
z · σ = x − µ
z = (x − µ)/σ
4. The z–Rule
Original Data Value      Standardized Data Value
x                        z = (x − µ)/σ
• A negative value of z represents a data point to the left of the center
• A positive value of z represents a data point to the right of center
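The z–Rule translates directly into a one-line function. This is a small sketch, assuming Python; the name z_score is illustrative, and the sample inputs are borrowed from the battery example in these notes (µ = 370, σ = 30).

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations that x lies from the mean mu."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma

# Negative results lie to the left of the center, positive results to the right.
print(z_score(340, 370, 30))
print(z_score(325, 370, 30))
print(z_score(400, 370, 30))
```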
5. Example from Text (page 51)
The lifetimes of 20,000 flashlight batteries are normally distributed, with a
mean of µ = 370 days and a standard deviation of σ = 30 days.
1. What percentage of the batteries are expected to last more than 340
days?
Solution: z = (x − µ)/σ = (340 − 370)/30 = −1.00
• Look up z = 1 in the chart.
• (The negative means that this value occurs one standard deviation to
the left of the center µ.)
• The corresponding P value is 34.1%.
6. Draw the picture
[Figure: normal curve with 34.1% of the area between µ − 1.00σ and µ, and 50% of the area to the right of µ.]
The answer is 34.1 + 50 = 84.1%.
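The chart lookup can be cross-checked without a table. The following is a minimal sketch, assuming Python 3.8 or later and its standard-library statistics.NormalDist class; the variable names are illustrative and not from the text.

```python
from statistics import NormalDist

# Battery lifetimes from the example: mean 370 days, standard deviation 30 days.
lifetimes = NormalDist(mu=370, sigma=30)

# P(lifetime > 340) is one minus the area to the left of 340.
p_more_than_340 = 1 - lifetimes.cdf(340)
print(f"{p_more_than_340:.1%}")
```

The result should agree with the 84.1% obtained from the chart, up to rounding.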
7. Question 2
2. How many batteries can be expected to last less than 325 days?
Solution: Work with percentages.
• z = (x − µ)/σ = (325 − 370)/30 = −1.50
• Look up z = 1.5 in the chart.
• The corresponding P value is 43.3%.
8. Draw the picture
[Figure: normal curve with 43.3% of the area between µ − 1.50σ and µ.]
• Fifty percent of the data lies to the left of the center.
• Since 43.3% lies between µ − 1.50σ and the center µ,
• the percentage to the left of µ − 1.50σ is 50.0 − 43.3 = 6.7%
The final answer is: 6.7 percent of 20,000 = 0.067 × 20,000 = 1340 batteries.
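The same count can be estimated directly from the normal distribution. This is a sketch under the assumption that Python's standard-library statistics.NormalDist (Python 3.8 or later) is available; the variable names are illustrative.

```python
from statistics import NormalDist

lifetimes = NormalDist(mu=370, sigma=30)
total_batteries = 20_000

# Area to the left of 325 days, i.e. to the left of z = -1.50.
p_less_than_325 = lifetimes.cdf(325)
expected_count = p_less_than_325 * total_batteries

print(f"{p_less_than_325:.2%} of batteries, about {expected_count:.0f} in total")
```

Because the chart value is rounded to 6.7%, the unrounded computation gives a count slightly below 1340 (roughly 1336). Both answers are consistent at the precision of the chart.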
9. SAT Example
• In 2001 a total of 1,276,320 college-bound students took the SAT exam.
• The mean and standard deviation of the test scores were µ = 506 and
σ = 111.
• 68% of the students fall within 1 standard deviation of the mean,
• that is in the range µ−σ = 506−111 = 395 to µ+σ = 506+111 = 617.
• 95% of the students fall within 2 standard deviations of the mean, that
is in the range µ − 2σ = 506 − 222 = 284 to µ + 2σ = 506 + 222 = 728.
• Where is the cutoff between the first and second quartiles?
10. SAT Example Cont’d
• We want P = 25%.
• The (3-digit) chart shows that the z-value corresponding to P = .25 is z = .675.
• This means that 25% of the data lies more than .675 standard deviations to the left of µ.
• Another 25% lies between µ − .675σ and µ itself.
• So the first quartile occurs at Q1 = µ − .675σ = 506 − (.675)(111) ≈ 431.
• It turns out the actual Q1 was exactly 430.
• The third quartile occurs at Q3 = µ + .675σ = 506 + (.675)(111) ≈ 581.
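The quartile cutoffs can also be found by inverting the normal distribution rather than reading the chart backwards. The following is a minimal sketch, assuming Python 3.8 or later and the standard-library statistics.NormalDist class with its inv_cdf method; the variable names are illustrative.

```python
from statistics import NormalDist

# 2001 SAT scores from the example: mean 506, standard deviation 111.
sat = NormalDist(mu=506, sigma=111)

q1 = sat.inv_cdf(0.25)  # score with 25% of the data to its left
q3 = sat.inv_cdf(0.75)  # score with 75% of the data to its left

print(f"Q1 is about {q1:.0f}, Q3 is about {q3:.0f}")
```

The rounded results should match the first and third quartiles computed from z = .675.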
11. Draw the Picture
[Figure: 2001 SAT Scores. Normal curve divided into four regions of 25% each by Q1 = 431 (µ − 0.675σ), µ = 506, and Q3 = 581 (µ + 0.675σ).]
I.4 Sampling Lecture Notes
12. Statistical Thinking
Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write.
– H. G. Wells, author of “War of the Worlds”
Definition: Statistics is the science of collecting, analyzing, and interpreting
data in such a way that the conclusions can be objectively evaluated.
13. Three Phases of Statistics
• Collect the data
• Analyze the data
– order the data
– graphical displays
– numerical calculations (such as mean and standard dev)
• Interpret the results
– use proper statistical techniques to substantiate or refute hypothesized statements
– match data to the appropriate technique
– determine whether the proper assumptions are satisfied
14. Two types of statistics
• Descriptive statistics – summarize and describe a characteristic for
some group
• Inferential statistics – estimate, infer, predict, or conclude something
about a larger group
15. Examples
• Descriptive
– Batting Average
– Yards Per Carry
– Test Scores
• Inferential
– Polls
– Medical Studies
– Market Surveys
16. Two types of data
• Quantitative data – values recorded on a natural numerical scale
• Qualitative data – classified into categories
17. Quantitative Data
• Weight of subjects in medical sample
• Height of buildings in Chicago
• Temperatures per day at Antarctica Weather Station
18. Qualitative Data
• Gender of subjects in medical sample
• Political affiliation of respondents in a poll survey
• Class (fresh, soph, jr, sr) of Math 101 students
19. Vocabulary
• The population is the entire set of objects (people or things) under
consideration.
• A sample is a subset of the population that is available for the analysis.
• A bias is a favoring of certain outcomes over others.
• A census collects data from each member of the population.
• A statistic is a statement of numerical information about a sample.
• A parameter is a statement of numerical information about a population.
20. Census versus Sample
Would you use a census or a sample to determine the following:
• Project the winner of an election
• Calculate a baseball player’s batting average
• Predict whether it will rain tomorrow
• Test whether the soup is too salty
• Calculate Shaq’s free throw average
• Use a market study to determine a new flavor of toothpaste
• Report the Dow Jones Average
• Generalize a medical study to other groups
• The average score on the first test
21. Dealing with bias
Bias in some form occurs in the collecting of most, if not all, sets of data.
The bias may come from
• the portion of the population surveyed
• the phrasing of the questions
22. Examples
• “Dewey defeats Truman” projection of Chicago Tribune based on 1948
telephone poll
• “Are you in favor of Illinois banning cell phones in cars? Dial *91 on
your cellular phone to vote.”
• “Do you feel budget cuts are more important than humanitarian programs that would need to be cut to obtain a balanced budget?”
23. Methods for Choosing Samples
• Judgement Sample
– Use the opinion of person(s) deemed qualified to choose members
of the sample.
– Example: to investigate study habits of athletes, ask their coaches
and teachers.
• Simple Random Selection
– Use random numbers to select the sample.
– Page 315 Random Digit Table:
72985547555515086461
• Stratified Sampling
– Divide the population into relatively homogeneous groups, draw a
sample from each group, and take their union.
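Simple random selection and stratified sampling can both be sketched in a few lines. The example below is a rough illustration, assuming Python's standard random module in place of a printed random digit table. The population of 200 numbered students, the class labels used as strata, and the function names are all hypothetical.

```python
import random
from collections import defaultdict

def simple_random_sample(population, k, rng):
    """Choose k distinct members, each subset of size k being equally likely."""
    return rng.sample(population, k)

def stratified_sample(population, stratum_of, per_stratum, rng):
    """Group members by stratum, sample within each group, and take the union."""
    groups = defaultdict(list)
    for member in population:
        groups[stratum_of(member)].append(member)
    sample = []
    for stratum in sorted(groups):
        sample.extend(rng.sample(groups[stratum], per_stratum))
    return sample

rng = random.Random(101)  # fixed seed so the run is repeatable
classes = ["fresh", "soph", "jr", "sr"]
students = [(i, classes[i % 4]) for i in range(200)]  # hypothetical population

srs = simple_random_sample(students, 20, rng)
strat = stratified_sample(students, lambda s: s[1], 5, rng)
print(len(srs), len(strat))
```

The stratified sample is guaranteed to contain five students from each class, while the simple random sample may over- or under-represent a class by chance.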
24. Goals of a good sample
• from the correct population
• chosen in an unbiased way
• large enough to reflect total population
25. Normal Distribution of Random Events
Toss a coin 100 times and count the number of heads.
How many heads would you expect?
• about 50
• exactly 50
It does not seem reasonable that the count will be exactly 50.
We would not be surprised if the number of heads turned out to be 48 or
51 or even 55.
We would be surprised to see 80 heads, and would begin to suspect that the
coin was not fair.
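This intuition can be tested by simulation. The following is a minimal sketch, assuming Python's standard random module as a stand-in for a fair coin; the seed, the function name count_heads, and the choice of 1000 repetitions are illustrative.

```python
import random

def count_heads(n_tosses, rng):
    """Toss a simulated fair coin n_tosses times and return the number of heads."""
    return sum(rng.random() < 0.5 for _ in range(n_tosses))

rng = random.Random(2006)  # fixed seed so the run is repeatable
counts = [count_heads(100, rng) for _ in range(1000)]

average = sum(counts) / len(counts)
print(f"fewest heads: {min(counts)}, average: {average:.2f}, most heads: {max(counts)}")
```

In a typical run the counts cluster around 50, and a count as extreme as 80 essentially never appears.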
26. Coin Toss Data
Experiment: A coin is tossed n = 100 times.
The experiment is repeated 1000 times.
Here are the results:
27. Frequency Table: No. of Heads
Heads  Freq    Heads  Freq    Heads  Freq
    1     0       45    54       58    27
    ⋮     0       46    49       59    19
   34     0       47    54       60    11
   35     2       48    66       61    11
   36     2       49    89       62     5
   37     2       50    70       63     4
   38     2       51    77       64     2
   39     5       52    85       65     0
   40    14       53    62       66     0
   41    16       54    57       67     1
   42    25       55    52       68     0
   43    30       56    40        ⋮     0
   44    31       57    36      100     0
28. Mean and Standard Deviation
mean = 50.296
stand dev = 5.100
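The mean and standard deviation can be recomputed from the frequency table. The sketch below assumes Python and uses the population form of the standard deviation (dividing by the number of experiments, 1000), which is an assumption about how the quoted value was obtained. Only the nonzero frequencies are entered.

```python
import math

# Number of heads -> how many of the 1000 experiments produced that count.
freq = {
    35: 2, 36: 2, 37: 2, 38: 2, 39: 5,
    40: 14, 41: 16, 42: 25, 43: 30, 44: 31,
    45: 54, 46: 49, 47: 54, 48: 66, 49: 89,
    50: 70, 51: 77, 52: 85, 53: 62, 54: 57,
    55: 52, 56: 40, 57: 36, 58: 27, 59: 19,
    60: 11, 61: 11, 62: 5, 63: 4, 64: 2, 67: 1,
}

total = sum(freq.values())
mean = sum(heads * f for heads, f in freq.items()) / total
variance = sum(f * (heads - mean) ** 2 for heads, f in freq.items()) / total
sd = math.sqrt(variance)

print(f"experiments: {total}, mean: {mean:.3f}, standard deviation: {sd:.3f}")
```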
29. Coin Toss Histogram
[Figure: histogram of the number of heads per experiment; horizontal axis marked at 30, 40, 50, 60, 70.]
30. Sampling Distributions
If we could examine all possible samples of size n from a population, then the
frequency distribution of the means of these samples would be approximately normal (the approximation improves as n grows).
• µ = the mean over the entire population
• σ = the standard deviation over the entire population
• x̄ = the mean of the sampling distribution
• σx̄ = the standard deviation of the sampling distribution
31. Two Rules
Rule 1. x̄ = µ
Rule 2. σx̄ = σ/√n
We are assuming in Rule 2 that the size of the entire population is much
larger than the sample size n.
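As a worked illustration, the two rules can be applied to the coin toss experiment from these notes. This is a sketch resting on some assumptions not in the text: each toss is scored 1 for heads and 0 for tails, so a single toss has mean p = 0.5 and standard deviation √(p(1 − p)) = 0.5, and the number of heads in n tosses is n times the sample mean. The function name standard_error is illustrative.

```python
import math

def standard_error(sigma, n):
    """Rule 2: standard deviation of the sampling distribution of the mean."""
    return sigma / math.sqrt(n)

p = 0.5                          # chance of heads for a fair coin
mu = p                           # mean of a single toss scored 0 or 1
sigma = math.sqrt(p * (1 - p))   # standard deviation of a single toss
n = 100                          # tosses per experiment

se = standard_error(sigma, n)
expected_heads = n * mu          # Rule 1, rescaled from a proportion to a count
sd_heads = n * se                # Rule 2, rescaled from a proportion to a count

print(f"predicted heads: {expected_heads:.1f}, predicted standard deviation: {sd_heads:.1f}")
```

These predictions (about 50 heads with a standard deviation of about 5) are close to the observed mean of 50.296 and standard deviation of 5.100 from the 1000 repeated experiments.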