Introduction to the Practice of Statistics

David S. Moore • George P. McCabe
Introduction to the
Practice of Statistics
Fifth Edition
Chapter 1:
Looking at Data—Distributions
Copyright © 2005 by W. H. Freeman and Company
Modifications and Additions by M. Leigh Lunsford, 2005-2006
Technology Requirements
• Megastat for Excel
• Excel Data Analysis
• Excel Macros
• Data Sets in Excel Format on CD
• TI-83
• See my website for more details: www.mathspace.com/lunsford
What is Statistics?
The Science of Learning from Data
The Collection and Analysis of Data
• Experimental Design (Chapter 3)
• Probability (Chapter 4)
• Descriptive Statistics (Data Exploration) (Chapters 1, 2)
• Inferential Statistics (Chapters 5-8)
Chapter 1 - Looking at Data
1.1 Displaying Distributions with Graphs
1.2 Describing Distributions with Numbers
1.3 Density Curves and Normal Distributions
Section 1.1
Displaying Distributions with Graphs
Data Basics
Variable Types
An Example (p. 5)
Graphs for Categorical Variables
• Bar Graphs
• Pie Charts
Educational Level Example (page 7):
– A Bar Graph by Hand
– A Pie Chart by Hand
Homework: Try to do these in Excel!
Graphs for Quantitative Data
• Stemplots (Stem and Leaf Plots)
– Generally for small data sets
• Histograms
• Time Plots (if applicable)
Let’s look at an example to see what types of questions one may ask
and how these plots help to visualize the answers!
Example 1.7 Page 14
Descriptive and Inferential Stats
1. What percent of the 60 randomly chosen fifth
grade students have an IQ score of at least 120?
2. Based on this data, approximately what percent
of all fifth grade students have an IQ score of at
least 120?
3. What is the average IQ score of the fifth grade
students in this sample?
4. Based on this data, what is the average IQ score
of all fifth grade students (i.e. the population)
from which the sample was drawn?
Inferential? 2 and 4
Descriptive? 1 and 3
Let’s Make a Stemplot!
An Example (Ex. 1.7 p.14)
Data in Table 1.3 p. 14 (and on next slide)
Stem and Leaf Plot for Example
IQ Test Scores for 60 Randomly Chosen
5th Grade Students
Generated Using the Descriptive Statistics Menu on Megastat
Stem and Leaf plot for iq (stem unit = 10, leaf unit = 1)
Frequency   Stem   Leaf
    3          8   129
    4          9   0467
   14         10   01112223568999
   17         11   00022334445677788
   11         12   22344456778
    9         13   013446799
    2         14   25
   60
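The same stemplot can be built in a few lines of code. The sketch below is a minimal Python illustration, not part of the Megastat workflow; it assumes whole-number scores and uses the 60 IQ scores read back from the stemplot above (stem = tens digits, leaf = ones digit). The function names are hypothetical.

```python
from collections import defaultdict

# IQ scores of the 60 fifth graders, read back from the stemplot above
# (stem unit = 10, leaf unit = 1, so stem 12 with leaf 9 is a score of 129).
IQ = [81, 82, 89, 90, 94, 96, 97,
      100, 101, 101, 101, 102, 102, 102, 103, 105, 106, 108, 109, 109, 109,
      110, 110, 110, 112, 112, 113, 113, 114, 114, 114, 115, 116, 117, 117,
      117, 118, 118,
      122, 122, 123, 124, 124, 124, 125, 126, 127, 127, 128,
      130, 131, 133, 134, 134, 136, 137, 139, 139,
      142, 145]


def stemplot(data):
    """Map each stem (tens digits) to its leaves (ones digits), in sorted order."""
    leaves = defaultdict(str)
    for x in sorted(data):
        stem, leaf = divmod(x, 10)
        leaves[stem] += str(leaf)
    return dict(leaves)


def print_stemplot(data):
    """Print frequency, stem and leaves, one line per stem."""
    plot = stemplot(data)
    # Walk every stem in range so an empty stem would still show up as a gap.
    for stem in range(min(plot), max(plot) + 1):
        leaves = plot.get(stem, "")
        print(f"{len(leaves):>3}  {stem:>3} | {leaves}")


print_stemplot(IQ)
```

The printed rows carry the same frequencies as the Megastat output (3, 4, 14, 17, 11, 9, 2). Sorting first is what makes the leaves come out in increasing order on each stem.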
Now Let’s Make a Histogram!
• Use the same data as in Example 1.7 (data in Table 1.3).
• We will start by hand, using class widths of 10 starting at 80.
• Let's try using Megastat (Excel file on disk)!
• Compare the stemplot to the histogram!
Histogram for Example
Frequency distribution for iq (each class includes its lower end and excludes its upper end)
Class         Midpoint  Width  Frequency  Percent  Cumulative frequency  Cumulative percent
 80 to <90       85      10        3        5.0            3                    5.0
 90 to <100      95      10        4        6.7            7                   11.7
100 to <110     105      10       14       23.3           21                   35.0
110 to <120     115      10       17       28.3           38                   63.3
120 to <130     125      10       11       18.3           49                   81.7
130 to <140     135      10        9       15.0           58                   96.7
140 to <150     145      10        2        3.3           60                  100.0
Total                             60      100.0
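The frequency table can be reproduced from the raw scores. This is a minimal Python sketch, assuming classes of width 10 starting at 80 with the lower end included and the upper end excluded (as in the table); `frequency_table` is a hypothetical helper, and the scores are the 60 values read back from the stemplot.

```python
# The 60 IQ scores, read back from the stemplot (stem unit = 10, leaf unit = 1).
IQ = [81, 82, 89, 90, 94, 96, 97,
      100, 101, 101, 101, 102, 102, 102, 103, 105, 106, 108, 109, 109, 109,
      110, 110, 110, 112, 112, 113, 113, 114, 114, 114, 115, 116, 117, 117,
      117, 118, 118,
      122, 122, 123, 124, 124, 124, 125, 126, 127, 127, 128,
      130, 131, 133, 134, 134, 136, 137, 139, 139,
      142, 145]


def frequency_table(data, start, width, n_classes):
    """Count values in classes [start, start + width), [start + width, start + 2*width), ...

    Each row is (lower, upper, midpoint, frequency, percent,
    cumulative frequency, cumulative percent).
    """
    n = len(data)
    rows, cumulative = [], 0
    for k in range(n_classes):
        lower = start + k * width
        upper = lower + width
        # Lower end included, upper end excluded, so no score is counted twice.
        count = sum(1 for x in data if lower <= x < upper)
        cumulative += count
        rows.append((lower, upper, (lower + upper) / 2, count,
                     100 * count / n, cumulative, 100 * cumulative / n))
    return rows


for lower, upper, mid, f, pct, cf, cpct in frequency_table(IQ, start=80, width=10, n_classes=7):
    print(f"{lower:>3} to <{upper:<3} {mid:>5.0f} {f:>4} {pct:6.1f} {cf:>4} {cpct:6.1f}")
```

Changing `start` or `width` is an easy way to see how much the shape of a histogram depends on the choice of classes.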
[Figure: histogram titled "IQ Scores of Randomly Chosen Fifth Grade Students"; horizontal axis IQ Score (80 to 150), vertical axis Percent (0 to 30).]
Compare this histogram to the stem and leaf plot we generated earlier!
Recall Our Earlier Question 1
1. What percent of the 60 randomly chosen
fifth grade students have an IQ score of at
least 120?
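• Numerically?
18.3% + 15.0% + 3.3% = 36.6% (from the rounded percents)
(11 + 9 + 2)/60 = 0.367, or 36.7% (from the counts; the 0.1% gap is rounding error)
• How to Represent Graphically? The grey shaded region (the bars from 120 up) corresponds to this 36.7% of the data.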
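Counting directly from the raw scores sidesteps the rounding in the table. A minimal Python sketch, using the 60 IQ scores read back from the stemplot:

```python
# The 60 IQ scores, read back from the stemplot (stem unit = 10, leaf unit = 1).
IQ = [81, 82, 89, 90, 94, 96, 97,
      100, 101, 101, 101, 102, 102, 102, 103, 105, 106, 108, 109, 109, 109,
      110, 110, 110, 112, 112, 113, 113, 114, 114, 114, 115, 116, 117, 117,
      117, 118, 118,
      122, 122, 123, 124, 124, 124, 125, 126, 127, 127, 128,
      130, 131, 133, 134, 134, 136, 137, 139, 139,
      142, 145]

# "At least 120" means 120 is included.
at_least_120 = [x for x in IQ if x >= 120]
proportion = len(at_least_120) / len(IQ)
print(len(at_least_120), round(100 * proportion, 1))  # 22 36.7
```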
What is Different From the Histogram we Generated In Class?
Descriptors we will be interested in for data and population distributions.
Let’s Look at the Distribution we Just Created:
• Overall Pattern:
Shape (modes, tails (skewness), symmetry)
Center (mean, median)
Spread (range, IQR, standard deviation)
• Deviations:
Outliers
• Overall Pattern:
Shape, Center, Spread?
• Deviations:
Outliers?
Example 1.9, pages 18-19
Data Analysis – An Interesting Example
(Example 1.10, pp. 9-10)
80 Calls
• Overall Pattern:
Shape, Center, Spread?
• Deviations:
Outliers?
Time Plots – For Data Collected Over Time…
Example: Mississippi River Discharge, page 19 (data on page 21)
Example – Dealing with
Seasonal Variation
Extra Slides from Homework
• Problem 1.19
• Problem 1.20
• Problem 1.21
• Problem 1.31
• Problem 1.36
• Problems 1.37-1.38
Problem 1.19, page 30
Problem 1.20, page 31
Problem 1.21, page 31
Problem 1.31, page 36
Problem 1.36, page 38
Problems 1.37 – 1.39
Section 1.2
Describing Distributions with Numbers
Types of Measures
• Measures of Center:
– Mean, Median, Mode
• Measures of Spread:
– Range (Max - Min), Standard Deviation, Quartiles, IQR
Means and Medians
Consider the following sample of test scores
from one of Dr. L.’s recent classes (max
score = 100):
65, 65, 70, 75, 78, 80, 83, 87, 91, 94
What is the Average (or Mean) Test Score?
What is the Median Test Score?
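As a check on the hand (or TI-83) answer, here is a minimal Python sketch that computes the mean and median the same way you would by hand; the helper names are illustrative.

```python
scores = [65, 65, 70, 75, 78, 80, 83, 87, 91, 94]


def mean(xs):
    """Sum of the values divided by how many there are."""
    return sum(xs) / len(xs)


def median(xs):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    s = sorted(xs)
    n, mid = len(s), len(s) // 2
    return s[mid] if n % 2 == 1 else (s[mid - 1] + s[mid]) / 2


print(mean(scores), median(scores))  # 78.8 79.0
```

With n = 10 the median is the average of the 5th and 6th sorted scores, (78 + 80)/2 = 79.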
Consider the following sample of test scores from one of
Dr. L.’s recent classes (max score = 100):
65, 65, 70, 75, 78, 80, 83, 87, 91, 94
• Draw a Stem and Leaf Plot (Shape, Center, Spread?)
• Find the Mean and the Median
• Let’s Use our TI-83 Calculators!
– Enter data into a list via Stat|Edit
– Stat|Calc|1-Var Stats
• What happens to the Mean and Median if the lowest score was 20 instead of 65?
• What happens to the Mean and Median if a low score of 20 is added to the data set (so we would now have 11 data points)?
What can we say about the Mean versus the Median?
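A quick way to explore both "what happens" questions is to recompute the two measures for each version of the data. A minimal sketch using Python's standard `statistics` module:

```python
import statistics

original = [65, 65, 70, 75, 78, 80, 83, 87, 91, 94]
replaced = [20] + original[1:]   # lowest score is 20 instead of 65 (n stays 10)
added = [20] + original          # a score of 20 is added (n becomes 11)

for name, data in [("original", original), ("65 -> 20", replaced), ("20 added", added)]:
    print(f"{name:9} n={len(data):2}  mean={statistics.mean(data):.2f}  "
          f"median={statistics.median(data)}")
```

The single low score pulls the mean from 78.8 down to 74.3 (or about 73.45 with 11 points), while the median stays at 79 (or moves only to 78): the median is resistant to outliers and the mean is not.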
Quartiles: Measures of Position
A Graphical Representation of Position of Data
(It gives us an indication of how the data is spread among its values!)
Using Measures of Position to Get Measures of Spread
And what was the range again?
5-Number Summary, IQR, Box Plot, and Where Outliers Would Be for the Test Score Data:
65, 65, 70, 75, 78, 80, 83, 87, 91, 94
What do we notice about symmetry?
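The 5-number summary and outlier fences can be sketched in Python. This assumes the "median of each half" quartile rule (the overall median is left out of both halves when n is odd), which to my knowledge is also what the TI-83's 1-Var Stats uses; other software, Excel included, may interpolate differently and report slightly different quartiles. The fences use the common 1.5 × IQR rule for suspected outliers.

```python
def median(xs):
    """Middle value of sorted data (mean of the two middle values when n is even)."""
    s = sorted(xs)
    n, mid = len(s), len(s) // 2
    return s[mid] if n % 2 == 1 else (s[mid - 1] + s[mid]) / 2


def five_number_summary(xs):
    """(Min, Q1, Median, Q3, Max); Q1 and Q3 are medians of the lower and upper
    halves, with the overall median left out of both halves when n is odd."""
    s = sorted(xs)
    n = len(s)
    lower_half, upper_half = s[: n // 2], s[(n + 1) // 2 :]
    return s[0], median(lower_half), median(s), median(upper_half), s[-1]


scores = [65, 65, 70, 75, 78, 80, 83, 87, 91, 94]
mn, q1, med, q3, mx = five_number_summary(scores)
iqr = q3 - q1
# 1.5 x IQR rule: values beyond these fences are flagged as suspected outliers.
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in scores if x < low_fence or x > high_fence]
print(f"5-number summary: {mn}, {q1}, {med}, {q3}, {mx}")
print(f"IQR = {iqr}, fences = ({low_fence}, {high_fence}), outliers = {outliers}")
```

Here the summary is 65, 70, 79, 87, 94 with IQR = 17, so the fences sit at 44.5 and 112.5 and no score is flagged.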
Histograms of Flower Lengths
Problem 1.58
Generated via Minitab
[Figure: "Histogram of Flower Length", one panel per variety (panel variable: variety): bihai, red, yellow; horizontal axis length (36 to 51), vertical axis Percent (0 to 48).]
Box Plot and 5-Number Summary
for Flower Length Data
Generated via Box Plot Macro for Excel
[Figure: "Box Plots for Flower Lengths"; vertical axis Lengths (in mm), 30 to 55; horizontal axis Flower Color (Bihai, Red, Yellow).]
                      Bihai    Red     Yellow
Min or Inner Fence    46.34    37.4    34.57
Q1                    46.71    38.07   35.45
Median                47.12    39.16   36.11
Q3                    48.24    41.69   36.82
Max or Inner Fence    50.26    43.09   38.13
Outliers?
Remember this histogram from
the Service Call Length Data on
page 9? How do you expect the
Mean and Median to compare
for this data?
Mean 196.6, Median 103.5
Box Plot for Call Length Data
More on Measures of Spread
• Data Range (Max – Min)
• IQR (third quartile Q3 minus first quartile Q1; the range of the middle 50% of the data)
• Standard Deviation (Variance)
– Measures how far the data deviate from the mean. Hmm, how can we do this?
• Recall the Sample Test Score Data:
65, 65, 70, 75, 78, 80, 83, 87, 91, 94
Recall the Sample Mean (X bar) was 78.8…
Computing Variance and Std. Dev. by Hand and via the TI-83:
Recall the Sample Test Score Data:
65, 65, 70, 75, 78, 80, 83, 87, 91, 94
Recall the Sample Mean (X bar) was 78.8
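[Figure: the test scores plotted along a horizontal x axis running from 65 to 95, with the mean 78.8 marked and two distances from the mean labeled, 4.2 and 13.8.]
What does the number 4.2 measure? How about 13.8?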
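The by-hand recipe (deviations from the mean, square them, add them up, divide by n - 1, take the square root) can be written out step by step. A minimal Python sketch:

```python
import math

scores = [65, 65, 70, 75, 78, 80, 83, 87, 91, 94]
n = len(scores)
xbar = sum(scores) / n                      # sample mean, 78.8
deviations = [x - xbar for x in scores]     # e.g. 83 - 78.8 = 4.2 and 65 - 78.8 = -13.8
ss = sum(d ** 2 for d in deviations)        # sum of squared deviations
variance = ss / (n - 1)                     # sample variance s^2 divides by n - 1
s = math.sqrt(variance)                     # standard deviation s
print(f"s^2 = {variance:.1f}, s = {s:.2f}")  # s^2 = 104.4, s = 10.22
```

The deviations themselves always add up to zero (apart from floating-point noise), which is why they are squared before being summed.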
Effects of Outliers on the Standard Deviation
Consider (again!) the following sample of test scores from one of
Dr. L.’s recent classes (max score = 100):
65, 65, 70, 75, 78, 80, 83, 87, 91, 94
What happens to the standard deviation and the location of the 1st
and 3rd quartiles if the lowest score was 20 instead of 65?
What happens to the standard deviation and the location of the 1st and 3rd quartiles if a low score of 20 is added to the data set (so we would now have 11 data points)?
What can we say about the effect of outliers on the standard
deviation and the quartiles of a data set?
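To compare the two measures of spread, this minimal Python sketch recomputes s and the quartiles for each version of the data. It assumes the same "median of each half" quartile rule used for the 5-number summary; `quartiles` is an illustrative helper.

```python
import statistics


def quartiles(xs):
    """Q1 and Q3 as medians of the lower and upper halves (middle value excluded if n is odd)."""
    s = sorted(xs)
    n = len(s)
    return statistics.median(s[: n // 2]), statistics.median(s[(n + 1) // 2 :])


original = [65, 65, 70, 75, 78, 80, 83, 87, 91, 94]
cases = {
    "original": original,
    "65 -> 20": [20] + original[1:],   # lowest score replaced by 20
    "20 added": [20] + original,       # extra score of 20, n = 11
}

results = {}
for name, data in cases.items():
    q1, q3 = quartiles(data)
    sd = statistics.stdev(data)        # sample standard deviation (divides by n - 1)
    results[name] = (round(sd, 2), q1, q3)
    print(f"{name:9} s = {sd:5.2f}  Q1 = {q1}  Q3 = {q3}")
```

The one outlier roughly doubles s (from about 10.22 to 21.09 or 20.21), while the quartiles do not move at all when 65 becomes 20 and Q1 shifts only from 70 to 65 when 20 is added: like the mean, s is not resistant, but the quartiles are.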
Example 1.18:
Stemplots of Annual Returns for
Stocks (a) and Treasury bills (b)
On page 53 of the text. What are the stem and leaf units?
Effects of Linear Transformations on the Mean
And Standard Deviation
Consider (again!) the following sample of test scores from one of Dr. L.’s recent
classes (max score = 100):
65, 65, 70, 75, 78, 80, 83, 87, 91, 94
Xbar=78.8 s=10.2 (rounded)
Suppose we “curve” the grades by adding 5 points to every test score (i.e. Xnew=Xold+5). What will be the new mean and standard deviation?
Suppose we “curve” the grades by multiplying every test score by 1.5 (i.e. Xnew=1.5*Xold). What will be the new mean and standard deviation?
Suppose we “curve” the grades by multiplying every test score by 1.5 and adding 5 points (i.e. Xnew=1.5*Xold+5). What will be the new mean and standard deviation?
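You can check your answers by transforming every score and recomputing. A minimal Python sketch for Xnew = a*Xold + b with the three curves above:

```python
import statistics

scores = [65, 65, 70, 75, 78, 80, 83, 87, 91, 94]

summary = {}
for label, a, b in [("x + 5", 1, 5), ("1.5x", 1.5, 0), ("1.5x + 5", 1.5, 5)]:
    new = [a * x + b for x in scores]  # apply the curve to every score
    summary[label] = (statistics.mean(new), statistics.stdev(new))
    print(f"{label:9} mean = {summary[label][0]:6.2f}  s = {summary[label][1]:5.2f}")
```

The results follow the rule new mean = a*(old mean) + b and new s = |a|*(old s): adding 5 shifts the mean to 83.8 but leaves s at about 10.22, while multiplying by 1.5 scales both (mean 118.2, s about 15.33), and adding 5 afterward moves only the mean (123.2).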
Box Plots for Problems 1.62-1.64
Section 1.3
Density Curves and Normal Distributions
Basic Ideas
• One way to think of a density curve is as a smooth
approximation to the irregular bars of a histogram.
• It is an idealization that pictures the overall pattern of the
data but ignores minor irregularities.
• Often we will use density curves to describe the distribution of a single quantitative continuous variable for a population (sometimes our curves will be based on a histogram generated via a sample from the population).
– Heights of American Women
– SAT Scores
• The bell-shaped normal curve will be our focus!
Density Curve
Page 64
Shape?
Center?
Spread?
Sample Size = 105
Page 65
Density Curve
Shape?
Center?
Spread?
Sample Size = 72 guinea pigs
Two Different but
Related Questions!
1. What proportion
(or percent) of seventh
graders from Gary,
Indiana scored below 6?
2. What is the probability
(i.e. how likely is it?)
that a randomly chosen
seventh grader from Gary,
Indiana will have a test
score less than 6?
Example 1.22
Page 66
Sample Size = 947
Relative “area under the curve” VERSUS relative “proportion of data” in histogram bars.
Page 67 of text
The classic “bell shaped”
Density curve.
Shape?
Center?
Spread?
Median separates area under
curve into two equal areas
(i.e. each has area ½)
A “skewed” density curve.
What is the geometric interpretation
of the mean?
The mean as “center of mass” or “balance point” of the density curve
The normal density curve!
Shape? Center? Spread?
Area Under Curve?
Assume Same Scale on
Horizontal and Vertical
(not drawn) Axes.
How does the standard
deviation affect the
shape of the normal
density curve?
How does the magnitude
of the standard deviation
affect a density curve?
The 68-95-99.7 Rule (aka the “Empirical Rule”)
The distribution of heights of young women (X) aged 18 to 24 is approximately normal with mean mu = 64.5 inches and standard deviation sigma = 2.5 inches (i.e. X ~ N(64.5, 2.5)). Let's draw the density curve for X and observe the empirical rule!
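The 68-95-99.7 percentages can be checked numerically. This minimal sketch uses Python's standard `statistics.NormalDist` to find the area within 1, 2 and 3 standard deviations of the mean for X ~ N(64.5, 2.5):

```python
from statistics import NormalDist

heights = NormalDist(mu=64.5, sigma=2.5)

areas = []
for k in (1, 2, 3):
    low = heights.mean - k * heights.stdev
    high = heights.mean + k * heights.stdev
    area = heights.cdf(high) - heights.cdf(low)  # area under the curve between low and high
    areas.append(area)
    print(f"within {k} sd: {low:.1f} to {high:.1f} inches, area = {area:.4f}")
```

The intervals are 62 to 67, 59.5 to 69.5 and 57 to 72 inches, with areas of about 0.6827, 0.9545 and 0.9973, which is where the rounded 68%, 95% and 99.7% come from.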
Example 1.23, page 72
How many standard
deviations from the mean
height is the height of a
woman who is 68 inches?
Who is 58 inches?
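The number of standard deviations from the mean is the z-score, z = (x - mu)/sigma. A minimal Python sketch for the two heights, with mu = 64.5 and sigma = 2.5:

```python
def z_score(x, mu, sigma):
    """How many standard deviations x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma


print(z_score(68, 64.5, 2.5))  # 1.4
print(z_score(58, 64.5, 2.5))  # -2.6
```

So a 68-inch woman is 1.4 standard deviations above the mean height, and a 58-inch woman is 2.6 standard deviations below it.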
The Standard Normal
Distribution
(mu=0 and sigma=1)
Notation:
Z~N(0,1)
Horizontal axis in units of z-score!
Let’s find some proportions
(probabilities) using normal
distributions!
Example 1.25 (page 75)
Example 1.26 (page 76)
(slides follow)
Let’s draw the
distributions by hand
first!
Example 1.25, page 75
TI-83 Calculator Command: Distr|normalcdf
Syntax: normalcdf(left, right, mu, sigma) = area under curve from left to right
mu defaults to 0, sigma defaults to 1
Infinity is 1E99 (use the EE key), Minus Infinity is -1E99
Example 1.26, page 76
On the TI-83: normalcdf(720,820,1026,209)
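Outside the calculator, Python's standard `statistics.NormalDist` can play the same role. The `normalcdf` function below is a hypothetical wrapper written to mirror the TI-83 syntax (it is not a real Python API), and `math.inf` takes the place of 1E99 for unbounded intervals. The second computation repeats the area on the standard normal curve after converting the endpoints to z-scores.

```python
import math
from statistics import NormalDist


def normalcdf(left, right, mu=0.0, sigma=1.0):
    """Area under the N(mu, sigma) curve from left to right (mirrors TI-83 normalcdf)."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(right) - dist.cdf(left)


# Example 1.26: area between 720 and 820 under N(1026, 209).
p = normalcdf(720, 820, 1026, 209)

# Same area via z-scores on the standard normal curve N(0, 1).
z_left, z_right = (720 - 1026) / 209, (820 - 1026) / 209
p_z = normalcdf(z_left, z_right)

# An unbounded interval: area above 820, using math.inf instead of 1E99.
upper_tail = normalcdf(820, math.inf, 1026, 209)

print(round(p, 4), round(z_left, 2), round(z_right, 2), round(p_z, 4), round(upper_tail, 4))
```

Both routes give the same area (about 0.09), which is the point of standardizing: any normal probability can be found from the single N(0, 1) curve.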
Let’s find the same probabilities using z-scores!
The Inverse Problem:
Given a normal density proportion or
probability, find the corresponding z-score!
What is the z-score such that 90% of the data has a z-score
less than that z-score?
(1) Draw picture!
(2) Understand what you are solving for!
(3) Solve approximately! (we will also use the invNorm key
on the next slide)
Now try working Example 1.30 page 79!
(slide follows)
Syntax: invNorm(area, mu, sigma) gives the value of x with the given area to the left of x under the normal curve with mean mu and standard deviation sigma.
TI-83: Use Distr|invNorm
How can we use our
TI-83s to solve this??
invNorm(0.9,505,110)=?
invNorm(0.9)=?
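The inverse problem corresponds to the inverse of the cumulative distribution function, available as `NormalDist.inv_cdf` in Python's standard library. The `inv_norm` wrapper below is hypothetical, named to mirror the TI-83's invNorm syntax; mu and sigma default to 0 and 1 as on the calculator.

```python
from statistics import NormalDist


def inv_norm(area, mu=0.0, sigma=1.0):
    """Value x with the given area to its left under N(mu, sigma) (mirrors TI-83 invNorm)."""
    return NormalDist(mu, sigma).inv_cdf(area)


z90 = inv_norm(0.9)            # z-score with 90% of the standard normal area below it
x90 = inv_norm(0.9, 505, 110)  # same percentile on the N(505, 110) scale
print(round(z90, 4), round(x90, 1))
```

The two answers are linked by "unstandardizing": x = mu + z*sigma, so the 90th percentile of N(505, 110) is 505 + 110 × (about 1.2816), roughly 646.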
Page 79
How can we tell if our data is “approximately normal?”
Box plots and histograms should show essentially symmetric,
unimodal data. Normal Quantile plots are also used!
Histogram and Normal Quantile Plot for Breaking
Strengths (in pounds) of Semiconductor Wires
(Pages 19 and 81 of text)
Histogram and Normal Quantile Plot for Survival Time of
Guinea Pigs (in days) in a Medical Experiment
(Pages 38 (data table), 65 and 82 of text)
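The idea behind a normal quantile plot can be sketched without plotting software: pair each sorted observation with the z-score you would expect at that position in a normal sample, and see whether the pairs fall near a straight line. This minimal Python sketch assumes the plotting position (i - 0.5)/n, one common convention (Minitab, Excel macros or the text may use a slightly different one), and uses the correlation of the pairs as a rough numerical summary of straightness; the helper names are hypothetical and the data are the 60 IQ scores from Section 1.1.

```python
from statistics import NormalDist

# The 60 IQ scores, read back from the stemplot in Section 1.1.
IQ = [81, 82, 89, 90, 94, 96, 97,
      100, 101, 101, 101, 102, 102, 102, 103, 105, 106, 108, 109, 109, 109,
      110, 110, 110, 112, 112, 113, 113, 114, 114, 114, 115, 116, 117, 117,
      117, 118, 118,
      122, 122, 123, 124, 124, 124, 125, 126, 127, 127, 128,
      130, 131, 133, 134, 134, 136, 137, 139, 139,
      142, 145]


def normal_quantile_points(data):
    """Pair the i-th smallest value with the standard normal z at area (i - 0.5)/n."""
    s = sorted(data)
    n = len(s)
    z = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(z, s))


def correlation(points):
    """Correlation r between the z-scores and the data values."""
    n = len(points)
    mz = sum(z for z, _ in points) / n
    mx = sum(x for _, x in points) / n
    szx = sum((z - mz) * (x - mx) for z, x in points)
    szz = sum((z - mz) ** 2 for z, _ in points)
    sxx = sum((x - mx) ** 2 for _, x in points)
    return szx / (szz * sxx) ** 0.5


points = normal_quantile_points(IQ)
r = correlation(points)
print(f"r = {r:.3f}")  # values near 1 go with a nearly straight normal quantile plot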
Using Excel to Generate Plots
• Example Problem 1.30 page 35
– Generate Histogram via Megastat
– Get Numerical Summary of Data via Megastat or Data Analysis Add-in
– Generate Normal Quantile Plot via Macro (plot on next slide)
Normal Quantile plot for
Problem 1.30 page 35
Extra Slides from Homework
• Problem 1.80
• Problem 1.82
• Problem 1.119
• Problem 1.120
• Problem 1.121
• Problem 1.122
• Problem 1.129
• Problem 1.135
Problem 1.80 page 84
Problem 1.83 page 85
Problem 1.119 page 90
Problem 1.120 page 90
Problem 1.121 page 92
Problem 1.122 page 92
Problem 1.129 page 94
Problem 1.135 page 95-96