Math 3307 Lecture Notes
Perkowsky text
Monday format
May’13
Jan. 2015
Chapters 1 - 3
Activities 1
Activities 2
Homework Assignments
10 points each problem or part
Homework 1 – 70 points
Chapter 1
2, 4, 8
Chapter 2
2, 4, 6, 8
Homework 2 – 90 points
Chapter 3
3, 4(b, e, f, h, j), 10, 12, 14
Homework style sheet and rules:
Work on one side only; pdf it and upload it before the deadline on the calendar.
Work that is poorly scanned or illegible will be given a zero.
This includes sideways or upside down scans!
Do NOT crowd the work; leave at least 3” between problems.
Label the answers carefully so the grader can grade efficiently.
1
Chapter 1 Elements of Statistics
Let’s imagine that you have been hired to collect information on the workload and
responsibilities of middle school teachers in the USA.
A.
Where would you start?
Would you try to contact every middle school teacher in the country?
What would you do to get the data?
B.
What types of information would you collect…how would you decide what
is important to know in describing the areas of interest?
2
1.1
Getting started
ACTIVITIES 1 - Definition
Look in the book at the definition. How does it compare to yours?
Statistics:
Descriptive statistics
Definition and examples
Inferential statistics
Definition and examples
3
Descriptive Statistics Problems – by group!
DS1
Which of the following conclusions may be obtained from the following data by
purely descriptive methods and which require generalizations?
A student in my Spring Pre-calculus class took 4 consecutive daily quizzes and got
the following scores: 3, 8, 10, and 12.
a.)
On only 1 day did he get less than 5 right.
b.)
The student’s number correct increased on each successive quiz.
c.)
The student got better at guessing what I was going to ask each day.
d.)
On the last day the student copied his answers from his neighbor.
DS2
Smith and Jones are hairdressers. On a recent day, Smith cut the hair of 4 male
clients and 2 female clients, while Jones cut the hair of 3 male clients and 3 female clients.
a.)
The amount of time it takes Smith and Jones to do a haircut is
approximately the same.
b.)
Smith always cuts hair on more males than females.
c.)
The two always have the same number of clients per day.
d.)
Over a week, Smith averages 6 clients a day.
4
ACTIVITIES 1 DS3
More definitions:
page 3 in the text
Variable
Data and Data Set
Raw data
Population/Sample
Population parameter
Population statistics
Sample statistics
5
Focus on understanding:
A local school district would like to conduct a survey to estimate the percentage of
the registered voters in the district who would support a school bond levy (tax). To
determine the level of support, the school board surveys 1,000 registered voters
from their district. What are:
The population
The sample
The variable(s)
Raw data
Sample statistics
Population parameters
ACTIVITIES – USING THE VOCABULARY
6
Sampling Techniques
pages 4 - 7
Simple random sampling
***Graphing Calculators: Let’s generate a random sample
and talk about how to use it creatively.***
Systematic sampling
Convenience sampling
Cluster sampling
Stratified sampling
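As a rough companion to the calculator demonstration, here is a minimal Python sketch of simple random sampling and systematic sampling. The roster of 20 numbered teachers, the sample size of 5, and the seed are illustrative assumptions, not something taken from the text.

```python
import random

# Hypothetical sampling frame: 20 teachers numbered 1..20 (an assumption for illustration).
population = list(range(1, 21))
sample_size = 5

rng = random.Random(3307)  # fixed seed so the run is repeatable

# Simple random sampling: every subset of size 5 is equally likely.
simple_sample = rng.sample(population, sample_size)

# Systematic sampling: pick a random start, then take every k-th member.
k = len(population) // sample_size   # sampling interval
start = rng.randrange(k)             # random starting index in 0..k-1
systematic_sample = population[start::k]

print("simple random:", sorted(simple_sample))
print("systematic:   ", systematic_sample)
```

Changing the seed gives a different sample each time, which is a good prompt for discussing why two random samples from the same population rarely agree exactly.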
Bias in Data Collection
page 9
IMPORTANT to know about or discover!
Classroom connection
Television stations, radio stations, and newspapers often predict the winners of
important elections long before the votes are counted. They make these predictions
based on polls.
A
What factors might cause a prediction to be inaccurate?
B
Political parties often conduct their own pre-election polls to find out what
voters think about their campaign and their candidates. How might a
political party bias such a poll?
7
1.2
Types of Data
Let’s come up with examples of the following:
Categorical/Qualitative Data
Numerical/Quantitative Data
Nominal Data
page 12
Ordinal Data
Interval Data
Ratio Data
Discrete/Continuous
Chapter 1 summary:
OYO. Note: essay questions on the tests.
Example: In 3 paragraphs, compare and contrast 2 different types of data.
8
Chapter 2 Organizing and Displaying Data
2.1
Displaying Categorical Data
Frequency and Relative Frequency Tables
Pages 21 - 23
Read and review in your group.
ACTIVITIES – The eyes have it!
Dot diagrams:
(line plots – page 33)
These summarize data visually and quickly. Put one dot for each
observation. Note that you don’t need to sort the data to make a dot diagram.
For example:
If I toss a die 6 times and get: 1 4 5 6 1 2
I’d put a horizontal line down and mark off the 6 possible numbers and then
put a dot above each recorded value:
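The same dot diagram can be sketched in text form. This is a minimal Python illustration using the six tosses from the example; the function name `dot_diagram` and the use of `*` for a dot are choices made here, not part of the text.

```python
from collections import Counter

tosses = [1, 4, 5, 6, 1, 2]   # the six die tosses from the example
counts = Counter(tosses)      # how many times each face came up

def dot_diagram(counts, values):
    """Return the rows of a text dot diagram, tallest row first."""
    height = max(counts.values())
    rows = []
    for level in range(height, 0, -1):
        # Put a dot above a value only if it occurred at least `level` times.
        rows.append(" ".join("*" if counts[v] >= level else " " for v in values))
    rows.append(" ".join(str(v) for v in values))  # the horizontal axis
    return rows

for row in dot_diagram(counts, range(1, 7)):
    print(row)
```

Note that the data never had to be sorted: counting each value is enough.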
9
DD Problem 1
2 1 5 0 1 3 2 0 7 1 3 4 2 4 1 2 2 5 1 3 4
3 1 1 0 2 4 1 1 3 2 3 5 2 2 4 4 0 3 1 4 0
These data give the number of delayed takeoffs per week at a small regional
airport with 48 flights per day.
Make a dot diagram and analyze the data completely.
Dot diagrams are also useful with qualitative or categorical data.
ACTIVITIES DD Problem 2
10
Bar Graphs and Circle graphs
Example:
Here is a distribution of information about Americans aged 18 or older:
Marital status    Count (in millions)    Percent
Single             41.8                  22.6
Married           113.3                  61.1
Widowed            13.9                   7.5
Divorced           16.3                   8.8
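The Percent column is a relative frequency: each count divided by the total count. A minimal Python sketch of that arithmetic, using the counts from the table (the variable names are illustrative):

```python
# Counts in millions, copied from the marital status table.
counts_millions = {
    "Single": 41.8,
    "Married": 113.3,
    "Widowed": 13.9,
    "Divorced": 16.3,
}

total = sum(counts_millions.values())

# Relative frequency as a percent, rounded to one decimal place like the table.
percents = {status: round(100 * count / total, 1)
            for status, count in counts_millions.items()}

for status, pct in percents.items():
    print(f"{status:<10}{pct:>6}%")
```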
There are a couple of ways to display this information graphically. One is a
bar chart and another is a pie chart or circle graph.
Pie chart
Bar chart
Why was it important to use the percentages and not the raw counts in both
representations?
See page 24 for a useful summary of which type of representation to use when.
12
2.2
Displaying Quantitative Data
Frequency and Relative Frequency Tables
The Rules
page 26
Classes: upper limits, lower limits, class mark
Class boundaries
Example:
Fifty candidates entering an astronaut training program were given a psychological
profile test measuring bravery. NASA grouped the data to make it more compact.
Note that the scores are grouped into units of the SAME length. Why is this
important?
Would you present this as a pie chart?
A dot diagram?
A bar chart or histogram?
Score in points    # of candidates
 60 - 79            8
 80 - 99           16
100 - 119          18
120 - 139           8
140 - 159           6
What do you think about the extreme values on the results?
13
Stem and Leaf Plots
page 30
An improvement on dot diagrams, stem and leaf plots work well on data with
many different measurements. They are fairly low tech and can be done quickly in a
meeting or on the fly. I find them exceptionally useful in small classes (n < 50) for
a quick grade analysis.
The stems are the 10’s and the leaves are the single digits in each day’s total. It can
be useful to organize the leaves in order, too.
Here is one of my classes, a final:
10 123
09 45779
08 327758
07 459
06 78
BELOW 1111
Turn the page sideways (anti clockwise)…note the resemblance to a dot diagram!
What does this tell you about my class?
Note that in each case, there was somebody pretty close to the next level.
What grade is “BELOW”?
Sometimes if the data is unusually condensed, you might split the stems making
more rows rather than fewer rows.
14
Here are some quiz grades out of 130 points:
112 114 114 116 118 119 120 121 122 123
124 125 125 126 127 127 129
The best data presentation is to show 110 – 114, 115 – 119, 120 – 124, 125 – 129
rather than just 2 stems with LOOOOONG leaf lines:
11 244
11 689
12 01234
12 556779
Note that the stems are now both a hundreds and a tens digit!
Count the data points off the stem and leaf diagram. Where is the median?
The 80th percentile?
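The split-stem display for the quiz grades can be sketched in Python as follows. This is one possible way to do it, assuming each stem is split into a lower half (leaves 0 to 4) and an upper half (leaves 5 to 9); the function name is illustrative.

```python
from collections import defaultdict

# Quiz grades out of 130 points, from the example.
grades = [112, 114, 114, 116, 118, 119, 120, 121, 122, 123,
          124, 125, 125, 126, 127, 127, 129]

def split_stem_and_leaf(data):
    """Build a stem and leaf plot with every stem split into two rows."""
    rows = defaultdict(list)
    for value in sorted(data):
        stem, leaf = divmod(value, 10)            # 127 -> stem 12, leaf 7
        rows[(stem, leaf // 5)].append(str(leaf))  # half 0: leaves 0-4, half 1: leaves 5-9
    return [f"{stem} {''.join(leaves)}" for (stem, _), leaves in sorted(rows.items())]

for line in split_stem_and_leaf(grades):
    print(line)
```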
15
SL Problem 1
A hotel has 85 rooms. In February of last year they had the following rental
statistics:
75 79 37 57 60 64 35 73 62 81 43 72 78 54 69 75 78 49 59 80 58 76
52 49 42 62 81 77
Produce a stem and leaf plot of this data.
16
ACTIVITIES - SL Problem 2
17
SL Problem 3
Decide which representation you’d like to use with this data to show the age of the
presidents at inauguration. Dot diagram or stem and leaf. Why did you pick what
you did? Produce the display on the page provided at the end of the data.
Presidents
Find information about U.S. presidents, including party affiliation, term in office,
age at inauguration, age at death, and more.
No.  Name (party)[1]         Term       State of birth  Born          Died        Religion[2]              Age at inaug.  Age at death
1.   Washington (F)[3]       1789–1797  Va.             2/22/1732     12/14/1799  Episcopalian             57             67
2.   J. Adams (F)            1797–1801  Mass.           10/30/1735    7/4/1826    Unitarian                61             90
3.   Jefferson (DR)          1801–1809  Va.             4/13/1743     7/4/1826    Deist                    57             83
4.   Madison (DR)            1809–1817  Va.             3/16/1751     6/28/1836   Episcopalian             57             85
5.   Monroe (DR)             1817–1825  Va.             4/28/1758     7/4/1831    Episcopalian             58             73
6.   J. Q. Adams (DR)        1825–1829  Mass.           7/11/1767     2/23/1848   Unitarian                57             80
7.   Jackson (D)             1829–1837  S.C.            3/15/1767     6/8/1845    Presbyterian             61             78
8.   Van Buren (D)           1837–1841  N.Y.            12/5/1782     7/24/1862   Reformed Dutch           54             79
9.   W. H. Harrison (W)[4]   1841       Va.             2/9/1773      4/4/1841    Episcopalian             68             68
10.  Tyler (W)               1841–1845  Va.             3/29/1790     1/18/1862   Episcopalian             51             71
11.  Polk (D)                1845–1849  N.C.            11/2/1795     6/15/1849   Methodist                49             53
12.  Taylor (W)[4]           1849–1850  Va.             11/24/1784    7/9/1850    Episcopalian             64             65
13.  Fillmore (W)            1850–1853  N.Y.            1/7/1800      3/8/1874    Unitarian                50             74
14.  Pierce (D)              1853–1857  N.H.            11/23/1804    10/8/1869   Episcopalian             48             64
15.  Buchanan (D)            1857–1861  Pa.             4/23/1791     6/1/1868    Presbyterian             65             77
16.  Lincoln (R)[5]          1861–1865  Ky.             2/12/1809     4/15/1865   Liberal                  52             56
17.  A. Johnson (U)[6]       1865–1869  N.C.            12/29/1808    7/31/1875   [7]                      56             66
18.  Grant (R)               1869–1877  Ohio            4/27/1822     7/23/1885   Methodist                46             63
19.  Hayes (R)               1877–1881  Ohio            10/4/1822     1/17/1893   Methodist                54             70
20.  Garfield (R)[5]         1881       Ohio            11/19/1831    9/19/1881   Disciples of Christ      49             49
21.  Arthur (R)              1881–1885  Vt.             10/5/1829     11/18/1886  Episcopalian             50             56
22.  Cleveland (D)           1885–1889  N.J.            3/18/1837     6/24/1908   Presbyterian             47             71
23.  B. Harrison (R)         1889–1893  Ohio            8/20/1833     3/13/1901   Presbyterian             55             67
24.  Cleveland (D)[8]        1893–1897  N.J.            3/18/1837     6/24/1908   Presbyterian             55             71
25.  McKinley (R)[5]         1897–1901  Ohio            1/29/1843     9/14/1901   Methodist                54             58
26.  T. Roosevelt (R)        1901–1909  N.Y.            10/27/1858    1/6/1919    Reformed Dutch           42             60
27.  Taft (R)                1909–1913  Ohio            9/15/1857     3/8/1930    Unitarian                51             72
28.  Wilson (D)              1913–1921  Va.             12/28/1856    2/3/1924    Presbyterian             56             67
29.  Harding (R)[4]          1921–1923  Ohio            11/2/1865     8/2/1923    Baptist                  55             57
30.  Coolidge (R)            1923–1929  Vt.             7/4/1872      1/5/1933    Congregationalist        51             60
31.  Hoover (R)              1929–1933  Iowa            8/10/1874     10/20/1964  Quaker                   54             90
32.  F. D. Roosevelt (D)[4]  1933–1945  N.Y.            1/30/1882     4/12/1945   Episcopalian             51             63
33.  Truman (D)              1945–1953  Mo.             5/8/1884      12/26/1972  Baptist                  60             88
34.  Eisenhower (R)          1953–1961  Tex.            10/14/1890    3/28/1969   Presbyterian             62             78
35.  Kennedy (D)[5]          1961–1963  Mass.           5/29/1917     11/22/1963  Roman Catholic           43             46
36.  L. B. Johnson (D)       1963–1969  Tex.            8/27/1908     1/22/1973   Disciples of Christ      55             64
37.  Nixon (R)[9]            1969–1974  Calif.          1/9/1913      4/22/1994   Quaker                   56             81
38.  Ford (R)                1974–1977  Neb.            7/14/1913     12/26/2006  Episcopalian             61             —
39.  Carter (D)              1977–1981  Ga.             10/1/1924     —           Southern Baptist         52             —
40.  Reagan (R)              1981–1989  Ill.            2/6/1911      6/5/2004    Disciples of Christ      69             93
41.  G.H.W. Bush (R)         1989–1993  Mass.           6/12/1924     —           Episcopalian             64             —
42.  Clinton (D)             1993–2001  Ark.            8/19/1946     —           Baptist                  46             —
43.  G. W. Bush (R)          2001–2009  Conn.           July 6, 1946  —           Methodist                54             —
44.  Obama (D)               2009–      Hawaii          Aug. 4, 1961  —           United Church of Christ  47
NOTE:
1. F—Federalist; DR—Democratic-Republican; D—Democratic; W—Whig; R—Republican; U—Union.
2. Religious affiliation at election. Several presidents changed religions during their lifetimes.
3. No party for first election. The party system in the U.S. made its appearance during Washington's first term.
4. Died in office.
5. Assassinated in office.
6. The Republican National Convention of 1864 adopted the name Union Party. It renominated Lincoln for president;
for vice president it nominated Johnson, a War Democrat. Although frequently listed as a Republican vice president
and president, Johnson undoubtedly considered himself strictly a member of the Union Party. When that party broke
apart after 1868, he returned to the Democratic Party.
7. Johnson was not a professed church member; however, he admired the Baptist principles of church government.
8. Second nonconsecutive term.
9. Resigned Aug. 9, 1974.
22
Worksheet – presidents continued
What if we want to know: “Are we electing younger people than earlier in our
history?” Consider a time series*! Find this in your book and discuss why it
might answer the question better than the preceding presentation.
How could you present the categorical data? Party affiliation, home state,
religion…decide (without doing!) how you would present each type of categorical
data.
*a chronological presentation with time on the x axis.
23
Histograms
***Calculator p.66 – 69…graphing a histogram
Let’s graph the following data together in our calculators, making a histogram:
First discuss each column and what each means!
Measurement    Number
 1             0
 2             3
 3             1
 4             5
 5             2
 6             7
 7             5
 8             6
 9             3
10             0
11             1
12             0
13             2
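For anyone without a graphing calculator handy, the histogram can be roughed out in text. This Python sketch assumes the second column is the frequency of each measurement (how many times it occurred); the function name and the `#` bars are illustrative choices.

```python
# Measurement -> frequency, copied from the table (second column assumed to be a count).
frequency = {1: 0, 2: 3, 3: 1, 4: 5, 5: 2, 6: 7, 7: 5,
             8: 6, 9: 3, 10: 0, 11: 1, 12: 0, 13: 2}

def text_histogram(freq):
    """One line per measurement, with one '#' per observation."""
    return [f"{m:>2} | {'#' * count}" for m, count in sorted(freq.items())]

for line in text_histogram(frequency):
    print(line)

print("total observations:", sum(frequency.values()))
```

The printout is the histogram turned on its side: the measurements run down the page and the bar lengths are the frequencies.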
24
A new, expanded style of bar/histogram: double sided…note the technique for
comparing data sets!
United States
AGE DISTRIBUTION
When drawn as a "population pyramid," age distribution can hint at patterns of growth.
A top heavy pyramid, like the one for Grant County, North Dakota, suggests negative population
growth that might be due to any number of factors, including high death rates, low birth rates,
and increased emigration from the area.
A bottom heavy pyramid, like the one drawn for Orange County, Florida, suggests high birthrates,
falling or stable death rates, and the potential for rapid population growth.
But most areas fall somewhere between these two extremes and have a population pyramid
that resembles a square, indicating slow and sustained growth with the birth rate exceeding
the death rate, though not by a great margin.
Let’s talk about what we can see here in this pyramid.
25
Line Graphs
page 35
Usually time is the horizontal axis. These are plotted just like graphing in algebra!
Now let’s look at page 36, the Classroom Connection illustration and talk about it.
26
2.3
Misleading graphs
Read it in class. Let’s discuss it together.
Not in the book, but good to know!
Simpson’s Paradox and Averages
We’ve already seen that averages can be misleading. There is another way they
can mislead, known as Simpson’s Paradox after Edward Simpson, who described it in
1951. You need to be careful that the categories over which you are averaging are
actually comparable!
Here’s an excerpt from STATS: Data and Models (ISBN 0-321-20054-3, Pearson)
p. 24:
One famous example of Simpson’s Paradox arose during an investigation of
admission rates for men and women at the University of California at Berkeley’s
graduate schools. As reported in Science, about 45% of male applicants were
admitted while only about 30% of female applicants got in. It looked like a clear
case of discrimination. However, when the data were broken down by school
(Engineering, Law, Medicine, etc.) it turned out that women were admitted at
nearly the same or, in some cases, much higher rates than the men. How could this
be?
Women applied in large numbers to schools with very low admissions rates (Law
and Medicine, for example, admitted fewer than 10%). Men tended to apply to
Engineering and Science. Those schools have admission rates above 50%. When
the average was taken, the women had a lower overall rate but the average didn’t
really make sense.
Often you need to check more closely into the categories within each variable to get
the true picture.
Here’s the data on the graduate admissions from the 1975 issue of Science:
             Males                    Females
             accepted/applicants      accepted/applicants
Program 1    511/825                  89/108
Program 2    352/560                  17/25
Program 3    137/407                  132/375
Program 4    22/373                   24/341
Total        1022/2165                262/849
Let’s do some comparisons:
What are the overall averages? What are the averages within program categories?
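The comparisons can be checked with a short Python sketch. The counts are copied from the table; the dictionary layout and function names are illustrative choices.

```python
# (accepted, applicants) by program and sex, copied from the table.
admissions = {
    "Program 1": {"male": (511, 825), "female": (89, 108)},
    "Program 2": {"male": (352, 560), "female": (17, 25)},
    "Program 3": {"male": (137, 407), "female": (132, 375)},
    "Program 4": {"male": (22, 373),  "female": (24, 341)},
}

def rate(accepted, applicants):
    """Admission rate as a fraction between 0 and 1."""
    return accepted / applicants

def overall(sex):
    """Pool all programs: total accepted and total applicants for one sex."""
    accepted = sum(row[sex][0] for row in admissions.values())
    applicants = sum(row[sex][1] for row in admissions.values())
    return accepted, applicants

for program, row in admissions.items():
    print(f"{program}: male {rate(*row['male']):.1%}, female {rate(*row['female']):.1%}")

print(f"Overall:   male {rate(*overall('male')):.1%}, female {rate(*overall('female')):.1%}")
```

Within every program the female rate is higher, yet the pooled rate is lower, because most women applied to the programs that admit very few applicants.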
28
ACTIVITIES – Simpson’s Paradox
Chapter 2 Summary
read on your own.
Here’s a sample test question:
Given these grades how will we check them out, compare and categorize?
Show more than one way to do this.
Discuss the benefits/problems with each way you present.
99, 79, 56, 98, 82, 71, 85, 92, 83, 75, 65, 94, 83
29
Chapter 3 Describing Data with Numbers
3.1 Measures of Center
These are the numbers that describe what is normal, usual, and in the middle or the
center. These terms are very loose and need firming up mathematically, of course.
Mode
Median: $\tilde{x}$
Mean: $\bar{x}$
Mode
One measure of central tendency is the Mode.
This is the number that occurs most frequently in a data set.
The data set doesn’t always have a mode – if each data point is a different number
the set is mode-free. The mode is always a number in the data set, if there is one.
Some data sets have a mode; some are bi-modal or multimodal.
30
Problem Mode 1
Which of the following bars shows the mode in this histogram?
[Bar chart: “Age and saying No”. Horizontal axis: Age (1 to 6). Vertical axis: Number of No's per hour (0 to 6).]
31
Median
Another measure of central tendency is the Median:
The median is the value that is at the numerical middle of the data if there are an
odd number of data points and they are arranged in order by size. It is the mean of
the 2 middle data points if the number of data points is even and arranged in order
by size.
The formula for finding the location of the median for n data points is
0.5(n + 1).
The process is to order the data and then find the measurement at that location.
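The location formula and the process can be sketched in Python. The two small data sets below are made up for illustration; a location ending in .5 means the median is the mean of the two measurements on either side of that position.

```python
def median_location(n):
    """Position of the median (counting from 1) among n ordered data points."""
    return 0.5 * (n + 1)

def median(data):
    ordered = sorted(data)                 # step 1: put the data in order
    location = median_location(len(ordered))
    lower = ordered[int(location) - 1]     # measurement at the whole-number part of the location
    if location == int(location):          # odd n: the location is a whole number
        return lower
    upper = ordered[int(location)]         # even n: average the two middle measurements
    return (lower + upper) / 2

odd_set = [9, 3, 7, 1, 5, 8, 2]   # hypothetical data, n = 7
even_set = [4, 10, 6, 2]          # hypothetical data, n = 4

print(median_location(7), median(odd_set))    # location 4.0
print(median_location(4), median(even_set))   # location 2.5
```

The location is a position in the ordered list, not a data value, so the two should not be confused.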
Problem Median 1
Find the median location for
Data set A.
n = 19 data points
Data set B.
n = 52 data points
Is the measurement equal to its location number?
ACTIVITIES Median Problem 2
32
Problem Median 2
In golf the holes are rated for a recommended number of strokes needed to sink the
golf ball into the hole. A score of par means the golfer used the recommended
number, a birdie is one fewer than recommended, a bogey is one more than the
recommended number, an eagle is 2 fewer strokes.
At a recent televised tournament, 7 golfers had the following scores, ranked
alphabetically by last name: par, birdie, par, par, birdie, bogey, and eagle.
Where is the median score located?
What is the median score?
33
Problem Median 3
The data shown in the table are the median prices of existing homes in the USA
from 1981 through 1986. If the average prices of existing homes were calculated
for each of these years, how do you think these values would compare to the
median prices shown?
Would the average price be higher, lower, or the same?
Year
Median
1981 66,460
1982 67,800
1983 70,300
1984 72,400
1985 75,500
1986 80,300
34
Mean
The most popular measure of “centeredness” is the Mean
(sometimes called the average).
The mean of n numbers is the sum of the numbers divided by n. If you are working
with a data set of measurements, the mean is denoted $\bar{x}$.
There are some very cogent reasons for its popularity:
It can always be calculated and it’s easy to calculate.
It is unique: there is only ONE mean for a data set.
It uses EVERY data point; nothing is eliminated.
It doesn’t depend on chance or luck.
There are some equally important reasons to take the mean with a grain of salt:
It is heavily affected by outliers!
Let’s look at this. Here is a list of home prices:
$77,500
$78,200
$137,000
$110,500
$1,800,300
What is the AVERAGE? Is this a measure of center, usual, normal?
What happened? What might we use instead of mean?
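To see the effect of the one very expensive home, here is a short Python check that computes both the mean and the median of the five prices listed above, using the standard `statistics` module.

```python
import statistics

# The five home prices from the list above, in dollars.
prices = [77_500, 78_200, 137_000, 110_500, 1_800_300]

mean_price = statistics.mean(prices)
median_price = statistics.median(prices)
below_mean = sum(1 for p in prices if p < mean_price)

print(f"mean:   ${mean_price:,.0f}")
print(f"median: ${median_price:,.0f}")
print(f"{below_mean} of {len(prices)} homes cost less than the mean")
```

Four of the five homes cost less than the mean, so the mean is not describing a typical home here; the median stays near the middle of the cluster.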
35
Do these 2 problems by group then discuss weighted mean
Problem CT1
An elevator in PGH is designed to carry a maximum load of 3,200 pounds. If it is
loaded with 18 people with a mean weight of 166 pounds, is it in any danger of
being overloaded?
Problem CT2
Having received a bonus of $20,000 for accepting early retirement, a company’s
sales representative invested $6,000 in a bond paying 3.75%, $10,000 in a mutual
fund paying 3.96%, and $4,000 in a CD paying 3.25%. Find the weighted mean of
these percentages.
36
Weighted mean – DISCUSS together
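As a starting point for the discussion, a weighted mean multiplies each value by its weight, adds the products, and divides by the sum of the weights. A minimal Python sketch, with made-up scores and weights that are not taken from any of the problems:

```python
def weighted_mean(values, weights):
    """Sum of value * weight, divided by the sum of the weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Hypothetical example: a homework score that counts once
# and an exam score that counts three times as much.
scores = [90, 70]
weights = [1, 3]

print(weighted_mean(scores, weights))   # (90*1 + 70*3) / 4
```

With equal weights the formula reduces to the ordinary mean, which is a quick way to check an implementation.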
Problem CT3
A lecturer counts the final exam in a course 4 times as much as each of the 3 small
exams during the semester. Which of the following students has the higher
average?
           Test 1    Test 2    Test 3    Final
Mikey      72        80        65        82
Lizbeth    81        87        75        78
37
Relationships among Mean, Median, and Mode, 1 problem plus one with 3
parts.
Problem CT4
The data shown in the table are the median prices of existing homes in the USA
from 1981 through 1986. If the average prices of existing homes were calculated
for each of these years, how do you think these values would compare to the
median prices shown?
Would the average price be higher, lower, or the same?
Year
Median
1981 66,460
1982 67,800
1983 70,300
1984 72,400
1985 75,500
1986 80,300
38
Problem CT5
Here are 3 data sets. The graphs for them follow.
x axis    STTR    STTL    Symm
 1        1       1       1
 2        2       2       2
 3        4       3       3
 4        5       4       4
 5        4       5       5
 6        3       6       5
 7        2       8       4
 8        2       5       3
 9        1       4       2
10        1       3       1
Calculate mean, median, and mode for these 3 charts. Mark on the x-axis where
each goes. How many data points in each set?
39
[Bar chart: Skewed to the right (STTR). Horizontal axis 1 to 10; vertical axis 0 to 6.]
[Bar chart: Skewed to the left (STTL). Horizontal axis 1 to 10; vertical axis 0 to 9.]
[Bar chart: Symmetric (Symm). Horizontal axis 1 to 10; vertical axis 0 to 6.]
Summarize your results with a mnemonic device.
Which measurement is most sensitive to outliers? Mean or Median?
What does it mean to say “most sensitive”
Discuss this idea using the salaries of baseball players.
ACTIVITIES MMM – 12 points!
41
3.2 Measures of Spread or Variability
Range
Max - Min
***Graphing Calculator, page 60
Variance:
Mean deviation
p. 58
$\text{mean deviation} = \dfrac{\sum |x - \bar{x}|}{n}$
The mean deviation is calculated by doing the following:
Calculate the mean.
Subtract the mean from each data point. Take the absolute value of each
difference.
Add up the absolute differences.
Divide by n, the number of data points.
Standard deviation
Variance:
p. 60
$s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1}$
The standard deviation for a set of data is the square root of the variance.
***graphing calculator p. 61***
42
The sample variance is calculated by doing the following:
First calculate the sample mean,
then subtract the mean from each measurement individually and
square the answer.
Add up all the squares and divide by $n - 1$.
Example:
Given the following data points find the mean deviation and the standard deviation
along with the measures of central tendency. What is the range?
Display the data…why did you choose what you did for the display?
5, 6, 9, 0, 1, 6, 11, 5
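The step-by-step recipes above translate directly into code. This Python sketch works the example data through each measure; it follows the sample formulas in these notes (divide by n for the mean deviation, by n - 1 for the variance), and the variable names are illustrative.

```python
import statistics

data = [5, 6, 9, 0, 1, 6, 11, 5]   # the eight data points from the example
n = len(data)

# Measures of central tendency
mean = statistics.mean(data)
median = statistics.median(data)
modes = statistics.multimode(data)   # returns every value tied for most frequent

# Measures of spread
data_range = max(data) - min(data)
mean_deviation = sum(abs(x - mean) for x in data) / n     # average absolute distance from the mean
variance = sum((x - mean) ** 2 for x in data) / (n - 1)   # sample variance
std_dev = variance ** 0.5                                 # square root of the variance

print(f"mean {mean}, median {median}, modes {modes}, range {data_range}")
print(f"mean deviation {mean_deviation}, variance {variance:.3f}, sd {std_dev:.3f}")
```

Doing the same calculation once by hand and comparing against the printout is a useful check on the arithmetic.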
43
Measures of Variability
Problem MV 1
Calculate the mean for each sample below. Calculate the range and variance for
each sample.
Discuss the information available in the variance.
[Bar chart: first sample, N = 5. Horizontal axis 1 to 5; vertical axis 0 to 1.2.]
[Bar chart: second sample, N = 5. Horizontal axis 1 to 5; vertical axis 0 to 3.5.]
45
ACTIVITIES
Problem MV 2
Problem MV 3 – do in groups in class – 3 problems to discuss
Three sets of data are shown below.
• How many data points are in each set?
• What is the mean for each set? (Do this WITHOUT a calculator!)
• Rank the sets from the most variable to the least variable and tell why you
made those choices. (Again: calculator free.)
Hint: use the formula for variance to help you reason it out!
$s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1}$
46
[Histogram: Data set 1. Horizontal axis: Measurement (1 to 11); vertical axis: Frequency (0 to 7).]
[Histogram: Data set 2. Horizontal axis: Measurement (1 to 11); vertical axis: Frequency (0 to 6).]
[Histogram: Data set 3. Horizontal axis: Measurement (1 to 11); vertical axis: Frequency (0 to 10).]
49
ACTIVITIES
Problem MV 4
Not in the book, but helpful to know!
Grouped Data for Variance calculations
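If $f_i$ is the frequency of the data measurement $x_i$, then the following formula
calculates the variance for the data:
$s^2 = \dfrac{\sum_{i=1}^{k} f_i (x_i - \bar{x})^2}{n - 1}$
Here the sum runs over the $k$ distinct measurement values, and $n$ is the total number
of data points (the sum of the frequencies).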
Translate the formula to words in groups! Share around!
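One way to check a translation of the grouped formula is to code it and compare with the ungrouped calculation. The small distribution below is made up for illustration and is not one of the problems; the function name is also an illustrative choice.

```python
import statistics

def grouped_mean_and_variance(values, frequencies):
    """Mean and sample variance from a table of values and their frequencies."""
    n = sum(frequencies)   # total number of data points
    mean = sum(f * x for x, f in zip(values, frequencies)) / n
    # Each squared deviation is counted f times, then divide by n - 1.
    variance = sum(f * (x - mean) ** 2 for x, f in zip(values, frequencies)) / (n - 1)
    return mean, variance

# Hypothetical distribution: the value 1 occurs twice, 2 three times, 3 once.
values = [1, 2, 3]
frequencies = [2, 3, 1]

mean, variance = grouped_mean_and_variance(values, frequencies)

# Cross-check: write every data point out and use the ungrouped formula.
expanded = [x for x, f in zip(values, frequencies) for _ in range(f)]
print(mean, variance, statistics.variance(expanded))
```

The two variances agree because multiplying by the frequency is just a shortcut for adding the same squared deviation that many times.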
50
Problem MV 5
The data in the following table are for the inner diameters of some tubes
manufactured by a machine. This table is called a “distribution” because it gives
the values and their frequency. Find the mean diameter and the variance for the
tubes.
D, inches    Frequency
2.0          2
2.2          4
2.3          6
2.8          3
3.0          5
51
Problem MV 7
The following table is a distribution of the top speeds in mph at which 30 racers
were clocked in an auto race. Find the mean and variance for the race.
Top Speed    Number of racers
145           9
150           8
160          11
170           2
52
3.3
Measures of Position
Percentile Rank
Decile
Quartile
Percentile
A fractile ranking means that a given fraction of the measurements lie below the given
measurement and the remaining fraction lie above it.
Suppose your child comes home to tell you that she’s in the 90th percentile of her
class on a particular test. This means that 90% of the children have lower scores or
the same score as she does and 10% have higher scores. You do need to be a little
careful with these measurements of relative ranking, though. It could be that 91%
of the children failed the test and 9% passed. In this scenario, of course, being in
the 90th percentile isn’t much to brag about. You need absolute measures AND
relative measures to evaluate a situation about fractiles.
Deciles divide the measurements into 10ths and quartiles divide the measurements
into quarters. The median is both a decile and a quartile ranking.
Let’s look at quartiles:
Q1 is the median of all measurements less than the median of the data set.
Q3 is the median of all measurements greater than the median of the data set.
And deciles:
D1 is the measurement such that 90% of the measurements are BIGGER than it.
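The quartile definitions can be sketched in Python. This sketch splits the ordered list by position (the lower half and the upper half, leaving out the middle measurement when n is odd), which is one common reading of the definitions above; textbooks and calculators differ slightly, so treat it as an assumption. The data are made up.

```python
import statistics

def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves method."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]             # measurements below the median position
    upper = ordered[half + n % 2:]     # measurements above the median position
    return statistics.median(lower), statistics.median(ordered), statistics.median(upper)

odd_data = [2, 4, 4, 5, 7, 8, 9, 11, 12]   # hypothetical, n = 9
even_data = [1, 3, 5, 7, 9, 11]            # hypothetical, n = 6

print(quartiles(odd_data))
print(quartiles(even_data))
```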
53
Problem FP 1
The following numbers are weekly lumber production (in million board feet) for a
company in Oregon. Find the first quartile and the 90th percentile for the data.
390
406
447
410
370
338
410
320
359
392
315
480
54
Not in the book, but handy to know!
Percentage change in a measurement:
The percent change in a measurement is often of interest to managers, doctors, and
teachers. It is used as a measure of efficacy.
The calculation is
$\text{percent change} = \dfrac{\text{final} - \text{initial}}{\text{initial}} \times 100\%$
Suppose you have a student who was reading poorly – 15 words a minute. You
train the student using your favorite method and test him again to find him reading
27 words a minute. The percent change is
$\dfrac{27 - 15}{15} = 0.80$, which is 80%.
You would then report an 80% improvement in speed.
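The calculation is short enough to write as a one-line function. A minimal Python sketch, using the reading-speed example; the function name is an illustrative choice.

```python
def percent_change(initial, final):
    """Percent change from an initial measurement to a final one."""
    return 100 * (final - initial) / initial

# Reading speed example: 15 words a minute before, 27 after.
print(percent_change(15, 27))   # 80.0
```

A negative result means the measurement went down, so the same function covers both improvements and markdowns.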
55
Problem PC 1
You’ve been looking at a sweater in the store but it costs $135 and that’s too much.
BUT one day you go and check and it’s been marked down to $65…what is the
percent change?
Problem PC2
A student has been working with a tutor on his math skills. His weekly quiz
average was a 65% when he started with the help program.
His quizzes are 30 points each. During the program his weekly grades are
20, 23, 21, 28, 27, 29
What is the percent change in his average? Would you say that the tutoring helped?
ACTIVITIES – PERCENT CHANGE
56
The Empirical Rule
page 71
Given a normal distribution (continuous, symmetric, mound-shaped):
About 68% of the data will lie within 1 standard deviation of the mean
About 95% of the data will lie within 2 standard deviations of the mean
About 99.7% of the data will lie within 3 standard deviations of the mean
Let’s sketch this:
Z-score – a number that tells you how many standard deviations a measurement is from the mean: $z = (x - \bar{x})/s$.
Usual, unexceptional data points will be 1 to 1.5 s from the mean
Think C’s on the positive end
Unusual will be 1.2 to 2.5 s
Rare and outliers will be 2.5 s and up or down
Think of a grading scheme and standard deviations here: let’s put in standard
deviations and letter grades:
57
Here is one of my classes, a listing of the grades on the final…raw data and real
This is a stem-and-leaf diagram.
10 123
09 45779
08 327758
07 459
06 78
05 354
How many students were in my class?
What is the mean and the standard deviation?
$s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1}$
Which grade is at the 80th percentile?
How far is the 85 from the mean in terms of the standard deviation?
58
ZS Problem 1
If you have 2 students applying for entrance to a G&T program and you have room
for only one, which one will you pick based on the following test information?
Gina got a 78 on a test with an average of 72 and a standard deviation of 5.
Mike got an 87 on a test with an average of 85 and standard deviation 1.5.
Who is the stronger student and how do you know?
59
ZS Problem 2
Given the following distribution – Arrange in a dot diagram. Follow the directions
on the next page.
Measurement    Number
 1             0
 2             3
 3             1
 4             5
 5             2
 6             7
 7             5
 8             6
 9             3
10             0
11             1
12             0
13             2
60
Discuss
the measures of central tendency
• mean
• median
• mode
the measures of variability
• range
• variance
• standard deviation
and give
• the z score for the measurement 7.
Verify the Empirical Rule by making a dot or bar chart of the data and marking off
where each of the standard deviations from the mean falls with respect to the data
points ($\bar{x} \pm s$, $\bar{x} \pm 2s$, $\bar{x} \pm 3s$).
61
ZS Problem 3
The mean salary of the employees at a high school in Missouri is $28,500 with a
standard deviation of $2,100.
Discuss the Empirical Rule and who might fit where on a bar chart of employee
salaries.
The state announces a flat raise of $500 per employee for the next year. Find the
mean and standard deviation of the new salaries.
Who will benefit the most in a percentage change analysis?
62
ZS Problem 4
Given that the mean is 9.0 and the standard deviation is 1.4 on the data below, give
the numbers of the 2,000 data points that should be within 1, 2, and 3 standard
deviations of the mean. Then count the numbers that actually ARE within these
bounds.
Value    Frequency
 0          1
 1          2
 2          4
 3          8
 4         20
 5         35
 6         60
 7        120
 8        250
 9        500
10       1000
ACTIVITIES ZS PROBLEM 5
63
ZS Problem 6:
Analyze the following nuclear reactor data (@2010)
                      In operation                      Under construction
Country               Number   Electr. net output MW    Number   Electr. net output MW
Argentina             2        935                      1        692
Armenia               1        375                      -        -
Belgium               7        5,926                    -        -
Brazil                2        1,884                    1        1,245
Bulgaria              2        1,906                    2        1,906
Canada                18       12,569                   -        -
China: Mainland       13       10,048                   27       27,230
China: Taiwan         6        4,980                    2        2,600
Czech Republic        6        3,722                    -        -
Finland               4        2,716                    1        1,600
France                58       63,130                   1        1,600
Germany               17       20,490                   -        -
Hungary               4        1,889                    -        -
India                 20       4,391                    5        3,564
Iran                  -        -                        1        915
Japan                 54       46,823                   2        2,650
Korea, Republic       21       18,665                   5        5,560
Mexico                2        1,300                    -        -
Netherlands           1        487                      -        -
Pakistan              2        425                      1        300
Romania               2        1,300                    -        -
Russian Federation    32       22,693                   11       9,153
Slovakian Republic    4        1,792                    2        782
Slovenia              1        666                      -        -
South Africa          2        1,800                    -        -
Spain                 8        7,514                    -        -
Sweden                10       9,303                    -        -
Switzerland           5        3,238                    -        -
Taiwan                6        4,980                    2        2,600
Ukraine               15       13,107                   2        1,900
United Kingdom        19       10,137                   -        -
USA                   104      100,747                  1        1,165
Total                 442      374,958                  65       62,862
65
Work:
Some thoughts:
A histogram for the number per country?
Calculate the measures of center, the variability
Check the Empirical Rule?
An average output for each reactor?
A z-score for the USA, for China?
66
ZS Problem 7
A rough estimate of the range is the mean ± 2 standard deviations, that is, a width of about 4 standard deviations.
Why is this true?
Could you use 3 sd? What would the difference be?
So you can ESTIMATE the standard deviation by taking the range and dividing by
4…let’s do this. It’s rough, but sometimes you just have to take what you can get!
If the range is 16 what is the estimate of the SD?
If the mean is 4 and the SD is 1.2 , what is an estimate of the range?
67
3.4
Box and Whisker Plots
are sometimes called “box plots”. They use the
Five Number Summary
in a visual way:
Minimum value in the data set
Lower Quartile value
Median
Upper Quartile value
Maximum value
***Graphing Calculator, page 79
Definitions:
Lower Quartile, Q1: the median of the values below the median
Upper Quartile, Q3: the median of the values above the median
It is possible to replace the minimum and maximum with prescribed values and
have “outliers” marked.
Sketch: horizontal
68
IQR: Interquartile Range: is the difference between the upper quartile and the
lower quartile. It is where the most “normal” measurements are.
Let’s look at page 75 and analyze the two data sets presented there!
69
Box plots are often used to compare data sets! It’s so easy to see how categories
compare with them.
Constructing a box plot with specified “fences” and “outliers”
as opposed to the Five Number Summary only
Put the data set in numerical order.
Mark the Five Number Summary right on the list.
Construct the box with Q1, the median, and Q3
Find the fences: lower fence = Q1 − 1.5(IQR), upper fence = Q3 + 1.5(IQR)
Identify any data points that lie outside the fences and mark them *
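The construction steps can be sketched in Python. The data set is made up for illustration, and the quartiles are found by the median-of-halves method (an assumption; calculators may use a slightly different rule).

```python
import statistics

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves method."""
    ordered = sorted(data)                 # put the data set in numerical order
    n = len(ordered)
    lower = ordered[: n // 2]              # values below the median position
    upper = ordered[n // 2 + n % 2:]       # values above the median position
    return (ordered[0], statistics.median(lower), statistics.median(ordered),
            statistics.median(upper), ordered[-1])

def fences_and_outliers(data):
    """Fences at Q1 - 1.5(IQR) and Q3 + 1.5(IQR); outliers lie outside them."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return (low, high), [x for x in data if x < low or x > high]

data = [12, 15, 16, 18, 19, 20, 22, 23, 25, 48]   # hypothetical measurements

print(five_number_summary(data))
print(fences_and_outliers(data))
```

In the plot, the whiskers would stop at the most extreme data points inside the fences, and each outlier would be marked with its own *.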
BW1
Here is one of my classes, a listing of the grades on the final…raw data and real
This is a stem-and-leaf diagram.
10 123
09 45779
08 327758
07 459
06 78
05 354
How many students were in my class?
What are the grades?
What is the Five Number Summary? The IQR?
What is the estimated SD? And the estimated z-score for 67?
70
Sketch the box and whisker plot! Were there any outliers?
How do you know they’re outliers? Use the next page for this
71
BW1 continued
72
Another example, using the comparison power of box and whisker plots,
is in ACTIVITIES BW 2.
Comparing several data sets with box and whisker plots.
A student designed an experiment to test the efficiency of 4 coffee containers from
different manufacturers by pouring coffee at 180°F into each container and then
measuring the temperature difference after 30 minutes. She did the experiment 5
times – using different cups of the same type each time (she didn’t reuse any of the
cups). So she used 20 cups total, 5 from each manufacturer.
The five-number summaries of the temperature differences (in °F) are in the table below:
          Min     Q1       Median    Q3       Max     IQR
Cup 1     6       6        83.25     14.25    18.5    8.25
Cup 2     0       1        2         4.5      7       3.5
Cup 3     9       11.5     14.25     21.75    24.5    10.25
Cup 4     6       6.50     8.50      14.25    17.5    7.75
Compare the data. Which cup has the best heat retention property?
Each group in the room will do one, and then we’ll go to the board and compare!
73
Chapter 3 Summary
OYO
Sample question:
Page 83 number 9, 13
74