ST-L1 Intro to Statistics

40S Applied Math
Mr. Knight – Killarney School
Unit: Statistics
Lesson: ST-L1 Intro to Statistics
Intro to Statistics
ST-L1 Objectives:
To review measures of central tendency and
dispersion.
Learning Outcome B-4
Slide 1
Judith and Francine, both age 19, have decided to go on a Caribbean
cruise, and they want to have an enjoyable time, which means that they
want to travel with other people their own age. They buy tickets for a
cruise where the average age of the other passengers is 20 years. Can
you imagine their surprise at the start of the cruise when they discover
that all the other passengers are parents (average age 32) with children
(average age 8)?
This lesson includes a brief review of sampling and the calculation of
central tendencies. It also introduces 'measures of dispersion,' something
Judith and Francine should have known about. Also, you will be
introduced to information technology (Winstats) that will do the statistics
calculations.
Theory – Intro
Slide 2
Teacher Bundy asked his students to measure the length of the
classroom using metre sticks, and to write their measurements (rounded
to the nearest millimetre) on the board. It quickly became evident that
most of the measurements were quite similar, but a few were lower or
higher than the 'main group' of measurements.
Whenever some quantity or value is measured numerous times, there will
likely be a variety of results. Most of the results will likely be close to a
value believed to be the true value. The term variability in statistics
refers to the variety of answers we get when measuring something. You
will study ways of describing and interpreting variability in data. For
example, you will review ways of finding a 'middle' measurement that
represents all the data, and also ways of describing how the data are
spread about this 'middle' value.
Theory – Variability, Continuous and Discrete Data
Slide 3
If we count books, the count would consist of whole numbers. This is an
example of discrete data, or data you get when counting a finite number
of distinct objects. (For example, there may be 71 or 72 books, but not
71.6 books.)
If we measure the length of the classroom, we must decide on an
acceptable level of precision and round our final answer. We can never
say the measurement is exactly correct. Such measurements are
continuous data: because the data are NOT discrete, the final answer is
always an approximation (e.g., 71.4 cm, 71.38 cm, or 71.3810728 cm). Other
examples of continuous data are measurements of speed, area, and
time.
Theory – Variability, Continuous and Discrete Data cont’d
Slide 4
A measure of central tendency is a single quantity or score that in
some way represents the 'middle' value of all the data in a sample or
population. Three commonly used measures of central tendency are as
follows:
1. The mean, or average, of a set of scores is found by adding the scores
together and then dividing the sum by the number of scores. The
symbols for mean are:
x̄ (bar x) for the mean of a set of data (sample mean), and
μ (mu) for the mean of a population
2. The mode for a sample or population is the value that appears most
often.
Theory – Measures of Central Tendency
Slide 5
3. The median is the middle term when data are arranged in order of size
from the smallest to the largest. Let 'n' represent the number of terms of
data. If the data have an odd number of terms, the middle one is the
(n + 1)/2 th term. If the data have an even number of terms, you find the
average of the middle two terms. That is, you find the average of the terms at
positions n/2 and (n/2) + 1.
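The three definitions above can be sketched in a few lines of Python. This is only an illustration, not part of the original lesson: the function names and the sample scores are made up for the example, and the median follows the position rule described above, assuming 1-based positions in the sorted list.

```python
def mean(values):
    """Add the scores together and divide by the number of scores."""
    return sum(values) / len(values)


def modes(values):
    """Return every value that appears most often (there may be ties)."""
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


def median_by_position(values):
    """Median using the position rule: (n + 1)/2 for odd n,
    average of positions n/2 and n/2 + 1 for even n."""
    ordered = sorted(values)  # smallest to largest
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    if n % 2 == 1:
        # Position (n + 1)/2 is 1-based, so subtract 1 for a Python index.
        return ordered[(n + 1) // 2 - 1]
    # Positions n/2 and n/2 + 1 (1-based) are indices n/2 - 1 and n/2.
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


scores = [3, 7, 7, 2, 9]  # hypothetical scores, not from the lesson
print(mean(scores))                      # 5.6
print(modes(scores))                     # [7]
print(median_by_position(scores))        # 7
print(median_by_position([1, 2, 3, 4]))  # 2.5
```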
Theory – Measures of Central Tendency
Slide 6
A clerk in a men's clothing store keeps a weekly record of the number of
pairs of pants sold. The following is her list for two weeks.
Calculate the mean, mode, and median for the data shown.
          Mon  Tue  Wed  Thur  Fri  Sat
Week 1     34   40   36    36   38   38
Week 2     32   36   36    42   34   34
Test Your Knowledge – Measures of Central Tendency
Slide 7
The question from the previous page is repeated here: A clerk in a men's
clothing store keeps a weekly record of the number of pairs of pants sold.
The following is her list for two weeks. Calculate the mean, mode, and
median for the data shown.
          Mon  Tue  Wed  Thur  Fri  Sat
Week 1     34   40   36    36   38   38
Week 2     32   36   36    42   34   34
Winstats is used to answer this question. The window on the left-hand side
shows the data entered in two columns, and the window on the right shows
the overall statistics.
Note that the mean shown is 36.333, and the median is 36.000. These answers
are the same as the ones that were calculated. Winstats does not calculate
the mode.
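As a cross-check on the Winstats output, the same statistics can be computed with Python's standard `statistics` module. This is a sketch rather than the lesson's method; the variable names are chosen for the example, and `statistics.multimode` (Python 3.8 or later) is used so that the mode, which Winstats does not report, is also found.

```python
import statistics

# Pairs of pants sold, Monday to Saturday, for each week.
week1 = [34, 40, 36, 36, 38, 38]
week2 = [32, 36, 36, 42, 34, 34]
sales = week1 + week2  # all 12 days together

mean_sales = statistics.mean(sales)
median_sales = statistics.median(sales)   # averages the two middle values
mode_sales = statistics.multimode(sales)  # list of most frequent values

print(round(mean_sales, 3))  # 36.333
print(median_sales)          # 36.0
print(mode_sales)            # [36]
```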
Test Your Knowledge – Measures of Central Tendency
Slide 8
A central tendency is a measure of some kind of 'middle' number for a
group of data. We also need ways to measure the variation of the individual
data values, and how the data are spread about the central value.
For example, Robin can drive to university using the downtown route or the
perimeter route. The downtown route is shorter, but it has more traffic, and
can become quite crowded. The driving times in minutes for each route
for 5 days are shown in the table (arranged in ascending order).
Downtown Route    15  26  30  39  45
Perimeter Route   29  30  31  32  33
The average driving time for each route is 31 minutes. Which route should
she take? The driving times for the downtown route vary from 15 to 45
minutes, and so she would need to allow 45 minutes travel time to ensure
getting to class on time. The driving times for the perimeter route vary from
29 to 33 minutes, and so the travel times are more predictable.
As you can see, Robin needs to consider more than the mean travel times
when deciding on which route to take.
Theory – Measures of Dispersion
Slide 9
The simplest measure of variation is the range. The range of a set of
numbers is the difference between the largest number and the
smallest number.
In the previous example (repeated here), the ranges are as follows:
Downtown Route    15  26  30  39  45
Perimeter Route   29  30  31  32  33
Downtown: Range = 45 - 15 = 30 minutes
Perimeter: Range = 33 - 29 = 4 minutes
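The two range calculations, together with the equal means noted earlier, can be reproduced with a short Python sketch. The helper name `value_range` is made up for this example.

```python
# Driving times in minutes for 5 days on each route.
downtown = [15, 26, 30, 39, 45]
perimeter = [29, 30, 31, 32, 33]


def value_range(values):
    """Range = largest value minus smallest value."""
    return max(values) - min(values)


for name, times in (("Downtown", downtown), ("Perimeter", perimeter)):
    mean_time = sum(times) / len(times)
    print(f"{name}: mean = {mean_time} min, range = {value_range(times)} min")
# Downtown: mean = 31.0 min, range = 30 min
# Perimeter: mean = 31.0 min, range = 4 min
```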
Other applications of range may be:
•temperature variation for the day
•prices of stocks on the stock market
•marks of students in class
Theory – Range
Slide 10
One limitation of using range to describe the variation in a group of
data (for example, student marks in a class) is that range provides
information about only two scores - the highest and the lowest - and
does not provide any information about all the other scores. One
extreme score will make the range very large, even if all the other
scores are very close.
For example, the marks of the five students in both groups below
have a range of 55. The variability in the first group, however,
appears to be primarily attributable to one extreme mark.
Student marks: 25, 75, 77, 78, 80
Range = 80 - 25 = 55
Student marks: 25, 34, 56, 71, 80
Range = 80 - 25 = 55
Theory – Range
Slide 11
The standard deviation is a measure that shows how the data are
spread about the mean value. We will use standard deviation to
describe data, but we will not use algebraic formulas to calculate
standard deviation. Instead, we will use technology (computer
program or graphing calculator) to calculate standard deviation. The
symbol for standard deviation of a population or large sample is
σ (sigma), and the symbol for standard deviation of a sample is s. A large
sample is defined as a sample with 30 or more data items. In this
course, we will use only σ, which represents the standard
deviation of the population.
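As one example of such technology, Python's standard `statistics` module provides both versions: `pstdev` for the population standard deviation (σ) and `stdev` for the sample standard deviation (s). The sketch below applies them to the driving times from the earlier route example; it is an illustration only, since the lesson itself uses Winstats.

```python
import statistics

# Driving times in minutes from the earlier route example.
downtown = [15, 26, 30, 39, 45]
perimeter = [29, 30, 31, 32, 33]

for name, times in (("Downtown", downtown), ("Perimeter", perimeter)):
    sigma = statistics.pstdev(times)  # population standard deviation (sigma)
    s = statistics.stdev(times)       # sample standard deviation (s)
    print(f"{name}: sigma = {sigma:.3f}, s = {s:.3f}")
```

Both routes have the same mean, but the downtown times give a much larger standard deviation, which matches the wider spread seen in the data.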
Theory – The Standard Deviation
Slide 12
For example:
The mean math mark for Class A is 74, and the standard deviation is 4.
If the marks follow a bell-shaped (normal) distribution, this means that
about 68 percent of all the marks in the class are within 4 of 74. In other
words, about 68 percent of the marks in the class are between 70 (74 - 4)
and 78 (74 + 4).
Theory – The Standard Deviation
Slide 13
Calculate Standard Deviation Using Algebra (not really)
Theory – The Standard Deviation
Slide 14
The mean math marks and standard deviation for two classes are
shown below. Assume that 68 percent of the marks in each class
are within one standard deviation of the mean mark.
           mean mark   standard deviation
Class A       74               4
Class B       72               8
Questions:
1. In which class is the set of marks more dispersed?
2. Bert in Class A and Beth in Class B each have a mark of 82%.
How many standard deviations are they from their class means?
Who appears to have the better mark?
Theory – The Standard Deviation
Slide 15
Answers:
1. Class A: 68% of all the marks are from (74 - 4) 70 to (74 + 4) 78.
Class B: 68% of all the marks are from (72 - 8) 64 to (72 + 8) 80.
Therefore, the marks in Class B are more dispersed.
2. Bert: Number of standard deviations above the mean = (82 - 74)/4 = 2.
Beth: Number of standard deviations above the mean = (82 - 72)/8 = 1.25.
Therefore, Bert appears to have the better mark because he is 2σ
above the class average, while Beth is only 1.25σ above hers.
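The comparison in answer 2 comes down to one calculation: (mark - class mean) divided by the class standard deviation. A minimal Python sketch follows; the function name is made up for this example.

```python
def standard_deviations_from_mean(mark, class_mean, class_sd):
    """How many standard deviations a mark lies above (+) or below (-) the mean."""
    return (mark - class_mean) / class_sd


bert = standard_deviations_from_mean(82, 74, 4)  # Class A: mean 74, sd 4
beth = standard_deviations_from_mean(82, 72, 8)  # Class B: mean 72, sd 8

print(bert)  # 2.0
print(beth)  # 1.25
```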
End day 1
Theory – The Standard Deviation
Slide 16
On previous pages all the elements of data were listed in tables. This is
fine if the number of elements is relatively small, but becomes awkward
if the number of elements gets large -- say 100 elements or more. For
this reason, it is convenient to present the data as a frequency
distribution. A frequency distribution table shows the number of
elements of data (frequency) at each measure. Sometimes the
measures need to be grouped, especially if the measures are
continuous.
Example:
The table below is a frequency distribution table that shows the heights
of 100 Senior 4 students. The students are grouped into suitable height
groups in 7 cm intervals.
Theory – Grouped Data
Slide 17
height interval (cm)   interval mean (cm)   # of students
153.5 to 160.5                157                  5
160.5 to 167.5                164                 16
167.5 to 174.5                171                 43
174.5 to 181.5                178                 27
181.5 to 188.5                185                  9
Total                                            100
Determine the mean, median, mode, range, and standard deviation for
the student heights. Be sure to select 'grouped data' in Winstats.
Theory – Grouped Data
Slide 18
From the diagram, you can read the following answers.
mean = μ = 172.33
median = 171.00
mode = 171.00 (i.e., the interval with the largest # of students, read from
the input data at 43 students)
range = 28
standard deviation = σ = 6.837
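These values can be checked by hand from the frequency table, treating every student in an interval as if their height were the interval mean. The Python sketch below does this under stated assumptions: it uses the population form of the standard deviation, the range is the difference between the largest and smallest interval means, and the median is simply the interval mean of the class containing the middle student. All names are made up for the example, and this is not necessarily how Winstats computes its results.

```python
import math

# Interval means (cm) and number of students from the heights table.
midpoints = [157, 164, 171, 178, 185]
frequencies = [5, 16, 43, 27, 9]

n = sum(frequencies)  # 100 students

# Weighted mean: each interval mean counts once per student in the interval.
mean = sum(x * f for x, f in zip(midpoints, frequencies)) / n

# Population standard deviation of the grouped data.
variance = sum(f * (x - mean) ** 2 for x, f in zip(midpoints, frequencies)) / n
sigma = math.sqrt(variance)


def grouped_median(mids, freqs):
    """Interval mean of the class that contains the middle student (a simplification)."""
    half = sum(freqs) / 2
    running = 0
    for x, f in zip(mids, freqs):
        running += f
        if running >= half:
            return x


median = grouped_median(midpoints, frequencies)
mode = midpoints[frequencies.index(max(frequencies))]  # most students
data_range = max(midpoints) - min(midpoints)

print(round(mean, 2))   # 172.33
print(median)           # 171
print(mode)             # 171
print(data_range)       # 28
print(round(sigma, 3))  # 6.837
```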
Theory – Grouped Data
Slide 19
A histogram is a bar graph that shows equal intervals of a measured
or counted quantity on the horizontal axis, and the frequencies
associated with these intervals on the vertical axis.
The data from the previous page is repeated below. The histogram has
been drawn with Winstats, and represents the heights of the students in
graphical form.
Note that the five bars represent the five height intervals, and the
heights of the bars represent the number of students at each height.
The tallest bar represents the mode height. Such a histogram is known
as a Frequency Distribution Graph.
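Without Winstats, a rough text version of the same histogram can be produced in Python. This sketch draws the bars horizontally, one `#` per student, rather than vertically as in the Winstats graph; the function name and layout are choices made for this example only.

```python
# Height intervals (cm) and number of students from the heights table.
intervals = [
    "153.5-160.5",
    "160.5-167.5",
    "167.5-174.5",
    "174.5-181.5",
    "181.5-188.5",
]
frequencies = [5, 16, 43, 27, 9]


def text_histogram(labels, counts):
    """Return one line per interval, with a bar of '#' as long as the frequency."""
    lines = []
    for label, count in zip(labels, counts):
        lines.append(f"{label:>12} | {'#' * count} ({count})")
    return lines


histogram_lines = text_histogram(intervals, frequencies)
for line in histogram_lines:
    print(line)
```

The longest bar, for the interval 167.5 to 174.5, marks the mode height.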
Theory – Histogram
Slide 20
The frequency distribution table below shows the midterm marks of 85
Senior 4 math students at Parksville High. The first column shows the
mark interval, the second column the average mark within each mark
interval, and the third column the number of students at each mark.
Answer the following questions.
mark interval   mark   # of students
29 to 37         33          1
38 to 46         42          4
47 to 55         51         12
56 to 64         60         18
65 to 73         69         24
74 to 82         78         16
83 to 91         87          7
92 to 100        96          3
Total                       85
Sample Problem
Slide 21