Outline Course Notes - Knightswood Secondary School

Statistics
1. Finding the Mean, Median & Mode
2. Statistical notation (C)
3. Frequency Table and Graph
4. The 5-figure summary (& box plot) (C)
5. Cumulative frequency (table/graph) (C)
6. Making comparisons (C)
7. Standard deviation (C)
8. Probability
Knightswood©Copyright Kayar publishers 2000
Outline Course Notes
Statistics
page 34
1) Finding the Mean, Median & Mode
The mean, median and mode are all types of averages.
Mean = $\frac{\sum x}{n}$ (all the numbers added together and divided by the number of numbers).
Median = the middle number in a set of ordered numbers (if there is no single middle number then the middle two are added together and halved).
Mode = the most frequently occurring number.
Example:
Here are Jane's marks from her last 8 tests (marks are out of 10).
4, 5, 6, 6, 7, 7, 7, 9
Find the mean, median and mode for this data.
Answer:
Mean = $\frac{\sum x}{n} = \frac{4 + 5 + 6 + 6 + 7 + 7 + 7 + 9}{8} = \frac{51}{8} = 6.375$
Median = $\frac{6 + 7}{2} = 6.5$
Mode = 7
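The three averages for Jane's marks can also be checked on a computer. The short sketch below uses Python's standard `statistics` module; the variable names are chosen here for illustration and are not part of the notes.

```python
from statistics import mean, median, mode

marks = [4, 5, 6, 6, 7, 7, 7, 9]  # Jane's marks from the example

jane_mean = mean(marks)      # all the marks added together, divided by how many there are
jane_median = median(marks)  # averages the two middle marks because n is even
jane_mode = mode(marks)      # the most frequently occurring mark

print("mean:", jane_mean)
print("median:", jane_median)
print("mode:", jane_mode)
```

The data is already in order here, but `median` sorts its input itself, so the marks could be typed in any order.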
Note:
The mean is usually considered to be the best average since it takes into account the value of every number in the data set.
2) Statistical Notation (C)
Here are some commonly used symbols in statistics and their meanings (try to find them on your scientific calculator).
$x_i$ : Each number in your data set (in the above example, $x_1 = 4$, $x_2 = 5$, etc.).
$n$ : The number of numbers in your data set.
$\bar{x}$ : The mean.
$\sum x$ : All the numbers in your data set added together.
$\sum x^2$ : Square all the numbers in your data set first, then add them all together.
$(\sum x)^2$ : Add all the numbers in your data set together, then square the total.
$\sigma_{n-1}$ or $s$ : Standard deviation (see later).
$P(a)$ : The probability of the outcome "a" occurring.
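The difference between $\sum x^2$ and $(\sum x)^2$ is easy to mix up, so here is a small sketch that works out each quantity for Jane's marks from section 1. The variable names are illustrative only.

```python
marks = [4, 5, 6, 6, 7, 7, 7, 9]  # Jane's marks, so x_1 = 4, x_2 = 5, ...

n = len(marks)                              # n: the number of numbers
sum_x = sum(marks)                          # sum of x: add everything together
sum_x_squared = sum(x * x for x in marks)   # sum of x squared: square first, then add
square_of_sum = sum_x ** 2                  # (sum of x) squared: add first, then square
x_bar = sum_x / n                           # the mean

print(n, sum_x, sum_x_squared, square_of_sum, x_bar)
```

For this data the two totals are 341 and 2601, which shows how different "square then add" and "add then square" can be.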
3) Frequency Table & Graph (C)
Example:
Here are the number of goals Inverness Thistle scored over 20 matches during season 2000/2001.
3, 1, 1, 2, 1, 2, 0, 1, 0, 4, 1, 0, 0, 2, 2, 0, 1, 1, 0, 4.
Show the information in a frequency table, draw a frequency graph and find the mean, mode and median.
Answer:
The frequency table.
Number of goals (x)   Tally     Frequency (f)   f × x
0                     IIII      5               0
1                     IIII II   7               7
2                     IIII      4               8
3                     II        2               6
4                     II        2               8
5                               0               0
Total                           20              29
The frequency total gives $n$; the f × x total gives $\sum x$.
The frequency graph.
[Frequency graph: "Goals scored by Inverness Thistle", frequency (f) plotted against goals]
Notice that we can easily find the mean from the totals in the table.
$\bar{x} = \frac{\sum x}{n} = \frac{29}{20} = 1.45$
By looking at the graph, you can see that the highest bar is for 1 goal. So the mode is 1.
The median is not so obvious. The middle numbers are $x_{10}$ and $x_{11}$, which are both 1 again (you can see this from the graph or you can order the data set).
So the median is calculated by $\frac{1 + 1}{2} = 1$.
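The same working can be sketched in Python, starting from the frequencies in the frequency table in the answer. The dictionary layout and the variable names are assumptions made for this illustration.

```python
# Frequencies copied from the frequency table: goals scored -> number of matches
frequency = {0: 5, 1: 7, 2: 4, 3: 2, 4: 2, 5: 0}

n = sum(frequency.values())                         # total of the f column
sum_fx = sum(x * f for x, f in frequency.items())   # total of the f × x column
mean_goals = sum_fx / n

# The mode is the number of goals with the highest frequency (the tallest bar).
mode_goals = max(frequency, key=frequency.get)

# Rebuild the ordered data set from the table, then average the two middle values.
ordered = [x for x in sorted(frequency) for _ in range(frequency[x])]
middle = n // 2
median_goals = (ordered[middle - 1] + ordered[middle]) / 2  # n is even here

print(n, sum_fx, mean_goals, mode_goals, median_goals)
```

`ordered[middle - 1]` and `ordered[middle]` are $x_{10}$ and $x_{11}$, because Python counts list positions from 0.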
4) The 5-figure summary (& box plot) (C)
Example:
A local newspaper makes a note of its daily sales for the past two weeks (numbers are to the nearest thousand):
11, 9, 8, 7, 8, 10, 12, 13, 10, 6, 7, 7, 9, 10
Find the 5-figure summary for this data.
Answer:
We must first order the data set.
6, 7, 7, 7, 8, 8, 9, 9, 10, 10, 10, 11, 12, 13
There are 14 numbers in the data set, which divides into two sets of 7. There is no middle number, so use $x_7$ and $x_8$ for the median (called $Q_2$).
Each set of 7 now divides into two sets of 3 with a clear middle number between them, at $x_4$ and $x_{11}$.
[Box diagram: 14 numbers split into 7 | 7, with Q2 between x7 and x8; each 7 split into 3, middle number, 3, giving Q1 at x4 and Q3 at x11]
The 5-figure summary is defined as...
L (the lowest number in the data set) = 6
H (the highest number in the data set) = 13
$Q_1$ (occurring at exactly $x_4$ in this case) = 7
$Q_2$ (the median) = $\frac{x_7 + x_8}{2} = \frac{9 + 9}{2} = 9$
$Q_3$ (occurring at exactly $x_{11}$ in this case) = 10
$Q_1$, $Q_2$ & $Q_3$ are known as the quartiles.
The range of the data set is defined to be H - L = 13 - 6 = 7.
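The splitting procedure described above can be written as a short Python function. This is only a sketch of the "median of each half" method used in these notes; the function name `five_figure_summary` is made up for this example, and other quartile conventions (including the one in `statistics.quantiles`) can give slightly different values.

```python
from statistics import median

def five_figure_summary(data):
    """Return (L, Q1, Q2, Q3, H) using the 'median of each half' method."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]           # the numbers below the median position
    upper = ordered[half + n % 2:]   # skip the exact middle number when n is odd
    return ordered[0], median(lower), median(ordered), median(upper), ordered[-1]

sales = [11, 9, 8, 7, 8, 10, 12, 13, 10, 6, 7, 7, 9, 10]  # newspaper sales (thousands)
low, q1, q2, q3, high = five_figure_summary(sales)
data_range = high - low

print(low, q1, q2, q3, high, data_range)
```

With 14 numbers each half holds 7 values, so `median(lower)` picks out $x_4$ and `median(upper)` picks out $x_{11}$.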
This information can be represented as a box plot to make it easier to see how the data is spread.
[Box plot: whiskers from L = 6 to H = 13, box from Q1 = 7 to Q3 = 10, median line at Q2 = 9, drawn on a scale from 2 to 14]
Notice....
The position of the box tells us that the numbers are mostly between 7 & 10.
The median or $Q_2$ is indicated by a line through the box.
The length of the box (the interquartile range, $Q_3 - Q_1$) gives an indication of the spread of the numbers. You will find out a better measurement of the spread later on.
5) Cumulative Frequency (table/graph) (C)
Example:
During an epidemic of blood poisoning, a sample of 24 people were screened for the disease. The number of infected blood cells per ml was counted by machine and put into class intervals of 100.
Diseased cells   Frequency
0-100            3
101-200          2
201-300          2
301-400          4
401-500          6
501-600          3
601-700          1
701-800          1
801-900          1
901-1000         1
i) The cumulative frequency table.
We can construct a cumulative frequency table as shown below. Try to make sure you can see where all the numbers are coming from (help from a teacher may be required in this example).
Diseased cells (per ml)   Frequency   Cumulative frequency
0 - 100                   3           3
101 - 200                 2           5
201 - 300                 2           7
301 - 400                 4           11
401 - 500                 6           17
501 - 600                 3           20
601 - 700                 1           21
701 - 800                 1           22
801 - 900                 1           23
901 - 1000                1           24
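Each cumulative frequency is simply a running total of the frequency column. As a rough illustration, the sketch below builds that running total with `itertools.accumulate` from Python's standard library; the list names are chosen for this example.

```python
from itertools import accumulate

upper_bounds = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]  # top of each class
frequencies = [3, 2, 2, 4, 6, 3, 1, 1, 1, 1]

# Running total: each entry is the frequency so far, including the current class.
cumulative = list(accumulate(frequencies))

for upper, f, cf in zip(upper_bounds, frequencies, cumulative):
    print(f"up to {upper:4d} cells: frequency {f}, cumulative frequency {cf}")
```

The last cumulative frequency should always equal the sample size, which is 24 here.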
ii) Cumulative frequency graph.
Plot the diseased cells against the cumulative frequency.
Draw a smooth best fitting curve.
[Cumulative frequency graph: cumulative frequency (0 to 24) plotted against cells (0 to 1000), with Q1, Q2 and Q3 marked on the curve]
The graph can now be used to estimate the values of $Q_1$, $Q_2$ & $Q_3$.
24 ÷ 4 = 6, so we will use 6, 12 & 18 for the quartiles.
You can now read that, approximately,
$Q_1$ = 250
$Q_2$ = 420
$Q_3$ = 530
iii) The interquartile and semi-interquartile range (a measurement of spread).
The interquartile range can now be calculated as $Q_3 - Q_1$ = 530 - 250 = 280, or the semi-interquartile range = $\frac{1}{2}(Q_3 - Q_1)$ = 140, which is more commonly used.
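Reading values off a hand-drawn curve is always approximate. As a cross-check, the sketch below estimates the quartiles by joining the plotted points with straight lines (linear interpolation) rather than a smooth curve, so its answers differ slightly from the graph readings. The starting point (0, 0) and the function name `cells_at` are assumptions made for this illustration.

```python
# Points on the cumulative frequency graph: (upper class boundary, cumulative frequency).
# The point (0, 0) is an assumption added so the line starts from zero.
points = [(0, 0), (100, 3), (200, 5), (300, 7), (400, 11), (500, 17),
          (600, 20), (700, 21), (800, 22), (900, 23), (1000, 24)]

def cells_at(target_cf):
    """Estimate the cell count at which the cumulative frequency reaches target_cf."""
    for (x0, cf0), (x1, cf1) in zip(points, points[1:]):
        if cf0 <= target_cf <= cf1:
            # Move along the straight line joining the two neighbouring points.
            return x0 + (target_cf - cf0) / (cf1 - cf0) * (x1 - x0)
    raise ValueError("target is outside the range of the table")

n = points[-1][1]                                        # 24 people in the sample
q1, q2, q3 = (cells_at(n * k / 4) for k in (1, 2, 3))    # use 6, 12 and 18
iqr = q3 - q1
semi_iqr = iqr / 2

# Using the values read from the graph in the notes instead:
iqr_from_graph = 530 - 250
semi_iqr_from_graph = iqr_from_graph / 2

print(round(q1), round(q2), round(q3), round(iqr), round(semi_iqr))
print(iqr_from_graph, semi_iqr_from_graph)
```

The straight-line estimates come out within a few cells of the graph readings, which is about as close as a curve drawn by eye can be expected to agree.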
iv) Interpreting results
We can see from our graph that most people in our sample had a relatively low count of diseased cells (the graph rises steeply at the beginning, between 250 ($Q_1$) and 530 ($Q_3$)) and very few had a count over 800.
Of course, in the real world, we would have to sample many more people to draw any firm conclusions.
6) Making comparisons (C)
Statistics are often used to compare two sets of data which may look similar
but in fact are quite different.
Example 1:
The number of punishment exercises given out each week by a teacher in Poshwood Academy is shown in the stem-and-leaf diagram below.
0 | 0, 0, 1, 3, 5, 6, 7, 7, 8, 9
1 | 0, 0, 1, 1, 1, 2, 2, 2, 4, 5, 7, 7, 7
2 | 0, 3
Here n = 25
Analysis
Count the numbers in the data set. You should find 25.
Let us find the 5-figure summary. Use a box diagram to find out where $Q_1$, $Q_2$ & $Q_3$ occur.
[Box diagram: 25 numbers split into 12, Q2, 12; each 12 split into 6 | 6, with Q1 between x6 and x7 and Q3 between x19 and x20]
$Q_2$ is an exact middle number; $Q_1$ & $Q_3$ have to be calculated.
L = 0
H = 23
$Q_1 = \frac{x_6 + x_7}{2} = \frac{6 + 7}{2} = 6.5$
$Q_2 = x_{13} = 10$
$Q_3 = \frac{x_{19} + x_{20}}{2} = \frac{12 + 14}{2} = 13$
The Box Plot for Poshwood Academy
[Box plot: whiskers from L = 0 to H = 23, box from Q1 = 6.5 to Q3 = 13, median line at Q2 = 10, drawn on a scale from 0 to 30]
Example 2:
Some distance away, in Shadywood Secondary, the number of punishment exercises issued by a teacher was recorded as in the previous example. Here are the results in another stem-and-leaf diagram.
0 | 0, 2, 6, 9, 9
1 | 2, 7
2 | 0, 1, 1, 3, 4, 6, 9
Key: 1 | 2 = 12
This time the survey was carried out over a 14 week period (in the last example n = 25). We can still compare the information.
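Analysis
This time n = 14.
Let us again find the 5-figure summary. Use a box diagram to find out where $Q_1$, $Q_2$ & $Q_3$ occur.
[Box diagram: 14 numbers split into 7 | 7, with Q2 between x7 and x8; each 7 split into 3, middle number, 3, giving Q1 at x4 and Q3 at x11]
$Q_2$ must be calculated; $Q_1$ & $Q_3$ are each an exact middle number.
L = 0
H = 29
$Q_1 = x_4 = 9$
$Q_2 = \frac{x_7 + x_8}{2} = \frac{17 + 20}{2} = 18.5$
$Q_3 = x_{11} = 23$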
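To check the Shadywood working, the stem-and-leaf diagram can be expanded back into a plain list of numbers (stem × 10 + leaf) and the quartiles found with the same "median of each half" method. The dictionary layout and names below are assumptions for this sketch.

```python
from statistics import median

# Shadywood Secondary stem-and-leaf diagram: stem (tens) -> leaves (units)
stem_and_leaf = {
    0: [0, 2, 6, 9, 9],
    1: [2, 7],
    2: [0, 1, 1, 3, 4, 6, 9],
}

# Key: 1 | 2 = 12, so each value is 10 * stem + leaf.
data = sorted(10 * stem + leaf
              for stem, leaves in stem_and_leaf.items()
              for leaf in leaves)

n = len(data)
half = n // 2
lower = data[:half]            # x1 to x7
upper = data[half + n % 2:]    # x8 to x14 (nothing is skipped because n is even)

summary = {
    "L": data[0],
    "Q1": median(lower),
    "Q2": median(data),
    "Q3": median(upper),
    "H": data[-1],
}
print(n, summary)
```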
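The Box Plot for Shadywood Secondary
[Box plot: whiskers from L = 0 to H = 29, box from Q1 = 9 to Q3 = 23, median line at Q2 = 18.5, drawn on a scale from 0 to 30]
Compare it with the box plot for Poshwood Academy
[Box plot: whiskers from L = 0 to H = 23, box from Q1 = 6.5 to Q3 = 13, median line at Q2 = 10, drawn on a scale from 0 to 30]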
By comparing the two box plots:
There tend to be more punishments given out by the teacher from Shadywood Secondary (the box is further over to the right).
There is a much greater spread of data in Shadywood Secondary (look at the length of the box).
The median (a simple average) is lower for Poshwood Academy (look at the position of $Q_2$).
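The comparison can also be made with numbers instead of by eye. The sketch below takes the two 5-figure summaries exactly as stated in Examples 1 and 2 and works out the median, interquartile range and range for each school; the dictionary names are illustrative only.

```python
# 5-figure summaries as stated in the two examples: (L, Q1, Q2, Q3, H)
summaries = {
    "Poshwood Academy": (0, 6.5, 10, 13, 23),
    "Shadywood Secondary": (0, 9, 18.5, 23, 29),
}

spread = {}
for school, (low, q1, q2, q3, high) in summaries.items():
    spread[school] = {
        "median": q2,
        "iqr": q3 - q1,       # the length of the box
        "range": high - low,  # the full length of the plot
    }
    print(school, spread[school])

# A longer box means the middle half of the data is more spread out.
wider_box = max(spread, key=lambda school: spread[school]["iqr"])
print("Greater spread:", wider_box)
```

The interquartile range for Shadywood (14) is more than double that for Poshwood (6.5), which matches what the lengths of the two boxes show.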
7) Standard Deviation (C)
Standard deviation is a measurement of how the data is spread out from the mean. It is a much better measurement of spread than the semi-interquartile range.
You have a choice of two formulas for calculating the standard deviation.
Example:
The lengths of phone calls made in a day from an office were monitored and the results are shown below (in minutes).
3, 12, 12, 5, 9, 8, 9, 21, 4, 6, 2, 5, 5, 7, 8, 2, 10, 9
Find the standard deviation of the data set (working to 2 decimal places).
Method 1
mean = $\bar{x} = \frac{\sum x}{n} = \frac{137}{18} = 7.61$
A table of values is now useful:
$x$     $x - \bar{x}$     $(x - \bar{x})^2$
2       -5.61             31.47
2       -5.61             31.47
3       -4.61             21.25
4       -3.61             13.03
5       -2.61             6.81
5       -2.61             6.81
5       -2.61             6.81
6       -1.61             2.59
7       -0.61             0.37
8       0.39              0.15
8       0.39              0.15
9       1.39              1.93
9       1.39              1.93
9       1.39              1.93
10      2.39              5.71
12      4.39              19.27
12      4.39              19.27
21      13.39             179.29
Totals: $\sum x = 137$ and $\sum (x - \bar{x})^2 = 350.24$
Standard deviation ($\sigma_{n-1}$ or $s$) is given by the formula,
$\sigma_{n-1} = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{350.24}{17}} = 4.54$
This is a lengthy method. A quicker way is shown in Method 2.
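Method 1 can be followed step by step in Python. This sketch keeps the mean unrounded, so its intermediate total differs slightly from a hand-worked table rounded to 2 decimal places, but the final answer agrees to 2 decimal places. The call to `statistics.stdev` is included only as an independent check; the variable names are illustrative.

```python
from math import sqrt
from statistics import stdev

calls = [3, 12, 12, 5, 9, 8, 9, 21, 4, 6, 2, 5, 5, 7, 8, 2, 10, 9]  # minutes

n = len(calls)
x_bar = sum(calls) / n                                  # the mean
squared_deviations = [(x - x_bar) ** 2 for x in calls]  # the (x - mean) squared column
total = sum(squared_deviations)                         # total of that column
s = sqrt(total / (n - 1))                               # divide by n - 1, then square root

print(round(x_bar, 2), round(total, 2), round(s, 2))
print(round(stdev(calls), 2))  # the library's sample standard deviation
```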
Method 2 (the one pass formula)
Although this formula looks more difficult it is actually quicker to use (especially if you are able to use the statistical mode on your scientific calculator). There is no need to calculate the mean and it cuts down on the rounding error build-up.
For our data set, $\sum x = 137$ and $\sum x^2 = 1393$ (obtained by first squaring every number in the data set and then adding them all together, or simply by using the statistical mode on your calculator).
Standard deviation ($\sigma_{n-1}$ or $s$) can now be calculated by this formula,
$\sigma_{n-1} = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}} = \sqrt{\frac{1393 - \frac{18769}{18}}{17}} = \sqrt{\frac{350.28}{17}} = 4.54$
In an exam you must show all the working, so it is a good idea to use the statistical mode on your calculator to find $\sum x$ and $\sum x^2$, then substitute these values into their correct place in the formula and do the calculation. Once you have found the answer you can check it on the calculator by using the $\sigma_{n-1}$ key. Don't try to memorise these formulae; they are given to you in the exam.
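The one pass formula translates almost directly into code, since it needs only the two totals $\sum x$ and $\sum x^2$. The following is a minimal sketch for the phone call data with illustrative variable names.

```python
from math import sqrt

calls = [3, 12, 12, 5, 9, 8, 9, 21, 4, 6, 2, 5, 5, 7, 8, 2, 10, 9]  # minutes

n = len(calls)
sum_x = sum(calls)                   # sum of x
sum_x2 = sum(x * x for x in calls)   # sum of x squared: square each number, then add

# One pass formula: no need to work out the mean first.
numerator = sum_x2 - sum_x ** 2 / n
s = sqrt(numerator / (n - 1))

print(sum_x, sum_x2, round(numerator, 2), round(s, 2))
```

Both totals can be collected in a single pass through the data, which is why a calculator in statistical mode can report the standard deviation as soon as the last number is entered.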
8) Probability (C)
Probability is measured on a scale of 0 to 1.
If an event (outcome) is impossible, the probability of it occurring is 0.
If an event (outcome) is certain, the probability of it occurring is 1.
The probability of any event happening is calculated from
$P(E) = \frac{\text{number of favourable outcomes}}{\text{total number of outcomes}}$
where P(E) is the probability of an event "E" occurring.
Example:
There are 52 cards in a standard pack of playing cards.
a) What is the probability of selecting any club (♣) at random?
b) What is the probability of selecting any king at random?
c) What is the probability of selecting a joker at random?
Answer:
a) Probability of selecting a club.
P(any ♣) = $\frac{13}{52}$ = 0.25 (there are 13 cards of each suit)
b) Probability of selecting any king.
P(any king) = $\frac{4}{52}$ = 0.077 (to 3 decimal places)
c) Probability of selecting a joker.
P(joker) = $\frac{0}{52}$ = 0 (no jokers in a 52 card pack)
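The formula for P(E) amounts to counting. The sketch below builds a 52 card pack and counts the favourable outcomes for each part of the example, using `fractions.Fraction` so the answers stay exact. The card representation and the helper name `probability` are assumptions made for this illustration.

```python
from fractions import Fraction

suits = ["clubs", "diamonds", "hearts", "spades"]
ranks = ["A"] + [str(value) for value in range(2, 11)] + ["J", "Q", "K"]
pack = [(rank, suit) for suit in suits for rank in ranks]  # 52 cards, no jokers

def probability(is_favourable):
    """Number of favourable outcomes divided by the total number of outcomes."""
    favourable = sum(1 for card in pack if is_favourable(card))
    return Fraction(favourable, len(pack))

p_club = probability(lambda card: card[1] == "clubs")
p_king = probability(lambda card: card[0] == "K")
p_joker = probability(lambda card: card[0] == "Joker")

print(p_club, float(p_club))
print(p_king, round(float(p_king), 3))
print(p_joker, float(p_joker))
```

`Fraction` reduces 13/52 to 1/4 and 4/52 to 1/13 automatically; converting to `float` gives the decimal answers used in the notes.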