Slides - Dr J Frost

advertisement
S1: Chapter 2/3
Data: Measures of Location and
Dispersion
Dr J Frost (jfrost@tiffin.kingston.sch.uk)
www.drfrostmaths.com
Last modified: 9th September 2015
For Teacher Use
Lesson 1:
Concept of variables in statistics.
Mean of ungrouped and grouped data.
Using a calculator to input data and calculate Σπ‘₯, Σ𝑓π‘₯, 𝑛, π‘₯.
Combined means.
Lesson 2:
Median from discrete data and continuous data (using
linear interpolation)
Lesson 3:
Quartiles/percentiles using linear interpolation.
Lesson 4:
Variance/Standard Deviation
Lesson 5:
Coding. Consolidation.
Variables in algebra vs stats
π‘₯
Differences
Similarities
 Just like in algebra, variables in stats
represent the value of some quantity,
e.g. shoe size, height, colour.
 Variables can be discrete or continuous.
Discrete: Limited to specific values, e.g.
shoe size, colour, day born. May be
quantitative (i.e. numerical) or qualitative
(e.g. favourite colour).
Continuous: Can be any real number in
some range (time, weight, …)
Discrete ?
Continuous ?
 Can be part of further calculations, e.g.
if π‘₯ represents height, then 2π‘₯
represents twice people’s height. In
stats this is known as ‘coding’, which
we’ll cover later.
 Unlike algebra, a variable in stats
represents the value of multiple objects.
e.g. The heights of all people in a room.
 Because of this, we can do operations on
it as if it was a collection of values:
 If π‘₯ represents people’s heights, Σπ‘₯
gives the sum of everyone’s heights.
In algebra this would be
meaningless. If π‘₯ = 4, then Σπ‘₯
makes no sense!
 π‘₯ is the mean of π‘₯. Notice π‘₯ is a
collection of values whereas π‘₯ is a
single value.
Mean of ungrouped data
Length of cat
𝒙
2.2
2.5
2.6
2.65
2.9
You all know how to find the mean of a list of values. But lets
consider the notation, and see how theoretically we could
calculate each of the individual components on a calculator.
Σπ‘₯?
π‘₯=
𝑛?
Whip out yer Casios…
The overbar in stats
specifically means ‘the
sample mean of’, but
don’t worry about this
for now.
Inputting Data
Length of cat
𝒙
2.2
2.5
2.6
2.65
2.9
1. Press the ‘MODE’ button and select STAT.
2. The “1-VAR” means “one variable”, so select this.
(We’ll also be using A + BX later in the year, which as you
might guess, is to do with fitting straight lines of best fit)
3. Enter each value above in the table that appears,
and press ‘=‘ after each one.
4. Now press AC to stop entering data so we can start
inputting a calculation…
Inputting Data
Length of cat
𝒙
2.2
2.5
2.6
2.65
2.9
In STATS mode, we can still do many calculations as
normal (e.g. try 1 + 1 = ) but we can also insert various
statistical calculations involving π‘₯. Use SHIFT -> [1] to
access these. Try inputting these expressions:
𝑛
=πŸ“ ?
Σπ‘₯ /𝑛 = 𝟐. πŸ“πŸ•
?
3π‘₯ + 1 = πŸ–. πŸ•πŸ
?
Σπ‘₯
= 𝟏𝟐.?πŸ–πŸ“
π‘₯
= 𝟐. πŸ“πŸ•
?
Grouped Data
Height 𝒉 of bear (in metres)
Estimate of Mean:
Frequency
0 ≤ β„Ž < 0.5
4
0.5 ≤ β„Ž < 1.2
20
1.2 ≤ β„Ž < 1.5
5
1.5 ≤ β„Ž < 2.5
11
π‘₯=
𝑓π‘₯
?𝑓
=
46.75
?
40
= 1.17π‘š
?
What does the variable π‘₯
represent?
The midpoints of each interval. They‘re effectively a
sensible single value used to represent each interval.
Why is our mean just an estimate?
Because we don’t know the exact heights within each
group. Grouping data loses information.
?
?
Inputting Frequency Tables
Height 𝒉 of bear (in metres)
Frequency
0 ≤ β„Ž < 0.5
4
0.5 ≤ β„Ž < 1.2
20
1.2 ≤ β„Ž < 1.5
5
1.5 ≤ β„Ž < 2.5
11
We now need to input frequencies.
Go to SETUP (SHIFT -> MODE), press
down, then select STAT.
Select ‘ON’ for frequency.
Your calculator will preserve this setting
even when switched off.
Now go into STATS mode and input your table, using your midpoints as the
π‘₯ values. How on the calculator do we get:
Σ𝑓π‘₯ ?
Σ𝑓 ?
This is Σπ‘₯. This is because your calculator thinks π‘₯ is the
original variable rather than just
? a list of interval midpoints.
Just imagine that your calculator is converting your table into
a list of values with duplicates.
? = Σ𝑓)
This is 𝑛. (Since 𝑛
Bro Exam Tip: In the exam you get a method mark for the
division and an accuracy mark for the final answer. Write:
46.75
π‘₯=
= 1.16875
40
You’re not required to show working like “0.25 × 4 + β‹―”
π‘₯=
𝑓π‘₯
𝑓
Mini-Exercise
Use your calculator’s STAT mode to determine the mean (or estimate of the mean).
Ensure that you show the division in your working.
1
2
IQ of L6Ms2 (𝒒)
Frequency (𝒇)
Num children (𝒄)
Frequency (𝒇)
0
2
80 < π‘ž ≤ 90
7
1
6
90 ≤ π‘ž < 100
5
2
1
100 ≤ π‘ž < 120
3
3
1
120 ≤ π‘ž < 200
1
Σ𝑐 11
𝑐=
= ?= 1.1
𝑛
10
3
Time 𝒕
π‘₯=
Σ𝑓π‘₯ 1560
? = 97.5
=
Σ𝑓
16
Frequency (𝒇)
9.5 < 𝑑 ≤ 10
32
10 ≤ 𝑑 < 12
27
12 ≤ 𝑑 < 15
47
15 ≤ 𝑑 < 16
11
π‘₯=
Σ𝑓π‘₯
Σ𝑓
=
1414
117
=
? 12.1 to 3sf.
GCSE RECAP :: Combined Mean
The mean maths score of 20 pupils in class A is 62.
The mean maths score of 30 pupils in class B is 75.
a) What is the overall mean of all the pupils’ marks.
b) The teacher realises they mismarked one student’s paper; he should have
received 100 instead of 95. Explain the effect on the mean and median.
𝟐𝟎 × πŸ”πŸ + πŸ‘πŸŽ × πŸ•πŸ“
πŸ‘πŸ’πŸ—πŸŽ
𝑴𝒆𝒂𝒏 =
=
= πŸ”πŸ—. πŸ–
πŸ“πŸŽ
πŸ“πŸŽ
?
The revised score will increase the mean but leave the median unaffected.
Test Your Understanding
Archie the Archer competes in a competition with 50 rounds. He scored an average
of 35 points in the first 10 rounds and an average of 25 in the remaining rounds.
What was his average score per round?
πŸ‘πŸ“ × πŸπŸŽ + πŸπŸ“ × πŸ’πŸŽ
𝑴𝒆𝒂𝒏 =
= πŸπŸ•
?πŸ“πŸŽ
Median – which item?
You need to be able to find the median of both listed data and of grouped data.
Grouped data
Listed data
𝒏
Position of median
Median
1,4,7,9,10
5
4,9,10,15
4
?
2 ?
/3
4?
5?
/6
?7
9.5
?
?7
7.5
?
Items
2,4,5,7,8,9,11
7
1,2,3,5,6,9,9,10,11,12 10
3rd
nd
rd
th
th
th
Can you think of a rule to find the
position of the median given 𝑛?
! To find the position of the median for
𝑛
listed data, find 2 :
- If a decimal, round?up.
- If whole, use halfway between this
item and the one after.
IQ of L6Ms2 (𝒒)
Frequency (𝒇)
80 ≤ π‘ž < 90
7
90 ≤ π‘ž < 100
5
100 ≤ π‘ž < 120
3
120 ≤ π‘ž < 200
2
Position to use for median:
8.5?
! To find the median of
𝑛
grouped data, find 2 , then use
linear interpolation.
𝑛
DO NOT round or adjust it in any way.
2
This is just like at GCSE where, if you had a
cumulative frequency graph with 60 items,
you’d look across the 30th. We’ll cover
linear interpolation in a sec…
Quickfire Questions…
What position do we use for the median?
Lengths: 3cm, 5cm, 6cm, …
𝑛 = 11
Median position: 6th ?
Lengths: 4m, 8m, 12.4m, …
𝑛 = 24
Median position: 12th/13
? th
Age
Freq
10 ≤ π‘Ž < 20
12
20 ≤ π‘Ž < 30
5
Median position: 8.5
?
Score
Freq
150 ≤ 𝑠 < 200
3
200 ≤ 𝑠 < 400
7
th
Median position: 18?
Volume (ml)
Median position: 5 ?
Ages: 5, 7, 7, 8, 9, 10, …
𝑛 = 60
Median position: 30th/31
? st
Score
Weights: 1.2kg, 3.3kg, …
𝑛 = 35
Freq
0 ≤ 𝑣 < 100
5
100 ≤ 𝑣 < 200
6
200 ≤ 𝑣 < 300
2
Median position: 6.5?
Freq
150 ≤ 𝑠 < 200
15
200 ≤ 𝑠 < 400
6
Median position: 10.5
?
Weights: 4.4kg, 7.6kg, 7.7kg…
𝑛 = 18
Median position: 9th/10
? th
Bro Exam Note: Pretty much every exam question has dealt
with median for grouped data, not listed data.
Linear Interpolation
Height of tree (m)
Freq
C.F.
0.55 ≤ β„Ž < 0.6
55
55
0.6 ≤ β„Ž < 0.65
45
100
0.65 ≤ β„Ž < 0.7
30
130
0.7 ≤ β„Ž < 0.75
15
145
0.75 ≤ β„Ž < 0.8
5
150
At GCSE we draw a suitable line
on a cumulative frequency
graph. How could we read of
this value exactly using suitable
calculation?
We could find the fraction of
the way along the line
segment using the frequencies,
then go this same fraction
along the class interval.
Linear Interpolation
Height of tree (m)
Freq
C.F.
0.55 ≤ β„Ž < 0.6
55
55
0.6 ≤ β„Ž < 0.65
45
100
0.65 ≤ β„Ž < 0.7
30
130
0.7 ≤ β„Ž < 0.75
15
145
0.75 ≤ β„Ž < 0.8
5
150
The 75th item is within the
0.6 ≤ β„Ž < 0.65 class interval
because 75 is within the first
100 items but not the first 55.
Frequency up until
this interval
55?
?
75
0.6m
?
Med
Height at start of interval.
Item number we’re
interested in.
?
100
Frequency at end of
this interval
0.65m
?
Height at end of interval.
Linear Interpolation
Height of tree (m)
Freq
C.F.
0.55 ≤ β„Ž < 0.6
55
55
0.6 ≤ β„Ž < 0.65
45
100
0.65 ≤ β„Ž < 0.7
30
130
0.7 ≤ β„Ž < 0.75
15
145
0.75 ≤ β„Ž < 0.8
5
150
Frequency up until
this interval
What fraction of the way across the
interval are we?
𝟐𝟎
?
πŸ’πŸ“
Hence:
𝟐𝟎
π‘΄π’†π’…π’Šπ’‚π’ = 𝟎. πŸ” +
× πŸŽ. πŸŽπŸ“
πŸ’πŸ“ ?
55
75
0.6m
Med
Item number we’re
interested in.
Height at start of interval.
100
Frequency at end of
this interval
0.65m
Height at end of interval.
Bro Tip: I like to put the
units to avoid getting
frequencies confused
with values.
Bro Tip: To quickly get
frequency before and after,
just look for the two
cumulative frequencies that
surround the item number.
More Examples
Weight of cat (kg)
Freq
C.F.
1.5 ≤ 𝑀 < 3
10
10
3≤𝑀<4
8
4≤𝑀<6
14
Time (s)
Freq
C.F.
8 ≤ 𝑑 < 10
4
4
18
10 ≤ 𝑑 < 12
3
7
32
12 ≤ 𝑑 < 14
13
20
Median class interval:
πŸ‘≤π’˜<πŸ’
?
10?
3kg
?
?
16
7?
?
18
4kg
?
Fraction along interval:
πŸ”
πŸ–?
Median:
πŸ”
πŸ‘+
× πŸ ? = πŸ‘. πŸ•πŸ“π’Œπ’ˆ
πŸ–
?
20
10
?
12s
?
14s
?
Median:
𝟏𝟐 +
πŸ‘
×
πŸπŸ‘
𝟐 = 𝟏𝟐. πŸ“ (to 3sf)
?
What’s different about the intervals here?
Weight of cat to nearest kg
Frequency
10 − 12
7
13 − 15
2
16 − 18
9
19 − 20
4
There are GAPS between intervals!
What interval does this actually represent?
10 − 12
9.5 −? 12.5
Lower class boundary
Class width = 3
Upper class boundary
?
Identify the class width
Distance 𝒅 travelled (in m)
…
Time 𝒕 taken (in seconds)
0 ≤ d < 150
0−3
150 ≤ d < 200
πŸ’−πŸ”
𝟐𝟎𝟎 ≤ 𝐝 < 𝟐𝟏𝟎
7 − 11
…
Lower class boundary = 200
?
Lower class boundary = 3.5?
Class width = 10?
Class width = 3 ?
Weight π’˜ in kg
…
Speed 𝒔 (in mph)
10 − 20
10 ≤ s < 20
21 − 30
20 ≤ 𝑠 < 29
πŸ‘πŸ − πŸ’πŸŽ
πŸπŸ— ≤ 𝐬 < πŸ‘πŸ
Lower class boundary = 30.5
?
Class width = 10?
…
Lower class boundary = 29?
Class width = 2
?
Linear Interpolation with gaps
Jan 2007 Q4
10
29
72
97
105
111
116
119
120
?
29
19.5 ?miles
?
60
72?
29.5?miles
πŸ‘πŸ
π‘€π‘’π‘‘π‘–π‘Žπ‘› = πŸπŸ—. πŸ“ +
×?
𝟏𝟎 = πŸπŸ”. πŸ•π’Œπ’ˆ
πŸ’πŸ‘
Test Your Understanding
Questions should be on a printed sheet…
Age of relic (years)
0-1000
1001-1500
1501-1700
1701-2000
Frequency
24
29
12
35
26
π‘€π‘’π‘‘π‘–π‘Žπ‘› = 1000.5 +
× 500
29
?
= πŸπŸ’πŸ’πŸ–. πŸ• years
Shark length (cm)
40 ≤ π‘₯ < 100
100 ≤ π‘₯ < 300
300 ≤ π‘₯ < 600
600 ≤ π‘₯ < 1000
Frequency
17
5
8
10
3
π‘€π‘’π‘‘π‘–π‘Žπ‘› = 100 +
× 200
5?
= 220π‘π‘š
Exercise 2
1
Questions should be on a printed sheet…
πŸπŸ”
π‘΄π’†π’…π’Šπ’‚π’ = πŸ‘πŸ’. πŸ“ +
×
?πŸ“ = πŸ‘πŸ•. 𝟐
πŸ‘πŸŽ
2
3
π‘΄π’†π’…π’Šπ’‚π’ = πŸ’πŸŽ +
𝟐𝟐
× πŸπŸŽ = πŸ’πŸ“. πŸ—
πŸ•πŸ’ ?
π‘΄π’†π’…π’Šπ’‚π’ = πŸ”πŸŽ +
πŸπŸ’
× πŸ‘πŸŽ = πŸ•πŸ‘. πŸπŸπŸ“
πŸ‘πŸ ?
Exercise 2
4
πŸ’πŸ–πŸ‘πŸ•. πŸ“
𝑴𝒆𝒂𝒏 =
=?πŸπŸ’. πŸπŸ–πŸ•πŸ“
𝟐𝟎𝟎
πŸ‘πŸ–
π‘΄π’†π’…π’Šπ’‚π’ = 𝟐𝟎. πŸ“ +
× πŸ“ = 𝟐𝟐. πŸ•
πŸ–πŸ– ?
Quartiles – which item?
You need to be able to find the quartiles of both listed data and of grouped data.
The rule is exactly the same as for the median.
Grouped data
Listed data
Items
1,4,7,9,10
4,9,10,15
2,4,5,7,8,9,11
IQ of L6Ms2 (𝒒)
𝒏
Position of LQ & UQ
5
?
1 /2 ?
& 3 /4
2 ?
&6
3 and
?8
4
7
1,2,3,5,6,9,9,10,11,12 10
2nd & 4th
st
nd
rd
nd
th
rd
th
th
LQ & UQ
?7
6.5 &
?12.5
4?
&9
3&
?10
Can you think of a rule to find the
position of the LQ/UQ given 𝑛?
! To find the position of the LQ/UQ for
1
3
listed data, find 4 𝑛 or 4 𝑛 then as before:
?
- If a decimal, round up.
- If whole, use halfway between this
item and the one after.
Frequency (𝒇)
80 ≤ π‘ž < 90
7
90 ≤ π‘ž < 100
5
100 ≤ π‘ž < 120
3
120 ≤ π‘ž < 200
2
Position to use for LQ:
4.25
?
! To find the median of
1
3
grouped data, find 4 𝑛 and 4 𝑛,
then use linear interpolation.
Again, DO NOT round this value.
Percentiles
The LQ, median and UQ give you 25%, 50% and 75% along the data respectively.
But we can have any percentage you like.
𝑛 = 43
Item to use for 57th
percentile?
24.51
?
You will always find these for grouped data in an exam, so never round
this position.
Notation:
Lower Quartile: 𝑄1?
Upper Quartile: 𝑄?3
Median:
𝑄?2
?
57th Percentile: 𝑃57
Test Your Understanding
These are the same as the ‘Test Your Understanding’
questions on your sheet from before.
Age of relic (years)
0-1000
1001-1500
1501-1700
1701-2000
Item for 𝑄1 :
25th?
Frequency
24
29
12
35
1
𝑄1 = 1000.5 +
× 500
29
?
= 1017.74 years
10
𝑄3 = 1700.5 +
× 300
35
?
= 1786.21 years
𝐼𝑄𝑅 = 768.47 ?
Shark length (cm)
40 ≤ π‘₯ < 100
100 ≤ π‘₯ < 300
300 ≤ π‘₯ < 600
600 ≤ π‘₯ < 1000
Frequency
17
5
8
10
Item to use for P37 : 14.8 ?
πŸπŸ’. πŸ–
𝑃37 = πŸ’πŸŽ +
×?πŸ”πŸŽ = πŸ—πŸ. πŸπŸ’
πŸπŸ•
𝟏. 𝟐
𝑃78 = πŸ”πŸŽπŸŽ +
×?πŸ’πŸŽπŸŽ = πŸ”πŸ’πŸ–
𝟏𝟎
Exercise 3
?
?
Exercise 3
πŸ‘πŸ—
𝑄1 = 𝟐𝟎 +
× πŸπŸŽ = πŸπŸ–. πŸ“
πŸ—πŸ
πŸπŸ‘
𝑄3 = πŸ”πŸŽ +
× πŸ’πŸŽ = πŸ–πŸ‘. πŸ”
πŸ‘πŸ—
𝐼𝑄𝑅 = πŸ“πŸ“. 𝟏
?
𝟏
× πŸπŸŽ = πŸ’πŸ’. πŸ“
πŸ’πŸ‘
πŸπŸ–
𝑄3 = πŸπŸ—. πŸ“ +
× πŸπŸŽ = πŸ‘πŸ”. πŸ•
πŸπŸ“
𝐼𝑄𝑅 = πŸ•. πŸ–
𝑄1 = πŸπŸ—. πŸ“ +
πŸπŸ“
× πŸ‘πŸŽ = πŸ’πŸ’. πŸ“
πŸ‘πŸ
𝟏𝟐
𝑄3 = πŸ—πŸŽ +
× πŸ‘πŸŽ = πŸπŸŽπŸ“. πŸ•
πŸπŸ‘
πŸ•
𝑃90 = 𝟏𝟐𝟎 +
× πŸπŸπŸŽ = πŸπŸ”πŸ—. πŸ’
πŸπŸ•
𝑄1 = πŸ‘πŸŽ +
?
?
What is variance?
Distribution of IQs in L6Ms5
Distribution of IQs in L6Ms4
πΉπ‘Ÿπ‘’π‘žπ‘’π‘’π‘›π‘π‘¦ 𝐷𝑒𝑛𝑠𝑖𝑑𝑦
πΉπ‘Ÿπ‘’π‘žπ‘’π‘’π‘›π‘π‘¦ 𝐷𝑒𝑛𝑠𝑖𝑑𝑦
110
𝐼𝑄
Here are the distribution of IQs in two
classes. What’s the same, and what’s
different?
110
𝐼𝑄
Variance
Variance is how spread out data is.
Variance, by definition, is the average squared distance from the mean.
𝜎
2
Σ
=
π‘₯−π‘₯ 2
𝑛
Distance from mean…
Squared distance from mean…
Average squared distance from mean…
Simpler formula for variance
Variance
“The mean of the squares minus the square of the mean
(‘msmsm’)”
2
Σπ‘₯
𝜎2 = ? − π‘₯2?
𝑛
Standard Deviation
𝜎 = π‘‰π‘Žπ‘Ÿπ‘–π‘Žπ‘›π‘π‘’
The standard deviation can ‘roughly’ be thought of as the average distance from the
mean.
Examples
1, 3
Variance
𝜎 2 = 5 − 2?2 = 1
Standard Deviation
𝜎 = 1 =?1
So note that that in the case of two
items, the standard deviation is
indeed the average distance of the
values from the mean.
2cm 3cm 3cm 5cm 7cm
Variance
𝜎 2 = 19.2 − 42?= 3.2cm
Standard Deviation
?
𝜎 = 3.2 = 1.79cm
Practice
Find the variance and standard deviation of the following sets of data.
2
?
Variance = 2.67
4
6
Standard Deviation = 1.63?
1 2 3 4 5
Variance = 2 ?
Standard Deviation = 1.41 ?
Extending to frequency/grouped frequency tables
We can just mull over our mnemonic again:
Variance: “The mean of the squares minus the square of the
means (‘msmsm’)”
Σ𝑓π‘₯ 2
π‘‰π‘Žπ‘Ÿπ‘–π‘Žπ‘›π‘π‘’ = ? − π‘₯ 2 ?
Σ𝑓
Bro Tip: It’s better to try and memorise the mnemonic than the formula
itself – you’ll understand what’s going on better, and the mnemonic will be
applicable when we come onto random variables in Chapter 8.
Bro Exam Note: In an exam, you will pretty much certainly be asked to find
the standard deviation for grouped data, and not listed data.
Example
May 2013 Q4
We can use our STATS mode to work out the various summations needed.
Just input the table as normal.
Note that, as the discussion before, Σπ‘₯ 2 on a calculator actually gives you
Σ𝑓π‘₯ 2 because it’s already taking the frequencies into account.
πŸ’πŸ–πŸ‘πŸ•. πŸ“
?
= πŸπŸ’. πŸπŸ–πŸ•πŸ“
𝟐𝟎𝟎
πŸπŸ‘πŸ’πŸπŸ–πŸ. πŸπŸ“
𝟐 = πŸ–πŸ”. πŸ‘πŸ•
𝜎2 =
− πŸπŸ’. πŸπŸ–πŸ•πŸ“
?
𝟐𝟎𝟎
𝜎 = πŸ–πŸ”. πŸ‘πŸ• = πŸ—. πŸπŸ—
π‘₯=
?
You ABSOLUTELY must
check this with the 𝜎π‘₯
you can get on the
calculator directly.
Test Your Understanding
May 2013 (R) Q3
?
Most common exam errors
Thinking Σ𝑓π‘₯ 2 means Σ𝑓π‘₯ 2 . It means the sum
of each value squared!
When asked to calculate the mean followed by
standard deviation, using a rounded version of the
mean in calculating the standard deviation, and
hence introducing rounding errors.
Forgetting to square root the variance to get the
standard deviation.
ALL these mistakes can be easily spotted if you
check your value against “𝜎π‘₯” in STATS mode.
Exercises
Page 40 Exercise 3C
Q1, 2, 4, 6
Page 44 Exercise 3D
Q1, 4, 5
Coding
What do you reckon is the mean height of people in this room?
Now, stand on your chair, as per the instructions below.
INSTRUCTIONAL VIDEO
Is there an easy way to recalculate the mean based on your new
heights? And the variance of your heights?
Starter
Suppose now after a bout of ‘stretching you to your limits’, you’re
now all 3 times your original height.
What do you think happens to the standard deviation of your heights?
It becomes 3 times larger (i.e. your heights are 3 times as spread out!)
?
What do you think happens to the variance of your heights?
It becomes 9 times larger
?
(Can you prove the latter using the formula for variance?)
The point of coding
Cost π‘₯ of diamond ring (£)
£1010 £1020 £1030 £1040 £1050
We ‘code’ our variable using the following:
π‘₯ − 1000
𝑦=
10
New values 𝑦:
£1 £2 £3
? £4
Standard deviation of 𝑦 (πœŽπ‘¦ ):
therefore…
Standard deviation of π‘₯ (𝜎π‘₯ ):
£5
𝟐
?
10 ?
2
Finding the new mean/variance
Old mean π‘₯
Old 𝜎π‘₯
Coding
New mean 𝑦
New πœŽπ‘¦
36
4
𝑦 = π‘₯ − 20
16
?
4?
36
?
8
?
𝑦 = 2π‘₯
72
16
35
4
𝑦 = 3π‘₯ − 20
85
?
12
?
20
3
2
?7
?9
40
5
40
?
3
?
11
27
300
?
25
?
π‘₯
𝑦=
2
π‘₯ + 10
𝑦=
3
π‘₯ − 100
𝑦=
5
Example Exam Question
Suppose we’ve
worked all these
out already.
f)
Coding is
π’•πŸ = π’•πŸ − πŸ“
Mean decreases by 5.
𝝈 unaffected.
Median decreases by 5.
Lower quartile decreases by 5.
Interquartile range unaffected.
?
Exercises
Page 26 Exercise 2E
Q3, 4
Page 47 Exercise 3E
Q2, 3, 5, 7
(Note: answers in book for Q2 are incorrect:
πœŽπ‘€ = 70.71
? therefore
?
𝜎 𝑀 = 7.071
10
?
πœŽπ‘€−200 = 70.71
πœŽπ‘€−10 = 0.354
?
200
Chapters 2-3 Summary
I have a list of 30 heights in the class. What item do I use for:
• 𝑄1 ?
• 𝑄2 ?
• 𝑄3 ?
?
8th
Between 15
?th and 16th
23rd
?
For the following grouped frequency table, calculate:
Height 𝒉 of bear (in metres)
0 ≤ β„Ž < 0.5
4
0.5 ≤ β„Ž < 1.2
20
1.2 ≤ β„Ž < 1.5
5
1.5 ≤ β„Ž < 2.5
11
a) The estimate mean: β„Ž =
b) The estimate median:
c) The estimate variance:
(you’re given Σπ‘“β„Ž2 = 67.8125)
Frequency
0.25 × 4 + 0.85 × 20 + β‹― 46.75
? = 40 = 1.17π‘š π‘‘π‘œ 3𝑠𝑓
40
16
0.5 +
× 0.7 = 1.06π‘š
?
20
67.8125
46.75
2
?
𝜎 =
−
40
40
2
= 0.329 π‘‘π‘œ 3𝑠𝑓
Chapters 2-3 Summary
What is the standard deviation of the following lengths: 1cm, 2cm, 3cm
𝜎2 =
14
2
− 22 =
3
3
?
𝜎=
2
3
The mean of a variable π‘₯ is 11 and the variance 4.
π‘₯+10
The variable is coded using 𝑦 = 3 . What is:
a) The mean of 𝑦?
b) The variance of 𝑦?
π’š = πŸ•?
πŸ’
πˆπŸπ’š = πŸ—?
A variable π‘₯ is coded using 𝑦 = 4π‘₯ − 5.
For this new variable 𝑦, the mean is 15 and the standard deviation 8.
What is:
a) The mean of the original data?
𝒙 = πŸ“?
b) The standard deviation of the original data? πˆπ’™ = ?
𝟐
Download