S1: Chapter 1 Data: Location

S1: Chapters 2-3
Data: Location and Spread
Dr J Frost (jfrost@tiffin.kingston.sch.uk)
Last modified: 5th September 2014
Types of variables
In statistics, we can use a variable such as $x$ to represent some quantity, e.g. height, age.
This could be qualitative (e.g. favourite colour) or quantitative (i.e. numerical).
Variables are often used differently in statistics than they are in algebra. Consider $\Sigma x$.
In statistics, this would mean:
“Sum over the values of the variable we’ve collected (i.e. our data).”
2 types of variable:
Discrete variables: have specific values.
e.g. shoe size, colour, website visits in an hour period, number of siblings, …
Continuous variables: can have any value in a range.
e.g. height, distance, weight, time, wavelength, …
Quartiles for large numbers of items
What item do we use for each quartile when $n = \dots$
Rule: Find $\frac{1}{4}$, $\frac{1}{2}$ or $\frac{3}{4}$ of $n$. Then:
• If not whole, round up.
• If whole, use this item and the one after.
n | LQ | Median | UQ
31 | 8th | 16th | 24th
19 | 5th | 10th | 15th
6 | 2nd | 3rd and 4th | 5th
14 | 4th | 7th and 8th | 11th
Under what circumstances do we not round?
When we have a grouped frequency table involving a continuous variable.
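The rounding rule above can be sketched in Python. This is only an illustration for listed (ungrouped) data; the function names `quartile_positions` and `quartile_value` are made up for this example and do not come from the slides.

```python
def quartile_positions(n, quarters):
    """Item number(s) to use for a quartile of n listed items.

    quarters is 1 for the LQ, 2 for the median, 3 for the UQ.
    """
    whole, remainder = divmod(n * quarters, 4)
    if remainder == 0:
        # Whole number: use this item and the one after.
        return (whole, whole + 1)
    # Not whole: round up.
    return (whole + 1,)


def quartile_value(sorted_data, quarters):
    """Average the value(s) at the chosen item number(s)."""
    positions = quartile_positions(len(sorted_data), quarters)
    values = [sorted_data[p - 1] for p in positions]  # item numbers start at 1
    return sum(values) / len(values)


print(quartile_positions(31, 1))  # LQ of 31 items
print(quartile_positions(14, 2))  # median of 14 items
```

Working in whole quarters (`n * quarters` divided by 4) keeps the "is it whole?" check exact, which a floating-point `n * 0.25` would not guarantee.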
Quickfire Quartiles
Data | LQ | Median | UQ
1, 2, 3 | 1 | 2 | 3
1, 2, 3, 4 | 1.5 | 2.5 | 3.5
1, 2, 3, 4, 5 | 2 | 3 | 4
1, 2, 3, 4, 5, 6 | 2 | 3.5 | 5
Notation for quartiles/percentiles
Lower Quartile: $Q_1$
Median: $Q_2$
Upper Quartile: $Q_3$
57th Percentile: $P_{57}$
Grouped Frequency Data Recap
This type of data is continuous.
Height h of bear (in metres) | Frequency
0 ≤ h < 0.5 | 4
0.5 ≤ h < 1.2 | 20
1.2 ≤ h < 1.5 | 5
1.5 ≤ h < 2.5 | 11
Estimate of Mean: $\bar{x} = \frac{\Sigma f x}{\Sigma f} = \frac{46.75}{40} = 1.17$ m
What does the variable $x$ represent?
The midpoints of each interval. They’re effectively a sensible single value used to represent each interval.
Why the ‘bar’ (horizontal line) over the $x$?
It’s the sample mean of $x$. It indicates that our mean is just based on a sample, rather than the whole population.
Why is our mean just an estimate?
Because we don’t know the exact heights within each group. Grouping data loses information.
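As a rough check of the estimate above, the midpoint calculation can be written out in Python. The `classes` list and the name `estimated_mean` are just labels chosen for this sketch.

```python
# (lower bound, upper bound, frequency) for each class of the bear-height table
classes = [(0.0, 0.5, 4), (0.5, 1.2, 20), (1.2, 1.5, 5), (1.5, 2.5, 11)]


def estimated_mean(classes):
    """Estimate the mean as (sum of f * midpoint) / (sum of f)."""
    total_f = sum(f for _, _, f in classes)
    total_fx = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
    return total_fx / total_f


print(round(estimated_mean(classes), 2))
```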
Grouped Frequency Data Recap
Height h of bear (in metres) | Frequency
0 ≤ h < 0.5 | 4
0.5 ≤ h < 1.2 | 20
1.2 ≤ h < 1.5 | 5
1.5 ≤ h < 2.5 | 11
Modal class interval: 0.5 ≤ h < 1.2
(‘modal’ means ‘most’)
Median class interval: 0.5 ≤ h < 1.2
There are 40 items, so determine where the 20th item is. The cumulative frequency is 4 after the first class and 24 after the second, so the 20th item lies in the second class.
Using STATS mode on your calculator
Height h of bear (in metres) | Frequency
0 ≤ h < 0.5 | 4
0.5 ≤ h < 1.2 | 20
1.2 ≤ h < 1.5 | 5
1.5 ≤ h < 2.5 | 11
Work out the mean for this example first using proper workings.
1. Go to SETUP (SHIFT → MODE). Press down for the second page of the menu, and select STAT. You want Frequency ‘ON’. (Note that you won’t have to do this again in future.)
2. MODE → STAT
3. Select 1-VAR (as there is only “1 variable” here; frequency is not a variable!)
4. Enter your x values, pressing = after each one. Navigate to the top of your table to enter your frequencies.
5. Press AC to ‘bank’ your table.
6. SHIFT → 1 for ‘STAT’. Select either ‘Sum’ or ‘Var’. Once you’ve selected a statistic to use, it’ll appear in your calculation. Once you want to calculate the value, press =. Try entering $\Sigma x \div n$. (For this example: 1.16875)
7. MODE → COMP to go back to normal computation mode.
Important note: Confusingly, your calculator means $\Sigma f x$ when you enter $\Sigma x$. And $n = \Sigma f$, i.e. it’s interpreting the data as if it was listed out with duplicates.
Warning: You still need to show working in the exam.
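The important note above can be illustrated in Python by actually listing the data out with duplicates. This is only a sketch of the idea, not of how the calculator works internally; the variable names are made up here.

```python
midpoints = [0.25, 0.85, 1.35, 2.0]   # x values entered in the table
frequencies = [4, 20, 5, 11]          # their frequencies

# List each midpoint out once per occurrence, as if the table were raw data.
expanded = [x for x, f in zip(midpoints, frequencies) for _ in range(f)]

n = len(expanded)                     # plays the role of the calculator's n
sum_x = sum(expanded)                 # plays the role of the calculator's "sum of x"
sum_fx = sum(f * x for x, f in zip(midpoints, frequencies))

print(n, round(sum_x, 2), round(sum_fx, 2), round(sum_x / n, 5))
```

Summing the expanded list gives the same total as $\Sigma f x$ from the table, which is why entering $\Sigma x \div n$ gives the estimated mean.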
What’s different about the intervals here?
Weight of cat to nearest kg | Frequency
10 − 12 | 7
13 − 15 | 2
16 − 18 | 9
19 − 20 | 4
There are GAPS between intervals!
What interval does this actually represent?
10 − 12 actually represents 9.5 − 12.5
Lower class boundary = 9.5
Upper class boundary = 12.5
Class width = 3
Identify the class width
Distance d travelled (in m)
Classes: 0 ≤ d < 150, 150 ≤ d < 200, 200 ≤ d < 210, …
Highlighted class: 200 ≤ d < 210
Lower class boundary = 200
Class width = 10

Time t taken (in seconds)
Classes: 0 − 3, 4 − 6, 7 − 11, …
Highlighted class: 4 − 6
Lower class boundary = 3.5
Class width = 3

Weight w in kg
Classes: 10 − 20, 21 − 30, 31 − 40, …
Highlighted class: 31 − 40
Lower class boundary = 30.5
Class width = 10

Speed s (in mph)
Classes: 10 ≤ s < 20, 20 ≤ s < 29, 29 ≤ s < 31, …
Highlighted class: 29 ≤ s < 31
Lower class boundary = 29
Class width = 2
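The boundary adjustment used above can be sketched in Python. The helper names and the `gap` parameter are assumptions made for this example: `gap` is the distance between the end of one class and the start of the next (1 for classes like 10 − 12, 13 − 15; 0 for classes written with inequalities).

```python
def class_boundaries(lower, upper, gap=1):
    """True boundaries of a class written as 'lower - upper'.

    Half of the gap between consecutive classes is given to each side.
    """
    half = gap / 2
    return (lower - half, upper + half)


def class_width(lower, upper, gap=1):
    """Width measured between the true class boundaries."""
    low, high = class_boundaries(lower, upper, gap)
    return high - low


print(class_boundaries(10, 12))        # weight of cat, 10 - 12
print(class_width(4, 6))               # time, 4 - 6
print(class_width(200, 210, gap=0))    # distance, 200 <= d < 210
```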
S2 – Chapters 2/3
Interpolation
RECAP: Quartiles of Frequency Table
Age of squirrel | Frequency | Cumulative Freq
1 | 5 | 5
2 | 8 | 13
3 | 11 | 24
4 | 5 | 29
Total frequency: 29
$Q_1$: 29 squirrels. $\frac{29}{4} = 7.25$, so look at the 8th squirrel. Occurs within second group, so $Q_1 = 2$
$Q_2$: $\frac{29}{2} = 14.5$, so use the 15th squirrel. Occurs in third group, so $Q_2 = 3$
$Q_3$: $\frac{3}{4} \times 29 = 21.75$, so use the 22nd squirrel. Still in third group, so $Q_3 = 3$
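The squirrel example can be reproduced with a short Python sketch that walks the cumulative frequencies. The name `quartile_from_table` is hypothetical, and the table is assumed to be sorted by value.

```python
def quartile_from_table(table, quarters):
    """Quartile of a (value, frequency) table; quarters is 1, 2 or 3."""
    n = sum(f for _, f in table)
    whole, remainder = divmod(n * quarters, 4)
    # Same rule as for listed data: whole -> this item and the next, else round up.
    positions = (whole, whole + 1) if remainder == 0 else (whole + 1,)

    def value_at(position):
        cumulative = 0
        for value, f in table:
            cumulative += f
            if position <= cumulative:
                return value
        raise ValueError("position is beyond the last item")

    values = [value_at(p) for p in positions]
    return sum(values) / len(values)


squirrels = [(1, 5), (2, 8), (3, 11), (4, 5)]  # (age, frequency)
print([quartile_from_table(squirrels, q) for q in (1, 2, 3)])
```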
Estimating the median
GCSE Question
Answer = 13.5 + 8 = 21.5
Estimating the median
At GCSE, you were only required to give the median class interval when dealing with
grouped data. Now, we want to estimate a value within that class interval.
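Weight of cat to nearest kg | Frequency
10 − 12 | 7
13 − 15 | 2
16 − 18 | 9
19 − 20 | 4
There are 22 items, so we want the $\frac{22}{2} = 11$th item. (Why not the 11.5th item?)
Median class: 16 − 18
Frequency up until this interval: 9
Item number we’re interested in: 11
Frequency at end of this interval: 18
Weight at start of interval: 15.5kg
Weight at end of interval: 18.5kg
The 11th item is 2 items into an interval containing 9 items, and the interval is 3kg wide:
Median $= 15.5 + \frac{2}{9} \times 3 = 16.17$ kg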
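The median calculation above follows a pattern that can be written as a general formula. The symbols here are labels chosen for this note, not notation from the slides: $L$ is the lower class boundary, $w$ the class width, $f$ the frequency of the class, $F$ the cumulative frequency before the class, and $k$ the item number we want.

```latex
\text{estimate} = L + \frac{k - F}{f} \times w
\qquad\text{e.g. median of the cat weights: } 15.5 + \frac{11 - 9}{9} \times 3 \approx 16.17
```

This assumes the items are spread evenly across the class, which is why the result is only an estimate.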
Estimating other values
Weight of cat to nearest kg | Frequency
10 − 12 | 7
13 − 15 | 2
16 − 18 | 9
19 − 20 | 4
LQ $= 9.5 + \frac{5.5}{7} \times 3 = 11.86$ kg
UQ $= 15.5 + \frac{7.5}{9} \times 3 = 18$ kg
34th Percentile $= 12.5 + \frac{0.48}{2} \times 3 = 13.22$ kg
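The three estimates above can be checked with a small Python sketch of linear interpolation. The function name `interpolate` and the `cats` list (which uses the true class boundaries, e.g. 9.5 to 12.5) are choices made for this example.

```python
def interpolate(classes, k):
    """Estimate the value of item number k (k need not be whole).

    classes: (lower boundary, upper boundary, frequency) in ascending order.
    Assumes items are spread evenly within each class.
    """
    cumulative = 0
    for lower, upper, f in classes:
        if f > 0 and k <= cumulative + f:
            return lower + (k - cumulative) / f * (upper - lower)
        cumulative += f
    raise ValueError("k exceeds the total frequency")


cats = [(9.5, 12.5, 7), (12.5, 15.5, 2), (15.5, 18.5, 9), (18.5, 20.5, 4)]
n = sum(f for _, _, f in cats)  # 22 cats

print(round(interpolate(cats, n / 4), 2))       # LQ
print(round(interpolate(cats, n / 2), 2))       # median
print(round(interpolate(cats, 3 * n / 4), 2))   # UQ
print(round(interpolate(cats, 0.34 * n), 2))    # 34th percentile
```

Note that the item number is not rounded here, in line with the earlier remark that we do not round for grouped continuous data.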
You should have a sheet in front of you
1a: $1000.5 + \frac{1}{29} \times 500 = 1017.74$ years
1b: $1000.5 + \frac{26}{29} \times 500 = 1448.78$ years
1c: $1700.5 + \frac{10}{35} \times 300 = 1786.21$ years
1d: Interquartile Range: $1786.21 - 1017.74 = 768.47$ years
2a: $40 + \frac{5.2}{17} \times 60 = 58.35$ cm
2b: $300 + \frac{6.8}{8} \times 300 = 555$ cm
2c: $555 - 58.35 = 496.65$ cm
Exercises
Page 34 Exercise 3A
Q4, 5, 6
Page 36 Exercise 3B
Q1, 3, 5
S2 – Chapters 2/3
Variance and Standard Deviation
What is variance?
[Figure: two frequency distributions, “Distribution of IQs in L6Ms5” and “Distribution of IQs in L6Ms4”; horizontal axis IQ (with 110 marked), vertical axis Frequency.]
Here are the distributions of IQs in two classes. What’s the same, and what’s different?
Variance
Variance measures how spread out the data is.
Variance, by definition, is the average squared distance from the mean.
$\sigma^2 = \frac{\Sigma (x - \bar{x})^2}{n}$
Distance from mean: $x - \bar{x}$
Squared distance from mean: $(x - \bar{x})^2$
Average squared distance from mean: $\frac{\Sigma (x - \bar{x})^2}{n}$
Simpler formula for variance
Variance
“The mean of the squares minus the square of the mean (‘msmsm’)”
$\text{Variance} = \frac{\Sigma x^2}{n} - \left(\frac{\Sigma x}{n}\right)^2$
Standard Deviation
$\sigma = \sqrt{\text{Variance}}$
The standard deviation can ‘roughly’ be thought of as the average distance from the mean.
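To see that the definition and the ‘msmsm’ formula agree, here is a minimal Python sketch. The data set `[4, 8, 6, 2]` is made up purely for illustration, and the function names are not from the slides.

```python
from math import sqrt


def variance_definition(data):
    """Average squared distance from the mean."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / n


def variance_msmsm(data):
    """Mean of the squares minus the square of the mean."""
    n = len(data)
    return sum(x * x for x in data) / n - (sum(data) / n) ** 2


def standard_deviation(data):
    """Square root of the variance."""
    return sqrt(variance_msmsm(data))


data = [4, 8, 6, 2]  # illustrative values only
print(variance_definition(data), variance_msmsm(data))
```

Both functions divide by $n$, matching the formulas above, rather than by $n - 1$ as some statistics libraries do by default.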
Starter
Calculate the variance and standard deviation of the following heights:
2cm 3cm 3cm 5cm 7cm
Variance $= 19.2 - 4^2 = 3.2$ cm²
Standard Deviation $= \sqrt{3.2} = 1.79$ cm
Practice
Find the variance and standard deviation of the following sets of data.
2 4 6
Variance = 2.67
Standard Deviation = 1.63
1 2 3 4 5
Variance = 2
Standard Deviation = 1.41
Extending to frequency/grouped frequency tables
We can just mull over our mnemonic again:
Variance: “The mean of the squares minus the square of the mean (‘msmsm’)”
$\text{Variance} = \frac{\Sigma f x^2}{\Sigma f} - \left(\frac{\Sigma f x}{\Sigma f}\right)^2$
Bro Tip: It’s better to try and memorise the mnemonic than the formula
itself – you’ll understand what’s going on better, and the mnemonic will be
applicable when we come onto random variables in Chapter 8.
Example
Height h of bear (in metres) | Frequency
0 ≤ h < 0.5 | 4
0.5 ≤ h < 1.2 | 20
1.2 ≤ h < 1.5 | 5
1.5 ≤ h < 2.5 | 11
$\Sigma f x = 46.75$
$\Sigma f x^2 = 67.81$
$\Sigma f = 40$
$\text{Variance} = \frac{67.81}{40} - \left(\frac{46.75}{40}\right)^2 = 0.33$
Sometimes we’re helpfully given summed data:
Shoe Size t | Frequency
10 | 7
11 | 2
12 | 9
13 | 4
$\Sigma f t = 252$
$\Sigma f t^2 = 2914$
$\Sigma f = 22$
$\text{Variance} = \frac{2914}{22} - \left(\frac{252}{22}\right)^2 = 1.25$
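Both variance examples above can be checked with a short Python sketch. The function names are hypothetical; `variance_from_sums` takes the three sums directly, as when they are given in a question.

```python
def variance_from_sums(sum_fx2, sum_fx, sum_f):
    """Mean of the squares minus the square of the mean, from summed data."""
    return sum_fx2 / sum_f - (sum_fx / sum_f) ** 2


def variance_from_table(table):
    """Variance of a (value, frequency) table."""
    sum_f = sum(f for _, f in table)
    sum_fx = sum(f * x for x, f in table)
    sum_fx2 = sum(f * x * x for x, f in table)
    return variance_from_sums(sum_fx2, sum_fx, sum_f)


shoes = [(10, 7), (11, 2), (12, 9), (13, 4)]  # (shoe size, frequency)
print(round(variance_from_table(shoes), 2))
print(round(variance_from_sums(67.8125, 46.75, 40), 3))  # bear heights
```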
Exercises
Page 40 Exercise 3C
Q1, 2, 4, 6
Page 44 Exercise 3D
Q1, 4, 5
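Recap
$\Sigma x = 10$, $\Sigma x^2 = 50$, $n = 5$: $\sigma^2 = 6$
$\Sigma y = 20$, $\Sigma y^2 = 100$, $n = 5$: $\sigma^2 = 4$
$\Sigma f x = 100$, $\Sigma f x^2 = 1000$, $\Sigma f = 10$: $\sigma^2 = 0$
$\Sigma f t = 20$, $\Sigma f t^2 = 400$, $\Sigma f = 4$: $\sigma^2 = 75$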
S2 – Chapters 2/3
Coding
Starter
What do you reckon is the mean height of people in this room?
Now, stand on your chair, as per the instructions below.
INSTRUCTIONAL VIDEO
Is there an easy way to recalculate the mean based on your new
heights? And the variance of your heights?
Starter
Suppose now after a bout of ‘stretching you to your limits’, you’re
now all 3 times your original height.
What do you think happens to the standard deviation of your heights?
It becomes 3 times larger (i.e. your heights are 3 times as spread out!)
What do you think happens to the variance of your heights?
It becomes 9 times larger.
(Can you prove the latter using the formula for variance?)
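One possible way to answer the last question, using the ‘msmsm’ formula and writing $y = 3x$ for the new heights (the symbol $y$ is introduced here just for this working):

```latex
\sigma_y^2
  = \frac{\Sigma (3x)^2}{n} - \left(\frac{\Sigma 3x}{n}\right)^2
  = \frac{9\,\Sigma x^2}{n} - 9\left(\frac{\Sigma x}{n}\right)^2
  = 9\left[\frac{\Sigma x^2}{n} - \left(\frac{\Sigma x}{n}\right)^2\right]
  = 9\,\sigma_x^2
```

Taking square roots gives $\sigma_y = 3\sigma_x$, which matches the standard deviation becoming 3 times larger.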
The point of coding
Cost $x$ of diamond ring (£)
£1010 £1020 £1030 £1040 £1050
We ‘code’ our variable using the following:
$y = \frac{x - 1000}{10}$
New values $y$:
£1 £2 £3 £4 £5
Standard deviation of $y$ ($\sigma_y$): $\sqrt{2}$
therefore…
Standard deviation of $x$ ($\sigma_x$): $10\sqrt{2}$
Finding the new mean/variance
Old mean $\bar{x}$ | Old variance | Coding | New mean $\bar{y}$ | New variance
36 | 4 | $y = x - 20$ | 16 | 4
36 | 4 | $y = 2x$ | 72 | 16
35 | 4 | $y = 3x - 20$ | 85 | 36
40 | 6 | $y = \frac{x}{2}$ | 20 | $\frac{3}{2}$
11 | 27 | $y = \frac{x + 10}{3}$ | 7 | 3
300 | 125 | $y = \frac{x - 100}{5}$ | 40 | 5
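The rows of the table above all follow the same pattern, which can be sketched in Python by writing every coding as $y = ax + b$. The function name `code_statistics` and the parameters `a` and `b` are assumptions made for this example; `Fraction` is used so that codings such as $\frac{x + 10}{3}$ stay exact.

```python
from fractions import Fraction


def code_statistics(mean_x, variance_x, a, b):
    """Mean and variance of y = a*x + b, given the mean and variance of x.

    Adding b shifts the mean only; multiplying by a scales the mean by a
    and the variance by a squared.
    """
    return (a * mean_x + b, a * a * variance_x)


print(code_statistics(36, 4, 1, -20))                           # y = x - 20
print(code_statistics(35, 4, 3, -20))                           # y = 3x - 20
print(code_statistics(11, 27, Fraction(1, 3), Fraction(10, 3))) # y = (x + 10)/3
```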
Exercises
Page 26 Exercise 2E
Q3, 4
Page 47 Exercise 3E
Q2, 3, 5, 7
Chapters 2-3 Summary
I have a list of 30 heights in the class. What item do I use for:
• $Q_1$? 8th
• $Q_2$? Between 15th and 16th
• $Q_3$? 23rd
For the following grouped frequency table, calculate:
Height h of bear (in metres) | Frequency
0 ≤ h < 0.5 | 4
0.5 ≤ h < 1.2 | 20
1.2 ≤ h < 1.5 | 5
1.5 ≤ h < 2.5 | 11
a) The estimated mean: $\bar{h} = \frac{0.25 \times 4 + 0.85 \times 20 + \dots}{40} = \frac{46.75}{40} = 1.17$ m to 3sf
b) The estimated median: $0.5 + \frac{16}{20} \times 0.7 = 1.06$ m
c) The estimated variance (you’re given $\Sigma f h^2 = 67.8125$): $\sigma^2 = \frac{67.8125}{40} - \left(\frac{46.75}{40}\right)^2 = 0.329$ to 3sf
Chapters 2-3 Summary
What is the standard deviation of the following lengths: 1cm, 2cm, 3cm
$\sigma^2 = \frac{14}{3} - 2^2 = \frac{2}{3}$
$\sigma = \sqrt{\frac{2}{3}}$
The mean of a variable $x$ is 11 and the variance 4.
The variable is coded using $y = \frac{x + 10}{3}$. What is:
a) The mean of $y$? $\bar{y} = 7$
b) The variance of $y$? $\sigma_y^2 = \frac{4}{9}$
A variable $x$ is coded using $y = 4x - 5$.
For this new variable $y$, the mean is 15 and the standard deviation 8.
What is:
a) The mean of the original data? $\bar{x} = 5$
b) The standard deviation of the original data? $\sigma_x = 2$
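The last question runs the coding backwards. A minimal Python sketch, assuming the coding is written as $y = ax + b$ (the name `decode_statistics` and the parameters `a`, `b` are made up for this example):

```python
def decode_statistics(mean_y, sd_y, a, b):
    """Recover the mean and standard deviation of x from those of y = a*x + b."""
    mean_x = (mean_y - b) / a   # undo the shift, then undo the scaling
    sd_x = sd_y / abs(a)        # the shift b does not affect spread
    return (mean_x, sd_x)


print(decode_statistics(15, 8, 4, -5))  # y = 4x - 5
```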