Uploaded by elizabeth.rogers438

Year 9 Maths Statistics Notes 2022

Achievement Standards:
2.1 Students compare techniques for collecting data from primary and secondary sources
2.2 Identify questions and issues involving different data types
2.3 Students construct histograms and back-to-back stem-and-leaf plots with and without the use of digital technology
2.4 Students make sense of the position of the mean and median in skewed, symmetric and bi-modal displays to describe and interpret data
Statistical investigations have 4 main stages:
1. Posing a question.
2. Collecting data.
3. Analysing data.
4. Interpreting results.
Data Sources – Achievement Standard 2.1
Learning Intentions:

I can collect data directly and from secondary sources
There are 2 main types of data that can be obtained:
 primary data - data that has been collected from the original
source for a specific purpose, for example, if a school wanted to know
what their students thought of the school canteen service they would
question the pupils directly
 secondary data - data that was originally collected by someone else,
often for a different purpose, for example, finding out the average cost of cars
in a car park by using national statistics
Collection of data – Achievement Standard 2.2
Learning Intentions:
 I can identify/describe how data was collected
Data can be obtained via survey, observation, and experiment, although in practice a combination of these methods
may be used
A population is the complete set of individuals, objects or places in question
A census is an attempt to collect information about the whole population.
When the population is too large or it is too difficult to carry out a census due to time or monetary considerations,
data is collected from a representative sample instead
When you choose a sample from a population you
need to make sure that it has been selected fairly
In a random sample, every member of the
population has an equal chance of being selected
There are 4 main methods to choose
members of the population to sample
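As a minimal sketch of a random sample, the Python below picks 5 students from a made-up class list so that every student has an equal chance of being chosen. The function name `simple_random_sample`, the student ID numbers and the seed value are all assumptions for illustration only.

```python
import random

def simple_random_sample(population, sample_size, seed=None):
    """Pick sample_size distinct members, each equally likely to be chosen."""
    rng = random.Random(seed)  # a fixed seed makes the same sample repeatable
    return rng.sample(population, sample_size)

# Hypothetical population: student ID numbers 1 to 30
students = list(range(1, 31))
sample = simple_random_sample(students, 5, seed=2022)
print(sample)
```

Leaving `seed` as `None` gives a different sample each time the code is run.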
Types of Data – Achievement Standard 2.2
Learning Intentions:
● I can recognise the difference between numerical and categorical data
Numerical data is number data. This type of data is either:


Discrete: whole number values which are usually found by counting.
Continuous: values that could be any number in an interval which are usually found by measuring.
Categorical data puts the data into non-numerical categories using words or labels. This type of data is either:


Nominal : involves naming or identifying data
Ordinal: involves placing information into an order.
How we organise and present our data depends on the type of data that we are dealing with.
Numerical data can be organised and presented in the following ways:
● Frequency/Tally tables
● Bar/Column graphs (Discrete data)
● Histograms (Continuous data)
● Dot plots (Discrete data)
● Stem and Leaf Plots
Categorical data can be organised and presented in the following ways:
● Frequency/Tally tables
● Bar/Column graphs
● Pie Charts
● Line / Dot plots
From this list we will only be focusing on:
● Frequency/Tally tables
● Stem and Leaf Plots
● Back to back stem and leaf plots
● Histograms
Presenting Data Frequency Tables
Learning Intentions:
 I can organise and present data using Frequency and Tally tables
Frequency tables show the number of pieces of data that
fall within given intervals.
A tally is a tool used for counting as results are gathered.
Numbers are written as vertical lines, with every 5th line drawn as a cross
through the previous group of four lines.
Frequency tables show how common a certain value is in a frequency
column. A tally column is also often used as data is gathered.
The items can be individual values or intervals of values
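The tally and frequency columns described above can be sketched in Python. This is only an illustration: the pet data is made up, the helper names `tally_marks` and `frequency_table` are assumptions, and `||||/` is used as a plain-text stand-in for four lines with a cross through them.

```python
from collections import Counter

def tally_marks(count):
    """Write a count as tally marks: one crossed group per five, then leftovers."""
    fives, ones = divmod(count, 5)
    groups = ["||||/"] * fives  # "||||/" stands in for a crossed group of five
    if ones:
        groups.append("|" * ones)
    return " ".join(groups)

def frequency_table(data):
    """Return (value, tally, frequency) rows in increasing order of value."""
    counts = Counter(data)
    return [(value, tally_marks(counts[value]), counts[value])
            for value in sorted(counts)]

# Hypothetical data: number of pets owned by 12 students
pets = [0, 1, 1, 2, 0, 1, 3, 1, 2, 1, 0, 1]
for value, tally, freq in frequency_table(pets):
    print(f"{value:<6}{tally:<12}{freq}")
```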
Construct Stem and Leaf Plots –
Learning Intentions:
 I can construct a stem and leaf plot
A stem and leaf plot is a special table where each data value is split into a "stem" (the first digit or digits) and a "leaf"
(usually the last digit).
The "stem" values are listed down, and the "leaf" values go right from the stem values.
The "stem" is used to group the scores and can consist of any number of digits;
Each "leaf" shows the individual scores within each group and is always a single digit
In a stem-and-leaf plot, the data are organised from least to greatest, so the leaves need to be in order.
Stem and leaf plots are similar to a horizontal bar graph, but the actual numbers are used instead of bars.
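A short Python sketch of the construction, assuming whole-number data of zero or more and a single-digit leaf. The test scores and the function name `stem_and_leaf` are made up for illustration.

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Return {stem: [leaves]} using the last digit as the leaf."""
    plot = defaultdict(list)
    for value in sorted(data):           # sorting first keeps the leaves in order
        stem, leaf = divmod(value, 10)   # e.g. 38 -> stem 3, leaf 8
        plot[stem].append(leaf)
    lowest, highest = min(plot), max(plot)
    # List every stem in the range, even one that has no leaves
    return {stem: plot.get(stem, []) for stem in range(lowest, highest + 1)}

# Hypothetical test scores
scores = [23, 25, 31, 38, 38, 40, 47, 52, 12, 29]
for stem, leaves in stem_and_leaf(scores).items():
    print(f"{stem} | {' '.join(map(str, leaves))}")
```

Reading across a row gives back the original values, for example stem 3 with leaves 1 8 8 represents 31, 38 and 38.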
Construct Back to Back Stem and Leaf Plots – Achievement Standard 2.3
Learning Intentions:
 I can construct a back-to-back stem and leaf plot
This year we are looking at back-to-back stemplots to allow easy comparisons between sets of data. The stem is drawn
in the middle with leaves on either side.
When constructing a back-to-back stemplot, the right side is constructed as normal, the stemplot to the left is
constructed as a mirror image with the leaves increasing in value as they are read right to left.
Example:
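As a sketch with hypothetical data, the Python below builds a back-to-back stemplot for two made-up classes, `class_a` on the left and `class_b` on the right. It assumes whole numbers of zero or more. The left leaves are written largest first so that they increase when read right to left from the stem.

```python
def back_to_back(left_data, right_data):
    """Return rows of (left leaves, stem, right leaves) for two data sets."""
    everything = left_data + right_data
    rows = []
    for stem in range(min(everything) // 10, max(everything) // 10 + 1):
        left = sorted(v % 10 for v in left_data if v // 10 == stem)
        right = sorted(v % 10 for v in right_data if v // 10 == stem)
        # Reverse the left side so it mirrors the right side about the stem
        rows.append((left[::-1], stem, right))
    return rows

# Hypothetical scores for two classes
class_a = [12, 15, 21, 24, 24, 33]
class_b = [18, 22, 27, 31, 35, 39]
for left, stem, right in back_to_back(class_a, class_b):
    left_text = " ".join(map(str, left))
    right_text = " ".join(map(str, right))
    print(f"{left_text:>8} | {stem} | {right_text}")
```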
Construct Histograms – Achievement Standard 2.3
Learning Intentions:
 I can construct comparative histograms
When displaying continuous data graphically, histograms are a clear way of showing the frequency of each group.
A histogram is a chart that plots the distribution of a numeric variable’s values as a series of bars. Each bar typically covers
a range of numeric values called a bin or class; a bar’s height indicates the frequency of data points with a value within
the corresponding bin.
A histogram is a special kind of bar graph that uses bars to represent the frequency
of numerical data that have been organised into intervals.
Because the intervals are all equal, all of the bars have the same width
Because the intervals are continuous (connected; ongoing), there is no
space between the bars
Example: A class of 24 students were asked how long it takes them to get to school in minutes.
The following data was collected:
15 10 21 13 22 23 17 19 25 31 27 32 35 42 14 12 26 18 34 19 28 30 17 8
(1) Construct a frequency/tally table for the data
Number     Tally     Frequency
0 – 9                1
10 – 19              10
20 – 29              7
30 – 39              5
40 – 49              1
Total                24
(2) Construct a histogram of the data
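The grouping step for this example can be checked with a short Python sketch. It counts the travel times into equal intervals of width 10 and prints a sideways text histogram, with one `#` per student. The function name `bin_counts` is an assumption, and the code expects whole numbers no smaller than `start`.

```python
def bin_counts(data, width, start=0):
    """Count how many values fall in each equal-width interval."""
    n_bins = (max(data) - start) // width + 1
    counts = [0] * n_bins
    for value in data:
        counts[(value - start) // width] += 1
    return counts

# Travel times in minutes from the example above
times = [15, 10, 21, 13, 22, 23, 17, 19, 25, 31, 27, 32,
         35, 42, 14, 12, 26, 18, 34, 19, 28, 30, 17, 8]
counts = bin_counts(times, width=10)
for i, count in enumerate(counts):
    low = i * 10
    print(f"{low:>2}-{low + 9:<3} {'#' * count} ({count})")
```

Because every interval has the same width, the rows are equally thick and sit directly next to each other, just like the bars of a histogram.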
Shape of Data – Achievement Standard 2.4
Learning Intentions:

I can describe the shape of data using the terms ‘skewed’, ‘symmetric’ and ‘bimodal’
Shape of the distribution
Once data has been organised, we can describe the shape of the distribution. The shape of the data can give us
another insight into where most of the information lies and gives us more information to make informed
analysis.
The shape of a distribution indicates the range and pattern of the distribution of a data set.
The distribution shape of quantitative data can be described because there is a logical order to the values, and the
'low' and 'high' end values on the x-axis of the histogram are able to be identified.
A distribution of data item values may be symmetrical or asymmetrical.
Two common examples of symmetry and asymmetry are the 'normal distribution' and the 'skewed
distribution'.
A normal distribution is a true symmetric distribution of observed values.
Skewness is the tendency for the values to be more frequent around the high or low ends of the x-axis.
The shape can be described in 4 ways:
Symmetrical: The data is balanced (symmetrical) about a centre line
Negatively Skewed: Most of the data are clustered towards the higher end of the scale and there is a “tail” of data
values towards the lower end of the scale
Positively Skewed: Most of the data are clustered towards the lower end of the scale and there is a “tail” of data
values towards the higher end of the scale
Bimodal: There are two clear “peaks” in the data, which implies that there are two modes (most frequently
occurring data values/groups)
Measures of Centre –
Learning Intentions:

I can calculate the mean, median, mode and range of data sets
Outliers
Once data has been organised, we can begin to analyse the information. One of the first things that might be noticed is
the presence of outliers.
An outlier is a data value, generally a single one, that lies well away from the rest of the data. You will notice a gap
between this data value and the rest of the data.
Measures of centre
Measures of centre are the statistical averages:
Mean: The sum of the data divided by the number of data values.
$\text{mean} = \frac{\text{sum of the data}}{\text{number of data values}}$
Median: the middle data value from a set of ordered data values. The position of the middle data value can be found
using the rule $\frac{n+1}{2}$, where $n$ is the number of data values in the set.
Note: When calculating the median, we must be careful.
● If there is an odd number of data values, then the median is the middle number.
● If there is an even number of data values, then the median is the mean of the middle 2 data values.
Mode / Modal class: the most frequently occurring data value/group/class.
Example:
Determine the measures of centre and spread for the following set of data:
2 4 4 5 6 6 7 7 7 8 9 9 10
$\text{Mean} = \frac{84}{13} = 6.46$
$\text{Median} = 7$
$\text{Mode} = 7$
Example:
Determine the measures of centre and spread for the following set of data:
3 4 4 5 5 6 6 7 7 7 9 9 10 11
$\text{Mean} = \frac{93}{14} = 6.64$
$\text{Median} = \frac{6+7}{2} = 6.5$
$\text{Mode} = 7$
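The two worked examples can be checked with Python's built-in `statistics` module. This is just one way to verify the hand calculations. The names `odd_set`, `even_set` and `measures_of_centre` are assumptions, and the mean is rounded to 2 decimal places.

```python
from statistics import mean, median, mode

def measures_of_centre(data):
    """Return the mean (rounded to 2 d.p.), median and mode of a data set."""
    return {
        "mean": round(mean(data), 2),
        "median": median(data),  # averages the middle two values when n is even
        "mode": mode(data),
    }

odd_set = [2, 4, 4, 5, 6, 6, 7, 7, 7, 8, 9, 9, 10]        # 13 values
even_set = [3, 4, 4, 5, 5, 6, 6, 7, 7, 7, 9, 9, 10, 11]   # 14 values

print(measures_of_centre(odd_set))
print(measures_of_centre(even_set))
```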
Finding Measures of Centre from a table
Example: Determine the measures of centre and spread for the following set of data displayed in a frequency table:
Number   Frequency   Product        Cumulative Frequency
2        3           2 × 3 = 6      3
3        7           3 × 7 = 21     3 + 7 = 10
4        9           36             10 + 9 = 19
5        4           20             19 + 4 = 23
6        3           18             26
7        2           14             28
8        1           8              29
9        1           9              30
10       1           10             31
Total    31          142
To find the mean from a frequency table, we need to add all the data values first. This can be made easier from the
table using a product column. The product column is the 𝑑𝑎𝑡𝑎 𝑣𝑎𝑙𝑢𝑒 × 𝑓𝑟𝑒𝑞𝑢𝑒𝑛𝑐𝑦. Then the total of the product
column is the sum of the data values.
$\text{Mean} = \frac{2 \times 3 + 3 \times 7 + 4 \times 9 + \dots + 10 \times 1}{31} = \frac{142}{31} = 4.58$
To find the median from a frequency table, we need to know where the middle data value would be. We can use the
formula $\frac{n+1}{2}$ to find the middle, where $n$ is the number of data values. For this set of data, the middle data value
would be in position $\frac{31+1}{2} = 16$, so it is the 16th value. Using the cumulative frequency column, we can work out where the 16th
number would be. The cumulative frequency is 10 after the 3s and 19 after the 4s, so the 16th value is a 4.
𝑀𝑒𝑑𝑖𝑎𝑛 = 4
𝑀𝑜𝑑𝑒 = 4
𝑅𝑎𝑛𝑔𝑒 = 10 − 2 = 8
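The frequency table method can be sketched in Python as below. The table is stored as a dictionary from data value to frequency, and the helper names `value_at` and `stats_from_table` are assumptions. If two values tie for the highest frequency, this sketch simply reports the first one it finds.

```python
def value_at(table, position):
    """Find the data value at a given position using cumulative frequency."""
    cumulative = 0
    for value in sorted(table):
        cumulative += table[value]
        if cumulative >= position:
            return value

def stats_from_table(table):
    """table maps each data value to its frequency."""
    n = sum(table.values())
    total = sum(value * freq for value, freq in table.items())  # product column total
    lower = value_at(table, (n + 1) // 2)   # middle value (odd n) or lower middle (even n)
    upper = value_at(table, n // 2 + 1)     # middle value (odd n) or upper middle (even n)
    return {
        "mean": round(total / n, 2),
        "median": (lower + upper) / 2,
        "mode": max(table, key=table.get),
        "range": max(table) - min(table),
    }

# Frequency table from the example above
table = {2: 3, 3: 7, 4: 9, 5: 4, 6: 3, 7: 2, 8: 1, 9: 1, 10: 1}
print(stats_from_table(table))
```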
Measures of Spread Range and Interquartile Range – Achievement Standard 2.4
Learning Intentions:
 I can use the range and interquartile range to describe a set of data
Measures of spread give a measure of the variation in the data, or how wide the data is:
Range: The full width of the data.
𝑅𝑎𝑛𝑔𝑒 = 𝑀𝑎𝑥𝑖𝑚𝑢𝑚 𝑣𝑎𝑙𝑢𝑒 − 𝑚𝑖𝑛𝑖𝑚𝑢𝑚 𝑣𝑎𝑙𝑢𝑒
Example:
Determine the measures of spread for the following set of data:
3 4 4 5 5 6 6 7 7 7 9 9 10 11
𝑅𝑎𝑛𝑔𝑒 = 11 − 3 = 8
Interquartile Range: The width of the middle 50% of the data.
𝐼𝑛𝑡𝑒𝑟𝑞𝑢𝑎𝑟𝑡𝑖𝑙𝑒 𝑅𝑎𝑛𝑔𝑒 = 𝑄𝑢𝑎𝑟𝑡𝑖𝑙𝑒 3 − 𝑄𝑢𝑎𝑟𝑡𝑖𝑙𝑒 1
𝐼𝑄𝑅 = 𝑄3 − 𝑄1
where 𝑄1 = 𝑄𝑢𝑎𝑟𝑡𝑖𝑙𝑒 1, the median of the lower half of the data
𝑄3 = 𝑄𝑢𝑎𝑟𝑡𝑖𝑙𝑒 3, the median of the upper half of the data
Quartiles: Quartiles divide the data into 4 (approximately) equal parts.
25% of scores lie below the lower quartile Q1
25% of scores lie between Q1 and the median (Q2)
25% of scores lie in between the median and the upper quartile Q3
25% of scores lie above Q3
Example:
Determine the measures of spread for the following set of data:
2 4 4 5 6 6 7 7 7 8 9 9 10
𝑅𝑎𝑛𝑔𝑒 = 10 − 2 = 8
𝑄1 = 4.5
𝑄3 = 8.5
∴ 𝐼𝑄𝑅 = 8.5 − 4.5 = 4
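A Python sketch of the quartile method used in this example: Q1 is the median of the lower half and Q3 is the median of the upper half. When there is an odd number of values, the middle value is left out of both halves. That is an assumption, but it reproduces the quartiles in the worked example. The function name `quartiles` is also an assumption.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    ordered = sorted(data)
    half = len(ordered) // 2
    lower = ordered[:half]    # values below the median position
    upper = ordered[-half:]   # values above the median position
    return median(lower), median(ordered), median(upper)

data = [2, 4, 4, 5, 6, 6, 7, 7, 7, 8, 9, 9, 10]
q1, q2, q3 = quartiles(data)
data_range = max(data) - min(data)
iqr = q3 - q1
print("Range:", data_range, "Q1:", q1, "Q3:", q3, "IQR:", iqr)
```

Other quartile methods exist (calculators and spreadsheets may use a slightly different rule), so small differences in Q1 and Q3 are possible.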
Interpreting Measures of Centre – Achievement Standard 2.4
Learning Intentions:
 I can use the appropriate measure of centre to draw conclusions
Important Note: There are advantages and disadvantages of using the mean or median as a measure of the centre
(average) of the data. When mean and median are compared to each other, they can also give an indication of whether
the data is symmetric or skewed and which way the data is skewed.
Mean
Advantage: Includes all the data values and so considers all of the information
Disadvantage: Is influenced by extreme data values (outliers)
Median
Advantage: Is not influenced by extreme data values (outliers)
Disadvantage: Doesn’t consider the value of the data values, only the position of the values
Symmetric Shape: Mean and median are approximately equal
Positively Skewed: Mean is higher than median
Negatively Skewed: Mean is lower than median
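The comparison of mean and median can be sketched as a rough check in Python. This only gives an indication of shape and should be used together with a graph of the data. The function name `describe_skew`, the `tolerance` used to decide what counts as "approximately equal", and the three data sets are all assumptions for illustration.

```python
from statistics import mean, median

def describe_skew(data, tolerance=0.1):
    """Suggest a shape by comparing the mean with the median."""
    gap = mean(data) - median(data)
    if abs(gap) <= tolerance:
        return "approximately symmetric"
    # Mean above the median suggests a tail towards the higher values
    return "positively skewed" if gap > 0 else "negatively skewed"

# Hypothetical data sets
print(describe_skew([1, 2, 3, 4, 5]))                    # mean 3, median 3
print(describe_skew([1, 2, 2, 3, 3, 3, 4, 20]))          # one very high value
print(describe_skew([1, 17, 18, 18, 19, 19, 19, 20]))    # one very low value
```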
Comparing Data Measures of Centre – Achievement Standard 2.4
Learning Intentions:

I can compare means, medians and ranges of two sets of numerical data
Comparing Back to Back Stem and Leaf Plots – Achievement Standard 2.4
Learning Intentions:
 I can use back-to-back stem and leaf plots to compare two data sets
Comparing Data using the Range and Interquartile Range – Achievement Standard 2.4
● I can use the range and interquartile range to describe a set of data
● I can use the range and interquartile range to draw conclusions
Consider the following two distributions of exam scores:
Note: Both distributions have a median of 74.5.
Which distribution has more variability?
For Class Set A, the range is 95 – 40 = 55. For Class Set B, the range is also 95 – 40 = 55.
If we use the range to measure variability, we say the distributions have the same amount of variability.
But the variability in the distributions differs when we look at how the data is distributed about the median.
Set A has a large portion of its data close to the median. This is not true for Set B.
From this viewpoint, Set A has less variability about the median.
Quartile marks divide the data set into four subgroups with the same number of individuals in each subgroup.
Class A: Q1: 71 Q2: 74.5 Q3: 78.5
Class B: Q1: 61 Q2: 74.5 Q3: 89
Some quartiles exhibit more variability in the data even though each quartile contains the same amount of data.
Comments about the variability in the data for Class Set A:
● Looking at the first quartile (Q1): 25% of the scores in Class A are between 40 and 71.
● There is a lot of variability in Q1. The 8 scores in Q1 vary by 30 points.
● Looking at the third quartile (Q3): 25% of the scores in Class A are between 74.5 and 78.5.
● There is not much variability in Q3. The 8 scores in Q3 vary by only 4 points.
Class A: IQR = Q3 – Q1 = 78.5 – 71 = 7.5
Class B: IQR = Q3 – Q1 = 89 – 61 = 28
● Class A has less variability about its median. Its IQR is much smaller.
● The middle 50% of exam scores for Class A vary by only 7.5 points.
● The middle 50% of exam scores for Class B vary by 28 points.
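The same comparison can be written as a few lines of Python, using only the quartile marks listed for each class (the individual exam scores are not needed for the IQR). The names `quartile_marks`, `iqr` and `least_variable` are assumptions for illustration.

```python
def iqr(q1, q3):
    """Interquartile range: the width of the middle 50% of the data."""
    return q3 - q1

# Quartile marks (Q1, Q2, Q3) for each class, taken from the notes above
quartile_marks = {
    "Class A": (71, 74.5, 78.5),
    "Class B": (61, 74.5, 89),
}

spreads = {name: iqr(q1, q3) for name, (q1, _, q3) in quartile_marks.items()}
least_variable = min(spreads, key=spreads.get)  # smallest IQR = least spread about the median

for name, spread in spreads.items():
    print(f"{name}: IQR = {spread}")
print("Less variability about the median:", least_variable)
```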
Interpret Comparing and Reporting on Data – Achievement Standard 2.4
Learning Intentions

I can describe and interpret data as part of a statistical report
When working with data, we are trying to find an answer to a question. By collecting data,
organising/displaying/calculating the data, we are able to draw conclusions and make decisions. This is done through
preparing a statistical report.
The form of a statistical report:
● Introduction – stating the variable to be investigated
● Collect the Data
● Organise and graph the data – frequency table, stemplot, column graph/histogram
● Comment on the shape of the data – symmetrical, positively/negatively skewed, bimodal, any outliers
● Calculate the measures of centre (mean & median) and spread (range)
● Analyse/Assumptions
● Conclusion
Writing the analysis, discussion of the assumptions and conclusion can often be the most challenging part of a statistical
report. So, we will work through an analysis, assumptions discussion and conclusion together as an example of how to
work through the results.
We will use the TEEL (Topic, Evidence, Elaboration, Link) approach to write the analysis/assumptions.