Year 9 Maths Statistics Notes 2022

Achievement Standards: 2.1, 2.2, 2.3, 2.4
● Students compare techniques for collecting data from primary and secondary sources.
● Students identify questions and issues involving different data types.
● Students construct histograms and back-to-back stem-and-leaf plots with and without the use of digital technology.
● Students make sense of the position of the mean and median in skewed, symmetric and bimodal displays to describe and interpret data.

Statistical investigations have 4 main stages:
1. Posing a question.
2. Collecting data.
3. Analysing data.
4. Interpreting results.

Data Sources – Achievement Standard 2.1
Learning Intentions: I can collect data directly and from secondary sources.

There are 2 main types of data that can be obtained:
● Primary data: data collected directly from the original source for a specific purpose. For example, if a school wanted to know what its students thought of the school canteen service, it would question the students directly.
● Secondary data: data that was not collected by the group using it, but was originally gathered by someone else, often for a different purpose. For example, finding out the average cost of cars in a car park by using national statistics.

Collection of Data – Achievement Standard 2.2
Learning Intentions: I can identify/describe how data was collected.

Data can be obtained via survey, observation or experiment, although in practice a combination of these methods may be used.
A population is the complete set of individuals, objects or places in question.
A census is an attempt to collect information about the whole population.
When the population is too large, or a census is too difficult to carry out because of time or cost, data is collected from a representative sample instead. When you choose a sample from a population, you need to make sure that it has been selected fairly. In a random sample, every member of the population has an equal chance of being selected.
There are 4 main methods to choose members of the population to sample.

Types of Data – Achievement Standard 2.2
Learning Intentions: I can recognise the difference between numerical and categorical data.

Numerical data is number data. This type of data is either:
● Discrete: whole-number values, usually found by counting.
● Continuous: values that could be any number in an interval, usually found by measuring.

Categorical data puts the data into non-numerical categories using words or labels. This type of data is either:
● Nominal: involves naming or identifying data (the categories have no natural order).
● Ordinal: involves placing information into an order.

How we organise and present our data depends on the type of data we are dealing with.

Numerical data can be organised and presented using:
● Frequency/Tally tables
● Bar/Column graphs (discrete data)
● Histograms (continuous data)
● Dot plots (discrete data)
● Stem-and-leaf plots

Categorical data can be organised and presented using:
● Frequency/Tally tables
● Bar/Column graphs
● Pie charts
● Line/Dot plots

From these lists, we will only be focusing on:
● Frequency/Tally tables
● Stem-and-leaf plots
● Back-to-back stem-and-leaf plots
● Histograms

Presenting Data
Frequency Tables
Learning Intentions: I can organise and present data using frequency and tally tables.

A frequency table shows the number of pieces of data that fall within given intervals. A tally is a tool used for counting as results are gathered: each result is recorded as a vertical line, and every 5th line is drawn as a cross through the previous four, so the marks form groups of five. Frequency tables show how common a certain value is in a frequency column.
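Drawing a random sample and tallying the results can be sketched in Python. This is a minimal sketch under assumptions: the population of 200 students and their favourite canteen items is hypothetical, `simple_random_sample`, `tally_marks` and `frequency_table` are names made up for this example, and the Unicode character U+1D378 (TALLY MARK FIVE) stands in for four lines crossed by a fifth, which may not display in every font.

```python
import random
from collections import Counter

# Stand-in for a completed tally group: four lines crossed by a fifth.
TALLY_FIVE = "\U0001D378"  # Unicode TALLY MARK FIVE; may not render in every font


def simple_random_sample(population, n, seed=None):
    """Choose n members without replacement; every member has an equal chance."""
    rng = random.Random(seed)  # a seed only makes the draw repeatable
    return rng.sample(population, n)


def tally_marks(count):
    """Write a count as tally marks, grouped in fives."""
    fives, ones = divmod(count, 5)
    groups = [TALLY_FIVE] * fives
    if ones:
        groups.append("|" * ones)
    return " ".join(groups)


def frequency_table(data):
    """Return (value, tally, frequency) rows, sorted by value."""
    counts = Counter(data)
    return [(value, tally_marks(freq), freq) for value, freq in sorted(counts.items())]


# Hypothetical population: 200 students, each with a favourite canteen item.
items = ["pie", "salad", "sandwich", "sushi"]
population = [(student_id, items[student_id % 4]) for student_id in range(1, 201)]

sample = simple_random_sample(population, 30, seed=1)
rows = frequency_table([item for _, item in sample])
for value, marks, freq in rows:
    print(f"{value:<10}{marks:<10}{freq}")
print("Total", sum(freq for _, _, freq in rows))
```

`random.Random(seed).sample` draws without replacement, so no student can be chosen twice; leave out the seed to get a fresh random sample each run.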
A tally column is also often used as data is gathered. The items can be individual values or intervals of values.

Construct Stem and Leaf Plots
Learning Intentions: I can construct a stem-and-leaf plot.

A stem-and-leaf plot is a special table where each data value is split into a "stem" (the first digit or digits) and a "leaf" (usually the last digit). For example, 27 has a stem of 2 and a leaf of 7.
● The stem values are listed downwards, and the leaf values go to the right of their stem.
● The stem is used to group the scores and can consist of any number of digits.
● Each leaf shows an individual score within its group and is always a single digit.
● In a stem-and-leaf plot, the data are organised from least to greatest, so the leaves need to be in order.
Stem-and-leaf plots are similar to a horizontal bar graph, but the actual numbers are used instead of bars.

Construct Back to Back Stem and Leaf Plots – Achievement Standard 2.3
Learning Intentions: I can construct a back-to-back stem-and-leaf plot.

This year we are looking at back-to-back stemplots, which allow easy comparisons between two sets of data. The stem is drawn in the middle, with leaves on either side. When constructing a back-to-back stemplot, the right side is constructed as normal; the left side is constructed as a mirror image, with the leaves increasing in value as they are read from right to left.

Construct Histograms – Achievement Standard 2.3
Learning Intentions: I can construct comparative histograms.

When displaying continuous data graphically, histograms are a clear way of showing the frequency of each group. A histogram is a chart that plots the distribution of a numeric variable’s values as a series of bars. Each bar typically covers a range of numeric values called a bin or class; a bar’s height indicates the frequency of data points with a value within the corresponding bin. In other words, a histogram is a special kind of bar graph that uses bars to represent the frequency of numerical data that have been organised into intervals.
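Grouping data into equal class intervals, the first step in drawing a histogram, can be sketched in Python. The sketch assumes whole-number data and intervals of width 10 starting at 0, and uses the 24 travel times (in minutes) from the worked example in this section; `grouped_frequencies` is a hypothetical helper name, and the rows of `#` are only a rough text stand-in for the bars.

```python
def grouped_frequencies(data, width=10, start=0):
    """Count whole-number values in equal class intervals: start to start+width-1, and so on."""
    rows = []
    low = start
    while low <= max(data):
        high = low + width - 1
        frequency = sum(1 for x in data if low <= x <= high)
        rows.append((f"{low}-{high}", frequency))
        low += width
    return rows


travel_times = [15, 10, 21, 13, 22, 23, 17, 19, 25, 31, 27, 32,
                35, 42, 14, 12, 26, 18, 34, 19, 28, 30, 17, 8]

table = grouped_frequencies(travel_times)
for interval, frequency in table:
    # One row per class interval; each '#' stands for one student.
    print(f"{interval:>6} | {'#' * frequency} ({frequency})")
print("Total:", sum(frequency for _, frequency in table))
```

Because every interval has the same width, comparing the lengths of the rows is a fair comparison of frequencies, just as equal-width bars are in a histogram.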
● Because the intervals are all equal, all of the bars have the same width.
● Because the intervals are continuous (connected; ongoing), there are no gaps between the bars.

Example: A class of 24 students were asked how long it takes them to get to school, in minutes. The following data was collected:
15 10 21 13 22 23 17 19 25 31 27 32 35 42 14 12 26 18 34 19 28 30 17 8

(1) Construct a frequency/tally table for the data.

Time (minutes)    Tally     Frequency
0–9               |         1
10–19             𝍸 𝍸       10
20–29             𝍸 ||      7
30–39             𝍸         5
40–49             |         1
Total                       24

(2) Construct a histogram of the data.

Shape of Data – Achievement Standard 2.4
Learning Intentions: I can describe the shape of data using the terms ‘skewed’, ‘symmetric’ and ‘bimodal’.

Shape of the distribution
Once data has been organised, we can describe the shape of the distribution. The shape of the data gives us another insight into where most of the information lies, which helps us make an informed analysis. The shape of a distribution indicates the range and pattern of the values in a data set.

The shape of a distribution of quantitative (numerical) data can be described because there is a logical order to the values, so the 'low' and 'high' end values on the x-axis of the histogram can be identified. A distribution of data values may be symmetrical or asymmetrical. Two common examples are the 'normal distribution' (symmetrical) and the 'skewed distribution' (asymmetrical). A normal distribution is a symmetric, bell-shaped distribution of values. Skewness is the tendency for the values to be more frequent around the high or low end of the x-axis.
The shape can be described in 4 ways:
● Symmetrical: the data is balanced (symmetrical) about a centre line.
● Negatively skewed: most of the data are clustered towards the higher end of the scale, and there is a “tail” of data values towards the lower end of the scale.
● Positively skewed: most of the data are clustered towards the lower end of the scale, and there is a “tail” of data values towards the higher end of the scale.
● Bimodal: there are two clear “peaks” in the data, which implies that there are two modes (most frequently occurring data values/groups).

Measures of Centre
Learning Intentions: I can calculate the mean, median, mode and range of data sets.

Outliers
Once data has been organised, we can begin to analyse the information. One of the first things we might notice is the presence of outliers. An outlier is a data value (usually a single value) that lies well away from the rest of the data: you will notice a gap between this data value and the rest of the data.

Measures of centre
Measures of centre are the statistical averages:

Mean: the sum of the data divided by the number of data values.
$$\text{mean} = \frac{\text{sum of the data}}{\text{number of data values}}$$

Median: the middle data value of a set of ordered data values. The position of the middle data value can be found using the rule
$$\text{position of the median} = \frac{n+1}{2}$$
where $n$ is the number of data values in the set.
Note: when calculating the median, we must be careful. If there is an odd number of data values, the median is the middle number. If there is an even number of data values, the median is the mean of the middle 2 data values.

Mode / Modal class: the most frequently occurring data value/group/class.
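The definitions above, and the table method used in the worked examples that follow, can be sketched in Python. This is a minimal sketch: the function names are hypothetical, `modes` returns every value sharing the highest frequency (in case there is more than one), and `from_frequency_table` assumes the data values are listed in increasing order, each with a positive frequency.

```python
from collections import Counter
from itertools import accumulate


def mean(data):
    """Sum of the data divided by the number of data values."""
    return sum(data) / len(data)


def median(data):
    """Middle value of the ordered data, located with the (n + 1)/2 rule."""
    ordered = sorted(data)
    n = len(ordered)
    if n % 2 == 1:
        return ordered[(n + 1) // 2 - 1]  # positions count from 1, list indexes from 0
    # Even n: (n + 1)/2 falls halfway between two values, so average them.
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


def modes(data):
    """Every value that shares the highest frequency."""
    counts = Counter(data)
    highest = max(counts.values())
    return sorted(value for value, count in counts.items() if count == highest)


def from_frequency_table(values, freqs):
    """Mean and median from a table, via a product column and cumulative frequencies.

    Assumes values are listed in increasing order with positive frequencies.
    """
    n = sum(freqs)
    product_total = sum(v * f for v, f in zip(values, freqs))  # data value x frequency
    cumulative = list(accumulate(freqs))  # running total of the frequencies

    def value_at(position):
        """Data value at a 1-based position in the ordered data."""
        for value, running_total in zip(values, cumulative):
            if running_total >= position:
                return value
        raise ValueError("position is larger than n")

    if n % 2 == 1:
        middle = value_at((n + 1) // 2)
    else:
        middle = (value_at(n // 2) + value_at(n // 2 + 1)) / 2
    return product_total / n, middle


data = [2, 4, 4, 5, 6, 6, 7, 7, 7, 8, 9, 9, 10]
print(f"Mean = {mean(data):.2f}, Median = {median(data)}, Mode = {modes(data)}")

table_mean, table_median = from_frequency_table(
    [2, 3, 4, 5, 6, 7, 8, 9, 10], [3, 7, 9, 4, 3, 2, 1, 1, 1]
)
print(f"From the table: Mean = {table_mean:.2f}, Median = {table_median}")
```

Working from the table avoids writing out all the data values: the cumulative totals say which value sits at any position, which is exactly how the cumulative frequency column is read by hand.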
Example: Determine the measures of centre for the following set of data:
2 4 4 5 6 6 7 7 7 8 9 9 10
There are 13 data values, so the median is in position $\frac{13+1}{2} = 7$.
$$\text{Mean} = \frac{84}{13} \approx 6.46 \qquad \text{Median} = 7 \qquad \text{Mode} = 7$$

Example: Determine the measures of centre for the following set of data:
3 4 4 5 5 6 6 7 7 7 9 9 10 11
There are 14 data values, so the median is in position $\frac{14+1}{2} = 7.5$, halfway between the 7th value (6) and the 8th value (7).
$$\text{Mean} = \frac{93}{14} \approx 6.64 \qquad \text{Median} = \frac{6+7}{2} = 6.5 \qquad \text{Mode} = 7$$

Finding Measures of Centre from a table
Example: Determine the measures of centre and spread for the following set of data displayed in a frequency table:

Number   Frequency   Product       Cumulative Frequency
2        3           2 × 3 = 6     3
3        7           3 × 7 = 21    3 + 7 = 10
4        9           36            10 + 9 = 19
5        4           20            19 + 4 = 23
6        3           18            26
7        2           14            28
8        1           8             29
9        1           9             30
10       1           10            31
Total    31          142

To find the mean from a frequency table, we first need the sum of all the data values. A product column makes this easier: each entry in the product column is data value × frequency, and the total of the product column is the sum of the data values.
$$\text{Mean} = \frac{2\times3 + 3\times7 + 4\times9 + \dots + 10\times1}{31} = \frac{142}{31} \approx 4.58$$

To find the median from a frequency table, we need to know where the middle data value is. We use the formula $\frac{n+1}{2}$, where $n$ is the number of data values. For this set of data, the middle data value is in position $\frac{31+1}{2} = 16$, i.e. the 16th value. The cumulative frequency column shows that the 11th to 19th values are all 4, so the 16th value is 4.
$$\text{Median} = 4 \qquad \text{Mode} = 4 \qquad \text{Range} = 10 - 2 = 8$$

Measures of Spread
Range and Interquartile Range – Achievement Standard 2.4
Learning Intentions: I can use the range and interquartile range to describe a set of data.

Measures of spread give a measure of the variation in the data, or how widely the data is spread:

Range: the full width of the data.
$$\text{Range} = \text{maximum value} - \text{minimum value}$$
Example: Determine the range for the following set of data:
3 4 4 5 5 6 6 7 7 7 9 9 10 11
$$\text{Range} = 11 - 3 = 8$$

Interquartile Range: the width of the middle 50% of the data.
$$\text{Interquartile Range} = \text{Quartile 3} - \text{Quartile 1} \qquad \text{IQR} = Q_3 - Q_1$$
where
● $Q_1$ = Quartile 1, the median of the lower half of the data
● $Q_3$ = Quartile 3, the median of the upper half of the data

Quartiles: quartiles divide the data into 4 (approximately) equal parts.
● 25% of scores lie below the lower quartile, $Q_1$.
● 25% of scores lie between $Q_1$ and the median ($Q_2$).
● 25% of scores lie between the median and the upper quartile, $Q_3$.
● 25% of scores lie above $Q_3$.

Example: Determine the measures of spread for the following set of data:
2 4 4 5 6 6 7 7 7 8 9 9 10
$$\text{Range} = 10 - 2 = 8$$
The median is the 7th value, 7. The lower half is 2 4 4 5 6 6, so $Q_1 = \frac{4+5}{2} = 4.5$; the upper half is 7 7 8 9 9 10, so $Q_3 = \frac{8+9}{2} = 8.5$.
$$\therefore \text{IQR} = 8.5 - 4.5 = 4$$

Interpreting Measures of Centre – Achievement Standard 2.4
Learning Intentions: I can use the appropriate measure of centre to draw conclusions.

Important Note: There are advantages and disadvantages to using the mean or the median as the measure of centre (average) of the data. When the mean and median are compared with each other, they can also give an indication of whether the data is symmetric or skewed, and in which direction it is skewed.
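This note can be demonstrated with Python's standard `statistics` module (`statistics.mean` and `statistics.median`). The sketch replaces the largest value (10) of the earlier 13-value example with a hypothetical outlier of 50: the mean jumps while the median stays at 7. It then turns the comparison of mean and median into a rough shape indicator; `shape_indicator` is a hypothetical name, the `tolerance` of 1 that counts as "approximately equal" is an arbitrary choice, and a graph of the data should still be used to describe its shape.

```python
from statistics import mean, median


def shape_indicator(data, tolerance=1.0):
    """Rough guide to the shape of the data from comparing the mean and median.

    tolerance is a hypothetical cut-off (in the data's own units) for treating
    the mean and median as approximately equal.
    """
    difference = mean(data) - median(data)
    if abs(difference) <= tolerance:
        return "approximately symmetric"
    return "positively skewed" if difference > 0 else "negatively skewed"


scores = [2, 4, 4, 5, 6, 6, 7, 7, 7, 8, 9, 9, 10]
with_outlier = scores[:-1] + [50]  # hypothetical: the 10 replaced by an outlier of 50

for label, data in [("original", scores), ("with outlier", with_outlier)]:
    print(f"{label}: mean = {mean(data):.2f}, median = {median(data)}, {shape_indicator(data)}")
```

Because the median depends only on the position of the middle value, changing the largest value leaves it unchanged, while the mean, which uses every value, is pulled towards the outlier.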
Mean
● Advantage: includes all the data values, so it considers all of the information.
● Disadvantage: is influenced by extreme data values (outliers).

Median
● Advantage: is not influenced by extreme data values (outliers).
● Disadvantage: doesn’t consider the values of the data, only their positions.

Shape                 Mean compared with median
Symmetric             Approximately equal
Positively skewed     Mean is higher than median
Negatively skewed     Mean is lower than median

Comparing Data Measures of Centre – Achievement Standard 2.4
Learning Intentions: I can compare means, medians and ranges of two sets of numerical data.

Comparing Back to Back Stem and Leaf Plots – Achievement Standard 2.4
Learning Intentions: I can use back-to-back stem-and-leaf plots to compare two data sets.

Comparing Data using the Range and Interquartile Range – Achievement Standard 2.4
Learning Intentions:
● I can use the range and interquartile range to describe a set of data.
● I can use the range and interquartile range to draw conclusions.

Consider the following two distributions of exam scores (Class Set A and Class Set B):

Note: both distributions have a median of 74.5.

Which distribution has more variability?
● Class Set A: the range is 95 − 40 = 55.
● Class Set B: the range is 95 − 40 = 55.

If we use the range to measure variability, we would say the distributions have the same amount of variability. But the variability in the distributions differs when we look at how the data is distributed about the median. Set A has a large portion of its data close to the median; this is not true for Set B. From this viewpoint, Set A has less variability about the median.

The quartiles divide the data set into four subgroups with the same number of individuals in each subgroup.
Class A: Q1 = 71, Q2 = 74.5, Q3 = 78.5
Class B: Q1 = 61, Q2 = 74.5, Q3 = 89

Some quarters of the data show more variability than others, even though each quarter contains the same number of scores.

Comments about the variability in the data for Class Set A:
Looking at the lowest quarter of the data (below Q1), 25% of the scores in Class A are between 40 and 71.
There is a lot of variability in this quarter: the eight scores below Q1 vary by 30 points.
Looking at the third quarter of the data (between the median and Q3), 25% of the scores in Class A are between 74.5 and 78.5. There is not much variability in this quarter: the 8 scores vary by only 4 points.

Class A: IQR = Q3 − Q1 = 78.5 − 71 = 7.5
Class B: IQR = Q3 − Q1 = 89 − 61 = 28

Class A has less variability about its median, because its IQR is much smaller. The middle 50% of exam scores for Class A vary by only 7.5 points, while the middle 50% of exam scores for Class B vary by 28 points.

Interpreting, Comparing and Reporting on Data – Achievement Standard 2.4
Learning Intentions: I can describe and interpret data as part of a statistical report.

When working with data, we are trying to find an answer to a question. By collecting data and then organising, displaying and calculating with it, we are able to draw conclusions and make decisions. This is done by preparing a statistical report.

The form of a statistical report:
1. Introduction: state the variable to be investigated.
2. Collect the data.
3. Organise and graph the data: frequency table, stemplot, column graph/histogram.
4. Comment on the shape of the data: symmetrical, positively/negatively skewed, bimodal, any outliers.
5. Calculate the measures of centre (mean and median) and spread (range).
6. Analysis/Assumptions.
7. Conclusion.

Writing the analysis, the discussion of the assumptions and the conclusion can often be the most challenging part of a statistical report. So, we will work through an analysis, an assumptions discussion and a conclusion together as an example of how to work through the results. We will use the TEEL (Topic, Evidence, Elaboration, Link) approach to write the analysis/assumptions.
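The "organise and graph" and "calculate" stages of a report can be sketched in Python for two sets of whole-number data: a back-to-back stemplot (stems are tens digits, leaves are units digits), then the mean, median and range, plus the quartiles and IQR found as medians of the lower and upper halves (leaving out the median itself when n is odd), as in these notes. The first data set is the travel-time data from the histogram example; the second data set and all function names are hypothetical, made up for illustration.

```python
from statistics import mean, median


def stem_and_leaf(data):
    """Map each stem (tens digit) to its ordered leaves (units digit), keeping empty stems."""
    stems = {}
    for value in sorted(data):
        stem, leaf = divmod(value, 10)
        stems.setdefault(stem, []).append(leaf)
    return {s: stems.get(s, []) for s in range(min(stems), max(stems) + 1)}


def back_to_back(left_data, right_data):
    """Text rows: stem in the middle, left leaves mirrored (increasing right to left)."""
    left, right = stem_and_leaf(left_data), stem_and_leaf(right_data)
    all_stems = range(min(min(left), min(right)), max(max(left), max(right)) + 1)
    left_text = {s: " ".join(str(leaf) for leaf in reversed(left.get(s, []))) for s in all_stems}
    width = max(len(text) for text in left_text.values())
    rows = []
    for s in all_stems:
        right_text = " ".join(str(leaf) for leaf in right.get(s, []))
        rows.append(f"{left_text[s]:>{width}} | {s} | {right_text}".rstrip())
    return rows


def summary(data):
    """Measures of centre and spread; quartiles are medians of the lower and upper halves."""
    ordered = sorted(data)
    n = len(ordered)
    q1 = median(ordered[: n // 2])        # lower half (median left out when n is odd)
    q3 = median(ordered[(n + 1) // 2 :])  # upper half
    return {"mean": round(mean(ordered), 2), "median": median(ordered),
            "range": ordered[-1] - ordered[0], "Q1": q1, "Q3": q3, "IQR": q3 - q1}


travel_times = [15, 10, 21, 13, 22, 23, 17, 19, 25, 31, 27, 32,
                35, 42, 14, 12, 26, 18, 34, 19, 28, 30, 17, 8]
other_class = [5, 9, 11, 14, 16, 16, 20, 22, 25, 27, 29, 33, 38, 41, 44, 47]  # hypothetical

for row in back_to_back(travel_times, other_class):
    print(row)
for name, data in [("Travel times", travel_times), ("Other class", other_class)]:
    print(name, summary(data))
```

Reading the left-hand leaves outwards from the stem gives the values in increasing order, matching the mirror-image rule for back-to-back stemplots. Other textbooks and software may calculate quartiles slightly differently, so the Q1 and Q3 values here follow only the "median of each half" convention.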