40S Applied Math
Mr. Knight – Killarney School
Unit: Statistics
Lesson: ST-L1 Intro to Statistics

Intro to Statistics ST-L1
Objectives: To review measures of central tendency and dispersion.
Learning Outcome B-4
Slide 1

Judith and Francine, both age 19, have decided to go on a Caribbean cruise. They want to have an enjoyable time, which to them means travelling with other people their own age. They buy tickets for a cruise where the average age of the other passengers is 20 years. Imagine their surprise at the start of the cruise when they discover that all the other passengers are parents (average age 32) with children (average age 8)!

This lesson includes a brief review of sampling and the calculation of measures of central tendency. It also introduces 'measures of dispersion,' something Judith and Francine should have known about. You will also be introduced to software (Winstats) that performs the statistical calculations.

Theory – Intro Slide 2

Teacher Bundy asked his students to measure the length of the classroom using metre sticks, and to write their measurements (rounded to the nearest millimetre) on the board. It quickly became evident that most of the measurements were quite similar, but a few were lower or higher than the 'main group' of measurements.

Whenever some quantity or value is measured numerous times, there will likely be a variety of results. Most of the results will likely be close to a value believed to be the true value. The term variability in statistics refers to the variety of answers we get when measuring something. You will study ways of describing and interpreting variability in data.
For example, you will review ways of finding a 'middle' measurement that represents all the data, and also ways of describing how the data are spread about this 'middle' value.

Theory – Variability, Continuous and Discrete Data Slide 3

If we count books, the count consists of whole numbers. This is an example of discrete data: data you get when counting a finite number of distinct objects. (For example, there may be 71 or 72 books, but not 71.6 books.)

If we measure the length of the classroom, we must decide on an acceptable level of precision and round our final answer. We can never say the measurement is exactly correct. Such measurements are continuous data: because the data are not discrete, the final answer is always an approximation (e.g., 71.4 cm, 71.38 cm, or 71.3810728 cm). Other examples of continuous data are measurements of speed, area, and time.

Theory – Variability, Continuous and Discrete Data cont’d Slide 4

A measure of central tendency is a single quantity or score that in some way represents the 'middle' value of all the data in a sample or population. Three commonly used measures of central tendency are as follows:

1. The mean, or average, of a set of scores is found by adding the scores together and then dividing the sum by the number of scores. The symbols for the mean are x̄ (x-bar) for the mean of a set of data (sample mean), and μ (mu) for the mean of a population.

2. The mode for a sample or population is the value that appears most often.

Theory – Measures of Central Tendency Slide 5

3. The median is the middle term when data are arranged in order of size from the smallest to the largest. Let 'n' represent the number of terms of data.
If the data have an odd number of terms, the middle one is the ((n + 1)/2)th term. If the data have an even number of terms, you find the average of the middle two terms. That is, you find the average of the terms at positions n/2 and n/2 + 1.

Theory – Measures of Central Tendency Slide 6

A clerk in a men's clothing store keeps a weekly record of the number of pairs of pants sold. The following is her list for two weeks. Calculate the mean, mode, and median for the data shown.

          Mon  Tue  Wed  Thur  Fri  Sat
Week 1     34   40   36    36   38   38
Week 2     32   36   36    42   34   34

Test Your Knowledge – Measures of Central Tendency Slide 7

The question from the previous page is answered here using Winstats. The window on the left-hand side shows the data entered in two columns, and the window on the right shows the overall statistics.

Note that the mean shown is 36.333, and the median is 36.000. These answers are the same as the ones that were calculated. Winstats does not calculate the mode; by inspection, the mode is 36, which appears four times.

Test Your Knowledge – Measures of Central Tendency Slide 8

A measure of central tendency gives some kind of 'middle' number for a group of data. We also need ways to measure the variation of the individual data values, and how the data are spread about the central value.

For example, Robin can drive to university using the downtown route or the perimeter route. The downtown route is shorter, but it has more traffic, and can become quite crowded.
The driving times in minutes for each route (arranged in ascending order) for 5 days are shown in the table.

Downtown Route     15  26  30  39  45
Perimeter Route    29  30  31  32  33

The average driving time for each route is 31 minutes. Which route should she take?

The driving times for the downtown route vary from 15 to 45 minutes, so she would need to allow 45 minutes of travel time to be sure of getting to class on time. The driving times for the perimeter route vary only from 29 to 33 minutes, so the travel times are more predictable. As you can see, Robin needs to consider more than the mean travel times when deciding which route to take.

Theory – Measures of Dispersion Slide 9

The simplest measure of variation is the range. The range of a set of numbers is the difference between the largest number and the smallest number.

In the previous example, the ranges are as follows:

Downtown Route (15 26 30 39 45):    Range = 45 - 15 = 30 minutes
Perimeter Route (29 30 31 32 33):   Range = 33 - 29 = 4 minutes

Other applications of range may be:
• temperature variation for the day
• prices of stocks on the stock market
• marks of students in class

Theory – Range Slide 10

One limitation of using range to describe the variation in a group of data (for example, student marks in a class) is that range provides information about only two scores, the highest and the lowest, and does not provide any information about all the other scores. One extreme score will make the range very large, even if all the other scores are very close together.

For example, the marks of the five students in both groups below have a range of 55. The variability in the first group, however, appears to be primarily attributable to one extreme student.
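As a minimal sketch of this limitation (Python is used here only for illustration; the lesson itself uses Winstats), the range of each of the two groups of marks in this example can be computed directly. The helper name `data_range` and the idea of recomputing the range without the lowest mark are illustrative assumptions, not part of the lesson.

```python
def data_range(values):
    """Range = largest value minus smallest value."""
    return max(values) - min(values)

# The two groups of student marks from this example.
group_1 = [25, 75, 77, 78, 80]
group_2 = [25, 34, 56, 71, 80]

for name, marks in [("Group 1", group_1), ("Group 2", group_2)]:
    # Illustrative check: drop the single lowest mark and recompute the range.
    without_lowest = sorted(marks)[1:]
    print(name, "range:", data_range(marks),
          "| range without lowest mark:", data_range(without_lowest))
```

Both groups have a range of 55, but without the lowest mark the first group's range drops to 5 while the second group's is still 46. The range alone cannot show that difference.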
Student marks: 25, 75, 77, 78, 80     Range = 80 - 25 = 55
Student marks: 25, 34, 56, 71, 80     Range = 80 - 25 = 55

Theory – Range Slide 11

The standard deviation is a measure that shows how the data are spread about the mean value. We will use standard deviation to describe data, but we will not use algebraic formulas to calculate standard deviation. Instead, we will use technology (computer program or graphing calculator) to calculate standard deviation.

The symbol for the standard deviation of a population or large sample is σ (sigma), and the symbol for the standard deviation of a sample is s. A large sample is defined as a sample with 30 or more data items. In this course, we will use only σ, which represents the standard deviation of the population.

Theory – The Standard Deviation Slide 12

For example: The mean math mark for Class A is 74, and the standard deviation is 4. Assuming the marks are roughly normally distributed, this means that about 68 percent of all the marks in the class are within 4 marks of 74. In other words, about 68 percent of the marks in the class are between 70 (74 - 4) and 78 (74 + 4).

Theory – The Standard Deviation Slide 13

Calculate Standard Deviation Using Algebra (not really)

Theory – The Standard Deviation Slide 14

The mean math marks and standard deviations for two classes are shown below. Assume that 68 percent of the marks in each class are within one standard deviation of the mean mark.

           mean mark   standard deviation
Class A       74               4
Class B       72               8

Questions:
1. In which class is the set of marks more dispersed?
2. Bert in Class A and Beth in Class B each have a mark of 82%.
   How many standard deviations are they from their class means? Who appears to have the better mark?

Theory – The Standard Deviation Slide 15

Answers:

1. Class A: 68% of all the marks are from 70 (74 - 4) to 78 (74 + 4).
   Class B: 68% of all the marks are from 64 (72 - 8) to 80 (72 + 8).
   Therefore, the marks in Class B are more dispersed.
2. Bert: Number of standard deviations above the mean = (82 - 74) / 4 = 2.
   Beth: Number of standard deviations above the mean = (82 - 72) / 8 = 1.25.
   Therefore, Bert appears to have the better mark because he is 2 standard deviations above his class mean, while Beth is only 1.25 standard deviations above hers.

End day 1

Theory – The Standard Deviation Slide 16

On previous pages all the elements of data were listed in tables. This is fine if the number of elements is relatively small, but it becomes awkward if the number of elements gets large, say 100 elements or more. For this reason, it is convenient to present the data as a frequency distribution. A frequency distribution table shows the number of elements of data (frequency) at each measure. Sometimes the measures need to be grouped, especially if the measures are continuous.

Theory – Grouped Data Slide 17

Example: The table below is a frequency distribution table that shows the heights of 100 Senior 4 students. The students are grouped into suitable height groups in 7 cm intervals.
height interval    interval mean    # of students
153.5 to 160.5          157                5
160.5 to 167.5          164               16
167.5 to 174.5          171               43
174.5 to 181.5          178               27
181.5 to 188.5          185                9
Total                                    100

Determine the mean, median, mode, range, and standard deviation for the student heights. Be sure to select 'grouped data' in Winstats.

Theory – Grouped Data Slide 18

From the diagram, you can read the following answers.

mean = μ = 172.33
median = 171.00
mode = 171.00 (i.e., the interval with the largest # of students, read from the input data at 43 students)
range = 28
standard deviation = σ = 6.837

Theory – Grouped Data Slide 19

A histogram is a bar graph that shows equal intervals of a measured or counted quantity on the horizontal axis, and the frequencies associated with these intervals on the vertical axis.

The histogram for the student height data from the previous page has been drawn with Winstats, and represents the heights of the students in graphical form. Note that the five bars represent the five height intervals, and the heights of the bars represent the number of students in each height interval. The tallest bar represents the mode height. Such a histogram is known as a Frequency Distribution Graph.

Theory – Histogram Slide 20

The frequency distribution table below shows the midterm marks of 85 Senior 4 math students at Parksville High. The first column shows the mark interval, the second column the average mark within each mark interval, and the third column the number of students in each mark interval. Answer the following questions.

mark interval    mark    # of students
29 to 37          33           1
38 to 46          42           4
47 to 55          51          12
56 to 64          60          18
65 to 73          69          24
74 to 82          78          16
83 to 91          87           7
92 to 100         96           3
Total                         85

Sample Problem Slide 21
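The grouped-data results that Winstats reports for the student heights can be reproduced with ordinary arithmetic. The following is a minimal Python sketch, not a description of how Winstats itself works. It assumes each interval is represented by its interval mean, weighted by its frequency, and it uses the population standard deviation (σ), as this course does. The function name `grouped_stats` is hypothetical.

```python
import math

def grouped_stats(midpoints, frequencies):
    """Return (n, mean, population standard deviation) for grouped data.

    Each interval is represented by its midpoint (interval mean),
    weighted by the number of data items in that interval.
    """
    n = sum(frequencies)
    mean = sum(m * f for m, f in zip(midpoints, frequencies)) / n
    variance = sum(f * (m - mean) ** 2
                   for m, f in zip(midpoints, frequencies)) / n
    return n, mean, math.sqrt(variance)

# Heights of 100 Senior 4 students (interval means and frequencies).
height_mids = [157, 164, 171, 178, 185]
height_freqs = [5, 16, 43, 27, 9]
n, mean, sigma = grouped_stats(height_mids, height_freqs)
print(n, round(mean, 2), round(sigma, 3))  # 100 172.33 6.837

# Mode: the interval mean with the largest frequency.
mode = height_mids[height_freqs.index(max(height_freqs))]
# Range: largest interval mean minus smallest interval mean.
height_range = max(height_mids) - min(height_mids)
print(mode, height_range)  # 171 28

# Midterm marks of 85 Senior 4 math students (same method).
mark_mids = [33, 42, 51, 60, 69, 78, 87, 96]
mark_freqs = [1, 4, 12, 18, 24, 16, 7, 3]
print(grouped_stats(mark_mids, mark_freqs))
```

The height results match the Winstats values of 172.33, 171, 28, and 6.837 given earlier. The same function can be applied to the midterm marks table as a check on answers obtained with Winstats.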