Numerical Descriptions of the Data
Sections 2.1 & 2.2
Cathy Poliak, Ph.D. (cathy@math.uh.edu)
Department of Mathematics, University of Houston
January 21, 2016

Outline
1. Introduction
2. Range
3. Variance and Standard Deviation
4. Calculating the Standard Deviation
5. Coefficient of Variation
6. Percentiles
7. Quartiles
8. The 1.5 IQR Rule

Different Test Scores?
The following table shows exam scores for two different sections, from a random sample of 10 students in each section.

Section A:  65  66  67  68  71  73  74  77  77  77
Section B:  42  54  58  62  67  77  77  85  93  100

Types of Measurements for Spread
The simplest useful numerical description of a distribution consists of both a measure of center and a measure of spread. Measures of spread include:
- Range = largest value − smallest value
- Variance
- Standard deviation
- Coefficient of variation
- Percentiles
- Quartiles
- IQR (interquartile range)

Range
The range is the difference between the largest value and the smallest value in a dataset.
Range of Section A = 77 − 65 = 12
Range of Section B = 100 − 42 = 58

Measuring Spread: The Standard Deviation
The standard deviation measures spread by looking at how far the observations are from their mean. It is the most common numerical description of the spread of a distribution. A larger standard deviation means the values are more widely spread around the mean.
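The two sections from the opening table illustrate this. Below is a minimal sketch in Python, assuming the standard-library statistics module as a stand-in for R, the software these notes actually use. The helper data_range is a hypothetical name. statistics.stdev computes the sample standard deviation s, the version with an n − 1 divisor defined later in these notes.

```python
import statistics

# Exam scores of the 10 sampled students in each section
section_a = [65, 66, 67, 68, 71, 73, 74, 77, 77, 77]
section_b = [42, 54, 58, 62, 67, 77, 77, 85, 93, 100]

def data_range(values):
    """Range = largest value - smallest value (hypothetical helper name)."""
    return max(values) - min(values)

for name, scores in [("A", section_a), ("B", section_b)]:
    mean = statistics.mean(scores)
    s = statistics.stdev(scores)  # sample standard deviation (n - 1 divisor)
    print(f"Section {name}: mean = {mean}, range = {data_range(scores)}, s = {s:.2f}")
```

Both sections have the same mean, 71.5. Section B, however, has a much larger range (58 versus 12) and a much larger standard deviation (about 18.22 versus 4.77).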
The standard deviation is denoted s when it is computed from a sample; this is the one we calculate from a list of values. It is denoted σ when it describes a population; this is the "idealized" standard deviation. The standard deviation has the same units of measurement as the original observations.

Definition of the Standard Deviation
Informally, the standard deviation is the typical distance of the observations from the mean. Strictly, it is the square root of the variance, defined by the formulas below, so it is not exactly the average distance. For example, take this sample: 3, 3, 9, 15, 15. The mean is 9. Four of the values lie 6 units from the mean and one lies exactly at the mean, and the sample standard deviation works out to s = 6.

Values of the Standard Deviation
The standard deviation is always greater than or equal to zero. It equals zero only when all of the observations have the same value. Use this fact to determine s for each of the following lists:
- 2, 2, 2, 2: standard deviation = 0
- 125, 125, 125, 125, 125: standard deviation = 0

Adding or Subtracting a Value to the Observations
Adding the same value to (or subtracting it from) every observation does not change the standard deviation of the list. Using this list of values: 3, 3, 9, 15, 15, the mean is 9 and the standard deviation is 6. If we add 4 to every value, we get 7, 7, 13, 19, 19. The mean becomes 13, and the standard deviation is still 6.

Multiplying or Dividing the Observations by a Value
Multiplying or dividing every observation by the same positive value multiplies or divides the standard deviation by that value. If the factor c is negative, the standard deviation is multiplied by |c|.
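Both rules can be checked numerically. Below is a minimal sketch, again assuming Python's standard-library statistics module in place of R. It applies the shift (+4) and scale factor (×2) from these slides, plus a factor of −1, to the list 3, 3, 9, 15, 15. It then prints each list's mean and sample standard deviation.

```python
import statistics

values = [3, 3, 9, 15, 15]
shifted = [x + 4 for x in values]  # add the same constant to every observation
scaled = [2 * x for x in values]   # multiply every observation by the same factor
negated = [-x for x in values]     # negative factor c = -1: s is multiplied by |c| = 1

for label, data in [("original", values), ("add 4", shifted),
                    ("times 2", scaled), ("times -1", negated)]:
    # the mean follows the transformation; stdev ignores shifts and scales by |factor|
    print(f"{label:>9}: mean = {statistics.mean(data)}, s = {statistics.stdev(data)}")
```

The means come out as 9, 13, 18, and −9. The value of s stays at 6 except under doubling, where it becomes 12. The variance, a squared quantity, is multiplied by the square of the factor (here 2² = 4).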
For example, take the list 3, 3, 9, 15, 15, whose mean is 9 and standard deviation is 6. If we double every value, we get 6, 6, 18, 30, 30. The mean becomes 18 and the standard deviation becomes 12.

Population Variance and Standard Deviation
If N is the number of values in a population with mean $\mu$, and $x_i$ represents each individual value in the population, then the population variance is

$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$

and the population standard deviation is its square root, $\sigma = \sqrt{\sigma^2}$.

Sample Variance and Standard Deviation
Most of the time we work with a sample instead of a population. The sample variance is

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$$

and the sample standard deviation is its square root, $s = \sqrt{s^2}$. Here n is the number of observations in the sample, $x_i$ is the value of the i-th observation, and $\bar{x}$ is the sample mean.

Calculating the Standard Deviation by Hand
When calculating by hand, these equivalent formulas are easier to use:

$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} x_i^2 - \mu^2$$

$$s^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\right)$$

Then the standard deviation is the square root: $\sigma = \sqrt{\sigma^2}$ or $s = \sqrt{s^2}$.

Example: Section A
Determine the sample standard deviation of the following test scores.
Section A:  65  66  67  68  71  73  74  77  77  77

Sample Standard Deviation of Section A Test Scores
The sample mean is $\bar{x} = 715/10 = 71.5$. The squared deviations from the mean sum to 204.5, so $s^2 = 204.5/9 \approx 22.72$. The sample standard deviation is therefore $s = \sqrt{22.72} \approx 4.77$.
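The formulas above translate directly into code. Below is a minimal sketch, assuming Python rather than the course's R. It implements the definitional sample variance, the by-hand shortcut, and the population variance, then checks them against Section A. The three function names are hypothetical. statistics.stdev is Python's standard-library sample standard deviation.

```python
import math
import statistics

scores = [65, 66, 67, 68, 71, 73, 74, 77, 77, 77]  # Section A

def sample_variance(xs):
    """s^2 = sum((x_i - xbar)^2) / (n - 1), the definitional formula."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

def sample_variance_shortcut(xs):
    """s^2 = (sum(x_i^2) - n * xbar^2) / (n - 1), the by-hand formula."""
    n = len(xs)
    xbar = sum(xs) / n
    return (sum(x * x for x in xs) - n * xbar ** 2) / (n - 1)

def population_variance(xs):
    """sigma^2 = sum((x_i - mu)^2) / N, treating xs as the whole population."""
    n = len(xs)  # N in the formula
    mu = sum(xs) / n
    return sum((x - mu) ** 2 for x in xs) / n

s = math.sqrt(sample_variance(scores))
print(f"s (definition) = {s:.2f}")
print(f"s (shortcut)   = {math.sqrt(sample_variance_shortcut(scores)):.2f}")
print(f"s (library)    = {statistics.stdev(scores):.2f}")
print(f"sigma, if these 10 were the whole population = {math.sqrt(population_variance(scores)):.2f}")
```

The definitional and shortcut formulas agree here, giving s ≈ 4.77. In floating point, however, the shortcut can lose precision when the values are large relative to their spread. This is one reason software usually works with deviations from the mean.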
A standard deviation of 4.77 means that, for the sample of 10 students from Section A, the test scores lie a typical distance of about 4.77 points from the mean of 71.50 points.

In Class Work Example
A statistics teacher wants to decide whether or not to curve an exam. From her class of 300 students, she chose a sample of 10 students, whose grades were:
72, 88, 85, 81, 60, 54, 70, 72, 63, 43
- Determine the sample mean.
- What is the variance?
- What is the standard deviation?

Add 10
Suppose the statistics instructor decides to curve the grades by adding 10 points to each score. What are the new mean, variance, and standard deviation?

Multiply by 2
For the following dataset, the mean is x̄ = 4.5, the variance is s² = 3.5, and the standard deviation is s = 1.870829.
3, 6, 2, 7, 4, 5
Now multiply each value by 2. What are the new variance and the new standard deviation?

Calculating Standard Deviation
For larger data sets, use a calculator or computer software. Every calculator is different; if you cannot work out how to compute the standard deviation on yours, ask your instructor. For this course we will use R. The R function for the sample standard deviation is sd(data name$variable name).
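Like R's sd(), Python's statistics.stdev uses the n − 1 divisor, so it returns the sample standard deviation s. Below is a minimal sketch for checking the in-class exercises by hand. It uses Python only as an illustrative alternative; it is not the course's tool.

```python
import statistics

grades = [72, 88, 85, 81, 60, 54, 70, 72, 63, 43]

# Original sample: mean, sample variance (n - 1 divisor), sample standard deviation
print(statistics.mean(grades), statistics.variance(grades), statistics.stdev(grades))

# Curve: add 10 points to every grade
curved = [g + 10 for g in grades]
print(statistics.mean(curved), statistics.variance(curved), statistics.stdev(curved))

# "Multiply by 2" exercise
data = [3, 6, 2, 7, 4, 5]
doubled = [2 * x for x in data]
print(statistics.variance(data), statistics.variance(doubled))
print(statistics.stdev(data), statistics.stdev(doubled))
```

The printed values can be compared with the answers worked out by hand using the formulas above.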