CHAPTER 1 OUTLINE
Chapter 1: Descriptive Statistics

Population Data

The numeric characteristics of all members of a specific group of individuals or things are called population data.

Population: all students at IUPUI (about 29,000).

Numeric characteristics:
  o Household income
  o Marital status ("married" = 1; "single" = 0)
  o Work status ("full time" = 1; "part-time or not working" = 0)
  o Age
  o Residency status ("in-state" = 1; "out-of-state" = 0)
  o Commuting distance and time to campus

Sample Data

A sample is a subset of the population. The numeric characteristics of a subset of the specific group of individuals or things are called sample data.

[Figure: diagram labeled "Sample" and "Population"]

Population Parameter

A summary characteristic computed from population data is called a population parameter.

Population Mean (µ, "mu")

The mean of the population is a population parameter.

  µ = ∑x / N

Example
Population data, X: the commuting distance of N = 88 E270 students.

  13   3   8  20   6   3   6  24  27  30  18
   4  24  23   5   3  20  15   9  21  22  17
  13  19  24  20  30   8  23   3  12  17  18
  27  12   8  17   9  24  30   7  27  16   1
  19  27  26  26  24  27  22  13  16  19  21
  24   6  11  19  10  15  11   7  23   8  23
  23  22   3   5  19  27  16  15  24  25  18
   3  22  28  23  11   8   3   4  21   5  11

  ∑x = 1,419     N = 88
  µ = 1,419 / 88 = 16.125

Population Proportion (π, "pi")

The population proportion is a population parameter.

  π = ∑x / N

Example
The population data, X: gender of N = 88 E270 students. X is binary data: Female = 1, Male = 0.

  1  0  0  1  0  0  0  1  0  0  0
  1  1  0  1  0  0  1  1  0  0  1
  0  1  0  1  0  1  1  0  0  1  1
  1  1  1  0  0  1  1  1  0  0  0
  0  1  0  1  1  0  1  1  1  1  1
  1  1  0  0  0  1  0  1  1  0  1
  0  1  0  0  0  0  0  0  1  0  0
  0  0  1  0  0  0  1  1  0  1  0

Proportion of students who are female:

  ∑x = 41     N = 88
  π = 41 / 88 = 0.466 (or 46.6%)

Sample Statistic

A summary characteristic computed from sample data is called a sample statistic.

Sample Mean (x̄, "x-bar")

The mean of sample data is a sample statistic.

Example
The commuting distance of n = 5 students randomly selected from the population of E270 students.
Sample data, x:

  9  20  10  3  26

  x̄ = ∑x / n
  ∑x = 68     n = 5
  x̄ = 68 / 5 = 13.6

Sample Proportion (p̄, "p-bar")

The sample proportion is a sample statistic.

Example
The proportion of a sample of n = 50 IUPUI students who smoke tobacco. "Smokes": x = 1; "Does not smoke": x = 0.

  0  1  1  0  1  0  0  0  0  0
  0  0  0  1  0  1  0  0  0  0
  0  1  1  0  1  0  0  0  0  0
  0  0  0  1  0  0  0  0  0  0
  0  0  0  0  0  0  0  0  0  0

  p̄ = ∑x / n
  ∑x = 9     n = 50
  p̄ = 9 / 50 = 0.18

The Mean is the Center of Gravity of Data

The five items in your short grocery list have the following prices. Prices are rounded to the nearest dollar for ease of calculation.

  Item i    Price ($) x_i    Deviation x_i − µ
    1             2                 -5
    2             5                 -2
    3             3                 -4
    4             8                  1
    5            17                 10
  Sum            35                  0

  µ = ∑x / N = 35 / 5 = 7

The mean is the center of gravity of the data because the sum of deviations from the mean is zero:

  ∑(x_i − µ) = 0

Weighted Mean

The weighted mean is used when each observation in the data set is assigned a specified weight. The weight of each data point can be either a relative weight or the frequency of that data point in the data set.

Example
An instructor assigns the following weights to the requirements in an introduction to statistics course:

  Homework assignments    0.20 (20%)
  Midterm test            0.30 (30%)
  Final                   0.50 (50%)

A student's scores on these requirements, on a scale of 100, are shown below. The course letter grade is based on the average computed from these scores. Compute the average score.

  Requirement    Score x_i    Weight w_i    x_i w_i
  Homework          95           0.2          19.0
  Midterm           84           0.3          25.2
  Final             80           0.5          40.0
  Sum                            1.0          84.2

Because the weights sum to 1, the weighted average is simply the sum of the weighted scores:

  Weighted average = ∑x_i w_i = 84.2

Example
The distribution of test scores (scale of 100) on the midterm is as follows. Compute the average score of the 50 students who took the test.
  Score x_i    Frequency f_i    x_i f_i
      56             1              56
      64             1              64
      72             3             216
      76            10             760
      80            12             960
      84             8             672
      88             6             528
      92             4             368
      96             2             192
     100             3             300
  Sum               50            4116

  ∑x_i f_i = 4116     N = ∑f_i = 50
  µ = ∑x_i f_i / N = ∑x_i f_i / ∑f_i = 4116 / 50 = 82.32

Measure of Dispersion (Variability) of Data: Variance

What is "dispersion" or "variability"?

Dispersion or variability of data indicates how the data points in a data set are distributed along the number line. This dispersion is measured against a benchmark on the number line. The benchmark is the mean, or center of gravity, of the data set. The distance of each data point from the mean is called the "deviation".

For the data set X, the deviation of each observation from the mean is x_i − µ_x. For the data set Y, the deviation is y_i − µ_y.

A logical and intuitive summary measure of the deviations is their average. To find this average you would sum the deviations and divide the sum by the number of deviations. But the sum of deviations is always equal to zero, so the average deviation is also always zero. To avoid this outcome, square the deviations and find the mean of the squared deviations. The mean of squared deviations is the variance of the data.

Data set X:

     x      x − µ    (x − µ)²
    14       -6         36
    16       -4         16
    18       -2          4
    24        4         16
    28        8         64
  Sum 100     0        136

  Mean: µ = ∑x / N = 100 / 5 = 20
  Variance: σ² = ∑(x − µ)² / N = 136 / 5 = 27.2
  Standard deviation: σ = √27.2 = 5.215

Data set Y:

     y      y − µ    (y − µ)²
     1      -16        256
     3      -14        196
     7      -10        100
    24        7         49
    50       33       1089
  Sum 85      0       1690

  Mean: µ = ∑y / N = 85 / 5 = 17
  Variance: σ² = ∑(y − µ)² / N = 1690 / 5 = 338
  Standard deviation: σ = √338 = 18.385

Variance is the average squared deviation. The standard deviation, the square root of the variance, is an average deviation computed in a roundabout way.
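As a quick check of the X and Y data sets above, here is a minimal Python sketch of the population variance and standard deviation. The function name population_variance and the variable names are illustrative choices, not part of the course notes.

```python
import math


def population_variance(data):
    """Mean of squared deviations from the population mean (divides by N)."""
    n = len(data)
    mu = sum(data) / n
    return sum((value - mu) ** 2 for value in data) / n


# Data sets X and Y from the variance example.
x = [14, 16, 18, 24, 28]
y = [1, 3, 7, 24, 50]

var_x = population_variance(x)
var_y = population_variance(y)
sd_x = math.sqrt(var_x)
sd_y = math.sqrt(var_y)

print(f"X: variance = {var_x:.1f}, standard deviation = {sd_x:.3f}")
print(f"Y: variance = {var_y:.1f}, standard deviation = {sd_y:.3f}")
```

Python's standard library also provides statistics.pvariance and statistics.pstdev, which implement the same population (divide by N) formulas.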
Simpler formula to compute the variance

The numerator of the variance formula can be calculated more simply by using the following formula:

  ∑(x − µ)² = ∑x² − Nµ²

Regular method:

     x       x − µ    (x − µ)²
    5.2      -4.38     19.1844
    6.9      -2.68      7.1824
    8.6      -0.98      0.9604
   12.8       3.22     10.3684
   14.4       4.82     23.2324
  Sum 47.9             60.9280

  µ = ∑x / N = 47.9 / 5 = 9.58
  ∑(x − µ)² = 60.928
  σ² = 60.928 / 5 = 12.1856

Simple method:

     x         x²
    5.2       27.04
    6.9       47.61
    8.6       73.96
   12.8      163.84
   14.4      207.36
  Sum 47.9   519.81

  µ = ∑x / N = 47.9 / 5 = 9.58
  ∑x² = 519.81
  ∑x² − Nµ² = 519.81 − 5(9.58²) = 60.928
  σ² = 60.928 / 5 = 12.1856

Variance of Binary Data

The variance of binary data is computed using the simple formula

  σ² = π(1 − π)

where π is the population proportion (the mean of binary data).

Using the simple method for computing the numerator of the variance formula (replacing µ with π), we have

  σ² = ∑(x − π)² / N = (∑x² − Nπ²) / N

Since x is binary, x² = x, and therefore ∑x² = ∑x. For example:

  ∑x  = 1 + 1 + 0 + 0 + 0 = 2
  ∑x² = 1 + 1 + 0 + 0 + 0 = 2

Substituting ∑x for ∑x² and using π = ∑x / N:

  σ² = (∑x − Nπ²) / N = ∑x / N − π² = π − π² = π(1 − π)

For the example data:

  π = ∑x / N = 2 / 5 = 0.4
  σ² = π(1 − π) = 0.4(0.6) = 0.24

Variance of Sample Data

Using x̄ for the mean of the sample data, the sum of squared deviations in the numerator of the variance formula is:

  ∑(x − x̄)²

But to find the average squared deviation of the sample data you must divide the sum of squared deviations by "n − 1", where n (lower case) is the number of data points in the sample (the sample size). The symbol for the sample variance is s².

  s² = ∑(x − x̄)² / (n − 1)

The simple formula for the numerator of the sample variance is:

  ∑(x − x̄)² = ∑x² − n x̄²

Example
Below is the fill (in ounces) of a random sample of 5 bottles of soda. Compute the mean, variance, and standard deviation of the fill.
The regular method:

     x       x − x̄    (x − x̄)²
    16.2      0.3       0.09
    15.8     -0.1       0.01
    15.6     -0.3       0.09
    16.4      0.5       0.25
    15.5     -0.4       0.16
  Sum 79.5    0.0       0.60

  x̄ = ∑x / n = 79.5 / 5 = 15.9
  s² = ∑(x − x̄)² / (n − 1) = 0.60 / 4 = 0.15
  s = √0.15 = 0.387

The "simple" method:

     x          x²
    16.2      262.44
    15.8      249.64
    15.6      243.36
    16.4      268.96
    15.5      240.25
  Sum 79.5   1264.65

  x̄ = ∑x / n = 79.5 / 5 = 15.9
  ∑x² − n x̄² = 1264.65 − 5(15.9²) = 0.60
  s² = 0.60 / 4 = 0.15
  s = √0.15 = 0.387

The z-score

We use the standard deviation of the data set (σ for population data, s for sample data) to transform the observations so that each one is expressed as its deviation from the mean, measured in units of the standard deviation.

  z_i = (x_i − µ) / σ

For example, suppose the mean of a data set is µ = 20 and the standard deviation is σ = 6, and suppose one of the observations in this data set is x_i = 32. Then the deviation of this data point from the mean is x_i − µ = 32 − 20 = 12. Given σ = 6, this observation, x_i = 32, is (32 − 20) / 6 = 2 standard deviations from the mean. The deviation "x_i − µ" is expressed in units of the standard deviation σ. This is the z-score.

  z = (32 − 20) / 6 = 2

You can find the z-score for each observation in the data set.

Example
Find the mean µ and standard deviation σ of the following population data set. Then transform each x_i into its z-score.

  x_i:  8  12  18  26  41

To find µ and σ:

     x      x − µ    (x − µ)²
     8      -13        169
    12       -9         81
    18       -3          9
    26        5         25
    41       20        400
  Sum 105     0        684

  µ = ∑x / N = 105 / 5 = 21
  σ² = ∑(x − µ)² / N = 684 / 5 = 136.80
  σ = √136.8 = 11.696

To find the z-scores:

     x      z = (x − µ) / σ     z²
     8          -1.11          1.24
    12          -0.77          0.59
    18          -0.26          0.07
    26           0.43          0.18
    41           1.71          2.92
  Sum 105        0.00          5.00

The Mean and Standard Deviation of z

  ∑z = 0
  µ_z = ∑z / N = 0
  σ_z² = ∑(z − µ_z)² / N = ∑z² / N = 5 / 5 = 1
  σ_z = 1
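The z-score example above can be reproduced with a short Python sketch. It assumes the population formulas (dividing by N), as in the example; the function name z_scores is an illustrative choice, not part of the course notes.

```python
import math


def z_scores(data):
    """Return (x - mu) / sigma for each x, using the population sigma."""
    n = len(data)
    mu = sum(data) / n
    sigma = math.sqrt(sum((value - mu) ** 2 for value in data) / n)
    return [(value - mu) / sigma for value in data]


# Population data from the z-score example.
x = [8, 12, 18, 26, 41]
z = z_scores(x)

# The transformed data should have mean 0 and variance 1,
# up to floating-point rounding.
mean_z = sum(z) / len(z)
var_z = sum((value - mean_z) ** 2 for value in z) / len(z)

print("z-scores:", [round(value, 2) for value in z])
print("mean of z:", round(mean_z, 6))
print("variance of z:", round(var_z, 6))
```

Whatever the original data, z-scores computed this way have mean 0 and standard deviation 1, which is what the table's column sums (∑z = 0 and ∑z² = N) express.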