CHAPTER 1
REVIEW OF BASIC STATISTICS CONCEPTS

1. Rules of Summation
2. What is Statistics?
   2.1. Descriptive statistics
   2.2. Inferential statistics
        2.2.1. Population
        2.2.2. Sample
3. Important Measures of Central Tendency and Data Variability
   3.1. The Mean
        3.1.1. Population mean µ
        3.1.2. Sample mean x̄
   3.2. The Mean as the Center of Gravity of the Data
   3.3. Variance
        3.3.1. Variance of Population Data
        3.3.2. Sample Variance
   3.4. Standard Deviation
   3.5. The z-score
4. Measures of Association Between Two Variables
   4.1. Covariance
        4.1.1. Population Covariance
               4.1.1.1. Covariance is affected by the scale of the data
               4.1.1.2. The covariance sign (−, +) indicates the direction of relationship between x and y
               4.1.1.3. When x and y are Unrelated
               4.1.1.4. The Computational (Simpler) Formula for Covariance
        4.1.2. Sample Covariance
   4.2. Correlation Coefficient
5. Random Variables
   5.1. Probability Distribution (Probability Density Function) of Discrete Random Variables
   5.2. Expected Value of Discrete Random Variables
        5.2.1. Expected Value Rules
   5.3. Variance of the Discrete Random Variable
        5.3.1. Variance Rules
   5.4. The Fixed and Random Components of a Random Variable
   5.5. Joint Probability Distribution (Probability Density Function) of Two Random Variables
   5.6. Independent versus Dependent Random Variables
   5.7. Covariance of x and y
        5.7.1. The covariance formula simplified
   5.8. Coefficient of Correlation
   5.9. Effect of Linear Transformation of Two Random Variables on Their Covariance and Correlation
   5.10. The Mean and Variance of the Sum of Two Random Variables
6. Normal Distribution

1-Numerical Descriptive Statistics, 1 of 37

1. Rules of Summation

1) Sum of xᵢ:

   ∑ᵢ₌₁ⁿ xᵢ = x₁ + x₂ + ⋯ + xₙ

   Example:

   i  :  1   2   3   4   5
   xᵢ : 20  21  19  22  24        ∑xᵢ = 106

   ∑ᵢ₌₁⁵ xᵢ = 20 + 21 + 19 + 22 + 24 = 106

2) For a given constant k:
   ∑ᵢ₌₁ⁿ k = k + k + ⋯ + k = nk

   Example:

   ∑ᵢ₌₁⁵ 10 = 10 + 10 + 10 + 10 + 10 = 5 × 10 = 50

3) Sum of kxᵢ:

   ∑ᵢ₌₁ⁿ kxᵢ = kx₁ + kx₂ + ⋯ + kxₙ
            = k(x₁ + x₂ + ⋯ + xₙ)
            = k ∑ᵢ₌₁ⁿ xᵢ

4) Sum of k + mxᵢ:

   ∑ᵢ₌₁ⁿ (k + mxᵢ) = (k + mx₁) + (k + mx₂) + ⋯ + (k + mxₙ)
                   = (k + k + ⋯ + k) + m(x₁ + x₂ + ⋯ + xₙ)
                   = ∑k + m∑xᵢ
                   = nk + m ∑ᵢ₌₁ⁿ xᵢ

5) Sum of xᵢ + yᵢ:

   ∑ᵢ₌₁ⁿ (xᵢ + yᵢ) = (x₁ + x₂ + ⋯ + xₙ) + (y₁ + y₂ + ⋯ + yₙ)
                   = ∑xᵢ + ∑yᵢ

2. What is Statistics?

Statistics is the discipline that studies the collection, organization, presentation, analysis, and interpretation of numerical data. There are two branches of statistics: descriptive statistics and inferential statistics.

2.1. Descriptive statistics

Descriptive statistics is the easy part. It deals with the collection, organization, and presentation of data. It involves tables, charts, and the presentation of summary characteristics of the data, such as the mean, median, or standard deviation. Descriptive statistics is encountered daily in the news media. For example, in the weather report you frequently hear about the average temperature, precipitation, or pollen count in a given month of the year. You may also read about the stock market trend, changes in the mortgage rate, the rise and fall in the crime rate, students' performance in statewide tests, and many similar reports.

2.2. Inferential statistics

Inferential statistics is the complicated part of statistics. It deals with inferring, or drawing conclusions about, the whole of a phenomenon (the population data) from the analysis of a part of it (the sample data). An opinion poll is an example of inferential statistics.
For example, to determine voters' preference for a given political candidate, a sample of registered voters is questioned, and from their responses inferences are made about the attitudes of the population of all potential voters. The reason inferential statistics is more complicated is that it involves the theories of probability and sampling distributions, subjects unfamiliar to most students of introductory statistics.

2.2.1. Population

In inferential statistics, the term population applies to every element, observation, or data point in the phenomenon or group that is the subject of the analysis. Stated another way, a population consists of all the items or individuals about which you want to draw a conclusion.

2.2.2. Sample

A sample is a subset of the population selected in order to estimate, or infer about, specific characteristics of the population. For example, suppose we are interested in the average age of the residents of a retirement community in Florida. Table 1.1, listing the age of every resident, represents the population that is the subject of the study. The population has 608 observations. The shaded cells in the table represent the age data for a sample of size 40 randomly selected from the population. Table 1.2 contains the sample data.

The population data set here is said to be "finite": you can easily obtain and list every observation, and you can easily compute the average age. Here the average age of the population of residents in this community is 64.2. The population average (or population mean), denoted by µ (mu, the Greek lowercase m), is an example of a summary characteristic of a data set. A summary characteristic that is obtained from the population is called a population parameter. Thus, µ = 64.2 is a population parameter.
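The parameter-versus-statistic distinction can be sketched in a few lines of Python. The list of ages below is a made-up stand-in for the 608 ages of Table 1.1, not the actual data; the point is only that µ is computed from every element, while x̄ is computed from a random subset and merely estimates µ.

```python
import random

# Hypothetical stand-in for the population of ages in Table 1.1 (not the real data).
population = [64, 70, 58, 61, 67, 55, 69, 62, 66, 60] * 10  # 100 ages, for illustration

# Population parameter: the mean computed from every element of the population.
mu = sum(population) / len(population)

# Sample statistic: the mean of a randomly selected subset estimates mu.
random.seed(1)  # fixed seed so the sketch is repeatable
sample = random.sample(population, 40)
x_bar = sum(sample) / len(sample)

print(mu, x_bar)
```

Different random samples yield different values of x̄, but each is an estimate of the single fixed parameter µ.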
Table 1.1 Population Age Data for the Residents of a Retirement Community 82 69 56 74 68 65 60 66 70 51 59 64 75 69 69 70 60 76 65 64 64 55 64 65 69 75 58 61 65 59 62 62 57 61 59 78 70 70 69 63 62 73 52 68 79 61 68 68 56 79 61 68 67 52 67 65 65 53 56 69 66 57 67 62 54 70 58 76 65 64 79 70 69 55 60 69 54 62 67 61 60 69 55 68 66 52 69 62 67 70 57 69 60 58 63 65 88 67 65 55 61 69 63 63 67 53 58 65 76 63 66 69 58 74 72 63 66 51 66 68 70 50 57 62 51 58 66 70 62 67 54 70 64 64 61 66 62 69 68 68 63 67 68 64 59 52 52 70 64 55 62 67 66 73 57 61 67 60 68 69 87 64 50 56 70 61 60 70 70 60 68 68 72 61 67 68 56 68 67 61 64 55 67 64 61 77 60 67 67 60 62 66 84 67 68 62 67 64 67 69 68 74 54 63 68 66 67 68 56 61 70 63 63 54 61 66 68 79 60 67 56 57 68 53 52 68 66 59 67 63 76 69 66 78 52 65 72 64 61 63 59 64 75 63 69 52 66 65 61 51 57 70 61 59 68 56 85 63 59 67 61 70 73 69 61 55 62 70 65 68 61 62 58 53 66 70 66 55 65 62 67 66 57 61 64 58 70 50 90 63 67 54 62 67 78 63 65 76 66 64 79 67 63 69 57 59 63 61 67 50 63 62 61 84 60 66 67 57 62 64 50 62 58 80 68 70 51 70 66 57 66 66 59 65 65 67 60 69 80 65 62 50 61 66 64 64 59 67 57 60 61 66 81 70 61 57 70 69 54 69 65 78 65 70 80 69 69 63 60 69 74 63 62 54 64 65 69 53 60 70 69 59 62 65 78 61 61 63 67 63 76 65 68 76 67 62 52 65 68 64 59 73 66 69 69 54 67 66 64 73 59 62 58 60 65 67 61 64 50 74 67 63 52 63 66 57 56 70 76 61 65 61 60 55 75 64 66 54 63 70 70 60 60 69 62 59 63 51 79 67 68 79 70 67 55 64 70 63 57 66 50 68 63 62 59 52 50 61 68 52 63 69 64 60 60 69 52 60 62 63 82 65 67 65 68 70 58 68 65 57 67 63 55 67 69 68 60 77 52 64 65 53 62 65 64 73 56 65 62 56 65 63 71 64 52 73 64 68 62 64 62 52 66 68 76 64 66 68 58 76 61 69 61 51 64 67 61 56 59 61 70 60 62 55 86 66 54 73 64 66 50 63 64 72 68 63 57 61 65 67 60 74 55 67 70 52 61 63 66 66 59 68 53 56 68 55 73 67 57 50 61 70 69 66 68 57 52 68 51 65 64 69 59 72 72 65 65 53 67 66 62 82 56 65 50 59 70 57 Sometimes it is preferable to determine the summary characteristic of interest from a sample. 
In most cases the population data set is not finite, and hence not obtainable. In such cases a sample, as a subset of the population data, may serve us better than a study of the whole population. Even with finite population data sets it is sometimes preferable to obtain a sample, because it may be more convenient, or because sample data can be screened for errors much more easily than population data. If a summary characteristic is computed from the sample data, then it represents an estimate of the population parameter. The sample (estimated) summary characteristic is called a sample statistic. Table 1.2 shows the age data for a sample of 40 residents randomly selected from the population.

Table 1.2  The Age Data for a Sample of 40 Residents

   54 69 66 64 61 61 62 70 51 53
   57 69 60 54 60 70 66 69 69 59
   70 76 75 66 60 67 52 50 61 60
   69 52 52 66 62 69 65 63 64 70

The average or mean age computed from this random sample of size 40 is 62.8. The sample average is denoted by the symbol x̄ (x-bar). Thus the sample statistic x̄ = 62.8 is an estimate of the population parameter µ = 64.2.

3. Important Measures of Central Tendency and Data Variability

The two main summary characteristics of data used in statistics are the measure of central tendency, or center of gravity, of the data and the measure of data variability. The measure of central tendency is the mean; the measures of data variability (dispersion) are the variance, which is a squared measure, and its square root, the standard deviation.

3.1. The Mean

The most widely known and used measure of central tendency is the arithmetic mean (or, simply, the mean or the average). The mean is the sum of the values of all the observations in a data set divided by the number of observations.
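As a minimal sketch in Python, using the five data points of the worked example that follows:

```python
# The mean: the sum of all observations divided by the number of observations.
data = [2, 5, 7, 9, 17]
mean = sum(data) / len(data)
print(mean)  # 8.0
```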
If the data set represents a population, the mean is denoted by µ; if the data set represents a sample, the sample mean is denoted by x̄.

3.1.1. Population Mean

To obtain the population mean, add all the values in the data set and divide the sum by the number of observations in that data set, N:

   µ = ∑xᵢ ⁄ N

Example
A population consists of five data points: 2, 5, 7, 9, 17. The population mean is computed as

   µ = ∑xᵢ ⁄ N = (2 + 5 + 7 + 9 + 17) ⁄ 5 = 40 ⁄ 5 = 8

3.1.2. Sample Mean

The formula for the sample mean is the same as the population mean formula, except for the symbols. The sample mean is denoted by x̄ and the sample size by n:

   x̄ = ∑xᵢ ⁄ n

3.2. The Mean as the Center of Gravity of the Data Set

The mean represents the "center of gravity" of a set of numbers. To explain this, you must first understand one of the most important terms in statistics: deviation. A deviation is simply the distance of, or difference between, each data point and some benchmark. Here the benchmark is the mean, so the deviation is defined as xᵢ − µ. Table 1.5 below shows the deviation of each of the five data points from the mean µ = 8. Note that the sum of the deviations equals zero. This is where the notion of "center of gravity" comes in.

Table 1.5  Deviation of data from the mean (µ = 8)

   xᵢ     Deviation xᵢ − µ
    2          −6
    5          −3
    7          −1
    9           1
   17           9
          ∑(xᵢ − µ) = 0

(Diagram: a number line showing the points 2, 5, 7, 9, and 17 balancing at µ = 8.)

As the diagram suggests, µ = 8 is the balancing point of the five numbers. The sum of the deviations of the values exceeding the mean, 1 + 9 = 10, exactly balances the sum of the deviations of the values below the mean, −6 − 3 − 1 = −10. Thus the sum of all deviations from the mean is zero.

3.3. Variance

The variance of a data set is a measure of the dispersion of the data. It represents the mean squared deviation of the data points from the mean.

3.3.1. Variance of Population Data

The population variance is denoted by σ² (lowercase Greek letter sigma, squared). To compute the variance of a population data set, first find the mean µ, then determine the sum of the squared deviations of the observations from the mean, as follows:

   Deviation from the mean:        xᵢ − µ
   Squared deviation:              (xᵢ − µ)²
   Sum of the squared deviations:  ∑(xᵢ − µ)²

Variance is the mean squared deviation. Therefore, divide the sum of squared deviations by N:

   σ² = ∑(xᵢ − µ)² ⁄ N

Example
Find the variance of the following data set: 34, 55, 46, 38, 42. The following worksheet shows the computations:

   xᵢ     xᵢ − µ    (xᵢ − µ)²
   34       −9         81
   55       12        144
   46        3          9
   38       −5         25
   42       −1          1
   µ = 43        ∑(xᵢ − µ)² = 260

   σ² = 260 ⁄ 5 = 52

3.3.2. Computational (Simpler) Formula to Find the Population Variance

We can rearrange the variance formula to obtain a simpler computational process. Of course, if you have access to software such as Excel there is no need to use this formula. Nevertheless, learning how to derive the computational formula is a useful exercise in statistical computation. The numerator of the variance formula can be simplified as follows:

   ∑(x − µ)² = ∑(x² − 2µx + µ²)
             = ∑x² − 2µ∑x + Nµ²
             = ∑x² − 2Nµ² + Nµ²          (since ∑x = Nµ)
             = ∑x² − Nµ²

Then,

   σ² = (∑x² − Nµ²) ⁄ N

Compute the variance of the data set above using the computational formula:

   x      x²
   34    1156
   55    3025
   46    2116
   38    1444
   42    1764
   µ = 43   ∑x² = 9505

   σ² = (9505 − 5 × 43²) ⁄ 5 = 52

In Excel, the variance of a population data set is obtained by the function =VAR.P(data range).

3.3.3. Sample Variance

The variance of a sample data set is not only denoted by a different symbol but is also obtained using a different formula. To find the average squared deviation, or "mean square," divide the total sum of squares by n − 1. The value n − 1 is called the degrees of freedom.
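The definitional and computational variance formulas above, together with the n − 1 sample version, can be checked numerically. A minimal Python sketch using the five data points from the worked example:

```python
data = [34, 55, 46, 38, 42]      # population data from the example above
N = len(data)
mu = sum(data) / N               # 43.0

# Definitional formula: mean squared deviation from the mean.
var_def = sum((x - mu) ** 2 for x in data) / N

# Computational formula: (sum of x squared minus N*mu^2), divided by N.
var_comp = (sum(x ** 2 for x in data) - N * mu ** 2) / N

# Sample version: divide the same sum of squares by n - 1 (the degrees of freedom).
s2 = sum((x - mu) ** 2 for x in data) / (N - 1)

print(var_def, var_comp, s2)     # 52.0 52.0 65.0
```

Both population formulas give 52, while treating the same numbers as a sample gives 65, matching the examples in the text.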
This concept will be explained later, within the context of inferential statistics. There we will also learn why we divide the sum of squared deviations by n − 1, rather than by n, to determine the sample variance.

   s² = ∑(xᵢ − x̄)² ⁄ (n − 1)

Example
Assuming the same data as above now represent a sample, find the sample variance.

   x      x − x̄    (x − x̄)²
   34       −9        81
   55       12       144
   46        3         9
   38       −5        25
   42       −1         1
   x̄ = 43        ∑(x − x̄)² = 260

   s² = 260 ⁄ 4 = 65

Similarly, the computational formula for the sample variance is:

   s² = (∑x² − nx̄²) ⁄ (n − 1)

In Excel, the variance of a sample data set is found by the function =VAR.S(data range).

3.4. Standard Deviation

The standard deviation is the (positive) square root of the variance. It is an indirect way of obtaining the mean deviation of the data points from their center of gravity. The population standard deviation formula is:

   σ = √[∑(x − µ)² ⁄ N]

And the sample standard deviation:

   s = √[∑(x − x̄)² ⁄ (n − 1)]

In Excel, the standard deviation of a population data set (σ) is obtained by =STDEV.P(data range), and the standard deviation of a sample data set (s) by =STDEV.S(data range).

3.5. The z-score

Using the mean and the standard deviation of a data set we can determine the relative location of each observation or data point. This relative location is measured as the distance, or deviation, of each data point from the mean in units of the standard deviation. The deviation of each data point from the mean is x − µ. If you divide the deviation by σ, then the distance is measured relative to, or in units of, the standard deviation. Through this process we "standardize" the data points: we transform the variable x into the standardized variable z.

   z = (x − µ) ⁄ σ

Example
For the following data set, find the mean and the standard deviation, and then find the z-score for each data point. That is, transform the x variable into a z variable.
   Data: 46, 54, 42, 46, 32        µ = 44     σ = 7.155

The standardized values are determined as follows:

   x      x − µ     z = (x − µ) ⁄ σ
   46       2          0.28
   54      10          1.40
   42      −2         −0.28
   46       2          0.28
   32     −12         −1.68
        ∑(x − µ) = 0   ∑z = 0.00

Note that since the sum of all deviations from the mean equals zero, the mean of all z-scores must also be zero:

   µz = ∑z ⁄ N = (1 ⁄ N) ∑[(x − µ) ⁄ σ] = ∑(x − µ) ⁄ (Nσ) = 0

Also, the variance and the standard deviation of the z-scores are both equal to one:

   σz² = ∑(z − µz)² ⁄ N
       = (1 ⁄ N) ∑z²
       = (1 ⁄ N) ∑[(x − µ)² ⁄ σ²]
       = ∑(x − µ)² ⁄ (Nσ²) = Nσ² ⁄ (Nσ²) = 1

   z         (z − µz)² *
    0.28      0.0781
    1.40      1.9531
   −0.28      0.0781
    0.28      0.0781
   −1.68      2.8125
   ∑z = 0.00   ∑(z − µz)² = 5.0000

   µz = ∑z ⁄ N = 0.00
   σz² = ∑(z − µz)² ⁄ N = 1.0000

* The z values are rounded to two decimals. The squared values in the second column are therefore not exactly the squares of the rounded values in the first column.

4. Measures of Association Between Two Variables

In many statistical analyses we are interested in the relationship between two variables. For example: To what extent does advertising affect the sales volume of a product? How does the per-unit cost of production vary with the volume of output? How do a state's annual tax revenues vary with changes in the Gross Domestic Product? An important measure of the degree of association between two variables is covariance. A related, and more widely used, measure is the correlation coefficient. These measures are discussed below.

4.1. Covariance

4.1.1. Population Covariance

Consider the following two population data sets, represented by x and y, with an equal number of data points, where each data point in x is paired with a data point in y (Example 1). Covariance, as the term implies, measures the joint variation in the two data sets.
It is the average value of the product of the deviations of the data points in each set from their respective means:

   σxy = ∑(x − µx)(y − µy) ⁄ N

Example 1

   x      y      x − µx    y − µy    (x − µx)(y − µy)
   58    122       −3         9            −27
   75    118       14         5             70
   30    115      −31         2            −62
  102    144       41        31           1271
   69     98        8       −15           −120
   84    160       23        47           1081
   22     60      −39       −53           2067
   48     87      −13       −26            338
   µx = 61   µy = 113        ∑(x − µx)(y − µy) = 4618

   σxy = ∑(x − µx)(y − µy) ⁄ N = 4618 ⁄ 8 = 577.25

4.1.1.1. Covariance is affected by the scale of the data

Because the size of the covariance is affected by the scale of the data, covariance is not used as a measure of the strength of the relationship between x and y. It would be misleading to think that because a given covariance is large the relationship between the two variables is strong. The covariance above, σxy = 577.25, for example, tells us very little about how strongly x and y are related. From the covariance, however, we can derive a relative measure which clearly shows the strength of the relationship between the two variables. This relative measure is called the correlation coefficient and will be explained below.

4.1.1.2. The covariance sign (−, +) indicates the direction of relationship between x and y

Because covariance can be either negative or positive, its sign is an indicator of the direction of the relationship between x and y. The last column of the table in Example 1 shows that most of the deviation products (x − µx)(y − µy) are positive, and that the overall size of the positive products overwhelms that of the negatives. What is the significance of this? To explain, consider the following scatter diagram of the above data, where each dot represents a corresponding (x, y) pair.
(Scatter diagram of the Example 1 data: the plot area is divided into four quadrants by a vertical line at µx = 61 and a horizontal line at µy = 113. Quadrant II is the upper right, quadrant IV the lower left, and quadrants I and III the upper left and lower right, respectively.)

The plot area of the scatter diagram is partitioned into four quadrants by a vertical line representing µx = 61 and a horizontal line representing µy = 113. The intersection point of the two mean lines can be viewed as the center of gravity of the data. The points in quadrant II represent all the (x, y) pairs that exceed their respective means. Thus, all deviations (xᵢ − µx) and (yᵢ − µy) in this quadrant are positive, and their products (xᵢ − µx)(yᵢ − µy) > 0. In quadrant IV, the points represent all the (x, y) pairs that are below their respective means. Thus, all deviations (xᵢ − µx) and (yᵢ − µy) in this quadrant are negative. However, the product of two negatives is always positive. Thus, in quadrant IV also (xᵢ − µx)(yᵢ − µy) > 0. In quadrant III, the deviations (xᵢ − µx) are positive but the deviations (yᵢ − µy) are negative, and the reverse holds in quadrant I. Thus, in these two quadrants the products (xᵢ − µx)(yᵢ − µy) < 0.

In the competition between the positive products in quadrants II and IV, on the one hand, and the negative products in quadrants I and III, on the other, when the positives overwhelm the negatives the covariance sign is positive, indicating a direct relationship between x and y. The relationship is inverse when the negatives win.

4.1.1.3. Covariance when x and y are unrelated

When the points are evenly distributed among the four quadrants, no one wins. The points then have no direction, indicating no relationship between x and y, and the covariance is zero or near zero. When the covariance is zero, x and y are said to be independent.

Example 2
The following data and the related scatter diagram show that x and y are not related. The data points appear to be evenly distributed among the four quadrants.
   x      y
   97    147
   23    132
   86    105
   14    126
   54    101
   44    125
   19    134
   12    107
   55    109
   26    105
   87    119
   82    141
   41    143
   56    131
   92    122
   88    142

(Scatter diagram of the Example 2 data: the points scatter evenly around the mean lines µx and µy, showing no direction.)

4.1.1.4. The Computational (Simpler) Formula for Covariance

We can simplify the population covariance formula for computational purposes by rewriting the numerator as follows:

   ∑(x − µx)(y − µy) = ∑(xy − µyx − µxy + µxµy)
                     = ∑xy − µy∑x − µx∑y + Nµxµy
                     = ∑xy − Nµxµy − Nµxµy + Nµxµy
                     = ∑xy − Nµxµy

The covariance formula then becomes:

   σxy = (∑xy − Nµxµy) ⁄ N

Use the computational formula to compute the covariance for Example 1:

   x      y      xy
   58    122    7076
   75    118    8850
   30    115    3450
  102    144   14688
   69     98    6762
   84    160   13440
   22     60    1320
   48     87    4176
              ∑xy = 59762

   σxy = [59762 − 8(61)(113)] ⁄ 8 = 577.25

4.1.2. Sample Covariance

The sample covariance formula differs from the population covariance formula in the denominator. After computing the sum of the products of the deviations, divide the result by n − 1:

   sxy = ∑(x − x̄)(y − ȳ) ⁄ (n − 1)

The computational formula is:

   sxy = (∑xy − nx̄ȳ) ⁄ (n − 1)

Example 3
In the following table, x represents the annual operating revenues and y the operating expenses (in $1,000s) of a small company for a sample of 10 years. Compute the sample covariance of x and y. Use both the main formula and the computational formula.
   Year    x      y     x − x̄   y − ȳ   (x − x̄)(y − ȳ)      xy
    1     209    135    −155    −104       16,120         28,215
    2     235    150    −129     −89       11,481         35,250
    3     262    167    −102     −72        7,344         43,754
    4     289    188     −75     −51        3,825         54,332
    5     328    210     −36     −29        1,044         68,880
    6     364    235       0      −4            0         85,540
    7     410    265      46      26        1,196        108,650
    8     454    302      90      63        5,670        137,108
    9     509    343     145     104       15,080        174,587
   10     580    395     216     156       33,696        229,100
   x̄ = 364   ȳ = 239     ∑(x − x̄)(y − ȳ) = 95,456    ∑xy = 965,416

Original formula:

   sxy = ∑(x − x̄)(y − ȳ) ⁄ (n − 1) = 95,456 ⁄ 9 = 10,606.22

Computational formula:

   sxy = (∑xy − nx̄ȳ) ⁄ (n − 1) = [965,416 − 10(364)(239)] ⁄ 9 = 95,456 ⁄ 9 = 10,606.22

Note that, other than the fact that the covariance is positive, meaning that revenues and expenses vary together, sxy = 10,606.22 conveys very little additional information about the nature of the association between x and y. To measure the strength of the relationship between revenues and expenses we need a different measure. This takes us to the correlation coefficient.

4.2. Correlation Coefficient

The correlation coefficient is a relative measure showing the strength of the relationship between x and y. It is determined by dividing the covariance of x and y by the product of the standard deviation of x and the standard deviation of y. The symbol for the population correlation coefficient is ρxy (rho sub xy) and the symbol for the sample correlation coefficient is rxy. Other than the difference in symbols, the formulas are the same, and both yield identical coefficients.

Population correlation coefficient:

   ρxy = σxy ⁄ (σxσy)        −1 ≤ ρxy ≤ 1

Substituting the covariance and standard deviation formulas and canceling the N's:

   ρxy = ∑(x − µx)(y − µy) ⁄ [√∑(x − µx)² √∑(y − µy)²]

Sample correlation coefficient:

   rxy = sxy ⁄ (sxsy) = ∑(x − x̄)(y − ȳ) ⁄ [√∑(x − x̄)² √∑(y − ȳ)²]        −1 ≤ rxy ≤ 1

Both ρxy and rxy vary between −1 and +1. If the coefficient is negative, then there is an inverse relationship between x and y.
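The sample covariance just computed, and the correlation coefficient built from it, can be verified with a short Python script over the Example 3 revenue and expense data:

```python
import math

# Revenue (x) and expense (y) sample from Example 3, in $1,000s.
x = [209, 235, 262, 289, 328, 364, 410, 454, 509, 580]
y = [135, 150, 167, 188, 210, 235, 265, 302, 343, 395]
n = len(x)
x_bar = sum(x) / n               # 364.0
y_bar = sum(y) / n               # 239.0

# Sample covariance: sum of deviation products divided by n - 1.
s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (n - 1)

# Sample correlation: covariance divided by the product of the standard deviations.
s_x = math.sqrt(sum((a - x_bar) ** 2 for a in x) / (n - 1))
s_y = math.sqrt(sum((b - y_bar) ** 2 for b in y) / (n - 1))
r_xy = s_xy / (s_x * s_y)

print(round(s_xy, 2), round(r_xy, 3))
```

The covariance comes out to 10,606.22 and the correlation to about 0.999, matching the worked figures.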
A positive coefficient indicates a direct relationship between the two variables. The closer the coefficient is to either extreme, the stronger the relationship between x and y; the closer it is to zero, the weaker the relationship. If the coefficient is zero, then there is no relationship between the two: x and y are said to be independent.

Example 4
Find the correlation coefficient for the population data in Example 1.

   x: 58, 75, 30, 102, 69, 84, 22, 48
   y: 122, 118, 115, 144, 98, 160, 60, 87

As computed above, σxy = 577.25. Using the population standard deviation formula, we have:

   σx = √[∑(x − µx)² ⁄ N] = 25.323
   σy = √[∑(y − µy)² ⁄ N] = 29.559

   ρxy = σxy ⁄ (σxσy) = 577.25 ⁄ [(25.323)(29.559)] = 0.77

Since the correlation coefficient is close to 1, there appears to be a reasonably strong association between x and y.

Example 5
Use the sample data in Example 3 to determine the sample correlation coefficient. Using the sample standard deviation formula for the x and y data, we have:

   sx = 122.88      sy = 86.39

   rxy = sxy ⁄ (sxsy) = 10,606.22 ⁄ [(122.88)(86.39)] = 0.999

Here the correlation coefficient is almost one, indicating an extremely close direct relationship between operating revenues and expenses. This close relationship is shown in the following scatter diagram, where the sample points appear to lie on a straight, upward-sloping line.

(Scatter diagram of the Example 3 revenue and expense data: the ten points, from (209, 135) to (580, 395), lie nearly on a straight upward-sloping line.)

Example 6
Compute the correlation coefficient for the data in Example 2.
To obtain the correlation coefficient without finding the covariance first, use the following simpler formula (easily derived from either the population or the sample correlation coefficient formula):

   rxy = (∑xy − nx̄ȳ) ⁄ [√(∑x² − nx̄²) √(∑y² − nȳ²)]

   x      y      xy       x²      y²
   97    147   14259     9409   21609
   23    132    3036      529   17424
   86    105    9030     7396   11025
   14    126    1764      196   15876
   54    101    5454     2916   10201
   44    125    5500     1936   15625
   19    134    2546      361   17956
   12    107    1284      144   11449
   55    109    5995     3025   11881
   26    105    2730      676   11025
   87    119   10353     7569   14161
   82    141   11562     6724   19881
   41    143    5863     1681   20449
   56    131    7336     3136   17161
   92    122   11224     8464   14884
   88    142   12496     7744   20164
   x̄ = 54.75   ȳ = 124.3125   ∑xy = 110432   ∑x² = 61906   ∑y² = 250771

   rxy = [110432 − 16(54.75)(124.3125)] ⁄ [√(61906 − 16 × 54.75²) √(250771 − 16 × 124.3125²)] = 0.2192

The correlation coefficient of 0.22 indicates that there is a very weak association between x and y.

How to Find the Covariance and Correlation Coefficient Using Excel

   The formula for population covariance:        =COVARIANCE.P(array1,array2)
   The formula for sample covariance:            =COVARIANCE.S(array1,array2)
   The formula for the correlation coefficient:  =CORREL(array1,array2)

Since the population and sample correlation coefficient formulas give the same result, the Excel formula applies to both.

5. Random Variables

A random variable is a variable whose values are determined through a random experiment or process. In other words, a random variable is a variable whose value cannot be predicted exactly; the value is not known in advance, and becomes known only after the random experiment is conducted.

Example 5.1
As a random experiment, toss a coin. This random experiment has two outcomes: H and T. Let's assign the value 0 to H, and 1 to T. Thus, this experiment generates values we can assign to the random variable x, which is the number of tails.
If you toss two coins, then the number of tails is 0, 1, or 2:

   Outcomes of the random experiment     Value of x (number of tails)
   (0,0)                                  0
   (0,1), (1,0)                           1
   (1,1)                                  2

Example 5.2
When you toss a pair of dice, let x denote the sum of the numbers of dots appearing on top. These values, shown in the right-hand column below, are assigned to x through the outcomes of the random experiment.

   Outcomes of the random experiment               Value of x (sum of dots)
   (1,1)                                            2
   (1,2), (2,1)                                     3
   (1,3), (2,2), (3,1)                              4
   (1,4), (2,3), (3,2), (4,1)                       5
   (1,5), (2,4), (3,3), (4,2), (5,1)                6
   (1,6), (2,5), (3,4), (4,3), (5,2), (6,1)         7
   (2,6), (3,5), (4,4), (5,3), (6,2)                8
   (3,6), (4,5), (5,4), (6,3)                       9
   (4,6), (5,5), (6,4)                             10
   (5,6), (6,5)                                    11
   (6,6)                                           12

Example 5.3
When you guess the answers to a set of 5 multiple-choice questions, you are conducting a random experiment. Assign 0 to an incorrect answer and 1 to a correct answer for each question, and let x denote the number of correct answers: 0, 1, 2, 3, 4, 5. These values are assigned to x through the outcomes of the random experiment, as shown below. There are 32 possible outcomes in this random experiment.¹

   Outcomes of the random experiment                                  x (correct guesses)
   (0,0,0,0,0)                                                         0
   (1,0,0,0,0), (0,1,0,0,0), (0,0,1,0,0), (0,0,0,1,0), (0,0,0,0,1)     1
   (1,1,0,0,0), (1,0,1,0,0), (1,0,0,1,0), (1,0,0,0,1), (0,1,1,0,0),
   (0,1,0,1,0), (0,1,0,0,1), (0,0,1,1,0), (0,0,1,0,1), (0,0,0,1,1)     2
   (1,1,1,0,0), (1,1,0,1,0), (1,1,0,0,1), (1,0,1,1,0), (1,0,1,0,1),
   (1,0,0,1,1), (0,1,1,1,0), (0,1,1,0,1), (0,1,0,1,1), (0,0,1,1,1)     3
   (1,1,1,1,0), (1,1,1,0,1), (1,1,0,1,1), (1,0,1,1,1), (0,1,1,1,1)     4
   (1,1,1,1,1)                                                         5

5.1. Probability Distribution (Probability Density Function) of Discrete Random Variables

The random variables presented above are examples of discrete random variables.
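The outcome-to-value assignment of Example 5.2 can be reproduced by enumerating the 36 ordered pairs in a few lines of Python:

```python
from collections import Counter

# Enumerate all 36 ordered (die 1, die 2) outcomes and count how many produce each sum.
counts = Counter(d1 + d2 for d1 in range(1, 7) for d2 in range(1, 7))

for total in sorted(counts):
    print(total, counts[total])  # e.g. the sum 7 arises from 6 of the 36 outcomes
```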
A random variable is discrete if it has a specific set of values, that is, if all of its possible values can be enumerated. The probability distribution, or probability density function, of a discrete random variable is simply a table showing all the possible values and the corresponding probability of each value occurring. Depending on the nature of the random process generating the random variable, determining the probability distribution can be a very simple or a highly complex exercise.

Example 5.4
Let x be the random variable denoting the number of tails when tossing two coins, as in Example 5.1. Write the probability distribution of x. As shown in Example 5.1, there are four outcomes when tossing two coins, each assigning a discrete value to x. Since each outcome is equally likely, the probability distribution is:

   x     f(x)
   0     0.25
   1     0.50
   2     0.25
         1.00

Example 5.5
Let x be the random variable denoting the sum of the dots appearing on top when tossing a pair of dice. Write the probability distribution of x. Per Example 5.2, there are 36 outcomes of (ordered) pairs of numbers generating the values of x. Since each outcome is equally likely, the probability distribution of x can be written as follows:

   x      2     3     4     5     6     7     8     9    10    11    12
   f(x)  1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36

¹ Each trial has two outcomes (success or failure), and the experiment has five trials. Therefore, the total number of outcomes is 2⁵ = 32.

Example 5.6
Let x be the random variable denoting the number of correct answers when guessing the answers to a 5-question multiple-choice exam. Write the probability distribution of x.

There are 32 outcomes for this random experiment, as shown in Example 5.3. However, the outcomes are not all equally likely. If the test were a true-false type, the outcomes would be equally likely.
For multiple-choice tests the likelihood of each outcome depends (a) on the probability of a correct guess for each question, which in turn depends on the number of choices, and (b) on the number of correct guesses in each outcome. Assume there are four choices per question. Then the probability of an incorrect guess for each question is P(0) = 3⁄4 = 0.75, and of a correct guess, P(1) = 1⁄4 = 0.25. Since the trials (guessing the answers) are independent, then:

   f(x = 0) = P(0,0,0,0,0) = (0.75)(0.75)(0.75)(0.75)(0.75) = 0.2373
   f(x = 5) = P(1,1,1,1,1) = (0.25)(0.25)(0.25)(0.25)(0.25) = 0.0010

Since, for each intermediate value of the random variable, the outcomes generating that value are equally likely, then:

   f(x = 1) = 5 × 0.25 × 0.75⁴ = 0.3955
   f(x = 2) = 10 × 0.25² × 0.75³ = 0.2637
   f(x = 3) = 10 × 0.25³ × 0.75² = 0.0879
   f(x = 4) = 5 × 0.25⁴ × 0.75 = 0.0146

The probability distribution of x is then:

   x     f(x)
   0     0.2373
   1     0.3955
   2     0.2637
   3     0.0879
   4     0.0146
   5     0.0010

Note that this is an example of a binomial distribution. The random experiments or processes generating a binomial distribution have the following characteristics: (a) all n trials are independent and identical; (b) each trial has two outcomes, success or failure; and (c) the probability of success (and of failure) remains unchanged from trial to trial. Let n be the number of trials, x the number of successes, and π the probability of success per trial. Then

   f(x) = C(n, x) πˣ (1 − π)⁽ⁿ⁻ˣ⁾

where C(n, x) is the combination counting technique for selecting x items from n items:

   C(n, x) = n! ⁄ [(n − x)! x!]

Example 5.7
Find the probability of guessing 8 answers correctly on a set of 25 multiple-choice questions, each with 5 choices.

   f(8) = C(25, 8)(0.2⁸)(0.8¹⁷) = (1,081,575)(0.00000256)(0.022518) = 0.0623

5.2. Expected Value of Discrete Random Variables

The expected value of a discrete random variable is the mean of the values assigned to the random variable.
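The binomial probabilities worked out in Examples 5.6 and 5.7 above can be reproduced with a short Python check; math.comb gives the combination count C(n, x):

```python
from math import comb

def binomial_pmf(x, n, p):
    """f(x) = C(n, x) * p**x * (1 - p)**(n - x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# Example 5.6: five questions, four choices each, so p = 0.25.
dist = {x: round(binomial_pmf(x, 5, 0.25), 4) for x in range(6)}
print(dist)  # {0: 0.2373, 1: 0.3955, 2: 0.2637, 3: 0.0879, 4: 0.0146, 5: 0.001}

# Example 5.7: 25 questions, five choices each, so p = 0.2.
print(round(binomial_pmf(8, 25, 0.2), 4))  # 0.0623
```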
Since each value has a distinct probability of occurring, these probabilities must be taken into account when computing the mean: the values must be weighted by their respective probabilities. The expected value is therefore simply the weighted mean of all possible values of the discrete random variable:

E(x) = ∑xᵢf(xᵢ)

   xᵢ     f(xᵢ)     xᵢf(xᵢ)
   x₁     f(x₁)     x₁f(x₁)
   x₂     f(x₂)     x₂f(x₂)
   ⋮      ⋮         ⋮
   xₙ     f(xₙ)     xₙf(xₙ)
                    E(x) = ∑xᵢf(xᵢ)

Example 5.8
Find the expected value of the sum of the dots when tossing a pair of dice.

   x     f(x)    xf(x)
   2     1/36    2/36
   3     2/36    6/36
   4     3/36    12/36
   5     4/36    20/36
   6     5/36    30/36
   7     6/36    42/36
   8     5/36    40/36
   9     4/36    36/36
   10    3/36    30/36
   11    2/36    22/36
   12    1/36    12/36
                 252/36

E(x) = ∑xf(x) = 252/36 = 7

5.2.1. Expected Value Rules
The following rules affecting expected values will be used very frequently in future discussions:

1) The expected value of a constant is the constant itself:

E(a) = a

2) If you multiply each value of the random variable by a constant, the expected value is also multiplied by that constant:

E(bx) = bE(x)

E(bx) = ∑bx f(x)
E(bx) = bx₁f(x₁) + bx₂f(x₂) + ⋯ + bxₙf(xₙ)
E(bx) = b[x₁f(x₁) + x₂f(x₂) + ⋯ + xₙf(xₙ)]
E(bx) = b∑xf(x) = bE(x)

Let y = a + bx. Then, combining Rule 1 and Rule 2, we have

E(y) = E(a + bx) = a + bE(x)

Example 5.9
The following is the probability (relative frequency) distribution of the number of cars sold in a week by a salesman in a car dealership.

   x     0     1     2     3     4     5
   f(x)  0.05  0.10  0.25  0.40  0.15  0.05

a) Find the expected value, or the mean number of cars sold per week by the salesman.

E(x) = ∑xf(x) = 2.65

b) The salesman receives a fixed weekly salary of $300 and a $200 commission for each car sold. What is the salesman's mean weekly income?

y = 300 + 200x
E(y) = E(300 + 200x) = 300 + 200E(x) = 300 + (200)(2.65) = 830

5.3. Variance of the Discrete Random Variable
The variance, generally, is computed as the mean squared deviation of a variable from the mean.
σ² = ∑(x − µ)² / N

If the variable is a random variable, then we must compute the weighted mean of the squared deviations, using the probabilities as the weights. Using the symbol for the population mean µ in place of E(x), let u denote the deviation of the values of the random variable from the mean:

u = x − µ

Then the variance of the random variable x is defined as the expected value of the squared deviations:

var(x) = E(u²) = E[(x − µ)²] = ∑(x − µ)²f(x)

Example 5.10
Use the data for the car salesman in Example 5.9 to compute the variance of the number of cars sold. (Recall that µ = 2.65.)

   x     f(x)    (x − µ)²f(x)
   0     0.05    0.3511
   1     0.10    0.2723
   2     0.25    0.1056
   3     0.40    0.0490
   4     0.15    0.2734
   5     0.05    0.2761
                 1.3275

var(x) = ∑(x − µ)²f(x) = 1.3275

Using the properties of expected value, we can develop an easier computational formula for var(x):

var(x) = E[(x − µ)²]
var(x) = E(x² − 2xµ + µ²) = E(x²) − 2µE(x) + µ²
var(x) = E(x²) − 2µ² + µ²
var(x) = E(x²) − µ²

The square root of the variance is the standard deviation of the random variable, sd(x). The standard deviation in effect shows the average deviation of the random variable from its mean.

5.3.1. Variance Rules
a) The variance of a constant is zero:

var(a) = 0

b) Multiplying the random variable by a constant multiplies the variance by the square of that constant:

var(bx) = b²var(x)

var(bx) = E[(bx − bµ)²]
var(bx) = E[b²(x − µ)²]
var(bx) = b²E[(x − µ)²] = b²var(x)

Let y = a + bx. Then, since the additive constant a does not affect the variance,

var(y) = b²var(x)

Example 5.11
Using the income figures of the salesman in Example 5.9, find the variance of income.

y = 300 + 200x
var(y) = 200²var(x) = (40,000)(1.3275) = 53,100

The standard deviation of income is

sd(y) = √53,100 = $230.43

5.4.
The Fixed and Random Components of a Random Variable
As shown above, the deviation of the random variable from the mean is denoted by u:

u = x − µ

Rearranging this equation, we can express x as follows:

x = µ + u

Here we have simply separated the random variable into two components: the fixed component is the population parameter µ, and the random component is u. Using the variance rules, and noting that var(µ) = 0, we have

var(x) = var(µ + u) = var(u)

The variance of the random variable is the variance of its random component. The following properties of u will be referred to frequently in the subsequent chapters:

E(u) = E(x − µ)
E(u) = E(x) − µ = µ − µ = 0
var(u) = E[(u − 0)²] = E(u²)

5.5. Joint Probability Distribution (Probability Density Function) of Two Random Variables
To study the relationship between two random variables we use the joint probability distribution. Consider two random variables x and y, where x takes on the values 0 and 1, and y takes on the values 0, 1, and 2. The following table shows the joint probability distribution (JPD) of x and y.

              y
   x       0      1      2      f(x)
   0       0.26   0.06   0.08   0.40
   1       0.07   0.12   0.41   0.60
   f(y)    0.33   0.18   0.49   1.00

This JPD is used to explain the following concepts: marginal probability, joint probability, and conditional probability.

5.5.1. Marginal Probability
The random variable x takes on the values 0 and 1, and y takes on the values 0, 1, and 2. The probability density function (PDF) of each random variable is shown below:

   x     f(x)         y     f(y)
   0     0.40         0     0.33
   1     0.60         1     0.18
                      2     0.49

The probabilities associated with the values of x and the values of y are obtained from the margins (the last column and the last row, respectively) of the joint probability table. This is why f(x) and f(y) are called marginal probability density functions.

5.5.2. Joint Probabilities
The probability associated with each pair of (x, y) values is called a joint probability. For example, in the table above the probability that (x = 0, y = 0) is 0.26: f(0, 0) = 0.26.
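The marginal densities can be recovered mechanically by summing the joint table. Here is a hedged Python sketch (the pair-keyed dictionary is just one convenient representation of the JPD, and the names `jpd`, `f_x`, `f_y` are illustrative):

```python
# Joint probability distribution f(x, y) from the table above,
# keyed by (x, y) pairs
jpd = {
    (0, 0): 0.26, (0, 1): 0.06, (0, 2): 0.08,
    (1, 0): 0.07, (1, 1): 0.12, (1, 2): 0.41,
}

# Marginal of x: for each x value, sum the joint probabilities over all y
f_x = {x: sum(p for (xi, yi), p in jpd.items() if xi == x) for x in (0, 1)}
# Marginal of y: for each y value, sum the joint probabilities over all x
f_y = {y: sum(p for (xi, yi), p in jpd.items() if yi == y) for y in (0, 1, 2)}

print({x: round(p, 2) for x, p in f_x.items()})  # {0: 0.4, 1: 0.6}
print({y: round(p, 2) for y, p in f_y.items()})  # {0: 0.33, 1: 0.18, 2: 0.49}
```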
Since x has two values and each is paired with three y values, there are 6 joint probabilities. The joint PDF is thus a function of the values of x and y: it provides the probability associated with each distinct pair of (x, y) values. For example,

f(1, 2) = 0.41

Note that each marginal probability is the sum of joint probabilities. For the marginal probabilities of x, the joint probabilities are summed across the row for each x value; for those of y, the joint probabilities are summed down the column for each y value. For example,

f(x = 0) = f(0,0) + f(0,1) + f(0,2) = 0.26 + 0.06 + 0.08 = 0.40
f(y = 1) = f(0,1) + f(1,1) = 0.06 + 0.12 = 0.18

The following table shows the general form of a joint probability distribution table.

              y
   x       y₁           y₂           y₃           f(x)
   x₁      f(x₁, y₁)    f(x₁, y₂)    f(x₁, y₃)    f(x₁)
   x₂      f(x₂, y₁)    f(x₂, y₂)    f(x₂, y₃)    f(x₂)
   f(y)    f(y₁)        f(y₂)        f(y₃)        1.00

The following shows, using summation notation, how the marginal and joint probabilities are related in general. For example,

f(x₁) = f(x₁, y₁) + f(x₁, y₂) + f(x₁, y₃) = ∑ⱼ f(x₁, yⱼ)   (sum over j = 1, 2, 3)

Or,

f(y₁) = f(x₁, y₁) + f(x₂, y₁) = ∑ᵢ f(xᵢ, y₁)   (sum over i = 1, 2)

Generally, for i = 1, 2, …, m and j = 1, 2, …, n, we have:

f(xᵢ) = f(xᵢ, y₁) + f(xᵢ, y₂) + ⋯ + f(xᵢ, yₙ) = ∑ⱼ f(xᵢ, yⱼ)
f(yⱼ) = f(x₁, yⱼ) + f(x₂, yⱼ) + ⋯ + f(xₘ, yⱼ) = ∑ᵢ f(xᵢ, yⱼ)

5.5.3. Conditional Probability of x and y
The conditional probability of x, given a value of y, shows the probability that x takes on a value for a given value of the random variable y. This conditional probability is the ratio of the joint probability of x and y over the marginal probability of y:

f(x|y) = f(x, y) / f(y)
Inversely,

f(y|x) = f(x, y) / f(x)

For example, using the numerical JPD table above,

              y
   x       0      1      2      f(x)
   0       0.26   0.06   0.08   0.40
   1       0.07   0.12   0.41   0.60
   f(y)    0.33   0.18   0.49   1.00

f(x = 1 | y = 0) = f(1,0) / f(y = 0) = 0.07 / 0.33 = 0.2121
f(y = 2 | x = 1) = f(1,2) / f(x = 1) = 0.41 / 0.60 = 0.6833

Rewriting the conditional probability formula, we can express the joint probability of x and y as the product of the conditional probability and the marginal probability:

f(x, y) = f(x|y)f(y)

or,

f(x, y) = f(y|x)f(x)

This leads us to the definition of independent versus dependent random variables.

5.6. Independent versus Dependent Random Variables
Two random variables x and y are independent if

f(x|y) = f(x)   and   f(y|x) = f(y)

If x and y are independent, then their joint probability becomes:

f(x, y) = f(x|y)f(y) = f(x)f(y)

As an indicator of the dependent or independent relationship between x and y, always compare each joint probability to the product of the two marginal probabilities:

If f(x, y) = f(x)f(y) for all (x, y), then x and y are independent.
If f(x, y) ≠ f(x)f(y) for some (x, y), then x and y are dependent.

Consider the following joint probability distributions.

(A) x and y are independent
              y
   x       0       1       2       f(x)
   0       0.300   0.125   0.075   0.50
   1       0.180   0.075   0.045   0.30
   2       0.120   0.050   0.030   0.20
   f(y)    0.60    0.25    0.15    1.00

(B) x and y are dependent
              y
   x       0       1       2       f(x)
   0       0.25    0.15    0.10    0.50
   1       0.20    0.08    0.02    0.30
   2       0.15    0.02    0.03    0.20
   f(y)    0.60    0.25    0.15    1.00

In (A), x and y are independent because f(x, y) = f(x)f(y) for every pair. For example,

f(x = 1, y = 2) = 0.045 = f(x = 1)f(y = 2) = 0.30 × 0.15 = 0.045

In (B), x and y are dependent because f(x, y) ≠ f(x)f(y). For example,

f(x = 1, y = 2) = 0.02 ≠ f(x = 1)f(y = 2) = 0.30 × 0.15 = 0.045

5.7. Covariance of Random Variables x and y
The covariance of x and y is a measure of association, or linear mutual variability, of x and y. It is the expected value of the product of the deviations of x from the mean of x times the deviations of y from the mean of y.
That is, it is the weighted average of the products of the two deviation terms, with the joint probabilities f(x, y) as the weights:

cov(x, y) = E[(x − µx)(y − µy)] = ∑∑(x − µx)(y − µy)f(x, y)

where the double sum runs over all values of x and y. Before the double summation notation in the formula scares you, let us compute the covariance from Table (B) above, reproduced here:

              y
   x       0       1       2       f(x)
   0       0.25    0.15    0.10    0.50
   1       0.20    0.08    0.02    0.30
   2       0.15    0.02    0.03    0.20
   f(y)    0.60    0.25    0.15    1.00

First list all 9 pairs of (x, y) values and the corresponding joint probabilities, then proceed with the calculations as shown below. Note that µx = ∑xf(x) = 0.7 and µy = ∑yf(y) = 0.55.

   x    y    f(x, y)    (x − µx)    (y − µy)    (x − µx)(y − µy)f(x, y)
   0    0    0.25       -0.7        -0.55        0.09625
   0    1    0.15       -0.7         0.45       -0.04725
   0    2    0.10       -0.7         1.45       -0.10150
   1    0    0.20        0.3        -0.55       -0.03300
   1    1    0.08        0.3         0.45        0.01080
   1    2    0.02        0.3         1.45        0.00870
   2    0    0.15        1.3        -0.55       -0.10725
   2    1    0.02        1.3         0.45        0.01170
   2    2    0.03        1.3         1.45        0.05655
             ∑∑(x − µx)(y − µy)f(x, y)         = -0.10500

The covariance is cov(x, y) = −0.105. The negative sign in this example indicates that the two variables are inversely related. Thus, the covariance shows the direction of the relationship between x and y. If the covariance is negative, then x and y tend to move in opposite directions: when x increases, y tends to decrease, and when x decreases, y tends to increase. The two random variables are said to be negatively correlated. The covariance is positive when x and y are directly related. They tend to move together: when x increases, y tends to increase, and when x decreases, y tends to decrease.
The two random variables are said to be positively correlated.

Now compute the covariance from Table (A):

   x    y    f(x, y)    (x − µx)    (y − µy)    (x − µx)(y − µy)f(x, y)
   0    0    0.300      -0.70       -0.55        0.11550
   0    1    0.125      -0.70        0.45       -0.03938
   0    2    0.075      -0.70        1.45       -0.07613
   1    0    0.180       0.30       -0.55       -0.02970
   1    1    0.075       0.30        0.45        0.01013
   1    2    0.045       0.30        1.45        0.01958
   2    0    0.120       1.30       -0.55       -0.08580
   2    1    0.050       1.30        0.45        0.02925
   2    2    0.030       1.30        1.45        0.05655
             ∑∑(x − µx)(y − µy)f(x, y)         = 0.00000

Note that the covariance is zero. This is not a coincidence. Recall that Table (A) represents the joint probability distribution of two independent random variables. When two random variables are independent, they have no tendency to move together in either direction. They are said to be uncorrelated.

5.7.1. The covariance formula simplified
It appears that a lot of effort goes into the calculation of the covariance. A simple reconfiguration of the definitional formula for covariance results in a simpler computational formula, as follows:

cov(x, y) = E[(x − µx)(y − µy)] = E(xy − xµy − yµx + µxµy)

Using the linear transformation properties of the expected value, we have:

E(xy − xµy − yµx + µxµy) = E(xy) − µyE(x) − µxE(y) + µxµy
E(xy − xµy − yµx + µxµy) = E(xy) − µyµx − µxµy + µxµy = E(xy) − µxµy

In short,

cov(x, y) = E(xy) − µxµy

where E(xy) = ∑∑xy f(x, y).²
For Table (B) we have:

   x    y    xy    f(x, y)    xyf(x, y)
   0    0    0     0.25       0.00
   0    1    0     0.15       0.00
   0    2    0     0.10       0.00
   1    0    0     0.20       0.00
   1    1    1     0.08       0.08
   1    2    2     0.02       0.04
   2    0    0     0.15       0.00
   2    1    2     0.02       0.04
   2    2    4     0.03       0.12
                   E(xy) =    0.28

µxµy = (0.7)(0.55) = 0.385
cov(x, y) = E(xy) − µxµy = 0.28 − 0.385 = −0.105

² This is a good place to show how the double summation notation works. Summing first over j (the values of y) for each i:

∑ᵢ∑ⱼ xᵢyⱼ f(xᵢ, yⱼ) = ∑ᵢ [xᵢy₁f(xᵢ, y₁) + xᵢy₂f(xᵢ, y₂) + ⋯ + xᵢyₙf(xᵢ, yₙ)]

Expanding the outer sum over i (the values of x) then gives

∑∑xᵢyⱼ f(xᵢ, yⱼ) = x₁y₁f(x₁, y₁) + x₁y₂f(x₁, y₂) + ⋯ + x₁yₙf(x₁, yₙ)
                 + x₂y₁f(x₂, y₁) + x₂y₂f(x₂, y₂) + ⋯ + x₂yₙf(x₂, yₙ)
                 ⋮
                 + xₘy₁f(xₘ, y₁) + xₘy₂f(xₘ, y₂) + ⋯ + xₘyₙf(xₘ, yₙ)

For Table (A):

   x    y    xy    f(x, y)    xyf(x, y)
   0    0    0     0.300      0.000
   0    1    0     0.125      0.000
   0    2    0     0.075      0.000
   1    0    0     0.180      0.000
   1    1    1     0.075      0.075
   1    2    2     0.045      0.090
   2    0    0     0.120      0.000
   2    1    2     0.050      0.100
   2    2    4     0.030      0.120
                   E(xy) =    0.385

µxµy = (0.7)(0.55) = 0.385
cov(x, y) = E(xy) − µxµy = 0.385 − 0.385 = 0.000

The results of the last calculation also show another important relationship between two independent random variables x and y. Note that E(xy) − µxµy = 0. This means that, for any two independent random variables, E(xy) = E(x)E(y).³

5.8. Coefficient of Correlation
The covariance by itself shows the direction of association between x and y. We are also interested in how strongly x and y are correlated. However, like all variance measures, covariance is affected by the scale or magnitude of the data and therefore cannot serve as a measure of the degree or strength of association between the two random variables. The bigger the scale of the data, the bigger the covariance, even though x and y may not be strongly related. For example, if the data are in millions of dollars, the covariance will be bigger than when the data are in thousands of dollars.
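The hand computations above collapse into a few lines of code. Here is a Python sketch for Table (B) (a pair-keyed dictionary is an assumed representation; the names `jpd_B`, `cov_def`, `cov_simple` are illustrative) verifying that the definitional and computational formulas agree:

```python
# Joint probability distribution of Table (B)
jpd_B = {
    (0, 0): 0.25, (0, 1): 0.15, (0, 2): 0.10,
    (1, 0): 0.20, (1, 1): 0.08, (1, 2): 0.02,
    (2, 0): 0.15, (2, 1): 0.02, (2, 2): 0.03,
}

mu_x = sum(x * p for (x, y), p in jpd_B.items())   # 0.7
mu_y = sum(y * p for (x, y), p in jpd_B.items())   # 0.55

# Definitional formula: E[(x - mu_x)(y - mu_y)]
cov_def = sum((x - mu_x) * (y - mu_y) * p for (x, y), p in jpd_B.items())

# Computational formula: E(xy) - mu_x * mu_y
e_xy = sum(x * y * p for (x, y), p in jpd_B.items())  # 0.28
cov_simple = e_xy - mu_x * mu_y

print(round(cov_def, 4), round(cov_simple, 4))  # -0.105 -0.105
```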
To obtain a measure of correlation, we need to scale the covariance. Scaling results in a relative measure, thus removing the impact of the scale of the data. Scaling takes place by dividing the covariance by the product of sd(x) and sd(y). The result is called the coefficient of correlation, denoted by the Greek letter ρ (lowercase rho):

ρ = cov(x, y) / [sd(x)∙sd(y)],    −1 ≤ ρ ≤ 1

The correlation coefficient varies between −1 and 1.

³ Proof that E(xy) = E(x)E(y) when x and y are independent: since x and y are independent, f(x, y) = f(x)f(y). Therefore,

E(xy) = ∑∑xyf(x, y) = ∑∑xyf(x)f(y) = [∑xf(x)][∑yf(y)] = E(x)E(y)

When ρ = −1, x and y are perfectly negatively correlated. When ρ = 0, x and y are uncorrelated; as shown above, independent random variables are uncorrelated. When ρ = 1, x and y are perfectly positively correlated. The closer ρ is to either −1 or 1, the stronger the relationship between the variables; the closer it is to 0, the weaker the relationship.

Using the data in Table (B) above, we can compute

var(x) = E(x²) − µx² = 1.1 − 0.49 = 0.61
var(y) = E(y²) − µy² = 0.85 − 0.3025 = 0.5475

Thus, sd(x) = 0.781 and sd(y) = 0.7399. Using these values and cov(x, y) = −0.105, we have

ρ = cov(x, y) / [sd(x)∙sd(y)] = −0.105 / (0.781 × 0.7399) = −0.182

This means that the relationship between x and y is negative but very weak.

5.9. Effect of Linear Transformation of Two Random Variables on Their Covariance and Correlation
Let v = a + bx and w = c + dy represent linear transformations of the random variables x and y.
Show that

cov(v, w) = bd∙cov(x, y)

The proof is as follows:

cov(v, w) = E[(v − µv)(w − µw)]
cov(v, w) = E(vw) − µvµw = E(vw) − E(v)E(w)
cov(v, w) = E[(a + bx)(c + dy)] − [a + bE(x)][c + dE(y)]
cov(v, w) = E(ac + bcx + ady + bdxy) − [a + bE(x)][c + dE(y)]
cov(v, w) = ac + bcE(x) + adE(y) + bdE(xy) − ac − adE(y) − bcE(x) − bdE(x)E(y)
cov(v, w) = bdE(xy) − bdE(x)E(y) = bd[E(xy) − E(x)E(y)]
cov(v, w) = bd∙cov(x, y)

Since sd(v) = b∙sd(x) and sd(w) = d∙sd(y) (for positive b and d), the correlation coefficient of v and w is

ρ(v, w) = cov(v, w) / [sd(v)∙sd(w)] = bd∙cov(x, y) / [b∙sd(x) ∙ d∙sd(y)]
ρ(v, w) = cov(x, y) / [sd(x)∙sd(y)]

That is, a linear transformation leaves the correlation coefficient unchanged.

5.10. The Mean and Variance of the Sum of Two Random Variables
Let

z = x + y

The expected value of z is

E(z) = E(x + y) = E(x) + E(y)

The variance of z is

var(z) = E[(z − µz)²]
var(z) = E{[(x + y) − (µx + µy)]²} = E{[(x − µx) + (y − µy)]²}
var(z) = E[(x − µx)² + (y − µy)² + 2(x − µx)(y − µy)]
var(z) = E[(x − µx)²] + E[(y − µy)²] + 2E[(x − µx)(y − µy)]
var(z) = var(x + y) = var(x) + var(y) + 2cov(x, y)

Similarly,

var(x − y) = var(x) + var(y) − 2cov(x, y)

If x and y are two independent random variables, cov(x, y) = 0. Then,

var(x + y) = var(x − y) = var(x) + var(y)

6. The Normal Distribution
The normal distribution is the quintessential example of a continuous random variable. Unlike a discrete random variable, a continuous random variable can take on an infinite number of values. If the distribution of these values is bell-shaped, then the distribution is represented by the normal probability density function:

f(x) = [1 / (σ√(2π))] e^(−½[(x − µ)/σ]²)

Each normal pdf is identified by two distribution parameters: µ (the population mean) and σ (the population standard deviation). The other two symbols, π and e, are universal constants (π = 3.1415927 and, the base of the natural logarithm, e = 2.718282). Given µ and σ, we can draw the normal curve for different values of x.
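The pdf above is straightforward to evaluate for any µ, σ, and x. A small Python sketch (the function name is illustrative; the standard library's `statistics.NormalDist.pdf` returns the same values):

```python
from math import exp, pi, sqrt

def normal_pdf(x, mu, sigma):
    """Height of the normal density curve at x, for parameters mu and sigma."""
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))

# Heights of the standard normal curve (mu = 0, sigma = 1)
print(round(normal_pdf(0, 0, 1), 4))  # 0.3989, the peak at the mean
print(round(normal_pdf(1, 0, 1), 4))  # 0.242, one standard deviation out
```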
Example 6.1
Suppose the vehicle speed on a given stretch of an interstate is normally distributed with a mean speed of µ = 72 mph and σ = 5. Show the distribution of x, the vehicle speed.

f(x) = [1 / (5√(2π))] e^(−½[(x − 72)/5]²)

Find the density for x = 77. This can be done using a scientific calculator or Excel:⁴

f(77) = [1 / (5√(2π))] e^(−½[(77 − 72)/5]²) = 0.0484

The value (the density) f(x = 77) = 0.0484 is not the probability of x = 77; it represents the height of the curve at the point x = 77. For a continuous distribution, the pdf does not provide the probability of a given value of x. Because x can take on infinitely many values, the probability of any single value is not defined. For such variables, probability is defined only for a range of values. Using the pdf, the probability that the random variable x takes on values in a given range or interval is determined by the area under the pdf bounded by the lower and upper ends of the interval.

The following shows the distribution of vehicle speed in Example 6.1.

[Figure: normal curve centered at µ = 72, with f(77) = 0.048 marking the height of the curve at x = 77]

To draw this distribution, all we need to know are the values of the two parameters µ and σ. The normal curve practically (but not actually) touches the x axis at 4 standard deviations from the mean. Thus, in Example 6.1 you can evaluate f(x) for several values of x, where 52 ≤ x ≤ 92, and then connect the dots to draw the normal curve as above. Note that the inflection point of the curve occurs at x = µ + σ, which is at x = 72 + 5 = 77.

6.1. Finding the Area (Probability) Under a Normal Curve
Example 6.2
Given µ = 72 and σ = 5, if a vehicle is clocked at random, what is the probability that its speed is between 67 and 77 mph? Or, what proportion of vehicles drive between 67 and 77 mph? In the diagram below, the probability is shown as the area under the curve for the interval 67 ≤ x ≤ 77.

⁴ In Excel, use the function =NORM.DIST(x, mean, standard_dev, cumulative) to find f(x).
Enter "0" for "cumulative" to obtain the density; entering "1" for "cumulative" yields the cumulative probability (the area under the normal curve to the left of x).

[Figure: normal curve centered at µ = 72, with the area between x = 67 and x = 77 shaded]

This area can be found by evaluating the integral of f(x) over 67 ≤ x ≤ 77. You can do this if you are a math wiz! You can, however, also use the Excel function =NORM.DIST(x, mean, standard_dev, 1). This function gives you the area under the curve to the left of x. Therefore, you must use the function twice and then take the difference:

P(67 ≤ x ≤ 77) = P(x ≤ 77) − P(x ≤ 67)

   =NORM.DIST(77,72,5,1)    0.8413
   =NORM.DIST(67,72,5,1)    0.1587
   Difference               0.6827

Alternatively, you can linearly transform x into the standard normal variable z. As mentioned above, the z score (regardless of the distribution of the data) measures the deviation of each x value from the mean in units of standard deviation. The standard normal variable z shows the deviations of all normally distributed values of x in units of σ:

z = (x − µ)/σ

The z random variable has a mean of 0 and a standard deviation of 1:

E(z) = E[(x − µ)/σ] = (1/σ)E(x) − µ/σ = µ/σ − µ/σ = 0
var(z) = var[(x − µ)/σ] = var(x/σ) = (1/σ)²var(x) = σ²/σ² = 1

Given these special features of z, the standard normal table of probabilities is developed. When x is transformed into z, the corresponding probabilities are obtained from this table:

For x = 77:  z = (77 − 72)/5 = 1.00
For x = 67:  z = (67 − 72)/5 = −1.00

Thus,

P(67 ≤ x ≤ 77) = P(−1.00 ≤ z ≤ 1.00) = P(z ≤ 1.00) − P(z ≤ −1.00) = 0.8413 − 0.1587 = 0.6827

In Excel, use the =NORM.S.DIST(z, cumulative) function to find the cumulative probability of z. For example:

=NORM.S.DIST(−1,1) = 0.1587

P(−1.00 ≤ z ≤ 1.00) = 0.6827 confirms the well-known empirical rule that for all bell-shaped distributions approximately 68 percent of all values fall within one standard deviation of the mean.
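Outside Excel, the same areas can be computed with, for example, Python's standard library. A sketch using `statistics.NormalDist` (available since Python 3.8; the variable names are illustrative):

```python
from statistics import NormalDist

speed = NormalDist(mu=72, sigma=5)

# P(67 <= x <= 77) as a difference of two cumulative probabilities
p = speed.cdf(77) - speed.cdf(67)
print(round(p, 4))  # 0.6827

# The same probability via the standard normal variable z
z = NormalDist(0, 1)
print(round(z.cdf(1.0) - z.cdf(-1.0), 4))  # 0.6827
```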
[Figure: standard normal curve with the area 0.6827 shaded between z = −1.00 and z = 1.00]

In addition to its practical use in determining probabilities for normal random variables, the z variable is also important for theoretical reasons. The standard normal variable is related to a number of other random variables that are crucial to inferential statistics. These random variables are referred to in terms of their distributions: the t distribution, the chi-square distribution, and the F distribution. These distributions will be discussed in future chapters when their applications are required for the topic under consideration.

6.2. Finding the x Value for a Given Area Under the Normal Curve
Example 6.3
Given µ = 72 and σ = 5, below what speed do 90 percent of vehicles drive? In the diagram below, the area bounded on the right by x is 0.90. What is the value of x?

[Figure: normal curve centered at 72, with the area 0.9000 to the left of the unknown x shaded]

To find x, first solve for x from the z function:

z = (x − µ)/σ
x = µ + zσ

The z score that bounds an area from the left (that is, with a cumulative probability) of 0.90 is z = 1.28 (rounded to two decimal places). You can obtain this z score using =NORM.S.INV(probability) in Excel:

=NORM.S.INV(0.9) = 1.28

Thus,

x = 72 + (1.28)(5) = 78.4

This speed is the 90th percentile speed: ninety percent of vehicles drive under 78.4 mph.

You may also derive the same x value directly in Excel, computing x for a given cumulative probability with =NORM.INV(probability, mean, standard_dev):

=NORM.INV(0.9,72,5) = 78.408

Example 6.4
For µ = 72 and σ = 5, find the middle interval (the interval symmetric about the mean) of x values which contains:

a) 90% of all x values. (In what speed interval do 90% of vehicles drive?)
b) 95% of all x values. (In what speed interval do 95% of vehicles drive?)
c) 99% of all x values. (In what speed interval do 99% of vehicles drive?)
In the following diagram, we are to find the lower and upper x values of the interval such that the proportion 1 − α of all x values falls within the interval. The proportion α/2 of x values falls in each tail of the distribution, outside the interval.

[Figure: normal curve with the middle area 1 − α between xL and xU, and an area of α/2 in each tail]

The x values of the interval boundaries are found using the following familiar equations:

zα/2 = (x − µ)/σ
xL, xU = µ ± zα/2∙σ

Here the symbol zα/2 is the z score that bounds a tail area of α/2. When, for example, 1 − α = 0.90, then α/2 = 0.05. The z score that bounds a tail area of 0.05 is easily found using

=NORM.S.INV(0.05) = −1.64

Ignoring the "−" sign (by symmetry of the normal curve), zα/2 = 1.64.

The following table shows the z score for each of the three tail areas needed to complete Example 6.4:

   1 − α    α/2      zα/2    xL, xU = µ ± zα/2∙σ
   0.90     0.050    1.64    a) xL, xU = 72 ± 1.64(5) = (63.8, 80.2)
   0.95     0.025    1.96    b) xL, xU = 72 ± 1.96(5) = (62.2, 81.8)
   0.99     0.005    2.58    c) xL, xU = 72 ± 2.58(5) = (59.1, 84.9)
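The 90th-percentile calculation of Example 6.3 and the symmetric intervals of Example 6.4 can likewise be reproduced with the inverse CDF. A Python sketch using `statistics.NormalDist.inv_cdf` (the helper name `middle_interval` is illustrative):

```python
from statistics import NormalDist

speed = NormalDist(mu=72, sigma=5)

# Example 6.3: the 90th percentile speed
print(round(speed.inv_cdf(0.90), 2))  # 78.41

# Example 6.4: middle interval containing a proportion conf = 1 - alpha of values
def middle_interval(dist, conf):
    alpha = 1 - conf
    x_lower = dist.inv_cdf(alpha / 2)      # bounds alpha/2 in the left tail
    x_upper = dist.inv_cdf(1 - alpha / 2)  # bounds alpha/2 in the right tail
    return round(x_lower, 1), round(x_upper, 1)

for conf in (0.90, 0.95, 0.99):
    print(conf, middle_interval(speed, conf))
```

Excel's NORM.S.INV and NORM.INV perform this same inverse-CDF computation.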