CHAPTER 1
REVIEW OF BASIC STATISTICS CONCEPTS
1. Rules of Summation
2. What is Statistics?
   2.1. Descriptive Statistics
   2.2. Inferential Statistics
      2.2.1. Population
      2.2.2. Sample
3. Important Measures of Central Tendency and Data Variability
   3.1. The Mean
      3.1.1. Population Mean µ
      3.1.2. Sample Mean x̄
   3.2. The Mean as the Center of Gravity of the Data
   3.3. Variance
      3.3.1. Variance of Population Data
      3.3.2. Sample Variance
   3.4. Standard Deviation
   3.5. The z-score
4. Measures of Association Between Two Variables
   4.1. Covariance
      4.1.1. Population Covariance
         4.1.1.1. Covariance is affected by the scale of the data
         4.1.1.2. The covariance sign (−, +) indicates the direction of relationship between x and y
         4.1.1.3. Covariance when x and y are unrelated
         4.1.1.4. The Computational (Simpler) Formula for Covariance
      4.1.2. Sample Covariance
   4.2. Correlation Coefficient
5. Random Variables
   5.1. Probability Distribution (Probability Density Function) of Discrete Random Variables
   5.2. Expected Value of Discrete Random Variables
      5.2.1. Expected Value Rules
   5.3. Variance of the Discrete Random Variable
      5.3.1. Variance Rules
   5.4. The Fixed and Random Components of a Random Variable
   5.5. Joint Probability Distribution (Probability Density Function) of Two Random Variables
   5.6. Independent versus Dependent Random Variables
   5.7. Covariance of x and y
      5.7.1. The covariance formula simplified
   5.8. Coefficient of Correlation
   5.9. Effect of Linear Transformation of Two Random Variables on Their Covariance and Correlation
   5.10. The Mean and Variance of the Sum of Two Random Variables
6. Normal Distribution
1-Numerical Descriptive Statistics
1 of 37
1. Rules of Summation

1) Sum of xᵢ:

∑ᵢ₌₁ⁿ xᵢ = x₁ + x₂ + ⋯ + xₙ

Example:

i:    1    2    3    4    5    Total
xᵢ:   20   21   19   22   24   106

∑ᵢ₌₁⁵ xᵢ = 20 + 21 + 19 + 22 + 24 = 106
2) For a given constant k:

∑ᵢ₌₁ⁿ k = k + k + ⋯ + k = nk

∑ᵢ₌₁⁵ 10 = 10 + 10 + 10 + 10 + 10 = 5 × 10 = 50
3) Sum of kxᵢ:

∑ᵢ₌₁ⁿ kxᵢ = kx₁ + kx₂ + ⋯ + kxₙ = k(x₁ + x₂ + ⋯ + xₙ) = k ∑ᵢ₌₁ⁿ xᵢ
4) Sum of k + mxᵢ:

∑ᵢ₌₁ⁿ (k + mxᵢ) = (k + mx₁) + (k + mx₂) + ⋯ + (k + mxₙ)
∑ᵢ₌₁ⁿ (k + mxᵢ) = (k + k + ⋯ + k) + m(x₁ + x₂ + ⋯ + xₙ)
∑ᵢ₌₁ⁿ (k + mxᵢ) = ∑ᵢ₌₁ⁿ k + m ∑ᵢ₌₁ⁿ xᵢ
∑ᵢ₌₁ⁿ (k + mxᵢ) = nk + m ∑ᵢ₌₁ⁿ xᵢ
5) Sum of xᵢ + yᵢ:

∑ᵢ₌₁ⁿ (xᵢ + yᵢ) = (x₁ + x₂ + ⋯ + xₙ) + (y₁ + y₂ + ⋯ + yₙ) = ∑ᵢ₌₁ⁿ xᵢ + ∑ᵢ₌₁ⁿ yᵢ
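These five rules are easy to confirm numerically. The following Python sketch (Python is used here only as a supplement; the chapter itself relies on Excel) checks each rule with the example values above, plus an arbitrary second data set y and constants k and m chosen for illustration:

```python
# A quick numerical check of the five summation rules, using the
# example values from the text (n = 5, k = 10) and an arbitrary m.
x = [20, 21, 19, 22, 24]
y = [1, 2, 3, 4, 5]          # arbitrary second data set for rule 5
n, k, m = len(x), 10, 3

assert sum(x) == 106                                            # rule 1
assert sum(k for _ in range(n)) == n * k                        # rule 2
assert sum(k * xi for xi in x) == k * sum(x)                    # rule 3
assert sum(k + m * xi for xi in x) == n * k + m * sum(x)        # rule 4
assert sum(xi + yi for xi, yi in zip(x, y)) == sum(x) + sum(y)  # rule 5
```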
2. What is Statistics?
Statistics is a discipline which studies the collection, organization, presentation, analysis, and interpretation of numerical data. It has two branches: descriptive statistics and inferential statistics.
2.1. Descriptive statistics
Descriptive statistics is the easy part. It deals with the collection, organization, and presentation of data.
Descriptive statistics involves tables, charts, and presentation of summary characteristics of the data, which
include concepts such as the mean, median or standard deviation. Descriptive statistics is encountered daily
in the news media. For example, in the weather report you frequently hear about the average temperature,
precipitation, pollen count, etc., in a given month of the year. Or you may read about the stock market trend,
changes in the mortgage rate, the rise and fall in the crime rate, students' performance in statewide tests, and
many similar reports.
2.2. Inferential statistics
Inferential statistics is the complicated part of statistics. It deals with inferring or drawing conclusions about
the whole (population data) from analyzing a part of a phenomenon (sample data). An opinion poll is an
example of inferential statistics. For example, to determine the voters' preference for a given political
candidate a sample of registered voters is questioned from which inferences are made about the attitudes of
the population of all potential voters. The reason inferential statistics is more complicated is that it involves
the theories of probability and sampling distribution, subjects unfamiliar to most students of introduction to
statistics.
2.2.1. Population
In inferential statistics, the term population applies to every element, observation or data in the phenomenon
or group that is the subject of the analysis. Stated another way, a population consists of all the items or
individuals about which you want to draw a conclusion.
2.2.2. Sample
The sample is a subset of the population selected in order to estimate, or infer about, specific characteristics
of the population. For example, suppose we are interested in the average age of residents of a retirement
community in Florida. Table 1.1, listing the age of every resident, represents the population that is the
subject of the study. The population has 608 observations. The shaded cells in the table represent the age
data for a sample of size 40 randomly selected from the population. Table 1.2 contains the sample data.
The population data set here is said to be “finite”: you can easily obtain and list all of its values and compute the average age. Here, the average age of the population of residents in this community is 64.2. The population average (or population mean), denoted by µ (mu, the Greek lowercase m), is an example of a summary characteristic of a data set. A summary characteristic that is obtained from the population is called a population parameter. Thus, µ = 64.2 is a population parameter.
Table 1.1
Population Age Data for the Residents of a Retirement Community
82 69 56 74 68 65 60 66 70 51 59 64 75 69 69 70
60 76 65 64 64 55 64 65 69 75 58 61 65 59 62 62
57 61 59 78 70 70 69 63 62 73 52 68 79 61 68 68
56 79 61 68 67 52 67 65 65 53 56 69 66 57 67 62
54 70 58 76 65 64 79 70 69 55 60 69 54 62 67 61
60 69 55 68 66 52 69 62 67 70 57 69 60 58 63 65
88 67 65 55 61 69 63 63 67 53 58 65 76 63 66 69
58 74 72 63 66 51 66 68 70 50 57 62 51 58 66 70
62 67 54 70 64 64 61 66 62 69 68 68 63 67 68 64
59 52 52 70 64 55 62 67 66 73 57 61 67 60 68 69
87 64 50 56 70 61 60 70 70 60 68 68 72 61 67 68
56 68 67 61 64 55 67 64 61 77 60 67 67 60 62 66
84 67 68 62 67 64 67 69 68 74 54 63 68 66 67 68
56 61 70 63 63 54 61 66 68 79 60 67 56 57 68 53
52 68 66 59 67 63 76 69 66 78 52 65 72 64 61 63
59 64 75 63 69 52 66 65 61 51 57 70 61 59 68 56
85 63 59 67 61 70 73 69 61 55 62 70 65 68 61 62
58 53 66 70 66 55 65 62 67 66 57 61 64 58 70 50
90 63 67 54 62 67 78 63 65 76 66 64 79 67 63 69
57 59 63 61 67 50 63 62 61 84 60 66 67 57 62 64
50 62 58 80 68 70 51 70 66 57 66 66 59 65 65 67
60 69 80 65 62 50 61 66 64 64 59 67 57 60 61 66
81 70 61 57 70 69 54 69 65 78 65 70 80 69 69 63
60 69 74 63 62 54 64 65 69 53 60 70 69 59 62 65
78 61 61 63 67 63 76 65 68 76 67 62 52 65 68 64
59 73 66 69 69 54 67 66 64 73 59 62 58 60 65 67
61 64 50 74 67 63 52 63 66 57 56 70 76 61 65 61
60 55 75 64 66 54 63 70 70 60 60 69 62 59 63 51
79 67 68 79 70 67 55 64 70 63 57 66 50 68 63 62
59 52 50 61 68 52 63 69 64 60 60 69 52 60 62 63
82 65 67 65 68 70 58 68 65 57 67 63 55 67 69 68
60 77 52 64 65 53 62 65 64 73 56 65 62 56 65 63
71 64 52 73 64 68 62 64 62 52 66 68 76 64 66 68
58 76 61 69 61 51 64 67 61 56 59 61 70 60 62 55
86 66 54 73 64 66 50 63 64 72 68 63 57 61 65 67
60 74 55 67 70 52 61 63 66 66 59 68 53 56 68 55
73 67 57 50 61 70 69 66 68 57 52 68 51 65 64 69
59 72 72 65 65 53 67 66 62 82 56 65 50 59 70 57
Sometimes it is preferable to determine the summary characteristic of interest from a sample. In many cases the population data set is not finite, and hence not obtainable. In such cases a sample, as a subset of the population data, may serve us better than a study of the whole population. Even with finite population data sets it may be preferable to use a sample, because sampling is more convenient and the sample data can be screened for errors more thoroughly than population data. If a summary characteristic is computed from the sample data, then this summary characteristic represents an estimate of the population parameter. The sample (estimated) summary characteristic is called a sample statistic.
Table 1.2 shows the age data for a sample of 40 residents randomly selected from the population.
Table 1.2
The Age Data for a Sample of 40 Residents
54 69 66 64 61 61 62 70 51 53
57 69 60 54 60 70 66 69 69 59
70 76 75 66 60 67 52 50 61 60
69 52 52 66 62 69 65 63 64 70
The average or mean age computed from the random sample of size 40 shown in Table 1.2 is 62.8. This average is denoted by the symbol x̄ (x-bar). Thus, the sample statistic x̄ = 62.8 is an estimate of the population parameter µ = 64.2.
3. Important Measures of Central Tendency and Data Variability
The two main summary characteristics of data used in statistics are the measure of central tendency, or center of gravity, of the data and the measure of data variability. The measure of central tendency is the mean; the measure of data variability (dispersion) is the variance, which is a squared measure, and its square root, the standard deviation.
3.1. The Mean
The most widely known and used measure of central tendency is the arithmetic mean (or, simply, the mean or the average). The mean is the sum of the values of all the observations in a data set divided by the number of observations. If the data set represents a population, the population mean is denoted by µ; if the data set represents a sample, the sample mean is denoted by x̄.
3.1.1. Population Mean
To obtain the population mean, add all the values in the data set and divide the sum by the count of the observations in that data set, N.

µ = ∑xᵢ / N
Example
A population consists of five data points: 2, 5, 7, 9, 17. The population mean is computed as

µ = ∑xᵢ / N = (2 + 5 + 7 + 9 + 17) / 5 = 40 / 5 = 8
3.1.2. Sample Mean
The formula for the sample mean is the same as the population mean formula, except for the symbols. The sample mean is denoted by x̄ and the sample size by n.

x̄ = ∑xᵢ / n
3.2. The Mean as the Center of Gravity of the Data Set
The mean represents the "center of gravity" of a set of numbers. To explain this, you must first understand one of the most important terms in statistics: deviation. A deviation is simply the difference between a data point and some benchmark; here the benchmark is the mean, so the deviation is defined as xᵢ − µ. Table 1.5 below shows the deviation of each of the five data points from the mean µ = 8. Note that the sum of the deviations equals zero. This is where the notion of “center of gravity” comes in.
Table 1.5
Deviation of data from the mean (µ = 8)

xᵢ     Deviation xᵢ − µ
2      −6
5      −3
7      −1
9      1
17     9
       ∑(xᵢ − µ) = 0
As the following number line suggests, µ = 8 is the balancing point of the five numbers: the sum of the deviations of the values exceeding the mean, 1 + 9 = 10, exactly balances the sum of the deviations of the values below the mean, −6 − 3 − 1 = −10. Thus the sum of all deviations from the mean is zero.

[Number line showing the data points 2, 5, 7, 9, 17 balancing at µ = 8.]
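The balancing property is easy to verify directly. A minimal Python sketch, using the five data points above:

```python
# Deviations from the mean: the positives and negatives cancel exactly.
x = [2, 5, 7, 9, 17]
mu = sum(x) / len(x)                       # mu = 8
deviations = [xi - mu for xi in x]

assert mu == 8
assert sum(d for d in deviations if d > 0) == 10    # 1 + 9
assert sum(d for d in deviations if d < 0) == -10   # -6 - 3 - 1
assert sum(deviations) == 0                # the mean balances the data
```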
3.3. Variance
The variance of a data set is a measure of dispersion of data. It represents the mean squared deviation of data
points from the mean.
3.3.1. Variance of Population Data
The population variance is denoted by 𝜎² (lower case Greek letter sigma-square). To compute the variance
of a population data set, first you must find the mean µ, then determine the sum of squared deviations of the
observations from the mean, as follows:
Deviation from the mean = xᵢ − µ
Squared deviation = (xᵢ − µ)²
Sum of the squared deviations = ∑(xᵢ − µ)²
Variance is the mean squared deviation. Therefore, divide the sum of squared deviations by N:

σ² = ∑(xᵢ − µ)² / N
Example
Find the variance of the following data set: 34, 55, 46, 38, 42.

The following worksheet shows the computations:

xᵢ       xᵢ − µ    (xᵢ − µ)²
34       −9        81
55       12        144
46       3         9
38       −5        25
42       −1        1
µ = 43             ∑(xᵢ − µ)² = 260

σ² = ∑(xᵢ − µ)² / N = 260 / 5 = 52
3.3.2. Computational (Simpler) Formula to Find the Population Variance
We can rearrange the variance formula to obtain a simpler computational process. Of course, if you have access to computer software such as Excel, there is no need to use this formula. Nevertheless, learning how to derive the computational formula is a useful exercise in statistical computation. The numerator of the variance formula can be expanded as follows:

∑(x − µ)² = ∑(x² − 2µx + µ²)
∑(x − µ)² = ∑x² − 2µ∑x + Nµ²
∑(x − µ)² = ∑x² − 2Nµ² + Nµ²      (since ∑x = Nµ)
∑(x − µ)² = ∑x² − Nµ²

Then,

σ² = (∑x² − Nµ²) / N
Compute the variance from the data set above using the computational formula:

x        x²
34       1156
55       3025
46       2116
38       1444
42       1764
µ = 43   ∑x² = 9505

σ² = (9505 − 5 × 43²) / 5 = 260 / 5 = 52
In Excel, the variance of a population data set is obtained by the function: =VAR.P(data range)
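As a cross-check on the two formulas, here is a short Python sketch (offered as a supplement to the Excel function) computing the population variance both ways for the data set above:

```python
# Population variance computed two ways: the defining formula (mean
# squared deviation) and the computational formula derived in the text.
x = [34, 55, 46, 38, 42]
N = len(x)
mu = sum(x) / N                                   # mu = 43

var_def = sum((xi - mu) ** 2 for xi in x) / N     # sum of squared deviations / N
var_comp = (sum(xi ** 2 for xi in x) - N * mu ** 2) / N

assert mu == 43
assert var_def == var_comp == 52                  # both formulas agree
```

This matches Excel's =VAR.P for the same range.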
3.3.3. Sample Variance
The variance of a sample data set is not only denoted by a different symbol but is also obtained using a different formula. To find the average squared deviation, or “mean square,” divide the total sum of squares by n − 1. The quantity n − 1 is called the degrees of freedom. This concept will be explained later within the context of inferential statistics, where we will also learn why we divide the sum of squared deviations by n − 1, rather than n, to determine the sample variance.
s² = ∑(xᵢ − x̄)² / (n − 1)
Example
Using the same data as above, but now assuming the data represent a sample, find the sample variance.

x        x − x̄    (x − x̄)²
34       −9       81
55       12       144
46       3        9
38       −5       25
42       −1       1
x̄ = 43            ∑(x − x̄)² = 260

s² = 260 / 4 = 65
Similarly, the computational formula for the sample variance is:

s² = (∑x² − n x̄²) / (n − 1)
In Excel, the variance of a sample data set is found by the function: =VAR.S(data range)
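The same cross-check works for the sample variance; note the n − 1 divisor. A Python sketch (the standard library's `statistics.variance` uses the same n − 1 formula as Excel's =VAR.S):

```python
import statistics

# Sample variance for the same data set, divided by n - 1 rather than n.
x = [34, 55, 46, 38, 42]
n = len(x)
xbar = sum(x) / n

s2_def = sum((xi - xbar) ** 2 for xi in x) / (n - 1)
s2_comp = (sum(xi ** 2 for xi in x) - n * xbar ** 2) / (n - 1)

assert s2_def == s2_comp == 65
assert statistics.variance(x) == 65   # same convention as =VAR.S
```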
3.4. Standard Deviation
The standard deviation is the (positive) square root of the variance. It is an indirect way of obtaining the mean deviation of the data points from their center of gravity. The population standard deviation formula is:

σ = √(∑(x − µ)² / N)

And the sample standard deviation:

s = √(∑(x − x̄)² / (n − 1))
In Excel, the standard deviation of a population data set (σ) is obtained by =STDEV.P(data range), and the standard deviation of a sample data set (s) by =STDEV.S(data range).

3.5. The z-score
Using the mean and the standard deviation of a data set we can determine the relative location of each
observation or data point. This relative location is measured as the distance or deviation of each data point
from the mean in units of the standard deviation. The deviation of each data point from the mean is 𝑥 − 𝜇. If
you divide the deviation by σ, then the distance is measured relative to, or in units of, the standard deviation.
Through this process we "standardize" the data points; we transform the variable 𝑥 into the standardized
variable 𝑧.
z = (x − µ) / σ
Example
For the following data set, find the mean and the standard deviation, and then find the z-score for each data point. That is, transform the x variable into a z variable.

Data: 46, 54, 42, 46, 32

µ = 44        σ = 7.155
The standardized values are determined as follows: z = (x − µ) / σ

x      x − µ    z
46     2        0.28
54     10       1.40
42     −2       −0.28
46     2        0.28
32     −12      −1.68
       ∑(x − µ) = 0    ∑z = 0.00
Note that since the sum of all deviations from the mean equals zero, the mean of all z-scores must be zero:

µz = ∑z / N = (1/N) ∑ (x − µ)/σ = (1/Nσ) ∑(x − µ) = 0

Also, the variance and the standard deviation of z are both equal to one:

σz² = ∑(z − µz)² / N = (1/N) ∑z² = (1/N) ∑ (x − µ)²/σ² = (1/Nσ²) ∑(x − µ)² = Nσ² / Nσ² = 1
z        (z − µz)² *
0.28     0.0781
1.40     1.9531
−0.28    0.0781
0.28     0.0781
−1.68    2.8125
∑z = 0.00    ∑(z − µz)² = 5.0000

µz = ∑z / N = 0.00
σz² = ∑(z − µz)² / N = 1.0000
* Note that the z values are rounded to two decimals. The squared values in the second column are,
therefore, not exactly the squares of the rounded values in the first column.
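Standardization can be verified numerically. A minimal Python sketch for the data set above, confirming that the z-scores have mean 0 and variance 1 (comparisons use a small tolerance because of floating-point rounding):

```python
# Standardize x into z and confirm mean(z) = 0 and var(z) = 1.
x = [46, 54, 42, 46, 32]
N = len(x)
mu = sum(x) / N                                        # 44
sigma = (sum((xi - mu) ** 2 for xi in x) / N) ** 0.5   # population sd

z = [(xi - mu) / sigma for xi in x]

assert round(sigma, 3) == 7.155
assert abs(sum(z) / N) < 1e-12                         # mean of z is 0
assert abs(sum(zi ** 2 for zi in z) / N - 1) < 1e-12   # variance of z is 1
```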
4. Measures of Association Between Two Variables
In many statistical analyses we are interested in the relationship between two variables. For example, to what extent does advertising affect the sales volume of a product? How does the per-unit cost of production vary with the volume of output? How do a state’s annual tax revenues vary with changes in Gross Domestic Product?
An important measure of the degree of association between two variables is covariance. A related, and more
widely used, measure is the correlation coefficient. These measures are discussed below.
4.1. Covariance
4.1.1. Population Covariance
Consider the following two population data sets, represented by x and y, with an equal number of data points, where each data point in x is paired with a data point in y (Example 1). Covariance, as the term implies, measures the joint variation in the two data sets. It is the average value of the product of the deviations of the data points in each set from their respective means:
σxy = ∑(x − µx)(y − µy) / N
Example 1

x      y      x − µx    y − µy    (x − µx)(y − µy)
58     122    −3        9         −27
75     118    14        5         70
30     115    −31       2         −62
102    144    41        31        1271
69     98     8         −15       −120
84     160    23        47        1081
22     60     −39       −53       2067
48     87     −13       −26       338
µx = 61    µy = 113    ∑(x − µx)(y − µy) = 4618

σxy = ∑(x − µx)(y − µy) / N = 4618 / 8 = 577.25
4.1.1.1. Covariance is affected by the scale of the data
Because the size of the covariance is affected by the scale of the data, covariance is not used as a measure of the strength of the relationship between x and y. It would be misleading to think that because a given covariance is large the relationship between the two variables is strong. The covariance above, σxy = 577.25, for example, tells us very little about how strongly x and y are related. Using the covariance, however, we can derive a relative measure which clearly shows the strength of the relationship between the two variables. This relative measure is called the correlation coefficient and is explained below.
4.1.1.2. The covariance sign (−, +) indicates the direction of relationship between x and y
Because covariance can be either negative or positive, its sign is an indicator of the direction of the relationship between x and y. The last column in the table for Example 1 shows that most of the deviation products, (x − µx)(y − µy), are positive, and the overall size of the positives overwhelms the negatives.
What is the significance of this? To explain, consider the following scatter diagram of the above data, where
each dot represents the corresponding (𝑥, 𝑦) pairs.
[Scatter diagram of the Example 1 data. A vertical line at µx = 61 and a horizontal line at µy = 113 partition the plot area into four quadrants, labeled I through IV.]
The plot area of the scatter diagram is partitioned into four quadrants by a vertical line representing µx = 61 and a horizontal line representing µy = 113. The intersection point of the two mean lines can be viewed as the center of gravity of the data. The points in quadrant II represent all the (x, y) pairs that exceed their respective means; thus all deviations (xᵢ − µx) and (yᵢ − µy) in this quadrant are positive, and their products (xᵢ − µx)(yᵢ − µy) > 0. In quadrant IV, the points represent all the (x, y) pairs that fall below their respective means; thus all deviations (xᵢ − µx) and (yᵢ − µy) in this quadrant are negative. However, the product of two negatives is always positive, so in quadrant IV also (xᵢ − µx)(yᵢ − µy) > 0. In quadrant III, deviations (xᵢ − µx) are positive but (yᵢ − µy) are negative, and the reverse holds in quadrant I; thus in these two quadrants the products (xᵢ − µx)(yᵢ − µy) < 0. When, in the competition between the positive products in quadrants II and IV, on the one hand, and the negative products in quadrants I and III, on the other, the positives overwhelm the negatives, the covariance sign will be positive, indicating a direct relationship between x and y. The relationship is inverse when the negatives win.
4.1.1.3. Covariance when 𝒙 and 𝒚 are unrelated
When the points are evenly distributed in the four quadrants, no one wins. The points have therefore no
direction, indicating no relationship between 𝑥 and 𝑦. Thus covariance is zero or near zero. When covariance
is zero, 𝑥 and 𝑦 are said to be independent.
Example 2
The following data and the related scatter diagram show that x and y are not related. The data points appear
to be evenly distributed in the four quadrants.
x     y
97    147
23    132
86    105
14    126
54    101
44    125
19    134
12    107
55    109
26    105
87    119
82    141
41    143
56    131
92    122
88    142

[Scatter diagram of the Example 2 data, with mean lines µx and µy. The points are spread evenly across the four quadrants.]
4.1.1.4. The Computational (Simpler) Formula for Covariance
We can simplify the population covariance formula for computational purposes by rewriting the numerator
as follows:
∑(𝑥 − µ𝑥 )(𝑦 − µ𝑦 ) = ∑(𝑥𝑦 − µ𝑦 𝑥 − µ𝑥 𝑦 + µ𝑥 µ𝑦 )
∑(𝑥 − µ𝑥 )(𝑦 − µ𝑦 ) = ∑𝑥𝑦 − µ𝑦 ∑𝑥 − µ𝑥 ∑𝑦 + 𝑁µ𝑥 µ𝑦
∑(𝑥 − µ𝑥 )(𝑦 − µ𝑦 ) = ∑𝑥𝑦 − 𝑁µ𝑥 µ𝑦 − 𝑁µ𝑥 µ𝑦 + 𝑁µ𝑥 µ𝑦
∑(𝑥 − µ𝑥 )(𝑦 − µ𝑦 ) = ∑𝑥𝑦 − 𝑁µ𝑥 µ𝑦
The covariance formula then becomes:

σxy = (∑xy − Nµxµy) / N
Use the computational formula to compute the covariance from Example 1.

x      y      xy
58     122    7076
75     118    8850
30     115    3450
102    144    14688
69     98     6762
84     160    13440
22     60     1320
48     87     4176
              ∑xy = 59762

σxy = (59762 − 8(61)(113)) / 8 = 577.25
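Both covariance formulas can be confirmed with a short Python sketch on the Example 1 data (again, a supplement to the Excel functions covered later):

```python
# Population covariance via the defining formula and the computational
# formula; both should give 577.25 for the Example 1 data.
x = [58, 75, 30, 102, 69, 84, 22, 48]
y = [122, 118, 115, 144, 98, 160, 60, 87]
N = len(x)
mu_x, mu_y = sum(x) / N, sum(y) / N            # 61 and 113

cov_def = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / N
cov_comp = (sum(a * b for a, b in zip(x, y)) - N * mu_x * mu_y) / N

assert cov_def == cov_comp == 577.25
```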
4.1.2. Sample Covariance
The sample covariance formula differs from the population covariance in the denominator. After computing the sum of the products of the deviations, divide the result by n − 1.

sxy = ∑(x − x̄)(y − ȳ) / (n − 1)

The computational formula is:

sxy = (∑xy − n x̄ ȳ) / (n − 1)
Example 3
In the following table x represents the annual operating revenues and y the operating expenses (in $1,000’s) of a small company for a sample of 10 years. Compute the sample covariance of x and y using both the main formula and the computational formula.

Year    x      y      x − x̄    y − ȳ    (x − x̄)(y − ȳ)    xy
1       209    135    −155     −104     16,120            28,215
2       235    150    −129     −89      11,481            35,250
3       262    167    −102     −72      7,344             43,754
4       289    188    −75      −51      3,825             54,332
5       328    210    −36      −29      1,044             68,880
6       364    235    0        −4       0                 85,540
7       410    265    46       26       1,196             108,650
8       454    302    90       63       5,670             137,108
9       509    343    145      104      15,080            174,587
10      580    395    216      156      33,696            229,100
x̄ = 364    ȳ = 239    ∑(x − x̄)(y − ȳ) = 95,456    ∑xy = 965,416
Original formula:

sxy = ∑(x − x̄)(y − ȳ) / (n − 1) = 95,456 / 9 = 10,606.22

Computational formula:

sxy = (∑xy − n x̄ ȳ) / (n − 1) = (965,416 − 10(364)(239)) / 9 = 95,456 / 9 = 10,606.22
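A quick Python check of the Example 3 computation, using the computational formula with the n − 1 divisor:

```python
# Sample covariance of annual revenues (x) and expenses (y), Example 3.
x = [209, 235, 262, 289, 328, 364, 410, 454, 509, 580]
y = [135, 150, 167, 188, 210, 235, 265, 302, 343, 395]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n            # 364 and 239

s_xy = (sum(a * b for a, b in zip(x, y)) - n * xbar * ybar) / (n - 1)

assert round(s_xy, 2) == 10606.22
```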
Note that, other than the fact that covariance is positive, meaning that revenues and expenses vary together,
𝑠𝑥𝑦 = 10,606.22 conveys very little additional information about the nature of association between x and y.
To measure the strength of the relationship between revenues and expenses we need a different measure.
This takes us to the correlation coefficient.
4.2. Correlation Coefficient
The correlation coefficient is a relative measure showing the strength of the relationship between x and y. It is determined by dividing the covariance of x and y by the product of the standard deviation of x and the standard deviation of y. The symbol for the population correlation coefficient is ρxy (rho sub xy) and the symbol for the sample correlation coefficient is rxy. Other than the difference in symbols, the formulas are the same, and both yield identical coefficients.
Population correlation coefficient:

ρxy = σxy / (σx σy)        −1 ≤ ρxy ≤ 1

ρxy = [∑(x − µx)(y − µy) / N] / [√(∑(x − µx)² / N) · √(∑(y − µy)² / N)]

ρxy = ∑(x − µx)(y − µy) / [√∑(x − µx)² · √∑(y − µy)²]
Sample correlation coefficient:

rxy = sxy / (sx sy) = ∑(x − x̄)(y − ȳ) / [√∑(x − x̄)² · √∑(y − ȳ)²]        −1 ≤ rxy ≤ 1
Both ρ𝑥𝑦 and 𝑟𝑥𝑦 vary between −1 and +1. If the coefficient is negative, then there is an inverse relationship
between 𝑥 and 𝑦. A positive coefficient indicates a direct relationship between the two variables. The closer
the coefficient value is to the two extremes, the stronger the relationship between 𝑥 and 𝑦, and the closer it is
to zero, the weaker the relationship. If the coefficient is zero, then there is no relationship between the two—
𝑥 and 𝑦 are said to be independent.
Example 4
Find the correlation coefficient for the population data in Example 1.
x      y
58     122
75     118
30     115
102    144
69     98
84     160
22     60
48     87

As computed above, σxy = 577.25. Using the population standard deviation formula, we have

σx = √(∑(x − µx)² / N) = 25.323
σy = √(∑(y − µy)² / N) = 29.559

ρxy = σxy / (σx σy) = 577.25 / ((25.323)(29.559)) = 0.77
Since the correlation coefficient is fairly close to 1, there appears to be a reasonably strong association between x and y.
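The Example 4 result can be reproduced in a few lines of Python (a supplement to Excel's =CORREL, which computes the same quantity):

```python
# Population correlation coefficient for the Example 1 data.
x = [58, 75, 30, 102, 69, 84, 22, 48]
y = [122, 118, 115, 144, 98, 160, 60, 87]
N = len(x)
mu_x, mu_y = sum(x) / N, sum(y) / N

cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / N
sd_x = (sum((a - mu_x) ** 2 for a in x) / N) ** 0.5
sd_y = (sum((b - mu_y) ** 2 for b in y) / N) ** 0.5

rho = cov / (sd_x * sd_y)
assert round(rho, 2) == 0.77
```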
Example 5
Use the sample data in Example 3 to determine the sample correlation coefficient.

Using the sample standard deviation formula for the x and y data, we have sx = 122.88 and sy = 86.39.

rxy = sxy / (sx sy) = 10,606.22 / ((122.88)(86.39)) = 0.999
Here the correlation coefficient is almost one, indicating an extremely close direct relationship between operating revenues and expenses. This close relationship is shown in the following scatter diagram, where the sample points appear to lie on a straight, upward-sloping line.
[Scatter diagram of the Example 3 data. The ten (x, y) points, from (209, 135) to (580, 395), lie almost exactly on a straight, upward-sloping line.]
Example 6
Compute the correlation coefficient for the data in Example 2.

To obtain the correlation coefficient without finding the covariance first, use the following simpler formula (easily derived from either the population or the sample correlation coefficient formula):

rxy = (∑xy − n x̄ ȳ) / [√(∑x² − n x̄²) · √(∑y² − n ȳ²)]

x     y      xy       x²      y²
97    147    14259    9409    21609
23    132    3036     529     17424
86    105    9030     7396    11025
14    126    1764     196     15876
54    101    5454     2916    10201
44    125    5500     1936    15625
19    134    2546     361     17956
12    107    1284     144     11449
55    109    5995     3025    11881
26    105    2730     676     11025
87    119    10353    7569    14161
82    141    11562    6724    19881
41    143    5863     1681    20449
56    131    7336     3136    17161
92    122    11224    8464    14884
88    142    12496    7744    20164
x̄ = 54.75    ȳ = 124.3125    ∑xy = 110432    ∑x² = 61906    ∑y² = 250771

rxy = (110432 − 16(54.75)(124.3125)) / [√(61906 − 16(54.75²)) · √(250771 − 16(124.3125²))] = 0.2192
The correlation coefficient of 0.22 indicates a very weak association between x and y.
How to Find Covariance and the Correlation Coefficient Using Excel

The formula for population covariance: =COVARIANCE.P(array1,array2)
The formula for sample covariance: =COVARIANCE.S(array1,array2)
The formula for the correlation coefficient: =CORREL(array1,array2)

Since the population and sample correlation coefficient formulas give the same result, the Excel formula applies to both.
5. Random Variables
A random variable is a variable whose values are determined through a random experiment or process. In
other words, a random variable is a variable whose value cannot be predicted exactly. The value is not
known in advance; it is not known until after the random experiment is conducted.
Example 5.1
As a random experiment, toss a coin. This random experiment has two outcomes: H and T. Let’s assign the
value 0 to H, and 1 to T. Thus, this experiment generates values we can assign to the random variable x,
which is the number of tails. If you toss two coins, then the number of tails is either 0, 1, or 2:
Outcomes of the random experiment    x = number of tails
(0,0)                                0
(0,1), (1,0)                         1
(1,1)                                2
Example 5.2
When you toss a pair of dice, let x denote the sum of the number of dots appearing on top. These values are assigned to x through the outcomes of the random experiment, shown below.

Outcomes of the random experiment           x = sum of dots
(1,1)                                       2
(1,2), (2,1)                                3
(1,3), (2,2), (3,1)                         4
(1,4), (2,3), (3,2), (4,1)                  5
(1,5), (2,4), (3,3), (4,2), (5,1)           6
(1,6), (2,5), (3,4), (4,3), (5,2), (6,1)    7
(2,6), (3,5), (4,4), (5,3), (6,2)           8
(3,6), (4,5), (5,4), (6,3)                  9
(4,6), (5,5), (6,4)                         10
(5,6), (6,5)                                11
(6,6)                                       12
Example 5.3
When you guess the answers to a set of 5 multiple-choice questions, you are conducting a random experiment. Assign 0 to an incorrect answer and 1 to a correct answer for each question, and let x denote the number of correct answers: 0, 1, 2, 3, 4, 5. These numbers are assigned to x through the outcomes of the random experiment as shown below. There are 32 possible outcomes in this random experiment.¹

Outcomes of the random experiment                                                                                                   x = correct guesses
(0,0,0,0,0)                                                                                                                         0
(1,0,0,0,0), (0,1,0,0,0), (0,0,1,0,0), (0,0,0,1,0), (0,0,0,0,1)                                                                     1
(1,1,0,0,0), (1,0,1,0,0), (1,0,0,1,0), (1,0,0,0,1), (0,1,1,0,0), (0,1,0,1,0), (0,1,0,0,1), (0,0,1,1,0), (0,0,1,0,1), (0,0,0,1,1)    2
(1,1,1,0,0), (1,1,0,1,0), (1,1,0,0,1), (1,0,1,1,0), (1,0,1,0,1), (1,0,0,1,1), (0,1,1,1,0), (0,1,1,0,1), (0,1,0,1,1), (0,0,1,1,1)    3
(1,1,1,1,0), (1,1,1,0,1), (1,1,0,1,1), (1,0,1,1,1), (0,1,1,1,1)                                                                     4
(1,1,1,1,1)                                                                                                                         5
5.1. Probability Distribution (Probability Density Function) of Discrete Random Variables
The random variables presented above are examples of discrete random variables. A random variable is
discrete if it has a specific set of values, that is, if all of its possible values can be enumerated. The probability
distribution, or probability density function, of a discrete random variable is simply a table showing all the
possible values and corresponding probability of each value occurring. Depending on the nature of the
random process generating the random variable, determining the probability distribution can be a very
simple or a highly complex exercise.
Example 5.4
Let 𝑥 be the random variable denoting the number of tails when tossing two coins as in Example 5.1. Write
the probability distribution of x.
As shown in Example 5.1, there are four outcomes when tossing two coins, each assigning a value to x. Since each outcome is equally likely, the probability distribution is:
x      f(x)
0      0.25
1      0.50
2      0.25
       1.00
Example 5.5
Let 𝑥 be the random variable denoting the sum of dots appearing on top when tossing a pair of dice. Write
the probability distribution of x.
Per Example 5.2, there are 36 (ordered) pairs of numbers generating the values of x. Since each outcome is equally likely, the probability distribution of x can be written as follows:
¹ Each trial has two outcomes (success or failure). The experiment has five trials. Therefore, the total number of outcomes is 2⁵ = 32.
x      f(x)
2      1/36
3      2/36
4      3/36
5      4/36
6      5/36
7      6/36
8      5/36
9      4/36
10     3/36
11     2/36
12     1/36
Example 5.6
Let x be the random variable denoting the number of correct answers when guessing the answers to a 5-question multiple-choice exam. Write the probability distribution of x.

There are 32 outcomes for this random experiment, as shown in Example 5.3. However, the outcomes are not all equally likely. If the test were a true-false type, the outcomes would be equally likely. For multiple-choice tests the likelihood of each outcome depends a) on the probability of a correct guess for each question, which depends on the number of choices, and b) on the number of correct guesses in each outcome.

Assume there are four choices per question. Then the probability of an incorrect guess for each question is P(0) = 3/4 = 0.75, and that of a correct guess is P(1) = 1/4 = 0.25. Since the trials (guessing the answers) are independent,

f(x = 0) = P(0,0,0,0,0) = (0.75)(0.75)(0.75)(0.75)(0.75) = 0.2373
f(x = 5) = P(1,1,1,1,1) = (0.25)(0.25)(0.25)(0.25)(0.25) = 0.0010

Since the outcomes generating each of the intermediate values of the random variable are, for each value, equally likely,

f(x = 1) = 5 × 0.25 × 0.75⁴ = 0.3955
f(x = 2) = 10 × 0.25² × 0.75³ = 0.2637
f(x = 3) = 10 × 0.25³ × 0.75² = 0.0879
f(x = 4) = 5 × 0.25⁴ × 0.75 = 0.0146
The probability distribution of x is then:
x      f(x)
0      0.2373
1      0.3955
2      0.2637
3      0.0879
4      0.0146
5      0.0010
Note that this is an example of a binomial distribution. The random experiments or processes generating a
binomial distribution have the following characteristics: a) All 𝑛 trials are independent and identical; b) each
trial has two outcomes, success or failure; and c) the probability of success (and failure) remains unchanged
for all trials. Let 𝑛 be the number of trials, 𝑥 be the number of successes, and π be the probability of success
per trial. Then
f(x) = C(n, x) π^x (1 − π)^(n−x)
where C(n, x) is the combination counting technique of selecting x items from n items:

C(n, x) = n! / [(n − x)! x!]
Example 5.7
Find the probability of guessing 8 answers correctly in a set of 25 multiple choice questions each with 5
choices.
f(8) = C(25, 8)(0.2^8)(0.8^17) = (1,081,575)(0.00000256)(0.022518) = 0.0623
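The binomial calculation in Example 5.7 can be checked with a short Python sketch (the chapter itself does not use code; the function name `binomial_pmf` is my own):

```python
from math import comb

def binomial_pmf(x, n, pi):
    """P(exactly x successes in n independent trials, success probability pi)."""
    return comb(n, x) * pi**x * (1 - pi)**(n - x)

# Example 5.7: 8 correct guesses out of 25 questions, 5 choices each (pi = 0.2)
p = binomial_pmf(8, 25, 0.2)
print(round(p, 4))  # 0.0623
```

The same function reproduces the 5-question exam distribution of Example 5.6, e.g. `binomial_pmf(1, 5, 0.25)` gives 0.3955.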
5.2. Expected Value of Discrete Random Variables
The expected value of a discrete random variable is the mean of the values assigned to the random variable.
Since each value has a distinct probability of occurring, then these probabilities must be taken into account
when computing the mean. The values must be weighted by their respective probabilities. Therefore,
expected value is simply the weighted mean of all possible values of a discrete random variable.
E(𝑥) = 𝑥𝑖 𝑓(𝑥𝑖 )
𝑥𝑖
𝑥1
𝑥2

𝑥𝑛
𝑓(𝑥𝑖 )
𝑓(𝑥1 )
𝑓(𝑥2 )

𝑓(𝑥𝑛 )
𝑥𝑖 𝑓(𝑥𝑖 )
𝑥1 𝑓(𝑥1 )
𝑥2 𝑓(𝑥2 )

𝑥3 𝑓(𝑥3 )
E(𝑥) = 𝑥𝑖 𝑓(𝑥𝑖 )
Example 5.8
Find the expected value of the sum of the dots when tossing a pair of dice.
x       f(x)       x f(x)
2       1/36       2/36
3       2/36       6/36
4       3/36       12/36
5       4/36       20/36
6       5/36       30/36
7       6/36       42/36
8       5/36       40/36
9       4/36       36/36
10      3/36       30/36
11      2/36       22/36
12      1/36       12/36
                   ∑x f(x) = 252/36

E(x) = ∑x f(x) = 252⁄36 = 7
5.2.1. Expected Value Rules
The following rules affecting expected values will be used very frequently in future discussions:
1) Expected value of a constant is the constant itself:
E(𝑎) = 𝑎
2) If you multiply each value of the random variable by a constant, expected value is also a multiple of that
constant:
E(𝑏𝑥) = 𝑏𝐸(𝑥)
E(𝑏𝑥) = ∑𝑏𝑥𝑓(𝑥)
E(𝑏𝑥) = 𝑏𝑥1 𝑓(𝑥1 ) + 𝑏𝑥2 𝑓(𝑥2 ) + ⋯ + 𝑏𝑥𝑛 𝑓(𝑥𝑛 )
E(𝑏𝑥) = 𝑏[𝑥1 𝑓(𝑥1 ) + 𝑥2 𝑓(𝑥2 ) + ⋯ + 𝑥𝑛 𝑓(𝑥𝑛 )]
E(𝑏𝑥) = 𝑏∑𝑥𝑓(𝑥) = 𝑏E(𝑥)
Let 𝑦 = 𝑎 + 𝑏𝑥
Then, combining Rule 1 and Rule 2, we have,
E(𝑦) = E(𝑎 + 𝑏𝑥) = 𝑎 + 𝑏E(𝑥)
Example 5.9
The following is the probability (relative frequency) of the number of cars sold in a week by a salesman in a
car dealership.
x       f(x)
0       0.05
1       0.10
2       0.25
3       0.40
4       0.15
5       0.05
a) Find the expected value or the mean number of cars sold per week by the salesman.
E(𝑥) = 𝑥𝑓(𝑥) = 2.65
b) The salesman receives a fixed weekly salary of $300 and a $200 commission for each car sold. What is the
salesman’s mean weekly income?
𝑦 = 300 + 200𝑥
E(𝑦) = E(300 + 200𝑥) = 300 + 200E(𝑥) = 300 + (200)(2.65) = 830
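The expected-value rule E(a + bx) = a + bE(x) can be verified numerically against a direct computation over the distribution (a sketch, not part of the original Excel-based presentation):

```python
# Car-sales distribution from Example 5.9
f = {0: 0.05, 1: 0.10, 2: 0.25, 3: 0.40, 4: 0.15, 5: 0.05}

ex = sum(x * p for x, p in f.items())  # E(x) = 2.65

# y = 300 + 200x  =>  E(y) = 300 + 200 E(x)
ey_rule = 300 + 200 * ex
# Computing E(y) directly over the distribution gives the same answer
ey_direct = sum((300 + 200 * x) * p for x, p in f.items())
print(round(ey_rule, 2), round(ey_direct, 2))  # 830.0 830.0
```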
5.3. Variance of the Discrete Random Variable
The variance, generally, is computed as the mean squared deviation of a variable from the mean.
σ² = ∑(x − µ)² / N
If the variable is a random variable, then we must compute the weighted mean of the squared deviations
using the probabilities as the weights. Using the symbol for the population mean μ in place of E(𝑥), let
𝑢 denote the deviation of the values of the random variable from the mean,
𝑢 =𝑥−𝜇
Then variance of the random variable 𝑥 is defined as the expected value of the squared deviations.
var(𝑥) = E(𝑢2 ) = E[(𝑥 − µ)2 ] = ∑(𝑥 − µ)2 𝑓(𝑥)
Example 5.10
Use the data for the car salesman in Example 5.9 to compute the variance of the number of cars sold.
x       f(x)       (x − µ)² f(x)
0       0.05       0.3511
1       0.10       0.2723
2       0.25       0.1056
3       0.40       0.0490
4       0.15       0.2734
5       0.05       0.2761
                   var(x) = 1.3275
Using the properties of expected value we can develop an easier computational formula for var(𝑥):
var(x) = E[(x − µ)²]
var(x) = E(x² − 2µx + µ²) = E(x²) − 2µE(x) + µ²
var(x) = E(x²) − 2µ² + µ²
var(x) = E(x²) − µ²
The square root of the variance is the standard deviation of the random variable, sd(𝑥). The standard
deviation in effect shows the average deviation of the random variable from its mean.
5.3.1. Variance Rules
a) Variance of a constant is zero: var(𝑎) = 0
b) Multiplying the random variable by a constant increases the variance by a factor of the square of the
constant.
var(bx) = b²var(x)
var(bx) = E[(bx − bµ)²]
var(bx) = E[b²(x − µ)²]
var(bx) = b²E[(x − µ)²] = b²var(x)
Let
𝑦 = 𝑎 + 𝑏𝑥
var(y) = b²var(x)
Example 5.11
Using the income figures of the salesman in Example 5.9, find the variance of income.
𝑦 = 300 + 200𝑥
var(y) = 200²var(x) = (40,000)(1.3275) = 53,100
The standard deviation of income is,
sd(𝑦) = √53,100 = $230.43
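The variance rule var(a + bx) = b²var(x) used in Example 5.11 can be sketched in Python (again assuming the Example 5.9 distribution; the variable names are mine):

```python
# Car-sales distribution from Example 5.9
f = {0: 0.05, 1: 0.10, 2: 0.25, 3: 0.40, 4: 0.15, 5: 0.05}

mu = sum(x * p for x, p in f.items())                 # E(x) = 2.65
var_x = sum((x - mu) ** 2 * p for x, p in f.items())  # 1.3275

# Income y = 300 + 200x  =>  var(y) = 200^2 var(x)
var_y = 200 ** 2 * var_x
sd_y = var_y ** 0.5
print(round(var_y, 1), round(sd_y, 2))  # 53100.0 230.43
```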
5.4. The Fixed and Random Components of a Random Variable
As shown above, the deviation of the random variable from the mean is denoted by 𝑢
𝑢 =𝑥−µ
Rearranging this equation we can express 𝑥 as follows:
𝑥 =µ+𝑢
Here we have simply separated the random variable into two components: the fixed component is the
population parameter μ, and the random component is 𝑢. Using the variance rules, we have
var(x) = var(µ + u) = var(u), since var(µ) = 0
The variance of the random variable is the variance of the random component.
The following properties of 𝑢 will be referred to frequently in the subsequent chapters.
E(u) = E(x − µ)
E(u) = E(x) − µ = µ − µ = 0
var(u) = E[(u − 0)²] = E(u²)
5.5. Joint Probability Distribution (Probability Density Function) of Two Random
Variables
To study the relationship between two random variables we use the joint probability distribution. Consider
two random variables 𝑥 and 𝑦, where 𝑥 takes on values 0 and 1, and y takes the values 0, 1, and 2. The
following table shows the joint probability distribution (JPD) of 𝑥 and 𝑦.
                 y
x        0       1       2       f(x)
0        0.26    0.06    0.08    0.40
1        0.07    0.12    0.41    0.60
f(y)     0.33    0.18    0.49    1.00
This JPD is used to explain the following concepts: marginal probability, joint probability, and conditional
probability.
5.5.1. Marginal Probability
The random variable x takes on the values 0 and 1, and y the values 0, 1, and 2. The probability density function (PDF) for each random variable is shown below:
x       f(x)
0       0.40
1       0.60

y       f(y)
0       0.33
1       0.18
2       0.49
The probabilities associated with the values of 𝑥 and 𝑦 are obtained from the margins (last row and last column, respectively) of the joint probability table. This is why 𝑓(𝑥) and 𝑓(𝑦) are called marginal
probability density functions.
5.5.2. Joint Probabilities
The probability associated with each pair of (𝑥, 𝑦) values is called a joint probability. For example, in the table above the probability that (𝑥 = 0, 𝑦 = 0) is 0.26: 𝑓(0, 0) = 0.26. Since 𝑥 has two values and each is paired with three 𝑦 values, there are 6 joint probabilities. The joint PDF is thus a function of the values of 𝑥 and 𝑦. It provides the probability associated with each distinct pair of (x, y) values. For example, 𝑓(1, 2) = 0.41.
Note that each marginal probability is the sum of joint probabilities. For marginal probabilities of 𝑥 the joint
probabilities are summed across the row for each 𝑥 value; and for those of 𝑦 the joint probabilities are
summed down the column for each y value. For example,
𝑓(𝑥 = 0) = 𝑓(0,0) + 𝑓(0,1) + 𝑓(0,2) = 0.26 + 0.06 + 0.08 = 0.40
𝑓(𝑦 = 1) = 𝑓(0,1) + 𝑓(1,1) = 0.06 + 0.12 = 0.18
The following table shows the general form of a joint probability distribution table.
                  y
x         y₁             y₂             y₃             f(x)
x₁        f(x₁, y₁)      f(x₁, y₂)      f(x₁, y₃)      f(x₁)
x₂        f(x₂, y₁)      f(x₂, y₂)      f(x₂, y₃)      f(x₂)
f(y)      f(y₁)          f(y₂)          f(y₃)          1.00
The following shows, using summation notation, how in general the marginal and joint probabilities are
related. For example,
f(x₁) = f(x₁, y₁) + f(x₁, y₂) + f(x₁, y₃) = ∑ⱼ f(x₁, yⱼ)
Or,
f(y₁) = f(x₁, y₁) + f(x₂, y₁) = ∑ᵢ f(xᵢ, y₁)
Generally, for 𝑖 = 1, 2, … , 𝑚, and for 𝑗 = 1, 2, . . . , 𝑛, we have:
f(xᵢ) = f(xᵢ, y₁) + f(xᵢ, y₂) + ⋯ + f(xᵢ, yₙ) = ∑ⱼ f(xᵢ, yⱼ)

f(yⱼ) = f(x₁, yⱼ) + f(x₂, yⱼ) + ⋯ + f(xₘ, yⱼ) = ∑ᵢ f(xᵢ, yⱼ)
5.5.3. Conditional Probability of x and y
The conditional probability of 𝑥, given a value of 𝑦, shows the probability that 𝑥 takes on a value for a given
value of the random variable 𝑦. This conditional probability is the ratio of joint probability of 𝑥 and 𝑦 over the
marginal probability of 𝑦.
f(x|y) = f(x, y) / f(y)
Inversely,
f(y|x) = f(x, y) / f(x)
For example, for the numerical JPD table above,
                 y
x        0       1       2       f(x)
0        0.26    0.06    0.08    0.40
1        0.07    0.12    0.41    0.60
f(y)     0.33    0.18    0.49    1.00

f(x = 1|y = 0) = f(1, 0) / f(y = 0) = 0.07 / 0.33 = 0.2121

f(y = 2|x = 1) = f(1, 2) / f(x = 1) = 0.41 / 0.60 = 0.6833
Rewriting the conditional probability formula, we can express the joint probability of x and y as the product of
the conditional probability times the marginal probability.
𝑓(𝑥, 𝑦) = 𝑓(𝑥|𝑦)𝑓(𝑦)
or,
𝑓(𝑥, 𝑦) = 𝑓(𝑦|𝑥)𝑓(𝑥)
This leads us to the definition of independent random variables versus dependent random variables.
5.6. Independent versus Dependent Random Variables
Two random variables x and y are independent if,
𝑓(𝑥|𝑦) = 𝑓(𝑥)
and
𝑓(𝑦|𝑥) = 𝑓(𝑦)
If 𝑥 and 𝑦 are independent, then their joint probability becomes:
𝑓(𝑥, 𝑦) = 𝑓(𝑥|𝑦)𝑓(𝑦) = 𝑓(𝑥)𝑓(𝑦)
To determine whether x and y are dependent or independent, compare the joint probability to the product of the two marginal probabilities. If,
f(x, y) = f(x)f(y)      x and y are independent
f(x, y) ≠ f(x)f(y)      x and y are dependent
Consider the following joint probability distributions.
(A) x and y are independent

                 y
x        0        1        2        f(x)
0        0.300    0.125    0.075    0.50
1        0.180    0.075    0.045    0.30
2        0.120    0.050    0.030    0.20
f(y)     0.60     0.25     0.15     1.00

(B) x and y are dependent

                 y
x        0        1        2        f(x)
0        0.25     0.15     0.10     0.50
1        0.20     0.08     0.02     0.30
2        0.15     0.02     0.03     0.20
f(y)     0.60     0.25     0.15     1.00
In (A), 𝑥 and 𝑦 are independent because 𝑓(𝑥, 𝑦) = 𝑓(𝑥)𝑓(𝑦). For example,
𝑓(𝑥 = 1, 𝑦 = 2) = 𝑓(𝑥 = 1)𝑓(𝑦 = 2) = 0.30 × 0.15 = 0.045
In (B), 𝑥 and 𝑦 are dependent because 𝑓(𝑥, 𝑦) ≠ 𝑓(𝑥)𝑓(𝑦). For example,
𝑓(𝑥 = 1, 𝑦 = 2) = 0.02 ≠ 𝑓(𝑥 = 1)𝑓(𝑦 = 2) = 0.30 × 0.15 = 0.045
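The cell-by-cell comparison of f(x, y) against f(x)f(y) can be automated. The sketch below (dictionary encoding and function name are my own) checks every cell of Tables (A) and (B), using a small tolerance for floating-point arithmetic:

```python
# Joint probability tables (A) and (B); jpd[x][y] = f(x, y)
A = {0: {0: 0.300, 1: 0.125, 2: 0.075},
     1: {0: 0.180, 1: 0.075, 2: 0.045},
     2: {0: 0.120, 1: 0.050, 2: 0.030}}
B = {0: {0: 0.25, 1: 0.15, 2: 0.10},
     1: {0: 0.20, 1: 0.08, 2: 0.02},
     2: {0: 0.15, 1: 0.02, 2: 0.03}}

def independent(jpd, tol=1e-9):
    """True when f(x, y) = f(x) f(y) for every (x, y) pair."""
    fx = {x: sum(row.values()) for x, row in jpd.items()}      # marginals of x
    ys = next(iter(jpd.values())).keys()
    fy = {y: sum(jpd[x][y] for x in jpd) for y in ys}          # marginals of y
    return all(abs(jpd[x][y] - fx[x] * fy[y]) <= tol
               for x in jpd for y in ys)

print(independent(A), independent(B))  # True False
```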
5.7. Covariance of Random Variables x and y
The covariance of 𝑥 and 𝑦 is a measure of association or linear mutual variability of 𝑥 and 𝑦. It is the expected
value of the product of the deviations of 𝑥 from the mean of 𝑥 times the deviations of 𝑦 from the mean of 𝑦.
That is, it is the weighted average of the product of the two deviation terms. The weights are the joint
probabilities 𝑓(𝑥, 𝑦).
cov(x, y) = E[(x − µx)(y − µy)] = ∑∑(x − µx)(y − µy)f(x, y)
Before the double summation notation in the formula scares you, let us compute the covariance from Table
(B) above as reproduced here.
                 y
x        0       1       2       f(x)
0        0.25    0.15    0.10    0.50
1        0.20    0.08    0.02    0.30
2        0.15    0.02    0.03    0.20
f(y)     0.60    0.25    0.15    1.00
First list all 9 pairs of joint (𝑥, 𝑦)’s and the corresponding joint probabilities, then proceed with the
calculations as shown below. Note that µ𝑥 = ∑𝑥𝑓(𝑥) = 0.7 and µ𝑦 = ∑𝑦𝑓(𝑦) = 0.55.
x    y    f(x, y)    (x − µx)    (y − µy)    (x − µx)(y − µy)f(x, y)
0    0    0.25       −0.7        −0.55        0.09625
0    1    0.15       −0.7         0.45       −0.04725
0    2    0.10       −0.7         1.45       −0.10150
1    0    0.20        0.3        −0.55       −0.03300
1    1    0.08        0.3         0.45        0.01080
1    2    0.02        0.3         1.45        0.00870
2    0    0.15        1.3        −0.55       −0.10725
2    1    0.02        1.3         0.45        0.01170
2    2    0.03        1.3         1.45        0.05655
                                 ∑∑(x − µx)(y − µy)f(x, y) = −0.10500
The covariance is: cov(𝑥, 𝑦) = −0.105.
The negative sign in this example indicates that the two variables are inversely related. Thus, the covariance
shows the direction of the relationship between 𝑥 and 𝑦. If the covariance is negative, then 𝑥 and 𝑦 tend to
move in the opposite direction: when 𝑥 increases, then 𝑦 will decrease. When 𝑥 decreases, 𝑦 will increase.
The two random variables are said to be negatively correlated.
The covariance is positive when 𝑥 and 𝑦 are directly related. They tend to move together: when 𝑥 increases,
then y will increase. When 𝑥 decreases, 𝑦 will decrease. The two random variables are said to be positively correlated.
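The double summation in the covariance formula is just a sum over the nine (x, y) cells. A short Python sketch of the Table (B) calculation (the tuple-keyed dictionary is my own encoding):

```python
# Table (B) joint probabilities: f[(x, y)]
f = {(0, 0): 0.25, (0, 1): 0.15, (0, 2): 0.10,
     (1, 0): 0.20, (1, 1): 0.08, (1, 2): 0.02,
     (2, 0): 0.15, (2, 1): 0.02, (2, 2): 0.03}

mu_x = sum(x * p for (x, y), p in f.items())  # 0.70
mu_y = sum(y * p for (x, y), p in f.items())  # 0.55

# cov(x, y) = sum over all cells of (x - mu_x)(y - mu_y) f(x, y)
cov = sum((x - mu_x) * (y - mu_y) * p for (x, y), p in f.items())
print(round(cov, 3))  # -0.105
```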
Now compute the covariance from Table (A):
x    y    f(x, y)    (x − µx)    (y − µy)    (x − µx)(y − µy)f(x, y)
0    0    0.300      −0.70       −0.55        0.11550
0    1    0.125      −0.70        0.45       −0.03938
0    2    0.075      −0.70        1.45       −0.07613
1    0    0.180       0.30       −0.55       −0.02970
1    1    0.075       0.30        0.45        0.01013
1    2    0.045       0.30        1.45        0.01958
2    0    0.120       1.30       −0.55       −0.08580
2    1    0.050       1.30        0.45        0.02925
2    2    0.030       1.30        1.45        0.05655
                                 ∑∑(x − µx)(y − µy)f(x, y) = 0.00000
Note that the covariance is zero. This is not a coincidence. Recall that Table (A) represented the joint
probability distribution of two independent variables. When two random variables are independent, they
have no tendency to move together in either direction. They are said to be uncorrelated.
5.7.1. The covariance formula simplified
It appears that a lot of effort goes into the calculation of the covariance. A simple reconfiguration of the
definitional formula for covariance results in a simpler computational formula, as follows:
cov(x, y) = E[(x − µx)(y − µy)] = E(xy − xµy − yµx + µxµy)
Using the linear transformation properties of the expected value, we have:
E(xy − xµy − yµx + µxµy) = E(xy) − µyE(x) − µxE(y) + µxµy
E(xy − xµy − yµx + µxµy) = E(xy) − µyµx − µxµy + µxµy = E(xy) − µyµx
In short,
cov(x, y) = E(xy) − µyµx
where,
E(xy) = ∑∑xyf(x, y).²
For Table (B) we have,
x    y    xy    f(x, y)    xyf(x, y)
0    0    0     0.25       0.00
0    1    0     0.15       0.00
0    2    0     0.10       0.00
1    0    0     0.20       0.00
1    1    1     0.08       0.08
1    2    2     0.02       0.04
2    0    0     0.15       0.00
2    1    2     0.02       0.04
2    2    4     0.03       0.12
                           E(xy) = 0.28

µxµy = 0.385
cov(x, y) = E(xy) − µxµy = −0.105
For Table (A)
² This is a good place to show how the double summation notation works. For i = 1, 2, …, m and j = 1, 2, …, n:

∑ᵢ∑ⱼ xᵢyⱼf(xᵢ, yⱼ) = x₁y₁f(x₁, y₁) + x₁y₂f(x₁, y₂) + ⋯ + x₁yₙf(x₁, yₙ)
                   + x₂y₁f(x₂, y₁) + x₂y₂f(x₂, y₂) + ⋯ + x₂yₙf(x₂, yₙ)
                   + ⋯
                   + xₘy₁f(xₘ, y₁) + xₘy₂f(xₘ, y₂) + ⋯ + xₘyₙf(xₘ, yₙ)
x    y    xy    f(x, y)    xyf(x, y)
0    0    0     0.300      0.000
0    1    0     0.125      0.000
0    2    0     0.075      0.000
1    0    0     0.180      0.000
1    1    1     0.075      0.075
1    2    2     0.045      0.090
2    0    0     0.120      0.000
2    1    2     0.050      0.100
2    2    4     0.030      0.120
                           E(xy) = 0.385

µxµy = 0.385
cov(x, y) = E(xy) − µxµy = 0.000
The results of the last calculations also show another important relationship between two independent
variables 𝑥 and 𝑦. Note that:
E(𝑥𝑦) − µ𝑥 µ𝑦 = 0
This means that, for any two independent random variables,
E(xy) = E(x)E(y).³
5.8. Coefficient of Correlation
The covariance by itself shows the direction of association between 𝑥 and 𝑦. We are also interested in how
strongly 𝑥 and 𝑦 are correlated. However, like all variance measures, covariance is affected by the scale or the
magnitude of data and, therefore, it cannot serve as a measure of degree or strength of association between
the two random variables. The bigger the scale of the data, the bigger the covariance, even though 𝑥 and 𝑦
may not be strongly related. For example, if the data are in millions of dollars, the covariance will be bigger
than when the data are in thousands of dollars.
To obtain a measure of correlation, we need to scale the covariance. Scaling results in a relative measure, thus
removing the impact of the scale of the data. Scaling takes place by dividing the covariance by the product of
sd(𝑥) and sd(𝑦). The result is called the coefficient of correlation, denoted by the Greek letter ρ (lower case
rho):
ρ = cov(x, y) / [sd(x) ∙ sd(y)],    −1 ≤ ρ ≤ 1
The correlation coefficient varies between −1 and 1.
³ Mathematical proof is as follows. Since x and y are two independent random variables,
f(x, y) = f(x)f(y)
Therefore,
E(xy) = ∑∑xyf(x, y) = ∑∑xyf(x)f(y)
E(xy) = ∑∑xf(x)yf(y)
E(xy) = ∑xf(x)∑yf(y) = E(x)E(y)
When ρ = −1, 𝑥 and 𝑦 are perfectly negatively correlated.
When ρ = 0, 𝑥 and 𝑦 are uncorrelated. Independent random variables are always uncorrelated; a zero correlation by itself, however, indicates only the absence of a linear relationship.
When ρ = 1, 𝑥 and 𝑦 are perfectly positively correlated.
The closer ρ is to either −1 or 1, the stronger the relationship between the variables. The closer it is to 0, the weaker the relationship.
Using the data in Table (B) above, we compute
var(x) = 0.61 and var(y) = 0.5475
Thus,
sd(x) = 0.781 and sd(y) = 0.7399
Using these values and cov(x, y) = −0.105, we have
ρ = cov(x, y) / [sd(x) ∙ sd(y)] = −0.105 / (0.781 × 0.7399) = −0.182
This means that the relationship between 𝑥 and 𝑦 is negative but very weak.
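The full correlation calculation for Table (B) fits in a few lines of Python (a sketch using my own tuple-keyed encoding of the table; the chapter itself works by hand):

```python
# Table (B) joint probabilities: f[(x, y)]
f = {(0, 0): 0.25, (0, 1): 0.15, (0, 2): 0.10,
     (1, 0): 0.20, (1, 1): 0.08, (1, 2): 0.02,
     (2, 0): 0.15, (2, 1): 0.02, (2, 2): 0.03}

mu_x = sum(x * p for (x, y), p in f.items())
mu_y = sum(y * p for (x, y), p in f.items())
var_x = sum((x - mu_x) ** 2 * p for (x, y), p in f.items())  # 0.61
var_y = sum((y - mu_y) ** 2 * p for (x, y), p in f.items())  # 0.5475
cov = sum((x - mu_x) * (y - mu_y) * p for (x, y), p in f.items())

# rho = cov(x, y) / (sd(x) * sd(y))
rho = cov / (var_x ** 0.5 * var_y ** 0.5)
print(round(rho, 3))  # -0.182
```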
5.9. Effect of Linear Transformation of Two Random Variables on Their
Covariance and Correlation
Let
𝑣 = 𝑎 + 𝑏𝑥
and
𝑤 = 𝑐 + 𝑑𝑦
represent the linear transformation of the random variables 𝑥 and 𝑦. Show that,
cov(𝑣, 𝑤) = 𝑏𝑑cov(𝑥, 𝑦)
The proof is as follows:
cov(v, w) = E[(v − µv)(w − µw)]
cov(v, w) = E(vw) − µvµw = E(vw) − E(v)E(w)
cov(v, w) = E[(a + bx)(c + dy)] − [a + bE(x)][c + dE(y)]
cov(v, w) = E(ac + bcx + ady + bdxy) − [ac + adE(y) + bcE(x) + bdE(x)E(y)]
cov(v, w) = ac + bcE(x) + adE(y) + bdE(xy) − ac − adE(y) − bcE(x) − bdE(x)E(y)
cov(v, w) = bdE(xy) − bdE(x)E(y) = bd[E(xy) − E(x)E(y)]
cov(v, w) = bd cov(x, y)
The correlation coefficient of 𝑣 and 𝑤 is,
ρ(v, w) = cov(v, w) / [sd(v) ∙ sd(w)] = bd cov(x, y) / [b sd(x) ∙ d sd(y)]

ρ(v, w) = cov(x, y) / [sd(x) ∙ sd(y)]

(The last step assumes b and d are positive; in general sd(v) = |b|sd(x), so the sign of ρ(v, w) matches the sign of bd.)
5.10. The Mean and Variance of the Sum of Two Random Variables
Let
𝑧 =𝑥+𝑦
The expected value of 𝑧 is,
E(𝑧) = E(𝑥 + 𝑦) = E(𝑥) + E(𝑦)
The variance of 𝑧 is,
var(z) = E[(z − µz)²]
var(z) = E{[(x + y) − (µx + µy)]²} = E{[(x − µx) + (y − µy)]²}
var(z) = E[(x − µx)² + (y − µy)² + 2(x − µx)(y − µy)]
var(z) = E[(x − µx)²] + E[(y − µy)²] + 2E[(x − µx)(y − µy)]
var(z) = var(x + y) = var(x) + var(y) + 2cov(x, y)
Similarly,
var(𝑥 − 𝑦) = var(𝑥) + var(𝑦) − 2cov(𝑥, 𝑦)
If 𝑥 and 𝑦 are two independent random variables, cov(𝑥, 𝑦) = 0. Then,
var(𝑥 + 𝑦) = var(𝑥 − 𝑦) = var(𝑥) + var(𝑦)
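The rule var(x + y) = var(x) + var(y) + 2cov(x, y) can be checked against a direct computation over the Table (B) joint distribution (a sketch with my own encoding of the table):

```python
# Table (B) joint probabilities: f[(x, y)]
f = {(0, 0): 0.25, (0, 1): 0.15, (0, 2): 0.10,
     (1, 0): 0.20, (1, 1): 0.08, (1, 2): 0.02,
     (2, 0): 0.15, (2, 1): 0.02, (2, 2): 0.03}

mu_x = sum(x * p for (x, y), p in f.items())
mu_y = sum(y * p for (x, y), p in f.items())
var_x = sum((x - mu_x) ** 2 * p for (x, y), p in f.items())
var_y = sum((y - mu_y) ** 2 * p for (x, y), p in f.items())
cov = sum((x - mu_x) * (y - mu_y) * p for (x, y), p in f.items())

# Direct variance of z = x + y over the joint distribution
mu_z = mu_x + mu_y
var_z = sum(((x + y) - mu_z) ** 2 * p for (x, y), p in f.items())

print(round(var_z, 4), round(var_x + var_y + 2 * cov, 4))  # both 0.9475
```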
6. The Normal Distribution
The normal distribution is the quintessential example of a continuous random variable. Unlike a discrete random variable, a continuous random variable can take on an infinite number of values. If the distribution of these values is bell-shaped, then the distribution is represented by the normal probability density function:
f(x) = [1 / (σ√(2π))] e^(−(1/2)((x − µ)/σ)²)
Each normal 𝑝𝑑𝑓 is identified by two distribution parameters, µ (the population mean) and σ (the population standard deviation). The other two symbols, π and e, are universal constants (π = 3.1415927 and e = 2.718282, the base of the natural logarithm). Given µ and σ, we can draw the normal curve for different values of 𝑥.
Example 6.1
Suppose the vehicle speed on a given stretch of an interstate is normally distributed with a mean speed of µ =
72 mph and σ = 5. Show the distribution of 𝑥, the vehicle speed.
f(x) = [1 / (5√(2π))] e^(−(1/2)((x − 72)/5)²)
Find the density for 𝑥 = 77.
This can be done using a scientific calculator or Excel:
f(77) = [1 / (5√(2π))] e^(−(1/2)((77 − 72)/5)²) = 0.0484
The value (the density) 𝑓(𝑥 = 77) = 0.0484 is not the probability of 𝑥 = 77. It represents the height of the curve at point 𝑥 = 77. For continuous distributions, the 𝑝𝑑𝑓 does not provide the probability for a given value of 𝑥. Because 𝑥 can take on infinitely many values, the probability for a given value is not defined. For such variables, probability is defined only for a range of values. Using the 𝑝𝑑𝑓, the probability that the random variable 𝑥 takes on values in a given range or interval is determined by the area under the 𝑝𝑑𝑓 bounded by the lower and upper ends of the interval.
The following shows the distribution of vehicle speed in Example 6.1.
[Figure: the normal curve of vehicle speed, centered at µ = 72, with the density f(77) = 0.048 marked at x = 77.]
To draw this distribution, all we need to know are the values of the two parameters µ and σ. The normal curve practically (but not actually) touches the x axis at 4 standard deviations from the mean. Thus, in Example 6.1 you can evaluate 𝑓(𝑥) for several values of 𝑥, where 52 ≤ 𝑥 ≤ 92,⁴ and then connect the dots to draw the normal curve as above. Note that the inflection point of the curve occurs at 𝑥 = µ + σ, which is at 𝑥 = 72 + 5 = 77.
6.1. Finding the Area (Probability) Under a Normal Curve
Example 6.2
Given µ = 72 and σ = 5, if a vehicle is clocked at random, what is the probability that the speed is between 67
and 77 mph? Or, what proportion of vehicles drive between 67 and 77 mph?
In the diagram below, the probability is shown as the area under the curve for the interval 67 ≤ 𝑥 ≤ 77.
⁴ In Excel use the function =NORM.DIST(x, mean, standard_dev, cumulative) to find 𝑓(𝑥). Enter “0” for “cumulative” to obtain the density. Entering “1” for “cumulative” will yield the probability (the area under the normal curve to the left of 𝑥).
[Figure: the area under the normal curve between x = 67 and x = 77, with µ = 72 at the center.]
This area can be found by evaluating the integral of 𝑓(𝑥) for 67 ≤ 𝑥 ≤ 77. You can do this if you are a math wiz! You can, however, also use the Excel function =NORM.DIST(x, mean, standard_dev, 1). This function gives you the area under the curve to the left of 𝑥. Therefore, you must use the function twice and then find the difference: P(67 ≤ 𝑥 ≤ 77) = P(𝑥 ≤ 77) − P(𝑥 ≤ 67)
=NORM.DIST(77,72,5,1)      0.8413
=NORM.DIST(67,72,5,1)      0.1587
Difference                 0.6827
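The same subtraction of cumulative probabilities can be done in Python, where the standard library's `statistics.NormalDist` plays the role of NORM.DIST (a sketch; the variable name `speed` is mine):

```python
from statistics import NormalDist

# Vehicle speeds: mu = 72, sigma = 5 (Example 6.2)
speed = NormalDist(mu=72, sigma=5)

# P(67 <= x <= 77) = P(x <= 77) - P(x <= 67),
# mirroring the two NORM.DIST calls in the text
p = speed.cdf(77) - speed.cdf(67)
print(round(p, 4))  # 0.6827
```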
Alternatively, you can linearly transform 𝑥 into the standard normal variable 𝒛. As mentioned above, the 𝑧
score (regardless of the distribution of the data) measures the deviation of each 𝑥 value from the mean in
units of standard deviation. The standard normal variable 𝑧 shows the deviations of all normally distributed
values of 𝑥 in units of σ.
z = (x − µ) / σ
The 𝑧 random variable has a mean of 0 and standard deviation of 1.
E(z) = E[(x − µ)/σ] = (1/σ)E(x) − µ/σ = µ/σ − µ/σ = 0

var(z) = var[(x − µ)/σ] = var[(1/σ)x] = (1/σ)²var(x) = σ²/σ² = 1
Given these special features of z, the standard normal table of probabilities is developed. When 𝑥 is
transformed into 𝑧 then the corresponding probabilities are obtained from this table:
For 𝑥 = 77,
For 𝑥 = 67,
𝑧 = (77 – 72)⁄5 = 1.00
𝑧 = (67 – 72)⁄5 = −1.00
Thus,
P(67 ≤ 𝑥 ≤ 77) = P(−1.00 ≤ 𝑧 ≤ 1.00) = P(𝑧 ≤ 1.00) − P(𝑧 ≤ −1.00) = 0.8413 − 0.1587 = 0.6827
In Excel, use the =NORM.S.DIST(z, cumulative) function to find the cumulative probability of 𝑧. For example: =NORM.S.DIST(−1,1) = 0.1587
P(−1.00 ≤ 𝑧 ≤ 1.00) = 0.6827 is the confirmation of the well-known empirical rule that for all bell-shaped
distributions approximately 68 percent of all values fall within one standard deviation from the mean.
[Figure: the standard normal curve, with the area 0.6827 between z = −1.00 and z = 1.00.]
In addition to its practical uses in determining probabilities for normal random variables, the 𝑧 variable is also important for theoretical reasons. The standard normal variable is related to a number of other random
variables that are crucial to inferential statistics. These random variables are referred to in terms of their
distribution: the t distribution, the Chi-square distribution, and the F distribution. These distributions will be
discussed in the future chapters when their applications are required for the topic under consideration.
6.2. Finding the x value for a given area under the normal curve
Example 6.3
Given µ = 72 and σ = 5, below what speed do 90 percent of vehicles drive?
In the diagram below, the area bounded on the right by x is 0.90. What is the value of x?
[Figure: normal curve with µ = 72; the shaded area to the left of the unknown x is 0.9000.]
To find 𝑥, first solve for 𝑥 from the 𝑧 function,
z = (x − µ) / σ
𝑥 = µ + 𝑧σ
The 𝑧-score that bounds an area from the left (or with a cumulative probability) of 0.90 is 𝑧 = 1.28 (rounded
to two decimal points). You can obtain this 𝑧-score using =𝐍𝐎𝐑𝐌. 𝐒. 𝐈𝐍𝐕(𝐩𝐫𝐨𝐛𝐚𝐛𝐢𝐥𝐢𝐭𝐲) in Excel.
=NORM.S.INV(0.9) = 1.28
Thus,
𝑥 = 72 + (1.28)(5) = 78.4
This speed is the 90th percentile speed. Ninety percent of vehicles drive under 78.4 mph.
You may also derive the same 𝑥 value directly in Excel. You can compute the 𝑥 value for a given cumulative probability using =NORM.INV(probability, mean, standard_dev):
=NORM.INV(0.9,72,5) = 78.408
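The inverse cumulative function is also available in Python's standard library as `NormalDist.inv_cdf`, which corresponds to NORM.INV (a sketch; `x90` is my own name):

```python
from statistics import NormalDist

speed = NormalDist(mu=72, sigma=5)

# 90th percentile: the speed below which 90% of vehicles drive,
# equivalent to =NORM.INV(0.9, 72, 5) in Excel
x90 = speed.inv_cdf(0.90)
print(round(x90, 3))  # 78.408
```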
Example 6.4
For µ = 72 and σ = 5, find the middle interval (the interval symmetric about the mean) of 𝑥 values which
contains:
a) 90% of all 𝑥 values. (In what speed interval do 90% of vehicles drive?)
b) 95% of all 𝑥 values. (In what speed interval do 95% of vehicles drive?)
c) 99% of all 𝑥 values. (In what speed interval do 99% of vehicles drive?)
In the following diagram, we are to find the lower and upper 𝑥 values of the interval such that the proportion
1 − α of all 𝑥 values fall within this interval. The proportion α⁄2 of 𝑥 values fall at each tail of the distribution
outside the interval.
[Figure: normal curve with the middle area 1 − α between xL and xU, and the area α/2 in each tail.]
The 𝑥 values of the interval boundaries are found using the following familiar equations:
zα⁄2 = (x − µ) / σ

xL, xU = µ ± zα⁄2 σ
Here the symbol zα⁄2 is the z score that bounds a tail area of α⁄2. When, for example, 1 − α = 0.90, then α⁄2 = 0.05. The 𝑧 score that bounds a tail area of 0.05, z₀.₀₅, is easily found using,
=NORM.S.INV(0.05) = −1.64
Ignoring the “−” sign, then
z₀.₀₅ = 1.64
The following table shows the z score for the three tail areas needed to complete Example 6.4.

1 − α     α⁄2       zα⁄2
0.90      0.050     1.64
0.95      0.025     1.96
0.99      0.005     2.58

xL, xU = µ ± zα⁄2 σ

a) xL, xU = 72 ± 1.64(5) = (63.8, 80.2)
b) xL, xU = 72 ± 1.96(5) = (62.2, 81.8)
c) xL, xU = 72 ± 2.58(5) = (59.1, 84.9)
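The three middle intervals of Example 6.4 can be computed directly from the inverse cumulative function: xL = inv_cdf(α/2) and xU = inv_cdf(1 − α/2). A Python sketch (using unrounded z scores, so the results agree with the table above only to one decimal):

```python
from statistics import NormalDist

speed = NormalDist(mu=72, sigma=5)

# Middle interval containing proportion 1 - alpha of all x values
for conf in (0.90, 0.95, 0.99):
    alpha = 1 - conf
    x_lo = speed.inv_cdf(alpha / 2)        # lower boundary xL
    x_hi = speed.inv_cdf(1 - alpha / 2)    # upper boundary xU
    print(f"{conf:.0%}: ({x_lo:.1f}, {x_hi:.1f})")
# 90%: (63.8, 80.2)
# 95%: (62.2, 81.8)
# 99%: (59.1, 84.9)
```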