I. Course Information
Course Title: Quantitative Analysis I
Course Code: CCMA4001
CC Credits: 3
QF Credits: 12
QF Level: 4
II. Course Objectives
This course is designed to provide students with an introduction to statistical knowledge as applied to daily or academic situations
and to equip them with the essential skills to identify and apply appropriate techniques to solve problems. Students will be exposed
to concepts such as sampling techniques, graphical presentations, statistical measures, classical probability distributions, point and
interval estimation, and hypothesis testing. Students will learn how to perform data analysis both by hand and with Excel.
III. Syllabus
1. Sampling and Summarizing Data and Statistical Descriptions
   - Frequency Distributions
   - Stem-and-Leaf Displays
   - Graphical Presentations
   - Box-and-Whisker Plots
   - Measures of Location (mean, mode and median)
   - Measures of Variation (range, IQR, variance and standard deviation)
2. Possibilities and Probabilities
   - Permutations
   - Combinations
   - Probability
   - Mathematical Expectation
   - Some rules concerning probability
   - Sample spaces and Events
   - Addition rules
   - Conditional Probability
   - Independent Events
   - Multiplication Rules
   - Bayes' Theorem
3. Probability Distributions
   - Bernoulli Distribution
   - Binomial Distribution
   - Geometric Distribution
   - Poisson Distribution
   - Normal Distribution
4. Sampling and Sampling Distributions
   - Different sampling methods
   - Sampling Distribution of the Mean
   - Sampling Distribution of the Proportion
   - The Central Limit Theorem
5. Point and Interval Estimation
   - Point estimation
   - Interval Estimation of a population mean
   - Interval Estimation of a population proportion
   - Determining the sample size
6. Hypothesis Testing
   - Reasons for testing hypotheses
   - Steps in Hypothesis Testing
   - Hypothesis Testing on means (one sample)
   - Hypothesis Testing on proportions (one sample)
   - Type I and Type II errors
7. Statistical Analysis Using Excel
   - Data Analysis dialog box in Excel
   - Excel Statistical Functions
   - Confidence interval estimation using Excel
   - Hypothesis Testing using Excel
IV. Assessment
Four assignments (60%) and one end-of-term assessment (40%)
VIII. Required and Recommended Reading (please use The Harvard Convention/APA Convention)
References
1. Haeussler, Ernest F., Paul, Richard S. & Wood, R. J. (2011) Introductory Mathematical Analysis for Business, Economics, and the
Life and Social Sciences, 13th edition, Pearson Education Limited.
2. Berenson, Mark L., Levine, David M. & Szabat, Kathryn A. (2015) Basic Business Statistics: Concepts and Applications, 13th edition,
Pearson Education Limited.
Associate Degree 2022 – 2023 First Semester
CCMA4001 Quantitative Analysis I
Chapter 1
Summarizing and Describing Data
1.1 Summarizing Data
To visualize the distribution of a set of raw data, we should organize the data into a
more comprehensible form, using tables and graphs.
A. Frequency Tables
Given a set of raw data, we usually arrange it into a frequency distribution: we collect
'like' quantities together and record how many of each type there are, forming
a frequency table.
Example 1
In a multiple-choice test with 10 questions, the numbers of correct answers of 40 students are as follows.
10   4   9   6   7   4   8   7   7   5
 5   8   7   9  10   6   5   9   7   6
 4   7   5   6   7   9   5   8   8   4
 8   7   7   5   5   4   8   6   6   6
Construct a frequency table for these data.
Solution:
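Number of correct answers | Tally      | Frequency
4                         | /////      | 5
5                         | ///// //   | 7
6                         | ///// //   | 7
7                         | ///// //// | 9
8                         | ///// /    | 6
9                         | ////       | 4
10                        | //         | 2
Total                     |            | 40
(In the tally column, each complete group of five is written as /////.)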
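The tallying in Example 1 can be cross-checked with a short Python sketch. It assumes the 40 answers are typed in the order listed above and uses the standard-library `collections.Counter`; `frequency_table` and `tally` are illustrative helper names, not part of the course material.

```python
from collections import Counter

# Numbers of correct answers of the 40 students in Example 1
answers = [
    10, 4, 9, 6, 7, 4, 8, 7, 7, 5,
    5, 8, 7, 9, 10, 6, 5, 9, 7, 6,
    4, 7, 5, 6, 7, 9, 5, 8, 8, 4,
    8, 7, 7, 5, 5, 4, 8, 6, 6, 6,
]


def frequency_table(data):
    """Return (value, frequency) pairs in ascending order of value."""
    return sorted(Counter(data).items())


def tally(k):
    """Tally marks, writing each complete group of five as '/////'."""
    groups = ["/////"] * (k // 5)
    if k % 5:
        groups.append("/" * (k % 5))
    return " ".join(groups)


table = frequency_table(answers)
for value, freq in table:
    print(f"{value:>2}  {tally(freq):<12} {freq}")
print("Total:", sum(freq for _, freq in table))
```

Counting with `Counter` and then sorting by value mirrors the hand method: collect like values, count them, list them in order.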
B. Bar Chart
Example 2
Using the frequency table constructed in Example 1, draw a bar chart for the distribution of the
number of correct answers of 40 students in the multiple-choice test.
Solution:
[Bar chart: Distribution of the number of correct answers of 40 students in the multiple-choice test. Horizontal axis: number of correct answers (4 to 10); vertical axis: frequency (number of students), from 0 to 10.]
C. Stem-and-leaf diagrams
A very useful graphical representation of a frequency distribution is the stem-and-leaf diagram
(or stemplot).
The stem-and-leaf diagram combines a graphical technique with a sorting technique.
Sorting here means listing the data in rank order according to numerical value,
and the data values themselves are used to do this sorting.
The "stem" is the leading digit(s) of a data value, while the "leaf" is the trailing digit.
For example, the data value 386 might be split as 38 | 6:
Stem (leading digits, used in sorting): 38
Leaf (trailing digit, shown in display): 6
A stem-and-leaf diagram is a method of presenting a data set so that gaps or concentrations in the
data become visible.
Example 3
Suppose that a class of 40 students obtained the following results in a Mathematics test.
61  80  55  70  76  73  100  90  64  62
75  64  62  66  46  61  67   39  58  63
63  64  51  40  66  43  38   37  28  71
70  49  48  68  86  27  69   74  37  56
Construct a stem-and-leaf diagram for these data.
Solution:
Stem (Tens) | Leaf (Units)
 2 | 7 8
 3 | 7 7 8 9
 4 | 0 3 6 8 9
 5 | 1 5 6 8
 6 | 1 1 2 2 3 3 4 4 4 6 6 7 8 9
 7 | 0 0 1 3 4 5 6
 8 | 0 6
 9 | 0
10 | 0
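The construction in Example 3 (split each mark into tens and units, then sort) can be sketched in Python as below. The sketch assumes whole-number marks with the stem as the tens digit(s); `stem_and_leaf` is an illustrative helper name, and empty stems inside the data range are kept so that gaps remain visible.

```python
from collections import defaultdict

# Mathematics test marks of the 40 students in Example 3
marks = [
    61, 80, 55, 70, 76, 73, 100, 90, 64, 62,
    75, 64, 62, 66, 46, 61, 67, 39, 58, 63,
    63, 64, 51, 40, 66, 43, 38, 37, 28, 71,
    70, 49, 48, 68, 86, 27, 69, 74, 37, 56,
]


def stem_and_leaf(data):
    """Map each stem (tens) to its leaves (units) in ascending order.

    Stems with no leaves inside the data range are kept, so gaps stay visible.
    """
    leaves = defaultdict(list)
    for x in sorted(data):  # sorting first puts the leaves of each stem in order
        stem, leaf = divmod(x, 10)
        leaves[stem].append(leaf)
    return {s: leaves.get(s, []) for s in range(min(leaves), max(leaves) + 1)}


diagram = stem_and_leaf(marks)
for stem, leaf_digits in diagram.items():
    print(f"{stem:>2} | {' '.join(map(str, leaf_digits))}")
```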
Advantages of a stem-and-leaf diagram
1. It is easy to construct. In fact, it is no more difficult to construct than a frequency table.
2. It is actually partly a table and partly a graph and so it immediately and directly gives a good
picture of the frequency distribution without having to prepare a frequency table first and then
construct charts afterwards.
3. Since the actual data are recorded in the diagram, it retains the information about the original data,
and the information may be recovered readily.
In a frequency table or histogram, data are represented by tallies or areas of rectangles in class intervals
and so some information about the original data is lost and cannot be recovered.
For example, the reading 64 is recorded in its entirety in a stem-and-leaf diagram, but is represented
only by a count of 1 in the class interval (e.g. 60 – 64) in a frequency table or histogram.
4. It can be regarded as the original set of data arranged in ascending order of magnitude.
Hence it can be readily used for finding quartiles.
Disadvantages of a stem-and-leaf diagram
1. For some types of data, the number of possible stems is either very small or very large,
which makes the diagram inconvenient to construct and unable to show the distribution effectively.
2. It is not well suited to large sets of data.
For a large set of data, the purpose of a graphical representation is to give a good overall
picture of the distribution rather than to show the details of the data,
so a bar chart or a histogram is more suitable in this case.
Example 4
A fishery expert found the following concentrations of mercury, in parts per million, in thirty fish caught
in a certain stream.
0.024  0.031  0.052  0.024  0.024  0.030  0.056  0.034  0.059  0.068
0.035  0.021  0.052  0.023  0.054  0.028  0.037  0.034  0.048  0.040
0.022  0.049  0.043  0.034  0.032  0.021  0.040  0.032  0.021  0.039
Construct a double-stem diagram for these data.
Solution:
Stem (Unit = 0.01) | Leaf (Unit = 0.001)
2 | 1 1 1 2 3 4 4 4
2 | 8
3 | 0 1 2 2 4 4 4
3 | 5 7 9
4 | 0 0 3
4 | 8 9
5 | 2 2 4
5 | 6 9
6 |
6 | 8
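In the above diagram, each stem appears twice: the first row holds the leaves 0 to 4 and the second row the leaves 5 to 9.
For example, 0.024 has stem 2 (in units of 0.01) and leaf 4 (in units of 0.001).
The units of the stems and leaves have been chosen to make the recorded digits simple; this is an important feature of a stem-and-leaf diagram.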
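The double-stem idea can be sketched in Python as follows, assuming a leaf unit of 0.001 ppm as in Example 4. Converting each value to a whole number of leaf units first avoids floating-point noise; `double_stem` is an illustrative helper name.

```python
# Mercury concentrations (ppm) of the 30 fish in Example 4
mercury_ppm = [
    0.024, 0.031, 0.052, 0.024, 0.024, 0.030, 0.056, 0.034, 0.059, 0.068,
    0.035, 0.021, 0.052, 0.023, 0.054, 0.028, 0.037, 0.034, 0.048, 0.040,
    0.022, 0.049, 0.043, 0.034, 0.032, 0.021, 0.040, 0.032, 0.021, 0.039,
]


def double_stem(data, leaf_unit=0.001):
    """Return rows of a double-stem diagram as (stem, leaves) pairs.

    Each stem appears twice: first with leaves 0-4, then with leaves 5-9.
    """
    # Work in whole leaf units (thousandths of a ppm here) to avoid float noise
    units = sorted(round(x / leaf_unit) for x in data)
    low_stem, high_stem = units[0] // 10, units[-1] // 10
    rows = []
    for stem in range(low_stem, high_stem + 1):
        leaves = [u % 10 for u in units if u // 10 == stem]
        rows.append((stem, [d for d in leaves if d < 5]))
        rows.append((stem, [d for d in leaves if d >= 5]))
    return rows


rows = double_stem(mercury_ppm)
for stem, leaves in rows:
    print(f"{stem} | {''.join(map(str, leaves))}")
```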
1.2 Statistical Descriptions
In statistics, there are two useful types of measure which characterize any set of data or
frequency distribution.
The first type, a measure of ‘centralization’, attempts to locate a typical value about which the
distribution clusters. This type of measure is called an average or measure of central tendency
or measure of location.
The second type is a measure of how scattered or spread out a distribution is and is called
a measure of dispersion.
In the figures below,
(a) shows two distributions with different measures of central tendency but roughly the same spread;
(b) illustrates two distributions with the same measure of central tendency but different spreads.
[Figures (a) and (b)]
I. Measures of Central Tendency
The most common measures of central tendency or average are the mean, the median and the mode.
A. Mean (Arithmetic Mean)
Given the complete set of N data $\{x_1, x_2, \ldots, x_N\}$ in a population, the mean $\mu$ is defined as
$$\mu = \frac{1}{N}(x_1 + x_2 + \cdots + x_N) \quad \text{or} \quad \mu = \frac{1}{N}\sum_{i=1}^{N} x_i$$
The population mean is usually denoted by the Greek letter $\mu$ (pronounced "mu").
If the set of n data $\{x_{p_1}, x_{p_2}, \ldots, x_{p_n}\}$, where the $p_i$'s are a set of integers selected from 1 to N,
is a sample of size n drawn from a population, then the sample mean is defined similarly,
but is denoted by $\bar{x}$ (read as "x bar"). Thus
$$\bar{x} = \frac{1}{n}(x_{p_1} + x_{p_2} + \cdots + x_{p_n}) \quad \text{or} \quad \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_{p_i}$$
The notation $x_{p_i}$ for the elements of the sample may be a bit difficult for beginners.
Hence, when no misunderstanding arises, we shall denote the sample of size n simply as
$\{x_1, x_2, \ldots, x_n\}$,
bearing in mind that the element $x_i$ in the sample is, in general, not the same as the element $x_i$ in
the population.
With this understanding, the sample mean is
$$\bar{x} = \frac{1}{n}(x_1 + x_2 + \cdots + x_n) \quad \text{or} \quad \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
Example 4
Suppose that a class of 40 students obtained the following results in a Mathematics test.
61  80  55  70  76  73  100  90  64  62
75  64  62  66  46  61  67   39  58  63
63  64  51  40  66  43  38   37  28  71
70  49  48  68  86  27  69   74  37  56
(a) Find the mean of the population of Mathematics test marks.
(b) The following two samples have each been drawn randomly from the population of Mathematics
test marks.
S1 = {70, 43, 28, 69, 75, 90}
S2 = {68, 62, 48, 39, 38, 55, 66, 71, 37, 76}
Find the means of these samples.
(c) Find the mean of the sample S3 formed by combining the samples S1 and S2.
Solution:
(a) The population mean is
$$\begin{aligned}
\mu &= \frac{1}{40}(61 + 80 + 55 + 70 + 76 + 73 + 100 + 90 + 64 + 62 + 75 + 64 + 62 + 66 + 46 + 61 + 67 + 39 + 58 + 63 \\
&\qquad + 63 + 64 + 51 + 40 + 66 + 43 + 38 + 37 + 28 + 71 + 70 + 49 + 48 + 68 + 86 + 27 + 69 + 74 + 37 + 56) \\
&= 60.425
\end{aligned}$$
(b) The sample mean of S1 is
$$\bar{x}_1 = \frac{1}{6}(70 + 43 + 28 + 69 + 75 + 90) = 62.5$$
The sample mean of S2 is
$$\bar{x}_2 = \frac{1}{10}(68 + 62 + 48 + 39 + 38 + 55 + 66 + 71 + 37 + 76) = 56$$
Note that a population mean is a unique value, but the sample mean varies from sample to sample.
(c) The sample mean of S3 is
$$\bar{x}_3 = \frac{1}{16}(70 + 43 + 28 + 69 + 75 + 90 + 68 + 62 + 48 + 39 + 38 + 55 + 66 + 71 + 37 + 76) = 58.4375$$
or, equivalently, weighting each sample mean by its sample size,
$$\bar{x}_3 = \frac{62.5 \times 6 + 56 \times 10}{6 + 10} = 58.4375$$
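Both routes to the mean of S3 in part (c) can be checked with the Python sketch below. `mean` and `combined_mean` are illustrative helper names; the weighted form assumes only the two sample means and sample sizes are known.

```python
# Mathematics test marks (the population in Example 4) and the two samples
marks = [
    61, 80, 55, 70, 76, 73, 100, 90, 64, 62,
    75, 64, 62, 66, 46, 61, 67, 39, 58, 63,
    63, 64, 51, 40, 66, 43, 38, 37, 28, 71,
    70, 49, 48, 68, 86, 27, 69, 74, 37, 56,
]
s1 = [70, 43, 28, 69, 75, 90]
s2 = [68, 62, 48, 39, 38, 55, 66, 71, 37, 76]


def mean(data):
    """Arithmetic mean: the sum of the data divided by the number of data."""
    return sum(data) / len(data)


def combined_mean(mean1, n1, mean2, n2):
    """Mean of two groups pooled together, using only their means and sizes."""
    return (mean1 * n1 + mean2 * n2) / (n1 + n2)


mu = mean(marks)                  # population mean
x1, x2 = mean(s1), mean(s2)       # sample means
x3_direct = mean(s1 + s2)         # pool the raw data, then average
x3_weighted = combined_mean(x1, len(s1), x2, len(s2))
print(mu, x1, x2, x3_direct, x3_weighted)
```

The weighted form works because each sample mean times its size recovers that sample's total.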
B. Median
The median is a measure of position. It is the middle value in an ordered sequence of data.
To find the median from a set of data collected in its raw form, we must first arrange the data
in rank order, from the smallest to the largest observation. Such an ordered sequence of data is
called an ordered array.
For a set of discrete data $x_1, x_2, \ldots, x_n$ arranged in ascending order,
(i) if n is odd, the median is $x_{\frac{n+1}{2}}$, the value of the datum that is in the middle;
(ii) if n is even, the median is $\frac{1}{2}\left(x_{\frac{n}{2}} + x_{\frac{n}{2}+1}\right)$, the mean of the two data that are
nearest to the middle.
Example 5
(a) Find the median of the set of data {12, 8, 13, 16, 5}.
(b) Find the median of the set of data {25, 25, 37, 26, 25, 12, 75, 75}.
Solution:
(a) Arrange the set of five data in ascending order: 5, 8, 12, 13, 16. The median is $x_{\frac{5+1}{2}} = x_3 = 12$.
(b) Arrange the set of eight data in ascending order: 12, 25, 25, 25, 26, 37, 75, 75.
The median is
$$\frac{1}{2}\left(x_{\frac{8}{2}} + x_{\frac{8}{2}+1}\right) = \frac{1}{2}(x_4 + x_5) = \frac{1}{2}(25 + 26) = 25.5$$
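The odd/even median rule can be written as a small Python function; `median` is an illustrative name, and the text's 1-based positions are shifted to Python's 0-based indexing. The standard-library `statistics.median` applies the same rule and is used here only as a cross-check.

```python
import statistics


def median(data):
    """Median by the odd/even rule above (1-based positions shifted to 0-based)."""
    xs = sorted(data)
    n = len(xs)
    if n % 2 == 1:
        return xs[(n + 1) // 2 - 1]              # x_{(n+1)/2}
    return (xs[n // 2 - 1] + xs[n // 2]) / 2     # (x_{n/2} + x_{n/2+1}) / 2


for data in ([12, 8, 13, 16, 5], [25, 25, 37, 26, 25, 12, 75, 75]):
    print(median(data), statistics.median(data))
```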
C. Mode
The mode of a set of data is the value that occurs with the highest frequency.
In this sense it is the "most typical" value of a set of data.
For example, for the data 1, 1, 2, 2, 2, 3, 3, 4, 5, 5, 6, the mode is 2.
A distribution with one mode is called a unimodal distribution, one with two modes is
bimodal, and one with three or more modes is multimodal.
The two main advantages of the mode are that it requires no calculation, only counting, and that
it can be determined for qualitative as well as quantitative data.
However, if all the values in a set of data are different, the mode is of no use.
Example 6
Suppose that 50 children are asked which of the six brands of soft drink they prefer most
and the following results are obtained.
Brand              | A | B  | C | D | E | F
Number of children | 4 | 15 | 5 | 8 | 3 | 15
Find the mode of these data.
Solution:
There are two modes in this set, namely, B and F.
This set is said to be bimodal.
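Finding every mode (not just one) is what makes Example 6 bimodal. The Python sketch below assumes the counts in the table above; `modes_from_counts` is an illustrative name, while `statistics.multimode` is a standard-library function available from Python 3.8.

```python
from statistics import multimode  # available from Python 3.8

# Example 6: number of children preferring each brand
preferences = {"A": 4, "B": 15, "C": 5, "D": 8, "E": 3, "F": 15}


def modes_from_counts(counts):
    """All categories sharing the highest frequency (more than one => multimodal)."""
    top = max(counts.values())
    return [category for category, k in counts.items() if k == top]


# Rebuild the 50 individual answers to show that multimode agrees
responses = [brand for brand, k in preferences.items() for _ in range(k)]
print(modes_from_counts(preferences))
print(multimode(responses))
```

Because the mode only needs counting, the same code works for qualitative data such as brand names.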
II. Measures of Dispersion
The measures of central tendency provide only limited information about a set of data.
In particular, an average alone cannot tell us how spread out or dispersed the data are.
We therefore also need measures of dispersion: numerical values indicating the amount of scatter about
a central point.
Widely dispersed data are also highly variable data. Hence measures of dispersion are also called
measures of variability.
The most common measures of dispersion in statistics are the range, the inter-quartile range,
the variance and the standard deviation.
A. Range
The range of a set of data is the difference between the largest value and the smallest value of the set.
In general, the greater the range, the greater the dispersion of the set of data.
Example 7
Find the ranges of the scores of athletes A and B in Example 11.
Solution:
The range of scores of athlete A = 9.5 – 6.0 = 3.5
The range of scores of athlete B = 8.0 – 7.0 = 1.0
Since the range of the scores of athlete A is greater than that of athlete B, we say that the scores of
athlete A are more dispersed than those of athlete B.
B. Inter-quartile range
With the set of data arranged in ascending order, the median is the value which divides the set of
data into two equal parts.
Similarly, if we divide the set of data into four equal parts, the corresponding values, denoted by
$Q_1$, $Q_2$ and $Q_3$, are called the first, second and third quartiles respectively,
and $Q_2$ is just the median of the distribution.
The inter-quartile range (IQR) of a set of data is defined as $Q_3 - Q_1$.
It measures approximately how far from the median we must go on either side before we
include one-half of the values of the data set.
If we divide the set of data into 100 equal parts, the dividing values are called percentiles and
are denoted by $P_1, P_2, \ldots, P_{99}$.
The 50th percentile, $P_{50}$, corresponds to the median,
whereas $P_{25}$ and $P_{75}$ correspond to $Q_1$ and $Q_3$ respectively.
The pth percentile of a data set is a value such that at least p percent of the items take on this value or less
and at least $(100 - p)$ percent of the items take on this value or more.
$Q_1$ is the first quartile (or lower quartile): 25% of the data lie below it;
$Q_2$ is the second quartile (or middle quartile, or median): 50% of the data lie below it; and
$Q_3$ is the third quartile (or upper quartile): 75% of the data lie below it.
To find the pth percentile, first arrange the set of discrete data $x_1, x_2, \ldots, x_n$ in ascending order,
then compute the index
$$i = \frac{p}{100} \times n$$
to find the position of the pth percentile.
If i is not an integer, round it up to the next integer; the pth percentile is the value in the ith position.
If i is an integer, the pth percentile is the average of the values in positions i and i + 1.
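The index rule above can be sketched in Python, assuming integer p with 0 < p < 100. Exact fractions are used for the index so that a value such as 25/100 × 8 is recognised as the integer 2 without floating-point error. `percentile` and `iqr` are illustrative names; other software (for example spreadsheet or NumPy percentile functions) may use different interpolation rules and can return different quartiles.

```python
import math
from fractions import Fraction


def percentile(data, p):
    """p-th percentile by the index rule above, for integer 0 < p < 100."""
    if not 0 < p < 100:
        raise ValueError("p must lie strictly between 0 and 100")
    xs = sorted(data)
    i = Fraction(p, 100) * len(xs)  # exact arithmetic, so no rounding surprises
    if i.denominator == 1:          # i is an integer: average positions i and i + 1
        k = int(i)
        return (xs[k - 1] + xs[k]) / 2
    return xs[math.ceil(i) - 1]     # otherwise round i up (1-based position)


def iqr(data):
    """Inter-quartile range Q3 - Q1 under the same rule."""
    return percentile(data, 75) - percentile(data, 25)


A = [14, 23, 16, 18, 15, 44, 19]
B = [10, 15, 40, 28, 34, 18, 24, 30]
for name, data in (("A", A), ("B", B)):
    print(name, percentile(data, 25), percentile(data, 75), iqr(data))
```

The data sets A and B are those of Example 8, so the printed values can be compared with its solution.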
Example 8
(a) Find the inter-quartile range of the data set A {14, 23, 16, 18, 15, 44, 19}.
(b) Find the inter-quartile range of the data set B {10, 15, 40, 28, 34, 18, 24, 30}.
(c) By comparing the inter-quartile range of the data sets A and B, which set has a greater dispersion?
Solution:
(a) Arrange the seven data of data set A in ascending order: 14, 15, 16, 18, 19, 23, 44.
For the 25th percentile, the index $i = \frac{25}{100} \times 7 = 1.75$, which rounds up to 2,
hence $Q_1 = x_2 = 15$.
For the 75th percentile, the index $i = \frac{75}{100} \times 7 = 5.25$, which rounds up to 6,
hence $Q_3 = x_6 = 23$.
The inter-quartile range $= Q_3 - Q_1 = 23 - 15 = 8$.
(b) Arrange the eight data of data set B in ascending order: 10, 15, 18, 24, 28, 30, 34, 40.
For the 25th percentile, the index $i = \frac{25}{100} \times 8 = 2$, an integer,
hence $Q_1 = \frac{1}{2}(x_2 + x_3) = \frac{1}{2}(15 + 18) = 16.5$.
For the 75th percentile, the index $i = \frac{75}{100} \times 8 = 6$, an integer,
hence $Q_3 = \frac{1}{2}(x_6 + x_7) = \frac{1}{2}(30 + 34) = 32$.
The inter-quartile range $= Q_3 - Q_1 = 32 - 16.5 = 15.5$.
(c) The ranges of data sets A and B are both 30 ($44 - 14$ and $40 - 10$).
However, since the inter-quartile range of data set A is less than that of data set B,
data set B has the greater dispersion.
The range considers the difference between the maximum and minimum values of a set of data.
The inter-quartile range considers the range of 50% of the data in the middle and thus avoids the
impact of extreme values.
Therefore if there are extreme values in a set of data, the inter-quartile range is a better measure of
dispersion than the range.
Moreover, the inter-quartile range exists even if the set of data has open ends.
Box-and-Whisker Diagram
The median, the lower quartile and the upper quartile together with the maximum and the minimum
values provide a good description of a set of data as they indicate some of the most important
characteristics of the set. These five key descriptive statistical measures are often called the
five-number summary of the set of data. A graphical display of these measures, called a
box-and-whisker diagram or a box plot, gives an even better visual impression of the set.
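[Figure: box-and-whisker diagram. From left to right it marks the Minimum, Q1, Q2 (median), Q3 and Maximum. The lower whisker covers the lower 25% of the data, the box the middle 50% and the upper whisker the upper 25%. The length of the box is the IQR and the full width of the diagram is the range.]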
A box-and-whisker diagram consists of a rectangular box drawn with its length parallel to the x-axis
and with its ends marking the position of the lower and the upper quartiles. An orange bar is then
inserted in the box to mark the median. The two extreme values, the minimum and the maximum
values of the data, are linked to the box by lines, called whiskers, parallel to the x-axis.
A glance at the diagram then gives us good information about the central tendency, dispersion and
extreme values of the set.
(1) The bar at the median shows the location of the centre of the data.
(2) The length of the box, which equals the inter-quartile range, shows the dispersion of the middle 50%
of the data.
(3) The lengths of the whiskers show the dispersion of the data below the lower quartile and
above the upper quartile, and so describe the behavior at the ends (tails) of the distribution.
(4) The shape of the diagram gives a quick impression of how symmetric the distribution
is about the median.
It is easy to use box-and-whisker diagrams to compare the features, such as location of centre,
dispersion and symmetry of different sets of data. However, a box-and-whisker diagram does not
reveal the total frequency of each set of data, nor the frequency of the data for any specific range.
If such information is required, a stem-and-leaf diagram, bar chart or histogram can be used.
Box-and-whisker diagrams are particularly useful for comparing the central tendency and
the dispersion of two or more sets of data.
Example 9
The following box-and-whisker diagrams show the distributions of marks in the Chinese, English and
Mathematics tests.
[Figure: box-and-whisker diagrams of the marks in the Chinese, English and Mathematics tests]
(a) Which test has the marks with the largest inter-quartile range?
(b) Which test has the marks with the smallest range?
(c) Which test has the highest median mark?
(d) If Mary gets 70 marks in all three tests, in which test does she perform best?
Briefly explain your answer.
Solution:
(a) Since the box for the Mathematics test is the longest, the Mathematics test has the marks with
the largest inter-quartile range.
(b) Since the distance between the ends of the two whiskers is shortest for the Chinese test,
the Chinese test has the marks with the smallest range.
(c) Since the orange bar in the box for the Mathematics test is furthest to the right, the median mark
of the Mathematics test is the highest.
(d) From the box-and-whisker diagrams, Mary's mark in the English test is in the top
25% of the class, while her marks in the Mathematics and Chinese tests are not.
Hence Mary performs best in the English test.
Skewness of Distributions
A distribution can have many different shapes. It may be symmetric or skewed.
A distribution is symmetric if the parts above and below its centre are mirror images.
If $Q_2 - Q_1 = Q_3 - Q_2$, the distribution is symmetric.
[Figure: symmetric box plot marking Min, Q1, Q2, Q3 and Max, with the median in the middle of the box]
A distribution is skewed to the right if the right side is longer, while it is skewed to the left if the left
side is longer.
A positively skewed (right-skewed) distribution is asymmetric with a long "tail" on the right,
which indicates the presence of extreme values at the positive end of the distribution.
A distribution is positively skewed if $Q_2 - Q_1 < Q_3 - Q_2$.
[Figure: positively skewed box plot marking Min, Q1, Q2, Q3 and Max, with a long tail to the right]
A negatively skewed (left-skewed) distribution is asymmetric with a long "tail" on the left.
A distribution is negatively skewed if $Q_2 - Q_1 > Q_3 - Q_2$.
[Figure: negatively skewed box plot marking Min, Q1, Q2, Q3 and Max, with a long tail to the left]
Example 10
The stem-and-leaf diagram constructed in Example 3 for the distribution of the results of the class
of 40 students in the Mathematics test is reproduced below.
Stem (Tens) | Leaf (Units)
 2 | 7 8
 3 | 7 7 8 9
 4 | 0 3 6 8 9
 5 | 1 5 6 8
 6 | 1 1 2 2 3 3 4 4 4 6 6 7 8 9
 7 | 0 0 1 3 4 5 6
 8 | 0 6
 9 | 0
10 | 0
(a) Find the median, the first and the third quartiles.
(b) Construct the box-and-whisker diagram.
(c) Use the quartiles to comment on the skewness of the distribution.
Solution:
(a) The median is
$$\frac{1}{2}\left(x_{\frac{40}{2}} + x_{\frac{40}{2}+1}\right) = \frac{1}{2}(x_{20} + x_{21}) = \frac{1}{2}(63 + 63) = 63$$
For the 25th percentile, the index $i = \frac{25}{100} \times 40 = 10$, hence $Q_1 = \frac{1}{2}(x_{10} + x_{11}) = \frac{1}{2}(48 + 49) = 48.5$.
For the 75th percentile, the index $i = \frac{75}{100} \times 40 = 30$, hence $Q_3 = \frac{1}{2}(x_{30} + x_{31}) = \frac{1}{2}(70 + 70) = 70$.
(b) [Box-and-whisker diagram on a scale from 25 to 100: minimum 27, Q1 = 48.5, median 63, Q3 = 70, maximum 100]
(c) $Q_2 - Q_1 = 63 - 48.5 = 14.5$ and $Q_3 - Q_2 = 70 - 63 = 7$.
Since $Q_2 - Q_1 > Q_3 - Q_2$, the distribution is negatively skewed (left-skewed).
Example 11
The table below gives the monthly salaries in dollars of 25 employees of a certain department.
 7800   11900   12700   10400   20200
 6200    7300    9200   15500   17900
 9700    9500   10500   13300   10200
 9900   14200    8900    8700   16600
 7400    6600    9600    6100    8200
(a) Construct a stem-and-leaf diagram for the data.
(b) Find the mean.
(c) Find the median, the first and the third quartiles and the inter-quartile range.
(d) Construct the box-and-whisker diagram.
(e) Use the quartiles to comment on the skewness of the distribution.
Solution:
(a)
Stem (Unit = $1000) | Leaf (Unit = $100)
 6 | 1 2 6
 7 | 3 4 8
 8 | 2 7 9
 9 | 2 5 6 7 9
10 | 2 4 5
11 | 9
12 | 7
13 | 3
14 | 2
15 | 5
16 | 6
17 | 9
18 |
19 |
20 | 2
(b) The mean is
$$\begin{aligned}
\text{Mean} &= \frac{1}{25}(7800 + 11900 + 12700 + 10400 + 20200 + 6200 + 7300 + 9200 + 15500 + 17900 \\
&\qquad + 9700 + 9500 + 10500 + 13300 + 10200 + 9900 + 14200 + 8900 + 8700 + 16600 \\
&\qquad + 7400 + 6600 + 9600 + 6100 + 8200) \\
&= 10740
\end{aligned}$$
(c) Making use of the stem-and-leaf diagram for the distribution of the salaries (a column of
cumulative frequencies can be added to help locate the quartiles):
The median is $x_{\frac{25+1}{2}} = x_{13} = 9700$.
For the 25th percentile, the index $i = \frac{25}{100} \times 25 = 6.25$, which rounds up to 7,
hence $Q_1 = x_7 = 8200$.
For the 75th percentile, the index $i = \frac{75}{100} \times 25 = 18.75$, which rounds up to 19,
hence $Q_3 = x_{19} = 12700$.
The inter-quartile range $= Q_3 - Q_1 = 12700 - 8200 = 4500$.
(d) [Box-and-whisker diagram on a scale from 6000 to 21000: minimum 6100, Q1 = 8200, median 9700, Q3 = 12700, maximum 20200]
(e) $Q_2 - Q_1 = 9700 - 8200 = 1500$ and $Q_3 - Q_2 = 12700 - 9700 = 3000$.
Since $Q_2 - Q_1 < Q_3 - Q_2$, the distribution is positively skewed (right-skewed).
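The five-number summaries and quartile-based skewness conclusions of Examples 10 and 11 can be cross-checked with the Python sketch below. It restates the chapter's percentile index rule so that the block is self-contained; `percentile`, `five_number_summary` and `skewness_by_quartiles` are illustrative names. Under this rule the 50th percentile coincides with the median for both odd and even n, so it is reused for Q2.

```python
import math
from fractions import Fraction


def percentile(data, p):
    """p-th percentile by the index rule of this chapter (integer 0 < p < 100)."""
    xs = sorted(data)
    i = Fraction(p, 100) * len(xs)
    if i.denominator == 1:
        k = int(i)
        return (xs[k - 1] + xs[k]) / 2
    return xs[math.ceil(i) - 1]


def five_number_summary(data):
    """(minimum, Q1, median, Q3, maximum); the 50th percentile equals the median."""
    xs = sorted(data)
    return xs[0], percentile(xs, 25), percentile(xs, 50), percentile(xs, 75), xs[-1]


def skewness_by_quartiles(q1, q2, q3):
    """Compare Q2 - Q1 with Q3 - Q2, as in the skewness rules above."""
    left, right = q2 - q1, q3 - q2
    if left == right:
        return "symmetric"
    if left < right:
        return "positively skewed (right-skewed)"
    return "negatively skewed (left-skewed)"


marks = [  # Example 10 (the Mathematics test marks)
    61, 80, 55, 70, 76, 73, 100, 90, 64, 62,
    75, 64, 62, 66, 46, 61, 67, 39, 58, 63,
    63, 64, 51, 40, 66, 43, 38, 37, 28, 71,
    70, 49, 48, 68, 86, 27, 69, 74, 37, 56,
]
salaries = [  # Example 11
    7800, 11900, 12700, 10400, 20200,
    6200, 7300, 9200, 15500, 17900,
    9700, 9500, 10500, 13300, 10200,
    9900, 14200, 8900, 8700, 16600,
    7400, 6600, 9600, 6100, 8200,
]

for name, data in (("marks", marks), ("salaries", salaries)):
    summary = five_number_summary(data)
    print(name, summary, skewness_by_quartiles(*summary[1:4]))
```

The exact equality test for "symmetric" is fine for these textbook data; with real measurements one would usually compare the two distances with some tolerance.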
C. Variance and Standard Deviation
Although the inter-quartile range is an improved measure of dispersion compared with the range,
it still does not make use of the actual values of all the data in the set, and therefore cannot completely
reflect the dispersion of the data. Measures of dispersion that do take all the values into account
are the variance and the standard deviation.
To overcome the limitations of the range and the inter-quartile range mentioned above, we can find the
distance of each datum from the centre of the data. The greater the average distance of the
data from the centre, the wider the dispersion of the set of data.
If the set of N data $\{x_1, x_2, \ldots, x_N\}$ represents a population with mean $\mu$, then the variance of the
set of data is defined as the mean of the squares of the deviations of the individual values from the
population mean, and is commonly denoted by $\sigma^2$. Thus, the population variance is
$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2 = \frac{1}{N}\left[(x_1 - \mu)^2 + (x_2 - \mu)^2 + \cdots + (x_N - \mu)^2\right]$$
A large variance indicates a large dispersion and a small variance indicates a small dispersion.
However, the variance defined above does not have the same unit as the original values of x (its unit is the square of theirs).
To have a measure of dispersion with the same unit as the original data, we take the positive square
root of the variance. The resulting measure is called the standard deviation of the set of data. Thus,
$$\text{Population standard deviation } \sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2} = \sqrt{\frac{1}{N}\left[(x_1 - \mu)^2 + (x_2 - \mu)^2 + \cdots + (x_N - \mu)^2\right]}$$
If the set of n data $\{x_1, x_2, \ldots, x_n\}$ is a sample of size n drawn from a population and has mean $\bar{x}$,
the sample variance, $s^2$, is defined as
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2 = \frac{1}{n-1}\left[(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2\right]$$
The sample standard deviation, s, is the positive square root of the sample variance:
$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2} = \sqrt{\frac{1}{n-1}\left[(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2\right]}$$
Note the two differences between the sample variance $s^2$ and the population variance $\sigma^2$:
the sample mean $\bar{x}$ is used instead of the population mean $\mu$, and the divisor is n - 1 instead of N.
The standard deviation gives us an idea of how close the data are to their mean, and thus
tells us about the consistency of the set of data.
The smaller the standard deviation, the less dispersed the set of data is.
In other words, the distribution of data in the set is more consistent.
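The difference between the two divisors (N for a population, n - 1 for a sample) can be seen in the Python sketch below. `population_variance` and `sample_variance` are illustrative helpers written straight from the definitions; the standard-library `statistics.pvariance` and `statistics.variance` make the same distinction and serve as a cross-check. The data are the water temperatures of Example 12 and the sample S2 of Examples 4 and 13.

```python
import math
import statistics


def population_variance(data):
    """sigma^2: mean squared deviation from the mean, divisor N."""
    mu = sum(data) / len(data)
    return sum((x - mu) ** 2 for x in data) / len(data)


def sample_variance(data):
    """s^2: squared deviations from the sample mean, divisor n - 1."""
    n = len(data)
    xbar = sum(data) / n
    return sum((x - xbar) ** 2 for x in data) / (n - 1)


temps = [30, 32, 33, 28, 31, 29, 34]             # Example 12, treated as a population
s2 = [68, 62, 48, 39, 38, 55, 66, 71, 37, 76]    # sample S2

print(population_variance(temps), math.sqrt(population_variance(temps)))
print(sample_variance(s2), math.sqrt(sample_variance(s2)))
# The standard library makes the same N versus n - 1 distinction
print(statistics.pvariance(temps), statistics.variance(s2))
```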
Example 12
The temperatures (in °C) of water in seven beakers are: 30, 32, 33, 28, 31, 29, 34.
(a) Find the mean of the temperatures of the water.
(b) Find the population standard deviation of the temperatures of the water.
Solution:
(a) The mean of the temperatures of the water is
$$\mu = \frac{1}{7}\sum_{i=1}^{7} x_i = \frac{1}{7}(30 + 32 + 33 + 28 + 31 + 29 + 34) = 31$$
(b) The variance of the temperatures of the water is
$$\begin{aligned}
\sigma^2 &= \frac{1}{7}\sum_{i=1}^{7}(x_i - \mu)^2 \\
&= \frac{1}{7}\left[(30 - 31)^2 + (32 - 31)^2 + (33 - 31)^2 + (28 - 31)^2 + (31 - 31)^2 + (29 - 31)^2 + (34 - 31)^2\right] \\
&= \frac{1}{7}\left[(-1)^2 + 1^2 + 2^2 + (-3)^2 + 0^2 + (-2)^2 + 3^2\right] \\
&= \frac{1}{7}(1 + 1 + 4 + 9 + 0 + 4 + 9) \\
&= 4
\end{aligned}$$
Therefore, the population standard deviation of the temperatures of the water is $\sigma = \sqrt{4} = 2$.
Example 13
(a) Find the variance and standard deviation of the population of Mathematics test marks in Example 4,
given that the population mean is 60.425.
(b) If the passing mark is one population standard deviation below the mean, find the number of
students who failed the Mathematics test.
(c) The sample S2 = {68, 62, 48, 39, 38, 55, 66, 71, 37, 76} has been drawn from the population of
Mathematics test marks in Example 4. The sample mean was found to be 56.
Find the sample variance and sample standard deviation.
Solution:
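(a) Using the equivalent computational form of the population variance, $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} x_i^2 - \mu^2$, the variance is
$$\begin{aligned}
\sigma^2 &= \frac{1}{40}\sum_{i=1}^{40} x_i^2 - \mu^2 \\
&= \frac{1}{40}(61^2 + 80^2 + 55^2 + 70^2 + 76^2 + 73^2 + 100^2 + 90^2 + 64^2 + 62^2 + 75^2 + 64^2 + 62^2 + 66^2 + 46^2 \\
&\qquad + 61^2 + 67^2 + 39^2 + 58^2 + 63^2 + 63^2 + 64^2 + 51^2 + 40^2 + 66^2 + 43^2 + 38^2 + 37^2 + 28^2 + 71^2 \\
&\qquad + 70^2 + 49^2 + 48^2 + 68^2 + 86^2 + 27^2 + 69^2 + 74^2 + 37^2 + 56^2) - 60.425^2 \\
&= \frac{1}{40}(156497) - (60.425)^2 \\
&= 261.2444
\end{aligned}$$
Therefore the standard deviation is $\sigma = \sqrt{261.2444} \approx 16.1631$.
(b) The passing mark $= 60.425 - 16.1631 = 44.2619$.
Eight students (with marks 27, 28, 37, 37, 38, 39, 40 and 43) scored below 44.2619, so eight students failed the Mathematics test.
(c) Using the equivalent computational form $s^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\right)$, the sample variance is
$$\begin{aligned}
s^2 &= \frac{1}{10 - 1}\left(\sum_{i=1}^{10} x_i^2 - 10\bar{x}^2\right) \\
&= \frac{1}{9}\left[(68^2 + 62^2 + 48^2 + 39^2 + 38^2 + 55^2 + 66^2 + 71^2 + 37^2 + 76^2) - 10 \times 56^2\right] \\
&= \frac{1}{9}(33304 - 31360) \\
&= 216
\end{aligned}$$
and the sample standard deviation is $s = \sqrt{216} \approx 14.70$.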
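Part (a) uses the computational form $\frac{1}{N}\sum x_i^2 - \mu^2$ rather than the definition. The Python sketch below checks that both give the same variance for these marks and recounts the failing students in (b); variable names are illustrative. One design caveat: with floating-point numbers the computational form can lose precision when the values are large relative to their spread, so the definitional form is usually preferred in software.

```python
import math

# Mathematics test marks (the population of Examples 3 and 4)
marks = [
    61, 80, 55, 70, 76, 73, 100, 90, 64, 62,
    75, 64, 62, 66, 46, 61, 67, 39, 58, 63,
    63, 64, 51, 40, 66, 43, 38, 37, 28, 71,
    70, 49, 48, 68, 86, 27, 69, 74, 37, 56,
]

N = len(marks)
mu = sum(marks) / N
sum_sq = sum(x * x for x in marks)
var_definition = sum((x - mu) ** 2 for x in marks) / N  # definition of sigma^2
var_shortcut = sum_sq / N - mu ** 2                     # form used in part (a)
sigma = math.sqrt(var_definition)

passing_mark = mu - sigma
failed = sorted(x for x in marks if x < passing_mark)
print(sum_sq, var_definition, var_shortcut, sigma)
print(passing_mark, failed, len(failed))
```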
Use Scientific Calculator to find mean and standard deviation
Use the calculator to find the mean and standard deviation of the data set
{1, 2, 5, 6, 8, 9, 10, 12, 14, 18}
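As a way to check calculator answers, the same quantities can be computed with Python's standard-library `statistics` module. Note the two standard deviations: `pstdev` divides by n (population) and `stdev` divides by n - 1 (sample). Many scientific calculators show both, often labelled along the lines of σx and sx, but the labels vary by model, so check your calculator's manual.

```python
import statistics

data = [1, 2, 5, 6, 8, 9, 10, 12, 14, 18]

mean = statistics.mean(data)
sigma = statistics.pstdev(data)  # population standard deviation (divisor n)
s = statistics.stdev(data)       # sample standard deviation (divisor n - 1)
print(mean, round(sigma, 4), round(s, 4))
```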
Chapter 2
Counting
2.1 The Fundamental Principle of Multiplication
Suppose that a first task can be performed in $m_1$ ways, a second task in $m_2$ ways,
a third task in $m_3$ ways, and so on, up to a kth task that can be performed in $m_k$ ways.
Then the total number of ways in which the k tasks can be performed in succession is the product
$$m_1 m_2 \cdots m_k$$
Example 1
(a) Find the number of ways to answer a true-false quiz with 4 questions if every question
must be answered.
(b) If questions may also be left unanswered, what is the number of possible ways?
Solution:
(a) If every question must be answered, there are two choices, true or false, for each question.
By the fundamental principle of multiplication,
the total number of ways is $2 \times 2 \times 2 \times 2 = 16$.
(b) If a question may be left unanswered, there are 3 alternatives for each question (true, false or no answer).
The number of ways to answer the quiz is $3 \times 3 \times 3 \times 3 = 81$.
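The fundamental principle of multiplication can be checked against a brute-force listing of every answer sheet. In this Python sketch, `count_ways` is an illustrative name; `math.prod` (Python 3.8+) and `itertools.product` are standard-library functions, and "-" is an assumed symbol for an unanswered question.

```python
import math
from itertools import product


def count_ways(choices_per_task):
    """Fundamental principle of multiplication: m1 * m2 * ... * mk."""
    return math.prod(choices_per_task)  # math.prod needs Python 3.8+


must_answer = count_ways([2] * 4)  # (a) True or False for each of 4 questions
may_skip = count_ways([3] * 4)     # (b) True, False or unanswered

# Brute-force check: list every possible answer sheet explicitly
sheets_a = list(product("TF", repeat=4))
sheets_b = list(product("TF-", repeat=4))
print(must_answer, len(sheets_a), may_skip, len(sheets_b))
```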
Example 2
Consider the word “CHAPTER”.
(a) In how many ways can the letters of this word be arranged?
(b) If the letter C must come first, how many arrangements are possible?
(c) If the letter C must come first and the letter R last, how many arrangements are possible?
Solution:
2.2 Permutation
A. Factorial Notation
The product of the first n positive integers is denoted by n! and is read as 'n factorial'.
That is,
$$n! = n(n-1)(n-2)\cdots 3 \cdot 2 \cdot 1 \quad \text{for } n \ge 1$$
For example, $4! = 4 \times 3 \times 2 \times 1 = 24$
and $8! = 8 \times 7 \times 6 \times 5 \times 4 \times 3 \times 2 \times 1 = 40320$.
Also, 'factorial 0', 0!, is defined to be 1.
Since $(n-1)! = (n-1)(n-2)\cdots 3 \cdot 2 \cdot 1$,
we see that when n > 1, $n! = n(n-1)!$.
Example 3
In how many ways can 8 people be seated in a row of 8 chairs?
Solution:
The number of ways is $8! = 40320$.
B. Permutation of n distinct objects
In enumerating a sample space, we often need to choose r objects from a set
of n objects and arrange them in order. We say we make a permutation (or an arrangement)
of n distinct objects taken r at a time ($r \le n$).
The order of the objects in a permutation is important; that is, abc and bca are different permutations.
The total number of possible permutations, denoted by $P_r^n$ or ${}^nP_r$, is given by
$$P_r^n = \underbrace{n(n-1)(n-2)\cdots(n-r+1)}_{r \text{ factors}}$$
Note that there are r factors in the expression for $P_r^n$.
Using the factorial notation, we can write $P_r^n$ as
$$P_r^n = \frac{n!}{(n-r)!}$$
In particular, when all n distinct objects are taken together and arranged in order,
the number of permutations is $P_n^n = n!$.
Example 4
In how many different ways can the 20 members of a union select a president, a
vice-president, a secretary and a treasurer?
Solution:
Assume that the officers are selected in the order president, vice-president, secretary and
treasurer. Since the order is important in this question, the number of possible ways is
$$P_4^{20} = \frac{20!}{(20-4)!} = \frac{20!}{16!} = 116280$$
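The two formulas for $P_r^n$ (the product of r factors and the factorial quotient) can be compared in Python. `falling_product` and `perm_factorial` are illustrative names; `math.perm` (Python 3.8+) and `itertools.permutations` are standard-library functions used as cross-checks, the latter by listing ordered choices on a small case.

```python
import math
from itertools import permutations


def falling_product(n, r):
    """P(n, r) as the product of r factors n(n-1)...(n-r+1)."""
    result = 1
    for factor in range(n, n - r, -1):
        result *= factor
    return result


def perm_factorial(n, r):
    """P(n, r) = n! / (n - r)!, the factorial form."""
    return math.factorial(n) // math.factorial(n - r)


officers = falling_product(20, 4)
print(officers, perm_factorial(20, 4), math.perm(20, 4))  # math.perm needs Python 3.8+

# Small brute-force check: ordered choices of 2 objects from 5
print(len(list(permutations("abcde", 2))), falling_product(5, 2))
```

The product form avoids computing two large factorials only to divide them, which is why it is handy by hand as well.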
Example 5
In how many different ways can a 6-letter word be formed such that the first and the last
letters are distinct vowels and the remaining four letters are distinct consonants?
Solution:
C. Permutation of objects not all distinct
If among n objects, $n_1$ are of one kind, $n_2$ of a second kind, …, and $n_k$ of a kth kind,
then when all the n objects are taken together, the number of distinct permutations is
$$\frac{n!}{n_1!\, n_2! \cdots n_k!}$$
Example 6
Find the number of permutations that can be made from the letters in each word.
(a) FACTORIAL
(b) COMBINATION
Solution:
(a) FACTORIAL has 9 letters, including two A's. Substituting n = 9 and $n_1 = 2$,
the number of arrangements is given by
$$\frac{9!}{2!} = 9 \times 8 \times 7 \times 6 \times 5 \times 4 \times 3 = 181440$$
(b) COMBINATION has 11 letters, including two O's, two I's and two N's. Substituting n = 11 and
$n_1 = n_2 = n_3 = 2$, the number of arrangements is given by
$$\frac{11!}{2!\,2!\,2!} = 4989600$$
Example 7
In how many ways can 10 customers be assigned to three counters with 2 to counter A,
3 to counter B and 5 to counter C?
Solution:
The number of ways is
$$\frac{10!}{2!\,3!\,5!} = 2520$$
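The formula $\frac{n!}{n_1!\,n_2!\cdots n_k!}$ can be coded by counting how often each symbol repeats. In this sketch (the helper name `distinct_arrangements` is illustrative), Example 7 is treated as arranging the labels AABBBCCCCC, one counter label per customer, and a short word is also checked by brute force.

```python
import math
from collections import Counter
from itertools import permutations

def distinct_arrangements(word: str) -> int:
    """n! / (n1! n2! ... nk!), where each n_i counts one repeated symbol."""
    denominator = 1
    for count in Counter(word).values():
        denominator *= math.factorial(count)
    return math.factorial(len(word)) // denominator

print(distinct_arrangements("FACTORIAL"))    # Example 6(a)
print(distinct_arrangements("COMBINATION"))  # Example 6(b)
print(distinct_arrangements("AABBBCCCCC"))   # Example 7: 2 to A, 3 to B, 5 to C

# Brute-force check on a short word: generate all orderings, keep unique ones.
assert distinct_arrangements("NOON") == len(set(permutations("NOON")))
```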
2.2 Combination
In many circumstances, when we select objects, we need not consider the order in which the
objects appear. We then say we make a selection or combination of the objects.
The number of combinations (or selections) of n distinct objects taken r at a time, where $r \le n$,
is denoted by $C^n_r$, ${}_nC_r$ or $\binom{n}{r}$, where
$$C^n_r = \frac{P^n_r}{r!} = \frac{n!}{(n-r)!\,r!} = \frac{n(n-1)(n-2)\cdots(n-r+1)}{r(r-1)\cdots 2 \times 1}$$
In particular, $C^n_0 = C^n_n = 1$.
It is important to note that we use $C^n_r$ instead of $P^n_r$ when the different orderings of
the r chosen objects are unimportant.
Example 8
In how many ways can 5 cards be drawn from an ordinary pack of 52 playing cards?
Solution:
We need to consider combinations, since the order in which the cards are drawn is not
important.
The number of ways of drawing 5 cards is
$$C^{52}_5 = \frac{52!}{(52-5)!\,5!} = \frac{52!}{47!\,5!} = 2598960$$
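The relation $C^n_r = P^n_r / r!$ says that each unordered selection of r objects appears $r!$ times among the ordered ones. The sketch below (with the illustrative name `comb_from_perm`) checks this against `math.comb` and against a direct count of subsets from `itertools.combinations`, then recomputes Example 8.

```python
import math
from itertools import combinations

def comb_from_perm(n: int, r: int) -> int:
    """C(n, r) = P(n, r) / r!: each unordered selection is counted r! times."""
    return math.perm(n, r) // math.factorial(r)

# Agreement with math.comb and with a direct count of r-element subsets
for n in range(8):
    for r in range(n + 1):
        direct = sum(1 for _ in combinations(range(n), r))
        assert comb_from_perm(n, r) == math.comb(n, r) == direct

print(comb_from_perm(52, 5))  # Example 8: number of 5-card hands
```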
Example 9
A team of 6 is chosen at random from 8 boys and 7 girls.
(a) In how many ways can the team be chosen if there are no restrictions?
(b) In how many ways can the team be chosen if there must be more boys than girls?
Solution:
(a) The number of ways of choosing the team is
$$C^{15}_6 = \frac{15!}{(15-6)!\,6!} = \frac{15!}{9!\,6!} = 5005$$
(b) If there are more boys than girls, then the team must consist of 6 boys, 5 boys and 1 girl, or 4 boys and 2 girls.
The number of ways of choosing 6 boys is $C^8_6 = 28$.
The number of ways of choosing 5 boys and 1 girl is $C^8_5 \times C^7_1 = 56 \times 7 = 392$.
The number of ways of choosing 4 boys and 2 girls is $C^8_4 \times C^7_2 = 70 \times 21 = 1470$.
Therefore, the number of ways of choosing the team if there are more boys than girls is
28 + 392 + 1470 = 1890.
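Because 15 people give only 5005 possible teams, Example 9 can also be checked by listing every team. This brute-force sketch labels the boys and girls with made-up names (B0 to B7, G0 to G6) purely for counting, and compares the enumeration with the formula in part (b).

```python
import math
from itertools import combinations

# Hypothetical labels: 8 boys "B0".."B7" and 7 girls "G0".."G6"
boys = [f"B{i}" for i in range(8)]
girls = [f"G{i}" for i in range(7)]
people = boys + girls

# (a) No restriction: every 6-person subset is a possible team.
all_teams = list(combinations(people, 6))

# (b) More boys than girls in a team of 6 means at least 4 boys.
more_boys = [t for t in all_teams if sum(p.startswith("B") for p in t) > 3]

# Formula from part (b): C(8,6) + C(8,5)C(7,1) + C(8,4)C(7,2)
formula = (math.comb(8, 6)
           + math.comb(8, 5) * math.comb(7, 1)
           + math.comb(8, 4) * math.comb(7, 2))

print(len(all_teams), len(more_boys), formula)
```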
Associate Degree 2022 – 2023 First Semester
CCMA4001 Quantitative Analysis I
Chapter 3
Probability
3.1 Set Notation
A. Set and Element
A set is a list or collection of objects. The objects in the set are called elements or members of the
set.
If x is an element of the set A, we write $x \in A$.
Here the symbol $\in$ means “belongs to” or “is an element of”.
Correspondingly, the symbol $\notin$ means “does not belong to” or “is not an element of”.
If every element of a set B also belongs to a set A, we say that B is contained in A
and call B a subset of A.
Symbolically, we write $B \subseteq A$ if $x \in B$ implies that $x \in A$.
For example, let A = {2, 4, 6, 8, 10}.
If B = {2, 8, 10}, then $B \subseteq A$.
If C = {1, 2, 4}, then C is not a subset of A because $1 \in C$ but $1 \notin A$. In symbols, we write $C \not\subseteq A$.
Two sets are equal if and only if each is contained in the other.
Thus A = B if and only if $A \subseteq B$ and $B \subseteq A$.
The universal set, U, is the set which contains all possible elements within the particular application
under consideration.
For example, if we consider A = {1, 2, 3} as a set listing some possible results of throwing a die,
the universal set is U = {1, 2, 3, 4, 5, 6}.
At the other extreme, every set has a subset with no elements in it.
We call this subset the null set or empty set and denote it by $\phi$ (read as phi).
B. Venn Diagram
The idea of sets and subsets and relationships between them can be conveniently illustrated by
means of Venn diagrams.
In such diagrams, the universal set is usually represented by the region bounded by a rectangle
and sets and subsets by regions bounded by circles or other closed curves.
The regions may be shaded as required. The elements of the sets need not be marked in the diagram.
Example 1
The following diagram shows the universal set U and three other sets, A, B and C.
Describe the relationships between these sets.
Solution:
All of A, B and C are subsets of U, i.e., $A \subseteq U$, $B \subseteq U$ and $C \subseteq U$.
The circle representing C lies within the circle representing A.
Hence, C is a subset of A, i.e., $C \subseteq A$.
Apart from these relationships, no set is a subset of any other,
i.e., $A \not\subseteq B$, $A \not\subseteq C$, $B \not\subseteq A$, $B \not\subseteq C$ and $C \not\subseteq B$.
The circles representing B and C do not intersect each other.
Hence, B and C have no elements in common.
The circles representing A and B have some common area.
Hence, A and B have at least one element in common.
C. Operations on Sets
It is often necessary to combine two or more sets to form new sets. This is done by set operations.
(1) Intersection
The intersection of two sets A and B, denoted by $A \cap B$, is the set of elements which belong
to both A and B.
In symbols, we write $A \cap B = \{x : x \in A \text{ and } x \in B\}$.
The intersection, like A and B themselves, is a subset of the universal set U.
It can be represented in a Venn diagram as the region common to the regions representing A and B.
For example,
if A = {1, 3, 5, 7, 9} and B = {1, 2, 3, 4, 5},
then $A \cap B$ = {1, 3, 5}.
If A and B have no elements in common, i.e., $A \cap B = \phi$, they are said to be disjoint.
(2) Union
The union of two sets A and B, denoted by $A \cup B$, is the set of elements which belong to
either A or B or both.
In symbols, we write $A \cup B = \{x : x \in A \text{ or } x \in B\}$.
For example,
if A = {2, 4, 6, 8, 10} and B = {5, 10, 15, 20},
then $A \cup B$ = {2, 4, 5, 6, 8, 10, 15, 20}.
Example 2
Let U = {x: x is a lower case letter}, A = {a, e, i, o, u}, B ={v, o, w, e, l}, and C = {v, w}.
(a) Find (i) $A \cup B$ (ii) $A \cap B$ (iii) $B \cup C$ (iv) $B \cap C$ (v) $A \cap C$
(b) Represent the sets A, B, C and U by a Venn diagram.
Solution:
(a) (i) $A \cup B$ = {a, e, i, o, u, v, w, l}
(ii) $A \cap B$ = {e, o}
(iii) $B \cup C$ = {v, o, w, e, l}
(iv) $B \cap C$ = {v, w}
(v) $A \cap C = \phi$
Note: If $C \subseteq B$, then $B \cup C = B$ and $B \cap C = C$.
(b) The Venn diagram for the sets A, B, C and U is as follows.
(3) Complement
Sometimes, we are interested not only in the objects belonging to a subset, but also in the objects
not belonging to that subset. This gives rise to the concept of a complement.
Let A be a subset of the universal set U. The complement of A with respect to U is the set of the
elements of U which do not belong to A.
We shall denote this complement by $A^C$, $A'$ or $\bar{A}$.
In symbols, we write $A' = \{x : x \in U \text{ and } x \notin A\}$.
For example,
if U = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9} and A = {1, 3, 5, 7, 9},
then $A'$ = {0, 2, 4, 6, 8}.
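Python's built-in `set` type provides these operations directly: `&` for intersection, `|` for union, `<=` for subset and the set difference `U - A` for the complement with respect to U. The sketch reproduces Example 2 and the complement example above.

```python
# Example 2: sets of lower-case letters
A = {"a", "e", "i", "o", "u"}
B = {"v", "o", "w", "e", "l"}
C = {"v", "w"}

union_AB = A | B          # elements in A or B or both
inter_AB = A & B          # elements in both A and B
print(sorted(union_AB), sorted(inter_AB))
print(A.isdisjoint(C))    # True, because A ∩ C is the empty set

# C is a subset of B, so B ∪ C = B and B ∩ C = C
assert C <= B and (B | C) == B and (B & C) == C

# Complement example: A' = U - A
U = set(range(10))
odd = {1, 3, 5, 7, 9}
print(sorted(U - odd))
```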
3.2 Sample Space and Events
A. Definitions
When we roll a die and observe the number shown on the top face, we perform an experiment.
A characteristic of this experiment is that, while we know the result (outcome) of
the experiment must be one of the six numbers in the set {1, 2, 3, 4, 5, 6},
we cannot predict with certainty which of them will actually show up.
The occurrence of the number depends on chance.
Such an experiment is called a random experiment.
When a random experiment is performed, the set of all possible outcomes is called
the sample space for the experiment, and is often denoted by S.
The sample space corresponds to the universal set U in set theory.
An event is a subset of the sample space; it occurs when the outcome of the experiment is one of its elements.
Example 3
A die is rolled and the number shown on the top face is observed.
Write down
(a) the sample space S,
(b) the event A that the number shown is odd,
(c) the event B that the number shown is divisible by 3,
(d) the event that the number shown is odd or divisible by 3,
(e) the event that the number shown is both odd and divisible by 3.
Solution:
(a) S = {1, 2, 3, 4, 5, 6}
(b) A = {1, 3, 5}
(c) B = {3, 6}
(d) This is the union of A and B: $A \cup B$ = {1, 3, 5, 6}.
(e) This is the intersection of A and B: $A \cap B$ = {3}.
3.3 Fundamental Concepts of Probability
A. Definition
Probability is a numerical measure of the chance of the occurrence of an event.
A sample space is said to be equiprobable if all the sample points are equally likely to occur.
Let S be an equiprobable sample space with n(S) sample points
and A be an event in S with n(A) sample points.
Then the probability of A, denoted by P(A), is
$$P(A) = \frac{n(A)}{n(S)}$$
Hence for an equiprobable sample space, it is necessary to be able to enumerate all “possible”
and all “favourable” outcomes in order to find the probability of a given event.
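For a finite equiprobable sample space, the definition $P(A) = n(A)/n(S)$ translates into a few lines of code. This sketch (the function name `classical_probability` is our own) uses `fractions.Fraction` so that answers stay exact, and applies it to the die events of Example 3.

```python
from fractions import Fraction

def classical_probability(event, sample_space):
    """P(A) = n(A) / n(S) for a finite, equiprobable sample space S."""
    event = set(event)
    sample_space = set(sample_space)
    if not event <= sample_space:
        raise ValueError("the event must be a subset of the sample space")
    return Fraction(len(event), len(sample_space))

S = {1, 2, 3, 4, 5, 6}                       # one roll of a die
odd = {x for x in S if x % 2 == 1}            # event A in Example 3
div3 = {x for x in S if x % 3 == 0}           # event B in Example 3
print(classical_probability(odd | div3, S))   # odd or divisible by 3
print(classical_probability(odd & div3, S))   # odd and divisible by 3
```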
Example 4
A bag contains 1 red ball, 2 yellow balls and 3 blue balls. A ball is randomly drawn from the bag.
(a) What is the probability that the ball drawn is red?
(b) What is the probability that the ball drawn is blue?
Solution:
The total number of balls in the bag is 1 + 2 + 3 = 6
Therefore, the number of possible outcomes is 6.
(a) The bag contains 1 red ball; therefore, the number of favourable outcomes is 1.
$$\therefore P(\text{ball drawn is red}) = \frac{1}{6}$$
(b) The bag contains 3 blue balls; therefore, the number of favourable outcomes is 3.
$$\therefore P(\text{ball drawn is blue}) = \frac{3}{6} = \frac{1}{2}$$
Example 5
Two fair coins are tossed.
(a) Find the probability that both coins show the same result.
(b) Find the probability that at least one coin shows a head.
Solution:
Since the coins are “fair”,
we can assume that the sample space S = {HH, HT, TH, TT} is equiprobable.
(a) Let A be the event that both coins show the same result.
A = {HH, TT} and n(A) = 2
$$\therefore P(A) = \frac{n(A)}{n(S)} = \frac{2}{4} = \frac{1}{2}$$
(b) Let B be the event that at least one coin shows a head.
B = {HH, HT, TH} and n(B) = 3
$$\therefore P(B) = \frac{n(B)}{n(S)} = \frac{3}{4}$$
B. Properties of Probability
From the definition of probability, we may derive the following properties.
For every event A in the sample space S,
1. $0 \le P(A) \le 1$
2. $P(S) = 1$
3. $P(A') = 1 - P(A)$
4. For two events A and B in the sample space S,
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$ [Additive Rule]
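Properties 1 to 4 can be checked exhaustively on a small equiprobable sample space. The sketch below enumerates all 64 events of a single die roll and tests every property with exact fractions; this is a numerical check on one example, not a proof.

```python
from fractions import Fraction
from itertools import chain, combinations

S = frozenset(range(1, 7))   # a fair die: six equally likely outcomes

def P(event):
    return Fraction(len(event), len(S))

def all_events(space):
    """Every subset of the sample space (2**6 = 64 events for a die)."""
    items = sorted(space)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1))]

events = all_events(S)
assert P(S) == 1                                     # property 2
for A in events:
    assert 0 <= P(A) <= 1                            # property 1
    assert P(S - A) == 1 - P(A)                      # property 3, with A' = S - A
    for B in events:
        assert P(A | B) == P(A) + P(B) - P(A & B)    # property 4 (additive rule)
print(len(events), "events checked")
```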
Example 6
A die is thrown. Find the probability of getting a number
(a) greater than 2,
(b) greater than 6,
(c) less than 7,
(d) less than 3 or greater than 5,
(e) that is less than 4 and odd.
Solution:
For a die, there are 6 possible outcomes: 1, 2, 3, 4, 5 and 6.
(a) There are 4 favourable outcomes: 3, 4, 5 and 6.
$$\therefore P(\text{a number greater than 2}) = \frac{4}{6} = \frac{2}{3}$$
(b) Since no number on the die is greater than 6, the number of favourable outcomes is zero.
$$\therefore P(\text{a number greater than 6}) = \frac{0}{6} = 0$$
(c) Since all the numbers on a die are less than 7, there are 6 favourable outcomes: 1, 2, 3, 4, 5
and 6.
$$\therefore P(\text{a number less than 7}) = \frac{6}{6} = 1$$
(d) There are 3 favourable outcomes: 1, 2 and 6.
$$\therefore P(\text{a number less than 3 or greater than 5}) = \frac{3}{6} = \frac{1}{2}$$
(e) There are 2 favourable outcomes: 1 and 3.
$$\therefore P(\text{a number less than 4 and odd}) = \frac{2}{6} = \frac{1}{3}$$
We may use a tree diagram or the tabulation method to help us list out the possible outcomes.
Example 7
Three fair coins are tossed.
(a) Find the probability of getting exactly 2 heads.
(b) Find the probability of getting at least 2 tails.
Solution:
Let H stand for a head and T stand for a tail.
By using a tree diagram, we have the following outcomes.
[Tree diagram: the first, second and third coins each branch into H and T, giving the eight outcomes listed below.]
The sample space S = {(H, H, H), (H, H, T), (H, T, H), (H, T, T), (T, H, H), (T, H, T), (T, T, H), (T, T, T)}
and n(S) = 8.
(a) Let A be the event of getting exactly 2 heads.
A = {(H, H, T), (H, T, H), (T, H, H)} and n(A) = 3
$$\therefore P(A) = \frac{3}{8}$$
(b) Let B be the event of getting at least 2 tails.
B = {(H, T, T), (T, H, T), (T, T, H), (T, T, T)} and n(B) = 4
$$\therefore P(B) = \frac{4}{8} = \frac{1}{2}$$
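Instead of drawing the tree diagram, the sample space of Example 7 can be generated with `itertools.product`, which produces every sequence of H and T of length 3. A minimal sketch:

```python
from fractions import Fraction
from itertools import product

# All 8 equally likely outcomes, e.g. ('H', 'T', 'H')
S = list(product("HT", repeat=3))

exactly_two_heads = [s for s in S if s.count("H") == 2]
at_least_two_tails = [s for s in S if s.count("T") >= 2]

p_a = Fraction(len(exactly_two_heads), len(S))
p_b = Fraction(len(at_least_two_tails), len(S))
print(len(S), p_a, p_b)
```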
Example 8
Two dice are thrown.
(a) Find the probability of getting the same number on the two dice.
(b) Find the probability of getting a total of 8.
Solution:
All the possible outcomes are listed in the following table, where each row gives the number on the first die and each column gives the number on the second die.

            Second die
First die   1        2        3        4        5        6
1           (1, 1)   (1, 2)   (1, 3)   (1, 4)   (1, 5)   (1, 6)
2           (2, 1)   (2, 2)   (2, 3)   (2, 4)   (2, 5)   (2, 6)
3           (3, 1)   (3, 2)   (3, 3)   (3, 4)   (3, 5)   (3, 6)
4           (4, 1)   (4, 2)   (4, 3)   (4, 4)   (4, 5)   (4, 6)
5           (5, 1)   (5, 2)   (5, 3)   (5, 4)   (5, 5)   (5, 6)
6           (6, 1)   (6, 2)   (6, 3)   (6, 4)   (6, 5)   (6, 6)

(a) Let A be the event of getting the same number on the two dice.
A = {(1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6)} and n(A) = 6
$$\therefore P(A) = \frac{6}{36} = \frac{1}{6}$$
(b) Let B be the event of getting a total of 8.
B = {(2, 6), (3, 5), (4, 4), (5, 3), (6, 2)} and n(B) = 5
$$\therefore P(B) = \frac{5}{36}$$
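The 36-cell table of Example 8 is the Cartesian product of {1, ..., 6} with itself. This sketch builds it with `itertools.product` and counts the favourable outcomes for both parts.

```python
from fractions import Fraction
from itertools import product

# (first die, second die): 36 equally likely outcomes
S = list(product(range(1, 7), repeat=2))

same = [(a, b) for a, b in S if a == b]
total_8 = [(a, b) for a, b in S if a + b == 8]

print(Fraction(len(same), len(S)))     # part (a)
print(Fraction(len(total_8), len(S)))  # part (b)
```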
We can also use the counting methods in the calculation of probabilities.
Example 9
A team of 5 is chosen at random from 6 boys and 4 girls. What is the probability that
(a) they are all boys?
(b) three of them are boys?
(c) only one girl is chosen?
Solution:
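(a) $P(\text{all 5 are boys}) = \frac{C^6_5}{C^{10}_5} = \frac{6}{252} = \frac{1}{42}$
(b) $P(\text{3 are boys and 2 are girls}) = \frac{C^6_3 \times C^4_2}{C^{10}_5} = \frac{20 \times 6}{252} = \frac{120}{252} = \frac{10}{21}$
(c) $P(\text{4 are boys and 1 is a girl}) = \frac{C^6_4 \times C^4_1}{C^{10}_5} = \frac{15 \times 4}{252} = \frac{60}{252} = \frac{5}{21}$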
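Each part of Example 9 has the form (product of one $C$ per group) divided by $C^{10}_5$. A hypothetical helper `prob_of_mix` capturing this pattern, assuming every selection of the given size is equally likely, could look like this:

```python
from fractions import Fraction
from math import comb

def prob_of_mix(group_sizes, wanted):
    """Probability that a random selection contains wanted[i] items from group i.

    group_sizes and wanted are parallel lists; the selection size is sum(wanted)
    and every selection of that size is assumed to be equally likely.
    """
    favourable = 1
    for size, k in zip(group_sizes, wanted):
        favourable *= comb(size, k)
    total = comb(sum(group_sizes), sum(wanted))
    return Fraction(favourable, total)

groups = [6, 4]                        # 6 boys, 4 girls
print(prob_of_mix(groups, [5, 0]))     # (a) all 5 are boys
print(prob_of_mix(groups, [3, 2]))     # (b) 3 boys and 2 girls
print(prob_of_mix(groups, [4, 1]))     # (c) 4 boys and 1 girl
```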
Example 10
A hand of 5 cards is drawn from a deck of 52 playing cards. What is the probability of getting
(a) 5 spades?
(b) 5 cards of the same suit?
(c) 3 aces and 2 kings?
(d) a full house, i.e., 3 cards of one rank (number/letter) and 2 cards of another rank?
Solution:
(a) The total number of possible hands of 5 cards is $C^{52}_5 = 2598960$.
The number of ways of selecting 5 spades from the 13 spades is $C^{13}_5 = 1287$.
$$\therefore P(\text{5 spades}) = \frac{C^{13}_5}{C^{52}_5} = \frac{1287}{2598960} = \frac{33}{66640}$$
(b) We can select one suit out of 4 suits, namely, spade (♠), heart (♥), diamond (♦) and club (♣),
and then select 5 cards from this suit.
$$\therefore P(\text{same suit}) = \frac{C^4_1 \times C^{13}_5}{C^{52}_5} = \frac{4 \times 1287}{2598960} = \frac{5148}{2598960} = \frac{33}{16660}$$
(c) The number of ways of selecting 3 aces from the 4 aces is $C^4_3 = 4$.
The number of ways of selecting 2 kings from the 4 kings is $C^4_2 = 6$.
$\therefore$ The number of ways of forming 3 aces and 2 kings is $C^4_3 \times C^4_2 = 4 \times 6 = 24$.
$$\therefore P(\text{3 aces and 2 kings}) = \frac{C^4_3 \times C^4_2}{C^{52}_5} = \frac{4 \times 6}{2598960} = \frac{24}{2598960} = \frac{1}{108290}$$
(d) There are 13 ways to choose the rank of the three cards and then 12 ways to choose the rank of the pair.
For each of these $13 \times 12 = 156$ ways,
the probability of forming that particular full house is the same as P(3 aces and 2 kings).
$$\therefore P(\text{full house}) = 13 \times 12 \times \frac{C^4_3 \times C^4_2}{C^{52}_5} = 156 \times \frac{24}{2598960} = \frac{3744}{2598960} = \frac{6}{4165}$$
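The card-hand probabilities of Example 10 can be recomputed with `math.comb`, keeping the answers as exact fractions so that they can be compared with the reduced forms above.

```python
from fractions import Fraction
from math import comb

hands = comb(52, 5)                                   # equally likely 5-card hands

p_five_spades = Fraction(comb(13, 5), hands)
p_same_suit = Fraction(comb(4, 1) * comb(13, 5), hands)
p_aces_kings = Fraction(comb(4, 3) * comb(4, 2), hands)
# Full house: ordered choice of (rank for the three, rank for the pair) = 13 * 12
p_full_house = Fraction(13 * 12 * comb(4, 3) * comb(4, 2), hands)

print(p_five_spades, p_same_suit, p_aces_kings, p_full_house)
```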
Example 11
A card is drawn randomly from an ordinary pack of 52 playing cards.
Find the probability that the card is
(a) a spade or a heart,
(b) an ace or a heart.
Solution:
Since the card is drawn randomly from the pack, each card has the same chance of being selected.
Hence we may take the sample space to be equiprobable with 52 sample points.
(a) Since a card cannot be both a spade and a heart,
the event of getting a spade and the event of getting a heart cannot occur together,
so they are mutually exclusive.
$P(\text{spade}) = \frac{13}{52} = \frac{1}{4}$ and $P(\text{heart}) = \frac{13}{52} = \frac{1}{4}$
Therefore, by the special addition rule for mutually exclusive events,
$$P(\text{spade} \cup \text{heart}) = P(\text{spade}) + P(\text{heart}) = \frac{13}{52} + \frac{13}{52} = \frac{26}{52} = \frac{1}{2}$$
(b) Since a card can be both an ace and a heart,
the event of getting an ace and the event of getting a heart can occur together,
so they are not mutually exclusive.
$P(\text{ace}) = \frac{4}{52} = \frac{1}{13}$, $P(\text{heart}) = \frac{13}{52} = \frac{1}{4}$
and $P(\text{ace} \cap \text{heart}) = P(\text{ace of hearts}) = \frac{1}{52}$
Therefore, by the general addition rule,
$$P(\text{ace} \cup \text{heart}) = P(\text{ace}) + P(\text{heart}) - P(\text{ace} \cap \text{heart}) = \frac{4}{52} + \frac{13}{52} - \frac{1}{52} = \frac{16}{52} = \frac{4}{13}$$
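Example 11 can be checked by building the 52-card deck explicitly as (rank, suit) pairs; the labels used for ranks and suits are our own. Set intersection and union then cover both the mutually exclusive case and the general addition rule.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["spade", "heart", "diamond", "club"]
deck = set(product(ranks, suits))     # 52 equally likely cards

def P(event):
    return Fraction(len(event), len(deck))

spades = {c for c in deck if c[1] == "spade"}
hearts = {c for c in deck if c[1] == "heart"}
aces = {c for c in deck if c[0] == "A"}

# (a) Mutually exclusive: the intersection is empty, so the probabilities add.
assert not (spades & hearts)
print(P(spades | hearts))
# (b) General addition rule, checked against a direct count of the union.
assert P(aces | hearts) == P(aces) + P(hearts) - P(aces & hearts)
print(P(aces | hearts))
```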
Example 12
In a box of 12 monitors, 8 are good and 4 are defective.
If three monitors are picked at random for inspection, what is the probability that
(a) they are all good?
(b) at least one is defective?
(c) 2 are good and 1 is defective?
(d) at least two are good?
Solution:
(a) $P(\text{3 good ones}) = \frac{C^8_3}{C^{12}_3} = \frac{56}{220} = \frac{14}{55}$
(b) Using the law for complementary events,
$P(\text{at least 1 defective}) = 1 - P(\text{3 good ones}) = 1 - \frac{14}{55} = \frac{41}{55}$
(c) $P(\text{2 good and 1 defective}) = \frac{C^8_2 \times C^4_1}{C^{12}_3} = \frac{28 \times 4}{220} = \frac{112}{220} = \frac{28}{55}$
(d) Using the special addition law for mutually exclusive events,
$P(\text{at least 2 good ones}) = P(\text{3 good ones}) + P(\text{2 good and 1 defective}) = \frac{14}{55} + \frac{28}{55} = \frac{42}{55}$
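With only $C^{12}_3 = 220$ possible draws, Example 12 can be verified by listing every draw of 3 monitors out of 12. In this sketch the monitors are positions 0 to 11, of which we arbitrarily label the first 8 as good.

```python
from fractions import Fraction
from itertools import combinations

monitors = ["good"] * 8 + ["defective"] * 4
# Draw 3 of the 12 positions; all C(12, 3) = 220 draws are equally likely.
draws = list(combinations(range(12), 3))

def prob(condition):
    """Fraction of draws whose list of monitor states satisfies the condition."""
    hits = sum(1 for d in draws if condition([monitors[i] for i in d]))
    return Fraction(hits, len(draws))

all_good = prob(lambda m: m.count("good") == 3)
at_least_one_bad = prob(lambda m: "defective" in m)
two_good_one_bad = prob(lambda m: m.count("good") == 2)
at_least_two_good = prob(lambda m: m.count("good") >= 2)

print(all_good, at_least_one_bad, two_good_one_bad, at_least_two_good)
```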
C. Conditional Probability
Sometimes we may wish to alter our estimate of the probability of an event
when we have additional knowledge that might affect its outcome.
This revised probability is called the conditional probability of the event.
The 50 students of a class can be classified according to sex and stream as follows.

               Arts (A)   Science (S)
Boys (B)       7          17
Girls (G)      16         10

If one student is selected at random from the class,
the probability that the student is a boy Arts student is $P(B \cap A) = \frac{7}{50}$.
Now, suppose that the student is known to have been drawn from the 24 boy students.
Then the probability that he is an Arts student should be $\frac{7}{24}$.
This probability is obviously different from $P(B \cap A)$ above, as it is based on a different
sample space (the reduced sample space of 24 boy students, not the original sample space of
50 students of both sexes).
It is the probability that the student selected is an Arts student given the condition that
the student is a boy.
This type of probability is called conditional probability.
More generally, consider the sample points of the equiprobable sample space S classified according
to the table below.

         B              B'              Total
A        n(A ∩ B)       n(A ∩ B')       n(A)
A'       n(A' ∩ B)      n(A' ∩ B')      n(A')
Total    n(B)           n(B')           n(S)

The probability that event B will occur given that event A has already occurred is called
the conditional probability of B given A (or the probability of B conditional on A),
and is denoted by $P(B \mid A)$.
Thus, with the reasonable assumptions that $n(A) > 0$ and $n(S) > 0$,
$$P(B \mid A) = \frac{n(A \cap B)}{n(A)} = \frac{n(A \cap B)/n(S)}{n(A)/n(S)} = \frac{P(A \cap B)}{P(A)}$$
The last expression is usually taken as the definition of a conditional probability.
The conditional probability of B given that A has occurred is defined as
$$P(B \mid A) = \frac{P(A \cap B)}{P(A)} \quad \text{if } P(A) > 0$$
Similarly, the conditional probability of A given that B has occurred is defined as
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} \quad \text{if } P(B) > 0$$
Note that the relationship between A and B in the conditional probability is not symmetrical;
that is, in general, $P(B \mid A) \ne P(A \mid B)$.
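The equality $\frac{n(A \cap B)}{n(A)} = \frac{P(A \cap B)}{P(A)}$ can be seen concretely with the class table earlier in this section (7 boy Arts students among 24 boys, out of 50 students). This sketch computes the conditional probability of Arts given Boy both from the reduced sample space and from the definition; the dictionary layout is our own.

```python
from fractions import Fraction

# Counts from the class of 50 students: (sex, stream) -> number of students
counts = {
    ("boy", "arts"): 7, ("boy", "science"): 17,
    ("girl", "arts"): 16, ("girl", "science"): 10,
}
n_S = sum(counts.values())

def n(condition):
    """Number of students whose (sex, stream) key satisfies the condition."""
    return sum(v for key, v in counts.items() if condition(key))

n_boy = n(lambda k: k[0] == "boy")
n_boy_and_arts = n(lambda k: k == ("boy", "arts"))

# Reduced sample space: among the boys, the fraction who study Arts
p_reduced = Fraction(n_boy_and_arts, n_boy)
# Definition: P(Arts | Boy) = P(Boy ∩ Arts) / P(Boy)
p_definition = Fraction(n_boy_and_arts, n_S) / Fraction(n_boy, n_S)
print(p_reduced, p_definition)
```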
Example 13
100 students go to a camp. After grouping, the numbers of boys and girls in each group are as follows.

Group    I     II    III   IV
Boys     14    18    15    13
Girls    11    7     10    12

If a student is chosen at random from these 100 students, find the probability that
(a) the student is in group II,
(b) the student is a boy,
(c) the student is a boy in group II,
(d) the student is a boy given that the student is in group II,
(e) the student is in group II given that the student is a boy.
Solution:
Let A be the event that the chosen student is in group II,
and B be the event that the chosen student is a boy.
(a) $P(\text{the student is in group II}) = P(A) = \frac{18 + 7}{100} = \frac{25}{100} = \frac{1}{4}$
(b) $P(\text{the student is a boy}) = P(B) = \frac{14 + 18 + 15 + 13}{100} = \frac{60}{100} = \frac{3}{5}$
(c) $P(\text{the student is a boy in group II}) = P(A \cap B) = \frac{18}{100} = \frac{9}{50}$
(d) $P(\text{the student is a boy given that the student is in group II}) = P(B \mid A) = \frac{P(A \cap B)}{P(A)} = \frac{9/50}{1/4} = \frac{9}{50} \times 4 = \frac{18}{25}$
OR Among the 25 students in group II, there are 18 boys.
$\therefore P(\text{the student is a boy given that the student is in group II}) = \frac{18}{25}$
(e) $P(\text{the student is in group II given that the student is a boy}) = P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{9/50}{3/5} = \frac{9}{50} \times \frac{5}{3} = \frac{3}{10}$
OR Among the 60 boys, there are 18 boys in group II.
$\therefore P(\text{the student is in group II given that the student is a boy}) = \frac{18}{60} = \frac{3}{10}$
Example 14
A card is drawn randomly from an ordinary pack of 52 playing cards.
(a) What is the probability that it is the king of spades?
(b) We are told it is a king. What is the probability that it is the king of spades?
(c) We are told it is a spade. What is the probability that it is the king of spades?
Solution:
(a) $P(\text{king of spades}) = \frac{1}{52}$
(b) The most direct approach is to say that the king is equally likely to be any of the four kings,
and hence the probability that it is the king of spades is $\frac{1}{4}$.
(c) The spade is equally likely to be any of the thirteen spades,
and hence the probability that it is the king of spades is $\frac{1}{13}$.
It is instructive to consider (b) and (c) as conditional probabilities.
Consider the sample space that has the 52 distinct cards as equally likely outcomes.
Let A be the event of getting a king,
B be the event of getting a spade,
and C be the event of getting the king of spades.
Then $P(A) = \frac{4}{52} = \frac{1}{13}$, $P(B) = \frac{13}{52} = \frac{1}{4}$ and $P(C) = P(A \cap C) = P(B \cap C) = \frac{1}{52}$.
(b) $P(C \mid A) = \frac{P(A \cap C)}{P(A)} = \frac{1/52}{1/13} = \frac{13}{52} = \frac{1}{4}$
(c) $P(C \mid B) = \frac{P(B \cap C)}{P(B)} = \frac{1/52}{1/4} = \frac{4}{52} = \frac{1}{13}$
Example 15
The probability that a regularly scheduled flight arrives on time is 0.92, the probability that
it departs on time is 0.83, and the probability that it arrives and departs on time is 0.78.
Find the probability that the flight
(a) departs on time given that it arrived on time,
(b) arrives on time given that it departed on time.
Solution:
Let A be the event that the flight arrives on time, and B be the event that the flight departs on time.
Then P(A) = 0.92, P(B) = 0.83 and $P(A \cap B) = 0.78$.
(a) $P(B \mid A) = \frac{P(A \cap B)}{P(A)} = \frac{0.78}{0.92} = \frac{78}{92} = \frac{39}{46}$
(b) $P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{0.78}{0.83} = \frac{78}{83}$
This example illustrates that, in general, $P(B \mid A) \ne P(A \mid B)$.
Example 16
A fair die is tossed twice. If the first toss is an odd number, what is the probability that the sum of
the two tosses is 7?
Solution:
Let A be the event of getting an odd number in the first toss, and B be the event that the sum is 7.
There are 6 possible outcomes in the first toss: 1, 2, 3, 4, 5 and 6.
A = {1, 3, 5} and n(A) = 3
$$\therefore P(A) = \frac{3}{6} = \frac{1}{2}$$
There are $6 \times 6 = 36$ possible outcomes in total for tossing a die twice.
For events A and B to occur simultaneously, there are three favourable outcomes: (1, 6), (3, 4) and (5, 2).
$$\therefore P(A \cap B) = \frac{3}{36} = \frac{1}{12}$$
Therefore,
$$P(B \mid A) = \frac{P(A \cap B)}{P(A)} = \frac{1/12}{1/2} = \frac{1}{6}$$
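Example 16 can be checked by enumerating the 36 outcomes of two tosses, keeping those with an odd first toss, and applying the definition of conditional probability with exact fractions.

```python
from fractions import Fraction
from itertools import product

S = list(product(range(1, 7), repeat=2))        # (first toss, second toss)

A = [s for s in S if s[0] % 2 == 1]             # first toss is odd
A_and_B = [s for s in A if sum(s) == 7]          # ... and the sum is 7

p_A = Fraction(len(A), len(S))
p_A_and_B = Fraction(len(A_and_B), len(S))
print(A_and_B, p_A_and_B / p_A)                  # P(B | A) = P(A ∩ B) / P(A)
```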
D. Multiplication Law of Probability for Independent Events
For two events A and B in the sample space S,
event B is said to be independent of event A if the probability that event B occurs is not affected
by whether event A has or has not occurred.
In this case, the conditional probability of B given A equals the probability of B,
that is, $P(B \mid A) = P(B)$.
On the other hand,
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{P(A) \times P(B \mid A)}{P(B)} = \frac{P(A) \times P(B)}{P(B)} = P(A)$$
and so A is also independent of B.
So we have “B is independent of A” implies that “A is also independent of B”.
Two events A and B are independent if either $P(A \mid B) = P(A)$ or $P(B \mid A) = P(B)$.
If A and B are independent events, since $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$,
we have $P(A \cap B) = P(A \mid B) \times P(B) = P(A) \times P(B)$.
Therefore, independent events are usually formally defined as follows.
Two events A and B are independent if and only if $P(A \cap B) = P(A) \times P(B)$.
This is the special multiplication law of probability for independent events
and is sometimes known as ‘the AND rule’.
It tells us that the probability that two independent events will both occur is simply
the product of their probabilities.
If A, B and C are independent events, then
$$P(A \cap B \cap C) = P(A) \times P(B) \times P(C)$$
Example 17
When a fair coin is tossed three times, there are eight equally likely outcomes.
Consider the following events.
A is the event that the first toss is head.
B is the event that the last toss is tail.
C is the event that the total number of heads is exactly one.
(a) Compute P(A), P(B) and P(C).
(b) Compute P(B | A) . Are the events A and B independent?
(c) Compute P(C | A) . Are the events A and C independent?
Solution:
(a) A = {(H, H, H), (H, H, T), (H, T, H), (H, T, T)} and n(A) = 4
$$\therefore P(A) = \frac{4}{8} = \frac{1}{2}$$
B = {(H, H, T), (H, T, T), (T, H, T), (T, T, T)} and n(B) = 4
$$\therefore P(B) = \frac{4}{8} = \frac{1}{2}$$
C = {(H, T, T), (T, H, T), (T, T, H)} and n(C) = 3
$$\therefore P(C) = \frac{3}{8}$$
(b) Since $A \cap B$ = {(H, H, T), (H, T, T)} and $n(A \cap B) = 2$,
$$P(A \cap B) = \frac{2}{8} = \frac{1}{4}$$
It follows that
$$P(B \mid A) = \frac{P(A \cap B)}{P(A)} = \frac{1/4}{1/2} = \frac{1}{4} \times \frac{2}{1} = \frac{1}{2}$$
Since $P(B \mid A) = P(B)$, events A and B are independent.
OR
$$P(A) \times P(B) = \frac{1}{2} \times \frac{1}{2} = \frac{1}{4}$$
So we have $P(A \cap B) = P(A) \times P(B)$, and therefore, events A and B are independent.
(c) Since $A \cap C$ = {(H, T, T)} and $n(A \cap C) = 1$,
$$P(A \cap C) = \frac{1}{8}$$
It follows that
$$P(C \mid A) = \frac{P(A \cap C)}{P(A)} = \frac{1/8}{1/2} = \frac{1}{8} \times \frac{2}{1} = \frac{1}{4}$$
Since $P(C \mid A) \ne P(C)$, events A and C are dependent.
OR
$$P(A) \times P(C) = \frac{1}{2} \times \frac{3}{8} = \frac{3}{16}$$
So we have $P(A \cap C) \ne P(A) \times P(C)$, and therefore, events A and C are dependent.
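The definition $P(A \cap B) = P(A) \times P(B)$ gives a mechanical test for independence. This sketch encodes the events of Example 17 as predicates on the 8 outcomes (the function names are illustrative) and applies the test with exact fractions.

```python
from fractions import Fraction
from itertools import product

S = list(product("HT", repeat=3))      # 8 equally likely outcomes

def P(event):
    """Classical probability of an event given as a predicate on outcomes."""
    return Fraction(sum(1 for s in S if event(s)), len(S))

def first_head(s):      # event A: the first toss is a head
    return s[0] == "H"

def last_tail(s):       # event B: the last toss is a tail
    return s[2] == "T"

def one_head(s):        # event C: exactly one head
    return s.count("H") == 1

def independent(e1, e2):
    """Test the definition P(E1 ∩ E2) = P(E1) × P(E2)."""
    return P(lambda s: e1(s) and e2(s)) == P(e1) * P(e2)

print(independent(first_head, last_tail))   # A and B
print(independent(first_head, one_head))    # A and C
```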
Example 18
Two events A and B are such that $P(A) = \frac{1}{4}$, $P(A \mid B) = \frac{1}{2}$ and $P(B \mid A) = \frac{2}{3}$.
(a) Are A and B independent events?
(b) Are A and B mutually exclusive events?
(c) Find $P(A \cap B)$.
(d) Find P(B).
Solution:
(a) If A and B are independent events, then $P(A \mid B) = P(A)$.
Now $P(A \mid B) = \frac{1}{2}$ and $P(A) = \frac{1}{4}$.
Therefore $P(A \mid B) \ne P(A)$, and A and B are not independent events.
(b) If A and B are mutually exclusive events, then $P(A \mid B) = 0$.
But it is given that $P(A \mid B) = \frac{1}{2}$.
Therefore A and B are not mutually exclusive events.
(c) $P(A \cap B) = P(A) \times P(B \mid A) = \frac{1}{4} \times \frac{2}{3} = \frac{1}{6}$
(d) $P(A \cap B) = P(B) \times P(A \mid B)$
$$\frac{1}{6} = P(B) \times \frac{1}{2}$$
$$P(B) = \frac{1/6}{1/2} = \frac{1}{3}$$
E. Multiplication Law of Probability for Dependent Events
Two events are said to be dependent if the occurrence of one event affects the probability of the other.
If A and B are two dependent events where event A occurs before event B,
then the conditional probability of B given that event A has occurred is
$$P(B \mid A) = \frac{P(A \cap B)}{P(A)} \quad \text{if } P(A) > 0$$
So we have $P(A \cap B) = P(A) \times P(B \mid A)$.
This is known as the general multiplication rule, and it enables us to calculate the probability
that two events will both occur.
The probability that both of two events will occur is equal to the probability that one of them
will occur multiplied by the probability that the other one will occur given that the first has occurred.
The multiplication rule looks rather complex, but should be intuitively clear and may be readily
written down with the aid of a tree diagram.
For two events A and B in the sample space, the probability that both of them will occur is
$$P(A \cap B) = P(A) \times P(B \mid A)$$
Example 19
There are 2 red balls and 1 green ball in a bag.
Find the probability of getting a red ball in the first draw and a green ball in the second draw
if two balls are drawn at random one by one
(a) with replacement,
(b) without replacement.
Solution:
Let A be the event that a red ball is drawn in the first draw,
and B be the event that a green ball is drawn in the second draw.
(a) If two balls are drawn at random one by one with replacement,
then events A and B are independent events.
Let R stand for a red ball and G stand for a green ball.
The tree diagram is constructed as shown.
[Tree diagram: first draw R (2/3) or G (1/3); second draw R (2/3) or G (1/3); outcomes RR, RG, GR, GG.]
The probability of getting ‘RG’ can be calculated by multiplying the probabilities along the branches.
$$P(RG) = P(A \cap B) = P(A) \times P(B) = \frac{2}{3} \times \frac{1}{3} = \frac{2}{9}$$
(b) If two balls are drawn at random one by one without replacement,
then events A and B are dependent events.
The tree diagram is constructed as shown.
[Tree diagram: first draw R (2/3) or G (1/3); after R, second draw R (1/2) or G (1/2); after G, second draw R (1); outcomes RR, RG, GR.]
The probability of getting ‘RG’ can be calculated by multiplying the probabilities along the branches.
$$P(RG) = P(A \cap B) = P(A) \times P(B \mid A) = \frac{2}{3} \times \frac{1}{2} = \frac{1}{3}$$
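The difference between the two parts of Example 19 is whether the second draw comes from all 3 balls or only from the 2 that remain. Labelling the two red balls R1 and R2 (our own labels) makes every ordered outcome equally likely, so both answers can be counted directly.

```python
from fractions import Fraction
from itertools import permutations, product

balls = ["R1", "R2", "G"]     # 2 red balls and 1 green ball, labelled

# With replacement: the second draw is again any of the 3 balls.
with_repl = list(product(balls, repeat=2))
# Without replacement: the second draw is one of the 2 remaining balls.
without_repl = list(permutations(balls, 2))

def p_red_then_green(outcomes):
    hits = [o for o in outcomes if o[0].startswith("R") and o[1] == "G"]
    return Fraction(len(hits), len(outcomes))

print(p_red_then_green(with_repl))      # part (a)
print(p_red_then_green(without_repl))   # part (b)
```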
Example 20
A carton contains 12 eggs of which 3 are rotten. If 2 eggs are selected randomly from the carton
without replacement, what is the probability that
(a) both are rotten?
(b) exactly one is rotten?
(c) at least one is rotten?
Solution:
Let $R_i$ be the event that the $i$th egg drawn is rotten, $i = 1, 2$,
and $G_i = R_i'$, the event that the $i$th egg drawn is good.
The tree diagram for the results of the 2 eggs selected is shown as follows.
(a) $P(\text{both eggs drawn are rotten}) = P(R_1 \cap R_2) = P(R_1) \times P(R_2 \mid R_1)$
Assuming that all eggs have the same probability of being selected,
$$P(R_1) = \frac{3}{12} = \frac{1}{4}$$
If the first egg drawn is rotten, there will be 2 rotten eggs among the 11 remaining.
$$\therefore P(R_2 \mid R_1) = \frac{2}{11}$$
Hence,
$$P(R_1 \cap R_2) = \frac{3}{12} \times \frac{2}{11} = \frac{1}{22}$$
OR Using combinations, the required probability is
$$\frac{C^3_2}{C^{12}_2} = \frac{3}{66} = \frac{1}{22}$$
(b) From the tree diagram, the event that exactly one of the eggs drawn is rotten is the union of
the two mutually exclusive events $R_1 \cap G_2$ and $G_1 \cap R_2$.
$$
\begin{aligned}
P(\text{exactly one of the eggs drawn is rotten}) &= P(R_1 \cap G_2) + P(G_1 \cap R_2) \\
&= P(R_1) \times P(G_2 \mid R_1) + P(G_1) \times P(R_2 \mid G_1) \\
&= \frac{3}{12} \times \frac{9}{11} + \frac{9}{12} \times \frac{3}{11} \\
&= \frac{9}{44} + \frac{9}{44} = \frac{18}{44} = \frac{9}{22}
\end{aligned}
$$
OR Using combinations, the required probability is
$$\frac{C^3_1 \times C^9_1}{C^{12}_2} = \frac{3 \times 9}{66} = \frac{27}{66} = \frac{9}{22}$$
(c) P(at least one of the eggs drawn is rotten)
= P(exactly one of the eggs drawn is rotten) + P(both eggs drawn are rotten)
$$= \frac{9}{22} + \frac{1}{22} = \frac{10}{22} = \frac{5}{11}$$
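The tree-diagram route and the combination route of Example 20 can be computed side by side with exact fractions; the complement rule gives a third check on part (c).

```python
from fractions import Fraction
from math import comb

rotten, good = 3, 9
total = rotten + good

# Tree-diagram route: multiply along branches, add mutually exclusive paths.
p_R1 = Fraction(rotten, total)
p_both = p_R1 * Fraction(rotten - 1, total - 1)                        # R1 ∩ R2
p_exactly_one = (p_R1 * Fraction(good, total - 1)                       # R1 ∩ G2
                 + Fraction(good, total) * Fraction(rotten, total - 1))  # G1 ∩ R2
p_at_least_one = p_exactly_one + p_both

# Combination route: unordered pairs of eggs.
pairs = comb(total, 2)
assert p_both == Fraction(comb(rotten, 2), pairs)
assert p_exactly_one == Fraction(comb(rotten, 1) * comb(good, 1), pairs)
# Complement rule: 1 - P(both eggs are good)
assert p_at_least_one == 1 - Fraction(comb(good, 2), pairs)

print(p_both, p_exactly_one, p_at_least_one)
```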
G. Bayes’ Theorem
Bayes’ theorem is an important extension of the result
$$P(B \mid A) = \frac{P(A \cap B)}{P(A)} = \frac{P(B) \times P(A \mid B)}{P(A)}$$
Suppose the sample space S is partitioned into n mutually exclusive and exhaustive events $E_1, E_2, \ldots, E_n$,
i.e., $S = E_1 \cup E_2 \cup \cdots \cup E_n$ with $E_i \cap E_j = \phi$ for $i, j = 1, 2, \ldots, n$ and $i \ne j$.
Let A be an event in S with $P(A) > 0$.
Then the probability of $E_r$ conditional on A is
$$P(E_r \mid A) = \frac{P(E_r) \times P(A \mid E_r)}{P(E_1) \times P(A \mid E_1) + P(E_2) \times P(A \mid E_2) + \cdots + P(E_n) \times P(A \mid E_n)} = \frac{P(E_r) \times P(A \mid E_r)}{\sum_{i=1}^{n} P(E_i) \times P(A \mid E_i)}$$
for r = 1, 2, …, n.
This is known as Bayes’ theorem.
The formula looks very complicated, but in fact it is easy to use if you remember that the denominator
is the total probability of A.
Note that the conditional probability on the left-hand side is conditional on A,
while the conditional probabilities on the right-hand side are conditional on the $E_i$’s.
This reversal of conditioning is the most important feature of Bayes’ theorem.
Normally, we start with a specified event ($E_r$, say) and find, by the multiplication rule,
the probability that this event will lead to an observed result (A).
But Bayes’ theorem allows us to do the reverse.
Given the observed result (A), we calculate the probability that it has arisen from a specified event ($E_r$).
In Bayes’ theorem, the “initial” probabilities $P(E_r)$ are assigned or estimated based on personal judgment,
experience or past records. Their values are given before the information about the occurrence or
non-occurrence of the event A is available. They are thus called prior probabilities.
Bayes’ theorem modifies the prior probabilities to incorporate information provided by
the occurrence of event A. The “revised” probabilities $P(E_r \mid A)$, derived after information
is provided by an observed result, are called posterior probabilities.
Example 21
Visitors to a certain country are required to undergo a blood test for a certain kind of disease.
If a visitor has the disease, the test has a probability 0.90 of showing a positive result.
But even if the visitor does not have the disease, the test still has a probability 0.05 of showing
a positive result. From past records and other sources of information, it is known that 8% of the
visitors have the disease.
(a) What is the probability that a visitor selected at random will give a positive result for the blood test?
(b) If a visitor’s blood test gives a positive result, what is the probability that he has the disease?
Solution:
Let D be the event that a visitor has the disease,
and Y (i.e., yes) be the event that the blood test shows a positive result.
(a) The given information is displayed in the following tree diagram.
From the diagram, we see that
$$
\begin{aligned}
P(Y) &= P(D) \times P(Y \mid D) + P(D') \times P(Y \mid D') \\
&= 0.08 \times 0.90 + (1 - 0.08) \times 0.05 \\
&= 0.08 \times 0.90 + 0.92 \times 0.05 \\
&= 0.072 + 0.046 \\
&= 0.118
\end{aligned}
$$
(b) If the visitor has the disease, we know that his blood test will very likely show a positive result.
But conversely, we realize that even if he does not have the disease, his blood test result can
still be positive, though with a low probability.
If a positive result cannot assure that the visitor has the disease,
then the probability that it indicate that the visitors has the disease can be found by
the Bayes’ theorem as follows.
P( D | Y ) 
P ( D)  P ( Y | D)
P ( D)  P ( Y | D)  P ( D)  P ( Y | D)
0.08  0.90
0.08  0.90  (1  0.08)  0.05
0.072

0.08  0.90  0.92  0.05
0.072

0.072  0.046
0.072

0.118
72

118
36

59

Bayes’ theorem is useful for revising the prior probability to give the posterior probability that
an event occurs based on the information provided by an observed result.
The positive result of the blood test greatly increases the probability that the visitor has the disease (from 0.08 to $\frac{36}{59} \approx 0.61$).
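The two calculations of Example 21 can be checked with exact fractions. This is a minimal sketch, not part of the original notes; the variable names are illustrative only. It applies the law of total probability for (a) and Bayes’ theorem for (b) to the figures given in the example.

```python
from fractions import Fraction

# Figures from Example 21, kept as exact fractions
p_d = Fraction(8, 100)               # prior P(D): visitor has the disease
p_y_given_d = Fraction(90, 100)      # P(Y | D): positive test given disease
p_y_given_not_d = Fraction(5, 100)   # P(Y | not D): positive test given no disease

# (a) Law of total probability over the two branches of the tree diagram
p_y = p_d * p_y_given_d + (1 - p_d) * p_y_given_not_d

# (b) Bayes' theorem: posterior probability of disease given a positive test
p_d_given_y = p_d * p_y_given_d / p_y

print(float(p_y))                      # 0.118
print(p_d_given_y)                     # 36/59
print(round(float(p_d_given_y), 2))    # 0.61
```

Using `Fraction` rather than floats keeps the answer to (b) in the same exact form, 36/59, as the hand calculation.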
Associate Degree 2022 – 2023 First Semester
CCMA4001 Quantitative Analysis I
Chapter 4
Discrete Probability Distributions
4.1 Probability Distributions and Probability Functions
A probability distribution is a function that describes the likelihood of obtaining the possible values
that a random variable can assume. In other words, the values of the variable vary based on the
underlying probability distribution.
Suppose you draw a random sample and measure the heights of the subjects. As you measure
heights, you can create a distribution of heights. This type of distribution is useful when you need to
know which outcomes are most likely, the spread of potential values, and the likelihood of different
results.
A discrete random variable can take only a countable number of distinct values. For example, the outcome of a coin toss and the count of some event are discrete, because there are no in-between values: a coin toss gives only heads or tails. Similarly, if you’re counting the number of books that a library checks out per hour, you can count 21 or 22 books, but nothing in between.
For a discrete probability distribution, each possible value has a non-zero probability, and the probabilities of all possible values must sum to one. Because the total probability is 1, one of the values must occur on each trial.
Example 1
Consider the experiment of throwing three fair coins.
Let X be the random variable of the number of heads shown when 3 fair coins are tossed.
(a) Describe the random variable X as a function of the elements of the sample space.
(b) What is the probability associated with each value of the random variable X?
Solution:
(a) The sample space of throwing 3 fair coins is
S = {HHH, HHT, HTH, THH, HTT, THT, TTH, TTT}
which consists of 8 equiprobable elements.
The random variable X is a function defined on the sample space S: to each point i in S, X assigns a real number X(i).
If only the number of heads shown is concerned, then a numerical value of 0, 1, 2 or 3
will be assigned to each sample point.
The numbers 0, 1, 2 and 3 are random quantities determined by the outcome of an experiment.
The values of the random variable X are given by the following table.
i      HHH  HHT  HTH  THH  HTT  THT  TTH  TTT
X(i)    3    2    2    2    1    1    1    0
Thus, the random variable X may be described as the function shown in the figure below.
(b) When the random variable X takes on the value 3, it is associated with the event {HHH}, which has a probability of $\frac{1}{8}$ of occurring.
Thus, we have the following probability statements.
$P(X = 3) = P(\{HHH\}) = \frac{1}{8}$
$P(X = 2) = P(\{HHT, HTH, THH\}) = \frac{3}{8}$
$P(X = 1) = P(\{HTT, THT, TTH\}) = \frac{3}{8}$
$P(X = 0) = P(\{TTT\}) = \frac{1}{8}$
The probability statements may be written as
$$P(X = x) = \begin{cases} \frac{1}{8} & \text{when } x = 3 \\ \frac{3}{8} & \text{when } x = 2 \\ \frac{3}{8} & \text{when } x = 1 \\ \frac{1}{8} & \text{when } x = 0 \end{cases}$$
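The construction in Example 1 (list the sample space, apply X to each outcome, then collect probabilities) can be sketched in a few lines. This is an illustrative sketch only, assuming the 8 outcomes are equally likely as stated; the names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space of three fair coins: 8 equally likely outcomes such as 'HHT'
sample_space = ["".join(outcome) for outcome in product("HT", repeat=3)]

# X(i) = number of heads in outcome i; count how many outcomes give each value
counts = Counter(outcome.count("H") for outcome in sample_space)

# Each outcome has probability 1/8, so P(X = x) = (outcomes with x heads) / 8
distribution = {x: Fraction(counts[x], len(sample_space)) for x in sorted(counts)}
print(distribution)
```

The dictionary maps each value x to P(X = x), which is exactly the table form used in Example 2.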
Representation
The probability distribution of a random variable may be regarded as an arrangement in
which the total probability 1 of the sample space is assigned (distributed) to the various
values of the random variable according to some rule(s).
A discrete probability distribution is the probability distribution of a discrete random variable.
It may be represented by a table or a function.
Example 2
Represent the probability distribution of the random variable X, the number of heads shown in the throw of three fair coins in Example 1, by a table.
Solution:
The table below is a representation of the probability distribution of X.
x          0    1    2    3
P(X = x)  1/8  3/8  3/8  1/8
Example 3
The probability function of a random variable X is given by
$$f(x) = \begin{cases} kx & \text{for } x = 1, 2, 3, 4 \\ 0 & \text{otherwise} \end{cases}$$
(a) Find the value of k.
(b) Tabulate the probability distribution of X.
(c) Find the value of $P(X \le 3)$.
Solution:
(a) $$\sum_{x=1}^{4} f(x) = 1$$
$$f(1) + f(2) + f(3) + f(4) = 1$$
$$k + 2k + 3k + 4k = 1$$
$$10k = 1$$
$$k = \frac{1}{10}$$
(b) The probability distribution of X is as follows.
x      1     2    3     4
f(x)  1/10  1/5  3/10  2/5
(c) $$P(X \le 3) = P(X = 1) + P(X = 2) + P(X = 3) = f(1) + f(2) + f(3)$$
$$= \frac{1}{10} + \frac{2}{10} + \frac{3}{10} = \frac{6}{10} = \frac{3}{5}$$
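As a check on Example 3, the condition that the probabilities sum to 1 fixes k, after which the table and $P(X \le 3)$ follow directly. A minimal sketch with exact fractions; the names are illustrative.

```python
from fractions import Fraction

# f(x) = kx for x = 1, 2, 3, 4; the probabilities must sum to 1,
# so k * (1 + 2 + 3 + 4) = 1.
support = [1, 2, 3, 4]
k = Fraction(1, sum(support))

f = {x: k * x for x in support}                         # part (b): the table
p_at_most_3 = sum(f[x] for x in support if x <= 3)      # part (c)

print(k)            # 1/10
print(p_at_most_3)  # 3/5
```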
4.2 Expectation
The expectation (or mean or expected value) of a discrete random variable X (or of a discrete probability distribution) is defined as
$$\mu = E(X) = \sum_x f(x)\,x = \sum_{i=1}^{n} f(x_i)\,x_i$$
where $\sum_x$ denotes the summation over all the possible values of x.
In general, the expectation of a function g(X) of a discrete random variable X is defined as
$$E[g(X)] = \sum_x f(x)\,g(x) = \sum_{i=1}^{n} f(x_i)\,g(x_i)$$
and is the weighted mean of all the values which the function g(X) can take.
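The definition of $E[g(X)]$ translates directly into a weighted sum over a table of probabilities. The sketch below assumes the distribution is stored as a dictionary {x: f(x)}; the helper name `expectation` is hypothetical, and the distribution f(x) = x/10 from Example 3 is used only as an illustration.

```python
from fractions import Fraction

def expectation(dist, g=lambda x: x):
    """Return E[g(X)] = sum of f(x) * g(x) over the support of dist = {x: f(x)}."""
    return sum(p * g(x) for x, p in dist.items())

# Illustration: f(x) = x/10 for x = 1, 2, 3, 4 (the distribution of Example 3)
dist = {x: Fraction(x, 10) for x in range(1, 5)}

mean = expectation(dist)                           # E(X), i.e. g(x) = x
second_moment = expectation(dist, lambda x: x**2)  # E(X^2), i.e. g(x) = x^2
```

Passing a different function `g` gives any other expectation, which mirrors how the general definition contains E(X) as the special case g(x) = x.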
Example 4
The random variable X has the following probability distribution.
x      0     1    2    3    4
f(x)  1/16  1/4  3/8  1/4  1/16
Find $E(X)$, $E(X^2)$ and hence $E(X^2 - 3X + 8)$.
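Solution:
$$E(X) = \sum_{x=0}^{4} f(x)\,x = 0 \times \frac{1}{16} + 1 \times \frac{1}{4} + 2 \times \frac{3}{8} + 3 \times \frac{1}{4} + 4 \times \frac{1}{16} = 2$$
$$E(X^2) = \sum_{x=0}^{4} f(x)\,x^2 = 0^2 \times \frac{1}{16} + 1^2 \times \frac{1}{4} + 2^2 \times \frac{3}{8} + 3^2 \times \frac{1}{4} + 4^2 \times \frac{1}{16} = 5$$
$$E(X^2 - 3X + 8) = E(X^2) - 3E(X) + 8 = 5 - 3 \times 2 + 8 = 7$$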
Example 5
From past records, a dentist found that the number of patients X treated in an hour can be described by the following probability distribution.
x      1    2     3    4     5
f(x)  0.1  0.15  0.4  0.25  0.1
(a) Find $E(X)$.
(b) If he charges each patient $500 per treatment, what is his expected earning per hour?
Solution:
(a) $$E(X) = \sum_{x=1}^{5} f(x)\,x = 0.1 \times 1 + 0.15 \times 2 + 0.4 \times 3 + 0.25 \times 4 + 0.1 \times 5 = 3.1$$
(b) If he charges each patient $500 per treatment, his expected earning per hour is
3.1 × $500 = $1550
The following results on expectations are frequently used in probability theory.
1. For a random variable X and any constants a and b,
E(aX + b) = aE(X) + b
A special case is E(b) = b.
2. For functions g(X) and h(X) of the random variable X,
E[g(X) ± h(X)] = E[g(X)] ± E[h(X)]
Proof:
1. $$E(aX + b) = \sum_x f(x)(ax + b)$$
$$= \sum_x f(x)(ax) + \sum_x f(x)(b)$$
$$= a\sum_x f(x)\,x + b\sum_x f(x) \quad \text{(a and b are constants independent of x)}$$
$$= aE(X) + b(1)$$
$$= aE(X) + b$$
2. $$E[g(X) \pm h(X)] = \sum_x f(x)[g(x) \pm h(x)]$$
$$= \sum_x f(x)\,g(x) \pm \sum_x f(x)\,h(x)$$
$$= E[g(X)] \pm E[h(X)]$$
4.3 Variance and Standard Deviation
The variance of a discrete random variable X (or of a discrete probability distribution), commonly denoted by Var(X), is defined as
$$\sigma^2 = \mathrm{Var}(X) = E[(X - \mu)^2] = \sum_x f(x)(x - \mu)^2 = \sum_{i=1}^{n} f(x_i)(x_i - \mu)^2$$
The standard deviation σ is the positive square root of the variance, and is a measure of dispersion of the distribution in the same unit as x.
$$\sigma = \sqrt{\mathrm{Var}(X)}$$
Sometimes, the defining formulae $\sigma^2 = E[(X - \mu)^2] = \sum_x f(x)(x - \mu)^2$ are not very convenient to use.
The following formulae are useful alternatives.
$$\sigma^2 = \mathrm{Var}(X) = E(X^2) - [E(X)]^2 = \sum_x f(x)\,x^2 - \mu^2 = \sum_{i=1}^{n} f(x_i)\,x_i^2 - \mu^2$$
i. Variance of a constant
For any constant a, Var(a) = 0
Proof: Since the mean of a is a itself,
$$\mathrm{Var}(a) = E[(a - a)^2] = 0$$
This result must be true as variance is a measure of dispersion or variability,
and a constant does not vary.
ii. Variance of a constant multiple of a random variable
For a random variable X and a constant a,
$$\mathrm{Var}(aX) = a^2\,\mathrm{Var}(X)$$
Proof: Since $E(aX) = aE(X) = a\mu$, that is, the mean of aX is aμ,
$$\mathrm{Var}(aX) = E[(aX - a\mu)^2]$$
$$= E[a^2(X - \mu)^2]$$
$$= a^2E[(X - \mu)^2] \quad \text{(} a^2 \text{ is a constant)}$$
$$= a^2\,\mathrm{Var}(X)$$
iii. Variance of a linear function of a random variable
For a random variable X and constants a and b,
$$\mathrm{Var}(aX + b) = a^2\,\mathrm{Var}(X)$$
Example 6
The following table shows the probability distribution of a random variable X.
x     1    2    3    4    5
f(x) 0.1  0.3  0.2  0.3  0.1
(a) Find $E(X)$.
(b) Find $E(X^2)$.
(c) Find $\mathrm{Var}(X)$.
(d) Find $\mathrm{Var}(X^2)$.
(e) Find $E(4X + 7)$ and $\mathrm{Var}(4X + 7)$.
(f) Find $E(2 - 5X)$ and $\mathrm{Var}(2 - 5X)$.
Solution:
(a) $$E(X) = \sum_{x=1}^{5} f(x)\,x = 0.1 \times 1 + 0.3 \times 2 + 0.2 \times 3 + 0.3 \times 4 + 0.1 \times 5 = 3$$
(b) $$E(X^2) = \sum_{x=1}^{5} f(x)\,x^2 = 0.1 \times 1^2 + 0.3 \times 2^2 + 0.2 \times 3^2 + 0.3 \times 4^2 + 0.1 \times 5^2 = 10.4$$
(c) $$\mathrm{Var}(X) = E[(X - E(X))^2] = E[(X - 3)^2] = \sum_{x=1}^{5} f(x)(x - 3)^2$$
$$= 0.1(1 - 3)^2 + 0.3(2 - 3)^2 + 0.2(3 - 3)^2 + 0.3(4 - 3)^2 + 0.1(5 - 3)^2$$
$$= 0.1 \times 4 + 0.3 \times 1 + 0.2 \times 0 + 0.3 \times 1 + 0.1 \times 4$$
$$= 1.4$$
OR
$$\mathrm{Var}(X) = E(X^2) - [E(X)]^2 = 10.4 - 3^2 = 10.4 - 9 = 1.4$$
(d) $$\mathrm{Var}(X^2) = E[(X^2 - E(X^2))^2] = E[(X^2 - 10.4)^2] = \sum_{x=1}^{5} f(x)(x^2 - 10.4)^2$$
$$= 0.1(1^2 - 10.4)^2 + 0.3(2^2 - 10.4)^2 + 0.2(3^2 - 10.4)^2 + 0.3(4^2 - 10.4)^2 + 0.1(5^2 - 10.4)^2$$
$$= 0.1 \times 88.36 + 0.3 \times 40.96 + 0.2 \times 1.96 + 0.3 \times 31.36 + 0.1 \times 213.16$$
$$= 52.24$$
OR
$$\mathrm{Var}(X^2) = E[(X^2)^2] - [E(X^2)]^2 = E(X^4) - [E(X^2)]^2 = \sum_{x=1}^{5} f(x)\,x^4 - (10.4)^2$$
$$= 0.1 \times 1^4 + 0.3 \times 2^4 + 0.2 \times 3^4 + 0.3 \times 4^4 + 0.1 \times 5^4 - (10.4)^2$$
$$= 160.4 - 108.16 = 52.24$$
(e) $$E(4X + 7) = 4E(X) + 7 = 4 \times 3 + 7 = 19$$
$$\mathrm{Var}(4X + 7) = 4^2\,\mathrm{Var}(X) = 16 \times 1.4 = 22.4$$
(f) $$E(2 - 5X) = 2 - 5E(X) = 2 - 5 \times 3 = -13$$
$$\mathrm{Var}(2 - 5X) = (-5)^2\,\mathrm{Var}(X) = 25 \times 1.4 = 35$$
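All six parts of the example above can be verified with one small helper. This is a sketch under the assumption that the distribution is stored as {x: f(x)} with exact fractions; the helper `E` is illustrative, not from the notes. It computes Var(X) both from the definition and from the shortcut $E(X^2) - [E(X)]^2$, and applies the linear rules for (e) and (f).

```python
from fractions import Fraction as F

dist = {1: F(1, 10), 2: F(3, 10), 3: F(2, 10), 4: F(3, 10), 5: F(1, 10)}

def E(g):
    """E[g(X)] for the distribution above."""
    return sum(p * g(x) for x, p in dist.items())

mu = E(lambda x: x)                     # (a) E(X)
e_x2 = E(lambda x: x**2)                # (b) E(X^2)
var_def = E(lambda x: (x - mu)**2)      # (c) Var(X) from the definition
var_short = e_x2 - mu**2                # (c) Var(X) = E(X^2) - [E(X)]^2
var_x2 = E(lambda x: x**4) - e_x2**2    # (d) Var(X^2) = E(X^4) - [E(X^2)]^2

# (e) and (f): E(aX + b) = aE(X) + b and Var(aX + b) = a^2 Var(X)
results = [(a * mu + b, a**2 * var_def) for a, b in [(4, 7), (-5, 2)]]

for mean_ab, var_ab in results:
    print(float(mean_ab), float(var_ab))
```

Exact fractions avoid rounding noise, so 1.4 appears as 7/5 and 52.24 as 1306/25.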
Example 7
Let X be the discrete random variable of the number of heads shown when three fair coins are
tossed. The probability distribution of X is as follows.
x      0    1    2    3
f(x)  1/8  3/8  3/8  1/8
(a) Find $E(X)$, $E(X^2)$ and $E[X(X - 1)]$.
(b) Find $\mathrm{Var}(X)$.
(c) Find $\mathrm{Var}(X^2)$.
Example 8
There are 4 gold coins and 6 silver coins in a bag. Three coins are randomly drawn in succession
from the bag without replacement. Let the random variable X be the total number of gold coins
that have been drawn.
(a) Tabulate the probability distribution of X.
(b) Find E(X) and Var(X).
Associate Degree 2021 – 2022 First Semester
CCMA4001 Quantitative Analysis I
Chapter 5
Special Discrete Probability Distributions
5.1 The Bernoulli Distribution
A Bernoulli trial is a random experiment which can result in one of two possible outcomes.
For convenience, we may call the two outcomes ‘success’ and ‘failure’, but they can equally
be ‘yes’ and ‘no’, ‘good’ and ‘defective’, ‘male’ and ‘female’, ‘black’ and ‘white’, etc.
Suppose that the probability of a ‘success’ is p. Then the probability of a ‘failure’ is 1 – p.
Since the outcomes are not numerical, we define a random variable X on the sample space
{success, failure} so that
$$X = \begin{cases} 1 & \text{for a success} \\ 0 & \text{for a failure} \end{cases}$$
This random variable is called a Bernoulli variable and has the probability distribution called a Bernoulli distribution as shown below.
x           0      1
P(X = x)  1 – p    p
Mathematically, the Bernoulli distribution is given by
$$P(X = x) = p^x(1 - p)^{1 - x} \quad \text{for } x = 0, 1$$
The mean (expected value) of the distribution is
$$\mu = p$$
and the variance is
$$\sigma^2 = p(1 - p)$$
Note that a Bernoulli distribution is completely specified by the value of p, which is often called
the parameter of the distribution.
web resources: https://www.jbstatistics.com/introduction-to-the-bernoulli-distribution/
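The stated mean p and variance p(1 – p) follow from the general definitions of E(X) and $E(X^2) - [E(X)]^2$ applied to the two-point distribution. A minimal check, assuming the illustrative value p = 0.7 (as in the election example that follows); the function name is hypothetical.

```python
from fractions import Fraction

def bernoulli_pmf(x, p):
    """P(X = x) = p**x * (1 - p)**(1 - x) for x in {0, 1}."""
    return p**x * (1 - p)**(1 - x)

p = Fraction(7, 10)   # illustrative value of the parameter p
pmf = {x: bernoulli_pmf(x, p) for x in (0, 1)}

mean = sum(x * q for x, q in pmf.items())                   # E(X)
variance = sum(x**2 * q for x, q in pmf.items()) - mean**2  # E(X^2) - [E(X)]^2
```

Because $x^2 = x$ when x is 0 or 1, $E(X^2) = p$, which is why the variance simplifies to $p - p^2 = p(1 - p)$.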
Example 1
A college, which has 1250 students, conducts a chairman election for the student union.
Each student drops a vote in a ballot box. It is found that Peter gets 875 votes.
Suppose a vote is picked at random from the ballot box.
Let the random variable X be the count of the vote for Peter.
(a) Describe the probability distribution of X.
(b) Find the mean and variance of X.
(c) Interpret the mean and variance of X.
Solution:
(a) When the vote is for Peter, the count is 1; otherwise the count is 0.
$$P(X = 1) = P(\text{the vote is for Peter}) = \frac{875}{1250} = 0.7$$
$$P(X = 0) = P(\text{the vote is not for Peter}) = 1 - 0.7 = 0.3$$
x          0    1
P(X = x)  0.3  0.7
It is a Bernoulli distribution with p = 0.7.
(b) The mean of X is $\mu = p = 0.7$.
The variance of X is $\sigma^2 = p(1 - p) = 0.7 \times 0.3 = 0.21$.
(c) The mean is the proportion of votes for Peter (success) in the population.
The variance is a measure of variability of this proportion.
5.2 The Binomial Distribution
A binomial distribution describes the number of SUCCESS outcomes when an experiment or survey with only two possible outcomes, SUCCESS or FAILURE, is repeated a fixed number of times (the prefix “bi” means two, or twice). For example, a coin toss has only two possible outcomes, heads or tails, and taking a test could have two possible outcomes, pass or fail.
Let n be the total (fixed) number of independent Bernoulli trials, each trial resulting in either a success (S) or a failure (F) with constant probabilities p and 1 – p respectively.
Also let the random variable X be the number of successes in the n trials.
This random variable is called a binomial variable and its distribution is called a binomial distribution.
Then, for a typical outcome with x successes (and hence with n – x failures), the probability is, by the special multiplication rule for independent events,
$$\underbrace{p \cdot p \cdots p}_{x}\;\underbrace{(1 - p)(1 - p)\cdots(1 - p)}_{n - x} = p^x(1 - p)^{n - x}$$
Trial Number      1   2      3   4   …   n – 1   n
Outcome of Trial  S   F      S   S   …   F       S
Success Number    1          2   3   …           x
Probability       p   1 – p  p   p   …   1 – p   p
But the x successes can occur in $C^n_x$ ways among the n trials.
Hence the total probability for x successes and n – x failures is
$$P(X = x) = C^n_x\,p^x(1 - p)^{n - x} \quad \text{for } x = 0, 1, 2, \ldots, n$$
This is the probability function of the binomial distribution.
The probability function has two parameters, n and p.
Hence, the distribution is completely specified when the values of these parameters are specified.
For this reason, we shall use the notation Bin(n, p) to denote the binomial distribution with
parameters n and p, where n is the number of trials and p is the probability of success in each trial.
Thus, Bin(5, 0.3) denotes the binomial distribution
$$P(X = x) = C^5_x(0.3)^x(0.7)^{5 - x} \quad \text{for } x = 0, 1, 2, 3, 4, 5$$
We also write X ~ Bin(n, p)
to mean that ‘X has the Bin(n, p) distribution’, or ‘X is distributed as Bin(n, p)’.
The mean of the binomial distribution Bin(n, p) is
$$\mu = np$$
and the variance of the binomial distribution Bin(n, p) is
$$\sigma^2 = np(1 - p)$$
Binomial distributions must also meet the following three criteria:
• The number of observations or trials is fixed. In other words, the binomial distribution applies only when the experiment is repeated a set number of times. This matters because the probability of an outcome depends on the number of trials: if you toss a coin once, your probability of getting a tail is 50%; if you toss a coin 20 times, your probability of getting at least one tail is very, very close to 100%.
• Each observation or trial is independent. In other words, none of your trials has an effect on the probability of the next trial.
• The probability of success (tails, heads, fail or pass) is exactly the same from one trial to another.
web resources: https://www.jbstatistics.com/introduction-to-the-binomial-distribution/
Example 2
From past records, patients suffering from a certain disease will recover in one week’s time with
a probability of 0.7 if they are given a treatment, and with a probability of 0.3 if they are not given
a treatment. Find the probability that
(a) out of 8 patients who receive treatment, less than 4 will recover in one week’s time;
(b) out of 8 patients who do not receive treatment, more than 6 will recover in one week’s time.
Solution:
(a) Let X be the number of patients who receive treatment and will recover in one week’s time.
Then X ~ Bin(8, 0.7)
i.e., $P(X = x) = C^8_x(0.7)^x(0.3)^{8 - x}$ for x = 0, 1, 2, 3, 4, 5, 6, 7, 8
Hence P(X < 4) = P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3)
$$= C^8_0(0.7)^0(0.3)^{8-0} + C^8_1(0.7)^1(0.3)^{8-1} + C^8_2(0.7)^2(0.3)^{8-2} + C^8_3(0.7)^3(0.3)^{8-3}$$
$$= (0.3)^8 + 8 \times 0.7 \times (0.3)^7 + 28 \times (0.7)^2 \times (0.3)^6 + 56 \times (0.7)^3 \times (0.3)^5$$
$$= 0.00006561 + 0.00122472 + 0.01000188 + 0.04667544$$
$$= 0.05796765$$
(b) Let Y be the number of patients who do not receive treatment and will recover in one week’s time.
Then Y ~ Bin(8, 0.3)
i.e., $P(Y = y) = b(y; 8, 0.3) = C^8_y(0.3)^y(0.7)^{8 - y}$
for y = 0, 1, 2, 3, 4, 5, 6, 7, 8
Hence P(Y > 6) = P(Y = 7) + P(Y = 8)
$$= C^8_7(0.3)^7(0.7)^{8-7} + C^8_8(0.3)^8(0.7)^{8-8}$$
$$= 8 \times (0.3)^7 \times 0.7 + (0.3)^8$$
$$= 0.00122472 + 0.00006561$$
$$= 0.00129033$$
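The two tail sums in Example 2 can be reproduced from the binomial probability function. A minimal sketch using Python’s `math.comb` for $C^n_x$; the helper `binomial_pmf` is an illustrative name, not part of the notes. The last two lines are sanity checks that the probabilities sum to 1 and that the mean equals np = 5.6.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) = C(n, x) * p**x * (1 - p)**(n - x) for X ~ Bin(n, p)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# (a) X ~ Bin(8, 0.7): P(X < 4) = P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3)
p_a = sum(binomial_pmf(x, 8, 0.7) for x in range(4))

# (b) Y ~ Bin(8, 0.3): P(Y > 6) = P(Y = 7) + P(Y = 8)
p_b = sum(binomial_pmf(y, 8, 0.3) for y in range(7, 9))

# Sanity checks: total probability is 1 and the mean equals np
total = sum(binomial_pmf(x, 8, 0.7) for x in range(9))
mean = sum(x * binomial_pmf(x, 8, 0.7) for x in range(9))
```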
5.3 The Geometric Distribution
In the binomial distribution, the number of trials n is fixed, and the random variable of interest
is the number of ‘successes’ in these n trials.
Sometimes we do not fix the number of trials beforehand but keep on repeating the trial until
a ‘success’ occurs. Then the number of successes is fixed (= 1) while the number of trials is
a random variable. The associated probability distribution is called the geometric distribution,
which we define formally as follows.
Let independent Bernoulli trials, each with a constant probability p of a success, be performed
until a success occurs.
The number of trials X then has the geometric distribution with parameter p, given by
$$P(X = x) = (1 - p)^{x - 1}p \quad \text{for } x = 1, 2, 3, \ldots$$
If x independent Bernoulli trials are required to give a success, then the first x – 1 trials must
all result in failures (F) while the x th trial must result in a success (S).
Trial Number      1      2      3      …   x – 1   x
Outcome of Trial  F      F      F      …   F       S
Probability       1 – p  1 – p  1 – p  …   1 – p   p
Since the probability of each failure is 1 – p and that of the success is p, and since the trials are independent, the probability of the above event is $(1 - p)^{x - 1}p$ as required.
Note that the values which X may take can be any integer from 1 to infinity since at least one trial is required to obtain a success, but theoretically, infinitely many trials may have to be performed to get the success.
Also, the probabilities $(1 - p)^{x - 1}p$ (for x = 1, 2, 3, …) are the terms of a geometric series with common ratio 1 – p. This justifies the name ‘geometric distribution’.
The mean of the geometric distribution with parameter p is
$$\mu = \frac{1}{p}$$
and the variance of the geometric distribution with parameter p is
$$\sigma^2 = \frac{1 - p}{p^2}$$
web resources: https://www.jbstatistics.com/introduction-to-the-geometric-distribution/
Example 3
A fair die is thrown until a ‘1’ occurs.
(a) Find the probability distribution for the number of throws required.
(b) Find the probability that 6 throws are required.
(c) Find the probability that more than 6 throws are required.
(d) Find the mean and the standard deviation of the probability distribution in (a).
Solution:
(a) Each throw of the die is a Bernoulli trial with a probability of success (the occurrence of ‘1’) equal to $\frac{1}{6}$.
Hence the number of independent trials required, X, has a geometric distribution with $p = \frac{1}{6}$:
$$P(X = x) = \left(\frac{5}{6}\right)^{x - 1}\frac{1}{6} \quad \text{for } x = 1, 2, 3, \ldots$$
(b) $$P(X = 6) = \left(\frac{5}{6}\right)^{6 - 1}\frac{1}{6} = \left(\frac{5}{6}\right)^5\frac{1}{6} = \frac{5^5}{6^6} = \frac{3125}{46656} \approx 0.06698$$
(c) $$P(X > 6) = P(X = 7) + P(X = 8) + P(X = 9) + \cdots$$
$$= \left(\frac{5}{6}\right)^6\frac{1}{6} + \left(\frac{5}{6}\right)^7\frac{1}{6} + \left(\frac{5}{6}\right)^8\frac{1}{6} + \cdots$$
which is the sum to infinity of a geometric series, $S(\infty) = \frac{a}{1 - r}$, with $a = \left(\frac{5}{6}\right)^6\frac{1}{6}$ and $r = \frac{5}{6}$.
Hence
$$P(X > 6) = \frac{\left(\frac{5}{6}\right)^6\frac{1}{6}}{1 - \frac{5}{6}} = \left(\frac{5}{6}\right)^6 = \frac{5^6}{6^6} = \frac{15625}{46656} \approx 0.3349$$
OR The event that more than n throws are required can also be interpreted as ‘1’ not occurring in the first n throws.
This then gives $P(X > n) = \left(\frac{5}{6}\right)^n$ directly.
Hence
$$P(X > 6) = \left(\frac{5}{6}\right)^6 = \frac{5^6}{6^6} = \frac{15625}{46656} \approx 0.3349$$
(d) The mean $\mu = \frac{1}{p} = \frac{1}{1/6} = 6$.
This is a reasonable result: if the probability for a ‘1’ to occur is $\frac{1}{6}$, then theoretically the die must be thrown 6 times on average to obtain one ‘1’.
The variance
$$\sigma^2 = \frac{1 - p}{p^2} = \frac{1 - \frac{1}{6}}{\left(\frac{1}{6}\right)^2} = \frac{5}{6} \times 36 = 30$$
Hence the standard deviation $\sigma = \sqrt{30} \approx 5.4772$.
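Parts (b) to (d) of Example 3 can be checked exactly with fractions. This sketch is illustrative only (the helper name is hypothetical); it computes (c) both as $(5/6)^6$ and through the complement $1 - [P(X = 1) + \cdots + P(X = 6)]$, which confirms that the two methods in the solution agree.

```python
from fractions import Fraction
from math import sqrt

def geometric_pmf(x, p):
    """P(X = x) = (1 - p)**(x - 1) * p for x = 1, 2, 3, ..."""
    return (1 - p)**(x - 1) * p

p = Fraction(1, 6)                  # probability of a '1' on one throw

p_six = geometric_pmf(6, p)         # (b) P(X = 6)
p_more_than_six = (1 - p)**6        # (c) no '1' in the first six throws
# Cross-check (c) through the complement of P(X <= 6)
p_more_than_six_check = 1 - sum(geometric_pmf(x, p) for x in range(1, 7))

mean = 1 / p                        # (d) mu = 1/p
variance = (1 - p) / p**2           #     sigma^2 = (1 - p)/p^2
sd = sqrt(variance)                 #     converted to float by math.sqrt
```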
5.4 The Poisson Distribution
The Poisson distribution is the discrete probability distribution of the number of events
occurring in a given time period, given the average number of times the event occurs over that
time period.
For example, a certain fast-food restaurant gets an average of 3 visitors to the drive-through per minute. This is just an average, however; the actual number can vary.
The Poisson distribution is applicable only when several conditions hold.
• An event can occur any number of times during a time period.
• Events occur independently. In other words, if an event occurs, it does not affect the probability of another event occurring in the same time period.
• The rate of occurrence is constant; that is, the rate does not change based on time.
• The expected number of events is proportional to the length of the time period. For example, on average twice as many events should occur in a 2-hour period as in a 1-hour period.
The Poisson variable X with parameter λ (> 0) has the probability function
$$P(X = x) = \frac{e^{-\lambda}\lambda^x}{x!} \quad \text{for } x = 0, 1, 2, \ldots$$
A Poisson distribution is completely specified by only one parameter, λ, and is denoted by Po(λ).
The mean of the Poisson distribution Po(λ) is
$$\mu = \lambda$$
and the variance of the Poisson distribution Po(λ) is
$$\sigma^2 = \lambda$$
Therefore, for the Poisson distribution Po(λ),
mean = variance = λ
web resources: https://www.jbstatistics.com/introduction-to-the-poisson-distribution/
Example 4
The number of telephone calls at an office over a given time interval may be considered as having
a Poisson distribution. The average number of phone calls per hour in an office is 180.
Find the probability of having
(a) 4 phone calls in a minute,
(b) no phone call in a particular 3-minute interval.
Solution:
Let the random variable X be the number of phone calls in a minute.
Then X has a Poisson distribution with mean $\lambda = \frac{180}{60} = 3$,
i.e., $$P(X = x) = \frac{e^{-3}(3)^x}{x!} \quad \text{for } x = 0, 1, 2, \ldots$$
Note that 180 calls per hour = 3 calls per minute.
(a) The probability of having 4 phone calls in a minute is
$$P(X = 4) = \frac{e^{-3}(3)^4}{4!} = \frac{e^{-3}(81)}{24} \approx 0.1680$$
(b) The probability of having no phone call in a minute is
$$P(X = 0) = \frac{e^{-3}(3)^0}{0!} = \frac{e^{-3}(1)}{1} \approx 0.04979$$
Since the numbers of calls in non-overlapping 1-minute intervals are independent of each other, the probability of having no phone call in a particular 3-minute interval is
$$(0.04979)^3 \approx 0.0001234$$
OR Let the random variable Y be the number of phone calls in a 3-minute interval.
Then Y has a Poisson distribution with mean $\lambda = \frac{180}{20} = 9$,
i.e., $$P(Y = y) = \frac{e^{-9}(9)^y}{y!} \quad \text{for } y = 0, 1, 2, \ldots$$
Note that 180 calls per hour = 9 calls per 3-minute interval.
The probability of having no phone call in a particular 3-minute interval is
$$P(Y = 0) = \frac{e^{-9}(9)^0}{0!} = \frac{e^{-9}(1)}{1} \approx 0.0001234$$
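Both answers in Example 4 follow from the Poisson probability function, and part (b) can be computed by either route shown above. A minimal sketch using only the standard `math` module; the helper name is illustrative.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) = e**(-lam) * lam**x / x! for X ~ Po(lam)."""
    return exp(-lam) * lam**x / factorial(x)

lam_per_minute = 180 / 60              # 180 calls per hour = 3 calls per minute

p_a = poisson_pmf(4, lam_per_minute)   # (a) exactly 4 calls in one minute

# (b) no call in a particular 3-minute interval, computed both ways
p_b_independent = poisson_pmf(0, lam_per_minute) ** 3   # three independent minutes
p_b_po9 = poisson_pmf(0, 3 * lam_per_minute)             # Y ~ Po(9) directly
```

The two values in (b) agree because $(e^{-3})^3 = e^{-9}$.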
Associate Degree 2021 – 2022 First Semester
CCMA4001 Quantitative Analysis I
Chapter 6
The Normal Distribution and Its Applications
6.1 The Normal Distribution
A. The Probability Density Function
One of the most important and widely applied continuous distributions is the normal (or Gaussian) distribution.
A continuous random variable X that has a normal distribution with mean μ, variance σ² and standard deviation σ has the probability density function
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2} \quad \text{for } -\infty < x < \infty$$
If a random variable X is normally distributed with mean μ and variance σ², we write $X \sim N(\mu, \sigma^2)$.
The figure shown gives a sketch of such a normal curve.
The normal curve has the following properties.
(1) f(x) > 0 for all values of x.
This is obvious from the definition of f(x).
Thus, condition (1) for a pdf is satisfied.
(2) The total area under the curve and above the horizontal axis is equal to 1, i.e.,
$$\int_{-\infty}^{\infty} f(x)\,dx = \int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}\,dx = 1$$
Thus, condition (2) for a pdf is satisfied.
The shape of the curve f(x) depends on two parameters, the mean μ and the variance σ².
The graph of f(x) has the following features.
(1) It is a bell-shaped curve symmetrical about the vertical line x = μ.
(2) The mean, median and mode are all equal to μ.
(3) The normal curve extends indefinitely in both directions, from −∞ to +∞, and it has the x-axis as an asymptote. That is, as x → ∞ or x → −∞, f(x) → 0.
The shape and the location of the curve f(x) depend on the two parameters, the mean μ and the variance σ².
For normal curves with the same σ but different μ’s (μ₁ < μ₂), the curves have the same shape but are centred at different positions along the horizontal axis.
For normal curves with the same μ but different σ’s (σ₁ < σ₂), the curves are centred at exactly the same position on the horizontal axis, but they have different shapes: the curve with the larger standard deviation is lower and spreads out farther.
For example,
B. Probabilities of the Standard Normal Distribution N(0, 1)
Suppose a random variable X has a normal distribution N(μ, σ²).
The probability that X lies between a and b is written P(a < X < b) and is given by the area under the normal curve between a and b.
$$P(a < X < b) = \int_a^b \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}\,dx$$
These areas are given by integrals, but the integrand has no elementary antiderivative, so they cannot be evaluated in closed form by ordinary integral calculus.
In practice, we usually find normal probabilities from tables.
In order to use the same set of tables for all possible values of mean μ and variance σ²,
we perform a process known as ‘standardizing X’ to obtain the standard normal variable
which is given the special symbol Z.
The random variable Z having the normal distribution with mean 0 and standard deviation
equal to 1 is called the standard normal variable.
Thus, Z ~ N(0, 1) and the N(0, 1) is called the standard normal distribution and has pdf
$$f(z) = \frac{1}{\sqrt{2\pi}}\,e^{-\frac{z^2}{2}} \quad \text{for } -\infty < z < \infty$$
The following is the graph of the standard normal distribution.
Probabilities of this distribution can be found from normal tables.
Mathematicians have set up tables of areas under the standard normal curve.
One form of such normal tables is given on P.6.
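The table gives values of A(z) defined by
$$A(z) = P(0 \le Z \le z) = \int_0^z \frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}t^2}\,dt \quad \text{for } z \ge 0$$
[Figure: A(z) is the area under the standard normal curve between 0 and z.]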
From the table, we can read, for example, the following probabilities as areas A(z) under the standard normal curve.
P(0 ≤ Z ≤ 1.2) = A(1.2) = 0.3849
P(0 ≤ Z ≤ 0.63) = A(0.63) = 0.2357
The normal table can be used to find values like P(Z ≤ a), P(Z ≥ b) and P(a ≤ Z ≤ b).
When using the table, it is always recommended to sketch the normal curve and shade the relevant region, which helps visualize the situation under consideration.
Watch: https://www.youtube.com/watch?v=lgwT6tDniko
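Table values of A(z) can also be computed rather than looked up. The sketch below is an alternative to the table, not the method used in these notes; it relies on the standard identity $\Phi(z) = \frac{1}{2}\left[1 + \operatorname{erf}(z/\sqrt{2})\right]$, so that $A(z) = \Phi(z) - 0.5 = \frac{1}{2}\operatorname{erf}(z/\sqrt{2})$. The function name is hypothetical.

```python
from math import erf, sqrt

def area_0_to_z(z):
    """A(z) = P(0 <= Z <= z) for Z ~ N(0, 1) and z >= 0.

    Uses Phi(z) = (1 + erf(z / sqrt(2))) / 2, so A(z) = Phi(z) - 0.5.
    """
    return 0.5 * erf(z / sqrt(2))

# Reproduce the two table look-ups quoted above
for z in (1.2, 0.63):
    print(z, round(area_0_to_z(z), 4))
```

Rounded to four decimal places, the results match the entries A(1.2) = 0.3849 and A(0.63) = 0.2357 in Table I.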
The entries in Table I are the probabilities that a random variable having
the standard normal distribution will take on a value between 0 and z.
They are given by the area of the gray region under the curve in the figure.
TABLE I NORMAL-CURVE AREAS
z     0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
0.0   0.0000  0.0040  0.0080  0.0120  0.0160  0.0199  0.0239  0.0279  0.0319  0.0359
0.1   0.0398  0.0438  0.0478  0.0517  0.0557  0.0596  0.0636  0.0675  0.0714  0.0753
0.2   0.0793  0.0832  0.0871  0.0910  0.0948  0.0987  0.1026  0.1064  0.1103  0.1141
0.3   0.1179  0.1217  0.1255  0.1293  0.1331  0.1368  0.1406  0.1443  0.1480  0.1517
0.4   0.1554  0.1591  0.1628  0.1664  0.1700  0.1736  0.1772  0.1808  0.1844  0.1879
0.5   0.1915  0.1950  0.1985  0.2019  0.2054  0.2088  0.2123  0.2157  0.2190  0.2224
0.6   0.2257  0.2291  0.2324  0.2357  0.2389  0.2422  0.2454  0.2486  0.2517  0.2549
0.7   0.2580  0.2611  0.2642  0.2673  0.2704  0.2734  0.2764  0.2794  0.2823  0.2852
0.8   0.2881  0.2910  0.2939  0.2967  0.2995  0.3023  0.3051  0.3078  0.3106  0.3133
0.9   0.3159  0.3186  0.3212  0.3238  0.3264  0.3289  0.3315  0.3340  0.3365  0.3389
1.0   0.3413  0.3438  0.3461  0.3485  0.3508  0.3531  0.3554  0.3577  0.3599  0.3621
1.1   0.3643  0.3665  0.3686  0.3708  0.3729  0.3749  0.3770  0.3790  0.3810  0.3830
1.2   0.3849  0.3869  0.3888  0.3907  0.3925  0.3944  0.3962  0.3980  0.3997  0.4015
1.3   0.4032  0.4049  0.4066  0.4082  0.4099  0.4115  0.4131  0.4147  0.4162  0.4177
1.4   0.4192  0.4207  0.4222  0.4236  0.4251  0.4265  0.4279  0.4292  0.4306  0.4319
1.5   0.4332  0.4345  0.4357  0.4370  0.4382  0.4394  0.4406  0.4418  0.4429  0.4441
1.6   0.4452  0.4463  0.4474  0.4484  0.4495  0.4505  0.4515  0.4525  0.4535  0.4545
1.7   0.4554  0.4564  0.4573  0.4582  0.4591  0.4599  0.4608  0.4616  0.4625  0.4633
1.8   0.4641  0.4648  0.4656  0.4664  0.4671  0.4678  0.4685  0.4692  0.4699  0.4706
1.9   0.4713  0.4719  0.4725  0.4732  0.4738  0.4744  0.4750  0.4756  0.4761  0.4767
2.0   0.4772  0.4778  0.4783  0.4788  0.4793  0.4798  0.4803  0.4808  0.4812  0.4817
2.1   0.4821  0.4826  0.4830  0.4834  0.4838  0.4842  0.4846  0.4850  0.4854  0.4857
2.2   0.4861  0.4864  0.4868  0.4871  0.4875  0.4878  0.4881  0.4884  0.4887  0.4890
2.3   0.4893  0.4896  0.4898  0.4901  0.4904  0.4906  0.4909  0.4911  0.4913  0.4916
2.4   0.4918  0.4920  0.4922  0.4925  0.4927  0.4929  0.4931  0.4932  0.4934  0.4936
2.5   0.4938  0.4940  0.4941  0.4943  0.4945  0.4946  0.4948  0.4949  0.4951  0.4952
2.6   0.4953  0.4955  0.4956  0.4957  0.4959  0.4960  0.4961  0.4962  0.4963  0.4964
2.7   0.4965  0.4966  0.4967  0.4968  0.4969  0.4970  0.4971  0.4972  0.4973  0.4974
2.8   0.4974  0.4975  0.4976  0.4977  0.4977  0.4978  0.4979  0.4979  0.4980  0.4981
2.9   0.4981  0.4982  0.4982  0.4983  0.4984  0.4984  0.4985  0.4985  0.4986  0.4986
3.0   0.4987  0.4987  0.4987  0.4988  0.4988  0.4989  0.4989  0.4989  0.4990  0.4990
3.1   0.4990  0.4991  0.4991  0.4991  0.4992  0.4992  0.4992  0.4992  0.4993  0.4993
3.2   0.4993  0.4993  0.4994  0.4994  0.4994  0.4994  0.4994  0.4995  0.4995  0.4995
3.3   0.4995  0.4995  0.4995  0.4996  0.4996  0.4996  0.4996  0.4996  0.4996  0.4997
3.4   0.4997  0.4997  0.4997  0.4997  0.4997  0.4997  0.4997  0.4997  0.4997  0.4998
3.5   0.4998  0.4998  0.4998  0.4998  0.4998  0.4998  0.4998  0.4998  0.4998  0.4998
Also, for z = 4.0, 5.0 and 6.0, the areas are 0.49997, 0.4999997, and 0.499999999.
Example 1
Given that Z is the standard normal variable, Z ~ N(0, 1), find the following probabilities.
(a) P(0 ≤ Z ≤ 1.28)
(b) P(−1.28 ≤ Z ≤ 0)
(c) P(Z ≥ 1.28)
(d) P(Z ≥ −1.28)
(e) P(Z ≤ 1.28)
(f) P(Z ≤ −1.28)
(g) P(1.28 ≤ Z ≤ 2.28)
(h) P(−1.28 ≤ Z ≤ 2.28)
(i) P(−2.28 ≤ Z ≤ −1.28)
(j) P(0 ≤ Z ≤ 1.289)
Solution:
(a) P(0 ≤ Z ≤ 1.28) = A(1.28) = 0.3997
(b) P(−1.28 ≤ Z ≤ 0) = P(0 ≤ Z ≤ 1.28) = A(1.28) = 0.3997
(c) The normal curve is symmetrical about the line through its mean (i.e., the line z = 0) and the total area under the curve is equal to 1.
∴ The area on the right of the line z = 0 is 0.5,
i.e., P(Z ≥ 0) = 0.5
P(Z ≥ 1.28) = P(Z ≥ 0) − P(0 ≤ Z ≤ 1.28) = 0.5 − 0.3997 = 0.1003
(d) P(Z ≥ −1.28) = P(−1.28 ≤ Z ≤ 0) + P(Z ≥ 0) = 0.3997 + 0.5 = 0.8997
(e) P(Z ≤ 1.28) = P(Z ≤ 0) + P(0 ≤ Z ≤ 1.28) = 0.5 + 0.3997 = 0.8997
(f) P(Z ≤ −1.28) = P(Z ≤ 0) − P(−1.28 ≤ Z ≤ 0) = 0.5 − 0.3997 = 0.1003
(g) P(1.28 ≤ Z ≤ 2.28) = P(0 ≤ Z ≤ 2.28) − P(0 ≤ Z ≤ 1.28)
= A(2.28) − A(1.28)
= 0.4887 − 0.3997
= 0.0890
(h) P(−1.28 ≤ Z ≤ 2.28) = P(−1.28 ≤ Z ≤ 0) + P(0 ≤ Z ≤ 2.28)
= 0.3997 + 0.4887
= 0.8884
(i) P(−2.28 ≤ Z ≤ −1.28) = P(−2.28 ≤ Z ≤ 0) − P(−1.28 ≤ Z ≤ 0)
= P(0 ≤ Z ≤ 2.28) − P(0 ≤ Z ≤ 1.28)
= A(2.28) − A(1.28)
= 0.4887 − 0.3997
= 0.0890
(j) The normal probability table gives probabilities only for values of z with at most 2 decimal places.
Hence, rounding 1.289 to 1.29, we have
P(0 ≤ Z ≤ 1.289) ≈ P(0 ≤ Z ≤ 1.29) = A(1.29) = 0.4015
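The table-based answers of Example 1 can be cross-checked with the standard normal CDF Φ(z) = P(Z ≤ z), which Python provides as `statistics.NormalDist().cdf`. This is a sketch, not the method of the notes: the table is rounded to four decimal places, so the last digit may differ slightly, and (j) is evaluated at z = 1.289 itself rather than after rounding z to 1.29.

```python
from statistics import NormalDist

Phi = NormalDist().cdf        # Phi(z) = P(Z <= z) for Z ~ N(0, 1)

answers = {
    "a": Phi(1.28) - Phi(0),        # P(0 <= Z <= 1.28)
    "c": 1 - Phi(1.28),             # P(Z >= 1.28)
    "d": 1 - Phi(-1.28),            # P(Z >= -1.28)
    "f": Phi(-1.28),                # P(Z <= -1.28)
    "g": Phi(2.28) - Phi(1.28),     # P(1.28 <= Z <= 2.28)
    "h": Phi(2.28) - Phi(-1.28),    # P(-1.28 <= Z <= 2.28)
    "j": Phi(1.289) - Phi(0),       # P(0 <= Z <= 1.289), z not rounded
}
for part, value in answers.items():
    print(part, round(value, 4))
```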
Example 2
If Z has the standard normal distribution, determine the value of z in each of the following cases.
(a) P(Z ≤ z) = 0.8790
(b) P(Z ≥ z) = 0.0401
(c) P(Z ≤ z) = 0.0080
(d) P(Z ≥ z) = 0.6429
(e) P(−2.1 ≤ Z ≤ z) = 0.2586
(f) P(z ≤ Z ≤ 0.37) = 0.3125
Solution:
In solving this problem, the normal table is used in the reverse way.
We first locate the given probability in the table.
Its corresponding row digits and column digits then give the value of z.
(a) As P(Z ≤ z) = 0.8790,
∴ P(0 ≤ Z ≤ z) = P(Z ≤ z) − P(Z ≤ 0) = 0.8790 − 0.5 = 0.3790
The entry 0.3790 is located in the row marked 1.1 and in the column marked 0.07 in the normal table.
∴ The required z = 1.17
(b) As P(Z ≥ z) = 0.0401,
∴ P(0 ≤ Z ≤ z) = P(Z ≥ 0) − P(Z ≥ z) = 0.5 − 0.0401 = 0.4599
From the normal table, z = 1.75
(c) P(Z ≤ z) = 0.0080 < 0.5,
implying that z lies on the left of 0 such that
P(z ≤ Z ≤ 0) = P(Z ≤ 0) − P(Z ≤ z) = 0.5 − 0.0080 = 0.4920
∴ P(0 ≤ Z ≤ −z) = P(z ≤ Z ≤ 0) = 0.4920
From the normal table, −z = 2.41
∴ z = −2.41
(d) P(Z ≥ z) = 0.6429 > 0.5,
implying that z lies on the left of 0 such that
P(z ≤ Z ≤ 0) = P(Z ≥ z) − P(Z ≥ 0) = 0.6429 − 0.5 = 0.1429
∴ P(0 ≤ Z ≤ −z) = P(z ≤ Z ≤ 0) = 0.1429
From the normal table,
A(0.36) = 0.1406
and A(0.37) = 0.1443
Since 0.1429 is closer to 0.1443, we take −z = 0.37
∴ z = −0.37
z 0
z
(e) As $P(-2.1 < Z < 0) = P(0 < Z < 2.1) = A(2.1) = 0.4821 > 0.2586$, z must be negative in this case.
By symmetry,
$$P(0 < Z < -z) = P(z < Z < 0) = P(-2.1 < Z < 0) - P(-2.1 < Z < z) = 0.4821 - 0.2586 = 0.2235$$
From the normal table, A(0.59) = 0.2224 and A(0.60) = 0.2257.
Since 0.2235 is closer to 0.2224, we take −z = 0.59.
∴ z = −0.59
(f) As $P(0 < Z < 0.37) = A(0.37) = 0.1443 < 0.3125$, z must be negative in this case.
By symmetry,
$$P(0 < Z < -z) = P(z < Z < 0) = P(z < Z < 0.37) - P(0 < Z < 0.37) = 0.3125 - 0.1443 = 0.1682$$
From the normal table, A(0.43) = 0.1664 and A(0.44) = 0.1700.
Since 0.1682 is the average of 0.1664 and 0.1700, we take −z = 0.435.
∴ z = −0.435
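Reading the table in reverse amounts to applying the inverse of the standard normal CDF. A minimal sketch, assuming Python 3.8+ where the standard library's `statistics.NormalDist` provides `cdf` and `inv_cdf`: each part is first rewritten as a condition on Φ(z) = P(Z < z). The computed z values are not rounded to the table grid, so (d) and (e) may differ slightly from the answers chosen above.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mu = 0, sigma = 1

# (a) P(Z < z) = 0.8790
z_a = Z.inv_cdf(0.8790)
# (b) P(Z > z) = 0.0401  ->  P(Z < z) = 1 - 0.0401
z_b = Z.inv_cdf(1 - 0.0401)
# (c) P(Z < z) = 0.0080
z_c = Z.inv_cdf(0.0080)
# (d) P(Z > z) = 0.6429  ->  P(Z < z) = 1 - 0.6429
z_d = Z.inv_cdf(1 - 0.6429)
# (e) P(-2.1 < Z < z) = 0.2586  ->  Phi(z) = Phi(-2.1) + 0.2586
z_e = Z.inv_cdf(Z.cdf(-2.1) + 0.2586)
# (f) P(z < Z < 0.37) = 0.3125  ->  Phi(z) = Phi(0.37) - 0.3125
z_f = Z.inv_cdf(Z.cdf(0.37) - 0.3125)

for label, z in zip("abcdef", (z_a, z_b, z_c, z_d, z_e, z_f)):
    print(f"({label}) z = {z:.3f}")
```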
C. Probabilities of the Normal Distribution $N(\mu, \sigma^2)$
When $X \sim N(\mu, \sigma^2)$ is not the standard normal distribution (i.e., μ ≠ 0 or σ ≠ 1), we have to transform the random variable X to a normal random variable Z with mean 0 and variance 1.
This can be done by means of the transformation
$$Z = \frac{X - \mu}{\sigma}.$$
Example 3
A random variable X is normally distributed with mean 30 and standard deviation 4.
Find the following probabilities.
(a) $P(X < 33)$
(b) $P(X > 35)$
(c) $P(X < 27)$
(d) $P(21 < X < 28)$
Solution:
The random variable X has the distribution N(30, 16) with mean μ = 30 and standard deviation σ = 4.
Hence X is transformed to the standard normal by $Z = \dfrac{X - \mu}{\sigma} = \dfrac{X - 30}{4}$.
(a)
$$
\begin{aligned}
P(X < 33) &= P\left(Z < \frac{33 - 30}{4}\right) = P(Z < 0.75) \\
&= P(Z < 0) + P(0 < Z < 0.75) \\
&= 0.5 + 0.2734 \\
&= 0.7734
\end{aligned}
$$
(b)
$$
\begin{aligned}
P(X > 35) &= P\left(Z > \frac{35 - 30}{4}\right) = P(Z > 1.25) \\
&= P(Z > 0) - P(0 < Z < 1.25) \\
&= 0.5 - 0.3944 \\
&= 0.1056
\end{aligned}
$$
(c)
$$
\begin{aligned}
P(X < 27) &= P\left(Z < \frac{27 - 30}{4}\right) = P(Z < -0.75) \\
&= P(Z < 0) - P(-0.75 < Z < 0) \\
&= 0.5 - P(0 < Z < 0.75) \\
&= 0.5 - 0.2734 \\
&= 0.2266
\end{aligned}
$$
(d)
$$
\begin{aligned}
P(21 < X < 28) &= P\left(\frac{21 - 30}{4} < Z < \frac{28 - 30}{4}\right) \\
&= P(-2.25 < Z < -0.5) \\
&= P(-2.25 < Z < 0) - P(-0.5 < Z < 0) \\
&= P(0 < Z < 2.25) - P(0 < Z < 0.5) \\
&= 0.4878 - 0.1915 \\
&= 0.2963
\end{aligned}
$$
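The standardization Z = (X − μ)/σ used in Example 3 can be sketched in Python as follows, again assuming Python 3.8+ for `statistics.NormalDist`. The helper `standardize` is illustrative only; `NormalDist(30, 4).cdf(x)` would give the same probabilities without standardizing by hand. Exact CDF values may differ from the table-based answers in the fourth decimal place.

```python
from statistics import NormalDist

MU, SIGMA = 30, 4              # X ~ N(30, 16), so sigma = sqrt(16) = 4
phi = NormalDist().cdf         # standard normal CDF: phi(z) = P(Z < z)

def standardize(x, mu=MU, sigma=SIGMA):
    """Z = (X - mu) / sigma."""
    return (x - mu) / sigma

p_a = phi(standardize(33))                          # P(X < 33) = P(Z < 0.75)
p_b = 1 - phi(standardize(35))                      # P(X > 35) = P(Z > 1.25)
p_c = phi(standardize(27))                          # P(X < 27) = P(Z < -0.75)
p_d = phi(standardize(28)) - phi(standardize(21))   # P(-2.25 < Z < -0.5)

print([round(p, 4) for p in (p_a, p_b, p_c, p_d)])
```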
Example 4
A random variable X has distribution N(50, 100). Find the values of a and b if
(a) $P(X > a) = 0.0427$
(b) $P(X < b) = 0.209$
Solution:
The random variable X has the distribution N(50, 100), i.e., μ = 50 and $\sigma = \sqrt{100} = 10$.
(a) If $P(Z > z) = 0.0427$,
then $P(0 < Z < z) = P(Z > 0) - P(Z > z) = 0.5 - 0.0427 = 0.4573$.
From the normal table, z = 1.72.
As $P(X > a) = 0.0427$,
$$P\left(Z > \frac{a - 50}{10}\right) = 0.0427$$
$$\frac{a - 50}{10} = 1.72 \quad\Rightarrow\quad a - 50 = 17.2 \quad\Rightarrow\quad a = 67.2$$
(b) If $P(Z < z) = 0.209 < 0.5$, this implies that z lies to the left of 0, such that
$$P(z < Z < 0) = P(Z < 0) - P(Z < z) = 0.5 - 0.209 = 0.291$$
$$\therefore\ P(0 < Z < -z) = P(z < Z < 0) = 0.291$$
From the normal table, −z = 0.81.
∴ z = −0.81
As $P(X < b) = 0.209$,
$$P\left(Z < \frac{b - 50}{10}\right) = 0.209$$
$$\frac{b - 50}{10} = -0.81 \quad\Rightarrow\quad b - 50 = -8.1 \quad\Rightarrow\quad b = 41.9$$
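Example 4 works backwards from a probability to a value of X, i.e. x = μ + zσ. A short sketch with `statistics.NormalDist.inv_cdf` (Python 3.8+) that computes both the z values and a and b; any small differences from the hand-worked answers are expected to come from rounding z to two decimal places.

```python
from statistics import NormalDist

X = NormalDist(mu=50, sigma=10)     # N(50, 100): variance 100, so sigma = 10
std_normal = NormalDist()

# (a) P(X > a) = 0.0427 is the same as P(X < a) = 1 - 0.0427
z_a = std_normal.inv_cdf(1 - 0.0427)
a = X.inv_cdf(1 - 0.0427)

# (b) P(X < b) = 0.209
z_b = std_normal.inv_cdf(0.209)
b = X.inv_cdf(0.209)

# a and b agree with x = mu + z * sigma, the route used in the notes
print(f"z_a = {z_a:.2f}, a = {a:.2f}; z_b = {z_b:.2f}, b = {b:.2f}")
```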
D. Applications of Normal Distribution
The normal distribution has many applications in everyday life, as the following examples show.
Example 5
Suppose that W, the weight in kg of an adult male follows N(60, 25) distribution.
Calculate the probability that a male chosen at random is
(a) less than 61 kg;
(b) greater than 63 kg;
(c) between 58 kg and 63 kg;
(d) less than 58 kg.
Solution:
The mean and the variance of W are μ = 60 and σ² = 25 respectively.
So the standard deviation is σ = 5.
The standardization formula is $Z = \dfrac{W - 60}{5}$.
(a) The probability that a male chosen at random weighs less than 61 kg is
$$
\begin{aligned}
P(W < 61) &= P\left(\frac{W - 60}{5} < \frac{61 - 60}{5}\right) \\
&= P(Z < 0.2) \\
&= P(Z < 0) + P(0 < Z < 0.2) \\
&= 0.5 + 0.0793 \\
&= 0.5793
\end{aligned}
$$
(b) The probability that a male chosen at random weighs more than 63 kg is
$$
\begin{aligned}
P(W > 63) &= P\left(\frac{W - 60}{5} > \frac{63 - 60}{5}\right) \\
&= P(Z > 0.6) \\
&= P(Z > 0) - P(0 < Z < 0.6) \\
&= 0.5 - 0.2257 \\
&= 0.2743
\end{aligned}
$$
(c) The probability that a male chosen at random weighs between 58 kg and 63 kg is
$$
\begin{aligned}
P(58 < W < 63) &= P\left(\frac{58 - 60}{5} < \frac{W - 60}{5} < \frac{63 - 60}{5}\right) \\
&= P(-0.4 < Z < 0.6) \\
&= P(-0.4 < Z < 0) + P(0 < Z < 0.6) \\
&= P(0 < Z < 0.4) + P(0 < Z < 0.6) \\
&= 0.1554 + 0.2257 \\
&= 0.3811
\end{aligned}
$$
(d) The probability that a male chosen at random weighs less than 58 kg is
$$
\begin{aligned}
P(W < 58) &= P\left(\frac{W - 60}{5} < \frac{58 - 60}{5}\right) \\
&= P(Z < -0.4) \\
&= P(Z < 0) - P(-0.4 < Z < 0) \\
&= P(Z < 0) - P(0 < Z < 0.4) \\
&= 0.5 - 0.1554 \\
&= 0.3446
\end{aligned}
$$
Example 6
The intelligence quotients (IQ) of children in a city are normally distributed with mean 100
and standard deviation 15.
(a) What proportion of children have IQ scores
(i) less than 91?
(ii) between 106 and 130?
(b) What IQ score will be exceeded by only 5% of the children?
Solution:
(a) Let the random variable X be the IQ score of a child in the city.
Then X ~ N(100, 225)
(i) The proportion of children with IQ scores less than 91 is
$$
\begin{aligned}
P(X < 91) &= P\left(Z < \frac{91 - 100}{15}\right) \\
&= P(Z < -0.6) \\
&= P(Z < 0) - P(-0.6 < Z < 0) \\
&= P(Z < 0) - P(0 < Z < 0.6) \\
&= 0.5 - 0.2257 \\
&= 0.2743
\end{aligned}
$$
(ii) The proportion of children with IQ scores between 106 and 130 is
$$
\begin{aligned}
P(106 < X < 130) &= P\left(\frac{106 - 100}{15} < Z < \frac{130 - 100}{15}\right) \\
&= P(0.4 < Z < 2) \\
&= P(0 < Z < 2) - P(0 < Z < 0.4) \\
&= 0.4772 - 0.1554 \\
&= 0.3218
\end{aligned}
$$
(b) Let a be the required score, so that P(X > a) = 0.05.
If $P(Z > z) = 0.05$,
then $P(0 < Z < z) = P(Z > 0) - P(Z > z) = 0.5 - 0.05 = 0.45$.
From the normal table, A(1.64) = 0.4495 and A(1.65) = 0.4505.
Since 0.45 is the average of 0.4495 and 0.4505,
∴ z = 1.645
As $P(X > a) = 0.05$,
$$P\left(Z > \frac{a - 100}{15}\right) = 0.05$$
$$\frac{a - 100}{15} = 1.645 \quad\Rightarrow\quad a - 100 = 24.675 \quad\Rightarrow\quad a = 124.675$$
∴ a ≈ 125 (rounded to the nearest integer)
E. Sum and Difference of Two Independent Normal Variables
If X and Y are any two random variables, continuous or discrete, then
E(X + Y) = E(X) + E(Y)
E(X – Y) = E(X) – E(Y)
Also, if X and Y are independent, then
Var(X + Y) = Var(X) + Var(Y)
Var(X – Y) = Var(X) + Var(Y)
These results can be applied to normal variables.
The sum or difference of two independent normal variables is also normally distributed.
If X and Y are two independent normal variables such that
$$X \sim N(\mu_1, \sigma_1^2) \quad\text{and}\quad Y \sim N(\mu_2, \sigma_2^2),$$
then
$$X + Y \sim N(\mu_1 + \mu_2,\ \sigma_1^2 + \sigma_2^2)$$
and
$$X - Y \sim N(\mu_1 - \mu_2,\ \sigma_1^2 + \sigma_2^2).$$
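The rule that variances add for both X + Y and X − Y can be checked by simulation. The sketch below is an illustration only: it draws independent normal samples with Python's `random.gauss(mu, sigma)`, using the parameters of Example 7, and compares the sample means and variances of X + Y and X − Y with the theory. The seed and sample size are arbitrary choices.

```python
import random
import statistics

random.seed(1)      # fixed seed so the run is reproducible
N = 100_000         # number of simulated (X, Y) pairs

# Parameters of Example 7: X ~ N(70, 9), Y ~ N(60, 16).
# random.gauss takes (mean, standard deviation), so sigma = 3 and sigma = 4.
xs = [random.gauss(70, 3) for _ in range(N)]
ys = [random.gauss(60, 4) for _ in range(N)]   # drawn separately, so independent of xs

sums = [x + y for x, y in zip(xs, ys)]
diffs = [x - y for x, y in zip(xs, ys)]

mean_sum, var_sum = statistics.fmean(sums), statistics.pvariance(sums)
mean_diff, var_diff = statistics.fmean(diffs), statistics.pvariance(diffs)

# Theory: means 130 and 10; both variances are 9 + 16 = 25
print(f"X+Y: mean {mean_sum:.2f}, variance {var_sum:.2f}")
print(f"X-Y: mean {mean_diff:.2f}, variance {var_diff:.2f}")
```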
Example 7
If X ~ N(70, 9) and Y ~ N(60, 16) are independent, find
(a) $P(X + Y < 140)$
(b) $P(120 < X + Y < 135)$
(c) $P(X - Y > 7)$
(d) $P(2 < X - Y < 8)$
Solution:
(a) X + Y ~ N(70 + 60, 9 + 16)
i.e., X + Y ~ N(130, 25)
$$P(X + Y < 140) = P\left(Z < \frac{140 - 130}{\sqrt{25}}\right) = P(Z < 2) = P(Z < 0) + P(0 < Z < 2) = 0.5 + 0.4772 = 0.9772$$
(b)
$$
\begin{aligned}
P(120 < X + Y < 135) &= P\left(\frac{120 - 130}{\sqrt{25}} < Z < \frac{135 - 130}{\sqrt{25}}\right) \\
&= P(-2 < Z < 1) \\
&= P(-2 < Z < 0) + P(0 < Z < 1) \\
&= P(0 < Z < 2) + P(0 < Z < 1) \\
&= 0.4772 + 0.3413 \\
&= 0.8185
\end{aligned}
$$
(c) X − Y ~ N(70 − 60, 9 + 16)
i.e., X − Y ~ N(10, 25)
$$
\begin{aligned}
P(X - Y > 7) &= P\left(Z > \frac{7 - 10}{\sqrt{25}}\right) \\
&= P(Z > -0.6) \\
&= P(-0.6 < Z < 0) + P(Z > 0) \\
&= P(0 < Z < 0.6) + P(Z > 0) \\
&= 0.2257 + 0.5 \\
&= 0.7257
\end{aligned}
$$
(d)
$$
\begin{aligned}
P(2 < X - Y < 8) &= P\left(\frac{2 - 10}{\sqrt{25}} < Z < \frac{8 - 10}{\sqrt{25}}\right) \\
&= P(-1.6 < Z < -0.4) \\
&= P(-1.6 < Z < 0) - P(-0.4 < Z < 0) \\
&= P(0 < Z < 1.6) - P(0 < Z < 0.4) \\
&= 0.4452 - 0.1554 \\
&= 0.2898
\end{aligned}
$$
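Python's `statistics.NormalDist` (3.8+) implements this result directly: adding or subtracting two `NormalDist` objects treats them as independent, so the means add or subtract while the variances add. A minimal sketch reproducing Example 7 (note that `NormalDist` takes the standard deviation, not the variance). Exact values may differ from the table-based answers in the fourth decimal place.

```python
from statistics import NormalDist

X = NormalDist(70, 3)   # N(70, 9):  sigma = 3
Y = NormalDist(60, 4)   # N(60, 16): sigma = 4

# Sum and difference of independent normals: variances add in both cases
S = X + Y               # N(130, 25)
D = X - Y               # N(10, 25)

p_a = S.cdf(140)                   # P(X + Y < 140)
p_b = S.cdf(135) - S.cdf(120)      # P(120 < X + Y < 135)
p_c = 1 - D.cdf(7)                 # P(X - Y > 7)
p_d = D.cdf(8) - D.cdf(2)          # P(2 < X - Y < 8)

print(S, D)
print([round(p, 4) for p in (p_a, p_b, p_c, p_d)])
```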
Associate Degree 2021 – 2022 First Semester
CCMA4001 Quantitative Analysis I
Chapter 7
Sampling Distribution
7.1 Random Samples and Sampling Distributions
A. Sampling Distributions
The sample mean $\bar{X}$ of the sample observations $X_1, X_2, \ldots, X_n$ from a population X is given by
$$\bar{X} = \frac{1}{n}(X_1 + X_2 + \cdots + X_n)$$
From the definition of a random sample, the observations are random variables, and so the sample mean, being a sample statistic, is also a random variable.
Hence, the sample mean has a probability distribution, and this distribution is called the sampling distribution of the mean.
Example 1
A population consists of the numbers 1, 2, 3, 4 and 5.
Random samples of size 2 are drawn from the population with replacement.
(a) List all the possible samples of size 2 and find their means.
(b) Construct the sampling distribution for the sample mean.
Solution:
(a) Let $(X_1, X_2)$ represent a sample of size 2 drawn from the population.
Since the numbers are drawn with replacement, each $X_i$ (i = 1, 2) can take values from 1 to 5.
Hence, there are 25 possible samples and these, together with their means $\bar{x} = \frac{1}{2}(x_1 + x_2)$, are listed in the following table.

| Sample $(x_1, x_2)$ | Mean $\bar{x}$ | Sample $(x_1, x_2)$ | Mean $\bar{x}$ | Sample $(x_1, x_2)$ | Mean $\bar{x}$ | Sample $(x_1, x_2)$ | Mean $\bar{x}$ | Sample $(x_1, x_2)$ | Mean $\bar{x}$ |
|---|---|---|---|---|---|---|---|---|---|
| (1, 1) | 1 | (2, 1) | 1.5 | (3, 1) | 2 | (4, 1) | 2.5 | (5, 1) | 3 |
| (1, 2) | 1.5 | (2, 2) | 2 | (3, 2) | 2.5 | (4, 2) | 3 | (5, 2) | 3.5 |
| (1, 3) | 2 | (2, 3) | 2.5 | (3, 3) | 3 | (4, 3) | 3.5 | (5, 3) | 4 |
| (1, 4) | 2.5 | (2, 4) | 3 | (3, 4) | 3.5 | (4, 4) | 4 | (5, 4) | 4.5 |
| (1, 5) | 3 | (2, 5) | 3.5 | (3, 5) | 4 | (4, 5) | 4.5 | (5, 5) | 5 |
(b) Since the samples are drawn randomly, each sample has the same probability of being drawn.
The totality of all the samples thus forms an equiprobable sample space with 25 sample points,
each point having a probability of $\frac{1}{25}$.
The sampling distribution of the sample mean $\bar{X}$ may then be derived from the table in (a) and is shown in the following table.

| Sample mean $\bar{x}$ | 1 | 1.5 | 2 | 2.5 | 3 | 3.5 | 4 | 4.5 | 5 | Total |
|---|---|---|---|---|---|---|---|---|---|---|
| Probability | 1/25 | 2/25 | 3/25 | 4/25 | 5/25 = 1/5 | 4/25 | 3/25 | 2/25 | 1/25 | 1 |
Example 2
Refer to the data and results of Example 1.
(a) Find the mean μ and the variance σ² of the population.
(b) Find the mean $E(\bar{X})$ and the variance $\mathrm{Var}(\bar{X})$ of the sampling distribution of $\bar{X}$.
Solution:
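(a)
$$\mu = E(X) = \sum_{x=1}^{5} x f(x) = 1\cdot\tfrac{1}{5} + 2\cdot\tfrac{1}{5} + 3\cdot\tfrac{1}{5} + 4\cdot\tfrac{1}{5} + 5\cdot\tfrac{1}{5} = 3$$
$$
\begin{aligned}
\sigma^2 = \mathrm{Var}(X) &= E(X^2) - [E(X)]^2 \\
&= \sum_{x=1}^{5} x^2 f(x) - (3)^2 \\
&= 1^2\cdot\tfrac{1}{5} + 2^2\cdot\tfrac{1}{5} + 3^2\cdot\tfrac{1}{5} + 4^2\cdot\tfrac{1}{5} + 5^2\cdot\tfrac{1}{5} - (3)^2 \\
&= 11 - 9 \\
&= 2
\end{aligned}
$$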
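(b)
$$
\begin{aligned}
E(\bar{X}) &= \sum_{i=1}^{9} \bar{x}_i f(\bar{x}_i) \\
&= 1\cdot\tfrac{1}{25} + 1.5\cdot\tfrac{2}{25} + 2\cdot\tfrac{3}{25} + 2.5\cdot\tfrac{4}{25} + 3\cdot\tfrac{1}{5} + 3.5\cdot\tfrac{4}{25} + 4\cdot\tfrac{3}{25} + 4.5\cdot\tfrac{2}{25} + 5\cdot\tfrac{1}{25} \\
&= 3
\end{aligned}
$$
This is the same as the population mean μ.
$$
\begin{aligned}
\mathrm{Var}(\bar{X}) &= E(\bar{X}^2) - [E(\bar{X})]^2 \\
&= \sum_{i=1}^{9} \bar{x}_i^2 f(\bar{x}_i) - (3)^2 \\
&= 1^2\cdot\tfrac{1}{25} + (1.5)^2\cdot\tfrac{2}{25} + 2^2\cdot\tfrac{3}{25} + (2.5)^2\cdot\tfrac{4}{25} + 3^2\cdot\tfrac{1}{5} + (3.5)^2\cdot\tfrac{4}{25} + 4^2\cdot\tfrac{3}{25} + (4.5)^2\cdot\tfrac{2}{25} + 5^2\cdot\tfrac{1}{25} - (3)^2 \\
&= 10 - 9 \\
&= 1
\end{aligned}
$$
OR
$$\mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n} = \frac{2}{2} = 1$$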
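Examples 1 and 2 can be reproduced exactly by enumeration. This sketch (standard-library Python, using `fractions.Fraction` to keep probabilities exact) lists all 25 samples, builds the sampling distribution of the mean and checks that E(X̄) = μ = 3 and Var(X̄) = σ²/n = 1. The helper `mean_and_variance` is our own.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

population = [1, 2, 3, 4, 5]
n = 2  # sample size

# All 5**2 = 25 ordered samples drawn with replacement; each has probability 1/25
samples = list(product(population, repeat=n))
means = [Fraction(sum(s), n) for s in samples]

# Sampling distribution of the sample mean as {xbar: probability}
dist = {m: Fraction(c, len(samples)) for m, c in sorted(Counter(means).items())}

def mean_and_variance(pmf):
    """E(V) and Var(V) = E(V^2) - [E(V)]^2 for a pmf given as {value: probability}."""
    e = sum(v * p for v, p in pmf.items())
    e2 = sum(v * v * p for v, p in pmf.items())
    return e, e2 - e * e

# Population: each of 1..5 has probability 1/5
mu, sigma2 = mean_and_variance({x: Fraction(1, len(population)) for x in population})
e_xbar, var_xbar = mean_and_variance(dist)

for m, p in dist.items():
    print(f"xbar = {float(m)}: P = {p}")
print("mu =", mu, " sigma^2 =", sigma2)
print("E(Xbar) =", e_xbar, " Var(Xbar) =", var_xbar, " sigma^2/n =", sigma2 / n)
```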
7.2 Relationship Between Sample Mean and Population Mean
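Suppose that a population has mean μ, which is unknown.
To estimate μ, it is natural to draw a random sample from the population and use the sample mean as an estimate.
Let $X_1, X_2, \ldots, X_n$ be a random sample of n independent observations from a population with mean μ and variance σ².
Consider the sample mean $\bar{X}$, where
$$\bar{X} = \frac{1}{n}(X_1 + X_2 + \cdots + X_n) = \frac{1}{n}\sum_{i=1}^{n} X_i$$
The distribution of $\bar{X}$ has mean (expected value) $E(\bar{X})$ and variance $\mathrm{Var}(\bar{X})$:
$$
\begin{aligned}
E(\bar{X}) &= E\left[\frac{1}{n}(X_1 + X_2 + \cdots + X_n)\right] \\
&= \frac{1}{n}\left[E(X_1) + E(X_2) + \cdots + E(X_n)\right] \\
&= \frac{1}{n}(n\mu) \\
&= \mu
\end{aligned}
$$
$$
\begin{aligned}
\mathrm{Var}(\bar{X}) &= \mathrm{Var}\left[\frac{1}{n}(X_1 + X_2 + \cdots + X_n)\right] \\
&= \frac{1}{n^2}\left[\mathrm{Var}(X_1) + \mathrm{Var}(X_2) + \cdots + \mathrm{Var}(X_n)\right] \\
&= \frac{1}{n^2}(n\sigma^2) \\
&= \frac{\sigma^2}{n}
\end{aligned}
$$
(The second line of the variance calculation uses the independence of $X_1, X_2, \ldots, X_n$.)
Therefore $E(\bar{X}) = \mu$ = population mean
and $\mathrm{Var}(\bar{X}) = \dfrac{\sigma^2}{n} = \dfrac{\text{population variance}}{\text{sample size}}$.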
On average, the sample mean equals the population mean; that is, $\bar{X}$ is an unbiased estimator of μ.
Hence, by drawing many random samples of fixed size n, calculating their means and averaging them, we can obtain a good estimate of the population mean.
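The two results E(X̄) = μ and Var(X̄) = σ²/n can also be seen by simulation. The sketch below assumes a hypothetical population that is uniform on [0, 1) (mean 1/2, variance 1/12), draws many samples of size n = 10 and compares the average and the variance of the sample means with μ and σ²/n. The seed, n and the number of repetitions are arbitrary choices.

```python
import random
import statistics

random.seed(2)
n = 10            # sample size
reps = 50_000     # number of repeated samples

# Hypothetical population: uniform on [0, 1), which has mean 1/2 and variance 1/12
mu, sigma2 = 0.5, 1 / 12

sample_means = [statistics.fmean(random.random() for _ in range(n)) for _ in range(reps)]

avg_of_means = statistics.fmean(sample_means)
var_of_means = statistics.pvariance(sample_means)
print(f"average of sample means {avg_of_means:.4f} vs mu = {mu}")
print(f"variance of sample means {var_of_means:.5f} vs sigma^2/n = {sigma2 / n:.5f}")
```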
7.3 The Sampling Distribution of the Sample Mean
Consider X 1 , X 2 , …, X n , a random sample of size n, taken from a population with mean 
and variance  2 .
1 n
2
X

and
variance
.
,
has
mean
 i
n i 1
n

and is called the standard error of the mean.
The standard deviation of the distribution of X is
n
We know that the sample mean X , where X 
The distribution of X is known as the sampling distribution of means.
We now consider how the sample mean is distributed.
A. Sampling from a Normal Population
If $X_1, X_2, \ldots, X_n$ is a random sample of size n taken from a normal distribution $X \sim N(\mu, \sigma^2)$, then the distribution of $\bar{X}$ is also normal and
$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right), \quad\text{where } \bar{X} = \frac{1}{n}(X_1 + X_2 + \cdots + X_n)$$
The figure shows the relationship between the distributions of X and $\bar{X}$ for a normal population.
The diagram shows the distribution of X, where $X \sim N(\mu, \sigma^2)$, together with the distributions of $\bar{X}$ when n = 4 and when n = 16.
The distribution of $\bar{X}$ is narrower than that of the population.
Each curve is symmetrical about μ, but as n gets larger, the variance $\sigma^2/n$ of $\bar{X}$ gets smaller, so the curve becomes taller, narrower and less spread out.
Watch
http://onlinestatbook.com/2/sampling_distributions/samp_dist_meanM.html
Example 3
The weight of an egg is normally distributed with a mean of 62 grams and a standard deviation of 4 grams.
(a) If an egg is picked at random, find the probability that it weighs between 60 grams and 65 grams.
(b) Eggs are packed at random in boxes of 12. Find the probability that the average weight of the eggs
in a box lies between 60 grams and 65 grams.
Solution:
(a) Let the random variable X be the weight of an egg in grams.
Then X ~ N(62, 16)
The probability that an egg weighs between 60 grams and 65 grams is
$$
\begin{aligned}
P(60 < X < 65) &= P\left(\frac{60 - 62}{4} < Z < \frac{65 - 62}{4}\right) \\
&= P(-0.5 < Z < 0.75) \\
&= P(-0.5 < Z < 0) + P(0 < Z < 0.75) \\
&= P(0 < Z < 0.5) + P(0 < Z < 0.75) \\
&= 0.1915 + 0.2734 \\
&= 0.4649
\end{aligned}
$$
(b) Since X is normally distributed, $\bar{X}$ is also normally distributed and
$$\bar{X} \sim N\left(62, \frac{16}{12}\right), \quad\text{i.e.,}\quad \bar{X} \sim N\left(62, \frac{4}{3}\right)$$
The probability that the average weight of the eggs in a box lies between 60 grams and 65 grams is
$$
\begin{aligned}
P(60 < \bar{X} < 65) &= P\left(\frac{60 - 62}{\sqrt{4/3}} < Z < \frac{65 - 62}{\sqrt{4/3}}\right) \\
&= P(-1.73 < Z < 2.60) \\
&= P(-1.73 < Z < 0) + P(0 < Z < 2.60) \\
&= P(0 < Z < 1.73) + P(0 < Z < 2.60) \\
&= 0.4582 + 0.4953 \\
&= 0.9535
\end{aligned}
$$
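Example 3 can be checked with `statistics.NormalDist` (Python 3.8+). The only change between parts (a) and (b) is the standard deviation: σ for a single egg and σ/√n for the mean of a box of n = 12. The computed answer for (b) may differ slightly from 0.9535 because the hand calculation rounds z to 1.73 and 2.60.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 62, 4, 12

egg = NormalDist(mu, sigma)                  # one egg: N(62, 16)
box_mean = NormalDist(mu, sigma / sqrt(n))   # mean of a box of 12: N(62, 16/12)

p_single = egg.cdf(65) - egg.cdf(60)         # part (a)
p_box = box_mean.cdf(65) - box_mean.cdf(60)  # part (b)
print(round(p_single, 4), round(p_box, 4))
```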
B. Sampling from Any Population and Sample Size n is Large
When the population X is not normal, the distribution of $\bar{X}$ may have a form different from that of the population.
In many cases, the distribution of $\bar{X}$ is difficult to find, and its shape depends on the sample size n.
Fortunately, when n is large enough, the distribution of $\bar{X}$ is known to be approximately normal, whether the population is normal or not.
This is one of the most important results in statistics and is known as the Central Limit Theorem.
Central Limit Theorem
For any population X with mean μ and (finite) variance σ², the distribution of the sample mean $\bar{X}$ (based on random samples of size n) is approximately normal with mean μ and variance $\frac{\sigma^2}{n}$ when n is sufficiently large,
i.e., $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$ approximately, for large n.
This theorem explains why the normal distribution is so important in daily life and in statistical theory.
The sample size n required for the approximate normality to be valid depends on the nature of
the population distribution.
In most cases, n ≥ 30 may be considered sufficiently large for the Central Limit Theorem to apply.
If the population is itself normal, then the sample mean is exactly normally distributed
for any value of n.
The following figure illustrates the sampling distributions of X for some populations
with different sample sizes.
If $X_1, X_2, \ldots, X_n$ is a random sample of size n from any distribution X with mean μ and variance σ², then the distribution of the sample mean $\bar{X}$ is approximately normal for large n and
$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right), \quad\text{where } \bar{X} = \frac{1}{n}(X_1 + X_2 + \cdots + X_n)$$
The approximation gets better as n gets larger.
The distribution of X can be discrete, for example binomial or Poisson;
or continuous, for example rectangular or exponential.
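A small simulation illustrates the theorem. The sketch assumes a hypothetical exponential population with rate 1 (mean 1, variance 1, strongly right-skewed), draws 20,000 samples of size n = 30 and counts how often the sample mean falls within μ ± 1.96σ/√n. If X̄ is approximately N(μ, σ²/n), that proportion should be close to 0.95. The seed and the number of repetitions are arbitrary.

```python
import random
import statistics
from math import sqrt

random.seed(3)
n, reps = 30, 20_000

# Hypothetical skewed population: exponential with rate 1 (mean 1, variance 1)
mu, sigma = 1.0, 1.0
se = sigma / sqrt(n)          # standard error sigma / sqrt(n)

means = [statistics.fmean(random.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

# Under the CLT approximation Xbar ~ N(mu, sigma^2/n), about 95% of the sample
# means should lie within mu +/- 1.96 * se, even though the population is skewed.
inside = sum(abs(m - mu) < 1.96 * se for m in means) / reps
print(f"average of sample means: {statistics.fmean(means):.3f}")
print(f"proportion within mu +/- 1.96 SE: {inside:.3f}")
```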
Example 4
If a random sample of size 30 is taken from each of the following distributions, find, for each case,
the probability that the sample mean exceeds 5.
(a) X ~ Bin(9, 0.5)
(b) X ~ Po(4.5)
Solution:
(a) If X ~ Bin(9, 0.5), then
$$E(X) = \mu = np = 9 \times 0.5 = 4.5$$
$$\mathrm{Var}(X) = \sigma^2 = np(1 - p) = 9 \times 0.5 \times 0.5 = 2.25$$
The sample size is large, so by the central limit theorem,
$$\bar{X} \sim N\left(4.5, \frac{2.25}{30}\right) \text{ approximately}$$
Hence
$$P(\bar{X} > 5) = P\left(Z > \frac{5 - 4.5}{\sqrt{2.25/30}}\right) = P(Z > 1.83) = P(Z > 0) - P(0 < Z < 1.83) = 0.5 - 0.4664 = 0.0336$$
(b) If X ~ Po(4.5), then
$$E(X) = \mu = \lambda = 4.5$$
$$\mathrm{Var}(X) = \sigma^2 = \lambda = 4.5$$
The sample size is large, so by the central limit theorem,
$$\bar{X} \sim N\left(4.5, \frac{4.5}{30}\right) \text{ approximately}$$
Hence
$$P(\bar{X} > 5) = P\left(Z > \frac{5 - 4.5}{\sqrt{4.5/30}}\right) = P(Z > 1.29) = P(Z > 0) - P(0 < Z < 1.29) = 0.5 - 0.4015 = 0.0985$$
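The calculation pattern of Example 4 (find μ and σ² from the population, then apply the normal approximation to X̄) can be sketched as a small helper. The function name `prob_mean_exceeds` is hypothetical, and it relies on `statistics.NormalDist` (Python 3.8+). The results may differ from 0.0336 and 0.0985 in the fourth decimal place because z is not rounded.

```python
from math import sqrt
from statistics import NormalDist

def prob_mean_exceeds(threshold, mu, var, n):
    """CLT approximation of P(Xbar > threshold), taking Xbar ~ N(mu, var / n)."""
    return 1 - NormalDist(mu, sqrt(var / n)).cdf(threshold)

n = 30                                   # sample size

# (a) X ~ Bin(9, 0.5): mean = np, variance = np(1 - p)
trials, p = 9, 0.5
p_a = prob_mean_exceeds(5, trials * p, trials * p * (1 - p), n)

# (b) X ~ Po(4.5): mean = variance = lambda
lam = 4.5
p_b = prob_mean_exceeds(5, lam, lam, n)

print(round(p_a, 4), round(p_b, 4))
```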
Example 5
Suppose that the population distribution of the daily wages of clerks is known to have a mean of $230
and a standard deviation of $25. For a random sample of 100 clerks, what is the probability that
the sample mean daily wage will be between $225 and $233?
Solution:
Let the random variable X be the daily wage of a clerk in dollars.
It is given that μ = 230 and σ = 25.
As the sample size n = 100 is large enough (> 30), we do not need to know the nature of the population distribution.
From the central limit theorem,
$$\bar{X} \sim N\left(230, \frac{25^2}{100}\right) \text{ approximately, i.e., } \bar{X} \sim N(230, 6.25) \text{ approximately}$$
The probability that the sample mean daily wage will be between $225 and $233 is
$$
\begin{aligned}
P(225 < \bar{X} < 233) &= P\left(\frac{225 - 230}{\sqrt{6.25}} < Z < \frac{233 - 230}{\sqrt{6.25}}\right) \\
&= P(-2 < Z < 1.2) \\
&= P(-2 < Z < 0) + P(0 < Z < 1.2) \\
&= P(0 < Z < 2) + P(0 < Z < 1.2) \\
&= 0.4772 + 0.3849 \\
&= 0.8621
\end{aligned}
$$
Associate Degree 2020 – 2021 First Semester
CCMA4001 Quantitative Analysis I
Chapter 8
Estimation
Introduction
One major type of inferential statistics is estimating an unknown population parameter from the information collected in a sample. In this chapter, we will discuss how to estimate an unknown population mean and population proportion. We will also discuss how to determine the sample size needed to achieve a given level of accuracy.
Estimation of the Mean (σ known)
Point estimate
In the previous chapter, we saw that, according to the central limit theorem, the sample mean (approximately) follows the normal distribution
$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right).$$
The sample mean $\bar{x}$ is an unbiased point estimator of the population mean μ, as $E(\bar{X}) = \mu$.
Example
The weights of a particular brand of cola follow a normal distribution with an unknown population mean and a standard deviation of 15 grams. A sample of 25 cans of cola has a mean of 362.3 grams. What is the estimated population mean?
∴ The point estimate of μ is 362.3 grams.
Confidence Interval Estimate
When we estimate the population mean weight of this brand of cola as 362.3 grams from a particular sample, do we believe that the true population mean is exactly 362.3 grams? Because different samples from the same population give different sample means, we can only say that we believe the population mean is around 362.3 grams. How accurate, then, is the estimate?
As $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$, we know that
$$P\left(\mu - 1.96\frac{\sigma}{\sqrt{n}} < \bar{X} < \mu + 1.96\frac{\sigma}{\sqrt{n}}\right) = 0.95.$$
That means that if we repeatedly draw different samples, 95% of the sample means fall within the limits $\left(\mu - 1.96\frac{\sigma}{\sqrt{n}},\ \mu + 1.96\frac{\sigma}{\sqrt{n}}\right)$.
[Figure: sampling distribution of $\bar{X}$, with an area of 0.025 in each tail beyond $\mu - 1.96\sigma/\sqrt{n}$ and $\mu + 1.96\sigma/\sqrt{n}$.]
95% Confidence Intervals
Now, with μ unknown and estimated by $\bar{X}$, the interval
$$\left(\bar{X} - 1.96\frac{\sigma}{\sqrt{n}},\ \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right)$$
gives a 95% confidence interval estimate of the unknown μ. That means that if we repeatedly draw different samples, 95% of the intervals constructed in this way would contain the parameter μ.
∴ In the above example, with a single sample with sample mean 362.3, the 95% confidence interval estimate of μ is
$$\left(362.3 - 1.96\frac{15}{\sqrt{25}},\ 362.3 + 1.96\frac{15}{\sqrt{25}}\right) = (356.42,\ 368.18)$$
In general, a 100(1 − α)% confidence interval estimate of μ is given by
$$\left(\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$$
Commonly used confidence levels include:

| Confidence level | α | α/2 | $z_{\alpha/2}$ |
|---|---|---|---|
| 0.90 | 0.1 | 0.05 | 1.645 |
| 0.95 | 0.05 | 0.025 | 1.96 |
| 0.98 | 0.02 | 0.01 | 2.33 |
| 0.99 | 0.01 | 0.005 | 2.575 |
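The values of $z_{\alpha/2}$ in the table and the interval formula above can be computed as follows. A minimal sketch assuming `statistics.NormalDist` (Python 3.8+); the helper names `z_half` and `mean_ci` are our own. Exact quantiles give about 2.326 and 2.576 for the 98% and 99% levels, which the table rounds to 2.33 and 2.575.

```python
from math import sqrt
from statistics import NormalDist

def z_half(alpha):
    """z_{alpha/2}: the point with upper-tail area alpha/2 under N(0, 1)."""
    return NormalDist().inv_cdf(1 - alpha / 2)

def mean_ci(xbar, sigma, n, conf=0.95):
    """100(1 - alpha)% confidence interval for mu when sigma is known."""
    e = z_half(1 - conf) * sigma / sqrt(n)     # half-width (sampling error)
    return xbar - e, xbar + e

for conf in (0.90, 0.95, 0.98, 0.99):
    print(f"{conf:.2f}: z = {z_half(1 - conf):.3f}")

# Cola example: xbar = 362.3 g, sigma = 15 g, n = 25
lo, hi = mean_ci(362.3, 15, 25)
print(f"95% CI: ({lo:.2f}, {hi:.2f})")
```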
Sampling Error and Sample Size
Half the width of the confidence interval, i.e. $z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$, is the sampling error with 100(1 − α)% confidence. This helps determine the required sample size.
Example
The breaking force of a metal wire is normally distributed with an unknown mean and a standard deviation of 100 pounds. Suppose you want to estimate the population mean breaking force to within ±25 pounds of the true value with 90% confidence. How large should the sample be?
$$1.645 \times \frac{100}{\sqrt{n}} \le 25 \quad\Rightarrow\quad \sqrt{n} \ge 6.58 \quad\Rightarrow\quad n \ge 43.30$$
Since n must be a whole number, we round up: 44 items should be selected.
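Solving $z_{\alpha/2}\sigma/\sqrt{n} \le E$ for n gives $n \ge (z_{\alpha/2}\sigma/E)^2$, always rounded up. A short sketch, assuming Python 3.8+ for `statistics.NormalDist`; the helper name `sample_size_for_mean` is hypothetical.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_mean(sigma, margin, conf):
    """Smallest n with z_{alpha/2} * sigma / sqrt(n) <= margin (sigma known)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # z_{alpha/2}
    return ceil((z * sigma / margin) ** 2)          # always round up

# Breaking-force example: sigma = 100 pounds, margin = 25 pounds, 90% confidence
print(sample_size_for_mean(sigma=100, margin=25, conf=0.90))
```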
Estimation of the Proportion
Besides the estimation of the population mean, another commonly estimated
parameter is the population proportion.
Example
Before an election, an organization conducted a survey to investigate the support rate of each candidate. What is the point estimate of the population proportion of votes for Johnson if the survey indicates that 245 out of 500 respondents will vote for him? What is its 95% confidence interval estimate?
The point estimate of the population proportion is the sample proportion $\hat{p}$. With $\hat{p} \sim N\left(p, \frac{p(1-p)}{n}\right)$ approximately (for large n), the 100(1 − α)% confidence interval estimate of p is
$$\left(\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}},\ \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right)$$
∴ In the above example,
Point estimate of p = 245/500 = 0.49
95% C.I. of p
$$= \left(0.49 - 1.96\sqrt{\frac{0.49(0.51)}{500}},\ 0.49 + 1.96\sqrt{\frac{0.49(0.51)}{500}}\right) = (0.4462,\ 0.5338)$$
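The proportion interval can be sketched the same way. This assumes the large-sample normal approximation used above and Python 3.8+ for `statistics.NormalDist`; the helper `proportion_ci` is our own.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes, n, conf=0.95):
    """Large-sample CI for p: p_hat +/- z_{alpha/2} * sqrt(p_hat (1 - p_hat) / n)."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    e = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, (p_hat - e, p_hat + e)

# Election example: 245 of 500 respondents support Johnson
p_hat, (lo, hi) = proportion_ci(245, 500)
print(p_hat, round(lo, 4), round(hi, 4))
```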
The entries in Table I are the probabilities that a random variable having the standard normal distribution will take on a value between 0 and z. They are given by the area of the gray region under the curve in the figure.

TABLE I NORMAL-CURVE AREAS

| z | 0.00 | 0.01 | 0.02 | 0.03 | 0.04 | 0.05 | 0.06 | 0.07 | 0.08 | 0.09 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.0 | 0.0000 | 0.0040 | 0.0080 | 0.0120 | 0.0160 | 0.0199 | 0.0239 | 0.0279 | 0.0319 | 0.0359 |
| 0.1 | 0.0398 | 0.0438 | 0.0478 | 0.0517 | 0.0557 | 0.0596 | 0.0636 | 0.0675 | 0.0714 | 0.0753 |
| 0.2 | 0.0793 | 0.0832 | 0.0871 | 0.0910 | 0.0948 | 0.0987 | 0.1026 | 0.1064 | 0.1103 | 0.1141 |
| 0.3 | 0.1179 | 0.1217 | 0.1255 | 0.1293 | 0.1331 | 0.1368 | 0.1406 | 0.1443 | 0.1480 | 0.1517 |
| 0.4 | 0.1554 | 0.1591 | 0.1628 | 0.1664 | 0.1700 | 0.1736 | 0.1772 | 0.1808 | 0.1844 | 0.1879 |
| 0.5 | 0.1915 | 0.1950 | 0.1985 | 0.2019 | 0.2054 | 0.2088 | 0.2123 | 0.2157 | 0.2190 | 0.2224 |
| 0.6 | 0.2257 | 0.2291 | 0.2324 | 0.2357 | 0.2389 | 0.2422 | 0.2454 | 0.2486 | 0.2517 | 0.2549 |
| 0.7 | 0.2580 | 0.2611 | 0.2642 | 0.2673 | 0.2704 | 0.2734 | 0.2764 | 0.2794 | 0.2823 | 0.2852 |
| 0.8 | 0.2881 | 0.2910 | 0.2939 | 0.2967 | 0.2995 | 0.3023 | 0.3051 | 0.3078 | 0.3106 | 0.3133 |
| 0.9 | 0.3159 | 0.3186 | 0.3212 | 0.3238 | 0.3264 | 0.3289 | 0.3315 | 0.3340 | 0.3365 | 0.3389 |
| 1.0 | 0.3413 | 0.3438 | 0.3461 | 0.3485 | 0.3508 | 0.3531 | 0.3554 | 0.3577 | 0.3599 | 0.3621 |
| 1.1 | 0.3643 | 0.3665 | 0.3686 | 0.3708 | 0.3729 | 0.3749 | 0.3770 | 0.3790 | 0.3810 | 0.3830 |
| 1.2 | 0.3849 | 0.3869 | 0.3888 | 0.3907 | 0.3925 | 0.3944 | 0.3962 | 0.3980 | 0.3997 | 0.4015 |
| 1.3 | 0.4032 | 0.4049 | 0.4066 | 0.4082 | 0.4099 | 0.4115 | 0.4131 | 0.4147 | 0.4162 | 0.4177 |
| 1.4 | 0.4192 | 0.4207 | 0.4222 | 0.4236 | 0.4251 | 0.4265 | 0.4279 | 0.4292 | 0.4306 | 0.4319 |
| 1.5 | 0.4332 | 0.4345 | 0.4357 | 0.4370 | 0.4382 | 0.4394 | 0.4406 | 0.4418 | 0.4429 | 0.4441 |
| 1.6 | 0.4452 | 0.4463 | 0.4474 | 0.4484 | 0.4495 | 0.4505 | 0.4515 | 0.4525 | 0.4535 | 0.4545 |
| 1.7 | 0.4554 | 0.4564 | 0.4573 | 0.4582 | 0.4591 | 0.4599 | 0.4608 | 0.4616 | 0.4625 | 0.4633 |
| 1.8 | 0.4641 | 0.4648 | 0.4656 | 0.4664 | 0.4671 | 0.4678 | 0.4685 | 0.4692 | 0.4699 | 0.4706 |
| 1.9 | 0.4713 | 0.4719 | 0.4725 | 0.4732 | 0.4738 | 0.4744 | 0.4750 | 0.4756 | 0.4761 | 0.4767 |
| 2.0 | 0.4772 | 0.4778 | 0.4783 | 0.4788 | 0.4793 | 0.4798 | 0.4803 | 0.4808 | 0.4812 | 0.4817 |
| 2.1 | 0.4821 | 0.4826 | 0.4830 | 0.4834 | 0.4838 | 0.4842 | 0.4846 | 0.4850 | 0.4854 | 0.4857 |
| 2.2 | 0.4861 | 0.4864 | 0.4868 | 0.4871 | 0.4875 | 0.4878 | 0.4881 | 0.4884 | 0.4887 | 0.4890 |
| 2.3 | 0.4893 | 0.4896 | 0.4898 | 0.4901 | 0.4904 | 0.4906 | 0.4909 | 0.4911 | 0.4913 | 0.4916 |
| 2.4 | 0.4918 | 0.4920 | 0.4922 | 0.4925 | 0.4927 | 0.4929 | 0.4931 | 0.4932 | 0.4934 | 0.4936 |
| 2.5 | 0.4938 | 0.4940 | 0.4941 | 0.4943 | 0.4945 | 0.4946 | 0.4948 | 0.4949 | 0.4951 | 0.4952 |
| 2.6 | 0.4953 | 0.4955 | 0.4956 | 0.4957 | 0.4959 | 0.4960 | 0.4961 | 0.4962 | 0.4963 | 0.4964 |
| 2.7 | 0.4965 | 0.4966 | 0.4967 | 0.4968 | 0.4969 | 0.4970 | 0.4971 | 0.4972 | 0.4973 | 0.4974 |
| 2.8 | 0.4974 | 0.4975 | 0.4976 | 0.4977 | 0.4977 | 0.4978 | 0.4979 | 0.4979 | 0.4980 | 0.4981 |
| 2.9 | 0.4981 | 0.4982 | 0.4982 | 0.4983 | 0.4984 | 0.4984 | 0.4985 | 0.4985 | 0.4986 | 0.4986 |
| 3.0 | 0.4987 | 0.4987 | 0.4987 | 0.4988 | 0.4988 | 0.4989 | 0.4989 | 0.4989 | 0.4990 | 0.4990 |
Also, for z = 4.0, 5.0 and 6.0, the areas are 0.49997, 0.4999997, and 0.499999999.
Associate Degree 2020 – 2021 First Semester
CCMA4001 Quantitative Analysis I
Chapter 9
Testing Hypothesis
Introduction
In the previous chapter, we looked at the type of inferential statistics that estimates an unknown population parameter from the data collected in a sample. In this chapter, we look at another type of inferential statistics, in which an assumption about a population parameter is tested using the information provided by a sample.
Null Hypothesis and Alternative Hypothesis
Very often, rather than knowing nothing about an unknown population parameter, we have some assumptions or theories about it, and we want to test whether the assumption is correct.
The null hypothesis, H0, is the statement that contains the assumption we want to test (it always includes the equal sign "="). The alternative hypothesis, H1, is the opposite of the null hypothesis (it should not include the equal sign "="). The alternative hypothesis can be one-sided or two-sided, depending on what we are trying to prove.
Example
Last year, the average amount of ABC shop's sales invoices was $1000, with a standard deviation of $60.
1. We want to test whether the average amount of this year's sales invoices is the same as last year's ($1000), OR
2. We want to test whether the average amount of this year's sales invoices is more than last year's ($1000).
∴ In case 1, H0: μ = $1000 vs. H1: μ ≠ $1000
∴ In case 2, H0: μ = $1000 vs. H1: μ > $1000
Type I and Type II errors
Whether the test is two-sided or one-sided, we need to make our decision based on an estimate computed from a sample.
Because of sampling error, we may sometimes make a wrong decision.
There are two types of errors.

| Decision | H0 is true | H0 is false |
|---|---|---|
| Do not reject H0 | Correct decision | Type II error (probability = β) |
| Reject H0 | Type I error (probability = α) | Correct decision |
The probability of committing a Type I error, α, also known as the level of significance of the test, is decided before the test is conducted. This level of significance, together with the distribution of the test statistic, determines the rejection region of the test.
In statistical hypothesis testing, a type I error is the rejection of a true null
hypothesis (also known as a "false positive" finding or conclusion; example:
"an innocent person is convicted"), while a type II error is the non-rejection of
a false null hypothesis (also known as a "false negative" finding or conclusion;
example: "a guilty person is not convicted"). Much of statistical theory
revolves around the minimization of one or both of these errors, though the
complete elimination of either is statistically impossible.
Test for Hypothesis for the Mean (σ known)
When the variable X is normally distributed, or when the sample size is large enough, $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$, where E(X) = μ and Var(X) = σ². When the null hypothesis H0: μ = μ0 is true, the Z-statistic
$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$$
should follow the standard normal distribution.
We reject the null hypothesis H0: μ = μ0 when the z statistic falls into the rejection region.
Two-sided test
H0: μ = μ0 vs. H1: μ ≠ μ0
We reject the null hypothesis if z is either too large ($z > z_{\alpha/2}$) or too small ($z < -z_{\alpha/2}$).
[Figure: standard normal curve of Z; the rejection region, with total probability α, lies beyond $-z_{\alpha/2}$ and $z_{\alpha/2}$.]
One-sided test
H0: μ = μ0 vs. H1: μ > μ0
We reject the null hypothesis when z is too large ($z > z_{\alpha}$).
[Figure: standard normal curve of Z; the rejection region, with probability α, lies to the right of $z_{\alpha}$.]
One-sided test
H0: μ = μ0 vs. H1: μ < μ0
We reject the null hypothesis when z is too small ($z < -z_{\alpha}$).
[Figure: standard normal curve of Z; the rejection region, with probability α, lies to the left of $-z_{\alpha}$.]
In the above example, if a random sample of n = 30 of this year's sales invoices has a sample mean of $1030, should the null hypothesis be rejected at the 0.05 level of significance?
$$z = \frac{1030 - 1000}{60/\sqrt{30}} = 2.7386$$
Case 1: As 2.7386 > 1.96 ($z_{0.025}$ = 1.96), the null hypothesis is rejected.
Case 2: As 2.7386 > 1.645 ($z_{0.05}$ = 1.645), the null hypothesis is rejected.
(Remember that you must decide in advance whether to do a two-tailed test or a one-tailed test, rather than doing both.)
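The decision rules above can be collected into one function. The sketch below is illustrative: the function name and the `alternative` labels ("two-sided", "greater", "less") are our own conventions, σ is assumed known, and Python 3.8+ is assumed for `statistics.NormalDist`. Applied to the sales-invoice example, it should reproduce z ≈ 2.7386 and rejection in both cases.

```python
from math import sqrt
from statistics import NormalDist

def z_test_mean(xbar, mu0, sigma, n, alpha=0.05, alternative="two-sided"):
    """One-sample z-test for a mean with sigma known.

    alternative: "two-sided" (H1: mu != mu0), "greater" (H1: mu > mu0)
    or "less" (H1: mu < mu0). Returns (z, reject_h0).
    """
    z = (xbar - mu0) / (sigma / sqrt(n))
    std_normal = NormalDist()
    if alternative == "two-sided":
        reject = abs(z) > std_normal.inv_cdf(1 - alpha / 2)   # |z| > z_{alpha/2}
    elif alternative == "greater":
        reject = z > std_normal.inv_cdf(1 - alpha)            # z > z_alpha
    elif alternative == "less":
        reject = z < -std_normal.inv_cdf(1 - alpha)           # z < -z_alpha
    else:
        raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")
    return z, reject

# Sales-invoice example: n = 30, sample mean 1030, sigma = 60, alpha = 0.05
print(z_test_mean(1030, 1000, 60, 30, alternative="two-sided"))  # case 1
print(z_test_mean(1030, 1000, 60, 30, alternative="greater"))    # case 2
```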
Test for Hypothesis for the Proportion
As $\hat{p} \sim N\left(p, \frac{p(1-p)}{n}\right)$ approximately, under the null hypothesis H0: p = p0 the Z-statistic should follow the standard normal distribution:
$$Z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1 - p_0)}{n}}} \sim N(0, 1^2)$$
We reject the null hypothesis H0: p = p0 when:

| Test | Rejection region |
|---|---|
| H0: p = p0 vs. H1: p ≠ p0 | either z is too large ($z > z_{\alpha/2}$) or z is too small ($z < -z_{\alpha/2}$) |
| H0: p = p0 vs. H1: p > p0 | z is too large ($z > z_{\alpha}$) |
| H0: p = p0 vs. H1: p < p0 | z is too small ($z < -z_{\alpha}$) |
Example
A coin is suspected if it is fair or not. This single coin is tossed 200 times and
120 times resulted at head. Test at 0.10 level of significance if the coin is fair?
 H0: p =0.5 v.s. H1: p ≠0.5
0.6  0.5
z
 2.8284
0.5(0.5) / 200
As 2.8284 > 1.645, the null hypothesis is rejected.
concluded as unfair.
The coin is
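A minimal sketch of the coin test in Python (3.8+, `statistics.NormalDist`). Note that the standard error uses the hypothesised p0 rather than $\hat{p}$, as in the Z-statistic above. The function name `z_stat_proportion` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_stat_proportion(successes, n, p0):
    """Z = (p_hat - p0) / sqrt(p0 (1 - p0) / n), using p0 in the standard error."""
    p_hat = successes / n
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

alpha = 0.10
z = z_stat_proportion(120, 200, 0.5)
critical = NormalDist().inv_cdf(1 - alpha / 2)    # z_{alpha/2} = z_{0.05}
decision = "reject H0" if abs(z) > critical else "do not reject H0"
print(f"z = {z:.4f}, critical value = {critical:.3f}: {decision}")
```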
p-value in Statistics
When you perform a hypothesis test in statistics, a p-value helps you determine the significance of your results. This is another approach to hypothesis testing.
How to find the p-value
To find the p-value, we first compute the test statistic z. Next, we find the tail probability corresponding to the z value obtained, using the z table or a calculator.
For example, let us find the p-value corresponding to z ≥ 2.81. From the normal table, P(0 < Z < 2.81) = 0.4975, so the corresponding p-value is 0.5 − 0.4975 = 0.0025. If we use 5% as the significance level for a one-sided test, we reject the null hypothesis, since the p-value is less than 5%.
Example
A coin is suspected of being unfair. The coin is tossed 200 times and lands heads 120 times. Test at the 0.10 level of significance whether the coin is fair. Use the p-value approach.
H0: p = 0.5 vs. H1: p ≠ 0.5
$$z = \frac{0.6 - 0.5}{\sqrt{0.5(0.5)/200}} = 2.8284$$
From the normal table, for z = 2.8284 ≈ 2.83, P(0 < Z < 2.83) = 0.4977.
p-value (one tail) = 0.5 − 0.4977 = 0.0023
Since 0.0023 < 0.05 (the α/2 value for a two-sided test with α = 0.10), the null hypothesis is rejected. (Equivalently, the two-sided p-value 2 × 0.0023 = 0.0046 is less than α = 0.10.) The coin is concluded to be unfair.
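The tail areas used in the p-value approach can be computed directly. The sketch below (hypothetical helper `p_values`, Python 3.8+) returns both the one-tailed area and the two-tailed p-value, so either comparison (one tail against α/2, or both tails against α) can be made.

```python
from statistics import NormalDist

def p_values(z):
    """Return (one-tailed, two-tailed) p-values for an observed z under N(0, 1)."""
    one_tail = 1 - NormalDist().cdf(abs(z))   # P(Z >= |z|)
    return one_tail, 2 * one_tail

alpha = 0.10
one_tail, two_tail = p_values(2.8284)
print(f"one-tailed area {one_tail:.4f}, two-tailed p-value {two_tail:.4f}")
# The two decision rules agree: one tail vs alpha/2, or both tails vs alpha
print(one_tail < alpha / 2, two_tail < alpha)
```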
Other resources (section 6.1 – 6.6)
https://www.jbstatistics.com/category/hypothesis-testing/
Associate Degree 2021 – 2022 Second Semester
CCMA4001 Quantitative Analysis I
Chapter 10
Excel Statistical Function
Introduction
Excel provides an extensive range of statistical functions that perform calculations ranging from the basic mean, median and mode to more complex statistical distributions and probability tests.
Commonly used Excel statistical functions are listed in the table below.
| Function name | Description |
|---|---|
| COUNT | Returns the number of numerical values in a supplied set of cells or values |
| MAX | Returns the largest value from a list of supplied numbers |
| MIN | Returns the smallest value from a list of supplied numbers |
| AVERAGE | Returns the average of a list of supplied numbers |
| MEDIAN | Returns the median (the middle value) of a list of supplied numbers |
| MODE | Returns the mode (the most frequently occurring value) of a list of supplied numbers |
| PERCENTILE.INC | Returns the K'th percentile of values in a supplied range, where K is in the range 0 to 1 (inclusive) |
| QUARTILE.INC | Returns the specified quartile of a set of supplied numbers, based on a percentile value from 0 to 1 (inclusive) |
| STDEV.S | Returns the standard deviation of a supplied set of values (which represent a sample of a population) |
| STDEV.P | Returns the standard deviation of a supplied set of values (which represent an entire population) |
| VAR.S | Returns the variance of a supplied set of values (which represent a sample of a population) |
| VAR.P | Returns the variance of a supplied set of values (which represent an entire population) |
| SKEW | Returns the skewness of a distribution |
| FACT | Returns the factorial of a number |
| PERMUT | Returns the number of permutations for a given number of objects |
| BINOM.DIST | Returns the individual term binomial distribution probability |
| POISSON.DIST | Returns the Poisson distribution |
| GAUSS | Calculates the probability that a member of a standard normal population will fall between the mean and z standard deviations from the mean |
| NORM.INV | Returns the inverse of the normal cumulative distribution |
| STANDARDIZE | Returns a normalized value |
| CONFIDENCE.NORM | Returns the confidence interval for a population mean, using a normal distribution (the value returned is the half-width $z_{\alpha/2}\sigma/\sqrt{n}$) |
| Z.TEST | Returns the one-tailed probability value of a z-test |
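For readers who want to check spreadsheet results outside Excel, here is a sketch of Python counterparts for four functions in the table. They follow Excel's documented definitions as we understand them (STANDARDIZE(x, mean, standard_dev), GAUSS(z), NORM.INV(probability, mean, standard_dev) and CONFIDENCE.NORM(alpha, standard_dev, size)); the Python names are our own, and results should be compared against Excel itself before relying on them.

```python
from math import sqrt
from statistics import NormalDist

def standardize(x, mean, standard_dev):
    """Counterpart of Excel STANDARDIZE: (x - mean) / standard_dev."""
    return (x - mean) / standard_dev

def gauss(z):
    """Counterpart of Excel GAUSS: P(0 < Z < z) = Phi(z) - 0.5 (the Table I value)."""
    return NormalDist().cdf(z) - 0.5

def norm_inv(probability, mean, standard_dev):
    """Counterpart of Excel NORM.INV: the x with P(X <= x) = probability."""
    return NormalDist(mean, standard_dev).inv_cdf(probability)

def confidence_norm(alpha, standard_dev, size):
    """Counterpart of Excel CONFIDENCE.NORM: half-width z_{alpha/2} * sigma / sqrt(n)."""
    return NormalDist().inv_cdf(1 - alpha / 2) * standard_dev / sqrt(size)

print(standardize(33, 30, 4))                   # z-value from Example 3(a)
print(round(gauss(1.96), 4))                    # table entry for z = 1.96
print(round(norm_inv(0.95, 100, 15), 2))        # 95th percentile of N(100, 225)
print(round(confidence_norm(0.05, 15, 25), 2))  # cola example half-width
```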
There are many more Excel functions than can be included here. You may also visit the following websites to learn more.
The attached files contain exercises, with answers, on how to use these Excel functions. Please attempt them and check your answers.
Useful Website
https://exceljet.net/excel-functions/excel-fact-function
https://support.microsoft.com/en-us/office/excel-functions-by-category-5f91f4e9-7b42-46d2-9bd1-63f26a86c0eb