Quantitative Analysis I Course Syllabus

I. Course Information

Course Title: Quantitative Analysis I
Course Code: CCMA4001
Credits: 3
QF Credits: 12
QF Level: 4

II. Course Objectives

This course introduces students to statistical knowledge as applied to daily or academic situations and equips them with the essential skills to identify and apply appropriate techniques to solve problems. Students will be exposed to concepts such as sampling techniques, graphical presentations, statistical measures, classical probability distributions, point and interval estimation, and hypothesis testing. Students will learn how to perform data analysis by hand and with Excel.

III. Syllabus

1. Sampling and Summarizing Data and Statistical Descriptions
• Frequency Distributions
• Stem-and-Leaf Displays
• Graphical Presentations
• Box-and-Whisker Plots
• Measures of Location (mean, mode and median)
• Measures of Variation (range, IQR, variance and standard deviation)

2. Possibilities and Probabilities
• Permutations
• Combinations
• Probability
• Mathematical Expectation
• Some rules concerning probability
• Sample spaces and Events
• Addition rules
• Conditional Probability
• Independent Events
• Multiplication Rules
• Bayes' Theorem

3. Probability Distributions
• Bernoulli Distribution
• Binomial Distribution
• Geometric Distribution
• Poisson Distribution
• Normal Distribution

4. Sampling and Sampling Distribution
• Different sampling methods
• Sampling Distribution of the Mean
• Sampling Distribution of the Proportion
• The Central Limit Theorem

5. Point and Interval Estimation
• Point estimation
• Interval estimation of a population mean
• Interval estimation of a population proportion
• Determining the size of the sample

6. Hypothesis Testing
• Reasons for testing hypotheses
• Steps in Hypothesis Testing
• Hypothesis Testing on means (one sample)
• Hypothesis Testing on proportions (one sample)
• Type I and Type II errors

7. Statistical Analysis Using Excel
• Data Analysis dialog box in Excel
• Excel Statistical Functions
• Confidence interval estimation using Excel
• Hypothesis Testing using Excel

IV. Assessment

Four assignments (60%) and one end-of-term assessment (40%)

VIII. Required and Recommended Reading (please use The Harvard Convention/APA Convention)

References
1. Haeussler, Ernest F., Paul, Richard S. & Wood, R. J. (2011), Introductory Mathematical Analysis for Business, Economics, and the Life and Social Sciences, 13th edition, Pearson Education Limited.
2. Berenson, Mark L., Levine, David M. & Szabat, Kathryn A. (2015), Basic Business Statistics: Concepts and Applications, 13th edition, Pearson Education Limited.

Associate Degree 2022 – 2023 First Semester
CCMA4001 Quantitative Analysis I

Chapter 1 Summarizing and Describing Data

1.1 Summarizing Data

To visualize the distribution of a set of raw data, we compile the data into a more comprehensible form, making use of tables and graphs.

A. Frequency Tables

Given a set of raw data, we usually arrange it into a frequency distribution: we collect 'like' quantities and record how many of each type there are to form a frequency table.

Example 1
In a multiple-choice test with 10 questions, the numbers of correct answers of 40 students are as follows.

10  4  9  6  7  4  8  7  7  5
 5  8  7  9 10  6  5  9  7  6
 4  7  5  6  7  9  5  8  8  4
 8  7  7  5  5  4  8  6  6  6

Construct a frequency table for these data.

Solution:

Number of correct answers | Tally       | Frequency
 4                        | /////       | 5
 5                        | ///// //    | 7
 6                        | ///// //    | 7
 7                        | ///// ////  | 9
 8                        | ///// /     | 6
 9                        | ////        | 4
10                        | //          | 2
                            Total:        40

B. Bar Chart

Example 2
Using the frequency table constructed in Example 1, draw a bar chart for the distribution of the number of correct answers of 40 students in the multiple-choice test.
Solution:

[Bar chart: "Distribution of the number of correct answers of 40 students in the multiple-choice test". Horizontal axis: Number of correct answers (4 to 10). Vertical axis: Frequency (Number of students), 0 to 10.]

C. Stem-and-leaf diagrams

A very useful graphical representation of a frequency distribution is the stem-and-leaf diagram (or stemplot). The stem-and-leaf diagram combines a graphical technique with a sorting technique. Sorting means listing the data in rank order according to numerical value, and the data values themselves are used to do this sorting. The "stem" is the leading digit(s) of the data, while the "leaf" is the trailing digit. For example, the numerical data 386 might be split 38 – 6 as shown:

Leading digits: 38 (used in sorting)
Trailing digit: 6 (shown in display)

A stem-and-leaf diagram is a method of presenting a data set so that gaps or concentrations in the data become visible.

Example 3
Suppose that a class of 40 students obtained the following results in a Mathematics test.

61  80  55  70  76  73 100  90  64  62
75  64  62  66  46  61  67  39  58  63
63  64  51  40  66  43  38  37  28  71
70  49  48  68  86  27  69  74  37  56

Construct a stem-and-leaf diagram for these data.

Solution:

Stem (Tens) | Leaf (Units)
  2 | 7 8
  3 | 7 7 8 9
  4 | 0 3 6 8 9
  5 | 1 5 6 8
  6 | 1 1 2 2 3 3 4 4 4 6 6 7 8 9
  7 | 0 0 1 3 4 5 6
  8 | 0 6
  9 | 0
 10 | 0

Advantages of a stem-and-leaf diagram
1. It is easy to construct. In fact, it is no more difficult to construct than a frequency table.
2. It is partly a table and partly a graph, so it immediately and directly gives a good picture of the frequency distribution without the need to prepare a frequency table first and construct charts afterwards.
3. Since the actual data are recorded in the diagram, it retains the information about the original data, and that information can be recovered readily. In a frequency table or histogram, data are represented by tallies or areas of rectangles in class intervals, so some information about the original data is lost and cannot be recovered. For example, the reading 64 is recorded in its entirety in a stem-and-leaf diagram, but is represented only by a count of 1 in a class interval (e.g. 60 – 64) in a frequency table or histogram.
4. It can be regarded as the original set of data arranged in ascending order of magnitude. Hence it can be readily used for finding quartiles.

Disadvantages of a stem-and-leaf diagram
1. For some types of data, the number of stems that can be chosen is either very small or very large, which makes the diagram inconvenient to construct and unable to show the distribution effectively.
2. It is not well suited to large sets of data. For a large set of data, the purpose of graphical representation is to give a good overall picture of the distribution rather than to show the details of the data. A bar chart or a histogram is more suitable in this case.

Example 4
A fishery expert found the following concentrations of mercury, in parts per million, in thirty fish caught in a certain stream.

0.024  0.031  0.052  0.024  0.024  0.030  0.056  0.034  0.059  0.068
0.035  0.021  0.052  0.023  0.054  0.028  0.037  0.034  0.048  0.040
0.022  0.049  0.043  0.034  0.032  0.021  0.040  0.032  0.021  0.039

Construct a double-stem diagram for these data.

Solution:

Stem (Unit = 0.01) | Leaf (Unit = 0.001)
  2 | 1 1 1 2 3 4 4 4
  2 | 8
  3 | 0 1 2 2 4 4 4
  3 | 5 7 9
  4 | 0 0 3
  4 | 8 9
  5 | 2 2 4
  5 | 6 9
  6 |
  6 | 8

In the above diagram, the units of the stems and leaves have been chosen to make the recorded digits simple. This is an important feature of a stem-and-leaf diagram.

1.2 Statistical Descriptions

In statistics, there are two useful types of measure which characterize any set of data or frequency distribution. The first type, a measure of 'centralization', attempts to locate a typical value about which the distribution clusters. This type of measure is called an average, a measure of central tendency or a measure of location. The second type measures how scattered or spread out a distribution is and is called a measure of dispersion.
In the figures shown, (a) shows two distributions with different measures of central tendency but roughly the same spread, and (b) illustrates two distributions with the same measure of central tendency but different spreads.

[Figure: panels (a) and (b)]

I. Measures of Central Tendency

The most common measures of central tendency or average are the mean, the median and the mode.

A. Mean (Arithmetic Mean)

Given the complete set of N data $\{x_1, x_2, \ldots, x_N\}$ in a population, the mean $\mu$ is defined as

$\mu = \frac{1}{N}(x_1 + x_2 + \cdots + x_N)$  or  $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$

The mean is usually denoted by the Greek letter $\mu$ (pronounced as mu).

If the set of n data $\{x_{p_1}, x_{p_2}, \ldots, x_{p_n}\}$, where the $p_i$'s are a set of integers selected from 1 to N, is a sample of size n drawn from a population, then the sample mean is defined similarly, but is denoted by $\bar{x}$ (read as x bar). Thus

$\bar{x} = \frac{1}{n}(x_{p_1} + x_{p_2} + \cdots + x_{p_n})$  or  $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_{p_i}$

The notation $x_{p_i}$ for the elements of the sample may be a bit difficult for beginners. Hence, when no misunderstanding arises, we shall denote the sample of size n simply as $\{x_1, x_2, \ldots, x_n\}$, bearing in mind that the element $x_i$ in the sample is, in general, not the same as the element $x_i$ in the population. With this understanding, the sample mean is

$\bar{x} = \frac{1}{n}(x_1 + x_2 + \cdots + x_n)$  or  $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$

Example 4
Suppose that a class of 40 students obtained the following results in a Mathematics test.

61  80  55  70  76  73 100  90  64  62
75  64  62  66  46  61  67  39  58  63
63  64  51  40  66  43  38  37  28  71
70  49  48  68  86  27  69  74  37  56

(a) Find the mean of the population of Mathematics test marks.
(b) The following two samples have each been drawn randomly from the population of Mathematics test marks.
    $S_1$ = {70, 43, 28, 69, 75, 90}
    $S_2$ = {68, 62, 48, 39, 38, 55, 66, 71, 37, 76}
    Find the means of these samples.
(c) Find the mean of the sample $S_3$ formed by combining the samples $S_1$ and $S_2$.

Solution:
(a) The population mean is

$\mu = \frac{1}{40}(61 + 80 + 55 + 70 + 76 + 73 + 100 + 90 + 64 + 62 + 75 + 64 + 62 + 66 + 46 + 61 + 67 + 39 + 58 + 63 + 63 + 64 + 51 + 40 + 66 + 43 + 38 + 37 + 28 + 71 + 70 + 49 + 48 + 68 + 86 + 27 + 69 + 74 + 37 + 56) = 60.425$

(b) The sample mean of $S_1$ is

$\bar{x}_1 = \frac{1}{6}(70 + 43 + 28 + 69 + 75 + 90) = 62.5$

The sample mean of $S_2$ is

$\bar{x}_2 = \frac{1}{10}(68 + 62 + 48 + 39 + 38 + 55 + 66 + 71 + 37 + 76) = 56$

Note that a population mean is a unique value, but the sample mean varies from sample to sample.

(c) The sample mean of $S_3$ is

$\bar{x}_3 = \frac{1}{16}(70 + 43 + 28 + 69 + 75 + 90 + 68 + 62 + 48 + 39 + 38 + 55 + 66 + 71 + 37 + 76) = 58.4375$

or

$\bar{x}_3 = \frac{62.5 \times 6 + 56 \times 10}{6 + 10} = 58.4375$

B. Median

The median is a measure of position. It is the middle value in an ordered sequence of data. To find the median from a set of data collected in its raw form, we must first arrange the data in rank order, from the smallest to the largest observation. Such an ordered sequence of data is called an ordered array.

For a set of discrete data $x_1, x_2, \ldots, x_n$ arranged in ascending order,
(i) if n is odd, the median is $x_{\frac{n+1}{2}}$, the value of the datum that is in the middle;
(ii) if n is even, the median is $\frac{1}{2}\left(x_{\frac{n}{2}} + x_{\frac{n}{2}+1}\right)$, the mean of the two data that are nearest to the middle.

Example 5
(a) Find the median of the set of data {12, 8, 13, 16, 5}.
(b) Find the median of the set of data {25, 25, 37, 26, 25, 12, 75, 75}.

Solution:
(a) Arrange the set of five data in ascending order: 5, 8, 12, 13, 16. The median is

$x_{\frac{5+1}{2}} = x_3 = 12$

(b) Arrange the set of eight data in ascending order: 12, 25, 25, 25, 26, 37, 75, 75. The median is

$\frac{1}{2}\left(x_{\frac{8}{2}} + x_{\frac{8}{2}+1}\right) = \frac{1}{2}(x_4 + x_5) = \frac{1}{2}(25 + 26) = 25.5$

C. Mode

The mode of a set of data is the value that occurs with the highest frequency. In this sense it is "most typical" of a set of data. For example, for the data 1, 1, 2, 2, 2, 3, 3, 4, 5, 5, 6, the mode is 2.
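As a quick check of the median rule and the mode definition above, here is a minimal sketch using Python's standard `statistics` module. The variable names are illustrative assumptions and not part of the course material; the data are those of Example 5 and of the mode illustration.

```python
import statistics

# Data from Example 5 and from the mode illustration above.
odd_set = [12, 8, 13, 16, 5]
even_set = [25, 25, 37, 26, 25, 12, 75, 75]
mode_set = [1, 1, 2, 2, 2, 3, 3, 4, 5, 5, 6]

# statistics.median sorts the data, then takes the middle value (n odd)
# or the mean of the two middle values (n even).
median_odd = statistics.median(odd_set)
median_even = statistics.median(even_set)

# statistics.multimode returns every value that attains the highest frequency,
# so it also copes with data sets that have more than one mode.
modes = statistics.multimode(mode_set)

print(median_odd, median_even, modes)
```

The results agree with the hand calculations: 12 and 25.5 for the two medians, and a single mode, 2.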
A distribution with one mode is called a unimodal distribution, one with two modes is bimodal, and one with three or more modes is multimodal.

The two main advantages of the mode are that it requires no calculation, only counting, and that it can be determined for qualitative as well as quantitative data. However, if all the values in a set of data are different, the mode is of no use.

Example 6
Suppose that 50 children are asked which of six brands of soft drink they prefer most and the following results are obtained.

Brand              | A | B  | C | D | E | F
Number of children | 4 | 15 | 5 | 8 | 3 | 15

Find the mode of these data.

Solution:
There are two modes in this set, namely B and F. This set is said to be bimodal.

II. Measures of Dispersion

The measures of central tendency provide only brief information on a set of data. The averages alone cannot tell us how spread out or dispersed the data are. We need a measure of dispersion, a numerical value indicating the amount of scatter about a central point. Widely dispersed data are also highly variable data. Hence measures of dispersion are also called measures of variability.

The most common measures of dispersion in statistics are the range, the inter-quartile range, the variance and the standard deviation.

A. Range

The range of a set of data is the difference between the largest value and the smallest value of the set. In general, the greater the range, the greater the dispersion of the set of data.

Example 7
Find the range of scores of athletes A and B in Example 11.

Solution:
The range of scores of athlete A = 9.5 – 6.0 = 3.5
The range of scores of athlete B = 8.0 – 7.0 = 1.0
Since the range of scores of athlete A is greater than that of athlete B, we say that the scores of athlete A are more dispersed than those of athlete B.

B. Inter-quartile range

With the set of data arranged in ascending order, the median is the value which divides the set of data into two equal parts. Similarly, if we divide the set of data into four equal parts, the corresponding values, denoted by $Q_1$, $Q_2$, $Q_3$, are called the first, second and third quartiles respectively, and $Q_2$ is just the median of the distribution.

The inter-quartile range (IQR) of a set of data is defined as $Q_3 - Q_1$. It measures approximately how far from the median we must go on either side before we can include one-half of the values of the data set.

When the set of data is divided into 100 equal parts, the dividing values are called percentiles and are denoted by $P_1, P_2, \ldots, P_{99}$. The 50th percentile, $P_{50}$, corresponds to the median, whereas $P_{25}$ and $P_{75}$ correspond to $Q_1$ and $Q_3$ respectively.

The p-th percentile of a data set is a value such that at least p percent of the items take on this value or less and at least (100 – p) percent of the items take on this value or more.

$Q_1$ is the first quartile (or lower quartile), where 25% of the data lie below it;
$Q_2$ is the second quartile (or middle quartile or median), where 50% of the data lie below it; and
$Q_3$ is the third quartile (or upper quartile), where 75% of the data lie below it.

To find the p-th percentile, first arrange the set of discrete data $x_1, x_2, \ldots, x_n$ in ascending order, then compute the index i, where

$i = \frac{p}{100} \times n$

to find the position of the p-th percentile.
If i is not an integer, round up to the nearest integer. The p-th percentile is the value in the i-th position.
If i is an integer, the p-th percentile is the average of the values in positions i and i + 1.

Example 8
(a) Find the inter-quartile range of the data set A {14, 23, 16, 18, 15, 44, 19}.
(b) Find the inter-quartile range of the data set B {10, 15, 40, 28, 34, 18, 24, 30}.
(c) By comparing the inter-quartile ranges of the data sets A and B, which set has the greater dispersion?

Solution:
(a) Arrange the seven data of the data set A in ascending order: 14, 15, 16, 18, 19, 23, 44.
For the 25th percentile, the index $i = \frac{25}{100} \times 7 = 1.75$, which rounds up to 2, hence $Q_1 = x_2 = 15$.
For the 75th percentile, the index $i = \frac{75}{100} \times 7 = 5.25$, which rounds up to 6, hence $Q_3 = x_6 = 23$.
The inter-quartile range $= Q_3 - Q_1 = 23 - 15 = 8$

(b) Arrange the eight data of the data set B in ascending order: 10, 15, 18, 24, 28, 30, 34, 40.
For the 25th percentile, the index $i = \frac{25}{100} \times 8 = 2$, hence $Q_1 = \frac{1}{2}(x_2 + x_3) = \frac{1}{2}(15 + 18) = 16.5$.
For the 75th percentile, the index $i = \frac{75}{100} \times 8 = 6$, hence $Q_3 = \frac{1}{2}(x_6 + x_7) = \frac{1}{2}(30 + 34) = 32$.
The inter-quartile range $= Q_3 - Q_1 = 32 - 16.5 = 15.5$

(c) The range of both data sets A and B is 30. However, the inter-quartile range of data set A is less than that of data set B, so data set B has the greater dispersion.

The range considers the difference between the maximum and minimum values of a set of data. The inter-quartile range considers the range of the middle 50% of the data and thus avoids the impact of extreme values. Therefore, if there are extreme values in a set of data, the inter-quartile range is a better measure of dispersion than the range. Moreover, the inter-quartile range exists even if the set of data has open ends.

Box-and-Whisker Diagram

The median, the lower quartile and the upper quartile, together with the maximum and the minimum values, provide a good description of a set of data as they indicate some of its most important characteristics. These five key descriptive statistical measures are often called the five-number summary of the set of data. A graphical display of these measures, called a box-and-whisker diagram or a box plot, gives an even better visual impression of the set.
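The percentile rule and the five-number summary described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names `percentile` and `five_number_summary` are hypothetical, p is taken to be a whole number between 1 and 99, and the rule implemented is the one given in these notes (other textbooks and software packages use slightly different percentile conventions).

```python
def percentile(data, p):
    """p-th percentile by the rule in these notes (p an integer, 1 to 99)."""
    xs = sorted(data)                 # arrange the data in ascending order
    n = len(xs)
    whole, rest = divmod(p * n, 100)  # index i = p*n/100, split into parts
    if rest != 0:
        # i is not an integer: round up to position whole + 1
        # (1-based), which is list index `whole` (0-based).
        return xs[whole]
    # i is an integer: average the values in positions i and i + 1
    # (1-based), i.e. list indices whole - 1 and whole (0-based).
    return (xs[whole - 1] + xs[whole]) / 2


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum)."""
    return (min(data), percentile(data, 25), percentile(data, 50),
            percentile(data, 75), max(data))


# Data sets A and B from Example 8.
set_a = [14, 23, 16, 18, 15, 44, 19]
set_b = [10, 15, 40, 28, 34, 18, 24, 30]

iqr_a = percentile(set_a, 75) - percentile(set_a, 25)
iqr_b = percentile(set_b, 75) - percentile(set_b, 25)
print(iqr_a, iqr_b, five_number_summary(set_a))
```

Working with `divmod(p * n, 100)` keeps the test "is i an integer?" in exact integer arithmetic, which avoids floating-point surprises such as 0.75 × 8 not comparing equal to 6.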
[Figure: box-and-whisker diagram marking the Minimum, $Q_1$, $Q_2$ (median), $Q_3$ and the Maximum. The lower 25% of the data lie between the minimum and $Q_1$, the middle 50% between $Q_1$ and $Q_3$ (the IQR), and the upper 25% between $Q_3$ and the maximum; the range spans from the minimum to the maximum.]

A box-and-whisker diagram consists of a rectangular box drawn with its length parallel to the x-axis and with its ends marking the positions of the lower and the upper quartiles. An orange bar is then inserted in the box to mark the median. The two extreme values, the minimum and the maximum values of the data, are linked to the box by lines, called whiskers, parallel to the x-axis.

A glance at the diagram then gives us good information about the central tendency, dispersion and extreme values of the set.
(1) The bar at the median shows the location of the centre of the data.
(2) The length of the box is equal to the inter-quartile range and shows the dispersion of the middle 50% of the data.
(3) The lengths of the whiskers show the dispersion of the data below the lower quartile and above the upper quartile, and describe the behaviour at the ends or tails of the distribution.
(4) The shape of the diagram gives us a quick impression of the degree of symmetry of the data distribution about the median.

It is easy to use box-and-whisker diagrams to compare features such as the location of the centre, the dispersion and the symmetry of different sets of data. However, a box-and-whisker diagram does not reveal the total frequency of each set of data, nor the frequency of the data in any specific range. If such information is required, a stem-and-leaf diagram, bar chart or histogram can be used.

Box-and-whisker diagrams are particularly useful for comparing the central tendency and the dispersion of two or more sets of data.

Example 9
The following box-and-whisker diagrams show the distributions of marks of the Chinese, English and Mathematics tests.
(a) Which test has the marks with the largest inter-quartile range?
(b) Which test has the marks with the smallest range?
(c) Which test has the highest median mark?
(d) If Mary gets 70 marks in all three tests, in which test does she perform the best? Briefly explain your answer.

Solution:
(a) Since the box of the Mathematics test is the longest, the Mathematics test has the marks with the largest inter-quartile range.
(b) Since the distance between the two ends of the whiskers of the Chinese test is the shortest, the Chinese test has the marks with the smallest range.
(c) Since the orange bar in the box of the Mathematics test is at the rightmost position, the median mark of the Mathematics test is the highest.
(d) From the box-and-whisker diagrams above, Mary's English mark is in the top 25% of the class while her marks in the Mathematics and Chinese tests are not. Mary therefore performs the best in the English test.

Skewness of Distributions

A distribution can have many different shapes. It may be symmetric or skewed. A distribution is symmetric if the parts above and below its centre are mirror images.

If $Q_2 - Q_1 = Q_3 - Q_2$, the distribution is symmetric.

[Figure: box-and-whisker diagram of a symmetric distribution, marking Min, $Q_1$, $Q_2$, $Q_3$ and Max.]

A distribution is skewed to the right if the right side is longer, while it is skewed to the left if the left side is longer.

A positively skewed or right-skewed distribution is an asymmetric distribution with a "tail" on the right, which indicates the presence of extreme values at the positive end of the distribution.

A distribution is positively skewed if $Q_2 - Q_1 < Q_3 - Q_2$.

[Figure: box-and-whisker diagram with a long tail to the right, marking Min, $Q_1$, $Q_2$, $Q_3$ and Max.]

A negatively skewed or left-skewed distribution is an asymmetric distribution with a "tail" on the left.

A distribution is negatively skewed if $Q_2 - Q_1 > Q_3 - Q_2$.

[Figure: box-and-whisker diagram with a long tail to the left, marking Min, $Q_1$, $Q_2$, $Q_3$ and Max.]

Example 10
Use the stem-and-leaf diagram constructed in Example 3 for the distribution of results of the class of 40 students in the Mathematics test.

Stem (Tens) | Leaf (Units)
  2 | 7 8
  3 | 7 7 8 9
  4 | 0 3 6 8 9
  5 | 1 5 6 8
  6 | 1 1 2 2 3 3 4 4 4 6 6 7 8 9
  7 | 0 0 1 3 4 5 6
  8 | 0 6
  9 | 0
 10 | 0

(a) Find the median, the first and the third quartiles.
(b) Construct the box-and-whisker diagram.
(c) Use the quartiles to comment on the skewness of the distribution.

Solution:
(a) The median is $\frac{1}{2}\left(x_{\frac{40}{2}} + x_{\frac{40}{2}+1}\right) = \frac{1}{2}(x_{20} + x_{21}) = \frac{1}{2}(63 + 63) = 63$
For the 25th percentile, the index $i = \frac{25}{100} \times 40 = 10$, hence $Q_1 = \frac{1}{2}(x_{10} + x_{11}) = \frac{1}{2}(48 + 49) = 48.5$
For the 75th percentile, the index $i = \frac{75}{100} \times 40 = 30$, hence $Q_3 = \frac{1}{2}(x_{30} + x_{31}) = \frac{1}{2}(70 + 70) = 70$

(b) [Box-and-whisker diagram on a scale from 25 to 100: minimum 27, $Q_1$ = 48.5, median 63, $Q_3$ = 70, maximum 100.]

(c) $Q_2 - Q_1 = 63 - 48.5 = 14.5$ and $Q_3 - Q_2 = 70 - 63 = 7$
Since $Q_2 - Q_1 > Q_3 - Q_2$, the distribution is negatively skewed (left-skewed).

Example 11
The table below gives the monthly salaries in dollars of 25 employees of a certain department.

 7800  11900  12700  10400  20200
 6200   7300   9200  15500  17900
 9700   9500  10500  13300  10200
 9900  14200   8900   8700  16600
 7400   6600   9600   6100   8200

(a) Construct a stem-and-leaf diagram for the data.
(b) Find the mean.
(c) Find the median, the first and the third quartiles and the inter-quartile range.
(d) Construct the box-and-whisker diagram.
(e) Use the quartiles to comment on the skewness of the distribution.

Solution:
(a)
Stem (Unit = $1000) | Leaf (Unit = $100)
  6 | 1 2 6
  7 | 3 4 8
  8 | 2 7 9
  9 | 2 5 6 7 9
 10 | 2 4 5
 11 | 9
 12 | 7
 13 | 3
 14 | 2
 15 | 5
 16 | 6
 17 | 9
 18 |
 19 |
 20 | 2

(b) The mean $= \frac{1}{25}(7800 + 11900 + 12700 + 10400 + 20200 + 6200 + 7300 + 9200 + 15500 + 17900 + 9700 + 9500 + 10500 + 13300 + 10200 + 9900 + 14200 + 8900 + 8700 + 16600 + 7400 + 6600 + 9600 + 6100 + 8200) = 10740$

(c) Making use of the stem-and-leaf diagram for the distribution of the salaries (a column of cumulative frequencies can be added to help locate the quartiles):
The median is $x_{\frac{25+1}{2}} = x_{13} = 9700$
For the 25th percentile, the index $i = \frac{25}{100} \times 25 = 6.25$, which rounds up to 7, hence $Q_1 = x_7 = 8200$.
For the 75th percentile, the index $i = \frac{75}{100} \times 25 = 18.75$, which rounds up to 19, hence $Q_3 = x_{19} = 12700$.
The inter-quartile range $= Q_3 - Q_1 = 12700 - 8200 = 4500$

(d) [Box-and-whisker diagram on a scale from 6000 to 21000: minimum 6100, $Q_1$ = 8200, median 9700, $Q_3$ = 12700, maximum 20200.]

(e) $Q_2 - Q_1 = 9700 - 8200 = 1500$ and $Q_3 - Q_2 = 12700 - 9700 = 3000$
Since $Q_2 - Q_1 < Q_3 - Q_2$, the distribution is positively skewed (right-skewed).

C. Variance and Standard Deviation

Although the inter-quartile range is an improved measure of dispersion compared with the range, it still does not make use of the actual values of all the data in the set and therefore cannot completely reflect the dispersion of the data. Measures of dispersion which do take all the values into account are the variance and the standard deviation.

To overcome the limitations of the range and the inter-quartile range mentioned above, we can find the distance of each datum from the centre of a group of data. The greater the average distance of all data from the centre, the wider the dispersion of the set of data.

If the set of N data $\{x_1, x_2, \ldots, x_N\}$ represents a population with mean $\mu$, then the variance of the set of data is defined as the mean of the squares of the deviations of individual values from the population mean, and is commonly denoted by $\sigma^2$. Thus, the population variance is

$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2 = \frac{1}{N}\left[(x_1 - \mu)^2 + (x_2 - \mu)^2 + \cdots + (x_N - \mu)^2\right]$

A large variance indicates large dispersion and a small variance indicates small dispersion.

However, the variance defined above does not have the same unit as the original values of x. To have a measure of dispersion with the same unit as the original data, we take the positive square root of the variance. The resulting measure is called the standard deviation of the set of data.
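The definition of the population variance and its positive square root can be checked numerically. The following is a minimal sketch, not part of the original notes: the function names are hypothetical, and the data are the seven water temperatures used in Example 12.

```python
import math


def population_variance(xs):
    """Mean of the squared deviations from the population mean."""
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)


def population_std(xs):
    """Positive square root of the population variance."""
    return math.sqrt(population_variance(xs))


# Temperatures (in degrees Celsius) from Example 12.
temperatures = [30, 32, 33, 28, 31, 29, 34]

variance = population_variance(temperatures)
std_dev = population_std(temperatures)
print(variance, std_dev)
```

For these data the mean is 31, the squared deviations sum to 28, and so the variance is 4 and the standard deviation is 2, in the same unit as the temperatures.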
Thus, the population standard deviation is

$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2} = \sqrt{\frac{1}{N}\left[(x_1 - \mu)^2 + (x_2 - \mu)^2 + \cdots + (x_N - \mu)^2\right]}$

If the set of n data $\{x_1, x_2, \ldots, x_n\}$ is a sample of size n drawn from a population and has mean $\bar{x}$, the sample variance, $s^2$, is defined as

$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2 = \frac{1}{n-1}\left[(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2\right]$

The sample standard deviation, s, is the positive square root of the sample variance:

$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2} = \sqrt{\frac{1}{n-1}\left[(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2\right]}$

Note that the sample variance $s^2$ differs from the population variance $\sigma^2$ in two ways: the sample mean $\bar{x}$ is used instead of the population mean $\mu$, and the divisor is n – 1 instead of N.

The standard deviation gives us an idea of how close the data are to their mean, and thus tells us about the consistency of the set of data. The smaller the standard deviation, the less dispersed the set of data is. In other words, the distribution of data in the set is more consistent.

Example 12
The temperatures (in °C) of water in seven beakers are: 30, 32, 33, 28, 31, 29, 34.
(a) Find the mean of the temperatures of the water.
(b) Find the population standard deviation of the temperatures of the water.

Solution:
(a) The mean of the temperatures of the water is

$\mu = \frac{1}{7}\sum_{i=1}^{7} x_i = \frac{1}{7}(30 + 32 + 33 + 28 + 31 + 29 + 34) = 31$

(b) The variance of the temperatures of the water is

$\sigma^2 = \frac{1}{7}\sum_{i=1}^{7}(x_i - \mu)^2$
$= \frac{1}{7}\left[(30 - 31)^2 + (32 - 31)^2 + (33 - 31)^2 + (28 - 31)^2 + (31 - 31)^2 + (29 - 31)^2 + (34 - 31)^2\right]$
$= \frac{1}{7}\left[(-1)^2 + 1^2 + 2^2 + (-3)^2 + 0^2 + (-2)^2 + 3^2\right]$
$= \frac{1}{7}(1 + 1 + 4 + 9 + 0 + 4 + 9)$
$= 4$

Therefore, the population standard deviation of the temperatures of the water is $\sigma = \sqrt{4} = 2$

Example 13
(a) Find the variance and standard deviation of the population of Mathematics test marks in Example 4 of Section 1.2, with the population mean 60.425.
(b) If the passing mark is one population standard deviation less than the mean, find the number of students failed in the Mathematics test. (c) The sample S 2 = {68, 62, 48, 39, 38, 55, 66, 71, 37, 76} has been drawn from the population of Mathematics test marks in Example 7. The sample mean was found to be 56. Find the sample variance and sample standard deviation. Solution: (a) The variance is 1 40 2  xi  2 40 i 1 1  (612  80 2  55 2  70 2  76 2  73 2  100 2  90 2  64 2  62 2  75 2  64 2  62 2  66 2  46 2 40  612  67 2  39 2  58 2  63 2  63 2  64 2  512  40 2  66 2  43 2  38 2  37 2  28 2  712  70 2  49 2  48 2  68 2  86 2  27 2  69 2  74 2  37 2  56 2 )  60.425 2 1  (156497)  (60.425) 2 40 = 261.2444 2  Therefore the standard deviation is   261.2444  16.1631 (b) The passing mark = 60.425 – 16.1631 = 44.2619 There are eight students with marks less than 44, so eight students failed in the Mathematics test. (c) The sample variance is s2  2 1  10 2   x i  10 x  10  1  i 1  1  [(68 2  62 2  48 2  39 2  38 2  55 2  66 2  712  37 2  76 2 )  10  56 2 ] 9 1  (33304  31360) 9  216 And the sample standard deviation is s  216  14.70 2-26 Use Scientific Calculator to find mean and standard deviation Use the calculator to find the mean and standard deviation of the data set {1, 2, 5, 6, 8, 9, 10, 12, 14, 18} 2-27 Use Scientific Calculator to find mean and standard deviation Use the calculator to find the mean and standard deviation of the data set {1, 2, 5, 6, 8, 9, 10, 12, 14, 18} 2-28 Associate Degree 2022 – 2023 First Semester CCMA4001 Quantitative Analysis I Chapter 2 Counting 2.1 The Fundamental Principle of Multiplication Suppose that the first task can be completed in m1 ways, a second task in m 2 ways, a third task in m 3 ways, and so on, until we reach the k th task that can be performed in m k ways; then the total number of ways can be done is the product m1m 2  m k Example 1 (a) Find 
the number of ways to answer a true-false quiz with 4 questions if every question must be answered. (b) If we allow the possibility of unanswering the questions, what is the possible number of ways? Solution: (a) If every question must be answered, there are two choices, true or false, to answer each question. By the fundamental principle of multiplication, the total number of ways is 2  2  2  2  16 (b) If every question can be unanswered, there are 3 alternatives to answer each question. The number of ways to answer the quiz is 3  3  3  3  81 3-1 Example 2 Consider the word “CHAPTER”. (a) How many ways can this word be arranged? (b) If we insist that the letter C starts first. How many ways are possible? (c) If we insist that the letter C starts first and the letter R be the last. How many ways are possible? Solution: 3-2 2.2 Permutation A. Factorial Notation The product of the first n consecutive integers is denoted by n! and is read as ‘n factorial’. That is, n! n (n  1)(n  2) 3  2  1 for n  1 For example, 4! = 4  3  2  1 = 24 and 8! = 8  7  6  5  4  3  2  1 = 40320 Also, ‘factorial 0’, 0! is defined to be 1. Since (n  1)! (n  1)(n  2)  3  2  1 , we see that when n > 1, n! n (n  1)! Example 3 In how many ways can 8 people be seated in a row of 8 chairs? Solution: The number of ways is 8! 40320 3-3 B. Permutation of n distinct objects In enumerating a sample space, we often require to choose a number r of objects from a set of n objects and arrange them in order. We say we make a permutation (or an arrangement) of n distinct objects taken r (r  n ) at a time. The order of the object in permutation is important, that is, abc and bca are different permutations. The total number of possible permutations, denoted by Prn or n Pr is given by Prn  n (n  1)(n  2)(n  r  1)  r factors Note that there are r factors in the expression for Prn . Using the factorial notation, we can define Prn as: n! Prn  (n  r )! 
In particular, when all n distinct objects are taken together and arranged in order, the number of permutations is $P^n_n = n!$.

Example 4
In how many different ways can the 20 members of a union select a president, a vice-president, a secretary and a treasurer?

Solution:
Assume that the officers are selected in the order president, vice-president, secretary and treasurer. Since the order is important in this question, the number of possible ways is
$P^{20}_4 = \dfrac{20!}{(20-4)!} = \dfrac{20!}{16!} = 116280$.

Example 5
In how many ways can a 6-letter word be formed such that the first and the last letters are distinct vowels and the other four letters are distinct consonants?

Solution:

C. Permutation of objects not all distinct
If among n objects, $n_1$ are of one kind, $n_2$ of a second kind, …, and $n_k$ of a k-th kind, then when all the n objects are taken together, the number of distinct permutations is
$\dfrac{n!}{n_1!\, n_2! \cdots n_k!}$.

Example 6
Find the number of permutations that can be made from the letters of each word.
(a) FACTORIAL
(b) COMBINATION

Solution:
(a) Since there are two A’s, substituting $n_1 = 2$ and $n = 9$, the number of arrangements is
$\dfrac{9!}{2!} = 9 \times 8 \times 7 \times 6 \times 5 \times 4 \times 3 = 181440$.
(b) Since there are two O’s, two I’s and two N’s, substituting $n_1 = 2$, $n_2 = 2$, $n_3 = 2$ and $n = 11$, the number of arrangements is
$\dfrac{11!}{2!\,2!\,2!} = 4989600$.

Example 7
In how many ways can 10 customers be assigned to three counters with 2 to counter A, 3 to counter B and 5 to counter C?

Solution:
The number of ways is $\dfrac{10!}{2!\,3!\,5!} = 2520$.

2.3 Combination
In many circumstances, when we select objects, we need not consider the order in which the objects appear. We then say we make a selection or combination of the objects.

The number of combinations (or selections) of n distinct objects taken r at a time, where $r \le n$, is denoted by $C^n_r$, ${}_nC_r$ or $\binom{n}{r}$, where
$C^n_r = \dfrac{P^n_r}{r!} = \dfrac{n!}{(n-r)!\,r!} = \dfrac{n(n-1)(n-2) \cdots (n-r+1)}{r(r-1) \cdots 2 \cdot 1}$.
In particular, $C^n_0 = C^n_n = 1$.

It is important to note that we use $C^n_r$ instead of $P^n_r$ when the different orderings of the r chosen objects are unimportant.

Example 8
In how many ways can 5 cards be drawn from an ordinary pack of 52 playing cards?

Solution:
We need to consider combinations, since the order in which the cards are drawn is not important. The number of ways of drawing 5 cards is
$C^{52}_5 = \dfrac{52!}{(52-5)!\,5!} = \dfrac{52!}{47!\,5!} = 2598960$.

Example 9
A team of 6 is chosen at random from 8 boys and 7 girls.
(a) In how many ways can the team be chosen if there are no restrictions?
(b) In how many ways can the team be chosen if there must be more boys than girls?

Solution:
(a) The number of ways of choosing the team is
$C^{15}_6 = \dfrac{15!}{(15-6)!\,6!} = \dfrac{15!}{9!\,6!} = 5005$.
(b) If there are more boys than girls, then there must be 6 boys, 5 boys and 1 girl, or 4 boys and 2 girls.
Number of ways of choosing 6 boys is $C^8_6 = 28$.
Number of ways of choosing 5 boys and 1 girl is $C^8_5 \times C^7_1 = 56 \times 7 = 392$.
Number of ways of choosing 4 boys and 2 girls is $C^8_4 \times C^7_2 = 70 \times 21 = 1470$.
Therefore, the number of ways of choosing the team if there are more boys than girls is $28 + 392 + 1470 = 1890$.

Chapter 3 Probability

3.1 Set Notation

A. Set and Element
A set is a list or collection of objects. The objects in the set are called elements or members of the set.

If x is an element of the set A, we write $x \in A$. Here the symbol $\in$ means “belongs to” or “is an element of”. Correspondingly, the symbol $\notin$ means “does not belong to” or “is not an element of”.

If every element of a set B also belongs to a set A, we say that B is contained in A and call B a subset of A. Symbolically, we write $B \subseteq A$ if $x \in B$ implies that $x \in A$.

For example, let A = {2, 4, 6, 8, 10}. If B = {2, 8, 10}, then $B \subseteq A$. If C = {1, 2, 4}, then C is not a subset of A because $1 \in C$ but $1 \notin A$. In symbols, we write $C \not\subseteq A$.
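The membership and subset relations in the example above can be tried out with Python's built-in `set` type. This is only an illustrative sketch; the variable names simply mirror the sets A, B and C used in the text.

```python
# Sets from the example in the text.
A = {2, 4, 6, 8, 10}
B = {2, 8, 10}
C = {1, 2, 4}

# "x in A" tests membership; "B <= A" tests whether B is a subset of A.
print(8 in A)    # True
print(1 in A)    # False
print(B <= A)    # True: every element of B belongs to A
print(C <= A)    # False: 1 belongs to C but not to A
```

The method form `B.issubset(A)` is equivalent to `B <= A`.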
Two sets are equal if and only if each is contained in the other. Thus A = B if and only if $A \subseteq B$ and $B \subseteq A$.

The universal set, U, is the set which contains all possible elements within a particular application under consideration. For example, if we consider A = {1, 2, 3} as a set listing some possible results of throwing a die, the universal set is U = {1, 2, 3, 4, 5, 6}.

On the other hand, every set contains a subset with no elements in it. We call this subset the null set or empty set and denote it by $\emptyset$ (read as phi).

B. Venn Diagram
The idea of sets and subsets and the relationships between them can be conveniently illustrated by means of Venn diagrams. In such diagrams, the universal set is usually represented by the region bounded by a rectangle, and sets and subsets by regions bounded by circles or other closed curves. The regions may be shaded as required. The elements of the sets need not be marked in the diagram.

Example 1
The following diagram shows the universal set U and three other sets, A, B and C. Describe the relationships between these sets.

Solution:
A, B and C are all subsets of U, i.e., $A \subseteq U$, $B \subseteq U$ and $C \subseteq U$.
The circle representing C lies within the circle representing A. Hence, C is a subset of A, i.e., $C \subseteq A$.
Apart from these relationships, no set is a subset of any other, i.e., $A \not\subseteq B$, $A \not\subseteq C$, $B \not\subseteq A$, $B \not\subseteq C$ and $C \not\subseteq B$.
The circles representing B and C do not intersect each other. Hence, B and C have no elements in common.
The circles representing A and B have some common area. Hence, A and B have at least one element in common.

C. Operations on Sets
It is often necessary to combine two or more sets to form new sets. This is done by set operations.

(1) Intersection
The intersection of two sets A and B, denoted by $A \cap B$, is the set of elements which belong to both A and B. In symbols, we write
$A \cap B = \{x : x \in A \text{ and } x \in B\}$.
The intersection, like A and B themselves, is a subset of the universal set U.
It can be represented in a Venn diagram as the region common to the regions representing A and B.

For example, if A = {1, 3, 5, 7, 9} and B = {1, 2, 3, 4, 5}, then $A \cap B$ = {1, 3, 5}.

If A and B have no elements in common, i.e., $A \cap B = \emptyset$, they are said to be disjoint.

(2) Union
The union of two sets A and B, denoted by $A \cup B$, is the set of elements which belong to either A or B or both. In symbols, we write
$A \cup B = \{x : x \in A \text{ or } x \in B\}$.

For example, if A = {2, 4, 6, 8, 10} and B = {5, 10, 15, 20}, then $A \cup B$ = {2, 4, 5, 6, 8, 10, 15, 20}.

Example 2
Let U = {x : x is a lower case letter}, A = {a, e, i, o, u}, B = {v, o, w, e, l} and C = {v, w}.
(a) Find (i) $A \cup B$ (ii) $A \cap B$ (iii) $B \cup C$ (iv) $B \cap C$ (v) $A \cap C$
(b) Represent the sets A, B, C and U by a Venn diagram.

Solution:
(a) (i) $A \cup B$ = {a, e, i, o, u, v, w, l}
(ii) $A \cap B$ = {e, o}
(iii) $B \cup C$ = {v, o, w, e, l}
(iv) $B \cap C$ = {v, w}
(v) $A \cap C = \emptyset$
Note: If $C \subseteq B$, then $B \cup C = B$ and $B \cap C = C$.
(b) The Venn diagram for the sets A, B, C and U is as follows.

(3) Complement
Sometimes, we are interested not only in the objects belonging to a subset, but also in the objects not belonging to that subset. This gives rise to the concept of a complement.

Let A be a subset of the universal set U. The complement of A with respect to U is the set of the elements of U which do not belong to A. We shall denote this complement by $A^C$, $A'$ or $\bar{A}$. In symbols, we write
$A' = \{x : x \in U \text{ and } x \notin A\}$.

For example, if U = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9} and A = {1, 3, 5, 7, 9}, then $A'$ = {0, 2, 4, 6, 8}.

3.2 Sample Space and Events

A. Definitions
When we roll a die and observe the number shown on the top face, we perform an experiment. A characteristic of the experiment is that, while we know the result (outcome) of the experiment must be one of the six numbers in the set {1, 2, 3, 4, 5, 6}, we cannot predict with certainty which of them will actually show up.
The occurrence of the number depends on chance. Such an experiment is called a random experiment.

When a random experiment is performed, the set of all possible outcomes is called the sample space for the experiment, and is often denoted by S. The sample space corresponds to the universal set U in set theory.

Example 3
A die is rolled and the number shown on the top face is observed. Write down
(a) the sample space S,
(b) the event A that the number shown is odd,
(c) the event B that the number shown is divisible by 3,
(d) the event that the number shown is odd or divisible by 3,
(e) the event that the number shown is both odd and divisible by 3.

Solution:
(a) S = {1, 2, 3, 4, 5, 6}
(b) A = {1, 3, 5}
(c) B = {3, 6}
(d) This is the union of A and B. $A \cup B$ = {1, 3, 5, 6}
(e) This is the intersection of A and B. $A \cap B$ = {3}

3.3 Fundamental Concepts of Probability

A. Definition
Probability is a numerical measure of the chance of the occurrence of an event.

A sample space is said to be equiprobable if all the sample points are equally likely to occur.

Let S be an equiprobable sample space with n(S) sample points and A be an event in S with n(A) sample points. Then the probability of A, denoted by P(A), is
$P(A) = \dfrac{n(A)}{n(S)}$.

Hence, for an equiprobable sample space, it is necessary to be able to enumerate all “possible” and all “favourable” outcomes in order to find the probability of a given event.

Example 4
A bag contains 1 red ball, 2 yellow balls and 3 blue balls. A ball is randomly drawn from the bag.
(a) What is the probability that the ball drawn is red?
(b) What is the probability that the ball drawn is blue?

Solution:
The total number of balls in the bag is 1 + 2 + 3 = 6. Therefore, the number of possible outcomes is 6.
(a) The bag contains 1 red ball; therefore, the number of favourable outcomes is 1.
Therefore, $P(\text{ball drawn is red}) = \dfrac{1}{6}$.
(b) The bag contains 3 blue balls; therefore, the number of favourable outcomes is 3.
Therefore, $P(\text{ball drawn is blue}) = \dfrac{3}{6} = \dfrac{1}{2}$.

Example 5
Two fair coins are tossed.
(a) Find the probability that both coins show the same result.
(b) Find the probability that at least one coin shows a head.

Solution:
Since the coins are “fair”, we can assume that the sample space S = {HH, HT, TH, TT} is equiprobable.
(a) Let A be the event that both coins show the same result.
A = {HH, TT} and n(A) = 2
Therefore, $P(A) = \dfrac{n(A)}{n(S)} = \dfrac{2}{4} = \dfrac{1}{2}$.
(b) Let B be the event that at least one coin shows a head.
B = {HH, HT, TH} and n(B) = 3
Therefore, $P(B) = \dfrac{n(B)}{n(S)} = \dfrac{3}{4}$.

B. Properties of Probability
From the definition of probability, we may derive the following properties. For every event A in the sample space S,
1. $0 \le P(A) \le 1$
2. $P(S) = 1$
3. $P(A') = 1 - P(A)$
4. For two events A and B in the sample space S, $P(A \cup B) = P(A) + P(B) - P(A \cap B)$ [Additive Rule]

Example 6
A die is thrown. Find the probability of getting a number
(a) greater than 2,
(b) greater than 6,
(c) less than 7,
(d) less than 3 or greater than 5,
(e) less than 4 and odd.

Solution:
For a die, there are 6 possible outcomes: 1, 2, 3, 4, 5 and 6.
(a) There are 4 favourable outcomes: 3, 4, 5 and 6.
Therefore, $P(\text{a number greater than 2}) = \dfrac{4}{6} = \dfrac{2}{3}$.
(b) Since no number on the die is greater than 6, the number of favourable outcomes is zero.
Therefore, $P(\text{a number greater than 6}) = \dfrac{0}{6} = 0$.
(c) Since all the numbers on a die are less than 7, there are 6 favourable outcomes: 1, 2, 3, 4, 5 and 6.
Therefore, $P(\text{a number less than 7}) = \dfrac{6}{6} = 1$.
(d) There are 3 favourable outcomes: 1, 2 and 6.
Therefore, $P(\text{a number less than 3 or greater than 5}) = \dfrac{3}{6} = \dfrac{1}{2}$.
(e) There are 2 favourable outcomes: 1 and 3.
Therefore, $P(\text{a number less than 4 and odd}) = \dfrac{2}{6} = \dfrac{1}{3}$.

We may use a tree diagram or the tabulation method to help us list the possible outcomes.

Example 7
Three fair coins are tossed.
(a) Find the probability of getting exactly 2 heads.
(b) Find the probability of getting at least 2 tails.

Solution:
Let H stand for a head and T stand for a tail.
By using a tree diagram, we have

[Tree diagram: first coin, second coin, third coin, leading to the eight outcomes listed in S below]

The sample space S = {(H, H, H), (H, H, T), (H, T, H), (H, T, T), (T, H, H), (T, H, T), (T, T, H), (T, T, T)} and n(S) = 8.
(a) Let A be the event of getting exactly 2 heads.
A = {(H, H, T), (H, T, H), (T, H, H)} and n(A) = 3
Therefore, $P(A) = \dfrac{3}{8}$.
(b) Let B be the event of getting at least 2 tails.
B = {(H, T, T), (T, H, T), (T, T, H), (T, T, T)} and n(B) = 4
Therefore, $P(B) = \dfrac{4}{8} = \dfrac{1}{2}$.

Example 8
Two dice are thrown.
(a) Find the probability of getting the same number on the two dice.
(b) Find the probability of getting a total of 8.

Solution:
All the possible outcomes are listed in the following table.

                   Second die
First die    1        2        3        4        5        6
    1        (1, 1)   (1, 2)   (1, 3)   (1, 4)   (1, 5)   (1, 6)
    2        (2, 1)   (2, 2)   (2, 3)   (2, 4)   (2, 5)   (2, 6)
    3        (3, 1)   (3, 2)   (3, 3)   (3, 4)   (3, 5)   (3, 6)
    4        (4, 1)   (4, 2)   (4, 3)   (4, 4)   (4, 5)   (4, 6)
    5        (5, 1)   (5, 2)   (5, 3)   (5, 4)   (5, 5)   (5, 6)
    6        (6, 1)   (6, 2)   (6, 3)   (6, 4)   (6, 5)   (6, 6)

(a) Let A be the event of getting the same number on the two dice.
A = {(1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6)} and n(A) = 6
Therefore, $P(A) = \dfrac{6}{36} = \dfrac{1}{6}$.
(b) Let B be the event of getting a total of 8.
B = {(2, 6), (3, 5), (4, 4), (5, 3), (6, 2)} and n(B) = 5
Therefore, $P(B) = \dfrac{5}{36}$.

We can also use the counting methods in the calculation of probabilities.

Example 9
A team of 5 is chosen at random from 6 boys and 4 girls. What is the probability that
(a) they are all boys?
(b) three of them are boys?
(c) only one girl is chosen?

Solution:
(a) $P(\text{all 5 are boys}) = \dfrac{C^6_5}{C^{10}_5} = \dfrac{6}{252} = \dfrac{1}{42}$
(b) $P(\text{3 are boys and 2 are girls}) = \dfrac{C^6_3 \times C^4_2}{C^{10}_5} = \dfrac{20 \times 6}{252} = \dfrac{120}{252} = \dfrac{10}{21}$
(c) $P(\text{4 are boys and 1 is a girl}) = \dfrac{C^6_4 \times C^4_1}{C^{10}_5} = \dfrac{15 \times 4}{252} = \dfrac{60}{252} = \dfrac{5}{21}$

Example 10
A hand of 5 cards is drawn from a deck of 52 playing cards. What is the probability of getting
(a) 5 spades?
(b) 5 cards of the same suit?
(c) 3 aces and 2 kings?
(d) a full house, i.e., 3 cards with the same number/letter and 2 cards sharing another number/letter?

Solution:
(a) The total number of possible hands of 5 cards is $C^{52}_5 = 2598960$.
The number of ways of selecting 5 spades from the 13 spades is $C^{13}_5 = 1287$.
Therefore, $P(\text{5 spades}) = \dfrac{C^{13}_5}{C^{52}_5} = \dfrac{1287}{2598960} = \dfrac{33}{66640}$.
(b) We can select one suit out of the 4 suits, namely spades (♠), hearts (♥), diamonds (♦) and clubs (♣), and then select 5 cards from this suit.
Therefore, $P(\text{same suit}) = \dfrac{C^4_1 \times C^{13}_5}{C^{52}_5} = \dfrac{4 \times 1287}{2598960} = \dfrac{5148}{2598960} = \dfrac{33}{16660}$.
(c) The number of ways of selecting 3 aces from the 4 aces is $C^4_3 = 4$.
The number of ways of selecting 2 kings from the 4 kings is $C^4_2 = 6$.
Therefore, the number of ways of forming 3 aces and 2 kings is $C^4_3 \times C^4_2 = 4 \times 6 = 24$, and
$P(\text{3 aces and 2 kings}) = \dfrac{C^4_3 \times C^4_2}{C^{52}_5} = \dfrac{4 \times 6}{2598960} = \dfrac{24}{2598960} = \dfrac{1}{108290}$.
(d) There are 13 ways to choose the first number/letter and then 12 ways to choose the second number/letter. For each of these $13 \times 12 = 156$ choices, the probability of forming that full house is the same as P(3 aces and 2 kings).
Therefore, $P(\text{full house}) = \dfrac{13 \times 12 \times C^4_3 \times C^4_2}{C^{52}_5} = 156 \times \dfrac{24}{2598960} = \dfrac{3744}{2598960} = \dfrac{6}{4165}$.

Example 11
A card is drawn randomly from an ordinary pack of 52 playing cards. Find the probability that the card is
(a) a spade or a heart,
(b) an ace or a heart.

Solution:
Since the card is drawn randomly from the pack, each card has the same chance of being selected. Hence we may take the sample space to be equiprobable with 52 sample points.
(a) Since a card cannot be both a spade and a heart, the event of getting a spade and the event of getting a heart cannot occur together, so they are mutually exclusive.
$P(\text{spade}) = \dfrac{13}{52} = \dfrac{1}{4}$ and $P(\text{heart}) = \dfrac{13}{52} = \dfrac{1}{4}$
Therefore, by the special addition rule for mutually exclusive events,
$P(\text{spade} \cup \text{heart}) = P(\text{spade}) + P(\text{heart}) = \dfrac{13}{52} + \dfrac{13}{52} = \dfrac{26}{52} = \dfrac{1}{2}$.
(b) Since a card can be both an ace and a heart, the event of getting an ace and the event of getting a heart can occur together, so they are not mutually exclusive.
$P(\text{ace}) = \dfrac{4}{52} = \dfrac{1}{13}$, $P(\text{heart}) = \dfrac{13}{52} = \dfrac{1}{4}$ and $P(\text{ace} \cap \text{heart}) = P(\text{ace of hearts}) = \dfrac{1}{52}$
Therefore, by the general addition rule,
$P(\text{ace} \cup \text{heart}) = P(\text{ace}) + P(\text{heart}) - P(\text{ace} \cap \text{heart}) = \dfrac{4}{52} + \dfrac{13}{52} - \dfrac{1}{52} = \dfrac{16}{52} = \dfrac{4}{13}$.

Example 12
In a box of 12 monitors, 8 are good and 4 are defective. If three monitors are picked at random for inspection, what is the probability that
(a) they are all good?
(b) at least one is defective?
(c) 2 are good and 1 is defective?
(d) at least two are good?

Solution:
(a) $P(\text{3 good ones}) = \dfrac{C^8_3}{C^{12}_3} = \dfrac{56}{220} = \dfrac{14}{55}$
(b) Using the law for complementary events,
$P(\text{at least 1 defective}) = 1 - P(\text{3 good ones}) = 1 - \dfrac{14}{55} = \dfrac{41}{55}$
(c) $P(\text{2 good and 1 defective}) = \dfrac{C^8_2 \times C^4_1}{C^{12}_3} = \dfrac{28 \times 4}{220} = \dfrac{112}{220} = \dfrac{28}{55}$
(d) Using the special addition law for mutually exclusive events,
$P(\text{at least 2 good ones}) = P(\text{3 good ones}) + P(\text{2 good and 1 defective}) = \dfrac{14}{55} + \dfrac{28}{55} = \dfrac{42}{55}$

C. Conditional Probability
Sometimes we may wish to alter our estimate of the probability of an event when we have additional knowledge that might affect its outcome. This revised probability is called the conditional probability of the event.

The 50 students of a class can be classified according to sex and stream as follows.

             Arts (A)   Science (S)
Boys (B)        7           17
Girls (G)      16           10

If one student is selected at random from the class, the probability that the student is a boy Arts student is
$P(B \cap A) = \dfrac{7}{50}$.

Now, suppose that the student is known to have been drawn from the 24 boy students. Then the probability that he is an Arts student should be $\dfrac{7}{24}$.

This probability is different from $P(B \cap A)$ above because it is based on a different sample space (the reduced sample space of 24 boy students, not the original sample space of 50 students of both sexes). It is the probability that the student selected is an Arts student given the condition that the student is a boy. This type of probability is called conditional probability.

More generally, consider the sample points of the equiprobable sample space S classified according to the table below.

          B               B'               Total
A         n(A ∩ B)        n(A ∩ B')        n(A)
A'        n(A' ∩ B)       n(A' ∩ B')       n(A')
Total     n(B)            n(B')            n(S)

The probability that event B will occur given that event A has already occurred is called the conditional probability of B given A (or the probability of B conditional on A), and is denoted by $P(B \mid A)$. Thus, with the reasonable assumptions that $n(A) \ne 0$ and $n(S) \ne 0$,
$P(B \mid A) = \dfrac{n(A \cap B)}{n(A)} = \dfrac{n(A \cap B)/n(S)}{n(A)/n(S)} = \dfrac{P(A \cap B)}{P(A)}$.
The last expression is usually taken as the definition of a conditional probability.

The conditional probability of B given that A has occurred is defined as
$P(B \mid A) = \dfrac{P(A \cap B)}{P(A)}$ if $P(A) \ne 0$.
Similarly, the conditional probability of A given that B has occurred is defined as
$P(A \mid B) = \dfrac{P(A \cap B)}{P(B)}$ if $P(B) \ne 0$.

Note that the relationship between A and B in the conditional probability is not symmetrical; that is, in general, $P(B \mid A) \ne P(A \mid B)$.

Example 13
100 students go to a camp. After grouping, the numbers of boys and girls in each group are as follows.

Group     I     II    III   IV
Boys      14    18    15    13
Girls     11    7     10    12

If a student is chosen at random from these 100 students, find the probability that
(a) the student is in group II,
(b) the student is a boy,
(c) the student is a boy in group II,
(d) the student is a boy given that the student is in group II,
(e) the student is in group II given that the student is a boy.

Solution:
Let A be the event that the chosen student is in group II, and B be the event that the chosen student is a boy.
(a) $P(\text{the student is in group II}) = P(A) = \dfrac{18 + 7}{100} = \dfrac{25}{100} = \dfrac{1}{4}$
(b) $P(\text{the student is a boy}) = P(B) = \dfrac{14 + 18 + 15 + 13}{100} = \dfrac{60}{100} = \dfrac{3}{5}$
(c) $P(\text{the student is a boy in group II}) = P(A \cap B) = \dfrac{18}{100} = \dfrac{9}{50}$
(d) $P(\text{the student is a boy given that the student is in group II}) = P(B \mid A) = \dfrac{P(A \cap B)}{P(A)} = \dfrac{9/50}{1/4} = \dfrac{9}{50} \times 4 = \dfrac{18}{25}$
OR
Among the 25 students in group II, there are 18 boys.
Therefore, $P(\text{the student is a boy given that the student is in group II}) = \dfrac{18}{25}$.
(e) $P(\text{the student is in group II given that the student is a boy}) = P(A \mid B) = \dfrac{P(A \cap B)}{P(B)} = \dfrac{9/50}{3/5} = \dfrac{9}{50} \times \dfrac{5}{3} = \dfrac{3}{10}$
OR
Among the 60 boys, there are 18 boys in group II.
Therefore, $P(\text{the student is in group II given that the student is a boy}) = \dfrac{18}{60} = \dfrac{3}{10}$.

Example 14
A card is drawn randomly from an ordinary pack of 52 playing cards.
(a) What is the probability it is the king of spades?
(b) We are told it is a king. What is the probability it is the king of spades?
(c) We are told it is a spade. What is the probability it is the king of spades?

Solution:
(a) $P(\text{king of spades}) = \dfrac{1}{52}$
(b) The most direct approach is to say that the king is equally likely to be any of the four kings and hence the probability that it is the king of spades is $\dfrac{1}{4}$.
(c) The spade is equally likely to be any of the thirteen spades and hence the probability that it is the king of spades is $\dfrac{1}{13}$.

It is instructive to consider (b) and (c) as conditional probabilities. Consider the sample space that has the 52 distinct cards as equally likely outcomes. Let A be the event of getting a king, B be the event of getting a spade, and C be the event of getting the king of spades.
Then $P(A) = \dfrac{4}{52} = \dfrac{1}{13}$, $P(B) = \dfrac{13}{52} = \dfrac{1}{4}$ and $P(C) = P(A \cap C) = P(B \cap C) = \dfrac{1}{52}$.
(b) $P(C \mid A) = \dfrac{P(A \cap C)}{P(A)} = \dfrac{1/52}{1/13} = \dfrac{13}{52} = \dfrac{1}{4}$
(c) $P(C \mid B) = \dfrac{P(B \cap C)}{P(B)} = \dfrac{1/52}{1/4} = \dfrac{4}{52} = \dfrac{1}{13}$

Example 15
The probability that a regularly scheduled flight arrives on time is 0.92, the probability that it departs on time is 0.83, and the probability that it arrives and departs on time is 0.78. Find the probability that a plane
(a) departs on time given that it arrived on time,
(b) arrives on time given that it departed on time.

Solution:
Let A be the event that the flight arrives on time, and B be the event that the flight departs on time. Then P(A) = 0.92, P(B) = 0.83 and $P(A \cap B) = 0.78$.
(a) $P(B \mid A) = \dfrac{P(A \cap B)}{P(A)} = \dfrac{0.78}{0.92} = \dfrac{78}{92} = \dfrac{39}{46}$
(b) $P(A \mid B) = \dfrac{P(A \cap B)}{P(B)} = \dfrac{0.78}{0.83} = \dfrac{78}{83}$
This example illustrates that, in general, $P(B \mid A) \ne P(A \mid B)$.

Example 16
A fair die is tossed twice. If the first toss is an odd number, what is the probability that the sum of the two tosses is 7?

Solution:
Let A be the event of getting an odd number in the first toss, and B be the event that the sum of the two tosses is 7.
There are 6 possible outcomes in the first toss: 1, 2, 3, 4, 5 and 6.
A = {1, 3, 5} and n(A) = 3
Therefore, $P(A) = \dfrac{3}{6} = \dfrac{1}{2}$.
There are $6 \times 6 = 36$ possible outcomes in total for tossing a die twice. For events A and B to occur simultaneously, there are three favourable outcomes: (1, 6), (3, 4) and (5, 2).
Therefore, $P(A \cap B) = \dfrac{3}{36} = \dfrac{1}{12}$ and
$P(B \mid A) = \dfrac{P(A \cap B)}{P(A)} = \dfrac{1/12}{1/2} = \dfrac{1}{6}$.

D. Multiplication Law of Probability for Independent Events
For two events A and B in the sample space S, event B is said to be independent of event A if the probability that event B occurs is not affected by whether event A has or has not occurred. In this case, the conditional probability of B given A equals the probability of B, that is,
$P(B \mid A) = P(B)$.
On the other hand,
$P(A \mid B) = \dfrac{P(A \cap B)}{P(B)} = \dfrac{P(A) \times P(B \mid A)}{P(B)} = \dfrac{P(A) \times P(B)}{P(B)} = P(A)$
and so, A is also independent of B.
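The symmetry of independence can be checked by brute-force enumeration. The sketch below reuses the two events of Example 16 (first toss odd, sum equal to 7) on the 36 equally likely outcomes; the helper names `prob` and `cond_prob` are our own and are only illustrative.

```python
from fractions import Fraction
from itertools import product

# Equiprobable sample space for tossing a fair die twice: 36 ordered pairs.
S = list(product(range(1, 7), repeat=2))

def prob(event):
    """P(event) on the equiprobable sample space S; event is a predicate on outcomes."""
    return Fraction(sum(1 for s in S if event(s)), len(S))

def cond_prob(b, a):
    """P(B | A) = P(A and B) / P(A), assuming P(A) > 0."""
    return prob(lambda s: a(s) and b(s)) / prob(a)

def first_odd(s):
    return s[0] % 2 == 1      # event A of Example 16

def sum_is_7(s):
    return s[0] + s[1] == 7   # event B of Example 16

print(cond_prob(sum_is_7, first_odd), prob(sum_is_7))   # 1/6 1/6
print(cond_prob(first_odd, sum_is_7), prob(first_odd))  # 1/2 1/2
```

Here P(B | A) equals P(B) and P(A | B) equals P(A), so for this pair of events each is independent of the other.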
So “B is independent of A” implies that “A is also independent of B”.

Two events A and B are independent if either $P(A \mid B) = P(A)$ or $P(B \mid A) = P(B)$.

If A and B are independent events, since $P(A \mid B) = \dfrac{P(A \cap B)}{P(B)}$, we have
$P(A \cap B) = P(A \mid B) \times P(B) = P(A) \times P(B)$.
Therefore, independent events are usually formally defined as follows.

Two events A and B are independent if and only if $P(A \cap B) = P(A) \times P(B)$.

This is the special multiplication law of probability for independent events and is sometimes known as ‘the AND rule’. It tells us that the probability that two independent events will both occur is simply the product of their probabilities.

If A, B and C are independent events, then $P(A \cap B \cap C) = P(A) \times P(B) \times P(C)$.

Example 17
When a fair coin is tossed three times, there are eight equally likely outcomes. Consider the following events.
A is the event that the first toss is a head.
B is the event that the last toss is a tail.
C is the event that the total number of heads is exactly one.
(a) Compute P(A), P(B) and P(C).
(b) Compute $P(B \mid A)$. Are the events A and B independent?
(c) Compute $P(C \mid A)$. Are the events A and C independent?

Solution:
(a) A = {(H, H, H), (H, H, T), (H, T, H), (H, T, T)} and n(A) = 4
Therefore, $P(A) = \dfrac{4}{8} = \dfrac{1}{2}$.
B = {(H, H, T), (H, T, T), (T, H, T), (T, T, T)} and n(B) = 4
Therefore, $P(B) = \dfrac{4}{8} = \dfrac{1}{2}$.
C = {(H, T, T), (T, H, T), (T, T, H)} and n(C) = 3
Therefore, $P(C) = \dfrac{3}{8}$.
(b) Since $A \cap B$ = {(H, H, T), (H, T, T)} and $n(A \cap B) = 2$,
$P(A \cap B) = \dfrac{2}{8} = \dfrac{1}{4}$.
It follows that $P(B \mid A) = \dfrac{P(A \cap B)}{P(A)} = \dfrac{1/4}{1/2} = \dfrac{2}{4} = \dfrac{1}{2}$.
Since $P(B \mid A) = P(B)$, events A and B are independent.
OR
$P(A) \times P(B) = \dfrac{1}{2} \times \dfrac{1}{2} = \dfrac{1}{4}$
So we have $P(A \cap B) = P(A) \times P(B)$, and therefore, events A and B are independent.
(c) Since $A \cap C$ = {(H, T, T)} and $n(A \cap C) = 1$,
$P(A \cap C) = \dfrac{1}{8}$.
It follows that $P(C \mid A) = \dfrac{P(A \cap C)}{P(A)} = \dfrac{1/8}{1/2} = \dfrac{2}{8} = \dfrac{1}{4}$.
Since $P(C \mid A) \ne P(C)$, events A and C are dependent.
OR
$P(A) \times P(C) = \dfrac{1}{2} \times \dfrac{3}{8} = \dfrac{3}{16}$
So we have $P(A \cap C) \ne P(A) \times P(C)$, and therefore, events A and C are dependent.

Example 18
Two events A and B are such that $P(A) = \dfrac{1}{4}$, $P(A \mid B) = \dfrac{1}{2}$ and $P(B \mid A) = \dfrac{2}{3}$.
(a) Are A and B independent events?
(b) Are A and B mutually exclusive events?
(c) Find $P(A \cap B)$.
(d) Find P(B).

Solution:
(a) If A and B are independent events, then $P(A \mid B) = P(A)$.
Now $P(A \mid B) = \dfrac{1}{2}$ and $P(A) = \dfrac{1}{4}$.
Therefore $P(A \mid B) \ne P(A)$, and A and B are not independent events.
(b) If A and B are mutually exclusive events, then $P(A \mid B) = 0$.
But it is given that $P(A \mid B) = \dfrac{1}{2}$.
Therefore A and B are not mutually exclusive events.
(c) $P(A \cap B) = P(A) \times P(B \mid A) = \dfrac{1}{4} \times \dfrac{2}{3} = \dfrac{1}{6}$
(d) $P(A \cap B) = P(B) \times P(A \mid B)$
$\dfrac{1}{6} = P(B) \times \dfrac{1}{2}$
$P(B) = \dfrac{1}{6} \times 2 = \dfrac{1}{3}$

E. Multiplication Law of Probability for Dependent Events
Two events are said to be dependent if the occurrence of one event affects the probability of occurrence of the other.

If A and B are two dependent events where event A occurs before event B, then the conditional probability of B given that event A has occurred is
$P(B \mid A) = \dfrac{P(A \cap B)}{P(A)}$ if $P(A) \ne 0$.
So we have
$P(A \cap B) = P(A) \times P(B \mid A)$.
This is known as the general multiplication rule, and it enables us to calculate the probability that two events will both occur.

The probability that both of two events will occur is equal to the probability that one of them will occur multiplied by the probability that the other one will occur given that the first has occurred. The multiplication rule looks rather complex, but should be intuitively clear and may be readily written down with the aid of a tree diagram.

For two events A and B in the sample space, the probability that both of them will occur is
$P(A \cap B) = P(A) \times P(B \mid A)$.

Example 19
There are 2 red balls and 1 green ball in a bag.
Find the probability of getting a red ball in the first draw and a green ball in the second draw if two balls are drawn at random one by one
(a) with replacement,
(b) without replacement.

Solution:
Let A be the event that a red ball is drawn in the first draw, and B be the event that a green ball is drawn in the second draw.
(a) If two balls are drawn at random one by one with replacement, then events A and B are independent events.
Let R stand for a red ball and G stand for a green ball. The tree diagram is constructed as shown.

[Tree diagram: first draw, second draw; outcomes RR, RG, GR, GG]

The probability of getting ‘RG’ can be calculated by multiplying the probabilities along the branches.
$P(RG) = P(A \cap B) = P(A) \times P(B) = \dfrac{2}{3} \times \dfrac{1}{3} = \dfrac{2}{9}$
(b) If two balls are drawn at random one by one without replacement, then events A and B are dependent events. The tree diagram is constructed as shown.

[Tree diagram: first draw, second draw; outcomes RR, RG, GR]

The probability of getting ‘RG’ can be calculated by multiplying the probabilities along the branches.
$P(RG) = P(A \cap B) = P(A) \times P(B \mid A) = \dfrac{2}{3} \times \dfrac{1}{2} = \dfrac{1}{3}$

Example 20
A carton contains 12 eggs of which 3 are rotten. If 2 eggs are selected randomly from the carton without replacement, what is the probability that
(a) both are rotten?
(b) exactly one is rotten?
(c) at least one is rotten?

Solution:
Let $R_i$ be the event that the i-th egg drawn is rotten, i = 1, 2, and $G_i = R_i'$ be the event that the i-th egg drawn is good.
The tree diagram for the results of the 2 eggs selected is shown as follows.
(a) $P(\text{both eggs drawn are rotten}) = P(R_1 \cap R_2) = P(R_1) \times P(R_2 \mid R_1)$
Assuming that all eggs have the same probability of being selected,
$P(R_1) = \dfrac{3}{12} = \dfrac{1}{4}$.
If the first egg drawn is rotten, there will be 2 rotten eggs among the 11 remaining.
Therefore, $P(R_2 \mid R_1) = \dfrac{2}{11}$.
Hence, $P(R_1 \cap R_2) = \dfrac{3}{12} \times \dfrac{2}{11} = \dfrac{1}{22}$.
OR
Using combinations, the required probability is $\dfrac{C^3_2}{C^{12}_2} = \dfrac{3}{66} = \dfrac{1}{22}$.
(b) From the tree diagram, the event that exactly one of the eggs drawn is rotten is the union of the two mutually exclusive events $R_1 \cap G_2$ and $G_1 \cap R_2$.
$P(\text{exactly one of the eggs drawn is rotten}) = P(R_1 \cap G_2) + P(G_1 \cap R_2)$
$= P(R_1) \times P(G_2 \mid R_1) + P(G_1) \times P(R_2 \mid G_1)$
$= \dfrac{3}{12} \times \dfrac{9}{11} + \dfrac{9}{12} \times \dfrac{3}{11} = \dfrac{9}{44} + \dfrac{9}{44} = \dfrac{18}{44} = \dfrac{9}{22}$
OR
Using combinations, the required probability is $\dfrac{C^3_1 \times C^9_1}{C^{12}_2} = \dfrac{3 \times 9}{66} = \dfrac{27}{66} = \dfrac{9}{22}$.
(c) $P(\text{at least one of the eggs drawn is rotten}) = P(\text{exactly one of the eggs drawn is rotten}) + P(\text{both eggs drawn are rotten})$
$= \dfrac{9}{22} + \dfrac{1}{22} = \dfrac{10}{22} = \dfrac{5}{11}$

G. Bayes’ Theorem
Bayes’ theorem is an important extension of the result
$P(B \mid A) = \dfrac{P(A \cap B)}{P(A)} = \dfrac{P(B) \times P(A \mid B)}{P(A)}$.

Suppose the sample space S is partitioned into n mutually exclusive and exhaustive events $E_1, E_2, \ldots, E_n$, i.e., $S = E_1 \cup E_2 \cup \cdots \cup E_n$ with $E_i \cap E_j = \emptyset$ for i, j = 1, 2, …, n and $i \ne j$. Let A be an event in S. Then the probability of $E_r$ conditional on A is
$P(E_r \mid A) = \dfrac{P(E_r)\,P(A \mid E_r)}{P(E_1)\,P(A \mid E_1) + P(E_2)\,P(A \mid E_2) + \cdots + P(E_n)\,P(A \mid E_n)} = \dfrac{P(E_r)\,P(A \mid E_r)}{\sum_{i=1}^{n} P(E_i)\,P(A \mid E_i)}$
for r = 1, 2, …, n. This is known as Bayes’ theorem.

The formula looks very complicated, but in fact it is easy to use if you remember that the denominator is the total probability of A.

Note that the conditional probability on the left-hand side is conditional on A, while the conditional probabilities on the right-hand side are conditional on the $E_i$’s. This reversal of conditioning is the most important feature of Bayes’ theorem.

Normally, we start with a specified event ($E_r$, say) and find, by the multiplication rule, the probability that this event will lead to an observed result (A). But Bayes’ theorem allows us to do the reverse.
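To make the formula concrete, here is a minimal Python sketch of Bayes’ theorem for a finite partition. The function name `posterior` and the three-event numbers are illustrative assumptions of ours, not data from the notes; exact fractions are used so the arithmetic can be followed by hand.

```python
from fractions import Fraction

def posterior(priors, likelihoods):
    """Return the list of P(E_r | A) for a partition E_1, ..., E_n.

    priors[i]      = P(E_i), which must sum to 1
    likelihoods[i] = P(A | E_i)
    """
    if sum(priors) != 1:
        raise ValueError("prior probabilities must sum to 1")
    # Numerators P(E_i) * P(A | E_i); their sum is the total probability P(A).
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)
    if total == 0:
        raise ValueError("P(A) is zero, so P(E_r | A) is undefined")
    return [j / total for j in joint]

# Hypothetical partition with three events (illustrative numbers only).
priors = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]
likelihoods = [Fraction(1, 10), Fraction(1, 5), Fraction(2, 5)]

result = posterior(priors, likelihoods)
print([str(p) for p in result])  # ['5/19', '6/19', '8/19']
```

In this hypothetical case the total probability of A is 1/20 + 3/50 + 2/25 = 19/100, and the three results sum to 1, as they must for a partition.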
Given the observed result (A), we calculate the probability that it has arisen from a specified event ($E_r$).

In Bayes’ theorem, the “initial” probabilities $P(E_r)$ are assigned or estimated based on personal judgment, experience or past records. Their values are given before the information about the occurrence or non-occurrence of the event A is available. They are thus called prior probabilities.

Bayes’ theorem modifies the prior probabilities to incorporate the information provided by the occurrence of event A. The “revised” probabilities $P(E_r \mid A)$, derived after information is provided by an observed result, are called posterior probabilities.

Example 21
Visitors to a certain country are required to undergo a blood test for a certain kind of disease. If a visitor has the disease, the test has a probability 0.90 of showing a positive result. But even if the visitor does not have the disease, the test still has a probability 0.05 of showing a positive result. From past records and other sources of information, it is known that 8% of the visitors have the disease.
(a) What is the probability that a visitor selected at random will give a positive result for the blood test?
(b) Suppose that a visitor’s blood test gives a positive result. What is the probability that he has the disease?

Solution:
Let D be the event that a visitor has the disease, and Y (i.e., yes) be the event that the blood test shows a positive result.
(a) The given information is displayed in the following tree diagram.
From the diagram, we see that
$P(Y) = P(D) \times P(Y \mid D) + P(D') \times P(Y \mid D')$
$= 0.08 \times 0.90 + (1 - 0.08) \times 0.05$
$= 0.08 \times 0.90 + 0.92 \times 0.05$
$= 0.072 + 0.046$
$= 0.118$
(b) If the visitor has the disease, we know that his blood test will very likely show a positive result. But conversely, we realize that even if he does not have the disease, his blood test result can still be positive, though with a low probability.
Since a positive result cannot guarantee that the visitor has the disease, the probability that a visitor with a positive result has the disease can be found by Bayes’ theorem as follows.
$P(D \mid Y) = \dfrac{P(D) \times P(Y \mid D)}{P(D) \times P(Y \mid D) + P(D') \times P(Y \mid D')}$
$= \dfrac{0.08 \times 0.90}{0.08 \times 0.90 + (1 - 0.08) \times 0.05}$
$= \dfrac{0.072}{0.08 \times 0.90 + 0.92 \times 0.05}$
$= \dfrac{0.072}{0.072 + 0.046}$
$= \dfrac{0.072}{0.118} = \dfrac{72}{118} = \dfrac{36}{59}$

Bayes’ theorem is useful for revising the prior probability to give the posterior probability that an event occurs based on the information provided by an observed result. The positive result of the blood test suggests a greatly increased probability (from 0.08 to $\dfrac{36}{59} \approx 0.61$) that the visitor has the disease.

Chapter 4 Discrete Probability Distributions

4.1 Probability Distributions and Probability Functions
A probability distribution is a function that describes the likelihood of obtaining the possible values that a random variable can assume. In other words, the values of the variable vary based on the underlying probability distribution.

Suppose you draw a random sample and measure the heights of the subjects. As you measure heights, you can create a distribution of heights. This type of distribution is useful when you need to know which outcomes are most likely, the spread of potential values, and the likelihood of different results.

Discrete probability functions can assume a discrete number of values. For example, coin tosses and counts of events are discrete functions. These are discrete distributions because there are no in-between values. For example, you can have only heads or tails in a coin toss. Similarly, if you’re counting the number of books that a library checks out per hour, you can count 21 or 22 books, but nothing in between.

For discrete probability distribution functions, each possible value has a non-zero likelihood.
Furthermore, the probabilities for all possible values must sum to one. Because the total probability is 1, one of the values must occur on each trial.

Example 1
Consider the experiment of throwing three fair coins. Let X be the random variable of the number of heads shown when 3 fair coins are tossed.
(a) Describe the random variable X as a function of the elements of the sample space.
(b) What is the probability associated with each value of the random variable X?

Solution:
(a) The sample space of throwing 3 fair coins is
S = {HHH, HHT, HTH, THH, HTT, THT, TTH, TTT}
which consists of 8 equiprobable elements. The random variable X is a function defined on the sample space S: to each point i in S, X assigns a real number X(i). If only the number of heads shown is of interest, then a numerical value of 0, 1, 2 or 3 is assigned to each sample point. The numbers 0, 1, 2 and 3 are random quantities determined by the outcome of the experiment.
The values of the random variable X are given by the following table.

i      HHH  HHT  HTH  THH  HTT  THT  TTH  TTT
X(i)    3    2    2    2    1    1    1    0

Thus, the random variable X may be described as the function shown in the figure below.
[Figure: arrow diagram mapping each sample point i to its value X(i)]
(b) When the random variable X takes on the value 3, it is associated with the event {HHH}, which has a probability of \(\frac{1}{8}\) of occurring. Thus, we have the following probability statements.
\( P(X = 3) = P(\{HHH\}) = \frac{1}{8} \)
\( P(X = 2) = P(\{HHT, HTH, THH\}) = \frac{3}{8} \)
\( P(X = 1) = P(\{HTT, THT, TTH\}) = \frac{3}{8} \)
\( P(X = 0) = P(\{TTT\}) = \frac{1}{8} \)
The probability statements may be written as
\[ P(X = x) = \begin{cases} \frac{1}{8} & \text{when } x = 3 \\ \frac{3}{8} & \text{when } x = 2 \\ \frac{3}{8} & \text{when } x = 1 \\ \frac{1}{8} & \text{when } x = 0 \end{cases} \]

Representation
The probability distribution of a random variable may be regarded as an arrangement in which the total probability 1 of the sample space is assigned (distributed) to the various values of the random variable according to some rule(s). A discrete probability distribution is the probability distribution of a discrete random variable.
It may be represented by a table or a function.

Example 2
Represent the probability distribution of the random variable X, the number of heads shown in the throw of three fair coins, in Example 1 by a table.

Solution:
The table below is a representation of the probability distribution of X.

x          0    1    2    3
P(X = x)  1/8  3/8  3/8  1/8

Example 3
The probability function of a random variable X is given by
\[ f(x) = \begin{cases} kx & \text{for } x = 1, 2, 3, 4 \\ 0 & \text{otherwise} \end{cases} \]
(a) Find the value of k.
(b) Tabulate the probability distribution of X.
(c) Find the value of \(P(X \le 3)\).

Solution:
(a) \[ \sum_{x=1}^{4} f(x) = 1 \]
\[ f(1) + f(2) + f(3) + f(4) = 1 \]
\[ k + 2k + 3k + 4k = 1 \]
\[ 10k = 1, \quad k = \frac{1}{10} \]
(b) The probability distribution of X is as follows.

x      1     2     3     4
f(x)  1/10  1/5   3/10  2/5

(c) \[ P(X \le 3) = P(X = 1) + P(X = 2) + P(X = 3) = f(1) + f(2) + f(3) = \frac{1}{10} + \frac{2}{10} + \frac{3}{10} = \frac{6}{10} = \frac{3}{5} \]

4.2 Expectation
The expectation (or mean or expected value) of a discrete random variable X (or of a discrete probability distribution) is defined as
\[ \mu = E(X) = \sum_{x} f(x)\,x = \sum_{i=1}^{n} f(x_i)\,x_i \]
where \(\sum_x\) denotes the summation over all the possible values of x.
In general, the expectation of a function g(X) of a discrete random variable X is defined as
\[ E[g(X)] = \sum_{x} f(x)\,g(x) = \sum_{i=1}^{n} f(x_i)\,g(x_i) \]
and is the weighted mean of all the values which the function g(X) can take.

Example 3
The random variable X has the following probability distribution.

x      0     1    2    3    4
f(x)  1/16  1/4  3/8  1/4  1/16

Find \(E(X)\), \(E(X^2)\) and hence \(E(X^2 - 3X + 8)\).

Solution:
\[ E(X) = \sum_{x=0}^{4} f(x)\,x = \frac{1}{16} \times 0 + \frac{1}{4} \times 1 + \frac{3}{8} \times 2 + \frac{1}{4} \times 3 + \frac{1}{16} \times 4 = 2 \]
\[ E(X^2) = \sum_{x=0}^{4} f(x)\,x^2 = \frac{1}{16} \times 0^2 + \frac{1}{4} \times 1^2 + \frac{3}{8} \times 2^2 + \frac{1}{4} \times 3^2 + \frac{1}{16} \times 4^2 = 5 \]
\[ E(X^2 - 3X + 8) = E(X^2) - 3E(X) + 8 = 5 - 3 \times 2 + 8 = 7 \]

Example 4
From past records, a dentist found that the number of patients X treated in an hour can be described by the following probability distribution.

x      1     2     3     4     5
f(x)  0.1   0.15  0.4   0.25  0.1

(a) Find \(E(X)\).
(b) If he charges each patient $500 per treatment, what are his expected earnings per hour?

Solution:
(a) \[ E(X) = \sum_{x=1}^{5} f(x)\,x = 0.1 \times 1 + 0.15 \times 2 + 0.4 \times 3 + 0.25 \times 4 + 0.1 \times 5 = 3.1 \]
(b) If he charges each patient $500 per treatment, his expected earnings per hour are 3.1 × $500 = $1550.

The following results on expectations are frequently used in probability theory.
1. For a random variable X and any constants a and b,
   \( E(aX + b) = aE(X) + b \)
   A special case is \( E(b) = b \).
2. For functions g(X) and h(X) of the random variable X,
   \( E[g(X) \pm h(X)] = E[g(X)] \pm E[h(X)] \)

Proof:
1. \[ E(aX + b) = \sum_x f(x)(ax + b) = \sum_x f(x)(ax) + \sum_x f(x)(b) = a \sum_x f(x)\,x + b \sum_x f(x) = aE(X) + b(1) = aE(X) + b \]
   (a and b are constants independent of x)
2. \[ E[g(X) \pm h(X)] = \sum_x f(x)[g(x) \pm h(x)] = \sum_x f(x)\,g(x) \pm \sum_x f(x)\,h(x) = E[g(X)] \pm E[h(X)] \]

4.3 Variance and Standard Deviation
The variance of a discrete random variable X (or of a discrete probability distribution), commonly denoted by Var(X), is defined as
\[ \sigma^2 = \mathrm{Var}(X) = E[(X - \mu)^2] = \sum_x f(x)(x - \mu)^2 = \sum_{i=1}^{n} f(x_i)(x_i - \mu)^2 \]
The standard deviation \(\sigma\) is the positive square root of the variance, and is a measure of dispersion of the distribution in the same unit as x.
\[ \sigma = \sqrt{\mathrm{Var}(X)} \]
Sometimes the defining formula above is not very convenient to use. The following is a useful alternative.
\[ \sigma^2 = \mathrm{Var}(X) = E(X^2) - [E(X)]^2 = \sum_x f(x)\,x^2 - \mu^2 = \sum_{i=1}^{n} f(x_i)\,x_i^2 - \mu^2 \]

i. Variance of a constant
For any constant a, \( \mathrm{Var}(a) = 0 \)
Proof: Since the mean of a is a itself, \( \mathrm{Var}(a) = E[(a - a)^2] = 0 \)
This result must be true as variance is a measure of dispersion or variability, and a constant does not vary.

ii. Variance of a constant multiple of a random variable
For a random variable X and a constant a, \( \mathrm{Var}(aX) = a^2\,\mathrm{Var}(X) \)
Proof: Since \( E(aX) = aE(X) = a\mu \), the mean of aX is \(a\mu\). Hence
\[ \mathrm{Var}(aX) = E[(aX - a\mu)^2] = E[a^2 (X - \mu)^2] = a^2 E[(X - \mu)^2] = a^2\,\mathrm{Var}(X) \]
(\(a^2\) is a constant)

iii. Variance of a linear function of a random variable
For a random variable X and constants a and b, \( \mathrm{Var}(aX + b) = a^2\,\mathrm{Var}(X) \)

Example 4
The following table shows the probability distribution of a random variable X.

x      1    2    3    4    5
f(x)  0.1  0.3  0.2  0.3  0.1

(a) Find \(E(X)\).
(b) Find \(E(X^2)\).
(c) Find \(\mathrm{Var}(X)\).
(d) Find \(\mathrm{Var}(X^2)\).
(e) Find \(E(4X + 7)\) and \(\mathrm{Var}(4X + 7)\).
(f) Find \(E(2 - 5X)\) and \(\mathrm{Var}(2 - 5X)\).

Solution:
(a) \[ E(X) = \sum_{x=1}^{5} f(x)\,x = 0.1 \times 1 + 0.3 \times 2 + 0.2 \times 3 + 0.3 \times 4 + 0.1 \times 5 = 3 \]
(b) \[ E(X^2) = \sum_{x=1}^{5} f(x)\,x^2 = 0.1 \times 1^2 + 0.3 \times 2^2 + 0.2 \times 3^2 + 0.3 \times 4^2 + 0.1 \times 5^2 = 10.4 \]
(c) \[ \mathrm{Var}(X) = E[(X - E(X))^2] = E[(X - 3)^2] = \sum_{x=1}^{5} f(x)(x - 3)^2 \]
\[ = 0.1 \times (1 - 3)^2 + 0.3 \times (2 - 3)^2 + 0.2 \times (3 - 3)^2 + 0.3 \times (4 - 3)^2 + 0.1 \times (5 - 3)^2 \]
\[ = 0.1 \times 4 + 0.3 \times 1 + 0.2 \times 0 + 0.3 \times 1 + 0.1 \times 4 = 1.4 \]
OR
\[ \mathrm{Var}(X) = E(X^2) - [E(X)]^2 = 10.4 - 3^2 = 10.4 - 9 = 1.4 \]
(d) \[ \mathrm{Var}(X^2) = E[(X^2 - E(X^2))^2] = E[(X^2 - 10.4)^2] = \sum_{x=1}^{5} f(x)(x^2 - 10.4)^2 \]
\[ = 0.1 \times (1^2 - 10.4)^2 + 0.3 \times (2^2 - 10.4)^2 + 0.2 \times (3^2 - 10.4)^2 + 0.3 \times (4^2 - 10.4)^2 + 0.1 \times (5^2 - 10.4)^2 \]
\[ = 0.1 \times 88.36 + 0.3 \times 40.96 + 0.2 \times 1.96 + 0.3 \times 31.36 + 0.1 \times 213.16 = 52.24 \]
OR
\[ \mathrm{Var}(X^2) = E[(X^2)^2] - [E(X^2)]^2 = E(X^4) - [E(X^2)]^2 = \sum_{x=1}^{5} f(x)\,x^4 - (10.4)^2 \]
\[ = 0.1 \times 1^4 + 0.3 \times 2^4 + 0.2 \times 3^4 + 0.3 \times 4^4 + 0.1 \times 5^4 - (10.4)^2 = 160.4 - 108.16 = 52.24 \]
(e) \( E(4X + 7) = 4E(X) + 7 = 4 \times 3 + 7 = 19 \)
    \( \mathrm{Var}(4X + 7) = 4^2\,\mathrm{Var}(X) = 16 \times 1.4 = 22.4 \)
(f) \( E(2 - 5X) = 2 - 5E(X) = 2 - 5 \times 3 = -13 \)
    \( \mathrm{Var}(2 - 5X) = (-5)^2\,\mathrm{Var}(X) = 25 \times 1.4 = 35 \)

Example 4
Let X be the discrete random variable of the number of heads shown when three fair coins are tossed. The probability distribution of X is as follows.

x      0    1    2    3
f(x)  1/8  3/8  3/8  1/8

(a) Find \(E(X)\), \(E(X^2)\) and \(E[X(X - 1)]\).
(b) Find \(\mathrm{Var}(X)\).
(c) Find \(\mathrm{Var}(X^2)\).

Example 5
There are 4 gold coins and 6 silver coins in a bag. Three coins are randomly drawn in succession from the bag without replacement. Let the random variable X be the total number of gold coins that have been drawn.
(a) Tabulate the probability distribution of X.
(b) Find E(X) and Var(X).

Associate Degree 2021 – 2022 First Semester CCMA4001 Quantitative Analysis I

Chapter 5 Special Discrete Probability Distributions

5.1 The Bernoulli Distribution
A Bernoulli trial is a random experiment which can result in one of two possible outcomes. For convenience, we may call the two outcomes 'success' and 'failure', but they can equally be 'yes' and 'no', 'good' and 'defective', 'male' and 'female', 'black' and 'white', etc.
Suppose that the probability of a 'success' is p. Then the probability of a 'failure' is 1 - p. Since the outcomes are not numerical, we define a random variable X on the sample space {success, failure} so that
\[ X = \begin{cases} 1 & \text{for a success} \\ 0 & \text{for a failure} \end{cases} \]
This random variable is called a Bernoulli variable and has the probability distribution called a Bernoulli distribution, as shown below.

x          0      1
P(X = x)  1 - p   p

Mathematically, the Bernoulli distribution is given by
\[ P(X = x) = p^x (1 - p)^{1 - x} \quad \text{for } x = 0, 1 \]
The mean (expected value) of the distribution is \( \mu = p \)
and the variance is given by \( \sigma^2 = p(1 - p) \)
Note that a Bernoulli distribution is completely specified by the value of p, which is often called the parameter of the distribution.
web resources: https://www.jbstatistics.com/introduction-to-the-bernoulli-distribution/

Example 1
A college, which has 1250 students, conducts a chairman election for the student union. Each student drops a vote in a ballot box. It is found that Peter gets 875 votes. Suppose a vote is picked at random from the ballot box. Let the random variable X be the count of the vote for Peter.
(a) Describe the probability distribution of X.
(b) Find the mean and variance of X.
(c) Interpret the mean and variance of X.

Solution:
(a) When the vote is for Peter, the count is 1; otherwise the count is 0.
\( P(X = 1) = P(\text{the vote is for Peter}) = \frac{875}{1250} = 0.7 \)
\( P(X = 0) = P(\text{the vote is not for Peter}) = 1 - 0.7 = 0.3 \)

x          0    1
P(X = x)  0.3  0.7

It is a Bernoulli distribution with p = 0.7.
(b) The mean of X is \( \mu = p = 0.7 \)
    The variance of X is \( \sigma^2 = p(1 - p) = 0.7 \times 0.3 = 0.21 \)
(c) The mean is the proportion of votes for Peter (success) in the population. The variance is a measure of variability of this proportion.

5.2 The Binomial Distribution
A binomial distribution can be thought of as the distribution of the number of SUCCESS outcomes in an experiment or survey that is repeated a fixed number of times, where each repetition has only two possible outcomes (the prefix "bi" means two, or twice). For example, a coin toss has only two possible outcomes, heads or tails, and taking a test could have two possible outcomes, pass or fail.
Let n be the total (fixed) number of independent Bernoulli trials, each trial resulting in either a success (S) or a failure (F) with constant probabilities p and 1 - p respectively. Also let the random variable X be the number of successes in the n trials. This random variable is called a binomial variable and its distribution is called a binomial distribution.
For a typical outcome with x successes (and hence with n - x failures), the probability is, by the special multiplication rule for independent events,
\[ \underbrace{p \cdot p \cdots p}_{x} \; \underbrace{(1 - p)(1 - p) \cdots (1 - p)}_{n - x} = p^x (1 - p)^{n - x} \]

Trial Number      1    2      3    4    …    n-1    n
Outcome of Trial  S    F      S    S    …    F      S
Success Number    1           2    3    …           x
Probability       p    1-p    p    p    …    1-p    p

But the x successes can occur in \(C^n_x\) ways among the n trials.
Hence the total probability for x successes and n - x failures is
\[ P(X = x) = C^n_x\, p^x (1 - p)^{n - x} \quad \text{for } x = 0, 1, 2, \ldots, n \]
This is the probability function of the binomial distribution.
The probability function has two parameters, n and p. Hence, the distribution is completely specified when the values of these parameters are specified. For this reason, we shall use the notation Bin(n, p) to denote the binomial distribution with parameters n and p, where n is the number of trials and p is the probability of success in each trial. Thus, Bin(5, 0.3) denotes the binomial distribution
\[ P(X = x) = C^5_x\, (0.3)^x (0.7)^{5 - x} \quad \text{for } x = 0, 1, 2, 3, 4, 5 \]
We also write X ~ Bin(n, p) to mean that 'X has the Bin(n, p) distribution', or 'X is distributed as Bin(n, p)'.
The mean of the binomial distribution Bin(n, p) is \( \mu = np \)
and the variance of the binomial distribution Bin(n, p) is \( \sigma^2 = np(1 - p) \)
Binomial distributions must also meet the following three criteria:
• The number of observations or trials is fixed. In other words, you can only figure out the probability of something happening if you do it a certain number of times. This is common sense: if you toss a coin once, your probability of getting a tails is 50%. If you toss a coin 20 times, your probability of getting at least one tails is very, very close to 100%.
• Each observation or trial is independent. In other words, none of your trials has an effect on the probability of the next trial.
• The probability of success (tails, heads, fail or pass) is exactly the same from one trial to another.
web resources: https://www.jbstatistics.com/introduction-to-the-binomial-distribution/

Example 2
From past records, patients suffering from a certain disease will recover in one week's time with a probability of 0.7 if they are given a treatment, and with a probability of 0.3 if they are not given a treatment.
Find the probability that
(a) out of 8 patients who receive treatment, less than 4 will recover in one week's time;
(b) out of 8 patients who do not receive treatment, more than 6 will recover in one week's time.

Solution:
(a) Let X be the number of patients who receive treatment and will recover in one week's time. Then X ~ Bin(8, 0.7), i.e.,
\[ P(X = x) = C^8_x\, (0.7)^x (0.3)^{8 - x} \quad \text{for } x = 0, 1, 2, \ldots, 8 \]
Hence
\[ P(X < 4) = P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3) \]
\[ = C^8_0 (0.7)^0 (0.3)^8 + C^8_1 (0.7)^1 (0.3)^7 + C^8_2 (0.7)^2 (0.3)^6 + C^8_3 (0.7)^3 (0.3)^5 \]
\[ = (0.3)^8 + 8 \times 0.7 \times (0.3)^7 + 28 \times (0.7)^2 \times (0.3)^6 + 56 \times (0.7)^3 \times (0.3)^5 \]
\[ = 0.00006561 + 0.00122472 + 0.01000188 + 0.04667544 = 0.05796765 \]
(b) Let Y be the number of patients who do not receive treatment and will recover in one week's time. Then Y ~ Bin(8, 0.3), i.e.,
\[ P(Y = y) = b(y; 8, 0.3) = C^8_y\, (0.3)^y (0.7)^{8 - y} \quad \text{for } y = 0, 1, 2, \ldots, 8 \]
Hence
\[ P(Y > 6) = P(Y = 7) + P(Y = 8) = C^8_7 (0.3)^7 (0.7)^1 + C^8_8 (0.3)^8 (0.7)^0 \]
\[ = 8 \times (0.3)^7 \times 0.7 + (0.3)^8 = 0.00122472 + 0.00006561 = 0.00129033 \]

5.3 The Geometric Distribution
In the binomial distribution, the number of trials n is fixed, and the random variable of interest is the number of 'successes' in these n trials. Sometimes we do not fix the number of trials beforehand but keep on repeating the trial until a 'success' occurs. Then the number of successes is fixed (= 1) while the number of trials is a random variable. The associated probability distribution is called the geometric distribution, which we define formally as follows.
Let independent Bernoulli trials, each with a constant probability p of a success, be performed until a success occurs.
The number of trials X then has the geometric distribution with parameter p, given by
\[ P(X = x) = (1 - p)^{x - 1} p \quad \text{for } x = 1, 2, 3, \ldots \]
If x independent Bernoulli trials are required to give a success, then the first x - 1 trials must all result in failures (F) while the x-th trial must result in a success (S).

Trial Number      1      2      3      …    x-1    x
Outcome of Trial  F      F      F      …    F      S
Probability       1-p    1-p    1-p    …    1-p    p

Since the probability of each failure is 1 - p and that of the success is p, and since the trials are independent, the probability of the above event is \((1 - p)^{x - 1} p\) as required.
Note that the values which X may take can be any integer from 1 to infinity, since at least one trial is required to obtain a success, but theoretically, infinitely many trials may have to be performed to get the success. Also, the probabilities \((1 - p)^{x - 1} p\) (for x = 1, 2, 3, …) are the terms of a geometric series with common ratio 1 - p. This justifies the name 'geometric distribution'.
The mean of the geometric distribution with parameter p is
\[ \mu = \frac{1}{p} \]
and the variance of the geometric distribution with parameter p is
\[ \sigma^2 = \frac{1 - p}{p^2} \]
web resources: https://www.jbstatistics.com/introduction-to-the-geometric-distribution/

Example 3
A fair die is thrown until a '1' occurs.
(a) Find the probability distribution for the number of throws required.
(b) Find the probability that 6 throws are required.
(c) Find the probability that more than 6 throws are required.
(d) Find the mean and the standard deviation of the probability distribution in (a).

Solution:
(a) Each throw of the die is a Bernoulli trial with a probability of success (the occurrence of '1') equal to \(\frac{1}{6}\). Hence the number of independent trials required, X, has a geometric distribution with \(p = \frac{1}{6}\):
\[ P(X = x) = \left(\frac{5}{6}\right)^{x - 1} \left(\frac{1}{6}\right) \quad \text{for } x = 1, 2, 3, \ldots \]
(b) \[ P(X = 6) = \left(\frac{5}{6}\right)^{5} \left(\frac{1}{6}\right) = \frac{5^5}{6^6} = \frac{3125}{46656} \approx 0.06698 \]
(c) \[ P(X > 6) = P(X = 7) + P(X = 8) + P(X = 9) + \cdots = \left(\frac{5}{6}\right)^{6} \frac{1}{6} + \left(\frac{5}{6}\right)^{7} \frac{1}{6} + \left(\frac{5}{6}\right)^{8} \frac{1}{6} + \cdots \]
which is the sum to infinity of a geometric series, \( S(\infty) = \frac{a}{1 - r} \), with \( a = \left(\frac{5}{6}\right)^{6} \frac{1}{6} \) and \( r = \frac{5}{6} \). Hence
\[ P(X > 6) = \frac{\left(\frac{5}{6}\right)^{6} \frac{1}{6}}{1 - \frac{5}{6}} = \left(\frac{5}{6}\right)^{6} = \frac{15625}{46656} \approx 0.3349 \]
OR
The event that more than n throws are required can also be interpreted as '1' not occurring in the first n throws. This gives \( P(X > n) = \left(\frac{5}{6}\right)^{n} \) directly. Hence
\[ P(X > 6) = \left(\frac{5}{6}\right)^{6} = \frac{15625}{46656} \approx 0.3349 \]
(d) The mean is \( \mu = \frac{1}{p} = \frac{1}{1/6} = 6 \)
This is a reasonable result: if the probability for a '1' to occur is \(\frac{1}{6}\), then theoretically the die must be thrown 6 times on average to obtain one '1'.
The variance is \( \sigma^2 = \frac{1 - p}{p^2} = \frac{5/6}{(1/6)^2} = \frac{5}{6} \times 36 = 30 \)
Hence the standard deviation is \( \sigma = \sqrt{30} \approx 5.4772 \)

5.4 The Poisson Distribution
The Poisson distribution is the discrete probability distribution of the number of events occurring in a given time period, given the average number of times the event occurs over that time period.
For example, a certain fast-food restaurant gets an average of 3 visitors to the drive-through per minute. This is just an average, however. The actual number can vary.
The Poisson distribution is applicable only when several conditions hold.
• An event can occur any number of times during a time period.
• Events occur independently.
  In other words, if an event occurs, it does not affect the probability of another event occurring in the same time period.
• The rate of occurrence is constant; that is, the rate does not change based on time.
• The probability of an event occurring is proportional to the length of the time period. For example, an event should be twice as likely to occur in a 2-hour period as in a 1-hour period.
The Poisson variable X with parameter \(\lambda\) (> 0) has the probability function
\[ P(X = x) = \frac{e^{-\lambda} \lambda^x}{x!} \quad \text{for } x = 0, 1, 2, \ldots \]
A Poisson distribution is completely specified by only one parameter, \(\lambda\), and is denoted by Po(\(\lambda\)).
The mean of the Poisson distribution Po(\(\lambda\)) is \( \mu = \lambda \)
and the variance of the Poisson distribution Po(\(\lambda\)) is \( \sigma^2 = \lambda \)
Therefore, for the Poisson distribution Po(\(\lambda\)), mean = variance = \(\lambda\).
web resources: https://www.jbstatistics.com/introduction-to-the-poisson-distribution/

Example 4
The number of telephone calls at an office over a given time interval may be considered as having a Poisson distribution. The average number of phone calls per hour in an office is 180. Find the probability of having
(a) 4 phone calls in a minute,
(b) no phone call in a particular 3-minute interval.

Solution:
Let the random variable X be the number of phone calls in a minute. Then X has a Poisson distribution with mean \( \lambda = \frac{180}{60} = 3 \), i.e.,
\[ P(X = x) = \frac{e^{-3} (3)^x}{x!} \quad \text{for } x = 0, 1, 2, \ldots \]
Note that 180 calls per hour = 3 calls per minute.
(a) The probability of having 4 phone calls in a minute is
\[ P(X = 4) = \frac{e^{-3} (3)^4}{4!} = \frac{e^{-3} (81)}{24} \approx 0.1680 \]
(b) The probability of having no phone call in a minute is
\[ P(X = 0) = \frac{e^{-3} (3)^0}{0!} = \frac{e^{-3} (1)}{1} \approx 0.04979 \]
Since the 1-minute intervals are independent of one another, the probability of having no phone call in a particular 3-minute interval is
\[ (0.04979)^3 \approx 0.0001234 \]
OR
Let the random variable Y be the number of phone calls in a 3-minute interval.
Then Y has a Poisson distribution with mean \( \lambda = \frac{180}{20} = 9 \), i.e.,
\[ P(Y = y) = \frac{e^{-9} (9)^y}{y!} \quad \text{for } y = 0, 1, 2, \ldots \]
Note that 180 calls per hour = 9 calls per 3-minute interval.
The probability of having no phone call in a particular 3-minute interval is
\[ P(Y = 0) = \frac{e^{-9} (9)^0}{0!} = \frac{e^{-9} (1)}{1} \approx 0.0001234 \]

Chapter 6 The Normal Distribution and Its Applications

6.1 The Normal Distribution
A. The Probability Density Function
One of the most important and widely applied continuous distributions is the normal (or Gaussian) distribution.
A continuous random variable X with a normal distribution with mean \(\mu\), variance \(\sigma^2\) and standard deviation \(\sigma\) has the probability density function
\[ f(x) = \frac{1}{\sigma \sqrt{2\pi}}\, e^{-\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2} \quad \text{for } -\infty < x < \infty \]
If a random variable X is normally distributed with mean \(\mu\) and variance \(\sigma^2\), we write \( X \sim N(\mu, \sigma^2) \).
The figure shown gives a sketch of such a normal curve.
[Figure: bell-shaped normal curve centred at \(\mu\)]
The normal curve has the following properties.
(1) f(x) > 0 for all values of x. This is obvious from the definition of f(x). Thus, condition (1) for a pdf is satisfied.
(2) The total area under the curve and above the horizontal axis is equal to 1, i.e.,
\[ \int_{-\infty}^{\infty} f(x)\,dx = \int_{-\infty}^{\infty} \frac{1}{\sigma \sqrt{2\pi}}\, e^{-\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2} dx = 1 \]
thus satisfying condition (2) for a pdf.
The graph of f(x) has the following features.
(1) It is a bell-shaped curve symmetrical about the vertical line \(x = \mu\).
(2) The mean, median and mode are all equal to \(\mu\).
(3) The normal curve extends indefinitely in both directions, from \(-\infty\) to \(+\infty\), and it has the x-axis as an asymptote. That is, as \(x \to -\infty\) or \(x \to +\infty\), \(f(x) \to 0\).
The shape and the location of the curve f(x) depend on two parameters, the mean \(\mu\) and the variance \(\sigma^2\).
For normal curves with the same \(\sigma\) but different \(\mu\)'s (\(\mu_1 < \mu_2\)), the curves have the same shape but are centred at different positions along the horizontal axis.
For normal curves with the same \(\mu\) but different \(\sigma\)'s (\(\sigma_1 < \sigma_2\)), the curves are centred at exactly the same position on the horizontal axis but have different shapes: the curve with the larger standard deviation is lower and spreads out farther.
[Figure: example normal curves with different means and with different standard deviations]

B. Probabilities of the Standard Normal Distribution N(0, 1)
Suppose a random variable X has a normal distribution \(N(\mu, \sigma^2)\). The probability that X lies between a and b is written \(P(a < X < b)\) and is given by the area under the normal curve between a and b.
\[ P(a < X < b) = \int_a^b \frac{1}{\sigma \sqrt{2\pi}}\, e^{-\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2} dx \]
The areas under the normal curve can in principle be computed by integral calculus, but this integral cannot be evaluated in closed form. In practice, we usually find normal probabilities from tables. In order to use the same set of tables for all possible values of mean \(\mu\) and variance \(\sigma^2\), we perform a process known as 'standardizing X' to obtain the standard normal variable, which is given the special symbol Z.
The random variable Z having the normal distribution with mean 0 and standard deviation equal to 1 is called the standard normal variable. Thus, Z ~ N(0, 1); N(0, 1) is called the standard normal distribution and has pdf
\[ f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{z^2}{2}} \quad \text{for } -\infty < z < \infty \]
The following is the graph of the standard normal distribution. Probabilities of this distribution can be found from normal tables.
[Figure: standard normal curve centred at 0]
Mathematicians have set up tables of areas under the standard normal curve. One form of such normal tables is given in Table I below. The table gives values of A(z) defined by
\[ A(z) = P(0 \le Z \le z) = \int_0^z \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2} t^2} dt \quad \text{for } z \ge 0 \]
[Figure: standard normal curve with the area A(z) between 0 and z shaded]
From the table, we can read, for example, the following probabilities as areas A(z) under the standard normal curve.
\( P(0 \le Z \le 1.2) = A(1.2) = 0.3849 \)
\( P(0 \le Z \le 0.63) = A(0.63) = 0.2357 \)
The normal table can be used to find values like \(P(Z > a)\), \(P(Z < b)\) and \(P(a < Z < b)\). In using the table, it is always recommended to sketch the normal curve and shade the relevant region. This helps to visualize the situation under consideration.
Watch: https://www.youtube.com/watch?v=lgwT6tDniko

The entries in Table I are the probabilities that a random variable having the standard normal distribution will take on a value between 0 and z. They are given by the area of the gray region under the curve in the figure.

TABLE I NORMAL-CURVE AREAS

z     0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
0.0   0.0000  0.0040  0.0080  0.0120  0.0160  0.0199  0.0239  0.0279  0.0319  0.0359
0.1   0.0398  0.0438  0.0478  0.0517  0.0557  0.0596  0.0636  0.0675  0.0714  0.0753
0.2   0.0793  0.0832  0.0871  0.0910  0.0948  0.0987  0.1026  0.1064  0.1103  0.1141
0.3   0.1179  0.1217  0.1255  0.1293  0.1331  0.1368  0.1406  0.1443  0.1480  0.1517
0.4   0.1554  0.1591  0.1628  0.1664  0.1700  0.1736  0.1772  0.1808  0.1844  0.1879
0.5   0.1915  0.1950  0.1985  0.2019  0.2054  0.2088  0.2123  0.2157  0.2190  0.2224
0.6   0.2257  0.2291  0.2324  0.2357  0.2389  0.2422  0.2454  0.2486  0.2517  0.2549
0.7   0.2580  0.2611  0.2642  0.2673  0.2704  0.2734  0.2764  0.2794  0.2823  0.2852
0.8   0.2881  0.2910  0.2939  0.2967  0.2995  0.3023  0.3051  0.3078  0.3106  0.3133
0.9   0.3159  0.3186  0.3212  0.3238  0.3264  0.3289  0.3315  0.3340  0.3365  0.3389
1.0   0.3413  0.3438  0.3461  0.3485  0.3508  0.3531  0.3554  0.3577  0.3599  0.3621
1.1   0.3643  0.3665  0.3686  0.3708  0.3729  0.3749  0.3770  0.3790  0.3810  0.3830
1.2   0.3849  0.3869  0.3888  0.3907  0.3925  0.3944  0.3962  0.3980  0.3997  0.4015
1.3   0.4032  0.4049  0.4066  0.4082  0.4099  0.4115  0.4131  0.4147  0.4162  0.4177
1.4   0.4192  0.4207  0.4222  0.4236  0.4251  0.4265  0.4279  0.4292  0.4306  0.4319
1.5   0.4332  0.4345  0.4357  0.4370  0.4382  0.4394  0.4406  0.4418  0.4429  0.4441
1.6   0.4452  0.4463  0.4474  0.4484  0.4495  0.4505  0.4515  0.4525  0.4535  0.4545
1.7   0.4554  0.4564  0.4573  0.4582  0.4591  0.4599  0.4608  0.4616  0.4625  0.4633
1.8   0.4641  0.4648  0.4656  0.4664  0.4671  0.4678  0.4685  0.4692  0.4699  0.4706
1.9   0.4713  0.4719  0.4725  0.4732  0.4738  0.4744  0.4750  0.4756  0.4761  0.4767
2.0   0.4772  0.4778  0.4783  0.4788  0.4793  0.4798  0.4803  0.4808  0.4812  0.4817
2.1   0.4821  0.4826  0.4830  0.4834  0.4838  0.4842  0.4846  0.4850  0.4854  0.4857
2.2   0.4861  0.4864  0.4868  0.4871  0.4875  0.4878  0.4881  0.4884  0.4887  0.4890
2.3   0.4893  0.4896  0.4898  0.4901  0.4904  0.4906  0.4909  0.4911  0.4913  0.4916
2.4   0.4918  0.4920  0.4922  0.4925  0.4927  0.4929  0.4931  0.4932  0.4934  0.4936
2.5   0.4938  0.4940  0.4941  0.4943  0.4945  0.4946  0.4948  0.4949  0.4951  0.4952
2.6   0.4953  0.4955  0.4956  0.4957  0.4959  0.4960  0.4961  0.4962  0.4963  0.4964
2.7   0.4965  0.4966  0.4967  0.4968  0.4969  0.4970  0.4971  0.4972  0.4973  0.4974
2.8   0.4974  0.4975  0.4976  0.4977  0.4977  0.4978  0.4979  0.4979  0.4980  0.4981
2.9   0.4981  0.4982  0.4982  0.4983  0.4984  0.4984  0.4985  0.4985  0.4986  0.4986
3.0   0.4987  0.4987  0.4987  0.4988  0.4988  0.4989  0.4989  0.4989  0.4990  0.4990
3.1   0.4990  0.4991  0.4991  0.4991  0.4992  0.4992  0.4992  0.4992  0.4993  0.4993
3.2   0.4993  0.4993  0.4994  0.4994  0.4994  0.4994  0.4994  0.4995  0.4995  0.4995
3.3   0.4995  0.4995  0.4995  0.4996  0.4996  0.4996  0.4996  0.4996  0.4996  0.4997
3.4   0.4997  0.4997  0.4997  0.4997  0.4997  0.4997  0.4997  0.4997  0.4997  0.4998
3.5   0.4998  0.4998  0.4998  0.4998  0.4998  0.4998  0.4998  0.4998  0.4998  0.4998

Also, for z = 4.0, 5.0 and 6.0, the areas are 0.49997, 0.4999997, and 0.499999999.

Example 1
Given that Z is the standard normal variable, Z ~ N(0, 1), find the following probabilities.
(a) \(P(0 \le Z \le 1.28)\)
(b) \(P(-1.28 \le Z \le 0)\)
(c) \(P(Z > 1.28)\)
(d) \(P(Z > -1.28)\)
(e) \(P(Z < 1.28)\)
(f) \(P(Z < -1.28)\)
(g) \(P(1.28 \le Z \le 2.28)\)
(h) \(P(-1.28 \le Z \le 2.28)\)
(i) \(P(-2.28 \le Z \le -1.28)\)
(j) \(P(0 \le Z \le 1.289)\)

Solution:
(a) \( P(0 \le Z \le 1.28) = A(1.28) = 0.3997 \)
(b) \( P(-1.28 \le Z \le 0) = P(0 \le Z \le 1.28) = A(1.28) = 0.3997 \)
(c) The normal curve is symmetrical about the line through its mean (i.e., the line z = 0) and the total area under the curve is equal to 1.
Therefore the area on the right of the line z = 0 is 0.5, i.e., \(P(Z > 0) = 0.5\).
\( P(Z > 1.28) = P(Z > 0) - P(0 \le Z \le 1.28) = 0.5 - 0.3997 = 0.1003 \)
(d) \( P(Z > -1.28) = P(-1.28 \le Z \le 0) + P(Z > 0) = 0.3997 + 0.5 = 0.8997 \)
(e) \( P(Z < 1.28) = P(Z < 0) + P(0 \le Z \le 1.28) = 0.5 + 0.3997 = 0.8997 \)
(f) \( P(Z < -1.28) = P(Z < 0) - P(-1.28 \le Z \le 0) = 0.5 - 0.3997 = 0.1003 \)
(g) \( P(1.28 \le Z \le 2.28) = P(0 \le Z \le 2.28) - P(0 \le Z \le 1.28) = A(2.28) - A(1.28) = 0.4887 - 0.3997 = 0.0890 \)
(h) \( P(-1.28 \le Z \le 2.28) = P(-1.28 \le Z \le 0) + P(0 \le Z \le 2.28) = 0.3997 + 0.4887 = 0.8884 \)
(i) \( P(-2.28 \le Z \le -1.28) = P(-2.28 \le Z \le 0) - P(-1.28 \le Z \le 0) = P(0 \le Z \le 2.28) - P(0 \le Z \le 1.28) = A(2.28) - A(1.28) = 0.4887 - 0.3997 = 0.0890 \)
(j) The normal probability table gives probabilities only for values of z with at most 2 decimal places. Hence, rounding 1.289 to 1.29, we have
\( P(0 \le Z \le 1.289) \approx P(0 \le Z \le 1.29) = A(1.29) = 0.4015 \)

Example 2
If Z has the standard normal distribution, determine the value of z in each of the following cases.
(a) \(P(Z < z) = 0.8790\)
(b) \(P(Z > z) = 0.0401\)
(c) \(P(Z < z) = 0.0080\)
(d) \(P(Z > z) = 0.6429\)
(e) \(P(-2.1 \le Z \le z) = 0.2586\)
(f) \(P(z \le Z \le 0.37) = 0.3125\)

Solution:
In solving this problem, the normal table is used in the reverse way. We first locate the given probability in the table. Its corresponding row digits and column digits then give the value of z.
(a) As \(P(Z < z) = 0.8790\),
\( P(0 \le Z \le z) = P(Z < z) - P(Z < 0) = 0.8790 - 0.5 = 0.3790 \)
The entry 0.3790 is located in the row marked 1.1 and in the column marked 0.07 in the normal table.
Therefore the required z = 1.17.
(b) As \(P(Z > z) = 0.0401\),
\( P(0 \le Z \le z) = P(Z > 0) - P(Z > z) = 0.5 - 0.0401 = 0.4599 \)
From the normal table, z = 1.75.
(c) \(P(Z < z) = 0.0080 < 0.5\), implying that z lies on the left of 0 such that
\( P(z \le Z \le 0) = P(Z < 0) - P(Z < z) = 0.5 - 0.0080 = 0.4920 \)
\( P(0 \le Z \le -z) = P(z \le Z \le 0) = 0.4920 \)
From the normal table, -z = 2.41, so z = -2.41.
(d) \(P(Z > z) = 0.6429 > 0.5\), implying that z lies on the left of 0 such that
\( P(z \le Z \le 0) = P(Z > z) - P(Z > 0) = 0.6429 - 0.5 = 0.1429 \)
\( P(0 \le Z \le -z) = P(z \le Z \le 0) = 0.1429 \)
From the normal table, A(0.36) = 0.1406 and A(0.37) = 0.1443. Since 0.1429 is closer to 0.1443, we take -z = 0.37, so z = -0.37.
(e) As \( P(-2.1 \le Z \le 0) = P(0 \le Z \le 2.1) = A(2.1) = 0.4821 > 0.2586 \), z must be negative in this case.
By symmetry,
\( P(0 \le Z \le -z) = P(z \le Z \le 0) = P(-2.1 \le Z \le 0) - P(-2.1 \le Z \le z) = 0.4821 - 0.2586 = 0.2235 \)
From the normal table, A(0.59) = 0.2224 and A(0.60) = 0.2257. Since 0.2235 is closer to 0.2224, we take -z = 0.59, so z = -0.59.
(f) As \( P(0 \le Z \le 0.37) = A(0.37) = 0.1443 < 0.3125 \), z must be negative in this case.
By symmetry,
\( P(0 \le Z \le -z) = P(z \le Z \le 0) = P(z \le Z \le 0.37) - P(0 \le Z \le 0.37) = 0.3125 - 0.1443 = 0.1682 \)
From the normal table, A(0.43) = 0.1664 and A(0.44) = 0.1700. Since 0.1682 is the average of 0.1664 and 0.1700, we take -z = 0.435, so z = -0.435.

C. Probabilities of the Normal Distribution \(N(\mu, \sigma^2)\)
When \(X \sim N(\mu, \sigma^2)\) and \(\mu\) and \(\sigma\) are not 0 and 1 respectively, we have to transform the random variable X to a normal random variable Z with mean zero and variance 1. This can be done by means of the transformation
\[ Z = \frac{X - \mu}{\sigma} \]

Example 3
A random variable X is normally distributed with mean 30 and standard deviation 4. Find the following probabilities.
(a) \(P(X < 33)\)
(b) \(P(X > 35)\)
(c) \(P(X < 27)\)
(d) \(P(21 \le X \le 28)\)

Solution:
The random variable X has the distribution N(30, 16) with mean \(\mu = 30\) and standard deviation \(\sigma = 4\). Hence X is transformed to the standard normal variable by
\[ Z = \frac{X - \mu}{\sigma} = \frac{X - 30}{4} \]
(a) \( P(X < 33) = P\left(Z < \frac{33 - 30}{4}\right) = P(Z < 0.75) = P(Z < 0) + P(0 \le Z \le 0.75) = 0.5 + 0.2734 = 0.7734 \)
(b) \( P(X > 35) = P\left(Z > \frac{35 - 30}{4}\right) = P(Z > 1.25) = P(Z > 0) - P(0 \le Z \le 1.25) = 0.5 - 0.3944 = 0.1056 \)
(c) \( P(X < 27) = P\left(Z < \frac{27 - 30}{4}\right) = P(Z < -0.75) = P(Z < 0) - P(-0.75 \le Z \le 0) = 0.5 - P(0 \le Z \le 0.75) = 0.5 - 0.2734 = 0.2266 \)
(d) \( P(21 \le X \le 28) = P\left(\frac{21 - 30}{4} \le Z \le \frac{28 - 30}{4}\right) = P(-2.25 \le Z \le -0.5) = P(-2.25 \le Z \le 0) - P(-0.5 \le Z \le 0) = P(0 \le Z \le 2.25) - P(0 \le Z \le 0.5) = 0.4878 - 0.1915 = 0.2963 \)

Example 4
A random variable X has distribution N(50, 100). Find the values of a and b if
(a) \(P(X > a) = 0.0427\)
(b) \(P(X < b) = 0.209\)

Solution:
The random variable X has the distribution N(50, 100), i.e., \(\mu = 50\) and \(\sigma = \sqrt{100} = 10\).
(a) If \(P(Z > z) = 0.0427\), then
\( P(0 \le Z \le z) = P(Z > 0) - P(Z > z) = 0.5 - 0.0427 = 0.4573 \)
From the normal table, z = 1.72.
As \(P(X > a) = 0.0427\),
\[ P\left(Z > \frac{a - 50}{10}\right) = 0.0427 \]
\[ \frac{a - 50}{10} = 1.72, \quad a - 50 = 17.2, \quad a = 67.2 \]
(b) \(P(Z < z) = 0.209 < 0.5\), implying that z lies on the left of 0 such that
\( P(z \le Z \le 0) = P(Z < 0) - P(Z < z) = 0.5 - 0.209 = 0.291 \)
\( P(0 \le Z \le -z) = P(z \le Z \le 0) = 0.291 \)
From the normal table, -z = 0.81, so z = -0.81.
As \(P(X < b) = 0.209\),
\[ P\left(Z < \frac{b - 50}{10}\right) = 0.209 \]
\[ \frac{b - 50}{10} = -0.81, \quad b - 50 = -8.1, \quad b = 41.9 \]

D. Applications of Normal Distribution
The normal distribution arises in many everyday situations.

Example 5
Suppose that W, the weight in kg of an adult male, follows the N(60, 25) distribution.
Calculate the probability that a male chosen at random is
(a) less than 61 kg;
(b) greater than 63 kg;
(c) between 58 kg and 63 kg;
(d) less than 58 kg.

Solution:
The mean and the variance of W are \(\mu = 60\) and \(\sigma^2 = 25\) respectively, so the standard deviation is \(\sigma = 5\).
The standardization formula is
\[ Z = \frac{W - 60}{5} \]

(a) The probability that a male chosen at random is less than 61 kg is
\[ P(W < 61) = P\left(\frac{W - 60}{5} < \frac{61 - 60}{5}\right) = P(Z < 0.2) = P(Z < 0) + P(0 < Z < 0.2) = 0.5 + 0.0793 = 0.5793 \]

(b) The probability that a male chosen at random is greater than 63 kg is
\[ P(W > 63) = P\left(\frac{W - 60}{5} > \frac{63 - 60}{5}\right) = P(Z > 0.6) = P(Z > 0) - P(0 < Z < 0.6) = 0.5 - 0.2257 = 0.2743 \]

(c) The probability that a male chosen at random is between 58 kg and 63 kg is
\[ P(58 < W < 63) = P\left(\frac{58 - 60}{5} < \frac{W - 60}{5} < \frac{63 - 60}{5}\right) = P(-0.4 < Z < 0.6) \]
\[ = P(-0.4 < Z < 0) + P(0 < Z < 0.6) = P(0 < Z < 0.4) + P(0 < Z < 0.6) = 0.1554 + 0.2257 = 0.3811 \]

(d) The probability that a male chosen at random is less than 58 kg is
\[ P(W < 58) = P\left(\frac{W - 60}{5} < \frac{58 - 60}{5}\right) = P(Z < -0.4) \]
\[ = P(Z < 0) - P(-0.4 < Z < 0) = P(Z < 0) - P(0 < Z < 0.4) = 0.5 - 0.1554 = 0.3446 \]

Example 6
The intelligence quotients (IQ) of children in a city are normally distributed with mean 100 and standard deviation 15.
(a) What proportion of children have IQ scores
    (i) less than 91?
    (ii) between 106 and 130?
(b) What IQ score will be exceeded by only 5% of the children?

Solution:
(a) Let the random variable X be the IQ score of a child in the city.
Then X ~ N(100, 225).

(i) The proportion of children who have IQ scores less than 91 is
\[ P(X < 91) = P\left(Z < \frac{91 - 100}{15}\right) = P(Z < -0.6) = P(Z < 0) - P(-0.6 < Z < 0) = P(Z < 0) - P(0 < Z < 0.6) = 0.5 - 0.2257 = 0.2743 \]

(ii) The proportion of children who have IQ scores between 106 and 130 is
\[ P(106 < X < 130) = P\left(\frac{106 - 100}{15} < Z < \frac{130 - 100}{15}\right) = P(0.4 < Z < 2) = P(0 < Z < 2) - P(0 < Z < 0.4) = 0.4772 - 0.1554 = 0.3218 \]

(b) Let a be the required score, so that \(P(X > a) = 0.05\).
If \(P(Z > z) = 0.05\), then
\[ P(0 < Z < z) = P(Z > 0) - P(Z > z) = 0.5 - 0.05 = 0.45 \]
From the normal table, A(1.64) = 0.4495 and A(1.65) = 0.4505.
Since 0.45 is the average of 0.4495 and 0.4505, z = 1.645.
As \(P(X > a) = 0.05\),
\[ P\left(Z > \frac{a - 100}{15}\right) = 0.05 \]
\[ \frac{a - 100}{15} = 1.645 \;\Rightarrow\; a - 100 = 24.675 \;\Rightarrow\; a = 124.675 \]
Therefore a = 125 (rounded up to the nearest integer).

E. Sum and Difference of Two Independent Normal Variables

If X and Y are any two random variables, continuous or discrete, then
\[ E(X + Y) = E(X) + E(Y) \]
\[ E(X - Y) = E(X) - E(Y) \]
Also, if X and Y are independent, then
\[ \mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) \]
\[ \mathrm{Var}(X - Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) \]
These results can be applied to normal variables. The sum or difference of two independent normal variables is also normally distributed.
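The rules above can be checked numerically. The following is a minimal sketch using `statistics.NormalDist` from the Python standard library, which defines `+` and `-` between two instances as the sum and difference of independent normal variables. The parameters (X ~ N(10, 9), Y ~ N(4, 16)) are illustrative values chosen here, not taken from the notes.

```python
from statistics import NormalDist

# Illustrative (assumed) parameters. NormalDist takes the standard
# deviation, not the variance, as its second argument.
X = NormalDist(mu=10, sigma=3)   # X ~ N(10, 9)
Y = NormalDist(mu=4, sigma=4)    # Y ~ N(4, 16)

S = X + Y   # sum of two independent normal variables
D = X - Y   # difference of two independent normal variables

# Means add (or subtract); variances add in both cases.
print(S.mean, S.variance)
print(D.mean, D.variance)
```

Note that the variance of the difference is 9 + 16 = 25, not 9 - 16: subtracting an independent variable still adds uncertainty.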
If X and Y are two independent normal variables such that
\[ X \sim N(\mu_1, \sigma_1^2) \quad\text{and}\quad Y \sim N(\mu_2, \sigma_2^2) \]
then
\[ X + Y \sim N(\mu_1 + \mu_2,\; \sigma_1^2 + \sigma_2^2) \]
and
\[ X - Y \sim N(\mu_1 - \mu_2,\; \sigma_1^2 + \sigma_2^2) \]

Example 7
If X ~ N(70, 9) and Y ~ N(60, 16), find
(a) \(P(X + Y < 140)\)   (b) \(P(120 < X + Y < 135)\)   (c) \(P(X - Y > 7)\)   (d) \(P(2 < X - Y < 8)\)

Solution:
(a) X + Y ~ N(70 + 60, 9 + 16), i.e., X + Y ~ N(130, 25).
\[ P(X + Y < 140) = P\left(Z < \frac{140 - 130}{\sqrt{25}}\right) = P(Z < 2) = P(Z < 0) + P(0 < Z < 2) = 0.5 + 0.4772 = 0.9772 \]

(b)
\[ P(120 < X + Y < 135) = P\left(\frac{120 - 130}{\sqrt{25}} < Z < \frac{135 - 130}{\sqrt{25}}\right) = P(-2 < Z < 1) \]
\[ = P(-2 < Z < 0) + P(0 < Z < 1) = P(0 < Z < 2) + P(0 < Z < 1) = 0.4772 + 0.3413 = 0.8185 \]

(c) X – Y ~ N(70 – 60, 9 + 16), i.e., X – Y ~ N(10, 25).
\[ P(X - Y > 7) = P\left(Z > \frac{7 - 10}{\sqrt{25}}\right) = P(Z > -0.6) = P(-0.6 < Z < 0) + P(Z > 0) = P(0 < Z < 0.6) + P(Z > 0) = 0.2257 + 0.5 = 0.7257 \]

(d)
\[ P(2 < X - Y < 8) = P\left(\frac{2 - 10}{\sqrt{25}} < Z < \frac{8 - 10}{\sqrt{25}}\right) = P(-1.6 < Z < -0.4) \]
\[ = P(-1.6 < Z < 0) - P(-0.4 < Z < 0) = P(0 < Z < 1.6) - P(0 < Z < 0.4) = 0.4452 - 0.1554 = 0.2898 \]

Associate Degree 2021 – 2022 First Semester
CCMA4001 Quantitative Analysis I

Chapter 7 Sampling Distribution

7.1 Random Samples and Sampling Distributions

A. Sampling Distributions

The sample mean \(\bar{X}\) of the sample observations \(X_1, X_2, \ldots, X_n\) from a population X is given by
\[ \bar{X} = \frac{1}{n}(X_1 + X_2 + \cdots + X_n) \]
From the definition of a random sample, the observations are random variables and so the sample statistic is also a random variable. Hence, the sample mean has a probability distribution, and this distribution is called the sampling distribution of the mean.

Example 1
A population consists of the numbers 1, 2, 3, 4 and 5. Random samples of size 2 are drawn from the population with replacement.
(a) List all the possible samples of size 2 and find their means.
(b) Construct the sampling distribution for the sample mean.

Solution:
(a) Let \((X_1, X_2)\) represent a sample of size 2 drawn from the population. Since the numbers are drawn with replacement, each \(X_i\) (i = 1, 2) can take values from 1 to 5. Hence, there are 25 possible samples and these, together with their means \(\bar{x} = \frac{1}{2}(x_1 + x_2)\), are listed in the following table.

Sample  Mean    Sample  Mean    Sample  Mean    Sample  Mean    Sample  Mean
(1, 1)  1       (2, 1)  1.5     (3, 1)  2       (4, 1)  2.5     (5, 1)  3
(1, 2)  1.5     (2, 2)  2       (3, 2)  2.5     (4, 2)  3       (5, 2)  3.5
(1, 3)  2       (2, 3)  2.5     (3, 3)  3       (4, 3)  3.5     (5, 3)  4
(1, 4)  2.5     (2, 4)  3       (3, 4)  3.5     (4, 4)  4       (5, 4)  4.5
(1, 5)  3       (2, 5)  3.5     (3, 5)  4       (4, 5)  4.5     (5, 5)  5

(b) Since the samples are drawn randomly, each sample has the same probability of being drawn. The totality of all the samples thus forms an equiprobable sample space with 25 sample points, each point having a probability of \(\frac{1}{25}\).
The sampling distribution of the sample mean \(\bar{X}\) may then be derived from the table in (a) and is shown in the following table.

Sample mean x̄    Probability
1                 1/25
1.5               2/25
2                 3/25
2.5               4/25
3                 5/25 = 1/5
3.5               4/25
4                 3/25
4.5               2/25
5                 1/25
Total             1

Example 2
Refer to the data and results of Example 1.
(a) Find the mean \(\mu\) and the variance \(\sigma^2\) of the population.
(b) Find the mean \(E(\bar{X})\) and the variance \(\mathrm{Var}(\bar{X})\) of the sampling distribution of \(\bar{X}\).

Solution:
(a)
\[ \mu = E(X) = \sum_{x=1}^{5} x f(x) = 1 \cdot \tfrac{1}{5} + 2 \cdot \tfrac{1}{5} + 3 \cdot \tfrac{1}{5} + 4 \cdot \tfrac{1}{5} + 5 \cdot \tfrac{1}{5} = 3 \]
\[ \sigma^2 = \mathrm{Var}(X) = E(X^2) - [E(X)]^2 = \sum_{x=1}^{5} x^2 f(x) - (3)^2 \]
\[ = 1^2 \cdot \tfrac{1}{5} + 2^2 \cdot \tfrac{1}{5} + 3^2 \cdot \tfrac{1}{5} + 4^2 \cdot \tfrac{1}{5} + 5^2 \cdot \tfrac{1}{5} - (3)^2 = 11 - 9 = 2 \]

(b)
\[ E(\bar{X}) = \sum_{i=1}^{9} \bar{x}_i f(\bar{x}_i) = 1 \cdot \tfrac{1}{25} + 1.5 \cdot \tfrac{2}{25} + 2 \cdot \tfrac{3}{25} + 2.5 \cdot \tfrac{4}{25} + 3 \cdot \tfrac{1}{5} + 3.5 \cdot \tfrac{4}{25} + 4 \cdot \tfrac{3}{25} + 4.5 \cdot \tfrac{2}{25} + 5 \cdot \tfrac{1}{25} = 3 \]
This is the same as the population mean \(\mu\).
\[ \mathrm{Var}(\bar{X}) = E(\bar{X}^2) - [E(\bar{X})]^2 = \sum_{i=1}^{9} \bar{x}_i^2 f(\bar{x}_i) - (3)^2 \]
\[ = 1^2 \cdot \tfrac{1}{25} + (1.5)^2 \cdot \tfrac{2}{25} + 2^2 \cdot \tfrac{3}{25} + (2.5)^2 \cdot \tfrac{4}{25} + 3^2 \cdot \tfrac{1}{5} + (3.5)^2 \cdot \tfrac{4}{25} + 4^2 \cdot \tfrac{3}{25} + (4.5)^2 \cdot \tfrac{2}{25} + 5^2 \cdot \tfrac{1}{25} - (3)^2 \]
\[ = 10 - 9 = 1 \]
OR
\[ \mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n} = \frac{2}{2} = 1 \]

7.2 Relationship Between Sample Mean and Population Mean

Suppose that a population has mean \(\mu\), which is unknown. To estimate \(\mu\), it is natural to draw a random sample from the population and use the sample mean as an estimate.

Let \(X_1, X_2, \ldots, X_n\) be a random sample of n independent observations from a population with mean \(\mu\) and variance \(\sigma^2\). Consider the sample mean \(\bar{X}\), where
\[ \bar{X} = \frac{1}{n}(X_1 + X_2 + \cdots + X_n) = \frac{1}{n}\sum_{i=1}^{n} X_i \]
The distribution of \(\bar{X}\) has mean (expected value) \(E(\bar{X})\) and variance \(\mathrm{Var}(\bar{X})\).
\[ E(\bar{X}) = E\left[\frac{1}{n}(X_1 + X_2 + \cdots + X_n)\right] = \frac{1}{n}[E(X_1) + E(X_2) + \cdots + E(X_n)] = \frac{1}{n}(n\mu) = \mu \]
\[ \mathrm{Var}(\bar{X}) = \mathrm{Var}\left[\frac{1}{n}(X_1 + X_2 + \cdots + X_n)\right] = \frac{1}{n^2}[\mathrm{Var}(X_1) + \mathrm{Var}(X_2) + \cdots + \mathrm{Var}(X_n)] = \frac{1}{n^2}(n\sigma^2) = \frac{\sigma^2}{n} \]
Therefore
\[ E(\bar{X}) = \mu = \text{population mean} \quad\text{and}\quad \mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n} = \frac{\text{population variance}}{\text{sample size}} \]
The sample mean is expected to assume the value of the population mean in the long run. Hence by drawing many, many random samples of fixed size n, calculating their means and averaging them, we can obtain a good estimate of the population mean.

7.3 The Sampling Distribution of the Sample Mean

Consider \(X_1, X_2, \ldots, X_n\), a random sample of size n, taken from a population with mean \(\mu\) and variance \(\sigma^2\).
We know that the sample mean \(\bar{X}\), where \(\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i\), has mean \(\mu\) and variance \(\frac{\sigma^2}{n}\).
The standard deviation of the distribution of \(\bar{X}\) is \(\frac{\sigma}{\sqrt{n}}\) and is called the standard error of the mean.
The distribution of \(\bar{X}\) is known as the sampling distribution of means. We now consider how the sample mean is distributed.

A. Sampling from a Normal Population

If \(X_1, X_2, \ldots, X_n\) is a random sample of size n taken from a normal distribution \(X \sim N(\mu, \sigma^2)\), then the distribution of \(\bar{X}\) is also normal and
\[ \bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right) \quad\text{where}\quad \bar{X} = \frac{1}{n}(X_1 + X_2 + \cdots + X_n) \]

[Figure: the distribution of X, where \(X \sim N(\mu, \sigma^2)\), together with the distributions of \(\bar{X}\) when n = 4 and when n = 16.]

The figure shows the relationship between the distributions of X and \(\bar{X}\) for a normal population. The shape of the distribution of \(\bar{X}\) is narrower than that of the population. Each curve is symmetrical about \(\mu\), but as n gets larger, the variance gets smaller, so the curve becomes taller and less spread out. The larger the sample size n, the smaller the variance of \(\bar{X}\) and hence the narrower its shape.

Watch http://onlinestatbook.com/2/sampling_distributions/samp_dist_meanM.html

Example 3
The weight of an egg is normally distributed with a mean of 62 grams and a standard deviation of 4 grams.
(a) If an egg is picked at random, find the probability that it weighs between 60 grams and 65 grams.
(b) Eggs are packed at random in boxes of 12. Find the probability that the average weight of the eggs in a box lies between 60 grams and 65 grams.

Solution:
(a) Let the random variable X be the weight of an egg in grams. Then X ~ N(62, 16).
The probability that an egg weighs between 60 grams and 65 grams is
\[ P(60 < X < 65) = P\left(\frac{60 - 62}{4} < Z < \frac{65 - 62}{4}\right) = P(-0.5 < Z < 0.75) \]
\[ = P(-0.5 < Z < 0) + P(0 < Z < 0.75) = P(0 < Z < 0.5) + P(0 < Z < 0.75) = 0.1915 + 0.2734 = 0.4649 \]

(b) Since X is normally distributed, \(\bar{X}\) is also normally distributed and
\[ \bar{X} \sim N\left(62, \frac{16}{12}\right), \quad\text{i.e.,}\quad \bar{X} \sim N\left(62, \frac{4}{3}\right) \]
The probability that the average weight of the eggs in a box lies between 60 grams and 65 grams is
\[ P(60 < \bar{X} < 65) = P\left(\frac{60 - 62}{\sqrt{4/3}} < Z < \frac{65 - 62}{\sqrt{4/3}}\right) = P(-1.73 < Z < 2.60) \]
\[ = P(-1.73 < Z < 0) + P(0 < Z < 2.60) = P(0 < Z < 1.73) + P(0 < Z < 2.60) = 0.4582 + 0.4953 = 0.9535 \]

B. Sampling from Any Population and Sample Size n is Large

When the population X is not normal, the distribution of \(\bar{X}\) may have a form different from that of the population. In many cases, the distribution of \(\bar{X}\) is difficult to find and its shape depends on the sample size n.
Fortunately, when n is large enough, the distribution of \(\bar{X}\) is known to be approximately normal, whether the population is normal or not. This is one of the most important results in statistics and is known as the Central Limit Theorem.

Central Limit Theorem
For any population X with mean \(\mu\) and variance \(\sigma^2\), the distribution of the sample mean \(\bar{X}\) (based on random samples of size n) is approximately normal with mean \(\mu\) and variance \(\frac{\sigma^2}{n}\), where n is sufficiently large,
\[ \text{i.e.,}\quad \bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right) \text{ approximately, for large } n. \]

This theorem explains why the normal distribution is so important in daily life and in statistical theory. The sample size n required for the approximate normality to be valid depends on the nature of the population distribution. In most cases, n = 30 may be considered sufficiently large for the Central Limit Theorem to apply. If the population is itself normal, then the sample mean is exactly normally distributed for any value of n.

[Figure: sampling distributions of \(\bar{X}\) for some populations with different sample sizes.]

If \(X_1, X_2, \ldots, X_n\) is a random sample of size n from any distribution X with mean \(\mu\) and variance \(\sigma^2\), then the distribution of the sample mean \(\bar{X}\) is approximately normal for large n and
\[ \bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right) \quad\text{where}\quad \bar{X} = \frac{1}{n}(X_1 + X_2 + \cdots + X_n) \]
The approximation gets better as n gets larger.
The distribution of X can be discrete, for example binomial or Poisson; or continuous, for example rectangular or exponential.

Example 4
If a random sample of size 30 is taken from each of the following distributions, find, for each case, the probability that the sample mean exceeds 5.
(a) X ~ Bin(9, 0.5)   (b) X ~ Po(4.5)

Solution:
(a) If X ~ Bin(9, 0.5), then
\[ E(X) = \mu = np = 9 \times 0.5 = 4.5 \]
\[ \mathrm{Var}(X) = \sigma^2 = np(1 - p) = 9 \times 0.5 \times 0.5 = 2.25 \]
The sample size is large, so by the central limit theorem,
\[ \bar{X} \sim N\left(4.5, \frac{2.25}{30}\right) \text{ approximately} \]
Hence
\[ P(\bar{X} > 5) = P\left(Z > \frac{5 - 4.5}{\sqrt{2.25/30}}\right) = P(Z > 1.83) = P(Z > 0) - P(0 < Z < 1.83) = 0.5 - 0.4664 = 0.0336 \]

(b) If X ~ Po(4.5), then
\[ E(X) = \mu = \lambda = 4.5 \]
\[ \mathrm{Var}(X) = \sigma^2 = \lambda = 4.5 \]
The sample size is large, so by the central limit theorem,
\[ \bar{X} \sim N\left(4.5, \frac{4.5}{30}\right) \text{ approximately} \]
Hence
\[ P(\bar{X} > 5) = P\left(Z > \frac{5 - 4.5}{\sqrt{4.5/30}}\right) = P(Z > 1.29) = P(Z > 0) - P(0 < Z < 1.29) = 0.5 - 0.4015 = 0.0985 \]

Example 5
Suppose that the population distribution of the daily wages of clerks is known to have a mean of $230 and a standard deviation of $25. For a random sample of 100 clerks, what is the probability that the sample mean daily wage will be between $225 and $233?

Solution:
Let the random variable X be the daily wage of a clerk in dollars.
It is given that \(\mu = 230\) and \(\sigma = 25\).
As the sample size n = 100 is large enough (> 30), we do not need to know the nature of the population distribution.
From the central limit theorem,
\[ \bar{X} \sim N\left(230, \frac{25^2}{100}\right) \text{ approximately, i.e., } \bar{X} \sim N(230, 6.25) \text{ approximately} \]
The probability that the sample mean daily wage will be between $225 and $233 is
\[ P(225 < \bar{X} < 233) = P\left(\frac{225 - 230}{\sqrt{6.25}} < Z < \frac{233 - 230}{\sqrt{6.25}}\right) = P(-2 < Z < 1.2) \]
\[ = P(-2 < Z < 0) + P(0 < Z < 1.2) = P(0 < Z < 2) + P(0 < Z < 1.2) = 0.4772 + 0.3849 = 0.8621 \]

Associate Degree 2020 – 2021 First Semester
CCMA4001 Quantitative Analysis I

Chapter 8 Estimation

Introduction

One major type of inferential statistics is estimating an unknown population parameter from the information collected in a sample. In this chapter, we will discuss how to estimate the unknown population mean and population proportion.
The determination of the sample size needed to achieve a certain level of accuracy will also be discussed.

Estimation of the Mean (σ known)

Point estimate

In the previous chapter, we saw that, according to the central limit theorem, the sample mean follows the normal distribution
\[ \bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right). \]
The sample mean \(\bar{x}\) is an unbiased point estimator of the population mean μ, as \(E(\bar{X}) = \mu\).

Example
The weights of a particular brand of cola follow a normal distribution with an unknown population mean and a standard deviation of 15 grams. A sample of 25 cans of cola has a mean of 362.3 grams. What is the estimated population mean?

The point estimate of μ = 362.3 grams.

Confidence Interval Estimate

When we estimate the population mean weight of this brand of cola as 362.3 grams from a particular sample, do we believe that the true population mean is exactly 362.3 grams? As sample means vary from one sample to another, even when the samples are drawn from the same population, we can only say that we believe the population mean is around 362.3 grams. Then, how accurate is the estimate?

As \(\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)\), we know that
\[ P\left(\mu - 1.96\frac{\sigma}{\sqrt{n}} < \bar{X} < \mu + 1.96\frac{\sigma}{\sqrt{n}}\right) = 0.95. \]
That means if we repeatedly draw different samples, 95% of the sample means fall within the limits
\[ \left(\mu - 1.96\frac{\sigma}{\sqrt{n}},\; \mu + 1.96\frac{\sigma}{\sqrt{n}}\right). \]

[Figure: sampling distribution of \(\bar{X}\), with 95% of the area between \(\mu - 1.96\sigma/\sqrt{n}\) and \(\mu + 1.96\sigma/\sqrt{n}\) and an area of 0.025 in each tail.]

Now, with μ being unknown and estimated by \(\bar{X}\),
\[ \left(\bar{X} - 1.96\frac{\sigma}{\sqrt{n}},\; \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right) \]
gives a 95% confidence interval estimate of the unknown μ. That means if we repeatedly draw different samples, 95% of the described intervals would contain the parameter μ.
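The "repeated sampling" interpretation can be illustrated with a small simulation. This is only a sketch under stated assumptions: the true mean `MU = 360` is a hypothetical value invented for the simulation (in practice μ is unknown), while σ = 15 and n = 25 match the cola example. The proportion of intervals that contain μ should come out close to 0.95.

```python
import random
from math import sqrt

random.seed(0)  # fixed seed so the run is repeatable

# MU is a hypothetical "true" mean, assumed only for this simulation.
MU, SIGMA, N = 360.0, 15.0, 25
TRIALS = 4000

# Half-width of the 95% interval when sigma is known.
half_width = 1.96 * SIGMA / sqrt(N)

covered = 0
for _ in range(TRIALS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    xbar = sum(sample) / N
    # Count the interval if it contains the true mean.
    if xbar - half_width < MU < xbar + half_width:
        covered += 1

coverage = covered / TRIALS
print(f"Proportion of intervals containing mu: {coverage:.3f}")
```

Any single interval either contains μ or it does not; the 95% describes the long-run behaviour of the procedure, not of one particular interval.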
In the above example, with a single sample with sample mean 362.3, the 95% confidence interval estimate of μ is
\[ \left(362.3 - 1.96\frac{15}{\sqrt{25}},\; 362.3 + 1.96\frac{15}{\sqrt{25}}\right) = (356.42,\; 368.18) \]

In general, a 100(1 − α)% confidence interval estimate of μ is given by
\[ \left(\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\; \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) \]

Commonly used confidence levels include:

Confidence level    α       α/2      z_{α/2}
0.90                0.10    0.05     1.645
0.95                0.05    0.025    1.96
0.98                0.02    0.01     2.33
0.99                0.01    0.005    2.575

Sampling Error and Sample Size

Half the width of the confidence interval, i.e. \(z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\), is the sampling error with 100(1 − α)% confidence. This helps in determining the sample size.

Example
The breaking force of a metal wire is normally distributed with an unknown mean and a standard deviation of 100 pounds. Suppose you want to estimate the population mean breaking force to within ±25 pounds of the true value with 90% confidence. How large should the sample be?

\[ 1.645 \cdot \frac{100}{\sqrt{n}} \le 25 \;\Rightarrow\; \sqrt{n} \ge 6.58 \;\Rightarrow\; n \ge 43.30 \]
44 items should be selected.

Estimation of the Proportion

Besides the population mean, another commonly estimated parameter is the population proportion.

Example
Before the election, an organization has conducted a survey to investigate the support rate of each candidate. What is the point estimate of the population proportion of votes for Johnson if the survey indicates that 245 out of 500 respondents will vote for him? What is the 95% confidence interval estimate of it?

The point estimate of the population proportion is the sample proportion \(\hat{p}\). With
\[ \hat{p} \sim N\left(p, \frac{p(1 - p)}{n}\right), \]
the 100(1 − α)% confidence interval estimate of p is
\[ \left(\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}},\; \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}\right) \]

In the above example,
Point estimate of p = 245/500 = 0.49
95% C.I. of p
\[ = \left(0.49 - 1.96\sqrt{\frac{0.49(0.51)}{500}},\; 0.49 + 1.96\sqrt{\frac{0.49(0.51)}{500}}\right) = (0.4462,\; 0.5338) \]

The entries in Table I are the probabilities that a random variable having the standard normal distribution will take on a value between 0 and z.
They are given by the area of the gray region under the curve in the figure.

TABLE I  NORMAL-CURVE AREAS

z     0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
0.0   0.0000  0.0040  0.0080  0.0120  0.0160  0.0199  0.0239  0.0279  0.0319  0.0359
0.1   0.0398  0.0438  0.0478  0.0517  0.0557  0.0596  0.0636  0.0675  0.0714  0.0753
0.2   0.0793  0.0832  0.0871  0.0910  0.0948  0.0987  0.1026  0.1064  0.1103  0.1141
0.3   0.1179  0.1217  0.1255  0.1293  0.1331  0.1368  0.1406  0.1443  0.1480  0.1517
0.4   0.1554  0.1591  0.1628  0.1664  0.1700  0.1736  0.1772  0.1808  0.1844  0.1879
0.5   0.1915  0.1950  0.1985  0.2019  0.2054  0.2088  0.2123  0.2157  0.2190  0.2224
0.6   0.2257  0.2291  0.2324  0.2357  0.2389  0.2422  0.2454  0.2486  0.2517  0.2549
0.7   0.2580  0.2611  0.2642  0.2673  0.2704  0.2734  0.2764  0.2794  0.2823  0.2852
0.8   0.2881  0.2910  0.2939  0.2967  0.2995  0.3023  0.3051  0.3078  0.3106  0.3133
0.9   0.3159  0.3186  0.3212  0.3238  0.3264  0.3289  0.3315  0.3340  0.3365  0.3389
1.0   0.3413  0.3438  0.3461  0.3485  0.3508  0.3531  0.3554  0.3577  0.3599  0.3621
1.1   0.3643  0.3665  0.3686  0.3708  0.3729  0.3749  0.3770  0.3790  0.3810  0.3830
1.2   0.3849  0.3869  0.3888  0.3907  0.3925  0.3944  0.3962  0.3980  0.3997  0.4015
1.3   0.4032  0.4049  0.4066  0.4082  0.4099  0.4115  0.4131  0.4147  0.4162  0.4177
1.4   0.4192  0.4207  0.4222  0.4236  0.4251  0.4265  0.4279  0.4292  0.4306  0.4319
1.5   0.4332  0.4345  0.4357  0.4370  0.4382  0.4394  0.4406  0.4418  0.4429  0.4441
1.6   0.4452  0.4463  0.4474  0.4484  0.4495  0.4505  0.4515  0.4525  0.4535  0.4545
1.7   0.4554  0.4564  0.4573  0.4582  0.4591  0.4599  0.4608  0.4616  0.4625  0.4633
1.8   0.4641  0.4648  0.4656  0.4664  0.4671  0.4678  0.4685  0.4692  0.4699  0.4706
1.9   0.4713  0.4719  0.4725  0.4732  0.4738  0.4744  0.4750  0.4756  0.4761  0.4767
2.0   0.4772  0.4778  0.4783  0.4788  0.4793  0.4798  0.4803  0.4808  0.4812  0.4817
2.1   0.4821  0.4826  0.4830  0.4834  0.4838  0.4842  0.4846  0.4850  0.4854  0.4857
2.2   0.4861  0.4864  0.4868  0.4871  0.4875  0.4878  0.4881  0.4884  0.4887  0.4890
2.3   0.4893  0.4896  0.4898  0.4901  0.4904  0.4906  0.4909  0.4911  0.4913  0.4916
2.4   0.4918  0.4920  0.4922  0.4925  0.4927  0.4929  0.4931  0.4932  0.4934  0.4936
2.5   0.4938  0.4940  0.4941  0.4943  0.4945  0.4946  0.4948  0.4949  0.4951  0.4952
2.6   0.4953  0.4955  0.4956  0.4957  0.4959  0.4960  0.4961  0.4962  0.4963  0.4964
2.7   0.4965  0.4966  0.4967  0.4968  0.4969  0.4970  0.4971  0.4972  0.4973  0.4974
2.8   0.4974  0.4975  0.4976  0.4977  0.4977  0.4978  0.4979  0.4979  0.4980  0.4981
2.9   0.4981  0.4982  0.4982  0.4983  0.4984  0.4984  0.4985  0.4985  0.4986  0.4986
3.0   0.4987  0.4987  0.4987  0.4988  0.4988  0.4989  0.4989  0.4989  0.4990  0.4990

Also, for z = 4.0, 5.0 and 6.0, the areas are 0.49997, 0.4999997, and 0.499999999.

Associate Degree 2020 – 2021 First Semester
CCMA4001 Quantitative Analysis I

Chapter 9 Testing Hypothesis

Introduction

In the previous chapter, we looked at the type of inferential statistics that estimates an unknown population parameter from the data collected in a sample. In this chapter, we look at another type of inferential statistics, in which an assumption about the population parameter is tested using the information provided by a sample.

Null Hypothesis and Alternative Hypothesis

Very often we do not start from nothing: we have some assumption or theory about the unknown population parameter, and we want to test whether that assumption is correct.

Null Hypothesis, H0, is the statement that contains the assumption we want to test (the equal sign "=" is always included).
Alternative Hypothesis, H1, is the opposite statement of the null hypothesis (the equal sign "=" should not be included).

The alternative hypothesis can be one-sided or two-sided, depending on what we are trying to prove.

Example
Last year, the average amount of the sales invoices of ABC shop was $1000, with a standard deviation of $60.
1. We want to test if the average amount of this year's sales invoices is the same as last year ($1000), OR
2. We want to test if the average amount of this year's sales invoices is more than last year ($1000).

In case 1, H0: μ = $1000 vs. H1: μ ≠ $1000
In case 2, H0: μ = $1000 vs. H1: μ > $1000

Type I and Type II errors

No matter whether it is a two-sided test or a one-sided test, we need to make our decision based on the estimate we compiled from a sample. As sampling error exists, sometimes we may make a wrong decision. There are two types of errors.

                    H0 is true                        H0 is false
Do not reject H0    Correct decision                  Type II error (probability = β)
Reject H0           Type I error (probability = α)    Correct decision

The probability of committing a Type I error, α, also known as the level of significance of the test, is decided before the test is conducted. This level of significance, together with knowledge of the distribution of the test statistic, determines the rejection region of the test.

In statistical hypothesis testing, a type I error is the rejection of a true null hypothesis (also known as a "false positive" finding or conclusion; example: "an innocent person is convicted"), while a type II error is the non-rejection of a false null hypothesis (also known as a "false negative" finding or conclusion; example: "a guilty person is not convicted"). Much of statistical theory revolves around the minimization of one or both of these errors, though the complete elimination of either is statistically impossible.

Test for Hypothesis for the Mean (σ known)

When the variable X is normally distributed, or when the sample size is large enough,
\[ \bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right), \quad\text{where } E(X) = \mu \text{ and } \mathrm{Var}(X) = \sigma^2. \]
The Z-statistic,
\[ z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}, \]
should follow the standard normal distribution when the null hypothesis H0: μ = μ0 is true. We reject the null hypothesis H0: μ = μ0 when the z statistic falls into the rejection region.

Two-sided test
H0: μ = μ0 vs. H1: μ ≠ μ0
Rejection region with probability α: we reject the null hypothesis when z is either too large (\(z > z_{\alpha/2}\)) or too small (\(z < -z_{\alpha/2}\)).

One-sided test
H0: μ = μ0 vs. H1: μ > μ0
Rejection region with probability α: we reject the null hypothesis when z is too large (\(z > z_{\alpha}\)).
0 z Z 3 One-sided test H0: μ =  0 v.s. H1: μ <  0 Rejection region with probability α We reject the null hypothesis when z is too small (z <- z ).  z  0 Z In the above example, if a random sample with sample size n = 30 of this year’s sale invoices is $1030, should the null hypothesis be rejected at 0.05 level of significance? z 1030  1000 60 / 30  2.7386 Case 1: As 2.7386 > 1.96 (z0.025 = 1.96), the null hypothesis is rejected. Case 2: As 2.7386 > 1.645 (z0.05 = 1.645), the null hypothesis is rejected. (Remember you must decide in advance whether you have to do a two–tailed test or a one-tailed test instead of doing both.) 4 Test for Hypothesis for the Proportion  p (1  p)  As pˆ ~ N  p,  , under the null hypothesis H0: p = p0, the Z-statistics n   pˆ  p 0 should follow the standard normal distribution, Z  ~ N 0, 12 . p0 (1  p 0 ) n   We reject the null hypothesis H0: p = p0 when: Test H0: p = p 0 v.s. H1: p ≠ p 0 Rejection Region either z is too large (z > z / 2 ) or z is too small (z < - z / 2 ) H0: p = p 0 v.s. H1: p > p 0 z is too large (z > z ) H0: p = p 0 v.s. H1: p < p 0 z is too large (z <- z ) Example A coin is suspected if it is fair or not. This single coin is tossed 200 times and 120 times resulted at head. Test at 0.10 level of significance if the coin is fair?  H0: p =0.5 v.s. H1: p ≠0.5 0.6  0.5 z  2.8284 0.5(0.5) / 200 As 2.8284 > 1.645, the null hypothesis is rejected. concluded as unfair. The coin is 5 p-value in Statistics When you perform a hypothesis test in statistics, a p-value helps you determine the significance of your results. This is another approach on doing testing hypothesis. How to find the p-value To find the p-value, first we need to find out the test statistics z. Next, we need to find the corresponding level of p from the z value obtained. For this purpose, we need to look at the z table or from the calculator. For example, let us find the value of p corresponding to z ≥ 2.81. 
From the normal table, the probability for z = 2.81 is 0.4975. The corresponding p-value is 0.5 – 0.4975 = 0.0025.
If we use 5% as the significance level, then since the p-value is less than 5%, we reject the null hypothesis.

Example
A coin is suspected of being unfair. The coin is tossed 200 times and lands heads 120 times. Test at the 0.10 level of significance whether the coin is fair. Use the p-value approach to solve it.

H0: p = 0.5 vs. H1: p ≠ 0.5
\[ z = \frac{0.6 - 0.5}{\sqrt{0.5(0.5)/200}} = 2.8284 \]
From the normal table, when z = 2.8284 (taking z = 2.83), the probability is 0.4977.
One-tail p-value = 0.5 – 0.4977 = 0.0023
Since 0.0023 < 0.05 (the α/2 value for a two-sided test; equivalently, the two-sided p-value 2 × 0.0023 = 0.0046 is less than α = 0.10), the null hypothesis is rejected. The coin is concluded to be unfair.

Other resources (section 6.1 – 6.6)
https://www.jbstatistics.com/category/hypothesis-testing/

Associate Degree 2021 – 2022 Second Semester
CCMA4001 Quantitative Analysis I

Chapter 10 Excel Statistical Function

Introduction

Excel provides an extensive range of statistical functions that perform calculations ranging from the basic mean, median and mode to more complex statistical distributions and probability tests.
A selection of the Excel statistical functions is listed in the tables below.
Function Name      Description
COUNT              Returns the number of numerical values in a supplied set of cells or values
MAX                Returns the largest value from a list of supplied numbers
MIN                Returns the smallest value from a list of supplied numbers
AVERAGE            Returns the average of a list of supplied numbers
MEDIAN             Returns the median (the middle value) of a list of supplied numbers
MODE               Returns the mode (the most frequently occurring value) of a list of supplied numbers
PERCENTILE.INC     Returns the K'th percentile of values in a supplied range, where K is in the range 0 - 1 (inclusive)
QUARTILE.INC       Returns the specified quartile of a set of supplied numbers, based on percentile value 0 - 1 (inclusive)

Function Name      Description
STDEV.S            Returns the standard deviation of a supplied set of values (which represent a sample of a population)
STDEV.P            Returns the standard deviation of a supplied set of values (which represent an entire population)
VAR.S              Returns the variance of a supplied set of values (which represent a sample of a population)
VAR.P              Returns the variance of a supplied set of values (which represent an entire population)
SKEW               Returns the skewness of a distribution
FACT               Returns the factorial of a number
PERMUT             Returns the number of permutations for a given number of objects
BINOM.DIST         Returns the individual term binomial distribution probability
POISSON.DIST       Returns the Poisson distribution
GAUSS              Calculates the probability that a member of a standard normal population will fall between the mean and z standard deviations from the mean
NORM.INV           Returns the inverse of the normal cumulative distribution
STANDARDIZE        Returns a normalized value
CONFIDENCE.NORM    Returns the confidence interval for a population mean, using a normal distribution
Z.TEST             Returns the one-tailed probability value of a z-test

Excel has many more functions than can be listed here. You may also visit the following websites to learn more.
The attached files contain exercises on using the Excel functions, together with the answers. Please attempt the exercises and then check your answers.

Useful Websites
https://exceljet.net/excel-functions/excel-fact-function
https://support.microsoft.com/en-us/office/excel-functions-by-category-5f91f4e9-7b42-46d2-9bd1-63f26a86c0eb
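For readers who want to cross-check Excel results outside a spreadsheet, several of the functions in the table can be approximated with the Python standard library. This is a rough sketch, not a replacement for Excel: the helper names (`gauss`, `norm_inv`, `standardize`, `confidence_norm`, `z_test`) are hypothetical names chosen here to mirror the Excel functions, and their behaviour follows the descriptions in the table above (with `z_test` assuming that σ is supplied and that the upper tail is wanted).

```python
from math import sqrt
from statistics import NormalDist, mean

Z = NormalDist()  # standard normal distribution, N(0, 1)

def gauss(z):
    """Area between the mean and z (compare GAUSS and Table I)."""
    return Z.cdf(z) - 0.5

def norm_inv(p, mu, sigma):
    """Value x with P(X <= x) = p for X ~ N(mu, sigma^2) (compare NORM.INV)."""
    return NormalDist(mu, sigma).inv_cdf(p)

def standardize(x, mu, sigma):
    """z-score of x (compare STANDARDIZE)."""
    return (x - mu) / sigma

def confidence_norm(alpha, sigma, n):
    """Half-width z_{alpha/2} * sigma / sqrt(n) (compare CONFIDENCE.NORM)."""
    return Z.inv_cdf(1 - alpha / 2) * sigma / sqrt(n)

def z_test(data, mu0, sigma):
    """Upper one-tailed p-value of a z-test with known sigma (compare Z.TEST)."""
    z = (mean(data) - mu0) / (sigma / sqrt(len(data)))
    return 1 - Z.cdf(z)

print(round(gauss(0.75), 4))                   # 0.2734, the Table I entry for z = 0.75
print(round(confidence_norm(0.05, 15, 25), 2)) # 5.88, the half-width in the cola example
print(round(norm_inv(0.95, 100, 15), 1))       # IQ score exceeded by about 5% of children
```

Small differences from the worked examples are expected, because the notes round z to two or three decimal places (for example 1.645) while the code uses the unrounded value.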
