THE COPPERBELT UNIVERSITY
SCHOOL OF BUSINESS

QUANTITATIVE METHODS
EXTERNAL PROGRAMME

TAILOKA FRANK P (PROF)

CONTENTS

1. Introduction
2. Methods of Organizing and Presenting Data
   • Frequency Tables
   • Bar Charts
   • Pie Charts
3. Descriptive Measures
   • Measures of Central Tendency
   • Measures of Variability
4. Probability
   • Experiment
   • Nature of Probability
   • Addition Rules
   • Conditional Probability and Independence
   • Multiplication Rules
   • Bayes' Theorem
5. Probability Distributions
   • Binomial
   • Poisson
   • Normal
6. Sampling and Sampling Distribution
   • Distribution and Sample Mean
   • Distribution of Proportion
   • Distribution of Sums
7. Estimation
   • Point Estimates
   • Interval Estimates
   • The t-distribution
8. Hypothesis Testing
   • Type I and II Errors
   • Hypothesis Tests
   • Application
9. Analysis of Variance
   • The F-distribution
   • Tests under Analysis of Variance
   • Application
10. Time Series
   • Components of Time Series
   • Isolating Time Series Components
   • Application
11. Index Numbers
   • Construction of Index Numbers
   • Uses of Index Numbers
12. Assignments

CHAPTER 1
INTRODUCTION TO STATISTICAL ANALYSIS

Reading
Newbold 1.1, 1.3, parts of 1.2
Anderson, Sweeney, and Williams Chapter 1
Wonnacott and Wonnacott Chapter 1
James T. McClave and P. George Benson Chapter 1

Introductory Comments
This chapter sets the framework for the book. Read it carefully, because the ideas introduced are basic to this subject and to research methodology.

1. Random Sampling, Deductive and Inductive Statistics

Random Sampling
Only in exceptional circumstances is it possible to consider every member of the population. In most cases only a sample of the population can be considered, and the results obtained from this sample must be generalized to apply to the population.
In order that these generalizations should be accurate, the sample must be random: every possible sample has an equal chance of selection, and the choice of a member of the sample must not be influenced by previous selections. This is simple random sampling.

Example 1
Suppose that a population consists of six measurements: 1, 2, 3, 4, 5 and 7. List all possible different samples of two measurements that could be selected from the population. Give the probability associated with each sample when a random sample of n = 2 measurements is selected from the population.

Solution
All possible samples are listed below.

Sample   Measurements
  1        1, 2
  2        1, 3
  3        1, 4
  4        1, 5
  5        1, 7
  6        2, 3
  7        2, 4
  8        2, 5
  9        2, 7
 10        3, 4
 11        3, 5
 12        3, 7
 13        4, 5
 14        4, 7
 15        5, 7

Now suppose that we draw a single sample of n = 2 measurements from the 15 possible samples of two measurements. The sample selected is called a random sample if every sample had an equal probability (1/15) of being selected.

It is rather unlikely that we would ever achieve a truly random sample, because the probabilities of selection will not always be exactly equal. But we do the best we can. One of the simplest and most reliable ways to select a random sample of n measurements from a population is to use a table of random numbers (see Appendix VII). Random number tables are constructed in such a way that, no matter where you start in the table and no matter in what direction you move, the digits occur randomly and with equal probability. Thus if we wished to choose a random sample of n = 10 measurements from a population containing 100 measurements, we could label the measurements in the population from 00 to 99 (or 1 to 100). Then, referring to Appendix VII and choosing a random starting point, the next 10 two-digit numbers going across the page would indicate the labels of the particular measurements to be included in the random sample. Similarly, by moving up or down the page, we would also obtain a random sample.
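The enumeration in Example 1 and the idea of drawing labels at random can be sketched in Python. This is only an illustration: the pseudo-random generator stands in for the printed random number table, it draws labels without repetition, and the seed value is an arbitrary assumption chosen so that the run is repeatable.

```python
import itertools
import random

population = [1, 2, 3, 4, 5, 7]

# Every distinct unordered sample of n = 2 measurements from the population.
samples = list(itertools.combinations(population, 2))

# Under simple random sampling each sample is equally likely.
probability = 1 / len(samples)

for number, sample in enumerate(samples, start=1):
    print(number, sample, f"{probability:.4f}")

# Stand-in for the random number table: draw 10 distinct labels from 00 to 99.
rng = random.Random(2024)  # arbitrary seed, assumed only for repeatability
labels = rng.sample(range(100), 10)
print(sorted(labels))
```

The loop prints the same 15 samples as the table in Example 1, each with probability 1/15.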
Example 2
A small community consists of 850 families. We wish to obtain a random sample of 20 families to ascertain public acceptance of a wage and price freeze. Refer to Appendix VII to determine which families should be sampled.

Solution
Assuming that a list of all families in the community is available (such as a telephone directory), we could label the families from 0 to 849 (or, equivalently, from 1 to 850). Then, referring to the Appendix, we choose a starting point. Suppose we have decided to start at line 1, column 4. Going down the page and choosing the first 20 three-digit numbers between 000 and 849 from Table B, we have

290  207  424  367  219
065  302  541  078  466
454  607  083  642  219
254  068  462  160  823

These 20 numbers identify the 20 families that are to be included in our sample.

Deductive and Inductive Statistics
The reasoning that is used in statistics hinges on understanding two types of logic, namely deductive and inductive logic. The type of logic that reasons from the particular (sample) to the general (population) is known as inductive logic, while the type that reasons from the general to the particular is known as deductive logic.

Learning Objectives
After working through this chapter, you should be able to:
• Explain what random sampling is
• Explain the difference between a population and a sample

CHAPTER 2
METHODS OF ORGANISING AND PRESENTING DATA

Reading
Newbold Chapter 2
James T. McClave and P. George Benson Chapter 2
Tailoka Frank P. Chapter 3

Introductory Comments
This chapter contains themes to do with the understanding of data. We build graphical representations from the data, which allow one to see its most important characteristics easily. Most of the graphical representations are very tedious to construct without the use of a computer. However, one understands much more if one tries a few with pencil and paper.

Graphical Representations of Data
Types of business data; methods of representation of qualitative data; cumulative frequency distribution.

Types of business data
Although the number of business phenomena that can be measured is almost limitless, business data can generally be classified as one of two types: quantitative or qualitative.

Quantitative data are observations that are measured on a numerical scale. Examples of quantitative business data are:
i. The monthly unemployment percentage.
ii. Last year's sales for selected firms.
iii. The number of women executives in an industry.

Qualitative data are observations that are not measurable, in the sense that height is measured, or countable, as people entering a store are counted. Such characteristics can only be classified. Examples of qualitative business data are:
i) The political party affiliations of fifty randomly selected business executives. Each executive would have one and only one political party affiliation.
ii) The brand of petrol last purchased by seventy-four randomly selected car owners. Again, each measurement would fall into one and only one category.

Notice that each of the examples has non-numerical, or qualitative, measurements.

Graphical methods for describing qualitative data

(a) The Bar Graph
For example, suppose a women's clothing store located in the downtown area of a large city wants to open a branch in the suburbs. To obtain some information about the geographical distribution of its present customers, the store manager conducts a survey in which each customer is asked to identify her place of residence with regard to the city's four quadrants: Northwest (NW), Northeast (NE), Southwest (SW) or Southeast (SE). Out-of-town customers are excluded from the survey. The responses of n = 30 randomly selected resident customers might appear as in Table 1.1 (note that the symbol n is used here and throughout this course to represent the sample size, i.e. the number of measurements in a sample).
You can see that each of the thirty measurements falls in one and only one of the four possible categories representing the four quadrants of the city.

Table 1.1. Customer Residence Survey: n = 30

Customer  Residence    Customer  Residence    Customer  Residence
   1        NW            11       NW            21       NE
   2        SE            12       SE            22       NW
   3        SE            13       SW            23       SW
   4        NW            14       NW            24       SE
   5        SW            15       SW            25       SW
   6        NW            16       NE            26       NW
   7        NE            17       NE            27       NW
   8        SW            18       NW            28       SE
   9        NW            19       NW            29       NE
  10        SE            20       SW            30       SW

A natural and useful technique for summarizing qualitative data is to tabulate the frequency or relative frequency of each category.

Definition
The frequency for a category is the total number of measurements that fall in the category. The frequency for a particular category, say category i, will be denoted by the symbol $f_i$. The relative frequency for a category is the frequency of that category divided by the total number of measurements; that is, the relative frequency for category i is

$$\text{Relative frequency} = \frac{f_i}{n}$$

where n = total number of measurements in the sample and $f_i$ = frequency for the i-th category.

The frequency for a category is the total number of measurements in that category, whereas the relative frequency for a category is the proportion of measurements in the category. Table 1.2 shows the frequency and relative frequency for the customer residences listed in Table 1.1. Note that the sum of the frequencies should always equal the total number of measurements in the sample, and the sum of the relative frequencies should always equal 1 (except for rounding errors), as in Table 1.2.

Table 1.2

Category   Frequency   Relative Frequency
NE             5       5/30 = .167
NW            11       11/30 = .367
SE             6       6/30 = .200
SW             8       8/30 = .267
Total         30       1

A common means of graphically presenting the frequencies or relative frequencies for qualitative data is the bar chart. For this type of chart, the frequencies (or relative frequencies) are represented by bars, one bar for each category. The height of the bar for a given category is proportional to the category frequency (or relative frequency).
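As a rough illustration, the tally behind Table 1.2 and a crude text version of a bar chart can be sketched in Python. The responses are those of Table 1.1; the variable names and the use of "#" characters for the bars are choices made for this sketch only.

```python
from collections import Counter

# Responses from Table 1.1, customers 1 to 30 in order.
residences = (
    "NW SE SE NW SW NW NE SW NW SE "
    "NW SE SW NW SW NE NE NW NW SW "
    "NE NW SW SE SW NW NW SE NE SW"
).split()

n = len(residences)              # sample size
frequency = Counter(residences)  # f_i for each category

for category in sorted(frequency):
    f_i = frequency[category]
    relative = f_i / n           # relative frequency f_i / n
    bar = "#" * f_i              # bar length proportional to the frequency
    print(f"{category}  {f_i:2d}  {relative:.3f}  {bar}")
```

The printed frequencies and relative frequencies can be compared line by line with Table 1.2.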
Usually the bars are placed in a vertical position with the base of the bar on the horizontal axis of the graph. The order of the bars on the horizontal axis is unimportant. Both a frequency bar chart and a relative frequency bar chart for the customer residence example are shown in Figure 1.1.

[Figure 1.1: (a) A frequency bar chart and (b) a relative frequency bar chart of customers by residential quadrant (NE, NW, SE, SW).]

(b) The Pie Chart
The second method of describing qualitative data sets is the pie chart. This is often used in newspaper and magazine articles to depict budgets and other economic information. A complete circle (the pie) represents the total number of measurements. This is partitioned into a number of slices, with one slice for each category. For example, since a complete circle spans 360°, if the relative frequency for a category is .30, the slice assigned to that category is 30% of 360°, or (.30)(360) = 108°.

[Figure 1.2: The portion of a pie chart corresponding to a relative frequency of .30 (a slice of 108°).]

Graphical Methods for Describing Quantitative Data

The Frequency Histogram and Polygon
The histogram (often called a frequency distribution) is the most popular graphical technique for depicting quantitative data. To introduce the histogram we will use thirty companies selected randomly from the 1980 Financial Magazine (the top 500 companies in sales for calendar year 1979). The variable X we will be interested in is the earnings per share (E/S) for these thirty companies. The earnings per share is computed by dividing the year's net profit by the total number of shares of common stock outstanding. This figure is of interest to the economic community because it reflects the economic health of the company. The earnings per share figures for the thirty companies are shown (to the nearest ngwee) in Table 1.3.
Table 1.3. Earnings per share (E/S) for thirty companies

Company   E/S     Company   E/S     Company   E/S
   1      1.85      11      2.80      21      2.75
   2      3.42      12      3.46      22      6.58
   3      9.11      13      8.32      23      3.54
   4      1.96      14      4.62      24      4.65
   5      6.48      15      3.27      25      0.75
   6      5.72      16      1.35      26      2.01
   7      1.72      17      3.28      27      5.36
   8      8.56      18      3.75      28      4.40
   9      0.72      19      5.23      29      6.49
  10      6.28      20      2.92      30      1.12

How to construct a histogram
1. Arrange the data in increasing order, from the smallest to the largest measurement.
2. Divide the interval from the smallest to the largest measurement into between five and twenty equal sub-intervals (measurement classes), making sure that:
   a) Each measurement falls into one and only one measurement class.
   b) No measurement falls on a measurement class boundary.
   Use a small number of measurement classes if you have a small amount of data; use a larger number of classes for a large amount of data.
3. Compute the frequency (or relative frequency) of measurements in each measurement class.
4. Using a vertical axis of about three-fourths the length of the horizontal axis, plot each frequency (or relative frequency) as a rectangle over the corresponding measurement class.

Since the number of measurements, n = 30, is not large, we will use six classes to span the distance between the smallest measurement, 0.72, and the largest measurement, 9.11. This distance divided by 6 is equal to

$$\frac{\text{Largest measurement} - \text{Smallest measurement}}{\text{Number of intervals}} = \frac{9.11 - 0.72}{6} \approx 1.4$$

By locating the lower boundary of the first class interval at 0.715 (slightly below the smallest measurement) and adding 1.4, we find the upper boundary to be 2.115. Adding 1.4 again, we find the upper boundary of the second class to be 3.515. Continuing this process, we obtain the six class intervals shown in Table 1.4. Note that each boundary falls on a 0.005 value (one more decimal place than the measurements), which guarantees that no measurement will fall on a class boundary.
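The boundary construction just described, together with the class tally it leads to, can be sketched as follows. The data are the earnings per share figures of Table 1.3; the variable names are hypothetical, and the rounding calls are there only to suppress floating-point noise in the boundaries.

```python
# Earnings per share for the thirty companies (Table 1.3).
eps = [
    1.85, 3.42, 9.11, 1.96, 6.48, 5.72, 1.72, 8.56, 0.72, 6.28,
    2.80, 3.46, 8.32, 4.62, 3.27, 1.35, 3.28, 3.75, 5.23, 2.92,
    2.75, 6.58, 3.54, 4.65, 0.75, 2.01, 5.36, 4.40, 6.49, 1.12,
]

num_classes = 6
# Class width: (largest - smallest) / number of classes, rounded to 1.4.
width = round((max(eps) - min(eps)) / num_classes, 1)
lower = 0.715  # slightly below the smallest measurement, 0.72

# Seven boundaries define the six classes.
boundaries = [round(lower + k * width, 3) for k in range(num_classes + 1)]

# No measurement lies on a boundary, so strict inequalities are safe here.
frequencies = [
    sum(1 for x in eps if low < x < high)
    for low, high in zip(boundaries, boundaries[1:])
]
relative = [f / len(eps) for f in frequencies]

for low, high, f, r in zip(boundaries, boundaries[1:], frequencies, relative):
    print(f"{low:.3f} - {high:.3f}  {f:2d}  {r:.3f}")
```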
The next step is to find the class frequencies and calculate the class relative frequencies.

Table 1.4

Class   Measurement Class   Class Frequency   Class Relative Frequency
  1     0.715 – 2.115              8          8/30 = .267
  2     2.115 – 3.515              7          7/30 = .233
  3     3.515 – 4.915              5          5/30 = .167
  4     4.915 – 6.315              4          4/30 = .133
  5     6.315 – 7.715              3          3/30 = .100
  6     7.715 – 9.115              3          3/30 = .100
Total                             30          1.00

Definition
The class frequency for a given class, say class i, is equal to the total number of measurements that fall in that class. The class frequency for class i is denoted by the symbol $f_i$.

Definition
The class relative frequency for a given class, say class i, is equal to the class frequency divided by the total number n of measurements, i.e.

$$\text{Relative frequency for class } i = \frac{f_i}{n}$$

[Figure: (a) Frequency histogram and (b) relative frequency histogram of earnings per share, with class boundaries 0.715, 2.115, 3.515, 4.915, 6.315, 7.715 and 9.115.]

Cumulative Frequency Distribution
It is often useful to know the number or the proportion of the total number of measurements that are less than or equal to those contained in a particular class. These quantities are called the class cumulative frequency and the class cumulative relative frequency respectively.

For example, if the classes are numbered from the smallest to the largest values of x (1, 2, 3, 4, ...), then the cumulative frequency for the third class would equal the sum of the class frequencies corresponding to classes 1, 2 and 3:

$$\text{Cumulative frequency for class 3} = f_1 + f_2 + f_3$$

Similarly,

$$\text{Cumulative relative frequency for class 3} = \frac{f_1 + f_2 + f_3}{n}$$

where n is the total number of measurements in the sample.

Cumulative frequencies and cumulative relative frequencies for the earnings per share data

Class   Measurement      Class       Cumulative        Class Relative   Cumulative Relative
No.     Class            Frequency   Frequency         Frequency        Frequency
 1      0.715 – 2.115       8         8                8/30 = .267      8/30 = .267
 2      2.115 – 3.515       7        (8 + 7) = 15      7/30 = .233      15/30 = .500
 3      3.515 – 4.915       5        (15 + 5) = 20     5/30 = .167      20/30 = .667
 4      4.915 – 6.315       4        (20 + 4) = 24     4/30 = .133      24/30 = .800
 5      6.315 – 7.715       3        (24 + 3) = 27     3/30 = .100      27/30 = .900
 6      7.715 – 9.115       3        (27 + 3) = 30     3/30 = .100      30/30 = 1.00
Total                      30

[Figure: Cumulative relative frequency distribution for the earnings per share data. Horizontal axis: earnings per share (class boundaries 0.715 to 9.115); vertical axis: cumulative relative frequency (0 to 1.0).]

Learning Objectives
After working through this chapter you should be able to:
• Draw a pie chart, bar chart and histogram, and also construct frequency tables and relative frequencies.
• Interpret the diagrams. You will understand the importance of captions, axis labels and graduation of axes.

CHAPTER 3
DESCRIPTIVE MEASURES

Reading
Newbold Chapter 2
Wonnacott and Wonnacott Chapter 2
Tailoka Frank P. Chapter 4
James T. McClave, Lawrence L. Lapin and P. George Benson Chapter 3

Introductory Comments
This chapter contains themes which allow one to see the most important characteristics of data easily. The idea is to find simple numbers, like the mean and the variance, which will summarize those characteristics.

3. Numerical Description of Data

The Mode: A Measure of Central Tendency

Definition
The mode is the measurement that occurs with the greatest frequency in the data set.

Because it emphasizes data concentration, the mode has application in marketing as well as in the description of large data sets collected by state and federal agencies. Unless the data set is rather large, the mode may not be very meaningful. For example, consider the earnings per share measurements for the thirty companies we used in the previous chapter. If you were to re-examine these data, you would find that none of the thirty measurements is duplicated in this sample.
Thus, strictly speaking, all thirty measurements are modes for this sample. Obviously, this information is of no practical use for data description. We can calculate a more meaningful mode by constructing a relative frequency histogram for the data. The interval containing the most measurements is called the modal class, and the mode is taken to be the midpoint of this class interval.

The modal class, the one corresponding to the interval 0.715 – 2.115, lies to the left side of the distribution. The mode is the midpoint of this interval; that is,

$$\text{Mode} = \frac{0.715 + 2.115}{2} = 1.415$$

In the sense that the mode measures data concentration, it provides a measure of central tendency of the data.

The Arithmetic Mean: A Measure of Central Tendency
The most popular and best understood measure of central tendency for a quantitative data set is the arithmetic mean (or simply the mean).

Definition
The mean of a set of quantitative data is equal to the sum of the measurements divided by the number of measurements contained in the data set.

The mean of a sample is denoted by $\bar{x}$ (read "x bar"), and the formula for its calculation is

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

Example 1
Calculate the mean of the following five sample measurements: 5, 3, 8, 5, 6.

Solution
Using the definition of the sample mean and the shorthand notation, we find

$$\bar{x} = \frac{\sum_{i=1}^{5} x_i}{5} = \frac{5 + 3 + 8 + 5 + 6}{5} = \frac{27}{5} = 5.4$$

The mean of this sample is 5.4.

The sample mean will play an important role in accomplishing our objective of making inferences about populations based on sample information. For this reason it is important to use a different symbol when we want to discuss the mean of a population of measurements, i.e. the mean of the entire set of measurements in which we are interested. We use the Greek letter $\mu$ ("mu") for the population mean.

The Median: Another Measure of Central Tendency
The median of a data set is the number such that half the measurements fall below the median and half fall above.
The median is of most value in describing large data sets. If the data set is characterized by a relative frequency histogram, the median is the point on the x-axis such that half the area under the histogram lies above the median and half lies below. For a small, or even a large but finite, number of measurements, there may be many numbers that satisfy this property. For this reason, we adopt the following convention for calculating the median of a data set.

Calculating a median
1. If the number n of measurements in a data set is odd, the median is the middle number when the measurements are arranged in ascending (or descending) order.
2. If the number n of measurements is even, the median is the mean of the two middle measurements when the measurements are arranged in ascending (or descending) order.

Example 2
Consider the following sample of n = 7 measurements:

5, 7, 4, 5, 20, 6, 2

a) Calculate the median of this sample.
b) Eliminate the last measurement (the 2) and calculate the median of the remaining n = 6 measurements.

Solution
a) The seven measurements in the sample are first arranged in ascending order:

2, 4, 5, 5, 6, 7, 20

Since the number of measurements is odd, the median is the middle measurement. Thus, the median of this sample is 5.

b) After removing the 2 from the set of measurements, we arrange the sample measurements in ascending order as follows:

4, 5, 5, 6, 7, 20

Now the number of measurements is even, and so we average the middle two measurements. The median is (5 + 6)/2 = 5.5.

Comparing the mean and the median

1. If the median is less than the mean, the data set is skewed to the right.

[Figure: Relative frequency distribution showing rightward skewness; the median lies below the mean. Horizontal axis: measurement units.]

The degree of skewness can be measured by

$$\text{Skewness} = \frac{\text{Mean} - \text{Mode}}{\text{Standard deviation}} \approx \frac{3(\text{Mean} - \text{Median})}{\text{Standard deviation}}$$

2. The median will equal the mean when the data set is symmetric.

[Figure: Symmetric relative frequency distribution; the median and the mean coincide. Horizontal axis: measurement units.]

3. If the median is greater than the mean, the data set is skewed to the left.
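The median rule and the mean-median comparison above can be checked with a short Python sketch using the data of Example 2. It relies on the standard library's `statistics` module; the variable names and the wording of the three shape labels are choices made for this illustration.

```python
import statistics

sample = [5, 7, 4, 5, 20, 6, 2]

# n = 7 (odd): the median is the middle value of the sorted data.
median_odd = statistics.median(sample)

# Drop the last measurement (the 2); n = 6 (even): average the two middle values.
median_even = statistics.median(sample[:-1])

mean = statistics.mean(sample)

# Compare the median with the mean to judge the direction of skewness.
if median_odd < mean:
    shape = "skewed to the right"
elif median_odd > mean:
    shape = "skewed to the left"
else:
    shape = "symmetric"

print(median_odd, median_even, mean, shape)
```

Here the single large measurement, 20, pulls the mean (7) above the median (5), which is the rightward-skewness case in rule 1.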
Measures of Variation

The Range: A Measure of Variability

Definition
The range of a data set is equal to the largest measurement minus the smallest measurement.

When dealing with grouped data, there are two procedures that may be adopted for determining the range:
1. Range = class mark of highest class – class mark of lowest class.
2. Range = upper class boundary of highest class – lower class boundary of lowest class.

Variance and Standard Deviation
The sample variance for a sample of n measurements is equal to the sum of the squared distances from the mean divided by (n – 1). In symbols, using $S^2$ to represent the sample variance,

$$S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

The second step in finding a meaningful measure of data variability is to calculate the standard deviation of the data set. The sample standard deviation, S, is defined as the positive square root of the sample variance $S^2$; thus,

$$S = \sqrt{S^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$

The corresponding quantity, the population standard deviation, measures the variability of the measurements in the population and is denoted by $\sigma$ ("sigma"). The population variance will therefore be denoted by $\sigma^2$.

Example 3
Calculate the standard deviation of the following sample: 2, 3, 3, 3, 4.

Solution
For this set of data, $\bar{x} = 3$. Then

$$S = \sqrt{\frac{(2-3)^2 + (3-3)^2 + (3-3)^2 + (3-3)^2 + (4-3)^2}{5 - 1}} = \sqrt{\frac{2}{4}} = \sqrt{0.5} = 0.71$$

Shortcut formula for sample variance

$$S^2 = \frac{(\text{sum of squares of sample measurements}) - \dfrac{(\text{sum of sample measurements})^2}{n}}{n - 1} = \frac{\sum_{i=1}^{n} x_i^2 - \dfrac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n - 1}$$

Example 4
Use the shortcut formula to compute the variances of these two samples of five measurements each.

Sample 1: 1, 2, 3, 4, 5
Sample 2: 2, 3, 3, 3, 4

Solution
We first work with sample 1.
The quantities needed are:

$$\sum_{i=1}^{5} x_i = 1 + 2 + 3 + 4 + 5 = 15$$

and

$$\sum_{i=1}^{5} x_i^2 = 1^2 + 2^2 + 3^2 + 4^2 + 5^2 = 1 + 4 + 9 + 16 + 25 = 55$$

Then

$$S^2 = \frac{\sum_{i=1}^{5} x_i^2 - \dfrac{\left(\sum_{i=1}^{5} x_i\right)^2}{5}}{5 - 1} = \frac{55 - \dfrac{(15)^2}{5}}{4} = \frac{55 - 45}{4} = \frac{10}{4} = 2.5$$

Similarly, for sample 2 we get

$$\sum_{i=1}^{5} x_i = 2 + 3 + 3 + 3 + 4 = 15$$

and

$$\sum_{i=1}^{5} x_i^2 = 2^2 + 3^2 + 3^2 + 3^2 + 4^2 = 4 + 9 + 9 + 9 + 16 = 47$$

Then the variance for sample 2 is

$$S^2 = \frac{\sum_{i=1}^{5} x_i^2 - \dfrac{\left(\sum_{i=1}^{5} x_i\right)^2}{5}}{5 - 1} = \frac{47 - \dfrac{(15)^2}{5}}{4} = \frac{47 - 45}{4} = \frac{2}{4} = 0.5$$

Example 5
The earnings per share measurements for thirty companies selected randomly from the 1980 Financial/Daily Mail are listed here. Calculate the sample variance, $S^2$, and the standard deviation, S, for these measurements.

1.85  3.42  9.11  1.96  6.48  5.72  1.72  8.56  0.72  6.28
2.80  3.46  8.32  4.62  3.27  1.35  3.28  3.75  5.23  2.92
2.75  6.58  3.54  4.65  0.75  2.01  5.36  4.40  6.49  1.12

Solution
The calculation of the sample variance, $S^2$, would be very tedious for this example if we tried to use the formula

$$S^2 = \frac{\sum_{i=1}^{30} (x_i - \bar{x})^2}{30 - 1}$$

because it would be necessary to compute all thirty squared distances from the mean. For the shortcut formula, however, we need only compute

$$\sum_{i=1}^{30} x_i = 1.85 + 3.42 + \dots + 1.12 = 122.47$$

and

$$\sum_{i=1}^{30} x_i^2 = (1.85)^2 + (3.42)^2 + \dots + (1.12)^2 = 657.5239$$

Then

$$S^2 = \frac{\sum_{i=1}^{30} x_i^2 - \dfrac{\left(\sum_{i=1}^{30} x_i\right)^2}{30}}{30 - 1} = \frac{657.5239 - \dfrac{(122.47)^2}{30}}{29} = 5.4331$$

Notice that we retained four decimal places in the calculation of $S^2$ to reduce rounding errors, even though the original data were accurate to only two decimal places. The standard deviation is

$$S = \sqrt{S^2} = \sqrt{5.4331} = 2.33$$

Interpreting the Standard Deviation
If we are comparing the variability of two samples selected from a population, the sample with the larger standard deviation is the more variable of the two. Thus, we know how to interpret the standard deviation on a relative or comparative basis, but we have not explained how it provides a measure of variability for a single sample.
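The shortcut formula, and the comparison of two samples by their standard deviations, can be sketched in Python as below. The function name `shortcut_variance` is hypothetical and introduced only for this illustration; the data are those of Examples 4 and 5.

```python
import math

def shortcut_variance(data):
    """Sample variance via the shortcut formula (hypothetical helper)."""
    n = len(data)
    total = sum(data)                        # sum of the measurements
    total_sq = sum(x * x for x in data)      # sum of the squared measurements
    return (total_sq - total ** 2 / n) / (n - 1)

sample_1 = [1, 2, 3, 4, 5]
sample_2 = [2, 3, 3, 3, 4]
var_1 = shortcut_variance(sample_1)
var_2 = shortcut_variance(sample_2)

# The sample with the larger standard deviation is the more variable one.
more_variable = "sample 1" if math.sqrt(var_1) > math.sqrt(var_2) else "sample 2"

# Earnings per share data of Example 5.
eps = [
    1.85, 3.42, 9.11, 1.96, 6.48, 5.72, 1.72, 8.56, 0.72, 6.28,
    2.80, 3.46, 8.32, 4.62, 3.27, 1.35, 3.28, 3.75, 5.23, 2.92,
    2.75, 6.58, 3.54, 4.65, 0.75, 2.01, 5.36, 4.40, 6.49, 1.12,
]
var_eps = shortcut_variance(eps)
sd_eps = math.sqrt(var_eps)

print(var_1, var_2, more_variable)
print(round(var_eps, 4), round(sd_eps, 2))
```

Both samples in Example 4 have the same mean (3), so the difference in their variances reflects spread alone.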
One way to interpret the standard deviation as a measure of variability of a data set would be to answer questions such as the following. How many measurements are within 1 standard deviation of the mean? How many measurements are within 2 standard deviations of the mean? For a specific data set, we can answer the questions by counting the number of measurements in each of the intervals. However, if we are interested in obtaining a general answer to these questions, the problem is more difficult.

There are two sets of guidelines to help answer the questions of how many measurements fall within 1, 2 and 3 standard deviations of the mean. The first set, which applies to any sample, is derived from a theorem proved by the Russian mathematician Chebyshev. The second set, the Empirical Rule, is based on empirical evidence that has accumulated over time and applies to samples that possess mound-shaped frequency distributions: those that are approximately symmetric, with a clustering of measurements about the midpoint of the distribution (the mean, median and mode should all be about the same), and that tail off as we move away from the center of the histogram.

Aids to the Interpretation of a Standard Deviation

1. A rule (from Chebyshev's theorem) that applies to any sample of measurements, regardless of the shape of the frequency distribution:
   a) It is possible that none of the measurements will fall within 1 standard deviation of the mean ($\bar{x} - S$ to $\bar{x} + S$).
   b) At least 3/4 of the measurements will fall within 2 standard deviations of the mean ($\bar{x} - 2S$ to $\bar{x} + 2S$).
   c) At least 8/9 of the measurements will fall within 3 standard deviations of the mean ($\bar{x} - 3S$ to $\bar{x} + 3S$).

2. A rule of thumb, called the Empirical Rule, that applies to samples with frequency distributions that are mound-shaped:
   a) Approximately 68% of the measurements will fall within 1 standard deviation of the mean ($\bar{x} - S$ to $\bar{x} + S$).
   b) Approximately 95% of the measurements will fall within 2 standard deviations of the mean ($\bar{x} - 2S$ to $\bar{x} + 2S$).
   c) Essentially all the measurements will fall within 3 standard deviations of the mean ($\bar{x} - 3S$ to $\bar{x} + 3S$).

Example 6
Refer to the data for earnings per share for thirty companies selected randomly from the 1980 Financial/Daily Mail, for which $\bar{x} = 4.08$ and $S = 2.33$. Calculate the fraction of the thirty measurements that lie within the intervals $\bar{x} \pm S$, $\bar{x} \pm 2S$ and $\bar{x} \pm 3S$, and compare the results with those of the Chebyshev and Empirical rules.

Solution

$$(\bar{x} - S,\ \bar{x} + S) = (4.08 - 2.33,\ 4.08 + 2.33) = (1.75,\ 6.41)$$

A check of the measurements shows that 19 of the 30 measurements, i.e. approximately 63%, are within 1 standard deviation of the mean. The interval

$$(\bar{x} - 2S,\ \bar{x} + 2S) = (4.08 - 4.66,\ 4.08 + 4.66) = (-0.58,\ 8.74)$$

contains 29 measurements, or approximately 97% of the n = 30 measurements. Finally, the 3 standard deviation interval around $\bar{x}$,

$$(\bar{x} - 3S,\ \bar{x} + 3S) = (4.08 - 6.99,\ 4.08 + 6.99) = (-2.91,\ 11.07)$$

contains all the measurements.

These 1, 2 and 3 standard deviation percentages (63, 97 and 100) agree fairly well with the approximations of 68%, 95% and 100% given by the Empirical Rule for mound-shaped distributions. They are also consistent with Chebyshev's rule, which requires at least 75% within 2 standard deviations and at least 89% within 3.

Example 7
The aids for interpreting the value of a standard deviation can be put to immediate practical use as a check on the calculation of the standard deviation. Suppose you have a data set for which the smallest measurement is 20 and the largest is 80. You have calculated the standard deviation of the data set to be S = 190. How can you use the Chebyshev or Empirical rule to provide a rough check on your calculated value of S?

Solution
The larger the number of measurements in a data set, the greater will be the tendency for very large or very small measurements (extreme values) to appear in the data set.
But from the rules, you know that most of the measurements (approximately 95% if the distribution is mound-shaped) will be within 2 standard deviations of the mean, and, regardless of how many measurements are in the data set, almost all of them will fall within 3 standard deviations of the mean. Consequently we would expect the range to be between 4 and 6 standard deviations, i.e. between 4S and 6S.

Range = largest measurement – smallest measurement = 80 – 20 = 60.

[Figure: The relation between the range and the standard deviation; the interval from $\bar{x} - 2S$ to $\bar{x} + 2S$ spans a range of about 4S.]

Then if we let the range equal 6S, we obtain

$$S = \frac{\text{Range}}{6} = \frac{60}{6} = 10$$

Or, if we let the range equal 4S, we obtain a larger (and more conservative) value for S, namely

$$S = \frac{\text{Range}}{4} = \frac{60}{4} = 15$$

Now you can see that it does not make much difference whether you let the range equal 4S (which is more realistic for most data sets) or 6S (which is reasonable for large data sets). It is clear that your calculated value, S = 190, is too large, and you should check your calculations.

Calculating a Mean and Standard Deviation from Grouped Data
If your data have been grouped in classes of equal width and arranged in a frequency table, you can use the following formulas to calculate $\bar{x}$, $S^2$ and S:

$x_i$ = midpoint of the i-th class
$f_i$ = frequency of the i-th class
K = number of classes

$$\bar{x} = \frac{\sum_{i=1}^{K} x_i f_i}{n}$$

$$S^2 = \frac{\sum_{i=1}^{K} x_i^2 f_i - \dfrac{\left(\sum_{i=1}^{K} x_i f_i\right)^2}{n}}{n - 1}$$

$$S = \sqrt{S^2}$$

Example 8
Compute the mean and standard deviation for the earnings per share data using the grouping shown in the frequency table, Table 1.4.

Solution
The six class intervals, midpoints and frequencies are shown in the accompanying table.

Class             Class Midpoint $x_i$   Class Frequency $f_i$
0.715 – 2.115          1.415                    8
2.115 – 3.515          2.815                    7
3.515 – 4.915          4.215                    5
4.915 – 6.315          5.615                    4
6.315 – 7.715          7.015                    3
7.715 – 9.115          8.415                    3
                                          n = $\sum f_i$ = 30

$$\bar{x} = \frac{\sum_{i=1}^{K} x_i f_i}{n} = \frac{(1.415)(8) + (2.815)(7) + (4.215)(5) + \dots + (8.415)(3)}{30} = \frac{120.85}{30} = 4.03$$

$$S^2 = \frac{\sum_{i=1}^{K} x_i^2 f_i - \dfrac{\left(\sum_{i=1}^{K} x_i f_i\right)^2}{n}}{n - 1}$$

We found $\sum_{i=1}^{K} x_i f_i = 120.85$ when we calculated $\bar{x}$; therefore

$$S^2 = \frac{\left[(1.415)^2(8) + (2.815)^2(7) + \dots + (8.415)^2(3)\right] - \dfrac{(120.85)^2}{30}}{30 - 1} = \frac{646.49875 - 486.82408}{29} = 5.5060$$

$$S = \sqrt{5.5060} = 2.35$$

You will notice that the values of $\bar{x}$, $S^2$ and S from the formulas for grouped data usually do not agree exactly with those obtained from the raw data ($\bar{x} = 4.08$ and $S = 2.33$). This is because we have substituted the value of the class midpoint for each value of x in a class interval. Only when every value of x in each class is equal to its respective class midpoint will the formulas for grouped and for ungrouped data give exactly the same answers for $\bar{x}$, $S^2$ and S. Otherwise, the formulas for grouped data will give only approximations to these numerical descriptive measures.

Measures of Relative Standing
Descriptive measures of the relationship of a measurement to the rest of the data are called measures of relative standing. One measure of the relative standing of a particular measurement is its percentile ranking.

Definition
Let $x_1, x_2, \dots, x_n$ be a set of n measurements arranged in increasing (or decreasing) order. The p-th percentile is a number x such that p% of the measurements fall below the p-th percentile and (100 – p)% fall above it.

For example, if oil company A reports that its yearly sales are in the 90th percentile of all companies in the industry, the implication is that 90% of all oil companies have yearly sales less than A's, and only 10% have yearly sales exceeding company A's.

[Figure: Relative frequency distribution of yearly sales; an area of .90 lies below company A's sales and .10 above.]

Another measure of relative standing in popular use is the Z-score. The Z-score makes use of the mean and standard deviation of the data set in order to specify the location of a measurement.
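Both measures of relative standing can be illustrated on the earnings per share data. This is a rough sketch: the choice of the measurement 8.32 is an arbitrary assumption for illustration, and the Z-score is computed with the usual formula, the measurement minus the sample mean, divided by the sample standard deviation, which is the definition stated next.

```python
import statistics

# Earnings per share for the thirty companies.
eps = [
    1.85, 3.42, 9.11, 1.96, 6.48, 5.72, 1.72, 8.56, 0.72, 6.28,
    2.80, 3.46, 8.32, 4.62, 3.27, 1.35, 3.28, 3.75, 5.23, 2.92,
    2.75, 6.58, 3.54, 4.65, 0.75, 2.01, 5.36, 4.40, 6.49, 1.12,
]

value = 8.32  # measurement whose relative standing we want (arbitrary choice)

# Percentile standing: the percentage of measurements falling below the value.
below = sum(1 for x in eps if x < value)
percent_below = 100 * below / len(eps)

# Z-score: distance from the sample mean in units of the sample standard deviation.
mean = statistics.mean(eps)
s = statistics.stdev(eps)  # uses the n - 1 divisor, matching S in the text
z = (value - mean) / s

print(below, percent_below, round(z, 2))
```

With 27 of the 30 measurements below it, the value 8.32 sits at roughly the 90th percentile, and its Z-score places it a little under 2 standard deviations above the mean.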
Definition
The sample Z-score for a measurement x is

$$Z = \frac{x - \bar{x}}{S}$$

The population Z-score for a measurement x is

$$Z = \frac{x - \mu}{\sigma}$$

The Z-score represents the distance between a given measurement x and the mean, expressed in standard deviation units.

Example 9
Suppose 200 steel workers are selected, and the annual income of each is determined. The mean and standard deviation are $\bar{x}$ = K14,000 and S = K2,000. Suppose Chipo's annual income is K12,000. What is his sample Z-score?

[Figure: Annual income of steel workers, marking $\bar{x} - 3S$ = K8,000, Chipo's income x = K12,000, the mean $\bar{x}$ = K14,000 and $\bar{x} + 3S$ = K20,000.]

Solution
Chipo's annual income lies below the mean income of the 200 steel workers. We compute

$$Z = \frac{x - \bar{x}}{S} = \frac{12000 - 14000}{2000} = -1.0$$

which tells us that Chipo's annual income is 1.0 standard deviation below the sample mean; in short, his sample Z-score is −1.0.

Example 10
Suppose a female bank executive believes that her salary is low as a result of sex discrimination. To try to substantiate her belief, she collects information on the salaries of her male counterparts in the banking business. She finds that their salaries have a mean of K17,000 and a standard deviation of K1,000. Her salary is K13,500. Does this information support her claim of sex discrimination?

Solution
The analysis might proceed as follows. First, we calculate the Z-score for the woman's salary with respect to those of her male counterparts. Thus

$$Z = \frac{13500 - 17000}{1000} = -3.5$$

The implication is that the woman's salary is 3.5 standard deviations below the mean of the male salary distribution. Furthermore, if a check of the male salary data shows that the frequency distribution is mound-shaped, we can infer that very few salaries in this distribution should have a Z-score less than −3, as shown in the figure below.
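The two Z-scores computed in Examples 9 and 10 follow directly from the definition. A minimal sketch, in which the helper name `z_score` is an illustrative assumption and not notation from the text:

```python
def z_score(x, mean, sd):
    """Distance of x from the mean, measured in standard deviations."""
    return (x - mean) / sd


# Example 9: Chipo's income against the 200 steel workers.
chipo = z_score(12000, 14000, 2000)

# Example 10: the executive's salary against the male salary distribution.
executive = z_score(13500, 17000, 1000)

print(chipo, executive)
```

A negative value indicates a measurement below the mean; the magnitude says how many standard deviations below.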
[Figure: Male salary distribution, relative frequency against salary (K), with the mean at K17,000 and the woman's salary of K13,500 at a Z-score of −3.5.]

Therefore, a Z-score of −3.5 represents either a measurement from a distribution different from the male salary distribution or a very unusual (highly improbable) measurement for the male salary distribution.

Well, which of the two situations do you think prevails? Do you think the woman's salary is simply an unusually low one in the distribution of salaries, or do you think her claim of salary discrimination is justified? Most people would probably conclude that her salary does not come from the male salary distribution.

However, the careful investigator should require more information before inferring sex discrimination as the cause. We would want to know more about the data collection technique the woman used, and more about her competence at her job. Other factors, such as length of employment, should perhaps also be considered in the analysis.

Learning Objectives
After working through this Chapter you should be able to
• Calculate the arithmetic mean, standard deviation, variance, median and quartiles for grouped or ungrouped data.
• Explain the use of all the above quantities.

Sample Examination Questions

1. (a) Briefly state, with reasons, the type of chart which would best convey the information for each of the following:
      (i) Students at the University classified by programme of study.
      (ii) Members of a professional association classified by age.
      (iii) Numbers of cars taxed for 2002, 2003 and 2004 in areas A, B and C of a city.

   (b) The weekly cost (K) of rented accommodation was recorded for 100 students living in an area.

      Amount in thousands of Kwacha    Frequency
      0 – 4                            3
      5 – 9                            17
      10 – 14                          24
      15 – 19                          31
      20 – 24                          19
      25 – 29                          6

      (i) Draw a histogram.
      (ii) Give the median and the interquartile range.
      (iii) Calculate the mean, mode, and standard deviation.
      (iv) What conclusions can you draw from the data?

2. The data below are per capita per week numbers of cigarettes sold for 38 states in a country.

      19.20  26.82  19.24  27.18  25.96  30.14  29.27  21.10  28.91  29.92
      29.64  21.94  22.58  29.92  26.91  43.40  30.18  23.86  28.56  24.75
      24.32  24.78  22.17  20.96  27.38  24.44  26.89  41.46  21.08  23.57
      15.80  32.10  24.44  29.04  31.34  29.60  23.12  17.08

   (a) Plot the data using an appropriate graphical method.
   (b) Give the mean, the median and the mode.
   (c) Assuming this is a normal distribution, and given a standard deviation of these figures of 4.387, what proportion of the states would you expect to have more than 20 cigarettes smoked per capita per week?
   (d) How does this compare with the actual situation as shown in the table above?

3. (a) Briefly state, with reasons, the type of chart which would best convey the information in each of the following:
      (i) A country's total import of cigarettes by source.
      (ii) Students in higher education classified by age.
      (iii) Number of students registered for secondary school in years 2001, 2002 and 2003 for areas X, Y, and Z of a country.

   (b) The weekly cost (K'000) of rented accommodation was recorded for 40 students living in an area.

      35  56  33  30  31  55  29  27  21  32
      43  33  29  27  30  29  26  26  27  26
      35  32  28  27  31  27  33  24  27  28
      33  49  22  19  46  36  26  38  36  55

      (i) Summarize the data in a frequency distribution table.
      (ii) Calculate the mean and the standard deviation from your frequency table.
      (iii) Plot a histogram for these data. What is the value of the median?
      (iv) What conclusions can you draw from these data?

4. (a) Given below is a sample of 25 observations. Calculate:
      (i) The range
      (ii) The arithmetic mean
      (iii) The median
      (iv) The lower quartile
      (v) The upper quartile
      (vi) The quartile deviation
      (vii) The mean deviation
      (viii) The standard deviation

       5  18  29  42  50  61
       8  20  33  43  54  63
      10  21  35  46  56  67
      11  25  39  48  58  69
      14

   (b) Explain the term 'measure of dispersion' and state briefly the advantages and disadvantages of using the following measures of dispersion:
      (i) Range
      (ii) Mean deviation
      (iii) Standard deviation

5. A machine produces the following number of rejects in each successive period of five minutes.

      20  84  16  26  27  55  58  25  42  42
      58   7  55  57  13  40  40  43  73  28
      15  41  22  27  24  28  67  66  66  37
      21  28  32   7  34  29  19  29  23  27
      30  26  11  17  24  17  26  21  35  12

   (a) Construct a frequency distribution from these data, using seven class intervals of equal width.
   (b) Using the frequency distribution, calculate:
      (i) the mean
      (ii) the standard deviation
   (c) Briefly explain the meaning of your calculated measures.

CHAPTER 4

PROBABILITY

Reading
Newbold Chapter 3
Tailoka Frank P Chapter 8
Wonnacott and Wonnacott Chapter 3

Introductory Comments
Probability is more abstract than other parts of this subject, and solving the problems may be difficult. The concepts are very important for statistics because it is the rules of probability that allow one to reason about uncertainty. Independence and conditional probability must be clearly understood for the purpose of statistical investigation.

4. Elementary Probability

Counting techniques. Introduction of the probability concept. Events and event relationships. Probability trees, conditional probability and statistical independence.

Counting techniques: In calculating probabilities, it is essential to be able to work out n(S) and n(E) as straightforwardly as possible. Permutations and combinations are very helpful here. We begin with the following basic principle.

Fundamental principle of counting. If two operations A and B are carried out, and there are m different ways of carrying out A and k different ways of carrying out B, then the combined operation A and B may be carried out in m × k different ways.

Example 1
Suppose a license plate contains two distinct letters followed by three digits with the first digit not zero.
How many different license plates can be printed?

The first letter can be printed in 26 different ways, the second letter in 25 different ways (since the letter printed first cannot be chosen for the second letter), the first digit in 9 ways and each of the other two digits in 10 ways. Hence 26 · 25 · 9 · 10 · 10 = 585,000 different plates can be printed.

Example 2
A toy manufacturer makes a wooden toy in two parts; the top part may be coloured red, white or blue and the bottom part brown, orange, yellow or green. How many differently coloured toys can be produced?

A red top part may be combined with a bottom part of any of the four possible colours. Similarly, either a white or a blue top part may be combined with each of the four differently coloured bottom parts. Hence the number of differently coloured toys is 3 × 4 = 12.

Permutations: An arrangement of a set of n objects in a given order is called a permutation of the objects (taken all at a time). An arrangement of any r ≤ n of these objects in a given order is called an r-permutation or a permutation of the n objects taken r at a time.

Example 3
Consider the set of letters a, b, c and d. Then
i) bdca, dcba and acdb are permutations of the 4 letters (taken all at a time).
ii) bad, adb and bca are permutations of the 4 letters taken 3 at a time.
iii) ad, ca, da and bd are permutations of the 4 letters taken 2 at a time.

Example 4
The telephone switchboard in a company requires two operators whose chairs (positions) are side by side. When the telephone operators go to lunch, two of the four secretaries take their places. If we make a distinction between the two operators' positions, in how many ways can the four secretaries fill them?

We can answer this question by determining the number of possible permutations of 4 things taken 2 at a time. There are 4 secretaries, A, B, C and D, to fill the first position. Once this position has been filled, there are only 3 secretaries to fill the second position.
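The counting argument for the two operator positions can be confirmed by listing every ordered pair of secretaries. The sketch below is an illustration only; it uses Python's standard `itertools.permutations`, and the variable names are assumptions made for this example. It also recomputes the license plate count of Example 1 with the fundamental principle of counting.

```python
from itertools import permutations

secretaries = ["A", "B", "C", "D"]

# Each tuple is (first position, second position); order matters,
# so ("A", "B") and ("B", "A") are different seatings.
seatings = list(permutations(secretaries, 2))
print(len(seatings))
print(seatings)

# Example 1: two distinct letters, then three digits with a non-zero first digit.
plates = 26 * 25 * 9 * 10 * 10
print(plates)
```

Enumerating is only practical for small sets; for larger n and r the product n(n − 1)...(n − r + 1) gives the same count without listing anything.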
The figure below shows the possibilities.

Figure: Counting the number of permutations

First position    Second position    Permutation number
A                 B, C, D            1, 2, 3
B                 A, C, D            4, 5, 6
C                 A, B, D            7, 8, 9
D                 A, B, C            10, 11, 12

The tree diagram illustrates that there are 4 · 3 = 12 possible permutations of four things taken two at a time.

Suppose that n is the number of distinct objects from which an ordered arrangement is to be derived, and r is the number of objects in the arrangement. The number of possible ordered arrangements is the number of permutations of n things taken r at a time. This is written symbolically as P(n, r) or $_nP_r$.

$$P(n, r) = n(n-1)(n-2)\cdots(n-r+1) \qquad (1)$$

If we multiply the right-hand side of equation (1) by $(n-r)!/(n-r)!$, which is equivalent to multiplying by 1, we obtain

$$P(n, r) = \frac{n(n-1)(n-2)\cdots(n-r+1)(n-r)!}{(n-r)!} = \frac{n!}{(n-r)!}$$

Example 5
i) In a stock room, 5 adjacent bins are available for storing 5 different items. The stock of each item can be stored satisfactorily in any bin. In how many ways can we assign the 5 items to the 5 bins? We get the answer by evaluating P(5, 5), which is

$$P(5,5) = \frac{5!}{(5-5)!} = 5 \cdot 4 \cdot 3 \cdot 2 \cdot 1 = 120$$

(recall that 0! = 1).

ii) Suppose that there are 6 different parts to be stocked, but only 4 bins are available. To find the number of possible arrangements, we need to determine the number of permutations of 6 things taken 4 at a time, which is

$$P(6,4) = \frac{6!}{(6-4)!} = \frac{6 \cdot 5 \cdot 4 \cdot 3 \cdot 2 \cdot 1}{2!} = 360$$

Example 6
How many permutations are there of 3 objects, say a, b and c?

There are

$$P(3,3) = \frac{3!}{(3-3)!} = 3! = 1 \cdot 2 \cdot 3 = 6$$

such permutations. These are abc, acb, bac, bca, cab, cba.

Permutations with repetitions: The number of permutations of n objects of which $n_1$ are alike, $n_2$ are alike of another kind, ..., $n_r$ are alike of a further kind, is given by

$$\frac{n!}{n_1!\, n_2! \cdots n_r!} \qquad \text{where } n = n_1 + n_2 + \dots + n_r$$

Example 7
Find the number of permutations of the word "ACCOUNTANTS".

The total number of letters in "ACCOUNTANTS" is 11, of which there are two A's, two C's, two N's and two T's. So the required number of permutations is

$$\frac{11!}{2!\,2!\,2!\,2!} = 2{,}494{,}800$$

Combinations
A combination is a selection of objects without regard to order.

Example 8
The combinations of the letters a, b, c, d taken 3 at a time are {a, b, c}, {a, b, d}, {a, c, d}, {b, c, d}, or simply abc, abd, acd, bcd. Observe that the following arrangements are equal as combinations:

abc, acb, bac, bca, cab, cba.

That is, each denotes the same set {a, b, c}.

The number of combinations of n objects taken r at a time will be denoted by C(n, r) or $_nC_r$.

Example 9
We determine the number of combinations of the four letters a, b, c, d taken 3 at a time. Note that each combination consisting of three letters determines 3! = 6 permutations of the letters in the combination.

Combinations    Permutations
abc             abc, acb, bac, bca, cab, cba
abd             abd, adb, bad, bda, dab, dba
acd             acd, adc, cad, cda, dac, dca
bcd             bcd, bdc, cbd, cdb, dbc, dcb

Thus the number of combinations multiplied by 3! equals the number of permutations:

$$C(4,3) \cdot 3! = P(4,3) \quad \text{or} \quad C(4,3) = \frac{P(4,3)}{3!}$$

Now P(4, 3) = 4 · 3 · 2 = 24 and 3! = 6; hence C(4, 3) = 4, as noted above. In general,

$$C(n, r) = \frac{P(n, r)}{r!} = \frac{n!}{r!\,(n-r)!}$$

Example 10
A perfume manufacturer who makes 10 fragrances wants to prepare a gift package containing 6 fragrances. How many combinations of fragrances are available?

The answer is

$$C(10,6) = \frac{10!}{6!\,(10-6)!} = \frac{10 \cdot 9 \cdot 8 \cdot 7 \cdot 6!}{6! \cdot 4 \cdot 3 \cdot 2 \cdot 1} = 210$$

Tree Diagrams
A tree diagram is a device used to enumerate all the possible outcomes of a sequence of experiments where each experiment can occur in a finite number of ways. The construction of tree diagrams is illustrated in the following examples.

Example 11
Find the product A × B × C where A = {1, 2}, B = {a, b, c} and C = {3, 4}.
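The counts in Examples 7 and 10 and the product set asked for in Example 11 can be checked with the standard library. This is a minimal sketch under the assumption of Python 3.8 or later (for `math.comb`); the variable names are illustrative and do not come from the text.

```python
from itertools import product
from math import comb, factorial

# Example 7: permutations with repetition, n! / (n1! n2! ... nr!).
word = "ACCOUNTANTS"
arrangements = factorial(len(word))
for letter in set(word):
    # Divide out the orderings of each group of identical letters.
    arrangements //= factorial(word.count(letter))

# Example 10: combinations of 10 fragrances taken 6 at a time.
fragrance_packs = comb(10, 6)

# Example 11: the product set A x B x C, one tuple per path through the tree.
A, B, C = [1, 2], ["a", "b", "c"], [3, 4]
triples = list(product(A, B, C))

print(arrangements, fragrance_packs, len(triples))
print(triples)
```

Each tuple produced by `product` corresponds to one root-to-leaf path of a tree diagram, so the length of the list equals 2 × 3 × 2 by the fundamental principle of counting.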
The tree diagram follows:

First branch (A)    Second branch (B)    Third branch (C)    Outcome
1                   a                    3                   (1, a, 3)
                                         4                   (1, a, 4)
                    b                    3                   (1, b, 3)
                                         4                   (1, b, 4)
                    c                    3                   (1, c, 3)
                                         4                   (1, c, 4)
2                   a                    3                   (2, a, 3)
                                         4                   (2, a, 4)
                    b                    3                   (2, b, 3)
                                         4                   (2, b, 4)
                    c                    3                   (2, c, 3)
                                         4                   (2, c, 4)

Observe that the tree is constructed from left to right, and that the number of branches at each point corresponds to the number of possible outcomes of the next experiment.

Example 12
Mumba and Ened are to play a tennis tournament. The first person to win two games in a row or to win a total of three games wins the tournament. The following diagram shows the possible outcomes of the tournament.

[Tree diagram: starting from the root, each game branches into M (Mumba wins) or E (Ened wins); a branch ends as soon as one player has won two games in a row or three games in total.]

Observe that there are 10 end points, which correspond to the 10 possible outcomes of the tournament:

MM, MEMM, MEMEM, MEMEE, MEE, EMM, EMEMM, EMEME, EMEE, EE

The path from the beginning of the tree to an end point indicates who won which game in that particular tournament.

Basics of Probability

Given a sample space S, we need to assign to each event that can be obtained from S a number, called the probability of the event. This number will indicate the relative likelihood of the various events.

For events whose outcomes are equally likely, the probability of the event can be found from the following basic probability principle. Suppose the sample space S consists of n equally likely outcomes and the event E contains m of them. Then the probability that event E occurs, written P(E), is

$$P(E) = \frac{m}{n} \qquad (1)$$

This same result can also be given in terms of the cardinal number of a set, where n(E) represents the number of elements in a finite set E. With the same assumptions given above,

$$P(E) = \frac{n(E)}{n(S)} \qquad (2)$$

Example 1
Suppose a fair coin is tossed twice. The sample space is S = {HH, HT, TH, TT}. Set S contains 4 outcomes, all of which are equally likely. (This makes n = 4 in formula (1) above.) Find the probability of the following outcomes.

a) E = {HT, TH}
Event E contains two elements, so

$$P(E) = \frac{2}{4} = \frac{1}{2}$$

By this result, exactly one head and one tail will show up 1/2 of the time when a fair coin is tossed twice.
b) Two heads
Let event F = {HH} be the event "two heads are observed when a fair coin is tossed twice". Event F contains one element, so

$$P(F) = \frac{1}{4}$$

c) Three heads
A fair coin tossed twice can never show three heads. If G is the event, then G = ∅, and

$$P(G) = \frac{0}{4} = 0$$

The event is impossible.

Example 2
If a single playing card is drawn at random from an ordinary 52-card bridge deck, find the probability of each of the following events.

a) An ace is drawn
There are four aces in the deck, out of 52 cards, so

$$P(\text{ace}) = \frac{4}{52} = \frac{1}{13}$$

b) A face card is drawn
Since there are 12 face cards,

$$P(\text{face card}) = \frac{12}{52} = \frac{3}{13}$$

c) A spade is drawn
The deck contains 13 spades, so

$$P(\text{spade}) = \frac{13}{52} = \frac{1}{4}$$

d) A spade or heart is drawn
Besides the 13 spades, the deck contains 13 hearts, so

$$P(\text{spade or heart}) = \frac{26}{52} = \frac{1}{2}$$

Example 3
The Manager of a department store has decided to make a study of the size of purchases made by people coming into the store. To begin, he chooses a day that seems fairly typical and gathers the following data. (Purchases have been rounded to the nearest Kwacha, with sales tax ignored.)

Amount of purchase            Number of customers    Probability (relative frequency)
K0 and under K2250            160                    0.280
K2250 and under K11250        84                     0.147
K11250 and under K13500       50                     0.088
K13500 and under K20250       136                    0.239
K20250 and under K22500       77                     0.135
K22500 and over               63                     0.111
Total                         570                    1.000

Probability Distributions
In Example 3 the outcomes were various purchase amounts, and a probability was assigned to each outcome. By this process, a probability distribution can be set up; that is, to each possible outcome of an experiment, a number, called the probability of that outcome, is assigned.

Example 4
Set up a probability distribution for the number of heads observed when a fair coin is tossed twice.
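One way to set up such a distribution is to list the equally likely outcomes and count how often each value occurs. The following is a minimal sketch of that idea for the two-toss experiment; the names `outcomes` and `distribution` are assumptions made for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# The four equally likely outcomes of two tosses: HH, HT, TH, TT.
outcomes = list(product("HT", repeat=2))

# Count how many outcomes give 0, 1 or 2 heads.
counts = Counter(outcome.count("H") for outcome in outcomes)

# Probability of each number of heads = favourable outcomes / total outcomes.
distribution = {
    heads: Fraction(count, len(outcomes))
    for heads, count in sorted(counts.items())
}

for heads, p in distribution.items():
    print(heads, p)
```

Using `Fraction` keeps the probabilities exact, so it is easy to confirm that they add up to 1.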
Number of heads    Probability
0                  1/4
1                  2/4
2                  1/4
Total              1

The probability distributions that were set up suggest the following properties of probability.

Let S = {$s_1, s_2, s_3, \dots, s_n$} be the sample space obtained from the union of n distinct simple events $s_1, s_2, s_3, \dots, s_n$ with associated probabilities $p_1, p_2, p_3, \dots, p_n$. Then

1. $0 \le p_1 \le 1,\ 0 \le p_2 \le 1,\ \dots,\ 0 \le p_n \le 1$ (all probabilities are between 0 and 1 inclusive);
2. $p_1 + p_2 + p_3 + \dots + p_n = 1$ (the sum of all probabilities for a sample space is 1);
3. P(S) = 1.

Addition Principle
Suppose E = {$s_1, s_2, \dots, s_m$}, where $s_1, s_2, \dots, s_m$ are distinct simple events. Then

$$P(E) = P(s_1) + P(s_2) + \dots + P(s_m)$$

Example 5
Refer to Example 3 and find the probability that a customer spends at least K11,250 but less than K20,250.

This event is the union of two simple events: spending K11,250 and under K13,500, and spending K13,500 and under K20,250. The probability of spending at least K11,250 but less than K20,250 can thus be found by the addition principle. Let this event be A; then

P(A) = P(spending K11,250 and under K13,500) + P(spending K13,500 and under K20,250) = 0.088 + 0.239 = 0.327

Addition for Mutually Exclusive Events
For mutually exclusive events E and F,

$$P(E \cup F) = P(E) + P(F)$$

Example 6
Use the probability distribution of Example 4 to find the probability that we get at least one head on tossing a fair coin twice.

Event E, "at least one head", is the union of three mutually exclusive events: two heads; a head then a tail; and a tail then a head.

$$P(E) = P(\text{2 heads}) + P(\text{one head and one tail}) = \frac{1}{4} + \frac{2}{4} = \frac{3}{4}$$

Complement:

$$P(E') = 1 - P(E) \quad \text{and} \quad P(E) = 1 - P(E')$$

In a particular experiment, P(E) = 3/8. Find P(E').

$$P(E') = 1 - P(E) = 1 - \frac{3}{8} = \frac{5}{8}$$

Example 7
In Example 3 above, find the probability that a customer spends less than K22,500.

Let E be the event "a customer spends less than K22,500".
P(E) = 0.280 + 0.147 + 0.088 + 0.239 + 0.135 = 0.889

Alternatively, E′ is the event that "a customer spends K22,500 and over". From the table, P(E′) = 0.111, and

P(E) = 1 − P(E′) = 1 − 0.111 = 0.889

Odds
The odds in favor of an event E are defined as the ratio of P(E) to P(E′), or

$$\frac{P(E)}{P(E')}$$

Example 8
Suppose the weather forecaster says that the probability of rain tomorrow is 2/5. Find the odds in favor of rain tomorrow.

Let E be the event "rain tomorrow". Then E′ is the event "no rain tomorrow". Since P(E) = 2/5, we have P(E′) = 3/5. By the definition of odds, the odds in favor of rain are

$$\frac{2/5}{3/5} = \frac{2}{3}$$

written 2 to 3 or 2:3.

In general, if the odds favoring event E are m to n, then

$$P(E) = \frac{m}{m+n} \quad \text{and} \quad P(E') = \frac{n}{m+n}$$

Example 9
The odds that a particular bid will be the low bid are 8 to 13. Find the probability that the bid will be the low bid.

Solution
Odds of 8 to 13 show 8 favorable chances out of 8 + 13 = 21 chances altogether.

$$P(\text{bid will be low bid}) = \frac{8}{8 + 13} = \frac{8}{21}$$

There is a 13/21 chance that the bid will not be the low bid.

Extended Addition Principle
For any two events E and F from a sample space S,

$$P(E \cup F) = P(E) + P(F) - P(E \cap F)$$

Example 10
If a single card is drawn from an ordinary deck, find the probability that it will be red or a face card.

Let R and F represent the events "red" and "face card" respectively. Then

$$P(R) = \frac{26}{52}, \quad P(F) = \frac{12}{52}, \quad P(R \cap F) = \frac{6}{52}$$

(There are six red face cards in a deck.) By the extended addition principle,

$$P(R \cup F) = P(R) + P(F) - P(R \cap F) = \frac{26}{52} + \frac{12}{52} - \frac{6}{52} = \frac{32}{52} = \frac{8}{13}$$

Example 11
Suppose two fair dice are rolled. Find each of the following probabilities.
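The extended addition principle can be checked by brute-force enumeration of the 36 equally likely rolls of two dice. The sketch below is illustrative only: it picks the events "first die shows 2" and "sum is 6", and the helper names are assumptions, not notation from the text.

```python
from fractions import Fraction
from itertools import product

# Every (first die, second die) pair; all 36 are equally likely.
rolls = list(product(range(1, 7), repeat=2))


def prob(event):
    """Probability of an event given as a true/false test on a roll."""
    favourable = sum(1 for roll in rolls if event(roll))
    return Fraction(favourable, len(rolls))


def first_is_2(roll):
    return roll[0] == 2


def sum_is_6(roll):
    return roll[0] + roll[1] == 6


# Left side: count the union directly.
direct = prob(lambda r: first_is_2(r) or sum_is_6(r))

# Right side: P(A) + P(B) - P(A and B).
by_rule = (
    prob(first_is_2)
    + prob(sum_is_6)
    - prob(lambda r: first_is_2(r) and sum_is_6(r))
)

print(direct, by_rule)
```

Subtracting the intersection matters because the roll (2, 4) belongs to both events and would otherwise be counted twice.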
a) The first die shows a 2 or the sum is 6

Let A be the event "the first die shows a 2" and B the event "the sum is 6". The 36 equally likely outcomes (first die, second die) are:

(1,1) (2,1) (3,1) (4,1) (5,1) (6,1)
(1,2) (2,2) (3,2) (4,2) (5,2) (6,2)
(1,3) (2,3) (3,3) (4,3) (5,3) (6,3)
(1,4) (2,4) (3,4) (4,4) (5,4) (6,4)
(1,5) (2,5) (3,5) (4,5) (5,5) (6,5)
(1,6) (2,6) (3,6) (4,6) (5,6) (6,6)

Event A is the column (2,1), ..., (2,6); event B is the diagonal (1,5), (2,4), (3,3), (4,2), (5,1). Hence

$$P(A) = \frac{6}{36}, \quad P(B) = \frac{5}{36}, \quad P(A \cap B) = \frac{1}{36}$$

By the extended addition principle,

$$P(A \cup B) = P(A) + P(B) - P(A \cap B) = \frac{6}{36} + \frac{5}{36} - \frac{1}{36} = \frac{10}{36} = \frac{5}{18}$$

b) The sum is 5 or the second die is 4

$$P(\text{sum is 5}) = \frac{4}{36}, \quad P(\text{second die is 4}) = \frac{6}{36}, \quad P(\text{sum is 5 and second die is 4}) = \frac{1}{36}$$

$$P(\text{sum is 5 or second die is 4}) = \frac{4}{36} + \frac{6}{36} - \frac{1}{36} = \frac{9}{36} = \frac{1}{4}$$

CONDITIONAL PROBABILITIES

Often we are interested in how certain events are related to the occurrence of other events. In particular, we may be interested in the probability of the occurrence of an event given that another related event has occurred. Such probabilities are referred to as conditional probabilities.

The conditional probability of event E given event F, written P(E | F) (also written P(E/F)), is

$$P(E \mid F) = \frac{P(E \cap F)}{P(F)}, \qquad P(F) \ne 0$$

Example 11
The Training Manager for a large stockbrokerage firm has noticed that some of the firm's brokers use the firm's research advice, while other brokers tend to go with their own feelings about which stocks will go up. To see whether the research department is better than just the feelings of the brokers, the manager conducted a survey of 100 brokers, with results as shown in the following table.

                       Picked stocks     Didn't pick stocks    Total
                       that went up      that went up
Used research          30                15                    45
Didn't use research    30                25                    55
Totals                 60                40                    100

Letting A represent the event "picked stocks that went up", and letting B represent the event "used research", we can find the following probabilities.

$$P(A) = \frac{60}{100} = 0.6 \qquad P(A') = \frac{40}{100} = 0.4$$

$$P(B) = \frac{45}{100} = 0.45 \qquad P(B') = \frac{55}{100} = 0.55$$

Suppose we want to find the probability that a broker using research will pick stocks that go up. From the table above, of the 45 brokers who use research, 30 picked stocks that went up, so

$$P(\text{broker who uses research picks stocks that go up}) = \frac{30}{45} = 0.667$$

This is a different number from the probability that a broker picks stocks that go up, 0.6, since we have additional information (the broker uses research) which reduces the sample space. In other words, we found the probability that a broker picks stocks that go up, A, given the additional information that the broker uses research, B. This is called the conditional probability of event A, given that event B has occurred, written P(A | B). In the example above,

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{30/100}{45/100} = \frac{30}{45} = 0.667$$

Product Rule: For any events E and F,

$$P(E \cap F) = P(F) \cdot P(E \mid F)$$

Example 12
A class is 2/5 women and 3/5 men. Of the women, 25% are business majors. Find the probability that a student chosen at random is a woman business major.

Solution
Let B and W represent the events "business major" and "woman", respectively. We want to find P(B ∩ W). By the product rule,

$$P(B \cap W) = P(W) \cdot P(B \mid W)$$

Using the given information, P(W) = 2/5 = 0.4 and P(B | W) = 0.25. Thus

$$P(B \cap W) = 0.4(0.25) = 0.10$$

Example 13
Suppose an investment firm is interested in the following events:

A = Common stock in XYZ Corporation gains 10% next year
B = Gross National Product gains 10% next year

The firm has assigned the following probabilities on the basis of available information:

P(A | B) = 0.8, P(B) = 0.3

That is, the investment firm believes the probability is 0.8 that the XYZ common stock will gain 10% in the next year, assuming that the GNP gains 10% in the same time period. In addition, the firm believes the probability is only 0.3 that the GNP will gain 10% in the next year. Use the formula for calculating the probability of an intersection to calculate the probability that XYZ common stock and the GNP both gain 10% in the next year.

Solution
We want to calculate P(A ∩ B). The formula is

$$P(A \cap B) = P(B)\,P(A \mid B) = (0.3)(0.8) = 0.24$$

Thus the probability, according to this investment firm, is 0.24 that both XYZ common stock and the GNP will gain 10% in the next year.
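The conditional probability from the broker survey and the product rule of Example 13 can be reproduced from the raw counts. This is a minimal sketch; the dictionary layout and variable names are assumptions chosen for the illustration, and the counts are those of the survey table.

```python
from fractions import Fraction

# Survey counts keyed by (research use, stock result).
counts = {
    ("research", "up"): 30,
    ("research", "not up"): 15,
    ("no research", "up"): 30,
    ("no research", "not up"): 25,
}
total = sum(counts.values())

# B = "used research", A = "picked stocks that went up".
p_b = Fraction(counts[("research", "up")] + counts[("research", "not up")], total)
p_a_and_b = Fraction(counts[("research", "up")], total)

# Definition of conditional probability: P(A | B) = P(A and B) / P(B).
p_a_given_b = p_a_and_b / p_b

# Product rule with the Example 13 figures: P(A and B) = P(B) * P(A | B).
p_joint = Fraction(3, 10) * Fraction(8, 10)

print(p_a_given_b, float(p_a_given_b))
print(p_joint, float(p_joint))
```

Dividing by P(B) is the same as restricting attention to the 45 brokers who used research, which is why the result equals 30/45.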
In the previous section we showed that the probability of an event A may be substantially altered by the assumption that the event B has occurred. However, this will not always be the case. In some instances the assumption that event B has occurred will not alter the probability of event A at all. When this is true, we call events A and B independent.

Events A and B are independent if the assumption that B has occurred does not alter the probability that A has occurred, i.e.

$$P(A \mid B) = P(A)$$

When events A and B are independent it will also be true that

$$P(B \mid A) = P(B)$$

Events that are not independent are said to be dependent.

Example 14
The probability that interest rates will rise has been assessed as 0.8. If they do rise, the probability that the stock market index will drop is estimated to be 0.9. If the interest rates do not rise, the probability that the stock market index will still drop is estimated as 0.4. What is the probability that the stock market index will drop?

Solution
P(A) = P(interest rates rise) = 0.8
P(B) = P(stock market index drops) = ?

The probability of A′, the complement of A, "interest rates do not rise", is P(A′) = 1 − 0.8 = 0.2.

P(B | A) = P(stock market index drops | interest rates rise) = 0.9
P(B | A′) = P(stock market index drops | interest rates do not rise) = 0.4

By the multiplication rule,

$$P(B \cap A) = P(A)\,P(B \mid A) = 0.8 \times 0.9 = 0.72$$

and

$$P(B \cap A') = P(A')\,P(B \mid A') = 0.2 \times 0.4 = 0.08$$

Since B occurs either together with A or together with A′,

$$P(B) = P(B \cap A) + P(B \cap A') = 0.72 + 0.08 = 0.80$$

Example 15
Suppose we toss a fair die. Let B be the event "a number less than or equal to 4 is observed" and A the event "an even number is observed". Are events A and B independent?

$$P(B) = \frac{4}{6} = \frac{2}{3}, \quad \text{since } B = \{1, 2, 3, 4\}$$

$$P(A) = \frac{3}{6} = \frac{1}{2}, \quad \text{since } A = \{2, 4, 6\}$$

$$P(A \cap B) = \frac{2}{6} = \frac{1}{3}, \quad \text{where } A \cap B = \{2, 4\}$$

Now, given that A has occurred,

$$P(B \mid A) = \frac{P(A \cap B)}{P(A)} = \frac{1/3}{1/2} = \frac{2}{3} = P(B)$$

Similarly,

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{1/3}{2/3} = \frac{1}{2} = P(A)$$

Therefore the events A and B are independent.
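The independence check of Example 15 and the calculation of Example 14 can both be verified in a few lines. The sketch below is an illustration under the assumption that the die is fair; the set and variable names are chosen for this example only.

```python
from fractions import Fraction

# Example 15: one toss of a fair die.
die = range(1, 7)
A = {n for n in die if n % 2 == 0}   # an even number is observed
B = {n for n in die if n <= 4}       # a number less than or equal to 4


def prob(event):
    """Probability of a set of faces when all six faces are equally likely."""
    return Fraction(len(event), 6)


# Independent exactly when P(A and B) equals P(A) * P(B).
independent = prob(A & B) == prob(A) * prob(B)

# Example 14: split "index drops" by whether interest rates rise.
p_rise = Fraction(8, 10)
p_drop = p_rise * Fraction(9, 10) + (1 - p_rise) * Fraction(4, 10)

print(independent, p_drop)
```

The second calculation adds the probabilities of the two mutually exclusive ways the index can drop, each obtained with the multiplication rule.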
If events A and B are independent, the probability of the intersection of A and B equals the product of the probabilities of A and B, i.e.

$$P(A \cap B) = P(A)\,P(B)$$

In the die toss experiment,

$$P(A \cap B) = P(A) \cdot P(B) = \frac{1}{2} \cdot \frac{2}{3} = \frac{1}{3}$$

Bayes' Theorem

A Posteriori Probabilities
Suppose three machines, A, B, and C, produce similar engine components. Machine A produces 45 percent of the total components, machine B produces 30 percent, and machine C 25 percent. For the usual production schedule, 6 percent of the components produced by machine A do not meet established specifications; for machines B and C, the corresponding figures are 4 percent and 3 percent. One component is selected at random from the total output and is found to be defective. What is the probability that the component selected was produced by machine A?

The answer to this question is found by calculating the probability after the outcome of the experiment has been observed. Such probabilities are called a posteriori probabilities, as opposed to a priori probabilities, which give the likelihood that an event will occur.

[Figure: Venn diagram of the sample space S partitioned into A, B and C, with the event D overlapping each part in A ∩ D, B ∩ D and C ∩ D.]

D is the event that a defective component is produced by machine A, machine B or machine C. The three mutually exclusive events A, B and C form a partition of the sample space S: apart from being mutually exclusive, their union is precisely S.

1. The event D may be expressed as D = (A ∩ D) ∪ (B ∩ D) ∪ (C ∩ D).
2. The event that a component is defective and is produced by machine A is given by A ∩ D.
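The machine question posed above can also be answered numerically by weighting each machine's share of production by its defect rate and then normalising. The following is a minimal sketch of that computation; the function name `posterior` and the dictionary layout are assumptions made for illustration, and the percentages are those stated in the problem.

```python
from fractions import Fraction

# Share of total output produced by each machine (a priori probabilities).
priors = {"A": Fraction(45, 100), "B": Fraction(30, 100), "C": Fraction(25, 100)}

# Proportion of each machine's output that is defective.
defect_rates = {"A": Fraction(6, 100), "B": Fraction(4, 100), "C": Fraction(3, 100)}


def posterior(priors, likelihoods):
    """Return P(machine | defective) for every machine."""
    # P(machine and defective) = P(machine) * P(defective | machine)
    joint = {m: priors[m] * likelihoods[m] for m in priors}
    # P(defective) is the sum over the partition A, B, C.
    total = sum(joint.values())
    return {m: p / total for m, p in joint.items()}


post = posterior(priors, defect_rates)
for machine, p in post.items():
    print(machine, round(float(p), 3))
```

The three a posteriori probabilities add up to 1, since a defective component must have come from exactly one of the machines.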
Thus, the a posteriori probability that a defective component selected was produced by machine A is given by

$$P(A \mid D) = \frac{n(A \cap D)}{n(D)} = \frac{P(A \cap D)}{P(D)} = \frac{P(A \cap D)}{P(A \cap D) + P(B \cap D) + P(C \cap D)} \qquad (1)$$

Next, using the product rule, we may express

$$P(A \cap D) = P(A)\,P(D \mid A), \quad P(B \cap D) = P(B)\,P(D \mid B), \quad P(C \cap D) = P(C)\,P(D \mid C)$$

so that (1) may be expressed in the form

$$P(A \mid D) = \frac{P(A)\,P(D \mid A)}{P(A)\,P(D \mid A) + P(B)\,P(D \mid B) + P(C)\,P(D \mid C)} \qquad (2)$$

which is a special case of a result known as Bayes' Theorem.

Observe that the expression on the right of (2) involves the probabilities P(A), P(B), P(C) and the conditional probabilities P(D | A), P(D | B) and P(D | C), all of which may be calculated in the usual fashion. In fact, by displaying these quantities on a tree diagram, we obtain Figure 1.0. We may compute the required probability by substituting the relevant quantities into (2), or we may make use of the following device:

P(A | D) = (product of probabilities along the limb through A) / (sum of products of the probabilities along each limb terminating at D)

Figure 1.0 (D′ denotes a non-defective component)

Step 1: Machine    Step 2: Condition       Probability of outcome
P(A) = 0.45        P(D | A) = 0.06         P(D ∩ A) = P(A)·P(D | A) = 0.027
                   P(D′ | A) = 0.94        P(D′ ∩ A) = P(A)·P(D′ | A) = 0.423
P(B) = 0.30        P(D | B) = 0.04         P(D ∩ B) = P(B)·P(D | B) = 0.012
                   P(D′ | B) = 0.96        P(D′ ∩ B) = P(B)·P(D′ | B) = 0.288
P(C) = 0.25        P(D | C) = 0.03         P(D ∩ C) = P(C)·P(D | C) = 0.0075
                   P(D′ | C) = 0.97        P(D′ ∩ C) = P(C)·P(D′ | C) = 0.2425

In either case, we obtain

$$P(A \mid D) = \frac{(0.45)(0.06)}{(0.45)(0.06) + (0.3)(0.04) + (0.25)(0.03)} = \frac{0.027}{0.027 + 0.012 + 0.0075} = \frac{0.027}{0.0465} = 0.581$$

Before looking at any further examples, let us state the general form of Bayes' Theorem.

Let $A_1, A_2, \dots, A_n$ be a partition of a sample space S and let E be an event of the experiment such that P(E) ≠ 0. Then the a posteriori probability $P(A_i \mid E)$, $1 \le i \le n$, is given by

$$P(A_i \mid E) = \frac{P(A_i)\,P(E \mid A_i)}{P(A_1)\,P(E \mid A_1) + P(A_2)\,P(E \mid A_2) + \dots + P(A_n)\,P(E \mid A_n)} \qquad (3)$$

Problems

1) In a certain city, 40 percent of the people consider themselves Movement for Multiparty Democracy (MMD) supporters, 35 percent consider themselves United Party for National Development (UPND) supporters and 25 percent consider themselves independents (I). During a particular election, 45 percent of the MMD supporters voted, 40 percent of the UPND supporters voted and 60 percent of the independents voted. Suppose a person is randomly selected:
   a) Find the probability that the person voted.
   b) If the person voted, find the probability that the voter is
      i) MMD
      ii) UPND
      iii) Independent.

2) Three girls, Chanda, Mumba and Chileshe, pack okra in a factory. From the batch allotted to them Chanda packs 55%, Mumba 30% and Chileshe 15%. The probability that Chanda breaks some okra in a packet is 0.7, and the respective probabilities for Mumba and Chileshe are 0.2 and 0.1. What is the probability that a packet with broken okra found by the checker was packed by
   a) Chanda?
   b) Mumba?
   c) Chileshe?

3) A publisher sends advertising material for an accounting text to 80% of all of the Professors teaching the appropriate accounting courses. Thirty percent of the Professors who received this material adopted the book, as did 10% of the Professors who did not receive the material. What is the probability that a Professor who adopts the book has received the advertising material?

Solutions

1. Let V be the event that the person voted.

   MMD:          P(M) = 0.40, P(V | M) = 0.45
   UPND:         P(U) = 0.35, P(V | U) = 0.40
   Independent:  P(I) = 0.25, P(V | I) = 0.60

   a) $$P(V) = P(M)\,P(V \mid M) + P(U)\,P(V \mid U) + P(I)\,P(V \mid I) = 0.40(0.45) + 0.35(0.40) + 0.25(0.60) = 0.18 + 0.14 + 0.15 = 0.47$$

   b) i) $$P(M \mid V) = \frac{P(M \cap V)}{P(V)} = \frac{P(M)\,P(V \mid M)}{P(M)\,P(V \mid M) + P(U)\,P(V \mid U) + P(I)\,P(V \mid I)} = \frac{0.18}{0.47} = 0.383$$

      ii) $$P(U \mid V) = \frac{P(U \cap V)}{P(V)} = \frac{P(U)\,P(V \mid U)}{P(V)} = \frac{0.14}{0.47} = 0.298$$

      iii) $$P(I \mid V) = \frac{0.15}{0.47} = 0.319$$

2. Let D, M and H be the events that the packet was packed by Chanda, Mumba and Chileshe respectively, and let B be the event that the packet contains broken okra.

   P(D) = 0.55, P(M) = 0.30, P(H) = 0.15
   P(B | D) = 0.7, P(B | M) = 0.2, P(B | H) = 0.1

   $$P(B) = P(D)\,P(B \mid D) + P(M)\,P(B \mid M) + P(H)\,P(B \mid H) = 0.55(0.7) + 0.30(0.2) + 0.15(0.1) = 0.385 + 0.06 + 0.015 = 0.46$$

   a) $$P(D \mid B) = \frac{P(D)\,P(B \mid D)}{P(B)} = \frac{0.385}{0.46} \approx 0.837$$

   b) $$P(M \mid B) = \frac{P(M)\,P(B \mid M)}{P(B)} = \frac{0.06}{0.46} \approx 0.1304$$

   c) $$P(H \mid B) = \frac{P(H)\,P(B \mid H)}{P(B)} = \frac{0.015}{0.46} \approx 0.0326$$

3. Let R be the event that the Professor received the material and A the event that the Professor adopted the book.

   P(R) = 0.8:    P(A | R) = 0.30,   P(A′ | R) = 0.70
   P(R′) = 0.2:   P(A | R′) = 0.10,  P(A′ | R′) = 0.90

   $$P(R \mid A) = \frac{P(R \cap A)}{P(A)} = \frac{P(R)\,P(A \mid R)}{P(R)\,P(A \mid R) + P(R')\,P(A \mid R')} = \frac{0.8(0.30)}{0.8(0.30) + 0.2(0.10)} = \frac{0.24}{0.24 + 0.02} = \frac{0.24}{0.26} = 0.923$$

Learning Objectives
After working through this Chapter, you should be able to
• List the rules of probability.
• Explain conditional probability, independent events and mutually exclusive events.
• Apply Bayes' Theorem to find conditional probabilities.
• Define combinations and permutations and apply these results to problems.

CHAPTER 5

PROBABILITY DISTRIBUTIONS

Reading
Newbold Chapter 4 (not 4.4) and only 5.5 in Chapter 5
Wonnacott and Wonnacott Chapter 4
Tailoka Frank P Chapter 9

Introductory Comments
This Chapter introduces three useful standard distributions: two for counts (discrete probability distributions) and one for continuous measurements (a continuous probability distribution). These are so often used that everyone should be familiar with them. We need to know the mean, the variance and how to find simple probabilities.
5.0 Discrete Random Variables

A random variable may be defined roughly as a variable that takes on different numerical values because of chance. Random variables are classified as either discrete or continuous. A discrete random variable is one that can take on only a finite or countable number of distinct values. A random variable is said to be continuous in a given range if the variable can assume any value in that range. The term continuous random variable implies that the variation takes place along a continuum. Examples of continuous variables include weight, length, velocity, rate of production, dosage of a drug, and the length of life of a given product. While discrete variables can be counted, continuous variables can only be measured with some degree of accuracy.

A probability distribution of a discrete random variable X, whose value at x is f(x), possesses the following properties.

1. $f(x) \ge 0$ for all real values of x
2. $\sum_x f(x) = 1$

Property 1 simply states that probabilities are greater than or equal to zero. The second property states that the sum of the probabilities in a probability distribution is equal to 1. The notation $\sum_x f(x)$ means "sum of the values f(x) for all the values that x takes on".

We will ordinarily use the term probability distribution to refer to both discrete and continuous variables. Other terms are sometimes used to refer to probability distributions (also called probability functions). Probability distributions of discrete random variables are often referred to as probability mass functions, or simply mass functions, because the probabilities are massed at distinct points, for example along the x axis. Probability distributions of continuous random variables are referred to as probability density functions or density functions.

5.1 Cumulative Distribution Functions

Given a random variable X, the value of the cumulative distribution function at x, denoted F(x), is the probability that X takes on values less than or equal to x.
Hence

\[ F(x) = P(X \le x) \qquad (1) \]

In the case of a discrete random variable, it is clear that

\[ F(c) = \sum_{x \le c} f(x) \qquad (2) \]

The symbol $\sum_{x \le c} f(x)$ means "sum of the values of f(x) for all values of x less than or equal to c".

Example 1

Shoprite is interested in diversifying its product line into the soft goods market. Mr Phiri, Vice President in charge of mergers and acquisitions, is negotiating the acquisition of Quicksave, a discount shop. To determine the price Shoprite would have to pay per share for Quicksave, he sets up the probability distribution for the stock price shown in the table below.

Probability distribution and cumulative distribution for the price of Quicksave common stock

Price of Quicksave common stock (x)    Probability f(x)    Cumulative probability F(x)
K74 250                                0.08                0.08
K76 500                                0.15                0.23
K78 750                                0.53                0.76
K81 000                                0.20                0.96
K83 250                                0.04                1.00

The probability that the price would be K78 750 or less is

$P(x \le \text{K78 750}) = F(\text{K78 750}) = 0.08 + 0.15 + 0.53 = 0.76$

Similarly,

$P(x \le \text{K76 500}) = F(\text{K76 500}) = 0.23$

A graph of the cumulative distribution function is a step function, that is, the values change in discrete "steps" at the indicated values of the random variable x.

[Figure: Graph of the cumulative distribution of the price of Quicksave common stock. Horizontal axis: price of stock x, from K74 250 to K83 250; vertical axis: F(x), from 0.00 to 1.00.]

5.2 Probability Distribution of Discrete Random Variables

We will discuss the binomial and Poisson probability distributions of discrete random variables.

The mean (expected value) of a discrete random variable x is

\[ \mu = E(x) = \sum_{\text{all } x} x\,P(x) \]

The variance of a discrete random variable x is

\[ \sigma^2 = E[(x - \mu)^2] = \sum_{\text{all } x} (x - \mu)^2\,P(x) \]

In general, if g(x) is any function of the discrete random variable x, then

\[ E[g(x)] = \sum_{\text{all } x} g(x)\,P(X = x) \]

For example,

$E(20x) = \sum 20x\,P(X = x)$
$E(x^2) = \sum x^2\,P(X = x)$
$E(X - 5) = \sum (x - 5)\,P(X = x)$

Example 2

The random variable X has the following distribution for x = 1, 2, 3, 4.
x            1       2       3       4
P(X = x)     0.02    0.35    0.53    0.10

Calculate:
a) $E(x)$
b) $E(5x - 3)$
c) $E(X^2)$
d) $6E(x) + 8$
e) $E(5x^2 + 2)$

Solution

a) $E(x) = \sum x\,P(X = x)$
   $= 1(0.02) + 2(0.35) + 3(0.53) + 4(0.10)$
   $= 0.02 + 0.70 + 1.59 + 0.40 = 2.71$

b) $E(5x - 3) = 5E(x) - 3 = 5\sum x\,P(X = x) - 3$
   $= 5[1(0.02) + 2(0.35) + 3(0.53) + 4(0.10)] - 3$
   $= 5(2.71) - 3 = 13.55 - 3 = 10.55$

c) $E(X^2) = \sum x^2\,P(X = x)$
   $= 1^2(0.02) + 2^2(0.35) + 3^2(0.53) + 4^2(0.10)$
   $= 0.02 + 1.4 + 4.77 + 1.6 = 7.79$

d) $6E(x) + 8 = 6\sum x\,P(X = x) + 8 = 6(2.71) + 8 = 16.26 + 8 = 24.26$

e) $E(5x^2 + 2) = 5E(x^2) + 2 = 5\sum x^2\,P(X = x) + 2 = 5(7.79) + 2 = 40.95$

In general, the following results hold when X is a discrete random variable.

1) $E(a) = a$, where a is any constant.
2) $E(aX) = aE(X)$, where a is any constant.
3) $E(aX + b) = aE(X) + b$, where a and b are any constants.
4) $E[f_1(x) + f_2(x)] = E[f_1(x)] + E[f_2(x)]$, where $f_1$ and $f_2$ are functions of X.

Variance, Var(x)

As for the variance, the following results are useful.

1) $\mathrm{Var}(a) = 0$, where a is any constant.
2) $\mathrm{Var}(ax) = a^2\,\mathrm{Var}(x)$, where a is any constant.
3) $\mathrm{Var}(ax + b) = a^2\,\mathrm{Var}(x)$, where a and b are any constants.

Example 3

For the data in Example 2, calculate the following:
a) $\mathrm{Var}(5x - 3)$
b) $\mathrm{Var}(4x)$
c) $\mathrm{Var}(3x + 2)$

Solution

a) $\mathrm{Var}(5x - 3) = 25\,\mathrm{Var}(x)$. We will need to find $\mathrm{Var}(x) = E(x^2) - [E(x)]^2$.

   $E(X) = \sum x\,P(X = x) = 2.71$
   $E(X^2) = \sum x^2\,P(X = x) = 7.79$
   $\mathrm{Var}(x) = E(X^2) - [E(x)]^2 = 7.79 - (2.71)^2 = 0.4459$

   Therefore $\mathrm{Var}(5x - 3) = 25\,\mathrm{Var}(x) = 25(0.4459) = 11.1475$

b) $\mathrm{Var}(4x) = 16\,\mathrm{Var}(x) = 16(0.4459) = 7.1344$

c) $\mathrm{Var}(3x + 2) = 9\,\mathrm{Var}(x) = 9(0.4459) = 4.0131$

5.3 The Binomial Distribution

The binomial distribution, in which there are two possible outcomes on each experimental trial, is undoubtedly the most widely applied probability distribution of a discrete random variable.
It has been used to describe a large variety of processes in business and the social sciences as well as other areas. The Bernoulli process, named after James Bernoulli (1654 - 1705), gives rise to the binomial distribution.

The Bernoulli process has the following characteristics.

a) On each trial, there are two mutually exclusive possible outcomes, which are referred to as "success" and "failure". In somewhat different language, the sample space of possible outcomes on each experimental trial is S = {failure, success}.
b) The probability of a success, denoted p, remains constant from trial to trial. The probability of a failure, denoted q, is equal to 1 − p.
c) The trials are independent. That is, the outcome on any given trial or sequence of trials does not affect the outcomes on subsequent trials.

Suppose we toss a coin 3 times; then we may treat each toss as one Bernoulli trial. The possible outcomes on any particular trial are a head and a tail. Assume that the appearance of a head is a success. Which outcome is called a success is an arbitrary choice. For example, we may choose to refer to the appearance of a defective item in a production process as a success; if a series of births is treated as a Bernoulli process, the appearance of a female (or a male) may be classified as a success.

Consider the experiment of tossing a fair coin three times. The possible sequences of outcomes are

HTH, HHH, HHT, THH, TTT, THT, TTH, HTT

Since the probabilities of a success and a failure on a given trial are, respectively, p and q, the probability of the outcome {HTH}, for instance, is $pqp = p^2 q$, where p is the probability of observing a "head" and q is the probability of observing a "tail".

Outcome    Probability
HTH        $pqp = p^2 q$
HHH        $ppp = p^3$
HHT        $ppq = p^2 q$
THH        $qpp = qp^2$
THT        $qpq = q^2 p$
TTT        $qqq = q^3$
TTH        $qqp = q^2 p$
HTT        $pqq = pq^2$

We can obtain the number of such sequences from the formula for the number of combinations of n objects taken x at a time,

\[ C(n, x) = \frac{n!}{x!\,(n - x)!} \]

Thus the number of possible sequences in which two heads can occur is

\[ C(3, 2) = \frac{3!}{2!\,1!} = 3 \]

These are the events {HTH}, {HHT}, {THH}. Therefore the probability of exactly 2 heads is

$P(x = 2) = C(3, 2)\,q p^2$

In the case of the fair coin, we assign a probability of 1/2 to p and 1/2 to q. Hence

$P(x = 2) = C(3, 2)(1/2)(1/2)^2 = 3/8$.

This result may be generalized to obtain the probability of exactly x successes in n trials of a Bernoulli process. Let us assume that n − x failures occurred followed by x successes, in that order. We may then represent this sequence as

qqq . . . q (n − x failures)   ppp . . . p (x successes)

The probability of this particular sequence is $q^{n-x} p^x$. The number of possible sequences of n trials resulting in exactly x successes is $C(n, x)$. Therefore, the probability of obtaining x successes in n trials of a Bernoulli process is given by

\[ f(x) = C(n, x)\,q^{n-x} p^x \quad \text{for } x = 0, 1, 2, \dots, n \]

If we denote by X the random variable "number of successes in these n trials", then

$f(x) = P(X = x)$

The fact that this is a probability distribution is verified by noting the following conditions.

1) $f(x) \ge 0$ for all real values of x
2) $\sum_x f(x) = 1$

Therefore, the term binomial probability distribution, or simply binomial distribution, is usually used to refer to the probability distribution resulting from a Bernoulli process.

In problems where the assumptions of a Bernoulli process are met, we can obtain the probabilities of zero, one, or more successes in n trials from the respective terms of the binomial expansion $(q + p)^n$, where q and p denote the probabilities of failure and success on a single trial and n is the number of trials.

Example 4

The tossing of a fair coin 3 times was used earlier as an example of a Bernoulli process. Compute the probabilities of all possible numbers of heads, and so establish a particular binomial distribution.

Solution

This problem is an application of the binomial distribution for p = 1/2, n = 3. Letting x represent the random variable "number of heads", the probability distribution is as follows:

Number of heads (x)    P(x)
0                      $C(3, 0)(1/2)^0 (1/2)^3 = 1/8$
1                      $C(3, 1)(1/2)^1 (1/2)^2 = 3/8$
2                      $C(3, 2)(1/2)^2 (1/2)^1 = 3/8$
3                      $C(3, 3)(1/2)^3 (1/2)^0 = 1/8$

Example 5

A machine that produces stampings for car engines is not working properly and is producing 15% defectives. The defective and non-defective stampings proceed from the machine in a random manner. If 4 stampings are randomly collected, find the probability that 2 of them are defective.

Solution

Let p = 0.15 be the probability that a single stamping will be defective and let X equal the number of defectives in n = 4 trials. Then q = 1 − p = 1 − 0.15 = 0.85 and

\[ P(x) = C(n, x)\,p^x q^{n-x} = C(4, x)(0.15)^x (0.85)^{4-x} = \frac{4!}{x!\,(4 - x)!}(0.15)^x (0.85)^{4-x} \quad (x = 0, 1, 2, 3, 4) \]

To find the probability of x = 2 defectives in a sample of n = 4, substitute x = 2 into the formula for P(x) to obtain

\[ P(2) = \frac{4!}{2!\,(4 - 2)!}(0.15)^2 (0.85)^2 = 6(0.01625625) = 0.0975375 \approx 0.0975 \]

The mean, variance and standard deviation for a binomial random variable are given by:

Mean: $\mu = np$
Variance: $\sigma^2 = npq$
Standard deviation: $\sigma = \sqrt{npq}$

To calculate the values of µ and σ in Example 5, substitute n = 4 and p = 0.15 into these formulas:

$\mu = np = 4(0.15) = 0.60$
$\sigma = \sqrt{npq} = \sqrt{(4)(0.15)(0.85)} = \sqrt{0.51} = 0.714$

Example 6

Payani Serenje owns 5 stocks. The probability that each stock will rise in price is 0.6. What is the probability that three out of the five stocks will rise in price?

Solution

n = 5, p = 0.6, q = 1 − p = 0.4

Let x be the number of stocks that rise in price; then

\[ P(X = 3) = C(5, 3)(0.6)^3 (0.4)^2 = \frac{5!}{3!\,2!}(0.216)(0.16) = \frac{(5)(4)}{2}(0.216)(0.16) = 0.3456 \approx 0.346 \]

From the tables, with n = 5 and p = 0.6,

$P(3) = P(X \le 3) - P(X \le 2) = 0.663 - 0.317 = 0.346$

5.4 The Poisson Distribution

The Poisson distribution is named after the French physicist and mathematician Siméon Denis Poisson, who worked in the early 1800s.
The Poisson distribution is a discrete probability distribution which has the following formula:

\[ P(x) = \frac{\mu^x e^{-\mu}}{x!}, \quad \text{for } x = 0, 1, 2, \dots \]

where P(x) is the probability that a variable with a Poisson distribution equals x, µ is the mean or expected value of the Poisson distribution, and e, approximately 2.718, is the base of the natural logarithms.

One reason why the Poisson distribution is important in statistics is that it can be used as an approximation to the binomial distribution. If n (the number of trials) is large and p (the probability of success) is small, the binomial probability can be approximated by the Poisson distribution with µ = np. Experience indicates that the approximation is adequate for most practical purposes if n is at least 20 and p is no greater than 0.05.

The Poisson distribution has been used to describe the probability function of situations such as the following.

1) Product demand
2) Demand for service
3) Number of telephone calls that come through a switchboard
4) Number of death claims per day received by an insurance company
5) Number of breakdowns of an electronic computer per month

All the preceding have two elements in common.

1) The given occurrence can be described in terms of a discrete random variable, which takes on the values 0, 1, 2, and so forth.
2) There is some rate that characterizes the process producing the outcome. The rate is the number of occurrences per interval of time or space.

For instance, product demand can be characterized by the number of units purchased in a specified period. Product demand may be viewed as a process that produces random occurrences in continuous time.
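The Poisson formula and its use as an approximation to the binomial can be checked numerically. The following is a minimal sketch in Python, not part of the original text; the function names `poisson_pmf` and `binomial_pmf` and the values n = 100, p = 0.02 are assumptions chosen only for illustration (they satisfy the rule of thumb n ≥ 20, p ≤ 0.05).

```python
import math

def poisson_pmf(x, mu):
    """P(X = x) for a Poisson variable with mean mu: mu^x e^(-mu) / x!"""
    return mu ** x * math.exp(-mu) / math.factorial(x)

def binomial_pmf(x, n, p):
    """P(X = x) for a binomial variable: C(n, x) p^x q^(n - x)."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

# Hypothetical illustration: n is large and p is small, so mu = np.
n, p = 100, 0.02
mu = n * p

# Print the exact binomial probability next to its Poisson approximation.
for x in range(5):
    exact = binomial_pmf(x, n, p)
    approx = poisson_pmf(x, mu)
    print(x, round(exact, 4), round(approx, 4))
```

Running the loop prints the two probabilities side by side for x = 0 to 4, so the size of the approximation error can be inspected directly.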
The characteristics of a Poisson distribution are as follows:

1) The experiment consists of counting the number of times a particular event occurs during a given unit of time, or in a given area or volume (or any other unit of measurement).
2) The probability that an event occurs in a given unit of time, area, or volume is independent of the number that occur in other units.

Example 7

Suppose X, the number of a company's absent employees on Tuesdays, has (approximately) a Poisson probability distribution. Assume that the average number of Tuesday absentees is 3.4.

a) Find the mean and standard deviation of x, the number of absent employees on Tuesday.
b) Find the probability that exactly 3 employees are absent on a given Tuesday.
c) Find the probability that at least two employees are absent on a Tuesday.

Solution

a) The mean and variance of a Poisson distribution are both equal to µ. Thus for this example

   $\mu = 3.4$, $\sigma^2 = 3.4$

   Therefore the standard deviation is $\sigma = \sqrt{3.4} = 1.84$

b) We want the probability that exactly three employees are absent on Tuesday. The probability distribution for x is

   \[ P(x) = \frac{\mu^x e^{-\mu}}{x!} \]

   Here µ = 3.4, x = 3, and $e^{-3.4} = 0.033373$ (from Table 2). Thus

   \[ P(3) = \frac{(3.4)^3 e^{-3.4}}{3!} = \frac{(3.4)^3 (0.033373)}{6} = 0.2186 \]

c) To find the probability that at least two employees are absent on Tuesday, we need to find

   \[ P(X \ge 2) = P(2) + P(3) + \dots = \sum_{x=2}^{\infty} P(x) \]

   Alternatively, we can use the complementary event:

   \[ P(X \ge 2) = 1 - P(X \le 1) = 1 - [P(0) + P(1)] = 1 - \left[ \frac{(3.4)^0 e^{-3.4}}{0!} + \frac{(3.4)^1 e^{-3.4}}{1!} \right] \]
   $= 1 - [0.033373 + (3.4)(0.033373)]$
   $= 1 - 0.1468412 = 0.8531588 \approx 0.8532$

Example 8

On Saturdays at Southdown, a small airport in Kalulushi, airplanes arrive at an average of 3 for the one-hour period 13 00 hours to 14 00 hours. If these arrivals are distributed according to the Poisson probability distribution, what are the probabilities that:

a) Exactly zero airplanes will arrive between 13 00 hours and 14 00 hours next Saturday?
b) Either one or two airplanes will arrive between 13 00 hours and 14 00 hours next Saturday?
c) A total of exactly two airplanes will arrive between 13 00 hours and 14 00 hours during the next three Saturdays?

Solution

a) µ = 3, and we let X be the number of arrivals during the specified time period.

   \[ P(0) = \frac{3^0 e^{-3}}{0!} \approx 0.049787068 \approx 0.0498 \]

   (From the table, we have 0.049787.)

b) \[ P(X = 1 \text{ or } X = 2) = P(X = 1) + P(X = 2) = \frac{3^1 e^{-3}}{1!} + \frac{3^2 e^{-3}}{2!} = e^{-3}\left(3 + \frac{9}{2}\right) = \frac{15}{2}(0.049787068) = 0.37340301 \approx 0.3734 \]

c) A total of exactly two arrivals in three Saturdays during the period 13 00 hours to 14 00 hours can be obtained, for example, by having two arrivals on the first day, none on the second day, and none on the third day during the specified one-hour period. The ways in which the event in question can occur are shown in the table below.

   Number of arrivals
   Saturday 1    Saturday 2    Saturday 3
   2             0             0
   0             2             0
   0             0             2
   1             1             0
   1             0             1
   0             1             1

   Number of ways of obtaining a total of exactly 2 arrivals in 3 Saturdays.

   The required probability is therefore

   \[ 3[P(X = 2)][P(X = 0)]^2 + 3[P(X = 1)]^2 [P(X = 0)] = 3\left(\frac{3^2 e^{-3}}{2!}\right)\left(\frac{3^0 e^{-3}}{0!}\right)^2 + 3\left(\frac{3^1 e^{-3}}{1!}\right)^2\left(\frac{3^0 e^{-3}}{0!}\right) \]
   \[ = 3e^{-9}\left(\frac{9}{2} + 9\right) = \frac{81 e^{-9}}{2} = \frac{81}{2}(0.000123) = 0.0049815 \approx 0.005 \]

5.5 Continuous Random Variables

The probability distributions of continuous random variables are also important in statistical theory. They give a theoretical representation of a continuous random variable such as the time taken in minutes to do some work, or the mass in grammes of a bag of salt.

A continuous random variable is specified by its probability density function, which is written f(x), where $f(x) \ge 0$ throughout the range of values for which x is defined. The probability density function (p.d.f.) can be represented by a curve, and the probabilities are given by the area under the curve.
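The idea that probabilities are areas under the density curve can be illustrated numerically. The following is a rough sketch in Python, not taken from the original text; the density f(x) = 2x on the range 0 to 1 and the helper name `area_under` are hypothetical choices made only for this illustration.

```python
def area_under(f, a, b, steps=10_000):
    """Approximate the area under f between a and b (trapezoidal rule)."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h

def pdf(x):
    """Hypothetical density: f(x) = 2x for 0 <= x <= 1, and 0 elsewhere."""
    return 2 * x if 0 <= x <= 1 else 0.0

# The total area under a density over its whole range should be 1.
total_area = area_under(pdf, 0, 1)

# A probability is the area over the interval of interest.
# For this density the exact value is 0.5**2 - 0.2**2 = 0.21.
prob = area_under(pdf, 0.2, 0.5)

print(total_area, prob)
```

Any other non-negative function whose total area is 1 could be substituted for `pdf` in the same way.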
For a continuous random variable x that assumes a value in the interval a < x < b,

\[ P(a < x < b) = \int_a^b f(x)\,dx, \]

assuming the integral exists. Similar to the requirements for a discrete probability distribution, we require $f(x) \ge 0$ and, when a and b are the limits of the range of x,

\[ \int_a^b f(x)\,dx = 1. \]

If x is a continuous random variable with p.d.f. f(x), then

\[ \mathrm{var}(x) = \int_a^b x^2 f(x)\,dx - \mu^2, \quad \text{where } \mu = E(x) = \int_a^b x f(x)\,dx. \]

The standard deviation of x is often written as $\sigma = \sqrt{\mathrm{var}(x)}$.

5.6 The Normal Distribution

The normal distribution plays a central role in statistical theory and practice, particularly in the area of statistical inference. An important characteristic of the normal distribution is that we need to know only the mean and standard deviation to compute the entire distribution. The normal probability distribution is defined by the equation

\[ f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}} \]

The normal distribution is perfectly symmetric about its mean µ. Computing the area over intervals under the normal probability distribution is a difficult task. As a result, we will use the computed areas listed in Table 3.

Example 1

Suppose you have a normal random variable x with µ = 50 and σ = 15. Find the probability that x will fall within the interval 30 < x < 70.

Solution

We compute the Z-score (or standard score) for the measurement x. The standard score is defined by

\[ Z = \frac{\text{Value} - \text{Mean}}{\text{Standard deviation}} = \frac{x - \mu}{\sigma} \]

Thus, for x = 30,

\[ Z = \frac{30 - 50}{15} = -1.33 \]

Because x = 30 lies to the left of the mean, the corresponding Z-score is negative and of the same numerical value as the Z-score corresponding to x = 70:

\[ Z = \frac{70 - 50}{15} = \frac{20}{15} = +1.33 \]

[Figure: Normal frequency function f(x) with µ = 50, σ = 15, showing the area A between 50 and 70; the horizontal axis is marked at 30, 50 and 70.]

To find the area corresponding to a Z-score of 1.33, we first locate the value 1.3 in the left-hand column. Since this column lists Z values to one decimal place only, we refer to the top row of the table to get the second decimal place, 0.03.
Finally, we locate the number where the row labeled Z = 1.3 and the column labeled 0.03 meet. This number represents the area between the mean µ and the measurement that has a Z-score of 1.33:

A = 0.4082

That is, the probability that x will fall between 50 and 70 is 0.4082. By symmetry, the probability that x will fall between 30 and 50 is also 0.4082. Thus the required probability is 2(0.4082) = 0.8164.

Example 2

Use Table 1 to determine the area to the right of the Z-score 1.64 for the standard normal distribution, i.e., find P(Z > 1.64).

Solution

[Figure: Standard normal distribution, µ = 0, σ = 1, showing the area A between 0 and 1.64.]

The probability that a normal random variable will fall more than 1.64 standard deviations to the right of its mean is indicated in the figure above. Because the normal distribution is symmetric, half of the total probability (0.5) lies to the right of the mean and half to the left. Therefore, the desired probability is

$P(Z > 1.64) = 0.5 - A$

where A is the area between µ = 0 and Z = 1.64 as shown in the figure. Referring to Table 1, the area A corresponding to Z = 1.64 is 0.4495, so

$P(Z > 1.64) = 0.5 - A = 0.5 - 0.4495 = 0.0505$.

Example 3

Find the probability that the value of the standard normal variable will be between −1.23 and +1.14.

Solution

Table 1 shows that the area under the standard normal curve between 0 and 1.23 is 0.3907, so the area between 0 and −1.23 must also be 0.3907. Table 1 shows that the area between 0 and 1.14 is 0.3729. Thus, the area between −1.23 and +1.14 equals 0.3907 + 0.3729 = 0.7636, which means that the probability we want equals 0.7636.

[Figure: Standard normal curve with the area between −1.23 and +1.14 indicated.]

Example 4

Find the probability that the value of the standard normal variable will be between 0.43 and 1.55.

Solution

[Figure: Standard normal curve with the area between 0.43 and 1.55 indicated.]

From Table 1, the area between 0 and 1.55 is 0.4394 and that between 0 and 0.43 is 0.1664. Therefore the area between 0.43 and 1.55 is 0.4394 − 0.1664 = 0.2730.

The Normal Distribution as an Approximation to the Binomial Distribution
If n (the number of trials) is large and p (the probability of success) is not too close to 0 or 1, the probability distribution of the number of successes occurring in n Bernoulli trials can be approximated by a normal distribution. Experience indicates that the approximation is fairly accurate as long as np > 5 when p ≤ 1/2 and n(1 − p) > 5 when p > 1/2.

Example 5

The probability that a machine will be down for repairs next week is 1/2. A firm has 100 such machines, and whether one is down is statistically independent of whether another is down. What is the probability that at least 60 machines will be down?

Solution

The number of machines down for repair has a binomial distribution with mean equal to 100(1/2), or 50, and standard deviation equal to $\sqrt{100(1/2)(1/2)}$, or 5. Because of the continuity correction, the probability that the number down for repairs is 60 or more can be approximated by the probability that the value of a normal variable with mean equal to 50 and standard deviation equal to 5 exceeds 59.50. The value of the standard normal variable corresponding to 59.50 is (59.50 − 50) ÷ 5, or 1.9. Table 3 shows that the area under the standard normal curve between zero and 1.9 is 0.4713, so the area to the right of 1.9 must equal 0.5000 − 0.4713 = 0.0287. This is the probability that at least 60 machines will be down for repair.

Learning Objectives

After working through this Chapter, you should be able to:
• Give the formal definition of a random variable, and distinguish between a random variable and the values it takes.
• Explain the difference between continuous and discrete random variables.
• Discuss such distributions as the Binomial, Poisson and Normal, and calculate probabilities of events for such random variables.
• Find the mean and the variance of the Binomial, Poisson and Normal distributions.

Sample Examination Questions

1. a) It is estimated that 75% of a grapefruit crop is good; the other 25% have rotten centers that cannot be detected unless the grapefruit is cut open.
The grapefruit are sold in sacks of 6. Let r be the number of good grapefruit in the sack.

   i) Make a histogram of the probability distribution of r.
   ii) What is the probability of getting no more than one bad grapefruit in a sack?
   iii) What is the probability of getting at least one grapefruit in a sack?
   iv) What is the expected number of good grapefruit in a sack?
   v) What is the standard deviation of the r probability distribution?

b) Let x have a normal distribution with µ = 10 and σ = 2. Find the probability that an x value selected at random from the distribution is between 11 and 14.

2. a) In a lottery, you pay K12 500 to choose a number (integer) between 0 and 9999, inclusive. If the number is drawn, you win K12 500 000. What is your expected gain (or loss) per play?

b) A large hotel knows that on average 2% of its customers require a special diet for medical reasons. It is hosting a conference for 500 people.

   i) Which probability distribution would you suggest for calculating the exact probability that no customer at the conference will require a special diet? Calculate this probability.
   ii) Which probability distribution do you suggest as an approximation to this and why? Calculate an approximate probability that no customers require a special diet.
   iii) Compare your answers to (i) and (ii).
   iv) From past records the hotel knows that 0.2% of its customers will require medical attention while staying in the hotel. Calculate the exact and approximate probability that no customer out of the 500 will require medical attention while attending the conference. Is this approximation better or worse than the approximation used in (ii)? Why?

3. a) The table below shows the probabilities for the number of complaints received each day by a newspaper agency from customers not receiving a paper.

   No. of complaints    8       9       10      11      12
   Probability          0.35    0.42    0.18    0.03    0.02

   i) Find the mean and standard deviation of the number of complaints.
   ii) The agency states the cost (in kwacha) of daily complaints to be C = 600 + 300x, where x is the number of complaints. Find the mean and standard deviation of the cost of daily complaints.

b) A writer has prepared to submit six articles for publication. The probability of any article being accepted is 0.20. Assuming independence, find the probability that the writer will have

   i) exactly one article accepted.
   ii) at least two articles accepted.
   iii) no more than three articles accepted.
   iv) at most two articles accepted.

4. a) A Toyota dealer wishes to know how many citations to order for the coming month. Estimated demand is normally distributed, with a standard deviation of 20 and a mean of 120.

   i) What is the probability that he will need more than 160?
   ii) What is the probability that he will need less than 90?

b) A client wishes to know what price he might be able to get for a business property. The realtor estimates that a sale price for that property of K600 million would be exceeded no more than 5% of the time. A price of at least K420 million should be obtained at least 90% of the time. Assuming the distribution of sale prices to be normal, answer the following questions.

   i) What are µ and σ for this distribution?
   ii) What is the probability of a sale price greater than K540 million, less than K640 million, and between K540 million and K600 million?

5. a) Which of the following are continuous variables, and which are discrete variables?

   i) Number of traffic fatalities per year in the town of Livingstone.
   ii) Distance a ball travels after being kicked by a soccer player.
   iii) Time required to drive from home to campus on any given day.
   iv) Number of cars in Kitwe on any given day.
   v) Your weight before breakfast each morning.

b) The ABCD Mother-in-law sociologists say that 80% of married women claim that their husbands' mothers are the biggest bones of contention in their marriages (sex and money are lower-rated areas of contention).
Suppose that five married women are having lunch together one afternoon. What is the probability that:

   i) All of them dislike their mother-in-law?
   ii) None of them dislikes her mother-in-law?
   iii) At least four of them dislike their mother-in-law?
   iv) No more than three of them dislike their mother-in-law?

c) The Mulenga Café has found that about 6% of the parties who make reservations don't show up. If 90 party reservations have been made, how many can be expected to show up? Find the standard deviation of this distribution.

6. a) The mean and standard deviation on an examination are 85 and 15 respectively. Find the scores in standard units of students receiving the grades

   i) 65
   ii) 89

b) Determine the probabilities

   i) $P(Z \le 2.12)$
   ii) $P(-16 \le Z < 1.13)$

   where Z is assumed to be normal with mean 0 and variance 1.

c) What is the probability of obtaining at least 1280 heads if a coin is tossed 2500 times and heads and tails are equally likely?

d) The side effects of a certain drug cause discomfort to only a few patients. The probability that any individual will suffer from the side effects is 0.005. If the drug is given to 35 000 patients, what is the probability that three (3) will suffer side effects?

7. a) The customer service center in a large Lusaka department store has determined that the amount of time spent with a customer with a complaint is normally distributed with a mean of 9.3 minutes and a standard deviation of 2.5 minutes. What is the probability that, for a randomly chosen customer with a complaint, the amount of time spent resolving the complaint will be:

   i) less than 10 minutes?
   ii) more than 5 minutes?
   iii) between 8 and 15 minutes?

b) A car rental company has determined that the probability a car will need service work in any given month is 0.25. The company has 850 cars.

   i) What is the probability that more than 150 cars will require service work in a particular month?
   ii) What is the probability that fewer than 180 cars will need service work in a given month? (Give reasons for the method used to calculate the probabilities in (i) and (ii).)

c) A contractor estimates the probabilities for the number of days required to complete a certain type of construction project as follows.

   Time (days)    1       2       3       4       5
   Probability    0.04    0.21    0.34    0.31    0.10

   i) What is the probability that a randomly chosen project will take less than 3 days to complete?
   ii) Find the expected time to complete a project.
   iii) Find the standard deviation of the time required to complete a project.
   iv) The contractor's project cost is made up of two parts: a fixed cost of K100,000,000 plus K10,000,000 for each day taken to complete the project. Find the standard deviation of total project costs.

CHAPTER 6
SAMPLING AND SAMPLING DISTRIBUTION

Reading
Newbold Chapter 6
Wonnacott and Wonnacott Chapter 6
Tailoka Frank P Chapter 10
James T McClave and P George Benson Chapter 7

Introductory Comments

We now start on the work that defines Statistics as a different and unique subject. The idea of sampling and of the sampling distribution for a statistic like the mean must be clearly understood by all users of statistics. This is not an easy Chapter to understand.

6. Sampling Theory: Sampling and Sampling Distribution

6.1 Sampling

If we draw an object from a box, we have the choice of replacing or not replacing the object into the box before we draw again. In the first case a particular object can come up again and again, whereas in the second it can come up only once. Sampling where each member of a population may be chosen more than once is called sampling with replacement, while sampling where each member cannot be chosen more than once is called sampling without replacement.

Random Samples.
Random Numbers Clearly the reliability of conclusions drawn concerning a population depends on whether the sample is properly chosen so as to represent the population 83 sufficiently well, and one of the important problems of statistical inference is just how to choose a sample. The way to do this for finite population is to make sure that each members of the population has the same chance of being in the Sample, which os often called a random sample. Random sampling can be accomplished for relatively small populations by drawing lots or equivalently, by using a table of random numbers specially constructed for such purposes. Because inference from sample to population cannot be certain we must use the language of probability in any statement of conclusions. 6.2 Sampling Distributions As we have seen, a sample statistic that is computed from X 1 , . . . , X n is a function of these random variables and is therefore itself a random variable. The probability distribution of a sample statistic is often called the sampling distribution of the statistic. Alternatively, we can consider all possible sample of size n that can be drawn from the population, and for each sample we compute the statistic. In this manner we obtain the distribution of the statistic, which is its sampling distribution. For a sampling distribution, we can of course compute a mean, variance, standard deviation, etc. The standard deviation is sometimes also called the standard error. The Sample Mean Let X 1 , X 2 , . . . X n denote the independent, identically distributed random variables for a random sample of size n as described above. Then the mean of the sample or sample mean is a random variable defined by x= X1 +X 2 + . . . + X n n → (1) If x1 , x2 , . . ., xn denote the values obtained in a particular sample of size b, then the mean x + x2 + . . . 
+ xn for that sample is denoted by x = 1 → ( 2) 2 Sampling Distributions of Means 84 Let f ( x) be the probability distribution of some given population from which we draw a sample of size n. Then it is natural to look for the probability distribution of the sample statistics x , which is called the sampling distribution for the sample mean, or the sampling distribution of mean. The following theorems are important in this connection. Theorem 6.1 The mean of the sampling means denoted by µ x = µ → (3) Where µ is the mean of the population. Theorem 6 – 1 states that the expected value of the sample mean is the population mean. Theorem 6.2 If a population is infinite and the sampling ir random or if the population is finite and sampling is with replacement, then the variance of the sampling distribution of means, denoted by σ x2 , is given by [ ] E ( x − µ ) 2 = σ x2 = σ2 n Theorem 6.3 If the population is of size N, if sampling is without replacement, and if the sample size σ 2 N − n is n ≤ N , then the previous equation is replaced by σ x2 = → (5) n N − 1 While µ x is from Theorem 6.1. Note that Theorem 6.3 is basically the same as 6.2 as N → ∞ Theorem 6.4 If the population from which samples are taken is normally distributed with mean µ and variance σ 2 , then the sample mean is normally distributed with mean µ and variance σ2 n . Theorem 6.5 85 Suppose that the population from which samples are taken has a probability with mean µ and variance σ 2 , that is not necessarily a normal distribution. Then the standardized variable associated with x , given by Z= x−µ → σ ( 6) n is asymptotically normal, i.e. lim n→∞ P( Z ≤ z ) = 1 2π z ∫e − µ2 2 du → ( 7) −∞ Theorem 6.5 is a consequence of the Central limit theorem. It is assumed here that the population is infinite or that sampling is with replacement. Otherwise, the above is correct if we replace σ n in Theorem 6.5 by σ x2 as given in theorem 6.3. 
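Theorems 6.1 and 6.3 can be checked directly on a small population. The sketch below is only an illustration: it reuses the six measurements 1, 2, 3, 4, 5, 7 from Example 1 of Chapter 1, enumerates every sample of size 2 drawn without replacement, and compares the mean and variance of the resulting sample means with the two formulas. The function name `sampling_distribution_check` is hypothetical, and exact fractions are used so that the comparison is not blurred by rounding.

```python
from fractions import Fraction
from itertools import combinations


def sampling_distribution_check(population, n):
    """Enumerate all samples of size n (without replacement) and compare
    the sampling distribution of the mean with Theorems 6.1 and 6.3."""
    N = len(population)
    mu = Fraction(sum(population), N)
    # Population variance (divide by N, not N - 1).
    sigma2 = sum((Fraction(x) - mu) ** 2 for x in population) / N

    # One sample mean for each of the C(N, n) equally likely samples.
    means = [Fraction(sum(s), n) for s in combinations(population, n)]
    mean_of_means = sum(means) / len(means)
    var_of_means = sum((m - mean_of_means) ** 2 for m in means) / len(means)

    # Theorem 6.3: sigma^2 / n times the finite population correction.
    theory_var = sigma2 / n * Fraction(N - n, N - 1)
    return mu, mean_of_means, var_of_means, theory_var


mu, mean_of_means, var_of_means, theory_var = sampling_distribution_check(
    [1, 2, 3, 4, 5, 7], 2
)
print("population mean:", mu, " mean of sample means:", mean_of_means)
print("variance of sample means:", var_of_means, " Theorem 6.3:", theory_var)
```

For this population both means come out as 11/3 and both variances as 14/9, so the enumeration agrees with equations (3) and (5).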
Example 1.0
Five hundred ball bearings have a mean weight of 5.02kg and a standard deviation of 0.30kg. Find the probability that a random sample of 100 ball bearings chosen from this group will have a combined weight of more than 510kg.

For the sampling distribution of means, $\mu_{\bar{x}} = \mu = 5.02$kg, and since sampling is without replacement from a finite population,

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}} = \frac{0.30}{\sqrt{100}} \sqrt{\frac{500 - 100}{500 - 1}} = 0.027$$

The combined weight will exceed 510kg if the mean weight of the 100 bearings exceeds 5.10kg.

$$5.10 \text{ in standard units} = \frac{5.10 - 5.02}{0.027} = 2.96$$

The required probability is the area to the right of z = 2.96.

Figure 6.1: Area under the standard normal curve to the right of z = 2.96.

The probability is 0.5 - 0.4985 = 0.0015. Therefore, there are only 3 chances in 2000 of picking a sample of 100 ball bearings with a combined weight exceeding 510kg.

Sampling Distribution of Proportions
Suppose that a population is infinite and binomially distributed, with p and q = 1 - p being the respective probabilities that any given member exhibits or does not exhibit a certain property. For example, the population may be all possible tosses of a fair coin, in which the probability of the event heads is $p = \frac{1}{2}$.

Consider all possible samples of size n drawn from this population, and for each sample determine the statistic that is the proportion $\hat{p}$ of successes. In the case of the coin, $\hat{p}$ would be the proportion of heads turning up in n tosses. Then we obtain a sampling distribution whose mean $\mu_{\hat{p}}$ and standard deviation $\sigma_{\hat{p}}$ are given by

$$\mu_{\hat{p}} = p, \qquad \sigma_{\hat{p}} = \sqrt{\frac{pq}{n}} = \sqrt{\frac{p(1 - p)}{n}} \qquad (8)$$

For finite populations in which sampling is without replacement, the equation for $\sigma_{\hat{p}}$ in (8) is replaced by $\sigma_{\bar{x}}$ as given by Theorem 6.3 with $\sigma = \sqrt{pq}$.

For large values of n ($n \ge 30$) the sampling distribution is very nearly a normal distribution, as seen from Theorem 6.5.

Sampling Distribution of Differences and Sums
Suppose that we are given two populations.
For each sample of size $n_1$ drawn from the first population, let us compute a statistic $S_1$. This yields a sampling distribution for $S_1$ whose mean and standard deviation we denote by $\mu_{S_1}$ and $\sigma_{S_1}$, respectively. Similarly, for each sample of size $n_2$ drawn from the second population, let us compute a statistic $S_2$ whose mean and standard deviation are $\mu_{S_2}$ and $\sigma_{S_2}$, respectively. Taking all possible combinations of these samples from the two populations, we can obtain a distribution of the differences $S_1 - S_2$, which is called the sampling distribution of differences of the statistics. The mean and standard deviation of this sampling distribution, denoted respectively by $\mu_{S_1 - S_2}$ and $\sigma_{S_1 - S_2}$, are given by

$$\mu_{S_1 - S_2} = \mu_{S_1} - \mu_{S_2}, \qquad \sigma_{S_1 - S_2} = \sqrt{\sigma_{S_1}^2 + \sigma_{S_2}^2} \qquad (9)$$

provided that the samples chosen do not in any way depend on each other, i.e., the samples are independent (in other words, the random variables $S_1$ and $S_2$ are independent).

If, for example, $S_1$ and $S_2$ are the sample means from two populations, denoted by $\bar{x}_1$ and $\bar{x}_2$ respectively, then the sampling distribution of the differences of means is given, for infinite populations with means and standard deviations $\mu_1, \sigma_1$ and $\mu_2, \sigma_2$ respectively, by

$$\mu_{\bar{x}_1 - \bar{x}_2} = \mu_{\bar{x}_1} - \mu_{\bar{x}_2} = \mu_1 - \mu_2 \qquad (10)$$

and

$$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\sigma_{\bar{x}_1}^2 + \sigma_{\bar{x}_2}^2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \qquad (11)$$

using Theorems 6.1 and 6.2. This result also holds for finite populations if sampling is done with replacement. The standardized variable

$$Z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

in that case is very nearly normally distributed if $n_1$ and $n_2$ are large ($n_1, n_2 \ge 30$). Similar results can be obtained for finite populations in which sampling is without replacement by using Theorems 6.1 and 6.3.
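As a rough illustration of the standardized variable for a difference of means, the sketch below computes the standard error $\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$, the resulting Z value, and the corresponding upper-tail area of the standard normal curve. The function name and all input values are hypothetical and chosen only for convenience; independent large samples from infinite populations are assumed.

```python
from math import sqrt
from statistics import NormalDist


def z_for_difference_of_means(xbar1, xbar2, mu1, mu2, sigma1, sigma2, n1, n2):
    """Standardized difference of two independent sample means."""
    # Standard error of the difference of the two sample means.
    standard_error = sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)
    return ((xbar1 - xbar2) - (mu1 - mu2)) / standard_error


# Hypothetical inputs: both populations have mean 50, the observed
# sample means differ by 2, and the sample sizes are 64 and 36.
z = z_for_difference_of_means(
    xbar1=52.0, xbar2=50.0, mu1=50.0, mu2=50.0,
    sigma1=4.0, sigma2=3.0, n1=64, n2=36,
)
upper_tail = 1 - NormalDist().cdf(z)  # P(Z > z) under the normal approximation
print(round(z, 2), round(upper_tail, 4))
```

With these made-up inputs the standard error is $\sqrt{0.5}$, so a difference of 2 between the sample means corresponds to a Z value of about 2.83.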
Corresponding results can be obtained for sampling distributions of differences of proportions from two binomially distributed populations with parameters $p_1, q_1$ and $p_2, q_2$. The mean and standard deviation of the difference are given by

$$\mu_{P_1 - P_2} = \mu_{P_1} - \mu_{P_2} = p_1 - p_2 \qquad (13)$$

and

$$\sigma_{P_1 - P_2} = \sqrt{\sigma_{P_1}^2 + \sigma_{P_2}^2} = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}} \qquad (14)$$

Instead of taking differences of statistics, we are sometimes interested in the sum of statistics. In that case, the sampling distribution of the sum of statistics $S_1$ and $S_2$ has mean and standard deviation given by

$$\mu_{S_1 + S_2} = \mu_{S_1} + \mu_{S_2}, \qquad \sigma_{S_1 + S_2} = \sqrt{\sigma_{S_1}^2 + \sigma_{S_2}^2} \qquad (15)$$

assuming the samples are independent. Results similar to (10) and (11) can then be obtained.

Example 2
It has been found that 2% of the tools produced by a certain machine are defective. What is the probability that in a shipment of 400 such tools, 3% or more will prove defective?

$$\mu_{\hat{p}} = p = 0.02, \qquad \sigma_{\hat{p}} = \sqrt{\frac{pq}{n}} = \sqrt{\frac{0.02(0.98)}{400}} = \frac{0.14}{20} = 0.007$$

$$P(\hat{p} \ge 0.03) = P\left(Z > \frac{0.03 - 0.02}{0.007}\right) = P(Z > 1.43) = 0.5000 - 0.4236 = 0.0764$$

Learning Objectives
After working through this Chapter, you should be able to:
• Give the formal definition of a random variable, and distinguish between a random variable and the values it takes.
• Explain the difference between continuous and discrete random variables.
• Discuss distributions such as the binomial, Poisson and normal, and calculate probabilities of events for such random variables.
• Find the mean and the variance of the binomial, Poisson and normal distributions.

CHAPTER 7
ESTIMATION

Reading
Newbold Chapter 7
Wonnacott and Wonnacott Chapter 7
Tailoka Frank P Chapter 10

Introductory Comments
We need to know how the mean of the population is related to the sample mean, and what characteristics the sample mean must have. We also need to know whether the sample is likely to give us an estimate close to the population value. To tell us this, we use confidence intervals.

7. Estimation Theory

7.1 Unbiased Estimates and Efficient Estimates
A statistic is called an unbiased estimator of a population parameter if the mean or expectation of the statistic is equal to the parameter. The corresponding value of the statistic is then called an unbiased estimate of the parameter.

If the sampling distributions of two statistics have the same mean, the statistic with the smaller variance is called a more efficient estimator of the mean. The corresponding value of the efficient statistic is then called an efficient estimate. Clearly one would in practice prefer to have estimators that are both efficient and unbiased, but this is not always possible.

7.2 Point Estimates and Interval Estimates
An estimate of a population parameter given by a single number is called a point estimate of the parameter. An estimate of a population parameter given by two numbers between which the parameter may be considered to lie is called an interval estimate of the parameter.

Example 7.0
If we say that a distance is 34.5km, we are giving a point estimate. If, on the other hand, we say that the distance is 34.5 ± 0.04km, i.e., the distance lies between 34.46 and 34.54km, we are giving an interval estimate.

A statement of the error or precision of an estimate is often called its reliability.

7.3 Confidence Interval Estimates of Population Parameters
Let $\mu_S$ and $\sigma_S$ be the mean and standard deviation (standard error) of the sampling distribution of a statistic S. Then if the sampling distribution of S is approximately normal (which we have seen is true for many statistics if the sample size $n \ge 30$), we can expect to find S lying in the intervals $\mu_S - \sigma_S$ to $\mu_S + \sigma_S$, $\mu_S - 2\sigma_S$ to $\mu_S + 2\sigma_S$, or $\mu_S - 3\sigma_S$ to $\mu_S + 3\sigma_S$ about 68%, 95% and 99.7% of the time respectively.

Equivalently, we can expect to find, or we can be confident of finding, $\mu_S$ in the intervals $S - \sigma_S$ to $S + \sigma_S$, $S - 2\sigma_S$ to $S + 2\sigma_S$, or $S - 3\sigma_S$ to $S + 3\sigma_S$ about 68%, 95% and 99.7% of the time respectively.
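The 68%, 95% and 99.7% figures quoted above are areas under the standard normal curve, and they can be reproduced numerically. The following is a minimal sketch using `NormalDist` from Python's standard `statistics` module; the helper name `central_area` is hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1


def central_area(k):
    """Area under the standard normal curve between -k and +k."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} standard error(s): {central_area(k):.4f}")
```

The three areas are approximately 0.6827, 0.9545 and 0.9973, which the text rounds to 68%, 95% and 99.7%.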
Because of this, we call these respective intervals the 68%, 95% and 99.7% confidence intervals for estimating $\mu_S$ (i.e., for estimating the population parameter, in the case of an unbiased S). The end numbers of these intervals ($S \pm \sigma_S$, $S \pm 2\sigma_S$, $S \pm 3\sigma_S$) are then called the 68%, 95% and 99.7% confidence limits.

Similarly, $S \pm 1.96\sigma_S$ and $S \pm 2.58\sigma_S$ are the 95% and 99% confidence limits for $\mu_S$. The percentage confidence is often called the confidence level. The numbers 1.96, 2.58, etc., in the confidence limits are called critical values and are denoted by $Z_c$. From confidence levels, we can find critical values.

7.4 Confidence Intervals for Means
We shall see how to create confidence intervals for the mean of a population in two different cases. The first case is when we have a large sample ($n \ge 30$), and the second case is when we have a small sample ($n < 30$) and the underlying population is normal.

Large samples ($n \ge 30$)
If the statistic S is the sample mean $\bar{x}$, then the 95% and 99% confidence limits for estimation of the population mean $\mu$ are given by $\bar{x} \pm 1.96\sigma_{\bar{x}}$ and $\bar{x} \pm 2.58\sigma_{\bar{x}}$, respectively. More generally, the confidence limits are given by $\bar{x} \pm Z_c \sigma_{\bar{x}}$, where $Z_c$ depends on the particular level of confidence desired. The confidence limits for the population mean are given by

$$\bar{x} \pm Z_c \frac{\sigma}{\sqrt{n}} \qquad (1)$$

in the case of sampling from an infinite population or if sampling is done with replacement from a finite population, and by

$$\bar{x} \pm Z_c \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}} \qquad (2)$$

if sampling is done without replacement from a population of finite size N.

In general, the population standard deviation $\sigma$ is unknown, so that to obtain the above confidence limits we use the estimator $\hat{S}$ or S.

Example 2
Find a 95% confidence interval estimating the mean height of the 1546 male students at XYZ University by taking a sample of size 100. (Assume that the mean of the sample, $\bar{x}$, is 67.45cm and that the standard deviation of the sample, $\hat{S}$, is 2.93cm.)
The 95% confidence limits are $\bar{x} \pm 1.96 \dfrac{\sigma}{\sqrt{n}}$. Using $\bar{x} = 67.45$cm and $\hat{S} = 2.93$cm as an estimate of $\sigma$, the confidence limits are

$$67.45 \pm 1.96 \left(\frac{2.93}{\sqrt{100}}\right) \quad \text{or} \quad 67.45 \pm 0.57$$

Then the 95% confidence interval for the population mean $\mu$ is 66.88 to 68.02cm, which can be denoted by $66.88 < \mu < 68.02$.

We can therefore say that the probability that the population mean height lies between 66.88 and 68.02cm is about 95%. In symbols, we write $P(66.88 < \mu < 68.02) = 0.95$. This is equivalent to saying that we are 95% confident that the population mean (true mean) lies between 66.88 and 68.02cm.

7.5 Small Sample (n < 30) and Population Normal
In this case we use the t distribution to obtain confidence limits. For example, if $-t_{0.025}$ and $t_{0.025}$ are the values of T for which 2.5% of the area lies in each tail of the t distribution, then a 95% confidence interval for T is given by

$$-t_{0.025} < \frac{\bar{x} - \mu}{S / \sqrt{n}} < t_{0.025} \qquad (3)$$

from which we can see that $\mu$ can be estimated to lie in the interval

$$\bar{x} - t_{0.025} \frac{S}{\sqrt{n}} < \mu < \bar{x} + t_{0.025} \frac{S}{\sqrt{n}} \qquad (4)$$

with 95% confidence. In general the confidence limits for population means are given by

$$\bar{x} \pm t_c \frac{S}{\sqrt{n}} \qquad (5)$$

where the $t_c$ values can be read from Table 2.

7.6 Confidence Intervals for Proportions
Suppose that the statistic S is the proportion of "successes" in a sample of size $n \ge 30$ drawn from a binomial population in which p is the proportion of successes (i.e. the probability of success). Then the confidence limits for p are given by $P \pm Z_c \sigma_P$, where P denotes the proportion of successes in the sample of size n. Using the value of $\sigma_P$ obtained in Chapter 6, we see that the confidence limits for the population proportion are given by

$$P \pm Z_c \sqrt{\frac{pq}{n}} = P \pm Z_c \sqrt{\frac{p(1 - p)}{n}} \qquad (6)$$

in the case of sampling from an infinite population or if sampling is with replacement from a finite population. Similarly, the confidence limits are

$$P \pm Z_c \sqrt{\frac{pq}{n}} \sqrt{\frac{N - n}{N - 1}} \qquad (7)$$

if sampling is without replacement from a population of finite size N.
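The large-sample limits for a mean (Section 7.4) and for a proportion (Section 7.6) can be sketched in a few lines. The function names below are hypothetical, the critical value $Z_c$ is taken from the standard normal distribution rather than from a printed table, and the sample values stand in for the unknown $\sigma$ and p as described in the text. The usage line reproduces the figures of Example 2, ignoring the finite population correction just as the worked solution does.

```python
from math import sqrt
from statistics import NormalDist


def critical_z(confidence):
    """Two-sided critical value Z_c for a confidence level such as 0.95."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)


def mean_confidence_limits(xbar, s, n, confidence, N=None):
    """Large-sample limits for a mean; pass N to apply the finite
    population correction for sampling without replacement."""
    half_width = critical_z(confidence) * s / sqrt(n)
    if N is not None:
        half_width *= sqrt((N - n) / (N - 1))
    return xbar - half_width, xbar + half_width


def proportion_confidence_limits(p_sample, n, confidence):
    """Large-sample limits for a proportion, with the sample proportion
    used as the estimate of p."""
    half_width = critical_z(confidence) * sqrt(p_sample * (1 - p_sample) / n)
    return p_sample - half_width, p_sample + half_width


# Example 2: n = 100, sample mean 67.45 cm, sample standard deviation 2.93 cm.
low, high = mean_confidence_limits(67.45, 2.93, 100, 0.95)
print(round(low, 2), round(high, 2))
```

Rounded to two decimals the limits are 66.88 and 68.02, in agreement with Example 2. Passing `N=1546` would narrow the interval slightly, since the sample is drawn without replacement from a finite population.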
Note that results (6) and (7) are obtained from (1) and (2) on replacing $\bar{x}$ by P and $\sigma$ by $\sqrt{pq}$. To compute the above confidence limits, we use the sample estimate P for p.

Example 3
A sample poll of 100 voters chosen at random from all voters in a given district indicates that 55% of them were in favour of a particular candidate. Find the 99% confidence limits for the proportion of all voters in favour of this candidate.

The 99% confidence limits for the population proportion p are

$$P \pm 2.58\sigma_P = P \pm 2.58 \sqrt{\frac{P(1 - P)}{n}} = 0.55 \pm 2.58 \sqrt{\frac{0.55(0.45)}{100}} = 0.55 \pm 0.13$$

7.7 Confidence Intervals for Differences and Sums
If $S_1$ and $S_2$ are two sample statistics with approximately normal sampling distributions, confidence limits for the difference of the population parameters corresponding to $S_1$ and $S_2$ are given by

$$(S_1 - S_2) \pm Z_c \sigma_{S_1 - S_2} = (S_1 - S_2) \pm Z_c \sqrt{\sigma_{S_1}^2 + \sigma_{S_2}^2} \qquad (8)$$

while confidence limits for the sum of the population parameters are given by

$$(S_1 + S_2) \pm Z_c \sigma_{S_1 + S_2} = (S_1 + S_2) \pm Z_c \sqrt{\sigma_{S_1}^2 + \sigma_{S_2}^2} \qquad (9)$$

provided the samples are independent.

For example, confidence limits for the difference of two population means, in the case where the populations are infinite and have known standard deviations $\sigma_1, \sigma_2$, are given by

$$(\bar{x}_1 - \bar{x}_2) \pm Z_c \sigma_{\bar{x}_1 - \bar{x}_2} = (\bar{x}_1 - \bar{x}_2) \pm Z_c \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \qquad (10)$$

where $\bar{x}_1, n_1$ and $\bar{x}_2, n_2$ are the respective means and sizes of the two samples drawn from the populations.

Similarly, confidence limits for the difference of two population proportions, where the populations are infinite, are given by

$$(P_1 - P_2) \pm Z_c \sqrt{\frac{P_1(1 - P_1)}{n_1} + \frac{P_2(1 - P_2)}{n_2}} \qquad (11)$$

where $P_1$ and $P_2$ are the two sample proportions and $n_1$ and $n_2$ are the sizes of the two samples drawn from the populations.

Example 4
In a random sample of 400 adults and 600 teenagers who watched a certain television program, 100 adults and 300 teenagers indicated that they liked it.
Construct the 99.7% confidence limits for the difference in proportions of all adults and all teenagers who watched the program and liked it.

Confidence limits for the difference in proportions of the two groups are given by (11), where subscripts 1 and 2 refer to teenagers and adults, respectively, and $Q_1 = 1 - P_1$, $Q_2 = 1 - P_2$. Here $P_1 = 300/600 = 0.50$ and $P_2 = 100/400 = 0.25$ are, respectively, the proportions of teenagers and adults who liked the program. The 99.7% confidence limits are given by

$$0.50 - 0.25 \pm 3 \sqrt{\frac{(0.50)(0.50)}{600} + \frac{(0.25)(0.75)}{400}} = 0.25 \pm 0.09 \qquad (12)$$

Therefore, we can be 99.7% confident that the true difference in proportions lies between 0.16 and 0.34.

Learning Objectives
After working through this Chapter you should be able to:
• Explain a point estimate and a confidence interval.
• Find confidence intervals for proportions and differences of proportions.
• Find confidence intervals for means of normal populations, and for differences of means of two normal populations, both when the variance(s) are known and when they are unknown.

CHAPTER 8
HYPOTHESIS TESTING

Reading
Newbold Chapter 9
Wonnacott and Wonnacott Chapter 9
Tailoka Frank P Chapter 10

Introductory Comments
We often need to answer questions about a population such as "Is the mean of the population less than 5?" or "Is there any difference between two means?" In statistics we try to answer these questions based on the information in samples. There is useful information in this Section of the subject for everyday life. The theory of tests of hypotheses is necessarily linked to that of confidence intervals.

8.0 Tests of Hypothesis and Significance

8.1 Statistical Decisions
Very often in practice we are called upon to make decisions about populations on the basis of sample information. Such decisions are called statistical decisions.
For example, we may wish to decide on the basis of sample data whether a new serum is really effective in curing a disease, whether one educational procedure is better than another, or whether a given coin is loaded.

8.2 Statistical Hypothesis
In attempting to reach decisions, it is useful to make assumptions or guesses about the populations involved. Such assumptions, which may or may not be true, are called statistical hypotheses and in general are statements about the probability distributions of the populations.

For example, if we want to decide whether a given coin is loaded, we formulate the hypothesis that the coin is fair, i.e., p = 0.5, where p is the probability of heads. Similarly, if we want to decide whether one procedure is better than another, we formulate the hypothesis that there is no difference between the two procedures (i.e., any observed differences are merely due to fluctuations in sampling from the same population). Such hypotheses are often called null hypotheses, denoted by $H_0$.

Any hypothesis that differs from a given null hypothesis is called an alternative hypothesis. For example, if the null hypothesis is p = 0.5, possible alternative hypotheses are p = 0.7, p ≠ 0.5 or p > 0.5. A hypothesis alternative to the null hypothesis is denoted by $H_1$.

8.3 Type I and Type II Errors
If we reject a hypothesis when it happens to be true, we say that a Type I error has been made. If, on the other hand, we accept a hypothesis when it should be rejected, we say that a Type II error has been made. In either case, a wrong decision or error in judgement has occurred.

In order for any tests of hypotheses or decision rules to be good, they must be designed so as to minimize errors of decision. This is not a simple matter since, for a given sample size, an attempt to decrease one type of error is accompanied in general by an increase in the other type of error.
In practice one type of error may be more serious than the other, and so a compromise should be reached in favour of a limitation of the more serious error. The only way to reduce both types of error is to increase the sample size, which may or may not be possible.

8.4 Level of Significance
In testing a given hypothesis, the maximum probability with which we should be willing to risk a Type I error is called the level of significance of the test. This probability is often specified before any samples are drawn so that the results obtained will not influence our decision.

In practice a level of significance of 0.05 or 0.01 is customary, although other values are used. If, for example, a 0.05 or 5% level of significance is chosen in designing a test of a hypothesis, then there are about 5 chances in 100 that we would reject the hypothesis when it should be accepted; i.e., whenever the null hypothesis is true, we are about 95% confident that we would make the right decision. In such cases we say that the hypothesis has been rejected at a 0.05 level of significance, which means that we could be wrong with probability 0.05.

8.5 Tests Involving the Normal Distribution
To illustrate the ideas presented above, suppose that under a given hypothesis the sampling distribution of a statistic S is a normal distribution with mean $\mu_S$ and standard deviation $\sigma_S$. The distribution of the standardized variable $Z = (S - \mu_S)/\sigma_S$ is the standard normal distribution (mean 0, variance 1) shown in Figure 8.1, and extreme values of Z would lead to the rejection of the hypothesis.

Figure 8.1: Standard normal curve with a central area of 0.95 between Z = -1.96 and Z = 1.96 and a critical region of area 0.025 in each tail.

As indicated in the figure, we can be 95% confident that, if the hypothesis is true, the Z-score of an actual sample statistic S will be between -1.96 and 1.96 (since the area under the normal curve between these values is 0.95).
However, if on choosing a single sample at random we find that the Z-score of its statistic lies outside the range -1.96 to 1.96, we would conclude that such an event could happen with a probability of only 0.05 (the total shaded area in the figure) if the given hypothesis were true. We would then say that this Z-score differed significantly from what would be expected under the hypothesis, and we would be inclined to reject the hypothesis.

The total shaded area 0.05 is the level of significance of the test. It represents the probability of our being wrong in rejecting the hypothesis, i.e., the probability of making a Type I error. Therefore, we say that the hypothesis is rejected at a 0.05 level of significance, or that the Z-score of the given sample statistic is significant at the 0.05 level of significance.

The set of Z-scores outside the range -1.96 to 1.96 constitutes what is called the critical region, or region of rejection of the hypothesis, or the region of significance. The set of Z-scores inside the range -1.96 to 1.96 could then be called the region of acceptance of the hypothesis, or the region of non-significance.

On the basis of the above remarks, we can formulate the following decision rule:
a) Reject the hypothesis at a 0.05 level of significance if the Z-score of the statistic S lies outside the range -1.96 to 1.96 (i.e., if either Z > 1.96 or Z < -1.96). This is equivalent to saying that the observed sample statistic is significant at the 0.05 level.
b) Accept the hypothesis (or, if desired, make no decision at all) otherwise.

8.6 One-Tailed and Two-Tailed Tests
In the above test we displayed interest in extreme values of the statistic S or its corresponding Z-score on both sides of the mean, i.e., in both tails of the distribution. For this reason such tests are called two-tailed tests or two-sided tests.
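The two-tailed decision rule can be sketched as a small function. This is only an illustration under the normal-distribution assumption of Section 8.5: the function name `two_tailed_z_test` is hypothetical, the critical value is computed from the standard normal distribution instead of being read from a table, and the Z-score of -2.5 passed in below is an arbitrary illustrative value.

```python
from statistics import NormalDist


def two_tailed_z_test(z, alpha=0.05):
    """Apply the two-tailed decision rule to an observed Z-score.

    Returns the critical value, the two-tailed P-value and whether the
    hypothesis is rejected at level alpha.
    """
    std_normal = NormalDist()
    critical = std_normal.inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = 0.05
    p_value = 2 * std_normal.cdf(-abs(z))         # area in both tails beyond |z|
    return critical, p_value, abs(z) > critical


critical, p_value, reject = two_tailed_z_test(-2.5)
print(round(critical, 2), round(p_value, 4), reject)
```

For a Z-score of -2.5 the rule rejects the hypothesis at the 0.05 level, since 2.5 exceeds the critical value of about 1.96, and the two-tailed P-value is about 0.0124.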
Often, however, we may be interested only in extreme values to one side of the mean, i.e., in one tail of the distribution, as, for example, when we are testing the hypothesis that one process is better than another (which is different from testing whether one process is better or worse than the other). Such tests are called one-tailed tests or one-sided tests. In such cases, the critical region is a region to one side of the distribution, with area equal to the level of significance.

8.7 P-Value
The P-value is the smallest value of $\alpha$ that will lead to the rejection of the null hypothesis.

8.8 Special Tests
For large samples, many statistics have nearly normal distributions with mean $\mu_S$ and standard deviation $\sigma_S$. In such cases we can use the above results to formulate decision rules or tests of hypotheses and significance. The following special cases are just a few of the statistics of practical interest. In each case the results hold for infinite populations or for sampling with replacement. For sampling without replacement from finite populations, the results must be modified.

1. Means
Here $S = \bar{X}$, the sample mean; $\mu_S = \mu_{\bar{x}} = \mu$, the population mean; and $\sigma_S = \sigma_{\bar{x}} = \sigma/\sqrt{n}$, where $\sigma$ is the population standard deviation and n is the sample size. The standardized variable is given by

$$Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} \qquad (1)$$

for $n \ge 30$. For $n < 30$,

$$t_c = \frac{\bar{x} - \mu}{S / \sqrt{n}}$$

2. Proportions
Here S = P, the proportion of "successes" in a sample; $\mu_S = \mu_P = p$, where p is the population proportion of successes and n is the sample size; and $\sigma_S = \sigma_P = \sqrt{pq/n}$, where q = 1 - p. The standardized variable is given by

$$Z = \frac{P - p}{\sqrt{pq/n}} \qquad (2)$$

In case $P = X/n$, where X is the actual number of successes in a sample, (2) becomes

$$Z = \frac{X - np}{\sqrt{npq}} \qquad (3)$$

3. Differences of means
Let $\bar{X}_1$ and $\bar{X}_2$ be the sample means obtained in large samples of sizes $n_1$ and $n_2$ drawn from respective populations having means $\mu_1$ and $\mu_2$ and standard deviations $\sigma_1$ and $\sigma_2$.
Consider the null hypothesis that there is no difference between the population means, i.e., $\mu_1 = \mu_2$. Then

$$\mu_{\bar{x}_1 - \bar{x}_2} = 0, \qquad \sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \qquad (4)$$

The standardized variable is given by

$$Z = \frac{\bar{X}_1 - \bar{X}_2 - 0}{\sigma_{\bar{X}_1 - \bar{X}_2}} = \frac{\bar{X}_1 - \bar{X}_2}{\sigma_{\bar{X}_1 - \bar{X}_2}} \qquad (5)$$

4. Differences of proportions
Let $P_1$ and $P_2$ be the sample proportions obtained in large samples of sizes $n_1$ and $n_2$ drawn from respective populations having proportions $p_1$ and $p_2$. Consider the null hypothesis that there is no difference between the population proportions, i.e., $p_1 = p_2$, and thus that the samples are really drawn from the same population. Then

$$\mu_{P_1 - P_2} = 0, \qquad \sigma_{P_1 - P_2} = \sqrt{P(1 - P)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

where $P = \dfrac{n_1 P_1 + n_2 P_2}{n_1 + n_2}$ is used as an estimate of the population proportion p. By using the standardized variable

$$Z = \frac{P_1 - P_2 - 0}{\sigma_{P_1 - P_2}} = \frac{P_1 - P_2}{\sigma_{P_1 - P_2}}$$

we can test observed differences at an appropriate level of significance and thereby test the null hypothesis.

Tests involving other statistics can similarly be designed.

Example
The mean lifetime of a sample of 100 fluorescent light bulbs produced by a company is computed to be 1570 hours with a standard deviation of 120 hours. If $\mu$ is the mean lifetime of all the bulbs produced by the company, test the hypothesis $\mu = 1600$ hours against the alternative hypothesis $\mu \ne 1600$ hours. Use a significance level of 0.05 and find the P-value of the test.

1. $H_0: \mu = 1600$, $H_a: \mu \ne 1600$.

2. This is a two-tailed test, with an area of 0.025 in each tail and 0.95 between -1.96 and 1.96. We reject $H_0$ if $Z_c$ is either greater than 1.96 or less than -1.96.

3. With n = 100, $\bar{X} = 1570$, $\mu = 1600$ and S = 120,

$$Z_c = \frac{\bar{X} - \mu}{S / \sqrt{n}} = \frac{1570 - 1600}{120 / \sqrt{100}} = \frac{-30}{12} = -2.5$$

4. Since $Z_c = -2.5 < -1.96$, we reject $H_0$.

P-value $= 2P(Z \le -2.5) = 2(0.0062) = 0.0124$.

Learning Objectives
After working through this chapter you should be able to:
• Define and use the terminology of statistical testing.
• Carry out statistical tests of all the types covered in this Chapter.
• Calculate the P-value of the simpler tests.
• Explain the way in which the rejection regions of tests follow from the distributional results, taking into account the level and considerations of power.

Sample Examination Questions

1. A finite population consisting of the numbers 6, 7, 8, 10 and 11 can be converted into an infinite population if we take a random sample of size 2 by first drawing one element and then replacing it before drawing the second element.
(a) Determine how many different samples of size 2 can be drawn from this infinite population and list them.
(b) Determine the means of the samples of part (a). What is the probability assigned to each mean? Construct the sampling distribution of the mean for random samples of size 2 drawn from this infinite population.
(c) Calculate the mean and the standard deviation of the probability distribution of part (b) and compare the value of the standard deviation with the corresponding result obtained from the standard error of the mean formula.

2. (a) Explain briefly with examples:
(i) population parameter
(ii) sample statistics
(iii) population
(b) Chisha is a cocktail hostess in a very exclusive private club. The Zambia Revenue Authority (ZRA) is auditing her tax return this year. Chisha claims that her average tip last year was K23,750. To support her claim, she sent the ZRA a random sample of 52 credit card receipts showing her bar tips. When the ZRA got the receipts, they computed the sample average and found it to be $\bar{x}$ = K26,250 with sample standard deviation S = K5,750. Do these receipts indicate that the average tip Chisha received last year was more than K23,750? Use a 1% level of significance. Also find the P-value.

3. (a) Briefly define each of the following terms:
(i) Finite population correction factor
(ii) Simple random sampling
(iii) Standard error
(b) A government agency recently found that an artificial sweetener used in diet soft drinks may have harmful side effects. Therefore, it sets the limit on the amount that each can may contain at 0.1 ounce. The manager of a local soft drink company, thinking that the mixing machine may not be staying within the tolerated limit, runs a test on 100 cans. The test shows the cans to have an average of 0.13 ounce of artificial sweetener. The population standard deviation is 0.06.
(i) Should the manager adjust the machine if $\alpha = 0.05$?
(ii) If $\alpha = 0.02$, should the manager adjust the machine?
(iii) Which value of $\alpha$ would you pick for this problem?
(iv) What if $\bar{x} = 0.12$ ($\alpha = 0.02$)?
(v) At what values of $\bar{x}$ should he keep the machine as it is ($\alpha = 0.02$)?

4. (a) Define each of the following:
(i) The power of a test.
(ii) A Student's t test.
(b) The table below shows the annual salaries, in millions of kwacha, of randomly selected faculty in public educational institutions and private educational institutions.

Public   Private
80 86 90 95 100 110 85 75 105 115 92 74 65 64 85 92 72 73 74

(i) Find a 90% confidence interval for the difference between the population mean annual salaries in the public and private institutions.
(ii) Test the null hypothesis that the mean salary for the private institutions is K5,000,000 more than in the public institutions against the alternative that the mean for the private institutions is more than K5,000,000 greater.
(iii) State carefully the assumptions you have made in arriving at the test and confidence interval.

5. (a) Explain the following terms used in statistical hypothesis testing:
(i) Rejection region
(ii) Significance level of the test.
(b) A random sample of 25 engineers in company A produces a mean salary of K90,000,000 with a standard deviation of K15,000,000, and a random sample of 86 engineers in company B produces a mean salary of K110,000,000 with a standard deviation of K20,000,000.
(i) Can we conclude that company B pays its engineers more than company A? Use an $\alpha = 0.05$ level of significance.
(ii) What is the P-value for this test?

6. (a) Define each of the following:
(i) The power of a test
(ii) Rejecting a null hypothesis
(iii) The Central Limit Theorem
(b) An Air Force base mess hall has received a shipment of 10 000 gallon-size cans of cherries. The supplier claims that the average amount of liquid is 0.25 gallon per can. A government inspector took a random sample of 100 cans and found the average liquid content to be 0.28 gallon per can with a standard deviation of 0.10.
(i) Does this indicate that the supplier's claim is too low? (Use a 5% level of significance.)
(ii) Compute the P-value.

7. (a) A consumer group is testing camp stoves. To test the heating capacity of a stove, the group measures the time required to bring 2 litres of water from 10°C to boiling (at sea level). Two competing models are under consideration. Thirty-six stoves of each model are tested and the following results are obtained:
Model 1: Mean time $\bar{x}_1$ = 11.4 min; Standard deviation $S_1$ = 25 min
Model 2: Mean time $\bar{x}_2$ = 9.9 min; Standard deviation $S_2$ = 30 min
Is there any difference between the performances of these two models? (Use a 5% level of significance.) Also find the P-value for the sample test statistic.
(b) Define briefly the following terms:
(i) Type I error
(ii) Decision
(iii) Type II error

CHAPTER 9
ANALYSIS OF VARIANCE

Reading
Newbold Chapter 15
Wonnacott and Wonnacott Chapter 10
Tailoka Frank P Chapter 13

Introductory Comments
Analysis of Variance (ANOVA) is a popular tool that needs some time and effort to appreciate. The idea of analysis of variance is to investigate how variation in structured data can be split into pieces associated with components of the structure. Here we cover the one-way and two-way cases. Both tests and confidence intervals are widely used in applications.

Analysis of Variance
Use of the F-distribution
The F-distribution is used to test the hypothesis that the variance of one normal population equals the variance of another normal population.
The second use of the F-distribution involves the analysis of variance technique, abbreviated ANOVA. Basically, analysis of variance uses sample information to determine whether or not three or more treatments produce different results. A treatment is a cause, or specific source, of variation in a set of data. The following cases expand on the meaning of a treatment. Do different treatments of fertilizer affect yield? Do different grades of gasoline affect performance? Do four different assembly methods result in different population means?

Assumptions Underlying The ANOVA Test
Before we actually conduct a test using the ANOVA technique, the assumptions underlying the test will be examined. If the following assumptions cannot be met, another analysis of variance technique may be applied.
1. The three or more populations of interest are normally distributed.
2. These populations have equal standard deviations.
3. The samples we select from each of the populations are random and independent, that is, they are not related.

Analysis Of Variance Procedure
The ANOVA procedure can best be illustrated using an example. Suppose the manager of ABC resigned and three salespeople at the branch are being considered for the position. All three have about the same length of service, education and so on. In order to make a decision, it was suggested that their monthly sales be compared. A sample of each person's monthly sales is shown in Table 1.0. The "treatments" in this problem are the salespeople.

Table 1.0 Monthly sales of appliances for three salespeople.

Sample monthly sales (K000)
          Ms Banda    Mr Mwenya    Mr Chisenga
          25          25           19
          15          15           17
          14          17           13
          10          16           11
          21          17           12
Mean      17          18           14.4

The ANOVA procedure calls for the same hypothesis-testing procedure outlined in the lecture notes on Estimation and Hypothesis Testing.

STEP 1 The null hypothesis $H_0$ states that there is no significant difference among the mean sales of the three salespeople; that is, $\mu_1 = \mu_2 = \mu_3$. $H_a$ states that at least one mean is different.
As before, if $H_0$ is rejected, $H_a$ will be accepted.

STEP 2 The level of significance is selected. In our case we choose the 0.05 level.

STEP 3 The test statistic. The appropriate test statistic follows the F-distribution. Underlying this procedure are several assumptions.
1) The data must be at least interval level.
2) The sales in each sample must be selected using a probability-type procedure.
3) The distribution of the monthly sales for each of the populations is normal.
4) The variances of the three populations are equal, i.e. $\sigma_1^2 = \sigma_2^2 = \sigma_3^2$.

F is the ratio of two variances:

$$F = \frac{\text{estimated population variance based on the variation between the sample means}}{\text{estimated population variance based on the variation within samples}} = \frac{MSTR}{MSE}$$

The numerator has K-1 degrees of freedom. The denominator has N-K degrees of freedom, where K is the number of treatments and N is the total number of observations.

STEP 4 The decision rule. As noted previously, the F-distribution and its accompanying curve are positively skewed and depend on:
1) the number of treatments, K, and
2) the total number of observations, N.
For this problem we have K-1 = 3-1 = 2 degrees of freedom in the numerator. There are 15 observations (three samples of five each). Therefore there are N-K = 15-3 = 12 degrees of freedom in the denominator.

Using the predetermined 0.05 level, the decision rule is to accept the null hypothesis $H_0$ if the computed F value is less than or equal to 3.89; we reject $H_0$ if the computed F value is greater than 3.89. The decision rule is shown diagrammatically.

[Figure: distribution of F for K = 3 and N = 15 on the F scale, with the region of acceptance to the left of the critical value 3.89 and the region of rejection (α = 0.05) to the right.]

STEP 5 Compute F, and arrive at a decision. The first step is to set up an ANOVA table. It is merely a convenient form in which to record the sums of squares and other computations.
The general format for a one-way analysis of variance problem is shown in Table 2.0.

Table 2.0 A general format for the analysis of variance table.

Source of variation          (1) Sum of squares   (2) Degrees of freedom   (3) Mean square (1)/(2)
Between treatments           SST                  K-1                      SST/(K-1) = MSTR
Error (within treatments)    SSE                  N-K                      SSE/(N-K) = MSE
Total                        SS Total

Formula for F:

$$F = \frac{SST/(K-1)}{SSE/(N-K)} = \frac{MSTR}{MSE}$$

where MSTR is the mean square between treatments and MSE is the mean square due to error. MSE is also referred to as the mean square within treatments.

SST is the abbreviation for the sum of squares for treatments and is found by:

$$SST = \sum \frac{T^2}{n} - \frac{\left(\sum X\right)^2}{N}$$

SSE is the abbreviation for the sum of squares for error.

Where:
T = the treatment total
n = the number of observations for each respective treatment
$\sum X$ = the sum of all the observations (sales)
$\sum X^2$ = the sum of the squares of the observations (square each sale, then add the squares)
K = the number of treatments (salespeople)
N = the total number of observations

Compute SST

$$SST = \sum \frac{T^2}{n} - \frac{\left(\sum X\right)^2}{N} = \frac{(85)^2}{5} + \frac{(90)^2}{5} + \frac{(72)^2}{5} - \frac{(247)^2}{15} = 4101.8 - 4067.27 = 34.53$$

Compute SSE

$$SSE = \sum X^2 - \sum \frac{T^2}{n} = (25)^2 + (15)^2 + \dots + (12)^2 - \left[\frac{(85)^2}{5} + \frac{(90)^2}{5} + \frac{(72)^2}{5}\right] = 4355 - 4101.8 = 253.2$$

Total variation (SS Total) is the sum of the between-treatments and the within-treatments variation, that is, SS Total = SST + SSE = 34.53 + 253.2 = 287.73. As a check:

$$SS\ Total = \sum X^2 - \frac{\left(\sum X\right)^2}{N} = 4355 - \frac{(247)^2}{15} = 4355 - 4067.27 = 287.73$$

The three sums of squares and the calculations needed for F are transferred to the ANOVA table, Table 3.0.
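The shortcut formulas for SST, SSE and SS Total can be checked with a few lines of Python. This is only a sketch: the function name `one_way_anova` is made up for illustration, and the assignment of the fifteen sales figures to the three salespeople is an assumption inferred from the column totals 85, 90 and 72 used in the calculation above.

```python
def one_way_anova(groups):
    """Return (SST, SSE, SS Total, F) using the shortcut formulas in the text.

    `groups` is a list of lists, one list of observations per treatment.
    The sample sizes may differ between treatments.
    """
    all_values = [x for g in groups for x in g]
    n_total = len(all_values)                      # N
    k = len(groups)                                # K
    correction = sum(all_values) ** 2 / n_total    # (sum X)^2 / N
    sst = sum(sum(g) ** 2 / len(g) for g in groups) - correction
    ss_total = sum(x * x for x in all_values) - correction
    sse = ss_total - sst
    f_ratio = (sst / (k - 1)) / (sse / (n_total - k))
    return sst, sse, ss_total, f_ratio


# Monthly sales (K000); column assignment assumed from the totals 85, 90, 72.
banda = [25, 15, 14, 10, 21]
mwenya = [25, 15, 17, 16, 17]
chisenga = [19, 17, 13, 11, 12]

sst, sse, ss_total, f_ratio = one_way_anova([banda, mwenya, chisenga])
print(round(sst, 2), round(sse, 2), round(ss_total, 2), round(f_ratio, 3))
```

The printed values should agree with the hand calculation (SST about 34.53, SSE about 253.2, SS Total about 287.73). The critical value 3.89 still has to be read from the F table.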
Table 3.0 ANOVA table for the store manager problem

Source of variation          (1) Sum of squares   (2) Degrees of freedom   Mean square (1)/(2)
Between treatments           SST = 34.53          K-1 = 3-1 = 2            34.53/2 = 17.265
Error (within treatments)    SSE = 253.2          N-K = 15-3 = 12          253.2/12 = 21.1
Total                        SS Total = 287.73

Computing F:

$$F = \frac{SST/(K-1)}{SSE/(N-K)} = \frac{MSTR}{MSE} = \frac{17.265}{21.1} = 0.818$$

The decision rule states that if the computed value of F is less than or equal to the critical value of 3.89, the null hypothesis is accepted. If the F value is greater than 3.89, $H_0$ is rejected and $H_a$ is accepted. Since 0.818 < 3.89, the null hypothesis is accepted at the 0.05 level. To put it another way, the differences in the mean monthly sales (K17,000, K18,000 and K14,400) are due to chance (sampling). From a practical standpoint, the levels of sales of the three salespeople being considered for store manager are the same. No decision with respect to the position can be made on the basis of monthly sales.

Inferences About Treatment Means
Suppose that in carrying out the ANOVA procedure we make the decision to reject the null hypothesis. This allows us to conclude that the treatment means are not all the same. Sometimes we may be satisfied with this conclusion, but in other instances we may want to know which treatment means differ.

Let us consider the following example. Four groups of students were subjected to different teaching techniques and tested at the end of a specified period of time. As a result of dropouts from the experimental groups (due to sickness, transfer, and so on), the number of students varied from group to group. Do the data shown below present sufficient evidence to indicate a difference in the mean achievement for the four teaching techniques? Use the 0.05 level of significance.
Teaching technique
          1      2      3      4
          65     75     59     94
          87     69     78     89
          73     83     67     80
          79     81     62     88
          81     72     83
          69     79     76
                 90
Total     454    549    425    351

$$SS\ Total = \sum\sum X_{ij}^2 - \frac{\left(\sum\sum X_{ij}\right)^2}{N} = 139511 - \frac{(1779)^2}{23} = 139511 - 137601.78 = 1909.22$$

$$SST = \sum_{i=1}^{K} \frac{T_i^2}{n_i} - CM, \qquad CM = \frac{\left(\sum\sum X_{ij}\right)^2}{N} = 137601.78$$

$$SST = \frac{(454)^2}{6} + \frac{(549)^2}{7} + \frac{(425)^2}{6} + \frac{(351)^2}{4} - 137601.78$$
$$= 34352.67 + 43057.29 + 30104.17 + 30800.25 - 137601.78 = 138314.37 - 137601.78 = 712.59$$

SSE = SS Total - SST = 1909.22 - 712.59 = 1196.63

Table 4.0 ANOVA table for the students example

Source of variation   Sum of squares   Degrees of freedom   Mean square   F
SST                   712.59           3                    237.53        237.53/62.98 = 3.77
SSE                   1196.63          19                   62.98
SS Total              1909.22          22

Reject $H_0$ if the computed F value is greater than $F_{.05,\,3,\,19} = 3.13$. Since F = 3.77 > 3.13, we reject $H_0$.

Recall that in the store manager data there was no difference in the treatment means. In that case further analysis of the treatment means is not warranted. However, in the foregoing example, regarding mean achievement for the four teaching techniques, there was a difference in the treatment means. That is, the null hypothesis is rejected and the alternate hypothesis accepted. If the achievements do differ, the question is: between which groups do the treatment means differ?

Several procedures are available to answer this question. Perhaps the simplest is through the use of confidence intervals. A confidence interval for the difference between two population means is found by:

$$(\bar{X}_1 - \bar{X}_2) \pm t_{\alpha/2,\,N-K}\sqrt{MSE\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

Where:
$\bar{X}_1$ is the mean of the first treatment
$\bar{X}_2$ is the mean of the second treatment
t is obtained from the t table; the degrees of freedom are equal to N-K
MSE is the mean square error term obtained from the ANOVA table, SSE/(N-K)
$n_1$ is the number of observations in the first treatment
$n_2$ is the number of observations in the second treatment

If the confidence interval includes 0, we conclude there is no difference in the pair of treatment means.
However, if both end points of the confidence interval are of the same sign, it indicates that the treatment means differ. The 0.95 level of confidence for the difference between $\mu_1$ and $\mu_2$ is found by:

$$(\bar{X}_1 - \bar{X}_2) \pm t_{\alpha/2,\,N-K}\sqrt{MSE\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = (75.67 - 78.43) \pm 2.093\sqrt{62.98\left(\frac{1}{6} + \frac{1}{7}\right)} = -2.76 \pm 9.24$$

that is, from -12.00 to 6.48, where
$\bar{X}_1$ = 75.67, $\bar{X}_2$ = 78.43
t = 2.093 from Appendix A, Table A.6 (N-K = 19 degrees of freedom)
MSE = 62.98 from the ANOVA table
$n_1$ = 6, $n_2$ = 7

This interval includes 0, so we conclude there is no difference between the means for techniques 1 and 2.

Similarly, consider $\bar{X}_1$ = 75.67 and $\bar{X}_4$ = 87.75. We find that the 95 percent confidence interval ranges from -22.8 up to -1.36. Both end points are negative, so we can conclude that these treatment means differ significantly. That is, students subjected to teaching technique 4 have higher scores than those subjected to teaching technique 1.

Caution
The investigation of differences in treatment means is a sequential process. The initial step is to conduct the ANOVA test. Only if the null hypothesis that the treatment means are equal is rejected should any analysis of the treatment means be attempted.

Two-Way ANOVA
In the appliance sales example, we were unable to show that a difference exists among the mean sales of the three salespeople. In the computation of the F statistic, variation was considered as originating from only two sources: it either originated from the treatments or was considered random variation within each treatment. There are other possible sources of variation, such as the training the salespeople had, the days of the week on which the sample data were obtained, etc. Two-way analysis of variance allows us to consider at least one other of these possibilities.

Example
EUROAFRICA is expanding bus services from the Capital City into the heart of the Copperbelt. There are four routes being considered, from Kitwe to four other towns. The travel times in minutes along each of the four routes are given below.
Travel Time From Kitwe To Four Other Towns

Day          Luanshya   Ndola   Chingola   Mufulira
Monday       40         45      46         34
Tuesday      38         42      44         30
Wednesday    38         40      44         33
Thursday     37         43      42         40
Friday       41         41      40         32

At the 0.05 significance level, can it be concluded that there is a difference among the four routes? Does it make a difference which day of the week it is?

If the only null hypothesis is that the mean time is the same along the four routes, then this requires the one-way ANOVA approach. In that approach the variation that occurs because of differences in the days of the week is considered random and is included in the MSE term, so the F ratio is reduced. If the variation due to the day of the week can be removed, the denominator of the F ratio will be reduced. In this case, the day of the week is called a blocking variable. Hence, we have variation due to treatments and variation due to blocks. The sum of squares due to blocks (SSB) is computed as follows:

$$SSB = \sum \frac{B^2}{K} - \frac{\left(\sum X\right)^2}{N}$$

where B refers to the block total, that is, the total for each row, and K refers to the number of items in each block.

The same format is used for the two-way ANOVA table as was used in the one-way ANOVA case. SST and SS Total are computed as before. SSE is obtained by subtraction (SSE = SS Total - SST - SSB). Table 4.0 shows the necessary calculations.
Calculations Needed For Two-Way ANOVA

Day              Luanshya   Ndola   Chingola   Mufulira   Row sum
Monday           40         45      46         34         165
Tuesday          38         42      44         30         154
Wednesday        38         40      44         33         155
Thursday         37         43      42         40         162
Friday           41         41      40         32         154
Column total     194        211     216        169        790
Sum of squares   7538       8919    9352       5769       31578
Sample size      5          5       5          5

Analogous to the ANOVA table for a one-way analysis, the general format for the two-way table is:

Source of variation   (1) Sum of squares   (2) Degrees of freedom   (3) Mean square (1)/(2)
Treatments            SST                  K-1                      SST/(K-1) = MSTR
Blocks                SSB                  n-1                      SSB/(n-1) = MSB
Error                 SSE                  (K-1)(n-1)               SSE/[(K-1)(n-1)] = MSE
Total                 SS Total

As before, to compute SST:

$$SST = \sum \frac{T^2}{n} - \frac{\left(\sum X\right)^2}{N} = \frac{(194)^2}{5} + \frac{(211)^2}{5} + \frac{(216)^2}{5} + \frac{(169)^2}{5} - \frac{(790)^2}{20} = 31474.8 - 31205 = 269.8$$

SSB is found by:

$$SSB = \sum \frac{B^2}{K} - \frac{\left(\sum X\right)^2}{N} = \frac{(165)^2}{4} + \frac{(154)^2}{4} + \frac{(155)^2}{4} + \frac{(162)^2}{4} + \frac{(154)^2}{4} - 31205 = 31231.5 - 31205 = 26.5$$

The remaining sums of squares are:

$$SS\ Total = \sum X^2 - \frac{\left(\sum X\right)^2}{N} = 31578 - \frac{(790)^2}{20} = 31578 - 31205 = 373$$

SSE = SS Total - SST - SSB = 373 - 269.8 - 26.5 = 76.7

The values for the various components of the ANOVA table are computed as follows:

Source of variation   (1) Sum of squares   (2) Degrees of freedom   (3) Mean square (1)/(2)
Treatments            269.8                3                        89.933
Blocks                26.5                 4                        6.625
Error                 76.7                 12                       6.392
Total                 373                  19

There are two sets of hypotheses being tested:
1. $H_0$: The treatment means are the same, $\mu_1 = \mu_2 = \mu_3 = \mu_4$.
   $H_a$: The treatment means are not the same.
2. $H_0$: The block means are the same, $\mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu_5$.
   $H_a$: The block means are not the same.

First we test the hypothesis concerning the treatment means. There are K-1 = 4-1 = 3 degrees of freedom in the numerator and (n-1)(K-1) = (5-1)(4-1) = 12 degrees of freedom in the denominator. Using the 0.05 significance level, the critical value of F is 3.49. The null hypothesis that the mean times for the four routes are the same is rejected if the F ratio exceeds 3.49.
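Before the two F ratios are compared with their critical values, the sums of squares for the bus-route data can be reproduced with a short Python sketch. The function name `two_way_anova` and its layout are illustrative assumptions, not part of the course material. It assumes one observation per cell, with rows as blocks (days) and columns as treatments (routes).

```python
def two_way_anova(table):
    """Randomized block ANOVA with one observation per cell.

    table[i][j] is the observation for block i (row) and treatment j (column).
    Returns (SST, SSB, SSE, F for treatments, F for blocks).
    """
    n_blocks = len(table)            # n, number of blocks
    k = len(table[0])                # K, number of treatments
    n_total = n_blocks * k           # N
    grand_total = sum(sum(row) for row in table)
    correction = grand_total ** 2 / n_total          # (sum X)^2 / N

    col_totals = [sum(row[j] for row in table) for j in range(k)]
    row_totals = [sum(row) for row in table]

    sst = sum(t * t for t in col_totals) / n_blocks - correction
    ssb = sum(b * b for b in row_totals) / k - correction
    ss_total = sum(x * x for row in table for x in row) - correction
    sse = ss_total - sst - ssb

    mse = sse / ((k - 1) * (n_blocks - 1))
    f_treatments = (sst / (k - 1)) / mse
    f_blocks = (ssb / (n_blocks - 1)) / mse
    return sst, ssb, sse, f_treatments, f_blocks


# Travel times: rows Monday..Friday; columns Luanshya, Ndola, Chingola, Mufulira.
travel = [
    [40, 45, 46, 34],
    [38, 42, 44, 30],
    [38, 40, 44, 33],
    [37, 43, 42, 40],
    [41, 41, 40, 32],
]

sst, ssb, sse, f_treatments, f_blocks = two_way_anova(travel)
print(round(sst, 1), round(ssb, 1), round(sse, 1))
print(round(f_treatments, 2), round(f_blocks, 2))
```

The sums of squares should match the table above (269.8, 26.5 and 76.7). The critical values 3.49 and 3.26 are still taken from the F table.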
$$F = \frac{MSTR}{MSE} = \frac{89.933}{6.392} = 14.07$$

The null hypothesis is rejected and the alternate accepted. It is concluded that the mean travel time is not the same for all routes. EUROAFRICA will want to conduct some tests to determine which treatment means differ.

Next, we test to find out if the travel time is the same for different days of the week. The degrees of freedom in the numerator for blocks is n-1 = 5-1 = 4. The degrees of freedom in the denominator is the same as before: (n-1)(K-1) = (5-1)(4-1) = 12. The null hypothesis that the block means are the same is rejected if the F ratio exceeds 3.26.

$$F = \frac{MSB}{MSE} = \frac{6.625}{6.392} = 1.04$$

The null hypothesis is accepted. The mean travel time is the same for the various days of the week.

Problems

1) Suppose that we want to compare the cholesterol contents of four competing diet foods on the basis of the following data (in milligrams per package), which were obtained for three 6-ounce packages of each of the diet foods.

Diet food
A        B        C        D
3.6      3.1      3.2      3.5
4.1      3.2      3.5      3.8
4.0      3.9      3.5      3.8
nA = 3   nB = 3   nC = 3   nD = 3

The means of these four samples are $\bar{Y}_A = 3.9$, $\bar{Y}_B = 3.4$, $\bar{Y}_C = 3.4$ and $\bar{Y}_D = 3.7$. We want to know whether the differences among them are significant or whether they can be attributed to chance. Use the 0.05 level of significance.

2) At each of the three banks in Kitwe, customers are randomly selected and their waiting times before service are recorded.

Bank                       Waiting time (minutes)
ZNCB                       4.8   5.5   6.3
Standard Chartered Bank    6.9   8.5   5.3   4.3
Barclays Bank              7.1   3.5

Do these data indicate a significant difference among the mean waiting times of these banks? Use the 0.05 significance level.

3) A wholesaler is interested in comparing the weight in grammes of tomatoes from Lusaka, Ndola and Kitwe.

Lusaka   Ndola   Kitwe
5.6      7.8     11.0
8.8      8.2     10.1
9.0      7.4     8.9
         8.2     9.3
                 10.0

a) What are the null and alternate hypotheses?
b) Fill in an ANOVA table.
c) What is the critical value of F, assuming a 0.01 level of significance?
d) What decision should the wholesaler make?

4) Refer to problem 3. Let $\mu_A$ and $\mu_B$, respectively, denote the mean weights in grammes of tomatoes from Lusaka and Ndola.
a) Find a 95 percent confidence interval for $\mu_A$.
b) Find a 95 percent confidence interval for $\mu_B$.
c) Find a 95 percent confidence interval for $\mu_A - \mu_B$.
d) What conclusion can you draw from the interval in c)?

5) An experiment was conducted to compare the effects of four different chemicals, A, B, C and D, in producing water resistance in textiles. Strips of material were randomly assigned to receive one of the four chemicals, A, B, C or D. This process was replicated three times, thus producing a randomized block design. The design, with moisture-resistance measurements, is as shown in the accompanying diagram (low readings indicate low moisture penetration).
a) Do these data indicate a significant difference among the mean moisture penetrations for the four chemicals? Use the 0.05 significance level.
b) Do the data provide evidence to indicate that blocking increased the amount of information in the experiment?
c) Find a 95% confidence interval for the difference in mean moisture penetration for fabric treated by chemicals A and D.
d) Interpret the interval.
Block 1:   C 9.9    A 10.1   B 11.4   D 12.1
Block 2:   D 13.4   B 12.9   A 12.2   C 12.3
Block 3:   B 12.7   D 12.9   C 11.4   A 11.9

ANSWERS

1)
Diet food    A       B       C       D
             3.6     3.1     3.2     3.5
             4.1     3.2     3.5     3.8
             4.0     3.9     3.5     3.8
Total ΣX     11.7    10.2    10.2    11.1
ΣX²          45.77   35.06   34.74   41.13

$$SS\ Total = \sum X^2 - \frac{\left(\sum X\right)^2}{N} = 156.7 - \frac{(43.2)^2}{12} = 156.7 - 155.52 = 1.18$$

$$SST = \sum \frac{T^2}{n} - \frac{\left(\sum X\right)^2}{N} = \frac{(11.7)^2 + (10.2)^2 + (10.2)^2 + (11.1)^2}{3} - 155.52 = \frac{136.89 + 104.04 + 104.04 + 123.21}{3} - 155.52 = 156.06 - 155.52 = 0.54$$

SSE = SS Total - SST = 1.18 - 0.54 = 0.64

Source of variation   Sum of squares   Degrees of freedom   Mean square   F
SST                   0.54             3                    0.18          2.25
SSE                   0.64             8                    0.08
SS Total              1.18             11

$F_{.05,\,3,\,8} = 4.07$. Accept $H_0$.

2)
Bank                       Waiting time          Sample size   ΣX     ΣX²
ZNCB                       4.8, 5.5, 6.3         3             16.6   92.98
Standard Chartered Bank    6.9, 8.5, 5.3, 4.3    4             25.0   166.44
Barclays                   7.1, 3.5              2             10.6   62.66

$$SS\ Total = 322.08 - \frac{(52.2)^2}{9} = 322.08 - 302.76 = 19.32$$

$$SST = \frac{(16.6)^2}{3} + \frac{(25)^2}{4} + \frac{(10.6)^2}{2} - 302.76 = 91.853 + 156.25 + 56.18 - 302.76 = 304.283 - 302.76 = 1.523$$

SSE = SS Total - SST = 19.32 - 1.523 = 17.797

Source of variation   Sum of squares   Degrees of freedom   Mean square   F
SST                   1.523            2                    0.7615        0.257
SSE                   17.797           6                    2.966
SS Total              19.32            8

$F_{.05,\,2,\,6} = 5.14$. Accept $H_0$.

3) $H_0$: $\mu_1 = \mu_2 = \mu_3$; $H_a$: not all equal. Reject $H_0$ if F is greater than 8.02.

$$SS\ Total = 928.59 - \frac{(104.3)^2}{12} = 928.59 - 906.54 = 22.05$$

$$SST = \frac{(23.4)^2}{3} + \frac{(31.6)^2}{4} + \frac{(49.3)^2}{5} - 906.54 = 11.718$$

SSE = 22.05 - 11.718 = 10.332

Source of variation   Sum of squares   Degrees of freedom   Mean square   F
SST                   11.718           2                    5.859         5.10
SSE                   10.332           9                    1.148
SS Total              22.05            11

$H_0$ cannot be rejected. The evidence does not suggest any differences in the weights of tomatoes.
4) a) For a single treatment mean:

$$\bar{T}_1 \pm t_{\alpha/2}\,\frac{S}{\sqrt{n_1}}, \qquad \text{where } S = \sqrt{S^2} = \sqrt{MSE}$$

$7.8 \pm t_{.025,\,9}\,(1.071)/\sqrt{3} = 7.8 \pm 2.262\,(0.618)$, giving (6.402, 9.198).

b) $7.9 \pm (2.262)(1.071)/\sqrt{4} = 7.9 \pm 1.2$, giving (6.7, 9.1).

c) $$(\bar{T}_i - \bar{T}_j) \pm t_{\alpha/2}\,S\sqrt{\frac{1}{n_i} + \frac{1}{n_j}}$$

$(7.8 - 7.9) \pm (2.262)(1.071)\sqrt{\frac{1}{3} + \frac{1}{4}} = -0.1 \pm 1.85$, giving (-1.95, 1.75).

d) This interval traps 0, which implies there is no significant difference between the two means.

5)
$$SSB = \frac{(43.5)^2}{4} + \frac{(50.8)^2}{4} + \frac{(48.9)^2}{4} - \frac{(143.2)^2}{12} = 473.0625 + 645.16 + 597.8025 - 1708.85 = 7.175$$

$$SS\ Total = 1721.76 - \frac{(143.2)^2}{12} = 1721.76 - 1708.85 = 12.91$$

$$SST = \frac{(34.2)^2}{3} + \frac{(37)^2}{3} + \frac{(33.6)^2}{3} + \frac{(38.4)^2}{3} - 1708.85 = 389.88 + 456.33 + 376.32 + 491.52 - 1708.85 = 5.2$$

SSE = 12.91 - 7.175 - 5.2 = 0.535

Source of variation   Sum of squares   Degrees of freedom   Mean square   F
SST                   5.2              3                    1.7333        19.43
SSB                   7.175            2                    3.5875        40.22
SSE                   0.535            6                    0.0892
SS Total              12.91            11

a) $H_0$: $\mu_A = \mu_B = \mu_C = \mu_D$; $H_a$: not all equal. $F_{.05,\,3,\,6} = 4.76$; reject $H_0$.
b) $H_0$: $\mu_1 = \mu_2 = \mu_3$; $H_a$: not all equal. $F_{.05,\,2,\,6} = 5.14$; reject $H_0$.
c) $$(11.4 - 12.8) \pm t_{.025,\,6}\sqrt{0.0892\left(\frac{1}{3} + \frac{1}{3}\right)} = -1.4 \pm 2.447\,(0.2439) = -1.4 \pm 0.597$$
giving (-1.997, -0.803).

Learning Objectives
After working through this Chapter you should be able to:
• Explain the purpose of analysis of variance.
• Carry out small examples of one-way and two-way analysis of variance with a hand calculator, presenting the results in an ANOVA table.
• Carry out tests of hypotheses, and write down confidence intervals, as in this Chapter.

Sample Examination Questions

1. a) A restaurant owner operates three restaurants within a city: one in a major shopping centre (A), one near the college campus (B), and one at the park area (C). The management has collected the following data on daily sales (in thousands of kwacha).

Day         A      B      C
Monday      10.5   8.4    5.9
Tuesday     8.4    9.3    7.1
Friday      12.6   11.4   6.7
Saturday    18.3   7.9    14.2
Sunday      10.8   6.3    13.7

(i) What type of experimental design is represented here?
(ii) Construct an ANOVA summary table for this experiment.
(iii) Is there evidence of a difference in mean sales among the restaurants? (Use α = 0.05.)
(iv) Is there evidence (at α = 0.05) of a difference in the mean sales for the five days?
(v) Estimate the difference in mean sales between the restaurants located at the shopping centre and near the college campus. Use a 90% confidence interval.
(vi) State the assumptions required for the validity of the procedures used in parts (ii) to (v).

(b) A major appliance dealer wishes to compare his mean television sales during three different periods of the week: Beginning (Monday, Tuesday), Middle (Wednesday, Thursday), and End (Friday, Saturday). His plan is to select random samples of sales records from each period, and record the number of television sets sold. What type of experimental design is this?

2. (a) What is a two-way ANOVA test?
(b) A power plant, which uses water from the surrounding bay for cooling its condensers, is required by the Environmental Protection Agency (EPA) to determine whether discharging its heated water into the bay has a detrimental effect on the flora (plant life) in the water. The EPA requests that the power plant make its investigation at three strategically chosen locations, called stations. Stations 1 and 2 are located near the plant's discharge tubes, while station 3 is further out in the bay. During one randomly selected day in each of 4 months, a diver is sent down to each of the stations, randomly samples a square meter area of the bottom, and counts the number of blades of the different types of grasses present. The results are as follows for one important grass type.

            Station
Month       1      2      3
May         28     31     53
June        25     22     61
July        37     30     56
August      20     26     48

(i) Is there sufficient evidence to indicate a difference among the mean numbers of blades found per square meter per month for the three stations? Use α = 0.05.
(ii) Is there sufficient evidence to indicate a difference among the mean numbers of blades found per square meter for the 4 months?
Use α = 0.05.
(c) Place a 90% confidence interval on the difference in means between stations 1 and 3.

3. (a) An advertising firm is studying the effects of four different kinds of displays of a product in a grocery store in three different sales areas in the city. Within each sales area, four stores are selected, and each receives one of the four displays. Over the duration of the experiment, the number of units of the product sold is recorded. The data are shown in the table.

            Sales area
Display     1      2      3
A           120    76     95
B           114    60     102
C           140    85     122
D           102    80     85

(i) Which model is appropriate for analyzing these data? Explain.
(ii) Do the four displays result in different averages? Use α = 0.05 to reject.
(b) State the three assumptions about the error term in analysis of variance models. Which of the three assumptions is most critical in validating an analysis of variance model fitted to a data set?

4. (a) What is an ANOVA test?
(b) A supermarket chain conducted a study to determine where to place its generic brand products in order to increase sales. Sales (in thousands of kwacha) for one period were as follows:

                   Store 1   Store 2   Store 3
High shelf         60        56        52
Eye-level shelf    53        58        56
Low shelf          55        55        59

Perform a two-way analysis of variance, using the level of significance α = 0.05.

5. (a) Three of the currently most popular television shows produced the following ratings (percentage of the television audience tuned into the show) over a period of four weeks:

            Show
Week        A       B       C       Totals
1           34.7    28.4    23.8    86.9
2           38.1    32.2    20.7    91.0
3           35.1    32.4    25.8    93.3
4           30.4    28.2    28.9    87.5
Totals      138.3   121.2   99.2    358.7

(i) Is there evidence (at α = 0.01) that the mean ratings differ for the three shows?
(ii) Is there evidence (at α = 0.01) that the use of weeks as blocks is justified in this experiment?
(iii) Construct a 95% confidence interval for the difference in mean ratings between shows B and C.
(iv) State the assumptions necessary for the validity of the procedures used in (i) to (iii).

(b) Independent random samples of six assistant professors, four associate professors and five full professors were asked to estimate the amount of time outside the classroom spent on teaching responsibilities in the last week. The results, in hours, are shown in the accompanying table.

Assistant    8    13   12   16   10   12
Associate    16   13   16   9
Full         12   8    7    10   8

(i) What type of experimental design is represented here?
(ii) Set out the analysis of variance table.
(iii) Test the null hypothesis that the three population mean times are equal. Use α = 0.05.

CHAPTER 10
TIME SERIES

Reading
Newbold Chapter 17
Tailoka Frank P Chapter 6
Plane and Oppermann 395

Introductory Comments
This Chapter follows from the Index and allows the understanding of some alternative ways of presenting the results. Index numbers play an important role in forecasting, and here models of forecasting are presented.

10.1 Introduction
Any variable that is measured over time in sequential order is called a time series. The primary characteristic of a time series is the assumption that the observations have some form of dependence on time. Since this time dependence may take on any number of possible patterns, the problem becomes one of identifying the most important factors. Business people, economists, and analysts of various kinds all look back at the sequence of events that occurred over the past year or years in order to understand what happened and thereby (they hope) to be in a better position to anticipate what may happen in the future. A levelling-off of long-term population growth, for example, may indicate to a particular firm that future market expansion may not be unlimited and that more careful attention should be paid to increasing the firm's market share.
Even with a general slowdown in population growth, the gradual aging of the population may imply to another firm (one concentrating on consumer goods for older people) that its total market potential is growing substantially year after year. Other types of time-dependent patterns may exist as well. In looking at a time series of monthly or quarterly beer sales, for example, we may discover a regular seasonal pattern in which beer consumption peaks. Other regular periodic or seasonal variation can be observed in sales of college textbooks, and in the observance of such social customs as giving Christmas gifts and Valentine's Day flowers.

The task of time-series analysis can therefore be thought of quite generally as a matter of identifying and isolating the various major time-dependent patterns in a given time-series data array. Once accomplished, this analysis should enhance the user's ability to forecast variables of interest over the future.

The classical time-series model focuses on the decomposition of the time-dependent variable into four component parts: trend (T), cycle (C), seasonal variation (S), and residual or irregular variation (I). The model may be additive in its component parts,

$$Y_t = T_t + C_t + S_t + I_t$$

or multiplicative in its component parts,

$$Y_t = T_t \times C_t \times S_t \times I_t$$

The movements of a time series may be classified as follows:

1. A trend (also known as a secular trend) is a long-term, relatively smooth pattern or direction that the series exhibits. By definition, it has a duration of more than one year. For example, data for beer sales show them to have an upward trend to the right, whereas birth rates over the last few years seem to have a downward trend to the right.

2. A cycle is a wavelike or oscillatory pattern about a long-term trend that is generally apparent over a number of years. By definition, it has a duration of more than one year.
Examples of cycles are the well-known business cycles that record periods of economic recession and inflation, long-term product demand cycles, and cycles in the monetary and financial sectors.

3. Residual or irregular variation is the random movement that a series exhibits after the trend, cycle, and seasonal variation are removed. For example, the daily rainfall in centimetres in a particular urban area during a given month is often random in this sense. Notice that all time series exhibit random variation, while they may not have a trend, a cycle, or seasonal variation. Moreover, whether or not a particular trend, cycle, or seasonal variation is present in a given time series depends critically on the time period chosen for observation.

4. Seasonal variations are the oscillations which depend on the season of the year. Thus, employment is usually higher at harvest time at Nakambala Sugar Estate in Mazabuka. Rainfall will be higher at some times of the year than at others.

The motivation behind decomposing a time series is twofold. On the one hand, we wish to see whether a particular component is present in a given time series and to understand the extent to which it explains some of the movements in the variable of interest. On the other hand, if we wish to forecast a particular variable, we can usually improve our forecasting accuracy by first breaking it into component parts, then forecasting each of these parts separately, and finally combining the individual effects to produce the composite overall forecast.

Business forecasting is concerned with estimating the future value of some variable of interest. This may be done for the short term or for the long term, and different forecasting models are more appropriate for one case than for the other. Forecasting may be done in any of three possible ways: using regression models, using time-series models, and using forecasting models especially created for a specific purpose.
Indeed, quantitative forecast models have even been designed for cases in which historical databases are not available – such as when a firm wishes to forecast sales of a new product or the expected profitability or market share for such a product. Today, forecasters have developed a specialized terminology or jargon and many forecasting models require a level of mathematical sophistication and the availability of computers and specialized computer software that go far beyond the scope of this book. As such, our objective in this course is to provide the student with a basic understanding of the underlying issues about the use of various types of forecasting models, rather than to provide a sophisticated level of hands – on experience. 10.2 Trend Analysis The first component of a time series that we will consider is the long-term trend. A trend can be linear or nonlinear and, indeed, can take on a whole host of other functional forms such as polynomials and logarithmic trends, among others. We shall begin by working through an example using a linear model. Example Annual sales for a pharmaceutical company have been recorded over the past 10 years; they are shown in Table 1.1. Calculate a linear trend of the data. Table 1.1 – Annual Data for Pharmaceutical example. How we measure time along the horizontal axis (it turns out) is irrelevant in timeseries analysis. We can suit ourselves, picking whatever numbers serve to reduce the computational burden. A common practice is to measure the time periods consecutively (1, 2, 3, ….), and we shall do so here. 
Table 1.2 – Calculations for Example 1.1

YEAR     TIME X    SALES Y (in K millions)    X²      XY
1975        1            18.0                   1      18.0
1976        2            19.4                   4      38.8
1977        3            18.0                   9      54.0
1978        4            19.9                  16      79.6
1979        5            19.3                  25      96.5
1980        6            21.1                  36     126.6
1981        7            23.5                  49     164.5
1982        8            23.2                  64     185.6
1983        9            20.4                  81     183.6
1984       10            24.4                 100     244.0
TOTAL      55           207.2                 385   1,191.2

Least Squares Method: The simplest method of fitting a linear trend is to use the least squares approach we discussed in the handout on Regression Analysis. In this method, the formulas for the slope and intercept are:

$$b = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sum x^2 - \dfrac{(\sum x)^2}{n}}, \qquad a = \bar{y} - b\bar{x}$$

$$b = \frac{1191.2 - \dfrac{(55)(207.2)}{10}}{385 - \dfrac{(55)^2}{10}} = \frac{51.6}{82.5} = .6255$$

$$a = \frac{207.2}{10} - .6255\,\frac{(55)}{10} = 17.28$$

and the following trend equation can be written as:

$$Y = 17.28 + .6255x$$

Forecasting 1 year ahead (1985) involves setting x equal to 11:

$$Y = 17.28 + .6255(11) = 24.1605 = 24.16$$

Similarly, forecasting 2 years ahead would involve setting x equal to 12; and so on. Both confidence and prediction intervals can be constructed to give us a bound of confidence about our forecast. The caveat about forecasting outside the data range must be emphasized here, especially if forecasting for more than one time period is being contemplated.

Among the more common functional forms used in trend analysis are the following three:

1. A linear model,
$$y = P_0 + P_1 x$$
which is appropriate if the first differences are roughly equal (first differences are the differences between successive values in the time series).

2. A polynomial form,
$$y = P_0 + P_1 x + P_2 x^2 \ \text{(parabola)} \quad \text{or} \quad y = P_0 + P_1 x^2 \ \text{(parabola)}$$
which is appropriate if the second differences are roughly equal (second differences are the differences between successive first differences).

3. A logarithmic trend or exponential trend,
$$Y = P_0 (P_1)^x \quad \text{or} \quad \log y = \log P_0 + (\log P_1)\,x$$
which is appropriate if neither a linear nor a polynomial form fits but there nonetheless appears to be a constant rate of increase over time.
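The least squares calculation for the pharmaceutical example can be checked with a short Python sketch. It is only an illustration of the slope and intercept formulas given above, applied to the sales figures of Table 1.2; the variable and function names are our own and are not part of the course material. Because the sketch keeps full precision for b, its intermediate values differ slightly from the hand calculation, which rounds b to .6255.

```python
# Linear trend by least squares for the pharmaceutical sales data (Table 1.2).
sales = [18.0, 19.4, 18.0, 19.9, 19.3, 21.1, 23.5, 23.2, 20.4, 24.4]  # Y, in K millions
time = list(range(1, len(sales) + 1))                                  # X = 1, 2, ..., 10

n = len(sales)
sum_x = sum(time)
sum_y = sum(sales)
sum_x2 = sum(x * x for x in time)
sum_xy = sum(x * y for x, y in zip(time, sales))

# Slope b and intercept a from the formulas in the text.
slope = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
intercept = sum_y / n - slope * sum_x / n


def trend(x):
    """Trend value for time period x (x = 11 is one year ahead, 1985)."""
    return intercept + slope * x


print(f"b = {slope:.4f}, a = {intercept:.2f}")
print(f"forecast for x = 11: {trend(11):.2f}")
print(f"forecast for x = 12: {trend(12):.2f}")
```

Changing the argument of the hypothetical `trend` function gives forecasts further ahead, subject to the caveat in the text about forecasting outside the data range.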
10.3 Moving Averages

An alternative approach to trend-cycle analysis is to use moving averages. In a sense, the moving average, MA, takes away the short-term seasonal and irregular variation, leaving a combined trend-cycle. Moving averages are widely used to remove seasonal variation, irregular variation (or "noise", as it is also called), or both.

Example 1.2
Monthly sales figures for gasoline were recorded at all the gas stations in a particular town, as shown in Table 1.3. Calculate the three-month and five-month moving averages.

Table 1.3 – Monthly Regional Gasoline Sales

MONTH                                   1   2   3   4   5   6   7   8   9  10  11  12
GASOLINE SALES (1000s of kilograms)    37  70  45  26  60  45  31  79  24  61  25  44

Solution
A moving average is a simple arithmetic average computed over any number of time periods. For a three-period moving average, we take the first three months (1, 2, and 3) and average them. Then we move to the next grouping of months (2, 3 and 4) and average them; and so on. In a similar fashion, we can compute 5-month moving averages, as shown in Table 1.4, or averages over any other number of months.

Table 1.4 – Calculations for Moving Averages for Gasoline Sales Example

                       3-month MA                 5-month MA
Month   Gasoline   Moving    ÷ 3 = Moving     Moving    ÷ 5 = Moving
        Sales      Total     Average          Total     Average
  1        37
  2        70        152        50.7
  3        45        141        47.0            238        47.6
  4        26        131        43.7            246        49.2
  5        60        131        43.7            207        41.4
  6        45        136        45.3            241        48.2
  7        31        155        51.7            239        47.8
  8        79        134        44.7            240        48.0
  9        24        164        54.7            220        44.0
 10        61        110        36.7            233        46.6
 11        25        130        43.3
 12        44

Notice that the longer the time period over which we average, the smoother the series becomes. Eventually it becomes a straight line. Moving averages also reduce the number of observation points: for the 3-month moving average, we lose the first and last month; for the 5-month moving average, we lose both the first 2 and the last 2 months.
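The moving average calculation described above can be sketched in a few lines of Python. This is a minimal illustration using the gasoline sales figures of Table 1.3; the function name `moving_average` is our own choice and does not come from any particular library.

```python
def moving_average(values, window):
    """Simple (uncentered) moving averages of `values` over `window` periods."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Monthly gasoline sales (1000s of kilograms), months 1 to 12.
gasoline = [37, 70, 45, 26, 60, 45, 31, 79, 24, 61, 25, 44]

ma3 = [round(v, 1) for v in moving_average(gasoline, 3)]  # centred on months 2..11
ma5 = [round(v, 1) for v in moving_average(gasoline, 5)]  # centred on months 3..10

print("3-month MA:", ma3)
print("5-month MA:", ma5)
```

The 3-month list has 10 entries and the 5-month list has 8, which reflects the loss of observation points at each end of the series noted in the text.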
In general, if we set the period of the moving average exactly equal to the number of seasonal variations that occur in a given time series, we exactly remove that seasonal variation. For example, if we have quarterly observations and wish to remove the four seasons, we choose a 4-period moving average. Here (and in general), when the number of periods chosen is even, we must compute a centered moving average.

Example 1.3
Historical occupancy rates for a Kasaba resort hotel have been compiled by the government tourism office; these are shown in Table 1.5. Calculate the 4-quarter moving average.

Solution
To remove the seasonal variation, we need to compute a 4-period moving average. This, however, would place the moving average exactly between two quarters. Consequently, we next take a 2-period moving average of all 4-period moving averages, thereby centering the final moving average on a particular quarter. Our calculations appear in Table 1.6.

Notice that we first calculated the 4-quarter moving average and then centered it by determining the averages of each pair of adjacent moving averages. For example, the moving average of the first four quarters is 1972.75. The moving average of quarters II, III and IV of 1980 and quarter I of 1981 is 1983.50. The centered moving average is (1972.75 + 1983.50)/2 = 1978.1. The remaining centered moving averages are computed in a similar manner.

Table 1.5 – Hotel Occupancy Rates

                 Hotel Occupancy by Quarter
Year         I        II       III       IV
1980       1682      2105      2401     1703
1981       1725      2215      2603     1815
1982       1783      2215      2187     1801
1983       1867      2124      2417     1896
1984       1995      2504      2619     2011

Moving averages are specifically designed to remove seasonal and/or irregular variations. As such, they can be thought of as serving three purposes. First, they are one of several types of smoothing techniques that remove short-term variation and leave only a combined trend-cycle.
In other words, if we think of the classical multiplicative time-series model, we have

$$Y = T \cdot C \cdot S \cdot I$$

By dividing both sides by $S \cdot I$, we get

$$\frac{Y}{S \cdot I} = \frac{T \cdot C \cdot S \cdot I}{S \cdot I} = T \cdot C = MA$$

That is, we are left with the moving average series, which is composed solely of the trend and cycle.

Second, we can set the period of the moving average exactly equal to the number of seasonal effects we wish to remove. In that sense, we have deseasonalized our time series.

Table 1.6 – Centered Moving Average Calculation for Hotel Occupancy

Each 4-quarter moving average falls midway between two quarters and is entered here against the later of them; the 2-term moving total adds it to the next 4-quarter moving average.

Year   Quarter   Occupancy   4-Quarter         2-Term          Centered
                             Moving Average    Moving Total    Moving Average
1980     I         1682          -                 -               -
         II        2105          -                 -               -
         III       2401        1972.75          3956.25          1978.1
         IV        1703        1983.50          3994.50          1997.3
1981     I         1725        2011.00          4072.50          2036.3
         II        2215        2061.50          4151.00          2075.5
         III       2603        2089.50          4193.50          2096.8
         IV        1815        2104.00          4208.00          2104.0
1982     I         1783        2104.00          4104.00          2052.0
         II        2215        2000.00          3996.50          1998.3
         III       2187        1996.50          4014.00          2007.0
         IV        1801        2017.50          4012.25          2006.1
1983     I         1867        1994.75          4047.00          2023.5
         II        2124        2052.25          4128.25          2064.1
         III       2417        2076.00          4184.00          2092.0
         IV        1896        2108.00          4311.00          2155.5
1984     I         1995        2203.00          4456.50          2228.3
         II        2504        2253.50          4535.75          2267.9
         III       2619        2282.25             -               -
         IV        2011          -                 -               -

This is one of the simplest methods of forecasting, but it is only appropriate for series with no trend or seasonal effect. It is often used to predict the demand for a product in the next time period so that sufficient stock can be kept to supply it. (This is called demand forecasting.)

10.4 Irregular Variation

Irregular or random variation remains after the trend, cyclic and seasonal variation have been removed. One way of removing it is through smoothing techniques, such as the moving average we discussed in Section 10.3. Another popular technique is exponential smoothing, which we shall look at shortly.
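Before moving on, the centered moving average of Table 1.6 can be checked with a short Python sketch. It assumes the hotel occupancy figures of Table 1.5 entered quarter by quarter; the function name `centered_moving_average` is illustrative only.

```python
def centered_moving_average(values, window):
    """Centered moving average for an even `window`.

    Step 1: ordinary moving averages over `window` periods.
    Step 2: average each adjacent pair, which centres the result on a period.
    """
    ma = [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]
    return [(first + second) / 2 for first, second in zip(ma, ma[1:])]


# Hotel occupancy, 1980 quarter I through 1984 quarter IV.
occupancy = [
    1682, 2105, 2401, 1703,
    1725, 2215, 2603, 1815,
    1783, 2215, 2187, 1801,
    1867, 2124, 2417, 1896,
    1995, 2504, 2619, 2011,
]

cma = centered_moving_average(occupancy, 4)

# The first value is centred on 1980 quarter III, the last on 1984 quarter II.
for value in cma:
    print(round(value, 1))
```

With 20 quarters and a 4-quarter window, the sketch returns 16 centered values, so the first two and the last two quarters have no centered moving average.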
By definition, irregular variation is unpredictable and random. It can only sometimes be identified through examination of major external events that might have influenced the time series, and irregular movements often tend to cancel each other out over time. Although certain mathematical techniques (such as spectral analysis) address themselves to irregular variation and movements in residual error terms, they are beyond the scope of this course.

Exponential Smoothing – Exponential smoothing offers an alternative to moving averages as a way of smoothing a time series. The smoothed series is given by

$$S_t = \alpha Y_t + \alpha(1-\alpha)Y_{t-1} + \alpha(1-\alpha)^2 Y_{t-2} + \dots$$

This formula states that the current period's smoothed value of the time series, $S_t$, depends on all past values of the dependent variable, although these are weighted progressively less the farther back they go. We set the smoothing constant α such that $0 \le \alpha \le 1$, which means that the successive weights $\alpha, \alpha(1-\alpha), \alpha(1-\alpha)^2, \dots$ get smaller and smaller. There is a mathematical procedure for selecting the best or optimal value of the smoothing constant, but it is beyond the level of this course. In fact, selecting small values for α straightens out the time series more completely than selecting large values of α does.

By simple mathematical derivation, it can be shown that the extended exponential smoothing equation just described reduces to a computationally simpler form, called the basic exponential smoothing equation:

$$S_t = \alpha Y_t + (1-\alpha)S_{t-1} \quad \text{or} \quad S_t = \alpha(Y_t - S_{t-1}) + S_{t-1} \quad \text{for } 0 < \alpha < 1 \qquad (1)$$

Note that $S_t$ is the forecasted value and $Y_t$ is the actual value. We begin the smoothing procedure by initially setting $S_1 = Y_1$ in the first period. Successive values are individually computed as:

$$S_2 = \alpha Y_2 + (1-\alpha)S_1$$
$$S_3 = \alpha Y_3 + (1-\alpha)S_2$$

and so on. Setting the smoothing constant to either of its extremes yields one of two cases. When α = 0,

$$S_t = (0 \cdot Y_t) + (1-0)S_{t-1} = S_{t-1}$$

Since we set $S_1 = Y_1$, it follows that $S_t = Y_1$ for all t.
Thus the smoothed values are simply equal to the initial value of the time series. Setting α = 1,

$$S_t = (1 \cdot Y_t) + (1-1)S_{t-1} = Y_t$$

Thus, the smoothed value of the series is just the most recent observation, and all earlier observations are ignored. Such a series is called a random walk or a naïve forecasting model. Here, the forecast value in any particular year is simply the previous year's value.

The layout for working out problems using equation (1) is as follows:

Time Period (t)   Actual Values (Yt)   α(Yt − St−1)     Forecasted Values St
      1                 Y1             α(Y1 − S0)       S1 = S0 + α(Y1 − S0)
      2                 Y2             α(Y2 − S1)       S2 = S1 + α(Y2 − S1)
      3                 Y3             α(Y3 − S2)       S3 = S2 + α(Y3 − S2)
      .                 .                  .                  .
      .                 .                  .                  .
      t                 Yt             α(Yt − St−1)     St = St−1 + α(Yt − St−1)

The forecast of the value $Y_{t+1}$ is given by $S_t = S_{t-1} + \alpha(Y_t - S_{t-1})$. This single value is then used as the forecast value in all future periods, i.e., for periods t + 2, t + 3, ...

Example 1.3
Consider the example used by Roger C. Pfaffenberger and James H. Patterson in their book Statistical Methods (1987), page 899. Information on monthly sales of computer software from Daltons Software, Inc., in Fort Worth, Texas, for 1986 is given in Table 1.0. Using α values of α = 0.1 and α = 0.9 and a forecast of sales for January 1986 of $2,100, forecast sales for February 1986 through January 1987.

Table 1.0 – Actual and Forecast Sales, α = 0.1

Month 1986     Time Period t   Actual Sales Yt   Forecast Sales St−1   α(Yt − St−1)
January              1            $1,800              2,100               -30
February             2             2,000              2,070                -7
March                3             1,800              2,063               -26
April                4             3,000              2,037                96
May                  5             2,700              2,133                57
June                 6             1,900              2,190               -29
July                 7             3,000              2,161                84
August               8             2,600              2,245                36
September            9             1,700              2,281               -58
October             10             1,200              2,223              -102
November            11             2,400              2,121                28
December            12             1,500              2,149               -65
January 1987        13               -                2,084                 -

A useful rule for finding α is given by the formula

$$\alpha = \frac{2}{n+1}$$

where n is the number of periods in the equivalent moving average. For example, for a 4-quarterly moving average over 1 year (n = 4), α = 0.4. The larger the value of n, of course, and the smaller the value of α, the greater will be the smoothing effect.

Worked Examples

1. Exponentially smooth the following observed series of values:

40, 35, 39, 44, 45, 43, 46

The old forecast for the first observed value should be taken as 40, with α = 0.2.

$$S_t = \alpha(Y_t - S_{t-1}) + S_{t-1}, \qquad \alpha = 0.2$$

  t      Yt     St−1     α(Yt − St−1)      St
  1      40     40           0             40
  2      35     40          -1             39
  3      39     39           0             39
  4      44     39           1             40
  5      45     40           1             41
  6      43     41           0.4           41.4
  7      46     41.4         0.92          42.32

2. Exponentially smooth the following data. What is the new forecast for the production of aircraft in 1971? (Take α = 0.25.)

Year     t     Production of New     St−1     α(Yt − St−1)
               Aircraft Yt
1960     1          518              518          0
1961     2          395              518        -31
1962     3          487              487          0
1963     4          450              487         -9
1964     5          319              478        -40
1965     6          415              438         -6
1966     7          431              432         -0.25
1967     8          312              432        -30
1968     9          278              402        -31
1969    10          500              371         32
1970    11          450              403         12
1971    12           -               415          -

The new forecast for the production of aircraft in 1971 is 415.

Problems:

1. The accompanying table shows earnings per share of a corporation over a period of 18 years.

Year   Earnings     Year   Earnings     Year   Earnings
  1      3.63         7      7.01        13      3.54
  2      3.62         8      6.37        14      1.65
  3      3.66         9      5.82        15      2.15
  4      5.31        10      4.98        16      6.09
  5      6.14        11      3.43        17      5.95
  6      6.42        12      3.40        18      6.26

(a) Using smoothing constants α = 0.3, 0.5, 0.7 and 0.9, find forecasts based on simple exponential smoothing.
(b) Which of the forecasts would you choose to use?

2. Manufacturers' Sales of Women's Footwear (m. pairs)

Year    1st Quarter   2nd Quarter   3rd Quarter   4th Quarter
1966       20.9          17.3          15.6          13.9
1967       17.5          14.7          13.5          13.1
1968       17.0          13.5          13.5          13.7

Is there any evidence that manufacturers' sales of women's footwear are subject to seasonal variation? Predict manufacturers' sales during the first quarter of 1969.

New Forecasted Value = old forecasted value + α (actual observation – old forecasted value).
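The update rule just stated can be sketched in Python. The example below applies it to the series of Worked Example 1 (old forecast 40, α = 0.2); the function name `exponential_smoothing` and its parameter names are our own, chosen only for illustration. Unlike the hand-worked tables, the sketch does not round intermediate values, so results for other data sets may differ slightly from tables that round at each step.

```python
def exponential_smoothing(actuals, alpha, initial_forecast):
    """Apply: new forecast = old forecast + alpha * (actual - old forecast).

    Returns the list of new forecasts, one per actual observation.
    """
    forecasts = []
    old_forecast = initial_forecast
    for actual in actuals:
        new_forecast = old_forecast + alpha * (actual - old_forecast)
        forecasts.append(new_forecast)
        old_forecast = new_forecast  # the new forecast becomes the old one next period
    return forecasts


# Worked Example 1: observed series with an initial (old) forecast of 40.
observed = [40, 35, 39, 44, 45, 43, 46]
smoothed = exponential_smoothing(observed, alpha=0.2, initial_forecast=40)

print([round(s, 2) for s in smoothed])
```

The last value in the list is the forecast for the period following the final observation. Re-running the sketch with a larger α makes the forecasts follow the most recent observations more closely, in line with the discussion of the smoothing constant above.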
α = 0.2

Period Reference   Actual Demand   Old Forecast   New Forecast
       1                16            16            16.00
       2                20            16            16.80
       3                15            16.80         16.44
       4                19            16.44         16.95
       5                17            16.95         16.96
       6                21            16.96         17.77
       7                25            17.77         19.22
       8                 -            19.22           -

y2 = 16 + 0.2(20 − 16) = 16 + 0.2(4) = 16.80
y3 = 16.80 + 0.2(15 − 16.80) = 16.80 + 0.2(−1.8) = 16.44
y4 = 16.44 + 0.2(19 − 16.44) = 16.44 + 0.2(2.56) = 16.952
y5 = 16.95 + 0.2(17 − 16.95) = 16.95 + 0.2(0.05) = 16.96
y6 = 16.96 + 0.2(21 − 16.96) = 16.96 + 0.2(4.04) = 17.77
y7 = 17.77 + 0.2(25 − 17.77) = 17.77 + 0.2(7.23) = 19.22

Learning Objectives

After working through this Chapter you should be able to:
• Define the term time series
• Discuss the appropriate model to use when forecasting: the least squares method, the moving average method, and the exponential smoothing method.

Sample Examination Questions

1. Live births in XX and XY 2002 – 2004 (in thousands)

                Quarters
Year       1      2      3      4
2002      262    263    264    250
2003      255    256    253    240
2004      251    250    247    237

(a) By means of a moving average, find the trend and seasonal adjustments.
(b) Forecast the number of live births in XX and XY for the first two quarters of 2005.
(c) Discuss briefly the accuracy of your forecasts.

2. The sales (× K1,000,000) of golf equipment by a large sports shop are shown for each quarter over four years as follows:

Quarter     2002    2003    2004    2005
First         -      10      25      45
Second        -      35      55      67
Third         -      65      85      97
Fourth       29      25      45       -

(a) Using an additive model, find the centred moving average trend.
(b) Find the average seasonal variation for each quarter.
(c) Predict sales for the last quarter of 2005 and the first quarter of 2006, stating any assumptions.

3. In the series below the trend is linear and is given by the equation

$$Y_t = 5.42 + 1.49 X_t$$

where t indicates the quarter, taking the values 1, 2, 3, ..., 16.

Quarterly Sales of 'Musenge' (sales are in K'000,000)

                Quarter
Year       1       2       3       4
1999      8.2     5.8     8.7    13.9
2000     15.5    12.2    13.5    20.0
2001     21.2    17.0    20.2    26.2
2002     25.4    23.5    25.3    32.6

(a) Plot the data and the trend line on a graph.
(b) Using the graph (or otherwise) obtain the deviations from the trend line and hence obtain estimates of the seasonal variations. Assume that the additive model $Y_t = T_t + S_t + R_t$ is appropriate, where $S_t$ and $R_t$ are respectively the seasonal and the residual components of the series.
(c) Obtain a forecast of the sales for the quarters of 2003. State clearly the assumptions underlying the procedure that you have used.

4. (a) The number of prescriptions dispensed by chemists under the Health Board in country X during the five-year period is shown in the table below.

Health Board Prescriptions (millions)

                Quarter
Year       1      2      3      4
1          -      -     62     73
2         71     69     64     71
3         75     68     64     70
4         71     68     67     69
5         77      -      -      -

(i) Plot the data on a graph.
(ii) Calculate a suitable moving average and plot it on the same graph.
(iii) Using the additive model, obtain the average seasonal variation.

(b) With which characteristic movement of a time series would you mainly associate each of the following:
(i) a boom in business
(ii) an increase in employment during the harvest season
(iii) a minor fire delaying production for a month
(iv) a decrease in the sales of black and white television sets.

CHAPTER 11

INDEX NUMBERS

Reading
Newbold Chapter
Plane and Oppermann Chapter 16
Tailoka Frank P Chapter 5

Introductory Comments

This Chapter looks at the index number, which is useful in describing the way in which the economy changes from period to period in terms of prices, quantities, etc.

A device constructed by statisticians which attempts to explain the magnitude of economic changes over time is called an index number. An index number shows the rate of change of a variable from one specification to another. You will realize that the index of retail prices attempts to measure the change in the price of a whole range of goods and services that we regularly buy. So you can see that it is attempting to measure the cost of living, something that concerns us all.
In times of inflation, the retail price index is probably more important than at any other time in its existence. In developed countries, increases in pay and pensions are index-linked. However, to make use of an index we need to know what an index is, how it is calculated and what its limitations are.

The primary function of a price index is to compare prices in one year with those in some other year. Technically, prices in a given year are compared with prices in the base year, which are taken as standard. Conventionally $P_1$ refers to the price in the given year and $P_0$ refers to the price in the base year.

A Price Index measures the change in the money value of a group of items over time. If only one item, such as bread, is being considered, the comparison between years may be made by the calculation of price relatives, i.e., the prices in the given year relative to the base year.

$$\text{Price relative} = \frac{P_1}{P_0} \times 100$$

For example, if the price of a loaf was K1100 in 1999 and K1700 in 2000, the 2000 price relative to 1999 was

$$\frac{1700}{1100} \times 100 = 154.5$$

The interpretation of a price index is straightforward. The price index for 2000 is 154.5. This means that the 2000 price of a loaf of bread is 154.5 percent of the 1999 (base year) price of a loaf of bread.

Four Main Considerations to be Borne in Mind When Constructing an Index Number

i) The purpose of the index number.
ii) Selection of items for inclusion.
iii) Selection of appropriate weights.
iv) Selection of a base year.

If more than one item or commodity is to be considered to give an overall impression of rising or falling prices, it becomes necessary to combine the prices of these items into some form of weighted average or index number. The most commonly used form is that calculated by the Laspeyres formula.

$$I_1 = \frac{\sum P_1 q_0}{\sum P_0 q_0} \times 100$$

where
$I_1$ = index number for the given year
$q_0$ = weight applied to each price, calculated from the base year
$\sum$ = sum to be taken over all the items
$P_0$ = price calculated from the base year.
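The Laspeyres formula above can be sketched in Python as follows. The figures are those of the three-commodity question (items A, B and C, base year 1990, given year 1991) that is worked through next in the text; the function name `laspeyres_index` is illustrative and not taken from any library.

```python
def laspeyres_index(base_prices, given_prices, base_quantities):
    """Base-year weighted price index: sum(P1*q0) / sum(P0*q0) * 100."""
    numerator = sum(p1 * q0 for p1, q0 in zip(given_prices, base_quantities))
    denominator = sum(p0 * q0 for p0, q0 in zip(base_prices, base_quantities))
    return numerator / denominator * 100


# Commodities A, B and C.
p0 = [45, 50, 55]     # base-year (1990) prices
p1 = [75, 90, 100]    # given-year (1991) prices
q0 = [500, 35, 65]    # base-year quantities used as weights

index_1991 = laspeyres_index(p0, p1, q0)
print(round(index_1991, 1))
```

An index above 100 indicates that, weighted by base-year quantities, prices have risen relative to the base year.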
Consider question one (Tailoka Frank P, Chapter 5). $I_1$ is then the index number for 1991, with 1990 as the base year ($I_0$).

Item      p1      q0      P0      P1q0      P0q0
A         75     500      45     37500     22500
B         90      35      50      3150      1750
C        100      65      55      6500      3575
Total                            47150     27825

$$I_1 = \frac{\sum P_1 q_0}{\sum P_0 q_0} \times 100 = \frac{47150}{27825} \times 100 = 169.5$$

It may now be stated that prices have risen by 69.5% overall from 1990 to 1991, based on the evidence of these three commodities. This index is a reasonable measure of the change in prices over a short period of, say, two years, but if the given year is further in time from the base year, the weights used tend to become out of date as spending habits change and no longer give a realistic comparison between the two years. This disadvantage may be overcome by using a given-year weighted index, as calculated by the Paasche formula.

$$I_1 = \frac{\sum P_1 q_1}{\sum P_0 q_1} \times 100$$

This index gives the change in the total value of the given-year consumption from the value it would have had in the base year.

Item      p1      q1      P1q1      P0      P0q1
A         75     800     60000      45     36000
B         90     150     13500      50      7500
C        100      80      8000      55      4400
Total                    81500             47900

$$I_1 = \frac{81500}{47900} \times 100 = 170$$

From this calculation prices may be said to have risen 70% overall. However, this formula is equally unrealistic in that it compares hypothetical past quantities with current real quantities rather than vice versa. One suggested way out of the dilemma is to calculate an average index number which is the geometric mean of the Laspeyres and the Paasche index numbers.

$$I_F = \sqrt{I_L \cdot I_P} = \sqrt{\frac{\sum P_1 q_0}{\sum P_0 q_0} \cdot \frac{\sum P_1 q_1}{\sum P_0 q_1}} \times 100$$

2. Changing the Base

The base of an index number series is changed by taking proportions, as illustrated below. Index A has 1971 as a base year and Index B has 1976 as a base year. To convert Index A to Index B, each Index A value was divided by 150 and multiplied by 100. It can be seen that the numbers for each year are in the same proportions for both Index A and Index B.

BASE CHANGE

Year      Index A      Index B
1971        100          66.7
1972        110          73.3
1973        120          80.0
1974        130          86.7
1975        140          93.3
1976        150         100

3.
Chain Index Numbers

In a chain base index the base period progresses by one time period each time; therefore each index number is interpreted relative to the previous period.

$$\text{Chain index} = \frac{\text{Price/Quantity at time } n}{\text{Price/Quantity at time } n-1} \times 100$$

Example: The table below shows the week-ending share price on the stock exchange over a period of four weeks for a local company's shares:

Week         1      2      3      4
Price (K)   250    300    350    225

Calculate and interpret a chain base index using week 1 as the base.

Index (wk 1) = 100

Index (wk 2) = (Price wk 2 / Price wk 1) × 100 = (300/250) × 100 = 120

Index (wk 3) = (Price wk 3 / Price wk 2) × 100 = (350/300) × 100 = 116.67 (to 2 d.p.)

Index (wk 4) = (Price wk 4 / Price wk 3) × 100 = (225/350) × 100 = 64.29 (to 2 d.p.)

At the end of the second week the share price had increased by 20% from the end of the first week. By the end of the third week the share price had increased again, but at a slower rate (16.67%) when compared with week 2. In week 4 the price had dipped, with a 35.71% decrease from week 3.

4. Splicing Overlapping Series of Index Numbers

Suppose index A has a base of 1972 and that in 1974 it becomes necessary to alter the weights used, thus producing a new index, B, based on 1974. However, it is not very meaningful to have an index series covering only three years, such as A; continuity would be maintained if the new series B could be expressed in terms of series A. The process is really one of taking proportions using a chain index, and it is illustrated using the data in Table 2.0.
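Returning briefly to the chain base index of the share-price example above, the calculation can be sketched in Python as follows. The function name `chain_base_index` is our own, and the first period is given the value 100 by convention, as in the example.

```python
def chain_base_index(values):
    """Index each value against the previous period (first period = 100)."""
    indices = [100.0]
    for previous, current in zip(values, values[1:]):
        indices.append(round(current / previous * 100, 2))
    return indices


# Week-ending share prices (K) for weeks 1 to 4.
share_prices = [250, 300, 350, 225]
chain = chain_base_index(share_prices)
print(chain)

# Percentage change from the previous week, e.g. week 4 relative to week 3.
changes = [round(index - 100, 2) for index in chain[1:]]
print(changes)
```

Each entry after the first is read relative to the week before it, not relative to week 1, which is the point of the interpretation given in the text.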
Table 2.0

            Series A                                       Series B
Year     ΣPq66     Index A                              ΣPq68     Index B
1972      240      I72 = (240/240) × 100 = 100
1973      200      I73 = (200/240) × 100 = 83.3
1974      180      I74 = (180/240) × 100 = 75            200      I'74 = (200/200) × 100 = 100
1975                                                     180      I'75 = (180/200) × 100 = 90
1976                                                     160      I'76 = (160/200) × 100 = 80

The chain index numbers for series B are:

$$I'_{75,74} = \frac{180}{200} = 0.9 \qquad I'_{76,75} = \frac{160}{180} = 0.89$$

One can expect the ratio of 1975 to 1974 to be the same for both index A and index B:

$$\frac{I_{75}}{I_{74}} = \frac{I'_{75}}{I'_{74}}$$

$$I_{75} = \frac{I'_{75}}{I'_{74}} \cdot I_{74} = \frac{90}{100} \times 75 = 67.5$$

It can be seen that $\dfrac{I'_{75}}{I'_{74}}$ is the definition of the chain index $I'_{75,74}$, and therefore the formula for calculating $I_{75}$ may be rewritten as:

$$I_{75} = I_{74} \cdot I'_{75,74} = 75 \times 0.9 = 67.5$$

In general the next value in series A, $I_{K+1}$, may be obtained by multiplying the previous value, $I_K$, by the chain index $I'_{K+1,K}$:

$$I_{K+1} = I_K \cdot I'_{K+1,K}$$

The index series B came into being because the weights were changed in 1974. It would of course be possible to change the weights every year and, using the chain index technique, relate that year back to the original base of Series A. This is the method used in calculating the index of retail prices.

5. Deflating Prices and Incomes

Indicators of inflation are rising prices and incomes. The question sometimes asked is: by how much has real income increased in, for example, the past two years? It may be answered by deflating the income figures, that is, by dividing by the retail price index. Prices of individual commodities may be deflated in the same manner, thus showing the increase in real price.

Table 3.0 Deflating Income

Year     Income          Price Index     Real Income
1974     K2,610,000         100          K2,610,000.00
1976     K3,150,000         157          K2,006,369.43

Example: Suppose that the income column in Table 3.0 shows the income of a sales representative in 1974 and 1976. The base year of the index of retail prices has been taken as 1974 and the value for 1976 is 157. Real income may be calculated by dividing actual income by the price index expressed as a ratio (the index divided by 100).
1974 real income = K2,610,000 / 1.00 = K2,610,000

1976 real income = K3,150,000 / 1.57 = K2,006,369.43

It may be said that the salesman's real income has decreased by K603,630.57 over the two years.

Learning Objectives

After working through this Chapter you should be able to:
• Explain what an index number is.
• Compute simple index numbers and interpret them.
• Calculate the Paasche, Laspeyres and Fisher's index numbers.
• Change an index from one base to another.

Sample Examination Questions

1. The following figures give the distribution of income percentages for an average family:

                     %
Food                45
Fuel and light      15
Clothing            05
Rent                20
Other items         15

Average prices (K'000) for three successive years were as follows:

Year    Food    Fuel and Light    Clothing    Rent    Other Items
2003    180          40              95        50         65
2004    200          45              80        55         80
2005    215          42              95        60         80

(i) Calculate a cost of living index for the years 2004 and 2005, taking 2003 as a base year.
(ii) Comment briefly on the problem of the choice of items and weights when constructing an index number.

2. (a) What are the main considerations to be borne in mind when constructing an index number?

(b) The following table shows the total weekly expenditure on four commodities in July 2001 and July 2002, based on a representative sample of 1000 households:

Commodities      Quantities Purchased (Kg)     Total Expenditure (K)
July 2001:
Butter                   5 500                      2 500 000
Potatoes                10 500                        600 000
Apples                   4 000                        800 000
Meat                     8 000                      9 500 000
Total                   28 000                     13 400 000
July 2002:
Butter                   5 500                      3 400 000
Potatoes                 9 500                        900 000
Apples                   3 500                        850 000
Meat                     8 500                      1 250 000
Total                   27 000                      6 400 000

You are required to compute a Paasche index showing the extent of the rise in prices of all four commodities.

(c) Explain briefly the major weakness of the Paasche index in this case and suggest an alternative.

3.
The following figures give the distribution of income percentages for an average family:

                     %
Food                25
Fuel and light      20
Clothing            25
Rent                10
Other items         20

Average prices for the successive years were as follows:

Year    Food    Fuel & Light    Clothing    Rent    Other Items
1999    180         35            100        45         65
2000    195         34             90        45         75
2001    210         30             95        50         75

(a) Calculate a cost of living index for the years 2000 and 2001, taking 1999 as a base year.
(b) Comment briefly on the problem of the choice of items and weights when constructing an index number.

4. (a) Define what is meant by a 'fixed base index number' and a 'chain based index number' and explain the different ways in which these alternatives have to be interpreted.

(b) From the following data, calculate:
i) A Laspeyres price index for 2003.
ii) A Paasche quantity index for 2003.
In each case use 2001 as the base year.

                      2001                              2003
Commodity    Average price (K)   Quantity     Average price (K)   Quantity
A                 18 250           155             18 750           195
B                 39 100           275             46 000           310
C                  7 000           120              9 000           195
D                 14 750           435             22 700           380
E                 74 200            95            101 800           130

5. (a) What are the main considerations to be borne in mind when constructing an index number?

(b) The following table shows the total weekly expenditure on four commodities in July 1993 and July 2004, based on a representative sample of 1000 households.

Commodities      Quantities Purchased (Kg)     Total Expenditure (K'000)
July 1993:
Butter                   4 500                       1 680
Potatoes                 9 500                         510
Apples                   3 000                         600
Meat                     7 000                       7 200
Total                   24 000                       9 990
July 2004:
Butter                   4 500                       4 200
Potatoes                 8 500                       4 200
Apples                   3 500                       1 500
Meat                     7 500                      19 500
Total                   24 000                      29 400

You are required to compute a Laspeyres index showing the extent of the rise in prices of all four commodities.

(c) Explain briefly the major weakness of the Laspeyres index in this case and suggest an alternative.

CHAPTER 12

ASSIGNMENTS

Methods of Organizing and Presenting Data; Descriptive Measures

1.
The work required on two types of machine, X and Y, has been categorized as routine maintenance, part replacement and specialist repair. Records kept for the past 12 months provide the following information:

                             Frequency
Work Required            Type X     Type Y
Routine Maintenance        11         15
Part Replacement            5          2
Specialist Repair           4          3

Present this information using:
(a) Pie charts
(b) Appropriate bar charts.

2. The average weekly household expenditure on a particular range of products has been recorded from a sample of 20 households as follows:

K42,000  K45,500  K35,550  K48,150  K37,450
K51,000  K25,600  K55,600  K22,500  K65,600
K43,100  K79,600  K46,400  K39,450  K52,950
K29,000  K49,900  K39,500  K73,050  K41,550

Tabulate as a frequency distribution and construct a suitable diagram.

3. The number of new orders received by a company over the past 25 working days were recorded as follows:

4 5 5 4 5
1 3 6 1 3
2 6 2 3 4
5 4 5 1 4
5 7 3 6 2

Determine the mean, median and mode.

4. Determine the mean, median and mode from the following information given on journey distance to work:

Kilometers            Percentages
Under 1                   20
1 and under 3             26
3 and under 10            35
10 and under 15            9
15 and over               10

5. Write a brief statement explaining the meaning of the standard deviation to someone who knows nothing about statistics.

6. In question two, determine the range, quartile deviation and standard deviation.

7. A survey of workers in a particular industrial sector produced the following table:

Income (Weekly)
Under K500,000                        175
K500,000 but under K750,000           240
K750,000 but under K1,000,000         230
K1,000,000 but under K2,000,000       160
Over K2,000,000                       125

Estimate the standard deviation, range and quartile deviation.

Probability, Probability Distributions

(1) What is the probability that two successively chosen random digits are the same?

(2) A letter is chosen at random from the alphabet.
What is the probability that it is: (a) (b) (3) Assuming that any arrangement of one or more letters forms a word, how many words can be formed from: (a) (b) (4) (5) A vowel? A consonant? CAT STRANGE? Two fair dice are thrown (a) If it is known that the total score was 7, what is the probability that the difference between the scores on the two dice was 2? (b) If it known that the difference between the scores was 3, what is the probability that the total score was 8? Events A and B are such that P ( A) = ( ) 5 7 1 , P A / B = , P( A ∩ B ) = 12 12 8 Find: (a) (b) (c) (d) P (B ) P( A / B ) P ( B / A) P( A ∪ B ) State whether events A and B are: (6) (a) Mutually exclusive (b) Independent Three machines A, B and C produce respectively, 40 percent, 10 percent and 50 percent of the items in a factory. The percent of defective items produced by the machines is, respectively 1 percent, 3 percent and 4 percent. An item from the factory is selected at random. (a) Find the probability that the item is defective. 157 (b) (7) If the item is defective, find the probability that the item was produced by: (i) Machine A (ii) Machine B (iii) Machine C Find the mean µ , variance σ 2 , and standard deviation σ of each distribution. (a) x f (x) 2 3 8 19 30 1 5 1 6 (b) 2 1 4 x f (x) 3 1 2 8 1 4 (c) x f (x) 1 0.4 2 0.1 3 0.2 4 0.3 8. A player tosses 2 fair coins. The player wins K12, 000 if 2 heads occur and K4, 000 if 1 head occurs. For the game to be fair, how much should the player lose if not heads occur? 9. A fair coin is tossed 3 times. Find the probability that there will appear: (a) (b) (c) (d) 3 heads Exactly 2 heads Exactly 1 head No heads 10. The probability is 0.02 that an item produced by a factory is defective. A shipment of 10, 000 items is sent to a warehouse. Find the expected number of defective items and the standard deviation. 11. One-fifth of all accounts are found to contain errors. 
In a batch of eight accounts, find the probability that the number of accounts containing errors is:
(a) Less than 2
(b) More than 2
Find the mean and standard deviation of the number of accounts containing errors.

12. An assembly line produces approximately 3% defective items. In a batch of 150 items, find the probability of obtaining:
(a) Only two defective
(b) Less than two defective

13. Serious accidents occur at random in a particular manufacturing industry at the rate of 2.6 per week. Find the probability of fewer than 3 accidents occurring during:
(a) A given week
(b) A five-week period

14. Given a normal distribution with mean = 60 and standard deviation = 10, find the areas under the normal curve:
(a) Over 72
(b) Under 60
(c) Over 50
(d) Between 60 and 75
(e) Between 52 and 82

15. Hourly wage rates for unskilled workers in a particular nationwide industry are normally distributed with a mean of K10,000 and a standard deviation of K1,750.
(a) Find the probability that an employee selected at random will earn a basic rate of between K9,500 and K11,000 per hour.
(b) In a group of 500 unskilled employees, how many would you expect to earn more than K12,500 per hour?
(c) Approximately 20% earn less than the recommended minimum basic rate. What is this minimum rate?

Sampling and Sampling Distribution, Estimation, Hypothesis Testing

1. The values of orders received by a company are normally distributed with a mean of K17,000,000 and a standard deviation of K5,750,000. In a batch of 25 orders, find the probability that the average value is:
(a) In excess of K20m
(b) Below K15m
(c) Between K17.5m and K18.5m

2. The average income tax allowance for employees in a company is K19.5m per annum, with a standard deviation of K3.75m.
(a) Find the probability that a group of 60 employees selected at random will have an average income tax allowance of:
(i) Over K2m per annum; and
(ii) Between K14.4m and K19m
(b) Find the 95% confidence limits for the average tax allowance in such a group.

3. In a random sample of 200 employees, 55% were found to be in favour of strike action. Find the 95% confidence limits for the proportion of all employees in the company who are in favour of such action.

4. An assessment test is given to all prospective employees in a company. The test scores are known to be normally distributed. A random sample of five participants obtained the following results:

50, 65, 70, 72, 76

Test the assumption that the mean test score is 55 using the 5% significance level.

5. A random sample of twelve blue-collar employees in a large manufacturing plant gave the following figures for the number of hours of overtime worked in the last month:

23, 15, 30, 15, 20, 35, 24, 12, 40, 30, 27, 32

Use an unbiased estimation procedure to find an estimate for each of the following:
(a) The population mean
(b) The population variance
(c) The variance of the sample mean

6. Let x1, x2, and x3 be a random sample from a population with mean µ and variance σ². Consider the following two point estimators of µ:

µ(1) = (x1 + 2x2 + 3x3) / 6,    µ(2) = (x1 + 4x2 + x3) / 6

(a) Show that both estimators are unbiased.
(b) Which estimator is more efficient?
(c) Find the relative efficiency.
(d) Find an unbiased estimator of the population mean that is more efficient than either of these estimators.

7. Consider a normal population with a standard deviation = 20. A random sample of 16 items is found to have a mean of 112. Test the assumption at the 5% significance level that the population has a mean of 100.

8. Define α and β for a statistical test of hypotheses.

9. A vice president in charge of sales for a large corporation claims that salesmen are averaging no more than 15 sales contacts per week.
(She would like to increase this figure.) As a check on her claim, n = 36 salesmen are selected at random, and the number of contacts is recorded for a single randomly selected week. The sample reveals a mean of 17 contacts and a variance of 9. Does the evidence contradict the vice president's claim? Use α = 0.05.

10. A psychological study was conducted to compare the reaction times of men and women to a certain stimulus. Independent random samples of 50 men and 50 women were employed in the experiment. The results are shown in Table 1.0. Do the data present sufficient evidence to suggest a difference between the mean reaction times for men and women? Use α = 0.05.

Table 1.0

Men                  Women
n1 = 50              n2 = 50
y1 = 42 seconds      y2 = 38 seconds
S1 = 18              S2 = 14

11. Suppose that the vice president of question 9 wants to be able to detect a difference equal to one call in the mean number of customer calls per week. That is, she is interested in testing H0: µ = 16. With the data as given in question 9 for this test, compute the p-value.

Analysis of Variance, Time Series, Index Numbers

1. Four groups of students were subjected to different teaching techniques and tested at the end of a specified period of time. As a result of dropouts from the experimental groups (due to sickness, transfer and so on), the number of students varied from group to group. Do the data shown in Table 1.2 present sufficient evidence to indicate a difference in the mean achievement for the four teaching techniques?

Table 1.2 Data for question 1

Technique    Scores
1            60  80  82  84  90
2            62  50  65  85  63  45
3            58  54  80  62  72
4            95  80  85  72

2. In an investigation of college major and initial salary at first job, a random sample of three students from each of business, engineering, history, journalism and pharmacy was taken. The initial monthly salaries of those in the sample, in millions of Kwacha, were:
Business    Engineering    History    Journalism    Pharmacy
3.6         4.0            3.2        3.6           3.6
4.2         4.8            2.8        3.6           4.0
4.2         4.4            2.4        4.8           4.8

Test the hypothesis that the average initial salary is the same for all majors and report your results. Use α = 0.05.

3. Articles manufactured by a company are produced by three operators using three different machines. The manufacturer wishes to determine whether there is a difference (a) between the operators and (b) between the machines. An experiment is performed to determine the number of articles per day produced by each operator using each machine; the results are shown in Table 1.3. Provide the desired information, using a significance level of 0.05.

Table 1.3

             OPERATOR
             1     2     3
Machine A    25    29    26
Machine B    36    32    30
Machine C    30    27    29

4. The manager of a cycle and small motorcycle business, examining his sales of mopeds over the past three years, finds the data to be as follows:

No. of Machines Sold

                           1997   1998   1999   2000
1st Quarter 2nd Quarter    10 12 17 11 22 15 24
3rd Quarter                 11     20     27      -
4th Quarter                  8     10     17      -

(a) Use the data to make a reasonable estimate of the number of machines which should be ordered for the first half of 2002.
(b) What factors do you think are likely to affect the trend and seasonal variation in sales of mopeds?

5. The sales (x K1 000 000) of golf equipment by a department store are shown for each period of three months as follows:

Quarter    First    Second    Third    Fourth
1996       25
1999        9       31        61       21
2000       22       52        82       42
2001       43       65        95        -

(a) Using the additive model, find the centered moving average trend.
(b) Find the average seasonal variation for each quarter.
(c) Predict sales for the last quarter of 2001 and the first quarter of 2002, stating any assumptions.

6. By using exponential smoothing with α = 0.025, smooth the weekly turnover figures for a company tabulated below:

Week                   1   2   3   4   5   6   7   8   9  10
Turnover (x K1 000)   27  33  30  24  30  35  40  30  38  45  40  38  44  52

Illustrate the weekly turnover together with the smoothed values, and comment on whether these smoothed values give a reliable indication of the trend.

7. For the following data, calculate:
(i) A Laspeyres price index for 2002
(ii) A Paasche quantity index for 2002
in each case using 2000 as the base year.

             2000                              2001
Commodity    Average Price (K)    Quantity     Average Price (K)    Quantity
A            K18,250              155          K18,750              195
B            K39,100              273          K46,000              300
C            K7,000               114          K9,000               190
D            K14,750              430          K22,700              360
E            K74,200               90          K101,800             130

8. Splice the old and new indexes, so that 1999 = 100.

Year         1999    2000    2001
Old index     231     215       -
New index             100     108

9. In 2000, a particular index series was reset to 100; the following table shows the values of the old index and the new index:

Year         1996    1997    1998    1999    2000    2001
Old index     100      95     101     110     115
New index                                     100     110

(a) What is the value of the old index in 2001?
(b) What would have been the value of the new index in 1997?

10. The CPI for wage and clerical workers (base year 2000 = 100) is 205.3. What was the purchasing power of a Kwacha in terms of 2000 prices?
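
As an aid to the index-number questions above, the following is a minimal sketch of how a Laspeyres price index and the purchasing power of a currency unit might be computed. The function names and the two-commodity figures are hypothetical and chosen only for illustration; they are not the assignment data. The sketch assumes the usual definitions: current prices weighted by base-year quantities, and purchasing power taken as the base value divided by the index.

```python
def laspeyres_price_index(base_prices, base_quantities, current_prices):
    """Laspeyres price index: current prices weighted by base-year quantities."""
    # Cost of the base-year basket at current prices
    numerator = sum(p1 * q0 for p1, q0 in zip(current_prices, base_quantities))
    # Cost of the same basket at base-year prices
    denominator = sum(p0 * q0 for p0, q0 in zip(base_prices, base_quantities))
    return 100 * numerator / denominator


def purchasing_power(index, base=100.0):
    """Value of one currency unit in base-year prices, for a given price index."""
    return base / index


# Hypothetical two-commodity example (not the assignment data)
p0 = [10.0, 20.0]   # base-year prices
q0 = [5, 2]         # base-year quantities
p1 = [12.0, 25.0]   # current-year prices

index = laspeyres_price_index(p0, q0, p1)
print(round(index, 2))           # basket cost rises from 90 to 110
print(purchasing_power(125.0))   # an index of 125 means one unit buys 0.8 of its base-year value
```

A Paasche index follows the same pattern with current-year quantities used as the weights in both sums.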