Introduction to Statistics: Data Collection & Sampling Methods

VOLUME II

CHAPTER 9: WHAT IS STATISTICS?

9.1 Introduction

In ordinary English, the term statistics often refers to an aggregate of items with numerical quantification. For example, one might say "trade statistics" when referring to a mass of import and export figures. Although this conception is not far from the truth, it does not convey the real meaning of the term as far as the scope of its study is concerned.

Statistics is the science of collecting, summarising, analysing and presenting data. In general, statistics is a field of study concerned with the mathematical characterisation of a given aggregate of items. As a science, statistics is essentially a branch of Applied Mathematics, just as Mechanics and Calculus are branches of Mathematics. Statistical applications find their way into almost every field of study, such as business, economics, sociology, agriculture, education and other related fields. Statistics applied to the biological field is termed Biostatistics.

The subject matter of statistics can roughly be divided into two categories:
1. Descriptive Statistics
2. Inferential Statistics

Descriptive Statistics is primarily concerned with the organisation and presentation of collected data. Inferential Statistics, on the other hand, deals with decisions and judgements based on the collected data, using advanced mathematical methods. Between the two categories lies Mathematical or Theoretical Statistics, which can be thought of as the engine that drives both fields.

9.2 Data Collection

In any scientific research, data is an important part of the exercise. Scientific inquiries and findings normally start with the observation of some facts (collected data or information) from which certain conclusions may be drawn. After the information has been collected, it is processed and analysed.
Data which has just been collected and not yet organised in any form is referred to as raw data, while collected data that has been arranged or processed in some way is called organised data.

Data can first be classified into two groups: quantitative and qualitative. Data on, say, height, students' scores or number of plants are examples of quantitative data. Data on things such as beauty or goodness are examples of qualitative data. Statistics deals with both categories.

Furthermore, quantitative data can be divided into two categories by virtue of the values they take. Data that take only exact values, such as 5, 6, 8, are known as discrete data. For example, consider the number of tomatoes on each plant in a greenhouse. Data which cannot take exact values, but can only be given within a certain range or measured to a certain degree of accuracy, are called continuous data. For example, consider the heights or weights of school children.

9.2.1 Population

The whole purpose of collecting data is to have information on the items of interest in a particular study. The set of all items under discussion in a particular survey or experiment is called the population of the study. For example, consider a survey on the performance of all horticultural produce in Morogoro. All the items belonging to the set of horticultural produce may be considered as the population under study.

9.2.2 Sample

A sample is a set of a few items taken from a population, i.e. a subset of the population. Consider a set of 10 scores by 10 students out of a class of 100 students. In this case, the scores of the ten students can be thought of as a sample of students' marks from that class.

9.2.3 Why should one take a sample?

It is in the interest of every surveyor to have as much information as possible on every item of a population.
But this is practically not possible, for the following reasons:
(i) It is costly to study the whole population.
(ii) A sample shortens the time needed to carry out a survey.
(iii) The information in a sample can be analysed with greater accuracy.

9.3 Types of Sampling Methods

A sample needs to be representative of the population. This is the general and chief characteristic of a sample. There are different types of samples, depending on the nature of the population and the interest of the surveyor or researcher. However, we can generally distinguish two groups of samples based on the sampling method, namely random sampling and non-random sampling. A sample selected in such a way that every member of the population has an equal chance of being selected is called a random sample. Some types of random samples are described below.

9.3.1 Simple Random Sampling

When items are selected at random, each member of the population has an equal chance of being selected. Each member of the sampling frame is allocated a number, and the sample is selected using random numbers obtained either from random number tables or generated by a computer or calculator. This can be done with or without replacement. Suppose we draw a number from a hat. We have a choice of replacing the number in the hat before drawing again, or of not replacing it. Sampling where each item may be chosen more than once is called sampling with replacement. Sampling where each item may not be chosen more than once is called sampling without replacement.

9.3.2 Systematic Sampling

Random sampling from a very large population is cumbersome. An alternative is to list the population elements in some order and then choose every kth member from the list, after obtaining a random starting point.

Example
Describe how to choose a systematic sample of 30 from a population of 100 items.

Sol: k = 100/30 = 3.3, so every time we select an item we need to move 3.3 places along the list.
A random start between 1 and 3 inclusive is chosen. Let this be 2. So we would select the 2nd item, then
2 + 3.3 = 5.3 → 5th item
5.3 + 3.3 = 8.6 → 9th item
and so on until 30 items have been selected.

9.3.3 Stratified Sampling

Stratified sampling applies to a population made up of different strata (layers): separate random samples are taken from each stratum and combined to form the sample. The allocation of units from the different strata to the sample can be approached in two ways, namely proportional allocation and Neyman allocation. In the first method, units are allocated to each stratum according to the stratum's share of the total population. For example, if a sample of 100 students is to be taken from a college with, say, 500 1st year, 300 2nd year and 200 3rd year students, the units to be taken from each stratum will be (5/10) x 100 = 50, (3/10) x 100 = 30 and (2/10) x 100 = 20 respectively. In the second method, cost and accuracy rather than proportionality are considered.

9.4 Methods of Data Capture

After you have determined the population elements and the type of sample you are going to take, the next important step is to think about how you can capture the data (information).

9.4.1 Interview

If the information sought is to be obtained from persons, an oral interview can be conducted. One of the chief advantages of this method is that the surveyor has an opportunity to scrutinise the information given and to question the respondent further. Hence, under normal circumstances, the information collected through this method may generally be regarded as correct. However, the method is costly in terms of time and money, as it requires physical contact between the interviewer and the respondent.

9.4.2 Mailing

Sometimes it is impossible to visit every person, and questionnaires can be sent by mail instead. The chief disadvantages of this method are a low response rate and a high risk of receiving less accurate information.
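The sampling schemes of section 9.3 can be sketched in a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, the systematic sampler uses the exact interval k = 100/30 rather than the rounded 3.3 of the worked example, and proportional allocation simply rounds each share, which in general need not add up exactly to the requested sample size.

```python
import math
import random


def simple_random_sample(frame, n, replace=False, seed=None):
    """Draw n items from the sampling frame, with or without replacement."""
    rng = random.Random(seed)
    if replace:
        # With replacement: the same item may be drawn more than once.
        return [rng.choice(frame) for _ in range(n)]
    # Without replacement: every drawn item is distinct.
    return rng.sample(frame, n)


def systematic_positions(population_size, n, start=None, seed=None):
    """Return the 1-based list positions chosen by systematic sampling."""
    k = population_size / n  # sampling interval (assumed to be at least 1)
    if start is None:
        # Random start between 1 and the whole part of k, inclusive.
        start = random.Random(seed).randint(1, int(k))
    # Move k places each time and round to the nearest whole position.
    return [math.floor(start + i * k + 0.5) for i in range(n)]


def proportional_allocation(stratum_sizes, n):
    """Share a sample of size n among strata in proportion to their sizes."""
    total = sum(stratum_sizes.values())
    return {name: round(n * size / total) for name, size in stratum_sizes.items()}


if __name__ == "__main__":
    frame = list(range(1, 101))  # items numbered 1..100
    print(simple_random_sample(frame, 10, seed=1))
    print(systematic_positions(100, 30, start=2)[:3])  # [2, 5, 9]
    print(proportional_allocation(
        {"1st year": 500, "2nd year": 300, "3rd year": 200}, 100))
```

With a start of 2 the first three positions are the 2nd, 5th and 9th items, as in the worked example, and the college example yields 50, 30 and 20 students per year.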
9.4.3 Measurements/Experiments

It sometimes happens that the information sought cannot be obtained from an individual person and must be measured by the surveyor himself. This method is particularly significant to researchers in the natural sciences. For example, one may be interested in the effect of a certain fertilizer on the yield of orange trees. This information cannot easily be obtained by interviewing illiterate orange growers in rural areas. Instead, measurements should be taken. The method tends to give the most accurate information if carried out correctly. Its great disadvantage is the cost in terms of money, trained personnel and time. In statistics, measurements are often treated in a special branch of study called "Design of Experiments".

9.4.4 Available literature/Secondary data sources

The information sought might already be available in existing documentation, so one may simply draw on the literature instead of capturing data afresh. The method saves both time and financial resources. However, it has the great disadvantage that the information collected may not exactly fit the researcher's study area, as it may be outdated or incomplete.

Exercise 9

1. You are a surveyor at Sokoine University of Agriculture investigating the reasons for students' mass failure. Giving concrete reasons, suggest an appropriate sampling method.
2. Discuss the merits of the measurement method against the interviewing method in data capture.
3. With an example, explain the role of Mathematics in Statistics.
4. With typical examples, explain the relevance of statistics in our daily life.
5. The judges intend to find the most beautiful girl in a beauty contest. Explain how the exercise may be carried out. What kind of data is going to be used in this exercise?
6. Suppose you want to compare the efficiency of nitrogen fertilizer on maize yields in rural areas of Tanzania.
   Explain how you are going to carry out the exercise. What kinds of sampling methods will you employ and how is the data going to be captured?

CHAPTER 10: BASIC MEASURES OF CENTRAL TENDENCY AND VARIABILITY

10.1 Introduction

The need for an average in statistics comes from the very meaning of the term statistics. After data has been collected, we need to summarise the information in such a way that one can easily see what the data depict. As a matter of necessity, therefore, we need a single value to represent the entire picture, and this is what we refer to as an "average". But what should an average be like? What is its chief characteristic? The answer to these questions is the cornerstone of all the properties of an average. Obviously, an average in any field of study must be as representative as possible of the entire set of observations.

Think of a situation where a student's scores in three subjects are 54, 60 and 78. The question is then raised: what is the average performance of the student? By common practice the average performance would be [54+60+78]/3 = 64. But why 64? This is a very important question in the understanding of a statistical average. The average is 64 because no value other than 64 is numerically closer to all the observations taken together. This kind of average is known in statistics as the "arithmetic mean".

Most students, when first introduced to the notion of an average, tend to think that an average always means an arithmetic mean. This is quite wrong! The nature and form of an average depend very much on the nature and form of the data concerned. Think of a situation where you have a beauty contest involving 5 girls, and a group of judges ranks the girls in terms of beauty so that we have the following order:
1. Elizabeth
2. Asha
3. Nyanzobe
4. Nyange
5. Tabitha

Then someone who did not attend the contest, but who fortunately is familiar with Nyanzobe, asks you: how beautiful were the girls compared to other beauty contests? However one looks at it, this question asks for the average beauty of the five girls as perceived by the judges. In response, a knowledgeable statistician would say that the girls were more or less like Nyanzobe in terms of beauty. This is so because she (Nyanzobe) is in the middle of the judges' ranking. So, even though we do not have quantitative data, we have been able to establish a statistical average on the basis of position. This kind of average is known in statistics as a median, and it is applicable to both quantitative and qualitative data.

From this discussion, it seems appropriate to establish the qualities of a good statistical average, bearing in mind that an average is basically a representative of the entire set of attributes of a population. In general, an average in statistics should possess at least the following characteristics:
1. It should be based on all observations made.
2. It should be rigidly defined and not left to the mere estimation of the observer.
3. It should possess some simple and obvious properties and not mathematical abstractions.
4. It should be calculated with reasonable ease and rapidity.
5. It should be as stable as possible.
6. It should lend itself readily to algebraic treatment.

Below are the commonly known and widely applied statistical averages.

10.2 Commonly known measures of central tendency

10.2.1 The Arithmetic Mean

The arithmetic mean $\bar{x}$ of a set of values $x_1, x_2, x_3, \ldots, x_N$ is defined by the formula

$$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$$

Example
Find the mean yield of five plots of maize if the yields per plot were 2, 4, 5, 1.5 and 2.5 sacks of maize.
Sol:

$$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i = \frac{1}{5}\sum_{i=1}^{5} x_i = \frac{1}{5}[2+4+5+1.5+2.5] = \frac{15}{5} = 3$$

10.2.1.1 The Mathematical Properties of an Arithmetic Mean

1. The sum of the deviations of a set of values from its arithmetic mean is zero.

Consider the following observations on the number of patients who died of intestinal cancer over a 5-year period.

Table 10.1
Year      1    2    3    4    5
Deaths    22   26   40   42   30

The arithmetic mean is given as

$$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i = \frac{1}{5}\sum_{i=1}^{5} x_i = \frac{22+26+40+42+30}{5} = \frac{160}{5} = 32$$

Table 10.2
$x_i$     $x_i - \bar{x}$
22        -10
26        -6
40        8
42        10
30        -2
$\sum_{i=1}^{N}(x_i - \bar{x})$ = 0

From Table 10.2 above, we can clearly see that the sum of the deviations from the mean is zero.

Exercise 10.2.1(a)
Show algebraically that

$$\frac{1}{N}\sum_{i=1}^{N} d_i = \frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x}) = 0$$

2. If $d_i = x_i - C$, where C is a constant, then $\bar{x} = C + \bar{d}$.

Consider the previous example. Let C = 30.

Table 10.3
$x_i$     $d_i = x_i - C$
22        -8
26        -4
40        10
42        12
30        0
$\sum d_i$ = 10

Therefore $\bar{d}$ = 10/5 = 2 and $\bar{x} = C + \bar{d}$ = 30 + 2 = 32.

Exercise 10.2.1(b)
If $d_i = x_i - C$, where C is a constant, show algebraically that $\bar{x} = C + \bar{d}$.

10.2.2 The Weighted Arithmetic Mean

The simple arithmetic mean assumes that all the observations have the same weight, i.e. the same contribution to the total distribution, which is often not the case. Suppose a student receives grades of 70, 60, 75, 84 and 65 in courses carrying 1, 3, 2, 2 and 4 credit hours respectively. His average grade cannot simply be taken to be [70+60+75+84+65]/5. The hours should be taken into consideration, and the average is found as follows:

$\bar{x}$ = [(70x1) + (60x3) + (75x2) + (84x2) + (65x4)]/(1+3+2+2+4) = 828/12 = 69

The weighted arithmetic mean may therefore be defined as

$$\bar{x} = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$$

where $w_i$ is the weight assigned to $x_i$.

10.2.3 The Geometric Mean

In some kinds of data the values occur in a geometric progression, and it has been found that the proper average for this kind of data is the geometric mean, given by

$$G.M = \sqrt[n]{x_1 x_2 x_3 \cdots x_n}$$
For example, the geometric mean of 2, 4, 8 is $\sqrt[3]{2 \times 4 \times 8} = \sqrt[3]{64} = 4$.

10.2.4 The Harmonic Mean

The harmonic mean (H) of a given set of observations is defined as the reciprocal of the arithmetic mean of the reciprocals of the observations. Given a set of values $x_1, x_2, x_3, \ldots, x_n$, the harmonic mean H is given by

$$\frac{1}{H} = \frac{1}{n}\left(\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}\right), \quad \text{i.e.} \quad H = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}}$$

10.2.5 The Median

The median is the middle value in a set of observations arranged in order of magnitude. Given $x_1, x_2, \ldots, x_N$ such that $x_1 \le x_2 \le x_3 \le \cdots \le x_N$, the median of these observations is

$$M = x_{(N+1)/2} \ \text{if N is odd}, \qquad M = \frac{1}{2}\left(x_{N/2} + x_{(N+2)/2}\right) \ \text{if N is even.}$$

Example
Find the median of each of the following sets of observations:
(a) 30, 35, 7, 6, 20, 40, 15
(b) 13, 7, 8, 11, 5, 4

Sol:
(a) First arrange the data in ascending order: 6, 7, 15, 20, 30, 35, 40. Since N = 7 is odd, $M = x_{(N+1)/2} = x_{(7+1)/2} = x_4 = 20$.
(b) Again arrange the data in order to obtain the array 4, 5, 7, 8, 11, 13. Since N = 6 is even, $M = \frac{1}{2}(x_{N/2} + x_{(N+2)/2}) = \frac{1}{2}(x_{6/2} + x_{(6+2)/2}) = \frac{1}{2}(x_3 + x_4) = \frac{1}{2}(7 + 8) = 7.5$.

10.2.6 Mode

In a given set of observations, the value that occurs more frequently than any other is termed the mode of the distribution.

Example
Find the mode in the following record of a peasant's sales of horticultural produce over a 7-day period.

Table 10.4
Day              1      2     3     4     5     6     7
Sales in Tshs    1000   500   700   500   800   900   500

Sol: 500 occurs 3 times while the other values occur only once. So the mode is Tshs 500.

10.3 Measures of Dispersion/Variability

10.3.1 Introduction

Measures of dispersion arise from the need to complement the study of measures of central tendency. If we refer back to the discussion of the statistical average in section 10.1, where a student scored 54, 60 and 78 in three different subjects, we can clearly see the need for a study of measures of dispersion. We had an average score of 64 per subject.
But as a matter of fact it is not true that the score was 64 in every subject! The deviations of the subject scores from the average were +10 for the first, +4 for the second and -14 for the third. With such deviations one would hesitate to say that the student's score in any of the three subjects is likely to be 64. To be safe, one would first set aside a certain amount of variation around the score of 64 before making such a general statement. Since the variation from the average differs from one score to another, this precaution calls for an average deviation of the scores from the mean, taken in absolute terms, and this is [10+4+14]/3 = 28/3 ≈ 9.33. This kind of measure of dispersion is called the mean deviation. With this value, one would be safer to conclude that the student's score in a particular subject is likely to fall between 64 - 9.33 and 64 + 9.33, i.e. between 54.67 and 73.33.

Below are some commonly known measures of dispersion.

10.3.2 Range

The range is the difference between the highest and lowest values in the observations. Consider the observations on the sales by the peasant in the previous example.

Range = 1000 - 500 = 500

10.3.3 Mean deviation

As introduced before, the mean deviation (M.D) of a set of N values $x_1, x_2, \ldots, x_N$ is defined as

$$M.D = \frac{1}{N}\sum_{i=1}^{N} |x_i - \bar{x}|$$

In the example in section 10.2.6,

$$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i = \frac{1}{7}\sum_{i=1}^{7} x_i = \frac{500+1000+500+700+500+800+900}{7} = \frac{4900}{7} = 700$$

M.D = [|500-700| + |1000-700| + |500-700| + |700-700| + |500-700| + |800-700| + |900-700|]/7
    = [|-200| + |300| + |-200| + |0| + |-200| + |100| + |200|]/7
    = [200+300+200+0+200+100+200]/7
    = 1200/7 ≈ 171.43

10.3.4 The Standard Deviation

The mean deviation, which was supposed to be a standard measure of dispersion, has one major weakness: it is not easily amenable to further algebraic treatment. Another measure was therefore introduced to address this weakness, namely the standard deviation (S.D), defined as the non-negative square root of the variance (v).
This measure of dispersion is used more frequently in statistics than any other.

$$S.D = \sqrt{v} \quad \text{where} \quad v = \frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})^2$$

Example
Find the standard deviation of the following observations: 45, 32, 38, 48, 60, 75.

Sol:

Table 10.5
$x_i$     $x_i - \bar{x}$     $(x_i - \bar{x})^2$
45        -4.67               21.8089
32        -17.67              312.2289
38        -11.67              136.1889
48        -1.67               2.7889
60        10.33               106.7089
75        25.33               641.6089
Total 298                     1221.3334

$$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i = \frac{1}{6}\sum_{i=1}^{6} x_i = \frac{298}{6} = 49.67$$

$$v = \frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})^2 = \frac{1}{6}(1221.3334) = 203.5556$$

Therefore S.D = $\sqrt{v} = \sqrt{203.5556}$ = 14.267 ≈ 14.27 (2 decimals)

However, this method is relatively tedious and less accurate. Another formula for the variance is

$$V = \frac{1}{N}\sum_{i=1}^{N} x_i^2 - (\bar{x})^2$$

This formula is in essence the same as the previous one. Indeed, we can show algebraically that it can be deduced from the other formula.

Exercise 10.3.4
Show that

$$V = \frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})^2 = \frac{1}{N}\sum_{i=1}^{N} x_i^2 - (\bar{x})^2$$

Using this new formula, the standard deviation could easily have been calculated as follows:

Table 10.6
$x_i$     $x_i^2$
45        2025
32        1024
38        1444
48        2304
60        3600
75        5625
Total     16022

$$V = \frac{1}{6}\sum_{i=1}^{6} x_i^2 - (\bar{x})^2 = \frac{1}{6}(16022) - (49.67)^2 = 2670.3333 - 2467.1089 = 203.2244$$

so that S.D = $\sqrt{203.2244}$ ≈ 14.26 (2 decimals). The small difference from 14.27 arises because the mean was rounded to 49.67.

10.3.5 The Quartile Deviation (semi-interquartile range)

Quartiles divide the observations into quarters. If the values in the observations are arranged in ascending order and divided into four equal parts in terms of number of observations, we obtain the 1st, the 2nd and the 3rd quartiles. In fact, the second quartile is the median of the observations. For example, consider the following set of observations: 5, 6, 9, 11, 12, 15, 18, 19, 20, 22, 23, 23.6, 25, 26, 26.7. The 1st quartile ($Q_1$) is 11, the second ($Q_2$) is 19, and the third ($Q_3$) is 23.6. Alongside quartiles we also have deciles, which divide the observations into tenths, percentiles, which divide them into hundredths, and so on. Again, the fifth decile and the fiftieth percentile are equal to the median of the distribution.
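The two variance formulas of section 10.3.4, together with the mean deviation of section 10.3.3, can be checked numerically. The following is a minimal Python sketch with hypothetical function names; it divides by N, as the text does, rather than by N - 1 as some other texts do.

```python
import math


def mean(xs):
    """Arithmetic mean: the sum of the values divided by their number."""
    return sum(xs) / len(xs)


def mean_deviation(xs):
    """Average absolute deviation of the values from their mean."""
    m = mean(xs)
    return sum(abs(x - m) for x in xs) / len(xs)


def variance(xs):
    """Definition formula: the mean of the squared deviations."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def variance_shortcut(xs):
    """Shortcut formula: the mean of the squares minus the squared mean."""
    return sum(x * x for x in xs) / len(xs) - mean(xs) ** 2


if __name__ == "__main__":
    observations = [45, 32, 38, 48, 60, 75]
    v1 = variance(observations)
    v2 = variance_shortcut(observations)
    print(round(v1, 4), round(v2, 4))  # both formulas give the same variance
    print(round(math.sqrt(v1), 2))     # standard deviation: 14.27

    sales = [500, 1000, 500, 700, 500, 800, 900]
    print(mean(sales), mean_deviation(sales))
```

Because the code keeps the mean at full precision instead of rounding it to 49.67, both formulas give the same variance and the same standard deviation of 14.27.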
The quartile deviation, or semi-interquartile range, is defined as one half of the difference between the 3rd and the 1st quartiles. It is given by $Q = (Q_3 - Q_1)/2$. In the example given, Q = (23.6 - 11)/2 = 6.3.

CHAPTER 11: FREQUENCY DISTRIBUTIONS

11.1 Introduction

In most cases the collected data come to us as large samples and in a form unsuitable for immediate interpretation. In such cases, it is always important to group the data into an appropriate number of classes before their general characteristics can be detected and measured. Let us, for instance, consider the raw data on the weights (in kg) of 50 animals measured after being fed with a special animal feed.

50, 60, 71, 85, 83, 67, 68, 62, 63, 95,
45, 74, 95, 68, 75, 90, 91, 69, 71, 84,
82, 72, 76, 83, 63, 74, 88, 62, 65, 45,
58, 61, 60, 79, 80, 88, 93, 59, 60, 83,
78, 62, 88, 57, 53, 67, 77, 74, 75, 75

It is difficult to trace even the most general characteristics of the given data unless it is grouped. This data can conveniently be grouped and shown in tabular form through the following procedure:
1. Find the range. In this case, range = 95 - 45 = 50.
2. Decide on the number of classes to be formed. In practice there is no fixed rule for the number of classes. However, we prefer between 10 and 20 classes. Suppose we want to form 11 classes; then the class size is C = 50/11 = 4.54 ≈ 5.
3. Consider the smallest value, which is 45, so that you can start with 45-49, 50-54 and so on.

Table 11.1
Classes    Frequencies
45-49      2
50-54      2
55-59      3
60-64      9
65-69      6
70-74      6
75-79      7
80-84      6
85-89      4
90-94      3
95-99      2
Total      50

N.B.
- When data have been organised in such a way that they may be described in terms of class frequencies, the result is called a frequency distribution.
- The width of the class is called a class interval.
- Associated with every class are the class boundaries, which can be found as follows:

Table 11.2
Classes    Class boundaries    Frequencies
45-49      44.5-49.5           2
50-54      49.5-54.5           2
55-59      54.5-59.5           3
60-64      59.5-64.5           9
65-69      64.5-69.5           6
70-74      69.5-74.5           6
75-79      74.5-79.5           7
80-84      79.5-84.5           6
85-89      84.5-89.5           4
90-94      89.5-94.5           3
95-99      94.5-99.5           2
Total                          50

Remember that we determined the measures of central tendency and of dispersion for ungrouped data. We can do the same for grouped data. The formulas for those measures remain essentially the same, with a few alterations due to the presence of frequencies. The midpoint of each class acts as the value in the ungrouped data, counted as many times as the class frequency.

11.2 Measures of central tendency and variability in grouped data

We shall consider only the arithmetic mean, standard deviation, mode and median. The reader is advised to find out for himself about the geometric mean, harmonic mean, mean deviation, range and quartile deviation. Some of these are treated in Exercise 11.

11.2.1 The Arithmetic Mean

The arithmetic mean $\bar{x}$ of grouped data is given by

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i f_i}{\sum_{i=1}^{n} f_i}$$

where $x_i$ is the midpoint and $f_i$ is the frequency of the ith group, and n is the number of groups.

11.2.2 The Standard Deviation

$$S.D = \sqrt{V} \quad \text{where} \quad V = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2 f_i}{\sum_{i=1}^{n} f_i} = \frac{\sum_{i=1}^{n} x_i^2 f_i}{\sum_{i=1}^{n} f_i} - (\bar{x})^2$$

where $f_i$ is the frequency of the ith group.
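The grouping procedure of section 11.1 and the grouped mean and standard deviation formulas above can be sketched in Python as follows. The helper name frequency_table is hypothetical, and the sketch assumes whole-number class limits such as 45-49, 50-54 and so on.

```python
import math

# Weights (kg) of the 50 animals listed in section 11.1.
weights = [
    50, 60, 71, 85, 83, 67, 68, 62, 63, 95,
    45, 74, 95, 68, 75, 90, 91, 69, 71, 84,
    82, 72, 76, 83, 63, 74, 88, 62, 65, 45,
    58, 61, 60, 79, 80, 88, 93, 59, 60, 83,
    78, 62, 88, 57, 53, 67, 77, 74, 75, 75,
]


def frequency_table(data, start, width, n_classes):
    """Return (lower limit, upper limit, frequency) for each class."""
    table = []
    for i in range(n_classes):
        lower = start + i * width
        upper = lower + width - 1  # whole-number limits, e.g. 45-49
        freq = sum(1 for x in data if lower <= x <= upper)
        table.append((lower, upper, freq))
    return table


table = frequency_table(weights, start=45, width=5, n_classes=11)
midpoints = [(lower + upper) / 2 for lower, upper, _ in table]
freqs = [f for _, _, f in table]
total = sum(freqs)

# Grouped mean: midpoints weighted by the class frequencies.
grouped_mean = sum(x * f for x, f in zip(midpoints, freqs)) / total
# Grouped variance: mean of squared midpoints minus the squared mean.
grouped_var = sum(x * x * f for x, f in zip(midpoints, freqs)) / total - grouped_mean ** 2
grouped_sd = math.sqrt(grouped_var)

if __name__ == "__main__":
    for lower, upper, f in table:
        print(f"{lower}-{upper}: {f}")
    print(round(grouped_mean, 2), round(grouped_var, 2), round(grouped_sd, 2))
```

The frequencies reproduce Table 11.1, and the grouped mean, variance and standard deviation come out as 72.2, 157.96 and 12.57.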
11.2.3 Mode

$$\hat{x} = L + \frac{\Delta_1}{\Delta_1 + \Delta_2}(h)$$

where
L is the lower limit (lower class boundary) of the class containing the mode,
$\Delta_1$ is the excess of the modal frequency over that of the preceding class,
$\Delta_2$ is the excess of the modal frequency over that of the following class,
h is the width of the modal class.

11.2.4 Median

$$\text{Median} = L + \frac{(N/2 - C)}{f}\,h$$

where
L is the lower limit (lower class boundary) of the median class,
f is the frequency of the median class,
h is the width of the median class,
C is the cumulative frequency up to the class preceding the median class.

Let us now consider the above measures for the animal weight distribution given in section 11.1.

Table 11.3
Classes    Frequencies ($f_i$)    Midpoint ($x_i$)    $x_i f_i$    $x_i^2 f_i$
45-49      2                      47                  94           4418
50-54      2                      52                  104          5408
55-59      3                      57                  171          9747
60-64      9                      62                  558          34596
65-69      6                      67                  402          26934
70-74      6                      72                  432          31104
75-79      7                      77                  539          41503
80-84      6                      82                  492          40344
85-89      4                      87                  348          30276
90-94      3                      92                  276          25392
95-99      2                      97                  194          18818
Total      50                     792                 3610         268540

1. The mean

$$\bar{x} = \frac{\sum x_i f_i}{\sum f_i} = \frac{3610}{50} = 72.2$$

2. The standard deviation

$$V = \frac{\sum x_i^2 f_i}{\sum f_i} - (\bar{x})^2 = \frac{268540}{50} - (72.2)^2 = 157.96$$

Therefore S.D = $\sqrt{157.96}$ ≈ 12.57

3. Mode

$$\hat{x} = L + \frac{\Delta_1}{\Delta_1 + \Delta_2}(h) = 59.5 + \frac{6}{6+3} \times 5 \approx 62.8$$

4. Median

$$\text{Median} = L + \frac{(N/2 - C)}{f}\,h = 69.5 + \frac{(50/2 - 22)}{6} \times 5 = 72$$

11.3 Moments

Some of the measures defined in the preceding section fall into a rather broad category of statistics known as moments. The rth moment about a point a of a given distribution is defined as

$$\mu'_r = \frac{1}{N}\sum_{i=1}^{n}(x_i - a)^r f_i \quad \text{where} \quad N = \sum_{i=1}^{n} f_i$$

Of particular interest in this study are the moments about the mean and about the origin. The formula implies that the arithmetic mean is the 1st moment about the origin, while the variance is the second moment about the mean. In general, the rth moment about the origin is given by

$$\mu'_r = \frac{1}{N}\sum_{i=1}^{n} x_i^r f_i$$

whereas the rth moment about the mean is

$$\mu_r = \frac{1}{N}\sum_{i=1}^{n}(x_i - \bar{x})^r f_i$$
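As a rough check on the worked example above, the grouped mode, grouped median and moments can be sketched in Python. The function names are hypothetical, the classes are assumed to have equal width, and the mode function assumes a single modal class whose frequency exceeds that of at least one neighbouring class.

```python
# Frequencies and midpoints of the animal weight distribution (Table 11.3).
freqs = [2, 2, 3, 9, 6, 6, 7, 6, 4, 3, 2]
midpoints = [47 + 5 * i for i in range(11)]           # 47, 52, ..., 97
lower_boundaries = [44.5 + 5 * i for i in range(11)]  # 44.5, 49.5, ..., 94.5
width = 5


def grouped_mode(lower_boundaries, freqs, h):
    """Mode = L + d1 / (d1 + d2) * h for the class with the highest frequency."""
    m = freqs.index(max(freqs))
    before = freqs[m - 1] if m > 0 else 0
    after = freqs[m + 1] if m < len(freqs) - 1 else 0
    d1 = freqs[m] - before  # excess over the preceding class
    d2 = freqs[m] - after   # excess over the following class
    return lower_boundaries[m] + d1 / (d1 + d2) * h


def grouped_median(lower_boundaries, freqs, h):
    """Median = L + (N/2 - C) / f * h for the class containing the N/2-th value."""
    n_total = sum(freqs)
    cumulative = 0  # cumulative frequency before the current class
    for lower, f in zip(lower_boundaries, freqs):
        if cumulative + f >= n_total / 2:
            return lower + (n_total / 2 - cumulative) / f * h
        cumulative += f


def moment(midpoints, freqs, r, a=0.0):
    """r-th moment about the point a (a = 0 gives moments about the origin)."""
    n_total = sum(freqs)
    return sum((x - a) ** r * f for x, f in zip(midpoints, freqs)) / n_total


if __name__ == "__main__":
    mean = moment(midpoints, freqs, 1)  # 1st moment about the origin
    print(round(grouped_mode(lower_boundaries, freqs, width), 1))
    print(grouped_median(lower_boundaries, freqs, width))
    print(round(mean, 2), round(moment(midpoints, freqs, 2, a=mean), 2))
```

The first moment about the origin reproduces the mean of 72.2 and the second moment about the mean reproduces the variance of 157.96, in line with the remark that these two measures are special cases of moments.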
Moments have wide applications in the study of probability distributions, as they serve to define the characteristics that distinguish one distribution from another.

11.3.1 The relationship between moments about the mean and moments about any point "a"

Consider the rth moment about the mean, which is

$$\mu_r = \frac{1}{N}\sum_{i=1}^{n}(x_i - \bar{x})^r f_i = \frac{1}{N}\sum_{i=1}^{n}\left[(x_i - a) + (a - \bar{x})\right]^r f_i$$

Letting $d = a - \bar{x}$ and $\mu'_r = \frac{1}{N}\sum_{i=1}^{n}(x_i - a)^r f_i$, and using the binomial expansion, we have the following result:

$$\mu_r = \frac{1}{N}\sum_{i=1}^{n}\left[{}^rC_0\, d^0 (x_i - a)^r + {}^rC_1\, d^1 (x_i - a)^{r-1} + \cdots + {}^rC_r\, d^r (x_i - a)^0\right] f_i$$

$$\mu_r = \mu'_r + {}^rC_1\, d^1 \mu'_{r-1} + {}^rC_2\, d^2 \mu'_{r-2} + \cdots + {}^rC_r\, d^r$$

As an example, let us compare the 2nd moment about the mean (which is the variance) with the moments about the origin, i.e. about a = 0, so that $d = -\bar{x}$. We have

$$\mu_2 = \mu'_2 + {}^2C_1\, d^1 \mu'_{2-1} + {}^2C_2\, d^2 \mu'_{2-2} = \frac{1}{N}\sum x_i^2 f_i - 2\bar{x}\,\frac{1}{N}\sum x_i f_i + \bar{x}^2 = \frac{1}{N}\sum x_i^2 f_i - \bar{x}^2 = Var(x)$$

Exercise 11

1. Express the moments about the origin in terms of the moments about the mean.
2. Show that the G.M of a set of grouped data with midpoints $x_1, x_2, x_3, \ldots, x_n$ and frequencies $f_1, f_2, f_3, \ldots, f_n$ satisfies the equation
   $$\log(G.M) = \frac{1}{N}\left[f_1 \log x_1 + f_2 \log x_2 + \cdots + f_n \log x_n\right]$$
   where $N = f_1 + f_2 + \cdots + f_n$.
3. Find the harmonic mean of the following: 30, 60, 90.
4. Establish a formula for the harmonic mean of grouped data with midpoints $x_1, x_2, x_3, \ldots, x_n$ and frequencies $f_1, f_2, f_3, \ldots, f_n$.
5. The mean and variance of 40 students grouped in class intervals of 10 marks are 40 and 49 respectively. It was later observed that two observations belonging to the class interval 21-30 were included in the class interval 31-40 by mistake. Find the new mean and standard deviation.

CHAPTER 12: DATA PRESENTATION

12.1 Introduction

It is always important to present the collected information in such a way that even someone who is not familiar with statistics can easily grasp it. One of the most common ways of presenting data is through a diagram.
Diagrams should aid the reader by saving his time and by making it easy to identify the salient features of the data. There are different forms of diagrammatic presentation. Some of them are outlined below.

12.2 Pie-charts

A pie chart presents data from a population divided into different, non-overlapping categories. The main purpose of a pie chart is to show the share of each category in the population. For example, consider the following composition of the results of the 1st year B.Sc. ESM in the 2001 University Examination at Sokoine University of Agriculture.

Table 12.1
S/N    Category     Number    Percentage
1      Passing      47        65%
2      Repeating    8         11%
3      Disco        17        24%
       Total        72        100%

[Figure 12.1: University examination results for B.Sc. ESM students in the year 2001 (pie chart: Passing 65%, Repeating 11%, Disco 24%)]
Source: Sokoine University Examination Statistics for the year 2001

12.3 Bar graphs

Another way of presenting data is through the use of bar graphs, which may roughly be divided into two types, one for non-continuous data and the other for continuous data.

12.3.1 Bar graphs for discrete data

Consider again the Sokoine University Examination results for five 1st year B.Sc. degree programmes for the year 2001.

Table 12.2
S/N    Degree Programme                 Failure (%)
1      B.Sc Animal Science              43%
2      B.Sc ESM                         35%
3      B.Sc Agric. General              31%
4      B.Sc Horticulture                29%
5      B.Sc Education and Extension     23%

[Figure 12.2: Failure rate (%) in five 1st year B.Sc degree programmes in year 2001 (bar graph: failure rate against programme)]
Source: Sokoine University Examination Statistics for the year 2001

12.3.2 Histogram

This kind of bar graph is used to represent continuous data such as heights, weights, temperatures, etc.
Unlike other kinds of bar graphs, the bars of a histogram are normally joined together. As an example, let us draw the histogram for the animal weight distribution presented in section 11.1.

[Figure 12.3: Histogram for animal weight distribution (number of animals against weight in kg, class midpoints 47 to 97)]

12.4 Line Graphs

12.4.1 Frequency polygons

If straight lines, rather than a freehand curve, are used to join the points plotted from the class midpoints and their frequencies, we have what is called a frequency polygon. An example is given below.

[Figure 12.4: Frequency polygon for animal weight distribution (number of animals against weight in kg)]

12.4.2 Ogives

Ogives are "less than" or "greater than" cumulative frequency curves. Below are the cumulative frequencies of the animal weight distribution given in section 11.1, together with their corresponding graphs.

Table 12.3
Class boundaries    Less than cumulative frequencies    Greater than cumulative frequencies
44.5                0                                   50
49.5                2                                   48
54.5                4                                   46
59.5                7                                   43
64.5                16                                  34
69.5                22                                  28
74.5                28                                  22
79.5                35                                  15
84.5                41                                  9
89.5                45                                  5
94.5                48                                  2
99.5                50                                  0

[Figure 12.5: Less than and greater than ogives for animal weight distribution (cumulative frequencies against animal weight in kg)]

12.4.3 Frequency curve

If the points in Figure 12.4 are joined freehand, we have what is called a frequency curve.

[Figure 12.6: Frequency curve for animal weight distribution (number of animals against weight in kg)]

As you can see in the figure above, the curve gives some picture of the peakedness and symmetry of the distribution with reference to the centre (the mean).
These features, when carefully examined, give rise to two important concepts of a distribution, known as "skewness" and "kurtosis". The two concepts are discussed below.

12.5 Skewness
Skewness refers to the degree of asymmetry of a given distribution about its mean. The mean is the overall representative of the entire population, so under normal circumstances we would expect the observations to lie not far from the mean and evenly on either side of it. But this is usually not the case for some distributions, which is why we study the degree of asymmetry of a given distribution. We roughly have three types of distributions as far as skewness is concerned.

If the frequency curve of a given distribution has a longer tail to the right, we say the distribution is positively skewed. In other words, most of the observations are numerically smaller than the mean, while a few large observations pull the mean upwards. In this kind of distribution the mean is typically larger than the median and the mode.

On the other hand, if the frequency curve of a given distribution is more elongated to the left, the distribution is said to be negatively skewed, and the mean is typically smaller than the median and the mode.

But what happens when the distribution is skewed neither to the left nor to the right? In this case we have what is known as zero skewness, implying that the mean, the median and the mode of the distribution coincide. Such distributions are symmetric; the best-known example is the normal distribution, which is common in measurements such as weight, height and students' marks. In fact, many measured quantities in nature approximately follow such a distribution.

[Figure 12.7: Frequency curves that are (1) negatively skewed, (2) positively skewed, (3) of zero skewness]

12.5.1 Measures of Skewness
Based on the explanations of the preceding section, we can deduce measures of skewness. One such measure is the famous Pearson's coefficient of skewness, given as

$\alpha = \dfrac{\text{mean} - \text{mode}}{S.D}$.

The sign of $\alpha$ in this formula reflects the three types of skewness defined in the preceding section. For example, let us find the coefficient of skewness for the example in section 11.1. We had $\bar{x} = 72.2$, $\tilde{x} = 72$ and $S.D = 12.57$, hence

$\alpha = \dfrac{72.2 - 72}{12.57} \approx 0.016$,

which shows that the distribution is very slightly positively skewed.

The coefficient of skewness could also have been obtained using the median rather than the mode, in which case the formula would be

$\alpha = \dfrac{3(\text{mean} - \text{median})}{S.D}$,

implying that $\text{mean} - \text{mode} \approx 3(\text{mean} - \text{median})$. This relationship is known as the empirical relationship between the mean, the mode and the median of a given distribution. For moderately skewed (non-normal) distributions, we can apply this relationship to approximate one of the measures given the other two.

Another famous formula, which needs neither the mode nor the median, involves the 3rd and the 2nd moments about the mean of the given distribution. Usually we have

$\alpha = \dfrac{\mu_3}{\mu_2^{3/2}}$.

12.6 Kurtosis
Kurtosis is the measure of the degree of peakedness of a distribution. We basically have three types of distributions with reference to peakedness. Distributions with an excessively high level of peakedness are known as leptokurtic, while those with an extremely low level are called platykurtic. Distributions with a moderate level of peakedness are known as mesokurtic, with the normal distribution as an example. The kurtosis of a distribution is given by

$\beta = \dfrac{\mu_4}{\mu_2^{2}} - 3$.

If $\beta > 0$ the distribution is leptokurtic, if $\beta < 0$ it is platykurtic, and if $\beta = 0$ the distribution is mesokurtic.

[Figure 12.8: Leptokurtic, platykurtic and mesokurtic curves]

Exercise 12
1. The following are the figures (in millions of USD) of Tanzanian trade with SADC for the period 1994-1998.

Table 12.4
Year      1994    1995    1996    1997    1998
Exports   87.3    96.4    80.7    102.9   69
Imports   233.8   220.8   193.9   226.3   294.1
Source: Tanzania Revenue Authority-Customs Department.
Discuss how the data given can be presented in a bar graph.
2. What is so special about a histogram as opposed to other kinds of bar graphs?
3. What is the main purpose of data presentation? Discuss the important aspects in making data presentation.

CHAPTER 13: CORRELATION & REGRESSION ANALYSIS

13.1 Introduction
In the real world we have relationships between two or more different variables. For example, an infant's age has something to do with its weight. Similarly, someone's height and weight have a sort of association. There are many examples in the real world showing that quantitative relationships among variables exist. In statistics, we seek first to establish, in a mathematical way, whether such relationships exist, and later to know the functional nature of such relationships. The first task is called "Correlation Analysis", while the latter is known as "Regression Analysis". For the purpose of this subject we shall confine ourselves to the study of linear relationships between two variables only, known as simple correlation and regression analysis. The reader is advised to revisit the topic on linear functions and their properties for a thorough understanding of the material presented in this section.

13.2 Correlation Analysis
Suppose we have 3 different sets of paired observations for variables x and y, plotted in so-called scatter diagrams as shown in the figures below.

[Figures 13.1, 13.2 and 13.3: Scatter diagrams of y against x for the three sets of paired observations]

On the basis of the given figures, it would seem plausible to base our measure of correlation on the product "xy".
As you can see in Figure 13.1, the two variables have a positive linear relationship: the observations are concentrated in the first and the third quadrants, where the product "xy" is always positive. Similarly, in Figure 13.2, where most observations are found in the second and the fourth quadrants, the product "xy" is always negative, and as can be seen the variables have a negative linear association. Things are different in Figure 13.3, where the observations are scattered over all four quadrants. Here the positive and negative products roughly cancel, so that on the whole "xy" can be regarded as zero, meaning that no linear association exists between the two variables. As a matter of fact, we cannot trace any sort of linear trend among the points in Figure 13.3.

But what about the points in Figures 13.1 and 13.2 which seem not to obey the general trend? In Figure 13.1 some points are not found in quadrants 1 and 3, and similarly in Figure 13.2 some points are not found in quadrants 2 and 4. This suggests taking an average over all the points, so the quantity $\frac{\sum xy}{n}$ will serve to indicate the overall direction and extent of the relationship. The meaning is clear: even though some of the pairwise products of x and y may be of a different sign, the overall sum of the pairwise products tells us the direction of the relationship as a whole.

However, we still have one more problem, and this concerns the units of measurement. Suppose one person computes the correlation coefficient between the heights (in feet) and the weights (in pounds) of some group of individuals, and another person does the same exercise on the same individuals but with heights taken in metres and weights in kilograms. The final results of the two would certainly differ in magnitude, even though they would agree in direction.
Or think of the situation where one wants to know which of two variables, weight or age, relates more strongly to one's height. Certainly, there can be no comparison between the two unless the units for ages and weights are harmonised! This problem is easily overcome through the use of standard scores. That is, instead of looking for the correlation between x and y, we look for the correlation between the standardised variables of x and y. So the suggested measure becomes

$\dfrac{1}{n}\sum\left(\dfrac{x-\bar{x}}{s_x}\right)\left(\dfrac{y-\bar{y}}{s_y}\right)$.

After some manipulation, the correlation coefficient of the two variables x and y, denoted by r, is given as

$r = \dfrac{\frac{1}{n}\sum xy - \frac{\sum x \sum y}{n^2}}{s_x s_y}$,

where $s_x$ and $s_y$ are the standard deviations of x and y respectively. The term in the numerator is known as the covariance of x and y, denoted by

$\mathrm{Cov}(x, y) = \dfrac{1}{n}\sum xy - \dfrac{\sum x \sum y}{n^2}$.

r can also be written as

$r = \dfrac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$.

This latter formula is the most practical for computation purposes. Note that the absolute value of r never exceeds 1, i.e. $-1 \le r \le 1$. A value of r close to 1 or to -1 indicates a very strong linear relationship, whereas a value close to 0 indicates a very weak one.

Example 1
The following are the G.P.A scores of 10 pre-entry female students at SUA in 2000/1 against their entry points (based on A-level performance).

Table 13.1
S/N            1     2     3     4     5     6     7     8     9     10
Entry points   3     3.5   4     3.5   3.5   4     3     3.5   3     3.5
G.P.A          2.3   2.1   2.6   3.2   3.2   2.5   3.1   2.8   3.6   2.8

Find the sample correlation coefficient and comment on the nature of their relationship.

Sol:

Table 13.2
S/N     X      Y      XY      X²       Y²
1       3      2.3    6.90    9        5.29
2       3.5    2.1    7.35    12.25    4.41
3       4      2.6    10.40   16       6.76
4       3.5    3.2    11.20   12.25    10.24
5       3.5    3.2    11.20   12.25    10.24
6       4      2.5    10.00   16       6.25
7       3      3.1    9.30    9        9.61
8       3.5    2.8    9.80    12.25    7.84
9       3      3.6    10.80   9        12.96
10      3.5    2.8    9.80    12.25    7.84
Total   34.5   28.2   96.75   120.25   81.44

$r = \dfrac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}} = \dfrac{10(96.75) - (34.5)(28.2)}{\sqrt{\left[10(120.25) - (34.5)^2\right]\left[10(81.44) - (28.2)^2\right]}} = \dfrac{-5.4}{\sqrt{(12.25)(19.16)}} \approx -0.35$

Comment: Since r = -0.35 is negative and fairly close to 0, the pre-entry female students' G.P.As have a weak, negative linear association with their A-level entry points.

Example 2
A group of 5 students took tests before and after training and obtained the following scores.

Table 13.3
Before X:   2     2.5   2.5   3   5
After Y:    2.5   3     3     5   5

Find the correlation coefficient r and comment on the nature of the relationship.

Sol:

Table 13.4
        X     Y      X²      XY    Y²
        2     2.5    4       5     6.25
        2.5   3      6.25    7.5   9
        2.5   3      6.25    7.5   9
        3     5      9       15    25
        5     5      25      25    25
Total   15    18.5   50.5    60    74.25

$r = \dfrac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}} = \dfrac{5(60) - (15)(18.5)}{\sqrt{\left[5(50.5) - (15)^2\right]\left[5(74.25) - (18.5)^2\right]}} = \dfrac{22.5}{\sqrt{(27.5)(29)}} \approx 0.797$

Comment: Since the value of r is positive and close to 1, there exists a strong, positive linear association between the scores before training and the scores after training.

13.2.1 A note on the Correlation Coefficient "r"
Although in most cases the results of correlation analysis give us a picture of how things relate, their interpretation should be made with great care. It often happens that we get surprising results which seem to go against intuition. Whenever such a situation occurs, we need to interpret the result carefully and considerately. For example, in one instance the 2001 UE results for Communication Skills (SC100), Biometry (MB101) and Development Studies (DS100) for the students in B.Sc ESM were considered, and it was found that Communication Skills was more highly correlated with Biometry than with Development Studies.
The result seems to go against the intuition that Communication Skills should be highly correlated with Development Studies, a subject whose mastery is largely based on language mastery! But what should we conclude? Instead of simply concluding that DS and SC are less connected, we should rather think of exceptional factors prevailing in the conduct of the subjects. One possible reason could be that most instructors, including those in the said subjects, are concerned with the material content of the subject rather than with its grammatical part! As a result, the language component may not be reflected in the performance of most subjects, including those which would have seemed reasonably related. That Biometry was highly correlated with Communication Skills is in no way a chance event! With mathematical concepts manifested in its teaching, Biometry may have acted as a measure of one's general understanding in all the subjects, including Communication Skills.

13.3 Regression Analysis
In an attempt to find the form of an equation connecting two or more variables, statisticians employ so-called regression analysis. In mathematics, this concept is known as curve fitting. There are many types of curves: linear, polynomial, logarithmic and so on. As said before, we shall confine ourselves to fitting linear equations in two variables only. This is called simple linear regression analysis.

13.3.1 Simple linear Regression Analysis
Simple linear regression analysis deals with the determination of an equation describing the linear association of two variables. Assume that we have n paired observations of x and y, namely $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$. We intend to find a and b such that $y_i = a + bx_i$. Consider, say, the following observations on x and y, where y is the number of rats dying due to the use of dose x of a drug.
Table 13.5
x (level of dose)           50   90   56   62   65   80   70
y (number of dying rats)    0    3    4    5    10   9    6

Let us plot the paired observations on the x and y axes so that we get the so-called scatter diagram. The scatter diagram can easily reveal the extent of linear association between the two variables.

[Figure 13.4: Scatter diagram for the number of dying rats against the dose of a drug; vertical axis: the number "y" of dying rats, horizontal axis: level "x" of a dose]

As seen from the plotted graph, the suggested linear relationship does not hold perfectly. But we can obtain an approximate linear relation by finding a line l for which the sum of the squared vertical distances of the points from the line is a minimum. Such a line is called the line of best fit, or the regression line. The main principle underlying linear regression is that the line is chosen in such a way that the sum of the squared vertical deviations of the points from the line is a minimum. The method of fitting a line on this principle is called the method of least squares estimation.

13.3.2 The derivation of the least squares estimates
We have $e_i = y_i - \hat{y}_i$ where $\hat{y}_i = a + bx_i$.
So, we shall consider

$\sum e_i^2 = \sum (y_i - \hat{y}_i)^2 = \sum (y_i - a - bx_i)^2$.

Based on the principle defined in the previous section, the least squares estimates are found as follows (the reader is advised to refer to the sections on partial derivatives and matrix algebra):

i) $\dfrac{\partial \sum e_i^2}{\partial a} = \dfrac{\partial}{\partial a}\sum (y_i - a - bx_i)^2 = -2\sum (y_i - a - bx_i)$

ii) $\dfrac{\partial \sum e_i^2}{\partial b} = \dfrac{\partial}{\partial b}\sum (y_i - a - bx_i)^2 = -2\sum (y_i - a - bx_i)x_i$

Equating i and ii to zero we have

$an + b\sum x_i = \sum y_i$ ............(i)

and

$a\sum x_i + b\sum x_i^2 = \sum y_i x_i$ ...(ii)

Solving (i) and (ii) simultaneously using matrix algebra:

$\begin{pmatrix} n & \sum x_i \\ \sum x_i & \sum x_i^2 \end{pmatrix}\begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} \sum y_i \\ \sum y_i x_i \end{pmatrix}$

Pre-multiplying both sides by $\begin{pmatrix} n & \sum x_i \\ \sum x_i & \sum x_i^2 \end{pmatrix}^{-1}$ we get

$\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} a \\ b \end{pmatrix} = \dfrac{1}{n\sum x_i^2 - \left(\sum x_i\right)^2}\begin{pmatrix} \sum x_i^2 & -\sum x_i \\ -\sum x_i & n \end{pmatrix}\begin{pmatrix} \sum y_i \\ \sum y_i x_i \end{pmatrix}$

This finally reduces to

$\begin{pmatrix} a \\ b \end{pmatrix} = \dfrac{1}{n\sum x_i^2 - \left(\sum x_i\right)^2}\begin{pmatrix} \sum x_i^2 \sum y_i - \sum x_i \sum y_i x_i \\ n\sum y_i x_i - \sum x_i \sum y_i \end{pmatrix}$

Thus

$a = \dfrac{\sum x_i^2 \sum y_i - \sum x_i \sum x_i y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2}$ and $b = \dfrac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2} = \dfrac{\mathrm{cov}(x, y)}{\mathrm{var}(x)}$.

The expression for a can also be shown to be equal to $a = \bar{y} - b\bar{x}$. These values of a and b are the ones which minimise the sum of the squared vertical deviations from the regression line. Let us apply these findings to the previous problem (refer to Table 13.5).

Table 13.6
S/N     X     Y    XY     X²
1       50    0    0      2500
2       90    3    270    8100
3       56    4    224    3136
4       62    5    310    3844
5       65    10   650    4225
6       80    9    720    6400
7       70    6    420    4900
Total   473   37   2594   33105

$b = \dfrac{7(2594) - (473)(37)}{7(33105) - (473)^2} = \dfrac{657}{8006} \approx 0.082$

$a = \bar{y} - b\bar{x} = \dfrac{37}{7} - (0.082)\dfrac{473}{7} \approx -0.26$

Interpretation
· The value a = -0.26 is the y-intercept, i.e. the value of y when x = 0. So we can argue that, in the absence of any dose, about -0.26 rats are expected to die.
But for this example it suffices to say that no rat will die in the absence of the dose, as we cannot have a negative number of rats.
· b = 0.082 is the slope of the line, hence we can argue that for every unit increase in the dose x there is a corresponding increase of 0.082 in the number of dying rats. One may not clearly understand what this means, as there cannot be 0.082 rats. What it means is that for every 12 more units of the dose, about one more rat will die from the dose.

Example 1
The following data were obtained on the paired variables x and y.

Table 13.7
X   -2   -1   0   1   2
Y   2    2    3   4   4

(a) Find the linear regression line of y on x and comment on the result.
(b) Calculate the correlation coefficient r and comment on the result.
(c) Find $s_x^2$ and $s_y^2$.

Sol:

Table 13.8
S/N     x    y    xy   X²   Y²
1       -2   2    -4   4    4
2       -1   2    -2   1    4
3       0    3    0    0    9
4       1    4    4    1    16
5       2    4    8    4    16
Total   0    15   6    10   49

(a) Using the Microsoft Excel program, the regression line is easily obtained as y = 3 + 0.6x.
(b) Again using Microsoft Excel, r = 0.95, which indicates a strong, positive linear association between the two variables.
(c) From computer calculations, $s_x^2 = 2$ and $s_y^2 = 0.8$.

13.3.3 Explained, Unexplained, and Total Variations
· The variance of the actual values of y observed in the field is called the Total variation.
· Note that while finding the regression line, we had the vertical distances $e_1, e_2, e_3, \ldots, e_n$ of the points from the line. The variance of these vertical distances is known as the variance of the errors of estimate. Another name assigned to it is the Unexplained variation. The unexplained variation is normally denoted by $s_e^2$.
· After we have obtained the estimated y's, we can also obtain their variance, denoted by $s_{\hat{y}}^2$. The variance of the estimated values of y is known as the Explained variation.
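The three definitions above can be checked numerically. The sketch below is only an illustration, assuming the data of Table 13.8 and the fitted line y = 3 + 0.6x from Example 1; the helper name `variance` is hypothetical, and it divides by n (the population variance), which is the convention that reproduces the values of $s_x^2$ and $s_y^2$ quoted in Example 1.

```python
# Data from Table 13.8 and the fitted line y = 3 + 0.6x (Example 1).
xs = [-2, -1, 0, 1, 2]
ys = [2, 2, 3, 4, 4]
a, b = 3.0, 0.6


def variance(values):
    """Population variance (divide by n); hypothetical helper for this sketch."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n


# Estimated values of y on the line, and the vertical distances e_i.
fitted = [a + b * x for x in xs]
errors = [y - f for y, f in zip(ys, fitted)]

total = variance(ys)            # Total variation
explained = variance(fitted)    # Explained variation
unexplained = variance(errors)  # Unexplained variation

print(round(total, 4), round(explained, 4), round(unexplained, 4))
```

Up to floating-point rounding, the explained and unexplained parts add up to the total variation for these data.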
Consider the previous example where we had y = 3 + 0.6x.

Table 13.9
S/N   X    Y   ŷ     e_i = y_i - ŷ_i
1     -2   2   1.8   0.2
2     -1   2   2.4   -0.4
3     0    3   3     0
4     1    4   3.6   0.4
5     2    4   4.2   -0.2

From the tables:
(1) $S_y^2 = 0.8$
(2) $S_e^2 = 0.08$
(3) $S_{\hat{y}}^2 = 0.72$

From these results you will find that $S_y^2 = S_{\hat{y}}^2 + S_e^2$, that is,

Total variation in y = Explained variation + Unexplained variation.

This result can be proved algebraically, based on the assumption that the estimated y must on average be the same as the original y.

The partition of the total variation in y into two components enables us to define another measure of the goodness of fit of the given regression line, known as the "coefficient of determination", given as $R^2$ = Explained variation / Total variation. With reference to this example we have $R^2$ = 0.72/0.8 = 0.9 = 90%. Note that $R^2 = r^2$ under simple linear regression analysis. The coefficient of determination shows the extent to which the variation in y is explained by the regression line: the greater the value of $R^2$, the better the fitted line, and vice versa. In the example given, 90% of the variation in y is accounted for by the regression line.

Exercise 9-13
1. Clearly distinguish between statistics and mathematics.
2. Discuss any four methods of data capture.
3. Outline the advantages of an arithmetic mean as a statistical average compared to the standard deviation.
4. The mean of five items of an observation is 4 and the variance is 5.2. If three of the items are 1, 2 and 6, find the other two.
5. The mean and standard deviation of a sample of size 10 were found to be 9.5 and 2.5 respectively. Later on, an additional observation became available. This was 1.5 and was included in the original sample. Find the mean and standard deviation of the 11 observations.
6. The following table shows the distribution of ages of a group of people in a village.
Age (years)        0-9   10-19   20-29   30-39   40-49
Number of people   25    35      75      41      24

Calculate:
(i) The arithmetic mean
(ii) The standard deviation
(iii) The coefficient of skewness, and comment on the result
7. Find the mean and standard deviation of the following two samples put together.

Sample No.   Size   Mean   S.D
1            50     158    5.1
2            60     164    4.6

8. The mean and standard deviation of a set of observations were found to be 16.5 and 4.7. But later on it was discovered that the values 16 and 12 were wrongly entered instead of 6 and 2. Find the correct values of the mean and the standard deviation.
9. Find the mean and standard deviation of the values 4, 5, 6 and 10.
10. Mention and explain any four advantages of taking a sample in a survey instead of having a complete enumeration.
11. What is stratified sampling? Describe it with concrete examples.
12. An experimenter made the following observations: 1, 3, 5, 7, …, 99. Find the mean and standard deviation of the observations.
13. In question no. 6, find the following:
(a) The median
(b) The mode
(c) Draw a histogram, a frequency polygon and a greater than cumulative frequency curve.
(d) Use the greater than cumulative frequency curve to estimate the number of people with ages less than 28 years.
(e) Find the range.
14. Explain the difference between the mean and the standard deviation.
15. We have two investment proposals:

                      Expected mean cash flow   Standard deviation
Oil Venture           Rs 1,00,000               7,200
Real Estate Venture   Rs 10,00,000              14,000

Explain which venture is riskier.
16. The coefficients of variation of two series are 75% and 90% and their standard deviations are 15 and 18 respectively. Find their means.
17. For a frequency distribution of the History marks of 200 candidates (grouped in intervals 0-5, 5-10, …) the mean and S.D were found to be 40 and 15. Later on it was discovered that the score 43 was misread as 53 in obtaining the frequency distribution. Find the corrected mean and S.D corresponding to the corrected frequency distribution.
18. What is the arithmetic mean for the following data?

Variate:     0   1              2              …   n
Frequency:   1   ${}^nC_1$      ${}^nC_2$      …   ${}^nC_n$

19. Two groups of students reported mean weights of 162 and 148 pounds respectively. When would the mean weight of both groups together be 155 pounds?
20. Find the mean and standard deviation of the following series of observations: -1, 4, -9, 16, -25, 36, …, 10,000.
21. The expenditure for 100 families is given below:

Expenditure:       0-10   10-20   20-30   30-40   40-50
No. of families:   14     ?       27      ?       15

The mode of the distribution is 24. Calculate the missing frequencies.
22. Let $d_i$ be the deviations of a set of variables from an arbitrary constant C. Show that the standard deviation of the variate $d_i$ is the same as that of the original variables.
23. Consider a sequence of numbers in an arithmetic progression, with the first term $a_1$ and the last term $a_n$. Find an expression for the mean deviation. Hence or otherwise find the mean deviation of the sequence 1, 2, 3, …, 1000.
24. The mean and standard deviation of a variate x are m and $\sigma$ respectively. Obtain the mean and standard deviation of (ax+b)/c, where a, b, c are constants.

The following were the points scored by five B.Sc. Environmental Sciences & Management students in two different subjects in the June/July 2001 University Examination.

Math (x)   Comm (y)
1          5
3          3
2          5
0          3
2          3

Answer the following questions:
25. The correlation coefficient between the two subjects is
a) 2  b) -0.9  c) -0.08  d) 0  e) 0.98  f) None of the given
26. Comment on the nature of the linear relationship between the two subjects:
a) Poor and direct relationship  b) Indirect and strong relationship  c) Strong and positive  d) Strong and direct relationship  e) No relationship at all  f) None of the given
27. The line of best fit of y on x is
a) y = 3x + 2  b) y = -0.9x + 5  c) y = 8x + 3  d) y = -3.9 + 0.2x  e) y = -0.08x + 3.9  f) y = 5
28. For every unit increase in the score of Maths (x) there is a corresponding change in the score of Communication Skills of:
a) an increase of 3.9 marks  b) an increase of 3 marks  c) a decrease of 0.08 marks  d) a decrease of 0.9 marks  e) an increase of 8 marks  f) None of the given

The following table shows a distribution of ages of a group of people in a village.

Age (years)        0-9   10-19   20-29   30-39   40-49
Number of people   25    35      75      41      24

Answer the following questions:
29. The modal age is
a) 17  b) 24  c) 30  d) 45  e) 25  f) None of the given
30. The median age is
a) 24  b) 20  c) 34  d) 35  e) None of the given
31. The number of people with age less than 36 is
a) 176  b) 167  c) 175  d) 160  e) 168  f) You cannot tell
32. The number of people with ages greater than 10 is
a) 172  b) 175  c) 140  d) 160  e) 170  f) None of the above
33. Given the following sequence of observations, 1, 3, 6, 10, …, 5050, the mean and median are:
a) 172, 1000  b) 1717, 1300.5  c) 225, 1598  d) 2555, 1555  e) 2000, 1476  f) None of the given
34. Asha and Janet are female students studying Mathematics in two different classes, which were taught by two different instructors in different circumstances. When the test was given to the two classes, Asha got 60% and Janet also got 60%. However, the mean score in the first class was 50% and in the second 40%, while the standard deviations for the two classes were 3% and ___% respectively. Which of the two girls performed better in Mathematics?
a) Asha  b) Janet  c) No one  d) Both of them  e) You cannot tell  f) None of the given
35. Asha and Janet participated in a certain beauty contest in which Asha was ranked fourth out of the seven girls who participated. How beautiful is Janet in the opinion of the judges?
a) Like Asha  b) The ugliest  c) The most beautiful  d) Like the wife of one of the judges  e) You cannot tell  f) None of the given
35. The average in the sequence 2, 4, 8 is 4 and not 4.67, i.e. (2+4+8)/3. Why do you think this is so?
a) Because 4 < 4.67  b) Because 4 is the geometric mean  c) No reason  d) Not true  e) You cannot tell  f) None of the given
36. A surveyor had already identified about 2280 items from which a systematic sample would be drawn, with a sampling interval of 10. Find:
(a) The sample size to be taken.
(b) If the first item to be picked was the 9th in the list, what would be the last item in the list to be included in the sample?
(c) Considering the order of the items in the list as the numerical values, find the mean and median of the sample.
38. y and x are said to be related in the form $y = ax + bx^2$, and the observed values of x and y are as shown in the table.

X   Y
1   10
5   190
6   360
4   124

With your knowledge of linear regression analysis, show that they do indeed obey this law. Hence, estimate the value of y when x is 10.
39. The following experimental values are said to obey a law of the form $y = ae^{bx}$, and the observed values of y and x are as shown in the table below.

X     Y
1.2   73.20
0.8   22.05
5     64.74
0.7   16.33

(i) Estimate the best values of a and b.
(ii) Hence estimate the value of y when x is 0.3.
40. Given that variables x and y have a linear relationship and the following data are provided: $s_x = 9.83$, $s_y = 25.916$ and b = 2.069. Comment on the nature of their relationship. Clearly state the reasons for your comment.
41. The first three moments about the value 4 of a variable are 2, 9.7 and -48. Find the first three moments about the mean. Also compute the coefficient of skewness and comment on the nature of the distribution.
42. Compute the coefficients of skewness and kurtosis if the first four moments about the value 3 of a variable are 1.7, 8.9, 39.5 and 211.7.
43. If the regression line of y on x is y = 0.1x + 1 and that of x on y is x = 2y - 2, find the values of y and r.
44. Show that it is impossible for two variates x and y to have the following properties.
$E(x) = 3$, $E(y) = 2$, $E(x^2) = 10$, $E(y^2) = 29$, $E(xy) = 0$
45. Is it true that if the regression line of y on x is y = 3x + 1, then that of x on y is x = 2y + 3?
46. Let r be the correlation coefficient between x and y. What is the correlation coefficient between (3x+1) and (2y-3)? What conclusion in general can you make on the property of a correlation coefficient "r"?
47. State the main principle underlying linear regression analysis. Explain the main difference between regression and correlation analysis.
48. Show that the variance of the first n positive integers is $\frac{1}{12}(n^2 - 1)$.
49. Show algebraically that Total variation = Explained variation + Unexplained variation. (Hint: Consider the fact that, on average, the mean of the estimated y's is the same as the mean of the observed y's.)

CHAPTER 14: ELEMENTARY PROBABILITY THEORY

14.1 Introduction
The word probability in its ordinary meaning refers to the chance or possibility that a certain event takes place. In other words, the event must be one whose exact occurrence no one can be certain about. Such events occur without any fixed rule or order and are not determined by anyone. Random events of this kind usually emanate from random experiments such as the toss of a coin or the throw of a die.

From this definition it follows that, in order to determine the probability of an event A, we must know the outcomes in favour of A and the outcomes not in favour of A. Together the two form the set of all possible outcomes, which is the universal set for the experiment under consideration. Accordingly, we define the probability of an event A as the fraction of the outcomes in favour of A out of the total possible outcomes.

Think of an experiment where one is tossing a coin. There are two possibilities: a head or a tail may turn up, forming a universal set of two elements. It would logically appear that the chance of a head turning up in that experiment is fifty-fifty, i.e. ½. But why?
This is of course based on the assumption that the coin is not biased, so that there is no reason why one side should be favoured over the other. Such an approach to probability theory is known as the classical theory of probability, in reference to the classical society of the time during which the theory was first developed. However, there are many cases where such assumptions are not valid, or where it is difficult to enumerate the outcomes of the entire universal set. Such situations suggest another approach to probability theory, known as empirical probability. What is meant is simply that the probability of an event should not merely be a matter of one's subjective thinking or perception, but should rather be determined by empirical evidence based on data observed over a considerable period of time and space. Just think of how you could know the chance of a first year student at SUA failing the examination, or how you could establish the chance of a smoker dying from cancer. All these demand empirical verification, as said earlier!

At this point, it suffices to add a note on the use of the word probability in statistical theory. Saying, for instance, that the chance for a student to pass an exam at SUA is 0.9 does not mean that a particular student X sitting for that exam is certain to pass. The meaning is that if that student were allowed to sit for the exam ten times, he or she would be expected to pass about nine times (and it would be no wonder if he or she passed fewer than nine times). The relevance of the probability of an event in real life should be sought in terms of repeated trials and not otherwise!

For the mastery of this topic the reader is advised, among other things, to make sure that the set theory and combinatorial mathematics outlined in the beginning are clearly understood.
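The repeated-trials reading of a probability described above can be illustrated with a small simulation. The sketch below is an assumption-laden illustration rather than part of the original notes: the function name `relative_frequency`, the fixed seed and the trial counts are all hypothetical choices, and the pass probability 0.9 is simply the figure used in the text. It counts how often an event with probability 0.9 occurs in n independent trials and reports the relative frequency.

```python
import random


def relative_frequency(n_trials, p_success, seed=2001):
    """Fraction of n_trials independent trials in which the event occurs.

    Each trial succeeds with probability p_success. The seed is fixed only
    so that the illustration gives the same numbers on every run.
    """
    rng = random.Random(seed)
    successes = sum(1 for _ in range(n_trials) if rng.random() < p_success)
    return successes / n_trials


# With few trials the relative frequency can be far from 0.9;
# with many trials it tends to settle close to 0.9.
for n in (10, 100, 10_000, 100_000):
    print(n, relative_frequency(n, 0.9))
```

With only ten trials the observed number of passes may well differ from nine, which is the point made in the text; it is only over many repetitions that the relative frequency tends to stay near the stated probability.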
14.2 Basic Probability Concepts

14.2.1 Experiment
An experiment in probability theory is any well-defined action (trial) whose outcome is not known with certainty. Examples of trials are the toss of a fair coin, the throw of a die or sitting for an examination.

14.2.2 Possibility space/set
In an experiment, the set of all possible outcomes is called the possibility set (or sample space), while each outcome is called a sample point. In the throw of a die there are six possible outcomes, which form a sample space of six elements.

14.2.3 The probability of an event A
As explained in the introductory part, the probability of an event A is defined as the ratio of the number of elements in the event set to the number of elements in the possibility set. That is, if the possibility set S consists of equally likely outcomes, then the probability of an event A, written P(A), is defined as P(A) = n(A)/n(S).
Consider an experiment of throwing an ordinary die and the event A that the number occurring is 5 or 1. The event set is A = {5, 1} and the possibility set is S = {1, 2, 3, 4, 5, 6}, so P(A) = n(A)/n(S) = 2/6 = 1/3.
From this definition it follows that always $0 \le P(A) \le 1$, because A is a subset of the universal set. If P(A) = 0 the event cannot occur, i.e. it is an impossible event. If P(A) = 1 the event is a sure event, i.e. it must occur.

14.2.4 Complementary Events
Let $\bar{A}$ denote the event that A does not occur; then the following is true:
$P(\bar{A}) = 1 - P(A)$, or $P(A) + P(\bar{A}) = 1$.
Example
A bag contains 5 black and 3 white balls. A ball is drawn from the bag. Find the probability that (i) the ball drawn is black, (ii) it is not black.
Sol:
(i) P(black) = 5/8
(ii) P(not black) = 1 - P(black) = 1 - 5/8 = 3/8

14.2.5 Independent Events
Two events A and B are independent if the occurrence of one event does not affect the occurrence of the other. If two events A and B are independent, then $P(A \cap B) = P(A) \times P(B)$. This is called the multiplication law for the independence of two events.
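The multiplication law can be checked by listing a possibility set in full. The following sketch is an illustration under stated assumptions, not part of the original notes: it takes two throws of a fair die (36 equally likely outcomes), and the events "first throw is even" and "second throw is greater than 4", as well as the helper names `prob`, `first_even` and `second_gt4`, are hypothetical choices. Probabilities are computed as n(A)/n(S) using exact fractions.

```python
from fractions import Fraction
from itertools import product

# Possibility set for two throws of a fair die: 36 equally likely pairs.
sample_space = list(product(range(1, 7), repeat=2))


def prob(event):
    """P(A) = n(A) / n(S) for an event given as a true/false test on outcomes."""
    favourable = [outcome for outcome in sample_space if event(outcome)]
    return Fraction(len(favourable), len(sample_space))


def first_even(outcome):
    return outcome[0] % 2 == 0


def second_gt4(outcome):
    return outcome[1] > 4


p_a = prob(first_even)
p_b = prob(second_gt4)
p_a_and_b = prob(lambda o: first_even(o) and second_gt4(o))

# For these two events the joint probability equals the product.
print(p_a, p_b, p_a_and_b, p_a * p_b)
```

Because the first throw has no influence on the second, counting the favourable pairs directly gives the same value as multiplying the two separate probabilities.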
Example 1 The chances that A and B will solve a question are 1/2 and 2/3 respectively. If they both attempt the question, find the chance that the question will not be solved.
Sol: The question will not be solved if and only if both of them fail to solve it. So we must consider the joint occurrence of the events "A fails and B fails to solve the question". Here P(A fails) = 1 - 1/2 = 1/2 and P(B fails) = 1 - 2/3 = 1/3. Since the two events are independent, the chance is 1/2 x 1/3 = 1/6.
Example 2 A die is thrown twice. Find the probability of obtaining a 4 on the first throw and an odd number on the second throw.
Sol: We consider two events: event A, a 4 on the first throw, and event B, an odd number on the second throw. P(A) = 1/6, and P(B) = 3/6 since there are three odd numbers among the six possible outcomes. Knowing that A and B are independent, P(A and B) = P(A) x P(B) = 1/6 x 3/6 = 3/36 = 1/12.
14.2.6 Conditional Probability
Example An urn contains 3 red balls and 5 white balls. A ball is drawn at random and its colour is noted; without replacing the first ball, a second ball is drawn and its colour is noted as well. Find the probability that (a) both balls are white (b) both balls are red.
Sol: Let w = the event that the selected ball is white and r = the event that the selected ball is red. We shall write P(w2/w1) to mean the probability that the second ball selected is white given that the first was white. Note that the notation does not imply division!
(a) P(w1 ∩ w2) = P(w1) x P(w2/w1) = (5/8) x (4/7) = 5/14
(b) P(r1 ∩ r2) = P(r1) x P(r2/r1) = (3/8) x (2/7) = 3/28
In general, if two events A and B are not independent, then P(A ∩ B) = P(A) x P(B/A) or P(A ∩ B) = P(B) x P(A/B).
14.2.7 Additive law of probability
If A and B are any two events of the same experiment such that P(A) ≠ 0 and P(B) ≠ 0, then P(A or B) = P(A) + P(B) - P(A and B). Symbolically, P(A ∪ B) = P(A) + P(B) - P(A ∩ B).
Example 1 A and B are shooting at a target. The chance for A to hit the target is 1/3 and for B is 2/3.
Find the probability that the target will be hit. The target will be hit if either A or B hits it.
Sol: From the established law, and assuming that A and B shoot independently so that P(A and B) = P(A) x P(B), we have P(A or B) = P(A) + P(B) - P(A and B) = 1/3 + 2/3 - (1/3) x (2/3) = 7/9.
Example 2 Given that P(A/B) = 2/5, P(B) = 1/4 and P(A) = 1/3, find (a) P(B/A).
Sol: (a) From P(A ∩ B) = P(B) x P(A/B) = P(A) x P(B/A), we get P(B/A) = P(A/B) x P(B)/P(A) = (2/5 x 1/4)/(1/3) = 3/10.
The multiplication law of probability can be extended to the case of more than two events.
Exercise Given three events A, B, C, establish the additive law for the three events.
14.2.8 Mutually Exclusive Events
If either event A or event B can occur but not both, then the two events A and B are said to be mutually exclusive. In this particular case P(A ∩ B) = 0. So for mutually exclusive events A and B, P(A or B) = P(A ∪ B) = P(A) + P(B).
Example A die is thrown. Find the probability of having a number which is either less than 3 or greater than 4.
Sol: The two events A (less than 3) and B (greater than 4) are mutually exclusive, since no number can be less than 3 and at the same time greater than 4. Accordingly, P(A or B) = P(A) + P(B) = 2/6 + 2/6 = 2/3.
14.2.9 Exhaustive Events
If two events A and B are such that A ∪ B = S, then P(A ∪ B) = 1 and the events A and B are said to be exhaustive. Consider S = {1, 2, 3, 4, 5, 6}, A = {1, 2} and B = {3, 4, 5, 6}. Then A ∪ B = {1, 2, 3, 4, 5, 6} and P(A ∪ B) = n(A ∪ B)/n(S) = 6/6 = 1. Therefore A and B are exhaustive events.
14.3 Worked examples
Example 1 A box contains 12 balls, of which 4 are white, 3 are blue and 5 are red. 3 balls are drawn at random from the box. Find the chance that (i) all three balls are of the same colour, (ii) two of the balls are of the same colour, and (iii) all three balls are of different colours.
Sol: The same colour means either all white or all blue or all red. These three events are mutually exclusive. Accordingly, we have
$$\frac{\binom{4}{3}+\binom{3}{3}+\binom{5}{3}}{\binom{12}{3}} = \frac{4+1+10}{220} = \frac{3}{44}$$
Two of the same colour means either two white with one of the rest, or two blue with one of the rest, or two red with one of the rest.
Accordingly,
$$\frac{\binom{3}{2}\binom{9}{1}+\binom{4}{2}\binom{8}{1}+\binom{5}{2}\binom{7}{1}}{\binom{12}{3}} = \frac{27+48+70}{220} = \frac{29}{44}$$
All three of different colours means one white, one blue and one red. Accordingly,
$$\frac{\binom{4}{1}\binom{3}{1}\binom{5}{1}}{\binom{12}{3}} = \frac{60}{220} = \frac{3}{11}$$
Example 2 There are two urns. The first contains 5 red balls and 3 white balls. The second contains 7 red balls and 3 white balls. An urn is randomly selected and a ball is drawn from it. What is the probability that the ball drawn is red?
Sol: The red ball can be drawn from either the 1st urn or the 2nd urn. The chance that a red ball is drawn from the 1st urn is 1/2 x 5/8, while the chance that a red ball is drawn from the 2nd urn is 1/2 x 7/10. Since the two events are mutually exclusive, the probability of drawing a red ball is 1/2 x 5/8 + 1/2 x 7/10 = 53/80.
14.4 Bayesian Probability theory
Suppose event A can be caused by a set of mutually exclusive and exhaustive events E1, E2, E3, …, En. By the conditional probability theory, P(A ∩ Ei) = P(A/Ei) P(Ei). Since event A can occur through E1, E2, E3, …, En, the probability that event A occurs is
$$P(A) = P(A\cap E_1)+P(A\cap E_2)+\dots+P(A\cap E_n) = \sum_{i=1}^{n} P(A\cap E_i)$$
The probability of event A occurring is known as the total probability. Suppose event A has occurred or must occur. What would then be our interest? Most certainly, our interest would be to know which event has caused it or may lead to its occurrence. For example, if A occurs, what is the chance that event Ei has caused it? We know that
$$P(E_i/A) = \frac{P(A\cap E_i)}{P(A)} = \frac{P(A\cap E_i)}{\sum_{j=1}^{n} P(A\cap E_j)} = \frac{P(A/E_i)\,P(E_i)}{\sum_{j=1}^{n} P(A/E_j)\,P(E_j)}$$
What we have so far deduced is known as Bayesian theory (Bayes' theorem), owing its name to the English theologian and probabilist Thomas Bayes (1702-1761), who first derived it.
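The total probability and Bayes formulas above translate directly into a few lines of code. The sketch below is only an illustration under stated assumptions: the function names `total_probability` and `posterior` are hypothetical, and the numbers are those of the two-urn example of section 14.3 (5 red and 3 white balls in the first urn, 7 red and 3 white in the second, each urn chosen with chance 1/2).

```python
from fractions import Fraction

def total_probability(priors, likelihoods):
    """P(A) = sum of P(A/Ei) * P(Ei) over mutually exclusive, exhaustive Ei."""
    return sum(l * p for p, l in zip(priors, likelihoods))

def posterior(priors, likelihoods):
    """Bayes: P(Ei/A) = P(A/Ei) * P(Ei) / P(A), returned for every i."""
    p_a = total_probability(priors, likelihoods)
    return [l * p / p_a for p, l in zip(priors, likelihoods)]

# Two urns chosen with equal chance; A = "the ball drawn is red".
priors = [Fraction(1, 2), Fraction(1, 2)]        # P(urn 1), P(urn 2)
likelihoods = [Fraction(5, 8), Fraction(7, 10)]  # P(red/urn 1), P(red/urn 2)

p_red = total_probability(priors, likelihoods)   # 53/80
urn_given_red = posterior(priors, likelihoods)   # [25/53, 28/53]
```

The posterior probabilities always add up to 1, because the events Ei are exhaustive.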
Example 1 A product in a certain plant can be manufactured by any of three different machines, M1, M2, M3. The chances for the machines to manufacture a defective item are respectively 0.3, 0.2 and 0.5. Assuming that the chance for a product to be manufactured by any one of the machines is 1/3, find (a) the chance that a manufactured product is defective; (b) the chance that a defective product was manufactured by machine M2.
Sol: (a) We have P(D/M1) = 0.3, P(D/M2) = 0.2 and P(D/M3) = 0.5. The chance for a defective product is
$$P(D) = \sum_{i=1}^{3} P(D\cap M_i) = \sum_{i=1}^{3} P(D/M_i)\,P(M_i) = 0.3\times\tfrac{1}{3}+0.2\times\tfrac{1}{3}+0.5\times\tfrac{1}{3} = \tfrac{1}{3} \approx 0.33$$
(b) P(M2/D) = P(D ∩ M2)/P(D) = P(D/M2) x P(M2)/P(D) = (0.2 x 1/3)/(1/3) = 0.2
Example 2 There are two urns. The 1st contains 8 red and 2 green balls. The second contains 4 red and 3 green balls. A ball is drawn at random from one of the urns and found to be green. What is the chance that the ball was drawn from the 2nd urn?
Sol: Let U1 stand for the 1st urn, U2 for the 2nd urn and G for the event that a green ball is selected. Then P(U1) = 1/2, P(U2) = 1/2, P(G/U1) = 2/10 and P(G/U2) = 3/7. We want to find P(U2/G) = P(U2 ∩ G)/P(G). But we know that a green ball can be selected from either the 1st urn or the 2nd urn. Accordingly,
$$P(G) = P(G\cap U_1)+P(G\cap U_2) = \sum_{i=1}^{2} P(G/U_i)\,P(U_i) = 0.5\times\tfrac{2}{10}+0.5\times\tfrac{3}{7} = \tfrac{11}{35} \approx 0.31$$
Therefore P(U2/G) = P(U2 ∩ G)/P(G) = P(G/U2) x P(U2)/P(G) = (3/7 x 0.5)/(11/35) = 15/22 ≈ 0.68.
Exercise 14
1. An ordinary die is thrown. Find the probability that the number obtained (a) is a multiple of 3 (b) is less than 7 (c) is greater than 10.
2. A sample of 28 rats was treated with a certain dose of a drug, with an anticipation of 5 different effects that may be observed. The number of effects against the number of rats affected were observed and recorded as follows:
Number of effects: 0 1 2 3 4 5
Number of rats: 4 12 8 3 2 1
If a rat is chosen at random, find the probability that there are (a) 3 effects on the rat (b) 2 or 5 effects (c) at least one effect.
3. For the events A and B it is known that P(A) = 2/3, P(A ∩ B) = 5/12 and P(A ∪ B) = 3/4. Find P(B).
4.
A and B are two events such that P(A) = 8/15, P(B) = 2/3 and P(A ∩ B) = 1/5. Are A and B exhaustive events?
5. A and B are exhaustive events and it is known that P(A/B) = 1/4 and P(B) = 2/3. Find P(A).
6. Write short notes on the following terms: (a) Probability (b) Sample space (c) Sample point (d) Mutually exclusive events (e) Independent events.
7. The probability that an animal treated with a certain chemical will die is 0.2. Find the probability that (a) two treated animals will die (b) of two treated animals, one will die and the other will survive.
8. The probability that a customer will visit a pharmacy in a day is 0.025. Find the probability that on two consecutive days at least one customer will visit the pharmacy.
9. A coin is tossed and a die is thrown. What is the probability of obtaining a head on the coin and an even number on the die?
10. Suppose that in general 20% of the patients affected with a specific disease die. In a random sample of 3 such patients, what is the probability of 2 deaths?
11. A die is thrown 3 times. What is the probability that (a) all throws show 6 (b) all throws are alike?
12. In a certain experiment, known as a binomial experiment, a coin was tossed 3 times. Find the probability of having (i) 0 heads (ii) 1 head (iii) 2 heads (iv) 3 heads. Find the sum of the probabilities found above and comment on the result obtained.
13. Of 150 patients examined at a clinic, it was found that 90 had heart trouble, 50 had diabetes, and 20 had both diseases. What percentage of the patients had either heart trouble or diabetes?
14. If 60 percent of American males of the age of 20 and 65 percent of American females of the age of 20 live to be 70, what is the probability that an American couple married when they were 20 years old will live to celebrate their golden wedding?
15.
If two guinea pigs, one of pure black race and the other of pure white race, are mated, the probabilities that each offspring of the second generation is pure white, pure black or of mixed colour are respectively 1/4, 1/4 and 1/2. What is the probability that 3 such offspring would possess different colours?
16. From a sample of 6 orange trees and 8 lemon trees, 5 trees are chosen at random for experimentation. Find the probability that (a) 2 orange trees and 3 lemon trees are chosen (b) more orange trees than lemon trees are chosen.
17. A balanced diet is said to have a combination of certain amounts of proteins, carbohydrates and vegetables. A student was asked to prepare different types of meals from a set of 5 different types of carbohydrates, 6 of proteins and 2 of vegetables. How many different types of meals will the student prepare if she uses (i) 2 types of carbohydrates, 3 of proteins and 1 of vegetables (ii) 1 type of carbohydrates, 2 of proteins and 2 of vegetables?
18. In how many different ways can the letters of the word stratified be arranged?
19. A box contains 5 white and 2 black balls. A second box, identical with the first, contains 3 white and 5 black balls. One box is chosen and a ball is withdrawn from it. What is the probability that the ball drawn is white?
20. Find the number of permutations of the letters of the word GRAMMAR.
21. Find the number of ways in which 3 animals for an experiment can be chosen from eight different animals.
22. Find the number of ways in which six boys can be divided into two teams of three.
23. In a sample of 24 animals, 7 have black colour. If two animals are chosen at random from the sample, find the probability that (i) they both have black colour (ii) neither has black colour. (iii) If 3 animals are chosen at random, find the probability that more than 1 will have black colour.
24.
In an experiment to determine the effects of a certain fertilizer on crops, two plots of land containing fruit trees and wood trees were considered. Plot 1 contains 40 fruit trees and 10 wood trees. Plot 2 contains 20 fruit trees and 70 wood trees. An unbiased coin is tossed. If a head turns up, a tree is selected from plot 1, while if a tail turns up a tree is selected from plot 2. (a) Calculate the probability that a fruit tree is selected for the experiment in one trial. (b) Given that a fruit tree is selected for experimentation, calculate the probability that a head was obtained when the coin was tossed.
25. In a certain experiment, 5 seeds of normal type were planted. Assuming other factors remain equal, find the chance that (i) none will germinate (ii) only two will germinate (iii) at least one will germinate.
26. If on average 1 ship in every 10 is sunk, find the chance that out of 5 ships expected, at least 4 will arrive safely.
27. A six-faced die is so biased that it is twice as likely to show an even number as an odd number when thrown. It is thrown twice. What is the probability that the sum of the two numbers thrown is even?
28. The chances of winning three of the five games and four of the five games are equal. What is the chance of winning all five games?
29. When a soldier fires at a target, the probability that he hits it is 1/3 for soldier A, 1/6 for soldier B, 1/6 for soldier C and 1/12 for soldier D. If all four soldiers A, B, C, D fire at the target simultaneously, calculate the probability that the target is hit by one or more of them.
30. In a bolt factory, machines A, B, C manufacture respectively 25%, 35% and 40% of the total production. Of their output 5, 4 and 2 percent respectively are defective bolts. A bolt is drawn at random from the product and found defective. What are the probabilities that it was manufactured by machines A, B, C?
31. An urn contains 4 white and 5 black balls; a second urn contains 5 white and 4 black balls.
One ball is transferred from the first to the second urn, then a ball is drawn from the second urn. What is the probability that it is white?
32. A and B toss a coin alternately on the understanding that the first to obtain a head wins the toss. A begins. Find their respective chances of winning.
33. What is the probability of picking either a red piece or a white piece from a container which contains 15 red, 5 white and 13 green pieces?
34. If P(A) = 1/2, P(B) = 1/3 and P(C) = 2/3, find P(A or B or C).
35. A card is picked at random from many cards numbered 1, 2, 3, …, 2000. Find the chance that the picked card is divisible by either 3 or 7.
36. There are 4 female students, 4 male students and 4 lecturers available for interview. Three persons are chosen at random for interview. Find the probability that all three categories of people are selected, given that sampling is done (i) with replacement (ii) without replacement.
37. Three groups of plants contain respectively 3 lemon trees and 1 orange tree; 2 lemon trees and 2 orange trees; and 1 orange tree and 3 lemon trees. One tree is selected at random from each group. Find the probability that the three selected trees consist of 1 lemon tree and 2 orange trees.
38. A bag contains 50 tickets numbered 1, 2, 3, …, 50, of which 5 are drawn at random and arranged in ascending order of their numbers, X1 < X2 < X3 < X4 < X5. What is the probability that X3 = 30?
CHAPTER 15: INTRODUCTION TO PROBABILITY DISTRIBUTIONS
15.1 Discrete random variables and discrete probability distributions
A variable whose occurrence is subject to chance is known as a random variable. In tossing a coin n times, the variable X denoting the number of heads that appear is certainly a random variable whose possible values are 0, 1, 2, 3, …, n. For a variable to be random it must be determined from a random experiment. Basically, a variable describes, with numerical quantification, a certain outcome of interest in a given experiment.
Our interest in this discussion is focused on all possible values of a variable and not just some of them. By implication, a random variable describes numerically an exhaustive set of sample points that can be taken by the event of interest in a given experiment. In conclusion, a discrete variable X assuming the values x1, x2, …, xn with associated probabilities p1, p2, …, pn will be a random variable if and only if $\sum_{i=1}^{n} p_i = 1$.
Example 1 Let X be the variable "the number of fours when two dice are thrown". Show that X is a discrete random variable.
Sol: When two dice are thrown, the possible number of fours is 0, 1 or 2. Therefore X can take the values 0, 1 and 2, meaning that X is a discrete variable.
P(X=0) = (5/6) x (5/6) = 25/36
P(X=1) = (5/6) x (1/6) + (1/6) x (5/6) = 10/36
P(X=2) = (1/6) x (1/6) = 1/36
Now $\sum_{\text{all } x_i} p(x_i)$ = 25/36 + 10/36 + 1/36 = 1. Therefore X is a discrete random variable.
Example 2 If we toss a fair coin twice, the number of heads obtained is 0, 1 or 2.
P(X=0) = (1/2) x (1/2) = 1/4
P(X=1) = (1/2) x (1/2) + (1/2) x (1/2) = 1/2
P(X=2) = (1/2) x (1/2) = 1/4
The value being considered is the number of heads. It can only take the values 0, 1, 2 and so is called a discrete variable. Again, P(X=0) + P(X=1) + P(X=2) = 1/4 + 1/2 + 1/4 = 1. So X is a discrete random variable.
In the examples given, we had the set of values assumed by a specified random variable together with their corresponding probabilities. Such a distribution is called a probability distribution (recall the frequency distribution). We can present such a distribution in tabular form.
Table 15.1
x: 0 1 2
P(X=x): 1/4 1/2 1/4
Sometimes the probability distribution of X can be expressed as a function of x in the form of a formula.
In the above table one could present the relationship in the following way:
$$p(x) = \begin{cases} \tfrac{1}{4} & \text{if } x = 0, 2 \\ \tfrac{1}{2} & \text{if } x = 1 \end{cases}$$
Such a function, providing the probabilities of X at its various values, is known as the probability density function (p.d.f) or probability mass function (p.m.f).
Example The p.d.f of a discrete random variable Y is given by P(Y=y) = cy² for y = 0, 1, 2, 3, 4. Given that c is a constant, find the value of c.
Sol: From the definition of a random variable, $\sum p_i$ must be equal to unity. So $\sum P(Y=y) = \sum cy^2 = 1$. We have $c(0^2+1^2+2^2+3^2+4^2) = 1$, that is 30c = 1, and thus c = 1/30.
15.2 Expectation, E(X)
15.2.1 Introduction
The expectation E(X) of a random variable X is simply the mean of the probability distribution of X. It shows the average value of X expected after the conduct of a given random experiment. Thus
$$E(X) = \frac{\sum_{i=1}^{n} x_i p_i}{\sum_{i=1}^{n} p_i} = \sum_{i=1}^{n} x_i p_i$$
since $\sum_{i=1}^{n} p_i$ is always equal to 1.
Example 1 A random variable X has the probability distribution shown in the table below. Find E(X).
Table 15.2
x: 1 2 3
P(X=x): 0.3 0.2 0.5
Sol: E(X) = 1 x 0.3 + 2 x 0.2 + 3 x 0.5 = 2.2
Example 2 In a certain gambling game, a die is thrown and you bet on a number to appear. If your guess is correct, 100/= is awarded to you; otherwise you are the one to pay 100/=. If Mr A is to play only once, what is his expected gain?
Sol: We need to find the probability distribution of the random variable concerned. The variable X can assume only two values, either -100 or 100, depending on the outcome of the die.
Table 15.3
x: -100 100
P(X=x): 5/6 1/6
E(X) = -100 x 5/6 + 100 x 1/6 = -66.67, i.e. about -67/=
15.2.2 The Expectation of any Function "f(x)"
Sometimes we are interested in the expectation of a function of x. For example, one might be interested in the expected value of a linear function of x or a quadratic function of x, and so on. In general, if g(x) is any function of the discrete random variable X then $E(g(X)) = \sum g(x_i)\,P(x_i)$.
Theorems
1.
E(a) = a, where a is a constant
2. E(aX) = aE(X), where a is a constant
3. E(aX + b) = aE(X) + b, where a and b are any constants
4. E[f(X) + g(X)] = E[f(X)] + E[g(X)]
The student is advised to prove the above identities. Their proofs should be straightforward, based on the properties of the sigma notation outlined earlier.
Example 3 The probability distribution of X is shown in the table below.
Table 15.4
x: 0 1 4
P(X=x): 0.4 0.4 0.2
Find the following: (a) E(X) (b) E(2X) (c) E(7X+1) (d) E(X²) (e) E(X + 5X²)
Sol: (a) E(X) = 0 x 0.4 + 1 x 0.4 + 4 x 0.2 = 1.2
(b) E(2X) = 2 x E(X) = 2 x 1.2 = 2.4
(c) E(7X+1) = 7E(X) + 1 = 7 x 1.2 + 1 = 9.4
(d) E(X²) = 0² x 0.4 + 1² x 0.4 + 4² x 0.2 = 3.6
(e) E(X + 5X²) = E(X) + E(5X²) = E(X) + 5E(X²) = 1.2 + 5 x 3.6 = 19.2
15.2.3 Variance, V(X)
Recall that in a frequency distribution
$$V(X) = \frac{\sum f_i (x_i-\bar{x})^2}{\sum f_i} = \frac{\sum f_i x_i^2}{\sum f_i} - \bar{x}^2$$
But if X is a discrete random variable with E(X) = μ then
$$V(X) = \frac{\sum p_i (x_i-\mu)^2}{\sum p_i} = \frac{\sum x_i^2 p_i}{\sum p_i} - \mu^2 = \sum x_i^2 p_i - \mu^2 = E(X^2) - \mu^2$$
For example, the variance of the distribution given in the preceding section would be V(X) = 3.6 - 1.2² = 2.16.
Theorem
1. Var(aX) = a²Var(X)
2. Var(aX + b) = a²Var(X)
The student is advised to verify the above identities.
Exercise A committee of six students is taken out of a class of 10 females and 8 males. Find the expected number of females in the committee. Find also the standard deviation of the possible number of female students.
15.2.4 Moments
Just as we have moments in frequency distributions, we also have moments in probability distributions. The frequencies are replaced by probabilities in the original formulae.
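As a minimal sketch of what replacing frequencies by probabilities looks like in practice, the helpers below compute expectations, the variance and a moment about the mean for the distribution of Table 15.4. The function names `expectation`, `variance` and `central_moment` are assumptions introduced for illustration, and P(X=0) is taken as 0.4, the value used in the worked solution of Example 3. Exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def expectation(dist, g=lambda x: x):
    """E(g(X)) = sum of g(x_i) * P(x_i) over all values of a discrete X."""
    return sum(g(x) * p for x, p in dist.items())

def variance(dist):
    """V(X) = E(X^2) - [E(X)]^2."""
    mu = expectation(dist)
    return expectation(dist, lambda x: x ** 2) - mu ** 2

def central_moment(dist, r):
    """r-th moment about the mean, with probabilities in place of frequencies."""
    mu = expectation(dist)
    return expectation(dist, lambda x: (x - mu) ** r)

# Table 15.4, taking P(X=0) = 0.4 as used in the worked solution.
table_15_4 = {0: Fraction(2, 5), 1: Fraction(2, 5), 4: Fraction(1, 5)}

mean = expectation(table_15_4)                          # 6/5   = 1.2
mean_sq = expectation(table_15_4, lambda x: x ** 2)     # 18/5  = 3.6
linear = expectation(table_15_4, lambda x: 7 * x + 1)   # 47/5  = 9.4
var = variance(table_15_4)                              # 54/25 = 2.16
```

The second moment about the mean, `central_moment(table_15_4, 2)`, coincides with `var`, which is the identity V(X) = E(X²) - μ² read in the other direction.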
The rth moment about the mean of a probability distribution is given by
$$\mu_r = \sum (x_i-\mu)^r P(x_i) = E(X-\mu)^r$$
whereas the rth moment about a point a is given by
$$\mu'_r = \sum (x_i-a)^r P(x_i) = E(X-a)^r$$
15.3 Bivariate probability distributions
It sometimes happens that our interest is focused on an event involving the joint occurrence of two or more variables. These variables may be independent or dependent, and these two conditions have implications for the statistics involving the two variables. When we consider the joint occurrence of two variables, the following statements are valid.
1. E(X ± Y) = E(X) ± E(Y)
2. E(XY) = E(X) x E(Y) if X and Y are independent
3. Var(X ± Y) = V(X) + Var(Y) ± 2Cov(X, Y)
4. Var(X ± Y) = V(X) + Var(Y) if X and Y are independent
Proof
1. Writing $p_{ij}$ for P(X = x_i, Y = y_j), with marginal probabilities $p_i = \sum_j p_{ij}$ and $p_j = \sum_i p_{ij}$,
$$E(X\pm Y) = \sum_{i=1}^{n}\sum_{j=1}^{k}(x_i\pm y_j)\,p_{ij} = \sum_{i=1}^{n}\sum_{j=1}^{k}x_i\,p_{ij} \pm \sum_{i=1}^{n}\sum_{j=1}^{k}y_j\,p_{ij} = \sum_{i=1}^{n}x_i\,p_i \pm \sum_{j=1}^{k}y_j\,p_j = E(X)\pm E(Y)$$
2. If X and Y are independent then the joint p.d.f of X and Y, P(X=x, Y=y), may be written as P(X=x) x P(Y=y) = $p_i p_j$. So
$$E(XY) = \sum_{i=1}^{n}\sum_{j=1}^{k}x_i y_j\,p_i p_j = \sum_{i=1}^{n}x_i\,p_i\sum_{j=1}^{k}y_j\,p_j = E(X)\,E(Y)$$
3. Having proved the 1st and the 2nd, the 3rd becomes straightforward.
$$Var(X\pm Y) = E\big[(X\pm Y)^2\big]-\big[E(X\pm Y)\big]^2 = E(X^2)+E(Y^2)\pm 2E(XY)-E^2(X)-E^2(Y)\mp 2E(X)E(Y)$$
Rearranging the terms accordingly we get Var(X ± Y) = V(X) + Var(Y) ± 2Cov(X, Y), where Cov(X, Y) = E(XY) - E(X)E(Y).
4. If the two variables are independent then, from the 2nd, Cov(X, Y) = 0 and hence Var(X ± Y) = V(X) + Var(Y).
Example 1 Suppose two coins are tossed once.
Let X be the random variable denoting the number of heads on the 1st coin and let Y be the random variable denoting the number of heads on the second coin. Obtain the probability distributions of X and Y, and then (a) find E(X), Var(X), E(Y) and Var(Y); (b) obtain the probability distributions of the random variables X+Y, X-Y and XY; (c) find E(X+Y), Var(X+Y), E(X-Y), Var(X-Y) and E(XY), and compare the results with the results in part (a) above.
Sol: The probability distributions are as follows.
Table 15.5: For X
x: 0 1
P(X=x): 0.5 0.5
Table 15.5: For Y
y: 0 1
P(Y=y): 0.5 0.5
(a) E(X) = 0 x 0.5 + 1 x 0.5 = 0.5 and Var(X) = E(X²) - μ² = (0² x 0.5 + 1² x 0.5) - 0.5² = 0.25. By symmetry E(Y) = 0.5 and V(Y) = 0.25.
(b) Let V = X+Y, U = X-Y and R = XY. We shall consider all the possible pairs of x and y in order to determine the possible values assumed by U, V and R. Accordingly, we have the following probability distributions for V, U and R.
Table 15.6: For V
v = 0: P(V=v) = 1/2 x 1/2 = 0.25
v = 1: P(V=v) = 1/2 x 1/2 + 1/2 x 1/2 = 0.5
v = 2: P(V=v) = 1/2 x 1/2 = 0.25
Table 15.7: For U
u = -1: P(U=u) = 1/2 x 1/2 = 0.25
u = 0: P(U=u) = 1/2 x 1/2 + 1/2 x 1/2 = 0.5
u = 1: P(U=u) = 1/2 x 1/2 = 0.25
Table 15.8: For R
r = 0: P(R=r) = 1/2 x 1/2 + 1/2 x 1/2 + 1/2 x 1/2 = 0.75
r = 1: P(R=r) = 1/2 x 1/2 = 0.25
(c) Comparison
(i) For X+Y: E(X+Y) = E(V) = 0 x 0.25 + 1 x 0.5 + 2 x 0.25 = 1 = 0.5 + 0.5 = E(X) + E(Y). V(X+Y) = Var(V) = (0² x 0.25 + 1² x 0.5 + 2² x 0.25) - 1² = 0.5 = 0.25 + 0.25 = V(X) + V(Y), since X and Y are independent.
(ii) For X-Y: E(X-Y) = E(U) = -1 x 0.25 + 0 x 0.5 + 1 x 0.25 = 0 = 0.5 - 0.5 = E(X) - E(Y). V(X-Y) = V(U) = ((-1)² x 0.25 + 0² x 0.5 + 1² x 0.25) - 0² = 0.5 = 0.25 + 0.25 = V(X) + V(Y), since X and Y are independent.
(iii) For XY: E(XY) = E(R) = 0 x 0.75 + 1 x 0.25 = 0.25 = 0.5 x 0.5 = E(X) x E(Y), since X and Y are independent.
Example 2 The p.d.f of a joint distribution of X and Y is given as
$$f(x,y) = \begin{cases} \dfrac{0.2y^2+x}{46} & \text{for } x = 1,2,4 \text{ and } y = 1,2,3,4 \\ 0 & \text{otherwise} \end{cases}$$
(the divisor 46 makes the probabilities sum to 1). Find (i) E(X) (ii) E(Y) (iii) E(XY) and verify whether X and Y are independent or not.
Sol:
(i) $$E(X) = \sum_{\text{all } y}\sum_{\text{all } x} x\,\frac{0.2y^2+x}{46} = \frac{1}{46}\Big[0.2\sum_{\text{all } y}y^2\sum_{\text{all } x}x + \sum_{\text{all } y}1\sum_{\text{all } x}x^2\Big] = \frac{1}{46}\big[0.2(1^2+2^2+3^2+4^2)(1+2+4)+(1+1+1+1)(1^2+2^2+4^2)\big] = \frac{126}{46} \approx 2.74$$
(ii) $$E(Y) = \sum_{\text{all } y}\sum_{\text{all } x} y\,\frac{0.2y^2+x}{46} = \frac{1}{46}\Big[0.2\sum_{\text{all } y}y^3\sum_{\text{all } x}1 + \sum_{\text{all } y}y\sum_{\text{all } x}x\Big] = \frac{1}{46}\big[0.2(1^3+2^3+3^3+4^3)(1+1+1)+(1+2+3+4)(1+2+4)\big] = \frac{130}{46} \approx 2.83$$
(iii) $$E(XY) = \sum_{\text{all } y}\sum_{\text{all } x} xy\,\frac{0.2y^2+x}{46} = \frac{1}{46}\Big[0.2\sum_{\text{all } y}y^3\sum_{\text{all } x}x + \sum_{\text{all } y}y\sum_{\text{all } x}x^2\Big] = \frac{1}{46}\big[0.2(1^3+2^3+3^3+4^3)(1+2+4)+(1+2+3+4)(1^2+2^2+4^2)\big] = \frac{350}{46} \approx 7.61$$
Since E(X) x E(Y) ≈ 2.74 x 2.83 ≈ 7.74 ≠ 7.61 ≈ E(XY), X and Y are not independent.
15.4 Marginal Probability Density Function
The joint probability distribution function P(x, y) describes the joint occurrence of the two variables x and y. We can, however, derive the probability distribution of one variable from the joint distribution by fixing the other. If we sum P(x, y) with respect to y (keeping x fixed), i.e. take $\sum_{\text{all } y} P(x,y)$, the result is simply P(x), and vice versa. Such probability distributions for a single variable, derived from the joint distribution of the two, are known as marginal density functions. Consider the joint p.d.f of Example 2 in the preceding section, where we had
$$f(x,y) = \begin{cases} \dfrac{0.2y^2+x}{46} & \text{for } x = 1,2,4 \text{ and } y = 1,2,3,4 \\ 0 & \text{otherwise} \end{cases}$$
The marginal densities for x and y can be found as follows:
$$P(x) = \frac{1}{46}\sum_{\text{all } y}(0.2y^2+x) = \frac{0.2}{46}\sum_{\text{all } y}y^2+\frac{x}{46}\sum_{\text{all } y}1 = \frac{0.2}{46}(30)+\frac{4x}{46} = \frac{4x+6}{46} \quad \text{for } x = 1,2,4$$
$$P(y) = \frac{1}{46}\sum_{\text{all } x}(0.2y^2+x) = \frac{0.2y^2}{46}\sum_{\text{all } x}1+\frac{1}{46}\sum_{\text{all } x}x = \frac{0.6y^2+7}{46} \quad \text{for } y = 1,2,3,4$$
15.5 Continuous random variable and continuous probability distributions
15.5.1 Introduction
A continuous random variable is a theoretical representation of a continuous variable such as height, mass or time.
The probability density function of a continuous random variable X is often denoted f(x), where f(x) ≥ 0 throughout the range of values for which X is valid. For a discrete random variable the sum of the probabilities is equal to one. This is also the case for a continuous random variable, with the sum replaced by the total area under f(x). However, for a continuous random variable we do not attach probabilities to exact values of x such as 2, 5 or 3; instead we consider an interval of x values rather than a single value. In fact P(X = a), where a is an exact value, is zero.
Example X is the random variable "the delay in hours of a flight from an airport", where f(x) = 0.2 - 0.02x for 0 ≤ x ≤ 10. Find the probability that (a) the delay will be less than 4 hours (b) the delay will be between 2 and 6 hours.
[Figure: sketch of f(x) against x]
As you can see, the values of x, the number of hours, range from 0 to 10, and there are infinitely many values between 0 and 10 inclusive. Any interval selected from the given range of x values may be thought of as having a corresponding portion of the total area under the curve of f(x), as suggested by the rectangles indicated in the sketch. In other words, considering P(0 < X < a) is much like thinking of the area under the curve between x = 0 and x = a, out of the total area under the curve from x = 0 to x = 10. Therefore
$$P(0<X<a) = \frac{\int_{0}^{a} f(x)\,dx}{\int_{0}^{10} f(x)\,dx} = \int_{0}^{a} f(x)\,dx$$
Note that the whole area under the curve represents the total sum of all the probabilities, which is 1.
(a) $P(0<X<4) = \int_{0}^{4}(0.2-0.02x)\,dx = 0.64$
(b) $P(2<X<6) = \int_{2}^{6}(0.2-0.02x)\,dx = 0.48$
All the statistics discussed for discrete random variables also apply to continuous probability distributions. It is important to realise that this time we are dealing with infinitely many values of X within a given range. Wherever there is a sum in the discrete case, there is a sum over infinitely many values in the continuous case.
In turn this leads us to evaluate a definite integral over the specified limits. The similarity of the formulae for discrete and continuous random variables is shown in the following table.
Table 15.9
S/N 1. Discrete: $\sum p_i = 1$. Continuous: $\int_{\text{all } x} f(x)\,dx = 1$
2. Discrete: $E(X) = \sum_{i=1}^{n} x_i\,p(x_i)$. Continuous: $E(X) = \int_{-\infty}^{\infty} x f(x)\,dx$
3. Discrete: $E(g(X)) = \sum g(x_i)\,P(x_i)$. Continuous: $E(g(X)) = \int_{-\infty}^{\infty} g(x) f(x)\,dx$
4. Discrete: $V(X) = \sum x_i^2 p_i - \mu^2 = E(X^2)-\mu^2$. Continuous: $V(X) = \int_{-\infty}^{\infty} x^2 f(x)\,dx - \mu^2 = E(X^2)-\mu^2$
5. Discrete: E(a) = a. Continuous: E(a) = a
6. Discrete: E(aX) = aE(X). Continuous: E(aX) = aE(X)
7. Discrete: E(aX + b) = aE(X) + b. Continuous: E(aX + b) = aE(X) + b
8. Discrete: Var(aX + b) = a²Var(X). Continuous: Var(aX + b) = a²Var(X)
9. Discrete: $\mu_r = \sum (x_i-\mu)^r P(x_i) = E(X-\mu)^r$. Continuous: $\mu_r = \int_{-\infty}^{\infty} (x-\mu)^r f(x)\,dx = E(X-\mu)^r$
10. Discrete: $\mu'_r = \sum (x_i-a)^r P(x_i) = E(X-a)^r$. Continuous: $\mu'_r = \int_{-\infty}^{\infty} (x-a)^r f(x)\,dx = E(X-a)^r$
Exercise In the example given above find the following: (i) Mean (ii) Var(X) (iii) Mode (iv) The coefficient of skewness (v) Median
15.5.2 Bi-variate Continuous Probability Distributions
If X and Y are two continuous random variables we can consider their joint occurrence just as we did in the discrete case. So z = f(x, y) will denote the joint p.d.f of X and Y. The following statements are still valid under a continuous probability distribution.
1. E(X ± Y) = E(X) ± E(Y)
2. E(XY) = E(X) x E(Y) if X and Y are independent
3. Var(X ± Y) = V(X) + Var(Y) ± 2Cov(X, Y)
4. Var(X ± Y) = V(X) + Var(Y) if X and Y are independent
Example The joint p.d.f of X and Y is given as $f(x,y) = \dfrac{3(2xy^2+y^2)}{80}$ for 1 < x < 3 and 0 < y < 2, whereas the marginal densities are respectively $f(x) = \dfrac{2x+1}{10}$ and $f(y) = \dfrac{3y^2}{8}$. Find (i) E(X) (ii) E(Y) (iii) E(XY) and verify whether X and Y are independent or not.
Sol:
(i) $E(X) = \int_{1}^{3} x f(x)\,dx = \int_{1}^{3} \dfrac{x(2x+1)}{10}\,dx = 2.13$
(ii) $E(Y) = \int_{0}^{2} y f(y)\,dy = \int_{0}^{2} \dfrac{y(3y^2)}{8}\,dy = 1.5$
(iii) $$E(XY) = \int_{1}^{3}\int_{0}^{2} xy f(x,y)\,dy\,dx = \int_{1}^{3}\int_{0}^{2} \frac{3(2x^2y^3+xy^3)}{80}\,dy\,dx = \frac{3}{80}\Big[2\int_{1}^{3}x^2\,dx\int_{0}^{2}y^3\,dy+\int_{1}^{3}x\,dx\int_{0}^{2}y^3\,dy\Big] = 3.2$$
Since E(XY) = 3.2 = 2.13 x 1.5 = E(X) x E(Y), and moreover f(x, y) = f(x) f(y), X and Y are independent continuous random variables.
15.5.3 Marginal Probability Density Functions for a Continuous Random Variable
Similar to the discrete case, we can obtain the p.d.f of one variable from the joint probability distribution of X and Y by fixing one of the variables. So the p.d.f for X is given by $f(x) = \int_{-\infty}^{\infty} f(x,y)\,dy$, and vice versa. In the example given, we can derive the marginal densities of X and Y in the following ways:
$$f(x) = \int_{0}^{2} f(x,y)\,dy = \frac{3}{80}\Big[2x\int_{0}^{2}y^2\,dy+\int_{0}^{2}y^2\,dy\Big] = \frac{2x+1}{10} \quad \text{for } 1<x<3$$
$$f(y) = \int_{1}^{3} f(x,y)\,dx = \frac{3}{80}\Big[2y^2\int_{1}^{3}x\,dx+y^2\int_{1}^{3}dx\Big] = \frac{3y^2}{8} \quad \text{for } 0<y<2$$
15.6 The Moment Generating Function
The moment generating function (m.g.f) of a probability distribution is defined as $E(e^{tx})$. Accordingly, for a discrete random variable X the m.g.f is given by $M_x(t) = E(e^{tx}) = \sum_{\text{all } x} e^{tx} p(x)$, while for a continuous random variable the m.g.f is given by $M_x(t) = E(e^{tx}) = \int_{-\infty}^{\infty} e^{tx} f(x)\,dx$.
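The definition above can be tried numerically. The sketch below is an illustration only: the helper names `mgf`, `first_moment` and `second_moment` and the step size `h` are assumptions, and it relies on the standard property that the rth moment about the origin equals the rth derivative of the m.g.f at t = 0, approximated here by finite differences. It uses the distribution of Table 15.2, for which E(X) = 2.2.

```python
import math

def mgf(dist, t):
    """M_x(t) = E(e^{tX}) = sum of e^{t x} * p(x) for a discrete distribution."""
    return sum(math.exp(t * x) * p for x, p in dist.items())

def first_moment(dist, h=1e-3):
    """Approximate M'(0) = E(X) by a central difference."""
    return (mgf(dist, h) - mgf(dist, -h)) / (2 * h)

def second_moment(dist, h=1e-3):
    """Approximate M''(0) = E(X^2) by a second central difference."""
    return (mgf(dist, h) - 2 * mgf(dist, 0.0) + mgf(dist, -h)) / h ** 2

table_15_2 = {1: 0.3, 2: 0.2, 3: 0.5}

mean = first_moment(table_15_2)                      # close to E(X) = 2.2
variance = second_moment(table_15_2) - mean ** 2     # close to 5.6 - 2.2**2 = 0.76
```

Finite differences are used only to keep the sketch dependency-free; differentiating the m.g.f by hand, as the exercises at the end of this chapter ask, gives the exact moments.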
The moment generating function, as the name suggests, is a very useful tool for generating the moments of most probability distributions. If an m.g.f exists, it is unique for that particular distribution. We can see the use of the m.g.f by expanding $E(e^{tx})$ using the Maclaurin series expansion for $e^x$. We know that
$$E(e^{tx}) = E\Big(1+tx+\frac{t^2x^2}{2!}+\frac{t^3x^3}{3!}+\dots\Big) = 1+tE(x)+\frac{t^2E(x^2)}{2!}+\frac{t^3E(x^3)}{3!}+\dots = 1+t\mu'_1+\frac{t^2\mu'_2}{2!}+\frac{t^3\mu'_3}{3!}+\dots$$
As the expansion suggests, the moments about the origin appear as the coefficients of $t^r/r!$ in the infinite series. In each case, the 1st, 2nd, 3rd, …, rth moments about the origin can easily be obtained by substituting t = 0 in the rth derivative of the m.g.f, i.e.
$$\mu'_r = \left.\frac{d^r M_x(t)}{dt^r}\right|_{t=0}$$
Exercise 15
1. The discrete random variable X has the p.d.f shown below.
x: 1 2 3 4 5
P(X=x): 0.2 0.25 0.4 a 0.05
Find (i) the value of a (ii) P(1 ≤ X ≤ 3) (iii) P(2 ≤ X < 5) (iv) E(X)
2. The p.d.f of a discrete random variable X is given by P(X=x) = kx for x = 12, 13, 14. Find the value of the constant k.
3. A committee of 3 is to be chosen from 4 girls and 7 boys. Find the expected number of girls on the committee, if the members of the committee are chosen at random.
4. The discrete random variable X has p.d.f given by P(X=x) = kx for x = 1, 2, 3, 4, 5, where k is a constant. Find E(X).
5. Independent random variables X and Y are such that E(X) = 4, E(Y) = 5, Var(X) = 1, Var(Y) = 2. Find (a) E(X+Y) (b) E(5X+6Y) (c) Var(X+Y) (d) Var(3X+2Y) (e) Var(3X-5Y)
9. Independent random variables X and Y have the following probability distributions.
x: 0 1
P(X=x): 1/4 3/4
y: 0 1
P(Y=y): 2/3 1/3
Find (i) E(X) (ii) E(Y) (iii) E(XY) and verify whether X and Y are independent (iv) Var(X-Y)
6. Two independent random variables X and Y are such that E(X) = 3 and E(X²) = 12, E(Y) = 4 and E(Y²) = 18. Find the values of (a) E(3X-2Y) (b) Var(2X+Y) (c) Var(2X-Y)
7. X has a p.d.f given by P(X=x) = kx for x = 1, 2, 3, 4. Find (a) k (b) E(X) (c) Var(X) (d) Var(3X)
8.
Two independent random variables X and Y are such that $E(X^2) = 14$, $E(Y^2) = 20$, Var(X) = 10, Var(Y) = 11. Find the values of (a) E(3X - 2Y) (b) Var(2X + Y) (c) Var(5X + 2Y)

9. The p.d.f. is such that P(X = x) = kx for x = 0, 1, 2, 3, ..., n. Find the value of k.

10. Given that X is a discrete random variable with p.d.f. $P(X = x) = p^x$ for $x = 1, 2, \ldots, \infty$, where $p \neq 1$:
   (a) Find the value of the constant p
   (b) Find E(X)
   (c) Find the coefficients of skewness and kurtosis and comment on the nature of the probability distribution.

11. (i) Distinguish between a probability distribution and a frequency distribution.
   (ii) The p.d.f. of a discrete random variable X is given by P(X = x) = mx for x assuming the values -1, 4, -9, 16, -25, 36, ..., 10,000. Find the value of the constant m.

12. The p.d.f. of a discrete random variable X is given as $P(X = x) = K\,\frac{2^x\,[{}^{40}P_x]}{x!}$ for x = 0, 1, 2, ..., 40, where K is a constant. Find the value of the constant K.

13. A box contains two red and two white balls. Balls are drawn at random without replacement. Let X be the random variable denoting the number of white balls chosen before the first red. Find E(X).

14. The p.d.f. of a discrete random variable X is given as P(X = x) = Kx for x = 1, 3, 6, 10, 15, ..., 465.
   (i) Find the value of the constant K
   (ii) Find P(1 < X < 465)

15. The p.d.f. of a discrete random variable X is given as $P(X = x) = K\left(\frac{1}{x^2 + x}\right)$ for x = 1, 2, 3, ..., 100. Find $P(1 \le X \le 50)$.

17. The p.d.f. of a discrete random variable X is given as P(X = x) = Kx for x = 1, 7, 18, 34, ..., 970.
   (i) Find the value of the constant K
   (ii) Find P(X = 34)

18. The joint probability distribution for the variables X and Y is given as $f(x, y) = \frac{xy + x^2y^2}{480}$ for x = 1, 2, 3, 4 and y = 1, 2, 3.
   (i) Determine the marginal density functions for the variables X and Y.
   (ii) Find P(X + Y > 3) and P(X - Y is an even number)
   (iii) Find Cov(X, Y) and comment on the nature of X and Y
   (iv) Find Var(X) and Var(Y)
   (v) Determine the correlation coefficient between X and Y

19. The joint p.d.f. of continuous random variables X and Y is given as $f(x, y) = e^{-(x + y)}$ for x > 0 and y > 0. Find the marginal density functions of X and Y and hence, or otherwise, tell whether X and Y are independent or not.

20. Given $f(x, y) = M \cdot 7xy^2$ with 0 < x < 1 and 1 < y < 2 as the joint probability function for the variables X and Y, find
   (i) P(0.2 < x < 1)
   (ii) The mean and mode of f(x) and f(y)
   (iii) The mode of f(x, y)
   (iv) P(x + y = 1)*
   (v) P(x + y = 0.2)*

21. Define a continuous random variable. The following is the p.d.f. of a continuous random variable X: $f(x) = Kx^2 + 1$ for 0 < x < 2, and f(x) = 0 otherwise. Find the value of the constant K and hence the mean and median of the distribution.

22. A discrete random variable X has its p.d.f. given as $P(X = x) = {}^{n}C_{x}\,p^x q^{n-x}$ for x = 0, 1, 2, ..., n, where p + q = 1. Using the concept of a moment generating function, find its mean and variance.

23. A discrete random variable X has its p.d.f. given as $P(X = x) = q^x p$ for x = 0, 1, 2, ..., where p + q = 1. Using the concept of a moment generating function, find its mean and variance.

24. A discrete random variable X has its p.d.f. given as $P(X = x) = \frac{\lambda^x e^{-\lambda}}{x!}$ for $x = 0, 1, 2, \ldots, \infty$. Using the concept of a moment generating function, find its mean and variance.

25. A continuous random variable Z has its p.d.f. $f(z) = \frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}z^2}$ for $-\infty < z < \infty$.
   a. Using the moment generating function or otherwise, show that its mean is 0 and its variance is 1.
   b. Find the mode and median and comment on the nature of its distribution.

26. A continuous random variable X has its probability distribution given as $f(x) = \frac{1}{b - a}$ for $a \le x \le b$, and f(x) = 0 otherwise. Determine its mean, variance and median.

CHAPTER 16: SOME COMMON PROBABILITY DISTRIBUTIONS

16.1 Introduction
There are common probability distributions applicable in our daily life.
The probability distributions presented in this chapter are the binomial distribution, the Poisson distribution, the geometric distribution, the hypergeometric distribution, the uniform distribution and the normal distribution.

16.2 Binomial Distribution
A binomial distribution is a discrete probability distribution which arises in an experiment having the following features:
- The experiment must have exactly two possible outcomes, i.e. either success or failure
- The number of trials for that particular experiment must be fixed
- The probability of success p and that of failure q must be the same in every trial
- The trials must all be independent

Examples of binomial experiments, which lead to a binomial distribution:
(i) A fair coin is tossed three times and the number of heads that occur is considered
(ii) A die is thrown ten times and the number of times a six is obtained is considered

16.2.1 The probability density function of a binomial distribution
Suppose we have tossed a coin n times, and our interest is in the number x of heads that may appear. In other words, we want to have n - x tails out of the n trials. The x heads and n - x tails may appear in different ways in terms of the positions they occupy. For instance you may have TTTHHH…T or THTHTH…T, etc. Using our knowledge of permutations we can find the number of different arrangements of n objects of which x are alike and the remaining n - x are also alike. This is $\frac{n!}{x!\,(n - x)!} = {}^{n}C_{x}$. Since the trials are independent, the chance of any one such arrangement occurring is $p^x q^{n-x}$. Hence, the probability of obtaining x heads is ${}^{n}C_{x}\,p^x q^{n-x}$. The p.d.f. of a binomial distribution is thus established as

$P(X = x) = {}^{n}C_{x}\,p^x q^{n-x}$ for x = 0, 1, 2, ..., n, and 0 otherwise.

Example 1
The probability that an animal is properly fed in a certain village is 0.6.
Find the probability that in a randomly selected sample of 8 lambs there will be exactly 3 that are well fed.
Sol: ${}^{8}C_{3}\,(0.6)^3 (0.4)^5 = 0.12$

In a binomial distribution E(X) = np and Var(X) = npq. The proofs of the two results are left as an exercise to the student.

Example 2
A coin is tossed 6 times. Find the expected number of heads and the variance.
Sol: E(X) = np = 6 x 1/2 = 3 and Var(X) = npq = 6 x 1/2 x 1/2 = 1.5

16.3 The Poisson distribution
It sometimes happens that a random variable under the binomial distribution is observed over a very large number of trials. Under such circumstances the limit of the binomial probabilities can be found as n approaches infinity. Usually we consider the mean occurrence (expected value of X) of such events within a specified interval of time or space. Events of that nature are said to follow the Poisson distribution. Examples are the number of telephone calls received within a specified time, say one hour, at a particular call centre, or the number of car accidents in a month at a particular road junction.

The p.d.f. of a Poisson distribution can be derived from the binomial distribution as follows. We have $P(X = x) = {}^{n}C_{x}\,p^x q^{n-x}$ and the mean value $\lambda = np$, implying that $p = \frac{\lambda}{n}$ and $q = 1 - \frac{\lambda}{n}$. Then, as $n \to \infty$,

$P(X = x) = \lim_{n \to \infty} \frac{n(n-1)\cdots(n-x+1)(n-x)!}{x!\,(n-x)!}\left(\frac{\lambda}{n}\right)^{x}\left(1 - \frac{\lambda}{n}\right)^{n-x}$

$= \frac{\lambda^x}{x!} \lim_{n \to \infty} \frac{n(n-1)\cdots(n-x+1)}{n \cdot n \cdots n}\left(1 - \frac{\lambda}{n}\right)^{n}\left(1 - \frac{\lambda}{n}\right)^{-x}$

$= \frac{\lambda^x}{x!} \lim_{n \to \infty} 1\left(1 - \frac{1}{n}\right)\cdots\left(1 - \frac{x-1}{n}\right)\left(1 - \frac{\lambda}{n}\right)^{-x} \cdot \lim_{t \to -\infty}\left[\left(1 + \frac{1}{t}\right)^{t}\right]^{-\lambda}$, where $t = -\frac{n}{\lambda}$

$= \frac{\lambda^x e^{-\lambda}}{x!}$ for $x = 0, 1, 2, \ldots, \infty$

where $\lambda$ is the mean of the distribution, also called the parameter of the distribution.
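The limiting argument above can be checked numerically. The following is a minimal sketch, not part of the original derivation: it fixes illustrative values $\lambda = 1.5$ and $x = 2$ (assumptions chosen here), sets $p = \lambda / n$, and compares the binomial probability with the Poisson probability as n grows. The function names are hypothetical helpers.

```python
import math

def binomial_pmf(x, n, p):
    """nCx * p^x * q^(n-x) with q = 1 - p."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

def poisson_pmf(x, lam):
    """lam^x * e^(-lam) / x!"""
    return lam**x * math.exp(-lam) / math.factorial(x)

lam, x = 1.5, 2          # illustrative values, not taken from the text
target = poisson_pmf(x, lam)

# Hold the mean np = lam fixed while the number of trials n increases.
errors = []
for n in (10, 100, 1000, 10000):
    approx = binomial_pmf(x, n, lam / n)
    errors.append(abs(approx - target))
    print(n, round(approx, 6), round(target, 6))
```

The absolute differences in `errors` shrink as n increases, which is what the limit in the derivation asserts.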
If X is distributed in this way, then we write $X \sim Po(\lambda)$.

Example 1
Given that X follows a Poisson distribution with parameter 1.5, find the probability of having X = 2 or 3.
Sol: $P(X = 2 \text{ or } 3) = P(X = 2) + P(X = 3) = \frac{1.5^2 e^{-1.5}}{2!} + \frac{1.5^3 e^{-1.5}}{3!} = 0.25 + 0.13 = 0.38$

Example 2
A coin is tossed 100 times. Use the Poisson distribution to approximate the probability of having 10 heads.
Sol: Since the number of trials is large, the binomial probabilities can be approximated by the Poisson distribution. (The approximation is normally recommended when n is large and p is small; it is used here only to illustrate the calculation.) We know $\lambda = np = 100 \times 1/2 = 50$. Hence the p.d.f. is given by $P(X = x) = \frac{50^x e^{-50}}{x!}$, and $P(X = 10) = \frac{50^{10} e^{-50}}{10!} = 5.2 \times 10^{-12}$.

16.4 Geometric Distribution
A geometric distribution arises when we consider the number of failures preceding the 1st success in a sequence of independent trials with a fixed probability p of success in each trial. So if the random variable X denotes the number of failures before the 1st success, the p.d.f. of X is $p(x) = q^x p$ for x = 0, 1, 2, ..., where p + q = 1.

Exercise
a. Show that the given formula is indeed a p.d.f.
b. Establish the mean and variance of a geometric distribution

Example
In a certain game, A throws a die with a view to having the number 2 appear. The exercise continues until the die shows the number 2. Find
(i) The chance that exactly 3 failures occur before the number 2 first appears
(ii) The expected number of failures before the 1st success
Sol:
(i) $P(X = 3) = q^3 p = (5/6)^3 (1/6) = 125/1296 \approx 0.1$
(ii) It can be shown from the given exercise that for a geometric distribution E(X) = q/p. Hence the expected number of failures is q/p = (5/6)/(1/6) = 5.

16.5 Hypergeometric Distribution
We have met examples of probability problems where sampling is made without replacement from a population of (a + b) observations, of which a are of one type and the other b are of another type. If our interest is in the number x of items of category a included in a sample of size n, then x follows a hypergeometric distribution.
An example of such a distribution is the variable X denoting the number of female students included in a sample of 9 students selected from a class consisting of 10 female and 8 male students.

The p.d.f. of a hypergeometric distribution is given by

$f(x) = \frac{\binom{a}{x}\binom{b}{n-x}}{\binom{a+b}{n}}$ for x = 0, 1, 2, ..., n, and 0 otherwise.

The mean and variance of a hypergeometric distribution are given as

$E(X) = \frac{na}{a + b}$ and $\mathrm{Var}(X) = \frac{nab}{(a + b)^2}\left(\frac{a + b - n}{a + b - 1}\right)$

So in the given distribution the expected number of girls in the sample is $\frac{9 \times 10}{10 + 8} = 5$ girls, while the expected number of males is $\frac{9 \times 8}{10 + 8} = 4$ males. The variance of the distribution is $\frac{9 \times 10 \times 8}{(10 + 8)^2}\left(\frac{10 + 8 - 9}{10 + 8 - 1}\right) = 1.2$.

16.6 The Uniform Distribution
If it is known that the probability for any value of x to occur is constant, then we have what is known as a uniform distribution. For a continuous random variable with $a \le x \le b$ the uniform distribution is given as

$f(x) = \frac{1}{b - a}$ for $a \le x \le b$, and 0 otherwise.

Proof: We know that f(x) = k for $a \le x \le b$, where k is a constant, and 0 otherwise. By the requirement that the total probability is 1,

$\int_{a}^{b} f(x)\,dx = 1 \Rightarrow \int_{a}^{b} k\,dx = 1 \Rightarrow k(b - a) = 1 \Rightarrow k = \frac{1}{b - a}$

Similarly, for a discrete random variable the uniform distribution is given as $f(x) = \frac{1}{N}$ for each possible value x with $a \le x \le b$, and 0 otherwise, where N is the number of possible values (N = b - a + 1 when X takes every integer from a to b).

16.7 The Normal Distribution
The normal distribution is an example of a continuous probability distribution. It is given as

$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{1}{2\sigma^2}(x - \mu)^2}$ for $-\infty < x < \infty$

where $\mu$ is the mean of the distribution and $\sigma^2$ is the variance of the distribution. When X is distributed in this way we write $X \sim N(\mu, \sigma^2)$.

16.7.1 Main features of the Normal distribution and the test for normality
1. It is bell shaped and symmetrical about $x = \mu$. This means the coefficient of skewness is zero.
2. Its frequency curve is a mesokurtic curve, with the coefficient of (excess) kurtosis being zero.
The maximum value of f(x) occurs when $x = \mu$ and is given as $f(\mu) = \frac{1}{\sigma\sqrt{2\pi}}$.
3. Approximately 95% of the distribution lies within 2 standard deviations of the mean, whereas 99.7% of the distribution lies within 3 standard deviations.

[Figure: sketch of the normal curve]

16.7.2 The standard normal distribution and reading from the normal distribution
When $\mu = 0$ and $\sigma^2 = 1$, the p.d.f. of a normal distribution becomes $f(z) = \frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}z^2}$. This kind of normal distribution is called a standard normal distribution, and we write $Z \sim N(0, 1)$. Consider, say, P(Z < 1.8). To find P(Z < 1.8) we would need to evaluate $\int_{-\infty}^{1.8} \frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}z^2}\,dz$, which is possible but very tedious. For that reason statisticians have taken the trouble to tabulate such definite integrals to make it easy to find probabilities under the standard normal curve. Using the standard normal table, P(Z < 1.8) is easily found to be 0.9641.

16.7.3 Using the standard normal tables for any normal distribution
In order to use the standard normal tables for any normal distribution, we standardise the given random variable, say X. So if $X \sim N(\mu, \sigma^2)$ then $Z = \frac{X - \mu}{\sigma}$. Suppose a random variable X denotes the examination score in mathematics of a student at a certain college, where it has been found that the score is normally distributed with mean 70 and variance 64. What is the chance of a student scoring between 60 and 90? In order to use the standard normal tables we must standardise the variable X into a standard variate Z. This can be done as follows:

$P(60 < X < 90) = P\left(\frac{60 - 70}{8} < \frac{X - \mu}{\sigma} < \frac{90 - 70}{8}\right) = P(-1.25 < Z < 2.5)$.
This means that finding P(60 < X < 90) under the given distribution is the same as finding P(-1.25 < Z < 2.5) under the standard normal distribution. So P(60 < X < 90) = P(-1.25 < Z < 2.5) = 0.9938 - 0.1056 = 0.8882.

16.7.4 The bivariate normal distribution
It is worth mentioning at this point the distribution of the sum and difference of two independent normally distributed random variables X and Y, as we shall frequently encounter it in inferential statistics.
Let X and Y be any two independent normally distributed random variables with means $\mu_x$, $\mu_y$ and variances $\sigma_x^2$ and $\sigma_y^2$ respectively. Then the following is true:
1. V = X + Y is also normally distributed with mean $\mu_x + \mu_y$ and variance $\sigma_x^2 + \sigma_y^2$
2. V = X - Y is also normally distributed with mean $\mu_x - \mu_y$ and variance $\sigma_x^2 + \sigma_y^2$

Exercise 16

1. An unbiased coin is tossed 4 times. Find the probability of having 3 heads.

2. Assume that two crossed animals of type A1 and A2 are equally likely to produce either an animal similar to A1 or one similar to A2. Find the probability that in a group of five calves there will be more animals of type A1.

3. State the conditions under which the binomial distribution occurs.

4. Of Sokoine University students, 60% have their ages between 20 and 26 years. From a sample of 10 students chosen at random, find the probability that (a) only 3 have ages between 20 and 26 (b) more than 8 have ages between 20 and 26

5. X is a r.v. such that $X \sim Bin(n, p)$. Given that E(X) = 2.4 and p = 0.3, find the standard deviation of the distribution.

6. Show that if $P(X = x) = {}^{n}C_{x}\,p^x q^{n-x}$ for x = 0, 1, 2, ..., n, then E(X) = np.

7. The probability that a target is hit is 0.3. Find the least number of shots which should be fired if the probability that the target is hit at least once is to be greater than 0.95.

8. There are ten multiple-choice questions. The probability of guessing a correct answer is 1/3. Find the probability of having (a) 4 answers correct (b) all answers correct

9.
Of the articles from a certain production line, 10% are defective. If a sample of 25 articles is taken, find the expected number of defective articles and the standard deviation.

10. The mean number of bacteria per millilitre of a liquid is known to be 4. Assuming that the number of bacteria follows a Poisson distribution, find the probability that in 1 ml of liquid there will be (a) no bacteria (b) 4 bacteria (c) less than 3 bacteria

11. The random variable X follows a Poisson distribution with standard deviation 2. Find $P(X \le 3)$.

12. If $Z \sim N(0, 1)$ find (a) P(0.829 < Z < 1.843) (b) P(-2.05 < Z < 0) (c) P(Z < 1.78) (d) P(Z > 2.326)

13. If $Z \sim N(0, 1)$ find y if (a) P(Z < y) = 0.506 (b) P(Z < y) = 0.891 (c) P(Z > y) = 0.001 (d) P(Z > y) = 0.00122

14. If $X \sim N(100, 80)$, find (a) P(85 < X < 112) (b) P(105 < X < 115) (c) $P(|X - 100| < \sqrt{80})$

15. The heights of boys at a particular age follow a normal distribution with mean 150.3 cm and standard deviation 5 cm. Find the probability that a boy picked at random from this age group has height (a) less than 153 cm (b) less than 148 cm (c) more than 158 cm (d) between 147 cm and 149.5 cm.

16. The marks in an examination were normally distributed with mean $\mu$ and standard deviation $\sigma$. 10% of the candidates had more than 75 marks and 20% had less than 40 marks. Find the values of $\mu$ and $\sigma$.

17. The lengths of rods produced in a workshop follow a normal distribution with mean $\mu$ and variance 4. 10% of the rods are less than 17.4 cm long. Find the probability that a rod chosen at random will be between 18 and 23 cm long.

18. If $X \sim N(\mu, \sigma^2)$ and P(X < 35) = 0.2, P(35 < X < 45) = 0.65, find $\mu$ and $\sigma$.

19. The lengths of certain items follow a normal distribution with mean $\mu$ cm and standard deviation 6 cm. It is known that 4.78% of the items have a length greater than 82 cm. Find the value of the mean $\mu$.

20. A discrete random variable X follows the uniform distribution with values 2, 6, 12, 20, ..., 600. Find P(12 < X < 462).

21.
A discrete random variable X follows a uniform distribution with a < X < b. Establish the mean and variance of the distribution.

22. Show that the mean and variance of the hypergeometric distribution are given by $E(X) = \frac{na}{a + b}$ and $\mathrm{Var}(X) = \frac{nab}{(a + b)^2}\left(\frac{a + b - n}{a + b - 1}\right)$.

23. In a certain college a student is allowed to sit for an exam any number of times until he passes. If the chance that a student passes the exam is 0.4, how many times do you expect a student to sit for such an exam?

24. Six students each sit for an examination 8 times. The chance that a student passes an examination is 0.1. What is the chance of having 4 students passing at least 3 examinations?

25. In a certain college, the chance that a female student passes an exam is 0.4, while the chance that a male student passes is 0.7. Find the chance of having 4 students passing an exam in a class of 10 male and 8 female students sitting for the examination.

26. In question 25 above, what is the chance of having the same number of students of each sex passing the examination?

27. Show that the variance of a Poisson distribution is the same as the mean. A random variable X follows the Poisson distribution with parameter 7. Using the normal distribution, approximate P(X > 7).

28. By considering some properties of two independent random variables, establish the p.d.f. of the sum of two independently distributed Poisson variates with means $\lambda_1$ and $\lambda_2$ respectively.

29. Two faces of a die are painted yellow, two green and two blue. The die is thrown ten times. What is the chance of having 3 yellow faces, 2 green faces and 5 blue faces in the whole exercise? [Hint: This is an example of a multinomial distribution]

30. A random variable $X \sim N(100, 80)$ and a random variable $Y \sim N(62.5, 50)$. Find the chance that the difference between X and Y is not larger than 20.
31. By considering the properties of independent random variables, establish the p.d.f. of the sum of two independent binomial variates with parameters $(n_1, p)$ and $(n_2, p)$ respectively.

32. Of SUA students, 60% have their ages between 20 yrs and 26 yrs, 30% between 26 yrs and 30 yrs, and 10% above 30 yrs. A random sample of 10 students is selected. Find the probability that 2, 5 and 3 students respectively in the categories mentioned will be selected.

CHAPTER 17: SAMPLING DISTRIBUTIONS

17.1 Random sampling
We discussed measures of central tendency and variability in a population when we learnt about frequency distributions and probability distributions. We have these measures in a sample as well. When these measures are computed from sample observations, we refer to them as statistics. In general, a statistic is any characteristic which is derived from a sample. If a sample of 10 students taking introductory statistics is selected and their scores obtained, their average score is a statistic. Similarly the mode, median and variance of their scores are all statistics. When the same measures describe the distribution of a population rather than a sample, they are known as parameters. Statistics are meant to represent the unknown parameters of a population.

Since we can take more than one sample of items from a population, it follows that sample statistics also have a certain distribution, with a specific mean and variance. If we consider the sample means from the different possible samples that can be taken from a certain population, then we are thinking of the sampling distribution of the mean. As a matter of fact we can find sampling distributions for all the statistics, i.e. mode, mean, median, variance, etc. However, for the purpose of our study we shall centre our discussion on the sampling distribution of the mean in this section and of the variance in section 20.
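To make the idea of a sampling distribution of the mean concrete, the sketch below enumerates every ordered sample of size 2 drawn independently from a small population distribution and builds the exact distribution of the sample mean. The population table and the helper name `mean_var` are hypothetical, chosen only for illustration. It also checks the standard result that, for independent draws, the mean of the sample means equals the population mean and their variance equals the population variance divided by the sample size.

```python
from collections import defaultdict
from itertools import product

# Hypothetical population distribution, chosen only for illustration.
population = {1: 0.5, 2: 0.3, 3: 0.2}
n = 2  # sample size

def mean_var(dist):
    """Mean and variance of a discrete distribution given as {value: prob}."""
    mu = sum(x * p for x, p in dist.items())
    var = sum((x - mu) ** 2 * p for x, p in dist.items())
    return mu, var

# Enumerate every ordered sample of size n (independent draws) and
# accumulate the probability attached to each value of the sample mean.
sampling_dist = defaultdict(float)
for sample in product(population, repeat=n):
    prob = 1.0
    for x in sample:
        prob *= population[x]
    sampling_dist[sum(sample) / n] += prob

mu, sigma2 = mean_var(population)
mu_bar, var_bar = mean_var(sampling_dist)
print(mu, sigma2)        # population mean and variance
print(mu_bar, var_bar)   # mean and variance of the sample mean
```

Exact enumeration is only practical for very small populations and sample sizes; for anything larger one would simulate repeated samples instead.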
Sampling can be done with or without replacement, from a finite or an infinite population. However, in practice sampling is done either with replacement from a finite population or without replacement from an infinite population. The two procedures constitute what is known in practice as random sampling.

17.1.1 Sampling with replacement
If $X_1, X_2, \ldots, X_n$ is a random sample of n independent observations from a population with mean $\mu$ and variance $\sigma^2$, then $E(\bar{x}) = \mu$ and $\mathrm{Var}(\bar{x}) = \frac{\sigma^2}{n}$ if sampling is done with replacement. We can demonstrate the validity of these statements with the following example. Suppose a discrete random variable X has the probability distribution P(X = x) shown in the table below.

Table 17.1
   x          0      1
   P(X = x)   0.8    0.2

(i) Show that $\mu = 0.2$ and $\mathrm{Var}(X) = \sigma^2 = 0.16$
(ii) By considering all possible samples of size 2, find the probability distribution of the mean of such samples. Verify that $E(\bar{x}) = \mu$ and $\mathrm{Var}(\bar{x}) = \frac{\sigma^2}{2}$

Sol:
(i) $\mu = E(X) = 0 \times 0.8 + 1 \times 0.2 = 0.2$ and $\mathrm{Var}(X) = 0^2 \times 0.8 + 1^2 \times 0.2 - 0.2^2 = 0.16$
(ii) The possible samples are

Table 17.2
   S/N   Sample   Sample mean   Probability
   1     (0,0)    0             0.8 x 0.8 = 0.64
   2     (0,1)    0.5           0.8 x 0.2 = 0.16
   3     (1,0)    0.5           0.2 x 0.8 = 0.16
   4     (1,1)    1             0.2 x 0.2 = 0.04

The probability distribution for $\bar{X}$ is therefore

Table 17.3
   $\bar{x}$              0      0.5    1
   $P(\bar{X} = \bar{x})$   0.64   0.32   0.04

$E(\bar{X}) = 0 \times 0.64 + 0.5 \times 0.32 + 1 \times 0.04 = 0.2$, which is the same as $\mu$, the population mean.
$\mathrm{Var}(\bar{X}) = E(\bar{X}^2) - (E(\bar{X}))^2$. But $E(\bar{X}^2) = 0^2 \times 0.64 + 0.5^2 \times 0.32 + 1^2 \times 0.04 = 0.12$.
Therefore $\mathrm{Var}(\bar{X}) = 0.12 - 0.2^2 = 0.08 = 0.16/2$, which is the same as $\frac{\sigma^2}{n}$, the population variance divided by the sample size.

Exercise
Given the following distribution

Table 17.4
   x          0      1      2      3
   P(X = x)   0.2    0.1    0.5    0.2

(a) Find the mean $\mu$ and the variance $\sigma^2$.
(b) By taking all possible samples of size 3 verify that $E(\bar{x}) = \mu$ and $\mathrm{Var}(\bar{x}) = \frac{\sigma^2}{n}$

17.1.2 Sampling without replacement
If sampling is done without replacement then the following is true: $E(\bar{x}) = \mu$ and Var
$(\bar{x}) = \frac{(N - n)}{(N - 1)} \cdot \frac{\sigma^2}{n} = \left(1 - \frac{n}{N}\right)\frac{S^2}{n}$, where N is the population size and $S^2 = \frac{N\sigma^2}{N - 1}$. However, when N is large, meaning that the population considered is effectively an infinite one, the formula for the variance of the sample mean becomes the same as in the case of sampling with replacement. Hence $\mathrm{Var}(\bar{x}) = \frac{\sigma^2}{n}$.

17.2 The sampling distribution of the mean I (Normal distribution)
Theorem: "If $\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_N$ are the means of random samples of size n taken from a normal distribution where $X \sim N(\mu, \sigma^2)$, then the distribution of $\bar{x}$ is also normal with mean $\mu$ and variance $\sigma^2/n$."

The intuition behind the theorem is as follows. We know that $X \sim N(\mu, \sigma^2)$ and that $\bar{x}$ is a sum of n independent normal variates divided by n. From our knowledge of the bivariate/multivariate normal distribution, $\bar{x}$ should also follow a normal distribution with the stated mean and variance.

Example
A random sample of size 10 is taken from among students whose mean score in introductory statistics is 70 with a variance of 36. Find the probability that the sample mean is less than 67.
Sol: We are required to find $P(\bar{x} < 67)$. This cannot be done unless we know the distribution of the sample mean $\bar{x}$. But from the theorem stated above, we know that $\bar{x} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$. Accordingly,

$P(\bar{x} < 67) = P\left(\frac{\bar{x} - \mu}{\sqrt{\sigma^2 / n}} < \frac{67 - 70}{\sqrt{36 / 10}}\right) = P(Z < -1.58) = 0.0571$

Exercise
1. In the example above, what would be the chance that the sample mean does not exceed the population mean score by more than 5?
2. In a certain college the female score in mathematics is normally distributed with mean 64 and variance 16, while the male score in the subject is normally distributed with mean 70 and variance 16. A sample of 4 male students and 5 female students is taken. What is the chance that the four males will score better, on average, than the five females?
3. Show algebraically that "If $\bar{x}_1, \bar{x}_2, \ldots,$
$\bar{x}_N$ are the means of random samples of size n taken from a normal distribution where $X \sim N(\mu, \sigma^2)$, then the distribution of $\bar{x}$ is also normal with mean $\mu$ and variance $\sigma^2/n$."

17.2.1 The central limit theorem
If $\bar{x}_1, \bar{x}_2, \ldots$ are the means of random samples of size n taken from any distribution with mean $\mu$ and variance $\sigma^2$, then for large n the distribution of the sample mean $\bar{x}$ is approximately normal, with $\bar{x} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$. Note that in practice n is taken to be at least 30.

Example
A random sample of size 32 is taken from an unknown distribution whose mean is known to be 8 and variance 4. Find the probability that the sample mean exceeds 7.
Sol: We do not know whether the population follows a normal distribution or not. However, by the central limit theorem the distribution of $\bar{x}$ is still approximately normal with mean $\mu$ and variance $\frac{\sigma^2}{n}$, since the sample size taken is large.

$P(\bar{x} > 7) = P\left(\frac{\bar{x} - \mu}{\sqrt{\sigma^2 / n}} > \frac{7 - 8}{\sqrt{4 / 32}}\right) = P(Z > -2.828) = 0.5 + 0.4977 = 0.9977$

Exercise
A sample of size 30 was taken from a Poisson distribution with parameter $\lambda = 3$. Find the probability that the sample mean falls between 4 and 7. (Hint: consider 4.5 and 6.5)

17.3 The distribution of the sample mean II (The Student's t distribution)
The Student's t distribution is also one of the sampling distributions of the mean. It is a continuous probability distribution. The distribution arises when the population variance $\sigma^2$ is unknown. Recall that if $X \sim N(\mu, \sigma^2)$, then $\bar{x} \sim N(\mu, \sigma^2/n)$. If we only have the sample variance $s^2$, which is in practice the case, then the standardised quantity $\frac{\bar{x} - \mu}{\sqrt{s^2 / n}}$ follows the Student's t distribution. W. S. Gosset, writing under the pen name of Student, first introduced the distribution. It was later proved by R.A.
Fisher that the distribution is symmetrical and bell shaped but not normal, and is of the form

$f(t) = c\left(1 + \frac{t^2}{v}\right)^{-\left(\frac{v + 1}{2}\right)}$ for $-\infty < t < \infty$,

where the parameter v is called the number of degrees of freedom. Normally v = n - 1, where n is the size of the sample taken. So in terms of n the distribution may be written as $f(t) = c\left(1 + \frac{t^2}{n - 1}\right)^{-\frac{n}{2}}$.

Example 1
A sample of size 9 of the heights of students in a certain school was taken from a population with $N(1.5, \sigma^2)$. The sample variance was found to be 0.63. What is the chance that the sample mean was less than 2?
Sol: We are supposed to look for $P(\bar{x} < 2)$. Upon standardising we get

$P\left(\frac{\bar{x} - \mu}{\sqrt{s^2 / n}} < \frac{2 - 1.5}{\sqrt{0.63 / 9}}\right)$

Since $\sigma^2$ is unknown, the quantity $\frac{\bar{x} - \mu}{\sqrt{s^2 / n}}$ follows the Student's t distribution with n - 1 degrees of freedom. Accordingly $P(\bar{x} < 2) = P(t < 1.89)$. As with the normal distribution, this means evaluating the definite integral

$\int_{-\infty}^{1.89} f(t)\,dt = \int_{-\infty}^{1.89} c\left(1 + \frac{t^2}{8}\right)^{-\frac{9}{2}} dt$

However, statistical tables have been constructed for this purpose. At 8 degrees of freedom the upper tail area beyond 1.89 is approximately 0.05, so $P(t < 1.89) \approx 0.95$ (refer to the Student's t tables at the back of the book).

Example 2
The performance of students in a certain class was found to have a mean of 60. A sample of 5 students was selected. What is the chance that their mean score exceeded 55 if the variance of their scores was 100?
Sol: We are looking for $P(\bar{x} > 55)$. Upon standardising $\bar{x}$ we have

$P(\bar{x} > 55) = P\left(\frac{\bar{x} - \mu}{\sqrt{s^2 / n}} > \frac{55 - 60}{\sqrt{100 / 5}}\right)$, which is P(t > -1.118).

From the Student's t tables at 4 d.f. we have $P(t > -1.118) = 1 - P(t < -1.118) \approx 1 - 0.16 = 0.84$.

At this point the reason why we need the Student's t distribution should be clear. However, the central limit theorem also applies to the Student's t distribution.
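The table look-ups in the two examples can be cross-checked by integrating the t density numerically. The sketch below is an illustration under stated assumptions rather than the book's method: it takes the normalising constant to be the standard $c = \Gamma\left(\frac{v+1}{2}\right) / \left(\sqrt{v\pi}\,\Gamma\left(\frac{v}{2}\right)\right)$, uses Simpson's rule, and the function names `t_pdf` and `t_cdf` are hypothetical helpers. The inputs are the figures used in the worked solutions (0.63/9 with 8 d.f. and 100/5 with 4 d.f.).

```python
import math

def t_pdf(t, v):
    """Density of Student's t with v degrees of freedom."""
    c = math.gamma((v + 1) / 2) / (math.sqrt(v * math.pi) * math.gamma(v / 2))
    return c * (1 + t * t / v) ** (-(v + 1) / 2)

def t_cdf(x, v, steps=2000):
    """P(T < x) = 0.5 + integral of the density from 0 to x (Simpson's rule).

    steps must be even; the step h is negative when x < 0, so the
    integral is then subtracted from 0.5 automatically.
    """
    h = x / steps
    total = t_pdf(0.0, v) + t_pdf(x, v)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, v)
    return 0.5 + total * h / 3

# Example 1: n = 9, s^2 = 0.63, so 8 degrees of freedom.
p1 = t_cdf((2 - 1.5) / math.sqrt(0.63 / 9), 8)

# Example 2: n = 5, s^2 = 100, so 4 degrees of freedom.
p2 = 1 - t_cdf((55 - 60) / math.sqrt(100 / 5), 4)

print(round(p1, 3), round(p2, 3))
```

Splitting the integral at zero exploits the symmetry of the density, so only a finite range ever has to be integrated.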
That is, the distribution of the standardised sample mean under the Student's t distribution will eventually follow the normal distribution if n is large (greater than 30). In example 2 above, what would be the solution if a sample of size 30 was taken instead of size 5?
Since n is large, by the central limit theorem the standardised variable of $\bar{x}$ would follow a normal distribution even though the variance is not known. So we would have

$P\left(\frac{\bar{x} - \mu}{\sqrt{s^2 / n}} > \frac{55 - 60}{\sqrt{100 / 30}}\right) = P(Z > -2.739) = 0.9969$

Exercise 17

1. If $X \sim N(200, 80)$ and a random sample of size 5 is taken from the distribution, find the probability that the sample mean (a) is greater than 207 (b) lies between 201 and 209.

2. If $X \sim N(200, 100)$ and a random sample of size 10 is taken from the distribution, find the probability that the sample mean lies outside the range 198 to 205.

3. If $X \sim N(50, 12)$ and a random sample of size 12 is taken from the distribution, find the probability that the sample mean (a) is less than 48.5 (b) is less than 52.3 (c) lies between 50.7 and 51.7.

4. At a college, the masses of the male students are distributed approximately normally with mean mass 70 kg and standard deviation 5 kg. Four male students are chosen at random. Find the probability that their mean mass is less than 65 kg.

5. A normal distribution has a mean of 40 and a standard deviation of 4. If 25 items are drawn at random, find the probability that their mean is (a) 41.4 or more (b) between 38.7 and 40.7 (c) less than 39.5.

6. If a large number of samples of size n are taken from a population which follows a normal distribution with mean 74 and standard deviation 6, (a) find n if the probability that the sample mean exceeds 75 is 0.282, (b) find n if the probability that the sample mean is less than 70.4 is 0.00135.

7. A normal distribution has a mean of 30 and a variance of 5.
Find the probability that (a) the average of 10 observations exceeds 30.5, (b) the average of 40 observations exceeds 30.5, (c) the average of 100 observations exceeds 30.5. Find n such that the probability that the average of n observations exceeds 30.5 is less than 1%.

8. The r.v. X is such that $X \sim N(\mu, 4)$. A random sample of size n is taken from the population. Find the least n such that $P(|\bar{X} - \mu| < 0.5) > 0.95$.

9. $\bar{X}$ is the r.v. 'the sample mean of samples of size 15 taken from N(30, 18)' and $\bar{Y}$ is the r.v. 'the sample mean of samples of size 8 taken from N(20, 16)'. Find the distribution of (a) $\bar{X} - \bar{Y}$ (b) $\bar{X} + \bar{Y}$ (c) $\bar{Y} - \bar{X}$ (d) $5\bar{X} + 3\bar{Y}$ (e) $4\bar{X} - 2\bar{Y}$.

10. In a certain country the heights of men are normally distributed with mean 175 cm and standard deviation 5 cm, and the heights of women are normally distributed with mean 165 cm and standard deviation 6 cm. Find the probability that the mean height of three women chosen at random is greater than the mean height of four men chosen at random from the population.

11. A random sample $X_1, X_2$ is drawn from a distribution with mean $\mu$ and standard deviation $\sigma$. State the mean and standard deviation of the distribution of (a) $X_1 + X_2$ (b) $X_1 - X_2$ (c) $\bar{X}$.

A student's performance is equally good in two subjects. The marks she might be expected to score in each subject may be treated as independent observations drawn from a normal distribution with mean 45 and standard deviation 5. Two procedures might be used to decide whether to give the student an overall pass. One is to demand that she pass separately in each subject, the pass mark being 40; the other is to require that her mean mark in the two subjects exceed 40. Find the probability that the student will obtain an overall pass by each of these procedures.

12. In a certain nation, men have heights distributed normally with mean 1.70 m and standard deviation 10 cm.
Find the probability that the average height of three men chosen randomly is greater than 1.78 m, and the probability that all three will have heights greater than 1.83 m. In the same nation, women have heights distributed normally with mean 1.60 m and standard deviation 7.5 cm. Find the probability that a husband and wife have not more than 5 cm difference in heights, and state the assumptions that you have made in the calculation.

13. $X_1$ and $X_2$ are random variables such that $X_1$ is normally distributed with mean 120 and variance 8 and $X_2$ is normally distributed with mean 150 and variance 22. A random sample of size 20 is taken from the distribution of $3X_1 + 4X_2$. Find the distribution of the sample mean.

14. Random variables X and Y are such that $X \sim N(100, 10)$ and $Y \sim N(120, 20)$. Random samples of size 50 are taken from each distribution. Find the probability that the sample from the distribution of Y will have a mean which is at least 21 more than the mean of the sample from the distribution of X.

15. (a) If X and Y are independent random variables with means $\mu_x$, $\mu_y$ and variances $\sigma_x^2$, $\sigma_y^2$ respectively, show from first principles that the mean and variance of $aX + bY$ are $a\mu_x + b\mu_y$ and $a^2\sigma_x^2 + b^2\sigma_y^2$ respectively, where a and b are constants. (b) The diameters x of 110 steel rods were measured in centimetres and the results were summarised as follows: $\sum x = 36.5$, $\sum x^2 = 12.49$. Find the mean and standard deviation of these measurements. Assuming these measurements are a sample from a normal distribution with this mean and this variance, find the probability that the mean diameter of a sample of size 110 is greater than 0.245 cm.

16. A random sample of size 100 is taken from Bin(20, 0.6). Find the probability that (a) $\bar{X}$ is greater than 12.4, (b) $\bar{X}$ is less than 12.2, where $\bar{X}$ is the sample mean.

17. A random sample of size 30 is taken from Po(4). Find (a) $P(\bar{X} < 4.5)$, (b) $P(\bar{X} > 3.8)$, (c) $P(3.8 < \bar{X} < 4.5)$.

18.
If a large number of samples of size n are taken from Po(4.6) and approximately 2.5% of the sample means are less than 4.005, estimate n.

19. If a large number of samples of size n are taken from Po(2.9) and approximately 1% of the sample means are greater than 3.41, estimate n.

20. If a large number of samples of size n are taken from Bin(20, 0.2) and approximately 90% of the sample means are less than 4.354, estimate n.

21. The standard deviation of the masses of articles in a large population is 4.55 kg. If random samples of size 100 are drawn from the population, find the probability that a sample mean will differ from the true population mean by less than 0.8 kg.

22. Two red balls and two white balls are placed in a bag. Balls are drawn one by one, at random and without replacement. The random variable X is the number of white balls drawn before the first red ball is drawn.
(i) Show that $P(X = 1) = \frac{1}{3}$, and find the rest of the probability distribution of X.
(ii) Find E(X) and show that $Var(X) = \frac{5}{9}$.
(iii) The sample mean for 80 independent observations of X is denoted by $\bar{X}$. Using a suitable approximation, find $P(\bar{X} > 0.75)$.

23. The mass of coffee in a randomly chosen jar sold by a certain company may be taken to have a normal distribution with mean 203 g and standard deviation 2.5 g.
(i) Find the probability that a randomly chosen jar will contain at least 200 g of coffee.
(ii) Find the mass m such that only 3% of jars contain more than m grams of coffee.
(iii) Find the probability that two randomly chosen jars will together contain between 400 g and 405 g of coffee.
(iv) The random variable C denotes the mean mass (in grams) of coffee per jar in a random sample of 20 jars. Find the value of a such that $P(|C - 203| < a) = 0.95$.

CHAPTER 18: ESTIMATION OF POPULATION PARAMETERS

18.1 Introduction
If, from the observations in a sample, a single value is calculated as an estimate of an unknown population parameter, the procedure is known as point estimation.
But if a population parameter is estimated as lying within a range between two values, we refer to the procedure as interval estimation. Interval estimation is generally preferred for decision making in inferential statistics. Recall from our discussion of measures of central tendency that simply regarding an arithmetic mean as the representative of the entire population is much riskier than having two limits that allow for variation about the mean. This indicates the advantage of interval estimates over point estimates. However, the two types of estimation are closely related, since interval estimation first requires a point estimate of the population parameter. Our discussion in this chapter will be centred on two population parameters, the mean $\mu$ and the variance $\sigma^2$.

18.2 Point estimation
Following what has been discussed in the previous section, what then should be an estimator of a particular population parameter? How should it behave? For the population mean it is easy to say intuitively that its estimate should be the sample mean. The same reasoning suggests that the estimator of the population variance should be a statistic that is more or less like the sample variance. But intuition does not always work for other population parameters. Instead of relying on such intuitive knowledge, a more general procedure is employed to find an estimate of a given population parameter. There are several approaches, such as the method of moments, least squares estimation (which has already been discussed) and the maximum likelihood method. The treatment of these methods is beyond the scope of this presentation. However, a rigorous student is strongly advised to consult other references.
The methods mentioned above are meant to satisfy some of the important characteristics of a good estimator of a population parameter, one of which is that an estimator $\hat{\theta}$ of a population parameter $\theta$ should be unbiased, that is, $E(\hat{\theta}) = \theta$. From our discussion of sampling distributions we can clearly see that the sample mean is an unbiased estimator of the population mean, for we proved that $E(\bar{x}) = \mu$. Basically, what is meant is that the estimator $\hat{\theta}$ should equal the parameter $\theta$ on average over all possible samples.

But things are different for the sample variance: $E(s^2) \neq \sigma^2$. That is, the sample variance is not an unbiased estimator of the population variance $\sigma^2$. To obtain an unbiased estimator we need a slight adjustment, and usually we take $ns^2/(n-1)$ as an unbiased estimator of the population variance $\sigma^2$. However, when n is very large (preferably n > 30) the estimator $ns^2/(n-1)$ is approximately equal to $s^2$.

Exercise
1. Show algebraically that $E[ns^2/(n-1)] = \sigma^2$, where $s^2 = \frac{\sum (x - \bar{x})^2}{n}$.
2. Find the unbiased estimates of the population mean and variance from the following sample values: 7, 8, 9, 10, 11, 12, 15.

18.3 Interval Estimation of Population parameters

18.3.1 Introduction
As said earlier, it is sometimes preferable to give a range of values within which a population parameter may fall instead of a single value. Such an estimation of a population parameter is called interval estimation. Usually, an interval estimate is obtained after specifying a certain degree of confidence, called the confidence level. For instance, a 95% confidence interval for the population mean $\mu$ means limits a and b such that $P(a \le \mu \le b) = 95\% = 0.95$.

18.3.2 Confidence Interval for $\mu$

18.3.2.1 Confidence Interval for $\mu$ when population variance $\sigma^2$ is known
Let us consider a $100(1-\alpha)\%$ confidence interval for $\mu$. If $X \sim N(\mu, \sigma^2)$ then, for any n, $\bar{x} \sim N(\mu, \sigma^2/n)$. By standardization, we have

$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$, where $Z \sim N(0, 1)$.

We know that the central $100(1-\alpha)\%$ of N(0, 1) lies between $\pm z_{\alpha/2}$.

[Sketch]

Therefore
$-z_{\alpha/2} \le \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \le z_{\alpha/2}$
Multiplying throughout by $\sigma/\sqrt{n}$, we have
$-z_{\alpha/2}\,\sigma/\sqrt{n} \le (\bar{x} - \mu) \le z_{\alpha/2}\,\sigma/\sqrt{n}$
Multiplying throughout by -1 (which reverses the inequalities), we have
$-z_{\alpha/2}\,\sigma/\sqrt{n} \le (\mu - \bar{x}) \le z_{\alpha/2}\,\sigma/\sqrt{n}$
Adding $\bar{x}$ throughout, we have
$\bar{x} - z_{\alpha/2}\,\sigma/\sqrt{n} \le \mu \le \bar{x} + z_{\alpha/2}\,\sigma/\sqrt{n}$
This may finally be abbreviated as $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$.

Example
A sample of 19 students taking agriculture general was taken to assess the performance of the students in introductory statistics. The mean score of the students was found to be 30. Find a 95% confidence interval for the mean score of all the students taking agriculture general if it is known that the variance of their scores is 9.5.

Sol: We know that the central 95% of N(0, 1) lies between $\pm 1.96$. From the established formula, the 95% confidence interval is given by $\bar{x} \pm 1.96\,\sigma/\sqrt{n}$. Accordingly we have
$30 \pm 1.96(3.1)/\sqrt{19} = 30 \pm 1.4 = (28.6, 31.4)$
The interpretation is that if a hundred such samples were taken and an interval computed from each, about 95 of the intervals would be expected to contain the population mean.

Exercise
On one occasion two persons argued about the mean age of the first-year students at a certain Tanzanian college, one saying 23 and the other 26. To settle the matter, a random sample of 10 students was taken and revealed a mean age of 25 years. By finding a 99% confidence interval for the population mean, comment on the dispute between the two if the population variance of age is known to be 16.

18.3.2.2 Confidence Interval for $\mu$, population variance $\sigma^2$ unknown
The situation described in the previous section (known variance) is a very idealised one, for it is hard to see how one could know the population variance and yet not know the population mean. Of course, one may treat the population variance under the current study to be the same as the one found in earlier studies made on similar populations.
This seems to be the only way of arriving at the known-variance situation described above. In practice, however, we rarely know either the population mean or the population variance. So what happens when we do not know the population variance $\sigma^2$? From what we learnt on the sampling distribution of the mean, the statistic $\bar{x}$ may still follow the normal distribution, depending on the sample size. Usually, if n is large, $\bar{x}$ will be approximately normal despite the absence of the population variance; otherwise it follows Student's t.

Example 1
A random sample of 40 individuals indicated a mean height of 3 m and a variance of 0.2 m². Find a 99% confidence interval for the mean height of the population from which the 40 individuals were drawn.

Sol: Although $\sigma^2$ is unknown, the sample mean $\bar{x}$ is approximately $N(\mu, s^2/n)$ because n = 40 > 30. The central 99% of the distribution lies between -2.575 and 2.575. Accordingly the 99% confidence interval for $\mu$ would be
$3 \pm 2.575\sqrt{0.2/40} = 3 \pm 0.18 = (2.82, 3.18)$

Example 2
A sample of 5 petty traders was taken to assess the monthly sales of petty traders in Morogoro urban area. The sales of the five petty traders, in hundred thousands of shillings, were found to be 30, 26, 28, 35, 40. Find a 95% confidence interval for the mean monthly sales of petty traders in the entire urban area.

Sol: In this example we know nothing concerning the population variance, just as in the first example. Worse still, the sample size is very small. So of necessity $\bar{x}$ will follow Student's t with (5 - 1) d.f. Consequently the 95% confidence interval would be $\bar{x} \pm t_{(n-1),\,\alpha/2}\;\hat{\sigma}/\sqrt{n}$, where

$\hat{\sigma}^2 = \frac{\sum (x - \bar{x})^2}{n-1} = \frac{\sum x^2 - n\bar{x}^2}{n-1} = \frac{ns^2}{n-1}$

From the data given, $\bar{x} = 31.8$ and $\hat{\sigma} = 5.67$. The t values containing the central 95% of the distribution at 4 d.f. are $\pm 2.776$. Thus the 95% C.I. for $\mu$ is $31.8 \pm 2.776(5.67)/\sqrt{5} = 31.8 \pm 7 = (24.8, 38.8)$.
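The arithmetic of Example 2 can be checked with a short Python sketch. This is only an illustration under stated assumptions: the function name `t_confidence_interval` is invented here and is not part of the text, and the critical value 2.776 is typed in from the t tables (4 d.f., 95%) rather than computed.

```python
import math
import statistics


def t_confidence_interval(sample, t_crit):
    """Confidence interval for the mean of a small sample.

    t_crit is the tabulated Student's t value for n-1 d.f.
    (hypothetical helper, supplied by hand from the tables).
    """
    n = len(sample)
    mean = statistics.mean(sample)
    # statistics.stdev uses the divisor n-1, i.e. the unbiased estimate sigma-hat
    sigma_hat = statistics.stdev(sample)
    margin = t_crit * sigma_hat / math.sqrt(n)
    return mean - margin, mean + margin


# Monthly sales of the five petty traders (hundred thousands of shillings)
sales = [30, 26, 28, 35, 40]
low, high = t_confidence_interval(sales, t_crit=2.776)
print(round(low, 1), round(high, 1))  # prints 24.8 38.8
```

Passing the critical value as an argument keeps the sketch within the standard library and mirrors the way the text reads the value from tables.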
18.3.3 Confidence Interval for the difference between two population means $(\mu_1 - \mu_2)$

18.3.3.1 Introduction
Sometimes our interest is focused on the difference between two population means. For example, a medical doctor may wish to know which of two medicines, A and B, is more effective in treating a certain disease. Or a district education officer may be interested in comparing the performance of students in School A and School B. These situations call for scrutiny of the difference between the two population means.

18.3.3.2 Confidence Interval for $(\mu_1 - \mu_2)$ when $\sigma_1^2$ and $\sigma_2^2$ are known
Suppose $X_1 \sim N(\mu_1, \sigma_1^2)$ and $X_2 \sim N(\mu_2, \sigma_2^2)$, and let us consider a $100(1-\alpha)\%$ confidence interval for $\mu_1 - \mu_2$. A $100(1-\alpha)\%$ confidence interval for $\mu_1 - \mu_2$ is given by

$(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$

Exercise
Prove the above result that a $100(1-\alpha)\%$ confidence interval for $\mu_1 - \mu_2$ is given by $(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$. (Hint: consider the sampling distribution of $\bar{x}_1 - \bar{x}_2$.)

Example
Two samples of sizes 10 and 8 of animal weights were taken from two different populations of animals, of type A and type B. The sample means were 5 kg for the first sample and 7 kg for the second. It is known that the weights for the two populations are normally distributed with variances 2.61 and 3.00 respectively. Find a 95% confidence interval for the difference between the two population means and comment on the result.

Sol: We know that the central 95% of N(0, 1) lies between $\pm 1.96$. From the established formula, the 95% confidence interval for the difference between the means would be

$(\bar{x}_1 - \bar{x}_2) \pm 1.96\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$

Taking the difference as type B minus type A, this gives

$(7 - 5) \pm 1.96\sqrt{\frac{2.61}{10} + \frac{3}{8}} = 2 \pm 1.6 = (0.4, 3.6)$

Comment: The above interval does not contain zero. Hence the difference is significant at the 5% level. In other words, animals of type B are heavier than those of type A, as the samples indicate.
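The animal-weight interval can be reproduced with the following Python sketch. It is offered as a check under assumptions, not as a prescribed method: the function name `diff_means_ci_known_var` and its parameter names are made up for illustration, and the z value is obtained from the standard library's `statistics.NormalDist` instead of the tables used in the text.

```python
import math
from statistics import NormalDist


def diff_means_ci_known_var(mean1, mean2, var1, var2, n1, n2, confidence=0.95):
    """CI for mean1 - mean2 when both population variances are known.

    Hypothetical helper: implements (x1 - x2) +/- z * sqrt(var1/n1 + var2/n2).
    """
    # Two-sided critical value, e.g. about 1.96 for 95% confidence
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    std_error = math.sqrt(var1 / n1 + var2 / n2)
    diff = mean1 - mean2
    return diff - z * std_error, diff + z * std_error


# Type B (mean 7 kg, variance 3.00, n = 8) minus type A (mean 5 kg, variance 2.61, n = 10)
low, high = diff_means_ci_known_var(7, 5, 3.00, 2.61, 8, 10)
print(round(low, 1), round(high, 1))  # prints 0.4 3.6
```

Because both limits are positive, the interval excludes zero, which matches the comment in the example.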
18.3.3.3 Confidence Interval for $\mu_1 - \mu_2$ when $\sigma_1^2$ and $\sigma_2^2$ are unknown
Just as with a single mean, the known-variance case of the previous section is an ideal one. In most situations we know neither the population means nor the population variances. However, if the two population variances are unknown while both sample sizes are large enough [n1, n2 > 30], the sampling distribution of $\bar{x}_1 - \bar{x}_2$ will still be approximately normal; otherwise it will follow Student's t distribution.

Case 1: when $\sigma_1^2$ and $\sigma_2^2$ are unknown, but the sample sizes are large enough [n1, n2 > 30]

Example 1
A random sample of 40 pigs was fed on diet A and another sample of 30 pigs was fed on diet B, and the increase in weight was noted in each case. The mean increase in weight due to diet A was 13 kg while for diet B it was 15 kg, and the variances were 12 kg² and 25.61 kg² respectively. Determine a 95% confidence interval for the difference in the mean increase in weight of the two populations and comment on the result.

Although $\sigma_1^2$ and $\sigma_2^2$ are unknown, the statistic $\bar{x}_1 - \bar{x}_2$ will be approximately normally distributed, because the sample sizes n1 = 40 and n2 = 30 are regarded as large. So a 95% C.I. is

$(15 - 13) \pm 1.96\sqrt{\frac{12}{40} + \frac{25.61}{30}} = 2 \pm 2.1 = (-0.1\text{ kg}, 4.1\text{ kg})$

Comment: The two limits are of opposite signs, so the interval contains zero and the difference may well be zero. Hence we can conclude that the two diets do not differ significantly in their contribution to the pigs' mean increase in weight.

Case 2: when $\sigma_1^2$ and $\sigma_2^2$ are unknown, and the sample sizes are small [n1, n2 < 30]
We have seen a case in which the two population variances are unknown but the sample sizes are large. What about when the population variances are unknown and the sample sizes are not large?
When samples are small [mostly taken to be less than 30], the sample variances $s_1^2$ and $s_2^2$ cannot be used as estimates for $\sigma_1^2$ and $\sigma_2^2$. Instead we look for a pooled estimate of the population variance, i.e.

$S_p^2 = \frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}$

An assumption is thus made that the two populations have the same variance. Further to that, the sampling distribution of $\bar{x}_1 - \bar{x}_2$ follows Student's t with $(n_1 + n_2 - 2)$ d.f. As a result, a $100(1-\alpha)\%$ confidence interval for $\mu_1 - \mu_2$ is given by

$(\bar{x}_1 - \bar{x}_2) \pm t_{(n_1+n_2-2),\,\alpha/2}\sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}}$

(Hint: consider the sampling distribution of $\bar{x}_1 - \bar{x}_2$.)

Example
The following were the students' scores in MB201 for random samples taken from two different degree programmes at the Sokoine University of Agriculture in the year 2002.

Table 18.1
                   B.Sc. Agric Engineering    Bachelor of Veterinary Medicine
Sample size        n1 = 12                    n2 = 8
Sample mean        x̄1 = 7.5                   x̄2 = 5.9
Sample variance    s1² = 0.58                 s2² = 3.1

Find a 95% confidence interval and compare the students' performance in the two degree programmes.

Sol: As can be seen, both population variances are unknown and the sample sizes are smaller than 30, so the above formula will be used. From the Student's t tables we have

$t_{(n_1+n_2-2),\,\alpha/2} = t_{18,\,0.05/2} = 2.101$

And from the data given we have

$S_p^2 = \frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2} = \frac{12(0.58) + 8(3.1)}{12 + 8 - 2} = 1.76$

Therefore the 95% C.I. for $\mu_1 - \mu_2$ is

$(7.5 - 5.9) \pm 2.101\sqrt{\frac{1.76}{12} + \frac{1.76}{8}} = 1.6 \pm 1.3 = (0.3, 2.9)$

It is thus evident that the performance of the Engineering students is significantly better than that of the Veterinary students in the subject.

Exercise 18

1. The scores x of 63 seniors on a Graduate Record Examination showed $\sum x = 34{,}540$ and $\sum x^2 = 19{,}480{,}000$. Calculate the unbiased estimate of the population variance. What is your estimate of $\sigma$?

2.
If two samples of size 10 and 15 are drawn from the same population and have variances of 2.40 and 2.70 respectively, what is an unbiased estimate of the population variance?

3. (a) When is an estimator q said to be an unbiased estimator of the population parameter Q? (b) From a lot of capsules the contents of five samples of four each were weighed in milligrams. The results follow:

Sample no.
1    62.3   61.9   63.1   62.5   62.5
2    62.0   61.8   61.9   62.3   62.1
3    62.9   62.0   61.9   62.5   61.8
4    62.8   62.2   62.6   61.6   62.9

Calculate the variance of each sample and from these variances estimate the population (lot) variance of the weighed contents of the capsules.

4. If the mean age at death of 64 men engaged in a somewhat hazardous occupation is 52.4 years with a standard deviation of 10.2 years, what are the 98% confidence limits for the mean age of all men so engaged?

5. 150 bags of flour of a particular brand are weighed and the mean mass is found to be 748 g with standard deviation 3.6 g. Find (a) 90%, (b) 95%, (c) 98% confidence intervals for the mean mass of bags of flour of this brand.

6. A special aptitude test was given to 26 law school freshmen. The results showed a mean score of 82.0 and a variance of 49.00. Set up the 90 percent confidence interval for the mean score of all law school freshmen.

7. In no. 6 above, suppose the population variance was known to be 52; find a 90 percent confidence interval.

8. Four rats were fed a special ration during the first three months of their lives. The following gains in weight (grams) were noted: 55, 62, 58, 65. Find a 99% confidence interval for the population mean of rats fed with such a special ration.

9. Two laboratory assistants make 10 observations each on the same galvanometer for the same experiment. The average readings were 61 and 58 with variances of 0.60 and 0.40 respectively. Find a 95% confidence interval.
10. Ten pupils from one school have a mean I.Q. of 108 and a variance of 60; 17 pupils from another school show a mean of 114 with a variance of 80. Find a 99% confidence interval and comment on the significance of the difference in I.Q. between the pupils of the two schools.

11. The total nitrogen (N) content (mg per cc) of rat blood plasma was determined for a group of 60 rats of age 50 days and for a group of 70 rats of age 80 days. The mean N content for the first group was 0.983 and the variance was 0.00253; for the second group the corresponding statistics were 1.042 and 0.00224, respectively. Find a 99% confidence interval and comment on whether N content varies with age.

12. The systolic blood pressures of a group of 60 patients showed $\bar{x} = 140$, $s_x = 10$. A second group of 60 showed $\bar{y} = 145$ and $s_y = 13$. Find a 95% confidence interval for the mean difference between the two groups.

CHAPTER 19: HYPOTHESIS TESTING

19.1 Introduction
A hypothesis is a statement which can be either true or false. It is a supposition about a certain characteristic of a population. The reason that we have hypotheses in statistics is that our investigations are often based on sample data rather than the entire population. In relying upon sample data we will always be uncertain about the size or nature of the population parameters. Hypothesis testing is a general procedure for reaching a statistical decision on the basis of sample statistics. The interval estimation already discussed may, as we have seen, serve the same purpose in decision making.

When making a statistical enquiry, we often put forward a hypothesis concerning a population parameter. For example,
• The mean score of the students in Group 1 is 50%
• The mean height of 15-year-old girls is 1.62 m
These hypotheses are called null hypotheses and are denoted by H0.
In order to test the validity of H0, we consider the observations made from random samples taken from the populations and perform a statistical test. If the statistical test indicates that we should reject the null hypothesis, H0, we do so in favour of an alternative hypothesis, denoted by H1. But such a statistical test is usually not perfect. It may reject a correct null hypothesis in favour of a wrong alternative hypothesis, or the other way round. For that reason a statistical test for a population parameter is done after specifying the level of significance $\alpha$, i.e. the probability of rejecting the null hypothesis when it is in fact true. In other words, the probability of accepting a true null hypothesis is $1 - \alpha$. If a true null hypothesis is rejected, we have committed the so-called Type I error. On the other hand, when a wrong null hypothesis is accepted, we have committed the so-called Type II error.

19.2 The test statistic
Suppose we wish to investigate whether the mean of a normal population is 50 or not. The hypotheses to be stated would be
H0: $\mu = 50$ (i.e. the population mean is 50)
H1: $\mu \neq 50$ (i.e. the population mean is not 50)
But how can we know whether the mean is 50 or not? The only way possible is by having extra information regarding the population. Suppose a value was picked at random from the population and found to be 40. Can we use this random observation to decide whether the mean is really 50 or not? Indeed we can. Since the population mean is ideally the representative of the entire population, under normal circumstances this value should be close to the suggested mean $\mu$; otherwise the suggested mean is not the true mean. To achieve our target it is convenient to standardize the value X. The standardised value Z is then known as the test statistic. In this example we would use $z = \frac{X - 50}{\sigma}$. If z is close to zero, i.e.
$|z|$ is small, we accept that the sample value could have been taken from a population with mean 50 and we do not reject H0; otherwise we reject H0 and conclude that the population mean is not 50. However, is the statistic used reliable? As a matter of fact, it is very risky to use just a single value as our test statistic, as we have done. The quality of a test statistic depends upon a number of factors, among them the number of observations it involves. For that reason, in practice we use statistics such as the sample mean and sample variance, which involve more than one observation. In general, a good test statistic is one which is an estimate of the population parameter under consideration.

19.3 Critical region and critical values
In the preceding section we spoke of rejecting or accepting a null hypothesis on the basis of whether the standardised value is close to zero or not. But how close should it be? To avoid such a subjective conclusion we need to specify limits on either side of the population parameter within which the value of Z can be expected to fall. This calls for a confidence interval, which was discussed earlier. Normally we select a set of values containing $100(1-\alpha)\%$ of the distribution, for which the null hypothesis can be accepted. The region in which the null hypothesis is rejected is called the critical region, and the boundaries of the critical region are called the critical values.

19.4 One tailed and two tailed test
The test considered above, where we had $\mu = 50$ vs $\mu \neq 50$, is an example of a two-tailed test. In a two-tailed test, the specified population parameter can be either greater than or less than the given value. For instance, $\mu \neq 25$ means either $\mu > 25$ or $\mu < 25$. In a one-tailed test we are interested in either a definite increase or a definite decrease of a given population parameter from a specified value.
The above hypothesis could thus be either $\mu = 25$ vs $\mu < 25$ or $\mu = 25$ vs $\mu > 25$.

19.5 Procedures for carrying out a statistical test
In general, when performing a significance test it is useful to follow a set procedure. The following steps can roughly be followed:
• State first the null hypothesis H0 and the alternative hypothesis H1. If we are looking for a definite increase or definite decrease in the population parameter, we use a one-tailed test, and if we are looking for any change we use a two-tailed test.
• Consider the appropriate distribution given by the null hypothesis.
• Decide on the level of the test. This fixes the critical values of the test statistic.
• Decide on the rejection criteria.
Now consider the sample values and
• Calculate the value of the test statistic.
• Make a conclusion: if the value of the test statistic lies in the critical region, reject H0; if the value of the test statistic does not lie in the critical region, do not reject H0.

19.6 Testing for $\mu = C$ based on a sample mean

19.6.1 When the population variance $\sigma^2$ is known
When the sample statistic considered is the sample mean $\bar{x}$, the test statistic is obtained by standardizing $\bar{x}$, and is given by

$Z = \frac{\bar{x} - C}{\sigma/\sqrt{n}}$

Example
A sample of the test scores of 49 students was taken and the mean score was found to be 68.9. The variance of the test scores of all the students in the school is known to be 36. Is there sufficient evidence at the 3% level to reject the assertion that the mean score of all the students was 70?

Sol
(i) H0: $\mu = 70$, H1: $\mu \neq 70$
(ii) We suppose that indeed $\mu = 70$. Accordingly $X \sim N(70, 36)$, so that $\bar{x} \sim N(70, 36/49)$.
(iii) The z values containing the central 97% of the entire distribution are -2.17 and 2.17.
(iv) $Z = \frac{\bar{x} - C}{\sigma/\sqrt{n}} = \frac{68.9 - 70}{6/\sqrt{49}} = -1.28$
(v) The calculated value, 1.28 in absolute value, is smaller than the critical value of 2.17. Hence we conclude that the population mean could be 70 at the 3% level of significance.

19.6.2 When population variance $\sigma^2$ is unknown
As noted before for confidence intervals, the case of the preceding section is an ideal one. Often we know neither the population mean nor the population variance. In other words, the test statistic involves the sample variance $s^2$ and not the population variance $\sigma^2$. Accordingly we have

$Z = \frac{\bar{x} - C}{s/\sqrt{n}}$ for large n, and $t = \frac{\bar{x} - C}{\hat{\sigma}/\sqrt{n}}$ for small n.

Example 1
A normal distribution is thought to have a mean of 50. A random sample of 100 gave a mean of 52.6 and s = 14.5. Is there sufficient evidence that the population mean is not 50? Test at the 5% level.

Sol: In this example the population variance is not known, yet the sample size is large (> 30), hence the distribution of the sample mean will be approximately normal.
(i) H0: $\mu = 50$, H1: $\mu \neq 50$
(ii) We suppose that indeed $\mu = 50$. Accordingly $\bar{x} \sim N(50, 14.5^2/100)$.
(iii) The z values containing the central 95% of the entire distribution are -1.96 and 1.96.
(iv) The test statistic would be $Z = \frac{\bar{x} - C}{s/\sqrt{n}} = \frac{52.6 - 50}{14.5/\sqrt{100}} = 1.79$
(v) The calculated value 1.79 is smaller than the critical value of 1.96. Hence we conclude that the population mean could be 50 at the 5% level of significance.

Example 2
A sample of size 5 of the scores of students in group E yielded the following results.

Table 19.1
Student    1    2    3    4    5
Score      50   45   40   60   50

Can we assert at the 5% level of significance that the mean score in group E was 55?

Sol: In this example the population variance is not known and, worse, the sample size is very small (< 30), hence the distribution of the sample mean will follow Student's t with n - 1 d.f.
From the data given, $\bar{x} = 49$ and $\hat{\sigma} = 7.42$.
(i) H0: $\mu = 55$, H1: $\mu \neq 55$
(ii) We suppose that indeed $\mu = 55$. Accordingly the standardised sample mean, with centre 55 and estimated variance $7.42^2/5$, follows Student's t with 4 d.f.
(iii) The t values containing the central 95% of the entire distribution at 4 d.f. are -2.776 and 2.776.
(iv) The test statistic would be $t = \frac{\bar{x} - C}{\hat{\sigma}/\sqrt{n}} = \frac{49 - 55}{7.42/\sqrt{5}} = -1.808$
(v) The calculated value, 1.808 in absolute value, is smaller than the critical value of 2.776. Hence we conclude that the population mean could be 55 at the 5% level of significance.

19.7 Testing for the difference between two populations, $\mu_1 - \mu_2 = C$

19.7.1 Testing for $\mu_1 - \mu_2 = C$ based on the sample means when the population variances $\sigma_1^2$ and $\sigma_2^2$ are known
If $X_1$ and $X_2$ are independent unpaired samples of sizes $n_1$ and $n_2$ such that $X_1 \sim N(\mu_1, \sigma_1^2)$ and $X_2 \sim N(\mu_2, \sigma_2^2)$, then

$\bar{x}_1 - \bar{x}_2 \sim N\left(\mu_1 - \mu_2,\ \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right)$

Since the population variances are given, the standardised value of $\bar{x}_1 - \bar{x}_2$ will follow the standard normal distribution and the test statistic will be given as

$Z = \frac{(\bar{x}_1 - \bar{x}_2) - C}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$

with C = 0 when testing whether the two means are equal.

Example
A random sample of size 100 is taken from a normal population with variance $\sigma_1^2 = 40$. The sample mean $\bar{x}_1$ is 38.3. Another random sample, of size 80, is taken from a normal population with variance $\sigma_2^2 = 30$. The sample mean $\bar{x}_2$ is 40.1. Test at the 5% level whether there is a significant difference in the population means $\mu_1$ and $\mu_2$.

Sol: From the given test statistic we have

$Z = \frac{(38.3 - 40.1) - 0}{\sqrt{\frac{40}{100} + \frac{30}{80}}} = -2.04$

This value is compared with the critical values of $\pm 1.96$ at the 5% level of significance. Since it lies outside them, we reject the null hypothesis that $\mu_1 - \mu_2 = 0$.

19.7.2 Testing for $\mu_1 - \mu_2 = C$ based on the sample means when the population variances $\sigma_1^2$ and $\sigma_2^2$ are unknown
Once again let us consider the realistic situation in the field, where usually the population variance is unknown.
Accordingly the sampling distribution of the difference in means will follow either the t-distribution or the Z-distribution, depending on the sample sizes $n_1$ and $n_2$. If both sample sizes are large the test statistic will be

$Z = \frac{(\bar{x}_1 - \bar{x}_2) - C}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$

whereas if the sample sizes are small the test statistic would be

$t = \frac{(\bar{x}_1 - \bar{x}_2) - C}{\sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}}}$

Example 1
The systolic blood pressures of a group of 60 patients showed $\bar{x} = 140$, $s_x = 10$. A second group of 60 showed $\bar{y} = 145$ and $s_y = 13$. Is there a significant difference between the means of the two groups?

Sol: In the example given the population variances are unknown, but the sample sizes are both larger than 30. Therefore

$Z = \frac{(\bar{x}_1 - \bar{x}_2) - C}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{(140 - 145) - 0}{\sqrt{\frac{100}{60} + \frac{169}{60}}} = -2.36$

This value is compared with the critical values of $\pm 1.96$ at the 5% level of significance. Since it lies outside them, we reject the null hypothesis that $\mu_1 - \mu_2 = 0$.

Example 2
The following were the students' scores in MB201 for random samples taken from two different degree programmes at the Sokoine University of Agriculture in the year 2002.

Table 19.2
                   B.Sc. Agric Engineering    Bachelor of Veterinary Medicine
Sample size        n1 = 12                    n2 = 8
Sample mean        x̄1 = 7.5                   x̄2 = 5.9
Sample variance    s1² = 0.58                 s2² = 3.1

Test at the 5% level of significance whether $\mu_1 - \mu_2 = 0$ or not, and hence comment on the students' performance in the two degree programmes.

Sol: As can be seen, the two population variances are unknown and the sample sizes are smaller than 30. From the Student's t tables we have

$t_{(n_1+n_2-2),\,\alpha/2} = t_{18,\,0.05/2} = 2.101$

And from the data given we have

$S_p^2 = \frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2} = \frac{12(0.58) + 8(3.1)}{12 + 8 - 2} = 1.76$

Therefore

$t = \frac{(7.5 - 5.9) - 0}{\sqrt{\frac{1.76}{12} + \frac{1.76}{8}}} = 2.6$

We reject the null hypothesis because 2.6 > 2.101. It is thus evident that the performance of the Engineering students is significantly better than that of the Veterinary students in the subject.
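The pooled-variance calculation in Example 2 can be sketched in Python as follows. The function name `pooled_t_statistic` and its arguments are hypothetical, introduced only for this illustration; the sketch assumes, as the text does, that each sample variance $s^2$ is computed with divisor n and that the critical value 2.101 is read from the t tables.

```python
import math


def pooled_t_statistic(mean1, mean2, s1_sq, s2_sq, n1, n2, c=0.0):
    """Two-sample t statistic with a pooled variance (small samples).

    Hypothetical helper. s1_sq and s2_sq are sample variances with
    divisor n, so the pooled estimate is (n1*s1^2 + n2*s2^2)/(n1+n2-2).
    """
    sp_sq = (n1 * s1_sq + n2 * s2_sq) / (n1 + n2 - 2)
    std_error = math.sqrt(sp_sq / n1 + sp_sq / n2)
    t_value = ((mean1 - mean2) - c) / std_error
    return t_value, sp_sq


# Engineering (n1 = 12) versus Veterinary Medicine (n2 = 8), from Table 19.2
t_value, sp_sq = pooled_t_statistic(7.5, 5.9, 0.58, 3.1, 12, 8)
t_critical = 2.101  # from the t tables: 18 d.f., 5% two-tailed
print(round(sp_sq, 2), round(t_value, 2))  # prints 1.76 2.64
print(abs(t_value) > t_critical)           # prints True, so H0 is rejected
```

The unrounded statistic is about 2.64, which the text reports as 2.6; either way it exceeds 2.101, so the conclusion is unchanged.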
19.8 A test for paired observations

The examples in the preceding section involved two independent random samples taken from two different populations. Sometimes observations are made in pairs, and as a consequence the two samples considered are not independent. For example, one may be interested in comparing the mean performance of students in two different subjects, say Mathematics and History. The two scores for every student would be regarded as a paired observation.

If x and y are the two variables observed jointly, the distribution of the sum or the difference cannot easily be established unless we involve cov(x, y). Alternatively we consider the distribution of $d_i = x_i - y_i$. So the test for $\mu_x - \mu_y = C$ is basically the test for $\mu_d = C$. In particular, the test for the difference between two population means is the same as the test for $\mu_d = 0$. The test statistics are thus

$Z = \dfrac{\bar{d} - C}{s_d/\sqrt{n}}$ for large n, and $t = \dfrac{\bar{d} - C}{\hat{\sigma}_d/\sqrt{n}}$ for small n.

Example
The following were the points scored by the students of B.Sc. Environmental Sciences & Management in two different subjects in the June/July 2001 University Examination at Sokoine University of Agriculture.

Table 19.3 Points scored in Biometry and Development Studies (20 students; the paired scores appear as X and Y in Table 19.4)

Test the hypothesis that the performance of the students in the two subjects is the same.

Sol: These are paired observations, so we need to consider the differences $d_i = x_i - y_i$ as shown in the table below:

Table 19.4
X      4   1   3   1   2   2   2   2   2   2   2   2   3   2   2   3   2   3   5   3
Y      2   1   5   1   2   2   2   2   5   2   4   2   2   2   2   2   5   2   3   1
d_i    2   0  -2   0   0   0   0   0  -3   0  -2   0   1   0   0   1  -3   1   2   2
Considering the figures in the third row, we have $\bar{d} = -0.06$, $\hat{\sigma}_d = 1.43$ and $n = 20$. Since n is small the test statistic is

$t = \dfrac{\bar{d} - C}{\hat{\sigma}_d/\sqrt{n}} = \dfrac{-0.06}{1.43/\sqrt{20}} = -0.1876$

The t-values containing 95% of the entire distribution at 19 d.f. are -2.093 and 2.093. Upon comparison we accept the null hypothesis that $\mu_d = 0$, meaning that the performance in the two subjects is not significantly different at the 5% level of significance.

Exercise: 19

1. What is a statistical hypothesis? Explain the essence of having hypotheses in statistics.

2. The following were the random samples on weights of six children under five years at Kigugu village: 12, 13, 14, 11, 15, 8, 9. Can the mean weight of the entire village be equal to 20? Test at the 5% level of significance.

3. Two samples drawn from two different normal populations revealed the following.

S/N    Sample size    Sample variance    Sample mean
1      8              2                  12
2      12             3                  14

Do the two population means differ significantly?

4. The following table shows the mean number of bacteria colonies per plate obtainable by four slightly different methods from soil samples taken at 4 P.M. and 8 P.M. respectively.

Method    4 P.M.    8 P.M.
A         29.75     39.20
B         27.50     40.60
C         30.25     36.20
D         27.80     42.40

Are there significantly more bacteria at 8 P.M. than at 4 P.M.?

5. Let X and Y be two dependent variables corresponding to two different populations. Derive the formulae for the mean and variance of the sampling distribution of the mean difference of the two populations.

6. Ten soldiers visit the rifle range two weeks running. The first week their scores were 67, 24, 57, 55, 63, 54, 56, 68, 33, 43. In the second week their scores, in the same order, were 70, 38, 58, 58, 56, 67, 68, 75, 42, 38. Is there any significant improvement?

CHAPTER 20: THE $\chi^2$ DISTRIBUTION

20.1 Introduction

If the variables $Z_1, Z_2, Z_3, \dots, Z_n$ are independent and each has the N(0, 1) distribution, then $y = z_1^2 + z_2^2 + \dots + z_n^2$ follows the so-called chi-square ($\chi^2$) distribution with n degrees of freedom.
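The definition of a chi-square variable as a sum of squared standard normal variables can be illustrated by simulation. The sketch below uses only the Python standard library; the seed, the choice n = 5 and the number of repetitions are arbitrary assumptions made for illustration. With many repetitions the average of the simulated values should come out close to n.

```python
import random


def chi_square_draw(n, rng):
    """One simulated value: the sum of squares of n independent N(0, 1) draws."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n))


rng = random.Random(2024)   # fixed seed (arbitrary) so the run is repeatable
n = 5                       # degrees of freedom (illustrative choice)
draws = [chi_square_draw(n, rng) for _ in range(20000)]

sample_mean = sum(draws) / len(draws)
print(round(sample_mean, 2))
```

A simulation of this kind only suggests the behaviour of the distribution; it does not replace a proof.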
The p.d.f. of a chi-square distribution is given as

$y = C\, e^{-\chi^2/2}\, (\chi^2)^{(n/2) - 1}$

where C is a constant depending on n, the number of degrees of freedom.

(Sketch of the curve)

Consider the random sample $x_1, x_2, \dots, x_n$ taken from a normal distribution $N(\mu, \sigma^2)$. Based on the given definition, the statistic

$\chi^2 = \dfrac{(x_1 - \mu)^2}{\sigma^2} + \dfrac{(x_2 - \mu)^2}{\sigma^2} + \dots + \dfrac{(x_n - \mu)^2}{\sigma^2}$

follows the $\chi^2(n)$ distribution. However, if $\mu$ is unknown and is estimated from the data we have

$\chi^2 = \dfrac{(x_1 - \bar{x})^2}{\sigma^2} + \dfrac{(x_2 - \bar{x})^2}{\sigma^2} + \dots + \dfrac{(x_n - \bar{x})^2}{\sigma^2} = \dfrac{(n - 1)\hat{\sigma}^2}{\sigma^2}$

which follows the $\chi^2(n - 1)$ distribution.

Exercise
1. Show that the mean of the chi-square distribution with n degrees of freedom is also n.
2. Show that $\dfrac{(n - 1)\hat{\sigma}^2}{\sigma^2}$ follows the $\chi^2(n - 1)$ distribution. (Hint: The sum of two independent chi-variates with n and m degrees of freedom follows the chi-square distribution with n + m d.f.)

20.2 The sampling distribution of the variance

An immediate application of this result is in inferences about the population variance $\sigma^2$. We know the chi-values containing $(1 - \alpha)100\%$ of the distribution are $\chi^2_{n-1,\ 1-\alpha/2}$ and $\chi^2_{n-1,\ \alpha/2}$. Accordingly, consider

$\chi^2_{n-1,\ 1-\alpha/2} \le \dfrac{(n - 1)\hat{\sigma}^2}{\sigma^2} \le \chi^2_{n-1,\ \alpha/2}$

which is the same as

$\dfrac{(n - 1)\hat{\sigma}^2}{\chi^2_{n-1,\ \alpha/2}} \le \sigma^2 \le \dfrac{(n - 1)\hat{\sigma}^2}{\chi^2_{n-1,\ 1-\alpha/2}}$

This is the $(1 - \alpha)100\%$ confidence interval for $\sigma^2$.

Example
Establish a 95% confidence interval for the variance of a normal distribution using the random sample below:

4, 5, 7, 8, 10

Sol: From the data, $\sum (x_i - \bar{x})^2 = 22.8$, so the divisor-n variance is $s^2 = 4.56$ and the unbiased estimate is $\hat{\sigma}^2 = 22.8/4 = 5.7$. From the tables, $\chi^2_{4,\ 0.025} = 11.143$ while $\chi^2_{4,\ 0.975} = 0.484$. Therefore the 95% confidence interval for $\sigma^2$ is

$\dfrac{4 \times 5.7}{11.143} \le \sigma^2 \le \dfrac{4 \times 5.7}{0.484}$, that is, $2.05 \le \sigma^2 \le 47.11$.

Exercise
A sample of 5 students in a certain class showed a variance of 20 in their mathematics scores. Is it possible that the variance of the entire class score is 25? Test at the 10% level of significance.
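The confidence interval in the worked example of Section 20.2 (sample 4, 5, 7, 8, 10) can be checked with a few lines of Python. This is a sketch under one stated assumption: the interval is built from the unbiased estimate $\hat{\sigma}^2$ (divisor n - 1), so that $(n - 1)\hat{\sigma}^2$ equals the sum of squared deviations, 22.8. The figure 4.56 is the divisor-n variance of the same data. The two chi-square critical values are copied from the example rather than computed.

```python
import statistics

data = [4, 5, 7, 8, 10]
n = len(data)

# Critical values at 4 d.f., taken from the chi-square table used in the example
chi_upper = 11.143   # chi-square value with 2.5% of the distribution above it
chi_lower = 0.484    # chi-square value with 97.5% of the distribution above it

var_unbiased = statistics.variance(data)   # divisor n - 1
var_divisor_n = statistics.pvariance(data) # divisor n
sum_sq = (n - 1) * var_unbiased            # sum of squared deviations from the mean

ci_low = sum_sq / chi_upper
ci_high = sum_sq / chi_lower

print(round(var_divisor_n, 2), round(var_unbiased, 2))
print(round(ci_low, 2), round(ci_high, 2))
```

Working from the sum of squared deviations avoids any ambiguity about which divisor a quoted "variance" uses.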
20.3 The uses of a chi-square distribution

Apart from being the sampling distribution of the variance, the chi-square distribution has many other uses in dealing with data involving counts. These result from the following theorem.

If $O_i$ and $E_i$ are the observed and expected frequencies of the i-th event in a given random experiment consisting of n events, then the quantity

$\sum_{i=1}^{n} \dfrac{(O_i - E_i)^2}{E_i}$

follows the $\chi^2(n - 1)$ distribution.

The proof of this theorem is beyond the scope of this presentation. However, the theorem finds applications in many everyday practical inquiries. Below are some of them.

20.3.1 The test on the goodness of fit (single classification)

The chi-square distribution can be used to test whether a certain set of observations follows a suggested distribution or not.

Example 1
A student was asked to throw a die 102 times and recorded the following results.

Score on x            1     2     3     4     5     6
Observed frequency    24    12    14    15    26    11

Do the data support the hypothesis that the die is unbiased? Test at the 5% level.

Sol:
1. If the die is unbiased, each score has expected frequency 102/6 = 17.

Score on x            1     2     3     4     5     6
Observed frequency    24    12    14    15    26    11
Expected frequency    17    17    17    17    17    17

2. The quantity $\sum_{i=1}^{6} \dfrac{(O_i - E_i)^2}{E_i} = \dfrac{49 + 25 + 9 + 4 + 81 + 36}{17} = \dfrac{204}{17} = 12.0$.

3. Since 12.0 exceeds the tabulated value $\chi^2_{5,\ 0.05} = 11.07$, the hypothesis that the die is unbiased is rejected at the 5% level.

Example 2
Mendel reported the following results for a di-hybrid (double-heterozygous) cross with peas.

1. Round/yellow: 315
2. Round/green: 108
3. Wrinkled/yellow: 101
4. Wrinkled/green: 32

The theory predicts that the frequencies should be in the proportions 9:3:3:1. Are these results consistent with the hypothesis of independent segregation and simple dominance of yellow over green and round over wrinkled?

Sol

Example 3
Genetic theory states that children having one parent of blood type M and the other of blood type N will always be one of the types M, MN, N and that the proportions of the three types will on average be as 1:2:1.
A report states that out of 300 children having one M parent and one N parent, 30% were found to be of type M, 45% of type MN and the remainder of type N. Test whether the genetic theory is true or not.

Sol

20.3.2 The test of independence on the association of attributes (double classification)

So far, we have considered cases in which individuals or observations are classified by a single criterion. However, sometimes we have to consider cases where individuals are classified by more than one criterion. The chi-square distribution can be used to test whether the two criteria of classification are independent of each other.

Example
Four hundred school children were classified as left-handed or right-handed, and as left-eye-dominant or right-eye-dominant. The following were the results:

                Left-eyed    Right-eyed    Total
Left-handed        27            27          54
Right-handed      110           236         346
Total             137           263         400

Is there any association between being left-handed and being left-eyed, and vice versa?

Sol

20.3.3 Remarks on the $\chi^2$ distribution

(a) Each expected frequency should be at least 5 and preferably much larger. If one or more of the expected frequencies falls below 5, we pool the smaller classes to form larger ones until the condition is fulfilled. In combining frequency classes in this way we lose degrees of freedom.
(b) The sum of the expected frequencies must equal the sum of the observed frequencies.
(c) The number m of classes or cells should preferably be neither too large nor too small. If $5 \le m \le 20$ one is usually on the safe side.

CHAPTER 21: THE F-DISTRIBUTION: A TEST FOR $\sigma_1^2 = \sigma_2^2$

21.1 Introduction

When testing for the difference between two population means with the population variances unknown and the sample sizes not large, we always assumed that the two population variances are the same and used the pooled variance as an estimate of the variance of both populations.
This is not always the case, and sometimes we need to test whether really $\sigma_1^2 = \sigma_2^2$. Such a test can be made using the so-called F-distribution. The F-distribution can be derived from the chi-square distribution using the following theorem.

"Given two independent chi-variables $\chi^2_{v_1}$ and $\chi^2_{v_2}$, the quantity $F = \dfrac{\chi^2_{v_1}/v_1}{\chi^2_{v_2}/v_2}$ follows the F-distribution with degrees of freedom $v_1$ and $v_2$."

The p.d.f. of the F-distribution is given by

$y = C\, F^{(v_1 - 2)/2} \left(1 + \dfrac{v_1}{v_2} F\right)^{-(v_1 + v_2)/2}$

where C is a constant depending upon $v_1$ and $v_2$.

Of particular interest is the ratio of two sample variances corresponding to sample sizes $n_1$ and $n_2$. Recall that $\dfrac{(n - 1)\hat{\sigma}^2}{\sigma^2}$ follows $\chi^2(n - 1)$. It follows that if $\hat{\sigma}_1^2$ and $\hat{\sigma}_2^2$ are the variance estimates from two samples of sizes $n_1$ and $n_2$ taken from two different populations with variances $\sigma_1^2$ and $\sigma_2^2$, the quantity

$\left[\dfrac{(n_1 - 1)\hat{\sigma}_1^2}{\sigma_1^2} \Big/ (n_1 - 1)\right] \Big/ \left[\dfrac{(n_2 - 1)\hat{\sigma}_2^2}{\sigma_2^2} \Big/ (n_2 - 1)\right]$

follows the F-distribution with $(n_1 - 1)$ and $(n_2 - 1)$ degrees of freedom. We write this as $F(n_1 - 1,\ n_2 - 1)$. If we assume that the null hypothesis $\sigma_1^2 = \sigma_2^2$ is true, the quantity becomes simply

$\dfrac{\hat{\sigma}_1^2}{\hat{\sigma}_2^2}$

which is the ratio of two unbiased estimators of the two population variances.

(Sketch of the F-curve)

Example
In testing for percent of ash content, 17 tests from one shipment of coal show s = 2.66 percent, and 21 tests from a second shipment show s = 4.55 percent. Test $H_0: \sigma_1^2 = \sigma_2^2$ vs $H_1: \sigma_1^2 \neq \sigma_2^2$ with $\alpha = 0.10$.

21.2 The uses of the F-distribution

The F-distribution is frequently used in the analysis of variance, where one considers variation from two different sources contained in a certain set of observations. The ratio of the variances from the two sources tells whether there is a significant difference between the two sources of variation.
If there is a significant difference between the two sources, the implication is that the sources differ significantly in terms of their contribution to the total variation. At the moment we can use the F-distribution in testing the validity of a given regression line.

Recall that

TOTAL VARIATION = EXPLAINED VARIATION + UNEXPLAINED VARIATION

or

$TSS = R^2 \cdot TSS + (1 - R^2) \cdot TSS$

where $R^2$ is the coefficient of determination = EXPLAINED VARIATION / TOTAL VARIATION.

We have two sources of variation as indicated: the explained variation $R^2 \cdot TSS$ with 1 degree of freedom, and the unexplained variation $(1 - R^2) \cdot TSS$ with $n - 2$ degrees of freedom. Hence

$F = \dfrac{R^2 \cdot TSS\,(n - 2)}{(1 - R^2) \cdot TSS} = \dfrac{R^2 (n - 2)}{1 - R^2}$

gives the F-value at 1 and $n - 2$ degrees of freedom. Under simple linear regression the ratio of the two variations follows $F(1,\ n - 2)$. $R^2$ is required to be significant, meaning that the EXPLAINED variation is significantly larger than the UNEXPLAINED variation.

Example
The mental ages x and the scores on a test y of a group of 4 boys were as follows.

X    5    5    7    8
Y    0    5    8    10

(a) Find the regression line of y on x.
(b) Test at 5% whether the regression line is relevant or not.
(c) Comment on part (a) after the result in part (b).
(d) On the basis of the result in part (b), estimate the score at the age of 6.

Sol
(a) $y = -10.22 + 2.56x$
(b) From the data given $R^2 \approx 0.8$. The calculated F is $0.8(2)/(1 - 0.8) = 8$. The tabulated F value at 1 and 2 d.f. at the 5% level is 18.51. Since the calculated value 8 is smaller than 18.51, we cannot conclude that the regression sum of squares is significantly larger than the error sum of squares. Hence the regression line is not significant at the 5% level.

Exercise: 20-21

1. A sample of 30 drawn from a normal distribution had a variance of 10. Find the 90% confidence interval for the population variance.

2. Assume that for a certain age group of Tanzanian males the systolic blood pressures show a variance of 268. A selected sample of 20 men from this age group had a variance of 313. May one conclude that this age group represents a population with variance not equal to 268?

3.
A sample of the mathematics scores of 5 males and 4 females was taken. Find the chance that the variation in the females' scores was 9 times larger than in the males' scores.

4. The following were the paired observations of x and y: (3,1), (4,3), (4,2), (2,5), (1,7). Find the linear regression of y on x. Is the regression line significant at the 5% level?

5. The following table shows the association, among 1000 schoolboys, between their general ability and their mathematical ability. Find out whether there is a connection between one's mathematical ability and one's general ability.

                        General ability
Math ability      Good    Fair    Poor    Total
Good               44      22       4
Fair              265     257     178
Poor               41      91      98
Total

6. It is known that, over a period of five years, a certain college comprises 600 first years, 400 second years and 300 third years. Can a group of 20 first years, 10 second years and 5 third year students found at random be said to belong to such a college?
