STATISTICS: MODULE 12122

CHAPTER 4 STATISTICAL INFERENCE

The aim of statistical analysis is to enable one to proceed from a knowledge of the current situation to an understanding of what will happen tomorrow. In statistics, we study numerical data (usually from a sample) so that decisions, predictions, etc. are made after some mathematical / statistical investigation.

4.1 Population versus Sample

The totality of pertinent data that may be collected in a given problem is called the POPULATION.

Example 4.1
An accountant is interested in all the accounts received by a large firm during the first six months of a new tax year. The POPULATION is the 100,000 or so numbers, each representing
(i) the amount of an account in £, or
(ii) the number of errors in an account, or
(iii) whether an account has been settled or is outstanding, i.e. we define a random variable which is 0 if the account has been settled and 1 if it is outstanding.

Referring back to Chapter 1, (i), (ii) and (iii) are examples of random variables. The sample space is the set of all the possible values of the random variable under consideration and it is also called the population, so probability distributions or models are theoretical distributions or models of POPULATIONS.

In most cases the statistician will not investigate all values in the population but will take a SAMPLE. Distributions associated with samples are called frequency distributions.

Example 4.2
The John Levin Partnership has department stores in a large number of cities throughout the U.K. The company wishes to compare the sales from the different department stores and instructs the sales managers in each of the stores to record the weekly sales for a random sample. The Edinburgh branch of the John Levin Partnership recorded the following sales figures:

Sales (£0000s)    Frequency
less than 40          6
40-80                18
80-120               33
120-160              25
160-200              14
200-240               4
Total               100

4.2 Some reasons for sampling

1.
The time and cost involved in examining the whole population may be too great.
2. The population may be infinite or effectively infinite, e.g. testing out a new insecticide.
3. If the testing is testing to destruction, there would be none left to sell, e.g. testing light bulbs to measure their lifetime or testing tyres to measure what pressure they can take before they burst.

N.B. You can arrive at reliable inferential results even when taking a small sample from a large population, provided the correct statistical techniques are used.

There are many ways in which a sample can be chosen (the study of this is called sampling design), e.g. simple random sample, cluster sample, systematic sample, stratified sample.

4.3 Random Sampling

We assume that we take a simple random sample. A simple random sample is a sample in which each member of the population has an equal and independent chance of being included in the sample.

Example 4.3
Let $X_1, X_2, X_3, \ldots, X_n$ be a random sample from a population which has a Normal distribution with mean $\mu$ and variance $\sigma^2$.

Particular samples of interest in Econometrics
(1) A cross-section is a sample of a number of observational units at one point in time.
(2) A time series is a set of observations on the same observational unit at different points in time.
(3) A panel data set is the same cross-section at different points in time.

[Figure: a time series]

The data, once collected, are classified in some systematic manner and presented in tabular form, e.g. a frequency distribution as in Example 4.2 (and / or in many cases in graphical and pictorial form, e.g. histogram, bar chart, pie chart). Descriptive measures, called STATISTICS, are now calculated from the data (e.g. mean, median, standard deviation). These measures, when calculated for the population, are called PARAMETERS.

4.4 Statistics versus Parameters

A descriptive measure which is determined from the sample is called a statistic.
Suppose $X_1, X_2, X_3, \ldots, X_n$ is a random sample from a given population. A statistic is a function of the sample values $X_1, X_2, X_3, \ldots, X_n$ and varies from sample to sample, so a statistic is a random variable.

Examples of statistics
(1) sample mean $\bar{X} = \dfrac{X_1 + X_2 + \cdots + X_n}{n}$
(2) sample variance $S^2 = \dfrac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$
(3) sample proportion $\hat{p}$
(4) sample median $\tilde{X}$

If we compute these measures for populations, they are called parameters.

Examples of parameters
(1) population mean $\mu$
(2) population variance $\sigma^2$
(3) population proportion $\pi$ or $p$
(4) population median $\tilde{\mu}$

N.B. Usually Greek letters are used for population measures and English letters for sample measures.

Sample:      compute a statistic, e.g. $\bar{X}, S, S^2, \hat{p}, \tilde{X}, \hat{\theta}, \hat{\lambda}$
Population:  compute a parameter, e.g. $\mu, \sigma, \sigma^2, \pi, \theta, \lambda$

A statistic is a function of the elements of a random sample. It is a random variable: it varies from sample to sample with a certain distribution called a sampling distribution. A parameter is a constant (generally unknown).

The object then is to make decisions concerning the population by analysing the information contained in the sample. This is called statistical inference.

4.5 Statistical Inference

The first stage in the problem is to try to answer questions about the population on the basis of the information contained in the sample. Because such inference cannot be absolutely certain, the language of PROBABILITY is used in stating conclusions.

There are two branches of statistical inference.
(1) Estimation
(2) Hypothesis testing

4.6 Estimation

In estimation we are concerned with obtaining the best possible estimates of population parameters (generally unknown) such as $\mu, \tilde{\mu}, \sigma, \sigma^2, p, \theta$ etc.

There are two types of estimate:
A. Point estimate
B. Interval estimate

A. Point estimators

A point estimator is a random variable varying from sample to sample and its value is called the point estimate.

Definition.
A point estimate is a single value estimate for the parameter.

Example 4.4
The weekly demand for a product is believed to be normally distributed with mean $\mu$ and variance $\sigma^2$. Obtain point estimates of $\mu$ and $\sigma^2$ if the following demands are observed.

18, 25, 26, 27, 26, 25, 20, 22, 23, 25, 25, 28, 22, 27, 27, 20, 19, 31, 26, 27, 25, 24, 21, 29, 28, 22, 24, 26, 25, 25, 24.

Summary statistics are $\sum_{i=1}^{31} X_i = 762$ and $\sum_{i=1}^{31} X_i^2 = 19000$, where $X$ is the weekly demand.

Solution
There are several suggestions for estimating $\mu$. We could use the
(i) sample mean $\bar{X}$. Here $\bar{X} = 762/31 \approx 24.58$.
(ii) sample median $\tilde{X}$, which is the middle value when the 31 values are arranged in order of magnitude. Here $\tilde{X}$ is the 16th value (the $\frac{n+1}{2}$th value with $n = 31$) in order of magnitude, which is 25.
(iii) sample mode, which is the value which occurs most frequently, and this is 25.
(iv) midpoint of the range $X_{MID} = \dfrac{X_{MIN} + X_{MAX}}{2}$, which is $\dfrac{18 + 31}{2} = 24.5$.

There are two suggestions for estimating $\sigma^2$.
(i) Use the sample variance
$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1} = \frac{\sum_{i=1}^{n} X_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} X_i\right)^2}{n - 1} = \frac{19000 - 762^2/31}{30} \approx 8.985.$$
(ii) Use
$$S_*^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n},$$
which gives 8.695.

The question arises therefore as to which point estimator of $\mu$ is the best to use and which point estimator of $\sigma^2$ is the best to use. We have to choose the one with the 'best' properties. This will be discussed in Chapter 5.

B. Interval estimates

To obtain an interval estimate one determines an interval in which, with a high degree of confidence, one predicts the parameter value lies. These intervals are called confidence intervals and the end-points are called confidence limits.

In order to ascertain which is the 'best' point estimator and in order to obtain interval estimates, we need to know more about certain sampling distributions.

4.7 Sampling distribution of a single mean $\bar{X}$ (population normal)

Suppose we take many random samples of size $n$ from a Normal population with mean $\mu$ and variance $\sigma^2$, and for each sample we compute the sample mean $\bar{X}$.
The mean $\bar{X}$ will vary from sample to sample, e.g. suppose we take many random samples of students from Bristol University from different departments and for each sample we record the mean weekly number of hours of study $\bar{X}$; then $\bar{X}$ will vary from sample to sample with a particular distribution.

Sample 1:  $x_1', x_2', \ldots, x_n'$
Sample 2:  $x_1'', x_2'', \ldots, x_n''$
Sample 3:  $x_1''', x_2''', \ldots, x_n'''$
...

From 3.15, Chapter 3, suppose $X_1, X_2, \ldots, X_n$ form a random sample from a Normal population with mean $\mu$ and variance $\sigma^2$, and the random variable $Y$ is given by
$$Y = a_1 X_1 + a_2 X_2 + a_3 X_3 + \cdots + a_n X_n$$
where $a_1, a_2, \ldots, a_n$ are constants. Then
$$Y \sim N\left(\sum a_i \mu, \; \sum a_i^2 \sigma^2\right).$$

Further, suppose that $a_1 = a_2 = a_3 = \cdots = a_n = \frac{1}{n}$. Then
$$Y = \frac{X_1 + X_2 + \cdots + X_n}{n} = \bar{X},$$
and since $\sum a_i \mu = \mu$ and $\sum a_i^2 \sigma^2 = \sigma^2 / n$,
$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right).$$

So in theory, the (sampling) distribution of a sample mean is Normal with mean $\mu$ = population mean but standard deviation $\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}}$ ($< \sigma$), where $\sigma$ is the population standard deviation and $n$ is the sample size. Standardising,
$$\frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \sim N(0, 1).$$

4.8 A $100(1 - \alpha)\%$ confidence interval for $\mu$ where $\sigma$ is known

We can write
$$P(-z_{\alpha/2} \le Z \le z_{\alpha/2}) = 1 - \alpha \qquad (*)$$
From 4.7, $Z = \dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}}$, so replace $Z$ by $\dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}}$ in (*). This gives
$$P\left(-z_{\alpha/2} \le \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \le z_{\alpha/2}\right) = 1 - \alpha.$$
We aim to rearrange the above to get $\mu$ on its own in the middle. Multiplying through by $\sigma / \sqrt{n}$, subtracting $\bar{X}$ and then multiplying by $-1$ (which reverses the inequalities) gives
$$P\left(\bar{X} - z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + z_{\alpha/2} \frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha.$$
Hence a two-sided $100(1 - \alpha)\%$ confidence interval for $\mu$ where $\sigma$ is known is given by
$$\left[\bar{X} - z_{\alpha/2} \frac{\sigma}{\sqrt{n}}, \; \bar{X} + z_{\alpha/2} \frac{\sigma}{\sqrt{n}}\right].$$

Example 4.5
The weight of canned tuna in nominal 16-ounce cans has a Normal distribution with a variance of 0.75. A random sample of 34 cans is weighed and the sample mean is 16.4 ounces. What is the 95% two-sided confidence interval for the population mean weight?
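As a rough illustration of the interval in section 4.8, the following Python sketch applies it to the figures in Example 4.5. It assumes the stated variance 0.75 is the known population variance; the variable names are illustrative only and are not part of these notes, and the z-value is taken from Python's `statistics.NormalDist` rather than from printed tables.

```python
import math
from statistics import NormalDist

# Figures from Example 4.5 (variable names are illustrative assumptions)
sigma = math.sqrt(0.75)   # known population standard deviation
n = 34                    # sample size
xbar = 16.4               # sample mean in ounces
alpha = 0.05              # 95% confidence level

# z_{alpha/2}: the value cutting off an upper tail of alpha/2 under N(0, 1)
z = NormalDist().inv_cdf(1 - alpha / 2)

# Half-width of the interval: z_{alpha/2} * sigma / sqrt(n)
margin = z * sigma / math.sqrt(n)
lower, upper = xbar - margin, xbar + margin

print(f"z = {z:.3f}")
print(f"95% confidence interval for mu: [{lower:.2f}, {upper:.2f}]")
```

Under these assumptions the interval runs from roughly 16.11 to 16.69 ounces.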
4.9 One-sided $100(1 - \alpha)\%$ confidence intervals for $\mu$ where $\sigma$ is known

These are given by
$$\mu \le \bar{X} + z_{\alpha} \frac{\sigma}{\sqrt{n}} \quad \text{or} \quad \left(-\infty, \; \bar{X} + z_{\alpha} \frac{\sigma}{\sqrt{n}}\right]$$
and
$$\mu \ge \bar{X} - z_{\alpha} \frac{\sigma}{\sqrt{n}} \quad \text{or} \quad \left[\bar{X} - z_{\alpha} \frac{\sigma}{\sqrt{n}}, \; \infty\right).$$

Example 4.6
Calculate a 95% one-sided confidence interval for the population mean weight which indicates its maximum value.

4.10 Sampling distribution of $\dfrac{\bar{X} - \mu}{S / \sqrt{n}}$ (population normal)

Suppose we take many random samples of size $n$ from a Normal population with mean $\mu$ and variance $\sigma^2$. For each sample we compute $\bar{X}$, $S$ and $\dfrac{\bar{X} - \mu}{S / \sqrt{n}}$. This latter statistic will vary from sample to sample because both $\bar{X}$ and $S$ will vary from sample to sample.

What is the sampling distribution of $\dfrac{\bar{X} - \mu}{S / \sqrt{n}}$?

We know that the sampling distribution of $\dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}}$ is normal with mean 0 and variance 1, i.e.
$$\frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \sim N(0, 1).$$

It turns out that the sampling distribution of $\dfrac{\bar{X} - \mu}{S / \sqrt{n}}$ is not Normal but is a distribution which is symmetrical about 0 like the standardised normal $N(0, 1)$. In fact the sampling distribution of $\dfrac{\bar{X} - \mu}{S / \sqrt{n}}$ is a Student t-distribution with parameter $\upsilon$ (called the degrees of freedom), where $\upsilon = n - 1$. As $\upsilon \to \infty$ the t-distribution approaches a standardised Normal distribution, i.e.
$$\frac{\bar{X} - \mu}{S / \sqrt{n}} \sim t(n - 1) \quad \text{and as } n \to \infty, \; t(n - 1) \to N(0, 1).$$

Hence we can construct a $100(1 - \alpha)\%$ confidence interval for $\mu$ when the value of $\sigma$ is unknown. It is given by:
$$\left[\bar{X} - t_{\alpha/2} \frac{S}{\sqrt{n}}, \; \bar{X} + t_{\alpha/2} \frac{S}{\sqrt{n}}\right].$$

Example 4.7
A random sample of 169 households in a certain very large area is selected as part of a study of the recreation habits of community residents. The respondents indicate a mean amount spent annually for recreation of £350 per family. The sample standard deviation is £65. Construct a 99% confidence interval for the mean amount spent annually in the area, assuming the amount spent annually is normally distributed.
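The t-based interval of section 4.10 can be sketched in Python for the figures in Example 4.7. This is only an illustration under stated assumptions: the helper functions `t_pdf`, `t_cdf` and `t_upper` are hypothetical names introduced here, and they obtain the t-value numerically (Simpson's rule plus bisection) with 168 degrees of freedom instead of reading it from tables, so the result may differ slightly from a table-based answer.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=1000):
    """P(T <= x) for x >= 0: Simpson's rule on [0, x] plus symmetry about 0."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_upper(alpha, df):
    """Value t such that P(T > t) = alpha, found by bisection on [0, 50]."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < 1 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Figures from Example 4.7 (variable names are illustrative assumptions)
n, xbar, s = 169, 350.0, 65.0
alpha = 0.01                          # 99% confidence level

t = t_upper(alpha / 2, n - 1)         # t_{alpha/2} with n - 1 = 168 d.f.
margin = t * s / math.sqrt(n)         # half-width: t * S / sqrt(n)
lower, upper = xbar - margin, xbar + margin

print(f"t = {t:.3f}")
print(f"99% confidence interval for mu: [{lower:.2f}, {upper:.2f}]")
```

With these inputs the interval comes out at roughly £337 to £363 per family.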
4.11 One-sided $100(1 - \alpha)\%$ confidence intervals for $\mu$ where $\sigma$ is unknown

These are given by
$$\mu \le \bar{X} + t_{\alpha} \frac{S}{\sqrt{n}} \quad \text{or} \quad \left(-\infty, \; \bar{X} + t_{\alpha} \frac{S}{\sqrt{n}}\right]$$
and
$$\mu \ge \bar{X} - t_{\alpha} \frac{S}{\sqrt{n}} \quad \text{or} \quad \left[\bar{X} - t_{\alpha} \frac{S}{\sqrt{n}}, \; \infty\right).$$

4.12 Interpretation of the $100(1 - \alpha)\%$ confidence intervals for $\mu$

We have
$$P\left(\bar{X} - z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + z_{\alpha/2} \frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha \quad \text{when } \sigma \text{ is known}$$
and
$$P\left(\bar{X} - t_{\alpha/2} \frac{S}{\sqrt{n}} \le \mu \le \bar{X} + t_{\alpha/2} \frac{S}{\sqrt{n}}\right) = 1 - \alpha \quad \text{when } \sigma \text{ is unknown.}$$

The random variables here are the endpoints of the interval, not $\mu$, which is an unknown parameter (i.e. an unknown constant), and one can only make probability statements about random variables. So once we have substituted in particular values for $\bar{X}$ and $S$ from the sample, we can no longer make probability statements about the interval, only confidence statements. It is important to realise that we cannot attach a probability to a particular interval, only a confidence level.

However, we can say that we are $100(1 - \alpha)\%$ confident that $\mu$ lies in our constructed interval. This means that if we were to take many random samples of the same size $n$ and, for each of the samples, we constructed a $100(1 - \alpha)\%$ confidence interval for $\mu$, the intervals would all be different as $\bar{X}$ and $S$ vary from sample to sample (see diagram above), but we would expect on average $100(1 - \alpha)\%$ of these intervals to contain $\mu$ and $100\alpha\%$ not to contain $\mu$.

4.13 Distribution of a single mean $\bar{X}$ (population non-normal)

Suppose we take many random samples of size $n$ from a non-normal population (e.g. Binomial, Uniform or Exponential population) with mean $\mu$ and variance $\sigma^2$, and for each sample we measure the sample mean $\bar{X}$.

The Central Limit Theorem tells us that the sampling distribution of $\bar{X}$ is approximately normal with mean $\mu$ and standard deviation $\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}}$, and the approximation improves as $n \to \infty$, i.e. as the sample size becomes very large.
In symbols,
$$\bar{X} \xrightarrow{\text{in distribution}} N\left(\mu, \frac{\sigma^2}{n}\right) \quad \text{as } n \to \infty$$
or
$$\frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \xrightarrow{\text{in distribution}} N(0, 1) \quad \text{as } n \to \infty.$$

4.14 An application of the Central Limit Theorem: sampling distribution of a single sample proportion $\hat{p}$

Suppose we take many random samples of size $n$ from a Binomial population and for each sample we record the proportion $\hat{p}$ that have a certain attribute.

Let $X_1, X_2, \ldots, X_n$ be a random sample with
$X_i = 1$ if "i" has some attribute,
$X_i = 0$ if "i" does not have the attribute.

Example 4.8
Consider a consumers' poll where a random sample of coffee drinkers are asked if they prefer a particular brand of coffee called Goldtaste to any other brand of an equivalent standard. The attribute of interest here is 'they prefer Goldtaste', so that
$X_i = 1$ if the consumer prefers Goldtaste,
$X_i = 0$ if the consumer prefers another brand.

$X_i$ is called a Bernoulli random variable. Suppose that the probability that a consumer prefers Goldtaste is $p$. Then
$$p(x) = P(X = x) = p^x (1 - p)^{1 - x}, \quad x = 0 \text{ or } 1.$$

Let $Y = X_1 + X_2 + X_3 + \cdots + X_n$. Then $Y$ is the number of consumers in the sample who prefer Goldtaste and $Y \sim \text{Bin}(n, p)$.

Let
$$\hat{p} = \frac{Y}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n} = \bar{X}.$$
Then $\hat{p}$ is the sample proportion of consumers who prefer Goldtaste.

So by the Central Limit Theorem, $\hat{p} \sim N(\mu_{\hat{p}}, \sigma_{\hat{p}}^2)$ approximately. Since $E(Y) = np$ and $\text{Var}(Y) = np(1 - p)$, we have
$$\mu_{\hat{p}} = \frac{np}{n} = p, \qquad \sigma_{\hat{p}}^2 = \frac{np(1 - p)}{n^2} = \frac{p(1 - p)}{n}.$$

Generally, by the Central Limit Theorem,
$$\hat{p} \sim N\left(p, \frac{p(1 - p)}{n}\right) \text{ approximately.}$$
Standardising,
$$\frac{\hat{p} - p}{\sqrt{\dfrac{p(1 - p)}{n}}} \sim N(0, 1) \text{ approximately.}$$

Example 4.9
Suppose a random sample of 200 consumers are asked if they prefer Goldtaste to other brands of coffee of an equivalent standard and 119 of them say they do prefer Goldtaste. Obtain a point estimate for the proportion of all coffee consumers who would prefer Goldtaste.
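A small Python sketch may help connect Example 4.9 to the sampling distribution in section 4.14. It computes the point estimate from the figures given, and then, purely as an illustration, simulates repeated samples while pretending that the estimate is the true proportion (an assumption made only for this demonstration). The function name `simulate_p_hat`, the seed and the number of repetitions are arbitrary choices, not part of these notes.

```python
import math
import random
import statistics

# Figures from Example 4.9
n, y = 200, 119
p_hat = y / n                                  # point estimate of p
se_hat = math.sqrt(p_hat * (1 - p_hat) / n)    # estimated sqrt(p(1 - p)/n)
print(f"point estimate p_hat = {p_hat:.3f}")
print(f"estimated standard deviation of p_hat = {se_hat:.4f}")

# Illustration only: treat p_hat as if it were the true p and draw many
# samples of size n to see how the sample proportion varies.
rng = random.Random(0)                         # fixed seed for repeatability

def simulate_p_hat(p, size):
    """Sample proportion from `size` independent Bernoulli(p) draws."""
    return sum(rng.random() < p for _ in range(size)) / size

sims = [simulate_p_hat(p_hat, n) for _ in range(2000)]
sim_mean = statistics.mean(sims)
sim_sd = statistics.stdev(sims)
print(f"simulated mean of p_hat = {sim_mean:.3f}")
print(f"simulated standard deviation of p_hat = {sim_sd:.4f}")
```

The simulated mean and standard deviation should sit close to $p$ and $\sqrt{p(1 - p)/n}$, in line with the Normal approximation stated in section 4.14.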
4.15 An approximate $100(1 - \alpha)\%$ confidence interval for a population proportion $p$

An approximate confidence interval for $p$ is given by
$$\left[\hat{p} - z_{\alpha/2} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}, \; \hat{p} + z_{\alpha/2} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}\right].$$

Example 4.10
In the British Midland survey of 1100 business travellers from eleven European countries, 23% of them thought that the E.U. needed the ECU to improve business in Europe. Obtain a 98% confidence interval for the percentage of all business travellers who would think this way.

4.16 Part (2) of Statistical Inference - Hypothesis testing via confidence intervals

A statistical hypothesis is an assumption which is made concerning either the parameters of the population distribution or the shape of the probability distribution of the population. The assumption is called the null hypothesis and is generally denoted by $H_0$, while the alternative hypothesis is denoted by $H_1$ or $H_A$.

Example 4.11
A random sample of 169 households in a certain very large area is selected as part of a study of the recreation habits of community residents. The respondents indicate a mean amount spent annually for recreation of £350 per family. The sample standard deviation is £65. Test the hypothesis that the mean amount spent annually in the area is £375 per family using a 1% level of significance.

Example 4.12
Based on theoretical considerations, an economist predicts that the average cost of maintaining a car in London is £1,800 per year. To test his theory the economist selects a random sample of 900 London car owners. This sample has a mean of £1,819 and a standard deviation of £300.
(a) Does the sample information indicate that the mean cost is more than the economist believes?
(b) Does the sample information indicate that the economist's theory is incorrect?
Use a 5% level of significance.
(a) So the hypothesis which the economist wishes to test is specified by
$H_0: \mu = 1800$   (Economist's theory is correct),
and the alternative hypothesis is given by
$H_1: \mu > 1800$   (Mean cost is more than the economist believes),
where $\mu$ is the average cost of maintaining a car in London.

(b) The null hypothesis $H_0$ would be the same but the alternative hypothesis $H_1$ would be
$H_1: \mu \ne 1800$   (Economist's theory is incorrect).

4.17 Testing the validity of $H_0$

An assumption $H_0$ could be completely verified if we could examine the whole population. We are usually unable to do this for a variety of reasons, so we have to investigate the validity of $H_0$ on the basis of sample results.

In testing, if we obtain a result which is very different from what was expected under $H_0$, then it can be due to either of two things:
(a) a not very probable event has occurred and $H_0$ is true, or
(b) the null hypothesis $H_0$ is incorrect and should be rejected in favour of a more plausible hypothesis $H_1$.

We conventionally conclude that (b) is the correct conclusion, with the proviso that we could be wrong in our conclusion. The probability of this error is $\alpha$ (i.e. $100\alpha\%$), which is the level of significance of the test. It represents the maximum risk of a wrong decision in rejecting $H_0$ in favour of the alternative hypothesis $H_1$.

If we reject $H_0$ when $H_0$ is true and so should not be rejected, we call this a Type I error, and P(Type I error) = $\alpha$, the level of significance of the test.

In a court of law at the beginning of a court case, the defendant (i.e. the accused) is assumed innocent. Then the evidence is considered and, on the basis of this evidence, the defendant is either found guilty, i.e. the original assumption of innocence is rejected, or found not guilty. It is similar in hypothesis testing. The null hypothesis $H_0$ is assumed true and then the sample evidence is examined to see if it supports the null hypothesis $H_0$ or tends to discredit the null hypothesis $H_0$.
If it discredits $H_0$, then $H_0$ is rejected in favour of a more plausible assumption, i.e. $H_0$ is rejected in favour of $H_1$ at a chosen level of significance.

4.18 Two-tailed and One-tailed Tests

The question of whether a two-tailed test or a one-tailed test is to be performed depends on the alternative hypothesis, e.g.

$H_0: \mu = 15$   $H_1: \mu \ne 15$   Two-tailed test
$H_0: \mu = 15$   $H_1: \mu < 15$   One-tailed test
$H_0: \mu = 15$   $H_1: \mu > 15$   One-tailed test

C. Osborne, March 2000

Worked Example 4.12

Let $X$ = the cost of maintaining a car in London and $X \sim N(\mu, \sigma^2)$. Then $\mu$ is the mean cost of maintaining a car in London.

(a) We are testing here
$H_0: \mu = 1800$   (Economist's theory is correct)
against
$H_1: \mu > 1800$   (Mean cost is more than the economist believes).
We have a one-tailed test here.

Assume $H_0$ is true, i.e. $X \sim N(1800, \sigma^2)$. The level of significance of the test is 5% or 0.05, i.e. $\alpha = 0.05$.

To test this hypothesis we need to construct a one-sided $100(1 - \alpha)\%$ confidence interval for $\mu$ of the form $\mu \ge$ ?, i.e. a one-sided 95% confidence interval for $\mu$ of the form $\mu \ge$ ?. If 1800 lies outside this interval, then we reject $H_0$ at the 5% level of significance. If it does not, i.e. it lies inside the interval, then we cannot reject $H_0$ at the 5% level of significance.

As $\sigma^2$ is unknown, we must use a confidence interval based on the t-distribution. From section 4.11 such a one-sided $100(1 - \alpha)\%$ confidence interval for $\mu$, where $\sigma$ is unknown, is given by
$$\mu \ge \bar{X} - t_{\alpha} \frac{S}{\sqrt{n}} \quad \text{or} \quad \left[\bar{X} - t_{\alpha} \frac{S}{\sqrt{n}}, \; \infty\right).$$
So a 95% one-sided confidence interval for $\mu$ is given by
$$\mu \ge \bar{X} - t_{0.05} \frac{S}{\sqrt{n}}.$$
Here $\bar{X} = 1819$, $S = 300$ and $n = 900$.

The parameter $\upsilon$ (i.e. the number of degrees of freedom) for the t-value is given by $n - 1 = 899$. For $t_{0.05}(899)$ use $t_{0.05}(120)$ in the tables and this gives 1.658. Hence
$$\mu \ge 1819 - 1.658 \times \frac{300}{\sqrt{900}}, \quad \text{i.e.} \quad \mu \ge 1802.42,$$
i.e. a 95% one-sided confidence interval for $\mu$ is given by $[1802.42, \infty)$.

Conclusion
The value 1800 lies outside this interval, so we reject $H_0$ in favour of $H_1$ at the 5% level of significance, i.e. the sample indicates that the mean cost is more than the economist believes.

(b) We are testing here
$H_0: \mu = 1800$   (Economist's theory is correct)
against
$H_1: \mu \ne 1800$   (Economist's theory is incorrect).
We have a two-tailed test here.
Assume $H_0$ is true, i.e. $X \sim N(1800, \sigma^2)$. The level of significance of the test is 5% or 0.05, i.e. $\alpha = 0.05$.

As we have a two-tailed test here, we must construct a two-sided 95% confidence interval for $\mu$ and see if 1800 lies outside this interval. If it does, we reject $H_0$ in favour of $H_1$ at the 5% level of significance. If it does not, i.e. it lies inside the confidence interval, then we cannot reject $H_0$ at the 5% level of significance.

A two-sided $100(1 - \alpha)\%$ confidence interval for $\mu$ when the value of $\sigma$ is unknown is given from section 4.10 by:
$$\left[\bar{X} - t_{\alpha/2} \frac{S}{\sqrt{n}}, \; \bar{X} + t_{\alpha/2} \frac{S}{\sqrt{n}}\right].$$
So a 95% confidence interval for $\mu$ is given by
$$\left[\bar{X} - t_{0.025} \frac{S}{\sqrt{n}}, \; \bar{X} + t_{0.025} \frac{S}{\sqrt{n}}\right] = \left[1819 - t_{0.025} \frac{300}{\sqrt{900}}, \; 1819 + t_{0.025} \frac{300}{\sqrt{900}}\right].$$
Now the degrees of freedom for the t-value are 899 as in part (a), so for $t_{0.025}(899)$ use $t_{0.025}(120)$ in the tables, which is 1.98. Hence a 95% confidence interval for $\mu$ is given by
$$\left[1819 - 1.98 \times \frac{300}{\sqrt{900}}, \; 1819 + 1.98 \times \frac{300}{\sqrt{900}}\right] = [1819 - 19.8, \; 1819 + 19.8] = [1799.2, \; 1838.8].$$

Conclusion
Our value of 1800 lies just inside the 95% confidence interval for $\mu$ given above. Hence we cannot reject $H_0$ at the 5% level of significance, i.e. we say the mean cost of maintaining a car in London is not significantly different from £1800.

N.B. It does not follow that the mean cost of maintaining a car in London is exactly £1800, i.e. just because you cannot reject $H_0$ it does not automatically follow that $H_0$ is true. We can say we are 95% confident that the mean cost of maintaining a car in London is some value between £1799.2 and £1838.8.

If you cannot reject a null hypothesis $H_0$ on the basis of the sample results and you decide to go ahead and accept $H_0$ as being true, you may be committing what is called a Type II error if $H_0$ is false and so should not be accepted. You will consider this further in the Econometric part of the course.

Summary

1. State your null hypothesis $H_0$ and your alternative hypothesis $H_1$.
2.
Depending on the alternative hypothesis $H_1$, determine whether the test is a two-tailed or one-tailed test.
3. If the test is two-tailed, then construct a two-sided $100(1 - \alpha)\%$ confidence interval for the unknown parameter (in our examples the unknown parameter has been $\mu$), e.g. a $100(1 - \alpha)\%$ confidence interval for $\mu$ when the value of $\sigma$ is known is given by:
$$\left[\bar{X} - z_{\alpha/2} \frac{\sigma}{\sqrt{n}}, \; \bar{X} + z_{\alpha/2} \frac{\sigma}{\sqrt{n}}\right].$$
A $100(1 - \alpha)\%$ confidence interval for $\mu$ when the value of $\sigma$ is unknown is given by:
$$\left[\bar{X} - t_{\alpha/2} \frac{S}{\sqrt{n}}, \; \bar{X} + t_{\alpha/2} \frac{S}{\sqrt{n}}\right].$$
4. If the test is one-tailed, then construct a one-sided $100(1 - \alpha)\%$ confidence interval for the unknown parameter. If $H_1$ involves a 'less than' sign (e.g. $\mu < \mu^*$) then construct a one-sided interval of the form $\mu \le$ ? (see (a) and (b) below). If $H_1$ involves a 'greater than' sign (e.g. $\mu > \mu^*$) then construct a one-sided interval of the form $\mu \ge$ ? (see (c) and (d) below), e.g.
(a) A one-sided $100(1 - \alpha)\%$ confidence interval for $\mu$ where $\sigma$ is known is given by
$$\mu \le \bar{X} + z_{\alpha} \frac{\sigma}{\sqrt{n}} \quad \text{or} \quad \left(-\infty, \; \bar{X} + z_{\alpha} \frac{\sigma}{\sqrt{n}}\right].$$
(b) A one-sided $100(1 - \alpha)\%$ confidence interval for $\mu$ where $\sigma$ is unknown is given by
$$\mu \le \bar{X} + t_{\alpha} \frac{S}{\sqrt{n}} \quad \text{or} \quad \left(-\infty, \; \bar{X} + t_{\alpha} \frac{S}{\sqrt{n}}\right].$$
(c) A one-sided $100(1 - \alpha)\%$ confidence interval for $\mu$ where $\sigma$ is known is given by
$$\mu \ge \bar{X} - z_{\alpha} \frac{\sigma}{\sqrt{n}} \quad \text{or} \quad \left[\bar{X} - z_{\alpha} \frac{\sigma}{\sqrt{n}}, \; \infty\right).$$
(d) A one-sided $100(1 - \alpha)\%$ confidence interval for $\mu$ where $\sigma$ is unknown is given by
$$\mu \ge \bar{X} - t_{\alpha} \frac{S}{\sqrt{n}} \quad \text{or} \quad \left[\bar{X} - t_{\alpha} \frac{S}{\sqrt{n}}, \; \infty\right).$$
5. If the value of the parameter specified in the null hypothesis $H_0$ (e.g. $\mu = \mu^*$) is outside the interval, then reject $H_0$ at the $100\alpha\%$ level of significance. If it lies inside the interval, do not reject $H_0$ at the $100\alpha\%$ level of significance.
6. When we construct from the sample results a $100(1 - \alpha)\%$ confidence interval for an unknown parameter, whether a two-sided interval or a one-sided interval, we are $100(1 - \alpha)\%$ confident that the parameter has a value within this interval. This confidence level of $100(1 - \alpha)\%$ is usually large, e.g. 95% or 99%, i.e.
we are highly confident that the value of the unknown parameter lies in this interval. So when the value of the parameter specified by $H_0$ is outside the interval, it is really saying that the sample results do not appear to support $H_0$, so the null hypothesis $H_0$ is suspect or discredited and should be rejected, with the proviso that we could be wrong in that decision to reject $H_0$ with probability $\alpha$. As the confidence level $100(1 - \alpha)\%$ is usually high, the chance of a wrong decision, $100\alpha\%$, will be small, e.g. 5% or 1%.
7. If the confidence level is $100(1 - \alpha)\%$, the level of significance of the test is $100\alpha\%$ and vice versa.

Note
If you do know $\sigma$, the population standard deviation, use confidence intervals for the population mean $\mu$ based on z-values. If you do not know $\sigma$, which is much more likely since the population mean $\mu$ is unknown, use confidence intervals based on t-values.
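The summary procedure can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: the function name `ci_test` and its arguments are hypothetical, and the critical value (z or t) is supplied by the user, as it would be when read from tables. The two calls reuse the figures and the table values 1.658 and 1.98 from Worked Example 4.12.

```python
import math

def ci_test(xbar, s, n, mu0, crit, alternative):
    """Test H0: mu = mu0 by checking whether mu0 lies in a confidence interval.

    s    -- sigma if known (then crit is a z-value), else the sample S (t-value)
    crit -- upper alpha/2 point for a two-tailed test, upper alpha point
            for a one-tailed test
    alternative -- "two-sided" (H1: mu != mu0), "greater" (H1: mu > mu0)
                   or "less" (H1: mu < mu0)
    Returns the interval and True if H0 is rejected.
    """
    margin = crit * s / math.sqrt(n)
    if alternative == "two-sided":
        lower, upper = xbar - margin, xbar + margin
    elif alternative == "greater":      # interval of the form mu >= ?
        lower, upper = xbar - margin, math.inf
    elif alternative == "less":         # interval of the form mu <= ?
        lower, upper = -math.inf, xbar + margin
    else:
        raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")
    reject = not (lower <= mu0 <= upper)
    return (lower, upper), reject

# Worked Example 4.12 (a): one-tailed test, t_0.05(120) = 1.658 from tables
interval_a, reject_a = ci_test(1819, 300, 900, 1800, 1.658, "greater")
print("(a)", interval_a, "reject H0" if reject_a else "cannot reject H0")

# Worked Example 4.12 (b): two-tailed test, t_0.025(120) = 1.98 from tables
interval_b, reject_b = ci_test(1819, 300, 900, 1800, 1.98, "two-sided")
print("(b)", interval_b, "reject H0" if reject_b else "cannot reject H0")
```

Passing the critical value in explicitly keeps the sketch close to the table-based working in these notes; the same function covers the z-based intervals by passing $\sigma$ and a z-value instead.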