Decision Making by Statistical Inference

Introduction

Suppose a representative sample giving some information about a mean or other statistical quantity has been collected from an experiment or research study. Two questions that arise from this information are:

1. Are the sample quantity and the corresponding population quantity close enough together that it is reasonable to say the sample might have come from the population?
2. Are they far enough apart that they likely represent different populations?

Answering these questions requires statistical decisions based on probability; hence we should make the answers as quantitative as we can. Decisions made about a population on the basis of sample information are called statistical decisions. For example, the decision could be whether one procedure, factor or service is better than another, based on sample and population information.

Types of Inferences or Decisions

Decisions fall into two categories: those based on population information and those based on sample information. In the first case, we already know the variance or standard deviation of the population, usually from previous measurements, and the normal distribution can be used for calculations. This case usually uses the Z-score as the test statistic. In the second case, we estimate the variance or standard deviation of the population from the sample used in the research itself; here the test statistic is usually the t-score.

Hypothesis

Before decisions are made, assumptions about the population must first be made. These assumptions may or may not be true, but they are general statements about the probability distribution of the population. The assumptions are called statistical hypotheses. A statistical hypothesis is often formulated for the purpose of rejecting or nullifying it. It is usually stated that the sample came from the population, i.e.
that there is no difference between the sample mean and the population mean, as the case may be. This type of hypothesis is called the null hypothesis and is denoted by H0. Any hypothesis that differs from the null hypothesis is called an alternative hypothesis and is usually denoted by H1.

Test of Hypothesis

A procedure leading to a decision about the null hypothesis is called a test of a hypothesis. Hypothesis-testing procedures rely on the information in a random sample from the population of interest. If this information is consistent with the null hypothesis, we do not reject it; if it is inconsistent with the null hypothesis, we conclude that the null hypothesis is false and reject it in favor of the alternative.

Here we are testing the hypothesis that a sample is similar enough to a particular population that it might have come from that population. In that case, if the hypothesis is true, all disagreement between sample and population is due to random variation, and we say that the sample is consistent with the population. Specifically, we make the null hypothesis that the sample came from a population having the stated value of the population characteristic, which is the mean in this case. Then we do calculations to see how reasonable such a hypothesis is. We have to keep in mind the alternative that applies if the null hypothesis is not true, as the alternative affects the calculations.

Procedure

The procedure for tests of significance can be summarized as follows:

1. State the null hypothesis in terms of a population parameter, such as µ.
2. State the alternative hypothesis in terms of the same population parameter.
3. State the test statistic, substituting quantities given by the null hypothesis but not the observed values. State which values of the test statistic will indicate that the difference may be significant, and which statistical distribution is being used.
4.
Show calculations assuming that the null hypothesis is true.
5. Report the observed level of significance, or else compare the value of the test statistic with a critical value such as a tabulated value.
6. State a conclusion. That might be either to accept the null hypothesis, or else to reject the null hypothesis in favor of the alternative hypothesis.

If the evidence is not strong enough to reject the null hypothesis, it is tentatively accepted, but that might be changed by further evidence. Statistical analysis cannot prove that the null hypothesis is correct. Instead of saying that the null hypothesis is accepted, it is often better to say just that the null hypothesis is not rejected.

In many instances we choose a critical level of significance before observations are made. The most common choices for the critical level of significance are 10%, 5% and 1%. If the observed level of significance is smaller than a particular critical level of significance, we say that the result is statistically significant at that level of significance. If the observed level of significance is not smaller than the critical level of significance, we say that the result is not statistically significant at that level of significance.

Error in Decision Making

This decision procedure can lead to either of two wrong conclusions. The four possible situations, which determine whether the final decision is correct or in error, are presented in Table 1. Rejecting the null hypothesis H0 when it is true is defined as a Type I error. That is, a Type I error is committed when a researcher rejects a null hypothesis that should not have been rejected. A Type II error is committed when a researcher fails to reject (accepts) a null hypothesis that should have been rejected.

Level of Significance

In testing a given hypothesis, the maximum probability with which a researcher is willing to risk a Type I error is called the level of significance.
That is, the probability of making a Type I error is denoted by the Greek letter α. It is called the α-error, or the size of the test. Similarly, the probability of making a Type II error is denoted by β:

β = P(Type II error) = P(fail to reject H0 when H0 is false)

Usually levels of significance of 0.05, 0.01 and 0.005 are used; 0.05 is generally accepted in engineering. A level of 0.05 (0.01) means that there are about 5 chances (1 chance) in 100 that we would reject the null hypothesis when it should not be rejected. In other words, when the null hypothesis is true, we make the right decision about 95% (99%) of the time.

The power of a statistical test is the probability of rejecting the null hypothesis H0 when the alternative hypothesis is true. It is usually calculated from the probability of a Type II error as Power = 1 − β.

Test Statistic

For the normal distribution assumed for hypothesis testing (shown in Fig 1), the standardized variable Z, called the Z-score, is used as the test statistic for large samples when the population parameters are known. The Z-score is given by Equation 3:

$$
Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \qquad (3)
$$

where X̄ is the sample statistic (the sample mean in this case), µ is the population parameter (the population mean in this case), σ is the population standard deviation (σ² is the variance) and n is the sample size.

Decision Region for the Z-score

If the calculated Z-score lies between −1.96 and +1.96 (for α = 0.05 in a two-tailed test), the null hypothesis is not rejected, because the score lies within the acceptance region. This does not prove that the null hypothesis is true. However, if the Z-score falls outside this acceptance region, the score is said to be significant at the 0.05 level of significance and the hypothesis is rejected (see Fig 2).

One-Tailed or Two-Tailed Tests

In constructing hypotheses, we will always state the null hypothesis as an equality so that the probability of Type I error α can be controlled at a specific value. The alternative hypothesis might be either one-sided or two-sided, depending on the conclusion to be drawn if H0 is rejected.
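The Z-score and the decision regions described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original notes: the function name `z_test` and the `tail` labels are assumptions made for this sketch, and the numbers in the usage line are those of the propellant example (Example 1) later in these notes. Critical values come from the standard library's `statistics.NormalDist` rather than from a printed table.

```python
from math import sqrt
from statistics import NormalDist


def z_test(sample_mean, mu0, sigma, n, alpha=0.05, tail="two"):
    """Z-test on a mean when the population standard deviation is known.

    tail is "two", "upper" or "lower" (labels chosen for this sketch).
    Returns (z, critical value, whether H0 is rejected).
    """
    std_normal = NormalDist()  # standard normal: mean 0, sd 1
    z = (sample_mean - mu0) / (sigma / sqrt(n))

    if tail == "two":
        # alpha is split equally between the two tails
        z_crit = std_normal.inv_cdf(1 - alpha / 2)
        reject = abs(z) > z_crit
    elif tail == "upper":
        # all of alpha sits in the upper tail
        z_crit = std_normal.inv_cdf(1 - alpha)
        reject = z > z_crit
    elif tail == "lower":
        # all of alpha sits in the lower tail
        z_crit = std_normal.inv_cdf(1 - alpha)
        reject = z < -z_crit
    else:
        raise ValueError("tail must be 'two', 'upper' or 'lower'")
    return z, z_crit, reject


z, z_crit, reject = z_test(51.3, 50.0, 2.0, 25, alpha=0.05, tail="two")
print(f"z = {z:.2f}, critical value = {z_crit:.2f}, reject H0: {reject}")
# prints: z = 3.25, critical value = 1.96, reject H0: True
```

Calling the same function with `tail="upper"` uses the one-sided critical value (about 1.645 for α = 0.05), which is why a one-tailed test rejects H0 for smaller values of z in the stated direction.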
Fig 2 shows the two-tailed, upper-tail and lower-tail tests and the critical regions required in making decisions.

If the objective is to make a claim involving statements such as greater than, less than, superior to, exceeds, at least, and so forth, a one-sided alternative is appropriate. If no direction is implied by the claim, or if the claim "not equal to" is to be made, a two-sided alternative should be used.

P-Values in Hypothesis Tests

The P-value is the probability that the test statistic will take on a value that is at least as extreme as the observed value of the statistic when the null hypothesis H0 is true. Thus, a P-value conveys much information about the weight of evidence against H0, and so a decision maker can draw a conclusion at any specified level of significance. Hence the P-value is the smallest level of significance that would lead to rejection of the null hypothesis H0 with the given data. For a test based on the Z-score with observed value z0:

$$
P =
\begin{cases}
2\,[1 - \Phi(|z_0|)] & \text{two-tailed test } (H_1: \mu \neq \mu_0) \\
1 - \Phi(z_0) & \text{upper-tailed test } (H_1: \mu > \mu_0) \\
\Phi(z_0) & \text{lower-tailed test } (H_1: \mu < \mu_0)
\end{cases}
$$

Here, Φ(z0) is the standard normal cumulative distribution function.

Clearly, the P-value provides a measure of the credibility of the null hypothesis. Specifically, it is the risk that we have made an incorrect decision if we reject the null hypothesis H0. The P-value is not the probability that the null hypothesis is false, nor is (1 − P) the probability that the null hypothesis is true.

Examples Involving Population and Sample Parameters

Decisions for the Mean When Variance Is Known

We may have some previous data giving the variance or standard deviation of the population, and it may be reasonable to assume that the previous value of the variance still applies.

Example 1

Research personnel tested a new component used to produce a propellant whose performance is determined by the burning rate. The standard burning rate must be 50 cm/s for a propellant to be acceptable, with a standard deviation of 2 cm/s. The mean burning rate from 25 samples obtained by the research officer was 51.3 cm/s.
At a level of significance of 0.05, does the new propellant meet the specification? What conclusion can you draw from the experiment?

Solution (as discussed in class)

Following the stated procedure:

1. The parameter of interest is µ, the mean burning rate.
2. H0: µ = 50 centimeters per second
3. H1: µ ≠ 50 centimeters per second
4. α = 0.05
5. The test statistic is given by

$$
z_0 = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}
$$

6. Decision criterion: if −1.96 < z0 < 1.96, do not reject H0; otherwise reject H0.
7. Computation: z0 = (51.3 − 50) / (2 / √25) = 3.25
8. Decision: since z0 = 3.25 > 1.96, we reject H0: µ = 50 at the 0.05 level of significance. Stated more completely, we conclude that the mean burning rate differs from 50 centimeters per second, based on a sample of 25 measurements.

Conclusion

There is strong evidence that the mean burning rate exceeds 50 centimeters per second; hence the new propellant does not meet the specification for propellants.

Example 2

Research personnel tested a new component used to produce a propellant whose performance is determined by the burning rate. The minimum standard burning rate is 50 cm/s for a propellant to be acceptable, with a standard deviation of 2 cm/s. The mean burning rate from 25 samples obtained by the research officer was 51.3 cm/s. At a level of significance of 0.05, is there evidence to support the claim that the burning rate of the new propellant is greater than the specification?

Note: because we have a minimum specification and the aim is to show that the parameter of interest is better, we apply the one-tailed (upper-tail) test as shown below.

Following the stated procedure:

1. The parameter of interest is µ, the mean burning rate.
2. H0: µ = 50 centimeters per second
3. H1: µ > 50 centimeters per second
4. α = 0.05
5. The test statistic is given by

$$
z_0 = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}
$$

6. Decision criterion: if z0 > z0.05, reject H0; otherwise do not reject H0. For α = 0.05, z0.05 = 1.645.
7. Computation: z0 = (51.3 − 50) / (2 / √25) = 3.25
8. Decision: since z0 = 3.25 > 1.645, we reject H0: µ = 50 at the 0.05 level of significance for the upper-tail test.
Stated more completely, we conclude, based on a sample of 25 measurements, that the mean burning rate is greater than the minimum requirement of 50 centimeters per second.

Conclusion

There is strong evidence that the mean burning rate exceeds 50 centimeters per second; hence the new propellant is better, with a burning rate greater than 50 cm/s.

Using the P-value to Make Decisions

Considering Example 1, the calculated Z-score was 3.25, so the P-value is

P-value = 2[1 − Φ(Zcal)] = 2[1 − Φ(3.25)]

From the table of standard normal probabilities, Φ(3.25) = 0.999423, so

P = 2[1 − 0.999423] = 0.0012

Thus, H0: µ = 50 would be rejected at any level of significance α ≥ P-value = 0.0012; i.e. H0 would be rejected at α = 0.01 or 0.05 but not at α = 0.001.

Summary

Example 3

It is very important that a certain solution in a chemical process have a pH of 8.30. The method used gives measurements which are approximately normally distributed about the actual pH of the solution, with a known standard deviation of 0.020. We decide to use 5% as the critical level of significance.

Solution

Exercise

The average daily amount of scrap from a mould manufacturing process is 25.5 kg, with a standard deviation of 1.6 kg. An engineer recently employed by the company designed a process modification in an attempt to reduce this waste and obtained the following daily scrap amounts (kg) in a 10-day trial period: 24.0, 21.9, 23.5, 25.2, 22.0, 21.0, 24.5, 25.0, 25.1, 22.8. If the normal distribution applies, determine whether the modification reduced the scrap level, using a significance level of 5%. If the level of significance is changed to 1%, what would be the conclusion on the modification?

The standard deviation of a particular dimension on a machine part is known to be 0.0053 inches. Four parts coming off the production line are measured, giving readings of 2.747, 2.740, 2.750 and 2.749 inches. The population mean is supposed to be 2.740 inches.
The normal distribution applies.
a) Is the sample mean significantly larger than 2.740 inches at the 1% level of significance?
b) What is the probability of a Type II error (i.e., of accepting the null hypothesis of part (a) when in fact the true mean is 2.752 inches)? Assume the standard deviation remains unchanged.

A manufacturer produces a special alloy steel with an average tensile strength of 25,800 psi. The standard deviation of the tensile strength is 300 psi. Strengths are approximately normally distributed. A change in the composition of the alloy is tried in an attempt to increase its strength. A sample consisting of eight specimens of the new composition is tested. Unless an increase in the strength is significant at the 1% level, the manufacturer will return to the old composition. The standard deviation is not affected.
a) If the mean strength of the sample of eight items is 26,100 psi, should the manufacturer continue with the new composition?
b) What is the minimum mean strength that will justify continuing with the new composition?

Probability of Type II Error (β)

The probability of a Type II error is used in determining the sample size of experiments, given the level of confidence required for the research. Thus β can be calculated from

Where

Sample Size Formula

The sample size for given probabilities of Type I and Type II errors can be obtained from

Hypothesis Tests on the Mean

When the population statistics are not known, which is usually the case in research, the t statistic is used in place of the Z statistic and the same procedure is followed. The summary is as presented.

Example

An experiment was performed in which 15 materials were selected at random and their coefficients of friction measured in a laboratory. In the experiment the materials were rolled on special equipment in the laboratory, where all external parameters were held constant, and the values listed below were obtained.
It is of interest to determine whether there is evidence (with α = 0.05) to support a claim that the mean coefficient of friction exceeds 0.82. The observations follow:

Solution

We assume that the normal distribution applies. Since we are interested in whether the mean value exceeds 0.82, a one-tailed test is used.

1. The parameter of interest is the mean coefficient of friction, µ.

Hypothesis Testing Procedures on the Variance and Standard Deviation of a Normal Population

Following the previous procedures, the hypotheses and the test statistic are as listed below.

Where the null hypothesis

Where χ² is obtained from the chi-square table.

Decision criteria for one-sided tests are given as

Example

To obtain the chi-squared value at 0.025 for a two-tailed test

Test of Two Samples

Hypothesis Tests for a Difference in Means, Variances Known

Following the earlier procedures, the test summary is as shown.

Example