Trevor Larsen
Term Project – Part 2

Confidence Intervals

In this section, we look at what confidence intervals are. A confidence interval is a range of values, built from sample data, that corresponds to a stated confidence level. For example: "I'm 95% sure that I will get between 80 and 99 out of 100 on the next test." The confidence level tells you how confident the person making that statement is. Here are a few problems to illustrate the idea.

Interpretation & Discussion:

Problem 1: For the first problem, we need to construct a 95% confidence interval for the mean starting package of students graduating in Engineering. Our alpha is simply one minus the confidence level, which is 0.05. We first use StatCrunch to find the sample mean and standard deviation of the fifty Engineering students in the sample. Then we calculate the margin of error from the standard deviation, the sample size (50), and \(t_{\alpha/2}\) from the t-distribution table: \(E = t_{\alpha/2} \cdot s/\sqrt{n}\). Remember that this is a two-tailed problem, so the 5% that is uncertain is really 2.5% at each end of the distribution. Be sure that when using the t-table you look under 0.05 in two tails, not one tail. We then subtract the error from the sample mean and add it to the sample mean. The resulting confidence interval is $58,425 < μ < $62,903. So overall, we are 95% confident that the population mean starting package for students graduating in Engineering lies within this range.

Problem 2: For the second problem, we construct a 99% confidence interval for the standard deviation of starting packages for Computer Science. Our alpha is 0.01, and again this is a two-tailed problem, so 0.5% is at each end. We start by finding the sample standard deviation, the degrees of freedom (n - 1), and the left and right chi-squared values, which come from the chi-squared distribution table. Since this is two-tailed, use the 0.005 and 0.995 columns. The bounds work differently from the mean: there is no margin of error to subtract and add. Instead, each bound is the square root of the degrees of freedom times the squared standard deviation, divided by one of the chi-squared values: \(\sqrt{(n-1)s^2/\chi^2_R} < \sigma < \sqrt{(n-1)s^2/\chi^2_L}\). The right (larger) chi-squared value goes in the left bound of the interval, because a bigger denominator gives a smaller result. This gives $3,667 < σ < $6,179. Overall, we are 99% confident that the population standard deviation of starting packages for Computer Science graduates lies within this range.

Problem 3: For the final problem in this section, we construct a 90% confidence interval for the proportion of all graduating students who will start out with a package valued over $50,000. Alpha is 0.10, and this is a two-tailed problem, so 5% is at each end. Our sample size is all of the students in the table, which is 350. We find the sample proportion (the proportion of successes) by taking the number of students with packages over $50,000 and dividing by the sample size: \(\hat{p} = 149/350 \approx 0.426\), so a random student has about a 0.426 chance of making over $50,000. Next, we find \(z_{\alpha/2}\) using the normal distribution table (hint: there is a smaller table listing the most common confidence levels with their z-scores). Now we can find the margin of error: the z-score times the square root of the success proportion times the failure proportion divided by the sample size, \(E = z_{\alpha/2}\sqrt{\hat{p}\hat{q}/n}\). The interval is 0.383 < p < 0.470, so we are 90% confident that the population proportion of students starting with a package over $50,000 lies within this range.

Hypothesis Testing

This next section is about using sample data to test a hypothesis, which is a claim that a certain condition holds for the population. We test the claim at a chosen significance level, because sample results will almost never match the claim exactly. Depending on whether the test statistic falls in the critical region, the null hypothesis is labeled either 'rejected' or 'fail to reject'. These conclusions are sometimes presented to important audiences, such as scientists, so they are always written as a formal statement.
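Returning briefly to the confidence-interval section, the margin-of-error calculation in Problem 1 can be sketched in Python. This is a minimal sketch, not the StatCrunch workflow used in the text: the function name `t_interval_mean` is hypothetical, and the critical value is passed in after being looked up in a t table (as the text describes) rather than computed. The Engineering sample statistics are not listed in the text, so the example inputs are hypothetical.

```python
import math


def t_interval_mean(x_bar: float, s: float, n: int, t_crit: float) -> tuple[float, float]:
    """Confidence interval for a population mean when sigma is unknown.

    t_crit is t_{alpha/2} read from a t table with n - 1 degrees of freedom,
    using the two-tails column (0.05 for a 95% interval).
    """
    error = t_crit * s / math.sqrt(n)  # margin of error E = t * s / sqrt(n)
    return x_bar - error, x_bar + error


# Hypothetical inputs (not the Engineering data): x_bar = 100, s = 10, n = 25,
# and t_crit = 2.064, the two-tailed 0.05 table value for 24 degrees of freedom.
low, high = t_interval_mean(100.0, 10.0, 25, 2.064)
print(f"{low:.3f} < mu < {high:.3f}")  # prints: 95.872 < mu < 104.128
```

Because the interval is symmetric, the reported $58,425 < μ < $62,903 can be checked by arithmetic alone: it implies a sample mean of $60,664 and a margin of error of $2,239.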
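The standard-deviation interval from Problem 2 of the confidence-interval section can be sketched the same way. Again this is a hedged sketch: `chi_square_interval_sd` is a hypothetical name, the two chi-squared values are assumed to come from the table (the 0.005 and 0.995 columns for a 99% interval), and the example numbers are made up so the arithmetic is easy to check. They are not the Computer Science data or real table values.

```python
import math


def chi_square_interval_sd(s: float, n: int, chi2_right: float, chi2_left: float) -> tuple[float, float]:
    """Confidence interval for a population standard deviation.

    chi2_right: table value with area alpha/2 to its right (the larger value).
    chi2_left:  table value with area 1 - alpha/2 to its right (the smaller value).
    """
    df = n - 1
    # The larger chi-squared value is the bigger denominator, so it gives the lower bound.
    lower = math.sqrt(df * s**2 / chi2_right)
    upper = math.sqrt(df * s**2 / chi2_left)
    return lower, upper


# Made-up inputs chosen so that the square roots come out exact.
lower, upper = chi_square_interval_sd(s=10.0, n=26, chi2_right=100.0, chi2_left=25.0)
print(lower, upper)  # prints: 5.0 10.0
```

Unlike the mean interval, the result is not centered on s, which is why there is no single margin of error to add and subtract.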
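For Problem 3 of the confidence-interval section, the counts are given in the text (149 of 350 students over $50,000), so a sketch can use them directly. Instead of the z table, it uses the standard library's `statistics.NormalDist().inv_cdf` to obtain \(z_{\alpha/2}\); the function name `z_interval_proportion` is hypothetical.

```python
import math
from statistics import NormalDist


def z_interval_proportion(successes: int, n: int, confidence: float) -> tuple[float, float]:
    """Confidence interval for a population proportion (normal approximation)."""
    p_hat = successes / n  # sample proportion of successes
    q_hat = 1 - p_hat      # sample proportion of failures
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}, about 1.645 for 90%
    error = z * math.sqrt(p_hat * q_hat / n)
    return p_hat - error, p_hat + error


low, high = z_interval_proportion(149, 350, 0.90)
print(f"{low:.3f} < p < {high:.3f}")  # prints: 0.382 < p < 0.469
```

This agrees with the text's 0.383 < p < 0.470 to within about 0.001, a difference that most likely comes from rounding intermediate values by hand.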
Here are the problems we are given and their answers:

Interpretation & Discussion:

Problem 1: For the first problem, we are given this claim with a 0.05 significance level: "Students graduating in Human and Social Sciences will start off with an average of under $38,000." First, we need to understand what we are testing. We always test the null hypothesis. Since our claim fits the alternative hypothesis (μ < $38,000), our null hypothesis says that the mean is equal to $38,000. Alpha is our significance level, and we get the sample mean and standard deviation from StatCrunch. Next, we solve for the test statistic: subtract the claimed population mean from the sample mean, then divide by the standard deviation divided by the square root of the sample size, \(t = (\bar{x} - \mu)/(s/\sqrt{n})\). The next step can be done with one of two methods: the critical value method or the P-value method. For convenience, we use the critical value method for this problem. Using the significance level and the degrees of freedom (sample size - 1), we find in the t-distribution table the critical value that separates the safe (non-critical) zone from the critical zone. An important note to take into account: the claim is not an equality, it is a less-than. This tells us that this is a left-tailed test, so make sure you look under the correct (one-tail) column; the critical value is negative. Since the test statistic falls in the safe zone, we fail to reject the null hypothesis. Here is the final statement answering the question: "There is not sufficient evidence to support the claim that students graduating in Human and Social Sciences will start off with an average of under $38,000."

Problem 2: For the last problem, we are given this claim with a 0.01 significance level: "80% of the students graduating with a college degree will find a starting package valued over $40,000." Again, alpha is the significance level. This time, the claim matches the null hypothesis, p = 0.80. The alternative hypothesis is the exact opposite, p ≠ 0.80.
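The test statistic and left-tailed decision from Problem 1 of the hypothesis-testing section can be sketched as follows. The Human and Social Sciences sample statistics are not given in the text, so the inputs below are hypothetical, as are the function names; the critical value is assumed to be the positive one-tail value read from the t table.

```python
import math


def t_test_statistic(x_bar: float, mu0: float, s: float, n: int) -> float:
    """Test statistic t = (x_bar - mu0) / (s / sqrt(n))."""
    return (x_bar - mu0) / (s / math.sqrt(n))


def left_tailed_decision(t_stat: float, t_crit: float) -> str:
    """Critical value method for a left-tailed test (H1: mu < mu0).

    t_crit is the positive one-tail table value for alpha with n - 1 df;
    the critical zone is everything at or below -t_crit.
    """
    return "reject H0" if t_stat <= -t_crit else "fail to reject H0"


# Hypothetical sample: mean $37,500, standard deviation $4,000, n = 50.
t_stat = t_test_statistic(37_500, 38_000, 4_000, 50)
# 1.677 is approximately the one-tail 0.05 table value for 49 degrees of freedom.
print(round(t_stat, 3), left_tailed_decision(t_stat, 1.677))  # prints: -0.884 fail to reject H0
```

Because the critical value is passed in rather than hard-coded, the same decision function works for any significance level the table covers.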
First, we get our sample proportions of successes and failures. Our sample size is all of the students. Next, we solve for the test statistic: subtract the population proportion from the sample proportion, then divide by the square root of the population proportion of success times the population proportion of failure divided by the sample size, \(z = (\hat{p} - p)/\sqrt{pq/n}\). For this step, we use the P-value method. The P-value is the probability of observing a test statistic at least as extreme as the one observed, assuming the null hypothesis is true. For our problem, the test statistic is a z-score, which can be used to find the P-value. If the P-value is less than or equal to the significance level, we reject the null hypothesis; if it is greater than the significance level, we fail to reject. Since our P-value is greater than the significance level, we fail to reject the null hypothesis and write this formal statement: "There is not sufficient evidence to warrant rejection of the claim that 80% of the students graduating with a college degree will find a starting package valued over $40,000."

Conditions for Confidence Intervals and Hypothesis Tests

Conditions for the Confidence Intervals for Proportions
1. The sample is a simple random sample.
2. The conditions for the binomial distribution are satisfied: there is a fixed number of trials, and there are only two possible categories of outcomes.
3. There are at least 5 successes and at least 5 failures.

Conditions for the Confidence Intervals for the Mean
1. The sample is a simple random sample.
2. The population is normally distributed or n > 30.

Conditions for the Confidence Intervals for Standard Deviations
1. The sample is a simple random sample.
2. The population must have normally distributed values.

Our third problem (the confidence interval for a proportion) satisfies all three conditions. The samples are random. The binomial conditions are met, since a student either makes over $50,000 or does not.
Finally, there are at least 5 successes and at least 5 failures, with 149 students making over $50,000 and 201 not making over $50,000.

Our fifth problem (the hypothesis test for a proportion) also satisfies all three conditions. The samples are random. The binomial conditions are met, since a student either makes over $40,000 or does not. Finally, there are at least 5 successes and at least 5 failures, with 269 students making over $40,000 and 81 not making over $40,000.

Our first and fourth problems (the confidence interval and the hypothesis test for a mean) satisfy both conditions. The samples are random, and the sample size is greater than 30: both used a sample of 50 students.

Our second problem (the confidence interval for a standard deviation) does have random samples. However, the data are not normally distributed right now; they are skewed to the right. We might speculate that a substantially larger sample would look more normal, but that cannot be counted on: a larger sample only mirrors the shape of the population more closely, so if the population itself is skewed to the right, the second condition is still not satisfied.

There are a couple of ways that errors could have been made during these problems:
• The sampling method might not have been random. With any other sampling method, there would probably be different results.
• The distribution behind the standard deviation problem is uncertain. In the sample, the graph appears skewed to the right. A much larger sample would show the population's shape more clearly, but we cannot be sure that shape is normal, because we have no data on the whole population.
• The sample size might not be large enough. If we based our results on only 5 people, the result would be off, because such a small sample does not reflect the statistics for the whole population. On the same note, if we had 20 successes and 1 failure in a binomial setting, the statistic would be far from accurate, because the sample fails the condition of at least 5 failures. Without enough failures, we would not get an accurate reading on which to base conclusions about the population.
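Returning to Problem 2 of the hypothesis-testing section, its P-value method can be sketched with the counts reported in the conditions discussion (269 of 350 students over $40,000). This sketch assumes, as the text implies, a two-tailed alternative p ≠ 0.80, and it uses the standard library's `statistics.NormalDist().cdf` in place of a z table; `proportion_z_test` is a hypothetical name.

```python
import math
from statistics import NormalDist


def proportion_z_test(successes: int, n: int, p0: float) -> tuple[float, float]:
    """Two-tailed z test for a population proportion; returns (z, P-value)."""
    p_hat = successes / n
    # The standard error uses the claimed proportion p0, not p_hat.
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    p_value = 2 * NormalDist().cdf(-abs(z))  # area in both tails beyond |z|
    return z, p_value


z, p_value = proportion_z_test(269, 350, 0.80)
alpha = 0.01
decision = "reject H0" if p_value <= alpha else "fail to reject H0"
print(f"z = {z:.2f}, P-value = {p_value:.4f}: {decision}")
```

The z-score comes out near -1.47 and the P-value near 0.14, well above 0.01, which matches the text's decision to fail to reject the null hypothesis.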
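The numeric parts of the conditions listed above can also be checked in code. This sketch covers only the countable checks (at least 5 successes and 5 failures, and n > 30 for the mean); whether the sample is random, or whether the population is normal, cannot be verified this way. The function names are hypothetical.

```python
def proportion_counts_ok(successes: int, n: int, minimum: int = 5) -> bool:
    """True when there are at least `minimum` successes and `minimum` failures."""
    failures = n - successes
    return successes >= minimum and failures >= minimum


def mean_sample_size_ok(n: int) -> bool:
    """Rule of thumb used when the population is not known to be normal."""
    return n > 30


checks = {
    "third problem (149 of 350)": proportion_counts_ok(149, 350),
    "fifth problem (269 of 350)": proportion_counts_ok(269, 350),
    "20 successes, 1 failure": proportion_counts_ok(20, 21),
    "mean problems (n = 50)": mean_sample_size_ok(50),
}
for label, ok in checks.items():
    print(f"{label}: {'met' if ok else 'not met'}")
```

The 20-successes, 1-failure case from the error discussion is the only one that fails, because it has just one failure.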
I conclude that we will get better statistical results when we find confidence intervals and hypothesis tests for the proportions and means of the population, because there are fewer ways to make errors with these methods. The one thing we need to be careful of, regardless of which method we choose, is the sampling method. It needs to be solidly random; otherwise, potentially big errors will be made.