Steps for Statistical Inference

For our class, statistical inference will fall into one of the following categories:
1. Confidence Intervals - primarily used to estimate an unknown parameter
2. Hypothesis Testing - used when we disagree with a hypothesized value of a parameter

Step 1: Identify which inference method to use. The wording of the question will tell you which choice to make. For example, if the problem asks “to estimate” or “what is,” this indicates a confidence interval. For instance, if the research question were “What is the true mean GPA difference between males and females?” or “Estimate the true mean GPA difference between males and females,” we would calculate a confidence interval. However, if the question states that a parameter value currently exists and we disagree with it, or we want to demonstrate statistically that the true parameter differs from some value, we would use hypothesis testing. If the question were “We want to show that there is a difference between the mean GPA of males and females,” this would imply that a hypothesis test should be conducted where the hypothesized value is zero (i.e., no difference).

Step 2: After determining the inference method, we identify the type of variable and how many populations are involved, so that we can select either a proportions or a means technique. In the example from Step 1, GPA is continuous, and the wording states “mean,” so the inference method would involve means. Then, since we are comparing males and females, two independent groups, the inference method would be a two-means analysis.

Step 3: From here the process should be straightforward. If doing the work by hand, we plug the sample statistics into the formula that matches our decisions in Steps 1 and 2. If using statistical software, we simply select the correct method based on those same steps.
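The decisions in Steps 1 and 2 can be written down as a small lookup, which is roughly what picking a procedure in statistical software amounts to. The following is a minimal Python sketch under that assumption; the function name choose_method and the string labels it accepts and returns are hypothetical, chosen only for illustration, and are not part of any statistics package.

```python
def choose_method(goal: str, variable_type: str, groups: int) -> str:
    """Hypothetical helper: turn the answers to Steps 1 and 2 into a procedure name.

    goal:          "estimate" -> confidence interval, "test" -> hypothesis test (Step 1)
    variable_type: "categorical" -> proportions, "quantitative" -> means (Step 2)
    groups:        1 for a single population, 2 for two independent groups (Step 2)
    """
    if goal not in ("estimate", "test"):
        raise ValueError("goal must be 'estimate' or 'test'")
    if variable_type not in ("categorical", "quantitative"):
        raise ValueError("variable_type must be 'categorical' or 'quantitative'")
    if groups not in (1, 2):
        raise ValueError("groups must be 1 or 2")

    parameter = "proportion" if variable_type == "categorical" else "mean"
    # One group -> "one-mean" / "one-proportion"; two groups -> "two-means" / "two-proportions"
    prefix = f"one-{parameter}" if groups == 1 else f"two-{parameter}s"
    method = "confidence interval" if goal == "estimate" else "hypothesis test"
    return f"{prefix} {method}"


# The GPA question from Step 1: "We want to show that there is a difference ..."
print(choose_method("test", "quantitative", 2))  # two-means hypothesis test
```

The helper deliberately covers only the four procedure families named in these notes; anything else (for example, paired data) would need its own branch.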
Step 4: Use the output (or the hand-calculated results) to make decisions and draw conclusions, remembering that these relate back to the population of interest. If calculating a confidence interval: “We are 95% (or whatever the level of confidence) confident that the true mean GPA difference between males and females is between (lower limit) and (upper limit).” If conducting a hypothesis test, provide the pertinent details: “With a p-value of 0.002 and using a 0.05 level of significance, we reject Ho that there is no difference in mean GPA between males and females.”

Examples

Eg 1: In a marketing survey for an automobile manufacturer, 90 randomly selected adults are asked which car color they would choose if a particular car were available in either blue or red. Of the 90 respondents, 53 said “blue.” Let p = the population proportion that would say “blue.” The manufacturer wants to learn whether a majority of buyers would pick blue.

Step 1: The problem asks “to learn if a majority of buyers would pick blue.” A majority means more than 50%, which suggests a hypothesis test with 0.5 as the hypothesized value.

Step 2: Color is a categorical variable, and we are counting how many respondents would choose blue. This calls for a proportion, and since there is only one proportion of interest, a one-proportion procedure is appropriate. (That is, we are interested only in the percentage choosing blue; we are not comparing that percentage from one year to another, nor comparing it to the percentage choosing another color.)

Step 3: Use the steps for a hypothesis test of one proportion.
1. Ho: p = 0.5 versus Ha: p > 0.5
2. Significance level is 0.05
3. Test statistic:
$$z = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}} = \frac{53/90 - 0.5}{\sqrt{0.5(1 - 0.5)/90}} = 1.69$$
Verify that the normal approximation is appropriate: n*p0 = 90(0.5) = 45 and n*(1 - p0) = 90(0.5) = 45, which are both large enough (greater than 15), so the use of the normal approximation is supported.
4. p-value: Since our Ha is “greater than,” the p-value is P(Z > 1.69).
From the standard normal table, the cumulative probability for a z-value of 1.69 is 0.9545. Since we want the probability of being greater than 1.69, we subtract this cumulative probability from one: the p-value is 1 - 0.9545 = 0.0455.
5. Decision: Since the p-value of 0.0455 is less than 0.05, we reject Ho.
6. Conclusion: With a p-value of 0.0455 and an alpha value of 0.05, we reject Ho and conclude that a majority of adults would prefer a blue car over red.

Eg 2: PSU claims that the average SAT Math (SATM) score for the incoming fall 2005 class at University Park was approximately 610. You believe this to be too high. The descriptive statistics are: sample size 216, sample mean 599, and sample standard deviation 85.3.

Step 1: The problem states that you believe the true mean SATM score is less than 610. The 610 is a hypothesized value, so a hypothesis test is in order.

Step 2: SATM is a quantitative variable, and the question uses the word “mean.” Since we are interested in only one population, SATM scores for the fall 2005 class, the correct procedure is a one-mean hypothesis test. If we wanted to compare SATM scores between, say, two different years, or between incoming freshman males and females, then a two-means test would be appropriate.

Step 3: Use the steps for a hypothesis test of one mean.
1. Ho: μ = 610 versus Ha: μ < 610
2. Significance level is 0.05
3. Test statistic:
$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{599 - 610}{85.3/\sqrt{216}} = -1.89$$
4. p-value: Since our Ha is “less than,” the p-value is P(T < -1.89). Because the t-distribution is symmetric, P(T < -1.89) is the same as P(T > 1.89). This is why we use P(T > |t|), the absolute value of the t-test statistic, when using the table. The degrees of freedom (df) for this test equal the sample size minus one, or 215. Since the T-table does not have a row for 215, we use the row for 100.
Going across the row for df = 100, we look for the absolute value of our test statistic, 1.89. We cannot find it, but notice that it falls between 1.660 and 1.984. At the top of these two columns, the “right tail probability” headings are t0.05 and t0.025, respectively. Since the t-test statistic falls between 1.660 and 1.984, the p-value is between 0.025 and 0.05, or 0.025 < p-value < 0.05. [Note that we write the smaller value first. Some make the mistake of writing 0.05 < p-value < 0.025, but this is not possible: how could a p-value be greater than 0.05 and at the same time less than 0.025?]
5. Decision: Since the range of possible p-values lies entirely below 0.05, we reject Ho.
6. Conclusion: With a p-value range of 0.025 < p-value < 0.05 and an alpha value of 0.05, we reject Ho and conclude that the mean SATM score for the 2005 PSU incoming freshman class is less than 610.

Note: If in either of these two tests the alternative hypothesis (Ha) were “not equal,” we would have conducted the analysis in the same way, except that we would have doubled the one-tailed p-value. This would have given a p-value of 0.091 for the first example and a range of 0.05 < p-value < 0.10 for the second. In both cases we would NOT have rejected Ho.

Eg 3: PSU wants to determine, with 95% confidence, the mean SAT Math (SATM) score for the incoming freshman class of 2005. The descriptive statistics are: sample size 216, sample mean 599, and sample standard deviation 85.3.

Step 1: Now the problem states that PSU wants to determine, or find, the mean SATM score. That is, the university does not have a hypothesized value (i.e., an educated guess) and instead wants to estimate the mean with a 95% level of confidence.

Step 2: SATM is a quantitative variable, and the question uses the word “mean.”
Since we are interested in only one population, SATM scores for the fall 2005 class, the correct procedure is a one-mean confidence interval. If we wanted to estimate the difference between SATM scores for, say, two different years, or between incoming freshman males and females, then a two-means confidence interval would be appropriate.

Step 3: Perform the steps for calculating a one-sample mean confidence interval.
1. The general confidence interval formula: sample statistic ± multiplier × standard error
2. Specific formula for a one-sample mean confidence interval:
$$\bar{x} \pm t^{*} \cdot \frac{s}{\sqrt{n}}$$
3. Find the t-multiplier. We want a 95% level of confidence with 215 degrees of freedom (again using the df = 100 row of the T-table). In the T-table, find the 95% column under Confidence Level and go down this column to the row for df = 100. The resulting t-multiplier is 1.984.
4. Plug the values into the confidence interval formula:
$$599 \pm 1.984 \cdot \frac{85.3}{\sqrt{216}}$$
5. Calculate the interval: the margin of error is
$$1.984 \cdot \frac{85.3}{\sqrt{216}} = 11.515,$$
making the lower bound of the interval 599 - 11.515 = 587.485 and the upper bound 599 + 11.515 = 610.515.
6. Interpret the interval: We are 95% confident that the true mean SATM score for the incoming 2005 freshman class at PSU is between 587.485 and 610.515.

Note: If we wanted to use this interval to test Ho: μ = 610 versus Ha: μ ≠ 610, we would NOT reject Ho, since the interval does (ever so slightly!) contain the hypothesized value of 610.
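For readers who want to check the hand calculations in Eg 1 to Eg 3, here is a minimal Python sketch, assuming Python 3.9+ and only the standard library. The function names (one_proportion_z_test, t_pdf, t_upper_tail, one_mean_t_test, one_mean_t_interval) are hypothetical helpers written for this illustration, not part of any package. Because the standard library has no t-distribution, the t tail area is approximated by Simpson's-rule integration of the t density; that numerical method is a swapped-in stand-in for the printed T-table, and in practice a statistics package would normally be used instead.

```python
import math
from statistics import NormalDist


def one_proportion_z_test(successes: int, n: int, p0: float) -> tuple[float, float]:
    """Upper-tailed test of Ho: p = p0 versus Ha: p > p0, as in Eg 1."""
    p_hat = successes / n
    se = math.sqrt(p0 * (1 - p0) / n)  # standard error is computed under Ho, using p0
    z = (p_hat - p0) / se
    p_value = 1 - NormalDist().cdf(z)  # P(Z > z)
    return z, p_value


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(x: float, df: int, steps: int = 2000) -> float:
    """Approximate P(T > |x|) by Simpson's rule on [0, |x|] (steps must be even).

    This numerical integration stands in for the printed T-table.
    """
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_a = total * h / 3
    return 0.5 - area_0_to_a  # by symmetry, half the distribution lies above 0


def one_mean_t_test(xbar: float, s: float, n: int, mu0: float) -> tuple[float, float]:
    """Lower-tailed test of Ho: mu = mu0 versus Ha: mu < mu0, as in Eg 2."""
    t = (xbar - mu0) / (s / math.sqrt(n))
    df = n - 1
    # P(T < t): equals P(T > |t|) when t <= 0; otherwise take the complement.
    p_value = t_upper_tail(t, df) if t <= 0 else 1 - t_upper_tail(t, df)
    return t, p_value


def one_mean_t_interval(xbar: float, s: float, n: int,
                        t_multiplier: float) -> tuple[float, float]:
    """xbar +/- t* * s / sqrt(n), with t* supplied from a T-table, as in Eg 3."""
    margin = t_multiplier * s / math.sqrt(n)
    return xbar - margin, xbar + margin


z, p1 = one_proportion_z_test(53, 90, 0.5)
t, p2 = one_mean_t_test(599, 85.3, 216, 610)
lo, hi = one_mean_t_interval(599, 85.3, 216, 1.984)  # 1.984 = table value for df = 100
print(f"Eg 1: z = {z:.2f}, p-value = {p1:.4f}")
print(f"Eg 2: t = {t:.2f}, p-value = {p2:.4f}")
print(f"Eg 3: 95% CI = ({lo:.3f}, {hi:.3f})")
```

Two small differences from the hand calculations are expected. The code does not round z to 1.69 before finding the tail area, so its Eg 1 p-value differs slightly from the table's 0.0455 (it is still below 0.05). Eg 2 uses df = 215 exactly rather than the df = 100 row, so it gives a single p-value that should fall inside the 0.025 < p-value < 0.05 bracket read from the table. Eg 3 reuses the table multiplier 1.984, so its interval matches the hand calculation.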