Inferential Statistics & Test of Significance

Confidence Interval (CI)

$$\bar{Y} \pm Z_{\alpha/2}\,\sigma_{\bar{Y}}$$

$\bar{Y}$ = mean
$Z_{\alpha/2}$ = Z score associated with a 95% CI
$\sigma_{\bar{Y}}$ = standard error

sample mean ± 1.96 (or 2) × standard error

Building a CI
• Assume the following:

$$\bar{Y} = 100, \quad s_y = 15, \quad N = 400$$

$$\sigma_{\bar{Y}} = \frac{s_y}{\sqrt{N}} = \frac{15}{\sqrt{400}} = 0.750$$

$$CI = 100 \pm (1.96)(0.750)$$

Upper: 101.47
Lower: 98.53

Why do we use 1.96?
Because 95% of the area under the standard normal curve lies within ±1.96 standard errors of the mean.
[Figure. Source: Knoke & Bohrnstedt (1991:167)]

Is there a sample that is different from the mean?

Significance Testing
• When we explain some phenomenon, we move beyond description to inferential statistics and hypothesis testing.
• Tests of significance allow us to test hypotheses and, when we find a relationship between variables, to reject the null hypothesis.

Hypothesis testing
• Hypothesis testing means that we test our null hypothesis (H0) against some competing or alternative hypothesis (H1).
• Normally we choose statements such as:

H0: μy = 100
H1: μy ≠ 100
or H1: μy > 100
or H1: μy < 100

Significance Testing
• Even with high-powered statistical measures, some results will appear that are due to chance. If we ran our models a thousand times, or fewer, we would likely see some results that do not stem from systematic processes.
• Thus, we need to decide at what level of significance we are willing to frame our results. We can never be 100% confident.
• The conventional levels of significance at which we reject the null hypothesis are .05 or .01. A probability of .10 is considered weakly significant.
• When you reject the null hypothesis even though it is true, you make a Type I error. This means you are accepting a "false positive" result.
• Think of this as a fiancé test: the chance of rejecting, or saying no to, Mr. or Miss "Right".
• A Type II error occurs when you fail to reject (accept) the null hypothesis even though it is not true.
• This is a "false negative": you say yes to Mr. or Miss "Wrong".
• Type II errors in statistical testing can result from too little data, omitted variable bias, and multicollinearity.

Other distributions
• The normal distribution assumes:
1. We know the population standard deviation (and hence the standard error); often, however, we do not know it.
2. The t-distribution becomes the best alternative when we do not know the population standard deviation but do know the sample standard deviation.
3. As the sample gets bigger, the t-distribution approaches the normal distribution.
4. There are other distributions, such as chi-square and others, that we will discuss later.

T-Distribution & Normal Distribution
The form of the t-distribution depends on the sample size. As the sample gets larger, the difference between the normal and the t-distribution disappears.
[Figure. Source: Gujarati (1992:76)]

The t formula

$$t = \frac{\bar{y} - \mu_y}{s_y / \sqrt{N}}$$

$$CI = \bar{Y} \pm t_{\alpha/2}\,(s_y / \sqrt{N})$$

For α = .05 and N = 30 (29 degrees of freedom), t = 2.045.

95% CI using t-test
• Mean = 20
• s_y = 5
• N = 20 (19 degrees of freedom, t = 2.093)

$$20 \pm 2.093\,(5/\sqrt{20}) = 20 \pm 2.34$$

Upper: 22.34
Lower: 17.66

Why do we care about CI?
• We use confidence intervals for hypothesis testing.
• For instance, we want to know if there is a difference in home values between El Paso and Boston.
• We want to know whether or not taking a class at Kaplan makes a difference in our GRE scores.
• We want to know if there is a difference between the treatment and control groups.

Mean Difference testing
[Figure: mean home values for the USA, El Paso, Las Cruces, and Boston]

T-Tests of Independence
• Used to test whether there is a significant difference between the means of two samples.
• We are testing for independence, meaning whether or not the two samples are related.
• This is a one-time test, not a test over time with multiple observations.
• Example: the values of homes in El Paso and in Boston.
• Useful in experiments where people are assigned to two groups between which there should be no differences; we then introduce an independent variable (treatment) to see whether the groups show real differences, which would be attributable to the introduced X variable.
This implies that the samples come from different populations (with different μ).
• This is the Completely Randomized Two-Group Design.
• For example, we can take a random sample of high school students and divide them into two groups. One gets tutoring for the SAT and the other does not.

H0: μ1 = μ2
H1: μ1 ≠ μ2

• After one group gets tutoring, but not the other, we compare the scores. We find that the group exposed to tutoring indeed outperformed the other group. We thus conclude that tutoring makes a difference.

[Figure: pre-test and post-test scores for the treatment and control groups; positive increments at a different rate]

Two Sample Difference of Means T-Test

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\left[\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\right]\left[\dfrac{n_1 + n_2}{n_1 n_2}\right]}}$$

$$S_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

$S_p^2$ = pooled variance of the two groups; its square root is the common standard deviation of the two groups.

• The numerator of the equation captures the difference in means, while the denominator captures the pooled variation within the two groups.
• Important point: what is of interest is the difference between the two sample means, not between a sample mean and a population mean. However, rejecting the null means that the two groups under analysis have different population means.

An example
• Test on GRE verbal test scores by gender:
Females: mean = 50.9, variance = 47.553, n = 6
Males: mean = 41.5, variance = 49.544, n = 10

$$t = \frac{50.9 - 41.5}{\sqrt{\left[\dfrac{(6 - 1)47.553 + (10 - 1)49.544}{6 + 10 - 2}\right]\left[\dfrac{6 + 10}{6(10)}\right]}} = \frac{9.4}{\sqrt{48.833\,(0.26667)}} = \frac{9.4}{\sqrt{13.02}} = \frac{9.4}{3.608} = 2.605$$

Now what do we do with this obtained value?

Steps of Testing and Significance
1. Statement of the null hypothesis: if there is not one, then how can you be wrong?
2. Set the alpha level of risk: .10, .05, .01
3. Selection of the appropriate test statistic, e.g., the t-test.
4. Computation of the statistical value: get the obtained value.
5. Determination of the critical value: done for you for most methods in most statistical packages.
6. Comparison of the obtained and critical values.
7. If the obtained value is more extreme than the critical value, you may reject the null hypothesis. In other words, you have significant results.
8. If point seven above is not true, that is, the obtained value is less extreme than the critical value, then the null is not rejected.

GRE Verbal Example
Obtained value: 2.605
Critical value?
Degrees of freedom: number of cases left after subtracting 1 for each sample (6 + 10 - 2 = 14).

H0: μf = μm
H1: μf ≠ μm

At α = .05 (two-tailed) with 14 degrees of freedom, the critical value is 2.145, so the obtained value of 2.605 is more extreme.

Is the null hypothesis (H0) supported?
Answer: No. Women score higher on the verbal test, and the difference is statistically significant. This means that the population mean scores of the two genders are different.

Paired T-Tests
• We use paired t-tests, a test of dependence, to examine a single sample of subjects/units under two conditions, such as in a pre-test/post-test experiment.
• For example, we can examine whether a group of students improves if they retake the GRE exam. The t-test examines whether there is any significant difference between the two sets of scores. If so, then possibly something like studying more made a difference.
• Unlike a test for independence, this test requires that the two groups/samples being evaluated are dependent upon each other.
• For example, we can use a paired t-test to examine two sets of scores across time as long as they come from the same students.
• This is appropriate for a pre-test/post-test research design.

$$t = \frac{\sum D}{\sqrt{\dfrac{n \sum D^2 - \left(\sum D\right)^2}{n - 1}}}$$

ΣD = sum of the differences between the paired scores; in the second term of the numerator under the root, this sum is squared
ΣD² = sum of the squared differences
n = number of pairs

Comparing Test Scores

Midterm   Final
48        71.2
69        73.3
95        96
87        94.2
50        81.4
75        86.7
74        72.8
88        88
92        95
69        88
75        91.8
86        93.6
73        71.8
60        80.1

Paired Samples Statistics

Pair 1    Mean      N    Std. Deviation   Std. Error Mean
MID       74.3571   14   14.60562         3.90352
FINAL     84.5643   14    9.32924         2.49335

Paired Samples Correlations

Pair 1         N    Correlation   Sig.
MID & FINAL    14   .710          .004

Paired Samples Test

Pair 1: MID - FINAL
Paired Differences
  Mean                                         -10.2071
  Std. Deviation                                10.34300
  Std. Error Mean                                2.76428
  95% Confidence Interval of the Difference     Lower -16.1790, Upper -4.2353
t                                               -3.693
df                                              13
Sig. (2-tailed)                                 .003

What can we conclude?
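The paired-samples result for the midterm and final scores can be recomputed by hand from the 14 score pairs. The following is a minimal sketch in plain Python that applies the paired t formula from these notes. The function name `paired_t` and the constant `T_CRITICAL_DF13` are illustrative names, not part of the original slides, and the critical value 2.160 (two-tailed, α = .05, df = 13) is assumed from a standard t table rather than computed.

```python
import math

# Midterm and final scores for the same 14 students, copied from the score table.
midterm = [48, 69, 95, 87, 50, 75, 74, 88, 92, 69, 75, 86, 73, 60]
final = [71.2, 73.3, 96, 94.2, 81.4, 86.7, 72.8, 88, 95, 88, 91.8, 93.6, 71.8, 80.1]


def paired_t(first, second):
    """Return (t, df) for a paired t-test on the differences first - second."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    sum_d = sum(diffs)                  # sum of the differences
    sum_d2 = sum(d * d for d in diffs)  # sum of the squared differences
    # t = sum(D) / sqrt((n * sum(D^2) - (sum D)^2) / (n - 1))
    t = sum_d / math.sqrt((n * sum_d2 - sum_d ** 2) / (n - 1))
    return t, n - 1


t_obtained, df = paired_t(midterm, final)

# Assumption: critical value taken from a standard t table
# (two-tailed, alpha = .05, df = 13).
T_CRITICAL_DF13 = 2.160
reject_null = abs(t_obtained) > T_CRITICAL_DF13

print(f"t = {t_obtained:.3f}, df = {df}, reject H0: {reject_null}")
```

The obtained value should agree with the t of about -3.693 on 13 degrees of freedom in the Paired Samples Test output. Because its absolute value exceeds the assumed critical value, the sketch flags the midterm-to-final change as significant at the .05 level.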
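The pooled-variance t-test from the GRE verbal example can also be checked from the summary statistics alone (means, variances, and group sizes). This is a sketch under stated assumptions: `pooled_t` and `T_CRITICAL_DF14` are hypothetical names introduced here, and the critical value 2.145 (two-tailed, α = .05, df = 14) is taken from a standard t table.

```python
import math


def pooled_t(mean1, var1, n1, mean2, var2, n2):
    """Return (t, df, pooled_variance) for the two-sample pooled-variance t-test."""
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    # Standard error of the difference between the two sample means.
    std_error = math.sqrt(pooled_var * (n1 + n2) / (n1 * n2))
    return (mean1 - mean2) / std_error, df, pooled_var


# Females: mean 50.9, variance 47.553, n = 6; males: mean 41.5, variance 49.544, n = 10.
t_gre, df_gre, sp2 = pooled_t(50.9, 47.553, 6, 41.5, 49.544, 10)

# Assumption: critical value taken from a standard t table
# (two-tailed, alpha = .05, df = 14).
T_CRITICAL_DF14 = 2.145
significant = abs(t_gre) > T_CRITICAL_DF14

print(f"pooled variance = {sp2:.3f}, t = {t_gre:.3f}, df = {df_gre}, significant: {significant}")
```

Working from summary statistics keeps the sketch aligned with the worked example, which reports only means, variances, and group sizes rather than raw scores.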
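Both confidence-interval examples from the start of these notes (the z-based interval with N = 400 and the t-based interval with N = 20) follow the same pattern: mean ± critical value × standard error. A minimal sketch that recomputes both intervals from the stated inputs is given here. The helper name `confidence_interval` is illustrative, and the critical values 1.96 and 2.093 are the ones quoted in the notes rather than values computed from a distribution.

```python
import math


def confidence_interval(mean, sd, n, critical):
    """Return (lower, upper) for mean +/- critical * sd / sqrt(n)."""
    margin = critical * sd / math.sqrt(n)  # critical value times the standard error
    return mean - margin, mean + margin


# Large-sample example: mean 100, s = 15, N = 400, z = 1.96.
z_lower, z_upper = confidence_interval(100, 15, 400, 1.96)

# Small-sample example: mean 20, s = 5, N = 20, t = 2.093 (df = 19).
t_lower, t_upper = confidence_interval(20, 5, 20, 2.093)

print(f"z-based 95% CI: {z_lower:.2f} to {z_upper:.2f}")
print(f"t-based 95% CI: {t_lower:.2f} to {t_upper:.2f}")
```

Passing the critical value as an argument keeps the choice between the normal and the t-distribution explicit, which mirrors how the notes switch from 1.96 to a table value of t when the sample is small.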