|PART ONE -- Essentials| -- Inference (Estimation and Hypothesis Testing) From Small Samples

Note: This part of the course deals with estimation and hypothesis testing, which were introduced in Stats 1, Part 5. Review the outline for that section if necessary. The material in Stats 1, Part 5 is to be considered part of Stats 2, Part 1.

Interval estimation and hypothesis testing
  Two Types of Problems
    Means--one-group; two-group
      t-distribution
        Symmetrical with center concentration, but not as concentrated as the normal distribution
        Lower in the center and higher in the tails than the normal distribution
        Degrees of freedom--expresses the sample size
          One-group problems, (n-1); two-group problems, (n1+n2-2) or [(n1-1) + (n2-1)]
        On the 4-Column formula sheet, columns 1 and 2 may be used, with the substitution of "t" for "z".
        Logic is identical to chapters 6, 7 and 8 large-sample sections.
      Unpaired design for two-group problems
        Sample items for each group selected randomly
        A difference between the means of groups might be due to experimental "treatment" or might simply be due to the fact that the members of the two groups were different.
        Treatment--intentional difference between groups being tested, e.g., in a pharmaceutical test, drug group vs. non-drug group
        Confounding variable--uncontrolled factor that might be causing an observed difference between groups
      Paired-difference design for two-group problems
        Purpose--to eliminate "confounding" variables and isolate the variable of interest
        Ideal--keep everything constant except the variable under investigation.
        Same subjects are tested twice--before and after the experimental treatment.
        Difference therefore cannot be due to the members of the groups being different.
      Four assumptions
        Samples
          Random
          Independent (in two-group unpaired experiments)
        Populations
          Normally distributed
          Equal variances (in two-group unpaired experiments)
        Moderate departures from the assumptions will not seriously affect validity.
        A test with this characteristic is called "robust."
        If the assumptions are seriously violated, two approaches may be taken:
          Increase sample to a "large" size (then, population assumptions need not be met).
          Use nonparametric tests (which have no population assumptions).
    Inferences regarding variances
      One-group inference regarding the variance--uses "chi-square" (χ²) distribution
        Estimation and hypothesis testing are possible regarding the variance of one group.
        In hypothesis testing, the Ho is that σ² is equal to some specified value.
      Two-group inference regarding the variances--uses "F" distribution
        Variances are compared by division (ratio), rather than by subtraction (difference).
        Estimation and hypothesis testing are possible regarding the variances of two groups.
        In hypothesis testing, the Ho is that σ₁² is equal to σ₂². As a ratio, this would mean σ₁² / σ₂² = 1.

Terminology--explain each of the following: inferential statistics, sample mean, population mean, estimator, estimate, unbiased estimator, point estimate, interval estimate, confidence interval, degree of confidence, confidence level, error factor, required sample size, upper confidence limit, lower confidence limit, hypothesis test, null hypothesis, alternate hypothesis, type I error, α, type II error, β, calculated-t (test statistic), critical region, table-t (critical value of t), rejection of the null hypothesis, non-rejection of the null hypothesis, p-value, hypothesis-test conclusion, independent samples, standard error of the difference, paired difference design, confounding variable, chi-square distribution (purpose), F distribution (purpose)

Skills and Procedures
  given appropriate data, conduct estimation and hypothesis testing on the population mean of one group, involving these steps:
    make a point estimate of a population mean
    compute the sampling standard deviation (standard error) of the sample means
    compute and interpret the error factor for the interval estimate for the 90%, 95% and 99% confidence levels, using the t distribution
    state the null and alternate hypotheses regarding the population mean
    determine the table-t (critical value of t) for alpha levels of 0.10, 0.05 and 0.01
    compute the calculated-t (test statistic)
    draw the appropriate hypothesis-test conclusion based on the given level of α, the table-t (critical value) and the calculated-t (test statistic)
    interpret the conclusion
    determine and interpret the p-value
  given appropriate data, conduct estimation and hypothesis testing on the population means of two groups, involving these steps:
    make a point estimate of the difference between population means
    compute the sampling standard deviation (standard error) of the difference between sample means
    compute and interpret the error factor for the interval estimate for the 90%, 95% and 99% confidence levels
    state the null and alternate hypotheses regarding the difference between population means
    determine the table-t (critical value of t) for alpha levels of 0.10, 0.05 and 0.01
    compute the calculated-t (test statistic)
    draw the appropriate hypothesis-test conclusion based on the given level of α, the table-t and the calculated-t
    interpret the conclusion
    determine and interpret the p-value
  given appropriate data, conduct estimation and hypothesis testing on the population means of two groups in a paired-difference design, involving these steps:
    make a point estimate of the difference between population means by computing the average of the differences
    compute the sampling standard deviation (standard error) of the difference between sample means
    compute and interpret the error factor for the interval estimate for the 90%, 95% and 99% confidence levels
    state the null and alternate hypotheses regarding the difference between population means
    determine the table-t (critical value of t) for alpha levels of 0.10, 0.05 and 0.01
    compute the calculated-t (test statistic)
    draw the appropriate hypothesis-test conclusion based on the given level of α, the table-t and the calculated-t
    interpret the conclusion
    determine and interpret the p-value

Concepts
  explain why a confidence interval becomes larger as the confidence level increases
  explain why a confidence interval becomes smaller as the sample size increases
  describe the nature of the trade-off between precision and cost
  identify the type of error that is made if the null hypothesis is "the defendant is innocent," and an innocent defendant is erroneously convicted
  identify the type of error that is made if the null hypothesis is "the defendant is innocent," and a guilty defendant is erroneously acquitted
  explain why a researcher seeking to reject a null hypothesis may tend to prefer a one-sided alternative hypothesis
  explain how the paired-difference design eliminates a confounding variable
  explain what happens to the t distribution as the sample size becomes smaller
  explain what happens to the t distribution as the sample size becomes larger
  describe how the t distribution is similar to the normal distribution
  describe how the t distribution differs from the normal distribution

What to say and how to say it: INSERT A NUMBER WHEREVER THERE ARE PARENTHESES ( ).
  Column 1--mean, one group
    Ho rejected: The difference between the sample mean, (xbar), and the null hypothesis, (μHo), is statistically significant at the (α) level. The population mean is probably not (μHo).
    Ho not rejected: The difference between the sample mean, (xbar), and the null hypothesis, (μHo), is not statistically significant at the (α) level. The population mean could be (μHo).
  Column 2--means, 2 groups (or paired-difference design)
    Ho rejected: The difference between the sample means, (xbar1-xbar2), is statistically significant at the (α) level. The population means are probably not equal.
    Ho not rejected: The difference between the sample means, (xbar1-xbar2), is not statistically significant at the (α) level. The population means could be equal.
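
The one-group steps listed under Skills and Procedures, and the Column 1 wording, can be sketched in Python. This is a minimal illustration under stated assumptions, not a worked example from the course: the data sets, the hypothesized mean of 48.0, and the helper names `small_sample_t` and `conclusion` are all hypothetical. The table-t of 2.262 is the two-sided 0.05 value for 9 degrees of freedom found in standard t-tables.

```python
import math
import statistics

# Two-sided table-t for alpha = 0.05 and df = 9, read from a standard t-table.
TABLE_T_DF9 = 2.262


def small_sample_t(sample, mu_h0, table_t):
    """One-group estimation and two-sided hypothesis test on a mean (hypothetical helper)."""
    n = len(sample)
    xbar = statistics.mean(sample)                        # point estimate
    std_error = statistics.stdev(sample) / math.sqrt(n)   # standard error of the mean
    error_factor = table_t * std_error                    # half-width of the interval
    calc_t = (xbar - mu_h0) / std_error                   # calculated-t (test statistic)
    return {
        "df": n - 1,
        "xbar": xbar,
        "std_error": std_error,
        "interval": (xbar - error_factor, xbar + error_factor),
        "calc_t": calc_t,
        "reject_h0": abs(calc_t) > table_t,               # two-sided critical region
    }


def conclusion(result, mu_h0, alpha):
    """Fill in the Column 1 wording with the numbers from a test result."""
    negation = "" if result["reject_h0"] else "not "
    closing = (f"The population mean is probably not {mu_h0}."
               if result["reject_h0"]
               else f"The population mean could be {mu_h0}.")
    return (f"The difference between the sample mean, {result['xbar']:.2f}, and the "
            f"null hypothesis, {mu_h0}, is {negation}statistically significant at the "
            f"{alpha} level. {closing}")


# Hypothetical one-group data (n = 10, so df = 9).
sample = [52.1, 48.3, 50.9, 47.6, 53.4, 49.8, 51.2, 46.9, 50.5, 49.3]
one_group = small_sample_t(sample, mu_h0=48.0, table_t=TABLE_T_DF9)
print(one_group)
print(conclusion(one_group, 48.0, 0.05))

# Hypothetical paired-difference data: the same 10 subjects measured twice.
before = [140, 152, 138, 160, 145, 150, 148, 155, 142, 158]
after = [135, 150, 136, 151, 143, 147, 146, 149, 141, 152]
differences = [b - a for b, a in zip(before, after)]
# Ho: the mean difference is 0, tested as a one-group problem on the differences.
paired = small_sample_t(differences, mu_h0=0, table_t=TABLE_T_DF9)
print(paired)
```

The paired-difference case reuses the one-group helper because, once each subject's two measurements are reduced to a single difference, the problem has one column of numbers and (n-1) degrees of freedom. The unpaired two-group case is not covered by this sketch; it needs the standard error of the difference and (n1+n2-2) degrees of freedom.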
THE t-DISTRIBUTION

CENTRAL LIMIT THEOREM -- SAMPLING DISTRIBUTIONS OF:
  MEANS
  DIFFERENCES BETWEEN MEANS
  PROPORTIONS
  DIFFERENCES BETWEEN PROPORTIONS
ARE ESSENTIALLY NORMAL REGARDLESS OF THE SHAPE OF THE POPULATION DISTRIBUTION, WHEN SAMPLE SIZES ARE LARGE (n ≥ 30).

WHEN SAMPLE SIZES ARE SMALL (n < 30), SAMPLING DISTRIBUTIONS ARE NO LONGER NORMAL. THEY FOLLOW t-DISTRIBUTIONS: SYMMETRICAL, LOWER AND WIDER THAN THE NORMAL DISTRIBUTION, LESS CONCENTRATED IN THE CENTER.

t-DISTRIBUTION SHAPE VARIES AS n CHANGES. THE SMALLER THE n, THE LESS CONCENTRATED IN THE CENTER.

SAMPLE SIZE IS EXPRESSED BY DEGREES OF FREEDOM: df = (n - 1).

t-DISTRIBUTION TABLE
  COLUMN HEADINGS -- ONE-SIDED AND TWO-SIDED TAIL AREAS
  BODY OF TABLE CONTAINS t-VALUES (ANALOGOUS TO z-VALUES)--THE NUMBER OF STANDARD DEVIATIONS FROM THE MEAN
  t-VALUES APPROACH z-VALUES AS n INCREASES. BOTTOM ROW OF THE t-TABLE CONTAINS z-VALUES.
  AS n DECREASES, t-VALUES INCREASE. DUE TO THE LESSER DEGREE OF CENTER CONCENTRATION, AS THE SAMPLE SIZE DECREASES, ONE MUST MOVE FARTHER FROM THE MEAN IN ORDER TO ENCLOSE A GIVEN PORTION OF THE DISTRIBUTION.
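
The statements that t-values increase as n decreases and approach z-values as n increases can be checked numerically. The sketch below uses only the Python standard library. It integrates the t density with Simpson's rule and finds each critical value by bisection, an approach chosen here only to avoid third-party packages. The function names `t_pdf`, `upper_tail_area`, and `critical_t` are hypothetical.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))


def upper_tail_area(t, df, steps=2000):
    """Area to the right of t (t >= 0), using Simpson's rule on [0, t]."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # The curve is symmetrical, so half the area lies to the right of 0.
    return 0.5 - total * h / 3


def critical_t(alpha_two_sided, df):
    """t-value that leaves alpha/2 in each tail, found by bisection."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if upper_tail_area(mid, df) > alpha_two_sided / 2:
            lo = mid   # tail still too large: move farther from the mean
        else:
            hi = mid
    return (lo + hi) / 2


# z-value for a two-sided 0.05 tail area (the bottom row of a t-table).
z_value = NormalDist().inv_cdf(0.975)

critical_values = {df: critical_t(0.05, df) for df in (4, 9, 29, 120)}
for df, t_value in critical_values.items():
    print(f"df = {df:>3}: t = {t_value:.3f}")
print(f"z        : {z_value:.3f}")
```

Each critical value printed should be larger than the z-value and should shrink toward it as df grows, which is the pattern the table description above predicts.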