Inferring Sample Findings to the Population and Testing for Differences

Statistics versus Parameters
- Values computed from samples are statistics.
- Values computed from the population are parameters.
- Use Greek letters when referring to parameters; use Roman letters for statistics.

Inference and Statistical Inference
- Inference: generalizing about an entire class based on what you have observed about a small set of members of that class; drawing a conclusion from a small amount of evidence.
- Statistical inference: sample size and sample statistics are used to make estimates of population parameters.

Hypothesis Testing
- A statistical procedure used to support or reject a hypothesis based on sample information.
- Steps in hypothesis testing:
  1. Begin with a statement about what you believe exists in the population.
  2. Draw a random sample and determine the sample statistic.
  3. Compare the statistic with the hypothesized parameter.
  4. Decide whether or not the sample supports the original hypothesis.
  5. If the sample does not support the hypothesis, revise the hypothesis to be consistent with the sample's statistic.

Test of the Hypothesized Population Parameter Value
- For example, we hypothesize that the average GPA for Business majors is not the same as that for Recreation majors.
- For a mean: z = (x̄ − μH) / s_x̄, where s_x̄ is the standard error of the mean.
- The sample mean is compared to the hypothesized mean; if z exceeds the critical value of z (e.g., 1.96), then we reject the hypothesis that the population mean is μH.
- For a percentage: z = (p − πH) / s_p.

Directional Hypotheses
- Indicate the direction in which you believe the population parameter falls. For example: the average GPA of Business majors is higher than the average GPA of Recreation majors.
- Note that we are now interested in the area under the curve on only one side of the mean.

Interpretation
- If the hypothesis about the population parameter is correct (true), then a high percentage of sample means must fall close to this value (i.e., within ±1.96 standard errors).
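The test of a hypothesized mean, z = (x̄ − μH) / s_x̄, can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides; the GPA numbers below are hypothetical.

```python
import math

def z_for_mean(sample_mean, hyp_mean, sample_sd, n):
    """z = (x-bar - mu_H) / s_x-bar, where s_x-bar = s / sqrt(n)."""
    std_error = sample_sd / math.sqrt(n)
    return (sample_mean - hyp_mean) / std_error

# Hypothetical numbers for illustration: a sample of 100 GPAs with
# mean 2.85 and sd 0.5, against a hypothesized population mean of 2.75.
z = z_for_mean(2.85, 2.75, 0.5, 100)
print(round(z, 2))  # 2.0 -- exceeds 1.96, so reject Ho at the .05 level
```

The same comparison against ±1.96 is what the interpretation slide describes: a z beyond the critical value means the sample mean fell too far from the hypothesized value to support the hypothesis.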
Failure to support the hypothesis tells the hypothesizer that the assumptions about the population are in error.

Testing for Differences Between Two Means
- Ho: There is no difference between the two means (μ1 = μ2).
- Ha: There is a difference between the two means (μ1 ≠ μ2).
- z = (x̄1 − x̄2) / s_(x̄1−x̄2), where s_(x̄1−x̄2) = sqrt(s1²/n1 + s2²/n2).

Testing for Differences Between Two Means: Example
- Is there a statistically significant difference between men and women in how many movies they have seen in the last month?
- Ho: There is no difference between the two means (μW = μM).
- Ha: There is a difference between the two means (μW ≠ μM).

  Gender    N    Mean     St. Dev.
  male      19   2.3684   1.98
  female    13   2.5385   2.18

  t = -.229, df = 30, significance (2-tailed) = .820
  Levene's test for equality of variances: F = .004, Sig. = .952

- Fail to reject the null hypothesis that the means are equal. Why? Significance = .820, and we reject only when significance is lower than .05; .820 > .05, therefore fail to reject the null.
- There is no statistically significant difference between men and women in how many movies they saw in the last month. This makes sense: look at the means (2.36 & 2.53).

Small Sample Size: t-Test
- Normal bell curve assumptions are invalid when sample sizes are 30 or less; the alternative choice is the t-test.
- The shape of the t distribution is determined by sample size (i.e., degrees of freedom): df = n − 1.

ANOVA
- ANOVA = Analysis of Variance; it compares means across multiple groups.
- ANOVA will tell you that at least one pair of means has a statistically significant difference, but not which one.
- Assumptions: independence, normality, equality of variance (Levene test).

Analysis of Variance
- Used when researchers want to compare the means of three or more groups; ANOVA facilitates the comparison.
- Basic analysis: does a statistically significant difference exist between at least two group means?
- ANOVA does not communicate how many pairs of means are statistically significant in their differences.
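The movies example can be reproduced from its summary statistics with a pooled two-sample t statistic (equal variances assumed, which Levene's test supports here). A minimal sketch, using only the numbers reported in the table above:

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic with pooled variance (equal variances assumed)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    std_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / std_error, df

# Summary statistics from the movies example: 19 men, 13 women.
t, df = pooled_t(2.3684, 1.98, 19, 2.5385, 2.18, 13)
print(round(t, 3), df)  # -0.229 with 30 degrees of freedom
```

This matches the reported output (t = -.229, df = 30); with a p-value of .820, the null hypothesis is not rejected.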
Hypothesis Testing
- Ho: There is no difference among the population means for the various groups.
- Ha: At least two groups have different population means.
- When MSBetween is significantly greater than MSWithin, we reject Ho.
- F value: F = MSBetween / MSWithin. If F exceeds the critical F(df1, df2), then we reject Ho.

Visual Representation
[Figure: five population distributions with little overlap; it appears that at least two populations have different means.]
[Figure: five heavily overlapping population distributions; it appears that the populations do not have significantly different means.]

Tests of Differences
- Chi-square goodness-of-fit: does some observed pattern of frequencies correspond to an expected pattern?
- Z-test/t-test: is there a significant difference between the means of two groups?
- ANOVA: is there a significant difference between the means of more than two groups?

When to Use Each Test
- Chi-square goodness-of-fit: both variables are categorical/nominal.
- t-test: one variable is continuous; the other is categorical with two groups/categories.
- ANOVA: one variable is continuous (i.e., interval or ratio); the other is categorical with more than two groups.

How to Interpret a Significant p-value (p < .05)
- Chi-square goodness-of-fit: "There is a significant difference in frequency of responses among the different groups (or categories)."
- t-test: "The means (averages) of the 2 population groups are different on the characteristic being tested."
- ANOVA: "The means of the (multiple) population groups are different"; a post hoc test (e.g., Bonferroni) is needed to determine exactly which group means differ from one another.

Measuring Association
- Is there any association (correlation) between two or more variables?
- If so, what is the strength and direction of the correlation?
- Can we predict one variable (dependent variable) based on its association with other variables (independent variables)?
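The F ratio F = MSBetween / MSWithin can be computed directly from raw group data. A minimal one-way ANOVA sketch; the three groups of data below are hypothetical, invented for illustration:

```python
def one_way_f(groups):
    """F = MS_between / MS_within for a list of groups (lists of values)."""
    k = len(groups)                                 # number of groups
    n = sum(len(g) for g in groups)                 # total observations
    grand_mean = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)               # df1 = k - 1
    ms_within = ss_within / (n - k)                 # df2 = n - k
    return ms_between / ms_within

# Hypothetical data: movies seen per month in three groups of respondents.
f = one_way_f([[1, 2, 2, 3], [4, 5, 5, 6], [2, 3, 3, 4]])
print(f)  # 14.0 -- exceeds critical F(2, 9) of about 4.26, so reject Ho
```

As the slides note, a significant F only says that at least two group means differ; a post hoc test is still needed to find which pair.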
Correlation Analysis
- A statistical technique used to measure the closeness of the linear relationship between two or more variables.
- Can offer evidence of causality, but is not enough to establish causality by itself (you must also have supporting knowledge/theory and the correct sequence of variables).
- Scatterplots can give a visual estimate of the correlation between two variables.

Regression Analysis
- Simple regression: relates a single criterion (dependent) variable to a single predictor (independent) variable.
- Multiple regression: relates a single criterion variable to multiple predictor variables.
- All variables should be at least interval!

Correlation/Regression
- Coefficient of correlation (r): a measure of the strength of linear association between two variables; also called "Pearson's r" or the "product-moment" correlation; ranges from -1 to +1.
- Coefficient of determination (r²): the proportion of variance in the criterion explained by the fitted regression equation (of predictors).
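Pearson's r, r², and a simple regression line can all be computed from the same sums of squared deviations. A minimal sketch; the two interval-scale variables below are hypothetical, chosen only to illustrate the calculations:

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two variables (-1 to +1)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def simple_regression(xs, ys):
    """Least-squares intercept a and slope b for y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

# Hypothetical interval-scale data for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 6]
r = pearson_r(xs, ys)
r_squared = r ** 2          # proportion of variance in y explained by x
a, b = simple_regression(xs, ys)
print(round(b, 1))  # slope 0.8: each unit of x predicts 0.8 more units of y
```

Squaring r to get r² is the link between the two coefficients on the slide: a strong correlation (r near ±1) means the regression equation explains most of the variance in the criterion.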