Using SPSS, Chapter 11: Additional Hypothesis Tests

Here we see how to use SPSS to perform chi-squared and ANOVA tests.

• Chapter 11.1 - Chi-Squared Test for Goodness of Fit
  If your data is in a frequency table, be sure to first weight your data by frequency: Data → Weight Cases...
  Either way, you then proceed with Analyze → Nonparametric Tests → One Sample...
  – Assuming equal probabilities.
  – Assuming non-equal probabilities.
• Chapter 11.2 - Chi-Squared Test of Independence
  If your data is in a frequency table, be sure to first weight your data by frequency: Data → Weight Cases...
  Either way, you then proceed with Analyze → Descriptive Statistics → Crosstabs...
• Chapter 11.3 - ANOVA
  Your data must be in standard SPSS format (rows = cases and columns = variables).
  Analyze → Compare Means → One-Way ANOVA...
• Creating and Importing Data

Chapter 11.1 - Chi-Squared Test for Goodness of Fit

• Assuming Equal Probabilities
This video demonstrates the Preliminary Example from Chapter 11.1. Here, the assumed probabilities in the null hypothesis are all equal. There are two parts to this video. In part 1, the data is in the form of a frequency table. In part 2, the data is in standard format. Both layouts are shown below.

Part 1: Data in a Frequency Table
Example 1, Hypothesized Even Distribution
Sample Distribution for 60 Rolls of a Single Die

Outcome (# on die)   Observed Frequency (Oᵢ)   Assumed Probability (pᵢ)
1                    7                         1/6
2                    6                         1/6
3                    11                        1/6
4                    15                        1/6
5                    13                        1/6
6                    8                         1/6

Part 2: Data in Standard Format (Rows = Cases and Columns = Variables)

Roll #   Outcome
1        6
2        3
3        2
4        2
5        1
...      ...
60       5

Results Calculated in Textbook:
  χ² = 6.400
  P-value = 0.2692
  Fail to reject the null hypothesis.
Conclusion: There is not enough evidence to conclude that the die is not fair.

• Assuming Unequal Probabilities
This video demonstrates Your Turn problem #2 from Chapter 11.1.
Here, the assumed probabilities in the null hypothesis are not all equal. There are two parts to this video. In part 1, the data is in the form of a frequency table. In part 2, the data is in standard format. Both layouts are shown below.

Part 1: Data in a Frequency Table
Your Turn 2, Hypothesized Distribution
Sample Distribution for 800 Blood Donors

Blood Type   Observed Frequency (Oᵢ)   Assumed Probability (pᵢ)
O+           310                       0.38
O−           71                        0.07
A+           235                       0.34
A−           64                        0.06
B+           68                        0.09
B−           12                        0.02
AB+          36                        0.03
AB−          4                         0.01

Part 2: Data in Standard Format (Rows = Cases and Columns = Variables)

Donor #   Blood Type
1         O+
2         O+
3         A+
4         AB+
5         B
6         A+
...       ...
800       B+

Results Calculated in Textbook:
  χ² = 23.724
  P-value = 0.00127
  Reject the null hypothesis.
Conclusion: The regional distribution of blood types does not seem to fit the national distribution.

Chapter 11.2 - Chi-Squared Test of Independence

This video demonstrates Example 1 from Chapter 11.2. Below is the distribution of grades by gender for a class of 72 students, in the form of a contingency table. Test whether there is a significant dependent relationship between grade and gender in this class. Use a 0.05 significance level.

Observed Frequencies - Contingency Table

          A    B    C    D    F
Male      8   10    6    9    9
Female    4    6    9    6    5

Results Calculated in Textbook:
  χ² = 2.724
  P-value = 0.605
  Fail to reject the null hypothesis.
Conclusion: Grade and gender are not significantly dependent in this class.

Note: SPSS does not round the expected frequencies, so its Pearson Chi-Square output for this table should be about χ² = 2.753 with a P-value of about 0.600. The textbook value of 2.724 is consistent with expected frequencies rounded to one decimal place. The conclusion is the same either way.

There are two parts to this video. In part 1, the data is in the form of a frequency table. In part 2, the data is in standard format. Both layouts are shown below.

Part 1: Data in a Frequency Table

Gender   Grade   Frequency
Male     A       8
Female   A       4
Male     B       10
Female   B       6
Male     C       6
Female   C       9
Male     D       9
Female   D       6
Male     F       9
Female   F       5

Part 2: Data in Standard Format (Rows = Cases and Columns = Variables)

Student #   Gender   Grade
1           Male     B
2           Female   C
3           Female   A
4           Male     B
5           Female   F
6           Male     D
7           Male     F
8           Female   C
...         ...      ...
72          Male     B

Chapter 11.3 - ANOVA
This video demonstrates the Over-Simplified Examples (Case 1 and Case 2) in Chapter 11.3. Each case has three samples. For each case, test the claim that the population means are not all equal.

Case 1: Similar Means

Sample 1   Sample 2   Sample 3
3          3          4
3          5          5
4          5          6
5          5          7
5          7          8

Results Calculated in Textbook:
  F = 2.73
  P-value = 0.106
  Fail to reject the null hypothesis.
Conclusion: There is not enough evidence to conclude that the population means are not all equal.

Case 2: Disparate Means

Sample 1   Sample 2   Sample 3
3          3          8
3          5          9
4          5          10
5          5          11
5          7          12

Results Calculated in Textbook:
  F = 28.18
  P-value = 0.000029
  Reject the null hypothesis.
Conclusion: There is sufficient evidence to conclude that the population means are not all equal.

Format the Data First: The data must be put into the standard format (rows = cases and columns = variables). Notice that each entry in each sample represents a different case, so the data must be set up with 15 cases. Each case has a sample number and a score.

Case 1: Similar Means     Case 2: Disparate Means

Sample   Score            Sample   Score
1        3                1        3
1        3                1        3
1        4                1        4
1        5                1        5
1        5                1        5
2        3                2        3
2        5                2        5
2        5                2        5
2        5                2        5
2        7                2        7
3        4                3        8
3        5                3        9
3        6                3        10
3        7                3        11
3        8                3        12

Creating and Importing Data

• There are two ways to get data into SPSS.
  – You can enter the data by typing it directly into the Data Editor.
  – You can open an existing data file with File → Open → Data... and then select the type of file from the list of options. If it is not already an SPSS data file (.sav), you will be prompted to answer some questions. For example, if you open an Excel file, SPSS may ask which worksheet to read and whether the first row contains variable names.
• Make sure your data is formatted as described below.
  – Rows = Cases
    Each row represents a case, such as one respondent to a questionnaire.
  – Columns = Variables
    Each column represents a variable being tracked or measured.
    For example, the answers to a specific question on a questionnaire define their own variable (column). As such, each row represents an individual case across all variables.
  – Cells contain values
    Each cell contains a single value of a variable for a case.
  It is possible to enter data in the form of a frequency table, but then you must make some adjustments (such as weighting cases by frequency with Data → Weight Cases...) before analyzing it.
• Once you have the data open in the Data Editor, click the Variable View tab at the bottom of the Data Editor. In this view, each variable is a row, and you must make sure all your variables are defined appropriately. The most important settings are:
  – Type: The most common types are
    ∗ Numeric: Used for quantitative data. These are numbers with no commas and a period delimiting the decimal places. SPSS will not allow you to enter non-numeric characters into a cell of numeric type.
    ∗ Date: Used for dates or times, chosen from a menu of formats.
    ∗ String: Used for qualitative data. Avoid symbols such as *, -, +, ?, etc.
  – Measure: There are three levels of measurement.
    ∗ Scale is for ratio or interval levels of measurement.
    ∗ Ordinal is for ordinal or ranked data.
    ∗ Nominal is for qualitative data.
  – Values: If you have numeric values representing qualitative data, such as 1 = male and 0 = female, you will probably want them labelled accordingly in graphs and output. Click the cell in the Values column for that variable and assign a label to each value.
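When you weight cases by frequency (Data → Weight Cases...), SPSS treats each row of a frequency table as if it occurred as many times as its frequency. The minimal Python sketch below is an illustration outside SPSS, not SPSS syntax, and the function name expand_frequency_table is hypothetical. It performs that expansion explicitly for the grade-by-gender frequency table from Chapter 11.2 and produces one row per case, as in standard format. The case numbers follow the order of the frequency table. They will therefore not match the Student # column shown in Chapter 11.2, because the original order of the students is not known.

```python
from collections import Counter

# Frequency table from the Chapter 11.2 example: (gender, grade, frequency).
frequency_table = [
    ("Male", "A", 8), ("Female", "A", 4),
    ("Male", "B", 10), ("Female", "B", 6),
    ("Male", "C", 6), ("Female", "C", 9),
    ("Male", "D", 9), ("Female", "D", 6),
    ("Male", "F", 9), ("Female", "F", 5),
]


def expand_frequency_table(rows):
    """Hypothetical helper: turn (value..., frequency) rows into one row per case."""
    cases = []
    for *values, frequency in rows:
        # Repeat this combination of values once for each observed case.
        cases.extend([tuple(values)] * frequency)
    return cases


cases = expand_frequency_table(frequency_table)

# Number the cases 1..N, like the "Student #" column in standard format.
standard_format = [(i, gender, grade) for i, (gender, grade) in enumerate(cases, start=1)]

print(len(standard_format))    # 72
print(standard_format[0])      # (1, 'Male', 'A')

# Tallying the expanded cases recovers the original frequencies.
tally = Counter(cases)
print(tally[("Female", "C")])  # 9
```

Tallying the expanded cases gives back the original frequencies. This is a quick way to confirm that the frequency-table layout and the standard-format layout describe the same data.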