Assignment 28
Psych 5500/6500
Fall, 2008

Chi-Square Test for the True Value of the Variance

1. Some researchers have been examining the effect of problem-solving training on performance on a specific IQ test. Scoring of the test is such that normally (i.e. without the training) the population will have a mean of 100 and a standard deviation of 10. The researchers expect that the training will not only raise the mean score on the IQ test but should also make the participants more similar in their scores. They sample some students and give them the training, then they give them the IQ test. The data are presented in the file ‘IQ.sav’. Analyze the data to test their theory that the training decreases the variance of IQ scores.

a) Write H0 and HA.
b) What is the expected (mean) value of chi-square if H0 is correct? (Give a numeric answer.)
c) Based upon the story problem, is this a directional or a non-directional theory being tested?
d) Should the data be analyzed using a two-tail or a one-tail test? I suggest you draw the sampling distribution and mark the rejection region and the mean of the curve, but you don’t need to hand that in.
e) Use the Chi Square tool to arrive at the Chi Square critical value (note from the image on the tool what rejection region it works with, then figure out what p value you need to input to get the critical value you want).
f) Use SPSS to estimate the variance of the population of participants who have received the problem-solving training.
g) Compute the obtained value of chi-square.
h) Use the Chi Square tool to compute the value of p. Note from the image on the tool what p value it gives you, and adjust accordingly.
i) What is your decision regarding H0?
   (It might help to draw the sampling distribution and put in the rejection region.)
j) What is your decision regarding whether or not you can conclude that the training reduces the variance of scores on the IQ test?
k) This use of chi-square relies upon an assumption about the population. What is that assumption?

Goodness of Fit Test

2. A researcher suspects that the ethnic makeup of an area has changed over some time period. The last complete census (taken seven years ago) showed that the ethnic makeup of the area was:

Ethnic Group A: 20%
Ethnic Group B: 30%
Ethnic Group C: 50%

The researcher randomly samples 39 people from the area and records to which ethnic group they belong. The data can be found in the file “ethnicity”. Load the data into SPSS.

a) Go to the Analyze>>DescriptiveStatistics>>Frequencies menu to get a breakdown of the percentages of each ethnic group in the sample. Observe the differences between the percentages from seven years ago and those of the more recent survey. Note that if the more recent survey were a complete census (everyone in the population being measured), then we would not need to do a statistical analysis to get a p value, for we would have the actual values in the population and could simply compare the data from then to now. But that is not the case: the researcher relied on a sample from the population, and now we need to determine whether the differences in percentages are simply due to chance. State the percent of each ethnic group in the sample.
b) To analyze the data using SPSS you will need to change the scores into numbers. Use ‘Recode into Different Variables’ to change ‘Group A’ to ‘0’, ‘Group B’ to ‘1’, and ‘Group C’ to ‘2’ (we did something like this earlier in the semester). Look at your data to make sure it worked. Now go to the Analyze>>NonparametricTests>>Chi-Square menu. Move the numeric variable into the Test Variables List.
   Then enter the expected frequency for each category (assuming H0 is true) in turn (first for 0, then for 1, then for 2) into the Expected Values area. When finished click OK. The ‘Asymp Sig’ is the p value for this chi-square. State the results of the analysis in the following form: χ²(df) = χ²obtained, p=...
c) Interpret the results of the analysis.

3. Let’s use the goodness of fit test to see if some data deviate significantly from what we would expect if the data were normally distributed. Load into SPSS the crime data from assignment 27. Take a look at a histogram of the Crime Rates and see that the data look fairly skewed.

Step A: We need to change the crime rate data into standard scores. Go to the Analyze>>DescriptiveStatistics>>Descriptives menu and move the Crime Rate variable into the ‘Variables’ box, then click on the ‘Save standardized values as variables’ box, and then click ‘OK’. Check to see that a new variable has been created that contains the standard scores of the crime rates.
Step B: Now sort the z scores in descending order; that can be done through the Data menu.
Step C: Create a new variable called ‘category’. Using the categories from the lecture notes, give all the z scores of 1.15 and above a score of ‘1’, those between 0.68 and 1.15 a score of ‘2’, and so on. What if a score falls exactly on a boundary (e.g. z=1.15)? This rarely happens; if it does, simply follow some rule consistently (e.g. if it falls on the boundary, assign it to the upper category).
Step D: Go to the Analyze>>NonparametricTests>>Chi-Square menu, move the ‘category’ variable to the ‘Test Variable List’, under ‘Expected Values’ make sure ‘All categories equal’ is selected (you are expecting each category to have the same frequency if the data are normally distributed), and then click OK.
Step E: Under the ‘Test Statistics’ box the ‘Asymp Sig.’ is the p value for this test.
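For anyone who wants to check the SPSS bookkeeping by hand, Steps A through C can be sketched in Python as follows. This is only an illustration under stated assumptions: the crime rates are made-up numbers (not the assignment 27 data), the function names are hypothetical, and the use of eight equal-probability categories is an assumption (it gives boundaries near the 1.15 and 0.68 quoted in Step C, but the lecture notes remain the authority on the categories).

```python
from collections import Counter
from statistics import NormalDist, mean, stdev


def category_boundaries(k):
    """z boundaries, highest first, that split a standard normal
    curve into k categories of equal probability."""
    nd = NormalDist()
    return [nd.inv_cdf(1 - i / k) for i in range(1, k)]


def categorize(z, boundaries):
    """Category 1 is the highest bin. A score exactly on a boundary
    goes to the upper category (the rule suggested in Step C)."""
    for idx, b in enumerate(boundaries, start=1):
        if z >= b:
            return idx
    return len(boundaries) + 1


# ASSUMPTION: eight categories; check this against the lecture notes.
boundaries = category_boundaries(8)

# HYPOTHETICAL crime rates, invented purely for illustration.
hypothetical_rates = [2.1, 3.4, 3.9, 4.4, 4.8, 5.0, 5.3, 5.9,
                      6.2, 7.0, 7.7, 8.5, 9.9, 12.4, 15.8, 21.3]

# Step A: standard scores, using the sample standard deviation (N - 1).
m, s = mean(hypothetical_rates), stdev(hypothetical_rates)
z_scores = [(x - m) / s for x in hypothetical_rates]

# Step B: sort in descending order (only for readability).
z_scores.sort(reverse=True)

# Step C: assign each z score to a category and count the categories.
counts = Counter(categorize(z, boundaries) for z in z_scores)
for cat in range(1, len(boundaries) + 2):
    print(f"category {cat}: observed = {counts.get(cat, 0)}")
```

The observed counts printed at the end play the role of the frequencies that Step D compares against equal expected frequencies.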
a) Look at the expected values of the cells. Do we need to be particularly worried about whether or not the distribution of (O-E) is normally distributed? Why or why not?
b) State the results of the analysis in the following form: χ²(df) = χ²obtained, p=...
c) Interpret the results of the analysis.
d) In general, if we reject H0 can we conclude that the data are not normally distributed?
e) In general, if we fail to reject H0 can we conclude that the data are normally distributed? Why or why not?
f) If the two highest crime rates were much higher, say with z scores of 4.2 and 3.9 respectively (instead of 2.81 and 2.75 as they are in our data), then the data would be much more skewed. Would this use of χ² to test the normality of the data detect that change in the data?
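As a reference for the form χ²(df) = χ²obtained used in problem 2, the goodness of fit statistic is the sum over categories of (O-E)²/E, where each expected frequency E is the H0 proportion times the sample size, and df is the number of categories minus one. The Python sketch below shows the arithmetic only. The observed counts are hypothetical (they are not taken from the “ethnicity” file), and the function name is made up for this illustration; the p value would still come from SPSS or the Chi Square tool.

```python
def goodness_of_fit(observed, proportions):
    """Return (chi-square obtained, df, expected frequencies) for a
    goodness of fit test with proportions fully specified by H0."""
    n = sum(observed)
    expected = [p * n for p in proportions]
    chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return chi_sq, df, expected


# Proportions from the census seven years ago (H0: no change).
h0_proportions = [0.20, 0.30, 0.50]

# HYPOTHETICAL observed counts for Groups A, B, C, for illustration only.
hypothetical_counts = [12, 11, 16]

chi_sq, df, expected = goodness_of_fit(hypothetical_counts, h0_proportions)
print(f"expected frequencies: {[round(e, 1) for e in expected]}")
print(f"chi-square({df}) = {chi_sq:.3f}")
```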