2. Formulation of the null hypothesis.

This is a critical step in the hypothesis testing process. The process is called hypothesis testing because the null hypothesis is the hypothesis we test through the appropriate statistical procedures. A null hypothesis is formulated in direct response to the research hypothesis. The null hypothesis is sometimes referred to as the hypothesis of no difference; it is sometimes termed the hypothesis of equality.

We can think of the null hypothesis this way: the null hypothesis is the logical opposite of the research hypothesis. It is formulated such that if the research hypothesis is valid, the null hypothesis cannot be valid; if the null hypothesis is valid, the research hypothesis cannot be valid. The 2 formulations are mutually exclusive, that is, they cannot both be true at the same time. This is analogous to the possible verdicts available to a jury in a criminal trial. If a defendant in a criminal trial is found guilty, he/she cannot also be not guilty; if a defendant is found not guilty, he/she cannot also be guilty. Just as a criminal defendant may be either guilty or not guilty, so a null hypothesis may be valid or not valid. One formulation precludes the other.

We have previously discussed 3 forms of research hypotheses. Now, let's look at the corresponding null hypotheses. Null hypotheses are symbolized H0 (read "H sub zero").

1. Correlation null hypothesis. If the correlation research hypothesis asserts that 2 (or more) variables are correlated, that is, that the correlation coefficient is not zero, the null hypothesis asserts that the correlation coefficient is zero. It should be clear that the 2 statements cannot both be true at the same time; hence, if we decide to reject one hypothesis, we do not reject the other. From a previous slide show, recall this correlation research hypothesis: Age and income are correlated. The corresponding correlation null hypothesis is: Age and income are NOT correlated. In statistical symbols, the research hypothesis is r(age, income) ≠ 0.00; the null hypothesis is r(age, income) = 0.00. Remember, the Pearson correlation coefficient (used in SPSS) requires that both variables be measured at the interval/scale level.

2. Independence null hypothesis. If the research hypothesis asserts that 2 (or more) nominal or ordinal variables are related (there is a dependence relationship between them), the null hypothesis asserts that the 2 (or more) variables are NOT related; they are independent of each other. The chi-square test with which you are somewhat familiar can be run on cross-tabulated nominal or ordinal variables in SPSS; this statistic is a test of the independence of the variables in the cross-tabulation. In a previous slide show, we hypothesized that gender and voting preference are related, that is, they are not independent of each other. Our null hypothesis, then, is that gender and voting preference are not related; they are independent of each other.

3. Difference between means null hypothesis. If the difference between means research hypothesis asserts that there is a (real) difference between 2 (or more) population means, the null hypothesis asserts that there is NO difference between the 2 (or more) population means. The 3 research/null hypothesis pairs are summarized below, and the Venn diagram on the following slide illustrates the difference between means null hypothesis.
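As a compact summary of the 3 pairs just discussed (a sketch only, written in LaTeX notation; r stands for the correlation coefficient and µ for a population mean, as in these slides):

\begin{align*}
\text{Correlation:}\quad & H_1: r_{\text{age,income}} \neq 0.00 & H_0&: r_{\text{age,income}} = 0.00 \\
\text{Independence:}\quad & H_1: \text{gender and voting preference are related} & H_0&: \text{they are independent} \\
\text{Difference between means:}\quad & H_1: \mu_{\text{male}} \neq \mu_{\text{female}} & H_0&: \mu_{\text{male}} = \mu_{\text{female}}
\end{align*}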
[Venn diagram: the null hypothesis. One population with mean µ; Sample 1 (mean x̄1) and Sample 2 (mean x̄2) are both drawn from it. The 2 samples come from 1 population with 1 average; the sample averages differ only by chance (sampling error).]

As this diagram suggests, there may be 2 samples in our analysis and they may have different means, but the difference between their means (and the population mean) is only because of sampling error, the fact that it was these particular sets of respondents from the population who were chosen for the research. Had we chosen different samples of respondents from this population, we would no doubt have obtained different sample means. But the differences may still be the result of sampling error. The difference between means null hypothesis asserts that the difference between 2 (or more) sample means is the result of chance or sampling error. The samples come from 1 population with 1 mean.

As a reminder, the Venn diagram on the following slide represents the research hypothesis that the 2 samples come from 2 different populations with 2 different means. Note the symbols: µ (the Greek letter mu, pronounced "mew") is the population mean; x̄ (read "x-bar") is the sample mean.

[Venn diagram: the research hypothesis. Population 1 with mean µ1 and Population 2 with mean µ2; Sample 1 (mean x̄1) is drawn from Population 1 and Sample 2 (mean x̄2) from Population 2. 2 different samples with 2 different means come from 2 different populations with 2 different means.]

In our previous slide show, we presented this difference between means research hypothesis: Males and females earn different mean incomes. In math symbols: H1: µ(male) ≠ µ(female). The null hypothesis asserts that males and females do NOT earn different mean incomes. In alternative phrasing: Males and females earn the same mean income. Symbolically: H0: µ(male) = µ(female). Remember that in difference between means hypotheses, the dependent variable is measured at the interval/scale level. The independent variable may be measured at the nominal or ordinal level.

3. State the alpha (α) level.

Recall that in a previous slide show, we made an analogy between a criminal trial and hypothesis testing. After the criminal charge from the prosecutor and the plea from the defendant, the third step in the trial process was the requirement that the evidence presented at trial must prove the defendant's guilt beyond a reasonable doubt in order for the jury to reach a verdict of guilty. We presented a table to demonstrate the possible outcomes of the trial and their possible consequences for the defendant.

Verdict     | Defendant is actually not guilty          | Defendant is actually guilty
Guilty      | Error: an innocent defendant is punished  | Correct verdict
Not guilty  | Correct verdict                           | Error: a guilty defendant goes free

In the trial process, we are particularly concerned about the consequence of making the error of finding the defendant guilty when he/she is actually not guilty. In social science research, we too must render a verdict; we must make a decision to reject or not reject the null hypothesis. The following slide presents a table similar to this one as it is applied to the decision to reject or not reject a null hypothesis.

Decision       | Null hypothesis is true                      | Null hypothesis is not true
Reject H0      | Type I error, also known as α (alpha) error  | Correct decision
Not reject H0  | Correct decision                             | Type II error, also known as β (beta) error

Just as a jury does not have absolute certainty about the guilt or innocence of a criminal defendant, social science researchers will never know with certainty if they have made a correct decision about the null hypothesis. Hence, we have to be concerned about the probability of making an error in our decision to reject or not reject the null hypothesis.
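One way to see both sampling error and this error probability concretely is a small simulation. The sketch below is not part of the SPSS procedures used in this course; it assumes Python with numpy and scipy, and an invented population (mean 40000, standard deviation 12000). When 2 samples are drawn from that one population, their means almost always differ because of sampling error, and a "Sig." value at or below .05 still turns up in roughly 5 percent of the trials, which is precisely the Type I (α) error rate discussed next.

import numpy as np
from scipy import stats

rng = np.random.default_rng(42)          # fixed seed so the sketch is reproducible
pop_mean, pop_sd, n = 40000, 12000, 750  # hypothetical population and sample size

false_rejections = 0
trials = 10000
for _ in range(trials):
    # Both samples come from the SAME population, so the null hypothesis is true.
    sample1 = rng.normal(pop_mean, pop_sd, n)
    sample2 = rng.normal(pop_mean, pop_sd, n)
    t_stat, sig = stats.ttest_ind(sample1, sample2)   # "Sig. (2-tailed)" in SPSS terms
    if sig <= 0.05:                                   # decision rule with alpha = .05
        false_rejections += 1

print(false_rejections / trials)   # close to 0.05: the Type I (alpha) error rate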
Just as juries are concerned about making an error of finding an innocent defendant guilty, we are concerned about the probability of rejecting a null hypothesis when it is actually true. This is termed a Type I or α (alpha) error. We are willing to risk making a Type I or α error, but only at certain known levels of probability. In social science research, we use α (alpha) to symbolize these known levels of probability, and these levels are typically: 1) α = .05, meaning we are willing to risk making a Type I error 5 percent of the time; 2) α = .01, meaning we are willing to risk making a Type I error 1 percent of the time. In other research contexts, we may use different α levels, such as α = .10. But most research in criminal justice and sociology uses α levels of .05 or .01. Using these numbers gives researchers a bit of an advantage over juries. Jurors do not have objective standards about what constitutes proof "beyond a reasonable doubt"; researchers have specific α levels on which to base their decisions. The following table ties the criminal trial and social science research processes together.

Verdict / Decision          | Defendant is actually not guilty / Null hypothesis is true | Defendant is actually guilty / Null hypothesis is not true
Guilty / Reject H0          | Error: an innocent defendant is punished / Type I error    | Correct verdict / Correct decision
Not guilty / Not reject H0  | Correct verdict / Correct decision                          | Error: a guilty defendant goes free / Type II error

Now that we have an idea of what the concept of α levels means (it is the risk we are willing to take of rejecting a null hypothesis when it is actually true), how do we apply the concept to hypothesis testing? SPSS actually helps us apply the concept. The key is the term "Sig" in the SPSS output. We will examine this in subsequent slides. For now, understand that social science researchers clearly state their criterion for deciding to reject or not reject a null hypothesis. That statement is called an α level. [And this level is typically either .05 or .01.]

4. Collect data and run descriptive statistics on the sample.

Data collection techniques were presented in the Research Methods course. Descriptive statistics were covered in the first half of this course.

5. Run the appropriate inferential statistical test.

Before discussing "appropriate" inferential statistical tests, let us recall inferential statistics. From a previous slide show, we have the following Venn diagram.

[Diagram: a sample is selected from the population (SAMPLE SELECTION); descriptive statistics describe the sample. After describing the sample, we apply procedures of inferential statistics and, based on the results, we decide if we can generalize our data to the entire population (GENERALIZATION to POPULATION).]

As the diagram suggests, after we have selected a sample, collected data (as indicated in our research hypothesis), and computed descriptive statistics on the sample, the next task is to use the sample data to generalize to the larger population from which the sample was selected. This is analogous to a jury deliberating over the evidence presented at trial to try to determine if the evidence proves the defendant's guilt beyond a reasonable doubt. Just as the evidence will lead the jury to a verdict, so the results of the appropriate statistical test will lead the researcher to a decision about the null hypothesis. While the tests of the evidence used by the jury may be subjective, there are clearly appropriate tests to run on the data based on the research and null hypotheses. All of them come down to the same "Sig versus α" comparison, sketched below; guidelines for each type of hypothesis then follow.
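The decision rule itself is a single comparison of the "Sig" value reported by SPSS with the α level stated in Step 3. As a minimal sketch (written in Python rather than SPSS, purely for illustration), using the .082 and .000 values from the examples that follow:

def decide(sig, alpha=0.05):
    """Return the decision about the null hypothesis.

    sig   -- the "Sig." (p) value reported by the statistical test
    alpha -- the α level stated in Step 3 (typically .05 or .01)
    """
    if sig <= alpha:
        return "Reject the null hypothesis"
    return "Do not reject the null hypothesis"

print(decide(0.082, alpha=0.05))  # age/income correlation example: do not reject
print(decide(0.000, alpha=0.01))  # sex/voting chi-square example: reject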
1) Correlation hypotheses. We have seen that the "Correlate-Bivariate" command sequence in SPSS leads to a correlation matrix in which each variable in the analysis is correlated with all of the other variables. On the next slide is the correlation matrix from the "voter.xlsx" file we have used in class that shows the correlation between age and income.

Correlations
                               | age   | income
age    | Pearson Correlation   | 1     | -.045
       | Sig. (2-tailed)       |       | .082
       | N                     | 1500  | 1500
income | Pearson Correlation   | -.045 | 1
       | Sig. (2-tailed)       | .082  |
       | N                     | 1500  | 1500

Reading down the correlation matrix, we see that the block for each variable includes 3 rows. For the variable "age":
Pearson Correlation: this cell contains the correlation coefficient(s) for the variables in the analysis, in this illustration, age and income.
Sig. (2-tailed): this is the number we use to decide to reject or not reject the null hypothesis. (We will have more to say about this in a subsequent slide.)
N: this is the number of pairs of observations used in the computation of the Pearson correlation coefficient.

2) Independence hypotheses. Result of the chi-square test. "Sig": this is the number we use to decide to reject or not reject the null hypothesis.

3) Difference between means hypotheses. Result of the t test for the difference between means. "Sig. (2-tailed)": this is the number we use to decide to reject or not reject the null hypothesis.

6. Researcher decides to reject or not reject the null hypothesis.

How do you decide whether to reject or not reject a null hypothesis? In a previous slide, we stated that "Sig. (2-tailed)… is the number we use to decide to reject or not reject the null hypothesis." The rule(s) to follow in deciding to reject or not reject the null hypothesis can be summarized as follows.

6a. In the case of a correlation null hypothesis, in the correlation matrix, examine the number in the row headed "Sig". If this number is:
a. less than or equal to the number you stated in Step 3 above (State the α level), REJECT THE NULL HYPOTHESIS. Again, if you stated an α level of .05 and the number for the correlation in the "Sig" row is .05 or lower, your decision is to reject the null hypothesis.
b. greater than the number you stated in Step 3 above, do not reject the null hypothesis. Again, if you stated an α level of .01 and the number in the "Sig" row is .03, your decision is not to reject the null hypothesis.

Correlations
                               | age   | income
age    | Pearson Correlation   | 1     | -.045
       | Sig. (2-tailed)       |       | .082
       | N                     | 1500  | 1500
income | Pearson Correlation   | -.045 | 1
       | Sig. (2-tailed)       | .082  |
       | N                     | 1500  | 1500

Here the "Sig. (2-tailed)" number is .082; since this is greater than either .05 or .01, we do not reject the null hypothesis.

6b. In the case of an independence hypothesis, examine the "Chi-Square Tests" output. In the table, there is a column headed "Asymp. Sig. (2-sided)" and a row headed "Pearson Chi-Square". Examine the number in this cell; if this number is:
a. less than or equal to the number you stated in Step 3 above (State the α level), REJECT THE NULL HYPOTHESIS. Again, if you stated an α level of .05 and the number in the "Asymp. Sig. (2-sided)" cell is .05 or lower, your decision is to reject the null hypothesis.
b. greater than the number you stated in Step 3 above, do not reject the null hypothesis. Again, if you stated an α level of .01 and the number in the "Asymp. Sig. (2-sided)" cell is .03, your decision is not to reject the null hypothesis.

Chi-Square Tests
                              | Value   | df | Asymp. Sig. (2-sided)
Pearson Chi-Square            | 24.217a | 2  | .000
Likelihood Ratio              | 24.252  | 2  | .000
Linear-by-Linear Association  | 15.187  | 1  | .000
N of Valid Cases              | 1500    |    |
a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 98.71.

The "Asymp. Sig. (2-sided)" number is .000. Since this number is less than an α level of either .05 or .01, we reject the null hypothesis. We can conclude that the 2 variables, sex and voting preference, are not independent; they are related.
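For readers who want to see the 6a and 6b decisions reproduced outside SPSS, the sketch below uses Python with scipy. The age values, income values, and cross-tabulation counts are invented for illustration and are not from the class voter file; pearsonr returns the correlation and its "Sig. (2-tailed)" value, and chi2_contingency returns the chi-square statistic and its "Asymp. Sig. (2-sided)" value, each of which is then compared with α.

import numpy as np
from scipy import stats

alpha = 0.05

# 6a. Correlation hypothesis (hypothetical age and income values, not the voter file)
age    = np.array([23, 31, 45, 52, 38, 61, 27, 49, 35, 58])
income = np.array([41000, 52000, 30000, 47000, 55000, 39000, 60000, 28000, 50000, 43000])
r, sig = stats.pearsonr(age, income)             # r and its "Sig. (2-tailed)" value
print("r =", round(r, 3), "Sig. =", round(sig, 3),
      "->", "reject H0" if sig <= alpha else "do not reject H0")

# 6b. Independence hypothesis (hypothetical sex-by-voting-preference counts)
crosstab = np.array([[220, 180, 100],            # e.g., male counts across 3 preferences
                     [260, 140, 100]])           # e.g., female counts across 3 preferences
chi2, sig, df, expected = stats.chi2_contingency(crosstab)
print("chi-square =", round(chi2, 3), "df =", df, "Asymp. Sig. =", round(sig, 3),
      "->", "reject H0" if sig <= alpha else "do not reject H0")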
6c. In the case of a difference between means hypothesis, examine the "Independent Samples Test" output. There is a column headed "Sig. (2-tailed)" and a row headed "Equal variances assumed". Examine the number in this cell. If this number is:
a. less than or equal to the number you stated in Step 3 above (State the α level), REJECT THE NULL HYPOTHESIS. For example, if you stated an α level of .05 and the number in the "Sig" cell is .04, your decision is to reject the null hypothesis.
b. greater than the number you stated in Step 3 above, do not reject the null hypothesis. For example, if you stated an α level of .01 and the number in the "Sig" cell is .03, your decision is not to reject the null hypothesis.

Independent Samples Test (income)

Levene's Test for Equality of Variances: F = .432, Sig. = .511

t-test for Equality of Means
                            | t     | df       | Sig. (2-tailed) | Mean Difference  | Std. Error Difference | 95% CI Lower     | 95% CI Upper
Equal variances assumed     | -.209 | 1498     | .834            | -313.3952630474  | 1498.2777421308       | -3252.340273198  | 2625.5497471032
Equal variances not assumed | -.209 | 1402.814 | .835            | -313.3952630474  | 1502.1089669456       | -3260.017081586  | 2633.2265554911

Here the "Sig. (2-tailed)" number in the "Equal variances assumed" row is .834; since this is greater than either .05 or .01, we do not reject the null hypothesis. We cannot conclude that males and females earn different mean incomes.
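Finally, the 6c decision can also be sketched outside SPSS. The example below assumes Python with scipy, and the small income arrays are invented for illustration (they are not the class data); ttest_ind with equal_var=True corresponds to the "Equal variances assumed" row, and its "Sig. (2-tailed)" value is compared with α.

import numpy as np
from scipy import stats

alpha = 0.05

# Hypothetical incomes for two groups (not the class data set)
male_income   = np.array([32000, 41000, 38500, 45000, 29000, 52000, 47000, 36000])
female_income = np.array([31000, 43000, 39000, 44000, 30500, 50000, 46500, 37500])

# Independent samples t test, equal variances assumed (the row read in 6c)
t_stat, sig = stats.ttest_ind(male_income, female_income, equal_var=True)

print("t =", round(t_stat, 3), "Sig. (2-tailed) =", round(sig, 3))
if sig <= alpha:
    print("Reject H0: the mean incomes differ.")
else:
    print("Do not reject H0: the difference may be sampling error.")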