
2. Formulation of the null hypothesis.
This is a critical step in the hypothesis testing process. The process is called hypothesis testing
because the null hypothesis is the hypothesis we actually test through the appropriate statistical
procedures. A null hypothesis is formulated in direct response to the research hypothesis. The null
hypothesis is sometimes referred to as the hypothesis of no difference; it is sometimes termed the
hypothesis of equality. We can think of the null hypothesis this way.
The null hypothesis is the logical opposite of the research hypothesis.
It is formulated such that if the research hypothesis is valid, the null hypothesis cannot be valid.
If the null hypothesis is valid, the research hypothesis cannot be valid. The 2 formulations are
mutually exclusive, that is, they cannot both be true at the same time. This is analogous to the
possible verdicts available to a jury in a criminal trial. If a defendant in a criminal trial is found
guilty, he/she cannot also be not guilty. If a defendant is found not guilty, he/she cannot also be
guilty. Just as a criminal defendant may be either guilty or not guilty, so a null hypothesis may
be valid or not valid. One formulation precludes the other.
We have previously discussed 3 forms of research hypotheses. Now, let’s look at the
corresponding null hypotheses.
Null hypotheses are symbolized H0 (read H sub zero).
1. Correlation Null Hypothesis. If the correlation research hypothesis asserts that 2 (or more)
variables are correlated, that is, the correlation coefficient is not zero, the null hypothesis
asserts that it is zero. It should be clear that the 2 statements cannot both be true at the
same time. Hence, if we decide to reject one hypothesis, we do not reject the other. From a
previous slide show, recall this correlation research hypothesis: Age and income are
correlated. The corresponding correlation null hypothesis is: Age and income are NOT
correlated. In statistical symbols, the research hypothesis is H1: r(age, income) ≠ 0.00; the null
hypothesis is H0: r(age, income) = 0.00. Remember, the Pearson correlation coefficient (used in
SPSS) requires that both variables be measured at the interval/scale level.
2. Independence Null Hypothesis. If the research hypothesis asserts that 2 (or more) nominal
or ordinal variables are related (there is a dependence relationship between them), the null
hypothesis asserts that the 2 (or more) variables are NOT related; they are independent of
each other. The chi-square test, with which you are somewhat familiar, can be run with
cross-tabulated nominal or ordinal variables in SPSS; this statistic is a test of the independence
of the variables in the cross-tabulation. In a previous slide show, we hypothesized that gender
and voting preference are related, that is, they are not independent of each other. Our null
hypothesis, then, is that gender and voting preference are not related; they are independent
of each other.
3. Difference between Means Null Hypothesis. If the difference between means research
hypothesis asserts that there is a (real) difference between 2 (or more) population means,
the null hypothesis asserts that there is NO difference between the 2 (or more) population
means. The Venn diagram on the following slide illustrates the null hypothesis.
[Venn diagram: Null Hypothesis. One population with one mean (µ); Sample 1 (mean X̄1) and
Sample 2 (mean X̄2) are both drawn from it. Caption: 2 samples from 1 population with 1 average;
the sample averages differ only by chance (sampling error).]
As this diagram suggests, there may be 2 samples in our analysis and they may have different
means, but the difference between their means (and the population mean) is only because of
sampling error- the fact that it was these particular sets of respondents from the population
who were chosen for the research. Had we chosen different samples of respondents from this
population, we would no doubt have obtained different sample means. But the differences may
still be the result of sampling error. The difference between means null hypothesis asserts that
the difference between 2 (or more) sample means is the result of chance or sampling error. The
samples come from 1 population with 1 mean.
As a reminder, the Venn diagram on the following slide represents the research hypothesis that
the 2 samples come from 2 different populations with 2 different means.
Note the symbols: µ (pronounced "mew") is the population mean; X̄ (read "x-bar") is the sample mean.
[Venn diagram: Research Hypothesis. Population 1 (mean µ1) and Population 2 (mean µ2);
Sample 1 (mean X̄1) is drawn from Population 1 and Sample 2 (mean X̄2) from Population 2.
Caption: 2 different samples with 2 different means from 2 different populations with 2 different means.]
In our previous slide show, we presented this difference between means research hypothesis:
Males and females earn different mean incomes. In math symbols: H1: µ(male) ≠ µ(female). The null
hypothesis asserts that males and females do NOT earn different mean incomes. In alternative
phrasing: Males and females earn the same mean income. Symbolically: H0: µ(male) = µ(female).
Remember that in difference between means hypotheses, the dependent variable is measured
at the interval/scale level. The independent variable may be measured at the nominal or ordinal
level.
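To make the idea behind the difference between means null hypothesis concrete, here is a short Python sketch (not part of the original slides; the income figures and sample sizes are invented). It draws two samples from a single simulated population, so any difference between the sample means is due only to sampling error.

import numpy as np

rng = np.random.default_rng(42)

# One simulated population with one mean (hypothetical incomes).
population = rng.normal(loc=50_000, scale=12_000, size=100_000)

# Two samples drawn from that single population.
sample_1 = rng.choice(population, size=200, replace=False)
sample_2 = rng.choice(population, size=200, replace=False)

# The sample means (x-bars) differ slightly, but only by chance (sampling error),
# because both samples come from one population with one mean (mu).
print("Population mean (mu):", round(population.mean()))
print("Sample 1 mean (x-bar 1):", round(sample_1.mean()))
print("Sample 2 mean (x-bar 2):", round(sample_2.mean()))

Running the sketch repeatedly gives different sample means each time, which is exactly the point of the null hypothesis: observed differences between sample means can arise even when there is only one population mean.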
3. State the alpha (α) level.
Recall that in a previous slide show, we made an analogy between a criminal trial and hypothesis
testing. After the criminal charge from the prosecutor and the plea from the defendant, the
third step in the trial process was the requirement that the evidence presented at trial must
prove the defendant’s guilt beyond a reasonable doubt in order for the jury to reach a verdict of
guilty. We presented a table to demonstrate the possible outcomes of the trial and their
possible consequences for the defendant.
Verdict       | Defendant is actually not guilty           | Defendant is actually guilty
Guilty        | Error: an innocent defendant is punished   | Correct verdict
Not guilty    | Correct verdict                            | Error: a guilty defendant goes free
In the trial process, we are particularly concerned about the consequence of making the error of
finding the defendant guilty when he/she is actually not guilty. In social science research, we
too must render a verdict; we must make a decision to reject or not reject the null hypothesis.
The following slide presents a table similar to this one as it is applied to the decision to reject or
not reject a null hypothesis.
Decision        | Null hypothesis is true                        | Null hypothesis is not true
Reject H0       | Type I error, also known as α (alpha) error    | Correct decision
Not reject H0   | Correct decision                               | Type II error, also known as β (beta) error
Just as a jury does not have absolute certainty about the guilt or innocence of a criminal
defendant, social science researchers will never know with certainty if they have made a correct
decision about the null hypothesis. Hence, we have to be concerned about the probability of
making an error in our decision to reject or not reject the null hypothesis. And, just as juries are
concerned about making an error of finding an innocent defendant guilty, we are concerned
about the probability of rejecting a null hypothesis when it is actually true. This is termed a
Type I or α (alpha) error. We are willing to risk making a Type I or α error, but only at certain
known levels of probability. In social science research, we use α (alpha) to symbolize these
known levels of probability, and these levels are typically:
1) α = .05: we are willing to risk making a Type I error 5 percent of the time;
2) α = .01: we are willing to risk making a Type I error 1 percent of the time.
In other research contexts, we may use different α levels, such as α = .10. But most research in
criminal justice and sociology uses α levels of .05 or .01. Using these numbers gives researchers
a bit of an advantage over juries. Jurors do not have objective standards about what constitutes
proof “beyond a reasonable doubt”; researchers have specific α levels on which to base their
decisions. The following table ties the criminal trial and social science research processes
together.
Verdict / Decision           | Defendant is actually not guilty /        | Defendant is actually guilty /
                             | Null hypothesis is true                   | Null hypothesis is not true
Guilty / Reject H0           | Error (an innocent defendant is           | Correct verdict /
                             | punished) / Type I error                  | Correct decision
Not guilty / Not reject H0   | Correct verdict /                         | Error (a guilty defendant goes
                             | Correct decision                          | free) / Type II error
Now that we have an idea of what the concept of α levels means- it is the risk we are willing to
take of rejecting a null hypothesis when it is actually true- how do we apply the concept to
hypothesis testing? SPSS actually helps us apply the concept. The key is the term “Sig” in the
SPSS output. We will examine this in subsequent slides. For now, understand that social science
researchers clearly state their criterion for deciding to reject or not reject a null hypothesis.
That statement is called an α level. [And this level is typically either .05 or .01.]
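As a preview of how the α level is applied, here is a minimal Python sketch of the decision rule. This is only an illustration; the "Sig." value below is the one from the age/income correlation shown later in these slides, and the variable names are our own.

alpha = 0.05        # the α level stated by the researcher (Step 3)
sig = 0.082         # the "Sig." (probability) value reported by the statistical test

# Decision rule: reject the null hypothesis only when Sig. is less than or equal to α.
if sig <= alpha:
    print("Reject the null hypothesis.")
else:
    print("Do not reject the null hypothesis.")

With these values the sketch prints "Do not reject the null hypothesis," because .082 is greater than .05.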
4. Collect data and run descriptive statistics on the sample.
Data collection techniques were presented in the Research Methods course.
Descriptive statistics were covered in the first half of this course.
5. Run the appropriate inferential statistical test.
Before discussing “appropriate” inferential statistical tests, let us recall inferential statistics.
From a previous slide show, we have the following Venn diagram.
[Venn diagram: a sample is selected from the population (SAMPLE SELECTION); descriptive statistics
describe the sample; after describing the sample, we apply procedures of inferential statistics and,
based on the results, we decide if we can generalize our data to the entire population
(GENERALIZATION to POPULATION).]
As the diagram suggests, after we have selected a sample, collected data (as indicated in our
research hypothesis), and computed descriptive statistics on the sample, the next task is to use
the sample data to generalize to the larger population from which the sample was selected.
This is analogous to a jury deliberating over the evidence presented at trial to try to determine if
the evidence proves the defendant’s guilt beyond a reasonable doubt. Just as the evidence will
lead the jury to a verdict, so the results of the appropriate statistical test will lead the
researcher to a decision about the null hypothesis. While the tests of the evidence used by the
jury may be subjective, there are clearly appropriate tests to run on the data based on the
research and null hypotheses. Here are guidelines.
1) Correlation hypotheses.
We have seen that the “Correlate-Bivariate” command sequence in SPSS leads to a correlation
matrix in which each variable in the analysis is correlated with all of the other variables. On the
next slide is the correlation matrix from the “voter.xlsx” file we have used in class that shows the
correlation between age and income.
Correlations

                                age        income
age      Pearson Correlation    1          -.045
         Sig. (2-tailed)                   .082
         N                      1500       1500
income   Pearson Correlation    -.045      1
         Sig. (2-tailed)        .082
         N                      1500       1500
Reading down the correlation matrix, we see that the block of rows for each variable (here, "age"
and then "income") includes 3 rows.
Pearson Correlation: this cell contains the correlation coefficient(s) for the variables in the
analysis, in this illustration, age and income.
Sig. (2-tailed): this is the number we use to decide to reject or not reject the null hypothesis.
(We will have more to say about this in a subsequent slide.)
N: this is the number of pairs of observations used in the computation of the Pearson Correlation
coefficient.
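For readers who want to reproduce a matrix like this outside SPSS, the following Python sketch uses pandas on made-up age and income values (the actual voter.xlsx data are not reproduced here, and pandas reports only the coefficients, not the Sig. values or N).

import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Hypothetical interval/scale variables standing in for the class data file.
df = pd.DataFrame({
    "age": rng.integers(18, 90, size=1500),
    "income": rng.normal(45_000, 15_000, size=1500),
})

# Pearson correlation matrix, analogous to SPSS's Correlate-Bivariate output.
print(df.corr(method="pearson"))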
2) Independence hypotheses.
The output of the chi-square test includes a "Sig." value; this is the number we use to decide to
reject or not reject the null hypothesis.
3) Difference between means hypotheses.
The output of the t test for the difference between means includes a "Sig. (2-tailed)" value; this
is the number we use to decide to reject or not reject the null hypothesis.
6. Researcher decides to reject or not reject null hypothesis.
How do you decide whether to reject or not reject a null hypothesis? In a previous slide, we stated
that “Sig. (2-tailed)… is the number we use to decide to reject or not reject the null
hypothesis.” The rule(s) to follow in deciding to reject or not reject the null hypothesis can be
summarized as follows.
6a. In the case of a correlation null hypothesis, in the correlation matrix, examine the number
in the row headed "Sig. (2-tailed)". If this number is:
a. less than or equal to the number you stated in Step 3 above (State the α level),
REJECT THE NULL HYPOTHESIS. Again, if you stated an α level of .05 and the number
for the correlation in the "Sig." row is .05 or lower, your decision is to reject the null
hypothesis.
b. greater than the number you stated in Step 3 above, do not reject the null
hypothesis. Again, if you stated an α level of .01 and the number in the "Sig." row is .03,
your decision is not to reject the null hypothesis.
Correlations

                                age        income
age      Pearson Correlation    1          -.045
         Sig. (2-tailed)                   .082
         N                      1500       1500
income   Pearson Correlation    -.045      1
         Sig. (2-tailed)        .082
         N                      1500       1500
Here the “Sig. (2-tailed)” number is .082; since this is greater than either .05 or .01, we do not
reject the null hypothesis.
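The same decision can be sketched in Python with scipy.stats.pearsonr, whose second return value plays the role of "Sig. (2-tailed)". The age and income arrays below are simulated, so the exact numbers will differ from the SPSS output above.

import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(7)
age = rng.integers(18, 90, size=1500)
income = rng.normal(45_000, 15_000, size=1500)   # simulated, essentially unrelated to age

r, sig = pearsonr(age, income)   # sig corresponds to "Sig. (2-tailed)"

alpha = 0.05
print(f"Pearson r = {r:.3f}, Sig. (2-tailed) = {sig:.3f}")
if sig <= alpha:
    print("Reject the null hypothesis: age and income are correlated.")
else:
    print("Do not reject the null hypothesis.")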
6b. In the case of an independence hypothesis, examine the "Chi-Square Tests" output. In the
table, there is a column headed "Asymp. Sig. (2-sided)" and a row headed "Pearson Chi-Square".
Examine the number in this cell; if this number is:
a. less than or equal to the number you stated in Step 3 above (State the α level),
REJECT THE NULL HYPOTHESIS. Again, if you stated an α level of .05 and the number
in this cell is .05 or lower, your decision is to reject the null hypothesis.
b. greater than the number you stated in Step 3 above, do not reject the null
hypothesis. Again, if you stated an α level of .01 and the number in this cell is .03,
your decision is not to reject the null hypothesis.
Chi-Square Tests

                                Value      df    Asymp. Sig. (2-sided)
Pearson Chi-Square              24.217a    2     .000
Likelihood Ratio                24.252     2     .000
Linear-by-Linear Association    15.187     1     .000
N of Valid Cases                1500

a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 98.71.
The "Asymp. Sig. (2-sided)" number is .000. Since this number is less than an α level of either
.05 or .01, we reject the null hypothesis. We can conclude that the 2 variables, sex and voting
preference, are not independent; they are related.
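A comparable chi-square test of independence can be run in Python with scipy.stats.chi2_contingency. The cross-tabulation counts below are invented for illustration and are not the counts behind the table above.

from scipy.stats import chi2_contingency

# Hypothetical gender-by-voting-preference cross-tabulation
# (rows: gender categories; columns: voting preference categories).
observed = [
    [220, 180, 100],
    [150, 250, 100],
]

chi2, sig, dof, expected = chi2_contingency(observed)

alpha = 0.05
print(f"Pearson chi-square = {chi2:.3f}, df = {dof}, Asymp. Sig. = {sig:.4f}")
if sig <= alpha:
    print("Reject the null hypothesis: the variables are not independent.")
else:
    print("Do not reject the null hypothesis: the variables are independent.")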
6c. In the case of a difference between means hypothesis, examine the "Independent Samples Test"
output. There is a column headed "Sig. (2-tailed)" and a row headed "Equal variances assumed".
Examine the number in this cell. If this number is:
a. less than or equal to the number you stated in Step 3 above (State the α level),
REJECT THE NULL HYPOTHESIS. For example, if you stated an α level of
.05 and the number in the "Sig." cell is .04, your decision is to reject
the null hypothesis.
b. greater than the number you stated in Step 3 above, do not reject the null
hypothesis. For example, if you stated an α level of .01 and the number in the "Sig." cell
is .03, your decision is not to reject the null hypothesis.
Independent Samples Test (dependent variable: income)

Levene's Test for Equality of Variances: F = .432, Sig. = .511

t-test for Equality of Means:

Equal variances assumed:      t = -.209, df = 1498, Sig. (2-tailed) = .834,
                              Mean Difference = -313.3952630474,
                              Std. Error Difference = 1498.2777421308,
                              95% Confidence Interval of the Difference: -3252.340273198 to 2625.5497471032

Equal variances not assumed:  t = -.209, df = 1402.814, Sig. (2-tailed) = .835,
                              Mean Difference = -313.3952630474,
                              Std. Error Difference = 1502.1089669456,
                              95% Confidence Interval of the Difference: -3260.017081586 to 2633.2265554911

Here the "Sig. (2-tailed)" number in the "Equal variances assumed" row is .834; since this is
greater than an α level of either .05 or .01, we do not reject the null hypothesis.
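Finally, a difference between means test of this kind can be sketched in Python with scipy.stats.ttest_ind; its p-value corresponds to "Sig. (2-tailed)" in the "Equal variances assumed" row. The male and female incomes below are simulated from a single distribution, so the numbers will not match the SPSS table above.

import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(3)

# Simulated incomes: both groups are drawn from the same population,
# so the null hypothesis is true by construction.
male_income = rng.normal(42_000, 20_000, size=750)
female_income = rng.normal(42_000, 20_000, size=750)

t_stat, sig = ttest_ind(male_income, female_income)   # equal variances assumed by default

alpha = 0.05
print(f"t = {t_stat:.3f}, Sig. (2-tailed) = {sig:.3f}")
if sig <= alpha:
    print("Reject the null hypothesis: the mean incomes differ.")
else:
    print("Do not reject the null hypothesis.")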