Obtaining Exact Significance Levels With SPSS

It is so easy to obtain exact significance levels with SPSS that I shall expect you to obtain exact p values for all tests of statistical significance which you conduct using the normal curve, the binomial distribution, the Chi-square distribution, and the t and F distributions, even when you conduct them by hand (when you do the complete analysis in SPSS, you will get the p values automatically as part of the statistical output).
Boot up SPSS and click HELP, TOPICS, INDEX. Enter the letters “CUM.” Select the index entry “Cumulative Distribution Functions” and then click DISPLAY. Click PRINT to print a very terse lesson on how to use the functions that allow you to get p values for specified probability distributions.
The Standard Normal Distribution
First we are going to find the lower-tailed p value for a Z score of -1.96. SPSS will not let us use
the COMPUTE function, which is employed to find p values, without having a data set open, so we
shall make a little dummy data set. In the top left cell in the data editor, enter the score 0 and then hit
the cursor down key. You now have a data set open with one score in it. Click TRANSFORM,
COMPUTE. The “Compute Variable” dialog box appears. In the “Target Variable” field, enter the
letter P. In the “Numeric Expression” field type in CDFNORM(-1.96) and then click OK.
You are asking SPSS to find the lower-tailed probability of a standard normal Z score of -1.96. Of course, you already know that the probability is .025. SPSS shows you, in the second column, named P, that the probability is .02. SPSS has rounded to two decimal places.
Go to the Variable View and tell SPSS to give you five decimal places for the P variable.
Return to the data view and you see that the p is .02500.
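If you would like to check such a value outside of SPSS, the same lower-tailed probability can be computed from the error function. Here is a minimal sketch in Python (my own aside; Python is not part of this SPSS lesson):

```python
import math

def cdfnorm(z):
    """Lower-tailed p for a standard normal z score, analogous to SPSS's CDFNORM."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

print(round(cdfnorm(-1.96), 5))  # prints 0.025
```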
Binomial Distribution
You wish to determine if mothers can identify their babies from the scent of the baby. Your
research participants are mothers and their newborn babies in a maternity ward. You take shirts that
the babies have worn for a while and stuff them into cardboard tubes. To each of 17 mothers you
present two tubes, one that contains her baby’s shirt, one that contains another baby’s shirt. You ask
her to sniff both tubes and ‘guess’ which one has her baby’s shirt in it. Thirteen of the 17 mothers
correctly identify their baby, four do not. This example is based on actual research, by the way. If
mothers really cannot identify their babies by scent alone, but just guess on a task like this, what is
the probability of as few as 4 out of 17 having guessed incorrectly? This is a question that involves
the binomial distribution, a distribution in which the variable has only two possible outcomes (in this
case, a correct identification or an incorrect identification). To obtain the binomial probability, we
TRANSFORM, COMPUTE, and tell SPSS to set the value of P to CDF.BINOM(4,17,.5). The 4 is the observed number of mothers who failed to identify their baby correctly, the 17 is the number of mothers who tried to do it, and the .5 is the probability of a correct identification if a mother were just “blindly guessing.” SPSS tells us that the probability of so few incorrect identifications if the mothers
were really just blindly guessing is only .025. That should convince us that mothers really can identify
their babies on the basis of scent alone.
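You can cross-check this cumulative binomial probability by summing the binomial probabilities directly. A short Python sketch (again my own aside, not part of the SPSS lesson):

```python
import math

def cdf_binom(q, n, p):
    """P(X <= q) for X ~ Binomial(n, p), analogous to SPSS's CDF.BINOM."""
    return sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(q + 1))

# 4 or fewer incorrect identifications out of 17, guessing at p = .5
print(round(cdf_binom(4, 17, 0.5), 5))  # prints 0.02452
```

To more decimal places the probability is .02452, which rounds to the .025 reported above.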
Chi-Square Distribution
Suppose that we wish to determine whether or not three different types of therapy differ with
respect to their effectiveness in relieving symptoms of chronic anxiety. Our independent variable is
type of therapy and the dependent variable is whether or not the patient reports having experienced a
reduction in anxiety after three months of treatment. We analyze the contingency table with a
Pearson Chi-Square and obtain a value of 5.99 for the test statistic. Is this significant? In the
“Numeric Expression” field of the Compute Variable dialog box we enter CDF.CHISQ(5.99,2) and
click OK. SPSS tells us that the lower-tailed p is .94996 -- but we need an upper-tailed p for this application of chi-square, so we subtract from one and obtain a p of .05004. Technically, this is not quite less than or equal to the holy criterion of .05, but we round it off to four decimal places and conclude that the three types of therapy do differ significantly in effectiveness.
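As it happens, with 2 degrees of freedom the chi-square CDF has the simple closed form 1 - e^(-x/2), so this particular p is easy to verify by hand or with a few lines of Python (my aside, not part of the SPSS lesson):

```python
import math

def cdf_chisq_df2(x):
    """Lower-tailed chi-square CDF for df = 2: 1 - exp(-x/2)."""
    return 1.0 - math.exp(-x / 2.0)

lower = cdf_chisq_df2(5.99)
print(round(lower, 5), round(1.0 - lower, 5))  # prints 0.94996 0.05004
```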
Student’s t Distribution
We wish to determine whether sick animals who have been given a certain medical treatment
differ from those who have received a placebo treatment with respect to the severity of their illness
after treatment. A t test yields a value of t = 2.23 on 10 degrees of freedom. To get the p value, we
enter in the “Numeric Expression" field of the Compute Variable dialog box CDF.T(-2.23,10) and
obtain a one-tailed p of .02492. Since we desire a two-tailed p, we double this, obtaining .0498, and
conclude that the treated animals did differ significantly from those who received only a placebo.
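The t CDF has no elementary closed form, but you can reproduce the one-tailed p by numerically integrating the t density. A Python sketch (my own cross-check, not part of the SPSS lesson):

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1.0 + t * t / df) ** (-(df + 1) / 2)

def cdf_t(t, df, steps=10000):
    """Lower-tailed p, analogous to SPSS's CDF.T, via Simpson's rule on [0, |t|]."""
    h = abs(t) / steps
    s = t_pdf(0.0, df) + t_pdf(abs(t), df)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = s * h / 3.0  # P(0 < T < |t|), by symmetry of the density
    return 0.5 - area if t < 0 else 0.5 + area

p_one_tailed = cdf_t(-2.23, 10)
print(round(p_one_tailed, 5), round(2 * p_one_tailed, 4))  # prints 0.02492 0.0498
```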
F Distribution
We have conducted an Analysis of Variance to determine if students in three different majors
differ with respect to how many hours they say they spend studying in a typical week. We obtain an F
of 3.33 on 2, 27 degrees of freedom. To get the p value, we enter in the “Numeric Expression" field of
the Compute Variable dialog box CDF.F(3.33,2,27) and obtain a lower-tailed p value of .95203.
Since this application of the F statistic requires an upper-tailed p, we subtract from one and obtain a p
value of .048. We conclude that the majors do differ significantly on how many hours they report they
spend studying.
If you have more than one value of F, you can get all of the p values in one run by entering F, df1, and df2 as data and then pointing the CDF function to those variables rather than to constants. Tell SPSS to subtract the result from 1 so that you get the upper-tailed p values directly.
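You can mimic this batch computation outside SPSS as well. When the numerator df is 2, as in the example above, the F CDF has the closed form 1 - (1 + 2F/df2)^(-df2/2); for other numerator df you would need the incomplete beta function. A Python sketch looping over several F values (the F values below are hypothetical, made up purely for illustration):

```python
import math

def upper_p_f_num2(f, df2):
    """Upper-tailed p for an F statistic with numerator df = 2.
    Closed form: P(F > f) = (1 + 2 f / df2) ** (-df2 / 2)."""
    return (1.0 + 2.0 * f / df2) ** (-df2 / 2.0)

# Hypothetical F values, all on 2 and 27 degrees of freedom:
for f in (0.90, 3.50, 6.20):
    print(f, round(upper_p_f_num2(f, 27), 5))
```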
Karl L. Wuensch, Dept. of Psychology,
East Carolina University, Greenville, NC 27858 USA
July, 2014