Reading Statistics:
The SPSSPC "Frequencies" Printout
Frequencies tell you both the number and the percentage of all parents who selected each response. Following is an example:
21 Caused Cut Back on Alcohol

                   Frequency   Percent   Valid Percent   Cumulative Percent
Valid     No            51       39.8         41.5              41.5
          Yes           72       56.3         58.5             100.0
          Total        123       96.1        100.0
Missing   System         5        3.9
Total                  128      100.0
The column labeled "frequency" lists the actual number of parents who selected each response. In the example, 51 parents answered "no" and 72 answered "yes."
The column labeled "percent" lists the actual percentages of the total sample who answered either "yes" or "no."
"Valid percent" is the percent when missing data are excluded from the calculations.
In other words, these are the percentages of parents who selected each response after we disregard missing responses.
The row labeled "missing system" shows the number and the percentage of parents who either did not answer the question, or whose answer was invalid. In the example, 5 parents gave no answer (3.9% of the total).
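The columns described above can be recomputed from the raw counts. The following is a minimal Python sketch, not part of SPSS; the function names `pct` and `frequency_rows` are made up for illustration, and rounding halves upward is an assumption chosen so that the output agrees with the printout (for example, 56.25 displays as 56.3).

```python
from decimal import Decimal, ROUND_HALF_UP

def pct(part, whole):
    """Percentage to one decimal place, with exact halves rounded up (assumed)."""
    value = Decimal(part) * 100 / Decimal(whole)
    return float(value.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP))

def frequency_rows(counts, n_missing):
    """Return (label, frequency, percent, valid percent, cumulative percent) rows."""
    n_valid = sum(counts.values())   # parents who gave a usable answer
    n_all = n_valid + n_missing      # everyone in the sample
    rows = []
    running = 0
    for label, freq in counts.items():
        running += freq
        rows.append((
            label,
            freq,
            pct(freq, n_all),        # "Percent": share of the total sample
            pct(freq, n_valid),      # "Valid Percent": missing cases excluded
            pct(running, n_valid),   # "Cumulative Percent": running valid share
        ))
    return rows

# Counts from the "Caused Cut Back on Alcohol" printout: 51 no, 72 yes, 5 missing.
for row in frequency_rows({"No": 51, "Yes": 72}, 5):
    print(row)
```

Each printed row should line up with the "No" and "Yes" rows of the printout.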
What numbers do you report?
For percentages in "yes/no" questions, such as the one above, report the "percent."
This makes the assumption that people who didn't answer are really giving a "no" answer. Thus we are using the total sample.
Report only the "percent" of people who answered "yes." This is how a report of these results might read: "Over half of the parents (56%) reported that reading the newsletter helped convince them to cut back on alcohol." The bar graph (labeled "Figure 7") shows the percentage of respondents answering "yes" to the questions of self-reported effects of reading the newsletter. The first bar represents the 56% of "yes" respondents from the table above.
For questions 1 - 16 you should also report the "percent" since we want to know the percentage of the total sample who selected each response.

[Figure 7. Bar graph: "Reading the newsletters helped convince me to..." Horizontal axis: Percentage of Respondents (0% to 60%).]
  56%  "Cut back on alcohol."
  46%  "Eat more healthy foods."
  44%  "Take my prenatal vitamins more."
  38%  "Breast feed my baby."
  29%  "Keep all my prenatal clinic appointments."
  26%  "Cut back on smoking."
For the percentages of socio-demographic variables (questions 26-31), you should use the entries under the column "valid percent." Valid percent is used for these variables, rather than percent, because we assume that the missing cases are distributed proportionately among all the response categories. This allows us to make pie charts that add up to 100%.
31 Family Income

                           Frequency   Percent   Valid Percent   Cumulative Percent
Valid     Less than 14K         2         1.6         1.6               1.6
          14K - 19,999          8         6.3         6.5               8.1
          20K - 49,999         52        40.6        41.9              50.0
          50K or more          62        48.4        50.0             100.0
          Total               124        96.9       100.0
Missing   System                4         3.1
Total                         128       100.0
In the frequency table above, 124 of the 128 parents had valid data, while 4 families (3.1%) had missing data. You can see the "valid percents" are slightly higher than the "percents" because the 4 missing cases have been removed from consideration. The bar chart labeled "Figure 3" demonstrates how these percentages would be graphed in the final report.
Note that sometimes you need to do some recalculating on your own. Here is an example from the Frequencies table above: "Eight percent of respondents reported their family income to be less than $20,000 per year." The 8% figure comes from adding the valid percents from two rows (1.6 for those less than 14K and 6.5 for those from 14K to 19,999), which gives 8.1, the same number shown in the "cumulative percent" column.
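The same recalculation can be done from the raw frequencies instead of the already-rounded percents. This is a small Python sketch under that assumption; the helper name `combined_valid_percent` is hypothetical and the counts are those in the Family Income table.

```python
# Valid (non-missing) frequencies from the Family Income printout.
income_counts = {
    "Less than 14K": 2,
    "14K - 19,999": 8,
    "20K - 49,999": 52,
    "50K or more": 62,
}

def combined_valid_percent(counts, labels):
    """Valid percent for several response categories taken together."""
    n_valid = sum(counts.values())                      # 124 valid cases
    n_selected = sum(counts[label] for label in labels)
    return 100 * n_selected / n_valid

under_20k = combined_valid_percent(
    income_counts, ["Less than 14K", "14K - 19,999"]
)
print(f"{under_20k:.1f}%")   # 10 of the 124 valid cases
```

Working from counts avoids stacking rounding errors when several rows are combined; here the result agrees with the sum of the two printed valid percents.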
In the final report, press releases, and other public documents, we recommend rounding percentages to the nearest whole number. We lose a little accuracy this way, but avoid frightening math-phobic people. So we would round 14.4 to 14 and 14.6 to 15. But which way would you round 14.5? Here is the rule of thumb we follow: round .5 to the nearest even number. Therefore, 14.5 would round down to 14, but 15.5 would round up to 16, the nearest even number. In this way there is no systematic bias upward or downward.
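The round-to-even rule of thumb can be expressed in a few lines of Python. This is only an illustrative sketch; the function name `round_for_report` is made up, and it relies on the standard library's `decimal` module and its `ROUND_HALF_EVEN` mode.

```python
from decimal import Decimal, ROUND_HALF_EVEN

def round_for_report(value):
    """Round a percentage to a whole number, sending exact .5 cases to the even neighbor."""
    # Going through str() keeps a value typed as 14.5 exactly 14.5.
    return int(Decimal(str(value)).quantize(Decimal("1"), rounding=ROUND_HALF_EVEN))

for p in (14.4, 14.5, 14.6, 15.5):
    print(p, "->", round_for_report(p))
```

Python's built-in `round()` also rounds halves to the even neighbor, but it works on binary floating-point values, so the `decimal` version is the safer choice when the input comes straight from a printout.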
[Figure 3. Bar chart of the family income percentages. Horizontal axis: Percentage of Respondents (0% to 60%).]
  50%  $50,000 or more
  42%  $20,000 to 49,999
   6%  $14,000 to 19,999
   2%  Less than $14,000
Reading Statistics:
The SPSSPC "Crosstabs" Printout
Cross-tabulations are called "crosstabs" for short, and literally tabulate one variable across another. In the example below we have cross-tabulated parents' self-report responses of whether reading the newsletter helped convince them to cut back on alcohol with the variable called 2-LEVEL RISK VAR.
Every statistical software system prints out a crosstab a little differently, so don't be surprised if you need to relearn how to read this. We'll walk you through it here.
2-LEVEL RISK VAR * CUT BACK ON ALCOHOL

Crosstab
                                                    Cut Back on Alcohol
                                                    No        Yes       Total
2-LEVEL     NON-RISK        Count                   23        15        38
RISK VAR                    % within 2-Level        60.5%     39.5%     100.0%
            PRIMIS & RISK   Count                   28        57        85
                            % within 2-Level        32.9%     67.1%     100.0%
Total                       Count                   51        72        123
                            % within 2-Level        41.5%     58.5%     100.0%

Chi-Square Tests
                          Value    Df    Asymp. Sig.    Exact Sig.    Exact Sig.
                                         (2-sided)      (2-sided)     (1-sided)
Pearson Chi-Square        8.233    1     .004
Continuity Correction     7.136    1     .008
First, note that it tells you right at the top which two variables are being cross-tabulated (i.e., 2-LEVEL RISK VAR * CUT BACK ON ALCOHOL).
Parents answered either "yes" or "no" that reading the newsletter convinced them to cut back on alcohol. The first two columns in the crosstab represent these answers: the first column for the "no" answers (which we coded with the number "1") and the second column with the "yes" answers ("2").
The "2-level risk" classifications are shown on the left side of the printout, with one row for each classification. With two possible group classifications (“Non-Risk” and “Primi & Risk”) and two possible answers as to whether reading the newsletters helped convince them to cut back on alcohol (either yes or no), we have a total of 2 x 2 = 4 cells in the table (not counting the "total" cells on the margins of the table).
The first number in each cell is the "count" of the number of parents who fit the two conditions for that cell. For example, the upper right cell shows that 15 non-risk parents answered "yes," that reading the newsletter helped convince them to cut back on alcohol.
The second number in each cell is the row percentage. In the example, looking at the upper right hand cell, 39.5% (or 40% rounded) of the parents who were considered non-risk reported that they changed their behavior as a result of reading the newsletter (i.e., answered "yes"). In the next cell down, one can see that a greater percentage of "primi & risk" parents (67.1%, or 67% rounded) felt that reading the newsletter helped convince them to cut back on alcohol. These are the most interesting numbers in the crosstabs table because they show the newsletter was more effective at bringing about behavior change (self-reported) in risk group parents compared to those who were non-risk.
In the final report, these percentages are graphed as shown in the sample bar graph labeled Figure 8. Looking at the first item ("Convince me to cut back on alcohol"), 40% for the “Non-Risk” group is compared to 67% for the “Risk Group”.
The bottom row and far right column on the crosstabs table show the row and column total "counts" and "percentages". In the example, you can see that a total of 72 parents (15 and 57) reported that reading the newsletter convinced them to cut back on alcohol, representing 58.5% (58% rounded) of the 123 parents who responded.
These are the same numbers as in the frequency table shown before. The crosstab simply provides a frequency count within each "group" classification.
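The row percentages and totals walked through above can be recomputed from the four cell counts. This Python sketch is illustrative only; the function names are hypothetical. The counts 15, 28, and 57 and the column totals 51 and 72 come from the printout and text, while the Non-Risk "no" count of 23 is inferred here as 51 minus 28.

```python
# Cell counts: rows are risk groups, columns are answers.
crosstab = {
    "Non-Risk": {"No": 23, "Yes": 15},       # 23 inferred as 51 - 28
    "Primi & Risk": {"No": 28, "Yes": 57},
}

def with_total_row(table):
    """Add a "Total" row holding the column sums."""
    total = {}
    for answers in table.values():
        for answer, count in answers.items():
            total[answer] = total.get(answer, 0) + count
    return {**table, "Total": total}

def row_percentages(table):
    """Convert each row of counts into percentages of that row's total."""
    result = {}
    for group, answers in table.items():
        row_total = sum(answers.values())
        result[group] = {
            answer: round(100 * count / row_total, 1)
            for answer, count in answers.items()
        }
    return result

percentages = row_percentages(with_total_row(crosstab))
for group, answers in percentages.items():
    print(group, answers)
```

The "Total" row reproduces the 41.5% and 58.5% valid percents from the frequency table, which is a quick check that the crosstab and the frequencies describe the same 123 parents.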
When I get a printout like this one, I will usually circle the "yes" percents, and then write a sentence and/or draw a picture that summarizes what the numbers mean.
Writing on your printouts is a terrific way to begin the magical process of turning sterile numbers into an interesting story. For this printout I might write something like "greater impact on “at-risk” parents." Sometimes I draw a crude bar graph on the printout that is later realized in the final report like the one labeled Figure 8.
Is this a significant difference between the two groups? Our printout reports a simple Chi-square test of independence in the "Chi-Square Tests" table. It gives us a sense of whether the observed differences by category (Non-Risk and Risk Group) are likely to be real, or just fluctuations due to chance. In this example, a total of 58.5% of parents report that the newsletter helped convince them to cut back on alcohol. If group classification is unrelated to reports of behavior change due to reading the newsletter, we should expect about the same percentage of parents in each group (i.e., about 58%) to report that the newsletter convinced them to cut back on alcohol. As the percentages diverge farther and farther from this expectation (down to 40% for one group and up to 67% for the other), the results look more and more reliable to us and less likely to be due to chance.
The Chi-square test tells us the probability of seeing a difference at least this large by chance alone if the two variables were really unrelated. This probability appears under the label "Asymp. Sig." in the Pearson Chi-Square row. If the significance were .05, this would mean that the probability of obtaining a difference this large by chance alone was 5 times in 100. As a rule of thumb, most researchers think anything this rare, or more rare, is "statistically significant." In the example shown, the probability of observing this difference between Non-Risk and Risk Group parents by chance alone was about .004, or about four in one thousand, a highly significant result!
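For readers who want to see where the Pearson Chi-Square value comes from, here is a minimal Python sketch of the standard calculation for a 2 x 2 table. It is not how SPSS is implemented; the function name is hypothetical, the Non-Risk "no" count of 23 is inferred from the column total (51 minus 28), and the closed-form probability used here applies only when there is 1 degree of freedom.

```python
import math

def pearson_chi_square_2x2(table):
    """Pearson chi-square statistic and 2-sided probability for a 2x2 table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count we would expect in this cell if group and answer were unrelated.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Rows: Non-Risk, Primi & Risk. Columns: No, Yes.
observed = [[23, 15], [28, 57]]
chi2, p = pearson_chi_square_2x2(observed)
print(f"chi-square = {chi2:.3f}, p = {p:.3f}")
```

With these counts the sketch gives a value of 8.233 and a probability of about .004, in line with the Pearson Chi-Square row of the printout.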
Note: the other statistics reported in the Chi-Square Test table are not relevant for our purposes.
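[Figure 8. Bar graph comparing the Non-Risk and Risk Group percentages for each item. Horizontal axis: Percentage Agreeing (0% to 60%).]
  "Cut back on alcohol."  (40% Non-Risk, 67% Risk Group)
  "Eat more healthy foods."
  "Take my prenatal vitamins more."
  "Breast feed my baby."
  "Keep all my prenatal clinic appointments."
  "Cut back on smoking."
  Remaining bar values for the other five items: 36%, 56%, 37%, 50%, 35%, 43%, 26%, 36%, 24%, 27%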