Basic Data Analysis Guidelines
for Research Students
Isaac V. Gusukuma, Ph.D., LMSW-IPR, ACSW
University of Mary Hardin-Baylor
Social Work Program
January 30, 2012
Reproduction of any part of the guidelines is not permitted without the author’s permission.
August, 2008
Table of Contents
Introduction
   Organization of the Guide
Basic Guidelines for Constructing a Survey Question
   Constructing Your Response Categories - Establishing Your Level of Measurement
   Associating Response Categories of a Question to Statistical Procedures
Basic Guidelines for Analyzing Data
Data Analysis: Making Sense of Those Numbers
   Check To Be Sure Your Data Is Accurate
   Conducting a Frequencies Analysis for Each Variable
   Example of a Survey Question and SPSS Frequencies Output for the Variable SEX
Univariate Data Analysis
   Analysis of a Nominal Level Variable
      Example of a survey question and SPSS output for a nominal level variable
   Analysis of an Ordinal Level Variable
      Example of a survey question and SPSS output for an ordinal level variable
   Analysis of an Interval/Ratio Level Variable
      Example of a survey question and SPSS output for an interval level variable
Bivariate (2 variables) Data Analysis
   Chi Square (Goodness of Fit) Test
      Example 1 - Chi square test
      Example 2 - Chi square test
   t-Test (Difference of Means Test)
      Example 1 - One sample t-test
      Example 2 - Independent samples t-test
      Example 3 - Paired samples t-test
   Analysis of Variance (ANOVA) Test
      Example of a one-way ANOVA
   Pearson’s Product Moment Correlation (r)
      Example - Pearson’s (r)
Conclusion
Appendices: SPSS Output Screens
   Appendix 1 Frequencies SPSS Screens
   Appendix 2 Crosstab and Chi Square SPSS Screens
   Appendix 2 t-Test SPSS Screens
      One Sample t-Test Screens
      Independent Samples t-Test Screens
      Paired Samples t-Test Screens
   Appendix 3 Analysis of Variance (one-way) SPSS Screens
   Appendix 4 Pearson’s r SPSS Screens
References
Basic Data Analysis Guidelines for Research Students
Introduction
Research and statistics are inseparable. Knowing this is one thing; understanding and using this
relationship is another, especially for a research student. A common oversight among research
students is to wait until late in the research process to consider the relationship between the
problem statement, research question, hypotheses, the kinds of data one will be collecting, and
the statistical analysis of the data, rather than considering it early.
This basic guide for analyzing data is presented to encourage you to consider, early rather than
late in the research process, the relationship between the questions asked on a survey, the
response categories and data they generate, and the statistical procedures available to make
sense of the collected data. Thinking about data and its analysis should be among the first steps
in developing a research proposal and, like many other parts of the research process, should be
continually revisited, updated, and refined as your project draws to a conclusion.
This guide provides examples of univariate (single variable) and bivariate (two variables)
analysis. It begins by encouraging you to be certain that your data set is accurate and “error
free,” then proceeds to discuss several basic univariate and bivariate data analysis procedures.
Univariate procedures are essentially what you already know as descriptive statistics. Bivariate
statistical procedures presented in this guide include the chi square test, the t-test, analysis of
variance (ANOVA), and Pearson’s r (correlation). This guide does not discuss multivariate
(more than two variables) statistical analysis procedures.
Organization of the Guide
This guide begins with two very brief sections on constructing questions for a survey and
general reminders about data analysis. The points in these two sections should serve as “memory
joggers” as you begin to consider the relationship between your research design and statistical
analysis. The Data Analysis section re-introduces you to the important task of ensuring your data
is “clean” by conducting a “Frequencies” procedure. Once you are fairly certain your data is
accurate, you can begin the statistical analysis procedures, initially conducting univariate data
analysis then moving on to bivariate procedures.
This guide for data analysis assumes an understanding of basic statistics and basic skills and
experience with SPSS™.
Basic Guidelines for Constructing a Survey Question
Though this guide will not present all aspects of designing a research project, you may find it
helpful to have a few reminders about constructing questions for a survey instrument. This will
help you keep in mind that how you ultimately construct a question and its response categories
determines what you can do with it statistically.
When constructing survey questions or when selecting questions to use from a standardized
instrument, you may want to keep in mind the following questions:
1. What’s the purpose of my research? Am I trying to describe, explain, predict, or evaluate
some occurrence? Given the purpose of my research, will I need to generate descriptive
statistics, inferential statistics, or both?
2. For each question on a survey instrument, does this survey question provide information
about the independent variable(s), the dependent variable, the control variables, or is this
question on the survey to provide some demographic information about the respondents?
3. Which of the variables/questions do I intend to analyze together (e.g., gender of the
respondents by their education level)?
4. What is the best or most appropriate level of measurement (nominal, ordinal,
interval/ratio) for this variable? Should I create response categories so that I get nominal,
ordinal, or interval/ratio level data?
5. Will I have a random or nonrandom sample, and is my sample large enough that I can assume
the sampling distribution of my statistics approximates a normal distribution?
6. What is my anticipated sample size and will I have a sample of sufficient size such that I
can conduct the statistical procedures I have planned to run?
How you answer these questions will, to a degree, influence the questions you ask on your
survey and help establish the response categories for the questions. Most importantly, they will
influence the kinds of statistical procedures you are able to conduct for your study.
Constructing Your Response Categories - Establishing Your Level of Measurement
If you are constructing your data collection instrument, you have the opportunity to establish
the level of measure for many of your variables. As an example, the variable education can be
constructed in such a way that your data may be a nominal, ordinal, or an interval/ratio measure.
Education as a nominal measure:
Do you have a high school diploma?
____Yes
____No
Education as an ordinal measure:
What is your current class standing?
___Senior ___Junior ___Sophomore ___Freshman
Education as an interval/ratio measure:
How many years of education do you have?
______Years
As the examples above show, a question about one’s level of education can be constructed in
several ways. Designing and constructing a survey instrument is both a science and an art, and you
should think of each question in terms of its response categories and level of measure. The next
section further illustrates the importance of the response categories of your questions.
Associating Response Categories of a Question to Statistical Procedures
This section presents the relationship between level of measure of the response categories of
a question and possible basic statistical procedures you can conduct. As noted earlier and
illustrated in sections still to come, you should think in terms of both univariate and bivariate
data analysis. The tables below provide a basic guide for the types of univariate and bivariate
data analysis you can conduct, based on the measurement level of your variables. In the tables
below, measurement level refers to the response categories for a given question on a survey.
Table 1: Univariate Procedures
Measurement Level | Basic Statistical Procedures
Nominal measures (EX: gender; ethnicity; religious preference) | Mode, Percentages, Ratios
Ordinal measures (EX: socioeconomic status as high, medium, and low; class standing as Senior, Junior, Sophomore, Freshman) | Mode, Median, Percentages, Ratios, Quartiles
Interval/ratio measures (EX: age in years; income in dollars; test scores) | Mode, Median, Percentages, Ratios, Quartiles, Mean, Standard deviation
In Table 2 Bivariate Statistical Procedures, you will notice a row and column identified as
dichotomous. Dichotomous variables are a special category of variables that only have two
meaningful response categories. Dichotomous variables, for the purpose of this guide, will be
treated as though they are nominal level variables. Examples of dichotomous variables include
Sex (Male/Female), US Citizen (Yes/No), Race (White/Nonwhite), and Religion
(Christian/NonChristian).
Table 2 also provides you with recommendations about statistical procedures you may want to
conduct when examining two variables. Table 2 is read by looking at the intersection of the row
and column that represent the levels of measure of your two variables. Thus, if you have two
interval level variables (interval x interval), you should probably conduct a Pearson’s r
(correlation).
Table 2: Bivariate Statistical Procedures
(Columns: measurement level of the first variable. Rows: measurement level of the second variable. Because the order of the two variables does not matter, only the upper half of the table is filled in.)
Second \ First | Dichotomous | Nominal | Ordinal | Interval/Ratio
Dichotomous | Chi square; Phi | Chi square; Cramer’s V; Lambda | Chi square; t-test (for interval-like data) | t-test (independent, paired, and one-sample)
Nominal | | Chi square; Cramer’s V; Lambda | One-way ANOVA (for interval-like data) | One-way ANOVA
Ordinal | | | Gamma; Somers' d; Tau B; Tau C; Spearman’s rho; Pearson’s r (for interval-like data) | One-way ANOVA; Pearson’s r (for interval-like data)
Interval/Ratio | | | | Pearson’s r
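To make the reading of Table 2 concrete, its cells can be written down as a lookup keyed by the pair of measurement levels. The sketch below is a minimal Python illustration, assuming hypothetical level names and a hypothetical helper, suggested_procedures (neither is part of SPSS). Because the order of the two variables does not matter, the pair is put in a fixed order before the lookup.

```python
from __future__ import annotations

# Measurement levels in the order used by Table 2 (hypothetical labels).
LEVELS = ("dichotomous", "nominal", "ordinal", "interval/ratio")

# Each key is (level, level) with the two levels in LEVELS order.
TABLE_2 = {
    ("dichotomous", "dichotomous"): ["Chi square", "Phi"],
    ("dichotomous", "nominal"): ["Chi square", "Cramer's V", "Lambda"],
    ("dichotomous", "ordinal"): ["Chi square", "t-test (for interval-like data)"],
    ("dichotomous", "interval/ratio"): ["t-test (independent, paired, and one-sample)"],
    ("nominal", "nominal"): ["Chi square", "Cramer's V", "Lambda"],
    ("nominal", "ordinal"): ["One-way ANOVA (for interval-like data)"],
    ("nominal", "interval/ratio"): ["One-way ANOVA"],
    ("ordinal", "ordinal"): ["Gamma", "Somers' d", "Tau B", "Tau C", "Spearman's rho",
                             "Pearson's r (for interval-like data)"],
    ("ordinal", "interval/ratio"): ["One-way ANOVA", "Pearson's r (for interval-like data)"],
    ("interval/ratio", "interval/ratio"): ["Pearson's r"],
}


def suggested_procedures(first: str, second: str) -> list[str]:
    """Return the Table 2 suggestions for two variables' measurement levels."""
    pair = [first.strip().lower(), second.strip().lower()]
    for level in pair:
        if level not in LEVELS:
            raise ValueError(f"unknown measurement level: {level!r}")
    # Order the pair so that ("ordinal", "nominal") finds ("nominal", "ordinal").
    key = tuple(sorted(pair, key=LEVELS.index))
    return TABLE_2[key]


print(suggested_procedures("Interval/Ratio", "Interval/Ratio"))  # prints ["Pearson's r"]
```

Keeping one canonical order for each pair means every cell is stored once, which mirrors the half-filled layout of the table.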
Basic Guidelines for Analyzing Data
Before you actually begin to conduct your data analysis, there are a few preliminary points to
consider that may impact your statistical analysis. The statements below are for you to consider
once you have collected your surveys and as you enter and begin the statistical analysis of your
data.
1. “Junk in, junk out,” meaning if your data is not entered accurately (is not “clean”), the
conclusions drawn from your statistical analysis may not be correct.
2. You are generally more likely to find statistical significance with larger samples. Thus, if
you have a small sample (exactly what “small” means will need to be covered in a
research methods course) you are less likely to find significance, which leads to the next
point.
3. While an alpha level of .05 (level of significance, α = .05) is standard for most social
science research, you may decide to establish either a higher or lower alpha based on
your research design, question, and sample size. Consult with your professor or a
statistical consultant about the alpha to establish for your analysis. The important point to
remember is that you should establish your alpha before you conduct your statistical
analysis.
4. In statistical analysis a relationship is either significant or not significant. There is no
relationship that can be described as “highly significant” or “strongly significant.” If you
have established your alpha as .05, then whether the computed probability (p) is .049 or
.0001, you can only state that you have a “significant” relationship.
5. Remember that a high or “strong” correlation is not the same as causation.
Data Analysis: Making Sense of Those Numbers
Check To Be Sure Your Data Is Accurate
One of the first steps in data analysis is to ensure the information in your data file is accurate.
In other words, you should have some level of certainty the data entered into your SPSS data file
are correct. One way to check for errors in data entry is to run the Frequencies procedure. This
will help you identify one type of data entry error, specifically when you enter a numeric value
that does not represent a response code. For example, for the variable Sex, you have the numeric
codes of 1 for “Male” respondents, 2 for “Female” respondents, or 99 representing responses that
are “Not Answered.” Upon running the Frequencies procedure you note that a 7 has been entered
for the variable. The 7 is a data entry error because you should only have codes of 1, 2, or 99 for
the variable Sex.
The Frequencies procedure, however, will only help you identify one type of data entry error.
The output from a Frequencies procedure will not identify data entry errors where, for the
variable Sex, you entered a code of 1 for a respondent when it really should have been a 2. In
other words, you miscoded the respondent as “Male” instead of “Female” but the numeric code
you entered, a code of 1, is a valid code for the variable Sex. Identifying and correcting this and
other types of data entry errors will require other procedures and processes on the part of the
researcher or person entering the data.
Conducting a Frequencies Analysis for Each Variable
Check for the following:
a. Is the total number of responses (the number of records entered) correct for each variable?
For example, if you entered 40 records, do you have 40 in the data file for each variable,
counting both valid responses and those you have identified as “system missing”?
b. Are all the numeric codes entered correctly? For example, if you are only supposed to have
1’s for Males, 2’s for Females, and 99’s for Not Answered (NA), did you check to ensure you
don’t have any other numeric value entered for that variable?
c. If you note errors in the data, correct them before you conduct your statistical analysis,
then rerun “Frequencies” for those variables where corrections were made.
d. Frequencies is not appropriate for string variables that contain alphanumeric characters,
such as street addresses and names.
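The code check in items a and b can also be sketched outside SPSS. The minimal Python example below, assuming a hypothetical list of Sex codes and the valid codes 1, 2, and 99 from the example above, tallies each code the way a Frequencies table does and flags both a wrong record count and any value that is not a valid response code. The function name check_codes is hypothetical.

```python
from collections import Counter

# Hypothetical coding scheme for Sex: 1 = Male, 2 = Female, 99 = Not Answered.
VALID_SEX_CODES = {1, 2, 99}


def check_codes(values, valid_codes, expected_n):
    """Tally each code (as a Frequencies table would) and list any problems found."""
    counts = Counter(values)
    problems = []
    if len(values) != expected_n:
        problems.append(f"expected {expected_n} records, found {len(values)}")
    for code in sorted(counts):
        if code not in valid_codes:
            problems.append(f"invalid code {code} appears {counts[code]} time(s)")
    return counts, problems


# Hypothetical data for 10 respondents; the 7 is a data entry error.
sex = [1, 2, 2, 1, 7, 2, 99, 1, 2, 2]
counts, problems = check_codes(sex, VALID_SEX_CODES, expected_n=10)
for code in sorted(counts):
    print(code, counts[code])
for problem in problems:
    print("PROBLEM:", problem)
```

Like Frequencies, this check cannot catch a valid but wrong code, such as a 1 entered for a respondent who should have been coded 2.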
Example of a Survey Question and SPSS Frequencies Output for the Variable SEX
Example of a survey question about the respondent’s sex with pre-coded responses:
1. What is your sex?
____ 1 Male
____ 2 Female
Example of SPSS Frequencies output for the variable Sex:
Statistics
RESPONDENTS SEX
N Valid | 40
N Missing | 0
RESPONDENTS SEX
Value | Frequency | Percent | Valid Percent | Cumulative Percent
Valid 0 | 1 | 2.5 | 2.5 | 2.5
1 MALE | 17 | 42.5 | 42.5 | 45.0
2 FEMALE | 22 | 55.0 | 55.0 | 100.0
Total | 40 | 100.0 | 100.0 |
Note: The 0 is a data entry error identified by running Frequencies, as there should be only 1’s and 2’s entered.
Though the Frequencies procedure will not totally eliminate the problem of data entry error,
it will help reduce the error in your data. The Frequencies procedure can also generate basic
descriptive statistics that will allow you to both check your data for errors and begin to develop a
sense of the distribution of scores for your variables. The next section discusses univariate
statistical procedures that can be conducted as you are running the Frequencies procedure.
Univariate Data Analysis
Univariate data analysis is the analysis of a single variable, as opposed to data analysis using
two (bivariate) or more (multivariate) variables. The term “descriptive statistics” is most often
associated with summarizing the characteristics of a variable or a set of variables. Another
general term, “measures of central tendency,” refers to the statistics that describe the typical or
central value of the responses to a single variable: the mode, median, and mean. Other
information about the distribution of scores that further assists in describing a variable includes
the range, upper and lower limits, variance, standard deviation, and confidence interval.
Analysis of a Nominal Level Variable
A nominal variable is a categorical variable that is measured in such a way that the categories
indicate differences among respondents with no hierarchy or rank order implied in those
differences. When constructing a survey question with nominal level response categories, the
response categories should be mutually exclusive and exhaustive. Common examples of nominal
level variables are Sex (Male/Female), Ethnic Background (Anglo, Hispanic, African American,
Asian, Pacific Islander, etc.), and Religion (Protestant, Catholic, Jewish, Islamic, Buddhist, etc.).
The following statistics may be appropriate for nominal variables/data:
o Frequencies (mode)
o Percentages
o Ratios
Example of a survey question and SPSS output for a nominal level variable
Example of a survey question and nominal response categories with pre-coded response
categories:
1. What is your religious preference?
___1 Protestant ___2 Catholic ___3 Jewish ___4 None __5 Other
Example of SPSS outputs for the variable Religious Preference:
Statistics
RELIGIOUS PREFERENCE
N Valid | 1477
N Missing | 9
Mode | 1
RELIGIOUS PREFERENCE
Value | Frequency | Percent | Valid Percent | Cumulative Percent
Valid 1 PROTESTANT | 886 | 59.6 | 60.0 | 60.0
2 CATHOLIC | 367 | 24.7 | 24.8 | 84.8
3 JEWISH | 26 | 1.7 | 1.8 | 86.6
4 NONE | 146 | 9.8 | 9.9 | 96.5
5 OTHER | 52 | 3.5 | 3.5 | 100.0
Total | 1477 | 99.4 | 100.0 |
Missing 9 NA | 9 | .6 | |
Total | 1486 | 100.0 | |
Example of SPSS pie graph with percentages for the variable Religious Preference:
[Pie graph: Protestant 59.6%, Catholic 24.7%, None 9.8%, Other 3.5%, Jewish 1.7%, Missing .6%]
Brief Interpretation of an Analysis of the Variable Religious Preference Using the Mode
Of the 1,486 respondents in this survey, the most frequently reported religious preference was
Protestant, followed by Catholic.
Brief Interpretation of an Analysis of the Variable Religious Preference Using Percentages
Of the 1,486 total respondents, 59.6% reported they were Protestant, followed by those
reporting they were Catholic (24.7%) and Jewish (1.7%), while 9.8% reported they had
no religious preference, 3.5% noted they had another religious preference, and 0.6%
were “missing,” meaning they did not respond to the question.
Brief Interpretation of an Analysis of the Variable Religious Preference Using a Ratio
Slightly less than three of every five respondents reported they were of the Protestant
faith.
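The Percent, Valid Percent, and Cumulative Percent columns in the Religious Preference output differ only in their denominators: Percent divides by all 1,486 respondents, while Valid Percent and Cumulative Percent divide by the 1,477 valid responses. The sketch below reproduces those columns in Python from the frequencies in the table, assuming code 9 (NA) is the only missing code; the function name frequency_table is hypothetical.

```python
def frequency_table(counts, missing_codes):
    """Return (code, frequency, percent, valid percent, cumulative percent) rows.

    counts: dict of code -> frequency, in the order the rows should appear.
    missing_codes: codes excluded from Valid Percent and Cumulative Percent.
    """
    total = sum(counts.values())
    valid_total = sum(n for code, n in counts.items() if code not in missing_codes)
    rows, cumulative = [], 0
    for code, n in counts.items():
        percent = 100 * n / total
        if code in missing_codes:
            rows.append((code, n, percent, None, None))
        else:
            cumulative += n  # running count of valid responses
            rows.append((code, n, percent, 100 * n / valid_total,
                         100 * cumulative / valid_total))
    return rows


# Frequencies from the Religious Preference output (9 = NA, a missing code).
religion = {1: 886, 2: 367, 3: 26, 4: 146, 5: 52, 9: 9}
for code, n, pct, valid_pct, cum_pct in frequency_table(religion, missing_codes={9}):
    valid = "" if valid_pct is None else f"{valid_pct:5.1f} {cum_pct:6.1f}"
    print(f"{code:>2} {n:5d} {pct:5.1f} {valid}")
```

Computing the cumulative column from running counts, rather than adding up rounded percentages, keeps the last valid row at exactly 100.0.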
Analysis of an Ordinal Level Variable
An ordinal variable is a categorical variable in which there is some inherent rank, hierarchy,
or order to the categories. The concept of “rank” in this instance does not imply that respondents
in a higher category are in some way better than other respondents. Instead, hierarchy or rank
means that the established categories allow the respondents to be arranged along some dimension
or in some order. Common examples of ordinal level variables include Economic Status (Low,
Middle, High), Class Standing (Senior, Junior, Sophomore, Freshman), and attitudinal variables,
such as Satisfaction with Services (High, Medium, Low).
The following statistics may be appropriate for ordinal variables/data:
o Frequencies (mode, median)
o Percentages
o Quartiles
Example of a survey question and SPSS output for an ordinal level variable
Example of a survey question and ordinal response categories:
1. What is your annual family income?
___1 Less than $1,000      ___8 $8,000-9,999        ___15 $25,000-29,999
___2 $1,000-2,999          ___9 $10,000-12,499      ___16 $30,000-34,999
___3 $3,000-3,999          ___10 $12,500-14,999     ___17 $35,000-39,999
___4 $4,000-4,999          ___11 $15,000-17,499     ___18 $40,000-49,999
___5 $5,000-5,999          ___12 $17,500-19,999     ___19 $50,000-59,999
___6 $6,000-6,999          ___13 $20,000-22,499     ___20 $60,000-74,999
___7 $7,000-7,999          ___14 $22,500-24,999     ___21 $75,000+
Example of SPSS outputs for the variable Family Income
Statistics
TOTAL FAMILY INCOME (N=1486)
N Valid | 1405
N Missing | 81
Median | 16.00
Mode | 18.00
Percentile 25 | 11.00
Percentile 50 | 16.00
Percentile 75 | 19.00
Note: Numeric values represent various income groups. See the next table.
TOTAL FAMILY INCOME
Value | Frequency | Percent | Valid Percent | Cumulative Percent
Valid 1 LT $1000 | 12 | .8 | .9 | .9
2 $1000-2999 | 17 | 1.1 | 1.2 | 2.1
3 $3000-3999 | 16 | 1.1 | 1.1 | 3.2
4 $4000-4999 | 17 | 1.1 | 1.2 | 4.4
5 $5000-5999 | 32 | 2.2 | 2.3 | 6.7
6 $6000-6999 | 13 | .9 | .9 | 7.6
7 $7000-7999 | 21 | 1.4 | 1.5 | 9.1
8 $8000-9999 | 38 | 2.6 | 2.7 | 11.8
9 $10000-12499 | 73 | 4.9 | 5.2 | 17.0
10 $12500-14999 | 62 | 4.2 | 4.4 | 21.4
11 $15000-17499 | 68 | 4.6 | 4.8 | 26.3
12 $17500-19999 | 63 | 4.2 | 4.5 | 30.7
13 $20000-22499 | 70 | 4.7 | 5.0 | 35.7
14 $22500-24999 | 70 | 4.7 | 5.0 | 40.7
15 $25000-29999 | 103 | 6.9 | 7.3 | 48.0
16 $30000-34999 | 110 | 7.4 | 7.8 | 55.9
17 $35000-39999 | 80 | 5.4 | 5.7 | 61.6
18 $40000-49999 | 141 | 9.5 | 10.0 | 71.6
19 $50000-59999 | 93 | 6.3 | 6.6 | 78.2
20 $60000-74999 | 93 | 6.3 | 6.6 | 84.8
21 $75000+ | 130 | 8.7 | 9.3 | 94.1
22 REFUSED | 83 | 5.6 | 5.9 | 100.0
Total | 1405 | 94.5 | 100.0 |
Missing 98 DK | 56 | 3.8 | |
99 NA | 25 | 1.7 | |
Total (missing) | 81 | 5.5 | |
Total | 1486 | 100.0 | |
Brief Interpretation of an Analysis of the variable Family Income
Though the annual family income most often reported was between $40,000 and
$49,999, the median annual family income for the 1,405 valid respondents was between
$30,000 and $34,999. Twenty-five percent of the families reported an annual income of
less than $17,500 while the upper 25% reported an annual income of more than
$50,000.
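Because the income codes are ordered categories, the median and quartiles in the output are the categories in which the cumulative valid percent first reaches 50%, 25%, and 75%. A minimal Python sketch of that reasoning follows, using the valid frequencies for codes 1 to 22 from the Total Family Income table (the table counts 22 REFUSED as a valid code, so it is included here to reproduce the output). The helper percentile_category is hypothetical, and the cumulative-percent rule is a simplification: SPSS's own percentile calculation can interpolate between adjacent values, so the two could disagree when a percentile falls exactly on a category boundary.

```python
def percentile_category(counts, p):
    """Return the first category whose cumulative valid percent reaches p (0 to 100)."""
    total = sum(counts.values())
    running = 0
    for code in sorted(counts):
        running += counts[code]
        if 100 * running / total >= p:
            return code
    raise ValueError("p must be between 0 and 100")


# Valid frequencies for income codes 1-22 from the Total Family Income table.
income = dict(zip(range(1, 23), [12, 17, 16, 17, 32, 13, 21, 38, 73, 62, 68,
                                 63, 70, 70, 103, 110, 80, 141, 93, 93, 130, 83]))

# Labels only for the codes reported below.
labels = {11: "$15,000-17,499", 16: "$30,000-34,999",
          18: "$40,000-49,999", 19: "$50,000-59,999"}

mode = max(income, key=income.get)  # category with the largest frequency
for name, p in [("25th percentile", 25), ("median", 50), ("75th percentile", 75)]:
    code = percentile_category(income, p)
    print(name, code, labels[code])
print("mode", mode, labels[mode])
```

For this table the sketch returns categories 11, 16, and 19 for the quartiles and 18 for the mode, matching the Statistics output.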
Analysis of an Interval/Ratio Level Variable
Unlike nominal and ordinal variables, which are categorical, interval and ratio level variables
are numeric or scaled variables. For these variables, the numbers are ordered, and the distance
between adjacent numbers is the same everywhere on the scale (e.g., $5.00 is higher than $4.00 by
the same amount as $99.00 is higher than $98.00). Interval variables are like ratio variables except
that interval variables do not have a true zero: a value of zero does not mean the absence of the
characteristic, so ratios between values are not meaningful. For example, age is a ratio level
variable because it has a true zero point (birth), so someone 20 years of age is twice as old as
someone who is 10. IQ score is an interval level variable because an IQ of 100 does not mean a
person has twice the intelligence of a person with an IQ of 50. Statistically, however, interval and
ratio level data are treated the same way. Variables often used in social research that are
interval/ratio level include number of children in a family, number of therapy or counseling
sessions, number of times married, and number of days hospitalized.
The following statistics may be appropriate for interval/ratio variables/data:
o Frequencies (mode, median, mean)
o Quartiles
o Standard deviation
Example of a survey question and SPSS output for an interval level variable
1. How old were you when you were first married?
____ Years of age
Statistics
AGE WHEN FIRST MARRIED
N Valid | 590
N Missing | 896
Mean | 22.64
Median | 22.00
Mode | 21
Std. Deviation | 4.710
Minimum | 13
Maximum | 57
Percentile 25 | 19.00
Percentile 50 | 22.00
Percentile 75 | 25.00
Brief Interpretation of an Analysis of Age When First Married
When asked their age when they were first married, 590 of 1,486 respondents had a
valid response. The average age of first marriage was 22.64 years (sd = 4.710), while
the median age of first marriage was 22 years. The most common or frequently reported
age of first marriage was 21 years of age. The youngest age reported at first marriage was 13
years and the oldest was 57 years. The lower twenty-five percent of the respondents reported that
they first married by age 19, while the upper twenty-five percent reported they first married at
age 25 or older.
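The summary statistics in the Age When First Married output can be reproduced with Python's standard statistics module. The sketch below uses a small, hypothetical list of ages (not the survey data) so the results are easy to check by hand. statistics.stdev uses the n - 1 denominator, the sample standard deviation that SPSS reports; the "exclusive" quartile method places the pth percentile at position (n + 1)p, which is assumed here to correspond to SPSS's default percentile definition.

```python
import statistics

# Hypothetical ages at first marriage for seven respondents (illustration only).
ages = [19, 21, 21, 22, 24, 25, 30]

summary = {
    "valid n": len(ages),
    "mean": statistics.mean(ages),
    "median": statistics.median(ages),
    "mode": statistics.mode(ages),
    "std. deviation": statistics.stdev(ages),  # sample SD, n - 1 in the denominator
    "minimum": min(ages),
    "maximum": max(ages),
    # 25th, 50th and 75th percentiles; 'exclusive' uses the (n + 1)p position
    "quartiles": statistics.quantiles(ages, n=4, method="exclusive"),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

For these seven ages the mean is about 23.14, the median 22, the mode 21, the standard deviation about 3.63, and the quartiles 21, 22, and 25.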
While univariate analysis of data is an important and helpful procedure to describe a variable,
even more information about the data can be gathered by conducting bivariate data analysis. The
next section presents a discussion on the more common types of bivariate data analysis
procedures.
Bivariate (2 variables) Data Analysis
The more common bivariate statistical analysis procedures (ones you will most likely use)
include the chi square, t-test, analysis of variance (ANOVA), and Pearson’s r (correlation). Each
procedure is discussed in the sections below, including its assumptions, the conditions for
selecting it, and an SPSS example with a brief statement describing the analysis of the SPSS
output.
Chi Square (Goodness of Fit) Test
The chi square is a nonparametric test for the bivariate analysis of two nominal level
variables; used this way, it is often called the chi square test of independence. When conducting
the chi square test, the data are often displayed as a cross tabulation (crosstab) or contingency
table. Chi square is actually the name of the test statistic used to determine whether there is a
significant relationship between the two nominal variables. The specific statistical procedure is
discussed in most statistics textbooks. In SPSS the Crosstabs procedure may also be used to
examine the association between two ordinal level variables, or between nominal and interval
level variables, though doing so requires specific and special data analysis procedures. Consult
with your professor or a statistician if you think you will need to analyze ordinal or interval level
variables using Crosstabs.
Assumptions of the chi square:
o Probability sampling design
o 80% or more of the cells in your contingency table should have an expected cell frequency of
5 or greater
o Observations are independent, meaning you should not use the chi square test for matched
pairs.
o You should apply Yates’ correction factor for 2x2 contingency tables when the cell
frequencies are 5 or more but less than 10.
o You should apply Fisher’s exact test when the sample size for a 2x2 contingency table is 20
or less.
SPSS procedures to request in the Crosstabs dialogue box (see Appendix 2 Crosstab and Chi
Square SPSS Screens):
o Click the “Statistics” button and check the “Chi-square” box in the upper left corner.
o Check the appropriate measure of association box (most likely “Phi and Cramer’s V”).
o Click the “Cells” button and check the row, column, and/or total percentages box (based on
how you prefer to look at and analyze the table).
o For Residuals, standardized residuals are recommended. Cells with standardized residual
values greater than +1.0 may reveal the cells that contribute to a significant chi square test.
Measures of association for 2 nominal variables:
o Phi for 2x2 tables
o Cramer’s V for other tables (2x3, 3x3, etc.)
Example 1 - Chi square test
The contingency table and data analysis examine the relationship between general happiness
and marital status of the respondents. Examples of possible survey questions are noted below.
1. What is your current marital status?
___1 Married ___2 Widowed ___3 Divorced ___4 Separated ____5 Never married
2. What is your current level of general happiness with life?
___ Very happy
___ Pretty happy
___ Not too happy
SPSS Output of a Contingency Table and Chi Square Test
GENERAL HAPPINESS by MARITAL STATUS (N=494)
GENERAL HAPPINESS | | MARRIED | WIDOWED | DIVORCED | SEPARATED | NEVER MARRIED | Total
VERY HAPPY | Count | 81 | 8 | 12 | 1 | 15 | 117
| % within Marital Status | 33.5% | 16.3% | 16.0% | 9.1% | 12.8% | 23.7%
| Std. Residual | 3.1 | -1.1 | -1.4 | -1.0 | -2.4 |
PRETTY HAPPY | Count | 143 | 33 | 52 | 8 | 83 | 319
| % within Marital Status | 59.1% | 67.3% | 69.3% | 72.7% | 70.9% | 64.6%
| Std. Residual | -1.1 | .2 | .5 | .3 | .9 |
NOT TOO HAPPY | Count | 18 | 8 | 11 | 2 | 19 | 58
| % within Marital Status | 7.4% | 16.3% | 14.7% | 18.2% | 16.2% | 11.7%
| Std. Residual | -2.0 | .9 | .7 | .6 | 1.4 |
Total | Count | 242 | 49 | 75 | 11 | 117 | 494
| % within Marital Status | 100.0% | 100.0% | 100.0% | 100.0% | 100.0% | 100.0%
Note: Standardized residuals > +1.0 may identify the cells that contribute to the significant chi square.
Chi-Square Tests
Test | Value | df | Asymp. Sig. (2-sided)
Pearson Chi-Square | 29.537 | 8 | .000
Likelihood Ratio | 30.445 | 8 | .000
Linear-by-Linear Association | 22.369 | 1 | .000
N of Valid Cases | 494 | |
a. 2 cells (13.3%) have expected count less than 5. The minimum expected count is 1.29.
Notes: The Pearson Chi-Square row gives the chi square value and degrees of freedom (df); the level of significance (p) is < .05; footnote a gives the number of cells (%) that have an expected frequency of less than 5.
Symmetric Measures
Measure | Value | Approx. Sig.
Nominal by Nominal: Phi | .245 | .000
Nominal by Nominal: Cramer’s V | .173 | .000
N of Valid Cases | 494 |
a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.
Note: The value of Cramer’s V indicates a “weak” association.
Brief Interpretation of the Analysis of General Happiness with the Marital Status of the
Respondents
There is a significant (χ² = 29.537, df = 8, p < .01) but weak association (Cramer’s V =
.173) between one’s level of general happiness and marital status. Persons who are
married are significantly more likely to report they are very happy while persons who
have never married are more likely to report they are not too happy.
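The chi square value, degrees of freedom, standardized residuals, phi, and Cramer’s V in the output above all follow from the expected cell counts (row total x column total / N). The Python sketch below recomputes them from the observed counts in the General Happiness by Marital Status table, using only the standard library. The function names are hypothetical, and the p-value helper uses the closed-form upper-tail probability of the chi square distribution, which holds only for an even number of degrees of freedom (here df = 8); for other tables a statistics package would normally be used.

```python
import math


def chi2_upper_tail_even_df(x, df):
    """P(chi square > x); exact closed form that is valid only for even df."""
    if df % 2:
        raise ValueError("closed form requires an even number of degrees of freedom")
    half, term, total = x / 2, 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)**i / i!
        total += term
    return math.exp(-half) * total


def chi_square_test(table):
    """Pearson chi square test of independence for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count for each cell: row total * column total / N
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    # Standardized residual: (observed - expected) / sqrt(expected)
    residuals = [[(o - e) / math.sqrt(e) for o, e in zip(obs_row, exp_row)]
                 for obs_row, exp_row in zip(table, expected)]
    k = min(len(row_totals), len(col_totals))
    return {
        "chi2": chi2,
        "df": df,
        "p": chi2_upper_tail_even_df(chi2, df),
        "phi": math.sqrt(chi2 / n),
        "cramers_v": math.sqrt(chi2 / (n * (k - 1))),
        "residuals": residuals,
        "cells_below_5": sum(e < 5 for exp_row in expected for e in exp_row),
        "min_expected": min(min(exp_row) for exp_row in expected),
    }


# Rows: very happy, pretty happy, not too happy.
# Columns: married, widowed, divorced, separated, never married.
happiness_by_marital = [
    [81, 8, 12, 1, 15],
    [143, 33, 52, 8, 83],
    [18, 8, 11, 2, 19],
]
result = chi_square_test(happiness_by_marital)
print(f"chi square = {result['chi2']:.3f}, df = {result['df']}, p = {result['p']:.5f}")
print(f"phi = {result['phi']:.3f}, Cramer's V = {result['cramers_v']:.3f}")
print(f"cells with expected count < 5: {result['cells_below_5']}, "
      f"minimum expected = {result['min_expected']:.2f}")
```

Run on this table, the sketch should agree with the SPSS output: a chi square of about 29.537 with 8 df, p well below .001, phi of about .245, Cramer’s V of about .173, and 2 cells with expected counts below 5 (minimum 1.29).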
Example 2 - Chi square test
The table and data analysis examine the relationship between favoring or opposing the death
penalty for the crime of murder and gender of the respondent. Possible survey questions are also
provided below.
1. What is your sex?
___1 Male
___2 Female
2. Do you favor or oppose the death penalty for the crime of murder?
___1 Favor
___2 Oppose
SPSS Output for a Cross Tabulation and Chi Square Test
FAVOR OR OPPOSE THE DEATH PENALTY FOR MURDER by SEX
FAVOR OR OPPOSE DEATH PENALTY FOR MURDER | | 1 MALE | 2 FEMALE | Total
1 FAVOR | Count | 199 | 232 | 431
| % within RESPONDENTS SEX | 81.9% | 73.7% | 77.2%
| Std. Residual | .8 | -.7 |
2 OPPOSE | Count | 44 | 83 | 127
| % within RESPONDENTS SEX | 18.1% | 26.3% | 22.8%
| Std. Residual | -1.5 | 1.3 |
Total | Count | 243 | 315 | 558
| % within RESPONDENTS SEX | 100.0% | 100.0% | 100.0%
Chi-Square Tests
Test | Value | df | Asymp. Sig. (2-sided) | Exact Sig. (2-sided) | Exact Sig. (1-sided)
Pearson Chi-Square | 5.301 | 1 | .021 | |
Continuity Correction | 4.843 | 1 | .028 | |
Likelihood Ratio | 5.385 | 1 | .020 | |
Fisher's Exact Test | | | | .025 | .013
Linear-by-Linear Association | 5.291 | 1 | .021 | |
N of Valid Cases | 558 | | | |
a. Computed only for a 2x2 table
b. 0 cells (.0%) have expected count less than 5. The minimum expected count is 55.31.
Symmetric Measures
Measure | Value | Approx. Sig.
Nominal by Nominal: Phi | .097 | .021
Nominal by Nominal: Cramer's V | .097 | .021
N of Valid Cases | 558 |
a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.
Brief Interpretation of the Analysis of Attitude Toward the Death Penalty for Murder with
Respondent’s Sex
There is a significant (χ² = 5.301, df = 1, p = .021) but weak association (Phi = .097)
between a person favoring or opposing the death penalty for the crime of murder and the
person’s sex. Women are significantly more likely to oppose the death penalty for the
crime of murder than are men.
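For a 2x2 table, SPSS also reports the Continuity Correction (Yates’ correction, listed among the chi square assumptions above), which subtracts .5 from each |observed - expected| difference before squaring. The sketch below recomputes the Pearson and Yates chi square values, their p-values, and phi for the death penalty by sex table in Python. It relies on the fact that with 1 degree of freedom the upper-tail probability equals erfc(sqrt(x / 2)); the function name two_by_two is hypothetical, and Fisher's exact test is not reproduced here.

```python
import math


def two_by_two(table):
    """Pearson and Yates-corrected chi square (df = 1), p-values, and phi for a 2x2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals, col_totals = (a + b, c + d), (a + c, b + d)
    pearson = yates = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            diff = abs(observed - expected)
            pearson += diff ** 2 / expected
            yates += max(diff - 0.5, 0.0) ** 2 / expected  # continuity correction

    def upper_tail(x):
        # For 1 degree of freedom, P(chi square > x) = erfc(sqrt(x / 2)).
        return math.erfc(math.sqrt(x / 2))

    phi = (a * d - b * c) / math.sqrt(
        row_totals[0] * row_totals[1] * col_totals[0] * col_totals[1])
    return {"pearson": pearson, "p": upper_tail(pearson),
            "yates": yates, "p_yates": upper_tail(yates), "phi": phi}


# Rows: favor, oppose. Columns: male, female.
death_penalty_by_sex = [[199, 232], [44, 83]]
r = two_by_two(death_penalty_by_sex)
print(f"Pearson chi square = {r['pearson']:.3f} (p = {r['p']:.3f})")
print(f"Continuity correction = {r['yates']:.3f} (p = {r['p_yates']:.3f})")
print(f"phi = {r['phi']:.3f}")
```

The values should line up with the SPSS output above (5.301 and 4.843, with p-values of .021 and .028, and phi of .097). Here the minimum expected count is 55.31, so the corrected and uncorrected tests lead to the same conclusion.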
Statistics is like grout - The word feels decidedly
unpleasant in the mouth, but it describes something
essential for holding a mosaic in place.
- Ramsey & Schafer -
Appendix 1 Frequencies SPSS Screens
Highlight and move the variables from the variable list to the “Variable(s)” box by clicking the
arrowhead.
Click the “OK” button to run Frequencies. Output for the first variable, “quality,” is shown below.
Variable name: quality. Variable label: Quality of Svc.
Values | Frequency | Percent | Valid Percent | Cumulative Percent
Valid 1 Poor | 1 | .9 | .9 | .9
2 Fair | 2 | 1.9 | 1.9 | 2.8
3 Good | 17 | 15.9 | 15.9 | 18.7
4 Excellent | 87 | 81.3 | 81.3 | 100.0
Total | 107 | 100.0 | 100.0 |
Appendix 2 Crosstab and Chi Square SPSS Screens
Click “Statistics” to get the “Crosstabs: Statistics” dialogue box.
Click “Cells” to get the “Crosstabs: Cell Display” dialogue box.
References
Holcomb, Z. C. (2006). SPSS basics: Techniques for a first course in statistics. Glendale, CA: Pyrczak Publishing.
Kachigan, S. K. (1986). Statistical analysis: An interdisciplinary introduction to univariate and multivariate methods. New York: Radius Press.
Keller, G. (2001). Applied statistics with Microsoft® Excel. Pacific Grove, CA: Duxbury.
Norusis, M. J. (2011). IBM SPSS Statistics 19 guide to data analysis. Upper Saddle River, NJ: Prentice Hall.
Ramsey, F. L., & Schafer, D. W. (2002). The statistical sleuth: A course in methods of data analysis (2nd ed.). Belmont, CA: Duxbury Press.
Rubin, A., & Babbie, E. (2008). Research methods for social work (7th ed.). Belmont, CA: Thomson Brooks/Cole.