Basic Data Analysis Guidelines for Research Students

Isaac V. Gusukuma, Ph.D., LMSW-IPR, ACSW
University of Mary Hardin-Baylor
Social Work Program
January 30, 2012

Reproduction of any part of the guidelines is not permitted without the author’s permission. August, 2008

Table of Contents

Introduction
    Organization of the Guide
Basic Guidelines for Constructing a Survey Question
    Constructing Your Response Categories - Establishing Your Level of Measurement
    Associating Response Categories of a Question to Statistical Procedures
Basic Guidelines for Analyzing Data
Data Analysis: Making Sense of Those Numbers
    Check To Be Sure Your Data Is Accurate
    Conducting a Frequencies Analysis for Each Variable
    Example of a Survey Question and SPSS Frequencies Output for the Variable SEX
Univariate Data Analysis
    Analysis of a Nominal Level Variable
        Example of a survey question and SPSS output for a nominal level variable
    Analysis of an Ordinal Level Variable
        Example of a survey question and SPSS output for an ordinal level variable
    Analysis of an Interval/Ratio Level Variable
        Example of a survey question and SPSS output for an interval level variable
Bivariate (2 variables) Data Analysis
    Chi Square (Goodness of Fit) Test
        Example 1 - Chi square test
        Example 2 - Chi square test
    t-Test (Difference of Means Test)
        Example 1 - One sample t-test
        Example 2 - Independent samples t-test
        Example 3 - Paired samples t-test
    Analysis of Variance (ANOVA) Test
        Example of a one-way ANOVA
    Pearson’s Product Moment Correlation (r)
        Example - Pearson’s (r)
Conclusion
Appendices
    SPSS Output Screens
        Appendix 1 Frequencies SPSS Screens
        Appendix 2 Crosstab and Chi Square SPSS Screens
        Appendix 2 t-Test SPSS Screens
            One Sample t-Test Screens
            Independent Samples t-Test Screens
            Paired Samples t-Test Screens
        Appendix 3 Analysis of Variance (one-way) SPSS Screens
        Appendix 4 Pearson’s r SPSS Screens
References

Introduction

Research and statistics are inseparable. Knowing this is one thing. Understanding and using this relationship is another, especially for a research student. Many research students wait until late in the research process to consider the relationship between the problem statement, research question, hypotheses, the kinds of data they will collect, and the statistical analysis of those data. This basic guide for analyzing data is presented to encourage you to consider early, rather than later, in the research process the relationship that exists between the questions asked on a survey, the response categories and the data they generate, and the statistical procedures available to make sense of the collected data.
Thinking about data and its analysis should be part of the first steps in developing a research proposal and, like many other parts of the research process, should be continually revisited, updated, and refined as your project draws to a conclusion.

This guide provides examples of univariate (single variable) and bivariate (two variables) analysis. It begins by encouraging you to be certain that your data set is accurate and “error free,” then proceeds to discuss several basic univariate and bivariate data analysis procedures. Univariate procedures are essentially what you already know as descriptive statistics. Bivariate statistical procedures presented in this guide include the chi square test, the t-test, analysis of variance (ANOVA), and Pearson’s r (correlation). This guide does not discuss multivariate (more than two variables) statistical analysis procedures.

Organization of the Guide

This guide begins with two very brief sections on constructing questions for a survey and general reminders about data analysis. The points in these two sections should serve as “memory joggers” as you begin to consider the relationship between your research design and statistical analysis. The Data Analysis section re-introduces you to the important task of ensuring your data is “clean” by conducting a “Frequencies” procedure. Once you are fairly certain your data is accurate, you can begin the statistical analysis procedures, initially conducting univariate data analysis and then moving on to bivariate procedures. This guide for data analysis assumes an understanding of basic statistics and basic skills and experience with SPSS™.

Basic Guidelines for Constructing a Survey Question

Though this guide will not present all aspects of designing a research project, you may find it helpful to have a few reminders about constructing questions for a survey instrument.
These reminders should help you keep in mind that the way you construct a question and its response categories determines what you can do with it statistically. When constructing survey questions, or when selecting questions to use from a standardized instrument, you may want to keep in mind the following questions:

1. What’s the purpose of my research? Am I trying to describe, to explain, to predict, or to evaluate some occurrence? Given the purpose of my research, will I need to generate descriptive statistics, inferential statistics, or both?
2. For each question on a survey instrument, does this survey question provide information about the independent variable(s), the dependent variable, or the control variables, or is this question on the survey to provide some demographic information about the respondents?
3. Which of the variables/questions do I intend to analyze together, e.g., gender of the respondents by their education level?
4. What is the best or most appropriate level of measurement (nominal, ordinal, interval/ratio) for this variable? Should I create response categories so that I get nominal, ordinal, or interval/ratio level data?
5. Will I have a random or nonrandom sample, and is my sample of sufficient size that I can assume the scores approach a normal distribution?
6. What is my anticipated sample size, and will it be large enough for me to conduct the statistical procedures I have planned to run?

How you answer these questions will, to a degree, influence the questions you ask on your survey and help establish the response categories for the questions. Most importantly, your answers will influence the kinds of statistical procedures you are able to conduct for your study.

Constructing Your Response Categories - Establishing Your Level of Measurement

If you are constructing your data collection instrument, you have the opportunity to establish the level of measure for many of your variables.
As an example, the variable education can be constructed in such a way that your data may be a nominal, ordinal, or an interval/ratio measure.

Education as a nominal measure:
    Do you have a high school diploma?  ____Yes  ____No

Education as an ordinal measure:
    What is your current class standing?  ___Senior  ___Junior  ___Sophomore  ___Freshman

Education as an interval/ratio measure:
    How many years of education do you have?  ______Years

As you examine the examples above of how you could construct a question about one’s level of education, you should recognize that designing and constructing a survey instrument is both a science and an art, and you should think of a question in terms of its response categories and level of measure. The next section further illustrates the importance of the response categories of your questions.

Associating Response Categories of a Question to Statistical Procedures

This section presents the relationship between the level of measure of the response categories of a question and the basic statistical procedures you can conduct. As noted earlier and illustrated in sections still to come, you should think in terms of both univariate and bivariate data analysis. The tables below provide a basic guide for the types of univariate and bivariate data analysis you can conduct, based on the measurement level of your variables. In the tables below, measurement level refers to the response categories for a given question on a survey.
Table 1: Univariate Procedures

Measurement Level                                    Basic Statistical Procedures
Nominal measures                                     Mode, Percentages, Ratios
  EX: gender; ethnicity; religious preference
Ordinal measures                                     Mode, Median, Percentages, Ratios,
  EX: socioeconomic status as high, medium,          Quartiles
  and low; class standing as Senior, Junior,
  Sophomore, Freshman
Interval/ratio measures                              Mode, Median, Percentages, Ratios,
  EX: age in years; income in dollars; test scores   Quartiles, Mean, Standard deviation

In Table 2, Bivariate Statistical Procedures, you will notice a row and column identified as dichotomous. Dichotomous variables are a special category of variables that have only two meaningful response categories. Dichotomous variables, for the purpose of this guide, will be treated as though they are nominal level variables. Examples of dichotomous variables include Sex (Male/Female), US Citizen (Yes/No), Race (White/Nonwhite), and Religion (Christian/Non-Christian).

Table 2 also provides you with recommendations about statistical procedures you may want to conduct when examining two variables. Table 2 is read by looking at the intersection of the row and column that represent the level of measure of your two variables. Thus, if you have two interval level variables (interval x interval) you should probably conduct a Pearson’s r (correlation).
Table 2: Bivariate Statistical Procedures

Measurement Level     Measurement Level of First Variable
of Second Variable    Dichotomous            Nominal                Ordinal                Interval/Ratio

Dichotomous           Chi square
                      Phi

Nominal               Chi square             Chi square
                      Cramer’s V             Cramer’s V
                      Lambda                 Lambda

Ordinal               Chi square             One-way ANOVA          Gamma, Somers’ d,
                      t-test (for            (for interval-like     Tau B, Tau C,
                      interval-like data)    data)                  Spearman’s rho,
                                                                    Pearson’s r (for
                                                                    interval-like data)

Interval/Ratio        t-test for             One-way ANOVA          One-way ANOVA          Pearson’s r
                      independent,                                  Pearson’s r (for
                      paired, and                                   interval-like data)
                      one-sample

Basic Guidelines for Analyzing Data

Before you actually begin to conduct your data analysis, there are a few preliminary points to consider that may affect your statistical analysis. The statements below are for you to consider once you have collected your surveys and as you enter and begin the statistical analysis of your data.

1. “Junk in, junk out,” meaning if your data is not entered accurately (is not “clean”), the conclusions drawn from your statistical analysis may not be correct.
2. You are generally more likely to find statistical significance with larger samples. Thus, if you have a small sample (exactly what “small” means will need to be covered in a research methods course), you are less likely to find significance, which leads to the next point.
3. While an alpha level of .05 (level of significance, α = .05) is standard for most social science research, you may decide to establish either a higher or lower alpha based on your research design, question, and sample size. Consult with your professor or a statistical consultant about the alpha to establish for your analysis. The important point to remember is that you should establish your alpha before you conduct your statistical analysis.
4. In statistical analysis a relationship is either significant or not significant.
   There is no relationship that can be described as “highly significant” or “strongly significant.” If you have established your alpha as .05, then whether the computed probability (p) is .049 or .0001, you can only state that you have a “significant” relationship.
5. Remember that a high or “strong” correlation is not the same as causation.

Data Analysis: Making Sense of Those Numbers

Check To Be Sure Your Data Is Accurate

One of the first steps in data analysis is to ensure the information in your data file is accurate. In other words, you should have some level of certainty that the data entered into your SPSS data file are correct. One way to check for errors in data entry is to run the Frequencies procedure. This will help you identify one type of data entry error: a numeric value that does not represent a response code. For example, for the variable Sex, you have the numeric codes of 1 for “Male” respondents, 2 for “Female” respondents, and 99 for responses that are “Not Answered.” Upon running the Frequencies procedure you note that a 7 has been entered for the variable. The 7 is a data entry error because you should only have codes of 1, 2, or 99 for the variable Sex.

The Frequencies procedure, however, will only help you identify this one type of data entry error. The output from a Frequencies procedure will not identify data entry errors where, for the variable Sex, you entered a code of 1 for a respondent when it really should have been a 2. In other words, you miscoded the respondent as “Male” instead of “Female,” but the numeric code you entered, a code of 1, is a valid code for the variable Sex. Identifying and correcting this and other types of data entry errors will require other procedures and processes on the part of the researcher or person entering the data.

Conducting a Frequencies Analysis for Each Variable

Check for the following:

a. Is the total number of responses (the number of records entered) correct for each variable? For example, if you entered 40 records, do you have 40 in the data file for each variable, counting good responses plus those you have identified as “system missing”?
b. Are all the numeric codes entered correctly? For example, if you are only supposed to have 1’s for Males, 2’s for Females, and 99’s for Not Answered (NA), did you check to ensure you don’t have any other numeric value entered for that variable?
c. If you note errors in the data, correct them before you conduct your statistical analysis, then rerun “Frequencies” for those variables where corrections were made.
d. Frequencies is not appropriate for string variables that have alphanumeric characters, such as street addresses and names.

Example of a Survey Question and SPSS Frequencies Output for the Variable SEX

Example of a survey question about the respondent’s sex with pre-coded responses:

1. What is your sex?  ____ 1 Male  ____ 2 Female

Example of SPSS Frequencies output for the variable Sex:

Statistics
RESPONDENTS SEX
N     Valid      40
      Missing     0

RESPONDENTS SEX
                     Frequency   Percent   Valid Percent   Cumulative Percent
Valid   0                    1       2.5             2.5                  2.5
        1 MALE              17      42.5            42.5                 45.0
        2 FEMALE            22      55.0            55.0                100.0
        Total               40     100.0           100.0

Note: The value 0 is a data entry error identified by running Frequencies, as there should be only 1’s and 2’s entered.

Though the Frequencies procedure will not totally eliminate the problem of data entry error, it will help reduce the error in your data. The Frequencies procedure can also generate basic descriptive statistics that will allow you to both check your data for errors and begin to develop a sense of the distribution of scores for your variables. The next section discusses univariate statistical procedures that can be conducted as you are running the Frequencies procedure.
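The same kind of check can be sketched outside SPSS. The short Python example below is an illustration only and is not part of the SPSS procedure described in this guide. It assumes the code book for Sex allows only the codes 1, 2, and 99, and it rebuilds the 40 records from the counts in the example output rather than from a real data file. The names VALID_SEX_CODES, frequency_table, and invalid_codes are hypothetical.

```python
from collections import Counter

# Hypothetical code book for the variable SEX, using the codes named in this guide.
VALID_SEX_CODES = {1: "MALE", 2: "FEMALE", 99: "NOT ANSWERED"}


def frequency_table(values):
    """Return (code, frequency, percent, cumulative percent) rows, sorted by code."""
    counts = Counter(values)
    total = len(values)
    rows = []
    cumulative = 0
    for code in sorted(counts):
        cumulative += counts[code]
        rows.append((
            code,
            counts[code],
            round(100 * counts[code] / total, 1),
            round(100 * cumulative / total, 1),
        ))
    return rows


def invalid_codes(values, valid_codes):
    """Return {code: frequency} for every entered code that is not in the code book."""
    counts = Counter(values)
    return {code: n for code, n in counts.items() if code not in valid_codes}


# Records rebuilt from the example output: one 0, seventeen 1's, twenty-two 2's.
sex = [0] * 1 + [1] * 17 + [2] * 22

for row in frequency_table(sex):
    print(row)
# (0, 1, 2.5, 2.5)
# (1, 17, 42.5, 45.0)
# (2, 22, 55.0, 100.0)

print(invalid_codes(sex, VALID_SEX_CODES))
# {0: 1}
```

As with the SPSS Frequencies procedure, this sketch only flags codes that fall outside the code book. It cannot detect a respondent who was entered as 1 when the correct code was 2.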
Univariate Data Analysis

Univariate data analysis is the analysis of a single variable, as opposed to data analysis using two (bivariate) or more (multivariate) variables. The term “descriptive statistics” is most often associated with summarizing the characteristics of a variable or a set of variables. Another general term, “measures of central tendency,” is also used to refer to the statistical procedures that describe the distribution of values of the responses to a single variable. Measures of central tendency include the mode, median, and mean. Other information about the distribution of scores in a variable that further assists with describing the variable includes the range, upper and lower limits, variance, standard deviation, and confidence interval.

Analysis of a Nominal Level Variable

A nominal variable is a categorical variable that is measured in such a way that the categories indicate differences among respondents with no hierarchy or rank order implied in those differences. When constructing a survey question with nominal level response categories, the response categories should be mutually exclusive and exhaustive. Common examples of nominal level variables are Sex (Male/Female), Ethnic Background (Anglo, Hispanic, African American, Asian, Pacific Islander, etc.), and Religion (Protestant, Catholic, Jewish, Islamic, Buddhist, etc.).

The following statistics may be appropriate for nominal variables/data:
o Frequencies (mode)
o Percentages
o Ratios

Example of a survey question and SPSS output for a nominal level variable

Example of a survey question and nominal response categories with pre-coded response categories:

1. What is your religious preference?
   ___1 Protestant  ___2 Catholic  ___3 Jewish  ___4 None  ___5 Other

Example of SPSS outputs for the variable Religious Preference:

Statistics
RELIGIOUS PREFERENCE
N      Valid      1477
       Missing       9
Mode                 1

RELIGIOUS PREFERENCE
                          Frequency   Percent   Valid Percent   Cumulative Percent
Valid     1 PROTESTANT          886      59.6            60.0                 60.0
          2 CATHOLIC            367      24.7            24.8                 84.8
          3 JEWISH               26       1.7             1.8                 86.6
          4 NONE                146       9.8             9.9                 96.5
          5 OTHER                52       3.5             3.5                100.0
          Total                1477      99.4           100.0
Missing   9 NA                    9        .6
Total                          1486     100.0

Example of SPSS pie graph with percentages for the variable Religious Preference:

[Pie graph: PROTESTANT 59.6%, CATHOLIC 24.7%, NONE 9.8%, OTHER 3.5%, JEWISH 1.7%, Missing .6%]

Brief Interpretation of an Analysis of the Variable Religious Preference Using the Mode

The 1,486 respondents in this survey most often reported they were of a Protestant faith, followed by those reporting they were of the Catholic faith.

Brief Interpretation of an Analysis of the Variable Religious Preference Using Percentages

Of the 1,486 total respondents, 59.6% reported they were Protestant, followed by those reporting they were Catholic (24.7%) and Jewish (1.7%), while 9.8% reported they had no religious preference, 3.5% noted they had another religious preference, and 0.6% were “missing,” meaning they did not respond to the question.

Brief Interpretation of an Analysis of the Variable Religious Preference Using a Ratio

Slightly less than three of every five respondents reported they were of the Protestant faith.

Analysis of an Ordinal Level Variable

An ordinal variable is a categorical variable in which there is some inherent rank, hierarchy, or order to the categories. The concept of “rank” in this instance does not imply that respondents in a higher category are in some way better than other respondents. Instead, hierarchy or rank means that the established categories allow the respondents to be arranged along some dimension or in some order.
Common examples of ordinal level variables include Economic Status (Low, Middle, High), Class Standing (Senior, Junior, Sophomore, Freshman), and attitudinal variables, such as Satisfaction with Services (High, Medium, Low).

The following statistics may be appropriate for ordinal variables/data:
o Frequencies (mode, median)
o Percentages
o Quartiles

Example of a survey question and SPSS output for an ordinal level variable

Example of a survey question and ordinal response categories:

1. What is your annual family income?

   ___1  Less than $1,000     ___ 8  $8,000-9,999      ___15  $25,000-29,999
   ___2  $1,000-2,999         ___ 9  $10,000-12,499    ___16  $30,000-34,999
   ___3  $3,000-3,999         ___10  $12,500-14,999    ___17  $35,000-39,999
   ___4  $4,000-4,999         ___11  $15,000-17,499    ___18  $40,000-49,999
   ___5  $5,000-5,999         ___12  $17,500-19,999    ___19  $50,000-59,999
   ___6  $6,000-6,999         ___13  $20,000-22,499    ___20  $60,000-74,999
   ___7  $7,000-7,999         ___14  $22,500-24,999    ___21  $75,000+

Example of SPSS outputs for the variable Family Income

Statistics
TOTAL FAMILY INCOME (N=1486)
N             Valid      1405
              Missing      81
Median                  16.00
Mode                    18.00
Percentiles   25        11.00
              50        16.00
              75        19.00

Numeric values represent various income groups. See the next Table.
TOTAL FAMILY INCOME
                             Frequency   Percent   Valid Percent   Cumulative Percent
Valid     1 LT $1000                12        .8              .9                   .9
          2 $1000-2999              17       1.1             1.2                  2.1
          3 $3000-3999              16       1.1             1.1                  3.2
          4 $4000-4999              17       1.1             1.2                  4.4
          5 $5000-5999              32       2.2             2.3                  6.7
          6 $6000-6999              13        .9              .9                  7.6
          7 $7000-7999              21       1.4             1.5                  9.1
          8 $8000-9999              38       2.6             2.7                 11.8
          9 $10000-12499            73       4.9             5.2                 17.0
          10 $12500-14999           62       4.2             4.4                 21.4
          11 $15000-17499           68       4.6             4.8                 26.3
          12 $17500-19999           63       4.2             4.5                 30.7
          13 $20000-22499           70       4.7             5.0                 35.7
          14 $22500-24999           70       4.7             5.0                 40.7
          15 $25000-29999          103       6.9             7.3                 48.0
          16 $30000-34999          110       7.4             7.8                 55.9
          17 $35000-39999           80       5.4             5.7                 61.6
          18 $40000-49999          141       9.5            10.0                 71.6
          19 $50000-59999           93       6.3             6.6                 78.2
          20 $60000-74999           93       6.3             6.6                 84.8
          21 $75000+               130       8.7             9.3                 94.1
          22 REFUSED                83       5.6             5.9                100.0
          Total                   1405      94.5           100.0
Missing   98 DK                     56       3.8
          99 NA                     25       1.7
          Total                     81       5.5
Total                             1486     100.0

Brief Interpretation of an Analysis of the Variable Family Income

Though the annual family income most often reported was between $40,000 and $49,999, the median annual family income for the 1,405 valid respondents was between $30,000 and $34,999. Twenty-five percent of the families reported an annual income of less than $17,500, while the upper 25% reported an annual income of $50,000 or more.

Analysis of an Interval/Ratio Level Variable

Unlike nominal and ordinal variables, which are categorical, interval and ratio level variables are numeric or scaled variables. For these variables, the numbers are ordered and ranked, and the distance between adjacent numbers is the same across the scale (i.e., $5.00 is higher than $4.00 by the same amount as $99.00 is higher than $98.00). Interval variables are like ratio variables except that interval variables do not have a true zero, meaning a value of zero does not indicate the absence of the characteristic, so values of an interval variable cannot be compared as proportions of one another.
For example, age is a ratio level variable because an age of zero is a true zero point and someone 20 years of age is twice as old as someone who is age 10. IQ score is an interval level variable because an IQ of 100 does not mean a person has twice the intelligence of a person with an IQ of 50. Statistically, however, interval and ratio level data are treated the same way. Variables often used in social research that are interval/ratio level include number of children in a family, number of therapy or counseling sessions, number of times married, and number of days hospitalized.

The following statistics may be appropriate for interval/ratio variables/data:
o Frequencies (mode, median, mean)
o Quartiles
o Standard deviation

Example of a survey question and SPSS output for an interval level variable

1. How old were you when you were first married?  ____ Years of age

Statistics
AGE WHEN FIRST MARRIED
N                Valid        590
                 Missing      896
Mean                        22.64
Median                      22.00
Mode                           21
Std. Deviation              4.710
Minimum                        13
Maximum                        57
Percentiles      25         19.00
                 50         22.00
                 75         25.00

Brief Interpretation of an Analysis of Age When First Married

When asked their age when they were first married, 590 of 1,486 respondents had a valid response. The average age of first marriage was 22.64 years (sd = 4.710), while the median age of first marriage was 22 years. The most frequently reported age of first marriage was 21 years. The youngest reported age of first marriage was 13 years and the oldest was 57 years. The lower twenty-five percent of the respondents noted that they first married by age 19, while the upper twenty-five percent reported they were first married at age 25 or older.

While univariate analysis of data is an important and helpful procedure to describe a variable, even more information about the data can be gathered by conducting bivariate data analysis. The next section presents a discussion of the more common types of bivariate data analysis procedures.
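For readers who want to see how the statistics in the output above are computed, the following Python sketch calculates the same set of descriptive statistics with the standard library. It is an illustration only: the list ages is hypothetical (the guide does not include the raw survey data), the function name describe is an assumption, and percentile conventions differ between software packages, so quartiles from this sketch may not match SPSS output exactly.

```python
import statistics

# Hypothetical ages at first marriage; NOT the survey data summarized above.
ages = [18, 19, 19, 21, 21, 21, 22, 23, 25, 27, 31]


def describe(values):
    """Summarize an interval/ratio variable with basic descriptive statistics."""
    # quantiles(n=4) returns the three cut points that divide the data into quartiles.
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return {
        "n": len(values),
        "mean": round(statistics.mean(values), 2),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        # stdev() is the sample standard deviation (divides by n - 1).
        "std_dev": round(statistics.stdev(values), 3),
        "minimum": min(values),
        "maximum": max(values),
        "percentiles": {25: q1, 50: q2, 75: q3},
    }


summary = describe(ages)
for name, value in summary.items():
    print(f"{name}: {value}")
```

The mean and standard deviation are only meaningful for interval/ratio data. For an ordinal variable such as the income categories shown earlier, the median, mode, and quartiles of the category codes are the appropriate summaries.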
Bivariate (2 variables) Data Analysis

The more common bivariate statistical analysis procedures (ones you will most likely use) include the chi square, t-test, analysis of variance (ANOVA), and Pearson’s r (correlation). Each procedure is discussed in the sections below, which include the assumptions for the statistical procedure, conditions for selecting it, and an SPSS example with a brief statement describing the analysis of the SPSS output.

Chi Square (Goodness of Fit) Test

The chi square is a nonparametric test for the bivariate analysis of two nominal level variables. When conducting the chi square test, the data is often displayed as a cross tabulation (crosstab) or contingency table. The chi square is actually the name of the test statistic used to determine if there is a significant relationship between the two nominal variables. The specific statistical procedure is discussed in most statistical textbooks.

In SPSS the crosstabs procedure may also be used to determine the association between two ordinal level variables and nominal/interval level variables, though doing so requires specific and special data analysis procedures. Consult with your professor or a statistician if you think you will need to conduct an analysis of ordinal or interval level variables using the crosstabs.

Assumptions of the chi square:
o Probability sampling design.
o 80% or more of the cells in your contingency table should have an expected cell frequency of 5 or greater.
o Observations are independent, meaning you should not use the chi square test for matched pairs.
o You should apply Yates’ correction factor for 2x2 contingency tables when the cell frequencies are 5 or more, but less than 10.
o You should apply Fisher’s exact test when the sample size for a 2x2 contingency table is 20 or less.
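To make the expected cell frequency assumption concrete, the following Python sketch computes the Pearson chi square, its degrees of freedom, the expected counts, and Cramer’s V directly from a table of observed counts. It is offered as an illustration under stated assumptions, not as a replacement for the SPSS Crosstabs procedure. The counts are taken from the two crosstab examples later in this section; the function names chi_square_summary and p_value_df1 are hypothetical; the p value shortcut applies only to tables with 1 degree of freedom; and Yates’ correction and Fisher’s exact test are not implemented.

```python
import math


def chi_square_summary(observed):
    """Pearson chi square for a contingency table given as a list of rows of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    # Expected count for each cell = (row total x column total) / N.
    expected = [[r * c / n for c in col_totals] for r in row_totals]

    chi_square = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)

    cells = [e for row in expected for e in row]
    cells_below_5 = sum(1 for e in cells if e < 5)

    # Cramer's V; for a 2x2 table this equals Phi.
    k = min(len(row_totals), len(col_totals))
    cramers_v = math.sqrt(chi_square / (n * (k - 1)))

    return {
        "chi_square": chi_square,
        "df": df,
        "min_expected": min(cells),
        "cells_below_5": cells_below_5,
        "share_expected_5_or_more": 1 - cells_below_5 / len(cells),
        "cramers_v": cramers_v,
    }


def p_value_df1(chi_square):
    """Two-sided p value for a chi square with exactly 1 degree of freedom."""
    return math.erfc(math.sqrt(chi_square / 2))


# Example 2: favor/oppose the death penalty (rows) by sex (columns: male, female).
death_penalty = chi_square_summary([[199, 232], [44, 83]])

# Example 1: general happiness (rows) by marital status (columns: married,
# widowed, divorced, separated, never married).
happiness = chi_square_summary([
    [81, 8, 12, 1, 15],
    [143, 33, 52, 8, 83],
    [18, 8, 11, 2, 19],
])

print(round(death_penalty["chi_square"], 3), death_penalty["df"])
print(round(p_value_df1(death_penalty["chi_square"]), 3))
print(round(happiness["chi_square"], 3), happiness["df"])
print(happiness["cells_below_5"], round(happiness["share_expected_5_or_more"], 3))
```

In the happiness table, 2 of the 15 cells have an expected count below 5, so about 87% of the cells meet the guideline and the 80% assumption is satisfied.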
SPSS procedures to request in the Crosstabs dialogue box (see Appendix 2 Crosstab and Chi Square SPSS Screens):
o Click the “Statistics” button and check the “Chi square” box in the upper left corner.
o Check the appropriate measure of association box (most likely “Phi and Cramer’s V”).
o Click the “Cells” button and check the row and/or column and/or total percentages box (based on how you prefer to look at/analyze the table).
o For Residuals, standardized residuals are recommended. Cells with standardized residual values greater than +1.0 may reveal the cell that contributes to a significant chi square test.

Measures of association for 2 nominal variables:
o Phi - for 2x2 tables
o Cramer’s V - for other tables (2x3, 3x3, etc.)

Example 1 - Chi square test

The contingency table and data analysis examine the relationship between general happiness and marital status of the respondents. Examples of possible survey questions are noted below.

1. What is your current marital status?
   ___1 Married  ___2 Widowed  ___3 Divorced  ___4 Separated  ___5 Never married
2. What is your current level of general happiness with life?
   ___ Very happy  ___ Pretty happy  ___ Not too happy

SPSS Output of a Contingency Table and Chi Square Test

GENERAL HAPPINESS by MARITAL STATUS (N=494)

                                                          MARITAL STATUS
GENERAL HAPPINESS                          MARRIED   WIDOWED   DIVORCED   SEPARATED   NEVER MARRIED    Total
VERY HAPPY      Count                           81         8         12           1              15      117
                % within Marital Status      33.5%     16.3%      16.0%        9.1%           12.8%    23.7%
                Std. Residual                  3.1      -1.1       -1.4        -1.0            -2.4
PRETTY HAPPY    Count                          143        33         52           8              83      319
                % within Marital Status      59.1%     67.3%      69.3%       72.7%           70.9%    64.6%
                Std. Residual                 -1.1        .2         .5          .3              .9
NOT TOO HAPPY   Count                           18         8         11           2              19       58
                % within Marital Status       7.4%     16.3%      14.7%       18.2%           16.2%    11.7%
                Std. Residual                 -2.0        .9         .7          .6             1.4
Total           Count                          242        49         75          11             117      494
                %                           100.0%    100.0%     100.0%      100.0%          100.0%   100.0%

Chi-Square Tests
                                 Value    df   Asymp. Sig. (2-sided)
Pearson Chi-Square              29.537     8                    .000
Likelihood Ratio                30.445     8                    .000
Linear-by-Linear Association    22.369     1                    .000
N of Valid Cases                   494
a. 2 cells (13.3%) have expected count less than 5. The minimum expected count is 1.29.

Points to note in the output above:
o Standardized residuals > +1.0
o Level of significance (p) is < .05
o Chi square value and degrees of freedom (df)
o Number of cells (%) that have an expected frequency of less than 5

Symmetric Measures
                                     Value   Approx. Sig.
Nominal by Nominal   Phi              .245           .000
                     Cramer's V       .173           .000
N of Valid Cases                       494
a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.

Point to note in the output above:
o Value of Cramer’s V noting a “weak” association

Brief Interpretation of the Analysis of General Happiness with the Marital Status of the Respondents

There is a significant (χ² = 29.537, df = 8, p < .01) but weak association (Cramer’s V = .173) between one’s level of general happiness and marital status. Persons who are married are significantly more likely to report they are very happy, while persons who have never married are more likely to report they are not too happy.

Example 2 - Chi square test

The table and data analysis examine the relationship between favoring or opposing the death penalty for the crime of murder and the sex of the respondent. Possible survey questions are also provided below.

1. What is your sex?
   ___1 Male  ___2 Female
2. Do you favor or oppose the death penalty for the crime of murder?
   ___1 Favor  ___2 Oppose

SPSS Output for a Cross Tabulation and Chi Square Test

FAVOR OR OPPOSE THE DEATH PENALTY FOR MURDER by SEX

FAVOR OR OPPOSE DEATH                                        SEX
PENALTY FOR MURDER                                1 MALE   2 FEMALE    Total
1 FAVOR     Count                                    199        232      431
            % within RESPONDENTS SEX               81.9%      73.7%    77.2%
            Std. Residual                             .8        -.7
2 OPPOSE    Count                                     44         83      127
            % within RESPONDENTS SEX               18.1%      26.3%    22.8%
            Std. Residual                           -1.5        1.3
Total       Count                                    243        315      558
            % within RESPONDENTS SEX              100.0%     100.0%   100.0%

Chi-Square Tests
                                 Value   df   Asymp. Sig.   Exact Sig.   Exact Sig.
                                               (2-sided)    (2-sided)    (1-sided)
Pearson Chi-Square               5.301    1        .021
Continuity Correction            4.843    1        .028
Likelihood Ratio                 5.385    1        .020
Fisher's Exact Test                                              .025         .013
Linear-by-Linear Association     5.291    1        .021
N of Valid Cases                   558
a. Computed only for a 2x2 table
b. 0 cells (.0%) have expected count less than 5. The minimum expected count is 55.31.

Symmetric Measures
                                     Value   Approx. Sig.
Nominal by Nominal   Phi              .097           .021
                     Cramer's V       .097           .021
N of Valid Cases                       558
a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.

Brief Interpretation of the Analysis of Attitude Toward the Death Penalty for Murder with Respondent’s Sex

There is a significant (χ² = 5.301, df = 1, p = .021) but weak association (Phi = .097) between a person favoring or opposing the death penalty for the crime of murder and the person’s sex. Women are significantly more likely to oppose the death penalty for the crime of murder than are men.

Statistics is like grout - The word feels decidedly unpleasant in the mouth, but it describes something essential for holding a mosaic in place.
- Ramsey & Schafer -

Appendix 1
Frequencies SPSS Screens

Highlight and move the variables from the variable list to the “Variable(s)” box by clicking the arrowhead.

Click the “OK” button to run Frequencies. Output for the first variable, “quality,” is noted below.

quality (Variable Name)   Quality of Svc (Variable Label)
                                                                   Cumulative
Values                Frequency   Percent   Valid Percent             Percent
Valid   1 Poor                1        .9              .9                  .9
        2 Fair                2       1.9             1.9                 2.8
        3 Good               17      15.9            15.9                18.7
        4 Excellent          87      81.3            81.3               100.0
        Total               107     100.0           100.0

Appendix 2
Crosstab and Chi Square SPSS Screens

Click “Statistics” to get the “Crosstabs: Statistics” dialogue box.

Click “Cells” to get the “Crosstabs: Cell Display” dialogue box.

References

Holcomb, Z. C. (2006). SPSS basics: Techniques for a first course in statistics. Glendale, CA: Pyrczak Publishing.
Kachigan, S. K. (1986).
    Statistical analysis: An interdisciplinary introduction to univariate and multivariate methods. New York: Radius Press.
Keller, G. (2001). Applied statistics with Microsoft® Excel. Pacific Grove, CA: Duxbury.
Norusis, M. J. (2011). IBM SPSS Statistics 19 guide to data analysis. Upper Saddle River, NJ: Prentice Hall.
Ramsey, F. L., & Schafer, D. W. (2002). The statistical sleuth: A course in methods of data analysis (2nd ed.). Belmont, CA: Duxbury Press.
Rubin, A., & Babbie, E. (2008). Research methods for social work (7th ed.). Belmont, CA: Thomson Brooks/Cole.