USING SPSS FOR DATA ENTRY AND ANALYSIS
Dwayne Devonish

NB: This guide can still be used with later versions of SPSS despite some operational and cosmetic changes. An update is pending.

USING SPSS
• SPSS = Statistical Package for the Social Sciences
• Three main steps in SPSS:
o Defining variables/items in SPSS (preparing the data file): This process involves defining and labelling each variable as well as assigning numbers to each possible response (Pallant, 2005).
o Entering the data: After defining all variables, you can enter your data.
o Conducting statistical analyses: SPSS can perform a range of statistical analyses on your data, including frequencies, means and standard deviations, and crosstabulations.

QUESTIONS/VARIABLES
• You have to be familiar with the different types of questions or items on a questionnaire. A question or item on a questionnaire is treated as a variable in SPSS.
• A variable is any characteristic that can vary, for example gender, age and income.
• In SPSS, variables can assume different levels of measurement.
• Nominal and ordinal variables are sometimes referred to as categorical variables because they possess distinct labels or categories into which persons or objects are placed. Ordinal variables, however, consist of response options/categories with some intrinsic order; that is, the categories are rank-ordered. Examples: Strongly Agree, Agree, Neutral, Disagree and Strongly Disagree; or Primary, Secondary, First Degree and Postgraduate.

QUESTIONS/VARIABLES (2)
• Nominal variables have categories with no implied order, such as gender (male or female), religious affiliation (Christian or Muslim) or type of car (Nissan or Suzuki).
• Interval and ratio variables elicit numerical (scale) data; that is, the data are in the form of numbers. These variables are also known as continuous variables. Examples: age, height and weight.
• Nominal, ordinal, interval and ratio variables represent different types of items on a questionnaire.
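The difference between the levels of measurement can be illustrated outside SPSS. The following is a minimal Python sketch, not SPSS syntax; the names `EDUCATION_ORDER`, `GENDER_CODES` and `higher_education`, the numeric codes, and the sample values are all hypothetical and chosen only for illustration.

```python
# Ordinal variable: the categories have an intrinsic order, so each
# category can be given a rank (1 = lowest). Ranks are hypothetical.
EDUCATION_ORDER = ["Primary", "Secondary", "First Degree", "Postgraduate"]
education_rank = {label: rank for rank, label in enumerate(EDUCATION_ORDER, start=1)}

# Nominal variable: the numbers are only labels for the categories.
# Saying that 2 is "more than" 1 would be meaningless here.
GENDER_CODES = {1: "Male", 2: "Female"}


def higher_education(a, b):
    """Return whichever of two education levels ranks higher."""
    return a if education_rank[a] >= education_rank[b] else b


# Ordinal data can be sorted by rank; nominal data cannot be ranked.
responses = ["Secondary", "Primary", "Postgraduate", "First Degree"]
sorted_responses = sorted(responses, key=education_rank.get)
print(sorted_responses)

# Scale (interval/ratio) data are already numbers, so arithmetic such
# as an average is meaningful. These ages are made up.
ages = [24, 31, 19]
print(sum(ages) / len(ages))
```

The point of the sketch is only that ordering is meaningful for ordinal categories and arithmetic is meaningful for scale data, while nominal codes are arbitrary labels.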
QUESTIONS/VARIABLES (3)
• All you need to know is that there are really three basic types of questions/items on a questionnaire:
• Closed-ended items: Items with a specified number of response options, from which respondents circle or tick one or more. These are the categorical variables (nominal or ordinal).
• Open-ended/textual items: Items that allow respondents to provide textual responses (data in words; qualitative data).
• Numerical items: Items that only elicit data in the form of numbers. Respondents write in a number as a response (interval or ratio).

SPSS v.11 FOR WINDOWS
• This guide introduces SPSS version 11; however, it may still help with later versions. For example, SPSS v.15 to v.20 have a very similar structure and operational set-up, with some minor changes.
• In SPSS, there are two main windows: the Data Editor window and the Output Viewer.
• The Data Editor is used to define the variables and to enter the raw data from completed questionnaires.
• There are two different views in the Data Editor: Variable View and Data View (selected at the bottom left-hand side of the SPSS window).
• Let’s go through the three main phases in SPSS.

Phase 1: Setting up the Data Sheet
• In Variable View (select it by clicking the tab at the bottom), you can prepare your data sheet (codebook). This involves defining your variables/questions in the SPSS data sheet.
• Each row in Variable View represents a variable to be defined. The characteristics used to define variables are listed along the top of the data sheet.
• The characteristics are (from left to right): Name, Type, Width, Decimals, Label, Values, Missing, Columns, Align and Measure.

SAMPLE QUESTIONNAIRE
• Let’s use this short questionnaire:
1. Gender: Male __ Female __
2. Age: _______
3. Income: $10-20 _ $21-30 _ $31-45 _ Over $45 _
4. Suggestions: _______________________________

Phase 1 (Under Name)
• Under Name: Type in a name for your variable. The variable name allows you to identify the question/variable in SPSS when entering your data from a questionnaire. The variable name:
o Cannot be longer than 8 characters in SPSS v.11 (later versions of SPSS allow longer names)
o Must be unique (each variable must have a different name)
o Must begin with a letter
For our questionnaire (see previous slide), we give our first question/variable the name ‘gender’.
Variable characteristics are listed along the top.
The Data Editor window has two views, Data View and Variable View, which can be selected at the bottom of the screen. Click Variable View to define your variables.

Phase 1 (Under Type)
• Under Type: This option represents the type of variable or data you are entering. There are two common variable types you should be familiar with: 1) Numeric and 2) String. Don’t worry about the others!
The ‘numeric type’ is used with 1) questions that are closed-ended (i.e. they have categories or response options from which to choose; in SPSS, a number is assigned to each category or response option on these questions/variables) and 2) questions/variables that elicit (seek to obtain) numerical data. ‘Continuous’ and ‘scale’ are other terms for these interval or ratio variables, that is, for numerical questions.

Phase 1 (Under Type, continued)
Gender is a categorical (nominal) variable and a closed question because it places people into two distinct groups/categories: Male or Female. You must choose numeric for our first question measuring gender.
The ‘string type’ is used only for questions that seek to obtain textual data (data in the form of words). These questions are open-ended.
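The three naming rules listed under Name can be checked before any variables are typed into Variable View. The following is a minimal Python sketch, not part of SPSS; the function name `check_variable_names` and the example names are hypothetical, and treating names as case-insensitive when testing uniqueness is an assumption made here.

```python
def check_variable_names(names, max_length=8):
    """Check names against the three rules in this guide.

    Rules: at most `max_length` characters (8 in SPSS v.11), unique,
    and beginning with a letter. Returns a dict mapping each offending
    name to a list of the rules it breaks.
    """
    problems = {}
    seen = set()
    for name in names:
        issues = []
        if len(name) > max_length:
            issues.append("too long")
        if not name[:1].isalpha():
            issues.append("must begin with a letter")
        # Assumption: names differing only in case count as duplicates.
        if name.lower() in seen:
            issues.append("duplicate")
        seen.add(name.lower())
        if issues:
            problems[name] = issues
    return problems


# Names for the sample questionnaire: all acceptable.
print(check_variable_names(["gender", "age", "income", "suggest"]))

# Hypothetical bad names: one starts with a digit, one is too long.
print(check_variable_names(["2ndjob", "suggestions"]))
```

For later versions of SPSS, a larger `max_length` could be passed instead of the default of 8.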
You can activate the ‘Type’ cell by clicking the grey box to the right of the cell. Click on the grey box to open the dialogue box containing the variable type options. Numeric is the default in SPSS, so you don’t have to change this option.

Phase 1 (Under Width)
• Under Width: The width is the character or digit span. The default width is 8, which permits up to 8 characters or digits when entering data (e.g. 8-digit numbers). This width can accommodate most data; however, if you have a continuous variable with very large values (e.g. $7,000,000,000), you would have to increase the width. Otherwise, leave it at 8. For string variables, increase the width to the maximum (255 characters in SPSS 11), since you will be accommodating words instead of numbers.

Phase 1 (Under Decimals and Label)
• Under Decimals: The default setting for decimals is 2. For most variables, you should set decimals to 0 (zero). If your variable requires decimal places (e.g. income as a continuous variable: $500.28), adjust the decimals to suit.
• Under Label: The Label column allows you to give a longer description of your variable than the 8 characters allowed under Name. For example, you can type in “Gender of Respondent” or “What is your Gender”. When you conduct statistical tests, the label appears above the generated statistical output so you can match the output to the question you analysed.

Phase 1 (Under Values)
• Under Values: This column allows you to assign numbers to represent the categories or labels of your variable. It should only be activated for closed-ended items. Using the grey box on the right of the cell, you can open a dialogue box in which you insert the value (number) and the corresponding value label (category or response option). Click Add after each pair so that it appears in the lower field of the dialogue box, and repeat until you have assigned a number to each label. Remember that all values and their value labels must appear in the lower field of the box before you click OK.
Click Add to register your coding scheme so that it goes in the lower field of the box. ‘1 = Male’ has already been registered in the lower box. You must click Add to register ‘2 = Female’ in the lower box. You can click OK when you are done.

Phase 1 (Under Missing)
• Under Missing: You can assign specific values to indicate missing values in your data. For example, if you arrive at a missing response on gender, you can specify a number that SPSS will register as a missing response. Alternatively, when entering data, you can leave the field or cell blank if you encounter a missing response; this is preferred. SPSS registers the blank field as a missing response. If you plan to use a value to represent a missing response, activate the Missing column using the grey box on the right of the cell; otherwise, leave it alone.
1. The grey box activates the ‘Missing’ column.
2. Deselect ‘No missing values’ and select ‘Discrete missing values’. You can type a value that will represent a missing response, for example ‘9999’.
3. Click OK after you have specified your value.

Phase 1 (Under Columns, Align and Measure)
• Under Columns and Align: You do not have to adjust or change these settings. Move on to Measure.
• Under Measure: You can specify the level of measurement your variable assumes (nominal, ordinal or scale). For gender, specify a nominal measure. Measure refers to the types discussed earlier. Remember that scale covers both interval and ratio (continuous) variables.
• Congratulations, you have successfully coded gender into SPSS.
• Later versions of SPSS (e.g. 19 and 20) have another variable characteristic known as ‘Role’ (after ‘Measure’). You can leave it at ‘Input’, the default selection.
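The coding scheme and the missing-value convention described above amount to a lookup from numbers to labels, with one number (or a blank) reserved for "no response". The following is a minimal Python sketch of that idea, not SPSS syntax; the names `GENDER_LABELS`, `MISSING_CODE` and `decode`, and the raw values, are hypothetical, and `None` stands in for a blank cell.

```python
# Value labels for gender, as registered in the Values dialogue box.
GENDER_LABELS = {1: "Male", 2: "Female"}

# Discrete missing value, as in the 9999 example above.
MISSING_CODE = 9999


def decode(raw, labels, missing_code=MISSING_CODE):
    """Translate a raw cell value into its label.

    A blank cell (None here) or the missing code yields None,
    which marks a missing response.
    """
    if raw is None or raw == missing_code:
        return None
    return labels[raw]


# Hypothetical raw entries: two valid codes, one 9999, one blank, one valid.
raw_gender = [1, 2, 9999, None, 2]
decoded = [decode(value, GENDER_LABELS) for value in raw_gender]
valid = [value for value in decoded if value is not None]

print(decoded)
print(len(valid), "valid,", len(decoded) - len(valid), "missing")
```

Both the blank cell and the 9999 entry are counted as missing, which mirrors the two options described for handling a missing response.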
Phase 1 (Income and Age)
• For the income variable, follow the same procedure:
o Name: income
o Type: numeric (closed question with categories to choose from)
o Width: 8
o Decimals: 0
o Label: Income of Respondent
o Values: four categories (1 = 10-20, 2 = 21-30, 3 = 31-45, 4 = Over 45)
o Missing: 9999
o Measure: ordinal
• For the age question, the same procedure is used, but when you arrive at ‘Values’ you do not have to assign numbers because age is a numerical/continuous variable; that is, the data are already in the form of numbers. Leave Values as ‘None’. The Measure is specified as ‘Scale’.

Phase 1 (Suggestions)
• For the final question on our survey (Suggestions), the respondent is asked to offer suggestions in a space provided. This question is open-ended and requires textual data (i.e. words). When you reach ‘Type’, change the option to String, which allows you to put text or words into the field or cell during the data entry stage. Increase the width to 255 characters (the maximum character span). The Decimals, Values and Missing columns are inactive. Type in the label, e.g. Suggestions from Respondents. You are done!
DON’T FORGET TO SAVE YOUR FILE
You can click the ‘Data View’ tab to enter your data (Phase 2). You enter each questionnaire’s data in a row from left to right. Variables are listed across the top of the data sheet in Data View.

Phase 2: Data Entry
• Let’s enter this completed questionnaire:
• Gender: Male X Female _
• Age: 24
• Income: 10-20 _ 21-30 X 31-45 _ Over 45 _
• Suggestions: Eat more healthy
1. You can click the icon (with a label) on the menu bar to see your value labels instead of raw numbers.
2. You can see ‘Male’ (instead of 1) and ‘21-30’ (instead of 2) when your value labels have been requested.

Phase 3: Analysis of Data
• After entering your data, you can move on to analyse it.
• Descriptive statistics are a branch of statistics used to describe the characteristics of your sample.
Descriptive statistics include frequencies and percentages, means and standard deviations, and crosstabulations.
• For categorical variables (nominal or ordinal), that is, questions that have a number of categories or response options attached, you can use frequencies/percentages to obtain the number/proportion of persons that fall into the different categories. For example, gender and income are categorical and should be analysed using frequencies.

Phase 3: Analysis of Data (2)
• For continuous or scale variables such as age (numerical data), you use means and standard deviations. These statistics (like frequencies for gender) give you a descriptive account of your sample in terms of age. The mean is the same as the average. You use the mean and standard deviation to summarise numerical or continuous data (e.g. age).

Descriptive Statistics
• The main descriptive statistics include:
• Frequencies (and percentages)
• Means and standard deviations
• Crosstabulations
1. We are going to analyse ‘gender’ (a categorical variable) using the frequencies/percentages command.
2. Click the ‘Analyze’ menu at the top of the menu bar.
3. Move to ‘Descriptive Statistics’, then click ‘Frequencies’ in the next box.

1. A dialogue box appears marked ‘Frequencies’. In the left box field, you have a list of the variables you defined in Variable View.
2. To run frequencies on a variable, choose the variable from the left box field and, using the arrow in the middle, transfer it to the right box field under ‘Variable(s)’.

1. Gender is transferred to the right box field. We want to analyse gender.
2. We can choose different charts to accompany our frequency command on gender. Click on ‘Charts’ to see the options.

1. A chart dialogue box appears. You can choose bar charts or pie charts. Let’s choose pie charts.
2. Click Continue and then click OK on the first dialogue box.

This table shows the frequencies and percentages for males and females in the sample.
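To make clear what the Frequencies command and the mean/standard deviation statistics compute, here is a minimal Python sketch using only the standard library. It is not SPSS output; the function name `frequency_table` and the data values are hypothetical. The standard deviation uses the sample formula (dividing by n - 1).

```python
from collections import Counter
from statistics import mean, stdev


def frequency_table(values):
    """Return {category: (count, percent)} for a categorical variable."""
    counts = Counter(values)
    total = sum(counts.values())
    return {category: (n, 100 * n / total) for category, n in counts.items()}


# Hypothetical responses, already decoded to their value labels.
gender = ["Male", "Female", "Female", "Male", "Female"]
ages = [20, 22, 24, 26, 28]

# Categorical variable: frequencies and percentages.
table = frequency_table(gender)
for category, (count, percent) in table.items():
    print(f"{category}: {count} ({percent:.1f}%)")

# Continuous variable: mean and (sample) standard deviation.
print(f"Mean age = {mean(ages):.2f}, SD = {stdev(ages):.2f}")
```

Running frequencies on `ages` or a mean on `gender` would be possible in code but not meaningful, which is the reason the guide pairs each statistic with a variable type.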
SPSS opens a separate window called the Output Viewer, where the statistical output is presented.
This box tells you the number of valid cases (those who have filled in information for gender) and missing cases (those who have not indicated their gender). There are ten valid cases.

To analyse continuous variables such as age, use the mean and standard deviation statistics. Go to ‘Analyze’, ‘Descriptive Statistics’ and then ‘Descriptives’.
Again, use the arrow to move age into the right box field.
Click ‘Options’ to see the various descriptive statistics you can request for age.
Ensure that the mean and standard deviation options are ticked. Click Continue and then click OK on the first dialogue box to see the statistical results.

• N = number of persons who indicated their age (sample size)
• Minimum = smallest value on age (youngest person = 13)
• Maximum = largest value on age (oldest person = 53)
• Mean = average: the average (mean) age of the sample is 26.60 years (SD = 12.39)
Remember that the standard deviation (SD) measures the spread of the scores around the mean.
Remember, the mean and standard deviation are used to summarise continuous data (age, number of children, weight, height).

Descriptive Statistics
• Crosstabulations are descriptive tools used to analyse two variables at a time. These variables must be categorical. Crosstabulations can summarise a relationship between two variables.
• For example, let’s say we want to analyse income and gender to determine the number (and percentage) of males and females that fall into the different income categories. We have to use a crosstabulation because we are analysing two variables that are both categorical.

To analyse two variables using a crosstabulation, go to ‘Analyze’, ‘Descriptive Statistics’, then ‘Crosstabs’.
1. Using the arrows, transfer the income variable from the left box field to the right box field under ‘Row’, and transfer the gender variable to the field below it under ‘Column’. You are analysing gender and income, so one variable must go into the row box and the other into the column box.
2. Before conducting the crosstabulation, click on ‘Cells’ at the bottom to request percentages.

1. Click ‘Column’ under ‘Percentages’ to find out the percentage of males and females in the different income categories.
2. Click Continue and then click OK on the first dialogue box to run the analysis.

This table tells you the valid number and proportion of cases.
Income categories are in the rows.
The Male and Female categories of gender are in the columns. This table is referred to as a contingency table.

Interpretation of Contingency Table
• The table in the prior slide tells you that 100% of females (5 out of 5) fell into the $10-$20 income bracket, whereas all males fell into the $21-$30 income bracket.

ENJOY SPSS
Remember, SPSS is a comprehensive statistical programme used to conduct various statistical analyses on data derived from surveys. It can be used to generate charts, tables and other statistics on your data. You can use it with any course that involves a research project, thesis or dissertation.
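For readers who want to see what a crosstabulation with column percentages computes, the following is a minimal Python sketch, not SPSS output. The function name `crosstab_column_percent` is hypothetical. The data are constructed to mirror the pattern described in the interpretation slide (all 5 females in the 10-20 bracket, all males in the 21-30 bracket); the count of 5 males is an assumption inferred from the ten valid cases.

```python
from collections import Counter


def crosstab_column_percent(rows, cols):
    """Cross-tabulate two categorical variables.

    Returns {(row_category, col_category): (count, column_percent)},
    where column_percent is the cell count as a percentage of its
    column total. Empty cells are simply absent from the result.
    """
    cell_counts = Counter(zip(rows, cols))
    col_totals = Counter(cols)
    return {
        (r, c): (n, 100 * n / col_totals[c])
        for (r, c), n in cell_counts.items()
    }


# Illustrative data: income in the rows, gender in the columns.
income = ["10-20"] * 5 + ["21-30"] * 5
gender = ["Female"] * 5 + ["Male"] * 5

table = crosstab_column_percent(income, gender)
for (income_band, sex), (count, percent) in sorted(table.items()):
    print(f"{income_band} / {sex}: {count} ({percent:.0f}% of {sex})")
```

Column percentages answer "what share of each gender falls in each income bracket", which is why gender goes in the columns in the walkthrough.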