Chapter 1 Fundamental concepts

SPSS - Descriptive statistics

Before starting any advanced analysis, it is a good habit to begin with some descriptive statistics and simple graphics, to see what is going on in your data.

Datafile used: gss.sav

How to get there: Analyze > Descriptive Statistics > Frequencies…

This menu selection opens the following Frequencies dialog box:

As you can see, the variables are difficult to read. To make them easier to read, we’ll use variable names instead of labels in dialog boxes. Do this by choosing Edit > Options. Then, in the Options dialog box, click the General tab. In the Variable Lists group box (top right), select ‘Display names’ and click OK. This change does not take effect until the next time you open a data file, so close the datafile and reopen it. Return to the Frequencies dialog box. Now you’ll see the following Frequencies dialog box:

Choose the variable(s) for which you need descriptive statistics by selecting them and clicking on the arrow. They then appear in the ‘Variable(s):’ box. ‘Display frequency tables’ is selected by default. A frequency table shows the absolute and the relative frequencies, as well as the percentage and cumulative percentage of valid cases (that is, excluding missing values). The cumulative percentage is the share of valid cases with a value less than or equal to the value in question.

Button Statistics…

Many descriptive statistics can be selected. The most important are the Mean, Median and Mode, and the Std. deviation, Range, Minimum and Maximum. See following figure.

Button Charts…

Some simple charts can be obtained, such as bar charts, pie charts and histograms. A histogram is a graphical display of counts for ranges of data values. For histograms, one can choose to show the normal curve as well. See following figure.

Once a chart appears in the output, it can be modified by double-clicking it in the SPSS Viewer.
A new window appears, the SPSS Chart Editor, in which changes can be made by clicking on a particular part of the chart (e.g. axis, legend, title). In the following figure, the ‘Category Axis’ window appeared after clicking on the x-axis title, Respondent’s Sex.

Output of running frequencies

Output 1

When you run ‘Frequencies’ on the variable degree without selecting any options, the results are the following:

Frequencies

Statistics
RS Highest Degree
N        Valid      1496
         Missing       4

RS Highest Degree
                          Frequency  Percent  Valid Percent  Cumulative Percent
Valid    Less than HS           279     18,6           18,6                18,6
         High school            780     52,0           52,1                70,8
         Junior college          90      6,0            6,0                76,8
         Bachelor               234     15,6           15,6                92,4
         Graduate               113      7,5            7,6               100,0
         Total                 1496     99,7          100,0
Missing  Don't know               2       ,1
         No answer                2       ,1
         Total                    4       ,3
Total                          1500    100,0

In the table ‘Statistics’, the number of cases (N) is split into Valid and Missing cases. In the frequency table ‘RS Highest Degree’, the variable degree is split into the possible answers (Less than HS, High school, etc.). For each answer the table shows the absolute frequency (Frequency) and the relative frequency (Percent), as well as the percentage and cumulative percentage of valid cases (Valid Percent and Cumulative Percent). Percent is the relative frequency with the missing cases included in the total. Valid Percent is the relative frequency with the missing cases excluded, so that the relative frequencies of the valid cases add up to 100%.

Output 2

When you run ‘Frequencies’ on the variable age, selecting the options mean, median and mode (button Statistics) and histogram with normal curve (button Charts), some of the results are the following (we left out the table ‘Age of Respondent’ because it is very large):

Frequencies

Statistics
Age of Respondent
N        Valid      1495
         Missing       5
Mean               46,23
Median             43,00
Mode                 28a
a. Multiple modes exist. The smallest value is shown

[Figure: histogram of Age of Respondent with normal curve. x-axis: Age of Respondent (20,0 to 90,0); y-axis: Frequency. Std. Dev = 17,42; Mean = 46,2; N = 1495,00]

As before, the numbers of valid and missing cases are shown in the ‘Statistics’ table. The other descriptive statistics (Mean, Median and Mode) appear in the same table. The histogram of the variable age shows its distribution, with Age of Respondent on the x-axis and Frequency on the y-axis. The distribution is roughly bell-shaped but skewed to the right rather than symmetric: the mean (46,23) lies above the median (43,00), which indicates a longer tail toward the higher ages.
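
To make the difference between Percent, Valid Percent and Cumulative Percent in Output 1 concrete, the three columns can be recomputed by hand from the frequency counts. The following is a minimal sketch in Python, not something SPSS itself produces: the function name frequency_table and the dictionary layout are assumptions made for this illustration, and only the counts are taken from Output 1. Note that Python prints a decimal point where the SPSS output above uses a decimal comma.

```python
from itertools import accumulate

# Counts of the valid categories of the variable degree, copied from Output 1.
degree_counts = {
    "Less than HS": 279,
    "High school": 780,
    "Junior college": 90,
    "Bachelor": 234,
    "Graduate": 113,
}
n_missing = 4  # "Don't know" (2) + "No answer" (2)


def frequency_table(counts, n_missing):
    """Hypothetical helper: one row per valid category with the three percentages."""
    n_valid = sum(counts.values())
    n_total = n_valid + n_missing
    running = accumulate(counts.values())  # running sum of the valid counts
    rows = []
    for (label, count), cumulative in zip(counts.items(), running):
        rows.append({
            "label": label,
            "frequency": count,
            # Percent: missing cases are included in the denominator.
            "percent": round(100 * count / n_total, 1),
            # Valid Percent: missing cases are excluded from the denominator.
            "valid_percent": round(100 * count / n_valid, 1),
            # Cumulative Percent: running share of the valid cases.
            "cumulative_percent": round(100 * cumulative / n_valid, 1),
        })
    return rows


table = frequency_table(degree_counts, n_missing)
for row in table:
    print(row)
```

The only difference between the Percent and Valid Percent columns is the denominator (1500 cases in total versus 1496 valid cases), which is why the two columns differ only slightly here, where just 4 cases are missing.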