2 Developed and maintained by the Center for Statistics and Analytical Services of Kennesaw State University
These reference manuals have been developed to assist students in the basics of statistical computing – sort of a “Statistical Computing for Dummies”. It is not our intention to use this manual to teach statistical concepts¹, but rather to demonstrate how to utilize previously taught statistical and data analysis concepts the way that professionals and practitioners apply them – through the able assistance of computing. Proficiency in software allows students to focus more on the interpretation of the output and on the application of results rather than on the mathematical computations.
We should pause here and strongly make the point that computers should serve as a medium of expediency of calculation – not as a substitute for the ability to execute a calculation.
In the Basic Concepts manual, we present statistical concepts, context for their use, and formulas where appropriate. We provide exercises to execute these concepts by hand. Then, in each subsequent manual, the concepts are applied in a consistent manner using each of the five major statistical computing packages – Excel, SPSS, Minitab, R and SAS.
1 Readers of this manual are assumed to have completed some introductory statistics course. For individuals wishing to review statistical concepts, we recommend Introduction to Stats by DeVeaux, Velleman and Bock.
SPSS was first developed in 1968 by social science researchers at Stanford University as a tool to help them with quantitative research. In fact, the acronym SPSS initially stood for “Statistical Package for the Social Sciences”. As with “IBM” and “AT&T”, the company (and its software) is simply known by its initials, in part as a testament to its diverse user base. Although the software is most heavily used in social science contexts – particularly in psychology, political science and in academia – it is also used in medicine, marketing, and many other contexts. SPSS is appealing to many users from less technical and/or mathematical disciplines because it has a particularly user-friendly interface consisting of an Excel-like spreadsheet for the data and menus and buttons for manipulations and analyses. Although this “point and click” interface makes SPSS particularly attractive for statistical computing novices, individuals who require greater statistical functionality may find the application limiting.
Between 2009 and 2010, the software was marketed as PASW (Predictive Analytics SoftWare) Statistics. The company announced on July 28, 2009 that it was being acquired by IBM. Versions 19.0 and 20.0 are named IBM SPSS Statistics. This document has been written using PASW version 18.0.
Prior to actually executing any of the statistical concepts from the Basic Concepts Manual, we first need to get the WidgeOne.xls dataset into the SPSS system and convert it into an SPSS file.
When you open SPSS you should see the following screen:
As shown above, there are two tabs in the new file: a Variable View tab and a Data View tab. The Data View tab will display the data much the same way as an Excel spreadsheet.
We must import the data from the Excel spreadsheet WidgeOne. Do this by clicking on File>Open>Data. Then click the computer icon>Computer>C$(\\Client)(V:):
Note that if you are accessing SPSS through Citrix, all of your drive names will change. For example, your “C:” drive will become your “V:” drive.
Make sure that the File type is set to .xls to find an Excel file.
Browse to where the WidgeOne file is located.
When you open it, you will get a dialogue box like this:
Make sure that you select the Plant_Survey worksheet.
The following window should appear:
This is one of two possible views of your dataset. This is the Variable View. Note at the bottom of the screen, the Variable View tab is highlighted. This view lists the variables in your dataset. In our case, the column names in the WidgeOne file were converted to variable names in this SPSS file. The qualitative variables (e.g., GENDER and PLANT) are called “string” variables and the quantitative variables (e.g., PRDCTY and YRONJOB) are called “numeric” variables.
For later displays it will be nice to create user-friendly labels for each of the values in these variables, instead of indicators like “D” for the Dallas plant. To create labels that will make our output easier to read, click on the Values cell in the PLANT row:
You will be prompted for a value and a label. In the Value box, enter the value that appears in the actual data that you want to read differently in the output (for example, “D”), and in the Label box enter the text you want displayed instead (“Dallas”):
Click the Add button. Next assign the label “Norcross” for the value “N” and click Add again. Click OK. Do this for the other string variables – Gender and Position. Please note that this is NOT affecting your actual data; it will only change the way that the output appears.
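The idea that value labels change only the display, not the stored data, can be sketched in a few lines of Python. This is a hedged illustration rather than anything SPSS does internally: the helper `display_value` and the three sample rows are made up for the example, and only the “D”/“N” codes and their labels come from the text above.

```python
# Hypothetical illustration: the raw codes stay in the data, and the labels
# are looked up only when output is displayed.
PLANT_LABELS = {"D": "Dallas", "N": "Norcross"}


def display_value(code, labels):
    """Return the label for a stored code, or the code itself if unlabeled."""
    return labels.get(code, code)


raw_plant = ["D", "N", "D"]  # stored data (made-up rows)
shown = [display_value(code, PLANT_LABELS) for code in raw_plant]

print(shown)      # what the output would display
print(raw_plant)  # the stored codes are unchanged
```

A code with no label simply falls through unchanged, which mirrors what you would see for any value you forgot to label.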
Go back to the Data View tab at the bottom of the screen…you will see the actual data²:
To expand the columns, simply place your cursor in between the column headers (variable names) and drag the column to its desired width – just like you would in Excel.
2 If you needed to create a new dataset from scratch, you would begin by defining your variables in the Variable View window and then return to the Data View window and input the values.
At this point, we could convert the other worksheets from the WidgeOne dataset into SPSS files. Each would be converted to a separate SPSS file. These files could be merged into one file using the Merge Files option in the Data Menu (not available in Student Version). However, since we will only be using the variables in the Plant_Survey worksheet for our statistical analyses, we will not execute a merge at this time.
SPSS is a menu-based system. Thus it is only a matter of finding what you want to do on the menus and customizing your request. For most computations, you should find SPSS to be easier than Excel. In order to find the two most predominant measures of central tendency (the mean and median) we start in the Analyze menu. Within that menu, choose Descriptive Statistics and Frequencies as shown:
Next you will see the following:
We need to choose the variables for which we are interested in finding the mean and median. We will choose only the quantitative variables (those with the “ruler” icon next to them): JOBGRADE, SOCREL, YRONJOB, PRDCTY and JOBSAT. We make these selections by clicking on the variable from the list on the left and then clicking on the right arrow button circled above to place it on the variable list on the right. Almost every option in SPSS has this type of interface for selecting variables for analysis. You can choose more than one variable at a time by holding the Ctrl key down as you make your selections.
Please make sure that the Display frequency tables option is UNTICKED. This will be more meaningful later.
After you have identified the variables for analysis, click on the Statistics option button as circled above. You should see this screen:
This should look pretty familiar. This is almost the same list of statistical information that was produced when we executed Tools>Data Analysis>Descriptive Statistics in Excel. Hmmm…that must mean that this stuff really is important.
For now, just tick Mean and Median and select Continue.
We obtain the following display containing the means and medians of our five variables in our SPSS Output window:
Statistics
              JOBGRADE   PRDCTY    SOCREL   YRONJOB   JOBSAT
N   Valid     40         40        40       40        40
    Missing   6          6         6        6         6
Mean          6.6000     84.5798   5.5000   8.2900    6.8500
Median        6.5000     84.8114   5.0000   8.3500    6.6000
Notice that these figures are consistent with what we had generated using Excel and what we had computed by hand.
Isn’t it nice when numbers match?
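For readers who like to see the arithmetic spelled out, here is a minimal Python sketch of what a “Valid / Missing / Mean / Median” summary involves: missing entries are set aside first, and the statistics are computed on the valid cases only. The seven scores are hypothetical (the WidgeOne values are not listed in this manual), and `describe` is a name invented for this example.

```python
from statistics import mean, median


def describe(values):
    """Summarize a column, treating None as a missing value."""
    valid = [v for v in values if v is not None]
    return {
        "valid": len(valid),
        "missing": len(values) - len(valid),
        "mean": mean(valid),
        "median": median(valid),
    }


# Hypothetical scores; None marks a case with no recorded value.
scores = [7, 5, None, 6, 8, None, 6]
summary = describe(scores)
print(summary)
```

Note that the two missing cases are counted but do not pull the mean or median toward zero, which is the same treatment reflected in the Valid and Missing rows above.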
What if we were only interested in a subset of the data? For example, what if we wanted to know the measurements of central tendency of these variables by gender and by plant?
We would select the Compare Means>Means option from the Analyze menu as shown:
You should see the following screen:
Typically, quantitative variables go in the Dependent List and qualitative variables go in the Independent List.
Choose the same five variables as before. Place these variables in the Dependent List. Then, place the variables Plant and Gender in the Independent List. This will enable us to better understand if there are differences between the genders and the plants across the quantitative variables like Productivity (PRDCTY).
Once the variable lists have been populated, select the Options button. From the list, identify that you want the Mean and the Median. Select Continue and OK.
You should see the following output:
Case Processing Summary
                                        Cases
                      Included          Excluded          Total
                      N     Percent     N     Percent     N     Percent
JOBGRADE * Plant      40    87.0%       6     13.0%       46    100.0%
PRDCTY * Plant        40    87.0%       6     13.0%       46    100.0%
SOCREL * Plant        40    87.0%       6     13.0%       46    100.0%
YRONJOB * Plant       40    87.0%       6     13.0%       46    100.0%
JOBSAT * Plant        40    87.0%       6     13.0%       46    100.0%
JOBGRADE * Gender     40    87.0%       6     13.0%       46    100.0%
PRDCTY * Gender       40    87.0%       6     13.0%       46    100.0%
SOCREL * Gender       40    87.0%       6     13.0%       46    100.0%
YRONJOB * Gender      40    87.0%       6     13.0%       46    100.0%
JOBSAT * Gender       40    87.0%       6     13.0%       46    100.0%
JOBGRADE PRDCTY SOCREL YRONJOB JOBSAT * Plant
Plant                        JOBGRADE   PRDCTY    SOCREL    YRONJOB   JOBSAT
Dallas     Mean              6.8696     88.3394   5.5217    8.1043    7.1478
           Std. Deviation    1.39167    6.31238   1.47308   4.66276   1.04745
Norcross   Mean              6.2353     79.4934   5.4706    8.5412    6.4471
           Std. Deviation    1.71499    5.07442   1.50489   3.76199   .85522
Total      Mean              6.6000     84.5798   5.5000    8.2900    6.8500
           Std. Deviation    1.54919    7.25633   1.46760   4.25657   1.02081
JOBGRADE PRDCTY SOCREL YRONJOB JOBSAT * Gender
Gender                       JOBGRADE   PRDCTY    SOCREL    YRONJOB   JOBSAT
Female     Mean              6.3000     83.9656   6.0000    8.1850    6.9800
           Std. Deviation    1.49032    7.84944   1.48678   4.95020   1.06306
Male       Mean              6.9000     85.1940   5.0000    8.3950    6.7200
           Std. Deviation    1.58612    6.75811   1.29777   3.55846   .98654
Total      Mean              6.6000     84.5798   5.5000    8.2900    6.8500
           Std. Deviation    1.54919    7.25633   1.46760   4.25657   1.02081
This output is much more explanatory than the first set of output. Look at the differences between the plants. Which plant is more productive? Which plant has a higher Job Satisfaction score? Now look at the differences between the genders. Which gender has a higher social relations score? Is there a difference in productivity between the genders?
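A quick way to convince yourself that the group means and the overall mean belong together is to recombine them by hand. The sketch below weights each plant's PRDCTY mean by its number of employees; the group sizes (23 for Dallas, 17 for Norcross) are the counts reported in the Plant frequency and crosstab tables elsewhere in this manual, and `pooled_mean` is simply a helper name chosen for this example.

```python
# (number of employees, mean PRDCTY) for each plant, taken from the tables
# in this manual.
groups = {"Dallas": (23, 88.3394), "Norcross": (17, 79.4934)}


def pooled_mean(groups):
    """Recombine group means into an overall mean, weighting by group size."""
    total_n = sum(n for n, _ in groups.values())
    total_sum = sum(n * group_mean for n, group_mean in groups.values())
    return total_sum / total_n


overall = pooled_mean(groups)
print(round(overall, 3))
```

The result should agree with the Total row of the Plant table up to rounding, because an overall mean is just a size-weighted average of the group means.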
Sometimes looking at an average by itself is misleading. For example…let’s assume that a friend of yours just read an article about lung cancer. He goes on to tell you that “1% of all Americans will die of lung cancer”. I should probably mention that your friend is a member of the great statistical unwashed. Does this mean that you have a 1% chance of dying of lung cancer? Of course not. It depends upon a lot of things…like…do you smoke? If you re-evaluate that number by smokers/non-smokers, the values are very different. That’s the point…averages are very misleading. You need to look at the average (or median) by different groupings to better understand the “rest of the story”.
As a rule, we do not use the mode as a Measurement of Central Tendency with quantitative data. If the data is qualitative – Plant, Gender, Position – it is the ONLY Measurement of Central Tendency available. We can determine the mode of variables such as these by selecting Analyze>Descriptive Statistics>Frequencies again. This time choose the qualitative variables – Plant, Gender and Position. Check the box next to Display frequency tables. Then click OK.
You will see the following frequency tables from which it is easy to determine if there is a modal value (isn’t this easier than what we had to go through in Excel?):
Plant
                    Frequency   Percent   Valid Percent   Cumulative Percent
Valid               6           13.0      13.0            13.0
        Dallas      23          50.0      50.0            63.0
        Norcross    17          37.0      37.0            100.0
        Total       46          100.0     100.0
Gender
                    Frequency   Percent   Valid Percent   Cumulative Percent
Valid               6           13.0      13.0            13.0
        Female      20          43.5      43.5            56.5
        Male        20          43.5      43.5            100.0
        Total       46          100.0     100.0

POSITION
                      Frequency   Percent   Valid Percent   Cumulative Percent
Valid                 6           13.0      13.0            13.0
        Hourly        20          43.5      43.5            56.5
        Management    20          43.5      43.5            100.0
        Total         46          100.0     100.0
You can also see here that we are reaping the benefit of changing the labels earlier – the full name is much easier to interpret and understand.
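Reading the mode off a frequency table amounts to finding the category with the largest count. As a small hedged sketch (the function name `modes` is ours, and only the labeled categories from the tables above are included), the following Python returns every category tied for the highest frequency, so a tie shows up as more than one answer.

```python
def modes(counts):
    """Return all categories that share the highest frequency."""
    top = max(counts.values())
    return sorted(label for label, n in counts.items() if n == top)


# Frequencies for the labeled categories in the tables above.
plant_counts = {"Dallas": 23, "Norcross": 17}
gender_counts = {"Female": 20, "Male": 20}

plant_modes = modes(plant_counts)
gender_modes = modes(gender_counts)
print(plant_modes)   # a single modal category
print(gender_modes)  # a tie, so there is no single mode
```

Returning a list rather than one value is a deliberate choice here: it makes the “is there a modal value?” question from the text easy to answer by checking whether the list has exactly one entry.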
To represent the dispersion of a quantitative variable (Measurements of Dispersion are not relevant for qualitative variables), we typically report the standard deviation. To do this in SPSS, return to the Analyze menu. Select Descriptive Statistics>Frequencies and select the quantitative variables as before. Turn off the display for frequency tables and click on the Statistics button. Select Standard Deviation. Click Continue and then OK.
You should see the following output:
Statistics
                  JOBGRADE   PRDCTY    SOCREL    YRONJOB   JOBSAT
N   Valid         40         40        40        40        40
    Missing       6          6         6         6         6
Std. Deviation    1.54919    7.25633   1.46760   4.25657   1.02081
We could obviously have included lots of statistics in our analysis simply by choosing the ones we want from the Statistics screen.
The second Measurement of Dispersion discussed in the Basic Concepts Manual was the frequency table. To execute a basic frequency table for a qualitative variable, go to Analyze> Descriptive Statistics>Frequencies. Select the qualitative variables for analysis. Ensure that the Display frequency tables box is ticked at the bottom of the page. Click OK.
You should see the following output:
Plant
                    Frequency   Percent   Valid Percent   Cumulative Percent
Valid               6           13.0      13.0            13.0
        Dallas      23          50.0      50.0            63.0
        Norcross    17          37.0      37.0            100.0
        Total       46          100.0     100.0

Gender
                    Frequency   Percent   Valid Percent   Cumulative Percent
Valid               6           13.0      13.0            13.0
        Female      20          43.5      43.5            56.5
        Male        20          43.5      43.5            100.0
        Total       46          100.0     100.0
POSITION
                      Frequency   Percent   Valid Percent   Cumulative Percent
Valid                 6           13.0      13.0            13.0
        Hourly        20          43.5      43.5            56.5
        Management    20          43.5      43.5            100.0
        Total         46          100.0     100.0
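The Percent and Cumulative Percent columns are simple to reproduce from the raw counts, which is a useful check when you want to be sure you are reading the table correctly. The sketch below is a minimal version under two assumptions of ours: the unlabeled first row of the Plant table is named “(blank)” purely so it has a key, and percentages are rounded to one decimal place as in the output.

```python
def frequency_table(counts):
    """Return (label, count, percent, cumulative percent) rows."""
    total = sum(counts.values())
    rows, running = [], 0
    for label, n in counts.items():
        running += n
        rows.append((label, n,
                     round(100 * n / total, 1),
                     round(100 * running / total, 1)))
    return rows


# Counts from the Plant table; "(blank)" is our own name for the
# unlabeled first row.
plant = {"(blank)": 6, "Dallas": 23, "Norcross": 17}
plant_rows = frequency_table(plant)
for row in plant_rows:
    print(row)
```

The cumulative column is computed from the running count rather than by adding rounded percentages, which avoids small rounding drift in longer tables.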
In the previous manuals, we explained how to categorize a quantitative variable into a qualitative variable. For example, when we created a frequency table for the job tenure variable, we created three categories: < 5 years, 5-10 years and more than 10 years. To create these same categories in SPSS, we need to recode our YRONJOB variable into a new variable called JOBTEN.
To do this, go to the Transform menu and choose the option Recode into Different Variables:
You should see the following:
Click on the Old and New Variables button.
You should now see the following screen:
Identify the name of the first category
Identify the range of values for the first category.
Tick this box to tell SPSS that you are creating a qualitative variable
First we define the category “New”. In the screen above, you must indicate that the Range of this new value is from 0 to 4.9 (we wanted values less than 5 and the data had only one decimal place of accuracy). Check in the box that specifies that the new output variable will be of type “String”. We also name the new values “New”.
Click on the Add button to add this new output value.
These actions will produce the following:
Note that the values of YRONJOB between 0 and 4.9 will represent the category “New” in the new variable.
Continue this same process creating the category “Experienced” (5-10 years on the job) and the category “Mature” (10+ years on the job). Note: since the value “Experienced” has 11 characters, change the Width from 8 to 11.
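The recode rules above can be written out as a small function, which makes the category boundaries explicit. This is a sketch under one stated assumption: the text gives “5-10 years” and “10+ years”, which overlap at exactly 10, so here a value of exactly 10 is placed in “Experienced”. The function name `recode_tenure` is hypothetical.

```python
def recode_tenure(years_on_job):
    """Map years on the job to a job tenure category.

    Assumption: a value of exactly 10 counts as "Experienced"; the manual's
    ranges (5-10 and 10+) leave that boundary open.
    """
    if years_on_job < 5:
        return "New"
    if years_on_job <= 10:
        return "Experienced"
    return "Mature"


for years in (0.1, 4.9, 5.0, 10.0, 17.0):
    print(years, recode_tenure(years))

# The longest label sets the string width the new variable needs.
longest = max(len(label) for label in ("New", "Experienced", "Mature"))
print(longest)
```

If your own ranges in the Old and New Values screen overlap, it is worth deciding up front which category owns the boundary value, exactly as the function above is forced to do.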
After you have completed this process, click on Continue.
You should see the following screen:
The “Name” is what will appear in the dataset. The “Label” is what will appear in the output. Select Change and then select OK.
You should see the following screen:
Now we can easily generate a frequency table for the new variable JOBTEN. As before, go to Analyze>Descriptive Statistics>Frequencies. Ensure that the frequency table option is ticked and select your new JOBTEN variable:
Job Tenure
                       Frequency   Percent   Valid Percent   Cumulative Percent
Valid   Experienced    16          40.0      40.0            40.0
        Mature         15          37.5      37.5            77.5
        New            9           22.5      22.5            100.0
        Total          40          100.0     100.0
Well Done!
For professional presentations or for formal documents, we recommend the use of a graphics package (e.g. Microsoft PowerPoint). However, SPSS has some nice graphs available in the Graphs menu, which can be used less formally. In addition, it is very useful to develop graphics for your own purposes, because it enables you to see things about your data that you might not have otherwise seen.
As with Excel, let’s begin with a Histogram. We will also execute a Stem and Leaf plot, which we were not able to do with Excel.
To create a Histogram of the YRONJOB variable, select Analyze>Descriptive Statistics>Explore:
Assign YRONJOB to the Dependent list. Select the Plots button:
Tick the Stem and Leaf and Histogram options.
Click Continue. On the Explore dialogue box, make sure that the “Both” option is selected for the Display. Click OK.
This set of executions will generate the following output:
Descriptives
                                                        Statistic   Std. Error
YRONJOB   Mean                                          8.2900      .67302
          95% Confidence Interval    Lower Bound        6.9287
          for Mean                   Upper Bound        9.6513
          5% Trimmed Mean                               8.2917
          Median                                        8.3500
          Variance                                      18.118
          Std. Deviation                                4.25657
          Minimum                                       .10
          Maximum                                       17.00
          Range                                         16.90
          Interquartile Range                           6.10
          Skewness                                      -.081       .374
          Kurtosis                                      -.748       .733
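Several entries in the Descriptives output are tied together by simple formulas, and checking them by hand is a good way to remember what each one means. The sketch below starts from the reported mean, standard deviation, minimum, maximum and N = 40 for YRONJOB and recomputes the variance, the standard error of the mean, the range, and the 95% confidence interval. The critical value 2.0227 is our assumption for the two-sided 95% t value with 39 degrees of freedom; it is not printed in the SPSS output.

```python
import math

# Values reported for YRONJOB in the Descriptives output.
n = 40
mean_years = 8.2900
std_dev = 4.25657
minimum, maximum = 0.10, 17.00

variance = std_dev ** 2               # variance is the squared std. deviation
std_error = std_dev / math.sqrt(n)    # standard error of the mean
value_range = maximum - minimum       # range = maximum - minimum

# Assumed two-sided 95% t critical value for n - 1 = 39 degrees of freedom.
T_CRIT = 2.0227
lower = mean_years - T_CRIT * std_error
upper = mean_years + T_CRIT * std_error

print(round(variance, 3), round(std_error, 5), round(value_range, 2))
print(round(lower, 4), round(upper, 4))
```

The recomputed figures should line up with the Variance, Std. Error, Range and confidence bound rows above, give or take rounding in the last digit.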
YRONJOB Stem-and-Leaf Plot
Frequency Stem & Leaf
2.00 0 . 01
5.00 0 . 22233
5.00 0 . 44555
4.00 0 . 6777
8.00 0 . 88888999
7.00 1 . 0000111
4.00 1 . 2333
4.00 1 . 4445
1.00 1 . 7
Stem width: 10.00
Each leaf: 1 case(s)
Here is the Stem and Leaf plot. If you imagine rotating this graphic counterclockwise 90 degrees, it is basically a Histogram. The plot tells us that each stem has a width of 10.00. This means that the stem values should be interpreted in units of 10, with each leaf giving the next digit.
Let’s start in the middle with the frequency of 7.00. Here, we have four values that are 10.x and three more values that are 11.x. The next line indicates a frequency of 4.00. In the dataset, we have an observation that is 12.x and three observations that are 13.x. The greatest observation is 17.
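To see how a display like this is put together, here is a small Python sketch that rebuilds the plot rows. It assumes a stem width of 10 and five lines per stem (leaves 0-1, 2-3, 4-5, 6-7, 8-9), which is the layout visible in the plot. The list `YEARS` holds the whole-year part of each of the 40 observations as read off the leaves above; the decimal parts are not shown in the plot, so they are not used. The function name `stem_and_leaf` is ours.

```python
from collections import defaultdict

# Whole years on the job, read off the leaves of the plot above
# (decimals are dropped, as they are in the plot itself).
YEARS = [0, 1, 2, 2, 2, 3, 3, 4, 4, 5, 5, 5, 6, 7, 7, 7,
         8, 8, 8, 8, 8, 9, 9, 9, 10, 10, 10, 10, 11, 11, 11,
         12, 13, 13, 13, 14, 14, 14, 15, 17]


def stem_and_leaf(values, lines_per_stem=5):
    """Group values into (frequency, stem, leaves) rows; stem width is 10."""
    leaves_per_line = 10 // lines_per_stem      # two leaf digits per line
    rows = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(int(value), 10)     # tens digit, ones digit
        rows[(stem, leaf // leaves_per_line)].append(str(leaf))
    return [(len(leaves), stem, "".join(leaves))
            for (stem, _line), leaves in sorted(rows.items())]


plot_rows = stem_and_leaf(YEARS)
for frequency, stem, leaves in plot_rows:
    print(f"{frequency:5.2f}  {stem} . {leaves}")
```

Each printed row pairs a frequency with a stem and its leaves, so the row for stem 1 with leaves 0000111 is the “frequency of 7.00” line discussed above.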
This is a boxplot. Here, the center line is the median. The “box” is the Interquartile range – the high end of the box is the 75th percentile and the low end is the 25th percentile. The “whiskers” that extend in either direction reach the smallest and largest observations that are not outliers; since there are no outliers here, they tell us the full range of the data. If there were any “outliers” (defined as observations more than 1.5*IQR beyond either end of the box), they would be identified here as separate points.
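The 1.5*IQR rule is easy to try out on a small example. The sketch below uses nine made-up values (not the WidgeOne data) and Python's `statistics.quantiles` to get the quartiles; SPSS may compute the box edges with a slightly different quartile method, so treat this as an illustration of the rule rather than a reproduction of the SPSS boxplot.

```python
from statistics import quantiles

# Hypothetical data with one unusually large value.
data = [2, 4, 5, 6, 7, 8, 9, 11, 30]

q1, q2, q3 = quantiles(data, n=4)   # quartiles (default "exclusive" method)
iqr = q3 - q1

# Fences sit 1.5 * IQR beyond each end of the box.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [x for x in data if x < lower_fence or x > upper_fence]
print(q1, q3, iqr)
print(lower_fence, upper_fence, outliers)
```

Notice that the fences are measured from the quartiles, not from the mean or the median, so a skewed variable can have outliers on one side only.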
Lots and lots of output…with relatively little work. That’s what I’m talking about!
To replicate the pie chart developed in the Basic Concepts Manual, go to Graphs>Legacy Dialogs>Pie:
You should see this screen:
What we’re trying to communicate with the pie chart is the proportions of the different groups, so select Summaries for groups of cases and then click Define.
You should see the following screen:
Select % of cases – we are communicating proportions, after all
Drag the qualitative variable of interest in this box.
The only other thing we really need to do is to give our Pie Chart a title. So, click on the Titles tab and title the chart “Job Tenure of WidgeOne Employees”. Feel free to explore the other tabs.
You should have generated the following:
To insert the percentages, simply double click on the graph to see this:
Click show data labels and then close the properties window and the chart editor to obtain the following graph:
Let’s say that you wanted to understand how the overall productivity of the company was allocated by plant – what percentage of the productivity comes from Dallas versus Norcross. This is easy to do in a Pie Chart in SPSS. Go back to Graphs>Legacy Dialogs>Pie.
This time, select “sum of variable” as the slice, select the PRDCTY as the variable to sum by, and then (since we want the percentages) select % of cases.
Don’t forget to change the title and add in the percentages by double clicking the pie chart.
You should see the following Pie Chart:
This pie chart now provides information regarding the percentage of WidgeOne’s productivity by plant (Norcross needs to step it up).
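If you want to sanity-check the slices, each plant's share of total productivity can be estimated from numbers already shown in this manual. The sketch below multiplies each plant's mean PRDCTY by its number of employees to recover the plant's summed productivity, then converts the sums to percentages. These figures are derived from the earlier means table and frequency counts, not read from the chart itself, so small differences from the chart labels are possible.

```python
# (number of employees, mean PRDCTY) per plant, from the tables in this manual.
plants = {"Dallas": (23, 88.3394), "Norcross": (17, 79.4934)}

# Sum of PRDCTY within a plant = number of employees * mean PRDCTY.
totals = {plant: n * mean_prd for plant, (n, mean_prd) in plants.items()}
grand_total = sum(totals.values())

# Each slice of the pie is the plant's share of the grand total.
shares = {plant: 100 * total / grand_total for plant, total in totals.items()}
for plant, share in shares.items():
    print(f"{plant}: {share:.1f}%")
```

Because Dallas has both more employees and a higher mean, its slice is larger on both counts, which is worth keeping in mind before concluding anything about individual productivity.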
The next univariate visualization tool is a Bar chart. This is done in a very similar way to the Pie Chart.
Select Graphs>Legacy Dialogs>Bar. You should see the following screen:
Again, we’re comparing groups and we would want the information summarized by group, so click on summaries for groups of cases and then define.
Change this to counts (n of cases).
Drag the variable of interest into the category axis
As before, change the title to something meaningful. You should generate something like this:
This could be better. Double click on the graph to bring up the chart editor.
Click on Transpose Chart Coordinate System to generate a horizontal bar chart
Click on Data Label Mode to insert the counts
Click on the labels. This brings up a window where you can change the order of the groups from alphabetical to something that makes more sense
We should probably note at this point that if the definitions that you assigned when you transformed the quantitative variable into a qualitative variable are not universally known, you should include a legend or key at the bottom of your graphic to ensure that the reader understands the definition of “New” and “Mature”.
Contingency tables, Stacked Bar Charts, 100% Stacked Bar Charts and Scatter Plots can be easily generated in SPSS.
To reproduce the Contingency Tables that were created in earlier manuals that included the variables Plant and Gender, select Analyze>Descriptive Statistics>Crosstabs:
Place the Plant variable in the Row position.
Place the Gender variable in the Column position.
As with Excel Pivot Tables, Crosstabs in SPSS are very flexible. If you wish to include more than just the frequency counts in the cells of your table, click on Cells. You will see the following window:
In the percentages section, select Row, Column and Total. Click Continue and then OK.
Wow…look how much output was created in a single table! That was so much easier than Excel! The output table contains the conditional probabilities described in the Basic Concepts Manual. In the first “cell” – the intersection of Female and Dallas – we have four pieces of information. We know that there are 13 women who work in Dallas. We know that of all of the Dallas employees, 56.5% are female. We know that of all of the women, 65.0% are in Dallas. Finally, we know that of all employees, 32.5% are females in Dallas.
Plant * Gender Crosstabulation
                                   Gender
Plant                              Female    Male      Total
Dallas     Count                   13        10        23
           % within Plant          56.5%     43.5%     100.0%
           % within Gender         65.0%     50.0%     57.5%
           % of Total              32.5%     25.0%     57.5%
Norcross   Count                   7         10        17
           % within Plant          41.2%     58.8%     100.0%
           % within Gender         35.0%     50.0%     42.5%
           % of Total              17.5%     25.0%     42.5%
Total      Count                   20        20        40
           % within Plant          50.0%     50.0%     100.0%
           % within Gender         100.0%    100.0%    100.0%
           % of Total              50.0%     50.0%     100.0%
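The three percentages in each cell differ only in what the cell count is divided by: the row total, the column total, or the grand total. A short Python sketch, using the four counts from the crosstabulation above and a helper we have named `cell_percentages`, makes that explicit.

```python
# Cell counts from the Plant * Gender crosstabulation.
counts = {
    "Dallas": {"Female": 13, "Male": 10},
    "Norcross": {"Female": 7, "Male": 10},
}


def cell_percentages(counts, plant, gender):
    """Return the count and the row, column and total percentages for a cell."""
    cell = counts[plant][gender]
    row_total = sum(counts[plant].values())
    column_total = sum(row[gender] for row in counts.values())
    grand_total = sum(sum(row.values()) for row in counts.values())
    return {
        "count": cell,
        "pct_within_plant": round(100 * cell / row_total, 1),
        "pct_within_gender": round(100 * cell / column_total, 1),
        "pct_of_total": round(100 * cell / grand_total, 1),
    }


dallas_female = cell_percentages(counts, "Dallas", "Female")
print(dallas_female)
```

Swapping in another plant and gender reproduces the other cells, and comparing “% within Plant” with “% within Gender” for the same cell is a good reminder that conditional probabilities depend on which group you condition on.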
If you need to subset this information further (e.g. by Job Tenure), there is an easy way to do that. Go back to the Analyze>Descriptive Statistics>Crosstabs screen. Press Reset to return to the default settings.
Make your selections of the three variables as follows:
Click OK.
This time, the table will only show the cell counts (we could have included the percentages as before by following the same steps in the Cell Display screen):
Plant * Gender * Job Tenure Crosstabulation
Count
                                     Gender
Job Tenure                           Female    Male    Total
Experienced   Plant   Dallas         3         5       8
                      Norcross       3         5       8
              Total                  6         10      16
Mature        Plant   Dallas         6         3       9
                      Norcross       2         4       6
              Total                  8         7       15
New           Plant   Dallas         4         2       6
                      Norcross       2         1       3
              Total                  6         3       9
Notice that the same information on Plant and Gender counts has now been provided by each level of Job Tenure – Experienced, Mature and New (the levels are reported in alphabetical order rather than by order of magnitude). Cool.
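One way to check a layered table is to add the layers back together and confirm that the original two-way table reappears. The sketch below does this in Python. The cell counts are entered as we read them from the layered output above, so treat them as our transcription; the helper name `collapse_layers` is hypothetical.

```python
from collections import Counter

# Plant x Gender counts within each Job Tenure level, as read from the
# layered crosstabulation above.
layers = {
    "Experienced": {("Dallas", "Female"): 3, ("Dallas", "Male"): 5,
                    ("Norcross", "Female"): 3, ("Norcross", "Male"): 5},
    "Mature": {("Dallas", "Female"): 6, ("Dallas", "Male"): 3,
               ("Norcross", "Female"): 2, ("Norcross", "Male"): 4},
    "New": {("Dallas", "Female"): 4, ("Dallas", "Male"): 2,
            ("Norcross", "Female"): 2, ("Norcross", "Male"): 1},
}


def collapse_layers(layers):
    """Add the layer tables together to recover the two-way table."""
    combined = Counter()
    for cells in layers.values():
        combined.update(cells)
    return dict(combined)


two_way = collapse_layers(layers)
layer_totals = {level: sum(cells.values()) for level, cells in layers.items()}
print(two_way)
print(layer_totals)
```

The collapsed counts should match the Plant * Gender crosstabulation, and the layer totals should match the Job Tenure frequency table, which is a handy consistency check whenever you subset a table.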
Stacked Bar Charts can be generated in SPSS using the same basic executions that you did for simple Bar Charts in the previous section. Select Graphs>Legacy Dialogs>Bar:
Select Stacked and Summaries for groups of cases, click Define, and you should then see the following screen:
Change the Title.
Place the Plant variable here.
Place the Gender variable here.
Select OK.
You should see the following Stacked Bar Chart:
Because these groups are of different sizes, it might be better to plot this information in a 100% Stacked Bar Chart instead.
To do this, double click on the graph:
Click the 100% stacked option.
You should see the following graphic:
The last multivariate visualization technique is the Scatter Plot. Again, SPSS provides us with flexibility to subset our analysis if needed.
What variables might have a relationship? What about Productivity and Job Satisfaction? A Scatter Plot of these variables can be generated by selecting Graphs>Legacy Dialogs>Scatter/Dot. Choose the Simple Scatter plot and then click Define:
Don’t forget to change the title! You should see the following graphic:
So…what do you think? It appears that there might be a positive relationship between the two variables, because the graphic roughly moves in a linear fashion from the SW corner to the NE corner.
Like the other software applications, SPSS will generate random numbers using the internal clock in the computer. As a result, every time a command is given to SPSS to generate some set of random numbers, a different set of random numbers will be generated. However, sometimes we may need to replicate a set of random numbers exactly the way they were previously generated. To accomplish this replication, SPSS allows the analyst to define a “seed” number that will ensure a consistent set of random numbers – the numbers are still random and can be used to ensure statistical independence of samples.
If you need to set the “seed” number so you can replicate your results, go to the Transform menu and choose the Random Number Generators option. You should see the following screen:
This system is set to have a Starting Point of 1234567. This starting point is referred to as a “seed”. You can set the starting point value prior to each analysis that uses the random numbers. The value must be a positive integer.
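The same starting point can be set with a one-line command in a syntax window. This is a sketch under the assumption that you want the seed value 1234567 shown above. Which command applies depends on the generator that is active in your installation, so run only one of the two SET commands.

```spss
* Option 1: fix the seed for the SPSS 12 compatible generator.
SET SEED=1234567.

* Option 2: select the Mersenne Twister generator and fix its starting point instead.
SET RNG=MT MTINDEX=1234567.
```

Run the SET command immediately before the step that draws random numbers. Repeating both steps in the same order should then reproduce the same values.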
To create a column of random numbers that are uniformly distributed between 0 and 1, go to the Transform menu and choose Compute Variable. We will call the new random number variable Group, as shown in the screen below. Look at the Function Group menu and select Random Numbers. A long list will then appear in the Functions and Special Variables menu. This is a list of distributions that you could use to generate the new random variable Group.
This time, double-click on Rv.Uniform:
Every distribution has parameters that must be specified. For the uniform distribution, the only parameters are the two values between which we want our random numbers to fall. The “?”s in the expression RV.UNIFORM(?,?), which appears in the Numeric Expression box, are asking you to fill in these two values. Change this expression to read RV.UNIFORM(0,1), so the random numbers will fall between 0 and 1 (as they did in Excel). Click OK. The new variable Group should appear in your Data View. Here is what a typical result would look like:
Remember that your results will vary since this variable was randomly generated.
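The Compute Variable dialog above corresponds to a short piece of syntax. This sketch uses the variable name Group and the expression RV.UNIFORM(0,1) exactly as entered in the dialog.

```spss
* Create the variable Group holding one uniform draw between 0 and 1 for each case.
COMPUTE Group=RV.UNIFORM(0,1).
EXECUTE.
```

EXECUTE forces the pending transformation to run so that the new values show up in the Data View right away.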
One of the primary reasons for generating random numbers is to assign observations into statistically independent groups. Using the random numbers, let’s assign the 40 observations into 2 groups – a test group and a control group.
Just as we did in Section 1, select Transform>Recode into Different Variables. Select the new variable Group to be transformed. Click on Old and New Values. Set it up so that the values from 0 to .5 are put into the Control group and the values from .5 to 1 are put into the Test group:
Click on Continue. Give the new variable a name like “Assignment” and then click OK.
You should see the following on your Data View:
Now, you have two groups of randomly assigned employees. This is a very important concept in Statistical Testing.
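The recode step can be sketched in syntax as follows. The numeric codes 1 and 2 and their value labels are assumptions for illustration; the dialog could equally be set up to store other values for the two groups.

```spss
* Recode the random numbers in Group into a two-level variable named Assignment.
* Ranges are checked from left to right, so a value of exactly .5 lands in group 1.
* The codes 1 and 2 and their labels are illustrative assumptions.
RECODE Group (0 THRU 0.5=1) (0.5 THRU 1=2) INTO Assignment.
VALUE LABELS Assignment 1 'Control' 2 'Test'.
EXECUTE.
```

Because each case is assigned independently, the two groups will usually be close to, but not exactly, 20 cases each.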
Because the process of selecting a random sample from a set of data is so common, there is a very straightforward way to accomplish this in SPSS. Suppose we wish to select a simple random sample of 30 individuals from this dataset. Select Data>Select Cases>Random Sample of Cases>Sample:
You could choose to sample an approximate percentage of the cases or exactly 30 cases from the first 40 cases. Do the latter. Click on Continue and then OK.
Your Data view will now look like this:
Note that there is a new variable in the list, filter_$. It takes the value 1 for cases selected for the random sample and the value 0 for all others. Cases not selected for the sample are now slashed in the first column.
Remember that samples will differ unless the same seed is used to generate them.
At this point, you can execute all of your analyses as before, but only those cases with filter_$ = 1 will be analyzed. You can go back to all cases by selecting Data>Select Cases>All Cases.
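Sampling can also be done in syntax. The sketch below is a simpler alternative to what the Select Cases dialog generates: it does not create the filter_$ variable, and the sample applies only to the single procedure that follows TEMPORARY. The FREQUENCIES command is just an illustrative analysis to run on the sample.

```spss
* Draw exactly 30 cases from the first 40, for the next procedure only.
TEMPORARY.
SAMPLE 30 FROM 40.
FREQUENCIES
  VARIABLES=JOBSAT
  /FORMAT=NOTABLE
  /STATISTICS=MEAN MEDIAN.

* Return to all cases after filtering through the Select Cases dialog.
FILTER OFF.
USE ALL.
EXECUTE.
```

Without TEMPORARY, SAMPLE removes the unselected cases from the active dataset, so keep it in place unless that is what you intend.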
Generating confidence intervals in SPSS is very easy. For example, if we wish to compute a 95% confidence interval for the mean Job Satisfaction rating of all employees, we would go to the Analyze menu, choose Compare Means and then choose One-Sample T Test (see footnote 3). Once the Job Satisfaction variable has been assigned, select Options and ensure that the confidence interval will be generated at the 95% confidence level:
3 t-tests are very common tests used to determine whether two sample means differ significantly or whether one sample mean differs from some established value. For more detailed information on t-tests, we suggest An Introduction to Statistical Methods and Data Analysis by Ott and Longnecker.
Click Continue and then OK. You will see the following output:
One-Sample Test
Test Value = 0

                                                         95% Confidence Interval
                                                         of the Difference
         t       df   Sig. (2-tailed)  Mean Difference   Lower     Upper
JOBSAT   42.440  39   .000             6.85000           6.5235    7.1765
As stated previously in The Basic Concepts Manual, these results would be reported as:
“Based on a representative sample of 40 employees, we are 95% confident that the mean job satisfaction among all employees is between 6.52 and 7.18.”
This means that the interval was produced by a method that captures the “true” mean job satisfaction of all employees, which is unknown, in 95% of repeated samples. It also means that about 5% of intervals built this way will miss the true mean. If this interval is one of them, the true mean job satisfaction lies outside of this range (< 6.52 or > 7.18).
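The menu steps for the confidence interval can be written as syntax as well. This is a sketch of the kind of command the One-Sample T Test dialog produces, using the JOBSAT variable and the test value of 0 shown in the output; the exact keywords pasted by your version of SPSS may differ slightly.

```spss
* One-sample t test against a test value of 0 with a 95 percent confidence interval.
T-TEST
  /TESTVAL=0
  /MISSING=ANALYSIS
  /VARIABLES=JOBSAT
  /CRITERIA=CI(.95).
```

The output can be checked by hand. Because the test value is 0, the Mean Difference of 6.85 is simply the sample mean. Dividing it by t gives the standard error, 6.85 / 42.440 ≈ 0.1614. The critical t value for 39 degrees of freedom at 95% confidence is approximately 2.023, so the margin of error is about 2.023 × 0.1614 ≈ 0.3265. The interval is then 6.85 ± 0.3265, which matches the reported limits of 6.5235 and 7.1765.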
What is a “Lagniappe”? This word derives from New World Spanish la ñapa, “the gift”. The word came into the Creole dialect of New Orleans and there acquired a French spelling. It is still used in the Gulf States, especially southern Louisiana, to denote a little bonus that a friendly shopkeeper might add to a purchase.
Our lagniappe for our readers includes the extra and interesting things that we have learned to do with these software packages that might not be easily found or well known. A little extra information at no extra cost!
You may have noticed that in your output screen in SPSS, you have been generating what is called “Syntax”. For example, when you executed the measurements of central tendency for the quantitative variables, SPSS wrote the following syntax:
FREQUENCIES
  VARIABLES=JOBGRADE PRDCTY SOCREL YRONJOB JOBSAT
  /FORMAT=NOTABLE
  /STATISTICS=MEAN MEDIAN
  /ORDER=ANALYSIS.
Did you notice that? Probably not. Most people use SPSS because they don’t have to write code to have the software do what they want. However, in the event that you find the “point-and-click” environment of SPSS too restricting, know that you always have the option to write custom syntax to have SPSS do exactly what you want.
To run syntax in SPSS, select File>New>Syntax. In the blank syntax screen, type (or paste) the syntax above. Then select Run>All. You will generate the same output as before! Cool!