SPSS Manual - Department of Statistics and Analytical Sciences

SPSS Reference Manual:

A gentle overview

Table of Contents

1. Introduction to SPSS

2. Data Analysis and Statistical Concepts

Concept 1 – Measurements of Central Tendency

Concept 2 – Measurements of Dispersion

Concept 3 – Visualization of Univariate Data

Concept 4 – Visualization of Multivariate Data

Concept 5 – Random Number Generation And Simple Sampling

Concept 6 – Confidence Intervals

Developed and maintained by the Center for Statistics and Analytical Services of Kennesaw State University

These reference manuals have been developed to assist students in the basics of statistical computing, sort of a “Statistical Computing for Dummies”. It is not our intention to use this manual to teach statistical concepts (see footnote 1), but rather to demonstrate how to utilize previously taught statistical and data analysis concepts the way that professionals and practitioners apply them: through the able assistance of computing. Proficiency in software allows students to focus more on the interpretation of the output and on the application of results rather than on the mathematical computations.

We should pause here and strongly make the point that computers should serve as a medium of expediency of calculation, not as a substitution for the ability to execute a calculation.

In the Basic Concepts manual, we present statistical concepts, context for their use, and formulas where appropriate. We provide exercises to execute these concepts by hand. Then, in each subsequent manual, the concepts are applied in a consistent manner using each of the five major statistical computing packages: Excel, SPSS, Minitab, R and SAS.

Footnote 1: Readers of this manual are assumed to have completed some introductory statistics course. For individuals wishing to review statistical concepts, we recommend Introduction to Stats by DeVeaux, Velleman and Bock.


SPSS

What is SPSS?

SPSS was first developed in 1968 by social science researchers at Stanford University as a tool to help them with quantitative research. In fact, the acronym SPSS initially stood for “Statistical Package for the Social Sciences”. As with “IBM” and “AT&T”, the company (and its software) is simply known by its initials, in part as a testament to its diverse user base. Although the software is most heavily used in social science contexts (particularly in psychology, political science and in academia), it is also used in medicine, marketing, and many other contexts. SPSS is appealing to many users from less technical and/or mathematical disciplines because it has a particularly user-friendly interface consisting of an Excel-like spreadsheet for the data and menus and buttons for manipulations and analyses. Although this “point and click” interface makes SPSS particularly attractive for statistical computing novices, individuals who require greater statistical functionality may find the application limiting.

Between 2009 and 2010, the premier SPSS product was called PASW (Predictive Analytics SoftWare) Statistics. The company announced on July 28, 2009 that it was being acquired by IBM. Versions 19.0 and 20.0 are named IBM SPSS Statistics. This document has been written using PASW version 18.0.


Getting data into SPSS

Prior to actually executing any of the statistical concepts from the Basic Concepts Manual, we first need to get the WidgeOne.xls dataset into the SPSS system and convert it into an SPSS file.

When you open SPSS you should see the following screen:


As shown above, there are two tabs in the new file: a Variable View tab and a Data View tab. The Data View tab will display the data much the same way as an Excel spreadsheet.

We must import the data from the Excel spreadsheet WidgeOne. Do this by clicking on File>Open>Data. Then click the computer icon>Computer>C$(\\Client)(V:):

Note that if you are accessing SPSS through Citrix, all of your drive names will change. For example, your “C:” drive will become your “V:” drive.

Make sure that the File type is set to .xls to find an Excel file.

Browse to where the WidgeOne file is located.

When you open it, you will get a dialogue box like this:


Make sure that you select the Plant_Survey worksheet.


The following window should appear:

This is one of two possible views of your dataset. This is the Variable View. Note at the bottom of the screen, the Variable View tab is highlighted. This view lists the variables in your dataset. In our case, the column names in the WidgeOne file were converted to variable names in this SPSS file. The qualitative variables (e.g., GENDER and PLANT) are called “string” variables and the quantitative variables (e.g., PRDCTY and YRONJOB) are called “numeric” variables.


For later displays it will be nice to create user-friendly labels for each of the values in these variables, instead of indicators like “D” for the Dallas plant. To create labels that will make our output easier to read, click on the Values cell in the PLANT row:

You will be prompted for a value and a label. In the Value box, enter the value that appears in the actual data (for example, “D”). In the Label box, enter the text that you want to appear in the output instead (for example, “Dallas”):

Click the Add button. Next assign the label “Norcross” for the value “N” and click Add again. Click OK. Do this for the other string variables, Gender and Position. Please note that this is NOT affecting your actual data; it will only change the way that the output appears.
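To see why value labels leave the data untouched, here is a minimal Python sketch (not SPSS syntax). The codes “D” and “N” and their labels come from this manual; the five sample rows and the helper name display are made up for illustration.

```python
# Raw codes as they might be stored in the PLANT column (hypothetical sample rows).
plant_codes = ["D", "N", "D", "D", "N"]

# Value labels: raw code -> text shown in output ("D" and "N" are from the manual).
plant_labels = {"D": "Dallas", "N": "Norcross"}


def display(code, labels):
    """Return the label for a code, falling back to the raw code if unlabeled."""
    return labels.get(code, code)


shown = [display(code, plant_labels) for code in plant_codes]
print(shown)        # labels are used only for display
print(plant_codes)  # the stored data is unchanged
```

The label is looked up at display time only, which is the same idea as the Values cell in the Variable View.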


Go back to the Data View tab at the bottom of the screen…you will see the actual data (see footnote 2):

To expand the columns, simply place your cursor in between the column headers (variable names) and drag the column to its desired width, just like you would in Excel.

Footnote 2: If you needed to create a new dataset from scratch, you would begin by defining your variables in the Variable View window and then return to the Data View window and input the values.


At this point, we could convert the other worksheets from the WidgeOne dataset into SPSS files. Each would be converted to a separate SPSS file. These files could be merged into one file using the Merge Files option in the Data Menu (not available in Student Version). However, since we will only be using the variables in the Plant_Survey worksheet for our statistical analyses, we will not execute a merge at this time.


Concept 1: Using SPSS for Measurements of Central Tendency

SPSS is a menu-based system. Thus it is only a matter of finding what you want to do on the menus and customizing your request. For most computations, you should find SPSS to be easier than Excel. In order to find the two most predominant measures of central tendency (the mean and median) we start in the Analyze menu. Within that menu, choose Descriptive Statistics and Frequencies as shown:


Next you will see the following:

We need to choose the variables for which we are interested in finding the mean and median. We will choose only the quantitative variables (those with the “ruler” icon next to them): JOBGRADE, SOCREL, YRONJOB, PRDCTY and JOBSAT. We make these selections by clicking on the variable from the list on the left and then clicking on the right arrow button circled above to place it on the variable list on the right. Almost every option in SPSS has this type of interface for selecting variables for analysis. You can choose more than one variable at a time by holding the Ctrl key down as you make your selections.

Please make sure that the Display frequency tables option is UNTICKED. This will be more meaningful later.


After you have identified the variables for analysis, click on the Statistics option button as circled above. You should see this screen:

This should look pretty familiar. This is almost the same list of statistical information that was produced when we executed Tools>Data Analysis>Descriptive Statistics in Excel. Hmmm…that must mean that this stuff really is important.

For now, just tick Mean and Median and select Continue.


We obtain the following display containing the means and medians of our five variables in our SPSS Output window:

Statistics

                 JOBGRADE   PRDCTY    SOCREL   YRONJOB   JOBSAT
N     Valid      40         40        40       40        40
      Missing    6          6         6        6         6
Mean             6.6000     84.5798   5.5000   8.2900    6.8500
Median           6.5000     84.8114   5.0000   8.3500    6.6000

Notice that these figures are consistent with what we had generated using Excel and what we had computed by hand.

Isn’t it nice when numbers match?

What if we were only interested in a subset of the data? For example, what if we wanted to know the measurements of central tendency of these variables by gender and by plant?


We would select the Compare Means>Means option from the Analyze menu as shown:


You should see the following screen:

Typically, quantitative variables go in the Dependent List and qualitative variables go in the Independent List.

Choose the same five variables as before. Place these variables in the Dependent List. Then, place the variables Plant and Gender in the Independent List. This will enable us to better understand if there are differences between the genders and the plants across the quantitative variables like Productivity (PRDCTY).

Once the variable lists have been populated, select the Options button. From the list, identify that you want the Mean and the Median. Select Continue and OK.


You should see the following output:

Case Processing Summary

                                        Cases
                      Included          Excluded          Total
                      N     Percent     N     Percent     N     Percent
JOBGRADE * Plant      40    87.0%       6     13.0%       46    100.0%
PRDCTY * Plant        40    87.0%       6     13.0%       46    100.0%
SOCREL * Plant        40    87.0%       6     13.0%       46    100.0%
YRONJOB * Plant       40    87.0%       6     13.0%       46    100.0%
JOBSAT * Plant        40    87.0%       6     13.0%       46    100.0%
JOBGRADE * Gender     40    87.0%       6     13.0%       46    100.0%
PRDCTY * Gender       40    87.0%       6     13.0%       46    100.0%
SOCREL * Gender       40    87.0%       6     13.0%       46    100.0%
YRONJOB * Gender      40    87.0%       6     13.0%       46    100.0%
JOBSAT * Gender       40    87.0%       6     13.0%       46    100.0%


JOBGRADE PRDCTY SOCREL YRONJOB JOBSAT * Plant

Plant                        JOBGRADE   PRDCTY    SOCREL    YRONJOB   JOBSAT
Dallas     Mean              6.8696     88.3394   5.5217    8.1043    7.1478
           Std. Deviation    1.39167    6.31238   1.47308   4.66276   1.04745
Norcross   Mean              6.2353     79.4934   5.4706    8.5412    6.4471
           Std. Deviation    1.71499    5.07442   1.50489   3.76199   .85522
Total      Mean              6.6000     84.5798   5.5000    8.2900    6.8500
           Std. Deviation    1.54919    7.25633   1.46760   4.25657   1.02081

JOBGRADE PRDCTY SOCREL YRONJOB JOBSAT * Gender

Gender                       JOBGRADE   PRDCTY    SOCREL    YRONJOB   JOBSAT
Female     Mean              6.3000     83.9656   6.0000    8.1850    6.9800
           Std. Deviation    1.49032    7.84944   1.48678   4.95020   1.06306
Male       Mean              6.9000     85.1940   5.0000    8.3950    6.7200
           Std. Deviation    1.58612    6.75811   1.29777   3.55846   .98654
Total      Mean              6.6000     84.5798   5.5000    8.2900    6.8500
           Std. Deviation    1.54919    7.25633   1.46760   4.25657   1.02081

This output is much more explanatory than the first set of output. Look at the differences between the plants. Which plant is more productive? Which plant has a higher Job Satisfaction score? Now look at the differences between the genders. Which gender has a higher social relations score? Is there a difference in productivity between the genders?
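The group means and the overall mean are tied together: the overall mean is the group means weighted by group size. Here is a minimal Python sketch of that check (not SPSS syntax). The PRDCTY means are copied from the output above, the group sizes 23 and 17 are the Plant counts reported in the frequency tables of this manual, and the function name overall_mean is made up for illustration.

```python
# (group size, mean PRDCTY) per plant, copied from the SPSS output in this manual.
groups = {"Dallas": (23, 88.3394), "Norcross": (17, 79.4934)}


def overall_mean(groups):
    """Combine per-group means into one overall mean, weighting by group size."""
    total_n = sum(n for n, _ in groups.values())
    return sum(n * mean for n, mean in groups.values()) / total_n


prdcty_total = overall_mean(groups)
print(round(prdcty_total, 2))
```

The result should agree with the Total row for PRDCTY, up to the rounding of the reported group means.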


Sometimes looking at an average by itself is misleading. For example…let’s assume that a friend of yours just read an article about lung cancer. He goes on to tell you that “1% of all Americans will die of lung cancer”. I should probably mention that your friend is a member of the great statistical unwashed. Does this mean that you have a 1% chance of dying of lung cancer? Of course not. It depends upon a lot of things…like…do you smoke? If you re-evaluate that number by smokers/non-smokers, the values are very different. That’s the point…averages are very misleading. You need to look at the average (or median) by different groupings to better understand the “rest of the story”.

As a rule, we do not use the mode as a Measurement of Central Tendency with quantitative data. If the data is qualitative (Plant, Gender, Position), it is the ONLY Measurement of Central Tendency available. We can determine the mode of variables such as these by selecting Analyze>Descriptive Statistics>Frequencies again. This time choose the qualitative variables: Plant, Gender and Position. Check the box next to Display frequency tables. Then click OK.

You will see the following frequency tables from which it is easy to determine if there is a modal value (isn’t this easier than what we had to go through in Excel?):

Plant

                     Frequency   Percent   Valid Percent   Cumulative Percent
Valid   (blank)      6           13.0      13.0            13.0
        Dallas       23          50.0      50.0            63.0
        Norcross     17          37.0      37.0            100.0
        Total        46          100.0     100.0

Gender

                     Frequency   Percent   Valid Percent   Cumulative Percent
Valid   (blank)      6           13.0      13.0            13.0
        Female       20          43.5      43.5            56.5
        Male         20          43.5      43.5            100.0
        Total        46          100.0     100.0

POSITION

                     Frequency   Percent   Valid Percent   Cumulative Percent
Valid   (blank)      6           13.0      13.0            13.0
        Hourly       20          43.5      43.5            56.5
        Management   20          43.5      43.5            100.0
        Total        46          100.0     100.0

You can also see here that we are reaping the benefit of changing the labels earlier: the full name is much easier to interpret and understand.
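If you want to check a frequency table by hand, the following Python sketch (not SPSS syntax) rebuilds the Plant column from the counts in the table above and recomputes the percentages and the mode. The function name frequency_table is made up for illustration.

```python
from collections import Counter

# Rebuild the Plant column from the counts reported in the table above:
# 6 blank cells, 23 Dallas, 17 Norcross.
plant = [""] * 6 + ["Dallas"] * 23 + ["Norcross"] * 17


def frequency_table(values):
    """Return rows of (value, count, percent, cumulative percent), sorted by value."""
    counts = Counter(values)
    total = len(values)
    rows, running = [], 0
    for value in sorted(counts):
        running += counts[value]
        rows.append((value,
                     counts[value],
                     round(100 * counts[value] / total, 1),
                     round(100 * running / total, 1)))
    return rows


table = frequency_table(plant)
# The mode is the most frequent non-blank value.
mode = max((row for row in table if row[0] != ""), key=lambda row: row[1])[0]
for row in table:
    print(row)
print("Mode:", mode)
```

For Gender and Position the two categories are tied at 20 each, so there is no single modal value for those variables.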


Concept 2: Using SPSS for Measurements of Dispersion

To represent the dispersion of a quantitative variable (Measurements of Dispersion are not relevant for qualitative variables), we typically report the standard deviation. To do this in SPSS, return to the Analyze menu. Select Descriptive Statistics>Frequencies and select the quantitative variables as before. Turn off the display for frequency tables and click on the Statistics button. Select Standard Deviation. Click Continue and then OK.


Statistics

                      JOBGRADE   PRDCTY    SOCREL    YRONJOB   JOBSAT
N     Valid           40         40        40        40        40
      Missing         6          6         6         6         6
Std. Deviation        1.54919    7.25633   1.46760   4.25657   1.02081

Obviously, we could have included many more statistics in our analysis simply by choosing the ones we want from the Statistics screen.
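The standard deviation that SPSS reports here is the sample standard deviation, which divides by n - 1. A minimal Python sketch of that formula follows (not SPSS syntax). The eight values are hypothetical and serve only to illustrate the calculation; they are not WidgeOne data.

```python
import math
import statistics

# Hypothetical values, used only to illustrate the formula (not WidgeOne data).
values = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(values)
# Sample standard deviation: sum of squared deviations divided by n - 1.
by_hand = math.sqrt(sum((x - mean) ** 2 for x in values) / (len(values) - 1))

print(round(by_hand, 5))
print(round(statistics.stdev(values), 5))  # the library uses the same n - 1 formula
```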

The second Measurement of Dispersion discussed in the Basic Concepts Manual was the frequency table. To execute a basic frequency table for a qualitative variable, go to Analyze> Descriptive Statistics>Frequencies. Select the qualitative variables for analysis. Ensure that the Display frequency tables box is ticked at the bottom of the page. Click OK.



Plant

                     Frequency   Percent   Valid Percent   Cumulative Percent
Valid   (blank)      6           13.0      13.0            13.0
        Dallas       23          50.0      50.0            63.0
        Norcross     17          37.0      37.0            100.0
        Total        46          100.0     100.0

Gender

                     Frequency   Percent   Valid Percent   Cumulative Percent
Valid   (blank)      6           13.0      13.0            13.0
        Female       20          43.5      43.5            56.5
        Male         20          43.5      43.5            100.0
        Total        46          100.0     100.0

POSITION

                     Frequency   Percent   Valid Percent   Cumulative Percent
Valid   (blank)      6           13.0      13.0            13.0
        Hourly       20          43.5      43.5            56.5
        Management   20          43.5      43.5            100.0
        Total        46          100.0     100.0

In the previous manuals, we explained how to categorize a quantitative variable into a qualitative variable. For example, when we created a frequency table for the job tenure variable, we created three categories: < 5 years, 5-10 years and more than 10 years. To create these same categories in SPSS, we need to recode our YRONJOB variable into a new variable called JOBTEN.


To do this, go to the Transform menu and choose the option Recode into Different Variables:


You should see the following:

Click on the Old and New Values button.


You should now see the following screen:

Identify the name of the first category

Identify the range of values for the first category.

Tick this box to tell SPSS that you are creating a qualitative variable

First we define the category “New”. In the screen above, you must indicate that the Range of this new value is from 0 to 4.9 (we wanted values less than 5 and the data had only one decimal place of accuracy). Check the box that specifies that the new output variable will be of type “String”. We also name the new value “New”.


Click on the Add button to add this new output value.

These actions will produce the following:

Note that the values of YRONJOB between 0 and 4.9 will represent the category “New” in the new variable.

Continue this same process creating the category “Experienced” (5-10 years on the job) and the category “Mature” (10+ years on the job). Note: since the value “Experienced” has 11 characters, change the Width from 8 to 11.


After you have completed this process, click on Continue.


The “Name” is what will appear in the dataset. The “Label” is what will appear in the output. Select Change and then select OK.




Now we can easily generate a frequency table for the new variable JOBTEN. As before, go to Analyze>Descriptive Statistics>Frequencies. Ensure that the frequency table option is ticked and select your new JOBTEN variable:

Job Tenure

                      Frequency   Percent   Valid Percent   Cumulative Percent
Valid   Experienced   16          40.0      40.0            40.0
        Mature        15          37.5      37.5            77.5
        New           9           22.5      22.5            100.0
        Total         40          100.0     100.0

Well Done!
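The recode rule can be written down as a small function. This is a Python sketch of the logic, not SPSS syntax, and the name job_tenure is made up for illustration. One assumption is labeled in the code: a value of exactly 10 years is treated as “Experienced”, since the manual describes that category as 5-10 years.

```python
def job_tenure(years):
    """Map years on the job to the three categories used in this manual."""
    if years < 5:
        return "New"
    if years <= 10:  # assumption: exactly 10 years counts as "Experienced"
        return "Experienced"
    return "Mature"


# The percentages in the table follow from the counts out of 40 valid cases.
counts = {"Experienced": 16, "Mature": 15, "New": 9}
total = sum(counts.values())
percents = {name: 100 * n / total for name, n in counts.items()}

print(job_tenure(4.9), job_tenure(5.0), job_tenure(10.1))
print(percents)
```

Writing the boundaries as comparisons avoids gaps between ranges such as 0 to 4.9 and 5 to 10, which matters if the data ever carry more than one decimal place.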


Concept 3: Using SPSS for Visualization/Organization of Univariate Data

For professional presentations or for formal documents, we recommend the use of a graphics package (e.g. Microsoft PowerPoint). However, SPSS has some nice graphs available in the Graphs menu, which can be used less formally. In addition, it is very useful to develop graphics for your own purposes, because it enables you to see things about your data that you might not have otherwise seen.

As with Excel, let’s begin with a Histogram. We will also execute a Stem and Leaf plot, which we were not able to do with Excel.

To create a Histogram of the YRONJOB variable, select Analyze>Descriptive Statistics>Explore:


Assign YRONJOB to the Dependent list. Select the Plots button:


Tick the Stem and Leaf and Histogram options.

Click Continue. On the Explore dialogue box, make sure that the “Both” option is selected for the Display. Click OK.


This set of executions will generate the following output:

Descriptives

                                                        Statistic   Std. Error
YRONJOB   Mean                                          8.2900      .67302
          95% Confidence Interval     Lower Bound       6.9287
          for Mean                    Upper Bound       9.6513
          5% Trimmed Mean                               8.2917
          Median                                        8.3500
          Variance                                      18.118
          Std. Deviation                                4.25657
          Minimum                                       .10
          Maximum                                       17.00
          Range                                         16.90
          Interquartile Range                           6.10
          Skewness                                      -.081       .374
          Kurtosis                                      -.748       .733
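Several rows of the Descriptives table can be recomputed from the others, which is a handy sanity check. Here is a minimal Python sketch (not SPSS syntax) using only the figures reported above for YRONJOB; the variable names are made up for illustration.

```python
import math

# Figures reported for YRONJOB in the Descriptives table.
n = 40
sd = 4.25657
minimum, maximum = 0.10, 17.00

std_error = sd / math.sqrt(n)    # standard error of the mean
variance = sd ** 2               # variance is the squared standard deviation
value_range = maximum - minimum  # range is maximum minus minimum

print(round(std_error, 5), round(variance, 3), round(value_range, 2))
```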


YRONJOB Stem-and-Leaf Plot

Frequency Stem & Leaf

2.00 0 . 01

5.00 0 . 22233

5.00 0 . 44555

4.00 0 . 6777

8.00 0 . 88888999

7.00 1 . 0000111

4.00 1 . 2333

4.00 1 . 4445

1.00 1 . 7

Stem width: 10.00

Each leaf: 1 case(s)

Here is the Stem and Leaf plot. If you imagine rotating this graphic counterclockwise 90 degrees, it is basically a Histogram. The plot tells us that each stem has a width of 10.00. This means that each stem digit counts tens and each leaf digit counts ones, so a stem of 1 with a leaf of 3 represents a value of 13.x.

Let’s start in the middle with the frequency of 7.00. Here, we have four values that are 10.x and three more values that are 11.x. The next line indicates a frequency of 4.00. In the dataset, we have one observation that is 12.x and three observations that are 13.x. The greatest observation is 17.
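To make the reading of the plot concrete, here is a Python sketch (not SPSS syntax) that rebuilds the rows from whole-number values. The 40 values are the years read back off the plot above with decimals dropped, so they are truncated approximations of the real data. The function name stem_and_leaf and the choice of five rows per stem are assumptions made to mimic this particular output.

```python
from collections import defaultdict

# Whole-number years read back from the plot above (decimals are not recoverable).
years = [0, 1, 2, 2, 2, 3, 3, 4, 4, 5, 5, 5, 6, 7, 7, 7, 8, 8, 8, 8, 8, 9, 9, 9,
         10, 10, 10, 10, 11, 11, 11, 12, 13, 13, 13, 14, 14, 14, 15, 17]


def stem_and_leaf(values, stem_width=10, lines_per_stem=5):
    """Return rows of (frequency, stem, leaves) for whole-number values."""
    rows = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(int(value), stem_width)
        # Each stem is split into several rows (leaves 0-1, 2-3, 4-5, ...).
        slot = leaf * lines_per_stem // stem_width
        rows[(stem, slot)].append(leaf)
    return [(len(leaves), stem, "".join(str(d) for d in leaves))
            for (stem, _slot), leaves in sorted(rows.items())]


plot_rows = stem_and_leaf(years)
for frequency, stem, leaves in plot_rows:
    print(f"{frequency:5.2f}   {stem} . {leaves}")
```

Each printed row has the same three parts as the SPSS output: the frequency, the stem (tens), and one leaf digit (ones) per case.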


This is a boxplot. Here, the center line is the median. The “box” is the Interquartile range: the high end of the box is the 75th percentile and the low end is the 25th percentile. The “whiskers” that extend in either direction reach the smallest and largest observations that are not outliers, so here they tell us the full range of the data. If there were any “outliers” (defined as observations more than 1.5*IQR beyond either end of the box), they would be identified here.

Lots and lots of output…with relatively little work. That’s what I’m talking about!


To replicate the pie chart developed in the Basic Concepts Manual, go to Graphs>Legacy Dialogues>Pie:


You should see this screen:

What we’re trying to communicate with the pie chart is the proportions of the different groups, so click summaries for groups of cases and then define.



Select % of cases – we are communicating proportions, after all

Drag the qualitative variable of interest in this box.


The only other thing we really need to do is to give our Pie Chart a title. So, click on the Titles tab and title the chart “Job Tenure of WidgeOne Employees”. Feel free to explore the other tabs.

You should have generated the following:

To insert the percentages, simply double click on the graph to see this:


Click show data labels and then close the properties window and the chart editor to obtain the following graph:


Let’s say that you wanted to understand how the overall productivity of the company was allocated by plant, that is, what percentage of the productivity comes from Dallas versus Norcross. This is easy to do in a Pie Chart in SPSS. Go back to Graphs>Legacy Dialogues>Pie.


This time, select “sum of variable” as the slice, select the PRDCTY as the variable to sum by, and then (since we want the percentages) select % of cases.


Don’t forget to change the title and add in the percentages by double clicking the pie chart.

You should see the following Pie Chart:

This pie chart now provides information regarding the percentage of WidgeOne’s productivity by plant (Norcross needs to step it up).
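The slice sizes in a “sum of variable” pie chart can be approximated from figures already reported in this manual, because a group's sum is its size times its mean. Here is a minimal Python sketch (not SPSS syntax); the counts and PRDCTY means are copied from the earlier output, and the variable names are made up for illustration.

```python
# (employee count, mean PRDCTY) per plant, as reported earlier in this manual.
plants = {"Dallas": (23, 88.3394), "Norcross": (17, 79.4934)}

# A group's summed productivity is its size times its mean.
sums = {name: n * mean for name, (n, mean) in plants.items()}
total = sum(sums.values())
shares = {name: round(100 * s / total, 1) for name, s in sums.items()}

print(shares)
```

Keep in mind that Dallas also has more employees, so part of its larger slice reflects headcount rather than productivity per employee.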


The next univariate visualization tool is a Bar chart. This is done in a very similar way to the Pie Chart.

Select Graph>Legacy Dialogues> Bar. You should see the following screen:


Again, we’re comparing groups and we would want the information summarized by group, so click on summaries for groups of cases and then define.

Change this to counts (n of cases).

Drag the variable of interest into the category axis


As before, change the title to something meaningful. You should generate something like this:

This could be better. Double click on the graph to bring up the chart editor.


Click on Transpose Chart Coordinate System to generate a horizontal bar chart

Click on Data Label Mode to insert the counts

Click on the labels. This brings up a window where you can change the order of the groups from alphabetical to something that makes more sense

We should probably note at this point that if the definitions that you assigned when you transformed the quantitative variable into a qualitative variable are not universally known, you should include a legend or key at the bottom of your graphic to ensure that the reader understands the definition of “New” and “Mature”.


Concept 4: Using SPSS for Visualization/Organization of Multivariate Data

Contingency tables, Stacked Bar Charts, 100% Stacked Bar Charts and Scatter Plots can be easily generated in SPSS.

To reproduce the Contingency Tables that were created in earlier manuals that included the variables Plant and Gender, select Analyze>Descriptive Statistics>Crosstabs:

Place the Plant variable in the Row position.

Place the Gender variable in the Column position.


As with Excel Pivot Tables, Crosstabs in SPSS are very flexible. If you wish to include more than just the frequency counts in the cells of your table, click on Cells. You will see the following window:

In the percentages section, select Row, Column and Total. Click Continue and then OK.


Wow…look how much output was created in a single table! That was so much easier than Excel! The output table contains the conditional probabilities described in the Basic Concepts Manual. In the first “cell” (the intersection of Female and Dallas) we have four pieces of information. We know that there are 13 women who work in Dallas. We know that of all of the Dallas employees, 56.5% are female. We know that of all of the women, 65% are in Dallas. Finally, we know that of all employees, 32.5% are females in Dallas.

Plant * Gender Crosstabulation

                                        Gender
                                        Female    Male      Total
Plant   Dallas     Count                13        10        23
                   % within Plant       56.5%     43.5%     100.0%
                   % within Gender      65.0%     50.0%     57.5%
                   % of Total           32.5%     25.0%     57.5%
        Norcross   Count                7         10        17
                   % within Plant       41.2%     58.8%     100.0%
                   % within Gender      35.0%     50.0%     42.5%
                   % of Total           17.5%     25.0%     42.5%
Total              Count                20        20        40
                   % within Plant       50.0%     50.0%     100.0%
                   % within Gender      100.0%    100.0%    100.0%
                   % of Total           50.0%     50.0%     100.0%
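All three percentages in a cell come from the same count divided by a different total. The following Python sketch (not SPSS syntax) recomputes them from the four cell counts in the table above; the function name cell_percentages is made up for illustration.

```python
# Cell counts from the Plant * Gender crosstabulation above.
counts = {
    ("Dallas", "Female"): 13, ("Dallas", "Male"): 10,
    ("Norcross", "Female"): 7, ("Norcross", "Male"): 10,
}


def cell_percentages(counts, plant, gender):
    """Return the count and the row, column and total percentages for one cell."""
    row_total = sum(n for (p, _), n in counts.items() if p == plant)
    col_total = sum(n for (_, g), n in counts.items() if g == gender)
    grand_total = sum(counts.values())
    n = counts[(plant, gender)]
    return {
        "count": n,
        "pct_within_plant": round(100 * n / row_total, 1),    # row percentage
        "pct_within_gender": round(100 * n / col_total, 1),   # column percentage
        "pct_of_total": round(100 * n / grand_total, 1),
    }


dallas_female = cell_percentages(counts, "Dallas", "Female")
print(dallas_female)
```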

If you need to subset this information further (e.g. by Job Tenure), there is an easy way to do that. Go back to the Analyze>Descriptive Statistics>Crosstabs screen. Press Reset to return to the default settings.


Make your selections of the three variables as follows:

Click OK.


This time, the table will only show the cell counts (we could have included the percentages as before by following the same steps in the Cell Display screen):

Plant * Gender * Job Tenure Crosstabulation

Count

                                     Gender
Job Tenure                           Female   Male   Total
Experienced   Plant   Dallas         3        5      8
                      Norcross       3        5      8
              Total                  6        10     16
Mature        Plant   Dallas         6        3      9
                      Norcross       2        4      6
              Total                  8        7      15
New           Plant   Dallas         4        2      6
                      Norcross       2        1      3
              Total                  6        3      9

Notice that the same information on Plant and Gender counts has now been provided by each level of Job Tenure: Experienced, Mature and New (the levels are reported in alphabetical order rather than by order of magnitude). Cool.


Stacked Bar Charts can be generated in SPSS using the same basic executions that you did for simple Bar Charts in the previous section. Select Graphs>Legacy Dialogues> Bar:


Select stacked and by groups, click define and then you should see the following screen:

Change the Title.

Place the Plant variable here.

Place the Gender variable here.


Select OK.

You should see the following Stacked Bar Chart:

Because these groups are of different sizes, it might be better to plot this information in a 100% Stacked Bar Chart instead.

To do this, double click on the graph:


Click the 100% stacked option.


You should see the following graphic:


The last multivariate visualization technique is the Scatter Plot. Again, SPSS provides us with flexibility to subset our analysis if needed.

What variables might have a relationship? What about Productivity and Job Satisfaction? A Scatter Plot of these variables can be generated by selecting Graphs>Legacy Dialogues>Scatterplot. Choose simple scatter plot and then click define:


Don’t forget to change the title! You should see the following graphic:

So…what do you think? It appears that there might be a positive relationship between the two variables, because the graphic roughly moves in a linear fashion from the SW corner to the NE corner.


Concept 5: Using SPSS for Random Number Generation and Simple Random Sampling

Like the other software applications, SPSS will generate random numbers using the internal clock in the computer. As a result, every time a command is given to SPSS to generate some set of random numbers, a different set of random numbers will be generated. However, sometimes we may need to replicate a set of random numbers exactly the way they were previously generated. To accomplish this replication, SPSS allows the analyst to define a “seed” number that will ensure a consistent set of random numbers – the numbers are still random and can be used to ensure statistical independence of samples.

If you need to set the “seed” number so you can replicate your results, simply go to the Transform menu. Choose the Random Number Generators option. You should see the following screen:


This system is set to have a Starting Point of 1234567. This starting point is referred to as a “seed”. You can set the starting point value prior to each analysis that uses the random numbers. The value must be a positive integer.

To create a string of random numbers, which is uniformly distributed between 0 and 1, go to the Transform menu and choose Compute Variable. We will call the new random number variable Group as shown in the screen below. Look at the menu for Function Group. In this menu, select Random Numbers. You will then see a long list appear in the Functions and Special Variables menu. This is a list of distributions that you could use to generate the new random variable Group.

This time double click on Rv.Uniform:


Every distribution has parameters that must be specified. For the uniform distribution, the only parameters are the two values between which we want our random numbers to fall. The “?”s in the expression RV.UNIFORM(?,?) which appears in the Numeric Expression box are asking you to fill in these two values for your random numbers. Change this expression to read RV.UNIFORM(0,1), so the random numbers will be between 0 and 1 (as it did in Excel). Click OK. The new variable Group should appear in your Data View. Here is what a typical result would look like:

Remember that your results will vary since this variable was randomly generated.
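The effect of a seed is easy to demonstrate outside SPSS. Here is a minimal Python sketch of the idea; Python's generator is not the one SPSS uses, so the actual numbers will not match SPSS output. The function name uniform_draws is made up for illustration, and the seed 1234567 is the starting point mentioned above.

```python
import random


def uniform_draws(seed, n):
    """Return n uniform random numbers between 0 and 1 from a seeded generator."""
    rng = random.Random(seed)  # private generator, so the seed fully fixes the draws
    return [rng.uniform(0, 1) for _ in range(n)]


first = uniform_draws(1234567, 40)
second = uniform_draws(1234567, 40)   # same seed: identical numbers
other = uniform_draws(7654321, 40)    # different seed: different numbers

print(first[:3])
print(first == second, first == other)
```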


One of the primary reasons for generating random numbers is to assign observations into statistically independent groups. Using the random numbers, let’s assign the 40 observations into 2 groups – a test group and a control group.

Just like we did in Concept 2, select Transform>Recode into Different Variables. Select the new variable Group to be transformed. Click on Old and New Values. Set it up so that the values from 0 up to .5 are put into the Control group and the values from .5 to 1 are in the Test group:

Click on Continue. Give the new variable a name like “Assignment” and then click OK.


You should see the following on your Data View:

Now, you have two groups of randomly assigned employees. This is a very important concept in Statistical Testing.
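The assignment step can be sketched in a few lines of Python (not SPSS syntax). The function name assign_groups is made up for illustration, and the code assumes that a draw of exactly .5 goes to the Test group, since the manual's two ranges meet at that value.

```python
import random


def assign_groups(n_cases, seed):
    """Assign each case to Control or Test from a uniform random draw."""
    rng = random.Random(seed)
    draws = [rng.random() for _ in range(n_cases)]
    # Assumption: draws below .5 are Control; .5 and above are Test.
    return ["Control" if u < 0.5 else "Test" for u in draws]


assignment = assign_groups(40, seed=1234567)
print(assignment.count("Control"), assignment.count("Test"))
```

Note that this kind of assignment does not force the two groups to be the same size; each employee simply has an even chance of landing in either group.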


Because the process of selecting a random sample from a set of data is so common, there is a very straightforward way to accomplish this in SPSS. Suppose we wish to select a simple random sample of 30 individuals from this dataset. Select Data>Select Cases>Random Sample of Cases>Sample:

You could choose to sample a certain percentage of the cases or sample 30 out of the first 40 cases. Do the latter. Click on Continue and then OK.


Your Data view will now look like this:

Cases with a slash were not selected in the sample

Note there is a new variable in the list – filter_$. It assigns the value 1 to those values selected for the random sample and the value 0 to all others. Cases not selected for the sample are now slashed in the first column.

Remember that samples will differ unless the same seed is used to generate them.

At this point, you can execute all of your analysis as before, but only those cases with a filter_$ value of 1 will be analyzed. You can go back to all cases by selecting Data>Select Cases>All Cases.
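The filter variable is just a 0/1 flag per case. Here is a minimal Python sketch of a simple random sample of 30 out of 40 cases (not SPSS syntax, and not the SPSS generator, so the selected cases will differ from yours). The function name sample_filter is made up for illustration.

```python
import random


def sample_filter(n_cases, sample_size, seed):
    """Return a 0/1 flag per case: 1 if the case is in the random sample."""
    rng = random.Random(seed)
    chosen = set(rng.sample(range(n_cases), sample_size))  # sampling without replacement
    return [1 if i in chosen else 0 for i in range(n_cases)]


filter_flags = sample_filter(40, 30, seed=1234567)
print(sum(filter_flags), "cases selected out of", len(filter_flags))
```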


Concept 6: Using SPSS for Confidence Intervals

Generating confidence intervals in SPSS is very easy. For example, if we wish to compute a 95% confidence interval for the mean Job Satisfaction rating of all employees, we would go to the Analyze menu and choose Compare Means and then choose One-Sample T Test (see footnote 3). Once the Job Satisfaction variable has been assigned, select Options and ensure that the CI will be generated at 95% confidence:

Footnote 3: t-tests are very common tests used to determine if two sample means differ significantly or if one sample mean differs from some established value. For more detailed information on t-tests, we suggest Statistical Methods and Data Analysis by Ott and Longnecker.


Click Continue and then OK. You will see the following output:

One-Sample Test

                                   Test Value = 0
                                                              95% Confidence Interval
                                                              of the Difference
          t        df   Sig. (2-tailed)   Mean Difference     Lower      Upper
JOBSAT    42.440   39   .000              6.85000             6.5235     7.1765

As stated previously in The Basic Concepts Manual, these results would be reported as:

“Based on a representative sample of 40 employees, we are 95% confident that the mean job satisfaction among all employees is between 6.52 and 7.18”.

This means that the interval was produced by a procedure that captures the “true” mean job satisfaction of all employees, which is unknown, in 95% of repeated samples. It also means that in 5% of repeated samples the interval would miss the true mean, lying entirely below it or entirely above it.
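The interval can be reproduced by hand from three numbers reported earlier for JOBSAT: the mean, the standard deviation and the sample size. Here is a minimal Python sketch (not SPSS syntax). The critical value 2.0227 is an assumption supplied by hand: it is the two-sided 95% value of the t distribution with 39 degrees of freedom, taken from a t table rather than computed.

```python
import math

# Reported for JOBSAT earlier in this manual.
n, mean, sd = 40, 6.85, 1.02081
t_crit = 2.0227  # assumed: two-sided 95% critical value of t with n - 1 = 39 df

std_error = sd / math.sqrt(n)
t_stat = (mean - 0) / std_error        # test value is 0, as in the output
lower = mean - t_crit * std_error
upper = mean + t_crit * std_error

print(round(t_stat, 2), round(lower, 4), round(upper, 4))
```

The results should agree with the t, Lower and Upper columns of the One-Sample Test output, up to rounding.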


SPSS Lagniappe

What is a “Lagniappe”? This word derives from New World Spanish la ñapa, “the gift”. The word came into the Creole dialect of New Orleans and there acquired a French spelling. It is still used in the Gulf States, especially southern Louisiana, to denote a little bonus that a friendly shopkeeper might add to a purchase.

Our lagniappe for our readers includes the extra and interesting things that we have learned to do with these software packages that might not be easily found or well known. A little extra information at no extra cost!

You may have noticed that in your output screen in SPSS, you have been generating what is called “Syntax”. For example, when you executed the measurements of central tendency for the quantitative variables, SPSS wrote the following syntax:

FREQUENCIES
  VARIABLES=JOBGRADE PRDCTY SOCREL YRONJOB JOBSAT
  /FORMAT=NOTABLE
  /STATISTICS=MEAN MEDIAN
  /ORDER= ANALYSIS .

Did you notice that? Probably not. Most people use SPSS because they don’t have to write code to have the software do what they want. However, in the event that you find the “point-and-click” environment of SPSS too restricting, know that you always have the option to write custom syntax to have SPSS more specifically do what you want.

To run syntax in SPSS, select File>New>Syntax. In the blank syntax screen type (or paste) the syntax above. Then select Run>All. You will generate the same output as before! Cool!

