April 9, 2004
A note on missing values and recodes in SPSS
Missing values in SPSS can be of two types: system-missing and user-defined missing.
System-missing values in SPSS are represented by a period (.) in the data sheet, as they
are in SAS. They will never be used in any analyses. However, unlike SAS, SPSS does not
consider system-missing values to be smaller than any numeric value. Unfortunately,
SPSS does not like to read in periods (.) as missing values in raw data, and it will
complain about any periods it finds there. Even though it complains, SPSS will correctly
read the periods into the data set and assign them to be system-missing.
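As a quick way to see this behavior, the sketch below reads a few records of inline data in which one value is a period. The variable names id and age and the data values are made up for illustration. LIST simply prints what SPSS stored, so you can confirm on your own installation how the period was read and whether a warning appears in the log.

```spss
* Hypothetical inline data. The second record has a period for age.
DATA LIST LIST / id (F2.0) age (F3.0).
BEGIN DATA
1 25
2 .
3 41
END DATA.
* Print the stored values for every case.
LIST.
```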
User-defined missing values in SPSS retain their numeric value in the data sheet, but
they will not be used in analyses. This type of missing value is not available in SAS.
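A minimal sketch of how user-defined missing values are declared, assuming that 88 and 99 are the codes used for "missing" in a variable called origvar (both the codes and the variable name are illustrative):

```spss
* Declare 88 and 99 as user-missing codes for origvar.
MISSING VALUES origvar (88, 99).
* The frequency table reports the declared codes separately as missing.
FREQUENCIES VARIABLES=origvar.
```

The values 88 and 99 stay visible in the data sheet, but procedures treat them as missing from this point on.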
How missing values affect recodes and compute commands in SPSS
As in SAS, you need to be careful when doing recodes to be sure that missing values in
the original variable are not improperly set to a non-missing value in the new variable.

There is no problem with missing values if explicit codes are given for each value
of the original variable, as in the example below:
* Values not listed here, including missing values, stay system-missing in agegrp.
RECODE age
  (15 thru 20=1) (21 thru 29=2) (30 thru 39=3) INTO agegrp.
EXECUTE.

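SPSS can also use the keywords Lowest and Highest in recodes, and these will
not improperly recode system-missing values. The Lowest and Highest keywords
refer to the lowest and highest values in the data, excluding system-missing.
Note that user-missing values are included: a user-missing code such as 99 that
falls inside a range will be recoded along with the valid values.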
RECODE age
  (lowest thru 20=1) (21 thru 29=2) (30 thru highest=3) INTO agegrp.
EXECUTE.
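If age also has declared user-missing codes, a more cautious variant lists the MISSING keyword first. This is only a sketch, and the user-missing code 99 is a hypothetical example. RECODE scans the value specifications from left to right and recodes each value only once, so missing values are sent to system-missing before any range is evaluated.

```spss
* The code 99 is a hypothetical user-missing value for age.
MISSING VALUES age (99).
* MISSING comes first, so no later range can pick up a missing value.
RECODE age
  (MISSING=SYSMIS) (LOWEST THRU 20=1) (21 THRU 29=2) (30 THRU HIGHEST=3)
  INTO agegrp.
EXECUTE.
```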

Setting up codes for dummy variables can be a problem. Be very careful when
using “else” as part of a recode. The problem is that “else” includes all values
that have not been explicitly coded, even missing values. The code shown below
will give correct results, even though “else” is used.
RECODE origvar
  (missing=sysmis) (1=1) (ELSE=0) INTO newvar.
EXECUTE.
This code makes certain that the “else” part of the code will not apply to either
system-missing or user-missing values, because the missing values have already
been put into system missing at the start of the recode.
If you only want to be sure that system missing values (not user-missing values,
such as 99) are sent to system missing in your new variable, you can use the code
as shown below:
RECODE origvar
  (sysmis=sysmis) (1=1) (ELSE=0) INTO newvar.
EXECUTE.
When this code is used, user-defined missing values, such as 99 or 88, will be
recoded into newvar as zero.
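One way to see where each original value ended up is to cross-tabulate the two variables. This is a sketch using the variable names from the example above. The MISSING=INCLUDE subcommand keeps user-missing codes in the table, so a code such as 99 appears as its own row. Cases that are system-missing on either variable are left out of the table and are only counted in the case processing summary.

```spss
* Cross-tabulate the original variable against the new dummy variable.
* User-missing codes of origvar are kept in the table as ordinary rows.
CROSSTABS /TABLES=origvar BY newvar /MISSING=INCLUDE.
```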

“Do if” statements can also be used to make sure missing values are correctly
recoded. Make sure that you use an “end if.” statement to close the “do if” block
in SPSS. This syntax is shown below:
DO IF NOT SYSMIS(origvar).
  RECODE origvar
    (Lowest thru 100=1) (ELSE=0) INTO dumvar.
END IF.
EXECUTE.
To exclude both system-missing and user-missing values from the dummy
variable coding, you can use:
DO IF NOT MISSING(origvar).
  RECODE origvar
    (Lowest thru 100=1) (ELSE=0) INTO dumvar.
END IF.
EXECUTE.

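The Compute command also needs care with missing values in SPSS. A common way to
create a new dummy variable is to assign the result of a logical expression, as in
the example below:
COMPUTE NEWVAR = (ORIGVAR = 2).
EXECUTE.
In SPSS a logical expression evaluates to 1 if it is true, 0 if it is false, and
system-missing if either side of the comparison is missing. This compute command
will therefore result in values of origvar=2 being set to 1 in newvar and all other
valid values being set to zero. System-missing values, and values declared as
user-missing, will be set to system-missing in newvar rather than to zero. A code
such as 99 that has not been declared as user-missing is treated as a valid value
and will be set to zero. To make the handling of system-missing values explicit,
use the following syntax:
DO IF NOT SYSMIS(origvar).
  COMPUTE NEWVAR = (ORIGVAR = 2).
END IF.
EXECUTE.
With this syntax the compute is skipped for system-missing values of origvar, so
they stay system-missing in newvar. Values of origvar=2 will be set to 1 in newvar,
and all other valid values will be set to zero. Declared user-missing values still
make the comparison missing, so they also end up system-missing in newvar.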
To make sure that both system-missing and user-missing values of origvar get
coded as missing in newvar, use:
DO IF NOT MISSING(origvar).
  COMPUTE NEWVAR = (ORIGVAR = 2).
END IF.
EXECUTE.
None of this will be necessary if there are no missing values in the original variable.
But it is always wise to be on the safe side, to practice safe computing, and to check
your work carefully!
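One way to check your work is to run the transformation on a tiny data set that contains every kind of value you care about, and then list the results. The sketch below is only an illustration: the data values, the user-missing code 99, and the names newvar and newvar2 are assumptions, not part of any real data set. Swap in whichever recode or compute you want to test.

```spss
* Hypothetical test data with a target value, other values, 99 and a period.
DATA LIST LIST / origvar (F3.0).
BEGIN DATA
1
2
3
99
.
END DATA.
* Declare 99 as a user-missing code.
MISSING VALUES origvar (99).

* Transformations under test.
RECODE origvar (MISSING=SYSMIS) (1=1) (ELSE=0) INTO newvar.
COMPUTE newvar2 = (origvar = 2).
EXECUTE.

* Print every case so each outcome can be inspected by eye.
LIST VARIABLES=origvar newvar newvar2.
```

Because the listing shows one row per input value, you can see directly how your version of SPSS treated the user-missing and system-missing cases.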