SOP: How I match genotype and phenotype data from Excel and SPSS files
to perform repeated measures ANOVA
Stefan Vormfelde
In this SOP I describe how I match genotype and phenotype data to perform repeated
measures ANOVA in SPSS. I describe how I match
1. multiple phenotype data from Excel files in an SPSS data file,
2. genotype data from Excel files to phenotype data from an SPSS file in a common SPSS
Why a routine?
For genotype-phenotype association analysis, SPSS needs genotype data and phenotype data
in a common sav-file. I compose the respective data from Excel files into a common sav-file
using an SPSS-routine. Advantages of using routines include the
1. reduction of mismatches, especially when the data come sorted in different orders,
2. speed (sometimes) and
3. traceability when data sheets develop further.
Why Excel?
I prefer Excel because it gives me the best control when working up genotype and phenotype data. SPSS can
compose data from other formats. However, there’s commonly also a way to import them in
The procedure I describe in this SOP can be reproduced using these files:
match multiple phenotypes from Excel
pSample_RM.xls (phenotype samples in an Excel file)
match_Excel_data_in_SPSS.xls (to prepare the routine file)
match_Excel_data_in_SPSS.sps (the final routine file)
Phenotypes_RM.spv (the documentation of the run)
Phenotypes_RM.sav (the output file I desired)
match genotypes from Excel to phenotypes from SPSS
gSample_RM.xls (genotype samples in an Excel file)
match_Excel_data_to_SPSS_data_multiple_phenotypes.xls (to prepare the routine)
match_Excel_data_to_SPSS_data_multiple_phenotypes.sps (the final routine file)
rmANOVA_dataSheet.spv (the documentation of the run)
rmANOVA_dataSheet.sav (the data sheet I desired)
I prepare the Excel-files
I restrict myself to a single header line in both Excel-files. However, this may not be necessary.
The SPSS-routine can sort the data by more than one criterion, e.g. by subject and also by
study center. To make use of this, I place the columns containing the sort criteria
first. I arrange these columns in the order in which I want to apply the sort criteria: first
column – first criterion, second column – second criterion, ...
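The effect of the order of the sort criteria can be illustrated outside SPSS. The following is a minimal Python sketch, not the SPSS-routine itself. The rows and their values are invented for illustration, and the helper name sort_rows is hypothetical.

```python
# Illustration only: sorting rows by several criteria in a fixed order.
# The rows are invented; "Pat_ID" and "study_center" serve as example
# column names.

rows = [
    {"Pat_ID": "B_2", "study_center": "center_1"},
    {"Pat_ID": "B_1", "study_center": "center_2"},
    {"Pat_ID": "B_3", "study_center": "center_1"},
    {"Pat_ID": "B_1", "study_center": "center_1"},
]

# First entry = first criterion, second entry = second criterion.
sort_criteria = ["Pat_ID", "study_center"]


def sort_rows(rows, criteria):
    """Return the rows sorted by the criteria, in the order given."""
    return sorted(rows, key=lambda row: tuple(row[name] for name in criteria))


sorted_rows = sort_rows(rows, sort_criteria)
for row in sorted_rows:
    print(row["Pat_ID"], row["study_center"])
```

Swapping the two entries in sort_criteria groups the rows by study center first and by subject second, which is why the column order has to reflect the intended order of the criteria.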
I adjust the sort criteria: The headers of the columns must match between the files, e.g.
“Pat_ID”, “study_center”, … The values in the cells must also match exactly between the files. The
matching command will not match e.g. “B_1” to “1”, but only “B_1” to “B_1” and “1” to “1”.
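Why the values have to agree exactly can be shown with a small example outside SPSS. The following is a minimal Python sketch under stated assumptions, not the SPSS command: all rows, the columns "phenotype" and "genotype", and the helper name match_by_keys are invented for illustration.

```python
# Illustration only: matching two tables on exactly equal key values.
# All rows and the columns "phenotype" and "genotype" are invented.

phenotype_rows = [
    {"Pat_ID": "B_1", "phenotype": 4.2},
    {"Pat_ID": "1", "phenotype": 3.7},
    {"Pat_ID": "B_2", "phenotype": 5.1},
]
genotype_rows = [
    {"Pat_ID": "1", "genotype": "AG"},
    {"Pat_ID": "B_1", "genotype": "AA"},
    {"Pat_ID": "2", "genotype": "GG"},
]


def match_by_keys(left_rows, right_rows, keys):
    """Combine rows whose key values are exactly equal in both tables.

    Returns the combined rows and the keys of left rows without a partner.
    """
    right_index = {tuple(row[k] for k in keys): row for row in right_rows}
    matched, unmatched = [], []
    for row in left_rows:
        key = tuple(row[k] for k in keys)
        partner = right_index.get(key)
        if partner is None:
            unmatched.append(key)  # no exactly equal key on the other side
        else:
            matched.append({**row, **partner})
    return matched, unmatched


matched, unmatched = match_by_keys(phenotype_rows, genotype_rows, ["Pat_ID"])
print(matched)
print(unmatched)
```

In this sketch “B_2” stays unmatched although the genotype table contains “2”, because the two values are not identical. The sketch reports such rows separately; how the SPSS-routine treats them is not covered here.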
I prepare the sps-file (SPSS-routine)
To prepare the routine file, I write the syntax in Excel-files (match_Excel_data_in_SPSS.xls, match_Excel_data_to_SPSS_data.xls) and afterwards copy and paste it into sps-files.
I follow the instructions in the first columns of the Excel files.
I may save the file.
Finally, I select and copy (Ctrl+C) the boxed area.
I open SPSS. Then I select the pull-down menu “File”, then “New” and “Syntax”.
I paste (right-click + insert) the text.
I may save the file.
Ready to go.
I execute the SPSS-routine
To execute the routine, I open the sps-file that I want to run.
I select the pull-down menu “Run” (“Ausführen”) and then “All” (“Alle”). This opens an
output file, where I can follow the process and where warnings and errors are documented.
Warnings: When I run my sample routine on my sample files, I get warnings in the spv-file,
which correctly point to more than one subject with data but without a subject ID in the sample
files. These warnings do not preclude use of the resulting sav-file. I do not get more
The last command stores the output as an spv-file, e.g. “Phenotypes_RM.spv” or
“rmANOVA_dataSheet.spv”. I keep these files for traceability.
The routine stores the matched data file as an sav-file according to the last command line, e.g.
“Phenotypes_RM.sav” or “rmANOVA_dataSheet.sav”. This is the result I desired. I keep this
sav-file for traceability.
I can now proceed with genotype-phenotype association analysis in the sav-file.