SOP_match_Excel_data_in_SPSS_Cov

SOP: How I match genotype and phenotype data from Excel and SPSS files
to analyze covariance (ANCOVA)
Stefan Vormfelde
In this SOP I describe how I match genotype and phenotype data from Excel files in a single
SPSS data file, so that I can analyze covariance (ANCOVA) in SPSS.
Why a routine?
For genotype-phenotype association analysis, SPSS needs genotype data and phenotype data
in a common sav-file. I compose the respective data from Excel files into a common sav-file
using an SPSS routine. Advantages of using routines include
1. fewer mismatches, especially when the data come sorted in different orders,
2. speed (sometimes), and
3. traceability when the data sheets develop further.
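To illustrate the first advantage, here is a minimal sketch in Python (not SPSS syntax and not part of my routine). It assumes a hypothetical key column "Pat_ID" and made-up values, and it compares pasting two sheets side by side with matching them by key when the sheets are sorted differently.

```python
# Hypothetical miniature data; in the SOP these rows come from the two Excel files.
genotypes = [
    {"Pat_ID": "B_2", "genotype": "AG"},
    {"Pat_ID": "B_1", "genotype": "AA"},
    {"Pat_ID": "B_3", "genotype": "GG"},
]
phenotypes = [
    {"Pat_ID": "B_1", "phenotype": 1.2},
    {"Pat_ID": "B_2", "phenotype": 0.8},
    {"Pat_ID": "B_3", "phenotype": 1.5},
]


def match_by_key(left, right, key):
    """Join two lists of rows on `key`, independent of the row order."""
    right_by_key = {row[key]: row for row in right}
    merged = []
    for row in sorted(left, key=lambda r: r[key]):
        combined = dict(row)
        # Rows without a partner in `right` keep only their own columns.
        combined.update(right_by_key.get(row[key], {}))
        merged.append(combined)
    return merged


# Pairing rows by position: subject B_2 receives the phenotype of subject B_1.
by_position = [dict(g, phenotype=p["phenotype"]) for g, p in zip(genotypes, phenotypes)]

# Pairing rows by key: every subject keeps its own phenotype.
by_key = match_by_key(genotypes, phenotypes, "Pat_ID")

print(by_position[0])
print(by_key[0])
```

The key-based version gives the same result however the two sheets happen to be sorted, which is the property I rely on when I use a routine instead of copying columns by hand.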
Why Excel?
I prefer Excel because it gives me the best control when working up genotype and phenotype
data. SPSS can also compose data from other formats. However, there is commonly a way to
import those into Excel as well.
Samples
The procedure I describe in this SOP can be reproduced using these files:
gSample_strata.xls (genotype samples in an Excel file)
pSample_strata.xls (phenotype samples in an Excel file)
match_Excel_data_in_SPSS_genotypes_and_phenotypes.xls (to prepare the routine’s syntax)
match_Excel_data_in_SPSS_genotypes_and_phenotypes.sps (the final routine)
repCovariance_dataSheet.spv (the documentation of the run)
repCovariance_dataSheet.sav (the output file I desired)
I prepare the Excel-files
I restrict both Excel files to a single header line. However, this may not be necessary.
The SPSS routine can sort the data by more than one criterion, e.g. by subject and also by
study center. To make use of this, I have to place the columns containing the sort criteria
first. I arrange these columns in the order in which I want to apply the sort criteria: first
column, first criterion; second column, second criterion; ...
I adjust the sort criteria: The headers of these columns must be identical in both files, e.g.
“Pat_ID”, “study_center”, … The values in the cells must also be identical in both files. The
matching command will not match e.g. B_1 to “1”, but only B_1 to B_1 and “1” to “1”.
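Before I run the routine, a quick check of the sort criteria can save a round trip. The following is a minimal Python sketch (not SPSS syntax), assuming the rows of both sheets are already available as dictionaries. The function names and the sample values are hypothetical. It lists the key combinations that occur in only one of the two files, compared exactly as written in the cells.

```python
def key_tuples(rows, keys):
    """Collect the sort-criteria values of all rows, exactly as written in the cells."""
    return {tuple(str(row[k]) for k in keys) for row in rows}


def unmatched_keys(rows_a, rows_b, keys):
    """Return the key combinations found only in rows_a and only in rows_b."""
    a = key_tuples(rows_a, keys)
    b = key_tuples(rows_b, keys)
    return sorted(a - b), sorted(b - a)


# Hypothetical rows: the phenotype sheet writes "1" where the genotype sheet writes "B_1".
genotype_rows = [
    {"Pat_ID": "B_1", "study_center": "A"},
    {"Pat_ID": "B_2", "study_center": "A"},
]
phenotype_rows = [
    {"Pat_ID": "1", "study_center": "A"},
    {"Pat_ID": "B_2", "study_center": "A"},
]

only_in_genotypes, only_in_phenotypes = unmatched_keys(
    genotype_rows, phenotype_rows, ["Pat_ID", "study_center"]
)
print("only in genotype file:", only_in_genotypes)
print("only in phenotype file:", only_in_phenotypes)
```

If both lists come back empty, every key combination has a partner in the other file. A non-empty list tells me which cells I still have to adjust in Excel.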
I prepare the sps-file (SPSS-routine)
To prepare the routine file, I prepare the syntax in an Excel file and afterwards copy and paste it into an sps-file (match_Excel_data_in_SPSS_genotypes_and_phenotypes.sps).
I follow the instructions in the first column of the xls-file.
I may save the file.
Finally, I mark and copy (ctrl+c) the boxed area.
I open SPSS. Then I select the pull-down menu “file”, then “new” and “syntax”.
I paste (right-click + insert) the text.
I may save the file.
Ready to go.
I execute the SPSS routine
To execute the routine, I open match_Excel_data_in_SPSS_genotypes_and_phenotypes.sps.
I select the pull-down menu “Run” (“Ausführen” in the German version) and then “All” (“Alle”).
This opens an output file, where I can follow the process and where warnings and errors are
documented.
Warnings: When I run my sample routine on my sample files, I get warnings in the spv-file,
which correctly point to more than one subject with data but without a subject ID in the
sample files. These warnings do not preclude use of the resulting sav-file. I do not get any
other warnings.
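To see in advance which rows will trigger such warnings, I could list the rows without a subject ID before running the routine. The following is a minimal Python sketch (not SPSS syntax), assuming the ID column is called "Pat_ID" and the rows are available as dictionaries. The function name and the sample rows are hypothetical.

```python
def rows_without_id(rows, id_column="Pat_ID"):
    """Return the 1-based data-row numbers whose ID cell is missing or blank."""
    flagged = []
    for number, row in enumerate(rows, start=1):
        value = row.get(id_column)
        if value is None or str(value).strip() == "":
            flagged.append(number)
    return flagged


# Hypothetical rows: two subjects have data but no subject ID.
sample_rows = [
    {"Pat_ID": "B_1", "genotype": "AA"},
    {"Pat_ID": "", "genotype": "AG"},
    {"Pat_ID": None, "genotype": "GG"},
    {"Pat_ID": "B_4", "genotype": "AG"},
]

print(rows_without_id(sample_rows))
```

The row numbers count data rows only, so with a single header line the corresponding Excel row is one higher.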
The last command stores the output as an spv-file, e.g. “repCovariance_dataSheet.spv”. I keep
these spv-files for traceability.
The routine stores the matched data file as an sav-file according to the last command line, e.g.
“repCovariance_dataSheet.sav”. This is the result I desired. I keep this sav-file for traceability.
I can now proceed with genotype-phenotype association analysis in the sav-file.