Genetic Study of Aspirin Responsiveness (GeneSTAR) Data Dictionary
GeneSTAR Analysis Datasets
Version 1.0, May 2008
GeneSTAR Data Coordinating and Analysis Center
Division of Statistical Genomics, Washington University School of Medicine
St. Louis, MO
February 17, 2016

GeneSTAR Analysis Data Dictionary Version 1.0

INTRODUCTION
CLINIC DATASETS
GENETICS-RELATED DATASETS
GANON DATASET
GTRIPLE DATASET
LOCMAP DATASET
GENEFREQ DATASET

INTRODUCTION

The overall objective of the GeneSTAR study is to characterize the inhibitory effect of low-dose aspirin (ASA) on agonist-induced platelet aggregation, thromboxane and ATP release, aggregation under shear conditions, and surface expression of P-selectin and CD40 ligand in 3200 subjects from 400 multi-generational families, half African American and half white. Low-dose ASA is cost-effective and efficacious for the prevention and treatment of coronary heart disease (CHD).
The benefit of ASA is thought to be related to the irreversible acetylation of platelet cyclo-oxygenase-1 (COX-1), resulting in a reduction in platelet aggregation and platelet-mediated inflammation. Considerable inter-individual variation exists in the effect of ASA on platelet function, and this variability may be related to genetic variation across individuals. Participants are high-risk siblings of patients with premature CHD (previously identified in the Johns Hopkins Sibling Study), along with their adult offspring. Platelet function and plasma inflammatory markers (C-reactive protein, interleukin-1 beta, interleukin-6, monocyte chemotactic protein-1, and matrix metalloproteinase-9) were measured at baseline and after 14 days of ASA at 81 mg/day to characterize ASA-response phenotypes. From a list of candidate genes involved in the known biochemical pathways of platelet aggregation and platelet-mediated inflammation, 20 genes will initially be selected for genotyping of 15-20 single nucleotide polymorphisms (SNPs) per gene, based on biological importance and the presence of sufficient known SNPs in coding and/or regulatory regions. After the first 1600 participants have been phenotyped, a complementary genome-wide scan of short tandem repeat (STR) markers and fine mapping of up to 5 regions of interest using SNP clusters will be performed. Based on the linkage analysis, the list of candidate genes will be re-prioritized and additional genotyping will be performed. Analyses will determine whether ASA responsiveness is heritable and whether it is associated with specific variants in candidate genes or with defined haplotypes. The results should lead to a better understanding of inter-individual variability in ASA responsiveness, including possible racial differences, and should enable genotype-tailored preventive therapy for CHD in high-risk individuals.

This document describes the analysis datasets and main variables for the GeneSTAR study.
The datasets are divided into … categories: 1) CLINIC datasets, comprising … phenotypic/clinic SAS datasets with one record per observation, and 2) GENETICS datasets, which …

GENETICS-RELATED DATASETS

The genetics datasets are summarized in the table below, followed by a detailed description of each. All genotype data and pedigree information have been through extensive quality control and are considered the final ‘analysis’ versions of the datasets.

Dataset     Structure             Comments
INFO        1 obs/subject
GANON       1 obs/subject         Includes the genotyped anonymous marker data
GTRIPLE     1 obs/subject         Triple information (Subject ID, Family ID, Father ID, Mother ID and Gender)
LOCMAP      1 obs/marker          The locus description file for every marker genotyped, including chromosome and map distance
GENEFREQ    1 obs/allele/marker   Provides the percentage of every allele per marker in the GENODATA dataset

INFO Dataset
The INFO dataset contains information about the 3250 subjects who were genotyped: subject ID, original barcode (obarcode), numerical barcode (barcoden), age, sex, and race.

GANON Dataset
The GANON dataset contains the ID of each subject in the DEMOG dataset, along with that subject's marker genotyping results. All missing marker genotypes are coded as ‘0/0’.

GTRIPLE Dataset
The GTRIPLE dataset is the genetically corrected pedigree file for all GeneSTAR subjects. Reported family relationships have been confirmed and, where necessary, corrected using the ASPEX and GRR programs, which use all marker data to determine the most likely relationship between each pair of relatives. For each subject, the GTRIPLE dataset contains the subject ID (ID), the mother's ID, the father's ID, the family ID, and gender. It records family structure in the standard pedigree format accepted by most genetic analysis programs.
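The GANON genotype coding and the GTRIPLE parent links can be combined for simple data checks. The following is a minimal Python sketch, not part of the GeneSTAR distribution, that parses GANON-style genotype codes of the form "allele/allele", treats ‘0/0’ as missing, and flags children whose genotype at one marker cannot be explained by Mendelian inheritance from the father and mother listed in GTRIPLE. The helper names (parse_genotype, mendel_consistent, flag_inconsistent), the in-memory dictionary layout, and the use of "0" for a missing parent ID are assumptions for illustration only; this dictionary does not state how founders are coded. This is a plain trio check, not the relationship-inference method used by ASPEX or GRR.

```python
from typing import Dict, List, Optional, Tuple

MISSING = "0/0"   # GANON missing-genotype code (stated in this dictionary)
NO_PARENT = "0"   # ASSUMED code for a missing father/mother ID in GTRIPLE


def parse_genotype(code: str) -> Optional[Tuple[str, str]]:
    """Split a GANON-style code such as '160/152' into an ordered allele pair.

    Returns None for the missing-data code '0/0'. Alleles are ordered as
    strings only to give each genotype one canonical form.
    """
    code = code.strip()
    if code == MISSING:
        return None
    a, b = code.split("/")
    return (a, b) if a <= b else (b, a)


def mendel_consistent(child: str, father: str, mother: str) -> bool:
    """True if the child's genotype could be inherited from the two parents.

    A missing genotype anywhere in the trio is treated as uninformative.
    """
    c, f, m = (parse_genotype(g) for g in (child, father, mother))
    if c is None or f is None or m is None:
        return True
    x, y = c
    # One allele must come from each parent, in either order.
    return (x in f and y in m) or (y in f and x in m)


def flag_inconsistent(
    triples: Dict[str, Tuple[str, str]], genotypes: Dict[str, str]
) -> List[str]:
    """Hypothetical check over GTRIPLE-like records for a single marker.

    triples:   {subject_id: (father_id, mother_id)}
    genotypes: {subject_id: GANON-style genotype code at one marker}
    """
    flagged = []
    for sid, (fid, mid) in triples.items():
        if NO_PARENT in (fid, mid):
            continue  # founder or single-parent record: no trio to check
        if sid in genotypes and fid in genotypes and mid in genotypes:
            if not mendel_consistent(genotypes[sid], genotypes[fid], genotypes[mid]):
                flagged.append(sid)
    return flagged


if __name__ == "__main__":
    triples = {"101": ("0", "0"), "102": ("0", "0"),
               "103": ("101", "102"), "104": ("101", "102")}
    genotypes = {"101": "152/160", "102": "148/156",
                 "103": "152/156", "104": "160/164"}
    print(flag_inconsistent(triples, genotypes))  # ['104']
```

In this toy family, subject 104 is flagged because neither listed parent carries allele 164; a real check would loop over every marker in LOCMAP and treat isolated inconsistencies as possible genotyping errors rather than pedigree errors.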
NEW-GTRIPLE Dataset

LOCMAP Dataset
The LOCMAP dataset provides a complete list of the markers currently genotyped in the GANON dataset, with one record per marker. Specifically, it provides the marker name, chromosome number, genetic distance (i.e., the position of each marker along the chromosome), and alias. This dataset contains information for … markers.

GENEFREQ Dataset
The GENEFREQ dataset contains the allele frequencies for each marker in the GANON dataset, with one record per allele per marker. It has three variables: marker name, allele, and PERCENT. PERCENT holds the frequency of the allele as a percentage of the total for that marker (i.e., scaled to 100).
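The PERCENT variable in GENEFREQ can be illustrated by recomputing allele percentages from GANON-style genotypes. The Python sketch below assumes that each GANON row maps marker names to genotype codes, that ‘0/0’ genotypes are excluded, and that the denominator is the total number of non-missing alleles for the marker across all rows. This dictionary does not say whether GENEFREQ was computed from all subjects or only from unrelated founders, or how PERCENT was rounded, so values from this sketch may differ from the released dataset. The function name allele_percents and the marker name M1 are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

MISSING = "0/0"  # GANON missing-genotype code


def allele_percents(
    ganon_rows: Iterable[Dict[str, str]], markers: List[str]
) -> List[Tuple[str, str, float]]:
    """Build GENEFREQ-style records (marker, allele, percent).

    ASSUMPTION: percent = allele count / all non-missing alleles * 100,
    counted over every row supplied (no founder-only restriction).
    """
    counts: Dict[str, Counter] = {m: Counter() for m in markers}
    for row in ganon_rows:
        for m in markers:
            code = row.get(m, MISSING).strip()
            if code == MISSING:
                continue  # skip missing genotypes entirely
            counts[m].update(code.split("/"))  # each genotype adds two alleles
    records: List[Tuple[str, str, float]] = []
    for m in markers:
        total = sum(counts[m].values())
        # One record per allele per marker; alleles sorted as strings.
        for allele in sorted(counts[m]):
            records.append((m, allele, 100.0 * counts[m][allele] / total))
    return records


if __name__ == "__main__":
    rows = [{"M1": "152/160"}, {"M1": "152/152"}, {"M1": "0/0"}, {"M1": "160/164"}]
    for marker, allele, pct in allele_percents(rows, ["M1"]):
        print(marker, allele, round(pct, 1))
```

With the four toy genotypes above, the ‘0/0’ row is dropped and the six remaining alleles give 152 = 50%, 160 ≈ 33.3% and 164 ≈ 16.7%, so the percentages for M1 sum to 100 as described for PERCENT. A marker with no non-missing genotypes produces no records rather than a division by zero.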