HyperGEN DATA DICTIONARY, version 3

Genetic Study of Aspirin Responsiveness (GeneSTAR)
Data Dictionary
GeneSTAR Analysis Datasets
Version 1.0, May 2008
GeneSTAR Data Coordinating and Analysis Center
Division of Statistical Genomics, Washington University School of Medicine
St. Louis, MO
February 17, 2016
GeneSTAR Analysis Data Dictionary
Version 1.0
INTRODUCTION
CLINIC DATASETS
GENETICS-RELATED DATASETS
  GANON DATASET
  GTRIPLE DATASET
  LOCMAP DATASET
  GENEFREQ DATASET
INTRODUCTION
The overall objective of the GeneSTAR study is to characterize the inhibitory effect of low
dose aspirin (ASA) on agonist-induced platelet aggregation, thromboxane and ATP release,
aggregation under shear conditions, and surface expression of P-selectin and CD40 ligand in
3200 subjects from 400 multi-generational families, half African American and half white.
Low dose aspirin (ASA) is cost effective and efficacious for the prevention and treatment of
coronary heart disease (CHD). The benefit of ASA is thought to be related to the irreversible
acetylation of platelet cyclo-oxygenase-1 (COX-1), resulting in a reduction in platelet
aggregation and platelet-mediated inflammation. Considerable inter-individual variation
exists in the effect of ASA on platelet function, and this variability may be related to genetic
variations across individuals.
Participants are high-risk siblings of patients with premature CHD (previously identified in
the Johns Hopkins Sibling Study), along with their adult offspring. Platelet function and
plasma inflammatory markers (C-reactive protein, interleukin-1 beta, interleukin-6,
monocyte chemotactic protein-1, and matrix metalloproteinase-9) were measured at baseline
and after 14 days of ASA, 81 mg/day, to characterize ASA-response phenotypes. From a list
of candidate genes involved in the known biochemical pathways of platelet aggregation and
platelet-mediated inflammation, 20 genes will be initially selected for genotyping of 15-20
single nucleotide polymorphisms (SNPs) per gene, based on biological importance and the
presence of sufficient known SNPs in coding and/or regulatory regions. After the first 1600
participants have been phenotyped, a complementary genome-wide scan of short tandem
repeat (STR) markers and fine mapping of up to 5 regions of interest will be done using SNP
clusters. Based on linkage analysis, the list of candidate genes will be re-prioritized and
additional genotyping will be performed. Analyses will be performed to determine whether
ASA responsiveness is heritable and whether it is associated with specific variations in
candidate genes or defined haplotypes. The results should lead to a better understanding of
the variability among individuals in ASA responsiveness, including possible racial
differences, and should enable genotype tailoring of preventive therapy for CHD in high-risk
individuals.
This document describes the Analysis datasets and main variables for the GeneSTAR study.
The datasets are divided into … categories: 1) CLINIC datasets, which contain ….
phenotypic/clinic SAS datasets with one record per observation, and 2) GENETICS datasets,
which
GENETICS-RELATED DATASETS
Genetic datasets are summarized in the table below followed by a detailed description. All
genotype data and pedigree information have been through extensive quality control, and are
considered the final ‘analysis’ versions of the datasets.
Datasets    Structure             Comments
INFO        1 obs/subject
GANON       1 obs/subject         Includes the genotyped anonymous marker data
GTRIPLE     1 obs/subject         Triple information (Subject ID, Family ID, Father ID, Mother ID and Gender)
LOCMAP      1 obs/marker          The locus description file for every marker genotyped, including chromosome and map distance
GENEFREQ    1 obs/allele/marker   Provides the percentage of every allele per marker in the GENODATA dataset
INFO Dataset
The INFO dataset contains information about the 3250 subjects who were genotyped. The
dataset contains subject ID, original barcode (obarcode), numerical barcode variable
(barcoden), age, sex, and race for the subjects.
GANON Dataset
The GANON dataset contains the ID for each subject in the DEMOG dataset, along with
their marker genotyping results. All missing marker data is coded as ‘0/0’.
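The missing-data convention above can be handled with a small parser. This is a minimal sketch in Python, assuming each genotype is stored as a string of two alleles separated by a slash (the form suggested by the ‘0/0’ missing code); the record layout and the marker names MARKER_A and MARKER_B are hypothetical and do not come from the GANON dataset.

```python
from typing import Optional, Tuple

MISSING = "0/0"  # missing-genotype code stated in this data dictionary


def parse_genotype(value: str) -> Optional[Tuple[str, str]]:
    """Return the two alleles, or None when the genotype is coded as missing."""
    value = value.strip()
    if value == MISSING:
        return None
    # Assumption: genotypes are written as "allele1/allele2".
    allele1, allele2 = value.split("/")
    return allele1, allele2


# Hypothetical record: a subject ID plus two made-up marker columns.
record = {"ID": "1001", "MARKER_A": "3/5", "MARKER_B": "0/0"}
parsed = {name: parse_genotype(value) for name, value in record.items() if name != "ID"}
print(parsed)
```

Mapping ‘0/0’ to None before any counting keeps missing genotypes from being treated as a real allele labelled "0".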
GTRIPLE Dataset
The GTRIPLE dataset is the genetically-corrected pedigree file for all GeneSTAR subjects.
Reported family relationships have been confirmed, and corrected, if necessary, using the
ASPEX and GRR programs. These programs use all marker data to determine the most likely
relationship between each relative pair. For each subject, the GTRIPLE dataset contains the subject ID (ID),
mother’s ID, father’s ID, family ID and gender. It records
family structure in the standard form used in most commonly accepted genetic analysis
programs.
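As an illustration of the triple structure, the sketch below models one pedigree record in Python and checks that every listed parent also has a record in the same family. The field names, the gender codes, and the use of "0" for an unknown parent are assumptions made for the example; "0" is a common convention in pedigree files but is not stated in this dictionary.

```python
from dataclasses import dataclass
from typing import List

UNKNOWN_PARENT = "0"  # assumption: conventional code for a founder's unrecorded parent


@dataclass(frozen=True)
class Triple:
    family_id: str
    subject_id: str
    father_id: str
    mother_id: str
    gender: str  # coding assumed for illustration only


def dangling_parents(rows: List[Triple]) -> List[str]:
    """Return IDs of subjects whose father or mother has no record in the same family."""
    known = {(r.family_id, r.subject_id) for r in rows}
    problems = []
    for r in rows:
        for parent in (r.father_id, r.mother_id):
            if parent != UNKNOWN_PARENT and (r.family_id, parent) not in known:
                problems.append(r.subject_id)
                break  # report each subject once
    return problems


# Made-up family: two founders, one child, and one child with an unrecorded mother.
rows = [
    Triple("F1", "1", "0", "0", "M"),
    Triple("F1", "2", "0", "0", "F"),
    Triple("F1", "3", "1", "2", "F"),
    Triple("F1", "4", "1", "9", "M"),
]
print(dangling_parents(rows))
```

A check of this kind is useful before passing a pedigree file to an analysis program, since many such programs expect every referenced parent to appear as a subject.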
NEW-GTRIPLE Dataset
LOCMAP Dataset
The LOCMAP dataset provides a complete list of the markers currently genotyped in the
GANON dataset. It has one record per marker. Specifically, this dataset provides marker
name, chromosome number, genetic distance (i.e., distance along the chromosome for each
marker), and alias. This dataset contains information for … markers.
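A typical use of a locus map is to order markers along each chromosome. The following Python sketch assumes the four variables described above are available as plain fields; the field names, the marker names, and the distance values are invented for the example, and the unit of genetic distance is not specified in this dictionary.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Locus(NamedTuple):
    marker: str
    chromosome: int
    distance: float  # genetic distance along the chromosome; unit not stated in the source
    alias: str


def order_by_chromosome(loci: List[Locus]) -> Dict[int, List[str]]:
    """Group marker names by chromosome, sorted by map distance within each chromosome."""
    by_chrom = defaultdict(list)
    for locus in sorted(loci, key=lambda item: (item.chromosome, item.distance)):
        by_chrom[locus.chromosome].append(locus.marker)
    return dict(by_chrom)


# Made-up loci for illustration.
loci = [
    Locus("MK3", 2, 10.5, "a3"),
    Locus("MK1", 1, 42.0, "a1"),
    Locus("MK2", 1, 7.3, "a2"),
]
print(order_by_chromosome(loci))
```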
GENEFREQ Dataset
The GENEFREQ dataset contains the allele frequencies for each marker from the GANON
dataset with one record per allele per marker. There are three variables in the dataset: marker
name, allele and percent. The variable PERCENT holds the actual frequency as a percent of
the total, scaled to 100.
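The calculation behind a PERCENT value can be sketched as follows. This Python example counts alleles for one marker, skips genotypes coded as ‘0/0’, and expresses each allele count as a percentage of all counted alleles. It is an illustration under stated assumptions, not the procedure used to build GENEFREQ: the "allele1/allele2" string format and the marker name are assumed, and the dictionary does not say which subjects were included in the study's own frequency estimates.

```python
from collections import Counter
from typing import List, Tuple

MISSING = "0/0"  # missing-genotype code stated in this data dictionary


def allele_percents(marker: str, genotypes: List[str]) -> List[Tuple[str, str, float]]:
    """Return one (marker, allele, percent) row per allele, with percents summing to 100."""
    counts: Counter = Counter()
    for genotype in genotypes:
        if genotype == MISSING:
            continue  # missing genotypes contribute no alleles
        counts.update(genotype.split("/"))
    total = sum(counts.values())
    return [(marker, allele, 100.0 * n / total) for allele, n in sorted(counts.items())]


# Made-up genotypes for a hypothetical marker.
rows = allele_percents("MARKER_A", ["1/2", "2/2", "0/0", "1/2"])
for row in rows:
    print(row)
```

Each returned row mirrors the one-record-per-allele-per-marker layout described above, with the three variables marker name, allele and percent.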