eMERGE Network Project Proposal for

eMERGE Network Supplemental Genotyping Project – Phenotype Description & Specifications
Platelet count (PLT) and Mean Platelet Volume (MPV)
Phenotype Description:
Last updated: 2/16/2016
Project Title
Genome- and Phenome-Wide Association Study to Identify Genetic Variants
Influencing Platelet Count (PLT) and Mean Platelet Volume (MPV)
Sites Involved
The complete blood count (CBC), typically measured by automated hematology
analyzers, is a commonly performed test in the clinical setting. PLT values were
available in the EMR for 13,582 patients with extant GWAS data from the five sites
(Marshfield Clinic, Vanderbilt University Medical Center, Group Health Cooperative,
Mayo Clinic and Northwestern University) of the eMERGE I network. MPV values
(n=6,291) were available from three sites (Marshfield Clinic, Mayo Clinic and
Northwestern University).
Background / Significance
Platelets are anucleate cell fragments derived from megakaryocytes that play key
roles in hemostasis and in the pathogenesis of atherothrombosis and cancer.
Platelet traits are highly heritable. Identifying genetic variants associated
with platelet traits and assessing their pleiotropic effects may help clarify
the role of the underlying biological pathways.
Outline of Project
We propose to conduct an electronic medical record (EMR)-based genomics study
to: 1) identify common variants that influence inter-individual variation in the
number of circulating platelets (PLT) and mean platelet volume (MPV) by
performing a genome-wide association study (GWAS); 2) assess pleiotropic effects
of such variants by performing a phenome-wide association study (PheWAS) with a
wide range of EMR-derived phenotypes; and 3) characterize the variants
influencing MPV and PLT using functional, pathway and disease enrichment
analyses.
1. Understand genetic variants associated with PLT and MPV
2. Perform genome-wide and phenome-wide association study
3. Higher-order SNP associations (haplotype & epistasis)
4. Pathway enrichment (biological prior)
5. Demonstrate benefit of genetics to clinical model
Planned Statistical Analyses
Clinical data inspection
When multiple measurements of platelet traits were available for an individual
patient, we chose the median value and the corresponding age for the genetic
analyses.
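The median-selection rule above can be sketched as follows. This is a minimal illustration, assuming each patient's measurements are available as (age, value) pairs. The function name `median_measurement` and the handling of an even number of measurements (returning the lower of the two middle observations, so that the reported age belongs to a real measurement) are assumptions for illustration and are not specified in the proposal.

```python
from typing import List, Tuple

def median_measurement(records: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Return the (age, value) pair that holds the median lab value.

    records: (age_at_measurement, lab_value) pairs for one patient.
    With an even number of records, the lower of the two middle
    observations is returned (an assumption, see the text above).
    """
    if not records:
        raise ValueError("patient has no measurements")
    ordered = sorted(records, key=lambda r: r[1])  # sort by lab value
    return ordered[(len(ordered) - 1) // 2]

# Hypothetical patient with three PLT measurements (age in years, PLT in K/uL).
plt_records = [(54.0, 210.0), (56.5, 250.0), (58.0, 232.0)]
age_at_median, median_plt = median_measurement(plt_records)
```

Returning an observed measurement rather than an interpolated median keeps the "corresponding age" well defined.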
Association and Imputation:
We will use the efficient mixed-model association expedited (EMMAX) algorithm to
correct for sample relatedness and cryptic population substructure. The
identity-by-state (IBS) matrix will be calculated for each pair of individuals
using the genome-wide genotype data. The generalized least squares F-test will be
used to estimate the regression coefficient (β) and perform association analyses,
as implemented in the R ‘emmax’ package, with adjustment for age, sex, and site.
The study has 80% power to detect a quantitative trait locus that explains 0.29%
of the variance in PLT and 0.63% in MPV, given sample sizes of 13,582 and 6,291,
respectively, and a significance level of 5E-8. We will impute all non-genotyped
SNPs on the 22 autosomal chromosomes based on the HapMap II CEU database
(release 21). Association testing of imputed SNPs will be performed with EMMAX
using the same IBS matrix computed from the genotyped SNPs.
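The power figures quoted above can be sanity-checked with a short calculation. The sketch below is an approximation under stated assumptions, not the method used in the proposal: it treats the association statistic as a 1-degree-of-freedom chi-square test with non-centrality n·h²/(1−h²), where h² is the fraction of trait variance explained by the locus. The function name `qtl_power` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def qtl_power(n: int, var_explained: float, alpha: float = 5e-8) -> float:
    """Approximate power of a 1-df additive association test.

    Assumes a chi-square statistic with 1 df and non-centrality
    n * h2 / (1 - h2), where h2 is the variance explained by the locus.
    """
    std_normal = NormalDist()
    z_crit = -std_normal.inv_cdf(alpha / 2)  # two-sided critical value
    shift = sqrt(n * var_explained / (1.0 - var_explained))
    # P(|Z + shift| > z_crit) for Z ~ N(0, 1)
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)

power_plt = qtl_power(13582, 0.0029)  # PLT: n = 13,582, 0.29% of variance
power_mpv = qtl_power(6291, 0.0063)   # MPV: n = 6,291, 0.63% of variance
```

Under these assumptions both values come out close to the stated 80%.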
Other analyses: PheWAS will be performed with PLINK using logistic regression
analyses that adjust for age and gender and assume an additive genetic model.
Functional, pathway and disease enrichment analysis:
To understand the functional similarity and diversity of the genes implicated by
the association analyses, we will use a bioinformatics analysis pipeline that
combines three tools: Gene Ontology and protein-class enrichment analysis using
Protein ANalysis THrough Evolutionary Relationships (PANTHER), molecular event
(pathway) enrichment analysis using Reactome, and complex disease enrichment
analysis using Functional Disease Ontology.
Ethical considerations
Potential risks to the participants from the focus groups are of a
social/psychological nature. These risks include breach of confidentiality and the
potential for psychological discomfort, such as anxiety or stress, raised by the
topics in the focus group vignettes or in open discussions of the consensus panel.
Some individuals may find it inconvenient to attend the scheduled sessions. There
are no physical risks involved.
Phenotype Specifications:
Lab Results with the following components (where available):
1. Platelet Count (PLT) (K/uL)
2. Mean Platelet Volume (MPV)
Person exclusion criteria:
Samples with PLT values ≤ 100 or ≥ 600 (×10^9/L) were excluded. Of the remaining PLT and MPV lab values, the
median value was extracted for each study participant and included in further analyses.
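The exclusion rule above can be sketched as a simple range filter followed by a per-participant median. This is an illustration only: it assumes the cutoffs are applied to each individual PLT measurement (the text does not say whether exclusion is per measurement or per participant), and the names `PLT_LOWER`, `PLT_UPPER` and `keep_plt_values` are hypothetical.

```python
from statistics import median
from typing import Iterable, List

PLT_LOWER = 100.0  # x10^9/L; values at or below this are excluded
PLT_UPPER = 600.0  # x10^9/L; values at or above this are excluded

def keep_plt_values(values: Iterable[float]) -> List[float]:
    """Keep PLT values strictly between the two cutoffs.

    Assumption: the filter is applied per measurement, not per participant.
    """
    return [v for v in values if PLT_LOWER < v < PLT_UPPER]

# Hypothetical PLT measurements (x10^9/L) for one participant.
raw_values = [95.0, 180.0, 220.0, 600.0, 240.0]
kept_values = keep_plt_values(raw_values)
participant_median = median(kept_values) if kept_values else None
```

A participant whose measurements all fall outside the range ends up with no median and would drop out of the analysis under this reading.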
Other Repeated Measures/Events Associated Temporally with Lab Results:
After exclusions, 13,424 patients of European ancestry from the five eMERGE sites had PLT data; 6,291 of these
patients also had MPV data. When multiple measurements of platelet traits were available for an individual
patient, we chose the median value and the corresponding age for the genetic analyses.
Subject Table:
Variables:
Gender
(NA=Not Assessed; Missing=.; F=Female; M=Male; U=Unknown)
Race
(NA=Not Assessed; Missing=.; 0=Black or African American; 1=American Indian or Alaska Native;
2=Asian; 3=White; 4=Native Hawaiian or other Pacific Islander; 5=Other; 6=Unknown)
Ethnicity
(NA=Not Assessed; Missing=.; 1=Hispanic or Latino; 0=Not Hispanic or Latino; 99=Unknown)
Decade of birth
(1=Before 1920; 2=1920-1929; 3=1930-1939; 4=1940-1949; 5=1950-1959; 6=1960-1969; 7=1970-1979; 8=1980-1989; 9=1990+; 99=Unknown)
Principal Components 1-3 (for genetically determined ancestry)
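As an illustration of the decade-of-birth coding listed above, a birth year could be mapped to its code as sketched below. The function name `decade_of_birth_code` and the use of `None` for an unknown birth year are assumptions made for this example.

```python
from typing import Optional

def decade_of_birth_code(birth_year: Optional[int]) -> int:
    """Map a birth year to the decade-of-birth code in the subject table.

    1 = before 1920, 2 = 1920-1929, ..., 8 = 1980-1989, 9 = 1990 or later,
    99 = unknown (represented here by None, an assumption).
    """
    if birth_year is None:
        return 99
    if birth_year < 1920:
        return 1
    if birth_year >= 1990:
        return 9
    return (birth_year - 1920) // 10 + 2

# Hypothetical birth years and their codes.
example_codes = [decade_of_birth_code(y) for y in (1915, 1920, 1975, 1995, None)]
```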



APPENDIX
NONE