
GWAS of CHD in African Americans
Genome-wide association study of coronary heart disease and its risk
factors in 8,090 African Americans: The NHLBI CARe Project
Text S1
Guillaume Lettre1,2, Cameron D. Palmer3,4,#, Taylor Young3,#, Kenechi G. Ejebe3,#, Hooman Allayee5,
Emelia J. Benjamin6,7, Franklyn Bennett8, Donald W. Bowden9, Aravinda Chakravarti10, Al
Dreisbach11, Deborah N. Farlow3, Aaron R. Folsom12, Myriam Fornage13, Terrence Forrester8, Ervin
Fox11, Christopher A. Haiman5, Jaana Hartiala5, Tamara B. Harris14, Stanley L. Hazen15, Susan R.
Heckbert16,17,18, Brian E. Henderson5, Joel N. Hirschhorn3,4,19, Brendan J. Keating20, Stephen B.
Kritchevsky21, Emma Larkin22, Mingyao Li23, Megan E. Rudock24, Colin A. McKenzie25, James B.
Meigs19,26, Yang A. Meng3, Tom H. Mosley Jr11, Anne B. Newman27, Christopher H. Newton-Cheh3,7,19,28,29, Dina N. Paltoo30, George J. Papanicolaou30, Nick Patterson3, Wendy S. Post31, Bruce M.
Psaty16,17,18, Atif N. Qasim32, Liming Qu23, Daniel J. Rader32,33, Susan Redline23, Muredach P. Reilly32,33,
Alexander P. Reiner34, Stephen S. Rich35, Jerome I. Rotter36, Yongmei Liu24, Peter Shrader26, David S.
Siscovick16,17, W.H. Wilson Tang15, Herman A. Taylor Jr.11,37,38, Russell P. Tracy39, Ramachandran S.
Vasan6,7, Kevin M. Waters5, Rainford Wilks40, James G. Wilson11,41, Richard R. Fabsitz30, Stacey B.
Gabriel3, Sekar Kathiresan3,7,19,28,29, Eric Boerwinkle42
1. Montreal Heart Institute, 5000 Bélanger Street, Montréal, Québec, H1T 1C8, Canada
2. Département de Médecine, Université de Montréal, C.P. 6128, succursale Centre-ville, Montréal, Québec, H3C 3J7,
Canada
3. Program in Medical and Population Genetics, Broad Institute, 7 Cambridge Center, Cambridge, MA 02142, USA
4. Divisions of Genetics and Endocrinology and Program in Genomics, Children’s Hospital Boston, Boston, MA 02115, USA
5. Department of Preventive Medicine, USC Keck School of Medicine, Los Angeles, CA 90033, USA
6. Department of Medicine, Boston University Schools of Medicine and Epidemiology, Boston, MA 02215, USA
7. Framingham Heart Study of the National Heart, Lung, and Blood Institute and Boston University, Framingham,
Massachusetts 01702, USA
8. Tropical Medicine Research Institute, University of the West Indies, Mona, Kingston 7, Jamaica
9. Wake Forest University School of Medicine, Winston-Salem, NC 27157, USA
10. McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21287,
USA.
11. Department of Medicine, University of Mississippi Medical Center, Jackson, MS 39216, USA
12. Division of Epidemiology and Community Health, University of Minnesota, Minneapolis, MN 55454, USA
13. Institute of Molecular Medicine and Division of Epidemiology School of Public Health, University of Texas Health
Sciences Center at Houston, 1825 Pressler Street SRB 530.G, Houston, TX, 77030, USA
14. Laboratory of Epidemiology, Demography, and Biometry, National Institute on Aging, Bethesda, MD, 20892, USA
15. Departments of Cell Biology and Cardiovascular Medicine, The Center for Cardiovascular Diagnostics & Prevention,
Cleveland Clinic, Cleveland, Ohio 44195, USA
16. Departments of Medicine and Epidemiology, University of Washington, Seattle, Washington 98195, USA
17. Cardiovascular Health Research Unit, University of Washington, Seattle, Washington 98101, USA
18. Group Health Research Institute, Group Health Cooperative, Seattle, WA 98124, USA
19. Department of Medicine, Harvard Medical School, Boston, MA 02115, USA
20. Center for Applied Genomics, Abramson Building, Children’s Hospital of Philadelphia, PA 19104, USA
21. J. Paul Sticht Center on Aging, Division of Gerontology and Geriatric Medicine, Wake Forest University School of
Medicine, Winston-Salem, NC 27157, USA
22. Case Western Reserve University, Center for Clinical Investigation, Iris S. & Bert L. Wolstein Building, 2103 Cornell
Road, Room 6129, Cleveland, OH 44106-7291, USA
23. Biostatistics and Epidemiology, University of Pennsylvania, Philadelphia, PA 19104, USA
24. Department of Epidemiology and Prevention, Division of Public Health Sciences, Wake Forest University School of
Medicine, Winston-Salem, NC 27157, USA
25. Tropical Metabolism Research Unit, Tropical Medicine Research Institute, University of the West Indies, Mona,
Kingston 7, Jamaica
26. General Medicine Division, Massachusetts General Hospital, Boston, MA 02114, USA
27. Center for Aging and Population Health, Department of Epidemiology, University of Pittsburgh, Pittsburgh, PA 15261,
USA
28. Center for Human Genetic Research, Massachusetts General Hospital, Boston, Massachusetts 02114, USA.
29. Cardiovascular Research Center and Cardiology Division, Massachusetts General Hospital, Boston, Massachusetts
02114, USA
30. National Heart, Lung, and Blood Institute (NHLBI), Division of Cardiovascular Sciences, NIH, Bethesda, MD 20892, USA
31. Division of Cardiology, the Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA
32. The Cardiovascular Institute, University of Pennsylvania, Philadelphia, PA 19104, USA
33. The Institute for Translational Medicine and Therapeutics, School of Medicine, University of Pennsylvania,
Philadelphia, PA 19104, USA
34. University of Washington, Department of Epidemiology, Box 357236, Seattle, WA 98195, USA
35. Center for Public Health Genomics, University of Virginia, 6111 West Complex, PO Box 800717, Charlottesville, VA
22908-0717, USA
36. Medical Genetics Institute, Cedars-Sinai Medical Center, Los Angeles, CA 90048, USA
37. Jackson State University, Jackson, MS 39217, USA
38. Tougaloo College, Tougaloo, MS 39174, USA
39. Departments of Pathology and Biochemistry, University of Vermont, 208 S. Park Drive, suite 2, Colchester, VT 05446,
USA
40. Epidemiology Research Unit, Tropical Medicine Research Institute, University of the West Indies, Mona, Kingston 7,
Jamaica
41. G. V. (Sonny) Montgomery Veterans Affairs Medical Center, MS 39216, USA
42. Human Genetics Center and Institute of Molecular Medicine and Division of Epidemiology, University of Texas Health
Science Center, Houston, Texas 77030, USA
#These authors contributed equally to this work
Correspondence to:
Eric Boerwinkle
The University of Texas School of Public Health
1200 Herman Pressler, Suite E451
Houston, TX 77030
USA
Phone: 713-500-9914
Fax: 713-500-0900
Email: Eric.Boerwinkle@uth.tmc.edu
Table of Contents
1. COHORT DEMOGRAPHICS
A) DESCRIPTION OF THE CARE AFRICAN-AMERICAN SAMPLES USED IN THE GENOME-WIDE ASSOCIATION STUDIES.
I. ATHEROSCLEROSIS RISK IN COMMUNITIES (ARIC).
II. CORONARY ARTERY RISK DEVELOPMENT IN YOUNG ADULTS (CARDIA).
III. CLEVELAND FAMILY STUDY (CFS).
IV. JACKSON HEART STUDY (JHS).
V. MULTI-ETHNIC STUDY OF ATHEROSCLEROSIS (MESA).
B) DESCRIPTION OF THE AFRICAN-AMERICAN SAMPLES USED FOR REPLICATION.
I. MULTIETHNIC COHORT (MEC), TYPE-2 DIABETES SUBSET.
II. CLEVELAND CLINIC.
III. PENNCATH (CORONARY ANGIOGRAPHIC CASE-CONTROL STUDY).
IV. NATIONAL HEALTH AND NUTRITION EXAMINATION SURVEY III (NHANES III).
V. JAMAICAN SPANISH TOWN AND GXE COHORTS.
VI. HEALTH, AGING, AND BODY COMPOSITION (HEALTH ABC) STUDY.
2. PHENOTYPE DEFINITIONS AND MODELING
A) CORONARY HEART DISEASE (CHD).
B) TYPE-2 DIABETES (T2D).
C) HYPERTENSION (HTN).
D) LOW-DENSITY LIPOPROTEIN CHOLESTEROL (LDL-C).
E) HIGH-DENSITY LIPOPROTEIN CHOLESTEROL (HDL-C).
F) SMOKING.
3. GENOTYPING AND QUALITY CONTROL
A) GENOME-WIDE ASSOCIATION STUDIES.
B) REPLICATION GENOTYPING.
4. PRINCIPAL COMPONENT ANALYSIS (PCA)
5. GENOTYPE IMPUTATION
6. GENETIC ASSOCIATION ANALYSIS
A) ADDITIONAL SAMPLE FILTERING.
B) SNP ASSOCIATION.
I- COHORTS WITH UNRELATED SAMPLES.
II- COHORTS WITH RELATED SAMPLES.
C) META-ANALYSIS.
7. ADMIXTURE ASSOCIATION
A) CREATING A PANEL OF ANCESTRY INFORMATIVE MARKERS (AIMS).
B) LOCAL ANCESTRY ESTIMATION.
C) ASSOCIATION ANALYSIS USING LOCAL ANCESTRY ESTIMATES.
8. REFERENCES
9. SUPPLEMENTARY FIGURE LEGENDS
1. Cohort demographics
A) Description of the CARe African-American samples used in the genome-wide association
studies.
i. Atherosclerosis Risk in Communities (ARIC).
The ARIC study is a prospective population-based study of atherosclerosis and cardiovascular
diseases in 15,792 men and women, including 11,478 non-Hispanic whites and 4,314 African
Americans, drawn from 4 U.S. communities (suburban Minneapolis, Minnesota; Washington
County, Maryland; Forsyth County, North Carolina, and Jackson, Mississippi). In the first three
communities, the sample reflects the demographic composition of the community. In Jackson, only
black residents were enrolled. Because of the design and focus of the CARe Project, only self-reported African-American participants are included in this analysis. Participants were aged
45 to 64 years at their baseline examination in 1987-1989, when blood was drawn for DNA
extraction and participants consented to genetic testing. After taking into account availability of
adequate amounts of high quality DNA, appropriate informed consent and genotyping quality
control and assurance procedures, genotype data were available on 2,989 African-American
individuals. Blood was drawn in the morning after an overnight fast. Total cholesterol, HDL-cholesterol and glucose levels were measured by standard enzymatic methods. LDL-cholesterol
was calculated by the method of Friedewald et al. [1]. Prevalent diabetes mellitus was defined as a
fasting glucose level ≥126 mg/dL, non-fasting glucose ≥200 mg/dL, or a history of treatment for
diabetes.
Seated blood pressure was measured three times with a random-zero
sphygmomanometer and the last two measurements were averaged. Prevalent hypertension was
defined as a systolic blood pressure ≥140 mmHg, a diastolic blood pressure ≥90 mmHg, or current
use of antihypertensive medications. Information about medical history and cigarette smoking was
elicited from standardized and validated interviewer-administered questionnaires. For the
analyses presented here, smoking was defined as current smoking. Prevalent CHD was defined as
self-reported myocardial infarction or revascularization procedures.
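As a minimal sketch of the ARIC blood-pressure reduction described above, the helper below discards the first of three seated readings and averages the remaining two. The function name `aric_seated_bp` and the input layout (a list of (systolic, diastolic) tuples in mmHg, in the order taken) are illustrative assumptions, not part of the ARIC protocol.

```python
from statistics import mean


def aric_seated_bp(readings):
    """Average the last two of three seated (systolic, diastolic) readings, in mmHg.

    Hypothetical helper: `readings` is assumed to hold the three
    random-zero sphygmomanometer measurements in the order they were taken.
    """
    if len(readings) != 3:
        raise ValueError("expected exactly three seated readings")
    last_two = readings[1:]  # the first reading is discarded
    systolic = mean(sbp for sbp, _ in last_two)
    diastolic = mean(dbp for _, dbp in last_two)
    return systolic, diastolic
```

The averaged values are the ones to which the ≥140 mmHg systolic and ≥90 mmHg diastolic hypertension thresholds would then be applied.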
ii. Coronary Artery Risk Development in Young Adults (CARDIA).
The CARDIA study is a prospective, multi-center investigation of the natural history and etiology
of cardiovascular disease in African Americans and whites 18-30 years of age at the time of initial
examination. The initial examination included 5,115 participants selectively recruited to represent
proportionate racial, gender, age, and education groups from four communities: Birmingham, AL;
Chicago, IL; Minneapolis, MN; and Oakland, CA. Participants from the Birmingham, Chicago, and
Minneapolis centers were recruited from the total community or from selected census tracts.
Participants from the Oakland center were randomly recruited from the Kaiser-Permanente health
plan membership. Details of the study design have been published [2]. From the time of initiation of
the study in 1985-1986, six follow-up examinations have been conducted, at years 2, 5, 7, 10, 15,
and 20. DNA extraction for genetic studies was performed at the year 10 examination. After taking into
account availability of adequate amounts of high quality DNA, appropriate informed consent and
genotyping quality control and assurance procedures, genotype data were available on 955 African-American individuals.
Each participant’s age, race, and sex were self-reported during the recruitment phase and verified
during the baseline clinic visit. Blood samples were drawn after an overnight fast. Total plasma
cholesterol, triglycerides, HDL-, and LDL-cholesterol were measured according to standardized
methods [3]. LDL-C was estimated using the Friedewald equation [1]. Blood pressure was measured
at each exam on the right arm using a random-zero sphygmomanometer with the participant
seated, after a 5-minute rest. Systolic and diastolic pressures were recorded as Phase I and
Phase V Korotkoff sounds. Three measurements were taken at one-minute intervals. The average of
the second and third measurements was taken as the blood pressure value. Plasma glucose was
measured by the hexokinase method. Prevalent diabetes mellitus was defined as a fasting glucose
level ≥126 mg/dL, non-fasting glucose ≥200 mg/dL, or a history of treatment for diabetes.
Participants’ tobacco use and medical history were obtained from validated questionnaires. Details
of the procedures for data collection have been previously described [2].
iii. Cleveland Family Study (CFS).
The Cleveland Family Study (CFS) is a family-based, longitudinal study designed to characterize
the genetic and non-genetic risk factors for sleep apnea. In total, 2,534 individuals (46% African
American) from 352 families were studied on up to 4 occasions over a period of 16 years (1990-2006). The initial aim of the study was to quantify the familial aggregation of sleep apnea. Over
time, the aims were expanded to characterize the natural history of sleep apnea, sleep apnea
outcomes, and to identify the genetic basis for sleep apnea. With subsequent exams, the cohort was
expanded to include increased minority representation and additional family members. The total
sample included index probands (n=275) who were recruited from 3 area hospital sleep centers if
they had a confirmed diagnosis of sleep apnea and at least 2 first-degree relatives available to be
studied. In the first 5 years of the study, neighborhood control probands (n=87) with at least 2
living relatives available for study were selected at random from a list provided by the index family.
All available first-degree relatives and spouses of the case and control probands were recruited.
Second-degree relatives, including half-sibs, aunts, uncles and grandparents, were also included if
they lived near the first degree relatives (cases or controls), or if the family had been found to have
two or more relatives with sleep apnea. The sample, which is enriched for individuals with sleep
apnea, also contains a high prevalence of individuals with sleep apnea-related traits, including:
obesity, impaired glucose tolerance, and hypertension. Data that were used for the CARe analyses
were for individuals in whom DNA had been collected (i.e., over the last two exam cycles; n=1,447).
After genotyping quality control, genotype data were available for 632 African Americans.
Phenotype data were collected over as many as 4 exam cycles, each occurring ~every 4 years. The
last three exams targeted all subjects who had been studied at earlier exams, as well as new
minority families and family members of previously studied probands who had been unavailable at
prior exams. The phenotype data used in the current analysis were from the 4th exam (2001-06)
conducted in 736 subjects, with oversampling of minorities and individuals in whom prior
microsatellite genome scans had been conducted. This exam was conducted in a General Clinical
Research unit, and included an overnight in-laboratory study, and more detailed measurements of
sleep using full polysomnography; blood sampling before bed, in the morning after an overnight
fast, and after an oral glucose challenge test. Total cholesterol, HDL-cholesterol and glucose levels
were measured by standard enzymatic methods. LDL-cholesterol was calculated by the method of
Friedewald [1]. Prevalent diabetes mellitus was defined as a fasting glucose level ≥126 mg/dL, non-fasting glucose ≥200 mg/dL, or a history of treatment for diabetes. Blood pressure was measured 9
times using a mercury sphygmomanometer over 3 sessions (one supine and two sitting at each
session). Prevalent hypertension was defined as a systolic blood pressure ≥140 mmHg, a diastolic
blood pressure ≥90 mmHg, or current use of antihypertensive medications. Information about
medical history and cigarette smoking was elicited from standardized self-administered
questionnaires. For the analyses presented here, smoking was defined as current smoking.
Prevalent CHD was defined as self-reported myocardial infarction or revascularization procedures.
iv. Jackson Heart Study (JHS).
The Jackson Heart Study (JHS) is a prospective population-based study to seek the causes of the
high prevalence of common complex diseases among African Americans in the Jackson, Mississippi
metropolitan area, including cardiovascular disease, type-2 diabetes, obesity, chronic kidney
disease, and stroke [4]. During the baseline examination period (2000-2004), 5,301 self-identified
African Americans were recruited from four sources, including (1) randomly sampled households
from a commercial listing; (2) ARIC participants; (3) a structured volunteer sample that was
designed to mirror the eligible population; and (4) a nested family cohort. Unrelated participants
were between 35 and 84 years old, and members of the family cohort were ≥ 21 years old when
consent for genetic testing was obtained and blood was drawn for DNA extraction. Based on DNA
availability, appropriate informed consent, and genotyping results that met quality control
procedures, genotype data were available for 3,030 individuals, including 885 who are also ARIC
participants. In the current study, JHS participants who were also enrolled in the ARIC study were
analyzed with the ARIC dataset – for this reason, the JHS dataset analyzed here had 2,145
individuals. Many key aspects of the JHS were modeled on, and are essentially identical to, the
methods used in the ARIC study (see above), including phlebotomy procedures, blood pressure
measurement, laboratory methods for lipids, cholesterol, and glucose analysis, definitions of
prevalent diabetes mellitus and hypertension, and survey methods and definitions related to
medical history, cigarette smoking, and CHD.
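The ARIC/JHS overlap handling amounts to a set difference on participant identifiers, so that no individual is counted in both datasets. The sketch below assumes both studies can be linked through a shared identifier; the function name and the IDs used with it are hypothetical. With 3,030 genotyped JHS participants, 885 of whom are also ARIC participants, 3,030 - 885 = 2,145 remain in the JHS dataset.

```python
from __future__ import annotations


def jhs_analysis_ids(jhs_ids: set[str], aric_ids: set[str]) -> set[str]:
    """Return the JHS participants analyzed with the JHS dataset.

    Hypothetical helper: anyone also enrolled in ARIC is removed here
    because they are analyzed with the ARIC dataset instead.
    """
    return jhs_ids - aric_ids
```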
v. Multi-Ethnic Study of Atherosclerosis (MESA).
The Multi-Ethnic Study of Atherosclerosis (MESA) is a study of the characteristics of subclinical
cardiovascular disease (disease detected non-invasively before it has produced clinical signs and
symptoms) and the risk factors that predict progression to clinically overt cardiovascular disease or
progression of the subclinical disease. MESA researchers study a diverse, population-based sample
of 6,814 asymptomatic men and women aged 45-84 at baseline. Approximately 38% of the
recruited participants are white, 28% African-American, 22% Hispanic, and 12% Asian,
predominantly of Chinese descent. For the current study, after taking into account availability of
adequate amounts of high quality DNA, appropriate informed consent and genotyping quality
control and assurance procedures, genotype data were available on 1,646 African-American
individuals.
Participants were recruited from six field centers across the United States: Wake Forest
University, Columbia University, Johns Hopkins University, University of Minnesota, Northwestern
University and University of California - Los Angeles. Each participant received an extensive
physical exam to determine coronary calcification, ventricular mass and function, flow-mediated
endothelial vasodilation, carotid intimal-medial wall thickness and presence of echogenic lucencies
in the carotid artery, lower extremity vascular insufficiency, arterial wave forms,
electrocardiographic (ECG) measures, standard coronary risk factors, socio-demographic factors,
lifestyle factors, and psychosocial factors. Selected repetition of subclinical disease measures and
risk factors at follow-up visits allows study of the progression of disease. Blood samples are being
assayed for putative biochemical risk factors and stored for case-control studies. DNA is extracted
and lymphocytes immortalized for study of candidate genes and genome-wide scanning.
Participants are followed for identification and characterization of cardiovascular disease events,
including acute myocardial infarction and other forms of coronary heart disease (CHD), stroke, and
congestive heart failure; for cardiovascular disease interventions; and for mortality.
In addition to the six Field Centers, MESA involves a Coordinating Center, a Central Laboratory,
and Central Reading Centers for Computed Tomography (CT), Magnetic Resonance Imaging (MRI),
Ultrasound, and Electrocardiography (ECG). Protocol development, staff training, and pilot testing
were performed in the first 18 months of the study. The first examination took place over two
years, from July 2000 to July 2002. It was followed by three additional examination periods:
September 2002 to January 2004, February 2004 to July 2005, and September 2005 to May 2007.
Participants are contacted every 9 to 12 months throughout the study to assess clinical morbidity
and mortality. NHLBI recently funded MESA II, which will bring all participants back for a fifth
exam, starting in April 2010.
B) Description of the African-American samples used for replication.
i. Multiethnic Cohort (MEC), type-2 diabetes subset.
The MEC consists of 215,251 men and women, and comprises mainly five self-reported racial/ethnic
populations: European Americans, African Americans, Latinos, Japanese Americans and Native
Hawaiians [5]. Between 1993 and 1996, adults between 45 and 75 years old were enrolled by completing a
26-page, self-administered questionnaire asking detailed information about dietary habits, demographic
factors, level of education, personal behaviors, and history of prior medical conditions (e.g. diabetes).
Potential cohort members were identified through Department of Motor Vehicles drivers’ license files,
voter registration files and Health Care Financing Administration data files. In 2001, a short follow-up
questionnaire was sent to update information on dietary habits, as well as to obtain information about new
diagnoses of medical conditions since recruitment. Between 2003 and 2007, we re-administered a
modified version of the baseline questionnaire. All questionnaires inquired about history of diabetes,
without specification as to type (1 vs. 2). Between 1995 and 2004, blood specimens were collected from
~67,000 MEC participants at which time a short questionnaire was administered to update certain
exposures, and collect current information about medication use.
African Americans in the MEC were sampled primarily from California, with the majority from Los
Angeles County. Cohort members in California are linked each year to the California Office of Statewide
Health Planning and Development (OSHPD) hospitalization discharge database which consists of
mandatory records of all in-patient hospitalizations at most acute-care facilities in California. Records
include information on the principal diagnosis plus up to 24 other diagnoses (coded according to ICD-9),
including T1D and T2D. Information from this database was utilized to assess the percentage of T2D
controls (as defined below) with undiagnosed T2D, as well as the percentage of identified diabetes cases
with T1D rather than T2D. Based on the OSHPD database, <3% of T2D cases had a previous diagnosis of
T1D. We did not use this source to identify T2D cases because it did not include information on
diabetes medications, one of our inclusion criteria for cases (see below).
Diabetic cases in the MEC were defined using the following criteria: (a) a self-report of diabetes on the
baseline questionnaire, 2nd questionnaire or 3rd questionnaire; and (b) self-report of taking medication for
T2D at the time of blood draw; and (c) no diagnosis of T1D in the absence of a T2D diagnosis from the
OSHPD. Controls were defined as: (a) no self-report of diabetes on any of the questionnaires while
having completed a minimum of 2 of the 3 (79% of controls returned all 3 questionnaires); and (b) no use
of medications for T2D at the time of blood draw; and (c) no diabetes diagnosis (type 1 or 2) from the
OSHPD registry. To preserve DNA for genetic studies of cancer in the MEC, subjects with an incident
cancer diagnosis at time of selection for this study were excluded. Controls were frequency matched to
cases on age at entry into the cohort (5-year age groups).
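The MEC case and control criteria above combine several sources, so a rule-by-rule sketch may help make the logic explicit. The record layout and field names below are hypothetical (they are not MEC variable names), and placing the 5-year band edges at multiples of 5 is an assumption; the text only says 5-year age groups were used.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class MECParticipant:
    # Hypothetical record layout; field names are illustrative only.
    diabetes_reports: list[Optional[bool]]  # baseline, 2nd, 3rd questionnaire; None = not returned
    t2d_meds_at_blood_draw: bool
    oshpd_t1d: bool  # T1D diagnosis in the OSHPD discharge database
    oshpd_t2d: bool  # T2D diagnosis in the OSHPD discharge database
    incident_cancer_at_selection: bool
    age_at_entry: int


def mec_status(p: MECParticipant) -> Optional[str]:
    """Return "case", "control", or None when neither definition is met."""
    if p.incident_cancer_at_selection:
        return None  # excluded to preserve DNA for cancer studies
    returned = [r for r in p.diabetes_reports if r is not None]
    reported_diabetes = any(returned)
    # Case: (a) any self-report, (b) T2D medication, (c) no T1D without T2D in OSHPD.
    if (reported_diabetes and p.t2d_meds_at_blood_draw
            and not (p.oshpd_t1d and not p.oshpd_t2d)):
        return "case"
    # Control: >=2 questionnaires with no self-report, no T2D medication, no OSHPD diabetes.
    if (len(returned) >= 2 and not reported_diabetes and not p.t2d_meds_at_blood_draw
            and not p.oshpd_t1d and not p.oshpd_t2d):
        return "control"
    return None


def age_group(age_at_entry: int) -> int:
    """Lower edge of the 5-year age band used for frequency matching (edges assumed)."""
    return (age_at_entry // 5) * 5
```

Frequency matching would then draw controls so that their distribution over `age_group` values mirrors that of the cases.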
ii. Cleveland Clinic.
The Cleveland Clinic GeneBank study is a hospital-based study of ~10,000 patients recruited
between 2001 and 2006 and provides an ongoing focus for analyzing the association of biochemical and
genetic factors with coronary atherosclerosis in a consecutive cohort of patients undergoing
elective cardiac evaluation. Enrollment criteria included stable patients undergoing coronary
angiography and ability to give informed consent. Ethnicity was self-reported and information
regarding demographics, medical history, and medication use was obtained by patient interviews
and confirmed by chart reviews. All clinical outcome data were verified by source documentation.
CAD was defined as adjudicated diagnoses of stable or unstable angina, MI (adjudicated definition
based on defined electrocardiographic changes or elevated cardiac enzymes), angiographic
evidence of ≥ 50% stenosis of one or more major epicardial vessels, and/or a history of known CAD
(documented MI, CAD, or history of revascularization). Fasting blood was drawn and plasma,
serum, and DNA buffy coats were isolated. A lipid profile and complete metabolic panel were
assayed on all samples. A total of 719 African-American men and women were selected from the entire
GeneBank cohort for genotyping and analysis as a replication dataset for the present study.
iii. PennCATH (Coronary Angiographic Case-Control Study).
PennCATH is a University of Pennsylvania (U. Penn) Medical Center based coronary angiographic
study that has been used previously for replication of novel genes and risk factors for
atherosclerotic CVD and type-2 diabetes [6-8]. Briefly, PennCATH, recruited between July 1998 and
March 2003, provides an ongoing focus for analyzing the association of biochemical and genetic
factors with coronary atherosclerosis in a consecutive cohort of patients undergoing cardiac
catheterization and coronary angiography. A total of 3,850 subjects provided written informed
consent in a Penn Institutional Review Board approved protocol. Enrollment criteria included any
clinical indication for cardiac catheterization and ability to give informed consent. The following
data were extracted from the medical record: age, gender, self-reported race/ethnicity, past
medical (including diabetes, hypertension, dyslipidemia, prior MI and cardiac events), social, family
and medication history, cardiovascular risk factors, physical exam including vital signs, weight and
height (for BMI). Ethnicity information was self-reported. Coronary angiograms were scored at the
time of procedure by the interventional cardiologist. Blood was drawn in a fasting state, DNA (buffy
coats) and plasma were isolated, and lipoproteins and glucose were assayed on all samples.
For this analysis, a nested case-control analysis was performed in PennCATH (N=502 African
Americans) composed of controls (N=162) who on coronary angiography showed no or minimal
(<10% stenosis of any vessel) evidence of CAD and angiographic CAD cases (N=340) with one or
more coronary vessels with ≥50% stenosis. Cases were divided into those with history or
presentation of MI (N=136) and cases without history or presentation of MI (N=204).
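The nested PennCATH design can be summarized as a rule on the worst stenosis seen at angiography plus MI status. The sketch below is illustrative only: the function name is hypothetical, and treating patients whose worst stenosis falls between 10% and 49% as outside both groups is an assumption, since the text defines only the control and case ranges.

```python
from typing import Optional


def penncath_group(max_stenosis_pct: float, mi_history_or_presentation: bool) -> Optional[str]:
    """Classify one patient from the worst stenosis (%) in any coronary vessel."""
    if max_stenosis_pct < 10:
        return "control"  # no or minimal angiographic CAD in every vessel
    if max_stenosis_pct >= 50:
        return "case_with_mi" if mi_history_or_presentation else "case_without_mi"
    return None  # 10-49%: assumed to fall outside the nested case-control design
```

The reported group sizes are internally consistent: 136 + 204 = 340 cases, and 340 cases + 162 controls = 502 participants.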
iv. National Health and Nutrition Examination Survey III (NHANES III).
The Third National Health and Nutrition Examination Survey (NHANES III), 1988-94, was
conducted on a nationwide probability sample of approximately 33,994 persons aged 2 months and
over. The survey was designed to obtain nationally representative information on the health and
nutritional status of the population of the United States through interviews and direct physical
examinations. Physical examinations and objective measures are employed because the
information collected cannot be furnished or is not available in a standardized manner through
interviews with the people themselves or through records maintained by the health professionals
who provide their medical care.
Some of the 30 topics investigated in the NHANES III were: high blood pressure, high blood
cholesterol, obesity, passive smoking, lung disease, osteoporosis, HIV, hepatitis, Helicobacter pylori,
immunization status, diabetes, allergies, growth and development, blood lead, anemia, food
sufficiency, dietary intake including fats, antioxidants, and nutritional blood measures. Methods
used to conduct the surveys and examinations and to measure plasma glucose, HDL- and LDL-cholesterol, and triglycerides are detailed at http://www.cdc.gov/nchs/nhanes.htm.
v. Jamaican Spanish Town and GXE cohorts.
DNA samples for the Jamaican cohorts were obtained from two sources: Kingston and Spanish
Town, Jamaica. The Kingston GXE cohort was obtained from a survey conducted in the capital city,
Kingston, as part of a larger project to examine gene by environment interactions in the
determination of blood pressure among adults aged 25-74 years. The principal criterion for eligibility
was a body mass index (BMI) in either the top or bottom third of BMI for the Jamaican
population [9]. Participants were identified principally from the records of the Heart Foundation of
Jamaica, a non-governmental organization based in Kingston, which provides low-cost screening
services (height and weight, blood pressure, glucose, cholesterol) to the general public. Other
participants were identified from among participants in family studies of blood pressure at the
Tropical Metabolism Research Unit (TMRU) and from among staff members at the University of the
West Indies, Mona. All participants were unrelated. A total of 1,039 persons were enrolled.
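The GXE eligibility rule keeps only the two extreme thirds of the BMI distribution. As a hedged sketch, the cut points can be estimated from a reference sample with Python's `statistics.quantiles`; the actual Jamaican population cut points are not given here, so the helper names, the reference sample, and the inclusive boundary handling are all assumptions.

```python
from statistics import quantiles


def bmi_tertile_cutpoints(reference_bmis: list) -> tuple:
    """Two cut points splitting a reference BMI sample into thirds (hypothetical helper)."""
    lower, upper = quantiles(reference_bmis, n=3)
    return lower, upper


def gxe_bmi_eligible(bmi: float, lower_cut: float, upper_cut: float) -> bool:
    """True if BMI lies in the bottom or top third; ties at a cut point count as eligible (assumed)."""
    return bmi <= lower_cut or bmi >= upper_cut
```

Participants in the middle third are excluded, which sharpens the contrast between lean and heavy individuals when testing gene-by-environment effects on blood pressure.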
The Spanish Town cohort was obtained from a survey conducted as part of the International
Collaborative Study of Hypertension in Blacks and previously described in detail [9]. In this study a
stratified random sample of the Jamaican population aged 25-74 years was recruited from in and
around Spanish Town, a stable, residential urban area neighboring the capital city of Kingston.
2,096 participants were enrolled between 1993 and 1998.
The GXE and Spanish Town studies are approved by the University Hospital of the West
Indies/University of the West Indies/Faculty of Medical Sciences Ethics Committee, Mona, Kingston,
Jamaica.
vi. Health, Aging, and Body Composition (Health ABC) Study.
The Health ABC study is a prospective cohort study investigating the associations between body
composition, weight-related health conditions, and incident functional limitation in older adults.
Health ABC enrolled well-functioning, community-dwelling black (n=1281) and white (n=1794)
men and women aged 70-79 years between April 1997 and June 1998. Participants were recruited
from a random sample of white and all black Medicare eligible residents in the Pittsburgh, PA, and
Memphis, TN, metropolitan areas. Participants have undergone annual exams and semi-annual
phone interviews.
2. Phenotype definitions and modeling
Note. For all traits analyzed, we used phenotypes and covariates at the baseline examination. Only
prevalent events for coronary heart disease, type-2 diabetes, and hypertension were considered.
A) Coronary heart disease (CHD).
Coronary heart disease cases are defined as participants with: (1) myocardial infarction, (2) heart
surgery, (3) coronary bypass, (4) angioplasty of coronary artery, or (5) physician-diagnosed history
of myocardial infarction. We excluded from the coronary heart disease cases those patients with
angina only.
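The CHD case definition is a simple disjunction over five qualifying events. In the sketch below, angina is deliberately not an input, so angina alone can never produce a case; the function and argument names are hypothetical.

```python
def chd_case(myocardial_infarction: bool, heart_surgery: bool, coronary_bypass: bool,
             coronary_angioplasty: bool, physician_diagnosed_mi: bool) -> bool:
    """Prevalent CHD: any one of the five qualifying events.

    Angina is intentionally not an argument: angina alone never makes a case.
    """
    return any((myocardial_infarction, heart_surgery, coronary_bypass,
                coronary_angioplasty, physician_diagnosed_mi))
```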
B) Type-2 diabetes (T2D).
Type-2 diabetes cases are defined as participants: (1) with fasting blood glucose ≥126 mg/dL, (2)
with random blood glucose ≥200 mg/dL, (3) with physician-diagnosed type-2 diabetes, or (4)
currently taking diabetic medications.
C) Hypertension (HTN).
Hypertension cases are defined as participants: (1) with systolic blood pressure ≥140 mmHg, (2)
with diastolic blood pressure ≥90 mmHg, or (3) currently taking blood pressure lowering
medications. Participants whose only evidence of hypertension was a self-reported history were
not counted as hypertension cases.
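The T2D and HTN case definitions above can be read as simple "any criterion met" predicates. The sketch below is illustrative only: the record fields are hypothetical names, glucose is assumed to be in mg/dL and blood pressure in mmHg, and the thresholds are assumed to be inclusive (≥).

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Participant:
    # Hypothetical field names; None means "not measured".
    fasting_glucose: Optional[float] = None   # mg/dL
    random_glucose: Optional[float] = None    # mg/dL
    physician_dx_t2d: bool = False
    on_diabetes_meds: bool = False
    sbp: Optional[float] = None               # systolic BP, mmHg
    dbp: Optional[float] = None               # diastolic BP, mmHg
    on_bp_meds: bool = False
    self_reported_htn: bool = False           # never sufficient on its own

def _at_least(value, threshold):
    """True only when a measurement exists and meets the (assumed inclusive) threshold."""
    return value is not None and value >= threshold

def is_t2d_case(p: Participant) -> bool:
    return (_at_least(p.fasting_glucose, 126)
            or _at_least(p.random_glucose, 200)
            or p.physician_dx_t2d
            or p.on_diabetes_meds)

def is_htn_case(p: Participant) -> bool:
    # Self-reported history is deliberately ignored, per the definition above.
    return (_at_least(p.sbp, 140)
            or _at_least(p.dbp, 90)
            or p.on_bp_meds)
```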
D) Low-density lipoprotein cholesterol (LDL-C).
LDL-C was calculated according to Friedewald’s formula: LDL-C = total cholesterol – HDL-C –
(triglycerides ÷ 5). If a triglyceride value was ≥ 400 mg/dL, LDL-C was treated as a missing value.
For individuals on lipid-lowering therapy, the LDL-C value was multiplied by 1.42 to model a 30%
reduction in LDL-C on therapy. This represents the average expected reduction in LDL-C with a
first-generation statin, the most commonly used lipid-lowering medication during the study periods
of most of the cohorts [10]. Sex-specific phenotype residuals were constructed within each cohort,
after adjusting for age and age-squared. Each set of residuals was standardized to a mean of zero
and a standard deviation of one. The standardized residual served as the phenotype in genotype-phenotype association analyses.
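The LDL-C derivation and the residual standardization can be sketched as follows. This is a minimal NumPy illustration, not the CARe pipeline: it assumes inputs in mg/dL, applies the 1.42 factor exactly as described (1/0.7 ≈ 1.43, i.e., undoing an assumed 30% on-therapy reduction), and leaves the per-sex, per-cohort grouping to the caller.

```python
import numpy as np

def friedewald_ldl(total_chol, hdl, trig, on_lipid_therapy):
    """Friedewald LDL-C (mg/dL) with the 1.42 on-therapy correction.

    Returns NaN where triglycerides are >= 400 mg/dL (formula not applied).
    """
    total_chol, hdl, trig = (np.asarray(a, dtype=float) for a in (total_chol, hdl, trig))
    ldl = total_chol - hdl - trig / 5.0
    ldl = np.where(trig >= 400, np.nan, ldl)
    treated = np.asarray(on_lipid_therapy, dtype=bool)
    return np.where(treated, ldl * 1.42, ldl)

def standardized_residuals(y, age):
    """Regress y on intercept, age and age^2; return residuals scaled to mean 0, SD 1.

    Meant to be called separately for each sex within each cohort.
    Missing phenotypes (NaN) stay NaN.
    """
    y = np.asarray(y, dtype=float)
    age = np.asarray(age, dtype=float)
    ok = ~np.isnan(y)
    X = np.column_stack([np.ones(ok.sum()), age[ok], age[ok] ** 2])
    coef, *_ = np.linalg.lstsq(X, y[ok], rcond=None)
    resid = np.full_like(y, np.nan)
    resid[ok] = y[ok] - X @ coef
    return (resid - np.nanmean(resid)) / np.nanstd(resid)
```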
E) High-density lipoprotein cholesterol (HDL-C).
Sex-specific residuals were constructed within each cohort after adjusting for age and age-squared. Each set of residuals was standardized to a mean of zero and a standard deviation of one.
The standardized residual served as the phenotype in genotype-phenotype association analyses.
F) Smoking.
This analysis was restricted to current smokers. The number of reported cigarettes smoked per
day was categorized as follows: 1 = <1 cigarette per day; 2 = 1-4 cigarettes per day; 3 = 5-14
cigarettes per day; 4 = 15-24 cigarettes per day; 5 = 25-34 cigarettes per day; 6 = 35-44 cigarettes per
day; and 7 = 45 or more cigarettes per day. Sex-specific phenotype residuals were constructed
within each cohort, after adjusting for age and age-squared. Each set of residuals was standardized
to a mean of zero and a standard deviation of one. The standardized residual served as the
phenotype in genotype-phenotype association analyses.
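The smoking categories map naturally onto a sorted list of lower bounds. In this sketch, non-integer reports (e.g., 4.5 cigarettes per day) are assumed to fall into the bin whose lower bound they exceed; the resulting category would then be residualized on age and age-squared as described above.

```python
from bisect import bisect_right

# Lower bounds (cigarettes/day) of categories 2..7; category 1 is < 1 cigarette/day.
_LOWER_BOUNDS = [1, 5, 15, 25, 35, 45]

def smoking_category(cigs_per_day: float) -> int:
    """Return the 1-7 cigarettes-per-day category for a current smoker."""
    if cigs_per_day < 0:
        raise ValueError("cigarettes per day cannot be negative")
    # bisect_right counts how many lower bounds are <= the value.
    return 1 + bisect_right(_LOWER_BOUNDS, cigs_per_day)
```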
3. Genotyping and quality control
A) Genome-wide association studies.
All samples were genotyped at the Broad Institute using the Affymetrix Genome-Wide Human
SNP Array 6.0 (Affy6.0) according to the manufacturer’s recommendations. This genotyping
platform simultaneously interrogates more than 1.8 million markers of genetic variation (906,600 SNPs and
946,000 copy number variation probes) [11,12]. A total of 1 μg of genomic DNA (diluted in
1X TE buffer to 50 ng/μl) was equally interleaved on 96-well master plates to ensure technical
uniformity during the laboratory process. Two DNA quality control checks were
performed on the samples prior to the genome scan. First, the quantity of double-stranded DNA was
assessed using PicoGreen® (Molecular Probes, Oregon, USA). Next, to confirm sample identity, a set
of 24 markers including a gender confirmation assay was genotyped using the Sequenom platform
to serve as a genetic fingerprint. Each of these 24 SNPs is also on the Affy6.0 array and serves in
cross-platform sample verification. Based on the concept of reduced genomic representation, a
restriction enzyme digestion was performed on 250 ng of input DNA. The digested segments were
ligated to enzyme-specific adaptors that incorporate a universal PCR priming sequence. PCR
amplification using universal primers was performed in a reaction optimized to amplify fragments
of 200-1,100 base pairs. A fragmentation step then reduced the PCR product to segments of
approximately 25-50 bp, which were then end-labeled using biotinylated nucleotides. The labeled
product was then hybridized to a chip, washed, and detected.
Genotypes were called using Birdseed v1.33 [11,12]. Quality control steps were performed using
the software PLINK [13], EIGENSTRAT [14], and PREST-Plus [15]. For each step, we report in
Supplementary Tables 1 and 2 the number of samples and SNPs, respectively, which were
removed from the raw datasets.
First, to confirm sample identity, we monitored genotype concordance between 24 SNPs
genotyped in the same DNA samples using both Sequenom iPLEX [16] and Affy6.0. Second, DNA
samples with a genome-wide genotyping success rate <95% and SNPs with genotyping success rate
<90%, monomorphic SNPs, and SNPs that map to several genomic locations were removed from the
analyses. Third, heterozygosity rates (in the form of inbreeding coefficients) on the autosomes were
estimated to identify problematic DNA samples (poor DNA quality or contamination). Fourth,
genome-wide genotype data were used to estimate identity-by-descent (IBD) between all pairwise
combinations of samples in order to identify sample duplicates, contaminated samples, and cryptic
relationships. We also used IBS/IBD measures, as implemented in PREST-Plus [15], to confirm
known pedigree data for CFS and JHS. Fifth, we removed sample outliers in the nearest-neighbor
and “clustering based on missingness” analyses in PLINK. Sixth, additional filters were applied to
remove SNPs with minor allele frequency (MAF) <1%, with genotyping success rate <95%, and
SNPs where missingness can be predicted using surrounding haplotypes. The Hardy-Weinberg
equilibrium (HWE) test was performed for all SNPs, but SNPs were not excluded based solely on
this criterion given the admixed nature of the cohorts genotyped. We note, however, that none of the
SNPs reported in this manuscript to be associated with coronary heart disease or its risk factors
has an HWE P-value <1×10⁻⁶. Seventh, for datasets with known pedigrees (CFS and JHS), SNPs and
samples with an unusually high number of Mendel errors were excluded. Finally, SNPs that showed
association with specific chemistry plates were excluded.
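The call-rate and frequency filters described above can be sketched on a genotype matrix. This is a simplified single-pass illustration, not PLINK itself: it assumes allele counts coded 0/1/2 with -1 for missing, applies the sample call-rate filter first and then computes SNP metrics on the retained samples, and omits the IBD, Mendel-error, haplotype-missingness, and plate-association steps.

```python
import numpy as np

def qc_filter(G, min_sample_cr=0.95, min_snp_cr=0.95, min_maf=0.01):
    """Return (keep_samples, keep_snps) boolean masks.

    G: samples x SNPs array of allele counts (0/1/2), -1 = missing call.
    """
    G = np.asarray(G)
    called = G >= 0
    # Sample call rate first; SNP metrics are then computed on retained samples.
    keep_samples = called.mean(axis=1) >= min_sample_cr
    Gs, cs = G[keep_samples], called[keep_samples]
    snp_cr = cs.mean(axis=0)
    n_called = cs.sum(axis=0)
    alt = np.where(cs, Gs, 0).sum(axis=0)
    with np.errstate(invalid="ignore", divide="ignore"):
        freq = alt / (2 * n_called)
    maf = np.minimum(freq, 1 - freq)
    # MAF == 0 removes monomorphic SNPs; NaN MAF (no calls) also fails.
    keep_snps = (snp_cr >= min_snp_cr) & (maf >= min_maf)
    return keep_samples, keep_snps
```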
B) Replication genotyping.
The 5′ nuclease TaqMan allelic discrimination assay (TaqMan; Applied Biosystems, Foster City, CA,
USA) was used by the Multi-Ethnic Cohort to genotype six SNPs in the T2D panel. Custom assays to
genotype SNPs in the Cleveland, PennCATH, Jamaican, and NHANES III samples were designed using
Illumina’s Oligo Pool All (OPA) technology. Genotype calls were generated using Illumina’s
BeadStudio. For these replication analyses, samples and SNPs with genotyping success rate <90%
were excluded.
For Health ABC, genotyping was performed by the Center for Inherited Disease Research (CIDR)
using the Illumina Human1M-Duo BeadChip system. Genomic DNA was extracted, using the PUREGENE®
DNA Purification Kit, from buffy coats collected during the baseline exam. Samples were
excluded from the dataset for sample failure, genotypic sex mismatch, or first-degree
relationship to an included individual based on genotype data. Genotyping was successful for 1,151,215
SNPs in 1,139 African Americans. Imputation was done for the autosomes using MaCH software
version 1.0.16. SNPs with minor allele frequency ≥1%, call rate ≥97%, and HWE P ≥10⁻⁶ were used
for imputation. HapMap II phased haplotypes were used as reference panels. For African
Americans, genotypes were available on 1,007,948 high-quality SNPs for imputation based on a 1:1
mixture of the CEPH (CEU) and Yoruba (YRI) reference panels (release 21, build 36). A total of 1,958,375 SNPs
are available for analysis.
4. Principal component analysis (PCA)
We used PCA as implemented in EIGENSTRAT [14] on the cleaned CARe African-American Affy6.0
genotype data. Together with the CARe African-American samples, we analyzed genome-wide
genotype data from 1,178 European Americans (a multiple sclerosis GWA study graciously offered
by Dr. Phil de Jager and colleagues) and from 756 Nigerians from the Yoruba region (a hypertension
GWA study graciously offered by Dr. Richard Cooper and colleagues). These two datasets have been
extensively cleaned using PCA to remove population outliers. In our analysis, they are used as
reference populations. Plots of the two main principal components (PCs) for each dataset are
shown in Supplementary Figure 1: the first PC is correlated at r² > 0.98 with global European vs.
African ancestry as calculated independently using the population genetics software
ANCESTRYMAP and STRUCTURE [17,18].
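For readers unfamiliar with EIGENSTRAT, the core computation is an eigendecomposition of a normalized genotype matrix. The sketch below is a minimal NumPy version assuming no missing genotypes; it uses the per-SNP normalization described by Price et al. [14] but omits EIGENSTRAT's outlier-removal iterations and LD handling.

```python
import numpy as np

def eigenstrat_pcs(G, n_pcs=10):
    """Top principal components of a samples x SNPs 0/1/2 matrix (no missing data).

    Each SNP is centered by its mean and divided by sqrt(p(1-p)),
    with p = (1 + sum of allele counts) / (2 + 2n).
    Returns (PC scores, eigenvalues of X X^T / M).
    """
    G = np.asarray(G, dtype=float)
    n = G.shape[0]
    mu = G.mean(axis=0)
    p = (1.0 + G.sum(axis=0)) / (2.0 + 2.0 * n)
    X = (G - mu) / np.sqrt(p * (1.0 - p))
    # Left singular vectors of X are the eigenvectors of the sample matrix X X^T.
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    k = min(n_pcs, U.shape[1])
    return U[:, :k], s[:k] ** 2 / X.shape[1]
```

In a reference-panel design like the one above, the first column of the returned scores would be expected to separate European from African samples, with admixed individuals falling in between.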
PCA was also used as a screening tool to detect extreme sample outliers before quality control
checks (Supplementary Table 1). For all cohorts except CARDIA, we did not observe significant
sample outliers at this step. For CARDIA, however, the second principal component separated 210
samples from the rest of the samples (the first PC still captured global ancestry). These 210 samples
were characterized by low genotyping success rate (<98%), abnormal heterozygosity (inbreeding
coefficient F < −0.15), and belonged to four different chemistry plates (the CARDIA DNA collection
was genotyped on 16 plates). These 210 CARDIA samples were determined to have poor
genotyping characteristics and removed from subsequent QC analyses.
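The inbreeding coefficient used above compares observed and expected homozygosity per sample. The sketch below uses the method-of-moments form F = (O_hom − E_hom)/(N − E_hom) with population allele frequencies supplied by the caller; it is a simplified version without small-sample corrections, so values may differ slightly from PLINK's `--het` output. Under this definition, negative F means more heterozygous calls than expected.

```python
import numpy as np

def inbreeding_f(G, freqs):
    """Per-sample method-of-moments inbreeding coefficient.

    G: samples x SNPs (0/1/2, -1 = missing); freqs: allele frequency per SNP.
    """
    G = np.asarray(G)
    p = np.asarray(freqs, dtype=float)
    called = G >= 0
    e_hom_per_snp = 1.0 - 2.0 * p * (1.0 - p)     # expected homozygosity under HWE
    o_hom = ((G == 0) | (G == 2)).sum(axis=1)      # observed homozygous calls
    e_hom = (called * e_hom_per_snp).sum(axis=1)   # expected, over called SNPs only
    n = called.sum(axis=1)
    return (o_hom - e_hom) / (n - e_hom)
```

A threshold such as `inbreeding_f(G, p) < -0.15` would then flag samples like the 210 CARDIA outliers.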
5. Genotype imputation
Imputation was performed using MaCH 1.0.16
(http://www.sph.umich.edu/csg/abecasis/MaCH/). MaCH requires phased reference haplotypes
to perform imputation. For the African Americans, a combined CEU+YRI reference panel was
created. This panel includes SNPs segregating in both CEU and YRI, as well as SNPs segregating in
one panel and monomorphic and non-missing in the other (2.74 million SNPs altogether). Because some
African-American individuals were genotyped on both the Affymetrix 6.0 and IBC arrays [16,19], it was possible
to assess imputation accuracy at SNPs not genotyped on Affymetrix 6.0. For imputation based
on Affymetrix data, the CEU+YRI panel yielded an allelic concordance rate of ~95.6%,
calculated as 1 − ½|imputed dosage − chip dosage|. This rate is comparable to rates reported
for individuals of African descent imputed with the HapMap 2 YRI individuals [20].
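Because dosages range from 0 to 2, half the absolute dosage difference is the fraction of the two alleles that disagree, so 1 − ½|imputed − chip| is a per-genotype allelic concordance. The sketch below averages it over all genotypes observed in both sources; pooling over individuals and SNPs in one mean is an assumption about how the overall rate was summarized.

```python
import numpy as np

def allelic_concordance(imputed_dosage, chip_dosage):
    """Mean of 1 - |imputed - chip| / 2 over genotypes present in both inputs.

    Dosages are expected allele counts in [0, 2]; NaN marks a missing value.
    """
    imp = np.asarray(imputed_dosage, dtype=float)
    chip = np.asarray(chip_dosage, dtype=float)
    ok = ~(np.isnan(imp) | np.isnan(chip))
    return float(np.mean(1.0 - 0.5 * np.abs(imp[ok] - chip[ok])))
```

For example, an imputed dosage of 1.5 against a chip genotype of 1 contributes 0.75.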
For each imputed sample, imputation was performed in two steps. In the first step, individuals
with pedigree relatedness or cryptic relatedness (pi_hat > 0.05) were removed. A subset of
individuals was randomly extracted from each panel and used to generate recombination and error
rate estimates for the corresponding sample. In the second step, these rates were used to impute
all sample individuals across the entire reference panel. Imputation results were filtered at an
RSQ_HAT threshold of 0.3 and a minor allele frequency threshold of 0.01.
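The post-imputation filter can be sketched as a mask over imputed SNPs. This assumes the per-SNP quality value is MaCH's estimated r² and that both thresholds are inclusive; the minor allele frequency is computed here from mean dosage, which is one reasonable choice but may differ from how it was obtained in the original pipeline.

```python
import numpy as np

def filter_imputed(rsq_hat, dosages, min_rsq=0.3, min_maf=0.01):
    """Boolean mask of imputed SNPs passing the quality and frequency filters.

    rsq_hat: per-SNP imputation quality (estimated r^2).
    dosages: samples x SNPs matrix of expected allele counts (0-2).
    """
    rsq = np.asarray(rsq_hat, dtype=float)
    freq = np.asarray(dosages, dtype=float).mean(axis=0) / 2.0
    maf = np.minimum(freq, 1.0 - freq)
    return (rsq >= min_rsq) & (maf >= min_maf)
```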
6. Genetic association analysis
A) Additional sample filtering.
Prior to genetic association testing, we removed additional samples that had passed all quality
control filters described above but would nevertheless have caused problems in the interpretation
of the results. These include: samples with missing gender information (ARIC=85), samples with
different IDs that share >90% of their genome identity-by-descent (IBD) (ARIC=56; JHS=1),
samples unlikely to be from African Americans based on principal component analysis results
(ARIC=8; CARDIA=2), samples that have a high number of discordant genotypes at SNPs common to
both the Affy6.0 platform and the ITMAT-BROAD-CARe (IBC) array (ARIC=3) [19], seven samples
from the ARIC dataset that were also present in the JHS dataset based on IBD metrics, and
participants who were younger than 18 years old at baseline (CARDIA=5; CFS=111). Thus, the
following numbers of African-American participants were available for analysis: ARIC=2,830,
CARDIA=949, CFS=521, JHS=2,144, and MESA=1,646 (Total N=8,090).
B) SNP association.
We arbitrarily decided not to perform analysis on datasets with 10 or fewer cases for a given
phenotype. This filter excluded CARDIA from the T2D analysis and CARDIA and MESA from the CHD
analysis. The inflation factors for all completed analyses are presented in Supplementary Table 4.
i- Cohorts with unrelated samples.
For all cohorts but CFS, genome-wide association (GWA) analysis was performed in PLINK [13]
using a linear regression model for quantitative traits (HDL, LDL, and smoking) and a logistic
regression model for dichotomous outcomes (CHD, T2D, HTN), both under an additive genetic
model. Trait modeling for LDL, HDL, and smoking is described above. For both linear and logistic
models, we used as covariates the first ten principal components; for the logistic models, we also
included gender, age, and age-squared in the model. For imputed genotypes, we used dosage
information (i.e., the expected allele count, a value between 0.0 and 2.0 calculated from the probabilities of the three
possible genotypes) in the regression model implemented in PLINK.
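The dosage is the expected allele count, 0·P(AA) + 1·P(AB) + 2·P(BB), and the additive model enters it as a single numeric predictor. The sketch below shows the linear (quantitative trait) case with ordinary least squares and covariates such as principal components; it is an illustration, not PLINK's implementation, and the logistic case used for CHD, T2D, and HTN is not shown.

```python
import numpy as np

def dosage(genotype_probs):
    """Expected count of the coded allele from (P(AA), P(AB), P(BB))."""
    return np.asarray(genotype_probs, dtype=float) @ np.array([0.0, 1.0, 2.0])

def additive_linear_test(y, g, covariates):
    """OLS of y on an intercept, dosage g and covariates (n x k, e.g. ten PCs).

    Returns (beta, standard error, t statistic) for the genotype term.
    """
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(y), np.asarray(g, dtype=float),
                         np.asarray(covariates, dtype=float)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    dof = X.shape[0] - X.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(X.T @ X)
    beta, se = coef[1], np.sqrt(cov[1, 1])
    return beta, se, beta / se
```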
ii- Cohorts with related samples.
For CFS, we modeled the family structure in the association tests using linear mixed effects (LME)
models for the quantitative traits (HDL, LDL, and smoking) and generalized estimating equations
(GEE) for the dichotomous phenotypes (CHD, T2D, HTN). These statistical methods are
implemented in R [21]. We tested an additive genetic model and included as covariates the first
ten principal components. With the GEE, we also included gender and age as covariates; age-squared was not included because it introduced a collinearity problem with the R
routine. For imputed genotypes, we used dosage information (i.e., a value between 0.0 and 2.0
calculated from the probabilities of the three possible genotypes) in the regression models
implemented in the R routines.
Although the JHS has a small number of related individuals, extensive analyses showed that
results were concordant using linear/logistic regression or the LME/GEE routines, after genomic
control (data not shown). For simplicity, we opted to present the linear/logistic regression results
for JHS in this article. We did not test genetic models other than the additive model in this genome-wide association study.
C) Meta-analysis.
Association results from the five CARe cohorts (ARIC, CARDIA, CFS, MESA, JHS) were combined
using the inverse-variance method, as implemented in the software METAL
(http://www.sph.umich.edu/csg/abecasis/metal/) [22]. Individual study results were corrected
using genomic control. Meta-analytic results were also scaled using genomic control. The
Manhattan plots summarizing the meta-analysis results after double genomic control corrections
are in Figure S2.
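The "double genomic control" procedure can be sketched as: estimate λ per study, inflate that study's standard errors by √λ, combine studies with fixed-effect inverse-variance weights, then estimate and apply λ once more on the meta-analysis statistics. This is a minimal illustration; flooring λ at 1 is a common convention assumed here, not something stated in the text.

```python
import numpy as np

CHI2_1DF_MEDIAN = 0.4549364  # median of a chi-square distribution with 1 df

def gc_lambda(z):
    """Genomic-control inflation factor from genome-wide z statistics."""
    return float(np.median(np.asarray(z, dtype=float) ** 2) / CHI2_1DF_MEDIAN)

def gc_adjust_se(beta, se):
    """Inflate standard errors by sqrt(lambda), with lambda floored at 1 (assumption)."""
    beta, se = np.asarray(beta, float), np.asarray(se, float)
    lam = max(gc_lambda(beta / se), 1.0)
    return se * np.sqrt(lam)

def inverse_variance_meta(betas, ses):
    """Fixed-effect meta-analysis; betas and ses are studies x SNPs arrays."""
    w = 1.0 / np.asarray(ses, float) ** 2
    beta = (w * np.asarray(betas, float)).sum(axis=0) / w.sum(axis=0)
    se = np.sqrt(1.0 / w.sum(axis=0))
    return beta, se

def double_gc_meta(betas, ses):
    """GC-correct each study, meta-analyze, then GC-correct the combined result."""
    ses_gc = np.vstack([gc_adjust_se(b, s) for b, s in zip(betas, ses)])
    beta, se = inverse_variance_meta(betas, ses_gc)
    return beta, gc_adjust_se(beta, se)
```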
7. Admixture association
A) Creating a panel of Ancestry Informative Markers (AIMs).
First, we describe a data resource that proved very valuable for much of the analytic work on
CARe African Americans (AA). We obtained a substantial number of samples that proved excellent
surrogates for the ancestral populations of the CARe AA. After careful curation, removing related
pairs and some outlier samples (quality-control carried out using EIGENSOFT [14]), we had access
to data for 756 Nigerian (Yoruba) samples and 1,178 European American samples that had been
collected for other purposes, and made available to CARe. We thank Arti Tandon for having
performed the quality-control of these datasets. We also thank Drs. Richard Cooper and Phil de
Jager (and their colleagues) for generously providing to CARe these African and European samples,
respectively. We call these samples ‘parental’ samples, although obviously they are samples from
populations only approximating the true parental populations of AA.
We then ran smartpca (part of EIGENSOFT) on the parental samples. As expected, the leading
eigenvector reflects genetic differences between Europe and West Africa. We output the ‘SNP
loadings’: the absolute value of the loading is a score for informativeness at each SNP (that is, how
much a SNP contributes to a given eigenvector). We then ran a greedy algorithm to produce a list of
SNPs all at least 0.5 cM apart from each other, chosen to have large SNP loadings. This produced a
list of 4,917 SNPs. We next carefully checked pairs of markers for detectable
linkage disequilibrium (LD) in the parental populations. This step is necessary because
our admixture program ANCESTRYMAP [17] is sensitive to residual LD. Specifically, we
checked all pairs of markers separated by less than 5 cM and computed a statistic X, distributed
as χ²₁ under the null hypothesis of no LD, in both the European and African parental samples. If the genetic distance
between the pair is d cM, then we declared the pair to be in LD if X > 0.02d. For such a pair, we
deleted the less informative SNP from our AIMs panel. This is a very stringent criterion, but is
justified, as deleting a few informative markers is a modest price to reduce the likelihood of false
positive association signals. This “LD pruning” step is a standard part of the preprocessing for
ANCESTRYMAP [17], and has been carried out in exactly the same way in previous admixture scans
[23-25]. The described approach produced a final set of 3,192 unlinked AIMs.
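The two AIM selection steps (greedy spacing by loading, then LD pruning) can be sketched on a single chromosome. The text does not define the LD statistic X; this sketch assumes X = n·r² computed from genotype correlations, which is approximately χ²₁ under no LD. It also assumes a pair is pruned if it fails the X > 0.02d test in either parental sample, and it uses a simple quadratic loop that would need windowing for genome-scale data.

```python
import numpy as np

def greedy_spacing(positions_cm, loadings, min_gap_cm=0.5):
    """Pick SNPs by decreasing |loading|, skipping any within min_gap_cm of a pick."""
    pos = np.asarray(positions_cm, float)
    chosen = []
    for j in np.argsort(-np.abs(np.asarray(loadings, float))):
        if all(abs(pos[j] - pos[c]) >= min_gap_cm for c in chosen):
            chosen.append(int(j))
    return sorted(chosen)

def ld_chi2(g1, g2):
    """Assumed LD statistic: n * r^2 between two genotype vectors."""
    r = np.corrcoef(g1, g2)[0, 1]
    return len(g1) * r ** 2

def ld_prune(chosen, positions_cm, loadings, G_eur, G_afr,
             window_cm=5.0, slope=0.02):
    """Drop the less informative SNP of any pair < window_cm apart with X > slope * d."""
    keep = set(chosen)
    pos = np.asarray(positions_cm, float)
    score = np.abs(np.asarray(loadings, float))
    for a in sorted(chosen):
        for b in sorted(chosen):
            if b <= a or a not in keep or b not in keep:
                continue
            d = abs(pos[a] - pos[b])
            if d >= window_cm:
                continue
            if any(ld_chi2(G[:, a], G[:, b]) > slope * d for G in (G_eur, G_afr)):
                keep.discard(a if score[a] < score[b] else b)
    return sorted(keep)
```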
B) Local ancestry estimation.
Our primary tool is ANCESTRYMAP [17], which can calculate: (1) local ancestry estimates on a
mesh (default 1 cM, which we use) and (2) for each individual i, estimates of the global ancestry θi
(proportion of European ancestors) and the number of generations to admixture (λi).
Especially for large sample sets such as the CARe study, it is impractical to compute and store on
disk local ancestry estimates for every SNP and individual. Instead we use an interpolation
procedure. Think of a chromosome as ordered from left to right, so that the leftmost marker has
smallest physical position on the human reference. We wish to calculate a local ancestry estimate
for marker k, sample i. Find the nearest point to k on the left, L = L(k) such that the genetic distance
|k − L(k)| > 2 cM. Similarly, we find the nearest point R = R(k) on the right. Let α(x, i) be the
posterior probability that sample i has x European chromosomes at L conditional on markers at L
or leftwards. Let β(y, i) be the probability of the data at R or rightwards, conditional on sample i
having y European chromosomes at R. These quantities are familiar from the standard theory of hidden
Markov models; note that the definitions are not symmetric. Assuming that markers outside the
interval (L, R) are unlinked to k, it is now simple to calculate the posterior distribution of the
number of European chromosomes at k conditional on all data outside the interval (L, R), using the
genetic distance between k, L and R. In our applications, we also want the distribution of ancestry
conditioned as above, together with the observed genotype at k. As parental allele frequencies are
assumed known, this is readily computed using Bayes rule. These computations can be done ‘on the
fly’ without external storage and are fast and efficient. Boundary conditions (such as L, R being ‘off
the end’ of the chromosome) are handled correctly.
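The interpolation can be made concrete with a small sketch. It assumes a common admixture HMM simplification that may differ from ANCESTRYMAP's exact model: along each chromosome, ancestry is retained over a distance d (in Morgans) with probability exp(−λd), and otherwise redrawn as European with probability θ. The posterior at k is then proportional to (α at L propagated to k) times (β at R propagated back to k), optionally multiplied by the genotype likelihood at k given known parental allele frequencies (Bayes rule).

```python
import numpy as np

def chrom_transition(d_morgans, lam, theta):
    """2x2 ancestry transition for one chromosome (0 = African, 1 = European)."""
    stay = np.exp(-lam * d_morgans)
    stationary = np.array([1.0 - theta, theta])
    return stay * np.eye(2) + (1.0 - stay) * np.tile(stationary, (2, 1))

def count_transition(d_morgans, lam, theta):
    """3x3 transition on x = number of European chromosomes (0, 1, 2)."""
    T1 = chrom_transition(d_morgans, lam, theta)
    K = np.kron(T1, T1)                # ordered pairs (AA, AE, EA, EE)
    n_euro = np.array([0, 1, 1, 2])
    rep = [0, 1, 3]                    # one representative ordered pair per count
    T = np.zeros((3, 3))
    for x in range(3):
        for y in range(3):
            T[x, y] = K[rep[x], n_euro == y].sum()
    return T

def genotype_likelihood(g, p_eur, p_afr):
    """P(g copies of the counted allele | x European chromosomes), x = 0, 1, 2."""
    out = np.zeros(3)
    for x in range(3):
        a, b = ([p_eur] * x + [p_afr] * (2 - x))
        out[x] = [(1 - a) * (1 - b), a * (1 - b) + b * (1 - a), a * b][g]
    return out

def local_ancestry_at_k(alpha_L, beta_R, d_Lk, d_kR, lam, theta,
                        g=None, p_eur=None, p_afr=None):
    """Posterior over x at marker k from alpha at L and beta at R (distances in Morgans)."""
    left = np.asarray(alpha_L, float) @ count_transition(d_Lk, lam, theta)
    right = count_transition(d_kR, lam, theta) @ np.asarray(beta_R, float)
    post = left * right
    if g is not None:
        post = post * genotype_likelihood(g, p_eur, p_afr)
    return post / post.sum()
```

With very large distances the result collapses to the binomial prior given θ, which is the expected behavior when flanking markers carry no information.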
C) Association analysis using local ancestry estimates.
We developed a new module in PLINK, called PLOCAL (Cameron Palmer and Joel N. Hirschhorn,
pers. comm.), that dynamically retrieves the estimates of local ancestry for each participant and
each SNP from the ANCESTRYMAP output files. In PLINK, we could test linear and logistic
regression models with terms for SNP genotypes, local ancestry estimates, and the first ten
principal components; in logistic models, we also added gender, age, and age-squared as covariates.
To combine evidence from genotypes (e.g. the number of minor alleles) and estimates of local
ancestry (e.g. 0-100% European ancestry) at each SNP for a given phenotype across the CARe
African-American cohorts, we follow these steps: at each SNP, (1) the P-value corresponding to the
effect size (regression beta) estimated for the SNP term is combined across the five cohorts using
a weighted Z-score method based on cohort sample size (as implemented in METAL; see above); individual cohort
results are scaled using genomic control, (2) similarly, the P-value corresponding to the local
ancestry term is combined across cohorts using the same meta-analysis method; and (3) for each
SNP, meta-analytic P-values for the SNP and local ancestry effect sizes are converted to Chi-square
values, summed, and a Chi-square statistic with two degrees-of-freedom is obtained. Genomic
control is again applied to the final association statistics. For these analyses, we ignore family
structures and rely conservatively on genomic control to control for the inflation of the test
statistics.
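The three-step combination can be sketched with the Python standard library. This is an illustrative version under stated assumptions: the effect direction of each cohort is supplied separately as a sign (a P-value alone carries none), the weights are √N following METAL's sample-size scheme, and the final genomic-control step is omitted. Converting each meta-analytic P-value to a 1-df chi-square uses x = Φ⁻¹(P/2)², and the sum of two independent 1-df chi-squares has the 2-df survival function exp(−x/2).

```python
import math
from statistics import NormalDist

_N = NormalDist()

def weighted_z_meta(pvals, signs, sample_sizes):
    """Sample-size weighted Z-score meta-analysis; returns a two-sided P-value."""
    w = [math.sqrt(n) for n in sample_sizes]
    # abs(inv_cdf(p/2)) avoids rounding problems when p is very small.
    z = [s * abs(_N.inv_cdf(p / 2.0)) for p, s in zip(pvals, signs)]
    z_meta = sum(wi * zi for wi, zi in zip(w, z)) / math.sqrt(sum(wi * wi for wi in w))
    return 2.0 * _N.cdf(-abs(z_meta))

def chi2_1df_from_p(p):
    """1-df chi-square statistic with two-sided P-value p."""
    return _N.inv_cdf(p / 2.0) ** 2

def combine_snp_and_ancestry(p_snp, p_anc):
    """Sum the two 1-df chi-squares; return (statistic, 2-df P-value)."""
    x = chi2_1df_from_p(p_snp) + chi2_1df_from_p(p_anc)
    return x, math.exp(-x / 2.0)
```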
To illustrate the power of our approach in genetic association testing in African Americans, we
simulated the following scenario: A genetic marker has an allele frequency of 50% in both
European and African parental populations. This marker is in perfect linkage disequilibrium in
Europeans with the causal allele, but in complete linkage equilibrium in Africans with the causal
allele. The causal allele has a genetic relative risk of 2. The study design includes 2,000 African-American cases and controls. Under these assumptions, an analytical framework that ignores local
ancestry has 4% power to identify the signal at P ≤ 1×10⁻⁶. In comparison, our strategy, which
utilizes allelic and local ancestry information, has 96% power to detect the same signal using the
same statistical threshold.
8. References
1. Friedewald WT, Levy RI, Fredrickson DS (1972) Estimation of the concentration of low-density
lipoprotein in plasma, without use of preparative ultracentrifuge. Clin Chem 18: 499-502.
2. Friedman GD, Cutter GR, Donahue RP, Hughes GH, Hulley SB, et al. (1988) CARDIA: study design,
recruitment, and some characteristics of the examined subjects. J Clin Epidemiol 41: 1105-1116.
3. Warnick GR (1986) Enzymatic methods for quantification of lipoprotein lipids. Methods Enzymol
129: 101-123.
4. Taylor HA, Jr., Wilson JG, Jones DW, Sarpong DF, Srinivasan A, et al. (2005) Toward resolution of
cardiovascular health disparities in African Americans: design and methods of the Jackson
Heart Study. Ethn Dis 15: S6-4-17.
5. Kolonel LN, Henderson BE, Hankin JH, Nomura AM, Wilkens LR, et al. (2000) A multiethnic cohort
in Hawaii and Los Angeles: baseline characteristics. Am J Epidemiol 151: 346-357.
6. Helgadottir A, Manolescu A, Helgason A, Thorleifsson G, Thorsteinsdottir U, et al. (2006) A variant
of the gene encoding leukotriene A4 hydrolase confers ethnicity-specific risk of myocardial
infarction. Nat Genet 38: 68-74.
7. Helgadottir A, Thorleifsson G, Magnusson KP, Gretarsdottir S, Steinthorsdottir V, et al. (2008) The
same sequence variant on 9p21 associates with myocardial infarction, abdominal aortic
aneurysm and intracranial aneurysm. Nat Genet 40: 217-224.
8. Lehrke M, Millington SC, Lefterova M, Cumaranatunge RG, Szapary P, et al. (2007) CXCL16 is a
marker of inflammation, atherosclerosis, and acute coronary syndromes in humans. J Am
Coll Cardiol 49: 442-449.
9. Cooper R, Rotimi C, Ataman S, McGee D, Osotimehin B, et al. (1997) The prevalence of
hypertension in seven populations of west African origin. Am J Public Health 87: 160-168.
10. Kapur NK, Musunuru K (2008) Clinical efficacy and safety of statins in managing cardiovascular
risk. Vasc Health Risk Manag 4: 341-353.
11. Korn JM, Kuruvilla FG, McCarroll SA, Wysoker A, Nemesh J, et al. (2008) Integrated genotype
calling and association analysis of SNPs, common copy number polymorphisms and rare
CNVs. Nat Genet 40: 1253-1260.
12. McCarroll SA, Kuruvilla FG, Korn JM, Cawley S, Nemesh J, et al. (2008) Integrated detection and
population-genetic analysis of SNPs and copy number variation. Nat Genet 40: 1166-1174.
13. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, et al. (2007) PLINK: a tool set for
whole-genome association and population-based linkage analyses. Am J Hum Genet 81:
559-575.
14. Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, et al. (2006) Principal
components analysis corrects for stratification in genome-wide association studies. Nat
Genet 38: 904-909.
15. McPeek MS, Sun L (2000) Statistical tests for detection of misspecified relationships by use of
genome-screen data. Am J Hum Genet 66: 1076-1094.
16. Musunuru K, Lettre G, Young T, Farlow DN, Pirrucello JP, et al. (2010) Candidate Gene
Association Resource (CARe): Design, Methods, and Proof of Concept. Circulation
Cardiovascular Genetics In press.
17. Patterson N, Hattangadi N, Lane B, Lohmueller KE, Hafler DA, et al. (2004) Methods for high-density admixture mapping of disease genes. Am J Hum Genet 74: 979-1000.
18. Pritchard JK, Stephens M, Donnelly P (2000) Inference of population structure using multilocus
genotype data. Genetics 155: 945-959.
19. Keating BJ, Tischfield S, Murray SS, Bhangale T, Price TS, et al. (2008) Concept, design and
implementation of a cardiovascular gene-centric 50 k SNP array for large-scale genomic
association studies. PLoS One 3: e3583.
20. Huang L, Li Y, Singleton AB, Hardy JA, Abecasis G, et al. (2009) Genotype-imputation accuracy
across worldwide human populations. Am J Hum Genet 84: 235-250.
21. Chen MH, Yang Q (2009) GWAF: an R package for genome-wide association analyses with family
data. Bioinformatics.
22. Willer CJ, Li Y, Abecasis GR (2010) METAL: fast and efficient meta-analysis of genomewide
association scans. Bioinformatics 26: 2190-2191.
23. Kao WH, Klag MJ, Meoni LA, Reich D, Berthier-Schaad Y, et al. (2008) MYH9 is associated with
nondiabetic end-stage renal disease in African Americans. Nat Genet 40: 1185-1192.
24. Reich D, Nalls MA, Kao WH, Akylbekova EL, Tandon A, et al. (2009) Reduced neutrophil count in
people of African descent is due to a regulatory variant in the Duffy antigen receptor for
chemokines gene. PLoS Genet 5: e1000360.
25. Reich D, Patterson N, De Jager PL, McDonald GJ, Waliszewska A, et al. (2005) A whole-genome
admixture scan finds a candidate locus for multiple sclerosis susceptibility. Nat Genet 37:
1113-1118.
26. Pruim RJ, Welch RP, Sanna S, Teslovich TM, Chines PS, et al. (2010) LocusZoom: regional
visualization of genome-wide association scan results. Bioinformatics 26: 2336-2337.
27. Kathiresan S, Voight BF, Purcell S, Musunuru K, Ardissino D, et al. (2009) Genome-wide
association of early-onset myocardial infarction with single nucleotide polymorphisms and
copy number variants. Nat Genet 41: 334-341.
28. Kathiresan S, Willer CJ, Peloso GM, Demissie S, Musunuru K, et al. (2009) Common variants at 30
loci contribute to polygenic dyslipidemia. Nat Genet 41: 56-65.
9. Supplementary Figure Legends
Figure S1. Plots of the two main principal components (PC) in the CARe African-American samples.
European-American and Nigerian samples are used as reference populations. We note that the
first principal component (PC1) captures European vs. African global ancestry. For CFS, outliers on
PC2 all belong to the same large family.
Figure S2. Manhattan plots summarizing the meta-analysis results for the six phenotypes analyzed
after double genomic control scaling. The dashed line highlights genome-wide significance (P-value
= 5×10⁻⁸). The genome-wide significant loci include 8p23 (chr8), LPL (chr8), LIPC (chr15), LCAT
(chr16), and CETP (chr16) for HDL-C, SLC12A9 (chr7) for hypertension, and PCSK9 (chr1), CELSR2-PSRC1-SORT1 (chr1), and APOE (chr19) for LDL-C.
Figure S3. Graphical representation of the information summarized in Table S12. Plots were
drawn using LocusZoom [26]. CHD association results in Caucasians are from Kathiresan et al. [27].
HDL-C association results in Caucasians are from Kathiresan et al. [28]. Under each plot, the light
blue box corresponds to the genomic interval flanked by the leftmost and rightmost SNPs with
r² ≥ 0.3 with the index SNP (purple diamond). For the results in Caucasians, we used LD based on
HapMap CEU, and for the results in African Americans, LD based on HapMap YRI. For the PLTP and
ABCA1 loci, the CARe SNPs (rs6065904 and rs13284054, respectively) define genomic intervals of
0.2 kb using the r² ≥ 0.3 threshold, which appear as light blue lines on the plots.
Figure S4. Quantile-quantile (QQ) plots of the meta-analyses results that take into account local
ancestry estimates and SNP genotypes (chi-square with two degrees of freedom; N=8,090). Each
black circle represents an observed statistic for genotyped SNPs only (defined as −log10(P-value)) against the corresponding expected statistic. The grey area corresponds to the 90%
confidence intervals calculated empirically using permutations. The meta-analysis inflation factors
are: coronary heart disease (λ=0.923), HDL-C (λ=1.030), hypertension (λ=1.121), LDL-C
(λ=1.290), smoking (λ=1.060), and type-2 diabetes (λ=1.109). Data shown were corrected by genomic control
before (for each study) and after the meta-analysis.
Figure S5. Manhattan plots summarizing the meta-analysis results that take into account local
ancestry estimates and SNP genotypes (two degrees-of-freedom). Results are shown for the six
phenotypes analyzed after double genomic control scaling. The dashed line highlights genome-wide
significance (P-value = 5×10⁻⁸). The genome-wide significant loci include LCAT (chr16) and CETP
(chr16) for HDL-C and PCSK9 (chr1), CELSR2-PSRC1-SORT1 (chr1), 2p24 (chr2), and APOE (chr19)
for LDL-C.
Figure S6. Comparison of P-values (-log10 scale) for the meta-analysis results obtained using SNP
genotype-only (x-axis) or SNP genotype + estimate of local ancestry (y-axis) to compute the test
statistics. Each black circle corresponds to a SNP. In total, results from ~885,000 genotyped SNPs
were available for each method. The gray line represents perfect correlation (x=y). The horizontal
and vertical dashed lines represent the pre-defined threshold for genome-wide significance (P-value = 5×10⁻⁸).