3. Genotyping and quality control

GWAS of CHD in African Americans

Genome-wide association study of coronary heart disease and its risk factors in 8,090 African Americans: The NHLBI CARe Project

Text S1

Guillaume Lettre1,2, Cameron D. Palmer3,4,#, Taylor Young3,#, Kenechi G. Ejebe3,#, Hooman Allayee5, Emelia J. Benjamin6,7, Franklyn Bennett8, Donald W. Bowden9, Aravinda Chakravarti10, Al Dreisbach11, Deborah N. Farlow3, Aaron R. Folsom12, Myriam Fornage13, Terrence Forrester8, Ervin Fox11, Christopher A. Haiman5, Jaana Hartiala5, Tamara B. Harris14, Stanley L. Hazen15, Susan R. Heckbert16,17,18, Brian E. Henderson5, Joel N. Hirschhorn3,4,19, Brendan J. Keating20, Stephen B. Kritchevsky21, Emma Larkin22, Mingyao Li23, Megan E. Rudock24, Colin A. McKenzie25, James B. Meigs19,26, Yang A. Meng3, Tom H. Mosley Jr11, Anne B. Newman27, Christopher H. Newton-Cheh3,7,19,28,29, Dina N. Paltoo30, George J. Papanicolaou30, Nick Patterson3, Wendy S. Post31, Bruce M. Psaty16,17,18, Atif N. Qasim32, Liming Qu23, Daniel J. Rader32,33, Susan Redline23, Muredach P. Reilly32,33, Alexander P. Reiner34, Stephen S. Rich35, Jerome I. Rotter36, Yongmei Liu24, Peter Shrader26, David S. Siscovick16,17, W.H. Wilson Tang15, Herman A. Taylor Jr.11,37,38, Russell P. Tracy39, Ramachandran S. Vasan6,7, Kevin M. Waters5, Rainford Wilks40, James G. Wilson11,41, Richard R. Fabsitz30, Stacey B. Gabriel3, Sekar Kathiresan3,7,19,28,29, Eric Boerwinkle42

1. Montreal Heart Institute, 5000 Bélanger Street, Montréal, Québec, H1T 1C8, Canada
2. Département de Médecine, Université de Montréal, C.P. 6128, succursale Centre-ville, Montréal, Québec, H3C 3J7, Canada
3. Program in Medical and Population Genetics, Broad Institute, 7 Cambridge Center, Cambridge, MA 02142, USA
4. Divisions of Genetics and Endocrinology and Program in Genomics, Children’s Hospital Boston, Boston, MA 02115, USA
5. Department of Preventive Medicine, USC Keck School of Medicine, Los Angeles, CA 90033, USA
6. Department of Medicine, Boston University Schools of Medicine and Epidemiology, Boston, MA 02215, USA
7. Framingham Heart Study of the National Heart, Lung, and Blood Institute and Boston University, Framingham, Massachusetts 01702, USA
8. Tropical Medicine Research Institute, University of the West Indies, Mona, Kingston 7, Jamaica
9. Wake Forest University School of Medicine, Winston-Salem, NC 27157, USA
10. McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA
11. Department of Medicine, University of Mississippi Medical Center, Jackson, MS 39216, USA
12. Division of Epidemiology and Community Health, University of Minnesota, Minneapolis, MN 55454, USA
13. Institute of Molecular Medicine and Division of Epidemiology School of Public Health, University of Texas Health Sciences Center at Houston, 1825 Pressler Street SRB 530.G, Houston, TX, 77030, USA
14. Laboratory of Epidemiology, Demography, and Biometry, National Institute on Aging, Bethesda, MD, 20892, USA
15. Departments of Cell Biology and Cardiovascular Medicine, The Center for Cardiovascular Diagnostics & Prevention, Cleveland Clinic, Cleveland, Ohio 44195, USA
16. Departments of Medicine and Epidemiology, University of Washington, Seattle, Washington 98195, USA
17. Cardiovascular Health Research Unit, University of Washington, Seattle, Washington 98101, USA
18. Group Health Research Institute, Group Health Cooperative, Seattle, WA 98124, USA
19. Department of Medicine, Harvard Medical School, Boston, MA 02115, USA
20. Center for Applied Genomics, Abramson Building, Children’s Hospital of Philadelphia, PA 19104, USA
21. J. Paul Sticht Center on Aging, Division of Gerontology and Geriatric Medicine, Wake Forest University School of Medicine, Winston-Salem, NC 27157, USA
22. Case Western Reserve University, Center for Clinical Investigation, Iris S. & Bert L. Wolstein Building, 2103 Cornell Road, Room 6129, Cleveland, OH 44106-7291, USA
23. Biostatistics and Epidemiology, University of Pennsylvania, Philadelphia, PA 19104, USA
24. Department of Epidemiology and Prevention, Division of Public Health Sciences, Wake Forest University School of Medicine, Winston-Salem, NC 27157, USA
25. Tropical Metabolism Research Unit, Tropical Medicine Research Institute, University of the West Indies, Mona, Kingston 7, Jamaica
26. General Medicine Division, Massachusetts General Hospital, Boston, MA 02114, USA
27. Center for Aging and Population Health, Department of Epidemiology, University of Pittsburgh, Pittsburgh, PA 15261, USA
28. Center for Human Genetic Research, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
29. Cardiovascular Research Center and Cardiology Division, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
30. National Heart, Lung, and Blood Institute (NHLBI), Division of Cardiovascular Sciences, NIH, Bethesda, MD 20892, USA
31. Division of Cardiology, the Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA
32. The Cardiovascular Institute, University of Pennsylvania, Philadelphia, PA 19104, USA
33. The Institute for Translational Medicine and Therapeutics, School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
34. University of Washington, Department of Epidemiology, Box 357236, Seattle, WA 98195, USA
35. Center for Public Health Genomics, University of Virginia, 6111 West Complex, PO Box 800717, Charlottesville, VA 22908-0717, USA
36. Medical Genetics Institute, Cedars-Sinai Medical Center, Los Angeles, CA 90048, USA
37. Jackson State University, Jackson, MS 39217, USA
38. Tougaloo College, Tougaloo, MS 39174, USA
39. Departments of Pathology and Biochemistry, University of Vermont, 208 S. Park Drive, suite 2, Colchester, VT 05446, USA
40. Epidemiology Research Unit, Tropical Medicine Research Institute, University of the West Indies, Mona, Kingston 7, Jamaica
41. G. V. (Sonny) Montgomery Veterans Affairs Medical Center, MS 39216, USA
42. Human Genetics Center and Institute of Molecular Medicine and Division of Epidemiology, University of Texas Health Science Center, Houston, Texas 77030, USA

# These authors contributed equally to this work.

Correspondence to:
Eric Boerwinkle
The University of Texas School of Public Health
1200 Herman Pressler, Suite E451
Houston, TX 77030, USA
Phone: 713-500-9914
Fax: 713-500-0900
Email: Eric.Boerwinkle@uth.tmc.edu

Table of Contents

1. COHORT DEMOGRAPHICS
   A) DESCRIPTION OF THE CARE AFRICAN-AMERICAN SAMPLES USED IN THE GENOME-WIDE ASSOCIATION STUDIES.
      I. ATHEROSCLEROSIS RISK IN COMMUNITIES (ARIC).
      II. CORONARY ARTERY RISK DEVELOPMENT IN YOUNG ADULTS (CARDIA).
      III. CLEVELAND FAMILY STUDY (CFS).
      IV. JACKSON HEART STUDY (JHS).
      V. MULTI-ETHNIC STUDY OF ATHEROSCLEROSIS (MESA).
   B) DESCRIPTION OF THE AFRICAN-AMERICAN SAMPLES USED FOR REPLICATION.
      I. MULTIETHNIC COHORT (MEC), TYPE-2 DIABETES SUBSET.
      II. CLEVELAND CLINIC.
      III. PENNCATH (CORONARY ANGIOGRAPHIC CASE-CONTROL STUDY).
      IV. NATIONAL HEALTH AND NUTRITION EXAMINATION SURVEY III (NHANES III).
      V. JAMAICAN SPANISH TOWN AND GXE COHORTS.
      VI. HEALTH, AGING, AND BODY COMPOSITION (HEALTH ABC) STUDY.
2. PHENOTYPE DEFINITIONS AND MODELING
   A) CORONARY HEART DISEASE (CHD).
   B) TYPE-2 DIABETES (T2D).
   C) HYPERTENSION (HTN).
   D) LOW-DENSITY LIPOPROTEIN CHOLESTEROL (LDL-C).
   E) HIGH-DENSITY LIPOPROTEIN CHOLESTEROL (HDL-C).
   F) SMOKING.
3. GENOTYPING AND QUALITY CONTROL
   A) GENOME-WIDE ASSOCIATION STUDIES.
   B) REPLICATION GENOTYPING.
4. PRINCIPAL COMPONENT ANALYSIS (PCA)
5. GENOTYPE IMPUTATION
6. GENETIC ASSOCIATION ANALYSIS
   A) ADDITIONAL SAMPLE FILTERING.
   B) SNP ASSOCIATION.
      I- COHORTS WITH UNRELATED SAMPLES.
      II- COHORTS WITH RELATED SAMPLES.
   C) META-ANALYSIS.
7. ADMIXTURE ASSOCIATION
   A) CREATING A PANEL OF ANCESTRY INFORMATIVE MARKERS (AIMS).
   B) LOCAL ANCESTRY ESTIMATION.
   C) ASSOCIATION ANALYSIS USING LOCAL ANCESTRY ESTIMATES.
8. REFERENCES
9. SUPPLEMENTARY FIGURE LEGENDS

1. Cohort demographics

A) Description of the CARe African-American samples used in the genome-wide association studies.

i. Atherosclerosis Risk in Communities (ARIC).

The ARIC study is a prospective population-based study of atherosclerosis and cardiovascular diseases in 15,792 men and women, including 11,478 non-Hispanic whites and 4,314 African Americans, drawn from 4 U.S. communities (suburban Minneapolis, Minnesota; Washington County, Maryland; Forsyth County, North Carolina; and Jackson, Mississippi). In the first three communities, the sample reflects the demographic composition of the community. In Jackson, only black residents were enrolled. Because of the design and focus of the CARe Project, only self-reported African-American participants are included in this analysis. Participants were between 45 and 64 years of age at their baseline examination in 1987-1989, when blood was drawn for DNA extraction and participants consented to genetic testing. After taking into account availability of adequate amounts of high-quality DNA, appropriate informed consent, and genotyping quality control and assurance procedures, genotype data were available on 2,989 African-American individuals.

Blood was drawn in the morning after an overnight fast. Total cholesterol, HDL-cholesterol and glucose levels were measured by standard enzymatic methods. LDL-cholesterol was calculated by the method of Friedewald et al. [1]. Prevalent diabetes mellitus was defined as a fasting glucose level ≥126 mg/dL, non-fasting glucose ≥200 mg/dL, or a history of treatment for diabetes.
Seated blood pressure was measured three times with a random-zero sphygmomanometer and the last two measurements were averaged. Prevalent hypertension was defined as a systolic blood pressure ≥140 mmHg, a diastolic blood pressure ≥90 mmHg, or current use of antihypertensive medications. Information about medical history and cigarette smoking was elicited from standardized and validated interviewer-administered questionnaires. For the analyses presented here, smoking was defined as current smoking. Prevalent CHD was defined as self-reported myocardial infarction or revascularization procedures.

ii. Coronary Artery Risk Development in Young Adults (CARDIA).

The CARDIA study is a prospective, multi-center investigation of the natural history and etiology of cardiovascular disease in African Americans and whites 18-30 years of age at the time of initial examination. The initial examination included 5,115 participants selectively recruited to represent proportionate racial, gender, age, and education groups from four communities: Birmingham, AL; Chicago, IL; Minneapolis, MN; and Oakland, CA. Participants from the Birmingham, Chicago, and Minneapolis centers were recruited from the total community or from selected census tracts. Participants from the Oakland center were randomly recruited from the Kaiser-Permanente health plan membership. Details of the study design have been published [2]. Since the initiation of the study in 1985-1986, six follow-up examinations have been conducted, at years 2, 5, 7, 10, 15, and 20. DNA extraction for genetic studies was performed at the Y10 examination. After taking into account availability of adequate amounts of high-quality DNA, appropriate informed consent, and genotyping quality control and assurance procedures, genotype data were available on 955 African-American individuals.
Each participant’s age, race, and sex were self-reported during the recruitment phase and verified during the baseline clinic visit. Blood samples were drawn after an overnight fast. Total plasma cholesterol, triglycerides, HDL-, and LDL-cholesterol were measured according to standardized methods [3]. LDL-C was estimated using the Friedewald equation [1]. Blood pressure was measured at each exam on the right arm using a random-zero sphygmomanometer, with the participant seated and following a 5-minute rest. Systolic and diastolic pressures were recorded as Phase I and Phase V Korotkoff sounds. Three measurements were taken at one-minute intervals. The average of the second and third measurements was taken as the blood pressure value. Plasma glucose was measured by the hexokinase method. Prevalent diabetes mellitus was defined as a fasting glucose level ≥126 mg/dL, non-fasting glucose ≥200 mg/dL, or a history of treatment for diabetes. Participants’ tobacco use and medical history were obtained from validated questionnaires. Details of the procedures for data collection have been previously described [2].

iii. Cleveland Family Study (CFS).

The Cleveland Family Study (CFS) is a family-based, longitudinal study designed to characterize the genetic and non-genetic risk factors for sleep apnea. In total, 2,534 individuals (46% African American) from 352 families were studied on up to 4 occasions over a period of 16 years (1990-2006). The initial aim of the study was to quantify the familial aggregation of sleep apnea. Over time, the aims were expanded to characterize the natural history of sleep apnea and sleep apnea outcomes, and to identify the genetic basis for sleep apnea. With subsequent exams, the cohort was expanded to include increased minority representation and additional family members.
The total sample included index probands (n=275) who were recruited from 3 area hospital sleep centers if they had a confirmed diagnosis of sleep apnea and at least 2 first-degree relatives available to be studied. In the first 5 years of the study, neighborhood control probands (n=87) with at least 2 living relatives available for study were selected at random from a list provided by the index family. All available first-degree relatives and spouses of the case and control probands were recruited. Second-degree relatives, including half-sibs, aunts, uncles and grandparents, were also included if they lived near the first-degree relatives (cases or controls), or if the family had been found to have two or more relatives with sleep apnea. The sample, which is enriched for individuals with sleep apnea, also contains a high prevalence of individuals with sleep apnea-related traits, including obesity, impaired glucose tolerance, and hypertension.

Data used for the CARe analyses were from individuals in whom DNA had been collected, i.e., over the last two exam cycles (n=1447). After genotyping quality control, genotype data were available for 632 African Americans. Phenotype data were collected over as many as 4 exam cycles, occurring approximately every 4 years. The last three exams targeted all subjects who had been studied at earlier exams, as well as new minority families and family members of previously studied probands who had been unavailable at prior exams. The phenotype data used in the current analysis were from the 4th exam (2001-06), conducted in 736 subjects, with oversampling of minorities and individuals in whom prior microsatellite genome scans had been conducted. This exam was conducted in a General Clinical Research unit and included an overnight in-laboratory study; more detailed measurements of sleep using full polysomnography; and blood sampling before bed, in the morning after an overnight fast, and after an oral glucose challenge test.
Total cholesterol, HDL-cholesterol and glucose levels were measured by standard enzymatic methods. LDL-cholesterol was calculated by the method of Friedewald [1]. Prevalent diabetes mellitus was defined as a fasting glucose level ≥126 mg/dL, non-fasting glucose ≥200 mg/dL, or a history of treatment for diabetes. Blood pressure was measured 9 times using a mercury sphygmomanometer over 3 sessions (one supine and two sitting at each session). Prevalent hypertension was defined as a systolic blood pressure ≥140 mmHg, a diastolic blood pressure ≥90 mmHg, or current use of antihypertensive medications. Information about medical history and cigarette smoking was elicited from standardized self-administered questionnaires. For the analyses presented here, smoking was defined as current smoking. Prevalent CHD was defined as self-reported myocardial infarction or revascularization procedures.

iv. Jackson Heart Study (JHS).

The Jackson Heart Study (JHS) is a prospective population-based study to seek the causes of the high prevalence of common complex diseases among African Americans in the Jackson, Mississippi metropolitan area, including cardiovascular disease, type-2 diabetes, obesity, chronic kidney disease, and stroke [4]. During the baseline examination period (2000-2004), 5,301 self-identified African Americans were recruited from four sources: (1) randomly sampled households from a commercial listing; (2) ARIC participants; (3) a structured volunteer sample that was designed to mirror the eligible population; and (4) a nested family cohort. Unrelated participants were between 35 and 84 years old, and members of the family cohort were ≥21 years old, when consent for genetic testing was obtained and blood was drawn for DNA extraction.
Based on DNA availability, appropriate informed consent, and genotyping results that met quality control procedures, genotype data were available for 3,030 individuals, including 885 who are also ARIC participants. In the current study, JHS participants who were also enrolled in the ARIC study were analyzed with the ARIC dataset; for this reason, the JHS dataset analyzed here had 2,145 individuals. Many key aspects of the JHS were modeled on, and are essentially identical to, the methods used in the ARIC study (see above), including phlebotomy procedures, blood pressure measurement, laboratory methods for lipids, cholesterol, and glucose analysis, definitions of prevalent diabetes mellitus and hypertension, and survey methods and definitions related to medical history, cigarette smoking, and CHD.

v. Multi-Ethnic Study of Atherosclerosis (MESA).

The Multi-Ethnic Study of Atherosclerosis (MESA) is a study of the characteristics of subclinical cardiovascular disease (disease detected non-invasively before it has produced clinical signs and symptoms) and the risk factors that predict progression to clinically overt cardiovascular disease or progression of the subclinical disease. MESA researchers study a diverse, population-based sample of 6,814 asymptomatic men and women aged 45-84 at baseline. Approximately 38% of the recruited participants are white, 28% African-American, 22% Hispanic, and 12% Asian, predominantly of Chinese descent. For the current study, after taking into account availability of adequate amounts of high-quality DNA, appropriate informed consent, and genotyping quality control and assurance procedures, genotype data were available on 1,646 African-American individuals. Participants were recruited from six field centers across the United States: Wake Forest University, Columbia University, Johns Hopkins University, University of Minnesota, Northwestern University and University of California - Los Angeles.
Each participant received an extensive physical exam to determine coronary calcification, ventricular mass and function, flow-mediated endothelial vasodilation, carotid intimal-medial wall thickness and presence of echogenic lucencies in the carotid artery, lower extremity vascular insufficiency, arterial wave forms, electrocardiographic (ECG) measures, standard coronary risk factors, socio-demographic factors, lifestyle factors, and psychosocial factors. Selected repetition of subclinical disease measures and risk factors at follow-up visits allows study of the progression of disease. Blood samples are being assayed for putative biochemical risk factors and stored for case-control studies. DNA is extracted and lymphocytes immortalized for study of candidate genes and genome-wide scanning. Participants are followed for identification and characterization of cardiovascular disease events, including acute myocardial infarction and other forms of coronary heart disease (CHD), stroke, and congestive heart failure; for cardiovascular disease interventions; and for mortality. In addition to the six Field Centers, MESA involves a Coordinating Center, a Central Laboratory, and Central Reading Centers for Computed Tomography (CT), Magnetic Resonance Imaging (MRI), Ultrasound, and Electrocardiography (ECG). Protocol development, staff training, and pilot testing were performed in the first 18 months of the study. The first examination took place over two years, from July 2000 to July 2002. It was followed by three additional examination periods: September 2002 to January 2004, February 2004 to July 2005, and September 2005 to May 2007. Participants are contacted every 9 to 12 months throughout the study to assess clinical morbidity and mortality. NHLBI recently funded MESA II, which will bring all participants back for a fifth exam, starting in April 2010.

B) Description of the African-American samples used for replication.

i. Multiethnic Cohort (MEC), type-2 diabetes subset.

The MEC consists of 215,251 men and women, and comprises mainly five self-reported racial/ethnic populations: European Americans, African Americans, Latinos, Japanese Americans and Native Hawaiians [5]. Between 1993 and 1996, adults between 45 and 75 years old were enrolled by completing a 26-page, self-administered questionnaire asking detailed information about dietary habits, demographic factors, level of education, personal behaviors, and history of prior medical conditions (e.g. diabetes). Potential cohort members were identified through Department of Motor Vehicles driver’s license files, voter registration files and Health Care Financing Administration data files. In 2001, a short follow-up questionnaire was sent to update information on dietary habits, as well as to obtain information about new diagnoses of medical conditions since recruitment. Between 2003 and 2007, we re-administered a modified version of the baseline questionnaire. All questionnaires inquired about history of diabetes, without specification as to type (1 vs. 2). Between 1995 and 2004, blood specimens were collected from ~67,000 MEC participants, at which time a short questionnaire was administered to update certain exposures and collect current information about medication use.

African Americans in the MEC were sampled primarily from California, with the majority from Los Angeles County. Cohort members in California are linked each year to the California Office of Statewide Health Planning and Development (OSHPD) hospitalization discharge database, which consists of mandatory records of all in-patient hospitalizations at most acute-care facilities in California. Records include information on the principal diagnosis plus up to 24 other diagnoses (coded according to ICD-9), including T1D and T2D.
Information from this database was used to assess the percentage of T2D controls (as defined below) with undiagnosed T2D, as well as the percentage of identified diabetes cases with T1D rather than T2D. Based on the OSHPD database, <3% of T2D cases had a previous diagnosis of T1D. We did not use this source to identify T2D cases because it did not include information on diabetes medications, one of our inclusion criteria for cases (see below).

Diabetic cases in the MEC were defined using the following criteria: (a) a self-report of diabetes on the baseline questionnaire, 2nd questionnaire or 3rd questionnaire; and (b) self-report of taking medication for T2D at the time of blood draw; and (c) no diagnosis of T1D in the absence of a T2D diagnosis from the OSHPD. Controls were defined as: (a) no self-report of diabetes on any of the questionnaires while having completed a minimum of 2 of the 3 (79% of controls returned all 3 questionnaires); and (b) no use of medications for T2D at the time of blood draw; and (c) no diabetes diagnosis (type 1 or 2) from the OSHPD registry. To preserve DNA for genetic studies of cancer in the MEC, subjects with an incident cancer diagnosis at time of selection for this study were excluded. Controls were frequency matched to cases on age at entry into the cohort (5-year age groups).

ii. Cleveland Clinic.

The Cleveland Clinic GeneBank study is a hospital-based study of ~10,000 patients recruited between 2001 and 2006 and provides an ongoing focus for analyzing the association of biochemical and genetic factors with coronary atherosclerosis in a consecutive cohort of patients undergoing elective cardiac evaluation. Enrollment criteria included stable patients undergoing coronary angiography and ability to give informed consent. Ethnicity was self-reported, and information regarding demographics, medical history, and medication use was obtained by patient interviews and confirmed by chart reviews.
All clinical outcome data were verified by source documentation. CAD was defined as adjudicated diagnoses of stable or unstable angina, MI (adjudicated definition based on defined electrocardiographic changes or elevated cardiac enzymes), angiographic evidence of ≥50% stenosis of one or more major epicardial vessels, and/or a history of known CAD (documented MI, CAD, or history of revascularization). Fasting blood was drawn and plasma, serum, and DNA buffy coats were isolated. A lipid profile and complete metabolic panel were assayed on all samples. A total of 719 African-American men and women were selected from the entire GeneBank cohort for genotyping and analysis as a replication dataset for the present study.

iii. PennCATH (Coronary Angiographic Case-Control Study).

PennCATH is a University of Pennsylvania (U. Penn) Medical Center based coronary angiographic study that has been used previously for replication of novel genes and risk factors for atherosclerotic CVD and type-2 diabetes [6-8]. Briefly, PennCATH, recruited between July 1998 and March 2003, provides an ongoing focus for analyzing the association of biochemical and genetic factors with coronary atherosclerosis in a consecutive cohort of patients undergoing cardiac catheterization and coronary angiography. A total of 3,850 subjects provided written informed consent in a Penn Institutional Review Board approved protocol. Enrollment criteria included any clinical indication for cardiac catheterization and ability to give informed consent. The following data were extracted from the medical record: age, gender, self-reported race/ethnicity, past medical history (including diabetes, hypertension, dyslipidemia, prior MI and cardiac events), social, family and medication history, cardiovascular risk factors, and physical exam including vital signs, weight and height (for BMI). Ethnicity information was self-reported.
Coronary angiograms were scored at the time of procedure by the interventional cardiologist. Blood was drawn in a fasting state, DNA (buffy coats) and plasma were isolated, and lipoproteins and glucose were assayed on all samples. For this analysis, a nested case-control analysis was performed in PennCATH (N=502 African Americans), composed of controls (N=162) who on coronary angiography showed no or minimal (<10% stenosis of any vessel) evidence of CAD and angiographic CAD cases (N=340) with one or more coronary vessels with ≥50% stenosis. Cases were divided into those with a history or presentation of MI (N=136) and those without a history or presentation of MI (N=204).

iv. National Health and Nutrition Examination Survey III (NHANES III).

The Third National Health and Nutrition Examination Survey (NHANES III), 1988-94, was conducted on a nationwide probability sample of approximately 33,994 persons aged 2 months and over. The survey was designed to obtain nationally representative information on the health and nutritional status of the population of the United States through interviews and direct physical examinations. Physical examinations and objective measures are employed because the information collected cannot be furnished or is not available in a standardized manner through interviews with the people themselves or through records maintained by the health professionals who provide their medical care. Some of the 30 topics investigated in NHANES III were: high blood pressure, high blood cholesterol, obesity, passive smoking, lung disease, osteoporosis, HIV, hepatitis, Helicobacter pylori, immunization status, diabetes, allergies, growth and development, blood lead, anemia, food sufficiency, and dietary intake, including fats, antioxidants, and nutritional blood measures.
Methods used to conduct surveys and examinations and to measure plasma glucose, HDL- and LDL-cholesterol and triglycerides are detailed at http://www.cdc.gov/nchs/nhanes.htm.

v. Jamaican Spanish Town and GXE cohorts.

DNA samples for the Jamaican cohorts were obtained from two sources: Kingston and Spanish Town, Jamaica. The Kingston GXE cohort was obtained from a survey conducted in the capital city, Kingston, as part of a larger project to examine gene by environment interactions in the determination of blood pressure among adults 25-74 years. The principal criterion for eligibility was a body mass index (BMI) in either the top or bottom third of BMI for the Jamaican population [9]. Participants were identified principally from the records of the Heart Foundation of Jamaica, a non-governmental organization based in Kingston, which provides low-cost screening services (height and weight, blood pressure, glucose, cholesterol) to the general public. Other participants were identified from among participants in family studies of blood pressure at the Tropical Metabolism Research Unit (TMRU) and from among staff members at the University of the West Indies, Mona. All participants were unrelated. A total of 1,039 persons were enrolled.

The Spanish Town cohort was obtained from a survey conducted as part of the International Collaborative Study of Hypertension in Blacks and previously described in detail [9]. In this study, a stratified random sample of the Jamaican population aged 25-74 years was recruited from in and around Spanish Town, a stable, residential urban area neighboring the capital city of Kingston. A total of 2,096 participants were enrolled between 1993 and 1998.

The GXE and Spanish Town studies are approved by the University Hospital of the West Indies/University of the West Indies/Faculty of Medical Sciences Ethics Committee, Mona, Kingston, Jamaica.

vi. Health, Aging, and Body Composition (Health ABC) Study.
The Health ABC study is a prospective cohort study investigating the associations between body composition, weight-related health conditions, and incident functional limitation in older adults. Health ABC enrolled well-functioning, community-dwelling black (n=1,281) and white (n=1,794) men and women aged 70-79 years between April 1997 and June 1998. Participants were recruited from a random sample of white and all black Medicare-eligible residents in the Pittsburgh, PA, and Memphis, TN, metropolitan areas. Participants have undergone annual exams and semi-annual phone interviews.

2. Phenotype definitions and modeling

Note. For all traits analyzed, we used phenotypes and covariates at the baseline examination. Only prevalent events for coronary heart disease, type-2 diabetes, and hypertension were considered.

A) Coronary heart disease (CHD).

Coronary heart disease cases are defined as participants with: (1) myocardial infarction, (2) heart surgery, (3) coronary bypass, (4) angioplasty of coronary artery, or (5) physician-diagnosed history of myocardial infarction. We excluded from the coronary heart disease cases those patients with angina only.

B) Type-2 diabetes (T2D).

Type-2 diabetes cases are defined as participants: (1) with fasting blood glucose ≥126 mg/dL, (2) with random blood glucose ≥200 mg/dL, (3) with physician-diagnosed type-2 diabetes, or (4) currently taking diabetic medications.

C) Hypertension (HTN).

Hypertension cases are defined as participants: (1) with systolic blood pressure ≥140 mmHg, (2) with diastolic blood pressure ≥90 mmHg, or (3) currently taking blood pressure lowering medications. We excluded from the hypertension cases patients with self-reported history of hypertension only.

D) Low-density lipoprotein cholesterol (LDL-C).

LDL-C was calculated according to Friedewald’s formula: LDL-C = total cholesterol - HDL-C - (triglycerides ÷ 5), with all values in mg/dL. If a triglyceride value was ≥400 mg/dL, LDL-C was treated as a missing value.
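As an illustration of the calculation just described, the following minimal Python sketch applies Friedewald's formula and the triglyceride cutoff. The function and argument names are hypothetical and are not taken from the study's analysis code; all inputs are assumed to be in mg/dL.

```python
from typing import Optional


def friedewald_ldl(total_chol: float, hdl: float, triglycerides: float) -> Optional[float]:
    """Estimate LDL-C (mg/dL) with Friedewald's formula.

    Returns None (missing) when triglycerides are >= 400 mg/dL,
    mirroring the exclusion rule stated in the text.
    """
    if triglycerides >= 400:
        return None
    return total_chol - hdl - triglycerides / 5.0


# Example: total cholesterol 200, HDL-C 50, triglycerides 150 (all mg/dL)
print(friedewald_ldl(200, 50, 150))  # 200 - 50 - 30 = 120.0
```

Returning None rather than a sentinel number keeps missing values from being mistaken for real measurements in downstream steps.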
For individuals on lipid-lowering therapy, the LDL-C value was multiplied by 1.42 to model a 30% reduction in LDL-C on therapy. This represents the average expected reduction in LDL-C with a first-generation statin, the most commonly used lipid-lowering medication during the study periods of most of the cohorts [10]. Sex-specific phenotype residuals were constructed within each cohort, after adjusting for age and age-squared. Each set of residuals was standardized to a mean of zero and a standard deviation of one. The standardized residual served as the phenotype in genotype-phenotype association analyses.

E) High-density lipoprotein cholesterol (HDL-C). Sex-specific residuals were constructed within each cohort after adjusting for age and age-squared. Each set of residuals was standardized to a mean of zero and a standard deviation of one. The standardized residual served as the phenotype in genotype-phenotype association analyses.

F) Smoking. This analysis was restricted to current smokers. The number of reported cigarettes smoked per day was categorized as follows: 1 = <1 cigarette per day; 2 = 1-4 cigarettes per day; 3 = 5-14 cigarettes per day; 4 = 15-24 cigarettes per day; 5 = 25-34 cigarettes per day; 6 = 35-44 cigarettes per day; and 7 = 45 or more cigarettes per day. Sex-specific phenotype residuals were constructed within each cohort, after adjusting for age and age-squared. Each set of residuals was standardized to a mean of zero and a standard deviation of one. The standardized residual served as the phenotype in genotype-phenotype association analyses.

3. Genotyping and quality control

A) Genome-wide association studies. All samples were genotyped at the Broad Institute using the Affymetrix Genome-Wide Human SNP Array 6.0 (Affy6.0) according to the manufacturer’s recommendations.
This genotyping platform simultaneously interrogates 1.8 million markers for genetic variation (906,600 SNPs and 946,000 copy number variation probes) [11,12]. A total of 1 μg of genomic DNA (diluted in 1X TE buffer to 50 ng/μl) was equally interleaved on 96-well master plates to ensure technical uniformity during the laboratory process. Two DNA quality control metrics were assessed on the samples prior to the genome scan. First, the quantity of double-stranded DNA was assessed using PicoGreen® (Molecular Probes, Oregon, USA). Next, to confirm sample identity, a set of 24 markers, including a gender confirmation assay, was genotyped using the Sequenom platform to serve as a genetic fingerprint. Each of these 24 SNPs is also on the Affy6.0 array and serves in cross-platform sample verification.

Based on the concept of reduced genomic representation, a restriction enzyme digestion was performed on 250 ng of input DNA. The digested segments were ligated to enzyme-specific adaptors that incorporate a universal PCR priming sequence. PCR amplification using universal primers was performed in a reaction optimized to amplify fragments between 200 and 1,100 base pairs. A fragmentation step then reduced the PCR product to segments of approximately 25-50 bp, which were then end-labeled using biotinylated nucleotides. The labeled product was then hybridized to a chip, washed, and detected. Genotypes were called using Birdseed v1.33 [11,12].

Quality control steps were performed using the software PLINK [13], EIGENSTRAT [14], and PREST-Plus [15]. For each step, we report in Supplementary Tables 1 and 2 the number of samples and SNPs, respectively, that were removed from the raw datasets. First, to confirm sample identity, we monitored genotype concordance between 24 SNPs genotyped in the same DNA samples using both Sequenom iPLEX [16] and Affy6.0.
Second, DNA samples with a genome-wide genotyping success rate <95% and SNPs with genotyping success rate <90%, monomorphic SNPs, and SNPs that map to several genomic locations were removed from the analyses. Third, heterozygosity rates (in the form of inbreeding coefficients) on the autosomes were estimated to identify problematic DNA samples (poor DNA quality or contamination). Fourth, genome-wide genotype data were used to estimate identity-by-descent (IBD) between all pairwise combinations of samples in order to identify sample duplicates, contaminated samples, and cryptic relationships. We also used IBS/IBD measures, as implemented in PREST-Plus [15], to confirm known pedigree data for CFS and JHS. Fifth, we removed sample outliers in the nearest neighbor and “clustering based on missingness” analyses in PLINK. Sixth, additional filters were applied to remove SNPs with minor allele frequency (MAF) <1%, SNPs with genotyping success rate <95%, and SNPs where missingness can be predicted using surrounding haplotypes. The Hardy-Weinberg equilibrium (HWE) test was performed for all SNPs, but SNPs were not excluded based solely on this criterion given the admixed nature of the cohorts genotyped. We note, however, that none of the SNPs reported in this manuscript to be associated with coronary heart disease or its risk factors has a HWE P-value <1×10⁻⁶. Seventh, for datasets with known pedigrees (CFS and JHS), SNPs and samples with an unusually high number of Mendel errors were excluded. Finally, SNPs that showed association with specific chemistry plates were excluded.

B) Replication genotyping. The 5' nuclease TaqMan allelic discrimination assay (TaqMan; Applied Biosystems, Foster City, CA, USA) was used by the Multi-Ethnic Cohort to genotype six SNPs in the T2D panel. Custom assays to genotype SNPs in the Cleveland, PennCATH, Jamaican, and NHANESIII samples were designed using Illumina’s Oligos Pool All (OPA) technology.
Genotype calls were generated using Illumina’s BeadStudio. For these replication analyses, samples and SNPs with genotyping success rate <90% were excluded.

For Health ABC, genotyping was performed by the Center for Inherited Disease Research (CIDR) using the Illumina Human1M-Duo BeadChip system. Genomic DNA was extracted from buffy coat collected using the PUREGENE® DNA Purification Kit during the baseline exam. Samples were excluded from the dataset for reasons of sample failure, genotypic sex mismatch, and being a first-degree relative of an included individual based on genotype data. Genotyping was successful for 1,151,215 SNPs in 1,139 African Americans. Imputation was done for the autosomes using MACH software version 1.0.16. SNPs with minor allele frequency ≥1%, call rate ≥97%, and HWE P ≥10⁻⁶ were used for imputation. HapMap II phased haplotypes were used as reference panels. For African Americans, genotypes were available on 1,007,948 high-quality SNPs for imputation based on a 1:1 mixture of the CEPH:Yoruba (YRI) reference panel (release 21, build 36). A total of 1,958,375 SNPs are available for analysis.

4. Principal component analysis (PCA)

We used PCA as implemented in EIGENSTRAT [14] on the cleaned CARe African-American Affy6.0 genotype data. Together with the CARe African-American samples, we analyzed genome-wide genotype data from 1,178 European Americans (a multiple sclerosis GWA study graciously offered by Dr. Phil de Jager and colleagues) and from 756 Nigerians from the Yoruba region (a hypertension GWA study graciously offered by Dr. Richard Cooper and colleagues). These two datasets have been extensively cleaned using PCA to remove population outliers. In our analysis, they are used as reference populations. Plots of the two main principal components (PCs) for each dataset are shown in Supplementary Figure 1: the first PC is correlated at r² >0.98 with global European vs.
African ancestry as calculated independently using the population genetics software ANCESTRYMAP and STRUCTURE [17,18]. PCA was also used as a screening tool to detect extreme sample outliers before quality control checks (Supplementary Table 1). For all cohorts except CARDIA, we did not observe significant sample outliers at this step. For CARDIA, however, the second principal component separated 210 samples from the rest of the samples (the first PC still captured global ancestry). These 210 samples were characterized by low genotyping success rate (<98%), low heterozygosity (inbreeding coefficient F < -0.15), and belonged to four different chemistry plates (the CARDIA DNA collection was genotyped on 16 plates). These 210 CARDIA samples were determined to have poor genotyping characteristics and were removed from subsequent QC analyses.

5. Genotype imputation

Imputation was performed using MaCH 1.0.16 (http://www.sph.umich.edu/csg/abecasis/MaCH/). MaCH requires phased reference haplotypes to perform imputation. For the African Americans, a combined CEU+YRI reference panel was created. This panel includes SNPs segregating in both CEU and YRI, as well as SNPs segregating in one panel and monomorphic and non-missing in the other (2.74 million altogether). Due to the overlap of African-American individuals on the Affymetrix 6.0 and IBC arrays [16,19], it was possible to analyze imputation performance at SNPs not genotyped on Affymetrix 6.0. For imputation based on Affymetrix data, the use of the CEU+YRI panel resulted in an allelic concordance rate of ~95.6%, calculated as 1 – 1/2 × |imputed_dosage – chip_dosage|. This rate is comparable to rates calculated for individuals of African descent imputed with the HapMap 2 YRI individuals [20].

For each imputed sample, imputation was performed in two steps. For the first step, individuals with pedigree relatedness or cryptic relatedness (pi_hat > 0.05) were filtered.
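The allelic concordance measure quoted above, 1 – 1/2 × |imputed_dosage – chip_dosage|, is defined per genotype. The sketch below assumes that the reported rate is the simple average of this quantity over all compared genotypes; the averaging step and the function name `allelic_concordance` are assumptions made for illustration, not details given in the text.

```python
from typing import Optional, Sequence


def allelic_concordance(imputed_dosage: Sequence[Optional[float]],
                        chip_dosage: Sequence[Optional[float]]) -> float:
    """Mean of 1 - 0.5 * |imputed - chip| over non-missing genotype pairs.

    Dosages are on the 0-2 scale; None marks a missing genotype.
    Averaging across pairs is an assumption of this sketch.
    """
    if len(imputed_dosage) != len(chip_dosage):
        raise ValueError("dosage vectors must have the same length")
    scores = [1.0 - 0.5 * abs(imp - chip)
              for imp, chip in zip(imputed_dosage, chip_dosage)
              if imp is not None and chip is not None]
    if not scores:
        raise ValueError("no genotype pairs to compare")
    return sum(scores) / len(scores)


# Illustrative values only: two exact matches, one near match, one miss by one allele
print(round(allelic_concordance([0.0, 1.0, 1.9, 2.0], [0, 1, 2, 1]), 4))
```

Under this definition, a genotype that is off by one full allele contributes 0.5 and one that is off by two alleles contributes 0, so the measure counts concordant alleles rather than concordant genotypes.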
A subset of individuals was randomly extracted from each panel and used to generate recombination and error rate estimates for the corresponding sample. In the second step, these rates were used to impute all sample individuals across the entire reference panel. Imputation results were filtered at an RSQ_HAT threshold of 0.3 and a minor allele frequency threshold of 0.01.

6. Genetic association analysis

A) Additional sample filtering. Prior to genetic association testing, we removed additional samples that had passed all quality control filters described above but would nevertheless have caused problems in the interpretation of the results. These include: samples with missing gender information (ARIC=85), samples with different IDs that share >90% of their genome identity-by-descent (IBD) (ARIC=56; JHS=1), samples unlikely to be from African Americans based on principal component analysis results (ARIC=8; CARDIA=2), samples that have a high number of discordant genotypes at SNPs common to both the Affy6.0 platform and the ITMAT-BROAD-CARe (IBC) array (ARIC=3) [19], seven samples from the ARIC dataset that were also present in the JHS dataset based on IBD metrics, and participants who were younger than 18 years old at baseline (CARDIA=5; CFS=111). Thus, the following numbers of African-American participants were available for analysis: ARIC=2,830, CARDIA=949, CFS=521, JHS=2,144, and MESA=1,646 (Total N=8,090).

B) SNP association. We arbitrarily decided not to perform analysis on datasets with 10 or fewer cases for a given phenotype. This filter excluded CARDIA from the T2D analysis and CARDIA and MESA from the CHD analysis. The inflation factors for all completed analyses are presented in Supplementary Table 4.

i- Cohorts with unrelated samples.
For all cohorts but CFS, genome-wide association (GWA) analysis was performed in PLINK [13] using a linear regression model for quantitative traits (HDL, LDL, and smoking) and a logistic regression model for dichotomous outcomes (CHD, T2D, HTN), both under an additive genetic model. Trait modeling for LDL, HDL, and smoking is described above. For both linear and logistic models, we used as covariates the first ten principal components; for the logistic models, we also included gender, age, and age-squared in the model. For imputed genotypes, we used dosage information (i.e. a value between 0.0 and 2.0 calculated using the probability of each of the three possible genotypes) in the regression model implemented in PLINK.

ii- Cohorts with related samples. For CFS, we modeled the family structure in the association tests using linear mixed effects (LME) models for the quantitative traits (HDL, LDL, and smoking) and generalized estimating equations (GEE) for the dichotomous phenotypes (CHD, T2D, HTN). These statistical methods are implemented in R [21]. We tested an additive genetic model and included as covariates the first ten principal components. With the GEE, we also included as covariates gender and age; age-squared was not taken into account because it introduced a collinearity problem with the R routine. For imputed genotypes, we used dosage information (i.e. a value between 0.0 and 2.0 calculated using the probability of each of the three possible genotypes) in the regression model implemented in R routines.

Although the JHS has a small number of related individuals, extensive analyses showed that results were concordant using linear/logistic regression or the LME/GEE routines, after genomic control (data not shown). For simplicity, we opted to present the linear/logistic regression results for JHS in this article. We did not test genetic models other than the additive model in this genome-wide association study.

C) Meta-analysis.
Association results from the five CARe cohorts (ARIC, CARDIA, CFS, MESA, JHS) were combined using the inverse variance method, as implemented in the software METAL (http://www.sph.umich.edu/csg/abecasis/metal/) [22]. Individual study results were corrected using genomic control. Meta-analytic results were also scaled using genomic control. The Manhattan plots summarizing the meta-analysis results after double genomic control corrections are in Figure S2.

7. Admixture association

A) Creating a panel of Ancestry Informative Markers (AIMs). First, we describe a data resource that proved very valuable for much of the analytic work on CARe African Americans (AA). We obtained a substantial number of samples that proved excellent surrogates for the ancestral populations of the CARe AA. After careful curation, removing related pairs and some outlier samples (quality control carried out using EIGENSOFT [14]), we had access to data for 756 Nigerian (Yoruba) samples and 1,178 European American samples that had been collected for other purposes and made available to CARe. We thank Arti Tandon for having performed the quality control of these datasets. We also thank Drs. Richard Cooper and Phil de Jager (and their colleagues) for generously providing to CARe these African and European samples, respectively. We call these samples ‘parental’ samples, although obviously they are samples from populations only approximating the true parental populations of AA.

We then ran smartpca (part of EIGENSOFT) on the parental samples. As expected, the leading eigenvector reflects genetic differences between Europe and West Africa. We output the ‘SNP loadings’: the absolute value of the loading is a score for informativeness at each SNP (that is, how much a SNP contributes to a given eigenvector). We then ran a greedy algorithm to produce a list of SNPs all at least 0.5 cM apart from each other, chosen to have large SNP loadings.
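The text does not spell out the greedy algorithm. One plausible reading, sketched below in Python, is to visit SNPs in decreasing order of absolute loading and keep a SNP only if it lies at least 0.5 cM from every SNP already kept on the same chromosome. The `Snp` record, the function `select_aims`, and the example marker names are hypothetical and are not taken from EIGENSOFT or ANCESTRYMAP.

```python
from bisect import bisect_left, insort
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class Snp(NamedTuple):
    name: str       # marker identifier (hypothetical names below)
    chrom: str      # chromosome label
    cm: float       # genetic map position in centimorgans
    loading: float  # SNP loading on the leading eigenvector


def select_aims(snps: Iterable[Snp], min_gap_cm: float = 0.5) -> List[Snp]:
    """Greedily keep the most informative SNPs subject to a spacing rule."""
    chosen: List[Snp] = []
    kept_positions: Dict[str, List[float]] = defaultdict(list)  # sorted per chromosome
    # Most informative first: informativeness is |loading|
    for snp in sorted(snps, key=lambda s: abs(s.loading), reverse=True):
        kept = kept_positions[snp.chrom]
        idx = bisect_left(kept, snp.cm)
        # Only the nearest kept neighbours on each side need checking
        left_ok = idx == 0 or snp.cm - kept[idx - 1] >= min_gap_cm
        right_ok = idx == len(kept) or kept[idx] - snp.cm >= min_gap_cm
        if left_ok and right_ok:
            insort(kept, snp.cm)
            chosen.append(snp)
    return chosen


# Illustrative markers only (not study data)
markers = [
    Snp("snp_a", "1", 10.0, 0.90),
    Snp("snp_b", "1", 10.3, -0.95),
    Snp("snp_c", "1", 11.0, 0.50),
    Snp("snp_d", "2", 10.3, 0.20),
]
print([s.name for s in select_aims(markers)])
```

In this toy input, snp_a is dropped because it sits 0.3 cM from the more informative snp_b, while snp_d is kept because the spacing rule applies only within a chromosome.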
The greedy selection produced a list of 4,917 SNPs. We next carried out a careful check on pairs of markers for whether detectable linkage disequilibrium (LD) was present in the parental populations. This step is necessary because our admixture program ANCESTRYMAP [17] is sensitive to residual LD. Specifically, we checked all pairs of markers that were separated by less than 5 cM and computed a statistic X, χ²[1] distributed at random (that is, chi-square with one degree of freedom), for both the European and African parental samples. If the genetic distance between the pair is d cM, then we declared the pair to be in LD if X > 0.02d. For such a pair, we deleted the less informative SNP from our AIMs panel. This is a very stringent criterion, but it is justified, as deleting a few informative markers is a modest price to pay to reduce the likelihood of false positive association signals. This “LD pruning” step is a standard part of the preprocessing for ANCESTRYMAP [17] and has been carried out in exactly the same way in previous admixture scans [23-25]. The described approach produced a final set of 3,192 unlinked AIMs.

B) Local ancestry estimation. Our primary tool is ANCESTRYMAP [17], which can calculate: (1) local ancestry estimates on a mesh (default 1 cM, which we use) and (2) for each individual i, estimates of the global ancestry θi (proportion of European ancestors) and the number of generations to admixture (λi). Especially for large sample sets such as the CARe study, it is impractical to compute and store on disk local ancestry estimates for every SNP and individual. Instead, we use an interpolation procedure. Think of a chromosome as ordered from left to right, so that the leftmost marker has the smallest physical position on the human reference. We wish to calculate a local ancestry estimate for marker k, sample i. Find the nearest point to k on the left, L = L(k), such that the genetic distance |k − L(k)| > 2 cM. Similarly, we find the nearest point R = R(k) on the right.
Let α(x, i) be the posterior probability that sample i has x European chromosomes at L conditional on markers at L or leftwards. Let β(y, i) be the probability of the data at R or rightwards, conditional on sample i having y European chromosomes at R. These quantities are familiar from the standard theory of Hidden Markov Models; note that the definitions are not symmetric. Assuming that markers outside the interval (L, R) are unlinked to k, it is now simple to calculate the posterior distribution of the number of European chromosomes at k conditional on all data outside the interval (L, R), using the genetic distance between k, L, and R. In our applications, we also want the distribution of ancestry conditioned as above, together with the observed genotype at k. As parental allele frequencies are assumed known, this is readily computed using Bayes’ rule. These computations can be done ‘on the fly’ without external storage and are fast and efficient. Boundary conditions (such as L, R being ‘off the end’ of the chromosome) are handled correctly.

C) Association analysis using local ancestry estimates. We developed a new module in PLINK, called PLOCAL (Cameron Palmer and Joel N. Hirschhorn, pers. comm.), that can retrieve dynamically from the ANCESTRYMAP output files the estimates of local ancestry for each participant and for each SNP. In PLINK, we could test linear and logistic regression models with terms for SNP genotypes, local ancestry estimates, and the main ten principal components; in logistic models, we also add gender, age, and age-squared as covariates. To combine evidence from genotypes (e.g. the number of minor alleles) and estimates of local ancestry (e.g.
0-100% European ancestry) at each SNP for a given phenotype across the CARe African-American cohorts, we follow these steps at each SNP: (1) the P-value corresponding to the effect size (regression beta) estimated for the SNP term is combined across the five cohorts using a weighted Z-score method based on cohort sample size (see METAL above), with individual cohort results scaled using genomic control; (2) similarly, the P-value corresponding to the local ancestry term is combined across cohorts using the same meta-analysis method; and (3) for each SNP, the meta-analytic P-values for the SNP and local ancestry effect sizes are converted to Chi-square values and summed, and a Chi-square statistic with two degrees-of-freedom is obtained. Genomic control is again applied to the final association statistics. For these analyses, we ignore family structures and rely conservatively on genomic control to control for the inflation of the test statistics.

To illustrate the power of our approach in genetic association testing in African Americans, we simulated the following scenario: A genetic marker has an allele frequency of 50% in both European and African parental populations. This marker is in perfect linkage disequilibrium in Europeans with the causal allele, but in complete linkage equilibrium in Africans with the causal allele. The causal allele has a genetic relative risk of 2. The study design includes 2,000 African-American cases and controls. Under these assumptions, an analytical framework that ignores local ancestry has 4% power to identify the signal at a significance threshold of P = 1×10⁻⁶. In comparison, our strategy, which utilizes allelic and local ancestry information, has 96% power to detect the same signal using the same statistical threshold.

8. References

1. Friedewald WT, Levy RI, Fredrickson DS (1972) Estimation of the concentration of low-density lipoprotein in plasma, without use of preparative ultracentrifuge. Clin Chem 18: 499-502.
2.
Friedman GD, Cutter GR, Donahue RP, Hughes GH, Hulley SB, et al. (1988) CARDIA: study design, recruitment, and some characteristics of the examined subjects. J Clin Epidemiol 41: 1105-1116.
3. Warnick GR (1986) Enzymatic methods for quantification of lipoprotein lipids. Methods Enzymol 129: 101-123.
4. Taylor HA, Jr., Wilson JG, Jones DW, Sarpong DF, Srinivasan A, et al. (2005) Toward resolution of cardiovascular health disparities in African Americans: design and methods of the Jackson Heart Study. Ethn Dis 15: S6-4-17.
5. Kolonel LN, Henderson BE, Hankin JH, Nomura AM, Wilkens LR, et al. (2000) A multiethnic cohort in Hawaii and Los Angeles: baseline characteristics. Am J Epidemiol 151: 346-357.
6. Helgadottir A, Manolescu A, Helgason A, Thorleifsson G, Thorsteinsdottir U, et al. (2006) A variant of the gene encoding leukotriene A4 hydrolase confers ethnicity-specific risk of myocardial infarction. Nat Genet 38: 68-74.
7. Helgadottir A, Thorleifsson G, Magnusson KP, Gretarsdottir S, Steinthorsdottir V, et al. (2008) The same sequence variant on 9p21 associates with myocardial infarction, abdominal aortic aneurysm and intracranial aneurysm. Nat Genet 40: 217-224.
8. Lehrke M, Millington SC, Lefterova M, Cumaranatunge RG, Szapary P, et al. (2007) CXCL16 is a marker of inflammation, atherosclerosis, and acute coronary syndromes in humans. J Am Coll Cardiol 49: 442-449.
9. Cooper R, Rotimi C, Ataman S, McGee D, Osotimehin B, et al. (1997) The prevalence of hypertension in seven populations of west African origin. Am J Public Health 87: 160-168.
10. Kapur NK, Musunuru K (2008) Clinical efficacy and safety of statins in managing cardiovascular risk. Vasc Health Risk Manag 4: 341-353.
11. Korn JM, Kuruvilla FG, McCarroll SA, Wysoker A, Nemesh J, et al. (2008) Integrated genotype calling and association analysis of SNPs, common copy number polymorphisms and rare CNVs. Nat Genet 40: 1253-1260.
12. McCarroll SA, Kuruvilla FG, Korn JM, Cawley S, Nemesh J, et al.
(2008) Integrated detection and population-genetic analysis of SNPs and copy number variation. Nat Genet 40: 1166-1174.
13. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, et al. (2007) PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet 81: 559-575.
14. Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, et al. (2006) Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet 38: 904-909.
15. McPeek MS, Sun L (2000) Statistical tests for detection of misspecified relationships by use of genome-screen data. Am J Hum Genet 66: 1076-1094.
16. Musunuru K, Lettre G, Young T, Farlow DN, Pirrucello JP, et al. (2010) Candidate Gene Association Resource (CARe): Design, Methods, and Proof of Concept. Circulation Cardiovascular Genetics In press.
17. Patterson N, Hattangadi N, Lane B, Lohmueller KE, Hafler DA, et al. (2004) Methods for high-density admixture mapping of disease genes. Am J Hum Genet 74: 979-1000.
18. Pritchard JK, Stephens M, Donnelly P (2000) Inference of population structure using multilocus genotype data. Genetics 155: 945-959.
19. Keating BJ, Tischfield S, Murray SS, Bhangale T, Price TS, et al. (2008) Concept, design and implementation of a cardiovascular gene-centric 50 k SNP array for large-scale genomic association studies. PLoS One 3: e3583.
20. Huang L, Li Y, Singleton AB, Hardy JA, Abecasis G, et al. (2009) Genotype-imputation accuracy across worldwide human populations. Am J Hum Genet 84: 235-250.
21. Chen MH, Yang Q (2009) GWAF: an R package for genome-wide association analyses with family data. Bioinformatics.
22. Willer CJ, Li Y, Abecasis GR (2010) METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics 26: 2190-2191.
23. Kao WH, Klag MJ, Meoni LA, Reich D, Berthier-Schaad Y, et al.
(2008) MYH9 is associated with nondiabetic end-stage renal disease in African Americans. Nat Genet 40: 1185-1192.
24. Reich D, Nalls MA, Kao WH, Akylbekova EL, Tandon A, et al. (2009) Reduced neutrophil count in people of African descent is due to a regulatory variant in the Duffy antigen receptor for chemokines gene. PLoS Genet 5: e1000360.
25. Reich D, Patterson N, De Jager PL, McDonald GJ, Waliszewska A, et al. (2005) A whole-genome admixture scan finds a candidate locus for multiple sclerosis susceptibility. Nat Genet 37: 1113-1118.
26. Pruim RJ, Welch RP, Sanna S, Teslovich TM, Chines PS, et al. (2010) LocusZoom: regional visualization of genome-wide association scan results. Bioinformatics 26: 2336-2337.
27. Kathiresan S, Voight BF, Purcell S, Musunuru K, Ardissino D, et al. (2009) Genome-wide association of early-onset myocardial infarction with single nucleotide polymorphisms and copy number variants. Nat Genet 41: 334-341.
28. Kathiresan S, Willer CJ, Peloso GM, Demissie S, Musunuru K, et al. (2009) Common variants at 30 loci contribute to polygenic dyslipidemia. Nat Genet 41: 56-65.

9. Supplementary Figure Legends

Figure S1. Plots of the two main principal components (PC) in the CARe African-American samples. European-American and Nigerian samples are used as reference populations. We note that the first principal component (PC1) captures European vs. African global ancestry. For CFS, outliers on PC2 all belong to the same large family.

Figure S2. Manhattan plots summarizing the meta-analysis results for the six phenotypes analyzed after double genomic control scaling. The dashed line highlights genome-wide significance (P-value = 5×10⁻⁸). The genome-wide significant loci include 8p23 (chr8), LPL (chr8), LIPC (chr15), LCAT (chr16), and CETP (chr16) for HDL-C, SLC12A9 (chr7) for hypertension, and PCSK9 (chr1), CELSR2-PSRC1-SORT1 (chr1), and APOE (chr19) for LDL-C.

Figure S3.
Graphical representation of the information summarized in Table S12. Plots were drawn using LocusZoom [26]. CHD association results in Caucasians are from Kathiresan et al. [27]. HDL-C association results in Caucasians are from Kathiresan et al. [28]. Under each plot, the light blue box corresponds to the genomic interval flanked by the leftmost and rightmost SNPs that meet the r² threshold of 0.3 with the index SNPs (purple diamond). For the results in Caucasians, we used LD based on HapMap CEU, and for the results in African Americans, LD based on HapMap YRI. For the PLTP and ABCA1 loci, the CARe SNPs (respectively rs6065904 and rs13284054) define genomic intervals of 0.2 kb using the r² threshold of 0.3, which appear as light blue lines on the plots.

Figure S4. Quantile-quantile (QQ) plots of the meta-analysis results that take into account local ancestry estimates and SNP genotypes (Chi-square with two degrees-of-freedom; N=8,090). Each black circle represents an observed statistic for genotyped SNPs only (defined as the –log10(P-value)) against the corresponding expected statistic. The grey area corresponds to the 90% confidence intervals calculated empirically using permutations. The meta-analysis inflation factors are: coronary heart disease (s=0.923), HDL-C (s=1.030), hypertension (s=1.121), LDL-C (s=1.290), smoking (s=1.060), and type-2 diabetes (s=1.109). Data shown are genomic-control corrected before (for each study) and after the meta-analysis.

Figure S5. Manhattan plots summarizing the meta-analysis results that take into account local ancestry estimates and SNP genotypes (two degrees-of-freedom). Results are shown for the six phenotypes analyzed after double genomic control scaling. The dashed line highlights genome-wide significance (P-value = 5×10⁻⁸). The genome-wide significant loci include LCAT (chr16) and CETP (chr16) for HDL-C and PCSK9 (chr1), CELSR2-PSRC1-SORT1 (chr1), 2p24 (chr2), and APOE (chr19) for LDL-C.

Figure S6.
Comparison of P-values (-log10 scale) for the meta-analysis results obtained using SNP genotype only (x-axis) or SNP genotype + estimate of local ancestry (y-axis) to compute the test statistics. Each black circle corresponds to a SNP. In total, results from ~885,000 genotyped SNPs were available for each method. The gray line represents perfect correlation (x=y). The horizontal and vertical dashed lines represent the pre-defined threshold for genome-wide significance (P-value = 5×10⁻⁸).
