SUPPLEMENTARY MATERIAL

Detecting evidence of positive selection across Asia with sparse genotype data from the HUGO Pan-Asian SNP Consortium

Xuanyao Liu, Woei-Yuh Saw, Mohammad Ali, Rick Twee-Hee Ong, Yik-Ying Teo

CONTENTS
1 Supplementary Methods
1.1 Quantifying over-representation of height genes
2 Supplementary figures
3 Supplementary tables
4 References

1 Supplementary Methods

1.1 Quantifying over-representation of height genes

Of the 59 genomic regions identified by haploPS as positively selected in the 31 PASNP population groupings, 30 regions, containing a total of 3,518 genes, possessed at least one height-associated gene. Because height has been studied extensively in genome-wide association studies (GWAS), including studies involving hundreds of thousands of samples, we evaluated whether height-related genes were over-represented among the positively selected regions in the PASNP populations. As of 24 June 2013, 279 genes were reported to be associated with height in the NHGRI GWAS catalogue¹, against a baseline of 28,906 genes on the autosomal chromosomes of the human genome. The 30 positively selected regions contained 58 height-related genes, and a one-sided binomial test of this over-representation yielded a p-value of 9.98 × 10⁻⁵.

2 Supplementary figures

Supplementary Figure 1. Evidence of positive selection by iHS in HGDP populations
Evidence of positive selection by iHS and XP-EHH on chromosome 2 for the populations in the Human Genome Diversity Project. The figures were obtained from the HGDP Selection Browser maintained by the Pritchard Lab at http://hgdp.uchicago.edu/cgi-bin/gbrowse/HGDP/ using input coordinates of Chr2:196,841,741..197,997,071.

Supplementary Figure 2.
Selected haplotype forms in 12 PASNP population groupings
HaploPS identified extended haplotypes showing evidence of positive selection on chromosome 2 between 196.8 Mb and 198.0 Mb in 12 of the 31 PASNP population groupings. The selected haplotypes, extracted at frequencies between 45% and 85% in the respective populations, were perfectly identical and yielded a haplotype similarity index (HSI) of 1.00. We therefore infer that the selection signals likely stem from the same evolutionary event, prior to the divergence of the populations.

Supplementary Figure 3. HaploPS evidence around the HBB locus
The horizontal axis of each panel shows the genetic distance in centimorgans spanned by the longest haplotype at 10% frequency across the genome, while the vertical axis shows the number of SNPs spanned by the corresponding haplotype. Thailand indigenous 1 refers to the PASNP populations from Thailand of H’Tin, Mlabri, Plang, Karen and Lawa ethnicities, together with the China Wa ethnic group; Thailand indigenous 2 refers to the PASNP populations from Thailand of Tai Lue, Tai Yong, Tai Kern and Tai Yuan ethnicities.

3 Supplementary tables

Supplementary Table 1. Labeling and characteristics of the populations in PASNP. This table is adapted from Figure 1 in the original PASNP publication².
Groupings, in table order (each grouping spans one or more consecutive rows): China Group 1, China Group 2, China Group 3, China Group 4, China Han, Indonesia Group 1, Indonesia Group 2, Indonesia Group 3, Indonesia Group 4, Indonesia Group 5, Indonesia Group 6, India Group 1, India Group 2, India Group 3, India Group 4, Japan Main, Japan Okinawa, Korean, Malaysia Group 1, Malaysia Group 2, Malaysia Negrito, Philippines Group 1, Philippines Negrito, Singapore Chinese, Taiwan Indigenous, Taiwan Main, Thailand Group 1, Thailand Group 2, Thailand Group 3, Thailand Group 4, Thailand Group 5

Label   Country      Ethnicity     Language      # samples
CN-GA   China        Han           Cantonese     30
CN-HM   China        Hmong         Hmong         26
CN-JI   China        Jiamao        Jiamao        31
CN-CC   China        Zhuang        Zhuang        26
CN-UG   China        Uyghur        Uyghur        26
CN-SH   China        Han           Chinese       21
CHB     China        Han           Chinese       45
ID-SB   Indonesia    Kambera       Kambera       20
ID-RA   Indonesia    Manggarai     Manggarai     17
ID-SO   Indonesia    Manggarai     Manggarai     19
ID-LA   Indonesia    Lamaholot     Lamaholot     20
ID-LE   Indonesia    Lembata       Lembata       19
ID-AL   Indonesia    Alorese       Alor          19
AX-ME   Pacific      Melanesian    Nasioi        5
ID-TR   Indonesia    Toraja        Toraja        20
ID-MT   Indonesia    Mentawai      Mentawai      15
ID-ML   Indonesia    Malay         Malay         12
ID-KR   Indonesia    Batak Karo    Batak Karo    17
ID-TB   Indonesia    Batak         Batak Toba    20
ID-DY   Indonesia    Dayak         Benuak        12
ID-SU   Indonesia    Sundanese     Sunda         25
ID-JA   Indonesia    Javanese      Javanese      34
ID-JV   Indonesia    Javanese      Javanese      19
MY-BD   Malaysia     Bidayuh       Jagoi         50
IN-NI   India        Tharu         Pahari        20
IN-TB   India        Ladakhi       Spiti         23
IN-DR   India        Upper Caste   Telugu        24
SG-ID   Singapore    India Origin  Tamil         30
IN-WI   India        Bhil          Bhili         25
IN-EL   India        Upper Caste   Bengali       16
IN-SP   India        Upper Caste   Hindi         23
IN-WL   India        Upper Caste   Marathi       14
IN-IL   India        Upper Caste   Hindi         15
IN-NL   India        Upper Caste   Hindi         15
JP-ML   Japan        Japanese      Japanese      71
JPT     Japan        Japanese      Japanese      44
JP-RK   Japan        Ryukyuan      Okinawan      49
KR-KR   Korea        Korean        Korean        90
MY-KN   Malaysia     Malay         Malay         30
MY-MN   Malaysia     Malay         Minangkabau   20
SG-MY   Singapore    Malay         Malay         18
MY-TM   Malaysia     Proto-Malay   Temuan        49
MY-JH   Malaysia     Negrito       Jehai         50
MY-KS   Malaysia     Negrito       Kensiu        30
PI-MA   Philippines  Manobo        Manobo        18
PI-UI   Philippines  Urban         Visaya        20
PI-UN   Philippines  Urban         Tagalog       19
PI-UB   Philippines  Urban         Ilocano       20
PI-AT   Philippines  Negrito       Ati           23
PI-IR   Philippines  Negrito       Iraya         9
PI-MW   Philippines  Negrito       Mamanwa       19
PI-AG   Philippines  Negrito       Agta          8
PI-AE   Philippines  Negrito       Aeta          8
SG-CH   Singapore    Han           MinNan        30
AX-AM   Taiwan       Ami           Ami           10
AX-AT   Taiwan       Atayal        Atayal        10
TW-HA   Taiwan       Han           Hakka         32
TW-YA   Taiwan       Han           MinNan        48
TH-HM   Thailand     Hmong         Hmong         20
TH-YA   Thailand     Yao           Iu-Mien       19
TH-TY   Thailand     Tai Yong      Tai Yong      18
TH-TL   Thailand     Tai Lue       Lue           20
TH-TK   Thailand     Tai Kern      Tai Kern      18
TH-TU   Thailand     Tai Yuan      Tai Yuan      20
TH-TN   Thailand     H’Tin         Mal           18
TH-MA   Thailand     Mlabri        Mlabri        18
TH-PP   Thailand     Plang         Blang         18
TH-KA   Thailand     Karen         Karen         20
TH-LW   Thailand     Lawa          Lawa          19
CN-WA   China        Wa            Wa            56
TH-PL   Thailand     Palong        Palong        18
CN-JN   China        Jinuo         Jinuo         29
TH-MO   Thailand     Mon           Mon           19

Supplementary Table 2. Regions identified by haploPS to be positively selected in the 31 PASNP population groupings. The start and end positions of each region are given in NCBI Build 36 coordinates.
8 chr 1 startpos 31,628,061 endpos 59,351,170 pop_names Thailand_Group5 freq_all 0.05 1 63,039,308 76,275,854 0.05 1 1 1 145,720,880 148,360,642 156,412,238 158,989,174 170,718,710 173,602,174 Malaysia_Negritos Philippine_Negritos Indonesia_Group1 Indonesia_Group2 Taiwan_Indigenous Taiwan_Indigenous 1 2 181,292,574 197,417,076 8,624,853 9,653,517 Malaysia_Negritos 0.05 Japan_Okinawa Korean 0.9 0.8 2 16,676,509 18,526,577 0.25 2 42,409,140 48,888,233 Malaysia_Negritos Malaysia_Group2 India_Group2 Japan_Okinawa 2 84,292,129 85,819,280 2 108,085,215 109,051,831 2 2 113,751,097 133,237,701 166,064,007 169,579,101 haploPS_start haploPS_end haploPS_pop 50,442,680 57,644,365 CHB, CHD, JPT, MAS CHB, CHD, CHS, JPT, 64,112,389 76,257,415 MAS 0.85 0.75 0.65 145,830,625 0.2 NA 0.35 170,912,521 149,622,482 NA 171,363,586 CHB NA CHB, CHD, JPT NA 9,270,479 NA 9,654,635 17,314,637 17,897,251 NA CHB, CHD, CHS, JPT CHB, CHD, CHS, JPT, MAS 0.75 0.05 0.75 43,282,562 43,958,429 Singapore_Chinese China_Group2 China_Han 0.7 84,222,108 84,995,378 0.85 0.9 108,286,633 108,910,484 0.05 0.1 118,670,171 NA 131,247,669 NA 0.45 0.6 0.55 0.85 0.75 0.55 0.5 0.65 0.45 0.8 0.5 0.65 196,982,600 197,749,763 top genes identified by Fst ZMYM6, PIK3R3 DNAJC6,SLC44A5 NA FCRL1,CD5L FMO6P,FMO2 RGL1,EDEM3,HMCN1,DKFZp 762L185 NA NA CHD, CHS, JPT, MAS CHB, CHD, CHS, JPT, MAS NA CHB, CHD, CHS, JPT CHB, CHD, CHS, JPT, MAS NA SULT1C3 NA 2 196,841,741 197,997,071 China_Group2 Thailand_Group1 Indonesia_Group1 Malaysia_Group2 Malaysia_Group1 Indonesia_Group5 Indonesia_Group2 Thailand_Group3 India_Group2 Indonesia_Group3 Indonesia_Group6 Thailand_Group1 Philippine_Group1 Japan_Main 2 208,624,365 213,556,972 Thailand_Group4 0.1 208,754,345 213,267,474 3 39,148,534 55,438,612 0.05 44,121,527 49,710,750 3 56,851,823 73,194,663 0.05 0.05 58,038,371 73,054,439 3 3 103,332,218 123,321,009 126,537,059 134,498,594 Thailand_Group1 China_Group2 Thailand_Group1 Thailand_Group4 Japan_Okinawa China_Group2 CEU India_Group2 
Indonesia_Group6 China_Group1 Philippine_Group1 Thailand_Group3 Philippine_Negritos Taiwan_Indigenous Philippine_Negritos Indonesia_Group5 0.05 0.05 103,385,279 0.1 NA 121,909,440 NA JPT CHB, CHD, CHS, JPT, MAS NA 0.5 0.8 0.8 0.8 0.8 0.05 0.05 0.05 32,843,099 135,551,435 NA NA 34,025,092 135,768,238 NA NA CHB, CHD, CHS, GIH, INS, JPT, MAS CHB, JPT, MAS NA NA NA NA VEGFC,IRF2 NA 0.1 0.05 56,766,427 56,785,344 CHS NA 4 4 4 5 32,709,603 134,211,631 175,225,231 5,472,363 34,377,130 142,886,432 187,552,805 13,945,346 5 50,304,726 83,541,796 Supplementary Table 2 continued. 9 CHB, CHD, CHS, GIH, INS, JPT, MAS CHB, CHD, CHS, JPT, MAS CHB, CHD, CHS, JPT, MAS NA NA NA ERBB4 NISCH CADPS BC035247 COL29A1,CPNE4 5 88,817,624 109,333,621 5 117,327,664 117,922,801 5 144,740,153 145,341,692 6 6 17,538,157 47,152,974 38,019,008 52,441,050 6 6 6 7 7 7 65,310,924 138,951,795 151,793,601 8,149,845 13,708,216 79,973,591 93,068,220 143,385,463 161,155,132 12,656,000 25,381,591 97,825,437 7 8 8 98,319,021 133,654,097 53,312,825 69,718,665 103,015,890 116,977,822 8 9 9 118,469,843 136,264,032 1,409,638 1,608,419 78,984,573 86,304,688 10 20,686,445 30,470,812 10 43,089,450 72,592,373 10 10 11 12 12 13 90,013,013 109,273,134 2,745,638 12,334,440 97,309,698 31,666,843 108,886,554 110,380,402 7,732,438 24,752,300 105,781,266 38,937,812 13 56,834,700 70,731,938 13 79,516,916 91,765,949 14 44,308,706 64,031,232 14 15 75,475,877 25,947,855 89,115,878 27,654,486 15 18 36,269,405 55,392,483 67,966,469 64,773,274 19 8,214,446 13,425,865 20 21 36,150,356 22,287,335 59,287,333 26,182,624 Indonesia_Group5 China_Group2 Thailand_Group5 Malaysia_Group1 Thailand_Group4 Thailand_Group3 China_Group1 China_Han Philippine_Negritos Taiwan_Indigenous Malaysia_Group2 India_Group1 Taiwan_Indigenous Malaysia_Negritos Thailand_Group5 Thailand_Group4 Taiwan_Indigenous Thailand_Group3 Indonesia_Group5 Thailand_Group1 Malaysia_Negritos Thailand_Group5 Malaysia_Group2 Indonesia_Group5 Taiwan_Indigenous Malaysia_Group2 
Thailand_Group4 Indonesia_Group3 Indonesia_Group3 Malaysia_Group2 Philippine_Negritos Indonesia_Group5 Taiwan_Indigenous Thailand_Group3 Taiwan_Indigenous China_Group2 Taiwan_Indigenous China_Group2 Thailand_Group5 Malaysia_Group2 Malaysia_Negritos China_Group2 Philippine_Negritos Malaysia_Negritos Indonesia_Group6 Indonesia_Group3 Thailand_Group1 Taiwan_Indigenous Philippine_Negritos Malaysia_Group2 Indonesia_Group4 Malaysia_Group2 Indonesia_Group5 Thailand_Group3 China_Group2 China_Group1 Korean Japan_Main Thailand_Group5 Indonesia_Group3 Philippine_Negritos 0.1 0.05 NA NA NA RGMB,BC042169,CHD1,SLCO 6A1 0.9 0.9 0.9 0.9 0.9 0.9 117,373,324 117,701,041 CHB, CHD, CHS, JPT, MAS NA 0.8 0.85 NA NA NA 0.05 0.05 0.15 26,232,282 0.15 NA 35,693,077 NA 0.05 0.05 0.1 0.1 0.05 0.05 0.05 0.05 88,878,455 NA NA NA NA NA 69,850,800 NA NA NA NA NA CHB, CHD, CHS, GIH, INS, JPT, MAS RNF144B,SCGN,HLADRB5,HLADRB6,COL11A2,RXRB,SLC39 A7 OPN5,PKHD1 MAS NA NA NA NA NA CHB, CHD, CHS, JPT, MAS MAS CHS BAI3 NA NOX3 THSD7A NA NA NA NA NA 0.15 0.05 111,910,801 0.05 66,841,775 0.1 111,526,351 131,264,503 67,126,543 111,757,796 0.05 0.05 129,601,852 0.9 NA 0.1 NA 129,666,059 NA NA 0.05 22,715,863 24,435,153 0.05 0.05 55,537,663 65,892,784 0.05 0.05 0.4 0.1 0.05 0.1 0.1 92,947,202 NA NA NA 98,739,698 NA 107,547,879 NA NA NA 99,296,032 NA CHB, CHD, JPT NA NA CHB, CHD, CHS, JPT, MAS CHB, CHD, CHS, JPT, MAS CHB, CHD, CHS, JPT, MAS NA NA NA CHB, CHD, CHS, JPT NA 0.1 0.05 60,292,838 60,493,599 JPT, MAS DIAPH3 0.1 0.1 0.05 NA NA NA NA 0.05 0.1 0.1 49,021,100 49,473,932 JPT C14orf106,BMP4 0.05 0.1 0.85 76,948,436 NA 77,059,691 NA CHS NA KIAA0743,NRXN3 NA 62,904,700 NA CHB, CHD, CHS, JPT, MAS NA NA LOC390858 NA NA NA 52,648,939 NA CHD, JPT NA GCNT7,C20orf85 NA 0.05 0.05 0.05 40,233,690 0.05 NA 0.85 0.8 0.85 NA 0.05 0.05 52,491,840 0.1 NA 10 C7orf58,EU233817,PLXNA4 NA NA KIAA1217,PRINS CSGALNACT2,PRKG1 BC035398 GRIN2B,CR623725 STARD13,DCLK1 4 References 1. Hindorff, L.A. et al. 
Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc Natl Acad Sci U S A 106, 9362-7 (2009).
2. Abdulla, M.A. et al. Mapping human genetic diversity in Asia. Science 326, 1541-5 (2009).
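
The over-representation test described in Section 1.1 can be approximated with a short calculation. The sketch below is an illustration under stated assumptions, not the authors' exact procedure: it assumes that each of the 3,518 genes in the 30 regions is treated as an independent trial with success probability 279/28,906, and that the one-sided p-value is P(X ≥ 58). The function name `binom_upper_tail` and the constant names are hypothetical and chosen only for this example.

```python
import math


def binom_upper_tail(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p).

    Each term is computed in log space and then exponentiated,
    which avoids overflow in the binomial coefficients.
    """
    log_p = math.log(p)
    log_q = math.log1p(-p)
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (
            math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * log_p + (n - i) * log_q
        )
        total += math.exp(log_pmf)
    return total


# Figures quoted in Section 1.1.
HEIGHT_GENES_GENOME = 279      # height-associated genes in the GWAS catalogue
AUTOSOMAL_GENES = 28_906       # baseline number of autosomal genes
GENES_IN_REGIONS = 3_518       # genes in the 30 selected regions
HEIGHT_GENES_IN_REGIONS = 58   # height-related genes among them

# Assumption: the null success probability is the genome-wide height-gene rate.
background_rate = HEIGHT_GENES_GENOME / AUTOSOMAL_GENES
expected = GENES_IN_REGIONS * background_rate
p_value = binom_upper_tail(HEIGHT_GENES_IN_REGIONS, GENES_IN_REGIONS, background_rate)

print(f"background rate       = {background_rate:.5f}")
print(f"expected height genes = {expected:.1f}")
print(f"one-sided p-value     = {p_value:.3g}")
```

Under these assumptions, about 34 height-related genes would be expected by chance against the 58 observed, and the resulting tail probability should be of the same order as the reported 9.98 × 10⁻⁵. Any difference from the reported value would indicate that the original test was set up differently from this sketch.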