Supplementary Fig. S1. A long PCR scheme for retrieving the missing sequence at a breakpoint in a presumed misassembled region at the end of a pseudomolecule.

Supplementary Fig. S2. Diversity along soybean chromosomes 1 to 20. The horizontal axes are Mbp coordinates along the Williams 82 reference genome, and the approximate centromere positions proposed by Schmutz et al. (2010) are denoted by thick arrows. Shown for each chromosome are: the relationship between physical and genetic positions (cM, black line) and the corresponding recombination rates (cM/Mb, red line) calculated from 100-kb sliding windows for the genomic regions between adjacent markers (top panel); numbers of genes per 100 kb (black line) and numbers of transposable elements (TE) per 100 kb (middle panel); and numbers of single nucleotide polymorphism (SNP) sites per 100 kb (left y axis) and the percentage of IT182932 SNPs that are shared with Hwangkeum (bottom panel). In the top panel, regions that are discrepant between the genetic and sequence-based physical maps are denoted by discontinuities. In the bottom panel, green lines represent Williams 82K, blue lines IT182932, red lines Hwangkeum, and black lines % shared SNPs.

Supplementary Fig. S3. Single nucleotide divergence between soybean variants. The approximately 3.2 million single nucleotide differences among three soybean accessions, which were identified by aligning short reads against the Williams 82 reference genome sequence, were classified into shared and unique variations. The overlapping regions represent variations shared between two accessions or among all three.

Supplementary Table S2.
Distribution of diversity or polymorphism information content (PIC) values from genotyping 258 indel markers on a diverse set of 12 soybean variants

| PIC value | Number of indel markers |
|---|---|
| 0.1531 | 138 |
| 0.2779 | 27 |
| 0.2918 | 5 |
| 0.3750 | 20 |
| 0.3775 | 3 |
| 0.4029 | 4 |
| 0.4449 | 22 |
| 0.4862 | 16 |
| 0.4866 | 5 |
| 0.5000 | 6 |
| 0.5418 | 4 |
| 0.5695 | 1 |
| 0.5696 | 2 |
| 0.6112 | 2 |
| 0.6528 | 1 |
| 0.7085 | 1 |
| 0.7502 | 1 |

Supplementary Table S3.
List of indel markers in marker intervals corresponding to putative introgression regions from G. soja in Hwangkeum (Glycine max), where more than half of the indel markers polymorphic between Williams 82 (G. max) and IT182932 (G. soja) were shown to be monomorphic between Hwangkeum and IT182932

| Indel marker in marker-sparse interval | Genome sequence position | Polymorphism or monomorphism between Hwangkeum and IT182932 | Forward primer | Reverse primer | Allele size in Williams 82 |
|---|---|---|---|---|---|
| GMES0924 to Satt631 (Gm03: 1310955..2916475) | | | | | |
| GSINDEL19600 | 1516234 | Polymorphism | GAATTGTATTCTGAAACAGC | CACTCAAATCCTACGTTTAC | 161 |
| GSINDEL19661 | 1684062 | Monomorphism | TTTACTATGGCATGATTTCT | TTCACCTAGGTAATTTTGAA | 143 |
| GSINDEL19680 | 1742470 | Monomorphism | TTACATTGTTCAATCCTACC | TTTTCTTCTTGCCTTTAGTA | 152 |
| GSINDEL19728 | 1908133 | Monomorphism | GGATTTTTCAATTGATTTTA | TCATCTCTCTCCTAACAGAA | 115 |
| GSINDEL19743 | 1942952 | Monomorphism | TGCATTCCAATACTATTACC | AGTGATTTATGCTTTTTCAC | 116 |
| GSINDEL19792 | 2045321 | Polymorphism | GTACTTCCATTAAAACATGC | ATGCTTTTGTTGTTGATTAT | 243 |
| GSINDEL19902 | 2324329 | Monomorphism | GTCCTCTGAACAATAAACTG | TACACCGATTCCTTTAAATA | 216 |
| GSINDEL19990 | 2558325 | Polymorphism | AATGGTTCACAAAACTTAGA | TATCACAGAAGAAGAGGCTA | 248 |
| Satt316 to Sca-364a (Gm06: 47485161..48661496) | | | | | |
| GSINDEL56203 | 47572099 | Monomorphism | AAATAAGCAATAGGCACTAA | GTTTTTAATTATGAGGCAAA | 130 |
| GSINDEL56300 | 47798156 | Monomorphism | ATACGTGGCAATAGTATGAT | CAAGATTTTGAGTTAAGGTG | 122 |
| GSINDEL56311 | 47820269 | Monomorphism | TGAGAAATCGTTTATTTCAT | CTTTGTTTTTCTTAAGGTGA | 179 |
| GSINDEL56321 | 47837905 | Polymorphism | CTTGTTTTTGTTGATTCTTC | TTAACCTATTTTCTGTCCAA | 200 |
| GSINDEL56371 | 47956076 | Monomorphism | TATTTCTTTTGAAACAGACC | TACTCTTCCCTTTGTCATTA | 180 |
| GSINDEL56385 | 47986268 | Monomorphism | GAATAAGAAAGAGAGGAAGC | TAGGGGAAAATGAAGACTA | 177 |
| GSINDEL56400 | 48014768 | Monomorphism | ATACATTTCATTTCATCCAG | AAGTTTCACGTCAGTTAAAA | 160 |
| GSINDEL56414 | 48074822 | Monomorphism | TGACAACTAAAATGACAATAA | CATTTGACATTGCTATTATG | 132 |
| GSINDEL56439 | 48178039 | Monomorphism | GATCCAACTTACCATAATCA | TCAAAAATAAAATGGAGTGT | 198 |
| GSINDEL56531 | 48415540 | Polymorphism | TCTTCAATTCCGAATACTAA | ATATATCAACGAAATGCTTC | 161 |
| GMES1600 to GMES6736 (Gm09: 39751797..41849066) | | | | | |
| GSINDEL84893 | 40031613 | Polymorphism | CAATTTTTAAACAAGCTCAA | AGTCTTTTCATGTTATGCAC | 195 |
| GSINDEL85007 | 40364072 | Monomorphism | ACCAGCAACACATTATTTAT | TGCTGAACTGTCTTCTACTT | 210 |
| GSINDEL85023 | 40412092 | Monomorphism | GTAACACGACACAAACTTCT | GAACAAAATGAAAATATGCT | 163 |
| GSINDEL85030 | 40419277 | Monomorphism | GAATGAATGAATGTTTGTTT | GGTAGTGAATTACAACCAAG | 130 |
| GSINDEL85048 | 40498520 | Polymorphism | GTAAGGACTAAGGATAAAGC | CTTTCAAGCTGGATTTGAC | 204 |
| GSINDEL85121 | 40623731 | Monomorphism | ACTGTGTTGTTAGCATTTTT | CCAACTCGTCAACTCTATT | 187 |
| GSINDEL85147 | 40673473 | Monomorphism | AAAGAGTTGCATTACAAGAG | CTTCCCTTTTCTTCTTTTAT | 134 |
| GSINDEL85155 | 40694764 | Monomorphism | TGCCATATCTTATCTTTTGT | GGACTGTGTACTTGATAGGA | 141 |
| GSINDEL85193 | 40763413 | Monomorphism | GACTCTTCTTCTGTCTCCTC | TTTTAATTGGGTGAGAGTAA | 169 |
| GSINDEL85269 | 41096510 | Monomorphism | TATGCTCGTACTGAGATTTT | GAGAGTGATCCATTCAAAG | 232 |
| GSINDEL85271 | 41103592 | Monomorphism | ACTCAGGAGATTCTTGAAAT | GCTAGTCAATTGGAAACAT | 134 |
| GSINDEL85272 | 41109897 | Monomorphism | TATACACCGAGCTTAATAGG | AAGACCTTCAGTACAGTTCA | 137 |
| GSINDEL85282 | 41140003 | Monomorphism | ATTTACCATGAGCAGATTTA | CTTGGTCCAATCTTAGTGT | 232 |
| GSINDEL85324 | 41280723 | Monomorphism | GTGCCACTTATGTGAGATAC | AAAAACTTTGATATTGTGGA | 130 |
| GSINDEL85332 | 41294386 | Polymorphism | TATACACAAAGTTGCACAAA | ATGTCACTCAAAATAGATGC | 179 |
| GSINDEL85333 | 41295818 | Monomorphism | GAGGGGATATCTGTGTATCT | CTTCACTTGGTGATAGAGAA | 244 |
| GSINDEL85335 | 41297158 | Polymorphism | ACGTGAAAAGTGTCTCTAAA | TTCATCTTCTCCTTTTCATA | 216 |
| GSINDEL85359 | 41384266 | Monomorphism | ATTGAAGAGTCCTCTACCTC | GTAGCTAGCATTTCAAGAAG | 183 |
| GSINDEL85435 | 41537237 | Polymorphism | CCCATGACTCTTATCTCATA | GATACTTGGGAAGAGAAAGT | 229 |
| A203.p1 to Sca_189b (Gm15: 7469277..8817679) | | | | | |
| GSINDEL136839 | 7777973 | Monomorphism | AAAAGAGTGCATAATGATTT | ATTTCCAAGATTTTTCTTTT | 117 |
| GSINDEL136934 | 8075682 | Polymorphism | TCTCAAAATAAAAATGGAAG | TTATCAAATAACAAGGGAAT | 131 |
| GSINDEL136982 | 8167899 | Monomorphism | ACAAATCCAGCAAACTATA | CTTAGGAAATTCATTTGATG | 156 |
| GSINDEL137010 | 8288840 | Polymorphism | CTTTGCAAAATAAGTTTAGG | CTTTTTCTCTCAATTTTTCA | 195 |
| GSINDEL137069 | 8507084 | Monomorphism | TGGAATTTTCTGAAATAAAG | TAATCTCAAGAGGAGATGAA | 174 |
| GSINDEL137071 | 8509641 | Monomorphism | TTTAGATAACCTTCCTCACA | TTCACAGTAGGTTAGACGTT | 171 |
| GSINDEL137103 | 8609958 | Polymorphism | CATAAGGGAGGGTAATACTT | TTAATTGATCCATGTTCATC | 133 |
| GSINDEL137141 | 8721390 | Monomorphism | TTGGTGGTATCACTAACTTT | ATTTAGGCTTAGGGTCTAAC | 141 |

Supplementary Table S4.
List of primer sets for long PCR amplification of breakpoints of discrepant segments between the current genetic map and the Williams 82 genome sequence assembly (Glyma1), and GenBank accession numbers of the retrieved sequences

| Discrepant segment | Primer set | Forward primer sequence | Reverse primer sequence | GenBank accession number |
|---|---|---|---|---|
| Chr 5 | Chr5A1 | GCAACGTTTGTCTTCGTTCA | GTTAATCTCGCCGGAAAATTG | JQ924191 |
| Chr 11, BE020413-containing | BE020413 | AGTTAAGATATGTTGCTTGG | AGTGTTTGTTGTATGGTTGT | JQ924190 |
| | Chr11B1 (primary PCR) | GGCCACTTCTGGAATCGTAA | GCCCCACTGGAAGTATTTGA | |
| | Chr11B1-2 (nested PCR) | GACTCGGTGACACCATAAGT | GTGAATTGTGTACGGGTTTT | |
| Chr 14 | Chr14B2 | GAACATATATGGGGTGCATGA | CATTCTACGCTAGAAGCTGAA | JQ924192 |
| Insertion site of unplaced scaffold_41 on Chr 17: Upper 5’ end | Scaffold41-be | ATATGCCACCCAAATAAAAA | GTTTGGGTGAAAAACAAGAG | JQ924193 |
| Insertion site of unplaced scaffold_41 on Chr 17: Lower 3’ end | Scaffold41-end | CCAGACAAAAGAGAAAGTGG | GGAAGGACAAGGGTTATTTT | JQ924194 |
| Chr 19 | Chr19L | GAAGGATACAAGTGAAAAAGTACAA | GATGTAGACAACATATCCCCTTC | JQ924195 |

Supplementary Table S5.
Summary of mapping by chromosome

| Chromosome number | Number of markers | Distance (cM) | cM/marker | Physical length (Mb) | kb/marker | Recombination rate (cM/Mb) |
|---|---|---|---|---|---|---|
| 1 | 128 | 107.9 | 0.8 | 55.9 | 436.7 | 1.9 |
| 2 | 113 | 138.6 | 1.2 | 51.7 | 457.5 | 2.7 |
| 3 | 67 | 113.2 | 1.7 | 47.8 | 713.4 | 2.4 |
| 4 | 59 | 105.7 | 1.8 | 49.2 | 833.9 | 2.1 |
| 5 | 79 | 110.0 | 1.4 | 41.9 | 530.4 | 2.6 |
| 6 | 74 | 125.1 | 1.7 | 50.7 | 685.1 | 2.5 |
| 7 | 64 | 114.9 | 1.8 | 44.6 | 696.9 | 2.6 |
| 8 | 93 | 155.3 | 1.7 | 47.0 | 505.4 | 3.3 |
| 9 | 73 | 109.4 | 1.5 | 46.8 | 641.1 | 2.3 |
| 10 | 80 | 142.6 | 1.8 | 51.0 | 637.5 | 2.8 |
| 11 | 76 | 133.0 | 1.8 | 39.2 | 515.8 | 3.4 |
| 12 | 72 | 106.1 | 1.5 | 40.1 | 556.9 | 2.6 |
| 13 | 110 | 115.3 | 1.0 | 44.4 | 403.6 | 2.6 |
| 14 | 76 | 113.4 | 1.5 | 49.7 | 653.9 | 2.3 |
| 15 | 78 | 124.9 | 1.6 | 50.9 | 652.6 | 2.5 |
| 16 | 65 | 91.9 | 1.4 | 37.4 | 575.4 | 2.5 |
| 17 | 64 | 119.2 | 1.9 | 41.9 | 654.7 | 2.8 |
| 18 | 73 | 116.8 | 1.6 | 62.3 | 853.4 | 1.9 |
| 19 | 75 | 112.0 | 1.5 | 50.6 | 674.7 | 2.2 |
| 20 | 62 | 105.9 | 1.7 | 46.8 | 754.8 | 2.3 |
| Total/average | 1581 | 2361.2 | 1.5 | 950.0 | 600.9 | 2.5 |

Supplementary Table S6.
Comparison of recombination rates among plants with sequenced genomes

| Species name (common name) | Predicted genome size (Mb) | Number of chromosomes | Sequenced genome size (Mb) | Transposable element (TE) content (%) | Sequenced genome size excluding TE (Mb) | Genetic map length (cM) | Recombination rate (cM/Mb)^a | Adjusted recombination rate (cM/Mb)^b | References^d |
|---|---|---|---|---|---|---|---|---|---|
| Arabidopsis thaliana | 125 | 5 | 119 | 16 | 102 | 597 | 5.0 | 5.9 | 1, 2, 3 |
| Arabidopsis lyrata | 230 | 8 | 207 | 23 | 159 | 515 | 2.5 | 3.2 | 2, 4, 5 |
| Fragaria vesca (strawberry) | 240 | 7 | 210 | 22 | 164 | 559 | 2.7 | 3.4 | 6 |
| Brachypodium distachyon | 355 | 5 | 272 | 26 | 201 | 1598 | 5.9 | 8 | 7, 8 |
| Brassica rapa (pak choi) | 485 | 10 | 284 | 40 | 170 | 1123 | 4.0 | 6.6 | 9, 10 |
| Medicago truncatula | 454 | 8 | 375 | 30 | 263 | 567 | 1.5 | 2.2 | 11, 12 |
| Oryza sativa (rice) | 405 | 12 | 389 | 35 | 264 | 1530 | 3.9 | 5.8 | 13, 14 |
| Solanum tuberosum (potato) | 844 | 12 | 727 | 62 | 276 | 762^c | 1.1 | 2.8 | 15, 16 |
| Sorghum bicolor (sorghum) | 748 | 10 | 730 | 62 | 277 | 1059 | 1.5 | 3.8 | 17, 18 |
| Glycine max (soybean) | 1115 | 20 | 937 | 57 | 402 | 2361 | 2.5 | 5.9 | 19, this study |
| Zea mays (maize) | 2300 | 10 | 2300 | 85 | 345 | 2349 | 1.0 | 6.8 | 20, 21 |

a Calculated by dividing the map length by the sequenced genome size. Because the sequenced genomes of rice and Arabidopsis showed that flow cytometry tends to overestimate genome size [Arabidopsis Genome Initiative (2000); International Rice Genome Sequencing Project (2005)], we presume that the sequenced genome sizes are more accurate than the predicted genome sizes.
b Calculated by dividing the map length by the sequenced genome size excluding TE.
c Average of the genetic lengths of the maternal map (751 cM) and the paternal map (773 cM).
d References

1. Arabidopsis Genome Initiative (2000) Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408:796–815
2. Hollister JD, Smith LM, Guo YL, Ott F, Weigel D, Gaut BS (2011) Transposable elements and small RNAs contribute to gene expression divergence between Arabidopsis thaliana and Arabidopsis lyrata. Proc Natl Acad Sci USA 108:2322–2327
3. Lister C, Dean C (1993) Recombinant inbred lines for mapping RFLP and phenotypic markers in Arabidopsis thaliana. Plant J 4:745–750
4. Hu TT, Pattyn P, Bakker EG, Cao J, Cheng JF, Clark RM, Fahlgren N, Fawcett JA, Grimwood J, Gundlach H, Haberer G, Hollister JD, Ossowski S, Ottilar RP, Salamov AA, Schneeberger K, Spannagl M, Wang X, Yang L, Nasrallah ME, Bergelson J, Carrington JC, Gaut BS, Schmutz J, Mayer KFX, Van de Peer Y, Grigoriev IV, Nordborg M, Weigel D, Guo Y-L (2011) The Arabidopsis lyrata genome sequence and the basis of rapid genome size change. Nat Genet 43:476–481
5. Kuittinen H, de Haan AA, Vogl C, Oikarinen S, Leppälä J, Koch M, Mitchell-Olds T, Langley CH, Savolainen O (2004) Comparing the linkage maps of the close relatives Arabidopsis lyrata and A. thaliana. Genetics 168:1575–1584
6. Shulaev V, Sargent DJ, Crowhurst RN, Mockler TC, Folkerts O, Delcher AL, Jaiswal P, Mockaitis K, Liston A, Mane SP, Burns P, Davis TM, Slovin JP, Bassil N, Hellens RP, Evans C, Harkins T, Kodira C, Desany B, Crasta OR, Jensen RV, Allan AC, Michael TP, Setubal JC, Celton J-M, Rees DJG, Williams KP, Holt SH, Rojas JJR, Chatterjee M, Liu B, Silva H, Meisel L, Adato A, Filichkin SA, Troggio M, Viola R, Ashman TL, Wang H, Dharmawardhana P, Elser J, Raja R, Priest HD, Bryant DW Jr, Fox SE, Givan SA, Wilhelm LJ, Naithani S, Christoffels A, Salama DY, Carter J, Lopez Girona E, Zdepski A, Wang W, Kerstetter RA, Schwab W, Korban SS, Davik J, Monfort A, Denoyes-Rothan B, Arus P, Mittler R, Flinn B, Aharoni A, Bennetzen JL, Salzberg SL, Dickerman AW, Velasco R, Borodovsky M, Veilleux RE, Folta KM (2011) The genome of woodland strawberry (Fragaria vesca). Nat Genet 43:109–116
7. Huo N, Garvin DF, You FM, McMahon S, Luo MC, Gu YQ, Lazo GR, Vogel JP (2011) Comparison of a high-density genetic linkage map to genome features in the model grass Brachypodium distachyon. Theor Appl Genet 123:455–464
8. International Brachypodium Initiative (2010) Genome sequencing and analysis of the model grass Brachypodium distachyon. Nature 463:763–768
9. Kim H, Choi SR, Bae J, Hong CP, Lee SY, Hossain MJ, Van Nguyen D, Jin M, Park BS, Bang JW, Bancroft I, Lim YP (2009) Sequenced BAC anchored reference genetic map that reconciles the ten individual chromosomes of Brassica rapa. BMC Genomics 10:432
10. The Brassica rapa Genome Sequencing Project Consortium (2011) The genome of the mesopolyploid crop species Brassica rapa. Nat Genet 43:1035–1039
11. Mun JH, Kim DJ, Choi HK, Gish J, Debellé F, Mudge J, Denny R, Endré G, Saurat O, Dudez AM, Kiss GB, Roe B, Young ND, Cook DR (2006) Distribution of microsatellites in the genome of Medicago truncatula: A resource of genetic markers that integrate genetic and physical maps. Genetics 172:2541–2555
12. Young ND, Debellé F, Oldroyd GE, Geurts R, Cannon SB, Udvardi MK, Benedito VA, Mayer KF, Gouzy J, Schoof H, Van de Peer Y, Proost S, Cook DR, Meyers BC, Spannagl M, Cheung F, De Mita S, Krishnakumar V, Gundlach H, Zhou S, Mudge J, Bharti AK, Murray JD, Naoumkina MA, Rosen B, Silverstein KA, Tang H, Rombauts S, Zhao PX, Zhou P, Barbe V, Bardou P, Bechner M, Bellec A, Berger A, Bergès H, Bidwell S, Bisseling T, Choisne N, Couloux A, Denny R, Deshpande S, Dai X, Doyle JJ, Dudez AM, Farmer AD, Fouteau S, Franken C, Gibelin C, Gish J, Goldstein S, González AJ, Green PJ, Hallab A, Hartog M, Hua A, Humphray SJ, Jeong DH, Jing Y, Jöcker A, Kenton SM, Kim DJ, Klee K, Lai H, Lang C, Lin S, Macmil SL, Magdelenat G, Matthews L, McCorrison J, Monaghan EL, Mun JH, Najar FZ, Nicholson C, Noirot C, O'Bleness M, Paule CR, Poulain J, Prion F, Qin B, Qu C, Retzel EF, Riddle C, Sallet E, Samain S, Samson N, Sanders I, Saurat O, Scarpelli C, Schiex T, Segurens B, Severin AJ, Sherrier DJ, Shi R, Sims S, Singer SR, Sinharoy S, Sterck L, Viollet A, Wang BB, Wang K, Wang M, Wang X, Warfsmann J, Weissenbach J, White DD, White JD, Wiley GB, Wincker P, Xing Y, Yang L, Yao Z, Ying F, Zhai J, Zhou L, Zuber A, Dénarié J, Dixon RA, May GD, Schwartz DC, Rogers J, Quétier F, Town CD, Roe BA (2011) The Medicago genome provides insight into the evolution of rhizobial symbioses. Nature 480:520–524
13. Harushima Y, Yano M, Shomura A, Sato M, Shimano T, Kuboki Y, Yamamoto T, Lin SY, Antonio BA, Parco A, Kajiya H, Huang N, Yamamoto K, Nagamura Y, Kurata N, Khush GS, Sasaki T (1998) A high-density rice genetic linkage map with 2275 markers using a single F2 population. Genetics 148:479–494
14. International Rice Genome Sequencing Project (2005) The map-based sequence of the rice genome. Nature 436:793–800
15. The Potato Genome Sequencing Consortium (2011) Genome sequence and analysis of the tuber crop potato. Nature 474:189–195
16. van Os H, Andrzejewski S, Bakker E, Barrena I, Bryan GJ, Caromel B, Ghareeb B, Isidore E, de Jong W, van Koert P, Lefebvre V, Milbourne D, Ritter E, van der Voort JN, Rousselle-Bourgeois F, van Vliet J, Waugh R, Visser RG, Bakker J, van Eck HJ (2006) Construction of a 10,000-marker ultradense genetic recombination map of potato: providing a framework for accelerated gene isolation and a genomewide physical map. Genetics 173:1075–1087
17. Bowers JE, Abbey C, Anderson S, Chang C, Draye X, Hoppe AH, Jessup R, Lemke C, Lennington J, Li Z, Lin Y, Liu S, Luo L, Marler BS, Ming R, Mitchell SE, Qiang D, Reischmann K, Schulze SR, Skinner DN, Wang Y, Kresovich S, Schertz KF, Paterson AH (2003) A high-density genetic recombination map of sequence-tagged sites for Sorghum, as a framework for comparative structural and evolutionary genomics of tropical grains and grasses. Genetics 165:367–386
18. Paterson AH, Bowers JE, Bruggmann R, Dubchak I, Grimwood J, Gundlach H, Haberer G, Hellsten U, Mitros T, Poliakov A, Schmutz J, Spannagl M, Tang H, Wang X, Wicker T, Bharti AK, Chapman J, Feltus FA, Gowik U, Grigoriev IV, Lyons E, Maher CA, Martis M, Narechania A, Otillar RP, Penning BW, Salamov AA, Wang Y, Zhang L, Carpita NC, Freeling M, Gingle AR, Hash CT, Keller B, Klein P, Kresovich S, McCann MC, Ming R, Peterson DG, Mehboob Ur R, Ware D, Westhoff P, Mayer KFX, Messing J, Rokhsar DS (2009) The Sorghum bicolor genome and the diversification of grasses. Nature 457:551–556
19. Schmutz J, Cannon SB, Schlueter J, Ma J, Mitros T, Nelson W, Hyten DL, Song Q, Thelen JJ, Cheng J, Xu D, Hellsten U, May GD, Yu Y, Sakurai T, Umezawa T, Bhattacharyya MK, Sandhu D, Valliyodan B, Lindquist E, Peto M, Grant D, Shu S, Goodstein D, Barry K, Futrell-Griggs M, Abernathy B, Du J, Tian Z, Zhu L, Gill N, Joshi T, Libault M, Sethuraman A, Zhang X-C, Shinozaki K, Nguyen HT, Wing RA, Cregan PB, Specht J, Grimwood J, Rokhsar D, Stacey G, Shoemaker RC, Jackson SA (2010) Genome sequence of the palaeopolyploid soybean. Nature 463:178–183
20. Liu S, Yeh C-T, Ji T, Ying K, Wu H, Tang HM, Fu Y, Nettleton D, Schnable PS (2009) Mu transposon insertion sites and meiotic recombination events co-localize with epigenetic marks for open chromatin across the maize genome. PLoS Genetics 5:e1000733
21. Schnable PS, Ware D, Fulton RS, Stein JC, Wei F, Pasternak S, Liang C, Zhang J, Fulton L, Graves TA, Minx P, Reily AD, Courtney L, Kruchowski SS, Tomlinson C, Strong C, Delehaunty K, Fronick C, Courtney B, Rock SM, Belter E, Du F, Kim K, Abbott RM, Cotton M, Levy A, Marchetto P, Ochoa K, Jackson SM, Gillam B, Chen W, Yan L, Higginbotham J, Cardenas M, Waligorski J, Applebaum E, Phelps L, Falcone J, Kanchi K, Thane T, Scimone A, Thane N, Henke J, Wang T, Ruppert J, Shah N, Rotter K, Hodges J, Ingenthron E, Cordes M, Kohlberg S, Sgro J, Delgado B, Mead K, Chinwalla A, Leonard S, Crouse K, Collura K, Kudrna D, Currie J, He R, Angelova A, Rajasekar S, Mueller T, Lomeli R, Scara G, Ko A, Delaney K, Wissotski M, Lopez G, Campos D, Braidotti M, Ashley E, Golser W, Kim H, Lee S, Lin J, Dujmic Z, Kim W, Talag J, Zuccolo A, Fan C, Sebastian A, Kramer M, Spiegel L, Nascimento L, Zutavern T, Miller B, Ambroise C, Muller S, Spooner W, Narechania A, Ren L, Wei S, Kumari S, Faga B, Levy MJ, McMahan L, Van Buren P, Vaughn MW, Ying K, Yeh CT, Emrich SJ, Jia Y, Kalyanaraman A, Hsia AP, Barbazuk WB, Baucom RS, Brutnell TP, Carpita NC, Chaparro C, Chia JM, Deragon JM, Estill JC, Fu Y, Jeddeloh JA, Han Y, Lee H, Li P, Lisch DR, Liu S, Liu Z, Nagel DH, McCann MC, SanMiguel P, Myers AM, Nettleton D, Nguyen J, Penning BW, Ponnala L, Schneider KL, Schwartz DC, Sharma A, Soderlund C, Springer NM, Sun Q, Wang H, Waterman M, Westerman R, Wolfgruber TK, Yang L, Yu Y, Zhang L, Zhou S, Zhu Q, Bennetzen JL, Dawe RK, Jiang J, Jiang N, Presting GG, Wessler SR, Aluru S, Martienssen RA, Clifton SW, McCombie WR, Wing RA, Wilson RK (2009) The B73 maize genome: complexity, diversity, and dynamics. Science 326:1112–1115

Supplementary Table S7.
Summary of sequencing and variations for three soybean varieties

| | Category | IT182932 | Hwangkeum | Williams 82 |
|---|---|---|---|---|
| Mapping | Total bases | 36,880,852,541 | 19,983,900,227 | 16,544,562,819 |
| | Mean depth | 38.82 | 21.03 | 17.41 |
| Coverage^a | %_bases_above_1 | 97.4 | 97.4 | 98.5 |
| | %_bases_above_5 | 93.2 | 95.9 | 97.6 |
| | %_bases_above_10 | 85.5 | 92.0 | 89.8 |
| | %_bases_above_20 | 69.2 | 56.2 | 34.2 |
| SNP | Total | 2,397,205 | 1,236,277 | 113,587 |
| | Known | 1,365,216 | 817,605 | 44,566 |
| | Homozygous | 2,286,168 | 1,165,945 | 51,454 |
| | Heterozygous | 111,037 | 70,332 | 62,133 |
| | Transition | 1,575,531 | 785,647 | 63,381 |
| | Transversion | 821,674 | 450,630 | 50,206 |
| | Exon | 116,121 | 55,029 | 5,610 |
| | Exon known | 73,935 | 41,882 | 2,008 |
| | Exon novel | 42,186 | 13,147 | 3,602 |
| | Exon homozygous | 108,819 | 50,081 | 2,268 |
| | Exon heterozygous | 7,302 | 4,948 | 3,342 |
| | Exon transition | 68,941 | 32,637 | 3,081 |
| | Exon transversion | 47,180 | 22,392 | 2,529 |
| | CDS | 85,330 | 40,543 | 4,388 |
| | 5’ UTR | 9,706 | 4,420 | 423 |
| | 3’ UTR | 21,886 | 10,468 | 856 |
| | Intron | 236,471 | 115,952 | 9,295 |
| | Silent | 38,587 | 18,489 | 1,932 |
| | Missense | 45,798 | 21,624 | 2,401 |
| | Nonsense | 896 | 414 | 50 |
| | Readthrough | 131 | 65 | 9 |
| | Splice_site | 701 | 357 | 50 |
| | Start_codon | 162 | 68 | 10 |
| Indel | Total | 302,013 | 236,276 | 29,105 |
| | Homozygous | 286,097 | 224,203 | 20,269 |
| | Heterozygous | 15,916 | 12,073 | 8,836 |
| | Tandem_repeat | 213,098 | 173,403 | 22,621 |
| | Exon | 14,072 | 7,127 | 1,162 |
| | Exon novel | 14,072 | 7,127 | 1,162 |
| | Exon homozygous | 13,191 | 6,714 | 930 |
| | Exon heterozygous | 881 | 413 | 232 |
| | Exon tandem_repeat | 9,885 | 5,171 | 917 |
| | CDS | 4,879 | 2,452 | 655 |
| | 5’ UTR | 3,476 | 1,633 | 199 |
| | 3’ UTR | 5,775 | 3,073 | 316 |
| | Intron | 47,994 | 28,876 | 3,285 |
| | Frameshift | 2,772 | 1,531 | 595 |
| | Inframe | 2,107 | 921 | 60 |
| | Splice_site | 264 | 139 | 31 |
| | Start_codon | 85 | 41 | 5 |

a Percentage of the Williams 82 reference genome sequence covered with short reads higher than 1, 5, 10, and 20
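
The coverage rows of Table S7 report, for each variety, the percentage of reference positions covered at read depths of 1, 5, 10, and 20. As a rough illustration only, the Python sketch below computes such percentages and a mean depth from a list of per-base depths. The function name `percent_bases_above`, the toy depth values, and the choice to count a position when its depth is at least the threshold are all assumptions made for this example; the source does not state which tool or cutoff convention produced the table.

```python
from typing import Dict, Iterable, Sequence


def percent_bases_above(
    depths: Sequence[int],
    thresholds: Iterable[int] = (1, 5, 10, 20),
) -> Dict[int, float]:
    """Percentage of positions whose depth is >= each threshold.

    Assumption: "above N" is read as "at least N"; the source does not
    specify whether the comparison is strict.
    """
    total = len(depths)
    if total == 0:
        raise ValueError("depths must not be empty")
    return {
        t: 100.0 * sum(d >= t for d in depths) / total
        for t in thresholds
    }


# Hypothetical per-base depths for a 10-bp stretch (not data from the study).
toy_depths = [0, 1, 3, 5, 8, 10, 12, 20, 25, 40]

coverage = percent_bases_above(toy_depths)
mean_depth = sum(toy_depths) / len(toy_depths)

for threshold, pct in coverage.items():
    print(f"%_bases_above_{threshold}: {pct:.1f}")
print(f"Mean depth: {mean_depth:.2f}")
```

In practice the per-base depths would come from the read alignments against the reference genome rather than from a hand-written list, and the thresholds can be changed through the `thresholds` argument.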