Integrative Physical Mapping of the Soybean Genome
NSF #9872635 and ISPOB 98-222-24-2

The goal of this project is to develop an integrated physical and genetic map of the soybean genome. This map will be readily usable for large-scale sequencing of the soybean genome and for large-scale functional analysis of the genome sequence by reverse genetics. The integrated map will therefore provide a platform for large-scale discovery, mapping, cloning and utilization of genes agriculturally important to soybean production. To this end, we proposed the following objectives (see the proposal):

Original Project Objectives:
1. Fingerprint three soybean BIBAC libraries of 120,000 BACs (90,000 NSF; 30,000 ISPOB), equivalent to > 15 x soybean haploid genomes, that are capable of direct complementation by soybean transformation.
2. Integrate the BAC contigs with the 1,000 microsatellite marker soybean molecular genetic map (600 NSF; 400 ISPOB).
3. Test methods for assembling the fingerprinted BACs into the physical map contigs and for gap filling.
4. Test the utility of the contigs for identifying homeologous regions of the duplicated soybean genome and for assignment of EST families to genomic regions.
5. Provide soybean researchers with electronic access to BAC clones encompassing regions likely to contain genes and QTL of agronomic importance.

Personnel Involved This Year:
At TAMU: Dr. Chengcang Wu, postdoctoral scientist; Dr. Padmavathi Nimmakayala, postdoctoral scientist; Dr. Suku Sun, postdoctoral scientist; Mr. Filip Santos, research assistant; Ms. Rachael Springman, undergraduate student; Ms. Kejiao Ding, graduate student; Mr. Quanzhou Tao, research associate.
At SIUC: Jeffry Schulz continued as technician and programmer; Kay Cryder, an undergraduate student who started as a graduate student; Amanda Tiedeman, undergraduate student; Abdelmajid Kassem, high school teacher.

A marker-anchored physical map was constructed from bacterial artificial chromosome libraries and large-insert plasmid libraries (hereafter BACs).

Objective 2: Marker Integration
Precisely 370 markers have been successfully anchored to 930 individual HindIII BACs (Table 2), 964 BamHI BACs and 1,121 EcoRI BACs from Forrest. In cooperation with another NSF project, 9872565, "A functional genomics program for soybean", we have fingerprinted BACs that anchor the local physical maps of soybean cv. Williams 82 (Marek et al., 2001). This has incorporated a further 89 microsatellite markers (from 267) and 105 RFLP markers. Presumably this provides 459 anchors for the physical map, although the homeolog frequency (45-50%) may increase this number significantly. About 148 clones have been verified by re-amplification from an independent copy of the BAC library and have been made available. The genetic map locations of all verified markers and the plate addresses of all clones can be viewed at www.siu.edu/~pbgc/Database/sattlinkfiles.

Figure 1. Satellite links for linkage groups A to G, available at www.siu.edu/~pbgc/

Objective 1: BAC Fingerprinting
To develop the proposed BAC/BIBAC-based, integrated physical/genetic map of soybean, we fingerprinted 95,322 soybean BACs and BIBACs, covering 11.8 x soybean genomes. These comprise 38,562 clones from the Forrest HindIII BIBAC library (Meksem et al. 2000), 22,656 clones from the Forrest BamHI BIBAC library (Meksem et al. 2000), 30,720 clones from the new Forrest EcoRI BAC library (Wu et al. 2002) and 3,384 clones from the Williams 82 HindIII and Fairbault EcoRI BAC libraries (Marek et al., 2001; Danesh et al. 1998) (Table 1). A database of the BAC and BIBAC fingerprints was created and made publicly accessible (see Figure 2 and http://hbz.tamu.edu - Physical Mapping - Soy Map). Users can access the database via the WWW and use the FPC Hitting Tool.
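
The genome coverage quoted for the fingerprinted clones follows from clone counts and mean insert sizes. The sketch below is a minimal illustration, not the project's own software: it assumes coverage is computed as clones x mean insert size / haploid genome size, takes the per-library clone counts and insert sizes from Table 1, and uses the 1,100 Mb haploid genome size given in this report. The names LIBRARIES and genome_equivalents are hypothetical.

```python
# Haploid soybean genome size used in this report: 1,100 Mb, expressed in kbp.
HAPLOID_GENOME_KBP = 1_100_000

# (library, clones fingerprinted, mean insert size in kbp), taken from Table 1.
LIBRARIES = [
    ("Forrest BIBAC HindIII (TAMU)", 30_720, 125),
    ("Forrest BIBAC BamHI (TAMU)", 21_504, 125),
    ("Forrest BAC EcoRI (TAMU)", 30_720, 157),
    ("Williams 82 & Fairbault BAC (TAMU)", 3_000, 150),
    ("Forrest BIBAC HindIII (SIUC)", 7_842, 125),
    ("Forrest BIBAC BamHI (SIUC)", 1_152, 125),
    ("Williams BAC HindIII (SIUC)", 384, 150),
]


def genome_equivalents(clones, insert_kbp, genome_kbp=HAPLOID_GENOME_KBP):
    """Assumed formula: total cloned DNA divided by the haploid genome size."""
    return clones * insert_kbp / genome_kbp


total_clones = sum(clones for _, clones, _ in LIBRARIES)
total_coverage = sum(
    genome_equivalents(clones, insert_kbp) for _, clones, insert_kbp in LIBRARIES
)

print(f"clones fingerprinted: {total_clones:,}")
print(f"combined coverage: {total_coverage:.1f} x")
```

Under these assumptions the totals agree with the 95,322 clones and 11.8 x coverage reported for the combined libraries.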
The fingerprints are useful for chromosome walking and for building minimally overlapping clone tiles for genome sequencing.

Table 1. Soybean BACs fingerprinted and edited for the construction of the map.

Library (site)                     Vector  Cloning site     Clones         Insert size  Genome       Clones
                                                            fingerprinted  (kbp)        equivalence  edited
Forrest (TAMU)                     BIBAC   HindIII          30,720         125          3.5 x        30,720
Forrest (TAMU)                     BIBAC   BamHI            21,504         125          2.4 x        20,736
Forrest (TAMU)                     BAC     EcoRI            30,720         157          4.4 x        30,720
Williams 82 & Fairbault (TAMU)     BAC     HindIII & EcoRI  3,000 (1)      150          0.4 x        3,000
Forrest (SIUC)                     BIBAC   HindIII          7,842 (2)      125          0.8 x        702 (2)
Forrest (SIUC)                     BIBAC   BamHI            1,152          125          0.3 x        24
Williams (SIUC)                    BAC     HindIII          384            150          0.04 x       384
Combined libraries                                          95,322                      11.8 x       86,386 (3)

(1) The BACs were identified with 267 SSRs and 105 RFLPs by Dr. Shoemaker's laboratory at Iowa State University and Dr. N. Young's laboratory at the University of Minnesota (Marek et al. 2001).
(2) Many clones containing SSR markers were fingerprinted twice, once at SIUC and once at TAMU. All 702 anchored clones were edited at SIUC.
(3) The database of all BAC and BIBAC fingerprints is publicly accessible using the FPC Hitting Tool at http://hbz.tamu.edu - Physical Mapping - Soy Map.

Figure 2. Clone plate entry and conversion in FPC, available at http://hbz.tamu.edu.

Objective 3: Contig Map Assembly and Gap Filling
During September 2001, we assembled the map contigs from the fingerprint database of 81,024 soybean clones using the FPC package (Soderlund et al. 1997, 2001). We generated 5,488 automatic contigs (contigs assembled without manual editing). Contigs can be searched for any clone in the database, and a typical contig is shown in Figure 3. The longest contig contained 320 clones and encompassed 5.86 Mbp. The contigs consisted of a total of 396,843 unique bands, estimated to span 1,667 Mb (Table 2) if each band represents 4.2 kbp. Because this exceeds the genome size of soybean (1,100 Mb per haploid genome), the estimate is probably inflated.
Many of the contigs may overlap even though their common bands were not identified under the contig assembly conditions used. Also, some misassembly is inevitable before manual editing removes multiplets and multi-band artefacts.

Table 2. Status of soybean physical map after automated assembly, September 2001.

Item                            Number      Contigs containing:   Number
BAC clones in FPC database      81,024      > 25 clones           220
BACs used in contig assembly    75,568      10-25 clones          3,038
Number of singletons            5,884       3-9 clones            1,845
Clones in contigs               69,684      2 clones              385
Number of contigs               5,488       singletons            5,884
Unique bands of the contigs     396,843
Total contig length (Mbp)       1,667*

*Each fingerprint band was estimated to represent a fragment of 4.2 kbp, on average.

Figure 3. A typical contig, available at the Fingerprint Contig Map (http://hbz.tamu.edu/bacindex1.html), showing ctg2296 (clones: 11; length: 62 bands, 246 kbp) built on clone E12D03.

According to three reports (McPherson et al., 2001; Tao et al. 2001; Chang et al. 2001), the number of contigs can be reduced and their size enlarged 4- to 6-fold by contig editing and merging using the FPC package. Recent progress with contig editing has reduced contig number and increased contig size (Table 3).

Table 3. Status of soybean physical map (as of June 2002).

Item                                    Automated assembly    Contig editing and merging
                                        (September 2001)      (June 2002)
BAC clones in FPC database              81,024                83,026
BACs used in contig assembly            75,568                78,001
Number of singletons                    5,884                 5,918
Clones in contigs (fold genome)         69,684 (8.7 x)        72,083 (9.0 x)
Anchored markers                        278                   459
Number of contigs                       5,488                 5,253
Contigs containing:
  > 25 clones                           220                   235
  10-25 clones                          3,038                 3,226
  3-9 clones                            1,845                 1,687
  2 clones                              385                   105
Unique bands of the contigs             396,843               380,952
Physical length of the contigs (Mb)     1,667*                1,600

*Each fingerprint band was estimated to represent a fragment of 4.2 kb, on average.
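
The physical lengths in Tables 2 and 3 can be reproduced from the band counts. The following sketch assumes, as the table footnotes state, an average of 4.2 kb per unique fingerprint band, and compares the result with the 1,100 Mb haploid genome size given in this report. The function names are illustrative only and are not part of the FPC package.

```python
BAND_KBP = 4.2              # average fragment size per fingerprint band (table footnote)
HAPLOID_GENOME_MB = 1_100   # haploid soybean genome size used in this report


def contig_length_mb(unique_bands, band_kbp=BAND_KBP):
    """Estimated total contig length in Mb from the number of unique bands."""
    return unique_bands * band_kbp / 1000


def fold_of_genome(length_mb, genome_mb=HAPLOID_GENOME_MB):
    """Ratio of the estimated contig length to the haploid genome size."""
    return length_mb / genome_mb


sept_2001_mb = contig_length_mb(396_843)   # automated assembly
june_2002_mb = contig_length_mb(380_952)   # after partial editing and merging

for label, length_mb in (("September 2001", sept_2001_mb), ("June 2002", june_2002_mb)):
    print(f"{label}: {length_mb:.0f} Mb, {fold_of_genome(length_mb):.2f} x genome")
```

A ratio above 1 indicates how much undetected overlap remains between contigs, which is why the band total is expected to fall as contigs are merged.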

Based on this progress, we expect the number of soybean contigs to be reduced from the 5,253 current contigs to about 1,000 once contig editing and merging are completed. The contig editing and merging will be completed in early 2002. The BACs from the Williams 82 and Fairbault BAC libraries (Danesh et al., 1998; Marek et al. 2001) will assist with gap closure and contig verification. Contigs are available at http://hbz.tamu.edu - Physical Mapping - Soy Map. We are continuing to anchor the positive clones of the remaining marker-anchored BACs used in screening the Williams 82 and Fairbault libraries, and to verify, edit and merge the contigs. The Forrest soybean genome project group at Southern Illinois University at Carbondale is working on the integration of another 520 SSR markers. We are removing likely cross-contaminated clones from the data set, as identified by the SIUC program. By August 2002 we will complete the fourth-generation integrated physical and genetic map of soybean, consisting of about 1,200 contigs, containing about 900 mapped DNA markers, and covering about 95% of the soybean genome.

Objective 4: Synteny analysis and EST assignment

a. EST assignment: Dr. Lightfoot has begun gene-rich region mapping by placing onto the soybean physical map 34 F. solani-induced, defense-associated ESTs from small multigene families (GenBank BI850056-850092). Most of these ESTs (60%) give single bands from the designed primers. In addition, we are mapping members of two large gene families, which can help identify gene-rich regions. The Zhang group showed that NBS-like sequences map to many locations in the physical map (Wu et al., 2002a). The rhg1 gene was found to be an LRR transmembrane kinase located in a gene-rich region. The family has 174 members in Arabidopsis (TAGI 2000). The Meksem group is mapping members of this family in soybean (Meksem et al., 2001; 2001a) in an attempt to identify Rhg2-10.

b.
Genome sequence: BAC-end sequencing of marker-anchored BACs has been used for genome sampling (Marek et al., 2001; Iqbal et al., 2002), and putative genes were detected in about 7% of sequences. We expect only about 2.5% of the soybean genome to encode genes (40,000 genes in 1,000 Mbp of genome), so the markers used seem to be biased toward gene-rich regions. As expected, about half of the predicted genes in genomic DNA are not present in the EST libraries, but 80% had paralogs. The predicted genes not captured by EST libraries tend to be cell-, tissue-, organ- or environment-specific and to accumulate to low abundance. They include mRNAs that appear to encode proteins targeted to the membranes, nucleus and mitochondria, which tend to accumulate to fewer copies per cell than cytoplasmic mRNAs.

c. Gene Discovery: The genomics of linkage group G of soybean
We have assembled a physical map of linkage group G. Using 23 anchored contigs, we assembled a partial physical map of the chromosome in which 69 automatic contigs, encompassing 50 Mbp, were merged into 10 contigs (Figure 4). The contigs anchored to linkage group G have provided 2 nucleation points for preliminary genome sequencing (Figures 5 and 6) to overlap with a region sequenced by another group (Hauge et al., 2001). All of the linkage group G contigs are anchored with two or more genetic markers that permit their orientation relative to one another. Some of the contigs are known to contain high-density gene clusters, and some are known to contain resistance gene analog clusters (Wu et al., 2002).

[Figure 4 image: Forrest soybean chromosome G. Vertical axis in centimorgans (cM), 0 to 120, with microsatellite marker positions and trait labels.]

Figure 4A: Linkage group G from Soybase (http://macgrant.agron.iastate.edu/).
Figure 4B: Physical map of linkage group G showing the core group of marker anchors and an example of a contig.

The annotation of the composite linkage group G posted at Soybase in January 2002 is used here. The ratio of physical distance to genetic distance varies from 100 to more than 900 kbp per cM in contigs sampled from different regions of the linkage group (Figure 4).

Figure 5: Gene density in 317 kbp of the sequence of linkage group G, by annotation. Predicted genes are green; exon predictions are blue (Genscan) and light blue (Genmark); genome sequence orthologs are red; EST paralogs and orthologs are grey.

Figure 6: Closer view of predicted genes around Rhg1 in Annotation Station. Predicted genes are green; exon predictions are blue (Genscan) and light blue (Genmark); genome sequence orthologs are red; EST paralogs and orthologs are grey.

Objective 5: Community Access
A database of the BAC and BIBAC fingerprints was created and made publicly accessible (see Figure 2 and http://hbz.tamu.edu - Physical Mapping - Soy Map). Users can access the database via the WWW and use all five tools. The fingerprints are useful for chromosome walking and for building minimally overlapping clone tiles for genome sequencing. The genetic map locations of all verified markers and the plate addresses of all clones can be viewed at www.siu.edu/~pbgc/Database/sattlinkfiles. We have published and distributed copies of a user guide to the physical map that describes how to download source data for contigs for in-house manipulation and outlines some of the problems and pitfalls users will encounter.

References

Danesh D, Penuela S, Mudge J, Denny RL, Nordstrom H, Martinez JP, Young ND.
1998. A bacterial artificial chromosome library for soybean and identification of clones near a major cyst nematode resistance gene. Theor Appl Genet 96:196-202.
Hauge BM, Wang ML, Parsons JD, Parnell LD. 2001. Nucleic acid molecules and other molecules associated with soybean cyst nematode resistance. Patent WO 0151627-A 8, 19-JUL-2001.
Marek L, Shoemaker R. 1997. BAC contig development by fingerprint analysis in soybean. Genome 40:420-427.
Marek LF, Mudge J, Darnielle L, Grant D, Hanson N, Paz M, Huihuang Y, Denny R, Larson K, Foster-Hartnett D, Cooper A, Danesh D, Larsen D, Schmidt T, Staggs R, Crow JA, Retzel E, Young ND, Shoemaker RC. 2001. Soybean genomic survey: BAC-end sequences near RFLP and SSR markers. Genome 44:572-581.
McPherson JD, Marra M, Hillier L, Waterston RH, Chinwalla A, Wallis J, Sekhon M, Wylie K, Mardis ER, Wilson RK, Fulton R, Kucaba TA, Wagner-McPherson C, Barbazuk WB, Gregory SG, Humphray SJ, French L, Evans RS, Bethel G, Whittaker A, Holden JL, McCann OT, Dunham A, Soderlund C, Scott CE, Bentley DR, Schuler G, Chen HC, Jang W, Green ED, Idol JR, Maduro VV, Montgomery KT, Lee E, Miller A, Emerling S, Kucherlapati R, Gibbs R, Scherer S, Gorrell JH, Sodergren E, Clerc-Blankenburg K, Tabor P, Naylor S, Garcia D, de Jong PJ, Catanese JJ, Nowak N, Osoegawa K, Qin S, Rowen L, Madan A, Dors M, Hood L, Trask B, Friedman C, Massa H, Cheung VG, Kirsch IR, Reid T, Yonescu R, Weissenbach J, Bruls T, Heilig R, Branscomb E, Olsen A, Doggett N, Cheng JF, Hawkins T, Myers RM, Shang J, Ramirez L, Schmutz J, Velasquez O, Dixon K, Stone NE, Cox DR, Haussler D, Kent WJ, Furey T, Rogic S, Kennedy S, Jones S, Rosenthal A, Wen G, Schilhabel M, Gloeckner G, Nyakatura G, Siebert R, Schlegelberger B, Korenberg J, Chen XN, Fujiyama A, Hattori M, Toyoda A, Yada T, Park HS, Sakaki Y, Shimizu N, Asakawa S, Kawasaki K, Sasaki T, Shintani A, Shimizu A, Shibuya K, Kudoh J, Minoshima S, Ramser J, Seranski P, Hoff C, Poustka A, Reinhardt R, Lehrach H.
2001. A physical map of the human genome. Nature 409(6822):934-941.
Meksem K, Ruben E, Zobrist K, Zhang H-B, Lightfoot DA. 2000. Two large insert libraries for soybean: Applications in cyst nematode resistance and genome wide physical mapping. Theor Appl Genet 101:747-755.
Wu C, Tao Q, Santos FA, Nimmakayala P, Springman R, Zhang H-B. 2002. A bacterial artificial chromosome library for 'Forrest' soybean. Theor Appl Genet, in press.