Integrative Physical Mapping of the Soybean Genome

NSF#9872635 and ISPOB 98-222-24-2
The goal of this project is to develop an integrated physical and genetic map of the soybean
genome. This map will be readily usable for large-scale sequencing of the soybean genome
and for large-scale functional analysis of the soybean genome sequence by reverse genetics.
The integrated map will therefore provide a platform for large-scale discovery, mapping,
cloning and utilization of genes agriculturally important to soybean production. To this end,
we proposed to accomplish the following objectives (see the proposal):
Original Project Objectives:
1. Fingerprint three soybean BIBAC libraries of 120,000 BACs (90,000 NSF; 30,000
ISPOB), equivalent to > 15 x soybean haploid genomes, that are capable of direct
complementation by soybean transformation.
2. Integrate the BAC contigs with the 1,000 microsatellite marker soybean molecular
genetic map (600 NSF; 400 ISPOB).
3. Test methods for assembling the fingerprinted BACs into the physical map contigs and for
gap filling.
4. Test the utility of the contigs for identifying homeologous regions of the duplicated
soybean genome and for assignment of EST families to genomic regions.
5. Provide soybean researchers with electronic access to BAC clones encompassing regions
likely to contain genes and QTL of agronomic importance.
Personnel Involved This Year:
At TAMU: Dr. Chengcang Wu, postdoctoral scientist; Dr. Padmavathi Nimmakayala,
postdoctoral scientist; Dr. Suku Sun, postdoctoral scientist; Mr. Filip Santos, research
assistant; Ms. Rachael Springman, undergraduate student; Ms. Kejiao Ding, graduate student;
Mr. Quanzhou Tao, research associate.
At SIUC: Jeffry Schulz continued as technician and programmer; Kay Cryder, undergraduate
student who started as a graduate student; Amanda Tiedeman, undergraduate student;
Abdelmajid Kassem, high school teacher.
A marker anchored physical map was constructed from bacterial artificial chromosome
libraries and large insert plasmid libraries (hereafter BACs):
Objective 2: Marker Integration: Precisely 370 markers have been successfully anchored to
930 individual HindIII BACs (Table 2), 964 BamHI BACs and 1,121 EcoRI BACs from
Forrest. Through cooperation with another NSF project, 9872565 “A functional genomics program
for soybean”, we have fingerprinted BACs that anchor the local physical maps of soybean cv.
Williams 82 (Marek et al., 2001). This has incorporated a further 89 microsatellite markers
(from 267) and 105 RFLP markers. Presumably this provides 459 anchors (370 + 89) for the
physical map, although the homeolog frequency (45-50%) may increase this number significantly.
About 148 clones have been verified by re-amplification from an independent copy of the BAC
library and have been made available. The genetic map locations of all verified markers and the
plate addresses of all clones can be viewed at www.siu.edu/~pbgc/Database/sattlinkfiles.
Figure 1. Satellite links for linkage groups A to G available at www.siu.edu/~pbgc/
Objective 1: BAC Fingerprinting: To develop the proposed BAC/BIBAC-based, integrated
physical/genetic map of soybean, we fingerprinted 95,322 soybean BACs and BIBACs,
covering 11.8 x soybean genomes. There are 38,562 clones from the soybean Forrest Hind III
BIBAC library (Meksem et al. 2000), 22,656 clones from the Forrest BamHI BIBAC library
(Meksem et al. 2000), 30,720 clones from the new Forrest EcoRI BAC library (Wu et al. 2002)
and 3,384 clones from the Williams 82 Hind III and Fairbault Eco RI BAC libraries (Marek et
al., 2001; Danesh et al. 1998) (Table 1). A database of the BAC and BIBAC fingerprints was
created and made publicly accessible (see Figure 2 and
http://hbz.tamu.edu - Physical Mapping - Soy Map). Users can access the database via the
WWW and use the FPC Hitting Tool. The fingerprints are useful for chromosome walking
and for building minimally overlapping clone tiles for genome sequencing.
Table 1. Soybean BACs fingerprinted and edited for the construction of the map.

Library   Cloning site      Clones fingerprinted   Insert size (kbp)   Genome coverage   Clones edited
Forrest at TAMU
BIBAC     Hind III          30,720                 125                 3.5 x             30,720
BIBAC     Bam HI            21,504                 125                 2.4 x             20,736
BAC       Eco RI            30,720                 157                 4.4 x             30,720
Williams 82 & Fairbault at TAMU
BAC       HindIII & EcoRI   3,000 (1)              150                 0.4 x             3,000
Forrest at SIUC
BIBAC     Hind III          7,842 (2)              125                 0.8 x             702 (2)
BIBAC     Bam HI            1,152                  125                 0.3 x             24
Williams at SIUC
BAC       Hind III          384                    150                 0.04 x            384
Combined libraries          95,322                                     11.8 x            86,386 (3)

1. The BACs were identified with 267 SSRs and 105 RFLPs by Dr. Shoemaker's laboratory at Iowa State
University and Dr. N. Young's laboratory at the University of Minnesota (Marek et al. 2001).
2. Many clones containing SSR markers were fingerprinted twice, once at SIUC and once at TAMU. All 702
anchored clones were edited at SIUC.
3. The database of all BAC and BIBAC fingerprints is publicly accessible using the FPC Hitting
Tool at http://hbz.tamu.edu - Physical Mapping – Soy Map.
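
The clone counts quoted in the Objective 1 paragraph can be cross-checked against the rows of Table 1. The following is a minimal sketch of that bookkeeping, assuming the rows are transcribed as below; the group labels and variable names are our own shorthand, not part of the project database.

```python
# Rows transcribed from Table 1: (library group, clones fingerprinted, genome coverage in x).
# Group labels are illustrative shorthand matching the four library groups
# named in the Objective 1 paragraph; they are not identifiers from the project.
TABLE_1 = [
    ("Forrest HindIII BIBAC", 30_720, 3.5),        # TAMU
    ("Forrest BamHI BIBAC", 21_504, 2.4),          # TAMU
    ("Forrest EcoRI BAC", 30_720, 4.4),            # TAMU
    ("Williams 82 / Fairbault BAC", 3_000, 0.4),   # TAMU
    ("Forrest HindIII BIBAC", 7_842, 0.8),         # SIUC
    ("Forrest BamHI BIBAC", 1_152, 0.3),           # SIUC
    ("Williams 82 / Fairbault BAC", 384, 0.04),    # SIUC
]


def clones_by_group(rows):
    """Sum the fingerprinted clones for each library group."""
    totals = {}
    for group, clones, _coverage in rows:
        totals[group] = totals.get(group, 0) + clones
    return totals


group_totals = clones_by_group(TABLE_1)
total_clones = sum(clones for _, clones, _ in TABLE_1)
total_coverage = round(sum(cov for _, _, cov in TABLE_1), 1)

for group, clones in group_totals.items():
    print(f"{group}: {clones:,}")
print(f"All libraries: {total_clones:,} clones, {total_coverage} x")
```

Summed this way, the TAMU and SIUC rows reproduce the per-library counts given in the text (38,562; 22,656; 30,720; 3,384) and the combined figures of 95,322 clones and 11.8 x coverage.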
Figure 2. Clone plate entry and conversion in FPC available at http://hbz.tamu.edu .
Objective 3: Contig Map Assembly and Gap Filling
During September 2001, we assembled the map contigs from the fingerprint database of
81,024 soybean clones using the FPC package (Soderlund et al. 1997, 2001). We generated
5,488 automatic contigs (contigs assembled without manual editing). Contigs can be searched
for any clone in the database, and a typical contig is shown in Figure 3. The longest
contig contained 320 clones and encompassed 5.86 Mbp. The contigs consisted of a total of
396,843 unique bands, estimated to span 1,667 Mb (Table 2) if each band represents 4.2 Kbp.
Since this exceeds the genome size of soybean (1,100 Mb/haploid), the estimate may be
inaccurate. Many of the contigs may overlap even though their common bands were not
identified under the contig assembly conditions used. Also, some misassembly is inevitable
before manual editing removes multiplets and multi-band artefacts.
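
The span estimate above is simple arithmetic on the band count. A minimal sketch, assuming only the figures stated in the text (4.2 Kbp per band and a 1,100 Mb haploid genome); the function and constant names are illustrative:

```python
KB_PER_BAND = 4.2   # average fragment size per fingerprint band, as stated in the text
GENOME_MB = 1_100   # haploid soybean genome size used in the text


def contig_span_mb(unique_bands, kb_per_band=KB_PER_BAND):
    """Estimate the physical span of a set of contigs from their unique band count."""
    return unique_bands * kb_per_band / 1_000


span_mb = contig_span_mb(396_843)
excess = span_mb / GENOME_MB

print(f"Estimated span: {span_mb:.0f} Mb ({excess:.2f} x the haploid genome)")
```

The estimate comes out at roughly one and a half genome equivalents, which is why undetected overlaps between contigs are the likely explanation rather than a larger genome.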
Table 2. Status of soybean physical map after automated assembly, September 2001.

                                Number      Contigs containing:   Number
BAC clones in FPC database      81,024      > 25 clones           220
BACs used in contig assembly    75,568      10 – 25 clones        3,038
Number of singletons            5,884       3 – 9 clones          1,845
Clones in contigs               69,684      2 clones              385
Number of contigs               5,488       singletons            5,884
Unique bands of the contigs     396,843
Total contig length (Mbp)       1,667*

*Each fingerprint band was estimated to represent a fragment of 4.2 Kbp, on average.
Figure 3. A typical contig, available at the Fingerprint Contig Map
(http://hbz.tamu.edu/bacindex1.html), showing ctg2296 (clones: 11; length: 62 bands, 246 Kbp),
built on clone E12D03.
According to three reports (McPherson et al., 2001; Tao et al. 2001; Chang et al. 2001), the
number of contigs can be reduced and their size enlarged by 4- to 6-fold after contig editing and
mergence using the FPC package. Recent progress with contig editing has reduced contig
number and increased contig size (Table 3).
Table 3. Status of soybean physical map (as of June, 2002)

                                        Automated assembly    Contig editing and mergence
                                        (September 2001)      (June 2002)
BAC clones in FPC database              81,024                83,026
BACs used in contig assembly            75,568                78,001
Number of singletons                    5,884                 5,918
Clones in contigs (fold genome)         69,684 (8.7 x)        72,083 (9.0 x)
Anchored Markers                        278                   459
Number of contigs                       5,488                 5,253
Contigs containing:
  > 25 clones                           220                   235
  10 – 25 clones                        3,038                 3,226
  3 – 9 clones                          1,845                 1,687
  2 clones                              385                   105
Unique bands of the contigs             396,843               380,952
Physical length of the contigs in Mb    1,667*                1,600

*Each fingerprint band was estimated to represent a fragment of 4.2 kb, on average.
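
The two assembly snapshots in Table 3 can be compared with a few derived quantities. The sketch below is illustrative only: it transcribes the table values, checks that the rows are internally consistent, and computes the mean number of clones per contig and the band-based length. The dictionary layout and function names are assumptions made for this example.

```python
KB_PER_BAND = 4.2  # average fragment size per band, from the table footnote

# Values transcribed from Table 3; size_classes lists contigs with
# >25, 10-25, 3-9 and 2 clones, in that order.
ASSEMBLIES = {
    "September 2001": {
        "used": 75_568, "singletons": 5_884, "in_contigs": 69_684,
        "contigs": 5_488, "size_classes": [220, 3_038, 1_845, 385],
        "bands": 396_843,
    },
    "June 2002": {
        "used": 78_001, "singletons": 5_918, "in_contigs": 72_083,
        "contigs": 5_253, "size_classes": [235, 3_226, 1_687, 105],
        "bands": 380_952,
    },
}


def is_consistent(a):
    """Clones in contigs = clones used - singletons; size classes sum to contig count."""
    return (a["used"] - a["singletons"] == a["in_contigs"]
            and sum(a["size_classes"]) == a["contigs"])


def summarize(a):
    """Return (mean clones per contig, band-based length in Mb)."""
    return a["in_contigs"] / a["contigs"], a["bands"] * KB_PER_BAND / 1_000


summary = {name: summarize(a) for name, a in ASSEMBLIES.items()}
for name, (mean_clones, length_mb) in summary.items():
    ok = is_consistent(ASSEMBLIES[name])
    print(f"{name}: consistent={ok}, {mean_clones:.1f} clones/contig, {length_mb:.0f} Mb")
```

Under these figures, editing and merging raised the mean contig size from about 12.7 to about 13.7 clones while the band-based length fell from about 1,667 Mb to about 1,600 Mb.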
Based on this progress, we expect the number of soybean contigs to be reduced from
the 5,253 automated contigs to about 1,000 contigs after contig editing and mergence are
completed. The contig editing and mergence will be completed in early 2002. The BACs from
the Williams 82 and Fairbault BAC libraries (Danesh et al., 1998; Marek et al. 2001) will assist
with gap closure and contig verification. Contigs are available at http://hbz.tamu.edu - Physical
Mapping – Soy Map.
We are continuing to anchor the positive clones of the remaining marker-anchored BACs
used in screening of the Williams 82 and Fairbault libraries, and to verify, edit and merge
the contigs. The Forrest soybean genome project group at Southern Illinois University at
Carbondale is working on the integration of another 520 SSR markers. We are removing
likely cross-contaminated clones from the data set, as identified by an SIUC program. By
August 2002 we will complete the fourth-generation integrated physical and genetic map
of soybean, consisting of about 1,200 contigs, containing about 900 mapped DNA markers,
and covering about 95% of the soybean genome.
Objective 4: Synteny analysis and EST assignment.
a. EST assignment
Dr. Lightfoot has begun gene-rich region mapping by placing onto the soybean physical
map 34 F. solani induced, defense-associated ESTs from small multigene families (Genbank
BI850056-850092). Most of these ESTs (60%) are providing single bands from designed
primers. In addition, we are mapping members of two large gene families, which can help identify
gene-rich regions.
The Zhang group showed that NBS-like sequences map to many locations in the physical map
(Wu et al., 2002a). The rhg1 gene was an LRR transmembrane kinase found in a gene-rich
region. The family has 174 members in Arabidopsis (TAGI 2000). The Meksem group is
mapping members of this family in soybean (Meksem et al., 2001; 2001a) to attempt to identify
Rhg2-10.
b. Genome sequence: BAC end sequencing of marker-anchored BACs has been used for
genome sampling (Marek et al., 2001; Iqbal et al., 2002), and putative genes were detected in about
7% of sequences. We expect about 2.5% of the soybean genome to encode genes (40,000 genes
in 1,000 Mbp of genome), so the markers used seem to be biased toward gene-rich regions. As expected,
about half of the predicted genes in genomic DNA are not present in the EST libraries, but 80%
had paralogs. The predicted genes not captured by EST libraries tend to be cell, tissue, organ or
environment specific and to accumulate to low abundance. They include mRNAs that appear to
encode proteins targeted to the membranes, nucleus and mitochondria, which tend to accumulate
to lower copies per cell than cytoplasmic mRNAs.
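
The enrichment argument in item b can be made explicit. The sketch below follows the text's own reasoning and treats the 7% of BAC-end sequences with putative genes as directly comparable to the 2.5% genome-wide expectation, which is a simplification. The average coding length it prints is only what the stated figures imply, not a value reported by the project, and all names are illustrative.

```python
N_GENES = 40_000            # gene number assumed in the text
GENOME_BP = 1_000 * 10**6   # 1,000 Mbp, as used in the text
EXPECTED_FRACTION = 0.025   # share of the genome expected to encode genes
OBSERVED_FRACTION = 0.07    # share of BAC-end sequences with putative genes

# Average coding length per gene implied by 2.5% of 1,000 Mbp spread over 40,000 genes.
implied_coding_bp = round(EXPECTED_FRACTION * GENOME_BP / N_GENES)

# Fold enrichment of gene hits near the markers relative to the genome-wide expectation.
enrichment = round(OBSERVED_FRACTION / EXPECTED_FRACTION, 1)

print(f"Implied coding length per gene: {implied_coding_bp} bp")
print(f"Enrichment near anchored markers: {enrichment}-fold")
```

On these numbers, the marker-anchored BAC ends carry putative genes at a little under three times the rate expected for a random genome sample.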
c. Gene Discovery: The genomics of linkage group G of soybean: We have assembled a
physical map of linkage group G. Using 23 anchored contigs, we assembled a partial physical
map of the chromosome that brought 69 automatic contigs, encompassing 50 Mbp, into 10 merged
contigs (Figure 4). The contigs anchored to linkage group G have provided 2 nucleation points
for preliminary genome sequencing (Figures 5 and 6) to overlap with a region sequenced by
another group (Hauge et al., 2001). All of the linkage group G contigs are anchored with two or
more genetic markers that permit their orientation relative to one another. Some of the contigs
are known to contain high-density gene clusters. Some contigs are known to contain resistance
gene analog clusters (Wu et al., 2002).
[Figure 4: Forrest soybean chromosome (linkage group) G. Vertical axis: centimorgans (cM), 0 to 120.
Microsatellite markers (Satt, Sat_ and Sct_ series) and trait loci (including SDS, SCN, Sd wt,
Ht/Ldge, Fe effic., Oil and Prot) are positioned along the map.]
Figure 4A: Linkage group G from Soybase (http://macgrant.agron.iastate.edu/).
Figure 4B: Physical map of linkage group G showing the core group of marker anchors and an
example of a contig.
The annotation of the composite linkage group G posted at Soybase in January 2002 is used
here. The ratio of physical distance to genetic distance varies from 100 to greater than 900 Kbp
per cM in contigs sampled from different regions of the linkage group (Figure 4).
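
The physical-to-genetic ratio quoted above is the physical span between two anchored markers divided by their genetic distance. A minimal sketch follows; the two example contigs and their values are hypothetical, chosen only to land at the ends of the reported 100 to 900 Kbp per cM range, and are not data from the linkage group G map.

```python
def kbp_per_cm(physical_kbp, genetic_cm):
    """Ratio of physical distance (Kbp) to genetic distance (cM) between two anchors."""
    if genetic_cm <= 0:
        raise ValueError("genetic distance must be positive")
    return physical_kbp / genetic_cm


# Hypothetical anchored contigs: (physical span in Kbp, genetic distance in cM).
EXAMPLE_CONTIGS = {
    "contig_A": (500, 5.0),    # recombination-rich region
    "contig_B": (1_800, 2.0),  # recombination-poor region
}

ratios = {name: kbp_per_cm(kbp, cm) for name, (kbp, cm) in EXAMPLE_CONTIGS.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.0f} Kbp per cM")
```

A contig needs at least two genetically mapped anchors for this ratio to be defined, which is one reason the linkage group G contigs were each anchored with two or more markers.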
Figure 5: Gene density in 317 Kbp of the sequence of linkage group G by Annotation.
Predicted genes are green; exon predictions are blue (Genscan) and light blue (GeneMark);
genome sequence orthologs are red; EST paralogs and orthologs are grey.
Figure 6: Closer view of predicted genes around Rhg1 in Annotation Station.
Predicted genes are green; exon predictions are blue (Genscan) and light blue (GeneMark);
genome sequence orthologs are red; EST paralogs and orthologs are grey.
Objective 5: Community Access.
A database of the BAC and BIBAC fingerprints was created and made publicly
accessible (see Figure 2 and http://hbz.tamu.edu - Physical Mapping
- Soy Map). Users can access the database via the WWW and use all five tools. The
fingerprints are useful for chromosome walking and for building minimally overlapping
clone tiles for genome sequencing. The genetic map locations of all verified markers and the
plate addresses of all clones can be viewed at www.siu.edu/~pbgc/Database/sattlinkfiles. We
have published and distributed copies of a user guide to the physical map that includes a
description of how to download source data for contigs for in-house manipulation, as well as
an outline of some of the problems and pitfalls users will encounter.
References
Danesh, D., S. Penuela, J. Mudge, R.L. Denny, H. Nordstrom, J.P. Martinez, N.D. Young.
1998. A bacterial artificial chromosome library for soybean and identification
of clones near a major cyst nematode resistance gene. Theor. Appl. Genet. 96:196-202.
Hauge,B.M., Wang,M.L., Parsons,J.D. and Parnell,L.D. 2001. Nucleic acid molecules
and other molecules associated with soybean cyst nematode resistance Patent: WO
0151627-A 8 19-JUL-2001;
Marek L, Shoemaker R. 1997. BAC contig development by fingerprint analysis in
soybean. Genome 40:420-427.
Marek LF, Mudge J, Darnielle L, Grant D, Hanson N, Paz M, Huihuang Y, Denny R,
Larson K, Foster-Hartnett D, Cooper A, Danesh D, Larsen D, Schmidt T, Staggs R,
Crow JA, Retzel E, Young ND, Shoemaker RC. 2001. Soybean genomic survey:
BAC-end sequences near RFLP and SSR markers. Genome 44:572-581.
McPherson JD, Marra M, Hillier L, Waterston RH, Chinwalla A, Wallis J, Sekhon M,
Wylie K, Mardis ER, Wilson RK, Fulton R, Kucaba TA, Wagner-McPherson C,
Barbazuk WB, Gregory SG, Humphray SJ, French L, Evans RS, Bethel G,
Whittaker A, Holden JL, McCann OT, Dunham A, Soderlund C, Scott CE, Bentley
DR, Schuler G, Chen HC, Jang W, Green ED, Idol JR, Maduro VV, Montgomery
KT, Lee E, Miller A, Emerling S, Kucherlapati, Gibbs R, Scherer S, Gorrell JH,
Sodergren E, Clerc-Blankenburg K, Tabor P, Naylor S, Garcia D, de Jong PJ,
Catanese JJ, Nowak N, Osoegawa K, Qin S, Rowen L, Madan A, Dors M, Hood L,
Trask B, Friedman C, Massa H, Cheung VG, Kirsch IR, Reid T, Yonescu R,
Weissenbach J, Bruls T, Heilig R, Branscomb E, Olsen A, Doggett N, Cheng JF,
Hawkins T, Myers RM, Shang J, Ramirez L, Schmutz J, Velasquez O, Dixon K,
Stone NE, Cox DR, Haussler D, Kent WJ, Furey T, Rogic S, Kennedy S, Jones S,
Rosenthal A, Wen G, Schilhabel M, Gloeckner G, Nyakatura G, Siebert R,
Schlegelberger B, Korenberg J, Chen XN, Fujiyama A, Hattori M, Toyoda A, Yada
T, Park HS, Sakaki Y, Shimizu N, Asakawa S, Kawasaki K, Sasaki T, Shintani A,
Shimizu A, Shibuya K, Kudoh J, Minoshima S, Ramser J, Seranski P, Hoff C,
Poustka A, Reinhardt R, Lehrach H. 2001. A physical map of the human genome.
Nature 409(6822):934-941.
Meksem K, Ruben E, Zobrist K, Zhang H-B, Lightfoot DA. 2000. Two large insert
libraries for soybean: Applications in cyst nematode resistance and genome wide
physical mapping. Theor Appl Genet 101: 747-755.
Wu C, Tao Q, Santos FA, Nimmakayala P, Springman R, Zhang H-B. 2002. A
bacterial artificial chromosome library for ‘Forrest’ soybean. Theor. Appl. Genet. in
press.