Supporting References

Supporting Information
Supporting Materials and Methods
Information on probes used for GB-BAC identification. The algorithms that were used to
design genic overgo probes were provided in Zheng et al. (1) and are available through the
OligoSpawn interface at http://www.oligospawn.org. Details of the overgo labeling and
hybridization procedures were provided by Madishetty et al. (2). Hybridization pools c1 through
c69 included a total of 12,285 probes, of which 12,059 were intended to find a single gene and 226
were intended to find several to many genes sharing the probe sequence (“popular” probes). In
general, these probes were designed to find genes in functional categories, or by expression pattern,
or location on a specific chromosome. Most of the probes that were chosen by their expression
pattern made use of experiments conducted using the Barley1 GeneChip (3), especially drought
stress, low temperature, salinity or abscisic acid application. Probes addressing functional
categories included transcription factors, photosynthetic processes, kinases, phosphatases, cell
wall biogenesis and numerous others. The overgos in pools c4 through c9 were selected from a list
of “popular” oligonucleotides (1) to maximize the number of gene-positive BACs per probe, but
in some cases problems were encountered with highly repetitive sequences. The first few pools
(c1 through c3) were composed of 40 bp overgos corresponding to genes indicated by the literature
as pertinent to abiotic stress; the remainder of the overgos produced 36 bp probes. The nature of
pool c0 and all other probes from researchers who provided GB-BAC addresses from prior work
varied widely, including cDNAs, genomic DNA fragments, overgos and PCR amplification.
Hybridization process. Autoradiographs were analyzed using High Density Filter Reading
(HDFR) software from Incogen Inc. (Williamsburg, VA). X-ray films were scanned and imported
into HDFR, where a grid file was generated for the filter layout reflecting the 18,432 clone
addresses. Filter images were aligned with the grid using the background and a few (3-4) strong
signal positive BACs, then each filter was scored and positives compiled into a text file for each
pool. Each BAC was spotted at two locations within a 4 x 4 grid in a unique pattern to facilitate
correct scoring of positive clones. All filter images were scored by a second, and sometimes a
third, person. Any BAC scored positive by any person was added to the list of GB-BACs, with the
expectation that this would result in some false positives since each person applied subjective
judgment as to the boundary of positive versus negative hybridizations.
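The "positive by any person" rule amounts to taking the union of the scorers' call sets. The following is a minimal sketch of that rule, not the HDFR workflow itself: the function names and the clone addresses are hypothetical. Counting how many scorers agreed on each clone is an added convenience (not described above) for spotting calls that are more likely to be false positives.

```python
from collections import Counter
from typing import Dict, List, Set


def vote_counts(scorer_calls: List[Set[str]]) -> Dict[str, int]:
    """Count how many scorers called each BAC address positive."""
    counts: Counter = Counter()
    for calls in scorer_calls:
        counts.update(calls)
    return dict(counts)


def compile_positives(scorer_calls: List[Set[str]]) -> Set[str]:
    """Union rule: a BAC is kept if any single scorer called it positive."""
    return set(vote_counts(scorer_calls))


# Hypothetical calls from two scorers for one pool.
scorer_a = {"clone_01", "clone_02"}
scorer_b = {"clone_02", "clone_03"}
positives = compile_positives([scorer_a, scorer_b])
single_vote = sorted(c for c, n in vote_counts([scorer_a, scorer_b]).items() if n == 1)
```

Here `positives` holds all three clones, while `single_vote` lists the two that only one scorer called.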
Compartmentalized assembly method. Contigs were assembled using a tolerance of 3 and a
cutoff of 1e-45 with all other parameters at default values. FPC’s End-Merger function was applied
in several iterations with a cutoff of 1e-40. To avoid making wrong merges early in the process,
End-Merger was run with increasingly lower values of the “match” parameter (the required
number of matching clones in one of the ends of the contigs that will be merged) (6 for the first
iteration, 4 for the second iteration, and 3 for subsequent iterations). Cutoff values of 1e-50, 1e-55,
1e-60 were used iteratively to resolve Q-clones. Q-contigs (contigs that contain at least one Q-clone) that contain 15% or more Q-clones were then split into component parts to decrease the
number of Q-clones. To merge contigs that share many clones, a similarity probability was
computed (the probability that two contigs share a set of clones by chance) as per Bozdag et al. (4)
and then contigs that have a similarity probability less than a threshold were merged using MergeSimilar-Contigs software. A threshold of 0 was used for the first iteration, 1e-30 for the second
iteration, and 1e-15 for subsequent iterations. After the automatic assembly of the fingerprinted
clones, there were 72,052 clones, 10,794 contigs, 10,598 singletons, and 996 Q-contigs. Only 75
of these Q-contigs contain 15% or more Q-clones.
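The iteration schedules and the 15% Q-clone rule described above can be written down compactly. This is a minimal sketch, not FPC or the authors' code: the function names and the dictionary layout are hypothetical, and only the numeric values (match 6, 4, 3; thresholds 0, 1e-30, 1e-15; 15%) come from the text.

```python
from typing import Dict, List, Tuple


def end_merger_match(iteration: int) -> int:
    """'match' value for a 1-based End-Merger iteration: 6, 4, then 3."""
    return {1: 6, 2: 4}.get(iteration, 3)


def merge_similarity_threshold(iteration: int) -> float:
    """Similarity-probability threshold per 1-based iteration: 0, 1e-30, then 1e-15."""
    return {1: 0.0, 2: 1e-30}.get(iteration, 1e-15)


def contigs_to_split(contigs: Dict[str, Tuple[int, int]],
                     min_percent: int = 15) -> List[str]:
    """Return IDs of Q-contigs whose Q-clone share is at least min_percent.

    `contigs` maps a contig ID to (total clones, Q-clones). Integer
    arithmetic avoids floating-point surprises at exactly 15%.
    """
    return [cid for cid, (n_clones, n_q) in contigs.items()
            if n_q > 0 and 100 * n_q >= min_percent * n_clones]


# Hypothetical contigs: ctg1 is exactly at 15%, ctg2 is below, ctg3 has no Q-clones.
example = {"ctg1": (20, 3), "ctg2": (20, 2), "ctg3": (10, 0)}
flagged = contigs_to_split(example)
```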
Supporting References
1. Zheng J, et al. (2006) OligoSpawn: a software tool for the design of overgo probes from large
unigene datasets. BMC Bioinformatics 7:7.
2. Madishetty K, Condamine P, Svensson JT, Rodriguez E, Close TJ (2007) An improved method
to identify BAC clones using pooled overgos. Nucleic Acids Res 35:e5–e7.
3. Close TJ, et al. (2004) A new resource for cereal genomics: 22K barley GeneChip comes of
age. Plant Physiology 134:960–968.
4. Bozdag S, Close T, Lonardi S (2009) A compartmentalized approach to the assembly of
physical maps. BMC Bioinformatics 10:217.
Fig. S1. Scatter plot of number of gene-bearing sequenced BACs against molecular sizes for barley
chromosome arms. The regression line, correlation coefficient (r) and coefficient of determination
(r²) are shown. Molecular sizes of barley chromosome arms were extracted from Suchankova et
al. (2006).
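For readers who want to reproduce the statistics shown in such a plot, the regression line, r and r² can be computed with ordinary least squares. The sketch below is illustrative only: the function name is hypothetical and the input values are toy numbers, not the arm sizes or BAC counts from the figure.

```python
from math import sqrt
from typing import Sequence, Tuple


def linear_fit(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float, float]:
    """Least-squares line y = slope * x + intercept, plus Pearson r and r squared."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r, r * r


# Toy data lying exactly on y = 2x, so the fit is perfect.
slope, intercept, r, r2 = linear_fit([1, 2, 3, 4], [2, 4, 6, 8])
```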
Fig. S2. BAC distribution along barley chromosomes 1H, 3H, 4H, 6H and 7H. Grey bars represent
the number of sequenced barley BACs and their units are shown on the left Y-axis. Colored lines
represent the proportion of BACs containing only 1 HC gene model (blue), 3 or more HC genes
(red) or 0 HC gene models (yellow), and the scale is shown on the right Y-axis. BAC densities are
calculated for a sliding window of 40 Mb at 2.5 Mb intervals based on the physical coordinates
provided by IBSC (2012). Barley-rice synteny is represented by lines connecting each mapped
BAC to the position on the rice genome determined by BLASTX (see Materials and Methods).
Densities of expressed rice genes across chromosomes are also displayed (adapted from
Supplementary Figure 2 in IRGSP (2005)), where blue bars indicate the frequency of gene models
in 100 kb windows, red boxes indicate centromeres and white boxes represent physical gaps.
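The sliding-window calculation in Fig. S2 (40 Mb windows at 2.5 Mb intervals) can be sketched as follows. The caption does not say how windows are anchored, so this sketch assumes left-anchored, half-open windows [start, start + 40 Mb) that begin at position 0. The function name and the example positions are hypothetical.

```python
from bisect import bisect_left
from typing import Iterable, List, Tuple


def sliding_window_counts(positions: Iterable[int],
                          chrom_length: int,
                          window: int = 40_000_000,
                          step: int = 2_500_000) -> List[Tuple[int, int]]:
    """Count BAC positions in each window [start, start + window)."""
    pos = sorted(positions)
    counts = []
    start = 0
    while start < chrom_length:
        # Binary search gives the number of positions inside the window.
        n_inside = bisect_left(pos, start + window) - bisect_left(pos, start)
        counts.append((start, n_inside))
        start += step
    return counts


# Hypothetical BAC coordinates on a 50 Mb chromosome.
windows = sliding_window_counts([1_000_000, 3_000_000, 45_000_000], 50_000_000)
```

To obtain the proportions plotted as colored lines, the same function could be run once per gene-count class and each result divided by the count for all BACs in that window.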
Fig. S3. Estimate of the total number of gene-bearing BACs. The X-axis represents the probe pool
number starting with pool 1. The hybridization data were randomly shuffled and sampled 10,000
times to plot the number of unique BACs identified as a function of the number of probe pools
applied. The left vertical axis represents the mean number of newly identified BACs (in red,
declining linearly), while the right vertical axis shows the mean value of the total number of unique
BACs identified (in green, increasing asymptotically). Linear extrapolation of the mean values of
new BACs is shown as dashed lines.
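The shuffling procedure behind Fig. S3 resembles an accumulation curve averaged over random pool orders. The following is a minimal sketch under that reading, not the authors' script: the function name, the seed and the toy pools are hypothetical, and only the default of 10,000 shuffles is taken from the caption.

```python
import random
from typing import List, Set, Tuple


def accumulation_curve(pool_hits: List[Set[str]],
                       n_shuffles: int = 10_000,
                       seed: int = 0) -> Tuple[List[float], List[float]]:
    """Average new and cumulative unique BACs per pool over random pool orders."""
    rng = random.Random(seed)
    n_pools = len(pool_hits)
    new_sum = [0] * n_pools
    total_sum = [0] * n_pools
    order = list(range(n_pools))
    for _ in range(n_shuffles):
        rng.shuffle(order)
        seen: Set[str] = set()
        for k, idx in enumerate(order):
            new = pool_hits[idx] - seen      # BACs not found by earlier pools
            seen |= new
            new_sum[k] += len(new)
            total_sum[k] += len(seen)
    mean_new = [s / n_shuffles for s in new_sum]
    mean_total = [s / n_shuffles for s in total_sum]
    return mean_new, mean_total


# Toy example: three pools with overlapping hits.
toy_pools = [{"a", "b"}, {"b", "c"}, {"c"}]
mean_new, mean_total = accumulation_curve(toy_pools, n_shuffles=200)
```

The point at which a line fitted to `mean_new` reaches zero would give the kind of extrapolated estimate that the dashed line in the figure represents.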
Table S1. Statistics of BAC sequence assembly for different minimum node sizes. (*) Numbers
do not include gene models hitting ≥10 BACs. (NA) numbers of gene models not shown for these
node sizes as a minimum length of 200 bp was used for the BLAST alignments.
Min Node Size | HV Set | # BACs | Avg. # Nodes/BAC | Length of assembled reads (bp) | Avg. BAC Length | Avg. N50 | Avg. L50 | # Unique HC Models * | # Unique LC Models *
100 | 3 | 2,123 | 30 | 221,329,225 | 104,253 | 22,345 | 2.6 | NA | NA
100 | 4 | 2,134 | 31.3 | 266,413,198 | 124,842 | 20,991 | 3.5 | NA | NA
100 | 5 | 2,161 | 26.4 | 250,223,324 | 115,791 | 25,175 | 2.8 | NA | NA
100 | 6 | 2,181 | 26.2 | 227,828,567 | 104,461 | 20,147 | 3.2 | NA | NA
100 | 7 | 2,155 | 22.7 | 228,934,235 | 106,234 | 27,881 | 2.5 | NA | NA
100 | 8 | 2,180 | 27.5 | 228,919,327 | 105,009 | 22,343 | 2.9 | NA | NA
100 | 9 | 1,638 | 26.5 | 174,329,244 | 106,428 | 22,928 | 2.8 | NA | NA
100 | 10 | 1,050 | 43.6 | 118,985,319 | 113,319 | 31,136 | 2.3 | NA | NA
100 | All Sets | 15,622 | 28.3 | 1,716,962,439 | 109,907 | 23,660 | 2.8 | NA | NA
200 | 3 | 2,123 | 18.8 | 217,876,895 | 102,627 | 22,707 | 2.6 | 4,733 | 5,177
200 | 4 | 2,134 | 23.7 | 264,139,517 | 123,777 | 21,092 | 3.4 | 4,015 | 5,029
200 | 5 | 2,161 | 19.1 | 248,005,208 | 114,764 | 25,417 | 2.7 | 4,029 | 5,259
200 | 6 | 2,181 | 19.2 | 225,688,671 | 103,479 | 20,324 | 3.1 | 3,671 | 4,353
200 | 7 | 2,155 | 15.5 | 226,742,726 | 105,217 | 28,063 | 2.4 | 4,056 | 4,871
200 | 8 | 2,180 | 20.6 | 226,799,383 | 104,036 | 22,524 | 2.9 | 3,993 | 4,808
200 | 9 | 1,638 | 20.4 | 172,930,153 | 105,574 | 23,092 | 2.7 | 3,532 | 4,307
200 | 10 | 1,050 | 21.9 | 115,572,808 | 110,069 | 31,987 | 2.2 | 2,313 | 2,863
200 | All Sets | 15,622 | 19.7 | 1,697,755,361 | 108,677 | 23,906 | 2.8 | 15,707 | 19,330
400 | 3 | 2,123 | 14.4 | 215,338,448 | 101,431 | 22,975 | 2.5 | 4,733 | 5,173
400 | 4 | 2,134 | 19.8 | 261,756,143 | 122,660 | 21,206 | 3.3 | 4,012 | 5,013
400 | 5 | 2,161 | 16.1 | 246,155,661 | 113,908 | 25,539 | 2.7 | 4,027 | 5,249
400 | 6 | 2,181 | 16.3 | 223,920,615 | 102,669 | 20,442 | 3 | 3,665 | 4,346
400 | 7 | 2,155 | 12.9 | 225,190,771 | 104,497 | 28,261 | 2.4 | 4,054 | 4,868
400 | 8 | 2,180 | 17.2 | 224,627,560 | 103,040 | 22,759 | 2.8 | 3,989 | 4,793
400 | 9 | 1,638 | 17 | 171,292,922 | 104,574 | 23,256 | 2.7 | 3,528 | 4,306
400 | 10 | 1,050 | 12.6 | 112,940,735 | 107,563 | 32,486 | 2.2 | 2,310 | 2,858
400 | All Sets | 15,622 | 16 | 1,681,222,855 | 107,619 | 24,102 | 2.7 | 15,701 | 19,308
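The N50 and L50 columns of Table S1 can be illustrated with the conventional definitions. After sorting node lengths from longest to shortest, N50 is the length of the node at which the running total first reaches half of the assembled length, and L50 is the number of nodes needed to get there. The sketch below assumes these standard definitions apply here, and that the minimum node size is applied as a simple length filter before the statistics are computed. The function name and the toy lengths are hypothetical.

```python
from typing import Iterable, Tuple


def n50_l50(node_lengths: Iterable[int], min_node_size: int = 0) -> Tuple[int, int]:
    """Return (N50, L50) for nodes at least min_node_size bp long."""
    kept = sorted((n for n in node_lengths if n >= min_node_size), reverse=True)
    total = sum(kept)
    running = 0
    for count, length in enumerate(kept, start=1):
        running += length
        if 2 * running >= total:   # reached half of the assembled length
            return length, count
    raise ValueError("no nodes left after filtering")


# Toy node lengths for a single BAC assembly.
toy = [50, 30, 20, 10, 5]
all_nodes = n50_l50(toy)                       # total 115, half is 57.5
filtered = n50_l50(toy, min_node_size=20)      # total 100, half is 50
```

In this toy case, raising the minimum node size drops the short nodes and shifts both statistics, which is the kind of effect the three blocks of the table compare.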
Table S2. BAC clones assigned to '4HC'. HC gene models contained in those BACs are shown.
Gene models hitting ≥10 BACs are indicated with an asterisk.
BAC | IBSC (2012) Chr. | IBSC (2012) physical position | Ariyadasa et al. (2014) Chr. | Ariyadasa et al. (2014) cM
0033J08
0059H22
4H
0075H13
0H
0104I13
4H
60267720
430787160
# HC genes (≤10 BAC hits) | # HC genes (≥10 BAC hits)
0
HC gene ID
Annotation
1
AK361796*
Vacuolar protein sorting protein 25
-
4
51.4
0
0
-
4
51.13
0
2
AK370358*
4
51.13
5
6
MLOC_23041.1*
MLOC_72681.1
cDNA clone:J033145I11, full insert sequence
Retrotransposon protein, putative, unclassified
Retrotransposon protein, putative, unclassified
AK248360.1
Retinoblastoma-related protein
AK360577
Retinoblastoma-related protein
MLOC_3611.1
MLOC_75153.1
MLOC_80393.1*
AK251542.1*
0132M20
Retrotransposon protein, putative, unclassified
Retrotransposon protein, putative, unclassified
Retrotransposon protein, putative, unclassified
Retrotransposon protein, putative, Ty3-gypsy subclass
MLOC_8336.1*
RNA-directed DNA polymerase (reverse transcriptase)-related family protein LENGTH=210
MLOC_23734.1*
Retrotransposon protein, putative, LINE subclass
MLOC_9225.1*
RNA-directed DNA polymerase (reverse transcriptase)-related family protein LENGTH=146
MLOC_56399.1*
Retrotransposon protein, putative, unclassified
0
0
-
-
1
0
AK355297
30S ribosomal protein S5
0143O21
0H
0199F14
4H
341362080
4
51.27
0
1
AK357593*
Retrotransposon protein, putative, Ty3-gypsy subclass
0253O03
4H
287446880
4
51.27
0
0
-
-
0
1
AK361796*
Vacuolar protein sorting protein 25
-
-
0253P15
0258K05
4H
412163600
4
51.6
0
0
0271C20
4H
371291960
4
51.4
2
0
MLOC_14312.4
MLOC_26248.1
tRNA-splicing endonuclease
0282F07
4H
268238040
4
51.27
0
0
0287D21
4H
385016280
4
51.17
0
0
-
-
6
MLOC_72681.1
Retrotransposon protein, putative, unclassified
0288F07
4H
430787160
4
51.13
5
-
tRNA-splicing endonuclease, putative
-
AK248360.1
Retinoblastoma-related protein
AK360577
Retinoblastoma-related protein
MLOC_3611.1
MLOC_75153.1
MLOC_80393.1*
AK251542.1*
0293B11
0H
0305P04
4H
69740080
4
51.2
0
0
4
51.13
0
2
0327K16
Retrotransposon protein, putative, unclassified
Retrotransposon protein, putative, unclassified
Retrotransposon protein, putative, unclassified
Retrotransposon protein, putative, Ty3-gypsy subclass
MLOC_8336.1*
RNA-directed DNA polymerase (reverse transcriptase)-related family protein LENGTH=210
MLOC_23734.1*
Retrotransposon protein, putative, LINE subclass
MLOC_9225.1*
RNA-directed DNA polymerase (reverse transcriptase)-related family protein LENGTH=146
MLOC_56399.1*
Retrotransposon protein, putative, unclassified
-
-
MLOC_81217.1*
MLOC_26663.1*
Unknown protein
Retrotransposon protein, putative, unclassified
0
0
-
-
0332H20
4H
268238040
4
51.27
0
0
-
-
0335O06
4H
309164240
4
51.27
0
0
-
-
MLOC_9792.1
unknown protein; LOCATED IN: chloroplast stroma, chloroplast; EXPRESSED IN: 16 plant structures; EXPRESSED DURING: 10 growth stages; BEST Arabidopsis thaliana protein match is: unknown protein. LENGTH=578
AK355646
Magnesium chelatase subunit D
0343P09
4H
412163600
4
51.6
2
0
0355I05
4H
294314000
4
51.35
0
0
-
-
0367O21
4H
270838320
4
51.84
0
1
AK361796*
Vacuolar protein sorting protein 25
2
0
AK366110
Chloride channel G
AK374032
ATP-dependent RNA helicase, putative
0370F11
0383P23
4H
294314000
4
51.35
0
0
-
-
0384E06
4H
268238040
4
51.27
0
0
-
-
0395H19
4H
400552240
4
51.27
1
5
AK369779
Serine--tRNA ligase
MLOC_31683.1*
MLOC_27082.1*
MLOC_45246.1*
MLOC_29006.1*
MLOC_31105.1*
Retrotransposon protein, putative, unclassified
Retrotransposon protein, putative, unclassified
Retrotransposon protein, putative, unclassified
Retrotransposon protein, putative, unclassified
Retrotransposon protein, putative, unclassified
0406O16
0
1
AK250939.1*
60S ribosomal protein L27
0415E07
7H
342180480
7
7.61
0
0
-
-
0417L20
1H
288571560
1
52.51
0
0
-
-
0418G02
4H
206388080
4
51.6
1
0
AK369779
Serine--tRNA ligase
0420M07
5H
23667880
5
43.75
0
0
-
-
0422B15
5H
18794680
5
3.42
0
0
-
-
0430N13
0H
0
0
-
-
0440F08
0H
0448E19
4H
52308160
4
51.4
0
0
-
-
4
52.5
0
0
-
-
0
0
-
0464C14
0464I10
3
3
MLOC_6065.1
MLOC_64204.1
AK250129.1
MLOC_67693.1*
MLOC_80910.1*
MLOC_63286.1*
0466F08
0474D04
0
4H
243288000
4
51.27
0493M11
1
1
3
0H
0497J24
4H
341362080
Ribosomal protein-like
AK374352*
Endonuclease-reverse transcriptase
AK364687*
Endonuclease-reverse transcriptase
AK248406.1*
Endonuclease-reverse transcriptase
0
AK355297
30S ribosomal protein S5
1
MLOC_58018.2
Tir-nbs resistance protein
AK357593*
0495P14
Alpha-1,4-glucan-protein synthase [UDP-forming], putative
Alpha-1,4-glucan-protein synthase [UDP-forming], putative
Alpha-1,4-glucan-protein synthase [UDP-forming], putative
Retrotransposon protein, putative, unclassified
Terpenoid cyclase/protein prenyltransferase alpha-alpha toroid
Retrotransposon protein, putative, Ty3-gypsy subclass
Retrotransposon protein, putative, Ty3-gypsy subclass
4
51.27
0
1
AK357593*
4
51.27
0
2
AK250939.1*
60S ribosomal protein L27
AK357593*
Retrotransposon protein, putative, Ty3-gypsy subclass
0505C19
0
0
-
-
0511D22
0
0
-
-
0533C18
0
0
-
-
0553F21
0
0
-
-
0
0
-
-
0567E13
0
0
-
-
0568F02
0
1
AK361796*
Vacuolar protein sorting protein 25
0580C08
0
0
-
-
0598H07
0
0
-
-
0562B13
4H
389512520
4
51.27
0626D08
0629O16
4H
333449640
0683O04
0706G15
0739I14
1
0
MLOC_24918.1
Heme exporter protein CcmC
0
0
-
-
1
0
AK362606
50S ribosomal protein L14
0
MLOC_68534.1
50S ribosomal protein L14
AK362606
50S ribosomal protein L14
2
4H
294314000
4
51.35
0743G16
0
0
-
-
0
0
-
-
0
0
-
-
0743N18
0H
0789B17
4H
371291960
4
51.4
0
0
-
-
0805J22
4H
270838320
4
51.84
0
1
AK361796*
Vacuolar protein sorting protein 25