Supporting Information

Supporting Materials and Methods

Information on probes used for GB-BAC identification. The algorithms used to design genic overgo probes are described in Zheng et al. (1) and are available through the OligoSpawn interface at http://www.oligospawn.org. Details of the overgo labeling and hybridization procedures are given by Madishetty et al. (2). Hybridization pools c1 through c69 included a total of 12,285 probes, of which 12,059 were intended to find a single gene and 226 were intended to find several to many genes sharing the probe sequence (“popular” probes). In general, probes were chosen to target genes by functional category, expression pattern, or location on a specific chromosome. Most of the probes chosen by expression pattern drew on experiments conducted using the Barley1 GeneChip (3), especially those involving drought stress, low temperature, salinity or abscisic acid application. Probes addressing functional categories included transcription factors, photosynthetic processes, kinases, phosphatases, cell wall biogenesis and numerous others. The overgos in pools c4 through c9 were selected from a list of “popular” oligonucleotides (1) to maximize the number of gene-positive BACs per probe, but in some cases problems were encountered with highly repetitive sequences. The first few pools (c1 through c3) were composed of 40 bp overgos corresponding to genes indicated by the literature as pertinent to abiotic stress; the remainder of the overgos produced 36 bp probes. The nature of the probes in pool c0, and of all other probes from researchers who provided GB-BAC addresses from prior work, varied widely and included cDNAs, genomic DNA fragments, overgos and PCR amplification.

Hybridization process. Autoradiographs were analyzed using High Density Filter Reading (HDFR) software from Incogen Inc. (Williamsburg, VA).
X-ray films were scanned and imported into HDFR, where a grid file reflecting the 18,432 clone addresses was generated for the filter layout. Filter images were aligned with the grid using the background and a few (3-4) strongly positive BACs; each filter was then scored and the positives were compiled into a text file for each pool. Each BAC was spotted at two locations within a 4 x 4 grid in a unique pattern to facilitate correct scoring of positive clones. All filter images were scored by a second, and sometimes a third, person. Any BAC scored positive by any person was added to the list of GB-BACs, with the expectation that this would result in some false positives, since each person applied subjective judgment as to the boundary between positive and negative hybridizations.

Compartmentalized assembly method. Contigs were assembled using a tolerance of 3 and a cutoff of 1e-45, with all other parameters at default values. FPC’s End-Merger function was applied in several iterations with a cutoff of 1e-40. To avoid making wrong merges early in the process, End-Merger was run with progressively lower values of the “match” parameter (the required number of matching clones in one of the ends of the contigs that will be merged): 6 for the first iteration, 4 for the second iteration, and 3 for subsequent iterations. Cutoff values of 1e-50, 1e-55 and 1e-60 were used iteratively to resolve Q-clones. Q-contigs (contigs that contain at least one Q-clone) in which 15% or more of the clones were Q-clones were then split into component parts to decrease the number of Q-clones. To merge contigs that share many clones, a similarity probability (the probability that two contigs share a set of clones by chance) was computed as per Bozdag et al. (4), and contigs with a similarity probability less than a threshold were merged using MergeSimilar-Contigs software. A threshold of 0 was used for the first iteration, 1e-30 for the second iteration, and 1e-15 for subsequent iterations.
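The iteration schedules and the Q-contig splitting rule described above can be written down compactly. The following is only a minimal sketch for bookkeeping purposes: it does not call FPC, End-Merger or the merging software, and the class and function names (`Contig`, `should_split`, `end_merger_match`, `merge_similar_threshold`) are hypothetical names introduced here for illustration. How a similarity probability is compared against the first-iteration threshold of 0 is not specified in the text, so the sketch only returns the threshold values.

```python
from dataclasses import dataclass

# Cutoffs applied one after another to resolve Q-clones (values from the text).
Q_CLONE_CUTOFFS = (1e-50, 1e-55, 1e-60)


@dataclass
class Contig:
    """Hypothetical summary of one contig: total clones and Q-clones."""
    name: str
    n_clones: int
    n_q_clones: int


def is_q_contig(contig: Contig) -> bool:
    """A Q-contig contains at least one Q-clone."""
    return contig.n_q_clones >= 1


def should_split(contig: Contig, percent: int = 15) -> bool:
    """True when Q-clones make up `percent` % or more of the contig.

    Integer arithmetic avoids floating-point edge cases at exactly 15%.
    """
    if contig.n_clones == 0:
        return False
    return 100 * contig.n_q_clones >= percent * contig.n_clones


def end_merger_match(iteration: int) -> int:
    """'match' value for a 1-based End-Merger iteration: 6, 4, then 3."""
    return {1: 6, 2: 4}.get(iteration, 3)


def merge_similar_threshold(iteration: int) -> float:
    """Similarity-probability threshold for a 1-based merge iteration."""
    return {1: 0.0, 2: 1e-30}.get(iteration, 1e-15)


if __name__ == "__main__":
    # Made-up contigs, used only to exercise the rule.
    examples = [Contig("ctgA", 20, 3), Contig("ctgB", 20, 2), Contig("ctgC", 12, 0)]
    for c in examples:
        print(c.name, is_q_contig(c), should_split(c))
    print([end_merger_match(i) for i in range(1, 5)])
    print([merge_similar_threshold(i) for i in range(1, 5)])
```

In this sketch a contig of 20 clones with 3 Q-clones sits exactly at the 15% boundary and is therefore flagged for splitting, matching the "15% or more" wording.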
After the automatic assembly of the fingerprinted clones, there were 72,052 clones, 10,794 contigs, 10,598 singletons, and 996 Q-contigs. Only 75 of these Q-contigs contain 15% or more Q-clones.

Supporting References

1. Zheng J, et al. (2006) OligoSpawn: a software tool for the design of overgo probes from large unigene datasets. BMC Bioinformatics 7:7.
2. Madishetty K, Condamine P, Svensson JT, Rodriguez E, Close TJ (2007) An improved method to identify BAC clones using pooled overgos. Nucleic Acids Res 35:e5–e7.
3. Close TJ, et al. (2004) A new resource for cereal genomics: 22K barley GeneChip comes of age. Plant Physiol 134:960–968.
4. Bozdag S, Close T, Lonardi S (2009) A compartmentalized approach to the assembly of physical maps. BMC Bioinformatics 10:217.

Fig. S1. Scatter plot of the number of gene-bearing sequenced BACs against molecular size for barley chromosome arms. The regression line, correlation coefficient (r) and coefficient of determination (r²) are shown. Molecular sizes of barley chromosome arms were taken from Suchankova et al. (2006).

Fig. S2. BAC distribution along barley chromosomes 1H, 3H, 4H, 6H and 7H. Grey bars represent the number of sequenced barley BACs; their units are shown on the left Y-axis. Colored lines represent the proportion of BACs containing only 1 HC gene model (blue), 3 or more HC gene models (red) or 0 HC gene models (yellow); their scale is shown on the right Y-axis. BAC densities are calculated for a sliding window of 40 Mb at 2.5 Mb intervals based on the physical coordinates provided by IBSC (2012). Barley-rice synteny is represented by lines connecting each mapped BAC to the position on the rice genome determined by BLASTX (see Materials and Methods).
Densities of expressed rice genes across chromosomes are also displayed (adapted from Supplementary Figure 2 in IRGSP (2005)), where blue bars indicate the frequency of gene models in 100 kb windows, red boxes indicate centromeres and white boxes represent physical gaps.

Fig. S3. Estimate of the total number of gene-bearing BACs. The X-axis represents the probe pool number, starting with pool 1. The hybridization data were randomly shuffled and sampled 10,000 times to plot the number of unique BACs identified as a function of the number of probe pools applied. The left vertical axis represents the mean number of newly identified BACs (in red, declining linearly), while the right vertical axis shows the mean value of the total number of unique BACs identified (in green, increasing asymptotically). Linear extrapolation of the mean values of new BACs is shown as dashed lines.

Table S1. Statistics of BAC sequence assembly for different minimum node sizes. (*) Numbers do not include gene models hitting ≥10 BACs. (NA) Numbers of gene models are not shown for these node sizes, as a minimum length of 200 bp was used for the BLAST alignments.

Min node  HV set    # BACs  Avg. #     Length of assembled  Avg. BAC  Avg.    Avg.  # Unique HC  # Unique LC
size                        nodes/BAC  reads (bp)           length    N50     L50   models*      models*
100       3         2,123   30         221,329,225          104,253   22,345  2.6   NA           NA
100       4         2,134   31.3       266,413,198          124,842   20,991  3.5   NA           NA
100       5         2,161   26.4       250,223,324          115,791   25,175  2.8   NA           NA
100       6         2,181   26.2       227,828,567          104,461   20,147  3.2   NA           NA
100       7         2,155   22.7       228,934,235          106,234   27,881  2.5   NA           NA
100       8         2,180   27.5       228,919,327          105,009   22,343  2.9   NA           NA
100       9         1,638   26.5       174,329,244          106,428   22,928  2.8   NA           NA
100       10        1,050   43.6       118,985,319          113,319   31,136  2.3   NA           NA
100       All Sets  15,622  28.3       1,716,962,439        109,907   23,660  2.8   NA           NA
200       3         2,123   18.8       217,876,895          102,627   22,707  2.6   4,733        5,177
200       4         2,134   23.7       264,139,517          123,777   21,092  3.4   4,015        5,029
200       5         2,161   19.1       248,005,208          114,764   25,417  2.7   4,029        5,259
200       6         2,181   19.2       225,688,671          103,479   20,324  3.1   3,671        4,353
200       7         2,155   15.5       226,742,726          105,217   28,063  2.4   4,056        4,871
200       8         2,180   20.6       226,799,383          104,036   22,524  2.9   3,993        4,808
200       9         1,638   20.4       172,930,153          105,574   23,092  2.7   3,532        4,307
200       10        1,050   21.9       115,572,808          110,069   31,987  2.2   2,313        2,863
200       All Sets  15,622  19.7       1,697,755,361        108,677   23,906  2.8   15,707       19,330
400       3         2,123   14.4       215,338,448          101,431   22,975  2.5   4,733        5,173
400       4         2,134   19.8       261,756,143          122,660   21,206  3.3   4,012        5,013
400       5         2,161   16.1       246,155,661          113,908   25,539  2.7   4,027        5,249
400       6         2,181   16.3       223,920,615          102,669   20,442  3     3,665        4,346
400       7         2,155   12.9       225,190,771          104,497   28,261  2.4   4,054        4,868
400       8         2,180   17.2       224,627,560          103,040   22,759  2.8   3,989        4,793
400       9         1,638   17         171,292,922          104,574   23,256  2.7   3,528        4,306
400       10        1,050   12.6       112,940,735          107,563   32,486  2.2   2,310        2,858
400       All Sets  15,622  16         1,681,222,855        107,619   24,102  2.7   15,701       19,308

Table S2. BAC clones assigned to '4HC'. HC gene models contained in those BACs are shown. Gene models hitting ≥10 BACs are indicated with an asterisk.

BAC IBSC (2012) Chr. IBSC (2012) physical position Ariyadasa et al. (2014) Chr. Ariyadasa et al.
(2014) cM # HC genes (≤10 BAC hits) # HC genes (≥10 BAC hits) HC gene ID Annotation
0033J08 0059H22 4H 0075H13 0H 0104I13 4H 60267720 430787160 0 1 AK361796* Vacuolar protein sorting protein 25 - 4 51.4 0 0 - 4 51.13 0 2 AK370358* 4 51.13 5 6 MLOC_23041.1* MLOC_72681.1 cDNA clone:J033145I11, full insert sequence Retrotransposon protein, putative, unclassified Retrotransposon protein, putative, unclassified AK248360.1 Retinoblastoma-related protein AK360577 Retinoblastoma-related protein MLOC_3611.1 MLOC_75153.1 MLOC_80393.1* AK251542.1* 0132M20 Retrotransposon protein, putative, unclassified Retrotransposon protein, putative, unclassified Retrotransposon protein, putative, unclassified Retrotransposon protein, putative, Ty3-gypsy subclass MLOC_8336.1* RNA-directed DNA polymerase (reverse transcriptase)-related family protein LENGTH=210 MLOC_23734.1* Retrotransposon protein, putative, LINE subclass MLOC_9225.1* RNA-directed DNA polymerase (reverse transcriptase)-related family protein LENGTH=146 MLOC_56399.1* Retrotransposon protein, putative, unclassified 0 0 - - 1 0 AK355297 30S ribosomal protein S5 0143O21 0H 0199F14 4H 341362080 4 51.27 0 1 AK357593* Retrotransposon protein, putative, Ty3-gypsy subclass 0253O03 4H 287446880 4 51.27 0 0 - - 0 1 AK361796* Vacuolar protein sorting protein 25 - - 0253P15 0258K05 4H 412163600 4 51.6 0 0 0271C20 4H 371291960 4 51.4 2 0 MLOC_14312.4 MLOC_26248.1 tRNA-splicing endonuclease 0282F07 4H 268238040 4 51.27 0 0 0287D21 4H 385016280 4 51.17 0 0 - - 6 MLOC_72681.1 Retrotransposon protein, putative, unclassified 0288F07 4H 430787160 4 51.13 5 - tRNA-splicing endonuclease, putative - AK248360.1 Retinoblastoma-related protein AK360577 Retinoblastoma-related protein MLOC_3611.1 MLOC_75153.1 MLOC_80393.1* AK251542.1*
0293B11 0H 0305P04 4H 69740080 4 51.2 0 0 4 51.13 0 2 0327K16 Retrotransposon protein, putative, unclassified Retrotransposon protein, putative, unclassified Retrotransposon protein, putative, unclassified Retrotransposon protein, putative, Ty3-gypsy subclass MLOC_8336.1* RNA-directed DNA polymerase (reverse transcriptase)-related family protein LENGTH=210 MLOC_23734.1* Retrotransposon protein, putative, LINE subclass MLOC_9225.1* RNA-directed DNA polymerase (reverse transcriptase)-related family protein LENGTH=146 MLOC_56399.1* Retrotransposon protein, putative, unclassified - - MLOC_81217.1* MLOC_26663.1* Unknown protein Retrotransposon protein, putative, unclassified 0 0 - - 0332H20 4H 268238040 4 51.27 0 0 - - 0335O06 4H 309164240 4 51.27 0 0 - - MLOC_9792.1 unknown protein; LOCATED IN: chloroplast stroma, chloroplast; EXPRESSED IN: 16 plant structures; EXPRESSED DURING: 10 growth stages; BEST Arabidopsis thaliana protein match is: unknown protein. LENGTH=578 AK355646 Magnesium chelatase subunit D 0343P09 4H 412163600 4 51.6 2 0 0355I05 4H 294314000 4 51.35 0 0 - - 0367O21 4H 270838320 4 51.84 0 1 AK361796* Vacuolar protein sorting protein 25 2 0 AK366110 Chloride channel G AK374032 ATP-dependent RNA helicase, putative 0370F11 0383P23 4H 294314000 4 51.35 0 0 - - 0384E06 4H 268238040 4 51.27 0 0 - - 0395H19 4H 400552240 4 51.27 1 5 AK369779 Serine--tRNA ligase MLOC_31683.1* MLOC_27082.1* MLOC_45246.1* MLOC_29006.1* MLOC_31105.1* Retrotransposon protein, putative, unclassified Retrotransposon protein, putative, unclassified Retrotransposon protein, putative, unclassified Retrotransposon protein, putative, unclassified Retrotransposon protein, putative, unclassified
0406O16 0 1 AK250939.1* 60S ribosomal protein L27 0415E07 7H 342180480 7 7.61 0 0 - - 0417L20 1H 288571560 1 52.51 0 0 - - 0418G02 4H 206388080 4 51.6 1 0 AK369779 Serine--tRNA ligase 0420M07 5H 23667880 5 43.75 0 0 - - 0422B15 5H 18794680 5 3.42 0 0 - - 0430N13 0H 0 0 - - 0440F08 0H 0448E19 4H 52308160 4 51.4 0 0 - - 4 52.5 0 0 - - 0 0 - 0464C14 0464I10 3 3 MLOC_6065.1 MLOC_64204.1 AK250129.1 MLOC_67693.1* MLOC_80910.1* MLOC_63286.1* 0466F08 0474D04 0 4H 243288000 4 51.27 0493M11 1 1 3 0H 0497J24 4H 341362080 Ribosomal protein-like AK374352* Endonuclease-reverse transcriptase AK364687* Endonuclease-reverse transcriptase AK248406.1* Endonuclease-reverse transcriptase 0 AK355297 30S ribosomal protein S5 1 MLOC_58018.2 Tir-nbs resistance protein AK357593* 0495P14 Alpha-1,4-glucan-protein synthase [UDP-forming], putative Alpha-1,4-glucan-protein synthase [UDP-forming], putative Alpha-1,4-glucan-protein synthase [UDP-forming], putative Retrotransposon protein, putative, unclassified Terpenoid cylase/protein prenyltransferase alpha-alpha toroid Retrotransposon protein, putative, Ty3-gypsy subclass Retrotransposon protein, putative, Ty3-gypsy subclass 4 51.27 0 1 AK357593* 4 51.27 0 2 AK250939.1* 60S ribosomal protein L27 AK357593* Retrotransposon protein, putative, Ty3-gypsy subclass 0505C19 0 0 - - 0511D22 0 0 - - 0533C18 0 0 - - 0553F21 0 0 - - 0 0 - - 0567E13 0 0 - - 0568F02 0 1 AK361796* Vacuolar protein sorting protein 25 0580C08 0 0 - - 0598H07 0 0 - - 0562B13 4H 389512520 4 51.27 0626D08 0629O16 4H 333449640 0683O04 0706G15 0739I14 1 0 MLOC_24918.1 Heme exporter protein CcmC 0 0 - - 1 0 AK362606 50S ribosomal protein L14 0 MLOC_68534.1 50S ribosomal protein L14 AK362606 50S ribosomal protein L14 2 4H 294314000 4 51.35
0743G16 0 0 - - 0 0 - - 0 0 - - 0743N18 0H 0789B17 4H 371291960 4 51.4 0 0 - - 0805J22 4H 270838320 4 51.84 0 1 AK361796* Vacuolar protein sorting protein 25