
Figure legends for the Supplemental Information figures of manuscript 2003-1213814.
Figure SI-1. BAC clone selection process. Seed clones were initially selected either
at random from the BAC library or from contigs in the initial FPC maps (constructed at the
BCCA-GSC). BAC end sequences (produced at TIGR) and FPC fingerprint maps were
generated concurrently with BAC skim sequencing and eBAC assembly. Data from all
sources fed back into later rounds of selection, in which walking and gap-filling clones
were chosen to produce a tiling path.
Figure SI-2. Coverage of finished BACs as a function of the depth of BAC bait coverage.
Finished sequences for twelve BACs, each on a different chromosome, were compared
with enriched-BAC assemblies at varying depths of BAC-read coverage (dashed lines).
On average (bold line), as the depth of BAC coverage (sum of trimmed read lengths
divided by the fingerprint-estimated BAC size) approaches 2, the contigs from the
enriched-BAC assembly approach 100 percent coverage of the finished sequence.
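The depth measure on the x-axis is a single ratio. The Python sketch below illustrates only that arithmetic; it assumes read lengths are already trimmed and that BAC sizes are given in base pairs. The function names and example numbers are hypothetical and do not represent the pipeline used for the figure.

```python
import math


def bac_coverage_depth(trimmed_read_lengths, estimated_bac_size):
    """Depth of BAC coverage: summed trimmed read length (bp) divided by the
    fingerprint-estimated BAC size (bp)."""
    if estimated_bac_size <= 0:
        raise ValueError("estimated BAC size must be positive")
    return sum(trimmed_read_lengths) / estimated_bac_size


def reads_needed(target_depth, estimated_bac_size, mean_trimmed_read_length):
    """Approximate number of reads required to reach a target depth,
    assuming (hypothetically) that every read has the mean trimmed length."""
    return math.ceil(target_depth * estimated_bac_size / mean_trimmed_read_length)


# Hypothetical numbers: 400 trimmed reads of 500 bp on a BAC estimated at 200 kb.
print(bac_coverage_depth([500] * 400, 200_000))  # 1.0
print(reads_needed(2.0, 200_000, 500))           # 800
```

Under these assumptions, a 200 kb BAC sequenced with 500 bp reads would need about 800 reads to reach the depth of 2 at which the enriched-BAC contigs approach full coverage.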
Figure SI-3. Partitioning of fingerprint and YAC map contigs across the rat
sequence assembly. Contigs were anchored to the assembly using BAC end and in silico
mapping methods. Contigs that localized to multiple regions of the assembly are joined
by colored curves, anchored at the midpoint of each contig region. The
contigs themselves are suppressed for clarity (see c for an example of a detailed region).
The density of segmental duplications, calculated as a windowed sum over adjacent 1 Mb
regions, is shown by blue bars oriented towards the center of the circle. The segmental
duplication scale is logarithmic, with decades at 0, 1, 10 and 100 kb marked by
concentric grey circles. a. Red curves join segmented BAC fingerprint contig regions
that were visually inspected and determined to overlap in the fingerprint map.
Grey curves join contig segment pairs in which at least one member is located on
chrUn, which contains the unanchored parts of the assembly. b. YAC contigs are anchored to
the assembly by way of hybridizations to BACs with sequence coordinates. Light grey
lines link regions that (i) are anchored by hybridization to a single BAC that is associated
with <80% of the contig YACs or (ii) are anchored by <20% of the contig BACs; such
contig segmentation on the assembly is likely spurious and unlikely to reflect actual
inconsistencies between the YAC map and the sequence assembly. Red lines link the
remaining region pairs, for which segmentation evidence is robust. c. The fingerprint
map contig structure is shown in detail for the 80-200 Mb region of chromosome 2. The
bottom contig track (green) represents contigs in the manually edited map. Sequence
information was used to merge contigs and increase the contiguity of the map. The upper
contig track (blue) shows the contig layout in the merged map. Contig 3019 (contig 2040
in the merged map) maps by sequence to two disjoint regions; this split is not
corroborated by the contig structure of the fingerprint map. The histogram below the
contig tracks shows the relative number of fingerprint map clones with sequence
coordinate annotations in windows of 250 kb.
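Two computations in this legend can be sketched briefly: the windowed sums (1 Mb segmental duplication density, and the 250 kb histogram of clones with sequence coordinates) and the rule separating light grey from red YAC links in panel b. The Python below is a hypothetical illustration. It assumes non-overlapping adjacent windows and that each feature is credited to the window containing its start. It also reads "anchored by <20% of the contig BACs" as the fraction of the contig's BACs that carry anchoring hybridizations. The 80% and 20% thresholds come from the legend, while the function and parameter names are invented here.

```python
from collections import defaultdict


def windowed_sum(features, window_size):
    """Sum feature values into adjacent, non-overlapping windows.

    features: iterable of (position_bp, value) pairs. Each value is credited
    to the window containing its position; features crossing a window
    boundary are not split (a simplifying assumption).
    Returns {window_index: summed_value}.
    """
    sums = defaultdict(float)
    for position, value in features:
        sums[position // window_size] += value
    return dict(sums)


def yac_link_is_robust(n_anchor_bacs, frac_contig_yacs_on_bac, frac_contig_bacs):
    """Return True for a robust (red) link, False for a likely spurious (light grey) one.

    Weak case (i): anchored by a single BAC associated with <80% of the contig YACs.
    Weak case (ii): anchored by <20% of the contig BACs.
    """
    weak_single_bac = n_anchor_bacs == 1 and frac_contig_yacs_on_bac < 0.80
    weak_bac_support = frac_contig_bacs < 0.20
    return not (weak_single_bac or weak_bac_support)


# 1 Mb segmental duplication density from (start, duplicated bp) pairs.
print(windowed_sum([(100, 5_000), (999_999, 5_000), (1_000_000, 2_000)], 1_000_000))
# {0: 10000.0, 1: 2000.0}
# 250 kb histogram: each clone with sequence coordinates counts once.
print(windowed_sum([(10, 1), (260_000, 1), (270_000, 1)], 250_000))
# {0: 1.0, 1: 2.0}
print(yac_link_is_robust(1, 0.5, 0.9))  # False
```

On the logarithmic scale described above, a window's bar length would then be proportional to the base-10 logarithm of its summed value in kb.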
Figure SI-4. Coverage of BAC clones selected for sequencing from the fingerprint
map. Summary of the 44 rounds of sequence clone selection, showing the total size
of the selected clones, the coverage they provide and the number of coverage gaps.
Solid glyphs correspond to the sequence clone selections;
hollow glyphs correspond to simulated random sets. The total size and coverage plots
show statistics compiled from selections made from the fingerprint map only,
representing 16,299 clones. The gap count was computed using all selected clones. Each
random set contained the same number of simulated selections as the corresponding
experimental selection.
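The three statistics plotted here (total size, coverage and gap count) and the matched random control can be sketched as interval arithmetic. The Python below makes several assumptions: clones are placed as half-open (start, end) coordinates on a single sequence, total size counts overlapping clones in full, and coverage is the size of their union. Gaps are counted as breaks between merged blocks, and a random set is drawn without replacement from a candidate pool of the same size as the real selection. All names are hypothetical.

```python
import random


def merge_intervals(intervals):
    """Merge overlapping or abutting half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


def selection_stats(clones):
    """Return (total clone size, bases covered, internal coverage gaps)
    for clones given as (start, end) coordinates on one sequence."""
    total_size = sum(end - start for start, end in clones)
    merged = merge_intervals(clones)
    covered = sum(end - start for start, end in merged)
    gaps = max(len(merged) - 1, 0)
    return total_size, covered, gaps


def random_selection(candidates, n, seed=None):
    """Simulated random set: n clones drawn without replacement."""
    return random.Random(seed).sample(candidates, n)


selected = [(0, 100), (50, 150), (200, 300)]
print(selection_stats(selected))  # (300, 250, 1)
pool = selected + [(120, 220), (400, 500)]
control = random_selection(pool, len(selected), seed=1)
print(selection_stats(control))
```

Comparing these statistics between a real selection and its size-matched random control, round by round, gives the solid versus hollow glyph comparison in the figure. With several chromosomes, the statistics would be computed per sequence and summed.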
Figure SI-5. Correlation of SINE locations in rat and mouse.
Correlation of lineage-specific SINE densities in 14,243 windows of 100 kb in rat and
the orthologous regions in mouse. For each of the SINE families B1, B2 and ID, we
constructed consensus sequences for multiple rat-specific, mouse-specific and
ancestral subfamilies to optimize the distinction between lineage-specific and shared repeats.
To further minimize the number of elements falsely labeled as lineage-specific, which
would exaggerate the correlation, the maximum divergence of a copy from a consensus
sequence was set below 9%, i.e., below the neutral substitution level accumulated in each
lineage since the speciation (SINEs observed at identical sites in rat and mouse, and therefore
predating the speciation, were in fact 12% or more diverged from the available
consensus sequences, owing to CpG content and the still incomplete resolution of subfamily
structure).
For the same windows, the densities of lineage-specific LTR elements and of L1 copies in rat
and mouse showed no correlation.
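The analysis described here amounts to filtering copies by divergence, converting per-window counts into densities, and correlating the two species across windows. The Python sketch below shows one way to do this under stated assumptions. Each mouse copy is assumed to have been assigned beforehand to the index of the orthologous rat window. Density is taken as copies per bp of window, and Pearson correlation is used because the legend does not name the coefficient. The 9% cutoff comes from the legend, and the function names are hypothetical.

```python
import math

MAX_DIVERGENCE = 0.09  # copies must be <9% diverged from their consensus


def window_densities(copies, window_lengths):
    """Lineage-specific copies per bp in each window.

    copies: iterable of (window_index, divergence_fraction); for mouse the
    window index is that of the orthologous rat window (assumed precomputed).
    window_lengths: length in bp of each window (100 kb in rat; the
    orthologous mouse regions may differ in length).
    """
    counts = [0] * len(window_lengths)
    for window, divergence in copies:
        if divergence < MAX_DIVERGENCE:
            counts[window] += 1
    return [c / length for c, length in zip(counts, window_lengths)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


rat = window_densities([(0, 0.05), (0, 0.10), (1, 0.02), (2, 0.01), (2, 0.03)],
                       [100_000] * 3)
mouse = window_densities([(0, 0.04), (1, 0.06), (2, 0.02), (2, 0.08)],
                         [90_000, 100_000, 110_000])
print(pearson(rat, mouse))
```

In this sketch the rat copy at 10% divergence is discarded by the cutoff. That is the kind of likely ancestral copy the legend's 9% threshold is meant to exclude from the lineage-specific counts.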
Figure SI-6. Dot-plots showing positions of aligned segments in rat-rat, rat-mouse
and rat-human comparisons, for a 10 Mb region of rat chromosome 10 and the
orthologous regions in mouse and human. All known interspersed and tandem
repeats were masked before alignment with BLASTZ (default parameters). The
many off-diagonal lines reflect the prevalent medium-length duplications. A putative
gene cluster in the center of this region, which in humans contains the genes TREM5 and
CMRF35, was excluded from the calculations of the frequency of medium-length
duplications.
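In the rat-rat panel, the duplications that were counted are the off-diagonal segments of a self-comparison, excluding the gene cluster. The Python below sketches that counting step. The legend does not define "medium length," so min_len and max_len are supplied by the caller. Segments are assumed to be ungapped forward-strand matches given as (x_start, y_start, length) in bp. Because a symmetric self-alignment may report each duplication on both sides of the diagonal, pairs are canonicalized before counting. The function name and input format are hypothetical and are not BLASTZ output.

```python
def count_medium_duplications(segments, min_len, max_len, excluded=None):
    """Count distinct off-diagonal aligned segments of medium length.

    segments: iterable of (x_start, y_start, length) in bp from aligning a
    region against itself. Segments on the main diagonal (x_start == y_start)
    are the trivial self-match and are skipped. Mirror-image pairs are
    canonicalized as (min, max, length) so each duplication counts once.
    excluded: optional (start, end) interval; segments overlapping it on either
    axis are dropped, as for the putative TREM5/CMRF35 gene cluster.
    """
    def overlaps(start, length):
        return (excluded is not None
                and start < excluded[1] and excluded[0] < start + length)

    seen = set()
    for x, y, length in segments:
        if x == y or not (min_len <= length <= max_len):
            continue
        if overlaps(x, length) or overlaps(y, length):
            continue
        seen.add((min(x, y), max(x, y), length))
    return len(seen)


segs = [(0, 0, 1000), (100, 5000, 800), (5000, 100, 800),
        (200, 9000, 50), (3000, 7000, 900)]
print(count_medium_duplications(segs, 500, 2000))                      # 2
print(count_medium_duplications(segs, 500, 2000, excluded=(2900, 3100)))  # 1
```

Excluding a dense local cluster prevents a single gene family from dominating the genome-wide frequency estimate of medium-length duplications.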
Figure SI-7. Sequence conservation at AG acceptor splice sites in aligned introns.
Figure SI-8. Sequence conservation at GT donor splice sites in aligned introns.