Captions for Supplemental Tables
Supplemental Table 1
A table of all allelic variation in the Immunoglobulin Heavy Chain locus. We reconstructed the specific
allele present in each rat line and assembled all alleles for all IgH genes in a single table. Column A
identifies the gene segment or family name (using the IMGT nomenclature for rat immunoglobulin
heavy chain gene families/segments). Column B provides the specific gene name. Column C indicates
whether the allele information in that row corresponds to the Brown Norway rat reference genome
sequence (BN) or to the SHR-A3 and SHR-B2 genome sequences we have assembled to the
reference. Column D provides an allele count that designates the reference allele as allele 1 and identifies
any additional alleles for each gene that were identified in either or both SHR-A3 and SHR-B2. Column
E provides a cumulative tally of the new IgH alleles (303 in total). Column F provides the sequence
of every allele, with the variant bases shown in color.
Supplemental Table 2
This table provides a complete description of the functional state of each IgH gene in SHR-A3 and
SHR-B2, along with the amino acids affected by sequence changes. Columns A and B are as in
Supplemental Table 1. Columns C, D and E indicate for BN, SHR-A3 and SHR-B2, respectively, whether
the sequence of each IgH gene encodes a functional gene, an open reading frame or a null segment.
Column F indicates whether the allele information in that row corresponds to the Brown Norway rat
reference genome sequence (BN) or to the SHR-A3 and SHR-B2 genome sequences we have assembled
to the reference. Column G provides a count of non-synonymous alleles that designates the reference
allele as allele 1 and identifies any additional non-synonymous alleles for each gene that were
identified in either or both SHR-A3 and SHR-B2. Column H provides a cumulative tally of the new IgH
alleles that have amino acid sequence variation (128 in total). Column I indicates alleles that are
non-synonymous between SHR-A3 and SHR-B2 (98 in total). Column J provides the amino acid sequence
encoded by the alleles, with variant residues indicated in color.
Supplemental Table 3
This table integrates allelic information with genomic sequence position and includes a color-coded
haplotype map of the IgH locus. Columns A through E are as in Supplemental Table 2. Column F
indicates whether the sequence of each gene segment is inherited identical by descent and, if so,
which strains share identity. For simplicity, SHR-A3 and SHR-B2 are abbreviated to A3 and B2, and
Brown Norway to BN, in this column. IBD indicates that the two or three strains preceding the
abbreviation are identical by descent. Diff indicates strains whose alleles differ and are not inherited
identical by descent. Columns G and H provide a color-coded haplotype map of SHR-A3 (Column G) and
SHR-B2 (Column H) that represents the ancestral state of each IgH gene segment. When the segment is IBD with
BN it is colored brown. When a unique SHR-A3 allele exists, it is colored red. When a unique SHR-B2
allele exists, it is colored green. When an allele that differs from BN is present in both SHR-A3 and
SHR-B2 and is IBD, it is colored blue. Column I indicates whether the coding sequence is present on
the forward (+) or reverse (-) strand of the genome. Column J provides the position of the gene in the
rat genome assembly 3.4. Supplemental Figure 2 indicates that genome sequence read coverage was
variable across this region of the genome. This may reflect the presence of sequence duplication and
deletion occurring in this highly segmented region of the genome. Since variation is detected by
alignment to the reference, it is possible that some regions of this part of the genome are duplicated in
SHR-A3 and/or SHR-B2. Duplications can undergo subsequent genome sequence divergence. During
alignment to the reference, multiple duplicated segments that have diverged through single nucleotide
polymorphism may be aligned to a single sequence in the reference genome that reflects the original
state of that segment prior to duplication. We have provided information in Column K that may reflect
the occurrence of this phenomenon. For example, if a sequence has been duplicated 4 times in SHR-A3
or SHR-B2 compared to the reference sequence, then this may be reflected in sequence coverage
greater than the genome-wide average of ~50X that we obtained, as indicated in Supplemental Figure 2.
This may result in all 4 duplicated segments aligning to a single segment of the reference genome.
Depending on its evolutionary path, each duplicated segment may or may not contain variants that are
present in its related duplicated segments. For example, in row 37 we found that the reference
sequence was identified at the same location in 149 of 204 aligned sequence reads in SHR-A3, but that
a variant was located in 55 of 204 reads. This may reflect segmental duplication where subsequent
polymorphism has affected only one of the duplicated segments. In the same region, SHR-B2 varied
from the reference sequence and the variation was present in all reads. Next-generation short-read
genome sequencing cannot adequately resolve segmental duplications that are followed by
subsequent polymorphism. However, we note our observations here so that they may be
examined more closely as sequencing technology advances and longer-read assemblies become
available that can resolve highly segmentally duplicated sequences. Columns L and M repeat the base
sequence and amino acid sequence allele information also available in Supplemental Tables 1 and 2.
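
The read counts quoted for row 37 can be checked with simple arithmetic. The sketch below is illustrative only and is not the procedure used to build Column K. It assumes that read depth scales linearly with copy number and that the ~50X genome-wide average represents a single copy. The function names are hypothetical.

```python
# Illustrative arithmetic only. Assumption: read depth scales linearly with
# copy number, and the ~50X genome-wide average corresponds to one copy.
GENOME_WIDE_DEPTH = 50.0  # approximate average coverage stated in the caption


def estimate_copies(total_reads, mean_depth=GENOME_WIDE_DEPTH):
    """Rough copy-number estimate: local depth divided by average depth."""
    if mean_depth <= 0:
        raise ValueError("mean_depth must be positive")
    return total_reads / mean_depth


def variant_fraction(variant_reads, total_reads):
    """Fraction of aligned reads that carry the variant base."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return variant_reads / total_reads


def copies_carrying_variant(variant_reads, total_reads,
                            mean_depth=GENOME_WIDE_DEPTH):
    """Return (copies with the variant, total copies), both rounded."""
    copies = max(1, round(estimate_copies(total_reads, mean_depth)))
    with_variant = round(variant_fraction(variant_reads, total_reads) * copies)
    return with_variant, copies


# Row 37, SHR-A3: 55 variant reads out of 204 aligned reads.
print(estimate_copies(204))                # about 4.1 times the average depth
print(round(variant_fraction(55, 204), 3))
print(copies_carrying_variant(55, 204))
```

Under these assumptions, a depth of 204 reads is roughly four times the genome-wide average, and a variant fraction near one quarter is what would be expected if only one of four duplicated segments carried the variant. This is consistent with the interpretation offered above but does not establish it.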