Comparative Diversity of Protocadherin Gene Clusters

SUPPLEMENTAL TEXT
Diversified gene regulation of the vertebrate-specific Pcdh clusters
The clustered Pcdh genes are absent from the Drosophila melanogaster and
Caenorhabditis elegans genomes (Rubin et al. 2000; Hill et al. 2001; Noonan et al.
2004). I found that the constant protein sequences of the α and γ clusters are highly
conserved among mammals, birds, amphibians, and fish (Supplemental Figure S2). The
lengths of the α constant region polypeptide sequences are almost identical (Figure
S2A); however, the constant polypeptide sequences of the zebrafish γ proteins are
longer than those of the non-fish vertebrates (Figure S2B). The two draft genomes of
sea squirts (invertebrate chordates: Ciona intestinalis and Ciona savignyi) do not seem
to contain the Pcdh clusters (Dehal et al. 2002) (www.broad.mit.edu/annotation/ciona).
Therefore, the Pcdh clusters seem to be an evolutionary novelty specific to vertebrates.
Although tandemly duplicated genes tend to have conserved promoter motifs, the
motifs in different groups may have distinct characteristics. For example, the variable
exons of the UGT1 gene clusters can be divided into phenol and bilirubin groups. There
is no common motif in the promoter regions of all of the UGT1 variable exons. However,
there is a highly conserved motif upstream of each variable exon in the bilirubin group
and a distinct motif in the phenol group (Zhang et al. 2004).
Each Pcdh variable exon is preceded by a distinct promoter (Tasic et al. 2002).
Each Pcdh promoter contains a highly conserved “CGCT” core sequence motif in
humans and mice (Wu et al. 2001; Noonan et al. 2004). I reasoned that the motifs for
different Pcdh groups may have distinct characteristics. By using the Gibbs Motif
Sampler program (Thompson et al. 2003) (bayesweb.wadsworth.org/gibbs), I searched
the 350-bp regions upstream of the translation start codon for each member of 15
distinct groups of the chimpanzee, rat, and zebrafish Pcdh genes (Figure S3). The
motifs for chimpanzee and rat α (Figure S3, A and E), β (Figure S3, B and F), γa (Figure
S3, C and G), and γb (Figure S3, D and H) are conserved. All contain a highly
conserved “CGCT” core motif. However, the flanking sequences are different between
the mammalian γa and γb genes (Figure S3, C, D, G, and H). This observation suggests
that the regulation of mammalian γa and γb groups may be different. The mammalian
c-type Pcdh genes have a motif distinct from those of the other mammalian groups
(Figure S3I).
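As a rough illustration of the kind of upstream window examined above, the following minimal Python sketch extracts the region upstream of a start codon and reports where the “CGCT” core occurs. It is a plain exact-string scan, not the Gibbs Motif Sampler used in the analysis, and the function names and the toy sequence are hypothetical.

```python
import re

CORE = "CGCT"       # core motif reported in the text
UPSTREAM_LEN = 350  # window length (bp) used in the text


def upstream_region(sequence, start_codon_index, length=UPSTREAM_LEN):
    """Return up to `length` bases immediately upstream of the start codon."""
    begin = max(0, start_codon_index - length)
    return sequence[begin:start_codon_index].upper()


def core_positions(region, core=CORE):
    """Return 0-based offsets of every (possibly overlapping) core match."""
    pattern = "(?=" + re.escape(core) + ")"  # lookahead allows overlaps
    return [m.start() for m in re.finditer(pattern, region)]


# Hypothetical toy sequence: 12 promoter bases (lowercase) followed by
# an ATG start codon. Real input would be genomic sequence.
toy = "ttacgctgagcaATGGCC"
region = upstream_region(toy, toy.index("ATG"))
print(region, core_positions(region))
```

An exact scan of this kind can only confirm the presence and position of a known core; discovering the flanking sequence preferences that distinguish the groups requires a motif-finding method such as the sampler cited above.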
The motifs of the zebrafish α and γ group 1 genes are similar to each other
(Figure S3, J and M), and resemble those of the mammalian  genes, which have a
“CGCT” core sequence. The motifs for the zebrafish α group 2 and 3 genes have a
weak “CAGT” sequence instead of the “CGCT” core (Figure S3, K and L). The zebrafish
γ group 2 genes (Figure S3N) have a motif related to those of the mammalian γa group
(Figure S3, C and G). The zebrafish γ group 3 genes have a distinct motif (Figure S3O).
These results suggest that each zebrafish variable exon is preceded by a promoter that
is related to a mammalian promoter, but that its regulation has diverged considerably from
that of the mammalian Pcdh genes.
SUPPLEMENTAL LITERATURE CITED
Dehal, P., Y. Satou, R. K. Campbell, J. Chapman, B. Degnan et al., 2002 The draft
genome of Ciona intestinalis: insights into chordate and vertebrate origins.
Science 298: 2157-2167.
Hill, E., I. D. Broadbent, C. Chothia and J. Pettitt, 2001 Cadherin superfamily proteins in
Caenorhabditis elegans and Drosophila melanogaster. J. Mol. Biol. 305: 1011-1024.
Noonan, J. P., J. Grimwood, J. Schmutz, M. Dickson and R. M. Myers, 2004 Gene
conversion and the evolution of protocadherin gene cluster diversity. Genome
Res. 14: 354-366.
Rubin, G. M., M. D. Yandell, J. R. Wortman, G. L. Gabor Miklos, C. R. Nelson et al.,
2000 Comparative genomics of the eukaryotes. Science 287: 2204-2215.
Tasic, B., C. E. Nabholz, K. K. Baldwin, Y. Kim, E. H. Rueckert et al., 2002 Promoter
choice determines splice site selection in protocadherin alpha and gamma pre-mRNA
splicing. Mol. Cell 10: 21-33.
Thompson, W., E. C. Rouchka and C. E. Lawrence, 2003 Gibbs Recursive Sampler:
finding transcription factor binding sites. Nucleic Acids Res. 31: 3580-3585.
Wu, Q., T. Zhang, J. F. Cheng, Y. Kim, J. Grimwood et al., 2001 Comparative DNA
sequence analysis of mouse and human protocadherin gene clusters. Genome
Res. 11: 389-404.
Zhang, T., P. Haws and Q. Wu, 2004 Multiple variable first exons: a mechanism for cell-
and tissue-specific gene regulation. Genome Res. 14: 79-89.
SUPPLEMENTAL FIGURE LEGENDS
Figure S1. RT-PCR of members of two  clusters from the zebrafish brain RNA
preparations. The amplified full-length coding sequences are indicated by 3 kb bands.
The smaller bands are alternatively spliced products. M, marker.
Figure S2. An alignment of vertebrate Pcdh α (A) and γ (B) constant protein sequences
with conserved residues highlighted. The high degree of conservation between the two
zebrafish α or γ constant regions demonstrates that these clusters are duplicated. HS,
Homo sapiens; PT, Pan troglodytes; MM, Mus musculus; RN, Rattus norvegicus; GG,
Gallus gallus; XT, Xenopus tropicalis; DR, Danio rerio.
Figure S3. Characteristics of the conserved promoter sequence motifs in vertebrate
clustered Pcdh genes. Shown are graphic logo representations of the chimpanzee α
(A), β (B), γa (C), γb (D), rat α (E), β (F), γa (G), γb (H), chimpanzee and rat c-type (I),
zebrafish α groups 1 (J), 2 (K), 3 (L), and γ groups 1 (M), 2 (N), and 3 (O) motifs. The
height of symbols indicates the relative frequency of each nucleotide at that position.
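The relative nucleotide frequencies that determine symbol height in such a logo can be sketched in Python as follows. This is only an illustration: the aligned sites are hypothetical, the function name is an assumption, and many logo programs scale heights by information content rather than by raw frequency.

```python
from collections import Counter


def position_frequencies(sites):
    """Relative frequency of A, C, G, and T at each column of aligned sites."""
    if not sites or len({len(s) for s in sites}) != 1:
        raise ValueError("sites must be non-empty and of equal length")
    freqs = []
    for column in zip(*sites):  # iterate over alignment columns
        counts = Counter(base.upper() for base in column)
        total = sum(counts.values())
        freqs.append({b: counts.get(b, 0) / total for b in "ACGT"})
    return freqs


# Hypothetical aligned motif instances, not data from this study.
sites = ["CGCT", "CGCT", "CAGT", "CGCT"]
freqs = position_frequencies(sites)
for i, f in enumerate(freqs, start=1):
    print(i, f)
```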
Figure S4. An alignment of the human (A), chimpanzee (B), mouse (C), and rat (D) Pcdh α,
β, and γ ECs 1-3 sequences with those of C-cadherin (C-cdh). The codons predicted to
be subject to positive selection with a posterior probability >0.90 by one model and >0.50
by at least one other model are highlighted in red for members of the α cluster, in green
for β, in blue for γa, and in violet for γb. The corresponding positions in C-cadherin are
also highlighted accordingly. Positions that were predicted to be under positive selection
in two or more groups are indicated by an asterisk.
SUPPLEMENTAL TABLE LEGENDS
Table S1. List of oligonucleotides used
Table S2. Log-likelihood values and parameter estimates for 22 human, chimpanzee,
mouse, rat, and zebrafish Pcdh groups
1 Model: Maximum likelihood models implemented in the codeml program of the
PAML package. M0, one-ratio; M1, neutral; M2, selection; M3, discrete; M7, β; M8,
β+ω.
2 ℓ: Log-likelihood values estimated by the codeml program.
3 κ: Transition/transversion rate ratio estimated by the codeml program.
4 Estimation of Parameters: ω=KA/KS, nonsynonymous/synonymous rate ratio;
p=proportion of sites for each site class. M0: one estimated ω for all sites; M1:
estimate p0=proportion of sites with ω=0, p1=1 - p0, proportion of sites with ω=1;
M2: estimate p0 (ω=0), p1 (ω=1), and ω, p2=1 - p0 - p1. M3: estimate p0, p1, ω0, ω1,
and ω2; p2=1 - p0 - p1. M7: estimate p and q (parameters of β distribution of ω
between 0 and 1). M8: same as M7 except additional site class where an estimated
ω is allowed.
5 LRT (2Δℓ): Statistical likelihood ratio test; comparing the test statistic (2Δℓ) calculated
from paired codeml models (M1 vs M2; M0 vs M3; and M7 vs M8) with the critical
value of the asymptotic chi-square distribution with appropriate degrees of freedom (i.e.
2 d.f., 4 d.f., and 2 d.f., respectively). 2Δℓ and level of significance are shown for M2,
M3, and M8 models. Note that no positively selected sites are predicted by at least
two pairs of codeml models for any of the six zebrafish Pcdh groups. N/A, Not
applicable.
6 Positively Selected Sites: Codon positions predicted to be under positive selection
with a posterior probability >0.90 by one codeml model (M2, M3, or M8), and >0.50
by at least one other model.
Note that residues are comparably numbered among Pcdh groups and between
different species.
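The likelihood ratio test described in the Table S2 legend can be sketched in Python as follows. This is a minimal illustration, assuming the statistic is twice the log-likelihood difference between the paired models. The log-likelihood values are hypothetical, the function names are assumptions, and the chi-square upper-tail probability uses the closed form that holds for even degrees of freedom (which covers the 2 d.f. and 4 d.f. cases here).

```python
import math

# Degrees of freedom for each model pair, as stated in the legend.
DEGREES_OF_FREEDOM = {("M1", "M2"): 2, ("M0", "M3"): 4, ("M7", "M8"): 2}


def lrt_statistic(lnl_null, lnl_alt):
    """Twice the log-likelihood difference between alternative and null."""
    return 2.0 * (lnl_alt - lnl_null)


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):  # sum of (x/2)^k / k! for k < df/2
        term *= half / k
        total += term
    return math.exp(-half) * total


# Hypothetical log-likelihoods for one M7 vs M8 comparison.
stat = lrt_statistic(lnl_null=-1010.0, lnl_alt=-1000.0)
p_value = chi2_sf_even_df(stat, DEGREES_OF_FREEDOM[("M7", "M8")])
print(stat, p_value)
```

For reference, the 5% critical values are approximately 5.99 for 2 d.f. and 9.49 for 4 d.f.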
Table S3. Summary information for 22 Pcdh groups analyzed
1 Tree length: measured as the number of nucleotide substitutions along the tree per
codon by the codeml program.
2 ω+ sites: codon positions predicted to be under positive selection with a
posterior probability >0.90 by one codeml model (M2, M3, or M8), and >0.50 by at
least one other model. Note that residues are comparably numbered among Pcdh
groups and between different species.
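The two-threshold criterion for reporting a site can be sketched in Python as follows. This is an illustration only: the posterior probabilities are hypothetical, and the function name and input layout (a mapping from model name to per-codon posteriors) are assumptions rather than the codeml output format.

```python
def positively_selected_sites(posteriors, high=0.90, low=0.50):
    """Return sorted codon positions passing the two-threshold criterion.

    posteriors maps a model name to a dict of codon position -> posterior
    probability. A position is kept when one model gives a posterior above
    `high` and at least one other model gives a posterior above `low`.
    """
    positions = set().union(*(p.keys() for p in posteriors.values()))
    selected = []
    for pos in sorted(positions):
        probs = [model.get(pos, 0.0) for model in posteriors.values()]
        n_high = sum(p > high for p in probs)
        n_low = sum(p > low for p in probs)
        # The model above `high` also counts toward n_low, so a second
        # supporting model requires n_low >= 2.
        if n_high >= 1 and n_low >= 2:
            selected.append(pos)
    return selected


# Hypothetical posterior probabilities, not values from Table S2.
example = {
    "M2": {10: 0.95, 25: 0.40},
    "M3": {10: 0.60, 25: 0.97, 40: 0.92},
    "M8": {25: 0.45, 40: 0.55},
}
print(positively_selected_sites(example))
```

In this toy input, position 25 is excluded because only one model supports it above either threshold.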
Table S4. Log-likelihood values and parameter estimates for 8 primate and rodent Pcdh
groups
1 Model: Maximum likelihood models implemented in the codeml program of the
PAML package. M0, one-ratio; M1, neutral; M2, selection; M3, discrete; M7, β; M8,
β+ω.
2 ℓ: Log-likelihood values estimated by the codeml program.
3 κ: Transition/transversion rate ratio estimated by the codeml program.
4 Estimation of Parameters: ω=KA/KS, nonsynonymous/synonymous rate ratio;
p=proportion of sites for each site class. M0: one estimated ω for all sites; M1:
estimate p0=proportion of sites with ω=0, p1=1 - p0, proportion of sites with ω=1;
M2: estimate p0 (ω=0), p1 (ω=1), and ω, p2=1 - p0 - p1. M3: estimate p0, p1, ω0, ω1,
and ω2; p2=1 - p0 - p1. M7: estimate p and q (parameters of β distribution of ω
between 0 and 1). M8: same as M7 except additional site class where an estimated
ω is allowed.
5 LRT (2Δℓ): Statistical likelihood ratio test; comparing the test statistic (2Δℓ) calculated
from paired codeml models (M1 vs M2; M0 vs M3; and M7 vs M8) with the critical
value of the asymptotic chi-square distribution with appropriate degrees of freedom (i.e.
2 d.f., 4 d.f., and 2 d.f., respectively). 2Δℓ and level of significance are shown for M2,
M3, and M8 models. No positively selected sites are predicted for zebrafish Pcdh
genes (data not shown). N/A, Not applicable.
6 Positively Selected Sites: Codon positions predicted to be under positive selection
with a posterior probability >0.90 by one codeml model (M2, M3, or M8), and >0.50
by at least one other model.
Note that residues are comparably numbered among Pcdh groups and between
different species.