Online Resource 1. Phylogenetic analysis of soybean bZIP proteins

MATERIALS AND METHODS
To provide an evolutionary framework for the discussion of protein functions, phylogenetic hypotheses were inferred by Bayesian inference (BI) and maximum likelihood (ML) using BEAST v1.7.2 (Huelsenbeck and Ronquist 2001) and GARLI 2.0 (Zwickl 2006), respectively. First, a search for bZIP protein sequences of Glycine max was performed in GenBank (http://blast.ncbi.nlm.nih.gov), and 148 sequences were selected (Online Resource 2). Additionally, we selected 10 bZIP protein sequences related to the pathogen response (ACT66299_TabZIP; AAX20030_CAbZIP1; CAA71687_GHBF1; AAL27150_BZI1; O22763_AtbZIP10; AAN61914_PPI1; BAG24402_SBZ1; NP_001234596_SlAREB1; P43273_TGA2; Q39234_TGA3). The selected sequences were aligned using MAFFT version 7 (Katoh et al. 2002), and JTT+G (Jones et al. 1992) was identified by ProtTest 3.2.1 (Darriba et al. 2011) as the best-fit amino acid substitution model according to both the AIC and BIC criteria.
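ProtTest ranks candidate models with information criteria, and the model with the lowest score is preferred. As a minimal sketch of how AIC and BIC trade likelihood against model complexity, the snippet below computes both criteria for a few candidate models; the log-likelihoods, parameter counts and sample size are hypothetical illustration values, not results from this analysis.

```python
import math
from typing import Dict, Tuple


def aic(log_likelihood: float, k: int) -> float:
    """Akaike information criterion: AIC = 2k - 2 ln L."""
    return 2 * k - 2 * log_likelihood


def bic(log_likelihood: float, k: int, n: int) -> float:
    """Bayesian information criterion: BIC = k ln(n) - 2 ln L."""
    return k * math.log(n) - 2 * log_likelihood


def best_models(fits: Dict[str, Tuple[float, int]], n: int) -> Tuple[str, str]:
    """Return the names of the lowest-AIC and lowest-BIC models.

    `fits` maps a model name to (ln L, number of free model parameters).
    """
    best_aic = min(fits, key=lambda m: aic(fits[m][0], fits[m][1]))
    best_bic = min(fits, key=lambda m: bic(fits[m][0], fits[m][1], n))
    return best_aic, best_bic


# Hypothetical fits on one fixed alignment (illustrative numbers only).
# k counts only model-specific parameters (e.g. +G adds the gamma shape);
# branch-length parameters shared by all models shift every score equally,
# so they do not change the ranking.
fits = {
    "JTT": (-10500.0, 0),
    "JTT+G": (-10200.0, 1),
    "WAG+G": (-10250.0, 1),
}
print(best_models(fits, n=500))  # n: hypothetical sample size (alignment sites)
```

BIC penalizes each extra parameter by ln(n) instead of 2, so for large alignments it favors simpler models more strongly than AIC; when both criteria agree, as reported here for JTT+G, the choice is robust to that difference.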
The BI phylogenetic trees were estimated with the Bayesian Markov chain Monte Carlo (MCMC) method, running 5 × 10^7 generations with a sampling frequency of 10^4 generations under the JTT+G substitution model. Convergence of the parameters was assessed in TRACER v1.5.0 (http://beast.bio.ed.ac.uk/tracer), and the chain reached a stationary distribution after 5 × 10^5 generations. Accordingly, the first 1% of the sampled trees (corresponding to those first 5 × 10^5 generations) was discarded as burn-in, and the remaining trees were used to produce the consensus Bayesian phylogenetic tree.
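The sampling settings above imply a concrete number of trees. A quick check, assuming one tree is saved every 10^4 generations and ignoring any extra sample that some programs write at generation 0, shows that discarding the first 1% of sampled trees removes exactly the first 5 × 10^5 generations, i.e. the portion of the chain before stationarity:

```python
from typing import Dict

GENERATIONS = 5 * 10**7   # total MCMC generations
SAMPLE_FREQ = 10**4       # one tree saved every SAMPLE_FREQ generations
BURNIN_FRACTION = 0.01    # first 1% of sampled trees discarded


def burnin_summary(generations: int, sample_freq: int,
                   burnin_fraction: float) -> Dict[str, int]:
    """Count sampled, discarded and retained trees for a single MCMC run."""
    n_sampled = generations // sample_freq
    n_burnin = round(n_sampled * burnin_fraction)
    return {
        "sampled_trees": n_sampled,
        "burnin_trees": n_burnin,
        "burnin_generations": n_burnin * sample_freq,
        "retained_trees": n_sampled - n_burnin,
    }


summary = burnin_summary(GENERATIONS, SAMPLE_FREQ, BURNIN_FRACTION)
print(summary)
```

Under these assumptions the run yields 5,000 sampled trees, of which 50 are discarded and 4,950 contribute to the consensus tree.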
The JTT+G substitution model was also specified in the GARLI settings (datatype = aminoacid; ratematrix = jones; statefrequencies = jones; ratehetmodel = gamma; numratecats = 4; invariantsites = none), and the statistical support of the ML phylogenetic trees was assessed with 10^3 bootstrap replicates. The 50% majority-rule consensus ML tree of all bootstrap replicates was summarized using SumTrees of DendroPy 3.8.0 (Sukumaran and Holder 2010).
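SumTrees builds this consensus by counting how often each clade (split) occurs across the bootstrap trees. The following is a minimal, self-contained sketch of that idea, not the DendroPy implementation: it assumes rooted trees written as nested Python tuples (a hypothetical input format; SumTrees reads tree files and works with splits on unrooted trees) and keeps clades present in more than 50% of the trees, reporting that proportion as the bootstrap support.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Tuple, Union

# A rooted tree as nested tuples; leaves are taxon names (hypothetical format).
Tree = Union[str, tuple]


def collect_clades(tree: Tree) -> Tuple[FrozenSet[str], List[FrozenSet[str]]]:
    """Return the leaf set of `tree` and the leaf sets of all its internal nodes."""
    if isinstance(tree, str):
        return frozenset([tree]), []
    leaves: FrozenSet[str] = frozenset()
    internal: List[FrozenSet[str]] = []
    for child in tree:
        child_leaves, child_internal = collect_clades(child)
        leaves |= child_leaves
        internal.extend(child_internal)
    internal.append(leaves)  # the clade defined by this node
    return leaves, internal


def majority_rule(trees: List[Tree], min_freq: float = 0.5) -> Dict[FrozenSet[str], float]:
    """Clades occurring in more than `min_freq` of `trees`, with their frequency."""
    counts: Counter = Counter()
    for tree in trees:
        _, internal = collect_clades(tree)
        counts.update(set(internal))  # count each clade once per tree
    n = len(trees)
    return {clade: c / n for clade, c in counts.items() if c / n > min_freq}


# Four hypothetical bootstrap trees over taxa A-D.
boot_trees = [
    ((("A", "B"), "C"), "D"),
    ((("A", "B"), "D"), "C"),
    ((("A", "C"), "B"), "D"),
    ((("A", "B"), "C"), "D"),
]
for clade, support in sorted(majority_rule(boot_trees).items(), key=lambda kv: len(kv[0])):
    print(sorted(clade), support)
```

Any two clades that each occur in more than half of the trees must appear together in at least one tree and are therefore compatible, so the retained clades always assemble into a single consensus tree.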
To compare the clusters in the phylogenetic tree with protein functions, a search for conserved domains was performed in the Pfam database (Sonnhammer et al. 1997). Only domains predicted at the 1% significance level were considered further.
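As a hedged illustration of how a significance cutoff and domain composition can be combined when comparing clusters, the sketch below assumes that the 1% level corresponds to an E-value threshold of 0.01 and that Pfam hits are available as (protein, domain, E-value) tuples. The protein IDs and E-values are hypothetical; the domain names are ones mentioned in this document.

```python
from collections import Counter, defaultdict
from typing import Dict, FrozenSet, Iterable, List, Tuple

Hit = Tuple[str, str, float]  # (protein_id, pfam_domain, e_value)


def domain_sets(hits: Iterable[Hit], max_evalue: float = 0.01) -> Dict[str, FrozenSet[str]]:
    """Keep hits with E-value <= max_evalue and collect each protein's domain set."""
    kept = defaultdict(set)
    for protein, domain, evalue in hits:
        if evalue <= max_evalue:
            kept[protein].add(domain)
    return {protein: frozenset(domains) for protein, domains in kept.items()}


def shared_fraction(members: List[str], sets: Dict[str, FrozenSet[str]]) -> float:
    """Fraction of clade members carrying the clade's most common domain set."""
    architectures = Counter(sets.get(p, frozenset()) for p in members)
    return architectures.most_common(1)[0][1] / len(members)


# Hypothetical hits for a three-member clade.
hits = [
    ("prot1", "bZIP_1", 1e-10), ("prot1", "DOG1", 1e-5),
    ("prot2", "bZIP_1", 1e-8), ("prot2", "DOG1", 0.5),  # DOG1 hit not significant
    ("prot3", "bZIP_1", 1e-9), ("prot3", "DOG1", 1e-4),
]
sets = domain_sets(hits)
print(shared_fraction(["prot1", "prot2", "prot3"], sets))
```

Here the non-significant DOG1 hit on prot2 is dropped, so two of the three members share the {bZIP_1, DOG1} architecture.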
RESULTS
In the phylogenetic trees (Fig. S1), the ten classes previously proposed for bZIP proteins (Jakoby et al. 2002; Liao et al. 2008) were recovered as monophyletic clades with moderate support: posterior probability (PP; Bayesian tree) > 85% and bootstrap value (BV; ML tree) > 50%. Most of the 119 proteins included in these clades share the same Pfam domains (http://pfam.sanger.ac.uk/). Based on these data, we also propose an additional group of nine related bZIP proteins that share a set of four distinct domains (Fig. S1). Beyond these eleven groups, 20 (13.5%) bZIP proteins remained ungrouped (Fig. S1).
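The group sizes reported above are internally consistent: the proteins in the ten known classes, the new nine-member group and the ungrouped proteins add up to the 148 GenBank sequences analyzed, and the ungrouped fraction rounds to 13.5%. A quick check using only the numbers given in the text:

```python
in_known_classes = 119  # proteins in the ten previously described classes
in_new_group = 9        # proteins in the additional proposed group
ungrouped = 20          # proteins not assigned to any of the 11 groups

total = in_known_classes + in_new_group + ungrouped
ungrouped_pct = 100 * ungrouped / total
print(total, f"{ungrouped_pct:.1f}%")
```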
After assigning the soybean bZIP proteins annotated in GenBank to 11 classes, we searched for candidate sequences within the group of bZIP proteins characterized as pathogen responsive (Fig. S1) that were also differentially expressed during Asian soybean rust (ASR). To identify the bZIP transcription factors involved in the response to Phakopsora pachyrhizi, we used data from subtractive libraries (inoculated vs. mock-inoculated plants) of the resistant soybean cultivar PI561356, deposited in the database of the GENOSOJA consortium (http://bioinfo.cnpso.embrapa.br/genosoja/; Benko-Iseppon et al. 2012). We verified the differential expression of several members of plant transcription factor families involved in plant defense, such as MYB, WRKY, AP2/ERF, NAC and bZIP, during ASR infection in PI561356 plants (unpublished data). Among the members of the bZIP family, we selected four soybean genes with high expression values (fragments per kilobase of exon per million fragments mapped, FPKM, above 500) for functional analysis (GenBank accession numbers ABI34659, NP_001237027, XP_003543312 and XP_003525005; highlighted in Fig. S1). Based on the phylogenetic analyses, two of the selected bZIP proteins grouped in class E and were used to analyze the response to infection by P. pachyrhizi (Fig. 1).
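FPKM normalizes a gene's fragment count by both exon length and sequencing depth, which is what makes a fixed cutoff such as 500 comparable across genes. A minimal sketch of the standard formula, FPKM = fragments × 10^9 / (exon length in bp × total mapped fragments), applied to hypothetical genes and counts (not data from this study):

```python
from typing import Dict, List, Tuple


def fpkm(gene_fragments: int, exon_length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments per kilobase of exon per million mapped fragments."""
    return gene_fragments * 1e9 / (exon_length_bp * total_mapped_fragments)


def above_threshold(genes: Dict[str, Tuple[int, int]], total_mapped: int,
                    threshold: float = 500.0) -> List[str]:
    """Names of genes whose FPKM is strictly above `threshold`."""
    return [name for name, (frags, length) in genes.items()
            if fpkm(frags, length, total_mapped) > threshold]


# Hypothetical library with 10 million mapped fragments:
# gene_a -> 7200 fragments on 1200 bp of exon; gene_b -> 3000 fragments on 1500 bp.
genes = {"gene_a": (7200, 1200), "gene_b": (3000, 1500)}
print(fpkm(7200, 1200, 10_000_000), fpkm(3000, 1500, 10_000_000))
print(above_threshold(genes, total_mapped=10_000_000))
```

With these illustrative counts gene_a reaches an FPKM of 600 and passes the cutoff, whereas gene_b (FPKM 200) does not, even though both come from the same library.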
Class E includes TabZIP (ACT66299), the only bZIP protein of this class that has been functionally characterized in the response to infection by a rust fungus, the wheat stripe rust pathogen Puccinia striiformis f. sp. tritici (Zhang et al. 2009). The selected soybean proteins (XP_003543312 and XP_003525005) are similar to Arabidopsis bZIP proteins of class E, which have not been assigned a defined biological function (Jakoby et al. 2002). We therefore selected these two proteins for functional studies and to analyze their expression during infection by P. pachyrhizi. Because of their similarity to Arabidopsis class E proteins (Jakoby et al. 2002), they were named GmbZIPE1 (XP_003543312) and GmbZIPE2 (XP_003525005).
The two other selected bZIP members (ABI34659 and NP_001237027) grouped in class C, which contains the highest number of bZIP transcription factors characterized as pathogen responsive (Fig. S1). ABI34659, identified in GenBank as GmbZIP105, and NP_001237027, identified as GmbZIP62, show strong similarity to pathogen-responsive members of class C (Fig. 1), such as G/HBF-1 and the soybean SBZ1 (Fig. S1; Jakoby et al. 2002).
GmbZIP62 was previously characterized as responsive to abiotic stress; its overexpression in Arabidopsis increased tolerance to drought, salinity and freezing (Liao et al. 2008). GmbZIP62 may therefore be a general stress-response factor in plants, including for stress caused by pathogens, a feature already described for other bZIP proteins (Lee et al. 2006; Orellana et al. 2010).
GmbZIP62 and GmbZIP105 (class C) and GmbZIPE1 and GmbZIPE2 (class E) were grouped in different clades, reflecting structural differences that may be accompanied by functional differences among these proteins (Fig. S1).
DISCUSSION
Phylogenetic analysis revealed the considerable structural and functional diversity of the bZIP family of transcription factors. Jakoby et al. (2002) proposed separating the Arabidopsis bZIPs into 10 classes or groups, and the same 10 groups were later used to classify soybean bZIP proteins according to their structural similarity to the Arabidopsis proteins (Liao et al. 2008). The phylogenetic analysis performed in this study revealed at least 11 classes of soybean bZIP proteins, which suggests that additional classes exist beyond those described by Jakoby et al. (2002). Twenty of the 148 soybean proteins analyzed were not assigned to any of the 11 groups, most likely because their complete sequences are lacking from GenBank or because their functional domains were not characterized as bZIP domains, although these proteins were predicted to belong to the bZIP family in a previous study (Liao et al. 2008).
A new class is also proposed, formed by proteins homologous to HB-PHD family proteins in Arabidopsis (Ariel et al. 2007). The homeobox (HB) domain is a conserved motif of about 60 amino acids found in transcription factors of all eukaryotic organisms; it folds into a three-helix structure that interacts specifically with the target DNA (Ariel et al. 2007). The PHD finger, a Cys4-His-Cys3 zinc finger, is found in many regulatory proteins of plants and animals and is often associated with transcriptional regulation through chromatin modification (Halbach et al. 2000). In homeodomain transcription factors, the PHD finger is combined with an upstream leucine zipper (Halbach et al. 2000). Together, these domains form a highly conserved region of 180 amino acids called ZIP/PHDf, and transcriptional activation by the PHD finger has been shown to be inhibited within this longer region (Halbach et al. 2000). Interestingly, little is known about the region proximal to the basic leucine zipper domain, and these proteins have not been described in the literature as containing a bZIP domain. In addition to the domains described for HB-PHD proteins, the nine proteins of this group also contain MEKHLA domains (named after the conserved amino acid sequence MEKHLA), which are similar to the PAS domain (Per, Arnt, Sim). In eukaryotes, PAS domains act as signal sensors in signaling pathways (Dunham et al. 2003). The proteins of this group also contain a START (StAR-related lipid-transfer) domain, which has a lipid-binding function and can bind cholesterol, phospholipids and sphingolipids (Ponting and Aravind 1999), suggesting that these proteins may be anchored to the cell membrane and act as signal receptors in plant cells.
Analysis of the protein domain composition of each soybean bZIP protein revealed structural diversity among the monophyletic groups (Fig. S1). Members of groups A, C, E, F, I and S contain signature basic leucine zipper (bZIP) domains, and the domains identified as bZIP domains differ structurally among the classes proposed by Jakoby et al. (2002). For example, the bZIP_C domain, found in 19 of the 20 members of group C, differs from the bZIP_1 and bZIP_2 domains by having an extended leucine zipper of up to nine heptad repeats (Jakoby et al. 2002). Unlike the members of these groups, members of the other groups have several distinct domains that reflect their functional roles. Most members of group D (17) also have a DOG1 (Delay of germination 1) domain, which is related to the control of seed development (Jakoby et al. 2002), while two members have an HSF (heat shock factor) domain, a DNA-binding domain that recognizes heat shock promoter elements (Clos et al. 1990). Seven members of group G have an MFMR (multifunctional mosaic region) domain, which plays a crucial role in transcriptional activation (Jakoby et al. 2002). Five members of group H have a RING-type zinc finger domain that most likely functions in protein-protein interactions (Halbach et al. 2000). Of the 20 proteins not assigned to any of the 11 monophyletic groups, 16 have domains of unknown function (DUF630, DUF632 and DUF1664) that are structurally similar to the bZIP domain, while the other 3 proteins exhibited unrelated domains (Fig. S1). Ten of the grouped proteins showed no relevant domains (Fig. S1).
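A leucine zipper is built from heptad repeats (seven-residue units) with a leucine recurring at the same position of each unit, so the length of a zipper can be expressed as the number of consecutive heptads. The sketch below is a deliberately simplified illustration on a synthetic sequence, not how Pfam assigns bZIP_C or other domains (Pfam uses profile hidden Markov models, and real zippers tolerate substitutions): it reports the longest run of consecutive heptads that keep a leucine at the same position.

```python
def longest_heptad_leucine_run(seq: str, residue: str = "L") -> int:
    """Longest run of consecutive heptads with `residue` at the same heptad position."""
    best = 0
    for offset in range(7):                # try each of the seven heptad registers
        run = 0
        for i in range(offset, len(seq), 7):
            run = run + 1 if seq[i] == residue else 0
            best = max(best, run)
    return best


zipper = "LKEKVEE" * 9                      # synthetic: nine heptads, leucine first in each
broken = zipper[:28] + "A" + zipper[29:]    # leucine of the fifth heptad replaced by alanine
print(longest_heptad_leucine_run(zipper), longest_heptad_leucine_run(broken))
```

A single substitution in the middle of the synthetic zipper splits the nine-heptad run into two runs of four, which is why this strict count underestimates real, more tolerant zippers.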
Fig. S1 Phylogeny of soybean bZIP proteins. The majority-rule consensus tree was obtained by Bayesian MCMC coalescent analysis of 148 sequences of bZIP proteins (see Materials and Methods above). The posterior probability values (PP), expressed as probabilities and calculated from the best trees found by MrBayes, are shown beside each node. The second value (underlined) corresponds to the bootstrap values (BV), also expressed as probabilities, that support the clusters in the maximum likelihood tree. The proteins selected for study are shown in bold and highlighted in yellow (GmbZIPE1, XP_003543312; GmbZIPE2, XP_003525005; GmbZIP105, ABI34659; GmbZIP62, NP_001237027).
References
Ariel FD, Manavella PA, Dezar CA, Chan RL (2007) The true story of the HD-Zip family. Trends Plant Sci 12(9):419–426.
Benko-Iseppon AM, Nepomuceno AL, Abdelnoor RV (2012) GENOSOJA - the Brazilian soybean genome consortium: high throughput omics and beyond. Genet Mol Biol 35(1, Suppl 1):i–iv.
Clos J, Westwood JT, Becker PB, Wilson S, Lambert K, Wu C (1990) Molecular cloning and expression of a hexameric Drosophila heat shock factor subject to negative regulation. Cell 63(5):1085–1097.
Darriba D, Taboada GL, Doallo R, Posada D (2011) ProtTest 3: fast selection of best-fit models of protein evolution. Bioinformatics 27:1164–1165.
Dunham CM, Dioum EM, Tuckerman JR, Gonzalez G, Scott WG, Gilles-Gonzalez MA (2003) A distal arginine in oxygen-sensing heme-PAS domains is essential to ligand binding, signal transduction, and structure. Biochemistry 42(25):7701–7708.
Halbach T, Scheer N, Werr W (2000) Transcriptional activation by the PHD finger is inhibited through an adjacent leucine zipper that binds 14-3-3 proteins. Nucleic Acids Res 28(18):3542–3550.
Huelsenbeck JP, Ronquist F (2001) MRBAYES: Bayesian inference of phylogenetic trees. Bioinformatics 17:754–755.
Jakoby M et al (2002) bZIP transcription factors in Arabidopsis. Trends Plant Sci 7:106–111.
Katoh K et al (2002) MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res 30:3059–3066.
Lee SJ et al (2002) PPI1: a novel pathogen-induced basic region-leucine zipper (bZIP) transcription factor from pepper. Mol Plant Microbe Interact 15:540–548.
Liao Y et al (2008) Soybean GmbZIP44, GmbZIP62 and GmbZIP78 genes function as negative regulator of ABA signaling and confer salt and freezing tolerance in transgenic Arabidopsis. Planta 228:225–240.
Orellana S et al (2010) The transcription factor SlAREB1 confers drought, salt stress tolerance and regulates biotic and abiotic stress-related genes in tomato. Plant Cell Environ 33:2191–2208.
Ponting CP, Aravind L (1999) START: a lipid-binding domain in StAR, HD-ZIP and signalling proteins. Trends Biochem Sci 24(4):130–132.
Sukumaran J, Holder MT (2010) DendroPy: a Python library for phylogenetic computing. Bioinformatics 26:1569–1571.
Zhang Y et al (2009) Cloning and characterization of a bZIP transcription factor gene in wheat and its expression in response to stripe rust pathogen infection and abiotic stresses. Physiol Mol Plant Pathol 73:88–94.
Zwickl D (2006) Genetic algorithm approaches for the phylogenetic analysis of large biological sequence datasets under the maximum likelihood criterion. The University of Texas, Austin, TX.