HSV Research paper

Variation in conservation among different genes within
the Herpes Simplex Virus type 1, and its correlation
with function
Kerri Callahan & Samantha Nadeau
Molecular Evolution
Thursday, December 11, 2014
Abstract
An analysis of three functionally different HSV-1 genes reveals patterns of
purifying selection as well as correlations between gene function and variation. Gene
redundancy decreases conservation while vital functionality increases conservation. The
study examined HSV-1 due to its prevalence in American adults. Hopes for vaccinations
and treatments require an understanding of gene conservation and evolutionary patterns.
The gene products studied were a membrane protein (UL20), a tegument
protein (UL49), and a terminase enzyme (UL15; UL28; UL33). In this study, multiple
tests rejected the null hypothesis that dN=dS and supported the original hypothesis that
UL20 is less conserved than UL49. Findings regarding the UL15; UL28; UL33
complex are inconclusive because the assumption that the individual genes could be
grouped together proved to be incorrect.
Introduction:
Herpes Simplex Virus type 1, more commonly known as oral herpes, affects 90%
of American adults (1). Herpes Simplex Virus type 2, or genital herpes, affects 20% of
American Adults (1). Due to the prevalent nature of these viruses, researchers completed
numerous studies investigating both type 1 and type 2 of Herpes Simplex. Most of the
research compared Herpes Simplex type 1 to Herpes Simplex type 2 rather than just
focusing on one of the two viruses. Restricting the analysis to Herpes Simplex Virus
type 1, and to genes from different morphological regions of the virion, makes it possible
to compare the evolutionary conservation of those genes and to relate it to their
function. The three gene products compared in this study are a membrane
protein (encoded by UL20), a tegument protein (encoded by UL49), and a
terminase enzyme (encoded by UL15, UL28, and UL33). The hypothesis states that
UL20, which encodes the membrane protein, will be the least conserved because it
is believed to be less vital to the function of the virus than the other
genes: in addition to UL20, many glycoproteins and other membrane-associated
proteins perform similar, if not the same, functions (5).
Background:
Herpes Simplex Virus type 1 is a double-stranded DNA virus with a lytic life
cycle (2). After the initial lytic infection, the virus travels to the
spinal ganglia, where it establishes a latent infection and resides until reactivation occurs (3).
The virion consists of three major morphological structures, the envelope, the
tegument, and the nucleocapsid (4). Within the envelope, a lipid bilayer composing the
outermost surface of the virion, lies the tegument, which contains proteins that get
released into the cytoplasm of the infected cell (4). Tegument proteins are responsible for
the egress of virion progeny (4). Within the tegument, the nucleocapsid contains the
double stranded DNA necessary for HSV-1 function (4).
The Herpes Simplex Virus 1 encodes for over fifteen membrane proteins, one of
which is the UL20 gene product (5). The UL20 protein is an intrinsic membrane protein
involved in the distribution of virions into the extracellular space of the infected cell (5).
The UL20 gene product works with other membrane-associated proteins and
glycoproteins to transport the virions from the infected cell into the extracellular space
(5). This means that UL20 plays a role in virion transmission; however, virion
transmission requires other genes as well (5).
Within the virion, the tegument layer lies between the nucleocapsid and the
envelope (4). The VP22 protein is a major tegument protein encoded by the UL49
gene (4). This protein is needed for the redistribution of viral proteins from the nucleus to
the cytoplasm of infected cells (4). This redistribution is essential for the egress of virion progeny (4).
During replication of Herpes Simplex Virus type 1, concatemers accumulate in
the infected cell (6). In order for the progeny virions to assemble, the concatemers must
be cleaved by a terminase enzyme and packaged into capsids (6). UL15, UL28, and
UL33 protein subunits comprise the terminase enzyme in Herpes Simplex Virus type
1 (6). The subunits are assembled in the cytoplasm and, using a mechanism on the UL15
protein, the complex exits the cytoplasm and enters the nucleus (6). There,
UL28 binds the complex to the DNA packaging signal and, with the help of UL15 and
UL33, cleaves the concatemer and packages the genome into the capsid (6).
Prior research into the evolution of Herpes Simplex Virus shows UL15, UL28,
and UL33 to be highly conserved (7,8). Previously, studies on UL33, UL31, and UL34
(orthologs) indicated that highly conserved protein interactions may result whether or not
high sequence similarity exists between the genes. However, since literature stated
interactions between UL15, UL28, and UL33 existed in multiple different species, we
chose to group the three genes together for analysis under the assumption that they were
co-evolving to become more similar to one another (8).
Methods:
Partial genomes of HSV-1 strains CJ394, CJ360, CJ311, CJ970, OD4, and
TFT401 were imported in FASTA (text) format from the NCBI database (9) into MEGA
software version 6.06 (10). The locations of genes UL15, UL20, UL28, UL33, and UL49
were found on NCBI for each strain and used to crop each partial genome to the targeted
genes (9).
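The cropping step can be sketched in Python. This is only an illustration and not the procedure that was run in MEGA: the helper names `read_fasta` and `crop_gene`, the toy record, and the coordinates are hypothetical, and coordinates are assumed to be 1-based and inclusive.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a {header: sequence} dictionary."""
    records, header = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line.upper())
    return {name: "".join(parts) for name, parts in records.items()}


def crop_gene(genome, start, end):
    """Return the subsequence from start to end (1-based, inclusive)."""
    if not 1 <= start <= end <= len(genome):
        raise ValueError("coordinates fall outside the sequence")
    return genome[start - 1:end]


# Toy record; real input would be a partial genome downloaded from NCBI.
toy_fasta = """>toy_strain partial genome
ccatgaaaccc
tttgggtaacc
"""
genome = read_fasta(toy_fasta)["toy_strain partial genome"]
gene = crop_gene(genome, 3, 20)  # hypothetical gene coordinates
print(gene)
```

A gene annotated on the complementary strand would additionally need to be reverse-complemented before codon-based analysis.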
Sequence Alignments:
Three separate alignments, each containing six sequences, were made using
CLUSTALW. The first alignment consisted of the UL20 gene from each of the six
strains. The second alignment consisted of the UL49 gene from each of the six
strains. The third alignment consisted of two isolates of UL15,
one from strain CJ311 and one from strain OD4, two isolates of UL28, one from strain
CJ394 and one from strain CJ360, and two isolates of UL33, one from TFT401 and one
from CJ970.
Sequence Analysis:
The differences in synonymous and nonsynonymous mutations were tested using
the codon-based-z-test of selection. The three alternative hypotheses tested were
dN>dS, dN=/=dS, and dN<dS; the null hypothesis in each case was dN=dS. The
Nei-Gojobori (Proportion) method was used on all three sets of sequences and gaps were
treated using pairwise deletion. Selection at Codons was estimated via HyPhy using the
maximum likelihood method, standard genetic code table, and the Felsenstein 1981
model. Gaps were treated using complete deletion. Sums of dN values for all codons
and dS values for all codons were calculated and divided to give the dN/dS value. This
test was repeated for all three sequence sets. Pairwise distance estimates were computed
using the bootstrap method (500 replications) and Tamura-Nei model, with p-values of
less than 0.05 being considered as “significant”. Additionally, maximum likelihood trees
were constructed using the bootstrap test of phylogeny and the Tamura-Nei model for
each of the three sets of alignments. Branch length was measured in the number of
substitutions per site and included in the tree.
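The distinction between synonymous and nonsynonymous differences that underlies these tests can be illustrated with a short Python sketch. It is a simplification and not the Nei-Gojobori method itself, which also estimates the numbers of synonymous and nonsynonymous sites and averages over mutational pathways for codons that differ at more than one position. The function name `classify_differences` and the toy sequences are hypothetical.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_differences(seq_a, seq_b):
    """Count synonymous, nonsynonymous, and unresolved codon differences.

    Codons containing gaps or ambiguous bases are skipped for this pair only
    (pairwise deletion). Codons differing at more than one position are
    reported as unresolved rather than split across mutational pathways.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    syn = nonsyn = unresolved = 0
    for i in range(0, len(seq_a) - len(seq_a) % 3, 3):
        a, b = seq_a[i:i + 3].upper(), seq_b[i:i + 3].upper()
        if a not in CODON_TABLE or b not in CODON_TABLE or a == b:
            continue
        n_changes = sum(x != y for x, y in zip(a, b))
        if n_changes > 1:
            unresolved += 1
        elif CODON_TABLE[a] == CODON_TABLE[b]:
            syn += 1
        else:
            nonsyn += 1
    return syn, nonsyn, unresolved


# Toy aligned pair: TTT->TTC is synonymous (Phe), GGA->GCA is not (Gly->Ala),
# and the gapped codon is ignored.
counts = classify_differences("ATGTTTGGA---", "ATGTTCGCAAAA")
print(counts)
```

For the toy pair the function reports one synonymous difference, one nonsynonymous difference, and no unresolved codons.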
Results:
UL20:
The codon-based-z-test of selection resulted in a probability of 1.0 and a
Z-statistic of -2.572 for HA: dN>dS, a probability of 0.003 and a Z-statistic of -3.017 for HA:
dN=/=dS, and a probability of 0.004 and a Z-statistic of 2.669 for HA: dN<dS (Table 1).
Selection at Codons was estimated using HyPhy and resulted in a dN/dS value of 0.749
(Table 2). Of the 15 p-values reported from the pairwise distance analysis, 11 were
significant in rejecting the null hypothesis that dN=dS (Table 3). The maximum
likelihood tree that was generated showed two general clades, one consisting of strains
OD4 and CJ970 and one consisting of strains CJ394, CJ360, CJ311, and TFT401. The
bootstrap value for the latter clade was 100 whereas CJ970 was condensed. The branch
lengths for all clades were less than 0.1, with all except two being less than 0.01 (Figure
1). The average branch length was 0.00949.
UL15+UL28+UL33:
The codon-based-z-test of selection for the UL15, 28, 33 group computed a
probability of 1.0 and a Z-statistic of -1.895 for HA: dN>dS, a probability of 0.080 and a
Z-statistic of -1.764 for HA: dN=/=dS, and a probability of 0.007 and a Z-statistic of 1.798
for HA: dN<dS (Table 1). Selection at codons estimated via HyPhy gave a dN/dS value
of 0.972 (Table 2). Three of the 15 p-values from the pairwise distance analysis were
significant to reject the null hypothesis that dN=dS (Table 3). Three clades were clear in
the maximum likelihood tree. The first had a bootstrap value of 100 and consisted of
strains CJ394 and CJ360. The second had a bootstrap value of 99 and consisted of
TFT401 and CJ970. These two clades originated from a common ancestor. The third
clade had a bootstrap value of 100 and strains CJ311 and OD4. The average branch
length was 0.185 for the entire tree, 0.008 for the UL28 clade, 0.0 for the UL33 clade,
and 0.0025 for the UL15 clade (Figure 2).
UL49:
The codon-based-z-test of selection computed an overall probability of 1.0 and a
Z-statistic of -1.794 for HA: dN>dS, a probability of 0.078 and a Z-statistic of -7.8 for HA:
dN=/=dS, and a probability of 0.043 and a Z-statistic of 1.73 for HA: dN<dS for the UL49
gene (Table 1). The dN/dS value estimated using selection at codons via
HyPhy was 0.306 (Table 2). All of the p-values from the pairwise distance analysis were
significant to reject the null hypothesis that dN=dS (Table 3). The maximum likelihood
tree showed one common ancestor for strain CJ360, OD4, and the group of strains
TFT401, CJ970, CJ394, and CJ311 which had a bootstrap value of 70. Branch lengths
for all clades were all between 0.0 and 0.0035 (Figure 3). The average branch length was
0.0011.
Patterns:
The z-statistics for all codon-based-z-tests of selection using the alternative
hypotheses that dN>dS and dN=/=dS were negative, indicating that there were more
synonymous mutations than nonsynonymous mutations (Table 1). The probabilities were
all greater than 0.05, except for UL20, HA: dN=/=dS, meaning that they were not
significant at rejecting the null hypothesis of neutral selection (Table 1). The
codon-based-z-tests of selection for the alternative hypothesis that dN<dS all have positive
Z-statistic values and probability values less than 0.05 (Table 1). This indicates that there are
more synonymous mutations than non-synonymous and allows for the rejection of the
null hypothesis that dN=dS for all genes. The dN/dS value was less than one for all three
sets indicating purifying selection. UL49 had the lowest value (closest to zero), and the
UL15, 28, 33 complex had the highest value (closest to one). The branch lengths were
generally different for all three trees, with the UL49 tree having the shortest average
branch length. The UL15, 28, 33 complex had the highest overall average branch length,
but when broken down by gene all branch length averages (for UL15, UL28, and UL33
individually) were lower than the average branch length for the UL20 gene (Figure 1, 2,
3).
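The average branch lengths quoted for the trees can be checked with a few lines of Python. The lists hold the branch labels printed on Figures 2 and 3; treating every printed label as one branch of equal weight is an assumption, and the variable names are illustrative.

```python
from statistics import mean

# Branch lengths (substitutions per site) as printed on the trees.
figure_2_lengths = [0.0082, 0.6869, 0.0401, 0.0084, 0.3706,
                    0.0000, 0.0000, 0.0000, 0.7299, 0.0054]
figure_3_lengths = [0.0000, 0.0018, 0.0000, 0.0018,
                    0.0000, 0.0017, 0.0000, 0.0035]

grouped_mean = mean(figure_2_lengths)  # UL15; UL28; UL33 grouped tree
ul49_mean = mean(figure_3_lengths)     # UL49 tree
print(round(grouped_mean, 3), round(ul49_mean, 4))
```

Under that assumption the means round to 0.185 for the grouped UL15; UL28; UL33 tree and 0.0011 for the UL49 tree, in agreement with the averages reported in the Results.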
Gene                 HA: dN>dS          HA: dN=/=dS        HA: dN<dS
                     Z-stat   Prob      Z-stat   Prob      Z-stat   Prob
UL20                 -2.572   1.0       -3.017   0.003     2.669    0.004
UL15; UL28; UL33     -1.895   1.0       -1.764   0.080     1.798    0.007
UL49                 -1.794   1.0       -7.80    0.078     1.73     0.043
Table 1. Codon-Based-Z-Test of Selection: Statistic values (dN-dS, dN-dS, and dS-dN, respectively), and probability
values for each of three hypotheses. Probabilities <0.05 are significant in rejecting the null hypothesis that dN=dS
and highlighted in purple.
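The relationship between a Z-statistic and its probability can be sketched as follows, assuming the probabilities are tail areas of a standard normal distribution. That assumption is made here for illustration and is not stated in the text; the function names are hypothetical. The UL20 row of Table 1 is used as input.

```python
from math import erfc, sqrt


def upper_tail(z):
    """P(Z >= z) for a standard normal variable."""
    return 0.5 * erfc(z / sqrt(2))


def two_sided(z):
    """P(|Z| >= |z|) for a standard normal variable."""
    return erfc(abs(z) / sqrt(2))


# UL20 statistics from Table 1.
p_positive = upper_tail(-2.572)   # HA: dN>dS, statistic is dN-dS
p_unequal = two_sided(-3.017)     # HA: dN=/=dS, statistic is dN-dS
p_purifying = upper_tail(2.669)   # HA: dN<dS, statistic is dS-dN
print(round(p_positive, 3), round(p_unequal, 3), round(p_purifying, 3))
```

With these inputs the two-sided and dN<dS probabilities round to 0.003 and 0.004, matching the UL20 entries in Table 1. The dN>dS case gives about 0.995 under the normal approximation, which the table reports as 1.0.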
Gene                        dN/dS value
UL20                        0.749
UL15; UL28; UL33 complex    0.972
UL49                        0.306
Table 2. dN/dS Values Based on HyPhy Testing: UL49 has the lowest dN/dS value while the UL15; UL28; UL33
complex has the highest. The dN/dS values are less than one for all of the genes.
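The ratio in Table 2 is described in the Methods as the sum of per-codon dN values divided by the sum of per-codon dS values. A minimal Python sketch of that arithmetic follows. The per-codon numbers are hypothetical and are not values from this study; the function names and the three-way reading of the ratio are illustrative conventions.

```python
def dn_ds_ratio(per_codon):
    """Divide the summed per-codon dN by the summed per-codon dS."""
    total_dn = sum(dn for dn, _ in per_codon)
    total_ds = sum(ds for _, ds in per_codon)
    if total_ds == 0:
        raise ZeroDivisionError("no synonymous change observed; ratio undefined")
    return total_dn / total_ds


def interpret(ratio, tolerance=1e-9):
    """Conventional reading of dN/dS relative to one."""
    if ratio < 1 - tolerance:
        return "purifying"
    if ratio > 1 + tolerance:
        return "positive"
    return "neutral"


# Hypothetical per-codon (dN, dS) estimates; not values from this study.
toy_codons = [(0.0, 1.5), (0.5, 0.0), (0.25, 1.0), (0.0, 0.0)]
ratio = dn_ds_ratio(toy_codons)
print(ratio, interpret(ratio))

# Ratios reported in Table 2.
table_2 = {"UL20": 0.749, "UL15; UL28; UL33": 0.972, "UL49": 0.306}
most_conserved = min(table_2, key=table_2.get)
print(most_conserved, {gene: interpret(value) for gene, value in table_2.items()})
```

Summing before dividing avoids undefined per-codon ratios at codons where no synonymous change was observed.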
Table 3. Estimates of Evolutionary Divergence between Sequences: pairwise distances and number of base
substitutions per site between sequences are shown. Codon positions included 1st, 2nd, and noncoding. All
ambiguous positions were deleted. Significant p-values are highlighted in yellow.
[Figure 1: maximum likelihood tree (image not reproduced). Tip labels: Human herpesvirus 1 strains CJ394, CJ360, CJ311, TFT401, CJ970, and OD4, partial genomes. Branch labels as printed, in extraction order: 0.0000, 0.0023, 0.0000, 0.0045, 0.0000, 0.0348, 0.0022, 0.0369, 0.0047, 0.0000.]
Figure 1. Maximum Likelihood Tree of Molecular Phylogenetic Analysis for the UL20 Gene: Evolutionary history of
the UL20 Gene inferred via Maximum Likelihood method. Branch lengths shown next to branches were measured
as the number of substitutions per site. Codon positions included 1st, 2nd, and non-coding.
[Figure 2: maximum likelihood tree (image not reproduced). Tip labels: Human herpesvirus 1 strains CJ394, CJ360, TFT401, CJ970, CJ311, and OD4, partial genomes. Branch labels as printed, in extraction order: 0.0082, 0.6869, 0.0401, 0.0084, 0.3706, 0.0000, 0.0000, 0.0000, 0.7299, 0.0054.]
Figure 2. Maximum Likelihood Tree Based on Molecular Phylogenetic Analysis for UL15; UL28; UL33 Gene Complex:
Evolutionary history of the UL15; UL28; UL33 Gene Complex inferred via Maximum Likelihood method. Branch
lengths shown next to branches were measured as the number of substitutions per site. Codon positions included
1st, 2nd, and non-coding.
[Figure 3: maximum likelihood tree (image not reproduced). Tip labels: Human herpesvirus 1 strains TFT401, CJ970, CJ394, CJ311, OD4, and CJ360, partial genomes. Branch labels as printed, in extraction order: 0.0000, 0.0018, 0.0000, 0.0018, 0.0000, 0.0017, 0.0000, 0.0035.]
Figure 3. Maximum Likelihood Tree Based on Molecular Phylogenetic Analysis for the UL49 Gene:
Evolutionary history of the UL49 Gene inferred via Maximum Likelihood method. Branch lengths shown next to
branches were measured as the number of substitutions per site. Codon positions included 1st, 2nd, and non-coding.
Discussion:
According to values computed through HyPhy, the gene correlating with the
smallest dN/dS value was UL49, signifying that UL49 is the most highly conserved out
of the three genes studied (Table 2). This supports part of the hypothesis, stating that the
UL49 gene will be most highly conserved.
In comparison, the gene correlating with the largest dN/dS value was the UL15;
UL28; UL33 complex, signifying that this group of genes is least highly conserved
(Table 2). This gene complex makes up a terminase enzyme needed for the cleavage and
packaging of the viral DNA into capsids. It was hypothesized that this gene complex
would be highly conserved. The large dN/dS value correlating with this gene complex
indicates however that it is not highly conserved.
The hypothesis also states that the UL20 gene will be least conserved since
literature suggests viral function depends less on this gene than on the others. This,
however, was not the case when comparing the dN/dS values computed via HyPhy. The
UL20 gene had a dN/dS value of 0.749, while the UL15; UL28; UL33 gene complex had a
dN/dS value of 0.972 (Table 2). Although UL20 has a higher dN/dS value than UL49,
its value is still smaller than that of the UL15; UL28; UL33 gene complex. There are
several possible explanations.
One reason may be that virion transmission is more vital to virus function
than DNA encapsidation. However, virion transmission involves genes besides UL20.
Ward et al., in a study on the function of UL20, infected cells with either a UL20- virus
(HSV-1 lacking the UL20 gene), or a Wild Type virus (HSV-1 containing the UL20
gene) (5). Cells infected with the UL20- virus accumulated many virions in between the
nuclear and plasma membrane, but lacked virions in the extracellular space (5). This was
in comparison to cells infected with the Wild Type virus, which showed virion occupancy
between the nuclear and plasma membrane, and within the extracellular space (5). This
indicates that although there are other proteins that perform similar or the same function,
virion egress into the extracellular space still requires UL20. UL20 is therefore necessary
for virus transmission, which would give the gene a lower evolutionary rate than
expected.
Another possible explanation as to why the UL15; UL28; UL33 gene complex is
less conserved than UL20 could be due to the variety of genes within the complex.
Although these three genes work together to encapsulate the DNA, they do have separate
functions within the terminase enzyme. When analyzing the maximum likelihood tree, it
is clear to see that three separate clades were established, each representing a different
gene within the complex. The branch length leading to each individual clade is long, but
the branches composing the clade are short. Each clade correlates with the two strains
used for each individual gene (Figure 2). This means that the strains for each gene are
highly conserved within themselves, even if they are less conserved as a complex. In
other words, it seems as though comparing the different genes within this complex
against each other made for inconclusive results. In addition, if the individual genes were
compared only against themselves, the dN/dS value for each of these genes would likely be
lower than the dN/dS value for UL20.
Similarly, the assumption that the UL15; UL28; UL33 gene complex would be
conserved because the products of all three genes work together to form the terminase
enzyme was not supported by the pairwise distance matrices. Only three out of fifteen
p-values for the UL15; UL28; UL33 gene complex matrix were able to reject the null hypothesis that
dN=dS (Table 3). This means that either all of these genes are under neutral selection
(unlikely) or the assumption is fallible (likely). The assumption was most likely fallible
because although the three gene products make up the same enzyme, they are responsible
for different functions within that enzyme. The pairwise distance matrices supported this
theory since the only significant values were the three instances where each individual
gene (UL15, UL28, and UL33) was compared against a different strain of itself (Table 3).
Additionally, tips composed of two strains of the same gene had short branch distances,
whereas branch lengths separating each of the three sets of clades had large branch
distances (Figure 2). The combination of the pairwise matrices and maximum likelihood
trees suggests that the genes vary too greatly in composition to be considered as one
complex. They should instead be treated as individual genes and compared against one
another by isolated analysis rather than grouped.
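The count of comparisons in the grouped matrix can be made explicit with a short Python sketch. The gene and strain assignments follow the description of the third alignment in the Methods; the variable names are illustrative.

```python
from itertools import combinations

# Sequences in the grouped alignment, as described in the Methods.
grouped = [
    ("UL15", "CJ311"), ("UL15", "OD4"),
    ("UL28", "CJ394"), ("UL28", "CJ360"),
    ("UL33", "TFT401"), ("UL33", "CJ970"),
]

pairs = list(combinations(grouped, 2))
same_gene = [(a, b) for a, b in pairs if a[0] == b[0]]
different_gene = [(a, b) for a, b in pairs if a[0] != b[0]]
print(len(pairs), len(same_gene), len(different_gene))
```

Six sequences give fifteen pairwise comparisons, of which only three compare a gene with another strain's copy of the same gene. The remaining twelve compare different genes, which is consistent with only three significant p-values in the grouped matrix.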
On the other hand, both UL20 and UL49 provide great support against the null
hypothesis that dN=dS. The codon-based-z-test of selection for HA: dN>dS resulted in
negative values and probabilities of 1.0, suggesting that the test was too restrictive but
also that the number of synonymous mutations was greater than the number of nonsynonymous mutations (Table 1). After changing the alternative hypothesis to test for
neutral selection (HA: dN=/=dS), the only gene with a significant probability to reject the
null was UL20, and again all values were negative (Table 1). Finally, the alternative
hypothesis was changed to dN<dS, and all values were positive with a significant
probability (Table 1). This allows the null hypothesis of neutral selection to be
rejected in favor of the alternative hypothesis that all genes are under purifying selection.
All p-values computed via pairwise distance for UL49 are statistically significant in
rejecting the null (Table 3). This suggests a higher confidence in the hypothesis that
the UL49 gene is under selection. Because the dN/dS value was 0.306, the hypothesis
that the UL49 gene is under purifying selection is supported (Table 2). The combination
of the unanimous p-values and low dN/dS value strongly supports the main hypothesis
that the UL49 gene is most highly conserved. Similarly, the dN/dS value for the UL20
gene is less than one (0.749), suggesting purifying selection (Table 2). However, since
this number is closer to one than the dN/dS value for UL49 and not all p-values for UL20
reject the null hypothesis, there is less confidence that the UL20 gene is under selection
and so the hypothesis that UL20 is less highly conserved than UL49 is supported. This is
also supported by the UL49 gene having the lowest dN/dS value (Table 2). In
conclusion, our hypotheses that UL49 would be highly conserved and UL20 would not be
as highly conserved are supported, but our hypothesis regarding the conservation of the
UL15; UL28; UL33 gene complex is inconclusive.
References:
1. Ehrlich SD. 2013. Herpes simplex virus. Complementary and Alternative
Medicine Guide. University of Maryland Medical Center, Baltimore, MD.
[Online.] http://umm.edu/health/medical/altmed/condition/herpes-simplex-virus
2. Jenkins FJ, Turner SL. 1996. Herpes simplex virus: a tool for neuroscientists.
Frontiers in Bioscience. 1:241-247
3. Spear PG. 2004. Herpes simplex virus: receptors and ligands for cell entry. Cell
Microbiol. 6(5):401-410
4. Tanaka M, Kato A, Satoh Y, Ide T, Sagou K, Kimura K, Hasegawa H,
Kawaguchi Y. 2012. Herpes simplex virus 1 VP22 regulates translocation of
multiple viral and cellular proteins and promotes neurovirulence. Journal of
Virology. 86:5264-5277
5. Ward PL, Campadelli-Fiume G, Avitabile E, Roizman B. 1994. Localization
and putative function of the UL20 membrane protein in cells infected with Herpes
Simplex Virus 1. Journal of Virology. 68:7406-7417
6. Higgs MR, Preston VG, Stow NG. 2008. The UL15 protein of herpes simplex
virus type 1 is necessary for the localization of the UL28 and UL33 proteins to
viral DNA replication centres. Journal of General Virology. 89:1709-1715
7. Brown, J. December 2004. Effect of gene location on the evolutionary rate of
amino acid substitutions in herpes simplex virus proteins. Virology. 330: 209-220.
[Online.] http://www.sciencedirect.com/science/article/pii/S0042682204006233
8. Fossum, E., Friedel, C.D., Rajagopala, S.V, Titz, B., Baiker, A., Schmidt, T.,
Kraus, T., Stellberger, T, Rutenberg, C., Suthram, S., Bandyopadhyay, S.,
Rose, D., Von Brunn, A., Uhlmann, M., Zeretzke, C., Dong, Y., Boulet, H.,
Koegl, M., Bailer, S.M., Koszinowski, U., Ideker, T., Uetz, P., Zimmer, R., &
Haas, J. September 2009. Evolutionarily conserved herpesviral protein
interaction networks. PLOS Pathogens. [Online.]
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2731838/#
9. Kolb AW, Adams M, Cabot EL, Craven M, Brandt CR. 2011. Human
herpesvirus 1 strains CJ360, CJ311, OD4, TFT401, CJ970, CJ394 partial
genomes. [Online.] http://www.ncbi.nlm.nih.gov
10. Tamura K., Stecher G., Peterson D., Filipski A., and Kumar S. 2013.
MEGA6: Molecular Evolutionary Genetics Analysis version 6.0. Molecular
Biology and Evolution 30: 2725-2729.