Molecular Characterization of Neuraminidase Influenzavirus B from Egypt, 1999-2008.

The American University in Cairo
School of Sciences and Engineering
Molecular Characterization of Neuraminidase
Surface Proteins of Influenzavirus B isolates
from Egypt, 1999-2008.
Biotechnology Department Masters Program
Nadean Safwan Khedr
Bachelor of Science
Under the supervision of Dr. Rania Siam and Dr. Anne M. Gaynor
Khedr, 1
ACKNOWLEDGEMENTS
The work for this thesis was performed at US Naval Medical Research Unit No. 3, Cairo
(NAMRU-3), in coordination with the Biotechnology Department of the American University in
Cairo. I would like to thank Dr. Rania Siam (AUC advisor and professor) for giving me the
chance to be part of this research project and for guiding me. I would also like to thank
Dr. Anne M. Gaynor (NAMRU-3 advisor) and Dr. Claire Corenlius for helping me with my practical
lab work, providing me with laboratory space, and making it possible for me to complete my
work at NAMRU-3. I would like to thank the Egyptian Ministry of Health and Population, which
is responsible for all the collection sites. I would also like to express my gratitude to
Mr. Diaa El-Din, who single-handedly taught me everything I needed to know to complete my
laboratory research. I would like to thank Mr. Sherif El Fayoumy for his input on my molecular
characterization analysis, and finally I would like to thank Ms. Yasmine Mustafa for helping
me with all the molecular characterization and computational analysis for this research.
ABSTRACT
Influenzaviruses affect populations worldwide, posing a particular threat to at-risk groups
such as the elderly, children, pregnant women, and those with underlying respiratory
ailments. Vaccination is an effective tool to prevent infection with influenzaviruses. However,
because influenzaviruses change rapidly through antigenic drift and shift, vaccines
must be developed every year in an effort to control epidemics. The mutations that affect
vaccine development occur mainly in the virus’s surface proteins, hemagglutinin and
neuraminidase. It is important to study the sites and rates of mutation of influenzaviruses to
understand how the virus is evolving.
Influenzaviruses have been isolated at NAMRU-3 from samples collected through surveillance
projects in Egypt over the last ten years. These viruses are frequently sequenced to analyze
three important genes: those encoding the two surface proteins, HA (hemagglutinin) and NA
(neuraminidase), and M (matrix), as they provide the vast majority of the information that
informs vaccine development and antiviral treatment. This study focuses on sequencing and
analysis of the neuraminidase gene of influenzavirus B isolates. These data will also allow us
to examine the circulation patterns of influenzavirus B and to compare them with the data
acquired from a similar HA sequencing effort.
TABLE OF CONTENTS
LIST OF FIGURES……………………………………………………………………………...6
LIST OF TABLES……………………………………………………………………………….7
INTRODUCTION……………………………………………………………………………….8
Classification, Structure, and Subtypes…………………………………………………………..8
Virus Life Cycle……………………..……………………………………………………………9
Role of Neuraminidase…………………………….………………………………………….…10
Clinical Disease and Epidemiology ……………………………………………………….…….11
Vaccines and Mutation…………………………………………………………………………..11
INFLUENZAVIRUS STUDIES………………………………………………………………..13
Influenza B Lineages…………………………………………………………………………….13
Objectives………………………………………………………………………………….…….15
MATERIALS AND METHODS……………………………………………….……………….17
Sample collection………………………………………………………………………………...17
Virus Isolation…….……………………………………………………………………………...17
Molecular Testing of Virus Isolates………………..…………………………………………….18
Phylogenetic Analysis………………………………………………………………………........19
Entropy Analysis……………………………………………………………………………........20
RESULTS…………………………………………………………………………………..……21
Sample Selection and sequencing………………………………………………………………..21
Influenzavirus B Lineages……………………..……………………………………………….. 21
Phylogenetic Analysis………………………………..……………………………………….….22
Amino Acid Variation of NA Sequences…………………………………………………..…….23
Neuraminidase Glycosylation Sites………………………………………………………..…….25
Neuraminidase Antigen Binding Sites..………………………………………………………….25
Evolutionary Substitution Rates……………………………………………………………..…..26
DISCUSSION……………………………………………………………………………….......28
CONCLUSION…………………………………………………………………………….........34
REFERENCES…………………………………………………………………………………..35
TABLES AND FIGURES……………………………………………………………………….38
Figure legends……………………………………………………………………………………38
Figures……………………………………………………………………………………………40
Tables…………………………………………………………………………………………….48
LIST OF FIGURES
Figure 1 – Map of Egypt labelled with affiliated hospitals……………………………………...40
Figure 2 – Prevalence of Yamagata-like isolates and Victoria-like isolates by year……………42
Figure 3 –Phylogenetic tree of influenzavirus B neuraminidase gene….……………………….43
Figure 4 – Circular representation of NA phylogeny color-coded according to distribution from
1998-2008………………………………………………………………………………………..44
Figure 5 – Circular representation of NA phylogeny color-coded by collection site…..………..45
Figure 6 - Entropy H(x) Plot of the NA gene translated sequences……………………...............46
Figure 7 – Rate of Nucleotide Substitutions in Neuraminidase gene…...……………………….47
LIST OF TABLES
Table 1 – Influenzavirus segments 1-8……………………………………………………….…48
Table 2 – Comparison of Amino Acid Sequences of the Yamagata-like isolates with
B/Yamagata/88………………………………………………………………………………….49
Table 3 – Comparison of Amino Acid sequences of the Victoria-like Isolates with B/Hong
Kong/548/2000, B/Brisbane/32/2002, and B/Johannesburg/69/2001…………………………..50
Table 4 –Glycosylation motifs in NA…………………………………………………...……...53
Table 5 - Non-zero entropy positions in correlation with the antigenic sites of
Neuraminidase…………………………………………………………………………………..54
Table 6 - Zero Entropy Positions in Correlation with Receptor Binding Sites………………….55
Table 7 – List of Abbreviations………………………………………………………………….56
INTRODUCTION
Seasonal influenza is an acute viral infection caused by influenzaviruses, of which there
are three types: A, B, and C. Together they contribute to significant morbidity and mortality
worldwide. The viruses are well adapted to their human hosts and are able to mutate quickly to
overcome the host immune system. To better understand the circulation of influenzaviruses, the
WHO runs a global campaign called the “Global Influenza Surveillance and Response System”,
which was established in 1952 and comprises over 136 National Influenza Centers positioned
worldwide. As part of this system, Egypt, through both the Ministry of Health and NAMRU-3,
has participated in influenza surveillance for more than ten years.
Classification, Structure and Subtypes
Influenzaviruses are segmented negative-strand RNA viruses belonging to the viral family
Orthomyxoviridae. There are three genera of influenzaviruses: Influenzavirus A, Influenzavirus
B, and Influenzavirus C. Influenzavirus A is divided into subtypes based on the makeup of its
two surface proteins HA (hemagglutinin) and NA (neuraminidase), e.g. H1N1, H3N2 and H5N1.
In total there are 16 subtypes of HA and 9 subtypes of NA used to classify Influenzavirus A.
Influenzavirus B has the same 8 segments as Influenzavirus A but is not usually subtyped,
though it is described as belonging to either the Victoria or the Yamagata lineage.
The 8 genomic segments of influenzaviruses A and B each encode 1-2 proteins
and are coated with nucleocapsid (N) (Table 1); collectively they are approximately 13.6 kb
in size [1]. The virions are pleomorphic, though they are most commonly spherical, and are on
average 100 nm in diameter [2].
Virus Life Cycle
The influenzavirus undergoes a lytic infectious cycle with five stages: attachment,
penetration, replication, maturation, and lysis of the cell [2]. Influenzaviruses have
approximately five hundred surface projections of HA and NA proteins. These projections enable
the virus to bind to the host cell and be engulfed by the cell membrane. Hemagglutinin binds
its receptor, sialic acid, on the host cell surface and is crucial to the virus’s ability to
fuse with the host’s membrane to infect the cell [1]. Neuraminidase is responsible for cleaving
the sialic acid from the hemagglutinin glycoprotein to allow entry. The other major protein
involved in entry is M2, the matrix protein that forms an ion channel and helps to increase the
permeability of the virus core to ions. This protein and its function are crucial to the
influenzavirus’s ability to attach to and penetrate the membrane of the epithelial cells of the
respiratory tract [3]. The M2 ion channel protein has a cytoplasmic tail that interacts with
the M1 protein, which allows the virus to undergo assembly and budding [4].
The virus manipulates the cellular environment, replicating itself through a combination of
hijacked cellular machinery and viral proteins. The polymerase B2 protein (PB2),
polymerase B1 protein (PB1), and polymerase A protein (PA) form an RNA-dependent
RNA polymerase that is responsible for both replication and transcription of viral RNA [5]. This
polymerase lacks proofreading activity, resulting in a high frequency of point mutations.
NS1 and NS2 aid the virus’s ability to penetrate and replicate by manipulating host pathways
and machinery. The viral genome is replicated into two types of RNA: positive-sense RNA
(+RNA) and negative-sense viral RNA (-RNA, vRNA). The +RNA is used as a template for the
synthesis of more vRNA [6]. The newly synthesized vRNA assembles with the nucleoprotein and
other viral proteins and buds from the cell membrane, where the virus acquires its envelope.
The budding virus attaches to the cell again through interactions with HA, and NA is required
to cleave this interaction to generate free infectious virions [7].
Role of Neuraminidase
Neuraminidase is a surface glycoprotein that is required for the proper entry and
release of influenzaviruses during infection. It is a tetrameric protein with four identical
subunits, anchored to the virus through a 29 amino acid N-terminal tail [3, 8]. Neuraminidase
is a sialidase that liberates sialic acid from the host’s glycoconjugates, which in turn
destroys the receptors for the virus. It cleaves the α-ketosidic linkage between Neu5Ac
(N-acetylneuraminic acid, a sialic acid) and the neighboring saccharide, which is usually
galactose [9]. The active site of this enzymatic surface protein is in a pocket that is lined
with conserved amino acids. The neuraminidase gene is 1451 bp long, of which the first 21 base
pairs account for a non-binding protein [10, 11].
Studies have shown that the glycosylation of neuraminidase is vital to the virulence of the
influenzavirus. Li et al. found that four asparagine (Asn) residues are glycosylated, with the
most conserved one (among influenzavirus A and B) at position 146 of the protein [12]. This
glycosylation site is associated with a complex sugar that contains N-acetylgalactosamine,
which is not found at any of the other 3 Asn residues or in the hemagglutinin proteins [12].
Clinical Disease and Epidemiology
Influenza is a global disease that affects approximately 20% of children and 5% of
adults worldwide [13]. The northern hemisphere usually experiences seasonal influenzavirus
infections from October through March, with a peak in February [14]. Influenzavirus is usually
transmitted through airborne droplets, which makes the disease extremely contagious; its
progression is rapid because the replication cycle begins within six hours of penetration of
the mucosal cells [15]. The influenzavirus can cause a variety of symptoms such as a cough,
fever of 39-41 °C, and a sore throat. Influenza has a low mortality rate of 0.1%;
however, infection can be accompanied by secondary infections, for example with
Haemophilus influenzae [16], and can cause complications ranging from severe respiratory
syndromes and disorders affecting the lung, heart, brain, liver, kidneys, and muscles to
primary viral and secondary bacterial pneumonia [17]. Those most at risk of developing
complications from the influenzavirus are children aged 6-59 months, persons more than 50
years of age, pregnant women, hospitalized patients, immunocompromised patients, and persons
with severe malnutrition [14]. The disease places a significant economic burden on the
healthcare system, particularly during pandemics, when productivity can decline [17].
Vaccines and Mutation
The single most effective means of preventing influenzavirus infection is vaccination. To
develop a suitable vaccine, the WHO and its collaborating centers meet twice a year
to evaluate which viral strains are currently circulating and whether they have mutated from the
viruses that were in the previous version of the vaccine [14].
Influenzaviruses are known to mutate regularly in response to the host’s immune
response. This constant mutation results in small changes, is known as antigenic drift, and can
cause sporadic outbreaks and limited seasonal epidemics [18]. Influenzaviruses also have a
second mechanism to generate change, which is reassortment of their 8 segments. This can only
occur when two different viruses infect one host at the same time. A clear example is
the most recent influenza A H1N1 (2009), a virus created when at least two different
viruses shuffled their segments, resulting in a brand new virus that was able to cause a global
pandemic. This is known as antigenic shift: the viruses’ antigens are reassorted and
new strains begin to circulate within the population [7]. Constant study and surveillance of
influenzaviruses is necessary in order to anticipate the next major epidemic or pandemic and to
develop successful vaccines.
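The combinatorics behind reassortment can be illustrated with a short sketch. It assumes two co-infecting parental viruses (given the hypothetical labels "A" and "B") and free, independent assortment of the 8 segments, which is a simplification: real reassortment is constrained by segment compatibility. The segment names are the standard ones and are not taken from Table 1.

```python
from itertools import product

# Standard names for the 8 genome segments (an assumption; Table 1 is not reproduced here).
SEGMENTS = ["PB2", "PB1", "PA", "HA", "NP", "NA", "M", "NS"]


def reassortant_genotypes(parents=("A", "B"), segments=SEGMENTS):
    """Enumerate every way a progeny virus can inherit each segment from one parent."""
    return [
        dict(zip(segments, combo))
        for combo in product(parents, repeat=len(segments))
    ]


genotypes = reassortant_genotypes()
# A genotype is "novel" if it mixes segments from both parents.
novel = [g for g in genotypes if len(set(g.values())) > 1]

print(len(genotypes), "possible genotypes,", len(novel), "of them reassortants")
```

Under these assumptions two parents give 2^8 = 256 segment combinations, of which 254 differ from both parental genotypes.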
INFLUENZAVIRUS STUDIES
Influenzavirus B Lineages
According to previous molecular studies, circulating influenzavirus B strains can be divided
into two distinct phylogenetic lineages: the Victoria lineage (influenzavirus B/Victoria/2/87)
and the Yamagata lineage (influenzavirus B/Yamagata/16/88) [19, 20]. Both the Victoria-like
and the Yamagata-like neuraminidase sequences are composed of 1511 nucleotides and 466 amino
acid residues. Reassortment between circulating strains and insertions/deletions are
speculated to be the strategies by which influenzavirus B evolves [21]. In 2002, 105
influenzavirus B specimens from Belgium, Finland, Spain, Israel and China were characterized
by sequencing the HA and NA genes to map out lineages for these regions [20]. That study
determined that 96.2% of the specimens belonged to the B/Victoria/2/87 lineage while the
remainder (3.8%) belonged to the B/Yamagata/16/88 lineage. Further analysis showed that an
influenzavirus B can carry an HA from one lineage and an NA from the other. The
B/Yamagata/16/88-like viruses showed significant antigenic drift in the hemagglutinin protein,
while the B/Victoria/2/87-like viruses could be divided into two further lineages,
B/Hong Kong/1351/02-like (72.3%) and B/Hong Kong/330/01-like (27.7%). These differ in the
lineage of their surface proteins: the B/Hong Kong/1351/02-like viruses had a hemagglutinin
gene belonging to the B/Victoria/2/87 lineage and a neuraminidase gene belonging to the
B/Yamagata/16/88 lineage, whereas in the B/Hong Kong/330/01-like viruses both the
hemagglutinin and neuraminidase genes belonged to the B/Victoria/2/87 lineage. The authors
also found that, although B/Hong Kong/330/01 had both its HA and NA from the same lineage,
B/Yamagata/16/88-like neuraminidase genes were found during sequencing, most likely as a
result of reassortment between B/Hong Kong/330/01 and B/Hong Kong/1351/02
viruses during coinfection of hosts. The purpose of that study was to determine, through
phylogenetic analysis, the co-infection of the Victoria-like and Yamagata-like strains in
Israel and China. The study revealed these new strains at the end of 2002, which supports the
notion that influenzavirus B continues to evolve through antigenic shift and drift [20]. The
study also included a substitution rate analysis in which notable variations were found among
the isolates; the authors speculate that these variations contribute to altered viral
antigenicity and that the substitutions can later become glycosylation sites. These results
further the understanding of influenzavirus B within those regions; a comparable understanding
is lacking for the influenzavirus B strains in Egypt.
A study of seasonal influenzavirus A by Nelson et al., conducted in 2007, examined
H3N2 strains from the years 1999-2005 [13]. A total of 487 isolates were collected from
Australia and New Zealand to represent the southern hemisphere and 413 isolates from
New York, USA to represent the northern hemisphere. Phylogenetic analysis of the full genomes
of the 900 isolates showed that global migration contributes significantly to influenzavirus A
epidemics. Through this study, global circulation patterns were mapped out. It was also found
that influenzavirus A migrates during non-epidemic periods instead of remaining at low levels
locally during what is considered the influenza “off season” [5]. From this study, the authors
were able to determine the genesis of new clades and the spread of novel virus variants. The
circulation patterns of influenzavirus A proved useful for determining wide-scale migration
patterns. More studies on the influenzaviruses circulating throughout the Egyptian population
are needed.
A similar study was conducted in Japan in 1998 by Linstrom et al., in which
influenzavirus A isolates from 1993-1997 were phylogenetically analyzed to determine
evolutionary pathways and rates. This study was conducted on both the HA and NA genes. The
amino acid sequences were also analyzed, and it was determined that the changes that
accumulated in the amino acid sequences were correlated with time. The study did not focus
only on the HA and NA genes, but also covered the other segments of the influenzavirus. Overall,
the results indicated that the glycoproteins evolved at a faster rate than the other proteins
of the influenzavirus [22]. Whole-genome studies on the influenzavirus strains in Egypt are
needed.
As the studies described above show, phylogenetic and evolutionary
analyses have been performed on the strains circulating in Australia, Asia, Europe, and North
America. These studies further the understanding of the influenzavirus, its circulation
patterns, lineages, and evolutionary rates. The majority of these studies focus on
influenzavirus A rather than influenzavirus B. More studies on influenzavirus B should be
conducted, since it is a co-circulating virus and contributes to seasonal influenza epidemics.
In addition, very few studies involving phylogenetic analysis of influenzavirus B from
Northern Africa were found.
Objectives
Studies have been conducted all over the world on the influenzavirus strains circulating in
particular regions to map out lineages and circulation, and many of these studies have
performed molecular characterization of the sequencing data to detect conserved and
non-conserved regions of the influenzavirus surface proteins. These types of studies have yet
to be performed in Egypt. It is the objective of this study to fill this gap. Through this
study, the evolution of the influenzavirus B neuraminidase surface protein will be examined by
sequencing the NA gene from isolates collected in Egypt. Phylogenetic trees will be created to
examine and analyze circulation patterns of influenzavirus B in Egypt based on the NA gene
sequence. This study will also compare these results with sequencing and phylogenetic data
collected on the HA gene in a previous study of the same isolates in Egypt, to generate an
overall picture of the influenzavirus B strains circulating throughout Egypt from 1998-2008.
MATERIALS AND METHODS
Sample collection
The samples used in this study were collected between 1999 and 2008 from 10 hospitals
located throughout Egypt: Alexandria Fever hospital, Monira General Hospital (Monira),
Domiatta Fever Hospital (Domiat), Sharqeya Fever Hospital (Sharqeya), Helwan Fever Hospital
(Helwan), El-Gabarty Fever Hospital (Mokatam, Cairo), Kitchner General Hospital (6th of
October, Cairo), and Menia Fever Hospital (Menia). Oropharyngeal (OP) swabs were
collected from patients who presented with influenza-like illness (cough, fever of 39-41 °C,
and sore throat). The swabs were placed in 1 ml of virus transport medium (VTM) (10 g veal
infusion broth, 2 g bovine albumin fraction V, 400 ml sterile distilled water, 0.8 ml
gentamicin, and 8 ml fungizone), immediately placed in liquid nitrogen, and transported to
NAMRU-3 for testing.
Virus Isolation
An aliquot of each sample was inoculated into MDCK cells. The cells were cultured in
500 ml of DMEM supplemented with 55 ml fetal bovine serum, 5.5 ml penicillin-streptomycin,
5.5 ml L-glutamine, and 4 ml fungizone. Following inoculation the cells were incubated at 37 °C
with 5% CO2 and observed daily (10-14 days) for cytopathic effect (CPE). Specimens positive
for CPE were then identified by hemagglutination inhibition assay (HAI) using the WHO
Influenza Reagent Kit. In total, 107 influenzavirus B isolates were identified and selected for
further characterization. This step was performed by El Din, D. [23].
Molecular Testing of Virus Isolates
RNA was extracted from 140 µl of each influenzavirus B isolate using the QIAamp Viral RNA Mini
Kit (QIAGEN). Two-step RT-PCR was used to amplify the necessary regions. In the first step,
4 µl of RNA template was added to 1.5 µl each of the forward and reverse primers and
incubated for 5 min at 97 °C to increase the amplification yield. Three reactions were set up,
one per primer pair (all primers from Eurofin):
Reaction 1: AO-NA1F (AGCAGAAGCAGAGCATCTTC) with AO-NA1R (AACGAGGGTATGTCCACTCC)
Reaction 2: AO-NA2F (TATATCGCAGTTGATGG) with AO-NA2R (GCTTCCATCATYTGGTCTGG)
Reaction 3: AO-NA3F (GCTACCTTCAACTATACAAACG) with AO-NA3-R (AGTAGTAACAAGAGCATTTTTC)
For the second step of RT-PCR, the rest of the reaction mix from the Q1 step kit (QIAGEN) was
then added: 10 µl of 5X buffer, 2 µl of dNTPs, 2 µl of enzyme mix, and 29 µl of H2O, making a
total of 50 µl. The reactions were then amplified using the following cycling conditions:
30 min at 50 °C, 15 min at 95 °C, and 35 cycles of 30 s at 94 °C, 30 s at 50 °C, and 1 min at
72 °C, followed by 10 min at 72 °C, and then cooled to 4 °C until ready for use.
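One of the primers listed above, AO-NA2R, contains the IUPAC ambiguity code Y (C or T), so it is ordered as a mixture of two sequences. The sketch below, which is illustrative and not part of the thesis workflow, records the six primers as given in the text, expands any degenerate primer into its concrete sequences, and reports primer lengths. The helper names `expand_degenerate` and `reverse_complement` are hypothetical.

```python
from itertools import product

# Primer sequences (5'->3') as listed in the text.
PRIMERS = {
    "AO-NA1F": "AGCAGAAGCAGAGCATCTTC",
    "AO-NA1R": "AACGAGGGTATGTCCACTCC",
    "AO-NA2F": "TATATCGCAGTTGATGG",
    "AO-NA2R": "GCTTCCATCATYTGGTCTGG",
    "AO-NA3F": "GCTACCTTCAACTATACAAACG",
    "AO-NA3-R": "AGTAGTAACAAGAGCATTTTTC",
}

# Subset of the IUPAC nucleotide codes; Y is the only ambiguity code used above.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "Y": "CT", "R": "AG"}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def expand_degenerate(primer):
    """Return every concrete sequence represented by a degenerate primer."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in primer))]


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))


for name, seq in PRIMERS.items():
    variants = expand_degenerate(seq)
    print(f"{name}: {len(seq)} nt, {len(variants)} variant(s)")
```

Reverse primers anneal to the opposite strand, so `reverse_complement` gives the sequence as it would appear in the sense orientation of an alignment.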
After amplification, samples were run on a 2% agarose gel to identify the products of
the three RT-PCR reactions and to determine the molecular weight of each of the bands
representing the 3 primer sets. The bands were purified following the manufacturer’s
instructions using the Qiagen Gel Purification Kit. The gel-extracted amplicons were then set
up for cycle sequencing in a 96-well plate as follows: 4 µl of BigDye v3.0 (Applied
Biosystems), 4 µl of 5X BigDye buffer, 1 µl of primer, 10 µl of H2O, and 2 µl of template
(total volume 21 µl). The thermocycling conditions were 25 cycles of 96 °C for 10 s, 50 °C for
5 s, and 60 °C for 4 min. After the BigDye termination reaction, the samples were purified
again for 30 minutes using X-Terminator (Applied Biosystems), with 20 µl of X-Terminator added
to each well of the 96-well plate. After the X-Terminator purification, the labeled template
was loaded into the sequencer (ABI 3730 DNA Sequencer, Applied Biosystems).
Phylogenetic Analysis
For each sample 6 sequences were obtained. For each isolate, the sequences were
aligned using CodonCode Aligner, and overlapping sequence was identified, edited, and
assembled using BioEdit. To examine the similarity of all of the sequenced isolates, a
multiple sequence alignment of the 107 assembled sequences was performed using ClustalW.
This alignment was then examined using MEGA 5.0 and used to construct phylogenetic trees.
The sequences of the 107 isolates were analyzed using the maximum likelihood method with
bootstrapping [19]. The best model was chosen using the Model Test in MEGA 5.0, and the tree
was constructed using the PhyML algorithm. Neighbor-joining phylogenetic trees were also
constructed, with estimation under the maximum-likelihood model. The samples were divided
according to the cluster and clade patterns indicated by the phylogenetic analysis.
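Distance-based methods such as neighbor-joining start from a matrix of pairwise distances between aligned sequences. The sketch below computes the simplest such distance, the p-distance (the proportion of compared sites that differ). This is a deliberately simpler measure than the model-based distances estimated in MEGA 5.0, and the isolate names and sequences are invented for illustration.

```python
def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing sites, ignoring columns where either sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue
        compared += 1
        differing += a != b
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def distance_matrix(alignment):
    """Pairwise p-distances for an alignment given as {name: aligned sequence}."""
    names = list(alignment)
    return {
        (x, y): p_distance(alignment[x], alignment[y])
        for x in names
        for y in names
    }


# Hypothetical toy alignment, not real NA sequences.
toy_alignment = {
    "isolate_1": "ATGCTACCTT",
    "isolate_2": "ATGCTACCTA",
    "isolate_3": "ATGTTGCC-A",
}
matrix = distance_matrix(toy_alignment)
for (x, y), d in matrix.items():
    if x < y:
        print(f"{x} vs {y}: {d:.3f}")
```

A tree-building program would take a matrix like this (or a model-corrected version of it) as input and join the closest pairs first.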
While creating these models, an ancestral linear regression was estimated by including
the Victoria-like and Yamagata-like reference strains, along with the first influenzavirus B
(B/Lee/1940), in the alignment [9]. Substitution rates were determined by comparing sequences
to a rooted reference strain such as B/Victoria/02/87 or B/Yamagata/16/88.
The method used to determine the substitution rate is called root-to-tip, and it first
estimates the phylogeny from the root sequence. A linear regression is then fitted between the
time of sampling and the genetic distance, which is the sum of the reconstructed branch
lengths. Using this method, the evolutionary rates for both the Victoria-like and
Yamagata-like strains were determined using MEGA 5.0. The distances were calculated first and
then, using Excel, the equation v = k/t was applied, where v is the rate to be obtained, k is
the distance value, and t is the time in years. Both the non-synonymous and synonymous rates
(nucleotides/site/year) were also used to determine which lineage has the higher evolutionary
rate [4].
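The two calculations described above can be sketched as follows: the direct rate v = k/t, and a least-squares regression of root-to-tip distance on sampling year, whose slope is the rate estimate. This is an illustrative reimplementation, not the MEGA 5.0 or Excel procedure used in the thesis, and the years and distances are made-up values chosen to lie on a straight line.

```python
def simple_rate(distance, years):
    """v = k / t: distance k accumulated over t years."""
    if years <= 0:
        raise ValueError("years must be positive")
    return distance / years


def root_to_tip_regression(sampling_years, distances):
    """Least-squares fit of root-to-tip distance against sampling year.

    Returns (slope, intercept); the slope estimates substitutions/site/year.
    """
    n = len(sampling_years)
    if n != len(distances) or n < 2:
        raise ValueError("need at least two (year, distance) pairs")
    mean_x = sum(sampling_years) / n
    mean_y = sum(distances) / n
    sxx = sum((x - mean_x) ** 2 for x in sampling_years)
    if sxx == 0:
        raise ValueError("sampling years must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sampling_years, distances))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical data: sampling year and root-to-tip distance (substitutions/site).
years = [1999, 2002, 2005, 2008]
dists = [0.020, 0.026, 0.032, 0.038]

rate, intercept = root_to_tip_regression(years, dists)
root_year = -intercept / rate  # year at which the fitted distance is zero
print(f"rate = {rate:.4f} substitutions/site/year, inferred root year = {root_year:.0f}")
```

The x-intercept of the fitted line gives a rough date for the root, which is a useful sanity check against the known date of the reference strain.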
Entropy Analysis
Entropy was plotted using BioEdit to estimate the conserved regions of the
translated amino acid sequence. The nucleotide sequences were translated using a BioEdit
function, beginning from the first reading frame. The translated sequences were grouped
according to the reference strains they were most closely related to, and variations in the
amino acids were observed. The positions of most variation for each lineage were chosen to
determine whether there were any common conserved variations within the amino acid sequences.
These positions were selected manually and inspected in MEGA 5.0.
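The quantity behind such an entropy plot is the Shannon entropy of each alignment column. The sketch below computes it with the natural logarithm, which is assumed here to correspond to the H(x) that BioEdit plots; the four short peptide sequences are invented for illustration.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (natural log) of the residues in one alignment column."""
    counts = Counter(column)
    total = len(column)
    # sum of p * ln(1/p); a fully conserved column gives exactly 0.0
    return sum((c / total) * math.log(total / c) for c in counts.values())


def entropy_profile(sequences):
    """Entropy at every position of a set of aligned, equal-length sequences."""
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    return [column_entropy(col) for col in zip(*sequences)]


# Hypothetical toy alignment of translated sequences.
toy = ["MLPS", "MLPS", "MIPS", "MLPT"]
profile = entropy_profile(toy)

# 1-based positions, to match how alignment positions are usually reported.
variable_positions = [i for i, h in enumerate(profile, start=1) if h > 0]
conserved_positions = [i for i, h in enumerate(profile, start=1) if h == 0]
print(variable_positions, conserved_positions)
```

Positions with an entropy of zero are identical in every sequence; any position with a non-zero value shows up as a peak on the plot.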
The amino acid sequences for the 107 samples were then scanned against the Prosite
Motifs Database using PPsearch (http://www.ebi.ac.uk/tools/ppsearch/index.html) to predict
glycosylation sites. The predicted glycosylation positions were then located on the entropy
graph to determine whether or not each glycosylation site was subject to variation from
1998-2008. Glycosylation sites common to the two lineages, Victoria-like
and Yamagata-like, were also identified with PPsearch.
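The kind of pattern scan that PPsearch performs can be approximated with a regular expression. The sketch below uses the PROSITE N-glycosylation pattern N-{P}-[ST]-{P} (entry PS00001): an asparagine, any residue except proline, a serine or threonine, then any residue except proline. It is an illustration only, not the PPsearch implementation, and the test peptide is invented.

```python
import re

# PROSITE PS00001, N-{P}-[ST]-{P}, written as a lookahead so overlapping motifs are found.
N_GLYCOSYLATION = re.compile(r"(?=(N[^P][ST][^P]))")


def find_n_glycosylation_sites(protein):
    """Return (1-based position of the Asn, matched motif) for each predicted site."""
    return [(m.start() + 1, m.group(1)) for m in N_GLYCOSYLATION.finditer(protein)]


# Hypothetical peptide: motifs at N3 and N16; N7 and N11 are blocked by proline.
peptide = "AANGTLNPSKNASPQNVTE"
sites = find_n_glycosylation_sites(peptide)
print(sites)
```

Because the pattern requires a fourth residue, a motif whose serine or threonine is the last residue of the sequence is not reported. A match only predicts a potential site; whether it is actually glycosylated has to be established experimentally.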
RESULTS
Sample selection
From 1999 through 2008 there were approximately 1,800 influenzavirus B isolates in the
NAMRU-3 collection from Egypt. Approximately 10 isolates per year were selected for further
characterization in this study and, previously, in the master’s thesis of El Din, D. [23].
Additionally, to obtain geographical representation, 42 isolates were selected to represent
the Cairo area, 52 isolates to represent northern Egypt (north coast and delta areas), and 13
isolates to represent southern Egypt. The 107 isolates were collected from 9 different
locations, as shown in Figure 1.
Influenza B Lineages
RNA was extracted from each of the 107 isolates, and the NA gene was amplified using 3 sets of
primers, sequenced, and assembled into a single contig. The NA genes of the
107 isolates were analyzed phylogenetically to assess their ancestral lineage. Influenzavirus
B viruses are broadly categorized into two lineages, Victoria-like and Yamagata-like, based on
previous studies. In this study 78 (73%) of the isolates had an NA gene of the Victoria-like
lineage while the remaining 29 (27%) were Yamagata-like. Both lineages circulated in
Egypt throughout 1998-2008 in various proportions, except in 2000, when only
Victoria-like viruses were circulating, as shown in Figure 2. The percentage of Victoria-like
viruses circulating was higher than the percentage of co-circulating Yamagata-like viruses
except in 2005 (Victoria-like 46.6% and Yamagata-like 53.4%) and 2008 (Victoria-like 50% and
Yamagata-like 50%). As seen in the phylogenetic tree in Figure 3, 16 reference strains were
mapped with the 107 influenzavirus B isolates. From 1998-2003,
the Victoria-like isolates were related mostly to the reference strains B/Hong Kong/548/2000,
B/Wisconsin/01/2009, and B/Brisbane/32/2002. Figure 3 also shows that
from 2004-2008 (except for 2006, when Yamagata-like
strains were the only circulating viruses) the Victoria-like isolates were more closely related
to the B/Johannesburg/69/2001, B/Harbin/07/1994, and B/Victoria/02/1987 reference strains.
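The overall lineage proportions follow directly from the isolate counts. The short sketch below recomputes them from a list of lineage assignments; only the two totals (78 and 29) come from the text, and the per-isolate list is reconstructed from those totals for illustration.

```python
from collections import Counter


def lineage_shares(lineages):
    """Map each lineage to (count, percentage of all isolates)."""
    counts = Counter(lineages)
    total = sum(counts.values())
    return {name: (n, 100 * n / total) for name, n in counts.items()}


# One label per isolate, rebuilt from the reported totals.
assignments = ["Victoria-like"] * 78 + ["Yamagata-like"] * 29
shares = lineage_shares(assignments)

for name, (n, pct) in shares.items():
    print(f"{name}: {n} isolates ({pct:.1f}%)")
```

The same function applied to the isolates of a single year would give the yearly proportions plotted in the prevalence figure.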
Phylogenetic analysis
A neighbor-joining phylogenetic tree was constructed using MEGA 5.0 after aligning the
sequence data with ClustalW. The alignment was then subjected to bootstrap
analysis in MEGA 5.0 before the neighbor-joining tree was constructed using the
maximum likelihood model. Non-parametric bootstrapping was performed to test the
reliability of the dataset. This method resamples the original dataset to create new
datasets of the same size as the original (107 sequences of the same length): the alignment
sites are drawn at random, with replacement, rather than in their original order, creating
between a hundred and a thousand datasets. MEGA 5.0 then builds a tree from each resampled
dataset and summarizes the resulting trees as a consensus tree, shown in Figure 3. The
neighbor-joining tree also includes 16 reference strains to map out the samples’ lineages and
similarities. The phylogenetic tree was color-coded to display year of collection (Figure 4)
or location of collection (Figure 5), or displayed to show the relationships between the
viruses from Egypt and the reference strains (Figure
3). Result from Figure 4 and 5.
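The resampling step of non-parametric bootstrapping can be sketched in a few lines: alignment columns are drawn at random with replacement until the replicate has as many columns as the original alignment. This is an illustration of the general technique, not MEGA 5.0's implementation, and the alignment is a made-up toy example.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Build one replicate by sampling alignment columns with replacement."""
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(alignment[n]) != length for n in names):
        raise ValueError("sequences must be aligned to the same length")
    columns = [rng.randrange(length) for _ in range(length)]
    return {n: "".join(alignment[n][i] for i in columns) for n in names}


def bootstrap_replicates(alignment, n_replicates=100, seed=0):
    """Generate n_replicates resampled alignments (seeded for reproducibility)."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]


# Hypothetical toy alignment.
toy_alignment = {
    "isolate_1": "ATGCTACCTT",
    "isolate_2": "ATGCTACCTA",
    "isolate_3": "ATGTTGCCTA",
}
replicates = bootstrap_replicates(toy_alignment, n_replicates=100, seed=1)
print(replicates[0])
```

A tree is then built from every replicate, and the bootstrap support of a branch is the fraction of replicate trees in which that grouping appears.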
In Figure 3 there are two distinct clusters. The smaller of the two clusters is composed of
the 29 Yamagata-like isolates, while the larger cluster contains all of the Victoria-like
isolates and can be further broken down into smaller clusters.
There are 5 clades within the Victoria-like cluster, each named after the
reference strain to which its isolates are most closely related. In one of the clades there are
three isolates from the years 2000, 2002, and 2003 that are closely related to the
B/Wisconsin/01/2009 and B/Texas/UR06-0541/2007 reference strains. The second clade contains 24
isolates that are most similar to the B/Brisbane/32/02 reference strain. However, one of the 24
samples (2000910434) is more similar to the reference strain B/Hong Kong/692/01.
The third clade is composed of 31 NA sequences that are B/Hong Kong/548/2000-like
and come from all the years (1998-2008). The fourth clade is the B/Vienna/1/99-like clade,
which contains 4 isolates from 1999 and 2000. Within the B/Vienna/1/99-like clade is a smaller
group of 3 samples from 1999 and 2000 that are B/Mexico/84/2000-like. The 4th
sample within the B/Vienna/1/99-like clade is most similar to the B/Sichuan/379/99 reference
strain. The final clade is B/Johannesburg/69/2001-like and contains 9 samples that are closely
related to the B/Johannesburg/69/2001 reference strain.
Amino Acid Variations of NA Sequences
Entropy analysis is the process by which the translated sequences of the influenzavirus B
isolates are aligned and the variation at each alignment position is plotted on a graph. The x-axis
shows the alignment positions, and each peak marks a position in the alignment with variation.
The higher the peak on the graph, the greater the variation at that position in the alignment.
The positions of the alignment with peaks are called non-zero entropy positions. Non-zero
entropy positions can represent areas of the neuraminidase gene that are subject to high rates
of mutation, such as antigenic sites [9].
The alignment positions of the translated sequences without peaks indicate a conserved
region, that is, a region without variation that is identical across the 107 isolates.
These positions are called zero entropy positions. Zero
entropy positions can indicate regions of the neuraminidase gene that are not subject to high
mutation, such as receptor binding sites [4].
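The per-position entropy described above can be sketched as a Shannon entropy calculated for each column of a protein alignment. The sketch assumes the natural logarithm, which is an assumption here rather than something stated in the text, and the four short sequences are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy H = -sum(f * ln f) over residue frequencies in one column."""
    counts = Counter(column)
    if len(counts) == 1:
        return 0.0  # a fully conserved column has zero entropy
    total = len(column)
    return -sum((n / total) * math.log(n / total) for n in counts.values())


def alignment_entropy(sequences):
    """Entropy for every column of an alignment (gap characters count as a state)."""
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("all sequences must have the same aligned length")
    return [column_entropy([s[i] for s in sequences]) for i in range(length)]


def non_zero_positions(entropies, first_position=1):
    """1-based alignment positions whose entropy is greater than zero."""
    return [first_position + i for i, h in enumerate(entropies) if h > 0]


# Hypothetical toy alignment, not data from this study.
toy_alignment = ["MNPAS", "MNPAS", "MNPSS", "MNPAS"]
toy_entropies = alignment_entropy(toy_alignment)
print(non_zero_positions(toy_entropies))
```

In this toy alignment only the fourth column varies (three A, one S), so it is the only non-zero entropy position.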
Amino acid sequences obtained by translating the nucleic acid sequences in BioEdit
were aligned with reference strains chosen according to the phylogenetic analysis of
the 107 samples. As indicated in Table 2, Yamagata-like strains were compared to the
B/Yamagata/1988 reference strain. The positions indicated are the non-zero entropy positions
(non-conserved amongst the 107 isolates) from the entropy analysis. According to the comparative
analysis, position 388 has the most variation: an Ala to Ser change (A338S) occurs in 25 out of the 29
Yamagata-like sequences, a consistent amino acid variation, as shown in Table 2.
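A comparison of this kind can be sketched as follows: each isolate is compared with the reference residue by residue, differences are written in the usual notation (reference residue, position, isolate residue), and the number of isolates carrying each change is tallied. The sequences, the starting position, and the function names are hypothetical and only illustrate the bookkeeping.

```python
from collections import Counter


def list_substitutions(reference, isolate, first_position=1):
    """Substitutions of an aligned isolate relative to a reference, e.g. 'A12S'."""
    if len(reference) != len(isolate):
        raise ValueError("sequences must be aligned to the same length")
    changes = []
    for offset, (ref_aa, iso_aa) in enumerate(zip(reference, isolate)):
        # Skip identical residues and gapped columns.
        if ref_aa != iso_aa and "-" not in (ref_aa, iso_aa):
            changes.append(f"{ref_aa}{first_position + offset}{iso_aa}")
    return changes


def substitution_frequencies(reference, isolates, first_position=1):
    """Count how many isolates carry each substitution."""
    tally = Counter()
    for isolate in isolates:
        tally.update(list_substitutions(reference, isolate, first_position))
    return tally


# Hypothetical toy fragment starting at alignment position 10.
toy_reference = "GKAPL"
toy_isolates = ["GKSPL", "GKSPL", "GKAPL", "GRSPL"]
toy_tally = substitution_frequencies(toy_reference, toy_isolates, first_position=10)
print(toy_tally.most_common())
```

Here the Ala to Ser change at position 12 is found in three of the four toy isolates, which is the kind of count reported per position in the comparison tables.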
In Table 3, the Victoria-like NA genes were grouped by clade and compared with the
corresponding reference strains: B/Victoria/02/87, B/Brisbane/32/2002, B/Hong Kong/548/2000, and
B/Johannesburg/69/2001. The B/Victoria/02/87 amino acid alignment indicates that
the position with the most variation is position 439 (Leu to Arg). The B/Brisbane/32/2002
amino acid alignment also indicated that position 439 was the most variable, with a consistent
amino acid variation of Leu to Arg. The B/Hong Kong/548/2000 and
B/Johannesburg/69/2001 amino acid alignments also had the most variation at position 439.
Neuraminidase Glycosylation Sites
Glycosylation sites are positions in a protein at which sugar chains are enzymatically
attached as a post-translational modification that affects the structure and function of the
protein. Mutations in glycosylation sites occur in the antigenic sites of influenzaviruses, causing
the structure to change. Mutations can also cause glycosylation sites to appear in different
influenzavirus lineages [1, 12].
Determining variations in the glycosylation sites indicates the most non-conserved
regions in the sequences and the locations where site-specific mutations are most likely to occur.
Glycosylation sites of the influenzavirus B NA gene were determined using PPsearch, and the
glycosylation sites were mapped out for each isolate. Eight glycosylation sites were identified, as
shown in Table 4. The same glycosylation sites were found in both the Victoria-like strains and
the Yamagata-like strains except for the glycosylation site at position 255. The glycosylation site
at position 338 was only identified in isolates from 2000. The glycosylation sites at positions 255
and 338 are therefore not as conserved as the sites found in all isolates and in all years.
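A motif search of this kind can be sketched with a regular expression. The sketch assumes the N-linked glycosylation consensus motif N-{P}-[ST]-{P} as catalogued in PROSITE (PS00001); whether PPsearch was run with exactly this pattern is not stated in the text, and the peptide sequences below are hypothetical.

```python
import re

# Asn, any residue except Pro, Ser or Thr, any residue except Pro.
# The lookahead lets overlapping motifs be reported as well.
SEQUON = re.compile(r"(?=N[^P][ST][^P])")


def find_sequons(protein, first_position=1):
    """Return the positions of the Asn residue of each potential N-glycosylation motif."""
    return [match.start() + first_position for match in SEQUON.finditer(protein)]


# Hypothetical toy peptides, not sequences from this study.
print(find_sequons("MNGTAANPSLNKSA"))  # NGTA and NKSA match; NPSL is excluded by the Pro
print(find_sequons("MNGTA"))           # motif present
print(find_sequons("MNGAA"))           # a single Thr to Ala change removes the motif
```

The last two calls show how a single amino acid change can remove (or, read in reverse, create) a potential glycosylation site.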
Neuraminidase Antigen Binding Sites
Antigen binding sites in influenzavirus are some of the most non-conserved regions,
where high rates of mutation occur. The entropy plot shown in Figure 6 displays
the conserved and non-conserved positions of the alignment. Influenzavirus B
neuraminidase has antigenic sites that are subject to high mutation rates. The antigenic sites are
composed of loops encoded by RNA segment 6, known as loop 150, loop 200, loop
350, loop 370, and loop 400. Each of these loops spans a range of positions; for
example, loop 150 is located between positions 140-150, as shown in Table 5. The
non-conserved position between positions 140-150 is position 148, with an entropy
value of 0.32508 (the value on the y-axis of the graph in Figure 6, which indicates the
degree to which this position in the alignment varies). The rest of the antigenic loop sites show
the same level of variation except for loop 370, at position 373, with an entropy value
of 0.44702.
Using the entropy data, a correlation was found between the zero entropy positions and
the receptor binding sites. Receptor binding sites are highly conserved. For neuraminidase the
conserved receptor binding sites are Glu-119, Arg-156, Trp-178, Ser-179, Asp-198, Ile-222,
Glu-227, Asp-293, and Glu-425, and their positions are indicated in Table 6. As indicated in Table
6, these residues have remained conserved, since their positions coincide with zero entropy
positions in Figure 6.
Evolutionary Substitution Rates
There are two types of substitutions that occur: non-synonymous and synonymous.
Synonymous substitutions do not change the encoded amino acids and generally have no selective
effect, rendering these mutations neutral. Whether a synonymous mutation is neutral depends on
three factors. The first factor, as mentioned, is that there is little or no difference caused by the
substituted codon. The second factor is that there are no secondary RNA or DNA structures in the
coding and non-coding regions of the substitutions. The third factor is the absence of overlapping
reading frames. Non-synonymous substitutions change the encoded amino acids and have a
selective effect. Determining substitution rates potentially allows observation of the roles that
mutations play in the virulence and transmission of the influenza virus [20].
The nucleotide substitution rate was calculated for each of the 107 strains and the
reference strains (Figure 7). The figure indicates that the evolutionary rate increased
over the years 1998-2008. This indicates that mutations in specific non-conserved
regions have increased over the span of those 10 years. The Victoria-like lineages have a
substitution rate of 2.3 x 10^-3 nucleotides/site/year. The non-synonymous rate of nucleotide
substitutions is 9.2 x 10^-4, and the synonymous rate of nucleotide substitutions is 9.4 x 10^-3. The
Yamagata-like lineages have a substitution rate of 2.93 x 10^-3. The non-synonymous rate of
nucleotide substitutions is 1.56 x 10^-3 and the synonymous rate of nucleotide substitutions is
8.84 x 10^-3. These substitution rates indicate that the Yamagata-like lineages underwent a higher
rate of nucleotide substitutions than the Victoria-like lineages. The rate of synonymous
substitutions in comparison with the non-synonymous substitutions in the Yamagata-like
lineages is much higher than that of the Victoria-like lineages. According to this data, the
mutation rate of the Yamagata-like lineages is at least double the mutation rate of the
Victoria-like lineages over the years 1998-2008. Shown in Figure 7 are the rates of the non-synonymous
and synonymous substitutions for both the Victoria-like and Yamagata-like isolates.
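The summary rates quoted above can be compared directly with a few lines of arithmetic. The sketch below uses only the NA values reported in this section and computes, for each lineage, the ratio of the non-synonymous to the synonymous rate, and the fold difference between the two lineages for each type of rate. It is a crude summary of the reported numbers, not a formal dN/dS estimate, and the dictionary layout and function names are illustrative.

```python
# NA substitution rates as reported in the text (per site per year).
RATES = {
    "Victoria-like": {"overall": 2.3e-3, "non_synonymous": 9.2e-4, "synonymous": 9.4e-3},
    "Yamagata-like": {"overall": 2.93e-3, "non_synonymous": 1.56e-3, "synonymous": 8.84e-3},
}


def non_syn_to_syn_ratio(lineage):
    """Ratio of the non-synonymous rate to the synonymous rate for one lineage."""
    rates = RATES[lineage]
    return rates["non_synonymous"] / rates["synonymous"]


def yamagata_over_victoria(kind):
    """Fold difference (Yamagata-like divided by Victoria-like) for one type of rate."""
    return RATES["Yamagata-like"][kind] / RATES["Victoria-like"][kind]


for lineage in RATES:
    print(lineage, round(non_syn_to_syn_ratio(lineage), 3))
for kind in ("overall", "non_synonymous", "synonymous"):
    print(kind, round(yamagata_over_victoria(kind), 2))
```

On these summary values the Yamagata-like overall rate is about 1.27 times the Victoria-like rate and the non-synonymous rate about 1.70 times; comparisons read from the per-year rates plotted in Figure 7 may differ.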
DISCUSSION
The molecular characterization of the influenzavirus B neuraminidase includes virus
circulation patterns, phylogenetic analysis, determination of glycosylation sites, identification of
conserved and non-conserved regions in the neuraminidase gene, and calculation of evolutionary
rates. The evolutionary rates of HA were calculated for the same 107 isolates in a
previous study. The rate of mutation of NA is compared and contrasted with that of HA to provide
a more comprehensive molecular characterization of influenzavirus B.
To gain an understanding of the circulation of influenzavirus B in
Egypt and, in particular, the characteristics of the neuraminidase gene, 107 isolates were
sequenced and examined. The variety of influenza B viruses circulating depends on the
current antigenic properties of the virus as well as past exposures to different viruses [1, 14]. Each
year, samples are taken at outpatient clinics from patients who are suspected to have influenza.
The samples are taken from hospitals all over Egypt, providing data on the circulation of
influenzavirus B in Egypt.
The phylogenetic analysis for the 107 samples indicates that there are two clusters: the
Yamagata-like and the Victoria-like. The Victoria-like cluster contains clades that further
discriminate the isolates into sub-clades most similar to the following reference strains: B/Hong
Kong/548/2000, B/Brisbane/32/2002, B/Wisconsin/01/2009, B/Vienna/1/99, and
B/Johannesburg/69/2001.
Both the Yamagata-like and Victoria-like samples co-circulated each year except for the
year 2000. The sudden drop and then reappearance of the Yamagata-like strains the following
year can, according to similar studies conducted by Chi et al., be speculated to be due to the
viruses’ antigenic shifting and drifting properties. Much like the results found for Egypt, it has
been reported throughout Asia that the Yamagata-like strains did not circulate as well as
the Victoria-like strains during the years 1999-2000 [20]. Information gathered about the
influenza B virus is critical to the creation of efficient vaccines to counteract the yearly epidemic.
Vaccines are developed based on the viruses that are circulating each season.
In order to understand the circulation patterns of influenzavirus B in Egypt, the
phylogenetic tree was color-coded by year of collection (Figure 4) or by location (Figure 5). As
observed in Figure 5, very few groupings of isolates were formed on the phylogenetic tree
branches according to location within Egypt; therefore it cannot be concluded that an isolate’s
position on the tree is correlated with the location of the hospital in Egypt where the sample was
collected. As observed in the color grouping on the tree, there is a correlation between the lineage
and the year in which the isolates were collected (Figure 4). This can be seen in some groupings
within the years 2003, 2004, 2005, 2006, and 2007, where various groups of isolates from the same
year are closely related to one another, although not all the isolates of the same year are located
close together on the tree. Only some isolates from each year are grouped together.
According to previous work done on the hemagglutinin surface protein by ElDin, D., it
has been determined that the surface proteins on a single influenza virus strain can have different
evolutionary rates and lineages [23]. Some isolates have shown that the hemagglutinin sequence
can be of a Victoria-like lineage while the neuraminidase sequence can be of a Yamagata-like
lineage, and vice versa.
According to ElDin, D., 11 glycosylation sites were identified within the HA gene of the
107 isolates from Egypt, and in this study 8 sites were identified in the NA gene of the same
viruses [23]. A single amino acid change in the glycosylation site can result in the inactivation or
creation of a glycosylation site. The glycosylation sites of the NA gene are known to mutate; for
example, glycosylation site 255 was only found in the Yamagata-like viruses, indicating that a
mutation caused this glycosylation site to disappear from the Victoria-like influenzaviruses B in
Egypt. Glycosylation site 338 was only found in the isolates circulating in the year 2000,
indicating that a mutation (such as an insertion, deletion, or substitution) caused this
glycosylation site to become active, and that a further mutation the following year
caused the site to disappear again. The appearance of glycosylation site 338
could be the result of viral evolution to evade the host immune system, thereby allowing the virus
to circulate more efficiently throughout the Egyptian population.
It has also been determined in this study that there is high variation (identified
from the entropy graph) at amino acid position 388 for the Yamagata-like lineage and at
positions 113, 338, 403, and 437 for the Victoria-like isolates (Table 3). This indicates that these
non-conserved positions will likely continue to change, thus contributing to the virus’
evolutionary rate. According to Areej et al., position 338 is one of the
positions in the NA gene involved in the interaction with NA inhibitors, and it is subject to high
mutation rates as a mechanism of resistance [24].
The entropy plot was generated to analyze the variation within the protein
alignment, which allows us to map variation along the amino acid sequence of the neuraminidase
surface protein. The translated sequences of 16 reference strains (from various locations
throughout the world: Asia, Australia, Europe, North America, and South America) were
also aligned with the 107 amino acid sequences of the isolates. The beginning and end of the
alignment show the most variance since the 107 amino acid sequences were trimmed at the
beginning and end while the reference strains were not. Changes in the antigenic loop sites
allow the influenza virus to change its antigenic properties, creating the need for a new vaccine
every year. Determining the most non-conserved sites allows speculation as to where
a mutation is likely to occur (Table 5). Just as the non-conserved sites were observed,
the conserved receptor binding sites of influenzavirus B were studied to determine whether
these sites have remained conserved (Table 6). The receptor binding sites have remained
conserved, indicating a low mutation rate in these regions, which most likely remain
conserved as long as the targeted host does not change.
Evolutionary rates for the 107 nucleotide sequences were determined from
nucleotide substitutions. These rates indicated that the influenza B neuraminidase mutations
increased over the years 1998-2008. It has been determined that the Yamagata-like lineages have
undergone a higher rate of substitution than the Victoria-like lineages, as shown in other studies
such as that conducted by Schweiger B. in Germany on both the HA and NA of influenzaviruses A
and B from 1996-2006. Influenzavirus B neuraminidase showed a higher substitution rate in the
Yamagata-like than in the Victoria-like strains [25].
El-Din, D. performed the molecular characterization of the HA gene from the same 107
isolates as this study [23]. In that study the isolates could be characterized into the same two
phylogenetic lineages as for neuraminidase: Victoria-like and Yamagata-like. Of note, the isolates
did not necessarily contain an HA and NA gene from the same lineage. Some isolates contain surface
proteins where the HA is Victoria-like and the NA is Yamagata-like, and vice versa. There are a
total of 42 isolates with different lineages for their HA and NA surface proteins. Ten of the
forty-two have a Victoria-like HA and a Yamagata-like NA. Thirty-two of the forty-two have a
Yamagata-like HA and a Victoria-like NA. The remaining 65 isolates have the same lineage for
the HA and NA surface proteins, be it Victoria-like or Yamagata-like. Chi et al. also
reported similar results in studies conducted on influenzavirus B strains in Israel and
China [20]: influenzavirus isolates can contain an HA from the Victoria lineage and an NA
from the Yamagata lineage, and vice versa.
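The counts above can be cross-checked against the 29 Yamagata-like NA sequences reported in the phylogenetic analysis. The sketch below tabulates isolates by their (HA lineage, NA lineage) pair. It assumes that every NA sequence that is not Yamagata-like is Victoria-like, so the two concordant counts are derived values rather than numbers stated in the text.

```python
from collections import Counter

TOTAL_ISOLATES = 107
NA_YAMAGATA = 29                            # Yamagata-like NA sequences (from the tree)
NA_VICTORIA = TOTAL_ISOLATES - NA_YAMAGATA  # assumption: all remaining NA are Victoria-like

VIC_HA_YAM_NA = 10  # reported: Victoria-like HA with Yamagata-like NA
YAM_HA_VIC_NA = 32  # reported: Yamagata-like HA with Victoria-like NA

# Keys are (HA lineage, NA lineage); the concordant counts are derived by subtraction.
pairs = Counter({
    ("Victoria", "Yamagata"): VIC_HA_YAM_NA,
    ("Yamagata", "Victoria"): YAM_HA_VIC_NA,
    ("Yamagata", "Yamagata"): NA_YAMAGATA - VIC_HA_YAM_NA,
    ("Victoria", "Victoria"): NA_VICTORIA - YAM_HA_VIC_NA,
})

discordant = sum(n for (ha, na), n in pairs.items() if ha != na)
concordant = sum(n for (ha, na), n in pairs.items() if ha == na)
print(discordant, concordant, round(100 * discordant / TOTAL_ISOLATES, 1))
```

Under that assumption the derived concordant counts (19 Yamagata-like and 46 Victoria-like for both genes) add up to the 65 isolates reported, and the 42 mixed-lineage isolates make up about 39% of the 107.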
Evolutionary rates were calculated from the HA gene by El-Din, D.; the estimated
evolutionary rate for the Victoria-like strains was 2.23 x 10^-3 nucleotides/site/year and for the
Yamagata-like strains the estimated evolutionary rate was 2.82 x 10^-3 nucleotides/site/year [23].
These results were very similar to those calculated using the NA gene (Victoria-like 2.32 x 10^-3
nucleotides/site/year and Yamagata-like 2.93 x 10^-3 nucleotides/site/year). The substitutions for
HA were estimated to be 9.23 x 10^-4 non-synonymous substitutions/site/year and 6.43 x 10^-3
synonymous substitutions/site/year for the Victoria-like strains. For NA the substitutions for
the Victoria-like strains were estimated to be 9.2 x 10^-4 non-synonymous substitutions/site/year
and 9.4 x 10^-3 synonymous substitutions/site/year. The estimates of the non-synonymous rate are
almost identical in the HA and NA of the Victoria-like strains, but the estimate of the
synonymous rate based on NA exceeds the estimate for HA. Studies such as that of Schweiger B.
have determined that HA has a higher mutational rate than NA in Germany [25]. According to the
findings of this study, NA’s average mutational rate does not exceed that of HA in the isolates
collected in Egypt. The HA substitutions for the Yamagata-like strains were 1.75 x 10^-3
non-synonymous substitutions/site/year and 6.20 x 10^-3 synonymous substitutions/site/year.
Although the evolutionary rate estimates are quite similar for the Victoria-like and Yamagata-like
strains, the non-synonymous estimates for the Yamagata-like isolates are almost double the
non-synonymous estimates for the Victoria-like isolates. Therefore the Yamagata-like strains
have a higher substitution rate.
Much like for HA, the NA substitution estimates for the Yamagata-like isolates (1.56 x 10^-3
non-synonymous substitutions/site/year and 8.84 x 10^-3 synonymous substitutions/site/year)
indicate that the Yamagata-like lineages have double the evolutionary rate of the Victoria-like
isolates. According to Domingo et al., the Yamagata-like lineages have higher evolutionary rates
than the Victoria-like lineages, which coincides with the data found in this study [26].
CONCLUSION
It was the purpose of this study to determine the evolutionary rates, phylogenetic
lineages, and glycosylation sites of the neuraminidase surface protein of influenza B virus. This
study enabled a more thorough understanding of the influenza B viruses that are infecting the
Egyptian population. Full genome sequencing and analysis of the isolates should be conducted
to determine the genetic makeup of the entire virus. It is our hope that this study will encourage
other neighboring countries to conduct similar studies on influenzaviruses to better understand
the virus’ circulation and evolution globally.
REFERENCES
1. Influenza Book | Virology of Human Influenza [Internet] [cited 2011 12/27/2011]. Available
from: http://influenzareport.com/ir/virol.htm.
2. Whittaker GR. Intracellular trafficking of influenza virus: Clinical implications for molecular
medicine Expert Reviews in Molecular Medicine 2004; 2001;3(05).
3. Coleman R. The PB1-F2 protein of influenza A virus: Increasing pathogenicity by disrupting
alveolar macrophages. Virology Journal 2007;4(9):56-61.
4. The 2.2 A resolution crystal structure of influenza B neuraminidase and its complex with sialic
acid. [Internet] [cited 2011 12/9/2011];available from:
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC556424.
5. Holmes E, Ghedin E, Miller N, Taylor L, Bao Y, St George K, Grenfell B, Salzberg S, Fraser
C, Lipman D, et al. Whole-genome analysis of human influenza A virus reveals multiple
persistent lineages and reassortment among recent H3N2 viruses. PLoS Biol 2005
september;3(9):e300.
6. Nakagawa Y, Oda K, Nakada S. The PB1 subunit alone can catalyze cRNA synthesis, and
the PA subunit in addition to the PB1 subunit is required for viral RNA synthesis in
replication of the influenza virus genome. J Virol 1996 september;70(9):6390-6394.
7. Lu G, Rowley T, Garten R, Donis RO. FluGenome: A web tool for genotyping influenza A
virus Nucleic Acids Res 2007 Jul;35(Web Server issue):W275-9.
8. Air GM, Laver WG. The neuraminidase of influenza virus Proteins: Structure, Function, and
Genetics 1989;6(4):341-356.
9. Connaris H, Takimoto T, Russell R, Crennell S, Moustafa I, Portner A, Taylor G. Probing the
sialic acid binding site of the hemagglutinin-neuraminidase of newcastle disease virus:
Identification of key amino acids involved in cell binding, catalysis, and fusion J Virol
2002;76(4):1816-1824.
10. Chen BJ, Leser GP, Jackson D, Lamb RA. The influenza virus M2 protein cytoplasmic tail
interacts with the M1 protein and influences virus assembly at the site of virus budding J
Virol 2008 Oct;82(20):10059-70.
11. Chen R, Holmes E. The evolutionary dynamics of human influenza B virus. J Mol Evol 2008
june;66(6):655-663.
12. Li S, Schulman J, Itamura S, Palese P. Glycosylation of neuraminidase determines the
neurovirulence of influenza A/WSN/33 virus. J Virol 1993;67(11):6667-6673.
13. Phylogenetic analysis reveals the global migration of seasonal influenza A viruses PLoS
Pathogens [Cited 2011 12/9/2011];available from:
http://www.plospathogens.org/article/info:doi/10.1371/journal.ppat.0030131.
14. Hunt R. The epidemiology of the influenza virus. Microbiology and Immunology
2005;16:32-34.
15. Grassly N, Fraser C. Seasonal infectious disease epidemiology Proc Biol Sci 2006 October
7;273(1600):2541-2550.
16. McCullers J. Insights into the interaction between influenza virus and pneumococcus Clin
Microbiol Rev 2006 July;19(3):571-582.
17. Nicholson K, Wood J, Zambon M. Influenza Lancet 2003 November 22;363(9397):1733-1745.
18. Lofgren E, Fefferman N, Naumov Y, Gorski J, Naumova E. Influenza seasonality:
Underlying causes and modeling theories. J Virol 2006;81(11):5429-5436.
19. Besselaar T, Botha L, McAnerney L, Schoub BD. Phylogenetic studies of influenza B
viruses isolated in southern Africa: 1998-2001. Virus Res 2004 july;103(1-2):61-66.
20. Chi X, Bolar, TV., Zhao, P., Rappaport R, Cheng S. Cocirculation and evolution of two
lineages of influenza B viruses in europe and israel in the 2001-2002 season. J Clin
Microbiol 2003;41(12):5770-5773.
21. Squires B, Macken C, Garcia-Sastre A, Godbole, S., Noronha, J., Hunt V, Chang R, Larsen
C, Klem E, Biersack K, et al. BioHealthBase: Informatics support in the elucidation of
influenza virus host pathogen interactions and virulence. Nucleic Acids Res 2008
january;36(Database Issue):D497-503.
22. Lindstrom S, Hiromoto Y, Nerome R, Omoe K, Sugita S, Yamazaki Y, Takahashi T, Nerome
K. Phylogenetic analysis of the entire genome of influenza A (H3N2) viruses from japan:
Evidence for genetic reassortment of the six internal genes. J Virol 1998;72(10):8021-8031.
23. ElDin D. Molecular characterization and evolution of hemagglutanin of influenza B viruses
in Egypt from 1998-2008. Fall 2009.
24. Areej M, Fatma U, Mutasem O. Combining docking, scoring, and molecular field analysis to
probe influenza neuraminidase ligand interactions. Journal of Molecular Graphics and
Modelling 2007;26:443-456.
25. Schweiger B. Molecular characterization of human influenza viruses-- a look back on the last
10 years. Berl Munch Tierarztl Wochenschr 2006;119(3-4):167-178.
26. Domingo E, Holland JH, Webster R. Origin and evolution of viruses San Diego, Calif. ;
Academic, c1999.
TABLES AND FIGURES
Figure Legends
Figure 1 – Map of Egypt labelled with affiliated hospitals
Locations of the outpatient clinics (in hospital) from which the samples were collected within
Egypt. 5 of the 10 sites are in the Cairo area due to the high population density (Monira General
Hospital, Helwan General Hospital, El-Gabarty Fever Hospital, and Kitchner General Hospital)
with the 5th point representing NAMRU-3 (purple). Below the map of Egypt is a list that
indicates the number of isolates per year and per collection site.
Figure 2 – Prevalence of Yamagata-like isolates and Victoria-like isolates by year
The percentage of Yamagata-like and Victoria-like viruses was plotted by year to show the
trends in the circulation patterns of the two lineages over the time course examined.
Figure 3 – Phylogenetic tree of influenzavirus B Neuraminidase gene
Neighbor-Joining tree of 107 isolates of influenzavirus B and 16 reference strains. The tree is
rooted to the reference strain; B/Lee/1940 and B/Victoria/87 and B/Yamagata/88 are included to
delineate the two lineages.
Figure 4 - Circular representation of NA phylogeny color-coded according to distribution
from 1999-2008
Circular phylogenetic tree of influenzavirus B isolates and reference strains to discriminate
clades and make comparisons. Each year is a different color with reference strains marked in
purple.
Figure 5 – Circular representation of NA phylogeny color-coded by collection site.
Circular phylogenetic tree of the influenzavirus B isolates and reference strains to discriminate
clades and make comparisons, color-coded by the 9 collection sites as displayed in Figure 1.
Figure 6 - Entropy H(x) Plot of the NA gene translated sequences
Along the x-axis are the amino acid positions. The height of the peak represents the amount of
variation. Positions with peaks are called non-zero entropy positions and positions with no
peaks are conserved residues and are called zero entropy positions. From this entropy plot
conservation of glycosylation sites can be examined (Table 6).
Figure 7 – Rate of Nucleotide Substitutions in Neuraminidase gene
Along the x-axis is the year of collection and the y-axis is the rate of substitution. Reference
strains were included in the analysis to compare to the isolates in this study.
Figures
Figure 1a – Map of Egypt labelled with affiliated hospitals.
1. Alexandria Fever Hospital
2. Monira General Hospital
3. Domiatta Fever Hospital
4. Sharqeya Fever Hospital
5. Helwan Fever Hospital
6. El-Gabarty Fever Hospital
7. Kitchner General Hospital
8. Menia Fever Hospital
9. Aswan Fever Hospital
10. NAMRU-3
Figure 1b: List of the number of isolates per year per collection site.
Year   Location                      Number of isolates per location   Total
1999   Alexandria Fever Hospital     7                                 11
       Helwan Fever Hospital         4
2000   Alexandria Fever Hospital     6                                 12
       Helwan Fever Hospital         4
       El Gabarty Fever Hospital     2
2001   Alexandria Fever Hospital     1                                 9
       Monira Fever Hospital         2
       Kitchner Fever Hospital       4
       El Gabarty Fever Hospital     2
2002   Alexandria Fever Hospital     10                                15
       Helwan Fever Hospital         2
       Monira Fever Hospital         3
2003   Alexandria Fever Hospital     7                                 7
2004   Alexandria Fever Hospital     8                                 12
       Helwan Fever Hospital         1
       El Gabarty Fever Hospital     1
       Kitchner Fever Hospital       2
2005   Alexandria Fever Hospital     6                                 15
       Helwan Fever Hospital         5
       El Gabarty Fever Hospital     2
       Kitchner Fever Hospital       2
2006   Aswan Fever Hospital          6                                 7
       Monira Fever Hospital         1
2007   Domiatta Fever Hospital       3                                 13
       Menia Fever Hospital          2
       Alexandria Fever Hospital     2
       Monira Fever Hospital         3
       El Gabarty Fever Hospital     2
2008   Sharqeya Fever Hospital       2                                 6
       Alexandria Fever Hospital     1
       Menia Fever Hospital          3
Figure 2 – Prevalence of Yamagata-like Isolates and Victoria-like Isolates by Year
Figure 3 – Phylogenic Tree of Influenzavirus B Neuraminidase Gene
Figure 4 – Circular representation of NA Phylogeny color-coded according to distribution from 1998-2008
[Circular phylogenetic tree; tip labels are not recoverable from the extracted text. Color legend: Reference Strains, 1998/1999, 1999/2000, 2000/2001, 2001/2002, 2002/2003, 2003/2004, 2004/2005, 2005/2006, 2006/2007, 2007/2008. Scale bar: 0.05.]
Figure 5 – Circular Representation of NA phylogeny color-coded by collection site
20
06
NA
Inf
05 900
B
20
90
1
N
05
Inf
A
00 59
B
90
20
8
NA
05
92 22 6 0
90
20
07
Inf
5
86
BN
05
7
90
07
7
A2
[Figure: phylogenetic tree of InfB NA isolate sequences with reference strains (for example B/Harbin/07/94 and B/Mexico/84/2000); individual tip labels are not recoverable from the extracted text. Scale bar: 0.05. Legend categories:]
Reference Strains
NAMRU-3
Menia Fever Hospital
Helwan Fever Hospital
Sharqeya Fever Hospital
Domiatta Fever Hospital
Aswan Fever Hospital
Kitchner General Hospital
Alexandria Fever Hospital
El-Gabarty Fever Hospital
Monira General Hospital
Figure 6 – Entropy H(x) Plot of the NA gene translated sequences
Figure 7 - Rate of Nucleotide Substitutions in Neuraminidase gene
[Legend: three series of lineage strains; the lineage names are not recoverable from the extracted text.]
Table 1 – Influenza virus segments 1-8.
Segment   Size (Kb)   Name        Coded for:
1         2.341       PB2         Transcriptase: cap binding
2         2.341       PB1         Transcriptase: elongation
3         2.233       PA          Transcriptase: protease activity (more research is required)
4         1.778       HA          Hemagglutinin
5         1.565       NP          Nucleoprotein: RNA binding, part of transcriptase complex, nuclear transport of vRNA
6         1.413       NA          Neuraminidase
7         1.027       M1 & M2     Matrix protein and ion channel
8         0.890       NS1 & NS2   Non-structural: nucleus and cytoplasm (other unknown functions)
Table 4 – Glycosylation motifs in NA.
PPsearch was used to identify the glycosylation sites found in the 107 samples.
Motif ID: ASN_GLYCOSYLATION
Expression: N-{P}-[ST]-{P}
Start Position   End Position   Comments
8-9              11-12          1998-2008
13               16             1998-2008
53-54            57-58          1998-2008
99               102            1998-2008
162-163          165-166        1998-2008
255-256-257      258-259-260    Yamagata-like only
260-261          263-264        1998-2008
338-339          341-342        2000
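The motif in Table 4 can be illustrated with a short scan over a protein sequence. The following is a minimal sketch, not PPsearch itself: it assumes the PROSITE-style expression N-{P}-[ST]-{P} can be written as a Python regular expression, and the function name and the toy sequence in the usage note are hypothetical, chosen only for illustration.

```python
import re

# N-{P}-[ST]-{P}: Asn, any residue except Pro, Ser or Thr, any residue except Pro.
# The lookahead makes overlapping motifs visible (assumption: overlaps should be reported).
MOTIF = re.compile(r"(?=(N[^P][ST][^P]))")


def find_glycosylation_sites(protein):
    """Return (start, end, motif) tuples with 1-based, inclusive positions."""
    return [
        (m.start() + 1, m.start() + 4, m.group(1))
        for m in MOTIF.finditer(protein)
    ]
```

Calling `find_glycosylation_sites("MNGSAANPTSNNTTK")` on this made-up sequence reports the motifs NGSA, NNTT and NTTK and skips NPTS, because proline follows the asparagine. Applied to each translated NA sequence, a scan of this kind would yield start and end positions comparable to those tabulated above.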
Table 5 - Non-zero entropy positions in correlation with the antigenic sites of Neuraminidase
Antigenic site (loop)   Loop range     Amino Acid Position   Entropy H(x)
Loop 150                140-150        148                   0.32508
Loop 200                199-210        200                   0.32508
Loop 350                344-350        350                   0.33508
Loop 370                367-375        367                   0.32508
Loop 370                367-375        373                   0.44702
Loop 400                (not given)    398                   0.32508
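The entropy values in Table 5 are per-position measures of variability across the aligned sequences. As a minimal sketch, assuming the standard Shannon entropy H(x) = -Σ p·ln(p) over the residue frequencies in each alignment column (the natural logarithm and the function names are assumptions, not details confirmed by the thesis), the calculation could look like this:

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy of one alignment column (natural log, an assumption)."""
    counts = Counter(column)
    total = len(column)
    return sum(-(n / total) * math.log(n / total) for n in counts.values())


def alignment_entropy(sequences):
    """Entropy for every column of equal-length aligned sequences."""
    return [column_entropy(col) for col in zip(*sequences)]
```

A fully conserved column gives an entropy of zero, which is the property used for the receptor binding positions in Table 6, while a column split evenly between two residues gives ln 2. Positions with non-zero values are the ones compared against the antigenic loops above.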
Table 6 - Zero Entropy Positions in Correlation with Receptor Binding Sites
Neuraminidase Conservative Positions   Position
Glu-119                                119
Arg-156                                158
Trp-178                                178
Ser-179                                180
Asp-198                                198
Ile-222                                225
Glu-227                                227
Asp-293                                293
Glu-425                                426
Table 7 – List of Abbreviations
Abbreviation   Meaning
Ala            Alanine
Arg            Arginine
Asn            Asparagine
CPE            Cytopathic effect
DMEM           Dulbecco’s Modified Eagle Medium
DNA            Deoxyribonucleic Acid
dNTP           Deoxyribonucleotide triphosphate
Glu            Glutamic Acid
HA             Hemagglutinin
HAI            Hemagglutination inhibition assay
Ile            Isoleucine
Leu            Leucine
NAMRU-3        Naval Medical Research Unit 3
M              Matrix
M1             Matrix 1
M2             Matrix 2
MDCK cells     Madin Darby Canine Kidney cells
Met            Methionine
N              Nucleocapsid
NA             Neuraminidase
NS1            Nonstructural protein 1
NS2            Nonstructural protein 2
PA             Polymerase acidic protein
PB1            Polymerase basic protein 1
PB2            Polymerase basic protein 2
RNA            Ribonucleic Acid
RT-PCR         Reverse transcription polymerase chain reaction
Ser            Serine
Trp            Tryptophan
VTM            Viral transport medium
Val            Valine
WHO            World Health Organization