The American University in Cairo
School of Sciences and Engineering

Molecular Characterization of Neuraminidase Surface Proteins of Influenzavirus B Isolates from Egypt, 1999-2008

Biotechnology Department, Master's Program

Nadean Safwan Khedr, Bachelor of Science

Under the supervision of Dr. Rania Siam and Dr. Anne M. Gaynor

Khedr, 1

ACKNOWLEDGEMENTS

The work for this thesis was performed at the US Naval Medical Research Unit No. 3, Cairo (NAMRU-3), in coordination with the Biotechnology Department of the American University in Cairo. I would like to thank Dr. Rania Siam (AUC advisor and professor) for giving me the chance to be part of this research project and for guiding me. I would also like to thank Dr. Anne M. Gaynor (NAMRU-3 advisor) and Dr. Claire Corenlius for helping me with my practical lab work, providing me with laboratory space, and giving me the opportunity to complete my work at NAMRU-3. I would like to thank the Egyptian Ministry of Health and Population, which is responsible for all of the collection sites. I would also like to express my gratitude to Mr. Diaa El-Din, who single-handedly taught me everything I needed to know to complete my laboratory research. I would like to thank Mr. Sherif El Fayoumy for his input on my molecular characterization analysis, and finally Ms. Yasmine Mustafa for helping me with all of the molecular characterization and computational analysis for this research.

ABSTRACT

Influenzaviruses affect populations worldwide and pose a particular threat to at-risk groups such as the elderly, children, pregnant women, and people with underlying respiratory ailments. Vaccination is an effective tool for preventing influenzavirus infection. However, because influenzaviruses change rapidly through antigenic drift and shift, vaccines must be updated every year in an effort to control epidemics.
The mutations that affect vaccine development occur mainly in the virus's two surface proteins, hemagglutinin and neuraminidase. Studying the sites and rates of mutation of influenzaviruses is therefore important for understanding how the virus is evolving. Over the last ten years, influenzaviruses have been isolated at NAMRU-3 from samples collected through surveillance projects in Egypt. These isolates are frequently sequenced to analyze three important genes, those of the two surface proteins HA (hemagglutinin) and NA (neuraminidase) and the M (matrix) gene, because together they provide most of the information that informs vaccine development and antiviral treatment. This study focuses on sequencing and analysis of the neuraminidase gene of influenzavirus B isolates. These data also allow us to examine the circulation patterns of influenzavirus B and to compare them with data from a similar HA sequencing effort.

TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
INTRODUCTION
  Classification, Structure, and Subtypes
  Virus Life Cycle
  Role of Neuraminidase
  Clinical Disease and Epidemiology
  Vaccines and Mutation
INFLUENZAVIRUS STUDIES
  Influenza B Lineages
  Objectives
MATERIALS AND METHODS
  Sample Collection
  Virus Isolation
  Molecular Testing of Virus Isolates
  Phylogenetic Analysis
  Entropy Analysis
RESULTS
  Sample Selection and Sequencing
  Influenzavirus B Lineages
  Phylogenetic Analysis
  Amino Acid Variation of NA Sequences
  Neuraminidase Glycosylation Sites
  Neuraminidase Antigen Binding Sites
  Evolutionary Substitution Rates
DISCUSSION
CONCLUSION
REFERENCES
TABLES AND FIGURES
  Figure Legends
  Figures
  Tables

LIST OF FIGURES
Figure 1 – Map of Egypt labelled with affiliated hospitals
Figure 2 – Prevalence of Yamagata-like and Victoria-like isolates by year
Figure 3 – Phylogenetic tree of the influenzavirus B neuraminidase gene
Figure 4 – Circular representation of the NA phylogeny color-coded according to distribution from 1998-2008
Figure 5 – Circular representation of the NA phylogeny color-coded by collection site
Figure 6 – Entropy H(x) plot of the translated NA gene sequences
Figure 7 – Rate of nucleotide substitutions in the neuraminidase gene

LIST OF TABLES
Table 1 – Influenzavirus segments 1-8
Table 2 – Comparison of amino acid sequences of the Yamagata-like isolates with B/Yamagata/88
Table 3 – Comparison of amino acid sequences of the Victoria-like isolates with B/Hong Kong/548/2000, B/Brisbane/32/2002, and B/Johannesburg/69/2001
Table 4 – Glycosylation motifs in NA
Table 5 – Non-zero entropy positions in correlation with the antigenic sites of neuraminidase
Table 6 – Zero entropy positions in correlation with receptor binding sites
Table 7 – List of abbreviations

INTRODUCTION
Seasonal influenza is an acute viral infection caused by influenzaviruses, of which there are three types: A, B, and C. Together they cause significant morbidity and mortality worldwide. The viruses are well adapted to their human hosts and mutate quickly enough to evade the host immune system. To better understand the circulation of influenzaviruses, the WHO runs a global program, the "Global Influenza Surveillance and Response System," which was established in 1952 and comprises over 136 National Influenza Centers worldwide. As part of this system, Egypt, through both the Ministry of Health and NAMRU-3, has participated in influenza surveillance for more than ten years.

Classification, Structure and Subtypes
Influenzaviruses are segmented, negative-strand RNA viruses belonging to the family Orthomyxoviridae. There are three genera of influenzaviruses: Influenzavirus A, Influenzavirus B, and Influenzavirus C. Influenzavirus A is divided into subtypes based on its two surface proteins, HA (hemagglutinin) and NA (neuraminidase), e.g., H1N1, H3N2, and H5N1. In total, 16 HA subtypes and 9 NA subtypes are used to classify Influenzavirus A. Influenzavirus B also has 8 genomic segments but is not usually subtyped, although it is described as belonging to either the Victoria or the Yamagata lineage. Each of the 8 genomic segments of influenzaviruses A and B encodes 1-2 proteins and is coated with nucleoprotein (NP) (Table 1); collectively, the segments are approximately 13.6 kb in size1. The virions are pleomorphic, though most commonly spherical, with an average diameter of 100 nm2.

Virus Life Cycle
The influenzavirus undergoes a lytic infectious cycle with five stages: attachment, penetration, replication, maturation, and lysis of the cell2.
Influenzaviruses carry approximately five hundred surface projections of HA and NA proteins. These projections enable the virus to bind to the host cell and be engulfed by its surface membrane. Hemagglutinin binds the receptor, sialic acid, on the host cell surface and is crucial to the virus's ability to fuse with the host membrane and infect the cell1. Neuraminidase cleaves sialic acid residues, including those on the hemagglutinin glycoprotein, to allow entry. The other major protein involved in entry is M2, a matrix protein that forms an ion channel and increases the permeability of the virus core to ions. This protein and its function are crucial to the virus's ability to attach to and penetrate the epithelial cells of the respiratory tract3. The M2 ion channel protein has a cytoplasmic tail that interacts with the M1 protein, allowing the virus to undergo assembly and budding4. The virus manipulates the cellular environment by hijacking cellular machinery and using its own proteins to replicate itself. The polymerase basic 2 (PB2), polymerase basic 1 (PB1), and polymerase acidic (PA) proteins together form an RNA-dependent RNA polymerase that is responsible for both replication and transcription of viral RNA5. This polymerase lacks proofreading activity, which results in a high frequency of point mutations. NS1 and NS2 aid the virus's ability to penetrate and replicate by manipulating host pathways and machinery. The negative-sense viral genome (vRNA) is copied into two types of positive-sense RNA: mRNA, which is translated into viral proteins, and complementary RNA (cRNA), which serves as the template for the synthesis of more vRNA6. The newly synthesized vRNA assembles with the nucleoprotein and other viral proteins and buds from the cell membrane, where it acquires its envelope. The budding virion remains attached to the cell through interactions with HA, and NA is required to cleave this interaction to release free infectious virions7.
Role of Neuraminidase
Neuraminidase is a viral surface glycoprotein that is required for the proper entry and release of influenzaviruses during infection. It is a tetramer of four identical subunits, anchored to the virus through a 29-amino-acid N-terminal tail3, 8. Neuraminidase is a sialidase that liberates sialic acid from the host's glycoconjugates, which in turn destroys the receptors for the virus. It cleaves the α-ketosidic linkage between the terminal sialic acid (Neu5Ac) residue and the neighboring saccharide, which is usually galactose9. The active site of this enzyme lies in a pocket of the protein that is lined with conserved amino acids. The neuraminidase gene is composed of 1451 bp, of which the first 21 base pairs account for a nonbinding region10, 11. Studies have shown that glycosylation of neuraminidase is vital to the virulence of the influenzavirus. Li et al. found that four asparagine (Asn) residues are glycosylated, the most conserved of which (among influenzaviruses A and B) is at position 146 of the protein12. This glycosylation site carries a complex sugar containing N-acetylgalactosamine, which is not found at any of the other three Asn residues or on the hemagglutinin proteins12.

Clinical Disease and Epidemiology
Influenza is a global disease that affects approximately 20% of children and 5% of adults worldwide13. In the northern hemisphere, seasonal influenzavirus infections usually occur from October through March, with a peak in February14. Influenzavirus is usually transmitted through air droplets, which makes the disease extremely contagious, and its progression is rapid because the replication cycle begins within six hours of penetration of the mucosal cells15. Influenza can cause a variety of symptoms, such as cough, fever of 39-41 °C, and sore throat.
Influenza has a low mortality rate of 0.1%; however, it can be accompanied by secondary infections, for example with Haemophilus influenzae16, and can cause complications ranging from severe respiratory syndromes and disorders affecting the lung, heart, brain, liver, kidneys, and muscles to primary viral and secondary bacterial pneumonia17. Those most at risk of developing complications from influenza are children aged 6-59 months, persons more than 50 years of age, pregnant women, hospitalized patients, immunocompromised patients, and persons with severe malnutrition14. The economic impact of the disease places a significant burden on the healthcare system, particularly during pandemics, when productivity can decline17.

Vaccines and Mutation
The single most effective means of preventing influenzavirus infection is vaccination. To develop an appropriate vaccine, the WHO and its collaborating centers meet twice a year to evaluate which viral strains are currently circulating and whether they have mutated from the viruses included in the previous version of the vaccine14. Influenzaviruses regularly mutate under pressure from the hosts' immune response. These continual small changes are known as antigenic drift and can cause sporadic outbreaks and limited seasonal epidemics18. Influenzaviruses also have a second mechanism for generating change: reassortment of their 8 segments. This can only occur when two different viruses infect the same host at the same time. A clear example is the most recent influenza A H1N1 (2009) virus, which arose when at least two different viruses exchanged segments, producing a new virus that was able to cause a global pandemic. This is known as antigenic shift: the viruses' antigens are reassorted and new strains begin to circulate within the population7.
Constant study and surveillance of influenzaviruses are necessary to anticipate the next major epidemic or pandemic and to develop successful vaccines.

INFLUENZAVIRUS STUDIES
Influenzavirus B Lineages
Previous molecular studies have shown that circulating influenzavirus B strains can be divided into two distinct phylogenetic lineages: the Victoria lineage (influenzavirus B/Victoria/2/87) and the Yamagata lineage (influenzavirus B/Yamagata/16/88)19, 20. In both the Victoria-like and the Yamagata-like sequences, neuraminidase is composed of 1511 nucleotides and 466 amino acid residues. Reassortment between circulating strains and insertions/deletions are thought to be the strategies by which influenzavirus B evolves21. In 2002, 105 influenzavirus B specimens from Belgium, Finland, Spain, Israel, and China were characterized by sequencing the HA and NA genes to map out lineages in these regions20. The study found that 96.2% belonged to the B/Victoria/2/87 lineage, while the remainder (3.8%) belonged to the B/Yamagata/16/88 lineage. Further analysis showed that a single influenzavirus B can carry an HA from one lineage and an NA from another. The B/Yamagata/16/88-lineage viruses showed significant antigenic drift in the hemagglutinin protein, while the B/Victoria/2/87-lineage viruses could be divided into two further lineages: B/Hong Kong/1351/02-like (72.3%) and B/Hong Kong/330/01-like (27.7%). The two sublineages differed in the origin of their surface protein genes: the B/Hong Kong/1351/02-like viruses had a hemagglutinin gene from the B/Victoria/2/87 lineage and a neuraminidase gene from the B/Yamagata/16/88 lineage, whereas in the B/Hong Kong/330/01-like viruses both the hemagglutinin and neuraminidase genes belonged to the B/Victoria/2/87 lineage.
The authors also found that, although the B/Hong Kong/330/01-like viruses carried HA and NA from the same lineage, B/Yamagata/16/88-like neuraminidase genes were detected during sequencing, most likely because of reassortment between B/Hong Kong/330/01 and B/Hong Kong/1351/02 viruses during co-infection of hosts. The purpose of that study was to determine, through phylogenetic analysis, the co-infection of Victoria-like and Yamagata-like strains in Israel and China. It revealed these new strains at the end of 2002, which supports the notion that influenzavirus B continues to evolve through antigenic shift and drift20. The study also included a substitution rate analysis, which found notable variations among the isolates that the authors speculate contribute to altered viral antigenicity; they further speculate that these substitutions may later become glycosylation sites. These results further the understanding of influenzavirus B in those regions, whereas such understanding is lacking for influenzavirus B strains in Egypt.

A study on seasonal influenzavirus A was conducted by Nelson et al. in 2007 on H3N2 strains from 1999 to 2005 13. A total of 487 isolates were collected from Australia and New Zealand to represent the southern hemisphere, and 413 isolates were collected from New York, USA, to represent the northern hemisphere. Phylogenetic analysis of the full genomes of these 900 isolates showed that global migration contributes significantly to influenzavirus A epidemics, and global circulation patterns were mapped out. The study also found that influenzavirus A migrates during non-epidemic periods instead of persisting at low levels locally during what is considered the influenza "off season"5. From this study, the authors were able to determine the genesis of new clades and the spread of novel virus variants.
The circulation patterns of influenzavirus A proved useful for determining wide-scale migration patterns. More studies on the influenzaviruses circulating in the Egyptian population are needed.

A similar study was conducted in Japan in 1998 by Lindstrom et al., in which influenzavirus A isolates from 1993-1997 were phylogenetically analyzed to determine evolutionary pathways and rates. The study covered the HA and NA genes as well as the other segments of the influenza virus. The amino acid sequences were also analyzed, and the changes that accumulated in them were found to be correlated with time. Overall, the results indicated that the glycoproteins evolved at a faster rate than the other proteins of the influenzavirus22. Whole-genome studies on influenzavirus strains in Egypt are needed. As the studies mentioned above show, phylogenetic and evolutionary analyses have been performed on circulating strains in Australia, Asia, Europe, and North America. These studies further our understanding of the influenza virus, its circulation patterns, lineages, and evolutionary rates. However, most of them focus on influenzavirus A rather than influenzavirus B. More studies on influenzavirus B should be conducted, since it also co-circulates and contributes to seasonal influenza epidemics. Moreover, very few phylogenetic studies of influenzavirus B were found for Northern Africa.

Objectives
Studies have been conducted around the world on the influenzavirus strains indigenous to each region to map out lineages and circulation, and many of them have used molecular characterization of sequencing data to detect conserved and non-conserved regions of the influenzavirus surface proteins. These types of studies have yet to be performed in Egypt.
The objective of this study is to fill this gap. The evolution of the influenzavirus B neuraminidase surface protein will be examined by sequencing the NA gene of isolates collected in Egypt. Phylogenetic trees based on the NA gene sequence will be constructed to examine and analyze the circulation patterns of influenzavirus B in Egypt. This study will also compare these results with sequencing and phylogenetic data from a previous study of the HA gene of the same isolates, to build an overall picture of the influenzavirus B strains circulating in Egypt from 1998-2008.

MATERIALS AND METHODS
Sample Collection
The samples used in this study were collected between 1999 and 2008 from 10 hospitals located throughout Egypt: Alexandria Fever Hospital, Monira General Hospital (Monira), Domiatta Fever Hospital (Domiat), Sharqeya Fever Hospital (Sharqeya), Helwan Fever Hospital (Helwan), El-Gabarty Fever Hospital (Mokatam, Cairo), Kitchner General Hospital (6th of October, Cairo), and Menia Fever Hospital (Menia). Oropharyngeal (OP) swabs were collected from patients who presented with influenza-like illness (cough, fever of 39-41 °C, and sore throat). The swabs were placed in 1 ml of virus transport medium (VTM) (10 g veal infusion broth, 2 g bovine albumin fraction V, 400 ml sterile distilled water, 0.8 ml gentamicin, and 8 ml fungizone), immediately placed in liquid nitrogen, and transported to NAMRU-3 for testing.

Virus Isolation
An aliquot of each sample was inoculated onto MDCK cells. The cells were cultured in 500 ml of DMEM supplemented with 55 ml fetal bovine serum, 5.5 ml penicillin-streptomycin, 5.5 ml L-glutamine, and 4 ml fungizone. Following inoculation, the cells were incubated at 37 °C with 5% CO2 and observed daily for 10-14 days for cytopathic effect (CPE). Specimens positive for CPE were then identified by hemagglutination inhibition assay (HAI) using the WHO Influenza Reagent Kit.
In total, 107 influenzavirus B isolates were identified and selected for further characterization. This step was performed by El Din, D.23

Molecular Testing of Virus Isolates
Viral RNA was extracted from 140 µl of each influenzavirus B isolate using the QIAamp Viral RNA Mini Kit (QIAGEN). Two-step RT-PCR was used to amplify the necessary regions. In the first step, 4 µl of RNA template was added to 1.5 µl each of the forward and reverse primers and incubated for 5 min at 97 °C to increase the amplification yield. Three reactions were then set up as follows (all primers from Eurofin):
Reaction 1: AO-NA1F (AGCAGAAGCAGAGCATCTTC) with AO-NA1R (AACGAGGGTATGTCCACTCC)
Reaction 2: AO-NA2F (TATATCGCAGTTGATGG) with AO-NA2R (GCTTCCATCATYTGGTCTGG)
Reaction 3: AO-NA3F (GCTACCTTCAACTATACAAACG) with AO-NA3R (AGTAGTAACAAGAGCATTTTTC)
For the second step of the RT-PCR, the rest of the reaction mix from the Q1 step kit (QIAGEN) was added: 10 µl of 5X buffer, 2 µl of dNTPs, 2 µl of enzyme mix, and 29 µl of H2O, for a total volume of 50 µl. The reactions were then amplified using the following cycling conditions: 30 min at 50 °C; 15 min at 95 °C; 35 cycles of 30 s at 94 °C, 30 s at 50 °C, and 1 min at 72 °C; a final 10 min at 72 °C; and a hold at 4 °C until use. After amplification, the products of the three RT-PCR reactions were run on a 2% agarose gel to determine the size of the band produced by each of the 3 primer sets. The bands were purified using the Qiagen Gel Purification Kit following the manufacturer's instructions. The gel-extracted amplicons were then set up for cycle sequencing in a 96-well plate: 4 µl of BigDye v3.0 (Applied Biosystems), 4 µl of 5X BigDye buffer, 1 µl of primer, 10 µl of H2O, and 2 µl of template (total volume 21 µl), with thermocycling conditions of 25 cycles of 96 °C for 10 s, 50 °C for 5 s, and 60 °C for 4 min.
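The reverse primer AO-NA2R contains the IUPAC ambiguity code Y (C or T), so in practice it is a mixture of two oligonucleotides. The following Python sketch, assuming only the standard IUPAC nucleotide codes, expands a degenerate primer into its concrete sequences and reports GC content and a rough Wallace-rule melting temperature. The helper names are hypothetical, and the Wallace rule (2 °C per A/T, 4 °C per G/C) is only a rule of thumb for short oligonucleotides, not necessarily how these primers were designed.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_degenerate(primer: str) -> list[str]:
    """Return every concrete oligonucleotide encoded by a degenerate primer."""
    pools = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*pools)]


def gc_content(seq: str) -> float:
    """Fraction of G and C in a concrete (non-degenerate) sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Rough melting temperature in °C by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    seq = seq.upper()
    return 2 * (seq.count("A") + seq.count("T")) + 4 * (seq.count("G") + seq.count("C"))


# Hypothetical lookup table; the sequences are copied from the Methods.
PRIMERS = {
    "AO-NA1F": "AGCAGAAGCAGAGCATCTTC",
    "AO-NA2R": "GCTTCCATCATYTGGTCTGG",
}

if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        for variant in expand_degenerate(seq):
            print(name, variant, f"GC={gc_content(variant):.2f}", f"Tm~{wallace_tm(variant)} °C")
```

Expanding the primer before computing GC content or Tm matters because each variant in the mixture anneals slightly differently; more careful Tm estimates (for example, nearest-neighbor models) would be needed for actual primer design.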
After the BigDye reaction, the samples were purified again for 30 minutes using XTerminator (Applied Biosystems), with 20 µl of XTerminator added to each well of the 96-well plate. After XTerminator purification, the labeled products were loaded onto the sequencer (ABI 3730 DNA Sequencer, Applied Biosystems).

Phylogenetic Analysis
For each isolate, 6 sequences were obtained; they were aligned using CodonCode Aligner and edited using BioEdit. For each isolate, the overlapping sequences were identified and assembled using BioEdit. To examine the similarity of all of the sequenced isolates, a multiple sequence alignment of the 107 assembled sequences was performed using ClustalW. The alignment was then examined in Mega 5.0 and used to construct phylogenetic trees. The sequences of the 107 isolates were analyzed using the maximum likelihood method with bootstrapping19. The best-fitting model was chosen using Mega 5.0's Model Test, and the tree was constructed using the PhyML algorithm. Neighbor-joining phylogenetic trees were also constructed, with distances estimated under the maximum-likelihood model. The samples were grouped according to the clusters and clades indicated by the phylogenetic analysis. While constructing these trees, the ancestral linear regression was estimated by including the Victoria-like and Yamagata-like reference strains, along with the first influenzavirus B (B/Lee/1940), in the alignment9. Substitution rates were determined by comparing sequences with rooted reference strains such as B/Victoria/02/87 and B/Yamagata/16/88. This method, called root-to-tip regression, first estimates a rooted phylogeny and then performs a linear regression between the time of sampling and the genetic distance from the root, which is the sum of the reconstructed branch lengths.
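To make the root-to-tip idea concrete, the Python sketch below fits an ordinary least-squares line of root-to-tip distance against sampling year. The slope estimates the substitution rate (substitutions/site/year), and the year at which the fitted line reaches zero distance gives a crude date for the root. It also includes the simpler per-isolate calculation v = k/t used in this study. The function names and the example years and distances are hypothetical placeholders, not values from this thesis; in practice, the distances would be read from a rooted tree such as one built in Mega 5.0.

```python
import math


def root_to_tip_regression(years: list[float], distances: list[float]) -> tuple[float, float, float]:
    """Least-squares fit of root-to-tip distance (substitutions/site) on sampling year.

    Returns (rate, intercept, root_date). The slope `rate` estimates substitutions/site/year;
    `root_date` is the x-intercept (NaN if there is no positive temporal signal).
    """
    n = len(years)
    if n != len(distances) or n < 2:
        raise ValueError("need at least two (year, distance) pairs of equal length")
    mean_t = sum(years) / n
    mean_d = sum(distances) / n
    s_tt = sum((t - mean_t) ** 2 for t in years)
    if s_tt == 0:
        raise ValueError("sampling years must not all be identical")
    s_td = sum((t - mean_t) * (d - mean_d) for t, d in zip(years, distances))
    rate = s_td / s_tt
    intercept = mean_d - rate * mean_t
    root_date = -intercept / rate if rate > 0 else math.nan
    return rate, intercept, root_date


def simple_rate(k: float, t: float) -> float:
    """v = k / t: genetic distance k (substitutions/site) divided by elapsed time t (years)."""
    if t <= 0:
        raise ValueError("elapsed time must be positive")
    return k / t


if __name__ == "__main__":
    # Hypothetical example data, not results from this study.
    years = [2000, 2002, 2004, 2006, 2008]
    dists = [0.010, 0.014, 0.018, 0.022, 0.026]
    rate, intercept, root_date = root_to_tip_regression(years, dists)
    print(f"rate = {rate:.4f} substitutions/site/year, root ~ {root_date:.0f}")
    print(simple_rate(0.012, 6))
```

The regression uses all isolates at once, so it is less sensitive to a single noisy distance than dividing each distance by its own time span; a weak or negative slope is a warning that the data carry little temporal signal.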
Using this method, the evolutionary rates of both the Victoria-like and Yamagata-like strains were determined in Mega 5.0. The distances were calculated first, and the rate was then computed in Excel with the equation v = k/t, where v is the evolutionary rate to be obtained, k is the genetic distance, and t is the time in years. Both the non-synonymous and synonymous rates (nucleotides/site/year) were also used to determine which lineage has the higher evolutionary rate4.

Entropy Analysis
Entropy was plotted using BioEdit to estimate the conserved regions of the translated amino acid sequences. The nucleotide sequences were translated in BioEdit starting from the first reading frame. The translated sequences were grouped according to the reference strains to which they were most closely related, and variations in the amino acids were recorded. The positions of greatest variation in each lineage were selected to determine whether any amino acid variations were shared (conserved) across the sequences. These positions were then selected and inspected manually in Mega 5.0. The amino acid sequences of the 107 samples were also scanned against the Prosite Motifs Database using PPsearch (http://www.ebi.ac.uk/tools/ppsearch/index.html) to predict glycosylation sites. The predicted glycosylation sites were then located on the entropy plot to determine whether they were subject to variation from 1998-2008. Glycosylation sites common to the Victoria-like and Yamagata-like lineages were determined with PPsearch.

RESULTS
Sample Selection
From 1999 through 2008, there were approximately 1,800 influenzavirus B isolates from Egypt in the NAMRU-3 collection. Approximately 10 isolates per year were selected for further characterization in this study and previously in the master's thesis of El Din, D.23
In addition, to achieve geographical representation, 42 isolates were selected to represent the Cairo area, 52 to represent northern Egypt (the north coast and delta areas), and 13 to represent southern Egypt. The 107 isolates were collected from 9 different locations, as shown in Figure 1.

Influenza B Lineages
RNA was extracted from each of the 107 isolates, and the NA gene was amplified using 3 primer sets, sequenced, and assembled into a single contig. The NA genes of the 107 isolates were then analyzed phylogenetically to assess their ancestral lineage. Based on previous studies, influenzavirus B viruses are broadly categorized into two lineages, Victoria-like and Yamagata-like. In this study, 78 (73%) of the isolates had a Victoria-like NA gene, while the remaining 29 (27%) were Yamagata-like. Both lineages circulated in Egypt throughout 1998-2008 in varying proportions, except in 2000, when only Victoria-like viruses were circulating, as shown in Figure 5. The percentage of Victoria-like viruses was higher than that of co-circulating Yamagata-like viruses except in 2005 (Victoria-like 46.6% and Yamagata-like 53.4%) and 2008 (Victoria-like 50% and Yamagata-like 50%). As seen in the phylogenetic tree in Figure 2, 16 reference strains were mapped together with the 107 influenzavirus B isolates. From 1998-2003, the Victoria-like isolates were mostly related to the reference strains B/HongKong/548/2000, B/Wisconsin/01/2009, and B/Brisbane/32/2002. Figure 2 also shows that from 2004-2008, the Victoria-like isolates (except in 2006, when Yamagata-like strains were the only circulating viruses) were more closely related to the B/Johannesburg/69/2001, B/Harbin/07/1994, and B/Victoria/02/1987 reference strains.
Phylogenetic Analysis
A neighbor-joining phylogenetic tree was constructed in Mega 5.0 after the sequencing data were aligned with ClustalW. The alignments were subjected to bootstrap analysis in Mega 5.0, and the neighbor-joining tree was constructed using the maximum likelihood model. Non-parametric bootstrapping was performed to test the reliability of the tree. This method creates resampled datasets of the same size as the original (107 sequences of the original length) by drawing alignment sites at random with replacement rather than keeping them at their original positions; typically between a hundred and a thousand such datasets are created. Mega 5.0 then combines the resulting trees into a consensus tree, as shown in Figure 3. The neighbor-joining tree also includes 16 reference strains to map out the samples' lineages and similarities. The phylogenetic tree was color-coded to display year of collection (Figure 4) or location of collection (Figure 5), or displayed to demonstrate the relationships between the viruses from Egypt and the reference strains (Figure 3). Result from Figures 4 and 5.

In Figure 3 there are two distinct clusters. The smaller cluster is composed of the 29 Yamagata-like isolates, while the larger cluster contains all of the Victoria-like isolates and can be further broken down into smaller clusters. There are 5 clades within the Victoria-like cluster, named according to the reference strain to which their isolates are most closely related. The first clade contains three isolates, from 2000, 2002, and 2003, that are closely related to the B/Wisconsin/01/2009 and B/Texas/UR06-0541/2007 reference strains. The second clade contains 24 isolates that are most similar to the B/Brisbane/32/02 reference strain; however, one of the 24 samples (2000910434) is more similar to the reference strain B/Hong Kong/692/01.
The third clade is composed of 31 NA sequences that are B/Hong Kong/548/2000-like and come from all years (1998-2008). The fourth clade is B/Vienna/1/99-like and contains 4 isolates from 1999 and 2000. Within this clade is a smaller group of 3 B/Mexico/84/2000-like sequences from 1999 and 2000, while the fourth sample in the clade is most similar to the B/Sichuan/379/99 reference strain. The final clade is B/Johannesburg/69/2001-like and contains 9 samples that are closely related to the B/Johannesburg/69/2001 reference strain.

Amino Acid Variations of NA Sequences
In entropy analysis, the translated sequences of the influenzavirus B isolates are aligned and the variability at each alignment position is plotted. The x-axis shows the alignment positions, and each peak marks a position with variation; the higher the peak, the greater the variation at that position. Positions with peaks are called non-zero entropy positions. Non-zero entropy positions can represent areas of neuraminidase that are subject to high rates of mutation, such as antigenic sites9. Positions without peaks indicate conserved regions, in which the residue is the same in all 107 isolates; these are called zero entropy positions. Zero entropy positions can indicate regions of neuraminidase that are not subject to high mutation rates, such as receptor binding sites4. Amino acid sequences obtained by translating the nucleotide sequences in BioEdit were aligned with reference strains according to the phylogenetic analysis of the 107 samples. As indicated in Table 2, the Yamagata-like strains were compared with the B/Yamagata/1988 reference strain.
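Entropy plots of this kind are generally based on the Shannon entropy of each alignment column, H(x) = -Σ f(b,x) ln f(b,x), where f(b,x) is the fraction of sequences carrying residue b at position x. The Python sketch below assumes that formula with the natural logarithm and ignores gap characters; BioEdit's exact settings are not stated in this thesis, and the function names and the four-sequence alignment are hypothetical. It computes the per-position entropy and separates zero from non-zero entropy positions in the sense used in Tables 5 and 6.

```python
import math
from collections import Counter


def column_entropy(column: str, gap: str = "-") -> float:
    """Shannon entropy H(x) = -sum_b f(b,x) * ln f(b,x) of one alignment column (gaps ignored)."""
    residues = [r for r in column.upper() if r != gap]
    if not residues:
        return 0.0
    n = len(residues)
    # p * ln(1/p) equals -p * ln(p); written this way a fully conserved column gives 0.0.
    return sum((c / n) * math.log(n / c) for c in Counter(residues).values())


def entropy_profile(alignment: list[str]) -> list[float]:
    """Entropy at every position of an equal-length protein alignment."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    return [column_entropy("".join(seq[i] for seq in alignment)) for i in range(length)]


def split_positions(profile: list[float], tol: float = 1e-12) -> tuple[list[int], list[int]]:
    """Return 1-based (zero-entropy, non-zero-entropy) alignment positions."""
    zero = [i + 1 for i, h in enumerate(profile) if h <= tol]
    nonzero = [i + 1 for i, h in enumerate(profile) if h > tol]
    return zero, nonzero


if __name__ == "__main__":
    toy = ["MKAL", "MKSL", "MKAL", "MRSL"]  # hypothetical aligned NA fragments
    profile = entropy_profile(toy)
    print([round(h, 5) for h in profile])
    print(split_positions(profile))
```

A column shared by every isolate scores exactly zero, and a column split evenly between two residues scores ln 2 (about 0.69), which gives a sense of scale for the values reported for the antigenic loops.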
The positions listed in Table 2 are the non-zero entropy positions (non-conserved among the 107 isolates). According to the comparative analysis, position 388 has the most variation. At this position alanine is replaced by serine (A388S) in 25 of the 29 Yamagata-like sequences, so the substitution is shared by most of these isolates (Table 2). In Table 3, the Victoria-like NA sequences were grouped by clade and compared with the corresponding reference strains: B/Victoria/02/87, B/Brisbane/32/2002, B/Hong Kong/548/2000, and B/Johannesburg/69/2001. In the B/Victoria/02/87 amino acid alignment, the most variable position is 439 (Leu to Arg). The B/Brisbane/32/2002 alignment also showed position 439 as the most variable, with the same Leu to Arg substitution. The B/Hong Kong/548/2000 and B/Johannesburg/69/2001 alignments likewise had the most variation at position 439.

Neuraminidase Glycosylation Sites

Glycosylation sites are amino acid motifs at which sugar chains are attached during post-translational modification. These modifications affect the structure and function of proteins. Mutations in glycosylation sites that lie within the antigenic sites of influenzaviruses can change the structure of those sites. Mutations can also cause glycosylation sites to appear in different influenzavirus lineages1, 12. Identifying variation in glycosylation sites therefore points to the most non-conserved regions of the sequences, where site-specific mutations are most likely to occur. Glycosylation sites in the influenzavirus B NA sequences were identified with PPsearch, which mapped the sites in each isolate. Eight glycosylation sites were identified, as shown in Table 4. The same sites were found in both the Victoria-like and the Yamagata-like strains, except for the site at position 255. The site at position 338 was identified only in isolates from 2000.
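PPsearch reports matches to the PROSITE N-glycosylation pattern listed in Table 4, N-{P}-[ST]-{P}. The pattern reads: asparagine, then any residue except proline, then serine or threonine, then any residue except proline. A minimal regular-expression sketch of that scan follows. It is not PPsearch itself, and the function name and toy sequences are hypothetical. Positions are counted on the ungapped sequence passed in, so they would need mapping back to alignment coordinates before comparison with Table 4.

```python
import re

# N-{P}-[ST]-{P}; the lookahead lets overlapping motifs be reported.
SEQUON = re.compile(r"(?=(N[^P][ST][^P]))")


def glycosylation_sites(protein):
    """Return (start, end, motif) for each N-{P}-[ST]-{P} match, 1-based inclusive.

    Assumes an ungapped protein sequence; alignment gaps should be removed first.
    """
    sites = []
    for match in SEQUON.finditer(protein.upper()):
        start = match.start() + 1
        sites.append((start, start + 3, match.group(1)))
    return sites


if __name__ == "__main__":
    print(glycosylation_sites("MNASTNPSANGTA"))  # [(2, 5, 'NAST'), (10, 13, 'NGTA')]
```

A single substitution can create or abolish a site in this scan. For example, replacing the threonine in NGTA with alanine removes the match.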
The glycosylation sites at positions 255 and 338 are therefore less conserved than the other six, which were found in all isolates and in all years.

Neuraminidase Antigenic Sites

The antigenic sites of influenzavirus are among its most non-conserved regions, where high rates of mutation occur. The entropy plot in Figure 6 displays the conserved and non-conserved positions of the alignment. Influenzavirus B neuraminidase has antigenic sites that are subject to high mutation rates. These sites are loops of the NA protein, which is encoded by RNA segment 6. They are known as loop 150, loop 200, loop 350, loop 370, and loop 400. Each loop spans a range of positions; for example, loop 150 lies between positions 140 and 150 (Table 5). Within this range, the non-conserved position is 148, with an entropy of 0.32508. This is the Y-axis value in Figure 6, which measures how variable a position in the alignment is. The other antigenic loops show a similar level of variation, except for loop 370, where position 373 has an entropy of 0.44702.

Using the entropy data, a correlation was found between the zero entropy positions and the receptor binding sites, which are highly conserved. For neuraminidase, the conserved receptor binding site residues are Glu-119, Arg-156, Trp-178, Ser-179, Asp-198, Ile-222, Glu-227, Asp-293, and Glu-425. Their positions in this alignment are listed in Table 6. As Table 6 shows, these alignment positions match the standard residue numbering or lie within three residues of it. All of them fall at zero entropy positions in Figure 6, so these residues have remained conserved.

Evolutionary Substitution Rates

There are two types of substitutions: non-synonymous and synonymous. Synonymous substitutions do not change the encoded amino acid and are generally considered to have little or no selective effect, which makes them effectively neutral.
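The distinction between synonymous and non-synonymous substitutions can be illustrated by translating codons with the standard genetic code. The sketch below is hypothetical and deliberately simple. It only labels observed codon differences between two aligned, in-frame, gap-free coding sequences. Rate estimates such as those reported in this section also normalize by the number of synonymous and non-synonymous sites.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate_codon(codon):
    """Translate one DNA or RNA codon; ambiguous bases such as N are not handled."""
    return CODON_TABLE[codon.upper().replace("U", "T")]


def classify_substitution(ref_codon, alt_codon):
    """Label a codon change as synonymous or non-synonymous."""
    same = translate_codon(ref_codon) == translate_codon(alt_codon)
    return "synonymous" if same else "non-synonymous"


def count_codon_changes(ref, alt):
    """Count differing codons between two aligned, in-frame, gap-free coding sequences."""
    if len(ref) != len(alt) or len(ref) % 3:
        raise ValueError("sequences must be the same length and a multiple of 3")
    counts = {"synonymous": 0, "non-synonymous": 0}
    for i in range(0, len(ref), 3):
        ref_codon, alt_codon = ref[i:i + 3], alt[i:i + 3]
        if ref_codon.upper() != alt_codon.upper():
            counts[classify_substitution(ref_codon, alt_codon)] += 1
    return counts


if __name__ == "__main__":
    print(classify_substitution("CTT", "CTC"))  # Leu -> Leu: synonymous
    print(classify_substitution("CTT", "CGT"))  # Leu -> Arg: non-synonymous
    print(count_codon_changes("GCTCTTAAA", "GCACGTAAA"))
```

A Leu to Arg change such as the one reported at position 439 is non-synonymous. A change between two leucine codons is synonymous.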
Whether a synonymous substitution is in fact neutral depends on three factors. The first, as mentioned, is that the substituted codon makes little or no difference. The second is that the substitution does not fall in secondary RNA or DNA structures in the coding or noncoding regions. The third is the absence of overlapping reading frames. Non-synonymous substitutions change the encoded amino acid and can have a selective effect. Determining substitution rates can therefore help reveal the roles that mutations play in the virulence and transmission of the influenza virus20.

The nucleotide substitution rate was calculated for each of the 107 strains and the reference strains (Figure 7). The figure indicates that the evolutionary rate increased over the years 1998-2008. This suggests that mutations in specific non-conserved regions accumulated more rapidly over those 10 years.

The Victoria-like lineages have a substitution rate of 2.3 × 10⁻³ nucleotides/site/year, with a non-synonymous rate of 9.2 × 10⁻⁴ and a synonymous rate of 9.4 × 10⁻³. The Yamagata-like lineages have a substitution rate of 2.93 × 10⁻³, with a non-synonymous rate of 1.56 × 10⁻³ and a synonymous rate of 8.84 × 10⁻³. These rates indicate that the Yamagata-like lineages underwent a higher rate of nucleotide substitution than the Victoria-like lineages. The difference lies mainly in the non-synonymous substitutions. The Yamagata-like non-synonymous rate is about 1.7 times the Victoria-like rate, whereas the synonymous rates of the two lineages are similar. Over 1998-2008, the overall substitution rate of the Yamagata-like lineages was therefore about 1.3 times that of the Victoria-like lineages. Figure 7 shows the non-synonymous and synonymous substitution rates for both the Victoria-like and Yamagata-like isolates.
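The comparison between lineages can be made explicit with the NA rates reported above. The sketch below only performs that arithmetic. It assumes that the non-synonymous and synonymous rates are expressed per non-synonymous site and per synonymous site respectively, as in standard dN/dS estimation, so their ratio can be read as dN/dS. Under that assumption, a ratio below 1 is usually taken as a sign of purifying selection. The dictionary and function names are hypothetical.

```python
# NA substitution rates reported in this study (substitutions/site/year).
NA_RATES = {
    "Victoria-like": {"overall": 2.3e-3, "dN": 9.2e-4, "dS": 9.4e-3},
    "Yamagata-like": {"overall": 2.93e-3, "dN": 1.56e-3, "dS": 8.84e-3},
}


def summarize(rates):
    """Within-lineage dN/dS and Yamagata-like / Victoria-like rate ratios."""
    vic, yam = rates["Victoria-like"], rates["Yamagata-like"]
    return {
        "dN/dS Victoria-like": vic["dN"] / vic["dS"],
        "dN/dS Yamagata-like": yam["dN"] / yam["dS"],
        "Yamagata/Victoria overall": yam["overall"] / vic["overall"],
        "Yamagata/Victoria dN": yam["dN"] / vic["dN"],
        "Yamagata/Victoria dS": yam["dS"] / vic["dS"],
    }


if __name__ == "__main__":
    for name, value in summarize(NA_RATES).items():
        print(f"{name}: {value:.2f}")
```

With the reported values, the sketch prints dN/dS of about 0.10 (Victoria-like) and 0.18 (Yamagata-like). The Yamagata/Victoria ratios are about 1.27 for the overall rate, 1.70 for the non-synonymous rate, and 0.94 for the synonymous rate.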
DISCUSSION

The molecular characterization of influenzavirus B neuraminidase covers virus circulation patterns, phylogenetic analysis, glycosylation sites, conserved and non-conserved regions of the neuraminidase gene, and evolutionary rates. The evolutionary rates of HA were calculated for the same 107 isolates in a previous study23. The NA mutation rates are compared with them here to provide a more comprehensive molecular characterization of influenzavirus B. To understand the circulation of influenzavirus B in Egypt, and in particular the characteristics of its neuraminidase gene, 107 isolates were sequenced and examined. The variety of circulating influenza B viruses depends on the current antigenic properties of the virus as well as on past exposure to different viruses1, 14.

Each year, samples are collected at outpatient clinics from patients suspected of having influenza. Because the samples come from hospitals across Egypt, they provide data on the circulation of influenzavirus B throughout the country. The phylogenetic analysis of the 107 samples shows two clusters: Yamagata-like and Victoria-like. The Victoria-like cluster contains clades that further divide the isolates into sub-clades most similar to the following reference strains: B/Hong Kong/548/2000, B/Brisbane/32/2002, B/Wisconsin/01/2009, B/Vienna/1/99, and B/Johannesburg/69/2001. Yamagata-like and Victoria-like viruses co-circulated every year except 2000. According to similar studies by Chi et al., the sudden disappearance of the Yamagata-like strains and their reappearance the following year may be due to the viruses' antigenic shift and drift properties.
Much like the results found for Egypt, Yamagata-like strains were reported to circulate less widely than Victoria-like strains during 1999-2000 throughout Asia20. Information about the influenza B virus is critical to the development of effective vaccines against the yearly epidemics, because vaccines are designed based on the viruses circulating each season.

To understand the circulation patterns of influenzavirus B in Egypt, the phylogenetic tree was color-coded by year of collection (Figure 4) and by location (Figure 5). As Figure 5 shows, very few isolates grouped on the tree branches according to location within Egypt. There is therefore no conclusive evidence that an isolate's position on the tree correlates with the hospital where the sample was collected. In contrast, the color grouping in Figure 4 shows a partial correlation between lineage and year of collection. This can be seen in groupings from 2003, 2004, 2005, 2006, and 2007, in which groups of isolates from the same year are closely related to one another. Not all isolates from a given year lie close together on the tree, however; only some isolates from each year group together.

Previous work on the hemagglutinin surface protein by El-Din determined that the surface proteins of a single influenza virus strain can have different evolutionary rates and lineages23. In some isolates, the hemagglutinin sequence belongs to the Victoria-like lineage while the neuraminidase sequence belongs to the Yamagata-like lineage, and vice versa. El-Din identified 11 glycosylation sites in the HA gene of the 107 isolates from Egypt23, whereas this study identified 8 sites in the NA gene of the same viruses.
A single amino acid change in a glycosylation site can inactivate the site or create a new one. The glycosylation sites of the NA gene are known to mutate. For example, glycosylation site 255 was found only in the Yamagata-like viruses. This suggests that a mutation removed the site from the Victoria-like influenza B viruses in Egypt. Glycosylation site 338 was found only in isolates circulating in 2000. This suggests that a mutation (an insertion, deletion, or substitution) created the site and that a further mutation the following year removed it again. The appearance of glycosylation site 338 may reflect viral evolution to evade the host immune system, allowing the virus to circulate more efficiently in the Egyptian population.

This study also found high variation (identified from the entropy graph) at amino acid position 388 in the Yamagata-like lineage and at positions 113, 338, 403, and 437 in the Victoria-like isolates (Table 3). These non-conserved regions will therefore likely continue to change, contributing to the virus's evolutionary rate. According to Areej et al., position 338 is one of the NA positions involved in NA inhibitor binding, and it is subject to high mutation rates as a route to resistance24. The entropy plot quantifies variation within the protein alignment and maps this variability along the neuraminidase surface protein. The translated sequences of 16 reference strains from around the world (Asia, Australia, Europe, North America, and South America) were also aligned with the 107 amino acid sequences of the isolates.
The beginning and end of the alignment show the most variation because the 107 isolate sequences were trimmed at both ends while the reference strains were not. This apparent variation therefore reflects the trimming rather than true sequence diversity. Changes in the antigenic loops allow the influenza virus to alter its antigenic properties, which creates the need for a new vaccine every year. Identifying the most non-conserved sites makes it possible to predict where mutations are likely to occur (Table 5). Conversely, the receptor binding sites of influenzavirus B were examined to determine whether they have remained conserved (Table 6). They have, indicating a low mutation rate in these regions. These regions will most likely remain conserved as long as the targeted host does not change.

Evolutionary rates for the 107 sequences were estimated from nucleotide substitutions. These rates indicate that influenza B neuraminidase mutations increased over the years 1998-2008. They also show that the Yamagata-like lineages underwent a higher rate of substitution than the Victoria-like lineages. This agrees with other studies, such as the one conducted by Schweiger in Germany on the HA and NA genes of influenzaviruses A and B from 1996-2006. In that study, influenzavirus B neuraminidase showed a higher substitution rate in Yamagata-like than in Victoria-like strains25.

El-Din performed the molecular characterization of the HA gene from the same 107 isolates as this study23. In that study, as with neuraminidase here, the isolates fell into two phylogenetic lineages: Victoria-like and Yamagata-like. Notably, an isolate's HA and NA genes did not necessarily come from the same lineage. Some isolates have a Victoria-like HA and a Yamagata-like NA, and others the reverse. In total, 42 isolates have different lineages for their HA and NA surface proteins.
Ten of the forty-two have a Victoria-like HA and a Yamagata-like NA, and thirty-two have a Yamagata-like HA and a Victoria-like NA. The remaining 65 isolates have the same lineage, Victoria-like or Yamagata-like, for both HA and NA. Chi et al. reported similar results in studies of influenzavirus B strains in Israel and China20, finding isolates with an HA from the Victoria lineage and an NA from the Yamagata lineage, and vice versa.

El-Din estimated evolutionary rates from the HA gene of 2.23 × 10⁻³ nucleotides/site/year for the Victoria-like strains and 2.82 × 10⁻³ nucleotides/site/year for the Yamagata-like strains23. These results are very similar to those calculated from the NA gene (Victoria-like 2.32 × 10⁻³ and Yamagata-like 2.93 × 10⁻³ nucleotides/site/year). For the Victoria-like strains, the HA rates were estimated at 9.23 × 10⁻⁴ non-synonymous and 6.43 × 10⁻³ synonymous substitutions/site/year. The corresponding NA rates were 9.2 × 10⁻⁴ non-synonymous and 9.4 × 10⁻³ synonymous substitutions/site/year. The non-synonymous estimates for HA and NA of the Victoria-like strains are therefore almost identical, but the synonymous estimate for NA exceeds that for HA. Studies such as Schweiger's have found that HA has a higher mutation rate than NA in Germany25. In the isolates collected in Egypt, the non-synonymous rate of NA likewise did not exceed that of HA, although the overall NA rates were slightly higher. For the Yamagata-like strains, the HA rates were 1.75 × 10⁻³ non-synonymous and 6.20 × 10⁻³ synonymous substitutions/site/year.
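The HA rates from El-Din23 and the NA rates from this study can be compared side by side with a few lines of arithmetic. The sketch below uses only the values quoted in this discussion. The dictionary layout and the function names `lineage_ratio` and `gene_ratio` are hypothetical.

```python
# Substitution rates quoted in the discussion (substitutions/site/year).
RATES = {
    ("HA", "Victoria-like"): {"overall": 2.23e-3, "dN": 9.23e-4, "dS": 6.43e-3},
    ("HA", "Yamagata-like"): {"overall": 2.82e-3, "dN": 1.75e-3, "dS": 6.20e-3},
    ("NA", "Victoria-like"): {"overall": 2.32e-3, "dN": 9.2e-4, "dS": 9.4e-3},
    ("NA", "Yamagata-like"): {"overall": 2.93e-3, "dN": 1.56e-3, "dS": 8.84e-3},
}


def lineage_ratio(gene, measure):
    """Yamagata-like rate divided by Victoria-like rate for one gene."""
    return RATES[(gene, "Yamagata-like")][measure] / RATES[(gene, "Victoria-like")][measure]


def gene_ratio(lineage, measure):
    """NA rate divided by HA rate for one lineage."""
    return RATES[("NA", lineage)][measure] / RATES[("HA", lineage)][measure]


if __name__ == "__main__":
    for measure in ("overall", "dN", "dS"):
        print(measure,
              "Yam/Vic HA: %.2f" % lineage_ratio("HA", measure),
              "Yam/Vic NA: %.2f" % lineage_ratio("NA", measure),
              "NA/HA Vic: %.2f" % gene_ratio("Victoria-like", measure),
              "NA/HA Yam: %.2f" % gene_ratio("Yamagata-like", measure))
```

The non-synonymous Yamagata/Victoria ratio is about 1.9 for HA and 1.7 for NA, while the synonymous ratios are close to 1. The NA/HA ratios are just above 1 for the overall rates and at or below 1 for the non-synonymous rates.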
The overall evolutionary rate estimates are quite similar for the Victoria-like and Yamagata-like strains. However, the HA non-synonymous estimate for the Yamagata-like isolates is almost double that of the Victoria-like isolates, while the synonymous estimates are similar. The Yamagata-like strains therefore have a higher substitution rate. NA shows the same pattern. The non-synonymous rate of the Yamagata-like isolates (1.56 × 10⁻³ substitutions/site/year) is about 1.7 times that of the Victoria-like isolates, while their synonymous rate (8.84 × 10⁻³ substitutions/site/year) is similar. According to Domingo et al., the Yamagata-like lineages have higher evolutionary rates than the Victoria-like lineages, which agrees with the data found in this study26.

CONCLUSION

The purpose of this study was to determine the evolutionary rates, phylogenetic lineages, and glycosylation sites of the neuraminidase surface protein of influenza B virus. The study provides a more thorough understanding of the influenza B viruses infecting the Egyptian population. Full genome sequencing and analysis of the isolates should be conducted to determine the genetic makeup of the entire virus. We hope that this study will encourage neighboring countries to conduct similar studies of influenzaviruses to better understand the virus's circulation and evolution globally.

REFERENCES

1. Influenza Book | Virology of Human Influenza [Internet] [cited 2011 Dec 27]. Available from: http://influenzareport.com/ir/virol.htm.
2. Whittaker GR. Intracellular trafficking of influenza virus: Clinical implications for molecular medicine. Expert Reviews in Molecular Medicine 2004; 2001;3(05).
3. Coleman R. The PB1-F2 protein of influenza A virus: Increasing pathogenicity by disrupting alveolar macrophages. Virology Journal 2007;4(9):56-61.
4. The 2.2 Å resolution crystal structure of influenza B neuraminidase and its complex with sialic acid.
[Internet] [cited 2011 Dec 9]. Available from: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC556424.
5. Holmes E, Ghedin E, Miller N, Taylor L, Bao Y, St George K, Grenfell B, Salzberg S, Fraser C, Lipman D, et al. Whole-genome analysis of human influenza A virus reveals multiple persistent lineages and reassortment among recent H3N2 viruses. PLoS Biol 2005 Sep;3(9):e300.
6. Nakagawa Y, Oda K, Nakada S. The PB1 subunit alone can catalyze cRNA synthesis, and the PA subunit in addition to the PB1 subunit is required for viral RNA synthesis in replication of the influenza virus genome. J Virol 1996 Sep;70(9):6390-6394.
7. Lu G, Rowley T, Garten R, Donis RO. FluGenome: A web tool for genotyping influenza A virus. Nucleic Acids Res 2007 Jul;35(Web Server issue):W275-9.
8. Air GM, Laver WG. The neuraminidase of influenza virus. Proteins: Structure, Function, and Genetics 1989;6(4):341-356.
9. Connaris H, Takimoto T, Russell R, Crennell S, Moustafa I, Portner A, Taylor G. Probing the sialic acid binding site of the hemagglutinin-neuraminidase of Newcastle disease virus: Identification of key amino acids involved in cell binding, catalysis, and fusion. J Virol 2002;76(4):1816-1824.
10. Chen BJ, Leser GP, Jackson D, Lamb RA. The influenza virus M2 protein cytoplasmic tail interacts with the M1 protein and influences virus assembly at the site of virus budding. J Virol 2008 Oct;82(20):10059-70.
11. Chen R, Holmes E. The evolutionary dynamics of human influenza B virus. J Mol Evol 2008 Jun;66(6):655-663.
12. Li S, Schulman J, Itamura S, Palese P. Glycosylation of neuraminidase determines the neurovirulence of influenza A/WSN/33 virus. J Virol 1993;67(11):6667-6673.
13. Phylogenetic analysis reveals the global migration of seasonal influenza A viruses. PLoS Pathogens [cited 2011 Dec 9]. Available from: http://www.plospathogens.org/article/info:doi/10.1371/journal.ppat.0030131.
14. Hunt R. The epidemiology of the influenza virus.
Microbiology and Immunology 2005;16:32-34.
15. Grassly N, Fraser C. Seasonal infectious disease epidemiology. Proc Biol Sci 2006 Oct 7;273(1600):2541-2550.
16. McCullers J. Insights into the interaction between influenza virus and pneumococcus. Clin Microbiol Rev 2006 Jul;19(3):571-582.
17. Nicholson K, Wood J, Zambon M. Influenza. Lancet 2003 Nov 22;362(9397):1733-1745.
18. Lofgren E, Fefferman N, Naumov Y, Gorski J, Naumova E. Influenza seasonality: Underlying causes and modeling theories. J Virol 2006;81(11):5429-5436.
19. Besselaar T, Botha L, McAnerney L, Schoub BD. Phylogenetic studies of influenza B viruses isolated in southern Africa: 1998-2001. Virus Res 2004 Jul;103(1-2):61-66.
20. Chi X, Bolar TV, Zhao P, Rappaport R, Cheng S. Cocirculation and evolution of two lineages of influenza B viruses in Europe and Israel in the 2001-2002 season. J Clin Microbiol 2003;41(12):5770-5773.
21. Squires B, Macken C, Garcia-Sastre A, Godbole S, Noronha J, Hunt V, Chang R, Larsen C, Klem E, Biersack K, et al. BioHealthBase: Informatics support in the elucidation of influenza virus host pathogen interactions and virulence. Nucleic Acids Res 2008 Jan;36(Database issue):D497-503.
22. Lindstrom S, Hiromoto Y, Nerome R, Omoe K, Sugita S, Yamazaki Y, Takahashi T, Nerome K. Phylogenetic analysis of the entire genome of influenza A (H3N2) viruses from Japan: Evidence for genetic reassortment of the six internal genes. J Virol 1998;72(10):8021-8031.
23. El-Din D. Molecular characterization and evolution of hemagglutanin of influenza B viruses in Egypt from 1998-2008. Fall 2009.
24. Areej M, Fatma U, Mutasem O. Combining docking, scoring, and molecular field analysis to probe influenza neuraminidase ligand interactions. Journal of Molecular Graphics and Modelling 2007;26:443-456.
25. Schweiger B. Molecular characterization of human influenza viruses--a look back on the last 10 years. Berl Munch Tierarztl Wochenschr 2006;119(3-4):167-178.
26.
Domingo E, Holland JH, Webster R. Origin and evolution of viruses. San Diego, Calif.: Academic; c1999.

TABLES AND FIGURES

Figure Legends

Figure 1 – Map of Egypt labelled with affiliated hospitals
Locations of the outpatient clinics (in hospitals) within Egypt from which the samples were collected. Five of the 10 sites are in the Cairo area because of its high population density: Monira General Hospital, Helwan General Hospital, El-Gabarty Fever Hospital, Kitchner General Hospital, and NAMRU-3 (purple). Below the map of Egypt is a list of the number of isolates per year and per collection site.

Figure 2 – Prevalence of Yamagata-like isolates and Victoria-like isolates by year
The percentage of Yamagata-like and Victoria-like viruses was plotted by year to examine trends in the circulation of the two lineages over the time course examined.

Figure 3 – Phylogenetic tree of influenzavirus B Neuraminidase gene
Neighbor-Joining tree of 107 isolates of influenzavirus B and 16 reference strains. The tree is rooted on the reference strain B/Lee/1940; B/Victoria/87 and B/Yamagata/88 are included to delineate the two lineages.

Figure 4 - Circular representation of NA phylogeny color-coded according to distribution from 1999-2008
Circular phylogenetic tree of influenzavirus B isolates and reference strains, used to discriminate clades and make comparisons. Each year is a different color, with reference strains marked in purple.

Figure 5 – Circular representation of NA phylogeny color-coded by collection site.
Circular phylogenetic tree of the influenzavirus B isolates and reference strains, used to discriminate clades and make comparisons, color-coded by the 9 collection sites displayed in Figure 1.

Figure 6 - Entropy H(x) Plot of the NA gene translated sequences
Along the x-axis are the amino acid positions. The height of the peak represents the amount of variation.
Positions with peaks are called non-zero entropy positions. Positions with no peaks are conserved residues and are called zero entropy positions. From this entropy plot, the conservation of receptor binding sites can be examined (Table 6).

Figure 7 – Rate of Nucleotide Substitutions in Neuraminidase gene
Along the x-axis is the year of collection, and along the y-axis is the rate of substitution. Reference strains were included in the analysis for comparison with the isolates in this study.

Figures

Figure 1a – Map of Egypt labelled with affiliated hospitals.
1. Alexandria Fever Hospital
2. Monira General Hospital
3. Domiatta Fever Hospital
4. Sharqeya Fever Hospital
5. Helwan Fever Hospital
6. El-Gabarty Fever Hospital
7. Kitchner General Hospital
8. Menia Fever Hospital
9. Aswan Fever Hospital
10. NAMRU-3

Figure 1b: List of the number of isolates per year per collection site.
Year Locations 1999 Alexandria Fever Hospital Helwan Fever Hospital Alexandria Fever Hospital Helwan Fever Hospital El Gabarty Fever Hospital Alexandria Fever Hospital Monira Fever Hospital Kitchner Fever Hospital El Gabarty Fever Hospital Alexandria Fever Hospital Helwan Fever Hospital Monira Fever Hospital Alexandria Fever Hospital Alexandria Fever Hospital Helwan Fever Hospital El Gabarty Fever Hospital Kitchner Fever Hospital Alexandria Fever Hospital Helwan Fever Hospital El Gabarty Fever Hospital Kitchner Fever Hospital Aswan Fever Hospital Monira Fever Hospital Domiatta Fever Hospital Menia Fever Hospital Alexandria Fever Hospital Monira Fever Hospital El Gabarty Fever Hospital Sharqeya Fever Hospital Alexandria Fever Hospital Menia Fever Hospital 2000 2001 2002 2003 2004 2005 2006 2007 2008 Number of isolates per location 7 4 6 4 2 1 2 4 2 10 2 3 7 8 1 1 2 6 5 2 2 6 1 3 2 2 3 2 2 1 3
Total isolates per year: 1999, 11; 2000, 12; 2001, 9; 2002, 15; 2003, 7; 2004, 12; 2005, 15; 2006, 7; 2007, 13; 2008, 6.

Figure 2 – Prevalence of Yamagata-like Isolates and Victoria-like Isolates by Year

Figure 3 – Phylogenetic Tree of Influenzavirus B Neuraminidase Gene

Figure 4 – Circular representation of NA Phylogeny color-coded according to distribution from 1998-2008
Figure 4 key: Reference Strains; 1998/1999; 1999/2000; 2000/2001; 2001/2002; 2002/2003; 2003/2004; 2004/2005; 2005/2006; 2006/2007; 2007/2008. Scale bar: 0.05.

Figure 5 – Circular Representation of NA phylogeny color-coded by collection site
Key: Reference Strains; NAMRU-3; Menia Fever Hospital; Helwan Fever Hospital; Sharqeya Fever Hospital; Domiatta Fever Hospital; Aswan Fever Hospital; Kitchner General Hospital; Alexandria Fever Hospital; El-Gabarty Fever Hospital; Monira General Hospital. Scale bar: 0.05.

Figure 6 – Entropy H(x) Plot of the NA gene translated sequences

Figure 7 - Rate of Nucleotide Substitutions in Neuraminidase gene

Table 1 – Influenza virus segments 1-8.
Segment   Size (kb)   Name        Coded for
1         2.341       PB2         Transcriptase: cap binding
2         2.341       PB1         Transcriptase: elongation
3         2.233       PA          Transcriptase: protease activity (more research is required)
4         1.778       HA          Hemagglutinin
5         1.565       NP          Nucleoprotein: RNA binding, part of transcriptase complex, nuclear transport of vRNA
6         1.413       NA          Neuraminidase
7         1.027       M1 & M2     Matrix protein and ion channel
8         0.890       NS1 & NS2   Non-structural: nucleus and cytoplasm (other unknown functions)

Table 4 – Glycosylation motifs in NA. PPsearch was used to identify the glycosylation sites found in the 107 samples.
Motif ID: ASN_GLYCOSYLATION   Expression: N-{P}-[ST]-{P}
Start position   End position   Comments
8-9              11-12          1998-2008
13               16             1998-2008
53-54            57-58          1998-2008
99               102            1998-2008
162-163          165-166        1998-2008
255-256-257      258-259-260    Yamagata-like only
260-261          263-264        1998-2008
338-339          341-342        2000

Table 5 - Non-zero entropy positions in correlation with the antigenic sites of Neuraminidase
Antigenic site (loop)   Amino acid positions   Non-zero entropy position   Entropy (Hx)
Loop 150                140-150                148                         0.32508
Loop 200                199-210                200                         0.32508
Loop 350                344-350                350                         0.33508
Loop 370                367-375                367                         0.32508
                                               373                         0.44702
Loop 400                                       398                         0.32508

Table 6 - Zero Entropy Positions in Correlation with Receptor Binding Sites
Neuraminidase conserved residue   Position
Glu-119                           119
Arg-156                           158
Trp-178                           178
Ser-179                           180
Asp-198                           198
Ile-222                           225
Glu-227                           227
Asp-293                           293
Glu-425                           426

Table 7 – List of Abbreviations
Ala           Alanine
Arg           Arginine
Asn           Asparagine
CPE           Cytopathic effect
DMEM          Dulbecco's Modified Eagle Medium
DNA           Deoxyribonucleic Acid
dNTP          Deoxyribonucleotide triphosphate
Glu           Glutamic Acid
HA            Hemagglutinin
HAI           Hemagglutination inhibition assay
Ile           Isoleucine
Leu           Leucine
NAMRU-3       Naval Medical Research Unit 3
M             Matrix
M1            Matrix 1
M2            Matrix 2
MDCK cells    Madin Darby Canine Kidney cells
Met           Methionine
N             Nucleocapsid
NA            Neuraminidase
NS1           Nonstructural protein 1
NS2           Nonstructural protein 2
PA            Polymerase A
PB1           Polymerase B1
PB2           Polymerase B2
RNA           Ribonucleic Acid
RT-PCR        Reverse transcription polymerase chain reaction
Ser           Serine
Trp           Tryptophan
VTM           Viral transport medium
Val           Valine
WHO           World Health Organization