Influenza
David L. Suarez
Southeast Poultry Research Laboratory
Agricultural Research Service
U.S. Department of Agriculture
Athens, Georgia

Influenza
• Orthomyxovirus
• Segmented genome
• Pleomorphic, single-stranded RNA viruses
• Three antigenic types: A, B, C
• Type A:
   – Human influenza: H1N1, H3N2, pandemic H1N1
   – Swine influenza: H1N1, H3N2
   – Equine influenza: H3N8, H7N7
   – Canine influenza: H3N8
   – Avian influenza (many bird species): H1-H16, N1-N9
• Vary in pathogenicity

Influenza A Virus
• 10 or 11 influenza proteins
• 9 proteins packaged in the virion
   – HA, NA and M2: surface proteins
   – NP, PA, PB1, PB2, M1 and NS2: internal proteins
• NS1 is not packaged in the virion
• 16 HA subtypes, 9 NA subtypes
[Figure: virion diagram labeling hemagglutinin, neuraminidase, M2 and M1, and the eight gene segments PB2, PB1, PA, HA, NP, NA, MA and NS]

Influenza: Infection and Disease
• Infection may cause a wide range of outcomes, from no disease (asymptomatic) through respiratory disease to severe disease with high mortality
• Localized infection: mild to moderate disease
   – Intestinal: wild ducks and shorebirds, poultry
   – Respiratory: humans, swine, horses, poultry, domestic ducks, seals, mink
• Systemic infection: high mortality
   – Chickens, turkeys, other gallinaceous birds
Swayne, D.E. Epidemiology of Avian Influenza in Agricultural and Other Man-Made Systems. In: Avian Influenza. Wiley-Blackwell (www.blackwellpublishing.com), March 2008.

Main Existing Influenza Lineages
• Human influenza: H3N2, H1N1
• Equine/canine influenza: H3N8
• Avian influenza
• Swine influenza: H1N1, H3N2

Pandemics of influenza
[Figure: timeline of recorded human pandemic influenza (early subtypes inferred): 1889 Russian influenza (H2N2), 1900 Old Hong Kong influenza (H3N8), 1918 Spanish influenza (H1N1), 1957 Asian influenza (H2N2), 1968 Hong Kong influenza (H3N2), 2009 pandemic influenza (H1N1). A second track marks recorded new avian influenzas (H5, H7, H9*).]
Reproduced and adapted (2009) with permission of Dr Masato Tashiro, Director, Center for Influenza Virus Research, National Institute of Infectious Diseases (NIID), Japan.

Genetic origins of the pandemic (H1N1) 2009 virus: viral reassortment
[Figure: a North American H1N1 (swine/avian/human) virus and an unknown-lineage H1N1 (closest to Eurasian swine) each contribute gene segments (PB2, PB1, PA, HA, NP, NA, MP, NS). Segment lineages shown: classical swine (N. American lineage), avian (N. American lineage), human seasonal H3N2, and unknown lineage (closest Eurasian swine). Result: pandemic (H1N1) 2009, combining swine, avian and human viral components.]

Origin of Swine Flu?

Virus as a Parasite
• Viruses are very small and encode relatively few viral genes
• They require host genes to make viral RNA or DNA and to package the virus
• RNA viruses are generally smaller than DNA viruses (genomes of about 3,000-30,000 nucleotides)
• Most viruses infect a cell, cause it to make huge amounts of viral RNA, and kill the host cell

Viral Genes and Host Genes
• Host proteins are needed to make viral proteins from viral mRNA
• Host proteins help to assemble the virus
• Viral proteins usually copy the viral RNA in the polymerase complex; in influenza, 4 proteins (NP, PA, PB2 and PB1) perform this function
• Viral proteins attach the virus to host cells (hemagglutinin) and release it from host cells (neuraminidase)
• Viral proteins help evade the host immune response (non-structural proteins)

General flu facts
• Influenza makes viral mRNA that is translated into protein by the host cell
• Proteins start at the first ATG (methionine)
• Proteins end at any of the 3 stop codons (TAA, TAG, TGA)
• Transcripts of the matrix and non-structural genes are spliced to make 2 proteins each (M1 and M2; NS1 and NS2)
• Host machinery processes the proteins, including removing leader sequences and adding glycosylation

Influenza Virus Production
• Influenza has 8 gene segments
• Each segment must be packaged into the virus for it to be infectious
• How do you get all gene segments into the virus?
• Each gene segment has a conserved sequence at its 5’ and 3’ ends
• The 5’ end is 12 bp: AGCAAAAGCAGG (AGCGAAAGCAGG in the PB2, PB1 and PA segments)
• The 3’ end is 13 bp: CCTTGTTTCTACT

Flu facts
• Six of the eight gene segments have strictly fixed lengths:
   – PB2: 2341 bp
   – PB1: 2341 bp
   – PA: 2233 bp
   – NP: 1565 bp
   – MA: 1027 bp
   – NS: 890 bp
• Longer versions of these segments have never been reported (shorter ones occur in rare cases)
• The hemagglutinin and neuraminidase genes are exceptions, with a lot of size variation

Influenza Genes
• Conserved 5’ end, non-coding sequence and ATG start codon ... stop codon, non-coding sequence and conserved 3’ end (coding sequence omitted):
   – PB2 (2341 bp): AGCGAAAGCAGG TCAAATATATTCAATATG ... TAGTGTCGAATTGTTTAAAAACGA CCTTGTTTCTACT
   – PB1 (2341 bp): AGCGAAAGCAGG CAAACCATTTGAATG ... TGAAAAAATG CCTTGTTTCTACT
   – PA (2233 bp): AGCGAAAGCAGG TACTGATTCAAAATG ... TAGTTGTGGCAATGCTACTATTTGCTATCCATACTGTCCAAAAAAGTA CCTTGTTTCTACT
   – HA (1779 bp): AGCAAAAGCAGG GGTTCAATCTGTCAAAATG ... TAGTTAAAAACAC CCTTGTTTCTACT
   – NP (1565 bp): AGCAAAAGCAGG GTAGATAATCACTCACCGAGTGACATCCACATCATG ... TAAAGAAAAATAC CCTTGTTTCTACT
   – NA (1450 bp): AGCAAAAGCAGG AGTTCAAAATG ... TAGAAAAAAANT CCTTGTTTCTACT
   – MA (1027 bp): AGCAAAAGCAGG TAGATATTGAAAGATG ... TAGAGCTGGAGTAAAAAACTA CCTTGTTTCTACT
   – NS (890 bp): AGCAAAAGCAGG GTGACAAAAACATAATG ... TGATAAAAAACAC CCTTGTTTCTACT

Nucleoprotein Coding Sequence
AGCAAAAGCAGG GTAGATAA TCACTCACCGAGTGACATCC ACATCATG ... TAAAGAAAAATAC CCTTGTTTCTACT
• Nucleoprotein
• 1565 base pairs in length
• Encodes a single protein of 498 amino acids
• Non-coding sequence is present before and after the coding sequence
• The non-coding sequence acts as a promoter and is thought to be important for virus assembly

Sequencing of Influenza Viruses
• Over 180,000 influenza gene sequences, representing over 50,000 isolates, have been deposited in GenBank
• Many of these are partial gene sequences that do not include the non-coding sequence
• Understanding how non-coding sequences contribute to the pathogenesis of flu is important
• By a rough estimate, 3% of flu sequences in GenBank have serious errors

Errors in Flu sequence
• Some gene segments are longer than they should be, likely because:
   – Primer sequence was
     included as part of the submission
   – For cloned genes, plasmid sequence was included
   – Taq polymerase introduced errors
   – The sequence was poorly aligned and includes extra sequence
• Some sequences contain low-quality stretches that create insertions or deletions, which in turn cause premature stop codons

GenBank Data Mining
• Searched the Influenza Research Database for NP gene segments longer than 1565 bp
• 266 isolates were longer than 1565 bp, which should be the maximum size
• Most, if not all, of these sequences have errors that are apparent in a multiple sequence alignment

Bioinformatics Class Assignment
• Identify obvious mistakes in influenza sequences
• Start by identifying sequences with non-influenza sequence on the 5’ or 3’ end of the gene segments
• Characterize the types of errors present and correlate them with the laboratories that produced the sequences

Results
• Analyze the data from all eight gene segments and publish the results in a peer-reviewed journal
• Contact the laboratories whose sequences have mistakes and give them an opportunity to correct the errors
• GenBank provides a relatively simple process to correct sequence data
• Track which labs correct the data

Errors not so obvious
• RT-PCR amplification followed by sequencing of the PCR product is commonly used
[Figure: RT-PCR workflow. Viral RNA is converted to single-stranded DNA by the reverse transcriptase enzyme, starting from a primer; the single-stranded DNA is copied into double-stranded DNA; PCR with primers amplifies the double-stranded DNA, which can then be sequenced.]

PCR basics
• Denature DS DNA to SS DNA at 94C
   AGCGCTAGCTAGCTAGCGGCTAGCGTATCGAGCGTAGCGTAG
   TCGCGATCGATCGATCGCCGATCGCATAGCTCGCATCGCATC
• Anneal primers to SS DNA at 54C
   AGCGCTAGCTAGCTAGCGGCTAGCGTATCGAGCGTAGCGTAG
                              AGCTCGCATCGCATC
   AGCGCTAGCTAGCTA
   TCGCGATCGATCGATCGCCGATCGCATAGCTCGCATCGCATC
• Extend the primers with Taq polymerase (typically at 72C)
• Repeat the denaturation, annealing, and extension steps for 30-40 cycles

Mismatches in Primer to Template Can Still Result in PCR Amplification
   AGCGCTAGCTAGCTAGCGGCTAGCGTATCGAGCGTAGCGTAG
                              AGCTGGCATCGCATC
   AGCGCGAGCTAGCTA
   TCGCGATCGATCGATCGCCGATCGCATAGCTCGCATCGCATC

Mismatches become incorporated in PCR product
   AGCGCGAGCTAGCTAGCGGCTAGCGTATCGACCGTAGCGTAG
   TCGCGCTCGATCGATCGCCGATCGCATAGCTGGCATCGCATC
• The sequenced PCR product will include these errors

Conclusions
• Primers must closely match the template but are not always identical to it
• Primers may introduce errors into the PCR product that will show up in the sequence
• Primer sequence should be removed when data are submitted to GenBank
• Often it isn’t, and errors are introduced into the GenBank database
• Errors in sequence make it harder to understand which sequence changes are important for viral infection
• GIGO: garbage in, garbage out

Influenza Sequencing
• Procedures are available to PCR-amplify the complete gene segment for all eight genes
• Primers include conserved areas in the non-coding region, including the 12 and 13 bp sequences found in all eight gene segments
• In addition to flu sequence, primers also contain 5’ extensions to improve PCR efficiency, because the conserved sequences are so short
• These primer sequences are commonly not removed before submission to GenBank

Error from Commonly Used Procedure
• Primer sequence with 5’ extension
          AGCGAAAGCAGGTACTGATTCAAAATGCCGATCGCT
   ACGTCGATCGCTTTCGTCC
• Primer extension incorporated in PCR product
   TGCAGCTAGCGAAAGCAGGTACTGATTCAAAATGCCGATCGCT
   ACGTCGATCGCTTTCGTCCATGACTAAGTTTTACGGCTAGCGA
• The sequence includes “extra” DNA that, if not edited out, can get submitted to GenBank

How to identify primer-induced errors
• It may not be possible by looking at the sequence directly
• Read the manuscript and look at the experimental detail (if the methods do not describe a procedure specifically for sequencing the ends, the sequence probably contains primer data)
• Generate your own non-coding sequence data and compare it with the GenBank sequence

Lethality and Molecular Characterization of an HPAI H5N1 Virus Isolated from Eagles Smuggled from Thailand into Europe
M. Steensels, S. Van Borm, M. Boschmans, and T.
van den Berg
Reverse transcription (RT) was performed using an RT primer specific to a universal noncoding sequence present in all influenza segment RNAs (Table 1; Unit 12) and AMV reverse transcriptase (Roche), according to the manufacturer’s instructions, using 4 µl of purified RNA in a 20-µl reaction volume. Overlapping gene fragments were polymerase chain reaction (PCR)–amplified using Taq DNA polymerase (Roche) and a 2 µM final concentration of gene-specific primers (Table 1) and 1 µl of cDNA in 50-µl reactions. PCR was performed using the following temperature profile: 4 min at 94 °C, followed by 45 times the cycle (1 min at 94 °C, then 1 min at 55 °C, and 1 min at 72 °C). At the end, a final elongation step of 10 min at 72 °C was used. The size of the amplicons was verified by agarose gel electrophoresis. Subsequently, amplicons of the correct size were cloned into a pCR2.1-TOPO vector (TOPO TA cloning kit; Invitrogen, Carlsbad, CA), according to manufacturer’s instructions. The plasmid DNA from positive colonies was further purified (Qiaprep miniprep kit; Qiagen, Valencia, CA), according to the manufacturer’s procedures, and was verified by EcoRI (Roche, according to manufacturer’s instructions) digestion and agarose gel electrophoresis. Finally, sequencing reactions were performed using the M13F and M13R primers (provided with the cloning kit) (BigDyeTerminator, version 3.1,

Select Extra Sequence for BLAST Analysis

Conclusions
• The test sequence had “extra” sequence on the 5’ and 3’ ends
• The 5’ extra sequence is non-flu sequence added to the primer to improve PCR efficiency
• Review of the published paper confirms the data
• The original paper shows they cloned the sequence before sequencing
• Part of the 3’ extra sequence appears to be plasmid sequence
• The origin of the remainder of the 3’ sequence is unclear

How can you sequence the ends?
• Convert the SS linear RNA to circular SS RNA: T4 RNA ligase connects the RNA ends together
• Do RT-PCR using primers positioned so that the product spans the joined ends and their non-coding sequence
• Purify and sequence the PCR product as normal
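After the circularized RNA is amplified across the ligated ends, the read has to be split back into the segment's 3’ and 5’ termini. The following Python sketch is a minimal illustration, assuming the read is written in positive sense (as on the slides) and that the ends match the conserved termini listed earlier. `split_junction` and the example read are hypothetical. Because it searches for the consensus termini, it can only confirm ends that already match; a read that fails to split needs inspection by eye.

```python
# Minimal sketch (hypothetical helper, not from the source): split a sequence read
# taken across the ligated ends of a circularized influenza segment.
# Assumption: the read is in positive sense and the ends match the conserved
# termini listed on the slides; otherwise the function returns None.
from typing import Optional, Tuple

END_3 = "CCTTGTTTCTACT"                     # conserved 13-base 3' end
START_5 = ("AGCAAAAGCAGG", "AGCGAAAGCAGG")  # 12-base 5' end variants on the slides


def split_junction(read: str) -> Optional[Tuple[str, str]]:
    """Return (3' end portion, 5' end portion) of a junction read, or None.

    Across the ligation site the positive-sense read runs
    ...<3' end of segment><5' end of segment>..., so it is cut right
    after the conserved 3' terminus.
    """
    read = read.upper()
    for start in START_5:
        pos = read.find(END_3 + start)
        if pos != -1:
            cut = pos + len(END_3)
            return read[:cut], read[cut:]
    return None  # ends differ from the consensus: inspect the read by eye


# Hypothetical read assembled from the NP ends shown on the slides.
np_read = "TAAAGAAAAATAC" + "CCTTGTTTCTACT" + "AGCAAAAGCAGG" + "GTAGATAA"
print(split_junction(np_read))
# ('TAAAGAAAAATACCCTTGTTTCTACT', 'AGCAAAAGCAGGGTAGATAA')
```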
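The first step of the class assignment (finding non-influenza sequence on the 5’ or 3’ end) can be combined with the fixed segment lengths from the "Flu facts" slide into a simple screen. This is a minimal Python sketch, not a published tool: `screen_segment`, `Screen` and the `MAX_LEN` table are hypothetical names. It assumes positive-sense sequences and treats the first 5’ consensus match and the last 3’ consensus match as the true termini, which a chance internal match could fool. A `None` field means that terminus is absent, as in the many partial GenBank sequences.

```python
# Minimal sketch (hypothetical helpers, not from the source): flag bases outside
# the conserved termini and segments longer than their fixed length.
# HA and NA are not in MAX_LEN because their lengths vary.
from dataclasses import dataclass
from typing import Optional

START_5 = ("AGCAAAAGCAGG", "AGCGAAAGCAGG")
END_3 = "CCTTGTTTCTACT"
MAX_LEN = {"PB2": 2341, "PB1": 2341, "PA": 2233, "NP": 1565, "MA": 1027, "NS": 890}


@dataclass
class Screen:
    extra_5: Optional[str]  # bases before the 5' terminus (None = terminus not found)
    extra_3: Optional[str]  # bases after the 3' terminus (None = terminus not found)
    too_long: bool          # longer than the fixed segment length


def screen_segment(seq: str, segment: str) -> Screen:
    seq = seq.upper()
    # Heuristic: the earliest match of either 5' variant marks the true start.
    hits = [i for i in (seq.find(s) for s in START_5) if i != -1]
    extra_5 = seq[:min(hits)] if hits else None
    # Heuristic: the last match of the 3' terminus marks the true end.
    j = seq.rfind(END_3)
    extra_3 = seq[j + len(END_3):] if j != -1 else None
    limit = MAX_LEN.get(segment)
    too_long = limit is not None and len(seq) > limit
    return Screen(extra_5, extra_3, too_long)


# Top strand of the PCR product from the "Error from Commonly Used Procedure" slide.
example = "TGCAGCTAGCGAAAGCAGGTACTGATTCAAAATGCCGATCGCT"
print(screen_segment(example, "PA"))
# Screen(extra_5='TGCAGCT', extra_3=None, too_long=False)
```

On that example the 7-base primer extension TGCAGCT is reported as extra 5’ sequence; no 3’ terminus is found because the slide's example stops before the 3’ end.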
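The PCR mismatch slides can be reproduced in a few lines. After repeated cycles, the dominant product begins with the forward primer and ends with the complement of the reverse primer, so mismatched primer bases replace the template bases. This Python sketch uses the sequences drawn on the slides; `pcr_product`, `mismatches` and `revcomp` are hypothetical helpers. It assumes the reverse primer drawn beneath the top strand is written 3’ to 5’, so it is reversed to 5’ to 3’ before use.

```python
# Minimal sketch (hypothetical helpers, not from the source): model how primer
# sequence replaces template sequence at both ends of a PCR product.
from typing import List

COMP = str.maketrans("ACGT", "TGCA")


def revcomp(s: str) -> str:
    """Reverse complement of a DNA string."""
    return s.translate(COMP)[::-1]


def pcr_product(template: str, fwd: str, rev: str) -> str:
    """Top strand of the product; fwd and rev are both written 5'->3'.

    fwd matches the first bases of the top strand (it anneals to the bottom
    strand); rev anneals to the last bases of the top strand.
    """
    middle = template[len(fwd):len(template) - len(rev)]
    return fwd + middle + revcomp(rev)


def mismatches(a: str, b: str) -> List[int]:
    """1-based positions where two equal-length sequences differ."""
    return [i + 1 for i, (x, y) in enumerate(zip(a, b)) if x != y]


template = "AGCGCTAGCTAGCTAGCGGCTAGCGTATCGAGCGTAGCGTAG"
fwd = "AGCGCGAGCTAGCTA"        # G instead of T at position 6
rev = "AGCTGGCATCGCATC"[::-1]  # drawn 3'->5' on the slide; reversed to 5'->3'
product = pcr_product(template, fwd, rev)
print(product)                        # AGCGCGAGCTAGCTAGCGGCTAGCGTATCGACCGTAGCGTAG
print(mismatches(template, product))  # [6, 32]
```

Both primer mismatches (positions 6 and 32) appear in the product, matching the "Mismatches become incorporated in PCR product" slide.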
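The "General flu facts" rules (translation from the first ATG to the first in-frame stop codon) also give a quick check for the insertion and deletion errors listed under "Errors in Flu sequence": a protein much shorter than expected suggests a premature stop. A minimal Python sketch, assuming a positive-sense sequence in the DNA alphabet; `first_orf` and `protein_length` are hypothetical helpers, and the spliced M2 and NS2 products are ignored.

```python
# Minimal sketch (hypothetical helpers, not from the source): find the open
# reading frame from the first ATG to the first in-frame stop codon.
from typing import Optional, Tuple

STOPS = {"TAA", "TAG", "TGA"}


def first_orf(seq: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) as 0-based indices, end exclusive and including the stop."""
    seq = seq.upper()
    start = seq.find("ATG")
    if start == -1:
        return None
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOPS:
            return start, i + 3
    return None  # no in-frame stop: truncated or frameshifted sequence


def protein_length(seq: str) -> Optional[int]:
    """Number of amino acids encoded by the first ORF (stop codon excluded)."""
    orf = first_orf(seq)
    return None if orf is None else (orf[1] - orf[0]) // 3 - 1


# Toy segment: conserved ends around a 4-codon ORF.
toy = "AGCAAAAGCAGG" + "ATG" + "GCC" * 3 + "TAA" + "CCTTGTTTCTACT"
print(first_orf(toy), protein_length(toy))  # (12, 27) 4
```

As a worked check with the NP ends on the "Nucleoprotein Coding Sequence" slide: the 5’ non-coding region before the ATG is 45 nt, the 3’ non-coding region after the TAA stop is 23 nt, and 498 codons plus the stop codon is 1,497 nt, so 45 + 1,497 + 23 = 1,565 nt, the full NP segment length. A complete NP sequence for which `protein_length` returns well under 498 would be a candidate for an indel error.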