The sequences searched are all hemagglutinin protein sequences

The sequences searched are all hemagglutinin protein sequences of influenza viruses. There are 100 sequences altogether, all obtained from the protein database of the National Center for Biotechnology Information (NCBI). The sequences may be found in the file: “sequences.fasta”. They are distributed among the 16 subtypes (H1 – H16) as follows:

H1: H1N1: 7
H2: H2N1: 7
H3: H3N1: 6
H4: H4N1: 3, H4N4: 1, H4N8: 3
H5: H5N1: 3
H6: H6N2: 7, H6N8: 2
H7: H7N1: 10
H8: H8N2: 1, H8N4: 4
H9: H9N1: 9
H10: H10N1: 2, H10N3: 2
H11: H11N1: 1, H11N2: 6
H12: H12N1: 2, H12N4: 2
H13: H13N2: 3, H13N6: 6
H14: H14N5: 2
H15: H15N2: 1, H15N8: 3
H16: H16N3: 7

The sequence analysis tool “ClustalW”, provided by the European Bioinformatics Institute (EBI), is used to find the pairwise distances between all pairs of sequences. The pairwise distance is calculated as:

Distance = 1 – (No. of identities in the best alignment) / (total no. of residues compared)

Gaps are not counted in the total number of residues compared. The pairwise distances may be found in the file: “pairwise”.

ClustalW is also used to generate the Multiple Sequence Alignment of the sequences, which may be found in the file: “multiple”. ClustalW further writes the alignment in the PHYLIP format (.ph); this may be found in the file: “align_phylip”.

To find the phylogenies, both character-based and distance-based, the PHYLogeny Inference Package (PHYLIP) is used. The tool “Protpars” in PHYLIP accepts the aligned sequences in the PHYLIP format and generates the character-based phylogenetic tree. In character-based phylogeny, the substitutions that the nucleotides undergo at each stage in the transformation from one sequence to another are recorded. Protpars counts only those nucleotide substitutions that bring about a change in the amino acid.
The other substitutions are considered too minor to be registered. The phylogenetic trees generated by Protpars may be found in the files: “char1” and “char2”.

The distance-based phylogeny is also generated using the PHYLIP package. In this case, the tools “Protdist” and “Fitch” are used one after the other. Protdist accepts the aligned sequences in the PHYLIP format and returns the distance matrix, which measures the evolutionary distances between the sequences. Evolutionary distance here means the fraction of the amino acids that have undergone a change. This distance matrix may be found in the file: “dist1”. Fitch then accepts the distance matrix and generates the distance trees, in which the branch lengths represent the evolutionary distance between the different stages. The distance-based phylogenetic trees may be found in the files: “dist_tree1” and “dist_tree2”.

An evaluation of the character-based phylogeny obtained from Protpars is as follows. The file “char_with_table” contains the character-based phylogenetic trees obtained from Protpars, each followed by a table. This table describes the stages of evolution that the sequences go through in the transition from one sequence to another. The table starts with the first sequence in the set of aligned sequences, gi| 1912345, which is denoted by “1”. The branching of the tree describes each stage of the evolution, and the stages are labeled by numbers (1, 2, 3, etc.). The table lists the amino-acid substitutions that occur at each stage. A substitution is indicated by the corresponding letter. A “.” represents ‘no change’; an “X” represents a substitution by an unknown amino acid; a “?” indicates either an amino-acid substitution or a deletion. Each table is broken into smaller blocks because of lack of space (i.e.
the first block describes the first 40 positions of the sequences, and so on). Using this table together with the tree, the stages of evolution that the sequences go through in changing from one form to another may be traced.

Observing the character-based phylogenetic tree, one expected conclusion was that almost all sequences of the same sub-type and derived from the same source were close to each other in terms of evolution. However, a few interesting observations are as follows:
(i) gi| 14289397 (H7N1), found in turkeys, was very close to gi| 70608897 (H6N2), found in mallards.
(ii) gi| 116235395 (H11N2), found in swans, was close to gi| 68137154 (also H11N2), found in ducks.
(iii) gi| 11936550 (H11N2), found in green wing teals, was close to gi| 82654072 (also H11N2), found in mallards.
(iv) gi| 118595866 and gi| 125716848, both H13N6, were close to each other although the former was obtained from ducks and the latter from gulls.
(v) gi| 11027796 (H13N6), from gulls, was close to gi| 32330958 (H1N1), found in Taiwan.

An evaluation of the distance-based phylogenetic tree also produced some interesting results. Looking at the tree, five “clusters” of sequences could clearly be made out. This gives a rough idea of the pattern of similarity that exists among the subtypes. Each cluster consists of two or more sub-clusters. The clusters observed are as follows:
(i) The sub-cluster of H2N1 sequences forms a cluster with the sub-cluster of H5N1 sequences.
(ii) The sub-cluster of H6N2 sequences combines with the sub-clusters of H13N2 and H9N1 sequences to form a large cluster.
(iii) The sub-cluster of H8N4+H8N2 sequences combines with the sub-clusters of H11N1, H16N3 and H13N6 sequences to form a large cluster.
(iv) The sub-clusters of H3N1 sequences and of H4N8+H4N1 sequences form a cluster.
(v) The sub-cluster of H10N3+H10N1 sequences joins with the sub-cluster of H7N1 sequences to form a cluster.
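
The pairwise distance scheme stated earlier (one minus the fraction of identical residues, with gapped positions left out of the count) can be illustrated with a short sketch. This is not ClustalW's code. It assumes the two sequences have already been aligned to the same length and that "-" marks a gap. The function names and the toy sequences are hypothetical.

```python
def pairwise_distance(aligned_a, aligned_b, gap="-"):
    """Distance = 1 - identities / residues compared, skipping gapped columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identities = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # gaps are not counted among the residues compared
        compared += 1
        if x == y:
            identities += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 1.0 - identities / compared


def distance_matrix(aligned):
    """Distances for every ordered pair in a {name: aligned sequence} mapping."""
    names = list(aligned)
    return {(a, b): pairwise_distance(aligned[a], aligned[b])
            for a in names for b in names}


# Toy example (made-up sequences): 6 ungapped columns, 5 of them identical.
d = pairwise_distance("MKT-AIL", "MKTLAVL")
print(round(d, 4))  # 0.1667
```

The same fraction-of-changed-residues idea is what the text gives as the meaning of the Protdist distances, so `distance_matrix` may also serve as a rough mental model of the “dist1” file, under that stated reading.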

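
The notation of the Protpars table described above (“.” for no change, a letter for a substitution, “X” for an unknown amino acid, “?” for a substitution or deletion) can be read mechanically. The sketch below only illustrates that reading. It assumes one table row has been extracted as a plain string of the same length as the sequence at the preceding stage, which is a simplification of the actual output layout. The function names and example strings are hypothetical.

```python
def apply_row(parent, row):
    """Rebuild the sequence at a stage: '.' keeps the parent's residue."""
    if len(parent) != len(row):
        raise ValueError("parent and row must have equal length")
    return "".join(p if r == "." else r for p, r in zip(parent, row))


def describe_changes(parent, row, offset=0):
    """List (position, old, new, kind) for every symbol in the row other than '.'."""
    if len(parent) != len(row):
        raise ValueError("parent and row must have equal length")
    changes = []
    for pos, (p, r) in enumerate(zip(parent, row), start=offset + 1):
        if r == ".":
            continue  # no change at this position
        if r == "X":
            kind = "substitution (unknown amino acid)"
        elif r == "?":
            kind = "substitution or deletion"
        else:
            kind = "substitution"
        changes.append((pos, p, r, kind))
    return changes


# Toy example (made-up data): two positions change between the stages.
print(apply_row("MKTAIL", "..S.?."))  # MKSA?L
for change in describe_changes("MKTAIL", "..S.?."):
    print(change)
```

The `offset` argument lets a later block of the table be numbered from its true starting position, for example `offset=40` for the second block when each block covers 40 positions.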