Robert Dougherty
Analysis of the Evolution of the PFKP gene in the catarrhine and platyrrhine primate lineages
17 April 2013

Introduction

Molecular convergent evolution is a fascinating biological phenomenon because it suggests that evolution can proceed in a predictable fashion at the highest level of specificity. Understanding the predictability of the evolutionary processes that allow organisms to adapt to novel environments has implications for ecology (how might certain organisms evolve to cope with global warming?) and for astrobiology (how might life evolve on a different planet?). Though it may not hold answers about extraterrestrial life, convergent evolution in primates is interesting to study because of their close relationship to Homo sapiens. The paper "Episodic Adaptive Evolution of Primate Lysozyme" [1] analyzed the convergent evolution of foregut lysozyme production in both colobine primates and ruminants.

Convergent evolution between two particular groups in the order Primates, New World monkeys and Old World monkeys, is intriguing because it raises the question of why hominids branched from the Old World monkey lineage while nothing close to hominids ever diverged from the New World monkey lineage. The convergent evolution of certain characteristics between New World and Old World primates has already been studied, for example the major histocompatibility complex in the paper "Convergent Evolution of Major Histocompatibility Complex Molecules in Humans and New World Monkeys" [2]. In that paper, genetic distances were calculated by the Kimura method and phylogenetic trees were drawn by the neighbor-joining method in MEGA as part of the tests of convergence. The exercise in molecular evolutionary analysis presented here attempts to assess the extent of convergent evolution in the PFKP gene between the New World monkey lineage and the Old World monkey lineage.
The PFKP gene codes for the PFKP enzyme (phosphofructokinase, platelet type), which catalyzes the irreversible conversion of fructose 6-phosphate to fructose 1,6-bisphosphate. This is a major step in glycolysis, and therefore in the metabolism of sugars obtained from fruit. Some primates in both the New World and Old World lineages rely heavily on fruit in their diet, but not all primates in these lineages do. The hypothesis of this exercise is that a phylogenetic tree of primates constructed from the relatedness of homologous PFKP genes will not reflect the primates' actual lineage, but will instead group primates together according to their dependence on fruit in their diet. Convergent evolution observed between New World and Old World monkeys in this analysis would be especially significant, since the two groups are believed to have diverged from each other about 40 million years ago, when some monkeys from Africa reached South America. Testing this hypothesis should give a better understanding of the strength of convergent evolution of dietary adaptations in primates. The knowledge gained by answering this question may also further the understanding of the divergence of hominids from the Old World monkey lineage.

Picture 1: Primate Phylogeny [3]

Description of Data

The gene sequences were obtained from the National Center for Biotechnology Information (NCBI) website. Sequences were obtained for eight primate species: six from the Old World lineage and two from the New World lineage. The six primates from the Old World lineage (catarrhine) were humans (Homo sapiens), common chimpanzees (Pan troglodytes), bonobos (Pan paniscus), Sumatran orangutans (Pongo abelii), gibbons (Nomascus leucogenys) and olive baboons (Papio anubis). The two primates from the New World lineage (platyrrhine) were marmosets (Callithrix jacchus) and South American squirrel monkeys (Saimiri boliviensis).
The first attempt to get data used the entire nucleotide sequences of the PFKP genes of the eight primates, but they proved too large for the available alignment software to handle. After trial and error with some introns and exons of the gene, it was finally decided that the amino acid sequences encoded by the gene would be used to test the hypothesis, since these data were equally usable for all primate species.

Table 1: Description of Gene Sequences

Species: Scientific name (Common name)          GenBank Accession Number
Homo sapiens (human)                            NC_000010
Pan troglodytes (common chimpanzee)             NC_006477
Pan paniscus (bonobo)                           NW_003870569
Pongo abelii (Sumatran orangutan)               NC_012601
Nomascus leucogenys (gibbon)                    NC_019824
Papio anubis (olive baboon)                     NC_018160
Callithrix jacchus (marmoset)                   NC_013902
Saimiri boliviensis (squirrel monkey)           NW_003943713

Description of Data Analysis

The amino acid sequences of the eight homologous protein-coding genes were copied from the NCBI website and pasted into a text document. The compiled text file was then pasted into a ClustalW alignment tool. The resulting alignment file was converted into a MEGA-format file with the MEGA5 program, after which data analysis could begin. The first analysis was the calculation of pairwise distances between the amino acid sequences using the Poisson correction model, to get a general idea of the relatedness of the genes. Then a neighbor-joining tree using the Poisson correction model was built with 1000 bootstrap replicates to see how these genes are related to each other from an evolutionary point of view. Finally, a maximum parsimony tree was generated with 1000 bootstrap replicates, for the same reason as the neighbor-joining tree and to provide a second phylogenetic tree for comparison.
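The Poisson-corrected distance used in this analysis can be illustrated with a short sketch. This is a minimal example and not MEGA5's implementation: the function names p_distance and poisson_distance and the toy sequences are hypothetical, and gapped sites are skipped pair by pair, whereas the analysis reported here eliminated gapped positions across all sequences. The Poisson correction converts the proportion p of differing sites into an estimated number of substitutions per site, d = -ln(1 - p).

```python
import math


def p_distance(seq_a, seq_b, gap_chars="-?"):
    """Proportion of differing residues between two aligned sequences.

    Sites where either sequence has a gap or missing-data character are
    skipped (pairwise deletion; an assumption made for this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in gap_chars or b in gap_chars:
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def poisson_distance(seq_a, seq_b):
    """Poisson-corrected distance d = -ln(1 - p), in substitutions per site."""
    p = p_distance(seq_a, seq_b)
    if p >= 1.0:
        raise ValueError("distance is undefined when every site differs")
    return -math.log(1.0 - p)


if __name__ == "__main__":
    # Toy aligned peptides (hypothetical, not PFKP data): 1 difference in 10 sites.
    print(round(poisson_distance("MKTAYIAKQR", "MKTAYLAKQR"), 3))  # about 0.105
```

Under this model, a distance of 0.005 over the 776 positions reported by MEGA5 in the results corresponds to roughly four differing residues between two sequences.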
Results of Data Analysis

Table 2: Estimates of Evolutionary Divergence Between Sequences (Poisson)

                          (1)     (2)     (3)     (4)     (5)     (6)     (7)
HomoSapiens (1)
PongoAbelli (2)           0.005
PanPaniscus (3)           0.003   0.005
NamascusLeucogenys (4)    0.026   0.029   0.026
PapioAnubis (5)           0.009   0.012   0.009   0.033
CallithrixJacchus (6)     0.035   0.038   0.035   0.057   0.038
SaimiriBoliviensis (7)    0.023   0.026   0.023   0.048   0.026   0.025
PanTroglodyte (8)         0.096   0.097   0.093   0.119   0.102   0.125   0.113

From MEGA5: "The number of amino acid substitutions per site from between sequences are shown. Analyses were conducted using the Poisson correction model. The analysis involved 8 amino acid sequences. All positions containing gaps and missing data were eliminated. There were a total of 776 positions in the final dataset."

The pairwise distance table shows a close four-way relationship between Homo sapiens (humans), Pongo abelii (Sumatran orangutans), Pan paniscus (bonobos) and Papio anubis (olive baboons): every distance within this group is 0.012 or less.

Tree 1: Maximum Parsimony Tree [3]

The above tree is a maximum parsimony tree generated for the gene data. The only node with very robust support is the node connecting the New World monkey branches. Statistically robust nodes are those with a high bootstrap number next to them; the number indicates the percentage of bootstrap replicates that included that specific node.

Tree 2: Neighbor-Joining Tree (Bootstrap Consensus)

Shown above is the bootstrap consensus neighbor-joining tree generated for the gene data. This tree yields different connectivity than the maximum parsimony tree. Once again the only robust node is the one connecting the New World monkey branches.

Tree 3: Neighbor-Joining Tree (Bootstrap Consensus) - No Chimp

The above tree is the bootstrap consensus neighbor-joining tree generated without the chimp data.
The neighbor-joining tree was run one more time this way in the hope of getting more robust results, since the sequence for the common chimp is not closely similar to any of the other sequences. The nodes gain robustness, especially the one linking humans, bonobos, baboons and gibbons.

Tree 4: Maximum Parsimony Tree - No Chimp

This time the maximum parsimony tree was generated without the gene data for the common chimp. This tree is less robust than the maximum parsimony tree that included the chimp. Once again the trees generated by the two phylogenetic methods show different connectivity.

Table 3: Tajima's Test Statistic, D

m     S       ps          Θ           π           D
8     119     0.153351    0.059143    0.043906    -1.404662

m = number of sequences, S = number of segregating sites, ps = S/n (n = total number of sites, 776 here), Θ = ps/a1, π = nucleotide diversity, D = Tajima's test statistic

A test statistic of D < 0 is consistent with directional selection acting on the sequences.

Discussion

As noted above, the table of pairwise distances shows a great deal of similarity between humans, bonobos, Sumatran orangutans and olive baboons. It is surprising that the gibbons are left out of this close group, since gibbons share a more recent common ancestor with humans, bonobos and orangutans than baboons do. It is even more surprising that the common chimp is excluded, since chimps are the closest relatives of humans and the other species in their genus, the bonobo, is included. Beyond evolutionary history, bonobos and orangutans are linked by their diet, which consists mainly of fruit. Baboons do not rely as heavily on fruit as bonobos or orangutans, but neither do humans. Since the common chimp eats more meat than bonobos and orangutans, and its PFKP gene has diverged greatly from those of the aforementioned primates, this may suggest that the direct ancestors of humans had a fruit-driven diet.
It could also suggest that the PFKP gene comes under different selective pressures when species start to substitute meat for fruit in their diet. Tajima's D statistic for the gene sequences is -1.404662, which is consistent with directional selection. [4][5][6][7][8]

The two species in this analysis from the New World monkey lineage are the only two species that have the same linkage in all of the phylogenetic trees generated. This is fitting from an evolutionary point of view, since these two species are more closely related to each other than to any of the other species included in this analysis. They also have a similar diet of insects and plant exudates. [9][10] One conclusion that can be drawn from this is that the evolution of the PFKP gene has been proceeding at a constant speed in the New World monkey lineages but not in the catarrhine lineage. This could suggest that the PFKP gene is under greater selective pressure in the catarrhine lineage for a reason that is not yet known.

Though some preliminary conclusions can be drawn from the data generated, there is not enough congruent data to support or reject the hypothesis presented earlier: that a phylogenetic tree of primates constructed from the relatedness of homologous PFKP genes will not reflect the primates' actual lineage, but will instead group primates together according to their dependence on fruit in their diet. Further studies on this hypothesis could examine different genes that code for dietary enzymes, or use molecular clock methods instead of phylogenetic tree methods.

References

1) http://www.popdna.zi.ku.dk/evolbiology/courses/4/lysozym/Messier.pdf
2) Klein, Kriener, O'hUigin, Tichy (2000). "Convergent evolution of major histocompatibility complex molecules in humans and New World monkeys". Immunogenetics 51: 169-178.
3) http://www.ec.europa.eu
4) Ihobe H (1992). "Observations on the meat-eating behavior of wild bonobos (Pan paniscus) at Wamba, Republic of Zaire". Primates 33 (2): 247–250.
5) http://web.archive.org/web/20080917132740/http://www.awf.org/content/wildlife/detail/baboon
6) http://www.theanimalspot.com/sumatranorangutan.htm
7) http://pin.primate.wisc.edu/factsheets/entry/bonobo
8) http://pin.primate.wisc.edu/factsheets/entry/chimpanzee
9) http://marmoset.mynumber.biz/index_files/Page741.htm
10) http://www.edu.pe.ca/southernkings/sqmonkey.htm