The Effect of Oxygen on Biochemical Networks and the Evolution of Complex Life
by Jason Raymond and Daniel Segre
Published in Science (2006) 311, 1764

Modern genome research often requires intensive analysis because of the growing volume of genomic data. Such analysis typically produces a visual model that illustrates the knowledge embedded in the data set. This article presents one such approach.

SUMMARY

The enzymatic reactions that take place in oxygen-dependent networks evolved after molecular oxygen appeared on Earth around 2.2 billion years ago, and some of these adaptations may have been important in a subsequent explosion of multicellular life. To test whether oxygen is required for the largest and most complex networks, the authors used a method called metabolic network expansion to simulate the metabolic networks that could develop under different conditions. The simulations drew on a database of more than 6,000 different enzymatic reactions derived from 70 genomes. After running 100,000 simulations, the authors found that every generated metabolic network could be placed in one of four major groups, with the networks in each group sharing more than 95% of their reactions and metabolites. The four groups differed in size and in level of connectivity, and the smaller groups mainly contained reactions and metabolites that were also present in the larger groups. The largest and most interconnected of the four groups was reached only in simulations that included molecular oxygen. Networks from this group contained as many as 1,000 more reactions than the networks generated without oxygen. Interestingly, about half of these oxygen-dependent reactions do not explicitly use oxygen but belong to pathways that rely on it.
These types of oxygen-dependent reactions would not have been found in studies of individual reactions; detecting them requires a whole-metabolome approach.

1. The effect of oxygen on the metabolic "backbone" (pictured as a pruned version of the full network, with 1861 metabolites and 2652 possible reactions). Blue nodes and edges represent anoxic network metabolites and reactions, respectively. Red nodes and edges are oxic network metabolites and reactions (contingent on oxygen availability either directly or secondarily). Green edges correspond to reactions that are found only in the oxic network but use at least one anoxic metabolite, representing replacement or rewiring of anoxic pathways to take advantage of oxygen. This figure was generated with VisANT (Z Hu, J Mellor, J Wu and C DeLisi (2004) VisANT: an online visualization and analysis tool for biological interaction data, BMC Bioinformatics 5:17).

2. The effect of various metabolites (legend at right) on the total number of reactions in full-level metabolic networks, as computed with the network expansion algorithm. Each point represents two consecutively generated networks: the first network, whose size is the abscissa value, is generated from a randomly chosen set of seed metabolites, and the second network, whose size is the ordinate value, is generated from that same seed set with one of the nine metabolites shown in the legend added. Points are colored according to the added metabolite. All networks fall into four broadly similar groups (bold lines and roman numerals) and subgroups (H, higher; L, lower) that result from often very different but chemically interconvertible seed sets. Only networks that include O2 as a metabolite are able to transition into group IV (dashed line), with other transitions being determined by the availability of key metabolites.
Notably, transitions between smaller groups, as well as the splitting of groups II to IV into higher (H) and lower (L) clusters, are determined by the availability of biomolecules that are involved in the assimilation and cycling of key elements and whose essentiality may have manifested early in the evolution of metabolism.

3. (A) Similarity in anoxic network enzyme distribution in 44 different genomes, fit to two dimensions and broadly consistent with genome-based phylogenies. The MDS plot was generated from the matrices of similarity in enzyme distribution between organisms. Blue points are obligate aerobes, yellow points are facultative aerobes that also have anaerobic modes of growth, and maroon points are strict anaerobes. (B) Similarity in oxic network enzyme distribution in 44 different genomes, largely inconsistent with organismal phylogeny but, as illustrated, following a trend consistent with an aerobic versus anaerobic growth mode. The distributions of enzymes in the anoxic networks were consistent with the tree of life, whereas those of enzymes in the oxic networks were not. Despite limited data from strict anaerobes, the underlying trend for most organisms is consistent with their preferred aerobic versus anaerobic lifestyle rather than with their taxonomic relationships.

4. Increase in the total number of reactions catalyzed by individual genomes after the inclusion of O2 in a network originally seeded with N2, H2S, CO2, and the cofactors pyridoxal phosphate, ATP/ADP, THF, and NADP/H. Horizontal bars represent the percent increase in oxic versus anoxic network size on a genome-by-genome basis, colored according to the growth mode of the organism (colors are consistent with those in the previous figure). The inset superimposes these data on a species tree for these organisms, showing that adaptation to O2 has occurred throughout the tree of life.

Enzymes specific to the oxic network expansion were also observed in strict anaerobes.
A closer examination sorted these enzymes into two categories: O2-dependent enzymes known to be involved in the removal of oxygen radicals, and enzymes or pathways of unclear function that may serve as substitutes under anoxic conditions.

The researchers also grouped organisms according to which anoxic enzymes they possessed, and found that the organisms clustered in the same way as in traditional genome-based phylogenies. However, when they grouped organisms by the enzymes needed for oxic metabolism, the clusters did not mirror patterns of speciation but instead depended on whether an organism prefers an aerobic or anaerobic lifestyle. This finding suggests that adaptation to molecular oxygen occurred either independently in different lineages or through horizontal gene transfer after all three domains of life had appeared.

This research falls under knowledge discovery (data mining) as described in the textbook. As in the graphics and visualization process, the authors generate a model to illustrate the information embedded in the data set. This type of visualization pipeline, called model visualization, is an explicit approach in genome research, one with which scientists hope to present specific results to users.
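
The network expansion idea mentioned in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation: the four-reaction pool, the metabolite names (A through F), the `Reaction` type, and the `expand_network` function are all hypothetical, and reactions are treated as one-directional for simplicity. Starting from a seed set, the loop repeatedly activates any reaction whose substrates are all available and adds its products, stopping when nothing new can be added. Running it with and without O2 in the seed set shows how a reaction can depend on oxygen without using it directly.

```python
from typing import FrozenSet, Iterable, List, NamedTuple, Set, Tuple


class Reaction(NamedTuple):
    """A toy one-directional reaction (hypothetical, for illustration only)."""
    name: str
    substrates: FrozenSet[str]
    products: FrozenSet[str]


def expand_network(
    seeds: Iterable[str], reactions: List[Reaction]
) -> Tuple[Set[str], Set[str]]:
    """Grow a network from seed metabolites until no new reaction can fire.

    Returns the set of reachable metabolites and the names of active reactions.
    """
    metabolites: Set[str] = set(seeds)
    active: Set[str] = set()
    changed = True
    while changed:
        changed = False
        for rxn in reactions:
            # A reaction becomes active once all of its substrates are available.
            if rxn.name not in active and rxn.substrates <= metabolites:
                active.add(rxn.name)
                metabolites |= rxn.products
                changed = True
    return metabolites, active


# Hypothetical reaction pool; the paper's database holds thousands of reactions.
TOY_REACTIONS = [
    Reaction("r1", frozenset({"A", "B"}), frozenset({"C"})),
    Reaction("r2", frozenset({"C"}), frozenset({"D"})),
    Reaction("r3", frozenset({"D", "O2"}), frozenset({"E"})),  # uses O2 directly
    Reaction("r4", frozenset({"E"}), frozenset({"F"})),        # needs r3's product
]

anoxic_mets, anoxic_rxns = expand_network({"A", "B"}, TOY_REACTIONS)
oxic_mets, oxic_rxns = expand_network({"A", "B", "O2"}, TOY_REACTIONS)

# Reactions reached only when O2 is part of the seed set.
oxygen_dependent = oxic_rxns - anoxic_rxns
# Of those, the ones that do not list O2 among their substrates.
indirect = {
    r.name for r in TOY_REACTIONS
    if r.name in oxygen_dependent and "O2" not in r.substrates
}

print("anoxic reactions:", sorted(anoxic_rxns))
print("oxygen-dependent reactions:", sorted(oxygen_dependent))
print("oxygen-dependent without using O2:", sorted(indirect))
```

In this toy pool, `r4` never consumes O2, yet it only becomes reachable once `r3` has produced `E`. This is the kind of secondary dependence that a reaction-by-reaction survey would miss and a whole-network expansion exposes.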