Hoffman 1
Connor Hoffman
Dr. Bert Ely
Biology 303
1 November 2013

Genetic Models: How they are used and (occasionally) abused

Deciphering the genetic patterns present in biological organisms is an important part of understanding life on earth. Huge amounts of time and money are spent every year in an effort to discover more about how life progresses and changes from one generation to the next. An important aspect of this research is therefore efficiency: anything that lets researchers perform their experiments faster, more cheaply, and more easily gives them more time to refine those experiments or to test other, equally important hypotheses. One of the most helpful tools for making genetic research more efficient and for understanding the complexity of biological processes is the use of models (Meng, Somani, and Dhar, 2004). The amount of genetic data that needs to be processed for comparative studies is enormous, and manually searching sequenced genomes for the processes or systems the researchers are looking for is not very efficient. Models remove some of the ‘legwork’ involved and allow researchers to test their hypotheses more easily (Meng, Somani, and Dhar, 2004).

It is very important to note, however, that genetic models are not perfect and can produce badly misleading results if used improperly. One way in which models are abused is when researchers use them to predict genetic systems in organisms outside the parameters of the model (Long et al., 2013). This paper provides an in-depth look at how genetic models can be helpful when used properly and misleading when used improperly.

To illustrate the process of model use and misuse, a model must first be selected. We chose an algorithmic model that predicts the locations of CpG islands in genomes. It is important to understand what a CpG island is before its purpose can be discussed.
“CpG islands” are stretches of DNA that are unusual in being predominantly non-methylated, which allows them to resist being down-regulated and ‘turned off’ by epigenetic methylation (Weber and Schubeler, 2007; Deaton and Bird, 2011). The first CpG islands were discovered in mammals, and researchers were curious as to why certain segments of DNA were exempt from epigenetic regulation. They examined the DNA of both humans and mice and discovered similarities in the locations of their CpG islands. The researchers found that the majority (more than 50%) of CpG islands contain transcription initiation sites, also called transcription start sites (TSSs), for mammalian genes (Figure 1) (Takai and Jones, 2002; Deaton and Bird, 2011).

Figure 1: The distribution of CpG islands in the human and mouse genomes. Over 50% of the islands are located at transcription initiation sites of mammalian genes (Deaton and Bird, 2011). This experiment was performed to confirm the results of previous experiments. The original data were used to create the genetic model being discussed.

The affinity that these CpG islands show for gene transcription sites points to an interesting epigenetic system. One of DNA’s most important functions is to be transcribed into RNA, which in turn is used to make proteins for the cell. Transcription can begin only at transcription initiation sites. If those sites cannot be accessed, the gene cannot be expressed properly. The absence of methylation at these mammalian transcription sites therefore keeps them from being silenced by epigenetic methylation (Long et al., 2013).

Now that CpG islands had been identified in humans and mice, researchers wanted to determine the locations of these non-methylated DNA segments in other vertebrates. Instead of painstakingly searching each species’ DNA for CpG islands, they adopted a more efficient approach: creating a genetic model.
While mapping the islands in humans and mice, researchers noticed properties shared by each of the located segments: each was at least 500 bp long, had a GC (guanine and cytosine) content of at least 55%, and had an observed/expected CpG ratio of at least 0.65 (Takai and Jones, 2002). Using the properties that Takai and Jones identified, algorithms were developed to help researchers identify where CpG islands could be found in other species’ genomes (Sharif et al., 2010). With this new algorithm widely available, scientists could easily perform vertebrate comparative genomics, promoter mapping, and epigenetic studies using the model as a proxy for non-methylated DNA (Bock et al., 2007; Han and Zhao, 2008). The model made locating CpG islands easier and opened the door to a multitude of studies exploring the evolutionary implications of these non-methylated segments of DNA.

The mammalian study that identified the correlation between transcription sites and CpG islands illustrated an important biological system in which transcription sites were protected from being silenced by methylation. This finding was interesting, but researchers wondered whether other vertebrates, such as cold-blooded ones, had conserved the same system. A study was carried out in which the “gold standard” genetic model was used to locate the CpG islands in several species, both warm-blooded and cold-blooded. The researchers chose six species: three warm-blooded (G. gallus, M. musculus, and H. sapiens) and three cold-blooded (C. intestinalis, D. rerio, and X. tropicalis). In the past, finding the CpG islands in all of these genomes experimentally would have required a substantial amount of time and money, but the researchers could now use the algorithmic model to obtain the predicted locations easily.
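The three sequence criteria from Takai and Jones (2002) mentioned earlier (length, GC content, and observed/expected CpG ratio) can be made concrete with a short Python sketch. It is only an illustration under stated assumptions, not the published algorithm: the function names, the fixed-size window scan, and the step size are hypothetical choices made here, and the observed/expected ratio uses the commonly used formula (number of CpG dinucleotides × sequence length) / (number of C × number of G).

```python
def gc_content(seq: str) -> float:
    """Fraction of bases in seq that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def obs_exp_cpg(seq: str) -> float:
    """Observed/expected CpG ratio: (#CpG * length) / (#C * #G)."""
    seq = seq.upper()
    c_count = seq.count("C")
    g_count = seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0  # no CpG dinucleotide is possible without both C and G
    return seq.count("CG") * len(seq) / (c_count * g_count)


def meets_criteria(seq: str, min_len: int = 500,
                   min_gc: float = 0.55, min_oe: float = 0.65) -> bool:
    """True if seq passes all three thresholds (defaults from the essay)."""
    return (len(seq) >= min_len
            and gc_content(seq) >= min_gc
            and obs_exp_cpg(seq) >= min_oe)


def scan_windows(genome: str, window: int = 500, step: int = 100):
    """Return (start, end) for each fixed-size window that passes the criteria.

    Assumption: a plain fixed-window scan, used here only for illustration;
    real tools extend and merge windows in more elaborate ways.
    """
    hits = []
    for start in range(0, len(genome) - window + 1, step):
        chunk = genome[start:start + window]
        if meets_criteria(chunk, min_len=window):
            hits.append((start, start + window))
    return hits


if __name__ == "__main__":
    # Toy genome: AT-rich flanks around a CpG-rich core (purely synthetic).
    toy = "AT" * 500 + "CG" * 250 + "AT" * 500
    for start, end in scan_windows(toy):
        print(f"candidate CpG island window: {start}-{end}")
```

Because the thresholds are plain function arguments with mammalian-derived defaults, the sketch makes visible the assumption built into this kind of model: the same cutoffs are applied to every genome, whatever species it comes from.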
Using the genetic model, which was based on experimental data from humans and mice, researchers found that not all vertebrates seemed to maintain the overlap between CpG islands and transcription initiation sites (the CpG-TSS overlap) (Figure 2) (Sharif et al., 2010).

Figure 2: A comparison of the proximity of CpG islands to transcription initiation sites in the DNA of different vertebrates. There is a clear distinction: the warm-blooded vertebrates show high levels of CpG island-transcription site overlap, and the cold-blooded vertebrates show low levels (Sharif et al., 2010).

Despite the logical benefits that CpG-TSS overlap would provide for both warm-blooded and cold-blooded vertebrates, the algorithm’s output supported the notion that this overlap was not conserved as vertebrate lineages diverged (Sharif et al., 2010).

This idea prevailed until Hannah Long and her colleagues questioned the accuracy of the model’s predictions. They decided to perform a study comparing the model’s predictions with the experimentally determined CpG island locations in several species, both warm-blooded and cold-blooded. Earlier work had suggested that the predictive models might not be completely accurate when locating non-mammalian CpG islands (Han and Zhao, 2008), which makes sense given that the model was built using only mammals (humans and mice, to be exact), but no one had yet tested this hypothesis directly. Long’s study included seven vertebrates, chosen for their positions near key branching points in vertebrate evolution: human, mouse, platypus, chicken, lizard, frog, and zebrafish. First, the predicted CpG island locations were obtained using the genetic model.
As in previous studies, there were high levels of overlap between the CpG islands and transcription sites in the mammalian species, while there was little to no overlap in many of the other species. Second, DNA from each species was sequenced and the actual locations of the CpG islands were determined experimentally. The results provided an interesting perspective on the effectiveness of the genetic model (Figure 3) (Long et al., 2013).

Figure 3: The three black lines for each species represent the genes that were analyzed in this study. The green lines indicate where the model predicted the CpG islands would be. The blue peaks indicate where the researchers found the actual CpG islands (Long et al., 2013).

As expected, the algorithmic predictions and actual locations of the CpG islands on the mammalian genes are nearly identical, and there is substantial overlap between the CpG islands and the transcription initiation sites, consistent with previous studies. In the other species, however, the model’s predictions and the actual locations of the CpG islands diverge. When the islands were located experimentally rather than predicted, there was a near-perfect overlap between CpG islands and transcription sites in all of the species, both warm-blooded and cold-blooded. The data provided evidence that this important biological system was in fact conserved over millions of years of divergent evolution and, more importantly, that the model had been used to make predictions outside the parameters for which it was built (Long et al., 2013).

In summary, genetic models can be incredibly helpful, making it easier and more efficient to test hypotheses and compare data, but they must be treated with care: researchers must acknowledge that no model is perfect and that every model has parameters.
If the model is used to predict results within its parameters (e.g., comparing mammals to other mammals), it is very successful at easing the collection and management of data. Problems arise only when the model is relied upon too heavily and used to make predictions outside the range of input data for which it was created (e.g., comparing warm-blooded vertebrates to cold-blooded vertebrates), leading to potentially flawed results and inaccurate conclusions.

Works Cited

Bock, Christoph, Jorn E. Walter, Martina Paulsen, and Thomas Lengauer. "CpG Island Mapping by Epigenome Prediction." PLoS Computational Biology 3.6 (2007): e110. Print.

Deaton, Aimee, and Adrian Bird. "CpG Islands and the Regulation of Transcription." Genes and Development 25.10 (2011): 1010-22. Genesdev.org. 2011. Web. 25 Oct. 2013. <http://genesdev.cshlp.org/content/25/10/1010>.

Han, Leng, and Zhongming Zhao. "Comparative Analysis of CpG Islands in Four Fish Genomes." Comparative and Functional Genomics 2008 (2008): 1-7. Print.

Long, Hannah K., David Sims, Andreas Heger, Neil P. Blackledge, Claudia Kutter, Megan L. Wright, Frank Grutzner, Duncan T. Odom, Roger Patient, Chris P. Ponting, and Robert J. Klose. "Epigenetic Conservation at Gene Regulatory Elements Revealed by Non-methylated DNA Profiling in Seven Vertebrates." ELife (2013): n. pag. Elife.elifesciences.org. ELife Sciences Publications, 26 Feb. 2013. Web. 5 Oct. 2013. <http://elife.elifesciences.org/content/2/e00348>.

Meng, Tan Chee, Sandeep Somani, and Pawan Dhar. "Modeling and Simulation of Biological Systems with Stochasticity." Bioinformation Systems (2004): n. pag. Bioinfo.de. 16 Apr. 2004. Web. 21 Oct. 2013.

Sharif, Jafar, Takaho A. Endo, Tetsuro Toyoda, and Haruhiko Koseki. "Divergence of CpG Island Promoters: A Consequence or Cause of Evolution?" Development, Growth & Differentiation 52.6 (2010): 545-54. Onlinelibrary.wiley.com. 21 July 2010. Web. 5 Oct. 2013. <http://onlinelibrary.wiley.com/doi/10.1111/j.1440-169X.2010.01193.x/full>.

Takai, D., and P. Jones. "Comprehensive Analysis of CpG Islands in Human Chromosomes 21 and 22." Proceedings of the National Academy of Sciences 99.6 (2002): 3740-45. PubMed. 12 Mar. 2002. Web. 21 Oct. 2013.

Weber, M., and D. Schubeler. "Genomic Patterns of DNA Methylation: Targets and Function of an Epigenetic Mark." Current Opinion in Cell Biology 19.3 (2007): 273-80. Print.