Hoffman 1
Connor Hoffman
Dr. Bert Ely
Biology 303
1 November 2013
Genetic Models: How they are used and (occasionally) abused
Deciphering the genetic patterns present in biological organisms is an important
part of understanding life on earth. Huge amounts of time and money are spent every year
in an effort to discover more about how life progresses and changes from one generation
to the next. An important aspect of this research is therefore efficiency: the ability to
perform experiments faster, more cheaply, and more easily, which gives researchers
more time either to refine their experiments or to test other, equally important
hypotheses. One of the most helpful tools for making genetic research more efficient and
understanding the complexity of biological processes is the use of models (Meng,
Somani, and Dhar, 2004).
The amount of genetic data that must be processed for comparative studies is
enormous, and manually sequencing genomes to find the processes or systems that
researchers are looking for is inefficient. Models help by removing some
of the ‘legwork’ involved, allowing researchers to test their hypotheses more easily
(Meng, Somani, and Dhar, 2004). It is very important to note, however, that genetic
models are not perfect, and can produce disastrous results if used improperly. One of the
ways in which models are abused occurs when researchers predict the genetic systems in
organisms outside the parameters of the model (Long et al., 2013). This paper provides
an in-depth look into how genetic models can be both helpful when used properly, and
misleading when used improperly.
To illustrate the process of model use and misuse, a model must first be selected.
We chose an algorithmic model that predicts the locations of CpG islands on genomes. It
is important to understand what a CpG island is before its purpose can be discussed.
“CpG islands” are lengths of DNA that are unique in that they are predominantly
non-methylated, allowing them to resist being down-regulated and ‘turned off’ by epigenetic
methylation (Weber and Schubeler, 2007; Deaton and Bird, 2011). The first CpG islands
were discovered in mammals and researchers were curious as to why certain segments of
DNA were exempted from epigenetic regulation. They sequenced the DNA of both
humans and mice and discovered similarities in the locations of their CpG islands. The
researchers found that within the majority [more than 50%] of CpG Islands there are
transcription initiation sites [referred to as TTSs] for mammalian genes (Figure 1) (Takai
and Jones, 2002; Deaton and Bird, 2011).
Figure 1: This figure shows the distribution of CpG islands on both the human and mouse
genomes. It can be seen that over 50% of the islands are located on transcription sites
for the mammalian genes (Deaton and Bird, 2011). This experiment was performed to
confirm the results of previous experiments. This original data was used to create the
genetic model being discussed.
The affinity that these CpG islands show for gene transcription sites points to
an interesting epigenetic system. DNA’s most important function is to serve as the
template for RNA transcription and, as a result, for the production of the cell’s proteins.
Transcription can begin only at a gene’s transcription initiation sites. If those sites
cannot be accessed, the gene cannot be expressed properly. The non-methylated state of
these mammalian transcription sites therefore prevents them from being silenced by
epigenetic methylation (Long et al., 2013).
Once these CpG islands had been identified in humans and mice, researchers
wanted to determine the locations of these non-methylated DNA segments in other
vertebrates. Instead of painstakingly sequencing each species’ DNA in search of CpG
islands, they adopted a more efficient approach: creating a genetic model. While mapping
the islands in humans and mice, researchers noticed that the located segments shared
three properties: each was at least 500 bp long, had a GC [guanine and cytosine]
content of at least 55%, and had an Observed/Expected CpG ratio of at least 0.65
(65%) (Takai and Jones, 2002). Using the genetic properties that
Takai and Jones discovered, algorithms were developed to help researchers identify where
CpG islands could be found on other species’ genomes (Sharif et al. 2010).
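The three thresholds described above can be turned into a short check. The following is only a minimal sketch in Python, not the algorithm used in the cited studies. It assumes a commonly used definition of the Observed/Expected CpG ratio (the number of CpG dinucleotides divided by the number expected if C and G were placed independently), and the function names, window size, and step size are hypothetical choices made for illustration.

```python
def cpg_stats(seq):
    """Return (length, GC fraction, observed/expected CpG ratio) for a DNA string."""
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # observed CpG dinucleotides
    gc_fraction = (c + g) / n if n else 0.0
    # Assumed definition: expected CpG count = (#C * #G) / length
    expected = (c * g) / n if n else 0.0
    obs_exp = cpg / expected if expected else 0.0
    return n, gc_fraction, obs_exp


def meets_thresholds(seq, min_len=500, min_gc=0.55, min_obs_exp=0.65):
    """True if the sequence passes all three thresholds quoted in the text."""
    n, gc_fraction, obs_exp = cpg_stats(seq)
    return n >= min_len and gc_fraction >= min_gc and obs_exp >= min_obs_exp


def candidate_windows(genome, window=500, step=100):
    """Return start positions of fixed-size windows that pass the thresholds.

    The window and step sizes are illustrative assumptions only.
    """
    return [
        start
        for start in range(0, len(genome) - window + 1, step)
        if meets_thresholds(genome[start:start + window])
    ]


if __name__ == "__main__":
    # Toy sequence: a CpG-rich stretch flanked by AT-rich DNA.
    toy = "AT" * 300 + "CG" * 300 + "AT" * 300
    print(candidate_windows(toy))
```

Because every threshold in this sketch was derived from human and mouse data, a segment that fails the check in another species is not necessarily methylated; the check only reports whether the sequence resembles a mammalian CpG island.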
With the distribution of this new genetic algorithm, scientists could easily perform
vertebrate comparative genomics, promoter mapping, and epigenetic studies using the
model as a proxy for non-methylated DNA (Bock et al., 2007; Han and Zhao, 2008). The
model made locating CpG islands easier, and opened the door for a multitude of studies
exploring the evolutionary implications surrounding these non-methylated segments of
DNA. The mammalian study which identified the correlation between transcription sites
and CpG islands illustrated the important biological system in which transcription sites
were prevented from being silenced by methylation. This finding was interesting, but
researchers wondered if other (such as cold-blooded) vertebrates had evolved to conserve
the same biological system.
A study was carried out in which the “gold standard” genetic model was used to
locate the CpG islands in several different vertebrates (both warm-blooded and
cold-blooded). Essentially, researchers chose six different species of vertebrate, three
warm-blooded (G. gallus, M. musculus, and H. sapiens) and three cold-blooded (C. intestinalis,
D. rerio, and X. tropicalis). If the study had been done in the past, sequencing all of these
genomes in order to find the locations of the CpG islands would require a substantial
amount of time and money, but the researchers could now use the algorithmic model to
easily acquire the locations of the CpG islands. By using the genetic model (that was
based on experimental data from humans and mice), it was discovered that not all
vertebrates seemed to maintain the “CpG island-Transcription site” biological overlap
(Figure 2) (Sharif et al., 2010).
Figure 2: A comparison of proximity of CpG islands to transcription initiation sites on the
DNA of different vertebrates. There is a clear distinction between warm-blooded
vertebrates and cold-blooded vertebrates in that the warm-blooded vertebrates have high
levels of CpG island-transcription site crossover and the cold-blooded vertebrates have
low levels of crossover (Sharif et al., 2010).
Despite the logical benefits that CpG-TTS crossover would provide for both
warm-blooded and cold-blooded vertebrates, the algorithm’s output supported the
notion that, at some point during the divergent evolution of vertebrates, the process of
CpG-TTS crossover was not conserved (Sharif et al., 2010).
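The kind of crossover measurement discussed above can be illustrated with a small calculation: given a list of transcription initiation sites and a list of island intervals, count the share of sites that fall inside an island. This is a sketch under stated assumptions rather than the procedure of Sharif et al.; the function name and all coordinates are hypothetical, and intervals are treated as half-open (start included, end excluded).

```python
def fraction_of_sites_in_islands(site_positions, islands):
    """Share of transcription initiation sites that lie inside any island.

    site_positions: list of integer genome coordinates (hypothetical data).
    islands: list of (start, end) half-open intervals (hypothetical data).
    """
    if not site_positions:
        return 0.0
    hits = sum(
        any(start <= site < end for start, end in islands)
        for site in site_positions
    )
    return hits / len(site_positions)


if __name__ == "__main__":
    # Made-up coordinates, for illustration only.
    sites = [100, 550, 2000, 3050]
    islands = [(500, 1200), (3000, 3600)]
    print(fraction_of_sites_in_islands(sites, islands))
```

A low value from such a calculation can mean either that islands are truly absent from transcription sites or that the island list itself is incomplete, which is why the source of the island coordinates matters.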
This idea prevailed until Hannah Long and her colleagues
questioned the accuracy of the model’s predictions. They decided to perform a study
comparing the predictions of the CpG prediction model to the actual sequenced CpG
island locations of several different species (both warm-blooded and cold-blooded).
There had been several previous papers suggesting that the predictive models may not be
completely accurate when locating non-mammalian CpG islands (Han and Zhao, 2008),
which makes sense considering that the model itself was created using only mammals as
its parameters (humans and mice, to be exact), but no one had actually tested this
hypothesis yet.
Long’s study included seven different vertebrates, selected for their proximity to
select branching points in vertebrate evolution (human, mouse, platypus, chicken, lizard,
frog, and zebrafish). First, the predicted CpG locations were obtained using the genetic
model. As in previous studies, there were high levels of overlap between the CpG islands
and transcription sites in the mammalian species, while there was little to no crossover in
many of the other selected species. Second, the genome of each species was sequenced
and the true locations of the CpG islands were found. The results provided interesting
perspective on the effectiveness of the genetic modeling (Figure 3) (Long et al., 2013).
Figure 3: The three black lines for each species represent the genes that were analyzed in
this study. The green lines indicate where the model predicted the CpG islands would be.
Lastly, the blue peaks indicate where the researchers found the actual CpG islands to be
(Long et al., 2013).
As expected, the algorithmic predictions and actual locations of the CpG islands
on the mammalian genes are nearly identical, and there is a significant amount of overlap
between the CpG islands and the transcription initiation sites for the genes, consistent
with the previous studies. The other species, however, show discrepancies between
the model’s predictions and the actual locations of the CpG islands. When the genomes
were sequenced and the locations of the islands were found based on the nucleotide
sequence, the CpG islands overlapped the transcription sites almost perfectly in
all of the species, both warm-blooded and cold-blooded. The data provided evidence that
this important biological system was, in fact, conserved over the millions of years of
divergent evolution and, more importantly, that the genetic model used for
predicting island locations had been applied to species that were not within its accepted
parameters (Long et al., 2013).
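A comparison between predicted and experimentally located islands, like the one described above, can be summarized with two simple proportions: how many observed islands are overlapped by at least one prediction, and how many predictions are overlapped by at least one observed island. The sketch below is an illustration only and is not the analysis performed by Long et al.; the function names and the coordinates are hypothetical, and intervals are half-open.

```python
def overlaps(a, b):
    """True if half-open intervals a=(start, end) and b=(start, end) share a base."""
    return a[0] < b[1] and b[0] < a[1]


def agreement(predicted, observed):
    """Summarize how well predicted islands match observed islands."""
    recovered = sum(any(overlaps(o, p) for p in predicted) for o in observed)
    supported = sum(any(overlaps(p, o) for o in observed) for p in predicted)
    return {
        # share of observed islands that the model found
        "observed_recovered": recovered / len(observed) if observed else 0.0,
        # share of predictions backed by an observed island
        "predictions_supported": supported / len(predicted) if predicted else 0.0,
    }


if __name__ == "__main__":
    # Made-up coordinates for one hypothetical non-mammalian gene region.
    predicted = [(100, 700)]
    observed = [(150, 650), (5000, 5600), (9000, 9500)]
    print(agreement(predicted, observed))
```

In this made-up case every prediction is supported, yet most observed islands are missed, which is the pattern one would expect when a model trained on mammals is applied outside its parameters.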
In summary, genetic models can be incredibly helpful, making testing and
comparing data easier and more efficient, but they must be used with care:
researchers must acknowledge that no model is perfect and that every model has parameters. If
the model is used to predict results that fall within those parameters (e.g., comparing
mammals to other mammals), it is very successful at easing the process of
collecting and managing data. It is only when the model is relied upon too heavily and
used for prediction outside the range of input data for which it was created (e.g.,
comparing warm-blooded vertebrates to cold-blooded vertebrates) that problems can
arise, leading to potentially flawed results and inaccurate conclusions.
Works Cited
Bock, Christoph, Jorn E. Walter, Martina Paulsen, and Thomas Lengauer. "CpG Island
Mapping by Epigenome Prediction." PLoS Computational Biology Preprint.2007
(2005): E110. Print.
Deaton, Aimee, and Adrian Bird. "CpG Islands and the Regulation of
Transcription." Genes and Development 25 (2011): 1010-022. Genesdev.org.
2011. Web. 25 Oct. 2013. <http://genesdev.cshlp.org/content/25/10/1010>.
Han, Leng, and Zhongming Zhao. "Comparative Analysis of CpG Islands in Four Fish
Genomes." Comparative and Functional Genomics 2008 (2008): 1-7. Print.
Long, Hannah K., David Sims, Andreas Heger, Neil P. Blackledge, Claudia Cutter,
Megan L. Wright, Frank Grutzner, Duncan T. Odom, Roger Patient, Chris P.
Ponting, and Robert J. Klose. "Epigenetic Conservation at Gene Regulatory
Elements Revealed by Non-methylated DNA Profiling in Seven
Vertebrates." ELife (2013): n. pag. Elife.elifesciences.org. ELife Sciences
Publications, 26 Feb. 2013. Web. 5 Oct. 2013.
<http://elife.elifesciences.org/content/2/e00348>.
Meng, Tan Chee, Sandeep Somani, and Pawan Dhar. "Modeling and Simulation of
Biological Systems with Stochasticity." Bioinformation Systems (2004): n.
pag. Bioinfo.de. 16 Apr. 2004. Web. 21 Oct. 2013.
Sharif, Jafar, Takaho A. Endo, Tetsuro Toyoda, and Haruhiko Koseki. "Divergence of
CpG Island Promoters: A Consequence or Cause of Evolution?" Development,
Growth & Differentiation 52.6 (2010): 545-54. Onlinelibrary.wiley.com. 21 July
2010. Web. 5 Oct. 2013.
<http://onlinelibrary.wiley.com/doi/10.1111/j.1440-169X.2010.01193.x/full>.
Takai, D., and P. Jones. "Comprehensive Analysis of CpG Islands in Human
Chromosomes 21 and 22." PubMed (2002): n. pag. 12 Mar. 2002. Web. 21 Oct.
2013.
Weber, M., and D. Schubeler. "Genomic Patterns of DNA Methylation: Targets and
Function of an Epigenetic Mark." Current Opinion in Cell Biology 19.3 (2007):
273-80. Print.