Neema Bhukhan
BME 230
Investigating the Importance of Conserved Non-coding Transcripts
Abstract:
The purpose of this project is to identify non-coding RNAs using transcription
data. I start by identifying the parts of the ENCODE region of the genome that are
actively transcribed, as measured by tiling arrays. I then subdivide these
transcribed regions into coding and non-coding transcripts, and divide the non-coding
transcripts into conserved and not conserved. The evofold track of the UCSC Genome
Browser is used to analyze the possible RNA structures of these transcripts, excluding any
retrotransposons. I searched the ENCODE region, observed patterns of conservation
and transcription, and analyzed the RNA structure predictions for these transcripts.
Introduction:
Little is known about which parts of the genome sequence are expressed as
mRNA transcripts and whether or not those transcripts are functional. Some regions are
conserved but not transcribed, and some sequences are transcribed but not conserved. The
relationship between transcription and conservation is not well understood, and it is
unclear whether there is even a direct correlation. Does conservation necessarily mean
transcription? If so, why are there numerous conserved sequences that are non-coding,
and why are they transcribed? Does non-coding necessarily mean unimportant?
Dubchak et al. (Dubchak et al., 2000) conducted large-scale
human-mouse DNA comparisons that revealed numerous conserved non-coding sequences, of
which only a small percentage have been functionally examined. Their inspection revealed almost
identical patterns of non-coding sequence conservation in human, dog, and mouse DNA.
Of the 14 conserved non-coding sequences found, 2 were determined to be gene regulatory
elements. Their results suggest that a large fraction of the non-coding elements
identified are conserved because of functional constraints.
Cawley et al. (Cawley et al., 2004) report data suggesting that protein-coding
and non-coding genes have similar characteristics. Both have common transcription
factors binding in their promoter regions, and both respond to
environmental and developmental conditions, which suggests they might be
controlled by the same transcriptional machinery. These results suggest that non-coding
RNAs most likely have important biological functions.
Kampa et al. (Kampa et al., 2004) point out that the estimate of 30,000–40,000
genes in the human genome does not account for any non-coding RNAs. New classes of
non-coding RNAs have been discovered, such as small nucleolar RNAs, microRNAs, guide
RNAs, and antisense RNAs. Adding these to the gene count would greatly increase
the complexity of the human genome.
Kapranov et al. (Kapranov et al., 2002) used an empirical approach to create a
collection of transcript maps. This approach allows the identification of new regions of
transcription, the detection of RNA transcripts with little or no coding capacity, and
the identification of RNA isoforms of previously annotated genes. Having found new
transcripts, they ask why these transcripts were not observed previously and what their
function is. They point out that non-coding RNAs are emerging as a functional class of
transcripts important for splicing, nucleolar and ribosomal structures, telomeric sequence
addition, transport and insertion of proteins into membranes, and down-regulation of
translation. Characterizing the functions of these transcripts is an ongoing task,
and the results may eventually lead to the discovery of a hidden transcriptome.
It is currently difficult to identify non-coding RNAs, even though, as discussed
above, they are biologically important. Transcription tiling arrays give us information about
how often each base in the genome is transcribed under experimental conditions. Using
this transcription data, I attempt to search for transcribed non-coding RNA genes
using sequences from the ENCODE region of the human genome.
Methods:
Using the UCSC Genome Browser, I first examined the transfrags track, which is based
on tiling array data from Affymetrix. The transfrags cover regions of chromosomes 6, 7,
13, 14, 19, 20, 21, 22, X, and Y. Keeping in mind that this track only represents select
chromosomes, I used the Table Browser to intersect this track with the ENCODE track,
since I am only interested in the parts of these chromosomes that lie in the ENCODE region. I then intersected
the ENCODE transfrags with non-coding genes, using the known genes track, to find the
non-coding genes that are transcribed. I further divided these non-coding transfrags into
those that are conserved and those that are not conserved, using the most conserved track. I gathered
statistics on the number of conserved non-coding transcripts and non-conserved
non-coding transcripts.
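The intersections described above were done with the Table Browser, but the underlying operation can be illustrated with a minimal Python sketch. It is not the Table Browser's implementation. It assumes each track is a list of (chromosome, start, end) intervals with half-open coordinates; the function names `intersect` and `split_by_overlap` and all coordinates are hypothetical, invented for illustration.

```python
from collections import defaultdict

def intersect(a, b):
    """Return the overlapping pieces of two interval lists.

    Intervals are (chrom, start, end) tuples covering [start, end).
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in b:
        by_chrom[chrom].append((start, end))
    out = []
    for chrom, start, end in a:
        for b_start, b_end in by_chrom[chrom]:
            lo, hi = max(start, b_start), min(end, b_end)
            if lo < hi:  # keep only non-empty overlaps
                out.append((chrom, lo, hi))
    return sorted(out)

def split_by_overlap(intervals, reference):
    """Partition intervals into those overlapping any reference interval and the rest."""
    overlapping, rest = [], []
    for iv in intervals:
        (overlapping if intersect([iv], reference) else rest).append(iv)
    return overlapping, rest

# Toy tracks (made-up coordinates, not real annotation data).
transfrags = [("chr7", 100, 200), ("chr7", 300, 400), ("chr21", 50, 80)]
encode = [("chr7", 0, 350)]
noncoding = [("chr7", 150, 320)]
conserved = [("chr7", 160, 170)]

in_encode = intersect(transfrags, encode)        # transfrags within ENCODE
noncoding_tf = intersect(in_encode, noncoding)   # transcribed and non-coding
cons, not_cons = split_by_overlap(noncoding_tf, conserved)
print(len(cons), "conserved;", len(not_cons), "not conserved")
```

Here a transfrag counts as conserved if it overlaps a conserved element at all; a stricter rule, such as requiring a minimum fraction of overlap, would be an equally reasonable choice.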
Once I had these tracks, I excluded the retrotransposons from the transcripts. I then
used the evofold track to analyze the structure predictions for the non-coding transcripts. I
picked two of the highest-scoring predictions in non-coding transcripts to analyze their possible
functions.
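The filtering and ranking steps can be sketched in the same spirit. The snippet below assumes that transcripts and repeats are (chromosome, start, end) tuples and that each structure prediction carries a score as a fourth field. The helper names and every value are hypothetical, and discarding a transcript that touches a repeat anywhere is one possible exclusion rule, not necessarily the one used here.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def exclude_overlapping(intervals, masked):
    """Drop every interval that overlaps any masked interval (e.g. a repeat)."""
    return [iv for iv in intervals if not any(overlaps(iv, m) for m in masked)]

def top_predictions(predictions, transcripts, n=2):
    """Return the n highest-scoring predictions that fall in a kept transcript.

    Each prediction is (chrom, start, end, score).
    """
    hits = [p for p in predictions if any(overlaps(p[:3], t) for t in transcripts)]
    return sorted(hits, key=lambda p: p[3], reverse=True)[:n]

# Toy data (made-up coordinates and scores).
transcripts = [("chr7", 100, 200), ("chr7", 300, 400), ("chr7", 500, 600)]
repeats = [("chr7", 310, 330)]
predictions = [
    ("chr7", 120, 140, 10),
    ("chr7", 150, 170, 30),
    ("chr7", 320, 340, 99),  # lies in a transcript removed by the repeat filter
    ("chr7", 510, 530, 20),
]

kept = exclude_overlapping(transcripts, repeats)
best = top_predictions(predictions, kept)
print(best)
```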
Results:
In the ENCODE region, I found that 0.87% consists of conserved non-coding transcripts and
10.48% consists of non-coding transcripts that are not conserved. When the retrotransposons were
excluded, evofold predictions accounted for 0.09% for the conserved transcripts and
0.65% for the transcripts that are not conserved.
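A percentage of this kind can be computed as covered bases divided by the total number of bases in the region. The sketch below shows that arithmetic under the assumption that the figures are base-level coverage fractions, which the text does not state explicitly. The function names, the region size, and the coordinates are hypothetical.

```python
def covered_bases(intervals):
    """Count bases covered by (chrom, start, end) intervals, counting overlaps once."""
    total = 0
    last_chrom, last_end = None, 0
    for chrom, start, end in sorted(intervals):
        if chrom != last_chrom:
            last_chrom, last_end = chrom, 0
        start = max(start, last_end)  # skip bases already counted
        if end > start:
            total += end - start
            last_end = end
    return total

def percent_covered(intervals, region_bases):
    """Percentage of a region of region_bases bases covered by the intervals."""
    return 100 * covered_bases(intervals) / region_bases

# Toy example: two overlapping intervals cover 20 bases of a 1,000-base region.
toy = [("chr7", 0, 10), ("chr7", 5, 20)]
print(percent_covered(toy, 1000))
```

Merging overlaps before counting matters because a base shared by two intervals would otherwise be counted twice and inflate the percentage.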
Of these evofold predictions, I picked the top-scoring structures with possible
functional importance. Figure 1 shows a conserved non-coding structure prediction with a
score of 645. Figure 2 shows another conserved non-coding structure prediction with a
score of 699.
Figure 1: Location of the evofold structure prediction with a score of 645. Details of the structure
prediction are in the attached score645.pdf.
Figure 2: Location of the evofold structure prediction with a score of 699. Details of the structure
prediction are in the attached score699.pdf.
Discussion:
The evofold structure prediction with the score of 645 is located near the FOXP2
gene. The product of the FOXP2 gene is thought to be needed for proper development of
the speech and language regions of the brain during embryogenesis. The fact that this
conserved non-coding transcript is located near this gene suggests the
possibility that it may contribute in some way to the function of this gene.
The evofold structure prediction with a score of 699 is located near the end of the same
FOXP2 gene, suggesting a similar relationship.
The results I have obtained suggest that conserved non-coding genes are most likely
transcribed for a functional reason. Non-coding transcripts should not be disregarded,
because they can have other relevant functions; the fact that a transcript is not translated into a
protein does not mean it is unimportant. Important non-coding transcripts have been
discovered in recent years, and function may be why some of these non-coding sequences
are conserved. This begins to answer the question of how conservation and
transcription are related and suggests that there may be some type of correlation. However, there is no direct
relationship between the function of a transcript and whether it is coding or non-coding. The
results that I have found agree with the research in this area discussed earlier.
Further investigation in this area could lead to the discovery of many new functionally
important transcripts that are not currently accounted for in the human genome.
References:
Dubchak, Inna et al. “Active Conservation of Non-coding Sequences Revealed by
Three-Way Species Comparisons.” Genome Research. Vol 10, Issue 9: 1304-1306.
September, 2000.
Kampa, D et al. “Novel RNAs identified from an in-depth analysis of the transcriptome of
human chromosomes 21 and 22.” Genome Research. Vol 14, Issue 3: 331-342. March,
2004.
Cawley, Simon et al. “Unbiased mapping of transcription factor binding sites along human
chromosomes 21 and 22 points to widespread regulation of non-coding RNAs.” Cell. Vol
116, Issue 4: 499-509. February, 2004.
Kapranov, Philipp et al. “Large-Scale Transcriptional Activity in Chromosomes 21 and
22.” Science. Vol 296, Issue 5569: 916-919. May, 2002.