Junk DNA and
DNA editing
Saturday night, 13 Iyar
17/05/2008
Shai Carmi
Bar-Ilan, BU
Genome structure
• DNA has mostly evolved to store the
code of the proteins its host cell is using.
• Thus, the main functional units of any genome are protein-coding genes.
• The central dogma of molecular biology: DNA (A, C, G, T) → RNA (A, C, G, U) → Protein (20 amino acids).
• Final product: proteins are the cellular machinery.
[Figure: DNA strand with its 5' and 3' ends labeled.]
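The DNA → RNA → Protein flow can be illustrated with a minimal Python sketch. The sequence and the tiny codon table are made up for illustration; the table holds only the codons this example needs, not the full genetic code.

```python
# Toy codon table: only the codons used in this example (an assumption,
# not the full 64-codon genetic code).
CODON_TABLE = {
    "AUG": "M",  # methionine (start)
    "UGG": "W",  # tryptophan
    "GGC": "G",  # glycine
    "UAA": "*",  # stop
    "UAG": "*",  # stop
    "UGA": "*",  # stop
}


def transcribe(coding_dna: str) -> str:
    """DNA -> RNA: the mRNA matches the coding strand with T replaced by U."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """RNA -> protein: read codons 5' to 3' and stop at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("ATGTGGGGCTAA")   # hypothetical coding sequence
protein = translate(mrna)
print(mrna, protein)
```

The alphabet shrinks from nucleotides to amino-acid letters at the translation step, which is where the "20 amino acids" in the figure come in.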
Man has no advantage over the beast?
• In humans, protein-coding sequences are only 2% of the genome.
• All animals have the same order of magnitude of genes (a few tens of thousands).
• Does non-coding DNA determine complexity?
• Is everything else junk?
Non-coding DNA
• The rest codes for introns, promoters and enhancers
(regulation of expression), structural sequences (e.g.
telomeres), non-coding RNAs such as rRNA and tRNA
(translation), micro-RNA (silencing), snRNA (splicing).
• But this is not all!
• Almost HALF of the human genome is made of
mobile elements.
• Pieces of ~100 to 10k base pairs that move around the genome by a cut&paste or a copy&paste mechanism.
DNA transposons
• DNA transposons: cut&paste using the enzyme transposase (3% of the genome).
• Sometimes they also transfer host sequences.
• They increase the genome volume only through repeats at the edges, or if transposition happens during S-phase.
Retrotransposons
• Retrotransposons: copy&paste mechanism through
RNA intermediate.
• Main classes:
  ▪ LTR (retrovirus-like, 8.7% of the genome).
  ▪ LINE (long interspersed nuclear elements, 21.3%).
  ▪ SINE (short interspersed nuclear elements, 13.6%).
• Retrotransposons behave like retroviruses.
• What are retroviruses?
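As a quick arithmetic check of the "almost half" figure, the genome fractions quoted on these slides can be summed. The dictionary is just a convenient container for the quoted percentages.

```python
# Genome fractions (percent) as quoted on the slides.
fractions = {
    "DNA transposons": 3.0,
    "LTR": 8.7,
    "LINE": 21.3,
    "SINE": 13.6,
}

retrotransposons = fractions["LTR"] + fractions["LINE"] + fractions["SINE"]
mobile_elements = retrotransposons + fractions["DNA transposons"]

print(f"retrotransposons: {retrotransposons:.1f}%")
print(f"all mobile elements: {mobile_elements:.1f}%")
```

The three retrotransposon classes add up to 43.6%, and including DNA transposons gives 46.6% of the genome.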
Retroviruses
• Retroviruses are pieces of single-stranded RNA, a few thousand bases long (other viruses carry DNA), wrapped in a capsid and an envelope.
• They penetrate into the cell, and use the cell machinery to
replicate, assemble a new virus, and infect another cell.
• Example: HIV.
Retroviral infection
Retroviral proteins (advanced)
• Pol: Encodes a polyprotein with:
  ▪ Protease (cleavage of the retrovirus proteins).
  ▪ Reverse transcriptase (copies the RNA to DNA).
  ▪ RNase H (degradation of the RNA after reverse transcription).
  ▪ Integrase (integration of the DNA into the genome).
• Gag: Codes for core and structural proteins of the virus.
• Env: Glycoprotein that recognizes membrane receptors of the host cell and initiates the process of infection.
• Complex splicing pattern, with partial overlap and frameshifting.
Retrotransposons
• It is commonly believed that ancient retroviral infections of the germ line are the origin of today's retrotransposons.
• How did they occupy 40% of the genome?
1. Transcription: genomic DNA→RNA.
2. Translation of viral proteins (if possible).
3. Reverse transcription: RNA → DNA by reverse
transcriptase.
RETRO: violating the central dogma!
4. Insertion into new genomic locations, increasing the
number of genomic copies of the sequence.
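The difference between the two mobility modes can be sketched with a toy genome modeled as a Python list. This is only an illustration: the element names and positions are hypothetical, and the RNA intermediate and reverse transcription (steps 1 to 3) are collapsed into a single copy operation.

```python
def cut_and_paste(genome, src, dst):
    """DNA-transposon style: the element leaves its old site and lands at dst."""
    genome = list(genome)          # work on a copy of the input
    element = genome.pop(src)
    genome.insert(dst, element)
    return genome


def copy_and_paste(genome, src, dst):
    """Retrotransposon style: the original stays and a new copy lands at dst."""
    genome = list(genome)
    genome.insert(dst, genome[src])
    return genome


genome = ["geneA", "TE", "geneB", "geneC"]   # "TE" is a hypothetical mobile element

moved = cut_and_paste(genome, src=1, dst=3)
copied = copy_and_paste(genome, src=1, dst=3)

print(moved, moved.count("TE"))
print(copied, copied.count("TE"))
```

Only the copy&paste route raises the copy number with every event, which is how repeated rounds can fill a large fraction of a genome.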
• Mobile elements are a double-edged sword.
Why are retrotransposons good?
• Serve as a reservoir of sequences for genetic innovation.
• Retroviral proteins have DNA binding capabilities
which can be exploited by the host cell.
• Regulate expression levels of existing genes.
• Change gene regulation networks:
• By copying a promoter, two sequences are
controlled by the same transcription factors (or in
other cases by RNA binding proteins or miRNA).
Why are retrotransposons bad?
• Retroelements generate mutations,
through direct insertion into genes,
or unequal homologous recombination.
• Responsible for 0.3-0.5% of all genetic disorders (e.g. hemophilia).
• Change the normal transcription of the gene (alter
promoter activity, anti-sense transcription, silencing
via methylation or miRNA binding).
• Alternative splicing and protein isoforms.
Examples
How can we stop them ???
Inhibition of retroelements
A few mechanisms exist:
• Accumulation of mutations results in non-autonomous elements.
• Methylation and heterochromatin formation attenuate transcription (LINE).
• RNA interference.
• DNA editing (more to come).
• Did we succeed? Probably we did:
(1) Here we are, more complex than any other organism.
(2) Most elements are inactive; only Alu and L1 are active, with an insertion once in 100 births.
Basics of DNA editing
• The APOBEC3 family of proteins was found to restrict retroviral replication. One of its mechanisms of operation is "cytosine deamination of the (-) strand DNA after reverse transcription". Meaning…
• APOBEC catalyzes a chemical modification (C→U) of the newly made DNA just before it is integrated into the genome, eventually generating G→A mutations on the (+) strand (editing).
• (Localization varies: nucleus/cytoplasm.)
• Induces tens to hundreds of mutations (uracil excision?).
• Editing by itself is not sufficient to stop replication; other mechanisms are also used.
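Why cytosine deamination on the (-) strand reads as G→A on the (+) strand can be seen in a short Python sketch. The sequence is made up, and a uniform per-cytosine editing probability is a simplifying assumption (real APOBEC enzymes prefer certain sequence contexts).

```python
import random

# Base pairing used when a strand is copied; U (deaminated C) pairs with A.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "U": "A"}


def reverse_complement(seq):
    """Return the complementary strand, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def deaminate(minus_strand, rate, rng):
    """Turn each C on the (-) strand into U with probability `rate`."""
    return "".join(
        "U" if base == "C" and rng.random() < rate else base
        for base in minus_strand
    )


plus = "ATGGCGTGGAGGCTA"                 # hypothetical (+) strand sequence
minus = reverse_complement(plus)         # (-) strand made by reverse transcription
edited_minus = deaminate(minus, rate=1.0, rng=random.Random(0))
edited_plus = reverse_complement(edited_minus)   # second-strand synthesis

print(plus)
print(edited_plus)
```

With `rate=1.0` every G of the original (+) strand ends up as A, while the C positions of the (+) strand are untouched because they face G, not C, on the (-) strand.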
Evolution of APOBEC
• APOBEC3G is one of the most positively selected
genes (=changes the fastest).
• Ongoing arms race with HIV.
• In response to APOBEC, HIV developed the Vif protein, which marks APOBEC for ubiquitination (=sends it to "recycling" in the proteasome).
• Different APOBECs restrict retroviruses/transposons through different mechanisms (e.g., binding to RNA and blocking reverse transcription).
DNA editing in the genome
• Some retrotransposons were edited by APOBEC, yet were still integrated into the genome.
• A new mechanism of mutagenesis.
• So far, almost neglected by geneticists.
• Together with Erez Levanon, HMS.
• We analyzed retroelements in mouse, human and chimp, applying a new statistical approach.
Main results
• Editing has left fingerprints in thousands of mouse IAP/MusD retroelements, with distinct motifs.
• We predict hundreds of thousands of editing sites.
• Edited IAPs are transcribed more than non-edited ones.
• Some edited IAPs overlap with introns and exons.
• The phylogenetic tree can change when editing information is considered.
• Editing is also found in non-LTR (LINE) mouse elements.
• Editing is found in human and chimp HERV retroelements.
DNA editing demonstration
• Comparing two mouse IAPs:
  chr9:114987516-114993954
  chr8:28575443-28581824
• One cluster of 68 consecutive G→A!
• In total, 176 of the 202 mismatches are G→A.
Can editing accelerate evolution?
Easily available raw material for the generation of new functions!
(For example: any editing in TGG creates a premature stop codon.)
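The TGG remark can be checked by enumerating every way G→A editing can hit that codon. A minimal Python sketch, assuming the standard genetic code, whose DNA stop codons are TAA, TAG and TGA:

```python
from itertools import combinations

STOP_CODONS = {"TAA", "TAG", "TGA"}   # standard genetic code, DNA alphabet


def g_to_a_variants(codon):
    """Return all codons obtained by editing one or more G's to A."""
    g_positions = [i for i, base in enumerate(codon) if base == "G"]
    variants = set()
    for k in range(1, len(g_positions) + 1):
        for subset in combinations(g_positions, k):
            edited = list(codon)
            for i in subset:
                edited[i] = "A"
            variants.add("".join(edited))
    return variants


variants = g_to_a_variants("TGG")      # TGG codes for tryptophan
all_stop = variants <= STOP_CODONS
print(sorted(variants), all_stop)
```

Editing the first G gives TAG, the second gives TGA, and both give TAA, so every edited form of the tryptophan codon is a stop codon.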
DNA editing phylogenetics
If two sequences are the same except for G→A mutations, the sequence with 'G' must precede the one with 'A'.
Thus we can build the tree of elements.
[Figure: automatically generated genetic tree.]
[Figure: same tree, masking the editing.]
Editing affects phylogenetics!
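The ordering rule (the 'G' sequence precedes the 'A' sequence) can be sketched for a single pair of elements. This is an illustrative reading of the rule, not the statistical approach used in the study: it assumes the two sequences are already aligned and of equal length, and the function name and example sequences are hypothetical.

```python
def editing_direction(seq_a, seq_b):
    """Infer which sequence came first under the G->A editing rule.

    Returns "a->b" if b equals a except for G->A changes, "b->a" for the
    reverse case, and None if the pair is identical or has other mismatches.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    mismatches = [(x, y) for x, y in zip(seq_a, seq_b) if x != y]
    if not mismatches:
        return None
    if all(pair == ("G", "A") for pair in mismatches):
        return "a->b"
    if all(pair == ("A", "G") for pair in mismatches):
        return "b->a"
    return None


ancestor = "ATGGCGTGGA"   # hypothetical unedited element
edited = "ATAGCATGAA"     # same element after three G->A edits

print(editing_direction(ancestor, edited))
print(editing_direction(edited, ancestor))
```

Applying such a directed comparison to every pair of elements gives the parent-to-child edges from which a tree can be assembled.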
Summary
• A significant fraction of the DNA originates from infection by ancient RNA viruses, which spread through the genome by reverse transcription and replication.
• Some of them were 'domesticated' to benefit the host cell (not really junk!), but some induce deleterious mutations.
• One of the mechanisms that restricts retrotransposition is editing the elements before they integrate into the genome.
• Many genomic sites are 'edited' due to this restriction activity.
• A new mechanism of mutagenesis, potentially leading to the evolution of new molecules or functions (for example, HIV drug resistance).