Second progress report on the analysis of the Physarum sequences

Second progress report on the analysis of the Physarum sequences obtained through the
pilot assay
by Gerard Pierron
To better understand what is found in the 22,616 Physarum “traces” or “reads”, a few words
are needed on the way the DNA was prepared by Marianne Bénard. Amoebae were grown in
axenic liquid cultures, harvested and homogenized to isolate nuclei. About 10% of the cells
were resistant to the homogenization and were lysed with the isolated nuclei, bringing in a
number of mitochondria. After proteinase K treatment, the DNA was purified on CsCl
gradients (small tubes, vertical rotor, traces of ethidium bromide). The genomic DNA was
then withdrawn with a syringe by side puncture under UV light. The puncture is made
slightly below the main DNA band, where the GC-rich extrachromosomal rDNA is found. The
final DNA preparation therefore contained a significant amount of rDNA while, in contrast, most of
the AT-rich mitochondrial DNA, which bands above the genomic DNA, was eliminated.
Mitochondrial DNA content in the 22,616 reads
Blasting the fully sequenced Physarum mitochondrial genome (AF027295) against the 22,616
Physarum traces indicates that only 29 reads contain mitochondrial DNA, as mentioned
before by Sandy Clifton. This is a very small number, since the mtDNA represents roughly
7-10% of total DNA and could have accounted for up to 2,000 reads. When a trace matches the
mitochondrial genome (e.g. 818410639 or aab31b01.b1), one might also expect its mate
(aab31b01.g1) to match. This is true for the 24 highest scores and gives an opportunity to
measure insert sizes. I measured 4 inserts and found sizes of 3,596, 3,595, 3,561 and 3,411 bp,
respectively.
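The expectation of up to about 2,000 mitochondrial reads follows from simple proportionality. A minimal sketch of that arithmetic, assuming reads are sampled in proportion to DNA mass with no cloning bias (the function name is illustrative and not part of any tool used for this report):

```python
def expected_reads(total_reads: int, fraction: float) -> float:
    """Reads expected from a DNA species that makes up `fraction` of the
    preparation, assuming unbiased cloning and sequencing."""
    return total_reads * fraction


TOTAL_READS = 22_616     # reads in the pilot assay
OBSERVED_MT_READS = 29   # reads matching the mitochondrial genome (AF027295)

low = expected_reads(TOTAL_READS, 0.07)    # mtDNA at 7% of total DNA
high = expected_reads(TOTAL_READS, 0.10)   # mtDNA at 10% of total DNA

# Conservative depletion factor: lower expectation divided by observed count.
depletion = low / OBSERVED_MT_READS

print(f"expected {low:.0f} to {high:.0f} reads, observed {OBSERVED_MT_READS}")
print(f"depletion of at least {depletion:.0f}-fold")
```

Under these assumptions the expected count is roughly 1,600 to 2,300 reads, so the 29 observed reads correspond to a depletion of more than 50-fold, consistent with the removal of the mitochondrial band on the CsCl gradient.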
The graphical representation (below) of the hits along the mitochondrial genome shows,
however, an uneven distribution. No hits are found in the first 20 kb, whereas 3
clones overlap around 22 kb. There are 3 reads for which the mate is apparently absent from
the mitochondrial genome (I traced a line between the mates). For read 1, the mate contains
TP1 (the transposon 1 sequence of N. Hardman): does this suggest a nuclear localisation, or a
cloning artefact? The read itself, which is 943 nt long, contains mitochondrial DNA only up to
nt 531. Read 2 is short (440 bp) and fully mitochondrial, but its mate does not contain any
known sequence. Finally, read 3 puzzles me. Its mate does contain mitochondrial DNA,
although with “intervening” sequences, and should have shown up in the figure. Why it does
not is completely unclear to me.
Question: Is anything known about a differential stability in E. coli of mitochondrial DNA
clones originating from the 0 to 20-kb region? Or is this a bias due to the small size of the
sample?
Ribosomal DNA content in the 22,616 reads
To detect ribosomal DNA in the pilot assay, I blasted the 6,191 bp sequence (VO1159)
containing the 5.8S and 26S rRNA genes. The repetitive nature of this DNA is immediately seen
in the form of many overlapping reads.
This time the distribution of the reads along the sequence is relatively even, with somewhat
more clones on the left side, which corresponds to the 5.8S rRNA. Many reads are about 900 bp
long and are 99% identical to the known sequence. The mismatches are located at the ends of
the traces. I measured 3 inserts that were 3.78, 3.65 and 4.18 kb long. If one considers a 1-kb
window as on the graph, about 25 copies are present in the trace archive. If this is true all
along the rDNA palindrome, this gives an estimate of 30 x 25 = 750 traces containing rDNA,
i.e. 750/22,616 x 100 = 3.3%. The amount of rDNA in the Physarum genome is not exactly
known, but is in the range of 1-2%, suggesting a slight preferential cloning and/or an
enrichment during CsCl gradient DNA purification.
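The rDNA estimate above can be reproduced in a few lines. This is only a sketch of the arithmetic in the text. The variable names are illustrative, and the 30 windows and 25 reads per window are the approximate figures quoted above, not independently measured values:

```python
TOTAL_READS = 22_616     # reads in the pilot assay
READS_PER_WINDOW = 25    # approximate number of reads per 1-kb window
N_WINDOWS = 30           # number of 1-kb windows used in the estimate

# Estimated number of traces containing rDNA, and their share of all reads.
rdna_reads = N_WINDOWS * READS_PER_WINDOW
rdna_percent = 100 * rdna_reads / TOTAL_READS

# Fold enrichment relative to the 1-2% rDNA content quoted for the genome.
enrichment_low = rdna_percent / 2.0    # if rDNA is 2% of the genome
enrichment_high = rdna_percent / 1.0   # if rDNA is 1% of the genome

print(f"{rdna_reads} rDNA traces, {rdna_percent:.1f}% of reads")
print(f"enrichment between {enrichment_low:.1f}- and {enrichment_high:.1f}-fold")
```

With these inputs the enrichment falls between about 1.7- and 3.3-fold, depending on which end of the 1-2% range is taken as the true rDNA content.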
Genomic DNA: transposon-like elements
The amount of DNA sequenced in the pilot assay is 14,000 kb. This is about 5-10% of the
haploid genome, i.e. approximately 0.1x coverage. Therefore most single-copy genes should be
absent from the reads. Indeed, none of the known Physarum actin, profilin, or histone H4
sequences were found by BLAST. On the other hand, repetitive DNA sequences should show up.
About one third of the Physarum genome is known to re-associate rapidly upon denaturation
and to be hyper-methylated. In part, this “repeated compartment” is composed of
scrambled versions of an 8.3-kb LTR-retrotransposon-like sequence (TP1) that has inserted
into itself, generating 20-50-kb islands of repeated, hyper-methylated DNA, as shown by
Norman Hardman. Blasting the TP1 element against the traces generated about 500 hits.
Another LTR-retrotransposon of 1.68 kb (TP2) has been described by N. Hardman, which,
despite its short size, generated another 300 hits when blasted against the reads. If one
considers the 0.1x coverage, it is likely that these sequences are represented more than 1,000
times in the Physarum genome. Neither of these 2 elements appears to be present in
Dictyostelium, although a TP1 homolog is present in Arabidopsis. In any case, the number of
traces containing homologs of retroviral-like proteins such as Gag, Pol (reverse transcriptase) or
polyprotein, in the range of 2,500, is much higher than the number of TP1 + TP2 hits,
indicating other families of repeated elements in Physarum.
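The coverage argument made at the start of this section (14,000 kb sequenced, 5-10% of the haploid genome, so most single-copy genes are missing) can be made slightly more quantitative. The sketch below assumes that reads land uniformly at random on the genome, so that the fraction of bases left unsequenced follows a Poisson model, exp(-coverage). This model and the function names are assumptions added for illustration and were not part of the analysis reported here:

```python
import math

SEQUENCED_KB = 14_000   # total DNA sequenced in the pilot assay


def implied_genome_size_mb(sequenced_kb: float, fraction: float) -> float:
    """Haploid genome size (Mb) implied if `sequenced_kb` is `fraction` of it."""
    return sequenced_kb / fraction / 1000


def uncovered_fraction(coverage: float) -> float:
    """Fraction of bases hit by no read, assuming uniformly random reads
    (Poisson model): P(0 reads) = exp(-coverage)."""
    return math.exp(-coverage)


for fraction in (0.05, 0.10):
    size_mb = implied_genome_size_mb(SEQUENCED_KB, fraction)
    missing = uncovered_fraction(fraction)
    print(f"coverage {fraction:.2f}x: genome ~{size_mb:.0f} Mb, "
          f"{100 * missing:.0f}% of bases unsequenced")
```

Under this model, 90% or more of any single-copy sequence is expected to be missing at 0.05-0.1x coverage, whereas an element present in more than 1,000 copies is expected to be hit many times.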
Clearly, much information on the Physarum genome can be obtained from this pilot assay. In
the next report, I will review some data obtained on the analysis of unique sequences
matching sequences previously known from Physarum and some others that have a
counterpart in Dictyostelium.