Understanding phylogeny using DNA correlation patterns

Dilbag Sokhi, Atulya Nagar
Department of Computing,
Liverpool Hope University,
Hope Park,
Liverpool – L16 9JD,
UK.

Rajendra Bhansali
Department of Mathematical Sciences,
University of Liverpool,
Peach Street,
Liverpool – L69 7ZL,
UK.
Abstract
The statistical analysis of DNA sequences reveals long-range correlation patterns, which are
thought to have been created by the dynamical process that led to the current state
of the DNA sequence. Two phenomena have been proposed to account for these
long-range correlations. The first argument is that long-range
power-law correlation is the consequence of expansion-modification of the DNA
sequence, which can be demonstrated by the covariance or autocorrelation functions. The
second argument offers an alternative explanation: duplication events in DNA
evolution caused the self-similarity in the DNA sequence.
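To make the first argument concrete, the following is a minimal sketch of an expansion-modification process on a binary sequence. The rule used here is an assumption for illustration only: each symbol is flipped with probability `p_modify` (modification) and otherwise duplicated (expansion). The function name, the parameter values and the binary alphabet are hypothetical and are not taken from the analysis reported here.

```python
import random


def expansion_modification(seed, p_modify, generations, rng):
    """Repeatedly apply an assumed binary expansion-modification rule.

    In every generation each symbol is either flipped (modification,
    probability p_modify) or duplicated (expansion, probability 1 - p_modify).
    """
    seq = list(seed)
    for _ in range(generations):
        nxt = []
        for s in seq:
            if rng.random() < p_modify:
                nxt.append(1 - s)       # modification: 0 -> 1, 1 -> 0
            else:
                nxt.extend((s, s))      # expansion: 0 -> 00, 1 -> 11
        seq = nxt
    return seq


rng = random.Random(0)                  # fixed seed so the run is repeatable
seq = expansion_modification([0], p_modify=0.1, generations=12, rng=rng)
print(len(seq), sum(seq) / len(seq))    # sequence length and fraction of 1s
```

With a small `p_modify` the sequence grows mostly by duplication, so long runs of identical symbols appear at many scales; this is the kind of structure that the correlation measures named in this abstract are meant to detect.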
The present research analyses the long-range correlation patterns of DNA
sequences using techniques such as the autocorrelation function, the wavelet transform,
and detrended fluctuation analysis. We have found that all these techniques reflect the
fractal nature of the DNA sequence, which supports the hypothesis that expansion-modification
occurred in the DNA sequence during evolution. We have
carried out an extensive analysis of DNA sequences of various species in order to find the
similarity patterns among DNA sequences that exhibit long-range
correlations.
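As an illustration of the first technique, the sketch below computes the sample autocorrelation function of a DNA string. It assumes a purine/pyrimidine encoding (A, G as +1; C, T as -1), which is one common numerical mapping and not necessarily the one used in this study; the toy sequence and the function names are likewise hypothetical.

```python
def purine_pyrimidine(seq):
    """Map a DNA string to +1 (purine: A, G) or -1 (pyrimidine: C, T).

    Characters other than A, C, G, T are skipped (an assumed convention).
    """
    mapping = {"A": 1, "G": 1, "C": -1, "T": -1}
    return [mapping[b] for b in seq.upper() if b in mapping]


def autocorrelation(x, max_lag):
    """Sample autocorrelation of x for lags 0..max_lag (max_lag < len(x))."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    var = sum(d * d for d in dev) / n
    if var == 0:
        raise ValueError("sequence has zero variance")
    acf = []
    for lag in range(max_lag + 1):
        cov = sum(dev[i] * dev[i + lag] for i in range(n - lag)) / n
        acf.append(cov / var)
    return acf


toy = "ATGCGGCTAAGGCTTACCGGATAGCTAGGCTAACGT"   # made-up sequence, not real data
print(autocorrelation(purine_pyrimidine(toy), max_lag=5))
```

For a sequence with long-range correlations, the autocorrelation would be expected to decay slowly with lag, roughly as a power law, instead of dropping quickly to zero.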
We intend to use the similarity of DNA sequences, in terms of the long-range
correlation patterns in genes of various species, to gain insight into the relationships that
exist among homologous sequences. Indeed, we have found a distinctive similarity among
genes of closely related species such as cattle, chimpanzee and human.
Research Objective(s):
To use long-range correlation patterns in DNA sequences to gain insight into the
dynamical process involved in the elongation of the DNA sequence. A further objective is
to use long-range correlation patterns to analyse the phylogenetic relationships among various
species.
Method(s):
Autocorrelation analysis, Wavelet analysis, Detrended fluctuation analysis
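Of the methods listed, detrended fluctuation analysis is perhaps the easiest to sketch in a few lines. The example below is a minimal first-order DFA under stated assumptions: non-overlapping windows, a linear trend removed in each window, and a scaling exponent taken as the least-squares slope of log F(n) against log n. The window sizes and the random test signal are illustrative choices only.

```python
import math
import random


def linear_fit(xs, ys):
    """Ordinary least-squares slope and intercept of ys against xs."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def dfa_fluctuation(x, window):
    """RMS fluctuation F(window) of the linearly detrended profile of x."""
    mean = sum(x) / len(x)
    profile, total = [], 0.0
    for v in x:                          # integrate the mean-subtracted series
        total += v - mean
        profile.append(total)
    t = list(range(window))
    sq_sum, count = 0.0, 0
    for start in range(0, len(profile) - window + 1, window):
        seg = profile[start:start + window]
        slope, intercept = linear_fit(t, seg)   # local linear trend
        sq_sum += sum((y - (slope * ti + intercept)) ** 2
                      for ti, y in zip(t, seg))
        count += window
    return math.sqrt(sq_sum / count)


def dfa_exponent(x, windows):
    """Slope of log F(n) against log n over the given window sizes."""
    log_n = [math.log(w) for w in windows]
    log_f = [math.log(dfa_fluctuation(x, w)) for w in windows]
    slope, _ = linear_fit(log_n, log_f)
    return slope


rng = random.Random(0)
white = [rng.choice((-1, 1)) for _ in range(4096)]   # uncorrelated test signal
alpha = dfa_exponent(white, windows=[8, 16, 32, 64, 128])
print(round(alpha, 2))
```

For an uncorrelated signal such as `white`, the exponent should come out near 0.5; values clearly above 0.5 are usually read as a sign of persistent long-range correlation.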
Results:
The analysis provides evidence of phylogenetic relationships among the genes of
closely related species. It suggests that
certain portions of the DNA sequences, which carry out significant biological functions,
are conserved during evolution.