Understanding phylogeny using DNA correlation patterns

Dilbag Sokhi, Atulya Nagar
Department of Computing, Liverpool Hope University, Hope Park, Liverpool L16 9JD, UK

Rajendra Bhansali
Department of Mathematical Sciences, University of Liverpool, Peach Street, Liverpool L69 7ZL, UK

Abstract

Statistical analysis of DNA sequences reveals long-range correlation patterns, which are thought to arise from the dynamical process that led to the current state of the sequence. Two mechanisms have been proposed to explain these long-range correlations. The first is that long-range power-law correlation is a consequence of expansion-modification of the DNA sequence, which can be demonstrated through covariance or autocorrelation functions. The second offers an alternative explanation: duplication events during DNA evolution caused the self-similarity of the sequence.

The present research analyses the long-range correlation patterns of DNA sequences using several techniques: the autocorrelation function, the wavelet transform, and detrended fluctuation analysis. We have found that all of these techniques reflect the fractal nature of the DNA sequence, which supports the hypothesis that expansion-modification occurred in the DNA sequence during evolution.

We have carried out an extensive analysis of DNA sequences from various species in order to find the similarity patterns among sequences that exhibit long-range correlations. We intend to use the similarity of long-range correlation patterns in the genes of various species to gain insight into the relationships that exist among homologous sequences. Indeed, we have found a peculiar similarity among the genes of closely related species such as cattle, chimpanzees and humans.
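The autocorrelation analysis mentioned in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how bases are mapped to numbers, so the purine/pyrimidine encoding used here is an assumption, as are the function names `dna_to_numeric` and `autocorrelation` and the toy sequence. This is an illustration of the general technique, not the authors' implementation.

```python
def dna_to_numeric(seq):
    """Map purines (A, G) to -1 and pyrimidines (C, T) to +1.

    This encoding is an assumption made for illustration only.
    """
    mapping = {"A": -1, "G": -1, "C": 1, "T": 1}
    return [mapping[base] for base in seq.upper()]


def autocorrelation(values, max_lag):
    """Sample autocorrelation for lags 0..max_lag, normalised so lag 0 is 1."""
    n = len(values)
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the sequence length")
    mean = sum(values) / n
    centred = [v - mean for v in values]
    variance = sum(c * c for c in centred) / n
    if variance == 0:
        raise ValueError("zero variance: autocorrelation is undefined")
    result = []
    for lag in range(max_lag + 1):
        # Biased estimator: divide by n at every lag.
        cov = sum(centred[i] * centred[i + lag] for i in range(n - lag)) / n
        result.append(cov / variance)
    return result


if __name__ == "__main__":
    # Toy sequence for demonstration, not real genomic data.
    toy = "ACGTTGCAAGCTTAGCCGATAGCTAGGATCCA" * 4
    for lag, value in enumerate(autocorrelation(dna_to_numeric(toy), 10)):
        print(lag, round(value, 3))
```

For a sequence with long-range power-law correlations, the autocorrelation would be expected to decay slowly with lag instead of dropping to zero after a few bases; plotting it on log-log axes is one common way to inspect that decay.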
Research Objective(s): To use long-range correlation patterns in DNA sequences to gain insight into the dynamical process involved in the elongation of the DNA sequence, and to use these patterns to analyse the phylogenetic relationships among various species.

Method(s): Autocorrelation analysis, wavelet analysis, detrended fluctuation analysis.

Results: The analysis provides evidence of phylogenetic relationships among the genes of various closely related species. It suggests that certain portions of the DNA sequences, which carry out significant biological functions, have been conserved during the process of evolution.
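The detrended fluctuation analysis listed among the methods can be sketched as follows. This is a minimal first-order version under stated assumptions: the input is a numeric series (for example a DNA sequence encoded as +1/-1 steps), windows do not overlap, and the names `linear_fit`, `dfa_fluctuation` and `dfa_exponent` are hypothetical. It is not taken from the authors' code.

```python
import math
import random


def linear_fit(xs, ys):
    """Ordinary least-squares line through (xs, ys); returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def dfa_fluctuation(values, window):
    """Root-mean-square fluctuation F(window) of the linearly detrended profile."""
    n = len(values)
    if window < 3 or window > n:
        raise ValueError("window must be between 3 and the series length")
    mean = sum(values) / n
    # Step 1: integrate the mean-centred series to obtain the profile.
    profile, total = [], 0.0
    for v in values:
        total += v - mean
        profile.append(total)
    # Step 2: fit and remove a straight line in each non-overlapping window.
    xs = list(range(window))
    squared, count = 0.0, 0
    for start in range(0, n - window + 1, window):
        segment = profile[start:start + window]
        slope, intercept = linear_fit(xs, segment)
        squared += sum((y - (slope * x + intercept)) ** 2
                       for x, y in zip(xs, segment))
        count += window
    return math.sqrt(squared / count)


def dfa_exponent(values, windows):
    """Slope of log F(n) against log n over the given window sizes."""
    log_n = [math.log(w) for w in windows]
    log_f = [math.log(dfa_fluctuation(values, w)) for w in windows]
    slope, _ = linear_fit(log_n, log_f)
    return slope


if __name__ == "__main__":
    random.seed(0)
    # Uncorrelated +1/-1 steps as a stand-in for an encoded sequence.
    steps = [random.choice((-1, 1)) for _ in range(4096)]
    print(dfa_exponent(steps, [8, 16, 32, 64, 128]))
```

An exponent near 0.5 is what an uncorrelated series is expected to give, while values noticeably above 0.5 are conventionally read as a sign of long-range correlation. Real sequences would need a wider range of window sizes than this toy example uses.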