Inter-species sequence conservation and intra-species sequence diversity
Apratim Mitra

Background
• Sequence conservation across species is well documented
• Genes coding for the same or similar proteins, even in evolutionarily distant organisms, have been observed to have remarkable similarities
Background (Contd.)
• At the same time, proteins with similar functions, even in the same species, can show a bewildering diversity
• E.g., immunoglobulins (commonly called ‘antibodies’)
Aim of this project
• To demonstrate intra-species sequence diversity and inter-species sequence conservation using various web-based resources and tools, e.g., NCBI, GenBank, ClustalW
• To investigate a new way of visualizing multiple alignment results
Cumulative alignment profile
• We produce a pair-wise alignment using an alignment program such as ClustalW or MUSCLE
• Using BLOSUM / PAM substitution matrices and gap opening/extension penalties, we build a cumulative alignment score profile from the above alignment
• In addition to global sequence similarity, this profile would capture spatial information, i.e., where along the alignment the similarity occurs
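One way to read the construction above is as a running sum of per-column alignment scores. The sketch below is a minimal illustration under stated assumptions, not the author's implementation: the function names, the toy `match_mismatch` scorer (a stand-in for a real BLOSUM or PAM lookup), and the gap penalty values are all hypothetical. It takes two already-aligned sequences of equal length with `-` marking gaps.

```python
from itertools import accumulate


def match_mismatch(x, y):
    """Toy scorer standing in for a BLOSUM/PAM lookup (assumption)."""
    return 1.0 if x == y else -1.0


def column_scores(a, b, score=match_mismatch, gap_open=-10.0, gap_extend=-0.5):
    """Score each column of a pair-wise alignment (gaps written as '-').

    The gap penalty values are illustrative defaults, not taken from the slides.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    scores = []
    gap_in = None  # which sequence currently carries an open gap: 'a', 'b' or None
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            scores.append(0.0)  # gap-only column: contributes nothing
        elif x == "-":
            scores.append(gap_extend if gap_in == "a" else gap_open)
            gap_in = "a"
        elif y == "-":
            scores.append(gap_extend if gap_in == "b" else gap_open)
            gap_in = "b"
        else:
            scores.append(score(x, y))
            gap_in = None
    return scores


def cumulative_profile(a, b, score=match_mismatch, gap_open=-10.0, gap_extend=-0.5):
    """Running sum of column scores; the last value is the global score."""
    return list(accumulate(column_scores(a, b, score, gap_open, gap_extend)))


profile = cumulative_profile("AC--GT", "ACTTGT", gap_open=-2.0, gap_extend=-1.0)
```

The last entry of the profile is the ordinary global alignment score under the same scoring scheme, while the shape of the curve shows where along the alignment that score was gained or lost.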
Plan of Action
1. Pool sequences of the same or similar genes from different species, and of proteins (e.g., IgG) from the same species that exhibit diversity.
2. Run multiple alignment and clustering programs to obtain phylogenetic trees hinting at evolutionary relationships.
3. Transform the alignment results into a cumulative alignment profile that indicates spatial features.
4. Cluster these profiles using correlation measures and obtain phylogenetic trees.
5. Compare the two results.
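Step 4 can be sketched as follows. This is a hedged illustration only: the slides do not say which correlation measure or linkage rule is intended, so Pearson correlation distance and average linkage are assumptions here, as are the function names. The sketch also assumes one equal-length profile per sequence (for example, each sequence aligned against a common reference), which the slides do not specify.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / sqrt(vx * vy)


def average_linkage_tree(profiles):
    """Agglomerative clustering on distance 1 - r; returns nested tuples."""
    if not profiles:
        raise ValueError("need at least one profile")
    names = list(profiles)
    dist = {(a, b): 1.0 - pearson(profiles[a], profiles[b])
            for a in names for b in names if a != b}
    clusters = [(name, [name]) for name in names]  # (subtree, member names)
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                mi, mj = clusters[i][1], clusters[j][1]
                # Average distance over all cross-cluster member pairs.
                d = sum(dist[a, b] for a in mi for b in mj) / (len(mi) * len(mj))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]


# Made-up profiles for illustration only (not real data).
example = {
    "seqA": [1.0, 2.0, 3.0, 4.0],
    "seqB": [2.0, 4.0, 6.0, 8.5],
    "seqC": [4.0, 3.0, 2.0, 1.0],
}
tree = average_linkage_tree(example)
```

The nested tuples can then be compared with the tree produced by the alignment program in step 2.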
Schematic
Collect sequences from online libraries
Align using ClustalW, MUSCLE, etc.
Convert alignment scores into a ‘profile’ that indicates spatial information about the alignment
Cluster these profiles and compare with the phylogenetic tree obtained at the earlier step
Why do it?
• Global pair-wise alignment scores summarize the entire alignment in a single number
• An alignment profile that carries some spatial information would be a way of ‘improving’ interpretation
• Sequences that have a high degree of similarity can be differentiated on the basis of ‘patterns’ of dissimilarity in the profiles
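The point about patterns of dissimilarity can be illustrated with a toy case. The sketch below uses made-up sequences and a hypothetical +1/-1 match/mismatch score without gaps (an assumption, standing in for a real substitution matrix): two sequences reach the same global score against a reference, yet their cumulative profiles differ because the mismatches sit at opposite ends.

```python
from itertools import accumulate


def toy_profile(a, b, match=1, mismatch=-1):
    """Cumulative score of an ungapped alignment with toy scoring."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return list(accumulate(match if x == y else mismatch for x, y in zip(a, b)))


reference = "ACDEFGHIKL"       # made-up sequences for illustration
early_diff = "TTDEFGHIKL"      # two mismatches at the start
late_diff = "ACDEFGHITT"       # two mismatches at the end

p_early = toy_profile(reference, early_diff)
p_late = toy_profile(reference, late_diff)

print(p_early)  # dips first, then climbs
print(p_late)   # climbs first, then dips
```

Both profiles end at the same value, so a global score alone cannot separate the two sequences, while the profiles do.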
Further Uses/Extensions
• This method could be useful when trying to find:
  ◦ Differences between closely related species
  ◦ Similarities between distant species
• Can be easily extended to multiple alignments, although results might be hard to interpret
• The cumulative profiles can also be analyzed by frequency-domain or time-frequency methods such as Fourier transforms or wavelet analysis for feature extraction
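As a rough idea of what Fourier-based feature extraction on a profile could look like, the sketch below computes the magnitudes of a plain discrete Fourier transform after removing the mean. It is an assumption-laden illustration: the function name is hypothetical, the slides do not specify any preprocessing, and a real cumulative profile would probably also need detrending, which is not done here.

```python
import cmath
import math


def dft_magnitudes(profile):
    """Magnitudes of DFT coefficients 0..n//2 of a mean-centered profile."""
    n = len(profile)
    if n == 0:
        return []
    mean = sum(profile) / n
    centered = [v - mean for v in profile]
    magnitudes = []
    for k in range(n // 2 + 1):
        coeff = sum(v * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, v in enumerate(centered))
        magnitudes.append(abs(coeff) / n)
    return magnitudes


# Synthetic signal with exactly three cycles over 16 samples (not real data).
signal = [math.cos(2 * math.pi * 3 * t / 16) for t in range(16)]
spectrum = dft_magnitudes(signal)
dominant = spectrum.index(max(spectrum))
```

The resulting magnitude vector could serve as a feature vector for comparing profiles; a wavelet transform would additionally keep positional information.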
Thank You