Finding functional regulatory SNPs

Computational finding of functional regulatory SNPs
Irina ABNIZOVA
MRC-BSU, Robinson Way, Cambridge, UK, irina.abnizova@mrc-bsu.cam.ac.uk
Luisa FOCO
University of Pavia, Italy, luisa.foco@unipv.it
Rene te BOEKHORST
University of Hertfordshire, College Lane, Hatfield – UK, r.teboekhorst@herts.ac.uk
Luisa BERNARDINELLI
MRC-BSU, Robinson Way, Cambridge, UK, luisa.bernardinelli@mrc-bsu.cam.ac.uk
This work is devoted to the analysis of genetic variation in complex human diseases.
We present an in silico bioinformatic method for inferring the possible function of
regulatory single nucleotide polymorphisms (SNPs) in human disease development.
The research presented here combines the strengths of genetics and genomics by
investigating genetic variants (SNPs) in regulatory regions
rather than in genes. It brings together the computational search for, and characterisation
of, regions of DNA that regulate gene expression with information
about individual variation in the structure of human DNA. The aim is to
identify likely regulatory regions, the individual variation in their molecular make-up,
and the effect this variation may have on the phenotypic expression of genes.
There is strong recent interest in regulatory SNPs [1-8]. It has also been
demonstrated, by combining experimental evidence and computation, that the promoter
regions of human genes provide a rich source of functional single nucleotide
polymorphisms [4-8]. As many as 35% of promoter SNPs may be of functional
significance [4]. However, apart from [8], which applies to promoters, there are
currently no computational tools that can assess directly from regulatory DNA sequence
whether a given variant is likely to alter gene expression and hence be of
functional significance.
Here, we present an approach that allows in silico estimation of the likely
functional consequences of single nucleotide changes in putative regulatory DNA.
The approach integrates at least 16 sources of supervised sequence
information about a given DNA stretch with unsupervised methods [9,10]. We have
also incorporated a novel method that assesses the functionality of a SNP from the
sensitivity of a mathematical model to the SNP variant.
Essentially, the method consists of identifying regions in the human genome that are
likely to be important in the regulation of gene expression and that contain motifs identifying
them as transcription factor binding sites (TFBSs). We then establish whether the motifs
contain SNPs and, if so, to what extent these mutations destroy the signal by which
regulatory proteins recognize the motifs as binding sites. Such SNPs are strong candidates for further
experimental verification to establish their possible role in the genesis of, and
susceptibility to, particular diseases.
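The idea of measuring how far a SNP variant weakens a binding-site signal can be illustrated with a small sketch. The abstract does not specify the scoring model, so the code below assumes a simple position weight matrix (PWM) with log-odds scores against a uniform background. The count matrix, the function names (log_odds_matrix, best_overlapping_score, allele_scores) and the example sequence are all hypothetical and chosen only for illustration.

```python
import math

# Hypothetical count matrix for a 4-bp motif (one dict per motif position).
# These counts are illustrative only and are not taken from the abstract.
COUNTS = [
    {"A": 8, "C": 1, "G": 0, "T": 1},
    {"A": 0, "C": 9, "G": 1, "T": 0},
    {"A": 1, "C": 0, "G": 9, "T": 0},
    {"A": 0, "C": 1, "G": 0, "T": 9},
]
# Assumed uniform background base frequencies.
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def log_odds_matrix(counts, background, pseudocount=1.0):
    """Convert counts into log2 odds scores, smoothing with a pseudocount."""
    matrix = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        matrix.append({
            base: math.log2((column[base] + pseudocount) / total / background[base])
            for base in "ACGT"
        })
    return matrix


def best_overlapping_score(sequence, snp_index, pwm):
    """Best PWM score among the windows that cover the SNP position."""
    width = len(pwm)
    first = max(0, snp_index - width + 1)
    last = min(snp_index, len(sequence) - width)
    return max(
        sum(pwm[i][sequence[start + i]] for i in range(width))
        for start in range(first, last + 1)
    )


def allele_scores(left_flank, alleles, right_flank, pwm):
    """Score each allele in its sequence context; returns {allele: score}."""
    snp_index = len(left_flank)
    return {
        allele: best_overlapping_score(left_flank + allele + right_flank, snp_index, pwm)
        for allele in alleles
    }


pwm = log_odds_matrix(COUNTS, BACKGROUND)
scores = allele_scores("TA", "CT", "GTAA", pwm)
# A large drop from one allele to the other suggests the variant disrupts the motif.
score_drop = scores["C"] - scores["T"]
```

In this toy example the C allele completes the consensus ACGT, so its score is higher than that of the T allele. In practice the matrix would come from a curated TFBS collection or from the motif predictions themselves, and the size of the drop that counts as disruptive would need to be calibrated.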
Results. To test the method, we collected several disease-associated regulatory SNPs known from the literature [1-3]. We checked whether each disease-associated regulatory
SNP lies within one of the feature predictions and thus has a high score. We found that
the scores of the disease-associated regulatory SNPs were among the highest scores
of all SNPs in all our training sets. Furthermore, these SNPs appeared to be variant
sensitive: a particular SNP variant changed the results of motif
prediction. Interestingly, we found that known disease-causal SNP variants
formed significantly underrepresented motifs within their local context.
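One way to picture under-representation within the local context is to compare how often a short word occurs in the surrounding sequence with how often it would be expected from the base composition alone. The abstract does not state which statistic or background model was used, so the sketch below assumes a zero-order (independent bases) model and reports only an observed/expected ratio, without a significance test. All function names and the example sequence are hypothetical.

```python
from collections import Counter


def count_overlapping(sequence, word):
    """Count occurrences of word in sequence, allowing overlaps."""
    k = len(word)
    return sum(1 for i in range(len(sequence) - k + 1) if sequence[i:i + k] == word)


def expected_count(sequence, word):
    """Expected occurrences under an assumed zero-order model:
    each base is drawn independently with its frequency in the sequence."""
    freqs = Counter(sequence)
    n = len(sequence)
    probability = 1.0
    for base in word:
        probability *= freqs[base] / n
    return probability * (n - len(word) + 1)


def representation_ratio(sequence, word):
    """Observed/expected ratio; values below 1 indicate under-representation.
    Returns None when the expectation is zero (a base of word is absent)."""
    expected = expected_count(sequence, word)
    if expected == 0:
        return None
    return count_overlapping(sequence, word) / expected


# Illustrative local context (not real data): a 40-bp repeat of ACGT.
context = "ACGT" * 10
ratio_present = representation_ratio(context, "ACGT")  # over-represented word
ratio_absent = representation_ratio(context, "AAAA")   # word never occurs
```

A word formed by one SNP allele that has a ratio well below 1, while the word formed by the other allele does not, would be the kind of pattern described above. A higher-order Markov background, as in [10], could replace the zero-order assumption.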
References
1. Monsuur AJ, de Bakker PI, Alizadeh BZ, Zhernakova A, Bevova MR, Strengman E, Franke L, van't
Slot R, van Belzen MJ, Lavrijsen IC, et al. (2005) Nat Genet. 37:1341-4.
2. Ueda H, Howson JM, Esposito L, Heward J, Snook H, Chamberlain G, Rainbow DB, Hunter KM, Smith
AN, Di Genova G, et al. (2003) Nature 423:506-11.
3. Morahan G, Huang D, Ymer SI, Cancilla MR, Stephen K, Dabadghao P, Werther G, Tait BD, Harrison
LC, Colman PG (2001) Nat Genet. 27:218-21.
4. Hoogendoorn, B., Coleman, S. L., Guy, C. A., Smith, S. K., O'Donovan, M. C. and Buckland, P. R.
(2004). Functional analysis of polymorphisms in the promoter regions of genes on 22q11. Hum. Mutat.
24, 35-42.
5. Mooney, S. (2005). Bioinformatics approaches and resources for single nucleotide polymorphism
functional analysis. Brief. Bioinform. 6, 44-56
6. Pastinen, T. and Hudson, T. J. (2004). Cis-acting regulatory variation in the human genome. Science
306, 647-650
7. Hudson, T. J. (2003). Wanted: regulatory SNPs. Nat. Genet. 33, 439-440
8. Paul R. Buckland, Bastiaan Hoogendoorn, Sharon L. Coleman, Carol A. Guy, S. Kaye Smith, Michael
C. O'Donovan (2005) Strong bias in the location of functional promoter polymorphisms,
9. Khan I, et al. and Chuzhanova N. (2006) In silico discrimination of single nucleotide polymorphisms
and pathological mutations in human gene promoter regions by means of local DNA sequence context
and regularity, In Silico Biology 6, 0003
10. Irina Abnizova, Alistair G. Rust, Mark Robinson, Rene te Boekhorst and Walter R. Gilks, (2006)
Prediction of TFBS using Markov models, J. of Bioinformatics and Comp. Biology, v4, n2, pp 425-441
11. Irina Abnizova, Rene te Boekhorst, Klaudia Walter and Walter R. Gilks, (2005), Some statistical
properties of regulatory DNA sequences, and their use in predicting regulatory regions in eukaryotic
genomes: the fluffy-tail test. BMC Bioinformatics, 6:109