Use of MESs-clustering software to compare tissues by gene
expression characterized by Unigene and Homologene databases
Saulo de Paula Pinto2, Elisa R. Donnard1, Gabriel Fernandes1, J. Miguel Ortega1
1Dept. Bioquimica e Imunologia, ICB, UFMG; 2PUC MINAS
The NCBI Unigene database [1] records the occurrence of sequences in each cluster of
expressed sequences, organized by body site (tissue). Another resource,
Homologene [1], establishes homology relationships between genes of diverse organisms. We
set out to organize these data in a local MySQL database. We
chose to collect the information on 14 organisms, ranging from human to rice. The
database thus holds from 1907 unigenes for Neurospora crassa up to 116273 for Homo
sapiens. To ensure sufficient EST representation in tissue or body site
samples, we restricted our analysis to samples with over 10000 sequences, which
comprise 78, 46, and 20 distinct tissues for human, mouse, and rat, respectively. In addition,
we grouped unigenes under the same homologene identifier to enable inter-organism tissue comparison, and we took the sum of the individual unigene
expressions as the homologene expression. We then applied the MESs (most
expressed sequences) clustering [2] to the corresponding datasets. For the unigene
datasets, we found that human tissue pairs share from 62.3% of their MESs (lung-uterus)
down to 11.4% (tonsil-trachea), whereas mouse pairs share from 61.1% (lung-mammary gland)
down to 6.1% (spinal cord-tongue). Rat tissues shared much less than the others: from 39% for
lung-spleen to 12.2% for pituitary gland-testis. Moreover, one remarkable finding is the
high occurrence of pairs involving lung or kidney among the 20 top-ranked ones: 11 in
human, 15 in mouse, and 12 in rat, which may indicate a certain variability of the
expressed sequences among the 1000 MESs analyzed for these two tissues. On the
other hand, among the 20 lowest-ranked pairs we found a prevalence of pairs involving
testis or liver: 11 in human, 12 in mouse, and 17 in rat, which points to a markedly
different set of expressed MESs. For each organism taken separately, the homologene results closely matched the unigene
ones. Considering the three organisms at once, however, we
found that tissues from different organisms share from 52.6% of their MESs (mouse
and rat liver) down to 6.6% (human trachea and mouse tongue). Our results are not
yet conclusive, but they suggest that expression estimated in the way
presented here can be reliably used to uncover relationships among the patterns of gene
expression in different tissues and organisms.
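The steps described above (filtering samples by size, summing unigene expression per homologene identifier, and comparing tissues by their shared MESs) can be illustrated with a minimal Python sketch. It is not the MESs-clustering software of [2]: the function names, the dictionary layout, the tie-breaking rule, and the definition of the shared percentage as the overlap of two top-n sets divided by n are all assumptions made for illustration, as is the use of a sample's summed counts as a proxy for its number of sequences.

```python
from collections import defaultdict
from itertools import combinations


def aggregate_by_homologene(unigene_counts, unigene_to_homologene):
    """Sum the unigene counts of one tissue per homologene identifier."""
    totals = defaultdict(int)
    for unigene, count in unigene_counts.items():
        group = unigene_to_homologene.get(unigene)
        if group is not None:  # unigenes without a homologene group are skipped
            totals[group] += count
    return dict(totals)


def most_expressed(counts, n):
    """Return the set of the n identifiers with the highest counts."""
    # Assumption: ties are broken by identifier to keep the result deterministic.
    ranked = sorted(counts, key=lambda k: (-counts[k], str(k)))
    return set(ranked[:n])


def shared_mes_percent(counts_a, counts_b, n):
    """Percentage of the n MESs shared by two tissues (assumed definition).

    Assumes both tissues have at least n distinct identifiers.
    """
    shared = most_expressed(counts_a, n) & most_expressed(counts_b, n)
    return 100.0 * len(shared) / n


def rank_tissue_pairs(tissues, n, min_sequences):
    """Rank all tissue pairs by shared MES percentage, highest first.

    Tissues whose summed counts do not exceed min_sequences are dropped.
    """
    kept = {name: counts for name, counts in tissues.items()
            if sum(counts.values()) > min_sequences}
    pairs = [(shared_mes_percent(kept[a], kept[b], n), a, b)
             for a, b in combinations(sorted(kept), 2)]
    return sorted(pairs, reverse=True)


# Toy data with hypothetical identifiers and counts, not taken from the database.
example = {
    "lung": {"Hs.1": 50, "Hs.2": 30, "Hs.3": 5},
    "kidney": {"Hs.1": 40, "Hs.2": 20, "Hs.4": 10},
    "testis": {"Hs.5": 60, "Hs.6": 25, "Hs.1": 2},
    "tiny": {"Hs.1": 1, "Hs.2": 1},  # too few sequences, filtered out
}
for percent, tissue_a, tissue_b in rank_tissue_pairs(example, n=2, min_sequences=10):
    print(f"{tissue_a}-{tissue_b}: {percent:.1f}%")
```

With the thresholds reported in the text, the call would use n=1000 and min_sequences=10000. For an inter-organism comparison, each tissue dictionary would first be passed through aggregate_by_homologene so that tissues from different organisms share the same identifiers.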
[1] Pontius, J. U., Wagner, L., Schuler, G. D. UniGene: a unified view of the transcriptome. In: The NCBI
Handbook. Bethesda (MD): National Center for Biotechnology Information; 2003.
[2] de Paula Pinto, S., Ortega, J. M. An algorithm to infer similarity among cell types and
organisms by examining the most expressed sequences (MESs). To appear in Genetics and
Molecular Research, special issue in Bioinformatics, 2008.