Next Generation Microbial Phenomics
Lisa R Moore1, Carrine Blank2, Gordon Burleigh3, Hong Cui4, Gail Gasparich5, Jing Liu3, Sonali
Ranade4
1University of Southern Maine
2Montana State University
3University of Florida
4University of Arizona
5Towson University
Biologists and non-biologists alike intuitively connect to the natural world and its underlying
scientific principles through phenotypes. This is true even for microscopic life. For centuries
microbiologists have composed detailed phenomic descriptions of microorganisms, such as
morphology, ecology, metabolism and host-cell interactions, which have been essential for
taxonomy and identification. In combination with gene sequence and genomic analyses,
phenomic information has been important for understanding evolution of microbial traits and
possible horizontal gene transfer events, and co-evolution of host-associated microbes. These
types of studies require large datasets. Obtaining large phylogenetic datasets has become
relatively easy; however, pulling together the phenomic characters from the rich legacy of
microbial literature is tedious. The goal of this project is to develop natural-language processing
tools to assemble phenomic data matrices mined from legacy taxonomic texts that can be used
for mapping phenomic characters onto phylogenetic matrices for analysis of microbial trait
evolution and visualizing the microbial tree of life. CharaParser is a natural-language processing
tool that was developed to analyze the text of phenomic descriptions and produce structured
output, and it has been used successfully with plant and insect descriptions (Cui 2012). We tested
CharaParser with microbial descriptions, but found that these descriptions, which often include
chemical terms and growth conditions, are very different from those of other taxa and were
not recognized well by the tool. Two other software tools, the Stanford Parser and
Open-Source Chemistry Analysis Routines (OSCAR), were tested for incorporation into a
new natural-language processing tool for use with microbial descriptions. To help with
developing a new algorithm, we are searching for microbial ontologies. These approaches will
be tested against existing hand-generated phenomic matrices (Blank and Sanchez-Baracaldo 2010) to assess their accuracy.
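As an illustration of the accuracy assessment described above, the following is a minimal sketch of how a text-mined character matrix might be scored against a hand-generated one. It is not the project's actual evaluation method: the nested-dictionary matrix layout, the function names, the choice of precision, recall and F1 as measures, and the placeholder taxa and character states are all assumptions made for the example.

```python
def matrix_cells(matrix):
    """Flatten {taxon: {character: state}} into a set of (taxon, character, state) cells."""
    return {
        (taxon, character, state)
        for taxon, characters in matrix.items()
        for character, state in characters.items()
        if state is not None  # None marks a missing or unscored cell
    }


def score_against_reference(extracted, reference):
    """Compare a text-mined matrix with a hand-generated one, cell by cell."""
    predicted = matrix_cells(extracted)
    expected = matrix_cells(reference)
    correct = predicted & expected  # cells where taxon, character and state all agree

    precision = len(correct) / len(predicted) if predicted else 0.0
    recall = len(correct) / len(expected) if expected else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical data: placeholder taxa and states, not taken from any published matrix.
reference = {
    "Taxon A": {"cell shape": "rod", "motility": "present"},
    "Taxon B": {"cell shape": "coccus", "motility": "absent"},
}
extracted = {
    "Taxon A": {"cell shape": "rod", "motility": "absent"},  # one wrong state
    "Taxon B": {"cell shape": "coccus"},                     # one missed character
}

scores = score_against_reference(extracted, reference)
print(scores)
```

Scoring at the level of individual cells keeps the two kinds of error separate: a wrong state lowers precision, while a character the parser never extracted lowers recall.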
H. Cui, 2012, CharaParser for fine-grained semantic annotation of organism morphological
descriptions. Journal of the American Society for Information Science and Technology 63(4): 738–754, doi:10.1002/asi.22618.
C. E. Blank and P. Sanchez-Baracaldo, 2010, Timing of morphological and ecological
innovations in the cyanobacteria – a key to understanding the rise in atmospheric oxygen.
Geobiology 8: 1–23, doi:10.1111/j.1472-4669.2009.00220.x.