ePlant and biological modelling – how and why do researchers use the software?

Freya Scoates, Department of Plant Sciences, University of Cambridge

Modelling has always been integral to scientific research, because researchers use models to help understand their theories and make predictions based on them. Large data sets have become a common feature of today's science as a result of technological advances. For example, recent developments in so-called 'ultra-high-throughput' sequencing mean that you can now sequence 5,000 Mb per day, or 5,000,000,000 base pairs in a day.[1] The entire human genome consists of only 3,000,000,000 base pairs: this means that it is now possible to sequence the entire human genome in one day.

The plant Arabidopsis thaliana (an annual in the brassica family) has a relatively small genome, at only 157 million base pairs. This makes it comparatively simple to use for genetic research, as it is easy to manipulate. Arabidopsis is now one of the most important model organisms used in genetics and biology. In contrast, the largest genome yet discovered is that of Paris japonica, a flowering plant; its genome is 149,000,000,000 base pairs long.[2]

These new experimental techniques generate very large data sets, and modellers are having to produce new models in response to this new kind of data. ePlant was developed as a way for researchers to integrate and interact with these massive amounts of data, using the model organism Arabidopsis. It allows you to select a gene and follow its influence over the plant from the genome all the way up to the whole organism.

What is ePlant?

ePlant is an online tool that allows researchers to visualise genetic data related to Arabidopsis thaliana, the model plant, in 3D.[3] It is part of a wider initiative which aims to improve biological models and to work with these vast data sets.
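The sequencing figures quoted in the introduction can be checked with a little arithmetic. The sketch below is only a rough illustration: it assumes each base is read exactly once at the quoted 5,000 Mb per day, and it ignores the repeated coverage and assembly work that a real genome project needs. The function and variable names are made up for this example.

```python
# Rough single-pass sequencing times at the throughput quoted in the text.
# Assumption: every base is read exactly once (no repeated coverage).
THROUGHPUT_BP_PER_DAY = 5_000_000_000  # 5,000 Mb per day

# Genome sizes in base pairs, as given in the article.
GENOME_SIZES_BP = {
    "Arabidopsis thaliana": 157_000_000,
    "Human": 3_000_000_000,
    "Paris japonica": 149_000_000_000,
}


def days_to_sequence(genome_bp, throughput=THROUGHPUT_BP_PER_DAY):
    """Return the days needed to read genome_bp bases once."""
    return genome_bp / throughput


if __name__ == "__main__":
    for name, size in GENOME_SIZES_BP.items():
        print(f"{name}: {days_to_sequence(size):.2f} days")
```

On these assumptions the human genome takes 0.6 of a day, Arabidopsis about 45 minutes, and Paris japonica roughly 30 days.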
ePlant allows you to pick a gene and view the known information about it in five key ways:

'Homologs and Polymorphisms' shows the linear genetic sequence of your gene of interest, along with known polymorphisms and the corresponding amino acid sequence. The amino acids are colour coded to show their physicochemical properties.

'Plant Expression' displays the varying levels at which your gene is expressed throughout the plant, by mapping data onto a 3D model of Arabidopsis in different colours.

'Tissue Expression' shows the relative levels at which your gene is expressed in different tissues, as differently coloured 3D images of plant tissues such as stamens and pollen.

'Subcellular Localisation' displays where in the cell the protein product is found, mapped onto a 3D model of a plant cell.

'Protein Model' shows the predicted tertiary structure of the protein product of your gene of interest. You can then manipulate the diagram to show different colours.

[1] Kircher, M. and Kelso, J. (2010) High-throughput DNA sequencing - concepts and limitations [Electronic version]. Bioessays 32: 524-536. doi:10.1002/bies.200900181
[2] Pellicer, J., Fay, M.F. and Leitch, I.J. (2010) The largest eukaryotic genome of them all? [Electronic version]. Botanical Journal of the Linnean Society 164(1): 10-15. doi:10.1111/j.1095-8339.2010.01072.x
[3] Fucile, G., Di Biase, D., Nahal, H., La, G., Khodabandeh, S. et al. (2011) ePlant and the 3D Data Display Initiative: Integrative Systems Biology on the World Wide Web. PLoS ONE 6(1): e15237. doi:10.1371/journal.pone.0015237

Why do researchers use ePlant?

ePlant helps geneticists and biologists learn about the expression of a gene in Arabidopsis. It allows researchers to observe the action of the gene at several different scales, from the primary structure of the protein to the way it is expressed in different plant tissues.
Because Arabidopsis is the model plant, the expression of the gene in Arabidopsis is used as a starting point for understanding its expression in other plants.

How will modelling tools change in the future?

This type of software, which allows you to look at so many different aspects of gene expression and organism development, is set to become more popular as our ideas and perspectives on plant development change and evolve. For example, another programme, The Computable Plant, has been developed at the University of California, and aims to put together a whole-systems view of developmental biology in plants. It is constructed to show how different environmental and genetic factors can jointly influence the biochemistry and morphology of a plant over the course of its lifetime.[4]

Similar software may also be developed to take advantage of the many new genomes due for release over the next few years. These include several commercially valuable crop plants: the tomato genome is due to be released towards the end of 2011, while the maize genome is also close to completion. Advancing our understanding of the development of these vital plants will be invaluable as researchers continue to try to improve crop productivity.

What is the future of ePlant?

The computer modelling programme, ePlant, was launched in October 2010 by a team in the Department of Systems and Computational Biology at the University of Toronto. At present, only 72% of known Arabidopsis gene sequences are available through ePlant; as more is discovered about the genes and proteins of Arabidopsis, more data can be included in the database. The creators are also planning to improve the visualisation of certain plant organs, such as the roots, so that more detailed images can be produced. Due to the incredible complexity of protein interactions within cells, our knowledge of different metabolic pathways is still developing.
The developers of ePlant plan to include a new function that allows users to see the metabolic pathways in which their gene of interest is involved.

[4] www.computableplant.org, retrieved 11th August 2011.