An Introduction to Modelling in Biology

Freya Scoates, Department of Plant Sciences, University of Cambridge

The rise of computer models to simulate different scenarios resulting from a given set of circumstances is one of the latest in a long line of techniques that people have used to visualise theories and ideas. Modelling is a fundamental skill in all science, and models range in complexity from simple population models to 'whole Earth systems' models, which attempt to capture the circulation of air and oceans across the entire planet. Computer modelling is set to become one of the main methods scientists use to explain their theories and make predictions; as a result, a working knowledge of modelling and its uses is essential for research scientists today.

In order to understand modelling, it is first important to appreciate the enormous range of experimental tools which can be categorised as models. In biology, perhaps the most obvious example is the use of 'model organisms': organisms used to analyse biological phenomena, of which the fruit fly (Drosophila melanogaster) is probably the best known. Arabidopsis thaliana (thale cress) is used as a model plant and was the first plant to have its genome sequenced; it was chosen for this because of its very small genome, spread across only five chromosomes. As a flowering plant (angiosperm), Arabidopsis provided the mutants used to uncover the genetic basis of flower development. The first documented Arabidopsis mutant was described by Alexander Braun in 1873i; Arabidopsis is now the most extensively used model organism for seed-producing plants, and as a result there is a wealth of information on its genetics and development, making it hugely important in agricultural research.

At the other end of the spectrum of biological models lies an abundance of numerical and statistical models. Earth systems models, for example, are built from equations describing fluid dynamics and thermodynamics; these equations are used to describe air circulation and ocean currents across the world, and form the basis of climate change predictions. Statistics are also used to visualise data in graphs, another form of model. A famous example of graphs modelling a biological system is the Lotka-Volterra model of predator-prey dynamics, which shows that predator numbers respond to variation in prey numbers and vice versa (a short simulation of this model is sketched below).

Image 1: Data on the population cycles of the lynx (predator) and hare (prey), a classic illustration of the Lotka-Volterra model of predator-prey cycles. Retrieved from https://www.math.duke.edu/education/webfeats/Word2HTML/Predator.html on 15th August 2011.

The wide range of different types of model perhaps reflects the different ways in which models can be used. One major role of models in science is to provide a visual representation of a particular theory. For example, many people have seen the bronze 'armillary' heliocentric models of the solar system, with 'planets' on rings which turn around the sun at the centre. These models are greatly simplified, but they make the central idea of a heliocentric solar system easy to grasp: they strip away details which are not relevant to the theory and include only those aspects which aid understanding, allowing the theory to be easily understood and popularised.

Image 2: A diagram of an armillary sphere illustrating the idea of a heliocentric solar system, with a sphere in the middle representing the sun and rings around the outside representing the orbits of the planets. By Flickr user Double—M.
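To make the Lotka-Volterra model concrete: in its simplest form it is a pair of coupled differential equations, dx/dt = αx - βxy for the prey x and dy/dt = δxy - γy for the predators y, where the Greek letters are rate constants for prey birth, predation and predator death. The short Python sketch below steps these equations forward with basic Euler integration; the parameter values and starting populations are purely illustrative, not fitted to the lynx-hare data in Image 1.

# A minimal sketch of the Lotka-Volterra predator-prey model.
# Parameters and starting populations are illustrative, not real data.

def lotka_volterra(prey, predators, steps=1000, dt=0.01,
                   alpha=1.0, beta=0.1, delta=0.075, gamma=1.5):
    """Step the predator-prey equations forward by Euler integration."""
    history = [(prey, predators)]
    for _ in range(steps):
        d_prey = (alpha * prey - beta * prey * predators) * dt
        d_pred = (delta * prey * predators - gamma * predators) * dt
        prey += d_prey
        predators += d_pred
        history.append((prey, predators))
    return history

# Print every hundredth time step.
for i, (x, y) in enumerate(lotka_volterra(40.0, 9.0)[::100]):
    print(f"t={i * 1.0:5.1f}  prey={x:6.1f}  predators={y:5.1f}")

Run for long enough, the two populations trace the offset cycles seen in Image 1, with each predator peak lagging the prey peak that fed it.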
Another important feature of models is that people can use them to make predictions. A model's ability to predict the future accurately is often used to judge its quality; if a model repeatedly fails to predict future events, it is usually believed to be wrong, and people use this criterion to refine their models. For example, Edmond Halley calculated the trajectory of his eponymous comet in 1705, using Newton's law of gravity to predict that the comet would next be seen in 1758. The comet was duly observed on Christmas Day 1758, and Halley's model of the comet's movement became far more widely accepted, corroborated by his successful prediction.

The ability to make predictions with a model is especially important when it is impossible to perform a physical experiment, whether because the experiment would be unethical or because the system is too large and complex to alter. Climate scientists, for example, can alter parameters in their models to see the effects such a change might have on the environment, which they could not do in real life because the scale of the experiment would be far too large (a toy example of this kind of parameter-changing is sketched at the end of this section). An added benefit is that a simulated experiment runs far faster than a real one: you are not restricted to observing results in real time, so you can follow the course of an experiment over a timespan of years or decades. You can also watch the results on a different scale, continental or molecular for example, which would ordinarily require costly techniques such as satellite imaging or electron microscopy.

Computer modelling has become increasingly popular in the scientific community since the middle of the twentieth century. This is partly a result of innovations in processing power, meaning that computers can now deal with vast data-sets and increasingly complex calculations. At the same time, computers have become smaller, more compact and faster, with more user-friendly interfaces; consequently, they are now far easier to use while their processing power is much greater.

Alongside these advances in computer science, new experimental technologies and techniques have massively changed the way data-sets are generated in certain scientific disciplines. The study of climate took a dramatic leap forward with NASA's Nimbus programme, which launched weather satellites to collect data on air pressure, temperature, rainfall patterns and other factors contributing to the global climate. As a result, people were able to build extremely large data-sets from all over the world; for the first time, we could look at global patterns in the climate and how they varied.

Image 3: A model generated by NOAA, the National Oceanic and Atmospheric Administration, showing a cross-section through the sea off the west coast of South America during an El Niño year. The colours show variation in temperature, while arrows show the direction of currents and winds. Image retrieved 15th August 2011 from http://www.pmel.noaa.gov/tao/jsdisplay/plots/gifani/sst-wind-cur-eqt-20c-med.gif.
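To give a deliberately crude flavour of what altering a parameter in a climate model means, the sketch below implements a 'zero-dimensional' energy-balance calculation in which absorbed sunlight balances radiated heat. The physical constants are real, but collapsing the whole planet to a single point with one adjustable emissivity is an assumption made purely for illustration; real Earth systems models solve fluid equations over a grid spanning the globe.

# A toy zero-dimensional energy-balance model of surface temperature.
# The constants are standard; the emissivity values are illustrative
# stand-ins for greenhouse strength, not outputs of any real model.

SIGMA = 5.67e-8   # Stefan-Boltzmann constant, W m^-2 K^-4
SOLAR = 1361.0    # solar constant, W m^-2
ALBEDO = 0.3      # fraction of sunlight reflected back to space

def equilibrium_temperature(emissivity):
    """Temperature (K) at which absorbed sunlight equals emitted heat."""
    absorbed = SOLAR * (1 - ALBEDO) / 4   # averaged over the sphere
    return (absorbed / (emissivity * SIGMA)) ** 0.25

# Re-running the 'experiment' with a different parameter takes
# milliseconds rather than decades: a lower effective emissivity
# (a stronger greenhouse effect) yields a warmer planet.
for emissivity in (1.0, 0.8, 0.62):
    t = equilibrium_temperature(emissivity)
    print(f"emissivity {emissivity:4.2f} -> {t:5.1f} K ({t - 273.15:5.1f} C)")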
Another field which began to generate large quantities of data was molecular biology. As high-throughput genome sequencing techniques have developed, the time taken to sequence a genome has fallen from years to hours. The latest ultra-high-throughput techniques can sequence 5,000,000,000 bases in a day, over one and a half times the roughly three billion bases of the human genome. By contrast, it took Fred Sanger two years to sequence the first genome, that of the bacteriophage ΦX174, which has only 5,386 bases.

In order to make these large quantities of data meaningful, we first needed to be able to analyse them statistically. Although it is possible to do this manually, the sheer number of data points involved means that it would take an unfeasibly long time, and computers filled the gap as programmers developed programmes which allowed data to be processed electronically at speed. One of the first examples of large-scale modelling using a computer came in World War II, when the Manhattan Project (which created the nuclear bomb) used computer simulations to explore how the bomb would detonate. It would have been incredibly costly and dangerous simply to detonate many different prototypes; simulation meant that scientists could have some idea of how the bomb would work before a single device had been tested.

There has been a conceptual shift in science in recent years, partly as a result of our increased knowledge of the links between the microscopic and the macroscopic, and of how changes in one can have profound consequences for the other. This is nowhere more evident than in biology, and in particular in developmental genetics. For example, a point mutation in the genome can lead to a new phenotype appearing in a population; this phenotype can alter an organism's fitness and eventually become widespread, which in turn can alter how the population interacts with its habitat. In response to this change in ideas, 'systems' biology has become a widespread academic discipline designed to study these complex biological systems. Models have adapted accordingly, so that several data-sets on different scales can be combined to show how changes at one scale cause changes at another. For example, the ePlant Arabidopsis modelling programme allows users to look at the genetic sequence of a chosen Arabidopsis gene, the amino acid sequence it encodes, and where the gene is expressed in the plant; the modellers have integrated different data-sets on different scales into a single programme.ii (A toy sketch of this kind of integration appears at the end of this section.)

Image 4: A screen-capture from ePlant, a piece of plant modelling software which shows the relationship between genetics and development. ePlant was created at the University of Toronto and released in October 2010. Image retrieved 15th August 2011 from http://142.150.215.220/eplant/.

The future of modelling in science appears to be firmly rooted online. The rise of the internet has altered the way we process and exchange information. Projects such as SETI@home link up internet users' computers in order to analyse their dataiii, and modelling software is now being released onto the web, fully accessible to anyone with an internet connection. In this way, models are becoming increasingly interactive: people are now able both to contribute their own data to modellers and to use the resulting models themselves.
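As a toy illustration of what integrating data-sets on different scales means in practice, the sketch below links three records for a single invented gene: its DNA sequence, the protein fragment that sequence encodes, and the tissues in which it is expressed. The gene identifier, sequence and expression values are all made up for this example (only the codon translations are real); ePlant itself is far richer, but the principle of joining scales through a shared identifier is the same.

# A toy sketch of multi-scale data integration. The gene identifier,
# sequence and expression values are invented; only the five codon
# translations in the table are real biology.

CODON_TABLE = {"ATG": "M", "GCT": "A", "GAA": "E", "TGG": "W", "TAA": "*"}

# Scale 1: the DNA sequence of a fictional gene.
genes = {"AT0G00001": "ATGGCTGAATGGTAA"}

# Scale 3: where that gene is expressed (arbitrary relative units).
expression = {"AT0G00001": {"leaf": 8.2, "root": 0.4, "flower": 3.1}}

def translate(dna):
    """Scale 2: derive the protein sequence from the DNA, codon by codon."""
    protein = ""
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":   # a stop codon ends translation
            break
        protein += amino_acid
    return protein

# The shared identifier lets all three scales be viewed together.
for gene_id, dna in genes.items():
    print(gene_id)
    print("  DNA:       ", dna)
    print("  protein:   ", translate(dna))
    print("  expression:", expression[gene_id])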
These developments open up an exciting range of possibilities for both researchers and the general public. The relatively new academic discipline of systems biology, in particular, is growing rapidly, and is certainly one to watch over the next decade.

i Meyerowitz, E. (2001). Prehistory and History of Arabidopsis Research. Plant Physiology, 125, 15-19. doi:10.1104/pp.125.1.15
ii Fucile, G., Di Biase, D., Nahal, H., La, G., Khodabandeh, S. et al. (2011). ePlant and the 3D Data Display Initiative: Integrative Systems Biology on the World Wide Web. PLoS ONE, 6(1), e15237. doi:10.1371/journal.pone.0015237
iii http://setiathome.berkeley.edu/, retrieved 15th August 2011.