An Introduction to Modelling in Biology

Freya Scoates, Department of Plant Sciences, University of Cambridge
The use of computer models to simulate the different scenarios that can arise from a given set of
circumstances is the latest in a long line of techniques that people have used to
visualise theories and ideas. Modelling is a fundamental skill in all science, and models range
in complexity from simple population models to ‘whole Earth systems’ models, which
attempt to capture the circulation of the air and oceans across the entire planet. Computer
modelling is set to become one of the main methods scientists use to explain their
theories and make predictions; as a result, a working knowledge of modelling and its uses is
essential for research scientists today.
In order to understand modelling, it is first important to appreciate the huge range of
experimental tools which can be categorised as models. In biology, perhaps the most
obvious example of modelling is the use of ‘model organisms’: organisms which
are used to analyse biological phenomena, with the fruit fly (Drosophila melanogaster)
probably the best known. Arabidopsis thaliana (thale cress) is used as a model plant and was
the first plant to have its genome sequenced. Arabidopsis was chosen for this because of its
very small genome, carried on only five chromosomes. As a flowering plant (angiosperm),
Arabidopsis provided the mutants used to discover the genetic basis of flower development.
Studies of Arabidopsis mutants date back to Alexander Braun in 1873 [i]; it is the
most extensively used model organism among seed-producing plants, and as a result there is a
wealth of information on its genetics and development, making it hugely important
in agricultural research.
At the other end of the spectrum of biological models lies an abundance of numerical and
statistical models. For example, Earth systems models are generated from equations which
describe fluid thermodynamics. These equations are then used to describe air circulation and
ocean currents across the world, and are also used as a basis for climate change predictions.
Statistics are also used to visualise data in graphs, another form of model. A famous example
of graphs modelling a biological system is the Lotka-Volterra model, which explains
predator-prey dynamics, showing that predator numbers respond to variation in prey numbers
and vice versa.
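The Lotka-Volterra dynamics described above can be sketched numerically. The following is a minimal illustration only: the parameter values are arbitrary, not fitted to the lynx-hare data, and the two coupled equations are integrated with a simple Euler step.

```python
# Minimal sketch of the Lotka-Volterra predator-prey equations.
# Parameters are illustrative, NOT fitted to real lynx-hare data.
#   d(prey)/dt = alpha*prey - beta*prey*pred
#   d(pred)/dt = delta*prey*pred - gamma*pred

def lotka_volterra(prey0, pred0, alpha=1.0, beta=0.1,
                   delta=0.075, gamma=1.5, dt=0.001, steps=20000):
    """Integrate with a simple Euler step; return population histories."""
    prey, pred = prey0, pred0
    prey_hist, pred_hist = [prey], [pred]
    for _ in range(steps):
        dprey = (alpha * prey - beta * prey * pred) * dt
        dpred = (delta * prey * pred - gamma * pred) * dt
        prey += dprey
        pred += dpred
        prey_hist.append(prey)
        pred_hist.append(pred)
    return prey_hist, pred_hist

prey, pred = lotka_volterra(prey0=10.0, pred0=5.0)
# The two populations oscillate: each prey peak is followed by a
# predator peak, as in the classic lynx-hare graphs.
print(f"prey range: {min(prey):.1f} to {max(prey):.1f}")
print(f"predator range: {min(pred):.1f} to {max(pred):.1f}")
```

Plotting `prey_hist` and `pred_hist` against time reproduces the offset cycles shown in the lynx-hare figure.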
Image 1: The data on the cycles of lynx (predator) and hare (prey) which is a classic
illustration of the Lotka-Volterra model of predator and prey cycles. Retrieved from
https://www.math.duke.edu/education/webfeats/Word2HTML/Predator.html on 15th
August 2011.
The wide range of different types of model perhaps reflects the different ways in which
models can be used. One major role of models in science is to generate a visual
representation of a particular theory. For example, many people have seen the bronze
‘armillary’ heliocentric models of the solar system, with ‘planets’ on rings which can turn
around the centre, symbolising the sun. These models are very simplified but make it easy to
understand the central idea of a heliocentric solar system; they strip away details which are
not relevant to the theory and include only those aspects which aid understanding, allowing
the theory to be easily understood and popularised.
Image 2: A diagram of an armillary sphere which illustrates the idea of a heliocentric
solar system, with a sphere in the middle representing the sun and the rings round the
outside illustrating the orbits of the planets. By Flickr user Double—M
Another important feature of models is that they can be used to make predictions.
A model’s predictive power is often used to judge its validity:
if a model repeatedly fails to predict future events, it is usually considered to be
wrong, and this criterion is used to refine models. For example, Edmond Halley
calculated the trajectory of his eponymous comet in 1705, using Newton’s Law of Gravity to
predict that the comet would next be seen in 1758. The comet was next observed on
Christmas Day, 1758; this meant that Halley’s model of comet movement was more widely
accepted, as it had been corroborated by his successful prediction of the comet’s movement.
This ability to make predictions using a model is important when it is impossible to perform a
physical experiment. This may be because it is unethical to perform the experiment, or
because the system is too large and too complex to alter. For example, climate change
scientists can alter parameters in their models to see the effects such a change may have on
the environment, which they could not do in real life as the scale of the experiment would be
too large. An added benefit is that you can run the experiment far faster as you are not
restricted to observing your results in real-time, meaning that you can predict the course of
your experiment over a timespan of years or decades. You can also view your results on a
different scale (for example, a continental or a molecular scale), which you would not
be able to do ordinarily without using costly techniques such as satellite imaging and electron
microscopy.
Computer modelling has become increasingly popular in the scientific community since the
mid-twentieth century. This is partly a result of innovations in computer processing
ability, meaning that computers can now deal with vast data sets and increasingly complex
calculations. At the same time, computers have become smaller and more compact,
with more user-friendly interfaces. Consequently, computers are now far easier to use while
their processing power is also much greater.
Alongside these advances in computer science, there has been a massive change in the way
data-sets are generated in certain scientific disciplines due to the use of new experimental
technologies and techniques. The study of climate took a dramatic leap forward with
NASA’s Nimbus programme, which launched weather satellites to collect
data on air pressure, temperature, rainfall patterns and other factors that contribute to the
global climate. As a result, people were able to build extremely large data-sets from all over
the world. For the first time, we could look at global patterns in the climate and how these
varied.
Image 3: A model generated by NOAA, the National Oceanic and Atmospheric
Administration. This shows a cross-section through the sea off the west coast of
South America during an El Niño year. The colours show variation in temperature,
while arrows show the direction of currents and winds. Image retrieved 15th August 2011 from
http://www.pmel.noaa.gov/tao/jsdisplay/plots/gifani/sst-wind-cur-eqt-20c-med.gif.
Another field which started to generate large quantities of data was the discipline of molecular
biology. As high-throughput genome sequencing techniques have developed, the time it
takes to sequence a genome has fallen from a matter of years to a few hours. The latest
ultra-high-throughput sequencing techniques mean that you can sequence 5,000,000,000
bases in a day, more than one and a half times the length of the human genome. By contrast,
it took two years for Fred Sanger to sequence the first DNA genome, that of bacteriophage
phiX174, which has only 5,386 bases.
In order to make these large quantities of data meaningful, we first needed to be able to
analyse them statistically. Although it is possible to do this manually, the sheer number of
data points involved means that it would take an unfeasibly long time. Computers filled
this gap, as programmers developed software that could process the data electronically,
at speed.
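To give a sense of the speed involved, the toy example below (not from the article) summarises a million simulated measurements in a fraction of a second, a task that would take weeks by hand.

```python
# Toy illustration: summarising one million simulated readings.
# The data are invented here purely to demonstrate the speed of
# electronic data processing.
import random
import statistics
import time

random.seed(42)
# Simulate 1,000,000 temperature readings centred on 15 degrees C
readings = [random.gauss(15.0, 2.0) for _ in range(1_000_000)]

start = time.perf_counter()
mean = statistics.fmean(readings)
stdev = statistics.stdev(readings)
elapsed = time.perf_counter() - start

print(f"mean = {mean:.2f}, stdev = {stdev:.2f}")
print(f"computed in {elapsed:.3f} seconds")
```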
One of the earliest examples of large-scale modelling using computers came from the
Manhattan Project (which created the first nuclear bombs), where calculations were run to
simulate how the bomb would detonate. It would have been incredibly costly and dangerous
to simply detonate many different prototypes: using a computer simulation meant that
scientists could have some idea of how the bomb would work before a single bomb had been
tested.
There has been a conceptual shift in science in recent years, partly as a result of our
increased knowledge of the links between the microscopic and the macroscopic: how
changes in one can have profound consequences for the other. This is nowhere more evident
than in the field of biology, and in particular developmental genetics. For example, a base
point mutation in the genome can lead to a new phenotype appearing in a population; this
phenotype can alter an organism’s fitness and eventually become widespread in the
population. This can then alter how the population interacts with its habitat. In response to this
change in ideas, ‘systems’ biology has emerged as a widespread academic discipline devoted
to these complex biological systems. Models have adapted accordingly, allowing several data
sets on different scales to be combined so that changes at one scale can be traced to
changes at another.
For example, the ePlant Arabidopsis modelling programme allows users to look at the genetic
sequence of a chosen Arabidopsis gene, its amino acid sequence and where these genes are
expressed in the plant; the modellers have been able to integrate data-sets on different
scales into a single programme. [ii]
Image 4: A screen-capture from ePlant, a plant modelling programme which shows the
relationship between genetics and development. ePlant was created at the University
of Toronto and released in October 2010. Image retrieved 15th August 2011 from
http://142.150.215.220/eplant/.
The future of modelling in science appears to be firmly rooted online. The rise of the internet
has altered the way we process and exchange information. Groups such as SETI are able to
link up with other internet users’ computers in order to analyse their data [iii]; and modelling
software is now being released onto the web, fully accessible to anyone with an internet
connection. In this way, models are becoming increasingly interactive: people are now able to
both contribute their own data to modellers and then use the resulting models themselves.
This opens up an exciting range of possibilities for both researchers and the general public.
The relatively new academic discipline of systems biology, in particular, is growing rapidly,
and is certainly one to watch over the next decade.
[i] Meyerowitz, E. (2001). Prehistory and History of Arabidopsis Research. Plant Physiology,
125, 15-19. doi:10.1104/pp.125.1.15
[ii] Fucile, G., Di Biase, D., Nahal, H., La, G., Khodabandeh, S. et al. (2011). ePlant and the
3D Data Display Initiative: Integrative Systems Biology on the World Wide Web. PLoS ONE
6(1): e15237. doi:10.1371/journal.pone.0015237
[iii] http://setiathome.berkeley.edu/, retrieved 15th August 2011.