COMMON ERRORS & HOW TO AVOID THEM
Hugo van den Berg
MOAC and Systems Biology Doctoral Training Centres
Warwick University
2010
I. COMPOSITION
“Grammar don’t matter, do it?”
The following is a list of elements of style, grammar and
spelling, to which you must pay attention whenever you write something to hand in. You may
object that this is unfair: that all that matters is the quality of your scientific insight, knowledge,
and achievements, not your grasp of grammar or the elegance of your writing. Indeed, you may
be more cynical and suggest that success in science does not even depend primarily on the quality
of your work. Still, if you wish your written work to have lasting value and appeal to people in
future generations whom you cannot influence by other means, you will have to learn to write
with clarity. Moreover, it is easy to grossly overestimate how well you understand a given topic.
Attempting to write with clarity is a useful reality check. You may object that language is just a
set of conventions. True, and you must adhere to these conventions for the same reasons you
observe the Highway Code. Remember that written text is a poor medium, compared to
conversation. When you speak to a person, he or she can indicate that you need to explain
something in more detail (or, on the contrary, that they know all about it so you can cut to the
chase). But when you are writing you lack all these clues, and the elements of style that make up
good prose constitute one way of making up for these shortcomings.
An asterisk (*) indicates that an incorrect sentence or clause follows. Error codes used when
marking students’ work are indicated in bold face.
Agreement (Ag) The grammatical number of the verb must be the same as that of the
corresponding noun:
* The pH of the P-phase and the N-phase were measured.
The pH of the P-phase and the N-phase was measured.
This is a typical example where the plurality of the intervening clause causes the writer to forget
that it is the pH that was measured. Note that statistics, dynamics, genetics, proteomics, genomics
are all singular. Data is actually the plural of datum, but is nowadays treated by almost all
speakers as a singular mass term (which raises the question of what to call a single data item: a
data point? an observation? say datum and you sound like the professor who ordered a martinus).
one bacterium       two or more bacteria
one criterion       two or more criteria
one phenomenon      two or more phenomena
one ganglion        two or more ganglia
Bacteria (the plural) might refer to several bacterial cells, or two or several bacterial species. The
locution they for a singular person looks and sounds much better than he or she or (s)he, but in
written text it is jarring because it looks too much like an agreement error and, moreover, many
still view singular they as a colloquialism (q.v.).
Apostrophe (Apo) The apostrophe indicates relations of possession:
the enzyme’s = of the enzyme
the enzymes = more than one enzyme
the enzymes’ = of more than one enzyme
The rule is no different for acronyms and abbreviations:
the RNA’s = of the RNA
the RNAs = more than one RNA
the RNAs’ = of more than one RNA
although some writers feel the plural of an acronym needs an apostrophe, too. Names ending in -s
follow the same rules (Bridget Jones’s Diary, the Joneses’ new car), with the exception of time-honoured luminaries (Jesus’ teachings). The rule is different for its, which like the pronouns
theirs and hers is a possessive without an apostrophe; it’s means it is or it has, but remember that
you should not use contractions in academic writing. Irregular plural possessives are formed thus:
children’s, people’s. Thus, men’s clothing is men’s wear, even though retail signage invariably
reads *menswear.
Bastardized English (BE) Foreign students should take care to note that not everything they have
come to believe is English actually is English. They are kindly requested not to refer to a data
projector as a beamer, the latter being a car manufactured by BMW. They should avoid non-idiomatic constructions such as
*This is how it looks like.
*We now have the possibility to obtain an asymptotic result.
The first of these must be the most common example of non-idiomatic English uttered in
seminars; the second sounds like something a Ukrainian gangster might say (the grammar, not the
maths). Such things can and do change, but this is best left to native speakers. German students
should refrain from referring to their mobile phones as handies (or, even worse, Handy’s). Asians
should take care to avoid incorrect locutions with about:
* Discuss about… * Mention about…
*Analyse about…
*A problem about…
Already and yet require a perfect past tense:
*The experiment was done already by Ed et al.
Do not use since where for is correct, as in:
*The protocol, due to Al et al., has been in use since ten years.
In each of the following pairs of sentences, the two juxtaposed sentences mean different things:
I like to express my gratitude.     I would like to express my gratitude.
I am interesting.                   I am interested.
The ones on the left express distinctly oddball sentiments.
Colloquialisms (Coll) Strive to write as you speak (indeed, you will avoid most syntactical errors
if you simply avoid writing things you would never say) but remember that written text lacks
some of the advantages of interpersonal contact. In particular, written text can look odd, jejune, or
strained when it is too informal:
*This leaves the RNA polymerase molecule in a bit of a bind.
*The law of large numbers is da bomb.
*Hopefully the octopus makes another attempt to copulate.
*Anaerobic bacteria are ideally suited to this sort of thing.
The first example may well be perfectly acceptable ten years from now, whereas the second
example will be, like, so last decennium. While hopefully could be defended as an elliptic idiom,
the trouble is that the third sentence can be read as imputing hope to octopi, which is probably not
what is meant (although the sentence would be acceptable as part of a wildlife video narration). In
a slightly informal expository text, an expression such as this sort of thing might not be out of
place. Overused filler words (very, really, definitely, fairly, quite, nice) should be avoided, unless
of course you really really mean it (to ban all such words outright would be pedantic; nonetheless,
be careful). Mentally substitute the word damned for very whenever you want to write the latter
and decide whether you really do feel that strongly about it.
*Separation of variables is a very important technique.
Separation of variables is a technique that often proves useful in practice.
Here, the need to eliminate very prompted a more precise and informative rephrasing. One reason
why these words are overused in conversation, and look so sloppy in writing, is that each of them
can mean many different things. If you are tempted to use such a word, try to think of a synonym
with a less wide meaning. For instance, instead of really consider truly, genuinely, considerably;
instead of very consider extremely, intensely, utmost, or, better yet, add a phrase that explains the
very and renders it superfluous. Avoid dropping successfully in sentences reporting even the
slightest of accomplishments.
Dangling elements (Dang) A dangler is a participle or gerund that is not linked to a
corresponding noun:
*Considering the affinity, the mutant enzyme had a lower Km.
*Using these definitions, the key equation follows.
*Having spoken at various conferences, Diplodocus was a giant herbivore.
*When studying spiders, salticids are not easily mistaken for something else.
The -ing forms that start these sentences express an action not possible for the subjects of these
sentences (enzyme, equation, Diplodocus, salticids, although intriguingly salticids do seem to be
keen observers of fellow arachnids). While danglers could be defended as idiomatic elliptical
constructions, they should be avoided in view of the comical effect they can have. Some students,
vaguely remembering that -ing forms at the beginning of a sentence are associated with some sort
of trouble, will seek the safety of the following construction:
*In terms of affinity, the mutant enzyme had a lower Km.
Whereas this is not strictly wrong, such clunky use of in terms of does not make for attractive
prose and is symptomatic of lazy writing.
Green squiggles The built-in grammar checker that puts green squiggles underneath some bits of
your prose is usually right, but not always.
Heterogeneous co-ordination (Het) Nouns that are syntactically co-ordinate should belong to the
same category of meaning:
*The Calvin cycle is more costly than heterotrophy.
*Genomics includes alternative splicing.
*Multiple signaling pathways control homeostasis.
Heterotrophy, as a mode of existence, should be compared to autotrophy (a key component of
which is the biochemical pathway of the Calvin cycle).
Irrelevant material (Irr) Your essay, assignment write-up, or research report is there to get a point
across (or a cluster of related points). Anything that detracts from this goal should not be there.
Material that interrupts the flow of the text too much but should be there to serve the needs of
some readers (long tables, detailed proofs) should be delegated to appendices. Above all, do not
succumb to the feeling that you need to include material merely to showcase your knowledge or
understanding (some lecturers do play “gotcha” but if this happens you can console yourself with
the knowledge that they are poor teachers, and that you will do better when you become one).
Mixed construction (Mix) The construction of the sentence should not change in mid-stream:
*Meiosis is when the diploid genome becomes haploid.
Such errors occur very frequently and can easily be prevented simply by listening to what you
have written.
Colon, semi-colon, comma, full stop (Punc) The colon is the “double dot” and is used when the
following material elaborates the implications of the initial statement:
Substance X is a non-competitive inhibitor: it changes Vmax but not Km.
The semi-colon is the “dot-comma” and separates statements that are complementary and
parallel. When in doubt, use a full stop (unless all your sentences end up being less than 10 words
long, which will make you sound like a robot). The subject of your sentence does not end with a
comma, even when it is a long subject complement clause:
*Integrative homeostatic dynamics models, have been used more recently.
If you are afraid the sentence becomes too difficult to parse without the comma, you should
rephrase it. A comma is nowadays more and more used where one would traditionally expect a
semi-colon or a full stop:
*Microarrays chart gene expression patterns, two systems are available.
This sounds as if the writer does not properly understand the logical connection between the two
clauses. The comma should not be regarded as a one-stop shop for connecting any old pair of
related thoughts:
*The mutant ligand is ineffective, it is unable to bind the receptor.
Instead, use a full stop or an appropriate conjunction:
The mutant ligand is ineffective, because it is unable to bind the receptor.
Note that you could not use therefore instead of because in this last sentence. To develop a
feeling where commas should go, read your sentences out loud and pause where you have written
commas. You will hear superfluous commas as unnatural pauses. From this discussion you may
get the impression that a full stop is your best bet when in doubt; this is not too bad as a general
rule of thumb, as long as you remember that each sentence should be complete, with main verb
and predicate, and that too many short sentences following upon one another result in a staccato
“machine gun” effect.
A subordinate clause which you would read out in a lower voice should be flanked on both
sides by commas:
The Van der Waals forces, named after one of the many brilliant Dutch physicists, play a key
role in intramolecular interactions.
*We will explain with the aid of examples, the advantages of differential equations.
The last sentence requires either another comma (before with) or that the one that follows
examples be left out. The word however has two meanings. In the meaning “be this as it may” (or
simply “but”), however should be flanked by commas or, if it appears at the beginning of a
sentence, it should be followed by a comma:
However, the second experiment showed an unexpected result.
The microarray analysis, however, did not confirm our hypothesis.
When however has its other meaning of “regardless of” it is not followed by a comma:
The neurone did not hyperpolarize, however much ATP was added.
Full stops (periods) end sentences. Having a full stop where one should have a semi-colon is
usually admissible, but a semi-colon for a full stop may look pretentious. Full stops also end
abbreviations, but not those that end in the last letter of the unabbreviated word:
doctor: Dr      doctors: Drs
mister: Mr      misters: Messrs
A selection of Latin abbreviations that occur regularly in scientific writing:
cf. = compare (confer) It does not mean “see”.
c.q. = in which case (casu quo) It does not mean “or”.
c.s. = and fellows (cum suis)
et al. = and others (et alia) No period follows et which is a complete word.
etc. = and so on (et cetera) When speaking, avoid saying “egg seterah”.
e.g. = for example (exempli gratia)
i.e. = that is (id est) When speaking, try to say “that is” and not “Aye ee”.
q.v. = which one should look up (quod vide)
s.l. = in the broad sense (sensu lato)
s.s. = strictly speaking, in the narrow sense (sensu stricto)
viz. = namely (videlicet)
The abbreviation c.s. is used to refer to a usually prominent person together with the people he or she
works with or who follow him or her. The abbreviation et al. is now spelled et al without the full
stop by many scientific journals. Sensu lato and sensu stricto are usually written out in full.
It is lazy writing to put etc. at the end of a list or enumeration when you have a vague feeling
you may have forgotten one or more similar items (and are afraid, perhaps, that the reader will
take you to task for it). Only use etc. if the reader can easily supply more examples:
Specialized training is required to treat zoo animals such as monkeys, elephants,
crocodiles, tigers etc.
*The blood transports oxygen, nutrients, enzymes etc.
In the second sentence, there certainly are other blood components that have been left out, but
they do not belong to a single category and the list is therefore not readily extendable. You can
always use including or some phrase to similar effect to indicate the fact that the enumeration is
not complete, nor meant to be. (Another legitimate use of etc. is to abbreviate a formula such as a
list of honorifics, but you are unlikely to find yourself needing this in scientific writing.) In the
type-setting language LaTeX, input
i.\ e.\     or: i.e.\
et al.\
to obtain proper spacing following the full stop (omit the second backslash if the abbreviation
actually ends the sentence, and note in passing that a single full stop will do the job of ending
both abbreviation and sentence). Microsoft Word is hopeless at this sort of thing, so it is better to
write i.e. than i. e. Also, you are not required to italicize these abbreviations, although you should
feel free to do so.
Quotations & reference (Quo) Always attribute facts and findings to the source that provided
them, both to pay tribute to the original contribution and to assign responsibility. (Of course, your
source is in no way responsible for any misinterpretations on your part.) By all means use
Wikipedia, but always follow up references; if the Wikipedia page does not provide them, find
your own. Wikipedia cannot be trusted; its editing process means that pages often do not even
concord with their own references! Fragments of text that you lift from your sources should be
put between quotation marks and be attributed. If you fail to do this you are plagiarizing. Note
that opening quotes are “sixes” and closing quotes are “nines”. In the last sentence the nines
precede the full stop, whereas standard practice reverses this order; you should feel free to follow
either convention. In scientific prose the need seldom arises to quote whole paragraphs (this is
different for scholarly work). If you quote sentence fragments, make sure they are syntactically
contiguous with the surrounding text. Single sixes and nines can be employed to distinguish the
mention of a word from its use:
‘Boston’ has six letters, whereas Boston has six million inhabitants.
Alternatively, you can put the mentioned words in italics (Boston has six letters). Arguing from a
strictly logical point of view, you would expect that offensive words become inoffensive when
you mention them rather than use them, but this is not the case: such words still jump from the
page and may trigger outrage.
Restrictive versus non-restrictive (Res) Compare the following:
The fuel of red blood cells is the carbohydrate glucose.
*The fuel of red blood cells is the carbohydrate, glucose.
The second sentence suggests (incorrectly) that glucose is the only carbohydrate. Additional
(non-restrictive) information appears between commas:
Lactate dehydrogenase, which is a protein, is found in red blood cells.
If you use that instead of which in the previous sentence, you imply that there is also a non-proteinaceous lactate dehydrogenase (which could be true but is probably not what you meant).
Defining (restrictive) information cannot appear between commas:
The enzyme that converts pyruvate to lactate is found in red blood cells.
It would be incorrect to put a comma before that and/or following lactate. In British English, it is
acceptable to use which instead of that in a restrictive clause, but that can only appear in a
restrictive clause.
Split infinitive (SI) It is not always wrong to split an infinitive:
(*?)To fully understand the effect, a more detailed analysis is required.
Nevertheless, in some cases it is better avoided:
*The parasite attempts to forcefully enter the host.
(*??)To systematically elucidate the relationship between HDL and atherosclerotic risk, we
need to better understand the key regulatory factors.
Spelling (Spell) Use the facilities available (automated spelling correction, oed.com). Spell
checkers do not pick up mistakes if the misspelled word happens to spell something else:
weather = a meteorological condition       whether = if in the subjunctive sense
which = that                               witch = gothic-looking woman who casts spells
principal = main, foremost                 principle = fundamental element, axiom
to forgo = to give up on, to do without    to forego = to go before, to precede
to effect = to make happen                 to affect = to modify, alter, influence
an effect = a consequent phenomenon        an affect = a certain cognitive state
complimentary = courtesy-wise              complementary = supplying the remainder
to is the preposition                      too = also
to advise (verb)                           an advice (noun)
to extend (verb)                           an extent (noun)
to save (verb)                             safe (noun and adjective)
to price (cost), to prize (appreciate)     a price (cost); a prize (award)
ensure = make sure                         insure = what an insurance company does
If you find a paper in which the authors perform *“principle component analysis” you should
wonder whether the authors have any idea what they are talking about. British and American
spelling are equally valid, but you should be consistent in your choice. Verbs ending in -ize or -ise present a special problem. One solution is to use -ise in all cases (as William Shakespeare has
a character exclaim: “Thou whoreson zed! Thou unnecessary letter!”). However, etymology and
phonetics both favour -ize in most cases (to characterize, to analyze). Exceptions (which should
always be spelled with an s) include: to devise, to advise, to apprise, to comprise, to despise, to
excise, to revise, to supervise, to surmise, to exercise, to improvise.
The indefinite article is written either a or an. Correct usage follows phonetics, not spelling:
an mRNA molecule
a uniform
an LSD-derivative
a Yemenite
an x-axis
a utopia
an NYPD officer
a NASA initiative
Symbols at beginning of sentences (Sym) Avoid beginning a sentence with a mathematical
symbol or a chemical formula or a digit:
*f is defined by an ordinary differential equation.
The function f is defined by an ordinary differential equation.
*Al responded differently.
Aluminium responded differently.
*4 mutants were selected for further study.
Four mutants were selected for further study.
This When this is followed by its referent, confusion is unlikely to arise:
This phenomenon is called ‘stochastic resonance’.
When this refers back to an element in a preceding sentence, its precise meaning may elude the
reader:
*For larger parameter values, two stationary points appear. This is a bifurcation.
It is safer to augment such occurrences of this with a noun or clause that recapitulates the
referent:
This variation of the number of stationary points as the parameter value changes is known as
a bifurcation.
Repetitive material (Rep) Saying things more than once in different ways is a key technique in
exposition, so not all repetition is automatically bad. However, a paragraph or sentence that
contains nothing new, or does not permit the reader to view the matter in a different way, serves
no purpose and had better be omitted.
Usage (Usage) Note the difference between whether and if:
We must see whether the weather allows it, and if it does, we will go.
Adverbs in English tend to end in -ly or -wise or -ways, but not always:
Work hard and you will succeed.
Shakespeare would still have written hardly here and not meant it in the modern sense. Not
everything that seems to explicate a verb is an adverb:
*The door was painted redly.
We say red here because it is a predicative adjunct (it says more about what the door becomes
than about the painting process). The advice to native speakers is not to add -ly where their
instinct tells them to leave it out.
Other things to bear in mind:
We need only show...              It suffices to show...
different from...                 (never *different than)
farther (distance)                further (anything else)
to imply (A implies B)            to infer (a person infers B from A)
to compare = seek similarities    to contrast = bring out differences
uninterested = not interested     disinterested = without a stake in the matter
*warm/cold temperature            high/low temperature
*expensive/cheap cost or price    high/low cost/price (or: prohibitive etc.)
*irregardless                     irrespective (or: regardless)
few people, a few attempts        less money, less daunting, less water
Fewer than is now almost invariably replaced by less than in everyday speech, and it is to be
expected that written English will follow suit within the next few decades.
A thesaurus is a good tool if you momentarily cannot think of the expression that is on the tip
of your tongue, but do not be tempted by the delicious unusual words you see along the way (the
plural of rhinoceros is rhinocerotes; the collective noun of butterflies is a kaleidoscope). Stick to
words that belong to your normal voice and if you do try something new, make sure the word or
phrase means what you think it means.
II. PRESENTING STATISTICAL RESULTS
“Lies, damned lies, and statistics”
You will learn about statistics in a separate module and
you may well decide that mastering the technical nitty-gritty of it is not for you. Be that as it may,
you are negligent if you fail to heed the following points of advice.
Mean versus median The mean is the average of the data (the sum of the observations divided by
their number). The median is any number such that half the data are larger than this number (i.e.
the 50th percentile). In symmetric distributions, the mean is a median, but this is not the case when
the distribution from which the data were sampled is skewed. In the latter case, it may be better to
report the median.
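The difference is easy to see on a small skewed sample. The following is a minimal sketch in Python; the numbers are invented for illustration and are not taken from any real data set.

```python
import statistics

# Hypothetical right-skewed sample: six small values and one very large one.
skewed = [2.1, 2.4, 2.5, 2.8, 3.0, 3.3, 41.0]

# Mean: the sum of the observations divided by their number.
mean_value = statistics.mean(skewed)

# Median: the middle value of the sorted sample.
median_value = statistics.median(skewed)

print(f"mean   = {mean_value:.2f}")
print(f"median = {median_value:.2f}")
```

The single large observation drags the mean well above the bulk of the data, whereas the median stays among the typical values.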
SEM versus SD The SEM is the standard error of the mean. It estimates the accuracy of the
sample mean as an estimate of the population mean. The latter is unknown, but is of course more
reliably estimated when the sample size increases. Thus, as the number of observations becomes
ever larger, the SEM shrinks to zero. The SEM is often (incorrectly) abbreviated to SE, ‘standard
error’. The SD is the sample standard deviation. It is a measure of the variability (often called
“spread”) in the data around the sample mean. You should use the one marked n-1 on your
calculator, since this is based on the unbiased estimate of the variance of the distribution
from which the data were sampled. The SD should be used when one is reporting data. However,
almost all scientists incorrectly use the SEM nowadays because it is invariably smaller than the
SD. If your supervisor is one of these people, try to re-educate him or her.
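The relation between the two quantities can be sketched in a few lines of Python. The function name sd_and_sem and the measurements are illustrative assumptions; the only formula used is the usual one, SEM = SD divided by the square root of the sample size.

```python
import math
import statistics

def sd_and_sem(sample):
    """Return (SD, SEM) for a sample (hypothetical helper for illustration)."""
    n = len(sample)
    sd = statistics.stdev(sample)   # the "n-1" version of the standard deviation
    sem = sd / math.sqrt(n)         # standard error of the mean
    return sd, sem

# Invented measurements, for illustration only.
measurements = [4.0, 5.0, 6.0, 7.0, 8.0]
sd, sem = sd_and_sem(measurements)

# The same values repeated twenty times: a larger sample with similar spread.
big_sd, big_sem = sd_and_sem(measurements * 20)

print(f"n = 5:   SD = {sd:.3f}, SEM = {sem:.3f}")
print(f"n = 100: SD = {big_sd:.3f}, SEM = {big_sem:.3f}")
```

The SD stays roughly the same as the sample grows, because it describes the spread of the data, while the SEM shrinks, because it describes only how well the mean has been pinned down.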
Statistical significance versus scientific importance Report the P-value. If this is not possible,
present the lowest of the conventional cut-off values that is higher than the P-value. (They are,
traditionally, 0.001, 0.01, and 0.05 although you may encounter other values. Thus, if P=0.005,
report P<0.01, not P<0.05 even though the latter is of course also true.) Now, if you cross a
certain busy road twice a day with a 5 percent probability of being run over each time, your chances of
being alive next week are worse than even and you will almost certainly be dead a year from
now. It is therefore right and proper that P-values above 0.05 are not considered to indicate a
statistically significant result, but you should remember that, say, P=0.04 is not much better. A
few quality journals have moved the goal post to 0.001, which is commendable.
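Both pieces of arithmetic in this paragraph can be checked with a short Python sketch. The function names are hypothetical, and the road-crossing calculation assumes that the crossings are independent events with the same risk each time.

```python
# Conventional cut-off values named in the text.
CUTOFFS = (0.001, 0.01, 0.05)

def report_p(p, cutoffs=CUTOFFS):
    """Return the lowest conventional cut-off that is higher than p."""
    for cutoff in sorted(cutoffs):
        if p < cutoff:
            return f"P<{cutoff}"
    # No conventional cut-off lies above p: report the value itself.
    return f"P={p}"

def survival_probability(p_accident, crossings):
    """Probability of surviving all crossings, assuming independence."""
    return (1.0 - p_accident) ** crossings

week = survival_probability(0.05, 2 * 7)     # two crossings a day for a week
year = survival_probability(0.05, 2 * 365)   # two crossings a day for a year

print(report_p(0.005))
print(f"alive after a week: {week:.3f}")
print(f"alive after a year: {year:.1e}")
```

The weekly survival probability comes out just below one half, which is the sense in which a 5 percent risk per event is far from negligible.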
Whether or not a finding is statistically significant is much less important than the associated
confidence interval. Suppose that a drug is found to lower blood pressure on average 8 mm Hg
from 100 to 92 mm Hg (mm Hg is not the SI unit of pressure, but it is what most medics use).
The finding is statistically significant. Is it clinically significant? Well, that depends. The
confidence interval may be 2 to 14 mm Hg. The higher end of this range is clinically important,
whereas the lower end is not. Altogether the findings are clinically inconclusive. You may agree
that the very term ‘significant’ tends to obfuscate the issue. Many statisticians nowadays prefer
the term statistically detectable; you are encouraged to adopt this usage and re-educate your
supervisors.
Returning to the drug example, suppose that the study is followed up by a similar study with
more observations. Now the confidence interval is 7 to 10 mm Hg, which is a clinically
significant range. Why not do the study with more subjects straight away? Because resources are
scarce, so that on the whole it makes sense to do a pilot study first. More observations mean
higher statistical power (q.v.), which means that smaller differences become statistically
detectable.
Fishing for significance Genomics, proteomics and metabolomics all afford huge, richly
structured data sets that can be subjected to any number of significance tests. Do enough tests and
you are bound to come up with a statistically significant (better to say: statistically detectable)
result or two. Any number of statistically detectable differences can be found, in fact, if you keep
at it long enough (there are many different ways of forming subgroups of the objects in your
study, if they have several attributes). Of course it is inappropriate to report only the statistically
detectable findings. Worse, it is unethical.
There are various ways to deal correctly with multiple testing situations. A simple and
straightforward procedure is the Bonferroni correction (q.v.), which has the drawback that it is far
too conservative for the modern “omics” environment. More suitable for this environment is the
step-down procedure (q.v.). The ethical entanglements can be side-stepped by specifying in
advance which tests are going to be carried out. This leads to the difficulty that the observations
may throw up something important and unexpected. The proper way to deal with this is to report
the original hypothesis with the originally projected test in one paper, and use the new findings as
a pilot upon which you base a study specifically directed at the new finding.
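To make the multiple-testing corrections concrete, here is a minimal Python sketch. The Bonferroni correction is as named in the text. For the step-down procedure the sketch uses Holm's method, which is one well-known step-down procedure; that it is the one intended here is an assumption. The P-values are invented.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject a hypothesis only if its P-value is at most alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

def holm(p_values, alpha=0.05):
    """Holm's step-down procedure (assumed here as the step-down method)."""
    m = len(p_values)
    # Indices of the P-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        # The threshold relaxes as we step down: alpha/m, alpha/(m-1), ...
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break   # stop at the first failure; all larger P-values are retained
    return reject

# Hypothetical P-values from four tests on the same data set.
p_values = [0.001, 0.012, 0.014, 0.2]

print("Bonferroni:", bonferroni(p_values))
print("Holm:      ", holm(p_values))
```

On these invented values Holm's method rejects one more hypothesis than Bonferroni does, which illustrates why a step-down procedure is regarded as less conservative.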
Percentages There is a bizarre perception, deeply ingrained in our culture, that things are
somehow easier to understand in terms of percentages. Quite often this is simply not the case. For
instance, a natural way to express mortality and morbidity is in terms of per person, per kilo-annum. (Annum means year.) But one usually hears these things quoted as a percentage. For
example, breast cancer mortality is reported as 0.5 percent, which presumably means that one in
200 women diagnosed with breast cancer die every year. Early diagnosis reduces this rate. For
instance, in Sweden it was found that the rate went down from 0.51 % to 0.39 % during a breast
cancer screening trial. This good news was reported as a 24 percent reduction in the media, with
an all too predictable effect on public opinion. Perhaps there is just no percentage in honesty—
there certainly is little honesty in percentages. Be deeply suspicious of any science-related
percentages being bandied about in the media, and always try to select honest ways of expressing
your own findings.
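The gap between the two ways of reporting the Swedish figures can be reproduced with a short Python sketch. The function name risk_summary is hypothetical; the two rates are the ones quoted above.

```python
def risk_summary(rate_before, rate_after):
    """Return (absolute reduction, relative reduction) between two rates."""
    absolute = rate_before - rate_after
    relative = absolute / rate_before
    return absolute, relative

# Rates quoted in the text: 0.51 % before and 0.39 % during the screening trial.
absolute, relative = risk_summary(0.0051, 0.0039)

print(f"relative reduction: {relative * 100:.0f} percent")
print(f"absolute reduction: {absolute * 100:.2f} percentage points")
print(f"that is, about {absolute * 1000:.1f} per 1000")
```

The relative reduction is the headline figure of 24 percent; the absolute reduction is 0.12 percentage points, or roughly one case in a thousand, which conveys a rather different impression.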
III. GRAPHICAL PRESENTATION OF RESULTS
The message of this section can be summarized in a single sentence: whatever the MS-Excel
default would have you do, do the exact opposite.
Line graphs For time series, a line graph is usually better than a column chart. (A) Use straight
lines to connect the means at the various time points sampled. Smooth interpolating curves are
best not used when the purpose of the graph is to report the data, since you are in effect adding
pseudo-data. If you have a mathematical model that purports to capture the processes that gave
rise to the data, you can present a curve derived from this model together with the data. Usually
the curve is shown for the parameter values that result in the closest “fit” to the data set.
Sometimes the data are going to be used in a calculation, and some preprocessing (e.g.
smoothing) is used. In such cases it is permissible to exhibit the curve that represents this
preprocessing. (B) Use different line styles (e.g. solid, dashed, dotted) to distinguish multiple time
courses in the same graph from one another. Different symbols at the data points simply do not
work well. (C) Use axes of even length. For long time series, graphs that are much wider than
they are tall are appropriate, but it is not appropriate to deviate from squarish proportion in order
to exaggerate the impression you want the graph to make. (D) In graphs with linear scales, the
most objective representation is obtained if the axes cross at the point (0,0), although you may
decide to deviate from this rule to avoid too much waste space in the panel.
Column charts and bar charts A set of vertically (occasionally horizontally) arranged bars is often
used to represent data sets. Such a graph is called a ‘column chart’ (‘bar chart’ if the bars are
horizontal). These are mostly used to represent categorical data rather than time series, where
each observation corresponds to a different treatment, mutant, peptide, or whatever the case may
be. The length of a column or bar is proportional to the value you wish to depict. A default choice
is to let the column length correspond to a sample mean, and to adorn the column with T-shaped
extensions that indicate the standard deviation (or, incorrectly, the standard error). Instead, one
could show the raw data as a cloud of points along the centre line of where the column would go.
This has several advantages. Readers can assess the shape of the distribution and the spread in the
data for themselves, which is important to judge whether appropriate statistical tests have been
used. Moreover, it is easier to see if alleged statistically significant differences are due to outliers
or groups of outliers. Quite a bit of information is conveyed in the same amount of space a
column chart would occupy. Disadvantages are that the chart becomes too crowded if the data set
contains more than (say, roughly) 40 data points, and that cognoscenti can glean more from your
chart than you wish to reveal. It is not unusual for experimentalists to be coy about their raw data,
their averred stock-in-trade.
Instead of mean and SD, one can show the 5th, 25th, 50th, 75th, and 95th percentiles. A
traditional format is the box-and-whisker plot (q.v.), which has the drawback of being
tremendously space-consuming. Slightly more economical is an open column without fill colour
and horizontal bars at the percentile points.
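The five percentiles can be computed with the Python standard library, as in the following minimal sketch. The function name and the data are invented for illustration, and the "inclusive" interpolation method is an assumption: different software packages interpolate percentiles in slightly different ways.

```python
import statistics

def summary_percentiles(data):
    """Return the 5th, 25th, 50th, 75th and 95th percentiles of the data."""
    # quantiles(..., n=100) returns the 99 cut points between percentiles;
    # cut point number k (at index k - 1) is the k-th percentile.
    cuts = statistics.quantiles(data, n=100, method="inclusive")
    return {k: cuts[k - 1] for k in (5, 25, 50, 75, 95)}

# Hypothetical measurements, for illustration only.
data = [3.1, 3.4, 3.6, 3.9, 4.0, 4.2, 4.5, 4.9, 5.3, 6.8, 9.7]

for k, value in summary_percentiles(data).items():
    print(f"{k:2d}th percentile: {value:.2f}")
```

These five numbers are what the horizontal bars of the open column would mark.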
Three dimensional column charts show the data as a Manhattan cityscape. Use this option
wisely, and do not be tempted to add a phony third dimension to what is really a conventional
graph.
Pie charts Categorical data that add up to 100 % are often depicted in a pie chart. Such charts are
extremely space consuming (one could print them smaller to save space, but then they become
harder to interpret). The same information can be depicted in a column or bar with differently
coloured (or cross-hatched) subsections. This is particularly useful if a number of distributions are
to be compared.
Keys and legends Graphs are easier to interpret if key information appears in the field of the
panel (e.g. labels for the various lines, arrows that indicate when a drug was added, with the name
of the drug next to it). Incredibly, there are journals that insist that all of this information must
appear in the legend (the caption below the graph). This made sense in a time when graphs had to
be prepared by artists who would (typically) not quite understand the subject matter, which
caused graphs with keys and labels to go through many time- and money-consuming iterations.
Today, when investigators can prepare the graphs themselves on a computer, there is no good
reason to adhere to this practice.