Some general guidance notes on the presentation of data analyses


Andrew Jackson ZO4030 Data Handling 2009-2010

Some notes on the presentation of data analyses

General Remarks

Keep reading published research papers and note the layout, style and presentation they use as well as the scientific content. Generally they serve as a good template for your thesis and are the standard to which you should be aspiring. Of course, not everything that is published is accurate, correct or even good, but if you are dealing with one of the mainstream journals you can be 99% sure that it is up to standard. On this note, journals are ranked on various metrics, the most common of which is their “impact factor” ( http://en.wikipedia.org/wiki/Impact_factor ). To some extent this metric can be used to get an idea of how “good” a journal is in terms of how many people read it, how often its contents are cited by other papers, how often it publishes, etc. You have to take this metric with a pinch of salt sometimes, but for example: Nature (31.9), Conservation Biology (4.7), Behavior (1.6), Biology and Environment (0.04).

Data are plural, e.g. “These data show that…” or “The data are normally distributed”

Be careful with the content under each of your headings, which should be: Abstract, Introduction, Methods, Results, Discussion.

o It is a common mistake to mix up methods, results and discussion.

o Sometimes it is not entirely clear which section some text belongs to, but generally speaking you should be able to keep it on the straight and narrow.

o Introduction – where you lay out the question you are going to address, why it's important, briefly how you are going about it and what the main point of the study is.

o Methods – all the details of your study should be here so that someone can fully understand what you did and can assess your work with its limitations in mind (all studies have some limitations somewhere, but some are better than others). If you are going to do some detailed new maths/statistics then you should lay out the details of your new methodology here. If the statistics are more rudimentary and commonplace then leave them to the results section.

o Results – all results that you are going to talk about in your discussion need to be presented here. Generally this is where you refer to figures and tables of your data (and not usually before). Results of statistical analyses need to be presented here with enough information to satisfy the reader that they are correct and that the findings you will discuss in the next section are based on a robust interpretation of your data.

o Discussion – where you interpret your results in the context of the bigger scientific picture and tell us what your study adds to our general knowledge of the world. No reference to p-values should be made in this section; rather, you should focus on the trends in your data which you convinced us of in the results section.

When presenting the results of a statistical analysis, you usually quote:

o the p-value

o the degrees of freedom (which give information about sample size and the number of comparisons you are making, which is useful to the reader when assessing how much trust to place in your data)

o the test statistic (e.g. t, z or U, depending on the test)

o …and, if relevant, the estimate of the coefficient (b) you are interested in (e.g. the slope or fixed effect in a GLM)

o When quoting AIC values, remember that these only make sense when comparing multiple models fitted to the same data. So a single AIC on its own is meaningless. Either quote more than one AIC for comparing two models or don’t quote it at all. In a model selection type analysis the three obvious models for comparison are: the minimal adequate model, the full model and the null model (see Michael Crawley’s book: Statistics: An Introduction Using R). Bear in mind that I disagree somewhat with Crawley’s approach to finding the “best” model – I tend to go with whatever the step() function returns, whereas Crawley advocates manually removing non-significant terms based on their p-values… which in my view mixes up his modeling philosophies.

o At some point in the methods section, state the sample size you are dealing with. If your data break down into categories (say by gender or village) and the group sizes are not equal, you should also provide the sample size by group to reassure the reader that there are not substantial biases in the group compositions (you could do this in a table).
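As a rough sketch of how the quantities above might be pulled out of R, the example below uses the built-in mtcars data purely as a stand-in for your own data set. The choice of variables and the object names (dat, report, aic_table and so on) are assumptions made for illustration and are not part of the course material.

```r
# Illustrative only: the built-in mtcars data stand in for your own data.
dat <- mtcars
dat$am <- factor(dat$am, labels = c("automatic", "manual"))

# Sample size overall and by group (to state in the methods section).
n_total <- nrow(dat)
n_by_group <- table(dat$am)

# A simple test: quote the test statistic, degrees of freedom and p-value.
tt <- t.test(mpg ~ am, data = dat)
report <- sprintf("t = %.2f, df = %.1f, p = %.3g",
                  tt$statistic, tt$parameter, tt$p.value)

# Model selection: compare the null, minimal adequate and full models.
full_model <- lm(mpg ~ wt + hp + qsec, data = dat)
null_model <- lm(mpg ~ 1, data = dat)
min_model  <- step(full_model, trace = 0)  # AIC-based simplification

# AIC() accepts several models at once and returns one row per model.
aic_table <- AIC(null_model, min_model, full_model)

print(n_by_group)
print(report)
print(aic_table)
```

Quoting the three AIC values side by side lets the reader see how much the minimal adequate model improves on the null model and how little is lost relative to the full model.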

Variables are presented in italics in text, e.g. (b₁).

General remarks on Figures and Tables

Any figures and/or tables presented in the document need to be referred to in the text.

Additionally, figures and tables should be referred to in numeric order in the text… i.e. reference to Figure 1 should precede reference to Figure 2.

Figures and tables need to have full legends that accompany them explaining what they represent. The basic principle is that a figure or table should be interpretable on its own without recourse to the rest of the text. There is no need to describe trends or patterns in the legend as we can see these for ourselves in the figure/table (you describe them in the results section).

For your thesis (and indeed when submitting a paper for publication), figures and tables should not be interlaced with text and rather should sit on their own.

Figures

The legend for a figure appears under the figure.

Figures need to be large enough that all text, lines and points can be read easily. If a figure fills an entire page then so be it.

If your figure or image has multiple panels then they should be laid out on the page as you would like them to appear and they should be labeled, e.g. Figure 1a and Figure 1b, with a common legend describing them. Some people / journals can get very pedantic when it comes to panels and could argue that, strictly speaking, if the axes represent different things then they should be separate figures (I don’t hold this view).
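A minimal sketch of a two-panel figure in base R is given below. The data are simulated and the axis labels are invented for illustration; the (a)/(b) labels are placed with mtext(), which is only one of several ways to do it.

```r
# Simulated data, for illustration only.
set.seed(1)
x  <- runif(30, 0, 10)
y1 <- 2 + 0.5 * x + rnorm(30)
y2 <- 10 - 0.3 * x + rnorm(30)

# One row, two columns of panels; keep the old settings to restore later.
old_par <- par(mfrow = c(1, 2))

plot(x, y1, xlab = "Distance (km)", ylab = "Response 1 (units)")
mtext("(a)", side = 3, adj = 0, line = 0.5)  # panel label, top left

plot(x, y2, xlab = "Distance (km)", ylab = "Response 2 (units)")
mtext("(b)", side = 3, adj = 0, line = 0.5)

par(old_par)  # restore the single-panel layout
```

The common legend for both panels then goes in the text under the figure, referring to panels (a) and (b).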

Images (photographs or line drawings) need to be of high quality and any text in them (such as on maps) needs to be large and clear enough to read.

Figures should not have a title above them such as you would create in R by using – plot(x, y, main="the title goes here"). Rather you tell us what the figure is in the legend.

Figures should not generally have a plotting legend on the graph unless it is something like a contour plot which needs a bar telling us what the colours mean, or if it’s a very complicated graph that really necessitates a legend. Rather, you should provide details of the plotting symbols and lines in the legend text. E.g. the dotted line represents the fitted linear model… the open and solid circles represent males and females respectively…
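The sketch below shows one way to draw the kind of figure described here, with open and solid circles for two groups and a dotted fitted line, and no on-plot legend. The data are simulated and all variable names are made up for the example.

```r
# Simulated data, for illustration only.
set.seed(42)
n    <- 40
sex  <- factor(rep(c("male", "female"), each = n / 2))
mass <- runif(n, 10, 50)
len  <- 5 + 0.8 * mass + rnorm(n, sd = 3)

fit <- lm(len ~ mass)

# Open circles (pch = 1) for males, solid circles (pch = 16) for females.
pch_by_sex <- ifelse(sex == "male", 1, 16)

plot(mass, len, pch = pch_by_sex,
     xlab = "Body mass (g)", ylab = "Body length (mm)")
abline(fit, lty = 3)  # dotted line: the fitted linear model
```

The symbols and line type are then explained in the legend text under the figure rather than in a legend box on the plot.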

Ideally you would use black and white to render your figure (certainly for publication you should try to achieve this, as colour can easily cost up to €1,000 per page). Of course you are free to use colour as you wish for your thesis, and often it looks much better and is easier to interpret. So it is up to you… you won’t be marked differently for B&W or colour as long as the presentation is clear (colour can muddle things, just as B&W can make it hard to present lots of data types).

Some people hate it when you show a figure that has a box on all 4 sides of the graphing window (I don’t have this hatred). You can easily change the way the box is drawn around the plotting window in R by using the command plot(x, y, bty="L")… see ?par under bty for more details.
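For instance, a plot with no title, an L-shaped box and labeled axes might be produced as follows. The data are simulated and the axis labels are placeholders.

```r
# Simulated data, for illustration only.
set.seed(7)
x <- seq(0, 20, length.out = 25)
y <- 3 + 1.5 * x + rnorm(25, sd = 4)

# No main= argument, so no title is drawn; bty = "L" draws only the
# left and bottom sides of the box around the plotting region.
plot(x, y, bty = "L", xlab = "Time (days)", ylab = "Height (cm)")
```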

If you need to resize a figure, avoid changing the aspect ratio of the image (i.e. don’t change the horizontal axis more or less than the vertical) as it can badly squash text. This depends on the file type, though: you may find that Windows metafiles (*.emf), which you can get from R, do not suffer from this resizing behavior. Holding down the “shift” key while resizing anything in Windows will lock the aspect ratio so that the shape is preserved.
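One way to sidestep resizing altogether, offered here as a suggestion rather than something taken from these notes, is to write the figure from R at its final dimensions. The sketch below uses the pdf() device with a temporary file name; the 6 x 4 inch size is an arbitrary assumption.

```r
# Write the figure at its final size so it need not be stretched later.
out_file <- tempfile(fileext = ".pdf")  # replace with your own file name

pdf(out_file, width = 6, height = 4)    # dimensions in inches
plot(1:10, (1:10)^2, bty = "L",
     xlab = "x (units)", ylab = "y (units)")
dev.off()                               # close the device to finish the file
```

Because PDF is a vector format, text in the resulting file stays sharp if the figure is later scaled with its aspect ratio locked.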

Figures need axis labels that tell us what is plotted, with units where relevant.

Boxplots can suffer from their own problems:

o In a boxplot the x-axis is often (and generally) categorical and so is not automatically on a linear scale, i.e. the distance between each box will be uniform. Therefore, if you do a boxplot that has, say, distance on the x-axis, you need to make sure either that the boxes are forced to be plotted against a measure of distance or that you do not make statements about trends that are dependent on the x-scale.

o If your x-axis categories are ordinal (e.g. high, medium, low) then be sure to plot them in either increasing or decreasing order. It's hard to force R to do this as it plots the categories in alphabetical order by default. One way around this is to label high, medium, low in your original data file as “a.low”, “b.medium”, “c.high”, which will then result in the correct order in plotting. You will then need to relabel the x-axis categories manually to “low”, “med”, “high” when calling boxplot().
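An alternative to renaming the categories in the data file, sketched below with made-up data, is to set the order of the factor levels explicitly in R before plotting. The data frame dat and its columns are hypothetical.

```r
# Made-up data, for illustration only.
set.seed(3)
dat <- data.frame(
  level    = rep(c("high", "medium", "low"), times = 20),
  response = rnorm(60, mean = 10, sd = 2)
)

# By default the factor levels are sorted alphabetically.
default_order <- levels(factor(dat$level))
print(default_order)

# Set the order explicitly so the boxes run from low to high.
dat$level <- factor(dat$level, levels = c("low", "medium", "high"))

boxplot(response ~ level, data = dat,
        xlab = "Treatment level", ylab = "Response (units)")
```

With the levels set this way the axis categories already carry the right names, so no manual relabeling is needed.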


Tables – example table below (Table 1)

The legend for a table appears directly above the table describing the contents of the table.

Tables should not have vertical lines delineating columns.

Tables should only have horizontal lines delineating the column headers from the table contents and a horizontal line to mark the end of the table.

You should not use shading to highlight rows or columns unless it is essential to the data or the interpretation in some way.

You can use bold typeface for the column headers and/or the row headers but there is no requirement.

Generally at least the first column which may contain row names would be left justified. The rest of the table contents might be left or centre justified depending on your choice of style.

Only use tables when absolutely necessary. If the amount of information can be easily presented in a sentence or two then do so. Tables may commonly be used to present the results of a large GLM analysis where there are multiple variables that might be associated with an outcome… e.g. the malaria analysis we performed in the tutorial.

Table 1. Example table showing column-headers in bold typeface. Horizontal lines only are used to delineate the column-headers from the main table content, and to mark the start and end of the table. The first column is left-justified while all other columns are centre-justified.

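--------------------------------------------------------
Column 1 header    Column 2 header    Column 3 header
--------------------------------------------------------
x                         X                  x
y                         X                  x
d                         f                  A
--------------------------------------------------------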
