Statistical Training in the Workplace
Ian Westbrooke
Department of Conservation, Christchurch, New Zealand, iwestbrooke@doc.govt.nz
Maheswaran Rohan
Department of Conservation, Hamilton, New Zealand, mrohan@doc.govt.nz
1. Introduction
The workplace provides a distinctive context for statistical education, with
focussed subject areas, ongoing relationships between participants and
trainers, and typically an applied emphasis. This contrasts with other
statistical training settings, such as through educational institutions, short
courses and online courses, where participants typically come from a wide
range of backgrounds, timeframes are often limited, and applications are
likely to be more wide-ranging. In this chapter, we focus on our experience
in delivering statistical training to staff at our workplace, where we have
developed courses for people who are not statistics graduates and who
work in an operational rather than a research organisation. This chapter draws on the papers Rohan and Westbrooke (2012) and Westbrooke (2010), with additional material. After reviewing the literature in section 2, we describe our workplace and the statistical needs
of its staff in section 3, and in section 4 we discuss the important but often
ignored area of data handling. Our study design course is outlined in
section 5, while section 6 describes our modelling courses. Section 7
explains our choice of R and R Commander software. In section 8, we
review some differences between training in the workplace and in the
education sector. Finally, we relate some experiences from outside our
workplace in section 9 and make concluding comments in section 10.
2. Literature on Statistical Training in the Workplace
The statistics education literature focuses, understandably, on the formal education
sector, particularly at school and tertiary levels. The core of publications relating
to the workplace comes from the proceedings of the 4-yearly International
Conferences on Teaching Statistics (ICOTS). ICOTS conferences have included
workplaces in their ambit, and since at least 1998 have included the workplace as one of about nine topic areas in their programmes. Relevant ICOTS papers deal primarily
with how the education sector can, should or does relate to the workplace or
industry. Reports from within workplaces especially about training nonstatisticians are very limited. Hamilton (2010) provides a perspective on in-house
training for non-statisticians in a national statistical office, while Forbes et al.
(2010) look at training non-statisticians in the state sector from a combined
workplace and university perspective. Most other relevant literature comes from a
more general perspective. Barnett (1990) asks how different organisations can
meet statistical needs. He suggests either employing professional statisticians as
employees or consultants; or developing skills in staff not trained as statisticians.
He then explains how consultancies or tertiary organisations are providing either
open or in-house statistics courses to increase skills of non-statisticians in the
workplace. In a keynote address to ICOTS 5, Scheaffer (1998) addressed
“Bridging the gaps among school, college and the workplace”, looking to “expand
the use of statistics in industry while producing a statistics curriculum in schools
and colleges that can be defended and sustained”. A theme in a number of the
ICOTS papers is the importance of context for workplace training, and the
particular importance of including modern teaching approaches such as
emphasising real data, statistical concepts rather than mathematical derivations,
and using projects and hands-on computing (e.g. Stephenson (2002), Francis and
Lipson (2011)).
3. Our Workplace Context: The Department of Conservation (DOC)
New Zealand’s Department of Conservation (DOC) is the central government
organisation charged with promoting and implementing the conservation of the
country’s natural and historic heritage. Thus, DOC is responsible for managing
approximately one-third of New Zealand’s land area, along with a number of
marine reserves; protecting and managing much of the country’s indigenous
biodiversity, including many unique ecosystems and species; promoting
recreation; and facilitating tourism. The 1800 staff members include several
hundred science graduates undertaking science and technical work at national,
regional and local levels.
DOC needs evidence-based information to carry out effective management.
Typical questions posed by managers include:
• What are the trends in abundance and health for native species and ecosystems, and how can management make a difference?
• How are visitors using parks and conservation lands and facilities, and what issues need to be managed?
To answer these questions adequately, managers need to move beyond the broad,
qualitative assessments that have often underpinned decision-making to an
evidence-based approach, which demands quantitative assessments that are based
on data. As in many environmental and social arenas, there is plenty of variability
involved in conservation, so statistics become essential.
Science and technical staff, and others involved in research and monitoring, are
generally graduates in various fields, whose qualifications range from first degrees
to PhDs. Increasingly, these staff members are expected to perform duties that
require a competence in study design and statistical analysis. In addition, a very
wide range of staff require basic skills in effective data entry, management and
exploration, including effective graphing. A core group of staff require training in statistical modelling, starting from the linear model and moving up through its
extensions, including mixed models for repeated measures. In addition, smaller
numbers have specialist subject requirements, such as estimating animal
abundance or survival analysis, including mark-recapture models. Since only two
DOC staff members are appointed primarily as statisticians, the provision of
statistical training for other staff is of vital importance.
We initially assessed statistical training needs through analysing the requests made
to us, and by talking to staff and managers. Key areas we found for development
were: data handling and exploration; modelling and study design.
4. Effective Data Entry, Management and Exploration
Practical data handling and exploration are essential pre-requisites for the
successful application of statistics in the workplace, but are often insufficiently
covered in training. In particular, data entry and preparation is an important but
often neglected area of statistical practice, and is essential for the key tasks of data
exploration and analysis. In fact, there is a key phase during which data
preparation and exploration need to interact to ensure that the data are in a suitable
state for analysis. Data errors, e.g. as a result of incorrect data entry, are very
common and can lead to serious biases and incorrect inferences if left uncorrected.
We have found that Microsoft Excel is a good general tool for data handling.
Legitimising the use of Excel for data entry, storage and initial exploration has
helped to facilitate moving data from pieces of paper into computers for
analysis (see www.reading.ac.uk/SSC/publications/guides/topsde.html for
additional information).
At DOC, more than 300 staff members have taken part in a 1-day course named
Data Handling in Excel—Entering, Managing and Exploring Data. This course
not only introduces tools that assist with data entry, such as freeze panes,
protecting data and data validation, but also covers the exploration of data using
tables and, if time permits, the production of graphs. One of the key emphases is
on the importance of standard data formatting: each observation has its own row;
each variable is entered into a column with a meaningful name; and only raw data
are entered into data sheets, with no blank rows—analyses and summaries go
elsewhere. Fortunately, this layout works not only when using Excel’s excellent
cross-tabulation tool, the Pivot Table, but also when data are transferred to
dedicated statistical packages. When staff see the advantages of using this layout,
particularly through the quick and easy creation of summary tables using Pivot
Tables, this approach is readily adopted.
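The standard layout can also be sketched in R, the package participants move to after Excel; the small data frame below is hypothetical, and xtabs() plays the role that a Pivot Table plays in Excel:

```r
# The standard layout: one row per observation, one named column per
# variable, raw data only. (Hypothetical vegetation-plot data.)
plots <- data.frame(
  plot_id = 1:6,
  region  = c("North", "North", "North", "South", "South", "South"),
  habitat = c("forest", "scrub", "forest", "scrub", "forest", "scrub"),
  count   = c(12, 5, 9, 7, 14, 3)
)

# The quick cross-tabulation a Pivot Table gives in Excel:
xtabs(count ~ region + habitat, data = plots)
```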
4.1 Creating Effective Graphs
To facilitate data exploration and improve the quality of presentation, we
developed another 1-day course on graphs, which has an accompanying manual
(Kelly, Jasperse & Westbrooke, 2005). This course draws heavily on Tufte (2001)
and Cleveland (1994). We developed exercises for this course, including one
(Figure 1) that allows participants to learn for themselves about Cleveland’s
recommended order of visual perception (Cleveland & McGill, 1985). Other
exercises include demonstrations of how easily default Excel graphs can be
improved, and an example showing the inadequacies of pie graphs based on an
excellent book by Robbins (2005). We plan to add a module on graphing in R,
using ggplot2.
[Figure 1 comprises seven small graphs, each encoding the same set of numbers with a different visual attribute: position on a common scale; position on identical non-aligned scales; length; angle; slope; area; and grey scale. The instruction reads: "List the seven graphs by how easy it is to estimate the SIZE of the number represented: 1 is easiest, 7 hardest."]
Figure 1: An exercise from our graphs course. Participants are asked to order the
seven graphs according to how easily the size of the number represented can be
estimated. This exercise allows participants to learn for themselves about the
options for presenting quantitative data in graphs and leads into considering the
accuracy of visual perception of different approaches.
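As a taste of the planned ggplot2 module, a minimal sketch might look like the following; the monitoring data here are hypothetical, and ggplot2 must be installed:

```r
library(ggplot2)  # assumes the ggplot2 package is installed

# Hypothetical monitoring data for illustration
dat <- data.frame(
  year  = rep(2008:2012, times = 2),
  site  = rep(c("A", "B"), each = 5),
  count = c(10, 12, 9, 15, 14, 6, 7, 9, 8, 11)
)

# Dot-and-line plot: position on a common scale, the most accurately
# perceived encoding in Cleveland's ordering
ggplot(dat, aes(x = year, y = count, colour = site)) +
  geom_point() +
  geom_line()
```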
5. Study Design Course
Our initial decision to emphasise the handling and analysis of data in our courses
was supported by the findings from a study of biodiversity monitoring projects we
carried out in 2008, which showed that data analysis was the area that required the
most strengthening (see Westbrooke, 2010: Figure 1). However, it also revealed
that attention needed to be given to study design.
Tertiary statistical courses for non-statisticians generally equip graduates almost
exclusively for carrying out experiments and statistical significance testing.
However, DOC staff members are mostly involved in observational studies rather
than experimental research, and management decisions are generally much better
informed by an emphasis on effect sizes rather than hypothesis testing. Therefore,
we developed a 3-day course in practical study design for applied conservation
ecology, with a focus on observational studies.
A major challenge has been ensuring that staff understand the basics of
randomisation and replication, and why they matter. Another important aspect has
been clarifying the differences between experiments and observational studies,
particularly in terms of strength of inference, and providing guidance on when and
how to implement different types of study. We focus on two main areas:
• The four Ws—Why, What, Where and When—with particular emphasis on why, which is the key question for setting clear and realistic objectives. The other three Ws refer to what measure is to be used, and the effective use of replication in time and in space to achieve the objectives.
• The three Rs—Randomisation, Replication and stRatification.
Participants each bring along an example of a study they are currently involved in
designing. They introduce their study on the first morning and we then use these
as examples throughout, coming back at the end with the group to evaluate what
the topics covered during the course mean for the development of the design for
these studies. The use of participants’ own examples, and other relevant examples
from the workplace, makes the course more effective in gaining involvement and
ensuring that they can relate the lessons to their work both in the course and when
they return to their job.
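The three Rs can be sketched in a few lines of R; the sampling frame below is hypothetical, and the code simply stratifies candidate plots by habitat and randomises which replicates are measured within each stratum:

```r
# Stratify the sampling frame by habitat, then randomise which
# replicate plots are measured within each stratum.
# (Hypothetical frame of 100 candidate plot locations.)
set.seed(42)                       # reproducible Randomisation
frame <- data.frame(
  plot_id = 1:100,
  habitat = rep(c("forest", "wetland"), each = 50)  # stRatification
)

n_per_stratum <- 5                 # Replication within each stratum
chosen <- do.call(rbind, lapply(
  split(frame, frame$habitat),
  function(stratum) stratum[sample(nrow(stratum), n_per_stratum), ]
))
table(chosen$habitat)              # five plots drawn from each stratum
```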
6. Modelling Courses
We were often asked early on, ‘How do I fit these data into an ANOVA?’, because that was often the only statistical model to which graduates in other subjects had been exposed. Another theme was ‘What test do I apply to these data?’, as statistics
was equated with hypothesis testing. However, in our courses, we prefer to
emphasise model building and the estimation of effect sizes, which are especially
important in a management organisation, where there is likely to be much more
interest in estimating an effect and a confidence interval than on whether or not it
is different from zero.
In our 3-day introductory statistical modelling course, we revise the linear model,
with sessions on ANOVA and multiple linear regression, including ANCOVA.
We then extend this to the generalised linear model (glm), with Poisson and
binomial errors for count and binary response variables, respectively. When time
has allowed, we have added tree-based models and/or generalised additive models.
We teach participants to follow five steps when modelling:
Step 1: Investigate the data
Identify the response and explanatory variables, and their data
types (continuous, nominal, ordinal), and construct graphs and
tables to explore the distribution of the variables and the
relationships between variables.
In class, this step provides the opening for teaching the use of
statistical software for data exploration and visualisation.
Step 2: Fit the model
Choose an appropriate error structure based on the design and
information from Step 1, and use software to fit the model.
This step provides the opportunity for us to explain error structure
and the assumptions underlying the modelling.
Step 3: Analyse the model
Examine the model output from the software, and apply model
selection criteria such as likelihood ratio approaches or the Akaike
Information Criterion (AIC) for model comparison and variable
selection.
This step allows us to explain how to interpret model output, and
to discuss issues around model and variable selection, and multicollinearity.
Step 4: Assess the model
Examine the assumptions defined in Step 2 graphically and
numerically. If Step 4 fails, go to steps 1 and 2 and try alternative
models until a satisfactory model is obtained.
Step 5: Interpret the model
Interpret the results in the context of the overall problem, with
estimates of relevant effects, including confidence intervals.
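The five steps above can be sketched in R for a hypothetical count dataset; the variable names are illustrative, and confint.default() gives Wald intervals (profile-likelihood intervals are another option):

```r
# The five modelling steps, sketched for hypothetical count data
dat <- data.frame(
  count   = c(2, 5, 1, 8, 3, 9, 0, 4, 6, 7),
  habitat = rep(c("forest", "scrub"), times = 5)
)

# Step 1: investigate the data
summary(dat)
boxplot(count ~ habitat, data = dat)

# Step 2: fit the model -- Poisson errors for a count response
m1 <- glm(count ~ habitat, family = poisson, data = dat)

# Step 3: analyse the model -- compare against the null model by AIC
m0 <- glm(count ~ 1, family = poisson, data = dat)
AIC(m0, m1)

# Step 4: assess the model -- graphical residual checks
plot(m1)

# Step 5: interpret the model -- effect size with a (Wald) confidence
# interval, back-transformed to the rate-ratio scale
exp(confint.default(m1))
```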
Many DOC staff members have limited mathematical skills, and participants
struggle to write down expressions to predict the values from statistical model
outputs, or to back-transform onto the original scale. Therefore, we avoid teaching
more abstract statistical concepts, aiming to explain the technical aspects that are
needed using outputs from the computation of statistical models. Attendees do not
learn how to compute parameter estimates and we do not expose them to more
than very basic equations; for example, normal equations of the general form $\hat{\beta} = (X^{T}WX)^{-1}X^{T}Wy$ are too complex for almost all of those who attend. This means that:
(i) Participants are not exposed to the design matrix X. This makes it difficult to provide a full explanation of the need for a reference category for factor variables. Instead, we provide an informal explanation of the need for a reference category.
(ii) We avoid talking about W and iteratively reweighted least squares (IRLS). When the glm output shows the number of iterations, we explain that the computation is repeated a number of times to converge on the estimated value for the parameter.
(iii) We only touch in passing on the relationship between maximum likelihood estimation and least squares estimation.
(iv) The concept of starting values for parameter computation is ignored.
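For the trainer (not shown in class), model.matrix() offers a quick informal look at why a reference category is needed; the three-level habitat factor below is hypothetical:

```r
# A peek at why a factor needs a reference category.
# (Hypothetical three-level habitat factor.)
dat <- data.frame(habitat = factor(c("forest", "scrub", "wetland")))

# R drops the first level ("forest"); the remaining levels get
# indicator columns, so their estimates are differences from the
# forest baseline -- hence the need for a reference category.
model.matrix(~ habitat, data = dat)
```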
Our other main modelling course involves the analysis of repeated measures, with
either continuous or discrete responses. DOC carries out many large and small
monitoring projects throughout the country, which typically involve collecting
information from the same sampling unit (subjects) over several years. We
provide an introduction to analysing such data using mixed models.
First, participants learn from a real example that the standard classical approach is
not suitable for analysing this type of data, illustrating the violations of the
classical assumptions using residual plots for each subject. We then explain the
need to modify the model by allowing for random variation between the subjects;
for example, we introduce the random intercept model using the graphical
approach shown in Figure 2.
We then explain that variation can be divided into two types:
(i) Stochastic variation within a subject, similar to our usual classical model errors.
(ii) Variation between subjects around the overall mean, known as random effects.
Thus, we end up with the response as a function of fixed effects and random
effects, which we present as:
Response ~ Fixed effects + Random effects + Error
and illustrate using Figure 2.
[Figure 2 sketches responses against time for several subjects: an overall mean intercept and slope (the fixed effects); random variation between subjects, forming a distribution of intercepts around the overall mean (the random effects, assumed to be normally distributed); and variation within each subject.]
Figure 2: Random intercept model
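The random intercept model of Figure 2 can be written in a few lines of R. This sketch uses the nlme package, which ships with R (lme4 is another common choice; we do not prescribe a particular package here), and simulated data:

```r
library(nlme)  # ships with R; lme4 is another common choice

# Simulated repeated-measures data: five subjects (tags), each
# measured in four consecutive years
set.seed(1)
dat <- data.frame(
  tag  = factor(rep(1:5, each = 4)),
  year = rep(0:3, times = 5)
)
dat$fc <- 10 + 2 * dat$year +         # fixed intercept and slope
  rep(rnorm(5, sd = 3), each = 4) +   # random intercept per subject
  rnorm(20)                           # within-subject error

# Response ~ Fixed effects + Random effects + Error
m <- lme(fc ~ year, random = ~ 1 | tag, data = dat)
summary(m)
```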
Although serial correlation may feature in some of our datasets, time constraints
have so far prevented us from addressing these during the courses. Therefore, we
advise participants that input from a statistician is needed to analyse these types of
data.
As in the introductory course, a number of technical statistical issues are often
discussed in simple language during the course. This approach is well received by
the participants, who are happy to accept our word on the more technical aspects
and are generally more interested in understanding the application rather than the
theory of statistics. Applied topics such as model selection, testing assumptions
and interpreting the results are of particular interest to those attending these
courses.
7. The Benefits of Using R and R Commander Software
We use R (R Development Core Team, 2012) as the software for our statistical
training. With its free access, enormous flexibility and coverage of almost all statistical techniques, the use of R has grown rapidly worldwide. R
comes with base libraries and recommended packages, as well as more than 2500
contributed packages.
R was chosen as the statistical software for use in DOC because of its power and
its free availability. This reduces the cost to the organisation, and ensures a ready
access to the software and portability of skills learnt. Initially, we taught our
introductory statistical modelling course using standard R, with participants typing
and submitting R code. However, in our experience, it is a challenge to teach both
statistical methods and the R language for our biology-oriented group of
participants—they often ended up with syntax errors, even though we provided R
code on the screen and explained how to write the code, and users found it hard to
understand and correct these errors. Aside from syntax errors, other difficulties in
using standard R included:
(i) A number of R functions and their options need to be understood. For example, to create good graphs in base R, users need to learn available graphical options such as lwd, lty, cex, pch, type, legend, etc.
(ii) Users find it difficult to understand where to use the appropriate brackets, such as normal brackets (), square brackets [] and curly brackets {}.
(iii) Users need to learn about the availability and location of each function. For example, in order to avoid mathematical details in the binomial model, we use the ilogit function in the faraway library to compute predictions of the binomial parameter p in a logistic regression for various values of the covariates. When we present such a function, participants ask how they can learn about the availability of such things in R. We explain it is not always easy, and that we learn about them by browsing help files and books, searching the internet, talking to colleagues, and asking questions on appropriate forums or mailing lists.
(iv) R error messages are often far from friendly to casual or first-time users. For example, what is the meaning of the error message ‘object ilogit not found’? We would suggest typing ??ilogit in the R Console as the first step to solving this problem, and following it up with the options mentioned in (iii).
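For readers without the faraway package, base R's plogis() computes the same inverse-logit transformation as ilogit:

```r
# ilogit (inverse logit) without the faraway package: base R's
# plogis() is the same transformation, p = exp(eta) / (1 + exp(eta))
eta <- c(-2, 0, 2)   # values on the linear-predictor (logit) scale
p   <- plogis(eta)   # back-transformed to the probability scale
round(p, 3)          # 0.119 0.500 0.881
```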
Thus, participants can feel daunted or overwhelmed by the computing aspects of R
that are needed to complete their data analysis. Instead, the majority of
participants prefer to start by using a menu-based approach, as this is an
environment with which they are more familiar and reduces their learning load.
We found three menu-based packages in R: R Commander, Deducer and R Excel.
After some comparative evaluation, we found that R Commander was well-suited
to DOC’s needs.
7.1 R Commander
R Commander (Fox, 2005), which is available in the R package Rcmdr, provides a
simple point and click interface to R, including linear and generalised linear
models, and some graphing capacity. This free, menu-based statistical package
within R creates an easy way to learn statistical modelling with R.
R Commander comes with a menu at the top plus three windows (see Figure 3):
(i) Script window — R script is automatically written here when a menu is clicked. The user can also create or modify code here.
(ii) Output window — This operates much like the R console, echoing script as it is submitted and showing outputs.
(iii) Messages window — Information is provided about the active dataset and error messages.
Figure 3: R Commander showing the three windows
We converted most of our statistical modelling course material so that participants
could use R Commander instead of typing R code. We found that this made the
course more relaxed for both the trainers and participants, allowing more time to
learn about statistical methods rather than coding aspects. For example, instead of
writing R code such as
xyplot(fc ~ year.measured | tag, data = FBI, auto.key = TRUE),
which participants often needed help with, now we just ask them to click
Graphs >> XY conditioning plot
and complete the dialogue. This graph allows visualisation of the response for a
given condition and often makes it easier to recognise data entry errors.
Advantages of using R Commander include:
(i) New and infrequent users can start using R without confusion or panic.
(ii) R Commander provides standard R code for each operation, which can assist with learning R programming commands and allows us to modify the script when needed. We sometimes give instructions on how to modify the code; for example, to add a regression line to a plot created by xyplot() using the menus, we ask participants to add , type = c("p", "r") anywhere within the brackets of the xyplot code, highlight the whole of this piece of code and click the submit button. Code from R Commander can be saved and later run in R directly, with very minimal modification.
(iii) Participants start to learn about R packages, which are automatically loaded in R Commander. For example, the lattice package is used without their knowledge when XY conditioning plot is called.
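Putting the pieces together, the modified call looks like this in R code; the FBI data frame here is a hypothetical stand-in for the course dataset:

```r
library(lattice)  # loaded automatically by R Commander

# Hypothetical repeated-measures data standing in for the course dataset
FBI <- data.frame(
  tag           = factor(rep(1:3, each = 4)),
  year.measured = rep(2009:2012, times = 3),
  fc            = c(5, 6, 8, 9, 4, 5, 5, 7, 7, 8, 10, 11)
)

# The code R Commander writes for Graphs >> XY conditioning plot,
# with type = c("p", "r") added to draw points plus a regression line
xyplot(fc ~ year.measured | tag, data = FBI, auto.key = TRUE,
       type = c("p", "r"))
```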
Issues with R Commander include:
(i) Modelling capability in R Commander focuses on the linear and generalised linear model. It does not provide access to more complex statistical modelling and graphing techniques, such as generalised additive models, regression trees and mixed models. Thus, participants who require functionality that is not available in R Commander need to learn to use R code directly.
(ii) Some stability issues were occasionally encountered during our teaching, but these were largely overcome by ensuring that we used the SDI (Single Document Interface) rather than the default MDI (Multiple Document Interface) interface to R under Windows, and by using an up-to-date version of Rcmdr.
Four main R Commander menus—Data, Statistics, Graphs and Models—are used
in our statistical modelling course. A number of data manipulation and statistical
operations can be carried out under these menus. All possible operations can be
viewed in the manual or by clicking appropriate menus. Some examples are:
(i) Data menu: Data can be imported to R Commander from various formats, such as text, csv files, SPSS, Minitab, STATA, etc. when Import data is clicked. The Manage variables in active data set submenu allows data manipulation such as transformations, converting numerical variables to factors and recoding variables.
(ii) Statistics menu: Most statistical analyses that are required at undergraduate level can be carried out using this menu (Figure 4). ANOVAs and t-tests are available under the Means option, while the Fit models options allow for the linear model, generalised linear models, the multinomial logistic model and the ordinal regression model.
(iii) Graphs menu: Tools for graphs in R Commander are excellent, but not totally comprehensive. A number of graphs, ranging from histograms to 3D graphs, including conditional plots, can be generated. Figure 5 shows the range of graphs available.
Figure 4: Statistics menu in R Commander
Figure 5: Graphs in R Commander
(iv) Models menu:
This menu (Figure 6) provides diagnostics for a model, both
graphically and numerically, including making available tests for
multicollinearity and autocorrelation.
Figure 6: Assessing models in R Commander
R Commander provides a limited range of statistical tools, but has plug-in
packages that can be manually added using the Tools menu. Plug-in packages
include one for survival analysis and one that provides an introduction to the
sophisticated graphics package ggplot2.
We carried out an informal survey amongst a group of our course participants
about R Commander. More than 70% of the 21 participants surveyed agreed that
R Commander is useful for their own work, while none disagreed. More than 70%
also agreed that they did not need to memorise the R functions as much, while
10% disagreed.
Experience at DOC has shown that R Commander significantly eases the steep
learning curve for R. In our statistical modelling course, we have found that it
allows both participants and trainers to concentrate more on the statistical content.
A number of participants have also found this approach to be a useful bridge to
writing code in R, and are now programming in R independently. This ability to
use code in R provides a good basis for those progressing to our repeated
measures course, which involves mixed modelling, where we put R Commander
aside and use R directly. Some staff members, especially those who only use
statistical tools occasionally, have found that the R Commander environment is
adequate and convenient for their needs, and feel no need to progress to writing R
code.
While software packages such as Minitab and SPSS might have some advantages over R Commander for some teaching purposes, we have found that R Commander works very well for our first modelling course, and functions very effectively as a bridge to our main statistical software, R, as the author (Fox, 2005) intended. We
chose R over our previous software SPSS because it meets our needs in a
statistical package best, and for its free availability. R can be simultaneously used
by a large number of people in the workplace or anywhere else without any
licence issues.
8. Differences Between Training in the Workplace and in the
Education Sector
While the courses we deliver to DOC staff have similar content to those taught in
educational institutions, the workplace context has led to some distinct features.
First, we emphasise practical applications and examples using real data, with a
basic outline of the theoretical background; formulae and mathematical notation
are kept to a minimum, with no derivations or proofs. Second, we teach intensive
block courses (typically 1 or 3 days long) rather than multiple sessions over a
longer period such as a quarter or semester; we have found that it is much easier
for staff members who are faced with many competing priorities to commit to
attending short courses, particularly since participants are dispersed across New
Zealand, as conservation management is often carried out in remote areas. Third,
we work with small classes, up to a maximum of about 12, and have a high trainer
to participant ratio; one trainer can cope with up to five or six participants, so we
usually aim for two trainers per course. Finally, we do not carry out formal
assessment of participants, as it would take up precious classroom time, and there
is less need for formal qualifications in the workplace context; instead, we ask the
participants to assess the course and its applicability to their work.
For the statistical modelling course, respondents assessed six statements on a 5-point scale (from "strongly disagree" to "strongly agree"). The statements covered
whether the overall content was relevant to my work; whether the explanations
and practical computing increased understanding of statistical models and R; and
whether the 3-day programme met overall expectations. For 30 respondents from
2009, 2011 and 2012, 177 (98%) of 180 responses were evenly split between
“agree” or “strongly agree”. An open question that asked what worked well
revealed two themes, with 11 mentioning practical exercises or examples, and 1 of
these and 6 others mentioning the small class or the availability of 2 tutors. In
response to what could be improved, there were no obvious common themes
except that 11 gave no response (as against 5 for worked well) and 5 stated little,
not much or nothing.
9. Statistical Training Beyond DOC
Each of the authors has had a very good response internationally when presenting
workshops or seminars involving R Commander. Recently, one author (MR)
carried out a 1-day workshop prior to an international conference in Sri Lanka, which used R Commander and briefly covered material similar to our 3-day statistical modelling course. This was well received by the 26 participants
from seven countries. One author (MR) was also asked on the spot to deliver an
informal seminar for fisheries scientists at the Secretariat of the Pacific
Community (SPC), New Caledonia, because there was a thirst for learning more
about R and R Commander. When shown the graph in Figure 7, participants could
not believe it could be so easy to create such a graph either in R Commander or
using a package such as gplots; previously, they had taken hours to make a similar
graph by writing R code.
Figure 7: Tuna count with standard errors
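A graph like Figure 7 can also be built in a few lines of base R, as a sketch of what gplots::plotmeans() does in a single call; the tuna counts below are simulated for illustration:

```r
# Simulated tuna counts by species (illustration only)
set.seed(7)
counts <- data.frame(
  species = rep(c("skipjack", "yellowfin", "bigeye"), each = 10),
  n       = rpois(30, lambda = rep(c(40, 25, 10), each = 10))
)

means <- tapply(counts$n, counts$species, mean)
ses   <- tapply(counts$n, counts$species,
                function(x) sd(x) / sqrt(length(x)))

# Bar chart of means with +/- 1 standard-error bars
x <- barplot(means, ylim = c(0, max(means + 2 * ses)),
             ylab = "Mean count")
arrows(x, means - ses, x, means + ses,
       angle = 90, code = 3, length = 0.05)
```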
Similarly, there was such wide interest in accessing R using R Commander that one author (IW) added an unscheduled seminar to his presentations at the 2011 International Conference on Health Statistics in the Pacific in Suva, Fiji. He also
found that R Commander worked very well in a 2-day workshop on statistical
modelling at the University of Queensland, which was aimed at non-statisticians.
We have noticed that R Commander is generally becoming popular in the Pacific
region and beyond, with workshops using R Commander appearing more
commonly.
10. Conclusions
Our experiences from training DOC staff members and presenting seminars more
widely have shown us that:
• Training that allows observational data to be distinguished from experimental data and which provides modelling skills that are applicable to different types of data is critical. There needs to be an emphasis on the estimation of effect sizes rather than hypothesis testing.
• Effective data management and exploration (especially graphing) skills are needed, to provide the basis for data analysis.
• R Commander works very well for introducing statistical modelling, especially for graduates in non-statistical disciplines. It allows trainees and workshop participants to concentrate on the concepts and application of models to the data, rather than the mechanics of the computations involved.
• The workplace context means that courses work best as intensive block courses, rather than as a series of shorter sessions, with a practical rather than theoretical emphasis. The use of real examples and datasets that attendees readily understand is also critical, with hands-on computing an integral part of all sessions. The evaluation of how well courses meet workplace objectives is more important than evaluating individuals.
The key to a statistician making a difference in a large workplace is a strong
training and advocacy role. One or two statisticians can have a real impact; in our
case, by helping to protect New Zealand’s unique biodiversity and protected areas.
We receive great support from the wider statistical community through
consultation, the receipt and provision of specialist training, and the availability of
resources such as R and more specialist software. To make academic statistical
training of biological, ecological and social science students more applicable to
the workplace, our experience shows that there is a need for a stronger statistical
modelling approach, with less emphasis on hypothesis testing.
Acknowledgements
We would like to thank the many statisticians and others who have helped with the
development of training at DOC, especially Neil Cox, Jennifer Brown, Richard
Duncan and Tim Robinson who have played major roles in developing some of
the courses. We wish to acknowledge colleagues and the chapter referees who
assisted us by providing feedback and comments on this chapter and the ICOTS
and OZCOTS papers that preceded it.
References
Barnett, V. (1990), Statistical trends in industry and in the social sector.
http://iase-web.org/documents/papers/icots3/BOOK1/C8-3.pdf
Cleveland, W.S. (1994), The Elements of Graphing Data. Summit, NJ: Hobart Press.
Cleveland, W.S.; McGill, R. (1985), Graphical Perception and Graphical Methods for Analyzing
Scientific Data. Science 229: 828–833.
Forbes, S.; Bucknall, P.; Pihama, N. (2011), Helping make government policy analysts
statistically literate. In C. Reading (Ed.), Data and context in statistics education: Towards
an evidence-based society. Proceedings of the Eighth International Conference on Teaching
Statistics (ICOTS8, July, 2010), Ljubljana, Slovenia. Voorburg, The Netherlands:
International Statistical Institute
http://iase-web.org/Conference_Proceedings.php?p=ICOTS_8_2010
Fox, J. (2005), The R Commander: A basic-statistics graphical user interface to R. Journal of
Statistical Software, 14(9): 1–42.
Francis, G.; Lipson, K. (2011), The importance of teaching statistics in a professional context. In
C. Reading (Ed.), Data and context in statistics education: Towards an evidence-based
society. Proceedings of the Eighth International Conference on Teaching Statistics (ICOTS8,
July, 2010), Ljubljana, Slovenia. Voorburg, The Netherlands: International Statistical
Institute
http://iase-web.org/Conference_Proceedings.php?p=ICOTS_8_2010
Hamilton, G. (2010), Statistical training for non-statistical staff at the Office for National
Statistics. In C. Reading (Ed.), Data and context in statistics education: Towards an
evidence-based society. Proceedings of the Eighth International Conference on Teaching Statistics
(ICOTS8, July, 2010), Ljubljana, Slovenia. Voorburg, The Netherlands: International
Statistical Institute
http://iase-web.org/Conference_Proceedings.php?p=ICOTS_8_2010
Kelly, D.; Jasperse, J.; Westbrooke, I. (2005), Designing science graphs for data analysis and
presentation: The bad, the good and the better. Department of Conservation Technical Series
32. Wellington: Department of Conservation. www.doc.govt.nz/upload/documents/science-and-technical/docts32.pdf
R Development Core Team (2012), R: A language and environment for statistical
computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0.
www.R-project.org/
Robbins, N.B. (2005), Creating More Effective Graphs. Hoboken, NJ: Wiley.
Rohan, M.; Westbrooke, I. (2012), Using R Commander for statistical training in the workplace
http://opax.swin.edu.au/~3420701/OZCOTS2012/OZCOTS2012_RohanM_Final_paper.pdf
Scheaffer, R.L. (1998), Statistics education – bridging the gaps among school, college and the
workplace.
http://iase-web.org/documents/papers/icots5/Keynote3.pdf
Stephenson, W.R. (2002), Experiencing statistics at a distance
http://iase-web.org/documents/papers/icots6/4d3_step.pdf
Tufte, E. (2001), The Visual Display of Quantitative Information. Cheshire, CT: Graphics Press.
Westbrooke, I. (2010), Statistics education in a conservation organisation—towards evidence
based management. In C. Reading (Ed.), Data and context in statistics education: Towards
an evidence-based society. Proceedings of the Eighth International Conference on Teaching
Statistics (ICOTS8, July, 2010), Ljubljana, Slovenia. Voorburg, The Netherlands:
International Statistical Institute.
http://iase-web.org/Conference_Proceedings.php?p=ICOTS_8_2010