Improving teaching and learning about threshold concepts: the example of
confidence intervals
Chris Cope, La Trobe University, Bendigo, Australia
Confidence intervals (CIs) have been identified anecdotally in the literature as a concept which is
difficult to explain and troublesome for students to learn. Yet CIs are the first concept encountered by
most students when studying inferential statistics. This branch of statistics underlies the estimation of
population parameters and hypothesis testing, precursors to considered decision making in many
important, complex contexts. Understanding CIs requires students to make the transition from the
comfortable exact outcomes of descriptive statistics to the uncomfortable inexact outcomes of
inferential statistics. With understanding comes an entirely different view of statistics. For students
who are unable to master CIs, inferential statistics may become inaccessible. For such students the
meaningful discourse associated with areas such as quantitative research projects, for instance, becomes
difficult if not impossible.
CIs would appear to fit the definition of a threshold concept in the statistics discipline. A threshold
concept has been described by Meyer and Land (2003, p. 412) as “a portal, opening up a new and
previously inaccessible way of thinking about something”. Without a transformation to the new way
of thinking students are unable to progress in their learning. Threshold concepts can prove
troublesome for many learners because “ ... such transformation entails a letting go of earlier,
comfortable positions and encountering less familiar and sometimes disconcerting new territory”
(Land, Cousin, Meyer & Davies, 2004).
As CIs can be considered a threshold concept we could expect that research effort in the statistics
education discipline has been concentrated on improving teaching and learning about CIs. This
appears not to be the case. The literature reports a number of well-known misconceptions about CIs
(delMas, 2002; West & Ogden, 1998), but no research reporting teaching approaches which have been
demonstrated to improve student learning.
The aim of the study reported in this paper was to begin the process of improving teaching and
learning about CIs at the first year undergraduate level. Underpinning the study was an action research
based approach to improving teaching known as a modified lesson study (Lo et al., 2004). The
original lesson study was a model developed for Japanese teachers to continuously evaluate their
teaching. The stages of the modified learning study to be used in the investigation of CIs were:
1. Describe a target level of understanding of CIs appropriate for first year undergraduate students.
2. Conduct a phenomenographic study to empirically identify the distinctly different ways that first
year undergraduate students experience CIs.
3. Analyse the outcomes of the phenomenographic study in the light of the target understanding to
identify the educationally critical aspects of CIs.
4. Design a sequence of learning activities to address the educationally critical aspects.
5. Determine the levels of understanding of a group of students prior to experiencing the learning
activities (using the outcomes of the phenomenographic study).
6. Teach the sequence of learning activities.
7. Evaluate the impact of the learning activities by determining the levels of understanding of the
group of students after experiencing the learning activities.
8. Reiterate stages 4 through 7 until improved teaching and learning has been established.
This paper reports the underlying frameworks, method and outcomes of stages 1, 2 and 3. While the
study is about a particular threshold concept, the approach to improving teaching and learning is
applicable to any threshold concept.
A phenomenographic perspective on learning
The theoretical framework for the study of improving teaching and learning about CIs was a particular
phenomenographic perspective on learning (Booth, 1997; Marton & Booth, 1997; Marton, 1998;
Marton, Runesson & Tsui, 2004). Phenomenography is a qualitative research approach that can be
used to identify distinct variation in ways that concepts can be experienced (Marton & Booth, 1997).
Studies of many concepts using phenomenography have produced consistent results. Concepts can be
experienced in a limited number of distinctly different ways. Further, these different experiences can
be described in a hierarchy of inclusive levels of understanding. For example, Bruce (1994) used a
phenomenographic approach to investigate students’ experiences of a dissertation literature review.
The outcome space was an inclusive hierarchy of 6 different ways of experiencing a literature review.
From a shallower to a deeper understanding the experiences were of a literature review as a search, a
list, a survey, a vehicle for learning (a description of the current state of knowledge), a research
facilitator (an identifier of holes in knowledge) and a report. The deepest level of understanding, the
report, is inclusive of all the other experiences as, in order to produce the report, a search of the
literature produces a list of relevant publications which are then critically surveyed to describe the
current state of knowledge, in doing so facilitating research through identifying areas in which there is
a lack of knowledge.
Consideration of the findings of many phenomenographic studies has led to the development of a
phenomenographic perspective on learning. From this perspective the relationship between a learner
and the world is considered to be non-dualistic - each individual constitutes their own way of
experiencing a phenomenon as an internal relationship between the individual and the phenomenon.
Knowledge lies in the relationship. Knowledge is not fixed but reconstituted as required as
interconnected experiences of the phenomenon. Learning involves a change in the way an individual
experiences a phenomenon and an ability to reconstitute a more meaningful and powerful way of
experiencing the phenomenon. This perspective is consistent with the ideas behind threshold concepts.
Successful learning about a threshold concept requires the learner to develop a transformed view of
the concept. “Such a transformed view or landscape may represent how people ‘think’ in a particular
discipline, or how they perceive, apprehend, or experience particular phenomena within the discipline
(or more generally)” (Meyer & Land, 2003, p. 412).
A structure of awareness
To shed further light on a phenomenographic perspective on learning Marton and Booth (1997) have
used a structure of awareness, based on the field of consciousness described by Gurwitsch (1964). As
a structure of awareness was also used extensively in the study of improving teaching and learning
about CIs as a descriptive and analytical framework it is now described in some detail.
We are capable of being aware of many things simultaneously, but at different levels of focus. For
instance while contemplating what to cook for the evening meal we can still be aware of the hum of
an air conditioner or the noise of a car going past. The different levels of focus in awareness have
been described by Marton and Booth (1997) in terms of two horizons of awareness. When we
contemplate some phenomenon at a particular moment, our previous experiences of aspects of the
phenomenon are reconstituted and become part of our awareness. Strongly related aspects of the
phenomenon in those experiences become more in focus in awareness and form the internal horizon
(the theme of awareness according to Gurwitsch, 1964). Other less strongly related experiences of the
phenomenon we are contemplating and unrelated phenomena we are aware of, but not concentrating
on, are less focal in awareness and comprise the external horizon. The external horizon is broken up
into the thematic field and the margin of awareness (Gurwitsch, 1964). Those experiences having
some relation to the phenomenon we are contemplating make up the thematic field and those
experiences that concern unrelated phenomena form the margin of awareness.
Dimensions of Variation
The aspects of a phenomenon which are focal in awareness when the phenomenon is being
contemplated have been called dimensions of variation (DsoV) by Marton and Booth (1997). This is
because a phenomenon is likely to have many related aspects (dimensions) and the nature of an aspect
(its value) is likely to vary between different ways of experiencing the phenomenon. For instance,
sitting devices like chairs, stools and sofas, etc., have dimensions related to their structure that can
vary. Examples include the height of the sitting surface off the ground, and the length and width of the
sitting surface. A sofa has a wider and longer sitting area than a stool but generally the sitting surface
is lower.
A structure of awareness can be illustrated diagrammatically (Figure 1) and used to describe a way of
experiencing a phenomenon. Such a description requires elicitation of the DsoV of the phenomenon
making up the internal horizon of awareness, the “value” of each dimension and the relationship
between dimensions. The boundary between the internal and external horizon delimits the
phenomenon from its context.
Figure 1: A structure of awareness (Cope 2006). Diagram labels: external horizon, internal horizon.
A structure of awareness has also been used as an analytical framework to consider the inclusive,
hierarchical findings of many phenomenographic studies. Ways of experiencing a phenomenon higher
in a hierarchy represent deeper understandings and more complex structures of awareness. In deeper
understandings more DsoV of the phenomenon are part of the internal horizon, the value of each DoV
approaches the value of the dimension in the deepest understanding, the relationships between DsoV
are stronger and the boundary between the internal and external horizons delimits a broader context
for the phenomenon. In Bruce’s (1994) example of the different ways of experiencing a dissertation
literature review the search is represented by the most simple structure of awareness and the report by
the most complex. The search has only one DoV – the search. The report consists of the DsoV of a
search, a list, a survey, a vehicle for learning and a research facilitator. The survey DoV in the report
way of experiencing a literature review may involve a more sophisticated conceptualisation as a
critical review of a paper when compared to the survey conceptualisation in the survey way of
experiencing a literature review (a different value in the survey DoV). In the way of experiencing a
literature review as a report, the experience is of the review as a much broader process compared to
the less sophisticated experiences of a literature review.
Educationally critical aspects
In some phenomenographic studies, analysis of the different ways of experiencing a concept using a
structure of awareness has demonstrated that there are educationally critical differences between the
levels of understanding represented in the hierarchical outcome space. These differences occur in
educationally critical aspects of the concept. These are aspects which must be addressed by students in
learning experiences or the development of a deep understanding of the concept is highly unlikely
(Marton & Booth, 1997; Cope & Prosser, 2005). Educationally critical aspects can involve the
recognition of a new DoV of a phenomenon, a new value in a DoV, the existence of or recognition of
a stronger relationship between DsoV and recognition that a phenomenon has a broader context.
Examples of educationally critical aspects include the relationship between the state of a body’s
motion and the net force acting on it in Newtonian motion (Linder & Marshal, 2003) and the
part-whole notion in simple arithmetic skills – the ability to experience a number as “a sum of smaller
numbers and as a part of larger numbers” (Marton & Booth, 1997, p.60).
A method for identifying which aspects of a deep understanding of a phenomenon are educationally
critical to its development has been developed (Cope 2002, 2006; Cope & Prosser, 2005). The method
involves 3 stages. First, a structure of awareness is used as a framework to describe a target level of
understanding of the phenomenon of interest. Second, a phenomenographic research approach is used
to describe variation in the way the phenomenon can be experienced. If a group of students is used as
the participants, the phenomenographic study will be likely to describe a hierarchy of ways of
experiencing the phenomenon which represent shallower understandings of the phenomenon in
comparison to the target level of understanding. Third, the outcome space of the phenomenographic
study is analysed in the light of the target understanding, using a structure of awareness as an
analytical framework, to determine the educationally critical aspects of the target understanding.
Knowledge about the educationally critical aspects of a threshold concept, for instance, allows
academics to design a sequence of learning activities more likely to help students address the critical
aspects and develop a deeper understanding. Identifying the educationally critical aspects of CIs is at
the core of the modified learning study used in the research reported in this paper.
The modified learning study applied to CIs
Stage 1
A target level of understanding of CIs for first year undergraduate students was identified from the
literature and is now described.
Confidence intervals are part of inferential statistics. The purpose of inferential statistics is to draw
conclusions about a population from sample data. Confidence intervals are used to estimate the value
of a population parameter (Moore 2004). The typical example used at undergraduate level involves
estimating a population mean (µ) from a simple random sample when the population variable has a
normal distribution and known standard deviation.
A sample mean (x̄) can be used as an estimate of µ, but the sample means of all possible random
samples of the same size taken from a normal population form a normal distribution with a mean of µ.
Consequently a particular x̄ is unlikely to equal µ exactly. Indeed, we would not know, for a particular
sample, whether x̄ was close to µ or not. Instead of using a single point estimate like the sample mean,
it is more appropriate and useful to calculate an interval around a sample mean to cover µ and to report a
probability that states what would happen if the inference method (the calculation of a confidence
interval for a random sample) was used many times (Moore 2004).
Figure 2 illustrates a way of conceptualizing a 95% CI for a population mean. The normal distribution
is the sampling distribution of the means of all possible random samples of the same size taken from a
normal population. The 95% CIs associated with 12 random samples lie under the normal distribution.
All but one of these intervals cover µ. The notion of a 95% CI suggests that in a very large number of
samples 95% of the resulting intervals will cover µ (Moore 2004). Any decisions in a particular real-life situation based on
using a CI as an estimate for a population mean need to be made in the light of the knowledge that in
the long run, 5% of CIs will not cover µ.
Figure 2: Twelve samples from the same population gave these 95% confidence intervals. In the long
run 95% of intervals cover µ (adapted from Moore, 2004). Diagram labels: sampling distribution of x̄;
values of x̄; one interval misses µ; the others all cover µ.
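The long-run interpretation illustrated in Figure 2 can be explored by simulation. The following Python sketch is illustrative only and is not part of the study. The values µ = 100, σ = 1 and n = 20 are hypothetical, the critical value 1.96 is assumed for 95% confidence, and the function names are invented for the example. It draws many random samples from a normal population, calculates a CI from each, and reports the proportion of intervals that cover µ.

```python
import random
import statistics


def ci_known_sigma(sample, sigma, z_star=1.96):
    """Return (lower, upper) for a CI for the mean when sigma is known."""
    mean = statistics.fmean(sample)
    margin = z_star * sigma / len(sample) ** 0.5
    return mean - margin, mean + margin


def coverage(mu, sigma, n, repetitions, seed=0):
    """Proportion of simulated intervals that cover the true mean mu."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(repetitions):
        # One simple random sample of size n from a normal population.
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        lower, upper = ci_known_sigma(sample, sigma)
        if lower <= mu <= upper:
            hits += 1
    return hits / repetitions


if __name__ == "__main__":
    # Hypothetical population values chosen for illustration only.
    print(coverage(mu=100.0, sigma=1.0, n=20, repetitions=10_000))
```

With a large number of repetitions the reported proportion should settle close to 0.95, although any single interval in the simulation either covers µ or misses it.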
Stage 2
A phenomenographic study of variation in ways of experiencing CIs among a group of 100 students
studying the first year undergraduate subject Statistical Methods at La Trobe University, Bendigo,
Australia was conducted. The aim of the study was to produce an outcome space of a limited number
of distinctly different but logically related categories of description of CIs. The categories of
description would represent distinctly different ways of experiencing CIs and the distinctly different
experiences would in turn represent different levels of understanding of CIs.
To provide data for the study a multi-part exam question which would give insights into students’
ways of experiencing CIs was developed (Appendix A). Ethics approval to analyse the exam
answers in depth was applied for and granted. The exam answers were de-identified, and
decontextualised by combining all the answers into a large pool of statements about CIs.
The first step of the phenomenographic analysis involved contemplating each statement in terms of a
structure of awareness. As each statement was read the researcher considered the following questions:
“what DsoV of the concept of CIs are in focus in awareness if the statement is to make sense?” and
“what different values are associated with each dimension?”. The purpose of this first step was to
describe a limited number of distinctly different DsoV of CIs which underlay the pool of statements
and identify what different values were associated with each dimension. This step of the analysis
process took considerable time as similarities and dissimilarities were compared.
The second step of the analysis involved the researcher considering each statement again for insight
into relationships between DsoV.
The third step of the analysis involved the researcher using the related DsoV to describe distinctly
different categories of description of CIs. This step involved much iteration and constant returning to
the data to seek supporting evidence for the different categories of description.
The rigour of qualitative research in general has long been debated and the rigour of
phenomenographic studies is no different. A recent call in the literature has been for the return of the
concepts of validity and reliability to qualitative research approaches (Morse et al. 2002). The use of a
structure of awareness as a framework to ensure and establish the validity and reliability of the
designing, conducting and reporting of phenomenographic studies has been proposed (Cope 2004) and
was used extensively in the study reported in this paper.
The outcome of the analysis process was a limited number of distinctly different but logically related
categories of description of the concept of CIs. These categories are now described in terms of their
referential aspect (meaning) in the text and structural aspect (structure of awareness) in Table 1. The
category representing the lowest level of understanding is described first. Categories representing
higher levels of understanding are logically inclusive of categories representing lower levels of
understanding.
Category 1 - Correctness of original data items
A CI is experienced as the output of an academic, statistical exercise conducted to provide exact
information about the acceptability of the individual items of a sample or population. For example a
95% CI indicates that 95% of the data items are acceptable and 5% are not. The world outside of the
immediate calculation is not considered as part of the experience. A descriptive statistic is not part of
this way of experiencing CIs.
The following quote illustrates the notion of acceptability.
The findings of our own investigation have shown that, with the same method of using a simple
random sample of getting 20 bars of chocolate we get a weekly 95% CI, which means in every week,
in whatever date or time that we pick one chocolate bar at random, it will either be one of the 95 good
ones or one of the 5 that weighs less than 100gms out of 100 chocolate bars.
Category 2 - Range to cover % of data items
A CI is experienced as the output of an academic, statistical exercise which calculates a range that
incorporates a % of the data items in a sample. For example a 95% CI is a range that captures 95% of
the data items in a sample.
Yes it is true to say that in the above question’s case (question a) if we took a sample of 20 we would
expect 19 of that 20 to be within 79.726 and 98.747.
A CI is often more desirable than a single point estimate because an interval is able to give a range
which most of the data is able to fall within from which the chance of the data falling within that
range can shown at a percentage (eg 95%).
With the CI you can find data that is more significant as it can show between what 2 numbers a % of
the data lies
The population mean is not part of this way of experiencing CIs. Sometimes the range is considered to
include the sample mean but generally not. The quotes above give no indication of the sample mean
being part of the experience. The quote below does however.
As the CI contains 95% of the data it is understood that our mean or average figure will lie within this
There is evidence in the quotes that this way of experiencing CIs is inclusive of the first experience.
The following quote indicates that values lying within the range are viewed as acceptable and those
outside the range are not.
A CI indicates the range of acceptable values whereas a single point estimate does not. Therefore in
practice an acceptable value may be rejected because it does not equal the single point estimate. It
means that 95% of accepted values fall within the CI.
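The difference between this way of experiencing a CI and the target understanding can be checked numerically. The following Python sketch is illustrative only and is not part of the study. The values µ = 100, σ = 1 and n = 20 are hypothetical, and the function names are invented for the example. It calculates a 95% CI for the mean from one sample and then counts the fraction of that sample's own data items falling inside the interval.

```python
import random
import statistics

Z_STAR = 1.96  # assumed standard normal critical value for 95% confidence


def mean_ci(sample, sigma):
    """95% CI for the population mean, assuming sigma is known."""
    mean = statistics.fmean(sample)
    margin = Z_STAR * sigma / len(sample) ** 0.5
    return mean - margin, mean + margin


def fraction_inside(sample, lower, upper):
    """Fraction of individual data items lying within [lower, upper]."""
    return sum(lower <= x <= upper for x in sample) / len(sample)


if __name__ == "__main__":
    rng = random.Random(1)
    mu, sigma, n = 100.0, 1.0, 20  # hypothetical values for illustration
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    lower, upper = mean_ci(sample, sigma)
    print("95% CI for the mean:", lower, upper)
    print("fraction of data items inside:", fraction_inside(sample, lower, upper))
```

Because the margin of error shrinks as n grows while the spread of the individual data items does not, the fraction of data items inside the interval is generally well below 95% for a sample of this size. The interval describes the likely location of µ, not the range of the data.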
Category 3 - Single interval that contains µ
A CI is experienced as the output of an academic, statistical exercise that uses one simple random
sample to calculate an interval that contains µ. Calculating a CI to estimate µ is not part of this way of
experiencing CIs. Rather, the purpose is to produce an interval which contains µ. That the calculation
could have an error is not part of the experience.
The following quotes illustrate the foci on a single sample and an interval which contains µ.
It is true to say that there is a probability of .95 (or 95%) that the interval contains the mean. When
you use 95% confidence to calculate the CI you are getting an interval that has a 95% chance that µ
is in that interval.
A CI is better than a sm as it gives you a range where the mean of a population will be 95% of the
There is a sense of exactness about the calculation and its result. This exactness is expressed variously
in terms of accuracy, correctness, rightness and wrongness. This sense of exactness leads to great
confidence, high chance or large probability that the interval contains µ. The sense of great
confidence here is that of the dictionary definition of confidence, as the word is commonly used in the
English language. Similarly, the sense of high chance or large probability that a CI contains µ does not have a
statistical basis as only one sample is part of this experience.
The following quotes illustrate the sense of exactness associated with this experience.
A CI provides a larger margin for error. If you are wrong with the CI, you may still include µ but if
you are wrong with a single point estimate you will not.
The mean will always lie between the extreme values of a CI. A 95% CI indicates a smaller range of
values that the mean lies between.
The following quotes illustrate the sense of chance/probability that a CI will contain the mean.
When you use 95% confidence to calculate the CI you are getting an interval that has a 95% chance
that µ is in that interval.
However a single point estimate (sm) whilst it will always be close to the µ it will have some errors
and will rarely exactly equal the µ. The CI gives a percentage of how correct µ is within the range.
The following quote illustrates the use of confidence as an emotive rather than a statistical word. In
many similar quotes there was no evidence of confidence being associated with success of the
method.
For a 95% CI for the mean it is not true to say that there is a probability of .95 that the interval
contains the mean but that one is 95% confident that the mean is within the interval.
This way of experiencing CIs is logically inclusive of the previous two experiences and represents a
more complex experience. This experience includes all aspects of the previous experience but also
includes the purpose and sense of confidence/chance/probability that the interval contains µ.
Category 4 - Single interval to estimate/contain µ
A CI is experienced as the output of an academic, statistical exercise that uses one simple random
sample to calculate an interval that contains, and in many instances can be used to estimate, µ.
Importantly there is a recognition of inexactness in using a CI to contain/estimate µ. Unlike the
previous experience where a definite sense of right and wrong, accuracy and correctness is evident, in
this experience there is a sense of being more sure/more accurate.
The following quotes illustrate a way of experiencing CIs as an approach to estimating µ.
A CI for µ is often more desirable than the single point estimate (sm). Whereas a CI provides a range
of values in which we can be 95% confident (for example - the desired confidence level can vary) that
µ will fall, the single point estimate is based on the results of one sample and, unlike a CI, does not
take into account sampling error. A CI is a safer estimate of µ, particularly if the sample size is
relatively small.
A CI for µ is often more desirable than a single point estimate because a single estimate will contain
some error and we need to account for these errors. The margin of error accounts for random
sampling errors. Because our answers will never be exact, by using a CI we indicate how accurate
our estimate is and how confident we are.
The following quotes illustrate the inexactness but sense of being more sure and more accurate,
evident in this experience.
A CI for µ is often more desirable than a single point estimate (sm) because it provides a more
accurate estimate. Using a single point estimate means that you are limiting the chance of obtaining
the correct figure to only a point, whereas using a CI means that you are increasing the chance of
obtaining the correct figure.
The single point estimate of the sm may have been a rare occurrence. Therefore with a CI we can be
more certain when analysing the data of where µ lies.
A CI is often more desirable than a sm value because with a sm you are unsure whether it is actually
representative of µ, but when you use a CI you make a range of values that you can be reasonably
sure µ lies within.
This way of experiencing CIs is logically inclusive of the previous experiences. A structure of
awareness representing this experience contains all aspects of the previous experience but also
includes the estimate of µ as part of the internal horizon. The other significant change is in the
exactness of the calculation DoV. This aspect has a new value with the discernment of the inexact
nature of the CI calculation.
Category 5 - Many intervals to estimate/contain µ
CIs are experienced as an academic, statistical exercise which uses repeated random samples from the
same population to calculate many intervals as a means of containing/estimating µ with surety. There
is a recognition that with many intervals a high proportion will cover the µ. While the inexactness of
the calculation is discerned, a sense of being more sure of containing µ is generated by having many
intervals. There is no recognition of the normal distribution of the sample means and the statistical
reasoning behind 95% (for example) of intervals containing µ.
In this experience the purpose is much more frequently to contain µ than to estimate µ.
Consequently the error associated with estimating µ using CIs is not generally a part of the
experience.
The following quotes illustrate the notion of multiple samples and multiple CIs. In many similar
quotes in the data there is no evidence of the experience incorporating the normal distribution of the
sample means and CIs, however.
A 95% CI means that if someone took many random samples of size n, and each time calculated the
CI, then 95% of these would probably contain µ.
If the population was sampled and the mean estimated in the same way, many times, then the mean
should be in the range 95% of times.
In repeated samples 95% of the time the interval will contain µ.
Yes, it is true to say that a 95% CI has a probability of 0.95 that the interval contains the mean. The
95% CI means that if you were to calculate the CI for many samples, 95% would be expected to
contain the mean.
This way of experiencing CIs is logically inclusive of the previous experiences. All aspects of the
previous experiences are part of this experience but multiple samples from the same population are
discerned. Consequently this experience is more complex and powerful as the multiple samples and
CIs give a sense of being more sure in containing/estimating µ.
Category 6 - Estimate of µ with known confidence and error
CIs are experienced as an academic, statistical exercise which estimates µ with a set confidence and
known error. Confidence relates to the success of the method. For instance, a 95% CI means that the
method for calculating an interval will contain the µ in 95% of intervals constructed from repeated
simple random samples of the same size from a normally distributed population. Importantly, in this
way of experiencing CIs there is an awareness that 5% of intervals will not contain µ.
There was no evidence, however, that the link between confidence and the success of the method
was explained in terms of the normal distribution of the sample means and related CIs.
The following quotes illustrate awareness of repeated samples and the meaning of confidence relating
to success of the method.
Firstly 95% of confidence means this interval of numbers was calculated using a method that gives
the correct value 95% of the time. Secondly in the long run 95% of intervals calculated using simple
random sample of this size from the particular population will contain µ. Therefore the meaning of
any 95% CI is “we are 95% confident that µ is within the certain amount”.
Yes, it is true to say that there is 95% confidence (.95 probability) µ is contained in the CI. This
means that 95/100 times, µ will be contained inside the CI. This assumes that the distribution is
normal. The central limit theorem holds this theory. If we took many random samples of size n and
each time calculated the CI, 95% of the CIs would contain µ.
The following quotes illustrate awareness of the purpose of calculating CIs and that the confidence
and error are known.
CIs is more often desirable as it estimates an unknown population parameter with an indication of
how accurate the estimate is, also showing how confident we are that the result is correct.
I am certain that your result was caused by the 5% error present in the 95% CI and therefore a very
unlikely event has occurred
After reviewing production records over the last six months I have found that all of the weekly 95%
CIs derived from samples of 20 contain the advertised mean of 100g. I suggest that you try another
sample as with a CI of 95% there is still a 5 in 100 probability that you will find that it arrives outside
of 100gms.
This way of experiencing CIs is logically inclusive of the previous experiences. The same DsoV are
part of this experience. ……
Category 7 - Estimate of µ with likely error for decision making
CIs are experienced as a statistical exercise to estimate µ to support decision making in real-life.
Importantly the level of confidence and error are known and can be taken into account when making
decisions.
The following quotes illustrate the use of CIs to make decisions with known confidence and error.
All in all, 25 CIs were calculated, all of the same sample size, to the same level of confidence and
from the same population. 24 intervals contained the advertised mean of 100g. 1 interval - that
calculated by the government authority - did not contain the advertised mean. The calculation of a
95% CI can be explained in the following way: in the long run, 95% of intervals calculated from the
same size simple random sample will contain µ. The chocolate bars selected were taken from the last
6 months of production so the “long run” requirement is fulfilled. 24/25*100 = 96%. The case in
question produced 96% of intervals containing µ. It should be considered purely coincidental that the
government standards department calculated the interval that did not contain the advertised value.
Their result can be attributed to sampling error that the margin of error failed to account for in that
case. Ultimately there is no reason to attach any significance to the department’s findings.
I am certain that your result was caused by the 5% error present in the 95% CI and therefore a very
unlikely event has occurred
After conducting 24 of our own simple random sample tests, all with 95% CI containing the
advertised mean of 100g. it can be concluded that the governments tests fell within the 5% of unusual
As the CI of 95% was used there is a 1 in 20 chance that an interval found does not contain the mean.
So out of 25 samples you would expect at least one CI not to contain the mean. As the error was due
to chance there is no need for management to make any changes in producing the chocolate bars.
The result found that the interval had a lower limit of 99.026g and an upper limit of 99.774g should
not be too surprising as it is a result of there being no certainty that an interval would contain µ,
because we are only 95% confident.
The finding that in the past 6 months all 24 weekly 95% CIs containing the mean of 100g. is
reasonably expected because we know that over a large number of simple random samples, about
95% will contain the mean, which could be more or less depending on chance. Hence it is not
unbelievable that 100% of 24 samples contained the mean. Management should respond to the notice
by explaining that the department’s findings are not strong evidence because it was a 1/20 chance
that this would be found and they did not take enough simple random samples.
This way of experiencing CIs is inclusive of the previous experiences but is a more complex way of
experiencing CIs that represents a deeper understanding.
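The arithmetic behind the students' reasoning in these quotes can be checked directly. The following is a minimal sketch, not part of the original study: it assumes that the 25 intervals are independent, that the true mean really is the advertised 100g, and that each 95% CI therefore misses µ with probability 0.05. The function name and its parameters are illustrative only.

```python
from math import comb


def miss_probabilities(n_intervals=25, miss_rate=0.05):
    """Binomial probabilities for how many of n independent CIs fail to contain mu.

    Assumes each interval misses mu independently with probability miss_rate.
    """
    hit_rate = 1 - miss_rate
    p_none = hit_rate ** n_intervals
    p_exactly_one = comb(n_intervals, 1) * miss_rate * hit_rate ** (n_intervals - 1)
    return {
        "expected_misses": n_intervals * miss_rate,
        "p_no_miss": p_none,
        "p_exactly_one_miss": p_exactly_one,
        "p_at_least_one_miss": 1 - p_none,
    }


result = miss_probabilities()
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

Under these assumptions the expected number of misses among 25 intervals is 1.25, and the chance of at least one miss is roughly 0.72, so a single interval that excludes 100g is unremarkable. Equally, there is still a chance of roughly 0.28 that none of the 25 intervals misses.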
Stage 3
The identification of the educationally critical aspects of CIs involved a logical analysis of Table 1.
This table illustrates the structural aspects of the outcome space of the phenomenographic study
conducted in Stage 2 of the modified learning study. The rows in the table represent the 7 categories
of description. Category 7 at the top of the table represents the deepest level of understanding of CIs
and is consistent with the target understanding described in Stage 1 of the study. Category 1 is the
lowest level of understanding. The second column is a summary of the referential aspect of each
category described in detail in Stage 2. Columns 3 – 8 are the DsoV of CIs. Each cell in columns 3 – 8
indicates the value of the DoV (column) for the category (row). Dark lines across the bottom of a cell
in columns 3 – 8 indicate the location of a significant change in value in a DoV.
Which of the significant changes in the value of a DoV indicate an educationally critical aspect of CIs
was determined logically in light of the target understanding. In this example the deepest level of
understanding emanating from the phenomenographic study was equivalent to the target level of
understanding. This will not necessarily be the case for other concepts. The judgement of
educationally critical aspects is subjective at this stage of the modified learning study. The judgement
can only be verified through interventions in teaching approach and evaluation of the impact of the
interventions. The proposed educationally critical aspects are:
An understanding that CIs are based on repeated sampling
While students experience CIs as relating to a single sample they are unlikely to come to terms with
confidence implying the success of the method. The notion of repeated sampling is a sophisticated
concept in itself, however. When would one ever take repeated samples in a practical situation? It is a
statistician’s view of confidence intervals and not that of a user of a CI. For instance, suppose that a
patient is considering whether or not to undergo surgery. The patient has been presented with the
surgeon’s advice that there are some risks associated with the surgery; in fact there is only a 95%
chance that the surgery will be successful and there is a 5% chance that it will not be successful. This
does not mean that if the surgeon operates on the patient 100 times, then 95% of the time the surgery
will be successful and 5% of the time it will fail. The surgeon will operate on this patient only once,
and either the operation will be successful or it won’t. So, what does this advice tell the patient?
Presumably it is based on past experience and it gives the patient some appreciation of the risk
involved in the surgery. It assists the patient in making a decision as to whether to undergo surgery or
not. The surgeon sees the operation as a repeatable event whereas the patient sees surgery as
non-repeatable. Thus, while the surgeon may interpret the 5% failure rate in terms of past statistics, the
patient uses the 5% to make a personal choice. It is educationally critical that students learning about
CIs as part of a statistics education adopt a statistician’s view of CIs involving repeated sampling.
This educationally critical aspect is in turn prerequisite knowledge for the second educationally
critical aspect.
The notion that confidence relates to the success of the method
Underlying this educationally critical aspect is Figure 1. This figure shows that the sample means of
repeated random samples of the same size drawn from a normal population will form a normal
distribution with the mean being µ. In the long run, 5% of the 95% CIs associated with each sample
mean will not cover µ. Ninety-five percent confidence means that an interval was calculated using a
method that is successful in covering the mean 95% of the time in repeated samples. Without this
understanding ….. This educationally critical aspect is also in turn prerequisite knowledge for the
third educationally critical aspect.
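The idea that confidence describes the long-run success of the method can be illustrated by simulation. The sketch below is an illustration under stated assumptions rather than an activity used in the study: it assumes a normal population with a known standard deviation, so a z-based interval applies, and the values of mu, sigma, n and the number of repeated samples are arbitrary choices.

```python
import random
from statistics import NormalDist


def coverage_rate(mu=100.0, sigma=2.0, n=20, n_samples=10_000, level=0.95, seed=1):
    """Proportion of z-based CIs that cover mu across repeated random samples.

    Assumes a normal population with known sigma (illustrative values only).
    """
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    half_width = z * sigma / n ** 0.5
    covered = 0
    for _ in range(n_samples):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        sample_mean = sum(sample) / n
        # Each repeated sample yields its own interval; count the ones covering mu.
        if sample_mean - half_width <= mu <= sample_mean + half_width:
            covered += 1
    return covered / n_samples


rate = coverage_rate()
print(f"Proportion of intervals covering mu: {rate:.3f}")
```

Any single interval either covers µ or it does not. It is the proportion over many repeated samples that settles near the stated confidence level.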
Thinking of a confidence interval as an estimation tool for decision making in which a
known likelihood of error needs to be acknowledged and dealt with.
Six out of the 7 ways of experiencing CIs described in the phenomenographic study incorporate a
view of CIs as a statistical exercise that needs to be conducted for academic purposes. CIs are not
experienced as relating to the world beyond assessment tasks in a statistics subject. Yet a key part of a
deep understanding of CIs is their inexactness. There is a possibility of error in using any CI to
A deep understanding of CIs associated with
be more prepared to take risky options than others.
Table 1: Structural aspects of outcome space (rows represent categories, columns 3 – 8 represent dimensions of variation)
[Table layout lost in extraction. The recoverable cell text is listed below; the assignment of values to individual cells cannot be recovered.]
Referential aspects: statistical exercise to see if original data items are (truncated); statistical exercise to calculate range which incorporates most of the data; statistical exercise to calculate interval that contains µ; statistical exercise to calculate interval to estimate/contain µ; statistical exercise to calculate many intervals to estimate/contain µ; statistical exercise to estimate µ with likely error; Estimation of µ with likely error for decision making in real-life.
Dimensions of Variation (header fragments): View of data; Exactness of (truncated); No. of samples; Meaning of (truncated).
Cell values: Original data items; distribution of (truncated); One interval; Many intervals; Calculate range; Contain µ; Contains µ; Estimate µ; Acceptability of individual data; 95% of data will be acceptable and 5% will be in error; but sense of more accurate/more sure; Inexact but sense of surety (based on many intervals); Inexact but with calculable error; Range which contains % of original data; confidence that interval will contain µ; confidence/% of time/degree of (truncated) that interval contains µ; % of intervals that will contain µ; Success of method; statistical exercise; Support for decision to be made in real world.
The identification of three educationally critical aspects of CIs represents new knowledge for the
statistics education discipline. The finding is significant in that only through learning activities
targeted at these educationally critical aspects are students likely to develop a deep understanding.
Given the nature of the 3 educationally critical aspects, teaching about CIs by lecturing only is
unlikely to result in a majority of students developing a deep understanding. Active, possibly
collaborative learning activities are required. Students need to experience repeated sampling,
calculation and plotting of confidence intervals. Students also need to experience decision making
based on the calculation of a confidence interval. The decision needs to carry a risk which, should it
eventuate, has negative consequences that students can relate to.
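A decision-oriented activity of this kind could be built around a single calculated interval. The sketch below is one possible form, offered as an assumption-laden illustration: the 20 weights are invented for the example, the interval uses the standard two-sided 95% critical value of Student's t with 19 degrees of freedom (2.093), and the function name is hypothetical.

```python
from statistics import mean, stdev

# Two-sided 95% critical value of Student's t with 19 degrees of freedom.
T_CRIT_95_DF19 = 2.093


def t_interval_n20(weights):
    """95% CI for the mean from a simple random sample of exactly 20 values."""
    if len(weights) != 20:
        raise ValueError("the hard-coded critical value only applies when n = 20")
    centre = mean(weights)
    half_width = T_CRIT_95_DF19 * stdev(weights) / len(weights) ** 0.5
    return centre - half_width, centre + half_width


# Invented chocolate bar weights in grams (illustrative data only).
weights = [99.2, 100.1, 99.8, 100.5, 99.6, 100.3, 99.9, 100.0, 99.4, 100.7,
           99.7, 100.2, 99.5, 100.4, 99.8, 100.1, 99.3, 100.6, 99.9, 100.0]

low, high = t_interval_n20(weights)
contains_advertised = low <= 100.0 <= high
print(f"95% CI: ({low:.3f}, {high:.3f})")
print("Contains advertised mean of 100g:", contains_advertised)
```

Students could then be asked what action they would take on the basis of the interval, given that the method producing it fails to cover µ in about 1 in 20 samples.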
There may be a claim that the educationally critical aspects of CIs identified in this study are intuitive.
This claim can only be made in hindsight, however. From a phenomenographic perspective, one of the
problems to be overcome in improving teaching and learning about threshold concepts is that a
teacher’s understanding of a threshold concept may involve taken-for-granted knowledge. The teacher
may have held and applied a deep understanding of the concept for a considerable time – to the extent
that the understanding and its application become second nature. In this circumstance a teacher is
likely to be unaware of the constituent parts of the understanding and the logical order in which these
parts need to be addressed by students to form the relationships which constitute an understanding. A
teacher is unlikely to be aware of which aspects of the understanding are educationally critical.
The development of teaching interventions based on the educationally critical aspects and their
implementation and evaluation (stages 5 – 8 of the modified learning study) will occur during the
second half of 2006. It is hoped that the reporting of the study to date may provide an approach and an
impetus for others interested in improving teaching and learning about threshold concepts.
I wish to thank Professor Terry Mills and Dr. Christopher Lenard of the Department of Mathematics,
La Trobe University, Australia, for the offer of the use of some of their informal notes about teaching
confidence intervals.
