Sports Med 2004; 34 (15): 1035-1050
Single-Subject Research Designs and Data Analyses for Assessing Elite Athletes’ Conditioning
Taisuke Kinugasa,1 Ester Cerin2 and Sue Hooper1,3
1 School of Human Movement Studies, The University of Queensland, Brisbane, Queensland, Australia
2 School of Population Health, The University of Queensland, Brisbane, Queensland, Australia
3 Centre of Excellence for Applied Sport Science Research, Queensland Academy of Sport, Sunnybank, Queensland, Australia
Abstract
Research in conditioning (all the processes of preparation for competition) has
used group research designs, where multiple athletes are observed at one or more
points in time. However, empirical reports of large inter-individual differences in
response to conditioning regimens suggest that applied conditioning research
would greatly benefit from single-subject research designs. Single-subject
research designs allow us to find out the extent to which a specific conditioning
regimen works for a specific athlete, as opposed to the average athlete, who is the
focal point of group research designs. The aim of the following review is to
outline the strategies and procedures of single-subject research as they pertain to
the assessment of conditioning for individual athletes. The four main experimental
designs in single-subject research are: the AB design, reversal (withdrawal)
designs and their extensions, multiple baseline designs and alternating treatment
designs. Visual and statistical analyses commonly used to analyse single-subject
data, and advantages and limitations are discussed. Modelling of multivariate
single-subject data using techniques such as dynamic factor analysis and structural equation modelling may identify individualised models of conditioning leading
to better prediction of performance. Despite problems associated with data
analyses in single-subject research (e.g. serial dependency), sports scientists
should use single-subject research designs in applied conditioning research to
understand how well an intervention (e.g. a training method) works and to predict
performance for a particular athlete.
Most conditioning researchers have used group
research designs where multiple athletes are observed at one or more points in time and groups (e.g.
a training group and a control group) are compared.
Lehmann et al.[1] showed that one group of runners
who increased training intensity improved running
speed at the 4 mmol/L lactate level, whereas another
group who increased training volume (distance) did
not improve. Another example is the study of
Mujika et al.[2] who also used the group research
design to assess swimmers’ training and performance. For applied conditioning research, which aims
to establish the effect of a specific conditioning
regimen or intervention on an individual athlete, one
of the main drawbacks of group research designs is
that the sample mean is used as the representative
value of the group, whilst the mean value may mask
important information for some individuals. These
types of design can establish whether a conditioning
regimen works for ‘average’ athletes. However, at
the elite level, applied conditioning research requires a focus on an individual athlete rather than
groups of ‘average’ athletes to make a confident
assessment of the effect of an intervention (e.g. a
specific training method) on the performance of an
individual athlete.
A successful outcome of conditioning is improved performance. However, it is difficult to frequently measure performance using maximal efforts
(e.g. time trial) for an elite athlete. Indicators of
athletes’ preparedness to perform (performance
readiness) are often used to reflect a measure of
performance.[3,4] Measures of performance readiness include physical (e.g. fatigue), psychological
(e.g. mood disturbance), physiological (e.g. heart
rate variability) and medical (e.g. presence of an
injury and illness) indicators that are hypothesised to
predict performance.[5,6] However, measures such as
blood lactate and plasma cortisol levels have shown
different individual responses among runners and
the profile of mood states (POMS) also varied greatly across cyclists during 6 weeks of high-intensity
training.[7,8] Martin et al.[8] reported that some cyclists with relatively large mood disturbances responded well to tapering (improved cycling performance), whereas others responded poorly and
standard deviations for individual mood scores were
large. Other researchers have also anecdotally reported the existence of substantial inter-individual
differences in performance readiness measures.[9-11]
These findings indicate that the average group results on measures of performance readiness are
bound to be invalid indicators of performance for
some individuals. Consequently, applied conditioning research on elite athletes needs to be approached
from a single-subject perspective.[3]
The aim of the following review is to outline the
strategies and procedures of single-subject research
as they pertain to the assessment of conditioning for
individual athletes. The review will focus on single-subject (quasi-) experimental designs and related data analyses, including assessment of the impact of an intervention and modelling of multivariate data. Using the single-subject approach,
coaches and sports scientists can compare a new
training method (e.g. altitude training) with a traditional training method and make confident assertions about the effectiveness of the new strategy for
an athlete. Sophisticated mathematical models including the systems model have already been
presented by Banister et al.[12] and Busso[13] to predict performance on an individual basis. However,
their models did not include recovery, which is an
integral component of conditioning and the models
were not tested with elite athletes. This article
presents a single-subject approach to applied research on elite athletes’ conditioning that uses multivariate statistical methods with the aim of helping
sport scientists to more accurately track conditioning and predict performance in individual elite athletes.
1. Single-Subject Experimental Designs
Single-subject (single-case or n = 1) research
designs were established by Pavlov as early as 1928
as a result of work with single organisms.[14] They
were developed in later work by researchers such as
Skinner, studying individual behaviour and methods
of behaviour change.[14] The aim of single-subject
research is to observe one or a few subjects’ outcome (e.g. performance) as a dependent variable at
many different timepoints and to compare the
changes to assess the effect of an intervention (e.g. a
training method). When compared with group research designs, single-subject research designs present several advantages. First, they allow rigorous
objective experimental assessment of the effectiveness of an intervention for the individual. As noted
earlier, whilst group research designs aim at answering the question “what is the effect of this training
method on the ‘average’ athlete?”, single-subject
research designs explore the effect of a training
method on a specific athlete. Secondly, single-subject research designs are appropriate for studying the
process of change in selected indicators of conditioning. Although group research designs can also
be used to analyse the dynamic process of conditioning, the individualised approach may more accurately identify whether conditioning was successful for
a specific athlete. Thirdly, single-subject research
designs are more appropriate for the study of small
populations of athletes such as injured, overtrained
or elite athletes, who are difficult to recruit in sufficiently large numbers to meet the requirement of
group research designs that the sample size be adequate to detect a practically significant effect of an
intervention. Fourth, single-subject research designs
are usually easier to incorporate into practical and
clinical settings than group research designs because
they are sufficiently flexible to accommodate the
changing needs of the individual studied.[15] This
methodology has been used in a wide range of
research fields such as applied behaviour analysis,
rehabilitation and clinical psychology.[16-18]
1.1 AB Design
To carry out single-subject conditioning studies,
after measures of conditioning outcomes (e.g. time
trial, physiological and psychological measures) are
selected, researchers need to choose and implement
a specific single-subject research design. The AB
design is the most basic single-subject (quasi-) experimental design.[14] It involves repeated measurement of the conditioning data through the phases A
and B of the study. Phase A represents the baseline
stage of the study in which the intervention that is to
be evaluated is not present. In phase B, the intervention is introduced and changes in the conditioning
data are examined. With some major reservations,
these changes are attributed to the intervention.
These reservations are related to the fact that it is
possible that changes in phase B might have occurred even without the introduction of the intervention.[19] More sophisticated single-subject quasi-experimental designs are needed for a higher degree of
internal validity (the extent to which we can establish the existence of a causal effect between the
intervention and the outcome measures), including
the reversal (withdrawal), multiple baseline and alternating treatment designs.[14]
1.2 Reversal (Withdrawal) Designs and
their Extensions
The ABA design, an extension of the AB design,
is the simplest type of reversal or withdrawal design
(figure 1).[14] Reversal refers to the withdrawal of
the intervention introduced following baseline measurement. Withdrawal designs allow us to conclude
whether an intervention (phase B) impacts on a
dependent variable under scrutiny by providing information on the changes following introduction
and removal of the intervention.
Fig. 1. Hypothetical data of an ABA design. Self-referenced performance data on a 10-point scale (from 10 = very, very good to 1 = very, very poor) are plotted for 15 days. An intervention (e.g. hydrotherapy) is introduced from day 6 to day 10 and subsequently withdrawn.
To illustrate, Lerner
et al.[20] used the ABA design to investigate the
effects of psychological interventions on the freethrow performance of four female basketball players. Following a baseline phase (first phase A),
subjects were randomly assigned to one of three
interventions (phase B): goal-setting, imagery programmes, or combined goal-setting and imagery.
The final phase (second phase A) consisted of the
withdrawal of the intervention (i.e. return to the
baseline level).
Other suggested extensions and variations of the
ABA design are the ABAB, BAB, ABCB (phase C
consists of an intervention different from that in
phase B) and changing criterion designs.[14] These
designs are more powerful than the simple AB design in terms of internal validity. By introducing
more phases or reversals (inclusion or removal of an
intervention) the researchers can more reliably assess the impact (if any) of an intervention on the
dependent variable. A consistent correspondence
between changes in the dependent variable and the
introduction or withdrawal of the intervention indicates the existence of an intervention effect.
The BAB design involves an intervention, followed by a baseline, and return to the intervention
phase and can be used to study tapering or detraining
effects when the data collection starts close to the
end of a training season. In this case, the initial
phase B would correspond to the training intervention at the end of the training season, phase A would
represent the off-season (no training) period and the
second phase B would correspond to the return to training in the following season. The ABCB design, a variant of the ABAB design, allows comparison of two types of interventions (phases B and C). For example, if we were to assess the impact of training on an athlete’s performance, phase A would correspond to no training, phase B would represent running training, phase C would represent cross-training including various activities (e.g. running, resistance training, cycling) and phase B would correspond to the return to running training.
The changing criterion design is an extension of the ABA design,[14] in which the intervention effect is demonstrated when the dependent variable changes to criterion (goal) levels predetermined by the researcher.[14] After initial baseline measurements (phase A), the intervention (phase B) is introduced until a predetermined criterion is met. The criterion level (phase B) is then established as a new baseline (phase A) and a higher criterion level is set as the next goal to be achieved in the subsequent intervention phase (phase B). This design is used to determine acceleration or deceleration of performance and may be especially useful to assist in achievement of goals in athletes. For example, when a tennis player, who can serve at 162 km/h, wants to achieve 170 km/h, the player sets the goal to improve by 2 km/h initially (figure 2). A specific strength training regimen is conducted until the serve reaches the speed of 164 km/h. When this criterion is met, a new criterion (166 km/h) is introduced as the next goal. The process is repeated until the final goal (170 km/h) is achieved.
Fig. 2. Hypothetical data of a changing criterion design. Serve speed (km/h) for a male tennis player is plotted for 20 weeks. The dashed line represents a criterion (goal) level. The vertical solid line represents the point an intervention was introduced. A specific strength training regimen is conducted after week 4 until the serve reaches the speed of 162 km/h. When this criterion is met (in week 8), a new criterion (164 km/h) is introduced as the next goal. The process is repeated until the final goal (170 km/h) is achieved.
If our aim is to establish the existence of a causal
relationship between a specific intervention and an
athlete’s performance, one of the main limitations of
reversal designs pertains to the fact that carry-over
effects can sometimes occur across adjacent phases
of the study.[15,21] In other words, the change in performance may persist after the intervention is removed, which makes it difficult to ascertain
whether the intervention ‘caused’ a change in the
performance. Another problem arises from the inability to control the potential effects of maturation
and practice on later intervention and baseline
phases of the study. These concerns are serious
threats to the external (generalisability of the findings) and internal validity of studies implementing
reversal/withdrawal designs.
1.3 Multiple Baseline Designs
Multiple baseline designs are the most widely
used single-subject designs.[22,23] They are more effective at controlling threats to internal validity such
as carry-over effects than the reversal/withdrawal
designs and are appropriate when interventions cannot be withdrawn due to practical limitations or ethical considerations.[15,21] The design can be used when the researchers want to examine the effects of an intervention across different outcomes (e.g. performance or performance readiness measures), settings (conditions or situations) and subjects. The multiple baseline design can be conceptualised as separate AB designs. For example, after obtaining a stable baseline phase, three interventions are introduced sequentially and measurements are taken regularly until datapoints are equalled among the three dependent variables (figure 3). The researchers are assured that the intervention is effective when a change in a dependent variable appears after the intervention while the levels of the other dependent variables remain relatively constant. A basic requirement of multiple baseline designs is that the dependent variables are independent of one another.[14]
Fig. 3. A schematic figure of the multiple baseline design. After obtaining a stable baseline phase of three dependent variables (outcomes, settings or subjects), three interventions (interventions B, B′ and B′′) are introduced sequentially and measurements are taken regularly until datapoints are equalled among the three dependent variables.
The three main types of this design are multiple baselines across outcomes, settings and subjects.[24,25] In the multiple baseline design across outcomes, an intervention is applied sequentially to two or more performance or performance readiness measures within the same subject. To illustrate in soccer conditioning, multiple baselines across three soccer skills (dribble, pass and shoot) can be used to examine the effects of a coach’s specific training programme over 10 weeks (figure 4). The researchers would take baseline measurements (no skill training) on at least three occasions for a soccer player before introducing each training programme independently. Barlow and Hersen[14] recommended a minimum of three datapoints in the baseline phase. The coach and researchers would visually analyse each skill on a 10-point scale (from 10 = very, very good to 1 = very, very poor) and show whether the specific training programme had a positive effect on the soccer player’s skills.
Fig. 4. Hypothetical data of a multiple baseline design across outcomes. Dribble, pass and shoot skill scores on a 10-point scale (from 10 = very, very good to 1 = very, very poor) are plotted for 13 weeks. Three interventions (e.g. specific training programmes for dribble, pass and shoot) are introduced in week 3, week 5 and week 7, respectively.
In the multiple baseline design across settings, an intervention is introduced sequentially across two or more independent settings in a given subject. For example, the design can be used to investigate athletes in different environments (e.g. altitude, sea level) and training phases (e.g. preparation, tapering). Similarly, in the multiple baseline design across subjects, an intervention is applied sequentially to study one outcome across two or more matched subjects. The design is used to assess a single athlete but allows attempts to replicate the effects of the intervention for more than one athlete, by introducing the intervention successively to each athlete. In team sports, the coaches can demonstrate that the training programme affected the individual athletes (not as a group) by assessing whether performance changed only when the intervention was introduced for each athlete. Thus, the coaches can modify the training programme if it is not effective for enhancing performance. Shambrook and Bull[26] used multiple baselines across subjects to examine the effects of imagery training on free-throw performance in four female basketball players. The researchers divided 26 free-throw trials into two (baseline and intervention) phases and each subject began the intervention at a different point in time (the point of intervention for each subject was randomly determined before the study). Only one subject demonstrated an improved free-throw performance after the imagery training; the others showed poorer performance. Although datapoints in the baseline phase varied among subjects, the multiple baseline design across subjects is considered a replication of the AB design in a single subject.[14]
One of the limitations of multiple baseline designs is that some outcomes, settings, or subjects may be interdependent or interrelated.[14] A change in one outcome, setting, or subject may, therefore, influence another outcome, setting, or subject. In this case, the controlling effects of the intervention are open to question. Thus, outcomes, settings, or subjects should be independent of each other. Another limitation is that a substantial number of measurements may be needed in the baseline phase to demonstrate clear intervention effects in the intervention phase. Further, the dependent variable may change in the baseline phase before the intervention is applied due to practice effects and other confounding factors.[14] Alternating treatment designs may overcome these limitations.
1.4 Alternating Treatment Designs
Although alternating treatment designs are not
widely reported in the literature, the designs allow
the comparison of the effects of two or more interventions on the same dependent variable.[14] This
design involves the alternation of two or more types
of treatments or conditions (e.g. treatments A and B
are alternately implemented day by day) for a single
subject within one intervention phase (figure 5). The
alternating treatment design has been described as a
between-series strategy, where one is comparing
results between two separate series of datapoints,
whereas the reversal designs look at data within the
same series of datapoints (within-series).[14] Wolko
et al.[27] used the alternating treatment design to
compare the effects of standard coaching (as a baseline) and additional self-management skills (public
self-regulation as treatment 1 and private self-regulation as treatment 2) on five female gymnasts. All
three conditions were measured once during each
week of practice over 8 weeks. The order of interventions was randomly alternated across weeks. The
results showed treatment 2 was more effective than
treatment 1 in three of the five subjects, while one
subject demonstrated treatment 1 was more effective and one showed mixed results (standard coaching was most effective for frequency of attempted
skill and treatment 2 was most effective for frequency of completed skill).
Fig. 5. Hypothetical data of an alternating treatment design. Fatigue
levels on a 10-point scale (from 10 = very, very high to 1 = very,
very low) are plotted for 13 days. After a baseline phase, two
treatments, A and B (e.g. stretching and hydrotherapy, respectively), are introduced alternately during an intervention phase.
The main advantages of alternating treatment
designs are that they do not require a withdrawal of
intervention, the phases can be much shorter than in
AB designs, and a baseline phase is not an absolute
requirement.[21] However, carry-over effects from
one treatment to the next may exist, although the
random assignment of treatments or conditions can
reduce this problem.[21]
In summary, each of the four main experimental
designs in single-subject research (the AB design,
reversal/withdrawal designs, multiple baseline designs and alternating treatment designs) has its advantages and disadvantages. Choosing an appropriate one is based on available resources (e.g. time,
finances and other resources, and compliance of subjects) and current knowledge about the
phenomenon being studied.[14]
2. Data Analyses in Single-Subject
Experimental Designs
Systematic assessment is required to provide information about the progress of conditioning so that
the data can be used to adjust athletes’ training
programmes, competition schedules and lifestyle,
on a regular basis. This complex evaluation of conditioning is currently based on the coach’s intuitive
judgment or subjective visual analysis. However,
statistical analyses can assist in understanding successful loading patterns and objectively predicting
performance for each athlete. Although visual analysis is still common in single-subject experimental
designs, statistical analyses have been developed to
improve the analysis of single-subject data.[14,28,29]
2.1 Visual Analysis
Visual analysis (or inspection) is commonly used
to subjectively assess graphed data and judge whether an intervention has produced a significant change
in a dependent variable.[16,30,31] Data are plotted on a
graph (e.g. a line graph), where the horizontal (x)
axis is split according to units of time (e.g. minutes,
hours, days) and the vertical (y) axis contains units
of measurement of the subject’s dependent variable
(e.g. performance).[21] Visual analysis can be used
when there is a non-linear underlying temporal trend
in the dependent variable. These graphs are then
analysed by the researchers, independent judges, or
coaches.
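To make this concrete, the following minimal sketch (in Python, using the matplotlib library) shows how single-subject data might be graphed for visual analysis; the data, phase boundary and labels are hypothetical.

# Sketch: line graph of a hypothetical AB design for visual analysis.
# The data, phase boundary and axis labels are invented for illustration.
import matplotlib.pyplot as plt

days = list(range(1, 16))                      # x axis: time (days)
performance = [4, 5, 4, 5, 4,                  # phase A (baseline)
               6, 7, 7, 8, 8, 9, 8, 9, 9, 8]   # phase B (intervention)

plt.plot(days, performance, marker="o")
plt.axvline(x=5.5, linestyle="--")             # boundary between phases A and B
plt.xlabel("Days")
plt.ylabel("Self-referenced performance (1-10)")
plt.title("Hypothetical AB design")
plt.show()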
For example, Mattacola and Lloyd[30] used visual
analysis to assess the effect of a 6-week strength and
proprioception programme for three subjects who
had previously sustained ankle sprains. They reported that improvements in dynamic balance were observable through the researchers’ visual inspection.
In another study, performance decrements in maximum workload, maximal oxygen uptake and anaerobic threshold were observed by two independent,
experienced judges in an elite, ultra-endurance cyclist who developed chronic fatigue syndrome.[32]
The advantages of visual analysis are that it is
easy and inexpensive to use, widely recognised and
understood, and graphs can simplify the data.[31,33]
Visual analysis can be useful in practical or clinical
settings since it allows continuous monitoring of
performance.[31] However, as it has been shown that
there is often a lack of agreement among judges with
respect to the conclusion drawn from visual analysis, the accuracy and reliability of this particular
method of data analysis have been questioned.[34-36]
For example, DeProspero and Cohen[34] showed that
the inter-rater reliability coefficient of visual analysis of 36 ABAB reversal graphs showing hypothetical single-subject data was r = 0.61 for 114 experienced reviewers of behavioural journals. In that study, a set of graphs was constructed to illustrate four graphic factors representing characteristics of visual analysis: (i) pattern of mean shift; (ii) degree of
mean shift; (iii) fluctuation variation within phases;
and (iv) trend. For example, the first graphic factor
was represented by three patterns of mean shifts
across phases: ideal pattern, inconsistent treatment
pattern and irreversible effect pattern of results. The
evaluation of each figure was expressed as a rating
(“How satisfactory a demonstration of experimental
control do you consider this to be?”) on a 100-point
scale and the inter-rater reliability coefficient was
calculated by the Pearson product moment correlation. Ottenbacher[37] also conducted a meta-analysis
of visual analysis studies across 14 mental retardation studies. Each of the 14 studies was coded by
two examiners to establish the inter-rater agreement
of visual analysis. The overall inter-rater reliability
was r = 0.58, but details on how the code was set
were not provided. The inconsistency of the judgments is mainly due to the lack of any standard rules
or guidelines to make decisions about the result.[35]
Statistical analysis is, therefore, required to support
visual analysis or to replace visual analysis in conditioning research.
2.2 Statistical Analyses for Detecting
Significant Intervention Effects
Statistical significance in single-subject research
designs refers to the probability that an intervention
has a reliable or real effect on a dependent variable.[18] For example, coaches and sports scientists
may want to compare a new training method (e.g.
altitude training) as an intervention and a traditional
training method as a baseline on an elite athlete.
Statistical methods can be used to confidently assess
whether a practically significant change in the athlete’s performance has occurred as a result of the
new training method during a training season. However, it is important to note that levels of statistical
significance and practical (or clinical) meaningfulness are different.[14,38] In this regard, the current
literature suggests a series of statistical methods
aimed at determining the effect of an intervention by
comparing baseline and intervention phases. The
suggested statistical methods are time series analysis, randomisation tests, split-middle technique and
Revusky’s test of ranks.
In group research designs, statistical analysis
usually consists of testing to see whether differences
between groups are statistically significant. The assumption of independence of measurements, which
means that observations must be independent of one
another, is assured by randomly assigning subjects
to specific conditions.[38] However, the prerequisite
for parametric statistical analyses (i.e. assumption of
independence) used in group research designs is
often not met in single-subject studies. Therefore,
conventional statistical analyses used in group research designs (e.g. t and F tests) may not be applicable in single-subject research designs. Before implementing these statistical analyses in single-subject research designs, the issue of serial dependency
must be considered.[38]
2.2.1 Serial Dependency
When successive observations in single-subject
time series data are correlated, the data are said to be
serially dependent.[38-40] When serial dependency exists, the data observed on one occasion allow us to predict subsequent data in the series. Strictly, errors of measurement (residuals) associated with data at one point may be predictive of errors at other points in the series that follows.[41] Serial dependency can be assessed by examining an autocorrelation coefficient in the series.[24,38] The autocorrelation coefficient of lag 1, which is calculated by pairing temporally adjacent data (time t and time t−1 datapoints), is
generally deemed sufficient to reveal serial dependency in the series.[38] The lag-1 autocorrelation is
computed by pairing the initial with the second
datapoint, the second with the third datapoint and so
on until the second from the last is paired with the
last datapoint. If the autocorrelation coefficient is
not substantial, conventional statistical analyses
based on the assumption of independence of measurements such as t and F tests can be used to
analyse single-subject data. In contrast, if these tests
are applied to autocorrelated data, the results may lead to the false conclusion that the intervention was effective, since the precision of estimation will be
affected by the bias of serial dependency (type I
errors may be inflated).[38,39] Statistical tests such as
time series analysis and randomisation tests can be
used to analyse autocorrelated single-subject
data.[36]
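As an illustration, the lag-1 autocorrelation described above can be computed in a few lines of Python; the daily scores below are hypothetical.

# Sketch: lag-1 autocorrelation to screen single-subject data for
# serial dependency before choosing a statistical test.
import numpy as np

def lag1_autocorrelation(series):
    # Pair each datapoint with its successor and correlate the
    # deviations from the series mean.
    x = np.asarray(series, dtype=float)
    d = x - x.mean()
    return float(np.sum(d[:-1] * d[1:]) / np.sum(d ** 2))

daily_scores = [6, 6, 7, 7, 8, 8, 8, 9, 9, 9, 8, 9, 9, 10, 10]
r1 = lag1_autocorrelation(daily_scores)
print(f"lag-1 autocorrelation: {r1:.2f}")  # a substantial r1 argues against t/F tests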
2.2.2 Time Series Analysis
Time series analysis allows us to determine the
nature, magnitude and direction of the relationship
between variables that are measured at several equidistant points in time and assess the effects of interventions on a dependent variable in single-subject
studies.[42,43] For example, interrupted time-series
analysis techniques, which are used to analyse temporally ordered variables, make it possible to analyse autocorrelated single-subject data.[38,44,45]
Moreover, time series analysis does not depend upon stable baselines and provides important information about different characteristics of the performance change for adjacent phases.[38] The analysis can
be used to compare adjacent phases such as baseline
and intervention phases in terms of slope and level
(intercept). The method is especially useful to examine whether there is a statistically significant
change in slope and level rather than change in
overall mean.[43,45] The slope refers to the degree of
the angle of the trend line, which represents the rate of
change from one phase to the next, and the level
refers to the magnitude of change in data at the point
when the intervention is introduced.[25] If data at the
end of the baseline and the beginning of the intervention phases show an abrupt departure or discontinuity, this discontinuity would reflect a change in
level. The probability is computed by comparing the
obtained test statistics in terms of slope and level
between phases (e.g. phases A and B).
The disadvantage of time series analysis is the
complexity of the mathematical theories on which it
is based.[45] It also needs many datapoints to identify
the model accurately.[44,46] Some researchers have
suggested that at least 50 datapoints are required.[42,43] However, Crosbie[44] proposed an interrupted time series analysis correlation procedure
applicable for fewer datapoints than required in
traditional time series analysis. This method requires 10–20 datapoints in each phase to achieve
acceptable statistical power to detect significant differences.[44] Recently, Kinugasa et al.[3] showed a
significantly decreased slope and level in recovery
data (number of hours of physical therapy, day nap,
nutrition, bathing and active rest) for an elite collegiate tennis player between off-season and preseason using this method. They also used it to compare changes in conditioning indicators (e.g. training load, recovery, performance and performance
readiness) between training phases such as off-, pre- and in-seasons.
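The slope-and-level idea can be illustrated with a simple segmented regression on hypothetical data, as in the Python sketch below; this is not Crosbie's procedure and it ignores autocorrelation, which a full interrupted time series analysis would model explicitly.

# Sketch: segmented (interrupted) regression comparing slope and level
# across baseline and intervention phases. Illustrative only: it ignores
# the autocorrelation that a full time series analysis would model.
import numpy as np

y = np.array([5, 5, 6, 5, 6, 6, 5,             # baseline phase (A)
              8, 8, 9, 9, 10, 10, 11], float)  # intervention phase (B)
n_a = 7                                        # number of baseline datapoints
t = np.arange(len(y), dtype=float)
phase = (t >= n_a).astype(float)               # 0 = baseline, 1 = intervention
t_since = np.where(phase == 1, t - n_a, 0.0)   # time elapsed within phase B

# y = b0 + b1*t + b2*phase + b3*t_since; b2 = change in level, b3 = change in slope
X = np.column_stack([np.ones_like(t), t, phase, t_since])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(f"change in level: {b[2]:.2f}, change in slope: {b[3]:.2f}")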
Tryon[46] proposed the C statistic as a simple
method of time series analysis for use with as few as
eight datapoints in each phase. The C statistic does
not require a complex computer program and is
easily calculated using Young’s table.[47] The first
step is to calculate the Z value (the normalised C
statistic) of baseline data to examine whether the
data are stable, since interpretation becomes difficult when the baseline data have a significant
trend.[48] The value of C is given by equation 1:[46]
C = 1 − [Σ(i=1 to n−1) (xi − xi+1)²] / [2 Σ(i=1 to n) (xi − x̄)²] (Eq. 1)
where xi is the ith datapoint, x̄ is the mean of the series and n is the total number of datapoints. The standard error of C is calculated by equation 2, which gives the Z value (Z = C/SC):[46]
SC = √[(n − 2) / ((n − 1)(n + 1))] (Eq. 2)
In this regard, we recommend starting data collection in the off-season (no training or minimal
training) when monitoring an athlete’s conditioning.
When the baseline is stable, the next step is to
compare the significance of the change between
normalised baseline and intervention data. Because
the C statistic cannot control for a type I error when
a significant autocorrelation is observed,[49] Yamada[50] suggested that randomisation tests are more
appropriate for single-subject data than the C statistic.
2.2.3 Randomisation Tests
A randomisation test is a permutation test based
on randomisation (random assignment) to test a null
hypothesis about intervention effects.[51] The randomisation tests have been recently proposed as a
valid method of data analysis for single-subject research.[39,51,52] Randomisation tests are a non-parametric procedure that can be applied to single-subject data when the assumptions of parametric tests
are not tenable.[51,53] Randomisation tests can be
used to determine whether the effect of a specific
intervention on the outcome is statistically significantly different from zero or from that of another
intervention. To conduct a valid randomisation test,
the study design must incorporate random assignment. For example, if we are to establish whether
conditioning regimen A is more beneficial than conditioning regimen B, we need to randomly assign
treatment A to half of the points in time at which the
outcome variable is going to be assessed. Intervention B would be assigned to the remaining points in
time. The basic approach to randomisation tests is
straightforward. We formulate our hypothesis,
which, for example, could read “Intervention A has
the same effect on the athlete’s performance as
intervention B”. We choose a statistic to test our
hypothesis (e.g. t-test). We generate a null reference
distribution by randomly shuffling the observed data
of the outcome variable over the entire sequence.
We assign the first n1 observations to the first condition (intervention A) and the remaining n2 observations to the second condition (intervention B). We
calculate the test statistics for the reshuffled data.
We repeat the last 3 steps k times (usually more than
1000). We calculate the proportion of times the test
statistic on the randomised data exceeded that on the
original data, which represents the probability of
obtaining the original test statistic under the null
hypothesis. We reject or retain the null hypothesis
on the basis of this probability.[28,41]
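The following Python sketch implements the reshuffling procedure just described for two hypothetical sets of measurements, using the difference in means as the test statistic.

# Sketch of the randomisation test described above: interventions A and B
# were randomly assigned to measurement occasions, and the observed mean
# difference is compared with a null distribution built by reshuffling.
import random

a = [8, 9, 9, 10, 8, 9]   # outcomes measured under intervention A
b = [6, 7, 6, 7, 7, 6]    # outcomes measured under intervention B

def mean_diff(x, y):
    return sum(x) / len(x) - sum(y) / len(y)

observed = mean_diff(a, b)
pooled = a + b
count = 0
k = 5000                  # number of reshuffles (usually more than 1000)
random.seed(1)
for _ in range(k):
    random.shuffle(pooled)
    if abs(mean_diff(pooled[:len(a)], pooled[len(a):])) >= abs(observed):
        count += 1
p = count / k             # probability of the observed difference under H0
print(f"observed difference = {observed:.2f}, p = {p:.3f}")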
The statistical power of randomisation tests (the
probability that the null hypothesis will be rejected
when it is not true) is relatively low.[54,55] For example, Ferron and Ware[55] estimated the power was
0.40 using an AB design with 30 datapoints when
there was no autocorrelation. Cohen[56,57] suggested
0.80 as an acceptable power when the alpha level (p
value) is 0.05. However, the power of randomisation
tests also depends on the choice of the design.[54]
Ferron and Onghena[54] reported that the design involving the random assignment of interventions to
phases was more powerful than the ABAB design
with three randomly assigned interventions. An acceptable power (>0.80) was obtained when Cohen’s
effect sizes were large (1.1 and 1.4), and phase
lengths (datapoints) exceeded five.[55]
One of the advantages of randomisation tests is
that they are more efficient in controlling type I
errors than the C statistic is.[50] Fewer datapoints are required than for time series analysis (i.e. a minimum of 50 datapoints). However, it should be noted that in conditions of significant positive autocorrelation among datapoints, to control type I errors,
a more conservative probability level (e.g. 0.01)
should be adopted when using randomisation
tests.[58] Finally, various methods (e.g. analysis of
variance, correlation analysis) appropriate for the
analysis of single-subject data have been developed,
although these methods have not yet been used in
conditioning research.[36,38]
Recently, Kinugasa et al.[3] monitored two tennis
players for a 6-month training season and used a
randomisation test to analyse mean values of conditioning indicators (training load, recovery and performance readiness) across training phases. One
subject’s performance readiness data significantly
increased from the beginning to the end of the
training season, but this was not observed in the
other subject. The researchers suggested that the
randomisation test can be used to objectively assess
changes in how the athlete responded to a conditioning regimen.
2.2.4 Split-Middle Technique
The combined use of the split-middle technique
(celeration line approach) and a binomial test provides a non-parametric method to reveal the nature
of the trend in the data and can be used to predict an
athlete’s performance over time.[59-61] The split-middle technique is easy to compute and can be used
with a small number of datapoints.[21] The aim of
this technique is to assess the effect of a specific
intervention by providing a method of describing the
rate of change in the outcome variable for a single
individual. This technique has been proposed primarily to describe the process of change in the outcome
within and across intervention conditions. This is
achieved by plotting trends within conditions (e.g.
baseline and intervention) to characterise an athlete’s progress. Statistical significance can be examined once the trend lines have been determined.[21]
The split-middle technique involves multiple
steps. From the data of each phase of the study (e.g.
baseline and intervention), a trend or split-middle
line is constructed to characterise the rate of performance over time for a specific phase. This line is
situated so that 50% of the data fall on or above the
line and 50% fall on or below the line. This is done
by dividing the data from a specific phase in half by
drawing a solid vertical line to separate the first half
of the datapoints from the second half. Next, each of
the halves is divided in half by dashed vertical lines.
The median value of the outcome variable is plotted
for each half of the phase. A straight line is drawn
through the two median points denoting the rate of
change for that particular phase. Subsequently, the
trend line from a phase of the study (e.g. baseline) is
extended into the following phase of the study (e.g.
intervention) as a dashed line.[21] A binomial test can
be used to determine the statistical significance of
the intervention effect by establishing if the number
of datapoints above the projected line in the intervention phase is of a sufficiently low probability not
to be attributed to chance.[62]
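A Python sketch of these steps on hypothetical data (a baseline split-middle line projected into the intervention phase, followed by a one-sided binomial test) is given below.

# Sketch: split-middle (celeration) line from the baseline phase, projected
# into the intervention phase, with a binomial test on the number of
# intervention datapoints falling above the projected line.
import statistics
from math import comb

baseline = [4, 5, 4, 5, 5, 4, 5, 6]        # phase A, sessions 1..8
intervention = [6, 7, 7, 8, 7, 8, 9, 8]    # phase B, sessions 9..16

half = len(baseline) // 2
x1 = statistics.median(range(1, half + 1))                  # median session, 1st half
x2 = statistics.median(range(half + 1, len(baseline) + 1))  # median session, 2nd half
y1 = statistics.median(baseline[:half])
y2 = statistics.median(baseline[half:])
slope = (y2 - y1) / (x2 - x1)

def projected(session):
    return y1 + slope * (session - x1)     # baseline trend extended forward

above = sum(y > projected(s)
            for s, y in enumerate(intervention, start=len(baseline) + 1))
n = len(intervention)
# one-sided binomial test: chance of >= `above` points above the line if p = 0.5
p = sum(comb(n, k) for k in range(above, n + 1)) / 2 ** n
print(f"slope = {slope:.2f}, points above projection = {above}/{n}, p = {p:.3f}")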
There have been few experimental studies using
the split-middle technique, none of which have been
directly related to conditioning.[23,26,63,64] For example, Marlow et al.[63] used the split-middle technique
to analyse the effect of a pre-performance routine on
the water polo penalty shot. The results revealed
21–28% performance improvements in all three
male water polo players between baseline and intervention phases. Shambrook and Bull[26] also reported a multiple-baseline design study examining the
impact of an imagery training routine on basketball
free-throw performance for four female basketball
players over 12 weeks. They showed that only one
subject demonstrated a consistent performance improvement in terms of mean (4%), slope and level
(6%) after the imagery training. Although the use of
inferential statistics accompanying this method of
data analysis is problematic when the data are
autocorrelated,[49] the split-middle technique may be
a useful descriptive technique for conditioning research.
2.2.5 Revusky’s Test of Ranks (Rn)
Revusky’s test of ranks has been proposed for
examining the effect of an intervention in studies
with a multiple baseline design in which data are
collected across several outcomes, settings, or subjects (see section 1.3).[28,38,65,66] The intervention
effect is determined by assessing the performance of
each of the baselines at the point when the intervention is introduced. For example, in a multiple baseline design across different outcomes, each outcome
is treated as a sub-experiment. When the intervention is introduced for a specific outcome, the performance of all outcomes is ranked for that point in
time. To account for baseline differences in magnitude or measurement units across outcomes, the
ranks are based on the percentage change in level
from baseline to the time when the intervention is
introduced to any of the outcomes. The sum of the
ranks across all sub-experiments each time the intervention is introduced constitutes the statistic Rn.
This statistic reflects whether the intervention had a
significant effect on the various aspects of performance.
One of the limitations of the test is that the
performance must change dramatically across
phases to be reflected in the ranks.[38] More importantly, the analysis of the change in level alone may
result in erroneous conclusions with regard to the
effect of the intervention. Hence, it is suggested that
an examination of both changes in levels and slopes
be conducted.[66]
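A Python sketch of the ranking step, using invented percentage changes for the soccer-skills example of section 1.3, follows; the significance of Rn would still need to be judged against tabled critical values.

# Sketch: the ranking step behind Revusky's Rn for a multiple baseline
# design across three outcomes. At each point an intervention is introduced,
# the percentage change from baseline is ranked across all outcomes, and the
# rank of the just-treated outcome contributes to Rn. Data are invented.
pct_change = {
    "dribble": [25.0, 4.0, 2.0],   # treated at intervention point 1
    "pass":    [3.0, 30.0, 5.0],   # treated at intervention point 2
    "shoot":   [1.0, 2.0, 28.0],   # treated at intervention point 3
}
outcomes = ["dribble", "pass", "shoot"]
rn = 0
for point, treated in enumerate(outcomes):
    changes = [pct_change[o][point] for o in outcomes]
    # rank 1 = largest percentage change at this intervention point
    rank = sorted(changes, reverse=True).index(pct_change[treated][point]) + 1
    rn += rank
print(f"Rn (sum of ranks of the treated outcomes) = {rn}")  # small Rn suggests an effect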
In summary, there are advantages and disadvantages associated with each type of statistical analysis
(table I). It is important to recognise the limitations
of each method before applying any of these statistical analyses to single-subject data. Currently, a combination of randomisation tests and interrupted time
series analysis, examining the changes in means,
slopes and levels is recommended for assessing the
effects of an intervention in single-subject conditioning studies. As far as the other statistical methods are concerned, randomisation tests are simple
statistical tools to use but they lack statistical power.[54,55] On the other hand, time series analysis is
appropriate for conditioning research but its theory
and technique are complex.[45]
Table I. Statistical methods for single-case experimental data
Method                         Sample size               Autocorrelated data   Use
Analysis of variance           Small to moderate (>30)   No                    AB, ABA, ATD
The C statistic                Small (>16)               No                    AB, ABA
Time series analysis           Large (>50)               Yes                   AB, model identification
Interrupted time series        Small (>20)               Yes                   AB
Randomisation tests            Small (>30)               Yes                   AB, ABA, ATD, multi
Split-middle technique         Small (>30)               No                    AB, ABA, multi
Revusky's test of ranks        Small (>30)               Yes                   Multi
P-technique factor analysis    Large (>100)              No                    Model identification
Dynamic factor analysis        Large (>100)              Yes                   Model identification
Structural equation modelling  Large (>100)              No                    Model identification
AB = AB design; ABA = ABA design; ATD = alternating treatments design; multi = multiple baseline design.
3. Modelling of Multivariate Data to
Predict Performance for a Single Subject
A multifaceted approach is needed to understand
the complex phenomenon of conditioning and identify causal dose-response relationships among training load, recovery, performance and performance
readiness. Researchers have collected multivariate
data to assess elite athletes’ conditioning (e.g. training load, recovery, performance and performance
readiness),[67-69] but have not elaborated on the
methods of analysis of such data derived from studies adopting a single-subject design. P-technique
factor analysis, dynamic factor analysis and structural equation modelling (SEM) are statistical methods that can be used for the analysis of multivariate
time series data in single-subject studies on conditioning. Whilst the use of the first two methods is
mainly limited to the reduction of the number of
observed variables to a few underlying factors, SEM
is usually employed to establish whether the observed data support the existence of hypothetical
causal relationships between variables.
3.1 P-Technique Factor Analysis
Traditional factor analysis used in cross-sectional
studies is known as the R-technique with the dataset
(score matrix) having subjects (n) as rows and variables (p) as columns (n × p).[70] Cattell[70] introduced
the P-technique factor analysis to assess time series
data (repeated measurements on a single subject
over time). P-technique factor analysis uses a T × r
matrix, where T is the number of occasions and r is
the number of variables measured on a single subject. For example, Kinugasa et al.[3] used P-technique factor analysis with a principle component
solution (principle component analysis) to summarise 31 measures of an athlete’s conditioning to
seven factors (components) [e.g. training achievement, recovery achievement, sleep, specific training, fatigue, physiological needs and physical training] and identified an individualised model of conditioning.
P-technique factor analysis analyses the structure
of the correlations among variables to define common underlying factors so that multivariate time
series data can be summarised as a few factors or a
single factor score. However, one of the limitations
of the P-technique factor analysis stems from the
fact that it takes into account only simultaneous
relationships among components of multivariate
time series data. Anderson[71] criticised the technique for its failure to incorporate a lagged covariance (a relationship between two datasets with time
lags) structure of the time series. Moreover, the
factor loading estimates are lower than their true
value in P-technique models when serial dependency exists.[72]
3.2 Dynamic Factor Analysis
Recently, dynamic factor analysis has been proposed for analysing multivariate time series data
with serial dependency to overcome the limitations
of the P-technique factor analysis.[72-74] This type of
factor analysis represents a generalisation of the Ptechnique factor model that incorporates the lagged
covariance structure among multivariate time series
data. As such, the technique accounts for the presence of serial dependency in the data.
Although dynamic factor analysis can be computed using traditional SEM software (e.g. LISREL
and EQS), the usual input variance/covariance matrix created by these programmes is not adequate
because it does not include the lagged covariances
among multivariate time series data.[72,75] Performing dynamic factor analysis using SEM requires
specifying a symmetric covariance matrix containing the lagged covariances. This is done by organising the inherently asymmetric lagged covariance
matrix as a block-Toeplitz matrix. A block-Toeplitz
matrix replicates the information about the blocks of
simultaneous and lagged covariances across variables to construct a variance-covariance matrix that
is square and symmetric. In this regard, Wood and
Brown[72] constructed a series of SAS macro programs to obtain a Toeplitz-transformed matrix of
simultaneous and lagged covariances, which can
then be used as the input variance/covariance matrix
in standard SEM software such as LISREL and
EQS.
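The following Python sketch shows the block-Toeplitz idea for lags 0 and 1 on hypothetical data; Wood and Brown's SAS macros generalise this construction to more variables and lags.

# Sketch: building a block-Toeplitz matrix of simultaneous (lag 0) and
# lag-1 covariances, the kind of input matrix dynamic factor analysis
# requires in place of the ordinary covariance matrix.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 3))           # 200 occasions, 3 observed variables
xc = x - x.mean(axis=0)

c0 = xc.T @ xc / (len(x) - 1)           # lag-0 (simultaneous) covariances
c1 = xc[:-1].T @ xc[1:] / (len(x) - 2)  # lag-1 covariances (t with t+1)

# 2-block Toeplitz arrangement: square and symmetric, so standard SEM
# software can accept it as an input covariance matrix.
toeplitz = np.block([[c0, c1.T],
                     [c1, c0]])
print(toeplitz.shape)                   # (6, 6)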
3.3 Structural Equation Modelling
SEM is a general framework for statistical analysis that includes as special cases several traditional
multivariate methods such as factor analysis, multiple regression analysis and discriminant analysis.[76-78] SEM is used to estimate hypothetical causal
relationships amongst manifest (measured) and latent variables. Consequently, this technique may be
useful for examining causal relationships among
training load, recovery, performance and performance readiness. SEM may be used to examine both
linear and non-linear relationships between variables.[76,79]
Structural equation models are often visualised
by a graphical path diagram, which depicts the expected relationship between the examined variables.[80] In general, the aim of SEM is to assess the
discrepancy (lack of fit) between the observed
covariance matrix and an expected covariance matrix based on a hypothetical path diagram.
SEM can be used with cross-sectional and repeated measures data. One of the first SEM methods for
the analysis of repeated measures data is the
autoregressive cross-lagged panel model (ARCL).[81] This modelling strategy comprises two
main components. First, later measures of a construct (e.g. current performance) are predicted by
earlier measures of the same construct (e.g. last
weeks’ performance), thus giving rise to the term
‘autoregressive’. Secondly, later measures of one
construct (e.g. current performance) can be regressed on earlier measures of other constructs (e.g.
yesterday’s mood). This second component gives
rise to the term ‘cross-lagged’. Despite the widespread use of the ARCL modelling approach, this
analytic technique has been criticised for both statistical and theoretical reasons. For example, ARCL
models usually omit the observed mean structure.
This means that potentially important information
about the mean changes over time is ignored. Additionally, in ARCL models, the changes in the construct between two timepoints are independent of
the influence of earlier and later changes in the same
construct. This approach reduces multiple repeated
measures into a series of two-timepoint comparisons, which is seldom consistent with the structure of
the observed data.
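To illustrate the two components, the following Python sketch reduces the ARCL idea to a single ordinary regression on simulated data; a full ARCL model would be fitted in SEM software, and the variables and coefficients here are invented.

# Sketch: the autoregressive cross-lagged idea as one ordinary regression:
# current performance predicted from last occasion's performance
# (autoregressive term) and last occasion's mood (cross-lagged term).
import numpy as np

rng = np.random.default_rng(0)
T = 60
mood = rng.normal(size=T)
perf = np.zeros(T)
for t in range(1, T):   # simulate: performance carries over and mood feeds forward
    perf[t] = 0.6 * perf[t - 1] + 0.3 * mood[t - 1] + rng.normal(scale=0.5)

X = np.column_stack([np.ones(T - 1), perf[:-1], mood[:-1]])
b, *_ = np.linalg.lstsq(X, perf[1:], rcond=None)
print(f"autoregressive coefficient: {b[1]:.2f}, cross-lagged coefficient: {b[2]:.2f}")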
Latent curve analysis is a SEM method that has
been developed to overcome the limitations of the
ARCL method. Latent curve analysis aims at exploring the unobserved factors that are hypothesised
to underlie temporal changes in and relationships
among variables.[82] The latent curve model attempts
to estimate the continuous trajectory that gave rise to
the observed repeated measures. Both ARCL models and latent curve analysis may be useful for
building individualised models of conditioning to
predict performance in individual athletes.
4. General Issues in
Single-Subject Research
Inter-individual differences in response to athletes’ conditioning should be considered in assessing changes in conditioning data, especially in elite
athletes.[3,83] Single-subject experimental designs
address this issue, but with the most obvious problem that the results are unlikely to be relevant to
other individuals.[14,84,85]
As mentioned in section 2.2, it is important to
recognise that statistical significance and practical
meaningfulness are two different things.[14,38] Practical meaningfulness refers to the practical value or
importance of the effect of an intervention.[18] For
example, performance may be enhanced to a level
that is important to an athlete, but the change may
not be statistically significant. Statistical significance is not the only way to assess an elite athlete’s
conditioning. If the variable being measured has a
sufficiently small error of measurement, we can
detect the smallest worthwhile change with repeated
measurement of the variable. Thus, error of measurement must be taken into account in a preliminary reliability study to make sure the variable is worth
measuring.[86] Hopkins[87] has addressed the various
ways to estimate the chances that a substantial
(greater than the smallest practically meaningful
effect) change has occurred after an intervention.
Therefore, it is necessary to interpret data carefully
before reaching any conclusions in single-subject
research. Theoretical and practical considerations
and visual analysis by coaches and sports scientists
are helpful in supporting findings.
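A minimal Python sketch of this kind of reasoning follows, with invented numbers for the observed change, typical error and smallest worthwhile change; the formulation follows the approach Hopkins describes only loosely.

# Sketch: estimate the chance that an athlete's true change exceeds the
# smallest worthwhile change, given the observed change and the typical
# (measurement) error from a preliminary reliability study. Numbers invented.
from math import erf, sqrt

observed_change = 1.2   # e.g. % change in performance after the intervention
typical_error = 0.8     # typical error of measurement from a reliability study
swc = 0.5               # smallest practically worthwhile change

se_change = typical_error * sqrt(2)    # error of a difference of two measurements
z = (observed_change - swc) / se_change
chance = 0.5 * (1 + erf(z / sqrt(2)))  # normal CDF at z
print(f"chance the true change exceeds the SWC: {chance:.0%}")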
5. Conclusion
Despite the problems associated with data analyses in single-subject research (e.g. serial dependency and generality of findings), it is recommended
that sports scientists use single-subject experimental designs in applied conditioning research
to assess the effect of an intervention (e.g. a specific
training method) and to predict performance for a
particular athlete. The single-subject approach is
applicable to specific categories of subjects such as
elite or overtrained athletes.
Single-subject research designs are rare in the
literature.[88] Additionally, data from these rare studies have most often been assessed by visual analysis,
a fairly subjective technique of data analysis. We
believe that the use of statistical analyses in conjunction with visual analysis will produce more reliable and objective findings than those based on
visual analysis alone.[36] Further, we believe the
application of single-subject experimental designs
and data analyses for the assessment of athlete conditioning has an important place in effectively and
efficiently monitoring elite athletes.
Acknowledgements
The authors wish to gratefully acknowledge Professor
Will G. Hopkins, Auckland University of Technology, Auckland, New Zealand, for his invaluable contribution to this
manuscript.
The authors have provided no information on sources of
funding or on conflicts of interest directly relevant to the
content of this review.
References
1. Lehmann M, Baumgartl P, Wiesenack C, et al. Training-overtraining: influence of a defined increase in training volume vs
training intensity on performance, catecholamines and some
metabolic parameters in experienced middle- and long-distance runners. Eur J Appl Physiol 1992; 64: 169-77
2. Mujika I, Chatard JC, Busso T, et al. Effects of training on
performance in competitive swimming. Can J Appl Physiol
1995; 20: 395-406
3. Kinugasa T, Miyanaga Y, Shimojo H, et al. Statistical evaluation of conditioning using a single-case design. J Strength
Cond Res 2002; 16: 466-71
4. Rowbottom DG, Morton A, Keast D. Monitoring for overtraining in the endurance performer. In: Shephard RJ, Åstrand PO,
editors. Endurance in sport: volume II of the encyclopaedia of
sports medicine. 2nd ed. Malden (MA): Blackwell Scientific,
2000: 486-504
5. Bompa TO. Theory and methodology of training: the key to
athletic performance. 3rd ed. Dubuque (IA): Kendall/Hunt,
1994
6. Matveyev L. Fundamentals of sports training. Moscow: Progress Publishers, 1981
7. Bagger M, Petersen PH, Pedersen PK. Biological variation in
variables associated with exercise training. Int J Sports Med
2003 Aug; 24 (6): 433-40
8. Martin DT, Andersen MB, Gates W. Using profile of mood
states (POMS) to monitor high-intensity training in cyclists:
group versus case studies. Sport Psychol 2000; 14: 138-56
9. Boulay MR. Physiological monitoring of elite cyclists: practical
methods. Sports Med 1995; 20: 1-11
10. Fry RW, Morton AR, Keast D. Periodisation of training stress: a
review. Can J Sport Sci 1992; 17: 234-40
11. Pyne DB, Gleeson M, McDonald WA, et al. Training strategies
to maintain immunocompetence in athletes. Int J Sports Med
2000; 21: S51-60
12. Banister EW, Carter JB, Zarkadas PC. Training theory and
taper: validation in triathlon athletes. Eur J Appl Physiol 1999;
79: 182-91
13. Busso T. Variable dose-response relationship between exercise
training and performance. Med Sci Sports Exerc 2003; 35:
1188-95
14. Barlow DH, Hersen M. Single-case experimental designs: strategies for studying behavior change. 2nd ed. New York: Pergamon Press, 1984
15. Backman CL, Harris SR. Case studies, single subject research,
and N of 1 randomized trials: comparison and contrasts. Am J
Phys Med Rehabil 1999; 78: 170-6
16. Bobrovitz CD, Ottenbacher KJ. Comparison of visual inspection and statistical analysis of single-subject data in rehabilitation research. Am J Phys Med Rehabil 1998; 77: 94-102
17. Hartmann DP. Forcing square pegs into round holes: some
comments on ‘an analysis-of-variance model for the intrasubject replication design’. J Appl Behav Anal 1974; 7: 635-8
18. Kazdin AE. Research design in clinical psychology. 3rd ed.
Needham Heights (MA): Allyn and Bacon, 1998
19. Campbell DT. Reforms as experiments. Am Psychol 1969; 24:
409-29
20. Lerner BS, Ostrow AC, Yura MT, et al. The effect of goal-setting and imagery training programs on the free-throw performance of female collegiate basketball players. Sport
Psychol 1996; 10: 382-97
21. Zhan S, Ottenbacher KJ. Single subject research designs for
disability research. Disabil Rehabil 2001; 23: 1-8
22. Bryan AJ. Single-subject designs for evaluation of sport psychology interventions. Sport Psychol 1987; 1: 283-92
23. Callow N, Hardy L, Hall C. The effect of a motivational
general-mastery imagery intervention on the sport confidence
of high-level badminton players. Res Q Exerc Sport 2001; 72:
389-400
24. Neuman SB, McCormick S. Single-subject experimental research: applications for literacy. Newark (DE): International
Reading Association, 1995
25. Richards SB, Taylor RL, Ramasamy R, et al. Single subject
research: applications in educational and clinical settings. San
Diego (CA): Singular Publishing Group, 1999
26. Shambrook CJ, Bull SJ. The use of a single-case research design
to investigate the efficacy of imagery training. J Appl Sport
Psychol 1996; 8: 27-43
27. Wolko KL, Hrycaiko DW, Martin GL. A comparison of two
self-management packages to standard coaching for improving
practice performance of gymnasts. Behav Modif 1993; 17:
209-23
28. Kazdin AE. Single-case research design: methods for clinical
and applied settings. New York: Oxford University Press,
1982
29. Kratochwill TR, Levin JR. Single-case research design and
analysis: new directions for psychology and education.
Hillsdale (NJ): Lawrence Erlbaum Associates, 1992
30. Mattacola CG, Lloyd JW. Effects of a 6-week strength and
proprioception training program on measures of dynamic balance: a single-case design. J Athl Train 1997; 32: 127-35
31. Parsonson BS, Baer DM. The visual analysis of data, and
current research into the stimuli controlling it. In: Kratochwill
TR, Levin JR, editors. Single-case research design and analysis: new directions for psychology and education. Hillsdale
(NJ): Lawrence Erlbaum Associates, 1992: 15-40
32. Rowbottom DG, Keast D, Green S, et al. The case history of an
elite ultra-endurance cyclist who developed chronic fatigue
syndrome. Med Sci Sports Exerc 1998; 30: 1345-8
33. Ferron J, Foster-Johnson L. Analyzing single-case data with
visually guided randomization tests. Behav Res Methods Instrum Comput 1998; 30: 698-706
34. DeProspero A, Cohen S. Inconsistent visual analyses of intrasubject data. J Appl Behav Anal 1979; 12: 573-9
35. Ottenbacher KJ. Reliability and accuracy of visually analyzing
graphed data from single-subject designs. Am J Occup Ther
1986; 40: 464-9
36. Yamada T. Introduction of randomization tests as methods for
analyzing single-case data. Jpn J Behav Anal 1998; 13: 44-58
37. Ottenbacher KJ. Interrater agreement of visual analysis in single-subject decisions: quantitative review and analysis. Am J
Ment Retard 1993; 98: 135-42
38. Kazdin AE. Statistical analyses for single-case experimental
designs. In: Barlow DH, Hersen M, editors. Single case experimental designs: strategies for studying behavior change. 2nd
ed. New York: Pergamon Press, 1984: 285-321
39. Busk PL, Marascuilo LA. Statistical analysis in single-case
research: issues, procedures, and recommendations, with application to multiple behaviors. In: Kratochwill TR, Levin JR,
editors. Single-case research design and analysis: new directions for psychology and education. Hillsdale (NJ): Lawrence
Erlbaum Associates, 1992: 159-85
40. McCleary R, Welsh WN. Philosophical and statistical foundations of time-series experiments. In: Kratochwill TR, Levin
JR, editors. Single-case research design and analysis: new
directions for psychology and education. Hillsdale (NJ): Lawrence Erlbaum Associates, 1992: 41-91
41. Todman JB, Dugard P. Single-case and small-n experimental
designs: a practical guide to randomization tests. Mahwah
(NJ): Lawrence Erlbaum Associates, 2001
42. Box GEP, Jenkins GM. Time series analysis: forecasting and
control. Rev ed. San Francisco (CA): Holden-Day, 1976
43. Glass GV, Willson VL, Gottman JM. Design and analysis of
time-series experiments. Boulder (CO): Colorado Associated
University Press, 1975
44. Crosbie J. Interrupted time-series analysis with brief single-subject data. J Consult Clin Psychol 1993; 61: 966-74
45. Hartmann DP, Gottman JM, Jones RR, et al. Interrupted time-series analysis and its application to behavioral data. J Appl
Behav Anal 1980; 13: 543-59
46. Tryon WW. A simplified time-series analysis for evaluating
treatment interventions. J Appl Behav Anal 1982; 15: 423-9
47. Young LC. On randomness in ordered sequences. Ann Math
Stat 1941; 12: 293-300
48. Blumberg CJ. Comments on ‘A simplified time-series analysis
for evaluating treatment interventions’. J Appl Behav Anal
1984; 17: 539-42
49. Crosbie J. The inappropriateness of the C statistic for assessing
stability or treatment effects with single-subject data. Behav
Assess 1989; 11: 315-25
50. Yamada T. Applications of statistical tests for single-case data:
power comparison between randomization tests and C statistic
[in Japanese]. Jpn J Behav Anal 1999; 14: 87-98
51. Edgington ES. Randomization tests. 3rd ed. New York: Marcel
Dekker, 1995
52. Levin JR, Marascuilo LA, Hubert LJ. N = nonparametric
randomization tests. In: Kratochwill TR, Levin JR, editors.
Single-case research design and analysis: new directions for
psychology and education. Hillsdale (NJ): Lawrence Erlbaum
Associates, 1992: 159-85
53. Edgington ES. Statistical inference from n = 1 experiments. J
Psychol 1967; 65: 195-9
54. Ferron J, Onghena P. The power of randomization tests for
single-case phase designs. J Exp Educ 1996; 64: 231-9
55. Ferron J, Ware W. Analyzing single-case data: the power of
randomization tests. J Exp Educ 1995; 63: 167-78
56. Cohen J. Statistical power analysis for the behavioral sciences.
2nd ed. Hillsdale (NJ): Lawrence Erlbaum, 1988
57. Cohen J. A power primer. Psychol Bull 1992; 112: 155-9
58. Gorman BS, Allison DB. Statistical alternatives for single-case
designs. In: Franklin RD, Allison DB, Gorman BS, editors.
Design and analysis of single-case research. Mahwah (NJ):
Lawrence Erlbaum Associates, 1996
59. White OR. A glossary of behavioral terminology. Champaign
(IL): Research Press, 1971
60. White OR. A manual for the calculation and use of the median
slope: a technique of progress estimation and prediction in the
single case. Eugene (OR): University of Oregon, Regional
Resource Center for Handicapped Children, 1972
61. White OR. The ‘Split Middle’: a ‘Quickie’ method of trend
estimation. Seattle (WA): University of Washington, Experimental Education Unit, Child Development and Mental Retardation Center, 1974
62. Nourbakhsh MR, Ottenbacher KJ. The statistical analysis of
single-subject data: a comparative examination. Phys Ther
1994; 74: 768-76
63. Marlow C, Bull SJ, Heath B, et al. The use of a single case
design to investigate the effect of a pre-performance routine on
the water polo penalty shot. J Sci Med Sport 1998; 1: 143-55
64. Silliman LM, French R. Use of selected reinforcers to improve
the ball kicking of youths with profound mental retardation.
Adapt Phys Activ Q 1993; 10: 52-69
65. Revusky SH. Some statistical treatments compatible with individual organism methodology. J Exp Anal Behav 1967; 10:
319-30
66. Wolery M, Billingsley FF. The application of Revusky’s Rn test
to slope and level changes. Behav Assess 1982; 4: 93-103
67. Mackinnon LT, Hooper SL. Overtraining and overreaching:
causes, effects, and prevention. In: Garrett Jr WE, Kirkendall
DT, editors. Exercise and sport science. Philadelphia (PA):
Lippincott Williams & Wilkins, 2000: 487-98
68. McKenzie DC. Markers of excessive exercise. Can J Appl
Physiol 1999; 24: 66-73
69. Rowbottom DG, Keast D, Morton A. Monitoring and prevention of overreaching and overtraining in endurance athletes. In:
Kreider RB, Fry AC, O’Toole ML, editors. Overtraining in
sport. Champaign (IL): Human Kinetics, 1998: 47-66
70. Cattell RB. Factor analysis. New York: Holt, 1952
71. Anderson TW. The use of factor analysis in the statistical
analysis of multiple time series. Psychometrika 1963; 28: 1-25
72. Wood P, Brown D. The study of intraindividual differences by
means of dynamic factor models: rationale, implementation,
and interpretation. Psychol Bull 1994; 116: 166-86
73. Molenaar PCM. A dynamic factor model for the analysis of
multivariate time series. Psychometrika 1985; 50: 181-202
74. Molenaar PCM, Rovine MJ, Corneal SE. Dynamic factor analysis of emotional dispositions of adolescent stepsons toward
their stepfathers. In: Silbereisen R, von Eye A, editors. Growing up in times of social change. New York: DeGruyter, 1999:
287-318
75. Hershberger SL, Corneal SE, Molenaar PCM. Dynamic factor
analysis: an application to emotional response patterns underlying daughter/father and stepdaughter/stepfather relationships. Struct Equat Model 1994; 2: 31-52
76. Hox JJ, Bechger TM. An introduction to structural equation
modeling. Fam Sci Rev 1998; 11: 354-73
77. Marsh HW, Grayson D. Longitudinal stability of latent means
and individual differences: a unified approach. Struct Equat
Model 1994; 1: 317-59
78. Raykov T, Widaman KF. Issues in applied structural equation
modeling research. Struct Equat Model 1995; 2: 289-318
79. Browne MW, Cudeck R. Alternative ways of assessing model
fit. In: Bollen KA, Long JS, editors. Testing structural equation
models. Newbury Park (CA): Sage Publications, 1993
80. Rigdon EE. Software review: Amos and AmosDraw. Struct
Equat Model 1994; 1: 196-201
81. Anderson TW. Some stochastic process models for intelligence
test scores. In: Arrow KJ, Karlin S, Suppes P, editors. Mathematical methods in the social sciences. Stanford (CA): Stanford University Press, 1960
82. Meredith W, Tisak J. Latent curve analysis. Psychometrika
1990; 55: 107-22
83. Mackinnon LT. Overtraining effects on immunity and performance in athletes. Immunol Cell Biol 2000; 78: 502-9
84. Bates BT. Single-subject methodology: an alternative approach.
Med Sci Sports Exerc 1996; 28: 631-8
85. Reboussin DM, Morgan TM. Statistical considerations in the
use and analysis of single-subject designs. Med Sci Sports
Exerc 1996; 28: 639-44
86. Hopkins WG. Measures of reliability in sports medicine and
science. Sports Med 2000; 30: 1-15
87. Hopkins WG. Probabilities of clinical or practical significance
[online]. Available from URL: http://sportsci.org/jour/0201/
wghprob.htm [Accessed 2004 Oct 27]
88. Hrycaiko D, Martin GL. Applied research studies with single-subject designs: why so few? J Appl Sport Psychol 1996; 8:
183-99
Correspondence and offprints: Dr Sue Hooper, Centre of
Excellence for Applied Sport Science Research, Queensland
Academy of Sport, PO Box 956, Sunnybank, QLD 4109,
Australia.
E-mail: sue.hooper@srq.qld.gov.au