Single-Subject Research Designs and Data Analyses for Assessing Elite Athletes' Conditioning

Taisuke Kinugasa,1 Ester Cerin2 and Sue Hooper1,3

1 School of Human Movement Studies, The University of Queensland, Brisbane, Queensland, Australia
2 School of Population Health, The University of Queensland, Brisbane, Queensland, Australia
3 Centre of Excellence for Applied Sport Science Research, Queensland Academy of Sport, Sunnybank, Queensland, Australia

Abstract

Research in conditioning (all the processes of preparation for competition) has used group research designs, where multiple athletes are observed at one or more points in time. However, empirical reports of large inter-individual differences in response to conditioning regimens suggest that applied conditioning research would greatly benefit from single-subject research designs. Single-subject research designs allow us to find out the extent to which a specific conditioning regimen works for a specific athlete, as opposed to the average athlete, who is the focal point of group research designs. The aim of the following review is to outline the strategies and procedures of single-subject research as they pertain to the assessment of conditioning for individual athletes. The four main experimental designs in single-subject research are: the AB design, reversal (withdrawal) designs and their extensions, multiple baseline designs and alternating treatment designs. Visual and statistical analyses commonly used to analyse single-subject data are discussed, together with their advantages and limitations. Modelling of multivariate single-subject data using techniques such as dynamic factor analysis and structural equation modelling may identify individualised models of conditioning, leading to better prediction of performance. Despite problems associated with data analyses in single-subject research (e.g. serial dependency), sports scientists should use single-subject research designs in applied conditioning research to understand how well an intervention (e.g. a training method) works and to predict performance for a particular athlete.

Most conditioning researchers have used group research designs, where multiple athletes are observed at one or more points in time and groups (e.g. a training group and a control group) are compared. Lehmann et al.[1] showed that one group of runners who increased training intensity improved running speed at the 4 mmol/L lactate level, whereas another group who increased training volume (distance) did not improve. Another example is the study of Mujika et al.,[2] who also used the group research design to assess swimmers' training and performance. For applied conditioning research, which aims to establish the effect of a specific conditioning regimen or intervention on an individual athlete, one of the main drawbacks of group research designs is that the sample mean is used as the representative value of the group, whilst the mean value may mask important information for some individuals. These types of design can establish whether a conditioning regimen works for 'average' athletes. However, at the elite level, applied conditioning research requires a focus on an individual athlete rather than groups of 'average' athletes to make a confident assessment of the effect of an intervention (e.g. a specific training method) on the performance of an individual athlete.
A successful outcome of conditioning is improved performance. However, it is difficult to frequently measure performance using maximal efforts (e.g. time trials) in an elite athlete. Indicators of athletes' preparedness to perform (performance readiness) are therefore often used to reflect a measure of performance.[3,4] Measures of performance readiness include physical (e.g. fatigue), psychological (e.g. mood disturbance), physiological (e.g. heart rate variability) and medical (e.g. presence of injury or illness) indicators that are hypothesised to predict performance.[5,6] However, measures such as blood lactate and plasma cortisol levels have shown different individual responses among runners, and the profile of mood states (POMS) also varied greatly across cyclists during 6 weeks of high-intensity training.[7,8] Martin et al.[8] reported that some cyclists with relatively large mood disturbances responded well to tapering (improved cycling performance), whereas others responded poorly, and standard deviations for individual mood scores were large. Other researchers have also anecdotally reported the existence of substantial inter-individual differences in performance readiness measures.[9-11] These findings indicate that average group results on measures of performance readiness are bound to be invalid indicators of performance for some individuals. Consequently, applied conditioning research on elite athletes needs to be approached from a single-subject perspective.[3]

The aim of the following review is to outline the strategies and procedures of single-subject research as they pertain to the assessment of conditioning for individual athletes. The review will focus on single-subject (quasi-)experimental designs and related data analyses, including assessment of the impact of an intervention and modelling of multivariate data. Using the single-subject approach, coaches and sports scientists can compare a new training method (e.g. altitude training) with a traditional training method and make confident assertions about the effectiveness of the new strategy for an athlete. Sophisticated mathematical models, including the systems model, have already been presented by Banister et al.[12] and Busso[13] to predict performance on an individual basis. However, their models did not include recovery, which is an integral component of conditioning, and the models were not tested with elite athletes. This article presents a single-subject approach to applied research on elite athletes' conditioning that uses multivariate statistical methods, with the aim of helping sports scientists to more accurately track conditioning and predict performance in individual elite athletes.

1. Single-Subject Experimental Designs

Single-subject (single-case or n = 1) research designs were established by Pavlov as early as 1928 as a result of work with single organisms.[14] They were developed in later work by researchers such as Skinner, studying individual behaviour and methods of behaviour change.[14] The aim of single-subject research is to observe one or a few subjects' outcome (e.g. performance) as a dependent variable at many different timepoints and to compare the changes to assess the effect of an intervention (e.g. a training method). When compared with group research designs, single-subject research designs present several advantages.
First, they allow rigorous objective experimental assessment of the effectiveness of an intervention for the individual. As noted earlier, whilst group research designs aim at answering the question "what is the effect of this training method on the 'average' athlete?", single-subject research designs explore the effect of a training method on a specific athlete. Secondly, single-subject research designs are appropriate for studying the process of change in selected indicators of conditioning. Although group research designs can also be used to analyse the dynamic process of conditioning, the individualised approach may more accurately identify whether conditioning was successful for a specific athlete. Thirdly, single-subject research designs are more appropriate for the study of small populations of athletes, such as injured, overtrained or elite athletes, who are difficult to recruit in sufficiently large numbers to meet the requirement of group research designs that the sample size be adequate to detect a practically significant effect of an intervention. Fourth, single-subject research designs are usually easier to incorporate into practical and clinical settings than group research designs because they are sufficiently flexible to accommodate the changing needs of the individual studied.[15] This methodology has been used in a wide range of research fields such as applied behaviour analysis, rehabilitation and clinical psychology.[16-18]

1.1 AB Design

To carry out single-subject conditioning studies, after measures of conditioning outcomes (e.g. time trial, physiological and psychological measures) are selected, researchers need to choose and implement a specific single-subject research design. The AB design is the most basic single-subject (quasi-)experimental design.[14] It involves repeated measurement of the conditioning data through phases A and B of the study. Phase A represents the baseline stage of the study, in which the intervention that is to be evaluated is not present. In phase B, the intervention is introduced and changes in the conditioning data are examined. With some major reservations, these changes are attributed to the intervention. These reservations are related to the fact that it is possible that changes in phase B might have occurred regardless of the introduction of the intervention.[19] More sophisticated single-subject quasi-experimental designs are needed for a higher degree of internal validity (the extent to which we can establish the existence of a causal effect between the intervention and the outcome measures), including the reversal (withdrawal), multiple baseline and alternating treatment designs.[14]

1.2 Reversal (Withdrawal) Designs and their Extensions

The ABA design, an extension of the AB design, is the simplest type of reversal or withdrawal design (figure 1).[14] Reversal refers to the withdrawal of the intervention introduced following baseline measurement. Withdrawal designs allow us to conclude whether an intervention (phase B) impacts on a dependent variable under scrutiny by providing information on the changes following introduction and removal of the intervention.

Fig. 1. Hypothetical data of an ABA design. Self-referenced performance data on a 10-point scale (from 10 = very, very good to 1 = very, very poor) are plotted for 15 days. An intervention (e.g. hydrotherapy) is introduced from day 6 to day 10 and subsequently withdrawn.
To illustrate, Lerner et al.[20] used the ABA design to investigate the effects of psychological interventions on the free-throw performance of four female basketball players. Following a baseline phase (first phase A), subjects were randomly assigned to one of three interventions (phase B): goal-setting, imagery programmes, or combined goal-setting and imagery. The final phase (second phase A) consisted of the withdrawal of the intervention (i.e. return to the baseline level).

Other suggested extensions and variations of the ABA design are the ABAB, BAB, ABCB (phase C consists of an intervention different from that in phase B) and changing criterion designs.[14] These designs are more powerful than the simple AB design in terms of internal validity. By introducing more phases or reversals (inclusion or removal of an intervention), the researchers can more reliably assess the impact (if any) of an intervention on the dependent variable. A consistent correspondence between changes in the dependent variable and the introduction or withdrawal of the intervention indicates the existence of an intervention effect. The BAB design involves an intervention, followed by a baseline, and a return to the intervention phase, and can be used to study tapering or detraining effects when the data collection starts close to the end of a training season. In this case, the initial phase B would correspond to the training intervention at the end of the training season, phase A would represent the off-season (no training) period and the second phase B would correspond to the return to training in the following season. The ABCB design, a variant of the ABAB design, allows comparison of two types of interventions (phases B and C). For example, if we were to assess the impact of training on an athlete's performance, phase A would correspond to no training, phase B would represent running training, phase C would represent cross-training including various activities (e.g. running, resistance training, cycling) and the second phase B would correspond to the return to running training.

The changing criterion design is an extension of the ABA design, in which the intervention effect is demonstrated when the dependent variable changes to criterion (goal) levels predetermined by the researcher.[14] After initial baseline measurements (phase A), the intervention (phase B) is introduced until a predetermined criterion is met. The criterion level (phase B) is then established as a new baseline (phase A) and a higher criterion level is set as the next goal to be achieved in the subsequent intervention phase (phase B). This design is used to determine acceleration or deceleration of performance and may be especially useful to assist in the achievement of goals in athletes. For example, when a tennis player who can serve at 162 km/h wants to achieve 170 km/h, the player initially sets the goal of improving by 2 km/h (figure 2). A specific strength training regimen is conducted until the serve reaches the speed of 164 km/h. When this criterion is met, a new criterion (166 km/h) is introduced as the next goal. The process is repeated until the final goal (170 km/h) is achieved.

Fig. 2. Hypothetical data of a changing criterion design. Serve speed (km/h) for a male tennis player is plotted for 20 weeks. The dashed line represents a criterion (goal) level. The vertical solid line represents the point at which an intervention was introduced. A specific strength training regimen is conducted after week 4 until the serve reaches the speed of 162 km/h. When this criterion is met (in week 8), a new criterion (164 km/h) is introduced as the next goal. The process is repeated until the final goal (170 km/h) is achieved.
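As a minimal illustration of how reversal-design data might be organised and summarised, the following Python sketch lays out hypothetical scores in the pattern of figure 1 and computes per-phase summaries; the values, phase labels and 5-day phase lengths are assumptions made purely for illustration.

```python
import numpy as np

# Hypothetical daily self-referenced performance scores (10-point scale)
# for an ABA design: days 1-5 baseline, days 6-10 intervention, days 11-15
# return to baseline (mirroring the layout of figure 1).
scores = np.array([4, 5, 4, 5, 4,      # phase A (baseline)
                   7, 8, 8, 9, 8,      # phase B (intervention)
                   5, 5, 4, 5, 4])     # phase A (withdrawal)
phases = np.array(["A1"] * 5 + ["B"] * 5 + ["A2"] * 5)

# A change that appears with the intervention and disappears on withdrawal
# supports (but does not by itself prove) an intervention effect.
for phase in ("A1", "B", "A2"):
    data = scores[phases == phase]
    print(f"{phase}: mean = {data.mean():.1f}, SD = {data.std(ddof=1):.1f}")
```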
If our aim is to establish the existence of a causal relationship between a specific intervention and an athlete's performance, one of the main limitations of reversal designs pertains to the fact that carry-over effects can sometimes occur across adjacent phases of the study.[15,21] In other words, the change in performance produced by the intervention may persist after the intervention is removed, which makes it difficult to ascertain whether the intervention 'caused' a change in the performance. Another problem arises from the inability to control the potential effects of maturation and practice on later intervention and baseline phases of the study. These concerns are serious threats to the external (generalisability of the findings) and internal validity of studies implementing reversal/withdrawal designs.

1.3 Multiple Baseline Designs

Multiple baseline designs are the most widely used single-subject designs.[22,23] They are more effective at controlling threats to internal validity, such as carry-over effects, than the reversal/withdrawal designs and are appropriate when interventions cannot be withdrawn due to practical limitations or ethical considerations.[15,21] The design can be used when the researchers want to examine the effects of an intervention across different outcomes (e.g. performance or performance readiness measures), settings (conditions or situations) and subjects. The multiple baseline design can be conceptualised as separate AB designs. For example, after obtaining a stable baseline phase, three interventions are introduced sequentially and measurements are taken regularly until the number of datapoints is equal across the three dependent variables (figure 3). The researchers are assured that the intervention is effective when a change in a dependent variable appears after the intervention while the levels of the other dependent variables remain relatively constant. A basic requirement of multiple baseline designs is that the dependent variables are independent of one another.[14]

Fig. 3. A schematic figure of the multiple baseline design. After obtaining a stable baseline phase of three dependent variables (outcomes, settings or subjects), three interventions (interventions B, B′ and B′′) are introduced sequentially and measurements are taken regularly until the number of datapoints is equal across the three dependent variables.
The three main types of this design are multiple baselines across outcomes, settings and subjects.[24,25] In the multiple baseline design across outcomes, an intervention is applied sequentially to two or more performance or performance readiness measures within the same subject. To illustrate in soccer conditioning, multiple baselines across three soccer skills (dribble, pass and shoot) can be used to examine the effects of a coach's specific training programme over 10 weeks (figure 4). The researchers would take baseline measurements (no skill training) on at least three occasions for a soccer player before introducing each training programme independently. Barlow and Hersen[14] recommended a minimum of three datapoints in the baseline phase. The coach and researchers would visually analyse each skill on a 10-point scale (from 10 = very, very good to 1 = very, very poor) and show whether the specific training programme had a positive effect on the soccer player's skills.

Fig. 4. Hypothetical data of a multiple baseline design across outcomes. Self-referenced performance scores on a 10-point scale (from 10 = very, very good to 1 = very, very poor) are plotted for 13 weeks. Three interventions (e.g. specific training programmes for dribble, pass and shoot) are introduced in week 3, week 5 and week 7, respectively.

In the multiple baseline design across settings, an intervention is introduced sequentially across two or more independent settings in a given subject. For example, the design can be used to investigate athletes in different environments (e.g. altitude, sea level) and training phases (e.g. preparation, tapering). Similarly, in the multiple baseline design across subjects, an intervention is applied sequentially to study one outcome across two or more matched subjects. The design is used to assess a single athlete but allows attempts to replicate the effects of the intervention for more than one athlete, by introducing the intervention successively to each athlete. In team sports, the coaches can demonstrate that the training programme affected the individual athletes (not the group as a whole) by assessing whether performance changed only when the intervention was introduced for each athlete. Thus, the coaches can modify the training programme if it is not effective in enhancing performance. Shambrook and Bull[26] used multiple baselines across subjects to examine the effects of imagery training on free-throw performance in four female basketball players. The researchers divided 26 free-throw trials into two (baseline and intervention) phases, and each subject began the intervention at a different point in time (the point of intervention for each subject was randomly determined before the study). Only one subject demonstrated an improved free-throw performance after the imagery training; the others showed poorer performance. Although datapoints in the baseline phase varied among subjects, the multiple baseline design across subjects is considered a replication of the AB design in a single subject.[14]
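A minimal sketch of how the staggered logic of a multiple baseline design across outcomes can be checked numerically, using hypothetical weekly scores for the soccer example above; the scores and intervention start weeks are invented for illustration only.

```python
import numpy as np

# Hypothetical weekly skill scores (10-point scale) for the soccer example:
# the specific training programme starts in week 3 (dribble), week 5 (pass)
# and week 7 (shoot), as in figure 4.
weeks = np.arange(1, 14)
skills = {
    "dribble": (np.array([3, 3, 6, 7, 7, 8, 8, 8, 9, 9, 9, 9, 9]), 3),
    "pass":    (np.array([4, 4, 4, 4, 7, 7, 8, 8, 8, 9, 9, 9, 9]), 5),
    "shoot":   (np.array([3, 4, 3, 4, 3, 4, 6, 7, 8, 8, 8, 9, 9]), 7),
}

# Evidence for an effect: each outcome changes only once its own
# intervention begins, while the other outcomes are still at baseline.
for name, (scores, start_week) in skills.items():
    baseline = scores[weeks < start_week]
    treated = scores[weeks >= start_week]
    print(f"{name}: baseline mean = {baseline.mean():.1f}, "
          f"intervention mean = {treated.mean():.1f}")
```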
One of the limitations of multiple baseline designs is that some outcomes, settings, or subjects may be interdependent or interrelated.[14] A change in one outcome, setting, or subject may, therefore, influence another outcome, setting, or subject. In this case, the controlling effects of the intervention are open to question. Thus, outcomes, settings, or subjects should be independent of each other. Another limitation is that a substantial number of measurements may be needed in the baseline phase to demonstrate clear intervention effects in the intervention phase. Further, the dependent variable may change in the baseline phase before the intervention is applied due to practice effects and other confounding factors.[14] Alternating treatment designs may overcome these limitations.

1.4 Alternating Treatment Designs

Although alternating treatment designs are not widely reported in the literature, these designs allow the comparison of the effects of two or more interventions on the same dependent variable.[14] This design involves the alternation of two or more types of treatments or conditions (e.g. treatments A and B are alternately implemented day by day) for a single subject within one intervention phase (figure 5). The alternating treatment design has been described as a between-series strategy, where one is comparing results between two separate series of datapoints, whereas the reversal designs look at data within the same series of datapoints (within-series).[14] Wolko et al.[27] used the alternating treatment design to compare the effects of standard coaching (as a baseline) and additional self-management skills (public self-regulation as treatment 1 and private self-regulation as treatment 2) on five female gymnasts. All three conditions were measured once during each week of practice over 8 weeks. The order of interventions was randomly alternated across weeks. The results showed treatment 2 was more effective than treatment 1 in three of the five subjects, while one subject demonstrated treatment 1 was more effective and one showed mixed results (standard coaching was most effective for frequency of attempted skills and treatment 2 was most effective for frequency of completed skills).

Fig. 5. Hypothetical data of an alternating treatment design. Fatigue levels on a 10-point scale (from 10 = very, very high to 1 = very, very low) are plotted for 13 days. After a baseline phase, two treatments, A and B (e.g. stretching and hydrotherapy, respectively), are introduced alternately during an intervention phase.

The main advantages of alternating treatment designs are that they do not require a withdrawal of the intervention, the phases can be much shorter than in AB designs, and a baseline phase is not an absolute requirement.[21] However, carry-over effects from one treatment to the next may exist, although the random assignment of treatments or conditions can reduce this problem.[21]

In summary, each of the four main experimental designs in single-subject research (the AB design, reversal/withdrawal designs, multiple baseline designs and alternating treatment designs) has its advantages and disadvantages. Choosing an appropriate one is based on available resources (e.g. time, finances and other resource availability, and compliance of subjects) and current knowledge about the phenomenon being studied.[14]

2. Data Analyses in Single-Subject Experimental Designs

Systematic assessment is required to provide information about the progress of conditioning so that the data can be used to adjust athletes' training programmes, competition schedules and lifestyle on a regular basis.
This complex evaluation of conditioning is currently based on the coach's intuitive judgment or subjective visual analysis. However, statistical analyses can assist in understanding successful loading patterns and objectively predicting performance for each athlete. Although visual analysis is still common in single-subject experimental designs, statistical analyses have been developed to improve the analysis of single-subject data.[14,28,29]

2.1 Visual Analysis

Visual analysis (or inspection) is commonly used to subjectively assess graphed data and judge whether an intervention has produced a significant change in a dependent variable.[16,30,31] Data are plotted on a graph (e.g. a line graph), where the horizontal (x) axis is split according to units of time (e.g. minutes, hours, days) and the vertical (y) axis contains units of measurement of the subject's dependent variable (e.g. performance).[21] Visual analysis can be used when there is a non-linear underlying temporal trend in the dependent variable. These graphs are then analysed by the researchers, independent judges, or coaches. For example, Mattacola and Lloyd[30] used visual analysis to assess the effect of a 6-week strength and proprioception programme for three subjects who had previously sustained ankle sprains. They reported that improvements in dynamic balance were observable through the researchers' visual inspection. In another study, performance decrements in maximum workload, maximal oxygen uptake and anaerobic threshold were observed by two independent, experienced judges in an elite ultra-endurance cyclist who developed chronic fatigue syndrome.[32]

The advantages of visual analysis are that it is easy and inexpensive to use, widely recognised and understood, and graphs can simplify the data.[31,33] Visual analysis can be useful in practical or clinical settings since it allows continuous monitoring of performance.[31] However, as it has been shown that there is often a lack of agreement among judges with respect to the conclusions drawn from visual analysis, the accuracy and reliability of this particular method of data analysis have been questioned.[34-36] For example, DeProspero and Cohen[34] showed that the inter-rater reliability coefficient of visual analysis of 36 ABAB reversal graphs showing hypothetical single-subject data was r = 0.61 for 114 experienced reviewers of behavioural journals. To do this, a set of graphs was constructed to illustrate four graphic factors representing characteristics of visual analysis: (i) pattern of mean shift; (ii) degree of mean shift; (iii) fluctuation (variation) within phases; and (iv) trend. For example, the first graphic factor was represented by three patterns of mean shifts across phases: an ideal pattern, an inconsistent treatment pattern and an irreversible effect pattern of results. The evaluation of each figure was expressed as a rating ("How satisfactory a demonstration of experimental control do you consider this to be?") on a 100-point scale, and the inter-rater reliability coefficient was calculated by the Pearson product moment correlation. Ottenbacher[37] also conducted a meta-analysis of visual analysis across 14 mental retardation studies. Each of the 14 studies was coded by two examiners to establish the inter-rater agreement of visual analysis. The overall inter-rater reliability was r = 0.58, but details on how the coding was set were not provided. The inconsistency of the judgments is mainly due to the lack of any standard rules or guidelines for making decisions about the result.[35] Statistical analysis is, therefore, required to support visual analysis or to replace visual analysis in conditioning research.
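As a minimal sketch of the kind of graph described above, the following Python/matplotlib code plots hypothetical single-subject data with the phase split marked on the time axis; all values, labels and the day-6 intervention point are invented for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical daily performance scores with a baseline/intervention split.
days = np.arange(1, 16)
scores = np.array([4, 5, 4, 5, 4, 6, 7, 8, 8, 9, 8, 9, 9, 8, 9])
intervention_start = 6  # first day of the intervention phase

plt.plot(days, scores, marker="o", color="black")
plt.axvline(intervention_start - 0.5, linestyle="--", color="grey")
plt.text(1.5, 9.5, "Baseline (A)")
plt.text(9.0, 3.5, "Intervention (B)")
plt.xlabel("Days")
plt.ylabel("Self-referenced performance score")
plt.ylim(0, 10)
plt.show()
```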
2.2 Statistical Analyses for Detecting Significant Intervention Effects

Statistical significance in single-subject research designs refers to the probability that an intervention has a reliable or real effect on a dependent variable.[18] For example, coaches and sports scientists may want to compare a new training method (e.g. altitude training) as an intervention with a traditional training method as a baseline in an elite athlete. Statistical methods can be used to confidently assess whether a practically significant change in the athlete's performance has occurred as a result of the new training method during a training season. However, it is important to note that levels of statistical significance and practical (or clinical) meaningfulness are different.[14,38] In this regard, the current literature suggests a series of statistical methods aimed at determining the effect of an intervention by comparing baseline and intervention phases. The suggested statistical methods are time series analysis, randomisation tests, the split-middle technique and Revusky's test of ranks.

In group research designs, statistical analysis usually consists of testing whether differences between groups are statistically significant. The assumption of independence of measurements, which means that observations must be independent of one another, is assured by randomly assigning subjects to specific conditions.[38] However, this prerequisite for the parametric statistical analyses used in group research designs is often not met in single-subject studies. Therefore, conventional statistical analyses used in group research designs (e.g. t and F tests) may not be applicable in single-subject research designs. Before implementing these statistical analyses in single-subject research designs, the issue of serial dependency must be considered.[38]

2.2.1 Serial Dependency

When successive observations in single-subject time series data are correlated, the data are said to be serially dependent.[38-40] The existence of serial dependency on one occasion allows us to predict subsequent data in the series. Strictly, errors of measurement (residuals) associated with data at one point may be predictive of errors at other points in the series that follows.[41] Serial dependency can be assessed by examining an autocorrelation coefficient in the series.[24,38] The autocorrelation coefficient of lag 1, which is calculated by pairing temporally adjacent data (time t and time t−1 datapoints), is generally deemed sufficient to reveal serial dependency in the series.[38] The lag-1 autocorrelation is computed by pairing the first datapoint with the second, the second with the third, and so on until the second from last is paired with the last datapoint. If the autocorrelation coefficient is not substantial, conventional statistical analyses based on the assumption of independence of measurements, such as t and F tests, can be used to analyse single-subject data. In contrast, if these tests are applied to autocorrelated data, the results may falsely lead to the conclusion that the intervention was effective, since the precision of estimation will be affected by the bias of serial dependency (type I errors may be inflated).[38,39] Statistical tests such as time series analysis and randomisation tests can be used to analyse autocorrelated single-subject data.[36]
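The lag-1 autocorrelation just described can be computed in a few lines of Python; this sketch uses the common time series estimator (sum of lagged cross-products over the total sum of squared deviations), and the data are invented for illustration.

```python
import numpy as np

def lag1_autocorrelation(x):
    """Lag-1 autocorrelation: relate each datapoint x(t) to its temporally
    adjacent neighbour x(t-1), as described in the text."""
    x = np.asarray(x, dtype=float)
    deviations = x - x.mean()
    # Common time series estimator: lagged cross-products over total variance.
    return (deviations[:-1] * deviations[1:]).sum() / (deviations ** 2).sum()

scores = [4, 5, 4, 5, 4, 6, 7, 8, 8, 9, 8, 9, 9, 8, 9]
r1 = lag1_autocorrelation(scores)
print(f"lag-1 autocorrelation = {r1:.2f}")
```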
2.2.2 Time Series Analysis

Time series analysis allows us to determine the nature, magnitude and direction of the relationship between variables that are measured at several equidistant points in time, and to assess the effects of interventions on a dependent variable in single-subject studies.[42,43] For example, interrupted time series analysis techniques, which are used to analyse temporally ordered variables, make it possible to analyse autocorrelated single-subject data.[38,44,45] Moreover, time series analysis does not depend upon stable baselines and provides important information about different characteristics of the performance change for adjacent phases.[38] The analysis can be used to compare adjacent phases, such as baseline and intervention phases, in terms of slope and level (intercept). The method is especially useful for examining whether there is a statistically significant change in slope and level rather than a change in overall mean.[43,45] The slope refers to the degree of the angle of a linear trend line, which represents the rate of change from one phase to the next, and the level refers to the magnitude of change in the data at the point when the intervention is introduced.[25] If data at the end of the baseline phase and the beginning of the intervention phase show an abrupt departure or discontinuity, this discontinuity would reflect a change in level. The probability is computed by comparing the obtained test statistics in terms of slope and level between phases (e.g. phases A and B).

The disadvantage of time series analysis is the complexity of the mathematical theories on which it is based.[45] It also needs many datapoints to identify the model accurately.[44,46] Some researchers have suggested that at least 50 datapoints are required.[42,43] However, Crosbie[44] proposed an interrupted time series analysis procedure that corrects for autocorrelation and is applicable to fewer datapoints than required in traditional time series analysis. This method requires 10-20 datapoints in each phase to achieve acceptable statistical power to detect significant differences.[44] Recently, Kinugasa et al.[3] used this method to show a significantly decreased slope and level in recovery data (number of hours of physical therapy, day naps, nutrition, bathing and active rest) for an elite collegiate tennis player between off-season and preseason. They also used it to compare changes in conditioning indicators (e.g. training load, recovery, performance and performance readiness) between training phases such as off-, pre- and in-seasons.
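How change in level and change in slope are parameterised can be illustrated with a simplified piecewise regression. Note that this sketch is not Crosbie's procedure (which additionally models autocorrelated errors); it is a plain least-squares illustration of the slope/level idea, on invented data.

```python
import numpy as np

# Simplified illustration of slope and level change between two phases,
# ignoring autocorrelation (interrupted time series procedures such as
# Crosbie's additionally correct for serially dependent errors).
y = np.array([4, 4, 5, 4, 5, 5, 4, 5,        # baseline phase
              7, 7, 8, 8, 8, 9, 8, 9])       # intervention phase
n_base = 8
t = np.arange(len(y), dtype=float)
phase = (t >= n_base).astype(float)           # 0 = baseline, 1 = intervention
time_in_b = np.where(phase == 1, t - n_base, 0.0)

# Model: y = b0 + b1*t + b2*phase + b3*time_in_b.
# b2 estimates the change in level at the intervention point,
# b3 the change in slope between the two phases.
X = np.column_stack([np.ones_like(t), t, phase, time_in_b])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(f"change in level = {b[2]:.2f}, change in slope = {b[3]:.2f}")
```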
Tryon[46] proposed the C statistic as a simple method of time series analysis for use with as few as eight datapoints in each phase. The C statistic does not require a complex computer program and is easily calculated using Young's table.[47] The first step is to calculate the Z value (the normalised C statistic) of the baseline data to examine whether the data are stable, since interpretation becomes difficult when the baseline data have a significant trend.[48] The value of C is given by equation 1:[46]

$$C = 1 - \frac{\sum_{i=1}^{n-1} (x_i - x_{i+1})^2}{2 \sum_{i=1}^{n} (x_i - \bar{x})^2}$$  (Eq. 1)

where $x_i$ is the $i$th datapoint, $\bar{x}$ is the mean of the series and $n$ is the total number of datapoints. The standard error of C is calculated by equation 2, which gives the Z value ($Z = C/S_C$):[46]

$$S_C = \sqrt{\frac{n-2}{(n-1)(n+1)}}$$  (Eq. 2)

In this regard, we recommend starting data collection in the off-season (no training or minimal training) when monitoring an athlete's conditioning. When the baseline is stable, the next step is to compare the significance of the change between the normalised baseline and intervention data. Because the C statistic cannot control for type I error when a significant autocorrelation is observed,[49] Yamada[50] suggested that randomisation tests are more appropriate for single-subject data than the C statistic.
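A minimal Python implementation of equations 1 and 2 as reconstructed above; the baseline series is invented, and the 1.64 cut-off in the comment reflects the common one-tailed 0.05 criterion rather than anything prescribed by the text.

```python
import numpy as np

def c_statistic(x):
    """Tryon's C statistic (equation 1) and its Z value, Z = C / Sc,
    using the standard error of equation 2."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    c = 1 - np.sum(np.diff(x) ** 2) / (2 * np.sum((x - x.mean()) ** 2))
    sc = np.sqrt((n - 2) / ((n - 1) * (n + 1)))
    return c, c / sc

# Step 1: check that the baseline itself shows no significant trend.
baseline = [5, 4, 5, 4, 5, 5, 4, 4]
c, z = c_statistic(baseline)
print(f"baseline: C = {c:.2f}, Z = {z:.2f}")  # |Z| below ~1.64 suggests stability
```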
2.2.3 Randomisation Tests

A randomisation test is a permutation test based on randomisation (random assignment) to test a null hypothesis about intervention effects.[51] Randomisation tests have recently been proposed as a valid method of data analysis for single-subject research.[39,51,52] Randomisation tests are a non-parametric procedure that can be applied to single-subject data when the assumptions of parametric tests are not tenable.[51,53] They can be used to determine whether the effect of a specific intervention on the outcome is statistically significantly different from zero, or from that of another intervention. To conduct a valid randomisation test, the study design must incorporate random assignment. For example, if we are to establish whether conditioning regimen A is more beneficial than conditioning regimen B, we need to randomly assign treatment A to half of the points in time at which the outcome variable is going to be assessed. Intervention B would be assigned to the remaining points in time.

The basic approach to randomisation tests is straightforward. We formulate our hypothesis, which, for example, could read "intervention A has the same effect on the athlete's performance as intervention B". We choose a statistic to test our hypothesis (e.g. a t statistic). We generate a null reference distribution by randomly shuffling the observed data of the outcome variable over the entire sequence. We assign the first n1 observations to the first condition (intervention A) and the remaining n2 observations to the second condition (intervention B). We calculate the test statistic for the reshuffled data. We repeat the last three steps k times (usually more than 1000). We calculate the proportion of times the test statistic on the randomised data exceeded that on the original data, which represents the probability of obtaining the original test statistic under the null hypothesis. We reject or retain the null hypothesis on the basis of this probability.[28,41]

The statistical power of randomisation tests (the probability that the null hypothesis will be rejected when it is not true) is relatively low.[54,55] For example, Ferron and Ware[55] estimated the power to be 0.40 using an AB design with 30 datapoints when there was no autocorrelation. Cohen[56,57] suggested 0.80 as an acceptable power when the alpha level (p value) is 0.05. However, the power of randomisation tests also depends on the choice of the design.[54] Ferron and Onghena[54] reported that a design involving the random assignment of interventions to phases was more powerful than the ABAB design with three randomly assigned interventions. An acceptable power (>0.80) was obtained when Cohen's effect sizes were large (1.1 and 1.4) and phase lengths (datapoints) exceeded five.[55]

One of the advantages of randomisation tests is that they are more efficient at controlling type I errors than the C statistic.[50] Fewer datapoints are required than for time series analysis (i.e. a minimum of 50 datapoints). However, it should be noted that in conditions of significant positive autocorrelation among datapoints, a more conservative probability level (e.g. 0.01) should be adopted when using randomisation tests to control type I errors.[58] Finally, various other methods (e.g. analysis of variance, correlation analysis) appropriate for the analysis of single-subject data have been developed, although these methods have not yet been used in conditioning research.[36,38] Recently, Kinugasa et al.[3] monitored two tennis players over a 6-month training season and used a randomisation test to analyse mean values of conditioning indicators (training load, recovery and performance readiness) across training phases. One subject's performance readiness data significantly increased from the beginning to the end of the training season, but this was not observed in the other subject. The researchers suggested that the randomisation test can be used to objectively assess changes in how an athlete responds to a conditioning regimen.
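The shuffle-and-recompute loop described above translates directly into code. This sketch uses the absolute mean difference as the test statistic; the scores for the two interventions, the value of k and the two-sided comparison are illustrative choices, not part of the source.

```python
import numpy as np

rng = np.random.default_rng(1)

def randomisation_test(a, b, k=5000):
    """Two-condition randomisation test: shuffle the pooled observations,
    recompute the mean difference k times, and report the proportion of
    shuffles whose statistic is at least as extreme as the observed one."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    pooled = np.concatenate([a, b])
    observed = abs(a.mean() - b.mean())
    count = 0
    for _ in range(k):
        rng.shuffle(pooled)
        count += abs(pooled[:len(a)].mean() - pooled[len(a):].mean()) >= observed
    return count / k

# Hypothetical performance scores under interventions A and B, randomly
# assigned to measurement days as the design requires.
p = randomisation_test([6, 7, 7, 8, 7, 8], [5, 6, 5, 6, 6, 5])
print(f"p = {p:.3f}")
```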
2.2.4 Split-Middle Technique

The combined use of the split-middle technique (celeration line approach) and a binomial test provides a non-parametric method of revealing the nature of the trend in the data and can be used to predict an athlete's performance over time.[59-61] The split-middle technique is easy to compute and can be used with a small number of datapoints.[21] The aim of this technique is to assess the effect of a specific intervention by providing a method of describing the rate of change in the outcome variable for a single individual. The technique has been proposed primarily to describe the process of change in the outcome within and across intervention conditions. This is achieved by plotting trends within conditions (e.g. baseline and intervention) to characterise an athlete's progress. Statistical significance can be examined once the trend lines have been determined.[21]

The split-middle technique involves multiple steps. From the data of each phase of the study (e.g. baseline and intervention), a trend or split-middle line is constructed to characterise the rate of performance change over time for that phase. This line is situated so that 50% of the data fall on or above the line and 50% fall on or below it. This is done by dividing the data from a specific phase in half, drawing a solid vertical line to separate the first half of the datapoints from the second half. Next, each of the halves is divided in half by dashed vertical lines. The median value of the outcome variable is plotted for each half of the phase. A straight line is drawn through the two median points, denoting the rate of change for that particular phase. Subsequently, the trend line from a phase of the study (e.g. baseline) is extended into the following phase of the study (e.g. intervention) as a dashed line.[21] A binomial test can be used to determine the statistical significance of the intervention effect by establishing whether the number of datapoints above the projected line in the intervention phase is of sufficiently low probability not to be attributed to chance.[62]

There have been few experimental studies using the split-middle technique, none of which have been directly related to conditioning.[23,26,63,64] For example, Marlow et al.[63] used the split-middle technique to analyse the effect of a pre-performance routine on the water polo penalty shot. The results revealed 21-28% performance improvements in all three male water polo players between baseline and intervention phases. Shambrook and Bull[26] also reported a multiple baseline design study examining the impact of an imagery training routine on basketball free-throw performance in four female basketball players over 12 weeks. They showed that only one subject demonstrated a consistent performance improvement in terms of mean (4%), slope and level (6%) after the imagery training. Although the use of inferential statistics accompanying this method of data analysis is problematic when the data are autocorrelated,[49] the split-middle technique may be a useful descriptive technique for conditioning research.
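A sketch of the split-middle steps and the accompanying binomial test, under the assumptions stated above: the celeration line passes through the medians of the two halves of a phase, and under the null hypothesis each intervention datapoint falls above the projected baseline line with probability 0.5. All data are hypothetical.

```python
import numpy as np
from math import comb

def split_middle_line(y):
    """Celeration (split-middle) line for one phase: the straight line through
    the median points of the first and second halves of the phase."""
    y = np.asarray(y, dtype=float)
    half = len(y) // 2
    x1, y1 = np.median(np.arange(half)), np.median(y[:half])
    x2, y2 = np.median(np.arange(half, len(y))), np.median(y[half:])
    slope = (y2 - y1) / (x2 - x1)
    return slope, y1 - slope * x1

baseline = np.array([4, 5, 4, 5, 5, 4, 5, 5])
intervention = np.array([6, 6, 7, 7, 8, 8, 8, 9])

# Project the baseline trend into the intervention phase...
slope, intercept = split_middle_line(baseline)
t = np.arange(len(baseline), len(baseline) + len(intervention))
projected = intercept + slope * t

# ...and test: under 'no effect', each intervention point lies above the
# projected line with probability 0.5 (one-tailed binomial probability).
n_above = int(np.sum(intervention > projected))
n = len(intervention)
p = sum(comb(n, k) for k in range(n_above, n + 1)) / 2 ** n
print(f"{n_above}/{n} points above the projected line, one-tailed p = {p:.3f}")
```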
2.2.5 Revusky's Test of Ranks (Rn)

Revusky's test of ranks has been proposed for examining the effect of an intervention in studies with a multiple baseline design in which data are collected across several outcomes, settings, or subjects (see section 1.3).[28,38,65,66] The intervention effect is determined by assessing the performance of each of the baselines at the point when the intervention is introduced. For example, in a multiple baseline design across different outcomes, each outcome is treated as a sub-experiment. When the intervention is introduced for a specific outcome, the performance of all outcomes is ranked for that point in time. To account for baseline differences in magnitude or measurement units across outcomes, the ranks are based on the percentage change in level from baseline to the time when the intervention is introduced to any of the outcomes. The sum of the ranks across all sub-experiments each time the intervention is introduced constitutes the statistic Rn. This statistic reflects whether the intervention had a significant effect on the various aspects of performance. One of the limitations of the test is that performance must change dramatically across phases to be reflected in the ranks.[38] More importantly, analysis of the change in level alone may result in erroneous conclusions with regard to the effect of the intervention. Hence, it is suggested that an examination of both changes in levels and slopes be conducted.[66]

In summary, there are advantages and disadvantages associated with each type of statistical analysis (table I). It is important to recognise the limitations of each method before applying any of these statistical analyses to single-subject data. Currently, a combination of randomisation tests and interrupted time series analysis, examining the changes in means, slopes and levels, is recommended for assessing the effects of an intervention in single-subject conditioning studies. As far as the other statistical methods are concerned, randomisation tests are simple statistical tools to use but they lack statistical power.[54,55] On the other hand, time series analysis is appropriate for conditioning research but its theory and technique are complex.[45]

Table I. Statistical methods for single-case experimental data

Method                          Sample size               Autocorrelated data   Use
Analysis of variance            Small to moderate (>30)   No                    AB, ABA, ATD
The C statistic                 Small (>16)               No                    AB, ABA
Time series analysis            Large (>50)               Yes                   AB, model identification
Interrupted time series         Small (>20)               Yes                   AB
Randomisation tests             Small (>30)               Yes                   AB, ABA, ATD, multi
Split-middle technique          Small (>30)               No                    AB, ABA, multi
Revusky's test of ranks         Small (>30)               Yes                   multi
P-technique factor analysis     Large (>100)              No                    model identification
Dynamic factor analysis         Large (>100)              Yes                   model identification
Structural equation modelling   Large (>100)              No                    model identification

AB = AB design; ABA = ABA design; ATD = alternating treatments design; multi = multiple baseline design.

3. Modelling of Multivariate Data to Predict Performance for a Single Subject

A multifaceted approach is needed to understand the complex phenomenon of conditioning and to identify causal dose-response relationships among training load, recovery, performance and performance readiness. Researchers have collected multivariate data to assess elite athletes' conditioning (e.g. training load, recovery, performance and performance readiness),[67-69] but have not elaborated on the methods of analysis of such data derived from studies adopting a single-subject design. P-technique factor analysis, dynamic factor analysis and structural equation modelling (SEM) are statistical methods that can be used for the analysis of multivariate time series data in single-subject studies on conditioning. Whilst the use of the first two methods is mainly limited to the reduction of the number of observed variables to a few underlying factors, SEM is usually employed to establish whether the observed data support the existence of hypothetical causal relationships between variables.

3.1 P-Technique Factor Analysis

Traditional factor analysis used in cross-sectional studies is known as the R-technique, with the dataset (score matrix) having subjects (n) as rows and variables (p) as columns (n × p).[70] Cattell[70] introduced P-technique factor analysis to assess time series data (repeated measurements on a single subject over time). P-technique factor analysis uses a T × r matrix, where T is the number of occasions and r is the number of variables measured on a single subject. For example, Kinugasa et al.[3] used P-technique factor analysis with a principal component solution (principal component analysis) to summarise 31 measures of an athlete's conditioning into seven factors (components) [e.g. training achievement, recovery achievement, sleep, specific training, fatigue, physiological needs and physical training] and identified an individualised model of conditioning. P-technique factor analysis analyses the structure of the correlations among variables to define common underlying factors, so that multivariate time series data can be summarised as a few factors or a single factor score.
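A minimal sketch of the P-technique with a principal component solution on a simulated T × r matrix; the five measures, their weights on a single latent daily state and the factor structure are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical T x r data matrix for one athlete: T = 100 daily occasions of
# r = 5 conditioning measures (e.g. training load, sleep, fatigue, mood and
# heart rate variability), here simulated to share one underlying factor.
T, r = 100, 5
factor = rng.normal(size=(T, 1))                 # latent daily state
weights = np.array([[0.8, 0.7, 0.9, 0.6, 0.5]])
X = factor @ weights + 0.5 * rng.normal(size=(T, r))

# P-technique with a principal component solution: factor the r x r
# correlation matrix of the occasions-by-variables matrix.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
R = (Z.T @ Z) / (T - 1)
eigenvalues, eigenvectors = np.linalg.eigh(R)
order = np.argsort(eigenvalues)[::-1]

# Loadings on the first component, and the daily component scores that
# summarise the athlete's multivariate series as a single factor score.
loadings = eigenvectors[:, order[0]] * np.sqrt(eigenvalues[order[0]])
scores = Z @ eigenvectors[:, order[0]]
print(np.round(loadings, 2))
```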
However, one of the limitations of P-technique factor analysis stems from the fact that it takes into account only simultaneous relationships among components of multivariate time series data. Anderson[71] criticised the technique for its failure to incorporate the lagged covariance (a relationship between two datasets with time lags) structure of the time series. Moreover, the factor loading estimates are lower than their true values in P-technique models when serial dependency exists.[72]

3.2 Dynamic Factor Analysis

Recently, dynamic factor analysis has been proposed for analysing multivariate time series data with serial dependency, to overcome the limitations of P-technique factor analysis.[72-74] This type of factor analysis represents a generalisation of the P-technique factor model that incorporates the lagged covariance structure among multivariate time series data. As such, the technique accounts for the presence of serial dependency in the data. Although dynamic factor analysis can be computed using traditional SEM software (e.g. LISREL and EQS), the usual input variance/covariance matrix created by these programs is not adequate because it does not include the lagged covariances among the multivariate time series.[72,75] Performing dynamic factor analysis using SEM requires specifying a symmetric covariance matrix containing the lagged covariances. This is done by organising the inherently asymmetric lagged covariance matrix as a block-Toeplitz matrix. A block-Toeplitz matrix replicates the information about the blocks of simultaneous and lagged covariances across variables to construct a variance-covariance matrix that is square and symmetric. In this regard, Wood and Brown[72] constructed a series of SAS macro programs to obtain a Toeplitz-transformed matrix of simultaneous and lagged covariances, which can then be used as the input variance/covariance matrix in standard SEM software such as LISREL and EQS.
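The block-Toeplitz arrangement described above can be illustrated in a few lines of numpy; this sketch only builds the lag-0 and lag-1 covariance blocks and arranges them into a square, symmetric matrix, without attempting the subsequent dynamic factor model estimation. Data and dimensions are invented.

```python
import numpy as np

def block_toeplitz_lagged_cov(X, max_lag=1):
    """Arrange the simultaneous and lagged covariance blocks of a
    multivariate time series (T x r) into a square, symmetric
    block-Toeplitz matrix of the kind used as SEM input."""
    T, r = X.shape
    Xc = X - X.mean(axis=0)
    # C[l] holds the lag-l covariance block cov(x_t, x_{t-l}).
    C = [(Xc[lag:].T @ Xc[:T - lag]) / (T - lag) for lag in range(max_lag + 1)]
    m = max_lag + 1
    out = np.zeros((m * r, m * r))
    for i in range(m):
        for j in range(m):
            lag = i - j
            block = C[lag] if lag >= 0 else C[-lag].T
            out[i * r:(i + 1) * r, j * r:(j + 1) * r] = block
    return out

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))     # hypothetical 200-day, 3-variable series
S = block_toeplitz_lagged_cov(X, max_lag=1)
print(S.shape)                    # (6, 6): two blocks of three variables
```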
3.3 Structural Equation Modelling

SEM is a general framework for statistical analysis that includes, as special cases, several traditional multivariate methods such as factor analysis, multiple regression analysis and discriminant analysis.[76-78] SEM is used to estimate hypothetical causal relationships amongst manifest (measured) and latent variables. Consequently, this technique may be useful for examining causal relationships among training load, recovery, performance and performance readiness. SEM may be used to examine both linear and non-linear relationships between variables.[76,79] Structural equation models are often visualised by a graphical path diagram, which depicts the expected relationships between the examined variables.[80] In general, the aim of SEM is to assess the discrepancy (lack of fit) between the observed covariance matrix and an expected covariance matrix based on a hypothetical path diagram. SEM can be used with cross-sectional and repeated measures data.

One of the first SEM methods for the analysis of repeated measures data is the autoregressive cross-lagged panel model (ARCL).[81] This modelling strategy comprises two main components. First, later measures of a construct (e.g. current performance) are predicted by earlier measures of the same construct (e.g. last week's performance), thus giving rise to the term 'autoregressive'. Secondly, later measures of one construct (e.g. current performance) can be regressed on earlier measures of other constructs (e.g. yesterday's mood). This second component gives rise to the term 'cross-lagged'. Despite the widespread use of the ARCL modelling approach, this analytic technique has been criticised for both statistical and theoretical reasons. For example, ARCL models usually omit the observed mean structure. This means that potentially important information about mean changes over time is ignored. Additionally, in ARCL models, the changes in a construct between two timepoints are independent of the influence of earlier and later changes in the same construct. This approach reduces multiple repeated measures to a series of two-timepoint comparisons, which is seldom consistent with the structure of the observed data. Latent curve analysis is a SEM method that has been developed to overcome the limitations of the ARCL method. Latent curve analysis aims at exploring the unobserved factors that are hypothesised to underlie temporal changes in, and relationships among, variables.[82] The latent curve model attempts to estimate the continuous trajectory that gave rise to the observed repeated measures. Both ARCL models and latent curve analysis may be useful for building individualised models of conditioning to predict performance in individual athletes.

4. General Issues in Single-Subject Research

Inter-individual differences in response to athletes' conditioning should be considered when assessing changes in conditioning data, especially in elite athletes.[3,83] Single-subject experimental designs address this issue, but with the most obvious problem that the results are unlikely to be relevant to other individuals.[14,84,85] As mentioned in section 2.2, it is important to recognise that statistical significance and practical meaningfulness are two different things.[14,38] Practical meaningfulness refers to the practical value or importance of the effect of an intervention.[18] For example, performance may be enhanced to a level that is important to an athlete, yet the change may not be statistically significant. Statistical significance is not the only way to assess an elite athlete's conditioning. If the variable being measured has a sufficiently small error of measurement, we can detect the smallest worthwhile change with repeated measurement of the variable. Thus, error of measurement must be taken into account in a preliminary reliability study to make sure the variable is worth measuring.[86] Hopkins[87] has addressed the various ways to estimate the chances that a substantial (greater than the smallest practically meaningful effect) change has occurred after an intervention. Therefore, it is necessary to interpret data carefully before reaching any conclusions in single-subject research. Theoretical and practical considerations, and visual analysis by coaches and sports scientists, are helpful in supporting findings.

5. Conclusion

Despite the problems associated with data analyses in single-subject research (e.g. serial dependency and generality of findings), it is recommended that sports scientists use single-subject experimental designs in applied conditioning research to assess the effect of an intervention (e.g. a specific training method) and to predict performance for a particular athlete. The single-subject approach is applicable to specific categories of subjects such as elite or overtrained athletes.
Single-subject research designs are rare in the literature.[88] Additionally, data from these rare studies have most often been assessed by visual analysis, a fairly subjective technique of data analysis. We believe that the use of statistical analyses in conjunction with visual analysis will produce more reliable and objective findings than those based on visual analysis alone.[36] Further, we believe the application of single-subject experimental designs and data analyses for the assessment of athlete conditioning has an important place in effectively and efficiently monitoring elite athletes.

Acknowledgements

The authors wish to gratefully acknowledge Professor Will G. Hopkins, Auckland University of Technology, Auckland, New Zealand, for his invaluable contribution to this manuscript. The authors have provided no information on sources of funding or on conflicts of interest directly relevant to the content of this review.

References
1. Lehmann M, Baumgartl P, Wiesenack C, et al. Training-overtraining: influence of a defined increase in training volume vs training intensity on performance, catecholamines and some metabolic parameters in experienced middle- and long-distance runners. Eur J Appl Physiol 1992; 64: 169-77
2. Mujika I, Chatard JC, Busso T, et al. Effects of training on performance in competitive swimming. Can J Appl Physiol 1995; 20: 395-406
3. Kinugasa T, Miyanaga Y, Shimojo H, et al. Statistical evaluation of conditioning using a single-case design. J Strength Cond Res 2002; 16: 466-71
4. Rowbottom DG, Morton A, Keast D. Monitoring for overtraining in the endurance performer. In: Shephard RJ, Åstrand PO, editors. Endurance in sport: volume II of the encyclopaedia of sports medicine. 2nd ed. Malden (MA): Blackwell Scientific, 2000: 486-504
5. Bompa TO. Theory and methodology of training: the key to athletic performance. 3rd ed. Dubuque (IA): Kendall/Hunt, 1994
6. Matveyev L. Fundamentals of sports training. Moscow: Progress Publishers, 1981
7. Bagger M, Petersen PH, Pedersen PK. Biological variation in variables associated with exercise training. Int J Sports Med 2003 Aug; 24 (6): 433-40
8. Martin DT, Andersen MB, Gates W. Using profile of mood states (POMS) to monitor high-intensity training in cyclists: group versus case studies. Sport Psychol 2000; 14: 138-56
9. Boulay MR. Physiological monitoring of elite cyclists: practical methods. Sports Med 1995; 20: 1-11
10. Fry RW, Morton AR, Keast D. Periodisation of training stress: a review. Can J Sport Sci 1992; 17: 234-40
11. Pyne DB, Gleeson M, McDonald WA, et al. Training strategies to maintain immunocompetence in athletes. Int J Sports Med 2000; 21: S51-60
12. Banister EW, Carter JB, Zarkadas PC. Training theory and taper: validation in triathlon athletes. Eur J Appl Physiol 1999; 79: 182-91
13. Busso T. Variable dose-response relationship between exercise training and performance. Med Sci Sports Exerc 2003; 35: 1188-95
14. Barlow DH, Hersen M. Single-case experimental designs: strategies for studying behavior change. 2nd ed. New York: Pergamon Press, 1984
15. Backman CL, Harris SR. Case studies, single subject research, and N of 1 randomized trials: comparisons and contrasts. Am J Phys Med Rehabil 1999; 78: 170-6
16. Bobrovitz CD, Ottenbacher KJ. Comparison of visual inspection and statistical analysis of single-subject data in rehabilitation research. Am J Phys Med Rehabil 1998; 77: 94-102
17. Hartmann DP. Forcing square pegs into round holes: some comments on 'An analysis-of-variance model for the intrasubject replication design'. J Appl Behav Anal 1974; 7: 635-8
17. Hartmann DP. Forcing square pegs into round holes: some comments on ‘an analysis-of-variance model for the intrasubject replication design’. J Appl Behav Anal 1974; 7: 635-8
18. Kazdin AE. Research design in clinical psychology. 3rd ed. Needham Heights (MA): Allyn and Bacon, 1998
19. Campbell DT. Reforms as experiments. Am Psychol 1969; 24: 409-29
20. Lerner BS, Ostrow AC, Yura MT, et al. The effect of goal-setting and imagery training programs on the free-throw performance of female collegiate basketball players. Sport Psychol 1996; 10: 382-97
21. Zhan S, Ottenbacher KJ. Single subject research designs for disability research. Disabil Rehabil 2001; 23: 1-8
22. Bryan AJ. Single-subject designs for evaluation of sport psychology interventions. Sport Psychol 1987; 1: 283-92
23. Callow N, Hardy L, Hall C. The effect of a motivational general-mastery imagery intervention on the sport confidence of high-level badminton players. Res Q Exerc Sport 2001; 72: 389-400
24. Neuman SB, McCormick S. Single-subject experimental research: applications for literacy. Newark (DE): International Reading Association, 1995
25. Richards SB, Taylor RL, Ramasamy R, et al. Single subject research: applications in educational and clinical settings. San Diego (CA): Singular Publishing Group, 1999
26. Shambrook CJ, Bull SJ. The use of a single-case research design to investigate the efficacy of imagery training. J Appl Sport Psychol 1996; 8: 27-43
27. Wolko KL, Hrycaiko DW, Martin GL. A comparison of two self-management packages to standard coaching for improving practice performance of gymnasts. Behav Modif 1993; 17: 209-23
28. Kazdin AE. Single-case research designs: methods for clinical and applied settings. New York: Oxford University Press, 1982
29. Kratochwill TR, Levin JR, editors. Single-case research design and analysis: new directions for psychology and education. Hillsdale (NJ): Lawrence Erlbaum Associates, 1992
30. Mattacola CG, Lloyd JW. Effects of a 6-week strength and proprioception training program on measures of dynamic balance: a single-case design. J Athl Train 1997; 32: 127-35
31. Parsonson BS, Baer DM. The visual analysis of data, and current research into the stimuli controlling it. In: Kratochwill TR, Levin JR, editors. Single-case research design and analysis: new directions for psychology and education. Hillsdale (NJ): Lawrence Erlbaum Associates, 1992: 15-40
32. Rowbottom DG, Keast D, Green S, et al. The case history of an elite ultra-endurance cyclist who developed chronic fatigue syndrome. Med Sci Sports Exerc 1998; 30: 1345-8
33. Ferron J, Foster-Johnson L. Analyzing single-case data with visually guided randomization tests. Behav Res Methods Instrum Comput 1998; 30: 698-706
34. DeProspero A, Cohen S. Inconsistent visual analyses of intrasubject data. J Appl Behav Anal 1979; 12: 573-9
35. Ottenbacher KJ. Reliability and accuracy of visually analyzing graphed data from single-subject designs. Am J Occup Ther 1986; 40: 464-9
36. Yamada T. Introduction of randomization tests as methods for analyzing single-case data. Jpn J Behav Anal 1998; 13: 44-58
37. Ottenbacher KJ. Interrater agreement of visual analysis in single-subject decisions: quantitative review and analysis. Am J Ment Retard 1993; 98: 135-42
38. Kazdin AE. Statistical analyses for single-case experimental designs. In: Barlow DH, Hersen M, editors. Single case experimental designs: strategies for studying behavior change. 2nd ed. New York: Pergamon Press, 1984: 285-321
39. Busk PL, Marascuilo LA. Statistical analysis in single-case research: issues, procedures, and recommendations, with applications to multiple behaviors. In: Kratochwill TR, Levin JR, editors. Single-case research design and analysis: new directions for psychology and education. Hillsdale (NJ): Lawrence Erlbaum Associates, 1992: 159-85
40. McCleary R, Welsh WN. Philosophical and statistical foundations of time-series experiments. In: Kratochwill TR, Levin JR, editors. Single-case research design and analysis: new directions for psychology and education. Hillsdale (NJ): Lawrence Erlbaum Associates, 1992: 41-91
41. Todman JB, Dugard P. Single-case and small-n experimental designs: a practical guide to randomization tests. Mahwah (NJ): Lawrence Erlbaum Associates, 2001
42. Box GEP, Jenkins GM. Time series analysis: forecasting and control. Rev ed. San Francisco (CA): Holden-Day, 1976
43. Glass GV, Willson VL, Gottman JM. Design and analysis of time-series experiments. Boulder (CO): Colorado Associated University Press, 1975
44. Crosbie J. Interrupted time-series analysis with brief single-subject data. J Consult Clin Psychol 1993; 61: 966-74
45. Hartmann DP, Gottman JM, Jones RR, et al. Interrupted time-series analysis and its application to behavioral data. J Appl Behav Anal 1980; 13: 543-59
46. Tryon WW. A simplified time-series analysis for evaluating treatment interventions. J Appl Behav Anal 1982; 15: 423-9
47. Young LC. On randomness in ordered sequences. Ann Math Stat 1941; 12: 293-300
48. Blumberg CJ. Comments on ‘A simplified time-series analysis for evaluating treatment interventions’. J Appl Behav Anal 1984; 17: 539-42
49. Crosbie J. The inappropriateness of the C statistic for assessing stability or treatment effects with single-subject data. Behav Assess 1989; 11: 315-25
50. Yamada T. Applications of statistical tests for single-case data: power comparison between randomization tests and C statistic [in Japanese]. Jpn J Behav Anal 1999; 14: 87-98
51. Edgington ES. Randomization tests. 3rd ed. New York: Marcel Dekker, 1995
52. Levin JR, Marascuilo LA, Hubert LJ. N = nonparametric randomization tests. In: Kratochwill TR, Levin JR, editors. Single-case research design and analysis: new directions for psychology and education. Hillsdale (NJ): Lawrence Erlbaum Associates, 1992: 159-85
53. Edgington ES. Statistical inference from n = 1 experiments. J Psychol 1967; 65: 195-9
54. Ferron J, Onghena P. The power of randomization tests for single-case phase designs. J Exp Educ 1996; 64: 231-9
55. Ferron J, Ware W. Analyzing single-case data: the power of randomization tests. J Exp Educ 1995; 63: 167-78
56. Cohen J. Statistical power analysis for the behavioral sciences. 2nd ed. Hillsdale (NJ): Lawrence Erlbaum Associates, 1988
57. Cohen J. A power primer. Psychol Bull 1992; 112: 155-9
58. Gorman BS, Allison DB. Statistical alternatives for single-case designs. In: Franklin RD, Allison DB, Gorman BS, editors. Design and analysis of single-case research. Mahwah (NJ): Lawrence Erlbaum Associates, 1996
59. White OR. A glossary of behavioral terminology. Champaign (IL): Research Press, 1971
60. White OR. A manual for the calculation and use of the median slope: a technique of progress estimation and prediction in the single case. Eugene (OR): University of Oregon, Regional Resource Center for Handicapped Children, 1972
61. White OR. The ‘Split Middle’: a ‘Quickie’ method of trend estimation. Seattle (WA): University of Washington, Experimental Education Unit, Child Development and Mental Retardation Center, 1974
62. Nourbakhsh MR, Ottenbacher KJ. The statistical analysis of single-subject data: a comparative examination. Phys Ther 1994; 74: 768-76
63. Marlow C, Bull SJ, Heath B, et al. The use of a single case design to investigate the effect of a pre-performance routine on the water polo penalty shot. J Sci Med Sport 1998; 1: 143-55
64. Silliman LM, French R. Use of selected reinforcers to improve the ball kicking of youths with profound mental retardation. Adapt Phys Activ Q 1993; 10: 52-69
65. Revusky SH. Some statistical treatments compatible with individual organism methodology. J Exp Anal Behav 1967; 10: 319-30
66. Wolery M, Billingsley FF. The application of Revusky’s Rn test to slope and level changes. Behav Assess 1982; 4: 93-103
67. Mackinnon LT, Hooper SL. Overtraining and overreaching: causes, effects, and prevention. In: Garrett Jr WE, Kirkendall DT, editors. Exercise and sport science. Philadelphia (PA): Lippincott Williams and Wilkins, 2000: 487-98
68. McKenzie DC. Markers of excessive exercise. Can J Appl Physiol 1999; 24: 66-73
69. Rowbottom DG, Keast D, Morton A. Monitoring and prevention of overreaching and overtraining in endurance athletes. In: Kreider RB, Fry AC, O’Toole ML, editors. Overtraining in sport. Champaign (IL): Human Kinetics, 1998: 47-66
70. Cattell RB. Factor analysis. New York: Holt, 1952
71. Anderson TW. The use of factor analysis in the statistical analysis of multiple time series. Psychometrika 1963; 28: 1-25
72. Wood P, Brown D. The study of intraindividual differences by means of dynamic factor models: rationale, implementation, and interpretation. Psychol Bull 1994; 116: 166-86
73. Molenaar PCM. A dynamic factor model for the analysis of multivariate time series. Psychometrika 1985; 50: 181-202
74. Molenaar PCM, Rovine MJ, Corneal SE. Dynamic factor analysis of emotional dispositions of adolescent stepsons toward their stepfathers. In: Silbereisen R, von Eye A, editors. Growing up in times of social change. New York: DeGruyter, 1999: 287-318
75. Hershberger SL, Corneal SE, Molenaar PCM. Dynamic factor analysis: an application to emotional response patterns underlying daughter/father and stepdaughter/stepfather relationships. Struct Equat Model 1994; 2: 31-52
76. Hox JJ, Bechger TM. An introduction to structural equation modeling. Fam Sci Rev 1998; 11: 354-73
77. Marsh HW, Grayson D. Longitudinal stability of latent means and individual differences: a unified approach. Struct Equat Model 1994; 1: 317-59
78. Raykov T, Widaman KF. Issues in applied structural equation modeling research. Struct Equat Model 1995; 2: 289-318
79. Browne MW, Cudeck R. Alternative ways of assessing model fit. In: Bollen KA, Long JS, editors. Testing structural equation models. Newbury Park (CA): Sage Publications, 1993
80. Rigdon EE. Software review: Amos and AmosDraw. Struct Equat Model 1994; 1: 196-201
81. Anderson TW. Some stochastic process models for intelligence test scores. In: Arrow KJ, Karlin S, Suppes P, editors. Mathematical methods in the social sciences. Stanford (CA): Stanford University Press, 1960
82. Meredith W, Tisak J. Latent curve analysis. Psychometrika 1990; 55: 107-22
83. Mackinnon LT. Overtraining effects on immunity and performance in athletes. Immunol Cell Biol 2000; 78: 502-9
84. Bates BT. Single-subject methodology: an alternative approach. Med Sci Sports Exerc 1996; 28: 631-8
85. Reboussin DM, Morgan TM. Statistical considerations in the use and analysis of single-subject designs. Med Sci Sports Exerc 1996; 28: 639-44
86. Hopkins WG. Measures of reliability in sports medicine and science. Sports Med 2000; 30: 1-15
87. Hopkins WG. Probabilities of clinical or practical significance [online]. Available from URL: http://sportsci.org/jour/0201/wghprob.htm [Accessed 2004 Oct 27]
88. Hrycaiko D, Martin GL. Applied research studies with single-subject designs: why so few? J Appl Sport Psychol 1996; 8: 183-99

Correspondence and offprints: Dr Sue Hooper, Centre of Excellence for Applied Sport Science Research, Queensland Academy of Sport, PO Box 956, Sunnybank, QLD 4109, Australia.
E-mail: sue.hooper@srq.qld.gov.au