Neurorehabilitation and Neural Repair (http://nnr.sagepub.com/), Volume 23, Number 7, September 2009, 662-667. © 2009 The Author(s). DOI: 10.1177/1545968309335975. Originally published online 4 June 2009. The online version of this article can be found at: http://nnr.sagepub.com/content/23/7/662. Published by: http://www.sagepublications.com, on behalf of the American Society of Neurorehabilitation.

Minimal Detectable Change Scores for the Wolf Motor Function Test

Stacy L. Fritz, PhD, MSPT, Sarah Blanton, DPT, Gitendra Uswatte, PhD, Edward Taub, PhD, and Steven L. Wolf, PhD, PT

Background. The Wolf Motor Function Test (WMFT) is an impairment-based test whose psychometrics have been examined by previous reliability and validity studies. Standards for evaluating whether a given change is meaningful, however, have not yet been addressed.

Objectives. To determine the standard error of measurement (SEM) and minimal detectable change (MDC) for the WMFT.

Methods. Data were collected from 6 university laboratories that participated in the EXCITE national clinical trial and included 96 individuals with sub-acute stroke (3–9 months). Measurements were made by blinded evaluators who were trained and standardized to administer the WMFT, which was completed on 2 occasions 2 weeks apart.
No intervention was given between testing sessions.

Results. The WMFT Performance Time score has an SEM of 0.2 seconds and an MDC95 of 0.7 seconds. The MDC95 for the individual timed task items ranged from 1.0 second (turn key in lock) to 3.4 seconds (reach and retrieve), with individual task items demonstrating notably higher variability than the average WMFT Performance Time. The SEM and MDC95 for the average WMFT Functional Ability Scale score are each 0.1 points.

Conclusions. When assessing the effect of a therapeutic intervention, if an individual experiences an amount of change equal to or greater than the MDC, then one may be 95% confident that this margin of change is truly larger than measurement error and not a chance result. Thus, the determination of SEM and MDC in outcome assessments allows researchers and clinicians to distinguish which results are actual differences versus which results are simply changes resulting from error or chance.

Keywords: Reliability; Minimal detectable change; Wolf Motor Function Test; Standard error of measure; Stroke

Determining whether therapeutic gains are statistically significant is important when evaluating results from a clinical trial; however, the results should also be large enough to be clinically meaningful. An important first step in establishing standards for meaningful changes is identifying the measurement error of the instruments used to test treatment efficacy. Often, trials that have a large enough sample can report significant results but still have changes on the outcome measures that are smaller than their measurement error. Knowing an instrument’s measurement error is also valuable to clinical practice, because changes after therapy for individual patients can be compared with the measurement error to evaluate the meaningfulness of the changes observed without need for statistical testing.
The Wolf Motor Function Test (WMFT) is one of the most frequently cited outcome measures used in stroke rehabilitation studies1–5 and has been used to assess upper-extremity function in research labs in over 16 countries. This impairment-based test, developed by Wolf et al6 and later modified by Taub et al,7 quantifies upper extremity movement ability through timed single or multiple joint motions and functional tasks. Progressing from proximal to distal joint movement, the test consists of 15 timed items, 2 strength measures, and a quality of motor function scale for each of the timed items (6-point Functional Ability Scale). The WMFT starts with simple items, such as placing the hand on a table top, and progresses to more challenging fine motor tasks, such as stacking checkers or picking up a paper clip. The test was originally devised to assist clinicians in guiding treatment interventions based upon specific joint-related movement limitations. As evidenced by over 90 citations in the stroke literature to date, its psychometric properties support the utility of the WMFT in research studies. 
Author affiliations: From the Arnold School of Public Health, Department of Exercise Science, Physical Therapy Program, University of South Carolina, Columbia, South Carolina (SLF); School of Medicine, Department of Rehabilitation Medicine, Division of Physical Therapy, Emory University, Atlanta, Georgia (SB); Departments of Psychology and Physical Therapy, University of Alabama at Birmingham, Birmingham, Alabama (GU); Department of Psychology, University of Alabama at Birmingham, Birmingham, Alabama (ET); and School of Medicine, Departments of Rehab Medicine, Medicine, and Cell Biology, Nell Hodgson Woodruff School of Nursing, Health and Elder Care, Emory University, Atlanta VA Rehab R&D Center, Atlanta, Georgia (SLW). Address correspondence to Stacy Fritz, PhD, MSPT, Department of Exercise Science, Third Floor PHRC, 921 Assembly Street, Columbia SC, 29208. E-mail: sfritz@mailbox.sc.edu

The reliability and validity of the WMFT have been ascertained in chronic stroke (Morris et al,8 and Wolf et al,9 respectively) and subacute stroke.10 Used as one of the primary outcome measures in the EXCITE (Extremity Constraint-Induced Therapy Evaluation) trial,11 the sensitivity of the instrument was such that higher functioning participants were differentiated from lower functioning participants across 6 research sites and scores were not influenced by hand dominance or affected side.
In healthy individuals without impairments, the WMFT has been shown to be sensitive to the extent that scores were significantly different for the dominant versus the nondominant extremities.9 Stability of the WMFT has been supported by the lack of change in scores from 24 chronic stroke survivors residing in the community tested on 2 occasions separated by approximately 2 weeks.8 While important psychometrics of the WMFT have been established through previous reliability and validity studies,8–10 further information is needed to fully interpret and apply test scores. For instance, while reliable, just how far away does an individual score need to lie from a norm for this difference not to be considered an error of measurement? How can a researcher or clinician know if pretherapy to posttherapy changes on the WMFT exceed the minimal change necessary to be sure the change observed was not simply due to chance? Determining the standard error of measurement (SEM) and minimal detectable change (MDC) of the WMFT can help clarify the meaning of scores obtained in both research and clinical settings. The roles of SEM and MDC in test score interpretation with respect to reliability, measurement error, and test–retest variation are described below. Reliability refers to the repeatability of a measurement. If a test is perfectly error-free then an individual should get the same score on the test when given on 2 separate occasions12; this is assuming that the underlying construct being measured does not change. Thus, reliability indicates the consistency of a measurement and provides an indication of the amount of confidence that can be placed in interpretation of results. 
The intraclass correlation coefficient (ICC) reflects the degree of similarity and agreement among measurements and is a ratio of the between-classes variance to the total variance.12,13 ICCs permit the comparison of 2 or more repeated measures and range from 0.0 to 1.0, with higher values indicating a greater degree of relationship between variables. The SEM is the standard deviation (SD) of measurement errors and reflects the extent of expected error in repeated measurements.12,14 The SEM is expressed in the units of the measure. Measurement errors are assumed to be normally distributed; therefore, there is approximately a 95% probability that an individual’s true value lies within 2 SEMs of the original measurement. Thus, the ICC provides information on the relative reliability of a measure by assessing the degree of association between repeated measurements, and the SEM provides information on the absolute reliability of a measure by assessing the extent to which a score may vary, on average, with repeated measurements because of measurement error.12 The SEM reflects the average error in measurement; however, it is incorrect to interpret a change or difference that is larger than the SEM as a “real” difference or, more precisely, as a difference that would be unlikely by chance. For this purpose, the MDC should be calculated; it reflects the smallest detectable difference that can be considered a “real change,” one that exceeds measurement error by a wide enough margin not to be considered a chance result. A change in an observed value that is less than the MDC can be considered similar to an error in measurement. The aim of this article is to explore the absolute reliability of the WMFT as a tool to document upper extremity function by establishing the SEM and MDC.
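The ICC described above can be illustrated with a small calculation. The following is a minimal sketch, assuming the two-way, consistency, average-measures form of the coefficient (often written ICC(C,k) in McGraw and Wong's notation), computed from the mean squares of a participants-by-sessions table. The function name `icc_consistency_average` and the sample data are hypothetical and are not taken from the EXCITE data set.

```python
def icc_consistency_average(scores):
    """Two-way consistency ICC for average measures: (MS_rows - MS_error) / MS_rows.

    `scores` is a list of rows, one per participant, each holding that
    participant's score at every testing session (same number of sessions).
    """
    n = len(scores)
    if n < 2:
        raise ValueError("need at least 2 participants")
    k = len(scores[0])
    if k < 2 or any(len(row) != k for row in scores):
        raise ValueError("every participant needs the same number (>= 2) of sessions")

    grand_mean = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Sums of squares for participants (rows), sessions (columns), and total.
    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_total = sum((x - grand_mean) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    if ms_rows == 0:
        raise ValueError("no between-participant variance; ICC is undefined")
    return (ms_rows - ms_error) / ms_rows


if __name__ == "__main__":
    # Hypothetical test-retest scores for three participants.
    print(icc_consistency_average([[1, 2], [2, 1], [3, 4]]))
```

Because this is a consistency coefficient, a uniform shift between sessions (for example, every participant being slightly faster at retest) does not lower the value; only changes in the participants' relative standing do.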
Specifically, the SEM and MDC were determined using WMFT data from participants 3 to 9 months after stroke in the EXCITE national clinical trial who underwent repeated testing separated by a 2-week interval but did not receive Constraint Induced Movement therapy (CIMT) or any outside intervention during this time. Methods Data were gathered by investigators at the 6 EXCITE clinical sites (Emory University, University of Southern California, University of Alabama at Birmingham, University of Florida, University of North Carolina-Chapel Hill with Wake Forest University, and the Ohio State University), all of which had institutional review board approval. For a full description of EXCITE trial methodology, see Winstein et al (2003).11 Participants (mean age, 62.3 years; range, 19–90 years) were assigned to either an immediate or delayed group condition. The immediate group received 2 weeks of CIMT during this sub-acute period, while the delayed group received the therapy 1 year later, in the chronic phase of the stroke. The delayed group’s first and second testing sessions were used in this analysis, which were separated by approximately 2 weeks. All participants were required to have the minimal movement criteria for CIMT, which includes the ability to actively extend the wrist, thumb, and at least 2 other digits greater than 10 degrees. All EXCITE evaluators were masked to participant treatment group designation. These study personnel were trained and standardized to administer the WMFT through a thorough review of specific instructions and repeated practice. Evaluator performance was re-examined at yearly intervals throughout the trial. All components of the test were also standardized, including placement of items, chair, and table. Statistical Analysis The total sample for the study was 116 individuals with sub-acute stroke. 
Participants who declined to participate (n = 7), withdrew from the study before their second test (n = 12), or had no score reported for any task on the timed portion of the WMFT (n = 1) were removed from the data set. Therefore the final sample size was 96. The normality of average scores for the WMFT (both Performance Time scores and Functional Ability Scale scores) was visually verified with probability plots (P-P) and statistically verified with the Kolmogorov-Smirnov test (P > .05). The Functional Ability Scale test scores met assumptions of normality. Although functional ability on individual items is rated using an ordinal scale, the test score is treated as parametric data because, as the average of the item scores, it can take on a very large number of values. The timed portion of the WMFT, however, required transformation using the natural log (ln) to meet assumptions of normality so that the intraclass correlation coefficient (ICC) could be used. The average measures of the ICC were calculated using a 2-way mixed effect model, with a consistency coefficient.15 The measures for the ICC were then entered into the following SEM formula: SEM = SD × √(1 − ICC) for the average task scores, where SD is the standard deviation of the item difference scores. For the data that required transformation, the SD of the transformed item difference score was used, so that the SD would come from a normal distribution. The SEM was then transformed back to the original units using the inverse of the natural log. The MDCs were then calculated using the following formulas16–18: MDC90 = 1.64 × SEM × √2 and MDC95 = 1.96 × SEM × √2,
where “1.64” in the MDC90 reflects the 90% confidence interval (CI), the “1.96” in the MDC95 reflects the 95% CI, and √2 accounts for the additional uncertainty introduced by using difference scores from measurements obtained at 2 different time points.19 Calculating the index of reliability for each individual task item for the Functional Ability Scale score is not appropriate, as these are scored with ordinal numbers. To calculate the item-level SEM and MDCs for each timed task, scores of 121 were entered as missing data representing that the participant was unable to complete or perform the test item. This procedure is both for a practical and theoretical purpose. Practically, the 121 scores skewed the data and all transformations attempted did not allow the data to present in a normal distribution. Theoretically, a score of 121 is assigned, not earned; therefore it does not represent a real score. Thus, all the SEM and MDCs for the individual timed task items are only applicable to individuals who complete a given task in 120 seconds or less. Only 16% of the timed items were assigned a score of 121. An additional 5% of the data points, however, could not be used in calculating the indices of reliability for the individual timed scores because their matched pair was a 121 (eg, score was < 121 on the first test but equal to 121 on the second). The same procedure used for the average test scores was then used to calculate the ICC, SEM, MDC90, and MDC95 for the individual scores, including data transformation. For the grip strength task (measured in lbs) on the WMFT, scores of “0” were changed to “0.1” to allow for transformation of data using SQRT to obtain a normal distribution. The task of weight to box (in lbs) did not present as a normal distribution with any attempted transformations, but was still included in the results and should be interpreted with caution. 
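The formulas in this section can be sketched in a few lines of code. This is an illustration under stated assumptions, not the authors' SPSS procedure: the function names are hypothetical, the input values in the usage example are made up, and the back-transformation helper follows a literal reading of the sentence that the SEM was returned to the original units with the inverse of the natural log.

```python
import math

Z_90 = 1.64  # multiplier the article uses for the 90% confidence level
Z_95 = 1.96  # multiplier the article uses for the 95% confidence level


def sem_from_sd_icc(sd, icc):
    """SEM = SD * sqrt(1 - ICC), with SD taken from the difference scores."""
    if sd < 0:
        raise ValueError("SD cannot be negative")
    if not 0.0 <= icc <= 1.0:
        raise ValueError("ICC must lie between 0 and 1")
    return sd * math.sqrt(1.0 - icc)


def sem_back_transformed_from_ln(sd_ln, icc):
    """SEM computed on ln-transformed scores, then returned to original units.

    Assumption: 'inverse of the natural log' is read as applying exp() to
    the SEM obtained on the transformed scale.
    """
    return math.exp(sem_from_sd_icc(sd_ln, icc))


def mdc(sem, z):
    """MDC = z * SEM * sqrt(2); sqrt(2) reflects the two measurement occasions."""
    return z * sem * math.sqrt(2.0)


if __name__ == "__main__":
    # Made-up inputs for illustration only (not values from the study).
    sem = sem_from_sd_icc(sd=2.0, icc=0.96)
    print(f"SEM   = {sem:.2f}")
    print(f"MDC90 = {mdc(sem, Z_90):.2f}")
    print(f"MDC95 = {mdc(sem, Z_95):.2f}")
```

Reported values in the article are rounded, so recomputing an MDC from a rounded SEM will not necessarily reproduce the published figure exactly.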
The averages and standard deviations for the WMFT Performance Time scores and Functional Ability Scale scores were calculated. The individual item scores were calculated following exclusion of all the data with a score of 121. All statistical calculations were completed using SPSS version 16.0.

Results

Sixty men and 36 women participated in this study. Forty-three had right-side hemiparesis, while 53 had left-side hemiparesis; half had dominant-side hemiparesis. The average and item scores for the individual Performance Time tasks, the strength tasks, and the Functional Ability Scale scores for the affected upper extremity are presented in Table 1. The pretest average Performance Time score across all items was 28.5 (32.4) seconds, and the average posttest score following 2 weeks of no intervention decreased to 25.8 (30.8) seconds. The average Functional Ability Scale score improved from 2.5 (0.7) at pretest to 2.6 (0.7) at posttest. Results for the relative and absolute reliability indices are given in Table 2. The Performance Time test score had an ICC of 0.98 with an MDC95 of 0.7 seconds. The MDC95 values for the individual timed task items ranged from 1.0 second (turn key in lock) to 3.4 seconds (reach and retrieve). The Functional Ability Scale score ICC was 0.96 with an MDC95 of 0.1 points.

Discussion

This study determined the SEM and MDC values for the WMFT in a cohort of patients with sub-acute stroke. These results allow us to determine (a) the range of an individual’s or group’s true score, by using the measured score and the SEM, and (b) whether the change in a patient or group of patients following therapy exceeded the minimal change necessary to conclude that it is most likely not simply due to error, by using the MDC. These data apply to patients with some residual movement ability 3 to 9 months after stroke.
However, the psychometrics of the WMFT are similar in chronic and sub-acute patients,10 so this observation may suggest that these absolute reliability statistics can be applied to chronic patients also. This hypothesis, however, should be tested in future research.

The interpretation of SEM is as follows: we are 95% confident the person’s true score was between [measured score − (2*SEM)] and [measured score + (2*SEM)]. For example, if a participant scored an average of 17.5 seconds on WMFT Performance Time for the more affected upper extremity, we can be 95% confident that his or her true score was 17.5 ± 0.4 seconds, that is, from 17.1 to 17.9 seconds. With such a small SEM and high ICC, the test score is considered to have excellent reliability.

There is need for caution when using strictly statistical analysis (ie, P value) as a surrogate for meaningfulness of change. Mean change scores may demonstrate statistically significant results, especially when larger samples are used. For example, although no therapy was received between the testing sessions in this analysis, there was a significant difference between the first and second testing sessions in WMFT Performance Time (P = .004, ln transformed, 2-tailed, paired t test). The importance of statistically significant findings can be unclear and may simply be due to having a large sample or substantial similarity in scoring trends across participants (low standard deviation). The inclusion or development of additional indices to determine whether data exceed measurement error (MDC) or whether the results are clinically meaningful (MCID) can help with interpretation of statistical results. For these data, the average WMFT Performance Time scores exceeded the MDC95 from the first testing session to the second, with an average change of 2.7 seconds (28.5 to 25.8 seconds; Table 1). This result would indicate that we can be 95% confident the change in Performance Time seen in this sample (no intervention) was not due to measurement error. This “improvement” may be due to a practice effect, which includes familiarity with the assessment, or to spontaneous recovery. The experimental group, however, which received CI therapy, had a much larger reduction in Performance Time.20 For this sample, the change in Functional Ability Scale scores from first to second session was equal to the MDC95 value. The absolute reliability is a particularly important statistic for clinicians because scores from individual patients can be compared with these standards to evaluate whether any changes observed after therapy are larger than the instrument error, and hence meaningful.

Table 1. Average and Item Scores for the WMFT (a)

| Item No. | Item Description | Time (sec) / Weight (lbs), Pre (SD) | Time (sec) / Weight (lbs), Post (SD) | Functional Ability Scale Score, Pre (SD) | Functional Ability Scale Score, Post (SD) |
|---|---|---|---|---|---|
| 1 | Forearm to table | 2.6 (5.0) | 1.9 (1.7) | 3.0 (0.5) | 3.1 (0.6) |
| 2 | Forearm to box | 5.1 (10.2) | 3.5 (3.9) | 2.6 (0.8) | 2.8 (0.8) |
| 3 | Extend elbow | 3.3 (9.2) | 6.0 (17.3) | 2.8 (1.0) | 3.0 (1.0) |
| 4 | Extend elbow with weight | 3.5 (12.1) | 2.8 (8.9) | 2.8 (0.9) | 2.9 (1.0) |
| 5 | Hand to table (front) | 1.9 (1.3) | 1.9 (1.7) | 2.9 (0.6) | 3.1 (0.6) |
| 6 | Hand to box (front) | 2.4 (4.8) | 1.7 (1.2) | 2.7 (0.9) | 3.0 (1.0) |
| 7 | Weight to box (lbs) | 5.2 (6.0) | 5.9 (6.5) | na | na |
| 8 | Reach and retrieve | 1.5 (1.1) | 1.4 (1.1) | 2.7 (1.4) | 3.0 (1.3) |
| 9 | Lift can | 6.3 (7.9) | 4.7 (4.4) | 2.4 (0.9) | 2.6 (1.0) |
| 10 | Lift pencil | 11.6 (23.2) | 8.8 (15.5) | 2.2 (0.9) | 2.2 (0.9) |
| 11 | Lift paper clip | 10.6 (19.5) | 7.5 (11.8) | 2.3 (0.9) | 2.4 (1.0) |
| 12 | Stack checkers | 23.8 (23.7) | 21.2 (22.1) | 2.0 (0.9) | 2.0 (0.9) |
| 13 | Flip cards | 17.6 (19.2) | 15.5 (17.3) | 2.3 (0.8) | 2.2 (0.8) |
| 14 | Grip strength (lbs) | 9.5 (7.9) | 10.4 (8.7) | na | na |
| 15 | Turn key in lock | 14.7 (19.6) | 12.4 (16.1) | 2.3 (0.9) | 2.2 (1.0) |
| 16 | Fold towel | 13.3 (15.9) | 15.7 (16.4) | 2.3 (1.0) | 2.5 (1.0) |
| 17 | Lift basket | 9.6 (15.9) | 10.7 (17.4) | 2.2 (1.1) | 2.4 (1.0) |
| | Average score | 28.5 (32.4) | 25.8 (30.8) | 2.5 (0.7) | 2.6 (0.7) |

Abbreviations: WMFT, Wolf Motor Function Test; SD, standard deviation; na, not applicable.
(a) The average scores for the first testing session (pre) and the second testing session (post) for each item of the WMFT are displayed.

Table 2. Reliability Indices for WMFT (a)

| Item No. | Item Description | ICC Point Estimate (95% CI) | SEM | 90% MDC | 95% MDC | N |
|---|---|---|---|---|---|---|
| | Average WMFT time score | 0.98 (0.96-0.98) | 0.2 | 0.5 | 0.7 | 96 |
| 1 | Forearm to table | 0.80 (0.70-0.87) | 0.8 | 1.8 | 2.1 | 92 |
| 2 | Forearm to box | 0.87 (0.80-0.92) | 0.6 | 1.4 | 1.6 | 82 |
| 3* | Extend elbow | 0.88 (0.82-0.93) | 0.6 | 1.4 | 1.7 | 82 |
| 4 | Extend elbow with weight | 0.81 (0.70-0.87) | 0.8 | 2.0 | 2.4 | 86 |
| 5 | Hand to table (front) | 0.86 (0.79-0.91) | 0.5 | 1.3 | 1.5 | 91 |
| 6 | Hand to box (front) | 0.81 (0.70-0.88) | 0.7 | 1.6 | 1.9 | 77 |
| 7* | Weight to box (lbs) | 0.94 (0.91-0.96) | 1.9 | 4.3 | 5.2 | 96 |
| 8 | Reach and retrieve | 0.80 (0.70-0.86) | 1.2 | 2.8 | 3.4 | 92 |
| 9 | Lift can | 0.83 (0.73-0.89) | 0.7 | 1.6 | 2.0 | 72 |
| 10* | Lift pencil | 0.78 (0.65-0.86) | 1.1 | 2.5 | 3.0 | 71 |
| 11 | Lift paper clip | 0.84 (0.75-0.90) | 0.8 | 1.8 | 2.2 | 73 |
| 12 | Stack checkers | 0.73 (0.53-0.84) | 1.1 | 2.6 | 3.2 | 56 |
| 13 | Flip cards | 0.91 (0.85-0.94) | 0.4 | 1.0 | 1.2 | 73 |
| 14 | Grip strength (lbs) | 0.98 (0.97-0.99) | 0.0 | 0.1 | 0.1 | 96 |
| 15 | Turn key in lock | 0.94 (0.90-0.96) | 0.4 | 0.8 | 1.0 | 69 |
| 16 | Fold towel | 0.91 (0.85-0.94) | 0.4 | 1.0 | 1.2 | 69 |
| 17 | Lift basket | 0.86 (0.77-0.92) | 0.7 | 1.7 | 2.0 | 57 |
| | Average WMFT FAS | 0.96 (0.94-0.98) | 0.1 | 0.1 | 0.1 | 95 |

Abbreviations: ICC, intraclass correlation coefficient; CI, confidence interval; SEM, standard error of measurement; MDC, minimal detectable change; N, number; WMFT, Wolf Motor Function Test; FAS, Functional Ability Scale.
(a) The SEM and MDCs for the average scores were calculated from the average time (seconds) and average Functional Ability Scale (FAS) scores for the more affected upper extremity. For the individual tasks: (1) all SEM and MDC measurements are in seconds with the exception of weight to box and grip strength, which are in pounds; (2) the individual task SEMs and MDCs apply only to participants who are able to complete the task for the timed tasks; therefore, for participants who are assigned a score of 121, indices are not applicable. For individual items demarcated with an *, the data did not represent a normal distribution following transformation (P < .01) according to Kolmogorov-Smirnov tests; the indices are still reported because P-P plots are similar to other items.
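The SEM interpretation given in the Discussion (measured score ± 2 × SEM) can be written as a short helper. This is a minimal sketch; the function name `true_score_interval` is hypothetical, and the inputs are the worked example from the text (a measured average Performance Time of 17.5 seconds and the reported SEM of 0.2 seconds).

```python
def true_score_interval(measured, sem, multiplier=2.0):
    """Return (low, high) bounds for the true score: measured +/- multiplier * SEM.

    The default multiplier of 2 follows the article's approximate 95% band.
    """
    if sem < 0:
        raise ValueError("SEM cannot be negative")
    half_width = multiplier * sem
    return measured - half_width, measured + half_width


if __name__ == "__main__":
    low, high = true_score_interval(measured=17.5, sem=0.2)
    print(f"Approximate 95% band for the true score: {low:.1f} to {high:.1f} seconds")
```

The same helper applies to the Functional Ability Scale average by passing its SEM of 0.1 points.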
Currently we know the test is highly reliable,10 but what if a participant scores an average of 17.5 seconds on the first test and 17.0 seconds on the second test? Can we be confident that this change is an improvement, or is the change simply due to measurement error? The MDC95 is 0.7 seconds for the average WMFT Performance Time score. Therefore, to be 95% confident that a participant improved, the second score would have to be less than 17.5 seconds minus 0.7 seconds, that is, less than 16.8 seconds. If the participant scores less than 16.8 seconds, then we are 95% confident that the change is “true” and not due to measurement error. In addition to the Performance Time scores, the average Functional Ability Scale scores also have reliability indices determined. To be 95% confident that individuals improved on the Functional Ability Scale score, they must improve their average score by 0.1 points or more. The results also allow therapists to determine which individual items exceed the minimal detectable change. Due to the variability in scores and to the assumptions required for parametric statistics, the individual results only apply to participants who are able to complete a task (scoring less than 121 seconds on a particular task). While the absolute reliability is established for the individual task items, the SEM and MDC values for individual item Performance Time scores were larger (with wider ICC confidence intervals) than for the aggregate test score. This discrepancy may be due to changes in patient tone across particular joints that can affect performance on individual items, or just the general heterogeneity of upper-extremity motor deficits after stroke, even among patients whose attributes are defined by the specificity inherent in a clinical study, like the EXCITE Trial, from which this participant pool was drawn. Therefore, using a single WMFT item as an assessment is less reliable than using the aggregate test score.
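The decision rule in the worked example above can be sketched as follows. The function name and the `lower_is_better` flag are hypothetical; the comparison uses "equal to or greater than the MDC", following the wording of the abstract, and the inputs are the values quoted in the text.

```python
def improved_beyond_mdc(pre, post, mdc, lower_is_better=True):
    """Return True when the improvement from pre to post is at least the MDC.

    For timed scores a lower value is better, so improvement is pre - post.
    For Functional Ability Scale scores a higher value is better, so
    improvement is post - pre (pass lower_is_better=False).
    """
    if mdc < 0:
        raise ValueError("MDC cannot be negative")
    improvement = (pre - post) if lower_is_better else (post - pre)
    return improvement >= mdc


if __name__ == "__main__":
    mdc95_time = 0.7  # seconds, average WMFT Performance Time (from the article)
    # 17.5 s -> 17.0 s is a 0.5 s change, which is within measurement error.
    print(improved_beyond_mdc(17.5, 17.0, mdc95_time))
    # The sample's average change, 28.5 s -> 25.8 s, is 2.7 s.
    print(improved_beyond_mdc(28.5, 25.8, mdc95_time))
```

A change that passes this check is larger than measurement error at the stated confidence level; it says nothing by itself about whether the change matters to the patient.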
Moreover, validity has not yet been determined for individual items on the WMFT. Therefore, while absolute reliability is reported for the individual task items, the authors recommend against using individual item MDCs to evaluate change and suggest using the aggregate test score instead, which does provide a reliable, valid metric.

Several limitations of the current study need to be addressed. We were unable to distinguish between intrarater and interrater reliability, as this information was not recorded, so these values may be inflated if applied to intrarater reliability. In addition, there are inherent problems when evaluating test–retest reliability, including maturation effects (that is, natural changes over the 2-week period) or testing effects that may have influenced the data. However, even if these did occur, these effects would result in overestimates of the SEM and MDC, rather than underestimates. Limitations to external validity should also be considered. Although other modified versions of the WMFT have been developed,21 the original tool was intended to be used with patients possessing some isolated distal movement capability. Caution should be used in applying these findings to patients who do not meet the minimal movement criteria associated with CIMT or to nonstroke patient populations.

Finally, one should highlight the difference between an MDC and a minimal clinically important difference (MCID). An MCID defines the smallest change that is large enough to make a meaningful difference in patients’ symptomatology, functioning, or quality of life.
To establish a minimal clinically important difference (MCID), one would ideally need to relate changes on the WMFT to changes in an external criterion, such as a validated measure of quality of life, health care practitioner perception of recovery,22–24 or the patient’s perspective.25,26 In the absence of such data, the MDC is sometimes treated as an MCID on the basis that a starting point for establishing what represents a minimal clinically meaningful change is a change that is larger than the testing error. To develop a more rigorous evaluation of an MCID on the WMFT, determining the amount of change in the WMFT that would be perceived as beneficial by the patient is an important area for future consideration.26 The MDC established in this study does not relate to meaningfulness of change, simply to measurement error. Even if a patient exceeds the MDC, this does not mean the change measured is a meaningful change in function to the patient. A change that exceeds the MDC, however, may begin to offer some guidance to the clinician on whether to justify continuation of care.

The determination of the SEM and MDC of outcome assessments helps researchers and clinicians to distinguish actual changes occurring as a result of a therapeutic intervention from chance fluctuations in test scores. The MDC95 for the overall WMFT timed scores for a cohort of individuals with sub-acute stroke is 0.7 seconds. This finding means that if an individual improves by this amount, or more, on the more affected extremity average Performance Time, the change is beyond measurement error and, therefore, likely to represent an actual change in the underlying construct. Identifying the absolute reliability of measurement tools in such a manner assists in interpreting results from clinical research and planning and carrying out clinical treatment.

References

1. Kunkel A, Kopp B, Muller G, et al. Constraint-induced movement therapy for motor recovery in chronic stroke patients. Arch Phys Med Rehabil. 1999;80:624-628.
2. Miltner W, Bauder H, Sommer M, et al. Effects of constraint-induced movement therapy on patients with chronic motor deficits after stroke: a replication. Stroke. 1999;30:586-592.
3. Sterr A, Elbert T, Berthold I, et al. Longer versus shorter daily constraint-induced movement therapy of chronic hemiparesis: an exploratory study. Arch Phys Med Rehabil. 2002;83:1374-1377.
4. Ang JH, Man DW. The discriminative power of the Wolf motor function test in assessing upper extremity functions in persons with stroke. Int J Rehabil Res. 2006;29:357-361.
5. Ertelt D, Small S, Solodkin A, et al. Action observation has a positive impact on rehabilitation of motor deficits after stroke. Neuroimage. 2007;36(suppl 2):T164-T173.
6. Wolf S, Lecraw D, Barton L. Forced use in hemiplegic upper extremities to reverse the effect of learned nonuse among chronic stroke and head-injured patients. Exp Neurol. 1989;104:125-132.
7. Taub E, Miller N, Novack T, et al. Technique to improve chronic motor deficit after stroke. Arch Phys Med Rehabil. 1993;74:347-354.
8. Morris DM, Uswatte G, Crago JE, et al. The reliability of the Wolf Motor Function Test for assessing upper extremity function after stroke. Arch Phys Med Rehabil. 2001;82:750-755.
9. Wolf SL, Catlin PA, Ellis M, et al. Assessing Wolf motor function test as outcome measure for research in patients after stroke. Stroke. 2001;32:1635-1639.
10. Wolf SL, Thompson PA, Morris DM, et al. The EXCITE trial: attributes of the Wolf Motor Function Test in patients with subacute stroke. Neurorehabil Neural Repair. 2005;19:194-205.
11. Winstein CJ, Miller JP, Blanton S, et al. Methods for a multisite randomized trial to investigate the effect of constraint-induced movement therapy in improving upper extremity function among adults recovering from a cerebrovascular stroke. Neurorehabil Neural Repair. 2003;17:137-152.
12. Domholdt E. Rehabilitation Research: Principles and Applications. 3rd ed. St. Louis, MO: Elsevier Inc; 2005.
13. Portney L, Watkins M. Foundations of Clinical Research: Applications to Practice. 2nd ed. New Jersey: Prentice-Hall Inc; 2000.
14. Harris KD, Heer DM, Roy TC, et al. Reliability of a measurement of neck flexor muscle endurance. Phys Ther. 2005;85:1349-1355.
15. McGraw K, Wong S. Forming inferences about some intraclass correlation coefficients. Psychol Methods. 1996;1:30-46.
16. Beaton DE. Understanding the relevance of measured change through studies of responsiveness. Spine. 2000;25:3192-3199.
17. Taylor R, Jayasinghe UW, Koelmeyer L, et al. Reliability and validity of arm volume measurements for assessment of lymphedema. Phys Ther. 2006;86:205-214.
18. Binkley JM, Stratford PW, Lott SA, et al. The Lower Extremity Functional Scale (LEFS): scale development, measurement properties, and clinical application. North American Orthopaedic Rehabilitation Research Network. Phys Ther. 1999;79:371-383.
19. Steffen T, Seney M. Test-retest reliability and minimal detectable change on balance and ambulation tests, the 36-item short-form health survey, and the unified Parkinson disease rating scale in people with parkinsonism. Phys Ther. 2008;88:733-746.
20. Wolf SL, Winstein CJ, Miller JP, et al. Effect of constraint-induced movement therapy on upper extremity function 3 to 9 months after stroke: the EXCITE randomized clinical trial. JAMA. 2006;296:2095-2104.
21. Whitall J, Savin DN Jr, Harris-Love M, et al. Psychometric properties of a modified Wolf Motor Function test for people with mild and moderate upper-extremity hemiparesis. Arch Phys Med Rehabil. 2006;87:656-660.
22. Osoba D, Rodrigues G, Myles J, et al. Interpreting the significance of changes in health-related quality-of-life scores. J Clin Oncol. 1998;16:139-144.
23. Liang MH. Longitudinal construct validity: establishment of clinical meaning in patient evaluative instruments. Med Care. 2000;38(suppl):II84-II90.
24. Fritz JM, George SZ. Identifying psychosocial variables in patients with acute work-related low back pain: the importance of fear-avoidance beliefs. Phys Ther. 2002;82:973-983.
25. Fritz SL, George SZ, Wolf SL, et al. Participant perception of recovery as criterion to establish importance of improvement for constraint-induced movement therapy outcome measures: a preliminary study. Phys Ther. 2007;87:170-178.
26. Lang CE, Edwards DF, Birkenmeier RL, et al. Estimating minimal clinically important differences of upper-extremity measures early after stroke. Arch Phys Med Rehabil. 2008;89:1693-1700.