Clinical research
European Heart Journal (2005) 26, 1742–1751
doi:10.1093/eurheartj/ehi259

The reproducibility and sensitivity of the 6-min walk test in elderly patients with chronic heart failure

Lee Ingle*, Rhidian J. Shelton, Alan S. Rigby, Samantha Nabb, Andrew L. Clark, and John G.F. Cleland

Department of Academic Cardiology, Castle Hill Hospital, Castle Road, Cottingham, Hull HU16 5JQ, UK

Received 2 September 2004; revised 10 February 2005; accepted 15 March 2005; online publish-ahead-of-print 14 April 2005

Aims
The 6-min walk test (6-MWT) is used to estimate functional capacity. However, in elderly patients with chronic heart failure (CHF), the following have not been described: (i) 1 year reproducibility of the 6-MWT; (ii) sensitivity of the 6-MWT to self-perceived changes in symptoms of heart failure; and (iii) implications for patient numbers required for studies using the 6-MWT as an endpoint.

Methods and results
One thousand and seventy-seven patients with CHF, aged >60, with NYHA Class ≥II were recruited. Heart failure symptoms were assessed using a questionnaire related to aspects of physical function, and patients performed a baseline 6-MWT, with follow-up 1 year later. Seventy-four patients with unchanged symptoms had an unchanged 6-MWT distance, with an overall intraclass correlation coefficient of 0.80 (95% CI = 0.69–0.87). Four hundred and twenty-three patients reported an improvement in symptoms during follow-up. There was a negative correlation (r = −0.55; P = 0.0001) between Δ symptoms and Δ 6-MWT (i.e. an increased 6-MWT distance is associated with reduced symptom severity at follow-up). Five hundred and sixteen patients reported worsening symptoms of heart failure; a moderate inverse correlation (r = −0.53; P = 0.0001) was displayed between Δ symptoms and Δ 6-MWT. For all patients, irrespective of symptom status, a high inverse correlation (r = −0.75; P = 0.0001) was evident.
On the basis of the data for patients with unchanged symptoms, it is calculated that to detect an increase in 6-MWT of 50 m, with 90% power, a study size of approximately 120 is required.

Conclusion
In elderly patients with CHF, the 6-MWT shows satisfactory agreement when repeated 1 year later. Change in 6-MWT distance is sensitive to change in self-perceived symptoms of heart failure.

* Corresponding author. Tel: +44 148 262 3732; fax: +44 148 262 4071. E-mail address: l.ingle@hull.ac.uk

Introduction

In patients with chronic heart failure (CHF), the 6-min walk test (6-MWT) is a simple, low-cost method for estimating exercise capacity; only a pre-measured level surface and a timing device are needed.1–4 The mode of exercise is familiar to patients, although it may represent a maximal test for some.5,6 The test appears useful for the assessment of some interventions such as cardiac resynchronization7,8 and has strong predictive power for both mortality and morbidity.4,6,7

Despite the routine inclusion of the 6-MWT in CHF studies,3,4,9,10 few have focused on test–re-test reproducibility. O’Keeffe et al.10 recruited 60 elderly patients (mean age 82) who completed the 6-MWT and were re-tested within 3–8 weeks. Intraclass correlation coefficients (ICCs) of 0.91 were reported for 24 patients with no overall change in cardiac status, indicating satisfactory agreement. In patients with CHF, despite the interest in using the 6-MWT as a tool to assess treatment and despite the fact that it is an important outcome measure for intervention studies,11,12 only one study has examined reproducibility with a test–re-test interval >3 months,9 and none has reported data after 1 year.

A major aim of health care is to reduce symptom severity within the physical limits imposed by a disease.13–16 It is not clear whether objective measures of functional capacity are sensitive to self-perceived changes in symptoms of heart failure.
Therefore, the aim of the current study was to determine, in a representative elderly population of patients with CHF, the following: (i) long-term (1 year) reproducibility of the 6-MWT; (ii) sensitivity of the 6-MWT to self-perceived changes in symptoms of heart failure; and (iii) implications for patient numbers required for studies using the 6-MWT as an endpoint.

Keywords: Symptoms of heart failure; Elderly patients; 1 year follow-up; Power curves

© The European Society of Cardiology 2005. All rights reserved. For Permissions, please e-mail: journals.permissions@oupjournals.org
Downloaded from http://eurheartj.oxfordjournals.org/ by guest on September 30, 2016

Methods

The Hull and East Riding Ethics Committee approved the study, and all patients provided informed consent for participation. Patients were recruited from a local community heart failure clinic. Inclusion criteria were as follows: age >60; evidence of left ventricular systolic dysfunction (LVSD); and symptoms of heart failure (NYHA Class ≥II). In total, 68% of the patients had heart failure of ischaemic aetiology and had suffered from the condition for at least 6 months before the study. Co-morbidities including hypertension and diabetes mellitus of moderate or less severity were included according to the National Institute for Clinical Excellence Guidelines.16 Patients were excluded if they were unable to walk without assistance from another person (not including mobility aids) or if they were unable to exercise because of non-cardiac limitations including osteoarthritis and chronic obstructive pulmonary disease of at least moderate severity.17 A history of smoking was evident in 74.2% of patients, although current smoking levels were 11.8%.

Heart failure was defined in accordance with the National Institute for Clinical Excellence Guidelines16 and with the European Society of Cardiology.17 Left ventricular function was determined from 2D-echocardiography or magnetic resonance imaging. Echocardiography was carried out by one of three trained operators. Left ventricular function was assessed by estimation on a scale of normal, mild, moderate, and severe impairment and was assessed by a second operator blind to the assessment of the first; where there was disagreement on the severity of left ventricular dysfunction, the echocardiogram was reviewed jointly with the third operator and a consensus reached. Left ventricular ejection fraction (LVEF) was calculated using Simpson’s formula from measurements of end-diastolic and end-systolic volumes on apical 2D views, following the guidelines of Schiller et al.,18 and LVSD was diagnosed if LVEF was ≤40%. When the echocardiogram was of low quality, patients underwent a cardiac magnetic resonance scan to determine left ventricular volume and function.

Baseline visit

Patients were studied when they were clinically stable, without any changes in medication during the previous 3 weeks. They underwent clinical history and physical examination, together with ECG and echocardiogram. Symptoms of heart failure were determined by methodology used in the EuroHeart Failure Survey. Patients were asked a series of six questions graded from 1 to 6, where 1 was unimpaired and 6 was very much impaired. Thus, patients could score between 6 and 36 points. These questions related to perceived heart failure symptoms during physical function14 (see Appendix).

6-MWT protocol

The 6-MWT was conducted following a standardized protocol, between 10 a.m. and 4 p.m. after usual medication.3 A 15 m flat, obstacle-free corridor, with chairs placed at either end, was used. Patients were instructed to walk as far as possible, turning 180° every 15 m in the allotted time of 6 min. Patients were able to rest, if needed, and time remaining was called every second minute.19 Patients walked unaccompanied so as not to influence walking speed. On completion of 6 min, patients were instructed to stop and total distance covered was calculated to the nearest metre. Standardized verbal encouragement was given to patients after 2 and 4 min, respectively.

Patients returned for follow-up at 1 year and the evaluation of symptom severity and 6-MWT performance was repeated. Patients were divided into three prospectively defined groups based on changes in symptoms of heart failure between baseline and 1 year. In group 1, patients reported unchanged symptoms, defined as baseline score ±3 points. These results were used for the reproducibility analysis as no perceived changes in heart failure symptoms were reported.20 In group 2, worsening symptoms were reported, defined by a rise of ≥4 points; and in group 3, improved symptoms were reported, defined by a fall of ≥4 points.

Statistical analysis

Data were analysed using SPSS statistical software for Windows version 11.5 (SPSS Inc., Chicago, IL, USA). To assess reproducibility, ICCs with 95% CIs were calculated. Several investigators have suggested that an ICC of 0.75 is satisfactory when studying groups of patients, so this threshold was defined as acceptable for the current study.10,21 Bland–Altman plots with 95% limits of agreement were also derived.22 For heart failure symptom assessment, medians and inter-quartile ranges (IQRs) were presented. A χ² test was used to determine the differences in heart failure symptoms between baseline and 1 year. Spearman correlation coefficients were used to determine the relation between changes in 6-MWT performance and changes in symptoms of heart failure. Group differences at baseline were determined by the analysis of variance (ANOVA). In order to account for the inflation of the experiment-wise type I error owing to multiple testing, we have followed previous recommendations of reporting unadjusted P-values.23 Indeed, as Perneger24 concluded, ‘simply describing what tests of significance have been performed, and why, is generally the best way of dealing with multiple comparisons’. We have also performed a subgroup analysis that provides information on the consistency (or lack of) of findings. In patients with more severe symptoms, 6-MWT may be limited predominantly by cardiorespiratory disease, whereas in patients with milder disease other factors may be important. Data are presented as mean ± SD; all tests were two-sided, and P < 0.05 was taken as being statistically significant.

We used the standard deviation (SD) of the 6-MWT at baseline to construct power curves for a proposed intervention study. Power was defined as the probability of showing a difference between two (or more) groups, if a difference actually exists between them.25 The curves were designed to show the sample size required per group (equal allocation) in step sizes of 10 m. For every 10 m gained, the sample size is reduced. Note that when planning an intervention study, an estimate would be required for the potential loss-to-follow-up. Nomograms and power curves have been produced both for general medical use26,27 and for more specialist problems such as those posed by reliability studies.28,29

Results

Of an initial population of 1077 patients, 64 died (46 males, 77.7 ± 7.2 years, and body mass 74.6 ± 17.9 kg) (Figure 1). At baseline, 6-MWT distance was significantly lower for patients who subsequently died than for groups 1, 2, and 3 (P = 0.002), although symptom severity was not different. Data from the remaining 1013 patients were analysed. Seventy-four patients (52 males) showed no change in symptoms over 1 year. Baseline clinical characteristics are shown in Table 1. There was no difference in 6-MWT distance between patients with unchanged symptoms and those with worsening symptoms (P = 0.086), but a difference between patients with improved symptoms of heart failure and the other groups was seen at baseline (P = 0.032).

Figure 1 Flow chart showing number of patients in each group.

Long-term (1 year) reproducibility of the 6-MWT

In patients with unchanged symptoms after 1 year, baseline 6-MWT distance was 285 ± 122 m, and fell slightly, but not significantly (276 ± 118 m, P = 0.07). The ICC for 6-MWT for all 74 patients was 0.80 (95% CI = 0.69–0.87), showing a high level of agreement by our criteria.10,19 After stratifying by beta-blocker usage between baseline (r = 0.80; 95% CI = 0.69–0.87) and 1 year (r = 0.81; 95% CI = 0.69–0.89), reproducibility remained unchanged. We then divided patients by sex. Males walked 301 ± 115 m at baseline compared with 246 ± 133 m (P = 0.07) for females, although females (mean age 73.4 ± 5.4) were older than males (mean age 71.6 ± 7.4), albeit not significantly (P = 0.086). After 1 year, the difference in distance walked between males (307 ± 107 m) and females (205 ± 112 m, P = 0.0015) was significant. Reproducibility was higher in males (ICC = 0.85; 95% CI = 0.75–0.91) than in females (ICC = 0.65; 95% CI = 0.33–0.84).

The Bland–Altman plot for the 6-MWT is shown in Figure 2. There was no relation between the differences in values (calculated as 1 year − baseline) and the mean values (average of 1 year and baseline). The mean difference was −8.6 m with 95% limits of agreement of −162.1 to 144.8 m. We have also reported that NYHA Class II patients show moderate 1 year reproducibility for the 6-MWT (ICC = 0.52; 95% CI = −0.09 to 0.85).

Sensitivity to change of the 6-MWT based on changes in symptoms

Figure 3 shows no relation between baseline symptoms and baseline 6-MWT in groups 1, 2, and 3 (r = 0.00, P = 0.74), whereas Figure 4 shows a strong association between Δ symptom severity and Δ 6-MWT (r = −0.75; P = 0.00001) in all patients.
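As an illustration only, the symptom-change grouping described in the Methods can be written as a small rule. The sketch below assumes that "unchanged" means a change within ±3 points and that "worse" and "better" mean a rise or fall of at least 4 points on the 6 to 36 point scale; the function name and the group labels are hypothetical and are not taken from the study.

```python
def classify_symptom_change(baseline_score: int, follow_up_score: int) -> str:
    """Assign a patient to a symptom-change group (hypothetical helper).

    Assumed thresholds: a change within +/-3 points is 'unchanged',
    a rise of >= 4 points is 'worse', a fall of >= 4 points is 'better'.
    Higher scores mean more impairment (six items, each graded 1 to 6).
    """
    for score in (baseline_score, follow_up_score):
        if not 6 <= score <= 36:
            raise ValueError("symptom score must lie between 6 and 36")

    delta = follow_up_score - baseline_score  # positive = deterioration
    if delta >= 4:
        return "worse"
    if delta <= -4:
        return "better"
    return "unchanged"


if __name__ == "__main__":
    # Illustrative scores only, not patient data.
    print(classify_symptom_change(15, 20))
    print(classify_symptom_change(16, 10))
    print(classify_symptom_change(15, 17))
```

Under these assumptions a patient moving from 15 to 18 points stays in the unchanged group, while a move from 15 to 19 counts as worsening.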
In 516 patients (327 males; 63%) with worsening symptoms of heart failure, mean 6-MWT distance fell from 279 ± 127 to 192 ± 165 m. There was an inverse correlation (r = −0.53; P = 0.0001) between Δ symptoms and Δ 6-MWT. However, 132 patients (62 males) declined to participate in the 6-MWT. These patients had a greater decline in symptoms than the other groups. Details of these patients are presented in Table 1. In patients with improved symptoms, there was an inverse correlation (r = −0.55; P = 0.0001) between Δ symptoms and Δ 6-MWT.

There was no overall association (Figure 5) between baseline symptom severity and Δ symptoms after 1 year in all groups (r = 0.01; P = 0.109). However, in patients with worsening symptoms, a strong inverse correlation was evident (r = −0.67; P = 0.00045). No relation existed between baseline 6-MWT and Δ 6-MWT at 1 year (r = 0.2; P = 0.125) (Figure 6).

Implications for study size using the 6-MWT as an endpoint

We constructed power curves to estimate the sample size required for an intervention study based on the 6-MWT (Figure 7). We calculated power curves in 10% intervals from 50 to 90%. There is no minimum acceptable power but the higher the better, though high power does not come without cost. For example, the higher the power, the larger the sample size. To construct the power curves, we required information on the type I error rate (the significance threshold for the P-value), that is, the probability of a false positive. Typically, 5% is chosen, though this is an arbitrary threshold for statistical significance, and we assumed a two-tailed test. We also required information on the SD of the outcome measure. The SD of the 6-MWT at baseline was 120 m. We also required distance walked data. Power curves were then constructed over a whole range of distances ranging from 30 to 100 m (Figure 7) using formulae outlined by Altman.25 The interested reader can then read off the required sample size for a given power (5% significance, two-tailed) for our assumed SD. For a gain of 10 m, we would require over 3000 patients per group (90% power, 5% significance). Conversely, for a 100 m gain, we would require just over 30 patients per group (90% power, 5% significance). It is noteworthy that SD will vary depending on the heterogeneity of the population studied (Table 3).

Table 1 Clinical characteristics (mean ± SD), classified according to change in symptom score over 1 year

| Clinical characteristics | No change | Worse | Unable to repeat 6-MWT | Better | Dead | ANOVA (P-value) |
|---|---|---|---|---|---|---|
| n | 74 | 516 | 132 | 423 | 64 | n/a |
| Male/female | 52/22 | 388/128 | 62/70 | 301/122 | 48/16 | n/a |
| Stature (m) | 1.71 ± 0.14 | 1.66 ± 0.10 | 1.65 ± 0.09 | 1.72 ± 0.13 | 1.70 ± 0.1 | 0.39 |
| Body mass (kg) | 75.3 ± 16.8 | 79.4 ± 15.1 | 82.3 ± 14.0 | 78.4 ± 15.2 | 74.6 ± 17.9 | 0.01 |
| Age (years) | 72.4 ± 6.7 | 70.1 ± 9.1 | 72.4 ± 6.4 | 75.8 ± 8.1 | 77.7 ± 7.2 | 0.01 |
| LVEF (%)a | 33.4 ± 7.6 | 29.9 ± 5.3 | 28.3 ± 4.8 | 34.6 ± 7.8 | 32.8 ± 8.3 | 0.12 |
| Hypertension (%) | 39.2 | 38.4 | 42.3 | 40.5 | 41.8 | 0.21 |
| Diabetes (%) | 24.6 | 25.3 | 23.8 | 19.8 | 19.7 | 0.08 |
| Treatment at baseline | | | | | | |
| Warfarin (%) | 29.7 | 27.5 | 29.2 | 33.6 | 32.4 | 0.36 |
| Loop diuretic (%) | 75.7 | 72.5 | 70.3 | 74.4 | 74.6 | 0.21 |
| Beta-blockers (%) | 42.7 | 45.5 | 43.0 | 48.4 | 41.4 | 0.34 |
| Digoxin (%) | 25.7 | 28.9 | 27.1 | 27.7 | 26.6 | 0.45 |
| ACE-I (%) | 52.7 | 58.6 | 55.8 | 56.6 | 55.8 | 0.38 |
| Statin (%) | 32.4 | 38.4 | 40.2 | 36.5 | 40.2 | 0.34 |
| Baseline 6-MWT (m) | 285 ± 122 | 279 ± 127 | 263 ± 95 | 342 ± 117 | 208 ± 103b | 0.032 |
| Follow-up 6-MWT (m) | 276 ± 118 | 195 ± 130 | n/a | 396 ± 126 | n/a | 0.0001 |
| Δ Mean 6-MWT | 9 ± 77 | −84 ± 63 | n/a | 54 ± 46 | n/a | 0.0001 |
| Baseline symptom score (median ± IQR) | 15 ± 4 | 15 ± 7 | 15 ± 7 | 16 ± 5 | 16 ± 8 | 0.23 |
| Follow-up symptom score (median ± IQR) | 15 ± 4 | 20 ± 6 | 26 ± 6 | 10 ± 4 | n/a | 0.0001 |
| Δ Mean symptom scorec | 0 | 5 | 11 | −6 | n/a | 0.0001 |

a Quantitative assessment of LVEF obtained in >80% of patients.
b Significant difference between deceased patients and other groups (P = 0.002).
c Positive values indicate deterioration and negative values indicate improvement.

Figure 2 Bland–Altman plot for 6-MWT in 74 patients with no change in symptoms.

Figure 3 Baseline symptom severity vs. baseline 6-MWT.

Figure 4 Changes in symptoms vs. change in 6-MWT after 1 year.

Figure 5 Baseline symptom severity vs. changes in symptoms after 1 year.

Figure 6 Baseline 6-MWT vs. change in 6-MWT after 1 year.

Discussion

At baseline, 6-MWT performance (mean: 208 ± 103 m) was significantly lower in elderly patients with LVSD who died prior to follow-up, although symptom severity was not different. Previous studies have shown that a 6-MWT of <300 m significantly increases mortality risk.7,30 Current data support this finding. It is also possible that the baseline 6-MWT performance is prognostically more sensitive than the assessment of baseline symptoms. We found no relationship between baseline symptom severity and 6-MWT performance; however, one of the novel aspects of our study is the sensitivity between changes in symptoms and changes in 6-MWT performance between baseline and 1 year. Obviously, we could not follow up deceased patients, and further prognostic information regarding symptom severity is yet to be determined.

Long-term reproducibility of the 6-MWT

The current study shows that after 1 year, the 6-MWT displays acceptable reproducibility (ICC = 0.80; 95% CI = 0.69–0.87) in a population of elderly patients with CHF due to systolic dysfunction and associated comorbidities of hypertension and diabetes mellitus.
However, it is difficult to generalize these findings to patients with several co-morbidities including osteoarthritis and chronic obstructive pulmonary disease. In order to compensate for changes in patients’ clinical conditions, we assessed only those patients whose symptom severity remained unchanged (7.3% of total). Although sample size was limited (n = 74), it was larger than in previous studies (n ≤ 26) of patients with lung disease,4,31 fibromyalgia,18 brain injury,32 and CHF.10 Furthermore, patients in these studies were followed up after <8 weeks. The study by Demers et al.9 assessed 768 patients at baseline, 18 and 43 weeks. The trial aimed to examine the effects of candesartan, enalapril, and metoprolol on LVEF (RESOLVD study). The authors reported high reproducibility after 43 weeks (ICC = 0.91; CI not reported); however, the study did not compensate for changes in patients’ clinical conditions. To our knowledge, the current study is the first to assess reproducibility of the 6-MWT after 12 months in which clinical condition was controlled.

Table 2 Reproducibility data for patients with unchanged symptom status

| Variable | Number of patients | ICC (95% CI) |
|---|---|---|
| Overall | 74 | 0.80 (0.69–0.87) |
| Male | 52 | 0.85 (0.75–0.91) |
| Female | 22 | 0.65 (0.33–0.84) |
| Age at baseline (years) | | |
| 60–64 | 8 | 0.78 (0.68–0.96) |
| 65–69 | 13 | 0.78 (0.43–0.92) |
| 70–74 | 16 | 0.91 (0.79–0.97) |
| 75–79 | 18 | 0.72 (0.40–0.88) |
| 80+ | 19 | 0.57 (0.46–0.85) |
| NYHA classification | | |
| II | 62 | 0.52 (−0.09 to 0.85) |
| III/IV | 12 | 0.74 (0.50–0.86) |
| Stratified by beta-blockade | | |
| Baseline | 60 | 0.80 (0.69–0.87) |
| 1 year | 73 | 0.81 (0.69–0.89) |
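For readers who wish to run this kind of reproducibility analysis on their own paired walk distances, the sketch below shows one way to compute an ICC and Bland–Altman limits of agreement. It assumes a one-way random-effects, single-measurement ICC, which may differ from the ICC model used in SPSS for this study (the model is not stated in the text); the function names and the example distances are hypothetical.

```python
from statistics import mean, stdev


def icc_oneway(baseline, follow_up):
    """One-way random-effects ICC for two measurements per patient.

    Assumption: ICC = (MSB - MSW) / (MSB + (k - 1) * MSW) with k = 2,
    where MSB and MSW are the between- and within-patient mean squares.
    """
    n, k = len(baseline), 2
    patient_means = [(b + f) / 2 for b, f in zip(baseline, follow_up)]
    grand_mean = mean(patient_means)
    msb = k * sum((m - grand_mean) ** 2 for m in patient_means) / (n - 1)
    msw = sum(
        (b - m) ** 2 + (f - m) ** 2
        for b, f, m in zip(baseline, follow_up, patient_means)
    ) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)


def bland_altman_limits(baseline, follow_up):
    """Return (mean difference, lower limit, upper limit).

    Differences are follow-up minus baseline; the 95% limits of
    agreement are mean difference +/- 1.96 * SD of the differences.
    """
    diffs = [f - b for b, f in zip(baseline, follow_up)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return bias, bias - spread, bias + spread


if __name__ == "__main__":
    # Made-up distances in metres, for illustration only.
    base = [285, 310, 190, 402, 255, 330]
    year1 = [276, 320, 170, 395, 262, 301]
    print(round(icc_oneway(base, year1), 3))
    print(tuple(round(v, 1) for v in bland_altman_limits(base, year1)))
```

The ICC summarises agreement as a single coefficient, whereas the limits of agreement express it in metres, which is often easier to interpret against a clinically meaningful change in distance.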
A habituation period, where the test manoeuvre is repeatedly practised, reduces variability in 6-MWT performance.33 A learning effect of 6% was reported in a cardiac rehabilitation population completing the 6-MWT on non-consecutive days,34 and the effect was maintained for up to 2 months in healthy subjects.35 Some have argued that studies should include a minimum of one or even two practice sessions. However, tests would need to be administered on separate days, which would be cumbersome to implement in clinical trials. On the basis of the results of the current study, satisfactory reproducibility can be achieved without repeating the 6-MWT.

We stratified by beta-blockade and found no changes in reproducibility, indicating that beta-blockers do not dissociate 6-MWT performance and symptom severity in this cohort of patients. Therefore, we have added confidence that our data demonstrate a true reflection of the association between 6-MWT performance and symptom severity. We also found a clear difference between males (ICC = 0.85; 95% CI = 0.75–0.91) and females (ICC = 0.65; 95% CI = 0.33–0.84). Although females may often walk shorter distances,2 there is little evidence that they provide less stable data during repeated measures. It has been reported that differences in symptom severity between males and females36,37 may be responsible; however, this was not a finding in the current study. It is noteworthy that females (mean age 73.4 ± 5.4) were older than males (mean age 71.6 ± 7.4), albeit not significantly (P = 0.086); therefore, it is conceivable that age differences may be in some way responsible.

We used a 15 m long corridor for the patients to perform the 6-MWT, whereas others have used corridors of different length including 20 m38 and >30 m.3 Although never formally tested, shorter corridor lengths may have an impact on 6-MWT performance due to the increased impact of turning.
The current study indicates that stringent standardization of test procedures does not guarantee low SD in a heterogeneous heart failure population, in accordance with other studies including RESOLVD.9 Our data show relatively low reproducibility (ICC = 0.52; 95% CI = −0.09 to 0.85) after 1 year in patients with NYHA Class II symptoms (Table 2). It is possible that in patients with more severe symptoms of heart failure, that is, Class III/IV, the 6-MWT will better reflect cardiorespiratory function (ICC = 0.74; 95% CI = 0.50–0.86), whereas in patients with milder symptoms, other, yet to be identified factors may be important (possibly mood).

Figure 7 Power curves showing the sample size required for an intervention study based on gained 6-MWT distance (based on SD = 120 m; two-tailed test; P < 0.05).

Table 3 Non-interventional CHF trials employing the 6-MWT

| Authors | Mean age (years) (range or SD) | Sex (male/female) | n | NYHA class |
|---|---|---|---|---|
| Rostagno et al.7 | 57 (29–70) | 119/95 | 214 | III–IV |
| O’Keeffe et al.10 | 81 (74–92) | 38/22 | 60 | I–IV |
| Roul et al.31 | 59 (11) | n/a | 121 | II–III |
| Zugck et al.53 | 54 (12) | 90/23 | 113 | I–III |
| Morales et al.54 | 53 (11) | 37/9 | 46 | II–IV |
| Opasich et al.39 | 53 (9) | 274/41 | 315 | II–III |
| Hulsmann et al.55 | 57 (8) | 79/17 | 96 | I–IV |
| Martensson et al.56 | 61 (9) | 48/0 | 48 | I–IV |
| Hauptman et al.57 | 61 (13) | 363/121 | 484 | n/a |
| Cahalin et al.38 | 49 (8) | n/a | 45 | II–IV |
| Mean | 59 | n/a | 154 | n/a |

Distance walked (m), in the order printed (several studies report more than one value, and the row assignment is not recoverable from the extracted layout): 229, 239, 448, 410, 423, 485, 408, 390, 313, 221, 395, 242, 315, 310; mean 345.
SD (m), in the order printed: 112, 52, 92, 126, 104, 91, 91, 88, 113, 145, 107, 100, 112, 100; mean 110.

Sensitivity to changes in the 6-MWT based on changes in symptoms

We have found that change in 6-MWT distance is sensitive to changes in symptoms of heart failure in a representative sample of patients with CHF. To our knowledge, the current study is the first to focus on self-perceived symptoms of heart failure. Other studies have found no association between generic quality of life (QoL) instruments and 6-MWT.19,28,39 Steptoe et al.11 assessed health-related QoL and psychological well-being in 99 patients with dilated cardiomyopathy. They reported no association between functional capacity and QoL in patients with NYHA Class I and II symptoms. The current study shows similar findings for symptom severity at baseline (Figure 3). A comparative investigation13 of 205 patients with heart failure reported similar findings to the current study. Our data suggest that for patients with a range of heart failure symptoms (NYHA II–IV), 6-MWT is sensitive to changes in symptoms of heart failure.

The sensitivity of a test to changes in symptoms is an important but often neglected clinical measure.40 Many factors contribute to these changes including pathophysiological and psychological alterations.11 Patients with CHF are prone to episodes of depression with a resulting deterioration in symptoms.41 The extent to which objective measures of functional capacity predict self-reported mental health status has yet to be determined. We did not measure depression or depressive symptoms; therefore, it is not possible to say whether symptom severity or indeed reproducibility of the 6-MWT was affected by this variable at follow-up. Future studies should focus on how changes in 6-MWT and symptom severity influence prognosis in patients with CHF. To provide adequate statistical power, careful consideration should be given to sample size and study design.

We found that only 7% of patients had symptoms that remained unchanged over 1 year. Few studies have focused on mid- to long-term changes in symptoms and QoL without intervention. The study by O’Keeffe et al.10 reported that in 45 elderly patients with heart failure followed up after 3–8 weeks, 53% had no changes in QoL, which is much higher than our findings. However, O’Keeffe’s study10 employed a smaller sample size, included a short follow-up period, used a different QoL inventory and method of analysis, and did not focus specifically on changes in symptoms. Therefore, it is very difficult to compare these findings. To our knowledge, our study is the first to report changes in symptoms over a 12 month period in a large cohort of patients with LVSD. Future studies are required to corroborate or refute these findings.

Incomplete data sets due to attrition or non-compliance represent a major challenge for researchers.42–49 In particular, it is important to recognize the pattern of missingness because this can determine the statistical analysis. According to Little and Rubin,45 the missing-data mechanism is called ‘missing completely at random’ (MCAR), where the missingness is independent of response (e.g. a patient misses an appointment because of bad weather). The missing-data mechanism is called missing at random, where the missingness depends on the observed response only (i.e. a patient stays in hospital for a few weeks but then skips an appointment). Otherwise, the missing-data mechanism is known as non-ignorable missing. We tested the assumption of MCAR on the 6-MWT data at follow-up by applying Little’s test.43 The test statistic is based on the pattern-specific mean values and the pooled estimates of the population mean and covariance. The missing-data mechanism was not MCAR (χ² = 782, P < 0.0001). Recognition of the missing-data mechanism is important in selecting an appropriate method of analysis because methods that disregard the missing-data process may lead to biased estimates of effect size and unrealistic estimates of power,42,47,49 though the latter may be overcome to some extent by including more patients.49

There is a wide variety of statistical methods available for handling missing data; the interested reader is directed to Engels and Diehr.46 Some methods use information pertaining to the patient whose data are missing; others use the values of other patients. A commonly applied method (that is easy to apply in practice) is carrying the last observation forwards. This technique will lead to a more conservative treatment effect but at the same time it has a smaller SD (thus, 95% CIs may be unrealistically narrow). However, there is consensus that no single method is appropriate for all situations.46 More generally, it is recommended that ‘. . .in longitudinal studies, where the overall trend is for worse health over time and where missing data can be assumed to be primarily related to worse health, missing data should be imputed from the available longitudinal data for that person’.46 In the context of a randomized controlled trial, we recommend that researchers follow the advice of Houck et al.,48 who contended ‘. . .attention to the missing-data mechanism should be an integral part of clinical trial data’.

Implications for study size using the 6-MWT as an endpoint

Guyatt et al.3 suggested that the minimum clinically significant distance for the 6-MWT is 30 m. On the basis of our calculations (Figure 7), a gain of 30 m would require 250 patients per group with 80% power at 5% significance or 340 patients per group with 90% power at 5% significance. The study by O’Keeffe et al.10 reported baseline 6-MWT distance of 239 ± 52 and 275 ± 103 m 3–8 weeks later in patients with ‘much better’ symptoms. Therefore, for a gain of 47 m, our power analysis indicates that a sample size of 150 patients with 90% power, or 120 with 80% power at 5% significance is required. The study10 recruited 60 patients, and based on our findings was therefore underpowered. Our curves can be used to assist in planning group sizes for intervention studies where the 6-MWT is an outcome measure.
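The per-group sample sizes quoted above can be approximated with the standard normal-approximation formula for comparing two means, n = 2(z₁₋α/₂ + z_power)² SD² / Δ². The sketch below is a minimal illustration under that assumption; it is not necessarily the exact method used to draw Figure 7, and the function name and default arguments are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def patients_per_group(gain_m, sd_m=120.0, power=0.90, alpha=0.05):
    """Approximate sample size per group for a two-group comparison.

    Assumptions: equal allocation, two-tailed test, common SD in both
    groups, normal approximation. gain_m is the difference in 6-MWT
    distance to detect (metres); sd_m defaults to the baseline SD of
    120 m reported in the text.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed threshold
    z_power = NormalDist().inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) * sd_m / gain_m) ** 2)


if __name__ == "__main__":
    for gain in (10, 30, 50, 100):
        print(gain, patients_per_group(gain), patients_per_group(gain, power=0.80))
```

With an SD of 120 m this approximation gives about 337 patients per group for a 30 m gain at 90% power and about 252 at 80% power, close to the values of 340 and 250 read from the power curves. Any allowance for loss-to-follow-up would need to be added on top of these figures.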
Note that when designing studies, an estimate is required of the potential loss-to-follow-up, which should be factored into the planning process. Table 3 identifies a selection of non-interventional CHF trials in which the 6-MWT was used as an endpoint. These data can be used to determine whether the SD of 120 m identified in the current study could be applied to other heart failure populations. Table 3 indicates a mean SD of 110 m, which is similar to the current study; however, mean age is much lower (59 years) than in our data (>70 years). It is possible that walking performance is more variable in older patients, which may explain the SD differences. Care should be taken when applying these power curves because of the heterogeneity of different subgroups.

A limitation of this study was that 132 patients with worsening symptoms declined to participate in the 6-MWT. This loss was very similar to the smaller study by O’Keeffe et al.10 Although patient medication may have been optimized, it is possible that ACE inhibitors15 and beta-blockers50,51 do not lead to positive changes in symptom severity despite the well-known benefits to mortality risk and improvement in LVEF. Further, subgroup numbers (Table 2) are small as the reproducibility analysis is based on only 74 patients. The sensitivity of the 6-MWT to perceived changes in symptom severity was determined without measuring perceived changes in anxiety and depression from validated inventories. We acknowledge that these factors may play a role in changes in functional capacity over time, and should be included in future studies. An unexpected observation was that patients whose symptoms improved walked further and were older than patients whose symptoms worsened or did not change at follow-up (Table 1). It is difficult to provide an explanation for these findings; future studies may wish to address this issue.
We did not carry out an a priori power calculation and, to our knowledge, little has been written about power for reproducibility studies, with the work of Donner28,29 perhaps the best known. Lack of power becomes a (possible) problem if no significant differences are found. With the exception of NYHA Class II (ICC = 0.52), all the ICCs were significant at the 5% level (Table 2). Using the criteria of Landis and Koch,52 an ICC of 0.52 would be classified as having only a ‘moderate’ level of reliability. Using the power curves of Donner,28 showing ‘moderate’ reliability would require about 50 patients (assuming two measurements on each) if the actual reliability was 0.8 (80% power, 5% significance, two-tailed). If the actual reliability was <0.8, then we would require more patients or more than two measurements per patient.28

Conclusion

We have shown a satisfactory long-term (1 year) reproducibility for the 6-MWT in elderly patients with heart failure due to systolic dysfunction. These data suggest that the 6-MWT may be an appropriate test of functional capacity in these patients. Males demonstrated lower variability than females. On the basis of these findings, we conclude that 6-MWT distance is sensitive to self-perceived changes in symptoms of heart failure. When the 6-MWT is an endpoint in a clinical trial, a minimum of 500 patients is needed to detect a change of 30 m due to an intervention. However, SDs will vary depending on the heterogeneity of the population studied. Researchers may expect a degree of missing data, especially in longitudinal studies; attention to the missing-data problem should become an integral part of the clinical trial protocol.

Acknowledgement

We wish to thank the referees for their constructive comments.

Appendix

For each question, patients responded by providing one of six responses from the following options: (1) no; (2) very little; (3) a little; (4) some; (5) a lot; (6) very much.
The six questions relating to symptoms are listed below. In the last month, how much did the following affect you?

(1) breathlessness limiting daily activities;
(2) fatigue limiting daily activities;
(3) inability to do normal daily activities due to health;
(4) inability to do hobbies/sports due to health;
(5) inability to work due to health;
(6) chest pain during normal activity.

References

1. Enright PL, Sherill DL. Reference equations for the six-minute walk in healthy adults. Am J Respir Crit Care Med 1998;158:1384–1387.
2. Enright PL, McBurnie MA, Bittner V et al. The 6-min walk test: a quick measure of functional status in elderly patients. Chest 2003;123:325–327.
3. Guyatt GH, Sullivan MJ, Thompson PL et al. The six minute walk: a new measure of exercise capacity in patients with chronic heart failure. Can Med Assoc J 1985;132:919–923.
4. Knox AJ, Morrison JFJ, Muers MF. Reproducibility of walking test results in chronic obstructive airways disease. Thorax 1988;43:388–392.
5. Faggiano P, D’Aloia A, Gualeni A et al. Oxygen uptake during the 6 minute walking test. Preliminary experience using a portable device. Am Heart J 1998;165:225–232.
6. Shah MR, Hasselblad V, Gheorghiade M et al. Prognostic usefulness of the six-minute walk in patients with advanced congestive heart failure secondary to ischemic or non-ischemic cardiomyopathy. Am J Cardiol 2001;88:987–993.
7. Rostagno C, Olivo G, Comeglio M et al. Prognostic value of 6-minute walk corridor test in patients with mild-to-moderate heart failure: comparison with other methods of functional evaluation. Eur J Heart Fail 2003;5:247–252.
8. Cazeau S, Leclerc C, Lavergne T et al. Multisite Stimulation in Cardiomyopathies (MUSTIC) study investigators. Effects of multisite biventricular pacing in patients with heart failure and intraventricular conduction delay. New Engl J Med 2001;344:873–880.
9. Demers C, McKelvie RS, Negassa A et al. Reliability, validity, and responsiveness of the six-minute walk test in patients with heart failure. Am Heart J 2001;142:698–703.
10. O’Keeffe ST, Lye M, Donnellan C et al. Reproducibility and responsiveness of quality of life assessment and six minute walk test in elderly heart failure patients. Heart 1998;80:377–382.
11. Steptoe A, Mohabir A, Mahon NG et al. Health related quality of life and psychological wellbeing in patients with dilated cardiomyopathy. Heart 2000;83:645–650.
12. Stevens D, Elpern E, Kailash S et al. Comparison of hallway and treadmill six-minute walk tests. Am J Respir Crit Care Med 1999;160:1540–1543.
13. Rector TS, Kubo SH, Cohn JN. Patients’ self-assessment of their congestive heart failure. Content, reliability and validity of a new measure: the Minnesota-living with heart failure questionnaire. Heart Fail 1987;3:198–207.
14. Green C, Porter CB, Bresnahan DR et al. Developing and evaluation of the Kansas City Cardiomyopathy Questionnaire: a new health status measure for heart failure. J Am Coll Cardiol 2000;35:1245–1255.
15. Juenger J, Schellberg D, Kraemer S et al. Health-related quality of life in patients with congestive heart failure: comparison with other chronic diseases and relation to functional variables. Heart 2002;87:235–241.
16. National Institute for Clinical Excellence (NICE). Chronic heart failure. Management of chronic heart failure in adults in primary and secondary care. Clinical Guidelines. Vol. 5. London: NICE; 2003.
17. Remme WJ, Swedberg K. Comprehensive guidelines for the diagnosis and treatment of chronic heart failure. Task force for the diagnosis and treatment of chronic heart failure of the European Society of Cardiology. Eur J Heart Fail 2002;4:11–22.
18. Schiller NB, Shah PM, Crawford M et al. Recommendations for quantification of the left ventricle by two-dimensional echocardiography. J Am Soc Echocardiogr 1989;2:358–367.
19. Bittner V, Weiner DH, Yusuf S et al. Prediction of mortality and morbidity with a 6-minute walk test in patients with left ventricular dysfunction. JAMA 1993;270:1702–1707.
20. Pankoff BA, Overend TJ, Lucy SD et al. Reliability of the six-minute walk test in people with fibromyalgia. Arth Care Res 2000;13:291–295.
21. Beaton DE, Hogg-Johnson S, Bombardier C. Evaluating changes in health status: reliability and responsiveness of five generic health status measures in workers with musculoskeletal disorders. J Clin Epidemiol 1997;50:79–93.
22. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1986;4:307–310.
23. Rigby AS. Statistical methods in epidemiology I. Statistical errors in hypothesis testing. Disab Rehabil 1998;20:121–126.
24. Perneger TV. What’s wrong with Bonferroni adjustments? BMJ 1998;316:1236–1238.
25. Altman DG. Statistics and ethics in medical research III. How large a sample? BMJ 1980;281:1236–1238.
26. Day SJ, Graham DF. Sample size and power for comparing two or more treatment groups in clinical trials. BMJ 1989;299:663–665.
27. Donner A, Eliasziw M. Sample size requirements for reliability studies. Stat Med 1987;6:441–448.
28. Donner A. Sample size requirements for the comparison of two or more coefficients of inter-observer agreement. Stat Med 1998;17:1157–1168.
29. Roul G, Germain P, Bareiss P. Does the 6-minute walk test predict the prognosis in patients with NYHA class II or III chronic heart failure? Am Heart J 1998;136:449–457.
30. Butland RJ, Pang J, Gross ER et al. Two-, six-, and 12-minute walk tests in respiratory disease. BMJ 1982;284:1607–1608.
31. Mossberg KA. Reliability of a timed walk test in persons with acquired brain injury. Am J Phys Med Rehabil 2003;82:385–390.
32. Pinna GD, Opasich C, Mazza A et al. Reproducibility of the six-minute walking test in heart failure patients. Stat Meth 2000;19:3087–3094.
33. Hamilton DM, Haennel RG. Validity and reliability of the 6-minute walk test in a cardiac rehabilitation population. J Cardiopulm Rehabil 2000;20:156–164.
34. Wu G, Sanderson B, Bittner V. The 6-minute walk test: how important is the learning effect? Am Heart J 2003;146:129–133.
35. Shephard RJ, Franklin BA. Changes in quality of life: a major goal of cardiac rehabilitation. J Cardiopulm Rehabil 2001;21:189–200.
36. Morrin L, Black S, Reid R. Impact of duration in a cardiac rehabilitation programme on coronary risk profile and health-related quality of life outcomes. J Cardiopulm Rehabil 2000;20:115–121.
37. Cahalin L, Mathier MA, Semigran M et al. The six-minute walk test predicts peak oxygen uptake and survival in patients with advanced heart failure. Chest 1996;110:325–332.
38. Opasich C, Pinna GD, Mazza A et al. Reproducibility of the six-minute walking test in patients with chronic congestive heart failure: practical implications. Am J Cardiol 1998;81:1497–1500.
39. Lipkin DP, Scriven AJ, Crake T et al. Six minute walking test for assessing exercise capacity in chronic heart failure. BMJ 1986;292:653–655.
40. Alloy LB, Abramson LY, Whitehouse WG et al. Depressogenic cognitive styles: predictive validity, information processing and personality characteristics, and developmental origins. Behav Res Ther 1999;37:503–531.
41. Fairclough DL, Peterson HF, Chang V. Why are missing quality of life data a problem in clinical trials of cancer therapy? Stat Med 1998;17:667–677.
42. Little RJA. A test of missing completely at random for multivariate data with missing values. J Am Stat Assoc 1988;83:1198–1202.
43. Mazumdar S, Lui KS, Houck P et al. Intent-to-treat analysis for longitudinal clinical trials: coping with the challenge of missing values. J Psych Res 1999;33:87–95.
44. Little RJA, Rubin DB. Statistical Analysis with Missing Data. Hoboken, New Jersey: Wiley Interscience; 2002.
45. Engels JM, Diehr P. Imputation of missing longitudinal data: a comparison of methods. J Clin Epidemiol 2003;56:968–976.
46. Auleley G-R, Giraudeau B, Baron G et al. The methods of handling missing data in clinical trials influence sample size requirements. J Clin Epidemiol 2004;56:968–976.
47. Houck PR, Mazumdar S, Koru-Sengul T et al. Estimating treatment effects from longitudinal clinical trial data with missing values: comparative analyses using different methods. Psych Res 2004;129:209–215.
48. Palmer JL. Analysis of missing data in palliative care studies. J Pain Symptom Manage 2004;28:612–618.
49. Carson PE. Beta-blocker treatment in heart failure. Prog Cardiovasc Dis 1999;41:301–321.
50. Fowler MB. Beta-blockers in heart failure. Do they improve the quality as well as the quantity of life? Eur Heart J 1998;19:17–25.
51. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics 1977;33:159–174.