Clinical research
European Heart Journal (2005) 26, 1742–1751
doi:10.1093/eurheartj/ehi259
The reproducibility and sensitivity of the 6-min walk test
in elderly patients with chronic heart failure
Lee Ingle*, Rhidian J. Shelton, Alan S. Rigby, Samantha Nabb, Andrew L. Clark,
and John G.F. Cleland
Department of Academic Cardiology, Castle Hill Hospital, Castle Road, Cottingham, Hull HU16 5JQ, UK
Received 2 September 2004; revised 10 February 2005; accepted 15 March 2005; online publish-ahead-of-print 14 April 2005
KEYWORDS Symptoms of heart failure; Elderly patients; 1 year follow-up; Power curves
Aims The 6-min walk test (6-MWT) is used to estimate functional capacity. However, in elderly patients
with chronic heart failure (CHF): (i) 1 year reproducibility of the 6-MWT; (ii) sensitivity of the 6-MWT to
self-perceived changes in symptoms of heart failure; and (iii) implications for patient numbers required
for studies using the 6-MWT as an endpoint have not been described.
Methods and results One thousand and seventy-seven patients with CHF, aged >60, with NYHA Class
≥II were recruited. Heart failure symptom assessment was determined using a questionnaire related
to aspects of physical function, and patients performed a baseline 6-MWT, with follow-up 1 year
later. Seventy-four patients with unchanged symptoms had an unchanged 6-MWT distance, with an
overall intraclass correlation coefficient of 0.80 (95% CI = 0.69–0.87). Four hundred and twenty-three
patients reported an improvement in symptoms during follow-up. There was a negative correlation
(r = −0.55; P = 0.0001) between Δ symptoms and Δ 6-MWT (i.e. an increased 6-MWT distance is associated with reduced symptom severity at follow-up). Five hundred and sixteen patients reported worsening symptoms of heart failure; a moderate inverse correlation (r = −0.53; P = 0.0001) was displayed
between Δ symptoms and Δ 6-MWT. For all patients, irrespective of symptom status, a high inverse correlation (r = −0.75; P = 0.0001) was evident. On the basis of the data for patients with unchanged
symptoms, it is calculated that to detect an increase in 6-MWT of 50 m, with 90% power, a study size
of approximately 120 is required.
Conclusion In elderly patients with CHF, the 6-MWT shows satisfactory agreement when repeated 1 year
later. Change in 6-MWT distance is sensitive to change in self-perceived symptoms of heart failure.
Introduction
In patients with chronic heart failure (CHF), the 6-min walk
test (6-MWT) is a simple, low-cost method for estimating
exercise capacity; only a pre-measured level surface and a
timing device are needed.1–4 The mode of exercise is familiar
to patients, although it may represent a maximal test for
some.5,6 The test appears useful for the assessment of some
interventions such as cardiac resynchronization7,8 and has
strong predictive power for both mortality and morbidity.4,6,7
Despite the routine inclusion of the 6-MWT in CHF
studies,3,4,9,10 few have focused on test–re-test reproducibility. O’Keeffe et al. 10 recruited 60 elderly patients (mean age
82) who completed the 6-MWT and were re-tested within 3–8
weeks. Intraclass correlation coefficients (ICCs) of 0.91 were
reported for 24 patients with no overall change in cardiac
status, indicating satisfactory agreement. In patients with
CHF, despite the interest in using the 6-MWT as a tool to
assess treatment and despite the fact that it is an important
outcome measure for intervention studies,11,12 only one
study has examined reproducibility with a test–re-test interval >3 months9 and none has reported data after 1 year.
* Corresponding author. Tel: +44 148 262 3732; fax: +44 148 262 4071.
E-mail address: l.ingle@hull.ac.uk
A major aim of health care is to reduce symptom severity
within the physical limits imposed by a disease.13–16 It is not
clear whether objective measures of functional capacity are
sensitive to self-perceived changes in symptoms of heart
failure. Therefore, the aim of the current study was to
determine in an elderly representative population of
patients with CHF, the following: (i) long-term (1 year)
reproducibility of the 6-MWT; (ii) sensitivity of the 6-MWT
to self-perceived changes in symptoms of heart failure;
and (iii) implications for patient numbers required for
studies using the 6-MWT as an endpoint.
Methods
The Hull and East Riding Ethics Committee approved the study, and
all patients provided informed consent for participation. Patients
were recruited from a local community heart failure clinic; inclusion
criteria were as follows: age >60; evidence of left ventricular systolic dysfunction (LVSD); and symptoms of heart failure (NYHA Class
≥II). In total, 68% of the patients had heart failure of ischaemic
aetiology and suffered from the condition for at least 6 months
before the study. Co-morbidities including hypertension and
© The European Society of Cardiology 2005. All rights reserved. For Permissions, please e-mail: journals.permissions@oupjournals.org
Downloaded from http://eurheartj.oxfordjournals.org/ by guest on September 30, 2016
Symptoms of heart failure;
Elderly patients;
1 year follow-up;
Power curves
The 6-MWT in elderly patients with CHF
Baseline visit
Patients were studied when they were clinically stable, without any
changes in medication during the previous 3 weeks. They underwent
clinical history and physical examination, together with ECG and
echocardiogram. Symptoms of heart failure were determined by
methodology used in the EuroHeart Failure Survey. Patients were
asked a series of six questions graded from 1 to 6, where 1 was unimpaired and 6 was very much impaired. Thus, patients could score
between 6 and 36 points. These questions related to perceived
heart failure symptoms during physical function14 (see Appendix).
6-MWT protocol
The 6-MWT was conducted following a standardized protocol,
between 10 a.m. and 4 p.m. after usual medication.3 A 15 m flat,
obstacle-free corridor, with chairs placed at either end was used.
Patients were instructed to walk as far as possible, turning 180°
every 15 m in the allotted time of 6 min. Patients were able to
rest, if needed, and time remaining was called every second
minute.19 Patients walked unaccompanied so as not to influence
walking speed. On completion of 6 min, patients were instructed
to stop and total distance covered was calculated to the nearest
metre. Standardized verbal encouragement was given to patients
after 2 and 4 min, respectively.
Patients returned for follow-up at 1 year, and the evaluations of
symptom severity and 6-MWT performance were repeated.
Patients were divided into three prospectively defined groups
based on changes in symptoms of heart failure between baseline
and 1 year. In group 1, patients reported unchanged symptoms,
defined as baseline score ±3 points. These results were used for
the reproducibility analysis as no perceived changes in heart
failure symptoms were reported.20 In group 2, worsening symptoms
were reported, defined by a rise of ≥4 points; and in group 3,
improved symptoms were reported, defined by a fall of ≥4 points.
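The grouping rule described above can be written as a short function. This is a minimal sketch for illustration, not the study's own procedure: the function name and the range check are assumptions, while the thresholds (unchanged within 3 points, worse or improved with a change of at least 4 points) and the 6 to 36 score range are taken from the text.

```python
def classify_symptom_change(baseline_score: int, followup_score: int) -> str:
    """Assign a patient to a symptom-change group from two questionnaire totals.

    Scores are sums of six items graded 1-6, so valid totals lie in 6..36.
    Higher scores mean greater impairment, so a rise indicates worsening.
    """
    for score in (baseline_score, followup_score):
        if not 6 <= score <= 36:
            raise ValueError(f"symptom score out of range 6-36: {score}")

    change = followup_score - baseline_score
    if change >= 4:        # rise of at least 4 points
        return "worse"
    if change <= -4:       # fall of at least 4 points
        return "improved"
    return "unchanged"     # within +/- 3 points of baseline
```

A patient scoring 15 at baseline and 18 at follow-up would therefore remain in the unchanged group, whereas a follow-up score of 19 would count as worsening.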
Statistical analysis
Data were analysed using SPSS statistical software for Windows
version 11.5 (SPSS Inc., Chicago, IL, USA). To assess reproducibility,
ICC with 95% CIs were calculated. Several investigators have
suggested that an ICC of ≥0.75 is satisfactory when studying
groups of patients, so this threshold was defined as acceptable for
the current study.10,21 Bland–Altman plots with 95% limits of agreement were also derived.22 For heart failure symptom assessment,
medians and inter-quartile ranges (IQRs) were presented. A χ²
test was used to determine the differences in heart failure symptoms between baseline and 1 year. Spearman correlation coefficients were used to determine the relation between changes in
6-MWT performance and changes in symptoms of heart failure.
Group differences at baseline were determined by the analysis of
variance (ANOVA). In order to account for the inflation of the experiment-wise type I error owing to multiple testing, we have followed
previous recommendations of reporting unadjusted P-values.23
Indeed, as Perneger24 concluded ‘simply describing what tests of
significance have been performed, and why, is generally the best
way of dealing with multiple comparisons’. We have also performed
a subgroup analysis that provides information on the consistency (or
lack thereof) of findings. In patients with more severe symptoms, 6-MWT
may be limited predominantly by cardiorespiratory disease,
whereas in patients with milder disease other factors may be
important. Data are presented as mean ± SD; all tests were two-sided, and P < 0.05 was taken as being statistically significant.
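For readers unfamiliar with the ICC, the following sketch shows one common form of the statistic computed from a two-way analysis of variance. The text does not state which ICC model was requested from SPSS, so the choice of a two-way random-effects, absolute-agreement, single-measurement ICC, often written ICC(2,1), is an assumption, as are the function name and the data layout (one row per patient, one column per test occasion).

```python
from typing import Sequence


def icc_2_1(data: Sequence[Sequence[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    `data` holds one row per subject and one column per test occasion,
    e.g. [[baseline_m, followup_m], ...] for 6-MWT distances in metres.
    """
    n = len(data)
    k = len(data[0])
    if n < 2 or k < 2:
        raise ValueError("need at least two subjects and two occasions")

    grand = sum(sum(row) for row in data) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(row[j] for row in data) / n for j in range(k)]

    # Sums of squares for subjects (rows), occasions (columns) and residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))

    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
```

With this form, a systematic shift between occasions lowers the coefficient even when patients keep their rank order, which is why an absolute-agreement ICC is often preferred for test–re-test work.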
We used the standard deviation (SD) of the 6-MWT at baseline to
construct power curves for a proposed intervention study. Power
was defined as the probability of showing a difference between
two (or more) groups, if a difference actually exists between
them.25 The curves were designed to show the sample size required
per group (equal allocation) in step sizes of 10 m. For every 10 m
gained, the sample size is reduced. Note that when planning an
intervention study, an estimate would be required for the potential
loss-to-follow-up. Nomograms and power curves have been produced both for general medical use26,27 and for more
specialist problems such as those posed by reliability studies.28,29
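The shape of such power curves can be reproduced with a few lines of code. The sketch below assumes the standard normal-approximation formula for comparing two means with equal allocation, n per group = 2[(z(1 − α/2) + z(power)) × SD / Δ]²; the text cites Altman for its formulae but does not print them, so the formula, the function name and the defaults are assumptions for illustration. Exact t-based methods would give slightly larger numbers, and no allowance is made here for loss to follow-up.

```python
from statistics import NormalDist


def n_per_group(delta_m: float, sd_m: float = 120.0,
                power: float = 0.90, alpha: float = 0.05) -> float:
    """Approximate patients per group to detect a mean difference `delta_m`.

    Two-sided test, two equal-sized groups, common SD `sd_m` (metres).
    Returns an unrounded value; round up when planning a study.
    """
    if delta_m <= 0 or sd_m <= 0:
        raise ValueError("delta_m and sd_m must be positive")
    z = NormalDist()                      # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)    # about 1.96 for alpha = 0.05
    z_power = z.inv_cdf(power)            # about 1.28 for 90% power
    return 2 * ((z_alpha + z_power) * sd_m / delta_m) ** 2


if __name__ == "__main__":
    # One curve per power level, in 10 m steps, using SD = 120 m.
    for power in (0.80, 0.90):
        for delta in range(10, 101, 10):
            print(f"power={power:.0%} gain={delta:3d} m "
                  f"n/group={n_per_group(delta, power=power):7.0f}")
```

Under these assumptions, a 50 m difference at 90% power with an SD of 120 m gives roughly 121 patients per group, and each 10 m increase in the target gain reduces the required sample size.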
Results
Of an initial population of 1077 patients, 64 died (48
males, 77.7 ± 7.2 years, and body mass 74.6 ± 17.9 kg)
(Figure 1). At baseline, 6-MWT distance was significantly
lower for patients who subsequently died than for groups 1,
2, and 3 (P = 0.002), although symptom severity was not
different.
Data from the remaining 1013 patients were analysed.
Seventy-four patients (52 males) showed no change in symptoms over 1 year. Baseline clinical characteristics are shown
in Table 1. There was no difference in 6-MWT distance
between patients with unchanged symptoms and those
with worsening symptoms (P = 0.086), but a difference
between patients with improved symptoms of heart failure
and the other groups was seen at baseline (P = 0.032).
Long-term (1 year) reproducibility of the 6-MWT
In patients with unchanged symptoms after 1 year, baseline
6-MWT distance was 285 ± 122 m, and fell slightly, but not
significantly (276 ± 118 m, P = 0.07). The ICC for 6-MWT
for all 74 patients was 0.80 (95% CI = 0.69–0.87), showing a
high level of agreement by our criteria.10,19 After stratifying
by beta-blocker usage between baseline (r = 0.80; 95%
CI = 0.69–0.87) and 1 year (r = 0.81; 95% CI = 0.69–0.89),
reproducibility remained unchanged. We then divided
patients by sex. Males walked 301 ± 115 m at baseline compared with 246 ± 133 m (P = 0.07) for females, although
females (mean age 73.4 ± 5.4) were older than males
(mean age 71.6 ± 7.4), albeit not significantly (P = 0.086).
After 1 year, the difference in distance walked between
males (307 ± 107 m) and females (205 ± 112 m,
P = 0.0015) was significant. Reproducibility was higher in
diabetes mellitus of moderate or less severity were included according to the National Institute for Clinical Excellence Guidelines.16
Patients were excluded if they were unable to walk without assistance from another person (not including mobility aids) or if they
were unable to exercise because of non-cardiac limitations including
osteoarthritis and chronic obstructive pulmonary disease of at least
moderate severity.17 A history of smoking was evident in 74.2% of
patients, although current smoking levels were 11.8%.
Heart failure was defined in accordance with the National
Institute for Clinical Excellence Guidelines16 and with the
European Society of Cardiology.17 Left ventricular function was
determined from 2D-echocardiography or magnetic resonance
imaging. Echocardiography was carried out by one of three
trained operators. Left ventricular function was assessed by estimation on a scale of normal, mild, moderate, and severe impairment and was assessed by a second operator blind to the
assessment of the first; where there was disagreement on the severity of left ventricular dysfunction, the echocardiogram was
reviewed jointly with the third operator and a consensus reached.
Left ventricular ejection fraction (LVEF) was calculated using
Simpson’s formula from measurements of end-diastolic and end-systolic volumes on apical 2D views, following the guidelines of
Schiller et al.,18 and LVSD was diagnosed if LVEF was ≤40%. When
the echocardiogram was of low quality, patients underwent a
cardiac magnetic resonance scan to determine left ventricular
volume and function.
L. Ingle et al.
Figure 1 Flow chart showing number of patients in each group.
males (ICC = 0.85; 95% CI = 0.75–0.91) than in females
(ICC = 0.65; 95% CI = 0.33–0.84). The Bland–Altman plot
for the 6-MWT is shown in Figure 2. There was no relation
between the differences in values (calculated as 1 year −
baseline) and the mean values (average of 1 year and baseline). The mean difference was −8.6 m with 95% limits of
agreement of −162.1 to 144.8 m. We have also reported
that NYHA Class II patients show moderate 1 year reproducibility for the 6-MWT (ICC = 0.52; 95% CI = −0.09–0.85).
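The Bland–Altman quantities reported above follow from the paired differences alone. The sketch below is illustrative only: it assumes the conventional limits of mean difference ± 1.96 × SD of the differences, with differences taken as 1 year minus baseline as in the text; the function names and example distances are hypothetical.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def bland_altman(baseline: Sequence[float],
                 followup: Sequence[float]) -> Tuple[float, float, float]:
    """Return (mean difference, lower limit, upper limit) of agreement."""
    if len(baseline) != len(followup) or len(baseline) < 2:
        raise ValueError("need two paired series of equal length (>= 2)")
    diffs = [f - b for b, f in zip(baseline, followup)]  # 1 year - baseline
    bias = mean(diffs)
    sd = stdev(diffs)                                    # sample SD (n - 1)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def implied_sd(lower: float, upper: float) -> float:
    """SD of the differences implied by a pair of 95% limits of agreement."""
    return (upper - lower) / (2 * 1.96)


if __name__ == "__main__":
    # Hypothetical distances in metres for three patients.
    print(bland_altman([100, 200, 300], [110, 190, 310]))
    # Limits read as -162.1 to 144.8 m imply an SD of roughly 78 m.
    print(round(implied_sd(-162.1, 144.8), 1))
```

Read this way, limits of −162.1 to 144.8 m correspond to a between-occasion SD of the differences of about 78 m, which gives a sense of how much an individual patient's distance may vary over 1 year even when symptoms are unchanged.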
Sensitivity to change of the 6-MWT based on
changes in symptoms
Figure 3 shows no relation between baseline symptoms and
baseline 6-MWT in groups 1, 2, and 3 (r = 0.00, P = 0.74),
whereas Figure 4 shows a strong association between Δ
symptom severity and Δ 6-MWT (r = −0.75; P = 0.00001)
in all patients. In 516 patients (327 males; 63%) with worsening symptoms of heart failure, mean 6-MWT distance fell
from 279 ± 127 to 192 ± 165 m. There was an inverse correlation (r = −0.53; P = 0.0001) between Δ symptoms and Δ
6-MWT. However, 132 patients (62 males) declined to participate in the 6-MWT. These patients had a greater
deterioration in symptoms than the other groups. Details of
these patients are presented in Table 1. In patients with
improved symptoms, there was an inverse correlation
(r = −0.55; P = 0.0001) between Δ symptoms and Δ
6-MWT. There was no overall association (Figure 5)
between baseline symptom severity and Δ symptoms after
1 year in all groups (r = 0.01; P = 0.109). However, in
patients with worsening symptoms, a strong inverse correlation was evident (r = −0.67; P = 0.00045). No relation
existed between baseline 6-MWT and Δ 6-MWT at 1 year
(r = 0.2; P = 0.125) (Figure 6).
Implications for study size using the 6-MWT
as an endpoint
We constructed power curves to estimate the sample size
required for an intervention study based on the 6-MWT
(Figure 7). We calculated power curves in 10% intervals
from 50 to 90%. There is no minimum acceptable power,
but the higher the better, though high power does not
come without cost: the higher the power,
the larger the sample size. To construct the power curves,
we required information on the type I error rate (the
significance level, α), that is, the probability of a false positive.
Typically, 5% is chosen, though this is an arbitrary threshold
for statistical significance, and we assumed a two-tailed
test. We also required information on the SD of the
outcome measure. The SD of the 6-MWT at baseline was
120 m. We also required distance walked data. Power
curves were then constructed over a whole range of distances ranging from 10 to 100 m (Figure 7) using formulae
outlined by Altman.25 The interested reader can then
read off the required sample size for a given power (5% significance, two-tailed) for our assumed SD. For a gain of
10 m, we would require over 3000 patients per group (90%
power, 5% significance). Conversely, for a 100 m gain, we
would require just over 30 patients per group (90% power,
5% significance). It is noteworthy that SD will vary depending
on the heterogeneity of the population studied (Table 3).
Table 1 Clinical characteristics
Classification according to change in symptom score over 1 year
Clinical characteristics (mean ± SD) | No change | Worse | Unable to repeat 6-MWT | Better | Dead | ANOVA (P-value)
n | 74 | 516 | 132 | 423 | 64 | —
Male/female | 52/22 | 388/128 | 62/70 | 301/122 | 48/16 | —
Stature (m) | 1.71 ± 0.14 | 1.66 ± 0.10 | 1.65 ± 0.09 | 1.72 ± 0.13 | 1.70 ± 0.1 | 0.39
Body mass (kg) | 75.3 ± 16.8 | 79.4 ± 15.1 | 82.3 ± 14.0 | 78.4 ± 15.2 | 74.6 ± 17.9 | 0.01
Age (years) | 72.4 ± 6.7 | 70.1 ± 9.1 | 72.4 ± 6.4 | 75.8 ± 8.1 | 77.7 ± 7.2 | 0.01
LVEF (%)a | 33.4 ± 7.6 | 29.9 ± 5.3 | 28.3 ± 4.8 | 34.6 ± 7.8 | 32.8 ± 8.3 | 0.12
Hypertension (%) | 39.2 | 38.4 | 42.3 | 40.5 | 41.8 | 0.21
Diabetes (%) | 24.6 | 25.3 | 23.8 | 19.8 | 19.7 | 0.08
Treatment at baseline
Warfarin (%) | 29.7 | 27.5 | 29.2 | 33.6 | 32.4 | 0.36
Loop diuretic (%) | 75.7 | 72.5 | 70.3 | 74.4 | 74.6 | 0.21
Beta-blockers (%) | 42.7 | 45.5 | 43.0 | 48.4 | 41.4 | 0.34
Digoxin (%) | 25.7 | 28.9 | 27.1 | 27.7 | 26.6 | 0.45
ACE-I (%) | 52.7 | 58.6 | 55.8 | 56.6 | 55.8 | 0.38
Statin (%) | 32.4 | 38.4 | 40.2 | 36.5 | 40.2 | 0.34
Baseline 6-MWT (m) | 285 ± 122 | 279 ± 127 | 263 ± 95 | 342 ± 117 | 208 ± 103b | 0.032
Follow-up 6-MWT (m) | 276 ± 118 | 195 ± 130 | — | 396 ± 126 | — | 0.0001
Δ Mean 6-MWT | 9 ± 77 | −84 ± 63 | — | 54 ± 46 | — | 0.0001
Baseline symptom score (median ± IQR) | 15 ± 4 | 15 ± 7 | 15 ± 7 | 16 ± 5 | 16 ± 8 | 0.23
Follow-up symptom score (median ± IQR) | 15 ± 4 | 20 ± 6 | 26 ± 6 | 10 ± 4 | — | 0.0001
Δ Mean symptom scorec | 0 | 5 | 11 | −6 | — | 0.0001
a Quantitative assessment of LVEF obtained in >80% of patients.
b Significant difference between deceased patients and other groups (P = 0.002).
c Positive values indicate deterioration and negative values indicate improvement.
Figure 2 Bland–Altman plot for 6-MWT in 74 patients with no change in symptoms.
Figure 3 Baseline symptom severity vs. baseline 6-MWT.
Figure 4 Changes in symptoms vs. change in 6-MWT after 1 year.
Discussion
At baseline, 6-MWT performance (mean: 208 ± 103 m) was
significantly lower in elderly patients with LVSD who died
prior to follow-up, although symptom severity was not
different. Previous studies have shown that a 6-MWT distance of
<300 m is associated with a significantly increased mortality risk.7,30 Current
data support this finding. It is also possible that the baseline
6-MWT performance is prognostically more sensitive than
the assessment of baseline symptoms. We found no
relationship between baseline symptom severity and 6-MWT
performance; however, one of the novel aspects of our study
is the sensitivity between changes in symptoms and changes
in 6-MWT performance between baseline and 1 year.
Obviously, we could not follow up deceased patients, and
further prognostic information regarding symptom severity
is yet to be determined.
Figure 5 Baseline symptom severity vs. changes in symptoms after 1 year.
Figure 6 Baseline 6-MWT vs. change in 6-MWT after 1 year.
Long-term reproducibility of the 6-MWT
The current study shows that after 1 year, the 6-MWT displays acceptable reproducibility (ICC = 0.80; 95%
CI = 0.69–0.87) in a population of elderly patients with
CHF due to systolic dysfunction and associated comorbidities of hypertension and diabetes mellitus.
However, it is difficult to generalize these findings to
patients with several co-morbidities including osteoarthritis
and chronic obstructive pulmonary disease. In order to compensate for changes in patients’ clinical conditions, we
assessed only those patients whose symptom severity
remained unchanged (7.3% of total). Although sample size
was limited (n = 74), it was larger than in previous studies
(n ≤ 26) of patients with lung disease,4,31 fibromyalgia,18
brain injury,32 and CHF.10 Furthermore, patients in these
studies were followed up after <8 weeks. The study by
Demers et al.9 assessed 768 patients at baseline, 18 and
43 weeks. The trial aimed to examine the effects of candesartan, enalapril, and metoprolol on LVEF (RESOLVD study).
The authors reported high reproducibility after 43 weeks
(ICC = 0.91; CI not reported); however, the study did not
compensate for changes in patients’ clinical conditions. To
our knowledge, the current study is the first to assess reproducibility of the 6-MWT after 12 months in which clinical
condition was controlled.
Table 2 Reproducibility data for patients with unchanged symptom status
Variable | Number of patients | ICC (95% CI)
Overall | 74 | 0.80 (0.69–0.87)
Male | 52 | 0.85 (0.75–0.91)
Female | 22 | 0.65 (0.33–0.84)
Age at baseline (years)
60–64 | 8 | 0.78 (0.68–0.96)
65–69 | 13 | 0.78 (0.43–0.92)
70–74 | 16 | 0.91 (0.79–0.97)
75–79 | 18 | 0.72 (0.40–0.88)
80+ | 19 | 0.57 (0.46–0.85)
NYHA classification
II | 62 | 0.52 (−0.09–0.85)
III/IV | 12 | 0.74 (0.50–0.86)
Stratified by beta-blockade
Baseline | 60 | 0.80 (0.69–0.87)
1 year | 73 | 0.81 (0.69–0.89)
A habituation period, where the test manoeuvre is repeatedly practised, reduces variability in 6-MWT performance.33 A
learning effect of 6% was reported in a cardiac rehabilitation
population completing the 6-MWT on non-consecutive days,34
and the effect was maintained for up to 2 months in healthy
subjects.35 Some have argued that studies should include a
minimum of one or even two practice sessions. However,
tests would need to be administered on separate days,
which would be cumbersome to implement in clinical trials.
On the basis of the results of the current study, satisfactory
reproducibility can be achieved without repeating the
6-MWT. We stratified by beta-blockade and found no
changes in reproducibility, indicating that beta-blockers do
not dissociate 6-MWT performance and symptom severity in
this cohort of patients. Therefore, we have added confidence
that our data demonstrate a true reflection of the association
between 6-MWT performance and symptom severity. We also
found a clear difference between males (ICC = 0.85; 95%
CI = 0.75–0.91) and females (ICC = 0.65; 95%
CI = 0.33–0.84). Although females may often walk shorter
distances,2 there is little evidence that they provide less
stable data during repeated measures. It has been reported
that differences in symptom severity between males and
females36,37 may be responsible; however, this was not a
finding in the current study. It is noteworthy that females
(mean age 73.4 ± 5.4) were older than males (mean age
71.6 ± 7.4), albeit not significantly (P = 0.086); therefore,
it is conceivable that age differences may be in some way
responsible.
We used a 15 m long corridor for the patients to perform the
6-MWT, whereas others have used corridors of different
length including 20 m38 and >30 m.3 Although never formally
tested, shorter corridor lengths may affect
6-MWT performance because patients must turn more often.
The current study indicates that stringent standardization
of test procedures does not guarantee low SD in a heterogeneous heart failure population, in accordance with other
studies including RESOLVD.9
Our data show relatively low reproducibility (ICC = 0.52;
95% CI = −0.09–0.85) after 1 year in patients with NYHA
Class II symptoms (Table 2). It is possible that in patients
with more severe symptoms of heart failure, that is, Class
III/IV, the 6-MWT will better reflect cardiorespiratory function (ICC = 0.74; 95% CI = 0.50–0.86), whereas in patients
Figure 7 Power curves showing the sample size required for an intervention study based on gained 6-MWT distance (based on SD = 120 m; two-tailed test;
P < 0.05).
Table 3 Non-interventional CHF trials employing the 6-MWT
Authors | Mean age (years) (range or SD) | Sex (male/female) | n | NYHA class
Rostagno et al.7 | 57 (29–70) | 119/95 | 214 | III–IV
O’Keeffe et al.10 | 81 (74–92) | 38/22 | 60 | I–IV
Roul et al.31 | 59 (11) | — | 121 | II–III
Zugck et al.53 | 54 (12) | 90/23 | 113 | I–III
Morales et al.54 | 53 (11) | 37/9 | 46 | II–IV
Opasich et al.39 | 53 (9) | 274/41 | 315 | II–III
Hulsmann et al.55 | 57 (8) | 79/17 | 96 | I–IV
Martensson et al.56 | 61 (9) | 48/0 | 48 | I–IV
Hauptman et al.57 | 61 (13) | 363/121 | 484 | —
Cahalin et al.38 | 49 (8) | — | 45 | II–IV
Mean | 59 | — | 154 | —
Distance walked (m), in source order: 229, 239, 448, 410, 423, 485, 408, 390, 313, 221, 395, 242, 315, 310, 345
SD, in source order: 112, 52, 92, 126, 104, 91, 91, 88, 113, 145, 107, 100, 112, 100, 110
Sensitivity to changes in the 6-MWT based on
changes in symptoms
We have found that change in 6-MWT distance is sensitive to
changes in symptoms of heart failure in a representative
sample of patients with CHF. To our knowledge, the
current study is the first to focus on self-perceived symptoms of heart failure. Other studies have found no association between generic quality of life (QoL) instruments
and 6-MWT.19,28,39 Steptoe et al. 11 assessed health-related
QoL and psychological well-being in 99 patients with
dilated cardiomyopathy. They reported no association
between functional capacity and QoL in patients with
NYHA Class I and II symptoms. The current study shows
similar findings for symptom severity at baseline (Figure 3).
A comparative investigation13 of 205 patients with heart
failure reported similar findings to the current study. Our
data suggest that for patients with a range of heart failure
symptoms (NYHA II–IV), 6-MWT is sensitive to changes in
symptoms of heart failure. The sensitivity of a test to
changes in symptoms is an important but often neglected
clinical measure.40 Many factors contribute to these
changes including pathophysiological and psychological
alterations.11 Patients with CHF are prone to episodes of
depression with a resulting deterioration in symptoms.41
The extent to which objective measures of functional
capacity predict self-reported mental health status has
yet to be determined. We did not measure depression/
depressive symptoms; therefore, it is not possible to say
whether symptom severity or indeed reproducibility of the
6-MWT was affected by this variable at follow-up. Future
studies should focus on how changes in 6-MWT and
symptom severity influence prognosis in patients with CHF.
To provide adequate statistical power, careful consideration
should be given to sample size and study design.
We found that only 7% of patients had symptoms that
remained unchanged over 1 year. Few studies have
focused on mid- to long-term changes in symptoms and
QoL without intervention. The study by O’Keeffe et al. 10
reported that in 45 elderly patients with heart failure followed up after 3–8 weeks, 53% had no changes in QoL,
which is much higher than our findings. However,
O’Keeffe’s study10 employed a smaller sample size, included
a short follow-up period, and used a different QoL inventory
and method of analysis, and did not focus specifically on
changes in symptoms. Therefore, it is very difficult to
compare these findings. To our knowledge, our study is the
first to report changes in symptoms over a 12 month
period in a large cohort of patients with LVSD. Future
studies are required to corroborate or refute these findings.
Incomplete data sets due to attrition or non-compliance
represent a major challenge for researchers.42–49 In particular, it is important to recognize the pattern of missingness
because this can determine the statistical analysis.
According to Little and Rubin,45 the missing-data mechanism
is called ‘missing completely at random’ (MCAR) where the
missingness is independent of response (e.g. a patient
misses an appointment because of bad weather). The
missing-data mechanism is called ‘missing at random’
where the missingness depends on the observed response
only (e.g. a patient stays in hospital for a few weeks but
then skips an appointment). Otherwise, the missing-data
mechanism is known as non-ignorable missing. We tested
the assumption of MCAR on the 6-MWT data at follow-up
by applying Little’s test.43 The test statistic is based on
the pattern specific mean values and the pooled estimates
of the population mean and covariance. The missing-data
mechanism was not MCAR (χ² = 782, P < 0.0001).
Recognition of the missing-data mechanism is important in
selecting an appropriate method of analysis because
methods that disregard the missing-data process may lead
to biased estimates of effect size and unrealistic estimates
of power,42,47,49 though the latter may be overcome to
some extent by including more patients.49
with milder symptoms, other, yet to be identified factors
may be important (possibly mood).
There are a wide variety of statistical methods available
for handling missing data; the interested reader is directed
to Engels and Diehr.46 Some methods use information pertaining to the patient whose data are missing, others use
the values of other patients. A commonly applied method
(that is easy to apply in practice) is carrying the last observation forwards. This technique will lead to a more conservative treatment effect but at the same time it has a smaller
SD (thus, 95% CIs may be unrealistically narrow). However,
there is consensus that no single method is appropriate for all situations.46 More generally, it is recommended
that ‘. . .in longitudinal studies, where the overall trend is
for worse health over time and where missing data can be
assumed to be primarily related to worse health, missing
data should be imputed from the available longitudinal
data for that person’.46 In the context of a randomized controlled trial, we recommend that researchers follow the
advice of Houck et al. 48 who contended ‘. . .attention to
the missing-data mechanism should be an integral part of
clinical trial data’.
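To make the last-observation-carried-forward approach mentioned above concrete, the following sketch fills each missing visit with the most recent observed value for the same patient. It is a minimal illustration only: the function name and the use of None to mark a missed visit are assumptions, and, as noted in the text, no single imputation method is appropriate for all situations.

```python
from typing import List, Optional, Sequence


def carry_last_observation_forward(
        visits: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Impute missing visits (None) with the patient's last observed value.

    Leading missing values stay None because there is nothing to carry.
    """
    filled: List[Optional[float]] = []
    last: Optional[float] = None
    for value in visits:
        if value is not None:
            last = value          # remember the most recent observation
        filled.append(last)
    return filled


if __name__ == "__main__":
    # Hypothetical patient: baseline 6-MWT recorded, 1 year test declined.
    print(carry_last_observation_forward([285.0, None]))
```

Because a carried-forward baseline value implies no change over time, this approach would understate the decline in patients who deteriorated and then declined the follow-up test, which is one reason the missing-data mechanism matters.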
Guyatt et al.3 suggested that the minimum clinically significant distance for the 6-MWT is 30 m. On the basis of our calculations (Figure 7), a gain of 30 m would require 250
patients per group with 80% power at 5% significance or
340 patients per group with 90% power at 5% significance.
The study by O’Keeffe et al.10 reported baseline 6-MWT distance of 239 ± 52 and 275 ± 103 m 3–8 weeks later in
patients with ‘much better’ symptoms. Therefore, for a
gain of 47 m, our power analysis indicates that a sample
size of 150 patients with 90% power, or 120 with 80%
power at 5% significance is required. The study10 recruited
60 patients, and based on our findings was therefore underpowered. Our curves can be used to assist in planning group
sizes for intervention studies where the 6-MWT is an
outcome measure. Note that when designing studies, an
estimate is required of the potential loss-to-follow-up,
which should be factored into the planning process.
Table 3 identifies a selection of non-interventional CHF
trials in which the 6-MWT was used as an endpoint. These
data can be used to determine whether the SD of 120 m
identified in the current study could be applied to other
heart failure populations. Table 3 indicates a mean SD of
110 m, which is similar to the current study; however,
mean age is much lower (59 years) than our data (>70
years). It is possible that walking performance is more variable in older patients, which may explain the SD differences. Care should be taken when applying these power
curves because of the heterogeneity of different subgroups.
A limitation of this study was that 132 patients with worsening symptoms declined to participate in the 6-MWT. This
loss was very similar to the smaller study by O’Keeffe et al. 10
Although patient medication may have been optimized, it is
possible that ACE inhibitors,15 and beta-blockers50,51 do not
lead to positive changes in symptom severity despite the
well-known benefits to mortality risk and improvement in
LVEF. Further, subgroup numbers (Table 2) are small as the
reproducibility analysis is based on only 74 patients. The
sensitivity of the 6-MWT to perceived changes in symptom
severity was determined without measuring perceived
changes in anxiety and depression from validated inventories. We acknowledge that these factors may play a role
in changes in functional capacity over time, and should be
included in future studies. An unexpected observation was
that patients whose symptoms improved walked
further and were older than patients whose symptoms worsened or did not change at follow-up (Table 1). It is difficult
to explain these findings; future studies
may wish to address this issue.
We did not carry out an a priori power calculation, and to
our knowledge, there is little written about power for reproducibility studies, with perhaps the work of Donner28,29 the
most well known. Lack of power becomes a (possible)
problem if no significant differences are found. With the
exception of NYHA Class II (ICC = 0.52), all the ICCs were
significant at the 5% level (Table 2). Using the criteria of
Landis and Koch,52 an ICC of 0.52 would be classified as
having only a ‘moderate’ level of reliability. According to the
power curves of Donner,28 demonstrating ‘moderate’ reliability
would require about 50 patients (assuming two measurements on each) if the actual reliability was 0.8 (80%
power, 5% significance, two-tailed). If the actual reliability
was <0.8, then we would require more patients or more
than two measurements per patient.28
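To make the reliability terminology concrete, the sketch below computes an ICC from repeated walk distances and attaches the Landis and Koch descriptive label. It assumes a one-way random-effects ICC, which may differ from the ICC form used in this study. The function names and the distances are hypothetical illustration data, not study data.

```python
from statistics import mean


def icc_oneway(rows):
    """One-way random-effects ICC for k repeated measurements on n patients.

    ICC = (MSB - MSW) / (MSB + (k - 1) * MSW), where MSB and MSW are the
    between-patient and within-patient mean squares. Assumed ICC form.
    """
    n = len(rows)
    k = len(rows[0])
    grand = mean(x for row in rows for x in row)
    row_means = [mean(row) for row in rows]
    ms_between = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(rows, row_means) for x in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


def landis_koch_label(value):
    """Descriptive agreement band of Landis and Koch (1977)."""
    if value < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if value <= upper:
            return label
    return "almost perfect"


if __name__ == "__main__":
    # Hypothetical (baseline, 1 year) 6-MWT distances in metres.
    walks = [(310, 295), (250, 270), (402, 390), (180, 200), (350, 340)]
    icc = icc_oneway(walks)
    print(round(icc, 2), landis_koch_label(icc))
    print(landis_koch_label(0.52), landis_koch_label(0.80))
```

Under these bands an ICC of 0.52 falls in the ‘moderate’ range, as stated above, and an ICC of 0.80 falls at the upper edge of ‘substantial’.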
Conclusion
We have shown a satisfactory long-term (1 year) reproducibility for the 6-MWT in elderly patients with heart failure
due to systolic dysfunction. These data suggest that the
6-MWT may be an appropriate test of functional capacity
in these patients. Males demonstrated lower variability
than females. On the basis of these findings, we conclude
that 6-MWT distance is sensitive to self-perceived changes
in symptoms of heart failure. When the 6-MWT is an endpoint in a clinical trial, a minimum of 500 patients is
needed to detect a change of 30 m in an intervention.
However, SDs will vary depending on the heterogeneity of
the population studied. Researchers may expect a degree
of missing data especially in longitudinal studies; attention
to the missing-data problem should become an integral
part of the clinical trial protocol.
Acknowledgement
We wish to thank the referees for their constructive comments.
Appendix
For each question, patients gave one of six
responses from the following options: (1) no; (2) very little; (3) a
little; (4) some; (5) a lot; (6) very much.
The six questions relating to symptoms are listed below.
In the last month, how much did the following affect you?
(1) breathlessness limiting daily activities;
(2) fatigue limiting daily activities;
(3) inability to do normal daily activities due to health;
(4) inability to do hobbies/sports due to health;
(5) inability to work due to health;
(6) chest pain during normal activity.
References
1. Enright PL, Sherill DL. Reference equations for the six-minute walk in
healthy adults. Am J Respir Crit Care Med 1998;158:1384–1387.
2. Enright PL, McBurnie MA, Bittner V et al. The 6-min walk test: a quick
measure of functional status in elderly patients. Chest 2003;123:
325–327.
3. Guyatt GH, Sullivan MJ, Thompson PL et al. The six minute walk: a new
measure of exercise capacity in patients with chronic heart failure. Can
Med Assoc J 1985;132:919–923.
4. Knox AJ, Morrison JFJ, Muers MF. Reproducibility of walking test results in
chronic obstructive airways disease. Thorax 1988;43:388–392.
5. Faggiano P, D’Aloia A, Gualeni A et al. Oxygen uptake during the 6 minute
walking test. Preliminary experience using a portable device. Am Heart J
1998;165:225–232.
6. Shah MR, Hasselblad V, Gheorghiade M et al. Prognostic usefulness of the
six-minute walk in patients with advanced congestive heart failure secondary to ischemic or non-ischemic cardiomyopathy. Am J Cardiol
2001;88:987–993.
7. Rostagno C, Olivo G, Comeglio M et al. Prognostic value of 6-minute walk
corridor test in patients with mild-to-moderate heart failure: comparison
with other methods of functional evaluation. Eur J Heart Fail
2003;5:247–252.
8. Cazeau S, Leclerc C, Lavergne T et al. Multisite Stimulation in
Cardiomyopathies (MUSTIC) study investigators. Effects of multisite
biventricular pacing in patients with heart failure and intraventricular
conduction delay. New Engl J Med 2001;344:873–880.
9. Demers C, McKelvie RS, Negassa A et al. Reliability, validity, and responsiveness of the six-minute walk test in patients with heart failure. Am
Heart J 2001;142:698–703.
10. O’Keeffe ST, Lye M, Donnellan C et al. Reproducibility and responsiveness
of quality of life assessment and six minute walk test in elderly heart
failure patients. Heart 1998;80:377–382.
11. Steptoe A, Mohabir A, Mahon NG et al. Health related quality of life and
psychological wellbeing in patients with dilated cardiomyopathy. Heart
2000;83:645–650.
12. Stevens D, Elpern E, Kailash S et al. Comparison of hallway and treadmill
six-minute walk tests. Am J Respir Crit Care Med 1999;160:1540–1543.
13. Rector TS, Kubo SH, Cohn JN. Patients’ self-assessment of their congestive heart failure. Content, reliability and validity of a new
measure—the Minnesota-living with heart failure questionnaire. Heart
Fail 1987;3:198–207.
14. Green C, Porter CB, Bresnahan DR et al. Developing and evaluation of the
Kansas City Cardiomyopathy Questionnaire: a new health status measure
for heart failure. J Am Coll Cardiol 2000;35:1245–1255.
15. Juenger J, Schellberg D, Kraemer S et al. Health-related quality of life in
patients with congestive heart failure: comparison with other chronic diseases and relation to functional variables. Heart 2002;87:235–241.
16. National Institute for Clinical Excellence (NICE). Chronic heart failure.
Management of chronic heart failure in adults in primary and secondary
care. Clinical Guidelines. Vol. 5. London: NICE; 2003.
17. Remme WJ, Swedberg K. Comprehensive guidelines for the diagnosis and
treatment of chronic heart failure. Task force for the diagnosis and treatment of chronic heart failure of the European Society of Cardiology. Eur J
Heart Fail 2002;4:11–22.
18. Schiller NB, Shah PM, Crawford M et al. Recommendations for quantification of the left ventricle by two-dimensional echocardiography. J Am
Soc Echocardiogr 1989;2:358–367.
19. Bittner V, Weiner DH, Yusuf S et al. Prediction of mortality and morbidity
with a 6-minute walk test in patients with left ventricular dysfunction.
JAMA 1993;270:1702–1707.
20. Pankoff BA, Overend TJ, Lucy SD et al. Reliability of the six-minute walk
test in people with fibromyalgia. Arth Care Res 2000;13:291–295.
21. Beaton DE, Hogg-Johnson S, Bombardier C. Evaluating changes in health
status: reliability and responsiveness of five generic health status
measures in workers with musculoskeletal disorders. J Clin Epidemiol
1997;50:79–93.
22. Bland JM, Altman DG. Statistical methods for assessing agreement
between two methods of clinical measurement. Lancet 1986;4:307–310.
23. Rigby AS. Statistical methods in epidemiology I. Statistical errors in
hypothesis testing. Disab Rehabil 1998;20:121–126.
24. Perneger TV. What’s wrong with Bonferroni adjustments? BMJ
1998;316:1236–1238.
25. Altman DG. Statistics and ethics in medical research III. How large a
sample? BMJ 1980;281:1236–1238.
26. Day SJ, Graham DF. Sample size and power for comparing two or more
treatment groups in clinical trials. BMJ 1989;299:663–665.
27. Donner A, Eliasziw M. Sample size requirements for reliability studies.
Stat Med 1987;6:441–448.
28. Donner A. Sample size requirements for the comparison of two or more
coefficients of inter-observer agreement. Stat Med 1998;17:1157–1168.
29. Roul G, Germain P, Bareiss P. Does the 6-minute walk test predict the
prognosis in patients with NYHA class II or III chronic heart failure? Am
Heart J 1998;136:449–457.
30. Butland RJ, Pang J, Gross ER et al. Two-, six-, and 12-minute walk tests in
respiratory disease. BMJ 1982;284:1607–1608.
31. Mossberg KA. Reliability of a timed walk test in persons with acquired
brain injury. Am J Phys Med Rehabil 2003;82:385–390.
32. Pinna GD, Opasich C, Mazza A et al. Reproducibility of the
six-minute walking test in heart failure patients. Stat Meth 2000;19:
3087–3094.
33. Hamilton DM, Haennel RG. Validity and reliability of the 6-minute walk
test in a cardiac rehabilitation population. J Cardiopulm Rehabil
2000;20:156–164.
34. Wu G, Sanderson B, Bittner V. The 6-minute walk test: how important is
the learning effect? Am Heart J 2003;146:129–133.
35. Shephard RJ, Franklin BA. Changes in quality of life: a major goal of
cardiac rehabilitation. J Cardiopulm Rehabil 2001;21:189–200.
36. Morrin L, Black S, Reid R. Impact of duration in a cardiac rehabilitation
programme on coronary risk profile and health-related quality of life outcomes. J Cardiopulm Rehabil 2000;20:115–121.
37. Cahalin L, Mathier MA, Semigran M et al. The six-minute walk test predicts peak oxygen uptake and survival in patients with advanced heart
failure. Chest 1996;110:325–332.
38. Opasich C, Pinna GD, Mazza A et al. Reproducibility of the six-minute
walking test in patients with chronic congestive heart failure: practical
implications. Am J Cardiol 1998;81:1497–1500.
39. Lipkin DP, Scriven AJ, Crake T et al. Six minute walking test for assessing
exercise capacity in chronic heart failure. BMJ 1986;292:653–655.
40. Alloy LB, Abramson LY, Whitehouse WG et al. Depressogenic cognitive
styles: predictive validity, information processing and personality characteristics, and developmental origins. Behav Res Ther 1999;37:503–531.
41. Fairclough DL, Peterson HF, Chang V. Why are missing quality of life data a
problem in clinical trials of cancer therapy? Stat Med 1998;17:667–677.
42. Little RJA. A test of missing completely at random for multivariate data
with missing values. J Am Stat Assoc 1988;83:1198–1102.
43. Mazumdar S, Lui KS, Houck P et al. Intent-to-treat analysis for longitudinal clinical trials: coping with the challenge of missing values. J Psych Res
1999;33:87–95.
44. Little RJA, Rubin DB. Statistical Analysis with Missing Data. Hoboken,
New Jersey: Wiley Interscience; 2002.
45. Engels JM, Diehr P. Imputation of missing longitudinal data: a comparison
of methods. J Clin Epidemiol 2003;56:968–976.
46. Auleley G-R, Giraudeau B, Baron G et al. The methods of handling missing
data in clinical trials influence sample size requirements. J Clin
Epidemiol 2004;56:968–976.
47. Houck PR, Mazumdar S, Koru-Sengul T et al. Estimating treatment effects
from longitudinal clinical trial data with missing values: comparative analyses using different methods. Psych Res 2004;129:209–215.
48. Palmer JL. Analysis of missing data in palliative care studies. J Pain
Symptom Manage 2004;28:612–618.
49. Carson PE. Beta-blocker treatment in heart failure. Prog Cardiovasc Dis
1999;41:301–321.
50. Fowler MB. Beta-blockers in heart failure. Do they improve the quality as
well as the quantity of life? Eur Heart J 1998;19:17–25.
51. Landis JR, Koch GG. The measurement of observer agreement for
categorical data. Biometrics 1977;33:159–174.