Clinical Reliability of Manual Muscle Testing
Middle Trapezius and Gluteus Medius Muscles
Physical Therapy, Volume 67, Number 7, July 1987
Ethel Frese, Marybeth Brown, and Barbara J. Norton
The purposes of this study were to develop a protocol to examine the reliability
of manual muscle testing in a clinical setting and to use that protocol to assess
the interrater reliability of manually testing the strength of the middle trapezius
and gluteus medius muscles. One hundred ten patients with various diagnoses
participated as subjects, and 11 physical therapists participated as examiners in
this study. The results showed that interrater reliability for right and left middle
trapezius and gluteus medius muscles was low. The percentage of subjects for whom
the two examiners assigned the same grade or grades within one third of a grade of each other ranged from
50% to 60% for the four muscles. This study indicates that using manual muscle
testing to make accurate clinical assessments of patient status is of questionable
value.
Key Words: Manual muscle testing, Muscle hypotonia, Physical therapy.
Manual muscle testing is an important clinical tool used by physical therapists to determine a patient's muscle
strength. Muscle testing originated in
the United States in the early 1900s
during the study of muscle function in
patients with poliomyelitis. Despite the
change in the role of manual muscle
testing with the end of the last poliomyelitis epidemic in this country, it
remains an important clinical tool for
assessing the muscular causes of movement dysfunction. Testing of muscles is
considered to be an essential prerequisite for treatment program planning and
modification. The results of manual
muscle testing also are used to make
clinical judgments concerning the patient's progress or deterioration, as well
as to assess the effectiveness of a particular treatment.
Mrs. Frese is Instructor, Department of Physical
Therapy, St. Louis University, 1504 S Grand Blvd,
St. Louis, MO 63104 (USA). She was a master's
degree student, Program in Physical Therapy,
School of Medicine, Washington University, St.
Louis, MO, when this study was completed.
Dr. Brown is Instructor, Program in Physical
Therapy, PO Box 8083, School of Medicine, Washington University, 660 S Euclid Ave, St. Louis, MO
63110.
Mrs. Norton is Instructor, Program in Physical
Therapy, School of Medicine, Washington University.
This study was completed in partial fulfillment
of the requirements for Mrs. Frese's master's degree,
Washington University.
This article was submitted April 14, 1986; was
with the authors for revision 10 weeks; and was
accepted August 27, 1986. Potential Conflict of Interest: 4.
The study of the reliability of examiners performing manual muscle tests is
necessary if the tests are to be used.
Manual muscle testing reliability in a
clinical setting has been studied minimally. Lilienfeld et al found muscle test
grades from Zero to Normal assigned by
12 to 39 examiners in four different
trials to be within one grade, although
the testing method was controlled, because the examiners were trained by the
same instructor.1 Iddings et al also found
manual muscle testing to be reliable
among 10 examiners whose ratings were
within one grade in 90.6% of the trials.2
All of the subjects in both of these studies had the diagnosis of poliomyelitis,
and the examiners were highly skilled in
manual muscle testing.
The reliability of manual muscle tests
has been the most difficult to achieve
for grades greater than Fair because of
the examiner's subjective judgment of
the amount of resistance applied during
the test. One of the problems central to
manual muscle testing is the variable
"frame of reference" for making an assessment. Such subjective judgments include determining what is normal muscle strength for an individual given the
person's age and size, in addition to the
relative strengths of the tester and patient.3-6
Many other factors influence the reproducibility of a manual muscle test.
The testing method may vary among
therapists (eg, Kendall and McCreary7
vs Daniels and Worthingham8), both because the therapists' training may have
differed and because physical therapists
tend to develop their own techniques
and standards for grading muscle
strength. Other variables that influence
the accuracy of a muscle test are 1) the
point and line of force application, 2)
the magnitude of resistive force, 3) the
speed of resistive force application, 4)
the duration of the contraction, 5) the
degree of cooperation from the patient,
6) fatigue, 7) various distracting influences, 8) the type of instructions given,
9) the tone of the therapist's voice, and
10) the amount of interaction between
the therapist and patient.4,9-15
Beasley attempted to increase objectivity in manual muscle testing by developing a standardized scale of norms
for muscle strength.16 Using an electronic myodynagraph, Beasley found a
discrepancy between the percentage of
Normal strength assigned in a manual
muscle test and the percentage of
strength found by a quantitative measure.16 The Good muscle strength group,
usually rated at 75% of Normal in the
manual muscle testing system,7 had only
43% of the Normal value on Beasley's
standardized scale. The Fair group had
a rating of only 9% of Normal, rather
than the 50% of Normal usually assigned.
The Poor group, ordinarily rated at 25%
of Normal on the manual scale, had a
rating of only 2.6% of Normal on the
standardized scale. The standard deviations showed considerable overlap in the
percentage of Normal scores in grades
below Fair, indicating poor differentiation in grades below Fair, the range in
which manual muscle testing supposedly is more accurate.16
The purposes of this study were to
develop a protocol to examine the reliability of manually testing muscle
strength in a physical therapy department and to use that protocol to assess
the interrater reliability for manually
testing the middle trapezius and gluteus
medius muscles. We chose these two muscles 1) because we wanted to
examine muscles from both the upper
and lower extremities and 2) because
the selected muscles are difficult to test
owing to the stabilization required by
other muscle groups during testing. In
addition, the two muscles selected for
study frequently are found to be weak
in patients. The hypothesis was that a
staff of physical therapists working together in a physical therapy department
would demonstrate interrater reliability
in testing the middle trapezius and gluteus medius muscles.
METHOD
Subjects
One hundred ten patients, who were
referred for physical therapy at St. Louis
University Hospital, participated in the
study. The patients had various musculoskeletal and neurological disorders including low back pain, degenerative
joint disease, cervical pain, gunshot
wound, chondromalacia, rheumatoid
arthritis, and connective tissue disease.
The patients had to exhibit sufficient
range of motion to allow the body part
to be placed in the test position and
either pain-free motion or pain that did
not interfere with the muscle test. The
test group consisted of 50 female and 60
male subjects, aged 15 to 76 years, with
a mean age of 41 years (± 15 years).
Examiners
Eleven staff physical therapists at St.
Louis University Hospital served as the
examiners. All examiners were graduates of accredited university programs.
Seven were graduates of the same university, 2 others graduated from another
university, and each of the remaining 2 therapists graduated from a different school. The mean number of years of
experience of the staff members was 2.3
± 1.2 years. Eight of the therapists preferred the Kendall and McCreary muscle testing technique,7 2 preferred that
of Daniels and Worthingham,8 and 1
used both.
Each therapist received a work sheet
with 10 spaces for 10 different patients.
Next to each space was the name of the
therapist with whom the examiner had
been paired randomly for that particular
patient. A different therapist's name appeared in each space, so every examiner
was paired once with each of the other 10 therapists. Each therapist also received a second work sheet with 10 spaces to be
used for recording muscle grades of another therapist's patient when her name
appeared on that therapist's list for that
patient. Each examiner then selected 10
patients to be included in the study. The
Appendix gives the muscle testing scale
and definitions that all of the therapists
used.
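The pairing scheme described above can be sketched in Python. This is a minimal illustration, not the authors' procedure: it assumes therapists are numbered 1 to 11 and that "paired randomly" meant a random order of the other 10 therapists on each work sheet; the function name make_worksheets and the seed are hypothetical. Under these assumptions, 11 examiners with 10 patients each give 110 patients, each therapist fills 10 spaces on her second work sheet, and every pair of therapists shares exactly two patients.

```python
import random
from collections import Counter


def make_worksheets(n_therapists=11, seed=None):
    """Hypothetical sketch: map each examiner to the partner named in each work-sheet space.

    Every examiner's sheet lists each of the other therapists exactly once,
    in random order, so each sheet has n_therapists - 1 spaces.
    """
    rng = random.Random(seed)
    therapists = list(range(1, n_therapists + 1))
    sheets = {}
    for examiner in therapists:
        partners = [t for t in therapists if t != examiner]
        rng.shuffle(partners)  # random order of partners for this examiner's patients
        sheets[examiner] = partners
    return sheets


sheets = make_worksheets(seed=1987)

# One patient per (first examiner, partner) space.
pairs = [(first, second) for first, partners in sheets.items() for second in partners]
print(len(pairs))  # 110

# Spaces each therapist fills on her second work sheet (as the second examiner).
second_sheet_counts = Counter(second for _, second in pairs)

# Patients shared by each unordered pair of therapists.
shared_patients = Counter(frozenset(pair) for pair in pairs)
```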
Testing Procedures
Manual muscle testing was performed
during the patient's daily treatment session. A rest period of at least three minutes was allowed between the two examiners' tests, and the two therapists
kept their results confidential. The examiners used a "break" test, and for the
gluteus medius muscle test, the patient's
hip was placed in as much extension as
possible.
Testing Sequence
The testing sequence involved the following steps:
1. The examiner first identified a patient suitable for the study.
2. The examiner performed the middle
trapezius and gluteus medius muscle
tests bilaterally. The side and muscle
to be tested first were assigned randomly before the beginning of the
test phase. The examiner used her
accustomed technique of muscle testing to determine the appropriate
grade and repeated the test several
times, if needed, to assign a grade.
She then recorded the grades in the
appropriate space on her work sheet.
3. A second therapist, who had been
paired randomly with her for that
patient, then performed the same
two muscle tests in the same order,
but using her own testing technique.
The second therapist also repeated
the test several times, if necessary, to
determine a grade. She then recorded
the grades on her work sheet.
Data Analysis
Cohen's weighted Kappa (Kw)17 was used as an index of
agreement for interrater reliability. This
index weights disagreements by the
amount of disagreement. A weight matrix (Tab. 1) was designed for the scores
obtained in the study and gave the ratio-scaled degree of agreement (partial credit) assigned
to each cell. Each cell in the matrix
represents one pair of scores, one from each examiner.
For example, the cell for Normal-Normal had a weight value of 1.0, the cell
for Good-Normal was 0.7, and the cell
for Poor minus-Normal was 0.0.
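The weighting in Table 1 follows a simple linear rule, which makes the statistic easy to sketch in Python. This is an illustration under stated assumptions, not the authors' software: it assumes linear agreement weights w = 1 - |i - j|/10 over the 11 grades (which reproduces every cell of Table 1, eg, Good-Normal = 0.7), and it uses the agreement-weight form of Cohen's statistic, which is algebraically equivalent to his disagreement-weight form with disagreement weights v = 1 - w. The names GRADES, linear_weights, and weighted_kappa are hypothetical.

```python
import math

# The 11 scores of Table 1, ordered from weakest to strongest.
GRADES = ["P-", "P", "P+", "F-", "F", "F+", "G-", "G", "G+", "N-", "N"]


def linear_weights(k):
    """Agreement weights w[i][j] = 1 - |i - j| / (k - 1); for k = 11 this matches Table 1."""
    return [[1 - abs(i - j) / (k - 1) for j in range(k)] for i in range(k)]


def weighted_kappa(scores_1, scores_2, categories=GRADES):
    """Cohen's weighted kappa for two examiners, using linear agreement weights."""
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    w = linear_weights(k)
    n = len(scores_1)

    # p[i][j]: proportion of patients graded i by examiner 1 and j by examiner 2.
    p = [[0.0] * k for _ in range(k)]
    for a, b in zip(scores_1, scores_2):
        p[index[a]][index[b]] += 1 / n

    row = [sum(p[i]) for i in range(k)]                       # examiner 1 marginals
    col = [sum(p[i][j] for i in range(k)) for j in range(k)]  # examiner 2 marginals

    observed = sum(w[i][j] * p[i][j] for i in range(k) for j in range(k))
    expected = sum(w[i][j] * row[i] * col[j] for i in range(k) for j in range(k))
    if math.isclose(expected, 1.0):
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


w = linear_weights(len(GRADES))
print(round(w[GRADES.index("G")][GRADES.index("N")], 1))  # 0.7
```

Identical grades across varied patients give Kw = 1. Two patients graded Normal and Good by one examiner and Good and Normal by the other give Kw = -1 even though each pair earns 0.7 partial credit, because with only two grades in use the expected chance agreement (0.85) exceeds the observed agreement (0.7).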
To determine whether eliminating the
pluses and minuses would improve the
reliability coefficient, we compressed
the original scores into a five-point scale.
Pluses and minuses were assigned the
same score as the main grade (eg, Fair
plus and Fair minus became Fair), and
a weight matrix was designed for these
scores.
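The compression step can be sketched in Python, assuming scores are recorded with the labels of Table 1; the helper names below are hypothetical. The two printed cases show the trade-off: Fair minus and Fair plus, two thirds of a grade apart, become identical, whereas Fair plus and Good minus, one third of a grade apart, become a full grade apart.

```python
# Original 11-point scale, in thirds of a grade from weakest to strongest (Table 1).
GRADES = ["P-", "P", "P+", "F-", "F", "F+", "G-", "G", "G+", "N-", "N"]
# Order of the main grades that remain after compression (hypothetical helper).
MAIN_GRADES = ["P", "F", "G", "N"]


def compress(grade):
    """Drop the plus or minus, eg, 'F+' and 'F-' both become 'F'."""
    return grade.rstrip("+-")


def thirds_apart(a, b):
    """Distance between two original scores, in thirds of a grade."""
    return abs(GRADES.index(a) - GRADES.index(b))


def grades_apart_after_compression(a, b):
    """Distance between the same two scores after compression, in whole grades."""
    return abs(MAIN_GRADES.index(compress(a)) - MAIN_GRADES.index(compress(b)))


print(thirds_apart("F+", "G-"), grades_apart_after_compression("F+", "G-"))  # 1 1
print(thirds_apart("F-", "F+"), grades_apart_after_compression("F-", "F+"))  # 2 0
```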
The muscle test scores of every patient
whom Therapist 1 examined were compared with the scores of each of the other
10 therapists with whom she was paired.
An interrater reliability coefficient then
was computed for Therapist 1. This
process was repeated for each therapist
so that an interrater reliability coefficient was computed for all 11 examiners. By doing so, we wanted to determine whether any particular therapist
appeared to be less reliable compared
with the other 10, and whether the
school the therapist graduated from or
her years of experience were factors affecting reliability.
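The per-therapist computation can be sketched in Python. This assumes that a therapist's coefficient pooled every patient she graded, as either first or second examiner, and paired her grade with her partner's; the record layout, the function names, and the sample records are hypothetical, and the kappa uses the same linear weights as Table 1.

```python
from collections import defaultdict

# The 11 scores of Table 1, weakest to strongest.
GRADES = ["P-", "P", "P+", "F-", "F", "F+", "G-", "G", "G+", "N-", "N"]


def weighted_kappa(x, y, k=len(GRADES)):
    """Linear-weighted Cohen's kappa for paired grade indices x[i], y[i] in range(k)."""
    n = len(x)

    def w(i, j):
        return 1 - abs(i - j) / (k - 1)

    observed = sum(w(a, b) for a, b in zip(x, y)) / n
    row = [x.count(i) / n for i in range(k)]
    col = [y.count(j) / n for j in range(k)]
    expected = sum(w(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    if abs(1 - expected) < 1e-12:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


def kappa_per_therapist(records):
    """Compute one coefficient per therapist for one muscle.

    records: (first_examiner, second_examiner, first_grade, second_grade) tuples.
    Assumption: each therapist's grades are pooled across both of her roles.
    """
    pooled = defaultdict(lambda: ([], []))  # therapist -> (own grades, partner grades)
    for first, second, g1, g2 in records:
        i1, i2 = GRADES.index(g1), GRADES.index(g2)
        pooled[first][0].append(i1)
        pooled[first][1].append(i2)
        pooled[second][0].append(i2)
        pooled[second][1].append(i1)
    return {t: weighted_kappa(own, partner) for t, (own, partner) in pooled.items()}


# Hypothetical records for three therapists who always agree.
records = [
    (1, 2, "G", "G"), (2, 1, "N", "N"),
    (1, 3, "F", "F"), (3, 1, "G+", "G+"),
    (2, 3, "P", "P"), (3, 2, "N-", "N-"),
]
kappas = kappa_per_therapist(records)
```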
RESULTS
Table 2 summarizes the percentages
of the total number of subjects on whom
the examiners agreed, in addition to percentages of agreement within several
ranges of disparity (ie, the fraction of a grade by which the two examiners' scores
differed). The percentage of subjects on whom the same grade was obtained by the two examiners ranged from
28% to 47% for the four muscles, and
for 89% to 92% of the subjects we found
either complete agreement or agreement
within one grade.
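The binning behind Table 2 can be sketched in Python. The input here is rebuilt from the RMT counts in Table 2 itself, one grade difference (in thirds of a grade) per subject, not from raw study data; the names are hypothetical, and "Within 1 grade apart" is taken to mean 1/3 to 1 grade apart, which is how the table's rows add up. Under those assumptions the sketch reproduces the RMT percentages.

```python
from collections import Counter

# Row labels of Table 2, indexed by the difference in thirds of a grade.
LABELS = ["Same grade", "1/3 grade apart", "2/3 grade apart", "1 grade apart",
          "1 1/3 grades apart", "1 2/3 grades apart"]


def agreement_table(diffs_in_thirds, n_subjects):
    """Tabulate |grade difference| (in thirds) as (count, rounded percentage) per row."""
    counts = Counter(diffs_in_thirds)
    rows = {label: (counts[d], round(100 * counts[d] / n_subjects))
            for d, label in enumerate(LABELS)}
    within_one = sum(counts[d] for d in (1, 2, 3))  # 1/3 to 1 grade apart
    rows["Within 1 grade apart"] = (within_one, round(100 * within_one / n_subjects))
    same_or_within = counts[0] + within_one
    rows["Same grade or within 1 grade"] = (
        same_or_within, round(100 * same_or_within / n_subjects))
    return rows


# RMT column of Table 2, re-expanded into one difference per subject.
rmt_counts = {0: 31, 1: 24, 2: 19, 3: 25, 4: 6, 5: 5}
rmt_diffs = [d for d, c in rmt_counts.items() for _ in range(c)]
rmt = agreement_table(rmt_diffs, n_subjects=110)
print(rmt["Same grade"])                    # (31, 28)
print(rmt["Same grade or within 1 grade"])  # (99, 90)
```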
The percentage of patients who were
rated Fair plus or above by one or both
examiners was 88% for the right middle
trapezius muscle, 90% for the left middle trapezius muscle, 91% for the right
gluteus medius muscle, and 95% for the
left gluteus medius muscle. One or both
examiners assigned a grade of Normal
in 50% of the tests for the right middle
trapezius muscle, in 44% of the tests for
the left middle trapezius muscle, in 67%
of the tests for the right gluteus medius
muscle, and in 70% of tests for the left
gluteus medius muscle.
Table 3 gives the interrater reliability
coefficients for both the original and the
compressed muscle testing scores. The
reliability for the original scores was low,
ranging from .11 to .58. Compressing
the scores did not change the interrater
reliability coefficient appreciably (.26-.42). Even for grades below Fair, we
found poor interrater reliability.
Table 4 summarizes the results of
comparing each of the examiners with
every other examiner for each test. Reliability coefficients ranged from .04 to
.66 with no pattern of high reliability
being established by any one therapist.
Those therapists with more clinical experience did not demonstrate any
greater level of reliability than those who
had graduated more recently. The
school from which the therapist graduated did not appear to affect reliability
because those therapists who graduated
from the same university did not demonstrate any greater reliability among
each other than the therapists who graduated from different schools. Therapist
3 demonstrated low reliability coefficients on all four tests (.08-.19).
DISCUSSION
Using Cohen's weighted Kappa determination, we found interrater reliability
for manually testing the strength of middle trapezius and gluteus medius muscles in a clinical setting to be poor. When
the results were expressed as percentages
of agreement, however, they were similar to the findings of Lilienfeld et al1 and
Iddings et al,2 who reported good reliability within one grade among experienced examiners (more experienced
than those in our study). The results
(28%-47% agreement) did not agree
with those of Williams,10 who found that
two examiners agreed completely between 60% and 75% of the time. The
examiners in our study agreed more frequently on the gluteal muscle tests than
on the middle trapezius muscle tests for
reasons we could not determine. We
also found poor interrater reliability in
grades below Fair, which agrees with
Beasley's16 finding of poor differentiation in grades below Fair.
Compressing the scores by eliminating pluses and minuses did not appreciably change the interrater reliability
coefficients. The coefficient for the right
middle trapezius muscle decreased, possibly because the interval widened between grades with pluses and minuses
when they were compressed (eg, Fair
plus-Good minus was changed to Fair-Good).
TABLE 1
Weight Matrix for Original Scores(a)

Muscle Test
Scores for   Muscle Test Scores for Examiner 2
Examiner 1   P-   P    P+   F-   F    F+   G-   G    G+   N-   N
P-           1.0  0.9  0.8  0.7  0.6  0.5  0.4  0.3  0.2  0.1  0.0
P            0.9  1.0  0.9  0.8  0.7  0.6  0.5  0.4  0.3  0.2  0.1
P+           0.8  0.9  1.0  0.9  0.8  0.7  0.6  0.5  0.4  0.3  0.2
F-           0.7  0.8  0.9  1.0  0.9  0.8  0.7  0.6  0.5  0.4  0.3
F            0.6  0.7  0.8  0.9  1.0  0.9  0.8  0.7  0.6  0.5  0.4
F+           0.5  0.6  0.7  0.8  0.9  1.0  0.9  0.8  0.7  0.6  0.5
G-           0.4  0.5  0.6  0.7  0.8  0.9  1.0  0.9  0.8  0.7  0.6
G            0.3  0.4  0.5  0.6  0.7  0.8  0.9  1.0  0.9  0.8  0.7
G+           0.2  0.3  0.4  0.5  0.6  0.7  0.8  0.9  1.0  0.9  0.8
N-           0.1  0.2  0.3  0.4  0.5  0.6  0.7  0.8  0.9  1.0  0.9
N            0.0  0.1  0.2  0.3  0.4  0.5  0.6  0.7  0.8  0.9  1.0
(a) Eleven possible scores ranging from P- to N.
TABLE 2
Percentage of Agreement Among Scores for Subjects(a)

                              Muscles(b)
                              RMT         LMT         RGM         LGM
Grade                         n     %     n     %     n     %     n     %
Same grade                    31    28    32    29    52    47    50    45
1/3 grade apart               24    22    27    25    11    10    17    15
2/3 grade apart               19    17    23    21    24    22    15    14
1 grade apart                 25    23    15    14    14    13    16    15
1 1/3 grades apart            6     5     8     7     4     4     7     6
1 2/3 grades apart            5     5     4     4     1     1     5     5
Within 1 grade apart          68    62    65    60    49    45    48    44
Same grade or within 1 grade  99    90    97    89    101   92    98    89
(a) Each grade was divided into thirds with the use of pluses and minuses; therefore, the
difference between 2 and 2+ was considered 1/3, the difference between 2- and 2+ was
2/3, and the difference between 2 and 3 was one grade.
(b) RMT = right middle trapezius, LMT = left middle trapezius, RGM = right gluteus medius, LGM = left gluteus medius.
TABLE 3
Interrater Reliability of Original and Compressed Scores

                            Muscles(a)
                            RMT     LMT     RGM     LGM
Conditions        N         Kw(b)   Kw      Kw      Kw
Original          110       .58     .29     .25     .11
Compressed        110       .26     .26     .30     .42
(a) RMT = right middle trapezius, LMT = left middle trapezius, RGM = right gluteus medius, LGM = left gluteus medius.
(b) Kw = weighted Kappa coefficient.
The distribution of the scores might
have affected the reliability or agreement coefficient. Because the majority
of the subjects' scores were Fair plus or
greater for all of the muscles (88%-95%),
the scores were not well distributed
across all possible muscle grades. This
skewed distribution might have spuriously reduced
the magnitude of the Kappa
coefficient. A broader range of scores
should improve the chances of demonstrating an accurate measure of agreement. Because we established the criterion of pain not interfering with the
muscle test, some of the weaker patients
may have been excluded from the study.
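The effect of a skewed distribution on Kappa can be illustrated with a small Python sketch. For simplicity it swaps in unweighted Cohen's kappa with two hypothetical categories ("below Fair plus" and "Fair plus or above"), and the counts are invented for illustration, not study data. Both tables show 80% raw agreement, but when 85% of ratings fall in one category, chance agreement rises from 0.50 to 0.745 and Kappa falls from 0.60 to about 0.22.

```python
def cohen_kappa(table):
    """Unweighted Cohen's kappa for a square table of counts (rows: examiner 1, cols: examiner 2)."""
    n = sum(map(sum, table))
    k = len(table)
    observed = sum(table[i][i] for i in range(k)) / n
    row = [sum(table[i]) / n for i in range(k)]
    col = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    expected = sum(row[i] * col[i] for i in range(k))  # chance agreement
    return (observed - expected) / (1 - expected)


# Hypothetical 100-patient tables; categories: [below Fair plus, Fair plus or above].
balanced = [[40, 10],
            [10, 40]]
skewed = [[5, 10],
          [10, 75]]
print(round(cohen_kappa(balanced), 2))  # 0.6
print(round(cohen_kappa(skewed), 2))    # 0.22
```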
One procedural problem that could
have affected our results was the difficulty of positioning some of the patients
for a particular test. Different therapists
adjusted the procedure differently to
solve the problem.
TABLE 4
Interrater Reliability Among Therapists

                Muscles(a)
                RMT     LMT     RGM     LGM
Therapist       Kw(b)   Kw      Kw      Kw
1               .22     .15     .31     .58
2               .21     .52     .34     .55
3               .19     .16     .08     .13
4               .06     .33     .52     .44
5               .42     .25     .48     .38
6               .28     .30     .26     .34
7               .42     .63     .50     .29
8               .15     .47     .66     .37
9               .37     .46     .25     .29
10              .14     .28     .56     .49
11              .62     .04     .20     .11
(a) RMT = right middle trapezius, LMT = left middle trapezius, RGM = right gluteus medius, LGM = left gluteus medius.
(b) Kw = weighted Kappa coefficient.
APPENDIX
Muscle Testing Scale and Definitions(a)

Normal (5): able to hold the test against gravity and maximum resistance, or to move the part into the test position and hold against gravity and maximum pressure
Normal minus (5-): same as for Normal except slightly less resistance can be given
Good plus (4+): same as for Good but slightly more resistance can be given
Good (4): same as for Normal except able to hold against moderate resistance
Good minus (4-): same as for Good but slightly less resistance can be given
Fair plus (3+): able to hold the test position against gravity, or to move the part into the test position and hold against gravity and slight resistance
Fair (3): able to hold the test position against gravity, or to move the part into the test position and hold against gravity
Fair minus (3-): able to release gradually from the test position against gravity, or to move the part toward the test position against gravity almost through full range
Poor plus (2+): able to move the part through full range with gravity eliminated, but against slight resistance
Poor (2): able to move the part through full range with gravity eliminated
Poor minus (2-): able to move the part through partial range with gravity eliminated
Trace (1): muscle contraction can be palpated
Zero (0): no contraction can be elicited
(a) Adapted from Kendall and McCreary7 and Daniels and Worthingham.8
The patients' age did not appear to be
a factor in the low interrater reliability
coefficients because the scores for the
youngest and the oldest subjects in the
study were not consistently any farther
apart than those of the subjects in the
middle age range.
Achieving reliability within one
grade, as in this study, has questionable
clinical value, especially when considering the differences between Poor and
Fair, or Fair and Good, versus the difference between Good and Normal. The
interval between each of these pairs of
grades is one grade, although the therapists' subjective judgments of patient
function may have been quite different.
The accuracy of assessments of patient
progress or deterioration, therefore,
would be questionable despite reliability
within one grade.
Manual muscle testing is an inexpensive, relatively quick, and convenient
method for assessing a patient's muscle
strength. In view of the results of this
study, however, physical therapists
should consider supplementing their
manual muscle test scores with isokinetic testing, dynamometry, or tensiometry. Griffin et al compared the results
of manual muscle testing with isokinetic
testing for knee extensor muscles in patients with neuromuscular disease and
found that a lack of strength improvement or a decrease in strength was demonstrated by both manual muscle testing
and isokinetic testing.18 They also
found, however, that in patients with a
manual muscle test score of 9 to 10
(Normal minus-Normal), isokinetic
testing revealed either muscle strength
deficits or improvement not detectable
with manual muscle testing methods.
They concluded that isokinetic testing
adds valuable information when patients have manual muscle test scores of
Normal. Bohannon found a significant
correlation between manual
muscle test scores and dynamometer
test scores for knee extensor muscles,
which indicated that both testing methods measure muscle strength similarly.19
He found a significant difference, however, between theoretical percentage
manual muscle test scores and calculated dynamometer percentage test
scores, which indicated that theoretical
percentage scores based on manual muscle testing are likely to overestimate a
patient's muscle strength. Supplementing manual muscle test scores with isokinetic testing, dynamometry, or tensiometry would decrease the subjectivity
in assessing a patient's disability.
Further study is needed in this area,
with each therapist being paired more
than twice with each of the other therapists. One
potential study might incorporate several staff in-service training sessions before the start of testing to help standardize the muscle testing techniques among
the staff members as much as possible.
Reliability then could be reassessed to
determine whether any improvement is
noted. Garraway et al increased the proportion of stroke assessments
(which included motor function) in which the examiners reached total
agreement from 41% to 68% after standardizing definitions, having the
examiners discuss and interpret the instructions, and providing practice.20
CONCLUSIONS
The results of this study do not support the research hypothesis that staff
physical therapists can perform manual
muscle tests reliably in a clinical setting.
The results do demonstrate that the
therapists are reliable within one grade;
however, this degree of reproducibility
may not be adequate for making clinical
judgments. Supplementing muscle test
scores with isokinetic testing, dynamometry, or tensiometry is suggested.
The development of a standardized
method of muscle testing is needed so
that different examiners can obtain
comparable results in a clinical setting.
Standardizing the resistance given in
grades of Good and Normal so that
subjective judgment is minimized is an
area in which further study is needed.
Acknowledgments. We thank the
physical therapy staff of St. Louis University Hospital for their cooperation
and Carolyn Heriza for her advice in
planning the study.
REFERENCES
1. Lilienfeld AM, Jacobs M, Willis M: A study of the reproducibility of muscle testing and certain other aspects of muscle scoring. Phys Ther Rev 34:279-289, 1954
2. Iddings DM, Smith LK, Spencer WA: Muscle testing: Part 2. Reliability in clinical use. Phys Ther Rev 41:249-256, 1961
3. Molnar GE, Alexander J, Grutfield N: Reliability of quantitative strength measurements in children. Arch Phys Med Rehabil 60:218-221, 1979
4. Editorial: The accuracy of the manual muscle test. Arch Phys Med Rehabil 35:515-517, 1954
5. Bechtol CO: Grip test: The use of a dynamometer with adjustable hand spacing. J Bone Joint Surg [Am] 36:820-824, 1954
6. Nicholas JA, Sapega A, Kraus H, et al: Factors influencing manual muscle tests in physical therapy. J Bone Joint Surg [Am] 60:186-190, 1978
7. Kendall FP, McCreary EK: Muscles: Testing and Function, ed 3. Baltimore, MD, Williams & Wilkins, 1983
8. Daniels L, Worthingham C: Muscle Testing: Techniques of Manual Examination, ed 4. Philadelphia, PA, W B Saunders Co, 1980
9. Smidt GL, Rogers MW: Factors contributing to the regulation and clinical assessment of muscular strength. Phys Ther 62:1283-1290, 1982
10. Williams M: Manual muscle testing: Development and current use. Phys Ther Rev 36:797-805, 1956
11. Wintz MN: Variations in current manual muscle testing. Phys Ther Rev 39:466-475, 1959
12. Johannson CA, Kent BE, Shepard KF: Relationship between verbal command volume and magnitude of muscle contraction. Phys Ther 63:1260-1265, 1983
13. Westers BM: Factors influencing strength testing and exercise prescription. Physiotherapy 68:42-44, 1982
14. Gonnella C, Harmon G, Jacobs M: The role of the physical therapist in the gamma globulin poliomyelitis prevention study. Phys Ther Rev 33:337-345, 1953
15. Trombly CA: Occupational Therapy for Physical Dysfunction, ed 2. Baltimore, MD, Williams & Wilkins, 1982, pp 173-229
16. Beasley WC: Quantitative muscle testing: Principles and applications to research and clinical services. Arch Phys Med Rehabil 42:398-425, 1961
17. Cohen J: Weighted Kappa: Nominal scale agreement and provision for scaled disagreement or partial credit. Psychol Bull 70:213-220, 1968
18. Griffin JW, McClure MH, Bertorini TE: Sequential isokinetic and manual muscle testing in patients with neuromuscular disease. Phys Ther 66:32-35, 1986
19. Bohannon RW: Manual muscle test scores and dynamometer test scores of knee extension strength. Arch Phys Med Rehabil 67:390-392, 1986
20. Garraway WM, Akhtar AJ, Gore SM, et al: Observer variation in the clinical assessment of stroke. Age Ageing 5:233-240, 1976