Effects of Muscle Group and Placement Site on Reliability of Hand-held Dynamometry Strength Measurements

Lori Mitchell McMahon, MSE, PT¹
Ray C. Burdett, PhD, PT²
Susan L. Whitney, MS, PT, ATC³

Studies of measuring muscle strength with hand-held dynamometers have produced a variety of results. The purpose of this research was to further investigate the effect of muscle group and placement site on reliability. The purpose of Part I of this study was to examine reliabilities of force measurements generated by four specific muscle groups using a hand-held dynamometer (HHD). Part II's purpose was to determine the effects of HHD placement site on the variability of HHD force measurements. In Part I, two testers obtained measurements of right shoulder abductor, wrist extensor, hip flexor, and ankle dorsiflexor forces in 20 subjects. Two-way analysis of variance indicated a main effect due to tester, but no tester by session interaction and no main effect due to session (p < 0.05). Intraclass correlation coefficients ranged from .76 to .93 for within-session intratester reliabilities, .67 to .84 for between-session intratester reliabilities, and .30 to .83 for within-session intertester reliabilities. Reliability tended to be higher when HHD placement sites were farther from joint centers. Part II explored the hypothesis that HHD forces would be less variable if measured distally. One tester measured shoulder abductor forces for 30 subjects at three sites on the upper extremity. Bartlett's test for homogeneity of variance indicated a lower variability at the distal placement site (p < 0.05).

Key Words: muscle strength, reliability, hand-held dynamometer

¹ Research engineer, Department of Otolaryngology, The Eye & Ear Institute of Pittsburgh, Pittsburgh, PA; formerly instructor, Department of Physical Therapy, University of Pittsburgh
² Associate professor, Department of Physical Therapy, University of Pittsburgh, Pittsburgh, PA
³ Assistant professor, Department of Physical Therapy, University of Pittsburgh, Pittsburgh, PA

The use of manual muscle testing as a clinical indicator of strength has been criticized for various reasons, including subjectivity at higher muscle grades, an ordinal measurement scale, and relatively large increments between muscle grades (2, 15). Hand-held dynamometers (HHDs) have been proposed as an alternative to manual muscle testing for obtaining strength measures in the clinic (5, 7, 8, 10). During a measurement session, the tester stabilizes the HHD against the extremity being tested while the patient exerts a maximal force against the dynamometer. The HHD is objective, uses a measurement scale with equal intervals, and provides small increments between measured values.

Muscle strength is defined as the amount of torque that a muscle group can exert (14). This torque is generally measured as the external force that can be exerted at a given lever arm distance (torque = force × lever arm), where the lever arm is the perpendicular distance from the joint center to the force action line. HHD outputs are force readings and, therefore, are not direct measures of muscle strength. When the HHD is applied at a consistent position, however, lever arm length remains constant, and the force reading can therefore be used as an indicator of muscle strength.

According to Blesh (3), reliability may be given a subjective rating according to four ranges. Values greater than 0.90 are considered high; those ranging from 0.80 to 0.89 are termed good; those between 0.70 and 0.79 are fair; and reliabilities less than 0.70 are considered poor.

Intratester reliability of HHD measurements within a session has been shown to be high in both healthy and patient populations (4, 11). Riddle et al also reported high within-session intratester reliability (38-.98) for the paretic and nonparetic sides of brain-damaged patients.
For sessions 2 days apart, however, they reported high intratester reliability for the paretic side (.90-.98) and varied intratester reliabilities for some muscle groups of the nonparetic side (.31-.93) (12).

Volume 15 Number 5 May 1992 JOSPT

Studies of intertester reliability have generated a variety of results. Bohannon and Andrews (6), using a varied patient sample, found relatively high intertester reliabilities (.88-.94). Byl (11) used healthy subjects and reported lower intertester reliabilities (.52-.84) than Bohannon and Andrews. Agre et al (1) found good to high intertester reliabilities for the upper extremities (.79-.99) but poor to fair intertester reliabilities for the lower extremities (.49-.81). Agre et al suggested several factors that may contribute to low intertester reliability, including strength of the tested muscle group, ability of the tester to stabilize the instrument when obtaining HHD measurements, and distance from joint centers to HHD placement sites. Riddle et al (12) examined the relationship between muscle strength and reliability and reported a Pearson product-moment correlation of 0.1. This result does not support the speculation by Agre et al (1) that strength of the muscle being tested is a factor in HHD reliability. Recently, however, Wikholm and Bohannon (16) reported that tester strength influences the magnitude and reliability of HHD measurements for muscle forces greater than 120 N. No studies have investigated the effects of HHD distance from the joint center on reliability of HHD force measurements.

This study was conducted in two parts. The purpose of the first part was to add to the current knowledge base regarding intertester and intratester reliability of HHD measurements.
Specifically, within-session intertester and intratester reliabilities as well as between-session intratester reliabilities were determined for four muscle groups (shoulder abductors, wrist extensors, hip flexors, and ankle dorsiflexors). In the second part, the relationship between HHD placement sites and reliability of shoulder abductor force measurements was investigated.

PART I

METHOD

Subjects

Twenty young adult females (mean age = 22.8 years, range 19-30 years) with no known muscle strength limitations or orthopaedic pathology served as subjects for this part of the study. All subjects signed an informed consent form approved by the Biomedical Internal Review Board of the University of Pittsburgh.

Data Collection

A spring-loaded HHD with a measurement range of 0-267 N in 4.45-N increments was used throughout the study (SPARK Instruments & Academics, Inc, PO Box 5123, Coralville, IA 52241). The dynamometer was calibrated prior to data collection and at the end of the study by applying 133.5 N of free weights to the dynamometer in 44.5-N increments. Over the course of 2 weeks of data collection, the calibration of the dynamometer remained linear and varied less than 4.45 N with the 133.5 N of loading. During the study, some subjects exerted more than 133.5 N of force, but the dynamometer's linearity above this value was not determined. Measurements were recorded to the nearest 2.23 N.

Two physical therapists served as testers for this study. A 1-hour practice session was held to familiarize them with the use of the HHD and the procedures for measuring the four muscle groups in Part I of this study. Neither tester had been using an HHD regularly in the clinic.
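The calibration procedure above amounts to comparing dynamometer readings with known free-weight loads. The sketch below is a hypothetical illustration, not the authors' procedure: the function names, the use of an ordinary least-squares line as the linearity summary, and the example readings are all assumptions. It accepts the instrument when every reading lies within 4.45 N (one scale graduation) of the applied load, and it rounds readings to the nearest 2.23 N as the study did.

```python
def calibration_check(applied_n, readings_n, tolerance_n=4.45):
    """Compare HHD readings (N) against known free-weight loads (N).

    Hypothetical helper. Returns (slope, intercept, max_deviation,
    within_tolerance): slope and intercept come from a least-squares line
    of reading on load; max_deviation is the largest |reading - load|;
    within_tolerance is True when every deviation is below tolerance_n.
    """
    if len(applied_n) != len(readings_n) or len(applied_n) < 2:
        raise ValueError("need at least two paired loads and readings")
    n = len(applied_n)
    mean_x = sum(applied_n) / n
    mean_y = sum(readings_n) / n
    sxx = sum((x - mean_x) ** 2 for x in applied_n)
    if sxx == 0:
        raise ValueError("applied loads must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(applied_n, readings_n))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    max_dev = max(abs(y - x) for x, y in zip(applied_n, readings_n))
    return slope, intercept, max_dev, max_dev < tolerance_n


def record_reading(force_n, step_n=2.23):
    """Round a reading to the nearest step (the study recorded to 2.23 N)."""
    return round(force_n / step_n) * step_n


# Hypothetical readings for loads of 0 to 133.5 N in 44.5-N steps.
loads = [0.0, 44.5, 89.0, 133.5]
readings = [0.0, 44.5, 91.2, 133.5]
slope, intercept, max_dev, within_tol = calibration_check(loads, readings)
```

A slope near 1 and an intercept near 0 indicate a linear, unbiased scale over the tested range; as the text notes, nothing here says anything about linearity above the heaviest load applied.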
Isometric "make" tests were performed in standard, against-gravity positions for four muscle groups: right shoulder abductors, wrist extensors, hip flexors, and ankle dorsiflexors. Positions and stabilization strategies are listed in Table 1. During an isometric "make" test, the tester stabilizes the HHD while the subject exerts a maximal isometric force against it. Each subject's force outputs were measured by both testers during session one and remeasured by both testers 1 week later (session two). During each session, each tester obtained three dynamometer measurements for each of the four muscle groups. Thirty seconds of rest were allowed between trials. The tester placed the dynamometer at the appropriate position on the extremity, stabilized the dynamometer perpendicular to the body segment, and asked the subject to exert a maximal force for 3 seconds until told to relax. The other tester read the dynamometer and recorded the value. After tester one completed data collection for all four muscle groups, tester two completed the same procedure with the subject. Subjects were assigned numbers, with tester one performing the initial data collection on the even-numbered subjects and tester two performing the initial data collection on the odd-numbered subjects. Testing was performed in the same order during both sessions.

TABLE 1. Positioning for HHD testing.
Muscle Group | Position | Stabilization Strategy
Shoulder abductors | Sitting: shoulder at 90° abduction; elbow fully extended. Force applied 3 cm proximal to tip of olecranon process. | Subject's contralateral UE holding chair seat; subject's feet flat on floor.
Wrist extensors | Sitting: forearm pronated and resting on treatment table surface; wrist and hand over edge of table; wrist at neutral; fingers flexed. Force applied just proximal to MCP joint of third digit. | Subject uses contralateral hand to stabilize tested forearm.
Hip flexors | Sitting: hips and knees flexed to 90°; thigh and foot unsupported. Force applied 5 cm proximal to superior pole of patella. | Subject's hands gripping edge of plinth.
Ankle dorsiflexors | Sitting: hips and knees flexed to 90°; ankle in 0° DF. Force applied on MTP joints. | Subject's heel resting on floor.
UE = upper extremity; MCP = metacarpophalangeal; MTP = metatarsophalangeal; DF = dorsiflexion.

Data Analysis

In comparing force measurements between testers and between sessions, the highest of the three force readings obtained by each tester was used. The highest reading was used rather than the mean value to limit the effects of fatigue and learning. The reliability of the HHD force measurements was examined by two methods: 1) two-way analysis of variance (ANOVA) with repeated measures and 2) intraclass correlation coefficients (ICCs) (13). The two-way repeated measures ANOVA was used with tester as one factor and session as the other factor. The main effects of tester and session, as well as any interaction effects between tester and session, were determined (p < 0.05). In the absence of interaction, a nonsignificant tester effect would tend to indicate high intertester reliability. A nonsignificant session effect would tend to indicate high intratester reliability.

Within-session intratester reliability was determined for each therapist and each muscle group by calculating ICCs comparing the three measurements of each muscle group for that particular session. As a measure of between-session intratester reliability, ICCs were calculated for each tester using the forces measured at the first and second sessions. As a measure of within-session intertester reliability, ICCs were calculated for each session using the maximum forces measured by the two testers.

RESULTS

Table 2 shows the means and standard deviations of the force measures obtained for the four muscle groups, as measured by each of the testers in the two sessions. The results of the four two-way ANOVAs are shown in Table 3. For the shoulder abductors, hip flexors, and wrist extensors, there was no tester × session interaction and no main effect due to session. There was, however, a main effect due to tester for these three muscle groups. This indicates that measurements obtained by each tester were consistent from session to session, but that there was a significant difference between the measurements obtained by the two testers for these muscle groups. Tester two recorded higher force measurements for all of these muscle groups. For the ankle dorsiflexors, there was a significant interaction between testers and sessions (Figure 1). There was a large difference between force measurements recorded by the two testers during session one and little difference in session two. There was no main effect, however, due to session or tester. The reliabilities as indicated by the ICCs are shown in Table 4.

TABLE 2. Muscle force measurements (N), given as mean (SD).
Muscle Group | Tester 1, Session 1 | Tester 1, Session 2 | Tester 2, Session 1 | Tester 2, Session 2
Hip flexors | 190 (37) | 200 (36) | 208 (33) | 212 (35)
Shoulder abductors | 127 (20) | 123 (25) | 135 (25) | 133 (25)
Ankle dorsiflexors | 167 (36) | 176 (33) | 186 (40) | 179 (34)
Wrist extensors | 99 (12) | 99 (11) | 113 (19) | 113 (19)
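The ICCs described under Data Analysis can be computed from an n-subject by k-measurement matrix (in Part I, 20 × 3 for the three within-session trials, or 20 × 2 for two sessions or two testers) using the two-way ANOVA mean squares of Shrout and Fleiss (13). The paper does not state which ICC form was used, so the sketch below is an assumption, not the authors' code: it returns both single-measure forms, ICC(2,1) and ICC(3,1), and adds a hypothetical helper that applies Blesh's (3) rating ranges, treating a value of exactly 0.90 as high because the published ranges leave that boundary unassigned.

```python
def icc_shrout_fleiss(data):
    """Single-measure ICC(2,1) and ICC(3,1) (Shrout and Fleiss, 1979).

    data: list of rows, one per subject; each row holds the k measurements
    (trials, sessions, or testers) for that subject.
    """
    n = len(data)
    k = len(data[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in data):
        raise ValueError("need an n x k matrix with n >= 2 and k >= 2")
    grand = sum(sum(row) for row in data) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(row[j] for row in data) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    bms = ss_rows / (n - 1)                 # between-subjects mean square
    jms = ss_cols / (k - 1)                 # between-measurements mean square
    ems = ss_error / ((n - 1) * (k - 1))    # residual mean square
    icc21 = (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)
    icc31 = (bms - ems) / (bms + (k - 1) * ems)
    return icc21, icc31


def blesh_rating(icc):
    """Label an ICC with Blesh's ranges (assumption: 0.90 counts as high)."""
    if icc >= 0.90:
        return "high"
    if icc >= 0.80:
        return "good"
    if icc >= 0.70:
        return "fair"
    return "poor"


# Six-subject, four-judge example matrix from Shrout and Fleiss (13).
example = [[9, 2, 5, 8], [6, 1, 3, 2], [8, 4, 6, 8],
           [7, 1, 2, 6], [10, 5, 6, 9], [6, 2, 4, 7]]
icc21, icc31 = icc_shrout_fleiss(example)
```

For the example matrix the function returns about 0.29 for ICC(2,1) and 0.71 for ICC(3,1), the values Shrout and Fleiss report for it. The gap between the two is the design choice to keep in mind: ICC(2,1) penalizes systematic differences between measurements (such as one tester reading consistently higher, as in the tester main effects above), whereas ICC(3,1) does not.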
The within-session intratester reliabilities ranged from 0.76 to 0.93, with measurements of the shoulder abductors and hip flexors showing the highest reliabilities and measurement of the wrist extensors showing the lowest reliability. Between-session intratester reliabilities ranged from 0.67 to 0.84, with the shoulder abductors having the highest reliability and the hip flexors the lowest. The intertester reliabilities ranged from 0.31 to 0.83. Measurement of the shoulder abductors again showed the highest reliability, and measurement of the wrist extensors generated the lowest.

TABLE 3. Results of four two-way repeated measures analyses of variance. For each muscle group (hip flexors, shoulder abductors, wrist extensors, ankle dorsiflexors), df, SS, MS, and F are given for the tester, session, and tester × session sources. *Significant at p < 0.05.

FIGURE 1. Mean forces measured by both testers in both sessions. Shoulder abductors, hip flexors, and wrist extensors showed a main effect due to differences between testers, while the ankle dorsiflexors showed a tester × session interaction.

TABLE 4. Reliabilities based on ICC.

Intratester reliability within a session:
Muscle Group | Tester 1, Day 1 | Tester 1, Day 2 | Tester 2, Day 1 | Tester 2, Day 2 | Mean ICC per Group
Shoulder abductors | .79 | .91 | .93 | .91 | .88
Hip flexors | .89 | .93 | .80 | .89 | .88
Wrist extensors | .76 | .77 | .80 | .88 | .80
Ankle dorsiflexors | .83 | .89 | .90 | .84 | .86

Intratester reliability between sessions and intertester reliability within sessions:
Muscle Group | Intratester, Tester 1 | Intratester, Tester 2 | Intratester, Mean | Intertester, Session 1 | Intertester, Session 2 | Intertester, Mean
Shoulder abductors | .84 | .81 | .83 | .83 | .79 | .81
Hip flexors | .67 | .75 | .71 | .76 | .68 | .72
Wrist extensors | .76 | .74 | .75 | .31 | .33 | .32
Ankle dorsiflexors | .84 | .70 | .77 | .55 | .48 | .52

DISCUSSION

Using the guidelines suggested by Blesh (3), results from the present study indicate good within-session intratester reliability for all four muscle groups. The results support those of Bohannon (4), who found high test-retest reliabilities for ankle dorsiflexion, shoulder abduction, wrist extension, and hip flexion, as well as other muscles, during a single testing session. Riddle et al (12) reported good to high within-session intratester reliabilities for ankle dorsiflexors, wrist extensors, and hip flexors in the paretic and nonparetic limbs of brain-damaged patients. Riddle et al's testing included three of the four muscle groups tested in the present study.

Between-session intratester reliabilities were good for the shoulder abductor muscle group, but fair for the hip flexor, wrist extensor, and ankle dorsiflexor measurements. These results are similar to those reported by Riddle et al (12). Their research generated high reliabilities for paretic limb force outputs, but fair to poor ICCs for nonparetic limb forces. The ICC for nonparetic wrist extensors was reported to be 0.31.

Wikholm and Bohannon (16) reported that the strongest tester measured the highest HHD force readings for shoulder external rotator, elbow flexor, and knee extensor measurements. In the present study, however, the stronger tester (tester one) measured consistently lower HHD force readings for all four muscle groups. Wikholm and Bohannon also reported a decrease in reliabilities of HHD measurements with increased tested muscle group force production.
The present authors did not find such a relationship between reliabilities and muscle group force production, but this may be the result of testing four muscle groups with a narrow range of force outputs: Wikholm and Bohannon's force outputs across muscle groups ranged from 114 to 429 N, while the present study's ranged from 99 to 212 N.

The intertester reliabilities in the current study were poor for the wrist extensors and ankle dorsiflexors, fair for the hip flexors, and good for the shoulder abductors. Note that the muscle groups with the shortest measurement moment arms (wrist extensors and ankle dorsiflexors) had consistently lower reliabilities than the hip flexor and shoulder abductor muscle groups. On the basis of this observation, the authors decided to test the hypothesis that a more distal placement site would yield more reliable HHD measurements for a particular muscle group.

PART II

METHOD

Subjects

Eighteen female and 12 male healthy adult subjects (mean age = 24 years, range 20-35 years) participated in the study. Subjects with histories of upper extremity fracture, shoulder subluxation/dislocation, shoulder instability, rotator cuff injury, or axillary nerve pathology were excluded. Informed, written consent was obtained from each subject in compliance with the Human Subjects Committee of the Biomedical Internal Review Board at the University of Pittsburgh.

Data Collection

Three force measurements were obtained at each of three sites on the dominant arm of every subject, using the same type of HHD as in Part I of this study. Upper extremity dominance was determined by hand preference. The positioning of the dynamometer at each of the three sites was standardized by marking the subjects' skin with a felt-tip marker.
Marks were made 10 cm and 2.5 cm proximal to the tip of the olecranon process (sites A and B, respectively) and at the proximal aspect of the ulnar styloid process (site C). Markings of all sites for all subjects were performed by tester one. To account for subject fatigue, the order of the testing sites was randomized, and subjects were permitted to rest for 2 minutes between measurements. Body position during measurement of shoulder abduction was the same as in Part I (see Table 1). Subjects were instructed to rest their nondominant hands on their laps. Tester two stood on a 6-in stool behind the subject, positioned the subject's upper extremity in 90° of abduction, and placed the instrument on the appropriate site. Subjects were asked to perform a maximal isometric "make" contraction of their dominant shoulder abductor muscles for each of the nine measurements. The dynamometer was removed from the subject's arm after 3 seconds. Tester two then gave the HHD to tester one for reading and recording of the measurement. Dynamometer measurements were read to the nearest 2.23 N. Tester two performed the measurements for all subjects.

DATA ANALYSIS

To determine the reliabilities of the force measurements, an ICC (13) was calculated for each of the three sites. Also, Bartlett's test for homogeneity of variance (9) was used to test the null hypothesis that the variances were equal at all three sites. The alternate hypothesis stated that at least one of the sites' variances differed significantly from the others.

RESULTS

Table 5 summarizes the ICC values as well as the within-subject variance for each of the three sites. The ICC values were >0.90 for each site, indicating that all three sites yielded highly reliable intratester dynamometer measurements.
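Bartlett's test, as described under DATA ANALYSIS, compares k sample variances using a statistic that is approximately chi-square distributed with k - 1 degrees of freedom. The sketch below is a minimal standard-library implementation under stated assumptions, not the authors' computation: it takes each site's variance together with its degrees of freedom (which the paper does not report), and its p-value uses the closed-form chi-square tail that is exact only for even degrees of freedom, which covers the three-site case (2 df). The function names are hypothetical.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability for a positive even df.

    Closed form exp(-x/2) * sum_{i < df/2} (x/2)**i / i!, exact for even df.
    """
    if df <= 0 or df % 2:
        raise ValueError("this sketch only handles positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i          # builds (x/2)**i / i! incrementally
        total += term
    return math.exp(-half) * total


def bartlett(variances, dfs):
    """Bartlett's test for homogeneity of k sample variances.

    variances: sample variances s_i^2; dfs: their degrees of freedom n_i - 1.
    Returns (statistic, df, p_value); p_value requires k - 1 to be even here.
    """
    k = len(variances)
    if k < 2 or len(dfs) != k:
        raise ValueError("need at least two variances with matching df")
    total_df = sum(dfs)                                   # N - k
    pooled = sum(v * d for v, d in zip(variances, dfs)) / total_df
    numerator = total_df * math.log(pooled) - sum(
        d * math.log(v) for v, d in zip(variances, dfs)
    )
    correction = 1.0 + (sum(1.0 / d for d in dfs) - 1.0 / total_df) / (3.0 * (k - 1))
    statistic = numerator / correction
    return statistic, k - 1, chi2_sf_even_df(statistic, k - 1)
```

Because the statistic depends on the degrees of freedom behind each variance, the Table 5 variances alone are not enough to reproduce the reported result. In practice a library routine such as SciPy's `scipy.stats.bartlett`, which takes the raw samples, would be the usual choice.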
Bartlett's test for homogeneity of variance resulted in rejection of the null hypothesis at the .05 level of significance. To determine which site's variance differed significantly from the others, 90% confidence intervals for the variances were calculated and plotted (Figure 2). Only the variance at site C (the most distal site) was determined to be significantly different, since its confidence interval did not overlap those of the other two sites.

TABLE 5. Mean, ICC, and variance values for force measurements at three sites on the upper extremity during shoulder abduction testing.
Placement Site | Mean (N) | ICC | Variance (N²)
A: 10 cm proximal to olecranon process | 212.33 | .91 | 1485.2
B: 2.5 cm proximal to olecranon process | 176.71 | .94 | 2058.0
C: just proximal to ulnar styloid process | 96.92 | .94 | 824.6

DISCUSSION

Measurement of shoulder abduction forces proved to be highly reliable for each of the dynamometer placement sites. These results agreed with those of Part I of this study, in which shoulder abduction was found to be highly reliable when using site B. On the basis of the ICC values alone, it was not possible to state with confidence that one of the sites was more reliable than any of the others. However, Bartlett's test for homogeneity of variance showed that site C, the most distal site, generated significantly smaller measurement variance than the other two sites. The results of this test indicated that the most distal site was the most reliable for measuring force output of the shoulder abductors using an HHD.

Differences in reliability among the three HHD measurement sites may be due to differences in lever arm lengths. An error in lever arm distance will change the HHD force reading. For example, if a patient's shoulder abductor muscle strength is 40 Nm, placement of the dynamometer 0.2 m from the center of the glenohumeral joint would result in an HHD reading of 200 N (neglecting the effect of gravity).
If, during a retest of the same patient, a 2-cm error occurred so that the dynamometer was placed 0.18 m from the joint center, the HHD reading would be 40/0.18, or 222.2 N, resulting in a difference of 22 N between the two measurements with no actual change in strength. If a more distal testing site is used, for example, 0.4 m from the joint center, the HHD reading would be 100 N. If a 2-cm error in placement again occurred during a retest (HHD at 0.38 m from the joint center), the HHD reading would be 105.3 N, resulting in a force measurement difference of only 5.3 N. The same 2-cm error in dynamometer placement therefore results in a larger error in the HHD reading at the more proximal site. These examples show that, theoretically, the more distal placement site would be less sensitive to placement errors. This may help to explain the significantly smaller variance obtained at the distal measurement site (site C).

FIGURE 2. Ninety percent confidence intervals for the variances (N²) at three sites on the upper extremity. Site C's variance (most distal site) was significantly smaller than the proximal sites' variances.

Because of the short lever arm distance, proximal HHD placements yield higher force readings. It is therefore more difficult for a tester to stabilize an HHD at proximal rather than distal placement sites. The tester's own strength may affect his or her ability to stabilize the HHD, and this may also contribute to the increase in proximal site variance.

On the basis of these study results, the authors recommend choosing distal HHD placement sites to minimize variance in force measurements. There may be other factors, however, that discourage distal placement sites. Such factors may include pathology of a joint falling between the muscle group being tested and the HHD placement site.
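The placement-error arithmetic in the Discussion follows directly from torque = force × lever arm. The sketch below, with hypothetical function names, reproduces the worked example under the same simplification the text uses (static contraction, gravity neglected): a 40-Nm torque measured at 0.20 m and at 0.40 m, each retested with the HHD placed 2 cm closer to the joint.

```python
def hhd_reading(torque_nm, lever_arm_m):
    """Force (N) an HHD would read for a joint torque (Nm) at a lever arm (m).

    Static, gravity-free approximation: torque = force x lever arm.
    """
    if lever_arm_m <= 0:
        raise ValueError("lever arm must be positive")
    return torque_nm / lever_arm_m


def placement_error(torque_nm, lever_arm_m, error_m):
    """Readings when the HHD is retested error_m closer to the joint center.

    Returns (first_reading, retest_reading, difference, relative_difference).
    """
    first = hhd_reading(torque_nm, lever_arm_m)
    retest = hhd_reading(torque_nm, lever_arm_m - error_m)
    diff = retest - first
    return first, retest, diff, diff / first


# Same 40-Nm strength and 2-cm placement error at a proximal and a distal site.
for arm in (0.20, 0.40):
    first, retest, diff, rel = placement_error(40.0, arm, 0.02)
    print(f"lever arm {arm:.2f} m: {first:.1f} N -> {retest:.1f} N "
          f"(+{diff:.1f} N, {100 * rel:.1f}%)")
```

Run as written, it prints 200.0 N -> 222.2 N for the proximal site and 100.0 N -> 105.3 N for the distal site, matching the text. It also shows that the relative change is smaller distally (about 11% versus 5%), because a fixed 2-cm error is a smaller fraction of a longer lever arm.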
REFERENCES

1. Agre JC, Magness JL, Hull SZ, Wright KC, Baxter TL, Patterson R, Stradel L: Strength testing with a portable dynamometer: Reliability for upper and lower extremities. Arch Phys Med Rehabil 68:454-457, 1987
2. Beasley WC: Influence of method on estimates of normal knee extensor force among normal and postpolio children. Phys Ther Rev 36:21-41, 1956
3. Blesh TE: Measurement in Physical Education (2nd Ed), New York: The Ronald Press Co, 1974
4. Bohannon RW: Test-retest reliability of hand-held dynamometry during a single session of strength assessment. Phys Ther 66:206-209, 1986
5. Bohannon RW: The clinical measurement of strength. Clin Rehabil 1:5-16, 1987
6. Bohannon RW, Andrews AW: Accuracy of spring and strain gauge hand-held dynamometers. J Orthop Sports Phys Ther 10:323-325, 1989
7. Borden R, Colachis SC: Quantitative measurement of the good and normal ranges in muscle testing. Phys Ther 48:839-843, 1968
8. Darcus HD: Strain-gauge dynamometer for measuring the strength of muscle contraction and for re-educating muscles. Ann Phys Med 1:163-177, 1952
9. Glass GV, Hopkins KD: Statistical Methods in Education and Psychology (2nd Ed), Englewood Cliffs, NJ: Prentice-Hall, Inc, 1984
10. Newman LB: A new device for measuring muscle strength: The myometer. Arch Phys Med Rehabil 1:163-177, 1952
11. Nies Byl N: Intrarater and interrater reliability of strength measurements of the biceps and deltoid using a hand-held dynamometer. J Orthop Sports Phys Ther 9:399-405, 1988
12. Riddle DL, Finucane SD, Rothstein JM, Walker ML: Intrasession and intersession reliability of hand-held dynamometer measurements taken on brain-damaged patients. Phys Ther 69:182-194, 1989
13. Shrout PE, Fleiss JL: Intraclass correlations: Uses in assessing rater reliability. Psych Bull 86:420-428, 1979
14. Smidt GL, Rogers MW: Factors contributing to the regulation and clinical assessment of muscular strength. Phys Ther 62:1283-1290, 1982
15. Wadsworth CT, Krishnan R: Intrarater reliability of manual muscle testing and hand-held dynametric muscle testing. Phys Ther 67:1342-1347, 1987
16. Wikholm JB, Bohannon RW: Hand-held dynamometer measurements: Tester strength makes a difference. J Orthop Sports Phys Ther 13:191-198, 1991