Rasch Analysis of Combining Two Indices to Assess Comprehensive ADL Function in Stroke Patients I-Ping Hsueh, MA; Wen-Chung Wang, PhD; Ching-Fan Sheu, PhD; Ching-Lin Hsieh, PhD Background and Purpose—To justify the summation of scores representing comprehensive activities of daily living (ADL) function, a Rasch analysis was performed to examine whether items of the Barthel Index (BI), assessing ADL, and items of the Frenchay Activities Index (FAI), assessing instrumental ADL, contribute jointly to a single, unidimensional construct in stroke patients living in the community. The number of scoring points of both indices was examined for their usefulness in discerning the various ability levels of ADL in these patients. Methods—A total of 245 patients at 1 year after stroke participated in this study. The BI and FAI were administered to the patient and/or the patient’s main caregiver by interview. Results—The initial Rasch analysis indicated that the middle scoring points for many items of the BI and FAI could be collapsed to allow only dichotomous response categories. All but 2 items of the FAI, social occasions and walking outside, fitted the model’s expectations rather well. These 2 items were excluded from further analysis. A factor analysis performed on the residuals of the Rasch-transformed scores recovered no dominant component. These results indicate that the combined 23 dichotomous items of the BI and FAI assess a single unidimensional ADL function. Conclusions—A clinically useful assessment of the comprehensive ADL function of patients at or later than 1 year after stroke can be obtained by combining the items of the BI and FAI (excluding 2 FAI items) and simplifying the responses into dichotomous categories. It is also demonstrated that the items of the new scale measure comprehensive ADL function as a single unidimensional construct when assessed at 1 year after stroke. (Stroke. 2004;35:721-726.) Key Words: activities of daily living 䡲 cerebrovascular accident 䡲 disability evaluation S troke is the most common cause of dependence in activities of daily living (ADL) among the elderly.1 ADL generally refers to basic or personal ADL, which has been widely used as a main outcome measure after stroke.2 However, basic ADL does not capture significant losses in higher levels of physical function or activities that are necessary for independence in the home and community (ie, Instrumental ADL [IADL]).2 Both the basic ADL and IADL are recommended as the primary outcome measures after stroke.3 Several authors4,5 have recommended combining the basic ADL and the IADL to comprehensively measure ADL function. Such a combined scale is expected to be more responsive and have a wider range than either of the individual measurements.6,7 Previous studies using factor analysis found that the Barthel Index (BI) (measuring basic ADL) and the Frenchay Activities Index (FAI) (measuring IADL) assessed different constructs in patients with stroke.4,5 Because the BI and FAI supplemented each other with a minimum of overlap in content, it was proposed that they be combined to assess comprehensive ADL function.4,5 However, the main question in combining items of ADL and IADL indices is whether these items together contribute to a unidimensional construct, which is necessary to justify the summation of scores.8 Furthermore, factor analysis operates on interval scale scores,9 whereas the BI and FAI scores are ordinal by nature. Thus, the results of studies using factor analysis to examine whether items of both indices could be combined to measure a single construct were inconclusive. Rasch analysis transforms ordinal scores to the logit scale and thus to an interval-level measurement.10,11 Furthermore, the Rasch model can examine whether items from a scale measure a unidimensional construct.10,11 If items do not fit the model’s expectation, unidimensionality is not preserved.10,11 Therefore, Rasch analysis is a useful method to validate the unidimensional construct of all the items of the BI and FAI and to obtain an objective, interval scale.10,11 Appropriateness of scoring points refers to whether or not participants can be differentiated by their responses as clearly as the points allow.12 The items of the BI are either dichotomous or on a 3or 4-point scale; those of the FAI are on a 4-point scale. Recent studies13,14 have shown that a higher number of scoring points may not lead to finer differentiation of the participants. Given that the BI has been found to have a Received August 12, 2003; final revision received November 23, 2003; accepted December 2, 2003. From the School of Occupational Therapy, College of Medicine, National Taiwan University, Taipei (I-P.H., C-L.H.); Department of Psychology, National Chung Cheng University (W-C.W.), Chia-Yi, Taiwan; and Department of Psychology, DePaul University, Chicago, Ill (C-F.S.). Correspondence to Ching-Lin Hsieh, PhD, School of Occupational Therapy, College of Medicine, National Taiwan University, 7 Chung-Shan S Rd, Taipei 100, Taiwan. E-mail [email protected] © 2004 American Heart Association, Inc. Stroke is available at http://www.strokeaha.org DOI: 10.1161/01.STR.0000117569.34232.76 721 722 Stroke March 2004 TABLE 1. Characteristics of the Patients With Stroke (nⴝ245) Characteristics Basic characteristics Sex Male/female 130/115 Age Mean year (SD) 65.4 (11.2) Diagnosis Side of hemiplegia Canadian Neurological score at admission Cerebral hemorrhage 68 (28%) Cerebral infarction 177 (72%) Right/left 101/144 Median (interquartile range) 7 (4–9) At 1 year after stroke BI Median (interquartile range) 90 (65–100) FAI Median (interquartile range) 6 (0–13) Combined ADL raw score Median (interquartile range) 9 (5–13) Combined ADL Rasch-transformed score notable ceiling effect4,5 and that the FAI demonstrated a floor effect,4,5 it is of interest to examine whether or not the scoring points of these 2 indices are appropriate. Such an investigation can also be conducted by performing a Rasch analysis on the scores of these indices.12 This study used Rasch analysis to (1) investigate the appropriateness of the scoring points of the BI and FAI and (2) examine whether all the items of the BI and FAI contribute to a unidimensional construct in patients with stroke living in the community. Subjects and Methods Subjects The study sample was recruited from the registry of the Quality of Life After Stroke Study in Taiwan between December 1, 1999, and December 31, 2001. Patients were included in the study if they met the following criteria15: (1) diagnosis of cerebral hemorrhage or cerebral infarction; (2) first onset of cerebrovascular accident, without other major diseases; (3) stroke onset within 14 days before hospital admission; and (4) informed consent given personally or by proxy. Further selection and exclusion criteria can be found elsewhere.15 Procedure Data were collected at 1 year after stroke. An occupational therapist administered the BI and FAI to the patient and/or the patient’s main caregiver by face-to-face or telephone interview. The therapist tried to obtain the actual performance of the patients in daily life from all the informants available. Measures Two dichotomous, six 3-point, and two 4-point items constitute the BI.16 The FAI, which was developed to assess social activities or lifestyle after stroke,17 consists of fifteen 4-point items. Stroke severity at admission to the hospital was determined by the Canadian Neurological Scale.18 All 3 scales have been shown to be valid and reliable.18 –22 Data Analysis The analysis consisted of 2 parts. First, the appropriateness of the scoring points in each item of the BI and FAI was investigated with the use of the Rasch partial credit model.10 The scoring points of the items of both indices were reorganized if they were found to be inappropriate. Second, the unidimensionality of the combined BI and FAI scale with appropriate categories was examined by Rasch analysis. The WINSTEPS computer program23 was used to perform the Rasch analysis. Mean (SD) ⫺1.17 (3.94) The partial credit model can be used for polytomous items such as those of the BI and FAI. We examined the order of the step difficulties of each item of both indices to see whether their scoring points could be simplified. To examine unidimensionality of the combined scale, infit and outfit statistics were used to examine whether the data fit the model’s expectation. The infit mean square (MNSQ) is sensitive to unexpected behavior affecting responses to items near the person’s proficiency measure; the outfit MNSQ is sensitive to unexpected behavior by persons on items far from the person’s proficiency level.10,24 MNSQ can be transformed to a t statistic, termed the standardized Z value (ZSTD), which follows approximately the t or standard normal distribution when the items fit the model’s expectation. In this study items with both infit and outfit ZSTD beyond ⫾2 were considered poor fitting. When items fit the model’s expectation, the residuals (observed scores minus expected scores) should be randomly distributed. A factor analysis was conducted to verify whether any dominant component existed among the residuals. The unidimensionality assumption held if no dominant component was found.25 The sufficiency of the BI and FAI combined was verified by examining whether there were substantial gaps between the item difficulties along the item hierarchy. Results A total of 311 patients were registered in the Quality of Life After Stroke Study during the study period. Eight of the 311 patients declined to participate in the study, and 58 patients either did not survive another stroke or could not be contacted for follow-up. A total of 245 patients were included in this study. The Canadian Neurological Scale scores showed that the patients had a broad range of severity at admission. The characteristics of the study sample are shown in Table 1. Appropriateness of Level of Scaling All but 2 items (social occasions and walking outside) fitted the model’s expectation fairly well, indicating that these items measured a single construct for patients at or after 1 year after stroke. Table 2 shows that all but 2 of the 23 polytomous items of both indices exhibited disordering of the step difficulty (ie, that the difficulty of a higher step was lower than that of its adjacent lower step), indicating that the middle categories were never the most likely responses of any patient and were thus redundant. Accordingly, the response Hsueh et al TABLE 2. Step Difficulties Under the Partial Credit Model Item Step 1 Logit BI1: feeding ⫺1.53 BI2: bathing 0.47 BI3: grooming BI4: dressing BI5: bowel control Step 2 Logit Step 3 Logit Rasch Analysis of Combining 2 ADL Indices TABLE 3. Difficulty, Standard Error, and Outfit Statistics When the 2 Poor-Fitting Items of the FAI Were Excluded Item* ⫺6.93* 723 FAI13: household/car maintenance Difficulty Logit 4.72 SE Logit Outfit ZSTD Infit ZSTD 0.31 ⫺0.1 0.3 ⫺3.68 FAI14: reading books 4.72 0.31 0.0 ⫺0.6 ⫺0.19 ⫺1.00* FAI15: gainful work 4.01 0.26 ⫺0.1 0.8 ⫺5.25* FAI12: gardening 3.75 0.25 0.0 1.0 3.52 0.24 0.9 0.6 ⫺2.23 BI6: bladder control ⫺1.07 ⫺5.82* FAI9: actively pursuing hobbies BI7: toileting ⫺0.82 ⫺2.57* FAI11: travel outings/car rides 3.52 0.24 1.1 3.5 FAI1: preparing main meals 3.24 0.23 ⫺0.1 ⫺0.6 BI8: transfer ⫺4.40 ⫺2.24 BI9: mobility ⫺4.05 ⫺1.43 FAI3: washing clothes 3.19 0.23 ⫺0.2 ⫺2.5 ⫺0.38 FAI2: washing up 3.09 0.22 ⫺0.2 ⫺2.1 ⫺1.09* FAI5: heavy housework 2.75 0.22 ⫺0.2 ⫺2.2 0.03 FAI4: light housework 1.95 0.21 ⫺0.3 ⫺2.6 FAI10: driving a car/bus travel 1.83 0.20 ⫺0.2 ⫺0.8 0.59 0.21 ⫺0.2 ⫺0.6 BI10: stairs FAI1: preparing main meals FAI2: washing up FAI3: washing clothes ⫺2.59 3.94 5.67 4.66 2.36* ⫺0.72* ⫺2.35* ⫺1.12* FAI4: light housework 4.57 1.02* ⫺2.05* FAI6: local shopping FAI5: heavy housework 4.47 0.66* ⫺0.55* BI2: bathing 0.55 0.21 ⫺0.2 0.5 ⫺0.72 0.22 ⫺1.0 ⫺1.6 FAI6: local shopping 3.91 ⫺0.78* ⫺1.09* BI10: stairs FAI7: social occasions 3.90 0.35* ⫺0.50* BI4: dressing ⫺0.77 0.22 ⫺0.6 ⫺1.1 ⫺2.85 0.26 ⫺0.6 ⫺1.6 FAI8: walking outside 1.80 ⫺1.74* ⫺3.23* BI9: mobility FAI9: actively pursuing hobbies 5.79 ⫺0.79* 0.56 BI7: toileting ⫺3.48 0.27 ⫺1.0 ⫺3.4 ⫺0.55* BI8: transfer ⫺3.99 0.27 ⫺0.9 ⫺4.8 2.00 BI3: grooming ⫺6.77 0.32 0.2 0.9 0.94* BI6: bladder control ⫺7.09 0.34 0.6 0.3 FAI10: driving a car/bus travel FAI11: travel outings/car rides 3.63 0.56* 4.03 0.43* FAI12: gardening 3.41 2.27* FAI13: household/car maintenance 3.61 2.94* BI5: bowel control ⫺7.33 0.35 ⫺0.1 ⫺0.7 0.63* BI1: feeding ⫺8.41 0.44 ⫺0.1 ⫺0.1 Mean 0.00 0.26 ⫺0.1 ⫺0.8 SD 4.19 0.06 0.5 1.7 FAI14: reading books FAI15: gainful work 4.95 5.94 2.16* ⫺1.71* *Disordering of the step difficulties. *The items are arranged in descending order of difficulty. categories were collapsed into a dichotomy. The highest score category of the items in the BI was rescored as 1, indicating full independence in the activity. The rest of the categories were rescored as 0, indicating dependence in the activity. For the FAI, all but the lowest score category (0) of the items were collapsed into a category and rescored as 1, indicating participation in the activity. The lowest score category of the items in the FAI remained 0, indicating no participation in the activity. Unidimensionality of Simplified Scale When the Rasch dichotomous model was applied to examine the 25 dichotomous items together, 2 misfit items (social occasions and walking outside) had outfit ZSTD statistics of 2.9 (MNSQ⫽9.90 and 4.76, respectively) and infit ZSTD statistics of 3.4 (MNSQ⫽1.37) and 5.3 (MNSQ⫽1.86), respectively, which were statistically significant at the 0.05 level. These 2 items were excluded from the rest of the analyses. Table 3 shows that none of the outfit ZSTD was statistically significant for the remaining 23 items. The infit ZSTD had slightly more variations (SD⫽1.7) than the outfit ZSTD (SD⫽0.5), indicating that there was some degree of unexpected behavior affecting responses to items near the person’s ability measure. However, because they were not very far away from the critical point ⫾2 or ⫾3 (more liberal) and their corresponding outfit ZSTD statistics were not significant, those items with slightly extreme infit ZSTD were not treated as poor-fitting items. A factor analysis on the residuals of Rasch-transformed scores showed no dominant factors. The first and second factors accounted for only 12.3% and 8.3% of the residual variance, respectively. These results indicated that there was a good model-data fit and that the assumption of unidimensionality held for these 23 dichotomous items when assessed at 1 year after stroke. We further examined the psychometric properties of the 23-item scale. The person measures (ranging from ⫺9.46 to 6.04) had a mean of ⫺1.17 and a SD of 3.94. The person reliability was 0.94 (which can be similarly interpreted as Cronbach’s ␣), indicating that these items yielded very precise estimates for the patients. The person reliability was practically identical to that under the partial credit modeling (0.93), ie, collapsing categories did not reduce the person reliability. The person measures under the Rasch dichotomous model were also highly correlated with those under the partial credit model (r⫽0.97), indicating that collapsing the categories did not substantially alter ADL score in the patients. The Figure shows the map of persons and items along the continuum. The item difficulties were spread out across the 724 Stroke March 2004 Item and person map under the Rasch model. patients, indicating that these items could well differentiate these patients. The Figure also shows that there were obvious gaps between items BI3 (grooming) and BI8 (transfer), between BI9 (mobility) and BI4 (dressing), between BI2 (bathing) and BI10 (stairs), and between FAI6 (local shopping) and FAI10 (driving a car/bus travel). A conversion table showing combined BI and FAI raw scores (23 items) and their Rasch-transformed scores (ie, person measure in logit) is shown in Table 4. It allows prospective users to easily derive the Rasch-transformed scores from the raw scores. Discussion Combining ADL and IADL indices can provide a measurement tool with an enhanced range and sensitivity for assessing comprehensive ADL function,6,7 which is especially useful when applied to patients with stroke who are living in the community. Using Rasch analysis, this study found that such a measurement tool can be created by combining all but 2 items of the BI and FAI and simplifying the polytomous responses into dichotomous categories. Because the items of the new scale have been shown to measure comprehensive ADL function as a single, unidimensional construct for patients at 1 year after stroke, it is valid to sum the item scores of the combined scale. The total score represents a stroke patient’s ADL function on a single continuum encompassing the entire range of ADL. A higher score indicates higher independence in living in the community. There are some advantages to using the combined scale. The items of the BI are located in the lower part of item difficulty, indicating easier activities, and those of the FAI are located in the upper part, indicating more difficult activities. The total score of the combined scale is useful in showing whether the subject has basic ADL problems or IADL problems. A conversion table was compiled showing combined BI and FAI raw scores (0 to 23) and their Rasch-transformed scores. The Rasch-transformed scores can be viewed as interval-level measurements,24 whereas the combined BI and FAI raw scores are ordinal data. Since most statistical techniques assume that the data are at least on an interval scale, the Rasch-transformed scores of the combined scale are recommended for future applications. In addition, both the original BI and FAI and their scoring guidelines19,22 as used in this study can be employed as usual. To use the new scale proposed in this article, the users can first obtain the combined ADL score for a patient by adding the dichotomized scores of all but the 2 poor-fitting items and then use the conversion table to look up the corresponding Rasch-transformed interval score. There have been few empirical investigations of the appropriateness of scoring points of the BI and FAI. The items in both indices contain 2 to 4 scoring points. We found that the middle categories of both indices provide little information about the patients and are redundant. Thus, the response categories of all the items were recoded as dichotomous in this study. Given that the psychometric Hsueh et al TABLE 4. Raw Score, Rasch-Transformed Score, and Standard Error of the Combined and Dichotomized BI and FAI (23 items) Rasch-Transformed Score SE 0 ⫺9.46 1.52 1 ⫺8.56 1.18 2 ⫺7.40 1.01 3 ⫺6.38 1.04 4 ⫺5.23 1.11 5 ⫺4.05 1.05 6 ⫺3.01 1.00 7 ⫺2.03 0.98 8 ⫺1.14 0.91 9 Raw Score ⫺0.37 0.84 10 0.29 0.79 11 0.87 0.74 12 1.38 0.70 13 1.85 0.66 14 2.27 0.64 15 2.66 0.62 16 3.03 0.61 17 3.40 0.61 18 3.77 0.62 19 4.18 0.65 20 4.63 0.70 21 5.19 0.80 22 6.04 1.06 23 6.80 1.45 properties of the dichotomous items were equivalent to those of the polytomous items and that a scale with only dichotomous items is much more convenient and efficient to administer, the dichotomous categories are recommended for use in both indices for patients at or after 1 year after stroke. Tennant et al26 also used Rasch analysis to examine the dimensionality of the BI and found that the item “bladder control” did not fit the model’s expectation. The discrepancy between their findings and our findings may be explained by the different scoring guidelines of the BI used in the respective studies. We adopted the scoring guidelines suggested by Wade and Collin,22 whereas they used those from Shah et al.27 Therefore, the results from this study should be interpreted with caution when other scoring guidelines are used. Cultural or environmental factors may explain why the items “social occasions” and “walking outside” did not fit the model well.28,29 In addition, all the patients were assessed at 1 year after stroke and had been discharged from the hospital ⬎6 months previously. Our results may not be broadly applicable to patients with different characteristics (eg, patients at a subacute stage or those who have just been discharged from the hospital). The combined scale can be further improved by generating items to fill the gaps among pairs of current items Rasch Analysis of Combining 2 ADL Indices 725 (eg, grooming and transfer). These new items will be able to differentiate the ability levels of patients falling in the gaps of the current scale. Future studies to examine utility and other psychometric properties (eg, interrater reliability and responsiveness) of the combined scale in stroke patients are needed. Furthermore, it may be promising to extend the BI and FAI into an item bank and to use computerized adaptive testing to further elaborate ADL measurement. In summary, this study provides good evidence for a unidimensional construct of the dichotomized and combined BI and FAI, with the exclusion of 2 FAI items. The results imply that these 2 indices can be combined to assess comprehensive ADL function in patients after 1 year after stroke. Acknowledgments This study was supported by research grants from the National Science Council (NSC 89-2320-B-002-069, NSC 90-2320-B-002033, and NSC 91-2320-B-002-008) and the National Health Research Institute (HNRI-EX92-9204PP). References 1. Stineman MG, Maislin G, Fiedler RC, Granger CV. A prediction model for functional recovery in stroke. Stroke. 1997;28:550 –556. 2. Kelly-Hayes M, Robertson JT, Broderick JP, Duncan PW, Hershey LA, Roth EJ, Thies WH, Trombly CA. The American Heart Association stroke outcome classification. Stroke. 1998;29:1274 –1280. 3. Duncan PW, Jorgensen HS, Wade DT. Outcome measures in acute stroke trials: a systematic review and some recommendations to improve practice. Stroke. 2000;31:1429 –1438. 4. Hsieh CL, Hsueh IP. A cross-validation of the comprehensive assessment of activities of daily living after stroke. Scand J Rehabil Med. 1999;31: 83– 88. 5. Pedersen PM, Jorgensen HS, Nakayama H, Raaschou HO, Olsen TS. Comprehensive assessment of activities of daily living in stroke: the Copenhagen Stroke Study. Arch Phys Med Rehabil. 1997;78:161–165. 6. Duncan PW, Lai SM, Bode RK, Perera S, DeRosa J. Stroke Impact Scale-16: a brief assessment of physical function. Neurology. 2003;60: 291–296. 7. Lai SM, Perera S, Duncan PW, Bode R. Physical and social functioning after stroke: comparison of the Stroke Impact Scale and Short Form-36. Stroke. 2003;34:488 – 493. 8. Sodring KM, Bautz-Holter E, Ljunggren AE, Wyller TB. Description and validation of a test of motor function and activities in stroke patients: the Sodring motor evaluation of stroke patients. Scand J Rehabil Med. 1995; 27:211–217. 9. Zou KH, Tuncali K, Silverman SG. Correlation and simple linear regression. Radiology. 2003;227:617– 622. 10. Wright BD, Masters GN. Rating Scale Analysis. Chicago, Ill: MESA Press; 1982. 11. Rasch G. Probabilistic Models for Some Intelligent and Attainment Tests. Copenhagen, Denmark: Institute of Educational Research; 1960. 12. Linacre JM. Investigating rating scale category utility. J Outcome Meas. 1999;3:103–122. 13. Zhu W, Updyke WF, Lewandowski C. Post-hoc Rasch analysis of optimal categorization of an ordered-response scale. J Outcome Meas. 1997;1:286 –304. 14. Velozo CA, Peterson EW. Developing meaningful fear of falling measures for community dwelling elderly. Am J Phys Med Rehabil. 2001;80:662– 673. 15. Mao HF, Hsueh IP, Tang PF, Sheu CF, Hsieh CL. Analysis and comparison of the psychometric properties of three balance measures for stroke patients. Stroke. 2002;33:1022–1027. 726 Stroke March 2004 16. Mahoney FI, Barthel DW. Functional evaluation: the Barthel Index. Md State Med J. 1965;14:61– 65. 17. Holbrook M, Skilbeck CE. An activities index for use with stroke patients. Age Ageing. 1983;12:166 –170. 18. Goldstein LB, Chilukuri V. Retrospective assessment of initial stroke severity with the Canadian Neurological Scale. Stroke. 1997;28: 1181–1184. 19. Wade DT, Legh-Smith J, Langton Hewer R. Social activities after stroke: measurement and natural history using the Frenchay Activities Index. Int Rehabil Med. 1985;7:176 –181. 20. Piercy M, Carter J, Mant J, Wade DT. Inter-rater reliability of the Frenchay Activities Index in patients with stroke and their careers. Clin Rehabil. 2000;14:433– 440. 21. Hsueh IP, Lee MM, Hsieh CL. Psychometric characteristics of the Barthel activities of daily living index in stroke patients. J Formos Med Assoc. 2001;100:526 –532. 22. Wade DT, Collin C. The Barthel ADL index: a standard measure of physical disability? Int Disabil Stud. 1988;10:64 – 67. 23. Linacre JM. WINSTEPS [computer program]. Chicago, IL: http:// www.winsteps.com; 1999. 24. Wright BD, Mok M. Rasch models overview. J Appl Meas. 2000;1: 83–106. 25. Linacre JM. Detecting multidimensionality: which residual data-type works best? J Outcome Meas. 1998;2:266 –283. 26. Tennant A, Geddes JML, Chamberlain MA. The Barthel Index: an ordinal score or interval level measure? Clin Rehabil. 1996;10:301–308. 27. Shah S, Vanclay F, Cooper B. Improving the sensitivity of the Barthel Index for stroke rehabilitation. J Clin Epidemiol. 1989;42:703–709. 28. Iwarsson S. Environmental influences on the cumulative structure of instrumental ADL: an example in osteoporosis patients in a Swedish rural district. Clin Rehabil. 1998;12:221–227. 29. Chong DK. Measurement of instrumental activities of daily living in stroke. Stroke. 1995;26:1119 –1122.