European Child & Adolescent Psychiatry 9(4):271–276 (2000) © Steinkopff Verlag 2000

ORIGINAL CONTRIBUTION

Comparing the German Versions of the Strengths and Difficulties Questionnaire (SDQ-Deu) and the Child Behavior Checklist

H. Klasen, W. Woerner, D. Wolke, R. Meyer, S. Overmeyer, W. Kaschnitz, A. Rothenberger, R. Goodman

Accepted: 22 February 2000

Dr. H. Klasen, Department of Child and Adolescent Psychiatry, St. George's Medical School, Tooting, GB – London SW17 0RE
Dr. W. Woerner · Prof. Dr. A. Rothenberger, Child and Adolescent Psychiatry, University of Göttingen, von-Siebold Str. 5, D-37075 Göttingen
Prof. Dr. D. Wolke, University of Hertfordshire, Department of Psychology, College Lane, GB – Hatfield, Herts AL10 9AB
Dr. R. Meyer, Lehrstuhl für Psychologie IV, Röntgenring 10, D-97074 Würzburg
Dr. S. Overmeyer, Klinik für Kinder- und Jugendpsychiatrie, Friedrich-Schiller-Universität, Philosophenstrasse 3-5, D-07740 Jena
Dr. W. Kaschnitz, Universitäts Klinik für Kinder- und Jugendheilkunde, Klinische Abteilung für Allgemein Pädiatrie, Auenbruggerplatz 30, A-8036 Graz
Prof. Dr. R. Goodman (✉), Department of Child and Adolescent Psychiatry, Institute of Psychiatry, De Crespigny Park, GB – London SE5 8AZ

Abstract The Strengths and Difficulties Questionnaire (SDQ) is a brief behavioural screening questionnaire that can be completed in about 5 minutes by the parents and teachers of 4–16 year olds. The scores of the English version correlate well with those of the considerably longer Child Behavior Checklist (CBCL). The present study compares the German versions of the questionnaires. Both SDQ and CBCL were completed by the parents of 273 children drawn from psychiatric clinics (N = 163) and from a community sample (N = 110). The children from the community sample also filled in the SDQ self-report and the Youth Self Report (YSR). The children from the clinic sample received an ICD-10 diagnosis if applicable.
Scores from the parent and self-rated SDQ and CBCL/YSR were highly correlated and equally able to distinguish between the community and clinic samples, with the SDQ showing significantly better results regarding the total scores. They were also equally able to distinguish between disorders within the clinic sample, the only significant difference being that the SDQ was better able to differentiate between children with and without hyperactivity-inattention. The study shows that like the English originals, the SDQ-Deu and the German CBCL are equally valid for most clinical and research purposes.

Key words Child psychopathology – psychometrics – questionnaires – validity – German

Introduction

The Strengths and Difficulties Questionnaire is a brief behavioural screening questionnaire which was developed in England (8) and has been translated into over 40 languages. It asks about 25 attributes, some positive and some negative. The items, which have been selected on the basis of contemporary diagnostic criteria as well as factor analysis, are divided between five scales of five items each, generating scores for Conduct Problems, Hyperactivity-Inattention, Emotional Symptoms, Peer Problems, and Prosocial Behaviours. All items of the first four subscales are summed up to generate a Total Difficulties Score. The same questionnaire can be completed in about five minutes by parents or teachers of children aged 4 to 16. There is also a self-report version (10) for those aged 11 and above. An extended version assesses impact on social and educational function, distress, and burden on others (9).

The validity of the English and Finnish instrument has been shown in various studies (8–11, 13). Translated versions are currently being validated in various countries including Spain, Bangladesh and Brazil. The factor structure has recently been confirmed using the Swedish translation (16).
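The scale structure described above (five scales of five items each, with the Total Difficulties Score formed from the four problem scales only) can be sketched in a few lines of Python. This is a minimal illustration and not the official scoring routine: the function name `score_sdq`, the dictionary keys, and the 0–2 item coding are assumptions made for this sketch, and the item-to-scale assignment and any reverse-scored items are assumed to have been handled before the function is called.

```python
from typing import Dict, List

# Scale names follow the text; the keys themselves are hypothetical labels.
PROBLEM_SCALES = ["emotional", "conduct", "hyperactivity", "peer"]
ALL_SCALES = PROBLEM_SCALES + ["prosocial"]


def score_sdq(items_by_scale: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum five item scores per scale and derive the Total Difficulties Score.

    Assumption: each item is already coded 0, 1 or 2 in the direction of
    its scale, so no reverse-scoring is done here.
    """
    scores: Dict[str, int] = {}
    for scale in ALL_SCALES:
        items = items_by_scale[scale]
        if len(items) != 5:
            raise ValueError(f"{scale}: expected 5 items, got {len(items)}")
        if any(value not in (0, 1, 2) for value in items):
            raise ValueError(f"{scale}: item scores must be 0, 1 or 2")
        scores[scale] = sum(items)
    # The prosocial scale is deliberately excluded from the total.
    scores["total_difficulties"] = sum(scores[s] for s in PROBLEM_SCALES)
    return scores


if __name__ == "__main__":
    example = {
        "emotional": [1, 0, 2, 0, 1],
        "conduct": [0, 0, 1, 0, 0],
        "hyperactivity": [2, 1, 1, 0, 2],
        "peer": [0, 1, 0, 0, 0],
        "prosocial": [2, 2, 1, 2, 2],
    }
    print(score_sdq(example))
```

Keeping the prosocial scale out of the total mirrors the description in the text, where only the first four subscales contribute to the Total Difficulties Score.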
Although the German version of the questionnaire has been available since 1997 and is already being used both in research and clinical practice, a systematic validation has not yet taken place. Since the German translation of the longer established Child Behavior Checklist (CBCL; 1, 4) has been extensively used and validated in large epidemiological studies (5, 6, 14, 15), it is clearly important to compare the properties of these two measures. In such comparisons, the CBCL can serve as a gold standard against which the considerably shorter SDQ can be measured. A previous study showed that the original English versions of the SDQ and CBCL were highly correlated and generally performed similarly, though the SDQ seemed superior as a measure of inattention/hyperactivity (11).

The present study has five aims. Firstly, we determine how well the German versions of the parent-rated SDQ and the CBCL correlate. Secondly, we examine the correlations of the German self-report SDQ and the Youth Self Report (2). Thirdly, we investigate the level of parent-child agreement for both questionnaires. Fourthly, we examine how well both German SDQs and CBCLs are able to distinguish between low-risk children in the community and high-risk children with a relevant psychiatric diagnosis. Finally, we explore within a clinic sample how well the questionnaires are able to distinguish between types of psychiatric disorder.

Method

Sample

Questionnaires were administered to a total of 273 children drawn from clinic and community samples. Parent-rated questionnaires were available on all subjects, but self-rated questionnaires were only available for the community sample.

The clinic sample comprised 163 children seen in three psychiatric centres in Germany and Austria (Departments of Child and Adolescent Psychiatry at the Universities of Göttingen, Freiburg and Graz). Questionnaires were completed prior to clinical interventions.
Within the clinic sample, 124 children were boys (76%) and 39 were girls (24%); 28 children (18%) were inpatients and 135 were outpatients (82%); 52% of the children were between 4 and 10 years old, while 42% were between 11 and 16. Of the 163 children in the clinic sample, 49 (30%) had an emotional disorder, 42 (26%) had a conduct disorder and 65 (40%) had a hyperactivity disorder. Comorbidity was common, with 39 children (24%) having two or more of these disorders, most commonly the combination of conduct and hyperactivity disorder. A total of 52 children (32%) had none of these three common disorders; they had other disorders (e.g. enuresis or psychosis) or were referred for other reasons (e.g. abuse).

The community sample comprised 110 children from a cohort of children that had been investigated since birth (18, 19). All children were born in 1985 in Bavaria (age 12 to 13 at the point of investigation); 52% of them were boys and 47% were girls.

Measures

The German translations of the CBCL and YSR (4) were administered to all parents and to the children from the community sample. The SDQ was translated by a German child psychiatrist and by a professional translator, subsequently incorporating improvements suggested by colleagues and early piloting. Parent, teacher and self-report versions of the SDQ-Deu can be downloaded from the internet (http://www.sdqinfo.com). All questionnaires were scored in the standard manner (1, 2, 4, 8, 10).

In the clinic sample, clinicians assigned the children an ICD-10 diagnosis (20) on the basis of detailed clinical assessment. To avoid small cell sizes, diagnoses were combined into three broad groupings for analysis: oppositional-conduct disorders, hyperactivity-inattention disorders and emotional disorders. As some ICD diagnoses represent mixed disorders (e.g. hyperkinetic conduct disorder) some children were included in more than one group.
Other children in the clinic sample were not included in any of the three groups (see sample description above).

Statistical analysis

Given the non-normal nature of some of the distributions, correlations were calculated using Spearman's rho coefficients, while comparisons of means were carried out using Mann–Whitney U tests. Correlations were calculated for the total score and the problem scales. The Prosocial Scale of the SDQ and the Competence Scale of the CBCL were not compared since they differ so markedly in content even though they do share a focus on positive attributes.

In order to determine how well both the SDQ-Deu and the German CBCL are able to distinguish between the community sample and children with a relevant ICD-10 diagnosis, and between children with a given diagnosis and the rest of the clinic sample, we used receiver operating characteristics (ROC) curves. Using analyses of ROC curves to compare the discriminant validity of the two questionnaires does not depend on the representativeness of the two samples; it assumes only that the relevant psychiatric disorder is more common in the high-risk than in the low-risk group. Since the ROC curves for the SDQ and CBCL were derived from the same set of subjects, statistical comparisons of the areas under these ROC curves took their paired nature into account (12).

Comparisons of correlations were performed using structural equation modelling (EQS, BMDP Statistical Software) to account for the paired nature of the data. For example, when the parent-child correlation for the SDQ differed from the corresponding CBCL-YSR correlation, the significance of this difference was examined by comparing two different structural equation models: one allowing the correlations to be different and the other constraining the correlations to be equal. The difference in correlations was significant if the goodness of fit was significantly poorer when the correlations were forced to be equal (7).
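As a rough illustration of the ROC approach described above, the sketch below computes an area under the curve directly from two groups of scores and then compares two areas obtained on the same subjects. It is a sketch under stated assumptions and not the analysis code used in the study: the function names are hypothetical, the area is computed through its equivalence with the Mann–Whitney statistic (the proportion of high-risk/low-risk pairs ranked correctly, counting ties as one half), and the correlation `r` between the two areas is taken as an input that would have to be obtained as described by Hanley and McNeil (12).

```python
import math
from typing import Sequence


def auc_mann_whitney(high_risk: Sequence[float], low_risk: Sequence[float]) -> float:
    """Area under the ROC curve as the share of (high, low) pairs in which
    the high-risk subject scores higher; ties count as half a pair."""
    if not high_risk or not low_risk:
        raise ValueError("both groups must contain at least one score")
    wins = 0.0
    for h in high_risk:
        for l in low_risk:
            if h > l:
                wins += 1.0
            elif h == l:
                wins += 0.5
    return wins / (len(high_risk) * len(low_risk))


def paired_auc_z(auc_a: float, se_a: float,
                 auc_b: float, se_b: float, r: float) -> float:
    """z statistic for two areas derived from the same subjects:
    z = (A1 - A2) / sqrt(SE1^2 + SE2^2 - 2 * r * SE1 * SE2).
    r is the correlation between the two areas (an input here)."""
    variance = se_a ** 2 + se_b ** 2 - 2.0 * r * se_a * se_b
    if variance <= 0.0:
        raise ValueError("non-positive variance; check the SE and r inputs")
    return (auc_a - auc_b) / math.sqrt(variance)


if __name__ == "__main__":
    # Toy scores, invented purely for illustration.
    clinic = [14, 18, 9, 21, 16]
    community = [5, 9, 3, 7, 11]
    print(auc_mann_whitney(clinic, community))
    # Hypothetical areas, standard errors and r.
    print(paired_auc_z(0.90, 0.03, 0.84, 0.03, r=0.5))
```

Setting `r` to zero reduces the statistic to the comparison of two independent areas; a positive `r` shrinks the denominator, which is why taking the pairing into account makes the test more sensitive.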
For the analyses presented in Table 4 comparing community and clinic samples, all 110 community subjects were included in each comparison, while the number of clinic subjects varied according to comparison. All clinic cases were included for total scores and peer/social problems. However only children with the relevant diagnosis were included in the other comparisons, e.g. the ROC analysis for emotional symptoms involved a comparison between all community subjects and those clinic cases with an emotional disorder (excluding clinic cases who did not have an emotional disorder).

Results

Mean SDQ scores

Table 1 presents the mean SDQ scores for the community and clinic samples, showing the results for males and females separately. The difference between clinic and community samples was highly significant for each score for both genders (p<0.001).

Table 1 Mean SDQ scores by gender and sample, mean score (SD)

| SDQ scale | Males: Community (N = 58) | Males: Clinic (N = 124) | Females: Community (N = 52) | Females: Clinic (N = 39) |
|---|---|---|---|---|
| Total difficulties | 6.6 (4.9) | 17.4 (6.5) | 5.1 (4.5) | 14.9 (6.7) |
| Emotional symptoms | 1.3 (1.5) | 3.4 (2.5) | 1.8 (1.7) | 4.3 (2.6) |
| Conduct problems | 1.0 (1.4) | 4.0 (2.4) | 0.7 (1.1) | 3.3 (2.2) |
| Hyperactivity | 2.8 (2.4) | 6.5 (2.6) | 1.9 (2.1) | 4.6 (2.6) |
| Peer problems | 1.5 (2.0) | 3.5 (2.5) | 0.7 (1.1) | 2.7 (2.5) |
| Prosocial behaviour | 7.6 (2.0) | 6.1 (2.1) | 8.9 (1.3) | 6.4 (2.4) |

All community-clinic comparisons significant at p<0.001 using Mann–Whitney U test

The correlation of German SDQ and CBCL/YSR

The first two columns of Table 2 show the correlation of corresponding SDQ and CBCL scores. The two questionnaires correlated highly with regard to total scores as well as subscales. All correlations were statistically significant at the 0.001 level.

We also compared how well the questionnaires correlate when children fill them in themselves (Table 2, third column). The self-report questionnaires were only completed by children in the community sample. As with the parent questionnaires, the two different self-report questionnaires correlate well. All results were significant at the 0.001 level.

Table 2 Correlations between German SDQ and CBCL/YSR scores

| Problem scale | Community parents (N = 110) | Clinic parents (N = 163) | Community youth (N = 110) |
|---|---|---|---|
| Total score | 0.78 | 0.82 | 0.77 |
| Emotional/Internalising | 0.69 | 0.73 | 0.73 |
| Conduct/Externalising | 0.60 | 0.81 | 0.59 |
| Hyperactivity/Attention problems | 0.76 | 0.68 | 0.78 |
| Peer/Social | 0.61 | 0.68 | 0.58 |

All correlations significant at p<0.001

Comparing parent reports and self-reports

Within the community sample, children filled in both the self-report SDQ and the YSR, while their parents filled in the SDQ and the CBCL. We examined how well children and parents agree regarding their symptoms when using the two questionnaires. As shown in Table 3, children and parents agree moderately well (rho between 0.36 and 0.64), with similar correlations for both sets of measures. None of the differences in correlations are significant, i.e., neither of the questionnaires shows a comparative advantage or disadvantage with regard to parent-child agreement.

Table 3 Correlations between parent and self-reports (community sample, N = 110)

| Problem scale | SDQ | CBCL |
|---|---|---|
| Total score | 0.60 | 0.53 |
| Emotional/Internalising | 0.59 | 0.58 |
| Conduct/Externalising | 0.36 | 0.38 |
| Hyperactivity/Attention problems | 0.64 | 0.57 |
| Peer/Social | 0.57 | 0.44 |

Ability to distinguish between community and clinic samples

The ability of different SDQ and CBCL scores to distinguish between community and clinic subjects was examined using receiver operating characteristics (ROC) curves, employing the area under the curve (AUC) as the index of discriminant ability (Table 4). As a guide to interpretation, the area under the curve would be 1.0 for a measure that discriminated perfectly, and 0.5 for a measure that discriminated with no better than chance accuracy. Both SDQ and CBCL show good discriminant validity. With regard to the subscales, neither of the questionnaires shows any significant advantage. However, with respect to the total score, the SDQ-Deu is significantly better able to distinguish between the community and clinic sample than the German CBCL, though the magnitude of the difference is small.

Table 4 Ability of parent-rated SDQ and CBCL scores to distinguish between community and clinic samples: area under curve (SE), comparing community and clinic (1) samples

| Problem scale | SDQ | CBCL | p |
|---|---|---|---|
| Total score | 0.91 (0.02) | 0.87 (0.02) | ** |
| Emotional/Internalising | 0.85 (0.04) | 0.88 (0.03) | NS |
| Conduct/Externalising | 0.97 (0.01) | 0.96 (0.02) | NS |
| Hyperactivity/Attention problems | 0.94 (0.02) | 0.92 (0.02) | NS |
| Peer/Social | 0.78 (0.03) | 0.81 (0.03) | NS |

(1) For all comparisons, N = 110 for community. For clinic cases, N = 163 for total and peer/social scores, N = 49 for emotional/internalising, N = 42 for conduct/externalising, and N = 65 for hyperactivity/inattention; please see method section.
** p < 0.01 for z test for comparing area under ROC curves derived from the same subjects
NS not significant

Ability to distinguish between disorders within a clinic sample

The final analysis addresses the question of how useful the two questionnaires are when used within a clinical sample. Did the emotional, conduct, and hyperactivity scores discriminate within the clinic sample between patients with different sorts of disorders? This was also examined using the area under ROC curves (Table 5). For example, how well did the SDQ emotional score or the CBCL internalising score discriminate between patients with emotional disorders and psychiatric controls, i.e. psychiatric patients who did not have an emotional disorder? There is again little difference in the performance of the two questionnaires except in the case of hyperactivity-inattention, where the SDQ performed significantly better. It is noteworthy that the questionnaires were not as good at discriminating between different types of disorder (Table 5) as they were at distinguishing between the clinic and community sample (Table 4). When the analyses shown in Table 5 were repeated separately for 4–10 year olds and 11–16 year olds, the pattern of findings was unchanged.

Table 5 Ability of scores to distinguish between different types of disorders within the clinic sample (N = 163): area under curve (SE)

| Problem scale | Comparing clinic cases with and without (N with/without) | SDQ | CBCL | p |
|---|---|---|---|---|
| Emotional/Internalising | Emotional disorder (49/114) | 0.72 (0.05) | 0.75 (0.04) | NS |
| Conduct/Externalising | Conduct disorder (42/121) | 0.81 (0.04) | 0.83 (0.04) | NS |
| Hyperactivity/Attention problems | Hyperactivity disorder (65/98) | 0.77 (0.03) | 0.65 (0.04) | ** |

** p < 0.01 for z test for comparing area under ROC curves derived from the same subjects
NS not significant

Discussion

As was the case for the English originals (11) and for the Finnish versions (13), the German versions of the Strengths and Difficulties Questionnaire and the Child Behavior Checklist correlated highly with each other. This was the case both for the parent-rated as well as for the self-rated questionnaires. Both questionnaires were able to distinguish between children drawn from community and clinic samples very well, while they both performed less well in distinguishing between different types of disorder within a clinic sample.

The equivalence between the two questionnaires is striking as the SDQ is only about a fifth the length of the CBCL. Other things being equal, shorter scales are usually less reliable than longer scales (17).
In this instance, however, the brevity of the SDQ did not reduce its validity. While most of the differences between SDQ-Deu and German CBCL did not reach the significance level, in the two instances where they did, the SDQ performed better than the CBCL. Thus, with regard to total scores the SDQ was better able than the CBCL to distinguish between community cases and clinic cases. Within the clinic sample, the SDQ performed better than the CBCL in picking up children with an ICD-10 diagnosis of hyperkinesis, in line with previous findings showing that interview measures of hyperactivity correlate better with the SDQ hyperactivity score than with the CBCL attention problem score (11).

The results of this study are necessarily preliminary. The study is limited by the fact that the community children were all drawn from one age cohort, while the children in the clinic sample spanned a larger age range. It will obviously be important to replicate these findings on a broader age range, using diverse clinical and community samples. It is unlikely, however, that these studies will arrive at different conclusions, since English (11) and Finnish studies (13) using diverse samples with a broader age range arrived at similar conclusions. Further studies could also compare the informant-rated SDQ completed by teachers with the Teacher Report Form (3) and examine the value of the impact scores of the SDQ (9).

Pending larger-scale studies, the current findings suggest that the two questionnaires are comparable in many ways. Both of them are therefore suited for many purposes. As Tables 4 and 5 have shown, the questionnaires are much better at distinguishing between a community sample and psychiatric cases than at discriminating between different sorts of disorder within a clinic sample. This makes the instruments particularly useful as screening instruments or as research tools for epidemiological studies.
Within the clinic population, the use of either questionnaire as a diagnostic tool is limited, though they can be used before and after treatment to audit outcome, or they can help in prioritising cases.

In some respects, however, the two questionnaires have different strengths. The brevity of the SDQ and its low cost in administration as well as evaluation make it a particularly useful instrument for large epidemiological studies as well as for screening of large groups of low-risk children. The SDQ does, however, have fewer subscales than the CBCL and does not ask about less common symptoms such as compulsions, hallucinations, or sexual problems. Consequently, the CBCL might be better suited for studies that require a more detailed assessment of a broader range of symptoms. The SDQ and CBCL serve somewhat different purposes, though both questionnaires seem equally valid for most clinical and research applications.

References

1. Achenbach TM (1991a) Manual for the Child Behavior Checklist/4-18 and 1991 Profile. University of Vermont, Department of Psychiatry, Burlington
2. Achenbach TM (1991b) Manual for the Youth Self Report. University of Vermont, Department of Psychiatry, Burlington
3. Achenbach TM (1991c) Manual for the Teacher's Report Form and 1991 Profile. University of Vermont, Department of Psychiatry, Burlington
4. Arbeitsgruppe Deutsche Child Behavior Checklist (1998) Elternfragebogen über das Verhalten von Kindern und Jugendlichen: Deutsche Bearbeitung der Child Behavior Checklist (CBCL/4-18). Einführung und Anleitung zur Handauswertung. 2. Auflage mit deutschen Normen. Arbeitsgruppe Kinder-, Jugend-, und Familiendiagnostik, Köln, Germany
5. Döpfner M, Schmeck K, Poustka F, Berner W, Lehmkuhl G, Verhulst F (1996) Verhaltensauffälligkeiten von Kindern und Jugendlichen in Deutschland, den Niederlanden und den USA. Eine kulturvergleichende Studie mit der Child Behavior Checklist. Nervenarzt 67:960–967
6. Döpfner M, Plück J, Berner W, Fegert JM, Huss M, Lenz K, Schmeck K, Lehmkuhl U, Poustka F, Lehmkuhl G (1997) Psychische Auffälligkeiten von Kindern und Jugendlichen in Deutschland. Ergebnisse einer repräsentativen Studie: Methodik, Alters-, Geschlechts- und Beurteilungseffekte. Zeitschrift für Kinder- und Jugendpsychiatrie und Psychotherapie 25:218–233
7. Dunn G, Everitt B, Pickles A (1993) Modelling Covariances and Latent Variables Using EQS. Chapman and Hall, London, England
8. Goodman R (1997) The Strengths and Difficulties Questionnaire: A research note. Journal of Child Psychology and Psychiatry 38:581–586
9. Goodman R (1999) The extended version of the Strengths and Difficulties Questionnaire as a guide to caseness and consequent burden. Journal of Child Psychology and Psychiatry 40:791–799
10. Goodman R, Meltzer H, Bailey V (1998) The Strengths and Difficulties Questionnaire: A pilot study of the validity of the self-report version. European Child and Adolescent Psychiatry 7:125–130
11. Goodman R, Scott S (1999) Comparing the Strengths and Difficulties Questionnaire and the Child Behavior Checklist: Is small beautiful? Journal of Abnormal Child Psychology 27:17–24
12. Hanley JA, McNeil BJ (1983) A method of comparing the areas under receiver operating curves derived from the same cases. Radiology 148:839–843
13. Koskelainen M, Sourander A, Kaljonen A (2000) The Strengths and Difficulties Questionnaire among Finnish school aged children and adolescents, submitted for publication. European Child and Adolescent Psychiatry 9:277–284
14. Lehmkuhl G, Döpfner M, Plück J, Berner W, Fegert J, Huss M, Lenz K, Schmeck K, Lehmkuhl U, Poustka F (1998) Häufigkeit psychischer Auffälligkeiten und somatischer Beschwerden bei vier- bis zehnjährigen Kindern in Deutschland im Urteil der Eltern. Ein Vergleich normorientierter und kriterienorientierter Modelle. Zeitschrift für Kinder- und Jugendpsychiatrie und Psychotherapie 26:83–96
15. Remschmidt H, Walter R (1990) Psychische Auffälligkeiten bei Schulkindern. Eine epidemiologische Studie. Zeitschrift für Kinder- und Jugendpsychiatrie 18:121–132
16. Smedje H, Broman J-E, Hetta J, von Knorring A-L (1999) Psychometric properties of a Swedish version of the "Strengths and Difficulties Questionnaire". European Child and Adolescent Psychiatry 8:63–70
17. Streiner DL, Norman GR (1989) Health Measurement Scales. Oxford University Press, Oxford, England
18. Wolke D, Meyer R (1999) Ergebnisse der Bayrischen Entwicklungsstudie: Implikationen für Theorie und Praxis. Kindheit und Entwicklung 8:24–36
19. Wolke D, Meyer R (1999) Cognitive status, language attainment and prereading skills of 6-year-old very preterm children and their peers: The Bavarian Longitudinal Study. Developmental Medicine and Child Neurology 41:94–109
20. World Health Organization (1992) The ICD-10 Classification of Mental and Behavioural Disorders: Clinical Descriptions and Diagnostic Guidelines. World Health Organization, Geneva, Switzerland