Journal of Abnormal Child Psychology, Vol. 34, No. 3, June 2006, pp. 379–391 (© 2006)
DOI: 10.1007/s10802-006-9027-x

The Short Mood and Feelings Questionnaire (SMFQ): A Unidimensional Item Response Theory and Categorical Data Factor Analysis of Self-Report Ratings from a Community Sample of 7- Through 11-Year-Old Children

Carla Sharp,1,4 Ian M. Goodyer,2 and Tim J. Croudace3

Received February 24, 2004; revision received May 10, 2005; accepted September 13, 2005. Published online: 29 April 2006.

Item response theory (IRT) and categorical data factor analysis (CDFA) are complementary methods for the analysis of the psychometric properties of psychiatric measures that purport to measure latent constructs. These methods have been applied to relatively few child and adolescent measures. We provide the first combined IRT and CDFA analysis of a clinical measure (the Short Mood and Feelings Questionnaire—SMFQ) in a community sample of 7- through 11-year-old children. Both latent variable models supported the internal construct validity of a single underlying continuum of severity of depressive symptoms. SMFQ items discriminated well at the more severe end of the depressive latent trait. Item performance was not affected by age, although age correlated significantly with latent SMFQ scores, suggesting that symptom severity increased within the age period of 7–11. These results extend existing psychometric studies of the SMFQ and confirm its scaling properties as a potential dimensional measure of symptom severity of childhood depression in community samples.

KEY WORDS: Screening; childhood depression; SMFQ; item response theory; categorical data factor analysis.

1 Menninger Department of Psychiatry and Behavioral Sciences, Baylor College of Medicine, Houston, Texas.
2 Developmental Psychiatry Section, University of Cambridge, Cambridge, UK.
3 Department of Psychiatry, University of Cambridge, Cambridge, UK.
4 Address all correspondence to Carla Sharp, Menninger Department of Psychiatry and Behavioral Sciences, Baylor College of Medicine, 1 Baylor Plaza, Houston, Texas 77030; e-mail: csharp@hnl.bcm.tmc.edu.

Over the last 40 years, the methods used to evaluate the psychometric basis of ability tests, health care surveys, and multi-item screening instruments have changed dramatically. Whilst the methodology of classical test theory (CTT) has served test development well, item response/latent trait theory (IRT) approaches have become more mainstream as the technical basis for measurement theory, test construction and scale evaluation (Embretson & Reise, 2000). Although moves towards adoption of more appropriate, non-linear and categorical data factor analysis (CDFA) models have been most apparent in educational settings, in the last two decades such methods have begun to be applied in clinical testing of adults. This has been evidenced by psychometric studies published in, for example, Psychological Assessment and Psychological Methods (e.g. Angold, Erkanli, Silberg, Eaves, & Costello, 2002; Cooke & Michie, 1997; Lambert et al., 2003; Patton, Carlin, Shao, Hibbert, & Bowes, 1997; Santor, Ramsay, & Zuroff, 1994). Currently there are very few reports that have applied such methodologies in samples of young children (Cheong & Raudenbush, 2000). One reason for the under-exploitation of such methodologies may be that researchers have not been introduced to the potential and practicalities of these methods (Rouse, Finger, & Butcher, 1999) and are therefore unaware of the advantages they offer over conventional (CTT) methods (van der Linden & Hambleton, 1997). Although CTT is often included in the curriculum of both clinical and applied psychologists, IRT is rarely taught, and has had less coverage in mainstream
psychology journals (Embretson & Reise, 2000). We provide an application of IRT and categorical data factor analysis (CDFA) methods to a commonly used self-report measure of depressive symptoms in children, the Short Mood and Feelings Questionnaire (SMFQ; Angold et al., 1995). As such, the aim of this paper was to scrutinize the internal construct validity of the SMFQ. To this end, we used latent variable models implemented within both an IRT and an appropriate factor analysis framework (CDFA). Within a latent variable framework, internal construct validity refers to an understanding of the psychometric performance of items in relation to an underlying (latent) construct of interest. Here the latent variable was a continuum of severity derived from self-reported depressive symptoms in 7–11-year-old children. This latent variable was psychometrically derived, and was not validated through an external criterion measure of depression; hence our results pertain only to internal construct validity.

The Mood and Feelings Questionnaire (MFQ), long form, was developed as a screening tool for detecting clinically meaningful signs and symptoms of depressive disorders in children and adolescents (6–17 years of age) by self-report (Angold et al., 1995; Costello & Angold, 1988). MFQ items were designed to cover DSM diagnostic criteria for major depressive disorder (APA, 1994). Over the past decade, the long form consisting of 33 items has been used extensively in both epidemiological and clinical research (Costello et al., 1996a, 1996b; Goodyer, Herbert, Tamplin, & Altham, 2000; Kent, Vostanis, & Feehan, 1997; Park, Goodyer, & Teasdale, 2002; Wood, Kroll, Moore, & Harrington, 1995). Criterion-related validity (ability to predict clinical diagnosis) has been established for the long form (Wood et al., 1995). A short form consisting of 13 items (SMFQ) was subsequently derived, for which criterion validity has also been shown (Angold et al., 1995; Kent et al., 1997; Thapar & McGuffin, 1998). Table I presents details of studies that have used the self-report version of both the long and short forms of the MFQ and summarizes the validity issues addressed. Although these studies are universally supportive of the internal construct validity of the SMFQ, most samples were from clinic or specially selected populations. Currently, there is no published study on the internal construct validity of the short version in a community sample of primary school-aged children. Establishing internal construct validity in 7–11-year-old children is essential if the SMFQ is to be recommended as a self-report screening measure of severity of depression in community samples of children.

Most existing studies, as summarized in Table I, have applied statistical procedures based on CTT, linear factor analysis or principal components. Such methods are not optimal for the discrete/categorical nature of the MFQ responses (Angold et al., 2002) for several reasons.
Traditional methods, such as principal components analysis, assume that item responses are on a continuous metric; yet psychopathology ratings are recorded using discrete categories in most community studies to date. Usually, the commonest (modal) response on SMFQ items is zero, representing absence of symptoms. Even when ratings are collected on graded scales, response distributions are usually heavily skewed. Failure to treat symptom ratings as categorical data in factor analysis models has two consequences: (1) in multi-dimensional analysis the true factor structure may be severely distorted, and (2) in unidimensional models factor loadings may be biased and resultant (weighted) scores estimated incorrectly. When applying linear models to binary (0, 1) data, predictions are made that are not within the plausible range (<0 or >1) (McDonald, 1999). This is obviously highly undesirable when relatively rare symptoms are being modelled. The limitations of linear models in these situations are well known to statisticians and methodologists, but are frequently ignored by applied researchers. Only in limited circumstances will linear methods approximate a more appropriate model (Shrout & Parides, 1992).
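To illustrate the out-of-range problem just described, the following sketch (in Python, with invented coefficients rather than anything estimated from MFQ data) contrasts a linear model for a binary symptom with a logistic model: the linear predictions escape the 0–1 range at the extremes of a severity score, while the logistic predictions do not.

import numpy as np

# Illustrative only: a grid of standardized severity scores and two toy models
# for the probability of endorsing a rare symptom. Coefficients are invented.
z = np.linspace(-3, 3, 7)

linear_p = 0.10 + 0.35 * z                          # linear model: unbounded
logistic_p = 1 / (1 + np.exp(-(-2.0 + 1.5 * z)))    # logistic model: stays in (0, 1)

for zi, lp, gp in zip(z, linear_p, logistic_p):
    print(f"z = {zi:+.1f}   linear = {lp:+.2f}   logistic = {gp:.3f}")

# The linear predictions fall below 0 at the low end and exceed 1 at the high
# end, which is why categorical-data models use a logit or probit link instead.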
Although CTT and IRT/CDFA both assume that variation in the observed responses to items of a test can be explained by one or more continuous unobserved latent traits, the way IRT/CDFA models the relation between observed item responses and the latent trait is different. Instead of summarizing the psychometric properties of a scale with omnibus statistics (such as item-total correlations or Cronbach's alpha), thereby averaging across levels of individual variation (Santor et al., 1994), IRT approaches model how the probability of responding to an item—here, equivalent to endorsing a symptom—varies as a function of the location along a latent continuum or dimension of variation (Santor et al., 1994). IRT methods do not use summary statistics that apply to groups of individuals, such as correlations, but can define a model for the individual response patterns that comprise the raw data. Because item response patterns can be modelled directly within an IRT framework, no information in the data is lost. A mathematical equation—a probability model, similar to that used in logistic or probit regression—is used to describe the non-linear relation between an item response and the value (severity) on the latent trait. This relation can be represented graphically by a plot known as an item characteristic curve (ICC) or item response function (IRF). The ICC offers a graphical profile of item effectiveness, which is more informative than traditional measures of item performance (Santor et al., 1994). For a detailed discussion of the distinction between CTT and IRT approaches and the benefits of using the latter, see Embretson and Reise (2000).

Table I. Summary of Validity Characteristics of Studies that Used the Self-Report Versions of the MFQ Long and Short Forms

Wood et al. (1995). Sample: 104 consecutive referrals to an outpatient psychiatric clinic; age 10–19; 43 boys, 61 girls. Validity assessed: criterion validity for the long form, using the K-SADS as criterion. Analytic strategy: ROC analyses. Results: sensitivity 0.78, specificity 0.78.

Angold et al. (1995). Sample: 48 consecutive referrals to an outpatient psychiatric clinic (age 6–17; 33 boys, 15 girls) and 125 referrals to a pediatric clinic (age 6–11; 54 boys, 71 girls). Validity assessed: internal consistency, content and criterion-related validity for the SMFQ, using the DISC as criterion. Analytic strategy: principal component analysis; Cronbach's alpha; ROC analyses. Results: unidimensional factor structure (acceptable model fit statistics); Cronbach's alpha 0.85; high ORs for predicting psychiatric diagnosis; sensitivity 0.60, specificity 0.85.

Messer et al. (1995). Sample: 1502 high-risk community children; age 6–13; boys only. Validity assessed: internal consistency. Analytic strategy: CFA using LISREL VII. Results: unidimensional structure of the SMFQ confirmed; GFI and AGFI indices high (>0.97); RMSR low (<0.08); χ2/df indices all <3.

Kent et al. (1997). Sample: 114 consecutive attendees at four clinics; age 7–17; 56 boys, 57 girls. Validity assessed: criterion validity for the long form and the SMFQ, using the K-SADS as criterion. Analytic strategy: correlational analyses; ROC analyses; maximum likelihood logistic regression. Results: sensitivity 0.59, specificity 0.89.

Thapar and McGuffin (1998). Sample: 411 twins; age 11–16; 99 boy pairs, 123 girl pairs, 94 mixed pairs. Validity assessed: criterion validity for the SMFQ, using the CAPA as criterion. Analytic strategy: correlational analyses; ROC analyses. Results: sensitivity 0.75, specificity 0.74.

Note. Criterion validity here refers to the SMFQ's ability to detect major depressive episode in study samples with an acceptable degree of sensitivity and specificity. Internal consistency refers to the underlying factor structure of the SMFQ, notably whether it can be demonstrated to be a unidimensional scale. The current study did not aim to examine criterion validity but only internal consistency.

More detailed information on the effective measurement range of individual items and scales is especially important for psychopathology measurement in developmental epidemiology studies. Epidemiological evidence has suggested that symptom profiles may differ with age for reasons that are not fully understood. Weiss and Garber (2003) outlined several ways in which developmental differences may impact on the phenomenology of depression over the course of childhood and adolescence. From a psychometric perspective, it may be, for instance, that certain items are not appropriate for certain age groups, or that the symptoms (defined by the wording of questionnaire items) are not a feature of a particular disorder in that age group. As the current study demonstrates (see the section on differential item functioning—DIF), IRT provides the opportunity to distinguish between bias at the level of the item (i.e. the item does not accurately probe for the symptom in a particular age group) and bias at the level of the latent trait (i.e. the disorder does not express itself through a particular symptom in a particular age group).

To our knowledge, only one study to date has adopted an appropriate categorical factor analysis approach to SMFQ data (Angold et al., 2002). Angold et al. (2002) reported CDFA solutions estimated using weighted least squares estimation methods in Mplus software. Their study confirmed the SMFQ's unidimensional structure (single factor) in two samples. First, the Great Smoky Mountains Study comprised 9-, 11- and 13-year-olds. This sample of n = 1441 represented the 25% highest scorers on the Child Behaviour Checklist out of a community-based sample of n = 4500, i.e. a "high-risk" sample. Similar results were found in a second sample from a family study of n = 1412 twins. Confirming the unidimensional factor structure of the SMFQ with non-linear factor analytic techniques (Angold et al., 2002) was an important first step in examining the psychometric properties of the SMFQ from a latent trait modelling perspective.
However, no information was presented on the effective measurement range of the SMFQ. This is more easily summarised using graphical representations of the model from an IRT perspective, i.e. the test characteristic curve (TCC). We repeated and extended the methods used by Angold et al. and offer additional graphical representations of the latent variable modelling. In addition, we provide the first IRT/CDFA modelling of SMFQ data in a community sample of younger children (including 7- and 8-year-olds). Given concerns that there might be item bias with respect to age (with different thresholds for response for younger and older children), we further extended Angold's approach by testing for item bias (DIF) using a multiple indicator multiple cause (MIMIC) modelling approach (Gallo, Anthony, & Muthén, 1994). Item bias is present when individuals with the same score on the psychometrically derived latent trait are more or less likely to endorse an item. MIMIC modelling investigates item bias through an extension of the CDFA factor analysis model to include covariates, both of the latent trait and of the items. In testing for age invariance of item thresholds, the estimates of interest are the direct effects (regressions) of item responses on age as a covariate, after adjustment for the effect of age on the latent trait score. Given that prevalence rates of depression increase with age in adolescence (Goodyer & Sharp, 2005; Hankin et al., 1998), we expected this might also be the case for this age group. We therefore anticipated finding an age difference in latent trait scores, indicating more psychopathology in the older children (10- and 11-year-olds), but no evidence of item bias for any of the SMFQ items. We were able to test for the effect of age because, unlike previously reported studies, our sample comprised a cohort of younger children (7–11-year-olds) recruited and assessed in an elementary school setting.

METHOD

Participants

Parents of 2950 7- to 11-year-old children (primary school years 3–6) at 16 primary schools from a mixed catchment of rural and urban areas in Cambridgeshire, England were asked to participate. Response rates for individual schools ranged from 14 to 40%, resulting in 20% of the children taking part in the study (n = 659; 319 boys and 340 girls). There are four possible reasons for the low response rate. First, ethics approval requirements prohibit researchers from gaining access to names and addresses of parents in the community. Invitation letters to participate in the study were therefore handed out to children at school to take home to their parents. It is possible that many invitation letters did not make it home in the first place. Second, ethics requirements in the UK call for positive consent. The effort of actually completing and returning a consent form to indicate positive consent may be too much to ask of some parents in the community. Third, limited resources precluded payment to children for their participation. Instead, children were given a sticker and were entered into a school raffle drawing for their participation. Fourth, it is possible that parents feel more protective of children in the below-11 age range compared with adolescents, where the response rate for community studies using a school-based ascertainment procedure in the UK is typically 50% (Goodyer et al., 2000). All children had an estimated IQ above 80. The mean estimated IQ for the sample was 104 (SD = 14) and the mean age was 9 years, 5 months (SD = 12 months).
The ethnic distribution in the sample was in line with regional statistics (Office of National Statistics, 2001) for eastern England (97% white, 2% of middle-eastern origin, 0.5% black and 0.5% Asian). Two procedures were employed to determine participation bias. First, permission was obtained for teachers of one of the schools to complete a screening measure of common emotional and behaviour problems, the Strengths and Difficulties Questionnaire (SDQ; Goodman, 1997, 2001; Goodman, Ford, Simmons, Gatward, & Meltzer, 2000), on all the children in the school. Children who completed the SMFQ were compared with those who did not on their SDQ ratings. Independent sample t-tests revealed no evidence of any differences between the participants (n = 61) and non-participants (n = 232) when the five sub-scales of the SDQ (hyperactivity, emotional symptoms, conduct problems, peer problems, prosocial behaviour) were compared. Second, comparison of sociodemographic characteristics also revealed no difference between participants and non-participants.

Measures and Procedure

Short Mood and Feelings Questionnaire (Angold et al., 1995)

Our primary measure was the self-report SMFQ, which comprises 13 items with a common response format: 0, never; 1, sometimes; 2, always. The SMFQ was administered individually at the same time as all the other measures. Due to concerns raised regarding the reading and understanding ability of younger children (Messer et al., 1995; Thapar & McGuffin, 1998), teachers were consulted as to the level of understanding of the 7-year-olds (the youngest cohort), and it was decided that questions would be read aloud to this group (8%). As in previous studies that have used the SMFQ with younger children (Angold et al., 1995), the answers recorded were the participants' self-reports and not the examiners' opinions about them. Children in higher grades were invited to ask for help if needed. However, none of the children in the higher grades did so.

IQ

A shortened version (Vocabulary and Block Design subtests only) of the Wechsler Intelligence Scale for Children III (Wechsler, 1992) was used to estimate overall IQ in the sample. Sattler's (1988) guidelines were used for administration and scoring.

Psychopathology

To assess response bias, teachers completed the Strengths and Difficulties Questionnaire (SDQ; Goodman, 1997, 2001; Goodman et al., 2000). The SDQ was specifically designed to screen for psychiatric disorders in community samples and has been shown to identify individuals with a psychiatric diagnosis with a specificity (the proportion of people without disease who have a negative test result) of 94.6% (95% CI 94.1–95.1%) and a sensitivity of 63.3% (59.7–66.9%) (Goodman et al., 2000). Sensitivity (the proportion of people with disease who have a positive test result) for the SDQ has been demonstrated to be especially good (70–90%) for identification of conduct, oppositional and hyperactivity disorders.

Data Analytic Strategy

Combining CDFA with IRT

Our primary data analytic strategy was to apply two types of latent variable measurement models to the data: categorical data factor analysis (CDFA) and item response (latent trait) theory (IRT). In the literature, applications of CDFA methodology often report only numerical results, whereas applications of related IRT models summarise their findings graphically. We wished to exploit both representations and therefore applied software for estimating both types of models.
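As a concrete illustration of the shared latent variable assumption behind both approaches, the sketch below simulates binary symptom ratings from a single underlying severity dimension. It is a sketch only: the logistic response function and the discrimination and location values are assumptions chosen for illustration, not the models or estimates reported in this paper.

import numpy as np

rng = np.random.default_rng(0)

n_children, n_items = 650, 13
theta = rng.standard_normal(n_children)      # latent severity, assumed N(0, 1)

# Invented item parameters: slopes (discrimination) and locations (the severity
# at which the endorsement probability reaches 0.5). Not the published values.
a = rng.uniform(1.0, 2.5, size=n_items)
b = rng.uniform(1.0, 2.0, size=n_items)

# Two-parameter logistic response function: P(endorse item j | theta)
p = 1 / (1 + np.exp(-a * (theta[:, None] - b[None, :])))
responses = rng.binomial(1, p)               # 0/1 matrix, one row per child

print(responses.shape)                       # (650, 13)
print(responses.mean(axis=0).round(2))       # per-item endorsement rates

Both the CDFA and the IRT analyses reported below can be thought of as ways of recovering the single severity dimension (theta) from a matrix of this form.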
Background to Statistical Models

Introductory, technical and statistical accounts of the family of models that comprise the IRT and CDFA approach are presented elsewhere (Baker, 2001; Duncan-Jones, Grayson, & Moran, 1986; Lord, 1980; Lord & Novick, 1968; van der Linden & Hambleton, 1997). Bartholomew and Knott (1999) and McDonald (1999) provide excellent accounts of the relations between these models and more traditional models. Briefly, CDFA is an extension of the linear factor analysis model intended for dimension reduction and scaling of multiple binary or ordinal items. We consider only the probit-probit factor model of Muthén (1984, 1989). The model provides regression estimates of factor scores, which are location values for latent trait scores along the single dimension of depression severity.

IRT analysis allows for detailed examination of the properties of individual items by determining item characteristic curves. ICCs characterize the interaction of a person with an item and plot the probability of a response (endorsing a symptom) given the level of the underlying characteristic measured by the "test" (Cooke & Michie, 1997)—in this case the severity of self-reported depressive symptoms. ICCs are defined in terms of two parameters that govern the shape and position of the S-shaped curves. In the educational testing settings where IRT models were developed, the first parameter, a, is referred to as the item "discrimination" and governs the steepness of the slope of the ICC at the inflexion point of the S-shaped curve. A lower a-value is associated with a more gradual slope. Items with low slope estimates provide information over a wider range of latent trait values. Items with higher a-values have steeper curves and therefore discriminate more finely, but over a smaller range of latent trait values. The second parameter, b, relates to the prevalence of the item (here, the proportion of the sample that endorse each SMFQ symptom). In educational settings it is referred to as the "difficulty" parameter, since there it is related to the difficulty of the test item. The b-parameter is related to the point on the trait dimension at which a respondent has a 50% probability of endorsing the item. In clinical and epidemiological studies, this parameter is referred to as "commonality," since it relates to the prevalence of the symptoms. If the b-value is low, it indicates that the item/symptom is frequently endorsed even among low-trait individuals. In contrast, high values indicate that the item/symptom is likely to be endorsed only among high-trait individuals (e.g. the severely depressed).
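A minimal sketch of how the a (discrimination) and b (location/commonality) parameters described above shape an item characteristic curve under a two-parameter logistic model. The parameter values are illustrative and are not estimates from this study.

import numpy as np

def icc(theta, a, b):
    # Two-parameter logistic item characteristic curve.
    return 1 / (1 + np.exp(-a * (theta - b)))

theta = np.linspace(-3, 3, 13)

# A steep item (high a) located at a severe point on the trait, versus a
# flatter item (low a) at a slightly milder location; values are invented.
steep = icc(theta, a=2.4, b=1.8)
flat = icc(theta, a=0.8, b=1.3)

for t, s, f in zip(theta, steep, flat):
    print(f"theta = {t:+.1f}   steep item = {s:.2f}   flat item = {f:.2f}")

# Each curve crosses 0.5 at theta = b. The high-a item rises from rare to
# near-certain endorsement over a much narrower stretch of the latent trait,
# i.e. it discriminates more finely but over a smaller range.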
Graphical Representation of Scale Performance from IRT

The test information function (TIF) is a particularly useful output of an IRT analysis. The TIF profiles variations in the precision of measurement of the latent trait scores over the full range of estimated values (latent trait scores). The TIF provides a compelling graphical assessment of the effective measurement range of the SMFQ instrument. Reports of CDFA analyses do not usually provide any information on the range over which reliable (accurate) scores are estimated for individuals, only reliability at an aggregate level (for a sample group).

Software for Model Estimation

For the CDFA, we implemented Muthén's robust weighted least squares estimation approach using Mplus software (see Flora & Curran, 2004). For the IRT analyses, we implemented marginal maximum likelihood (MML) estimation of the two-parameter logit-probit IRT model (Albanese & Knott, 1990; Bartholomew, 1987; Bartholomew & Knott, 1999; Bartholomew, Steele, Moustaki, & Galbraith, 2002) in TWOMISS. TWOMISS estimates model parameters for the logit-probit model by a maximum likelihood procedure using a modified expectation-maximisation (EM) algorithm.

MIMIC Modelling to Test for Differences in Item CDFA Intercepts (Thresholds) by Age

In order to explore the impact of age on item responses, we used the MIMIC modelling approach of Gallo et al. (1994). This extends the categorical data factor model to include a covariate. MIMIC modelling enabled us to test the hypotheses of a linear association of latent trait scores with age (as a continuous measure), but no impact of age on the SMFQ item intercepts (thresholds). The presence of the latter would indicate differential item functioning with respect to age. We tested DIF for each SMFQ item (one at a time) by freely estimating the correlation of the latent factor with age, and also estimating the impact of age on the location of the threshold term in the item response model.

RESULTS

Descriptive Statistics

All participants who completed the SMFQ answered all questions, because questionnaires were individually administered and checked before they were returned. There were 10 children who did not want to complete the questionnaire. Children were not required to give a reason for non-participation. Full questionnaire data were therefore available for 649 children. There were no missing data. The SMFQ is a 13-item measure to which each response can be 2, 1 or 0. We recoded all responses to binary format (0/1) because the low frequency (<5%) of 2 scores meant that very little information would be lost in grouping these with 1 responses (options were recoded in the following way: 0 = 0; 1 and 2 = 1). Another advantage of recoding to binary responses (collapsing the two highest response categories) is that it provides a generally more parsimonious model and simplifies the reporting of our IRT analysis of a candidate childhood depression screening tool.

A 2 × 2 comparison (7-year-olds versus older children × score of 0 versus 1 on the SMFQ) on each item showed no significant differences between the 7-year-olds (to whom questions were read out) and the rest of the children. Children 8 years and older did not display difficulty completing the questionnaires by themselves. In addition, age did not correlate with SMFQ total scores, although a significant relationship was found between SMFQ total scores and IQ (r = −0.17; p < 0.01). Table II shows the prevalence of binary recoded responses to each item: five items were endorsed by 15% or more of the sample (item numbers: 4, restless; 7, poor concentration; 3, tired; 10, felt lonely; 12, never be as good); five items were endorsed by fewer than 10% (item numbers: 2, not enjoy anything; 6, cried a lot; 9, bad person; 11, unloved; 13, did everything wrong).
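The recoding and the endorsement proportions described in this section can be sketched as follows. The response matrix here is simulated purely for illustration; it is not the study data.

import numpy as np

rng = np.random.default_rng(1)

# Hypothetical raw SMFQ-style ratings for 649 children on 13 items (0, 1 or 2),
# skewed towards 0 as in community samples. Not the actual study data.
raw = rng.choice([0, 1, 2], size=(649, 13), p=[0.85, 0.12, 0.03])

binary = (raw >= 1).astype(int)        # recode: 0 -> 0; 1 or 2 -> 1
endorsement = binary.mean(axis=0)      # proportion endorsing each item

for item, prop in enumerate(endorsement, start=1):
    print(f"item {item:2d}: {prop:.3f}")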
Table II. Item Endorsement in Terms of the Proportion of the Sample that Endorsed Each Individual Item, Tetrachoric Correlations Between Items, and Proportion of the Sample that Endorsed Pairs of Items

Item 1 2 3 4 5 6 7 8 9 10 11 12 13 % 1 0.107 0.094 0.176 0.275 0.114 0.090 0.178 0.114 0.060 0.164 0.088 0.151 0.077 0.331 0.412 0.233 0.474 0.439 0.490 0.474 0.589 0.523 0.525 0.474 0.495 2 3 0.023 0.043 0.036 0.356 0.060 0.181 0.448 0.335 0.362 0.403 0.510 0.401 0.473 0.357 0.209 0.258 0.361 0.284 0.261 0.435 0.440 0.037 0.380 0.497 4 5 6 7 0.045 0.036 0.028 0.049 0.029 0.031 0.022 0.046 0.065 0.040 0.037 0.063 0.054 0.036 0.091 0.315 0.031 0.049 0.188 0.468 0.031 0.430 0.456 0.299 0.373 0.686 0.468 0.531 0.286 0.507 0.447 0.565 0.175 0.599 0.478 0.455 0.341 0.594 0.386 0.478 0.239 0.662 0.388 0.526 0.370 0.551 0.554 0.628 8 9 10 11 12 0.036 0.032 0.042 0.059 0.054 0.031 0.056 0.028 0.011 0.020 0.029 0.025 0.019 0.036 0.028 0.049 0.034 0.049 0.060 0.059 0.040 0.065 0.051 0.031 0.032 0.017 0.039 0.045 0.039 0.022 0.042 0.032 0.023 0.048 0.043 0.037 0.056 0.062 0.062 0.032 0.068 0.054 0.028 0.057 0.043 0.567 0.509 0.500 0.578 0.600 0.509 0.557 0.593 0.477 0.449 0.559 0.493 0.397 0.470 0.618 13 0.028 0.020 0.039 0.042 0.032 0.028 0.048 0.036 0.019 0.031 0.023 0.043

Note. Column 1 (item) refers to the 13 SMFQ items. Column 2 (%) refers to the proportion of the sample that endorsed each item (responded 1 or 2 rather than 0). The lower left triangle denotes the tetrachoric correlations between items. The upper right triangle denotes the proportion of the sample that endorsed both items. Item labels, 1: Miserable/unhappy; 2: Not enjoy anything; 3: Tired; 4: Restless; 5: No good anymore; 6: Cried a lot; 7: Poor concentration; 8: Hated self; 9: Bad person; 10: Felt lonely; 11: Unloved; 12: Never as good; 13: Everything wrong.

Confirmatory Factor Analysis in Mplus Using the Probit-Probit Model (Muthén, 1984, 1989)

First we summarize the results in terms of the lambda and tau parameters from the confirmatory categorical data factor analysis model. Then we discuss indications of model fit (chi-square, root mean square error of approximation (RMSEA) and standardised root mean square residual (SRMR)).

Inspection of the factor loadings (column 3 of Table IV—the lambda values) for the full sample shows high values for all items. Items 2, 3, 4 and 6 had the lowest factor loadings. The variance in item responses not explained by the single latent factor is quantified by the residual variances in column 4. Clearly these are quite substantial in magnitude for items 2, 3, 4 and 6. The standard errors (not shown) for these items were also larger than for other items. Overall, the magnitude of most of the loadings is consistent with the hypothesis underpinning our use of a single latent variable, that is, that the SMFQ items are relatively sensitive (discriminating) indicators of an underlying continuum of depression.

Tests of Model Fit

All indices of model fit (chi-square, RMSEA and SRMR) supported the adequacy of a single latent variable model. The comparative fit index (CFI) (0.992) and Tucker-Lewis Index (TLI) (0.994) were both high and close to 1; the root mean square error of approximation was low (<0.06) (RMSEA = 0.018). The standardized root mean square residual (0.063) and the weighted root mean square residual (0.833) were also low. There was little scope for model fit improvement by increasing the number of factors, and little justification in terms of our study aims (full results are available on request from the first author).

Full Information Item Factor Analysis Using the Logit-Probit Model (Bartholomew & Knott, 1999)

Pattern Frequencies

We first considered pattern frequencies for all response patterns for the binary recoded data.
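The pattern-frequency tabulation described in the paragraphs that follow can be sketched as below. With 13 binary items there are 2^13 = 8192 possible response patterns; the matrix here is simulated, so the counts are illustrative rather than the frequencies reported in the text.

from collections import Counter

import numpy as np

rng = np.random.default_rng(2)

# Simulated binary response matrix standing in for the recoded SMFQ data.
responses = rng.binomial(1, 0.10, size=(649, 13))

patterns = Counter("".join(str(x) for x in row) for row in responses)

print("possible patterns:", 2 ** 13)            # 8192
print("patterns observed:", len(patterns))
for pattern, count in patterns.most_common(3):  # most frequent patterns
    print(pattern, count)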
Under the full information approach, all information from the multiway cross-tabulation of all item responses is used. Thus, for 13 binary items there are 2^13 = 8192 possible response patterns for which frequencies may be reported. In most samples with more than six items, many possible patterns will not be observed (since the sample size is usually much less than the number of response categories raised to the power of the number of items). Only with shorter instruments is it possible to display all pattern frequencies (i.e. for scales with 5 or 6 binary items). For a 13-item questionnaire and a sample of n = 659, as in the current study, 10 pages of response patterns were returned. Space does not permit a full report of all response frequencies, so only the most common patterns are commented upon here. A full listing of response frequencies is available from the first author (CS).

The three most common response patterns are reported here. Unsurprisingly, given the nature of the sample, the most common pattern was the absence of all morbidity [0000000000000], with a frequency of n = 267 (41%). This represents the modal pattern of children in the sample who had none of the 13 indicators of depression. The second most common pattern was [0100000000000], with a frequency of 63 (10%), indicating endorsement of only one item, item 2 (did not enjoy anything). The only other two-digit pattern frequency was 23 (3%) for the pattern [0001110000000], indicating endorsement of items 4 (restless), 5 (no good anymore) and 6 (cried a lot). This is the modal response for those children in the sample who had any symptoms (at least one item endorsed).

Parameter Estimates

Table III reports parameter estimates from TWOMISS for the maximum likelihood factor analysis under the Bartholomew and Knott (logit-probit) model. Here, the IRT factor analysis model is parameterised differently, using alpha 0 (α_i0) for item intercepts and alpha 1 (α_i1) for item slopes, but gives rise to very similar S-shapes for the item characteristic curves (see later). The final column of Table III reports a transformation of the alpha 0 parameter showing the probability of endorsing an item for an individual at the median value on the latent trait. Here, these values range from 0.01 (items 9 and 13) to 0.25 (item 4).

Table III. Corresponding Categorical Data Factor Analysis Results from Bartholomew and Knott's (1999) Logit-Probit Item Response Function Model

Item; commonality α_i0 (SD); discrimination α_i1 (SD); symptom endorsement probability for the median individual, π_i0:
1. Miserable: 3.12 (0.29); 1.80 (0.28); 0.04
2. Not enjoy: 2.82 (0.231); 1.25 (0.23); 0.05
3. Tired: 1.88 (0.152); 1.10 (0.17); 0.13
4. Restless: 1.08 (0.105); 0.75 (0.13); 0.25
5. Felt no good: 3.68 (0.432); 2.49 (0.40); 0.02
6. Cried a lot: 3.09 (0.274); 1.49 (0.26); 0.04
7. Poor concentration: 2.42 (0.246); 1.93 (0.27); 0.08
8. Hated self: 3.64 (0.441); 2.44 (0.41); 0.02
9. Bad person: 4.30 (0.541); 2.15 (0.43); 0.01
10. Felt lonely: 2.39 (0.223); 1.71 (0.24); 0.08
11. Unloved: 3.80 (0.476); 2.19 (0.42); 0.02
12. Never be as good: 2.79 (0.275); 2.05 (0.28); 0.05
13. Everything wrong: 4.28 (0.545); 2.44 (0.46); 0.01

Note. Under the logit-probit model, the alpha_i0 (α_i0) parameter relates to the location on the x-axis of the inflexion point of the item response function curve. Alpha_i1 (α_i1) relates to the steepness of the slope of each ICC. π_i0 is a transformation of the α_i0 parameter which denotes the probability of endorsing each SMFQ item for an individual with a latent trait score of zero, i.e. at the median/mean/modal value on the latent trait. The parameters were estimated using full information marginal maximum likelihood estimation (modified EM algorithm) in TWOMISS software.
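The transformation reported in the final column of Table III can be checked directly. Under the logit parameterization assumed here (a sign convention, not something stated explicitly in the table), the endorsement probability for a child at the mean of the latent trait is the inverse logit of the negated intercept; applied to the α_i0 values above, it closely reproduces the published π_i0 column.

import numpy as np

# alpha_i0 intercepts for items 1-13 as reported in Table III.
alpha0 = np.array([3.12, 2.82, 1.88, 1.08, 3.68, 3.09, 2.42,
                   3.64, 4.30, 2.39, 3.80, 2.79, 4.28])

# Probability of endorsement for a child at the mean/median of the latent
# trait (z = 0): inverse logit of -alpha_i0 under this sign convention.
pi0 = 1 / (1 + np.exp(alpha0))

for item, p in enumerate(pi0, start=1):
    print(f"item {item:2d}: pi_0 = {p:.2f}")

# These closely match the final column of Table III (e.g. 0.04 for item 1,
# 0.25 for item 4, 0.01 for items 9 and 13); small differences for a few
# items reflect rounding of the published intercepts.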
So far we have focussed on the numerical results of the IRT and CDFA models. Next we consider the graphical representations that further facilitate the interpretation of these parameter estimates.

Graphic Representation of Test and Item Performance

Test Information Function (TIF)

We first report a graphic representation of the psychometric performance of the test as a whole (Fig. 1)—the TIF. The TIF is derived from the inverse of the posterior standard deviation of the latent trait estimates, when the factor scores (or latent trait estimates) are estimated using Bayesian (EAP) estimation. The TIF plots a function of the standard error. Taking the reciprocal of the variance or standard deviation provides a humped plot with higher values indicating regions of precise measurement (small standard errors, relative to other regions)—this is "test information." As expected, the curve dips sharply at the end points, where the SMFQ items provide little information. Figure 1 clearly shows that the most information (and therefore the highest precision of measurement) is provided by the SMFQ around 1.5 standard deviations above the mean (0) on the latent trait.

Fig. 1. Test characteristic curve and test information function for the SMFQ in a community sample of 7- through 11-year-olds.

Item Characteristic Curves (ICCs)

In Fig. 2, we represent the model results in terms of ICCs or "tracelines," in which the impact of the two IRT parameters is easily visible in relation to each other for each individual item. ICCs are S-shaped functions, plotted as a function of latent trait depression scores. All items function at more or less the same level on the latent trait. Importantly, all items are located towards the more severe end—to the right of the figures. Both Figs. 1 and 2 suggest that a child located between 1 and 2 standard deviations above (worse than) the population mean on the latent trait would have a 50% probability of endorsing the SMFQ items. We can also see that for a child at the mean (or median) latent trait value (0) the probability of endorsing any item is very low. This is also reflected in the entries in the last column of Table IV.

Fig. 2. Item characteristic curves for the 13 items of the SMFQ. Items 1–13 (panelled from left to right, first item top left). Notes: Axis labels removed for clarity of presentation. The x-axis is the estimated latent trait score, which is distributed as a standard normal distribution and ranges from −3 to +3. The y-axis is the probability that the SMFQ symptom is endorsed and ranges from a minimum value of 0 to a maximum value of 1.

Table IV. Confirmatory Factor Analysis Using Mplus, Giving IRT Results for Categorical-Response SMFQ Items (Binary Recoded)

Item; CDFA threshold (tau); CDFA loading (lambda); residual variance (1 − lambda²); Z test for loading (estimate/SE); IRT discrimination parameter a; IRT intercept parameter b:
1. Miserable: 1.24; 0.68; 0.53; 12.96; 1.05; 1.74
2. Not enjoy: 1.31; 0.54; 0.70; 8.19; 0.76; 2.22
3. Tired: 0.93; 0.53; 0.71; 9.21; 0.68; 1.66
4. Restless: 0.59; 0.40; 0.83; 6.79; 0.48; 1.35
5. Felt no good: 1.20; 0.79; 0.36; 19.74; 1.42; 1.50
6. Cried a lot: 1.34; 0.60; 0.63; 9.73; 0.89; 2.06
7. Poor concentration: 0.92; 0.72; 0.48; 15.16; 1.13; 1.26
8. Hated self: 1.20; 0.77; 0.39; 18.97; 1.39; 1.51
9. Bad person: 1.55; 0.70; 0.50; 12.43; 1.22; 2.04
10. Felt lonely: 0.97; 0.67; 0.53; 14.14; 1.00; 1.41
11. Unloved: 1.35; 0.72; 0.47; 15.07; 1.26; 1.76
12. Never be as good: 1.03; 0.74; 0.44; 17.45; 1.18; 1.37
13. Everything wrong: 1.42; 0.77; 0.40; 15.99; 1.38; 1.79

Notes. Parameters (loadings/lambda and thresholds/tau) estimated using robust weighted least squares estimation (weighted least squares estimation with mean- and variance-adjusted chi-square test statistic). Model fit under WLSMV: chi-square (mean and variance adjusted) = 55.197, df = 46, p = 0.16; Comparative Fit Index = 0.992; Tucker-Lewis Index = 0.994; root mean square error of approximation = 0.018; weighted root mean square residual = 0.833. Tau: Mplus thresholds; lambda: Mplus factor loadings; IRT a-parameter: discrimination parameter from the response function parameterisation of the CDFA, related to the factor loading (lambda); IRT intercept (b-parameter): x-axis value where p(item) = 0.5 in Fig. 2. CDFA denotes parameters from the underlying variable parameterisation of the two-parameter probit-probit IRT model; IRT denotes parameters under the response function parameterisation of the two-parameter probit-probit IRT model.

Although items 3 (tired) and 4 (restless) (the top right two ICCs in Fig. 2) also function at the severe end of the latent trait, they show more shallow slopes. This indicates lower discriminating power with respect to the latent trait (depression). These are thus the least sensitive items for measuring depression. Items showing the highest commonality/lowest prevalence parameters were items 9: bad person (α_i0 = −4.3058) and 13: did everything wrong (α_i0 = −4.2836). Items 5: felt no good, 8: hated self and 13 showed the highest discrimination parameters (α_i1 = 2.4916, 2.4440 and 2.4444, respectively).

The Effect of Age

We examined the possibility of item bias (DIF) using an extension of IRT modelling referred to as MIMIC modelling (see Data Analytic Strategy). Results showed that the ratio of estimate to standard error was <1.96 for the direct effect of the covariate on all items, indicating no significant item bias effects.
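The age analysis reported above was carried out with MIMIC models in Mplus. The sketch below is not that model; it is a rough logistic-regression analogue of the same idea, testing whether age predicts an item response once a proxy for the latent trait is controlled, on simulated data. The variable names and data are hypothetical.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)

# Simulated stand-in data: latent severity, age in years, and one binary item
# generated without any age effect on the item itself.
n = 649
theta = rng.standard_normal(n)
age = rng.integers(7, 12, size=n)
item = rng.binomial(1, 1 / (1 + np.exp(-1.8 * (theta - 1.5))))

# Rough DIF screen (an analogue of, not the, MIMIC approach): regress the item
# on a trait proxy plus age; a large age coefficient after conditioning on the
# trait would suggest item bias with respect to age.
trait_proxy = theta          # in practice, a rest score or factor score estimate
X = sm.add_constant(np.column_stack([trait_proxy, age]))
fit = sm.Logit(item, X).fit(disp=0)

z_age = fit.params[2] / fit.bse[2]
print(f"age coefficient z = {z_age:.2f} (|z| < 1.96: no evidence of item bias)")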
DISCUSSION

The current study is the first investigation in which a combined IRT/CDFA analysis, including MIMIC modelling, of the SMFQ has been carried out in 7–11-year-old children ascertained from the community.

Methods for summarizing item responses were originally developed in the field of psychometrics and have been widely applied for the purposes of educational testing. They have found increasing application in medical (epidemiological) and health care (survey) research for the assessment of psychopathology, but have rarely been applied to evaluate clinical psychopathological assessments intended for use in children.

The advantages of IRT for abnormal psychology measurement are quite substantial (Cooke & Michie, 1997; Duncan-Jones et al., 1986; Embretson & Reise, 2000; Rouse et al., 1999; Santor et al., 1994). In scale construction and evaluation, IRT item analysis provides information that can be used for evaluating which items from a large pool should be used together to comprise a scale that is fit for that purpose (Waller, Tellegen, McDonald, & Lykken, 1996). IRT results can be used to estimate a person's trait level (latent trait score), which may be more accurate than summing unweighted individual item scores and can be calculated when there are partially missing data.
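A minimal sketch of the kind of model-based scoring referred to here: an expected a posteriori (EAP) trait estimate obtained by numerical integration over a standard normal prior, using a two-parameter logistic model with illustrative item parameters (not the published SMFQ estimates). Items with missing responses are simply skipped in the likelihood, which is why such scores can be computed with partially missing data.

import numpy as np

def eap_score(responses, a, b, n_points=61):
    # EAP latent trait estimate for one person under a 2PL model.
    # responses: 0/1 item scores, with np.nan marking missing items.
    grid = np.linspace(-4, 4, n_points)
    prior = np.exp(-0.5 * grid**2)            # N(0, 1) prior, up to a constant

    likelihood = np.ones_like(grid)
    for y, ai, bi in zip(responses, a, b):
        if np.isnan(y):                       # missing item: contributes nothing
            continue
        p = 1 / (1 + np.exp(-ai * (grid - bi)))
        likelihood *= p if y == 1 else 1 - p

    posterior = prior * likelihood
    posterior /= posterior.sum()
    return float(np.sum(grid * posterior))

# Illustrative parameters for 13 items (not the SMFQ estimates).
a = np.full(13, 1.5)
b = np.linspace(1.0, 2.0, 13)

no_symptoms = np.zeros(13)
several_symptoms = np.array([1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, np.nan])

print(eap_score(no_symptoms, a, b))        # below-average severity estimate
print(eap_score(several_symptoms, a, b))   # well above-average severity estimate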
IRT results are useful for determining whether the items included in a candidate scale exhibit differential item functioning for two populations (Cooke & Michie, 1997; Rouse et al., 1999). This particular advantage may be especially relevant in studies where developmental differences in the phenomenology of depression are the focus of interest (see Weiss & Garber, 2003, for a review of such studies). Lastly, using IRT technologies, there is potential for a clinical scale to be substantially shortened through the adoption of an adaptive testing approach (Gardner et al., 2004). With adaptive testing (computerised adaptive testing—CAT), a software program uses IRT parameters to select items in a sequence that optimally gains information on the severity score for each respondent, by tailoring questions to the likely level of the individual's score.

In summary, latent trait modelling techniques (like IRT and CDFA) provide a closer approximation of the structural model underpinning psychiatric data than traditional linear methods (Cooke & Michie, 1997), thereby giving a better representation of empirical data and ultimately uncovering new aspects of a familiar instrument (Duncan-Jones et al., 1986). By applying an IRT/CDFA approach we demonstrated, in line with Angold et al.'s (2002) findings, that the SMFQ measures a unidimensional construct of depressive symptoms. Other studies have testified to the external validity of the SMFQ (see Table I). Based on these studies, and the face validity of SMFQ items, we assumed the latent trait underlying SMFQ items to be that of depression. However, in the absence of external validity data we cannot conclude definitively that the construct we studied was in fact depression. Notwithstanding this limitation, the current results support the SMFQ as a highly homogeneous/unidimensional measure. Homogeneity/unidimensionality of a measure maximizes the ability to discriminate between diagnostically distinct groups (Costello & Angold, 1988) and thus speaks to the validity of the measure. High correlations between SMFQ items in this sample suggest good internal consistency for the scale in younger children. Factor loadings for all SMFQ items were high, indicating that the items were highly discriminating with respect to the latent trait. Item distribution along the latent trait continuum was also consistent with the goal of the instrument to be used for screening purposes. All items were found to function at the severe end of the latent trait, with none being useful for measuring individuals at the 50th percentile in the population (mean, median—the midpoint of the latent trait). According to these results, the SMFQ may not be appropriate for use in community studies where the interest lies in variation in average (mental) health, because this would require items to be more widely distributed across the range of the latent trait. As such, the SMFQ may be more appropriate for detecting children ages 7–11 who are likely to report high levels of depressive symptoms at the time of measurement. Despite the low response rate in the current study, which may affect the generalizability of our findings, the current study indicates, with more appropriate statistical modelling techniques, that the SMFQ does what it was designed to do (Angold, Costello, Pickles, & Winder, 1987).
An interesting next step would be to carry out IRT analyses of the SMFQ in a clinical sample with a current diagnosis of depression. This should yield a similar pattern of item loadings, but the score distributions would be skewed toward the depressed end of the latent trait. In our sample, most children endorsed zero scores on the SMFQ, which is what is expected for community samples.

Our analyses showed that two items function a little differently from the others (3: tired and 4: restless). These items were characterised by ICCs that exhibited shallower slopes and the lowest (most leftward) thresholds. They therefore offer almost no discriminating power at the most severe end of the depressive latent trait but would contribute to the discrimination of individuals with lower scores. This does not, however, provide a strong case for shortening the SMFQ any further. Although these items exhibited shallower slopes and lower thresholds in comparison with other items, their psychometric properties are not sufficiently different from those of the other items to exclude them. In contrast, items 8: hated self and 13: did everything wrong showed the most discriminating power, whilst items 9: bad person and 13: did everything wrong functioned at the most severe end of the latent trait. This suggests that these questions may be the most useful to ask if a clinician or researcher is interested in identifying a child with depression with expediency.

The information gained from these analyses regarding individual item performance is also of theoretical and substantive interest. Items 3: tired and 4: restless, which did not load as highly (but still sufficiently to retain them) on to the latent trait, can be seen as less central to the trait for children ages 7–11 compared with items 8: hated self, 9: bad person, 11: unloved, 12: never be as good and 13: did everything wrong, which load highly. Within this age range, cognitive symptoms of depression such as items 8, 9, 11 and 12 show higher factor loadings, and therefore more discriminatory power at the severe end of the latent trait, compared with "somatic" symptoms like items 3 and 4. These results are in line with previous research showing that both cognitive and somatic symptoms in the long form of the MFQ are essential to the construct of depression in children and adolescents (Kent et al., 1997), but may differ in their prevalence and/or loading on a single factor of severity. The current results showed that somatic symptoms of depression (tired and restless) contributed less than cognitive symptoms to the underlying unidimensional latent trait of depression in 7–11-year-old children.

In summary, the IRT analyses conducted here suggest that the SMFQ: (1) is a unidimensional or homogeneous measure; (2) has items showing relatively good discrimination at the severe end of the latent trait, indicating that the instrument, in terms of its internal construct validity, is an adequate screening measure for depressive symptoms in a community sample of children ages 7–11 years; and (3) shows no evidence of bias with respect to age from 7 to 11 years (although this study lacked an adolescent comparison sample) and is suitable for use with children as young as 7.

ACKNOWLEDGMENTS

The authors are grateful to all the families and schools who participated. We also wish to thank Sarah Moore, Heather Brown, Penny Hazell and Maria Loades for helping with data collection and entry, and Dr. Melanie Merricks and Dr. Carol Stott for valuable discussion. Carla Sharp was supported by an NHS Post-Doctoral Fellowship, University of Cambridge, UK.
Tim Croudace was supported by a Department of Health Career Scientist Award (Public Health).

REFERENCES

Albanese, M. T., & Knott, M. (1990). TWOMISS: A computer program for fitting a one- or two-factor logit-probit latent variable model to binary data when observations may be missing. London, UK: London School of Economics and Political Science.

American Psychiatric Association. (1994). Diagnostic and statistical manual of mental disorders (4th ed., DSM-IV). Washington, DC: American Psychiatric Association.

Angold, A., Costello, E. J., Messer, S. C., Pickles, A., Winder, F., & Silver, D. (1995). Development of a short questionnaire for use in epidemiological studies of depression in children and adolescents. International Journal of Methods in Psychiatric Research, 5, 237–249.

Angold, A., Costello, E. J., Pickles, A., & Winder, F. (1987). The development of a questionnaire for use in epidemiological studies of depression in children and adolescents. London: Medical Research Council Child Psychiatry Unit.

Angold, A., Erkanli, A., Silberg, J., Eaves, L., & Costello, E. J. (2002). Depression scale scores in 8–17-year-olds: Effects of age and gender. Journal of Child Psychology and Psychiatry, 43, 1052–1063.

Baker, F. (2001). The basics of item response theory (2nd ed.). ERIC Clearinghouse on Assessment and Evaluation. Retrieved from http://edres.org/irt/

Bartholomew, D. J., & Knott, M. (1999). Latent variable models and factor analysis (2nd ed.). London: Arnold.

Bartholomew, D. J., Steele, F., Moustaki, I., & Galbraith, J. (2002). The analysis and interpretation of multivariate data for social scientists. London: Chapman & Hall/CRC.

Cheong, Y. F., & Raudenbush, S. W. (2000). Measurement and structural models for children's problem behaviors. Psychological Methods, 5, 477–495.

Cooke, D. J., & Michie, C. (1997). An item response theory analysis of the Hare Psychopathy Checklist. Psychological Assessment, 9, 3–14.

Costello, E. J., & Angold, A. (1988). Scales to assess child and adolescent depression: Checklists, screens and nets. Journal of the American Academy of Child and Adolescent Psychiatry, 27, 726–737.

Costello, E. J., Angold, A., Burns, B. J., Erkanli, A., Stangl, D. K., & Tweed, D. L. (1996a). The Great Smoky Mountains Study of Youth: Functional impairment and serious emotional disturbance. Archives of General Psychiatry, 53, 1137–1143.

Costello, E. J., Angold, A., Burns, B. J., Stangl, D. K., Tweed, D. L., Erkanli, A., et al. (1996b). The Great Smoky Mountains Study of Youth: Goals, design, methods, and the prevalence of DSM-III-R disorders. Archives of General Psychiatry, 53, 1129–1136.

Duncan-Jones, P., Grayson, D. A., & Moran, P. A. (1986). The utility of latent trait models in psychiatric epidemiology. Psychological Medicine, 16, 391–405.

Embretson, S. E., & Reise, S. P. (2000). Item response theory for psychologists. London: Lawrence Erlbaum.

Flora, D. B., & Curran, P. J. (2004). An empirical evaluation of alternative methods of estimation for confirmatory factor analysis with ordinal data. Psychological Methods, 9, 466–491.

Gallo, J. J., Anthony, J. C., & Muthén, B. O. (1994). Age differences in the symptoms of depression: A latent trait analysis. Journals of Gerontology: Psychological Sciences, 49, 251–264.

Gardner, W., Shear, K., Kelleher, K. J., Pajer, K. A., Mammen, O., Buysse, D., et al. (2004). Computerized adaptive measurement of depression: A simulation study. BMC Psychiatry, 4, 13.

Goodman, R. (1997). The Strengths and Difficulties Questionnaire: A research note. Journal of Child Psychology and Psychiatry, 38, 581–586.

Goodman, R. (2001).
Psychometric properties of the Strengths and Difficulties Questionnaire. Journal of the American Academy of Child and Adolescent Psychiatry, 40, 1337–1345.

Goodman, R., Ford, T., Simmons, H., Gatward, R., & Meltzer, H. (2000). Using the Strengths and Difficulties Questionnaire (SDQ) to screen for child psychiatric disorders in a community sample. British Journal of Psychiatry, 177, 534–539.

Goodyer, I. M., Herbert, J., Tamplin, A., & Altham, P. M. (2000). First-episode major depression in adolescents: Affective, cognitive and endocrine characteristics of risk status and predictors of onset. British Journal of Psychiatry, 176, 142–149.

Goodyer, I. M., & Sharp, C. (2005). Childhood depression. In B. Hopkins, R. G. Barr, G. F. Michel, & P. Rochat (Eds.), The Cambridge encyclopedia of child development (pp. 420–423). Cambridge: Cambridge University Press.

Hankin, B. L., Abramson, L. Y., Moffitt, T. E., Silva, P. A., McGee, R., & Angell, K. E. (1998). Development of depression from preadolescence to young adulthood: Emerging gender differences in a 10-year longitudinal study. Journal of Abnormal Psychology, 107, 128–140.

Kent, L., Vostanis, P., & Feehan, C. (1997). Detection of major and minor depression in children and adolescents: Evaluation of the Mood and Feelings Questionnaire. Journal of Child Psychology and Psychiatry, 38, 565–573.

Lambert, M. C., Schmitt, N., Samms-Vaughan, M. E., An, J. S., Fairclough, M., & Nutter, C. A. (2003). Is it prudent to administer all items for each Child Behavior Checklist cross-informant syndrome? Evaluating the psychometric properties of the Youth Self-Report dimensions with confirmatory factor analysis and item response theory. Psychological Assessment, 15, 550–568.

Lord, F. M. (1980). Applications of item response theory to practical testing problems. Mahwah, NJ: Erlbaum.

Lord, F. M., & Novick, M. R. (Eds.). (1968). Statistical theories of mental test scores. Reading, MA: Addison-Wesley.

McDonald, R. P. (1999). Test theory: A unified treatment. Mahwah, NJ: Lawrence Erlbaum.

Messer, S. C., Angold, A., Costello, E. J., Loeber, R., Van Kammen, W., & Stouthamer-Loeber, M. (1995). Development of a short questionnaire for use in epidemiological studies of depression in children and adolescents: Factor composition and structure across development. International Journal of Methods in Psychiatric Research, 5, 251–262.

Muthén, B. (1984). A general structural equation model with dichotomous, ordered categorical, and continuous latent variable indicators. Psychometrika, 49, 115–132.

Muthén, B. (1989). Multiple-group structural modelling with non-normal continuous variables. British Journal of Mathematical and Statistical Psychology, 42, 55–62.

Office of National Statistics. (2001). Retrieved from http://www.statistics.gov.uk/

Park, R. J., Goodyer, I. M., & Teasdale, J. D. (2002). Categoric overgeneral autobiographical memory in adolescents with major depressive disorder. Psychological Medicine, 32, 267–276.

Patton, G. C., Carlin, J. B., Shao, Q., Hibbert, M. E., & Bowes, G. (1997). Adolescent dieting: Healthy weight control or borderline eating disorder? Journal of Child Psychology and Psychiatry, 38, 299–306.

Rouse, S. V., Finger, M. S., & Butcher, J. N. (1999). Advances in clinical personality measurement: An item response theory analysis of the MMPI-2 PSY-5 scales. Journal of Personality Assessment, 72, 282–307.

Santor, D. A., Ramsay, J. O., & Zuroff, D. C. (1994).
Nonparametric item analyses of the Beck Depression Inventory: Evaluating gender item bias and response option weights. Psychological Assessment, 6, 255–270.

Sattler, J. M. (1988). Assessment of children. San Diego, CA: Sattler.

Shrout, P. E., & Parides, M. (1992). Conventional factor analysis as an approximation to latent trait models for dichotomous data. International Journal of Methods in Psychiatric Research, 2, 55–65.

Takane, Y., & de Leeuw, J. (1987). On the relationship between item response theory and factor analysis of discretized variables. Psychometrika, 52, 393–408.

Thapar, A., & McGuffin, P. (1998). Validity of the shortened Mood and Feelings Questionnaire in a community sample of children and adolescents: A preliminary research note. Psychiatry Research, 81, 259–268.

van der Linden, W. J., & Hambleton, R. K. (1997). Handbook of modern item response theory. New York: Springer-Verlag.

Waller, N. G., Tellegen, A., McDonald, R. P., & Lykken, D. T. (1996). Exploring nonlinear models in personality assessment: Development and preliminary validation of a negative emotionality scale. Journal of Personality, 64, 545–576.

Wechsler, D. (1992). Wechsler Intelligence Scale for Children (3rd UK ed.). London: Psychological Corporation.

Weiss, B., & Garber, J. (2003). Developmental differences in the phenomenology of depression. Development and Psychopathology, 15, 403–430.

Wood, A., Kroll, L., Moore, A., & Harrington, R. (1995). Properties of the Mood and Feelings Questionnaire in adolescent psychiatric outpatients: A research note. Journal of Child Psychology and Psychiatry, 36, 327–334.