CESD8 in comparative perspective Vandevelde

The psychometric properties of the CES-D 8 depression inventory and
the estimation of cross-national differences in the true prevalence of
depression.
Sarah Van de Velde, Piet Bracke, Katia Levecque,
Department of Sociology, Ghent University,
Korte Meer 5, 9000 GHENT,
Belgium.
Abstract
Cross-national comparisons of the prevalence of depression in the general
populations are hampered by the absence of comparable data. Usually, cross-national
differences are estimated using meta-analyses of data from a diverse set of studies
using divergent depression inventories, different sampling designs, and not completely
comparable sampling populations. Using information on the frequency and severity of
depressive symptoms in the third wave of the European Social Survey (ESS round 3),
we are able to fill this gap. In the ESS round 3 the frequency and severity of symptoms
related to a DSM IV criterion for major depressive disorder are measured using a
shorter version of the Center for Epidemiologic Studies Depression Scale or CES-D (Radloff
1977). The CES-D is a key instrument to measure the prevalence of depressive
symptoms in society. Since its introduction the scale has been used to measure
depressive symptoms across several populations (elderly, adolescents, women,
clinical populations and ethnic populations). In a first step, we will evaluate the quality
of this shortened version of the CES-D as a measurement instrument and the
comparability of its measures across gender and across the European countries
involved in the ESS. Using confirmatory factor analysis we test configural, metric,
strong and strict factorial invariance in order to detect possible measurement
bias in the scale. In a second step, the best-fitting factor models are used to estimate the
‘true’ frequency and severity of depressive symptoms for women and men in the 25
participating European countries. To the best of our knowledge, the present study is
the first to present highly comparable data on the prevalence of depression in women
and men in Europe.
Introduction
Cross-national gender differences in depression: a partly artificial fact?
According to the World Health Organization, depression is the most common mental health
problem in the Western world (WHO, 2000). It has a high prevalence in nearly every society
with some studies even suggesting that depression is on the rise, at least in the Western world
(Fombonne 1994; Kessler et al. 1993; Wauterickx and Bracke 2005; Weissman and Klerman
1992). A recurrent finding in international literature is a 1.5 to 3 times higher prevalence of
depression in women compared to men (Weissman & Klerman, 1977; Weissman et al, 1984;
Kessler et al, 1993; Piccinelli & Wilkinson 2000). This is true for both in- and outpatient as well
as general population studies. The pattern of a higher prevalence of depression in women
compared to men has been consistent across nations, cultures and population groups, in
studies using different methods and measurement instruments and for a diversity of incidence
and prevalence indicators (Weissman et al, 1984; Kessler et al, 1993).
Unfortunately, cross-national comparisons of gender differences in depression in the
general population have been hampered by the absence of comparable data. Usually,
cross-national differences are estimated using meta-analyses of data from a diverse set of studies
using divergent depression inventories, different sampling designs, and not completely
comparable sampling populations. In Europe, three cross-national psychiatric epidemiological
surveys went beyond these shortcomings and delivered comparable data: the Depress I and II
studies, the ESEmed study, and the ODIN investigation. Nevertheless, these studies too have
their shortcomings. The Depress II study (Angst et al. 2002) allows for the estimation of
gender differences in depression in a cross-national sample of treated individuals only.
Moreover, like the sample of the first pan-European study on depression, the Depress I
(Lepine et al. 1997), this sample covers only six countries. The ESEmed study (Alonso et
al. 2004) also contains data from only six European countries. Finally, the ODIN study
(Ayuso-Mateos et al. 2001) holds information on individuals in nine urban centres and rural areas in
five countries, although some of them were identified via primary care databases. Other studies
containing information on depression and anxiety-related complaints are (a) the Survey of
Health, Ageing and Retirement in Europe (SHARE) and (b) the WHO ‘Psychological Problems
in Primary Care’ study. The first survey covers 11 European countries, but is limited to couples
aged 50 and older (Börsch-Supan et al. 2005). The WHO study sampled clients of 15 primary
care centres in 14 countries (Maier et al. 1999). The samples were enriched with depressed
cases, so they are not representative of the general population.
In the present study, we made use of the third round of the European Social Survey (ESS 3),
organized in 2006 and 2007 and covering data from 25 European countries. In the ESS 3 the
frequency and severity of symptoms related to a DSM IV criterion for major depressive
disorder are measured using a shorter version of the Center for Epidemiologic Studies
Depression Scale or CES-D (Radloff 1977). The CES-D is a key instrument to measure the
prevalence of depressive symptoms in society. Since its introduction the scale has been used
to measure depressive symptoms across several populations (elderly, adolescents, women,
clinical populations and ethnic populations). The ESS 3 thus allows us to compare gender
differences in depression across multiple countries.
An analysis of gender and cross-cultural differences in rates of depression presupposes that
the underlying structure of responses to the depression scale in the participating countries
shows measurement equivalence (Blasius & Thiessen 2006; Moors 2004; Van de Vijver
2003). The CES-D, like most other mental health assessment instruments, was initially
developed and tested on samples comprised of mainly European Americans. Most research
on the cross-cultural equivalence of the CES-D was done on populations within the United
States and in a later stage with Hispanic or Asian populations. Unfortunately, cross-national
validation of the CES-D across European countries is still lacking. Validation studies of
translated versions of the CES-D have been carried out for German populations (Hautzinger
& Bailer 1993), for Dutch populations (Bouma et al. 1995), for the Belgian population (Van de
Velde, Levecque & Bracke 2008), for Turkish and Moroccan populations in the Netherlands
(Spijker et al. 2004), for Portuguese populations (Gonçalves & Fagulha 2004) and for French
populations (Fuhrer & Rouillon 1989). A recent study by Meads et al. (2006) assessed the
comparability of the CES-D scale across UK, German and US populations but found a poor
fit for the European countries.
Previous studies on the measurement equivalence of the 20-item CES-D scale across gender
groups are also inconclusive, with several studies pointing towards a gender bias (Callahan &
Wolinsky, 1994; Cole et al, 2000; Stommel et al, 1993; Ross & Mirowsky 1984) while other
studies suggest measurement equivalence of the scale across gender groups (Berkman et al,
1986; Clark et al. 1981; Van de Velde, Levecque & Bracke 2008). Several other measurement
inventories for depression also pointed towards a gender bias (Piccinelli & Wilkinson, 2000;
Bracke 1996; Byrne et al. 1993; Mirowsky & Ross, 1995). In the ESS 3, depression was not
assessed using the CES-D 20, but respondents were administered an 8-item version of the
CES-D. While the CES-D 20 is used extensively in international research, the CES-D 8 has
seldom been used before.
Measurement equivalence and factorial invariance
When depression is measured using a multi-item self-report instrument such as the CES-D,
each item is considered an imperfect measure of one of the symptoms of depression, but as a
whole, the set of items is hoped to provide a valid indirect assessment of a latent construct
called depression. When the items are summed to form a composite measure, it is also
assumed that the total measurement score will be more reliable than single item scores
(Nunnally 1978).
Measurement equivalence or invariance is the condition that is attained when individuals with
equivalent true scores on a measurement instrument for a latent construct have the same
probability of a particular observed score on an associated test (Meredith 1993). Measurement
equivalence or invariance is the broader concept that subsumes factorial invariance. The latter
is a measurement equivalence approximation that can be tested with confirmatory factor
analysis (CFA), a special case of structural equation modeling. CFA is well suited to
determining whether a hypothesized common factor underlies a scale and is currently the
methodology of choice for assessing factorial invariance (Steenkamp & Baumgartner 1998;
Stein et al. 2006). In the jargon of factor analysis, the common factor (i.e. depression) is an
unobserved (latent) variable that is defined based on observed variables (i.e. the items of the
CES-D 8 scale). The common factor is presumed to influence responses to the items
(Meredith & Teresi 2006). CFA can thus be used to test whether this set of items constitutes an
indirect measure of our common factor ‘depression’.
While CFA allows testing the factor structure of a construct in a single population group,
multi-group CFA tests whether construct validity is invariant across two or more groups. The
ESS 3 measures depression via a self-report instrument in multiple countries. This raises the
question whether the items of our scale mean the same thing to people of these different
cultures and language groups. In cross-cultural research, three types of bias can be
distinguished: construct bias, item bias and method bias (cf. Van de Vijver & Poortinga 1997).
The first type of bias occurs when the construct measured, in this case depression, is not
identical across groups. The definition of depression or the symptoms associated with it might
differ slightly across groups, undermining the equivalence of the measurement. Item bias occurs
most often when specific items of the depression scale are translated poorly, when there is
low familiarity with the item content in certain cultures, or when cultural specifics such as
nuisance factors or connotations associated with the item wording influence the measurement
score (Van de Vijver 2003). Method bias, finally, occurs when there is sample incomparability,
instrument differences, interviewer or respondent effects, or a difference in the mode of
administration.
In the analysis, these forms of bias should be identified as measurement error. However,
commonly used approaches such as ordinary least squares assume that variables have been
measured without error (Brown 2006). Confirmatory factor analysis allows relationships to
be estimated after adjusting for measurement error under an explicit error theory. It offers a very strong
analytic framework for evaluating the equivalence of measurement models across distinct
groups (e.g. demographic groups such as sexes and cultures).
Available tests for multi-group comparison form a nested hierarchy defining several levels of
factorial invariance: dimensional, configural, metric (also called pattern), strong factorial (also
called scalar), and strict factorial invariance (Meredith 1993; Meredith & Teresi 2006; Bollen
1989; Byrne 1989; Gregorich 2006; Reise et al. 1993). A summary of these invariance
hypotheses is presented in Table 1. At each level a more restrictive hypothesis is introduced
providing increasing evidence of factorial invariance, and allowing specific group comparisons
to be made.
TABLE 1
A nested hierarchy of factorial invariance hypotheses (Source: based on Gregorich (2006))

Invariance hypothesis        | Invariance test                    | Defensible substantive quantitative group comparisons
Dimensional                  | Invariant number of common factors | None
Configural                   | + invariant item/factor clusters   | None
Metric (pattern)             | + invariant factor loadings        | Latent (co)variances
Strong factorial (scalar)    | + invariant item intercepts        | + Latent and observed means
Strict factorial (residual)  | + invariant residual variances     | + Observed (co)variances

Hypotheses on the various levels of factorial invariance start with an unrestricted model in
which no assumptions are made about the comparability of the various parameters across the
groups under scrutiny. As Table 1 shows, this is the case for dimensional and configural
factorial invariance. Respectively, they require that an instrument represents the same number
of common factors across groups, and that each common factor is associated with identical
item sets across groups. In the case of dimensional invariance, the underlying assumption is
that instruments differing in the number of factors across groups indicate qualitative
differences across these groups. In the CES-D 20 literature, the number of factors identified is
usually four, namely depressed affect, positive affect, somatic complaints, and interpersonal problems,
together loading on the common factor depression (Radloff 1977; Golding & Aneshensel 1989;
Hertzog et al. 1990; Joseph et al. 1995; Roberts et al. 1989; 1990; Shafer 2006; Sheehan et
al. 1995).
For the CES-D 8, research on the structural form of the scale could not be found. However,
based on the available items in the 8-item version and the identified structure of the full
CES-D, three structural forms can be hypothesized for the CES-D 8. The first is a one-dimensional
model, with all the items loading on one common factor, depression. An alternative form is a
two-dimensional second-order factor model, built up by the factors depressed affect and
somatic complaints, each loading on the underlying factor depression (Riddle et al. 2002;
Steffick 2000). In addition, many authors construct a distinct factor from the reverse-worded items
felt unhappy and did not enjoy life, proposing a three- rather than two-dimensional construct
(Perreira 2005). We believe, however, that the relationship among the reverse-worded items is
better accounted for by correlated errors than by a separate factor. The differential covariance
among these items is not based on the influence of a distinct, substantively important latent
dimension, but rather reflects an artifact of response styles associated with the wording of the
items (Brown 2006; Marsh 1996).
The next form of factorial invariance, namely metric or pattern invariance, tests whether the
common factors have the same meaning across groups, that is whether the factor loadings
are equal across groups. Factor loadings represent the strength of the linear relation between
each factor and its associated items (Bollen 1989). When the loading of each item on the
underlying factor is equal across groups, the unit of measurement of the underlying factor is
identical and the (co)variances of the estimated factors can be compared between groups.
Group comparisons are defensible because the meanings of corresponding common factors
are deemed invariant across groups and because the CFA model decomposes total item
variation into estimated factor components (i.e. true scores) and errors (Gregorich 2006).
Therefore, group differences in common factor variation and covariation are not contaminated
by possible group differences in residual variation.
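
The decomposition described above can be written out for a one-factor model. The notation is a common convention assumed here for illustration, not notation taken from the text: for item i in group g, \tau denotes the intercept, \lambda the loading, \xi the common factor with mean \kappa and variance \phi, and \delta a residual with variance \theta that is uncorrelated with \xi.

```latex
x_{ig} = \tau_{ig} + \lambda_{ig}\,\xi_{g} + \delta_{ig}, \qquad
\operatorname{E}(x_{ig}) = \tau_{ig} + \lambda_{ig}\,\kappa_{g}, \qquad
\operatorname{Var}(x_{ig}) = \lambda_{ig}^{2}\,\phi_{g} + \theta_{ig}
```

Under this sketch, metric invariance corresponds to the constraint \lambda_{ig} = \lambda_{i} for all groups g. Group differences in \phi_{g} are then expressed in the same unit and are separated from group differences in the residual variances \theta_{ig}.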
Once the hypotheses of dimensional, configural and metric invariance are supported, a test of
strong factorial or scalar invariance is in order. Such a test addresses the question whether
there is differential additive response bias (Rorer 1965; Cheung & Rensvold 2000). Such bias
is caused by forces – such as cultural norms - which are unrelated to the common factors, but
systematically cause higher- or lower-valued item response in one population group compared
to another. Within the CFA model, systematic additive influences on responses are reflected in
the item intercepts. Since this response style is additive, it affects observed means but not
response variation. According to Gregorich (2006) evidence that corresponding factor loadings
and item intercepts are invariant across groups suggests that 1) group differences in
estimated factor means will be unbiased and 2) group differences in observed means will be
directly related to group differences in factor means and will not be contaminated by
differential additive response bias. For most researchers comparison of group means is of
main interest. Therefore the highest level of factorial invariance, namely residual invariance, is
of limited practical value. Residual invariance, also called strict factorial invariance, allows
comparisons of observed variances or covariances across groups. It is tested by constraining
the residuals associated with each item to be equal across groups, in addition to the loadings
and the intercepts of the model.
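
The claim that an additive response style shifts observed means but leaves response variation untouched can be checked with a small numerical sketch. All values below are hypothetical, and residual error is left out to keep the illustration deterministic: both groups share the same latent scores and the same loading, and differ only in the item intercept.

```python
from statistics import mean, pvariance


def item_scores(latent, loading, intercept):
    """Error-free item responses under a linear factor model:
    x = intercept + loading * latent."""
    return [intercept + loading * xi for xi in latent]


# Hypothetical latent depression scores, identical in both groups.
latent = [-1.0, 0.0, 1.0, 2.0]

# Same loading in both groups; group B has a shifted intercept,
# which stands in for a differential additive response bias.
group_a = item_scores(latent, loading=0.8, intercept=1.0)
group_b = item_scores(latent, loading=0.8, intercept=1.5)

mean_gap = mean(group_b) - mean(group_a)                # reflects the intercept shift only
variance_gap = pvariance(group_b) - pvariance(group_a)  # unaffected by the shift
```

In this sketch the observed means differ by the size of the intercept shift although the latent means are equal, which is the contamination that the test of strong factorial invariance is meant to detect.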
In sum, a lack of measurement or factorial invariance can take several forms, contaminating estimates
of latent constructs in several ways. Multi-group CFA makes it possible to compare the means and
variances of latent constructs while correcting for possible bias due to variation across groups in
the number of common factors (dimensional invariance), the item/factor clusters (configural
invariance), factor loadings (metric invariance), item intercepts (strong factorial invariance) and
residual variances (strict factorial invariance). This is true for both the ‘full’ version of each
form of factorial invariance, and for partial factorial invariance. Factorial invariance of any of
the above-mentioned types is said to be ‘full’ when all parameters are invariant across groups
(e.g. full metric invariance means that all factor loadings are invariant across the groups under
consideration). Partial factorial invariance applies to cases in which only some model
parameters are invariant (Byrne et al. 1989). According to Steenkamp and Baumgartner (1998),
finding partial invariance suggests that the substantive group comparisons associated with the
corresponding ‘full’ invariance hypotheses are defensible, since only the subset of items
meeting the metric, strong, or strict factorial invariance criteria is used to estimate associated
group differences.
Methods
Data: The European Social Survey (ESS) 2006/2007
The European Social Survey (ESS) (http://www.europeansocialsurvey.com) (Jowell et al.
2007) is a biennial survey; its third round (ESS 3) covered 25 European countries in 2006 and 2007. The ESS is
designed to chart and explain the interaction between Europe’s changing institutions and the
attitudes, beliefs and behavior patterns of its diverse populations. In each participating country,
the ESS sample is designed following a strict randomized probability procedure and data are
gathered in face-to-face interviews. The use of proxies was not allowed. ESS information is
representative of all individuals in the general population aged 15 and older, living in a private
household, irrespective of their language, citizenship and nationality. After deleting cases
lacking minimum information on gender or depression, our unweighted sample consists of
46,669 respondents. Response rates range from 45.97 percent in France to 73.19 percent in
Slovakia.
The CES-D 8
The Center for Epidemiologic Studies Depression Scale or CES-D (Radloff 1977) is a key
instrument in the measurement of depression in American research (Perreira 2005), but is less
often used within the European context. Originally, the CES-D consisted of 20 self-report
items designed to identify populations at risk of developing depressive disorders; in itself,
however, it should not be used as a clinical diagnostic tool (Radloff 1977; Perreira 2005;
Steffick 2000). The 20 items primarily measure affective and somatic dimensions of
depression, especially reflected in complaints such as depressed mood, feelings of guilt and
worthlessness, helplessness and hopelessness, psychomotor retardation, loss of appetite,
and sleep disturbance (Radloff 1977). Respondents are asked to indicate how often in the
week preceding the survey they felt or behaved in a certain way, with answers ranging from ‘none or
almost none of the time’ to ‘all or almost all of the time’. The response categories form a 4-point
Likert scale, ranging from 0 to 3. Scale scores for the CES-D are assessed using non-weighted
summated rating and range from 0 to 60 for the CES-D 20 and from 0 to 24 for the CES-D 8,
with higher scores indicating a higher frequency of depressive complaints. International
literature shows the CES-D 20 to have good psychometric properties (Radloff 1977; Shafer
2006; Sheehan et al. 1995; Eaton et al. 2004). Shorter versions of the CES-D 20 have been used
extensively before, but research based on the 8-item version is rather scarce. Based on the
data of the ESS 3, we can confirm the reliability of the CES-D 8 for measurement of
depression within a general population context. Reliability was indicated by a total response
rate of 95% in men and 94% in women, lowest in Ukraine (77.1%) and highest in Norway
(99.8%). Respondent mean substitution was applied to respondents answering at least 5
items of the scale. Respondents who answered fewer than 5 items of the CES-D 8 scale (330
cases) or who did not report their gender (100 cases) were excluded from our analysis.
Cronbach's alpha of the CES-D 8 scale was 0.812 in male data and 0.847 in female data, with
the lowest score in Denmark (0.728) and the highest in Hungary (0.881). The items building up the
CES-D 8 are reported in Table 2. Consistent with international literature, we found higher
levels of depression in women compared to men, with a mean score of 14.8 in women and
13.7 in men (F(1, 15147.037) = 834.724; p<0.001).
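
For readers unfamiliar with the reliability coefficient reported above, the following is a minimal sketch of Cronbach's alpha for complete item responses. The function name and the toy data are hypothetical and serve only as illustration; they are not ESS data and this is not the code used for the reported values.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for complete item responses.

    rows: list of respondents, each a list of k item scores
    (no missing values, all items coded in the same direction).
    """
    k = len(rows[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    # Sample variance of the summated scale score.
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical toy data: 4 respondents, 3 items scored 0-3.
toy = [[0, 1, 0], [1, 1, 2], [2, 3, 2], [3, 3, 3]]
alpha = cronbach_alpha(toy)
```

Alpha rises as the items covary more strongly relative to their individual variances, which is why it is commonly read as an indicator of internal consistency.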
TABLE 2
Items of the 8-item version of the Center for Epidemiologic
Studies Depression Scale (Jowell et al. 2007)
How much of the time during the past week…
Answers range from 0 (None or almost none of the time) to 3 (All or almost all of the time)
…you felt depressed?
…you felt everything you did was an effort?
…your sleep was restless?
…you were happy?
…you felt lonely?
…you enjoyed life?
…you felt sad?
…you could not get going?
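
The scoring rules described for the CES-D 8 (summated rating on a 0 to 24 scale, respondent mean substitution when at least 5 items are answered) can be sketched as follows. The function name is hypothetical, and the reverse-coding of the two positively worded items (were happy, enjoyed life) is an assumption made here so that higher scores indicate more depressive complaints.

```python
# Positions (0-based, in Table 2 order) of the positively worded items
# "were happy" and "enjoyed life"; reverse-coding them is an assumption.
REVERSED = {3, 5}


def cesd8_score(responses, min_answered=5):
    """Summated CES-D 8 score on the 0-24 scale.

    responses: 8 answers coded 0-3 in Table 2 order, None for a missing answer.
    Returns None when fewer than min_answered items were answered.
    """
    coded = [
        None if r is None else (3 - r if i in REVERSED else r)
        for i, r in enumerate(responses)
    ]
    answered = [c for c in coded if c is not None]
    if len(answered) < min_answered:
        return None  # too few answers: the respondent is excluded
    # Respondent mean substitution: fill gaps with the person's own mean.
    person_mean = sum(answered) / len(answered)
    return sum(person_mean if c is None else c for c in coded)
```

A respondent who skips three items still receives a score, while one who skips four is dropped, mirroring the at-least-5-items rule stated in the text.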
Statistical procedure
In order to compare depression scores across gender and countries, the CES-D 8 scale
requires factorial invariance in both model form and model parameters (Bollen 1989). We
estimated the best fitting model using Confirmatory Factor Analysis (CFA). This model is fitted
to the data via Multiple Group Analysis using Maximum Likelihood estimations. Analysis is
conducted using the AMOS 16.0 program. The analysis follows two phases: First,
measurement invariance is hierarchically tested at each of the levels: dimensional, configural,
metric, intercept and residual invariance. Second, we estimate the factor means of the
depression construct for the different groups separately. We then compare the estimated mean
differences of our factor model with the observed mean differences of men and women.
In our invariance tests, four specific model fit indicators are used. Commonly used in
multi-group analyses is the Chi²-test, which tests the magnitude of the discrepancy between the sample
and fitted covariance matrices (Chen 2000). When Chi² is significant, the model is rejected.
However, the Chi²-test may easily lead to a type I error (and thus to an incorrect rejection of
the model) in the case of non-normal data, large sample sizes and complex models (see
Bollen (1989, 1990) for a detailed explanation of the influence of sample size on measures of
model fit). Since the first two conditions are inherent in our study, we report the Chi²-test but
add three model fit indices that showed a more robust performance in a simulation study by
Hu and Bentler (1998): the Tucker-Lewis index (TLI) (Tucker & Lewis 1973), the Comparative
Fit Index (CFI) (Bentler 1990) and the Root Mean Squared Error of Approximation (RMSEA)
(Steiger 1990). The first two indices range from 0 (poor fit) to 1 (perfect fit). A value of 0.90 or
higher provides evidence for a good fit, a value of 0.95 or above for an excellent fit (Hu &
Bentler 1998). The RMSEA indicates a reasonable fit when its score is 0.08 or less and a
good fit when the score is 0.05 or less (Browne & Cudeck 1992).
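
As a rough guide to how these indices relate to Chi², the sketch below uses textbook formulations of the RMSEA, CFI and TLI. These are assumptions for illustration and need not match the exact computations in AMOS, particularly in multi-group models. The CFI and TLI additionally require the Chi² and degrees of freedom of a baseline (independence) model, which are not reported here, so the baseline arguments are left to the reader.

```python
from math import sqrt


def rmsea(chi2, df, n):
    """Root mean squared error of approximation (single-sample formulation)."""
    return sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2, df, chi2_base, df_base):
    """Comparative fit index relative to a baseline (independence) model."""
    d_model = max(chi2 - df, 0.0)
    d_base = max(chi2_base - df_base, d_model)
    return 1.0 if d_base == 0 else 1.0 - d_model / d_base


def tli(chi2, df, chi2_base, df_base):
    """Tucker-Lewis index relative to a baseline (independence) model."""
    ratio_base = chi2_base / df_base
    return (ratio_base - chi2 / df) / (ratio_base - 1.0)


# One-factor model without correlated errors, using the Chi², df and
# unweighted sample size reported in this paper.
rmsea_one_factor = rmsea(12793.362, 20, 46669)
```

With the reported values for the one-factor model, this formulation yields an RMSEA of about 0.117, well above the 0.08 threshold for a reasonable fit.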
When exploring partial factorial invariance models in addition to ‘full’ factorial models, the
Modification Index (MI) procedure within AMOS 16.0 makes it possible to relax the restriction on specific
parameters in order to attain partial factorial invariance. The MI of a parameter is a
conservative estimate of the decrease in Chi² that would occur if the parameter were
relaxed (Arbuckle 2007). Thus we determine which cross-group equality constraint, if any,
contributes most to the lack of fit; free that constraint; re-estimate the model; and
repeat this process as needed (Gregorich 2006). We restrict ourselves to MIs that are significant
(3.84 and greater) (Byrne, Shavelson & Muthén 1989; Jöreskog & Sörbom 1984) and that can
be substantively interpreted.
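
The cut-off of 3.84 corresponds to the 5% critical value of a Chi² distribution with one degree of freedom, since each MI refers to a single constraint. The sketch below shows that link and a simple way of picking the next constraint to free. The constraint labels and MI values are hypothetical, and re-estimating the model after each release is left to the SEM software.

```python
from math import erfc, sqrt


def chi2_sf_1df(x):
    """Upper-tail probability of a Chi² variate with 1 degree of freedom."""
    return erfc(sqrt(x / 2.0))


def next_constraint_to_free(mod_indices, threshold=3.84):
    """Return the label of the constraint with the largest significant MI,
    or None when no MI reaches the threshold."""
    significant = {k: v for k, v in mod_indices.items() if v >= threshold}
    if not significant:
        return None
    return max(significant, key=significant.get)


# Hypothetical MIs for three cross-group intercept constraints.
example_mis = {"intercept_item1_groupA": 12.4,
               "intercept_item4_groupB": 3.1,
               "intercept_item7_groupC": 45.9}
p_at_cutoff = chi2_sf_1df(3.84)
candidate = next_constraint_to_free(example_mis)
```

Only one constraint is selected per step because freeing a parameter changes the MIs of the remaining ones; the substantive interpretability check mentioned in the text remains a judgment call outside this sketch.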
In the current study we aim to determine whether the CES-D 8 scale is psychometrically
equivalent across gender and countries involved in the ESS 3 by testing its factorial
invariance. The different hypotheses of factorial invariance have been tested relatively
infrequently in the past (Gregorich 2006). When they are tested, investigators have
predominantly focused on invariance of construct validity (Vandenberg & Lance 2000) or on
identifying biased items of the full 20-item scale and reducing its length (Cole 2000; Perreira 2005;
Vega & Rumbaut 1991). The current study makes use of such an abbreviated CES-D scale
and tests whether its equivalence can effectively be determined across gender and countries
in the ESS 3. The aim of our study is therefore threefold. First, we determine the best-fitting
model for our data. This baseline model is then used to test the hierarchical hypotheses of
factorial invariance, allowing us to control for gender and cultural bias in the measurement of
depression. Finally, we use our model to estimate true depression scores and compare these to
the observed scores. To the best of our knowledge, the present study is the first to present
highly comparable data on the prevalence of depression in women and men in Europe.
Results
Tests of factorial invariance hypotheses
Table 3 gives an overview of the goodness-of-fit indices of the different levels of factorial
invariance. First, the dimensional invariance hypothesis is tested by fitting a one- and a
two-dimensional model to our data (models 1a-1b). The analysis is repeated while additionally
controlling for measurement effects of the reverse-worded items felt unhappy and did not
enjoy life (models 1c-1d). All models are identified by constraining the factor loading of the item
felt depressed to 1. Results show that the one-dimensional model without any correlated
errors scores insufficiently on all four model fit indices: Chi² is significant, TLI and CFI are
below the minimum level of 0.90 and RMSEA is far above the maximum level of 0.08.
Distinguishing between a depressed affect factor and a somatic factor results in a slightly
better model fit, as indicated by CFI=0.904, but all other indices suggest that the model fits the data
unsatisfactorily. Models 1c and 1d, with correlated errors between the reverse-worded items, also
have a significant Chi², but the three other indices show excellent fit: TLI and CFI above 0.90,
RMSEA below 0.08. However, a closer look at the estimates of the two-factor model (results
not shown here) indicates that this model includes a negative error variance, making its
solution unacceptable. Based on these results we use Model 1c, with all items loading on
one dimension and with correlated errors between the reverse-worded items, as our
baseline model for the upcoming multi-group CFA. As models 1e and 1f show, additional tests
of the dimensional invariance hypothesis across gender on the one hand and across countries
on the other hand suggest that our baseline model fits well in both multi-group analyses
separately.
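
The degrees of freedom of models 1a to 1f can be reproduced by counting free parameters against the number of distinct variances and covariances among the 8 items. The sketch below assumes a covariance-only structure with one marker loading fixed to 1 per factor, and treats the two-factor model as two correlated first-order factors; these counting assumptions are ours, chosen because they agree with the Df column of Table 3.

```python
def cfa_df(n_items, n_factors=1, n_error_covs=0, n_groups=1):
    """Degrees of freedom of a simple-structure CFA on covariances only.

    Assumes one loading per factor is fixed to 1 for identification and
    that all parameters are estimated freely within each group.
    """
    moments = n_items * (n_items + 1) // 2            # distinct (co)variances
    loadings = n_items - n_factors                    # free loadings
    residuals = n_items                               # residual variances
    factor_params = n_factors * (n_factors + 1) // 2  # factor (co)variances
    free = loadings + residuals + factor_params + n_error_covs
    return n_groups * (moments - free)


df_1a = cfa_df(8)                                # one factor
df_1b = cfa_df(8, n_factors=2)                   # two factors
df_1c = cfa_df(8, n_error_covs=1)                # one factor, correlated errors
df_1d = cfa_df(8, n_factors=2, n_error_covs=1)   # two factors, correlated errors
df_1e = cfa_df(8, n_error_covs=1, n_groups=2)    # baseline model, by gender
df_1f = cfa_df(8, n_error_covs=1, n_groups=25)   # baseline model, by country
```

Each additional free parameter, such as the error covariance between the reverse-worded items, costs one degree of freedom per group.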
TABLE 3
Model fit summary: Chi², CFI, TLI and RMSEA
European Social Survey 2006-2007 (Jowell et al. 2007)

Model                               | Chi²      | Df   | Sign. | CFI   | TLI   | RMSEA
1. Dimensional                      |           |      |       |       |       |
 a. One factor                      | 12793.362 | 20   | 0.000 | 0.894 | 0.852 | 0.117
 b. Two factor                      | 11585.181 | 19   | 0.000 | 0.904 | 0.858 | 0.114
 c. One factor - correlated errors  | 2715.030  | 19   | 0.000 | 0.978 | 0.967 | 0.055
 d. Two factor - correlated errors  | 1839.087  | 18   | 0.000 | 0.985 | 0.976 | 0.047
 e. Gender                          | 2756.209  | 38   | 0.000 | 0.977 | 0.966 | 0.039
 f. Countries                       | 4421.759  | 475  | 0.000 | 0.965 | 0.949 | 0.013
2. Configural                       | 5026.743  | 950  | 0.000 | 0.964 | 0.946 | 0.010
3. Metric                           | 6895.473  | 1293 | 0.000 | 0.950 | 0.946 | 0.010
4a. Strong factorial                | 21371.629 | 1685 | 0.000 | 0.824 | 0.854 | 0.016
4b. Partial strong factorial        | 12764.462 | 1573 | 0.000 | 0.900 | 0.911 | 0.012
5a. Strict factorial                | 17314.392 | 1966 | 0.000 | 0.863 | 0.902 | 0.013
5b. Partial strict factorial        | 13276.378 | 1762 | 0.000 | 0.897 | 0.918 | 0.012

Results of Model 2 and higher show the goodness-of-fit indices of the multi-group CFA with
groups defined by gender and country simultaneously. As Model 2 in Table 3 shows,
imposing equality constraints on the underlying item sets provides evidence for configural
invariance of the CES-D 8 across gender and countries. The assumption that factor loadings
are identical in both the male and female data in the different countries is also supported by
the findings for Model 3. Although the metric invariance test shows a significant Chi², both
the CFI and TLI suggest a very good fit, as indicated by a score above 0.90. In addition, the
RMSEA is smaller than 0.08, adding evidence for metric invariance of the scale across
groups.
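
The degrees of freedom reported for Models 2 and 3 follow from simple counting, under the assumption (ours, but consistent with Table 3) that the groups are the 25 countries crossed with 2 genders, that the baseline model has 19 degrees of freedom per group, and that metric invariance equates the 7 free loadings across all groups.

```python
n_groups = 25 * 2          # countries x genders
df_per_group = 19          # baseline model 1c in a single group
free_loadings = 8 - 1      # one loading is fixed to 1 for identification

# Configural model: same structure, all parameters free in every group.
df_configural = n_groups * df_per_group

# Metric model: each free loading is set equal across groups, which adds
# (n_groups - 1) equality constraints per loading.
loading_constraints = free_loadings * (n_groups - 1)
df_metric = df_configural + loading_constraints
```

The 343 added constraints account for the full difference in degrees of freedom between the configural and metric rows of Table 3.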
The fourth model in Table 3 tests strong factorial invariance by additionally imposing equality
constraints on corresponding item intercepts. We reject this model: Chi² is significant, and all
fit indices except RMSEA are below the acceptable level. Based on the Modification Indices,
we relaxed our model restrictions in order to meet partial strong factorial invariance (Model
4b). Of all 400 constrained intercepts in our model, 112 needed to be relaxed for the model to
show an acceptable fit, with the largest decrease in Chi² in Hungarian females, followed by
Finnish males. The relaxation resulted in a total decrease in Chi² of 8607.167. These findings
suggest that comparisons across gender of factor and observed means of the CES-D 8 are
defensible.
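
The figures in this paragraph can be traced back to Table 3 with a few lines of arithmetic. The variable names are illustrative; the numbers are those reported for Models 4a and 4b.

```python
n_items = 8
n_groups = 25 * 2                           # countries x genders

n_intercepts = n_items * n_groups           # intercepts in the model
n_relaxed = 1685 - 1573                     # df of Model 4a minus df of Model 4b
chi2_drop = 21371.629 - 12764.462           # Chi² of Model 4a minus Chi² of Model 4b
share_relaxed = n_relaxed / n_intercepts    # proportion of intercepts freed
```

Roughly 28 percent of the intercepts are freed in Model 4b, so the mean comparisons that follow rest on the remaining invariant intercepts.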
In order to legitimately compare the observed (co)variances of depression in men and women,
the highest level of factorial invariance needs to be verified. This is tested by additionally
constraining all corresponding item residual variances. Model 5a shows that the hypothesis of
strict factorial invariance is empirically rejected (both CFI and TLI are below 0.90). After
relaxing all constraints with significant Modification Indices (MI>3.84), we also reject the hypothesis of partial
strict factorial invariance (Model 5b), with a CFI level still below 0.90.
In sum, the findings for all models in Table 3 indicate that the one-dimensional CES-D 8 scale
with correlated errors of the reverse-worded items can be used to compare factor means and
observed means, but not factor (co)variances and observed (co)variances between genders
and countries.
Comparison of observed means
The factorial invariance tests reported in Table 3 showed empirical evidence for the
hypothesis of partial strong factorial invariance, indicating that comparisons of observed
means of the CES-D 8 across gender and countries are warranted. Based on this model we
estimated the mean score of the CES-D 8 factor for each group. Factor scores obtained from
the model were used as weights to correct the observed scores in line with the model specification.
Because of their stable conditions in multi-group CFA, we used Estonian males as our
reference group. As the factor mean differences indicate gender differences in the overall
scores on the best-fitting factor model, irrespective of culture-specific and gendered item
response styles, they can be regarded as very conservative, gender- and culture-neutral
estimates of the gender difference in depression. Results of these estimated scores are shown
in Table 4, along with the observed means and their standard deviations.
Our observed scores indicate that female respondents score significantly higher on the CES-D
8 scale than male respondents in all countries except Ireland and Finland, where the
difference is not significant. The gender difference is largest in Portugal (∆ = 1.829, p < 0.001)
and smallest in Ireland (∆ = 0.098, n.s.). Ukrainian females and Hungarian males report the
highest depression levels of their sex, with mean scores of 17.439 and 16.133 respectively.
The lowest depression levels in each sex are reported by Norwegian females (mean of 12.494)
and Norwegian males (mean of 12.037).
The estimated scores also confirm a higher prevalence of depression in females in all
countries. This difference is significant in all countries except Ireland. The estimated scores
indicate the largest difference in depression between Portuguese males and females (∆ = 2.803),
followed by Russian males and females (∆ = 2.506). The smallest differences were found in
Ireland (∆ = 0.104) and Norway (∆ = 0.330).
The impact of our model was highest in Slovenia, with the gender difference in depression
deviating 1.039 points from the observed score. Second in line is Portugal with a difference
between estimated and observed scores of 0.975 points, and third is France with a difference of
0.853 points. Model correction in scores was smallest in Ireland (0.007 points), the
Netherlands (0.077 points) and Norway (0.128 points).
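The model impact described above is the absolute gap between the estimated and the observed gender difference. A minimal sketch that recomputes it for the countries named in the text, using the ∆ values as printed in Table 4 (the dictionary layout and names are illustrative):

```python
# Gender differences (female minus male) as printed in Table 4:
# country -> (observed difference, estimated difference)
deltas = {
    "Slovenia": (0.880, 1.919),
    "Portugal": (1.829, 2.803),
    "France": (1.352, 2.205),
    "Ireland": (0.098, 0.104),
    "Netherlands": (1.169, 1.246),
    "Norway": (0.458, 0.330),
}

# Model impact: absolute gap between estimated and observed difference.
impact = {c: round(abs(est - obs), 3) for c, (obs, est) in deltas.items()}

# List countries from largest to smallest model impact.
for country, value in sorted(impact.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{country:12s} {value:.3f}")
```

Because the tabulated differences are themselves rounded, the recomputed gaps can deviate from the figures quoted in the text by one unit in the third decimal.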
TABLE 4
Comparison of observed mean scores and standard deviations with
estimated mean scores and standard deviations.
European Social Survey 2006-2007 (Jowell et al. 2007)

                 OBSERVED SCORES                                ESTIMATED SCORES
                 Male            Female                         Male            Female
Country            Mean  S.D.      Mean  S.D.    ∆; p             Mean  S.D.      Mean  S.D.    ∆; p
Austria          13.227  3.769   13.754  4.161   0.527; 0.001   11.028  3.314   11.855  3.821   0.828; 0.000
Belgium          12.738  3.768   14.070  4.335   1.331; 0.000   10.663  3.320   12.393  4.104   1.730; 0.000
Bulgaria         15.247  4.788   16.539  5.018   1.292; 0.000   13.707  4.802   15.562  5.147   1.856; 0.000
Switzerland      12.387  3.138   13.080  3.509   0.692; 0.000    9.675  2.557   10.749  3.031   1.073; 0.000
Cyprus           12.457  3.144   14.022  3.902   1.565; 0.000    9.021  2.308   11.290  3.270   2.269; 0.000
Germany          13.653  3.470   14.490  3.917   0.837; 0.000   10.542  2.909   12.072  3.518   1.530; 0.000
Denmark          12.496  3.117   13.023  3.507   0.527; 0.002    9.081  2.469   10.301  3.018   1.220; 0.000
Estonia          14.315  3.772   15.163  4.142   0.848; 0.000   11.650  3.249   13.265  3.777   1.615; 0.000
Spain            12.858  3.810   14.348  4.531   1.490; 0.000   11.266  3.634   13.355  4.499   2.089; 0.000
Finland          12.817  3.140   13.108  3.484   0.291; 0.057    9.025  2.371   10.165  2.969   1.141; 0.000
France           12.898  3.739   14.250  4.625   1.352; 0.000   10.486  3.251   12.691  4.389   2.205; 0.000
United Kingdom   13.444  4.028   14.162  4.299   0.718; 0.000   11.214  3.469   12.166  3.910   0.952; 0.000
Hungary          16.133  4.984   17.084  5.206   0.951; 0.000   15.472  5.081   16.821  5.326   1.349; 0.000
Ireland          12.837  3.669   12.934  3.612   0.098; 0.579   10.443  3.163   10.548  3.065   0.104; 0.489
Latvia           15.478  3.847   16.206  3.769   0.728; 0.000   13.462  3.619   14.386  3.565   0.925; 0.000
Netherlands      12.710  3.495   13.878  3.947   1.169; 0.000   10.534  3.029   11.780  3.539   1.246; 0.000
Norway           12.037  3.008   12.494  3.222   0.458; 0.002    9.899  2.687   10.229  2.834   0.330; 0.013
Poland           13.921  4.425   15.285  5.148   1.364; 0.000   12.153  4.101   14.295  5.184   2.142; 0.000
Portugal         14.637  3.951   16.466  4.738   1.829; 0.000   12.484  3.768   15.288  4.933   2.803; 0.000
Romania          14.780  3.827   15.921  4.019   1.140; 0.000   11.659  3.387   13.300  3.735   1.641; 0.000
Russian Fed.     15.116  4.348   16.828  4.667   1.712; 0.000   13.670  4.196   16.176  4.687   2.506; 0.000
Sweden           12.455  3.411   13.520  4.146   1.064; 0.000    9.609  2.814   11.360  3.750   1.751; 0.000
Slovenia         13.327  3.280   14.207  4.225   0.880; 0.000   10.694  2.756   12.613  4.036   1.919; 0.000
Slovakia         15.181  3.817   15.694  4.083   0.513; 0.007   12.366  3.308   13.665  3.671   1.300; 0.000
Ukraine          15.686  4.674   17.439  5.068   1.754; 0.000   14.163  4.457   16.405  4.941   2.242; 0.000
Total            13.670  3.957   14.813  4.496   1.143; 0.000   11.287  3.743   13.035  4.500   1.749; 0.000

We used general linear modeling to decompose the variance in the observed and the true
mean depression scores into within-gender and between-gender components and into
within-country and between-country components. Results (not reported here) show that modifying the
depression scores according to the best fitting factor model drastically enhances the
share of the between-gender and the between-country variance in the total
variance. For the observed depression scores the between-gender variance equals
1.1 % of the total variance, while for the true depression scores it equals 3.0 %.
The analogous figures for the between-country variance component are 7.5 % and 16.1 %. This
shift points to a drastically enhanced reliability of the indicator.
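The between-group share of variance referred to above can be illustrated with a one-way decomposition of sums of squares. This is a simplified sketch under stated assumptions, not the general linear model used in the analysis; the function name and the toy scores are hypothetical and are not ESS data:

```python
from statistics import mean

def between_share(groups):
    """Share of the total sum of squares that lies between groups (eta squared)."""
    all_scores = [x for scores in groups.values() for x in scores]
    grand = mean(all_scores)
    ss_total = sum((x - grand) ** 2 for x in all_scores)
    ss_between = sum(len(s) * (mean(s) - grand) ** 2 for s in groups.values())
    return ss_between / ss_total

# Hypothetical toy scores, not ESS data.
toy = {"men": [10, 12, 11, 13], "women": [12, 14, 13, 15]}
share = between_share(toy)
print(round(share, 3))  # 0.444
```

A larger share means that more of the total variation in scores is attributable to group membership rather than to differences between individuals within a group.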
The differences in the ranking of the countries based on the observed depression scores and the
estimated factor mean scores signal the impact of gender- and culture-related symptom
differences in depression. For instance, depression in Portuguese women is more strongly
related to feelings of sadness and to not being able to get going, while among Finnish women,
as among Hungarian women, feeling depressed was pivotal. Among Austrian
and Hungarian men the feeling that everything they did was an effort carried more weight, while
among Portuguese men this complaint was rather irrelevant. Feelings of loneliness are
important among Swiss and Danish men. Overall, the symptoms depressed, effort and get
going showed the largest gender*country variation in expression. Finally, overall modifications
of the factor model were largest among Portuguese, Hungarian, Finnish and
Ukrainian respondents.
Discussion
Simultaneous analysis of multiple groups places higher demands on the measurement scale
than single-group research. It requires that instruments measure constructs with the same
meaning across groups and allow defensible quantitative group comparisons. In this study, we
used a scale that measures depression by assessing the frequency and occurrence of certain
depressive symptoms. Depression is thus considered to be a latent construct whose
properties are inferred from observing the set of variables that serve as manifest indicators. If
the mean depression score of men differs from that of women across countries, what can be
concluded? It may be the case that these groups actually differ in their level of depression, but it
may also be the case that extraneous influences are giving rise to the observed differences
(Meredith & Teresi 2006). Therefore factorial invariance needs to be tested in quantitative
comparative research. Unfortunately, factorial invariance has been tested relatively infrequently
in past research. When it was tested, researchers predominantly focused on invariance of the
factor construct (Vandenberg & Lance 2000). The number of studies that determine whether
comparisons of group means are defensible has increased in recent years, but such tests
are not yet widely applied and remain scattered across different domains. Lack of requisite
technical skills or lack of awareness might explain the scarcity of these types of invariance studies
(Gregorich 2006).
In the present study we established factorial invariance at all levels except strict factorial
invariance. Strong factorial invariance was partially established after relaxing a substantial
number of intercepts. Next we estimated depression mean scores across gender and
countries, eliminating a measurement artifact. Our results indicate that the CES-D 8 scale can
be used to compare mean differences in depression of men and women across countries,
showing good reliability and validity. Our study, based on the third round of the European
Social Survey and covering 25 European countries, confirms the consistent epidemiological finding
that women report more complaints of depression than men. Our results also suggest that the
true gender difference in depression is larger than the difference observed in our respondents'
answers to the 8-item short version of the CES-D.
Some limitations of our study are worth noting when interpreting the results. When testing
factorial invariance in large community samples such as the ESS3, the researcher should also
bear in mind that the variables of interest are often non-normally distributed, specifically when
working with ordinal Likert scales (Lubke & Muthen 2004). However, the maximum likelihood
estimation method assumes that data have a normal distribution. In our analyses, we tested
the robustness of our findings by additionally estimating a Bollen-Stine significance level via
bootstrapping, a procedure compensating for the normality assumption (Fouladi 1997, Nevitt &
Hancock 1997). Results (not shown) did not indicate a different significance level than the one
reported for the Chi² tests. An additional robustness test was based on a logarithmic
transformation of the CES-D 8 data, decreasing the non-normality of the item and scale score
distributions. This procedure results in better fit-indices (not shown), but it simultaneously
increases the complexity of a substantive interpretation of the parameter estimates. It is important
to note that the hypotheses of factorial invariance were supported by all estimation
methods, even after controlling for non-normality.
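The effect of a logarithmic transformation on non-normality can be illustrated by comparing skewness before and after the transformation. The sketch below uses a moment-based skewness estimate and hypothetical right-skewed scores (not ESS data); the helper name is illustrative:

```python
import math
from statistics import mean

def skewness(xs):
    """Moment-based sample skewness (third standardized moment)."""
    m = mean(xs)
    n = len(xs)
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

# Hypothetical right-skewed scale scores; not ESS data.
raw = [8, 8, 9, 9, 10, 10, 11, 12, 14, 17, 22, 30]
logged = [math.log(x) for x in raw]

print(f"raw: {skewness(raw):.2f}, logged: {skewness(logged):.2f}")
```

For scores with a long right tail, as in this toy sample, the logged values are less skewed, at the cost of parameter estimates that are expressed on a log scale.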
In our analysis some intercepts were found to vary across our groups, whereas others were
invariant, resulting in partial strong factorial invariance. We based our decision to relax a
given intercept on significant modification indices as provided by AMOS 16.0. This was only
done if the relaxed intercept could be substantively interpreted. How should we then
interpret our partial strong factorial model? In our analysis 112 of the 400 intercepts were
relaxed. There are three options for handling the scale. One option might be to omit the
items that were found to perform differently across groups. A second option might be to retain
all 8 items of the CES-D 8 scale in the belief that the population differences in factor structure
are “small” in some sense and that these differences will not obscure inferences from the
scale. A third option might be to abandon the use of the scale altogether for comparisons
across populations, reasoning that the lack of invariance establishes that the scale is
measuring different latent variables in the two populations. Unfortunately the current literature
on factorial invariance offers little guidance in choosing among these three options (Millsap &
Kwok 2004).
Finally, the current findings do not automatically imply psychometric equivalence across social
groups distinguished by other criteria such as language, ethnicity, social class or age. All
these social groups may have group-specific attributes that lead to measurement
inequivalence of (self-report) scales. We therefore strongly recommend testing factorial invariance
before comparing specific group scores. The actual experience and expression of depression
may vary sufficiently according to other demographic and social or cultural factors to
effectively undermine attempts to compare rates of depressive symptoms across all groups.
Further research is needed to determine the extent to which these factors influence responses
to self-report instruments.
Conclusion
The CES-D 8 can be considered a reliable and valid measurement instrument for depression
within the European context at the level of partial strong factorial invariance. Our model allows
us to compare observed means, but not observed (co)variances since the level of strict
factorial invariance was not met. A one-dimensional depression model, with all items loading
on the factor depression and with correlated errors between the reverse-worded items felt
unhappy and did not enjoy life, fits the data best. Consistent with international literature, we
found higher levels of depression in women compared to men, but our analyses suggest that
the true gender difference in depression is larger than the one observed in the ESS3.
References
Alonso J, Angermeyer MC, Bernert S, et al. (2004). Prevalence of mental disorders in
Europe: Results from the European Study of the Epidemiology of Mental Disorders
(ESEMeD) project. Acta Psychiatrica Scandinavica; 109: 21-27.
Angst J, Gamma A, Gastpar M, et al. (2002). Gender differences in depression.
Epidemiological findings from the European DEPRES I and II studies. European
Archives of Psychiatry and Clinical Neuroscience; 252(5): 201-209.
Arbuckle JL. (2007). AMOS 16.0 User’s Guide. SPSS Inc.
Ayuso-Mateos JL, Vázquez-Barquero JL, Dowrick C, et al. (2001). Depressive disorders in
Europe: prevalence figures from the ODIN study. British Journal of Psychiatry; 179:
308-316.
Bentler PM. (1990). Comparative fit indexes in structural models. Psychological Bulletin; 107:
238-246.
Berkman LF, Berkman CS, Kasl S, Freeman DH, Leo L, Ostfeld AM,
Cornoni-Huntley J, Brody JA. (1986). Depressive symptoms in relation to physical
health and functioning in the elderly. American Journal of Epidemiology; 124: 372-388.
Blasius J, Thiessen V. (2006). Assessing data quality and construct comparability in
cross-national surveys. European Sociological Review; 22(3): 229-242.
Bollen KA. (1989). A new incremental fit index for general structural equation models.
Sociological Methods & Research; 17: 303-316.
Bollen KA. (1989). Structural equations with latent variables. New York: Wiley.
Bollen KA. (1990). Overall fit in covariance structure models – 2 types of sample-size effects.
Psychological Bulletin; 107: 256-259.
Börsch-Supan A, Brugiavini A, Jürges H, Mackenbach J, Siegrist J, Weber G. (2005). Health,
Ageing and Retirement in Europe – First Results from the Survey of Health, Ageing
and Retirement in Europe. Mannheim: MEA.
Bouma J, Ranchor AV, Sanderman R, Sonderen E. (1995). Symptomen van depressie
(CES-D). Het meten van symptomen van depressie met de CES-D, een handleiding.
(available at
http://www.rug.nl/gradschoolshare/research_tools/assessment_tools/CESD_handleiding.pdf)
Bracke, P. (1996). Geslachtsverschillen in depressief gedrag in een representatieve
steekproef van de Vlaamse bevolking: de validiteit van een zelfrapportageschaal.
Archives of Public Health; 54: 275-300.
Brown TA. (2006). Confirmatory factor analysis for applied research. New York, NY: The
Guilford Press.
Browne MW, Cudeck R. (1992). Alternative ways of assessing model fit. Sociological Methods
& Research; 21: 230-258.
Byrne BM, Baron P, Campbell TL. (1993). Measuring Adolescent Depression: Factorial
Validity and Invariance of the Beck Depression Inventory Across Gender. Journal of
Research on Adolescence; 3: 127-143.
Byrne BM, Shavelson RJ, and Muthén B. (1989). Testing for the equivalence of factor
covariance and mean structures: The issue of partial measurement invariance.
Psychological Bulletin 105: 456–466.
Byrne BM. (1989). Multigroup Comparisons and the Assumption of Equivalent Construct
Validity Across Groups: Methodological and Substantive Issues. Multivariate
Behavioral Research; 24: 503-523.
Callahan CM, Wolinsky FD. (1994). The effect of gender and race on the measurement
properties of the CES-D in older adults. Medical Care; 32: 341-356
Chen FF, Sousa KH, West SG. (2005). Testing measurement invariance of second-order
factor models. Structural Equation Modeling – A Multidisciplinary Journal; 12: 471-492.
Cheung GW, Rensvold RB. (2000). Assessing extreme and acquiescence response sets in
cross-cultural research using structural equations modeling. Journal of Cross-Cultural
Psychology; 31: 187-212.
Clark VA, Aneshensel CS, Frerichs RR, Morgan TM. (1981). Analysis of effects of sex and
age in response to items on the CES-D scale. Psychiatry Research; 5: 171-181.
Cole SR, Kawachi I, Maller SJ, Berkman LF. (2000). Test of item-response bias in the CES-D
scale: experience from the New Haven EPESE Study. Journal of Clinical
Epidemiology; 53: 285-289
Eaton WW, Muntaner C, Smith C, Tien A, Ybarra M. (2004). Center for Epidemiologic Studies
Depression Scale: Review and Revision (CESD and CESDR). The Use of
Psychological testing for Treatment Planning and Outcomes Assessment. (available at
http://apps1.jhsph.edu/weaton/cesd-r/cesdrpaper.pdf)
Fombonne E. (1994). Increased Rates Of Depression - Update Of Epidemiologic Findings And
Analytical Problems. Acta Psychiatrica Scandinavica ; 90(3): 145-156
Fouladi, R.T. (1997). Covariance structure analysis techniques under conditions of multivariate
normality and nonnormality-modified and bootstrap based test statistics. Paper
presented at the annual meeting of the American Educational Research Association,
San Diego.
Fuhrer R, Rouillon F. (1989). La version Française de l’échelle CES-D. Psychiatrie et
Psychologie; 4: 163-166.
Golding JM, Aneshensel CS. (1989) Factor structure of the Center of Epidemiologic Studies
Depression Scale among Mexican Americans and Non-Hispanic Whites.
Psychological Assessment: Journal of Consulting and Clinical Psychology; 1: 163-168.
Goncalves B, Fagulha T. (2004). The Portuguese version of the Center for Epidemiologic
Studies Depression Scale (CES-D). European Journal of Psychological
Assessment; 20(4): 339-348.
Gregorich SE. (2006). Do self-report instruments allow meaningful comparisons across
diverse population groups? Testing measurement invariance using the confirmatory
factor analysis framework. Medical Care; 44: S78-S94.
Harkness JA, Van de Vijver FJR, Mohler PPH. (2003). Cross-cultural survey methods. New
York: Wiley.
Hautzinger M, Bailer M. (1993). Allgemeine Depressions-Skala. Beltz, Weinheim.
Hertzog C, Van Alstine JV, Usala PD, Hultsch DF, Dixon R. (1990). Measurement Properties
of the Center for Epidemiologic Studies Depression Scale (CES-D) in older
populations. Psychological Assessment: Journal of Consulting and Clinical
Psychology; 2: 64-72.
Hu LT, Bentler PM. (1998). Fit indices in covariance structure modeling: Sensitivity to
underparameterized model misspecification. Psychological Methods; 3(4): 424-453
Jöreskog KG, Sörbom D. (1984). LISREL VI user’s guide. 3rd ed. Mooresville, IN: Scientific
Software.
Joseph S, Lewis CA. (1995). Factor Analysis of the Center for Epidemiologic Studies
Depression Scale. Psychological Reports; 76: 40-42.
Jowell R and the Central Coordinating Team, European Social Survey 2006/2007. (2007)
Technical Report, London: Centre for Comparative Social Surveys, City University
(available at http://www.europeansocialsurvey.com)
Kessler RC, McGonagle KA, Swartz M, Blazer DG, Nelson CB. (1993). Sex and depression in
the National Comorbidity Survey I: Lifetime prevalence, chronicity, and recurrence.
Journal of Affective Disorders; 29: 85-96.
Kumata, H., Schramm, W. (1956). A pilot study of cross cultural meaning. Public Opinion
Quarterly 20: 229–238.
Lepine JP, Gastpar M, Mendlewicz J, et al. (1997). Depression in the community: The first
pan-European study DEPRES (Depression Research in European Society).
International Clinical Psychopharmacology; 12(1): 19-29.
Lubke GH, Muthen BO. (2004). Applying multigroup confirmatory factor models for continuous
outcomes to Likert scale data complicates meaningful group comparisons. Structural
Equation Modeling – A Multidisciplinary Journal; 11: 514-534.
Maier W, Gansicke M, Gater R, Rezaki M, Tiemens B, Urzua RF (1999). Gender differences in
the prevalence of depression: a survey in primary care. Journal Of Affective Disorders;
53(3): 241-252.
Marsh HW. (1996). Positive and negative global self-esteem: A substantively meaningful
distinction or artifactors? Journal of Personality and Social Psychology; 70: 810-819.
Meads DM, McKenna SP, Doward LC. (2006). Assessing the cross-cultural comparability of
the centre for epidemiologic studies depression scale (CES-D). Value In Health; 9(6):
A324-A324.
Meredith W, Teresi JA. (2006). An essay on measurement and factorial invariance. Medical
Care; 44: S69-S77.
Meredith W. (1993). Measurement invariance, factor-analysis and factorial invariance.
Psychometrika; 58: 525-543.
Millsap RE, Kwok OM. (2004). Evaluating the impact of partial factorial invariance on selection
in two populations. Psychological Methods; 9(1): 93-115.
Mirowsky J, Ross CE. (1995). Sex differences in distress – Real or artifact. American
Sociological Review; 60: 449-468.
Moors G. (2004). Facts and artefacts in the comparison of attitudes among ethnic minorities. A
multigroup latent class structure model with adjustment for response style behavior.
European Sociological Review; 20(4): 303-320.
Nevitt, J., Hancock, G.R. (1997). Relative performance of rescaling and resampling
approaches to model Chi-square and parameter standard error estimation in structural
equation modeling. Paper presented at the annual meeting of the American
Educational Research Association, San Diego, April 14, 1997.
Nunnally JC. (1978). Psychometric theory (2nd ed). New York: McGraw-Hill.
Perreira KM, Deeb-Sossa N, Harris KM, Bollen K. (2005). What are we measuring? An
evaluation of the CES-D across race/ethnicity and immigrant generation. Social
Forces; 83: 1567-1601.
Piccinelli M, Wilkinson G. (2000). Gender differences in depression. British Journal of
Psychiatry; 177: 486-492.
Radloff LS. (1977). The CES-D Scale: A self-report depression scale for research in the
general population. Applied Psychological Measurement; 1: 385-401.
Reise SP, Widaman KF, Pugh RH. (1993). Confirmatory factor-analysis and item response
theory – 2 approaches for exploring measurement invariance. Psychological Bulletin;
114: 552-566.
Riddle AS, Blais MR, Hess U. (2002). A Multi-Group Investigation of the CES-D's
Measurement Structure across Adolescents, Young Adults, and Middle Aged Adults.
Scientific Series, CIRANO, Quebec. (available at
http://www.cirano.qc.ca/pdf/publication/2002s-36.pdf)
Roberts R E, Vernon SW, Rhodes HM. (1989). Effects of language and ethnic status on
reliability and validity of the Center for Epidemiologic Studies-Depression Scale with
psychiatric patients. The Journal of Nervous and Mental Disease; 177: 581-592.
Roberts RE, Andrews JA, Lewinsohn PM, Hops H. (1990). Assessment of depression in
adolescents using the Center for Epidemiologic Studies Depression Scale.
Psychological Assessment; 2: 122-128.
Rorer LG. (1965). The great response-style myth. Psychological Bulletin; 63: 129-156.
Ross CE, Mirowsky JJ. (1984). Socially-desirable response and acquiescence in a
cross-cultural survey of mental health. Journal of Health and Social Behavior; 25: 189-197.
Shafer AB. (2006). Meta-analysis of the factor structures of four depression questionnaires:
Beck, CES-D, Hamilton, and Zung. Journal of Clinical Psychology; 62: 123-146.
Sheehan TJ, Fifield J, Reisine S, Tennen H. (1995). The measurement structure of the
Center for Epidemiologic Studies Depression Scale. Journal of Personality
Assessment; 64: 507-521.
Spijker J, van der Wurff FB, Poort EC, Smits CHM, Verhoeff AP, Beekman ATF. (2004).
Depression in first generation labour migrants in Western Europe: the utility of the
Center for Epidemiologic Studies Depression Scale (CES-D). International Journal Of
Geriatric Psychiatry; 19(6): 538-544.
Steenkamp JBEM, Baumgartner H. (1998). Assessing measurement invariance in
cross-national consumer research. Journal of Consumer Research; 25: 78-90.
Steffick D. (2000). Documentations of affective functioning measures in the Health and
Retirement Study. HRS/AHEAD Documentation Report DR-005. Survey Research
Center, University of Michigan, Ann Arbor, MI. (available at
http://www.umich.edu/∼hrswww/docs/userg/index.html)
Steiger JH. (1990). Structural model evaluation and modification – an interval estimation
approach. Multivariate Behavioral Research; 25: 173-180.
Stein JA, Lee JW, Jones PS. (2006). Assessing cross-cultural differences through use of
multiple-group invariance analyses. Journal of Personality Assessment; 87: 249-258.
Stommel M, Given BA, Given CW, Kalaian HA, Schulz R, McCorkle R. (1993). Gender
bias in the measurement properties of the Center for Epidemiologic Studies
Depression Scale (CES-D). Psychiatry Research; 49: 239-250.
Tucker LR, Lewis C. (1973). Reliability coefficients for maximum likelihood factor-analysis.
Psychometrika; 38: 1-10.
Van de Velde S, Levecque K, Bracke P. (2008) Measurement Equivalence of the CES-D 8 in
the General Population in Belgium: a Gender Perspective.
Van de Vijver FJR, Poortinga YH. (1997). Towards an integrated analysis of bias in
cross-cultural assessment. European Journal of Psychological Assessment; 13: 29-37.
Van de Vijver F. (2003). Bias and equivalence, pp. 143-155 in J. Harkness, F. Van de Vijver,
P. Mohler (eds.), Cross-cultural Survey Methods. New Jersey: Wiley & Sons.
Vandenberg RJ, Lance CE. (2000). A review and synthesis of the measurement invariance
literature: suggestions, practices, and recommendations for organizational research.
Organizational Research Methods; 3: 4-69.
Vega WA, Rumbaut RG. (1991). Ethnic minorities and mental health. Annual Review of
Sociology 17: 351–383.
Wauterickx N, Bracke P. (2005). Unipolar depression in the Belgian population - Trends and
sex differences in an eight-wave sample. Social Psychiatry And Psychiatric
Epidemiology; 40(9): 691-699.
Weissman MM, Wickramaratne P, Greenwald S, et al. (1992). The Changing Rate Of Major
Depression - Cross-National Comparisons. Journal Of The American Medical
Association; 268(21): 3098-3105.
Weissman MM, Klerman GL. (1977). Sex differences and the epidemiology of depression.
Archives of General Psychiatry; 34: 98-111.
Weissman MM, Leaf PJ, Holzer CE, Myers JK, Tischler GL. (1984). The epidemiology of
depression: An update on sex differences in rates. Journal of Affective Disorders; 7:
179-188.
World Health Organisation. Health for all database 2000. Geneva: WHO.