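A TIME TO LEARN AND SHARE

Testing measurement invariance of the English paper of the CIFA international freight forwarder examination
Testing cross-gender and cross-region construct validity in the CIFA English test for certification of freight forwarders
王占礼 (Wang Zhanli)

Measurement invariance (MI)
Measurement invariance is a measurement term first proposed by Drasgow, who borrowed a similar concept from item response theory (IRT). It means that, when phenomena are observed and studied under different conditions, the measurement operations yield measures of the same attribute. Depending on what is being tested, measurement invariance forms four levels, from lowest to highest: configural invariance, weak invariance, strong invariance, and strict invariance.

Levels of invariance
Configural invariance, also called structural invariance, means that the basic structural relationships between latent and manifest variables are the same across groups: each latent variable is measured by the same manifest variables, but the corresponding parameters are not required to be equal.
Weak invariance, also called factor loading invariance, means that the factor loadings are equal across groups. Each manifest variable then has the same unit in every group: a one-unit change in the latent variable produces the same amount of change in the manifest variable in each group.
Strong invariance, also called intercept invariance, means that the intercepts of the manifest variables, when predicted from the latent variables, are equal across groups. Strong invariance implies that the measurement has an equivalent reference point across groups, so cross-group differences in the manifest variables fully reflect cross-group differences in the latent variables being measured. In other words, cross-group comparison of means is meaningful.
Strict invariance, also called error invariance, means that the measurement error of each manifest variable has the same variance across groups. At this level, cross-group tests of homogeneity of variance are meaningful.
Statistically, the four levels are hierarchically nested: testing a higher level is meaningful only after the level below it has been confirmed. Measurement invariance testing therefore proceeds in this order.

Construct

Literature review
We often have several groups in our analyses: different cultures, regions, or countries. In order to compare relationships between constructs, or means, across groups, we need a certain level of invariance of the constructs across those groups. The meaning of invariance is “whether or not, under different conditions of observing and studying phenomena, measurement operations yield measures of the same attribute” (Horn and McArdle 1992, 117).

Techniques to test invariance
Various techniques have been developed to test measurement invariance (De Beuckelaer, 2005). Multiple-group confirmatory factor analysis (MGCFA: Jöreskog 1971) is among the most powerful.

Configural Invariance (1)
The lowest level of invariance is ‘configural’ invariance. Configural invariance requires that the items in the measuring instrument exhibit the same configuration of loadings in each of the different countries. That is, the confirmatory factor analysis confirms that the same items measure each construct in all countries (or groups) in the cross-national (or cross-group) study.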
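The four levels defined above can be restated compactly in conventional MGCFA notation. The symbols are not in the original slides and are assumed here for illustration: for group g, x is the vector of observed items, ξ the latent variables, τ the item intercepts, Λ the loading matrix, and Θ the covariance matrix of the measurement errors δ. Each level adds a constraint to the one before it, which is why the tests are nested.

```latex
% Notation assumed for illustration; not taken from the slides.
\begin{align*}
\text{Model in group } g:\quad
  & x^{(g)} = \tau^{(g)} + \Lambda^{(g)} \xi^{(g)} + \delta^{(g)},
    \qquad \operatorname{Cov}\bigl(\delta^{(g)}\bigr) = \Theta^{(g)} \\
\text{Configural:}\quad
  & \Lambda^{(g)} \text{ has the same pattern of free and fixed loadings for all } g \\
\text{Weak (metric):}\quad
  & \Lambda^{(1)} = \Lambda^{(2)} = \dots = \Lambda^{(G)} \\
\text{Strong (scalar):}\quad
  & \text{weak, and } \tau^{(1)} = \tau^{(2)} = \dots = \tau^{(G)} \\
\text{Strict:}\quad
  & \text{strong, and } \Theta^{(1)} = \Theta^{(2)} = \dots = \Theta^{(G)}
\end{align*}
```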
Configural Invariance (2)
Configural invariance is supported if (a) a single model specifying which items measure each construct fits the data well, (b) all item loadings are substantial and significant, (c) there are no large modification indices, and (d) the correlations between the factors are less than one. The last requirement guarantees discriminant validity between the factors (Steenkamp and Baumgartner 1998).

Measurement invariance (1)
Configural invariance does not ensure that people in different nations understand the items in the same way. The factor loadings may still differ across countries. The test of the next higher level of invariance, ‘measurement’ or ‘metric’ invariance, requires that the factor loadings between items and constructs are invariant across nations.

Measurement invariance (2)
It is tested by constraining the factor loading of each item on its corresponding construct to be the same across groups. Measurement invariance is supported if the model cannot be significantly improved by releasing some of the constraints.

Partial measurement invariance (1)
However, for cross-cultural comparison to be allowed, it is not necessary that all factor loadings are equal. Several scholars have suggested that it is enough to have two equal factor loadings per construct across countries to allow comparison of effects. They termed this partial measurement (metric) invariance (Byrne, Shavelson, and Muthén 1989; Steenkamp and Baumgartner 1998).

Scalar invariance (1)
A third level of invariance is necessary to allow comparison of the means of the underlying constructs across countries. This is often a central goal of cross-national research. Such comparisons are meaningful only if ‘scalar’ invariance of the items is ensured. Scalar invariance guarantees that cross-country differences in the means of the observed items are a result of differences in the means of their corresponding constructs.
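To illustrate why scalar invariance matters for mean comparison, here is a minimal Python sketch. It assumes the usual linear measurement model, in which the expected mean of an item equals its intercept plus its loading times the latent mean. The function names and all numbers are hypothetical and are not results from the CIFA data.

```python
# Illustrative only: every number below is made up, not a CIFA result.


def expected_item_mean(intercept, loading, latent_mean):
    """Model-implied mean of an observed item: intercept + loading * latent mean."""
    return intercept + loading * latent_mean


def item_mean_gap(params_a, params_b):
    """Difference in model-implied item means between group A and group B."""
    return expected_item_mean(**params_a) - expected_item_mean(**params_b)


# Case 1: scalar invariance holds (same intercept and loading in both groups).
# The latent means differ by 0.5, so the item means differ by loading * 0.5.
invariant_a = {"intercept": 2.0, "loading": 0.8, "latent_mean": 0.5}
invariant_b = {"intercept": 2.0, "loading": 0.8, "latent_mean": 0.0}

# Case 2: the intercepts differ across groups while the latent means are equal.
# The item means still differ, although the construct means do not.
biased_a = {"intercept": 2.4, "loading": 0.8, "latent_mean": 0.0}
biased_b = {"intercept": 2.0, "loading": 0.8, "latent_mean": 0.0}

gap_invariant = item_mean_gap(invariant_a, invariant_b)
gap_biased = item_mean_gap(biased_a, biased_b)

print(f"equal intercepts:   item mean gap = {gap_invariant:.2f}")
print(f"unequal intercepts: item mean gap = {gap_biased:.2f}")
```

Both cases give the same observed gap of 0.40, but only in the first case does the gap reflect a difference in the construct. Observed means alone cannot tell the two cases apart, which is why the intercept constraints have to be tested.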
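Scalar invariance (2)
To assess scalar invariance, one constrains the intercepts of the underlying items to be equal across countries. It is supported if the model fits the data well and if the fit cannot be improved by releasing some of the equality constraints.

Invariance: summary
Meaningful comparison of construct means across countries requires three levels of invariance: configural, metric, and scalar. Meaningful comparison of relationships between constructs requires two levels of invariance: configural and metric. Only if all the required types of invariance are supported can we confidently carry out comparisons.

Introduction to the CIFA examination
The CIFA international freight forwarder examination is a professional certification examination commissioned by the former Ministry of Foreign Trade and Economic Cooperation (now the Ministry of Commerce) and organized and administered by the China International Freight Forwarders Association (CIFA). Since its launch in 2002, nearly 160,000 people have taken the examination, and nearly 60,000 of them have obtained the certificate (China International Freight Forwarders Association, 2011). Test centers are located in provinces and cities across the country, and the examination is highly regarded and widely recognized in the industry. Participating colleges are often compared with one another, and the results have a strong feedback effect on English teaching at those institutions.
The examination is authoritative, large in scale, and high-stakes. It therefore has to be scientific and rigorous, and in particular fair and impartial to candidates from different groups (gender, region, and so on), with good cross-group measurement invariance and cross-group validity. Only then are score interpretation and between-group comparisons meaningful.

AMOS
Structural equation modeling (SEM) covers a range of statistical techniques, such as path analysis, confirmatory factor analysis, causal models with latent variables, and even analysis of variance and multiple linear regression. AMOS is a software package for fitting structural equation models.
Amos is short for Analysis of Moment Structures. It implements the general approach to data analysis known as structural equation modeling (SEM), also known as analysis of covariance structures, or causal modeling. This approach includes, as special cases, many well-known conventional techniques, including the general linear model and common factor analysis.
The value 0.49 is the correlation between Education and Income. The values 0.72 and 0.11 are standardized regression weights. The value 0.60 is the squared multiple correlation of SAT with Education and Income.

Model comparison

Analysis steps
1. Collect and treat the data.
2. Build the theoretical model and test its fit in the different subgroups of test takers.
3. Nested model testing.
4. Model selection (model assessment).

Implications
Cross-gender invariance is ideal: strong invariance holds.
Cross-region invariance is fairly good: weak invariance holds.
The causes of the DIF items remain to be investigated.

Caution
Recent studies suggest that when full or partial measurement invariance is not guaranteed, it may still be the case that constructs are equivalent.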
Saris and Gallhofer (2007, chapter 16) indicate that the test of measurement invariance is too strict and may fail even though cognitive equivalence still holds.

Thank you very much for your attention!
I would appreciate your comments and advice.
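Backup: nested model testing (illustrative sketch)
The nested model testing step in the analysis procedure is commonly carried out as a chi-square difference test between a constrained model and a less constrained one. The sketch below is one way to do that in plain Python and is not necessarily the procedure used in this study. The function names are hypothetical and the fit statistics are made up, not CIFA results. The upper-tail probability is computed from a standard recurrence for the incomplete gamma function, which is exact for integer degrees of freedom.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    # Closed-form starting values: Q(1, y) for even df, Q(1/2, y) for odd df.
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    # Recurrence: Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def chi2_difference_test(chi2_constrained, df_constrained,
                         chi2_free, df_free, alpha=0.05):
    """Compare a constrained model with the less constrained model it is nested in.

    Returns (delta_chi2, delta_df, p_value, constraints_tenable).
    """
    delta_chi2 = chi2_constrained - chi2_free
    delta_df = df_constrained - df_free
    if delta_df <= 0 or delta_chi2 < 0:
        raise ValueError("the constrained model must have more df and no better fit")
    p_value = chi2_sf(delta_chi2, delta_df)
    # A non-significant loss of fit means the equality constraints are tenable.
    return delta_chi2, delta_df, p_value, p_value >= alpha


# Hypothetical fit statistics: metric model (equal loadings) vs configural model.
delta, ddf, p, tenable = chi2_difference_test(221.9, 106, 210.4, 96)
print(f"delta chi2 = {delta:.1f}, delta df = {ddf}, p = {p:.3f}, tenable = {tenable}")
```

With these made-up numbers the loss of fit is not significant at the 0.05 level, so the equal-loading constraints would be retained. The chi-square difference is sensitive to sample size, so with large samples such as this examination it is usually read alongside other fit indices.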