March 27, 2006 – Moderator: Weihua Fan

YUNYUN DAI: Using Structured Mixture IRT Models to Study Differential Item Functioning
Yunyun Dai & Robert J. Mislevy
University of Maryland

Abstract
The standard analysis of differential item functioning (DIF) uses a manifest variable, such as gender or ethnicity, to cluster examinees into groups and investigate item-by-group interactions given overall proficiency. A structured mixture IRT model is proposed as a more appropriate method to detect the existence of DIF, defined in terms of latent classes of examinees, and to study its genesis. The structured mixture model incorporates latent group membership into an IRT model in which different item parameters may be estimated for each latent group. One of the most important advantages of the structured mixture model is its ability to study the causes of DIF, which is not part of the model in conventional DIF analysis. This is done by incorporating both person and item covariates into the structured mixture IRT model, as predictors of latent class membership and of item interaction effects, respectively.

JENNIFER KORAN: Teacher and Multi-Source Computerized Approaches for Making Individualized Test Accommodation Decisions for English Language Learners
Jennifer Koran, Rebecca Kopriva, Jessica Emick
University of Maryland
J. Ryan Monroe
Prince George’s County Public Schools
Diane Garavaglia
Consultant

Abstract
There has been a strong call for more systematic methods of selecting appropriate large-scale test accommodations for students in special populations (Abedi, Courtney, & Leon, 2003; Kopriva & Mislevy, 2001; Thompson et al., 2000; Thurlow et al., 2003). Improvement in this area is especially critical for English language learners (ELLs), a group that has a relatively short history of inclusion in large-scale assessments. However, an important prerequisite for the systematic application of theory to select appropriate accommodations is the collection of accurate and relevant data about the student to use as the basis for decision making. This paper introduces a multi-source, theory-driven approach for gathering information to assign appropriate accommodations to individual ELLs. The value of this approach is investigated in the context of a computerized system called the Selection Taxonomy for English Language Learner Accommodations (STELLA), which also uses the data to make a systematic accommodations recommendation based on individual student needs. Results of the study suggest that using a structured data collection procedure with multiple sources does not by itself have an effect on the nature of teachers’ accommodations recommendations for their students, but that it does contribute to improved accommodation recommendations when it is used in conjunction with the STELLA decision rules. Results also support the use of multiple sources, especially the student’s parent, in collecting information to be used in making an accommodation recommendation.

ROY LEVY: Posterior Predictive Model Checking for Factor Analysis
Roy Levy, University of Maryland

Abstract
We describe the use of measures for conducting posterior predictive model checking to investigate data-model misfit in factor analytic models. Though such measures have been applied in factor analytic models, the techniques and their potential are not generally understood. This paper discusses these techniques, including their potential advantages over traditional mechanisms in terms of their flexibility and the appropriateness of application to complex models. An example based on a simulated data set illustrates the procedures. It is argued that further methodological attention is warranted.
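The abstract does not specify which discrepancy measures are used, so the following is only a minimal sketch of the general posterior predictive model checking logic for a one-factor model, assuming posterior draws of the loadings and unique variances are already available (for example, from an MCMC run). The function names, the maximum-likelihood-type discrepancy, and the normal data model are illustrative assumptions, not the paper's procedure.

# Sketch of posterior predictive model checking (PPMC) for a one-factor model.
# `lam_draws` and `psi_draws` are assumed to hold posterior draws of the
# loadings and unique variances; names and discrepancy are illustrative only.
import numpy as np

def implied_cov(lam, psi):
    # Model-implied covariance for a one-factor model with unit factor variance.
    return np.outer(lam, lam) + np.diag(psi)

def discrepancy(sample_cov, sigma, n):
    # Maximum-likelihood-type discrepancy between a sample covariance and sigma.
    p = sigma.shape[0]
    m = sample_cov @ np.linalg.inv(sigma)
    return n * (np.trace(m) - np.log(np.linalg.det(m)) - p)

def ppp_value(y_obs, lam_draws, psi_draws, rng=None):
    # Posterior predictive p-value: P(D(y_rep) >= D(y_obs)) across posterior draws.
    rng = np.random.default_rng(0) if rng is None else rng
    n, p = y_obs.shape
    s_obs = np.cov(y_obs, rowvar=False)
    exceed = 0
    for lam, psi in zip(lam_draws, psi_draws):
        sigma = implied_cov(lam, psi)
        d_obs = discrepancy(s_obs, sigma, n)
        y_rep = rng.multivariate_normal(np.zeros(p), sigma, size=n)  # replicated data
        d_rep = discrepancy(np.cov(y_rep, rowvar=False), sigma, n)
        exceed += int(d_rep >= d_obs)
    return exceed / len(lam_draws)

Under this kind of check, a posterior predictive p-value near 0 or 1 would suggest that the aspect of the data captured by the chosen discrepancy is poorly reproduced by the model; values near 0.5 would not flag misfit.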
April 3, 2006 – Moderator: Jaehwa Choi

ROY LEVY: Alternative Approaches to Validity in a Modeling Framework
Roy Levy & Gregory R. Hancock
University of Maryland

Abstract
Developments in modeling, in particular the construction, use, and criticism of latent variable models, have spurred recent interest in viewing validity from a modeling perspective. Alternative approaches to validity are characterized from a modeling perspective. Illustrations of data analysis under these alternative approaches highlight their differences in terms of model evaluations and recommendations, and the limitations of the approaches themselves.

PENG LIN: Characteristics and Differential Functioning of Alternative Response Options for the English Section of the ACT Assessment
Peng Lin & Amy Hendrickson
University of Maryland

Abstract
An important characteristic of multiple-choice items is the set of alternative options, or distracters. However, in most multiple-choice item analyses, the item responses are coded as 1 (correct) or 0 (incorrect). With this coding scheme, the distinction among the alternative options is lost (Thissen, 1984). Analyzing multiple-choice items at the option level provides more detailed information about the behavior of both the keyed option and the alternative options. The purpose of this study is to conduct a series of analyses that illustrate how to obtain detailed and useful information about the options. Seventy-five items from the 1995 ACT English section were used. In the first stage, option characteristic curves (OCCs) were estimated using a nonparametric IRT approach, and option performance was examined based on the features of the OCCs. In the second stage, a DIF analysis was first conducted for all items using a logistic regression method; for the items displaying DIF, a multinomial logistic regression analysis was then conducted to examine differential option functioning (DOF). The subgroups of interest in this study were defined by gender and ethnicity.
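The abstract does not give the exact logistic regression specification, so the sketch below shows one common way such an item-level DIF screen is set up: the scored item response is regressed on the matching variable (here the total score), group membership, and their interaction, and nested models are compared with likelihood-ratio tests. The data frame and column names are hypothetical.

# Illustrative logistic-regression DIF screen for one dichotomously scored item.
# `df` is assumed to be a pandas DataFrame with one row per examinee; the
# column names (item score, total score, group) are hypothetical.
import statsmodels.formula.api as smf
from scipy import stats

def dif_screen(df, item, matching="total", group="group"):
    # Fit nested logistic models: matching only, plus group (uniform DIF),
    # plus group-by-matching interaction (non-uniform DIF).
    base = smf.logit(f"{item} ~ {matching}", data=df).fit(disp=0)
    uniform = smf.logit(f"{item} ~ {matching} + C({group})", data=df).fit(disp=0)
    nonuni = smf.logit(f"{item} ~ {matching} * C({group})", data=df).fit(disp=0)
    # Likelihood-ratio tests: 2 * difference in log-likelihoods vs. chi-square.
    lr_u = 2 * (uniform.llf - base.llf)
    lr_n = 2 * (nonuni.llf - uniform.llf)
    return {
        "uniform_DIF_p": stats.chi2.sf(lr_u, uniform.df_model - base.df_model),
        "nonuniform_DIF_p": stats.chi2.sf(lr_n, nonuni.df_model - uniform.df_model),
    }

# Example call for a few hypothetical item columns:
# flags = {item: dif_screen(scores, item) for item in ["item01", "item02", "item03"]}

For the option-level stage, the same logic extends naturally to a multinomial model of the selected option, with the matching variable and group as predictors; the particular specification used in the study is not stated in the abstract.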
BRANDI WEISS: An Investigation of New Computations of Response Time Effort
Brandi A. Weiss, University of Maryland
Steven L. Wise, James Madison University

Abstract
In “low-stakes” testing situations, adequate examinee motivation is difficult to assume because the test scores have no personal impact on the students. Wise and Kong (2005) developed an unobtrusive measure of response time effort (RTE) to gauge examinee motivation during such testing situations. The development of RTE was based on the notion that examinees who responded too quickly to an item were engaging in rapid-guessing behavior (no effort), whereas examinees who spent a reasonable amount of time on an item were engaging in solution behavior (effort). The computation of RTE is based on an examinee’s response times: a value of either zero (no effort) or one (effort) is assigned for each test item, and these effort values are then summed and divided by the total number of test items to obtain a proportional RTE index. The current study aimed to evaluate the effect that dichotomizing the response times for individual items has on the measure of RTE. Four new computations of RTE that allowed effort to be treated as continuous were compared to Wise and Kong’s (2005) RTE index. Results indicated that the reliability and validity of all five RTE indices were similar. Therefore, dichotomizing item response times prior to aggregation did not negatively affect the reliability or validity of the measure of RTE. Interestingly, however, evidence for a third type of response behavior (i.e., abandonment) was found. Recommendations for selecting the most appropriate RTE computation, based on the type(s) of response behavior(s) one wishes to identify, are discussed.
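Because the abstract spells out the Wise and Kong (2005) computation (assign each item a 0 or 1 by comparing its response time to a rapid-guessing threshold, then average across items), a minimal sketch is straightforward. The item thresholds and the single continuous variant below are placeholders for illustration; the four continuous computations examined in the study are not reproduced here.

# Minimal sketch of the response time effort (RTE) index described above,
# plus one hypothetical continuous variant (not one of the paper's four).
import numpy as np

def rte_dichotomous(response_times, thresholds):
    # Wise & Kong (2005)-style RTE: proportion of items whose response time is
    # at or above the item's threshold (1 = solution behavior, 0 = rapid guess).
    effort = (np.asarray(response_times) >= np.asarray(thresholds)).astype(float)
    return effort.mean(axis=-1)

def rte_continuous(response_times, thresholds):
    # Hypothetical continuous analogue: per-item effort is the response time
    # divided by the threshold, capped at 1, then averaged across items.
    ratio = np.asarray(response_times) / np.asarray(thresholds)
    return np.clip(ratio, 0.0, 1.0).mean(axis=-1)

# Example: response times (seconds) for one examinee on five items,
# with item thresholds chosen for illustration only.
rt = [2.1, 14.0, 9.5, 1.0, 22.3]
th = [5.0, 5.0, 5.0, 5.0, 5.0]
print(rte_dichotomous(rt, th))  # 0.6: three of five items show solution behavior
print(rte_continuous(rt, th))   # 0.724: mean of the capped time-to-threshold ratios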