Lecture 14 - Meta-Analysis and Quasi-Experiments

Meta-analysis: "The analysis of analyses." (From The American Heritage College Dictionary, 3rd ed., 1997, Houghton Mifflin Company.)

The units of analysis are not individual participant scores but whole-study scores.

Two major functions – identical to the functions of all research, but on a larger scale . . .

1) Descriptive: To summarize results (usually relationships between DVs and IVs) across collections of studies.

Summaries of relationships across studies used to be done with narrative reviews.

Problems with narrative reviews:
    There was no systematic way of synthesizing the results of individual studies.
    Conclusions were variable, not reliable. Two reviewers might review the same studies and arrive at different conclusions.
    Part of the problem was due to the tendency to report findings as "significant" or "not significant". Such a focus on reporting left the impression that scientists didn't know what they were doing – else why didn't they all report identical "significance" results?

The age of meta-analysis changed the culture to one of reporting quantitative results for studies, rather than simply "significant" or "not significant". The emphasis on quantitative results from individual studies led to the need to summarize those quantitative results, which is part of meta-analysis.

Why do we need descriptive meta-analyses?

[Table not reproduced: Wonderlic <---> Conscientiousness correlations from 7 studies conducted at UTC. The correlations shown in red in the original are significant (p < .05).]

Example of a "descriptive – summarize relationships" meta-analysis:

Connolly, J. J., & Viswesvaran, C. (2000). The role of affectivity in job satisfaction: A meta-analysis. Personality and Individual Differences, 29, 265-281.

This meta-analysis was related to the dispositional theory of job satisfaction – persons are satisfied on the job because that's the way they are, not because of the type of job they have or the way they're treated.

The researchers found a mean correlation of .49 between a measure of positive affectivity and job satisfaction across studies. This suggests that up to 25% of the variance in job satisfaction could be due to individual differences in affectivity.

This could have been done within a single organization, but it carries much more weight when done across multiple organizations, because the mean across organizations is less likely to be influenced by the possible weirdness of any one organization.

2) Inferential: To examine factors that affect the sizes of relationships across collections of studies. Such factors are called moderators.

Example of a "moderation" meta-analysis:

Griffeth, R. W., Hom, P. W., & Gaertner, S. (2000). A meta-analysis of antecedents and correlates of employee turnover: Update, moderator tests, and research implications for the next millennium. Journal of Management, 26, 463-488.

They found that the correlation between pay and turnover depended on whether or not the organization offered reward contingencies. High pay -> low turnover only when reward contingencies were offered; otherwise the correlation was essentially zero. So the presence of reward contingencies moderated the pay -> turnover relationship.

This study could not have been conducted in one organization with only one policy. It required multiple organizations, some with reward contingencies and others without them.

Why not conduct just one big study?
1. Because it is likely not possible for a single researcher to conduct a study as big as the combination of many other studies. Some meta-analyses have Ns of 10,000 or more.

2. Because the use of multiple studies increases the generalizability of the results – external validity.

From an interview with Frank Schmidt . . .

Those advocating it usually argue that a single large-N study can provide the same precision of estimation as the numerous smaller-N studies that go into a meta-analysis. But with a single large-N study, there is no way to estimate SD-rho or SD-delta, the variation of the population parameters. This means there is no way to demonstrate external validity. A meta-analysis based on a large number of small studies based on different populations, measurement scales, time periods, and so on, allows one to assess the effects of these methodological differences.

Six Steps, from Understanding Meta-Analysis by Joseph A. Durlak

1. Formulating the research question.

Which effects do we wish to summarize? What relationships do we wish to study?

Hunter, J. E., & Schmidt, F. L. (2004). Methods of meta-analysis: Correcting error and bias in research findings (2nd ed.). Thousand Oaks, CA: Sage.

2. Performing the literature search for articles to include in the meta-analysis.

A. Computerized searches of databases, such as PsycINFO.
B. Manual searches of journal tables of contents.
C. Searches of the reference lists of relevant articles.
D. Calls or emails to people doing research in the area.
E. Being cognizant of the round-file effect – positive bias in rs due to looking only at published articles.

From McAbee, S., & Oswald, F. L. (2013). The criterion-related validity of personality measures for predicting GPA: A meta-analytic validity competition.

Literature Search

A systematic search of the literature on the relationship between the Big Five personality traits and GPA in postsecondary student populations was conducted to identify relevant studies for the current meta-analysis. Articles published between January 1992 (coinciding with the publication of the NEO-FFI and NEO-PI–R; Costa & McCrae, 1992; and Goldberg's, 1992, unipolar Markers) and February 2012 were considered for inclusion. Studies were located using several online databases, including PsycINFO, ERIC, and Google Scholar. The following keywords were entered into the respective databases to locate articles in various combinations: personality, Big Five, Five Factor Model, academic performance, academic achievement, GPA, and grade. Following database searches, the following online journal records were manually searched, starting from 2005 to online-first publications as of February 2012: Contemporary Educational Psychology, Human Performance, Intelligence, Learning and Individual Differences, Journal of Applied Psychology, Journal of Educational Psychology, Journal of Personality, Journal of Personality and Social Psychology, Journal of Research in Personality, Personality and Individual Differences. Finally, the references of previous meta-analyses and large reviews on personality and academic performance were searched to identify additional studies (Noftle & Robins, 2007; O'Connor & Paunonen, 2007; Poropat, 2009; Richardson et al., 2012; Trapmann et al., 2007). Article abstracts, and Method sections as necessary, were examined to determine whether the study included measures of personality related to academic performance outcomes (e.g., GPA, course grades).
3. Coding each study.

For each study, identify the value that study has on each independent variable being investigated.

e.g., Suppose the dependent variable is efficacy of therapy. For each study, code type of therapy:
    Behavioral or talking
    Group or individual
    Professional therapist or lay person

In the McAbee and Oswald study mentioned above, each study was coded for the type of Big Five questionnaire used, among other things. [The table from that article describing the coding is not reproduced here.]

4. Compute the appropriate effect size for each study.

Most meta-analyses use either . . .

A. A mean difference measure analogous to d = (μ1-μ2)/σ. This measure, or measures equivalent to it, has been defined for many common research designs. Often it can be computed from reported values of t or F.

B. A correlation measure, i.e., Pearson r.

Regardless of whether d or r is computed, you must compute the same measure from each study.

When you publish research, please think of meta-analyses. Can what you report be used for a meta-analysis?

5. Statistical analysis.

From each study, a measure of effect size is computed. There may be more than one measure per study if more than one DV was measured.

Descriptive studies

A weighted mean effect size is computed using the following formula (illustrated assuming the effect size for study i is ri, based on a sample of size Ni). It should really be called the weighted r-bar.

    r-bar = (Σ Ni ri) / (Σ Ni)

The variance of the effect sizes is computed using the formula

    S²r = [Σ Ni (ri - r-bar)²] / (Σ Ni)

The mean and variance are adjusted for
    A) Reliability of the independent variable
    B) Reliability of the dependent variable
    C) Range restriction of the dependent variable
The variance is also adjusted for
    D) Sampling error

These adjustments are somewhat complex, but doable with a good spreadsheet program (a small computational sketch covering Steps 4 through 6 follows the discussion of moderation below). There are several meta-analysis programs available, some for free.

6. Testing for the existence of systematic differences between studies (the rest of this page was skipped in 2015)

Cochran's Q statistic

    χ²(K-1) = [N / (1 - r-bar²)²] × S²r

where N = Σ Ni is the total sample size across the K studies.

If the χ² is not significant, then it is assumed that the effect sizes came from only one population of studies, and the mean is an estimate of the mean effect size of that population of studies. If the χ² is significant, then it must be assumed that there are two or more populations represented by the group of studies, and some attempt to distinguish among them must be made.

Cochran, W. G. (1954). The combination of estimates from different experiments. Biometrics, 10, 101-129.

Higgins and Thompson's I² statistic

Higgins and Thompson's (2002) I² value indexes the proportion of variance due to between-study differences.

Higgins, J. P. T., & Thompson, S. G. (2002). Quantifying heterogeneity in meta-analysis. Statistics in Medicine, 21, 1539-1558. doi:10.1002/sim.1186

Moderation effects: Identifying sources of systematic differences between studies.

Individual studies are treated as rows of a data matrix. Individual study effect sizes are the scores that are analyzed. Each column contains either an effect-size dependent variable or a potential moderating variable. t tests, analysis of variance, or regression analysis may be used to examine the relationship between the dependent variables and potential moderators. Since the dependent variables in these analyses are measures of relationships, a significant t, or F, or a significant r in a regression analysis is an indication of moderation.
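To make Steps 4 through 6 concrete, here is a minimal Python sketch of a bare-bones analysis of this kind. The study correlations and sample sizes are made up for illustration, and the sampling-error adjustment shown is the standard Hunter-Schmidt bare-bones correction rather than the full set of reliability and range-restriction adjustments described above.

```python
import numpy as np
from scipy import stats

# Step 4 helpers: recover effect sizes from commonly reported statistics.
def d_from_t(t, n1, n2):
    """Cohen's d for two independent groups, recovered from a reported t."""
    return t * np.sqrt(1 / n1 + 1 / n2)

def r_from_t(t, df):
    """Pearson r recovered from a reported t and its degrees of freedom."""
    return np.sqrt(t**2 / (t**2 + df))

# Hypothetical study-level data: correlation r_i and sample size N_i for K studies.
r = np.array([0.30, 0.45, 0.22, 0.38, 0.50, 0.28])
N = np.array([120, 85, 200, 60, 150, 95])
K = len(r)

# Step 5: sample-size-weighted mean and variance of the observed correlations.
r_bar = np.sum(N * r) / np.sum(N)                  # the weighted r-bar
S2_r = np.sum(N * (r - r_bar) ** 2) / np.sum(N)    # weighted variance of the rs

# Bare-bones sampling-error adjustment (Hunter & Schmidt): subtract the variance
# expected from sampling error alone to estimate the variance of rho.
var_error = (1 - r_bar**2) ** 2 / (np.mean(N) - 1)
var_rho = max(S2_r - var_error, 0)

# Step 6: chi-square homogeneity test (the lecture's formula) and Higgins & Thompson's I2.
chi_sq = np.sum(N) * S2_r / (1 - r_bar**2) ** 2
df = K - 1
p_value = stats.chi2.sf(chi_sq, df)
I2 = max((chi_sq - df) / chi_sq, 0) * 100          # % of variance due to between-study differences

print(f"weighted r-bar = {r_bar:.3f}, S2_r = {S2_r:.4f}, estimated Var(rho) = {var_rho:.4f}")
print(f"chi-square({df}) = {chi_sq:.2f}, p = {p_value:.3f}, I2 = {I2:.0f}%")
```

The same study-level matrix could then be extended with columns for potential moderators and analyzed with t tests, ANOVA, or weighted regression, as described above.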
Examples of results of a meta-analysis

Example 1: From . . . [the table this example refers to was not reproduced here]. The values flagged on the table were:
    the weighted mean of the observed, unadjusted correlations;
    the estimate of the population correlation adjusted for predictor and criterion reliability; and
    for selection purposes, the estimate of the population correlation adjusted only for criterion reliability.

Most people report the operational validity (adjusted only for criterion reliability) when estimating a population correlation for selection situations. They report the true-score correlation (adjusted for both predictor and criterion reliability) when estimating a population correlation in contexts other than selection. (A short sketch of these adjustments, and of the 80% credibility interval, appears after Ashley's results below.)

Example 2: From . . . [ignore this example – it refers to parts of a table that were not copied here].

THE ROLE OF EMOTIONAL INTELLIGENCE IN LEADERSHIP EFFECTIVENESS: A META-ANALYSIS
A Thesis Presented for the Master of Science Degree, The University of Tennessee at Chattanooga
Ashleigh D. Farrar, May 2009

Abstract

Leaders are an essential element of the business world. While good leaders can provide many benefits for an organization, unsuccessful leaders can be detrimental. The notion that emotional intelligence plays a part in whether a leader is effective or not effective has recently been introduced. This study sought to unify the literature evaluating the possible link between emotional intelligence and leadership effectiveness. Meta-analytic techniques were used to analyze this relationship. Results revealed that overall, there is a positive relationship between emotional intelligence and leadership effectiveness. Also, while the type of emotional intelligence measure used served as a moderator to this relationship, a second and third meta-analysis supported the overall positive relationship of emotional intelligence and leadership effectiveness for each type of EI.

Results

The central aim of the present study was to examine the overall relationship of EI and leadership effectiveness. The initial meta-analysis was conducted using all of the included studies. The results of this meta-analysis are provided in Table 1. A total of 20 correlations were used from 20 studies, with a total sample size of 3,295. After correcting for unreliability in both EI and leadership effectiveness measures, the sample-size-weighted mean rho linking the constructs was .458. The 80% credibility interval did not include zero, indicating that there was a relationship between EI and leadership effectiveness. These results supported Hypothesis 1.

Ashley's data [not reproduced here]

Ashley's results

Table 1: All studies
    k = 20    Total sample size = 3295    Mean rho = 0.457    Variance of rho = 0.028    80% credibility interval = .24 to .67

Table 2: EI mixed-model measures
    k = 12    Total sample size = 2265    Mean rho = 0.427    Variance of rho = 0.030    80% credibility interval = .20 to .65

Table 3: EI ability-model measures
    k = 8     Total sample size = 1030    Mean rho = 0.536    Variance of rho = 0.013    80% credibility interval = .39 to .68
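As a small check on how numbers like these arise, the sketch below applies the classical attenuation-correction formulas just described (operational validity corrects for criterion unreliability only; the true-score correlation corrects for both predictor and criterion unreliability) and forms the 80% credibility interval as mean rho ± 1.28 × SD-rho. The reliability values are made-up placeholders; the credibility-interval inputs are the values in Ashley's Table 1.

```python
import math

# Correcting a weighted mean observed correlation for unreliability.
# The numbers below are illustrative placeholders, not from any study.
r_bar = 0.25   # weighted mean of observed correlations
ryy = 0.52     # assumed reliability of the criterion (dependent variable)
rxx = 0.80     # assumed reliability of the predictor (independent variable)

operational_validity = r_bar / math.sqrt(ryy)        # criterion unreliability only
true_score_rho = r_bar / math.sqrt(rxx * ryy)        # both measures corrected
print(f"operational validity = {operational_validity:.2f}, true-score rho = {true_score_rho:.2f}")

# 80% credibility interval, using the values reported in Ashley's Table 1.
mean_rho, var_rho = 0.457, 0.028
sd_rho = math.sqrt(var_rho)
lower, upper = mean_rho - 1.28 * sd_rho, mean_rho + 1.28 * sd_rho
print(f"80% credibility interval: {lower:.2f} to {upper:.2f}")    # about .24 to .67
```

Run as-is, the interval works out to about .24 to .67, which reproduces the value reported in Table 1.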
From Frank Schmidt's presentation at the 2012 RCIO conference

Table 1. Selection methods for job performance

Selection procedure/predictor          Operational    Multiple R    Gain in      % gain in    Std. regression weights
                                       validity (r)   with GMA      validity     validity     GMA     Supplement
                                                                    over GMA
GMA tests                              .65            --            --           --           --      --
Integrity tests                        .46            .78           .130         20%          .63     .43
Employment interviews (structured)     .58            .76           .117         18%          .52     .43
Employment interviews (unstructured)   .60            .75           .099         15%          .48     .41
Conscientiousness                      .22            .70           .053         8%           .67     .27
Reference checks                       .26            .70           .050         8%           .65     .26
Biographical data measures             .35            .68           .036         6%           .91     -.34
Job experience                         .13            .67           .023         4%           .66     .17
Person-job fit measures                .18            .67           .020         3%           .64     .16
SJT (knowledge)                        .26            .67           .018         3%           .76     -.19
Assessment centers                     .37            .66           .014         2%           .78     -.19
Peer ratings                           .49            .66           .013         2%           .55     .16
T & E point method                     .11            .66           .009         1%           .65     .11
Years of education                     .10            .66           .008         1%           .65     .10
Interests                              .10            .66           .008         1%           .65     .10
Emotional Intelligence (ability)       .24            .65           .007         1%           .70     -.11
Emotional Intelligence (mixed)         .24            .65           .005         1%           .63     .09
GPA                                    .34            .65           .004         1%           .71     -.10
Person-organization fit measures       .13            .65           .004         1%           .64     .07
Work sample tests                      .33            .65           .003         0%           .69     -.07
SJT (behavioral tendency)              .26            .65           .000         0%           .64     .03
Emotional Stability                    .12            .65           .000         0%           .64     .02
Job tryout procedure                   .44            .65           .000         0%           .63     .02
Behavioral consistency method          .45            .65           .000         0%           .64     .02
Job knowledge                          .48            .65           .000         0%           .65     -.01

Table 2. Selection methods for training performance

Selection procedure/predictor          Operational    Multiple R    Gain in      % gain in    Std. regression weights
                                       validity (r)   with GMA      validity     validity     GMA     Supplement
                                                                    over GMA
GMA tests                              .67            --            --           --           --      --
Integrity tests                        .43            .78           .109         16%          .65     .40
Biographical data measures             .30            .74           .073         11%          1.04    -.50
Conscientiousness                      .25            .73           .061         9%           .69     .29
Employment interviews                  .48            .72           .051         8%           .57     .28
Reference checks                       .23            .71           .038         6%           .67     .23
Years of education                     .20            .70           .029         4%           .67     .20
Interests                              .18            .69           .024         4%           .67     .18
Peer ratings                           .36            .67           .002         0%           .70     -.06
Emotional Stability                    .14            .67           .001         0%           .66     .03
Job experience (years)                 .01            .67           .000         0%           .67     .01

Note. Operational validity estimates in parentheses are those reported in Schmidt and Hunter (1998, Table 2). Selection procedures whose operational validity is equal to or greater than .10 are listed in the order of gain in operational validity. Unless otherwise noted, all operational validity estimates are corrected for measurement error in the criterion measure and for indirect range restriction (IRR) on the predictor measure, to estimate operational validity for applicant populations.

McAbee & Oswald's results – prediction of GPA [results figures not reproduced here]

Quasi-Experiments (not in 2015)

Campbell & Stanley; Cook & Campbell; Shadish, Cook, & Campbell

True experiment: A design in which individual participants are randomly assigned to conditions.

Quasi-experiment: Anything else.
    Most often: Designs in which different treatments are assigned (perhaps randomly) to already existing groups.
    Always: Designs comparing subject variables, such as gender, graduate program, age, or any prior condition.
    Sometimes: Designs for which participants could be randomly assigned but are not, for one reason or another.

Some authors differentiate designs for which assignment of conditions to groups is possible (e.g., training programs to different buildings) from those for which it is not (e.g., gender).
Pretest-Posttest with Nonequivalent Groups Design

Also called the Nonequivalent Control Groups Design with Pretest (NECG with Pretest design). The most frequently discussed quasi-experimental design.

The design involves pretests on both groups, making one the experimental group and the other the control group, then taking a post observation of both. Diagrammed as

        Pre    Condition    Post
        O1     XE           O2
        ------------------------
        O1     XC           O2

The line signifies nonequivalence of the groups.

The true experimental counterpart is the Randomized Groups (RG) design:

        Pre    Condition    Post
    R   O1     XE           O2
    R   O1     XC           O2

(R signifies random assignment of individual participants to groups.)

Note that in the Randomized Groups design, a pretest is not necessary, because the groups are theoretically equivalent prior to the administration of the research conditions. But it is certainly possible, and advisable, to take a pretest in the Randomized Groups design. With the appropriate analysis, having a pretest can increase the power of the comparison between the groups.

Examples

1. A program to compare the efficacy of a new diet program vs. an existing diet program.
    O1 is weight prior to beginning the diet
    O2 is weight after the diet
    XE is the new diet
    XC is the old diet

Obviously, if participants could be randomly assigned to diet type, this would be a Randomized Groups design. But if individual participants could not be randomly assigned, the design would be NECG.

        Pre             Condition    Post
        Prior weight    New diet     Post weight
        -----------------------------------------
        Prior weight    Old diet     Post weight

2. A comparison of I-O vs. Research program students in statistics.
    O1 is knowledge of the statistical concepts taught in the course, measured prior to the start of the semester
    O2 is the same test of knowledge of statistical concepts at the end of the course
    XE is the I-O program (or it could be the Research program)
    XC is the Research program (or it could be the I-O program)

        Pre        Condition    Post
        Pretest    I-O          Posttest
        ---------------------------------
        Pretest    Research     Posttest

A few outcomes of the NECG design and their interpretations . . . (assuming pretest and posttest are commensurable)

1. Best possible outcome

Analysis: One Between Subjects / One Within Subjects analysis of variance (a minimal computational sketch of this analysis appears after Outcome 2)
    Group factor: Level 1 = Experimental group; Level 2 = Control group
    Time factor:  Level 1 = Pretest measure;    Level 2 = Posttest measure

1. A main effect of the Time factor could be found if both groups increase performance from pre- to posttest.
2. A main effect of the Group factor could be found, depending on the specific results.
3. But the interaction is the key. If the interaction is significant and of the form shown below, this would indicate that the Experimental group increased significantly more than the Control group. Moreover, since the Experimental group's performance was below that of the Control group on the pretest, any explanation of the result in terms of pre-existing differences would be difficult.

[Figure not reproduced: plot of pretest and posttest means. The Experimental group starts below the Control group at pretest (E1 < C1) but finishes above it at posttest (E2 > C2).]

2. Second-best possible outcome

One Between Subjects / One Within Subjects analysis: significant effects
1. A main effect of the Time factor might be found if both groups increased from pre- to posttest.
2. A main effect of the Group factor might be found in some circumstances.
3. The interaction is the key result here, though.

[Figure not reproduced: the two groups perform about equally on the pretest, but the Experimental group finishes higher on the posttest.]

Again, an explanation in terms of pre-existing groups would be difficult to accept in view of the similarity of performance on the pretest.
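Before turning to the more problematic outcomes, here is a minimal sketch of the One Between / One Within analysis referred to above, using made-up scores. For a 2 × 2 mixed design, the Group × Time interaction test is equivalent to an independent-samples t test on the pre-to-post gain scores (interaction F = t²), so the sketch uses that shortcut rather than a full ANOVA routine; the data and group sizes are assumptions for illustration only.

```python
import numpy as np
from scipy import stats

# Hypothetical pretest/posttest scores for an NECG-style comparison.
pre_exp = np.array([10, 12, 9, 11, 13, 10, 12, 11])
post_exp = np.array([15, 17, 14, 16, 18, 15, 16, 17])
pre_ctl = np.array([13, 14, 12, 15, 13, 14, 12, 13])
post_ctl = np.array([14, 15, 13, 16, 14, 15, 13, 14])

# Pre-to-post gain for each participant in each group.
gain_exp = post_exp - pre_exp
gain_ctl = post_ctl - pre_ctl

# Independent-samples t test on the gains; in a 2 x 2 mixed design this is
# the same test as the Group x Time interaction (interaction F = t squared).
t, p = stats.ttest_ind(gain_exp, gain_ctl)
print(f"mean gain (E) = {gain_exp.mean():.2f}, mean gain (C) = {gain_ctl.mean():.2f}")
print(f"t = {t:.2f}, p = {p:.4f}, equivalent interaction F = {t**2:.2f}")
```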
3. The dark side of the NECG with Pretest design: Regression to the mean or maturation

One Between Subjects / One Within Subjects analysis: significant effects
1. A main effect of Time, possibly.
2. A main effect of Group, possibly.
3. The interaction is significant.

The point being made here is that you can't blindly follow the significant effects. You must consider the pattern of differences. Even if the interaction were significant, it might be argued that the interaction was due to regression to the mean or to maturation effects.

[Figures not reproduced: two plots of the same pattern, labeled "Maturation Explanation" and "RTTM Explanation." In both, the Experimental group starts below the Control group and gains more from pretest to posttest, a pattern that could be produced by maturation or by regression to the mean rather than by the treatment.]

Example: Children enrolled in a Head Start program increased more from pretest to posttest than children not enrolled. But the result may have been due to differential regression to the mean. The Head Start children may have been from the lower tail of the distribution on the pretest; they would be expected to score higher on the posttest simply due to regression to the mean. Those in the control group may have been more likely to be near the middle of the distribution, and thus would not be expected to change much from pretest to posttest. (A small simulation illustrating this appears at the end of these notes.)

4. Main-effects-only outcome

One Between Subjects / One Within Subjects analysis: significant effects
1. A main effect of Time, possibly.
2. A main effect of Group, possibly.
3. The interaction is not significant.

Again, you can't blindly follow the significant effects. You must consider the pattern of differences. This outcome signifies nothing. The control and treatment groups increased by the same amount; the difference between them at pretest was the same as at posttest.

[Figure not reproduced: the two groups differ by the same amount at pretest and posttest, so the lines are parallel.]

Example from Shadish, Cook, & Campbell (2002): Carter, Winkler, & Biddle (1987) evaluated the effects of the National Institutes of Health (NIH) Research Career Development Award (RCDA), a program designed to improve the research careers of promising scientists. Those who received the RCDA did better on the posttest than those who did not receive the award. But the difference on the pretest was about the same as the difference on the posttest, so the final difference may have been due to pre-existing differences – those who got the award were better to start with (which is why they got the award), and they were about as much better at the end of the study.
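Finally, to illustrate the regression-to-the-mean account under Outcome 3, here is a small simulation sketch; all values are made up. A "program" group is selected from the lower tail of an error-laden pretest, and a comparison group is drawn from near the middle of the pretest distribution. The program group's posttest mean rises even though no treatment effect is built into the data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Stable true ability plus independent measurement error at each testing.
true_score = rng.normal(100, 10, n)
pretest = true_score + rng.normal(0, 8, n)
posttest = true_score + rng.normal(0, 8, n)   # note: no treatment effect at all

# "Program" group selected from the lower tail of the pretest;
# comparison group drawn from near the middle of the pretest distribution.
program = pretest < np.percentile(pretest, 20)
control = np.abs(pretest - np.median(pretest)) < 5

for name, grp in [("Program", program), ("Control", control)]:
    gain = (posttest[grp] - pretest[grp]).mean()
    print(f"{name}: pretest mean = {pretest[grp].mean():6.1f}, "
          f"posttest mean = {posttest[grp].mean():6.1f}, gain = {gain:5.1f}")

# The lower-tail group shows a sizable apparent "gain" purely from
# regression to the mean; the middle group changes very little.
```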