The Centre for Research on Lifelong Learning and Education (CELE)
Nordic Comparative and International Education Society (NOCIES) Symposium
Educarium Building, University of Turku, Turku, Finland
May 21-22, 2013

Comparative research and fallacious causal attributions
Jón Torfi Jónasson, School of Education, University of Iceland
jtj@hi.is

Examples of massive, perhaps influential documents; who has got the time and energy to go through them systematically and critically?

ILSA (international large-scale assessments) – PISA, TIMSS. We also have Talis, ...
• OECD. (2009). Creating Effective Teaching and Learning Environments: First Results from TALIS.

Derived analysis, see e.g. from PISA 2009 (1+4 volumes)
– PISA 2009 Results: Overcoming Social Background. Equity in Learning Opportunities and Outcomes. Volume II - with policy implications
– PISA 2009 Results: Learning to Learn. Student Engagement, Strategies and Practices. Volume III - with policy implications

Descriptive studies (e.g. the background documents for the “summits”), all with policy implications
• OECD. (2013). Teachers for the 21st Century: Using Evaluation to Improve Teaching. OECD Publishing.
• Schleicher, A. (Ed.). (2012). Preparing Teachers and Developing School Leaders for the 21st Century: Lessons from around the World. OECD Publishing.
• OECD. (2011). Building a High-Quality Teaching Profession: Lessons from around the World.
• McKinsey. (2007). “How the world’s best performing school systems come out on top”
• McKinsey. (2010). “How the World’s Most Improved School Systems Keep Getting Better”

Jón Torfi Jónasson CELE NOCIES Symposium Turku 2013 Fallacious inferences 2

A recent class of comparative studies: ILSS – International large scale studies

What is the basic idea? : To compare!?
What is their presumed relevance?
: If there is a difference, it calls for change by those who are behind; the culprit is normally “the system”.

Various types: assessments (inviting ranking; PISA, TIMSS), surveys (Talis), interviews (McKinsey), ...

And then what is their presumed use? And what methodological design demands does this make?
A very neglected field for discussion and debate?

Overview
• Comparative education; what for?
• Research based policy discourse
• Analysis of the problem
• Formal methodological issues
• Discussion of the problem

Comparative education; what for? On what grounds? With reference to which questions?

It may be interesting to compare competence, curriculum, organization and systems, dispositions, aspirations ...
• One may want to understand what similarities there might be when comparing certain aspects of education (e.g. in the drop-out patterns), despite notable system differences (or vice versa)
• One may want to learn from other systems (or cultures) about ways of
– doing things, perhaps arguing on qualitative grounds
– not doing things, which is common, and sometimes quite dramatic (don't emulate us, please!)

Research based policy discourse – we must be able to see the wood rather than only the trees

“One way we'll know we're succeeding in changing China's schools is when those PISA scores come down.”
Jiang Xueqin (2010), deputy principal of Peking University High School and director of its International Division.
http://online.wsj.com/article/SB10001424052748703766704576008692493038646.html

Here, neither PISA as such nor China is the issue, but the relationship in general between various tests (e.g. PISA), education, schools and their function in society.
And this reminds us also of the more general question: what kind of evidence is relevant for educational decisions, and how do we use it?

Note the title of: Bridges, D., Smeyers, P., & Smith, R. (Eds.). (2009). Evidence-Based Education Policy: What Evidence? What Basis? Whose Policy? John Wiley and Sons Ltd.

Research based policy discourse (cont.)
• What issues can in principle be solved, and which perhaps cannot be solved, by research? (A neglected but crucial question in an evidence based ethos.)
– The aim(s) of education will in principle not be determined by research?
– The “best preparation for the future” cannot be determined by research at any given time, even though the question could perhaps in principle be answered at a later time? (E.g. research on the long-term effects of medical interventions.)

Research based policy discourse (cont.)
General conceptual issues. Distinguish between different classes of questions, such as (and please stop to think how different these are):
• Type A questions: What systems, methods and content will best serve “our” aims of education? Or even particular subsets of aims?
– Then we must determine what kind of research design might be best suited to respond to these questions; PISA, as an example, might or might not be a good way of doing it.
• Type B questions: How can we use existing data (e.g. ILSAs such as PISA, TIMSS, ...) to clarify the operation of our education systems?
– Then we would explore what methods would be most appropriate to survey the data in order to tease out the informative patterns, without, in most cases, being able to deduce any causal relationships.

Research based policy discourse (cont.)
It might be that people start out with type A questions, collect data that then allow them to respond to a series of type B questions (and do that quite well), and then feel legitimised in talking as if type A questions had been answered.

Please note in the following an attempt to convey the questions we think we should be asking, and then speculate at what level our questions really are. Then stop for a moment to think how you might map, e.g., the PISA endeavour onto the (or some) general overarching aims of education.

Research based policy discourse (cont.)
It is suggested here that the principal validity problem is, however, not simply a validity problem, because it is very or totally unclear what the principal questions are, and thus what the constructs at issue are. Much more time should be spent on this problem than is normally done. What are the principal questions we want to answer? As soon as we have determined these we must enter the validity discussion.

In the following we suggest a number of levels (and there are more) at which we might approach the problem. We seem to be normally at the lowest level (or lower), but the discussion is often conducted as if we were at the highest level.

Formal methodological issues: approaching validity
• What kind of society do we want for the future and what should, or might, the education system contribute to its formation? (What do we want?)
– What characteristics and knowledge do we desire from our emerging generations? What kind of metrics would be sensible to use to gauge these? (How do we measure this?)
• To what extent would we expect the important characteristics and knowledge to emerge from within our educational systems, or what role would we want these systems to play? (What should the education systems do?)
– To the extent the education system is expected to serve the goal of preparing the new generations for the future work life, what kinds of skills or dispositions or cultures would be most sensible? (Education and the world of work.)

Deconstructing the aims of education and relating to, e.g. PISA
The aims of education
PISA
• For the individual: skills, well being, social functioning …
• For society: world of work, survival, democratic and cultural participation, …

Assessment studies, e.g. PISA
A huge amount of data is collected; but what are the fundamentally important questions we should be asking (even) before we start analysis? What does the variance mean? In real terms?
Examples:
a) How do we compare a high group in one system to a low group in another system? What are the system implications of that comparison?
b) Why on earth does the nation state demand such attention; what about different regions within it? (Note e.g. Canada, but the examples abound; what does it mean to compare the U.S. and China?)
c) Why are the differences within a normal class not the most interesting focus of attention?

[Figure II.1.1: PISA 2009, variation of reading performance within countries. Source: PISA 2009 Results: Overcoming Social Background. Equity in Learning Opportunities and Outcomes. Volume II.]

Variation in Literacy Skills among Canadian Provinces: Findings from the OECD PIS
J.
Douglas Willms, University of New Brunswick
Published by authority of the Minister responsible for Statistics Canada. © Minister of Industry, 2004

Formal methodological issues: validity
• Construct validity, internal validity, (external validity)
– To a large extent the validity issue centres around the definition of the problem; what are the questions at the heart of the studies?
• Internal validity, causal inferences; design demands
– I. Randomized experiments
– II. Non-randomized designs (and the problems they entail)
   • Quasi experiments (static groups, various versions of (interrupted) time-series designs)
   • Correlational research (various statistical analyses: regression, path analysis, ...)
   • Survey research
What methodology allows evidence based policy borrowing?

Formal methodological issues: alternative approaches
What system or content would be most appropriate in order to
• a) achieve equality within society?
• b) build a democratically competent nation?
• c) form a creative population?
• Why are these not the most relevant questions; how do we design studies to address those?
• But of course existing studies might be very helpful in gauging the problem or assessing the situation.

A Nordic model – Nordic issues?
What are we talking about when discussing a Nordic model? A common system? A common history? A common set of values? A common culture?
How important is the nation state as a unit of analysis in this context?
What are the criteria for being Nordic (a question about approach)?
a) being unique? No, probably not, or not necessarily
b) sharing something, perhaps also with others? Yes, probably
c) sharing something valuable also with others? Yes, definitely
Consider some examples from this perspective.
What is Nordic about those? Start with the general importance of equality.

International Monetary Fund, Finance & Development, September 2011, Vol. 48, No. 3
Andrew G. Berg and Jonathan D. Ostry
“Do societies inevitably face an invidious choice between efficient production and equitable wealth and income distribution? Are social justice and social product at war with one another? In a word, no.”
“That experience brought home the fact that sustainable economic reform is possible only when its benefits are widely shared.”

[Figure II.5.1: Variation in reading performance between and within schools, by country. Total variance is shown as a proportion of the OECD variance; variation within schools and variation between schools are expressed as a percentage of the variance in student performance across OECD countries. OECD average reference values of 65% and 42% are marked on the chart.]

[Figure II.5.4: Variation in reading performance explained by students' and schools' socio-economic background, by country. Percentage of variance in reading performance, between schools and within schools, explained by the PISA index of economic, social and cultural status of students and schools, expressed as a percentage of the average variance in student performance in OECD countries.
Note: Countries are ranked in ascending order of the percentage of overall variance in reading performance explained by the PISA index of economic, social and cultural status of students and schools.
Source: OECD PISA 2009 database, Table II.5.2.]

[Figure: Reading performance, by immigrant status. Mean score by country (axis from 300 to 550) for all students, students without an immigrant background, second-generation students and first-generation students.]

Formal methodological issues: meta-analysis
• The lessons of meta-analysis show substantial variation between studies. The problem of relying on single studies, single methodologies, and very homogeneous criteria is probably more serious than is often appreciated.

Two references
Critical discussion of the McKinsey reports:
Coffield, Frank (2012). Why the McKinsey reports will not improve school systems. Journal of Education Policy, 27(1), 131-149.
Critical discussion of the political use of ILSAs:
Engel, Laura, Williams, James, & Feuer, Michael (April 2012). The Global Context of Practice and Preaching: Do High-Scoring Countries Practice What U.S. Discourse Preaches? School of Education and Human Development, George Washington University. Working paper 2.3.

Conclusion
A firm conceptual and technical methodology for relating comparative studies to any form of policy action must be (re-)established.
There is a serious lack of rigour, no less at the conceptual level than at the technical level (noting that the problem has little to do with statistics, which may be carried out at a sophisticated level, but that is another debate).
The most serious problems are those related to the validity of the studies vis-à-vis the questions they are in fact intended to answer, and to what inferences, causal or otherwise, can be drawn in relation to those questions. This relates to all aspects of validity, not just to internal validity.

Therefore, let us briefly return to the meta-question: what are the questions we are seeking answers to?

Formal methodological issues: approaching validity
• What kind of society do we want for the future and what should, or might, the education system contribute to its formation? (What do we want?)
– What characteristics and knowledge do we desire from our emerging generations? What kind of metrics would be sensible to use to gauge these? (How do we measure this?)
• To what extent would we expect the important characteristics and knowledge to emerge from within our educational systems, or what role would we want these systems to play? (What should the education systems do?)
– To the extent the education system is expected to serve the goal of preparing the new generations for the future work life, what kinds of skills or dispositions or cultures would be most sensible? (Education and the world of work.)

Thank you