Workshop Improving Data Access and Research Transparency in Switzerland: Bern, November 2014

DART in large infrastructure projects
A case study of the European Social Survey
Michael Breen, Mary Immaculate College, University of Limerick, IRELAND

Rationale for DART (Lupia & Alter, 2014)
To more effectively and rigorously answer questions about the value of quantitative social science, it is imperative that those of us who conduct such research take actions that reinforce its credibility and make it easier for others to interpret our findings accurately. This means sharing our data whenever possible. It also means making available a complete description of the steps that we used to convert data about the social world into quantitative claims about how it does and does not work. Such commitments will not only help others more accurately assess our claims about individual events but also increase the extent to which others will view as credible our attempts to draw generalizations about people, policies, and institutions from a series of numerical simplifications and logical transformations.

European Social Survey (ESS)
• The European Social Survey was established in 2001 as an academically-driven social survey designed to chart and explain the interaction between Europe's changing institutions and the attitudes, beliefs and behaviour patterns of its diverse populations.
• Currently in the midst of its seventh round, this biennial cross-sectional survey covers more than thirty nations and employs the most rigorous methodologies.
• All data, questionnaires, an interactive analysis tool (NESSTAR) and educational resources for use in HEIs are available online at http://www.europeansocialsurvey.org/.

NC meeting:
- Meeting papers and minutes from previous NC meetings
Help:
- Lists all ESS contact e-mails
- Detailed sitemap: where to find what on the intranet
General information:
- ESS7 Project specification and Timetable
- Information on and list of ESS7 Country contacts
Prepare for Fieldwork:
- Fieldwork Questionnaire
- Sampling guidelines, with list of assigned sampling expert
- Sampling design forms, when these are finalised
- Translation and verification
- SQP coding
- Media Claims
Fieldwork Documents:
- Source Questionnaire: ESS7 questionnaire, showcards, rotating module consultation information, questionnaire alerts
- Rotating module development
- Response enhancement
- Contact forms (end of April)
- Interviewer briefings (when finalised)
- Fieldwork reporting
Prepare data:
- Data Protocol and variable definitions
- ESS 2014 Data Protocol (due June)
- SPSS/SAS variable definition programs
- Standards for post coded variables, with guidance documents
- International standards (Occupation, Industry, Country, Language)
- ESS Specific Standards (Education, Religion, Ancestry)
- ESS6 Processing Reports (due June)
- Question consultation outcomes: Religion, Education, Marital Status and Ancestry; UPCOMING: Alcohol
Survey Documentation – National Technical Summary:
- Download National Technical Summary and Appendices (due September): Education, Income, Marital Status, Political Parties
Deposit Data:
- Secure deposit of all data and documentation
- Lists all deliverables – both data and documents
- Lists all files that have been deposited
View archive processing for your country:
- Transparency: gives you access to your files during processing
- Further information will be provided by your archive contact as processing develops
- When processing is finished we will send you a Draft file in confidence for you to validate.
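
The deposit pages listed above show both the expected deliverables and the files already deposited. As a rough illustration only, comparing the two lists amounts to a set difference. The sketch below is not the ESS Archive's actual tooling; the deliverable labels and both function names are hypothetical and chosen purely for the example.

```python
# Hypothetical deliverable labels for illustration; these are NOT
# the ESS Archive's real identifiers.
EXPECTED = {
    "main_questionnaire_data",
    "interviewer_questionnaire_data",
    "contact_form_data",
    "sddf",
    "national_technical_summary",
}


def missing_deliverables(expected, deposited):
    """Return expected deliverables that have not been deposited, sorted."""
    return sorted(set(expected) - set(deposited))


def is_complete(expected, deposited):
    """A deposit counts as complete when nothing expected is missing."""
    return not missing_deliverables(expected, deposited)


if __name__ == "__main__":
    deposited = {"main_questionnaire_data", "sddf"}
    for name in missing_deliverables(EXPECTED, deposited):
        print("still missing:", name)
    print("complete:", is_complete(EXPECTED, deposited))
```

A check of this kind makes the state of a national deposit visible to both the depositor and the archive at any point during processing.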

Data deliverables
• Data from Main questionnaire
• Data from Supplementary questionnaires
• Data from Interviewer questionnaire
• Call record/contact form data
• Parents' occupation
• Sample design data file (SDDF)
• Raw data from main and supplementary questionnaires
• Media Claims file

Documentation deliverables
• Main questionnaire
• Supplementary questionnaire (all versions)
• Interviewer questionnaire
• Contact form
• Show cards (from the main and supplementary questionnaires)
• National Technical Summary (NTS) with appendices (education, income, political parties and marital and relationship status)
• Population statistics
• Interviewer and fieldwork instructions
• Interviewer briefing and training material
• Advance letters, brochures etc.
• Media landscape
• Final (T)VFFs

Processing
• The processing is organised in two main steps, each leading up to standardised reports. The reports contain a summary of the programmes, files and output produced during the processing as well as queries that the Archive will need feedback on to produce the national files that will later be integrated into the international data file for Round 7.
• When the Archive has completed the processing of the national data file, a draft file will be provided for NCs to approve the processing carried out by the Archive. All NCs are responsible for the validity of their national data. All national files will be subject to further quality checks by the HQ-CST and the QDTs when a draft international file is available.
• A complete deposit of all deliverables is a prerequisite for a country to be included in the integrated released file.

Dire warnings to NCs
• No national data (or interpretations of such data) can be released, published or reported in any way until the data has been officially released by the ESS Archive at NSD.
Thereafter, the data will be available without restriction for non-commercial use, scientific research, knowledge and policy making in all participating countries and beyond to quarry at will.

Compliance 1
• The first group of compliance issues is particularly central. Therefore, all members and observer countries are asked to ensure that they:
- field the complete ESS Round 7 questionnaires,
- deliver a Sample Design Data File (SDDF) which allows the calculation of inclusion probabilities,
- make a complete delivery of ESS Round 7 data (including the contact form data) and documentation to the ESS Archive at NSD before 1 September 2016.

Compliance 2
• The second group of compliance issues relates to the quality assurance procedures imposed by the HQ-CST. This means in particular that a country has to have finalised the following before fieldwork starts:
- the translation, verification and SQP procedures for the ESS Round 7 questionnaire,
- the sign off procedure for the sampling design,
- the sign off procedure of the fieldwork questionnaire (FWQ).

Compliance 3
• The third set of compliance issues arises if quality control analyses performed by the HQ-CST (or other parties) reveal serious doubts as regards data quality. This may, for instance, include indications of:
- very high design or interviewer effects,
- very large nonresponse bias, or
- very low measurement quality (reliability/validity) of the data (including large amounts of missing data).
• Respondent substitution and interviewer fraud are also serious threats to data quality.

Compliance 4
• The fourth area of compliance relates to data release. ESS data is a public good. NCs must ensure that no national data is released until the official data release via the ESS archive. This allows the data to be properly checked prior to release and ensures equal access to the data for all.
• In the event of a breach of any of these four key compliance considerations, the HQ-CST reserves the right not to include the country data in the integrated file. In these cases, the representative for that country in the ESS ERIC General Assembly will be informed of this decision.

Post release issues
• Individual usage of the data
• No control on how the data are used
• Require registration of all users
• Create ongoing citation database of research using ESS data
• Need better resources to review published papers
• Considering mentoring process for novice researchers using ESS

Best practice in closed data: LIS
• Luxembourg Income Study
• LIS provides access to the LIS and LWS Databases in three ways: LISSY, the Web Tabulator, and the LIS Key Figures. Access through LISSY or the Web Tabulator requires registration. The LIS Key Figures are publicly accessible and provide standard statistics based on the LIS Database.
• LISSY is a remote-execution data access system for the LIS and LWS microdata. LISSY allows registered users to submit programs using common statistical software packages, while respecting the confidentiality restrictions imposed by certain countries.
• Remote execution is enabled through two submission paths: a Job Submission Interface (JSI) or email.

The scale of the problem
• In a recent large study (Daniele Fanelli, PLoS ONE, May 2009):
• 1.97 per cent of scientists admitted to outright falsification of data
• 33.7 per cent admitted poor practices
- dropping data based on a "gut feeling"
- selectively reporting results that supported their hypotheses
• 70 per cent said they had seen colleagues doing this
• PubMed had 788 papers withdrawn after publication between 2000 and 2010, 70% because of error and 30% because of fraud.
• Japanese anesthesiologist Yoshitaka Fujii falsified data in 172 of his 212 papers published between 1993 and 2011.
• Shigeaki Kato has notched his 26th, 27th, and 28th retractions, all in Nature Cell Biology.
The three papers have been cited a total of 677 times.

The appeal of using partial data (De Vries, 2014)
• … you could “simplify”. After all, most of your results are in line with your predictions, so your theory is probably right. Why not leave those “aberrant” results out of the paper? There is probably a good reason why they turned out like that. Some anomaly. Nothing to do with your theory really.
• Nowhere in this process do you feel like you are being deceptive. You just know what type of papers are easiest to publish, so you chip off the “boring” complications to achieve a clearer, more interesting picture. Sadly, the complications are probably closer to messy reality. The picture you publish, while clearer, is much more likely to be wrong.

Core problem
• Lack of clarity about methodology
• Cf. Shoemaker, Tankard and Lasorsa
• No access to the code/syntax files
• No clarity regarding data recoding
• Ambiguity over interpretation

Proposed criteria for best practice in DART (ICPSR, Michigan, 2014)
• Cites all the evidence and methods upon which published claims rely;
• Makes available all evidence and methods upon which published claims rely, including numeric data, code, and all other materials necessary to replicate findings;
• Ensures that cited objects are available at the time of publication, subject to any ethical or legal limitations, through institutions with demonstrated capacity to provide long-term access;
• Recognizes that full access to data may not be possible when data are under external restriction (e.g., the data are classified, require confidentiality protections, or were obtained under a non-disclosure agreement);
• Upon request, provides data to editors and reviewers prior to publication for assessment only and under a strict assurance of confidentiality.

References
• Fanelli, D. (2009). How Many Scientists Fabricate and Falsify Research? A Systematic Review and Meta-Analysis of Survey Data. PLoS ONE 4(5): e5738.
doi:10.1371/journal.pone.0005738
• De Vries, R. (2014). When ‘exciting’ trumps ‘honest’, traditional academic journals encourage bad science. http://theconversation.com/when-exciting-trumps-honesttraditional-academic-journals-encourage-bad-science-29804
• ICPSR (2014). Research Transparency, Data Access, and Data Citation: A Call to Action for Scholarly Publications. http://datacommunity.icpsr.umich.edu/research-transparencydata-access-and-data-citation-call-action-scholarly-publications
• Lupia, A. and Alter, G. (2014). Data Access and Research Transparency in the Quantitative Tradition. PS: Political Science & Politics, 47, pp. 54-59. doi:10.1017/S1049096513001728.
• Shoemaker, P. J., Tankard, J. W., & Lasorsa, D. L. (2004). How to build social science theories. Thousand Oaks, CA: Sage.

Contact Details
• Prof. Michael Breen, Dean of Arts, Mary Immaculate College, University of Limerick
• Address: Mary Immaculate College, South Circular Road, Limerick, Ireland
• Email: michael.breen@mic.ul.ie
• Phone: +353 61 204972