Reporting Standards
Version 1, 17 Feb 2016, prepared by Thomas Hartung

This background document summarizes reporting standards and assessment frameworks that might be considered for evidence evaluation in the context of GRAS evaluations. It addresses cell culture, animal, and computer models; it does not extend to clinical or ecotoxicological studies. An obvious standard is the OECD Good Laboratory Practice (GLP) guidelines, but these are documentation rather than reporting standards and do not address scientific reporting in journals and similar venues. They also do not cover non-animal methods to the same extent. Especially for existing evidence to be considered under GRAS evaluations, an assessment of reporting quality will also be needed.

Cell Culture Work

Good Cell Culture Practice (GCCP)

The limited applicability of GLP to in vitro studies was first addressed in a European Centre for the Validation of Alternative Methods (ECVAM) workshop in 1998 (Cooper-Hannan et al. 1999). Parallel initiatives (1996 in Germany and 1999 in Bologna at the Third World Congress on Alternatives and Animal Use in the Life Sciences) led to a declaration toward Good Cell Culture Practice (GCCP) (Gstraunthaler and Hartung 1999): “The participants … call on the scientific community to develop guidelines defining minimum standards in cell and tissue culture, to be called Good Cell Culture Practice … should facilitate the interlaboratory comparability of in vitro results … encourage journals in the life sciences to adopt these guidelines...” A GCCP task force was then established, which produced two reports (Hartung et al. 2002; Coecke et al. 2005).

The maintenance of high standards is fundamental to all good scientific practice and is essential for ensuring the reproducibility, reliability, credibility, acceptance, and proper application of any results produced. The aim of GCCP is to reduce uncertainty in the development and application of in vitro procedures by encouraging the establishment of principles for greater international harmonization, standardization, and rational implementation of laboratory practices, nomenclature, quality control systems, safety procedures, and reporting, linked, where appropriate, to the application of the principles of Good Laboratory Practice (GLP). GCCP addresses issues related to:
– Characterization and maintenance of essential characteristics
– Quality assurance
– Recording
– Reporting
– Safety
– Education and training
– Ethics

The GCCP documents formed a major basis for a GLP advisory document by the Organisation for Economic Co-operation and Development (OECD) for in vitro studies (OECD, 2004b), which addresses:
– Test Facility Organization and Personnel
– Quality Assurance Program
– Facilities
– Apparatus, Materials, and Reagents
– Test Systems
– Test and Reference Items
– Standard Operating Procedures
– Performance of the Study
– Reporting of Study Results
– Storage and Retention of Records and Materials

Therefore, both guidance documents have much in common: the inherent variation of in vitro test systems calls for standardization, and both the GLP advisory document and the GCCP guidance are intended to support best practice in all aspects of the use of in vitro systems, including the use of cells and tissues. An illustrative encoding of the GCCP reporting categories is sketched below.
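As an illustration of how the GCCP reporting categories could be operationalized in a reporting-quality assessment, the following minimal Python sketch encodes them as a checklist and flags categories a report fails to cover. The category strings are paraphrased from the list above; the helper function and its pass/fail logic are illustrative assumptions, not part of GCCP itself.

```python
# Illustrative only: the GCCP reporting categories as a checklist,
# with a helper that flags missing items. Category names are
# paraphrased from the text; this is not an official GCCP schema.

GCCP_CATEGORIES = [
    "characterization and maintenance of essential characteristics",
    "quality assurance",
    "recording",
    "reporting",
    "safety",
    "education and training",
    "ethics",
]

def missing_categories(report_sections: set[str]) -> list[str]:
    """Return GCCP categories not covered by a report's sections."""
    covered = {s.lower() for s in report_sections}
    return [c for c in GCCP_CATEGORIES if c not in covered]

# Example: a report that omits safety and ethics documentation.
example = {
    "characterization and maintenance of essential characteristics",
    "quality assurance", "recording", "reporting",
    "education and training",
}
print(missing_categories(example))  # -> ['safety', 'ethics']
```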
Notably, a Good In Vitro Method Practice (GIVIMP) guidance is currently under development by ECVAM and the OECD, but details have not yet been published. The envisaged international guidance is intended to support the implementation of in vitro methods within a GLP environment for the regulatory human safety assessment of chemicals. GIVIMP will contribute to increased standardization and harmonization in the generation of in vitro information on test item safety. The guidance will further facilitate the application of the OECD Mutual Acceptance of Data agreement to data generated by in vitro methods and, as such, contribute to the avoidance of unnecessary additional testing. GIVIMP will take into account the requirements of the existing OECD guidelines and advisory documents to ensure that the guidance is complementary and fully in line with these issued documents.

When comparing GLP and GCCP, there are also some major differences: GLP still gives only limited guidance for in vitro work, and GLP cannot normally be implemented in academia because of its costs and lack of flexibility. GCCP, on the other hand, also aims to give guidance to journals and funding bodies.

All quality assurance of an in vitro system starts with its definition and standardization, which include:
– A definition of the scientific purpose of the method
– A description of its mechanistic basis
– The case for its relevance
– The availability of an optimized protocol, including:
  - standard operating procedures
  - specification of endpoints and endpoint measurements
  - derivation, expression, and interpretation of results (preliminary prediction model)
  - the inclusion of adequate controls
– An indication of limitations (preliminary applicability domain)
– Quality assurance measures

This standardization forms the basis for formal validation, as developed by ECVAM, adapted and expanded by ICCVAM and other validation bodies, and, finally, internationally harmonized by the OECD (OECD, 2005). Validation is the independent assessment of the scientific basis, the reproducibility, and the predictive capacity of a test. It was redefined in 2004 in the Modular Approach (Hartung et al. 2004) but needs to be seen as a continuous adaptation of the process to practical needs and a case-by-case assessment of what is feasible (Hartung 2007a; Leist et al. 2012).

Animal Work

Five articles on reporting the results of animal experiments were identified in a review in press (Samuel et al., in press). One was specific to toxicology (Beronius et al., 2014).

• Beronius, Molander, Rudén, Hanberg (2014)
This work proposed criteria for assessing the reliability and relevance of in vivo studies not conducted according to standardized toxicity test guidelines. A two-tiered approach for assessing reliability was developed. Tier 1 reliability criteria comprise 11 items, such as the chemical name/CAS number and source of the test compound, the number of animals per dose group, the dose levels/concentrations, the duration and frequency of administration, and the statistical methods. Studies that satisfy all the criteria in Tier 1 are then evaluated for reliability using the Tier 2 criteria, while those failing are regarded as having poor reporting quality and as such are excluded from the evidence used in risk assessment. The proposed Tier 2 reliability criteria comprise items in seven categories, for example purpose (e.g., description of the endpoints to be investigated); test substance (e.g., description of toxicokinetic properties); and animals, housing, and feed. A minimal sketch of this tiered screening logic is given below.
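The following Python sketch illustrates the tiered gating described above, under the assumption that Tier 1 acts as an all-or-nothing reporting gate before the Tier 2 evaluation. The criterion strings are paraphrased placeholders, not the authors' exact 11 items, and the logic is simplified for demonstration.

```python
# Illustrative only: a minimal sketch of the two-tier reliability
# screening proposed by Beronius et al. (2014). Criterion names are
# hypothetical placeholders; the pass/fail logic is simplified.

TIER1_CRITERIA = [
    "test substance name/CAS number and source reported",
    "number of animals per dose group reported",
    "dose levels/concentrations described",
    "duration and frequency of administration reported",
    "statistical methods described",
]

def passes_tier1(study: dict) -> bool:
    """Tier 1 is a gate: every criterion must be fulfilled."""
    return all(study.get(criterion, False) for criterion in TIER1_CRITERIA)

def evaluate(study: dict) -> str:
    if not passes_tier1(study):
        # Studies failing any Tier 1 item are excluded from the
        # evidence base on grounds of poor reporting quality.
        return "excluded: poor reporting quality"
    # Tier 2 would then score items in seven categories (purpose,
    # test substance, animals/housing/feed, ...) rather than gate them.
    return "proceed to Tier 2 reliability evaluation"

study = {c: True for c in TIER1_CRITERIA}
study["statistical methods described"] = False
print(evaluate(study))  # -> excluded: poor reporting quality
```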
A web-based tool was developed for the appraisal of reliability using the Tier 2 criteria; it translates an assessor's marks on the fulfillment of each criterion into a color scale. Finally, relevance is evaluated, guided by eight items that address aspects such as the relevance of the route of administration for human exposure, the appropriateness of the exposure timing for the investigated endpoints, and the use of a test substance representative of the substance being risk assessed. Furthermore, the authors proposed a reporting checklist of 16 criteria to support researchers in the design, conduct, and reporting of in vivo toxicity studies.

• Festing & Altman (2002)
Festing and Altman developed a checklist of criteria for reporting animal experiments. Their objective was to promote the “3Rs” framework (Replacement, Reduction, and Refinement) for the ethical use of animals. The checklist consists of three categories that should be addressed in a paper: animals, environment, and statistical analysis. For example, the following items should be reported with respect to the “animals” category: source (e.g., species and gender), transportation (e.g., period of acclimatization), genotype (e.g., strain name), and microbiological status (e.g., specified pathogen-free).

• Kilkenny, Browne, Cuthill, Emerson & Altman (2010)
The ARRIVE (Animal Research: Reporting of In Vivo Experiments) guidelines address the reporting of animal experiments. The goal of the guidelines is not to establish a standardized procedure or to mandate procedures for reporting but, rather, to improve the quality and utility of animal research through enhanced reporting of what was done and found during a study. The guidelines were developed by researchers, statisticians, and journal editors, and funded by the United Kingdom-based National Centre for the Replacement, Refinement and Reduction of Animals in Research (NC3Rs). The elements of the 20-item checklist are categorized under headings that follow the typical format of a scientific paper: Title, Abstract, Introduction, Methods, Results, and Discussion. The included items address ethical issues; study design; experimental procedures and specific characteristics of the animals used; details of housing and husbandry; sample size; experimental, statistical, and analytical methods; and scientific implications, generalizability, and funding.

• Hooijmans, de Vries, Leenaars & Ritskes-Hoitinga (2011)
The Gold Standard Publication Checklist (GSPC) provides a distillation of guidelines on the proper design and reporting of animal experiments and reflects feedback from experts in the field of animal science. The GSPC is intended to improve the quality of research involving animals, to help researchers replicate results, to reduce the number of animals used in research, and to improve animal welfare. The checklist comprises several items under four categories similar to those of the ARRIVE guidelines: Introduction, Methods, Results, and Discussion. For example, the guidelines recommend that the methods section address the following topics: the experimental design used; the experimental groups and controls used (such as species, genetic background, housing conditions, nutrition, etc.); the ethical and regulatory principles followed; the intervention employed (such as dose and/or frequency of intervention, administration route, etc.); and the desired outcome (such as descriptions of the parameters of interest and the statistical methods). A minimal sketch of how such reporting checklists can be encoded and scored is given below.
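To illustrate how checklists such as ARRIVE or the GSPC could be encoded and applied, the following Python sketch stores a small, paraphrased subset of items under paper-section headings and computes a simple completeness fraction. The item names and the scoring rule are illustrative assumptions; neither guideline prescribes a numeric score.

```python
# Illustrative only: one way to encode a reporting checklist such as
# ARRIVE as a data structure and compute a simple completeness score.
# The items below are a small, paraphrased subset, not the official
# 20-item ARRIVE checklist.

ARRIVE_LIKE_CHECKLIST = {
    "Title":      ["accurate and concise description of the study"],
    "Methods":    ["ethical statement", "study design",
                   "housing and husbandry", "sample size"],
    "Results":    ["baseline data", "numbers analysed"],
    "Discussion": ["interpretation", "generalizability", "funding"],
}

def completeness(reported_items: set[str]) -> float:
    """Fraction of checklist items that a manuscript reports."""
    all_items = [i for items in ARRIVE_LIKE_CHECKLIST.values() for i in items]
    hits = sum(1 for i in all_items if i in reported_items)
    return hits / len(all_items)

reported = {"study design", "sample size", "numbers analysed", "funding"}
print(f"{completeness(reported):.0%}")  # -> 40%
```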
• Landis, Amara, Asadullah, Austin, Blumenstein, Bradley, Crystal, Darnell, Ferrante, Fillit, Finkelstein, Fisher, Gendelman, Golub, Goudreau, Gross, Gubitz, Hesterlee, Howells, Huguenard, Kelner, Koroshetz, Krainc, Lazic, Levine, Macleod, McCall, Moxley, Narasimhan, Noble, Perrin, Porter, Steward, Unger, Utz & Silberberg (2012)
This guideline was proposed by major stakeholders in the US National Institute of Neurological Disorders and Stroke. The objective was to improve the quality of reporting of animal studies in grant applications and publications. The authors reached a consensus on reporting criteria that are regarded as a prerequisite for authors of grant applications and scientific publications. These criteria comprise four items: randomization (e.g., data should be collected and processed randomly), blinding (e.g., animal caretakers and investigators should be blinded), sample-size estimation (e.g., use of an appropriate sample size), and data handling (e.g., a priori description of inclusion and exclusion criteria).

Cell Culture and Animal Work

• Schneider, Schwarz, Burkholder, Kopp-Schneider, Edler, Kinsner-Ovaskainen, Hartung & Hoffmann (2009)
This paper proposed the Toxicological Data Reliability Assessment Tool (ToxRTool) as a means of introducing more objectivity into the assignment of Klimisch categories to individual studies. The ToxRTool provides comprehensive criteria and guidance for these assignments. This software-based tool comprises two parts, one for in vivo studies and the other for in vitro studies. There are five evaluation criteria groupings: (1) test substance identification, (2) test system characterization, (3) study design description, (4) study results documentation, and (5) plausibility of study design and data. Studies are assigned scores that determine their Klimisch code. Criteria that are considered essential (e.g., test substance identification and test concentration description) are given greater weight in the evaluation. The ToxRTool is nested within a Microsoft Office Excel® 2003 file that contains spreadsheets for the reliability evaluation of in vivo and in vitro toxicity studies, optional documentation of observations bearing on relevance (e.g., was the study conducted according to recent OECD or EU guidelines?), and detailed explanations of the criteria. The goal of this design is to improve transparency in the reliability evaluation of studies. The ToxRTool prototype was tested and improved through inter-rater testing (available for download at https://eurl-ecvam.jrc.ec.europa.eu/about-ecvam/archive-publications/toxrtool). A simplified sketch of this weighted scoring logic is given below.
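The following Python sketch caricatures the ToxRTool's weighted scoring, under the assumption that failing any essential criterion forces Klimisch category 3 (not reliable), while the remaining criteria decide between categories 1 and 2. The criterion names, weights, and thresholds are invented for illustration and do not reproduce the tool's actual scheme.

```python
# Illustrative only: a minimal sketch of ToxRTool-style scoring, in
# which criteria are marked fulfilled/not fulfilled and essential
# criteria carry decisive weight in assigning the Klimisch code.
# Criterion names and thresholds are invented for demonstration.

ESSENTIAL = {"test substance identified", "test concentrations described"}
OTHER = {"test system characterized", "study design described",
         "results documented", "design and data plausible"}

def klimisch_category(fulfilled: set[str]) -> int:
    """Map a set of fulfilled criteria to a Klimisch reliability code."""
    if not ESSENTIAL <= fulfilled:
        return 3  # not reliable: an essential criterion failed
    score = len(fulfilled & OTHER)
    # 1 = reliable without restriction; 2 = reliable with restrictions
    return 1 if score == len(OTHER) else 2

fulfilled = ESSENTIAL | {"results documented", "study design described"}
print(klimisch_category(fulfilled))  # -> 2 (reliable with restrictions)
```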
Computational Toxicology

Guidance relevant to assessing the reporting quality of (Q)SAR studies is provided in the OECD and ECHA guidelines below [see the “Mixed Guidance (Methodological and Reporting Quality)” section].

Mixed Guidance (Methodological and Reporting Quality)

• OECD 2007 (Guidance document on the validation of (quantitative) structure-activity relationship [(Q)SAR] models) (http://www.oecd.org/officialdocuments/publicdisplaydocumentpdf/?cote=env/jm/mono%282007%292&doclanguage=en)
The increasing calls for (Q)SARs to be developed and applied for regulatory purposes beyond screening/prioritization, such as under REACH, resulted in a number of activities being initiated to promote (Q)SARs. A 2002 workshop organized by the International Council of Chemical Associations’ Long-Range Research Initiative (ICCA-LRI) and held in Setubal, Portugal, brought together a diverse group of international stakeholders to formulate guiding principles for the development and application of (Q)SARs. These were named the Setubal principles (Jaworska, Comber, Auer, & Van Leeuwen, 2003). They were subsequently discussed and endorsed by the OECD and are now known as the OECD Principles for (Q)SAR Validation (OECD, 2004a). There are five such principles that should be assessed to evaluate the scientific validity (quality) of a (Q)SAR model: a defined endpoint, an unambiguous algorithm, a defined domain of applicability, appropriate statistical validation, and a mechanistic interpretation (where feasible). Each of these five principles has numerous components. Preliminary guidance for interpreting these principles was drafted by the European Commission Joint Research Centre (Worth et al., 2005). This guidance was taken up by the OECD and published as its own guidance in 2007 (OECD, 2007). The OECD guidance can be used retrospectively to evaluate the quality of (Q)SAR studies that apply these models.

Reporting formats underpinned by the OECD validation principles were developed under the auspices of the EU’s then Technical Committee for New and Existing Substances QSAR Working Group to characterize pertinent information about a given (Q)SAR model and its predictions. Two of the three reporting formats that were created and incorporated into the OECD and ECHA guidance on (Q)SARs (OECD, 2007; ECHA, 2008) are of relevance:

• (Q)SAR Model Reporting Format (QMRF): The information captured within a QMRF includes the QMRF author, the (Q)SAR model developer, the model type, the model algorithm, the endpoint being modeled and the descriptors used, the approach used to characterize the domain of applicability, the performance characteristics from internal/external validation, the mechanistic interpretation, and possible applications of the model. The level of detail will vary with different types of (Q)SAR models.

• (Q)SAR Prediction Reporting Format (QPRF): The QPRF addresses the question of how a predicted value is generated for a substance using the model described in the QMRF; it also addresses the evaluation of the prediction’s reliability. The information includes the substance identity and its structural representation, a description of how well the substance falls within the defined domain of applicability, and the extent to which there is agreement between the (Q)SAR predictions and the experimental data for relevant analogues.

Although these reporting formats were developed to support efforts at model validation, they are also relevant to ensuring or assessing the reporting quality of individual (Q)SAR studies; a schematic sketch of the two formats is given after the next entry.

• European Chemicals Agency (2008) (http://echa.europa.eu/documents/10162/13632/information_requirements_r6_en.pdf)
ECHA administers and oversees compliance with the REACH program. ECHA’s (Q)SAR validity and reporting guidance is aligned with the aforementioned OECD five principles of validation and reporting formats, respectively. The five principles of validation provide a conceptual framework for determining the adequacy of (Q)SAR models for regulatory purposes. The three types of OECD reporting formats, when used together, ensure a comprehensive description of (Q)SAR and other approaches used in the classification, labeling, and safety assessment of a given substance under REACH (ECHA, 2008).
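As a schematic illustration, the two reporting formats can be thought of as structured records, with each QPRF pointing back to the QMRF of the model that produced the prediction. The field names in the following Python sketch paraphrase the information elements listed above; the official templates are considerably more detailed.

```python
# Illustrative only: the QMRF and QPRF represented as simple data
# structures. Field names paraphrase the information elements in the
# text above; they are not the official template fields.

from dataclasses import dataclass

@dataclass
class QMRF:  # describes the model itself
    author: str
    model_developer: str
    model_type: str
    algorithm: str
    endpoint: str
    descriptors: list[str]
    applicability_domain: str
    validation_statistics: str
    mechanistic_interpretation: str

@dataclass
class QPRF:  # describes one prediction made with the model
    substance_id: str
    structural_representation: str  # e.g., a SMILES string
    within_domain: bool
    agreement_with_analogue_data: str
    model: QMRF  # each prediction points back to its model
```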
In practice, only the QMRF and QPRF are used. Under REACH, the reporting formats are designed to ensure transparency (unambiguous reports of estimation methods, predictions, and reasoning), consistency (information from different approaches should be reported in a common format), and acceptability (reporting of all relevant information needed to assess the adequacy and completeness of (Q)SAR information for a given substance or endpoint).

Criteria Summary

Guidance on the methodological and reporting quality of (Q)SAR studies is limited but authoritative, coming mostly from the OECD and ECHA. This guidance is anchored in the five principles of the (Q)SAR validation process. Each principle has several components that should be addressed and documented in a validation exercise (Table 3). For example, under the principle of having a defined endpoint, one would need to address experimental factors (e.g., species) and the health or ecological effect of interest (e.g., acute oral toxicity), among other considerations.

Given that this guidance was developed to support efforts at model validation, some components may need to be appropriately translated, or indeed may not apply, to the context of assessing the quality of individual (Q)SAR studies employing a given validated model. For example, validation principle four concerns statistical validation and relates to issues such as goodness of fit, sensitivity, internal validation techniques, and training and test sets. Considerable evidence on these issues would need to be marshaled in the context of an actual validation exercise, but this evidence could simply be referenced in the context of an individual application of a given model.

Specific formats have been developed to aid in the reporting of (Q)SAR studies. The (Q)SAR Prediction Reporting Format (QPRF) seems particularly suited to applications of validated models, as distinct from the actual validation exercise. It addresses issues such as the substance identity and its structural representation, a description of how well the substance falls within the defined domain of applicability, and the extent to which there is agreement between the (Q)SAR predictions and the experimental data for relevant analogues.

It remains to be determined how these (Q)SAR quality elements translate to the risk-of-bias framework from clinical medicine. One of the components of a defined (Q)SAR endpoint is data quality and variability (Table 3). How similar is the assessment of data quality and variability in this context to the assessment of methodological quality in clinical medicine? To what extent are (Q)SAR developers assessing the methodological and reporting quality of the underlying experiments on which their models are based (to avoid the problem of “garbage in, garbage out”)?

References

Beronius, A., Molander, L., Rudén, C., & Hanberg, A. (2014). Facilitating the use of non-standard in vivo studies in health risk assessment of chemicals: a proposal to improve evaluation criteria and reporting. Journal of Applied Toxicology: JAT, 34(6), 607–617. http://doi.org/10.1002/jat.2991

Coecke, S., Balls, M., Bowe, G., Davis, J., Gstraunthaler, G., Hartung, T., … Stokes, W. (2005). Guidance on good cell culture practice. A report of the second ECVAM task force on good cell culture practice. Alternatives to Laboratory Animals: ATLA, 33(3), 261–287.
Cooper-Hannan, R., Harbell, J. W., Coecke, S., Balls, M., Bowe, G., Cervinka, M., Clothier, R., Hermann, F., Klahm, L. K., de Lange, J., et al. (1999). The principles of good laboratory practice: application to in vitro toxicology studies. The report and recommendations of ECVAM Workshop 37. Alternatives to Laboratory Animals: ATLA, 27, 539–577.

Festing, M. F. W., & Altman, D. G. (2002). Guidelines for the design and statistical analysis of experiments using laboratory animals. ILAR Journal, 43(4), 244–258.

Gstraunthaler, G., & Hartung, T. (1999). Bologna declaration toward Good Cell Culture Practice. Alternatives to Laboratory Animals: ATLA, 27, 206.

Hartung, T., Balls, M., Bardouille, C., Blanck, O., Coecke, S., Gstraunthaler, G., Lewis, D., & the ECVAM Good Cell Culture Practice Task Force. (2002). Good Cell Culture Practice. ECVAM Good Cell Culture Practice Task Force Report 1. Alternatives to Laboratory Animals: ATLA, 30, 407–414.

Hooijmans, C., de Vries, R., Leenaars, M., & Ritskes-Hoitinga, M. (2011). The Gold Standard Publication Checklist (GSPC) for improved design, reporting and scientific quality of animal studies; GSPC versus ARRIVE guidelines. Laboratory Animals, 45(1), 61. http://doi.org/10.1258/la.2010.010130

Jaworska, J. S., Comber, M., Auer, C., & Van Leeuwen, C. J. (2003). Summary of a workshop on regulatory acceptance of (Q)SARs for human health and environmental endpoints. Environmental Health Perspectives, 111(10), 1358–1360.

Kilkenny, C., Browne, W., Cuthill, I., Emerson, M., & Altman, D. (2010). Improving bioscience research reporting: the ARRIVE guidelines for reporting animal research. Journal of Pharmacology and Pharmacotherapeutics, 1(2), 94. http://doi.org/10.4103/0976-500X.72351

Klimisch, H. J., Andreae, M., & Tillmann, U. (1997). A systematic approach for evaluating the quality of experimental toxicological and ecotoxicological data. Regulatory Toxicology and Pharmacology: RTP, 25(1), 1–5. http://doi.org/10.1006/rtph.1996.1076

Krauth, D., Woodruff, T. J., & Bero, L. (2013). Instruments for assessing risk of bias and other methodological criteria of published animal studies: a systematic review. Environmental Health Perspectives. http://doi.org/10.1289/ehp.1206389

Küster, A., Bachmann, J., Brandt, U., Ebert, I., Hickmann, S., Klein-Goedicke, J., … Rechenberg, B. (2009). Regulatory demands on data quality for the environmental risk assessment of pharmaceuticals. Regulatory Toxicology and Pharmacology: RTP, 55(3), 276–280. http://doi.org/10.1016/j.yrtph.2009.07.005

Landis, S. C., Amara, S. G., Asadullah, K., Austin, C. P., Blumenstein, R., Bradley, E. W., … Silberberg, S. D. (2012). A call for transparent reporting to optimize the predictive value of preclinical research. Nature, 490(7419), 187–191. http://doi.org/10.1038/nature11556

Lavelle, K. S., Robert Schnatter, A., Travis, K. Z., Swaen, G. M. H., Pallapies, D., Money, C., … Vrijhof, H. (2012). Framework for integrating human and animal data in chemical risk assessment. Regulatory Toxicology and Pharmacology: RTP, 62(2), 302–312. http://doi.org/10.1016/j.yrtph.2011.10.009

Levac, D., Colquhoun, H., & O’Brien, K. K. (2010). Scoping studies: advancing the methodology. Implementation Science, 5(1), 69. http://doi.org/10.1186/1748-5908-5-69

Linkov, I., Loney, D., Cormier, S., Satterstrom, F. K., & Bridges, T. (2009). Weight-of-evidence evaluation in environmental assessment: review of qualitative and quantitative approaches. The Science of the Total Environment, 407(19), 5199–5205. http://doi.org/10.1016/j.scitotenv.2009.05.004
Mallen, C., Peat, G., & Croft, P. (2006). Quality assessment of observational studies is not commonplace in systematic reviews. Journal of Clinical Epidemiology, 59(8), 765–769. http://doi.org/10.1016/j.jclinepi.2005.12.010

Maxim, L., & van der Sluijs, J. P. (2014). Qualichem in vivo: a tool for assessing the quality of in vivo studies and its application for bisphenol A. PloS One, 9(1), e87738. http://doi.org/10.1371/journal.pone.0087738

Mayer, D. (2004). Essential Evidence-Based Medicine. Cambridge University Press. Retrieved November 5, 2014, from http://www.cambridge.org/us/academic/subjects/medicine/epidemiology-publichealth-and-medical-statistics/essential-evidence-based-medicine

McNutt, M. (2014). Reproducibility. Science, 343(6168), 229. http://doi.org/10.1126/science.1250475

Miller, G. W. (2014). Improving reproducibility in toxicology. Toxicological Sciences, 139(1), 1–3. http://doi.org/10.1093/toxsci/kfu050

Money, C. D., Tomenson, J. A., Penman, M. G., Boogaard, P. J., & Jeffrey Lewis, R. (2013). A systematic approach for evaluating and scoring human data. Regulatory Toxicology and Pharmacology: RTP, 66(2), 241–247. http://doi.org/10.1016/j.yrtph.2013.03.011

National Research Council. (2011). Review of the Environmental Protection Agency’s Draft IRIS Assessment of Formaldehyde. The National Academies Press. Retrieved from http://www.nap.edu/openbook.php?record_id=13142

NICE. Guidance by type. Retrieved May 18, 2014, from http://www.nice.org.uk/

Nieto, A., Mazon, A., Pamies, R., Linana, J. J., Lanuza, A., Jiménez, F. O., … Nieto, F. J. (2007). Adverse effects of inhaled corticosteroids in funded and nonfunded studies. Archives of Internal Medicine, 167(19), 2047–2053. http://doi.org/10.1001/archinte.167.19.2047

O’Connor, A., Lovei, G., Eales, J., Frampton, G., Glanville, J., Pullin, A. S., & Sargeant, J. (2012). External scientific report: implementation of systematic reviews. Retrieved May 10, 2014, from http://www.efsa.europa.eu/en/supporting/pub/367e.htm

O’Connor, A. M., & Sargeant, J. M. (2014). Critical appraisal of studies using laboratory animal models. ILAR Journal, 55(3), 405–417. http://doi.org/10.1093/ilar/ilu038

OECD. (2004a). Report from the expert group on (quantitative) structure-activity relationships [(Q)SARs] on the principles for the validation of (Q)SARs. OECD Series on Testing and Assessment, No. 49. Retrieved January 5, 2015, from http://www.oecd.org/officialdocuments/publicdisplaydocumentpdf/?cote=env/jm/mono%282004%2924&doclanguage=en

OECD. (2004b). The application of the principles of GLP to in vitro studies. OECD Series on Principles of Good Laboratory Practice and Compliance Monitoring, No. 14. Paris: Organisation for Economic Co-operation and Development. DOI: 10.1787/9789264084971-en. Retrieved from http://www.oecd-ilibrary.org/content/book/9789264084971-en

OECD. (2007). Guidance document on the validation of (quantitative) structure-activity relationship [(Q)SAR] models. OECD Series on Testing and Assessment, No. 69, ENV/JM/MONO(2007)2. Paris: OECD Environment, Health and Safety Publications. Retrieved from http://www.oecd.org/dataoecd/55/35/38130292.pdf

OECD. (2014). Guidance on grouping of chemicals. OECD Series on Testing and Assessment, No. 194. Paris: Organisation for Economic Co-operation and Development. Retrieved January 2, 2015, from http://www.oecd.org/officialdocuments/publicdisplaydocumentpdf/?cote=env/jm/mono%282014%294&doclanguage=en
Patlewicz, G., Aptula, A. O., Roberts, D. W., & Uriarte, E. (2008). A minireview of available skin sensitization (Q)SARs/expert systems. QSAR & Combinatorial Science, 27(1), 60–76. http://doi.org/10.1002/qsar.200710067

Patlewicz, G., Ball, N., Becker, R. A., Booth, E. D., Cronin, M. T. D., Kroese, D., … Hartung, T. (2014). Read-across approaches: misconceptions, promises and challenges ahead. ALTEX, 31(4), 387–396.

Rooney, A. A., Boyles, A. L., Wolfe, M. S., Bucher, J. R., & Thayer, K. A. (2014). Systematic review and evidence integration for literature-based environmental health science assessments. Environmental Health Perspectives, 122(7), 711–718. http://doi.org/10.1289/ehp.1307972

Sanderson, S., Tatt, I. D., & Higgins, J. P. T. (2007). Tools for assessing quality and susceptibility to bias in observational studies in epidemiology: a systematic review and annotated bibliography. International Journal of Epidemiology, 36(3), 666–676. http://doi.org/10.1093/ije/dym018

Schmidt, C. W. (2014). Research wranglers: initiatives to improve reproducibility of study findings. Environmental Health Perspectives, 122(7), A188–191. http://doi.org/10.1289/ehp.122-A188

Schneider, K., Schwarz, M., Burkholder, I., Kopp-Schneider, A., Edler, L., Kinsner-Ovaskainen, A., … Hoffmann, S. (2009). “ToxRTool”, a new tool to assess the reliability of toxicological data. Toxicology Letters, 189(2), 138–144. http://doi.org/10.1016/j.toxlet.2009.05.013

Schulz, K. F., Chalmers, I., Hayes, R. J., & Altman, D. G. (1995). Empirical evidence of bias. Dimensions of methodological quality associated with estimates of treatment effects in controlled trials. JAMA, 273(5), 408–412.

Stephens, M. L., Andersen, M., Becker, R. A., Betts, K., Boekelheide, K., Carney, E., … Zurlo, J. (2013). Evidence-based toxicology for the 21st century: opportunities and challenges. ALTEX, 30(1), 74–104.

Sterne, J. A. C., Higgins, J. P. T., & Reeves, B. C., on behalf of the development group for ACROBAT-NRSI. (2014). A Cochrane risk of bias assessment tool for non-randomized studies of interventions (ACROBAT-NRSI), Version 1.0.0, 24 September 2014. Available from http://www.riskofbias.info [accessed 04/12/2015].

Thayer, K. A., Heindel, J. J., Bucher, J. R., & Gallo, M. A. (2012). Role of environmental chemicals in diabetes and obesity: a National Toxicology Program workshop review. Environmental Health Perspectives, 120(6), 779–789. http://doi.org/10.1289/ehp.1104597

Thayer, K. A., Wolfe, M. S., Rooney, A. A., Boyles, A. L., Bucher, J. R., & Birnbaum, L. S. (2014). Intersection of systematic review methodology with the NIH reproducibility initiative. Environmental Health Perspectives, 122(7), A176–A177. http://doi.org/10.1289/ehp.1408671

Tunkel, J., Mayo, K., Austin, C., Hickerson, A., & Howard, P. (2005). Practical considerations on the use of predictive models for regulatory purposes. Environmental Science & Technology, 39(7), 2188–2199.

Van Luijk, J., Bakker, B., Rovers, M. M., Ritskes-Hoitinga, M., de Vries, R. B. M., & Leenaars, M. (2014). Systematic reviews of animal studies: missing link in translational research? PLoS ONE, 9(3), e89981. http://doi.org/10.1371/journal.pone.0089981
Viswanathan, M., Ansari, M. T., Berkman, N. D., Chang, S., Hartling, L., McPheeters, M., … Treadwell, J. R. (2008). Assessing the risk of bias of individual studies in systematic reviews of health care interventions. In Methods Guide for Effectiveness and Comparative Effectiveness Reviews. Rockville (MD): Agency for Healthcare Research and Quality (US). Retrieved from http://www.ncbi.nlm.nih.gov/books/NBK91433/

Wells, Shea, O’Connell, Peterson, Welch, Losos, & Tugwell. (2004). Newcastle-Ottawa Scale (NOS). http://www.ohri.ca/programs/clinical_epidemiology/oxford.asp

West, S., King, V., Carey, T. S., Lohr, K. N., McKoy, N., Sutton, S. F., & Lux, L. (2002, April). Systems to rate the strength of scientific evidence. Retrieved May 12, 2014, from http://www.ncbi.nlm.nih.gov/books/NBK33881/

Worth, A., Bassan, A., Gallegos, A., Netzeva, T., Patlewicz, G., Pavan, M., … Vracko, M. (2005). The characterisation of (quantitative) structure-activity relationships: preliminary guidance.