Structure Validation Challenges in Chemical Crystallography
Ton Spek
Utrecht University, The Netherlands.
Madrid, Aug. 26, 2011

Validation History
• Structure validation of data supplied in computer-readable CIF format was pioneered by Acta Cryst. C (Syd Hall et al., 1990s).
• Initially, the numerical checking of papers submitted to Acta C in CIF format was done by the Chester staff.
• Subsequently, automated checking of the CIF for data consistency, data completeness and validity was introduced (checkCIF).
• PLATON facilities to check for Missed Symmetry and VOIDS were added later on.
• These were soon followed by numerous other PLATON-based tests (PLATxxx) of the reported structure (currently more than 400).

checkCIF/PLATON FCF Validation
• Fo/Fc reflection file deposition and archival in CIF format (FCF) was made mandatory early on for Acta Cryst. papers.
• Useful for subsequent analysis of possibly unique data.
• CIF + FCF checking was added to the IUCr CheckCIF/PLATON suite in 2010.
• Major chemical journals now require CIF deposition and validation reports but not (yet) the deposition of reflection data.
• The CCDC now accepts FCFs for deposition.

Why Automated Structure Validation
• The large volume of new and routine structure reports submitted for publication.
• The limited number of experienced and available crystallographic referees for validation.
• Detection of errors due to the black-box use of crystallography by non-crystallographers.
• Setting standards of quality and reliability.
• Automated detection of unusual, though not necessarily erroneous, issues that need special attention (ALERTS A, B, C, G).
• Sadly: the need to detect fraudulent structure reports.

Systematic Fraud
• A massive fraud was detected in late 2009, involving structures mainly published around 2007 in Acta Cryst. E. (Soon 200 retractions!)
• Before 2010, nobody was prepared for serious and systematic fraud in this non-competitive field of routine structures.
• Many deviations from the expected results can often be explained as errors, inexperience or poor data.
• Several retractions before 2010 might, in hindsight, concern fraudulent structures and not errors.
• Ongoing testing of our validation software on the archived data for structures published in Acta E often indicated suspect structures needing a more detailed investigation.
• It was only by following up on one such strange structure report with an analysis of all structures published by the authors of that paper that a fraud pattern emerged.
• It was discovered that the same data set was used to publish a series of invented isomorphous structures.
• Full story: Acta Cryst. E (2010) editorial and a PowerPoint presentation by the E-section editor Jim Simpson (IUCr website).

Bogus Variations (with Hirshfeld ALERTS) on the Published Structure 2-hydroxy-3,5-dinitrobenzoic acid (ZAJGUM)
[Figure panels: OH => NH2; NO2 => COOH; OH => F; H2O => NH3]

Fraud Detection Tools
• Generalized Hirshfeld Rigid Bond Test.
• CIF versus FCF data checking.
• Scatter plots of the reflection data of the same or related structure(s).
• Look in difference maps for unusual features.
• SHELXL re-refinement using the supplied CIF & FCF data.
• Check in the CSD for related structures.
• Two case studies that illustrate the use of the above validation and analysis tools follow.

Example 1: Error or Fraud?

Structure I, Submitted to Acta Cryst. (2011)

PLATON Report Part 1

PLATON Report Part 2

Related Structure from the CSD: Structure II

Structure Report for II

Scatter Plots I(obs) versus I(calc)
[Figure panels: (I); (II)]

Analysis
• Structure (II) has no validation issues.
• The C-CH3 distance in (II) is 1.50 Å, as expected.
• The ‘C-F’ distance in (I) is 1.50 Å and not the expected 1.35 Å.
• Conclusion: Structure (I) is the CH3 variety and not F.
• The data sets of (I) & (II) are not identical (see next).
• Data set (I) is likely based on the CH3 compound.
• Fraud or error? DIFABS file error?
• The authors of (I) confirmed an error, having believed the external chemists' proposal. The paper was retracted.

Scatter Plots of 2 Data Sets
[Figure panels: Two Unrelated Data Sets; Two Identical Data Sets]

CIF versus FCF Data Check
• The R & S values in the three lines # R= should be identical within rounding error.
• The reported and calculated residual density ranges should also be nearly identical.
• This is the case in the first example but not in the second, where the CIF & FCF data do not match.

Example 2: Iron(III) Complex

Fe(III) Validation Part 1

Fe(III) Validation Part 2

Example 2: Difference Density Map

Fe Structure Re-refined

Conclusion?
• Structure now O.K. after an erratum?
• Search for similar (isomorphous) structures in the CSD.
• Yes, there is an isomorphous Mn complex published by a different set of authors from a different university.
• Let us compare both structures.

Isomorphous Mn(III) Complex

Mn Structure Validation Part 1

Mn Validation Part 2

Scatter Plot Fe versus Mn I(obs)

Fe and Mn Data Sets Identical!

Analysis on Fe/Mn Structures
• The displacement parameters in the CIF for the H2O molecule in the Fe complex are different from those used in the final refinement.
• The reflection sets are identical for papers from two different sets of authors and locations.
• CSD: unusual coordination distances.
• Fraud or error?
• Withdraw/retract one or both?

Validation Challenges
• Avoid false positive and false negative ALERTS.
• Disordered structures (true or artifact).
• Handling of twinning (data names missing).
• Powder structure validation (experts needed).
• Incommensurate structure validation (experts needed).
• Fabricated reflection data: can we detect them?
• Education: what is the meaning of an ALERT?
• Should validation criteria be different for structures published in chemical journals?

Concluding Remarks
• PLATON includes a standalone validation tool. It is part of the web-based IUCr CheckCIF/PLATON tool that is capably managed by Mike Hoyland (IUCr).
• Validation is still a learning process.
• Chemical insight might be very helpful, and often decisive, as a validation tool.
• Deposition of structure factors should be a requirement for all journals (the CCDC now accepts those along with the CIF).

Thanks To
• Martin Lutz and many others for taking the time to bring various unresolved issues to my attention with actual data.
• Send to a.l.spek@uu.nl
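
Sketch: Comparing Two Reflection Data Sets
The scatter plots of two data sets used in the case studies (unrelated versus identical sets) can be approximated numerically. What follows is a minimal Python sketch, not PLATON's implementation. It assumes each data set is already available as a dictionary mapping (h, k, l) to I(obs). The function names, the intensities and the 1% residual threshold are all hypothetical and chosen only for illustration.

```python
import math


def match_reflections(set_a, set_b):
    """Pair the intensities of reflections (h, k, l) present in both sets."""
    common = sorted(set(set_a) & set(set_b))
    return [(set_a[hkl], set_b[hkl]) for hkl in common]


def pearson(pairs):
    """Pearson correlation coefficient of the paired intensities."""
    n = len(pairs)
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    s_ab = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    s_aa = sum((a - mean_a) ** 2 for a, _ in pairs)
    s_bb = sum((b - mean_b) ** 2 for _, b in pairs)
    return s_ab / math.sqrt(s_aa * s_bb)


def scaled_residual(pairs):
    """Sum|Ia - k*Ib| / Sum(Ia), with k the least-squares scale factor."""
    k = sum(a * b for a, b in pairs) / sum(b * b for _, b in pairs)
    return sum(abs(a - k * b) for a, b in pairs) / sum(a for a, _ in pairs)


def looks_identical(set_a, set_b, max_residual=0.01):
    """Flag two sets as suspiciously identical (threshold is an assumption)."""
    pairs = match_reflections(set_a, set_b)
    return len(pairs) > 0 and scaled_residual(pairs) <= max_residual


# Made-up intensities, for illustration only.
fe_set = {(1, 0, 0): 100.0, (0, 1, 0): 250.0, (0, 0, 1): 40.0, (1, 1, 0): 900.0}
# Same measurements on a different scale: lies on a straight line in a scatter plot.
mn_set = {hkl: 2.0 * i_obs for hkl, i_obs in fe_set.items()}
# Unrelated measurements: no straight-line relation.
other_set = {(1, 0, 0): 500.0, (0, 1, 0): 30.0, (0, 0, 1): 700.0, (1, 1, 0): 120.0}

print("Fe vs Mn identical:", looks_identical(fe_set, mn_set))
print("Fe vs other identical:", looks_identical(fe_set, other_set))
print("Fe vs Mn correlation:", pearson(match_reflections(fe_set, mn_set)))
```

Two independently measured crystals, even of isomorphous compounds, should show scatter from counting statistics and from the chemical difference. A residual close to zero after scaling is therefore a signal to investigate further, not proof of fraud.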