CheckCIF/PLATON
Crystal Structure Validation
Ton Spek
Utrecht University,
The Netherlands.
Goettingen, 7-Sep, 2011
Data Collection around 1966
Nonius AD3 Diffractometer
One data set: weeks !
~1966, Electrologica X8 ALGOL60 ‘Mainframe’ (<1MHz)
16kW
Plotter
Operator
Output
Input
Console
Multiple Hours of computing time per structure
Flexowriter for the creation and editing of programs and data
Data Storage in the Past
Direct Methods ALGOL60 Program AUDICE on Papertape
Archival of Model Parameters
in a Publication (Acta Cryst.)
Archival of Reflection Data in
a Publication (Acta Cryst.)
Problems Around 1990
• Multiple Data Storage Media (Often hardcopy
only or microfilm).
• No Standard Computer Readable Format for
Archival and Data Exchange.
• Data Entry of Published data done by Retyping.
• No easy Numerical Checking by Referees etc.
• CSD Database Archival by Retyping from the
published paper.
• Multiple typos and inconsistencies in the
Published Data.
• Often incomplete information reported.
The CIF Solution
• CIF-Standard Proposal for data archival by
S.R. Hall, F.H. Allen, I.D. Brown (1991). Acta
Cryst. A47, 655-685.
• Adopted by the IUCr
• Implemented in the Xtal package (Hall,
Stewart et al.).
• Adopted early by the author of the
nowadays most commonly used refinement
program SHELXL (G.M.Sheldrick)
CIF Example File
CIF Constructs
• data_name
where name is the chosen identifier of the data
• Data associations e.g.
_cell_length_a 16.6392(2)
_diffrn_radiation_source ‘sealed tube’
• Repetition (loop)
loop_
_symmetry_equiv_pos_as_xyz
‘x, y, z’
‘-x, y+1/2, -z’
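The three constructs above (data block, name/value association, loop) can be read with very little code. The following is a minimal sketch only, not a replacement for a real CIF parser or for enCIFer: the function name `parse_simple_cif` is hypothetical, values are assumed to use plain ASCII quotes, and semicolon text fields and save frames are ignored.

```python
import shlex


def parse_simple_cif(text):
    """Parse data_ blocks, single-line items and loop_ tables into dicts.

    Deliberately simplified: semicolon text fields, save frames and values
    containing unbalanced quote characters are not handled.
    """
    tokens = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            # shlex keeps a quoted value such as 'x, y, z' as one token
            tokens.extend(shlex.split(stripped))

    def starts_new_item(token):
        low = token.lower()
        return token.startswith("_") or low == "loop_" or low.startswith("data_")

    blocks = {}
    block = None
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token.lower().startswith("data_"):
            block = blocks.setdefault(token[5:], {})
            i += 1
        elif block is None:
            raise ValueError("item found before any data_ block")
        elif token.lower() == "loop_":
            i += 1
            names = []
            while i < len(tokens) and tokens[i].startswith("_"):
                names.append(tokens[i])
                i += 1
            values = []
            while i < len(tokens) and not starts_new_item(tokens[i]):
                values.append(tokens[i])
                i += 1
            # Values are listed row by row; slice them back into columns.
            for column, name in enumerate(names):
                block[name] = values[column::len(names)]
        elif token.startswith("_") and i + 1 < len(tokens):
            block[token] = tokens[i + 1]
            i += 2
        else:
            raise ValueError(f"unexpected token: {token!r}")
    return blocks


sample = """
data_example
_cell_length_a 16.6392(2)
loop_
_symmetry_equiv_pos_as_xyz
'x, y, z'
'-x, y+1/2, -z'
"""
cif = parse_simple_cif(sample)
print(cif["example"]["_cell_length_a"])
print(cif["example"]["_symmetry_equiv_pos_as_xyz"])
```

Numerical values keep their standard uncertainty in parentheses as a string; splitting `16.6392(2)` into value and s.u. is left out of this sketch.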
Construct for Text
• Text can be included between semicolons
• Used for Acta Cryst. Section C & E Abstract and
Comment sections
• Example
_publ_section_comment
;
This paper presents the first example
of a very important compound.
;
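A text field opens and closes with a semicolon in the first column of a line. The sketch below shows one way to pull such fields out of a CIF. The function name `read_text_fields` is hypothetical, and the code assumes that the data name stands alone on the line before the opening semicolon.

```python
def read_text_fields(text):
    """Collect semicolon-delimited text fields, keyed by their data name."""
    fields = {}
    name = None      # data name still waiting for its value
    buffer = None    # lines of the text field being read, or None
    for line in text.splitlines():
        if buffer is not None:
            if line.startswith(";"):
                # closing semicolon: store the collected text
                fields[name] = "\n".join(buffer)
                name, buffer = None, None
            else:
                buffer.append(line)
        elif line.startswith(";") and name is not None:
            # opening semicolon; text may start on the same line
            rest = line[1:]
            buffer = [rest] if rest.strip() else []
        elif line.strip().startswith("_"):
            parts = line.split()
            # only a data name alone on its line can take a text field
            name = parts[0] if len(parts) == 1 else None
    return fields


sample = """_publ_section_comment
;
This paper presents the first example
of a very important compound.
;
"""
comment = read_text_fields(sample)["_publ_section_comment"]
print(comment)
```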
CIF Completion
• CIF Files are created by the refinement
program (e.g. SHELXL)
• Missing Data can be added with a Text
Editor or with enCIFer (from the CCDC).
• The Syntax can be checked with a locally
installed version of the program enCIFer
(Freely Available: www.ccdc.cam.ac.uk)
PROGRAM
enCIFer
Missing Data
Note on Editing the CIF
• The Idea of editing the CIF is to add
missing information to the CIF.
• Some Acta Cryst. authors have been found
to polish away less attractive numerical values
(including R-values, e.g. 0.0975 => 0.0475).
This leaves traces, is now generally detected
(also in retrospect) by the validation
software, and is not good for the career of the
culprit…
CIF Validation History
• Structure Validation of data supplied in computer
readable CIF format was pioneered by Acta Cryst. C
(Syd Hall et al., 1990s).
• Initially the numerical checking of papers submitted
to Acta C in CIF format was done by the IUCr
Chester staff.
• Subsequently automated checking of the CIF for data
consistency, data completeness and validity was
introduced (checkCIF) (Non PLATxxx ALERTS).
• PLATON facilities to check for Missed Symmetry
and VOIDS were added soon after.
• This was followed by also including the numerous
other PLATON based tests (PLATxxx) of the reported
structure (currently more than 400).
checkcif/PLATON
FCF Validation
• Fo/Fc reflection file deposition and archival in
CIF format (FCF) was made mandatory early
on for Acta Cryst. papers.
• FCFs are useful for subsequent analysis of
possibly unique data.
• CIF + FCF checking was added in 2010 into
the IUCr CheckCIF/PLATON suite.
• Major chemical journals now require CIF
deposition and validation reports but (not yet)
the deposition of reflection data.
• The CCDC now accepts FCFs for deposition.
Reflection CIF (FCF)
Why Automated Structure Validation
• The large volume of new and routine structure reports
submitted for publication.
• The limited number of experienced and available
crystallographic referees for validation.
• Detection of errors due to the black box use of
crystallography by non-crystallographers.
• Setting standards of quality and reliability.
• Automated detection of unusual though not
necessarily erroneous issues that need special
attention (ALERTS A,B,C,G).
• Sadly: The need to Detect Fraudulent structure reports.
ALERT LEVELS
CheckCif Report in terms of a list of ALERTS




ALERT A – Could Indicate a Serious Problem – Consider
Carefully (Correct it or explain why it is Correct)
ALERT B – Might Indicate a Potentially Serious Problem
ALERT C – Check to Ensure it is O.K. & Not because of an
oversight.
ALERT G – General Info. Check that it is not something
Unexpected.
ALERT TYPES
1 - CIF Construction/Syntax errors,
Missing or Inconsistent Data.
2 - Indicators that the Structure Model
may be Wrong or Deficient.
3 - Indicators that the quality of the results
may be low.
4 - Cosmetic Improvements, Queries and
Suggestions.
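Each ALERT in a checkCIF report carries its type (1 to 4) and level (A, B, C, G) in its label. The sketch below tallies a report by level and type. It assumes labels of the form `PLAT029_ALERT_3_A` at the start of a line; the two messages in the sample are invented placeholders, not real checkCIF output, and the names `parse_alerts` and `ALERT_RE` are hypothetical.

```python
import re
from collections import Counter

LEVELS = {
    "A": "Could indicate a serious problem",
    "B": "Might indicate a potentially serious problem",
    "C": "Check to ensure it is O.K.",
    "G": "General info",
}
TYPES = {
    1: "CIF construction/syntax, missing or inconsistent data",
    2: "Structure model may be wrong or deficient",
    3: "Quality of the results may be low",
    4: "Cosmetic improvements, queries and suggestions",
}

# Assumed label layout: <test>_ALERT_<type>_<level> followed by the message.
ALERT_RE = re.compile(r"^(\w+?)_ALERT_([1-4])_([ABCG])\s+(.*)$")


def parse_alerts(report):
    """Return a list of (test, type, level, message) tuples."""
    alerts = []
    for line in report.splitlines():
        match = ALERT_RE.match(line.strip())
        if match:
            test, kind, level, message = match.groups()
            alerts.append((test, int(kind), level, message.strip()))
    return alerts


sample_report = """
PLAT029_ALERT_3_A placeholder message about data completeness
PLAT910_ALERT_3_G placeholder message about missing reflections
some unrelated line
"""
alerts = parse_alerts(sample_report)
by_level = Counter(level for _, _, level, _ in alerts)
for test, kind, level, message in alerts:
    print(f"{test}: level {level} ({LEVELS[level]}), type {kind} ({TYPES[kind]})")
```

Sorting the tuples by level puts the A and B ALERTS, which need a response first, at the top of the list.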
Which Key Issues are Addressed
Missed symmetry (“being Marshed”)
Wrong chemistry (Misassigned atom types)
Too many, too few or misplaced H-atoms
Missed solvent accessible voids in the structure
Missed Twinning
Absolute structure issues
Data quality and completeness

Common CIF Problem
• There exists a frequent misunderstanding
about the correct specification of the
‘population’ parameter value in the CIF for
an atom on a special position leading to
composition ALERTS.
• E.g. A fully occupied position of an atom on
an inversion centre has to be specified with
0.5 in the .res and 1.0 in the CIF.
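The factor between the two conventions is the order of the site symmetry: the refinement program's site occupation factor includes the special-position multiplicity, while the CIF occupancy does not. A minimal sketch of the arithmetic follows; the function names and the P2₁/c-style numbers (general multiplicity 4, inversion centre of order 2) are illustrative assumptions.

```python
def cif_occupancy(res_sof, site_symmetry_order):
    """Convert a .res site occupation factor into the CIF occupancy."""
    return res_sof * site_symmetry_order


def res_sof(occupancy, site_symmetry_order):
    """Convert a CIF occupancy into the .res site occupation factor."""
    return occupancy / site_symmetry_order


def atoms_per_cell(occupancy, general_multiplicity, site_symmetry_order):
    """Number of atoms in the unit cell contributed by one site."""
    site_multiplicity = general_multiplicity / site_symmetry_order
    return occupancy * site_multiplicity


# Fully occupied atom on an inversion centre (site symmetry order 2)
print(cif_occupancy(0.5, 2))       # value that belongs in the CIF
print(atoms_per_cell(1.0, 4, 2))   # correct CIF value: 2 atoms per cell
print(atoms_per_cell(0.5, 4, 2))   # 0.5 copied from the .res: only 1 atom
```

Copying 0.5 into the CIF halves the atom count derived from it, which is what triggers the composition ALERTS.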
Common Validation Problems
• CIF and FCF not from the final refinement
• SHELXL defaults left unchanged
• Completeness (up to 25 degrees)-do not cut
• Data names in CIF and FCF not identical
• ‘Non-standard’ reflection CIF’s
• Twinning, Powder, Incommensurate Struct.
• Improper parameter transformations (Uij’s)
• DAMP 0 0
Validation with PLATON
- Details: www.cryst.chem.uu.nl/platon
- Driven by the file CHECK.DEF with criteria,
ALERT messages and advice.
- Use (UNIX): platon -u structure.cif
- Result on file: structure.chk and structure.ckf
- Applicable to CIFs and CCDC-FDAT files
Two ALERTS related to the misplaced Hydrogen Atom
ADVICE
- Validation should not be postponed to the
publication phase. All validation issues should be
taken care of during the analysis.
- Everything unusual in a structure is suspect,
mostly incorrect (artifact) and should be
investigated and discussed in great detail and
supported by independent evidence.
- The CSD can be very helpful when looking for
possible precedents (but be careful)
Systematic Fraud
• A massive fraud was detected in late 2009 of structures mainly
published around 2007 in Acta Cryst. E. (Soon 200 retractions !)
• Before 2010, nobody was prepared for serious and systematic fraud in this
non-competitive field of routine structures.
• Many deviations from the expected results can often be explained as
errors, inexperience or due to poor data.
• Several retractions before 2010 might in hindsight concern fraudulent
structures and not errors.
• Ongoing testing of our validation software on the archived data for
structures published in Acta E often indicated suspect structures
needing a more detailed investigation.
• It was only by following up on one such strange structure report
with an analysis of all structures published by the authors of that
paper that a fraud pattern emerged.
• It was discovered that the same data set was used to publish a series
of invented isomorphous structures.
Bogus Variations (with Hirshfeld ALERTS) on the Published Structure
2-hydroxy-3,5-dinitrobenzoic acid (ZAJGUM)
OH=>NH2
NO2=>COOH
OH => F
H2O => NH3
Error and Fraud Detection Tools
• Generalized Hirshfeld Rigid Bond Test.
• CIF versus FCF data checking.
• Scatter Plots of the reflection data of the same or
related structure(s).
• Look in Difference Maps for unusual features.
• SHELXL re-refinement using the supplied CIF &
FCF data.
• Check in the CSD for related structures.
• Two case studies that illustrate the use of the above
validation and analysis tools follow.
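The first tool in the list rests on a simple idea: two bonded atoms should vibrate by nearly the same amount along their bond, so a large difference hints at a wrongly assigned atom type. The sketch below implements only the classic Hirshfeld rigid-bond comparison, not PLATON's generalized version. It assumes that the U tensors and coordinates are already in a Cartesian frame (Å² and Å); the function names and the numbers in the example are hypothetical.

```python
import math


def msd_along(u_cart, direction):
    """Mean-square displacement n^T U n along a direction (U in Cartesian A^2)."""
    norm = math.sqrt(sum(c * c for c in direction))
    n = [c / norm for c in direction]
    return sum(n[i] * u_cart[i][j] * n[j] for i in range(3) for j in range(3))


def rigid_bond_difference(u_a, pos_a, u_b, pos_b):
    """Absolute difference of the two displacement amplitudes along bond A-B."""
    bond = [b - a for a, b in zip(pos_a, pos_b)]
    return abs(msd_along(u_a, bond) - msd_along(u_b, bond))


# Illustrative (made-up) values: a bond of 1.5 A along the x axis
u_atom1 = [[0.020, 0.0, 0.0], [0.0, 0.030, 0.0], [0.0, 0.0, 0.040]]
u_atom2 = [[0.021, 0.0, 0.0], [0.0, 0.050, 0.0], [0.0, 0.0, 0.010]]
delta = rigid_bond_difference(u_atom1, (0.0, 0.0, 0.0), u_atom2, (1.5, 0.0, 0.0))
print(f"difference along bond: {delta:.4f} A^2")
```

Only the components along the bond matter here: the two atoms differ strongly perpendicular to the bond, yet the test value stays small.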
Example 1:
Structure I
Submitted to Acta Cryst.
(2011)
PLATON Report Part 1
PLATON Report Part 2
RELATED STRUCTURE FROM THE CSD
Structure II
Structure Report for II
Analysis
• Structure (II) has no validation issues.
• C-CH3 distance in (II) of 1.50 Å as expected.
• ‘C-F’ distance in (I) is 1.50 Å and not the expected
1.35 Å.
• Conclusion: Structure (I) is the CH3 variety and not F.
• Data sets of (I) & (II) are not identical (see next).
• Data set (I) likely based on CH3 compound.
• Fraud or Error? DIFABS file Error?
• The authors of (I) confirmed an Error, having believed the
proposal of external chemists. The paper was retracted.
Scatter Plots of 2 Data Sets
Two Unrelated Data
Sets
Two Identical Data sets
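A scatter plot of two data sets can be condensed into one number by pairing reflections with the same hkl and computing the correlation of their observed intensities. The sketch below is an illustration under simple assumptions: each data set is a dict mapping (h, k, l) to I(obs), the function names are hypothetical, and the intensities are invented.

```python
import math


def matched_intensities(set_a, set_b):
    """Pair the I(obs) values of reflections present in both data sets."""
    common = sorted(set(set_a) & set(set_b))
    return [(set_a[hkl], set_b[hkl]) for hkl in common]


def pearson(pairs):
    """Pearson correlation coefficient of a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / math.sqrt(var_x * var_y)


# Invented intensities for illustration only
set_1 = {(1, 0, 0): 100.0, (0, 1, 0): 50.0, (0, 0, 1): 10.0, (1, 1, 0): 5.0}
set_2 = {(1, 0, 0): 5.0, (0, 1, 0): 80.0, (0, 0, 1): 60.0, (1, 1, 0): 20.0}
copy_of_1 = {hkl: 2.0 * i_obs for hkl, i_obs in set_1.items()}  # rescaled copy

r_unrelated = pearson(matched_intensities(set_1, set_2))
r_identical = pearson(matched_intensities(set_1, copy_of_1))
print(f"unrelated: {r_unrelated:.3f}, identical up to scale: {r_identical:.3f}")
```

Two independently measured data sets, even of the same crystal, always show some scatter. A correlation of essentially 1 between data sets reported for different compounds is therefore a strong warning sign.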
CIF versus FCF data Check
• The R & S values in the three lines # R= should be identical within
rounding error.
• The reported and calculated residual density ranges should also be nearly
identical.
• This is the case in the first example but not in the second where the CIF &
FCF data do not match.
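The core of this check is to recompute the R-value from the deposited reflection data and compare it with the value reported in the CIF. The sketch below is a simplified illustration, not the PLATON implementation: it assumes a list of (Fo², Fc²) pairs, uses the unweighted R1 = Σ||Fo| − |Fc|| / Σ|Fo| over all reflections without any sigma cut-off, and the function names and tolerance are hypothetical.

```python
import math


def r1_from_fcf(reflections):
    """Unweighted R1 from (Fo_squared, Fc_squared) pairs."""
    sum_diff = 0.0
    sum_fo = 0.0
    for fo_sq, fc_sq in reflections:
        fo = math.sqrt(max(fo_sq, 0.0))   # negative Fo^2 treated as zero
        fc = math.sqrt(max(fc_sq, 0.0))
        sum_diff += abs(fo - fc)
        sum_fo += fo
    return sum_diff / sum_fo


def is_consistent(reported_r, recalculated_r, tolerance=0.001):
    """True when the two values agree within the chosen tolerance."""
    return abs(reported_r - recalculated_r) <= tolerance


# Invented reflection data for illustration only
reflections = [(100.0, 81.0), (400.0, 441.0), (25.0, 25.0)]
r1 = r1_from_fcf(reflections)
print(f"recalculated R1 = {r1:.4f}")
print(is_consistent(0.0571, r1))   # reported value matches the data
print(is_consistent(0.0271, r1))   # reported value does not match
```

A reported R-value that was edited in the CIF no longer agrees with the value recalculated from the FCF, so the mismatch is detectable as long as the reflection data are deposited.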
Example 2: Iron(III) Complex
Fe(III) Validation Part 1
Fe(III) Validation Part 2
Example 2: Difference Density Map
Fe Structure Re-refined
Conclusion ?
• Structure now O.K. after an erratum ?
• Search for similar (isomorphous) structures in
the CSD
• Yes, there is an isomorphous Mn complex
published by a different set of authors from a
different university.
• Let us compare both structures.
Isomorphous Mn(III) Complex
Mn Structure Validation Part 1
Mn Validation Part 2
Scatter Plot Fe versus Mn I(obs)
Fe and Mn Data Sets
Identical !
Validation Challenges
• Avoid False Positive and Negative ALERTS
• Disordered structures (true or artifact)
• Handling of Twinning (data names missing)
• Powder structure validation (experts needed)
• Incommensurate structure validation (experts)
• Fabricated reflection data – Can we detect them
• Education – What is the meaning of an ALERT
• Should validation criteria be different for structures
published in chemical journals ?
Residual Problem
EDUCATION
Response of an author of a structural paper submitted to the
crystallographic journal Acta Cryst. to an enquiry from a
referee on the reported space group:
Please teach me, what does in mean
‘ space group incorrect’ ……
Concluding Remarks
• PLATON includes a standalone Validation Tool. It is part of
the WEB-based IUCr CheckCIF/PLATON Tool that is capably
managed by Mike Hoyland (IUCr)
• Validation is still a learning process.
• Chemical insight might be very helpful and often decisive as a
validation tool.
• Deposition of structure factors should be a requirement for all
journals (The CCDC now accepts those along with the CIF)
Thanks To
• Martin Lutz and many
others for taking the
time to bring various
unresolved issues to my
attention with actual
data and suggestions.
• Send to a.l.spek@uu.nl
A.L.Spek (2003). J. Appl. Cryst. 36, 7-13.
A.L.Spek (2009). Acta Cryst. D65, 148-155.