Handling ligands with PRODRG
Daan van Aalten
Division of Biological Chemistry and Drug Discovery
College of Life Sciences

PRODRG - why?
• Early 1990s: no software to generate topologies for non-macromolecular entities
• Manual topology generation is time-consuming and error-prone (but instructive)
• Small-molecule coordinate generators were essentially only available commercially

PRODRG - why?
• For small molecules, we need to go from imagination/some chemical information to a correct topology and optimised coordinates in seconds
• Plus topologies for SHELX, REFMAC5, CNS, O, TNT, …

PRODRG - why?
Ligand examples (PDB entry; year; resolution):
• Citrate (1AJ8; 1997; 1.9 Å)
• NADP+ (1DDI; 1999; 2.5 Å)
• Cyclohexylamine (1PPA; 1991; 2.0 Å)
• Diphosphate (1N5L; 2002; 2.3 Å)
• Sulphate (1DW9; 1999; 1.7 Å)
• Ethylene glycol (1JKV; 2001; 1.4 Å)

PRODRG History
• Version 1 (1995)
  – Started as a DRuG PROgram in GROMOS87
  – Takes a PDB file and generates ‘MOLDES’ (a SMILES-like 1D string) and MD topologies
• Version 2 (2004)
  – Many additional input formats
  – Many additional output formats, including topologies for crystallographic software
• Version 2.5 (2005)
  – Internal all-atom representation

PRODRG History
• Details covered in two publications
• Web server (~300 runs/day) with a short FAQ

PRODRG Guts
• Essentially FORTRAN (30,000 lines) with some supporting C (5,000 lines)
• Compiles well on all major platforms
• Few dependencies (GROMACS for coordinate generation)

What is PRODRG?
• Generates information about small molecules
• Input (molecular description): PDB file, Molfile, Human
• Output: atomic coordinates, chemical types, connectivity, bond orders/aromaticity, hybridisation, formal charges, atomic charges, force-field parameters, hydrogen atoms, free torsions, hydrogen bonding
• Used for: model building & refinement, molecular dynamics, DB lookups & property prediction, docking & analysis, visualisation

How does PRODRG work?
• Fixed order of steps is bad
• Input analysis is rather rude:
  – Deletes hydrogens
  – Ignores bond order information
1. Analysis of input
2. Initial data gathering
3. Addition of hydrogens
4. Atom reordering
5. Topology generation
6. Formal and partial charges
7. Additional molecule data
8. Output

How does PRODRG work?
• Most steps use ‘chemical pattern matching’
• Example: hydrogen generation, where each matched pattern triggers an action:
  – Add 1+sp(x)-ncon(x) hydrogens
  – Do nothing
  – Add 1 hydrogen

How does PRODRG work?
• Currently all hydrogens are generated by 17 ‘rules’
• Chemical knowledge is in data, not code
  – More flexible
  – Potentially user-configurable

Limitations
• Supported atom types are limited
  – C, H, N, O, P, S, F, Cl, Br, I only
• Other chemical limitations
  – No more than 4 connections per atom
  – Standard version limited to at most 300 atoms
• Ignoring hydrogens and bond types may lead to unexpected results
• (Apolar hydrogens as second-class atoms)
• SMILES not yet implemented (but trivial)

Basic usage: web server
• Four easy steps:
  1. Go to http://davapc1.bioch.dundee.ac.uk/programs/prodrg
  2. Paste input
  3. Edit settings (chirality restraints? reduced charges? coordinates?)
  4. Run it
• Success!
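The hydrogen-generation idea above (pattern matching, with the chemistry kept in a rule table rather than in code) can be sketched in a few lines of Python. This is an illustration, not PRODRG's FORTRAN implementation. It assumes that sp(x) is the hybridisation number of atom x (1, 2 or 3 for sp, sp2, sp3), that ncon(x) is its number of heavy-atom connections, and that the 1+sp(x)-ncon(x) rule applies to carbon; the hydroxyl-oxygen rule and the names `Atom`, `RULES` and `hydrogens_to_add` are hypothetical. Atoms that match no rule get no hydrogens ("Do nothing").

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Atom:
    name: str
    element: str
    sp: int    # assumed hybridisation number: 1 = sp, 2 = sp2, 3 = sp3
    ncon: int  # number of connected heavy atoms


# A rule is (description, pattern test, hydrogen count). The chemistry lives in
# this table: supporting a new case means adding a row, not changing the loop.
Rule = Tuple[str, Callable[[Atom], bool], Callable[[Atom], int]]

RULES: List[Rule] = [
    # Slide rule "add 1+sp(x)-ncon(x) hydrogens", assumed here to apply to carbon;
    # max(0, ...) guards against negative counts for over-connected atoms.
    ("carbon", lambda a: a.element == "C", lambda a: max(0, 1 + a.sp - a.ncon)),
    # Hypothetical "add 1 hydrogen" rule: terminal sp3 oxygen (hydroxyl).
    ("hydroxyl O", lambda a: a.element == "O" and a.sp == 3 and a.ncon == 1, lambda a: 1),
]


def hydrogens_to_add(atom: Atom) -> int:
    """Apply the first matching rule; if none matches, 'do nothing' (add 0)."""
    for _description, matches, count in RULES:
        if matches(atom):
            return count(atom)
    return 0


# Usage: heavy atoms of acetic acid, CH3-C(=O)-OH
acetic_acid = [Atom("CH3", "C", 3, 1), Atom("C", "C", 2, 3),
               Atom("O", "O", 2, 1), Atom("OH", "O", 3, 1)]
for atom in acetic_acid:
    print(atom.name, hydrogens_to_add(atom))  # CH3 3, C 0, O 0, OH 1
```

Keeping the rules as data means a new chemical class is one more tuple in `RULES`, which is the flexibility (and potential user configurability) the slide points to.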
PRODRG inputs
• PDB coordinates
• MDL molfile
• MOLDES (SMILES-like 1D string)
• JME editor (web server)
• “TEXT” input

Text drawings
• Atoms represented by their element symbols
• Connected by bonds
  – Single: - or |
  – Double: = or "
  – Triple: #
• Change the case of a symbol to invert chirality

D-Tyr:

    N   C-C
    |   " "
    C-C-C C-O
    |   | |
    C=O C=C
    |
    O

L-Tyr:

    N   C-C
    |   " "
    c-C-C C-O
    |   | |
    C=O C=C
    |
    O

PRODRG outputs
• PDB coordinates (generated/minimised), with or without hydrogens, with proper atom names for protein/sugars/DNA; but the GIGO (garbage in, garbage out) principle applies
• Quality control on input coordinates vs. topology
• WHAT IF topology: accurate protein-ligand H-bonds
• CNS/REFMAC/TNT/SHELX topologies (including PTM amino acid building blocks)
• GROMOS/GROMACS/OPLS topologies
• Consistent topology from crystal to publication

Helping (or kicking) PRODRG
• Additional commands/hints in the input file:
  – PATCH (hybridisation)
  – INSHYD and DELHYD
  – PATCH (chirality)
  – PATCH (torsions)
  – CPNAME

Hybridisation hints
PATCH <atom> <number>
• Useful if PDB analysis did not quite work
• Allows you to nudge PRODRG in the right direction:

        O
        "
    C=C-C
    |   |
    C-C=N
    "
    O

PRODRG> WARNING: multiplicity of generated molecule is not 1.
PRODRG> WARNING: bond type assignment failed at CAF .

• Hint added to the input: PATCH NAG 21

Adding/removing hydrogens
INSHYD <atom>
DELHYD <atom>
• Allows you to override the default protonation
• Often not actually what you want:

    C-C=O
      |
      O

INSHYD OAD
PRODRG> Cannot assign type to atom ' OAD'.
ERRDRG> Error in GROMOS atom names/types.
PRODRG> Drug topology not made, sorry!
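To make the text-drawing conventions concrete, here is a minimal Python sketch of a parser for such drawings. It is not PRODRG's parser: it assumes that each bond character sits exactly between the two atoms it joins (horizontally for -, = and #; vertically for | and "), that Cl and Br are the only two-letter symbols, and that a lower-case symbol only flags inverted chirality. The names `parse_drawing`, `TWO_LETTER` and `TYR` are hypothetical. It builds the connectivity graph with bond orders, nothing more.

```python
from typing import Dict, List, Tuple

# Bond characters from the slide: '-' or '|' single, '=' or '"' double, '#' triple.
HORIZONTAL = {"-": 1, "=": 2, "#": 3}
VERTICAL = {"|": 1, '"': 2}
TWO_LETTER = {"Cl", "Br"}  # assumption: the only two-letter symbols in the supported set

Bond = Tuple[int, int, int]  # (atom index, atom index, bond order)


def parse_drawing(text: str) -> Tuple[List[dict], List[Bond]]:
    rows = text.splitlines()
    width = max(len(r) for r in rows)
    grid = [r.ljust(width) for r in rows]
    atoms: List[dict] = []
    owner: Dict[Tuple[int, int], int] = {}  # (row, col) of each letter -> atom index
    for r, row in enumerate(grid):
        c = 0
        while c < width:
            ch = row[c]
            if ch.isalpha():
                length = 2 if row[c:c + 2] in TWO_LETTER else 1
                element = row[c:c + 2] if length == 2 else ch.upper()
                # A lower-case symbol marks inverted chirality at this atom.
                atoms.append({"element": element, "inverted": ch.islower()})
                for k in range(length):
                    owner[(r, c + k)] = len(atoms) - 1
                c += length
            else:
                c += 1
    bonds: List[Bond] = []
    for r, row in enumerate(grid):
        for c, ch in enumerate(row):
            if ch in HORIZONTAL:  # joins the atoms to its left and right
                a, b, order = owner.get((r, c - 1)), owner.get((r, c + 1)), HORIZONTAL[ch]
            elif ch in VERTICAL:  # joins the atoms directly above and below
                a, b, order = owner.get((r - 1, c)), owner.get((r + 1, c)), VERTICAL[ch]
            else:
                continue
            if a is None or b is None:
                raise ValueError(f"dangling bond {ch!r} at row {r}, column {c}")
            bonds.append((a, b, order))
    return atoms, bonds


# The D-Tyr drawing from the slide (upper-case CA); lower-casing CA gives L-Tyr.
TYR = "\n".join([
    'N   C-C',
    '|   " "',
    'C-C-C C-O',
    '|   | |',
    'C=O C=C',
    '|',
    'O',
])
atoms, bonds = parse_drawing(TYR)
print(len(atoms), "atoms,", len(bonds), "bonds")
```

For the tyrosine drawing this yields the 13 heavy atoms and 13 bonds of the amino acid, with the aromatic ring in one Kekulé form (three of the four double bonds) plus the carbonyl C=O.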
Adding/removing hydrogens

    C-C=O
      |
      O

PATCH OAD 3

Modifying chirality
PATCH <atom> -1
• Inverts the stereocentre at <atom>; useful for PDB input
PATCH <atom> <pattern>
• ‘Absolute’ chirality for certain classes of molecules

    N   C-C
    |   " "
    C-C-C C-O
    |   | |
    C=O C=C
    |
    O

• PATCH CA L gives L-Tyr
• PATCH CA D gives D-Tyr

Adding dihedral restraints
PATCH <atom> ><pattern>
• After energy minimisation (EM), pyranose rings are often found in undesirable conformations
• The PATCH statement introduces additional dihedral restraints to fix the conformation

    C-C-O-C-O
    | |   |
    O C-C-C
      | | |
      O O O

PATCH C1 ALPHA
PATCH C2 D
PATCH C3 L
PATCH C4 D
PATCH C5 D
PATCH C1 >4C1
Result: α-D-Glucose

Building
• PRODRG can add molecular fragments to existing molecules:
BUILD <atom> <fragment>
• L-Ala + BUILD CB PHI gives L-Phe
• L-Phe + BUILD CZ OH gives L-Tyr

Building
• Allows quick alterations to existing molecules
• Preserves the coordinates of the root structure
• Fragment libraries contain text drawings, which are easy to define:

    FRAG OH
    X-O
    FRAG PHI
    X-C-C=C
      "   |
      C-C=C
    FRAG ...

Building
• Can also be used to generate oligopeptides and oligosaccharides, using BUILD and START <fragment>

START bdGLC
BUILD O4 adMAN1
BUILD O0F bdNAG1
PATCH C1 >4C1
PATCH C0B >4C1
PATCH C1B >4C1
Residues: β-D-Glc, α-D-Man, β-D-NAG

PRODRG IP issues
• Currently PRODRG is freely accessible to academics through the web server and binaries
• Commercial licenses (~10) have provided useful income that contributes to (but does not cover) the cost of PRODRG development/maintenance
• Currently no PRODRG grant funding (previously a WT senior fellowship)
Thoughts on the future:
• Make PRODRG as accessible as possible
• Release of source?
• Keen to incorporate/integrate with CCP4, but this will require some development

PRODRG - what next
• Make PRODRG as accessible as possible
• Release of source?
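The dihedral restraints added by a statement such as PATCH C1 >4C1 act on ring torsion angles. As a generic illustration (not PRODRG's code), the Python sketch below computes a signed torsion angle from four atom positions with the standard atan2 formula and scores it with a harmonic penalty toward a target value. The harmonic form, the force constant `k` and the names `dihedral` and `restraint_energy` are assumptions; the target torsions PRODRG uses for a 4C1 chair are not given in the slides.

```python
import math
from typing import Tuple

Vec = Tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _scale(s: float, a: Vec) -> Vec:
    return (s * a[0], s * a[1], s * a[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Signed torsion angle p0-p1-p2-p3 in degrees, between -180 and 180."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    b1 = _scale(1.0 / math.sqrt(_dot(b1, b1)), b1)  # unit vector along the central bond
    # Project the two outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, _scale(_dot(b0, b1), b1))
    w = _sub(b2, _scale(_dot(b2, b1), b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def restraint_energy(phi: float, target: float, k: float = 1.0) -> float:
    """Assumed harmonic penalty k * delta^2, with delta wrapped into [-180, 180)."""
    delta = (phi - target + 180.0) % 360.0 - 180.0
    return k * delta * delta


# Usage: a 90-degree torsion, and a restraint that wraps across +/-180 degrees.
print(dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0)))
print(restraint_energy(170.0, -170.0))  # delta is -20 degrees, not 340
```

A minimiser would add one such term per restrained ring torsion; wrapping the difference matters because torsions near ±180° are physically close even though their raw values differ by almost 360°.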
• Keen to incorporate/integrate with CCP4, but this will require some development
• Need to incorporate SMILES
• Make PDB input foolproof by quality control
• Move away from the united-atom-with-hydrogen-addition model
• Link up with a GUI: not only drawing but also “building”
• Link up with Coot (build-place-fit ligand at pointer)

Acknowledgements
• Alexander Schüttelkopf
• PRODRG users