Macromolecular Structure Notation, Editing and Registration

Tianhong Zhang
BBC Informatics, Research Technology Center
Pfizer Inc, Cambridge, MA

ChemAxon US User Group Meeting
San Diego, CA, Sept. 15-16, 2009

ChemAxon Components Used
• Marvin Sketcher
• Marvin Viewer
• Calculator Plugin
• Marvin API

Informatics Systems Disparity Between Small and Large Molecules
• Small Molecule Drug: Small Molecule Informatics Systems (ChemAxon, Accelrys, CambridgeSoft, Daylight, OpenEye, …)
• Biologics Drug (Peptide, Antibody, Antisense, RNAi, Vaccine): Large Molecule Informatics System (???)

Macromolecule Structure Representation
• Natural Biological Polymers (Sequence)
  - Nucleic Acids: AGCTAATT
  - Proteins: AKKKAAG
  - …
• Synthetic Polymers (Repeating Units)
  - Polyethylene: -(CH2CH2)n-
  - Polyamide (Nylon 6): -(CH2CH2CH2CH2CH2CONH)n-
  - …
• Modified Biological Polymers
  - Terminal Conjugation: AGCCCATT-Biotin
  - Modified Nucleotide: AGCUmAAACC
  - Hybrid Polymer: AGCUUU-PEG20-KKKGC
  - …

Modified Biological Polymers: Structure Representation Challenges
• How to determine structure uniqueness? Is AGCUAAdTdT the same as AGCUAAtt?
• How to represent the connections between different polymers? In the hybrid structure AGCUUU-PEG20-KKKGC, there is a bond between the RNA sequence AGCUUU and the chemical linker PEG20, but which atom from each polymer is used?
• How to represent hierarchical information for hybrid polymers?
  - Polymer Type Level: RNA-ChemLinker-Peptide
  - Sequence Level: AGCUUU-PEG20-KKKGC
  - Monomer Level: R(A)P.R(G)P.R(C)P.R(U)P.R(U)P.R(U)P;PEG20;K.K.K.G.C
  - Atom/Bond Level: Molfile or SMILES

Notation Language Requirements
1. Monomer (fragment) based notation language
2. Extensible polymer types: Nucleic Acid, Peptide, Chemical Modifier
3. Extensible monomers in each polymer type
   - Support backbone and branch monomer types
   - Each monomer has a defined structure and attachment points
   - Default connections between monomers in the same polymer type
4. Connections between different polymer types can be specified using monomer position and attachment points
5. Notation can be expanded to a full atom/bond representation using Molfile and/or SMILES
6. Hierarchical structure information can be extracted from the notation

Notation Language Design
ListOfSimplePolymers$ListOfConnections$ListOfHydrogenBonds$ListOfPolymerAttributes$ListOfOtherProperties
• At least one simple polymer is required.
• Each simple polymer is labeled with its polymer type followed by a number (e.g. RNA1, PEPTIDE1, CHEM1).
• A simple polymer is represented using monomer IDs and a SMILES-like notation.
• A monomer is represented with ChemAxon Extended SMILES, where R groups mark the attachment points of the monomer.
• Connections between monomers within a simple polymer are implicit:
  - The default backbone connection is R2 on the left monomer to R1 on the right monomer.
  - The default branch connection is R3 on the backbone monomer to R1 on the branch monomer.
• Connections between monomers of different polymer types must be specified explicitly, using the monomer position in the simple polymer notation and the attachment point on the monomer.
[Diagram: Complex Polymer > Simple Polymer > Monomer]

Notation Example (Hybrid): Oligonucleotide-ChemLinker-Peptide
[Structure drawing: 5'-AGmCUUU~PEG3~n-KKKGC]
Notation:
RNA1{R(A)P.R(G)P.[mR](C)P.R(U)P.R(U)P.R(U)}|PEPTIDE1{K.K.K.G.C}|CHEM1{PEG3}$RNA1,CHEM1,16:R2-1:R1|PEPTIDE1,CHEM1,5:R2-1:R2$$$
The connection RNA1,CHEM1,16:R2-1:R1 bonds R2 of monomer 16 of RNA1 (the last ribose, counting branch monomers as well) to R1 of monomer 1 of CHEM1; PEPTIDE1,CHEM1,5:R2-1:R2 bonds R2 of monomer 5 (C) of PEPTIDE1 to R2 of the same linker. The three empty components after the connections are the hydrogen bond, polymer attribute and other property lists.

Symbol Specs:
Symbol   Purpose
$        To delimit top-level components
|        To delimit list items within a top-level component
{}       To enclose simple polymer notation
.        To delimit monomer groups within simple polymer notation
()       To enclose a branch monomer within simple polymer notation
[]       To enclose a modified (non-natural) monomer
, : -    To describe connections and hydrogen bonding

Pfizer Macromolecule Editor (PME)
• Structure Drawing and Visualization Tool
  - Draw macromolecule structures, similar to Marvin Sketcher for small molecules
  - Display structure from notation
• Monomer Manager
  - Create new monomer
  - Create new nucleotide
• Notation Toolkit
  - Generate notation from structure drawing
  - Convert to canonical SMILES
  - Property calculation: MW, MF, extinction coefficient

Structure Editing and Visualization
[Screenshot: Graph View, Sequence View, and monomer list categorized by polymer type and monomer type]

Monomer Management
[Screenshots: Nucleic Acid Monomer List; 2'-O-Methylribose Monomer Definition]
• ChemAxon Extended SMILES is used to represent monomer structure.
• Attachment points are an integral part of the monomer definition.
• Creation of new monomers is controlled to keep the monomer list manageable.
• The updated monomer list is available via a web service.

Pfizer Macromolecule Registration (PMR)
• Large Molecule Registration Web Service
  - Structure representation using polymer notation
  - Structure validation and verification of the polymer notation via the notation toolkit
  - Structure uniqueness checking on canonical SMILES converted from the polymer notation via the notation toolkit
• GUI Tool
  - Single Registration
  - Batch Registration
  - Registration Result View

Single Registration
• Use PME for structure input
• Gather data from input form
• Call web service for registration

Summary and Conclusion
• A simple and flexible notation language was designed to represent natural and modified biological polymers such as oligonucleotides and peptides.
• A graphical user interface (GUI) was developed to enable users to depict and visualize macromolecular structures.
• A large molecule registration web service was developed based on the polymer notation of large molecules; it implements structure validation and verification, and uniqueness checking.
• A GUI was developed to enable users to register large molecules into the global compound database.
• These tools streamline the RNAi workflow, facilitate compound tracking and registration, and enable data mining and structure-based design.

Acknowledgements
• Hongli Li, Targets & Mechanisms Informatics
• Shreyas Dube, Targets & Mechanisms Informatics
• Ya Chen, Targets & Mechanisms Informatics
• Jason Hughes, Target Generation Unit, BBC
• Theresa Johnson, Computational Chemistry, RTC
• Simon Xi, Computational Sciences CoE, PGRD
• Karen Mullane-Robinson, BBC Informatics
• David Klatte, Targets & Mechanisms Informatics
• Sergio Rotstein, BBC Informatics
• Pfizer Global LMR Team

For More Information, Contact
Tianhong Zhang, Ph.D.
BBC Informatics, Research Technology Center
Pfizer Inc, Cambridge, MA 02139
tianhong.zhang@pfizer.com
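The Notation Language Design and Notation Example (Hybrid) slides describe the grammar informally. The Python sketch below is one possible reading of those rules, not the PME notation toolkit. It splits a notation string on $ and |, tokenizes each simple polymer into monomer IDs, derives the default R2-R1 backbone and R3-R1 branch connections, and checks that every explicit cross-polymer connection points at an existing monomer and at an attachment point not already used by an implicit bond. Assumptions: monomers are numbered in order of appearance, branch monomers included (consistent with the example, where position 16 is the final ribose); a CHEM polymer body is a single monomer ID; and attachment points are not validated against a monomer library, since none is available here. All class, function and constant names are hypothetical.

```python
import re
from dataclasses import dataclass, field

# Hybrid oligonucleotide-linker-peptide example from the "Notation Example (Hybrid)" slide.
EXAMPLE = ("RNA1{R(A)P.R(G)P.[mR](C)P.R(U)P.R(U)P.R(U)}|PEPTIDE1{K.K.K.G.C}|CHEM1{PEG3}"
           "$RNA1,CHEM1,16:R2-1:R1|PEPTIDE1,CHEM1,5:R2-1:R2$$$")

POLYMER_RE = re.compile(r"^([A-Z]+)(\d+)\{(.*)\}$")                      # RNA1{...}
CONNECTION_RE = re.compile(r"^(\w+),(\w+),(\d+):(R\d+)-(\d+):(R\d+)$")  # RNA1,CHEM1,16:R2-1:R1


@dataclass
class Monomer:
    position: int    # 1-based, counted in order of appearance (assumption)
    monomer_id: str  # e.g. "R", "A", "mR", "PEG3"
    is_branch: bool


@dataclass
class SimplePolymer:
    polymer_id: str    # e.g. "RNA1"
    polymer_type: str  # e.g. "RNA"
    monomers: list = field(default_factory=list)
    implicit_bonds: list = field(default_factory=list)  # (pos, R group, pos, R group)


def tokenize_group(group):
    """Split a monomer group such as 'R(A)P' or '[mR](C)P' into (id, is_branch) pairs."""
    tokens, i = [], 0
    while i < len(group):
        if group[i] == "[":    # modified backbone monomer, e.g. [mR]
            end = group.index("]", i)
            tokens.append((group[i + 1:end], False))
        elif group[i] == "(":  # branch monomer, e.g. (A) or ([mA])
            end = group.index(")", i)
            tokens.append((group[i + 1:end].strip("[]"), True))
        else:                  # one-letter natural monomer
            end = i
            tokens.append((group[i], False))
        i = end + 1
    return tokens


def parse_simple_polymer(text):
    m = POLYMER_RE.match(text)
    if not m:
        raise ValueError(f"not a simple polymer: {text!r}")
    ptype, body = m.group(1), m.group(3)
    poly = SimplePolymer(ptype + m.group(2), ptype)
    # Assumption: a CHEM polymer is one monomer whose ID is the whole body.
    if ptype == "CHEM":
        tokens = [(body, False)]
    else:
        tokens = [t for group in body.split(".") for t in tokenize_group(group)]
    last_backbone = None
    for pos, (monomer_id, is_branch) in enumerate(tokens, start=1):
        poly.monomers.append(Monomer(pos, monomer_id, is_branch))
        if is_branch:
            if last_backbone is None:
                raise ValueError(f"branch monomer with no backbone monomer in {text!r}")
            # Default branch connection: R3 on backbone monomer to R1 on branch monomer.
            poly.implicit_bonds.append((last_backbone, "R3", pos, "R1"))
        else:
            if last_backbone is not None:
                # Default backbone connection: R2 on left monomer to R1 on right monomer.
                poly.implicit_bonds.append((last_backbone, "R2", pos, "R1"))
            last_backbone = pos
    return poly


def parse_notation(notation):
    sections = notation.split("$")
    if len(sections) != 5:
        raise ValueError("expected five '$'-delimited top-level components")
    polymers = {}
    for item in sections[0].split("|"):  # an empty first component fails POLYMER_RE
        poly = parse_simple_polymer(item)
        polymers[poly.polymer_id] = poly
    # Attachment points already consumed by implicit bonds inside each simple polymer.
    used = set()
    for pid, poly in polymers.items():
        for a, ra, b, rb in poly.implicit_bonds:
            used.update({(pid, a, ra), (pid, b, rb)})
    connections = []
    for item in filter(None, sections[1].split("|")):
        m = CONNECTION_RE.match(item)
        if not m:
            raise ValueError(f"bad connection: {item!r}")
        src, dst, spos, sr, dpos, dr = m.groups()
        for end in ((src, int(spos), sr), (dst, int(dpos), dr)):
            pid, pos = end[0], end[1]
            if pid not in polymers or not 1 <= pos <= len(polymers[pid].monomers):
                raise ValueError(f"connection endpoint out of range: {end}")
            if end in used:
                raise ValueError(f"attachment point already used: {end}")
            used.add(end)
        connections.append((src, int(spos), sr, dst, int(dpos), dr))
    # sections[2:] (hydrogen bonds, attributes, other properties) are not parsed here.
    return polymers, connections
```

With the hybrid example, parse_notation(EXAMPLE) yields 17 RNA1 monomers. Position 16 is the last ribose R, whose R1 and R3 are already taken by the preceding phosphate and the uridine branch, so R2 is its only free attachment point for the PEG3 linker. Changing the connection to 15:R2 raises ValueError, because R2 on the phosphate at position 15 is consumed by the default backbone bond.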