Macromolecular Structure Notation, Editing and Registration
Tianhong Zhang
BBC Informatics
Research Technology Center
Pfizer Inc
Cambridge, MA
Sept. 15-16, 2009
ChemAxon US User Group Meeting
San Diego, CA
ChemAxon Components Used
• Marvin Sketcher
• Marvin Viewer
• Calculator Plugin
• Marvin API
Informatics Systems Disparity
Between Small and Large Molecules
Small Molecule Drug
Biologics Drug
• Peptide
• Antibody
• Antisense
• RNAi
• Vaccine
Small Molecule Informatics System
• ChemAxon
• Accelrys
• CambridgeSoft
• Daylight
• OpenEye
•…
Large Molecule Informatics System
• ???
Macromolecule Structure Representation
• Natural Biological Polymers (Sequence)
  • Nucleic Acids: AGCTAATT
  • Proteins: AKKKAAG
  • …
• Synthetic Polymer (Repeating Units)
  • Polyethylene: -(CH2CH2)n-
  • Polyamide (Nylon 6): -(CH2CH2CH2CH2CH2CONH)n-
  • ...
• Modified Biological Polymers
  • Terminal Conjugation: AGCCCATT-Biotin
  • Modified Nucleotide: AGCUmAAACC
  • Hybrid Polymer: AGCUUU-PEG20-KKKGC
  • …
Modified Biological Polymers: Structure Representation Challenges
• How to determine structure uniqueness?
  • Is AGCUAAdTdT the same as AGCUAAtt?
• How to represent the connections between different polymers?
  • In the hybrid structure AGCUUU-PEG20-KKKGC, there is a bond between the RNA sequence AGCUUU and the chemical linker PEG20. Which atom from each polymer is used?
• How to represent hierarchical information for hybrid polymers?
  • Polymer Type Level: RNA-ChemLinker-Peptide
  • Sequence Level: AGCUUU-PEG20-KKKGC
  • Monomer Level: R(A)P.R(G)P.R(C)P.R(U)P.R(U)P.R(U)P;PEG20;K.K.K.G.C
  • Atom/Bond Level: Molfile or SMILES
Notation Language Requirements
1. Monomer (fragment) based notation language
2. Extensible Polymer Types
   • Nucleic Acid
   • Peptide
   • Chemical Modifier
3. Extensible monomers in each polymer type
   • Support backbone and branch monomer type
   • Each monomer has defined structure and attachment points
   • Default connections between monomers in the same polymer type
4. Connections between different polymer types can be specified using monomer position and attachment points
5. Notation can be expanded to full atom/bond representation using Molfile and/or SMILES
6. Hierarchical structure information can be extracted from Notation
Notation Language Design
ListOfSimplePolymers
$ListOfConnections
$ListOfHydrogenBonds
$ListOfPolymerAttributes
$ListOfOtherProperties
• At least one simple polymer is required; each simple polymer is labeled with its polymer type followed by a number
• A simple polymer is represented using monomer IDs and SMILES-like notation
• A monomer is represented with ChemAxon extended SMILES, where an R group represents an attachment point in the monomer
• Connections between monomers within a simple polymer are implicit:
  • Default backbone connection is R2 on left monomer to R1 on right monomer
  • Default branch connection is R3 on backbone monomer to R1 on branch monomer
• Connections between monomers of different polymer types need to be specified using the monomer position in the simple polymer notation and the attachment point in the monomer
[Diagram: Complex Polymer > Simple Polymer > Monomer]
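The five-part layout above can be illustrated with a minimal Python sketch that splits a notation string on `$` and `|`. The function names, section keys, and regular expression are illustrative choices and not part of the notation toolkit. The sketch also assumes monomers are referenced by ID only, so no inline extended SMILES (which may itself contain `$` or `|`) appears in the string.

```python
import re

# Section names follow the order given on the slide; the key spellings are ours.
SECTION_NAMES = (
    "simple_polymers",
    "connections",
    "hydrogen_bonds",
    "polymer_attributes",
    "other_properties",
)

# Polymer type (letters), polymer number (digits), body inside {}.
POLYMER_RE = re.compile(r"^([A-Z]+)(\d+)\{(.+)\}$")


def split_notation(notation: str) -> dict[str, list[str]]:
    """Split a complex polymer notation into its five '$'-delimited sections."""
    compact = "".join(notation.split())  # drop line breaks and spaces
    parts = compact.split("$")
    if len(parts) != len(SECTION_NAMES):
        raise ValueError(f"expected 5 sections, found {len(parts)}")
    sections = {
        name: [item for item in part.split("|") if item]
        for name, part in zip(SECTION_NAMES, parts)
    }
    if not sections["simple_polymers"]:
        raise ValueError("at least one simple polymer is required")
    return sections


def parse_simple_polymer(item: str) -> tuple[str, int, str]:
    """Return (polymer type, polymer number, body) for e.g. 'CHEM1{PEG3}'."""
    match = POLYMER_RE.match(item)
    if match is None:
        raise ValueError(f"not a simple polymer: {item!r}")
    return match.group(1), int(match.group(2)), match.group(3)


EXAMPLE = """
RNA1{R(A)P.R(G)P.[mR](C)P.R(U)P.R(U)P.R(U)}
|PEPTIDE1{K.K.K.G.C}
|CHEM1{PEG3}
$RNA1,CHEM1,16:R2-1:R1
|PEPTIDE1,CHEM1,5:R2-1:R2
$$$
"""

if __name__ == "__main__":
    for name, items in split_notation(EXAMPLE).items():
        print(name, items)
```

Empty sections (here hydrogen bonds, polymer attributes, and other properties) come back as empty lists, which is why the example notation ends in `$$$`.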
Notation Example (Hybrid)
Oligonucleotide-ChemLinker-Peptide
Structure: 5’-AGmCUUU~PEG3~n-KKKGC
Notation:
RNA1{R(A)P.R(G)P.[mR](C)P.R(U)P.R(U)P.R(U)}
|PEPTIDE1{K.K.K.G.C}
|CHEM1{PEG3}
$RNA1,CHEM1,16:R2-1:R1
|PEPTIDE1,CHEM1,5:R2-1:R2
$$$
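The connection entries in this example can be read mechanically. The sketch below is one possible Python parser, not the notation toolkit's implementation. It assumes each entry has the form `polymerA,polymerB,positionA:attachmentA-positionB:attachmentB`, as the example suggests. The `Connection` field names are ours.

```python
import re
from typing import NamedTuple


class Connection(NamedTuple):
    source_polymer: str      # e.g. "RNA1"
    target_polymer: str      # e.g. "CHEM1"
    source_position: int     # 1-based monomer position in the source polymer
    source_attachment: str   # attachment point on that monomer, e.g. "R2"
    target_position: int
    target_attachment: str


# Assumed entry shape, inferred from the example: A,B,posA:Rx-posB:Ry
CONNECTION_RE = re.compile(r"^(\w+),(\w+),(\d+):(R\d+)-(\d+):(R\d+)$")


def parse_connection(entry: str) -> Connection:
    """Parse one connection entry such as 'RNA1,CHEM1,16:R2-1:R1'."""
    match = CONNECTION_RE.match(entry.strip())
    if match is None:
        raise ValueError(f"malformed connection: {entry!r}")
    src, dst, src_pos, src_r, dst_pos, dst_r = match.groups()
    return Connection(src, dst, int(src_pos), src_r, int(dst_pos), dst_r)


def parse_connections(section: str) -> list[Connection]:
    """Parse a '|'-delimited connection section; an empty section yields []."""
    return [parse_connection(entry) for entry in section.split("|") if entry.strip()]


if __name__ == "__main__":
    for conn in parse_connections("RNA1,CHEM1,16:R2-1:R1|PEPTIDE1,CHEM1,5:R2-1:R2"):
        print(conn)
```

Under this reading, the first entry says that R2 of monomer 16 in RNA1 is bonded to R1 of monomer 1 in CHEM1.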
Specs:
Symbol   Purpose
$        To delimit top level components
|        To delimit list item within a top level component
{}       To enclose simple polymer notation
.        To delimit monomer groups within simple polymer notation
()       To enclose a branch monomer within simple polymer notation
[]       To enclose a modified monomer (non-natural)
,:-      To describe connection and hydrogen bonding
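As a rough illustration of how `.`, `()`, and `[]` work together, the Python sketch below tokenizes the body of a simple polymer into numbered monomers. It then derives the implicit bonds from the default rules stated earlier (backbone R2 to R1, backbone R3 to branch R1). The names are illustrative. Two assumptions are made. First, unbracketed monomer IDs are single letters unless `single_letter_ids=False` is passed, as needed for `PEG3`. Second, positions count every monomer including branches, which is consistent with `16:R2` addressing the last ribose in the example.

```python
from typing import NamedTuple


class Monomer(NamedTuple):
    position: int      # 1-based, counting backbone and branch monomers
    monomer_id: str
    is_branch: bool    # written inside ()
    is_modified: bool  # written inside []


def tokenize_simple_polymer(body: str, single_letter_ids: bool = True) -> list[Monomer]:
    """Split a simple polymer body such as 'R(A)P.R(G)P' into monomers."""
    monomers: list[Monomer] = []
    for group in body.split("."):  # '.' delimits monomer groups
        i, in_branch = 0, False
        while i < len(group):
            ch = group[i]
            if ch == "(":
                in_branch, i = True, i + 1
            elif ch == ")":
                in_branch, i = False, i + 1
            elif ch == "[":  # modified monomer with a multi-character ID
                end = group.index("]", i)
                monomers.append(Monomer(len(monomers) + 1, group[i + 1:end], in_branch, True))
                i = end + 1
            else:
                j = i + 1
                if not single_letter_ids:  # take the whole unbracketed run
                    while j < len(group) and group[j] not in "()[":
                        j += 1
                monomers.append(Monomer(len(monomers) + 1, group[i:j], in_branch, False))
                i = j
    return monomers


def implicit_connections(monomers: list[Monomer]) -> list[tuple[int, str, int, str]]:
    """Derive default bonds as (left position, left R, right position, right R)."""
    bonds = []
    last_backbone = None
    for monomer in monomers:
        if monomer.is_branch:
            if last_backbone is None:
                raise ValueError("branch monomer without a backbone monomer")
            bonds.append((last_backbone.position, "R3", monomer.position, "R1"))
        else:
            if last_backbone is not None:
                bonds.append((last_backbone.position, "R2", monomer.position, "R1"))
            last_backbone = monomer
    return bonds


if __name__ == "__main__":
    rna = tokenize_simple_polymer("R(A)P.R(G)P.[mR](C)P.R(U)P.R(U)P.R(U)")
    print(rna[15])                       # the monomer addressed by 16:R2
    print(implicit_connections(rna)[:3])
```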
Pfizer Macromolecule Editor (PME)
• Structure Drawing and Visualization Tool
  • Draw macromolecule structures, similar to Marvin Sketcher for small molecules
  • Display structure from notation
• Monomer Manager
  • Create new monomer
  • Create new nucleotide
• Notation Toolkit
  • Generate notation from structure drawing
  • Conversion to canonical SMILES
  • Property calculation: MW, MF, extinction coefficient
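To show why a monomer-level notation makes property calculation straightforward, here is a small Python sketch that estimates the molecular weight of a linear, unmodified peptide from its simple polymer body. It is not the notation toolkit's method. The mass table is a hypothetical stand-in holding rounded average residue masses for only four amino acids, whereas a real implementation would presumably compute masses from each monomer's structure.

```python
# Rounded average residue masses in g/mol (amino acid minus one water).
# Illustrative subset only; a real monomer library would cover every monomer.
RESIDUE_MASS = {
    "G": 57.05,
    "A": 71.08,
    "C": 103.14,
    "K": 128.17,
}
WATER_MASS = 18.02  # the terminal H and OH of the free peptide


def peptide_molecular_weight(body: str) -> float:
    """Sum residue masses for a body such as 'K.K.K.G.C' and add one water."""
    residues = body.split(".")
    unknown = [r for r in residues if r not in RESIDUE_MASS]
    if unknown:
        raise KeyError(f"no mass defined for monomers: {unknown}")
    return sum(RESIDUE_MASS[r] for r in residues) + WATER_MASS


if __name__ == "__main__":
    print(round(peptide_molecular_weight("K.K.K.G.C"), 2))
```

The same pattern (look up each monomer, sum, correct for the bonds formed) would extend to molecular formula, though hybrid polymers also need the explicit connections taken into account.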
Structure Editing and Visualization
[Screenshot: PME Graph View and Sequence View, with the monomer list categorized by polymer type and monomer type]
Monomer Management
[Screenshot: Nucleic Acid monomer list and the monomer definition for 2’-O-MethylRibose]
• ChemAxon Extended SMILES to represent monomer structure
• Attachment is an integral part of monomer definition
• Creation of new monomer is controlled to keep the monomer list manageable
• Updated monomer list available via web service
Pfizer Macromolecule Registration (PMR)
• Large Molecule Registration Web Service
  • Structure representation using polymer notation
  • Structure validation and verification via notation toolkit on polymer notation
  • Structure uniqueness checking on canonical SMILES converted from polymer notation via notation toolkit
• GUI Tool
  • Single Registration
  • Batch Registration
  • Registration Result View
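The uniqueness check described above can be sketched as a lookup keyed by canonical SMILES. The Python below is a minimal illustration under stated assumptions, not the PMR web service. `Registry`, the `LM-` identifier format, and the in-memory dictionary are all hypothetical, and the conversion from polymer notation to canonical SMILES is injected as a function because the notation toolkit's actual API is not shown in these slides.

```python
from typing import Callable


class Registry:
    """In-memory stand-in for a registration store keyed by canonical SMILES."""

    def __init__(self, to_canonical_smiles: Callable[[str], str]) -> None:
        # Hypothetical hook: polymer notation in, canonical SMILES out.
        self._to_canonical_smiles = to_canonical_smiles
        self._id_by_structure: dict[str, str] = {}

    def register(self, notation: str) -> tuple[str, bool]:
        """Return (compound ID, is_new). Duplicate structures reuse the first ID."""
        key = self._to_canonical_smiles(notation)
        existing = self._id_by_structure.get(key)
        if existing is not None:
            return existing, False
        compound_id = f"LM-{len(self._id_by_structure) + 1:06d}"  # made-up ID format
        self._id_by_structure[key] = compound_id
        return compound_id, True


if __name__ == "__main__":
    # Placeholder canonicalizer that only strips whitespace; a real one would
    # expand the notation to atoms and bonds and canonicalize the result.
    registry = Registry(lambda notation: "".join(notation.split()))
    print(registry.register("PEPTIDE1{K.K.K.G.C}$$$$"))
    print(registry.register("PEPTIDE1{K.K.K.G.C} $$$$"))
```

Keying on the expanded structure rather than on the notation text is what lets two differently written notations for the same molecule resolve to one registered compound.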
Single Registration
• Use PME for structure input
• Gather data from input form
• Call web service for registration
Summary and Conclusion
• A simple and flexible notation language was designed to represent natural and modified biological polymers such as oligonucleotides and peptides.
• A graphical user interface (GUI) was developed to enable users to depict and visualize macromolecular structures.
• A large molecule registration web service was developed based on the polymer notation of large molecules; it implements structure validation, verification, and uniqueness checking.
• A graphical user interface was developed to enable users to register large molecules into the global compound database.
• These tools streamline the RNAi workflow, facilitate compound tracking and registration, and enable data mining and structure-based design.
Acknowledgements
• Hongli Li, Targets & Mechanisms Informatics
• Shreyas Dube, Targets & Mechanisms Informatics
• Ya Chen, Targets & Mechanisms Informatics
• Jason Hughes, Target Generation Unit, BBC
• Theresa Johnson, Computational Chemistry, RTC
• Simon Xi, Computational Sciences CoE, PGRD
• Karen Mullane-Robinson, BBC Informatics
• David Klatte, Targets & Mechanisms Informatics
• Sergio Rotstein, BBC Informatics
• Pfizer Global LMR Team
For More Information, Contact
Tianhong Zhang, Ph.D
BBC Informatics
Research Technology Center
Pfizer Inc
Cambridge, MA 02139
tianhong.zhang@pfizer.com