ChemAxon Presentation

Java Solutions for Cheminformatics
March 2005
About Us
About Us
Molecule Drawing and Visualization
Structure Searching
Cartridge
Structure Standardization
Molecular Predictions
Chemical Expressions
Screening
Clustering
Fragment Analysis
Virtual Synthesis
Current Developments
History
Formed: 1998, Budapest, Hungary
Skills base:
• Chemistry
• Software development
• Predictive tools
Aim: Platform independent software for chemistry
Highlights
• 1998: Custom projects
• 1999: Java tools for sketching/viewing structures
• 2000: Structure database support
• 2001: Clustering and diversity analysis
• 2003: Pharmacophore screening, property predictions, reaction processing, fragmenting
• 2004: Cartridge technology, virtual synthesis, improved SMARTS support
People
Developers: 17 (7 PhD, 10 MSc)
Business Support: 3 (1 MSc, 2 BSc)
Technical expertise
• Cheminformatics
• Synthetic and physicochemistry
• Virtual drug design
• Java
• Web technology
Commercial expertise
• Negotiation & contracting
• Relationship management
• Collaboration steering and development
• Strategic marketing
• Mutually beneficial (win-win) business relationships
Selected Application Areas
• Global licenses
• Custom development projects
• Value added constructions
• Websites/portal front and back end
• Educational
Product development
1999
2000
RDF,
Marvin SDF,
XYZ
animations,
Applets,
CML,
Molfiles, stereo
templates,
support,
compressed
Windows, Unix
formats, Swing,
3D rendering
Structure Database and
Cheminformatics toolkit
Chemical drawing
1998
SMILES,
SMARTS,
PDB,
Rgroups,
isotopes,
shortcuts,
Marvin
Beans
2001
2002
Ball and stick,
JPG, PNG,
SVG,
Cut&Paste with
Isis/ChemDraw,
2D cleaning,
(de)aromatization, reactions
Mac support,
signed applets,
Java Web
Start, atom
mapping
JChem
Oracle, MySQL,
SQLServer,
Access, hashed
fingerprints,
substructure and
similarity
searching
clustering,
diversity
DB2,
PostgreSQL,
Rgroup
searching
2003
Partial charge,
pKa, logP, logD,
3D generation,
radicals,
Sgroups
reaction
searching,
reaction
processing,
pharmacophore
analysis.
screening,
standardization,
fragmentation
2004
Marvin file format,
enhanced
stereo, enhanced
SMARTS support,
shapes, text
boxes, multiple
groups, TPSA,
Donor/Acceptor...
cartridge,
enhanced stereo
searching,
recursive
SMARTS,
chemical
expressions,
virtual
synthesis…
Current Products Overview
Multiple Deployment Formats
• Applications
• Java Applets
• Signed Java Applets
• Java Web Start
• Java Beans
• Plugins
• JSP
Why ChemAxon?
• Sophisticated virtual chemistry technology
• Platform independence and Web (Java)
• High performance tools (speed, capacity)
• Client oriented development
• Comprehensive API for the developers
• Detailed documentation
• Competitive prices
• Fast and reliable support
Product Support
„Developers supporting developers”
• Fast response to support questions: within 24 hours at most (and fast solutions too)
• Final and beta releases available online.
• Detailed documents available online and extensive
help bundled within software
• Skilled, relevant human support (direct developer to developer)
• Product development based on support requests
Molecule Drawing and Visualization
Operating Systems
• 100% pure Java
• Windows
– 95, 98, Me, NT, 2000, XP
• Macintosh
– OS 9, OS X
• Unix
– Linux, Solaris, Irix, etc.
Web Browsers
• Internet Explorer
• Netscape
• Mozilla
• Safari
• Opera
Marvin
• Various file formats
• Isotopes, charges, radicals
• SMARTS properties (atoms,
bonds, recursive SMARTS)
• Alias, pseudo atoms
• Chemical error checking
• Templates
• Generic atoms and bonds
• Abbreviated groups
• Atom lists and not lists
• Reactions
• 2D cleaning
• Atom maps
• 3D cleaning
• R-groups
• Various 3D models
• Stereo bonds, stereo
configurations (R/S, E/Z)
• Shapes, text boxes
• Enhanced stereo
(ABS/AND/OR)
• Plugins
Various File Formats
Isotopes, Charges, Radicals
Templates
Abbreviated Groups
R-groups
Reactions
Rendered 3D displays with MarvinSpace
Structure Cleaning
[Figure: the structure CC(C)NCC(O)COC1=C2C=C(C)NC2=CC=C1, given as topology only, cleaned into 2D and 3D coordinates]
Structure Searching
JChem Base Features
• Rapid fingerprint-based database scanning
• Sophisticated graph-based searching
• Integration with databases
– Oracle
– MS SQL Server
– DB2
– MySQL
– PostgreSQL
– InterBase
– Access
• Custom standardization
• JChem Cartridge for searching in Oracle
• JSP integration
Import with JChem Base Manager
Query Features
• Exact structure
• Substructure
• Atom lists and notlists
• Explicit hydrogens
• Generic atoms
• Generic bonds
• SMARTS atom properties
– Aliphatic, aromatic
– Hydrogen count
– Connection count
– Valence
– Ring count
– Smallest ring size
– Recursive SMARTS
• Stereo atoms
• Stereo bonds
• R-group queries
– R-groups
– Occurrence
– if / then conditions
– RestH
• Reaction search
– Transformation recognition
– Component identification
– Stereospecific reactions (inversion, retention)
• Diastereomers
– Enhanced stereo groups (Abs, And, Or)
JChem Base JSP Integration
Thin client support: only a web
browser and Java required
Cartridge Technology
JChem Cartridge for Oracle
Oracle can be extended to support chemical database
operations using the JChem Cartridge for Oracle
Examples:
Substructure search displaying ID, SMILES codes, and molweight:
SELECT cd_id, cd_smiles, cd_molweight FROM my_structures
WHERE jc_contains(cd_smiles, 'CC(=O)Oc1ccccc1C(O)=O') = 1;
Counting benzene derivatives that conform to Lipinski’s rule of five:
SELECT count(*) FROM my_structures
WHERE jc_compare(structure, 'c1ccccc1','sep=!t:s!ctFilter:
(mass() <= 500) &&
(logP() <= 5) &&
(donorCount() <= 5) &&
(acceptorCount() <= 10)') = 1;
JChem Cartridge for Oracle
Example Oracle search returning
similar structures with logP >1,
which were acquired after April
14th, 2002. MarvinView below.
Structure Standardization
Standardization
• Explicit hydrogens
• Aromatic bonds
• Mesomers
• Tautomers
• Counterions
Standardization Example
[Figure: example structure before and after standardization]
Molecular Predictions
Calculator Plugins
Available Calculations
• Elemental analysis
• Charge distribution
• Polarizability
• pKa
• logP
• logD
• Polar surface area
• Hückel analysis
• H-bond donor-acceptor
• Major microspecies
• Refractivity
Calculation Interface
• Marvin GUI
• Command line
• Chemical Terms
• API
Elemental Analysis
Polar Surface Area
Partial Charge Distribution
Partial Charge Distribution Calculation
Partial Equalization of Orbital Electronegativities (PEOE)
Orbital electronegativity as defined by Mulliken
Orbital electronegativity of atom i:
$\chi_i = a_t + b_t q_i + c_t q_i^2$
$q_i$: partial charge
The partial charge of atom i is calculated iteratively, based on Gasteiger’s method:
$\chi_i^{(0)} = a_t, \quad q_i^{(0)} = 0$
$q_i^{(n+1)} = q_i^{(n)} + \sum_k (0.5)^n \, \frac{\chi_i - \chi_k}{\max(\chi_i, \chi_k)}$
k: index of a neighbor of atom i
Polarizability
logP
logP Example
$\log P = \sum_i f_i$
$f_i$: atomic logP increment
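The additive scheme can be illustrated with a minimal Python sketch. The atom-type labels and increment values below are invented for illustration only; they are not ChemAxon's parameters, and a real model assigns atom types from the structure.

```python
# Hypothetical atomic logP increments f_i, keyed by a simple atom-type label.
# These numbers are made up for the example; real increments are fitted to data.
INCREMENTS = {
    "C_aliphatic": 0.5,
    "C_aromatic": 0.3,
    "O_hydroxyl": -1.0,
    "N_amine": -1.2,
}


def estimate_logp(atom_types, increments=INCREMENTS):
    """Return the sum of the per-atom increments f_i over all atoms."""
    return sum(increments[t] for t in atom_types)


# A toy "molecule" described only by its atom types (two carbons, one hydroxyl O).
toy_molecule = ["C_aliphatic", "C_aliphatic", "O_hydroxyl"]
print(estimate_logp(toy_molecule))
```

The point of the sketch is only the summation: once every atom has a type, the prediction is a single pass over the atoms.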
Validation of the logP prediction
logD
logD Example
[Figure: microspecies of a molecule with three ionizable sites: the neutral species (0), the mono-ionized species (1 to 3), the di-ionized species (4 to 6) and the tri-ionized species (7), connected by micro ionization constants k1 to k7. The accompanying equation expresses log D in terms of k1 to k7, the micro partition coefficients p0 to p7 and [H+].]
logD is computed using micro ionization constants (ki),
micro partition coefficients (pi), and pH
pKa
pKa Plugin - Microconstants
Micro ionization constants (logk) are calculated from
regression equations that have three types of calculated
parameters:
• Partial charges
• Polarizabilities
• Intramolecular interactions
pKa Plugin - Macroconstants
Macro ionization constants
(pKa) are calculated from the
microconstants (logk)
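Ionization scheme
[Figure: ionization scheme for a molecule with three ionizable sites (1, 2, 3), linking the neutral form to the microspecies 1-, 2+, 3-, 1-2+, 1-3-, 2+3- and 1-2+3-]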
Hydrogen Bonds in pKa Calculation
$\Delta \log k = a\,(q_i - q_k) + b$
a,b: regression parameters
Intramolecular hydrogen bonds are also taken into account
Validation of the pKa prediction
Chemical Expressions
Chemical Terms
Elements of the language
• structure matching functions (describing functional groups, reaction sites, similarity…)
• property calculations (partial charge distribution, pKa, logP, electrophilicity…)
• arithmetic and logic operators
Chemical Terms examples
searching
match("olefine.mol") && !match("c1ccncc1") && (atomCount(16)
== 0) || (mass() < 300);
goal functions
inhibitor = inhibitor.mol;
(similarity(inhibitor, pharmacophore_tanimoto) > 0.8) &&
(similarity(inhibitor, chemical_tanimoto) < 0.5);
filtering
(mass() <= 500) &&
(logP() <= 5) &&
(donorCount() <= 5) &&
(acceptorCount() <= 10);
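For readers without the Chemical Terms evaluator at hand, the filtering expression can be mirrored in plain Python. This is a sketch under the assumption that the four descriptors have already been computed elsewhere; the `Descriptors` class and the example values are hypothetical and only illustrate how the four conditions combine.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed properties of one molecule (hypothetical container)."""
    mass: float
    logp: float
    donor_count: int
    acceptor_count: int


def passes_rule_of_five(d: Descriptors) -> bool:
    """Same four conditions as the filtering expression, joined with logical AND."""
    return (
        d.mass <= 500
        and d.logp <= 5
        and d.donor_count <= 5
        and d.acceptor_count <= 10
    )


# Illustrative values only, not computed by any ChemAxon tool.
small = Descriptors(mass=180.16, logp=1.2, donor_count=1, acceptor_count=4)
heavy = Descriptors(mass=812.0, logp=6.3, donor_count=3, acceptor_count=12)
print(passes_rule_of_five(small), passes_rule_of_five(heavy))
```

Because all four terms are joined with AND, a single violated limit is enough to reject a compound.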
Applications of Chemical Terms
• virtual synthesis: reaction and synthesis rules
• pharmacophore analysis: pharmacophore definitions
• drug design: goal functions
• structure searching: advanced query expressions
Screening
Pharmacophore Mapping
atom type colors
■ hydrophobic (h)
■ acceptor (a)
■ donor / cationic (d/c)
pharmacophore type colors
■ aromatic (r)
■ acceptor / donor (a/d)
■ donor / aromatic (d/r)
Topological Pharmacophore Fingerprint
[Figure: molecule with atoms annotated by pharmacophore type labels (h, r, r/d, d/a, d/+)]
Hypothesis Fingerprints
Hypothesis | Advantages                           | Disadvantages
Minimum    | strict selection of common features  | very sensitive to one missing feature
Average    | not that sensitive to outliers       | less selective if actives are similar
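The trade-off between the two hypothesis types is easy to see on toy data. The sketch below assumes that a hypothesis is built bin by bin from the fingerprints of the known actives (element-wise minimum or mean); the fingerprints are invented and the function names are hypothetical, not part of any ChemAxon API.

```python
def minimum_hypothesis(fingerprints):
    """Element-wise minimum: keeps only what every active has in common."""
    return [min(column) for column in zip(*fingerprints)]


def average_hypothesis(fingerprints):
    """Element-wise mean: one unusual active shifts each bin only slightly."""
    return [sum(column) / len(column) for column in zip(*fingerprints)]


# Three invented pharmacophore histograms (feature-pair counts) of active compounds.
actives = [
    [2, 1, 3, 0],
    [2, 1, 2, 1],
    [2, 0, 3, 1],  # lacks the second feature entirely
]

print(minimum_hypothesis(actives))  # second bin drops to 0 because one active lacks it
print(average_hypothesis(actives))
```

In the minimum hypothesis a single active that lacks a feature removes that feature for everyone, which is the sensitivity named in the table; the average keeps a reduced but non-zero value.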
Dissimilarity Metrics
Euclidean
• standard
• normalized
• weighted
• asymmetric
Tanimoto
• standard
• scaled
• asymmetric
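As a point of reference, the two standard metrics can be written in a few lines of Python. This sketch covers only the textbook forms (Euclidean distance on numeric vectors, Tanimoto dissimilarity on binary fingerprints); the normalized, scaled, weighted and asymmetric variants are not shown, and the function names are hypothetical.

```python
import math


def euclidean(a, b):
    """Standard Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def tanimoto_dissimilarity(a, b):
    """1 - |A and B| / |A or B| for binary fingerprints given as 0/1 lists."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    if either == 0:
        return 0.0  # two empty fingerprints are treated as identical
    return 1.0 - both / either


fp1 = [1, 1, 0, 1, 0]
fp2 = [1, 0, 0, 1, 1]
print(euclidean(fp1, fp2))
print(tanimoto_dissimilarity(fp1, fp2))
```

Euclidean distance grows with every differing bin, while Tanimoto ignores positions where both fingerprints are zero, which matters for sparse fingerprints.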
Screening Optimization
• 10,000 test compounds (from NCI): 300 for optimization, 9,700 for validation
• 50 active compounds (β-adrenoreceptor antagonists), split into thirds: training set, query set, spikes
• Two phases: TRAINING and VALIDATION
Screening Validation
β2-adrenoreceptor antagonists
All compounds: 9,700
Known active compounds: 18

minimum hypothesis | before optimization | after optimization
all hits           | 2,476               | 18
known active hits  | 15                  | 18
enrichment         | 3.27                | 539.89
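The enrichment figures on this slide can be reproduced with a short calculation. The sketch assumes the usual definition (hit rate among the retrieved compounds divided by the hit rate expected at random) and that the screened pool is the 9,700 NCI compounds plus the 18 spiked actives, 9,718 in total; both are assumptions, chosen because they match the reported values.

```python
def enrichment(active_hits, all_hits, actives, total):
    """Ratio of the active fraction among hits to the active fraction in the pool."""
    return (active_hits / all_hits) / (actives / total)


TOTAL = 9_700 + 18  # NCI compounds plus spiked actives (assumed pool size)

before = enrichment(active_hits=15, all_hits=2_476, actives=18, total=TOTAL)
after = enrichment(active_hits=18, all_hits=18, actives=18, total=TOTAL)
print(round(before, 2), round(after, 2))  # 3.27 539.89
```

After optimization every hit is a known active, so the enrichment reaches its maximum for this pool, total / actives.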
Active Hit Distribution
β2-adrenoreceptor antagonists
Mixing 18 active compounds with random 9,700 NCI molecules.
Sorting by pharmacophore similarity.
Screening Validation
10,000 NCI compounds

             |         | before optimization                 | after optimization
family       | actives | all hits | active hits | enrichment | all hits | active hits | enrichment
ACE          | 7       | 6,537    | 6           | 1.27       | 171      | 6           | 47.01
Angiotensin2 | 4       | 177      | 3           | 40.40      | 66       | 3           | 105.50
D2           | 5       | 417      | 5           | 22.90      | 31       | 5           | 269.08
delta        | 7       | 60       | 5           | 106.70     | 9        | 5           | 495.25
FTP          | 13      | 1,020    | 11          | 7.97       | 13       | 10          | 422.30
mGluR1       | 7       | 1,744    | 3           | 2.38       | 10       | 7           | 571.10
NPY Y5       | 49      | 6,370    | 38          | 1.18       | 145      | 45          | 47.12
thrombin     | 3       | 328      | 2           | 19.6       | 57       | 2           | 109.64
Optimized Screening
JSP Example
Optimized Screening
JSP Example Hits
Clustering
JKlustor
• Jarvis-Patrick
• Ward
Ward Clustering Features
• Ward's minimum variance method
• Murtagh's reciprocal nearest neighbor (RNN)
algorithm
• O(n²) time complexity
• O(n) memory complexity
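How a reciprocal-nearest-neighbour approach reaches these bounds can be sketched in Python. The code below is a nearest-neighbour chain formulation written for illustration, not JKlustor's implementation; it stores only a size and a centroid per cluster (linear memory) and computes Ward distances on demand. All names are hypothetical.

```python
def ward_distance(a, b):
    """Increase in within-cluster variance if clusters a and b are merged.
    Each cluster is a (size, centroid) pair."""
    (na, ca), (nb, cb) = a, b
    squared = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return na * nb / (na + nb) * squared


def merge(a, b):
    """Size and centroid of the union of two clusters."""
    (na, ca), (nb, cb) = a, b
    n = na + nb
    return (n, tuple((na * x + nb * y) / n for x, y in zip(ca, cb)))


def ward_rnn(points):
    """Agglomerate points; return merges as (id_a, id_b, new_id, cost)."""
    clusters = {i: (1, tuple(p)) for i, p in enumerate(points)}
    next_id = len(points)
    merges, chain = [], []
    while len(clusters) > 1:
        if not chain:
            chain.append(next(iter(clusters)))
        while True:
            tip = chain[-1]
            best, best_d = None, float("inf")
            for cid, cluster in clusters.items():
                if cid == tip:
                    continue
                d = ward_distance(clusters[tip], cluster)
                # On ties prefer the previous chain element so the chain terminates.
                prev_tie = d == best_d and len(chain) > 1 and cid == chain[-2]
                if d < best_d or prev_tie:
                    best, best_d = cid, d
            if len(chain) > 1 and best == chain[-2]:
                break  # tip and its predecessor are reciprocal nearest neighbours
            chain.append(best)
        b, a = chain.pop(), chain.pop()
        clusters[next_id] = merge(clusters.pop(a), clusters.pop(b))
        merges.append((a, b, next_id, best_d))
        next_id += 1
    return merges


for step in ward_rnn([(0, 0), (0, 1), (10, 0), (10, 1)]):
    print(step)
```

Merges found this way are not necessarily emitted in order of increasing cost; sorting the merge list by cost afterwards gives the usual dendrogram order.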
Ward Pharmacophore Clustering
Example
• 8 active compound sets
– 5-HT3-antagonists
– ACE inhibitors
– angiotensin 2 antagonists
– D2 antagonists
– delta antagonists
– FTP antagonists
– mGluR1 antagonists
– thrombin inhibitors
Ward Centroids
A Ward Cluster
D2 antagonists
Maximum Common Substructure Clustering
Drug Design
RECAP fragmentation example
amide:2
ether:1
amide:1
amine:1
amine:2
ether:2
Virtual Synthesis
The Ideal Virtual Reaction
• Generic (simple)
– the equation describes the
transformation only
– few hundred generic reactions can
form the basic armory of a preparative
chemist
• Specific (complex)
– chemo-, recognizes reactive and
inactive functional groups
– regio-, "knows" directing rules
– stereo-, inversion/retention
• Customizable
– to improve reaction model quality
Reaction Modeling
• Processing selective "smart" reactions
• Batch mode (sequential or combinatorial
combinations)
• Reverse direction
• High performance (speed and capacity)
Customizable Reaction Engine!
Chemoselective Reaction Definition
REACTIVITY:
!match(ratom(3), "[#6][N,O,S:1][N,O,S]", 1) &&
!match(ratom(3), "[N,O,S:1][C,P,S]=[N,O,S]", 1)
Reactants
369 isocyanates
and isothiocyanates
2920 amines, alcohols
and thiols
Chemoselective Reaction Products
1,264,391 single site products
Regioselectivity (Markovnikov, Zaitsev)
Addition reaction definition with the Markovnikov rule.
r1
SELECTIVITY:
hcount(ratom(2))
An elimination reaction definition with Zaitsev’s rule.
r2
SELECTIVITY:
-hcount(ratom(2))
Regioselective Reaction Example
Chlorine migration example in four steps by consecutive elimination and addition
reactions.
r2
r2
r1
r1
Regioselectivity (SeAr)
Reaction definition for the electrophilic aromatic bromination of the benzene ring. The
expression defines a regioselectivity rule for the major product.
SELECTIVITY: -charge(ratom(1))
TOLERANCE: 0.0045
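One way to read a selectivity rule with a tolerance is sketched below in Python. It assumes that the site with the highest selectivity value is preferred and that every site within the tolerance of that best value is also accepted as a main product; this interpretation, the function name and the partial charges are illustrative assumptions, not taken from the reaction engine.

```python
def select_sites(scores, tolerance):
    """Return the sites whose score lies within `tolerance` of the best score."""
    best = max(scores.values())
    return sorted(site for site, s in scores.items() if best - s <= tolerance)


# Invented partial charges on the ring carbons of a monosubstituted benzene.
charges = {"ortho": -0.062, "meta": -0.050, "para": -0.060}

# SELECTIVITY: -charge(ratom(1)), so the most negative carbon scores highest.
scores = {site: -q for site, q in charges.items()}

print(select_sites(scores, tolerance=0.0045))  # ortho and para are both accepted
print(select_sites(scores, tolerance=0.0))     # only the single best site remains
```

With a zero tolerance only one site per molecule survives, which is consistent with reporting a single monobrominated main product for each reactant.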
Regioselectivity (SeAr) Products
The virtual bromination of toluene with the above reaction definition yields the ortho
and para isomers as main products…
… and bromine is directed to the meta position in the case of nitrobenzene.
Regioselectivity (SeAr) Example
Products
1,198 monobrominated main products
(tolerance is set to zero)
Virtual Synthesis
• Multiple steps
• Flexible compound dispatching
• Synthesis rules
• Synthesis tree building
• Memory, file and database mode
• Graphical synthesis browser
• Building block coloring
Customizable Synthesis Engine!
Synthesis Example
alkyne coupling
lactone aminolysis
esterification
Derek S. Tan, Michael A. Foley, Matthew D. Shair, Stuart L. Schreiber*, J. Am. Chem. Soc., 1998, 120, 8565-8566
Synthesis Definition
Synthesis route definition
Step1: A + B >> C (R1)
Step2: C + D >> E (R2)
Step3: E + F >> G (R3)
"Smart" reaction library
R1: alkyl iodide + alkyne >> alkyl-alkyne
R2: lactone + amine >> amide
R3: alcohol + carboxylic acid >> ester
Component set definition (Set1 to Set7)
A
B1, B2, B3
D1, D2
F1, F2
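The combinatorial character of such a three-step route can be illustrated with a small Python sketch. It assumes that every listed component is combined with every other (A with B1 to B3, then D1 or D2, then F1 or F2); the `react` function is a placeholder that only builds a label, so no chemistry is performed and no ChemAxon API is used.

```python
from itertools import product

# Component sets as listed on the slide; the dictionary keys are hypothetical names.
SETS = {
    "A": ["A"],
    "B": ["B1", "B2", "B3"],
    "D": ["D1", "D2"],
    "F": ["F1", "F2"],
}


def react(reaction, x, y):
    """Placeholder for applying a reaction: returns a label for the product."""
    return f"{reaction}({x},{y})"


def enumerate_routes(sets):
    """Run Step1 (R1), Step2 (R2) and Step3 (R3) for every component combination."""
    products = []
    for a, b, d, f in product(sets["A"], sets["B"], sets["D"], sets["F"]):
        c = react("R1", a, b)  # Step1: A + B >> C
        e = react("R2", c, d)  # Step2: C + D >> E
        g = react("R3", e, f)  # Step3: E + F >> G
        products.append(g)
    return products


routes = enumerate_routes(SETS)
print(len(routes))  # 1 * 3 * 2 * 2 = 12 final products
print(routes[0])
```

The product count grows multiplicatively with the size of each component set, which is why memory, file and database modes matter for larger libraries.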
Synthesis Browser
Current Developments
Recent Developments
• Automatic searching of low-energy conformers
• Improved Oracle cartridge
• Structure searching combined with chemical calculations
• Exhaustive Synthesis for metabolism applications
• R-group decomposition
• Maximum common substructure search in molecule pairs and in
libraries
Current Developments
• MarvinSpace, an OpenGL based 3D molecule and surface
visualisation engine for small and macromolecules
• Instant JChem Base, a desktop and enterprise chemical
database client with form builder
• IUPAC naming plugin
• Isoelectric point plugin
• Random Synthesis for building up a diverse virtual space of
synthetically feasible compounds
• Extension of the reaction library
• Further descriptors in the Topology Analysis plugin
Future Plans
• Metabolic transformation library
• Diverse database of synthetically accessible compounds
• Search in Markush compounds
• Peptide builder
• Fragment-based activity analysis of compound libraries
• AnalogMaker (fragment based random evolutionary analog
design)
• Retrosynthesis
Visit us
• Home page
– www.chemaxon.com
• Forum
– www.chemaxon.com/forum
• Animated demos and tutorials
– www.chemaxon.com/demos
• Presentations and posters
– www.chemaxon.com/conf
Thank you for your attention
Máramaros köz 3/a
Budapest, 1037
Hungary
info@chemaxon.com
www.chemaxon.com