pKaData Poster EuroCUP 2011

IUPAC pKa Compilations Converted to
Substructure Searchable Databases
Tony Slater§ and Joe Corkeryǂ
Abstract
OpenEye’s partner, pKaData Limited, has converted
all four aqueous pKa compilations of organic acids
and bases sponsored by the International Union of
Pure and Applied Chemistry (IUPAC) from book
form into fully curated, computer-readable data,
searchable by substructure.
Features
• Molecule names and structures converted to
SMILES
• IUPAC critical data quality assessment
• Data assigned to separate fields (e.g. ionic
strength, concentration and temperature)
• Associated non-numeric text placed in a separate
field from the numerical data (e.g. <5.3 is split
across two data fields) for enhanced search capabilities
• Full reference and method description for each
record
• Ionisation assignment for logD calculations
• Very flexible searching due to careful field
assignment:
  • Substructure
  • Search for basic pKa with 6.5 < pKa < 7.5
  • Use only the highest quality data
  • Search for measurements where 35°C ≤ temp ≤ 40°C
  • Any combination of the above and much
  more
• Database can be merged with existing in-house
data, with the IUPAC-sourced data clearly
identified
• Tautomers were enumerated using OpenEye’s
QuacPac program, with the ability to display
just a single representative tautomer
• A total of 75232 records and 79 columns
• Good range of organic chemistry with
applicability to pharmaceutical, agrochemical
and specialty chemicals research, as well as pKa
prediction research
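The separation of non-numeric text from numerical data described in the features above can be illustrated with a minimal Python sketch. The function name, the regular expression and the set of recognised qualifiers (<, >, <=, >=, ~) are assumptions made for this illustration; they are not taken from the pKaData schema.

```python
import re
from typing import NamedTuple, Optional


class ParsedValue(NamedTuple):
    qualifier: str          # e.g. "<", ">", "~", or "" when absent
    value: Optional[float]  # numeric part, or None if nothing could be parsed


# Hypothetical pattern: an optional qualifier followed by a decimal number.
_VALUE_RE = re.compile(r"^\s*(<=|>=|<|>|~)?\s*(-?\d+(?:\.\d+)?)\s*$")


def split_qualifier(raw: str) -> ParsedValue:
    """Split a printed entry such as '<5.3' into a text field and a number."""
    match = _VALUE_RE.match(raw)
    if match is None:
        # Keep unparseable text in the qualifier field rather than discard it.
        return ParsedValue(raw.strip(), None)
    return ParsedValue(match.group(1) or "", float(match.group(2)))


if __name__ == "__main__":
    for entry in ["<5.3", "7.152", "~ 4"]:
        print(entry, "->", split_qualifier(entry))
```

Keeping the number in its own field is what allows range queries to be written as plain numeric comparisons.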
The 13697 molecules and their 30415 experimental
pKa values can be searched very flexibly because
information is carefully assigned to defined
data fields. Simple examples of such searches are
provided below.
Introduction
pKaData Limited has kindly been granted exclusive
permission by IUPAC to convert their extensive
compilations of experimental pKa values of organic
acids and bases from book form into fully curated,
computer-readable data, searchable by substructure.
The project was completed in mid-2011, providing
researchers with access to 30415 experimental
pKa values in aqueous solution.
The four books of pKa compilations sponsored by
IUPAC are:
1. Dissociation Constants of Organic Bases in
Aqueous Solution, by D. D. Perrin
2. Dissociation Constants of Organic Bases in
Aqueous Solution, Supplement 1972, by D. D.
Perrin
3. Dissociation Constants of Organic Acids in
Aqueous Solution, by G. Kortum, W. Vogel, and
K. Andrussow
4. Ionisation Constants of Organic Acids in
Aqueous Solution, by E. P. Serjeant and Boyd
Dempsey
Conversion
Figure 1 below illustrates how the data were
extracted from the books and assigned to defined
fields. Note that a SMILES description was derived
from the molecule name and/or molecular diagram
for each molecule. pKbs were converted into pKas
using the method of Bandura and Lvov (1).
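For a conjugate acid-base pair in water, pKa + pKb = pKw, so the conversion reduces to a subtraction once pKw is known. The sketch below assumes that this is how the water ionization constant from reference 1 is used. It does not reproduce the temperature- and density-dependent formulation of that paper: the caller supplies pKw, and the default of 14.00 is an approximate value for 25°C. The function and constant names are hypothetical.

```python
PKW_25C = 14.00  # assumed approximate pKw of water at 25 °C


def pkb_to_pka(pkb: float, pkw: float = PKW_25C) -> float:
    """Return the pKa of the conjugate acid, using pKa = pKw - pKb.

    Pass a pKw appropriate to the measurement temperature; the default
    is only suitable for data measured near 25 °C.
    """
    return pkw - pkb


if __name__ == "__main__":
    # A base with pKb = 9.4 at 25 °C corresponds to a conjugate-acid pKa near 4.6.
    print(round(pkb_to_pka(9.4), 3))
```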
[Figure 1 annotations: assign data and text to relevant fields; supply details of method; supply full reference; convert names to SMILES; translate quality assessment into confidence limits; assign non-numeric text (e.g. <) to a separate field and also assign the ion group.]
pKa
7.152
7.42
7.3
Table 2: Results of the search for para-substituted phenols in
the pKa range 6.5 < pKa < 7.5
Products
pKa Databases created by conversion of the
following IUPAC books:
Base 1 (3775 molecules, 8766 pKas)
Dissociation Constants of Organic Bases in
Aqueous Solution, by D. D. Perrin
Acid 1 (1063 molecules, 2893 pKas)
Dissociation Constants of Organic Acids in Aqueous
Solution, by G. Kortum, W. Vogel and K.
Andrussow
Base 2 (4275 molecules, 7844 pKas)
Dissociation Constants of Organic Bases in
Aqueous Solution, Supplement 1972, by D. D.
Perrin
Acid 2 (4584 molecules, 10912 pKas)
Ionisation Constants of Organic Acids in Aqueous
Solution, by E. P. Serjeant and Boyd Dempsey
Complete Database (13697 molecules, 30415
pKas)
Searches
Search 1
We start with a simple search. Suppose we are
interested in the effect of a para halogen substituent
on the pKa of aniline, as in Figure 2. Furthermore,
we only want pKas measured at 25°C, and we are
interested only in the most reliable data as defined
by the IUPAC data quality assessment. The results
are shown in Table 1.
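The kind of per-halogen summary reported for Search 1 (pKa range, number of observations, most reliable value) can be sketched in plain Python. This assumes the substructure and 25°C filters have already been applied. The pKa values are the range endpoints and most reliable values quoted in Table 1, so each halogen has only three entries here rather than the full number of observations. The tuple layout and the quality labels are hypothetical.

```python
from collections import defaultdict

# (halogen, pKa, quality) tuples. Values come from Table 1 (range endpoints
# and "most reliable" values); the quality labels are hypothetical.
OBSERVATIONS = [
    ("F", 4.53, "approximate"), ("F", 4.65, "approximate"), ("F", 4.64, "reliable"),
    ("Cl", 3.82, "approximate"), ("Cl", 4.15, "approximate"), ("Cl", 3.982, "reliable"),
    ("Br", 3.8, "approximate"), ("Br", 3.95, "approximate"), ("Br", 3.888, "reliable"),
    ("I", 3.75, "approximate"), ("I", 3.84, "approximate"), ("I", 3.812, "reliable"),
]


def summarise_by_halogen(observations, best_quality="reliable"):
    """Group observations by halogen and report range, count and best value."""
    grouped = defaultdict(list)
    for halogen, pka, quality in observations:
        grouped[halogen].append((pka, quality))

    summary = {}
    for halogen, rows in grouped.items():
        values = [pka for pka, _ in rows]
        best = [pka for pka, quality in rows if quality == best_quality]
        summary[halogen] = {
            "min": min(values),
            "max": max(values),
            "n_obs": len(values),
            "most_reliable": best[0] if best else None,
        }
    return summary


if __name__ == "__main__":
    for halogen, row in summarise_by_halogen(OBSERVATIONS).items():
        print(halogen, row)
```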
Conclusions
In association with OpenEye, pKaData Limited has
converted all four IUPAC compilations of aqueous
pKa data in book form into computer-readable and
substructure-searchable form. The databases are
fully curated, and the complete database gives
researchers access to 30415 experimental pKa
determinations. The IUPAC critical data quality
assessment provides confidence limits for the
measurements. Ionisation assignments have been
provided for logD calculations.
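To show why an acid or base assignment matters for logD, the sketch below uses the standard textbook relationship for a monoprotic compound, assuming only the neutral species partitions into the organic phase. It is not presented as the calculation used by pKaData or OpenEye, and the function name and the "acid"/"base" labels are hypothetical.

```python
import math


def log_d(log_p: float, pka: float, ph: float, ion_type: str) -> float:
    """Estimate logD at a given pH for a monoprotic acid or base.

    Assumes only the neutral form partitions:
      acid: logD = logP - log10(1 + 10**(pH - pKa))
      base: logD = logP - log10(1 + 10**(pKa - pH))
    """
    if ion_type == "acid":
        exponent = ph - pka
    elif ion_type == "base":
        exponent = pka - ph
    else:
        raise ValueError("ion_type must be 'acid' or 'base'")
    return log_p - math.log10(1.0 + 10.0 ** exponent)


if __name__ == "__main__":
    # The same pKa gives very different logD depending on the assignment.
    print(log_d(2.0, 4.0, 7.4, "acid"))
    print(log_d(2.0, 4.0, 7.4, "base"))
```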
Figure 2: Search for pKa of aniline with para-substituted
halogen measured at 25°C.
Hal   pKa range (# of obs)   most reliable value
F     4.53 - 4.65 (5)        4.64
Cl    3.82 - 4.15 (9)        3.982
Br    3.8 - 3.95 (6)         3.888
I     3.75 - 3.84 (6)        3.812
Table 1: Results of the search for pKa of aniline with para-substituted halogen measured at 25°C.
References
1) Bandura, A. V., and S. N. Lvov, "The Ionization
Constant of Water over a Wide Range of
Temperatures and Densities." J. Phys. Chem. Ref.
Data, Vol. 35, 2006, pp. 15–30.
Search 2
In this search, we have an enzyme active site that
we think will accept an ionised phenol, but can we
bring the phenolic pKa down enough to be in the
range 6.5 < pKa < 7.5 using a para substituent as
in Figure 3? Will nitro suffice and are there any
“off the wall” substituents that will do the job? The
results are shown in Table 2.
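The numeric part of Search 2 is an open-interval filter on the pKa field. The sketch below assumes the para-substituted phenol substructure match has already been done and that entries carrying a qualifier such as < are bounds rather than measured values, so they are excluded. The record class and field names are hypothetical. The first three values are those listed in Table 2, and the <5.3 entry echoes the example in the features list.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PkaRecord:
    qualifier: str  # "" for a plain measured value, otherwise e.g. "<" or "~"
    pka: float


def in_open_range(records: List[PkaRecord], low: float, high: float) -> List[PkaRecord]:
    """Keep unqualified records with low < pKa < high (both bounds exclusive)."""
    return [r for r in records if r.qualifier == "" and low < r.pka < high]


if __name__ == "__main__":
    candidates = [
        PkaRecord("", 7.152),
        PkaRecord("", 7.42),
        PkaRecord("", 7.3),
        PkaRecord("<", 5.3),  # a bound, not a value: never matches
    ]
    for hit in in_open_range(candidates, 6.5, 7.5):
        print(hit.pka)
```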
§ pKaData Limited
116 Wood Road
RD9 Maungatapere
Whangarei 0179
New Zealand
+6494346197
tony@pKaData.com
www.pKaData.com
ǂ OpenEye
9 Bisbee Ct
Suite D
Santa Fe, NM 87508
505.473.7385
info@eyesopen.com
www.eyesopen.com
[Figure 1 annotation: assign temperature to the correct field and put other text (e.g. ~ or >) in a separate field.]
Figure 1: Example of data extraction from book into database.
Figure 3: Search for para-substituted phenols in the pKa
range 6.5 < pKa < 7.5