IUPAC pKa Compilations Converted to Substructure Searchable Databases

Tony Slater § and Joe Corkery ǂ

Abstract
OpenEye’s partner, pKaData Limited, has converted all four aqueous pKa compilations of organic acids and bases sponsored by the International Union of Pure and Applied Chemistry (IUPAC) from book form into fully curated, computer-readable data, searchable by substructure.

Features
• Molecule names and structures converted to SMILES
• IUPAC critical data quality assessment
• Data assigned to separate fields (e.g. ionic strength, concentration and temperature)
• Associated non-numeric text placed in a separate field from the numerical data (e.g. <5.3 is assigned to two data fields) for enhanced search capabilities
• Full reference and method description for each record
• Ionisation assignment for logD calculations
• Very flexible searching due to careful field assignment:
    • Substructure
    • Search for basic pKa with 6.5 < pKa < 7.5
    • Use only the highest quality data
    • Search for records where 35°C ≤ temp ≤ 40°C
    • Any combination of the above and much more
• Database can be merged with existing in-house data, with the IUPAC-sourced data clearly identified
• Tautomers were enumerated using OpenEye’s QuacPac program, with the ability to display just a single representative tautomer
• 75232 records and 79 columns in total
• Good range of organic chemistry with applicability to pharmaceutical, agrochemical and specialty chemicals research, as well as pKa prediction research

The 13697 molecules with 30415 experimental pKa values can be searched very flexibly due to the careful assignment of information to defined data fields. Simple examples of such searches are provided below.

Introduction
pKaData Limited has kindly been granted exclusive permission by IUPAC to convert their extensive compilations of experimental pKa values of organic acids and bases from book form into fully curated, computer-readable data, searchable by substructure.
The project was completed in mid-2011, providing researchers with access to 30415 experimental pKa values in aqueous solution. The four books of pKa compilations sponsored by IUPAC are:
1. Dissociation Constants of Organic Bases in Aqueous Solution, by D. D. Perrin
2. Dissociation Constants of Organic Bases in Aqueous Solution, Supplement 1972, by D. D. Perrin
3. Dissociation Constants of Organic Acids in Aqueous Solution, by G. Kortum, W. Vogel, and K. Andrussow
4. Ionisation Constants of Organic Acids in Aqueous Solution, by E. P. Serjeant and Boyd Dempsey

Conversion
Figure 1 below illustrates how the data were extracted from the books and assigned to defined fields. Note that a SMILES description was derived from the molecule name and/or molecular diagram for each molecule. pKbs were converted into pKas using the method of Bandura and Lvov (1).

Figure 1 annotations:
• assign data and text to relevant fields
• supply details of method
• supply full reference
• convert names to SMILES
• translate quality assessment into confidence limits
• assign non-numeric text (e.g. <) to separate field, also assign ion group
• convert pKb into pKa

Table 2: Results for search for para-substituted phenols in the pKa range 6.5 < pKa < 7.5 (columns: Substituent X, pKa; pKa values listed: 7.152, 7.42, 7.3)

Products
pKa databases created by conversion of the following IUPAC books:

Database                               Data Source
Base 1 (3775 molecules, 8766 pKas)     Dissociation Constants of Organic Bases in Aqueous Solution, by D. D. Perrin
Acid 1 (1063 molecules, 2893 pKas)     Dissociation Constants of Organic Acids in Aqueous Solution, by G. Kortum, W. Vogel and K. Andrussow
Base 2 (4275 molecules, 7844 pKas)     Dissociation Constants of Organic Bases in Aqueous Solution, Supplement 1972, by D. D. Perrin
Acid 2 (4584 molecules, 10912 pKas)    Ionisation Constants of Organic Acids in Aqueous Solution, by E. P. Serjeant and Boyd Dempsey

Searches

Search 1
A simple search first: say we are interested in the effect of a para halogen substituent on the pKa of aniline, as in Figure 2.
Furthermore, we only want pKas measured at 25°C and we are interested only in the most reliable data as defined by the IUPAC data quality assessment. The results are shown in Table 1.

Figure 2: Search for pKa of aniline with para-substituted halogen (Hal) measured at 25°C.

Hal   pKa range (# of obs)   most reliable value
F     4.53 - 4.65 (5)        4.64
Cl    3.82 - 4.15 (9)        3.982
Br    3.8 - 3.95 (6)         3.888
I     3.75 - 3.84 (6)        3.812

Table 1: Results for search for pKa of aniline with para-substituted halogen measured at 25°C.

Search 2
In this search, we have an enzyme active site that we think will accept an ionised phenol, but can we bring the phenolic pKa down enough to be in the range 6.5 < pKa < 7.5 using a para substituent as in Figure 3? Will nitro suffice, and are there any “off the wall” substituents that will do the job? The results are shown in Table 2.

Figure 3: Search for para-substituted phenols (substituent X) in the pKa range 6.5 < pKa < 7.5

Complete Database (13697 molecules, 30415 pKas): all four compilations combined.

Conclusions
In association with OpenEye, pKaData Limited has converted all four IUPAC compilations of aqueous pKa data in book form into computer-readable and substructure-searchable form. The databases are fully curated, and the complete database gives researchers access to 30415 experimental pKa determinations. The IUPAC critical data quality assessment provides confidence limits for the measurements. Ionisation assignments have been provided for logD calculations.

References
1) Bandura, A. V., and S. N. Lvov, "The Ionization Constant of Water over a Wide Range of Temperatures and Densities." J. Phys. Chem. Ref. Data, Vol. 35, 2006, pp. 15-30.

§ pKaData Limited
116 Wood Road, RD9 Maungatapere, Whangarei 0179, New Zealand
+6494346197
tony@pKaData.com
www.pKaData.com

ǂ OpenEye
9 Bisbee Ct, Suite D, Santa Fe, NM 87508
505.473.7385
info@eyesopen.com
www.eyesopen.com

Figure 1: Example of data extraction from book into database. Annotation: assign temperature to correct field and put other text (e.g. ~ or >) in separate field.
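The field assignments and searches described in this poster can be illustrated with a short Python sketch. It is not the pKaData schema or software: the `Record` class, its field names, and the functions `split_qualifier`, `pkb_to_pka` and `search` are all hypothetical names chosen for illustration. The sample records reuse the "most reliable value" entries from Table 1, with SMILES strings written here for the four para-halogen anilines. The pKb conversion uses the standard relation pKa = pKw - pKb and leaves the value of pKw to the caller, since it depends on temperature.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Every name here (Record, its fields, the functions) is hypothetical and is
# not the actual schema or API of the pKaData databases.


@dataclass
class Record:
    smiles: str              # structure as SMILES
    pka: float               # numerical part of the pKa value
    qualifier: str           # non-numeric text such as "<", ">" or "~"
    temp_c: Optional[float]  # measurement temperature in deg C, if recorded
    most_reliable: bool      # stand-in for the IUPAC quality assessment


def split_qualifier(text: str) -> Tuple[str, float]:
    """Split a printed value such as '<5.3' into ('<', 5.3)."""
    match = re.fullmatch(r"\s*([<>~]?)\s*(-?\d+(?:\.\d+)?)\s*", text)
    if match is None:
        raise ValueError(f"cannot parse pKa value: {text!r}")
    return match.group(1), float(match.group(2))


def pkb_to_pka(pkb: float, pkw: float) -> float:
    """pKa of the conjugate acid from pKb, using pKa = pKw - pKb.

    pkw must be supplied by the caller for the temperature of interest.
    """
    return pkw - pkb


def search(
    records: List[Record],
    pka_min: Optional[float] = None,
    pka_max: Optional[float] = None,
    temp_c: Optional[float] = None,
    reliable_only: bool = False,
) -> List[Record]:
    """Return the records that pass every filter that was supplied.

    The pKa bounds are exclusive, matching searches like 6.5 < pKa < 7.5.
    """
    hits = []
    for rec in records:
        if pka_min is not None and rec.pka <= pka_min:
            continue
        if pka_max is not None and rec.pka >= pka_max:
            continue
        if temp_c is not None and rec.temp_c != temp_c:
            continue
        if reliable_only and not rec.most_reliable:
            continue
        hits.append(rec)
    return hits


# "Most reliable value" entries from Table 1 (para-halogen anilines, 25 deg C).
RECORDS = [
    Record("Nc1ccc(F)cc1", 4.64, "", 25.0, True),
    Record("Nc1ccc(Cl)cc1", 3.982, "", 25.0, True),
    Record("Nc1ccc(Br)cc1", 3.888, "", 25.0, True),
    Record("Nc1ccc(I)cc1", 3.812, "", 25.0, True),
]

qualifier, value = split_qualifier("<5.3")
hits = search(RECORDS, pka_min=3.85, pka_max=4.0, temp_c=25.0, reliable_only=True)
print(qualifier, value)
print([rec.smiles for rec in hits])
```

Keeping the qualifier in its own field is what lets the numerical pKa be compared directly in a range search. A real substructure search would additionally need a cheminformatics toolkit to match structural patterns against the SMILES; that step is left out of this sketch.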