Computer Assisted Structure Elucidation and 3-D Structural Modeling of “Complex” and “Operationally Defined” Organic Compounds: Fundamental Concepts and Case Studies

Mamadou S. Diallo 1, 2
1 Materials and Process Simulation Center, Beckman Institute, California Institute of Technology, Pasadena
2 Department of Civil Engineering, Howard University, Washington DC

Outline
• Background
• Computer Assisted Structure Elucidation: The Signature Molecular Descriptor
• Computer Assisted Structure Elucidation: The SIGNATURE Program
• Case Study: Computer Assisted Structural Elucidation and 3-D Structural Modeling of Chelsea Soil Humic Acid
• Summary and Outlook
• Acknowledgments

Background
• Computational chemistry is increasingly being used to characterize the molecular physical chemistry of organic/inorganic compounds.
• The starting point of any molecular-level investigation of the physical-chemical behavior of a given compound by computational chemistry is the bond topology, that is, a list of the connections between all its atoms.
• For “small” and “well defined” organic/inorganic compounds, a crystal structure or a 2-D structural model is usually available.
• There are many cases in chemistry (e.g., environmental chemistry, petroleum chemistry, soil chemistry, organic geochemistry and the chemistry of natural products) where the 2-D/3-D “structures” of the compounds of interest are not known.

“Operationally Defined” Organic Compounds in Environmental Chemistry
• Humic acids (HAs) are operationally defined as the fraction of natural organic matter that is insoluble in aqueous solutions at acidic pH (<2) and soluble in aqueous solutions at higher pH.
• They are ubiquitous in nature. In terrestrial ecosystems, the amount of carbon in HAs (6.0 × 10^12 tons) exceeds that in living organisms.
• They act as (i) soil stabilizers, (ii) nutrient and water reservoirs for plants, (iii) sorbents for toxic metal ions, radionuclides and organic pollutants and (iv) chemical buffers with catalytic activity.
• A commonly accepted view in the literature is that HAs are organic geo-macromolecules formed through the degradation of plant biopolymers and/or the condensation of plant and microbial degradation products.
• However, due to this broad diversity in structural building blocks and formation pathways, reliable 3-D structural models that capture the specific chemistry of HAs from a given source have yet to be achieved despite two centuries of investigations.

“Operationally Defined” Organic Compounds in Petroleum Chemistry
• Asphaltenes are operationally defined as the non-volatile fraction of petroleum that is insoluble in n-alkanes and soluble in aromatic solvents.
• The precipitation of asphaltenes can cause severe problems such as reservoir and pipeline plugging.
• The adsorption of asphaltenes at oil-water interfaces has been shown to drastically increase the stability of water-in-oil (W/O) emulsions generated during petroleum recovery by waterflooding.
• Asphaltenes also adversely impact oil refining. They can promote coke formation, deactivate catalysts and are the main components of vacuum residua.
• Most of the scientific and technological challenges associated with the production and processing of heavy oils are directly related to their high content of non-volatile and refractory compounds such as asphaltenes.

Limitations of the Conventional Approach for Modeling Humic Acids and Asphaltenes
• There are two major impediments to this conventional approach.
• First, the structure elucidation process is carried out manually. This may be prohibitively time consuming for multifunctional geomacromolecules such as humic acids and asphaltenes.
• Second and more importantly, when several isomers can be built from the same analytical data set, the conventional approach does not provide any means of selecting the “appropriate” isomer.
• Thus, reliable results may be difficult to achieve when structural models of HAs generated with the conventional approach are used in subsequent calculations of their physicochemical properties by computational chemistry.

Computer Assisted Structure Elucidation: The Signature Descriptor (Faulon, J. Chem. Inf. Comput. Sci., 1994, 34, 1204-1218)
• The signature is a systematic codification system over an alphabet of atom types, describing the extended valence (i.e., neighborhood) of the atoms of a molecule.
• For complex organic geo-macromolecules such as humic acids and asphaltenes, the signature descriptor provides a simple and robust means of coding:
– (i) elemental analysis data as 0 level atomic signatures,
– (ii) quantitative 1H/13C NMR data as 1 or 2 level atomic signatures, and
– (iii) qualitative data (e.g., molecular fragments and interfragment bonds from FT-IR spectroscopy, qualitative 1-D/2-D NMR spectroscopy, ESI mass spectrometry, etc.) as 1, 2, or higher level molecular signatures.

Computer Assisted Structure Elucidation: Signature of an Atom (Faulon, J. Chem. Inf. Comput. Sci., 1994, 34, 1204-1218)
• A molecule can be represented by the saturated atomic graph G = {V, E}, where the elements of V are the atoms and the edges of E are the bonds.
• Let v be a vertex of the atomic graph G = {V, E} and T_l(v) the spanning subtree of height l rooted on v.
• The l-signature of v is defined as s_l(v) = c{T_l(v)}.
• Thus, the subtree T_l(v) of the atomic graph G = {V, E} can be viewed as a molecular fragment centered on the atom v, reduced to a limited environment of radial distance l.

Computer Assisted Structure Elucidation: Signature of a Molecule as a Linear Combination of Its Atomic Signatures and Dreiding FF Atom Types (Faulon, J. Chem. Inf. Comput. Sci., 2003, 43(3), 707-721)

Computer Assisted Structure Elucidation: Signature of a Molecular Fragment (Faulon, J. Chem. Inf. Comput. Sci., 1994, 34, 1204-1218)

Computer Assisted Structure Elucidation: Signature of an Interfragment Bond (Faulon, J. Chem. Inf. Comput. Sci., 1994, 34, 1204-1218)

Computer Assisted Structure Elucidation: The Signature Equation (Faulon, J. Chem. Inf. Comput. Sci., 1994, 34, 1204-1218)
l-signatures of molecular fragments + l-signatures of interfragment bonds = l-signatures of the unknown structure
• s_l(S) and s_l'(S) are the l-signatures and associated standard deviations of the unknown structure.
• x_i and y_j are, respectively, the quantities of molecular fragment f_i and interfragment bond b_j.
• I and J are, respectively, the numbers of molecular fragments f_i and interfragment bonds b_j.

Hierarchical Approach for Modeling Humic Substances
[Flowchart]
1. Experimental Characterization: EA, FT-IR spectroscopy, 1-D and 2-D 1H/13C NMR spectroscopy, mass spectrometry, etc.
   Outputs: elements (types, amounts); molecular fragments (types, amounts); interfragment bonds (types, amounts)
2. Computer Assisted Structure Elucidation
   Output: 3-D models
3. Atomistic Simulations: molecular dynamics, molecular mechanics, etc.
   Outputs: structural properties (1H/13C NMR, IR spectrum, etc.); thermodynamic properties (bulk density, solubility parameter, etc.)
4. Model Selection: selection of reliable 3-D models

Guiding Principles for the Hierarchical Approach for 3-D Structural Modeling of Humic Acids
• HAs from different sources (e.g., soils, plants, sediments and streams) have different structural characteristics.
• No single structural model can be used to describe HAs from different sources.
• Given a set of reliable structural data, the hierarchical approach shown in Figure 1 can be used to generate all the 3-D models that best match the structural data for the HA of interest.
• These models can then be used in subsequent calculations of their bulk thermodynamic and structural properties (e.g., density, solubility parameter, 13C NMR spectrum, etc.) by standard and validated methods of computational chemistry.
• Only models that yield bulk thermodynamic and structural properties in agreement with the experimental data can be considered reliable 3-D structural models for the HA of interest.

MacCarthy’s First Principle of Humic Substances
• MacCarthy’s “First Principle of Humic Substances” (P. MacCarthy, in Humic Substances: Structures, Models and Functions, E.A. Ghabbour, G. Davies, Eds., Royal Society of Chemistry Special Publication 273, 2001, pp 19-30.)
• “Humic substances comprise an extraordinarily complex, amorphous mixture of highly heterogeneous, chemically reactive yet refractory molecules, produced during early diagenesis in the decay of biomatter, and formed ubiquitously in the environment via processes involving chemical reactions of species randomly chosen from a pool of diverse molecules and through random chemical alteration of precursor molecules.”

3-D Structural Modeling of Chelsea Soil Humic Acid
• Chelsea soil HA was selected as the model HA to illustrate this new methodology.
• The Chelsea HA sample was extracted from Houghton muck, a Histosol soil widely found in the Great Lakes region of the USA [Michigan, Wisconsin, Minnesota, Illinois, Indiana and Ohio].
• The selection of Houghton muck as the HA source sample was partially motivated by the availability of data on its origin and insight into the mechanisms of formation of Chelsea soil HA (USDA-NRCS Soil Survey Division).
• The native vegetation that led to the formation of Houghton muck consisted predominantly of grasses, sedges, reeds, buttonbrush and cattails.
• The poor drainage of Houghton muck, the characteristics of its native vegetation and the relatively large mean residence time of organic matter in Histosol soils (1) suggest that the condensation of plant degradation products (e.g., lignin degradation products, sugars, amino acids, etc.) was a major formation pathway for Chelsea soil HA.

Experimental Characterization of Chelsea Humic Acid
• Elemental Analysis
• Diffuse Reflectance FT-IR Spectroscopy
• 1-D 13C and 1H Solution NMR Spectroscopy
• 2-D Solution NMR Spectroscopy (TOCSY and HMQC)
• ESI Quadrupole Time-of-Flight Mass Spectrometry

Figure 3: Electrospray ionization (ESI) quadrupole time-of-flight (Q-ToF) mass spectrum for Chelsea humic acid. The spectrum exhibits the broad distribution of peaks observed in typical mass spectra of humic substances. It tails off at approximately 1200 Dalton, thereby suggesting that higher molecular weight compounds are not significant components or building blocks of Chelsea soil humic acid.

Experimental Characterization of Chelsea Humic Acid: Summary of Results
• The organic normalized weight fractions for C (51.31%), H (4.00%), O (39.67%), N (4.12%) and S (0.90%) and the O/C atomic ratio (0.58) for Chelsea soil HA are typical of soil humic acids.
• Overall, the results of the DRIFT and 1-D and 2-D solution NMR spectroscopic experiments are consistent with the hypothesis that the condensation of plant degradation products (e.g., lignin degradation products, sugars, amino acids, etc.) was a major formation pathway for Chelsea soil HA.
• The ESI Q-ToF mass spectrum of Chelsea soil HA tails off at approximately 1200 Dalton, thereby suggesting that higher molecular weight compounds are not significant components or building blocks of Chelsea soil HA.
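
As a worked check on the elemental analysis above, the following minimal Python sketch converts the reported weight fractions into atomic ratios. The function name, the rounded atomic masses and the choice to normalize to 100 carbon atoms are assumptions made here for illustration. The normalization is only inferred from the fact that the elemental entries of the SIGNATURE input (for example 93.40 for h_) are close to the resulting H count per 100 C; it is not stated in the slides.

```python
# Rounded standard atomic masses in g/mol (assumed values for this sketch).
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "O": 15.999, "N": 14.007, "S": 32.06}


def atoms_per_100_carbons(weight_percent):
    """Convert elemental weight percentages to atom counts per 100 C atoms.

    The per-100-carbon normalization is an assumption of this sketch,
    not a documented SIGNATURE convention.
    """
    # Moles of each element in 100 g of sample.
    moles = {el: wt / ATOMIC_MASS[el] for el, wt in weight_percent.items()}
    scale = 100.0 / moles["C"]
    return {el: n * scale for el, n in moles.items()}


# Organic normalized weight fractions reported for Chelsea soil HA.
chelsea = {"C": 51.31, "H": 4.00, "O": 39.67, "N": 4.12, "S": 0.90}
ratios = atoms_per_100_carbons(chelsea)
o_to_c = ratios["O"] / ratios["C"]

for element, count in ratios.items():
    print(f"{element}: {count:.2f} atoms per 100 C")
print(f"O/C atomic ratio: {o_to_c:.2f}")
```

The computed O/C ratio can be compared directly with the reported value of 0.58, and the H, N and S counts with the corresponding elemental entries of the atomic signature input.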
Computer Assisted Structural Elucidation of Chelsea Humic Acid
• In the second phase of this study, we used the stochastic generator of chemical structures (SIGNATURE) to generate all the 3-D structural models of Chelsea HA that are consistent with:
– the experimental data, and
– the hypothesized formation pathway of Chelsea HA.
• The computer assisted structure elucidation program (SIGNATURE) performs three basic tasks:
– First, it calculates an exhaustive and non-overlapping list of molecular fragments and associated interfragment bonds that best match the structural input data for the humic acid (HA) of interest.
– Second, it evaluates the total number of structural models that are consistent with the list of molecular fragments and interfragment bonds found in step 1.
– Finally, it generates all the 3-D models of the HA of interest, or a statistically representative sample of these models, by randomly connecting the “precursor molecules” and interfragment bonds found in step 1.

SIGNATURE Input Parameters for Chelsea Humic Acid: Atomic Signatures
Atom type | Signature | s_h(S) exp
C | - | -
H | h_ | 93.40
Osp3 | o_ | 43.50
Osp2 | o’ | 14.50
Nsp3 | n | 6.90
Ssp3 | s | 0.66
Aliphatic C | c_ | 17.00
Aromatic C | cp | 34.00
Methyl C | c_(h_h_h_*_) | 3.00
C amino acid | c_(n_c_h_*_) | 3.00
Anomeric sugar C | c_(o_c_o_h_) | 2.00
Carbonyl + carboxyl C | c=(o'*_*_*_) | 24.00
O substituted aromatic C | cp(cpcpo_*_) | 9.00
Methoxy aromatic C | o_(cp(cpcp*_)c_(h_h_h_)*_*_) | 6.00
CA Hexose sugar C | c_(o_(h_*_*_)c_(c_o_h_)o_(c_*_*_)h_(*_*_*_)) | 2.00
CB Hexose sugar C | c_(o_(h_*_*_)c_(c_o_h_)c_(o_o_h_)h_(*_*_*_)) | 2.00
CC Hexose sugar C | c_(o_(h_*_*_)c_(c_o_h_)c_(c_o_h_)h_(*_*_*_)) | 3.19
CE Hexose sugar C | c_(c_(o_h_h_)c_(c_o_h_)o_(c_*_*_)h_(*_*_*_)) | 2.00
CF Hexose sugar C | c_(o_(h_*_*_)c_(c_o_h_)h_(*_*_*_)h_(*_*_*_)) | 2.00

SIGNATURE Input Parameters for Chelsea Humic Acid: Molecular Fragments and Interfragment Bonds
• Lignin Derived Fragments: 1-(4-Hydroxy-3,5-dimethoxyphenyl) ethanol; 1-(3,4-Dimethoxyphenyl) ethanol; 1-(4-Hydroxyphenyl) ethanol; 3,4,5-Trimethoxy cinnamic acid; 3,4-Dimethoxy benzoic acid; 3,4-Dimethoxy cinnamic acid; 3,4-Dimethoxy cinnamyl alcohol; 4-Methoxy cinnamic acid; 4-Hydroxy benzoic acid; Apocynol; Cinnamyl alcohol; Coniferyl alcohol; Dihydroferulic acid; Dihydrocoumaric acid; Eugenol; Ferulic acid; cis-Ferulic acid; Gallic acid; Guaiacol; Guaiacyl propionic acid; Isoeugenol; Protocatechuic acid; Sinapyl alcohol; Sinapinic acid; Syringyl alcohol; Syringic acid; Syringol; Syringyl propionic acid; Vanillic acid; Veratric acid; Vinyl guaiacol; p-Anisic acid; p-Coumaric acid; p-Coumaryl alcohol
• Amino Acids: Alanine; Arginine; Asparagine; Aspartic acid; Cysteine; Glutamic acid; Glutamine; Glycine; Histidine; Isoleucine; Leucine; Lysine; Methionine; Phenylalanine; Proline; Serine; Threonine; Tryptophan; Tyrosine; Valine
• Polyphenols: 1,2,3-Trihydroxy benzoic acid; 2,3,4-Trihydroxy benzoic acid; 2,3,6-Tricarboxy phenol; 2,4-Dicarboxy phenol; 2,4-Dihydroxy benzoic acid; 3,4,5-Trihydroxy benzoic acid; 3,4-Dihydroxy benzoic acid; 3,5-Dihydroxy benzoic acid; 3-Hydroxy benzoic acid; 4-Hydroxy benzoic acid; Phenol; o-Cresol; m-Cresol; p-Cresol; Phloroglucinol; Resorcinol
• Sugars: Galacturonic acid; Gluconic acid; Mannuronic acid; Allose; Arabinose; Fucose; Galactose; Glucose; Gulose; Idose; Mannose; Rhamnose; Ribose; Xylose
• Fatty Acids: Ethanoic acid; Propanoic acid; Butanoic acid; Pentanoic acid; Hexanoic acid; Heptanoic acid; Octanoic acid; Nonanoic acid; Decanoic acid; Undecanoic acid; Dodecanoic acid; Tridecanoic acid; Tetradecanoic acid; Pentadecanoic acid; Hexadecanoic acid; Heptadecanoic acid; Octadecanoic acid; Nonadecanoic acid; Eicosanoic acid
• Bonds: Caro_Caro; Caro_H; Caro_O; Caro_N; Cali_Caro; Cali_H; Cali_O; Cali_N

SIGNATURE Output Parameters for Chelsea Humic Acid: Model Predictions for Atomic Ratios versus Analytical Input Data
Signature | s_h(S) exp - s_h(S) pred
h_ | 2.20
o_ | 5.70
o’ | 1.10
n | 4.70
s | 1.60
c_ | 9.70
cp | 19.30
c_(h_h_h_*_) | 1.40
c_(n_c_h_*_) | 0.80
c_(o_c_o_h_) | 0.20
c=(o'*_*_*_) | 8.40
cp(cpcpo_*_) | 2.10
o_(cp(cpcp*_)c_(h_h_h_)*_*_) | 0.70
c_(o_(h_*_*_)c_(c_o_h_)o_(c_*_*_)h_(*_*_*_)) | 0.20
c_(o_(h_*_*_)c_(c_o_h_)c_(o_o_h_)h_(*_*_*_)) | 0.20
c_(o_(h_*_*_)c_(c_o_h_)c_(c_o_h_)h_(*_*_*_)) | 0.90
c_(c_(o_h_h_)c_(c_o_h_)o_(c_*_*_)h_(*_*_*_)) | 0.20
c_(o_(h_*_*_)c_(c_o_h_)h_(*_*_*_)h_(*_*_*_)) | 0.20
Average atomic signature error | 3.00

Evaluation of 3-D Structural Properties from Atomistic Simulations
• In the third phase of this study, we used SIGNATURE to generate all 18 3-D structural model isomers for Chelsea soil HA by randomly connecting the optimal “precursor molecules” and corresponding interfragment bonds found during the first stage of the model building process.
• The bulk densities and solubility parameters of the Chelsea model isomers were subsequently calculated using standard and validated methods of computational chemistry (e.g., molecular mechanics and molecular dynamics simulations).

[Figure: Calculated bulk densities (A) and Hildebrand solubility parameters (B) of the SIGNATURE generated 3-D models for Chelsea soil humic acid, plotted against Chelsea soil humic acid model isomer number. Panel A axis: bulk density (g/cm3); estimated bulk density of humic substances: 1.20-1.45 g/cm3. Panel B axis: solubility parameter (J1/2/cm3/2); estimated solubility parameter of soil humic acids: 23.0-28.0 J1/2/cm3/2.]

Selected SIGNATURE Generated Structural Models for Chelsea Humic Acid
• Chelsea soil humic acid model # 4: bulk density = 1.33 g/cm3 and solubility parameter = 27.80 J1/2/cm3/2
• Chelsea soil humic acid model # 6: bulk density = 1.40 g/cm3 and solubility parameter = 25.50 J1/2/cm3/2
• Chelsea soil humic acid model # 9: bulk density = 1.43 g/cm3 and solubility parameter = 28.40 J1/2/cm3/2
• Chelsea soil humic acid model # 5: bulk density = 1.40 g/cm3 and solubility parameter = 27.80 J1/2/cm3/2
• Chelsea soil humic acid model # 8: bulk density = 1.42 g/cm3 and solubility parameter = 28.00 J1/2/cm3/2

Summary and Conclusions
• We have combined experimental characterization (elemental analysis, FT-IR spectroscopy, 1-D and 2-D 1H/13C NMR spectroscopy and electrospray ionization quadrupole time-of-flight mass spectrometry) with computer
assisted structure elucidation and atomistic simulations to generate all the 3-D structural models for Chelsea soil humic acid that are consistent with the structural data and available bulk thermodynamic properties of humic acids.
• We find that Chelsea soil humic acid can be described as a “simple” mixture of a limited number of low molar mass “molecularly heterogeneous” model isomers. The simulated 13C NMR spectrum of a mixture of these model isomers compares very well with the measured spectrum of Chelsea soil humic acid.

Potential Impacts of Methodology in Humic Substances Research
• For HAs formed predominantly through the biotic/abiotic condensation of plant degradation products (e.g., lignin degradation products, carbohydrates, amino acids, etc.), such as those found in Histosol, Mollisol and peat soils (1), a systematic application of our methodology to bulk HA samples and well resolved HA fractions from these soils is expected to result in the development of reliable 3-D structural models.
• Such models could then be used in subsequent integrated experimental and computational studies to address some key fundamental questions:
1. What are the 3-D structures of HAs in the bulk phase, in aqueous solutions and at mineral-water interfaces?
2. Do organic geo-macromolecules such as HAs, with no well defined head and tail, self-assemble into ordered micelle/membrane-like aggregates or disordered fractal-like aggregates in aqueous solutions and at mineral-water interfaces?
3. What are the “molecular” scale locations and preferred coordination environments for metal ions and organic guests bound to HA hosts in aqueous solutions, in soils and at mineral-water interfaces?
4. How strong is the binding of metal ions and organic guests to HA hosts in aqueous solutions, in soils and at mineral-water interfaces?

Acknowledgments
• Prof. Weilin Huang (Drexel University) for providing the Chelsea HA samples
• USEPA GLMA Center for Hazardous Substance Research (funding to Howard University)
• Department of Commerce (funding to Howard University and Caltech)
• National Science Foundation (funding to The Ohio State Environmental Molecular Science Institute)
• Environmental Molecular Sciences Laboratory (PNNL) for analytical support
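
Supplementary sketch: the model selection step of the hierarchical approach compares calculated bulk properties with estimated experimental ranges. The minimal Python sketch below applies such a range check to the five selected Chelsea models, using the bulk densities, solubility parameters and estimated ranges quoted in the slides. The class and function names and the `tolerance` parameter are hypothetical; the slides do not state the exact acceptance criterion, so this is an illustration and not the authors' procedure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelIsomer:
    number: int
    density: float      # bulk density, g/cm^3
    solubility: float   # solubility parameter, J^1/2 / cm^3/2


# Estimated ranges quoted in the slides.
DENSITY_RANGE = (1.20, 1.45)      # humic substances
SOLUBILITY_RANGE = (23.0, 28.0)   # soil humic acids

# Values reported for the five selected SIGNATURE-generated models.
MODELS = [
    ModelIsomer(4, 1.33, 27.80),
    ModelIsomer(6, 1.40, 25.50),
    ModelIsomer(9, 1.43, 28.40),
    ModelIsomer(5, 1.40, 27.80),
    ModelIsomer(8, 1.42, 28.00),
]


def within(value, bounds, tolerance=0.0):
    """True if value lies inside bounds, widened by an optional tolerance."""
    low, high = bounds
    return low - tolerance <= value <= high + tolerance


def screen(models, tolerance=0.0):
    """Return the numbers of models whose properties fall in both ranges.

    The tolerance (hypothetical) is applied to the solubility parameter only.
    """
    return [
        m.number
        for m in models
        if within(m.density, DENSITY_RANGE)
        and within(m.solubility, SOLUBILITY_RANGE, tolerance)
    ]


strict = screen(MODELS)
relaxed = screen(MODELS, tolerance=0.5)
print("strict ranges:", strict)
print("with 0.5 tolerance on solubility parameter:", relaxed)
```

With the strict ranges, model # 9 (28.40 J1/2/cm3/2) falls slightly above the upper solubility bound, which suggests that any practical criterion would need to allow for the uncertainty in both the calculated and the estimated values.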