Topological Index Calculator III [1]
Computation of the Hosoya Index using SMILES representation

Steven D. Granz, Departments of Mathematics & Computer Science, Gordon College, Wenham, MA 01984, sgranz@peace.gordon.edu
Irvin J. Levy, Departments of Chemistry & Computer Science, Gordon College, ijl@gordon.edu

http://www.math-cs.gordon.edu/courses/organic/topo/

Abstract:
Previously we introduced Topological Index Calculator as a freely available JavaScript application for the computation of QSPR descriptor indices for alkane molecules. The previous program supported the computation of the following indices: Balaban Index, Wiener Index, Randic Index, Odd-Even Index, Polarity Index, Vertex Degree Distance Index and Harary Index. We now report the addition of the Hosoya Index for acyclic alkanes. Further, we discuss the algorithm we have developed to compute the Hosoya Index directly from the SMILES representation of the molecule and how to compute the Hosoya Index for cyclic alkanes.

Hosoya Index:
The Hosoya Index [2], Z = Z(G), was introduced by Hosoya in 1971 as the Z index. This index is defined below:

    Z = \sum_{i=0}^{\lfloor N/2 \rfloor} p(G; i)

where p(G;i) is the number of selections of i mutually non-adjacent edges in G, a chemical graph with N vertices. By definition, p(G;0) = 1, and p(G;1) is the number of edges in G.

Example: The Hosoya Index of 2,3-dimethylpentane
    p(G;0) = 1
    p(G;1) = 6
    p(G;2) = 8
    p(G;3) = 2
    Z(G) = p(G;0) + p(G;1) + p(G;2) + p(G;3) = 1 + 6 + 8 + 2 = 17

Hosoya Index Algorithm Using SMILES Representation:
Computing the value of the Hosoya Index by hand can be tedious. As the number of bonds in the molecule increases, the probability of human error becomes greater. Without a computational tool such as Topological Index Calculator, human error is very likely to occur. In the process of extending Topological Index Calculator, we developed an algorithm that uses SMILES representation to compute the Hosoya Index.

SMILES Representation Summary:
SMILES [3] (Simplified Molecular Input Line Entry System) is a simple yet comprehensive chemical nomenclature. SMILES is used for expressing the molecular graph of "normal" organic molecules.

Two Basic Rules of SMILES:
• Carbons are represented by the atomic symbol: C
• Branching is indicated by parentheses

Hosoya Index Algorithm For Acyclic Alkanes:
Given G is the SMILES representation of an acyclic alkane, Z(G) is computed as follows:

    Find the number of carbons in G
    If the number of carbons = 1
        Z(G) = 1
    Else if the number of carbons = 2
        Z(G) = 2
    Else if the number of carbons = 3
        Z(G) = 3
    Else
        Find the number of parent carbons in G
        Let N = (the number of parent carbons) / 2
        If the number of parent carbons is odd, truncate N to an integer
        Add one to N
        Let the atom, B, be the Nth parent carbon atom in G's SMILES representation
        Z(G) = Z(subgraph, left of B) * Z(subgraph, from B to end) + PI Z(fragments)

where PI Z(fragments) is equal to the product of Z for each fragment created by removing B and the parent carbon to the left of B.

Acyclic Alkane Example: 2,3-dimethylpentane
SMILES Representation: G = CC(C)C(C)CC

Z(G)
    Number of parent carbons: 5
    N = 5 / 2 = 2.5
    N = 2
    N = 2 + 1 = 3
    CC(C) | C(C)CC
    Z(G) = Z(CC(C)) * Z(C(C)CC) + PI Z(fragments)

Z(CC(C))
    Number of parent carbons: 2
    N = 2 / 2 = 1
    N = 1 + 1 = 2
    C | C(C)
    Z(CC(C)) = Z(C) * Z(C(C)) + PI Z(fragments)
    PI Z(fragments) = Z((C))
    Z(CC(C)) = Z(C) * Z(C(C)) + Z((C)) = 1 * 2 + 1 = 3

Z(C(C)CC)
    Number of parent carbons: 3
    N = 3 / 2 = 1.5
    N = 1
    N = 1 + 1 = 2
    C(C) | CC
    Z(C(C)CC) = Z(C(C)) * Z(CC) + PI Z(fragments)
    PI Z(fragments) = Z((C)) * Z(C)
    Z(C(C)CC) = Z(C(C)) * Z(CC) + Z((C)) * Z(C) = 2 * 2 + 1 * 1 = 5

Z(G)
    PI Z(fragments) = Z((C)) * Z((C)) * Z(C) * Z(CC)
    Z(G) = Z(CC(C)) * Z(C(C)CC) + Z((C)) * Z((C)) * Z(C) * Z(CC) = 3 * 5 + 1 * 1 * 1 * 2 = 17

Hosoya Index Algorithm for Cyclic Alkanes:
Given G is the graph representation of a cyclic alkane, Z(G) can be computed by finding the Hosoya Index of two other alkanes for which the sum of their individual Z values will equal Z(G). Preferably, we wish to find two alkanes that are acyclic, but if that is not the case then we can again find the Hosoya Index of two other alkanes, recursively continuing until we have a sum of the Hosoya Index of all acyclic alkanes, yielding the Hosoya Index for the original cyclic structure.

An algorithm to do this is the following:
1. Let H be a subgraph defined by removal of one random edge, E, from a ring in G
2. Let I be a subgraph defined by removal of E from G as well as by the removal of all edges adjacent to E in G
3. If H or I is a disconnected graph then calculate Z(H) and/or Z(I) as the product of Z for each of the connected components of the disconnected graph(s)
4. Z(G) = Z(H) + Z(I)

Cyclic Alkane Example: 1,2-dimethylcyclopropane
    Z(G) = Z(H) + Z(I)
    Z(G) = Z(CC(C)CC) + Z(CC) * Z(C) * Z(C) * Z(C)

    Z(CC(C)CC) = Z(CC(C)) * Z(CC) + PI Z(fragments)
    PI Z(fragments) = Z(C) * Z((C)) * Z(C)
    Z(CC(C)CC) = Z(CC(C)) * Z(CC) + Z(C) * Z((C)) * Z(C) = 3 * 2 + 1 * 1 * 1 = 7

    Z(G) = Z(CC(C)CC) + Z(CC) * Z(C) * Z(C) * Z(C) = 7 + 2 * 1 * 1 * 1 = 9

Hosoya Index Regression:
[Figure: Boiling Point vs Hosoya Index. Axes: Boiling Point (K) against Hosoya Index. Fitted curve: y = 63.432Ln(x) + 169.01]
[Figure: Experimental vs Computed Boiling Point with Hosoya Index Equation. Axes: Expected BP (K) against Calculated BP (K). Average Percent Error = 2.48%]

Future Directions:
• Students use the tool to verify values found in the literature, since at least two journal articles [4] [5] have been found to have errors.
• Incorporate existing indices into Topological Index Calculator
• Develop new indices with better predictions of the boiling point

References:
[1] Granz, S. D.; Levy, I. J. "Topological Index Calculator II: Applications for the classroom and research." 228th ACS national meeting, Philadelphia, PA, 2004.
[2] Mihalic, Z. "A Graph-Theoretical Approach to Structure-Property Relationships." J. Chem. Educ. 1992, 69, 701-712.
[3] Weininger, D. "SMILES, a Chemical Language and Information System." J. Chem. Inf. Comput. Sci. 1988, 28, 31-36.
[4] Cao, C. "Topological Indices Based on Vertex, Distance and Ring: On Boiling Points of Paraffins and Cycloalkanes." J. Chem. Inf. Comput. Sci. 2001, 41, 867-877.
[5] Mihalic, Z., et al. "The Detour Matrix And The Detour Index." Topological Indices and Related Descriptors in QSAR and QSPR. Ed. J. Devillers and A. T. Balaban. Gordon and Breach Science Publishers, 1999, pp. 297-299.
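Appendix (illustrative sketch):
The two procedures described on this poster, the SMILES split at the Nth parent carbon for acyclic alkanes and the edge-removal rule Z(G) = Z(H) + Z(I) for cyclic alkanes, can be cross-checked against each other with a short program. The Python below is a minimal sketch, not the JavaScript used in Topological Index Calculator. The function names (parse_smiles, z_graph, split_parents, render, z_smiles) are hypothetical, ring-closure digits are an assumption beyond the two SMILES rules listed here, z_graph applies the edge-removal rule to an arbitrary edge rather than a ring edge, and z_smiles assumes every chain of four or more carbons is written with at least two parent carbons.

```python
from functools import lru_cache
from typing import FrozenSet, List, Tuple

Edge = Tuple[int, int]


def parse_smiles(smiles: str) -> FrozenSet[Edge]:
    """Turn a carbon-only SMILES string into a set of C-C edges.

    Handles 'C', parentheses, and single-digit ring closures (the digits are
    an assumption; only the first two rules appear on the poster).
    """
    edges, stack, ring_open = set(), [], {}
    prev, count = -1, 0  # prev: index of the atom the next carbon bonds to
    for ch in smiles:
        if ch == "C":
            if prev >= 0:
                edges.add((prev, count))
            prev, count = count, count + 1
        elif ch == "(":
            stack.append(prev)  # remember the branch point
        elif ch == ")":
            prev = stack.pop()  # return to the branch point
        elif ch.isdigit():
            if ch in ring_open:
                edges.add((ring_open.pop(ch), prev))  # close the ring
            else:
                ring_open[ch] = prev
        else:
            raise ValueError(f"unsupported SMILES character: {ch!r}")
    return frozenset(edges)


@lru_cache(maxsize=None)
def z_graph(edges: FrozenSet[Edge]) -> int:
    """Z(G) = Z(H) + Z(I): H drops edge E, I also drops every edge adjacent to E."""
    if not edges:
        return 1  # only isolated atoms remain, so only the empty selection
    e = min(edges)
    h = edges - {e}
    i = frozenset(x for x in h if not set(x) & set(e))
    return z_graph(h) + z_graph(i)


def split_parents(smiles: str) -> List[List[str]]:
    """Return one entry per parent carbon: the contents of its branches."""
    parents: List[List[str]] = []
    depth, start = 0, 0
    for pos, ch in enumerate(smiles):
        if ch == "(":
            if depth == 0:
                start = pos + 1
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                parents[-1].append(smiles[start:pos])
        elif ch == "C" and depth == 0:
            parents.append([])
    return parents


def render(parents: List[List[str]]) -> str:
    """Rebuild a SMILES string from parent carbons and their branches."""
    return "".join("C" + "".join(f"({b})" for b in bs) for bs in parents)


def z_smiles(smiles: str) -> int:
    """Acyclic-alkane procedure: split at the Nth parent carbon, B."""
    if any(ch.isdigit() for ch in smiles):
        raise ValueError("z_smiles handles acyclic SMILES only")
    carbons = smiles.count("C")
    if carbons <= 3:
        return max(carbons, 1)  # Z = 1, 2, 3; an empty fragment counts as 1
    parents = split_parents(smiles)
    if len(parents) < 2:
        raise ValueError("sketch assumes at least two parent carbons")
    n = len(parents) // 2 + 1  # truncate, then add one
    left, right = render(parents[:n - 1]), render(parents[n - 1:])
    # Fragments left after removing B and the parent carbon to its left.
    fragments = [render(parents[:n - 2]), render(parents[n:])]
    fragments += parents[n - 2] + parents[n - 1]
    product = 1
    for fragment in fragments:
        product *= z_smiles(fragment)
    return z_smiles(left) * z_smiles(right) + product


print(z_smiles("CC(C)C(C)CC"))               # 17 (2,3-dimethylpentane)
print(z_graph(parse_smiles("CC(C)C(C)CC")))  # 17
print(z_graph(parse_smiles("CC1CC1C")))      # 9 (1,2-dimethylcyclopropane)
```

Both routes reproduce the values worked by hand above. z_graph never splits a disconnected graph into components as step 3 does; working on the edge set alone yields the same product, at the cost of extra recursion.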