Petroleum Hydrocarbon Fingerprinting - Numerical Interpretation Developments

John W. Wigger, P.E., Environmental Liability Management, Inc., Tulsa, Oklahoma
Bruce E. Torkelson, Torkelson Geochemistry, Inc., Tulsa, Oklahoma

ABSTRACT

A feasibility study was conducted to assess the use of mathematical algorithms as aids for interpreting hydrocarbon fingerprint data. The first algorithm applies a correlation routine to determine the degree of similarity among different hydrocarbon samples. The second algorithm models the evaporation portion of the weathering process in gasoline. Controlled evaporation experiments on different grades of gasoline were used to create a matrix of numerical multipliers that describes individual component volatilization. This matrix can be used to: 1) estimate the composition of gasoline after a release, and inversely 2) estimate the original composition of a gasoline obtained subsequent to a release (for example, from a monitoring well). Algorithms like these will probably never fully replace the visual process that is now used to interpret hydrocarbon fingerprints; however, they have the potential to add a more objective and quantitative perspective to the process.

INTRODUCTION

Hydrocarbon fuels and derivative products discovered in soils and groundwater at environmental release sites are often characterized by a laboratory technique known as capillary column gas chromatography (also referred to as hydrocarbon fingerprinting, gas chromatography, or GC fingerprinting). GC fingerprinting is an extremely useful tool in the investigation of subsurface contamination of soil and groundwater (Bruce and Schmidt). GC fingerprints are used to obtain information from liquid hydrocarbon samples (free product) by determining the composition of the hydrocarbons present.
The identification and interpretation of GC fingerprints, however, is largely a qualitative practice and depends upon the skill and experience of the individual(s) involved.

BACKGROUND

Petroleum Hydrocarbon Chemistry

Petroleum hydrocarbons consist of a very large number of compounds that, by definition, are found in crude oil, as well as other sources of petroleum such as natural gas, coal, and peat. Petroleum hydrocarbons consist of three major groups of compounds: alkanes (paraffins), alkenes (olefins), and aromatics.

Paraffins are one of the major constituents of crude oil and are found in refined petroleum products such as gasoline, kerosene, diesel fuel, and heating oil. There are three major classes of paraffins: linear alkanes, branched alkanes, and naphthenes. The linear alkanes have carbon atoms arranged in a line, and there are only two ends to these molecules. Linear alkanes are also referred to in the literature as n-alkanes. Branched alkanes have the carbon atoms arranged similarly to the n-alkanes; however, some of the carbon atoms are branched, thus creating many differing configurations. Naphthenes are molecules in which the carbon atoms are arranged in one or more rings.

Olefins are formed during the refining process of creating petroleum products from crude oil. These molecules have a double bond and two fewer hydrogen atoms than their corresponding alkane.

Aromatics contain one or more six-carbon rings, each conventionally drawn with three alternating double bonds. Examples of one-ring (mononuclear) aromatics are benzene, toluene, ethylbenzene, and the xylenes (BTEX). Multiple-ring (polynuclear) aromatics are aromatic compounds whose molecules contain more than one six-carbon ring. Examples of these are naphthalene, anthracene, pyrene, and many more.

Hydrocarbon products such as gasoline, diesel fuel, and asphalts are all created from crude oil by a variety of refining and distillation processes.
Each product is produced by combining multiple individual hydrocarbon compounds, all of which have slightly different vaporization and boiling temperatures. For example, gasoline is a combination of many lower boiling range compounds, including C4 to C12 alkanes, C4 to C7 alkenes, and the BTEX aromatics. The middle boiling range compounds are used in differing proportions to create products such as kerosene, diesel, and heating oil. These products predominantly contain C10 to C24 alkanes and polynuclear aromatics, with little to no olefins (Zemo, Graff, and Bruya).

Hydrocarbon Fingerprinting

GC fingerprints are created by injecting a small portion of the sample into a gas chromatograph. Once injected, the product is heated, vaporized, and carried into a capillary column by a flow of inert gas. After injection, the temperature of the column is slowly raised. As the temperature increases, the compounds begin to move through the column; in general, the more volatile and lower boiling compounds start moving first. A flame ionization detector connected to the end of the column detects the components of the product as they elute from the column. The time that it takes for individual components to pass through the column depends on the temperature, the length of the column, the column characteristics, and the character of the compound itself.

Five GC fingerprints of various hydrocarbon products (gasoline, kerosene, diesel, JP-8 jet fuel, and crude oil) are presented in Figures 1 through 5. A few of the peaks have been labeled to identify some of the compounds in each of the products.

Figure 1. Gasoline I (Regular Unleaded Gasoline)
Figure 2. Kerosene I
Figure 3. Diesel Fuel
Figure 4. JP-8 Jet Fuel
Figure 5. Crude Oil

MATERIALS & METHODS

Gas Chromatography

Hydrocarbon samples were analyzed on a Hewlett Packard 5890 instrument equipped with a split/splitless injector, a J&W 30 meter DB-1 column, and a flame ionization detector (FID). All gas flow rates were set to manufacturer specifications.
Injections were made in split mode with a split ratio of 1:100. The column oven was programmed from -10 °C to 350 °C at 10 °C/minute with a 4 minute hold at 350 °C. The injector temperature was set at 350 °C and the detector temperature at 360 °C. Data were acquired and processed with an EZChrom chromatography data system.

Gasoline Weathering Simulation

One of the weathering processes that can affect released gasoline is evaporation. To simulate evaporation under controlled conditions, three different grades of gasoline were obtained from a local retailer and allowed to evaporate to controlled volumes. The gasoline components with the lowest boiling points tend to volatilize more rapidly than the components with higher boiling points. On the GC fingerprint, the components that have the shortest retention times (left side of the GC fingerprint) are the most volatile and will tend to decrease in peak intensity preferentially as more volatilization takes place. This is clearly illustrated in Figure 6, where GC fingerprints from the same gasoline are shown at differing levels of volatilization.

Figure 6. Controlled Evaporation Of Regular Grade Of Unleaded Gasoline (Note: Chromatograms Have Been Normalized To Make The Heights Of Naphthalene Peaks Equal)

Evaporation Procedures

The evaporation procedure that was used is described below. Samples of three grades of gasoline (87, 89, and 93 octane) were acquired at a local service station. Equal volumes of each grade of gasoline were placed in four (4) 40 milliliter vials, making a total of 12 vials. Three (3) vials of each grade (a total of 9) were uncovered and allowed to volatilize at room temperature. The uncovered vials were closely monitored and capped when the gasoline was reduced to the desired volume, resulting in one vial each at 75%, 50%, and 25% of original volume for each of the three grades of gasoline.
Algorithms Described

Correlation Coefficient

The correlation coefficient, denoted by $r$, measures the relationship between two data sets, scaled to be independent of the unit of measure, and is given by the formula:

$$ r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \; \sum_{i=1}^{n} (y_i - \bar{y})^2}} $$

where $x_i$ and $y_i$ are values in each corresponding data set, and $\bar{x}$ and $\bar{y}$ are the means of the two data sets.

The value of the correlation coefficient is always between -1 and +1. A value of $r$ equal to -1 indicates a perfect linear relationship between the sample values of x and y, with the value of y decreasing as the value of x increases. A value of $r$ equal to +1 also indicates a perfect linear relationship between the sample values, but one in which the value of y increases as x increases: larger values of y are associated with larger values of x, and smaller values of y are associated with smaller values of x. If there is no linear relationship between the sample values of x and y, then $r$ will have a value near or equal to zero (Hayslett).

The correlation coefficient determines whether two data sets move together; that is, whether large values of one set are associated with large values of the other (positive correlation), whether small values of one set are associated with large values of the other (negative correlation), or whether the values in the two sets are unrelated.

In this study, 71 hydrocarbon chromatogram peaks, each representing a different hydrocarbon compound, were used in the analysis. Table 1 presents a list of the compounds. Integrated peak areas were measured and then tabulated for each of the five hydrocarbon samples in Figures 1 through 5. The integrated peak area, measured in millivolt seconds, is a measure of the intensity of the response of the flame ionization detector to each individual compound. The library samples include gasoline, kerosene, diesel, JP-8 jet fuel, and crude oil. Once the peak area data were collected and tabulated for these hydrocarbon samples, three additional hydrocarbon samples were also collected.
The first sample had been identified as a kerosene by the provider; however, the GC fingerprint clearly showed a much broader range of hydrocarbons than the kerosene run earlier. The second sample was a laboratory standard mixture of gasoline and diesel fuel. The third sample was a regular grade of unleaded gasoline from a different service station. Figures 7, 8, and 9 present the GC fingerprints of these three respective samples.

Table 1. 71 Hydrocarbon Compounds Used For Analysis

 1  iC4 = Isobutane
 2  nC4 = Butane
 3  iC5 = Isopentane
 4  nC5 = Pentane
 5  2 M Pent = 2 Methylpentane
 6  3 M Pent = 3 Methylpentane
 7  nC6 = Hexane
 8  C6 Olefin = C6 Olefin
 9  M Cycl Pent = Methyl cyclopentane
10  2,4 DMP = 2,4 Dimethylpentane
11  Bnz = Benzene
12  Cyclo Hexane = Cyclohexane
13  2 M Hex = 2 Methylhexane
14  3 M Hex = 3 Methylhexane
15  Isooctane = Isooctane or 2,2,4 Trimethylpentane
16  nC7 = Heptane
17  MCHX = Methylcyclohexane
18  Tol = Toluene
19  nC8 = Octane
20  EB = Ethylbenzene
21  m/p-xyl = m/p Xylene
22  o-xyl = o Xylene
23  nC9 = Nonane
24  propylbenzene = n Propylbenzene
25  1M 3E benz = 1 Methyl 3 ethylbenzene
26  1M 4E benz = 1 Methyl 4 ethylbenzene
27  1,3,5 T M Benz = 1,3,5 Trimethylbenzene
28  3,3,4 T M Hept = 3,3,4 Trimethylheptane
29  1,2,4 T M Benz = 1,2,4 Trimethylbenzene
30  nC10 = Decane
31  1,2,3 T M Benz = 1,2,3 Trimethylbenzene
32  nC11 / 1,2,4,5 TeMB = Undecane and 1,2,4,5 Tetramethylbenzene
33  Naph = Naphthalene
34  nC12 = Dodecane
35  IP13 = C13 Isoprenoid
36  2 M naph = 2 Methylnaphthalene
37  IP14 = C14 Isoprenoid
38  1 M naph = 1 Methylnaphthalene
39  nC13 = Tridecane
40  IP15 = Farnesane
41  nC14 = Tetradecane
42  IP16 = C16 Isoprenoid
43  nC15 = Pentadecane
44  nC16 = Hexadecane
45  IP18 = Norpristane
46  nC17 = Heptadecane
47  Pristane = Pristane
48  nC18 = Octadecane
49  Phytane = Phytane
50  nC19 = Nonadecane
51  nC20 = Eicosane
52  nC21 = Heneicosane
53  nC22 = Docosane
54  nC23 = Tricosane
55  nC24 = Tetracosane
56  nC25 = Pentacosane
57  nC26 = Hexacosane
58  nC27 = Heptacosane
59  nC28 = Octacosane
60  nC29 = Nonacosane
61  nC30 = Triacontane
62  nC31 = Hentriacontane
63  nC32 = Dotriacontane
64  nC33 = Tritriacontane
65  nC34 = Tetratriacontane
66  nC35 = Pentatriacontane
67  nC36 = Hexatriacontane
68  nC37 = Heptatriacontane
69  nC38 = Octatriacontane
70  nC39 = Nonatriacontane
71  nC40 = Tetracontane

Figure 7. Kerosene II
Figure 8. Gasoline Diesel Laboratory Mixture
Figure 9. Gasoline II (Regular Unleaded Gasoline)

Once the peak areas for all 71 individual components were established for each of the samples, the correlation coefficient was calculated between the samples presented in Figures 1 through 5 and those presented in Figures 7, 8, and 9. Table 2 presents the results of these calculations.

Table 2. Results of Correlation Coefficient Determinations

                                 Gasoline I   Kerosene I   Diesel       JP-8 Jet Fuel   Crude Oil
                                 (Figure 1)   (Figure 2)   (Figure 3)   (Figure 4)      (Figure 5)
Kerosene II (Figure 7)             -0.156        0.732        0.882        0.932           0.379
Gas/Diesel Mixture (Figure 8)       0.638        0.333        0.528        0.505           0.440
Gasoline II (Figure 9)              0.894       -0.112       -0.213       -0.065           0.134

To evaluate the reproducibility of this process, the regular unleaded gasoline presented in Figure 9 was run on the GC five separate times, creating five GC fingerprints and five slightly differing numerical data sets. The data collected for all 71 compounds in each run were then compared with those of every other run, creating a total of twenty (20) correlation coefficient comparisons.

Gasoline Weathering

An algorithm was developed to model the volatilization process of gasoline released into the environment. This was accomplished by using experimental data obtained from the controlled evaporation of the three different grades of gasoline described earlier. GC fingerprint data were used to create a numerical function that describes the volatilization process. This numerical function can then be applied to fresh gasoline samples to predict what the product GC fingerprint would look like if weathered in an environmental release.
As discussed earlier, the gasoline components with the lowest boiling points tend to volatilize more rapidly than the rest of the components. The components with the highest boiling points (components at the right hand side of the GC fingerprints with retention times > 10 minutes) experience little volatilization under the weathering conditions described above. With this in mind, it was assumed that the actual volume of naphthalene stayed relatively constant during the weathering simulation, so that naphthalene could be used much like an internal standard. On this basis, the GC data from each stage of the weathering process were normalized to the naphthalene peak.

Once each GC fingerprint was normalized to naphthalene, each component was evaluated as the total volume of product decreased. Once this process was completed for all components, a matrix of volatilization multipliers was created. This matrix consists of a table of factors ranging from 0.0 to 1.0 describing the fraction of each of the 71 components remaining at differing stages of evaporation of the total product.

To demonstrate how the matrix was created, Table 3 presents the integrated peak areas for the first 8 of the 71 components from the premium grade unleaded gasoline used in the experiment. Table 4 presents the same component integrated peak areas after they have been normalized to naphthalene. Table 5 presents each component's integrated peak area from Table 4 divided by its value in the fresh (0% volatilized) sample, which scales the values from 0 to 1. Table 5 represents the matrix of multipliers. Two other similar tables were also produced for the mid-grade and regular grades of gasoline. The entire matrix of multipliers for each of the grades of gasoline is not presented because of the size of the tables.

Table 3. Peak Areas For Components At Differing % Volatilization (Premium Grade Gasoline)

Sample Id          iC4      nC4      iC5      nC5      2 M Pent   3 M Pent   nC6      C6 Olefin
(Gasoline-% Vol.)  (all values are integrated peak areas)
Prem-0             5139     53846    129407   21998    34183      18112      14922    2865
Prem-25            0        737      23654    6196     20421      11421      10368    1940
Prem-50            0        0        0        0        1666       2305       2894     547
Prem-75            0        0        0        0        0          0          0        0

Table 4. Peak Areas For Components At Differing % Volatilization After Normalizing With Naphthalene (Premium Grade Gasoline)

Sample Id          iC4      nC4      iC5      nC5      2 M Pent   3 M Pent   nC6      C6 Olefin
(Gasoline-% Vol.)  (all values are normalized integrated peak areas)
Prem-0             5139     53846    129407   21998    34183      18112      14922    2865
Prem-25            0        531      17047    4465     14717      8231       7472     1398
Prem-50            0        0        0        0        1165       1612       2024     383
Prem-75            0        0        0        0        0          0          0        0

Table 5. Matrix Of Multipliers For Individual Components Of Gasoline Under Differing Percentages Of Overall Product Volatilization (Premium Grade Gasoline)

Sample Id          iC4      nC4      iC5      nC5      2 M Pent   3 M Pent   nC6      C6 Olefin
(Gasoline-% Vol.)  (all values are multipliers relative to Prem-0)
Prem-0             1        1        1        1        1          1          1        1
Prem-25            0        0.01     0.132    0.203    0.4305     0.4544     0.501    0.488
Prem-50            0        0        0        0        0.0341     0.089      0.136    0.1336
Prem-75            0        0        0        0        0          0          0        0

RESULTS & DISCUSSION

Reproducibility

The reproducibility of the gas chromatography analysis technique was evaluated by analyzing the gasoline sample presented in Figure 9 a total of 5 times. The 20 correlation coefficients calculated between each of the 5 analyses and the other four had a minimum of 0.99545, a maximum of 0.999989, an average of 0.997921, and a standard deviation of 0.001704. This evaluation suggests that truly alike GC chromatograms will probably have correlation coefficients of 0.99 or better. Other product types may have different reproducibilities, since their data may include different peaks that come from a different part of the GC fingerprint.
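The reproducibility check described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the routine used in the study: the function names `pearson_r` and `replicate_summary` are hypothetical, and the replicate peak areas are made-up values for six components rather than the 71-component data from the paper. Each run is correlated with every other run as an ordered pair, which gives 5 x 4 = 20 comparisons for five runs.

```python
import itertools
import math
import statistics


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant data set")
    return sxy / math.sqrt(sxx * syy)


def replicate_summary(runs):
    """Correlate each run with every other run (ordered pairs) and summarize."""
    coeffs = [pearson_r(a, b) for a, b in itertools.permutations(runs, 2)]
    return {
        "n": len(coeffs),
        "min": min(coeffs),
        "max": max(coeffs),
        "mean": statistics.mean(coeffs),
        "stdev": statistics.stdev(coeffs),
    }


# Hypothetical peak areas (mV*s) for 6 components in 5 replicate injections.
replicate_runs = [
    [120, 340, 560, 210, 90, 45],
    [122, 338, 565, 208, 91, 44],
    [119, 343, 557, 212, 89, 46],
    [121, 341, 562, 209, 92, 45],
    [118, 337, 559, 211, 90, 47],
]

summary = replicate_summary(replicate_runs)
for key, value in summary.items():
    print(key, value)
```

The same `pearson_r` function could be applied between a library sample and an unknown sample to produce entries of the kind shown in Table 2, provided both peak-area lists follow the component order of Table 1.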
Correlation

Table 2 presents the correlation coefficients calculated when comparing the samples in Figures 1 through 5 to those in Figures 7 through 9. Prior to calculating the correlation coefficients, it was expected that similar products, for example gasolines, would show higher correlations among themselves and less correlation when compared to other product types. Neither the exact numbers nor how the correlation coefficients would vary between similar and different product types could be anticipated, however. Significant, logical, and reproducible differences and similarities in the correlation coefficients are crucial for this process to be a useful tool. The correlation coefficient results must also make sense and compare favorably with visual inspection of the GC fingerprints.

From this feasibility study, it appears that there are meaningful similarities and differences in correlation coefficients calculated using GC fingerprint data. This study suggests that similar product types such as gasolines could be expected to have correlation coefficients of about 0.9 or better. Dissimilar product types have a much lower correlation coefficient of perhaps 0.6 or 0.5 or even less. The correlation coefficients shown here also make sense when compared to the visual evaluation of the GC fingerprints.

An unexpected and interesting result of the correlations was that the JP-8 jet fuel and the Kerosene II sample had a high correlation coefficient (0.932). At first this seemed unusual, but it must be remembered that JP-8 and kerosene often come from the same distillation range of the crude oil. Visual comparison of the two chromatograms confirms the rather high degree of similarity of the two products.

Gasoline Weathering

The matrix of multipliers created for the evaporation sequences for the three different grades of gasoline numerically models how the volatile components tend to evaporate from the sample.
The matrix of multipliers can be used to numerically alter ("evaporate") the data from a fresh sample in an attempt to estimate the composition of the sample after a weathering process. Once the sample has been artificially altered, it can then be numerically compared to other controlled weathered samples.

This weathering algorithm can also be used in the inverse. For example, if one had a hydrocarbon sample from a site but did not know the extent of weathering that had already taken place, the sample could be correlated with the library of samples of known weathered gasolines. Once the library sample with the highest correlation has been determined, its matrix of multipliers could be used to reconstruct the composition of the original gasoline sample when fresh. The multipliers needed to estimate the original gasoline composition would be constructed by simply taking the inverse of the multipliers for the individual compounds in the matrix that most closely correlated with the weathered sample. (For example, if benzene's multiplier is 0.25, then to reconstruct the original amount of benzene, one would multiply the peak area by 1/0.25 = 4.0.)

Future Work

The information and techniques presented in this paper represent some beginning examples of the types of analysis that are possible with GC fingerprint numerical data. Listed below are a number of additional numerical techniques and experiments that are under consideration for future work:

1. Investigate algorithms that numerically weight key compounds.
2. Develop algorithms that use the presence of unique compounds to indicate specific characteristics. For example, olefins indicate the presence of catalytically cracked hydrocarbons.
3. Develop algorithms that further refine gasoline compound recognition.
4. Investigate additional controlled weathering experiments on other types of hydrocarbon products.
5. Investigate other weathering processes such as water washing, biodegradation, volatilization, and chemical speciation.
6. Expand the GC fingerprint library to include other petroleum products and hydrocarbon samples from the extraction of both soils and groundwater.
7. Investigate correlation experiments using isolated ranges of hydrocarbons, thereby allowing mixtures of products (say gasoline and diesel fuel) to be evaluated separately and then numerically added together.
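As an illustration of the multiplier matrix discussed under Gasoline Weathering, the following Python sketch rebuilds the Table 5 multipliers from the naphthalene-normalized peak areas in Table 4, applies them to a fresh sample, and then inverts them. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `multiplier_matrix`, `weather`, and `unweather` are hypothetical, the stage keys assume that "Prem-25" means 25% of the product has volatilized, and returning `None` for a component whose multiplier is zero is an assumption made here because such a component has been lost completely and its inverse is undefined.

```python
COMPONENTS = ["iC4", "nC4", "iC5", "nC5", "2 M Pent", "3 M Pent", "nC6", "C6 Olefin"]

# Naphthalene-normalized peak areas from Table 4 (premium grade gasoline),
# keyed by percent of the total product volatilized (assumed meaning of Prem-N).
NORMALIZED = {
    0: [5139, 53846, 129407, 21998, 34183, 18112, 14922, 2865],
    25: [0, 531, 17047, 4465, 14717, 8231, 7472, 1398],
    50: [0, 0, 0, 0, 1165, 1612, 2024, 383],
    75: [0, 0, 0, 0, 0, 0, 0, 0],
}


def multiplier_matrix(normalized):
    """Divide each stage's normalized areas by the fresh (0%) areas."""
    fresh = normalized[0]
    return {
        stage: [a / f if f > 0 else 0.0 for a, f in zip(areas, fresh)]
        for stage, areas in normalized.items()
    }


def weather(fresh_areas, multipliers):
    """Estimate weathered peak areas from a fresh sample."""
    return [a * m for a, m in zip(fresh_areas, multipliers)]


def unweather(weathered_areas, multipliers):
    """Estimate fresh peak areas from a weathered sample.

    A zero multiplier cannot be inverted, so None is returned for that
    component (an assumption of this sketch, not a rule from the paper).
    """
    return [a / m if m > 0 else None for a, m in zip(weathered_areas, multipliers)]


multipliers = multiplier_matrix(NORMALIZED)
for name, m in zip(COMPONENTS, multipliers[25]):
    print(name, round(m, 4))

estimated_25 = weather(NORMALIZED[0], multipliers[25])
reconstructed = unweather(estimated_25, multipliers[25])
print(reconstructed)
```

In practice the multipliers would be applied to a different gasoline than the one used to derive them, so the reconstruction would be an estimate rather than the exact round trip shown here.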