MULTIVARIATE STATISTICAL PATTERN RECOGNITION OF CURIE-POINT PYROLYSIS-GAS CHROMATOGRAPHIC FINGERPRINTS

MULTIVARIATE STATISTICAL PATTERN RECOGNITION OF CURIE-POINT PYROLYSIS-GAS CHROMATOGRAPHIC FINGERPRINTS FROM RANGELAND SHRUBS

D. N. Stevenson, R. V. Valcarce, G. G. Smith, B. A. Haws, E. D. McArthur, B. L. Welch, and H. C. Stutz

Paper presented at the Symposium on Cheatgrass Invasion, Shrub Die-Off, and Other Aspects of Shrub Biology and Management, Las Vegas, NV, April 5-7, 1989. D. N. Stevenson, R. V. Valcarce, and G. G. Smith are students and Professor Emeritus, Department of Chemistry and Biochemistry, Utah State University, Logan, UT 84322-0300; B. A. Haws is Professor Emeritus, Department of Biology, Utah State University, Logan, UT 84322-5305; E. D. McArthur and B. L. Welch are Project Leader and Research Plant Physiologist, Intermountain Research Station, Forest Service, U.S. Department of Agriculture, Shrub Sciences Laboratory, Provo, UT 84606; H. C. Stutz is Professor Emeritus, Department of Botany and Range Science, Brigham Young University, Provo, UT 84602.

ABSTRACT

The application of multivariate statistics to chemistry (chemometrics), using pattern recognition (PR) techniques, is shown to be a rapid and efficient method for the analysis of complex pyrolysis-gas chromatographic (Py-GC) data obtained from biomaterials. Results of two studies using various multivariate pattern recognition programs are presented. In one study, pyrograms obtained from accessions of big sagebrush (Artemisia tridentata) were correlated with differential palatability of the sagebrush to sheep. In the other study, Py-GC-PR was used to differentiate levels of ploidy in shadscale (Atriplex confertifolia).

INTRODUCTION

Pyrolysis-gas chromatographic (Py-GC) fingerprinting of complex biological materials has been shown to be a rapid and reliable chemotaxonomic technique (Soderstrom and Frisvad 1984; Torell and others 1989; Valcarce and Smith 1989a, 1989b). In pyrolysis, small amounts (usually micrograms) of directly sampled, underivatized material are fragmented by heating in the absence of oxygen. The resulting pyrolyzates are resolved by gas chromatography, producing a pyrogram. Pyrograms from biological samples, such as sagebrush and shadscale, are complex, and overall patterns of variation are not easily detected by visual examination. Multivariate pattern recognition techniques can be used to statistically evaluate and interpret the data (Irwin 1982; Jurs 1986).

The application of pattern recognition to the analysis of Py-GC data generally consists of two parts: unsupervised exploratory data analysis and supervised classification model development (Meglen 1988). Unsupervised exploratory data analysis detects outliers or abnormal measurements and provides information about the intrinsic data structure through classification. The goal of classification is to categorize a set of data as members of a class or classes without prior or assumed knowledge of the data (Sharaf and others 1986; Wold and others 1984; Jerman-Blazic and others 1989).

Unsupervised exploratory data analysis is an iterative routine. It uses a variety of multivariate statistical methods, such as cluster analysis, factor analysis, and principal component analysis, all of which are based on finding structural relationships or classifications among N-dimensional data (Meglen 1988; Tabachnick and Fidell 1983). The three multivariate statistical programs used for exploratory data analysis in this study are: (a) LINK hierarchical cluster analysis (HCA); (b) MVSP principal component analysis (PCA); and (c) fuzzy c-varieties pattern recognition (FCV). In general, these three techniques complement each other and, when used together, provide a powerful tool for exploratory data analysis.

Supervised classification model development tests the classification hypothesis determined in the exploratory data analysis phase by developing classification and prediction rules. These rules are used to predict class membership for new samples, or to test the classification hypothesis by evaluating the performance of the rules on the data set (Sharaf and others 1986). Supervised classification model development relies heavily upon prior or assumed knowledge about the class membership of the samples in the data set. Measurements or features of known samples are used to construct a model that best represents the classification. Subsequent samples to be classified are compared with the classification model and assigned to an appropriate class (Knudson and others 1977; Meglen 1988).

It is often desirable to reduce the number of features (pyrogram peaks) in the data set. This is accomplished using an additional procedure known as feature selection, which ascertains the minimum number of variables (pyrogram peaks) necessary to correctly classify the training set samples (Duewer and Kowalski 1976; Sharaf and others 1986). In this study, feature selection was performed using the multivariate statistical program CART (Classification and Regression Trees).

Applying unsupervised exploratory data analysis and supervised classification model development to the interpretation of Py-GC data provides a powerful analytical tool for plant materials. The following two studies, conducted on big sagebrush (Artemisia tridentata) and shadscale (Atriplex confertifolia), are presented to demonstrate the applicability of this technique to the analysis of genetically different but morphologically similar rangeland plants.
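The supervised idea described above can be made concrete with a simple rule. A model is built from samples of known class, and each new sample is compared with that model and assigned to a class. The sketch below implements this as a nearest-class-centroid rule in plain Python, assuming each pyrogram has already been reduced to a vector of normalized peak areas. It is an assumed, deliberately simple classifier, not one of the programs used in this study. The function names, peak values, and class labels are hypothetical.

```python
import math


def class_centroids(samples, labels):
    """Average the training vectors of each known class; the averages form the model."""
    sums, counts = {}, {}
    for vector, label in zip(samples, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vector)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], vector)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign(sample, centroids):
    """Assign a new pyrogram vector to the class with the nearest centroid."""
    return min(centroids, key=lambda label: euclidean(sample, centroids[label]))


# Hypothetical training set: three normalized peak areas per pyrogram.
train = [[1.0, 0.2, 0.5], [0.9, 0.3, 0.4], [0.1, 1.1, 0.6], [0.2, 0.9, 0.7]]
labels = ["low", "low", "high", "high"]
model = class_centroids(train, labels)
print(assign([0.2, 1.0, 0.6], model))  # high
```

A centroid rule assumes compact, roughly spherical classes. The methods used in this study (FCV class models and CART trees) relax that assumption in different ways.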
Big Sagebrush

Big sagebrush is among the most widespread shrub species, as well as the most numerous single species, in western North America (McArthur and others 1981; McArthur and Welch 1982). The big sagebrush complex is divided into three common subspecies: basin, Wyoming, and mountain big sagebrush (A. t. ssp. tridentata, wyomingensis, and vaseyana) (McArthur and Plummer 1978; McArthur and Welch 1982). Each subspecies consists of various populations or accessions. Welch and others (1987) reported that domestic sheep showed differential preference for various accessions of big sagebrush (table 1). Differential palatability has applications in land rehabilitation, where less-preferred sagebrush could be used for revegetation in areas subject to overgrazing. Conversely, the establishment of preferred accessions of big sagebrush can provide winter forage, improving rangelands for domestic and wild animals (Behan and Welch 1986; Welch and others 1987).

In this study, pyrogram peaks that correlate with the palatability of sagebrush to sheep were sought. Supervised pattern recognition methods were used to classify sagebrush pyrograms into clusters corresponding to three classes: low palatability (<25 percent), medium palatability (25 to 75 percent), and high palatability (>75 percent).

Shadscale

Shadscale is also abundant in the Intermountain region of the western United States, from central Arizona and southwestern California to southern and eastern Montana. Although it is easily distinguished from all other species of saltbush (Atriplex spp.), its populations are highly variable. Some variation may be attributed to environmental conditions, but the majority appears to be genetic, coming mostly from polyploidy and from introgression from other species (Stutz and Sanderson 1983). The chromosome numbers of shadscale plants are determined by cytological examination of meiotic cells in male flower buds. As a result, collections for these studies can be made only during the few weeks each year when the plants are flowering. Being able to determine the ploidy level of shadscale at any time during the year would be useful. Py-GC-PR has been shown to be an effective method for discriminating plant and insect materials (Soderstrom and Frisvad 1984; Torell and others 1989; Valcarce and Smith 1989a, 1989b), so it was decided to attempt to characterize shadscale using the same methods. Preliminary Py-GC-PR studies were conducted using a limited data set of shadscale consisting of 16 plants representing eight locations and four chromosome races (table 2).

Table 1. Utilization of big sagebrush accessions by wintering sheep (Welch and others 1987)

Sample number   Accession(1)               Percent of current year's vegetative growth eaten
1-6             Hobble Creek (v)           80.6
7-12            Salina Canyon (v)          10.9
13-18           Dove Creek (t)             0.0
19-24           Petty Bishop's Log (v)     48.3
25-30           Clear Creek Canyon (v)     21.6
31-36           Hobble Creek II (v)        80.6
37-42           Clear Creek Canyon (t)     1.9
43-48           Evanston (t)               0.5
49-54           Milford (w)                82.7
55-60           Evanston (w)               44.2

(1) v = A. t. ssp. vaseyana, t = A. t. ssp. tridentata, w = A. t. ssp. wyomingensis.

Table 2. Sample numbers, ploidy, identification numbers, and location of shadscale samples used in this study

Sample number   Ploidy   ID number   Location
1-3             2x       79777       Hardin, MT
4-6             2x       79777       Hardin, MT
7-9             2x       82246       Antelope Island
10-12           2x       82246       Antelope Island
13-15           2x       83244       Horse Canyon, UT
16-18           2x       83244       Horse Canyon, UT
19-21           4x       82272       Emery, UT
22-24           4x       82272       Emery, UT
25-27           4x       82261       Rock Springs, WY
28-30           4x       82261       Rock Springs, WY
31-33           8x       83180       Alkali Flats, OR
34-36           8x       83180       Alkali Flats, OR
37-39           8x       82239       Scipio, UT
40-42           8x       82239       Scipio, UT
43-45           10x      83133       Eskdale, UT
46-48           10x      83133       Eskdale, UT

EXPERIMENTAL

Materials

Big sagebrush samples were grown in uniform gardens established by the U.S. Department of Agriculture, Forest Service, Shrub Sciences Laboratory, Provo, UT. They consisted of 10 accessions representing three different subspecies of big sagebrush (basin, mountain, and Wyoming big sagebrush) (Welch and others 1987). Shadscale samples were taken from nursery-grown plants at Brigham Young University, Provo, UT. They consisted of leaves from 16 plants, representing eight different locations and four different chromosome races (2x, 4x, 8x, and 10x).

Sample Preparation

Each big sagebrush and shadscale sample was uniformly dried and ground to a fine powder. Fifteen milligrams of powder was suspended in 1.5 mL of spectral-grade methanol, and the resulting mixture was sonicated for 30 minutes. Portions (5-10 µL) of the sonicated mixtures were applied to 510 °C ferromagnetic pyrolysis wires and uniformly dried.

Pyrolysis-Gas Chromatography Analysis

Each ferromagnetic sample wire was heated by induction, under helium, to 510 °C for 8 seconds using a F.O.M. XL Curie-point pyrolyzer. The resulting pyrolysis products were resolved on a 27-m (0.32-mm ID, 0.25-µm film) Supelco SPB-5 fused-silica capillary column using a Hewlett-Packard 5880A gas chromatograph equipped with a flame ionization detector. Helium was used as the carrier gas, and the peak areas were determined with a Hewlett-Packard series 5880A level 4 integrator. The pyrolyzer head was maintained at 85 °C. The gas chromatograph oven was heated from 50 °C to 200 °C at a rate of 5 °C/min, and the detector temperature was maintained at 200 °C.

Data Processing

The resulting pyrograms (retention time versus peak areas) were compiled into m × n data matrices, consisting of i = 1, 2, ..., m samples and j = 1, 2, ..., n features. Each data matrix was the starting point for further chemometric investigation. LINK (HCA), MVSP (PCA), and FCVPC-87 were run on an IBM-AT-compatible computer equipped with a math coprocessor and an Orchid TurboPGA video card. CART was run on a Digital Equipment Corporation VAX Model 8650. Data standardization was performed by normalizing each column of features in the data matrix to the sum of the values in the column.

Pattern Recognition Programs

LINK Hierarchical Cluster Analysis (HCA). HCA uses techniques that search for unbiased natural groupings among samples in N-dimensional space. A sample, represented by a pyrogram, can be considered as an N-dimensional data vector (Meglen 1988). Essentially, cluster analysis searches the distance matrix for the two data vectors with the smallest distance of separation. These two vectors are then treated as a single point, positioned at the center of gravity of the pair, and a new distance matrix is computed. The process continues, reducing the number of groups by one at each step, until all data vectors have been assigned to a single cluster (Dunn and Everitt 1982; Lavine 1988). A variety of methods are available to calculate the distance between a single point and a cluster, or between two clusters (Romesburg 1984). This study used single linkage (SLINK), complete linkage (CLINK), and average linkage (UPGMA) between groups. Euclidean distances were used to generate the dissimilarity coefficient matrix. The results of HCA are illustrated using a two-dimensional dendrogram that displays the multidimensional relationships among all samples (for example, fig. 1).

Figure 1. Dendrogram from hierarchical cluster analysis of sagebrush using single linkage (SLINK) between groups and Euclidean distance measure. Horizontal axis: dissimilarity value. Leaves are labeled by accession, subspecies, and percent utilization.

MVSP Principal Component Analysis (PCA). PCA is a standard statistical technique used in numerical taxonomy (Dunn and Everitt 1982; Tabachnick and Fidell 1983). It reduces the dimensionality of multidimensional data while retaining as much of the variation in the data as possible. This enables direct examination of the relative positions of the data points (pyrograms) in the high-dimensional space. PCA accomplishes this by transforming the original variables (pyrogram peaks) into a set of new, uncorrelated variables known as principal components (PCs). The resulting PCs are linear combinations of the original variables. They are arranged in order of decreasing variance, relative to the variation originally present in the data (Tabachnick and Fidell 1983). If the first two or three principal components account for most of the variation, plots (PC1 vs. PC2 or PC3) can be generated to represent the relative positions of the data points in the high-dimensional space.

Fuzzy c-Varieties Pattern Recognition (FCV). FCV pattern recognition consists of two parts: multiclass principal component modeling (MPCM) and false-color data imaging. The objective of the MPCM algorithm is to obtain disjoint principal component models of the classes within the data (Gunderson 1984; Jacobsen and Gunderson 1987). Multiclass principal component modeling is an unsupervised agglomerative method. It determines the membership of each sample within a preselected number of classes using a variance-based optimization routine (Vogt and others 1989). In the MPCM algorithm, each sample data vector plays a weighted role in defining each class represented by the data. The result is a membership matrix containing membership values for each sample data vector in each class. The output of the MPCM algorithm has two parts: a set of principal components (one per class) and a membership coefficient matrix (membership values). Both can be used for cluster analysis, classification of new samples, and feature selection. False-color data imaging (Gunderson and others 1988) is a plotting subroutine. It makes it possible to evaluate model validity using three-dimensional "false color" images and is useful for evaluating the results of the algorithm.

Classification and Regression Trees (CART). CART classifies samples according to tree-structured rules (Breiman and Friedman 1984). The classification is performed according to a probability model. All features of a training set are used to construct a large tree, which is then pruned to the minimal tree necessary to perform the classification. A cross-validation procedure can be used to determine the significance of the features or variables. The success or significance of the classification is expressed as a misclassification rate. CART can be used both to develop a classification model from a training set and to analyze unknowns.

RESULTS

Big Sagebrush

HCA was applied to the big sagebrush data set using the single-linkage (SLINK) clustering method, and the clustering was found to follow a hierarchical pattern (fig. 1). Figure 1 shows that classes representing different sagebrush accessions were detected, although no relationship with palatability could be established. However, the supervised approach of the FCV-MPCM algorithm gave good discrimination among the three palatability classes in table 3.

A variety of other supervised classification methods are available, including discriminant analysis and soft independent modeling of class analogy (SIMCA). In this study, supervised classification was performed by normalizing the three palatability classes separately. Figure 2 shows the three-dimensional "false color" plot of the data structure determined using this supervised procedure. An additional weighting feature, relating the palatability of the sagebrush to sheep, was added to the data set. This resulted in a "forced clustering" in which the palatability features were unnormalized (values from 0 to 100) and the pyrogram peak areas were normalized (global average value = 1.00). The procedure produced three classes of sagebrush samples: low palatability (30 samples), medium palatability (12 samples), and high palatability (18 samples).

Table 3. Classification of big sagebrush accessions according to palatability

Palatability level   Accession            Subspecies                Percent used
High                 Hobble Creek         A. t. ssp. vaseyana       80.6
                     Milford              A. t. ssp. wyomingensis   82.7
Medium               Petty Bishop's Log   A. t. ssp. vaseyana       48.3
                     Evanston             A. t. ssp. wyomingensis   44.2
Low                  Evanston             A. t. ssp. tridentata     0.5
                     Clear Creek Canyon   A. t. ssp. tridentata     1.9
                     Dove Creek           A. t. ssp. tridentata     0.0
                     Clear Creek Canyon   A. t. ssp. vaseyana       21.6
                     Salina Canyon        A. t. ssp. vaseyana       10.9

Figure 2. Three-dimensional false color data image plot (axes x, y, z) of the three sagebrush palatability FCV classes determined using supervised classification.

FCV allows samples to have a shared class membership. This was particularly useful in constructing the palatability classes. For example, samples with 20 percent palatability to sheep could be partly in the "low" class and partly in the "medium" class.

Of particular interest are the pyrogram peaks that show a stepwise decrease or increase across the low-, medium-, and high-palatability classes (* in fig. 3). The FCV center values (the average value of each class center in 37-dimensional space) are graphed in figure 3. A slight increase with increasing palatability is observed for features 2 and 27, and a more pronounced increase for features 20, 22, and 35. A stepwise decrease is observed for features 7, 15, and 31. Figure 3 also reveals several other peaks with much higher discriminating power than those mentioned: peaks 4, 9, 13, 14, 17, 19, 26, and 36 for the low class, and peaks 12, 18, 23, 24, 25, and 32 for the medium class. Because the different classes contain different accessions of sagebrush, these peaks may represent chemical compounds useful in numerical taxonomy, but not necessarily correlated with palatability.

CART is designed for classification (supervised data analysis), so it cannot be used for exploratory data analysis. CART was applied to the data set with the palatability value added as a class variable (low = 1, medium = 2, high = 3), and the chemical data were used to classify the samples into the three classes. The resulting tree (fig. 4) consists of three nodes; at each node, samples are split into two groups according to the value of a certain feature. At node 1, the samples were classified according to feature 20 ≤ 0.905. Thirty-six samples had feature 20 ≤ 0.905 (fig. 4); these consisted of both high-palatability and medium-palatability class samples. Node 2 classified these 36 samples according to feature 32 ≤ 4.51. At this node, 30 samples had feature 32 ≤ 4.51 and were assigned to the high-palatability class, and six samples had feature 32 > 4.51 and were assigned to the medium-palatability class. The 24 samples at node 1 with feature 20 > 0.905 consisted of both medium- and low-palatability classes. Node 3 split these 24 samples into medium- or low-palatability classes according to feature 25 ≤ 4.57. In addition to constructing the classification tree, CART also evaluates the importance of the different features. Some features compete with those used in the tree, so a ranking according to importance may give high priority to features not used in the classification step.
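Two steps described under EXPERIMENTAL can be sketched together: the data standardization under Data Processing and the merge loop described for LINK HCA. This is a minimal plain-Python illustration under stated assumptions, not the LINK program. The peak-area matrix is assumed to be a list of rows (pyrograms) by columns (peaks), and each column is divided by its sum. Clusters are then merged using single (SLINK), complete (CLINK), or average (UPGMA) linkage on Euclidean distances. Rather than collapsing a merged pair to its center of gravity, each linkage rule recomputes the cluster-to-cluster distance from the member-to-member distances. The function names and the four-sample matrix are hypothetical.

```python
import math
from itertools import combinations


def normalize_columns(matrix):
    """Divide every feature (column) by the sum of that column."""
    col_sums = [sum(col) for col in zip(*matrix)]
    return [[x / s if s else 0.0 for x, s in zip(row, col_sums)] for row in matrix]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


LINKAGE_RULES = {
    "single": min,                            # SLINK: nearest pair of members
    "complete": max,                          # CLINK: farthest pair of members
    "average": lambda ds: sum(ds) / len(ds),  # UPGMA: mean over all member pairs
}


def agglomerate(vectors, linkage="single"):
    """Merge the two closest clusters until one remains; return the merge history."""
    rule = LINKAGE_RULES[linkage]
    clusters = [frozenset([i]) for i in range(len(vectors))]
    history = []
    while len(clusters) > 1:
        best = None
        # Distances are recomputed from members on each pass (no cached matrix).
        for a, b in combinations(clusters, 2):
            d = rule([euclidean(vectors[i], vectors[j]) for i in a for j in b])
            if best is None or d < best[2]:
                best = (a, b, d)
        a, b, d = best
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        history.append((sorted(a), sorted(b), d))
    return history


# Hypothetical peak areas: four pyrograms (rows) by three peaks (columns).
peaks = [[10.0, 2.0, 1.0], [11.0, 2.2, 0.9], [3.0, 9.0, 4.0], [2.5, 8.0, 4.2]]
for left, right, distance in agglomerate(normalize_columns(peaks), "average"):
    print(left, "+", right, "at", round(distance, 4))
```

The same check was made on the shadscale dendrograms. Rerun the sketch with `"single"` and `"complete"` and compare the merge order with the `"average"` run. When different linkage rules merge the samples in the same order, the clusters are less likely to be artifacts of one method.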
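The PCA projection described for MVSP can be sketched in plain Python. The sketch assumes PCA on the covariance matrix of mean-centered data. It extracts the leading components by power iteration with deflation; MVSP's own numerical method is not stated in the text. The small data set and the function names are hypothetical. The returned scores are the coordinates plotted in a PC1 versus PC2 diagram.

```python
import math
import random


def center(matrix):
    """Subtract each column mean so the data are mean-centered."""
    means = [sum(col) / len(col) for col in zip(*matrix)]
    return [[x - m for x, m in zip(row, means)] for row in matrix]


def covariance(centered):
    n, p = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
            for i in range(p)]


def power_iteration(cov, iterations=500, seed=0):
    """Leading eigenvalue and unit eigenvector of a symmetric matrix."""
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in cov]
    length = math.sqrt(sum(x * x for x in v))
    v = [x / length for x in v]
    for _ in range(iterations):
        w = [sum(c * x for c, x in zip(row, v)) for row in cov]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    # Rayleigh quotient v'Cv of the unit vector v gives the eigenvalue.
    cv = [sum(c * x for c, x in zip(row, v)) for row in cov]
    return sum(a * b for a, b in zip(v, cv)), v


def principal_components(matrix, k=2):
    """Return [(variance, loading_vector), ...] and each sample's PC scores."""
    centered = center(matrix)
    cov = covariance(centered)
    components = []
    for _ in range(k):
        lam, v = power_iteration(cov)
        components.append((lam, v))
        # Deflation: remove the variance already explained by this component.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(len(v))]
               for i in range(len(v))]
    scores = [[sum(x * c for x, c in zip(row, vec)) for _, vec in components]
              for row in centered]
    return components, scores


# Hypothetical data: most of the variation lies along the first variable.
data = [[1.0, 0.1], [2.0, -0.1], [3.0, 0.05], [4.0, -0.05]]
components, scores = principal_components(data, k=2)
for variance, loadings in components:
    print(round(variance, 4), [round(x, 3) for x in loadings])
```

Dividing each component's variance by the trace of the original covariance matrix gives the fraction of variation it explains. That fraction is what decides whether a two- or three-component plot is a fair picture of the data.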
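The shared class membership that FCV provides can be illustrated with the standard fuzzy c-varieties membership formula. Each class is represented by a line, a one-dimensional principal component model, and a sample's membership in each class falls off with its distance from that line: u_k = 1 / Σ_j (d_k / d_j)^(2/(m-1)). This is a sketch under several assumptions: the fuzzy exponent m = 2, the class lines, and the sample are hypothetical. FCVPC-87's actual iteration, which also refits the class lines, is not reproduced.

```python
import math


def distance_to_line(x, center, direction):
    """Distance from point x to the line through `center` along the unit vector `direction`."""
    diff = [a - b for a, b in zip(x, center)]
    along = sum(d * u for d, u in zip(diff, direction))
    return math.sqrt(max(sum(d * d for d in diff) - along * along, 0.0))


def memberships(distances, m=2.0):
    """Fuzzy memberships that sum to 1; closer classes receive larger shares."""
    on_line = [k for k, d in enumerate(distances) if d == 0.0]
    if on_line:  # the sample lies exactly on one or more class lines
        return [1.0 / len(on_line) if k in on_line else 0.0
                for k in range(len(distances))]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((dk / dj) ** power for dj in distances) for dk in distances]


# Hypothetical class lines (center, unit direction) in a two-peak space.
classes = ["low", "medium", "high"]
lines = [((0.0, 0.0), (1.0, 0.0)), ((0.0, 2.0), (1.0, 0.0)), ((0.0, 10.0), (1.0, 0.0))]
sample = (5.0, 1.0)  # halfway between the "low" and "medium" lines
dists = [distance_to_line(sample, c, u) for c, u in lines]
for name, u in zip(classes, memberships(dists)):
    print(name, round(u, 3))
```

A sample lying between the "low" and "medium" models receives roughly half its membership in each, with almost none in "high". This is the behavior that let intermediate-palatability samples belong partly to two classes.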
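CART as a whole is more than can be shown here: it grows the tree, prunes it, and cross-validates. The sketch below covers only the core step behind a node such as "feature 20 ≤ 0.905". It scans candidate thresholds on every feature and keeps the rule that minimizes the Gini impurity of the two resulting subsets. Gini is the usual default splitting criterion in CART, but whether it was the criterion used in this study is an assumption. The function names and the data are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0 when every sample in the node has the same class."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Best rule `value <= threshold` for one feature; returns (threshold, impurity)."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))
    for i in range(1, n):
        low, high = pairs[i - 1][0], pairs[i][0]
        if low == high:
            continue  # no threshold separates equal values
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if impurity < best[1]:
            best = ((low + high) / 2, impurity)
    return best


def best_feature(matrix, labels):
    """Scan every feature (column); return (feature_index, threshold, impurity)."""
    candidates = []
    for j, column in enumerate(zip(*matrix)):
        threshold, impurity = best_split(column, labels)
        if threshold is not None:
            candidates.append((j, threshold, impurity))
    return min(candidates, key=lambda c: c[2]) if candidates else None


# Hypothetical data: two peaks per sample, two palatability classes.
matrix = [[0.5, 3.0], [0.7, 1.0], [1.2, 2.0], [1.4, 4.0]]
labels = ["low", "low", "high", "high"]
print(best_feature(matrix, labels))
```

Growing a full tree means applying `best_feature` again to each child subset until the nodes are pure or too small. Pruning and cross-validation then trim the tree back, as described for CART above.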
Table 4 shows the relative importance of the different features determined by CART. Feature 20 ranked highest. All other features that show a stepwise increase with increasing palatability have a relative importance of 54 percent or greater. Figure 5 shows the 20 pyrogram peaks most useful for distinguishing palatability.

Figure 3. Center values for the three sagebrush palatability FCV classes (low, medium, and high palatability) determined using supervised classification; horizontal axis: pyrogram peaks 1 to 37. The * above the bars indicates the pyrogram peaks with a stepwise increase or decrease relative to the low-, medium-, and high-palatability classes.

Table 4. Relative importance of features according to CART

Feature number   Relative importance (percent)   Feature number   Relative importance (percent)
20               100                             12               48
24               75                              32               48
25               75                              21               48
18               75                              4                47
3                74                              13               47
23               71                              19               47
22               68                              26               47
27               65                              28               42
2                63                              5                41
1                60                              34               39
6                59                              10               28
29               55                              31               26
35               54                              Remaining        <25

Figure 4. Classification according to CART. At each of the nodes, one feature is used to split the data set into two subsets. The node splits are feature 20 ≤ 0.905 (dividing the samples 36/24), feature 32 ≤ 4.51, and feature 25 ≤ 4.57. Final classification of sagebrush samples according to palatability is: high, 30 samples; medium, 12 samples; and low, 18 samples.

Figure 5. Center values for the three sagebrush palatability FCV classes determined using supervised classification, shown for pyrogram peaks 1, 2, 3, 4, 6, 12, 13, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 29, 32, and 35.
Numbers above the bars indicate the 20 most important features according to CART.

Shadscale

The shadscale samples used for this study are shown in table 2. The pyrograms from the Py-GC analyses of these samples, consisting of approximately 20 peaks, were compiled into a 48 × 28 data matrix. Representative pyrograms for each ploidy level are shown in figures 6 and 7. HCA was applied to the resulting data matrix using two different clustering methods, average linkage (UPGMA) and complete linkage (CLINK). The resulting dendrograms are shown in figures 8 and 9, respectively. The two methods differ in the distance values at which the samples formed clusters, but the order in which samples merged is essentially the same for both. Because UPGMA and CLINK are based on different philosophies, the resulting clusters can be considered well defined and not artifacts of the clustering method.

Figures 8 and 9 show that HCA was capable of discriminating all nine locations; however, no apparent correlation to ploidy was found. Because exploratory data analysis did not reveal any relationship with ploidy, supervised modeling was used to obtain chemical models corresponding to the four euploids. A weighted feature corresponding to the four euploids was added to the data set. The results from HCA on the supervised data set (figs. 10 and 11) show that HCA correctly classified the shadscale samples into one of four classes corresponding to 2x, 4x, 8x, and 10x. The dendrograms also indicated similarities and dissimilarities between the groups of shadscale pyrograms: 4x and 8x samples were the most similar, and 2x samples the most dissimilar.

MVSP principal component analysis was applied to the shadscale data set in table 2. Figure 12 shows a plot of the first and second principal components. These results corroborated those obtained from HCA.

The FCV algorithm was used to determine the chemical features that differentiate the four classes found by HCA and PCA. Figure 13 plots the contribution of each class center (average values of each class) to each of the 28 Euclidean dimensions in the data set. Several peaks have high discriminating power among the four shadscale euploids:
- peaks 16 and 20 discriminate 10x;
- peak 24 discriminates 2x;
- peak 9 discriminates 8x;
- 4x is discriminated from the other three by the absence of any such peak.

Figure 6. Representative pyrograms from Py-GC analysis of shadscale chromosome races 2x and 4x.

Figure 7. Representative pyrograms from Py-GC analysis of shadscale chromosome races 8x and 10x.

Figure 8. Dendrogram from unsupervised hierarchical cluster analysis of shadscale using average linkage (UPGMA) between groups and Euclidean distance measure; horizontal axis: dissimilarity value.

Figure 9. Dendrogram from unsupervised hierarchical cluster analysis of shadscale using complete linkage (CLINK) between groups and Euclidean distance measure; horizontal axis: dissimilarity value.

Figure 10. Dendrogram from supervised hierarchical cluster analysis of shadscale using average linkage (UPGMA) between groups and Euclidean distance measure; horizontal axis: dissimilarity value.

Figure 11. Dendrogram from supervised hierarchical cluster analysis of shadscale using complete linkage (CLINK) between groups and Euclidean distance measure; horizontal axis: dissimilarity value.

Figure 12. First and second principal components from supervised principal component analysis of shadscale samples (2x, 4x, 8x, and 10x); axes: principal component 1 and principal component 2.

Figure 13. Center values of the FCV classes for the four chromosome races (2x, 4x, 8x, and 10x) determined using supervised classification; horizontal axis: pyrogram peaks.

DISCUSSION

Big Sagebrush

A feature may be important to the class structure in two ways. It may show a high variation within a class, and thus high "modeling power" for that particular class, or it may be a good discriminator among classes. In unsupervised clustering, one class is split so that the variations within the new classes are minimized relative to the global variation. A feature with global modeling power should, therefore, be a good discriminator at the subset or class level. Consequently, features with high modeling power according to this model should be good discriminators for taxonomic classes but not for palatability classes.

The class centers from FCV, shown in figure 3, illustrate the chemical differences among the three palatability classes. Eight features of possible importance to palatability can be identified visually in figure 3 (features with a stepwise increase or decrease with palatability). The combined use of FCV and CART, shown in figure 5, produced a subset of 20 peaks from figure 3 ranked as important for classification (features with a relative importance of 47 percent or more in table 4). Numbers above the bars indicate CART's ranking. The eight peaks with a stepwise increase or decrease with palatability are among the 20 peaks in the subset. Peak 20 is the most important if the samples are to be classified according to palatability. In a related study, Welch and McArthur (1986) showed that coumarin compounds are good taxonomic indicators, as well as palatability indicators for mule deer.
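The 47-percent cutoff used to form the 20-peak subset can be checked directly against the relative-importance values in Table 4. The sketch below transcribes those values and keeps the features at or above a cutoff. Features below 25 percent, listed only as "Remaining", are omitted. The function and variable names are hypothetical.

```python
# Relative importance (percent) from Table 4; features listed only as
# "Remaining" (<25 percent) are omitted.
TABLE_4 = {
    20: 100, 24: 75, 25: 75, 18: 75, 3: 74, 23: 71, 22: 68, 27: 65, 2: 63,
    1: 60, 6: 59, 29: 55, 35: 54, 12: 48, 32: 48, 21: 48, 4: 47, 13: 47,
    19: 47, 26: 47, 28: 42, 5: 41, 34: 39, 10: 28, 31: 26,
}


def select_features(importance, cutoff=47):
    """Feature numbers whose relative importance meets the cutoff, most important first."""
    kept = [feature for feature, pct in importance.items() if pct >= cutoff]
    return sorted(kept, key=lambda feature: (-importance[feature], feature))


subset = select_features(TABLE_4)
print(len(subset), sorted(subset))
```

Cross-checking a list of candidate peaks against `subset` with a set intersection is a quick way to confirm which of them fall inside the CART-selected group.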
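The visual search for peaks that rise or fall stepwise across the low-, medium-, and high-palatability class centers can be expressed as a strict monotonicity test. This is a minimal sketch: the center values below are hypothetical stand-ins, not the values plotted in figure 3, and the function name is likewise hypothetical. A real screen might add a tolerance so that very small differences are not counted as steps.

```python
def stepwise_features(centers, order=("low", "medium", "high")):
    """Return (increasing, decreasing) lists of 1-based peak numbers whose
    class-center values change strictly monotonically across the ordered classes."""
    rows = [centers[name] for name in order]
    increasing, decreasing = [], []
    for peak, values in enumerate(zip(*rows), start=1):
        steps = [b - a for a, b in zip(values, values[1:])]
        if all(s > 0 for s in steps):
            increasing.append(peak)
        elif all(s < 0 for s in steps):
            decreasing.append(peak)
    return increasing, decreasing


# Hypothetical class centers for five peaks (not the study's values).
centers = {
    "low":    [1.0, 0.8, 2.0, 0.5, 1.1],
    "medium": [1.2, 0.8, 1.5, 0.9, 0.7],
    "high":   [1.5, 0.9, 1.0, 0.7, 1.3],
}
print(stepwise_features(centers))  # ([1], [3])
```

Peak 2 is not flagged because its value does not change between the low and medium classes. Peaks 4 and 5 are not flagged because they rise and then fall, or fall and then rise.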
Further studies on peak identification should begin with a comparative study of known taxonomic and palatability indicators (such as coumarin compounds) and peak 20. These results demonstrate that Py-GC-PR is a viable method for determining big sagebrush palatability.

Shadscale

The three multivariate pattern-recognition programs (HCA, PCA, and FCV) were applied to the pyrolysis-gas chromatographic data. They correctly classified the shadscale samples according to location, and the supervised approach distinguished each of the ploidy levels. Output of the FCV algorithm was used to determine the pyrogram peaks responsible for discriminating among the four ploidy levels. Five features (pyrogram peaks 6, 9, 16, 20, and 24), or their absence, were important in discriminating each of the ploidy levels of shadscale.

The biological questions that gave rise to these chemical tests of ploidy were as follows. Fingerprinting techniques similar to those described here have previously discriminated among range grasses with different susceptibilities to insect feeding (Windig and others 1983). Can ploidies of shadscale (2x, 4x, etc.) be classified using similar methods? Can ploidy be identified in all seasons of the year? Is ploidy related to the kinds and abundance of insects found in native rangelands? Are insects directly or indirectly related to dieoff of native shrubs?

Results presented here answer some of these questions. Yes, shadscale ploidy can be classified using Py-GC. Discrimination of plant location was also achieved. This is a very important observation that may help identify plants that should or should not be grown in certain areas because of their adaptation characteristics.

Data now on hand and being analyzed may answer some of the other questions about ploidy and its relationship to the abundance and species of insects on shadscale. A survey of insects associated with native shrubs during 1986 to 1989 included collections from areas where the ploidy of shadscale is known, so it should be possible to determine whether insects and ploidy are correlated. If insects are shown to be associated (directly or indirectly) with both shrub dieoff and ploidy, understanding of the causes of dieoff would improve. Similar correlations could be calculated when quantitative biological data become available about plant diseases, edaphic factors, and range management (grazing, etc.). Some of these associations may prove to be random. However, if they prove real with shadscale as a test case, the principles would have widespread use. Examples include guidelines for plant materials centers, production and commercialization of native seeds, range management, gathering information about poisonous range plants, and a host of other applications.

CONCLUSIONS

Big Sagebrush

Hierarchical cluster analysis and application of FCV discriminated among the 10 accessions. However, neither model resulting from unsupervised data analysis correlated with palatability classes. The two programs used for supervised classification (FCV and CART) employed different approaches to evaluate the chemical features. The FCV class centers identified eight features that increased or decreased stepwise with increasing palatability. When these results were combined with the feature evaluation performed in CART, one chemical feature (peak 20) proved to discriminate best for palatability; further studies on peak identification should begin with this feature. Of the three pattern-recognition programs, combining FCV and CART gave the most definitive information. Given an adequate training set, the combined application of these algorithms is recommended for interpreting complex data sets such as those resulting from Py-GC analysis of big sagebrush.

Shadscale

This study used a limited data set from four ploidy levels of shadscale (2x, 4x, 8x, and 10x). Its results demonstrated that Py-GC-PR is capable of discerning minute biochemical differences among morphologically similar accessions of shadscale. The study also suggests that Py-GC-PR, given a large enough training set, could classify and differentiate unknown samples of shadscale according to their ploidy levels. Determining the chemical identity of the discriminating pyrogram peaks would allow development of rapid screening methods for shadscale identification through the use of Py-GC-PR.

ACKNOWLEDGMENTS

The authors wish to thank R. W. Gunderson for the use of his programs FCVPC-87 and LINK; T. Jacobsen, on leave from the Brewing Industry Research Institute, Oslo, Norway, for help and useful discussions during data processing; and J. Robinson for help with the Py-GC analyses.

REFERENCES

McArthur, E. D.; Welch, B. L. 1982. Growth rate differences among big sagebrush (Artemisia tridentata) accessions and subspecies. Journal of Range Management. 35: 396-401.
Meglen, R. R. 1988. Chemometrics: its role in chemistry and measurement sciences. Chemometrics and Intelligent Laboratory Systems. 3: 17-29.
Romesburg, H. C. 1984. Cluster analysis for researchers. Belmont, CA: Lifetime Learning Publications.
Sharaf, M. A.; Illman, D. L.; Kowalski, B. R. 1986. Chemometrics. In: Elving, P. J.; Winefordner, J. D., eds. Chemical analysis. Vol. 82. New York: John Wiley and Sons: Chapter 6.
Soderstrom, B.; Frisvad, J. C. 1984. Separation of closely related asymmetric penicillia by pyrolysis gas chromatography and mycotoxin production. Mycologia. 76: 408-419.
Stutz, H. C.; Sanderson, S. C. 1983. Evolutionary studies of Atriplex: chromosome races of A. confertifolia (shadscale). American Journal of Botany. 70(10): 1536-1547.
Tabachnick, B. G.; Fidell, L. S. 1983. Using multivariate statistics. New York: Harper and Row.
Torell, J.; Evans, J.; Valcarce, R.; Smith, G. G. 1989. Chemical characterization of leafy spurge (Euphorbia esula L.) by Curie-point pyrolysis-gas chromatography-pattern recognition. Journal of Analytical and Applied Pyrolysis. 14: 223-236.
Valcarce, R.; Smith, G. G. 1989a. Chemical characterization of honey bees by Curie-point pyrolysis-gas chromatography-pattern recognition. Chemometrics and Intelligent Laboratory Systems. 6: 157-166.
Valcarce, R.; Smith, G. G. 1989b. Pattern recognition studies of Curie-point pyrolysis-gas chromatographic data from materials important to agriculture. Journal of Analytical and Applied Pyrolysis. 15: 357-372.
Vogt, N. B.; Bye, E.; Thrane, K. E.; Jacobsen, T.; Benestad, C. 1989. Composition activity relationships (CARE): Part 1. Exploratory multivariate analysis of elements, polycyclic aromatic hydrocarbons and mutagenicity in air samples. Chemometrics and Intelligent Laboratory Systems. 6: 31-47.
Welch, B. L.; McArthur, E. D. 1986. Wintering mule deer preference for 21 accessions of big sagebrush. Great Basin Naturalist. 46: 281-286.
Welch, B. L.; McArthur, E. D.; Rodriguez, R. L. 1987. Variation in utilization of big sagebrush accessions by wintering sheep. Journal of Range Management. 40: 113-115.
Windig, W.; Meuzelaar, H. L. C.; Haws, B. A.; Campbell, C. F.; Asay, K. H. 1983. Biochemical differences observed in pyrolysis mass spectra of range grasses with different resistance to Labops hesperus Uhler attack. Journal of Analytical and Applied Pyrolysis. 5: 183-198.
Wold, S.; Albano, C.; Dunn, W. J., III; Edlund, U.; Esbensen, K.; Geladi, P.; Hellberg, S.; Johansson, E.; Lindberg, W.; Sjöström, M. 1984. In: Kowalski, B. R., ed. Chemometrics: mathematics and statistics in chemistry. NATO ASI Series: NATO Science Affairs Division. New York: Reidel Publishing.
Portions of this study were funded by the Utah Agricultural Experiment Station, Logan, UT; the Biotechnology Center, Logan, UT; Utah State University; and the USDA Forest Service, Intermountain Research Station. Journal paper No. 3885 of the Utah Agricultural Experiment Station. REFERENCES Behan, B.; Welch, B. L. 1986. Winter nutritive content of black sagebrush (Artemisia nova) grown in a uniform garden. Great Basin Naturalist. 46: 161-165. Breiman, L.; Friedman, J. H.; Olshen, R. A.; Stone, C. J. 1984. In: Bickel, P. J., ed. Classification and regression trees. Belmont CA: Wadsworth International: Chapter 8. Duewer, D. L.; Kowalski, B. R.; Fasching, J. L. 1976. Improving the reliability of factor analysis of chemical data by utilizing the measured analytical uncertainty. Analytical Chemistry. 48: 2002. Dunn, G.; Everitt, B.S. 1982. An introduction to mathematical taxonomy. New York: Cambridge University Press. Gunderson, R. W. 1984. FCV-manual. Logan, UT: Utah State University, Dept. of Electrical Engineering. Gunderson, R. W.; Thrane, K.; Nilson, R. D. 1988. A falsecolor technique for display and analysis ofmultivariable chemometric data. Chemometrics and Intelligent Laboratory Systems. 3: 119-131. Irwin, W. J. 1982. In: Analytical pyrolysis: a comprehensive guide. Chromatographic Science Series Vol. 22. New York: Marcel Dekker. Jacobsen, T.; Gunderson, R. W. 1987. In: Piggot, J. J., ed. Statistical procedures in food research. Elsevier: Chapter 10. Jerman-Blazic, B.; Fabic-Petrac, I.; Randic, M. 1989. Evaluation of the molecular similarity and property prediction for QSAR purposes. Chemometrics and Intelligent Laboratory Systems. 6: 49-63. Jurs, P. C. 1986. Pattern recognition used to investigate multivariate data in analytical-chemistry. Science. 232: 1219-1224. Knudson, E. A.; Duewer, D. L.; Christian, G. D.; Larson, T. V.. 1977. In: Kowalski, B. R., ed. Chemometrics: theory and application. Washington, DC: ACS Symposium Series 52. McArthur, E. 
D.; Plummer, A. P. 1978. Biogeography and management of native western shrubs: a case study, Section Tridentatae of Artemisia. Great Basin Naturalist Memoirs. 2: 229-243. McArthur, E. D.; Pope, C. L.; Freeman, D. C. 1981. Chromosomal studies of subgenus Tridentatae of Artemisia: evidence for autoploidy. American Journal of Botany. 68: 589-605. 335
