BRE-12139 Supplementary text 1

Quartz Trace Elements and Principal Component Analysis

The large number of elements measured in quartz creates a rich multivariate dataset from which to infer detrital provenance. A common approach to visually evaluating populations in an n-dimensional dataset is principal component analysis (PCA). PCA reduces the dimensionality of a dataset, so that the data can be visualized in two dimensions or clustering algorithms can be run directly on the transformed data. PCA treats the data as a set of points (or vectors) in n-dimensional space and extracts a set of orthogonal vectors that can be linearly combined to reproduce the dataset. The vectors are extracted in order of the amount of variability they explain, such that the first vector “explains” more of the variability than any later vector.

These vectors can be viewed as sets of coefficients, or “loadings,” which, when multiplied by the (typically mean-centered) concentration of each element in a given grain and summed, give a principal component (PC) score (one score per grain, per PC vector). Scores far from zero, whether high or low, indicate that a grain’s composition departs strongly from the dataset mean along the direction of that PC vector, whereas scores close to zero indicate that the grain lies near the mean along that direction. Thus, when PC scores are plotted, grains with similar compositions should plot together, even if the source of their similarity is unknown. The power of PCA lies in the fact that a single PC vector can capture multiple co-varying elements, which reduces the dimensionality of the dataset and avoids the need for redundant bivariate plots that show the same population clusters in different element pairings.

Another approach to identifying populations within a dataset is cluster analysis, which groups data points by the Euclidean distance (or another metric) between them in n-dimensional space.
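The PCA step described above can be sketched in a few lines of Python with NumPy. This is a minimal illustration, not the authors’ MATLAB workflow: it obtains the PC vectors from the singular value decomposition of mean-centered data, and the scaling of each element to unit variance, the function name `pca_scores`, and the synthetic two-group dataset are all assumptions made for illustration. Each score is the sum of a grain’s centered (and here scaled) concentrations weighted by the loadings of one PC.

```python
import numpy as np


def pca_scores(X, n_components=2, standardize=True):
    """PCA of a grains-by-elements concentration matrix via SVD.

    Returns (scores, loadings, explained): one score per grain per PC,
    one loading per element per PC, and the fraction of total variance
    explained by each retained PC.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # center each element on its mean
    if standardize:
        # Assumed choice: put elements on a common scale (unit variance),
        # since concentrations can differ by orders of magnitude.
        # Columns with zero variance would need to be dropped first.
        Xc = Xc / Xc.std(axis=0, ddof=1)
    # Rows of Vt are orthonormal PC vectors, ordered by decreasing variance
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = Vt[:n_components].T  # shape: (elements, components)
    scores = Xc @ loadings          # loading-weighted sum for each grain
    variance = s ** 2 / (len(X) - 1)
    explained = variance / variance.sum()
    return scores, loadings, explained[:n_components]


# Hypothetical synthetic data: 50 grains x 4 elements drawn from two
# compositional groups (values are made up for illustration only).
rng = np.random.default_rng(0)
group_a = rng.normal([10, 200, 5, 1], [1, 20, 0.5, 0.1], size=(25, 4))
group_b = rng.normal([30, 80, 15, 2], [1, 20, 0.5, 0.1], size=(25, 4))
X = np.vstack([group_a, group_b])

scores, loadings, explained = pca_scores(X)
print("variance explained by PC1, PC2:", explained.round(3))
print("mean PC1 score, group A vs B:", scores[:25, 0].mean(), scores[25:, 0].mean())
```

In this synthetic example the two groups should separate along PC1, because the first PC vector captures all four elements that co-vary between the groups; the grains of each group plot together without any element pair having been chosen in advance.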
We employed the K-means algorithm, which begins with a set of “seed” values and assigns each point to the cluster of its nearest seed. It then computes the mean of all points in each resulting cluster and reassigns every point according to its distance from the new set of means. This process is repeated until the cluster means no longer shift. To choose the number of clusters, we used an F-test criterion: the ratio of the variance between cluster means to the variance of the entire population, evaluated as a function of the number of clusters. On this basis we settled on n = 5 clusters (Fig. S3). Both PCA and cluster analysis were performed using the Statistics Toolbox in MATLAB.
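The K-means loop and the cluster-number criterion described above can be sketched in Python with NumPy as follows. This is a hypothetical illustration rather than the authors’ MATLAB Statistics Toolbox implementation: the k-means++ seeding, the five random restarts (keeping the solution with the smallest within-cluster sum of squares), the function names, and the synthetic three-group dataset are all assumptions. The criterion is computed here as the between-cluster sum of squares divided by the total sum of squares, one common form of the variance ratio described in the text.

```python
import numpy as np


def _nearest(X, centers):
    """Index of, and squared Euclidean distance to, each point's nearest center."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1), d2.min(axis=1)


def kmeans_pp_init(X, k, rng):
    """k-means++ seeding (an assumed choice): spread the seeds apart."""
    centers = [X[rng.integers(len(X))]]
    for _step in range(1, k):
        _, d2 = _nearest(X, np.array(centers))
        # Pick the next seed with probability proportional to squared distance
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centers, dtype=float)


def kmeans(X, k, n_init=5, max_iter=100, seed=0):
    """Lloyd's K-means with restarts; returns (labels, centers) of the best run."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    best = None
    for _restart in range(n_init):
        centers = kmeans_pp_init(X, k, rng)
        for _step in range(max_iter):
            # Assignment step: each point joins the cluster of its nearest mean
            labels, _ = _nearest(X, centers)
            # Update step: recompute each cluster mean (keep old one if empty)
            new_centers = np.array([
                X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                for j in range(k)
            ])
            if np.allclose(new_centers, centers):
                break  # the means no longer shift: converged
            centers = new_centers
        labels, d2 = _nearest(X, centers)  # final assignment to final means
        within_ss = d2.sum()
        if best is None or within_ss < best[0]:
            best = (within_ss, labels, centers)
    return best[1], best[2]


def between_total_ratio(X, labels):
    """Between-cluster sum of squares divided by total sum of squares."""
    grand_mean = X.mean(axis=0)
    total_ss = ((X - grand_mean) ** 2).sum()
    between_ss = sum(
        np.sum(labels == j) * ((X[labels == j].mean(axis=0) - grand_mean) ** 2).sum()
        for j in np.unique(labels)
    )
    return between_ss / total_ss


# Hypothetical synthetic data: three well-separated groups in a 2-D score space
rng = np.random.default_rng(1)
true_means = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([rng.normal(m, 0.5, size=(30, 2)) for m in true_means])

ratios = {k: between_total_ratio(X, kmeans(X, k)[0]) for k in range(1, 7)}
for k, r in ratios.items():
    print(f"k = {k}: between/total = {r:.3f}")
```

Plotting `ratios` against k for these synthetic data should show a sharp rise up to k = 3 and little gain afterward; the number of clusters is typically read from where the curve flattens. Because poorly placed seeds can trap K-means in a local solution, restarting from several seeds is a common safeguard.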