Eigenvector Localization on Data-Dependent Graphs

Alexander Cloninger and Wojciech Czaja
Norbert Wiener Center, Department of Mathematics
University of Maryland, College Park
Email: {alex,wojtek}@math.umd.edu

Abstract—We aim to understand and characterize embeddings of datasets with small anomalous clusters using the Laplacian Eigenmaps algorithm. To do this, we characterize the order in which eigenvectors of a disjoint graph Laplacian emerge and the support of those eigenvectors. We then extend this characterization to weakly connected graphs with clusters of differing sizes, utilizing the theory of invariant subspace perturbations and proving some novel results. Finally, we propose a simple segmentation algorithm for anomalous clusters based on our theory.

I. INTRODUCTION TO GRAPH THEORY IN DIMENSION REDUCTION

Many nonlinear dimensionality reduction techniques, such as Laplacian Eigenmaps [1], Diffusion Maps [5], and Locally Linear Embedding [9], center on building a data-dependent graph. This allows one to examine similarities between data points as a way to extract useful relationships. We shall focus on the Laplacian Eigenmaps embedding technique.

The purpose of Laplacian Eigenmaps, as with all nonlinear dimensionality reduction techniques, is to create a mapping $\phi : \mathbb{R}^d \to \mathbb{R}^m$, where $m$ is the inherent dimension of the underlying data. Let $\Omega = \{x_1, ..., x_n\} \subset \mathbb{R}^d$ be a set of training points. We have a positive, symmetric kernel $K : \Omega \times \Omega \to \mathbb{R}$ that encodes relationships between two points. We define a neighborhood $N(x) \subset \Omega$ of each $x \in \Omega$ to be the $k$ closest points to $x$, as measured by the kernel $K$. We construct a graph $G = (\Omega, E)$, where $\{x_i, x_j\} \in E$ if $x_j \in N(x_i)$. Let $A$ be the adjacency matrix of $G$. Then $A$ is sparse, with each row $A_{i,\cdot}$ containing $k$ non-zero entries. We require $A$ to be symmetric, though whether $A$ is symmetric depends on how the neighborhoods are generated. If $A$ is not symmetric, simply define the weights $\widetilde{A}_{i,j} = \max(A_{i,j}, A_{j,i})$. Note that, if we did not symmetrize $A$, it would be exactly the adjacency matrix of a $k$-regular graph due to the nearest neighbor condition. We define the diagonal matrix $D$ such that $D_{i,i} = \sum_j A_{i,j}$, and we define the graph Laplacian as $L = D - A$. Finally, we solve the normalized eigenvalue problem
$$D^{-1/2} L D^{-1/2} \phi = \lambda \phi. \qquad (1)$$
The smallest eigenvalue is 0 and its associated eigenvector is left out, as that eigenvector is constant across all nodes of a connected component of the graph. The $m$ eigenvectors corresponding to the next smallest $m$ eigenvalues are used to form the embedding into $\mathbb{R}^m$. In other words, if $\{\phi_i\}_{i=1}^m$ are the eigenvectors associated with eigenvalues $\{\lambda_i\}_{i=1}^m$, then
$$\phi(x_i) = (\phi_1(i), ..., \phi_m(i)).$$
It is worth mentioning that $\{\phi_i\}_{i=1}^m$ are orthogonal, due to the fact that $D^{-1/2} L D^{-1/2}$ is self-adjoint.
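The construction above translates directly into a few lines of linear algebra. The following is a minimal sketch of the pipeline, assuming a Euclidean nearest-neighbor kernel and a dense eigensolver for clarity; the parameter values and the helper name `laplacian_eigenmaps` are illustrative choices, not part of the original specification.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

def laplacian_eigenmaps(X, k=25, m=2):
    """Sketch of the Laplacian Eigenmaps embedding described above."""
    # k-nearest-neighbor adjacency matrix (each row has k non-zero entries).
    A = kneighbors_graph(X, n_neighbors=k, mode='connectivity').toarray()
    # Symmetrize: A~_{ij} = max(A_{ij}, A_{ji}).
    A = np.maximum(A, A.T)
    # Degree matrix and unnormalized Laplacian L = D - A.
    d = A.sum(axis=1)
    L = np.diag(d) - A
    # Normalized eigenvalue problem D^{-1/2} L D^{-1/2} phi = lambda phi.
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L_norm = D_inv_sqrt @ L @ D_inv_sqrt
    lam, phi = np.linalg.eigh(L_norm)   # eigenvalues in ascending order
    # Drop the constant eigenvector (eigenvalue 0); keep the next m.
    return lam[1:m + 1], phi[:, 1:m + 1]

# Usage: embed n points in R^d into R^m.
X = np.random.randn(500, 10)
eigvals, embedding = laplacian_eigenmaps(X, k=25, m=2)
```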
Analysis of the performance of Laplacian Eigenmaps generally focuses on the assumption that the data lies on a smooth manifold and that the graph Laplacian approximates the Laplace-Beltrami operator on that manifold [2]. Focus is also given to the similarity kernel applied to the data, in order to generate a more faithful embedding [11]. This paper instead proposes to study these operators in the context of graph theory. Graph theoretic analysis of dimension reduction allows the results to be independent of the distance metric or the local geometry of the data. An example of this is shown in Figure 1. The two data sets differ in geometry, and even in dimension. However, the individual points relate to each other in a similar manner, and both data sets generate virtually identical graph Laplacians. Namely, both graphs consist of two nearly disjoint clusters.

Fig. 1. Two datasets and the adjacency matrices of their data-dependent graphs, sorted into clusters for visualization purposes: (a) Gaussian, (b) Gaussian adjacency, (c) two moon, (d) two moon adjacency. Gaussian adjacency: $\mu = 23.7$, $\sigma = 1.64$; two moon adjacency: $\mu = 23.4$, $\sigma = 2.23$.

These clusters can be described graph theoretically. Let us simplify the context slightly, and assume that the kernel $K(x, y)$ is an indicator function of whether $x$ and $y$ are nearest neighbors. We shall model a cluster on $n$ points as a randomly chosen $k$-regular graph with $n$ nodes. This is because each row of the adjacency matrix has $k$ non-zero entries, due to the $k$-nearest neighbors algorithm for choosing edges. Also, the edges within the cluster will be randomly distributed between nearby points in the cluster, as there is no structural difference between the clustered points beyond noise and slight variability. This means the degree of each clustered point will be $k$, and the distribution of those weights is independent of which point in the cluster is chosen.

Definition I.1. The family of $k$-regular graphs $\mathcal{G}_{n,k}$ is the set of all graphs $G = (V, E)$ with $n$ nodes such that, $\forall x \in V$,
$$\deg(x) \equiv \sum_{\{x,y\} \in E} w_{x,y} = k.$$

We chose $k = 25$ nearest neighbors for the examples in Figure 1. For both graph Laplacians, the two clusters are almost completely disjoint. Also, we report the mean ($\mu$) and standard deviation ($\sigma$) of the degrees of the nodes in the caption of Figure 1. Clearly, the nodes of both graphs have nearly identical degrees, and both graphs are fairly close to being $k$-regular.

For the rest of this paper, we shall examine results about graphs with $k$-regular subgraph clusters, specifically when the clusters are of differing sizes. Also, we shall assume the kernel $K(x, y)$ is an indicator function of whether $x$ and $y$ are nearest neighbors. This allows us to approximate the behavior of Laplacian Eigenmaps by utilizing the vast literature that exists on regular graphs.
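As a quick empirical check of this modeling assumption, one can compute the degree statistics of a symmetrized $k$-nearest-neighbor graph and compare them to a true $k$-regular graph, for which $\mu = k$ and $\sigma = 0$. The sketch below does this for a two-Gaussian dataset in the spirit of Figure 1(a); the cluster sizes and random seed are illustrative.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

rng = np.random.default_rng(0)
# Two Gaussian clusters, loosely mimicking Figure 1(a).
X = np.vstack([rng.normal(0, 1, (1000, 2)),
               rng.normal(5, 1, (1000, 2))])

k = 25
A = kneighbors_graph(X, n_neighbors=k, mode='connectivity').toarray()
A = np.maximum(A, A.T)      # symmetrize as in Section I
deg = A.sum(axis=1)

# A symmetrized kNN graph on clustered data should be nearly k-regular.
print(f"mu = {deg.mean():.1f}, sigma = {deg.std():.2f}")
```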
II. EIGENVECTOR DISTRIBUTION FOR DISJOINT CLUSTERS WITH HETEROGENEOUS SIZES

For Laplacian Eigenmaps, the common assumption is that one only needs to keep the $m \ll d$ smallest eigenvectors to create a faithful embedding. However, the choice of $m$ is commonly overlooked, other than assuming $m$ must be at least as large as the intrinsic dimensionality of the data.

A. Example of Eigenvector Distribution

A general approach to choosing $m$ is deciding on the intrinsic dimension of the data. However, Figure 2 demonstrates that the choice of $m$ is more complicated. The data consists of two clusters in $\mathbb{R}^2$, with cluster $C_1$ containing 10,000 points and cluster $C_2$ containing 1,000 points. Laplacian Eigenmaps is run on this example with a Gaussian kernel and 50 nearest neighbors. Figure 2 shows the eigenvectors with the 14 smallest non-zero eigenvalues. Observe that, due solely to the size difference between the clusters, all but one of the eigenvectors have their entire energy concentrated in $C_1$.

Fig. 2. The first image shows the two original clusters, followed by the support of each eigenvector of the graph Laplacian (1st through 13th). Notice that the first appearance of the smaller cluster does not occur until the 13th eigenvector.

This can be problematic for a number of reasons. For one, there are no intra-cluster features in the data, yet 13 of the first 14 eigenvectors are picking up erroneous features in $C_1$. This can lead to issues when the embedded points $\phi(\Omega)$ are inputs to a clustering algorithm such as $k$-means or support vector machines; these erroneous features are given undue weight in clustering, leading to errors in classification. Second, despite $C_2$ constituting a significant portion of the data, almost all the energy in $\phi(\Omega)$ is concentrated in $C_1$. Again, this poses problems for clustering and classification algorithms. To see this, fix $i_0 \in \{1, ..., m\}$ such that $\mathrm{supp}(\phi_{i_0}) \subset C_1$. This means $\phi_{i_0}(x) = 0$ for $x \in C_2$. However, $\exists x, y \in C_1$ such that $\phi_{i_0}(x) < 0$ and $\phi_{i_0}(y) > 0$. Thus, a separating line on $\phi_{i_0}$ would be unable to differentiate $C_1$ from $C_2$. Since most of the energy of $\phi(\Omega)$ lies in $C_1$, most $i \in \{1, ..., m\}$ satisfy $\mathrm{supp}(\phi_i) \subset C_1$.

B. Eigenvector Distribution for Unions of Regular Graphs

The logic behind the phenomenon in Figure 2 is based on the distribution of eigenvalues of the Laplacian. Specifically, it depends on the interlacing of eigenvalues of $k$-regular graphs. The first significant progress in this problem came over 30 years ago in a paper by McKay [8]. He showed that, given a sequence of regular graphs with the number of nodes tending to infinity, the empirical spectral distribution of the scaled adjacency matrix $\frac{1}{\sqrt{k-1}} A_n$ converges to a semicircle
$$f_d(x) = \frac{1}{2\pi} \sqrt{4 - x^2}, \quad -2 < x < 2. \qquad (2)$$
Following this result, the necessity to avoid cycles was removed in exchange for proving results about random regular graphs. Also, it raised the question of whether such convergence results could be made for finite $n$. It was not until recently that results were proved in this case by Dumitriu and Pal [7].

Theorem II.1 (Theorem 2, [7]). Fix $\delta > 0$, let $k = (\log(n))^\gamma$, and let $\eta = \frac{1}{2}(\exp(k^{-\alpha}) - \exp(-k^{-\alpha}))$ for $0 < \alpha < \min(1, 1/\gamma)$. Then there exists an $N$ large enough such that $\forall n > N$, for $G \in \mathcal{G}_{n,k}$ chosen randomly with adjacency matrix $A$, and for any interval $I \subset \mathbb{R}$ such that $|I| \geq \max\{2\eta, \eta/(-\delta \log \delta)\}$,
$$\left| N_I - n \int_I f_d(x) \, dx \right| < n \delta |I|$$
with probability at least $1 - o(1/n)$. Here, $N_I$ is the number of eigenvalues of $\frac{1}{\sqrt{k-1}} A$ in the interval $I$ and $f_d$ is the semicircle law in (2).

Using Theorem II.1, we begin to address the phenomenon that occurs in Figure 2. We give a theorem characterizing the order in which eigenvalues and eigenvectors concentrated on either $C_1$ or $C_2$ emerge from a graph Laplacian of a disjoint graph.

Theorem II.2. Let $\Gamma = (\Omega, E)$ be an undirected graph. Suppose $\Omega$ can be split into two disjoint clusters $C_1$ and $C_2$ such that, for the subgraph $G_1$ generated by $C_1$ and the subgraph $G_2$ generated by $C_2$, $G_1 \in \mathcal{G}_{n,k}$ and $G_2 \in \mathcal{G}_{n/D,k}$. Furthermore, assume $\nexists \{x, y\} \in E$ such that $x \in C_1$ and $y \in C_2$. Fix $\delta$, $k$, $\alpha$, and $\eta$ as in Theorem II.1. Choose any interval $I \subset [0, 2]$ such that $|I| \geq \frac{\sqrt{k-1}}{k} \max\{2\eta, \eta/(-\delta \log \delta)\}$. Let $L$ denote the graph Laplacian, and $\sigma_1, ..., \sigma_m$ denote the $m$ eigenvalues of $L$ that lie in $I$. Then there exists an orthonormal basis $\{v_1, ..., v_m\}$ of associated eigenvectors such that, if $N_I^1 = |\{i : \mathrm{supp}(v_i) \subset C_1\}|$ and $N_I^2 = |\{i : \mathrm{supp}(v_i) \subset C_2\}|$, then $N_I^1 + N_I^2 = m$ and there exists some $N$ such that $\forall n > N$,
$$|N_I^1 - D N_I^2| \leq 2 \delta n \frac{k}{\sqrt{k-1}} |I|$$
with probability at least $1 - o(1/n)$ over the choice of subgraphs $G_1$ and $G_2$. Moreover, $m$ satisfies
$$\left| m - \left( n + \frac{n}{D} \right) \int_I f_d(x) \, dx \right| < \delta \left( n + \frac{n}{D} \right) \frac{k}{\sqrt{k-1}} |I|.$$
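A direct numerical illustration of Theorem II.2 is to sample two disjoint random regular graphs of sizes $n$ and $n/D$ and count how many low-lying Laplacian eigenvectors are supported on each cluster. The sketch below does this; the sizes $n = 2000$ and $D = 10$, the number of eigenvectors examined, and the 0.5 mass threshold are assumptions made for the demonstration.

```python
import numpy as np
import networkx as nx

n, D, k = 2000, 10, 25
G1 = nx.random_regular_graph(k, n, seed=1)       # cluster C1
G2 = nx.random_regular_graph(k, n // D, seed=2)  # cluster C2
G = nx.disjoint_union(G1, G2)                    # C1 nodes come first

# Normalized Laplacian D^{-1/2} (D - A) D^{-1/2}; for a disjoint union,
# each eigenvector is supported entirely on one cluster or the other.
L = nx.normalized_laplacian_matrix(G).toarray()
lam, V = np.linalg.eigh(L)

m = 50  # examine the m smallest non-trivial eigenvectors
counts = {1: 0, 2: 0}
for i in range(2, 2 + m):          # skip the two constant eigenvectors
    mass_C1 = np.sum(V[:n, i] ** 2)
    counts[1 if mass_C1 > 0.5 else 2] += 1

# Theorem II.2 predicts N^1 is close to D * N^2, so for D = 10 the split
# among 50 eigenvectors should be roughly {1: 45, 2: 5}.
print(counts)
```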
III. WEAKLY CONNECTED CLUSTERS WITH HETEROGENEOUS SIZES

In real datasets, it is unlikely that clusters are disjoint. However, Theorem II.2 begins to describe eigenvector localization for Laplacian Eigenmaps. The next question that arises concerns the behavior of weakly connected clusters with heterogeneous sizes, i.e., graphs with a small number of edges between the clusters. This characterizes a larger and more realistic class of data analysis problems.

Definition III.1. A graph with weakly connected clusters of order $t$ is a connected graph with adjacency matrix
$$A = \begin{pmatrix} A_1 & B_{1,2} \\ B_{1,2}^\top & A_2 \end{pmatrix},$$
where $B_{1,2}$ has $t$ non-zero entries, and $A_1$ and $A_2$ are adjacency matrices of $k$-regular graphs.

We now restate the eigenvector localization problem on a graph with weakly connected clusters as a problem of matrix perturbation. Consider two graphs $H$ and $G$, where $H$ is a disjoint regular graph that satisfies the assumptions of Theorem II.2, and $G$ is a graph with weakly connected clusters of order $t$. In other words,
$$A_H = \begin{pmatrix} A_1 & 0 \\ 0 & A_2 \end{pmatrix} \quad \text{and} \quad A_G = \begin{pmatrix} A_1 & B_{1,2} \\ B_{1,2}^\top & A_2 \end{pmatrix}.$$
Then $A_G = A_H + B$, where $B$ is a block $2 \times 2$, $2t$-sparse adjacency matrix that only has terms on the block off-diagonal. Clearly, one can see $A_G$ as a perturbed version of $A_H$, and the eigenvalues and eigenvectors of $A_H$ are completely characterized by Theorem II.2. This makes perturbation theory a valid approach to showing that the eigenvalues and eigenvectors of $A_G$ (and the graph Laplacian $L_G$) do not deviate much from the known quantities of $A_H$.

A. Eigenvalue Distribution

First, we shall consider the eigenvalue distribution of this new perturbed matrix. Theorem III.2 is a variant of Weyl's inequality, which says the eigenvalues of a perturbed matrix $A + E$ deviate at worst by the largest eigenvalue of the error matrix $E$ [14]. This variant, which relies heavily on results from [4] for normalized graph Laplacians, relates specifically to graphs with weakly connected clusters.

Theorem III.2. Let $\Gamma = (\Omega, E)$ be a graph with weakly connected clusters of order $t$, such that one cluster is of size $n$ and the other cluster is of size $n/D$. Fix $\delta$, $k$, $\alpha$, $\eta$, and $I$ as in Theorem II.2. Let $L$ denote the graph Laplacian, and $\sigma_1, ..., \sigma_m$ denote the $m$ eigenvalues of $L$ that lie in $I$. Then $m$ satisfies
$$\left| m - \left( n + \frac{n}{D} \right) \int_I f_d(x) \, dx \right| < \delta \left( n + \frac{n}{D} \right) \frac{k}{\sqrt{k-1}} |I| + 2t,$$
again with probability at least $1 - o(1/n)$.

B. Eigenvector Distribution

Now, we shall consider the eigenvector distribution of a graph with weakly connected clusters by considering it as a matrix perturbation problem. Davis and Kahan [6] were the first to give general theorems relating the invariant subspaces of two Hermitian matrices. These results were extended by Stewart [12] via an iterative process for generating the invariant subspaces. These theories center around the distribution of eigenvalues and eigenvalue gaps.

Definition III.3. The eigenvalue separation of two $n \times n$ matrices $A$ and $B$ with spectra $\sigma(A) = \{\lambda_1^A, ..., \lambda_n^A\}$ and $\sigma(B) = \{\lambda_1^B, ..., \lambda_n^B\}$ is defined as
$$\mathrm{sep}(A, B) = \min_{i,j} |\lambda_i^A - \lambda_j^B|.$$

Theorem III.4 (Theorem 4.11, [12]). Let $A, E \in \mathbb{C}^{n \times n}$. Let $X = [X_1, X_2]$ be a unitary matrix with $X_1 \in \mathbb{C}^{n \times l}$, and suppose $\mathcal{R}(X_1)$ is an invariant subspace of $A$. Let
$$X^* A X = \begin{pmatrix} A_{1,1} & A_{1,2} \\ 0 & A_{2,2} \end{pmatrix}, \quad X^* E X = \begin{pmatrix} E_{1,1} & E_{1,2} \\ E_{2,1} & E_{2,2} \end{pmatrix}.$$
Let $\delta = \mathrm{sep}(A_{1,1}, A_{2,2}) - \|E_{1,1}\| - \|E_{2,2}\|$. Then, if
$$\frac{\|E_{2,1}\| \left( \|A_{1,2}\| + \|E_{1,2}\| \right)}{\delta^2} \leq \frac{1}{4}, \qquad (3)$$
there is a matrix $P$ satisfying $\|P\| \leq 2 \frac{\|E_{2,1}\|}{\delta}$ such that
$$\widetilde{X}_1 = (X_1 + X_2 P)(I + P^* P)^{-1/2} \qquad (4)$$
spans an invariant subspace of $A + E$.

Theorem III.4 gives a sufficient condition guaranteeing that an eigenspace $X_1$ remains relatively preserved under perturbation.
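To make the quantities in Theorem III.4 concrete, the following sketch evaluates the separation, checks condition (3), and reports the resulting bound on $\|P\|$ for a symmetric matrix $A$ whose invariant subspace is spanned by a block of eigenvectors. Working in the eigenbasis of $A$ (so that $A_{1,2} = 0$), the block size $l$ and the random test matrices are illustrative.

```python
import numpy as np

def stewart_bound(A, E, l):
    """Evaluate Stewart's condition (3) for the invariant subspace spanned
    by the eigenvectors of the l smallest eigenvalues of symmetric A."""
    lam, X = np.linalg.eigh(A)
    X1, X2 = X[:, :l], X[:, l:]
    # In this basis X*AX is diagonal, so A_{1,2} = 0 and sep(A11, A22)
    # is the smallest gap between the two groups of eigenvalues.
    sep = np.min(np.abs(lam[:l][:, None] - lam[l:][None, :]))
    E11, E12 = X1.T @ E @ X1, X1.T @ E @ X2
    E21, E22 = X2.T @ E @ X1, X2.T @ E @ X2
    delta = sep - np.linalg.norm(E11, 2) - np.linalg.norm(E22, 2)
    lhs = np.linalg.norm(E21, 2) * np.linalg.norm(E12, 2) / delta**2
    ok = delta > 0 and lhs <= 0.25          # condition (3)
    P_bound = 2 * np.linalg.norm(E21, 2) / delta if ok else np.inf
    return ok, P_bound

# Illustrative usage with a random symmetric pair (A, E).
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 50)); A = (A + A.T) / 2
E = 1e-3 * rng.standard_normal((50, 50)); E = (E + E.T) / 2
print(stewart_bound(A, E, l=5))
```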
Under the condition that $A$ is a graph with weakly connected clusters and $\mathrm{sep}(A_{1,1}, A_{2,2}) \neq 0$, Theorem III.4 gives us bounds on the individual eigenvectors under perturbation. This type of theorem is an approach to showing that the eigenvectors of a graph with weakly connected clusters remain localized; it implies that greater eigenvalue separation leads to better eigenvector localization. However, the conditions that need to be satisfied are too strict for our problem, given that $\sigma(L_G) \subset [0, 2]$ regardless of the number of points.

To demonstrate this disparity between theory and example, consider the two moons dataset from Figure 1(c). In this dataset, there are 7 edges connecting $C_1$ and $C_2$, with $|C_1| = 1989$ and $|C_2| = 211$, meaning $|C_1| = D \cdot |C_2|$ for $D = 9.4$. We shall examine the smallest 10% of eigenvalues and their associated eigenvectors, as these are the vectors that are most commonly chosen for the Laplacian Eigenmaps algorithm. Let $L_G$ be the Laplacian of the graph with weakly connected clusters, and $L_H$ be the Laplacian of the graph with disjoint clusters. Let $\{v_1, ..., v_{200}\}$ be the eigenvectors of $L_G$ and $\{w_1, ..., w_{200}\}$ be the eigenvectors of $L_H$. Figure 3 plots $\langle v_i, w_i \rangle$ for $i \in \{1, ..., 200\}$. Clearly, there is a large discrepancy between theory and practice. Theorem III.4 predicts that only 26 eigenvectors satisfy the spectral gap assumptions necessary to guarantee that (4) holds with $\|P\| < 1$, which would imply $\langle v_i, w_i \rangle > \frac{\sqrt{2}}{2}$. However, 180 of the eigenvectors actually satisfy $\langle v_i, w_i \rangle > \frac{\sqrt{2}}{2}$.

Fig. 3. (a) Actual vector angles $\langle v_i, w_i \rangle$ for the first 200 eigenvectors of the data from Figure 1(c), versus (b) the vector angles predicted by the spectral gap of Theorem III.4.

A more enlightening depiction of this discrepancy can be seen in Figure 4. This is another plot of the vector angles (the same as Figure 3(a)), except now the indices $\{i : \mathrm{supp}(w_i) \subset C_2\}$ are marked with a vertical line. Recall from Theorem II.2 that this occurs on average once out of every $D$ indices.

Fig. 4. Vector angles for the first 200 eigenvectors of the data from Figure 1(c), with green vertical lines denoting eigenvalues for which $\lambda_i \in \{\lambda_i : \mathrm{supp}(w_i) \subset C_2\}$. Blue dots: $\{\langle w_i, v_i \rangle : \mathrm{supp}(w_i) \subset C_1\}$; red dots: $\{\langle w_i, v_i \rangle : \mathrm{supp}(w_i) \subset C_2\}$.

Notice that the only deviation $\langle v_i, w_i \rangle$ makes from being close to 1 occurs on or near the indices for which $\mathrm{supp}(w_i) \subset C_2$. This suggests why Theorem III.4 is not sufficient for the current setting. Theorem III.4 gives a condition under which $\|P\| < 1$. However, it does not speak to which eigenvectors from $X_2$ contribute to $\widetilde{X}_1$ in (4), regardless of whether (3) is violated. Figure 4 suggests that the eigenvectors from $X_2$ that contribute to $\widetilde{X}_1$ are exactly those that are nearest in eigenvalue. This is why points near a vertical line for $\lambda_i \in \{\lambda_i : \mathrm{supp}(w_i) \subset C_2\}$ are less robust to perturbation. This leads to the following theorem.

Theorem III.5. Let $A$ be a symmetric $n \times n$ matrix with eigendecomposition $A = V \Sigma V^*$. Let $(\lambda_i, v_i)$ be an eigenpair of $A$. Partition $V$ by ordering the eigenvalues such that $V = [V_1, V_2, v_i, V_3, V_4]$, where $V_2, V_3 \in \mathbb{R}^{n \times s}$. Moreover, assume $\exists C \subsetneq \{1, ..., n\}$ such that $\mathrm{supp}(v_i) \subset C$ and $\mathrm{supp}(v_j) \subset C$ for every column $v_j$ of $V_2$ and $V_3$. Let $(\widetilde{\lambda}, x)$ be an eigenpair of the perturbed matrix $A + E$, where $x = [x_1, ..., x_n]$. Then
$$\sum_{j \in C^c} |x_j|^2 \leq \frac{\|(\widetilde{\lambda} - \lambda_i) x - E x\|^2}{\min(\lambda_i - \lambda_{i-s}, \lambda_{i+s} - \lambda_i)^2}.$$

Theorem III.5 demonstrates that, when there exists a series of eigenvectors concentrated on a subset of the points $C$, the concentration of the new, corresponding eigenvectors is inversely proportional to the square of the eigenvalue gap. It is related to results from [10]. Using Theorem III.5, we attempt to predict the number of eigenvectors from Figure 1(c) that remain concentrated in the appropriate cluster. Recall that 180 of the first 200 eigenvectors satisfied $\langle v_i, w_i \rangle > \frac{\sqrt{2}}{2}$, which is a similar condition to predicting that $\sum_{j \in C^c} |x_j|^2 < 0.5$. Theorem III.5 predicts that 130 of the eigenvectors will satisfy $\sum_{j \in C^c} |x_j|^2 < 0.5$ and remain concentrated in their respective clusters. While this is less than the 180 that actually remain localized, the prediction of 130 is far better than the prediction of 26 obtained from Theorem III.4. More importantly, of those 130 predicted eigenvectors, 127 are concentrated on the larger cluster $C_1$; only 3 are concentrated on the smaller cluster $C_2$.
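The bound in Theorem III.5 is straightforward to evaluate numerically. The sketch below compares the leaked mass $\sum_{j \in C^c} |x_j|^2$ of a perturbed eigenvector against the bound, for a disjoint two-cluster Laplacian perturbed by a few cross-cluster edges. Several simplifications are assumed here: $s = 1$, perturbed and unperturbed eigenpairs are matched by sorted index, and the theorem's hypothesis that the neighboring eigenvectors share the support $C$ is not verified explicitly.

```python
import numpy as np
import networkx as nx

# Disjoint union of two regular graphs, then add a few cross edges.
n1, n2, k = 1000, 100, 25
H = nx.disjoint_union(nx.random_regular_graph(k, n1, seed=1),
                      nx.random_regular_graph(k, n2, seed=2))
G = H.copy()
G.add_edges_from([(i, n1 + i) for i in range(7)])   # t = 7 weak links

LH = nx.normalized_laplacian_matrix(H).toarray()
LG = nx.normalized_laplacian_matrix(G).toarray()
E = LG - LH                       # the perturbation

lamH, W = np.linalg.eigh(LH)
lamG, V = np.linalg.eigh(LG)

i, s = 10, 1                      # eigenpair index and partition width
C = np.abs(W[:, i]) > 1e-10       # support of the unperturbed eigenvector
x, lam_t = V[:, i], lamG[i]       # matched perturbed eigenpair

leak = np.sum(x[~C] ** 2)
gap = min(lamH[i] - lamH[i - s], lamH[i + s] - lamH[i])
bound = np.linalg.norm((lam_t - lamH[i]) * x - E @ x) ** 2 / gap ** 2
print(f"leaked mass {leak:.3e} <= bound {bound:.3e}")
```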
IV. RESULTS OF EIGENVECTOR CONCENTRATION THEOREMS

A. Interpretation of Results

Let us consider the results of Theorem III.5. This eigenvector concentration result, along with the disjoint graph results from Theorem II.2, suggests a negative result for differentiating small clusters $C_2$ from a larger background cluster $C_1$ using Laplacian Eigenmaps. These results suggest that small clusters are forced to 0 for most eigenvectors of the graph Laplacian. This makes classification, and especially determining intra-cluster differences in $C_2$, very difficult. On top of that, Theorem III.5 suggests that even if $L_H$ has an eigenvector $w_i$ supported on $C_2$, its corresponding eigenvector $v_i$ of $L_G$ for the graph with weakly connected clusters may not remain supported on $C_2$. This is because, while $\mathrm{supp}(w_i) \subset C_2$, $\mathrm{supp}(w_{i-1})$ and $\mathrm{supp}(w_{i+1})$ are most likely concentrated on $C_1$ due to Theorem II.2.

B. Experimental Demonstration of Results

The theorems throughout this paper are descriptive in nature, meaning the goal is to characterize the embeddings. The main takeaway from these results is that the current method of selecting the number $m$ of eigenvectors kept fails to take into account the uneven distribution of eigenvectors concentrated on small clusters of anomalous data. However, it is still possible to experimentally verify this descriptive theory and possibly gain an algorithmic advantage.

Consider a very noisy version of the two moons dataset (see Figure 5). We shall call the graph $G = (\Omega, E)$. Once again, cluster $C_1$ is much larger than cluster $C_2$; specifically, we pick $|C_1| = 9{,}500$ and $|C_2| = 500$. One question is how to separate these two clusters in an optimal way. One approach would be to use a graph cut algorithm, which finds a set $V \subset \Omega$ that minimizes
$$\mathrm{RatioCut}(\Omega) = \arg\min_{V \subset \Omega} \frac{|E(V, V^c)|}{|V| \cdot |V^c|},$$
where $E(V, V^c) = \{\{x, y\} \in E : x \in V, y \in V^c\}$. While this problem is non-convex, a common approximation involves thresholding the second eigenvector of the graph Laplacian via the function $\mathrm{sgn}(\phi_2(\Omega) - \overline{\phi_2(\Omega)})$, where $\overline{\phi_2(\Omega)}$ is the mean of the second eigenvector [13]. Unfortunately, even though $V = C_2$ would be the correct split of the data, $|C_2|$ is too small to achieve the minimum of RatioCut. This is reflected in Figure 5.

Instead, we can take into account the knowledge that anomalous clusters are forced to zero for most eigenvectors $\phi_i$. One simply calculates a new scoring function
$$\rho_{m,\epsilon}(x) = \#\{i \in \{1, ..., m\} : |\phi_i(x)| < \epsilon\}, \qquad (5)$$
and uses $\rho_{m,\epsilon} : \Omega \to \mathbb{R}$ to cluster the data points into two clusters.
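A minimal sketch of this segmentation rule, next to the second-eigenvector graph cut it is compared against below: points in an anomalous cluster are near zero in most eigenvectors, so they receive a high score $\rho_{m,\epsilon}$. The eigenvector matrix `phi` is assumed to come from the Laplacian Eigenmaps computation sketched in Section I, and splitting the scores at $m/2$ is our illustrative choice, not part of (5) itself.

```python
import numpy as np

def rho_score(phi, m, eps):
    """Scoring function (5): for each point x, count how many of the
    first m eigenvectors are (numerically) zero at x."""
    return np.sum(np.abs(phi[:, :m]) < eps, axis=1)

def graph_cut_labels(phi):
    """Baseline [13]: threshold the second eigenvector at its mean.
    phi excludes the constant eigenvector, so column 0 is the second."""
    return np.sign(phi[:, 0] - phi[:, 0].mean())

# phi: n x m matrix of non-constant eigenvectors from Laplacian Eigenmaps.
# Points scoring near m lie in the anomalous cluster, e.g. split at m/2:
# anomalous = rho_score(phi, m=50, eps=1e-3) > 25
```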
We compare this scoring function to eigenvector thresholding for the graph cut in Figure 5. The scoring function in (5) clearly gives the correct clustering, with an error rate of only 0.03%.

Fig. 5. Class separation for the two moons example with $|C_1| = 9{,}500$ and $|C_2| = 500$: (a) graph cut, (b) clusters generated by $\rho_{50, 0.001}$. The clusters generated by $\rho_{50, 0.001}$ incorrectly classified 3 of the 10,000 points in the example.

V. EXAMPLES

Throughout this paper, we have used the synthetic example of the two moons dataset from Figure 1(c). However, Theorems II.2 and III.5 are completely general within the aforementioned class of graphs with weakly connected clusters. To demonstrate this, we shall examine a radiation detection dataset collected by the Unmanned Systems Lab (USL) at Virginia Tech using systems mounted on an unmanned aerial vehicle (UAV), a Yamaha RMAX helicopter [3]. The UAV collected a low-altitude aerial mapping of spectral data. During the experiment, two radioactive sources were present: 0.084 Ci of $^{137}$Cs and 0.00048 Ci of $^{133}$Ba. The flight path, as well as the location of the strongest radiological spectra, can be seen in Figure 6(a).

Using this data, we built a 10 nearest neighbor graph based on Euclidean distances between the radiological spectra at each location. The first eigenvector concentrated on the anomalous cluster emerges as the third smallest eigenvector, but the next emergence of such an eigenvector does not occur until the 81st smallest. This is consistent with our theory, given that $|C_1|/|C_2| = 50$. Figure 6(b) shows the graph cut generated by the smallest non-zero eigenvector, and Figure 6(c) shows a class separator generated by (5) on 25 eigenvectors. Using (5) gives a perfect classifier for the anomalous cluster, whereas the graph cut does not reflect the anomalous cluster in any way.

Fig. 6. (a) Data points from the flight path; the points in red mark the locations of the anomalous radiological signature. (b) Nuclear data class separator using the graph cut. (c) Nuclear data class separator using (5), i.e., $\rho_{25, 0.001}(x)$.

VI. CONCLUSIONS

This paper proves that Laplacian Eigenmap embeddings rarely concentrate on small, anomalous data clusters. This affects the choice of $m$ in the algorithm, and it also shifts most of the energy in the embedding to background clusters. However, it is possible to exploit this characteristic by using (5) to create a segmentation function that easily differentiates small clusters. We plan to extend this analysis to other dimension reduction techniques in future work.

REFERENCES

[1] M. Belkin and P. Niyogi. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation, 15(6):1373–1396, 2003.
[2] M. Belkin and P. Niyogi. Convergence of Laplacian eigenmaps. In NIPS, pages 129–136, 2006.
[3] J. Benedetto, A. Cloninger, W. Czaja, T. Doster, K. Kochersberger, B. Manning, T. McCullough, and M. McLean. Operator-based integration of information in multimodal radiological search mission with applications to anomaly detection. Proc. SPIE, 9073:90731A, 2014.
[4] G. Chen, G. Davis, F. Hall, Z. Li, K. Patel, and M. Stewart. An interlacing result on normalized Laplacians. SIAM J. Discrete Math., 18(2):353–361, 2005.
[5] R. R. Coifman and S. Lafon. Diffusion maps. Applied and Computational Harmonic Analysis, 21(1):5–30, 2006.
[6] C. Davis and W. Kahan. The rotation of eigenvectors by a perturbation III. SIAM Journal on Numerical Analysis, 7(1):1–46, 1970.
[7] I. Dumitriu and S. Pal. Sparse regular random graphs: Spectral density and eigenvectors. The Annals of Probability, 40(5):2197–2235, 2012.
[8] B. McKay. The expected eigenvalue distribution of a large regular graph. Linear Algebra and its Applications, 40:203–216, 1981.
[9] S. T. Roweis and L. K. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500):2323–2326, 2000.
[10] A. Ruhe. Perturbation bounds for means of eigenvalues and invariant subspaces. BIT Numerical Mathematics, 10:343–354, 1970.
[11] A. Singer and R. R. Coifman. Non-linear independent component analysis with diffusion maps. Applied and Computational Harmonic Analysis, 25(2):226–239, 2008.
[12] G. W. Stewart and J. G. Sun. Matrix Perturbation Theory. Academic Press, 1990.
[13] A. Szlam and X. Bresson. A total variation-based graph clustering algorithm for Cheeger ratio cuts. UCLA CAM Report 09-68, 2009.
[14] H. Weyl. Das asymptotische Verteilungsgesetz der Eigenwerte linearer partieller Differentialgleichungen. Math. Ann., 71:441–479, 1912.