Robust Scalable Visualized Clustering in Vector and non Vector Semimetric Spaces, G. Fox - Parallel Processing Letters, 2013 - World Scientific

The Summary

Mashar Cenk Gencal
School of Informatics and Computing, Indiana University, Bloomington
mcgencal@indiana.edu

Abstract

In this paper, a new approach to data analytics for large systems is presented, based on robust parallel algorithms that run on clouds and HPC systems. Both the case where the data is described in a vector space and the case where only pairwise distances between points are given are covered. Moreover, several known algorithms are improved in functionality, features, and performance. Furthermore, the paper discusses steering complex analytics, where visualization is important both for the non-vector semimetric case and for clustering in high-dimensional vector spaces. Deterministic annealing is used to obtain fairly fast, robust algorithms. The methods are applied to several life-science applications.

1 Introduction

The publication first defines deterministic annealing and then considers clustering, with a highlight on features that are not found in openly available software such as R. After that, the difficulty of parallelization for heterogeneous geometries is discussed. Finally, in the last section, it is suggested that a data mining algorithm should be used together with some form of visualization, since most datasets do not show by themselves whether an analysis is adequate. For this reason, it is advised that "The well studied area of dimension reduction deserves more systematic use and show how one can map high dimension vector and non vector spaces into 3 dimensions for this purpose." [1]

2 Deterministic Annealing

Before explaining the idea in the publication, we need to explain deterministic annealing, the Gibbs distribution, and the free energy F.

2.1 Gibbs Distribution, Free Energy

Gibbs measure:

    Pr(x ∈ C_k) = exp(−β E_x(k)) / Z_x                                  (1)

where β = 1/T, x is a point, C_k is the k-th cluster, E_x(k) is the energy function, and Z_x is the partition function,

    Z_x = Σ_{k=1..K} exp(−β E_x(k))                                     (2)

- if β → 0, each point is equally associated with all clusters.
- if β → ∞, each point belongs to exactly one cluster.

Total partition function:

    Z = Π_x Z_x                                                         (3)

Expected energy of x:

    E_x = Σ_{k=1..K} Pr(x ∈ C_k) E_x(k)                                 (4)

Thus, the total energy is

    E = Σ_x E_x                                                         (5)

If F is the effective cost function (free energy), then from equation (3),

    Z = exp(−βF), so F = −(1/β) log Z = −(1/β) Σ_x log Z_x              (6)

Therefore, if β → ∞, then F = E. For clustering we take

    E_x(k) = |x − y_k|²,                                                (7)

where y_k is the representative (center) of C_k. Thus, from equations (1), (2) and (7), the probability that x belongs to C_k is

    Pr(x ∈ C_k) = exp(−β|x − y_k|²) / Σ_{k'=1..K} exp(−β|x − y_k'|²)    (8)

From equations (6) and (8),

    F = −(1/β) Σ_x log( Σ_{k=1..K} exp(−β|x − y_k|²) )                  (9)

Setting the derivative of this function to zero gives

    ∂F/∂y_k = 0  ⇒  y_k = Σ_x x Pr(x ∈ C_k) / Σ_x Pr(x ∈ C_k)

Solving this condition self-consistently by iteration gives y^(n+1) = f(y^(n)), where f is given by

    y_k^(n+1) = [ Σ_x x · exp(−β|x − y_k^(n)|²) / Σ_{j=1..K} exp(−β|x − y_j^(n)|²) ]
              / [ Σ_x exp(−β|x − y_k^(n)|²) / Σ_{j=1..K} exp(−β|x − y_j^(n)|²) ],  ∀k    (10)

2.2 Deterministic Annealing Algorithm

1. Set β = 0.
2. Choose an arbitrary set of initial cluster means.
3. Iterate according to equation (10) until converged.
4. Increase β.
5. If β < β_max, go to step 3.
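To make the algorithm above concrete, the following is a minimal Python/NumPy sketch of the annealing loop, iterating the fixed point of equation (10) at each temperature with the squared Euclidean energy of equation (7). The function name and the schedule constants (initial β, growth factor, convergence tolerance) are illustrative assumptions, not values taken from the paper.

    import numpy as np

    def da_clustering(X, K, beta0=0.01, beta_max=1e3, growth=1.05, tol=1e-6):
        """Deterministic annealing clustering: anneal beta upward,
        iterating the fixed point of eq. (10) to convergence at each beta."""
        N, d = X.shape
        rng = np.random.default_rng(0)
        Y = X[rng.choice(N, size=K, replace=False)].copy()  # step 2: arbitrary initial means
        beta = beta0                                        # step 1: near-zero beta
        while beta < beta_max:                              # step 5
            while True:                                     # step 3: iterate eq. (10)
                # eq. (8): soft assignments Pr(x in C_k), computed stably
                d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)  # N x K
                logits = -beta * d2
                logits -= logits.max(axis=1, keepdims=True)
                P = np.exp(logits)
                P /= P.sum(axis=1, keepdims=True)
                # eq. (10): centers become probability-weighted means
                Y_new = (P.T @ X) / P.sum(axis=0)[:, None]
                converged = np.abs(Y_new - Y).max() < tol
                Y = Y_new
                if converged:
                    break
            beta *= growth                                  # step 4: increase beta
        return Y, P

At very small β the assignments are nearly uniform, so all centers start near the overall data mean; raising β gradually sharpens the assignments, which is exactly the annealing behavior described above.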
2.3 The Idea in The Publication

In the publication, optimization problems are treated by finding global minima of energy functions. Since the Monte Carlo approach of simulated annealing is too slow, the integrals are carried out analytically, using various approximations from statistical physics. The Hamiltonian H(χ) is to be minimized over the variables χ. The idea is to average with the Gibbs distribution at temperature T; let P(χ) denote the probability distribution for χ at temperature T,

    P(χ) = exp(−H(χ)/T) / ∫ dχ exp(−H(χ)/T)                             (11)

or

    P(χ) = exp(−H(χ)/T + F/T)                                           (12)

and let S(P) denote the entropy associated with the probability distribution P(χ),

    S(P) = −P(χ) ln P(χ)                                                (13)

The free energy F, built from the objective function and the entropy, is then minimized:

    F = ⟨H − T S(P)⟩ = ∫ dχ [P(χ)H + T P(χ) ln P(χ)] = −T ln ∫ dχ exp(−H(χ)/T)   (14)

At each iteration, the temperature is lowered by a factor between 0.95 and 0.9995. Since the minimization is difficult in some cases, e.g. vector-space clustering and mixture models, an approximating Hamiltonian H0(χ, ε) is introduced:

    P0(χ) = exp(−H0(χ, ε)/T + F0/T)                                     (15)

    F(P0) = ⟨H − T S0(P0)⟩|0 = ⟨H − H0⟩|0 + F0(P0),
    where F0(P0) = −T ln ∫ dχ exp(−H0(χ, ε)/T)                          (16)

Here ⟨…⟩|0 denotes averaging with the Gibbs distribution P0(χ), i.e. ∫ dχ P0(χ)·(…), and ε is held fixed in these integrals. Since P minimizes F(P), we have

    F(P) ≤ F(P0)                                                        (17)

This bound is the basis of the EM method, which calculates the expected value of the log-likelihood function and then finds the parameters maximizing that quantity. Therefore,

    ⟨χ⟩ = ⟨χ⟩|0 = ∫ dχ χ P0(χ)                                          (18)

For the new Hamiltonian there are three sets of variables: ε, χ, θ. It is described that "The first variable set ε are used to approximate the real Hamiltonian H(χ, θ) by H0(ε, χ, θ); the second variable set χ are subject to annealing while one can follow the determination of χ by finding yet other parameters θ optimized by traditional methods." [1]

It is noted that the deterministic annealing idea has some difficulties, such as how to choose the approximating Hamiltonian and its parameters. It is also challenging to minimize the free energy of equation (16).

3 Vector Space Clustering

3.1 Basic Central or Vector Space Clustering

Assume a clustering problem with N points labeled by x and K clusters labeled by k. Let M_x(k) be the probability that point x belongs to cluster k, with the constraint

    Σ_{k=1..K} M_x(k) = 1 for each x.

Then

    H0 = Σ_{x=1..N} Σ_{k=1..K} M_x(k) ε_x(k)                            (19)

where

    ε_x(k) = (X(x) − Y(k))²                                             (20)

with X(x) the position of point x and Y(k) the center of cluster k. Clearly, equation (19) can be written as

    H_central = H0 = Σ_{x=1..N} Σ_{k=1..K} M_x(k) (X(x) − Y(k))²        (21)

Rewriting equations (15) and (16) for this case,

    P0(χ) = exp(−H0(χ, ε)/T + F/T)                                      (22)

    F(P0) = −T ln Σ_χ exp(−H0(χ, ε)/T)                                  (23)

Applying equations (21) and (23) to equation (22), the distribution factorizes over the points:

    P0(χ) = Π_{x=1..N} exp(−Σ_{k=1..K} M_x(k) ε_x(k)/T) / Σ_{k=1..K} exp(−ε_x(k)/T)   (24)

The optimal vectors {y_v} are derived by maximizing the entropy of the Gibbs distribution [5]:

    y_α = argmax_{y_v} ( −Σ P0 ln P0(χ) )
        = argmax_{y_v} ( Σ_χ H0(χ, ε) P0(χ)/T + Σ_{x=1..N} ln Σ_{k=1..K} exp(−ε_x(k)/T) )   (25)

Taking the derivative of equation (25) and setting the result to zero gives

    0 = Σ_{x=1..N} ⟨M_x(k)⟩ ∂ε_x(k)/∂y_v,  where
    ⟨M_x(k)⟩ = exp(−ε_x(k)/T) / Σ_{k'=1..K} exp(−ε_x(k')/T)             (26)

This is the E step; the cluster centers Y(k) are then easily determined in the M step [1]:

    Y(k) = Σ_{x=1..N} ⟨M_x(k)⟩ X(x) / Σ_{x=1..N} ⟨M_x(k)⟩               (27)
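Equations (26) and (27) are precisely an E step and an M step. The following minimal sketch spells them out in Python/NumPy, together with the free energy of equation (23) specialized to the central Hamiltonian (21), which can be monitored during the iteration; the helper names (e_step, m_step, free_energy) are my own choices, not the paper's.

    import numpy as np

    def e_step(X, Y, T):
        """eq. (26): <M_x(k)> as a softmax of -eps_x(k)/T, with eps_x(k) = (X(x)-Y(k))^2."""
        eps = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)  # N x K
        logits = -eps / T
        logits -= logits.max(axis=1, keepdims=True)               # numerical stability
        M = np.exp(logits)
        return M / M.sum(axis=1, keepdims=True)

    def m_step(X, M):
        """eq. (27): Y(k) as the <M_x(k)>-weighted mean of the points."""
        return (M.T @ X) / M.sum(axis=0)[:, None]

    def free_energy(X, Y, T):
        """eq. (23)/(9): F = -T * sum_x log sum_k exp(-eps_x(k)/T), via log-sum-exp."""
        eps = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
        m = (-eps / T).max(axis=1)
        return -T * (m + np.log(np.exp(-eps / T - m[:, None]).sum(axis=1))).sum()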
The non-vector case is not discussed in detail. However, we have the equation

    H_nonvector = Σ_{x=1..N} Σ_{y=1..N} d(x, y) Σ_{k=1..K} M_x(k) M_y(k) / C(k)   (28)

where C(k) = Σ_{x=1..N} M_x(k) is the number of points in the k-th cluster and Σ_{k=1..K} C(k) = N is the total number of points.

3.2 Fuzzy Clustering and K-means

If the temperature is very high, the exponent in equation (26) goes to zero and all centers have equal probability. In that case, all centers take the same position,

    Y(k) = Σ_{x=1..N} X(x) / N                                          (29)

As the temperature decreases, the points gradually attach more closely to one center, and in the zero-temperature limit the K-means solution is recovered by assigning each point to its nearest center (both limits are illustrated in the short numeric check below). Figure 1 illustrates the idea behind annealing: at high (infinite) temperature the objective function is smooth, so no false minima are encountered and the free energy is minimized. If the temperature is then decreased slowly, we approach the true minima at lower temperatures. Hence, annealing is robust as long as the temperature is lowered slowly enough. [1] In the figure, temperatures T2 and T3 show the false minima that appear at finite temperature. This process is approximated in deterministic annealing; the approximation consists of modifying the integrals by evaluating functions at their mean values.

3.3 Multiscale (Hierarchical) and Continuous Clustering
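The following small numeric check, reusing e_step and m_step from the sketch in Section 3.1 on invented two-cluster data, illustrates the two limits discussed in Section 3.2: at high temperature the assignments are nearly uniform and all centers collapse to the data mean of equation (29), while as T → 0 the assignments become hard and each point goes to its nearest center, as in K-means.

    import numpy as np

    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(-2.0, 0.3, (50, 2)),   # blob around (-2, -2)
                   rng.normal(+2.0, 0.3, (50, 2))])  # blob around (+2, +2)
    Y = X[rng.choice(len(X), size=2, replace=False)]

    M_hot = e_step(X, Y, T=1e6)    # high T: exponent ~ 0, so <M_x(k)> ~ 1/K
    print(M_hot[0])                # -> approximately [0.5, 0.5]
    print(m_step(X, M_hot))        # -> both rows near the data mean, eq. (29)

    M_cold = e_step(X, Y, T=1e-6)  # T -> 0: essentially one-hot assignments
    print(M_cold[0])               # -> nearest center wins, the K-means limit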