Hierarchical Affinity Propagation
Inmar E. Givoni, Clement Chung, Brendan J. Frey

Outline
• A Binary Model for Affinity Propagation
• Hierarchical Affinity Propagation
• Experiments

A Binary Model for Affinity Propagation
AP was originally derived as an instance of the max-product (belief propagation) algorithm in a loopy factor graph.

What is a factor graph?
• Definition: A factor graph is a bipartite graph that expresses the structure of a factorization. It has a variable node for each variable $x_i$, a factor node for each local function $f_j$, and an edge connecting variable node $x_i$ to factor node $f_j$ if and only if $x_i$ is an argument of $f_j$.
• In other words, a factor graph is a graphical representation of how a function factorizes. It contains two kinds of nodes, variable nodes and function nodes. A global function can be decomposed into a product of local functions, and these local functions and their arguments appear as the nodes and edges of the graph.
• Example: let $g(x_1, x_2, x_3, x_4, x_5)$ be a function of five variables, and suppose $g$ can be written as
$g(x_1, x_2, x_3, x_4, x_5) = f_A(x_1)\, f_B(x_2)\, f_C(x_1, x_2, x_3)\, f_D(x_3, x_4)\, f_E(x_3, x_5)$
Figure: A factor graph for the product $f_A(x_1)\, f_B(x_2)\, f_C(x_1, x_2, x_3)\, f_D(x_3, x_4)\, f_E(x_3, x_5)$.

The Max-Sum Update Rules
• Message from a variable node to a function node: the sum of the messages the variable node receives from all of its other neighboring function nodes. The notation $\mathrm{ne}(x) \setminus f$ denotes the set of variable node $x$'s neighbors excluding function node $f$.
• Message from a function node to a variable node: the maximum, taken over the function's other arguments, of the function value plus the sum of the messages sent to it by the other variable nodes. The notation $\mathrm{ne}(f) \setminus x$ denotes the set of function node $f$'s neighbors excluding variable node $x$.

A binary variable model for affinity propagation
• $c_{ij} = 1$ if the exemplar for point $i$ is point $j$, and $c_{ij} = 0$ otherwise.
• Function nodes $I_i$, $E_j$, $S_{ij}$: The $I$ function nodes enforce that in every row $i$ of the grid, exactly one variable $c_{ij}$, $j \in \{1, \dots, N\}$, is set to 1. The $E$ function nodes enforce the exemplar consistency constraints; in every column $j$, the set of variables $c_{ij}$, $i \in \{1, \dots, N\}$, that are set to 1 indicates all the points that have chosen point $j$ as their exemplar. The $S$ function nodes contribute the similarity $s(i, j)$ when $c_{ij} = 1$.
Figure: the $I_i$, $E_j$, $S_{ij}$ function nodes.
• We derive the scalar message updates in the binary variable AP model. Recall the max-sum message update rules.
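The equations for the max-sum rules did not survive on the slide, so the block below restates them in their standard form, matching the verbal description given earlier. The message symbol $\mu$ and the shorthand $\mathbf{y}$ for the other arguments of $f$ are notational assumptions, not taken from the slides.

```latex
% Variable-to-function message: sum of incoming messages from the
% other neighboring function nodes.
\begin{align}
\mu_{x \to f}(x) &= \sum_{h \in \mathrm{ne}(x) \setminus f} \mu_{h \to x}(x), \\
% Function-to-variable message: maximize over the other arguments
% \mathbf{y} = \mathrm{ne}(f) \setminus x of the function value plus
% the incoming variable-to-function messages.
\mu_{f \to x}(x) &= \max_{\mathbf{y}} \Big[ f(x, \mathbf{y})
    + \sum_{y \in \mathrm{ne}(f) \setminus x} \mu_{y \to f}(y) \Big].
\end{align}
```

Because every variable $c_{ij}$ in the binary model takes only the values 0 and 1, each message is a pair of numbers, which is why it is enough to track a single scalar per message.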
• The scalar message difference $\beta_{ij}(1) - \beta_{ij}(0)$ is denoted by $\beta_{ij}$. Similar notation is used for $\alpha$, $\rho$, and $\eta$. In what follows, for each message we calculate its value for each setting of the binary variable and then take the difference.
• The $\alpha_{ij}$ messages are identical to the AP availability messages $a(i, j)$, and the $\rho_{ij}$ messages are identical to the AP responsibility messages $r(i, j)$. Thus, we have recovered the original affinity propagation updates.

Hierarchical Affinity Propagation
• Goal: to solve the hierarchical clustering problem.
• What is hierarchical clustering? Hierarchical clustering differs substantially from the clustering methods discussed so far: instead of producing a single clustering, it produces a hierarchy of clusterings, in other words a tree.
• Hierarchical clustering algorithms are either agglomerative (bottom-up) or divisive (top-down). In the bottom-up approach, every data point starts as its own cluster; each iteration merges the two closest clusters, until only one cluster remains and the tree is complete. The top-down approach runs the same process in reverse.

Model
• Goal: We propose a hierarchical exemplar-based clustering objective function in terms of a high-order factor graph, and we derive an efficient approximate loopy max-sum algorithm.
• We wish to find a set of $L$ consecutive layers of clustering, where the points to be clustered in layer $l$ are constrained to be in the exemplar set of layer $l-1$.
• Figure: (a) HAP factor graph; a single layer of the standard AP model is shown in the dotted square. (b) HAP messages.

Differences
1. The main difference compared to the flat representation is manifested in the $I_i^l$ functions: if point $i$ is not chosen as an exemplar at layer $l-1$ (i.e., if its exemplar indicator $e_i^{l-1} = 0$), then point $i$ will not be clustered at layer $l$. Alternatively, if point $i$ is chosen as an exemplar at layer $l-1$, it must choose an exemplar at layer $l$.
2. We note that the $\rho_{ij}^{1}$ messages passed in the first layer and the $\alpha_{ij}^{L}$ messages passed in the top-most layer are identical to the standard AP messages for an AP layer.

Experiments
• 2D synthetic data
• Analysis of Synthetic HIV Sequences
Figure: 2D synthetic data: comparison of the objective, Eq. (8), achieved by HAP and its greedy counterpart (Greedy).
Top: Median percent improvement of HAP over Greedy for a given number of layers used.
Bottom: Scatter plots of the net similarity achieved by HAP vs. Greedy. Experiments for which HAP obtains better results than Greedy are below the line. The total percent of settings where HAP outperforms Greedy is reported in the inset. Color in the scatter plot indicates the number of layers.
• First, we plotted precision vs. recall for various clustering settings.
• Figure: Synthetic HIV data: precision-recall for HAP, Greedy, HKMC and HKMeans applied to the problem of identifying ancestral sequences from a set of 867 synthetic HIV sequences. For HKMC and HKMeans, we only plot the best precision obtained for each unique recall value.
• Figure: Synthetic HIV data: distribution of the Rand index for different experiments using HAP and Greedy. A higher Rand index indicates that the solution better resembles the ground truth. Experiments for which HAP obtains better results than Greedy are below the line. The percentage of solutions that identified the correct single ancestor sequence at the top layer (layer 4) is also reported.
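For readers unfamiliar with the Rand index used in the last figure, the sketch below computes the standard pairwise definition for a single flat clustering: the fraction of point pairs on which the predicted and ground-truth labelings agree about being in the same cluster or in different clusters. The function name `rand_index` and the label-list inputs are illustrative assumptions; the slides do not say how the index was applied per layer of the hierarchy.

```python
from itertools import combinations


def rand_index(labels_true, labels_pred):
    """Standard Rand index between two flat labelings of the same points.

    Illustrative helper (not from the slides): counts the pairs of points
    that both labelings place together, or both place apart, and divides
    by the total number of pairs.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must cover the same points")
    n = len(labels_true)
    if n < 2:
        raise ValueError("need at least two points to form a pair")

    agreements = 0
    total_pairs = 0
    for i, j in combinations(range(n), 2):
        same_in_truth = labels_true[i] == labels_true[j]
        same_in_pred = labels_pred[i] == labels_pred[j]
        if same_in_truth == same_in_pred:
            agreements += 1
        total_pairs += 1
    return agreements / total_pairs


if __name__ == "__main__":
    # Identical partitions with renamed cluster ids score a perfect 1.0.
    print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
```

The index depends only on the partition, not on the cluster ids, so renaming clusters leaves it unchanged. The pairwise loop is quadratic in the number of points, which is fine for a data set of a few hundred sequences.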