Supplemental Material for “Locally Linear Hashing for Extracting Non-Linear Manifolds” – Additional Experimental Results

Go Irie, NTT Corporation, Kanagawa, Japan (irie.go@lab.ntt.co.jp)
Zhenguo Li, Huawei Noah’s Ark Lab, Hong Kong, China (li.zhenguo@huawei.com)
Xiao-Ming Wu and Shih-Fu Chang, Columbia University, New York, NY, USA ({xmwu, sfchang}@ee.columbia.edu)

Abstract

This is the supplemental material for our main paper, “Locally Linear Hashing for Extracting Non-Linear Manifolds”. In the main paper, we propose a new hashing method named Locally Linear Hashing (LLH) and report a set of extensive experimental results to demonstrate its superiority over the state-of-the-art methods. In this supplemental material, we provide additional results to further demonstrate its effectiveness, scalability, and efficiency.

Introduction

In our main paper, we experimentally compare our LLH to several state-of-the-art methods, including ITQ [1], PQ-RR [4], SH [9], AGH [6], and IMH-tSNE [7], and demonstrate the significant superiority of LLH over these methods (see Sec. 3 of our main paper). In this supplemental material, we show additional results that could not be presented in the main paper due to the space limit. We also compare our LLH to several other state-of-the-art methods: K-means Hashing (KMH) [2], Multidimensional Spectral Hashing (MDSH) [8], Spherical Hashing (SPH) [3], and Kernelized Locality-Sensitive Hashing (KLSH) [5].

To run each method, we use the publicly available MATLAB code provided by each author group, with the parameter settings suggested in the original paper (where given). Specifically, for KMH, which internally uses the Product Quantization technique [4], the number of bits per sub-space is chosen from {2, 4, 8} based on some preliminary tests.¹ For KLSH, we use the Gaussian RBF kernel κ(x_i, x_j) = exp(−||x_i − x_j||² / (2σ²)) and sample 300 training points to form the kernel Gramian. For KLSH and MDSH, the kernel (or affinity matrix) parameter σ is set to the average distance between data points in the training set.

For convenience, we hereafter call the set {ITQ, PQ-RR, SH, AGH, IMH-tSNE} Group-A and the set {KMH, MDSH, SPH, KLSH} Group-B. For visibility, we show the comparative results for these two groups in separate figures. For further details on the experimental setup, protocols, and datasets, please refer to our main paper.

¹On the two synthetic datasets, Two-TrefoilKnots and Two-SwissRolls, the number of sub-spaces is set to 1 when c = 1, c being the number of bits.

Additional Results on Synthetic Datasets

Fig. 1 (Group-A) and Fig. 2 (Group-B) show the results on Two-TrefoilKnots. First, for every code length, our LLH outperforms all the methods in Group-B with a significant performance gain (Fig. 2(a)). Note that LLH also outperforms the methods in Group-A (see Fig. 6(b) in our main paper). Second, for each fixed code length in {16, 32, 64} bits, LLH performs significantly better than all the other methods at any number of top retrieved points (Fig. 1(b-d) and Fig. 2(b-d)). Notably, LLH obtains almost perfect precision in all cases.

Fig. 3 (Group-A) and Fig. 4 (Group-B) show the results on Two-SwissRolls. On this dataset, no previous method achieves reasonable performance; only LLH (and LLH0) works well and shows strong superiority over the other methods, including the exhaustive l2-scan. These results are clear evidence that our LLH effectively captures and preserves the locally linear structures of non-linear manifolds in the final Hamming space after hashing, which is vital to the performance.

Additional Results on Face Images

We next show the results on Yale in Fig. 5 (Group-A) and Fig. 6 (Group-B). Note that the results for Group-A with 64-bit codes are presented in Fig. 8(b) of the main paper. Our LLH is consistently better than all the other methods at every code length, and LLH0 is the second best. The most competitive methods are KMH and SPH, both of which aim to preserve Euclidean neighborhoods in the Hamming space. They outperform the exhaustive l2-scan with relatively long codes of 24 bits or more (Fig. 6(a)). However, our LLH, and even LLH0, are much better than these two methods at every code length and at every number of top retrieved points (Fig. 6(b-d)). The gain is also remarkable, ranging from 38% to 356%. These results suggest that our LLH successfully extracts the non-linear manifold structures of the Yale face dataset.

Additional Results on Handwritten Digits

Fig. 7 (Group-A) and Fig. 8 (Group-B) show the results on the USPS handwritten digits dataset. Our LLH shows the best performance on this dataset, with LLH0 second (Fig. 8(a)). While AGH is slightly better than LLH0 in some cases (Fig. 7(a)), LLH is consistently better than AGH in all cases.

Fig. 9 (Group-A) and Fig. 10 (Group-B) show the results on MNIST. Again, our LLH achieves the best performance in most cases. The most competitive method is AGH, and the performance of the Group-B methods is relatively poor. For 16 bits or fewer, LLH0 performs better than LLH (Fig. 9(a)). This may be because a 16-dimensional (or lower) Hamming space is too low-dimensional to preserve the locally linear structures of the multiple manifolds captured by LLSR. In contrast, for 32 bits or more, LLH outperforms LLH0 and AGH, especially when we retrieve 900 or fewer data points (Fig. 9(b)).

Additional Results on Natural Images

Fig. 11 (Group-A) and Fig. 12 (Group-B) show the results on CIFAR. Our LLH outperforms all the other methods in most cases. The most competitive method is ITQ, followed by SPH (Fig. 12(a)). Note that our LLH is the only method that is comparable to, or even better than, the l2-scan.

Fig. 13 (Group-A) and Fig. 14 (Group-B) show the results on ImageNet-200K. Our LLH is still the best on this dataset. LLH0, KMH, SPH, and ITQ are highly competitive on this dataset.
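Throughout these comparisons, each method ranks the database by Hamming distance between binary codes and reports the averaged precision among the top retrieved points. A minimal pure-Python sketch of this protocol is given below; the function names and the toy 4-bit codes are ours for illustration, not part of the released experiment code.

```python
# Sketch of the precision-vs-top-k retrieval protocol: rank database items
# by Hamming distance to the query code, then report the fraction of true
# neighbors among the top-k retrieved items.

def hamming(a: int, b: int) -> int:
    """Hamming distance between two binary codes stored as integers."""
    return bin(a ^ b).count("1")

def precision_at_k(query_code, db_codes, relevant, k):
    """Precision among the k database items closest in Hamming distance."""
    ranked = sorted(range(len(db_codes)),
                    key=lambda i: hamming(query_code, db_codes[i]))
    top = ranked[:k]
    return sum(1 for i in top if i in relevant) / k

# Toy example: 4-bit codes; items 0 and 1 are the true neighbors.
db = [0b0001, 0b0011, 0b1110, 0b1000]
print(precision_at_k(0b0011, db, relevant={0, 1}, k=2))  # -> 1.0
```

Averaging this quantity over a query set, for each code length and each k, yields curves of the kind plotted in Figs. 1-14.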
With 64-bit codes, SPH is slightly better than our LLH when we retrieve more than 500 points (Fig. 14(d)). However, for shorter codes (16- to 48-bit), our LLH achieves higher precision than SPH (Fig. 14(a-c)). Lastly, Fig. 15 shows qualitative results on MIRFLICKR-1M. For every query, our LLH returns more semantically consistent images and shows better retrieval results than the Group-B methods.

Computation Time

In this supplemental material, we focus on the scalability of the out-of-sample binary code generation time w.r.t. the database size n and K (see Sec. 2.3 of our main paper). All results are obtained on the MIRFLICKR-1M dataset using MATLAB on a workstation with a 2.53 GHz Intel Xeon CPU and 64 GB RAM. Fig. 16(a) shows the binary code generation time for database sizes varied from n = 50K to 1M (in these experiments, we fix m = 100K). As analyzed in Sec. 2.4 of our main paper, the binary code generation time is independent of n. Fig. 16(b) shows the binary code generation time for varied K. We find that K does not substantially affect the code generation time in practice. This is because, as K increases, the number of data points within each cluster decreases (see Sec. 2.1 and Sec. 2.3), leading to an almost constant code generation time. These results confirm that our out-of-sample algorithm is scalable in both the database size n and K.

Parameter Study

We report additional results on the parameter sensitivity of our LLH. Fig. 17(a-b) shows the averaged precision on CIFAR with different settings of λ and η. Similar to the results on MNIST (reported in Fig. 16 of our main paper), the performance is slightly sensitive to λ and η, but the retrieval quality is still much better than the l2-scan in most cases, especially with longer codes of 32 bits or more. Fig. 17(c-d) shows the performance of LLH with varied K on the CIFAR and MNIST datasets, respectively. We find that the performance of our LLH is quite stable w.r.t. K.

References

[1] Y. Gong, S. Lazebnik, A.
Gordo, and F. Perronnin. Iterative quantization: A Procrustean approach to learning binary codes for large-scale image retrieval. IEEE Trans. PAMI, 35(12):2916–2929, 2013.
[2] K. He, F. Wen, and J. Sun. K-means hashing: An affinity-preserving quantization method for learning binary compact codes. In CVPR, 2013.
[3] J.-P. Heo, Y. Lee, J. He, S.-F. Chang, and S.-E. Yoon. Spherical hashing. In CVPR, 2012.
[4] H. Jégou, M. Douze, and C. Schmid. Product quantization for nearest neighbor search. IEEE Trans. PAMI, 33(1):117–128, 2011.
[5] B. Kulis, P. Jain, and K. Grauman. Kernelized locality-sensitive hashing. IEEE Trans. PAMI, 34(6):1092–1104, 2012.
[6] W. Liu, J. Wang, S. Kumar, and S.-F. Chang. Hashing with graphs. In ICML, 2011.
[7] F. Shen, C. Shen, Q. Shi, A. van den Hengel, and Z. Tang. Inductive hashing on manifolds. In CVPR, 2013.
[8] Y. Weiss, R. Fergus, and A. Torralba. Multidimensional spectral hashing. In ECCV, 2012.
[9] Y. Weiss, A. Torralba, and R. Fergus. Spectral hashing. In NIPS, 2008.

Figure 1. Comparison to Group-A on Two-TrefoilKnots. Averaged precision vs. the number of top retrieved points for (a) 16-bit, (b) 32-bit, and (c) 64-bit codes.
Figure 2. Comparison to Group-B on Two-TrefoilKnots. (a) Averaged precision at top 500 retrieved points vs. the number of bits; averaged precision vs. the number of top retrieved points for (b) 16-bit, (c) 32-bit, and (d) 64-bit codes.

Figure 3. Comparison to Group-A on Two-SwissRolls. Averaged precision vs. the number of top retrieved points for (a) 16-bit, (b) 32-bit, and (c) 64-bit codes.

Figure 4. Comparison to Group-B on Two-SwissRolls. (a) Averaged precision at top 500 retrieved points vs. the number of bits; averaged precision vs. the number of top retrieved points for (b) 16-bit, (c) 32-bit, and (d) 64-bit codes.
Figure 5. Comparison to Group-A on Yale. Averaged precision vs. the number of top retrieved images for (a) 16-bit and (b) 32-bit codes.

Figure 6. Comparison to Group-B on Yale. (a) Averaged precision at top 10 retrieved points vs. the number of bits; averaged precision vs. the number of top retrieved images for (b) 16-bit, (c) 32-bit, and (d) 64-bit codes.

Figure 7. Comparison to Group-A on USPS. Averaged precision vs. the number of top retrieved images for (a) 16-bit and (b) 32-bit codes.

Figure 8.
Comparison to Group-B on USPS. (a) Averaged precision at top 500 retrieved points vs. the number of bits; averaged precision vs. the number of top retrieved images for (b) 16-bit, (c) 32-bit, and (d) 64-bit codes.

Figure 9. Comparison to Group-A on MNIST. Averaged precision vs. the number of top retrieved images for (a) 16-bit and (b) 32-bit codes.

Figure 10. Comparison to Group-B on MNIST. (a) Averaged precision at top 500 retrieved points vs. the number of bits; averaged precision vs. the number of top retrieved images for (b) 16-bit, (c) 32-bit, and (d) 64-bit codes.

Figure 11. Comparison to Group-A on CIFAR. Averaged precision vs. the number of top retrieved images for (a) 16-bit and (b) 32-bit codes.
Figure 12. Comparison to Group-B on CIFAR. (a) Averaged precision at top 500 retrieved points vs. the number of bits; averaged precision vs. the number of top retrieved images for (b) 16-bit, (c) 32-bit, and (d) 64-bit codes.

Figure 13. Comparison to Group-A on ImageNet-200K. Averaged precision vs. the number of top retrieved images for (a) 16-bit and (b) 32-bit codes.

Figure 14. Comparison to Group-B on ImageNet-200K. (a) Averaged precision at top 500 retrieved points vs. the number of bits; averaged precision vs. the number of top retrieved images for (b) 16-bit, (c) 32-bit, and (d) 64-bit codes.

Figure 15. Top retrieved images on MIRFLICKR-1M using 64-bit codes: (a) query, (b) LLH, (c) KMH, (d) MDSH, (e) SPH, (f) KLSH. Red border denotes false positive.
Figure 16. Scalability of the proposed out-of-sample extension algorithm on MIRFLICKR-1M. Averaged binary code generation time per query vs. (a) database size n and (b) K.

Figure 17. Parameter sensitivity: (a) λ, (b) η, and (c) K on CIFAR, and (d) K on MNIST.
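The per-query timing protocol behind Fig. 16 can be approximated by a simple wall-clock harness along the following lines. This is a hedged sketch only: `encode_query` is a dummy stand-in for LLH's out-of-sample code generation step (the actual implementation is not part of this supplement), and the query data is synthetic.

```python
# Sketch of the wall-clock measurement behind Fig. 16: average the binary
# code generation time over a batch of queries. The encoder below simply
# thresholds each feature at 0; it is NOT the LLH method, only a placeholder.
import time

def encode_query(x, n_bits=64):
    # Dummy out-of-sample encoder (placeholder for the real algorithm).
    return [1 if v > 0 else 0 for v in x[:n_bits]]

def mean_encoding_time(queries, n_bits=64):
    """Average wall-clock seconds per query for code generation."""
    start = time.perf_counter()
    for x in queries:
        encode_query(x, n_bits)
    return (time.perf_counter() - start) / len(queries)

# Synthetic 64-dimensional queries standing in for MIRFLICKR-1M features.
queries = [[(-1.0) ** i * (i + j) for j in range(64)] for i in range(1000)]
print(f"{mean_encoding_time(queries):.2e} sec/query")
```

Repeating such a measurement while varying the database size n (or K) with a fixed encoder, as in Fig. 16, would show the reported independence of the per-query time from n.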