Adversarial Examples for Handcrafted Features

Zohaib Ali (zali.msee16seecs@seecs.edu.pk), Muhammad Latif Anjum (latif.anjum@seecs.edu.pk), Wajahat Hussain (wajahat.hussain@seecs.edu.pk)
Robotics & Machine Intelligence (ROMI) Lab, National University of Sciences and Technology, Islamabad, Pakistan.

Figure 1: Attacking Local Feature Matching. Original image and its perturbed version using our novel adversarial noise. The decline in SURF [3] feature matching is 92.07% (1463 matches with original-original and 116 with perturbed-original) with little perceptual change.

Abstract

Adversarial examples have exposed the weakness of deep networks. Careful modification of the input fools the network completely. Little work has been done to expose the weakness of handcrafted features in adversarial settings. In this work, we propose novel adversarial perturbations for handcrafted features. Pixel-level analysis of handcrafted features reveals simple modifications which considerably degrade their performance. These perturbations generalize over different features, viewpoint and illumination changes. We demonstrate successful attacks on several well-known pipelines (SLAM, visual odometry, SfM, etc.). Extensive evaluation is presented on multiple public benchmarks.

1 Introduction

Is there any domain where deep features are not the best performers? Surprisingly, image registration is one area where handcrafted features still outperform these data-driven features [40]. Perhaps the lack of appropriate training data is the reason behind this anomalous lag. Labeling interest points is much more tedious than labeling objects, e.g., there can be more than a thousand interest points in a single image as compared to a thousand object instances in an entire dataset.

Recently there has been a surge in the generation of adversarial examples that fool these deep systems. These adversarial examples involve either imperceptible changes [19] or blatant editing [7] of the input to trick DNNs. Although blatant, this editing appears harmless, e.g., a black patch somewhere far from the object of interest. Ironically, the stock example used to advocate the gravity of this weakness is an autonomous car misinterpreting a stop sign as a high-speed (50 kmph) zone sign [31]. Visual odometry is the fundamental method used by moving agents to estimate their location, and it relies on handcrafted features. This raises the question: does a similar weakness exist for these handcrafted features?

This work investigates the existence of such adversarial examples for handcrafted features. One way to approach this challenge is to understand the adversarial example generation process for DNNs. The key ingredient in the deep adversarial example recipe is the end-to-end differentiability of the DNN [46]. This clean analytical formulation makes adversarial example generation appear rather simple, i.e., change the input slightly so that the output decision of the DNN changes. On the other hand, handcrafted features, as the name suggests, comprise functioning heuristics and well-thought-out discrete steps that have evolved over decades [3, 30, 34].
The pipeline for interest points includes gradient calculation for corner detection, orientation assignment, and scale calculation for invariant matching. Gradient calculation involves thresholding to find significant gradients. Orientation is determined by quantizing the gradients into a fixed number of angles. Scale is determined by searching over a discrete scale-space pyramid. These discrete steps indicate that this pipeline does not have end-to-end differentiability. In the case of a DNN, a nonlinear process maps the input to the output; the deep pipeline therefore amplifies a slight change in the input into a totally different output. On the contrary, the interest point pipeline works with direct pixel values, e.g., an edge is a simple difference between two neighbouring pixels. Is it possible to slightly modify the pixels and get a totally different output for interest points?

In this work, we share pixel-level insights that expose weaknesses of these handcrafted pipelines. The simplest case of image registration, i.e., no scale, illumination or view changes, is shown in Figure 1. The image is registered with itself. As expected, a large number of features match. We then add our novel perturbation to the image and match the original image with its perturbed version. Even for this simple case of image registration, the number of matches reduces significantly.

Following are the contributions of our work:
1. This work is the first attempt, to the best of our knowledge, to demonstrate adversarial examples for handcrafted features in the context of natural scenes.
2. Our adversarial noise generalizes over different local features, viewpoints and illuminations with varying degrees of success.
3. We demonstrate successful attacks on well-known image registration, structure-from-motion (SfM), SLAM and loop-closing pipelines with varying degrees of success.

Our novel perturbation scheme helps in censoring personal images. An image should only be used for the intended purpose and nothing more [11]. Imagine the implications if an image, uploaded on social websites, is registered with a stored database (Google Maps) to find its location. Using our novel method, the owner of digital content can reduce the chances of such privacy violations. Furthermore, our novel perturbation scheme does not considerably affect image perceptual quality, keeping its likeability [25] and interestingness [20] intact.

2 Related Work

The concern of adversarial examples was raised in the machine learning and robotics community as a theoretical exercise when it was demonstrated that deep neural networks can be deceived with a small perturbation in digital images [5, 19, 28, 37, 46]. It was reported that these adversarial examples do not pose a threat to autonomous systems owing to their rigid viewpoint and scale matching requirements [31]. This was quickly overruled when adversarial examples for the physical world were proposed in the form of an adversarial patch [7], adversarial stickers resembling graffiti [12] and printed adversarial images [26]. Adversarial examples for 3D-printed physical objects have also been presented which are robust to viewpoint changes and can be misclassified from every viewpoint nearly 100% of the time [1].
Most recent work on adversarial examples focuses on generating adversarial examples entirely from scratch using GANs [44], integrating locality constraints into the optimization [41], creating black-box adversarial attacks which do not require access to classifier gradients [24], and using parametric lighting and image geometry to design adversarial attacks for deep features [29]. Strategies to defend against such adversarial attacks are proposed in [9, 32, 39, 42, 43].

Before the arms race (adversarial attacks vs. robust deep networks) started, the most common example of adversarial attacks on vision revolved around CAPTCHAs [35]. Written text was distorted by adding clutter (occluding lines) and random transformations (rotation, warping) so that contemporary text detectors failed to read the text [48]. Early CAPTCHA solvers were able to detect text reliably even under adversarial clutter using pattern matching (object detection) techniques [35]. Face spoofing was another attack on classic vision systems [6]. These attacks on classic vision systems involved changing the input considerably; imperceptibility was not the goal. Our attack on traditional local features requires as little modification of the input image as possible.

The work closest to ours comes from perceptual ad blocking [14], where discrete noise has been added to deceive SIFT [30] features. This has been done on a very small patch of the image (containing the AdChoices logo) to deceive ad-blocking algorithms. Our work is considerably different. We propose perturbations that generalize over viewpoint and illumination changes and work for different local features. Furthermore, small logos contain few local features compared to the natural scenes we are attacking.

3 Pixel Level Insight into Handcrafted Features

What happens to handcrafted features if the pixel at which the feature is detected and its neighboring pixels are perturbed? The answer is tricky, as it all depends on the pixels around the patch that is being perturbed. The feature may completely vanish from its original location and its surroundings. This is termed a successful attack on the feature detector. However, the feature may also be displaced by a few pixels from its original location after perturbation. This is considered an unsuccessful attack on the detector because the feature is still being detected (albeit displaced by a few pixels). This, however, does not mean the perturbation has not affected the feature. The power of handcrafted features lies in their descriptors, which are used to match these features. To analyze the effectiveness of perturbations in such situations, we estimate the outlier-free matches (correspondences) using a fundamental matrix / epipolar line check involving RANSAC [13, 22]. A successful attack on the descriptor is established if a feature displaced due to perturbation fails to match with the original feature. An unsuccessful attack is one where the feature is detected (either at the same pixel or displaced) and matched successfully. Pixel-level insight into all these categories is provided in Figure 2 for Harris [21] corners after the addition of Gaussian blur and P2P perturbations (P2P and other perturbations are explained in Section 4). Because of the discrete nature of handcrafted features, it is difficult to predict the effect of perturbations on them.
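To make this matching check concrete, the following is a minimal Python/OpenCV sketch of counting outlier-free correspondences with a RANSAC fundamental-matrix test. The paper uses MATLAB's inbuilt routines; here ORB is only a stand-in detector/descriptor (SURF lives in opencv-contrib), and the function name, feature count and RANSAC threshold are illustrative assumptions.

```python
# A minimal sketch (not the authors' MATLAB implementation) of the match-counting
# step described above: detect and describe keypoints, match them, and keep only
# correspondences consistent with a fundamental matrix estimated via RANSAC.
import cv2
import numpy as np

def count_inlier_matches(img_a, img_b, ransac_thresh=1.0):
    """Return the number of outlier-free matches between two grayscale images."""
    detector = cv2.ORB_create(nfeatures=2000)   # stand-in detector/descriptor
    kp_a, des_a = detector.detectAndCompute(img_a, None)
    kp_b, des_b = detector.detectAndCompute(img_b, None)
    if des_a is None or des_b is None:
        return 0

    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des_a, des_b)
    if len(matches) < 8:                        # need >= 8 points to estimate F
        return len(matches)

    pts_a = np.float32([kp_a[m.queryIdx].pt for m in matches])
    pts_b = np.float32([kp_b[m.trainIdx].pt for m in matches])
    # Epipolar-geometry check: RANSAC fundamental matrix, keep inliers only
    _, mask = cv2.findFundamentalMat(pts_a, pts_b, cv2.FM_RANSAC, ransac_thresh, 0.99)
    return int(mask.sum()) if mask is not None else 0
```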
Inspired by this pixel-level insight into handcrafted features, we designed various discrete perturbations to successfully affect handcrafted features.

[Figure 2 panels: original and Gaussian-blurred (3×3) patches; original and P2P-perturbed (3×3) patches; a successful detector attack (corner removed from the 9×9 window), a successful descriptor attack (corner detected within the 9×9 window but not matched), and an unsuccessful attack (corner detected within the 9×9 window and matched).]

Figure 2: Pixel level insight. The pixel at which the corner is detected is colored green. The takeaway from this insight is that, with our adversarial noise, the corner is either removed (successful detector attack) or its descriptor is corrupted (successful descriptor attack) with little visual change to the naked eye. Note that for our best perturbation, successful detector and descriptor attacks on Harris constitute 99% of corners.

4 Formulating the Adversarial Perturbations

We divide our adversarial perturbation schemes into three groups: Gaussian blur, inpainted perturbations, and discrete perturbations. Each scheme is discussed separately.

4.1 Gaussian Blur

Almost all handcrafted features utilize image gradients in one way or another. Any smoothing filter applied locally at the locations of detected features may disturb the local gradients and therefore the features. Our first perturbation, therefore, is Gaussian blur of various kernel sizes and sigma values applied at feature locations. Our experimental results show that Gaussian blur drastically degrades image quality and fails to significantly deceive most of the features.

4.2 Inpainted Perturbations

Image inpainting [4] methods fill holes in the image using the context and produce realistic results. We leverage this technique to produce adversarial yet imperceptible noise.

4.2.1 Average Squared Mask (ASM) Perturbation: ASM is applied in two steps (Figure 3a). Firstly, a fixed region (3×3, 5×5) around the detected feature is averaged to remove the corner, just like Gaussian blur. Secondly, inpainting is deployed to refill the averaged pixels. This results in a perceptually original-looking patch while introducing minute changes that affect the local feature, as shown in the results.

4.2.2 Dominant Direction (DD) Perturbation for SURF: Instead of a rectangular region (3×3, 5×5), we tried a greedy approach of adding perturbations along the dominant direction of SURF features and its perpendicular direction (Figure 3b), followed by inpainting to refill the perturbed regions. Each pixel lying along the cross within the patch is replaced with a white pixel before inpainting.

[Figure 3 panels: (a) Average Squared Mask (ASM): original patch, noise added, inpainting mask, inpainted patch; (b) Dominant Direction (DD): original patch, noise added, inpainting mask, inpainted patch; (c) P2P perturbation scheme; (d) PPS perturbation scheme (3×3 patch centered at the feature); (e) B2B perturbation scheme (Blocks 1-4 of the 3×3 patch, extended with external pixels to 5×5).]

Figure 3: Our novel adversarial perturbation schemes.
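As a concrete reference point for the inpainted perturbations above, here is a minimal sketch of the ASM idea from Section 4.2.1. It assumes OpenCV's Telea inpainting as the hole-filling method and ORB keypoints as a stand-in for the attacked detector; the authors' exact implementation may differ.

```python
# Illustrative sketch of ASM: average a small patch around each detected feature,
# then inpaint those pixels so the patch looks perceptually original.
import cv2
import numpy as np

def asm_perturb(gray, keypoints, k=3):
    """Average a kxk patch around each keypoint, then inpaint those pixels."""
    out = gray.copy()
    mask = np.zeros_like(gray, dtype=np.uint8)
    r = k // 2
    h, w = gray.shape
    for kp in keypoints:
        x, y = int(round(kp.pt[0])), int(round(kp.pt[1]))
        y0, y1 = max(0, y - r), min(h, y + r + 1)
        x0, x1 = max(0, x - r), min(w, x + r + 1)
        patch = out[y0:y1, x0:x1]
        patch[:] = int(patch.mean())          # step 1: average the region (removes the corner)
        mask[y0:y1, x0:x1] = 255              # step 2: mark the averaged pixels for refilling
    return cv2.inpaint(out, mask, inpaintRadius=3, flags=cv2.INPAINT_TELEA)

# usage: kps = cv2.ORB_create().detect(gray, None); adv = asm_perturb(gray, kps, k=3)
```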
4.3 Discrete Perturbations

Instead of modifying the image regions in a black-box manner, we developed a few perturbations that rely on local averaging. This averaging significantly affects the local gradients while keeping the distortion low.

4.3.1 Pixel-2-Pixel (P2P) Perturbation: P2P perturbations are designed to affect every pixel of the patch (3×3, 5×5) around the feature location. Each pixel of the fixed-size patch around the feature location is replaced with the average of two pixels: its neighbors in the previous row and the previous column (Figure 3c), as given in Equation 1. It must be noted that during P2P perturbations, edge pixels of the patch are replaced by the average of pixels outside the patch. This arrangement ensures no sharp edge is generated at the boundary of the patch after perturbation. Pixels are modified sequentially starting from the top left.

p(i, j) = [p(i, j−1) + p(i−1, j)] / 2    (1)

4.3.2 Pixel-2-Pixel-Scattered (PPS) Perturbation: The scattered P2P perturbation routine modifies every other pixel (along the horizontal and vertical axes) within the patch instead of modifying every pixel (dark pixels in Figure 3d). This scattered nature of the perturbation keeps the image quality degradation low. We use simple averaging of nine pixels, i.e., the dark pixels in Figure 3d are replaced by the average of their 3×3 neighbourhood.

4.3.3 Block-2-Block (B2B) Perturbation: The Block-2-Block scheme is a greedy approach which generates perturbations at a coarser block level instead of the pixel level (Figure 3e). The perturbation scheme is as follows: 1) A 3×3 patch around each feature is divided into four overlapping blocks as shown in Figure 3e. 2) The decision to perturb a pixel is taken at the block level. The sum of all four pixels in each block is evaluated, and the block with the highest sum is taken as the reference block that will not be perturbed. The sum of pixels in each block is compared with the reference block, and only those blocks are perturbed whose difference is greater than a threshold. This can result in one, two or all three blocks (other than the reference block) becoming candidates for perturbation. 3) Once a block is selected as a candidate, we replace each of its pixels with the average of its three neighboring pixels outside the block (Figure 3e). Note that since the blocks overlap, one pixel may undergo perturbation more than once.

4.3.4 Scale-specific Perturbation for SURF: SURF features are scale-invariant and additionally provide the scale at which a keypoint is detected. This scale information can be utilized to design a perturbation that deceives SURF features. In scale-specific perturbation, the kernel size of the ASM, P2P and PPS perturbations is linked with the scale of the keypoint, which is divided into three categories: 1) for keypoints with scale less than 5, a 5×5 kernel is used, 2) for keypoints with scale between 5 and 10, a 7×7 kernel is used, and 3) for keypoints with scale greater than 10, a 9×9 kernel is used. This constitutes a scale-specific perturbation with kernel sizes (5×5)-(7×7)-(9×9) for the three scale categories. We have also tested two other kernel-size schemes, i.e., (7×7)-(9×9)-(11×11) and (9×9)-(11×11)-(13×13).
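To close Section 4, here is a minimal sketch of the P2P update in Equation 1, applied sequentially from top-left to bottom-right so that already-modified neighbours feed later pixels. The patch size, keypoint source and the border-skipping rule are assumptions made only for illustration.

```python
# A minimal sketch of the P2P perturbation (Equation 1) around each keypoint.
import numpy as np

def p2p_perturb(gray, keypoints, k=3):
    """Replace each pixel of a kxk patch around a keypoint with the average of
    its left and upper neighbours: p[i, j] = (p[i, j-1] + p[i-1, j]) / 2."""
    out = gray.astype(np.float32)
    r = k // 2
    h, w = gray.shape
    for kp in keypoints:
        x, y = int(round(kp.pt[0])), int(round(kp.pt[1]))
        if y - r < 1 or x - r < 1 or y + r >= h or x + r >= w:
            continue                              # skip patches touching the image border
        for i in range(y - r, y + r + 1):         # rows, top to bottom
            for j in range(x - r, x + r + 1):     # columns, left to right
                # For edge pixels of the patch the neighbours lie outside it,
                # which avoids a sharp boundary, as noted in Section 4.3.1.
                out[i, j] = 0.5 * (out[i, j - 1] + out[i - 1, j])
    return np.clip(out, 0, 255).astype(np.uint8)
```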
5 Experimental Results

We show the effectiveness of our adversarial noise on robust image registration and various publicly available pipelines. We report results on two competing indicators, i.e., pipeline degradation and image perceptual quality degradation. We use two metrics to ascertain the perceptual quality degradation after perturbation: 1) structural similarity index (SSIM) [47], and 2) peak signal-to-noise ratio (PSNR) [8]. Higher values of these metrics indicate good image perceptual quality. These two image quality metrics are considered benchmarks for digital content [23]. Image registration results are reported on the recent HPatches dataset [2], which contains 116 different environments with large viewpoint and illumination changes. We have used videos from the RGB-D TUM benchmark [45] for sequential/multiview pipelines.

Table 1: Results for Gaussian Blur, ASM, P2P, DD, PPS and B2B perturbations on the HPatches dataset. The resilience of SURF features (see the PFMD values) against adversarial attacks stands out. Each cell lists PFMD(%) / SSIM(%) / PSNR(dB).

Features | Gaussian Blur (σ = 1), 15×15 | Gaussian Blur (σ = 1), 25×25 | ASM, 3×3 | ASM, 5×5
Harris [21] | 92.37 / 95.8 / 33.59 | 92.30 / 94.97 / 33.35 | 98.52 / 92.86 / 31.16 | 99.40 / 90.79 / 29.11
FAST [38] | 92.47 / 96.25 / 34.56 | 92.89 / 95.32 / 34.18 | 99.37 / 92.43 / 30.99 | 99.59 / 90.51 / 29.30
BRISK [27] | 94.01 / 97.13 / 36.12 | 94.25 / 95.80 / 34.54 | 95.83 / 91.93 / 30.52 | 98.42 / 89.07 / 27.91
MSER [33] | 78.49 / 97.83 / 36.65 | 82.58 / 95.94 / 34.52 | 92.27 / 96.00 / 34.44 | 95.53 / 95.17 / 32.60
SURF [3] | 63.59 / 96.38 / 34.38 | 69.47 / 94.33 / 34.47 | 63.62 / 94.27 / 33.29 | 68.06 / 93.81 / 32.69

Features | P2P, 3×3 | P2P, 5×5 | DD, 11×11 | DD, 15×15
Harris [21] | 98.27 / 97.30 / 31.77 | 99.40 / 92.41 / 26.48 | – | –
FAST [38] | 98.69 / 96.91 / 32.37 | 99.59 / 92.58 / 27.47 | – | –
BRISK [27] | 95.64 / 96.09 / 31.22 | 98.89 / 90.74 / 26.11 | – | –
MSER [33] | 70.31 / 99.52 / 40.26 | 88.52 / 98.38 / 34.43 | – | –
SURF [3] | 41.75 / 99.28 / 40.19 | 75.47 / 97.40 / 33.71 | 70.68 / 93.57 / 32.53 | 73.10 / 93.42 / 32.34

Features | PPS, 3×3 | PPS, 5×5 | B2B, threshold = 50 | B2B, threshold = 20
Harris [21] | 91.44 / 98.96 / 36.23 | 80.80 / 99.18 / 38.95 | 92.94 / 98.77 / 35.51 | 95.90 / 98.47 / 34.56
FAST [38] | 94.37 / 98.79 / 36.76 | 93.61 / 99.03 / 39.63 | 87.23 / 98.46 / 35.95 | 94.87 / 98.05 / 34.75
BRISK [27] | 90.16 / 98.47 / 35.48 | 82.71 / 98.82 / 38.61 | 84.69 / 98.20 / 35.10 | 91.89 / 97.66 / 33.75
MSER [33] | 58.63 / 99.76 / 43.53 | 56.28 / 99.77 / 45.32 | 47.00 / 99.82 / 44.94 | 57.28 / 99.75 / 43.25
SURF [3] | 25.00 / 99.64 / 44.55 | 65.85 / 98.14 / 35.64 | 19.16 / 99.79 / 45.17 | 25.59 / 99.68 / 43.75

Code to reproduce the experimental results is available at https://github.com/zohaibali-pk/Adversarial-Noises-for-Handcrafted_Features.

5.1 Feature Matching Decline

Our experimental strategy includes calculating outlier-free correspondences, using robust fundamental matrix estimation, before and after adding the perturbation. We evaluate the percentage feature matching decline (PFMD) after perturbation as

PFMD = (Moo − Mop) / Moo × 100%,

where Moo is the number of feature matches between an image and itself, and Mop is the number of feature matches between the image and its perturbed version. For all the local interest point approaches discussed below (Harris etc.), we have used MATLAB's inbuilt routines for the detector and the corresponding descriptor. The same applies for the thresholds involving the fundamental matrix / epipolar line test [13, 22].

5.1.1 Perturbations at Single Scale: Results for the Gaussian blur, ASM, DD, P2P, PPS and B2B perturbations are reported in Table 1 for two mask sizes each. It can be seen that a significant number of features are deceived by the addition of perturbations. Since the scale of the features (in the case of SURF) is ignored, SURF features are not significantly deceived by single-scale perturbations.
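For reference, the evaluation quantities above reduce to a few lines of code. This is a sketch only: the paper does not specify which SSIM/PSNR implementation it used, scikit-image is assumed here, and 8-bit grayscale inputs are assumed.

```python
# Sketch of the evaluation metrics: PFMD from inlier match counts, plus SSIM and
# PSNR via scikit-image (assumed implementation, 8-bit grayscale images).
from skimage.metrics import structural_similarity, peak_signal_noise_ratio

def pfmd(m_oo, m_op):
    """Percentage feature matching decline: PFMD = (Moo - Mop) / Moo * 100%."""
    return 100.0 * (m_oo - m_op) / max(m_oo, 1)

def perceptual_quality(original, perturbed):
    """SSIM (reported as % in the tables) and PSNR (dB) of the perturbed image."""
    ssim = structural_similarity(original, perturbed)
    psnr = peak_signal_noise_ratio(original, perturbed)
    return 100.0 * ssim, psnr

# usage: decline = pfmd(count_inlier_matches(img, img), count_inlier_matches(img, adv)),
# where count_inlier_matches is the hypothetical helper sketched in Section 3.
```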
5.1.2 Scale-specific Perturbations for SURF: Adding scale-specific perturbations for SURF, outlined in Section 4.3, significantly degrades its performance for various noises (ASM, P2P and PPS), as shown in Table 2.

Table 2: Results for scale-specific perturbations for SURF features. More than 97% of SURF features can be deceived if some image quality degradation (88% SSIM) is allowed. Each cell lists PFMD(%) / SSIM(%) / PSNR(dB).

Perturbation | Scale scheme (5×5)-(7×7)-(9×9) | Scale scheme (7×7)-(9×9)-(11×11) | Scale scheme (9×9)-(11×11)-(13×13)
ASM | 82.64 / 92.72 / 31.35 | 94.03 / 90.12 / 28.87 | 97.16 / 86.16 / 26.57
P2P | 75.93 / 97.32 / 33.59 | 93.20 / 93.45 / 28.64 | 97.44 / 88.30 / 25.11
PPS | 27.89 / 99.52 / 42.96 | 66.90 / 97.75 / 34.93 | 88.28 / 93.66 / 29.30

[Figure 4 panels: 5×5 grids over Harris, FAST, BRISK, MSER and SURF for the ASM, P2P, PPS, B2B and scale-specific perturbations; legend: attack failed / attack successful.]

Figure 4: Confusion matrix illustrating cross-feature generalization of our adversarial perturbation. This data is generated after taking into account a slight change in viewpoint/illumination (pair 1-3 of the HPatches dataset). The robustness of the scale-specific attack using the P2P scheme is evident. Dark color indicates a successful attack.

5.1.3 Generalization of our Adversarial Perturbation: Figure 4 shows results, in the form of a confusion matrix, where perturbations are added at one feature's locations (e.g., SURF) and the effect on the other features is evaluated. It can be seen that scale-specific P2P perturbations added at SURF locations have the best generalization, as they deceive all the handcrafted features considered.

5.1.4 Generalization over Viewpoint and Illumination Changes: Figure 5 indicates a significant decline in feature matching across viewpoint and illumination changes.

[Figure 5 panels: matched feature count plotted against viewpoint change (pairs 1-1 to 1-6) and against illumination change (pairs 1-1 to 1-6).]

Figure 5: Generalization over Viewpoint & Illumination Changes. Number of feature matches between an image pair (x-y). For the perturbed version, noise is added in x only. There are six different viewpoints/illuminations per scene in the HPatches dataset. It can be seen that the scale-specific P2P perturbation remains effective when matching across viewpoint/illumination changes.
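The scale-specific scheme of Section 4.3.4, which is also used for the pipeline attacks in the next section, reduces to a small kernel-selection rule. The sketch below writes it out; the mapping from a SURF keypoint's reported size to the paper's "scale" is an assumption, and the three bands follow the (5×5)-(7×7)-(9×9) scheme.

```python
# Scale-specific kernel selection (Section 4.3.4), written out as a small helper.
def scale_specific_kernel(scale, scheme=(5, 7, 9)):
    """Pick the perturbation kernel size for a SURF keypoint of a given scale."""
    small, medium, large = scheme          # e.g. (5, 7, 9), (7, 9, 11), (9, 11, 13)
    if scale < 5:
        return small                       # scale < 5         -> smallest kernel
    elif scale <= 10:
        return medium                      # 5 <= scale <= 10  -> middle kernel
    else:
        return large                       # scale > 10        -> largest kernel

# usage: k = scale_specific_kernel(kp.size); then apply ASM/P2P/PPS with a kxk patch
```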
[Figure 6 panels: (left) trajectories estimated by SVO 2.0 on original and perturbed videos; (right) ORB-SLAM trajectory before loop closing, trajectory after loop closing, and loop closing failed.]

Figure 6: Attacking Dense Visual Odometry & SLAM. (Left) Trajectories generated by SVO 2.0 with original and perturbed videos (3 scenes). The difference in trajectories caused by the perturbations is evident. (Right) Loop-closure failure in ORB-SLAM.

Table 3: Attacking Dense Visual Odometry. Drift generated by P2P perturbations on SVO 2.0.

Dataset | No. of Videos | Min. Traj. Error | Max. Traj. Error | Avg. Traj. Error
RGB-D TUM | 12 | 0.45 m | 12.21 m | 2.54 m
Own Videos | 2 | 3.31 m | 4.67 m | 3.99 m

5.2 Attacking Well Known Pipelines

The success of the attacks from the last section may vary by tweaking various parameters. In this section we show the effectiveness of our adversarial noise on publicly available pipelines. The scale-specific P2P perturbation is used for the following tests.

5.2.1 Visual Odometry/SLAM: For a given video sequence, instead of perturbing every frame, we perturbed alternate batches of 12 consecutive frames with adversarial noise. Even with this sparse sampling, dense visual odometry (SVO 2.0 [15]) shows visible drift (Figure 6 and Table 3). We have used the benchmark presented in [45] for calculating trajectory errors. Note that SVO 2.0 utilizes the entire image instead of extracted features; even then, our adversarial noise affects its performance. A similar attack on SLAM (ORB-SLAM [36]) was more severe, as it failed to even initialize or lost tracking within seconds on the perturbed videos.

5.2.2 Loop Closure in SLAM: We sampled videos from the RGB-D TUM benchmark [45] that include loop closures (Table 4). ORB-SLAM managed to close the loop on only a subset of these videos without the adversarial noise. Therefore, we gathered a few additional videos and verified ORB-SLAM's loop closure on them. To attack the loop closure specifically, we added noise to the frames neighbouring the loop-closing frame. ORB-SLAM's loop closure fails 100% of the time (Figure 6 and Table 4). For further inspection we sampled these video sequences. The sampled frames included the initial frame and the neighbourhood around the original loop-closing frame, selected via manual inspection. Without noise, two publicly available loop-closing pipelines, i.e., DBoW [18] and FAB-MAP [10], selected the correct match (out of 29 frames) for the initial frame. After perturbing this best matching frame only, both pipelines show considerable degradation in its rank and matching score (Table 4). This degradation means an incorrect frame is selected as the best match.

5.2.3 3D Reconstruction: We provide qualitative results of attacking a dense reconstruction pipeline, i.e., CMVS [16, 17]. For this attack, we added the noise to all frames. Figure 7 shows significant degradation in the reconstructed 3D scene after perturbation. Note that this pipeline is based on SIFT. This affirms the generalizing ability of our adversarial noise.
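The batched perturbation schedule used in Section 5.2.1 can be sketched in a few lines. This is an illustration only: which batches receive noise (even vs. odd) is an assumption, and perturb_frame stands for any of the perturbations above (e.g., scale-specific P2P).

```python
# Sketch of the batched video attack: alternate batches of 12 consecutive frames
# are perturbed, the remaining frames are left untouched.
def perturb_video(frames, perturb_frame, batch=12):
    """Perturb every other batch of `batch` consecutive frames."""
    out = []
    for idx, frame in enumerate(frames):
        if (idx // batch) % 2 == 0:        # even-numbered batches get adversarial noise
            out.append(perturb_frame(frame))
        else:                              # odd-numbered batches stay clean
            out.append(frame)
    return out
```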
Table 4: Attacking Loop Closing Pipelines. Results for loop-closure (LC) routines after adding P2P perturbations. LCF: loop-closure failure after perturbation. PMSD: percentage matching score decline of the best match. APD: average position downgrade of the best match. The best match is the top-ranked match of the reference image before adding the perturbation. Its matching score declines after perturbing it, and it falls in the ranking of best matches (out of 29 images) of the reference image.

Dataset | ORB-SLAM [36]: Avail. LC / Succ. LC / LCF | DBoW [18]: No. of Videos / PMSD / APD | FAB-MAP [10]: No. of Videos / PMSD / APD
RGBD-TUM | 12 / 3 / 100% | 15 / 36.30% / 9 | 15 / 83.08% / 16
Own Videos | 5 / 5 / 100% | 5 / 26.58% / 14 | 5 / 82.02% / 17

[Figure 7 panels: three scenes reconstructed with original images and with perturbed images, with zoomed patches (original left, perturbed right).]

Figure 7: Attacking Dense 3D Reconstruction. 3D scenes reconstructed from original and P2P-perturbed images using CMVS. The areas where reconstruction failed due to the perturbation are highlighted.

6 Conclusion

In this work we have shown, for the first time, that robust local features are not as robust in the adversarial domain. Our novel method helps in generating privacy-preserving digital content, making it harder to register such images against a stored database using local handcrafted features. This opens a few exciting avenues for future research. Is it possible to extend these attacks on local handcrafted features to the physical world? Is it possible to generate a targeted attack on local features, forcing unrelated images to match? How does this noise affect DNNs? How does deep adversarial noise affect handcrafted features?

Acknowledgement

This work is funded by the Higher Education Commission (HEC), Govt. of Pakistan through its grant NRPU-6025-2016/2017. We are extremely thankful to the research students (Saran, Rafi, Suneela, and Usama among others) at ROMI Lab for helping test the various odometry / SLAM / reconstruction pipelines.

References

[1] Anish Athalye, Logan Engstrom, Andrew Ilyas, and Kevin Kwok. Synthesizing robust adversarial examples. In ICML, 2018.
[2] Vassileios Balntas, Karel Lenc, Andrea Vedaldi, and Krystian Mikolajczyk. HPatches: A benchmark and evaluation of handcrafted and learned local descriptors. In CVPR, 2017.
[3] Herbert Bay, Tinne Tuytelaars, and Luc Van Gool. SURF: Speeded up robust features. In ECCV, 2006.
[4] Marcelo Bertalmio, Guillermo Sapiro, Vincent Caselles, and Coloma Ballester. Image inpainting. In Proceedings of the 27th Annual Conference on Computer Graphics and Interactive Techniques, 2000.
[5] Battista Biggio, Igino Corona, Davide Maiorca, Blaine Nelson, Nedim Šrndić, Pavel Laskov, Giorgio Giacinto, and Fabio Roli. Evasion attacks against machine learning at test time. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, 2013.
[6] Zinelabidine Boulkenafet, Jukka Komulainen, and Abdenour Hadid. Face spoofing detection using colour texture analysis. IEEE Transactions on Information Forensics and Security, 2016.
[7] Tom B Brown, Dandelion Mané, Aurko Roy, Martín Abadi, and Justin Gilmer. Adversarial patch. arXiv preprint arXiv:1712.09665, 2017.
[8] Damon M Chandler and Sheila S Hemami. VSNR: A wavelet-based visual signal-to-noise ratio for natural images. IEEE Transactions on Image Processing, 2007.
[9] Moustapha Cisse, Piotr Bojanowski, Edouard Grave, Yann Dauphin, and Nicolas Usunier. Parseval networks: Improving robustness to adversarial examples. In ICML, 2017.
[10] Mark Cummins and Paul Newman. FAB-MAP: Probabilistic localization and mapping in the space of appearance. The International Journal of Robotics Research, 2008.
[11] Harrison Edwards and Amos Storkey. Censoring representations with an adversary. arXiv preprint arXiv:1511.05897, 2015.
[12] Kevin Eykholt, Ivan Evtimov, Earlence Fernandes, Bo Li, Amir Rahmati, Chaowei Xiao, Atul Prakash, Tadayoshi Kohno, and Dawn Song. Robust physical-world attacks on deep learning visual classification. In CVPR, 2018.
[13] Martin A Fischler and Robert C Bolles. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 1981.
[14] Florian Tramèr, Pascal Dupré, Gili Rusak, Giancarlo Pellegrino, and Dan Boneh. AdVersarial: Perceptual ad-blocking meets adversarial machine learning. arXiv preprint arXiv:1811.03194v2, 2019.
[15] Christian Forster, Zichao Zhang, Michael Gassner, Manuel Werlberger, and Davide Scaramuzza. SVO: Semidirect visual odometry for monocular and multicamera systems. IEEE Transactions on Robotics, 2017.
[16] Yasutaka Furukawa and Jean Ponce. Accurate, dense, and robust multiview stereopsis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2010.
[17] Yasutaka Furukawa, Brian Curless, Steven M Seitz, and Richard Szeliski. Towards internet-scale multi-view stereo. In CVPR, 2010.
[18] Dorian Gálvez-López and Juan D Tardos. Bags of binary words for fast place recognition in image sequences. IEEE Transactions on Robotics, 2012.
[19] Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014.
[20] Michael Gygli, Helmut Grabner, Hayko Riemenschneider, Fabian Nater, and Luc Van Gool. The interestingness of images. In ICCV, 2013.
[21] Christopher G Harris, Mike Stephens, et al. A combined corner and edge detector. In Alvey Vision Conference, 1988.
[22] Richard Hartley and Andrew Zisserman. Multiple View Geometry in Computer Vision. Cambridge University Press, 2003.
[23] Alain Hore and Djemel Ziou. Image quality metrics: PSNR vs. SSIM. In 20th International Conference on Pattern Recognition, 2010.
[24] Andrew Ilyas, Logan Engstrom, Anish Athalye, and Jessy Lin. Black-box adversarial attacks with limited queries and information. In ICML, 2018.
[25] Aditya Khosla, Atish Das Sarma, and Raffay Hamid. What makes an image popular? In Proceedings of the 23rd International Conference on World Wide Web. ACM, 2014.
[26] Alexey Kurakin, Ian Goodfellow, and Samy Bengio. Adversarial examples in the physical world. arXiv preprint arXiv:1607.02533, 2016.
[27] Stefan Leutenegger, Margarita Chli, and Roland Y Siegwart. BRISK: Binary robust invariant scalable keypoints. In ICCV, 2011.
[28] Bo Li and Yevgeniy Vorobeychik. Feature cross-substitution in adversarial classification. In NIPS, 2014.
[29] Hsueh-Ti Derek Liu, Michael Tao, Chun-Liang Li, Derek Nowrouzezahrai, and Alec Jacobson. Beyond pixel norm-balls: Parametric adversaries using an analytically differentiable renderer. In International Conference on Learning Representations, 2019.
[30] David G Lowe. Distinctive image features from scale-invariant keypoints. IJCV, 2004.
[31] Jiajun Lu, Hussein Sibai, Evan Fabry, and David Forsyth. No need to worry about adversarial examples in object detection in autonomous vehicles. arXiv preprint arXiv:1707.03501, 2017.
[32] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083, 2017.
[33] Jiri Matas, Ondrej Chum, Martin Urban, and Tomás Pajdla. Robust wide-baseline stereo from maximally stable extremal regions. Image and Vision Computing, 2004.
[34] Krystian Mikolajczyk, Tinne Tuytelaars, Cordelia Schmid, Andrew Zisserman, Jiri Matas, Frederik Schaffalitzky, Timor Kadir, and Luc Van Gool. A comparison of affine region detectors. IJCV, 2005.
[35] Greg Mori and Jitendra Malik. Recognizing objects in adversarial clutter: Breaking a visual CAPTCHA. In CVPR. IEEE, 2003.
[36] Raul Mur-Artal, Jose Maria Martinez Montiel, and Juan D Tardos. ORB-SLAM: A versatile and accurate monocular SLAM system. IEEE Transactions on Robotics, 2015.
[37] Anh Nguyen, Jason Yosinski, and Jeff Clune. Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. In CVPR, 2015.
[38] Edward Rosten and Tom Drummond. Machine learning for high-speed corner detection. In ECCV, 2006.
[39] Pouya Samangouei, Maya Kabkab, and Rama Chellappa. Defense-GAN: Protecting classifiers against adversarial attacks using generative models. arXiv preprint arXiv:1805.06605, 2018.
[40] Johannes L Schonberger, Hans Hardmeier, Torsten Sattler, and Marc Pollefeys. Comparative evaluation of hand-crafted and learned local features. In CVPR, 2017.
[41] Vikash Sehwag, Chawin Sitawarin, Arjun Nitin Bhagoji, Arsalan Mosenia, Mung Chiang, and Prateek Mittal. Not all pixels are born equal: An analysis of evasion attacks under locality constraints. In ACM SIGSAC Conference on Computer and Communications Security, 2018.
[42] Aman Sinha, Hongseok Namkoong, and John Duchi. Certifiable distributional robustness with principled adversarial training. stat, 1050:29, 2017.
[43] Yang Song, Taesup Kim, Sebastian Nowozin, Stefano Ermon, and Nate Kushman. PixelDefend: Leveraging generative models to understand and defend against adversarial examples. In International Conference on Learning Representations, 2018.
[44] Yang Song, Rui Shu, Nate Kushman, and Stefano Ermon. Constructing unrestricted adversarial examples with generative models. In NIPS, 2018.
[45] J. Sturm, N. Engelhard, F. Endres, W. Burgard, and D. Cremers. A benchmark for the evaluation of RGB-D SLAM systems. In Proc. of the International Conference on Intelligent Robot Systems (IROS), Oct. 2012.
[46] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013.
[47] Zhou Wang, Alan C Bovik, Hamid R Sheikh, Eero P Simoncelli, et al. Image quality assessment: From error visibility to structural similarity. IEEE Transactions on Image Processing, 2004.
[48] Guixin Ye, Zhanyong Tang, Dingyi Fang, Zhanxing Zhu, Yansong Feng, Pengfei Xu, Xiaojiang Chen, and Zheng Wang. Yet another text CAPTCHA solver: A generative adversarial network based approach. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security. ACM, 2018.