Adversarial Examples for Handcrafted
Features
Zohaib Ali
zali.msee16seecs@seecs.edu.pk
Muhammad Latif Anjum
latif.anjum@seecs.edu.pk
Wajahat Hussain
wajahat.hussain@seecs.edu.pk
Robotics & Machine Intelligence (ROMI) Lab
National University of Sciences and Technology, Islamabad, Pakistan
Figure 1: Attacking Local Feature Matching. Original image and its perturbed version
using our novel adversarial noise. The decline in SURF [3] feature matching is 92.07%
(1463 matches with original-original and 116 with perturbed-original) with little perceptual
change.
Abstract
Adversarial examples have exposed the weakness of deep networks. Careful modification of the input fools the network completely. Little work has been done to expose the weakness of handcrafted features in adversarial settings. In this work, we propose novel adversarial perturbations for handcrafted features. Pixel-level analysis of handcrafted features reveals simple modifications which considerably degrade their performance. These perturbations generalize over different features, viewpoint and illumination changes. We demonstrate successful attacks on several well-known pipelines (SLAM, visual odometry, SfM, etc.). Extensive evaluation is presented on multiple public benchmarks.
1 Introduction
Is there any domain where deep features are not the best performers? Surprisingly, image registration is one area where handcrafted features still outperform these data-driven features
[40]. Perhaps the lack of appropriate training data is the reason behind this anomalous lag. Labeling interest points is much more tedious than labeling objects, e.g., there can be more than a thousand interest points in a single image as compared to a thousand object instances in an entire dataset.
Recently there has been a surge in adversarial example generation that fools these deep systems. These adversarial examples can have either imperceptible changes [19] or blatant editing [7] of the input for tricking DNNs. Although blatant, this editing appears harmless, e.g., a black patch somewhere far from the object of interest. Ironically, the stock example used to advocate the gravity of this weakness is one where an autonomous car misinterprets a stop sign as a high-speed (50 kmph) zone [31]. Visual odometry is the fundamental method used by moving agents to estimate their location. This method relies on handcrafted features. This raises the question of whether such weaknesses exist for these handcrafted features. This work is focused on investigating the existence of such adversarial examples for handcrafted features.
One way to approach this challenge is to understand the adversarial example generation process for DNNs. The key ingredient in the deep adversarial example recipe is the end-to-end differentiability of the DNN [46]. This sound analytical formulation makes adversarial example generation appear quite simple, i.e., change the input slightly so that the output decision of the DNN changes. On the other hand, handcrafted features, as the name suggests, comprise heuristics and well-thought-out discrete steps that have evolved over decades [3, 30, 34]. The pipeline for interest points includes gradient calculation for corner detection, orientation assignment, and scale calculation for invariant matching. Gradient calculation involves thresholding to find significant gradients. Orientation is determined by quantizing the gradients into a fixed number of angles. Scale is determined by searching over a discrete scale-space pyramid. These discrete steps indicate that this pipeline will not have end-to-end differentiability.
In the case of a DNN, a nonlinear process maps the input to the output. The deep pipeline therefore amplifies a slight change in the input into a totally different output. On the contrary, the interest point pipeline works with direct pixel values, e.g., an edge is a simple difference between two neighbouring pixels. Is it possible to slightly modify the pixels and get a totally different output for interest points?
In this work, we share pixel-level insights that expose weaknesses of these handcrafted pipelines. The simplest case of image registration, i.e., no scale, illumination or view changes, is shown in Figure 1. The image is registered with itself. As expected, a large number of features match. We then add our novel perturbation to the image and match the original image with its perturbed version. Even for this simple case of image registration, the number of matches is significantly reduced.
Following are the contributions of our work:
1. This work is the first attempt, to the best of our knowledge, to demonstrate adversarial examples for handcrafted features in the context of natural scenes.
2. Our adversarial noise generalizes over different local features, viewpoints and illuminations with varying degrees of success.
3. We demonstrate successful attacks on well-known image registration, structure-from-motion (SfM), SLAM and loop-closing pipelines with varying degrees of success.
Our novel perturbation scheme helps in censoring personal images. An image should
only be used for the intended purpose and nothing more [11]. Imagine the implications if
an image, uploaded on social websites, is registered with a stored database (Google maps)
to find its location. Using our novel method, the owner of digital content can reduce the
chances of such privacy violations. Furthermore, our novel perturbation scheme does not considerably affect image perceptual quality, keeping its likeability [25] and interestingness [20] intact.
2 Related Work
The concern of adversarial examples was raised in the machine learning and robotics communities as a theoretical exercise when it was successfully demonstrated that deep neural networks can be deceived with a small perturbation in digital images [5, 19, 28, 37, 46]. It was reported
that these adversarial examples do not pose a threat to autonomous systems owing to their
rigid viewpoint and scale matching requirements [31]. This was quickly overruled when
adversarial examples for the physical world were proposed in the form of an adversarial patch [7], adversarial stickers resembling graffiti [12] and printed adversarial images [26]. Adversarial examples for 3D-printed physical objects have also been presented, which are robust to viewpoint changes and can be misclassified from every viewpoint nearly 100% of the time [1].
The most recent work on adversarial examples is focused on generating adversarial examples entirely from scratch using GANs [44], integrating locality constraints into the optimization equation [41], creating black-box adversarial attacks which do not require access to classifier gradients [24], and using parametric lighting and image geometry to design adversarial attacks for deep features [29]. Strategies to defend against such adversarial attacks
are proposed in [9, 32, 39, 42, 43].
Before the arms race (adversarial attacks vs robust deep networks) started, the most common example of adversarial attacks on vision revolved around CAPTCHAs [35]. Written
text was distorted by adding clutter (occluding lines) and random transformations (rotation,
warping) so that contemporary text detectors fail to read the text [48]. Early CAPTCHA
solvers were able to detect text reliably even under adversarial clutter using pattern matching
(object detection) techniques [35]. Face spoofing was another attack on classic vision systems [6]. These attacks on classic vision systems involved changing the input considerably.
Imperceptibility was not the goal. Our attack on traditional local features requires as little
modification in input image as possible.
The work closest to ours comes from perceptual ad blocking [14], where discrete noise has been added to deceive SIFT [30] features. This has been done on a very small patch of the image (containing the AdChoices logo) to deceive ad-blocking algorithms. Our work is considerably different from this work. We have proposed perturbations that generalize over viewpoint and illumination changes and work for different local features. Furthermore, small logos contain few local features as compared to the natural scenes we are attacking.
3 Pixel Level Insight into Handcrafted Features
What happens to handcrafted features if the pixel at which the feature is detected and its neighboring pixels are perturbed? The answer is tricky as it all depends on the pixels around the patch that is being perturbed. The feature may completely vanish from its original location and its surroundings. This is termed a successful attack on the feature detector. However, the feature may also be displaced by a few pixels from its original location after perturbation. This is considered an unsuccessful attack on the detector because the feature is still being detected (albeit displaced by a few pixels). This, however, does not mean the perturbations have not affected the feature. The power of handcrafted features lies in their descriptors, which are used to match these features. To analyze the effectiveness of perturbations in such situations, we estimate the outlier-free matches (correspondences) using a fundamental matrix / epipolar line check involving RANSAC [13, 22]. A successful attack on the descriptor is established if a feature displaced due to perturbations fails to match with the original feature. An unsuccessful attack is one where the feature is detected (either at the same pixel or displaced) and matched successfully. Pixel-level insight into all these categories is provided in Figure 2 for Harris [21] corners after the addition of Gaussian blur and P2P perturbations (P2P and other perturbations are explained in Section 4). Because of the discrete nature of handcrafted features, it is difficult to predict the effect of perturbations on them. Inspired by this pixel-level insight, we designed various discrete perturbations to successfully affect handcrafted features.
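To make this bookkeeping concrete, the following minimal Python sketch classifies the outcome of perturbing a single feature. The 9×9 window follows Figure 2; the function and argument names are ours, and the `matched` flag is assumed to come from the RANSAC / epipolar-line check described above.

```python
def classify_attack(orig_pt, perturbed_pts, matched, window=9):
    """Classify the outcome of perturbing one detected feature.

    orig_pt       : (x, y) location of the feature in the original image
    perturbed_pts : list of (x, y) detections in the perturbed image
    matched       : True if the (possibly displaced) feature still matched its
                    original counterpart after the RANSAC / epipolar-line check
    """
    half = window // 2
    # Does any detection in the perturbed image survive inside the 9x9 window?
    survived = any(abs(px - orig_pt[0]) <= half and abs(py - orig_pt[1]) <= half
                   for (px, py) in perturbed_pts)
    if not survived:
        return "successful detector attack"    # corner removed from the window
    if not matched:
        return "successful descriptor attack"  # still detected, but no longer matches
    return "unsuccessful attack"               # detected and matched
```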
Figure 2: Pixel-level insight. The pixel at which the corner is detected is colored green. The panels compare an original patch with its Gaussian-blurred (3×3) and P2P-perturbed (3×3) versions and illustrate the three outcomes: successful detector attack (corner removed from the 9×9 window), successful descriptor attack (corner detected within the 9×9 window but not matched), and unsuccessful attack (corner detected within the 9×9 window and matched). The takeaway from this insight is that, with our adversarial noise, the corner is either removed (successful detector attack) or its descriptor is corrupted (successful descriptor attack) with little visual change to the naked eye. Note that for our best perturbation, successful detector and descriptor attacks on Harris constitute 99% of corners.
4 Formulating the Adversarial Perturbations
We divide our adversarial perturbation schemes into three groups: Gaussian blur, inpainted
perturbations, and discrete perturbations. Each scheme is discussed separately.
4.1 Gaussian Blur
Almost all handcrafted features utilize image gradients in one way or another. Any smoothing filter discretely applied at the locations of detected features may disturb the local gradients and therefore the features. Our first perturbation, therefore, is Gaussian blur of various kernel sizes and sigma values applied at the locations of features. Our experimental results show that Gaussian blur drastically degrades image quality and fails to significantly deceive most of the features.
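As an illustration, here is a minimal Python/OpenCV sketch of applying Gaussian blur only in patches around detected features. The paper's experiments use MATLAB's detectors, so the OpenCV corner detector, the placeholder filename and the parameter values below are our assumptions.

```python
import cv2

def local_gaussian_blur(gray, keypoints, ksize=15, sigma=1.0):
    """Blur only the ksize x ksize patch around each keypoint location."""
    out = gray.copy()
    blurred = cv2.GaussianBlur(gray, (ksize, ksize), sigma)
    h, w = gray.shape
    r = ksize // 2
    for (x, y) in keypoints:                       # keypoints as integer (x, y) tuples
        x0, x1 = max(x - r, 0), min(x + r + 1, w)
        y0, y1 = max(y - r, 0), min(y + r + 1, h)
        out[y0:y1, x0:x1] = blurred[y0:y1, x0:x1]  # paste the blurred patch
    return out

# Example usage (hypothetical filename; any corner detector can supply the locations)
img = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)
pts = cv2.goodFeaturesToTrack(img, maxCorners=500, qualityLevel=0.01, minDistance=5)
pts = [tuple(map(int, p.ravel())) for p in pts]
perturbed = local_gaussian_blur(img, pts, ksize=15, sigma=1.0)
```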
4.2 Inpainted Perturbations
Image inpainting [4] methods fill holes in an image using the surrounding context and produce realistic results. We leverage this technique to produce adversarial yet imperceptible noise.
4.2.1 Average Squared Mask (ASM) Perturbation: ASM is applied in two steps (Figure 3a). Firstly, a fixed region (3×3 or 5×5) around the detected feature is averaged to remove the corner, much like Gaussian blur. Secondly, inpainting is deployed to refill the averaged pixels. This results in a perceptually original-looking patch with minute changes that affect the local feature, as shown in the results.
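A minimal sketch of ASM, assuming integer (x, y) keypoint locations and using OpenCV's built-in inpainting as a stand-in for the inpainting of [4].

```python
import cv2
import numpy as np

def asm_perturbation(gray, keypoints, k=3):
    """Average Squared Mask: average a k x k patch at each keypoint, then inpaint it."""
    out = gray.copy()
    mask = np.zeros_like(gray, dtype=np.uint8)
    r = k // 2
    h, w = gray.shape
    for (x, y) in keypoints:
        y0, y1 = max(y - r, 0), min(y + r + 1, h)
        x0, x1 = max(x - r, 0), min(x + r + 1, w)
        out[y0:y1, x0:x1] = int(gray[y0:y1, x0:x1].mean())  # step 1: average the patch
        mask[y0:y1, x0:x1] = 255                            # step 2: mark it for inpainting
    # Navier-Stokes inpainting, used here only as a stand-in for the method of [4]
    return cv2.inpaint(out, mask, 3, cv2.INPAINT_NS)
```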
4.2.2 Dominant Direction (DD) Perturbation for SURF: Instead of a rectangular region (3×3, 5×5), we tried a greedy approach of adding perturbations along the dominant direction of SURF features and its perpendicular direction (Figure 3b), followed by inpainting to refill the perturbed regions. Each pixel lying along this cross within the patch is replaced with a white pixel before inpainting.
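A sketch of the DD perturbation under similar assumptions. SURF keypoints (with their orientation in kp.angle) are assumed to come from OpenCV's contrib module (cv2.xfeatures2d.SURF_create); the arm length of the cross is our choice, since the exact extent in Figure 3b is not specified.

```python
import cv2
import numpy as np

def dd_perturbation(gray, surf_keypoints, arm=3):
    """Dominant Direction: whiten a cross along each SURF keypoint's orientation
    and its perpendicular, then inpaint the whitened pixels."""
    out = gray.copy()
    mask = np.zeros_like(gray, dtype=np.uint8)
    h, w = gray.shape
    for kp in surf_keypoints:                       # cv2.KeyPoint with .pt and .angle
        cx, cy = int(round(kp.pt[0])), int(round(kp.pt[1]))
        theta = np.deg2rad(kp.angle)
        for phi in (theta, theta + np.pi / 2):      # dominant direction and its perpendicular
            dx, dy = np.cos(phi), np.sin(phi)
            for t in range(-arm, arm + 1):
                x, y = int(round(cx + t * dx)), int(round(cy + t * dy))
                if 0 <= x < w and 0 <= y < h:
                    out[y, x] = 255                 # white pixel along the cross
                    mask[y, x] = 255
    return cv2.inpaint(out, mask, 3, cv2.INPAINT_NS)
```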
Figure 3: Our novel adversarial perturbation schemes. (a) Average Square Mask (ASM): original patch, noise added, inpainting mask, inpainted patch. (b) Dominant Direction (DD): original patch, noise added, inpainting mask, inpainted patch. (c) P2P perturbation scheme. (d) PPS perturbation scheme. (e) B2B perturbation scheme, with the four overlapping blocks of the 3×3 patch centered at the feature and Block 1 repeated with its external pixels in the surrounding 5×5.
4.3 Discrete Perturbations
Instead of modifying the image regions in a black-box manner, we developed a few perturbations that rely on local averaging. This averaging significantly affects the local gradients while keeping the distortion low.
4.3.1 Pixel-2-Pixel (P2P) Perturbation: P2P perturbations are designed to affect every pixel of the patch (3×3, 5×5) around the feature location. Each pixel of a fixed-size patch around the feature location is replaced with the average of two pixels: its neighbours in the previous row and previous column (Figure 3c), as given in Equation 1. It must be noted that during P2P perturbations, the edge pixels of the patch are replaced by the average of pixels outside the patch. This arrangement ensures no sharp edge is generated at the boundary of the patch after perturbation. Pixels are modified sequentially starting from the top left.
p_{i,j} = \frac{p_{i,j-1} + p_{i-1,j}}{2} \qquad (1)
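A direct sketch of Equation 1 applied inside a k×k patch. Because the sweep proceeds from the top left and works in place, the left/upper neighbours of interior pixels are already-perturbed values, while the patch's edge pixels draw on untouched pixels outside the patch, as described above. Keypoints are assumed to be integer (x, y) tuples away from the image border.

```python
import numpy as np

def p2p_perturbation(gray, keypoints, k=3):
    """Pixel-2-Pixel: inside a k x k patch at each keypoint, replace every pixel
    with the average of its left and upper neighbours (Eq. 1), sweeping from the
    top left."""
    out = gray.astype(np.float32)
    r = k // 2
    h, w = gray.shape
    for (x, y) in keypoints:
        if x - r < 1 or y - r < 1 or x + r >= w or y + r >= h:
            continue                                  # skip patches touching the border
        for i in range(y - r, y + r + 1):             # rows, top to bottom
            for j in range(x - r, x + r + 1):         # columns, left to right
                out[i, j] = 0.5 * (out[i, j - 1] + out[i - 1, j])
    return np.clip(out, 0, 255).astype(np.uint8)
```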
4.3.2 Pixel-2-Pixel-Scattered (PPS) Perturbation: The scattered P2P perturbation routine modifies every other pixel (along the horizontal and vertical axes) within the patch instead of modifying every pixel (dark pixels in Figure 3d). This scattered nature of the perturbation keeps the image quality degradation low. We have used simple averaging of nine pixels, i.e., the dark pixels in Figure 3d are replaced by the average of their 3×3 neighbourhood.
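A sketch of PPS under the same assumptions; whether the 3×3 averages are taken over the original or the already-modified pixels is not stated, so this sketch uses the original values.

```python
import numpy as np

def pps_perturbation(gray, keypoints, k=5):
    """Pixel-2-Pixel Scattered: inside a k x k patch at each keypoint, replace a
    stride-2 grid of pixels (every other row and column) with the average of its
    3 x 3 neighbourhood in the original image."""
    out = gray.astype(np.float32)
    src = gray.astype(np.float32)                  # averages taken over the original pixels
    r = k // 2
    h, w = gray.shape
    for (x, y) in keypoints:
        if x - r < 1 or y - r < 1 or x + r >= w or y + r >= h:
            continue
        for i in range(y - r, y + r + 1, 2):       # every other row ...
            for j in range(x - r, x + r + 1, 2):   # ... and every other column
                out[i, j] = src[i - 1:i + 2, j - 1:j + 2].mean()
    return np.clip(out, 0, 255).astype(np.uint8)
```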
4.3.3 Block-2-Block (B2B) Perturbation: The Block-2-Block scheme is a greedy approach which generates the perturbation at a coarser block level instead of the pixel level (Figure 3e). The perturbation scheme is as follows: 1) A 3×3 patch around each feature is divided into four overlapping blocks as shown in Figure 3e. 2) The decision to perturb a pixel is taken at the block level. The sum of all four pixels in each block is evaluated, and the block with the highest sum is taken as the reference block that will not be perturbed. The sum of pixels in each block is compared with the reference block, and only those blocks are perturbed whose difference is greater than a threshold. This can leave one, two or all three blocks (except the reference block) as candidates to undergo perturbation. 3) Once a block is selected as a candidate to undergo perturbation, we replace each pixel of the block with the average of its three neighboring pixels outside the block (Figure 3e). Note that since the blocks are overlapping, one pixel may undergo perturbation more than once.
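The following sketch captures the block-selection logic of B2B under stated assumptions: the four overlapping 2×2 blocks tile the 3×3 patch as in Figure 3e, but since the exact three external pixels used for the replacement are defined only by the figure, the sketch simply averages each pixel's 8-neighbours that fall outside the current block. It is a simplified reading, not the exact replacement rule.

```python
import numpy as np

def b2b_perturbation(gray, keypoints, thresh=50):
    """Block-2-Block (simplified sketch): split the 3x3 patch at each keypoint into
    four overlapping 2x2 blocks, keep the block with the highest pixel sum as the
    reference, and perturb every block whose sum differs from it by more than
    `thresh` by replacing each of its pixels with the mean of the neighbouring
    pixels that lie outside that block (an assumed reading of Figure 3e)."""
    out = gray.astype(np.float32)
    h, w = gray.shape
    # pixel offsets of the four overlapping 2x2 blocks inside the 3x3 patch
    blocks = [[(-1, -1), (-1, 0), (0, -1), (0, 0)],   # top-left
              [(-1, 0), (-1, 1), (0, 0), (0, 1)],     # top-right
              [(0, -1), (0, 0), (1, -1), (1, 0)],     # bottom-left
              [(0, 0), (0, 1), (1, 0), (1, 1)]]       # bottom-right
    for (x, y) in keypoints:
        if x < 2 or y < 2 or x >= w - 2 or y >= h - 2:
            continue
        sums = [sum(out[y + di, x + dj] for di, dj in b) for b in blocks]
        ref = max(sums)
        for b, s in zip(blocks, sums):
            if ref - s <= thresh:
                continue                               # reference (and similar) blocks untouched
            block_set = {(y + di, x + dj) for di, dj in b}
            for (pi, pj) in block_set:
                outside = [out[pi + di, pj + dj]
                           for di in (-1, 0, 1) for dj in (-1, 0, 1)
                           if (di, dj) != (0, 0) and (pi + di, pj + dj) not in block_set]
                out[pi, pj] = float(np.mean(outside))  # neighbours outside the block
    return np.clip(out, 0, 255).astype(np.uint8)
```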
4.3.4 Scale-specific Perturbation for SURF: SURF features are scale-invariant and additionally provide the scale at which a keypoint is detected. This scale information can be utilized for designing a perturbation to deceive SURF features. In the scale-specific perturbation, the kernel size of the ASM, P2P and PPS perturbations is linked with the scale of the keypoints, which is divided into three categories: 1) for keypoints with scale less than 5, a 5×5 kernel is used; 2) for keypoints with scale between 5 and 10, a 7×7 kernel is used; and 3) for keypoints with scale greater than 10, a 9×9 kernel is used. This constitutes a scale-specific perturbation with kernel sizes (5×5)-(7×7)-(9×9) for the three scale categories. We have also tested two other schemes of kernel sizes, i.e., (7×7)-(9×9)-(11×11) and (9×9)-(11×11)-(13×13).
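A minimal sketch of the scale-to-kernel mapping; the handling of the boundary scales 5 and 10 is our assumption, and in OpenCV the keypoint's size attribute would stand in for MATLAB's SURF scale.

```python
def scale_specific_kernel(scale, scheme=(5, 7, 9)):
    """Map a SURF keypoint scale to a perturbation kernel size.
    `scheme` lists the kernel sizes for the three scale categories, e.g.
    (5, 7, 9), (7, 9, 11) or (9, 11, 13)."""
    if scale < 5:
        return scheme[0]       # small-scale keypoints
    if scale <= 10:
        return scheme[1]       # medium-scale keypoints
    return scheme[2]           # large-scale keypoints

# e.g., pick a kernel per keypoint and feed it to the P2P routine sketched above:
# k = scale_specific_kernel(kp.size, scheme=(5, 7, 9))
# perturbed = p2p_perturbation(gray, [tuple(map(int, kp.pt))], k=k)
```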
5 Experimental Results
We show the effectiveness of our adversarial noise on robust image registration and various publicly available pipelines. We report results on two competing indicators, i.e., pipeline degradation and image perceptual quality degradation. We use two metrics to ascertain the perceptual quality degradation after perturbation: 1) the structural similarity index (SSIM) [47], and 2) the peak signal-to-noise ratio (PSNR) [8]. Higher values of these metrics mean good image perceptual quality. These two image quality metrics are considered the benchmark for digital content [23]. Image registration results are reported on the recent HPatches dataset [2], which contains 116 different environments with large viewpoint and illumination changes. We have used videos from the RGB-D TUM benchmark [45] for sequential/multiview pipelines.
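For completeness, here is a minimal sketch of how the two quality metrics can be computed; the paper does not specify an implementation, so scikit-image is used here and the filenames are placeholders.

```python
import cv2
from skimage.metrics import structural_similarity, peak_signal_noise_ratio

orig = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)            # placeholder filenames
pert = cv2.imread("scene_perturbed.png", cv2.IMREAD_GRAYSCALE)

ssim_val = structural_similarity(orig, pert, data_range=255)    # in [0, 1]
psnr_val = peak_signal_noise_ratio(orig, pert, data_range=255)  # in dB
print(f"SSIM: {100 * ssim_val:.2f}%  PSNR: {psnr_val:.2f} dB")
```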
Table 1: Results for Gaussian Blur, ASM, P2P, DD, PPS and B2B perturbations on the HPatches dataset. Resilience of SURF features (PFMD boldfaced) against adversarial attacks stands out.
[Table 1 data: PFMD(%), SSIM(%) and PSNR(dB) for Harris [21], FAST [38], BRISK [27], MSER [33] and SURF [3] under Gaussian Blur (σ = 1; 15×15 and 25×25 kernels), ASM (3×3, 5×5), P2P (3×3, 5×5), DD (11×11, 15×15), PPS (3×3, 5×5) and B2B (5×5; thresholds 50 and 20).]
Code to reproduce the experimental results is available at https://github.com/zohaibali-pk/Adversarial-Noises-for-Handcrafted_Features.
5.1 Feature Matching Decline
Our experimental strategy includes outlier-free correspondence calculation, using robust fundamental matrix estimation, before and after adding perturbations. We evaluate the percentage feature matching decline (PFMD) after perturbation using the following equation,

\text{PFMD} = \frac{M_{oo} - M_{op}}{M_{oo}} \times 100\%,

where M_{oo} is the number of feature matches between an image and itself, and M_{op} is the number of feature matches between the image and its perturbed version. For all the local interest point approaches discussed below, e.g., Harris etc., we have used MATLAB's inbuilt routines for the detector and the corresponding descriptor. The same applies for the thresholds involving the fundamental matrix/epipolar line test [13, 22].
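A minimal sketch of the PFMD computation follows. The paper uses MATLAB's detectors, descriptors and fundamental-matrix routines; the ORB detector, brute-force matcher and RANSAC parameters below are only stand-ins to show the structure of the metric.

```python
import cv2
import numpy as np

def inlier_matches(img_a, img_b):
    """Count outlier-free matches between two images using a detector/descriptor
    plus a RANSAC fundamental-matrix (epipolar) check."""
    orb = cv2.ORB_create(nfeatures=2000)            # stand-in for the paper's detectors
    kp_a, des_a = orb.detectAndCompute(img_a, None)
    kp_b, des_b = orb.detectAndCompute(img_b, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des_a, des_b)
    pts_a = np.float32([kp_a[m.queryIdx].pt for m in matches])
    pts_b = np.float32([kp_b[m.trainIdx].pt for m in matches])
    _, inliers = cv2.findFundamentalMat(pts_a, pts_b, cv2.FM_RANSAC, 3.0, 0.99)
    return int(inliers.sum()) if inliers is not None else 0

def pfmd(original, perturbed):
    m_oo = inlier_matches(original, original)       # image matched with itself
    m_op = inlier_matches(original, perturbed)      # image matched with its perturbed version
    return 100.0 * (m_oo - m_op) / m_oo
```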
5.1.1 Perturbations at Single Scale: Results for Gaussian blur, ASM, DD, P2P, PPS and B2B perturbations are reported in Table 1 for two mask sizes. It can be seen that a significant number of features are deceived by the addition of perturbations. Since the scale of the features (in the case of SURF) is ignored, SURF features are not significantly deceived by single-scale perturbations.
5.1.2 Scale-specific Perturbations for SURF: Adding the scale-specific perturbations for SURF outlined in Section 4.3 significantly degrades its performance for various noises (ASM, P2P and PPS), as shown in Table 2.
Table 2: Results for scale-specific perturbations for SURF features. It can be seen that more than 97% of SURF features can be deceived if we allow some image quality degradation (88% SSIM).

Perturbation | (5×5)-(7×7)-(9×9): PFMD(%) / SSIM(%) / PSNR(dB) | (7×7)-(9×9)-(11×11): PFMD(%) / SSIM(%) / PSNR(dB) | (9×9)-(11×11)-(13×13): PFMD(%) / SSIM(%) / PSNR(dB)
ASM | 82.64 / 92.72 / 31.35 | 94.03 / 90.12 / 28.87 | 97.16 / 86.16 / 26.57
P2P | 75.93 / 97.32 / 33.59 | 93.20 / 93.45 / 28.64 | 97.44 / 88.30 / 25.11
PPS | 27.89 / 99.52 / 42.96 | 66.90 / 97.75 / 34.93 | 88.28 / 93.66 / 29.30
[Figure 4 panels: one confusion matrix per perturbation scheme (ASM, P2P, PPS, B2B, scale-specific), over the features Harris, FAST, BRISK, MSER and SURF.]
Figure 4: Confusion matrix illustrating cross-feature generalization of our adversarial perturbation. This data is generated after taking into account a slight change in viewpoint/illumination (pairs 1-3 of the HPatches dataset). The robustness of the scale-specific attack using the P2P scheme is evident. Dark color indicates a successful attack.
5.1.3 Generalization of our Adversarial Perturbation: Figure 4 shows results, in the form of a confusion matrix, where perturbations are added at the locations of one feature type (e.g., SURF) and the effect on other features is evaluated. It can be seen that scale-specific P2P perturbations added to SURF features have the best generalization, as they deceive all the handcrafted features considered.
5.1.4 Generalization over Viewpoint and Illumination Changes: Figure 5 indicates a significant decline in feature matching across viewpoint and illumination changes.
[Figure 5 plots: matched feature count vs. viewpoint change (image pairs 1-2 to 1-6) and vs. illumination change (image pairs 1-2 to 1-6), before and after perturbation.]
Figure 5: Generalization over viewpoint and illumination changes. Number of feature matches between an image pair (x-y). For the perturbed version, noise is added in x only. There are six different viewpoints/illuminations per scene in the HPatches dataset. It can be seen that the scale-specific P2P perturbation is effective when matching across viewpoint/illumination changes.
[Figure 6 panels: attack on visual odometry (SVO 2.0) X-Y trajectories; attack on visual SLAM (ORB-SLAM) with the trajectory before loop closing, the trajectory after loop closing, and loop closing failed.]
Figure 6: Attacking Dense Visual Odometry & SLAM. (Left) Trajectories generated by SVO 2.0 with original and perturbed videos (3 scenes). The difference in trajectories caused by the perturbations is evident. (Right) Loop-closure failure in ORB-SLAM.
Table 3: Attacking Dense Visual Odometry. Drift generated by P2P perturbations on SVO 2.0.

Dataset | No. of Videos | Min. Traj. Error | Max. Traj. Error | Avg. Traj. Error
RGB-D TUM | 12 | 0.45 m | 12.21 m | 2.54 m
Own Videos | 2 | 3.31 m | 4.67 m | 3.99 m

5.2 Attacking Well Known Pipelines
The success of the attacks from the last section may vary by tweaking various parameters. In this section, we show the effectiveness of our adversarial noise on publicly available pipelines. The scale-specific P2P perturbation is used for the following tests.
5.2.1 Visual Odometry/SLAM: For a given video sequence, instead of perturbing every frame, we perturbed alternate batches of 12 consecutive frames with adversarial noise. Even with this sparse sampling, dense visual odometry (SVO 2.0 [15]) shows visible drift (Figure 6 and Table 3). We have used the benchmark presented in [45] for calculating trajectory errors. Note that SVO 2.0 utilizes the entire image instead of extracted features; even then, our adversarial noise affects its performance. A similar attack on SLAM (ORB-SLAM [36]) was more severe, as it either failed to initialize or lost tracking within seconds on the perturbed videos.
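The frame schedule is a simple alternation; a sketch is below (whether the first batch is the perturbed one is our assumption).

```python
def perturb_video(frames, perturb_fn, batch=12):
    """Perturb alternate batches of `batch` consecutive frames: frames 0-11 are
    perturbed, 12-23 left untouched, 24-35 perturbed, and so on."""
    return [perturb_fn(f) if (i // batch) % 2 == 0 else f
            for i, f in enumerate(frames)]
```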
5.2.2 Loop Closure in SLAM: We sampled videos from the RGB-D TUM benchmark [45] that include loop closures (Table 4). ORB-SLAM managed to close the loop on only a subset of these videos even without the adversarial noise. Therefore, we gathered a few additional videos and verified ORB-SLAM's loop closure on them. To attack the loop closure specifically, we added noise in the frames neighbouring the loop-closing frame. ORB-SLAM's loop closure fails 100% of the time (Figure 6 and Table 4). For further inspection, we sampled these video sequences. The sampled frames included the initial frame and the neighbourhood around the original loop-closing frame, identified via manual inspection. Without noise, two publicly available loop-closing pipelines, i.e., DBoW [18] and FAB-MAP [10], selected the correct match (out of 29 frames) for the initial frame. After perturbing only this best-matching frame, both pipelines show considerable degradation in its rank and matching score (Table 4). This degradation means an incorrect frame has been selected as the best match.
5.2.3 3D Reconstruction: We provide qualitative results of attacking a dense reconstruction pipeline, i.e., CMVS [16, 17]. For this attack, we added the noise to all frames. Figure 7 shows significant degradation in the reconstructed 3D scene after perturbation. Note that this pipeline is based on SIFT. This affirms the generalizing ability of our adversarial noise.
Table 4: Attacking Loop Closing Pipelines. Results for loop-closure (LC) routines after adding P2P perturbations. LCF: loop-closure failure after perturbations. PMSD: percentage matching score decline of the best match. APD: average position downgrade of the best match. The best match is the top ranked match of the reference image before adding perturbation. Its matching score declines after perturbing it. Furthermore, it falls in the ranking of best matches (out of 29 images) of the reference image.

Dataset | ORB-SLAM [36]: Avail. LC / Succ. LC / LCF | DBoW [18]: No. of Videos / PMSD / APD | FAB-MAP [10]: No. of Videos / PMSD / APD
RGBD-TUM | 12 / 3 / 100% | 15 / 36.30% / 9 | 15 / 83.08% / 16
Own Videos | 5 / 5 / 100% | 5 / 26.58 / 14 | 5 / 82.02% / 17
Figure 7: Attacking Dense 3D Reconstruction. 3D scenes reconstructed from the original and the P2P-perturbed images using CMVS, shown as before/after pairs with zoomed patches of the original and perturbed images. The areas where reconstruction failed due to the perturbation are highlighted.

6 Conclusion
In this work, we have shown for the first time that robust local features are not as robust in the adversarial domain. Our novel method helps in generating less privileged digital content. This makes it harder to register the images with a stored database using local handcrafted features. This opens a few exciting avenues for future research. Is it possible to extend the attacks on local handcrafted features to the physical world? Is it possible to generate a targeted attack on local features, forcing unrelated images to match? How does this noise affect DNNs? How does deep adversarial noise affect handcrafted features?
Acknowledgement
This work is funded by the Higher Education Commission (HEC), Govt. of Pakistan, through its grant NRPU-6025-2016/2017. We are extremely thankful to the research students (Saran, Rafi, Suneela, and Usama among others) at ROMI Lab for helping test various odometry / SLAM / reconstruction pipelines.
References
[1] Anish Athalye, Logan Engstrom, Andrew Ilyas, and Kevin Kwok. Synthesizing robust
adversarial examples. In ICML, 2018.
[2] Vassileios Balntas, Karel Lenc, Andrea Vedaldi, and Krystian Mikolajczyk. Hpatches:
A benchmark and evaluation of handcrafted and learned local descriptors. In CVPR,
2017.
[3] Herbert Bay, Tinne Tuytelaars, and Luc Van Gool. Surf: Speeded up robust features.
In ECCV, 2006.
[4] Marcelo Bertalmio, Guillermo Sapiro, Vincent Caselles, and Coloma Ballester. Image
inpainting. In Proceedings of the 27th annual conference on Computer graphics and
interactive techniques, 2000.
[5] Battista Biggio, Igino Corona, Davide Maiorca, Blaine Nelson, Nedim Šrndić, Pavel
Laskov, Giorgio Giacinto, and Fabio Roli. Evasion attacks against machine learning at
test time. In Joint European conference on machine learning and knowledge discovery
in databases, 2013.
[6] Zinelabidine Boulkenafet, Jukka Komulainen, and Abdenour Hadid. Face spoofing
detection using colour texture analysis. IEEE Transactions on Information Forensics
and Security, 2016.
[7] Tom B Brown, Dandelion Mané, Aurko Roy, Martín Abadi, and Justin Gilmer. Adversarial patch. arXiv preprint arXiv:1712.09665, 2017.
[8] Damon M Chandler and Sheila S Hemami. Vsnr: A wavelet-based visual signal-to-noise ratio for natural images. IEEE Transactions on Image Processing, 2007.
[9] Moustapha Cisse, Piotr Bojanowski, Edouard Grave, Yann Dauphin, and Nicolas
Usunier. Parseval networks: Improving robustness to adversarial examples. In ICML,
2017.
[10] Mark Cummins and Paul Newman. Fab-map: Probabilistic localization and mapping
in the space of appearance. The International Journal of Robotics Research, 2008.
[11] Harrison Edwards and Amos Storkey. Censoring representations with an adversary.
arXiv preprint arXiv:1511.05897, 2015.
[12] Kevin Eykholt, Ivan Evtimov, Earlence Fernandes, Bo Li, Amir Rahmati, Chaowei
Xiao, Atul Prakash, Tadayoshi Kohno, and Dawn Song. Robust physical-world attacks
on deep learning visual classification. In CVPR, 2018.
[13] Martin A Fischler and Robert C Bolles. Random sample consensus: a paradigm for
model fitting with applications to image analysis and automated cartography. Communications of the ACM, 1981.
[14] Florian Tramèr, Pascal Dupré, Gili Rusak, Giancarlo Pellegrino, and Dan Boneh. Adversarial: Perceptual ad-blocking meets adversarial machine learning. arXiv preprint arXiv:1811.03194v2, 2019.
[15] Christian Forster, Zichao Zhang, Michael Gassner, Manuel Werlberger, and Davide
Scaramuzza. Svo: Semidirect visual odometry for monocular and multicamera systems. IEEE Transactions on Robotics, 2017.
[16] Yasutaka Furukawa and Jean Ponce. Accurate, dense, and robust multiview stereopsis.
IEEE transactions on pattern analysis and machine intelligence, 2010.
[17] Yasutaka Furukawa, Brian Curless, Steven M Seitz, and Richard Szeliski. Towards
internet-scale multi-view stereo. In CVPR. IEEE, 2010.
[18] Dorian Gálvez-López and Juan D Tardos. Bags of binary words for fast place recognition in image sequences. IEEE Transactions on Robotics, 2012.
[19] Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing
adversarial examples. arXiv preprint arXiv:1412.6572, 2014.
[20] Michael Gygli, Helmut Grabner, Hayko Riemenschneider, Fabian Nater, and Luc
Van Gool. The interestingness of images. In ICCV, 2013.
[21] Christopher G Harris, Mike Stephens, et al. A combined corner and edge detector. In
Alvey vision conference, 1988.
[22] Richard Hartley and Andrew Zisserman. Multiple view geometry in computer vision.
Cambridge university press, 2003.
[23] Alain Hore and Djemel Ziou. Image quality metrics: Psnr vs. ssim. In 20th International Conference on Pattern Recognition, 2010.
[24] Andrew Ilyas, Logan Engstrom, Anish Athalye, and Jessy Lin. Black-box adversarial
attacks with limited queries and information. In ICML, 2018.
[25] Aditya Khosla, Atish Das Sarma, and Raffay Hamid. What makes an image popular?
In Proceedings of the 23rd international conference on World wide web. ACM, 2014.
[26] Alexey Kurakin, Ian Goodfellow, and Samy Bengio. Adversarial examples in the physical world. arXiv preprint arXiv:1607.02533, 2016.
[27] Stefan Leutenegger, Margarita Chli, and Roland Y Siegwart. BRISK: Binary robust invariant scalable keypoints. In ICCV, 2011.
[28] Bo Li and Yevgeniy Vorobeychik. Feature cross-substitution in adversarial classification. In NIPS, 2014.
[29] Hsueh-Ti Derek Liu, Michael Tao, Chun-Liang Li, Derek Nowrouzezahrai, and Alec
Jacobson. Beyond pixel norm-balls: Parametric adversaries using an analytically differentiable renderer. In International Conference on Learning Representations, 2019.
[30] David G Lowe. Distinctive image features from scale-invariant keypoints. IJCV, 2004.
[31] Jiajun Lu, Hussein Sibai, Evan Fabry, and David Forsyth. No need to worry about
adversarial examples in object detection in autonomous vehicles. arXiv preprint
arXiv:1707.03501, 2017.
[32] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and
Adrian Vladu. Towards deep learning models resistant to adversarial attacks. arXiv
preprint arXiv:1706.06083, 2017.
[33] Jiri Matas, Ondrej Chum, Martin Urban, and Tomás Pajdla. Robust wide-baseline
stereo from maximally stable extremal regions. Image and vision computing, 2004.
[34] Krystian Mikolajczyk, Tinne Tuytelaars, Cordelia Schmid, Andrew Zisserman, Jiri
Matas, Frederik Schaffalitzky, Timor Kadir, and Luc Van Gool. A comparison of affine
region detectors. IJCV, 2005.
[35] Greg Mori and Jitendra Malik. Recognizing objects in adversarial clutter: Breaking a
visual captcha. In CVPR. IEEE, 2003.
[36] Raul Mur-Artal, Jose Maria Martinez Montiel, and Juan D Tardos. Orb-slam: a versatile and accurate monocular slam system. IEEE transactions on robotics, 2015.
[37] Anh Nguyen, Jason Yosinski, and Jeff Clune. Deep neural networks are easily fooled:
High confidence predictions for unrecognizable images. In CVPR, 2015.
[38] Edward Rosten and Tom Drummond. Machine learning for high-speed corner detection. In ECCV, 2006.
[39] Pouya Samangouei, Maya Kabkab, and Rama Chellappa. Defense-gan: Protecting classifiers against adversarial attacks using generative models. arXiv preprint
arXiv:1805.06605, 2018.
[40] Johannes L Schonberger, Hans Hardmeier, Torsten Sattler, and Marc Pollefeys. Comparative evaluation of hand-crafted and learned local features. In CVPR, 2017.
[41] Vikash Sehwag, Chawin Sitawarin, Arjun Nitin Bhagoji, Arsalan Mosenia, Mung Chiang, and Prateek Mittal. Not all pixels are born equal: An analysis of evasion attacks
under locality constraints. In ACM SIGSAC Conference on Computer and Communications Security, 2018.
[42] Aman Sinha, Hongseok Namkoong, and John Duchi. Certifiable distributional robustness with principled adversarial training. stat, 1050:29, 2017.
[43] Yang Song, Taesup Kim, Sebastian Nowozin, Stefano Ermon, and Nate Kushman. Pixeldefend: Leveraging generative models to understand and defend against adversarial
examples. International Conference on Learning Representations, 2018.
[44] Yang Song, Rui Shu, Nate Kushman, and Stefano Ermon. Constructing unrestricted
adversarial examples with generative models. In NIPS. 2018.
[45] J. Sturm, N. Engelhard, F. Endres, W. Burgard, and D. Cremers. A benchmark for
the evaluation of rgb-d slam systems. In Proc. of the International Conference on
Intelligent Robot Systems (IROS), Oct. 2012.
[46] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian
Goodfellow, and Rob Fergus. Intriguing properties of neural networks. arXiv preprint
arXiv:1312.6199, 2013.
[47] Zhou Wang, Alan C Bovik, Hamid R Sheikh, and Eero P Simoncelli. Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 2004.
[48] Guixin Ye, Zhanyong Tang, Dingyi Fang, Zhanxing Zhu, Yansong Feng, Pengfei Xu,
Xiaojiang Chen, and Zheng Wang. Yet another text captcha solver: A generative adversarial network based approach. In Proceedings of the 2018 ACM SIGSAC Conference
on Computer and Communications Security. ACM, 2018.