Breaking HUGO – the Process Discovery
presented jointly with
Steganalysis of Content-Adaptive
Steganography in Spatial Domain
Jessica Fridrich
Jan Kodovský
Miroslav Goljan
Vojtěch Holub
Are there “issues” with adaptive stego?
• Content-adaptive embedding → leakage about the placement
of embedding changes.
• Is HUGO’s probabilistically-known selection channel a
problem?
Why should it be a problem?
• It is all about how well we can model the content.
• Honestly, fellow BOSS competitors, you all started here,
didn't you?
Probability of embedding change
… can be estimated from the stego image
fairly well:
[Scatter plot: estimated change probability $\hat{p}_i^X$ vs. the true $p_i^X$ (both axes 0–0.5), with the cover probability $p_i^Y$, the estimate, and the actual embedding changes marked.]
Complex texture of 512×512 images
[Figure: a 4MP image and the same image downscaled to 512×512.]
Look at what HUGO did …
Seven images from BOSSrank can be detected visually
as stego images:
[Figure: BOSSrank image No. 235 and a close-up of its LSB plane.]
Weighted-Stego attack for HUGO?
$y_i = x_i + s_i$, $s_i \in \{-1,0,1\}$, $(1/n)\sum_{i=1}^{n} s_i^2 = \beta$, the change rate, $\beta \approx 0.1$
$\hat{x}_i = x_i + \xi_i$, estimate the cover from the stego image, e.g., $\hat{x}_i = (y_{i-1} + y_{i+1})/2$
Assume that we can estimate $s_i$ s.t. $\sum_{i=1}^{n} \hat{s}_i s_i = b\beta n$ with $b > 0$.
$c = \sum_{i=1}^{n} (y_i - \hat{x}_i)\hat{s}_i = \sum_{i=1}^{n} (s_i - \xi_i)\hat{s}_i$
Assuming $\xi_i$ i.i.d., $E[c] \sim \beta n$, $\mathrm{Var}[c] \sim n$.
Problem: E[c] varies much with content, cannot be easily
thresholded or calibrated despite the fact that E[c] < E[s]
in general (and sometimes by as much as 60% but on
average by 1.74%).
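The statistic $c$ can be computed directly from a stego image once an estimate $\hat{s}_i$ of the change signs is available. A minimal numpy sketch, assuming a horizontal two-neighbor cover predictor and an externally supplied (hypothetical) $\hat{s}$:

```python
import numpy as np

def ws_statistic(y, s_hat):
    """Weighted-stego-style statistic from this slide.

    y     : 2D grayscale stego image
    s_hat : estimate of the embedding-change signs, same shape as y
            (hypothetical input, e.g. derived from the estimated change
            probabilities of the selection channel)
    The cover is predicted from the horizontal neighbors,
    x_hat = (y[i, j-1] + y[i, j+1]) / 2, and the residual is correlated
    with s_hat, giving c = sum (y_i - x_hat_i) * s_hat_i.
    """
    y = y.astype(np.float64)
    x_hat = (y[:, :-2] + y[:, 2:]) / 2.0      # predict each pixel from left/right neighbors
    resid = y[:, 1:-1] - x_hat                # equals s_i - xi_i on stego images
    return float(np.sum(resid * s_hat[:, 1:-1]))
```

Under the assumptions above, $E[c]$ scales like $\beta n$ on a stego image and stays near zero on a cover, but, as noted, its strong content dependence prevents reliable thresholding for HUGO.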
Pixel domain is not useful, right?
HUGO approximately preserves ~$10^7$ statistics computed
from neighboring pixels. Intimidating, isn't it? Forget the pixel
domain, go to a different domain. Wavelet, perhaps?
We dusted off WAM, put it on steroids, and whacked HUGO with it.
What we tried:
a) added moments from the LL band to inform the steganalyzer about
content (makes sense for content-adaptive stego)
b) added the same feature vector from a re-embedded image (relying on
the "saturation effect" of re-embedding)
c) replaced the Wiener filter in WAM with an adaptive filter based on
the estimated probability of change: $\mathrm{Var}[\xi_i] = \hat{p}_i^X$.
BOSSrank score: 59% 
Go back to pixel domain!
Your best chances for detection are in the embedding
domain.
Compute the residual $r_{ij} = \hat{x}_{ij} - x_{ij}$, where $\hat{x}_{ij}$ is an estimator
of $x_{ij}$ from its local neighborhood.
Advantages of computing detection statistics from rij:
a) narrower dynamic range
b) image content suppressed
c) higher SNR between stego-signal and noise
Undoubtedly, the best estimator of $x_{ij}$ is $x_{ij}$ itself. However, $\hat{x}_{ij}$ should
not depend on $x_{ij}$, to avoid a biased estimate (this is why
denoising filters do not work well).
Higher-order local models (HOLMES)
• HUGO approximately preserves joint distribution of three
1st-order differences among four neighboring pixels.
• We need to get out of HUGO’s model:
a) Use four or more differences – cooc dimension grows too fast,
bins in coocs become empty or underpopulated.
b) Use higher-order differences – they “see” beyond 4 pixels.
The SPAM feature set uses $\hat{x}_{i,j} = x_{i,j+1}$ → a locally constant model:
$r_{ij}^{(h)} = x_{i,j+1} - x_{i,j}$   (constant model)
$r_{ij}^{(h)} = x_{i,j+1} - 2x_{i,j} + x_{i,j-1}$   (linear model)
$r_{ij}^{(h)} = x_{i,j+1} - 3x_{i,j} + 3x_{i,j-1} - x_{i,j-2}$   (quadratic model)
…
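These directional residuals are simple sliding differences. A minimal numpy sketch (horizontal direction only; border cropping and the other directions are left out):

```python
import numpy as np

def horizontal_residuals(x):
    """First three horizontal residuals from the local models above:
    constant, linear, and quadratic predictors. x is a 2D grayscale array;
    the returned arrays are cropped at the borders. Illustration only,
    not the exact feature extractor used for BOSS.
    """
    x = x.astype(np.float64)
    r1 = x[:, 1:] - x[:, :-1]                                    # x_{i,j+1} - x_{i,j}
    r2 = x[:, 2:] - 2 * x[:, 1:-1] + x[:, :-2]                   # x_{i,j+1} - 2x_{i,j} + x_{i,j-1}
    r3 = x[:, 3:] - 3 * x[:, 2:-1] + 3 * x[:, 1:-2] - x[:, :-3]  # 3rd-order difference
    return r1, r2, r3
```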
Higher-order local models, cont’d
[Figure: an image with many edges and an edge close-up. HUGO is likely to embed here even though the content is modelable in the vertical direction.]
However, the pixel differences will mostly fall in the marginal. Linear or
quadratic models bring the residual back inside the cooc matrix.
Quantize and truncate
Before computing the coocs, the residual is first quantized
and then truncated.
$r_{ij} \leftarrow \mathrm{trunc}_T\!\left(r_{ij}/q\right)$
$\mathrm{trunc}_T(x) = \begin{cases} x & \text{when } x \in [-T, T] \\ T\,\mathrm{sign}(x) & \text{otherwise} \end{cases}$
Note that we marginalize instead of cutting.
The marginals (bins at the boundary) are very important!
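A minimal numpy sketch of this step; rounding to the nearest integer after dividing by q is an assumption here (the slide only shows $r_{ij}/q$ inside $\mathrm{trunc}_T$):

```python
import numpy as np

def quantize_truncate(r, q=2, T=4):
    """Quantize a residual array by q and truncate it to [-T, T].

    Values outside the range are not discarded; they are clipped into the
    boundary (marginal) bins, which is exactly the 'marginalize instead of
    cutting' point made above.
    """
    r = np.round(r / float(q)).astype(np.int64)   # quantize
    return np.clip(r, -T, T)                      # truncate: T*sign(x) outside [-T, T]
```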
First successful features
Take min/max of 2nd-order residuals in 4 directions:
$r_{ij}^{(h)} = x_{i,j+1} - 2x_{i,j} + x_{i,j-1}$ (and analogously for the vertical, diagonal, and minor-diagonal directions)
$r_{ij}^{(\min)} = \min\{r_{ij}^{(h)}, r_{ij}^{(v)}, r_{ij}^{(d)}, r_{ij}^{(m)}\}$
$r_{ij}^{(\max)} = \max\{r_{ij}^{(h)}, r_{ij}^{(v)}, r_{ij}^{(d)}, r_{ij}^{(m)}\}$
Features are two 3D cooc matrices, $(\mathbf{H}^{(\min)} + \mathbf{V}^{(\min)},\ \mathbf{H}^{(\max)} + \mathbf{V}^{(\max)})$, where
$\mathbf{H}^{(\min)}: (r_{ij}^{(\min)}, r_{i,j+1}^{(\min)}, r_{i,j+2}^{(\min)})$,  $\mathbf{H}^{(\max)}: (r_{ij}^{(\max)}, r_{i,j+1}^{(\max)}, r_{i,j+2}^{(\max)})$
$\mathbf{V}^{(\min)}: (r_{ij}^{(\min)}, r_{i+1,j}^{(\min)}, r_{i+2,j}^{(\min)})$,  $\mathbf{V}^{(\max)}: (r_{ij}^{(\max)}, r_{i+1,j}^{(\max)}, r_{i+2,j}^{(\max)})$
MINMAX: T = 4, q = 1, dim = 2×(2T+1)³ = 1458
QUANT: T = 4, q = 2, dim = 1458
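A compact numpy sketch of this construction (border handling, normalization, and the exact scan of the co-occurrences are simplifying assumptions, not the authors' exact implementation):

```python
import numpy as np

def minmax_cooc_features(x, q=1, T=4):
    """MINMAX-style features: 2nd-order residuals in four directions,
    min/max over the directions, quantize/truncate to [-T, T], then
    horizontal + vertical 3D co-occurrences of the resulting values.
    Returns a vector of length 2*(2T+1)**3 (= 1458 for T = 4).
    """
    x = x.astype(np.float64)
    c = x[2:-2, 2:-2]                        # central pixels, cropped so all views align
    def d2(a, b):                            # 2nd-order residual from two opposite neighbors
        return a - 2 * c + b
    r_h = d2(x[2:-2, 3:-1], x[2:-2, 1:-3])   # horizontal
    r_v = d2(x[3:-1, 2:-2], x[1:-3, 2:-2])   # vertical
    r_d = d2(x[3:-1, 3:-1], x[1:-3, 1:-3])   # diagonal
    r_m = d2(x[3:-1, 1:-3], x[1:-3, 3:-1])   # minor diagonal
    stack = np.stack([r_h, r_v, r_d, r_m])
    r_min = np.clip(np.round(stack.min(axis=0) / q), -T, T).astype(int) + T
    r_max = np.clip(np.round(stack.max(axis=0) / q), -T, T).astype(int) + T
    B = 2 * T + 1
    feats = []
    for r in (r_min, r_max):
        H = np.zeros((B, B, B)); V = np.zeros((B, B, B))
        np.add.at(H, (r[:, :-2], r[:, 1:-1], r[:, 2:]), 1)   # horizontal triples
        np.add.at(V, (r[:-2, :], r[1:-1, :], r[2:, :]), 1)   # vertical triples
        feats.append((H + V).ravel() / (H.sum() + V.sum()))
    return np.concatenate(feats)
```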
Encouraging results
Early October
Features: MINMAX, dim = 1458
Training database: 2×9074 BOSSbase 0.91
Classifier: FLD
BOSSrank: 71%
Features: MINMAX+QUANT, dim = 2916
Training database: 2×9074 BOSSbase 0.91
Classifier: G-SVM
BOSSrank: 73%
Unexpected stego-source mismatch
BOSSbase 0.91 was prepared with HUGO parameters 4 and 10,
BOSSrank with both parameters equal to 1.
BOSSbase 0.92 was embedded with both parameters equal to 1.
Retraining our classifier on the correct stego database gave:
October 14
Features: MINMAX+QUANT, dim = 2916
Training database: 2×9074 BOSSbase 0.92
Classifier: G-SVM
BOSSrank: 75%
Hugobreakers’ frustration
Do not say "hop" before you jump.
[Chart: BOSSrank score from Oct 14 to Nov 13; the y-axis spans 74–79.]
This is when BOSS became GOSS:
“Guess Our Steganographic Source”
The dreaded cover-source mismatch
The tell-tale symptom of the mismatch: adding more
features improved the score on BOSSbase but worsened the
BOSSrank score.
The problem: we trained on one source but tested on
another (different) source. Our detector lacked robustness.
Note that this is an issue of robustness rather than
overtraining. Well recognized in detection and estimation.
Very difficult problem as the mismatch can have so many
different forms.
Trying to resolve the CSM
a) Train on a more diverse source (adding 6000 images to
BOSSbase lowered BOSSrank – making mismatch worse?)
b) Use classifiers with a simpler decision boundary (L-SVM)
(the same problem and lower accuracy)
c) Contaminate the training set with BOSSrank images:
- put denoised BOSSrank images → covers (using adaptive denoising
based on estimated probabilities)
- put re-embedded BOSSrank images → stego
(unable to obtain consistent results with contamination
when experimenting with BOSSbase, decided to toss it)
d) Find out more about the cover source
- estimate resampling artifacts – we could obtain info about
the original image size (no artifacts detected by Farid’s code)
- extract fingerprint from BOSSbase cameras, detect in
images from BOSSrank, train on images from the right source.
Forensic analysis of BOSSrank
• Fingerprint extracted from all 7 BOSSbase cameras and detected in BOSSrank.
• ~500 images tested positive for Leica M9, no other camera tested positive
[Scatter plot: PCE values of the Leica and Rebel fingerprints over the 1,000 BOSSrank images.]
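For context, the fingerprint test behind these PCE values can be sketched as follows: correlate the camera fingerprint with the image noise residual and compare the peak energy with the average off-peak energy. This is only an illustrative sketch; the denoising filter that produces the noise residual and the exact PRNU model used by the authors are assumed, not shown.

```python
import numpy as np

def pce(fingerprint, noise_residual, exclude=5):
    """Peak-to-Correlation-Energy (PCE) of a camera-fingerprint test.

    fingerprint    : estimated camera fingerprint (2D array)
    noise_residual : noise residual of the questioned image (2D array,
                     same shape), obtained by some denoising filter
    exclude        : size of the neighborhood around the peak that is
                     excluded from the off-peak energy estimate
    """
    K = fingerprint - fingerprint.mean()
    W = noise_residual - noise_residual.mean()
    # circular cross-correlation via FFT
    xcorr = np.real(np.fft.ifft2(np.fft.fft2(K) * np.conj(np.fft.fft2(W))))
    peak = np.unravel_index(np.argmax(np.abs(xcorr)), xcorr.shape)
    # exclude a small neighborhood around the peak from the energy estimate
    mask = np.ones(xcorr.shape, dtype=bool)
    r = exclude // 2
    rows = np.arange(peak[0] - r, peak[0] + r + 1) % xcorr.shape[0]
    cols = np.arange(peak[1] - r, peak[1] + r + 1) % xcorr.shape[1]
    mask[np.ix_(rows, cols)] = False
    return xcorr[peak] ** 2 / np.mean(xcorr[mask] ** 2)
```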
Forensic analysis of BOSSrank, cont’d
Most images taken
in Pacific North-West
Forensic analysis of BOSSrank, cont’d
Fingerprint extracted from 25 JPEG images from Tomas Filler's camera
(Panasonic Lumix DMC-FZ50) taken previously at SPIE conferences and
resized to 512×512 using the same script. Positively identified in ~77
BOSSrank images.
Could not use for BOSS as other competitors did not have this opportunity.
We closed our investigation with ~50% from Leica, the rest declared unknown.
[Scatter plot: PCE values of the Panasonic fingerprint over the 1,000 BOSSrank images.]
Forensic-aided steganalysis
Option #1: Buy Leica M9 and generate our own
database. Oops … price is $7,000!!
Option #2: LensRentals.com, rent it for a week.
Took 7,301 images with Leica M9.
Experiment #1
Train two classifiers – one trained only on Leica to analyze only Leica
images, and one trained on all to analyze the rest. Merge the prediction
files.
Experiment #2
Add Leica images to the BOSSbase database and train on all.
Result: BOSSrank score either the same or slightly worse. Bummer 
Can a cover source be replicated?
Cover source is a very complex entity shaped by:
• Camera and its settings
short exposure → lower dark current
high ISO → increased level of noise
stopping the lens down to f/5.6 → sharper images than at f/2.0
• Lens
short focus → low depth of field → easier for analysis
• Content
Binghamton in Fall is a poor replacement for French Riviera.
Average amount of edges, smooth regions.
We rented the wrong lens (50 mm), Patrick used 35 mm.
Model diversity is the key
QUANT, go 4D, use 3rd order differences (quadratic model), merge.
Difference order | Cooc. | T | q | dim
2nd              |   3   | 3 | 2 |  686
3rd              |   3   | 3 | 2 |  686
2nd              |   4   | 2 | 2 | 1250
3rd              |   4   | 2 | 2 | 1250
November 13
Features: dim = 3872
Training database: 2×9074 BOSSbase 0.92
Classifier: G-SVM
BOSSrank: 76%
With increased dimensionality, machine learning became a
serious bottleneck.
Ensemble classifier (SVM)
To facilitate further development, we started using ensemble classifiers
instead of SVMs.
1. Set l = 1.
2. Randomly select k features out of d, k ≪ d.
3. Train an FLD on this random subspace on all BOSSbase images, set the
threshold to obtain minimum $P_E$, and store the eigenvector $e_l$.
4. Make decisions on BOSSrank ($f_j$ is the feature vector of the j-th image):
$f_j \cdot e_l > 0$ ⇒ Dec(l, j) = 1 (stego)
$f_j \cdot e_l < 0$ ⇒ Dec(l, j) = 0 (cover)
5. Repeat 2–4 L-times, obtain L decisions Dec(1..L, 1..1000) for each test
image.
6. For each image, fuse decisions by voting.
Advantages
• Low complexity (training a 9288-dim feature set on 2×17,000 images
with L = 31 and k = 1600 takes only 8 minutes on a PC).
• Performance comparable to SVM.
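A minimal numpy sketch of this ensemble, assuming a covariance-based FLD and a simple midpoint threshold instead of the minimum-$P_E$ search (both are simplifications of the procedure above):

```python
import numpy as np

def train_ensemble(X_cover, X_stego, L=31, k=1600, seed=0):
    """Train L Fisher linear discriminants, each on a random k-dim
    feature subspace, as in steps 1-5 above (simplified)."""
    rng = np.random.default_rng(seed)
    d = X_cover.shape[1]
    learners = []
    for _ in range(L):
        idx = rng.choice(d, size=k, replace=False)               # random subspace
        C, S = X_cover[:, idx], X_stego[:, idx]
        Sw = np.cov(C, rowvar=False) + np.cov(S, rowvar=False)
        Sw += 1e-6 * np.eye(k)                                   # regularization
        w = np.linalg.solve(Sw, S.mean(axis=0) - C.mean(axis=0)) # FLD direction
        thr = 0.5 * ((C @ w).mean() + (S @ w).mean())            # simple threshold
        learners.append((idx, w, thr))
    return learners

def predict_ensemble(learners, X):
    """Fuse the L base decisions by majority voting; 1 = stego, 0 = cover."""
    votes = np.array([(X[:, idx] @ w > thr).astype(int) for idx, w, thr in learners])
    return (votes.mean(axis=0) > 0.5).astype(int)
```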
Scaling up feature dim seemed to work
Mid November
Feature set: Previous 3872 + 1458 (MINMAX) = 5330
Training database: 2×9074 BOSSbase v. 0.92
Classifier: Ensemble, L = 31, k = 1600
BOSSrank: 77%
However, adding more features computed from various
residuals did not improve BOSSrank, despite steady
improvement on BOSSbase.
A little more empirical magic …
Train on ≥ 2N images, where N is about 20–50% larger
than the feature dimension.
November 29
Feature set: 5330 + QUANT4 + SQUARE + KB = 9288
Training database: 2×9074 + 2×6500 = 2×15,574
Classifier: Ensemble, L = 31, k = 1600
BOSSrank: 78%
QUANT4: $r_{ij} = -x_{i,j-2} + 4x_{i,j-1} - 6x_{i,j} + 4x_{i,j+1} - x_{i,j+2}$
SQUARE: $r_{ij}^{(h)} = x_{i,j+1} - 2x_{i,j} + x_{i,j-1}$ + "square" cooc
KB (Ker–Böhme) kernel, with cooc = H + V:
  -1/4  1/2  -1/4
   1/2   0    1/2
  -1/4  1/2  -1/4
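As an illustration, the KB residual predicts each pixel from its eight neighbors with the kernel above and subtracts the pixel itself; a small sketch (border handling via a 'valid' convolution is an assumption):

```python
import numpy as np
from scipy.signal import convolve2d

# Ker-Boehme predictor kernel: the center pixel is predicted from its
# 8 neighbors; the kernel is symmetric, so convolution equals correlation.
KB_KERNEL = np.array([[-0.25, 0.5, -0.25],
                      [ 0.50, 0.0,  0.50],
                      [-0.25, 0.5, -0.25]])

def kb_residual(x):
    """Residual r = x_hat - x with the KB predictor (image cropped by 1 pixel)."""
    x = x.astype(np.float64)
    x_hat = convolve2d(x, KB_KERNEL, mode="valid")
    return x_hat - x[1:-1, 1:-1]
```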
The final behemoth of dim 24,933
Combination of 32 feature subsets containing
• 1st–6th order differences
• multiple versions with different values of q (quantization)
• EDGE residuals (effective around edges)
• Calibrated features (from a low-pass filtered image)
• 5D coocs with T = 1
December 31
Feature set: 24,933
Training database: 2×34,719
Classifier: Ensemble, L = 71, k = 2400
BOSSrank: 81%
Accuracy on Leica: 82.3%
Accuracy on Panasonic: 70.0%
Score progress
[Chart: BOSSrank score vs. submission date, September through January 1.]

Date   |  % |  dim  |  img
Sep 30 | 68 |  1458 |  3759
Oct 3  | 71 |  1458 |  9074
Oct 4  | 73 |  1458 |  9074
Oct 14 | 75 |  2916 |  9074
Nov 13 | 76 |  3872 |  9074
Nov 15 | 77 |  5330 |  9074
Nov 29 | 78 |  9388 | 16375
Dec 18 | 79 | 17933 | 24184
Dec 23 | 80 | 22307 | 24184
Dec 31 | 81 | 24933 | 34719
Detecting HUGO without cover source mismatch
alias
Steganalysis of Content-Adaptive
Steganography in Spatial Domain
Effect of quantization
Quantization allows the features to sense changes in
textured areas and around edges.
3D coocs are best quantized with q = c = central coefficient in
the residual computation.
c1
rij ( h)  xi , j 1  xi , j
rij ( h)  xi , j 1  2 xi , j  xi , j 1
c2
( h)
r
 xi , j 1  3xi , j  3xi , j 1  xi , j  2
c3
ij
(h)
c  6 rij   xi , j  2  4xi , j 1  6xi , j  4xi , j 1  xi , j 2
( h)
c  10 rij  xi , j 3  5xi , j  2  10xi , j 1  10xi , j  5xi , j 1  xi , j 2
c  20 rij ( h)  xi , j 3  6 xi , j  2  15xi , j 1  20xi , j  15xi , j 1  6xi , j  2  xi , j 3
Best quantization value for 3D and 4D coocs
Feature set MINMAX, 4th-order differences, 3D, T = 4:
q  |  2   |  4   |  6   |  8   |  10  |  12
PE | 30.5 | 26.8 | 26.1 | 26.8 | 27.7 | 28.2

Feature set MINMAX, 4th-order differences, 4D, T = 2:
q  |  2   |  4   |  6   |  8   |  10  |  12
PE | 34.2 | 30.7 | 28.2 | 26.8 | 27.5 | 28.4

For 3D coocs, the best q is ≈ c.
For 4D coocs, the best q is ≈ 1.5c.
Testing higher-order residuals
Average accuracy when training on 8074 and testing on 1000 images
from BOSSbase repeated 100 times (all results with ensemble).
Fea. type   | (diff, q, T)    |  d   | Avg  | Best | Worst | L  | k
"SPAM"(3D)* | (2nd, 1, 4)     | 1458 | 71.4 | 74.5 | 69.0  | 31 | 1000
MINMAX(3D)  | (2nd, 1, 4)     | 1458 | 72.7 | 74.9 | 68.7  | 31 | 1000
QUANT(3D)   | (2nd, 2, 4)     | 1458 | 73.8 | 76.8 | 71.6  | 31 | 1000
QUANT(3D)+  | (2nd–6th, c, 4) | 7290 | 80.0 | 82.2 | 77.4  | 81 | 1600
QUANT(4D)+  | (2nd–6th, c, 2) | 6250 | 79.1 | 81.0 | 76.5  | 81 | 1600

* "SPAM" is a direct equivalent of the SPAM vector with 1st-order differences replaced by 2nd-order ones.
+ 2nd–6th is a merger of QUANT features from 2nd–6th differences quantized with q = c = the central coefficient in the residual.
Accuracy on BOSSbase across cameras
[Scatter plot: per-image accuracy on BOSSbase over 1000 splits 8074/1000 (trn/tst); horizontal lines mark the average for each camera (EOS 400D, EOS 7D, Rebel, Leica M9, Nikon D70, Pentax K20D).]
6627 cover images always classified as cover
6647 stego images always classified as stego
4836 images always classified correctly as cover AND stego
Pentax K20D is the easiest
ROC and scatter plot with QUANT (dim 1458)
Canon Rebel is the hardest
Scatter plot with QUANT (dim 1458)
Accuracy correlates with texture
FLD scatter plot with QUANT (dim 1458)
[Two plots vs. image number sorted by camera: the FLD projection with QUANT and the average absolute 2nd difference (texture). Camera ranges: 1–1354 Canon EOS 400D (10 MP, w = 1936); 1355–1415 Canon EOS 40D (10 MP, w = 1936); 1416–2769 Canon EOS 7D (18 MP, w = 5184); 2770–4372 Canon Digital Rebel XSi (12.2 MP, w = 2256); 4373–6639 Leica M9 (18 MP, w = 5216); 6640–7672 Nikon D70 (6 MP, w = 3040); 7673–end Pentax K20D (14.6 MP, w = 4864).]
Leica images
[Histogram: pixel count vs. grayscale value (0–255) for a typical Leica image.]
Typical Leica image histogram (possibly caused by the
resizing script). Decreased dynamic range makes detection
of embedding easier.
Scatter plot for LSB matching (QUANT 1458)
Dependence on content is much weaker!
Comparison to 1 embedding and CDF
… ensemble with 33,963-dim behemoth
HUGO with BOSS payload, accuracy 84.2%
Implications for steganalysis
• As steganography becomes more sophisticated, steganalysis needs
to use more complex models to capture more subtle dependencies
among pixels.
• The key is diversity! The model should be rich – a union of smaller
submodels.
• Feature dimensionality will inevitably increase.
• Automatic handling of the dimensionality problem is preferable to
hand-tweaking – ensemble classifiers scale well w.r.t. feature dim
and training set size and are suitable for this task.
• Detectability of HUGO embedding in larger images will increase
faster than what Square Root Law dictates because neighboring
pixels will be more correlated
• Cover source mismatch is an extremely difficult problem that will
hamper deployment of steganalysis in practice.
• Robust machine learning is badly needed.
Implications for steganography
• Adaptive stego implemented to minimize
distortion in model space is the way to go
• Critical: choice of model and distortion function
• HUGO’s model is high-dim but too narrow
• By making the model more diverse (rich) better
steganography can likely be built
• Despite progress made during BOSS, HUGO
remains the most secure stego algorithm we ever
tested
BOSS jump-started new directions
• Optimal choice of residual and its quantization?
• Perhaps learning both from given source and
for stego algorithm?
• Alternative to coocs as statistical descriptors of
the random field of residuals?
• Helped us develop ensemble classification as
alternative to SVMs
• Drew attention to CSM
- training set contamination
- training only on (processed) test images
Our current results on detection of HUGO
and much more in the Rump Session.
Some more interesting stats
1000 splits of BOSSbase into 8074/1000
BEST … images always classified correctly as cover AND stego
FAs …… images always classified as stego when cover
MDs ….. images always classified as cover when stego
Images | Avg. gray | Satur. pixels | Texture
BEST   |   74.1    |     2046      |  1.73
FAs    |  101.3    |     4415      |  4.66
MDs    |  102.0    |     5952      |  3.95
Texture: scaled average $|x_{ij} - x_{i,j+1}|$
Effect of quantization
[Plot: original and quantized residual distributions over −5…5. Annotations: "Cooc covers only this range", "Thin marginal", "Thick marginal".]
Changes to elements from the marginal are undetected.
Another example
[Figure: a 4MP image and the same image after scaling to 512×512.]