Breaking HUGO – the Process Discovery
presented jointly with: Steganalysis of Content-Adaptive Steganography in Spatial Domain
Jessica Fridrich, Jan Kodovský, Miroslav Goljan, Vojtěch Holub

Are there "issues" with adaptive stego?
• Content-adaptive embedding leaks information about the placement of embedding changes.
• Is HUGO's probabilistically-known selection channel a problem? Why should it be a problem?
• It is all about how well we can model the content.
• Honestly, fellow BOSS competitors, you all started here, didn't you?

Probability of embedding change
… can be estimated from the stego image fairly well.
[Plot: probability of change estimated from the stego image vs. the true probability $p_i^X$ (range 0–0.5), shown for cover pixels and for the actually changed pixels.]

Complex texture of 512×512 images
[Side-by-side comparison: a 512×512 image vs. a 4 MP image.]

Look at what HUGO did …
Seven images from BOSSrank can be detected visually as stego images.
[BOSSrank image No. 235 and a close-up of its LSB plane.]

Weighted-Stego attack for HUGO?
• $y_i = x_i + s_i$, $s_i \in \{-1, 0, 1\}$, $\beta = (1/n)\sum_{i=1}^{n} s_i^2$ is the change rate, $\beta \approx 0.1$.
• $\hat x_i = x_i + \xi_i$: estimate the cover from the stego image, e.g., $\hat x_i = (y_{i-1} + y_{i+1})/2$.
• Assume we can estimate $s_i$ so that $\sum_{i=1}^{n} \hat s_i s_i = bn$ with $b > 0$. Then
  $c = \sum_{i=1}^{n} (y_i - \hat x_i)\hat s_i = \sum_{i=1}^{n} (s_i - \xi_i)\hat s_i$; assuming the $\xi_i$ are iid, $E[c] \sim n$ and $\mathrm{Var}[c] \sim n$.
• Problem: $E[c]$ varies a lot with content and cannot be easily thresholded or calibrated, despite the fact that the statistic is in general larger for stego images than for covers (sometimes by as much as 60%, but on average only by 1.74%).

Pixel domain is not useful, right?
• HUGO approximately preserves ~$10^7$ statistics computed from neighboring pixels. Intimidating, isn't it?
• Forget the pixel domain, go to a different domain. Wavelet, perhaps?
• Brushed off the dust from WAM, put it on steroids, whacked HUGO with it. What we tried:
a) added moments from the LL band to inform the steganalyzer about content (makes sense for content-adaptive stego)
b) added the same feature vector computed from a re-embedded image (relying on the "saturation effect" of re-embedding)
c) replaced the Wiener filter in WAM with an adaptive filter whose local noise variance is driven by the estimated probability of change $\hat p_i^X$
BOSSrank score: 59%

Go back to the pixel domain!
• Your best chances for detection are in the embedding domain.
• Compute the residual $r_{ij} = \hat x_{ij} - x_{ij}$, where $\hat x_{ij}$ is an estimator of $x_{ij}$ from its local neighborhood.
• Advantages of computing detection statistics from $r_{ij}$:
a) narrower dynamic range
b) image content suppressed
c) higher SNR between the stego signal and noise
• Undoubtedly, the best estimator of $x_{ij}$ is $x_{ij}$ itself. However, $\hat x_{ij}$ should not depend on $x_{ij}$, to avoid a biased estimate (this is why denoising filters do not work well).

Higher-order local models (HOLMES)
• HUGO approximately preserves the joint distribution of three 1st-order differences among four neighboring pixels.
• We need to get out of HUGO's model:
a) Use four or more differences – the cooc dimension grows too fast, and bins in the coocs become empty or underpopulated.
b) Use higher-order differences – they "see" beyond 4 pixels.
• The SPAM feature set uses $\hat x_{i,j} = x_{i,j+1}$, a locally constant model. Horizontal residuals of increasing order:
  $r_{ij}^{(h)} = x_{i,j+1} - x_{i,j}$  (constant model)
  $r_{ij}^{(h)} = x_{i,j+1} - 2x_{i,j} + x_{i,j-1}$  (linear model)
  $r_{ij}^{(h)} = x_{i,j+1} - 3x_{i,j} + 3x_{i,j-1} - x_{i,j-2}$  (quadratic model)
  …
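To make the three local models concrete, here is a minimal numpy sketch (illustrative code, not from the slides; the function name is mine) computing the horizontal residuals listed above:

```python
import numpy as np

def horizontal_residuals(x):
    """1st-, 2nd-, and 3rd-order horizontal difference residuals
    (constant, linear, and quadratic local models).  Boundary columns
    are lost to the difference operators, so the outputs are narrower
    than the input image x (a 2D grayscale array)."""
    x = x.astype(np.float64)
    # constant model:  r_ij = x[i,j+1] - x[i,j]
    r1 = x[:, 1:] - x[:, :-1]
    # linear model:    r_ij = x[i,j+1] - 2*x[i,j] + x[i,j-1]
    r2 = x[:, 2:] - 2 * x[:, 1:-1] + x[:, :-2]
    # quadratic model: r_ij = x[i,j+1] - 3*x[i,j] + 3*x[i,j-1] - x[i,j-2]
    r3 = x[:, 3:] - 3 * x[:, 2:-1] + 3 * x[:, 1:-2] - x[:, :-3]
    return r1, r2, r3
```

The vertical and diagonal residuals used below are obtained the same way along the other directions.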
Higher-order local models, cont'd
• HUGO is likely to embed here (along the edge) even though the content is modelable in the vertical direction. [Close-up of an edge; an image with many edges.]
• However, pixel differences at edges will mostly fall into the marginal bins. Linear or quadratic models bring the residual back inside the cooc matrix.

Quantize and truncate
• Before computing the coocs, the residual is first quantized and then truncated:
  $r_{ij} \leftarrow \mathrm{trunc}_T\big(\mathrm{round}(r_{ij}/q)\big)$, where $\mathrm{trunc}_T(x) = x$ when $x \in [-T, T]$ and $\mathrm{trunc}_T(x) = T\,\mathrm{sign}(x)$ otherwise.
• Note that we marginalize instead of cutting. The marginals (bins at the boundary) are very important!

First successful features
• Take the min and max of 2nd-order residuals in 4 directions:
  $r_{ij}^{(h)} = x_{i,j+1} - 2x_{i,j} + x_{i,j-1}$, and similarly $r_{ij}^{(v)}$, $r_{ij}^{(d)}$, $r_{ij}^{(m)}$;
  $r_{ij}^{(\min)} = \min\{r_{ij}^{(h)}, r_{ij}^{(v)}, r_{ij}^{(d)}, r_{ij}^{(m)}\}$, $r_{ij}^{(\max)} = \max\{r_{ij}^{(h)}, r_{ij}^{(v)}, r_{ij}^{(d)}, r_{ij}^{(m)}\}$.
• Features are two 3D cooc matrices:
  $\mathbf{H}^{(\min)}(r_{ij}^{(\min)}, r_{i,j+1}^{(\min)}, r_{i,j+2}^{(\min)})$, $\mathbf{H}^{(\max)}(r_{ij}^{(\max)}, r_{i,j+1}^{(\max)}, r_{i,j+2}^{(\max)})$,
  $\mathbf{V}^{(\min)}(r_{ij}^{(\min)}, r_{i+1,j}^{(\min)}, r_{i+2,j}^{(\min)})$, $\mathbf{V}^{(\max)}(r_{ij}^{(\max)}, r_{i+1,j}^{(\max)}, r_{i+2,j}^{(\max)})$,
  combined as $(\mathbf{H}^{(\min)} + \mathbf{V}^{(\min)},\ \mathbf{H}^{(\max)} + \mathbf{V}^{(\max)})$.
• MINMAX: T = 4, q = 1, dim = 2×(2T+1)³ = 1458
• QUANT: T = 4, q = 2, dim = 1458
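A compact numpy sketch of the quantize-and-truncate step and the MINMAX construction just described (my own illustration, not the authors' implementation; helper names are assumed):

```python
import numpy as np

def quantize_truncate(r, q=1, T=4):
    """Quantize with step q, round, and truncate (marginalize) to [-T, T]
    so that everything beyond +-T collects in the boundary bins."""
    return np.clip(np.round(r / q), -T, T).astype(int)

def minmax_residuals(x):
    """2nd-order residuals in four directions (horizontal, vertical and the
    two diagonals), cropped to a common support, then pixel-wise min/max."""
    x = x.astype(np.float64)
    c = x[1:-1, 1:-1]
    rh = x[1:-1, 2:] + x[1:-1, :-2] - 2 * c
    rv = x[2:, 1:-1] + x[:-2, 1:-1] - 2 * c
    rd = x[2:, 2:]   + x[:-2, :-2]  - 2 * c
    rm = x[2:, :-2]  + x[:-2, 2:]   - 2 * c
    stack = np.stack([rh, rv, rd, rm])
    return stack.min(axis=0), stack.max(axis=0)

def cooc3d_horizontal(r, T=4):
    """3D co-occurrence of horizontal triples (r[i,j], r[i,j+1], r[i,j+2])
    of an already quantized/truncated residual, normalized to sum to 1."""
    bins = 2 * T + 1
    h = np.zeros((bins, bins, bins))
    a, b, c = r[:, :-2] + T, r[:, 1:-1] + T, r[:, 2:] + T
    np.add.at(h, (a.ravel(), b.ravel(), c.ravel()), 1)
    return h / h.sum()

# MINMAX (q = 1) / QUANT (q = 2) would add the horizontal and vertical coocs
# (H + V) for both the min and the max residual: 2*(2T+1)**3 = 1458 values.
```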
Encouraging results
Early October:
• Features: MINMAX, dim 1458
• Training database: 2×9074 BOSSbase 0.91
• Classifier: FLD
• BOSSrank: 71%

• Features: MINMAX+QUANT, dim 2916
• Training database: 2×9074 BOSSbase 0.91
• Classifier: G-SVM
• BOSSrank: 73%

Unexpected stego-source mismatch
• BOSSbase 0.91 was prepared with HUGO's embedding parameters set to 4 and 10, while BOSSrank was embedded with the setting 1; BOSSbase 0.92 was embedded with 1, matching BOSSrank.
• Retraining our classifier on the correct stego database gave:
October 14:
• Features: MINMAX+QUANT, dim 2916
• Training database: 2×9074 BOSSbase 0.92
• Classifier: G-SVM
• BOSSrank: 75%

Hugobreakers' frustration
• Do not say "hop" before you jump.
[Plot: BOSSrank score oscillating between 74 and 79 from Oct 14 to Nov 13.]
• This is when BOSS became GOSS: "Guess Our Steganographic Source."

The dreaded cover-source mismatch
• The tell-tale symptom of the mismatch: adding more features improved the score on BOSSbase but worsened the BOSSrank score.
• The problem: we trained on one source but tested on another (different) source. Our detector lacked robustness.
• Note that this is an issue of robustness rather than overtraining. Well recognized in detection and estimation.
• A very difficult problem, as the mismatch can take so many different forms.

Trying to resolve the CSM
a) Train on a more diverse source (adding 6,000 images to BOSSbase lowered the BOSSrank score – making the mismatch worse?)
b) Use classifiers with a simpler decision boundary (L-SVM) (the same problem and lower accuracy)
c) Contaminate the training set with BOSSrank images:
- put in denoised BOSSrank "covers" (adaptive denoising based on the estimated change probabilities)
- put in re-embedded BOSSrank "stego"
(we were unable to obtain consistent results with contamination when experimenting on BOSSbase and decided to toss it)
d) Find out more about the cover source:
- estimate resampling artifacts – this could give info about the original image size (no artifacts detected by Farid's code)
- extract sensor fingerprints from the BOSSbase cameras, detect them in BOSSrank images, and train on images from the right source.

Forensic analysis of BOSSrank
• Fingerprints were extracted from all 7 BOSSbase cameras and detected in BOSSrank.
• ~500 images tested positive for the Leica M9; no other camera tested positive.
[Plot: PCE values over the 1000 BOSSrank images for the Leica and Rebel fingerprints.]

Forensic analysis of BOSSrank, cont'd
• Most images were taken in the Pacific Northwest.

Forensic analysis of BOSSrank, cont'd
• Fingerprint extracted from 25 JPEG images from Tomas Filler's camera (Panasonic Lumix DMC-FZ50), taken previously at SPIE conferences and resized to 512×512 using the same script.
• Positively identified in ~77 BOSSrank images.
• We could not use this for BOSS as other competitors did not have this opportunity.
• We closed our investigation with ~50% of the images attributed to the Leica, the rest declared unknown.
[Plot: PCE values over the 1000 BOSSrank images for the Panasonic fingerprint.]

Forensic-aided steganalysis
• Option #1: Buy a Leica M9 and generate our own database. Oops … the price is $7,000!!
• Option #2: LensRentals.com – rent it for a week. We took 7,301 images with the Leica M9.
• Experiment #1: Train two classifiers – one trained only on Leica images to analyze the Leica images, and one trained on everything to analyze the rest. Merge the prediction files.
• Experiment #2: Add the Leica images to the BOSSbase database and train on all of it.
• Result: the BOSSrank score stayed the same or got slightly worse. Bummer.

Can a cover source be replicated?
A cover source is a very complex entity shaped by:
• The camera and its settings: short exposure → lower dark current; high ISO → increased level of noise; stopping the lens down to f/5.6 → sharper images than at f/2.0.
• The lens: short focus → low depth of field → easier for analysis.
• The content: Binghamton in the fall is a poor replacement for the French Riviera (average amount of edges, smooth regions).
• We rented the wrong lens (50 mm); Patrick used 35 mm.

Model diversity is the key
QUANT, go 4D, use 3rd-order differences (quadratic model), merge:

Difference order   2nd   3rd   2nd    3rd
Cooc.              3D    3D    4D     4D
T                  3     3     2      2
q                  2     2     2      2
dim                686   686   1250   1250

November 13:
• Features: dim 3872
• Training database: 2×9074 BOSSbase 0.92
• Classifier: G-SVM
• BOSSrank: 76%
With the increased dimensionality, machine learning became a serious bottleneck.

Ensemble classifier
To facilitate further development, we started using ensemble classifiers instead of SVMs:
1. Set l = 1.
2. Randomly select k features out of d, k ≪ d.
3. Train an FLD on this random subspace using all BOSSbase images, set the threshold to obtain the minimum P_E, and store the eigenvector e_l.
4. Make decisions on BOSSrank (f_j is the feature vector of the jth test image):
   f_j · e_l > 0 ⇒ Dec(l, j) = 1 (stego)
   f_j · e_l < 0 ⇒ Dec(l, j) = 0 (cover)
5. Repeat steps 2–4 L times to obtain L decisions Dec(1..L, 1..1000) for each test image.
6. For each image, fuse the decisions by voting.
Advantages:
• Low complexity (training a 9288-dim set on 2×17,000 images with L = 31 and k = 1600 takes only 8 minutes on a PC).
• Performance comparable to an SVM.
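Below is a minimal, self-contained sketch of this random-subspace FLD ensemble (my own illustrative code, not the authors' implementation; as a simplification, the threshold is placed halfway between the projected class means rather than tuned for minimum P_E):

```python
import numpy as np

def train_ensemble(X_cover, X_stego, L=31, k=1600, seed=0):
    """Train L Fisher linear discriminants, each on its own random
    k-dimensional subspace of the d-dimensional feature space.
    Returns a list of (feature_indices, weights, threshold) triples."""
    rng = np.random.default_rng(seed)
    d = X_cover.shape[1]
    learners = []
    for _ in range(L):
        idx = rng.choice(d, size=min(k, d), replace=False)
        C, S = X_cover[:, idx], X_stego[:, idx]
        # FLD direction: (within-class scatter)^-1 (difference of class means)
        Sw = np.cov(C, rowvar=False) + np.cov(S, rowvar=False)
        Sw += 1e-6 * np.eye(Sw.shape[0])          # regularize for stability
        w = np.linalg.solve(Sw, S.mean(axis=0) - C.mean(axis=0))
        # simplification: threshold halfway between the projected class means
        # (the slides instead adjust it to minimize the error probability P_E)
        t = 0.5 * ((C @ w).mean() + (S @ w).mean())
        learners.append((idx, w, t))
    return learners

def predict_ensemble(X, learners):
    """Fuse the L base decisions by majority voting: 1 = stego, 0 = cover."""
    votes = np.stack([(X[:, idx] @ w > t).astype(int) for idx, w, t in learners])
    return (votes.mean(axis=0) > 0.5).astype(int)
```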
Scaling up the feature dim seemed to work
Mid November:
• Feature set: previous 3872 + 1458 (MINMAX) = 5330
• Training database: 2×9074 BOSSbase v0.92
• Classifier: Ensemble, L = 31, k = 1600
• BOSSrank: 77%
However, adding more features computed from various residuals did not improve BOSSrank, despite steady improvement on BOSSbase.

A little more empirical magic …
Train on 2N images, where N is about 20–50% larger than the feature dimension.
November 29:
• Feature set: 5330 + QUANT4 + SQUARE + KB (dims 2500, 2500, 1458) = 9288
• Training database: 2×9074 + 2×6500 = 2×15,574
• Classifier: Ensemble, L = 31, k = 1600
• BOSSrank: 78%
• QUANT4: $r_{ij} = x_{i,j-2} - 4x_{i,j-1} + 6x_{i,j} - 4x_{i,j+1} + x_{i,j+2}$
• SQUARE: $r_{ij}^{(h)} = x_{i,j+1} - 2x_{i,j} + x_{i,j-1}$ + "square" cooc
• KB (Ker–Böhme) kernel (cooc = H + V):
  -1/4   1/2  -1/4
   1/2    0    1/2
  -1/4   1/2  -1/4

The final behemoth of dim 24,933
Combination of 32 feature subsets containing:
• 1st–6th order differences
• multiple versions with different values of q (quantization)
• EDGE residuals (effective around edges)
• calibrated features (from a low-pass filtered image)
• 5D coocs with T = 1
December 31:
• Feature set: 24,933
• Training database: 2×34,719
• Classifier: Ensemble, L = 71, k = 2400
• BOSSrank: 81% (accuracy on Leica: 82.3%, on Panasonic: 70.0%)

Score progress
[Plot: BOSSrank score climbing from 68% (Sep 30) to 81% (Dec 31), with milestones on Oct 3, Oct 4, Oct 14, Nov 13, Nov 15, Nov 29, Dec 18, and Dec 23.]

%     68     71     73     75     76     77     78     79      80      81
dim   1458   1458   1458   2916   3872   5330   9388   17933   22307   24933
img   3759   9074   9074   9074   9074   9074   16375  24184   24184   34719

Detecting HUGO without cover-source mismatch
(alias Steganalysis of Content-Adaptive Steganography in Spatial Domain)

Effect of quantization
• Quantization allows the features to sense changes in textured areas and around edges.
• 3D coocs are best quantized with q = c = the central coefficient of the residual computation:
  $r_{ij}^{(h)} = x_{i,j+1} - x_{i,j}$  (c = 1)
  $r_{ij}^{(h)} = x_{i,j+1} - 2x_{i,j} + x_{i,j-1}$  (c = 2)
  $r_{ij}^{(h)} = x_{i,j+1} - 3x_{i,j} + 3x_{i,j-1} - x_{i,j-2}$  (c = 3)
  $r_{ij}^{(h)} = x_{i,j-2} - 4x_{i,j-1} + 6x_{i,j} - 4x_{i,j+1} + x_{i,j+2}$  (c = 6)
  $r_{ij}^{(h)} = x_{i,j+3} - 5x_{i,j+2} + 10x_{i,j+1} - 10x_{i,j} + 5x_{i,j-1} - x_{i,j-2}$  (c = 10)
  $r_{ij}^{(h)} = x_{i,j+3} - 6x_{i,j+2} + 15x_{i,j+1} - 20x_{i,j} + 15x_{i,j-1} - 6x_{i,j-2} + x_{i,j-3}$  (c = 20)

Best quantization value for 3D and 4D coocs
Feature set MINMAX, 4th-order differences, 3D, T = 4:
q    2      4      6      8      10     12
PE   30.5   26.8   26.1   26.8   27.7   28.2

Feature set MINMAX, 4th-order differences, 4D, T = 2:
q    2      4      6      8      10     12
PE   34.2   30.7   28.2   26.8   27.5   28.4

• For 3D coocs, the best q is c.
• For 4D coocs, the best q is 1.5c.
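As a quick sanity check of the q = c rule, this small snippet (mine, not from the slides) builds the 1st–6th order difference kernels by repeated convolution with [1, −1] and reads off the central coefficients 1, 2, 3, 6, 10, 20:

```python
import numpy as np

def difference_kernel(order):
    """n-th order finite-difference kernel obtained by convolving
    [1, -1] with itself n times (order 2 gives [1, -2, 1], etc.)."""
    k = np.array([1.0])
    for _ in range(order):
        k = np.convolve(k, [1.0, -1.0])
    return k

for n in range(1, 7):
    k = difference_kernel(n)
    c = int(np.abs(k).max())   # "central" coefficient -> q for 3D coocs
    print(f"order {n}: kernel {k.astype(int)}, c = {c}, 4D q ~ {1.5 * c}")
```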
Testing higher-order residuals
Average accuracy when training on 8074 and testing on 1000 images from BOSSbase, repeated 100 times (all results with the ensemble classifier).

Fea. type      (diff, q, T)      d      Avg    Best   Worst   L    k
"SPAM"(3D)*    (2nd, 1, 4)       1458   71.4   74.5   69.0    31   1000
MINMAX(3D)     (2nd, 1, 4)       1458   72.7   74.9   68.7    31   1000
QUANT(3D)      (2nd, 2, 4)       1458   73.8   76.8   71.6    31   1000
QUANT(3D)+     (2nd–6th, c, 4)   7290   80.0   82.2   77.4    81   1600
QUANT(4D)+     (2nd–6th, c, 2)   6250   79.1   81.0   76.5    81   1600

* "SPAM" is a direct equivalent of the SPAM vector with the 1st-order differences replaced by 2nd-order ones.
+ 2nd–6th is a merger of QUANT features from 2nd–6th order differences, each quantized with q = c = the central coefficient of the residual.

Accuracy on BOSSbase across cameras
[Plot: per-image accuracy over BOSSbase on 1000 random 8074/1000 (trn/tst) splits; horizontal lines mark the average for each camera (EOS 400D, EOS 7D, Rebel, Leica M9, Nikon D70, Pentax K20D).]
• 6627 cover images were always classified as cover.
• 6647 stego images were always classified as stego.
• 4836 images were always classified correctly both as cover AND as stego.

Pentax K20D is the easiest
[ROC and scatter plot with QUANT (dim 1458).]

Canon Rebel is the hardest
[Scatter plot with QUANT (dim 1458).]

Accuracy correlates with texture
[Two plots over image # sorted by camera: an FLD scatter plot with QUANT (dim 1458) and the average absolute 2nd difference per image. Camera ranges: 1–1354 Canon EOS 400D (10 MP, w = 1936); 1355–1415 Canon EOS 40D (10 MP, w = 1936); 1416–2769 Canon EOS 7D (18 MP, w = 5184); 2770–4372 Canon Digital Rebel XSi (12.2 MP, w = 2256); 4373–6639 Leica M9 (18 MP, w = 5216); 6640–7672 Nikon D70 (6 MP, w = 3040); 7673–end Pentax K20D (14.6 MP, w = 4864).]
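The texture measure in the plot above is the per-image average absolute 2nd difference; a one-function illustration (mine; the slide does not specify the direction, so the horizontal residual is assumed):

```python
import numpy as np

def avg_abs_second_difference(x):
    """Per-image texture measure: mean absolute 2nd-order difference
    |x[i,j+1] - 2*x[i,j] + x[i,j-1]| (horizontal direction assumed)."""
    x = x.astype(np.float64)
    r2 = x[:, 2:] - 2 * x[:, 1:-1] + x[:, :-2]
    return float(np.abs(r2).mean())
```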
Leica images
[Histogram: pixel count vs. grayscale value for a typical Leica image.]
Typical Leica image histogram (possibly caused by the resizing script). The decreased dynamic range makes detection of embedding easier.

Scatter plot for LSB matching (QUANT 1458)
The dependence on content is much weaker!

Comparison to ±1 embedding and CDF
… ensemble with the 33,963-dim behemoth; HUGO with the BOSS payload, accuracy 84.2%.

Implications for steganalysis
• As steganography becomes more sophisticated, steganalysis needs to use more complex models to capture more subtle dependencies among pixels.
• The key is diversity! The model should be rich – a union of smaller submodels.
• Feature dimensionality will inevitably increase.
• Automatic handling of the dimensionality problem is preferable to hand-tweaking – ensemble classifiers scale well w.r.t. feature dimension and training-set size and are suitable for this task.
• Detectability of HUGO embedding in larger images will increase faster than the Square Root Law dictates, because neighboring pixels will be more correlated.
• Cover-source mismatch is an extremely difficult problem that will hamper deployment of steganalysis in practice.
• Robust machine learning is badly needed.

Implications for steganography
• Adaptive stego implemented to minimize distortion in a model space is the way to go.
• Critical: the choice of model and distortion function.
• HUGO's model is high-dimensional but too narrow.
• By making the model more diverse (rich), better steganography can likely be built.
• Despite the progress made during BOSS, HUGO remains the most secure stego algorithm we have ever tested.

BOSS jump-started new directions
• Optimal choice of the residual and its quantization? Perhaps learning both from a given source and for a given stego algorithm?
• Alternatives to coocs as statistical descriptors of the random field of residuals?
• Helped us develop ensemble classification as an alternative to SVMs.
• Drew attention to the CSM:
- training-set contamination
- training only on (processed) test images

Our current results on detection of HUGO and much more in the Rump Session.

Some more interesting stats
1000 splits of BOSSbase into 8074/1000:
BEST … images always classified correctly as cover AND stego
FAs …… images always classified as stego when cover
MDs …… images always classified as cover when stego

Images           BEST    FAs     MDs
Avg. gray        74.1    101.3   102.0
Satur. pixels    2046    4415    5952
Texture          1.73    4.66    3.95

Texture: scaled average |x_{i,j} − x_{i,j+1}|

Effect of quantization
[Plot: original vs. quantized residual distribution; the cooc covers only a limited range, with a thin marginal for the original distribution and a thick marginal after quantization.]
Changes to elements in the marginal are undetected.

Another example after scaling to 512×512
[4 MP image.]