International Journal of Innovative Computing, Information and Control
Volume 3, Number 3, June 2007
ICIC International ©2007 ISSN 1349-4198
pp. 575-588

A LSB STEGANOGRAPHY APPROACH AGAINST PIXELS SAMPLE PAIRS STEGANALYSIS

Xiangyang Luo, Fenlin Liu
Zhengzhou Information Science and Technology Institute
Zhengzhou 450002, P. R. China
xiangyangluo@126.com; liufenlin@vip.sina.com

Peizhong Lu
Department of Computer Science and Engineering
Fudan University
Shanghai 200433, P. R. China
pzlu@fudan.edu.cn

Received October 2005; revised February 2006

Abstract. Through dynamic compensation of the pixel values of an LSB (Least Significant Bit) embedded image, this paper presents a novel LSB information hiding method that resists pixels Sample Pairs Steganalysis (SPA), a powerful, high-precision steganalysis method proposed by Dumitrescu et al. The approach embeds messages in the LSB plane of the carrier image at random positions chosen by a chaotic system, and then applies a dynamic compensation to the stego-image. Even when the embedding ratio is near 100%, this compensation leads SPA steganalysis to an incorrect judgment, because the estimate it obtains is very small, close to 0. Moreover, the initial value of the chaotic system and the selection parameters of the compensation can be used to improve the security of the steganography approach. Experimental results show that this method can also resist some other steganalysis methods, such as RS analysis, the DIH method, and various improved versions of SPA and RS steganalysis.
Keywords: Steganography, LSB embedding, Sample pairs steganalysis, Chaotic system, Dynamic compensation

1. Introduction. Steganography is one of the important research subjects in the field of information security. It enables secret communication by embedding messages in texts, images, audio, video files or other digital carriers. Among all the image information hiding methods, LSB embedding is widely used because of its high hiding capacity and its simplicity of implementation.
Many public steganographic tools, such as S-Tools, EZStego and Steganos, apply this technique. It is therefore of great significance to detect images with hidden messages produced by LSB embedding effectively, accurately and reliably, and many researchers have worked on LSB steganography and steganalysis over the years.

Fridrich et al. [1] developed a steganalysis method for detecting LSB embedding in 24-bit color images (the Raw Quick Pairs, or RQP, method), which is based on analysis of close pairs of colors created by LSB embedding. It works reasonably well as long as the number of unique colors in the cover image is less than 30% of the number of pixels. The RQP method can only provide a rough estimate of the size of the secret message, and the result becomes progressively unreliable once the number of unique colors exceeds about 50% of the number of pixels. Westfeld [2,3] performed blind steganalysis based on statistical analysis of PoVs (pairs of values). This method, the so-called $\chi^2$-statistical test, is successful against sequential LSB steganography. Provos [4] pointed out that this idea can also detect random embedding if it is applied to smaller parts of images. Chandramouli [5,6] introduced hypothesis testing to judge the existence of secret messages embedded by LSB, and the general framework was given in [7]. Fridrich et al. [8,9] also proposed the powerful RS (regular and singular groups) method. This method uses statistics of the changes in regular groups and singular groups in the image to estimate the embedded length accurately. It is suitable for color or gray-scale images when the messages are embedded randomly. Dumitrescu et al. [10,11] proposed SPA, a method to detect LSB steganography via sample pair analysis. When the embedding ratio is more than 3%, this method can estimate the embedded length with relatively high precision.
Paper [12] presented a reliable detection method for LSB steganography based on the difference image histogram (the DIH method). This method can not only reliably detect the existence of messages embedded by sequential or random LSB replacement in images, but can also estimate the hidden length. If the embedding ratio is above 50%, the result of the DIH method is more accurate than that of the RS method. In a sense, RS and SPA are two of the most reliable detectors of thinly-spread LSB steganography to date [13]. Andrew D. Ker [13,14] evaluated the reliability of Pairs [15], RS and SPA (which is called Couples Steganalysis in [14]) through large numbers of experiments, and proposed some good improvements. We also presented an LSM sample pair steganalysis based on an improvement of SPA in [16].

However, most of the methods discussed above (especially the ones with high reliability) are based on some important statistical hypotheses. Although these assumptions hold for natural images, they also offer the encoder a possibility of avoiding detection. In fact, encoders may purposely adjust the embedded images so that they still satisfy these assumptions and thereby avoid exposure. Against RS analysis, Jeong Jae Yu et al. put forward SES [17], which is realized by randomly adding one to or subtracting one from the pixel values of the carrier image during embedding, while the part of the image without hidden message is used to adjust the RS statistics. This method can resist detection by RS or the $\chi^2$-statistical test, and the more pixels are used to adjust the RS statistics, the slimmer the chance that a hidden message is captured by RS detection. So the message embedded by LSB should not be too large if the system is to be secure. The authors recommended that more than 50% of the pixels be used to adjust the RS statistics.
However, as far as SPA steganalysis is concerned, no effective attack seems to have been proposed in the published literature. Through analysis of the method described in [10], we developed a new LSB embedding method that resists SPA detection, called the DCLS (Dynamic Compensation LSB Steganography) method for short. The DCLS method is perhaps the first to focus on defeating SPA. The method first embeds messages in the LSB plane of the carrier image at random positions chosen by a chaotic system, and then applies a dynamic compensation to the stego-image. The compensation style is not fixed, i.e. the position and shape of the compensated part, such as a rectangle or a circle, can be chosen at will. Moreover, some parameters, such as the initial value of the chaotic system and the starting point, length and width of a chosen rectangular area, can be used as part of a system key, which further improves the security of the whole hiding system. Large numbers of experiments show that the method described in this paper can also effectively evade attacks from RS, DIH, improved RS and SPA, and other analyses based on sample pairs. Even when the embedding ratio is near 100%, the method remains effective against such attacks. The method also defeats the $\chi^2$-statistical test.

This paper is organized as follows. The principle of SPA steganalysis is briefly introduced in Section 2. Section 3 describes the principle and algorithm of the DCLS method, together with its analysis. Section 4 presents the chaotic system and experimental results, and Section 5 concludes the paper.

2. Principle of SPA Steganalysis.

2.1. SPA steganalysis method. The principle of the SPA method is based on finite-state machine theory. The states of the finite-state machine are selected multisets of sample pairs. If the sample pairs are drawn from natural images, these multisets satisfy some inherent statistical relations.
But after random LSB embedding these multisets change, which alters these statistical relations.

Assume that the pixel values of an image are represented by the succession of samples $s_1, s_2, \cdots, s_N$ (the index represents the location of a sample in the image). A sample pair is a two-tuple $(s_i, s_j)$, $1 \le i, j \le N$. Let $P$ be a set of sample pairs drawn from an image; then $P$ can be seen as a multiset of two-tuples $(u, v)$, where $u$ and $v$ are the values of two adjacent samples, $0 \le u \le 2^b - 1$, $0 \le v \le 2^b - 1$, and $b$ is the number of bits used to represent each sample value. Denote by $D_n$ the submultiset of $P$ that consists of sample pairs of the form $(u, u+n)$ or $(u+n, u)$, where $n$ is a fixed integer, $0 \le n \le 2^b - 1$. For each integer $m$, $0 \le m \le 2^{b-1} - 1$, define

$$C_m = \{(u, v) \in P \mid \lfloor u/2 \rfloor - \lfloor v/2 \rfloor = m \ \text{or} \ \lfloor v/2 \rfloor - \lfloor u/2 \rfloor = m\},$$

where $\lfloor \cdot \rfloor$ denotes the largest integer not exceeding its argument. Obviously, the $D_n$ form a partition of $P$, the $C_m$ form another partition of $P$, and $D_{2m}$ is contained in $C_m$. We partition $D_{2m+1}$ into two multisets $X_{2m+1}$ and $Y_{2m+1}$, where $X_{2m+1} = D_{2m+1} \cap C_{m+1}$ and $Y_{2m+1} = D_{2m+1} \cap C_m$ for $0 \le m \le 2^{b-1} - 2$, and $X_{2^b-1} = \emptyset$, $Y_{2^b-1} = D_{2^b-1}$. Both $X_{2m+1}$ and $Y_{2m+1}$ contain pairs $(u, v)$ that differ by $2m+1$ (i.e., $|u - v| = 2m+1$). The pairs in which the even component is larger are in $X_{2m+1}$, whereas the pairs in which the odd component is larger are in $Y_{2m+1}$. For natural images (normal signals), a sample pair in $D_{2m+1}$ is equally likely to have a larger or a smaller even component; all the algorithms discussed in [10] are based on this important assumption. That is to say,

$$E\{|X_{2m+1}|\} = E\{|Y_{2m+1}|\} \tag{1}$$

Through deduction, the following quadratic equations can be derived:

$$\frac{(|C_m| - |C_{m+1}|)\,p^2}{4} - \frac{(|D'_{2m}| - |D'_{2m+2}| + 2|Y'_{2m+1}| - 2|X'_{2m+1}|)\,p}{2} + |Y'_{2m+1}| - |X'_{2m+1}| = 0, \quad m \ge 1 \tag{2}$$

and

$$\frac{(2|C_0| - |C_1|)\,p^2}{4} - \frac{(2|D'_0| - |D'_2| + 2|Y'_1| - 2|X'_1|)\,p}{2} + |Y'_1| - |X'_1| = 0, \quad m = 0 \tag{3}$$

where $1 \le m \le 2^{b-1} - 1$, and $|*'|$ represents the cardinality of the multiset $*$ in the embedded image. Solving either (2) or (3) gives the value of $p$, i.e. the length of the hidden message expressed as an embedding ratio.

To get a more robust estimate of the length, [10] uses the more accurate assumption

$$E\left\{\left|\bigcup_{m=i}^{j} X_{2m+1}\right|\right\} = E\left\{\left|\bigcup_{m=i}^{j} Y_{2m+1}\right|\right\} \tag{4}$$

in place of assumption (1), where $1 \le i \le m \le j \le 2^{b-1} - 1$. A more powerful quadratic equation can then be obtained to estimate the value of $p$:

$$\frac{(|C_i| - |C_{j+1}|)\,p^2}{4} - \frac{\left(|D'_{2i}| - |D'_{2j+2}| + 2\sum_{m=i}^{j}(|Y'_{2m+1}| - |X'_{2m+1}|)\right)p}{2} + \sum_{m=i}^{j}(|Y'_{2m+1}| - |X'_{2m+1}|) = 0, \quad i \ge 1 \tag{5}$$

and

$$\frac{(2|C_0| - |C_{j+1}|)\,p^2}{4} - \frac{\left(2|D'_0| - |D'_{2j+2}| + 2\sum_{m=0}^{j}(|Y'_{2m+1}| - |X'_{2m+1}|)\right)p}{2} + \sum_{m=0}^{j}(|Y'_{2m+1}| - |X'_{2m+1}|) = 0, \quad i = 0 \tag{6}$$

A precise estimate of the hiding ratio can be obtained by solving either equation (5) or (6).$^1$ Through experiments, the authors of [10] give recommended values of $i$, $j$ and the judgment threshold. The most precise estimate of the length is obtained when $i = 0$, $j = 30$ and the judgment threshold is 0.018; the average error is 0.023 in this setting. The experimental results reported in [10] are shown in Table 1.

Table 1. The detection results in literature [10]
Embedding ratio   0%       3%       5%   10%   15%   20%
Error ratio       0.1379   0.1103   0    0     0     0

$^1$ $|C_t|$ ($0 \le t \le 2^{b-1} - 1$) is unknown to attackers, because they cannot obtain the original carrier images. All the values in equations (2)-(6) can only be obtained from statistics of the examined image, so $|C_t|$ in these equations is actually $|C'_t|$.

2.2. Principle analysis of SPA method. As illustrated in Section 2.1, SPA steganalysis rests on assumption (1) or assumption (4). From statistics of certain characteristics of the sample pairs in the image, it constructs equations in $p$. The estimated ratio of the hidden message is then compared with the threshold to judge whether a secret message exists. The statistical characteristics include $|C_m|$, $|D_{2m}|$, $|X_{2m+1}|$ and $|Y_{2m+1}|$, the cardinalities of the multisets $C_m$, $D_{2m}$, $X_{2m+1}$ and $Y_{2m+1}$.

For each modification pattern $\pi \in \{00, 10, 01, 11\}$ and any submultiset $A \subseteq P$, denote by $\rho(\pi, A)$ the probability that the sample pairs of $A$ are modified with pattern $\pi$ as a result of the LSB embedding. We say that the multiset $A$ is unbiased if $\rho(\pi, A) = \rho(\pi, P)$ holds for each modification pattern $\pi$; otherwise $A$ is called biased.

The most straightforward attack is to avoid LSB embedding at locations where adjacent sample pairs have close values, for example, no embedding in adjacent sample pairs that differ by less than 3 in value. In other words, we can purposefully make $C_0$ biased, degrading the accuracy of detection. Such an attack against SPA embeds message bits only into candidate sample locations where all adjacent sample pairs are in $C_t$, $t \ge \tau$, where $\tau$ is a prefixed threshold. In other words, any sample pair $(u, v)$ with $|u - v| \ge 2\tau - 1$ will satisfy $|u' - v'| \ge 2\tau - 1$, where $(u', v')$ represents the values of the two samples after LSB embedding. Clearly, this LSB embedding scheme is conditioned on $C_t$, $t \ge \tau$, and both encoder and decoder can refer to the same $C_t$ to decide whether a sample is a candidate for embedding. This kind of attack was examined in [10], where the corresponding countermeasures are discussed. In general, the proposed method is open for attack if the locations of the chosen sample pairs in $P$ are known and if the algorithm examines a specific close set $C_s$ and the chosen $s$ is also known.
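The estimator of Section 2.1 can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in [10]: the function names `spa_counts` and `spa_estimate` are hypothetical, only horizontally adjacent pixels are paired, a negative discriminant is clamped to zero, and the root of smaller magnitude is returned. Those last three choices are assumptions that the text above does not specify. All counts are taken from the examined image, as footnote 1 requires.

```python
import math


def spa_counts(rows, j=30):
    """Count the multisets used by Eq. (6) over horizontally adjacent pairs."""
    c0 = c_next = d0 = d_next = 0  # |C_0|, |C_{j+1}|, |D_0|, |D_{2j+2}|
    y_sum = x_sum = 0              # sums of |Y_{2m+1}| and |X_{2m+1}|, m = 0..j
    for row in rows:
        for u, v in zip(row, row[1:]):
            diff = abs(u - v)
            m = abs(u // 2 - v // 2)  # the pair (u, v) belongs to C_m
            c0 += (m == 0)
            c_next += (m == j + 1)
            d0 += (diff == 0)
            d_next += (diff == 2 * j + 2)
            if diff % 2 == 1 and diff <= 2 * j + 1:
                # Larger component odd -> Y_{2m+1}; larger component even -> X_{2m+1}.
                if max(u, v) % 2 == 1:
                    y_sum += 1
                else:
                    x_sum += 1
    return c0, c_next, d0, d_next, y_sum, x_sum


def spa_estimate(rows, j=30):
    """Estimate the embedding ratio p by solving Eq. (6), the i = 0 case."""
    c0, c_next, d0, d_next, y_sum, x_sum = spa_counts(rows, j)
    eps = y_sum - x_sum
    a = (2 * c0 - c_next) / 4.0
    b = -(2 * d0 - d_next + 2 * eps) / 2.0
    c = float(eps)
    if a == 0:
        return -c / b if b != 0 else 0.0
    disc = max(b * b - 4 * a * c, 0.0)  # assumption: clamp a negative discriminant
    root = math.sqrt(disc)
    candidates = ((-b - root) / (2 * a), (-b + root) / (2 * a))
    return min(candidates, key=abs)     # assumption: keep the smaller-magnitude root
```

Calling `spa_estimate(rows)` on a list of pixel rows returns the estimated ratio, which would then be compared with the judgment threshold 0.018 quoted above.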
Because SPA detection can use statistics of all the possible relations of adjacent pixel pairs, different multisets $\bigcup_{m=i}^{j} C_m$ and $\bigcup_{m=i+1}^{j+1} C_m$ can be chosen through equation (5) to estimate the value of $p$ for different $i$ and $j$. The estimate accuracy improves if $\bigcup_{m=i}^{j} C_m$ and $\bigcup_{m=i+1}^{j+1} C_m$ are unbiased. In fact, it is extremely difficult, if not impossible, to select the locations of embedded message bits in such a way that all of the $C_m$, $0 \le m \le 2^{b-1} - 1$, become biased. So attempting to foil SPA detection by choosing embedding locations is impractical.

Let $\varepsilon_m$ denote the difference between $|Y_{2m+1}|$ and $|X_{2m+1}|$, i.e. $\varepsilon_m = |Y_{2m+1}| - |X_{2m+1}|$, $0 \le m \le 2^{b-1} - 2$. For $1 \le i \le j \le 2^{b-1} - 2$, denote

$$e_{ij} = \frac{2\sum_{m=i}^{j} \varepsilon_m}{|D_{2i}| - |D_{2j+2}|},$$

and for $0 = i \le j \le 2^{b-1} - 2$, denote

$$e_{ij} = \frac{2\sum_{m=0}^{j} \varepsilon_m}{2|D_0| - |D_{2j+2}|}.$$

The error bound on the estimate of the embedding ratio is derived theoretically in [10] as

$$|p - \hat{p}(i, j)| \le \frac{2|e_{ij}|}{1 - e_{ij}}\,(1 - p),$$

where $0 \le i \le j \le 2^{b-1} - 2$, and $\hat{p}(i, j)$ is the estimate of $p$ obtained by solving equation (5) or (6). Note that $|e_{ij}|$ should be small to reduce the estimate error; in other words, we would like to reduce $\sum_{m=i}^{j} \varepsilon_m$ and increase $|D_{2i}| - |D_{2j+2}|$. In general, $\sum_{m=i}^{j} \varepsilon_m$ decreases as the difference between $i$ and $j$ increases. Also, for a given $i$, the larger the distance $j - i$, the larger the difference $|D_{2i}| - |D_{2j+2}|$ and the more robust the estimate of $p$. Therefore, as stated in [10], we should let $i$ be 0 and choose a sufficiently large $j$ to obtain a robust estimate by solving equation (6). In fact, $|X_{2m+1}|$ and $|Y_{2m+1}|$ are so small that they can be ignored when $m > 30$, and the authors of [10] recommend $j = 30$ on the basis of large numbers of experiments.

3. Dynamic Compensation LSB Steganography System. Because equation (6) rests on assumption (4), the attempt made in this paper to defeat the SPA attack starts from assumption (4).
For an image whose least significant bits have been embedded with a message, $\left|\bigcup_{m=i}^{j} X_{2m+1}\right|$ and $\left|\bigcup_{m=i}^{j} Y_{2m+1}\right|$ will be altered, and the difference between them before and after embedding will be altered correspondingly. But how much does this alteration affect the obtained hiding ratio?

Let

$$\delta_{i,j} = \frac{\left|\,\left|\bigcup_{m=i}^{j} Y'_{2m+1}\right| - \left|\bigcup_{m=i}^{j} X'_{2m+1}\right|\,\right|}{\left|\bigcup_{m=i}^{j} Y'_{2m+1}\right| + \left|\bigcup_{m=i}^{j} X'_{2m+1}\right|}$$

be the irrelevance between $\bigcup_{m=i}^{j} X'_{2m+1}$ and $\bigcup_{m=i}^{j} Y'_{2m+1}$; then

$$\left|\,\left|\bigcup_{m=i}^{j} Y'_{2m+1}\right| - \left|\bigcup_{m=i}^{j} X'_{2m+1}\right|\,\right| = \delta_{i,j} \times \left(\left|\bigcup_{m=i}^{j} Y'_{2m+1}\right| + \left|\bigcup_{m=i}^{j} X'_{2m+1}\right|\right).$$

Figure 2 shows the effect on the detection result of an original image, Lena (Figure 1, color, red component), caused by the alteration of $\delta_{i,j}$ in SPA steganalysis under assumption (1) ($i = j$) and assumption (4) ($i = 0$, $j = 30$).

Figure 1. Original Lena image

Figure 2. The effect of irrelevance on detection results

Generally, the value of $\delta_{i,j}$ becomes larger after LSB embedding. As shown in Figure 2, the estimate of $p$ obtained from detection grows as $\delta_{i,j}$ increases. In other words, the closer the difference between $\left|\bigcup_{m=i}^{j} Y'_{2m+1}\right|$ and $\left|\bigcup_{m=i}^{j} X'_{2m+1}\right|$ is to 0, the smaller the estimated length is. So to defeat SPA, we must make this difference small enough that $p$ falls below the judgment threshold.

Based on this idea, we present an LSB embedding method that can defeat SPA steganalysis. In this method, all the pixel values in part of the LSB embedded image are compensated. Adding 1 (or subtracting 1) is applied to all the pixel values in the chosen part to bring $\delta_{i,j}$ to its smallest value, close to 0, and thereby to make the SPA estimate close to 0. The range of this part can be chosen dynamically according to $\delta_{i,j}$, the irrelevance after embedding.
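The irrelevance defined above is simple to compute from an image. The following Python sketch is illustrative only: the function name `irrelevance` is hypothetical, and pairing only horizontally adjacent pixels is an assumption. It classifies each pair by the rule of Section 2.1 (a pair with odd difference is in $Y$ when its larger component is odd, and in $X$ when its larger component is even).

```python
def irrelevance(rows, i=0, j=30):
    """Return delta_{i,j} for the horizontally adjacent pixel pairs of an image."""
    y_count = x_count = 0
    for row in rows:
        for u, v in zip(row, row[1:]):
            diff = abs(u - v)
            # Only pairs in D_{2m+1} with i <= m <= j take part in delta_{i,j}.
            if diff % 2 == 1 and 2 * i + 1 <= diff <= 2 * j + 1:
                if max(u, v) % 2 == 1:
                    y_count += 1  # larger component is odd: pair is in Y
                else:
                    x_count += 1  # larger component is even: pair is in X
    total = y_count + x_count
    # Assumption: an image with no qualifying pairs is reported as delta = 0.
    return abs(y_count - x_count) / total if total else 0.0
```

For instance, an image with two pairs in $Y$ and one pair in $X$ gives $\delta_{0,30} = 1/3$, while equal counts give $\delta_{0,30} = 0$.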
Denote the carrier image by $I$, the embedded message by $S$, the embedded image by $I'$, and the dynamically compensated $I'$ by $I''$. Assume that the width and height of the image are $W = M$ and $H = N$, where $M$ and $N$ are the numbers of pixels in the horizontal and vertical directions of the image, respectively.

Generally, the whole LSB steganography system consists of two processes, information embedding and information extraction, carried out on the two sides of the communication, the sender and the receiver. In this paper, we first preprocess the carrier image, then embed the message in the LSB plane of the pixels, and afterwards compensate the pixel values of the image, that is, adjust the statistical parameters $\left|\bigcup_{m=0}^{j} Y'_{2m+1}\right|$ and $\left|\bigcup_{m=0}^{j} X'_{2m+1}\right|$ of the embedded image. Since $|Y'_{2m+1}|$ and $|X'_{2m+1}|$ are nearly 0 when $m > 30$, only $\left|\bigcup_{m=0}^{30} Y'_{2m+1}\right|$ and $\left|\bigcup_{m=0}^{30} X'_{2m+1}\right|$ need to be dynamically compensated in practice. The principle of the whole hiding system is illustrated in Figure 3. The message embedding algorithm, compensation preprocessing, compensation algorithm and extraction algorithm are explained in turn below.

Figure 3. The steganography system against sample pairs steganalysis

3.1. Message embedding algorithm. There are two modes of LSB embedding: sequential embedding and random embedding. In line with the SPA steganalysis method, we consider only the latter mode in this paper. The random embedding process uses a random number generator based on a chaotic system.

Input: original carrier image $I$; message bit-stream $m_1, \cdots, m_l$, where $l$ is the embedding length and $m_i = 0$ or 1, $1 \le i \le l$; initial key $K$.
Output: stego-image $I'$.
Steps:
1. Image index. Index the pixels of the cover image from left to right and from top to bottom, and denote them $c_1, c_2, \cdots, c_{M \times N}$.
2. Generate random sequence. From the initial key $K$, generate a random sequence $k_1, \cdots, k_{M \times N}$ via a perturbed digital chaotic system.
3. Select embedding positions. Use the pixels at the following index locations to embed message bits:
$$j_1 = k_1, \qquad j_i = j_{i-1} + k_i, \quad i \ge 2.$$
That is, message bit $m_i$ replaces the LSB of the pixel at location $c_{j_i}$, and the next embedding location is decided by the next element of the sequence. If $(j_{i-1} + k_i) > M \times N$, then $j_i = (j_{i-1} + k_i) \bmod (M \times N)$.
4. Embed LSB message. Embed all message bits to obtain the stego-image $I'$.

3.2. Compensation preprocessing. Because the pixel values of part of the image are increased by 1 in the compensation process, a pixel value of 255 would become 256, which exceeds the range that one byte can represent. One countermeasure is to wrap the value to 0 when 1 is added to 255, but this makes part of the image unsmooth, producing flecks, as illustrated in Figure 4 (on the back brim of the hat and the bulge of the nose). Therefore, it is necessary to preprocess the stego-image before compensating; the concrete operation is to scan all the pixel values in the carrier image and change every pixel whose value is 255 to 253. Obviously, such modification has little effect on the image and causes no visible difference. At the same time, this modification preserves the parity of adjacent sample pairs and does not affect correct extraction of the message bits. If the compensation is realized by subtracting 1 from the pixel values, analogous preprocessing should be applied.

Figure 4. The color image Lena after the pixel value 255 is modified to 0

3.3. Compensation algorithm.
The irrelevance between $|Y'_{2m+1}|$ and $|X'_{2m+1}|$ grows after LSB embedding, and the compensation algorithm deals with how to eliminate this change. To ease comprehension, we first assume that the adjacent pixel pairs whose difference is $2m+1$ are distributed uniformly over the embedded image. Then after the pixel values of half the stego-image are increased by 1, the total number of adjacent pairs whose difference is $2m+1$ is almost unchanged (there is only slight alteration near the dividing line). But since the pixel values in the compensation area are all increased by 1, the parity of each pixel value there is interchanged. As for $|Y'_{2m+1}|$, it is reduced by $|Y'_{2m+1}|/2$ but increased by $|X'_{2m+1}|/2$, so the cardinality after modification is

$$|Y''_{2m+1}| = \frac{|Y'_{2m+1}|}{2} + \frac{|X'_{2m+1}|}{2}.$$

In this paper, $|*''|$ denotes the cardinality of the multiset $*$ in the compensated stego-image. Similarly,

$$|X''_{2m+1}| = \frac{|Y'_{2m+1}|}{2} + \frac{|X'_{2m+1}|}{2}.$$

It is obvious that $|Y''_{2m+1}| = |X''_{2m+1}|$ after half the pixel values of the image have been compensated. Then the value of $\bigl|\,|Y''_{2m+1}| - |X''_{2m+1}|\,\bigr|$ is 0, i.e. their irrelevance is 0. So solving equation (6) in SPA gives $p = 0$, which misleads SPA into the incorrect judgment that no hidden message is in the image.

For example, after the image Lena has been embedded with a 5% message, we compute $\left|\bigcup_{m=0}^{30} Y'_{2m+1}\right|$ and $\left|\bigcup_{m=0}^{30} X'_{2m+1}\right|$ and their irrelevance $\delta_{0,30} = 0.0107$; the estimate obtained by SPA is $p = 4.97\%$. We then compensate the pixel values of half this stego-image, compute $\left|\bigcup_{m=0}^{30} Y''_{2m+1}\right|$ and $\left|\bigcup_{m=0}^{30} X''_{2m+1}\right|$ and their irrelevance, and get $\delta_{0,30} = 0.0013$, while the estimate obtained by SPA is $p = 0.59\%$. Since the threshold used in SPA is 0.018, SPA steganalysis concludes that no hidden information is in the image, so the hiding system is protected from the SPA attack.
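The parity interchange that drives this argument can be checked directly. The short Python sketch below uses hypothetical helper names (`classify`, `compensate_pair`) and is not taken from the paper. It verifies that adding 1 to both samples of a pair with odd difference always moves the pair from $X$ to $Y$ or from $Y$ to $X$, because the difference is unchanged while the parity of the larger component flips.

```python
def classify(u, v):
    """Return 'X' or 'Y' for a pair with odd difference, otherwise None."""
    if abs(u - v) % 2 == 0:
        return None
    # Larger component odd -> Y; larger component even -> X (Section 2.1).
    return "Y" if max(u, v) % 2 == 1 else "X"


def compensate_pair(u, v):
    """Add 1 to both samples, as happens inside the compensation area."""
    return u + 1, v + 1


# Check every pair of values 0..254 (255 is removed by the preprocessing step).
swapped = all(
    classify(*compensate_pair(u, v)) != classify(u, v)
    for u in range(255)
    for v in range(255)
    if classify(u, v) is not None
)
```

Here `swapped` evaluates to `True`: compensating a region exchanges the roles of $X_{2m+1}$ and $Y_{2m+1}$ for the pairs lying wholly inside it, which is why compensating about half of the pairs balances the two counts.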
But after the image Lena has been embedded with a 90% message, the irrelevance computed from $\left|\bigcup_{m=0}^{30} Y'_{2m+1}\right|$ and $\left|\bigcup_{m=0}^{30} X'_{2m+1}\right|$ is $\delta_{0,30} = 0.1873$, and correspondingly $p = 93.72\%$. Repeating the above procedure, after the pixel values of half the image have been compensated we get $\delta_{0,30} = 0.0094$ and $p = 4.64\%$. This result is still larger than the threshold, so a correct judgment will be made, and the steganography will be exposed to the SPA attack. This means that simply adding one to the pixel values of half the stego-image as a whole is not adequate, and the compensation area of the image should be chosen dynamically.

One of the simplest selection methods is exhaustive search. Assume that the compensation area of the image is rows $0 \sim i$, $1 \le i \le N - 2$ (when $i$ is $N - 1$, all the pixel values in the image are increased by one and $\delta_{0,30}$ is unchanged, although $X_{2m+1}$ and $Y_{2m+1}$ have interchanged). Starting from $i = 1$, add 1 to all the pixel values in rows $0 \sim i$ and compute the corresponding $\delta^{(1)}_{0,30}$; then increase $i$ by 1 in turn to obtain $\delta^{(i)}_{0,30}$ in all $N - 2$ cases. Choose the lowest irrelevance $\min \delta^{(i)}_{0,30}$; the rows $0 \sim i$ corresponding to $\min \delta^{(i)}_{0,30}$ are used as the compensation area.

Obviously, the compensation area can be chosen at will. It can be a randomly chosen rectangle or another shape in the stego-image. The size of that area should be adjusted so that compensation yields the lowest $\delta_{0,30}$, which leads SPA steganalysis to a relatively low estimate of $p$; we call the compensation with the lowest $\delta_{0,30}$ the optimum one in this selection. After the image Lena has been LSB embedded with a 90% ratio, the irrelevance obtained after the optimum compensation is $2.3736 \times 10^{-4}$ at $i = 241$, and the embedding ratio obtained by SPA detection is $p = 0.12\%$.
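The exhaustive row search described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, only horizontally adjacent pixels are paired (so pairs never straddle the boundary of the compensated rows), the image is assumed to contain no pixel of value 255 after the preprocessing of Section 3.2, and ties are broken in favor of the smallest $i$.

```python
def irrelevance(rows, j=30):
    """delta_{0,j} over horizontally adjacent pixel pairs."""
    y_count = x_count = 0
    for row in rows:
        for u, v in zip(row, row[1:]):
            diff = abs(u - v)
            if diff % 2 == 1 and diff <= 2 * j + 1:
                if max(u, v) % 2 == 1:
                    y_count += 1  # larger component odd: pair is in Y
                else:
                    x_count += 1  # larger component even: pair is in X
    total = y_count + x_count
    return abs(y_count - x_count) / total if total else 0.0


def compensate_rows(rows, last_row):
    """Return a copy of the image with 1 added to every pixel in rows 0..last_row."""
    return [
        [p + 1 for p in row] if r <= last_row else list(row)
        for r, row in enumerate(rows)
    ]


def best_compensation(rows, j=30):
    """Try i = 1..N-2 and return (i, delta) with the lowest delta_{0,j}."""
    best_i, best_delta = None, None
    for i in range(1, len(rows) - 1):
        delta = irrelevance(compensate_rows(rows, i), j)
        if best_delta is None or delta < best_delta:
            best_i, best_delta = i, delta
    return best_i, best_delta
```

The returned row index plays the role of the compensation parameter that both communicating parties must share. Recomputing the statistics from scratch for every $i$ keeps the sketch short; an incremental update per row would be the natural optimization for large images.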
Since this estimate is far below the threshold 0.018, SPA steganalysis comes to the incorrect conclusion that no information is hidden in Lena. When the image is embedded with a ratio of 100% and optimally compensated by the above method, the SPA detection result is 0.073%, still far lower than the judgment threshold.

Obviously, both sides of the communication must know in which part of the image the pixel values were increased by 1; otherwise correct message extraction is impossible. The parameters, such as the starting point and the bottom edge of the rectangular range, can therefore be regarded as the secret key $K'$. $K'$ and $K$ together constitute the system secret key, which determines the security of the steganography system. The system secret key is transmitted through a secure channel.

3.4. Message extraction algorithm. Extraction is the reverse process of embedding and compensation. First, subtract 1 from the pixel values in the range given by the secret key $K'$, so that the image $I''$ is restored to the image before compensation, $I'$ (ignoring the effects of noise in transmission). Then generate the corresponding pseudo-random sequence from the embedding key $K$, find the pixels at the locations given by the pseudo-random sequence in the image, and finally extract the embedded message from their least significant bits. The extraction algorithm is quite simple and is not the emphasis of this paper, so we give no detailed description here.

Note 1: The purpose of dynamic compensation is to obtain a relatively low $\delta_{0,30}$ after the pixel values in the selected optimum area are increased or decreased by one, and thereby to make SPA detection obtain a very low estimate of $p$. What effect does dynamic compensation have on the $\delta_{i,j}$ for other $i$, $j$?
Figures 5(a), 5(b) and 5(c) show the $\delta_{i,j}$ for different $i$, $j$ in the original image Lena, the 50% LSB embedded Lena, and the dynamically compensated stego-Lena, respectively. It can be seen that $\delta_{i,j}$ becomes low along with $\delta_{0,30}$ after compensation (the lowest $\delta_{i,j}$ before compensation is about 0.035 and the largest $\delta_{i,j}$ after compensation is less than 0.013), i.e. all the $\delta_{i,j}$ are compensated, which makes attacks on the LSB embedding difficult. In fact, $|Y_{2m+1}|$ and $|X_{2m+1}|$ are small enough to be ignored when $m > 30$.

Figure 5. (a) The irrelevance of the original gray-scale image Lena; (b)(c) The irrelevance before and after the compensation in the 50% stego-image Lena

4. Experiment Results and Analysis. The DCLS approach first embeds messages in the LSB plane of the carrier image at random positions via a chaotic system. In this experiment, we selected the perturbed digital chaotic system proposed in [19] and made a small improvement. The system uses perturbation and diffusion techniques to improve the dynamic characteristics of the digital chaotic system. Experimental results show that this chaotic system has some excellent dynamic characteristics, such as a long cycle and good randomness, so the steganography system has comparatively high security. The chaotic system can be described simply as follows.

Consider a one-dimensional piecewise linear chaotic map with four subintervals $[0, a)$, $[a, b)$, $[b, 1-a)$, $[1-a, 1]$:

$$g(x) = \begin{cases} 4x, & 0 \le x < a \\ 2 - 4x, & a \le x < b \\ 4x - 2, & b \le x < 1 - a \\ 4 - 4x, & 1 - a \le x \le 1 \end{cases} \tag{7}$$

Diffuse the third subinterval of the domain of $g(x)$, $c_3 = [b, 1-a)$: divide $c_3$ into two segments in the ratio $e : 1 - e$, namely $c_{31} = [b, b + e/r)$ and $c_{32} = [b + e/r, 1-a]$, where $e$ is the diffusion coefficient, $r$ is a constant, and $x$ is the initial number. Let $a = 0.25$, $b = 0.5$ and $r = 4$; then, according to the perturbing algorithm presented in [19], we obtain a new piecewise linear chaotic map:

$$f(x) = \begin{cases} g(x), & x \notin c_3 \\ g\!\left(\dfrac{4}{2e}(x - 0.5)\right), & x \in c_{31} \\ g\!\left(\dfrac{x - (0.5 + \frac{e}{4})}{1 - e} + 0.75\right), & x \in c_{32} \end{cases} \tag{8}$$

where $e \in (0, 1)$ is the diffusion coefficient, $x \in (0, 1)$, and the initial value is $x_0$. When constructing the chaotic sequence, $x_0$ and $e$ are obtained from the key $K$. Dividing the mapping interval evenly into $2^A$ subintervals, one obtains random numbers between 0 and $2^A - 1$, which form a sequence. In order to use the chaotic system in the DCLS method, we make a small modification to this sequence: adding 1 to every element gives a new sequence, which is also random.

4.1. Standard images and detection results. In order to test the effectiveness of the steganographic technique described in this paper, some standard images are embedded with messages and analyzed in the experiment. The test image set includes twenty-four typical gray-scale images, twenty-two of which come from the twenty-four images enumerated in [10]. All the images are embedded by the LSB algorithm described in Section 3.1, and then compensated by the algorithm described in Section 3.3. For each image, we run experiments with embedding ratios of 3%, 5%, 10%, 20%, $\cdots$, 90%. Table 2 gives the average estimate of $p$ obtained by SPA steganalysis before and after these twenty-four stego-images are dynamically compensated. DC means dynamic compensation.

Table 2. The effect on SPA steganalysis result caused by dynamic compensation
Embedding ratio   Average estimate value (%)
                  Before DC   After DC
0                 0.69        0.69
3%                3.07        0.01
5%                4.95        0.02
10%               10.27       0.03
20%               20.13       0.04
30%               29.94       0.04
40%               40.30       0.06
50%               50.35       0.06
60%               60.37       0.06
70%               70.10       0.09
80%               81.01       0.10
90%               89.60       0.11

The table above shows that after dynamic compensation is applied to the stego-images, the estimate of the embedding ratio obtained by SPA steganalysis is greatly reduced.
An embedding ratio of 0 in the table means that the images are original ones, so the estimates are the same before and after compensation. The largest average value of $p$ is only 0.11%, much smaller than the judgment threshold of 1.8%, so the method defeats the SPA attack effectively. A similar experiment was also applied to 2000 JPG images of 1024 × 768 pixels from a digital camera. The result is encouraging: none of the secret images is captured by SPA detection.

Figure 6. (a) Compensation area chosen by the method of average interval (b) SPA detection result before and after the compensation

4.2. Performance effect caused by random selection. In this part we discuss the effect on the detection result of random selection of the compensation area. The test image set is the same as the one used in Section 4.1, and the compensation area is selected by the method of average interval. Some $i \times i$ areas are chosen as the compensation area, where $32 \le i \le 128$, as illustrated in Figure 6(a). The value of $i$ is determined by $\min \delta^{(i)}_{0,30}$. Figure 6(b) shows the effect on SPA steganalysis of using the average interval method to select the compensation area. Obviously, when the compensation area is chosen by the method of average interval, the estimated embedding ratio is a very small value close to 0 even when the actual embedding ratio is as high as 90%. We conclude that this method defeats the SPA attack effectively.

Figures 7(a) and 7(b) show the distribution of $|C_m|$ in the 50% LSB embedded gray-scale image Lena before and after compensation, respectively. It can be seen that the distributions of $|C_m|$ in the two images are very similar. $|C_m|$ changes after the stego-image is compensated, but the amplitude of the change is small, because $|D_{2m}|$ stays the same before and after compensation.
Extensive experiments show that $|C_m|$ still satisfies the inequality relation $2|C_0| > |C_1| > |C_2| > \cdots > |C_m| > |C_{m+1}| > \cdots$ that holds for natural images.

Figure 7. (a) The distribution of $|C_m|$ in the image before compensation; (b) The distribution of $|C_m|$ in the image after compensation

4.3. Effect on DCLS method caused by preprocessing. Figure 8 shows the embedding ratio estimated by SPA steganalysis for the color image Lena after it has been preprocessed by the method described in Section 3, LSB embedded, and compensated. Figure 8 shows that the SPA attack also fails on the preprocessed image once the stego-images are compensated.

4.4. Defeating the RS steganalysis approach. Further experiments show that the DCLS method can also defeat the RS attack. Table 3 gives the average embedding ratio p estimated by RS steganalysis before and after the twenty-four stego-images used in Section 4.1 are dynamically compensated, where DC means dynamic compensation. Table 3 shows that all the estimates obtained from the compensated stego-images are small (at most 2.52%). The DCLS method therefore also performs well against RS steganalysis.

Figure 8. Detection result before and after compensation of the preprocessed image Lena

Table 3. Effect on RS steganalysis result caused by dynamic compensation

Embedding ratio    Average estimate value (%)
                   Before DC    After DC
0                  1.05         1.05
3%                 3.33         0.79
5%                 5.26         0.87
10%                10.89        0.84
20%                21.14        1.04
30%                31.30        1.27
40%                41.28        1.55
50%                51.56        1.72
60%                61.64        1.90
70%                71.33        2.00
80%                81.47        2.44
90%                89.75        2.52

4.5. Defeating other steganalytical approaches. To test how well the DCLS method resists the $\chi^2$-statistical test, DIH, Pairs, and the improved versions of Pairs, RS and SPA, 2000 JPG images (from a digital camera) are used in the experiment.
These images are embedded with 10%, 50% and 90% LSB messages, then dynamically compensated, and finally tested with each of these steganalysis methods. In the experiment, the $\chi^2$-statistical test is from [3], DIH steganalysis is from [12], and the improved methods of Pairs, RS, and SPA are from [13], [16] and [18] (a uniform threshold of 0.03 is assumed). The results show that all of the above attacks are effectively defeated by our DCLS approach.

5. Conclusions. A new LSB steganography method, the DCLS approach, is presented. The approach first embeds a message in the LSB plane of the carrier image at random positions chosen by a chaotic system, then applies a dynamic compensation to the stego-image. Theoretical analysis and experiments show that the approach is effective against sample pairs steganalysis. Some parameters of the chaotic system and of the compensation are used as part of the system secret key, which further improves the security of the hiding system. Further empirical results show that the method also strongly resists other steganalysis methods, such as RS, DIH, the $\chi^2$-statistical test, Pairs, and the improved versions of Pairs, RS and SPA detection. Future research will aim to develop more robust steganography systems.

Acknowledgment. This work is supported partially by the National Natural Science Foundation of China (No.60673082, 90304014 and 60374004), Special Funds of Authors of Excellent Doctoral Dissertation in China (No.200084), National High Technology Research and Development Program of China ("863" Program, No.2006AA10Z409), Henan Science Fund for Distinguished Young Scholar (No.0412000200) and HAIPURT (No. 2001KYCX008). The authors also gratefully acknowledge the helpful comments and suggestions of the reviewers, which have improved the presentation.

REFERENCES

[1] Fridrich, J., R. Du and L. Meng, Steganalysis of LSB encoding in color images, Proc.
of the IEEE International Conference on Multimedia and Expo, CD-ROM, IEEE Press, Piscataway, N. J., 2000.
[2] Westfeld, A., Detecting low embedding rates, Proc. of the Information Hiding Workshop, Springer LNCS, vol.2578, pp.324-339, 2002.
[3] Westfeld, A. and A. Pfitzmann, Attacks on steganographic systems, in Information Hiding, A. Pfitzmann (ed.), New York: Springer-Verlag, pp.61-76, 1999.
[4] Provos, N., Defending against statistical steganalysis, Proc. of the 10th USENIX Security Symposium, Washington, DC, 2001.
[5] Chandramouli, R. and N. Memon, Analysis of LSB based image steganography techniques, Proc. of the IEEE International Conference on Image Processing, pp.1019-1022, 2001.
[6] Chandramouli, R. and N. D. Memon, On sequential watermark detection, IEEE Transactions on Signal Processing, vol.51, no.4, pp.1034-1044, 2003.
[7] Chandramouli, R. and N. D. Memon, A distributed detection framework for steganalysis, Proc. of the ACM Multimedia Workshop Marina, pp.123-126, 2000.
[8] Fridrich, J. and M. Goljan, Practical steganalysis of digital images - State of the art, in Security and Watermarking of Multimedia Contents IV, E. J. Delp III and P. W. Wong (eds.), Proc. SPIE, vol.4675, pp.1-13, 2002.
[9] Fridrich, J., M. Goljan and R. Du, Reliable detection of LSB steganography in color and grayscale images, Proc. of the ACM Workshop Multimedia Security, Ottawa, ON, Canada, pp.27-30, 2001.
[10] Dumitrescu, S., X. Wu and Z. Wang, Detection of LSB steganography via sample pair analysis, IEEE Transactions on Signal Processing, vol.51, no.7, pp.1995-2007, 2003.
[11] Dumitrescu, S., X. Wu and Z. Wang, Detection of LSB steganography via sample pair analysis, Springer LNCS, vol.2578, pp.355-372, 2003.
[12] Zhang, T. and X. Ping, Reliable detection of LSB steganography based on the difference image histogram, Proc. of the IEEE ICASSP 2003, Part III, pp.545-548, 2003.
[13] Ker, A. D., Improved detection of LSB steganography in grayscale images, Proc.
of the 6th Information Hiding Workshop, Springer LNCS, vol.3200, pp.97-115, 2004.
[14] Ker, A. D., Quantitative evaluation of pairs and RS steganalysis, in Security, Steganography, and Watermarking of Multimedia Contents VI, E. J. Delp III and P. W. Wong (eds.), Proc. SPIE, vol.5306, pp.83-97, 2004.
[15] Fridrich, J., M. Goljan and D. Soukal, Higher-order statistical steganalysis of palette images, in Security and Watermarking of Multimedia Contents V, E. J. Delp III and P. W. Wong (eds.), Proc. SPIE, vol.5020, pp.178-190, 2003.
[16] Lu, P., X. Luo et al., An improved sample pairs method for detection of LSB embedding, Proc. of the 6th Information Hiding Workshop, Springer LNCS, vol.3200, pp.116-128, 2004.
[17] Yu, J. J., J. W. Han et al., A secure steganographic scheme against statistical analyses, Proc. of the IWDW 2003, Springer LNCS, vol.2939, pp.497-507, 2004.
[18] Luo, X., B. Liu and F. Liu, Improved RS method for detection of LSB steganography, Proc. of the 2005 International Conference on Computational Science and Its Applications, Springer LNCS, vol.3481, pp.508-516, 2005.
[19] Liu, B., X. Luo and F. Liu, Perturbing scheme of digital chaos, Journal of Shanghai Jiaotong University (Science), English version, vol.11, no.2, pp.172-176, 2006.