RESEARCH PAPER ON ENHANCING DATA COMPRESSION RATE USING STEGANOGRAPHY

Tamanna Garg
School of Computer Science & Engineering, Bahra University, Shimla Hills, India.
garg.tamanna1@gmail.com

Sonia Vatta
School of Computer Science & Engineering, Bahra University, Shimla Hills, India.
soniavatta@yahoo.com

ABSTRACT- This paper describes a compression algorithm based on steganography. The algorithm has been used to develop an application that helps users hide large text documents inside small images. With the developed compression application, the maximum number of bits hidden per pixel can be increased to eight. After the data is hidden inside an image, no visible distortion appears. The application is also compatible with all document and image formats. The developed application automatically converts the output stego image to bmp format.

Keywords- steganography, cryptography, LSB, embedding, extraction, secret key, compression.

I. INTRODUCTION

In the contemporary era there is a dire need to convey confidential information secretly. Steganography serves this purpose by hiding information inside a carrier. In other words, steganography hides a document in a carrier in such a way that the very existence of the hidden information cannot be detected by an observer. Image steganography is very prominent nowadays. Hiding small amounts of information in small or large images is an easy task, but hiding large amounts of information in small images is very complicated. This research work describes a compression algorithm that has been used to design an application capable of hiding large documents in small images without any visible changes. The developed compression algorithm is capable of hiding up to eight bits of data per pixel.
The primary objective of steganography is to avoid drawing attention to the transmission of hidden information. The basic terms used in steganography systems are the cover message, the secret message, the secret key and the embedding algorithm. In this research work, the embedding algorithm is the compression algorithm. The cover message is the carrier of the message, such as an image, video, audio, text or some other digital media. Here the carrier is an image. The secret message is the information to be hidden in the chosen digital media. In this work the secret information is a text document in any format. The secret key is usually used to encrypt the message for additional security. The embedding algorithm is the method used to embed the secret information in the cover message.

In steganography, before the hiding process starts, the sender must select an appropriate message carrier, the message to be hidden, and a secret key used as a password. A robust steganography algorithm must be selected so that the message is embedded effectively. The sender then sends the stego message to the receiver using any modern communication technique. After receiving it, the receiver recovers the hidden message using the extraction algorithm and the secret key.

Figure 1: General Steganography Approach

II. REVIEW OF LITERATURE

In any field, the literature review helps to identify the questions that research should address. The review of literature reveals that further investigation in the field is required, so many research developments related to this work have been taken into consideration. James C. Judge, in his work 'Steganography: Past, Present, Future', stated that steganography is the term applied to any number of processes that will hide a message within an object, where the hidden message will not be apparent to an observer [1].
Muhalim bin Mohamed Amin et al., in their work 'Information Hiding Using Steganography', put forward a system that aims to enhance the compression rate using the LSB technique by randomly dispersing the bits of the message in the image. This technique makes it harder for unauthorized people to extract the original message [2]. T. Morkel et al., in their work 'An Overview of Image Steganography', asserted that different applications have different requirements of the steganography technique used. For example, some applications may require absolute invisibility of the secret information, while others require a larger secret message to be hidden [3]. In another study, entitled 'An Overview of Steganography', Shawn D. Dickman stated that steganography is a useful tool that allows covert transmission of information over an overt communications channel [4]. Namita Tiwari et al., in 'Evaluation of Various LSB based methods of Image Steganography on GIF File Format', observed that many different carrier file formats can be used, but digital images are the most popular because of their frequency on the Internet [5]. Yongzhen Zheng et al., in their work 'Identification of Steganography Software based on Core Instructions Template Matching', proposed an approach based on the principles of the LSB replacement steganography algorithm, which identifies steganography software by core instructions template matching [6]. Dipesh Agrawal and Samidha Diwedi, in 'Analysis of random bit image steganography techniques', noted that many steganography techniques can be used, such as least significant bit (LSB) substitution and layout management schemes that replace only 1's or only 0's in the lower nibble of a byte to hide a secret message in an image [7]. Saddaf Rubab and Dr. M.
Younus, in their work 'Improved Image Steganography Technique for Colored Images using Huffman Encoding with Symlet Wavelets', presented a new algorithm to hide text in a colored image of any size using Huffman encoding and a 2D wavelet transform. Their results showed negligible degradation of image quality. The method gives more capacity for larger image sizes, enhances security and preserves image quality. Inserting the Huffman codes into the three components of the colored image adds complexity to the scheme [8]. Shamim Ahmed Laskar and Kattamanchi Hemachandran, in 'High Capacity data hiding using LSB Steganography and Encryption', proposed a high-capacity data embedding approach that combines steganography and cryptography. The combination of these two methods enhances the security of the embedded data. The main objective of that work was to provide high capacity as well as resistance against visual and statistical attacks [9]. Hemalatha S et al., in 'A Secure and High Capacity Image Steganography Technique', provided a novel image steganography technique to hide multiple secret images and keys in a color cover image using the Integer Wavelet Transform (IWT). However, the approach is susceptible to noise if spatial domain techniques are used to hide the key [10]. Elham Ghasemi et al., in 'High Capacity Image Steganography Based on Genetic Algorithm and Wavelet Transform', described the application of the wavelet transform and a genetic algorithm (GA) in a novel steganography scheme. A GA-based mapping function is employed to embed data in discrete wavelet transform coefficients in 4*4 blocks of the cover image. The optimal pixel adjustment process (OPAP) is applied after embedding the message. This work introduced a novel steganography technique to increase the capacity and the imperceptibility of the image after embedding [11].
Rahul Jain and Naresh Kumar, in 'Efficient data hiding scheme using lossless data compression and image steganography', described a data hiding scheme that uses image steganography and compression. The embedding capacity of the image is improved by preprocessing the secret message with a lossless data compression technique. This preprocessing significantly reduces the size of the secret data and thus permits more data to fit into the same image [12]. Prashant Dahake, in his work 'An Efficient Encryption Using Data Compression towards Steganography', stated that compactness is achieved using a data compression technique, namely arithmetic coding. In the proposed system, additional security is provided by an encryption technique, which may use any cryptographic algorithm and is applied to the compressed data [13].

The review above first gives a general definition of steganography. It then describes work by other researchers on improving the quality and the amount of data hidden in digital media. From this review, it was found that hiding large text documents in small images remains a problem. The aim was therefore to develop a technique that enhances the compression rate in order to hide large amounts of information in small images.

III. OBJECTIVES

The main goal of this research work is to enhance the data compression rate by designing a compression algorithm and applying it to bmp images, so that larger text can be hidden in an image. This project has the following objectives:
To explore techniques of hiding data using the encryption module of this project.
To explore techniques of extracting secret data using the decryption module.
To design a compression algorithm.
To enhance the data compression rate by using the designed algorithm.
To create a tool that can be used to hide the data inside a 24-bit colored image.

IV. OLD TECHNIQUE & PROPOSED TECHNIQUE

1. OLD TECHNIQUE

The old technique was based on the LSB algorithm. LSB (Least Significant Bit) substitution is the process of adjusting the least significant bits of the pixels of the carrier image. It is a simple approach for embedding a message into an image. Least significant bit insertion varies according to the number of bits per pixel in an image. For an 8-bit image, the least significant bit, i.e. the 8th bit of each byte of the image, is changed to a bit of the secret message. For a 24-bit image, the least significant bit of each color component (red, green and blue) is changed. LSB is effective with BMP images because BMP stores pixel data losslessly. However, hiding a secret message inside a BMP image using the LSB algorithm requires a large cover image. LSB substitution is also possible for GIF images, but the problem with GIF is that its pixels are palette indices, so changing the least significant bit of an index can select an entirely different color. The problem can be avoided by using only grayscale GIF images, as a grayscale image contains 256 shades that change gradually, so the change is very hard to detect. For JPEG, direct substitution in the pixel values is not possible because JPEG uses lossy compression. Instead, LSB substitution is typically applied to the quantized transform coefficients of the JPEG image.

There are many approaches for hiding data within an image; one of the simple least significant bit substitution approaches is the "Optimum Pixel Adjustment Procedure" (OPA). The simple OPA algorithm explains the procedure of hiding a sample text in an image.
Step1: A few least significant bits (LSB) are substituted with the data to be hidden.
Step2: The pixels are arranged in a manner of placing the hidden bits before the pixel of each cover image to minimize the errors.
Step3: Let n LSBs be substituted in each pixel.
Step4: Let d = decimal value of the pixel after the substitution, d1 = decimal value of the last n bits of the pixel, and d2 = decimal value of the n bits hidden in that pixel.
Step5: If |d1 - d2| <= (2^n)/2, then no adjustment is made in that pixel. Else
Step6: If (d1 < d2), d = d - 2^n. If (d1 > d2), d = d + 2^n.
This "d" is converted to binary and written back to the pixel.

This method of substitution is simple, makes the data easy to retrieve, gives better image quality and provides enhanced security.

Figure 2: General LSB technique

2. PROPOSED TECHNIQUE

The proposed algorithm is basically an extension of the original LSB technique, which is quite vulnerable. Instead of hiding data only in the least significant bits of the RGB components of a pixel, the data is hidden as follows. Let the data to be hidden be the word "ABC".
ASCII code of A = 65 and the corresponding binary is 01000001.
ASCII code of B = 66 and the corresponding binary is 01000010.
ASCII code of C = 67 and the corresponding binary is 01000011.
The red component of the first pixel is replaced with the binary of 65, i.e. A.
The green component of the second pixel is replaced with the binary of 66, i.e. B.
The blue component of the third pixel is replaced with the binary of 67, i.e. C.
The process continues until all the pixels are exhausted.

The stego image obtained after the algorithm completes is distorted, and it is easy to detect that the image has been altered. So, to enhance the security of the secret message, the resulting stego image is covered with a new cover image; this is the first level of security. By just looking at the resulting image, no one would be able to tell that something is hidden inside it. The new cover image can be the same as or different from the original.
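The component replacement described for the word "ABC" can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name `ComponentEmbedSketch` and the methods `embed` and `extract` are hypothetical, the message is assumed to be ASCII, pixels are visited row by row, and the receiver is assumed to know the message length. Character i overwrites one 8-bit color component of pixel i, cycling through red, green and blue.

```java
import java.awt.image.BufferedImage;
import java.nio.charset.StandardCharsets;

public class ComponentEmbedSketch {

    // Bit offsets of the red, green and blue bytes inside a packed 0xRRGGBB value.
    private static final int[] SHIFTS = {16, 8, 0};

    /** Returns a copy of the carrier in which character i replaces one component of pixel i. */
    public static BufferedImage embed(BufferedImage carrier, String message) {
        byte[] data = message.getBytes(StandardCharsets.US_ASCII);
        int width = carrier.getWidth();
        int height = carrier.getHeight();
        if (data.length > (long) width * height) {
            throw new IllegalArgumentException("message needs more pixels than the carrier has");
        }
        // Work on an opaque RGB copy so the carrier itself is left untouched.
        BufferedImage stego = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                stego.setRGB(x, y, carrier.getRGB(x, y));
            }
        }
        for (int i = 0; i < data.length; i++) {
            int x = i % width;
            int y = i / width;
            int shift = SHIFTS[i % 3];           // red, then green, then blue
            int rgb = stego.getRGB(x, y);
            rgb = (rgb & ~(0xFF << shift))       // clear the chosen component
                | ((data[i] & 0xFF) << shift);   // write the character code into it
            stego.setRGB(x, y, rgb);
        }
        return stego;
    }

    /** Reads back `length` characters using the same pixel and component order. */
    public static String extract(BufferedImage stego, int length) {
        byte[] data = new byte[length];
        int width = stego.getWidth();
        for (int i = 0; i < length; i++) {
            int rgb = stego.getRGB(i % width, i / width);
            data[i] = (byte) ((rgb >> SHIFTS[i % 3]) & 0xFF);
        }
        return new String(data, StandardCharsets.US_ASCII);
    }
}
```

Because a whole byte of one component is overwritten rather than a single bit, the affected pixels can change color visibly, which is consistent with the distortion noted in the text. In this sketch the capacity is one character per pixel.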
In order to increase the storage capacity of the image, a compression algorithm has been used. Each component of an RGB pixel is represented with 8 bits, so the maximum compression is 8 bits per pixel and the minimum is 1 bit per pixel.

The proposed steganography algorithm comprises two techniques: the data hiding technique and the data retrieving technique. The data hiding technique, as the name suggests, is used to hide the secret message and key in the cover image, while the data retrieving technique is used to retrieve the key and the hidden secret message from the stego image. The data is thus protected in the image without being revealed to unauthorized parties.

A. Proposed embedding technique.
Inputs: - Text file, cover image 1, cover image 2 and secret key.
Output: - Stego image.
Begin
1. Select a text file, convert it into binary form and calculate the number of bits in it.
2. Select a carrier image (cover image 1) for hiding purpose, find the number of pixels, convert it into RGB image and call the compression function.
3. If bits calculated are compatible with the image resolution, then
   Start sub iteration 1
      Replace red component of the first pixel with first character.
      Replace green component of the second pixel with second character.
      Replace blue component of the third pixel with third character.
      And repeat iterations until pixels exhaust.
   Stop sub iteration 1
   Else
      Repeat sub iteration 1
      Find necessary compression ratio and perform sub iteration 2.
   Sub iteration 2
      Replace necessary bits as defined by the compression ratio in immediate component of each pixel.
      Store the information about bits embedded in a binary address file.
   Stop sub iteration 2
4. Provide a security key to encrypt the data for better security.
5. Select 2nd cover image to hide the distorted stego image.
End

B. Proposed extraction technique.
Input: - Stego image and secret key.
Output: - Secret text file.
Begin
1. Browse the stego image.
2. Choose the folder in which you want to extract the hidden text file.
3. Provide necessary security key.
4. Convert the binary file into human readable form.
End

The main focus of the proposed steganography technique is to hide text files in images, compress the text files to increase the overall storage capacity, apply a secret key to the resulting stego image, and transfer the secret message without any vulnerability or threat.

Figure 3: General Layout of Proposed System.

The system is able to maintain the accuracy and confidentiality of the data. It hides text files in images using a secret key and is also able to retrieve the data back from the stego image.

V. IMPLEMENTATION OF SYSTEM

The system has been developed in Java. It comprises two main interfaces, one for the embedding process and the other for the extraction process.

Overview of System

The embedding form is shown below:

Figure 4: Embedding form of application

The embedding form comprises three main browsing fields: one for the text file to be embedded, a second for the image in which the file will be embedded, and a third for the cover image that hides the underlying distortion. Note that the cover image may or may not be the same as the one used for the hiding process. After filling in these fields, the next step is to check the encryption checkbox. The user need not worry about the underlying compression procedure, which is performed automatically by the system. The user then provides the secret key twice for verification; various validations are applied here. The secret key is embedded inside the image along with the text file. Once the data has been keyed in and the secret key has been entered, the new stego image can be saved to a different image location.
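The save step can be illustrated with the standard `javax.imageio.ImageIO` API. The sketch below is an assumption about how such a step might look in Java, not the authors' code: the class name `StegoBmpWriter` and the methods `saveAsBmp` and `load` are hypothetical. It writes the stego image as bmp, the output format the application is reported to use, so that the stored pixel values are read back unchanged.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

public class StegoBmpWriter {

    /** Writes the image to `target` in bmp format. */
    public static void saveAsBmp(BufferedImage stego, File target) throws IOException {
        int width = stego.getWidth();
        int height = stego.getHeight();
        // Copy into an opaque RGB image first: the stock BMP writer may
        // reject images that carry an alpha channel.
        BufferedImage rgb = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                rgb.setRGB(x, y, stego.getRGB(x, y));
            }
        }
        // ImageIO.write returns false when no suitable writer is found.
        if (!ImageIO.write(rgb, "bmp", target)) {
            throw new IOException("no BMP writer accepted the image");
        }
    }

    /** Reads an image back; ImageIO.read returns null for unreadable input. */
    public static BufferedImage load(File source) throws IOException {
        BufferedImage image = ImageIO.read(source);
        if (image == null) {
            throw new IOException("could not decode " + source);
        }
        return image;
    }
}
```

Saving to a lossy format such as jpg at this point would alter pixel values and could corrupt the embedded bytes, which is one reason a lossless output format suits this kind of embedding.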
The user can then send the new stego image to other parties via the internet or email without revealing the secret data inside it. To extract the hidden data, the other parties upload the stego image into the system and provide the secret key to retrieve the text file hidden inside the image. The extraction form is shown below:

Figure 5: Extraction form of Application

VI. RESULTS

The system was tested using the images shown in Figures 6-8.

Example 1

Figure 6a: Original image (.jpg)
Figure 6b: Cover image (.jpg)
Figure 6c: Stego image (.jpg.bmp)

Figure 6a shows the original image before the message is stored in it. Figure 6b shows the cover image. Note that the original image and the cover image are exactly the same and both have the extension .jpg. The resulting stego image has a double extension of .jpg.bmp. The stego image shown in Figure 6c has no changes noticeable to the naked eye.

Example 2

Figure 7a: Original image (.jpg)
Figure 7b: Cover image (.jpg)
Figure 7c: Stego image (.jpg.bmp)

In this example the original image and the cover image, Figures 7a and 7b respectively, are exactly the same and both have the extension .jpg. The resulting stego image shown in Figure 7c has no noticeable changes and has the extension .jpg.bmp.

Example 3

Figure 8a: Original image (.jpg)
Figure 8b: Cover image (.jpg)
Figure 8c: Stego image (.jpg.bmp)

In this example the original image and the cover image, Figures 8a and 8b respectively, are exactly the same and both have the extension .jpg. The resulting stego image shown in Figure 8c has no noticeable changes and has the extension .jpg.bmp.
The data is embedded inside the original image using the proposed algorithm, but the images obtained after the embedding process are distorted; to overcome this limitation, the distorted image is covered using a cover image. Using the proposed algorithm, testing was done on images of several sizes to see the various sizes of data that can be stored in the image.

Table 1: Comparison of different file sizes in jpg format.

Sr. no.  Original image size (.jpg)  Text file size  Cover image size (.jpg)  Stego image size (.jpg.bmp)  Embedding  Extraction
1        289 KB                      216 KB          289 KB                   289 KB                       Done       Done
2        201 KB                      345 KB          201 KB                   201 KB                       Done       Done
3        172 KB                      556 KB          172 KB                   172 KB                       Done       Done
4        151 KB                      818 KB          151 KB                   151 KB                       Done       Done
5        134 KB                      1.10 MB         134 KB                   134 KB                       Done       Done
6        126 KB                      1.50 MB         126 KB                   126 KB                       Done       Done
7        114 KB                      1.71 MB         114 KB                   114 KB                       Done       Done
8        73 KB                       1.30 MB         73 KB                    73 KB                        Done       Done
9        38 KB                       1.10 MB         38 KB                    38 KB                        Done       Done
10       27 KB                       1.50 MB         27 KB                    27 KB                        Done       Done

After embedding the text file, the developed application automatically converts the image into bmp format.

Table 2: Text file formats supported by our system.

Text file formats  Embedding  Extraction
.txt               Done       Done
.docx              Done       Done
.pdf               Done       Done
.ppt               Done       Done
.cpp               Done       Done

VII. CONCLUSION AND SUGGESTIONS

The analysis and development of a steganography application with a compression capability, which was not present in earlier steganography applications, shows that the designed application works well even with large documents. The application can be used for hiding a large document in a small image by increasing the maximum number of bits hidden per pixel. It has been concluded that the developed application can hide up to eight bits per pixel through its compression technique. Even after the compression is applied, the generated stego image is free from any visible changes.
Steganography will continue to increase in popularity over cryptography as it becomes more advanced, as will the steganalysis tools for detecting it. At present, most of the tools can detect files hidden in an image, but short sentences and one-word answers like 'yes' are virtually impossible to find. There also seem to be very few tools for hiding data in videos. Some are available for audio, but this is still an area that lags behind image steganography. The future may see audio files and video streams that can be decoded on the fly to reveal their messages.

REFERENCES

[1] James C. Judge, "Steganography: Past, Present, Future", GSEC Version 1.2f, SANS Institute, 2001.
[2] M.M. Amin, M. Salleh, S. Ibrahim, M.R. Katmin, "Information Hiding Using Steganography", 4th National Conference on Telecommunication Technology Proceeding 2003 (NCTT2003), Concorde Hotel, Shah Alam, Selangor, 14-15 January 2003, pp. 21-25.
[3] T. Morkel, J.H.P. Eloff and M.S. Olivier, "An Overview of Image Steganography", in Proceedings of the Fifth Annual Information Security South Africa Conference (ISSA2005), Sandton, South Africa, June/July 2005 (published electronically).
[4] Shawn D. Dickman, "An Overview of Steganography", James Madison University Infosec Tech Report, July 2007, JMUINFOSEC-TR-2007-002.
[5] Namita Tiwari, Dr. Madhu Shandilya, "Evaluation of Various LSB based methods of Image Steganography on GIF File Format", International Journal of Computer Applications (0975-8887), Volume 6, No. 2, September 2010.
[6] Yongzhen Zheng, Fenlin Liu, Xiangyang Luo, Chunfang Yang, "Identification of Steganography Software based on Core Instructions Template Matching", in Multimedia Information Networking and Security (MINES), 2012 Fourth International Conference on, Shanghai, 4-6 Nov. 2011.
[7] Dipesh Agrawal and Samidha Diwedi, "Analysis of random bit image steganography techniques", IJCA Proceedings on International Conference on Recent Trends in Engineering & Technology 2013 (ICRTET), New York, USA, 1-4, May 2013.
[8] Saddaf Rubab, Dr. M. Younus, "Improved Image Steganography Technique for Colored Images using Huffman Encoding with Symlet Wavelets", IJCSI International Journal of Computer Science Issues, Vol. 9, Issue 2, No. 1, March 2012.
[9] Shamim Ahmed Laskar and Kattamanchi Hemachandran, "High Capacity data hiding using LSB Steganography and Encryption", International Journal of Database Management Systems (IJDMS), Vol. 4, No. 6, December 2012.
[10] Hemalatha S, U Dinesh Acharya, Renuka A and Priya R. Kamath, "A Secure and High Capacity Image Steganography Technique", Signal & Image Processing: An International Journal (SIPIJ), Vol. 4, No. 1, February 2013.
[11] Elham Ghasemi, Jamshid Shanbehzadeh and Nima Fassihi, "High Capacity Image Steganography Based on Genetic Algorithm and Wavelet Transform".
[12] Rahul Jain and Naresh Kumar, "Efficient data hiding scheme using lossless data compression and image steganography", International Journal of Engineering Science and Technology (IJEST).
[13] Prashant Dahake, "An efficient encryption using data compression towards steganography".