DNA encryption algorithm based on Huffman coding

Mustapha Meftah *, Adda Ali Pacha †, Naïma Hadj-Said §
Coding and Information Security Laboratory, University of Science and Technology Mohamed Boudiaf, Oran, Algeria

Journal of Discrete Mathematical Sciences & Cryptography, ISSN 0972-0529 (Print), ISSN 2169-0065 (Online)
DOI: 10.1080/09720529.2020.1818450
Journal homepage: https://www.tandfonline.com/loi/tdmc20
Published online: 27 Dec 2020.

To cite this article: Mustapha Meftah, Adda Ali Pacha & Naïma Hadj-Said (2020): DNA encryption algorithm based on Huffman coding, Journal of Discrete Mathematical Sciences and Cryptography, DOI: 10.1080/09720529.2020.1818450
To link to this article: https://doi.org/10.1080/09720529.2020.1818450

Abstract

Today, transmitting highly sensitive data over public communication channels exposes it to unauthorized recipients, which makes information security very important. The basic idea of this work is to exploit the robustness of genetic material in order to improve on the performance of conventional algorithms. In this paper, the Huffman coding method is adopted to develop a new and efficient symmetric DNA encryption algorithm. First, the algorithm encodes the secondary DNA key, which is extracted from the main DNA key, using Huffman coding. Then an XOR is applied between the coded DNA sequence and the plain-image. Finally, in order to strengthen the algorithm, diffusion is performed with a permutation box.

Subject Classification: 11K45, 68P25, 81P94, 94A60, 94A62.
Keywords: DNA, Cryptography, Huffman, Algorithm, Encryption, Decryption, Coding.

* E-mail: meftah_m@hotmail.com (Corresponding Author)
† E-mail: a.alipacha@gmail.com
§ E-mail: naima.hadjsaid@univ-usto.dz

1. Introduction

The security of traditional secret-key and public-key cryptography rests on the keys. The keys used are so large that even a multitude of powerful machines computing at the same time would still take many years to recover a key. This is not a problem today, but it soon will be, given the growth in computing power.

A new approach to data security, inspired by DNA, has emerged from DNA computing. DNA computing was invented by Leonard Max Adleman in 1994 to solve some known NP-complete problems such as the traveling salesman problem and the Hamiltonian path problem [1]. The concept of using DNA in cryptography has given new hope for unbreakable algorithms [2].

Our proposed encryption algorithm is based on a Huffman coding of DNA nucleotide bases. The characteristics of Huffman coding are as follows:
• Variable-length coding
• Instant decoding
• Unique decoding
• Reversible decoding
• Compact

It is widely used in data compression during source coding. Aruna Malik, Geeta Sikka & Harsh K. Verma proposed a steganography method based on Huffman compression and color coding [3]. In addition, in order to strengthen our algorithm, we used a spiral rotation permutation box. In this context, S. Sanal Kumar & S. Anfino Sherfin presented an encryption algorithm based on a spiral rotation [4].

Related works

In recent years, as DNA encryption has been cited as one of the safest methods of data representation, extensive research has been devoted to developing novel algorithms to ensure data security. One of the suggested algorithms uses a bi-serial DNA encryption method in which the plaintext is converted to hexadecimal code and then into binary code.
This message is divided into two parts; one is used as a key and the other as the message. The XOR operation is also performed in order to increase the compression factor. A DNA-encoded message is obtained after applying the digital DNA coding; PCR amplification is then implemented using two primer pairs as the key, and the compression is performed for a variable data length [5].

Another algorithm, proposed by Shreyas Chavan, encrypts plaintext using a DNA-based method and a One Time Pad (OTP). This algorithm uses two keys. The first key is a random string of nucleotide bases used as the DNA sequence; its length is the same as that of the plaintext. The second key is a binary sequence used as the OTP; its length is twice that of the DNA sequence key [6].

Varma and Raju carried out an analysis of different approaches to DNA encryption based on the matrix and the secure key generation scheme [7]. Kang Ning, on the other hand, proposed securing data with a method called pseudo DNA cryptography, which converts the text into protein according to the table of genetic codes [8].

Kritika Gupta and Shailendra Singh proposed an algorithm that converts the plaintext into its ASCII code and then into binary form. These binary values are then encoded as DNA sequences. After that, a DNA sequence is chosen as a key and grouped into blocks of 8 nucleotide bases. Depending on the positions of the characters in the key, a table is created and, using this table and the key, the data produced is converted into an encrypted form [9].

Recently, Al-Wattar et al. proposed different key-dependent DNA cryptography methods for the MixColumns and ShiftRows transformations, similar to those implemented in the AES algorithm. This increases resistance to attacks [10] [11].

On the basis of these findings, we propose a symmetric encryption algorithm based on DNA.

The proposed algorithm

a.
Encryption process

Both parties must first share a DNA strand of their choice (the main key). The secondary key is represented by three parameters: Start_Pos, Nbr_Bases and a number of rounds NR1.

Phase 1 : Coding of the encryption key

Let M be a plain-image of length LM. The encryption key is coded as follows. We extract from the main key a DNA sequence that starts at position Start_Pos and has a length of Nbr_Bases. Next, we calculate the probability of appearance of each short sequence of four (04) nucleotide bases in this extract, in order to obtain the Huffman code of each quadruple. We then position ourselves in the main key at Start_Pos and code the successive short sequences of four bases with the Huffman codes obtained, until we have a binary sequence of the same length as the image (LM). After coding, we obtain a binary key sequence S of length LS = LM.

Phase 2 : XOR

In this step, we XOR the sequence S with the plain-image M in its binary form; both have the same length. The result is a binary sequence that represents an encrypted-image C1.

Phase 3 : Diffusion

Reminder: A group is said to be monogenic when it is generated by one of its elements:

$$\exists x \in G : G = \langle x \rangle = \{x^m,\ m \in \mathbb{Z}\} \qquad (1)$$

Furthermore, if x is of finite order n ≥ 1, we say that G is a cyclic group of order n and we have [12]:

$$G = \{e, x, x^2, x^3, \ldots, x^{n-1}\} \qquad (2)$$

Permutation in a set: Let n ∈ N*. We call a permutation of {1, 2, 3, …, n} any bijection of {1, 2, 3, …, n} onto itself. The set of all these permutations is denoted Sn [13].

Notation:

$$\forall \delta \in S_n : \quad \delta = \begin{pmatrix} 1 & 2 & 3 & \cdots & n \\ \delta(1) & \delta(2) & \delta(3) & \cdots & \delta(n) \end{pmatrix} \qquad (3)$$

In order to strengthen the encryption, we diffuse the message obtained using a permutation box, as follows. The encrypted sequence C1 is divided into blocks of 16 bits, each of which constitutes an entry of the permutation box.
Based on 16 bits, including 8 even positions (2, 4, 6, 8, 10, 12, 14, 16) and 8 odd positions (1, 3, 5, 7, 9, 11, 13, 15), our permutation function is defined in the following way: we first take the even positions in decreasing order, then the odd positions in increasing order (Figure 1).

Position :   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16
δ        :  16  14  12  10   8   6   4   2   1   3   5   7   9  11  13  15
            (even bits descending)          (odd bits ascending)

Figure 1 Permutation box

In our case, $G = \{\delta^0, \delta^1, \delta^2, \delta^3, \ldots, \delta^{n-1}\}$ is a cyclic group of order n such that $\delta^n$ is the identity permutation:

$$\delta^0 = \{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16\} \qquad (4a)$$

$$\delta^p = \delta \cdot \delta \cdots \delta \quad (p \text{ times}) \qquad (4b)$$

$$\delta^n = \delta^0 \qquad (4c)$$

After simulation over Sn (the entire set of permutations), we obtained n = 240, which implies that G is a group of order 240:

$$\delta^{240} = \delta^0 \qquad (5)$$

The number of rounds NR is obtained from the secondary key NR1 combined with the result of phase 2, as follows. Let S1 and S2 be the sums of the values of the odd and even pixels, respectively, of the image C1:

$$NR = [(NR1 + |S1 - S2|) \bmod 239] + 1 \qquad (6)$$

so that NR ∈ [1..239] and, for every NR, $\delta^{NR} \neq \delta^0$.

We opted for this combination in order to obtain an avalanche effect: a small modification of the plain-image changes |S1 – S2|, and hence can change NR and the encrypted-image obtained.

After the 1st round, we shift one bit to the left; after the 2nd round, two bits to the left; after the 3rd round, three bits to the left; and so on until the NR rounds are completed. We then obtain a binary sequence that represents the encrypted-image C. The whole encryption process is represented in fig. 2.

Figure 2 Diagram of encryption algorithm

b. Decryption process

Since our approach is a secret-key algorithm, the decryption process is the reverse of the encryption process.
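The building blocks of the three encryption phases described above can be sketched in Python as follows. This is only a minimal sketch under stated assumptions, not the authors' Matlab implementation. The function names (`huffman_codes`, `key_stream`, `xor_bits`, `permute_block`, `unpermute_block`, `round_count`) and the toy key are hypothetical. The paper does not specify whether the four-base words overlap, how ties are broken in the Huffman construction, what happens when the extract runs out, in which direction Figure 1 is read, or whether "odd" and "even" pixels refer to positions. The sketch assumes non-overlapping words, re-reading of the extract, output position i taking the input bit at position δ(i), and 1-based pixel positions. The per-round left shifts are left out.

```python
import heapq
from collections import Counter
from itertools import count

# Figure 1: even positions descending, then odd positions ascending (1-based).
BOX = [16, 14, 12, 10, 8, 6, 4, 2, 1, 3, 5, 7, 9, 11, 13, 15]


def huffman_codes(symbols):
    """Return a Huffman code {symbol: bit string} for a list of symbols."""
    freq = Counter(symbols)
    if len(freq) == 1:  # a single distinct symbol still needs one bit
        return {next(iter(freq)): "0"}
    tie = count()  # unique tie-breaker, so the heap never compares dicts
    heap = [(f, next(tie), {s: ""}) for s, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f0, _, c0 = heapq.heappop(heap)  # the two least frequent subtrees
        f1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + b for s, b in c0.items()}
        merged.update({s: "1" + b for s, b in c1.items()})
        heapq.heappush(heap, (f0 + f1, next(tie), merged))
    return heap[0][2]


def key_stream(dna, start_pos, nbr_bases, n_bits):
    """Phase 1: code 4-base words of the extract until n_bits bits exist.

    Assumptions: start_pos is 0-based, words do not overlap, nbr_bases >= 4,
    and the extract is re-read from its start if it runs out.
    """
    seg = dna[start_pos:start_pos + nbr_bases]
    words = [seg[i:i + 4] for i in range(0, len(seg) - 3, 4)]
    codes = huffman_codes(words)
    bits, total, i = [], 0, 0
    while total < n_bits:
        code = codes[words[i % len(words)]]
        bits.append(code)
        total += len(code)
        i += 1
    return "".join(bits)[:n_bits]


def xor_bits(a, b):
    """Phase 2: bitwise XOR of two equal-length bit strings."""
    return "".join("1" if x != y else "0" for x, y in zip(a, b))


def permute_block(block):
    """Phase 3: one pass of a 16-bit block through the permutation box."""
    return "".join(block[p - 1] for p in BOX)


def unpermute_block(block):
    """Inverse of permute_block, as needed for decryption."""
    out = [""] * 16
    for i, p in enumerate(BOX):
        out[p - 1] = block[i]
    return "".join(out)


def round_count(nr1, pixels):
    """Equation (6); 'odd'/'even' pixels are taken by 1-based position."""
    s1 = sum(pixels[0::2])
    s2 = sum(pixels[1::2])
    return (nr1 + abs(s1 - s2)) % 239 + 1


if __name__ == "__main__":
    toy_key = "ACGTACGTTTGACCGTACGTGGCA" * 4  # hypothetical main key
    plain = "1011001110001111" * 2             # hypothetical 32-bit image
    s = key_stream(toy_key, 0, len(toy_key), len(plain))
    c1 = xor_bits(plain, s)
    blocks = [permute_block(c1[i:i + 16]) for i in range(0, len(c1), 16)]
    print("C1 after one box pass:", "".join(blocks))
    print("NR:", round_count(8, [10, 3, 7, 1]))
```

Under the same assumptions, decryption would apply `unpermute_block` to each block and then `xor_bits` with the same key stream, since XOR with S is its own inverse.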
Figure 3 Diagram of decryption algorithm

Validation of results

We downloaded ‘Zea mays cultivar B73 chromosome 2’ from NCBI (https://www.ncbi.nlm.nih.gov) [14] and used it as our main key. The main key contains 244442276 nucleotide bases. The secondary key is:
Start_Pos = 1000000
Nbr_Bases = 100000
Number of rounds NR1 = 8.
The image used is ‘Cameraman.tif’ of size 256x256. The algorithm was implemented with Matlab R2015a, and the following results were obtained.

Test 1 : Key size

Main key: 244442276 nucleotide bases.
Secondary keys:
Start_Pos = [1 .. 244442276] (244 * 10^6) ≈ 2^28.
Nbr_Bases = [1 .. 244442276] (244 * 10^6) ≈ 2^28.
Number of rounds NR1 = [1..240] (2^8).
Key size = 28 + 28 + 8 = 64 bits

Test 2 : Histogram of Image

The histogram can be seen as a probability of appearance of each pixel value. It is resistant to many image transformations, such as rotations, translations, changes of point of view and changes of scale. Referring to the results obtained (fig. 4 & fig. 5), we can see that the plain-image histogram differs substantially from that of the encrypted-image. Moreover, the histogram of the encrypted-image is uniform, which makes it difficult to extract the values for statistical purposes.

Fig. 4 Plain-image “cameraman.tif” and its histogram

Fig. 5 Encrypted-image “cameraman.tif” and its histogram

Test 3 : Entropy

Shannon’s entropy is a function that corresponds to the amount of information provided by an information source. In the case of a source X providing a random variable with n characters, each character xi has a probability of occurrence Pi.
The entropy H of source X is defined as follows:

$$H(X) = -\sum_{i=1}^{n} P_i \log_2(P_i) \qquad (7a)$$

$$P_i = \frac{k_i}{n} \qquad (7b)$$

In our case:
• i ∈ [0..255] (the gray levels over which the sum in (7a) runs)
• n = 256 * 256 = 65536 (the number of pixels in (7b))
• ki : the frequency of each value i

If a source provides 256 symbols and these symbols are equiprobable, the entropy of each symbol is log2(256) = log2(2^8) = 8 bits; this means that 8 bits are needed to transmit a symbol. The ideal entropy of the encrypted-image must have a value close to the entropy of a source providing random variables. In our approach, we obtained the following results:
• eM = 7.0097 : the entropy of the plain-image ‘cameraman.tif’.
• eC = 7.9957 : the entropy of the encrypted-image.

Test 4 : Correlation between the adjacent pixels

In statistics and probability, studying the correlation between two variables means evaluating the strength of the relationship that may exist between them. The correlation coefficient lies between –1 and 1 and represents the degree of linear dependence between the two variables. The closer the coefficient is to –1 or 1, the stronger the correlation between the variables. A zero correlation coefficient means that there is no linear correlation between adjacent pixels. In our case, we selected 2000 pixels at random and studied the correlation between adjacent pixels.

Figure 6 Correlation between the adjacent pixels horizontally of the plain-image and the encrypted-image.

Figure 7 Correlation between the adjacent pixels vertically of the plain-image and the encrypted-image.

Figure 8 Correlation between the adjacent pixels diagonally of the plain-image and the encrypted-image.

Table 1 Correlation coefficient (image: Cameraman.tif)

Direction          Horizontal   Vertical   Diagonal
Plain-image          0.9505      0.9628     0.9098
Encrypted-image     -0.0132      0.0027    -0.0083

Fig. 6, Fig. 7 and Fig.
8 summarize the correlation between the adjacent pixels of the plain-image and the corresponding encrypted-image. After calculating the correlation coefficient (Table 1), we see that the adjacent pixels in the plain-image have a strong correlation (coeff. ≈ 1), whereas in the encrypted-image there is a very weak correlation (coeff. ≈ 0). This weak correlation between neighboring pixels in the encrypted-image makes our cryptosystem resistant to statistical attack.

Test 5 : Differential attack

To carry out a differential attack, the attacker makes a tiny modification, such as modifying only one pixel in the plain-image, and then observes the modifications made to the encrypted result. In this way, the attacker may discover a meaningful relationship between the original image and the encrypted one. So, when a minor change in the plain-image causes a significant change in the encrypted-image, the differential attack is ineffective.

In our approach, to observe the effect of a one-pixel change on the encrypted-image, two measurements were used: (1) the number of pixels change rate (NPCR) and (2) the unified average change intensity (UACI). For this purpose, we consider two encrypted-images c1 and c2, corresponding to the plain-images m1 and m2, where m1 and m2 differ in only one pixel. Denote the pixel values at position (i, j) of c1 and c2 by c1(i, j) and c2(i, j) respectively, and define a two-dimensional array D having the same size as the encrypted-image [15]:

$$D(i, j) = 0 \quad \text{if } c1(i, j) = c2(i, j) \qquad (8a)$$

$$D(i, j) = 1 \quad \text{if } c1(i, j) \neq c2(i, j) \qquad (8b)$$

$$\text{NPCR}: \quad N(c1, c2) = \sum_{i,j} \frac{D(i, j)}{W \cdot H} \cdot 100\% \qquad (9)$$

where W and H are the width and height of the encrypted-image c1 or c2. NPCR measures the rate of pixel difference between the two images. The NPCR value lies between 0 and 1 (that is, between 0% and 100%). When NPCR = 0, c1 and c2 are exactly the same.
When NPCR = 1 (100%), all the pixels in c2 differ from those in c1. In that case, it is very difficult to establish a relationship between the two encrypted images c1 and c2 [16].

The UACI, which measures the average difference in intensity between the two encrypted images, is defined as follows:

$$\text{UACI}: \quad U(c1, c2) = \sum_{i,j} \frac{|c1(i, j) - c2(i, j)|}{255 \cdot W \cdot H} \cdot 100\% \qquad (10)$$

We tested the influence of a one-pixel change on the image ‘cameraman.tif’, of size 256 * 256 with 256 gray levels. We changed the value of the pixel at position (150,150) in the plain-image “cameraman.tif” from 143 to 144, which gave two encrypted images c1 and c2 for the respective values. We obtained the following results:
• NPCR = 99.6032715 %
• UACI = 33.6073513 %

It is reported in the literature that the ideal expected values of NPCR and UACI for a grayscale image are 99.6094% and 33.4635% respectively [17]. The values of NPCR and UACI obtained with our algorithm are 99.6032% and 33.6073% respectively. This means that any minor change in the plain-image causes an obvious change in the encrypted-image. This is called ‘the avalanche effect’. It means that our algorithm is highly sensitive to the plaintext.

Conclusion

This study reports a simple, straightforward and efficient symmetric DNA block cipher encryption algorithm built on three essential points: a variable-length Huffman coding of the nucleotide bases; a logical XOR between the coded sequence obtained and the plain-image; and a diffusion of the result using a permutation box, with a number of rounds obtained from a sub-key combined with a value obtained from the previous step. Moreover, the robustness of our cryptosystem is based on:
a. The coding of the nucleotide bases is not fixed in advance.
b.
The coding of the nucleotide bases is of variable length.
c. The impossibility of establishing a relationship between the encrypted images.

Finally, future research is needed to further improve the efficiency and performance of DNA encryption to ensure better security of the encrypted result.

References

[1] Leonard Adleman, “Molecular Computation of Solutions to Combinatorial Problems,” Science, 266:1021-1024, November 1994.
[2] T. Mandge and V. Choudhary. A review on emerging cryptography technique: DNA cryptography. International Journal of Computer Applications (IJCA), Vol. 13, pp. 9-13,
[3] Aruna Malik, Geeta Sikka & Harsh K. Verma (2017) A high capacity text steganography scheme based on huffman compression and color coding, Journal of Information and Optimization Sciences, 38:5, 647-664, DOI: 10.1080/02522667.2016.1197572
[4] S. Sanal Kumar & S. Anfino Sherfin (2019) A cryptographic encryption technique byte – Spiral rotation encryption algorithm, Journal of Discrete Mathematical Sciences and Cryptography, 22:3, 371-376, DOI: 10.1080/09720529.2019.1578083
[5] D. Prabhu, M. Adimoolam, “Bi-serial DNA Encryption Algorithm” [Online]. https://pdfs.semanticscholar.org/1754/f0eb5852500598a70af4002e186cd2f3c6ce.pdf
[6] Shreyas Chavan, “DNA Cryptography Based on DNA Hybridization and One Time pad scheme”, International Journal of Engineering Research & Technology, Volume 2 Issue 10, October-2013.
[7] P. S. Varma, K. G. Raju. Cryptography based on DNA using random key generation scheme. International Journal of Science Engineering and Advance Technology (IJSEAT), Vol. 2, Issue 7, pp. 168-175, July, 2014.
[8] Kang Ning, “A Pseudo DNA Cryptography Method”, arXiv: 0903.2693 [cs.CR], Cornell University Library, March-2009.
[9] Kritika Gupta, Shailendra Singh, “DNA Based Cryptographic Techniques: A Review”, International Journal of Advanced Research in Computer Science and Software Engineering, Volume 3 Issue 3, March 2013.
[10] A.
H. Al-Wattar, R. Mahmod, Z. A. Zukarnain, and N. Udzir. A new DNA based approach of generating key dependent MixColumns transformation. International Journal of Computer Networks & Communications (IJCNC), Vol. 7, No. 2, pp. 93-102, March 2015.
[11] A. Al-Wattar, R. Mahmod, Z. Zukarnain, and N. Udzir. A new DNA based approach of generating key dependent ShiftRows transformation. International Journal of Network Security and Its Applications (IJNSA), Vol. 7, No. 1, January 2015.
[12] Alain Jeanneret and Daniel Lines. ‘Invitation à l’algèbre’, CÉPADUÈS-ÉDITIONS, pp 36, 2008.
[13] Jean Delcourt. ‘Théorie des groupes’, DUNOD, 2ème édition, pp 46.
[14] Zea mays cultivar B73 chromosome 2, whole genome shotgun sequence. https://www.ncbi.nlm.nih.gov/nuccore/CM007648.1
[15] G. Chen, Y. Mao, and C. Chui, ‘A symmetric image encryption scheme based on 3D chaotic cat maps’, Chaos, Solitons and Fractals, vol. 21, pp. 749-761, 2004.
[16] Y. Mao, G. Chen, S. Lian, ‘A novel fast image encryption scheme based on 3D chaotic baker map’, International Journal of Bifurcation and Chaos, vol. 14, No. 10, pp 3613-3624, 2004.
[17] Deng X.H., Zhu C.X. Image encryption algorithms based on chaos through dual scrambling of pixel position and bit. J. Commun. 2014; 35(3): 216–223.

Received February, 2020
Revised June, 2020
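The statistical measures used in Tests 3 to 5 (entropy, equations (7a) and (7b); correlation; NPCR, equation (9); UACI, equation (10)) can be sketched in Python as follows. This is a minimal sketch, not the authors' Matlab code. It assumes each image is given as a flat list of integer gray levels in [0, 255], and that the correlation coefficient of Test 4 is the Pearson coefficient; the function names are hypothetical.

```python
import math
from collections import Counter


def entropy(pixels):
    """Shannon entropy in bits per pixel, equations (7a)-(7b)."""
    n = len(pixels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(pixels).values())


def npcr(c1, c2):
    """Equation (9): percentage of positions where c1 and c2 differ."""
    return 100.0 * sum(a != b for a, b in zip(c1, c2)) / len(c1)


def uaci(c1, c2):
    """Equation (10): mean absolute difference as a percentage of 255."""
    return 100.0 * sum(abs(a - b) for a, b in zip(c1, c2)) / (255 * len(c1))


def correlation(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


if __name__ == "__main__":
    flat = list(range(256)) * 4          # every gray level equally frequent
    print("entropy:", entropy(flat))
    print("NPCR:", npcr([0, 1, 2, 3], [0, 1, 2, 4]))
    print("UACI:", uaci([0, 0], [255, 0]))
```

For two independent, uniformly random 8-bit images, the expected values are 255/256 ≈ 99.6094% for NPCR and 257/768 ≈ 33.4635% for UACI, which agrees with the ideal values quoted in Test 5.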