Kalpana Coaching Classess BE-SEM-VII-EXTC-DCE-Notes by Rohit Sinha Ph. Dadar-24330916 Thane-25440393 For private circulation only

Syllabus

Chapter 1: Data compression and encryption
Need for data compression, lossy/lossless compression, compression ratio, run length encoding (RLE) for text and image compression, relative encoding and its applications in facsimile data compression and telemetry, scalar quantization.

Chapter 2: Statistical methods
Statistical modeling of information source, coding redundancy, variable size codes, prefix codes, Shannon-Fano coding, Huffman coding, adaptive Huffman coding, arithmetic coding and text compression using PPM method.

Chapter 3: Dictionary methods
String compression, sliding window compression, LZ77, LZ78 and LZW algorithms and applications in text compression, Zip and Gzip, ARC and cyclic redundancy code.

Chapter 4: Image compression
Lossless techniques of image compression, gray codes, two dimensional image transforms, discrete cosine transform and its applications in lossy image compression, quantization, zig-zag coding sequences, JPEG and JPEG-LS compression standards, pulse code modulation and differential pulse code modulation methods of image compression, video compression and MPEG industry standard.

Chapter 5: Audio compression
Digital audio, lossy sound compression, μ-law and A-law companding, DPCM and ADPCM audio compression, MPEG audio compression, frequency domain coding, format of compressed data.

Chapter 6: Conventional encryption
Security of information, security attacks, classical techniques, Caesar cipher, block cipher principle, design and modes of operation, S-box design, triple DES with two and three keys, introduction to international data encryption algorithm.

Chapter 7: Number theory and public encryption
Modular arithmetic, Fermat’s and Euler’s theorems, Chinese remainder theorem, discrete logarithm, principles of public key cryptosystems, RSA algorithm, key management, Diffie-Hellman key exchange, elliptic curve cryptography.

Chapter 8: Message authentication
Authentication requirements and functions, message authentication functions (MAC), hash functions and their security, hash and MAC algorithms, digital signatures and authentication protocols, digital signature standard and algorithms.

CHAPTER 1: CONVENTIONAL ENCRYPTION

1.1 Cryptography and related terms

Cryptography: Cryptography is the practice of storing and communicating data in a form that only the intended recipient can read and process. Its basic purpose is to protect information from unauthorized individuals who may exploit it for their own benefit and cause loss to the organization. In cryptography the data to be transmitted is encoded into an unreadable format using certain algorithms, so that it cannot be used or modified to produce unauthorized effects.

Practical goal of cryptography: In practice, most cryptographic algorithms can be broken if the attacker has enough time and resources. The more realistic goal of cryptography is therefore to make obtaining the information work-intensive for the attacker. In other words, the encryption algorithm should be strong enough that the time and resources the attacker spends on breaking the code exceed the actual value of the information. An encryption algorithm is also considered secure if the time the attacker needs to break the code exceeds the useful lifetime of the information.

The following figure shows the basic encryption procedure:

The sender generates the message containing the information to be communicated. This message is in plain text and therefore should not be transmitted over an insecure channel. Hence the message is encrypted using the encryption algorithm to generate cipher text. The encryption algorithm uses a secret key, known only to the sender and the intended receiver, to generate the cipher text. The cipher text can be interpreted only by those who know how it was encrypted, i.e. who have the decryption algorithm and the secret key. The intended receiver decrypts the message by running the decryption algorithm and obtains a readable copy of the message.

Plain text: original message to be transmitted.
Cipher text: encrypted message.
Cipher: algorithm used to convert plain text to cipher text.
Key: secret data used by the sender and the receiver for encryption and decryption.
Cryptography: study of encryption and decryption techniques.
Cryptanalysis: practice of decoding an encrypted message without knowledge of the key.
Cryptology: study of both cryptography and cryptanalysis.
Encipher: to encrypt.
Decipher: to decrypt.

1.2 Information security

There are three aspects of information security:
- Security service
- Security mechanism
- Security attack

Security service: A security service is something that enhances the security of the data processing systems and information transfers of an organization. It is intended to counter security attacks and makes use of one or more security mechanisms to do so. The security services defined by ITU-T Recommendation X.800 are:

1. Authentication: Authentication refers to the authenticity of the contents of the messages being exchanged as well as that of the communicating entities.
2. Access control: Access control is the ability to limit and control access to host systems and applications via communication links. To achieve this, each entity trying to gain access must first be identified, or authenticated, so that access rights can be tailored to the individual.
3. Data confidentiality: The contents of a message transferred across an insecure medium should be readable only by those for whom it is intended and by no other entity.
4. Data integrity: The contents of the message should not be modified during transit, and if the message is modified, the modification should be detected at the receiving end.
5. Non-repudiation: Repudiation disputes arise when one entity denies having sent or received a message. The security service should provide means to resolve such disputes.

Security mechanism: A security mechanism is a mechanism designed to detect, prevent or recover from a security attack. No single mechanism supports all the functions required to provide complete security, and therefore many mechanisms work together.

Security attack: A security attack is any action that compromises the security of information owned by an organization. It is an assault on the system derived from a threat. The following figures show different types of security attacks:

Security threat: A threat is a potential for violation of security, which exists when there is a circumstance, capability, action or event that could breach security. In simple words, a threat is a possible danger that might exploit a vulnerability of the system.

There are two types of security attacks:
- Passive attacks
- Active attacks

Passive attacks: In a passive attack the attacker monitors the transmissions to obtain message contents, or monitors traffic flows, but does not modify the messages.

Active attacks: In an active attack the attacker acquires the message and modifies its contents to produce unauthorized effects.

Types of active attacks:
- Modification of messages in transit: In this type of attack a part of the message is altered, or the message is delayed, to produce an unauthorized effect.
- Masquerade: In a masquerade one entity pretends to be another entity to produce an unauthorized effect.
- Replay: In a replay attack a message sequence is captured and then retransmitted to produce an unauthorized effect.
- Denial of service: A denial of service attack prevents or inhibits the normal use and management of communication facilities.

1.3 Classifications of cryptographic systems

1) Classification based on the type of operations used for transforming plain text into cipher text:
- Substitution cipher: In a substitution cipher each element of the plain text is mapped into (replaced by) another element to generate the cipher text.
- Transposition cipher: In a transposition cipher the elements of the plain text are rearranged to generate the cipher text.
- Product systems: Product systems involve multiple stages of substitution and transposition.

2) Classification based on the number of keys used:
- Symmetric, single key, secret key or conventional encryption: Both the sender and the receiver use the same single key. The key is used for both encryption and decryption.
- Asymmetric, two key or public key encryption: The sender and the receiver use different keys.

3) Classification based on the manner in which the plain text is processed:
- Block cipher: A block cipher processes the input one block at a time, producing an output block for each input block.
- Stream cipher: A stream cipher processes the input elements continuously, producing output one element at a time as it goes along.
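
To make the first classification concrete, the following Python sketch contrasts a substitution cipher with a transposition cipher. The Caesar cipher (listed in the syllabus) stands in for substitution, and a simple columnar arrangement stands in for transposition. The function names, the shift of 3 and the column count of 3 are illustrative assumptions rather than part of the notes, and neither cipher offers real security.

```python
def caesar_encrypt(text, shift):
    """Substitution: replace each letter by the letter `shift` places later."""
    out = []
    for ch in text:
        if ch.isascii() and ch.isalpha():
            base = ord("A") if ch.isupper() else ord("a")
            out.append(chr((ord(ch) - base + shift) % 26 + base))
        else:
            out.append(ch)  # leave digits, spaces and punctuation unchanged
    return "".join(out)


def caesar_decrypt(text, shift):
    """Undo the substitution by shifting in the opposite direction."""
    return caesar_encrypt(text, -shift)


def transpose_encrypt(text, cols):
    """Transposition: write the text row by row in `cols` columns, read column by column."""
    return "".join(text[c::cols] for c in range(cols))


def transpose_decrypt(cipher, cols):
    """Rebuild the columns, then read the characters back row by row."""
    n = len(cipher)
    # column c holds ceil((n - c) / cols) characters
    lengths = [(n - c + cols - 1) // cols for c in range(cols)]
    columns, pos = [], 0
    for length in lengths:
        columns.append(cipher[pos:pos + length])
        pos += length
    return "".join(columns[i % cols][i // cols] for i in range(n))


print(caesar_encrypt("HELLO", 3))           # KHOOR (letters replaced)
print(transpose_encrypt("HELLOWORLD", 3))   # HLODEORLWL (letters rearranged)
```

The substitution output contains different letters from the plain text, whereas the transposition output contains exactly the same letters in a different order. A product system would apply several such stages in sequence.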

1.4 Symmetric cipher

In symmetric (secret key) encryption the sender and the receiver share a secret key, and all messages are encrypted and decrypted using this same key. The following figure shows the symmetric encryption process:

Here a source produces a plain text message of the form
$P = [X_1, X_2, \ldots, X_m]$
where $X_1, X_2, \ldots$ are characters. The sender generates a secret key K, which is delivered securely to the receiver. The plain text is encrypted using this secret key to generate the cipher text as
$C = E_K(P)$
where E is the encryption algorithm. The receiver decrypts the cipher text using the same key to obtain the plain text as
$P = D_K(C)$
where D is the decryption algorithm.

Requirements of symmetric encryption:
1. The encryption algorithm should be strong. Ideally it would be unconditionally secure, i.e. unbreakable regardless of the attacker's time and resources; at the least, the attacker should not be able to decrypt the cipher text or discover the key even if he possesses cipher texts along with the corresponding plain texts.
2. The sender and the receiver should obtain their copies of the secret key in a secure fashion and must keep the key secure.
3. The algorithm should be computationally secure, i.e.:
- The cost of breaking the cipher should exceed the value of the message.
- The time required to break the cipher should exceed the useful lifetime of the message.

Drawbacks of symmetric encryption:
- No method of delivering the secret key is completely secure, and if the attacker obtains a copy of the secret key, all communication protected by that key is compromised.
- The method provides no mechanism for authenticating the communicating parties and is therefore vulnerable to masquerade attacks.
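
The relations $C = E_K(P)$ and $P = D_K(C)$ can be tried out with a toy cipher. The sketch below is only an illustration under stated assumptions: it XORs the message with a repeating key, so one function serves as both E and D. The name `xor_cipher` and the sample key are hypothetical, and a repeating-key XOR is far too weak for real use.

```python
from itertools import cycle


def xor_cipher(data: bytes, key: bytes) -> bytes:
    """Toy symmetric cipher: XOR every byte with the repeating key.

    Because (b ^ k) ^ k == b, the same function encrypts and decrypts.
    """
    if not key:
        raise ValueError("key must not be empty")
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


shared_key = b"secret"                          # known only to sender and receiver
plain_text = b"MEET AT NOON"

cipher_text = xor_cipher(plain_text, shared_key)    # C = E_K(P)
recovered = xor_cipher(cipher_text, shared_key)     # P = D_K(C)

print(recovered == plain_text)                  # True
print(xor_cipher(cipher_text, b"wrong!") == plain_text)   # False
```

The second print shows the role of the key: a receiver holding a different key does not recover the plain text, which is why the key must be delivered and stored securely.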

1.5 Feistel cipher

The Feistel cipher is a product cipher: it uses two basic ciphers in sequence in such a way that the result is cryptographically stronger than either alone. It alternates substitution and permutation.

Principle of operation: The Feistel cipher works on the principle of diffusion and confusion.

Diffusion: In diffusion, the statistical structure of the plain text is dissipated into long range statistics of the cipher text. This is done by making each bit of the plain text affect many bits of the cipher text. The purpose of diffusion is to make the statistical relationship between the plain text and the cipher text as complex as possible, to prevent the attacker from deducing the key.

Confusion: In confusion, the relationship between the statistics of the cipher text and the encryption key is made as complex as possible using a complex substitution algorithm. Even if the attacker understands the statistics of the cipher text, he will then not be able to discover the key, because of the complex relationship between the key and the cipher text.

Algorithm: The inputs to the encryption algorithm are a plain text block of 2w bits and a key K from which the subkeys $K_1, K_2, \ldots, K_n$ are derived. The plain text block is divided into two halves of w bits each, denoted $L_0$ for the w leftmost bits and $R_0$ for the w rightmost bits. The two halves pass through n rounds of processing and are then combined to produce the cipher text block. Each round i has as inputs $L_{i-1}$ and $R_{i-1}$, derived from the previous round, and a subkey $K_i$ derived from K. The left half $L_{i-1}$ is subjected to a substitution: a round function is applied to $R_{i-1}$ and the result is XORed with $L_{i-1}$. The round function has the same structure in every round but is parameterized by the round key $K_i$.
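
The Feistel structure can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a real cipher: the half width w = 16 bits, the four subkeys and the `round_function` are all hypothetical choices, and the interchange of the two halves after each substitution is included so that the sketch is complete.

```python
MASK16 = 0xFFFF  # each half is w = 16 bits in this toy example


def round_function(right, subkey):
    """Hypothetical round function F(R, K); it does not need to be invertible."""
    return ((right * 31 + subkey) ^ (right >> 3)) & MASK16


def feistel_encrypt(left, right, subkeys):
    """Run one round per subkey: substitute the left half, then swap the halves."""
    for k in subkeys:
        # new L = old R ; new R = old L XOR F(old R, K_i)
        left, right = right, left ^ round_function(right, k)
    # undo the last swap so that decryption can reuse this routine
    return right, left


def feistel_decrypt(left, right, subkeys):
    """Decryption is the same algorithm with the subkeys in reverse order."""
    return feistel_encrypt(left, right, list(reversed(subkeys)))


subkeys = [0x1A2B, 0x3C4D, 0x5E6F, 0x7081]
cipher = feistel_encrypt(0x1234, 0xABCD, subkeys)
print(feistel_decrypt(*cipher, subkeys) == (0x1234, 0xABCD))   # True
```

Because the round function is only ever XORed into one half, decryption never has to invert it; the same routine with the subkeys reversed recovers the plain text whatever function is chosen.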
Following this substitution, a permutation is performed that consists of interchanging the two halves of the data. Each round can therefore be written as follows, where F is the round function:
$L_i = R_{i-1}$
$R_i = L_{i-1} \oplus F(R_{i-1}, K_i)$
The following figure shows the Feistel cipher algorithm:

Design principles:
1. Block size: Increasing the block size improves security but slows the cipher. A typical block size is 64 bits.
2. Key size: Increasing the key size improves security but slows the cipher. A typical key size is 128 bits.
3. Round function: A more complex round function improves security but slows the cipher.
4. Number of rounds: Increasing the number of rounds improves security but slows the cipher. Typically 16 rounds are used.
5. Complexity of subkey generation: A more complex subkey generation algorithm improves security by making cryptanalysis harder.

Data encryption standard (DES)

DES is an encryption technique that encrypts data in 64 bit blocks using 56 bit keys. The following figure shows the encryption procedure used by DES:

The inputs to the encryption function are a 64 bit block of plain text and a 56 bit key. Although the key is supplied as 64 bits, only 56 bits are used; the remaining 8 bits can serve as parity bits or be set arbitrarily.

The following processes are involved in encrypting a block of plain text using DES:
1. Initial permutation.
2. 16 rounds of a complex key dependent round function involving substitution and permutation functions.
3. 32 bit swap.
4. A permutation that is the inverse of the initial permutation.

Initial permutation: The initial permutation is defined by the following table:

The table is to be interpreted in the following way:
- The input to the table consists of 64 bits numbered from 1 to 64.
- The 64 entries in the permutation table contain a permutation of the numbers from 1 to 64.
- Each entry in the permutation table indicates the position of a numbered input bit in the output, which also consists of 64 bits.

Inverse initial permutation: The inverse initial permutation is defined by the following table:

Single round details: The following figure shows the details of a single round:

- A 64 bit intermediate value is the input to every round. This value is divided into two blocks of 32 bits each.
- The right hand block $R_{i-1}$ is passed through an expansion/permutation block, which converts the 32 bit block into a 48 bit block by repeating some of the bits of the original block. The expansion, including which bits are repeated, is defined by the following table:
- After expansion, the 48 bit block is XORed with the 48 bit round key $K_i$.
- The 48 bit XOR output is then mapped into a 32 bit block by a substitution function involving eight S-boxes. Each S-box takes 6 bits of data as input and maps them into 4 bits of output. The following figure shows the design of an S-box (S1):

  Mapping 6 bits of data into 4 bits:
  i. The 2 bit number formed by the first and last bits of the input gives the row number to be referred to in the table.
  ii. The remaining (middle) 4 bits give the column number.
  iii. The number at that row and column, converted into its 4 bit binary equivalent, is the 4 bit mapped output.

  Example: Consider the 6 bit input 110101. The first and last bits, 11, select row 3, and the middle bits, 1010, select column 10. The entry of S1 at row 3, column 10 is 3, so the 4 bit output is the binary equivalent of 3, i.e. 0011.
- The output of the S-boxes is then passed through a permutation block, which rearranges the bits in order to increase the complexity of the encryption.
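
The S-box lookup for the input 110101 can be checked with a short Python sketch. The table `S1` below is the standard first DES S-box as published in FIPS 46-3, reproduced here because the figure is not available in this text; the function name `sbox_lookup` is an illustrative assumption.

```python
# DES S-box S1 (4 rows x 16 columns), as published in FIPS 46-3.
S1 = [
    [14, 4, 13, 1, 2, 15, 11, 8, 3, 10, 6, 12, 5, 9, 0, 7],
    [0, 15, 7, 4, 14, 2, 13, 1, 10, 6, 12, 11, 9, 5, 3, 8],
    [4, 1, 14, 8, 13, 6, 2, 11, 15, 12, 9, 7, 3, 10, 5, 0],
    [15, 12, 8, 2, 4, 9, 1, 7, 5, 11, 3, 14, 10, 0, 6, 13],
]


def sbox_lookup(sbox, six_bits):
    """Map a 6 bit value to a 4 bit value using one S-box."""
    row = (((six_bits >> 5) & 1) << 1) | (six_bits & 1)   # first and last bits
    col = (six_bits >> 1) & 0xF                           # middle four bits
    return sbox[row][col]


value = sbox_lookup(S1, 0b110101)
print(value, format(value, "04b"))   # 3 0011
```

In a full round this lookup would be performed eight times, once per S-box on each 6 bit slice of the 48 bit XOR output, giving 8 x 4 = 32 output bits.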
- The following table defines the permutation applied to the S-box output:
- The permuted output is then XORed with the left hand input to the round, $L_{i-1}$, to generate the right hand output block $R_i$. The right hand input block $R_{i-1}$ becomes the left hand output of the round, i.e. $L_i = R_{i-1}$.

Key generation in DES: DES takes a 64 bit key as input. Every 8th bit of the 64 bits is ignored and only 56 bits are used, as given by the following table:

The resulting 56 bit key is then subjected to a permutation defined by the following permuted choice 1 (PC-1) table:

The permuted 56 bit key is divided into two halves $C_0$ and $D_0$ of 28 bits each. At each round, $C_{i-1}$ and $D_{i-1}$ are each subjected to a circular left shift (of 1 or 2 bits, depending on the round) given by the following table:

The shifted values serve as input to the next round. They also serve as input to the permuted choice 2 (PC-2) table, which produces the 48 bit key for the round function.

PC-2 table:

DES decryption: DES uses the same algorithm for decryption, except that the round keys are applied in the reverse order.

Triple DES: DES is vulnerable to brute force attacks, so using single DES does not ensure adequate security. To improve security, the plain text is encrypted multiple times using the same DES algorithm but with different keys. In triple DES the plain text is subjected to the DES algorithm three times.

Triple DES using two keys:
$C = E_{K_1}[D_{K_2}\{E_{K_1}(P)\}]$
$P = D_{K_1}[E_{K_2}\{D_{K_1}(C)\}]$

Triple DES using three keys:
$C = E_{K_3}[D_{K_2}\{E_{K_1}(P)\}]$
$P = D_{K_1}[E_{K_2}\{D_{K_3}(C)\}]$

In both cases decryption undoes the three stages in the reverse order, replacing each E by D and each D by E.

Block cipher modes of operation:

1. Electronic codebook mode: In electronic codebook (ECB) mode the plain text is encrypted in 64 bit blocks using the same encryption key K.
The plain text message is divided into 64 bit blocks, and if the last block is shorter than 64 bits it is padded. Each 64 bit block is encrypted independently of the other blocks. For a given key, each plain text block therefore always maps to the same cipher text block, which is why the term codebook is used. This method is useful for small amounts of data. The drawback of this method is that if the attacker discovers the encryption algorithm and the key, the entire data becomes visible to him. In addition, identical plain text blocks produce identical cipher text blocks, so patterns in the data remain visible.

2. Cipher block chaining mode:
- In CBC mode the cipher text output of the previous block is XORed with the current plain text block, and the XOR output is fed to the encryption block.
- For the first block of data no previous cipher text block exists, and therefore an initial value is XORed with the plain text block instead.
- The advantage of this method is that even if an attacker finds out the encryption key and the encryption algorithm, he will not be able to decrypt a cipher text block unless the previous cipher text block (or, for the first block, the initial value) is also known to him.
- Another advantage is that identical blocks of plain text produce different blocks of cipher text, and therefore structural analysis of the data is not possible.

3. Cipher feedback mode:
- CFB mode converts a block cipher into a stream cipher: the data is processed in units of s bits, so the message does not have to be padded to a whole number of blocks.
- This mode is suitable for real time applications where s bits of stream data are to be transmitted immediately.

4. Output feedback mode:

5. Counter mode:
- The advantage of this method is that even if the attacker knows the encryption algorithm and the secret key, he will not be able to decrypt the cipher text unless he also knows the counter value.

Key management in symmetric encryption:

In this method the key distribution center (KDC), which is a highly trusted organization, generates the secret keys to be used by the two communicating entities. Key distribution takes place in the following steps:
1. The initiator A wishes to establish a data transfer session with B. Hence A sends a request message to the KDC. A nonce N1 is added to the request message; it can be a time stamp or a counter value, depending on the application.
2. The KDC responds with a message encrypted using the secret key shared between the KDC and A, together with another message encrypted using the secret key shared between the KDC and B. The first message contains a secret key Ks to be used for the communication, along with a copy of the request message sent by A, so that A can verify that the request was not modified in transit. The other message contains the secret key Ks along with the identity of A; because it is encrypted using the key shared between the KDC and B, B trusts the source of the key once it receives this message.
3. A extracts the second part of the message and sends it to B.
4. B extracts the key Ks and sends an encrypted nonce N2 to A.
5. A decrypts the nonce N2 and sends it back to B, so that the identity of A is authenticated to B.

CHAPTER 2: NUMBER THEORY AND PUBLIC KEY ENCRYPTION

2.1 Number theory

Modular arithmetic:

Modulus operator: Consider a positive integer n and any other integer a.
When a is divided by n we get a remainder r and a quotient q such that:
$a = nq + r$
When only the remainder is required and the quotient is not of much significance, the operation can be represented using the modulus operator as:
$a \bmod n = r$
The operation a mod n gives the remainder when a is divided by n. For example:
$7 \bmod 5 = 2$
$11 \bmod 7 = 4$

Congruent modulo integers: Two integers a and b are said to be congruent modulo n if a mod n = b mod n, and this is represented as:
$a \equiv b \pmod{n}$
For example:
$17 \equiv 13 \pmod{4}$
$35 \equiv 52 \pmod{17}$

Rules of modular arithmetic:
1. $[(a \bmod n) + (b \bmod n)] \bmod n = (a + b) \bmod n$
2. $[(a \bmod n) - (b \bmod n)] \bmod n = (a - b) \bmod n$
3. $[(a \bmod n) \times (b \bmod n)] \bmod n = (a \times b) \bmod n$

Relatively prime numbers: Two numbers are said to be relatively prime to each other if they have no common factor other than 1, i.e. if their G.C.D. is 1. Thus a and b are relatively prime to each other if gcd(a, b) = 1. A prime number is relatively prime to every number that is not one of its multiples. For example, 25 and 33 are relatively prime to each other, whereas 7 and 21 are not.

Euler’s totient function: For any natural number n, Euler’s totient function $\phi(n)$ is defined as the number of natural numbers less than n that are relatively prime to n. For example, let n = 15. The set of natural numbers less than 15 and relatively prime to 15 is:
{1, 2, 4, 7, 8, 11, 13, 14}
$\phi(15)$ is the number of elements in this set, hence $\phi(15) = 8$. For any prime number n, all the natural numbers less than n are relatively prime to n. Hence for any prime number n:
$\phi(n) = n - 1$

Fermat’s theorem: Fermat’s theorem states that if p is a prime number and a is a positive integer not divisible by p, then:
$a^{p-1} \equiv 1 \pmod{p}$

Proof: If p is a prime number and a is a positive integer not divisible by p, then according to modular arithmetic the set of numbers
$\{0 \bmod p,\ a \bmod p,\ 2a \bmod p,\ \ldots,\ (p-1)a \bmod p\}$
is identical to the set $\{0, 1, 2, \ldots, p-1\}$, possibly in a different order. Since 0 mod p = 0, the first elements of the two sets are equal. Multiplying the remaining elements of each set and taking the modulus, we get:
$[(a \bmod p)(2a \bmod p) \cdots ((p-1)a \bmod p)] \bmod p = (1 \cdot 2 \cdot 3 \cdots (p-1)) \bmod p$
Using the product rule on the LHS:
$(a \cdot 2a \cdots (p-1)a) \bmod p = (1 \cdot 2 \cdot 3 \cdots (p-1)) \bmod p$
$a^{p-1}(p-1)! \bmod p = (p-1)! \bmod p$
Cancelling $(p-1)!$ on both sides (which is permissible because $(p-1)!$ is relatively prime to p):
$a^{p-1} \bmod p = 1 \bmod p$, or $a^{p-1} \equiv 1 \pmod{p}$

Euler’s theorem: Euler’s theorem states that for every a and n that are relatively prime:
$a^{\phi(n)} \equiv 1 \pmod{n}$

2.2 Principles of public key cryptographic systems

Drawbacks of single key encryption: Single key encryption uses one key shared by both the sender and the receiver. If this key is disclosed, all communication between the sender and the receiver becomes transparent to the attacker. Because the system is symmetric, it also does not prevent either party from forging a message and claiming that it was sent by the other party.

Public key encryption: Public key encryption is based on using different keys for encryption and decryption. Each communicating party generates a pair of keys. One of the keys is publicly available and is therefore called the public key KU. The other key is known only to the respective party and is therefore called the private key KR. The keys are generated in such a way that a message encrypted using the public key can be decrypted only using the private key, while a message encrypted using the private key can be decrypted only using the public key. Public key encryption can be used for both authentication and confidentiality, and it also eliminates the need for a secure medium for the distribution of secret keys.

Steps involved in public key encryption:
1. Each communicating entity generates a pair of keys to be used for encryption and decryption of messages.
2. One of the keys is kept secret and is known only to the user. This key is the private key.
3. The other key is placed in a public register and is accessible to everyone. This key is the public key.
4. The keys are used for encryption and decryption depending on the application.

Data confidentiality using public key encryption: Confidentiality refers to the security of the information while it is transmitted through an insecure channel. No entity other than the intended receiver should be able to view the message. The following figure shows how data confidentiality is obtained using public key encryption:

A source A produces messages in plain text $P = [P_1, P_2, \ldots]$, where the elements $P_1, P_2, P_3, \ldots$ are letters of some finite alphabet. The receiver of the message, B, generates a pair of keys: a private key $KR_B$ known only to B and a public key $KU_B$ known to everyone, including A. For confidentiality the receiver’s public key is used for encryption. A message encrypted using the receiver’s public key can be decrypted only using the receiver’s private key. Since the private key is known to no one else, the message is secure from everyone else and confidentiality is achieved. Therefore A encrypts the plain text message using the receiver’s public key $KU_B$ to produce the cipher text $C = [C_1, C_2, \ldots]$:
$C = E_{KU_B}[P]$
Upon reception, B decrypts this message using its private key and recovers the plain text message as:
$P = D_{KR_B}[C]$
- This method ensures confidentiality but not authentication, as anyone having the public key of B can forge a message while masquerading as A.

Authentication using public key encryption: Authentication refers to the genuineness of the communicating entities. For example, if A and B are communicating, both A and B should be sure of each other’s identities.
Authentication can be implemented using public key encryption in the following manner:

Here the sender A generates a plain text message P and encrypts it using his private key $KR_A$ to generate the cipher text C as:
$C = E_{KR_A}[P]$
Since this message is encrypted using the private key of the sender, it can be decrypted only using the public key of the sender. Therefore, if a communicating party is able to decrypt the message using the sender’s public key, the identity of the sender is authenticated, as no one else could have encrypted the message using that private key. Upon reception the receiver decrypts the message as:
$P = D_{KU_A}[C]$
- This method provides authentication but not confidentiality, as the message is encrypted using the sender’s private key and anyone having the public key can decrypt the message and view its contents.

Authentication and confidentiality using public key encryption: Both authentication and confidentiality can be ensured using public key encryption by subjecting the plain text message to two rounds of encryption, as shown in the figure:

The message is encrypted twice, first using the sender’s private key and then using the receiver’s public key. The public key of the receiver is used to ensure confidentiality, and the private key of the sender is used to authenticate the sender. The cipher text is generated as:
$C = E_{KU_B}[E_{KR_A}(P)]$
The cipher text is decrypted as:
$P = D_{KU_A}[D_{KR_B}(C)]$
- The disadvantage of this method is that the complex encryption algorithm has to be executed twice at each end, which increases the processing time.

Requirements of public key encryption:
1. It should be computationally feasible for every communicating party to generate a key pair (KU, KR).
2. It should be computationally feasible for a sender A who knows the public key of the receiver B to generate the cipher text as $C = E_{KU_B}(P)$.
3. It should be computationally feasible for the receiver B to decrypt the cipher text and obtain the original message as $P = D_{KR_B}(C)$.
4. It should be computationally infeasible for an attacker who knows KU to find KR.
5. It should be computationally infeasible for an attacker who knows C and KU to find P.
6. The encryption and decryption functions can be applied in either order:
$M = E_{KU_B}[D_{KR_B}(M)] = D_{KU_B}[E_{KR_B}(M)] = E_{KR_B}[D_{KU_B}(M)] = D_{KR_B}[E_{KU_B}(M)]$

2.3 RSA algorithm

The RSA algorithm is a practical implementation of public key encryption. It is a block cipher scheme in which the plain text and the cipher text are integers between 0 and n-1. Typically n is 1024 bits long. The plain text is encrypted in blocks of k bits each, such that $2^k < n \le 2^{k+1}$. For a block of plain text M, the cipher text C is generated as:
$C = M^e \bmod n$
The cipher text is decrypted as:
$P = C^d \bmod n = M^{ed} \bmod n$
Both the sender and the receiver know the values of n and e, whereas only the receiver knows the value of d. Thus the public key of the receiver is KU = {e, n} and the private key of the receiver is KR = {d, n}.

The RSA algorithm consists of the following modules:

I. Key generation:
1. Generate two large, random and distinct prime numbers p and q of approximately the same size in terms of bit length.
2. Compute $n = pq$ and $\Phi = (p-1)(q-1)$.
3. Select a random integer e, $1 < e < \Phi$, such that $\gcd(\Phi, e) = 1$.
4. Compute the unique integer d, $1 < d < \Phi$, such that $ed \equiv 1 \pmod{\Phi}$.

II. Encryption: The sender encrypts the message M as follows:
1. Obtain the public key KU of the intended receiver.
2. Represent the message M as an integer in the interval 0 to n-1.
3. Compute $C = M^e \bmod n$ and send it to the intended receiver.

III. Decryption: The receiver recovers the plain text from the cipher text as:
$P = C^d \bmod n = M^{ed} \bmod n$

- Note: Although p and q should be of similar size, they must not be very close to each other, because if $p \approx q$ then $p \approx \sqrt{n}$. The value of n is known to everyone, and hence anyone could then find p and q by trial and error around $\sqrt{n}$ and so find the keys.

2.4 Key management

There are two main aspects of key management:
- Distribution of public keys
- Use of public key encryption to distribute secret keys

Distribution of public keys:

1. Public announcement of public keys: In this method each user distributes his public key to the recipients or broadcasts it to the entire community. The drawback of this method is forgery. Suppose X is an attacker who sends the following message to B and C after blocking the message from A:
X to B & C: $[ID_A, KU_X]$
Here X is sending his own public key while pretending to be A, and he can masquerade as A until discovered by A. Hence in this method anyone can create a key, claim to be someone else and broadcast it.

2. Publicly available directory:
- In this method the public keys are registered with a public directory. This gives greater security to the keys. The directory must be trusted and must have the following properties:
1. It should contain name and public key entries of the form $\{ID_X, KU_X\}$.
2. The participants should register securely with the directory.
3. The directory should be published periodically.
4. The directory should be electronically accessible.

3. Public key authority: In this method a highly trusted public key authority (PKA) controls the distribution of keys. The public key authority provides all the functionality of the directory, and all the communicating entities interact with it to obtain public keys. The only requirement of this method is real time access to the directory.
The following figure shows the key distribution procedure by a public key authority (PKA). The key distribution takes place in the following steps:

1. A → PKA: Request || $T_1$
The initiator A sends a message to the public key authority containing a request for the current public key of B and a time stamp $T_1$. The time stamp is used to prevent replay attacks.

2. PKA → A: $E_{KR_{auth}}[KU_B \| Request \| T_1]$
The authority responds with a message that is encrypted using its private key $KR_{auth}$. This message contains the public key of B and the original message that A sent to the public key authority. The original message is sent back so that A can check it for modification or replay. The message is encrypted using the private key of the authority to authenticate the authority and prevent masquerade attacks.

3. A → B: $E_{KU_B}[ID_A \| N_1]$
A stores the public key of B, encrypts a message using this key and sends it to B. This message contains the identity of A and a nonce $N_1$ which serves as an identifier for the transaction.

4. B → PKA: Request || $T_2$
B sends a message to the public key authority requesting the public key of A. This message contains the identity of A and a time stamp $T_2$.

5. PKA → B: $E_{KR_{auth}}[KU_A \| Request \| T_2]$
The public key authority responds with a message encrypted with $KR_{auth}$ containing the public key of A and the original request message along with the time stamp.

6. B → A: $E_{KU_A}[N_1 \| N_2]$
In response to message (3), B sends a message to A encrypted with the public key of A. This message contains the original nonce $N_1$ along with a new nonce $N_2$. The original nonce is sent back so that A is assured of the identity of B: since $N_1$ was encrypted using the public key of B, only B could have decrypted message (3) and recovered $N_1$.

7. A → B: $E_{KU_B}[N_2]$
A sends the nonce $N_2$ back to B to authenticate itself.

4. Public key certificates:
Public key certificates allow key exchange without real time access to the public key authority. The following figure shows the key exchange procedure with public key certificates.
A public key certificate binds an identity to a public key along with other information such as period of validity, rights of use etc. All the contents of the certificate are signed by the certificate authority, so the certificate can be verified by anyone who knows the public key of the certificate authority.
Each communicating party sends its public key to the certificate authority securely. For party A the certificate authority verifies the relevant details and provides a certificate of the form:

$C_A = E_{KR_{auth}}[ID_A, KU_A]$

Similar certificates are given to all the communicating parties after authentication. The communicating parties exchange certificates instead of exchanging public keys. Whenever a party receives a certificate from another party, it obtains the public key of the sender by decrypting the certificate using the public key of the certificate authority. If the certificate is successfully decrypted with the public key of the certificate authority, the sender of the certificate is authenticated.

Public key distribution of secret keys:
This method assumes that the two communicating parties A and B have already exchanged their public keys. The secret key is exchanged in the following steps:

1. A → B: $E_{KU_B}[N_1 \| ID_A]$
A uses the public key of B to encrypt a message to B which contains the identity of A, $ID_A$, and a nonce $N_1$ which identifies this transaction uniquely.

2. B → A: $E_{KU_A}[N_1 \| N_2]$
B sends a response to A containing the nonce $N_1$ and a new nonce $N_2$. This message is encrypted using the public key of A. B sends the received nonce $N_1$ back to A to authenticate itself to A.

3. A → B: $E_{KU_B}[N_2]$
A sends the nonce $N_2$ back to B to authenticate itself to B.

4. A → B: $C = E_{KU_B}[E_{KR_A}(K_s)]$
A selects a secret key $K_s$ and sends it to B after encrypting it twice. The secret key is first encrypted using $KR_A$ and then using $KU_B$. This ensures authentication as well as confidentiality.

5. Finally B decrypts the received message C, first with $KR_B$ and then with $KU_A$, and obtains the secret key as:

$K_s = D_{KU_A}[D_{KR_B}(C)]$

CHAPTER 3
MESSAGE AUTHENTICATION

3.1 Message authentication

Purpose of message authentication:
There are three main aspects of message authentication:
1. Protecting the integrity of the message. Messages should be protected from modification during transit, and if a modification does occur the receiver should be able to detect it and discard the message.
2. Validating the identity of the originator. The authentication scheme should ensure that the sender of the message is the same individual as indicated by the identity in the message.
3. Non repudiation of origin. The authentication scheme should be able to resolve disputes that arise when a sender denies having sent a message that carries its identity.

Requirements of authentication:
For any message to be authenticated, the following attacks must be prevented:
1. Disclosure
2. Traffic analysis
3. Masquerade
4. Content modification
5. Sequence modification
6. Timing modification
7. Source repudiation
8. Destination repudiation

3.2 Message authentication functions
Message authentication functions are of three types:
- Message encryption
- Message authentication code (MAC)
- Hash function

I. Message encryption:
Here the cipher text of the message serves as its authenticator.

1. Symmetric encryption:
In symmetric encryption a source A transmits a message M to a receiver B after encrypting it with a secret key K shared between A and B.
Since no other party knows the secret key K, confidentiality is provided. The scheme also authenticates the two parties to each other. If party B receives a message encrypted using key K and containing the identity of A, it is assured that the message was generated by A, as no other party knows the secret key K.

2. Public key encryption:

Direct use of public key encryption:
In public key encryption the sender A generates a message M and encrypts it using the public key $KU_B$ of the intended receiver B. Upon reception party B decrypts the message using its private key $KR_B$.
The direct use of public key encryption provides only confidentiality and not authentication, because an attacker can easily obtain the public key of party B and forge a message using the identity of party A as shown:

Attacker C → B: $E_{KU_B}[M, ID_A]$

Upon reception of such a message party B will not be able to detect that the message is unauthorized.

Encryption using private key:
Here the sender A transmits a message M to the receiver B after encrypting it using its private key $KR_A$. Upon reception B decrypts this message using the public key $KU_A$ of A and obtains M.
This method provides authentication: if B is able to decrypt the message using $KU_A$, it must have been encrypted using $KR_A$, which is known only to A. Only A can encrypt a message using its private key and therefore its authenticity is confirmed.
The drawback of this method is that it does not provide confidentiality, because anyone can obtain the public key $KU_A$ of A and decrypt the messages.

Authentication using multiple encryption:
In this method every message is encrypted twice before being transmitted to the receiver. The sender A first encrypts the message using its private key $KR_A$ and then again using the public key $KU_B$ of the receiver. This method provides both authentication and confidentiality, but at the cost of extra processing time for running the complex encryption algorithm twice.

Drawbacks of using message encryption to provide authentication:
- This method provides only partial authentication: it authenticates the sender of the message but not the contents of the message. An attacker can obtain a copy of the cipher text and remove or rearrange some of its bits even if he is not able to decrypt the message. Such attacks cannot be prevented; the only solution is to detect and discard such messages, and this method provides no mechanism for detecting such unauthorized modifications.
- To provide both authentication and confidentiality, the complex encryption algorithm has to be used twice, which increases the load on the system and the processing time.

II. Message authentication code (MAC):
In this method an additional piece of data called the cryptographic checksum or message authentication code (MAC) is added to the message and serves as its authenticator. The following figure shows the procedure for authentication using a MAC.
The sender A generates a message M to be transmitted to receiver B. The cryptographic checksum is calculated by subjecting M to a function C, called the MAC function, using the secret key K:

$MAC = C_K(M)$

The MAC value is appended to the original message and transmitted to the intended receiver. The secret key is known only to the two communicating parties involved. Upon reception, the receiver separates the message and the MAC and recalculates the MAC value from M using K. If the received MAC value and the recalculated MAC value are equal, the message is authenticated; otherwise it is discarded.
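The append-and-recompute procedure described above can be sketched in a few lines of Python. This is only an illustration: HMAC with SHA-256 from the standard library stands in for the generic MAC function $C_K$, and the key, the message and the function names `send` and `receive` are made up for the example.

```python
import hashlib
import hmac

TAG_LEN = 32  # HMAC-SHA256 produces a 32-byte MAC value


def compute_mac(key: bytes, message: bytes) -> bytes:
    """MAC = C_K(M); HMAC-SHA256 is an assumed stand-in for C."""
    return hmac.new(key, message, hashlib.sha256).digest()


def send(key: bytes, message: bytes) -> bytes:
    """Sender side: append the MAC value to the message."""
    return message + compute_mac(key, message)


def receive(key: bytes, packet: bytes):
    """Receiver side: split message and MAC, recompute, compare.

    Returns the message if it is authenticated, otherwise None (discarded).
    """
    message, received_mac = packet[:-TAG_LEN], packet[-TAG_LEN:]
    expected_mac = compute_mac(key, message)
    # compare_digest performs a constant-time comparison
    if hmac.compare_digest(received_mac, expected_mac):
        return message
    return None


if __name__ == "__main__":
    shared_key = b"hypothetical shared secret K"
    packet = send(shared_key, b"transfer 100 to B")
    print(receive(shared_key, packet))             # authenticated message
    tampered = b"transfer 900 to B" + packet[-TAG_LEN:]
    print(receive(shared_key, tampered))           # None: MAC mismatch
```

An attacker who changes the message without knowing K cannot produce the matching MAC value, so the comparison at the receiver fails.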
Message authentication is based on the fact that even if an attacker is able to modify the message, he cannot modify the MAC value accordingly, as he does not know the secret key. If an attacker modifies the message to produce an unauthorized effect, the recalculated MAC value and the received MAC value will not match and the message will be discarded at the receiving end.

Requirements of MAC:
1. If an attacker observes M and $C_K(M)$, it should be computationally infeasible for him to construct a message M' such that $C_K(M') = C_K(M)$.
2. $C_K(M)$ should be uniformly distributed in the sense that for randomly chosen messages M and M', the probability that $C_K(M') = C_K(M)$ is $2^{-n}$, where n is the number of bits in the MAC.
3. The MAC should depend equally on all bits of the message.

III. Hash function:
A hash function is a public function that maps a message of any length into a fixed length hash value which serves as its authenticator. The figure shows the basic procedure involved in authentication using a hash function.
The sender generates the message M and the hash value h is calculated by subjecting M to the hash function:

$h = H(M)$

This value is appended to the message at the source. The receiver authenticates the message by recomputing the hash value from the message and comparing it with the received hash value.
Authentication is based on the fact that it is not possible for an attacker to modify the message and the hash value consistently. Hence even if an attacker modifies the message, the modification will be detected at the receiving end, as the calculated and received hash values will not match.

Practical implementations of authentication using hash function:
1. Implementation using symmetric encryption
2. Implementation using public key encryption
3. Implementation using public key encryption and a secret data

Properties of hash function:
1. The hash function produces a fixed length output for a variable length input.
2. It can be applied to a block of data of any size.
3. H(x) should be relatively easy to calculate for any x, so that hardware and software implementations are practical.
4. One way property: for any given value h, it is computationally infeasible to find x such that H(x) = h.
5. Weak collision resistance: for any given block x, it is computationally infeasible to find y not equal to x such that H(x) = H(y).
6. Strong collision resistance: it is computationally infeasible to find any pair (x, y) such that H(x) = H(y).

Secure hash algorithm
The secure hash algorithm (SHA-1) takes as input a message with a maximum length of less than $2^{64}$ bits and produces a 160 bit message digest. The input is processed in 512 bit blocks and the following steps are involved in the processing:
1. The message is padded so that its length is congruent to 448 modulo 512. Padding is always added, even if the message is already of the desired length. The number of padding bits is in the range of 1 to 512 and the padding consists of a single 1-bit followed by the necessary number of 0 bits.
2. A block of 64 bits is appended to the message. This block is treated as an unsigned 64-bit integer and contains the length of the original message before padding.
3. A 160 bit buffer is used to hold the intermediate and final results of the hash function. The buffer is represented by five 32-bit registers A, B, C, D and E, which are initialized to the following hexadecimal values:
A = 67452301
B = EFCDAB89
C = 98BADCFE
D = 10325476
E = C3D2E1F0
4. The message is processed in 512 bit (16-word) blocks. The heart of the algorithm is a module consisting of four rounds of processing of 20 steps each. The four rounds have a similar structure but use different primitive logical functions. Each round takes as input the current 512 bit block $Y_q$ and the 160 bit buffer value ABCDE, and updates the contents of the buffer.
5. After all the 512 bit blocks have been processed, the output from the Lth stage is the 160 bit message digest or hash value, where L is the number of blocks in the message.

3.3 Digital signatures

Need for digital signatures:
Message encryption and authentication protect the two communicating parties against any third party, but they do not protect the two parties against each other. Disputes arise when there is source or destination repudiation. In situations where the two communicating parties do not have complete trust in each other, digital signatures are required.

Properties/requirements of digital signatures:
1. It must verify the author as well as the date and time of the signature.
2. It must authenticate the contents at the time of the signature.
3. It must be verifiable by a third party to resolve disputes.
4. The digital signature must be a bit pattern that depends on the message being signed.
5. The signature must use some information unique to the sender to prevent forgery and denial.
6. It should be relatively easy to produce, recognize and verify the digital signature.
7. It must be infeasible to forge a digital signature, either by constructing a new message for an existing digital signature or by constructing a fraudulent digital signature for a given message.
8. It should be practical to retain a copy of the digital signature in storage.
Arbitrated digital signature techniques:
In arbitrated digital signature techniques, the signed message from the sender X to the receiver Y goes first to an arbitrator A, who subjects the message and its signature to various tests to check whether the origin and contents are genuine. The message is then dated and sent to Y with an indication that it has been verified by the arbitrator. The presence of an arbitrator solves the problem of source repudiation. The following approaches are used in arbitrated digital signatures:

1. Conventional encryption:

a. Arbitrator can see the message:
X → A: $M \| E_{K_{XA}}[ID_X \| H(M)]$
A → Y: $E_{K_{YA}}[ID_X \| M \| E_{K_{XA}}(ID_X \| H(M)) \| T]$

In this method the arbitrator must share a secret key $K_{XA}$ with the sender X and a secret key $K_{YA}$ with Y. Here the arbitrator can see the message. The arbitrator calculates H(M) from the message received and compares it with the received H(M). After verifying the origin and contents, the arbitrator forwards another message to the receiver which contains the signature. The signature consists of the identity $ID_X$ and the hash value H(M), encrypted with $K_{XA}$. The timestamp T ensures that the message is not a replay. Y cannot decrypt the signature, but the message is still considered authentic as it has come through A.
This method requires both X and Y to trust A in the following manner:
- X must trust A not to reveal $K_{XA}$ and not to generate false signatures of the form $E_{K_{XA}}[ID_X \| H(M)]$.
- Y must trust A to forward a message only after verifying the hash value and the signature.
- Both X and Y must trust A to resolve disputes fairly.

b. Arbitrator cannot see the message:
X → A: $ID_X \| E_{K_{XY}}[M] \| E_{K_{XA}}[ID_X \| H(E_{K_{XY}}[M])]$
A → Y: $E_{K_{YA}}[ID_X \| E_{K_{XY}}[M] \| E_{K_{XA}}[ID_X \| H(E_{K_{XY}}[M])] \| T]$

Here X and Y must share a secret key $K_{XY}$ between them. In this case the arbitrator cannot see the message.
Drawbacks of using conventional encryption:
- The arbitrator can form an alliance with the sender to deny a signed message.
- The arbitrator can form an alliance with the receiver to forge the sender's signature.

2. Public key encryption:
X → A: $ID_X \| E_{KR_X}[ID_X \| E_{KU_Y}(E_{KR_X}(M))]$
A → Y: $E_{KR_A}[ID_X \| E_{KU_Y}[E_{KR_X}(M)] \| T]$

In this case X double encrypts the message M, first with its private key $KR_X$ and then with the receiver's public key $KU_Y$. This is a signed, secret version of the message. This signed version together with $ID_X$ is encrypted again with $KR_X$ and sent to A along with $ID_X$. The inner, double encrypted message is secure from the arbitrator. However, A can decrypt the outer encryption to assure itself that the message must have come from X.
The arbitrator A verifies the validity of the private-public key pair of X and, if the key pair is valid, verifies the message. After verification, A transmits a message to Y encrypted with $KR_A$. The message includes $ID_X$, the double encrypted message and a timestamp. Here the message is secret from A. Another advantage of this method is that no information is shared among the parties before communication, which prevents alliances to defraud.

Digital signature standard:
The digital signature standard (DSS) is a public key technique which uses an algorithm designed to provide only the digital signature function.
The DSS approach makes use of a hash function. The hash value of the message is given as input to a signature function along with a random number K generated for that particular signature. The signature also depends on the sender's private key and a set of parameters which constitute a global public key ($KU_G$). The output of the signature function is a signature consisting of two components labeled 's' and 'r'. These two components are appended to the message and the entire block is transmitted.
Upon reception, the hash value of the message is calculated. The hash value and the signature are given to the verification function, which also requires the public key of the sender along with $KU_G$. The output of the verification function is a value equal to the signature component r if the signature is valid.

Digital signature algorithm:
The strength of the digital signature algorithm (DSA) is based on the difficulty of computing discrete logarithms. The DSA consists of the following steps:

1. Calculating global public key components:
1. Select a prime number p with a length between 512 and 1024 bits: $2^{L-1} < p < 2^L$ for $512 \le L \le 1024$, where L is a multiple of 64.
2. Select a 160 bit prime number q such that q is a prime divisor of (p-1).
3. Select $g = h^{(p-1)/q} \bmod p$, where h is any integer with $1 < h < p-1$ such that $g > 1$.
The numbers p, q and g form the global public key $KU_G = \{p, q, g\}$.

2. Calculation of private key X of the user:
Select the private key X of the user such that $0 < X < q$. X should be selected randomly or pseudo randomly.

3. Calculating the public key Y of the user:
The public key of the user is calculated from his private key as $Y = g^X \bmod p$. Knowing the value of Y, it is computationally infeasible to find X, since a discrete logarithm is involved.

4. Generating user's per message secret number K:
K is a random or pseudo random integer such that $0 < K < q$. It is unique for every signature.

5. Creating a signature:
Creation of a signature requires calculation of two quantities r and s that are functions of the public key components (p, q, g), the user's private key X, the hash code of the message H(M) and K.
r is calculated as $r = (g^K \bmod p) \bmod q$
s is calculated as $s = [K^{-1}(H(M) + Xr)] \bmod q$
The signature is (r, s).

6. Verification:
Let M', r' and s' be the received versions of M, r and s. Verification is done by the following calculations:
1. $w = (s')^{-1} \bmod q$
2. $u_1 = [H(M')\,w] \bmod q$
3. $u_2 = [(r')\,w] \bmod q$
4. $v = [(g^{u_1} Y^{u_2}) \bmod p] \bmod q$
If $v = r'$, the signature is valid and the message is accepted.

CHAPTER 4
DATA COMPRESSION

4.1 Data compression
The process of converting an input data stream into another data stream of reduced size is called data compression. The input stream could come from a file or from a buffer in memory.
Source file: the input file to the encoder.
Compressed file: the output file produced by the encoder, which is smaller than the source file.
Compressor or encoder: the program that compresses the raw data in the input stream and creates the output stream containing the compressed data.
Decoder or decompressor: the program which regenerates the original data stream from the compressed data stream.
Note: In general the term CODEC is used for coder-decoder.

General law of data compression:
The general law of data compression states that short codes should be assigned to common events and long codes to rare events. This law is based on eliminating the redundancy in the data to achieve compression.

4.2 Classification of compression algorithms

Lossy and lossless compression techniques:
Lossy compression techniques: In lossy compression methods, compression is achieved by losing some part of the information. The decompressed data is not identical to the original data and some information is permanently lost; such methods are therefore irreversible. Lossy compression methods are generally used for audio, video and image compression. Eg. JPEG, MPEG, EZW etc.
Lossless compression techniques: In lossless compression methods, compression is achieved without losing any information, and therefore such methods are used where no information may be lost, as in text files. Eg. Huffman coding, Shannon-Fano coding, arithmetic coding, LZW etc.
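The difference between the two classes can be seen in a small experiment. The sketch below is illustrative only: `zlib` from the Python standard library stands in for a lossless method, and a plain rounding quantizer with a made-up step size stands in for a lossy one; neither is one of the specific methods named above.

```python
import zlib


def lossless_roundtrip(data: bytes) -> bytes:
    """Compress and decompress; the result is identical to the input."""
    return zlib.decompress(zlib.compress(data))


def lossy_roundtrip(samples, step):
    """Quantize each sample to a multiple of `step` and reconstruct.

    Fewer distinct values need fewer bits, but the exact inputs are lost.
    """
    return [step * round(s / step) for s in samples]


if __name__ == "__main__":
    text = b"short codes for common events " * 50
    print(lossless_roundtrip(text) == text)       # True: reversible
    print(len(zlib.compress(text)) < len(text))   # True: output is smaller

    samples = [300, 301, 304, 300, 301, 299, 298, 302]
    print(lossy_roundtrip(samples, step=4))       # close to, not equal to, input
```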
Adaptive and non adaptive compression techniques:
Non adaptive compression techniques: A non adaptive compression method is rigid and does not modify its compression parameters or tables in response to the patterns of the input data being compressed. Such methods are best suited to compressing data of a single type or of a definite pattern. Eg. static Huffman compression.
Adaptive compression techniques: In adaptive compression techniques the compressor examines the statistics and patterns of the input data and modifies its parameters and compression tables accordingly. In other words the compressor adapts itself to the varying conditions of the input data to obtain efficient compression. Eg. adaptive Huffman coding.
Semi-adaptive method: A semi-adaptive method uses a two part algorithm, where the first part reads the input stream to collect the statistics of the data being processed and the second part does the actual compression using the statistical information provided by the first part.

Symmetric and asymmetric compression techniques:
Symmetric compression techniques: In symmetric compression techniques the same algorithm is used by the compressor and the decompressor, but it is applied in opposite directions.
Asymmetric compression techniques: In asymmetric compression techniques different algorithms are used by the compressor and the decompressor.

4.3 Compression parameters

Compression ratio:
Compression ratio is defined as the ratio of the output stream size to the input stream size:

$C.R. = \dfrac{\text{size of output stream}}{\text{size of input stream}}$

For compression C.R. < 1
For expansion C.R. > 1

Compression factor:
Compression factor is defined as the ratio of the size of the input stream to the size of the output stream:

$C.F. = \dfrac{\text{size of input stream}}{\text{size of output stream}}$

For compression C.F. > 1 and for expansion C.F. < 1.
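As a quick numerical check of the two definitions above, the following sketch computes both measures for made-up stream sizes (the function names and the sizes are illustrative, not part of the notes):

```python
def compression_ratio(input_size: int, output_size: int) -> float:
    """C.R. = output size / input size (less than 1 means compression)."""
    return output_size / input_size


def compression_factor(input_size: int, output_size: int) -> float:
    """C.F. = input size / output size (greater than 1 means compression)."""
    return input_size / output_size


if __name__ == "__main__":
    # Hypothetical case: a 1000-byte file compressed to 250 bytes.
    print(compression_ratio(1000, 250))   # 0.25
    print(compression_factor(1000, 250))  # 4.0
```

The two measures are reciprocals of each other, so a ratio of 0.25 and a factor of 4 describe the same result.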
Compression gain:
Compression gain is defined as:

$\text{Compression gain} = 100 \log_e \dfrac{\text{reference size}}{\text{compressed size}}$

The reference size is either the size of the input stream or the size of the compressed stream produced by some standard lossless compression method.

4.4 Runlength encoding (RLE)
Runlength encoding is a lossless compression technique used for compression of text and images. RLE is useful for compressing files in which characters are repeated many times in succession. In RLE a run of a character is encoded only if the character is repeated more than 3 times, and the compressed data is written in the following format:

(escape character, data character, runlength)

The escape character, i.e. '@', indicates that the data has been compressed. The data character is the character which is repeated. The runlength gives the number of times the character is repeated.
For example, consider the following stream of data given as input to the RLE encoder:

aabcxfffffwwww1111111ssw

It contains a run of five f, a run of four w and a run of seven 1, so the compressed output stream will be:

aabcx@f5@w4@17ssw

Note: For encoding a character stream the minimum value of the runlength has to be 4, because an encoded run occupies three bytes, which is the same as the number of bytes occupied by three characters. Hence if a run of length three or less is encoded, it will not result in any compression.

RLE image compression:
A digital image consists of small dots called pixels. The pixels are arranged in scan lines in an array called the bitmap of the image. RLE image compression is based on the fact that there is a high probability that the neighbors of a randomly selected pixel have a similar color. Each pixel occupies 3 bytes, one byte for each color field in the (R, G, B) color space. The R, G and B fields are encoded as three different data streams using RLE. Typically each row is encoded separately using runlength encoding.
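The text format of section 4.4 can be sketched as follows. This is a minimal illustration under two assumptions that the notes do not state: the escape character '@' never occurs in the input, and the runlength is written as a single decimal digit, so a longer run is split into pieces of at most 9. The same routine could be applied to each row of each colour stream of an image.

```python
ESCAPE = "@"   # escape character used in the notes
MIN_RUN = 4    # runs of 3 or fewer are copied unchanged
MAX_RUN = 9    # assumption: runlength is a single decimal digit


def rle_encode(text: str) -> str:
    if ESCAPE in text:
        raise ValueError("input must not contain the escape character")
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        run = 1
        # extend the run while the same character repeats (capped at MAX_RUN)
        while i + run < len(text) and text[i + run] == ch and run < MAX_RUN:
            run += 1
        if run >= MIN_RUN:
            out.append(f"{ESCAPE}{ch}{run}")   # (escape, data char, runlength)
        else:
            out.append(ch * run)               # too short to be worth encoding
        i += run
    return "".join(out)


def rle_decode(encoded: str) -> str:
    out = []
    i = 0
    while i < len(encoded):
        if encoded[i] == ESCAPE:
            out.append(encoded[i + 1] * int(encoded[i + 2]))
            i += 3
        else:
            out.append(encoded[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    source = "aabcx" + "f" * 5 + "w" * 4 + "1" * 7 + "ssw"
    packed = rle_encode(source)
    print(packed)                        # aabcx@f5@w4@17ssw
    print(rle_decode(packed) == source)  # True
```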
The compression ratio of RLE image compression can be further improved by ignoring shorter runs.

4.5 Relative encoding (differencing)
Relative encoding is used when the elements of the data stream to be encoded have similar values. In such cases, instead of sending each element, the difference between the elements can be transmitted to save bandwidth. Differencing is used in telemetry and facsimile applications.
For example, consider the following data stream generated by a temperature measurement telemetry system:

Temperature (°C): 300, 301, 304, 300, 301, 299, 298, 302

The data stream can be encoded by transmitting the values relative to the first value, which is taken as the reference value. The encoded stream will be:

300, 1, 4, 0, 1, -1, -2, 2

If the difference between successive values is transmitted, the stream will be encoded as:

300, 1, 3, -4, 1, -2, -1, 4

If the difference between the reference value and the current value is large, the actual value is transmitted instead of the relative value.

4.6 Scalar quantization
Scalar quantization is used to compress data that is in the form of large numbers, as the quantized numbers occupy less space. However, quantization leads to permanent loss of information.

CHAPTER 5
STATISTICAL METHODS OF DATA COMPRESSION

5.1 Statistical modeling of information source
In statistical modeling of an information source, the probabilities of the source symbols are tracked. The order of the model is the number of previously occurring symbols taken into account. With increasing order, the probabilities obtained become more and more reliable, but the complexity increases.
The overall efficiency of any data compression technique depends on the individual performances of the modeling process and the encoding method.
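The idea of model order can be illustrated with a short sketch. It estimates symbol probabilities from counts in a sample string: an order-0 model ignores context, while an order-1 model conditions each symbol on the one symbol before it. The function names and the sample text are made up for the illustration.

```python
from collections import Counter, defaultdict


def order0_model(text: str) -> dict:
    """P(symbol), estimated from relative frequencies."""
    counts = Counter(text)
    total = len(text)
    return {sym: c / total for sym, c in counts.items()}


def order1_model(text: str) -> dict:
    """P(symbol | previous symbol), estimated from pair counts."""
    pair_counts = defaultdict(Counter)
    for prev, cur in zip(text, text[1:]):
        pair_counts[prev][cur] += 1
    return {
        prev: {sym: c / sum(ctr.values()) for sym, c in ctr.items()}
        for prev, ctr in pair_counts.items()
    }


if __name__ == "__main__":
    sample = "abababac"
    print(order0_model(sample))   # 'a' is the most probable symbol overall
    print(order1_model(sample))   # after 'b' the next symbol is always 'a'
```

The order-1 table is larger than the order-0 table, which reflects the trade-off between sharper probabilities and increased complexity.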
A statistical model used in compression can be shown as follows: the probabilities of the symbols occurring in the input stream are tracked and then forwarded to the encoder along with the symbols for encoding.

5.2 Information theory

Measurement of information:
The information content of any message $m_k$ is measured as:

$I_k = \log_2 \dfrac{1}{p_k}$

where $p_k$ is the probability of occurrence of $m_k$. The unit of information is bits.
From the above expression it can be concluded that as the probability of occurrence of a symbol increases, its information content decreases, i.e. less frequently occurring symbols convey more information than more frequently occurring symbols.
Note: for calculations use the formula:

$\log_2 x = \dfrac{\log_{10} x}{\log_{10} 2}$

Entropy of a source:
Consider a source that generates n different symbols $S_1, S_2, \ldots, S_n$ with probabilities $P_1, P_2, \ldots, P_n$ respectively. The entropy of the source is defined as the average information content of the source. It gives the minimum average number of bits required to represent each symbol. It is given by the following expression:

$H(S) = \sum_{i=1}^{n} P_i \log_2 \dfrac{1}{P_i}$

The above expression can be written as:

$H(S) = P_1 I_1 + P_2 I_2 + \cdots + P_n I_n$

Entropy is measured in bits/symbol.

Average length of a code:
It is the average number of bits needed per symbol. It is given by the following expression:

$L_A = \sum_{k=1}^{n} P_k L_k$

where $P_k$ is the probability of occurrence of the kth symbol and $L_k$ is the length of its code in bits.

Redundancy:
Redundancy is defined as the difference between the symbol set's largest possible entropy and its actual entropy. For n symbols the largest possible entropy is $\log_2 n$, so redundancy is given by the following expression:

$R = \log_2 n - H(S)$

For data to be compressed efficiently, R should be as small as possible, i.e. the number of bits used to represent a symbol should be very close to the actual information content of the symbol.

5.3 Prefix codes

Prefix property:
The prefix property states that once a certain bit pattern has been assigned as the code for a symbol, no other code can start with that pattern.
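Whether a given code table satisfies the prefix property can be checked mechanically. The sketch below is one simple way to do it, using the fact that after sorting the code words, any word that is a prefix of another is immediately followed by a word that starts with it. The two code tables are small illustrative assignments.

```python
def is_prefix_free(codes: dict) -> bool:
    """Return True if no code word is the start of another code word."""
    words = sorted(codes.values())
    # after sorting, a prefix relation always shows up between neighbours
    return not any(b.startswith(a) for a, b in zip(words, words[1:]))


if __name__ == "__main__":
    violating = {"S1": "0", "S2": "01", "S3": "10", "S4": "010"}
    satisfying = {"S1": "0", "S2": "10", "S3": "110", "S4": "111"}
    print(is_prefix_free(violating))    # False: "0" starts "01" and "010"
    print(is_prefix_free(satisfying))   # True
```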
Consider the example where the symbols are assigned codes without following the prefix property:
Symbol  Code
S1      0
S2      01
S3      10
S4      010
If the symbols transmitted are S2 S3 S4, the corresponding data stream will be: 0110010
This data stream can be read as S2 S3 S4 and also as S2 S3 S2 S1. To avoid such ambiguities, the prefix property should be satisfied when developing the code words for the symbols.
Prefix codes: A prefix code is a code which satisfies the prefix property. For example, the unary code of a positive integer n is defined as (n-1) zeroes followed by a single one, or (n-1) ones followed by a single zero.

5.4 Shannon-Fano coding
Shannon-Fano coding produces variable size codes for symbols occurring with different probabilities. The coding depends on the probability of occurrence of each symbol, and the general idea is to assign shorter codes to symbols that occur more frequently and longer codes to symbols that occur less frequently.
Shannon-Fano algorithm: The algorithm used for generating Shannon-Fano codes is as follows:
1. For a given list of symbols, develop a corresponding list of probabilities so that each symbol's relative probability is known.
2. List the symbols in the order of decreasing probability.
3. Divide the symbols into two groups so that the total probabilities of the two groups are as nearly equal as possible.
4. Assign the value 0 to the first group and the value 1 to the second group.
5. Repeat steps 3 and 4 within each group, each time partitioning into sets with as nearly equal probabilities as possible, until no further partitioning is possible.

5.5 Huffman coding
Huffman coding gives a variable size code based on symbol probabilities. This method is based on reducing the redundancy in the number of bits used for representation of information.
The general idea is to achieve compression by assigning shorter codes to frequently occurring symbols and longer codes to symbols occurring less frequently.
Algorithm: The encoder starts by building a list of the symbols in descending order of probability. It then constructs a tree, with a symbol at every leaf, from bottom to top. This is done in steps, where at each step the two symbols with the smallest probabilities are selected, added to the top of the partial tree, deleted from the list and replaced by another symbol representing those two symbols. When the list is reduced to just one symbol, the tree is complete. The tree is then traversed from right to left to determine the codes for the symbols.
Note: when there are more than two nodes having the smallest probabilities, select the nodes which are highest and lowest in the tree and combine them. This will reduce the total variance of the code. The Huffman code having the smallest variance is preferred.
The variance of a code measures by how much the sizes of the individual codes deviate from the average size. The variance of a code is defined as:
V = P1 (L1 - LA)^2 + P2 (L2 - LA)^2 + ...... + Pn (Ln - LA)^2
where
PK = probability of occurrence of the Kth symbol
LK = number of bits used to represent the symbol
LA = average length
Huffman decoding:
- The Huffman table used for coding must be transmitted to the decoder every time it is updated if the technique is adaptive. For static Huffman coding only one table is sufficient for the decoder.
- The decoder starts at the root of the tree and reads the first bit from the compressed stream. If the bit is zero the bottom edge is followed, otherwise the top edge of the tree is followed. In the same manner successive bits are read until the decoder reaches a leaf, where it finds a symbol.
Drawbacks of Huffman coding: The symbol probabilities, which are the basic requirement of the method, are very rarely known in advance. This makes the algorithm inefficient.
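The Huffman construction described above (repeatedly combining the two entries with the smallest probabilities) together with the average length and variance formulas can be sketched in Python as follows. This is only a minimal sketch under stated assumptions: the function names are hypothetical, ties between equal probabilities are broken by insertion order rather than by the highest/lowest rule from the note, and the 0/1 labels on the branches are an arbitrary choice, so the exact code words may differ from a tree drawn by hand while the average length stays the same.

```python
import heapq
from itertools import count

def huffman_codes(probs):
    """Build a Huffman code for {symbol: probability}.

    Assumption: ties are broken by insertion order (a counter), which is
    not necessarily the minimum variance choice. A single-symbol source
    gets the empty code in this sketch.
    """
    tie = count()  # unique tie-breaker so the heap never compares dicts
    heap = [(p, next(tie), {s: ""}) for s, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, group1 = heapq.heappop(heap)   # smallest probability
        p2, _, group2 = heapq.heappop(heap)   # second smallest
        merged = {s: "0" + c for s, c in group1.items()}
        merged.update({s: "1" + c for s, c in group2.items()})
        heapq.heappush(heap, (p1 + p2, next(tie), merged))
    return heap[0][2]

def average_length(probs, codes):
    """LA = sum of PK * LK."""
    return sum(probs[s] * len(codes[s]) for s in probs)

def variance(probs, codes):
    """V = sum of PK * (LK - LA)^2."""
    la = average_length(probs, codes)
    return sum(probs[s] * (len(codes[s]) - la) ** 2 for s in probs)

probs = {"S1": 0.4, "S2": 0.2, "S3": 0.2, "S4": 0.1, "S5": 0.1}
codes = huffman_codes(probs)
print(codes)
print(average_length(probs, codes), variance(probs, codes))
```

Changing the tie-breaking rule changes the code lengths of individual symbols and therefore the variance, but not the average length, which is why the note above prefers the tree with the smallest variance.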
There are two possible solutions to the problem of unknown symbol probabilities:
- use an adaptive method
- use a semi adaptive method

5.5 Adaptive Huffman coding
- In adaptive Huffman coding both the compressor and the decompressor start with an empty Huffman tree. No symbols are assigned codes and every new symbol is treated as a leaf node with the same weight. As new symbols are added, the tree is updated such that the updated tree is also a Huffman tree.
- The first symbol is written on the output stream as it is. This symbol is then added to the tree and a code is assigned to it. The next time this symbol occurs, its current code is written on the output stream and its frequency is incremented by 1.
- Each time a symbol is processed, it has to be checked whether the tree still satisfies the Huffman property. The Huffman property is that if we scan the tree, the frequency of occurrence of the symbols should decrease from right to left and from top to bottom, i.e. the symbol at the top right position will have the highest frequency and the one at the bottom left will have the lowest frequency. This property is called the sibling property of the Huffman tree.
Updating the Huffman tree: The process of updating the tree always starts at the current node, which is a leaf S with f as its frequency of occurrence. Every iteration has three steps:
1. Compare S to its successors in the tree from left to right and bottom to top. If the immediate successor has frequency (f+1) or more, then the nodes are still in sorted order and swapping is not required. Otherwise some successors of S have identical frequency f or a smaller frequency. In such a case S should be swapped with the last node in this group.
2. Increment the frequency of S from f to f+1. Also increase the frequency of all its parents.
3. If S becomes the root, the process stops; otherwise the process repeats with the parent of node S.
Drawbacks of adaptive Huffman coding:
1. Count overflow: The frequency counts are accumulated and this field can overflow. Normally the width of this field is 16 bits, so it can store a count up to 65535. The count of the root is monitored every time it is incremented. When the maximum count limit is reached, all the weights are rescaled with an integer division by 2. This is actually done by performing an integer division only on the leaf nodes and then rebuilding the counts of the interior nodes. Sometimes this leads to a violation of the Huffman property, and the tree needs to be updated again.
2. Code overflow: Code overflow occurs when many symbols are added to the tree and the tree grows taller. The compressor has to find the code for an input symbol S in the tree by linear search. If S is found in the tree, the compressor moves from node S back to the root, thus building the code bit by bit. These bits have to be accumulated because they must be transmitted in the reverse order. When the tree gets taller, the codes get longer, and if the size of the field used to accumulate them is exceeded, the program malfunctions.
3. Another drawback of Huffman coding is that the codes generated contain an integer number of bits, which adds redundancy to the data.

5.6 Arithmetic coding
One of the drawbacks of Huffman coding is that it assigns an integer number of bits to individual symbols, which adds some coding redundancy. Arithmetic coding overcomes this drawback by assigning one long code to represent the whole string of symbols instead of assigning codes to individual symbols. Arithmetic coding is also based on the probability model of the symbols to be encoded. Initially the encoding starts with a code assigned to the first symbol, which gets modified as other symbols are added.
The code that results when the last symbol has been encoded is the compressed data. Data is encoded in the following steps:
1. Start by defining the current interval as [0, 1).
2. Repeat the following two steps for each symbol S in the input stream:
   i. Divide the current interval into sub-intervals whose sizes are proportional to the symbols' probabilities.
   ii. Select the sub-interval for S and define it as the new current interval.
3. When the entire input stream has been processed, the output should be any number within the current interval.

5.7 Context based text compression (PPM)
In context based compression, the probability model of a symbol is generated depending on the frequency of the symbol and the context in which the symbol has occurred so far. The PPM (prediction by partial matching) encoder switches to a shorter context when a longer one results in zero probability. PPM starts with an order n context and searches its data structure for a previous occurrence of the current context C followed by the next symbol S. If no such occurrence is found, the encoder switches to the order n-1 context and the same procedure is followed. In each case the encoder reads the next symbol S from the input stream, looks at the current order n context C and, based on the input data that has been encoded previously, determines the probability P that S will appear in context C. The encoder then uses an adaptive arithmetic encoder to encode the symbol S with probability P.

CHAPTER 6
DICTIONARY BASED METHODS

6.1 Dictionary based methods
Dictionary based methods try to compress variable size strings of information into tokens using a dictionary. The dictionary holds strings of symbols and it can be static or adaptive. An adaptive dictionary holds the strings previously found in the input stream and allows new strings to be added as the input is being read.
The encoder tries to match a part of the input stream with the words (strings) stored in the dictionary. If a match is found, a token is written on the output stream which contains a pointer to the location in the dictionary where the matched word is stored. This method is also called string compression. If a word is found which does not match, it is written as it is on the output stream, followed by a flag character and the size of the word.

6.2 Static and adaptive dictionary methods
Static dictionary methods:
1. Static dictionary methods are rigid and the dictionary is not modified according to the varying input data.
2. The size of the dictionary is fixed and generally very small.
3. Preferred only when the strings encountered in the input stream follow a definite pattern.
Adaptive dictionary methods:
1. In adaptive dictionary methods the unmatched strings are added to the dictionary, and hence the dictionary is dynamically updated.
2. Here space is allocated for the addition of new strings to the dictionary.
3. Preferred when the words appear randomly in the input data and do not fall under any category.

6.3 LZ-77 (sliding window compression)
The LZ-77 compression method is an adaptive compression method where the encoder dynamically builds a dictionary from the input data and then uses the previously occurring strings to compare and compress the new strings. The amount of compression, i.e. the compression ratio, depends on:
- Length of the dictionary
- Size of the window used
The encoder maintains a window and shifts the input in that window from right to left as the symbols are being encoded, which is why this method is called sliding window compression. The sliding window has two parts:
- The left part is called the search buffer and it contains the current dictionary. It includes the strings that have been input and encoded.
- The right part of the window is called the look ahead buffer and it contains the strings which are to be encoded.
Typically the search buffer is very large compared to the look ahead buffer. For each encoded string a token is written on the output stream. The LZ-77 token has three fields: (offset, match length, next unmatched symbol). This token is written on the output stream and the window is shifted to the right.
- The first field of the token is the offset field, which gives the location of the matched string in the dictionary. This field is basically a pointer to the location in the dictionary where the string is stored. The size of the offset field is log2 (S) bits, where S is the size of the search buffer.
- The second field of the token is the match length, i.e. the number of symbols in the string which found a match in the dictionary. The size of this field is log2 (L-1) bits, where L is the size of the look ahead buffer.
- The third field of the token is the next unmatched symbol, which stores the next symbol in the input stream after the matched string. The size of this field is log2 (C) bits, where C is the number of different symbols in the alphabet.
Drawbacks of the LZ-77 compression technique:
1. This method assumes that a match is found close by, within the window, which is not always the case in practical applications.
2. The compression ratio can be improved only by increasing the size of the search buffer, which increases the latency.
3. This method is difficult to apply in practice as there is no definite data structure for the dictionary.

6.4 LZ-78
In the LZ-78 method a dictionary of previously occurred strings is maintained, the size of which is limited by the available memory. This method reduces the token size by having only two fields in the token. The LZ-78 token has the structure: (pointer, next unmatched symbol).
- The LZ-78 token has only two fields as compared to three in LZ-77. The pointer field points to the location in the dictionary at which a match is found for the current input string.
- The second field of the token stores the value of the symbol occurring immediately after the string that found a match in the dictionary. In other words, this field stores the symbol next to the string being encoded which, when added to the string, results in a string having no match in the existing dictionary.
Encoding: Initially the dictionary is empty except for the null string at location zero. As the symbols are input and encoded, the dictionary is built by adding new strings at positions starting from 1. If the current input symbol S does not match any of the strings in the dictionary, it is added at the next available location and a token with pointer 0 and the value of the symbol is written. Otherwise, if the current symbol is present in the dictionary, the next symbol in the stream is appended to it to form a new string, and this string is checked for a match in the dictionary. In this manner symbols are appended to the string until there is no match in the dictionary. At the point when no match is found, the location of the most recently matched string is written in the pointer field of the token, and the last appended symbol, which caused the mismatch, is written as the next unmatched symbol. The new unmatched string is added to the dictionary.
Decoding: The LZ-78 decoder works by building and maintaining the dictionary in the same way as the encoder.
Drawbacks of the LZ-78 algorithm: The drawback of the LZ-78 algorithm is the memory requirement, as the frequently encountered symbols as well as the longer matches have to be stored as entries in the dictionary. If the dictionary is full, then either the dictionary has to be restarted or some of the entries have to be deleted.

6.5 LZW
The LZW compression algorithm eliminates the 'unmatched symbol' field from the token, and hence only one field, i.e. the pointer to the dictionary, has to be transmitted for each encoded data string.
Because of this, every unmatched symbol also has to be encoded through the dictionary. In LZW the dictionary is therefore initialized with all the symbols of the alphabet (all 256 possible 8-bit characters), so that memory locations 0-255 are occupied. The new entries in the dictionary are combinations of existing entries and symbols which appear in the data stream. The decoding is done by building the dictionary in the same manner as for encoding.

Question bank

Chapter #1 Data compression
1. Give the applications of data compression. (4m)
2. Compare lossy and lossless data compression techniques. (5m)
3. Suggest and explain a compression method for the compression of data transmitted by a remote measurement system which monitors the temperature of a furnace. (4m)
4. Explain run length encoding. What are the applications of run length encoding? (5-10m)
5. Encode the following data strings using run length encoding:
   a) 11abbbbcccccabc
   b) @aaaa$55555677777
   Also find the compression ratio and compression factor in each case.
6. Compare dictionary based methods and statistical methods of text compression. (5-10m)
7. Write a short note on:
   a) Relative encoding / telemetry compression
   b) Scalar quantization

Chapter #2 Statistical methods of data compression
1. Write a short note on:
   a) Information content of a message
   b) Entropy
   c) Average length of a code
   d) Redundancy
2. Explain Shannon-Fano coding. Generate the codes for the following symbols using Shannon-Fano coding:
   Symbol  Probability
   S1      0.35
   S2      0.21
   S3      0.15
   S4      0.19
   S5      0.1
   Also find the redundancy in coding. (10m)
3. State advantages and disadvantages of statistical methods for data compression.
4. Compare adaptive and non adaptive compression methods.
5. Explain the arithmetic coding technique for data compression.
6. Explain the Huffman method of data compression. Consider the following symbols with the given probabilities:
   Symbol  Probability
   S1      0.4
   S2      0.2
   S3      0.2
   S4      0.1
   S5      0.1
   Draw the Huffman trees using the normal method and using the minimum variance method. Also find the variance and the coding redundancy in each case.
7. A source emits letters from an alphabet set S = {m, n, o, p, q} such that P (m) = P (n) = 0.2, P (o) = 0.4 and P (p) = P (q) = 0.1.
   a) Find the entropy of the source.
   b) Find the Huffman code using the standard procedure and the minimum variance method.
   c) Find the average length of the code and the coding redundancy for both the codes.
8. What are the drawbacks of the Huffman method? What are the solutions to those drawbacks? Explain the adaptive Huffman method.
9. Compare RLE and Huffman coding for an image where each pixel is represented in 8 bits and 50% of the pixels have a grey level of 127 and the remaining 50% of the pixels have a grey level of 128.
10. A source emits six discrete symbols with probabilities P (a1) = 0.1, P (a2) = 0.4, P (a3) = 0.06, P (a4) = 0.1, P (a5) = 0.04 and P (a6) = 0.3. Use Huffman coding to encode the source. If the encoded string is 010100111100, decode it to find the original string.
11. A source emits five symbols S1, S2, S3, S4 and S5 with probabilities 0.25, 0.25, 0.25, 0.125 and 0.125 respectively. Find:
   a) Entropy of the source
   b) Huffman code using the standard procedure
   c) Shannon-Fano code
   d) Average length of code and redundancy
12. Encode the following data strings using the adaptive Huffman method:
   a) sir_sid_easily
   b) she_sells_sea_shells
   c) zxzyzxzyzx
   Also show the decoding.
13. Encode and decode the following data strings using arithmetic coding:
   a) swiss_miss
   b) assassinimassa
14. Given 3 data symbols a1, a2 and a3 with probabilities 0.001838, 0.975 and 0.023162 respectively, use arithmetic coding to encode the data string "a2, a2, a1, a3, a3".
15. Compare arithmetic coding with Huffman coding.
16. "The Huffman coding is not unique". Explain this with an example.
17. Explain context based coding. What are its advantages?
18. Draw the trie structure for the following data strings:
   a) zxzyzxxyzx
   b) abcbccacbcaabcb
   Also show base and vine pointers.

Chapter #3 Dictionary based compression
1. Compare statistical and dictionary based compression techniques. (5m)
2. Suggest a suitable compression technique for each of the following data strings. Also state the reasons.
   a) xyzzyyxzx
   b) xxxxyyyyzzzz
   c) xzyyzyzzyzzzz
3. Compare LZ-77, LZ-78 and LZW compression techniques.
4. Write a short note on:
   a) Zip
   b) Gzip
   c) CRC
   d) Arc
5. Encode the following data strings using LZ-77, LZ-78 and LZW algorithms:
   a) sir_sid_is_easily_teases_sea_sea_sick_seals
   b) she_sells_sea_shells_at_the_sea_shore
   c) alph_eats_alphalpha
6. Explain the concept of static and adaptive dictionary. Explain with a suitable example the encoding technique using LZ-77.
7. Describe the situations when the LZ-77 algorithm is best and worst. Explain the LZ-78 algorithm specifying the improvements over the LZ-77 algorithm.
8. An initial dictionary consists of the letters a, b, r, y and z. Encode the following message with the LZW algorithm: "azbarzarrayzbyzbarrayarzbay".
9. What are the advantages of LZW over other methods?

Chapter #4 Image compression
1. Describe different approaches for image compression.
2. Write a short note on gray codes.
3. Explain the application of DCT in image compression.
4. Explain the JPEG compression method used for image compression. How is the JPEG-LS standard different from JPEG?
5. What is motion compression with respect to image compression? Give the basic structure of the MPEG-1 video standard.
6. What is motion compensation? Explain the working of MPEG in detail.
7. Draw the structured layers of the MPEG-1 video stream.
8. Explain the various techniques used in video compression and their underlying principles.
9. What is the effect of quantization on an image?
10. Write a short note on PCM and DPCM.
11. Explain the various steps involved in the compression of video sequences using the MPEG-1 video standard.

Chapter #5 Audio compression
1. Write a short note on lossy sound compression.
2. Describe A-law and μ-law companding.
3. Write a short note on ADPCM and DPCM. What are the advantages of ADPCM over PCM?
4. What is linear predictive coding? Explain CELP and MELP in detail.
5. Explain the terms "frequency masking" and "temporal masking" in audio compression.
6. Write a short note on the MPEG audio standard.

Chapter #6 Conventional Encryption
1. Write a short note on:
   a) Goals of cryptography
   b) Security service and security mechanism
   c) Security attacks
2. Explain the data encryption standard.
3. Explain IDEA.
4. Write a short note on the Feistel cipher. Explain the design principles.
5. Explain CBC, ECB, OFB, CFB and counter modes of operation of block ciphers.

Chapter #7 Number theory and public key encryption
1. Explain CRT with an example.
2. Explain the concept of discrete logarithm.
3. What is the difference between index and discrete logarithm?
4. Compare conventional and public key encryption.
5. Explain RSA with an example.
6. Calculate the private key and public key based on RSA taking 5 and 11 as the two prime numbers. Use these keys to encrypt and decrypt a plain text input of M=17.

Chapter #8 Message authentication
1. Describe the various authentication requirements for communication across a network. Explain different authentication functions.
2. Write a short note on MAC.
3. Explain MAC based on DES.
4. What is MAC? Where do we use it?
5. What is the secure hash algorithm?
6. Differentiate between MAC and hash codes.
7. Write a short note on HMAC.
8. Write a short note on one way hash functions.
9. What are the needs and requirements of digital signatures?
10. What are the drawbacks of direct signatures?
11. Explain DSA.