Section 1 Data compression

Data compression is a reduction in the number of bits needed to represent data. There are two methods of compressing data:

Lossy compression reduces file size by permanently removing some of the original data. It's typically used when a file can afford to lose some data and/or when storage space needs to be drastically freed up. It's applicable to images, video and audio (Images: JPEG; Video: MPEG, AVC, HEVC; Audio: MP3, AAC). Files become much smaller, but the quality degrades relative to the original, and the loss grows as the compression ratio increases.

Lossless compression reduces file size by removing redundancy from the data (for example, repeated patterns or unnecessary metadata). After decompression the data is restored to exactly its original form, so the file takes up less space with no loss of quality. Because no data is lost and the process can be reversed, it's also known as reversible compression. It's applicable to text, images and audio (Images: RAW, BMP, PNG; General: ZIP; Audio: WAV, FLAC; note that BMP and WAV files usually store the data uncompressed). Files are not as small as with lossy compression, but the quality does not change at all.

1- Entropy (H): the average amount of information per symbol produced by a source, measured in bits per symbol. It is the smallest average number of bits per symbol that any uniquely decodable code for the source can achieve:
$$H = -\sum_{i=1}^{N} P(x_i)\,\log_2 P(x_i)$$

2- Average length (L_avg): the sum, over all codewords, of each codeword's probability multiplied by its length l_i:
$$L_{avg} = \sum_{i=1}^{N} P(x_i)\, l_i$$

3- Efficiency (η): how close a code's average length comes to the entropy; η = 1 (100%) means the code cannot be made shorter on average:
$$\eta = \frac{H}{L_{avg}}$$

Example:

Symbol                 | Prob.        | FLC | Code 1 | Code 2 | Code 3 | Code 4
A                      | P[A] = 1/2   | 000 | 1      | 1      | 0      | 00
B                      | P[B] = 1/4   | 001 | 01     | 10     | 10     | 01
C                      | P[C] = 1/8   | 010 | 001    | 100    | 110    | 10
D                      | P[D] = 1/16  | 011 | 0001   | 1000   | 1110   | 11
E                      | P[E] = 1/16  | 100 | 00001  | 10000  | 1111   | 110
Average length (L_avg) | H = 30/16    | 3   | 31/16  | 31/16  | 30/16  | 33/16

Exercises:

Test whether each of the following binary codes is uniquely decodable:
{0, 01, 11}
{0, 01, 10}
{0, 01, 10, 1}
{0, 1, 00, 11}
{0, 10, 110, 111}

Test whether each of the following binary codes is a prefix code:
{1, 01, 001, 0000}
{0, 10, 110, 1011}
{0, 10, 010, 101}

Encode the following binary strings using run-length coding:
0011110001111000
0000011110
111100011100

Decode the following sequences using run-length coding:
10, 20, -5, 35, 32, 54, 32, 19, 3, 87
9, 12, -4, 35, 76, 112, 67, 2, 19, 2
255, 8, 2, 54, 32, 65, 76, 255, 5, 30, 1
128, 8, 2, 54, 32, 65, 76, 128, 5, 30, 1

Find the codewords and code lengths for the following sources using Shannon-Fano coding:
1- Probabilities: 0.25, 0.2, 0.15, 0.15, 0.10, 0.10, 0.05
2- The message AADCABCBAB (estimate each symbol's probability from its frequency in the message)
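The averages in the example table can be checked with a short Python sketch of the entropy, average-length and efficiency formulas. The dictionary layout and the function names `entropy` and `average_length` are illustrative choices, not part of these notes; `Fraction` keeps L_avg exact so it matches the table's fractions.

```python
from fractions import Fraction
from math import log2

# Probabilities and codewords copied from the example table.
probs = {"A": Fraction(1, 2), "B": Fraction(1, 4), "C": Fraction(1, 8),
         "D": Fraction(1, 16), "E": Fraction(1, 16)}

codes = {
    "FLC":    {"A": "000", "B": "001", "C": "010", "D": "011", "E": "100"},
    "Code 1": {"A": "1", "B": "01", "C": "001", "D": "0001", "E": "00001"},
    "Code 2": {"A": "1", "B": "10", "C": "100", "D": "1000", "E": "10000"},
    "Code 3": {"A": "0", "B": "10", "C": "110", "D": "1110", "E": "1111"},
    "Code 4": {"A": "00", "B": "01", "C": "10", "D": "11", "E": "110"},
}


def entropy(p):
    """H = -sum P(x_i) * log2 P(x_i), in bits per symbol."""
    return -sum(float(pi) * log2(float(pi)) for pi in p.values() if pi > 0)


def average_length(p, code):
    """L_avg = sum P(x_i) * l_i, kept exact by using Fraction."""
    return sum(p[s] * len(code[s]) for s in p)


H = entropy(probs)
for name, code in codes.items():
    L = average_length(probs, code)
    print(f"{name}: L_avg = {L}, efficiency = {H / float(L):.3f}")
```

This reports efficiencies of 0.625 for the FLC, about 0.968 for Codes 1 and 2, 1.000 for Code 3 and about 0.909 for Code 4 (Fraction prints 30/16 in lowest terms as 15/8). Code 3 reaches the entropy exactly because each codeword length equals -log2 of its probability.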
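For the uniquely-decodable exercises, one standard method is the Sardinas-Patterson test, which these notes do not name; the sketch below is a minimal version of it under that assumption. It repeatedly collects the "dangling suffixes" left over when one codeword (or earlier suffix) is a prefix of another, and reports the code as ambiguous if any suffix is itself a codeword. The function names are hypothetical.

```python
def dangling_suffixes(a, b):
    """Non-empty w such that x + w == y for some x in a and y in b."""
    return {y[len(x):] for x in a for y in b
            if len(y) > len(x) and y.startswith(x)}


def is_uniquely_decodable(code):
    """Sardinas-Patterson test for a finite binary code."""
    words = set(code)
    if len(words) != len(code):      # a repeated codeword is ambiguous
        return False
    s = dangling_suffixes(words, words)
    seen = set()
    while s:
        if s & words:                # a dangling suffix is itself a codeword
            return False
        key = frozenset(s)
        if key in seen:              # the sets now repeat, so no conflict can appear
            return True
        seen.add(key)
        s = dangling_suffixes(words, s) | dangling_suffixes(s, words)
    return True                      # no dangling suffixes left


for c in (["0", "01", "11"], ["0", "01", "10"], ["0", "01", "10", "1"],
          ["0", "1", "00", "11"], ["0", "10", "110", "111"]):
    print(c, is_uniquely_decodable(c))
```

The same test applied to Code 4 from the example table returns False: the string 110110 parses both as EE and as DBC.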
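The prefix-code exercises only need a check that no codeword is the beginning of another. A minimal sketch (the function name `is_prefix_code` is illustrative) sorts the codewords so that only neighbouring pairs have to be compared:

```python
def is_prefix_code(code):
    """True if no codeword is a prefix of another codeword."""
    words = sorted(code)
    # In lexicographic order, any word that is a prefix of another word is
    # immediately followed by a word that starts with it, so checking
    # neighbours is enough. Duplicate codewords also fail this check.
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))


for c in (["1", "01", "001", "0000"], ["0", "10", "110", "1011"],
          ["0", "10", "010", "101"]):
    print(c, is_prefix_code(c))
```

A prefix code is always uniquely decodable, but not the reverse: Code 2 in the example table (1, 10, 100, ...) is not a prefix code, yet every codeword starts with 1, so the stream can still be split correctly by looking ahead.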
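The notes do not fix a run-length convention for binary strings, so the sketch below assumes one: store the first bit, then the lengths of the alternating runs. (Another common convention always starts with a run of 0s and writes a length of 0 when the string begins with 1.) The numeric decoding exercises use a format whose rules are not spelled out here, so this sketch covers only the binary-string case; `rle_encode` and `rle_decode` are hypothetical names.

```python
from itertools import groupby


def rle_encode(bits):
    """Return (first bit, list of run lengths) for a string of '0'/'1'."""
    if not bits:
        return "0", []
    runs = [len(list(group)) for _, group in groupby(bits)]
    return bits[0], runs


def rle_decode(first_bit, runs):
    """Rebuild the bit string; the runs alternate, starting with first_bit."""
    out, bit = [], first_bit
    for n in runs:
        out.append(bit * n)
        bit = "1" if bit == "0" else "0"
    return "".join(out)


for s in ("0011110001111000", "0000011110", "111100011100"):
    first, runs = rle_encode(s)
    print(s, "->", first, runs)
    assert rle_decode(first, runs) == s   # round trip is lossless
```

Under this convention, 0011110001111000 becomes first bit 0 with runs [2, 4, 3, 4, 3].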
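A minimal sketch of Shannon-Fano coding for the last two exercises: sort the symbols by decreasing probability, split the list where the two parts' totals are closest, append 0 to the upper part and 1 to the lower part, and recurse. The symbol names x1..x7 are hypothetical labels for exercise 1, and exercise 2 uses symbol counts as weights (only their ratios matter). When two cuts are equally balanced this sketch keeps the first one, so a hand solution that breaks ties differently may give different but equally valid codewords.

```python
from collections import Counter
from fractions import Fraction


def shannon_fano(weights):
    """Return {symbol: codeword}; weights may be probabilities or counts."""
    items = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
    codes = {sym: "" for sym, _ in items}

    def split(group):
        if len(group) < 2:
            return
        total = sum(w for _, w in group)
        # Find the cut that makes the two parts' total weights closest;
        # on a tie the first (leftmost) such cut is kept.
        best_cut, best_diff, running = 1, None, 0
        for i in range(1, len(group)):
            running += group[i - 1][1]
            diff = abs(total - 2 * running)
            if best_diff is None or diff < best_diff:
                best_cut, best_diff = i, diff
        for sym, _ in group[:best_cut]:
            codes[sym] += "0"          # upper part gets 0
        for sym, _ in group[best_cut:]:
            codes[sym] += "1"          # lower part gets 1
        split(group[:best_cut])
        split(group[best_cut:])

    split(items)
    return codes


# Exercise 1 (the names x1..x7 are hypothetical labels for the symbols).
p1 = {f"x{i}": Fraction(v) for i, v in enumerate(
    ["0.25", "0.2", "0.15", "0.15", "0.10", "0.10", "0.05"], start=1)}
codes1 = shannon_fano(p1)

# Exercise 2: symbol counts in the message act as unnormalised weights.
codes2 = shannon_fano(Counter("AADCABCBAB"))

for table in (codes1, codes2):
    for sym, word in table.items():
        print(sym, word, len(word))
```

For exercise 1 this gives codeword lengths 2, 2, 3, 3, 3, 4, 4 (L_avg = 2.7 bits); for the message AADCABCBAB it gives A = 0, B = 10, C = 110, D = 111.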