Lecture 24
Thursday, November 17 — CS 475 Networks

Reminders: Programming Project 5 (reassigned Homework Problem 5.48) is due today. Homework 7 is due next Tuesday. Questions?

Outline: Chapter 7 - End-to-End Data
  7.1 Presentation Formatting
  7.2 Multimedia Data
  7.3 Summary

Data Compression

Data compression can allow us to send data at a faster rate than the network seemingly supports. Compression can, for example, allow us to send a 10 Mbps video stream over a 1 Mbps link. Compression algorithms can be categorized as lossy or lossless. Greater compression ratios can be achieved with lossy algorithms; the lost data may not be noticeable in images or in audio and video streams.

Compression and decompression are often time consuming. Greater throughput is achieved only if

    x/Bc + x/(r * Bn) < x/Bn

where x is the amount of (uncompressed) data, r is the compression ratio, Bc is the bandwidth at which data can be run through the compressor/decompressor, and Bn is the network bandwidth.

Lossless Compression: Run-Length Encoding (RLE)

In RLE, consecutive occurrences of a symbol are replaced with one copy of the symbol and a count. The string AAABBCDDDD would be encoded as 3A2B1C4D (an 80% compression ratio). RLE works well on scanned text images, where ratios of 8-to-1 can be achieved; it is used to transmit faxes. If there is not much adjacent identical data, RLE can actually increase the size of a file.

Lossless Compression: Differential Pulse Code Modulation (DPCM)

In DPCM, a reference symbol is output first, and then the difference between each symbol and the reference is output. The string AAABBCDDDD would be output as A0001123333. Differences between 0 and 3 can be encoded using only 2 bits, so the compression ratio is 35%.
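As a sketch, the two lossless schemes above can be expressed in a few lines of Python (the function names are illustrative, and real implementations operate on bytes or pixel values rather than characters):

```python
# Minimal sketches of run-length encoding (RLE) and DPCM as described above.

from itertools import groupby

def rle_encode(s):
    # Replace each run of identical symbols with its count and the symbol,
    # e.g. "AAABBCDDDD" -> "3A2B1C4D".
    return "".join(f"{len(list(run))}{sym}" for sym, run in groupby(s))

def dpcm_encode(s):
    # Output the first symbol as the reference, then each symbol's
    # difference from that reference (shown here as decimal digits).
    ref = s[0]
    return ref + "".join(str(ord(c) - ord(ref)) for c in s)

print(rle_encode("AAABBCDDDD"))   # 3A2B1C4D
print(dpcm_encode("AAABBCDDDD"))  # A0001123333
```

The DPCM output reproduces the slide's example: one 8-bit reference plus ten 2-bit differences is 28 bits, versus 80 bits uncompressed, giving the 35% ratio.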
When the difference becomes too large, a new reference symbol is output.

A variation gives delta encoding, in which each symbol is encoded as the difference from the previous one. AAABBCDDDD would become A001011000. Encoding each difference using only one bit gives a compression ratio of 21%. Delta encoding may be followed by RLE, since there may be long strings of 0s. Delta encoding works well when adjacent pixels in an image are similar.

Lossless Compression: Dictionary Methods

The UNIX compress command uses a variation of the Lempel-Ziv (LZ) algorithm, a dictionary-based method. With this approach, a group of symbols (a word) is replaced by its index into a dictionary. LZ builds the dictionary adaptively as it compresses; since the decompressor can rebuild the same dictionary while decoding, the dictionary need not be transmitted along with the compressed data.

Image Compression (GIF)

The Graphics Interchange Format (GIF) uses 8-bit pixels to index into a 256-color palette of 24-bit colors. The image is compressed using LZ, with common sequences of pixels making up the dictionary. Patent restrictions on the LZ algorithm spurred the development in 1995 of the Portable Network Graphics (PNG) format as a replacement for images on the web. The patent expired in 2003.

Image Compression (JPEG)

The JPEG (Joint Photographic Experts Group) image format uses a lossy compression algorithm that offers better ratios than GIF or PNG. JPEG is related to MPEG, which is a video data format. JPEG compression occurs in three phases; the image is fed through the phases one 8 x 8 block at a time.

Image Compression (JPEG): DCT Phase

To simplify the discussion, assume we are working only with 8-bit grayscale images, where 0 is white and 255 is black.
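Before turning to the DCT, the delta-encoding variant described earlier can be sketched the same way (illustrative names again; differences are shown as digits rather than single bits):

```python
# Minimal delta-encoding sketch: each symbol is encoded as its
# difference from the previous symbol.

def delta_encode(s):
    # First symbol verbatim, then successive differences,
    # e.g. "AAABBCDDDD" -> "A001011000".
    return s[0] + "".join(str(ord(b) - ord(a)) for a, b in zip(s, s[1:]))

def delta_decode(enc):
    # Rebuild the string by accumulating the differences.
    out = [enc[0]]
    for d in enc[1:]:
        out.append(chr(ord(out[-1]) + int(d)))
    return "".join(out)

print(delta_encode("AAABBCDDDD"))  # A001011000
```

One 8-bit symbol plus nine 1-bit differences is 17 bits versus 80 uncompressed, the 21% ratio quoted above.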
In the discrete cosine transform (DCT) phase, an 8 x 8 matrix of pixel values is converted to an 8 x 8 matrix of frequency coefficients. Low-frequency coefficients define gross image features, while high-frequency coefficients correspond to the fine details.

The DCT and its inverse are defined by

    DCT(i,j) = (1/sqrt(2N)) * C(i) * C(j) *
               sum_{x=0..N-1} sum_{y=0..N-1} pixel(x,y) *
               cos((2x+1) * i * pi / 2N) * cos((2y+1) * j * pi / 2N)

    pixel(x,y) = (1/sqrt(2N)) *
                 sum_{i=0..N-1} sum_{j=0..N-1} C(i) * C(j) * DCT(i,j) *
                 cos((2x+1) * i * pi / 2N) * cos((2y+1) * j * pi / 2N)

where C(i) = 1/sqrt(2) if i = 0, and C(i) = 1 if i > 0.

The frequency coefficient at (0,0) is the DC coefficient and corresponds to the average of the 64 pixel values. The other coefficients correspond to higher and higher frequencies, and thus to finer and finer detail. The DCT itself is lossless: the image can be reconstructed exactly from the coefficients using the inverse transform. It is the quantization phase that introduces loss.

Image Compression (JPEG): Quantization Phase

To see how quantization works, imagine truncating numbers less than 100 to multiples of 10. For example, the numbers 45, 98, 23, 66 and 7 would become 4, 9, 2, 6 and 0. These numbers can be encoded using only 4 bits instead of 7. The truncation operation is equivalent to dividing each of the original numbers by a quantum (here 10) using integer division.

Instead of using the same quantum value for all 64 frequency coefficients, JPEG uses quantization tables.
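A direct, unoptimized transcription of the DCT definitions above is shown below (a sketch only; real codecs use fast factored algorithms). Note that for N = 8, 1/sqrt(2N) = 1/4, so this forward/inverse pair round-trips exactly, illustrating that the DCT alone is lossless:

```python
# Direct 8x8 DCT and inverse, transcribed from the definitions above.

from math import cos, pi, sqrt

N = 8

def C(k):
    # Scale factor from the DCT definition.
    return 1 / sqrt(2) if k == 0 else 1.0

def dct(pixel):
    return [[(1 / sqrt(2 * N)) * C(i) * C(j) *
             sum(pixel[x][y] *
                 cos((2 * x + 1) * i * pi / (2 * N)) *
                 cos((2 * y + 1) * j * pi / (2 * N))
                 for x in range(N) for y in range(N))
             for j in range(N)] for i in range(N)]

def idct(coeff):
    return [[(1 / sqrt(2 * N)) *
             sum(C(i) * C(j) * coeff[i][j] *
                 cos((2 * x + 1) * i * pi / (2 * N)) *
                 cos((2 * y + 1) * j * pi / (2 * N))
                 for i in range(N) for j in range(N))
             for y in range(N)] for x in range(N)]

# Round-trip a sample 8x8 grayscale block; the reconstruction error is
# only floating-point noise.
block = [[(x * 7 + y * 13) % 256 for y in range(N)] for x in range(N)]
restored = idct(dct(block))
print(max(abs(block[x][y] - restored[x][y])
          for x in range(N) for y in range(N)))  # ~0
```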
An example table is:

    Quantum =
        [  3  5  7  9 11 13 15 17
           5  7  9 11 13 15 17 19
           7  9 11 13 15 17 19 21
           9 11 13 15 17 19 21 23
          11 13 15 17 19 21 23 25
          13 15 17 19 21 23 25 27
          15 17 19 21 23 25 27 29
          17 19 21 23 25 27 29 31 ]

The quantized frequency coefficient values are computed using:

    QuantizedValue(i,j) = IntegerRound(DCT(i,j) / Quantum(i,j))

Image Compression (JPEG): Encoding Phase

A variant of RLE is used to compress an 8 x 8 block along a zigzag path. The individual coefficients are encoded using a Huffman code (short codes are assigned to the most frequently occurring coefficients).

[Figure: zigzag traversal of the quantized frequency coefficients for RLE.]

Image Compression (JPEG): Color Images

A color image can be thought of as being made up of separate red, green and blue images; this is known as the RGB representation. JPEG typically uses the YUV (luminance and chrominance) representation instead. Each of the Y, U and V images is processed as described above for a grayscale image. JPEG is capable of compressing 24-bit images at approximately a 30-to-1 ratio.

Video Compression (MPEG)

Video can be thought of as a succession of still images, or frames. Individual frames can be compressed using a DCT-based technique, as with JPEG. By additionally removing the redundant information present in successive frames, compression ratios on the order of 150-to-1 can be achieved.

Video Compression (MPEG): Frame Types

MPEG takes a video stream and produces intrapicture (I), predicted picture (P) and bidirectional predicted picture (B) frames. I frames can be considered to be JPEG-compressed frames.
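The JPEG quantization step described above can be sketched as follows (illustrative names; the example table is the one given earlier). The loss appears here: dequantizing does not restore the original coefficients exactly.

```python
# Quantization of DCT coefficients using the example table above;
# this is the lossy step of JPEG.

# The example table follows the pattern Quantum(i,j) = 3 + 2*(i + j).
quantum = [[3 + 2 * (i + j) for j in range(8)] for i in range(8)]

def quantize(dct_block):
    # QuantizedValue(i,j) = IntegerRound(DCT(i,j) / Quantum(i,j))
    return [[round(dct_block[i][j] / quantum[i][j]) for j in range(8)]
            for i in range(8)]

def dequantize(q_block):
    # Approximate reconstruction: multiply back by the quantum.
    return [[q_block[i][j] * quantum[i][j] for j in range(8)]
            for i in range(8)]

# A coefficient of 100 at (0,0), where the quantum is 3, becomes 33 and
# dequantizes to 99: a small error introduced by quantization.
coeffs = [[100 for _ in range(8)] for _ in range(8)]
q = quantize(coeffs)
print(q[0][0], dequantize(q)[0][0])  # 33 99
```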
They are self-contained; P and B frames specify differences relative to an I frame. MPEG uses 16 x 16 blocks of pixels, with the U and V components downsampled to an 8 x 8 block each.

Video Compression (MPEG): Frame Types

A macroblock in a B frame is represented with a 4-tuple: (1) the coordinates (x, y) of the macroblock, (2) a motion vector (xp, yp) relative to the previous reference frame, (3) a motion vector (xf, yf) relative to the next reference frame, and (4) a delta (δ) for each pixel in the macroblock. The pixel values in the current frame are computed using:

    Fc(x,y) = (Fp(x + xp, y + yp) + Fn(x + xf, y + yf)) / 2 + δ(x,y)

Transmitting MPEG over a Network

The MPEG specification defines the format of a video stream but does not specify how the stream is broken into network packets. An MPEG main profile stream has a nested structure. At the top level, the video stream is broken into groups of pictures (GOPs) separated by a SeqHdr. The SeqHdr contains a quantization matrix for the I frames and one for the B and P frames.

A GOP consists of pictures (I, B and P frames), and pictures are broken into slices, each a region of the picture. A slice consists of macroblocks, and a macroblock is made up of six 8 x 8 blocks: one each for the U and V components, and four for the Y component, which is 16 x 16. MPEG allows the frame rate, resolution, mix of frame types, and quantization to change between GOPs, allowing picture quality to be traded for network bandwidth adaptively.

By using UDP and breaking the stream at selected points (macroblock boundaries), we can limit the loss in picture quality due to a lost network packet.
This is an example of application-level framing. The choice of video encoding may depend on latency as well as bandwidth. For interactive videoconferencing, where low latency is desirable, an encoding using only I and P frames (I P P P P I) may be preferable to one that also uses B frames (I B B B B P B B B B I), since decoding a B frame requires waiting for the next reference frame.

Audio Compression (MP3)

CD-quality sound requires sampling at 44.1 kHz with 16-bit samples. A stereo stream therefore implies a bit rate of

    2 x 44,100 x 16 = 1.41 Mbps

Synchronization and error correction require sending 49 bits for every 16 bits of audio, resulting in

    (49/16) x 1.41 Mbps = 4.32 Mbps

MPEG also defines audio compression formats; the most popular is Layer III (MP3), which reduces the bandwidth requirement to 128 kbps. MP3 splits the audio into frequency subbands. The subbands are broken into blocks, transformed (DCT), quantized and Huffman-encoded, all similar to video compression. The audio frames are interleaved with video frames to form a complete MPEG stream.

In-class Exercises

To be submitted as part of Homework 8: Problems 7.25 and 7.26 on page 629. Some notes:
- You can create image files in plain (P2 type) PGM format using a text editor; see the PGM man page for the description.
- cjpeg produces JPEG files from PGM files; djpeg produces raw PBM files from JPEG files.
- Raw files can be converted to plain format using pnmtoplainpnm or (depending on which version of the PBM utilities you have installed) pnmtopnm -plain.
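As a final check, the CD bit-rate arithmetic from the MP3 slides above can be verified directly:

```python
# CD-audio bit-rate arithmetic from the audio compression discussion above.

channels, rate_hz, bits_per_sample = 2, 44_100, 16

# Raw stereo PCM bit rate: 2 x 44,100 x 16.
raw_bps = channels * rate_hz * bits_per_sample
print(raw_bps / 1e6)                 # 1.4112, i.e. about 1.41 Mbps

# Synchronization and error correction expand each 16 bits to 49 bits.
on_disc_bps = raw_bps * 49 / 16
print(round(on_disc_bps / 1e6, 2))   # 4.32 Mbps
```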