ENEE408G Multimedia Signal Processing
Lab Manual on Image, Video, Audio and Speech
K. J. Ray Liu, Min Wu, Guan-Ming Su
Department of Electrical and Computer Engineering, University of Maryland, College Park
Last Updated: Spring 2003. © Copyright 2003. All rights reserved.

ENEE408G Multimedia Signal Processing
Design Project on Image Processing and Digital Photography

The Goals:
1. Understand the fundamentals of digital image processing.
2. Learn how to enhance image quality and how to compress images.
3. Explore artistic techniques using digital image processing.

Note: Three icons are used throughout this manual. One asks you to put your discussion, flowchart, block diagram, or plots in your report; a second asks you to include the multimedia data in your report; a third asks you to include your source code (Matlab, Basic, or C/C++). Save images in BMP format unless otherwise stated.

Part I. Color Coordinate

Several color coordinate systems are commonly used in practice. Each represents a color space consisting of several components and has its own special purpose. In this section, we explore three color coordinate systems: RGB (Red, Green and Blue), HSL (Hue, Saturation and Lightness; a related form is HSV, where V represents the brightness value), and YUV.

1. Separate the three components of a color image using Paint Shop Pro.
(a) Open the Flower.bmp file in Paint Shop Pro. Using Colors → Split Channel → Split to RGB, we can split a color image into its red, green, and blue components. What can you observe from these three images?
(b) Choose the red component and adjust its value via Colors → Adjust → Brightness/Contrast. Set Brightness to 75% and Contrast to 0%. Then combine this new red component with the original green and blue components using the Combine RGB dialog box, obtained from Colors → Combine Channel → Combine from RGB. Observe and store this combined image.
(c) Repeat (a) and (b) but with the green component instead of the red one.
(d) Repeat (a) and (b) but with the blue component instead of the red one.
(e) Compare these three new images with the original image. Describe what you observe.

2. Repeat 1(a)~1(d) but change the color coordinate to HSL, using Colors → Split Channel → Split to HSL and Colors → Combine Channel → Combine from HSL. Save these new images. Discuss the role that each component plays in this color coordinate.

3. Explore the YCbCr (YUV) color coordinate using Matlab. (A sketch of these steps follows this list.)
(a) Write a Matlab script with the following steps:
(i) RGB to YCbCr: use imread.m and rgb2ycbcr.m to separate Flower.bmp into Y, Cb, and Cr.
(ii) Downsampling: use imresize.m to downsample the Cb and Cr components in each dimension by factors of 1.5, 2, 4, 6, and 8.
(iii) Upsampling: use imresize.m to upsample the downsampled Cb and Cr components back to their original sizes.
(iv) YCbCr to RGB: combine Y with the two new components Cb and Cr, and transform back to the RGB color coordinate with ycbcr2rgb.m. Display and save these new images using imshow.m and imwrite.m, respectively.
(b) Explain what you observe and discuss the advantages of the color coordinate transform from RGB to YUV.
(c) Compare the RGB and YUV representations. Under what situation would you adopt each color representation?
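A minimal Matlab sketch of step 3(a), shown here for the factor-2 case (the other factors follow the same pattern):

rgb = imread('Flower.bmp');
ycc = rgb2ycbcr(rgb);
Y  = ycc(:,:,1);  Cb = ycc(:,:,2);  Cr = ycc(:,:,3);

factor = 2;                           % also try 1.5, 4, 6, and 8
CbSmall = imresize(Cb, 1/factor);     % downsample each dimension
CrSmall = imresize(Cr, 1/factor);
CbRec = imresize(CbSmall, size(Cb));  % upsample back to the original size
CrRec = imresize(CrSmall, size(Cr));

rec = ycbcr2rgb(cat(3, Y, CbRec, CrRec));
imshow(rec);
imwrite(rec, 'Flower_sub2.bmp');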
4. Blue Background Extraction
On TV weather news, we often see weather forecasters standing in front of maps or Doppler radar images. In traditional news production, they simply stand in front of a blue curtain; a special camera system extracts the forecaster's image and adds it in front of the weather-related images. The basic idea of this camera system is to split an image into RGB channels, create a mask based on the information in the blue channel, and use this mask to extract the object. In this subsection, develop your own scheme and use Paint Shop Pro to extract the flower object from BlueBG.bmp and add it in front of another image, Sand.jpg.
(a) Draw a block diagram of your scheme and describe the procedure you followed in Paint Shop Pro.
(b) Save the final image of the flower with sand.
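One way to prototype the masking idea in Matlab before reproducing it in Paint Shop Pro; the threshold values are illustrative, and the two images are assumed to have the same size:

fg = im2double(imread('BlueBG.bmp'));   % flower on a blue screen
bg = im2double(imread('Sand.jpg'));     % new background (same size assumed)

R = fg(:,:,1);  G = fg(:,:,2);  B = fg(:,:,3);
blue = (B > 0.5) & (B > R + 0.2) & (B > G + 0.2);  % detect blue-screen pixels
mask = double(~blue);                              % 1 = object, 0 = background

out = bg;
for c = 1:3                             % composite the object onto the sand
    out(:,:,c) = mask .* fg(:,:,c) + (1 - mask) .* bg(:,:,c);
end
imshow(out);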
Part II. Image Enhancement

An image can be corrupted during transmission or captured under poor conditions. We can use image enhancement techniques to enhance certain features and improve the visual quality of an image.

1. Histogram: The histogram of a gray image is the distribution of its luminance values. In this part, we examine the histogram of the gray-scale Lena image using Paint Shop Pro. After opening Lena.bmp, click the histogram icon to open the Histogram Window.
(a) Histogram Equalization: Histogram equalization is an image enhancement technique that makes the histogram more uniform. Use Colors → Histogram Functions → Equalize to perform it. Apply this technique to Lena.bmp and LenaDark.bmp. Use the Histogram Window to observe the histogram before and after equalization. Save the equalized images and record the mean and median values of both the original and equalized images.
(b) Histogram Stretch: Histogram stretch spreads the original distribution over the whole range of gray levels while maintaining its overall shape. Use Colors → Histogram Functions → Stretch. Save the stretched image. Compare the stretched results of Lena.bmp and LenaDark.bmp with (a).

2. Histogram Adjustment: In some situations, we would like to emphasize a specific band of gray levels. We can shape the histogram of an image toward a desired histogram using the Histogram Adjustment dialog box in Paint Shop Pro.
(a) Open Lena.bmp. Use Colors → Histogram Functions → Histogram Adjustment to open the Histogram Adjustment dialog box. Adjust the values of Midtones Compress [1] and Gamma [2] and observe the adjusted image.
(b) We have already applied histogram equalization and stretch to LenaDark.bmp. Now use the Histogram Adjustment dialog box to reshape the histogram of LenaDark.bmp. Save this reshaped image and record the parameter values you used. Compare your result with the results of histogram equalization and stretch.

3. Image Sharpening: The goal of image sharpening is to enhance details or blurred regions of an image.
(a) Open LenaBlur.bmp in Paint Shop Pro. Use Effects → Sharpen → Sharpen and Sharpen More from the menu bar to improve the quality of this image. Observe the effect and save the sharpened image. Discuss whether a blurred image can be completely recovered.

[1] Midtone Compress is a non-linear mapping of the original gray levels onto another scale. The higher the value you choose, the more gray levels are mapped into the middle band. A larger negative value expands the original mid-band pixels over a wider gray-scale range.
[2] The response of photographic film is non-linear and can be written as d = γ·log10(w) − d0, where w is the incident light intensity, d is the optical density (the response of the film to w), and γ is the gamma of the film. A similar non-linear response is associated with visual displays. We can compensate for the non-linear response by an inverse procedure with γ.

(b) One approach to image sharpening is a spatial high-boost filter. Paint Shop Pro provides User Defined filters under the Effects menu. You can create your own filter by choosing New in the User Defined Filters window; an Edit User Defined Filter window will pop up. [Figure: the Edit User Defined Filter window with the filter coefficients to enter.] Key in the parameters shown in the figure and apply the filter to LenaBlur.bmp. Save this sharpened image.
(c) In general, we can create an m x m spatial high-boost filter whose entries are all -1 except for the center entry w, with the whole filter scaled by 1/m^2:

    H = (1/m^2) * [ -1  -1  ...  -1
                    -1   w  ...  -1
                    ...
                    -1  -1  ...  -1 ],   where w = A * m^2 - 1 and A >= 1.

Generate a 5x5 and a 7x7 high-boost filter to improve LenaBlur.bmp and save the improved images. Record the m and A you used in each case and compare the resulting image quality with (b). (A Matlab sketch follows.)
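A sketch of (c) in Matlab, assuming LenaBlur.bmp is a grayscale image. Note that the filter's DC gain is A − 1, so A = 2 keeps the overall brightness unchanged:

m = 5;  A = 2;                     % A >= 1; A = 2 gives DC gain A - 1 = 1
H = -ones(m) / m^2;                % all entries -1, scaled by 1/m^2
H(ceil(m/2), ceil(m/2)) = (A*m^2 - 1) / m^2;   % center weight w = A*m^2 - 1

im  = double(imread('LenaBlur.bmp'));
out = filter2(H, im);              % spatial high-boost filtering
out = uint8(min(max(out, 0), 255));% clip to [0, 255]
imshow(out);
imwrite(out, 'LenaSharp5x5.bmp');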
4. Noise Cleaning: An image can become noisy, for example during transmission through a noisy channel. To recover the original perceptual quality of the image, we can perform noise cleaning. Two noise models are widely used in image processing: Gaussian noise and salt-and-pepper noise. In this subsection, we use two approaches, an average filter (spatial low-pass filter) and a median filter, to remove these two kinds of noise. (Note: BoatPxx.tif denotes an image corrupted by salt-and-pepper noise and BoatGxx.tif one corrupted by Gaussian noise.)
(a) Clean salt-and-pepper noise using the average filter: open BoatP05.tif, BoatP25.tif, and BoatP50.tif in Paint Shop Pro. Apply Effects → Blur → Average with Filter Aperture set to 3 and save the filtered images.
(b) Clean salt-and-pepper noise using the median filter: open BoatP05.tif, BoatP25.tif, and BoatP50.tif. Apply Effects → Noise → Median Filter. You can adjust the Filter Aperture to enhance the photo. Save the filtered images.
(c) Clean Gaussian noise using the average filter: open BoatG01.tif, BoatG10.tif, and BoatG20.tif and use the average filter to clean up the noise. Save the filtered images.
(d) Clean Gaussian noise using the median filter: open BoatG01.tif, BoatG10.tif, and BoatG20.tif and use the median filter. Save the filtered images.
(e) Observe your results from (a) to (d). For each type of noise, explain which filter cleans it up better. (A Matlab comparison sketch follows.)
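A quick Matlab side-by-side of the two filters on one of the salt-and-pepper images:

im = imread('BoatP25.tif');

avgOut = uint8(filter2(ones(3)/9, double(im)));  % 3x3 average (low-pass)
medOut = medfilt2(im, [3 3]);                    % 3x3 median

subplot(1,3,1); imshow(im);     title('noisy');
subplot(1,3,2); imshow(avgOut); title('average filter');
subplot(1,3,3); imshow(medOut); title('median filter');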
5. Edge Detection: Edges are local discontinuities in luminance. Since edges indicate the physical extent of an object, edge detection plays an important role in computer vision. In this part, we experiment with a few edge detection algorithms.
(a) We can detect edges using Paint Shop Pro's built-in operators. Apply Effects → Edge → Find All to Pepper.bmp and Baboon.bmp. Save these images.
(b) Several spatial edge detectors can be studied through the User Defined filters mentioned in 3(b). Apply the following six filters to Pepper.bmp and Baboon.bmp and use Image → Arithmetic → Add to combine the corresponding anti-diagonal-direction and diagonal-direction (or row-direction and column-direction) results. Save these images.

Roberts:
    anti-diagonal-direction filter: [ 0 0 -1; 0 1 0; 0 0 0 ]
    diagonal-direction filter:      [ -1 0 0; 0 1 0; 0 0 0 ]
Prewitt (division factor 3):
    row-direction filter:    [ 1 0 -1; 1 0 -1; 1 0 -1 ]
    column-direction filter: [ -1 -1 -1; 0 0 0; 1 1 1 ]
Sobel (division factor 4):
    row-direction filter:    [ 1 0 -1; 2 0 -2; 1 0 -1 ]
    column-direction filter: [ -1 -2 -1; 0 0 0; 1 2 1 ]

(c) Compare and discuss the results you obtained in (b). Which filter performs best?

Part III. Image Compression – JPEG

JPEG (Joint Photographic Experts Group) [3] is a popular image compression format. The JPEG encoder can compress images to a much smaller data size without too much distortion. Users select a quality factor to generate an image with either higher quality but a larger file size or lower quality but a smaller file size. JPEG supports several display modes. For example, JPEG progressive mode displays an image progressively from coarse to fine: over a slow network connection, users can browse a rough image first, with finer detail added later. In this part, we first explore JPEG compression and then design a JPEG-like image codec.

1. JPEG Experiment
In this experiment, we compare different parameters of the JPEG standard using Paint Shop Pro. Save a .jpg via File → Save As; in the Save As dialog box, choose JPEG as the Save As type and click the Options button, and a dialog box will pop up.
(a) Standard Mode: Save LenaC.bmp in standard mode. Adjust the compression factor [4] to 10, 20, 30, ..., 90 and record the resulting file sizes. Discuss the relation between the compression factor and image quality. Zoom into the images and describe the artifacts caused by the 8x8 blocks. Try Effects → Enhance Photo → JPEG Artifact Removal to remove the artifacts.
(b) Progressive Mode: Download an aerial image of size at least 1024x1024 from the USC image database [5]. Generate a progressive-mode JPEG image at the best quality. Put this image on your web site and observe how it loads and displays under the following two conditions: a high-speed connection (e.g., the computers in Jasmine during lab hours) and a low-speed connection (e.g., dial-up from home) [6]. Describe your observations.

[3] A useful tutorial on the JPEG standard can be downloaded at ftp://ftp.uu.net/graphics/jpeg/wallace.ps.gz
[4] The commonly used relation between the quality factor Q and the quantization scale factor is:
    scale_factor(%) = 5000/Q      for 1 <= Q <= 50,
                      200 - 2*Q   for 50 <= Q <= 99,
                      1           for Q = 100.
The scale factor is a multiplicative factor applied to the JPEG quantization table. Paint Shop Pro defines its Compression Factor (CF) through
    scale_factor = CF/50          for 1 <= CF <= 50,
                   50/(100 - CF)  for 50 <= CF <= 99.
[5] USC image database: http://sipi.usc.edu/services/database/Database.html
[6] Make sure your browser supports JPEG progressive mode; pick one that does, such as Netscape 4.7. Otherwise, you won't see the progressive effect.

2. Design a JPEG-like Codec
(a) In this part, we design a JPEG-like image codec and implement it as two Matlab scripts, one for the encoder and one for the decoder. The block diagram is as follows.
[Block diagram. Encoder: RGB image → RGB to YCbCr; Y passes through DCT → QY → Zigzag; Cb and Cr are each downsampled by 2, then DCT → QCb/QCr → Zigzag; all three streams feed Entropy Encoding → compressed image. Decoder: compressed image → Entropy Decoding → inverse Zigzag → iQY/iQCb/iQCr → IDCT; Cb and Cr are each upsampled by 2; YCbCr to RGB → reconstructed RGB image. "↓2" denotes downsampling by 2 and "↑2" upsampling by 2.]
Here are some hints for implementing the key modules (a sketch of the luminance path follows this list):
(i) Image I/O: imread.m and imwrite.m.
(ii) RGB ↔ YCbCr: use rgb2ycbcr.m and ycbcr2rgb.m.
(iii) The DCT block means an NxN block-based DCT; you may use dct2.m with blkproc.m. IDCT is the NxN block-based inverse DCT; use idct2.m with blkproc.m.
(iv) QY, QCb, and QCr represent NxN quantization with quantization tables [7] for the luminance and chrominance components, respectively, and iQY, iQCb, iQCr denote the corresponding reconstruction. You should design this part yourself.
(v) The downsampling and upsampling factor is 2. You can use imresize.m or simple spatial sampling and averaging.
(vi) You can use the provided ZigzagMtx2Vector.m [8] to perform zigzag scanning and Vector2ZigzagMtx.m [9] for the inverse zigzag.
(vii) For entropy encoding, use the provided JPEG_entropy_encode.m [10]. This function reads a matrix in which each row represents a vectorized DCT block, writes a bit stream to a file always named JPEG.jpg, and returns the length of this file. You can process the luminance part first, rename the file, and then process the chrominance part. For entropy decoding, use JPEG_entropy_decode.m [11], which performs the inverse functionality.
(viii) Make sure the pixel values of the reconstructed image are integers within [0, 255]. If a value is above 255, clip it to 255; if it is below 0, clip it to 0.

[7] JpegLumQuanTable.m returns the JPEG standard luminance quantization table, and JpegChrQuanTable.m returns the JPEG standard chrominance quantization table. Use the standard tables as a reference and design your own.
[8] function out = ZigzagMtx2Vector(in): converts a matrix to a vector in zigzag order, e.g., [1 2 6; 3 5 7; 4 8 9] → [1 2 3 4 5 6 7 8 9].
[9] function out = Vector2ZigzagMtx(in): converts a vector to a square matrix in zigzag order, e.g., [1 2 3 4 5 6 7 8 9] → [1 2 6; 3 5 7; 4 8 9].
[10] function [Len] = JPEG_entropy_encode(rowN, colN, dct_block_size, Q, ZZDCTQIm, encoder_path, DisplayProcess_Flag). Input: rowN (1x1): the number of rows; colN (1x1): the number of columns; dct_block_size (1x1): the DCT dimension; Q: quantization table of size dct_block_size x dct_block_size; ZZDCTQIm: the zigzagged image matrix after quantization, of size (rowN*colN/dct_block_size^2) x (dct_block_size^2); encoder_path (string): the absolute path of this function and jpeg_entropy_encode.exe (remember to set Matlab's "current directory" to this path so the .exe file can run); DisplayProcess_Flag (1x1): flag for displaying the zero-run pairs and Huffman table (in JPEG_entropy_encode.html, which details the JPEG Huffman encoding). Output: Len (1x1): the compressed file length. This Matlab function is an interface that generates a text file, JPEG_DCTQ_ZZ.txt, and runs jpeg_entropy_encode.exe.
[11] function [rowN, colN, dct_block_size, iQ, iZZDCTQIm] = JPEG_entropy_decode(decoder_path). Input: decoder_path (string): the absolute path of this function and jpeg_entropy_decode.exe (set Matlab's "current directory" to this path so the .exe file can run). Output: rowN, colN, and dct_block_size as above; iQ: the quantization table of size dct_block_size x dct_block_size; iZZDCTQIm: the zigzagged image matrix after reconstruction, of size (rowN*colN/dct_block_size^2) x (dct_block_size^2). This function is an interface that generates and interprets a text file, JPEG_iDCTQ_ZZ.txt, produced by jpeg_entropy_decode.exe.
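A sketch of the encoder's luminance path (N = 8), using the helper functions described above. The quantization table design and the chrominance path (downsample Cb and Cr by 2 first) are left to you, and the call to JPEG_entropy_encode.m assumes the current directory holds the helper files:

N   = 8;
rgb = imread('LenaC.bmp');
ycc = double(rgb2ycbcr(rgb));
Y   = ycc(:,:,1);
[rows, cols] = size(Y);                 % assumed multiples of N
Q   = JpegLumQuanTable;                 % start from the standard table

dctY   = blkproc(Y, [N N], 'dct2');                   % block-based DCT
quantY = round(dctY ./ repmat(Q, rows/N, cols/N));    % quantize every block

nBlk = (rows/N) * (cols/N);             % zigzag each block into one row
ZZ = zeros(nBlk, N*N);
k = 0;
for r = 1:N:rows
    for c = 1:N:cols
        k = k + 1;                      % (transpose if the helper returns a column)
        ZZ(k,:) = ZigzagMtx2Vector(quantY(r:r+N-1, c:c+N-1));
    end
end
Len = JPEG_entropy_encode(rows, cols, N, Q, ZZ, pwd, 0);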
(b) Adjust the following parameters (and matrices) on LenaC.bmp: N, QY, QCb, and QCr. Using the two definitions below, plot the PSNR of the luminance component (y-axis) versus CR (x-axis).
(i) Peak signal-to-noise ratio (PSNR):

    PSNR = 10 * log10( peak_intensity^2 / MSD ),
    MSD = (1/(w*h)) * sum_i sum_j (err[i][j])^2,

where peak_intensity = 255, w is the width and h the height of the images, and err[i][j] is the difference between pixel (i, j) of the original image and that of the reconstructed image.
(ii) Compression ratio (CR): CR = original image file size (in bytes) / compressed image file size (in bytes).
Remember that a compressed image should contain both luminance and chrominance data, even though we store them in several files here for simplicity. Reminder: your codec should achieve as high a PSNR as possible at each reasonable compression ratio.
(c) Discuss the advantage of zigzag scanning. Verify your conclusion by removing the Zigzag and iZigzag modules from your codec, plotting the PSNR vs. CR figure, and comparing the result with (b). (A sketch of the PSNR/CR measurement follows.)
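One way to organize the measurement, as a hypothetical helper function rate_quality.m; the compressed size in bytes is whatever your encoder reports (e.g., the Len values summed over the luminance and chrominance files):

function [psnrY, cr] = rate_quality(origFile, recFile, compressedBytes)
% PSNR of the luminance component plus the compression ratio.
orig = double(rgb2ycbcr(imread(origFile)));
rec  = double(rgb2ycbcr(imread(recFile)));
err  = orig(:,:,1) - rec(:,:,1);          % luminance error
msd  = sum(err(:).^2) / numel(err);
psnrY = 10 * log10(255^2 / msd);

info = dir(origFile);                     % original file size in bytes
cr   = info.bytes / compressedBytes;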
Part IV: Mobile Computing and Pocket PC Programming

From the above three parts, we have learned the fundamentals of image processing. Now design a simple Pocket PC application related to digital image processing using the Microsoft eMbedded Tools. You can refer to the "ENEE408G Multimedia Signal Processing Mobile Computing and Pocket PC Programming Manual" and extend the examples there.

Part V: Digital Photography

During the past few years, digital cameras have become much more powerful. An advantage of a digital camera is that we get images immediately after capture, can erase the ones we do not like, modify them if the lighting is not perfect, and transfer them easily through electronic connections. In this section, use the digital camera to take pictures and modify them artistically and creatively. You can use the camera's built-in functions for real-time processing while taking the pictures, and apply the skills you have learned in Paint Shop Pro and Matlab for post-processing. We will use the color printer in the Communication and Signal Processing Laboratory (CSPL, AVW2454) to print your artwork.

Bonus Part I. Digital Halftoning

Halftoning is a process that converts multi-level gray or color images to two-level images; the technique is widely used in printing. In this section, we learn how to generate halftoned images using Paint Shop Pro and Matlab.

1. Paint Shop Pro supports several halftoning methods. Click Colors → Decrease Color Depth → 2 Colors to open the corresponding dialog box. Set Palette Component to Grey values and Palette weight to non-weighted. Apply the five reduction methods shown in the dialog box to Lena.bmp and Baboon.bmp. Save these images and compare them.

2. We can generate a halftone image in other ways.
(a) The most-significant-bit approach: we can simply let the most significant bit decide black or white. Under Paint Shop Pro, Colors → Adjust → Threshold opens a Threshold dialog box. Set the threshold to 127 and save the image.
(b) Dithering approach: before thresholding, we can add noise via Effects → Noise → Add. Choose 50% uniform noise in the Add Noise dialog box. After adding noise, apply the 127 threshold as in the first approach. Save this image. What happens if you apply different noise magnitudes (percentages) and different types of random noise?
(c) Halftone-screen approach: open the two images Lena.bmp and halftone_screen.tif, and open the Image Arithmetic dialog box via Image → Arithmetic. Select Lena.bmp as image #1 and halftone_screen.tif as image #2. Set Function to Add, Channel to All channels, Divisor to 1, Bias to 0, and check Clip color values. Then apply a threshold of 255 to generate the two-level halftone image. Compare the results of (a), (b), and (c).

3. Design your own halftone screen by writing a Matlab script. (A sketch of one possible 8x8 pattern follows this section.)
(a) Design an 8x8 halftone pattern matrix M.
(b) Use the provided function halftone_screen.m to generate your own halftone screen. Note the four parameters of this function:
% halftone_screen(M, im_height, im_width, filename)
% M: a matrix you have designed; M = 0 (scalar) selects a default clustered-dot
%    pattern; M = 1 selects a dispersed-dot pattern.
% im_height and im_width: the dimensions of the source image.
% filename: the name and storage path of the halftone screen.
(c) Use this halftone screen to generate halftoned Lena and Baboon images. Save the images. Compare the results with the halftoning approaches in the previous section.
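As an illustration of what an 8x8 pattern matrix can do, here is a hedged Matlab sketch of ordered dithering with a Bayer-style dispersed-dot pattern (one possible design for M; your own pattern can differ):

B4 = [ 0  8  2 10; 12  4 14  6;  3 11  1  9; 15  7 13  5];
M  = [4*B4 4*B4+2; 4*B4+3 4*B4+1];           % 8x8 index matrix, levels 0..63

im = double(imread('Lena.bmp'));             % grayscale, values 0..255
thresh = (M + 0.5) / 64 * 255;               % per-pixel threshold pattern
screenFull = repmat(thresh, ceil(size(im,1)/8), ceil(size(im,2)/8));
screenFull = screenFull(1:size(im,1), 1:size(im,2));  % crop to image size

halftone = im > screenFull;                  % two-level output
imshow(halftone);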
4. Color Halftoning
(a) Separate LenaC.bmp into the CMYK [12] (Cyan-Magenta-Yellow-blacK) color coordinate via Colors → Split Channel → Split to CMYK, obtaining four color components.
(b) Perform the halftoning technique on each component and set these components as gray-level images via Colors → Grey Level.
(c) Combine the four new images via Colors → Combine Channel → Combine from CMYK to obtain the color-halftoned image. Save the image.

[12] RGB is an additive color synthesis system specific to light. CMY is a subtractive color synthesis system for pigment colors, such as newspaper and color printing; CMY is the complementary system to RGB, related by C = 1 − R, M = 1 − G, Y = 1 − B. CMYK is an offspring of CMY; the difference is that CMYK extracts the black element common to C, M, and Y and puts it in the last component, blacK.

Bonus Part II. Special Effects

Many digital image processing techniques are used extensively in art and entertainment. In this part, we learn some special effects used in those fields.

1. Morphing (dissolving method): Michael Jackson's Black or White [13] music video is a classic use of the morphing technique. In this subsection, we implement morphing with the dissolving method provided by Jasc Animation Shop.
(a) Prepare two face images, face1 and face2, of size MxN. You can use a digital camera or PC camera to capture the face images. Use Paint Shop Pro to crop the faces to rectangles and adjust the image size.
(b) Click File → New in Animation Shop. A Create New Animation dialog box will pop up. Set Width to N, Height to M, and Canvas color to Transparent.
(c) Use Animation → Insert Frames → From Files, and click Add File to insert the face1 image.
(d) Move to the second frame and insert face2, and insert face1 as the third frame.
(e) Click the first frame and choose Effects → Insert Image Transition. Set the parameters as shown in the dialog and select Dissolve as the Effect.
(f) As we can see, Animation Shop has automatically inserted 20 frames between face1 and face2. Click the 22nd frame (i.e., the original face2) and repeat step (e). There are 43 frames in total in this animation file.
(g) Use View → Animation to preview the result.
(h) Save the animation in GIF format via File → Save As. [Figure: the resulting animation.]

[13] Michael Jackson's Black or White online demo: http://www.gti.ssr.upm.es/~fmb/seq/mjackson.mpg

2. Special Effect – Emboss Filter: The uses of spatial filters are not limited to sharpening and blurring; many digital artists use spatial filters to create artwork. In this subsection, we use an emboss filter to convert an image into a bas-relief.
(a) Apply the emboss filter to Lena.bmp via Effects → Texture Effects → Emboss in Paint Shop Pro. Save this image.
(b) We can implement an emboss filter simply with the User Defined filters dialog box. Set the Bias of the filter to 128 and the filter matrix to
    [ -1 0 1; -1 0 1; -1 0 1 ]
Save this image.
(c) We can apply the same emboss filter to a color image such as LenaC.bmp. Split the RGB image into the HSL color coordinate, use the emboss filter from (b) on the L component, and recombine the three components. Save the image. (A Matlab sketch of the filter in (b) follows.)
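The filter in 2(b), reproduced in Matlab for a grayscale image (the 128 bias shifts the directional derivative to mid-gray):

H  = [-1 0 1; -1 0 1; -1 0 1];
im = double(imread('Lena.bmp'));
out = uint8(min(max(filter2(H, im) + 128, 0), 255));  % emboss + mid-gray bias
imshow(out);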
3. Special Effect – Lithograph and Psychedelic Distillation Filters:
(a) Try the following two filters on LenaC.bmp using the User Defined filters in Paint Shop Pro. Save the results.
Lithograph (division factor 1, bias 0):
    [ -1  -1  -1  -1  -1
      -1 -10 -10 -10  -1
      -1 -10  98 -10  -1
      -1 -10 -10 -10  -1
      -1  -1  -1  -1  -1 ]
Psychedelic Distillation (division factor 1, bias 0):
    [ 0 -1 -2 -3 -4
      0 -1  3  2  1
      0 -1 10  2  1
      2  1  0 -1  3
      0 -1 -2 -3 -4 ]
(b) We have now learned several useful spatial filters for creating artistic effects. The basic approach is to sum the weighted neighbors around each pixel. In this part, design one special filter of your own to create an artistic effect. Put the filter coefficients and images in your report.

ENEE408G Multimedia Signal Processing
Design Project on Video Processing

The Goals
1. Understand the fundamentals of video compression.
2. Learn the latest MPEG video standard.
3. Understand content-based search and indexing.

Note: As before, one icon asks you to put your discussion, flowchart, block diagram, or plots in your report; a second asks you to include the multimedia data; a third asks you to include your source code (Matlab, Basic, or C/C++). Save images in BMP format unless otherwise stated.

Part I. Video Capturing by PC Camera

Video conveys more perceptual information than still images. For example, we can send a vivid greeting video by email to family and friends instead of a plain greeting card. Nowadays, digital video can be captured in good quality using affordable consumer-level devices. In this section, we use a PC camera, the Creative Video Blaster WebCam [1], to learn how to capture digital video.
1. Capturing: Use the PC camera to capture a facial video sequence about two seconds long. Save it as an uncompressed AVI file.
2. Playing back: To check whether the video was captured successfully, open the AVI file in Windows Media Player and watch the video.
3. Matlab provides several functions for reading and writing AVI files as well as displaying and manipulating movies:
(a) aviread.m: read an AVI file.
(b) movie.m: display a movie or an image sequence.
(c) movie2avi.m: save an image sequence in AVI file format.
Write a Matlab script to read the AVI file captured in step 1, display it, and save the first ten frames in another AVI file.

[1] Adjust the "Album Directory" to "c:\temp" via Settings → Album.

Part II. Motion Estimation and Compensation

A video consists of a sequence of images. We have learned in lecture that even a short video clip contains a large amount of information, so it is desirable to compress video to save storage and reduce transmission time. There are several ways to compress a video: in addition to removing the redundancy within each frame (spatial redundancy), we can compress the video by eliminating the redundancy among neighboring frames (temporal redundancy). In this part, we learn how to remove temporal redundancy using motion estimation (ME) and motion compensation (MC). The basic idea of ME/MC is that the content within a video shot stays almost the same except for object motion and global camera motion. Some regions of the current frame therefore have corresponding regions in the previous and future frames, so we do not need to encode these regions all over again; instead, we can use motion vectors to describe the motion between the two frames. Motion estimation finds the motion vectors, and motion compensation constructs the estimated current frame from a reference frame.

[Figure: a video encoder and decoder based on motion compensation and DCT transform coding. Inter-frame encoder: the original macroblocks minus the motion-compensated macroblocks give the motion-compensated residual, which passes through DCT, quantization, and entropy coding; an internal loop (inverse quantization, IDCT, frame memory) reconstructs the motion-compensated reference frame used by motion estimation and motion compensation, which produce the estimated motion vectors. Inter-frame decoder: entropy decoding, inverse quantization, and IDCT reconstruct the motion-compensated residual, which is added to the motion-compensated prediction to give the reconstructed macroblocks.]

1. Motion Estimation: In this subsection, we implement the motion estimation algorithm using exhaustive block matching in Matlab.
[Figure: a current frame divided into N1 x N2 blocks and the corresponding search region of radius R around each block in the reference frame.]
To perform block matching, we first divide the current frame into blocks of size N1 x N2 (oftentimes 16x16). For each block B in the current frame, we calculate the MAD defined as

    MAD(d1, d2) = (1/(N1*N2)) * sum over (n1,n2) in B of | Sref(n1+d1, n2+d2) - Scur(n1, n2) |,

where Sref(x, y) is pixel (x, y) of the reference frame and Scur(x, y) is pixel (x, y) of the current frame. d1 and d2 are motion displacements, integers between -R and R-1, with R being the maximal search size. The goal is to find, for each block, the (d1, d2) pair that minimizes MAD(d1, d2).
We will start with N1 = N2 = 16 and R = 16. You are provided with two files: car1.bmp is the reference frame and car2.bmp is the current frame. Write a Matlab script to find the motion vectors for this image pair and plot the motion field [2]; that is, represent each block by an arrow indicating its motion vector. (An example motion field is shown in the figure below; a code sketch also follows.)

[2] To plot the motion vectors, you can use the quiver.m function in Matlab.
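A straightforward Matlab sketch of exhaustive block matching, assuming the frames are grayscale (use rgb2gray first otherwise) and that the frame dimensions are multiples of N:

ref = double(imread('car1.bmp'));
cur = double(imread('car2.bmp'));
N = 16;  R = 16;
[rows, cols] = size(cur);
mvy = zeros(rows/N, cols/N);  mvx = zeros(rows/N, cols/N);

for bi = 1:rows/N
  for bj = 1:cols/N
    r0 = (bi-1)*N;  c0 = (bj-1)*N;
    block = cur(r0+1:r0+N, c0+1:c0+N);
    best = inf;
    for d1 = -R:R-1                       % search every displacement pair
      for d2 = -R:R-1
        rr = r0 + d1;  cc = c0 + d2;
        if rr < 0 | cc < 0 | rr+N > rows | cc+N > cols
          continue;                       % skip candidates outside the frame
        end
        cand = ref(rr+1:rr+N, cc+1:cc+N);
        mad = sum(sum(abs(cand - block))) / N^2;
        if mad < best
          best = mad;  mvy(bi,bj) = d1;  mvx(bi,bj) = d2;
        end
      end
    end
  end
end
quiver(mvx, -mvy);   % plot the motion field (y flipped to match image axes)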
[Figure: an example motion field ("Motion Vectors"); the estimated car2.bmp; and the motion-compensated residual.]

2. Motion Compensation:
Motion compensation constructs the estimated current frame by replacing each block of the current frame with the best-matching block of the reference frame, as indicated by the motion vector. Here is an instance: suppose a 16x16 block of the current frame has its top-left pixel at (17, 33) and the motion vector of this block is (-5, 10); we replace this block with the block whose top-left pixel is at (12, 43) in the reference frame. In this part, write a Matlab script to complete the following tasks (a sketch follows the list).
(a) Use the motion vectors obtained in the previous section to perform motion compensation and construct the estimated image. Save this estimated image.
(b) Compute the motion-compensated residual, which is the difference between the current frame and the estimated frame. Display [3] and save this image.
(c) Calculate the mean absolute distortion (MAD) between the current image and the estimated image:

    MAD = (1/(M*N)) * sum_i sum_j | residual[i, j] |,

where M and N are the dimensions of the frames.

[3] Use imagesc.m to adjust the visualization scale.
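Continuing the variables from the block-matching sketch above, motion compensation, the residual, and its MAD look roughly like this:

est = zeros(size(cur));
for bi = 1:rows/N
  for bj = 1:cols/N
    r0 = (bi-1)*N;  c0 = (bj-1)*N;
    rr = r0 + mvy(bi,bj);  cc = c0 + mvx(bi,bj);   % best-matching block
    est(r0+1:r0+N, c0+1:c0+N) = ref(rr+1:rr+N, cc+1:cc+N);
  end
end
residual = cur - est;
madValue = mean(abs(residual(:)))        % mean absolute distortion
imagesc(residual); colormap(gray); axis image;
imwrite(uint8(est), 'car2_estimated.bmp');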
3. Although the exhaustive block-matching algorithm finds the best-matching block within the specified search range, it is very time consuming. Several suboptimal algorithms speed up the search; in this part, we explore the three-step search approach.
[Figure: the three-step search pattern; points labeled 1, 2, and 3 mark the locations examined in steps 1, 2, and 3, respectively.]
Suppose the maximal search range is R. At the beginning, the search step size is about half of R; at each subsequent step, the step size is about half that of the previous step. At each step, we calculate the MAD only between the block in the current frame and the nine candidate blocks in the reference frame whose centers lie at the current search center and at eight points around it in different directions, as illustrated in the figure. We select the point with the minimal MAD among the nine, move the search center of the next step to this point, and proceed with the step size reduced by half. After three steps, we pick the block with minimal MAD as the matched block.
In your experiment, set the block size to 16x16 and R = 16. As before, car1.bmp is the reference frame and car2.bmp is the current frame. Write a Matlab script to complete the following tasks (a per-block sketch follows the list).
(a) Find the motion vectors using the three-step search algorithm and plot them.
(b) Use the motion vectors obtained through the three-step search to implement motion compensation. Display and save the estimated image.
(c) Obtain the motion-compensated residual. Display [4] and save it.
(d) Calculate the mean absolute distortion (MAD) between the current image and the estimated image.

[4] Use imagesc.m to adjust the visualization scale.
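A sketch of the three-step search for a single block (call it once per block, with r0, c0 the block's top-left offset):

function [d1, d2] = three_step(ref, cur, r0, c0, N, R)
% Start with a step size of about R/2, evaluate the center and its eight
% neighbors, recenter on the best point, halve the step, and repeat 3 times.
d1 = 0;  d2 = 0;
step = round(R/2);                        % e.g., 8, then 4, then 2 for R = 16
block = cur(r0+1:r0+N, c0+1:c0+N);
[rows, cols] = size(ref);
for s = 1:3
  best = inf;  b1 = d1;  b2 = d2;
  for dy = [-step 0 step]
    for dx = [-step 0 step]
      rr = r0 + d1 + dy;  cc = c0 + d2 + dx;
      if rr < 0 | cc < 0 | rr+N > rows | cc+N > cols
        continue;                         % candidate falls outside the frame
      end
      cand = ref(rr+1:rr+N, cc+1:cc+N);
      mad = sum(sum(abs(cand - block))) / N^2;
      if mad < best
        best = mad;  b1 = d1 + dy;  b2 = d2 + dx;
      end
    end
  end
  d1 = b1;  d2 = b2;  step = round(step/2);
end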
4. In this part, apply your previous Matlab code to the image pair Carphone0195.tif and Carphone0196.tif, using both the exhaustive block-matching algorithm and the three-step algorithm. For each algorithm, set the block size to 16x16 and R = 16. Use Carphone0195.tif as the reference frame and Carphone0196.tif as the current frame.
(a) Find the motion vectors and plot them.
(b) Use the motion vectors obtained in (a) to perform motion compensation. Display and save the estimated image.
(c) Obtain the motion-compensated residual [5]. Display and save it.
(d) Calculate the mean absolute distortion (MAD) between the current image and the estimated image.
5. In steps 1 to 4, we performed ME/MC on the car and carphone image pairs using the exhaustive block-matching algorithm and the three-step algorithm. Discuss the advantages and disadvantages of the two algorithms for each image sequence.

[5] Use imagesc.m to adjust the gray scale.

Part III. MPEG Video

MPEG-1 was developed by the Moving Picture Experts Group in the early 1990s, targeting near-VHS-quality video at bit rates up to 1.5 Mbps. MPEG-1 uses three types of frames: I, P, and B. An I frame is encoded alone, without reference to any other frame. A P (predicted) frame uses the ME/MC technique and is predicted from previous frames. Motion vectors in a B (bi-directional) frame can refer to the previous or the next I/P frames. Several factors affect video quality; in this section, we explore the basic parameters and study the trade-offs in MPEG-1 video compression.
The Matlab user-contributed function mpgwrite.m [6] generates an MPEG file from an image sequence. You can adjust the parameters listed below:

% MPGWRITE(M, map, 'filename', options)
%   Encodes M in MPEG format using the specified colormap and writes the
%   result to the specified file. The options argument is an optional vector
%   of 8 or fewer options, where each value has the following meaning:
%   1. REPEAT: integer number of times to repeat the movie (default 1).
%   2. P-SEARCH ALGORITHM: 0 = logarithmic (fastest, default), 1 = subsample,
%      2 = exhaustive (better, but slow).
%   3. B-SEARCH ALGORITHM: 0 = simple (fastest), 1 = cross2 (slightly slower,
%      default), 2 = exhaustive (very slow).
%   4. REFERENCE FRAME: 0 = original (faster, default), 1 = decoded (slower,
%      but results in better quality).
%   5. RANGE IN PIXELS: integer search radius (default 10).
%   6. I-FRAME Q-SCALE: integer between 1 and 31 (default 8).
%   7. P-FRAME Q-SCALE: integer between 1 and 31 (default 10).
%   8. B-FRAME Q-SCALE: integer between 1 and 31 (default 25).

1. Apply BR_Q_mpg.m [7] to the foreman image sequence to achieve the following goals, and list the parameters you chose in mpg_option:
(a) average PSNR higher than 26 dB;
(b) bit rate lower than 400 kbps.
2. Give a rule of thumb for choosing the parameters in step 1.

[6] This file can be obtained from http://www.mathworks.com/support/solutions/data/8154.shtml
[7] Modify the parameters in BR_Q_mpg.m. Make sure the file names in_filename, out_filename, and mpg_filename carry their full paths. What you need to do is modify mpg_option, which becomes the input argument options of mpgwrite.m. Remember to include the subroutines read_im_seq.m, write_im_seq.m, file_size.m, and PSNR_seq.m.

Part IV. Video Conference

Video conferencing is becoming increasingly popular: imagine people holding a meeting over the Internet without physically getting together! In this part, we use Microsoft's NetMeeting [8] software for video conferencing over two kinds of network, the Local Area Network (LAN) and the Wide Area Network (WAN).
1. Over the LAN: from Call → New Call, call your partner's IP address [9] directly.
2. Over the WAN: log on to the server with your partner via Call → Log on to Microsoft Internet Directory.
3. Compare the video and audio results between LAN and WAN.

[8] Set up the PC cameras, microphones, and earphones before starting this experiment.
[9] The IP addresses of the PCs in Jasmine are posted on top of the PC cases.

Part V. Video Scene Change Detection

Scene change detection plays a fundamental role in content-based video processing, helping users automatically analyze and index the content of videos. In this part, we design and implement a scene change detector, exploring the detection of Cut, Dissolve, and Wipe transitions. First, let us understand what the common scene changes are.
Cut is simply an abrupt transition. Read Cut.mpg using mpgread.m; you can observe that the transition between the two scenes is abrupt, without any intermediate frames.
Dissolve is a time-dependent linear combination of two scenes. You can observe the dissolve in Dissolve.mpg using mpgread.m.
Wipe covers an old scene and reveals a new one with a pattern such as a line, blinds, or a checkerboard. You can observe a wipe with a diagonal line pattern in Wipe.mpg using mpgread.m.
1. Design a Scene Change Detector: Design a scene change detector [10] in Matlab. The detector should report the frame indexes at which scene changes happen. Apply your detector to Cut.mpg, Dissolve.mpg, and Wipe.mpg, and list the frame indexes your program reports. Hint: you can calculate statistical characteristics of each frame; a minimal sketch appears at the end of this part.
2. Design a Scene Change Detector for Multiple Scenes within one video: Using what you have learned above, write a Matlab script to detect the scene changes in the video sequence cbswipe.mpg, which mixes several types of scene changes. The detector should report the frame indexes at which the scene changes happen; list the frame indexes your program reports.
3. Discussion: Discuss the performance of your scene change detector.

[10] You can use frame2im.m to retrieve each image frame from a Matlab movie object, e.g.:
M = mpgread('Cut.mpeg', 1:100, 'truecolor');  % read frames 1-100 of Cut.mpeg into the
                                              % movie object M, with true-color (24-bit) pixels
movie(M);                                     % display the movie object
Im = frame2im(M(1));                          % convert the 1st frame into an image Im
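A minimal cut-detector sketch based on frame-to-frame histogram differences; the frame range and the threshold are illustrative, and dissolves and wipes need a gentler statistic (e.g., sustained medium-sized differences over a window of frames):

M = mpgread('Cut.mpg', 1:100, 'truecolor');   % adjust the range to the clip length
nF = length(M);
d = zeros(1, nF-1);
prev = hist(double(reshape(rgb2gray(frame2im(M(1))), [], 1)), 0:255);
for k = 2:nF
    h = hist(double(reshape(rgb2gray(frame2im(M(k))), [], 1)), 0:255);
    d(k-1) = sum(abs(h - prev));              % histogram difference
    prev = h;
end
cutFrames = find(d > mean(d) + 3*std(d)) + 1  % frames where a new scene starts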
Part VI: Mobile Computing and Pocket PC Programming

In this part, apply the fundamental video processing you learned in the previous parts to design a simple application related to digital video processing with the Microsoft eMbedded Tools for Pocket PC. You can refer to the "ENEE408G Multimedia Signal Processing Mobile Computing and Pocket PC Programming Manual" and extend the examples there.

Bonus Part I. MPEG-7 Visual Descriptors

MPEG-1, MPEG-2, and MPEG-4 emphasize how to compress videos efficiently. MPEG-7, on the other hand, focuses on how to describe the content of video [11]; it is a standard that defines content description systems for indexing, searching, and browsing of multimedia data. The MPEG-7 Visual Part [12] defines descriptors for color, texture, shape, and motion. In this part, we explore the texture and shape descriptors.

1. MPEG-7 Texture Descriptor
Texture carries much useful information about an image. In this part, we use the MPEG-7 Homogeneous Texture Descriptor Demo (http://nayana.ece.ucsb.edu/M7TextureDemo/Demo/client/M7TextureDemo.html) to study the texture descriptor. [Figure: the MPEG-7 Homogeneous Texture Descriptor Demo searching aerial images.] You can query a region following the instructions on that web site. Query three different tiles using NN search with the 10 best matches for each query. Save the query results [13] in JPEG format and discuss how good the search results are.

2. MPEG-7 Shape Descriptor
Shape is also an important feature of objects. In this part, we use the Shape Queries Using Image Databases (SQUID) demo, http://www.ee.surrey.ac.uk/Research/VSSP/imagedb/demo.html, to learn about the shape descriptor. [Figure: the SQUID GUI.] You can query a shape in two steps. Step 1: click the Random button to generate random contours. Step 2: click on a desired shape, and images with similar shapes in the database will be shown. Query three different shapes. Save the query results in JPEG format and discuss how good the search results are.

[11] For more details: IEEE Trans. Circuits Syst. Video Technol., vol. 11, June 2001.
[12] MPEG-7 Visual Part committee draft: http://mpeg.telecomitalialab.com/public/mpeg-7_visual_fcd.zip
[13] You can use "Print Screen" (on the keyboard) to capture the screen and Paint Shop Pro to crop the images.

ENEE408G Multimedia Signal Processing
Design Project on Digital Speech Processing

The Goals
1. Learn how to use the linear predictive model for speech analysis and synthesis.
2. Implement a linear predictive model for speech coding.
3. Explore speech-recognition-based human computer interfaces.

Note: As before, one icon asks you to put your discussion, flowchart, block diagram, or plots in your report; a second asks you to include the multimedia data; a third asks you to include your source code (Matlab, CSLU RAD, Basic, or C/C++). Save speech in signed mono 16-bit WAV format unless otherwise stated.

Part I. Speech Analysis

To analyze a speech signal, we should first understand the human vocal tract and build a model to describe it. In this part, we investigate the linear predictive model.
[Figure [1]: the mid-sagittal plane of the human vocal apparatus.] The vocal tract begins at the opening of the vocal cords and ends at the lips.
[Figure: a model of speech production. An impulse train generator, driven by the pitch period, produces the voiced excitation; a white noise generator produces the unvoiced excitation; a switch selects one of the two, which drives the vocal tract model (with the vocal tract parameters and gain G) to produce speech.]
This model consists of two parts. The first part is the excitation, with two states: the impulse train generator produces impulse trains at a specified pitch for voiced sounds, and the white noise generator drives unvoiced speech. The impulse train generator is stimulated at a given pitch period (i.e., the fundamental frequency of the glottal oscillation). The second part is the vocal tract model (with gain G), usually modeled as the following pth-order all-pole linear predictive model V(z):

    V(z) = G / prod_{k=1..p} (1 - alpha_k * z^-1)

The resonant frequencies associated with the factors (1 - alpha_k * z^-1) are called formants; they are the resonances caused by the airflow through the vocal tract.

[1] http://www.phon.ox.ac.uk/~jcoleman/phonation.htm
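To see what V(z) looks like for real speech, a short Matlab sketch (Signal Processing Toolbox) fits an LPC model to one frame and plots its spectral envelope; the file, frame position, and model order below are illustrative choices:

[x, fs] = wavread('bode.wav');
frame = x(2001:2400) .* hamming(400);   % an arbitrary 400-sample voiced frame
p = 12;                                  % model order
a = lpc(frame, p);                       % coefficients of A(z)
[H, f] = freqz(1, a, 512, fs);           % V(z) = G/A(z), plotted here with G = 1
plot(f, 20*log10(abs(H)));               % the peaks correspond to the formants
xlabel('Frequency (Hz)'); ylabel('Magnitude (dB)');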
In this part, we use the Matlab COLEA toolbox [2] to study the linear predictive model. The toolbox has four major window interfaces, described in the following paragraphs.
Waveform in the time domain: this window shows the raw speech signal. We can observe the signal bode.wav [3] in the time domain via Display → Time Waveform.
Spectrogram: the time-frequency domain highlights different information in a speech signal. The spectrogram (short-time Fourier transform) is a popular spectral representation; display it via Display → Spectrogram → Color.
Pitch and formant tracking: use Display → F0 Contour → Autocorrelation Approach and Display → Formant Track to visualize the pitch contour and formant track of a speech signal.
LPC spectra: we can also characterize the spectrum of a speech signal using the linear predictive model V(z). For example, first open a speech file, say bode.wav, and left-click on the waveform or spectrogram in the corresponding window. Two sub-windows will show up: one is the LPC Spectra window, and the other is Controls, which sets the parameters of the displayed LPC spectra. To verify how well LPC models speech, we can compute the short-time Fourier transform (STFT), overlay it on the LPC spectrum, and compare how close the two spectra are: choose FFT as the Spectrum in the Controls window and check Overlay at the bottom of the window.

[2] COLEA: http://www.utdallas.edu/~loizou/speech/colea.htm
[3] Download the American-English part from the Handbook of the International Phonetic Association (IPA): http://web.uvic.ca/ling/resources/ipa/handbook.htm

1. Linear Predictive Model
(a) Use the Recording Tool (found from the menu bar → Record) to record your own voice speaking the ten vowels listed in the table below. Then use the tools introduced above to analyze the pitch and the first three formants of each vowel. To locate a specified vowel in a recording, you can listen to a small frame of the signal: left-click and right-click on the waveform window to mark the start and the end of the frame, respectively, then press the "sel" button in the Play area to listen to it.

    Word/vowel | Pitch (Hz) | Formant 1 (Hz) | Formant 2 (Hz) | Formant 3 (Hz)
    1. beet    |            |                |                |
    2. bit     |            |                |                |
    3. bet     |            |                |                |
    4. bat     |            |                |                |
    5. but     |            |                |                |
    6. hot     |            |                |                |
    7. bought  |            |                |                |
    8. foot    |            |                |                |
    9. boot    |            |                |                |
    10. bird   |            |                |                |

(b) Plot the first formant (x-axis) against the second formant (y-axis) of each vowel, for all the members of your group, in a single figure. Discuss what you observe from this figure.
(c) Adjust the order and frame duration of the linear predictive model. Describe what you observe in the LPC spectra and the STFT for different orders and durations.

2. Gender Identification: The linear predictive model is widely used in digital signal processing due to its simplicity and effectiveness. In this part, we use the linear predictive model to program gender identification. You should develop your own algorithm in Matlab to identify the gender of a speaker.
Ten male speech samples and their corresponding female speech samples are provided on the course web page. You can train your gender identifier with those samples; at the end of this lab, you will be asked to test your program with a new set of samples.
LPC Gender Identification framework: there are three building blocks in this system: LPC analysis (by proclpc.m), feature extraction for the training set, and gender identification testing, which takes wave files of unknown gender and outputs a male/female decision.
LPC Analysis: using proclpc.m [4], we can obtain the LPC coefficients and other information:

% [aCoeff,resid,pitch,G,parcor,stream] = proclpc(data,sr,L,fr,fs,preemp)
%
% LPC analysis is performed on a monaural sound vector (data) that has been
% sampled at a sampling rate of "sr". The following optional parameters modify
% the behaviour of this algorithm:
%   L      - The order of the analysis. There are L+1 LPC coefficients in the
%            output array aCoeff for each frame of data. Default is 13.
%   fr     - Frame time increment, in ms. The LPC analysis is done starting
%            every fr ms. Default is 20 ms (50 LPC vectors per second).
%   fs     - Frame size in ms. The LPC analysis windows the speech data with a
%            rectangular window that is fs ms long. Default is 30 ms.
%   preemp - The epsilon in a digital one-zero filter that preemphasizes the
%            speech signal and compensates for the 6 dB per octave rolloff in
%            the radiation function. Default is 0.9378.
% Output variables:
%   aCoeff - The LPC analysis results, a(i); one column of L numbers for each
%            frame of data.
%   resid  - The LPC residual, e(n); one column of sr*fs samples representing
%            the excitation (residual) of the LPC filter.
%   pitch  - A vector of frame-by-frame pitch estimates, calculated by finding
%            the peak in each frame's residual autocorrelation.
%   G      - The LPC gain for each frame.
%   parcor - The parcor coefficients, which give the ratio between adjacent
%            sections in a tubular model of the speech articulators; there are
%            L parcor coefficients per frame.
%   stream - A vector representing the residual (excitation) signal: the
%            overlapping frames of resid combined into a one-dimensional
%            signal and post-filtered.

This M-file works in stages: the input data (sampled at sr) is preemphasized (preemp), blocked into frames (fr, fs), and LPC analysis of order L produces aCoeff, resid, pitch, G, parcor, and stream.
(a) Feature extraction for the training set: for each sample, we obtain one set of coefficients. Develop your own algorithm to distinguish gender using those coefficients. Write Matlab scripts/functions to implement your algorithm and briefly explain how it works. (A crude pitch-threshold baseline is sketched below.)
Note 1: Use [wave, SampleRate] = TIMITread(filename) to read the wave files.
Note 2: The unvoiced segments in the speech files may affect your identification performance.
(b) Testing new voice files: your algorithm will be tested with ten new samples, and your score for this part depends on the percentage of correct identifications by your gender identifier.

[4] Auditory Toolbox: http://rvl4.ecn.purdue.edu/~malcolm/interval/1998-010/
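A crude baseline to build from, classifying by the average voiced-frame pitch (the file name is hypothetical; typical adult male pitch is roughly 85-155 Hz and female roughly 165-255 Hz, so a threshold near 160 Hz is a reasonable starting point to tune on the training set; LPC and formant features should improve on this):

[wave, sr] = TIMITread('sample01.wav');          % hypothetical file name
[aCoeff, resid, pitch, G] = proclpc(wave, sr);   % defaults for L, fr, fs
voiced = pitch(pitch > 0);                       % keep voiced frames only
if mean(voiced) < 160
    disp('male');
else
    disp('female');
end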
Part II. Speech Coding: Linear Predictive Vocoder

To encode a speech signal at a low bit rate, it is efficient to employ an analysis-synthesis approach to design a voice coder (vocoder). The linear predictive vocoder is a popular framework. In this part, we design a 2.4 kbps 10th-order linear predictive vocoder based on the linear predictive model we learned in Part I.
Ideas: We have already learned how to obtain the LPC-related parameters of each frame with proclpc.m [5]: the 10 LPC coefficients {ak}k=1~10 (aCoeff), the gain (G), and the pitch (pitch). We can collect these parameters into a vector A = (a1, a2, ..., a10, Gain, Pitch) and quantize A to compress the speech signal. After reconstructing the quantized parameters, we can use synlpc.m (also in the Auditory Toolbox) to synthesize the speech:

% synWave = synlpc(aCoeff,source,sr,G,fr,fs,preemp)
%
% LPC synthesis producing a monaural sound vector (synWave) using:
%   aCoeff - The LPC analysis results, a(i); one column of L+1 numbers per
%            frame. The number of columns equals the number of frames in the
%            speech signal.
%   source - The LPC residual, e(n); one column of sr*fs samples representing
%            the excitation (residual) of the LPC filter.
%   sr     - Sampling rate.
%   G      - The LPC gain for each frame.
%   fr     - Frame time increment in ms. Default is 20 ms (50 LPC vectors/s).
%   fs     - Frame size in ms. Default is 30 ms (i.e., 10 ms overlap between
%            frames).
%   preemp - The epsilon in a digital single-zero filter used to preemphasize
%            the speech signal and compensate for the 6 dB per octave rolloff
%            in the radiation function. Default is 0.9378.

Line Spectrum Pair: If we directly quantize the LPC coefficients a1~a10, quantization may push some poles near the unit circle and cause instability. One way to overcome this problem is to convert the LPC coefficients to Line Spectrum Pair (LSP) parameters, which are more amenable to quantization. The LSP parameters are calculated by first generating the polynomials P(z) and Q(z):

    P(z) = 1 + (a1 - a10)z^-1 + (a2 - a9)z^-2 + ... + (a10 - a1)z^-10 - z^-11
    Q(z) = 1 + (a1 + a10)z^-1 + (a2 + a9)z^-2 + ... + (a10 + a1)z^-10 + z^-11

Then, P(z) and Q(z) are rearranged to obtain the parameters {wk}:

    P(z) = (1 - z^-1) * prod_{k=2,4,...,10} (1 - 2*cos(wk)*z^-1 + z^-2)
    Q(z) = (1 + z^-1) * prod_{k=1,3,...,9}  (1 - 2*cos(wk)*z^-1 + z^-2)

where {wk}k=1~10 are the LSP parameters, ordered 0 < w1 < w2 < ... < w10 < π. We can use lpcar2ls.m [6] to convert LPC (AR) parameters to LSP parameters and lpcls2ar.m to convert LSP back to LPC (AR). (A quantization round-trip sketch follows.)

[5] In proclpc.m, a pitch value of zero means the frame is unvoiced (UV); a nonzero value means the frame is voiced (V) with pitch period T. To avoid confusion with the meaning of pitch, we denote this value as UV/V,T in the following paragraphs.
[6] From Voicebox: http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html
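A uniform-quantization round trip for one frame's LSP vector, as a starting point for your quantizer design. The value ranges wmin/wmax are placeholders (estimate per-parameter ranges from the training data for the real tables), and the units of w follow whatever convention your version of lpcar2ls.m uses, so check that first:

[x, fs] = wavread('tapestry.wav');
frame = x(1:round(0.03*fs));          % one 30 ms frame (illustrative)
ar = lpc(frame, 10);                  % 10th-order LPC polynomial
w  = lpcar2ls(ar);                    % 10 LSP parameters, ordered

bits = [3 4 4 4 4 3 3 3 3 3];         % per the allocation table below
wmin = zeros(1,10);  wmax = pi*ones(1,10);   % placeholder ranges
levels = 2.^bits;
idx = round((w - wmin) ./ (wmax - wmin) .* (levels - 1));  % encoder side
idx = min(max(idx, 0), levels - 1);
wq  = wmin + idx ./ (levels - 1) .* (wmax - wmin);         % decoder side
arq = lpcls2ar(wq);                   % quantized LPC, ready for synthesis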
Quantization: To achieve a coding rate of 2.4 kbps with a frame size of 20 ms (i.e., 50 frames per second), each frame is represented by 48 bits. The following table shows how to allocate bits to the parameters {wk}, Gain, and UV/V,T:

    Parameter:   w1  w2  w3  w4  w5  w6  w7  w8  w9  w10  Gain  UV/V,T
    Bits/frame:   3   4   4   4   4   3   3   3   3   3    7     7

We assign seven bits (value range 0~127) to the parameter UV/V,T. If the frame is unvoiced (UV), we encode it as (0)10 = (0000000)2. Otherwise, for the voiced case (V), we encode the corresponding pitch period T according to the following table. For example, if T equals 22, we encode it as (3)10 = (0000011)2.

    UV/V:           UV   V    V    V   ...   V
    T:              -    20   21   22  ...   146
    Encoded value:  0    1    2    3   ...   127

(a) Design your own 2.4 kbps vocoder:
[Block diagram. Encoder: original speech → frame segmentation and LPC analysis (proclpc.m) → {ak}k=1~10 → LPC to LSP (lpcar2ls.m) → {wk}k=1~10 → quantization (Q), together with quantized Gain and UV/V,T → 2.4 kbps compressed speech. Decoder: compressed speech → inverse quantization (iQ) → {w'k}k=1~10 → LSP to LPC (lpcls2ar.m) → {a'k}k=1~10, with Gain' and UV/V,T' selecting an impulse train generator (voiced) or a white noise source (unvoiced) → LPC synthesis and frame combination (synlpc.m) → reconstructed speech.]
Write Matlab scripts/functions to implement this scheme and explain your design briefly. Note that the encoder and decoder should be implemented separately: the encoder reads [7] a wave file, generates a compressed bit stream, and saves it to disk; the decoder reads this compressed bit stream from disk and decompresses it into a wave file. You can use proclpc.m, synlpc.m, lpcar2ls.m, and lpcls2ar.m as the basic building blocks. The remaining work is to design quantization tables for the LPC parameters A' = (w1, w2, ..., w10, Gain, UV/V,T), and to design the impulse train generator for the voiced state and the white noise source for unvoiced speech.
(b) Compress the speech signal stored in tapestry.wav. Calculate the mean squared error between the reconstructed speech signal and the original speech signal.
(c) Code Excited Linear Prediction (CELP): CELP is a federal speech-coding standard (FS-1016) that also uses linear prediction. This standard offers good speech compression at intermediate bit rates (4.8-9.6 kbps). In this part, we use the audio processing software GoldWave to compress a speech signal in CELP format and compare with the result obtained with the LPC vocoder above.
(1) Open tapestry.wav via File → Open in GoldWave. Convert it to CELP format via File → Save As: choose "Lernout & Hauspie CELP 4.8 kbit/s, 8,000 Hz, 16 bits, mono" and save it as a new file.
(2) Load the new CELP wave file and save it back to a 16-bit, mono, signed .wav file. Write a Matlab [8] script/function to calculate the MSE between the original signal and the reconstructed signal. Compare the results with your LPC vocoder from the previous part.

Part III. Speech Recognition by IBM ViaVoice

IBM ViaVoice is successful commercial speech recognition software. In this part, we use ViaVoice to get some experience with the state of the art of speech recognition.
1. ViaVoice Training: Open IBM ViaVoice. As a new user, you are required to read a short story (about 100 sentences) to train the software. Please be patient and finish this part.
2. Operating the PC by ViaVoice: Use ViaVoice to operate your PC and take a short dictation in Microsoft Word.
3. Discuss the strengths and weaknesses of this speech recognition system.

[7] Use wavread.m to read a wave file and wavwrite.m to write one out.
[8] Use wavread.m to read a wave file and wavwrite.m to write one out.
Part III. Speech Recognition with IBM ViaVoice
IBM ViaVoice is a successful commercial speech recognition package. In this part, we use ViaVoice to get some hands-on experience with the state of the art in speech recognition.
1. ViaVoice Training: Open IBM ViaVoice. As a new user, you are required to read a short story (about 100 sentences) to train the software. Please be patient and finish this part.
2. Operating a PC with ViaVoice: Use ViaVoice to operate your PC and take a short dictation in Microsoft Word.
3. Discuss the strengths and weaknesses of this speech recognition system in your report.

Part IV. Speech Synthesis
Speech synthesis systems are generally classified into two categories: concept-to-speech systems and text-to-speech (TTS) systems. A concept-to-speech system is used by an automatic dialog machine that has a limited vocabulary, e.g. 100 words, but enough artificial intelligence to respond to inputs. A text-to-speech system aims at reading text (e.g. as an aid for the blind) and is able to handle all the words of a specified language. In this part, we explore the TTS system and implement a (simple) speech synthesis system.
1. Text To Speech (TTS) and Talking Head: We can define a text-to-speech system as the production of speech by machines through the automatic phonetization of the sentences to utter. In general, a TTS system consists of two components, natural language processing (NLP) and digital signal processing (DSP). The NLP component produces a phonetic transcription of the text to read, and the DSP component converts the NLP output into natural human speech. In addition to these two components, researchers have recently added one more module, a talking head, which simulates the face and articulators while the text is pronounced. (For more details on TTS, please refer to Thierry Dutoit, An Introduction to Text-To-Speech Synthesis, Kluwer Academic Publishers, 1997.) In this part, we use the CSLU toolkit (http://cslu.cse.ogi.edu/toolkit/index.html) to explore how text-to-speech works.

[Figures: the MultiSync window, the Talking Head window, and the Baldi window.]

Click Start → Programs → 408G → Speech Toolkit → Multi Sync and two windows will pop up. The first is the MultiSync window and the other is the Talking Head window. Type some sentences in the textbox labeled "Text to align with" in the MultiSync window, and click the TTS button to generate and listen to the speech and its phonetic transcription. Then click the play button and observe the behavior of the talking head. You can choose among five different talking-head characters to do the speech. From Start → Programs → 408G → Speech Toolkit → Baldi, you will see the talking head, Baldi, shown in the figure above. By clicking File → Preference, you can change the colors, the texture map, and the emotions of the talking head. You can also observe the articulators via Rendering → Solidness.
2. Vowel Synthesis: In practice, designing a text-to-speech system is not a simple task. Such a system consists of several levels of processing: acoustic, phonetic, phonological, morphological, syntactic, semantic, and pragmatic. In this part, we focus on the phonetic level and synthesize vowels with MakeVowel.m, which is provided with the Auditory Toolbox.

% y = MakeVowel(len, pitch, sampleRate, f1, f2, f3)
% len: length in samples
% pitch: either a scalar giving the actual pitch frequency, or an array of
%   impulse locations; using an array of impulses allows this routine to
%   compute vowels with time-varying pitch
% sampleRate: sampling rate
% f1, f2 & f3: formant frequencies

Synthesize the ten vowels that were analyzed in Part I.1.(a) by supplying the values of pitch, f1, f2 and f3 as input arguments to MakeVowel.m. You can use the Matlab function sound(y, sampleRate) to hear the synthetic vowels and wavwrite.m to write out the wave files. Compare the vowels you recorded in Part I.1.(a) with the synthesized results.
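As an illustration of the calling convention above, the sketch below synthesizes and plays one vowel. The pitch and formant values are placeholders (typical textbook values for /a/); substitute the values you measured in Part I.1.(a).

fs = 16000;                          % sampling rate (assumed)
len = fs;                            % one second of samples
pitch = 120;                         % pitch frequency in Hz (placeholder)
f1 = 730; f2 = 1090; f3 = 2440;      % formants for /a/ (placeholder values)
y = MakeVowel(len, pitch, fs, f1, f2, f3);
sound(y, fs);                        % listen to the synthetic vowel
y = y / max(abs(y));                 % normalize in case wavwrite would clip
wavwrite(y, fs, 16, 'vowel_a.wav');  % save it for your report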
Part V. Human Computer Interface
One of the motivations for analyzing and synthesizing speech is to create a friendly and convenient interface between users and computers. Through such a system, users can operate and communicate with a machine by voice. In this part, we explore two advanced human computer interfaces based on speech recognition.
1. CSLU Human Computer Interface: CSLU provides a tool, known as the Rapid Application Developer (RAD), for developing a human computer interface via speech. The computer uses speech recognition technology to understand what the user says; it then reacts according to the decision rules set in the program and speaks through a text-to-speech system. The figure below shows the RAD development environment. You can open this program from Start → Programs → 408G → Speech Toolkit. Take a look at the tutorials (c:\Program Files\CSLU\Toolkit\2.0\apps\rad\examples tutorials; also online at http://cslu.cse.ogi.edu/toolkit/docs/2.0/apps/rad/tutorials/) and the user's guide. Before you start using these interactive programs, you should calibrate the microphone by File → Preferences → Audio → Calibrate from the RAD menu bar.

[Figure: the RAD development environment.]

(a) Design a simple application using the CSLU RAD tools.
2. MIT Galaxy System: The MIT Spoken Language Systems Group (http://www.sls.lcs.mit.edu/sls/applications/) has been working on several research projects on human computer interfaces via telephone, targeting the following applications:
JUPITER - a weather information system (http://www.sls.lcs.mit.edu/sls/applications/jupiter.shtml, TEL: 1-888-573-8255)
MERCURY - an airline flight planning system
PEGASUS - an airline flight status system (http://www.sls.lcs.mit.edu/sls/applications/pegasus.shtml, TEL: 1-877-527-8255)
VOYAGER - a city guide and urban navigation system
ORION - a personal agent for automated, off-line services
See the instructions on the web sites of JUPITER and PEGASUS. Dial the corresponding toll-free phone numbers and talk with these two systems. Describe in your report under what kinds of conditions these systems make mistakes.

Part VI: Mobile Computing and Pocket PC Programming
We have learned various aspects of digital speech processing in this design project. In this part, apply what you have learned from the previous parts and design a simple application related to digital speech processing for the Pocket PC using the Microsoft eMbedded Tools. You can refer to "ENEE408G Multimedia Signal Processing Mobile Computing and Pocket PC Programming Manual" and extend the examples there.

ENEE408G Multimedia Signal Processing
Design Project on Digital Audio Processing

The Goals
1. Learn the fundamentals of perceptual coding of audio and of intellectual property rights protection for multimedia.
2. Design digital audio watermarking systems in the time and frequency domains.
3. Explore synthetic audio: MIDI and MPEG4 Structured Audio.
Note: The symbol means to put your discussion, flowchart, block diagram, or plots in your report. The symbol indicates that you should put the obtained multimedia data in your report. The symbol means to put your source codes (Matlab, Basic, or C/C++) in your report.

Part I. Perceptual Coding and MP3
In modern audio coding algorithms, four key technologies play important roles: perceptual coding, frequency-domain coding, window switching, and dynamic bit allocation.
The figure below shows a generic block diagram for modern audio encoders. In this part of the design project, we investigate the fundamentals of perceptual coding in MP3 technology.

[Figure: generic audio encoder: the input s(n) feeds a time/frequency analysis block and a psychoacoustic analysis block; the masking thresholds from the psychoacoustic analysis drive the bit allocation; the analysis parameters pass through quantization & encoding and entropy (lossless) coding; the coded parameters and side info are multiplexed onto the channel.]

1. Psychoacoustic Models and Perceptual Coding

[Figure: anatomy of the human ear (from http://www.vestibular.org/gallery.html).]

Many researchers in the field of psychoacoustics exploit the "irrelevant" signal information that is not detectable even by a well-trained or sensitive listener. These studies lead to five psychoacoustic principles.
a. Absolute Threshold of Hearing: This threshold represents the minimal amount of energy at which a listener is able to detect a pure tone in a noiseless environment. If a given tone is too weak, we cannot hear it, so we do not have to encode it.
b. Critical Band Frequency Analysis: The cochlea can be modeled as a non-uniform filter bank consisting of 25 highly overlapping bandpass filters. Critical bands are the passbands of those filters.
c. Simultaneous Masking: In each critical band, one sound (the maskee) is rendered inaudible by the presence of another sound (the masker). We can identify a masker and skip encoding the inaudible, masked tones.
d. Spread of Masking: Masking in a critical band can spread to its neighboring bands.
e. Non-Simultaneous Masking: Masking also occurs in the time domain.
For perceptual coding, the encoder generates a global masking threshold according to the above principles and provides parameters for further processing. In this part, we investigate the absolute threshold of hearing and simultaneous masking principles.
(a) Absolute Threshold of Hearing: In this section, we use PM_Abs_Thre_Hearing.m to explore the absolute threshold of hearing. The goal of this experiment is to find the volume threshold at which a tone at a specific frequency is just audible. In other words, a tone at the same frequency with a slightly lower volume than this threshold becomes inaudible.

[Figure: PM_Abs_Thre_Hearing.m display, "Stage 2: Measure Absolute Threshold of Hearing". Y-axis: sound pressure level relative to 4KHz, rSPL (dB), from -10 to 40; x-axis: frequency (Hz), 10^2 to 10^4. Left-click the mouse to adjust the volume; right-click to exit.]

First, we calibrate the minimal audible volume at 4KHz. (Since 4KHz is the frequency to which the human ear is most sensitive, we calibrate there first to obtain the whole dynamic range of volume.) After setting this volume, you will see the figure above. Each circle represents a frequency component; you can increase/decrease its volume by left-clicking the mouse above/below the circle, and right-click the mouse to exit the program. Find the thresholds of your ear at the 11 frequencies shown in the figure. Copy the resulting figure using Edit → Copy Figure from the menu bar of the figure window and paste it into your report.
(b) Simultaneous Masking: In this section, we use PM_Simu_Masking.m to explore simultaneous masking, which means that a tone can become inaudible when a simultaneous louder tone is present at a neighboring frequency. For each critical band, we fix the volume of the tone at the central frequency. By adding a neighboring tone of varying amplitude, we can find the threshold below which the neighboring tone is inaudible and above which it is audible.
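Before walking through the measurement procedure, it may help to see how simple the test stimulus is. The sketch below (not part of the lab scripts) builds a two-tone signal of the kind PM_Simu_Masking.m plays: a fixed masker at a band's central frequency plus a weaker probe at a neighboring frequency. All numbers are illustrative.

fs = 44100; t = (0:fs-1)/fs;        % one second of samples
fc = 3400;                          % masker at the central frequency (illustrative)
fp = 3200;                          % neighboring probe frequency (illustrative)
Ap = 0.05;                          % probe amplitude: the quantity you adjust
x = 0.5*sin(2*pi*fc*t) + Ap*sin(2*pi*fp*t);
sound(x, fs);                       % lower Ap until the probe becomes inaudible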
In this experiment, we also need to calibrate the minimal audible volume at 4KHz. After calibration, you will see another figure, illustrated below. There are seven frequencies indicated by circles; the middle one is the central frequency of a specific critical band. When you click one of the neighboring frequencies, the program generates an audio signal consisting of the central frequency and the selected frequency. You can increase/decrease the neighboring tone's volume by left-clicking the mouse above/below its circle. After finding the thresholds of the six neighboring frequencies (all except the central frequency), right-click the mouse to exit the program.

[Figure: PM_Simu_Masking.m display, "Stage 2: Measure simultaneous masking". Y-axis: sound pressure level relative to 4KHz, rSPL (dB), from -10 to 60; x-axis: frequency (Hz), 2900 to 3900. Left-click the mouse to adjust the volume; right-click to exit.]

Select two different critical bands and find the simultaneous masking of your ear; you can select a critical band by specifying ith_CB in the call [LinX,volumnY] = PM_Simu_Masking(ith_CB). Copy the two figures using Edit → Copy Figure from the menu bar of the figure window and paste them into your report. For each critical band, compare the masking at the higher neighboring frequencies with that at the lower ones.

2. Audio Extraction and MP3
MP3 (MPEG1 Audio Layer 3) is an audio coding/compression standard that uses perceptual coding technologies. In this part, we use GoldWave to extract a piece of music and compress it in MP3 format, and we observe how the psychoacoustic model works by comparing the frequency spectra of the raw signal and the MP3 file:
(a) CD Audio Extraction: Click Tools → CD audio extraction from GoldWave's menu bar; an audio extraction window will pop up. Use this tool to extract about ten seconds of music from your favorite music CD and save it as a 16-bit stereo signed WAV file. Name it original.wav.
(b) Downsampling: To compare with MP3, we need to downsample this WAV file. Click Effect → Resample and choose 16000Hz. Save this file as downsample.wav.
(c) Generate an MP3 file by File → Save As. Choose Save as type: Wave Audio, and adjust the parameters in the File Attributes to MPEG Layer 3, 32kbps, 16000Hz, stereo. Name this new file wav2mp3.wav.
(d) Convert MP3 to WAVE: Reload the wav2mp3 file and save it as a 16-bit stereo signed WAV file. Name it reconstructed_wav.wav.
(e) Compare Spectra: Use Compare_Spectrum.m to plot the spectra of downsample.wav and reconstructed_wav.wav. The calling convention is Compare_Spectrum(Original_Filename, Reconstructed_Filename, NFFTorder, Len_order, shift), where Original_Filename and Reconstructed_Filename are the file names of the original and reconstructed signals; NFFTorder sets the number of FFT points, NFFT = power(2,NFFTorder), e.g. NFFTorder = 9 for a 512-point FFT; Len_order sets the number of samples used for the spectrum, Len = power(2,Len_order); and shift is the start position for comparing the two signals. Compare the difference between the two audio clips and discuss on which frequency range the psychoacoustic model has a significant impact.
(f) Repeat (b)~(e) but change the MP3 settings to MPEG Layer 3, 56kbps, 24000Hz, stereo. Compare the result with the previous one.
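Conceptually, Compare_Spectrum.m just averages short-time FFT magnitude spectra of the two files and plots them together. The following is a rough sketch of that idea; it is not the provided script, and the FFT length and dB floor are our own choices.

[x, fs ] = wavread('downsample.wav');
[y, fsy] = wavread('reconstructed_wav.wav');
x = x(:,1); y = y(:,1);                 % compare the first channel
N = 512;                                % FFT length (our choice)
L = min(length(x), length(y));
X = zeros(N,1); Y = zeros(N,1);
nseg = floor(L/N);
for k = 1:nseg                          % average magnitude spectra over frames
    X = X + abs(fft(x((k-1)*N+1 : k*N)));
    Y = Y + abs(fft(y((k-1)*N+1 : k*N)));
end
f = (0:N/2-1)*fs/N;
plot(f, 20*log10(X(1:N/2)/nseg + eps), f, 20*log10(Y(1:N/2)/nseg + eps));
xlabel('Frequency (Hz)'); ylabel('Magnitude (dB)');
legend('downsampled original', 'MP3 reconstructed');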
Part II. Digital Audio Watermarking
1. Watermark Embedding, Detection, and Attack
Digital Watermarks: Thanks to advances in network communications and multimedia signal processing, exchanging and distributing multimedia has become easy and popular. The accompanying problem is how to protect the copyright and intellectual property of digital content; ownership/copyright protection and integrity verification are two key issues. Digital watermarking is a class of techniques that embed copyright and other protection information in multimedia. In this part, we use AudioMark Demo (http://www.alphatecltd.com/watermarking/audiomark/audiomark.html) to embed and detect a watermark in audio and to explore the weaknesses of this demo software.
(a) Embedding of a Digital Audio Watermark: Use Watermark → Cast to load sample.wav, and specify a KEY (the allowed range in the demo version is 100000000~100000100) and the filename of the watermarked file.
(b) Detection of a Digital Audio Watermark: Use Watermark → Detect and specify the key/seed; the detection module will determine whether the watermark is present.
(c) Attack on a Robust Audio Watermark: An adversary may modify a piece of watermarked multimedia to try to remove the watermark without degrading the perceptual quality too much. To increase the robustness of watermarking, designers should understand typical attacks and guard against them in advance. In this part, you are asked to attack the watermarked wave file using the tools provided by GoldWave and to test for the watermark in the attacked file using AudioMark Demo. Try to keep the quality of the music acceptable. Here are a few possible attacks:
(1) MP3 compression: choose mp3 in File → Save As.
(2) Echoing: add echo by Effect → Echo.
(3) Enhancement and filtering techniques, such as lowpass filtering, quantization, and equalization: Effect → Filter → Low/HighPass, Equalizer, or Effect → Resample.
(4) Noise addition: Tools → Expression Evaluator.
Develop an attack scheme that preserves reasonably good sound quality yet removes the watermark embedded by AudioMark Demo. Describe in your report under what conditions the watermark is destroyed.
2. Design Your Own Audio Watermarking Systems
There are three basic issues in designing an audio watermarking system:
• Transparency: The digital watermark should not degrade the perceptual quality of the signal.
• Robustness: For watermarks conveying the owner's rights, adversaries have an incentive to remove the watermark by modifying and attacking the watermarked audio, so the watermarks in such applications should be robust enough to survive a wide range of attacks. On the other hand, a watermark that is fragile to processing can be useful for detecting tampering, where the change in the watermark signals and locates the altered regions of a multimedia signal. As multimedia is often stored in compressed format for efficient storage and transmission, even a fragile watermark should usually be designed to sustain moderate compression.
• Capacity / Payload: In many applications it is desirable for the embedded watermarks to carry enough payload bits to represent various types of information.
In this section, you will design two digital audio watermarking systems and evaluate them in terms of transparency, robustness, and payload.
(a) System #1
One of the simplest approaches to "hiding" a message in audio is to convert the message into bits and put them into the least significant bits (LSBs) of the audio samples. To help the detector make a more reliable decision, you may repeatedly embed each message bit in a number of audio samples at the embedder's side and take a majority vote at the detector's side.
(1) Implement this LSB-based watermarking in Matlab to embed and detect the following message in the audio file sample.wav (a minimal sketch of the core step follows this subsection):
"(c)Spring 2003. DO NOT SELL. DO NOT TAMPER. Go Terps! "
Your implementation may include a message encoding function, watermark embedding and detection functions, and a message decoding function.
Note 1: sample.wav can be downloaded from the course webpage. It is a stereo audio file; for simplicity, you can embed the watermark in just one channel here.
Note 2: Matlab supports I/O of the WAVE file format; wavread.m and wavwrite.m read and write wave files, respectively. The format of sample.wav is signed 16-bit, i.e. the range of its signed integer representation is [-32768, 32767]. However, the values returned by wavread.m are stored as doubles in [-1, 1). To embed watermarks in the LSBs in the Matlab environment, you may find it convenient to convert the [-1, 1) values to 16-bit unsigned integer values in [0, 65535] using the formula (x+1)*2^15.
Note 3: The following Matlab built-in/toolbox functions may be helpful to your implementation: dec2bin( ), bin2dec( ), char( ), double( ).
Note 4: Try to implement your watermarking system in a flexible way that accommodates embedding in the nth LSB, n = 1, 2, 3, ...; see the instructions below regarding transparency and robustness.
(2) Transparency: Listen to the watermarked audio whose 1st LSBs carry your message. Does the watermark affect the quality of the audio? Change your embedding function (and correspondingly the detection) to put your message in the 2nd LSBs and answer the above question again. How about the 3rd and 4th LSBs?
(3) Robustness, Security, and Applications: How robust is the watermarking system that embeds the message in the 1st LSBs? How about the 2nd, 3rd, and 4th LSBs? How does the repetition count affect the robustness? Can an unauthorized person change the embedded message? Design tests and use the results to justify your answers. Note: You can use the bit error rate (BER), the percentage of bits that are incorrectly decoded, to measure the robustness.
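Below is a minimal sketch of the core embed/extract step for n = 1, following the Note 2 conventions. It has no repetition coding or message framing, the message shown is a placeholder, and marked_lsb.wav is a hypothetical output name.

% ----- embedding: put the message bits in the 1st LSBs of channel 1 -----
[x, fs, nbits] = wavread('sample.wav');          % stereo, signed 16-bit
xi = round((x(:,1) + 1) * 2^15);                 % map [-1,1) to integers in [0,65535]
msg  = double('Go Terps!');                      % placeholder message
bits = reshape(dec2bin(msg, 8).', [], 1) - '0';  % column of message bits, 8 per character
nb = length(bits);
xi(1:nb) = xi(1:nb) - mod(xi(1:nb), 2) + bits;   % force the LSBs to the message bits
wavwrite([xi/2^15 - 1, x(:,2)], fs, nbits, 'marked_lsb.wav');

% ----- detection: read the LSBs back and rebuild the characters -----
[y, fs, nbits] = wavread('marked_lsb.wav');
yi = round((y(:,1) + 1) * 2^15);
rbits = mod(yi(1:nb), 2);                        % recovered bit column
rmsg = char(bin2dec(char(reshape(rbits, 8, []).' + '0'))).'

Generalizing to the nth LSB amounts to clearing and setting bit n instead of bit 1, e.g. via bitget/bitset or the analogous mod arithmetic.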
(b) System #2
The watermark can be embedded either in the time domain or in a transform domain. In this section, we employ spread-spectrum embedding to put a watermark in the 1-D DCT domain.

[Figure: Embedder: a key/seed drives a noise-like sequence generator that produces the watermark W(j); the audio file is segmented (frame size L) and transformed by the DCT into V(j); the watermark is added in the mid-frequency region as V'(j) = V(j) + a(j)W(j), e.g. a(j) = 0.05|V(j)|; the inverse DCT of V'(j) yields the watermarked audio file.]

The basic procedure of spread-spectrum embedding is as follows:
• First, segment the audio file into non-overlapping frames of size L = 1024 and apply a 1-D DCT to each frame.
• Second, construct a noise-like vector w of length L as your watermark (for instance, through randn( )). Normalize the strength of your watermark; for example, use a random number generator that gives each element wi a variance of 1.
• For each frame of the audio, add the watermark according to v'i = vi + αi wi, i = 1, ..., L, where vi denotes an original coefficient and v'i its watermarked version. The scaling factor αi controls the strength of your overall watermark. Here we apply the following simple rule: set αi to zero except for the mid-frequency DCT coefficients (i.e. αi = 0 for i < T1 and i > T2, where the frequency thresholds T1 < T2 are determined by your experiments), and for each mid-frequency coefficient vi, set αi to 3-10% of |vi|; you can determine the exact setting empirically. As this simple rule shows, the watermark is embedded only in the mid-frequency DCT coefficients of an audio frame. This is because mid-frequency information is more important to the human auditory system than the other parts, so an adversary cannot attack this informative part without sacrificing too much audio quality, and the watermark is therefore more likely to survive. A more sophisticated choice of αi for the mid-frequency part should be guided by a human auditory model.
• After embedding, perform an inverse DCT to convert the signal back to the time domain, and clip the amplitude of the watermarked signal to the range of the original audio samples, [-1, +1]. Repeat the process for every frame.

Detector: We use the original unmarked audio file to help determine the existence of a specific watermark. The detector therefore knows the original audio file, the watermark sequence, and, of course, the watermarking method and all related parameters.

[Figure: Detector: the DCT of the audio file in question gives V'' and the DCT of the original audio file gives V; the mid-frequency coefficients of V'' - V are fed, together with a specific watermark W, into the watermark detector, which outputs the detection result based on similarity measures such as <V''-V, W>, <(V''-V)./a, W>, or the correlation coefficient.]

The basic detection procedure is as follows:
• Perform the DCT on both the original audio file and the audio file in question, and compute the difference between the corresponding elements of these two sets of DCT coefficients. We denote this difference vector by z.
• Retain only the elements of w and z that correspond to the mid-frequency part in which the embedder chose to embed the watermark. We denote the retained vectors by w(m) and z(m).
• Measure the similarity between w(m) and z(m) by computing their correlation coefficient. A high positive correlation coefficient indicates that, with high probability, the audio frame in question comes from adding w(m) to the original audio frame. You may also try to take the scaling factor a into account when measuring the similarity.
• Repeat the process for the other frames, and plot the correlation coefficients you obtain from all frames.

Here is your to-do list (a minimal per-frame sketch follows the list):
(1) Use Matlab scripts/functions to implement an embedder and a detector according to the procedures described above. Note: The following Matlab built-in/toolbox functions may be helpful to your implementation: dct( ), idct( ), randn( ), corrcoef( ).
(2) Generate two different spread-spectrum watermarks w1 and w2. Produce a watermarked audio file from sample.wav with w1 embedded and name it marked1.wav; produce a second watermarked audio file from sample.wav with w2 embedded and name it marked2.wav. Use your detector to determine whether w1 can be found in marked1.wav and marked2.wav, respectively. In a single plot of correlation coefficients vs. audio frames, include and compare your detection results for the two cases.
(3) Transparency and Robustness: Adjust the various parameters of your watermarking system (L, αi, T1 and T2) and examine their impact on transparency and robustness:
i. Listen to the watermarked audio. Does the watermark affect the quality of the audio?
ii. Add noise of different amplitudes to the watermarked audio. (You can use rand.m to generate uniformly distributed random sequences or randn.m for normally distributed ones; for example, noise = A*randn(1,100) gives 100 normally distributed random numbers with amplitude scale A.) Does the detector still find the watermark generated from the original user key?
iii. Use GoldWave to add echo to the watermarked audio. Try echoes with a short delay and small volume, and with a long delay and large volume, respectively. Observe the detection results.
iv. Use GoldWave to apply MP3 compression to the watermarked audio. Can your watermark survive the compression attack?
v. Discuss the tradeoff between transparency and robustness. Include in your report a watermarked audio signal using the parameter settings that you believe give the best tradeoff.
(4) Compare the transparency and robustness of the watermarks of the two systems investigated above. List their advantages, disadvantages, and potential applications. Discuss how to improve these two systems.
(5) Bonus: Extend your System #2 to hide a meaningful message, such as the message used in System #1. Hint: you can hide one bit per frame by adding the watermark to embed a bit "1" (v'i = vi + αi wi) and subtracting it to embed a bit "0" (v'i = vi - αi wi). You can reuse your message encoding and decoding functions from System #1. If needed, you can repeatedly embed the same bit in a few frames and take a majority vote at the detector's side.
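The following is a minimal per-frame sketch of the embedder and detector cores under assumed settings (L = 1024, a 5% scaling rule, illustrative thresholds T1 and T2, and a synthetic stand-in frame); it omits the file I/O and the loop over frames.

L = 1024; T1 = 100; T2 = 500;          % frame size and mid-band limits (illustrative)
randn('state', 0);
frame = 0.1*randn(L, 1);               % stand-in for one frame of audio samples
randn('state', 12345);                 % this seed plays the role of the user key
w = randn(L, 1);                       % noise-like watermark with unit variance

% ----- embedder: scale the watermark into the mid-frequency DCT band -----
v = dct(frame);
alpha = zeros(L, 1);
alpha(T1:T2) = 0.05 * abs(v(T1:T2));   % 5% rule, mid band only
framew = idct(v + alpha .* w);
framew = max(min(framew, 1), -1);      % clip to [-1, +1]

% ----- detector: correlate the DCT-domain difference with the watermark -----
z = dct(framew) - v;                   % the detector knows the original frame
rho = corrcoef(z(T1:T2), w(T1:T2));
rho(1,2)                               % large positive when this watermark is present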
Appendix of Part II for further exploration: Pseudo-Noise Sequences
Spread-spectrum watermarking uses a noise-like sequence w as the watermark. Such noise-like signals are less likely to introduce perceivable distortion than structured signals (such as a periodic square wave). In addition, they have good statistical properties that provide high noise resistance and help the detector make reliable decisions. In System #2, you experimented with a real-valued watermark generated by a Gaussian random number generator. Another, simpler choice of w is a binary periodic pseudo-noise sequence, or PN sequence for short (for details, refer to chapter 7 of Simon Haykin, Communication Systems, 4th edition, Wiley, 2000). A PN sequence can be generated by a series of shift registers with feedback logic, as shown in the figure below.

[Figure: PN sequence generator: p clocked flip-flops in series; a feedback logic block combines the register taps and feeds the first flip-flop; the last flip-flop provides the output sequence.]

The feedback logic can be expressed by a polynomial, f(X) = g1 X + g2 X^2 + ... + gp X^p, where X^i denotes the ith flip-flop and gi ∈ {0,1} controls the logic. When the clock triggers, the system shifts all of its values by one position, outputs one bit of the PN sequence, and feeds f(X) back into the first flip-flop. One implementation employs Galois Field prime polynomials in the feedback logic to obtain a maximal-length sequence (m-sequence for short); see W.W. Peterson and E.J. Weldon, Error-Correcting Codes, Cambridge: MIT Press, 1972. For a fixed order p, the prime polynomials can be obtained with the Matlab built-in function GFprimMrx = gfprimfd(p,'all'); each row of GFprimMrx represents a "key" for one user, with GFprimMrx(key,2:end) = [g1 g2 ... gp]. You may want to store the polynomials in a .mat file (with the save command), since searching for them is time-consuming. Setting p = 10 gives 60 possible polynomials, and each polynomial can generate 2^10 - 1 = 1023 different output sequences. You can generate a pseudo-random sequence of this kind with the provided Matlab function PNsequence.m, called as out_sequence = PNsequence(GFprimMrx(key,2:end), seed), given a polynomial and a seed (which initializes the shift register).
We can observe the auto-correlation and cross-correlation properties of an m-sequence with the following Matlab programs:

Auto-correlation property:

p = 10;                                 % register length
GFprimMrx = gfprimfd(p,'all');          % or load your saved .mat file
seq1_index = ?                          % specify your selected key here
seq1_poly = GFprimMrx(seq1_index,2:end)
seq1 = PNsequence(seq1_poly, bitget(1,1:p));
for seq2_index = 1:2^p-1
    seq2 = PNsequence(seq1_poly, bitget(seq2_index,1:p));
    [corrf] = PNcorrelation(seq1, seq2);
    plot(corrf);
    axis([0, length(corrf)+1, 0, max(corrf)+1]);
    pause;
end

Cross-correlation property:

p = 10;
GFprimMrx = gfprimfd(p,'all');
seq1_index = ?                          % specify your selected key here
NumGF = size(GFprimMrx,1);
maxTable = zeros(1,NumGF);
seq1_poly = GFprimMrx(seq1_index,2:end)
seq1 = PNsequence(seq1_poly, rand(1,length(seq1_poly)) >= 0.5);
for seq2_index = 1:NumGF
    seq2_poly = GFprimMrx(seq2_index,2:end);
    seq2 = PNsequence(seq2_poly, rand(1,length(seq2_poly)) >= 0.5);
    [corrf] = PNcorrelation(seq1, seq2);
    plot(corrf);
    axis([0, length(corrf)+1, 0, max(corrf)+1]);
    maxTable(seq2_index) = max(corrf);
end
plot(maxTable);
find(maxTable == max(maxTable))

Part III. Synthetic Audio (1): Musical Instrument Digital Interface (MIDI)
The Musical Instrument Digital Interface (MIDI; a useful tutorial: http://www.harmony-central.com/MIDI/Doc/tutorial.html) differs from digitally sampled audio such as PCM. A MIDI file can be thought of as instructions that tell a music synthesizer when to play and what notes to play, instead of sending a waveform to the speakers. This synthesis approach has several advantages; for instance, it requires much less storage space and much less bandwidth on the PC's I/O bus. In this part, we use Anvil Studio (http://www.anvilstudio.com/upgraden.htm) to study the MIDI protocol. As shown in the figure below, the software has several panels (from top to bottom): the play panel, track editor panel, stave panel, note editor panel, and keyboard panel.

[Figure: the Anvil Studio main window with its play, track editor, stave, note editor, and keyboard panels.]

(a) Load Sonata-c.mid by File → Open Song. Double-click the play button in the play panel. Then modify settings in the track editor panel, such as the channel and the instrument. The channel setting assigns an audio channel to each track, and the instrument setting chooses which instrument plays. (There are 16 logical channels and 128 instruments in the General MIDI (GM) system. The instrument numbers are standardized, so different music synthesizers will not play different instruments when reading the same instrument number from a MIDI file. This does not mean, however, that they will all play the same waveform: every synthesizer has its own way of generating any specified instrument note, usually either Frequency Modulation or a Wave Table, and the quality of the latter is much better than that of the former. Besides hardware synthesizers, there are also software synthesizers that generate and mix the waveforms; Microsoft GS Wavetable SW Synth, which adopts Roland instrument sounds, is a popular one under Windows. You can change the MIDI playback device, whether a hardware or a software synthesizer, and listen to the differences among them.)
(b) Notes in a MIDI file are numbered as shown in the following table (each note number is a 7-bit value, 0~127).
Music Notes
Octave    C   C#    D   D#    E    F   F#    G   G#    A   A#    B
  0       0    1    2    3    4    5    6    7    8    9   10   11
  1      12   13   14   15   16   17   18   19   20   21   22   23
  2      24   25   26   27   28   29   30   31   32   33   34   35
  3      36   37   38   39   40   41   42   43   44   45   46   47
  4      48   49   50   51   52   53   54   55   56   57   58   59
  5      60   61   62   63   64   65   66   67   68   69   70   71
  6      72   73   74   75   76   77   78   79   80   81   82   83
  7      84   85   86   87   88   89   90   91   92   93   94   95
  8      96   97   98   99  100  101  102  103  104  105  106  107
  9     108  109  110  111  112  113  114  115  116  117  118  119
 10     120  121  122  123  124  125  126  127

Anvil Studio provides a keyboard interface at the bottom of the program's window, which automatically translates the notes you key in into the note numbers above. Create a new MIDI file by File → New Song and use the keyboard panel to key in the following score. Save this MIDI file.

[Figure: the score to key in.]

(c) To check whether the MIDI file was generated successfully, you can use Windows Media Player to listen to it.
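As a cross-check of the table, a note number is simply 12*octave + semitone, with C = 0, C# = 1, ..., B = 11, and octaves numbered as in the table above. A tiny Matlab sketch of this relation; the function name is our own, not part of the lab files:

function note = midinote(octave, semitone)
% MIDINOTE  MIDI note number from the octave (0~10) and the
% semitone within the octave (0 = C, 1 = C#, ..., 11 = B).
% Save as midinote.m.
note = 12*octave + semitone;

For example, midinote(5, 0) returns 60, matching the C entry of octave 5 in the table.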
Part IV. Essays: Digital Rights Protection of Multimedia and Related Ethics Issues for Engineers
Intellectual Property (IP), such as copyright, plays an important role in contemporary society. Discuss the issues of IP rights protection for digital multimedia within your team, and interview your friends/family to see what they think about this. Write an essay reporting and summarizing your own and your friends' opinions.

Part V: Mobile Computing and Pocket PC Programming
In the above parts, we have learned the fundamentals of digital audio processing. Now design a simple Pocket PC application related to digital audio processing using the Microsoft eMbedded Tools. You can refer to "ENEE408G Multimedia Signal Processing Mobile Computing and Pocket PC Programming Manual" and extend the examples there.

Bonus Part I. Synthetic Audio (2): MPEG4 Structured Audio (MP4SA)
The MPEG4 synthetic audio coding standard consists of two methods, namely Structured Audio (SA) and Text-to-Speech (TTS). In the SA part, MPEG4 defines the Structured Audio Orchestra Language (SAOL) and the Structured Audio Score Language (SASL). (The MP4-SA language standard: http://www.cs.berkeley.edu/~lazzaro/sa/book/append/fdis/SA-FDIS.pdf; a useful online book explaining how to use these languages is The MPEG-4 Structured Audio Book by John Lazzaro and John Wawrzynek, http://www.cs.berkeley.edu/~lazzaro/sa/book/; you can also watch a short presentation on MP4-SA by John Wawrzynek at http://bmrc.berkeley.edu/bibs/instance?prog=1&group=13&inst=35.) Instead of using the frequency modulation and wavetable techniques of MIDI, SAOL encodes a sound signal according to its structure. This technique can achieve extremely high compression ratios, about 100:1~10,000:1. In this section, we explore these two languages, SAOL and SASL, using SPlay and SNet (both available at http://student-kmt.hku.nl/~saol/).
SAOL and SASL: The role of SAOL is sound modeling: it encodes algorithms describing how instruments generate sounds. SASL, on the other hand, is for sound sequencing, i.e. a timing table that instructs each instrument when and how to play notes. The figure below illustrates the framework. SAOL and SASL files are in plain text format; the encoder encodes the two files into a binary form known as the MP4 format. An MP4 file contains the structure of the instruments and the score of the music instead of digitized sample waveforms. The decoder converts the MP4 file into a C file and then compiles it into an executable audio file.

[Figure: MP4-SA framework: SAOL and SASL text files feed the encoder, which produces a .mp4 file; the decoder's MP4-to-C translator produces a .c file, which a C++ compiler turns into an executable audio file.]

SPlay: SPlay is a software program that implements a decoder. Download several MP4 files from http://student-kmt.hku.nl/~saol/ and listen to the results. Notice that the file sizes are quite small. The behavior of this audio player differs from that of a waveform player, such as the players for .wav and .mp3 files: since an MP4 player translates the MP4 file into a C file and compiles it, it takes more time than a waveform player.
(a) In this sub-section, we explore SAOL and SASL using SNet, a GUI wrapping the kernel sfront. You can refer to John Lazzaro's online book, The MPEG-4 Structured Audio Book, to learn more about how to use these languages.
(1) Read Part I of the online book, A Tutorial Introduction.
(2) Play the three examples (sine, vsine and vcsine) with SNet. First, copy the *.saol file and paste it into the SAOL tab, then save it. Next, copy the *.sasl file and paste it into the Score tab. Save them by Render → Render to .mp4.
(3) Use SPlay to play the MP4 files. Discuss the advantages and disadvantages of waveform coding (PCM) versus synthetic audio (e.g. MIDI and MP4-SA).

ENEE408G Multimedia Signal Processing (Spring'03)
Overview and Warm-up Exercises of Matlab Programming

1. Starting Matlab
Begin a Matlab session by clicking on its icon under the "Start Programs" menu. We will be using Matlab 6.5. Once it has started, you will find an interface like the one below (for more details, please refer to http://www.mathworks.com/access/helpdesk/help/techdoc/matlab.shtml):

[Figure: the Matlab 6.5 desktop.]

2. Matlab 6.5 Features
The new features of Matlab 6.5 include:
• Command Window: Use the Command Window to enter commands and run functions and M-files.
• Command History: The commands you enter in the Command Window are automatically logged in the Command History window. You can view previously used functions there, and copy and execute selected lines.
• Start Button and Launch Pad: MATLAB's Launch Pad provides easy access to tools, demos, and documentation.
• Help Browser: Use the Help browser to search and view documentation for all MathWorks products. The Help browser is a Web browser integrated into the MATLAB desktop that displays HTML documents. To open the Help browser, click the help button in the toolbar, or type "helpbrowser" in the Command Window.
• Current Directory Browser: MATLAB file operations use the current directory and the search path as reference points. Any file you want to run must be either in the current directory or on the search path. To search, view, open, or make changes to MATLAB-related directories and files, use the MATLAB Current Directory browser. Alternatively, you can use the commands dir, cd, and delete in the Command Window.
After starting Matlab, please change the current directory to your own working directory rather than staying in the default Matlab directories. Otherwise, you could accidentally overwrite some important files!
• Workspace Browser: The MATLAB workspace consists of the variables built up during a MATLAB session. Variables are added to the current workspace when you use functions, run M-files, or load previously saved workspaces. To view the workspace and information about each variable, use the Workspace browser, or use the commands "who" and "whos".
• Array Editor: Double-click a variable in the Workspace browser to see it in the Array Editor. Use the Array Editor to view and edit a visual representation of one- or two-dimensional numeric arrays, strings, and cell arrays of strings that are in the workspace.
• Editor/Debugger: Use the Editor/Debugger to create and debug M-files, the Matlab programs you write. The Editor/Debugger provides a graphical user interface for basic text editing as well as for M-file debugging. You can use any other text editor to create M-files, such as Emacs or Notepad, and use Preferences (accessible from the Matlab desktop's File menu) to specify that editor as the default. If you choose another editor, you can still use the MATLAB Editor/Debugger for debugging, or you can use debugging functions such as "dbstop", which sets a breakpoint. If you just need to view the contents of an M-file, you can display it in the Command Window with the "type" command.

3. General Comments
Since most of you are already familiar with Matlab from previous courses, this section provides only a brief review of a few important points of using Matlab.
• General Philosophy: Matlab has become a popular software tool for linear algebra, numerical analysis, and visualization. Much of its power lies in its highly optimized operations on vectors and matrices. In many cases, you should be able to eliminate the "for" loops you used to write in C code with Matlab's simple and fast vectorized syntax. (If you are using MATLAB 6.5 or higher, before spending time vectorizing your code, please refer to http://www.mathworks.com/access/helpdesk/help/techdoc/matlab_prog/ch7_per6.shtml#773530; you may be able to speed up your program considerably by using the new MATLAB JIT Accelerator.) Here is an example that computes the cosine of 10001 values ranging from 0 to 10:

i = 0;
for t = 0:.001:10
    i = i + 1;
    y(i) = cos(t);
end

A vectorized version of the same code is:

t = 0:.001:10;
y = cos(t);

It is important to vectorize whenever possible, since "for" loops are not optimized (in MATLAB 6.1 or lower) and can be very slow. A second way to improve execution time is to preallocate the arrays that store output results. Here is an example that preallocates a row vector with 100 elements:

y = zeros(1,100);

Preallocation makes it unnecessary for MATLAB to resize an array each time you enlarge it, and it also helps reduce memory fragmentation when you work with large matrices. Another tip for speeding up performance is to implement your code as a function rather than a script. Every time a script is used in MATLAB, it is loaded into memory and evaluated one line at a time. Functions, on the other hand, are compiled into pseudo-code and loaded into memory all at once, so repeated calls to the function can be faster.
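You can see the difference yourself by timing both versions with tic/toc (a rough sketch; actual timings vary by machine and Matlab version):

tic;
i = 0;
for t = 0:.001:10
    i = i + 1;
    y1(i) = cos(t);          % loop version, no preallocation
end
loop_time = toc

tic;
t = 0:.001:10;
y2 = cos(t);                 % vectorized version
vectorized_time = toc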
• Help: Type "help function_name" in the Matlab Command Window to get help on a specific function. For a nicer interface to the help utility, click the question mark button at the top of the Matlab Command Window, as explained above; or, from the Help menu, choose "Help Desk" to get to the online HTML help. From the Help Desk, you can search the online documentation for help on accomplishing tasks for which you may not know the specific function names. You can also try the "lookfor" command in the Matlab Command Window.
• Saving/Loading Data: Matlab data can be stored in ".mat" files on your disk. To save your whole workspace in a file called "filename.mat", use "save filename". To save one particular variable called "variable_name" into "filename.mat", type "save filename variable_name". Saving will overwrite whatever filename you specify; to append instead of overwrite, use "save filename variable_name -append". Note that the saved files are in a special binary format, and hence unreadable by other applications. You can use "save ... -ascii" to save your workspace as a text file. See "help save" for more details. To load a workspace from a "filename.mat" file, type "load filename". Again, you can load specific variables using "load filename variable_name". See "help load" for more details.
• Writing Matlab Programs: Procedures that you call repeatedly can be stored either as functions or as scripts.
o Scripts do not accept input arguments or return output arguments. They operate on data in the workspace.
o Functions can accept input arguments and return output arguments. Internal variables are local to the function.
Both are created by writing ".m" files in your favorite text editor and storing them in your working directory. When you invoke a script, MATLAB simply executes the commands found in the file. Scripts operate on existing data in your workspace, and they can create new data on which to operate. Although scripts do not return output arguments, any variables they create remain in the workspace and can be used in subsequent computations. Functions are M-files that can accept input arguments and return output arguments; the name of the M-file and of the function should be the same. Functions operate on variables within their own workspace, separate from the main workspace you access at the MATLAB Command Window. Here is an example function file, MSE2PSNR.m:

function PSNR=MSE2PSNR(MSE)
% Convert mean square error (MSE) into peak signal to noise ratio (PSNR)
% Input: MSE
% Output: PSNR
%
% Author: Guan-Ming Su
% Date: 8/31/02
A=255*255./MSE;
PSNR=10*log10(A);

A function consists of three parts:
(1) The first line of a function M-file starts with the keyword "function" and has to include a statement of the form function [out1, ..., outN] = function_name(input1, input2, ..., inputN). The final values of the variables out1, ..., outN are automatically returned once the function execution is finished.
(2) The next several lines starting with "%", up to the first blank or executable line, are comment lines that provide the HELP text. In other words, these lines are printed when you type "help MSE2PSNR" in the Matlab Command Window. In addition, the first line of the HELP text is the H1 line, which MATLAB displays when you use the "lookfor" command or request help on a directory.
(3) The rest of the file is the executable MATLAB code defining the function. The variable A introduced in the body of the function, as well as the variables MSE and PSNR on the first line, are all local to the function; they are separate from any variables in the MATLAB workspace.
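A quick sanity check of the function above, with an illustrative input value:

MSE = 100;
PSNR = MSE2PSNR(MSE)     % 10*log10(255^2/100), about 28.13 dB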
• Audio Representation and Playback: Matlab supports the multi-channel wave format with up to 16 bits per sample. To load a wave file, you can use "[Y, Fs, Nbits] = wavread(wave_filename)", where wave_filename is the name of the wave file, Y is the sampled data with dimensions (number of samples) × (number of channels), Fs (in Hz) is the sampling rate, and Nbits is the number of bits per sample used to encode the data in the file. Amplitude values in Y are normalized to the range [-1, +1] according to the formula Y = X / 2^(Nbits-1) - 1, where X is the original unsigned Nbits-bit integer representation. For instance, when X is 128 with Nbits = 8, Y is 0. To generate a wave file and store it on the hard disk, you can use "wavwrite(Y, Fs, Nbits, wave_filename)". To play back the signal in vector Y, use "sound(Y, Fs, Nbits)".
• Image Representation and Display: Many Matlab variables are matrices (arrays) with double-precision, possibly complex, entries. A gray-scale image is just an array with real entries ranging from 0 to 1. Many applications also use 1 byte (i.e., 256 representative levels, from 0 to 255) to represent a pixel value; this saves storage space and speeds up computation. We will use both representations in our lab assignments. To display a grayscale image, use "imshow(image_name, [Low High])", where the value Low (and any value less than Low) is displayed as black, the value High (and any value greater than High) as white, and values in between as intermediate shades of gray. An image stored in the "uint8" (unsigned 8-bit integer) data type can be displayed directly by "imshow(image_name)". See "help imshow" for details. If you are to display a transformed image, such as a DCT or DFT, you may want to zero out the DC value beforehand to make it easier to see the AC values, which are usually a few orders of magnitude smaller than the DC. You could also try viewing the log of the DCT, or brightening the color map with "brighten".
• Matlab Tips: In addition to such basic commands as for, end, if, while, ";", "==", and "=", you may find the following functions useful in this course:
o fft2 – a two-dimensional Fast Fourier Transform routine. Make sure to use this command for 2-D images, not the one-dimensional transform "fft"! See the helps of these two commands to understand the difference.
o Ctrl-C – stops execution of any command that went awry.
o clear all – removes all variables, globals, functions and MEX links.
o close all – closes all the open figure windows.
o max – for vectors, "max(X)" is the largest element in X; for matrices, it gives a vector containing the maximum element of each column. To get the maximum element of the entire matrix X, try "max(X(:))". Other functions such as min, mean, and median can be used in a similar manner.
o abs – "abs(X)" gives the absolute values of the elements of X.
o Indexing – use the colon operator and the "end" keyword to get at the entries of a matrix easily. For example, to get every 5th element of a vector a, use "a(1:5:end)". See "help colon" and "help end".
o strcat(S1,S2,S3,...) – concatenates the corresponding rows of the character arrays S1, S2, S3, etc.
o num2str(X) – converts a matrix X into a string representation with about 4 digits and an exponent if necessary.
This is useful for labeling plots.
o flipud and fliplr – flip a matrix in the up/down direction and in the left/right direction, respectively.
o tic and toc – "tic" starts a stopwatch timer and "toc" reads it. They are useful for monitoring the execution time of your program.
o waitbar – displays the progress of your program when you use loops, such as a for loop.
o pause – causes a procedure to stop and wait for the user to strike any key before continuing.
o find – finds the indices of nonzero elements.
o B = repmat(A,M,N) – replicates and tiles the matrix A to produce the M-by-N block matrix B. For example, with A = [1 2 3], the result of repmat(A,3,2) is [1 2 3 1 2 3; 1 2 3 1 2 3; 1 2 3 1 2 3].
o Deleting rows and columns – you can delete a row or column by assigning it an empty pair of square brackets. For example, with A = [1 2 3; 4 5 6; 7 8 9], we can delete the second column by "A(:,2)=[];". The resulting matrix A is [1 3; 4 6; 7 9].

4. Examples on Digital Audio Processing
We use a few examples related to audio and image processing as warm-up exercises before we get into the design lab section. Start Matlab and type "edit" in the command window; you will see the M-file Editor. We will use this editor to write M-files. (Type the code lines below into the M-file Editor; you can name the file MatlabReview.m and execute it by typing MatlabReview in the Matlab command window.)
• Read, Playback and Visualize an Audio Signal
(1) Download the "symphonic.wav" audio file from the course web site. Make sure it is in your working directory.
(2) You can read an audio file into a matrix using the function wavread:

[Music, Fs, Nbits] = wavread('symphonic.wav');

(3) To obtain the dimensions of this audio matrix, type

[MusicLength, NumChannel] = size(Music);

The function size returns the number of samples and the number of channels of this audio file in MusicLength and NumChannel.
(4) To play back this audio matrix, type

sound(Music, Fs, Nbits);

Make sure your speaker or earphone is on.
(5) We can visualize the waveform by typing:

Display_start = 1;
Display_end = MusicLength;
subplot(2,1,1);
plot(Music(Display_start:Display_end, 1));
title('First channel');
subplot(2,1,2);
plot(Music(Display_start:Display_end, 2));
title('Second channel');

You can adjust the display range by changing Display_start and Display_end.
• Bits Manipulation
(1) Convert the double-valued Music into an unsigned integer representation with Nbits bits:

IntMusic = (Music+1)*power(2,Nbits-1);

(2) Many music files use 16 bits to represent an audio sample; in this case, Nbits = 16. We can extract the lower byte of the first channel:

LowNbits = Nbits/2;
LowIntMusicCh1 = zeros(MusicLength,1);
FirstChannel = IntMusic(:,1);    % extract the first channel
for ibit = 1:1:LowNbits
    LowIntMusicCh1 = LowIntMusicCh1 + bitget(FirstChannel, ibit)*power(2,ibit-1);
end

(3) Convert the unsigned integers back to the normalized representation and listen to the result:

LowRecMusicCh1 = LowIntMusicCh1/power(2,LowNbits-1) - 1;
sound(LowRecMusicCh1, Fs, LowNbits);

(4) Repeat the procedure for the second channel and store the final result in LowRecMusicCh2. What do you hear?
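Step (4) mirrors steps (2)-(3) with the second channel; a compact sketch, continuing the variables defined above:

LowIntMusicCh2 = zeros(MusicLength,1);
SecondChannel = IntMusic(:,2);       % extract the second channel
for ibit = 1:1:LowNbits
    LowIntMusicCh2 = LowIntMusicCh2 + bitget(SecondChannel, ibit)*power(2,ibit-1);
end
LowRecMusicCh2 = LowIntMusicCh2/power(2,LowNbits-1) - 1;
sound(LowRecMusicCh2, Fs, LowNbits);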
• Window-based Audio Processing
(1) Many audio/speech processing techniques divide the whole sequence of audio samples into segments before performing further processing, as depicted in the figure below. A sliding window extracts a set of data from the original data stream, and adjacent windows overlap each other. In the example in the figure, the window size is 6 and the overlap is 2: the first six samples are extracted as the first segment, and the start point of the second segment is the 5th sample.

[Figure: sliding-window segmentation of a data stream into first, second, and third segments, annotated with the length of the window and the length of the overlap.]

(2) In this section, we segment the lower bytes of the second channel, LowRecMusicCh2, with window size 2^15 and zero overlap, and re-order the samples in each window using "flipud":

LenWindow = power(2,15);    % length of window
LenOverlap = 0;             % length of overlap
LenNonOverlap = LenWindow - LenOverlap;
NumSegCh2 = floor((length(LowRecMusicCh2)-LenWindow)/LenNonOverlap) + 1;
for iseg = 1:1:NumSegCh2
    seg_start = (iseg-1)*LenNonOverlap + 1;   % start point of current window
    seg_end = seg_start + LenWindow - 1;      % end point of current window
    LowRecMusicCh2(seg_start:seg_end) = ...
        flipud(LowRecMusicCh2(seg_start:seg_end));
end
sound(LowRecMusicCh2, Fs, LowNbits);

Can you hear the "secrets"?

5. Examples on Digital Image Processing
• Read and Display an Image File
(1) Download the "CuteBaboon.bmp" image file from the course web site. Make sure it is in your working directory.
(2) You can read an image file into an Im matrix using the function imread:

Im = imread('CuteBaboon.bmp');

(3) To obtain the dimensions of this image, type

[height, width] = size(Im);

The function size returns the dimensions of the 2-D matrix Im in height and width.
(4) To display this image, type

imshow(Im, [0 255]);

Since each pixel in this image has 8 bits, the gray-level range is between 0 and 255. We specify this range as the second argument of the imshow function.
• Bit Planes
As mentioned above, each pixel is represented using 8 bits. We can put together the most significant bit (MSB) of each pixel to form the first bit plane, as shown in the figure below. Similarly, we can extract the second MSB of each pixel to form the second bit plane, and so forth.

[Figure: bit-plane decomposition: a height × width array of 8-bit pixel values (e.g. 131(10) = 10000011(2)) split into its first (MSB) and second bit planes.]

(1) Extract the MSB bit plane using Matlab's library function "bitget":

msbIm = bitget(Im, 8);

(2) Display this bit plane:

imshow(msbIm, [0 1]);

Note: since each bit is represented by 0 or 1, the value range of a bit plane is [0 1].
(3) Add your own code here to observe the other bit planes. As you go from the MSB to the LSB (least significant bit), you will find that the bit planes become more noise-like.
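For step (3), one possible way to view all eight bit planes at once (the subplot layout is our own choice):

figure;
for ibit = 8:-1:1                    % from MSB (bit 8) down to LSB (bit 1)
    plane = bitget(Im, ibit);
    subplot(2, 4, 9-ibit);
    imshow(plane, [0 1]);
    title(['bit ' num2str(ibit)]);
end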
• Image Rotation
We can rotate images by treating them as 2-D matrices and using Matlab's built-in matrix operations. The following table shows the relation between the Matlab commands and the resulting rotations:

Command:   Im          fliplr(Im')             flipud(Im')                    flipud(fliplr(Im))
Result:    original    rotated 90° clockwise   rotated 90° counterclockwise   rotated 180°

Let's try image rotation on the 2nd LSB bit plane of Im.
(1) Extract the 2nd LSB bit plane of Im:

lsb2Im = bitget(Im, 2);
imshow(lsb2Im, [0 1]);

(2) Rotate each 16x16 block:

bs = 16;
for i = 1:1:height/bs
    for j = 1:1:width/bs
        lsb2Im((i-1)*bs+1:1:i*bs, (j-1)*bs+1:1:j*bs) = ...
            flipud(lsb2Im((i-1)*bs+1:1:i*bs, (j-1)*bs+1:1:j*bs)');
    end
end

(3) Display your result and see what "secret" message you can discover:

imshow(lsb2Im, [0 1]);

• Image Down-Sampling
There are many ways to down-sample an image to reduce its size. The basic idea is to use one pixel to represent a small, usually square, area.
(1) Use the top-left pixel of each square area to represent that area:

sIm = Im(1:8:end, 1:8:end);
imshow(sIm, [0 255]);

(2) Take the average of each square area. Here we use a function call:

meansIm = mean_subfunction(Im);

You can create this subfunction by File | New, name it mean_subfunction.m, and type in the following Matlab code. (You can replace the "Matlab Review" comment with your own description of the purpose and usage of this function; once you are done, you will see that comment by typing "help mean_subfunction" in the Matlab Command Window.)

function Y=mean_subfunction(X)
% Matlab Review
[row col]=size(X);
Y=zeros(row/8, col/8);
for i=1:1:row/8
    for j=1:1:col/8
        Y(i,j)=mean2(X((i-1)*8+1:1:i*8, (j-1)*8+1:1:j*8));
    end
end

Observe the downsampled image:

imshow(meansIm, [0 255]);

(3) Compare the results of (1) and (2). Which is the better way to down-sample?
• Histogram
A histogram of a grayscale image shows the luminance distribution of the image: it gives the statistics of how many pixels (or what percentage of pixels) the image has at each gray level. Although the Matlab Image toolbox already provides histogram functions, we write our own version here for the purpose of practicing Matlab.
(1) The find function is very powerful and can make your Matlab program elegant if used properly: I = find(X) returns the indices of the vector X that are non-zero.
(2) The length function returns the number of elements in a vector.
(3) We can use find and length together to obtain the number of pixels that have a specific luminance value igray:

histogram = zeros(1,256);
for igray = 0:1:255
    histogram(igray+1) = length(find(Im==igray));
end
plot(histogram);
axis([0 255 min(histogram) max(histogram)]);
title('Histogram');
xlabel('Luminance value');
ylabel('Number of pixels');

6. Assignments
(1) Audio Steganography
The word "steganography" comes from Greek and means "secret writing". Here we apply steganography to audio, hiding a secret and inaudible message in a music file.

[Figure: audio steganography: the LSBs of eight consecutive host samples y_{8(k-1)+1}, ..., y_{8k} carry the eight bits of one hidden sample z_k.]

As illustrated in the figure above, a secret audio message {z_k} is put in the least significant bits of samples from a host music signal to produce a stego-ed audio signal, {y_i}.
We can extract the hidden message {z_k} from {y_i} according to the following formula:

z_k = sum_{i=8(k-1)+1}^{8k} LSB(y_i) * 2^(i-8(k-1)-1),  for k = 1, ..., N/8,

where LSB(y_i) denotes the operator that extracts the least significant bit of sample y_i, and N is the number of samples in {y_i}. Download "guitar.wav" from the course web site and write a simple M-file to extract the hidden message {z_k}.
(2) Image Up-Sampling (Enlargement)
The simplest way to implement image up-sampling is replication: to up-sample an MxN image to 2Mx2N, each original pixel is expanded into four pixels of the same value, arranged in a 2x2 block. Here is an example:

A = [1 2        LA = [1 1 2 2
     3 4]             1 1 2 2
                      3 3 4 4
                      3 3 4 4]

Download the "Girl.bmp" image file from the course web site and write a simple M-file to enlarge it without using any "for" loops. Hint: you may find Matlab's built-in Kronecker product function kron useful. If A and B are M1 x M2 and N1 x N2 matrices, respectively, then their Kronecker product is defined as

A ⊗ B ≡ {a(m,n)B} = [ a(1,1)B   ...  a(1,M2)B
                       ...       ...  ...
                       a(M1,1)B  ...  a(M1,M2)B ]

(3) Erroneously Indexed Image
The image "Noise.bmp" is an erroneously indexed image caused by some unknown process. The only thing we know so far is the relation between the original values and the erroneously indexed values. If we write the original luminance (8 bits) in binary form as

oriL = a7*2^7 + a6*2^6 + a5*2^5 + a4*2^4 + a3*2^3 + a2*2^2 + a1*2^1 + a0*2^0,

then the erroneously indexed value can be expressed as

errL = a0*2^7 + a1*2^6 + a2*2^5 + a3*2^4 + a4*2^3 + a5*2^2 + a6*2^1 + a7*2^0,

i.e. the bits are reversed. Download the "Noise.bmp" image file and recover it.
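Two quick sanity checks of the definitions above, using only values already shown (they verify the formulas rather than solve the assignments):

% Assignment (2): kron reproduces the 2x enlargement of the example matrix
A = [1 2; 3 4];
LA = kron(A, ones(2))                          % equals the 4x4 LA shown above

% Assignment (3): the erroneous index is the bit-reversed luminance
oriL = 131;                                    % 10000011 in binary
errL = sum(bitget(oriL, 1:8) .* 2.^(7:-1:0))   % gives 193, i.e. 11000001
% applying the same reversal to errL returns 131, since bit reversal is an involution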