
ENEE408G Multimedia Signal Processing
Lab Manual on Image, Video, Audio and Speech
K. J. Ray Liu
Min Wu
Guan-Ming Su
Department of Electrical and Computer Engineering
University of Maryland, College Park
Last Updated: Spring 2003
© Copyright 2003.
All rights reserved.
ENEE408G Multimedia Signal Processing
Design Project on Image Processing and Digital Photography
The Goals:
1. Understand the fundamentals of digital image processing.
2. Learn how to enhance image quality and how to compress images.
3. Explore artistic techniques by using digital image processing.
Note: The [report icon] means to put your discussion, flowchart, block diagram, or plots
in your report. The symbol ‘ indicates that you should put the multimedia data in
your report. Save images in BMP format unless otherwise stated. The [code icon]
means to put your source codes (Matlab, Basic, or C/C++) in your report.
Part I. Color Coordinate
Several color coordinate systems are commonly used in practice. Each coordinate system
represents a color space consisting of several components and has its own special
purpose. In this section, we explore three color coordinate systems: RGB (Red, Green,
and Blue), HSL (Hue, Saturation, and Lightness; closely related to HSV, where V
represents the brightness value), and YUV.
1. Separate the three components of a color image using Paint Shop Pro.
(a) Open the Flower.bmp file using Paint Shop Pro. By Colors → Split Channel →
Split to RGB, we can split a color image into its red, green, and blue
components. What can you observe from these three images?
(b) Choose the red component and adjust its value by Colors → Adjust →
Brightness/Contrast. Set the Brightness to 75% and the Contrast to 0%. After
this, combine the new red component and the original green and blue
components using the Combine RGB dialog box obtained from Colors →
Combine Channel → Combine from RGB.
Observe and store this combined image. ‘
(c) Repeat procedures (a) and (b), but use the green component instead of the
red one. ‘
(d) Repeat procedures (a) and (b), but use the blue component instead of the red
one. ‘
(e) Compare these three new images and the original image. Describe what you
observe.
2. Repeat 1(a)~1(d), but change the color coordinate to HSL using Colors →
Split Channel → Split to HSL and Colors → Combine Channel → Combine from
HSL. Save these new images ‘. Discuss the role that each component plays in
this color coordinate.
3. Explore the YCbCr (YUV) color coordinate using Matlab.
(a) Write a Matlab script using the following procedures.
(i) RGB to YCbCr: Use imread.m and rgb2ycbcr.m to separate the
Flower.bmp into Y, Cb and Cr.
(ii) Downsampling: Use imresize.m to downsample the Cb and Cr components in
each dimension by factors of 1.5, 2, 4, 6, and 8.
(iii) Upsampling: Use imresize.m to upsample the downsampled Cb and Cr
components to their original sizes.
(iv) YCbCr to RGB: Combine Y and these two new components, Cb and Cr,
and transform back to RGB color coordinate by ycbcr2rgb.m. Display and
save these new images using imshow.m and imwrite.m, respectively. ‘
(b) Explain what you observe and discuss the advantages of the color coordinate
transform from RGB to YUV.
(c) Compare the RGB and YUV representations. In what situations would you adopt
each color representation?
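A minimal Matlab sketch for 3(a), assuming Flower.bmp sits in the current directory (the output file names are illustrative):

rgb = imread('Flower.bmp');
ycc = rgb2ycbcr(rgb);
Y = ycc(:,:,1); Cb = ycc(:,:,2); Cr = ycc(:,:,3);
for factor = [1.5 2 4 6 8]
    % downsample the chrominance channels, then return them to full size
    CbUp = imresize(imresize(Cb, 1/factor), size(Cb));
    CrUp = imresize(imresize(Cr, 1/factor), size(Cr));
    out = ycbcr2rgb(cat(3, Y, CbUp, CrUp));
    imshow(out);
    imwrite(out, sprintf('Flower_CbCr_%g.bmp', factor));  % illustrative name
end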
4. Blue Background Extraction
On TV weather news, we often see weather forecasters standing in front of
maps or Doppler radar images. In traditional news production, they simply stand in
front of a blue curtain. A special camera system extracts the forecaster's image
and superimposes it on the weather-related images. The basic idea of
this camera system is to split an image into its RGB channels, create a mask based on
the information in the blue channel, and use this mask to extract the foreground. In this
subsection, develop your own scheme and use Paint Shop Pro to extract the flower
object from BlueBG.bmp and add this flower object in front of another image
Sand.jpg.
(a) Draw a block diagram of your scheme and describe the procedures you
followed with Paint Shop Pro.
(b) Save the final image of the flower with sand. ‘
Part II. Image Enhancement
An image can be corrupted during transmission or captured under poor conditions. We
can use image enhancement techniques to enhance certain features and improve the visual
quality of an image.
1. Histogram: The histogram of a gray image refers to the distribution of its luminance. In
this part we examine the histogram of the gray-scale Lena image using Paint Shop
Pro. After opening Lena.bmp, click the Histogram icon, and you will see a Histogram
Window.
(a) Histogram Equalization: Histogram equalization is an image enhancement
technique that makes the histogram more uniform. Use Colors → Histogram
Functions → Equalize to perform histogram equalization. Apply this
technique to Lena.bmp and LenaDark.bmp. Use the Histogram Window to
observe the histogram before and after equalization. Save the equalized images
‘ and record the mean and median values of both the original and equalized
images.
(b) Histogram Stretch: Histogram stretch is a technique that stretches the original
distribution over the full range of gray levels while maintaining the overall trend
of the original distribution. Use Colors → Histogram Functions → Stretch to
perform histogram stretch. Save this stretched image ‘. Compare the
stretched results of Lena.bmp and LenaDark.bmp with (a).
2. Histogram Adjustment: In some situations, we would like to emphasize a specific
band of gray-level pixels. We can shape the histogram of an image to a desired
histogram using the Histogram Adjustment dialog box in Paint Shop Pro.
(a) Open Lena.bmp. Use Colors → Histogram Functions → Histogram
Adjustment, and you will see the Histogram Adjustment dialog box. Adjust
the values of Midtones Compress¹ and Gamma² and observe the adjusted
image.
(b) We have already used histogram equalization and stretch on LenaDark.bmp.
In this subsection, use the Histogram Adjustment dialog box to reshape the
histogram of LenaDark.bmp. Save this reshaped image ‘ and record the
values of the parameters you used. Compare your result with the results of
histogram equalization and stretch.
3. Image Sharpening: The goal of image sharpening is to enhance some details or
blurred regions of images.
(a) Open LenaBlur.bmp using Paint Shop Pro. Use Effects → Sharpen → Sharpen
and Sharpen More from the Paint Shop Pro menu bar to improve the quality of
this image. Observe the effect and save the sharpened image ‘. Discuss
whether a blurred image can be completely recovered.
¹ Midtone Compress is a non-linear operation mapping the original gray-level image onto another scale.
The higher the value you choose, the more gray-level pixels are mapped into the middle band. If you
select a large negative value, it expands the original mid-band pixels over a wide gray-scale range.
² The response of photographic film is non-linear and can be written as d = γ·log₁₀(w) − d₀, where w is the
incident light intensity, d is the optical density (the response on the film for w), and γ is called the
gamma of the film. A similar nonlinear response is also associated with visual displays. We can
compensate for the non-linear response by an inverse procedure with γ.
(b) One approach to image sharpening is to use a spatial high boost
filter. Paint Shop Pro provides User Defined Filters under the Effects menu.
In this dialog box, you can create your own filter by choosing New from the User
Defined Filters window. An Edit User Defined Filter window will pop up,
as shown in the following figure. Key in the parameters shown below
and apply the filter to LenaBlur.bmp. Save this sharpened image ‘.
(c) In general, we can create an m x m spatial high boost filter as follows:
 −1 "
#


−1 −1
1 
−1 w
m2 

−1 −1

#
 −1 "

Where w = A × m 2 − 1
−1
−1
−1
and
" −1
# 





#
" −1
A ≥1
Generate a 5×5 and a 7×7 high boost filter to improve LenaBlur.bmp and
save the improved images ‘. Record the m and A you used in each
case and compare the resulting image quality with (b).
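If you also want to verify the filter in Matlab (an optional cross-check, not required by the exercise), a minimal sketch with assumed example values m = 5 and A = 2 is:

im = double(imread('LenaBlur.bmp'));
m = 5; A = 2;                                  % assumed example values
H = -ones(m) / m^2;                            % every coefficient is -1/m^2 ...
H(ceil(m/2), ceil(m/2)) = (A*m^2 - 1) / m^2;   % ... except the center w/m^2
sharp = filter2(H, im);
sharp = uint8(min(max(sharp, 0), 255));        % clip to [0, 255]
imwrite(sharp, 'LenaBlur_boost5.bmp');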
4. Noise Cleaning: An image can become noisy in situations such as transmission
through a noisy channel. To recover the original perceptual quality of the image, we
can perform noise cleaning. Two noise models are widely used in
image processing: Gaussian noise and salt-and-pepper noise. In this subsection, we
use two approaches, namely an average filter (spatial low pass filter) and a median
filter, to remove these two kinds of noise. (Note: BoatPxx.tif represents an image
that suffers from salt-and-pepper noise, and BoatGxx.tif one that suffers from
Gaussian noise.)
(a) Clean Salt-and-Pepper Noise using the Average Filter: Open BoatP05.tif,
BoatP25.tif, and BoatP50.tif using Paint Shop Pro. Apply Effects → Blur →
Average. Set the Filter Aperture to 3 and save the filtered images ‘.
UMD ENEE408G Spring 2003
Design Project -Image
5
(b) Clean Salt-and-Pepper Noise using the Median Filter: Open BoatP05.tif,
BoatP25.tif, and BoatP50.tif. Apply Effects → Noise → Median Filter. You
can adjust the Filter Aperture to enhance the photo. Save the filtered images.
‘
(c) Clean Gaussian Noise using the Average Filter: Open BoatG01.tif, BoatG10.tif,
and BoatG20.tif, and use the average filter to clean up the noise. Save the filtered
images. ‘
(d) Clean Gaussian Noise using the Median Filter: Open BoatG01.tif, BoatG10.tif,
and BoatG20.tif, and use the median filter to clean up the noise. Save the filtered
images. ‘
(e) Observe your results from (a) to (d). For each type of noise, explain
which filter cleans it up better.
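If you want to cross-check the two filters in Matlab (assuming the Image Processing Toolbox), a minimal sketch is:

noisy = imread('BoatP05.tif');
avg3 = filter2(fspecial('average', 3), double(noisy));  % 3x3 average (low pass) filter
med3 = medfilt2(noisy, [3 3]);                          % 3x3 median filter
figure; imshow(uint8(avg3)); figure; imshow(med3);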
5. Edge Detection: Edges are local discontinuities in luminance. Since edges
indicate the physical extent of an object, edge detection plays an important role in
computer vision. In this part, we experiment with a few edge detection algorithms.
(a) We can detect edges using Paint Shop Pro's built-in operators. Apply
Effects → Edge → Find All to Pepper.bmp and Baboon.bmp. Save these
images. ‘
(b) There are several spatial edge detectors that we can study through the User
Defined Filters mentioned earlier in 3(c). Apply the following six filters to
Pepper.bmp and Baboon.bmp and use Image → Arithmetic (Add) to combine
the corresponding anti-diagonal-direction and diagonal-direction (or row-direction
and column-direction) results. Save these images. ‘
Filter type: Roberts
  Anti-diagonal-direction filter:       Diagonal-direction filter:
    [ 0  0 −1 ]                           [ −1  0  0 ]
    [ 0  1  0 ]                           [  0  1  0 ]
    [ 0  0  0 ]                           [  0  0  0 ]

Filter type: Prewitt
  Row-direction filter:                 Column-direction filter:
    (1/3) [ 1  0 −1 ]                     (1/3) [ −1 −1 −1 ]
          [ 1  0 −1 ]                           [  0  0  0 ]
          [ 1  0 −1 ]                           [  1  1  1 ]

Filter type: Sobel
  Row-direction filter:                 Column-direction filter:
    (1/4) [ 1  0 −1 ]                     (1/4) [ −1 −2 −1 ]
          [ 2  0 −2 ]                           [  0  0  0 ]
          [ 1  0 −1 ]                           [  1  2  1 ]
(c) Compare and discuss the results you obtained in (b). Which filter has the best
performance?
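As an optional Matlab cross-check of 5(b), the sketch below applies the Sobel pair and adds the two directional responses; taking absolute values before adding is our assumption here (Paint Shop Pro's Add combines the raw, clipped outputs):

im = double(imread('Pepper.bmp'));
Srow = [1 0 -1; 2 0 -2; 1 0 -1] / 4;   % row-direction Sobel filter
Scol = [-1 -2 -1; 0 0 0; 1 2 1] / 4;   % column-direction Sobel filter
edges = abs(filter2(Srow, im)) + abs(filter2(Scol, im));
imshow(uint8(min(edges, 255)));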
Part III. Image Compression – JPEG
JPEG (Joint Photographic Experts Group)³ is a popular image compression format. The
JPEG encoder can compress images to a much smaller data size without too much
distortion. Users can select a quality factor to generate an image with either higher
quality but a larger file size, or lower quality but a smaller file size. JPEG supports
several modes for displaying images. For example, the JPEG progressive mode can display an
image progressively from coarse to fine. Hence, if the network connection is
slow, users can browse a rough image first, and finer detail is then added.
In this part, we first explore JPEG compression and then design a JPEG-like image
codec.
1. JPEG Experiment
In this experiment, we compare different parameters of the JPEG standard using Paint
Shop Pro. We can save a .jpg file by File → Save As. A Save As dialog box will show
up. Choose JPEG as the Save as type and click the Options button. A dialog box
will pop up.
(a) Standard Mode: Save LenaC.bmp in standard mode. Adjust the compression
factor⁴ to 10, 20, 30, …, 90 and record the resulting image sizes. Discuss the
relation between the compression factor and the image quality. Zoom in on the
images and describe the artifacts caused by the 8×8 blocks. Try Effects →
Enhance Photo → JPEG Artifact Removal to remove the artifacts.

³ A useful tutorial of the JPEG standard can be downloaded at: ftp://ftp.uu.net/graphics/jpeg/wallace.ps.gz
⁴ The commonly used relation between the quality factor (Q) and the quantization scale factor is:

    scale_factor(%) = 5000/Q        (1 ≤ Q ≤ 50)
                    = 200 − 2Q      (50 ≤ Q ≤ 99)
                    = 1             (Q = 100)

The scale factor is a multiplicative factor applied to the JPEG quantization table.
Paint Shop Pro defines the Compression Factor (CF) as:

    scale_factor = CF/50            (1 ≤ CF ≤ 50)
                 = 50/(100 − CF)    (50 ≤ CF ≤ 99)
(b) Progressive Mode: Download an aerial image of size at least 1024×1024
from the USC Image Database⁵. Generate a progressive mode JPEG image
with the best quality. Put this image on your web site and observe how it is
loaded and displayed under the following two conditions:
one with a high-speed connection (e.g., using a computer in Jasmine during
the lab hours), and the other with a low-speed connection (e.g., dial-up from
your home)⁶. Describe your observations.
2. Design a JPEG-like Codec
(a) In this part, we design a JPEG-like image codec and implement it by writing
two Matlab scripts: one for the encoder and one for the decoder.
The block diagram is shown below.
Encoder:
  RGB Image → RGB to YCbCr
    Y  : DCT → QY  → Zigzag ─┐
    Cb : ↓2 → DCT → QCb → Zigzag ─┼→ Entropy Encoding → Compressed Image
    Cr : ↓2 → DCT → QCr → Zigzag ─┘

Decoder:
  Compressed Image → Entropy Decoding
    Y  : iZigzag → iQY  → IDCT ─┐
    Cb : iZigzag → iQCb → IDCT → ↑2 ─┼→ YCbCr to RGB → Reconstructed RGB Image
    Cr : iZigzag → iQCr → IDCT → ↑2 ─┘

  ↓2 represents downsampling by 2; ↑2 represents upsampling by 2.
⁵ USC Image Database: http://sipi.usc.edu/services/database/Database.html
⁶ Make sure your browser supports JPEG progressive mode. Pick one that supports this mode, such as
Netscape 4.7. Otherwise, you won't see the progressive effect.
Here are some hints on implementing a few key modules:
(i) Image I/O: imread.m and imwrite.m.
(ii) RGB ↔ YCbCr: Use rgb2ycbcr.m and ycbcr2rgb.m.
(iii) The DCT block means NxN block-based DCT. You may use dct2.m and
blkproc.m. IDCT is the NxN block-based inverse DCT. Use idct2.m and
blkproc.m.
(iv) QY, QCb, and QCr represent the N×N quantization with quantization tables⁷
for the luminance and chrominance components, respectively, and iQY, iQCb,
iQCr denote the corresponding reconstruction. You should design this part
by yourself.
(v) The downsampling and upsampling factor is 2. You can use imresize.m or
simple spatial sampling and averaging of the image.
(vi) You can use the provided ZigzagMtx2Vector.m⁸ to perform zigzag scanning
and Vector2ZigzagMtx.m⁹ for iZigzag.
(vii) For entropy encoding, use the provided JPEG_entropy_encode.m¹⁰. This
function reads a matrix in which each row represents a vectorized DCT block,
writes a bit stream to a file that is always named JPEG.jpg, and returns the
length of this file. You can do the luminance part first, rename this file, and
then do the chrominance part. For entropy decoding, use
JPEG_entropy_decode.m¹¹, which performs the inverse functionality.
⁷ JpegLumQuanTable.m returns the JPEG standard luminance quantization table, and JpegChrQuanTable.m
returns the JPEG standard chrominance quantization table. In this part, you should use the JPEG standard
table as a reference and design your own table.
⁸ function out=ZigzagMtx2Vector(in)
  Convert a matrix to a vector using zigzag order, e.g. [1 2 6; 3 5 7; 4 8 9] → [1 2 3 4 5 6 7 8 9].
⁹ function out=Vector2ZigzagMtx(in)
  Convert a vector to a square matrix using zigzag order, e.g. [1 2 3 4 5 6 7 8 9] → [1 2 6; 3 5 7; 4 8 9].
¹⁰ function [Len]=JPEG_entropy_encode(rowN,colN,dct_block_size,Q,ZZDCTQIm,encoder_path,DisplayProcess_Flag)
  Input: rowN (1x1): the number of rows. colN (1x1): the number of columns. dct_block_size (1x1): the dimension of the DCT.
  Q: quantization table of size dct_block_size x dct_block_size.
  ZZDCTQIm: the zigzagged image matrix after quantization, of size (rowN*colN/dct_block_size²) x (dct_block_size²).
  encoder_path (string): the absolute path of this function and jpeg_entropy_encode.exe.
  ## Remember to set the Matlab "current directory window" to this path, so that this .exe file can be run.
  DisplayProcess_Flag (1x1): flag for displaying the zero-run pairs and Huffman table (in JPEG_entropy_encode.html).
  This HTML file provides the details of JPEG Huffman encoding.
  Output: Len (1x1): compressed file length.
  This Matlab function is an interface for generating a text file, JPEG_DCTQ_ZZ.txt, and running jpeg_entropy_encode.exe.
¹¹ function [rowN,colN,dct_block_size,iQ,iZZDCTQIm]=JPEG_entropy_decode(decoder_path)
  Input: decoder_path (string): the absolute path of this function and jpeg_entropy_decode.exe.
  ## Remember to set the Matlab "current directory window" to this path, so that this .exe file can be run.
  Output: rowN (1x1): the number of rows. colN (1x1): the number of columns. dct_block_size (1x1): the dimension of the DCT.
  iQ: quantization table of size dct_block_size x dct_block_size.
  iZZDCTQIm: the zigzagged image matrix after reconstruction, of size (rowN*colN/dct_block_size²) x (dct_block_size²).
  This Matlab function is an interface for generating and interpreting a text file, JPEG_iDCTQ_ZZ.txt, which is generated by
  jpeg_entropy_decode.exe.
(viii) Make sure that the pixel values of the reconstructed image are integers
within [0, 255]. If a value is above 255, clip it to 255; if it is below 0, clip it
to 0.
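A minimal sketch of hints (iii), (iv), and (viii) for the luminance channel, assuming N = 8, the standard table returned by the provided JpegLumQuanTable.m, and Yplane denoting the luminance plane you extracted with rgb2ycbcr (newer Matlab releases replace blkproc.m with blockproc.m):

N = 8;
QY = JpegLumQuanTable;                            % 8x8 luminance table (provided)
Y = double(Yplane);                               % luminance plane from rgb2ycbcr
dctY = blkproc(Y - 128, [N N], @dct2);            % level shift, then block DCT
qY = blkproc(dctY, [N N], @(b) round(b ./ QY));       % quantize (QY)
rY = blkproc(qY, [N N], @(b) idct2(b .* QY)) + 128;   % iQY and IDCT
rY = min(max(round(rY), 0), 255);                 % hint (viii): clip to [0, 255]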
(b) Adjust the following parameters (or matrices) on LenaC.bmp: N, QY, QCb, and
QCr. Use the following two definitions to plot the PSNR of the luminance
component (y-axis) vs. the CR (x-axis).
(i) Peak signal-to-noise ratio (PSNR), in dB:

    PSNR = 10 · log₁₀( (peak_intensity)² / MSD )

    MSD = (1/(w×h)) · Σᵢ Σⱼ (err[i][j])²

where peak_intensity = 255, w is the width, h is the height of the images,
and err[i][j] represents the difference between the (i, j) pixel in the
original image and that in the reconstructed image.
(ii) Compression ratio (CR):

    CR = original image file size (in bytes) / compressed image file size (in bytes)
Remember that a compressed image should contain both luminance and
chrominance data, though we store them in several files here for
simplicity.
Reminder: Your codec should achieve as high a PSNR as possible at each
reasonable compression ratio (CR).
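A minimal helper for (b), following the definitions above (the function name is ours, and the usual 10·log₁₀ dB form is assumed; save it as lum_psnr.m):

function p = lum_psnr(orig, recon)
% PSNR of the luminance component, with peak_intensity = 255
err = double(orig) - double(recon);
msd = sum(err(:).^2) / numel(err);   % mean squared difference
p = 10 * log10(255^2 / msd);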
(c) Discuss the advantage of zigzag scanning. Verify your conclusion by
removing the Zigzag and iZigzag parts from your codec, plotting the PSNR vs.
CR figure, and comparing the result with (b).
Part IV: Mobile Computing and Pocket PC Programming
From the above three parts, we have learned the fundamentals of image processing. Now
design a simple Pocket PC application related to digital image processing using the
Microsoft eMbedded Tools. You can refer to “ENEE408G Multimedia Signal Processing
Mobile Computing and Pocket PC Programming Manual” and extend the examples
there.
Part V: Digital Photography
During the past few years, the functionality of digital cameras has become much more
powerful. An advantage of digital cameras is that we can view images immediately after
capture, erase the ones we do not like, modify them if the lighting is not perfect, and
transfer them easily through electronic connections.
In this section, use the digital camera to take pictures, and modify them artistically and
creatively. You can use the built-in functions of the digital camera to do real-time
processing while you are taking the pictures, and apply the skills you have learned from
Paint Shop Pro and Matlab for post-processing. We will use a color printer in the Communication
and Signal Processing Laboratory (CSPL, AVW2454) to print your artwork.
________________________________________________________________________
Bonus Part I. Digital Halftoning
Halftoning is a process that converts multi-level gray or color images into two-level images.
The technique is widely used in printing. In this section, we learn how to generate
halftone images using Paint Shop Pro and Matlab.
1. Paint Shop Pro supports several halftone methods. Click Colors → Decrease
Color Depth → 2 Colors, and you will see a dialog box as follows.
Set the Palette Component as Grey values and the Palette weight as non-weighted. Apply
the five reduction methods shown in the dialog box to the images Lena.bmp and
Baboon.bmp. Save these images ‘ and compare them.
2. We can generate a halftone image in other ways.
(a) The most significant bit approach: We can simply select the most significant bit
to represent black and white. Under Paint Shop Pro, use Colors → Adjust →
Threshold, and you will see a Threshold dialog box. Set the threshold to 127 and
save the image ‘.
(b) Dithering approach: Before applying a threshold, we can add noise by Effects →
Noise → Add. Choose 50% uniform noise in the Add Noise dialog box. After
adding noise, apply the 127 threshold as in the first approach. Save
this image ‘. What happens if you apply different noise percentages and
different types of random noise?
(c) Halftone screen approach: Open the two images Lena.bmp and
halftone_screen.tif, and open an Image Arithmetic dialog box by Image →
Arithmetic. Select Lena.bmp as image #1 and halftone_screen.tif as
image #2. Set the Function as Add, the Channel as All channels, the Divisor as 1,
the Bias as 0, and check Clip color values. After finishing this step, use a
threshold of 255 to generate the 2-level halftone image ‘. Compare the results
of (a), (b), and (c).
3. Design your own halftone screen by writing a Matlab script.
(a) Design an 8x8 halftone pattern matrix M.
(b) Use the function, halftone_screen.m, to generate your own halftone screen.
Note: There are four parameters in this function.
% halftone_screen( M, im_height, im_width, filename)
% M: a matrix you have designed; for M=0 (scalar), it uses a default clustered-dot
%    pattern; for M=1, it uses a dispersed-dot pattern.
% im_height and im_width: the dimensions of the source image.
% filename: the name and storage path of the halftone screen.
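A sketch for 3(a)-(c): the classic 8×8 dispersed-dot (Bayer) pattern is one reasonable choice for M, and the generated screen can then be applied with the add-and-threshold step from 2(c); the output file names are illustrative.

M = [  0 32  8 40  2 34 10 42;
      48 16 56 24 50 18 58 26;
      12 44  4 36 14 46  6 38;
      60 28 52 20 62 30 54 22;
       3 35 11 43  1 33  9 41;
      51 19 59 27 49 17 57 25;
      15 47  7 39 13 45  5 37;
      63 31 55 23 61 29 53 21 ];             % 8x8 Bayer dispersed-dot pattern
im = imread('Lena.bmp');
[h, w] = size(im);
halftone_screen(M, h, w, 'my_screen.tif');   % generate the screen (provided function)
screen = imread('my_screen.tif');
out = 255 * uint8(double(im) + double(screen) >= 255);  % add + threshold at 255, as in 2(c)
imwrite(out, 'Lena_halftone.bmp');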
(c) Use this halftone screen to generate halftoned Lena and Baboon. Save the
images ‘. Compare the results with the halftoning approach in the previous
section .
4. Color Halftoning
(a) Separate LenaC.bmp into the CMYK¹² (Cyan-Magenta-Yellow-blacK) color
coordinate by Colors → Split Channel → Split to CMYK. We thereby obtain
four color components.
(b) Perform the halftoning technique on each component and set these
components as gray-level images by Color → Grey Level.
(c) Combine these four new images by Colors → Combine Channel → Combine
from CMYK. Thus, we obtain the color halftone image. Save the image. ‘
Bonus Part II. Special Effects
Many techniques of digital image processing are used extensively in art
and entertainment. In this part, we will learn some special effects used in those fields.
1. Morphing (dissolving method): Michael Jackson's Black or White¹³ music video is a
classic example of the morphing technique. In this subsection, we implement
morphing using the dissolving method provided by Jasc Animation Shop.
(a) Prepare two face images, face1 and face2, of size M×N. You can use a
digital camera or PC camera to capture the face images. Use Paint Shop Pro to
crop the faces with rectangles and adjust the image size.
(b) Click File → New in Animation Shop. A Create New Animation dialog box
will pop up. Set the Width as N, the Height as M, and the Canvas color as Transparent.
¹² RGB is an additive color synthesis system specific to light. CMY is a subtractive color synthesis system
for pigment colors, such as newspapers and color printers. Actually, CMY is the complementary system of
RGB; the relation between the RGB and CMY coordinates is C = 1 − R, M = 1 − G, Y = 1 − B. CMYK is
an offspring of CMY. The difference between CMYK and CMY is that CMYK extracts the black element
from CMY and puts it in the last component, blacK.
¹³ Michael Jackson's Black or White online demo: http://www.gti.ssr.upm.es/~fmb/seq/mjackson.mpg
(c) Animation → Insert Frames → From Files. Click Add File to insert the face1 image.
(d) Move to the second frame and insert face2, and insert face1 for the third frame.
(e) Click the first frame and Effects → Insert Image Transition.
Set the parameters as shown in the figure above. Select the Effect as Dissolve.
(f) As we can see, Animation Shop has automatically inserted 20 frames between
face1 and face2. Click the 22nd frame (i.e., the original face2) and repeat the
transition step (e). There are 43 frames in total in this animation file.
(g) Use View → Animation to preview the result.
(h) Save the animation in GIF file format by File → Save As. ‘
Here is the animation result.
2. Special Effect - Emboss Filter: The uses of spatial filters are not limited to
sharpening and blurring. Many digital artists use spatial filters to create their
artwork. In this subsection, we use the emboss filter to convert an image into a
bas-relief.
(a) Apply the emboss filter to Lena.bmp by Effects → Texture Effects → Emboss
in Paint Shop Pro. Save this image ‘.
(b) We can implement the emboss filter simply using the User Defined Filters dialog
box. Set the Bias of the filter to 128 and the Filter Matrix as
    [ −1  0  1 ]
    [ −1  0  1 ]
    [ −1  0  1 ]
Save this image. ‘
(c) We can apply the same emboss filter to a color image, such as LenaC.bmp.
Split the RGB image into the HSL color coordinate. Use the emboss filter in (b)
to filter the L component. Combine the three components. Save the image.
‘
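The same emboss filter can be cross-checked in Matlab; a minimal sketch, with the 128 bias added after filtering:

im = double(imread('Lena.bmp'));
E = [-1 0 1; -1 0 1; -1 0 1];   % emboss filter from (b)
emb = filter2(E, im) + 128;     % bias of 128
imwrite(uint8(min(max(emb, 0), 255)), 'Lena_emboss.bmp');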
3. Special Effect - Lithograph and Psychedelic Distillation Filter:
(a) Try the following two filters on LenaC.bmp using the User Defined Filters in Paint
Shop Pro. Save the results. ‘
Lithograph (Division Factor: 1; Bias: 0):
    [ −1   −1   −1   −1   −1 ]
    [ −1  −10  −10  −10   −1 ]
    [ −1  −10   98  −10   −1 ]
    [ −1  −10  −10  −10   −1 ]
    [ −1   −1   −1   −1   −1 ]

Psychedelic Distillation (Division Factor: 1; Bias: 0):
    [ 0  −1  −2  −3  −4 ]
    [ 0  −1   3   2   1 ]
    [ 0  −1  10   2   1 ]
    [ 0  −1   3   2   1 ]
    [ 0  −1  −2  −3  −4 ]
(b) We have learned several useful spatial filters for creating artistic effects. The
basic approach is to sum the weighted neighbors of each pixel. In this
part, design one special filter to create your own artistic effect. Put the filter
coefficients and images ‘ in your report.
ENEE408G Multimedia Signal Processing
Design Project on Video Processing
The Goals
1. Understand the fundamentals of video compression.
2. Learn the latest MPEG video standard.
3. Understand content-based search and indexing.
Note: The [report icon] means to put your discussion, flowchart, block diagram, or plots
in your report. The symbol ‘ indicates that you should put the multimedia data in
your report. Save images in BMP format unless otherwise stated. The [code icon]
means to put your source codes (Matlab, Basic, or C/C++) in your report.
Part I. Video Capturing by PC Camera
Video can convey more perceptual information than a still image. For example, we can send a
vivid greeting video through email to family and friends instead of a plain greeting card.
Nowadays digital video can be captured in good quality using affordable consumer-level
devices. In this section, we use a PC camera, the Creative Video Blaster WebCam¹, to learn
how to capture digital video.
1. Capturing: Use the PC camera to capture a video sequence of your face about two
seconds long. Save it as an uncompressed AVI file ‘.
2. Playing back: To check whether the video sequence was captured successfully, use
Windows Media Player to open this AVI file and view the video.
3. Matlab provides several functions for reading and writing AVI files as well as
displaying and manipulating movies.
(a) aviread.m: Read an AVI file.
(b) movie.m: Display a movie or an image sequence.
(c) movie2avi.m: Save an image sequence in AVI file format.
Write a Matlab script to read the AVI file captured in step 1, display it, and save
the first ten frames in another AVI file.
¹ Adjust the "Album Directory" to "c:\temp" by Settings → Album.
Part II. Motion Estimation and Compensation:
A video consists of a sequence of images. We have learned in lecture that even a short
video clip contains a large amount of information. It is desirable to compress
videos in order to save storage and reduce transmission time. There are several ways
to compress a video. In addition to removing the redundancy within each frame (the spatial
redundancy), we can compress the video by eliminating the redundancy among
neighboring frames (the temporal redundancy). In this part, we learn how to remove the
temporal redundancy using motion estimation (ME) and motion compensation (MC).
The basic idea of ME/MC is that the content within a video shot remains almost the same
from frame to frame, except for object motion and global camera motion. Some regions
in the current frame have corresponding regions in the previous and future frames.
Therefore, we do not need to encode these regions all over again; instead, we can simply
use motion vectors to describe the motion between the two frames. Motion estimation
finds the motion vectors, and motion compensation constructs the estimated
current frame from a reference frame. The following figures illustrate a pair of video
encoder and decoder based on motion compensation and DCT transform coding.
Inter-frame Encoder:
  Original macroblocks of pixels → (−) ← motion compensated macroblocks
  → motion compensated residual → DCT → Quantization → Entropy Coding → Encoded Interframe.
  The quantized data also pass through Inverse Quantization → IDCT to give the reconstructed
  motion compensated residual, which is added back to the motion compensated macroblocks and
  stored in the Frame Memory. Motion Estimation computes the estimated motion vectors from the
  reconstructed motion compensated reference frame, and Motion Compensation uses them to produce
  the motion compensated macroblocks.

Inter-frame Decoder:
  Encoded Interframe → Entropy Decoding → Inverse Quantization → IDCT
  → reconstructed motion compensated residual → (+) ← Motion Compensation
  (driven by the estimated motion vectors) → reconstructed macroblocks of pixels.
1. Motion Estimation: In this sub-section, we implement the motion estimation
algorithm using exhaustive block matching in Matlab.
[Figure: an N1×N2 block in the current frame and its search window, extending from −R to R−1
pixels in each dimension, in the reference frame.]
To perform block matching, we first divide the current frame into blocks of size
N1×N2 (often 16×16). For each block B in the current frame, we calculate the MAD,
defined as follows:
    MAD(d₁, d₂) ≡ (1/(N₁N₂)) · Σ_{(n₁,n₂)∈B} | S_ref(n₁+d₁, n₂+d₂) − S_cur(n₁, n₂) |

S_ref(x, y) represents the pixel (x, y) in the reference frame and S_cur(x, y) represents
the pixel (x, y) in the current frame. d₁ and d₂ are motion displacements, integers
between −R and R−1, with R being the maximal search size. The goal is to find a
(d₁, d₂) pair for each block such that MAD(d₁, d₂) is minimized.
[Figures: car1.bmp (the reference frame) and car2.bmp (the current frame).]
We will start with N1 = N2 = 16 and R = 16. You are provided with two files:
car1.bmp is the reference frame, and car2.bmp is the current frame. Write a
Matlab script to find the motion vectors for this image pair and plot the motion
field², i.e., represent each block by an arrow indicating its motion vector. An
example of a motion field is shown in the following figure.
² To plot the motion vectors, you can use the quiver.m function in Matlab.
[Figure: example motion field plot, titled "Motion Vectors".]
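A minimal sketch of the exhaustive search (N1 = N2 = 16, R = 16), assuming grayscale frames; skipping out-of-frame candidates is one possible border policy:

ref = double(imread('car1.bmp'));        % reference frame
cur = double(imread('car2.bmp'));        % current frame
N = 16; R = 16;
[H, W] = size(cur);
mvy = zeros(H/N, W/N); mvx = zeros(H/N, W/N);
for by = 1:H/N
  for bx = 1:W/N
    r0 = (by-1)*N; c0 = (bx-1)*N;
    blk = cur(r0+1:r0+N, c0+1:c0+N);
    best = inf;
    for d1 = -R:R-1
      for d2 = -R:R-1
        if r0+d1 < 0 || c0+d2 < 0 || r0+d1+N > H || c0+d2+N > W
          continue                       % candidate falls outside the frame
        end
        cand = ref(r0+d1+1:r0+d1+N, c0+d2+1:c0+d2+N);
        mad = mean(mean(abs(cand - blk)));   % MAD(d1, d2)
        if mad < best
          best = mad; mvy(by,bx) = d1; mvx(by,bx) = d2;
        end
      end
    end
  end
end
quiver(mvx, mvy);                        % plot the motion field (footnote 2)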
2. Motion Compensation:

[Figures: the estimated car2.bmp and the motion compensation residual.]
Motion compensation constructs the estimated current frame by replacing
each block in the current frame with the best matching block in the reference frame.
The best matching block is indicated by the motion vector. Here is an example:
suppose we have a 16×16 block in the current frame with its top-left pixel at (17,
33), and the motion vector of this block is (−5, 10). We replace this block by the
block whose top-left pixel is at (12, 43) in the reference frame.
In this part, write a Matlab script to complete the following tasks.
(a) Use the motion vectors obtained in the previous section to perform motion
compensation and construct the estimated image. Save this estimated image
‘.
(b) Compute the motion compensated residual, which is the difference between
the current frame and the estimated frame. Display³ and save this image ‘.
(c) Calculate the mean absolute distortion (MAD) between the current image and
the estimated image:

    MAD = (1/(M×N)) · Σᵢ₌₁ᴹ Σⱼ₌₁ᴺ | residual[i, j] |

where M and N are the dimensions of the frames.
³ Use imagesc.m to adjust the visualization scale.
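A minimal sketch for 2(a)-(c), reusing ref, cur, N, H, W, and the motion vectors (mvy, mvx) from the sketch in section 1:

est = zeros(H, W);
for by = 1:H/N
  for bx = 1:W/N
    r0 = (by-1)*N; c0 = (bx-1)*N;
    d1 = mvy(by,bx); d2 = mvx(by,bx);
    est(r0+1:r0+N, c0+1:c0+N) = ref(r0+d1+1:r0+d1+N, c0+d2+1:c0+d2+N);
  end
end
residual = cur - est;
imagesc(residual); colormap(gray);   % display the residual (footnote 3)
MAD = mean(abs(residual(:)))         % mean absolute distortion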
3. Although the exhaustive block-matching algorithm can find the best-matching
block within a specified search range, it is very time-consuming. There are
several suboptimal algorithms that speed up the search. In this part, we explore the
three-step search approach.
[Figure: the three-step search pattern. Points labeled 1 are searched in step 1, points
labeled 2 in step 2, and points labeled 3 in step 3; each step's nine points surround the
best point of the previous step.]
The figure above shows the three-step search algorithm. Suppose the maximal
search range is R. At the beginning, the search step size is about half of the
search range R. At each subsequent step, the search step size is about half of that in the
previous step. We only calculate the MAD between the block in the current
frame and the blocks whose centers are located at nine points in the reference
frame. Among the nine points, one lies at the center, and the other eight lie in
the directions illustrated in the figure above. Select the point
among the nine that gives the minimal MAD value, and move the search center of
the next step to this point. Proceed to the next search step with the search step
size reduced by half. After three steps, we pick the block with the minimal MAD as
the matched block.
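A minimal sketch of the three-step search for a single block whose top-left corner is (r0, c0); the helper name and border handling are our choices (save it as three_step.m):

function [d1, d2] = three_step(ref, blk, r0, c0, R)
[H, W] = size(ref); N = size(blk, 1);
d1 = 0; d2 = 0;
step = round(R/2);                   % about half of the search range
for s = 1:3
  best = inf; bd1 = 0; bd2 = 0;
  for dy = [-step 0 step]
    for dx = [-step 0 step]
      y = r0 + d1 + dy; x = c0 + d2 + dx;
      if y < 0 || x < 0 || y+N > H || x+N > W
        continue                     % skip points outside the frame
      end
      cand = ref(y+1:y+N, x+1:x+N);
      mad = mean(mean(abs(cand - blk)));
      if mad < best, best = mad; bd1 = dy; bd2 = dx; end
    end
  end
  d1 = d1 + bd1; d2 = d2 + bd2;      % move the search center
  step = round(step/2);              % halve the step size
end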
In your experiment, set the block size to 16×16 and R = 16. As before, car1.bmp
is the reference frame and car2.bmp is the current frame. Write a Matlab script to
complete the following tasks.
(a) Find the motion vectors using the three-step search algorithm and plot those
motion vectors.
(b) Use the motion vectors obtained through the three-step search to implement
motion compensation. Display and save the estimated image ‘.
(c) Obtain the motion compensated residual. Display⁴ and save it ‘.
(d) Calculate the mean absolute distortion (MAD) between the current image and
the estimated image .
⁴ Use imagesc.m to adjust the visualization scale.
4. In this part, apply your previous Matlab code to an image pair, Carphone0195.tif
and Carphone0196.tif, using the exhaustive block matching algorithm and the
three-step algorithm.
For each algorithm, set the block size as 16x16, and R=16. Use Carphone0195.tif
as the reference frame and Carphone0196.tif as the current frame.
(a) Find the motion vectors and plot them .
(b) Use the motion vectors obtained in (a) to perform motion compensation.
Display and save the estimated image. ‘
(c) Obtain the motion compensated residual⁵. Display and save it ‘.
(d) Calculate the mean absolute distortion (MAD) between the current image and
estimated image .
5. From steps 1 to 4, we have performed ME/MC on the car and carphone
image pairs using the exhaustive block matching algorithm and the three-step
algorithm. Discuss the advantages and disadvantages of these two algorithms
for each image sequence.
Part III. MPEG Video
MPEG-1 was developed by the Moving Picture Experts Group in the early 1990s and
targeted producing near-VHS quality video at a bit rate of up to 1.5 Mbps. MPEG-1
uses three types of frames: I, P, and B. An I frame is encoded alone, without
reference to any other frame. A P (predicted) frame uses the ME/MC technique and is
predicted from previous frames. Motion vectors in a B (bi-directional) frame can be
obtained using the previous or the next I/P frames. There are several factors affecting the
video quality. In this section, we explore the basic parameters and study the tradeoffs in
MPEG-1 video compression.
⁵ Use imagesc.m to adjust the gray scale.
The Matlab user-contributed function mpgwrite.m⁶ supports generating an MPEG file
from an image sequence. You can adjust the parameters as given below:
% MPGWRITE(M, map, 'filename', options) Encodes M in MPEG
% format using the specified colormap and writes the result to the
% specified file. The options argument is an optional vector of
% 8 or fewer options where each value has the following meaning:
%   1. REPEAT: An integer number of times to repeat the movie
%      (default is 1).
%   2. P-SEARCH ALGORITHM: 0 = logarithmic (fastest, default value),
%      1 = subsample, 2 = exhaustive (better, but slow)
%   3. B-SEARCH ALGORITHM: 0 = simple (fastest), 1 = cross2 (slightly
%      slower, default value), 2 = exhaustive (very slow)
%   4. REFERENCE FRAME: 0 = original (faster, default), 1 = decoded
%      (slower, but results in better quality)
%   5. RANGE IN PIXELS: An integer search radius. Default is 10.
%   6. I-FRAME Q-SCALE: An integer between 1 and 31. Default is 8.
%   7. P-FRAME Q-SCALE: An integer between 1 and 31. Default is 10.
%   8. B-FRAME Q-SCALE: An integer between 1 and 31. Default is 25.
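For example, a hypothetical option vector requesting exhaustive P- and B-searches, decoded reference frames, and a search radius of 16 (the remaining values keep the defaults listed above) would be:

mpg_option = [1 2 2 1 16 8 10 25];
mpgwrite(M, map, 'foreman.mpg', mpg_option);   % M and map hold the image sequence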
1. Apply BR_Q_mpg.m⁷ to the foreman image sequence to achieve the following goals,
and list the parameters you choose in mpg_option.
(a) Average PSNR is higher than 26 dB.
(b) Bit rate is lower than 400 Kbps.
2. Give a rule of thumb on how to choose those parameters in step 1.

Part IV. Video Conference
Video conferencing is becoming increasingly popular. Imagine that people can have a
meeting through the Internet without physically getting together! In this part, we use
Microsoft's NetMeeting⁸ software to hold a video conference over two kinds of networks,
namely a Local Area Network and a Wide Area Network.
1. Over the Local Area Network (LAN): From Call → New Call, call your partner's IP
address⁹ directly.
⁶ This file can be obtained from http://www.mathworks.com/support/solutions/data/8154.shtml
⁷ Modify the parameters in BR_Q_mpg.m. Make sure the file names in_filename, out_filename, and
mpg_filename have the full path. What you need to do is modify mpg_option, which is the input
argument options in mpgwrite.m. Remember to include the subroutines read_im_seq.m, write_im_seq.m,
file_size.m, and PSNR_seq.m.
⁸ You should set up the PC cameras, microphones, and earphones before you start this experiment.
⁹ The IP addresses of the PCs in Jasmine are posted on top of the PC cases.
2. Over the Wide Area Network (WAN): Log on to the server with your partner by
Call → Log on to Microsoft Internet Directory.
3. Compare the results, for both the video and the audio parts, between the LAN and the WAN.

Part V. Video Scene Change Detection
Scene change detection plays a fundamental role in content-based video processing;
it helps users automatically analyze and index the content of videos. In this
part, we will design and implement a scene change detector. We will explore the
detection of Cut, Dissolve, and Wipe transitions.
Let us first look at the common types of scene changes.
Cut is simply an abrupt transition. Read Cut.mpg using mpgread.m. You can observe that
the transition between the following two scenes is abrupt and without any intermediate
frames.
Dissolve is a time-dependent linear combination of two scenes. You can observe the
dissolve in Dissolve.mpg using mpgread.m.
Wipe covers an old scene and reveals a new scene with such patterns as line, blinds, or
checkerboard. You can observe a wipe with diagonal line pattern in Wipe.mpg using
mpgread.m.
1. Design a Scene Change Detector:
Design a scene change detector¹⁰ using Matlab. This detector should be able to
indicate the frame indexes at which the scene changes happen. Apply your
detector to detect the scene changes in Cut.mpg, Dissolve.mpg, and Wipe.mpg. List
the frame indexes your program indicates.
Hint: You can calculate the statistical characteristics for each frame.
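A minimal sketch of one such statistic: flag frames where the mean absolute difference between consecutive frames jumps well above its average (the frame range, the 3× factor, and the grayscale conversion are assumptions to tune):

M = mpgread('Cut.mpg', 1:100, 'truecolor');
prev = double(rgb2gray(frame2im(M(1))));
d = zeros(1, length(M));
for k = 2:length(M)
  f = double(rgb2gray(frame2im(M(k))));
  d(k) = mean(abs(f(:) - prev(:)));   % frame-difference statistic
  prev = f;
end
changes = find(d > 3*mean(d(2:end)))  % candidate scene change indexes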
2. Design a Scene Change Detector for Multiple Scenes within One Video:
Use what you have learned above and write a Matlab script to detect the scene
changes in the video sequence cbswipe.mpg, which mixes several types of
scene changes. This detector should be able to indicate the frame indexes at
which the scene changes happen. List the frame indexes your program indicates.
3. Discussion:
Discuss the performance of your scene change detector .
¹⁰ You can use frame2im.m to retrieve every image frame from a movie object in Matlab. E.g.:

M = mpgread('Cut.mpg', 1:100, 'truecolor');
            % read the 1st to 100th frames of Cut.mpg; each pixel is
            % represented by true color (24 bits); load them into M, a movie object
movie(M);   % display the movie object
Im = frame2im(M(1));   % convert the 1st frame of the movie object into Im
Part VI: Mobile Computing and Pocket PC Programming
In this part, apply the fundamental video processing you learned in the previous parts
to design a simple application related to digital video processing using the Microsoft
eMbedded Tools for Pocket PC. You can refer to the "ENEE408G Multimedia Signal
Processing Mobile Computing and Pocket PC Programming Manual" and extend the
examples there.

Bonus Part I. MPEG-7 Visual Descriptor
MPEG-1, MPEG-2, and MPEG-4 emphasize how to compress videos efficiently.
MPEG-7, on the other hand, focuses on how to describe the content of video¹¹. In fact, it
is a standard that defines content description systems for indexing, searching, and
browsing multimedia data. The MPEG-7 Visual Part¹² defines descriptors for color,
texture, shape, and motion. In this part, we explore the texture and shape descriptors.
1. MPEG-7 Texture Descriptor
Texture contains much useful information about an image. In this part, we use the
MPEG-7 Homogeneous Texture Descriptor Demo
(http://nayana.ece.ucsb.edu/M7TextureDemo/Demo/client/M7TextureDemo.html)
to study the texture descriptor.
¹¹ For more details: IEEE Trans. Circuits Syst. Video Technol., vol. 11, June 2001.
¹² MPEG-7 Visual Part committee draft: http://mpeg.telecomitalialab.com/public/mpeg-7_visual_fcd.zip
The figure above shows the MPEG-7 Homogeneous Texture Descriptor Demo for
searching aerial images. You can query a region by following the instructions given
on that web site.
Query three different tiles using the NN search with the 10 best matches for each query.
Save the query results¹³ in JPEG format ‘ and discuss how good the search
results are.

2. MPEG-7 Shape Descriptor
Shape is also an important feature of objects. In this part, we use the Shape
Queries Using Image Databases (SQUID) demo
(http://www.ee.surrey.ac.uk/Research/VSSP/imagedb/demo.html) to learn the shape descriptor.
The figure above shows the GUI of SQUID. One can query a shape by the
following steps:
Step 1: Click the Random button to generate random contours.
Step 2: Click on a desired shape, and images with similar shapes in the
database will be shown.
Query three different shapes. Save the query results in JPEG format ‘ and
discuss how good the search results are.
¹³ You can use "Print Screen" (on the keyboard) to capture the screen and Paint Shop Pro to crop the images.
ENEE408G Multimedia Signal Processing
Design Project on Digital Speech Processing
The Goals
1. Learn how to use linear predictive model for speech analysis and synthesis.
2. Implement a linear predictive model for speech coding.
3. Explore the speech recognition based human computer interface.
Note: The [report icon] means to put your discussion, flowchart, block diagram, or plots
in your report. The symbol ‘ indicates that you should put the multimedia data in
your report. Save speech in signed mono 16-bit WAV format unless otherwise
stated. The [code icon] means to put your source codes (Matlab, CSLU RAD, Basic,
or C/C++) in your report.
Part I. Speech Analysis
To analyze a speech signal, we should first understand the human vocal tract and build a
model to describe it. In this part, we investigate the linear predictive model.
The figure¹ above shows the mid-sagittal plane of the human vocal apparatus. The vocal
tract begins at the opening of the vocal cords and ends at the lips. The figure below
represents a model of speech production.
[Figure: speech production model. For voiced speech, an impulse train generator driven by
the pitch period provides the excitation; for unvoiced speech, a white noise generator
provides it. The selected excitation is multiplied by the gain G and fed into the vocal
tract model, controlled by the vocal tract parameters, to produce speech.]

¹ http://www.phon.ox.ac.uk/~jcoleman/phonation.htm
This model consists of two parts. The first part is the excitation, which has two states: the
impulse train generator produces impulse trains at a specified pitch for voiced sound, and
the white noise generator serves for unvoiced speech. The impulse train generator is
driven by a given pitch period (i.e., the period of the fundamental frequency of the glottal
oscillation). The second part is the vocal tract model (with gain G), which is usually
modeled as the following pth-order all-pole linear predictive model, V(z):

    V(z) = G / (1 − Σₖ₌₁ᵖ αₖ z⁻ᵏ)

The resonant frequencies of V(z), given by the poles of the denominator, are called
formants; they are the resonances caused by airflow through the vocal tract.
In this part, we use the Matlab "COLEA" toolbox² to study the linear predictive model.
There are four major window interfaces in the COLEA toolbox, as described in the following
paragraphs.
Waveform in the Time Domain: This window shows the raw speech signal. We can observe
the signal, bode.wav³, in the time domain by Display → Time Waveform.
Spectrogram: We can use the time-frequency domain to highlight different information in a
speech signal. The spectrogram (short-time Fourier transform) is a popular spectral
representation. We can use Display → Spectrogram → Color to display the spectrogram.
² COLEA: http://www.utdallas.edu/~loizou/speech/colea.htm
³ Download the American-English part from the Handbook of the International Phonetic Association
(IPA): http://web.uvic.ca/ling/resources/ipa/handbook.htm
Pitch and Formant Tracking: We can use Display → F0 Contour → Autocorrelation
Approach and Display → Formant Track to visualize the pitch and formants of a speech
signal.

[Figures: pitch contour and formant track.]
LPC spectra: We can also characterize the spectrum of a speech signal using the linear
predictive model V(z). For example, first open a speech signal file, say bode.wav.
Left-click on the waveform or spectrogram in the corresponding window.
Two sub-windows will show up: one is the LPC Spectra and the other is Controls,
which can be used to set the parameters of the displayed LPC spectra.
To verify how well LPC models speech, we can calculate the Short Time
Fourier Transform (STFT), overlay it with the LPC spectrum, and compare how close
the two spectra are. Choose FFT as the Spectrum in the Controls window
and check Overlay at the bottom of the window to compare them.
1. Linear Predictive Model
(a) Now, use the Recording Tool (as shown in the figure below; you can find it from the
menu bar → Record) to record your own voice for the ten vowels listed in the
following table ‘. Then, use the tools introduced earlier to analyze the pitch
and the first three formants of each vowel. To locate the position of a specified vowel in a
speech signal, you can search for it by listening to small frames of the signal.
To specify a small frame, left-click and right-click on the
waveform window to mark the start and the end of the frame, respectively. To listen
to this frame, press the "sel" button in the Play area.
  #   Word     Pitch (Hz)   Formant 1 (Hz)   Formant 2 (Hz)   Formant 3 (Hz)
  1   beet
  2   bit
  3   bet
  4   bat
  5   but
  6   hot
  7   bought
  8   foot
  9   boot
 10   bird
(b) Plot the first formant (x-axis) and second formant (y-axis) of each vowel for all
the members of your group in a single figure (as shown in the following figure).
Discuss what you observe from this figure.
(c) Adjust the order and duration of the linear predictive model. Describe what you
observe from the LPC spectra and the STFT for different orders and
durations.
2. Gender Identification:
The linear predictive model is widely used in digital signal processing because of its
simplicity and effectiveness. In this part, we use the linear predictive model to program
gender identification. You should develop your own algorithm in Matlab to identify the
gender of a speaker. Ten male speech samples and their corresponding female speech
samples are provided on the course web page. You can train your gender identifier with
those samples. At the end of this lab, you will be asked to test your program with a new
set of samples.
LP Gender Identification Framework:
The figure below shows the LPC gender identification framework. There are three
building blocks in this system: LPC Analysis, Feature Extraction for the training set, and
Gender Identification testing.

[Figure: Training Set → LPC Analysis by proclpc.m → Feature Extraction for Training Set
→ Gender Identification → Male / Female; wave files of unknown gender feed directly into
the Gender Identification block.]
LPC Analysis:
Using proclpc.m⁴, we can obtain the LPC coefficients and other information.
% [aCoeff,resid,pitch,G,parcor,stream] = proclpc(data,sr,L,fr,fs,preemp)
%
% LPC analysis is performed on a monaural sound vector (data) which has been
% sampled at a sampling rate of "sr". The following optional parameters modify
% the behaviour of this algorithm.
% L - The order of the analysis. There are L+1 LPC coefficients in the output
%     array aCoeff for each frame of data. The default value is 13.
% fr - Frame time increment, in ms. The LPC analysis is done starting every
%     fr ms in time. Default is 20 ms (50 LPC vectors a second).
% fs - Frame size in ms. The LPC analysis is done by windowing the speech
%     data with a rectangular window that is fs ms long. Defaults to 30 ms.
% preemp - This variable is the epsilon in a digital one-zero filter which
%     serves to preemphasize the speech signal and compensate for the 6 dB
%     per octave rolloff in the radiation function. Defaults to 0.9378.
% The output variables from this function are:
% aCoeff - The LPC analysis results, a(i). One column of L numbers for each
%     frame of data.
% resid - The LPC residual, e(n). One column of sr*fs samples representing
%     the excitation or residual of the LPC filter.
% pitch - A vector of frame-by-frame estimates of the pitch of the signal,
%     calculated by finding the peak in the residual's autocorrelation for
%     each frame.
% G - The LPC gain for each frame.
% parcor - The parcor coefficients. The parcor coefficients give the ratio
%     between adjacent sections in a tubular model of the speech articulators.
%     There are L parcor coefficients for each frame of speech.
% stream - A vector representing the residual or excitation signal in the LPC
%     analysis. Overlapping frames of the resid output are combined into a new
%     one-dimensional signal and post-filtered.
The following diagram illustrates how this M-file works.

[Diagram: data & sr → Preemphasis (preemp) → Frame Blocking (fr, fs) → LPC
Calculation (L) → aCoeff, resid, pitch, G, parcor, stream.]
(a) Feature extraction for the training set: For each sample, we can obtain one set of
coefficients. Develop your own algorithm to distinguish gender using those
coefficients. Write Matlab scripts/functions to implement your algorithm and
briefly explain how it works.
Note 1: Use [wave, SampleRate] = TIMITread(filename) to read the wave files.
Note 2: The unvoiced segments in the speech files may affect your identification
performance.
⁴ Auditory Toolbox: http://rvl4.ecn.purdue.edu/~malcolm/interval/1998-010/
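A minimal sketch of one possible feature for 2(a): the average voiced pitch period from proclpc.m, converted to Hz (the file name and the 160 Hz threshold are assumptions; tune the threshold on the training set):

[wave, SampleRate] = TIMITread('train01.wav');   % hypothetical file name
[aCoeff, resid, pitch, G] = proclpc(wave, SampleRate, 10);
T = mean(pitch(pitch > 0));   % average pitch period, voiced frames only (Note 2)
f0 = SampleRate / T;          % fundamental frequency in Hz
if f0 > 160, disp('female'), else disp('male'), end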
(b) Testing new voice files: Your algorithm will be tested with ten new samples, and
your score for this part depends on the percentage of correct identifications made
by your gender identifier.
Part II. Speech Coding: Linear Predictive Vocoder
To encode a speech signal at a low bit rate, it is efficient to employ an analysis-synthesis
approach to design a voice coder (vocoder). The linear predictive vocoder is a popular
framework. In this part, we design a 2.4 kbps 10th-order linear predictive vocoder
based on the linear predictive model we learned in Part I.
Ideas:
We have already learned how to obtain the LPC-related parameters, such as the 10 LPC
coefficients {ak}k=1~10 (aCoeff), the gain G (Gain), and the pitch (pitch), for each frame using
proclpc.m⁵. We can represent those parameters as a vector A = (a1, a2, a3, a4, a5, a6, a7, a8,
a9, a10, Gain, Pitch) and quantize A to compress a speech signal. After reconstructing the
parameters from their quantized values, we can use synlpc.m (also in the Auditory
Toolbox) to synthesize the speech.
% synWave = synlpc(aCoeff,source,sr,G,fr,fs,preemp);
%
% LPC synthesis produces a monaural sound vector (synWave) using the following
% parameters:
% aCoeff - represents the LPC analysis results, a(i). Each column of L+1 numbers
%     corresponds to one frame of speech data. The number of columns is
%     determined by the number of frames in the speech signal.
% source - represents the LPC residual, e(n). One column of sr*fs samples
%     representing the excitation or residual of the LPC filter.
% sr - sampling rate.
% G - The LPC gain for each frame.
% fr - Frame time increment, in milliseconds (ms). The LPC analysis is done
%     every fr ms in time. Default is 20 ms (i.e., 50 LPC vectors a second).
% fs - Frame size in ms. The LPC analysis is done by windowing the speech
%     data with a rectangular window that is fs ms long. Default is 30 ms
%     (i.e., allowing 10 ms of overlap between frames).
% preemp - This variable is the epsilon in a digital single-zero filter that
%     is used to preemphasize the speech signal and compensate for the 6 dB
%     per octave rolloff in the radiation function. Default is 0.9378.
Line Spectrum Pair:
If we directly quantize the LPC coefficients (a1 ~ a10), quantization may cause instability,
because some poles lie near the unit circle. One way to overcome this problem is to
convert the LPC coefficients to Line Spectrum Pair (LSP) parameters, which are more
amenable to quantization. The LSP parameters are calculated by first generating the
polynomials P(z) and Q(z):
⁵ In proclpc.m, if pitch is equal to zero, the frame is unvoiced (UV). If it is nonzero, pitch
indicates that the frame is voiced (V) with pitch period T. To avoid confusion with the meaning of pitch,
we denote the value of pitch as UV/V,T in the following paragraphs.
    P(z) = 1 + (a₁ − a₁₀)z⁻¹ + (a₂ − a₉)z⁻² + … + (a₁₀ − a₁)z⁻¹⁰ − z⁻¹¹
    Q(z) = 1 + (a₁ + a₁₀)z⁻¹ + (a₂ + a₉)z⁻² + … + (a₁₀ + a₁)z⁻¹⁰ + z⁻¹¹

Then, rearrange P(z) and Q(z) to obtain the parameters {wk}:

    P(z) = (1 − z⁻¹) · ∏ₖ₌₂,₄,…,₁₀ (1 − 2·cos(wₖ)·z⁻¹ + z⁻²)
    Q(z) = (1 + z⁻¹) · ∏ₖ₌₁,₃,…,₉ (1 − 2·cos(wₖ)·z⁻¹ + z⁻²)

where {wₖ}k=1~10 are the LSP parameters, ordered 0 < w₁ < w₂ < … < w₁₀ < π.
We can use lpcar2ls.m⁶ to convert LPC (AR) parameters to LSP parameters and
lpcls2ar.m to convert LSP back to LPC (AR).
Quantization:
To achieve a coding rate of 2.4 kbps with a frame size of 20 ms (i.e., 50 frames
per second), each frame is represented by 48 bits. The following table shows how to
allocate the bits among the above-mentioned parameters {wk}, Gain, and UV/V,T.
Parameter:    w1  w2  w3  w4  w5  w6  w7  w8  w9  w10  Gain  UV/V,T
Bits/frame:    3   4   4   4   4   3   3   3   3   3    7     7
We assign seven bits (value range: 0~127) to the parameter UV/V,T. If a frame is
unvoiced (UV), we encode it as (0)₁₀ = (0000000)₂. Otherwise, for the voiced case (V),
we encode the corresponding pitch period T according to the following table. For
example, if T is equal to 22, we encode it as (3)₁₀ = (0000011)₂.
UV/V    T      Encoded Value
UV      —      0
V       20     1
V       21     2
V       22     3
…       …      …
V       146    127

⁶ From Voicebox: http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html
(a) Design your own 2.4 kbps vocoder:

[Encoder: Original Speech → Frame Segmentation & LPC Analysis (proclpc.m) →
{ak}k=1~10 → LPC to LSP (lpcar2ls.m) → {wk}k=1~10 → Q; Gain → Q; UV/V,T → Q;
the quantized parameters form the 2.4 kbps compressed speech.]

[Decoder: 2.4 kbps compressed speech → iQ → {w'k}k=1~10 → LSP to LPC
(lpcls2ar.m) → {a'k}k=1~10; iQ → Gain'; iQ → UV/V,T' → impulse train generator
(voiced) or white noise (unvoiced) → source; LPC Synthesis & Frame Combination
(synlpc.m) → Reconstructed Speech.]
The figures above show the encoder and the decoder, respectively. Write Matlab scripts/functions to implement this scheme and explain your design briefly .
Note that the encoder and decoder should be implemented separately. That is, the encoder reads7 a wave file, generates a compressed bit stream, and saves it to disk. The decoder reads this compressed bit stream from disk and decompresses it into a wave file. You can use proclpc.m, synlpc.m, lpcar2ls.m and lpcls2ar.m as the basic building blocks. The remaining work is to design quantization tables for the parameters A’ = (w1, w2, w3, w4, w5, w6, w7, w8, w9, w10, Gain, UV/V,T) (a quantizer sketch follows below), and to design the impulse train generator for voiced frames and the white noise source for unvoiced frames.
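As a starting point, here is a minimal sketch of a uniform scalar quantizer/dequantizer for one LSP parameter. The range (0, π) and the variable names are our own illustrative choices; in practice you would derive per-parameter ranges from training data and use the bit allocation table above.

% Hedged sketch: uniform quantization of one LSP parameter wk in (0, pi).
w = 1.2;                    % example LSP value to be quantized
nbits = 4;                  % bits allocated to this parameter
wmin = 0; wmax = pi;        % assumed parameter range
step = (wmax - wmin) / 2^nbits;
idx = min(max(floor((w - wmin)/step), 0), 2^nbits - 1);   % encoder: index 0..2^nbits-1
w_hat = wmin + (idx + 0.5) * step;                        % decoder: mid-point reconstruction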
(b) Compress the speech signal stored in tapestry.wav. Calculate the mean squared error between the reconstructed speech signal and the original speech signal .
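A minimal sketch of the MSE computation (the file name reconstructed.wav is a hypothetical output of your decoder):

% Hedged sketch: MSE between original and reconstructed speech.
[x, fs] = wavread('tapestry.wav');          % original speech
[y, fs2] = wavread('reconstructed.wav');    % decoder output (hypothetical name)
x = x(:,1); y = y(:,1);                     % use one channel if the files are stereo
N = min(length(x), length(y));              % align lengths before comparing
mse = mean((x(1:N) - y(1:N)).^2)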
(c) Code Excited Linear Prediction (CELP):
CELP is a federal speech-coding standard (FS-1016) that also uses linear prediction. The standard offers good speech compression at intermediate bit rates (4.8-9.6kbps). In this part, we use the audio processing software GoldWave to compress a speech signal in CELP format and compare the result with the one obtained in the previous LPC vocoder part.
7 Use wavread.m to read a wave file. Use wavwrite.m to write out a wave file.
(1) Open tapestry.wav by File → Open in GoldWave. We can convert it to CELP format by File → Save As. You will see the following window. Choose Lernout & Hauspie CELP 4.8kbits/s, 8,000 Hz, 16 bits, mono and save it as a new file.
(2) Load the new CELP wave file and save it back as a 16-bit, mono, signed .wav file. Write a Matlab8 script/function to calculate the MSE between the original signal and the reconstructed signal . Compare the results with those of your LPC vocoder from the previous part .
Part III. Speech Recognition by IBM ViaVoice
IBM ViaVoice is a successful commercial speech recognition product. In this part, we use ViaVoice to get some hands-on experience with the state of the art in speech recognition.
1. ViaVoice Training:
Open IBM ViaVoice: as a new user, you are required to read a short story (about 100 sentences) to train the software. Please be patient and finish this part.
2. Operating a PC by ViaVoice:
Use ViaVoice to operate your PC and do a short dictation in Microsoft Word.
3. Discuss the strengths and weaknesses of this speech recognition system .
8 Use wavread.m to read a wave file. Use wavwrite.m to write out a wave file.
Part IV. Speech Synthesis
Speech synthesis systems are generally classified into two categories: concept-to-speech systems and text-to-speech (TTS) systems. A concept-to-speech system is used by automatic dialogue machines that have a limited vocabulary (e.g., 100 words) but enough intelligence to respond to inputs. A text-to-speech system aims at reading text (e.g., as an aid for the blind) and must handle all the words of a specified language. In this part, we explore the TTS system and implement a simple speech synthesis system.
1. Text To Speech (TTS) and Talking Head:
We can define a text-to-speech system as the production of speech by machines using automatic phonetization of the sentences to utter. In general, a TTS9 system consists of two components: natural language processing (NLP) and digital signal processing (DSP). The NLP component produces a phonetic transcription of the text to read, and the DSP component converts the NLP output into natural human speech. In addition to these two components, researchers have recently added one more module, a talking head, which simulates the face and the articulators while pronouncing text. In this part, we use the CSLU10 toolkit to explore how text-to-speech works.
[Figure: the MultiSync window.]
9 For more details on TTS, please refer to Thierry Dutoit, An Introduction to Text-To-Speech Synthesis, Kluwer Academic Publishers, 1997.
10 CSLU Toolkit: http://cslu.cse.ogi.edu/toolkit/index.html
[Figures: the Talking Head window (left) and the Baldi window (right).]
Click Start → Programs → 408G → Speech Toolkit → Multi Sync, and two windows will pop up. The first is the MultiSync window and the other is the Talking Head window. Type some sentences in the textbox labeled Text to align with in the MultiSync window, and click the TTS button to generate the phonetic transcription. Then click the play button, listen to the speech, and observe the behavior of the talking head. You can choose among five different talking-head characters to do the speech.
From Start → Programs → 408G → Speech Toolkit → Baldi, you will see the talking head, Baldi, shown in the right figure above. By clicking File → Preference, you can change the colors, the texture map, and the emotions of the talking head. You can also observe the articulators via Rendering → Solidness.
2. Vowel Synthesis:
In practice, designing a text-to-speech system is not a simple task. Such a system involves several levels of processing: acoustic, phonetic, phonological, morphological, syntactic, semantic, and pragmatic. In this part, we focus on the phonetic level and synthesize vowels with MakeVowel.m, which is also provided with the Auditory Toolbox.
% y = MakeVowel(len, pitch, sampleRate, f1, f2, f3)
% len: length in samples
% pitch: either a scalar indicating the actual pitch frequency, or an array of impulse
%        locations. Using an array of impulses allows this routine to compute vowels
%        with time-varying pitch.
% sampleRate: sampling rate
% f1, f2 & f3: formant frequencies
Synthesize the ten vowels that were analyzed in Part I.1.(a) by supplying the values of pitch, f1, f2 and f3 as input arguments to MakeVowel.m. You can use the Matlab function sound(y, sampleRate) to hear the synthetic vowels and wavwrite.m to write out the wave files ‘. Compare the vowels you recorded in Part I.1.(a) with the synthesized results .
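For example, a hedged sketch for a single vowel (the formant values 730, 1090 and 2440 Hz are textbook averages for the vowel /a/; substitute the values you measured in Part I.1.(a)):

% Hedged sketch: synthesize one second of the vowel /a/ with MakeVowel.m.
sampleRate = 16000;
y = MakeVowel(sampleRate, 120, sampleRate, 730, 1090, 2440);   % 1 s, 120 Hz pitch
y = y / max(abs(y));                 % normalize to avoid clipping when writing
sound(y, sampleRate);                % listen to the synthetic vowel
wavwrite(y, sampleRate, 'vowel_a.wav');   % save it for your report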
Part V. Human Computer Interface
One motivation for analyzing and synthesizing speech is to create a friendly and convenient interface between users and computers. Through such systems, users can operate and communicate with machines by voice. In this part, we explore two advanced human computer interfaces based on speech recognition.
1. CSLU Human Computer Interface:
CSLU provides a tool, the Rapid Application Developer (RAD), for developing a speech-based human computer interface. The computer uses speech recognition technology to understand what a human says, reacts according to the decision rules set in the program, and speaks via the text-to-speech system. The figure below shows the RAD development environment. You can open this program from Start → Programs → 408G → Speech Toolkit. Take a look at the tutorials (c:\Program Files\CSLU\Toolkit\2.0\apps\rad\examples tutorials) and the user's guide11. Before you start using these interactive programs, you should calibrate the microphone via File → Preferences → Audio → Calibrate from the RAD menu bar.
(a) Design a simple application using the CSLU RAD tools .
11 CSLU RAD Tutorials: http://cslu.cse.ogi.edu/toolkit/docs/2.0/apps/rad/tutorials/
2. MIT Galaxy System: The MIT Spoken Language Systems Group has been working on several research projects on human computer interfaces via telephone, targeting the following applications12:
JUPITER13 - A weather information system
MERCURY - An airline flight planning system
PEGASUS14 - An airline flight status system
VOYAGER - A city guide and urban navigation system
ORION - A personal agent for automated, off-line services
See the instructions on the web sites of JUPITER and PEGASUS. Dial the
corresponding toll-free phone numbers and talk with these two systems. Describe
under what kinds of conditions these systems will make mistakes .
Part VI: Mobile Computing and Pocket PC Programming
We have learned various aspects of digital speech processing in this design project. In
this part, apply what you have learned from the previous parts and design a simple
application related to digital speech processing for Pocket PC using Microsoft eMbedded
Tools. You can refer to “ENEE408G Multimedia Signal Processing Mobile Computing
and Pocket PC Programming Manual” and extend the examples there.
12 MIT SLS: http://www.sls.lcs.mit.edu/sls/applications/
13 JUPITER: http://www.sls.lcs.mit.edu/sls/applications/jupiter.shtml TEL: 1-888-573-8255
14 PEGASUS: http://www.sls.lcs.mit.edu/sls/applications/pegasus.shtml TEL: 1-877-527-8255
ENEE408G Multimedia Signal Processing
Design Project on Digital Audio Processing
The Goals
1. Learn the fundamentals of perceptual coding of audio and of intellectual property rights protection for multimedia.
2. Design digital audio watermarking systems in the time and frequency domains.
3. Explore synthetic audio: MIDI and MPEG4 Structured Audio.
Note: The symbol
means to put your discussion, flowchart, block diagram, or plots
in your report. The symbol ‘ indicates that you should put the obtained multimedia
data in your report. The symbol means to put your source codes (Matlab, Basic, or
C/C++) in your report.
Part I. Perceptual Coding and MP3
In modern audio coding algorithms, four key technologies play important roles:
perceptual coding, frequency-domain coding, window switching, and dynamic bit
allocation. The figure below shows a generic block diagram for modern audio encoders.
In this part of the design project, we investigate the fundamentals of perceptual coding in
the MP3 technology.
[Block diagram of a generic audio encoder: the input s(n) feeds both a Time/Frequency Analysis block and a Psychoacoustic Analysis block; the analysis parameters go through Quantization & Encoding and then Entropy (lossless) Coding; the Psychoacoustic Analysis produces masking thresholds that drive Bit Allocation; the coded data and side info are multiplexed (Mux) onto the channel.]
1. Psychoacoustic Models and Perceptual Coding
The figure1 above shows the anatomy of the human ear. Many researchers in the field of psychoacoustics exploit "irrelevant" signal information that is not detectable even by a well-trained or sensitive listener. These studies lead to five psychoacoustic principles.
a. Absolute Threshold of Hearing: This threshold represents the minimum amount of energy needed for a listener to detect a pure tone in a noiseless environment. If a given tone is too weak, we cannot hear it, so we do not have to encode it.
b. Critical Bands Frequency Analysis: The cochlea can be modeled as a non-uniform filter bank that consists of 25 highly overlapping bandpass filters. Critical bands are the passbands of those filters.
c. Simultaneous Masking: In each critical band, one sound (the maskee) is rendered inaudible by the presence of another sound (the masker). We can identify a masker and avoid encoding the inaudible, masked tones.
d. Spread of Masking: Masking in a critical band can spread to its neighboring bands.
e. Non-Simultaneous Masking: Masking can occur in the time domain, too.
For perceptual coding, the encoder generates a global masking threshold according to the above principles and provides parameters for further processing. In this part, we investigate the absolute threshold of hearing and simultaneous masking principles.
(a) Absolute Threshold of Hearing:
In this section, we use PM_Abs_Thre_Hearing.m to explore the absolute threshold of hearing. The goal of this experiment is to find the volume threshold that is just audible at a specific frequency. In other words, a tone at the same frequency with a slightly lower volume than this threshold becomes inaudible.
[Figure: Stage 2: Measure Absolute Threshold of Hearing. The plot shows the sound pressure level relative to 4 kHz, rSPL (dB), versus frequency (Hz); left-click the mouse to adjust the volume, right-click to exit.]
1 From http://www.vestibular.org/gallery.html
First, we calibrate the minimal audible volume at 4KHz2. After setting this volume, you will see the figure above. Each circle represents a frequency component; we can increase or decrease its volume by left-clicking the mouse above or below the circle, and right-click the mouse to exit the program.
Find the thresholds for your ear at the 11 frequencies shown in the figure. Copy your result figure using Edit → Copy Figure from the menu bar of the figure window and paste it into your report ‘.
(b) Simultaneous Masking:
In this section, we use PM_Simu_Masking.m to explore simultaneous masking, in which a tone becomes inaudible when a simultaneous tone with higher volume is present at a neighboring frequency. For each critical band, we fix the volume of the central frequency. By adding a neighboring tone with varying amplitude, we can find the threshold below which this neighboring tone is inaudible and above which it is audible.
In this experiment, we also need to calibrate the minimal audible volume at 4KHz. After calibration, you will see the figure illustrated below. There are seven frequencies indicated by circles. The middle one represents the central frequency of a specific critical band. When we click another, neighboring frequency, the program generates an audio signal consisting of the central frequency and the selected frequency. We can increase or decrease its volume by left-clicking the mouse above or below the circle. After finding the thresholds of the six frequencies (all except the central frequency), right-click the mouse to exit the program.
[Figure: Stage 2: Measure simultaneous masking. The plot shows the sound pressure level relative to 4 kHz, rSPL (dB), versus frequency (2900-3900 Hz); left-click the mouse to adjust the volume, right-click to exit.]
2 Since 4KHz is the frequency to which the human ear is most sensitive, we first calibrate the minimal audible volume at 4KHz to obtain the full dynamic range of volume.
Select two different critical bands3 and find the simultaneous masking curves for your ear. Copy the two figures using Edit → Copy Figure from the menu bar of the figure window and paste them in your report ‘. For each critical band, compare the masking at the higher neighboring frequencies with that at the lower ones .
2. Audio Extraction and MP3
MP3 (MPEG1 Audio Layer 3) is an audio coding/compression standard that uses
perceptual coding technologies. In this part, we use GoldWave to extract a piece of
music and compress it in MP3 format. We will observe how psychoacoustic model
works by comparing the frequency spectrum of the raw signal and MP3 file:
(a) CD Audio Extraction: Click Tools → CD audio extraction from GoldWave's menu bar. An audio extraction window will pop up. Use this tool to extract about ten seconds of music4 from your favorite music CD and save it as a 16-bit stereo signed WAV file. Name it original.wav. ‘
(b) Downsampling: To compare with MP3, we need to downsample this WAV file. Click Effect → Resample and choose 16000Hz. Save this file as downsample.wav. ‘
(c) Generate an MP3 file by File → Save As. Choose Save as type as Wave Audio. Set the parameters in File Attributes to MPEG Layer 3, 32kbps, 16000Hz, stereo. Name this new file wav2mp3.wav. ‘
(d) Convert MP3 to WAVE: Reload the wav2mp3 file and save it as a 16-bit stereo signed WAV file. Name it reconstructed_wav.wav. ‘
(e) Compare Spectrum: Use Compare_Spectrum.m5 to plot the spectra of downsample.wav and reconstructed_wav.wav . Compare the difference between the two audio clips and discuss in which frequency range the psychoacoustic model has a significant impact .
3 [LinX,volumnY] = PM_Simu_Masking(ith_CB); you can select the critical band by specifying ith_CB.
4 You can use your own favorite Compact Disc.
5 Compare_Spectrum(Original_Filename, Reconstructed_Filename, NFFTorder, Len_order, shift)
% Original_Filename: file name of the original signal
% Reconstructed_Filename: file name of the reconstructed signal
% NFFTorder: number of FFT points, NFFT = power(2,NFFTorder); e.g., NFFTorder = 9 for a 512-point FFT
% Len_order: number of samples for calculating the spectrum, Len = power(2,Len_order)
% shift: start position for comparing the two signals
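For example, a hypothetical call using the signature in footnote 5 (the argument values here are only illustrative; pick NFFTorder, Len_order and shift to suit your clip):

Compare_Spectrum('downsample.wav', 'reconstructed_wav.wav', 9, 15, 1);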
(f) Repeat (b)~(e) but change the MP3 settings to MPEG Layer 3, 56kbps, 24000Hz, stereo ‘. Compare the result with the previous one .
Part II. Digital Audio Watermarking
1. Watermark Embedding, Detection, and Attack
Digital Watermarks: With the advances in network communications and multimedia signal processing, exchanging and distributing multimedia have become easier and more popular. The accompanying problem is how to protect the copyright of the intellectual property in digital content. Ownership/copyright protection and integrity verification are two key issues. Digital watermarking is a class of techniques that embed copyright and other protection information in multimedia. In this part, we use AudioMark Demo6 to embed and detect watermarks in audio and explore the weaknesses of this demo software.
(a) Embedding of a Digital Audio Watermark: Use Watermark → Cast to load sample.wav, and specify a KEY (the demo version accepts keys between 100000000 and 100000100) and the filename of the watermarked file.
(b) Detection of a Digital Audio Watermark: Use Watermark → Detect and specify the key/seed; the detection module will determine the existence of the watermark.
(c) Attack on a Robust Audio Watermark: An adversary may modify a piece of watermarked multimedia to try to remove the watermark without degrading the perceptual quality too much. To increase the robustness of watermarking, designers should understand typical attacks and guard against them in advance. In this part, you are asked to attack the watermarked wave file using the tools provided by GoldWave and test for the existence of the watermark in the attacked file using AudioMark Demo. Try to keep the quality of the music acceptable. Here are a few possible attacks.
6 AudioMark Demo: http://www.alphatecltd.com/watermarking/audiomark/audiomark.html
(1) MP3 compression: Choose mp3 under File → Save As.
(2) Echoing: Add echo by Effect → Echo.
(3) Enhancement and filtering techniques, such as lowpass filtering, quantization, and equalization: By Effect → Filter → Low/Highpass, Equalizer, or Effect → Resample.
(4) Noise addition: By Tools → Expression Evaluator.
Develop an attack scheme that preserves reasonably good sound quality yet removes the watermark embedded by AudioMark Demo. Describe under what conditions the watermark is destroyed.
2. Design Your Own Audio Watermarking Systems
There are three basic issues in designing an audio watermarking system:
a. Transparency: The digital watermark should not degrade the perceptual quality of the signal.
b. Robustness: For watermarks conveying the owner's rights, adversaries have incentives to remove the watermark by modifying and attacking the watermarked audio. The watermarks in these applications should be robust enough to survive a wide range of attacks. On the other hand, a watermark that is fragile to processing can be useful for detecting tampering, where the change in the watermark signals and locates the altered regions of a multimedia signal. As multimedia is often stored in compressed format for efficient storage and transmission, even a fragile watermark should usually be designed to sustain moderate compression.
c. Capacity / Payload: In many applications, it is desirable for the embedded watermarks to carry enough payload bits to represent various types of information.
In this section, you will design two digital audio watermarking systems, and evaluate
them in terms of transparency, robustness, and payload.
(a) System #1
One of the simplest approaches to "hide" a message in audio is to convert the message into bits and put them into the least-significant bits (LSBs) of the audio samples. To help the detector make a more reliable decision, you may repeatedly embed each message bit in a number of audio samples at the embedder's side and do a majority vote at the detector's side.
(1) Implement this LSB-based watermarking in Matlab to embed and detect the following message in the audio file sample.wav (a sketch follows the notes below):
“(c)Spring 2003. DO NOT SELL. DO NOT TAMPER. Go Terps! ”
Your implementation may include a message encoding function, watermark embedding and detection functions, and a message decoding function.
Note 1: sample.wav can be downloaded from the course webpage. It is a stereo audio file. For simplicity, you can embed watermarks in just one channel here.
Note 2: Matlab supports I/O of the WAVE file format; wavread.m and wavwrite.m read and write wave files, respectively. The format of the sample.wav file is signed 16-bit, i.e., the range of its signed integer representation is [-32768, 32767]. However, the value obtained by wavread.m is stored as a "double" in [-1, 1). To embed watermarks in the LSBs in the Matlab environment, you may find it convenient to convert the [-1, 1) values to 16-bit unsigned integer values [0, 65535] using the formula (x+1)*2^15.
Note 3: The following Matlab built-in/toolbox functions may be helpful to your implementation: dec2bin( ), bin2dec( ), char( ), double( ).
Note 4: Try to implement your watermarking system in a flexible way that accommodates embedding in the nth LSB, n = 1, 2, 3, …. See the instructions below regarding transparency and robustness.
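To make the steps concrete, here is a minimal, hypothetical sketch of 1st-LSB embedding and detection with repetition; the message, the repetition factor and all variable names are our own illustrative choices.

% Hedged sketch: LSB embedding with repetition (one channel only).
[x, fs] = wavread('sample.wav');
msg = 'Go Terps!';                      % short illustrative message
b = dec2bin(double(msg), 8);            % each character as an 8-bit binary string
msgbits = reshape(b', 1, []) - '0';     % row vector of 0/1 message bits
rep = 9;                                % embed each bit in rep consecutive samples
bits = repmat(msgbits, rep, 1);
bits = bits(:);                         % b1 repeated rep times, then b2, ...
n = length(bits);
s = round((x(1:n,1) + 1) * 2^15);       % map [-1,1) doubles to unsigned 16-bit integers
s = bitset(s, 1, bits);                 % write the message bits into the 1st LSB
y = x;
y(1:n,1) = s / 2^15 - 1;                % back to the normalized [-1,1) range
wavwrite(y, fs, 16, 'marked_lsb.wav');

% Detection: majority vote over each group of rep samples.
[y2, fs2] = wavread('marked_lsb.wav');
r = round((y2(1:n,1) + 1) * 2^15);
raw = bitget(r, 1);                     % raw embedded bits
dec = mean(reshape(raw, rep, []), 1) > 0.5;               % recovered message bits
rec = char(bin2dec(char(reshape(dec, 8, [])' + '0')))'    % back to characters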
(2) Transparency: Listen to the watermarked audio whose 1st LSBs carry your message. Does the watermark affect the quality of the audio? Change your embedding function (and correspondingly the detection) to put your message in the 2nd LSBs and answer the above question again. How about the 3rd and 4th LSBs?
(3) Robustness, Security, and Applications: How robust is the watermarking system that embeds the message in the 1st LSBs? How about the 2nd, 3rd, and 4th LSBs? How does the repetition factor affect the robustness? Can an unauthorized person change the embedded message? Design tests and use the results to justify your answers .
Note: You can use the bit error rate (BER), the percentage of bits that are incorrectly decoded, to measure the robustness.
(b) System #2
The watermark can be embedded either in the time domain or the frequency domain. In this section, we employ spread-spectrum embedding to put a watermark in the 1-D DCT domain.
Embedder:
[Block diagram: the audio file is segmented into frames of size L and transformed by a DCT into coefficients V(j); a key/seed drives a noise-like sequence, the watermark W(j); the watermarked coefficients are V'(j) = V(j) + a(j)W(j), with e.g. a(j) = 0.05|V(j)|, added in the mid-frequency region; an inverse DCT of V'(j) yields the watermarked audio file.]
The basic procedure of spread-spectrum embedding is as follows (a sketch appears after this list):
• First, segment the audio file into non-overlapping frames of size L = 1024 and apply a 1-D DCT to each frame.
• Second, construct a noise-like vector w of length L as your watermark (for example, through “randn( )”). Normalize the strength of your watermark; for example, use a random number generator to generate each element wi with a variance of 1.
• For each frame of the audio, add the watermark according to v’i = vi + αi wi, i = 1, …, L, where vi denotes an original coefficient and v’i the watermarked version. The scaling factor αi controls the strength of your overall watermark. Here we apply the following simple rule: set αi to zero except for mid-frequency DCT coefficients (i.e., αi = 0 for i < T1 and i > T2, where the frequency thresholds T1 < T2 are determined by your experiments). For each mid-frequency coefficient vi, set αi to be 3-10% of |vi|; you can determine the exact setting empirically.
As can be seen from this simple rule, the watermark is only embedded in the mid-frequency DCT coefficients of an audio frame. This is because mid-frequency information is more important to the human auditory system than the other parts; an adversary does not want to sacrifice too much audio quality by hacking this informative part, so the watermark is more likely to survive. A more sophisticated choice of αi for the mid-frequency part should be guided by a human auditory model.
• After embedding, perform an inverse DCT to convert the signal back to the time domain. Clip the amplitude of the watermarked signal to the range of the original audio samples [-1, +1]. Repeat the process for every frame.
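A minimal sketch of the embedder under these rules (the seed, the thresholds T1 and T2, and the 5% strength are illustrative choices):

% Hedged sketch: spread-spectrum embedding in the 1-D DCT domain.
[x, fs] = wavread('sample.wav');
x = x(:,1);                            % one channel for simplicity
L = 1024; T1 = 200; T2 = 500;          % frame size and mid-band limits (tune these)
nframes = floor(length(x)/L);
randn('state', 12345);                 % key/seed (hypothetical value)
w = randn(L, 1);                       % noise-like watermark, unit variance
y = x;
for k = 1:nframes
    idx = (k-1)*L + (1:L);
    v = dct(x(idx));
    a = zeros(L,1);
    a(T1:T2) = 0.05 * abs(v(T1:T2));   % 5% of |vi| in the mid band only
    y(idx) = idct(v + a .* w);
end
y = min(max(y, -1), 1);                % clip to the original amplitude range
wavwrite(y, fs, 16, 'marked_ss.wav');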
Detector:
We use the original unmarked audio file to help determine the existence of a specific watermark. Thus the detector knows the original audio file, the watermark sequence, and, of course, the watermarking method and all the related parameters.
[Detector block diagram: the audio file in question and the original audio file are each DCT-transformed (giving V'' and V); their difference V'' − V is formed, the mid-frequency coefficients are extracted, and the watermark detector compares them with a specific watermark W to produce the detection result. Similarity measures: <V''−V, W>, <(V''−V) ./ a, W>, or the correlation coefficient.]
The basic procedure of spread-spectrum detection is as follows (a sketch appears after this list):
• Perform the DCT on both the original audio file and the audio file in question, and compute the difference between the corresponding elements of these two sets of DCT coefficients. We denote this difference vector by z.
• Retain only the elements of w and z that correspond to the mid-frequency part in which the embedder chose to embed the watermark. We denote the retained vectors by w(m) and z(m).
• Measure the similarity between w(m) and z(m) by computing the correlation coefficient. A high positive correlation coefficient indicates that, with high probability, the audio frame in question comes from adding w(m) to the original audio frame. You may also try to take the scaling factor a into account when measuring the similarity.
• Repeat the process for the other frames. Plot the correlation coefficients you obtain from all frames.
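A matching sketch of the detector (it assumes the variables x, w, L, T1, T2 and nframes from the embedder sketch above are still in the workspace):

% Hedged sketch: correlation-coefficient detection, frame by frame.
[yq, fs] = wavread('marked_ss.wav');   % the audio file in question
yq = yq(:,1);
cc = zeros(1, nframes);
for k = 1:nframes
    idx = (k-1)*L + (1:L);
    z = dct(yq(idx)) - dct(x(idx));    % difference of DCT coefficients
    R = corrcoef(z(T1:T2), w(T1:T2));  % similarity in the mid band only
    cc(k) = R(1,2);
end
plot(cc); xlabel('frame'); ylabel('correlation coefficient');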
Here is your To-Do list:
(1) Use Matlab scripts/functions to implement an embedder and a detector according to the procedures described above .
Note: The following Matlab built-in/toolbox functions may be helpful to your implementation: dct( ), idct( ), randn( ), corrcoef( ).
(2) Generate two different spread-spectrum watermarks w1 and w2 . Produce a watermarked audio file from sample.wav with w1 embedded, and name it marked1.wav; produce a second watermarked audio file from sample.wav with w2 embedded, and name it marked2.wav. Use your detector to determine whether w1 can be found in marked1.wav and marked2.wav, respectively. In a single plot of correlation coefficients vs. audio frames, include and compare your detection results for the two cases.
(3) Transparency and Robustness - Adjust various parameters in your watermarking
system (L, αi , T1 and T2) and examine their impact on transparency and
robustness:
i. Listen to the watermarked audio. Does the watermark affect the quality of the audio?
ii. Add noise7 of different amplitudes to the watermarked audio. Does the detector still detect the watermark that was generated from the original user key?
iii. Use GoldWave to add echo to the watermarked audio. Try echoes with a short delay and small volume, and with a long delay and large volume, respectively. Observe the detection results.
iv. Use GoldWave to apply MP3 compression to the watermarked audio. Can your watermark survive the compression attack?
v. Discuss the tradeoff between transparency and robustness . Include in your report a watermarked audio signal using the parameter settings that you believe give the best tradeoff ‘.
(4) Compare the transparency and robustness of the watermarks for the two systems investigated above. List their advantages, disadvantages, and potential applications. Discuss how to improve the two systems.
(5) Bonus part: Extend your System #2 to hide a meaningful message, such as the message we used in System #1 .
Hint: you can hide one bit per frame by adding the watermark to embed a bit "1" (v’i = vi + αi wi) and subtracting the watermark to embed a bit "0" (v’i = vi - αi wi). You can reuse your message encoding and decoding functions from System #1. If needed, you can repeatedly embed the same bit in a few frames and do a majority vote at the detector's side.
Appendix of Part-II for further exploration: Pseudo-Noise Sequence
Spread-spectrum watermarking uses a noise-like sequence w as the watermark. Such noise-like signals are less likely to introduce perceivable distortions than structured signals (such as a periodic square wave). In addition, they have good statistical properties that provide high noise resistance and help the detector make reliable decisions.
In System #2, you experimented with a real-valued watermark generated by a Gaussian random number generator. Another, simpler choice of w is a binary periodic pseudo-noise sequence8, or PN sequence for short. A PN sequence can be generated by a series of shift registers with feedback logic, as shown in the figure below.
7 You can use rand.m to generate "uniformly distributed" random sequences or randn.m for "normally distributed" random sequences. For example, noise = A*randn(1,100) gives 100 normally distributed random numbers with amplitude scale A.
[Figure: a p-stage shift register (flip-flops 1, 2, 3, …, p) with a feedback logic block, driven by a clock; the last stage outputs the PN sequence.]
The feedback logic can be expressed by a polynomial,
f(X) = g1X + g2X^2 + … + gpX^p
where X^i indicates the ith flip-flop and gi ∈ {0,1} controls the logic. When the clock triggers, the system shifts all of its values by one position, outputs one bit of the PN sequence, and feeds f(X) back to the first flip-flop. One implementation employs Galois Field prime polynomials9 in the feedback logic to obtain a maximal-length sequence (m-sequence for short).
Setting p = 10 gives us 60 possible polynomials to use10. Each polynomial can generate 2^10 - 1 = 1023 different output sequences. You can generate a pseudo-random sequence of this kind using the provided MATLAB function PNsequence.m11 with a given polynomial and a seed (which initializes the shift register). We can observe the auto-correlation and cross-correlation functions of the m-sequence with the following MATLAB programs:
Auto-correlation property:
p = 10;                                   % order of the shift register
seq1_index = ?;                           % specify your selected key here
seq1_poly = GFprimMrx(seq1_index,2:end);
seq1 = PNsequence(seq1_poly, bitget(1,1:p));
for seq2_index = 1:2^p-1
    seq2 = PNsequence(seq1_poly, bitget(seq2_index,1:p));
    [corrf] = PNcorrelation(seq1, seq2);
    plot(corrf); axis([0, length(corrf)+1, 0, max(corrf)+1]);
    pause;
end
8 For details, please refer to Chapter 7 in: Simon Haykin, Communication Systems, 4th edition, Wiley, 2000.
9 For a fixed order p, the prime polynomials can be obtained using the Matlab built-in function GFprimMrx = gfprimfd(p,'all'). Each row represents a "key" for each user: GFprimMrx(key,2:end) = [g1 g2 … gp]. You can store those polynomials in a .mat file (with the save command), since searching for the polynomials is time-consuming.
10 W.W. Peterson and E.J. Weldon, Error-Correcting Codes, Cambridge: MIT Press, 1972.
11 out_sequence = PNsequence(GFprimMrx(key,2:end), seed)
Cross-correlation property:
seq1_index = ?;                           % specify your selected key here
NumGF = size(GFprimMrx,1);
maxTable = zeros(1,NumGF);
seq1_poly = GFprimMrx(seq1_index,2:end);
seq1 = PNsequence(seq1_poly, rand(1,length(seq1_poly)) >= 0.5);
for seq2_index = 1:NumGF
    seq2_poly = GFprimMrx(seq2_index,2:end);
    seq2 = PNsequence(seq2_poly, rand(1,length(seq2_poly)) >= 0.5);
    [corrf] = PNcorrelation(seq1, seq2);
    plot(corrf); axis([0, length(corrf)+1, 0, max(corrf)+1]);
    maxTable(seq2_index) = max(corrf);
end
plot(maxTable);
find(maxTable == max(maxTable))
Part III. Synthetic Audio (1): Musical Instrument Digital Interface
(MIDI)
Musical Instrument Digital Interface (MIDI)12 differs from digitally sampled audio, such as PCM. It can be thought of as instructions telling a music synthesizer when to play and what notes to play, instead of sending waveforms to the speakers. This synthesis approach has several advantages; for instance, it requires much less storage space and much less I/O bus bandwidth. In this part, we use Anvil Studio13 to study the MIDI protocol. As shown in the figure below, this software has several panels (from top to bottom): the play panel, track editor panel, stave panel, note editor panel, and keyboard panel.
(a) Load Sonata-c.mid by File → Open Song. Double-click the play button in the play panel. Then modify the settings in the track editor panel, such as the channel and instrument14. The channel setting assigns an audio channel to each track, and the instrument setting chooses which instrument plays it.
12 A useful tutorial on MIDI: http://www.harmony-central.com/MIDI/Doc/tutorial.html
13 Anvil Studio: http://www.anvilstudio.com/upgraden.htm
14 There are 16 logical channels and 128 instruments in the General MIDI (GM) system. The instrument numbers are standardized, so different music synthesizers will not play different instruments when reading the same instrument number from a MIDI file. However, this does not mean those synthesizers will play the same waveform. In fact, every music synthesizer has its own approach to generating any specified note of an instrument. There are usually two approaches, Frequency Modulation and Wave Table; the quality of the latter is much better than that of the former. Besides hardware synthesizers, there are also software synthesizers that generate and mix waveforms. Microsoft GS Wavetable SW Synth is a popular one under Windows, and it adopts Roland instrument sounds. You can try changing the MIDI playback device, whether a hardware or a software synthesizer, and listen to the differences among them.
(b) Notes in a MIDI file are defined by the following table (each note number takes 7 bits, i.e., values 0~127).
Music Notes
Octave    C    C#    D    D#    E     F    F#    G    G#    A    A#    B
  0       0     1    2     3    4     5     6    7     8    9    10   11
  1      12    13   14    15   16    17    18   19    20   21    22   23
  2      24    25   26    27   28    29    30   31    32   33    34   35
  3      36    37   38    39   40    41    42   43    44   45    46   47
  4      48    49   50    51   52    53    54   55    56   57    58   59
  5      60    61   62    63   64    65    66   67    68   69    70   71
  6      72    73   74    75   76    77    78   79    80   81    82   83
  7      84    85   86    87   88    89    90   91    92   93    94   95
  8      96    97   98    99  100   101   102  103   104  105   106  107
  9     108   109  110   111  112   113   114  115   116  117   118  119
 10     120   121  122   123  124   125   126  127     -    -     -    -
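The note numbers in the table follow a simple formula, note = 12 × octave + semitone, where the semitone index runs C = 0, C# = 1, …, B = 11. A one-line check in Matlab:

octave = 5; semitone = 0;      % C in octave 5
note = 12*octave + semitone    % displays 60, matching the table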
Anvil Studio provides a keyboard interface at the bottom of the program's window, which automatically translates the notes you key in into the above note numbers. Create a new MIDI file by File → New Song and use the keyboard panel to key in the following score. Save this MIDI file. ‘
(c) To check whether the MIDI file is generated successfully, you can use the
Windows Media Player to listen to this MIDI file.
Part IV. Essays: Digital Rights Protection of Multimedia and related
Ethics issues for Engineers
Intellectual Property (IP), such as copyright, plays an important role in contemporary society. Discuss the issues of IP rights protection for digital multimedia within your team, and interview your friends/families to see what they think about it. Write an essay reporting and summarizing these opinions .
Part V: Mobile Computing and Pocket PC Programming
From the above three parts, we have learned the fundamentals of digital audio processing.
Now design a simple Pocket PC application related to digital audio processing using the
Microsoft eMbedded Tools. You can refer to “ENEE408G Multimedia Signal Processing
Mobile Computing and Pocket PC Programming Manual” and extend the examples
there.
Bonus Part I. Synthetic Audio (2): MPEG4- Structured Audio (MP4SA)
The MPEG4 synthetic audio coding standard consists of two methods: Structured Audio (SA) and Text-to-Speech (TTS). In the SA part, MPEG4 defines the Structured Audio Orchestra Language (SAOL) and the Structured Audio Score Language (SASL)15. Instead of using the frequency modulation and wavetable techniques as in MIDI, SAOL encodes a sound signal according to its structure. This technique can achieve extremely high compression ratios, about 100:1~10,000:1. In this section, we explore these two languages, SAOL and SASL, using SPlay and SNet16.
SAOL and SASL:
The role of SAOL is sound modeling: it encodes algorithms for how instruments generate sounds. SASL, on the other hand, is for sound sequencing, i.e., a timing table giving instructions to each instrument on when and how to play notes. The figure below illustrates the framework. SAOL and SASL files are in plain-text format. The encoder encodes the two files into a binary form known as the MP4 format. An MP4 file contains the structure of the instruments and the score of the music instead of digitized sample waveforms. The decoder converts the MP4 file into a C file and then compiles it into an executable audio file.
[Block diagram: SAOL and SASL files feed the Encoder, which produces a .mp4 file; the Decoder runs the .mp4 file through an MP4-to-C Translator to produce a .c file, which a C++ compiler turns into an executable audio file.]
SPlay:
SPlay is a software program that implements a decoder. Download several MP4 files from http://student-kmt.hku.nl/~saol/ and listen to the results. Notice that the file sizes are quite small. The behavior of this audio player differs from that of a waveform player, such as players for .wav and .mp3 files: since an MP4 player translates the MP4 file into a C file and compiles it, it takes more time than a waveform player.
15 MP4-SA Language Standard: http://www.cs.berkeley.edu/~lazzaro/sa/book/append/fdis/SA-FDIS.pdf
(a) A useful online book explains how to use this new language: The MPEG-4 Standard Structured Audio Book, http://www.cs.berkeley.edu/~lazzaro/sa/book/, by John Lazzaro and John Wawrzynek.
(b) You can watch a short presentation about MP4-SA by John Wawrzynek at http://bmrc.berkeley.edu/bibs/instance?prog=1&group=13&inst=35
16 SPlay and SNet are available at: http://student-kmt.hku.nl/~saol/
(a) In this sub-section, we explore SAOL and SASL using SNet, a GUI wrapping the kernel sfront. You can refer to John Lazzaro's online book, The MPEG-4 Standard Structured Audio Book, to learn more about how to use these languages.
(1) Read Part I – A Tutorial Introduction of the online book.
(2) Play the three examples (sine, vsine and vcsine) in SNet. First, copy the *.saol file and paste it on the SAOL tab, then save it. Next, copy the *.sasl file and paste it on the Score tab. Save them by Render → Render to .mp4. ‘
(3) Use SPlay to play the MP4 files. Discuss the advantages and disadvantages of waveform coding (PCM) and synthetic audio (e.g., MIDI and MP4-SA) .
ENEE408G Multimedia Signal Processing (Spring’03)
Overview and Warm-up Exercises of Matlab Programming
1. Starting Matlab
Begin a Matlab session by clicking on its icon under the “Start → Programs” menu. We will be using Matlab 6.5. Once started, you will see an interface like this1:
1 For more details, please refer to http://www.mathworks.com/access/helpdesk/help/techdoc/matlab.shtml
2. Matlab 6.5 Features
The new features of Matlab 6.5 include:
• Command Window
Use the Command Window to enter commands, and run functions and M-files.
• Command History
The commands that you have entered in the Command Window are automatically logged in
the Command History window. You can view previously used functions there, and copy
and execute selected lines.
• Start Button and Launch Pad
MATLAB's Launch Pad provides easy access to tools, demos, and documentation.
• Help Browser
Use the Help browser to search and view documentation for all MathWorks products. The
Help browser is a Web browser integrated into the MATLAB desktop that displays HTML
documents.
To open the Help browser, click the help button in the toolbar, or type “helpbrowser” in the Command Window.
• Current Directory Browser
MATLAB file operations use the current directory and the search path as reference points.
Any file you want to run must either be in the current directory or on the search path.
To search, view, open, or make changes to MATLAB-related directories and files, use the
MATLAB Current Directory browser. Alternatively, you can use the commands dir, cd, and
delete in the Command Window.
After starting Matlab, please change the current directory to your own working directory
rather than staying in the default Matlab directories. Otherwise, you could accidentally
overwrite some important files!
• Workspace Browser
The MATLAB workspace consists of the variables built up during a MATLAB session.
Variables are added to the current workspace when you use functions, run M-files, or load
previously saved workspaces.
To view the workspace and information about each variable, use the Workspace Browser, or
use the commands “who” and “whos”.
• Array Editor
Double-click a variable in the Workspace browser to see it in the Array Editor. Use the
Array Editor to view and edit a visual representation of one- or two-dimensional numeric
arrays, strings, and cell arrays of strings that are in the workspace.
• Editor/Debugger
Use the Editor/Debugger to create and debug M-files, which are Matlab programs you write.
The Editor/Debugger provides a graphical user interface for basic text editing, as well as for
M-file debugging.
You can use any other text editor to create M-files, such as Emacs and Notepad, and use
Preferences (accessible from the Matlab desktop’s File menu) to specify that editor as the
default. If you choose to use other editors, you can still use the MATLAB Editor/Debugger
for debugging, or you can use debugging functions, such as “dbstop”, which sets a
breakpoint.
If you just need to view the contents of an M-file, you can display it in the Command
Window by using the “type” command.
3. General Comments
Since most of you are already familiar with Matlab in previous courses, this section provides only a
brief review of a few important points of using Matlab.
• General Philosophy
Matlab has become a popular software tool of linear algebra, numerical analysis, and
visualization. Much of its power lies in its highly optimized operations on vectors and
matrices. In many cases, you should be able to eliminate the “for” loops you used to write in
C code with Matlab's simple and fast vectorized syntax2. Here is an example to compute the
cosine of 10001 values ranging from 0 to 10.
i = 0;
for t = 0:.001:10
    i = i+1;
    y(i) = cos(t);
end
2 If you are using MATLAB 6.5 or higher, before spending time to vectorize your code, please refer to http://www.mathworks.com/access/helpdesk/help/techdoc/matlab_prog/ch7_per6.shtml#773530 . You may be able to speed up your program considerably by using the new MATLAB JIT Accelerator.
A vectorized version of the same code is:
t = 0:.001:10;
y = cos(t);
It is important for you to do so whenever possible, since the “for” loops are not optimized
(in MATLAB 6.1 or lower) and could be very slow.
The second way to improve code execution time is to preallocate the arrays that store output
results. Here is an example to preallocate a row vector with 100 elements.
y = zeros(1,100);
Preallocation makes it unnecessary for MATLAB to resize an array each time you enlarge it,
and it also helps reduce memory fragmentation if you work with large matrices.
Another tip to speed up performance is to implement your code in a function rather than a
script. Every time a script is used in MATLAB, it is loaded into memory and evaluated one
line at a time. Functions, on the other hand, are compiled into pseudo-code and loaded into
memory all together, so that repeated calls to the function can be faster.
• Help
Type "help function_name" at the Matlab Command Window to get help on a specific
function. For a nicer interface to the help utility, click the question mark button at the top of
the Matlab Command Window as explained above. Or, from the Help menu item, choose
"Help Desk" to get to online HTML help.
From the Help Desk, you can search the online documentation to get help on how to
accomplish tasks for which you may not know the specific function names. You can also try
the “lookfor” command at Matlab Command Window.
• Saving/Loading Data
Matlab data can be stored into “.mat” files on your disk. To save your whole workspace in a
file called “filename.mat”, use "save filename".
To save one particular variable called “variable_name” into “filename.mat”, type
"save filename variable_name"
Saving will overwrite whatever filename you specify. To append instead of overwrite, use
"save filename variable_name -append".
Note that the saved files are in a special binary format, and hence unreadable by other
applications. You can use "save ... -ascii" to save your workspace as a text file. See "help
save" for more details.
To load a workspace from “filename.mat” file, we can type “load filename”. Again, you can
load specific variables using "load filename variable_name". See "help load" for more
details.
• Writing Matlab Programs
Procedures that you call repeatedly can be stored either as function or script.
o Scripts do not accept input arguments or return output arguments. They operate on
data in the workspace.
o Functions can accept input arguments and return output arguments. Internal variables
are local to the function.
Both of them can be created by writing “.m” files in your favorite text editor and storing
them in your working directory.
When you invoke a script, MATLAB simply executes the commands found in the file. Scripts
operate on existing data in your workspace, and they can create new data on which to
operate. Although scripts do not return output arguments, any variables that they create
remain in the workspace and can be used in the subsequent computations.
Functions are M-files that can accept input arguments and return output arguments. The
name of the M-file and of the function should be the same. Functions operate on variables
within their own workspace, separate from the main workspace you access at the MATLAB
Command Window. Here is an example function file, MSE2PSNR.m.
function PSNR=MSE2PSNR(MSE)                                   % (1)
% Convert mean square error (MSE) into peak signal to noise ratio (PSNR)
% Input: MSE
% Output: PSNR
%
% Author: Guan-Ming Su
% Date: 8/31/02                                               % (2)
A=255*255./MSE;
PSNR=10*log10(A);                                             % (3)
(1) A function consists of three parts. The first line of a function M-file starts with the keyword “function” and has to include a statement like:
function [out1, ... outN] = function_name( input1, input2, .., inputN )
The final values of the variables out1, ..., outN are automatically returned once the function execution is finished.
(2) The next several lines starting with “%”, up to the first blank or executable line, are comment lines that provide the HELP text. In other words, these lines are printed when you type “help MSE2PSNR” in the Matlab Command Window. In addition, the first line of the HELP text is the H1 line, which MATLAB displays when you use the “lookfor” command or request help on a directory.
(3) The rest of the file is the executable MATLAB code defining the function. The variable A introduced in the body of the function, as well as the variables MSE and PSNR on the first line of the function, are all local to the function; they are separate from any variables in the MATLAB workspace.
• Audio Representation and Playback
Matlab supports multi-channel wave formats with up to 16 bits per sample. To load a wave file, you can use “[Y, Fs, Nbits] = wavread(wave_filename)”, where wave_filename is the file name of the wave file, Y the sampled data with dimensions (number of samples) × (number of channels), Fs (in Hz) the sampling rate, and Nbits the number of bits per sample used to encode the data in the file. Amplitude values in the Y vector are normalized to the range [-1, +1] according to the formula Y = X / 2^(Nbits-1) - 1, where X is the original unsigned Nbits-bit integer expression. For instance, when X is 128 and Nbits = 8, Y is 0. To generate a wave file and store it on the hard disk, you can use “wavwrite(Y, Fs, Nbits, wave_filename)”. To play back the signal in vector Y, use “sound(Y, Fs, Nbits)”.
• Image Representation and Display
Many Matlab variables are matrices (arrays) with double-precision complex entries. A grayscale image is just an array with real entries ranging from 0 to 1. Many applications also use 1 byte (i.e., 256 representative levels, from 0 to 255) to represent a pixel value. This saves storage space and speeds up computation. We will use these two representations in our lab assignments.
To display a grayscale image, use "imshow( image_name, [Low High])", where the value
Low (and any value less than Low) will be displayed as black, the value High (and any value
greater than High) as white, and values in between as intermediate shades of gray. An image
stored in “uint8” (unsigned 8-bit integer) data type can be displayed directly by
“imshow(image_name)”. See “help imshow” for details.
If you are to display a transformed image, such as by DCT or DFT, you may want to zero out the DC value beforehand to make it easier to see the AC values, which are usually a few orders of magnitude smaller than the DC. You could also try viewing the log of the DCT or brightening the color map with “brighten”.
• Matlab tips
In addition to such basic commands as for, end, if, while, “;”, “==”, and “=”, you may find the following functions useful in this course:
o fft2 – a two-dimensional Fast Fourier Transform routine. Make sure to use this
command for 2-D images, not the one-dimensional transform “fft”! See the helps of
these two commands to understand the difference.
o Ctrl-C – stops execution of any command that went awry.
o clear all - Remove all variables, globals, functions and MEX links.
o close all - Close all the open figure windows
o max – For vectors, “max(X)” is the largest element in X. For matrices, it gives a
vector containing the maximum element from each column. To get the maximum
element of the entire matrix X, try “max(X(:))”. Other functions such as min, mean,
and median can be used in a similar manner.
o abs – “abs(X)” gives the absolute value of elements in X.
o Indexing – Use the colon operator and the “end” keyword to get at the entries of a
matrix easily. For example, to get every 5th element of a vector a, use “a(1:5:end)”.
See “help colon” and “help end”.
o strcat(S1,S2,S3,...) – Concatenate the corresponding rows of the character arrays S1,
S2, S3 etc.
o num2str(X) – Convert a matrix X into a string representation with about 4 digits and
an exponent if necessary. This is useful for labeling plots.
o flipud and fliplr - Flip matrix in up/down direction and flip matrix in left/right
direction
o tic and toc – “tic” starts a stopwatch timer and “toc” reads the stopwatch timer. It is
useful to monitor the executing time of your program.
o waitbar – This function can display the progress of your program when you use
loops, such as a for loop.
o pause – This function causes a procedure to stop and wait for the user to strike any
key before continuing.
o find – Find indices of nonzero elements.
o B = repmat(A,M,N): Replicate and tile the matrix A to produce the M-by-N block
matrix B. For example, let A=[1 2 3], the result of repmat(A,3,2) is
[1 2 3 1 2 3; 1 2 3 1 2 3; 1 2 3 1 2 3]
o Deleting rows and columns: You can delete a row or column by assigning an empty pair of square brackets to it. For example, with A=[1 2 3;4 5 6;7 8 9], we can delete the second column by “A(:,2)=[];”. The resulting matrix A is [1 3; 4 6; 7 9].
4. Examples on Digital Audio Processing
We use a few examples related to audio and image processing as warm-up exercises before we get
into the design lab section. Let’s start Matlab and type “edit” in the command window. You will see an “M-file Editor”. We will use this editor3 to write M-files.
• Read, Playback and Visualize an Audio Signal
(1) Download the “symphonic.wav” audio file from the course web site. Make sure it is in
your working directory.
(2) You can read an audio file into a matrix Y using the function wavread:
[Music, Fs, Nbits] = wavread('symphonic.wav');
(3) To obtain the dimensions of this audio matrix, type
[MusicLength, NumChannel] = size(Music);
The function size returns the number of samples and the number of channels of this audio file in MusicLength and NumChannel.
(4) To playback this audio vector, type
sound(Music, Fs, Nbits);
Make sure your speaker or earphone is on.
(5) We can visualize the waveform by typing:
Display_start = 1;
Display_end = MusicLength;
subplot(2,1,1); plot(Music(Display_start: Display_end,1));
title('First channel');
3 Please type the lines indicated by the symbol “ ” into the M-file Editor. You can name this file “MatlabReview.m” and execute it by typing MatlabReview in the Matlab command window.
subplot(2,1,2); plot(Music(Display_start: Display_end,2));
title('Second channel');
You can adjust the display range by changing Display_start and Display_end.
• Bits Manipulation
(1) Convert the double value format of Music to an unsigned integer expression with Nbits
bits
IntMusic = (Music+1)*power(2,Nbits-1);
(2) Many music files use 16 bits to represent an audio sample. In this case, NBits = 16. We
can extract the lower byte from the first channel:
LowNbits = Nbits/2;
LowIntMusicCh1 = zeros(MusicLength,1);
FirstChannel = IntMusic(:,1);        % extract the first channel
for ibit = 1:1:LowNbits
    LowIntMusicCh1 = LowIntMusicCh1 + bitget(FirstChannel, ibit)*power(2,ibit-1);
end
(3) Convert unsigned integer to the normalized expression and listen to it.
LowRecMusicCh1 = LowIntMusicCh1/power(2,LowNbits-1) - 1;
sound(LowRecMusicCh1,Fs, LowNbits);
(4) Repeat the procedure for the second channel and store the final result in
LowRecMusicCh2. What do you hear?
• Window-based Audio Processing
(1) Many audio/speech processing techniques divide the whole audio samples into segments
before performing further processing, as depicted in the following figure:
[Figure: a sliding window extracts overlapping segments from the data stream. In this example the window length is 6 and the overlap is 2: the first segment covers samples 1-6, the second segment starts at sample 5, and so on.]
As we can see, a sliding window is used to extract a set of data from the original data
stream and the adjacent windows have some overlap with each other. For the example
in the above figure, the window size is 6 and the overlapping size is 2. The first six
samples are extracted as the first segment, and the start point of the second segment is
the 5th sample.
(2) In this section, we segment the lower bytes of the second channel, LowRecMusicCh2,
with window size 215 and zero overlapping. Then re-order samples in each window
using “flipud”.
LenWindow = power(2,15);                  % Length of window
LenOverlap = 0;                           % Length of overlapping
LenNonOverlap = LenWindow - LenOverlap;
NumSegCh2 = floor((length(LowRecMusicCh2)-LenWindow)/LenNonOverlap) + 1;
for iseg = 1:1:NumSegCh2
    seg_start = (iseg-1)*LenNonOverlap + 1;       % start point of current window
    seg_end = seg_start + LenWindow - 1;          % end point of current window
    LowRecMusicCh2(seg_start:seg_end) = ...
        flipud(LowRecMusicCh2(seg_start:seg_end));
end
sound(LowRecMusicCh2, Fs, LowNbits);
Can you hear the “secrets”?
5. Examples on Digital Image Processing
• Read and Display an Image File
(1) Download the “CuteBaboon.bmp” image file from the course web site. Make sure it is
in your working directory.
(2) You can read an image file into an Im matrix using the function imread :
Im=imread('CuteBaboon.bmp');
(3) To obtain the dimension of this image, type
[height, width]=size(Im);
The function size will return the dimension of this 2-D matrix, Im, into height and width.
(4) To display this image, type
imshow(Im,[0 255]);
Since each pixel in this image has 8 bits, the gray level range is between 0 and 255. We
specify this range as the second argument in the imshow function.
• Bit Planes
As we have mentioned, each pixel is represented using 8 bits. We can put together the most
significant bit (MSB) of each pixel and form the first bit plane, as shown in the following
figure. Similarly, we can extract the second MSB of each pixel to form the second bit plane,
and so forth.
[Figure: a Height×Width image with 8 bits per pixel. Writing each pixel value in binary, the MSBs of all pixels form the first bit plane and the second MSBs form the second bit plane. Ex. 131 (decimal) = 10000011 (binary).]
(1) Extract the MSB bit plane using Matlab's built-in function bitget:
msbIm=bitget(Im,8);
(2) Display this bit plane
imshow(msbIm, [0 1]);
Note: Since each bit is represented by 0 or 1, the value range of a bit plane is [0 1].
(3) Add your own code here to observe the other bit planes; a possible sketch follows. As we
go from the MSB to the LSB (least significant bit), you will find that the bit planes
become more noise-like.
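A minimal sketch, assuming you want all eight planes side by side in one figure window:
% Sketch: display all eight bit planes, from MSB (plane 1) to LSB (plane 8)
for ibit = 8:-1:1
    subplot(2,4,9-ibit);
    imshow(bitget(Im,ibit), [0 1]);
    title(sprintf('Bit plane %d', 9-ibit));
end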
• Image Rotation
We can rotate an image by treating it as a 2-D matrix and using Matlab's built-in matrix
operation functions. The following table shows the relation between Matlab commands and
the resulting rotations (the original figure showed the rotated images themselves):

Im                    original image
fliplr(Im')           rotated 90° clockwise
flipud(Im')           rotated 90° counterclockwise
flipud(fliplr(Im))    rotated 180°
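You can check these relations yourself on a small matrix; a quick sketch:
% Quick check of the rotation identities on a 2x2 matrix
A = [1 2; 3 4];
fliplr(A')           % 90 degrees clockwise:        [3 1; 4 2]
flipud(A')           % 90 degrees counterclockwise: [2 4; 1 3]
flipud(fliplr(A))    % 180 degrees:                 [4 3; 2 1]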
Let’s try image rotation on the 2nd LSB bit plane of Im.
(1) Extract 2nd LSB bit plane of Im.
lsb2Im=bitget(Im,2);
imshow(lsb2Im, [0 1]);
(2) Rotate each 16x16 block.
bs=16;
for i=1:1:height/bs
    for j=1:1:width/bs
        lsb2Im((i-1)*bs+1:1:i*bs, (j-1)*bs+1:1:j*bs) = ...
            flipud(lsb2Im((i-1)*bs+1:1:i*bs, (j-1)*bs+1:1:j*bs)');
    end
end
(3) Display your result and see what “secret” message you can discover.
imshow(lsb2Im, [0 1]);
• Image Down-Sampling
There are many ways to down-sample an image to reduce its size. The basic idea is to use
one pixel to represent a small area, usually a square.
(1) Use the top-left pixel of each square area to represent that area:
sIm=Im(1:8:end,1:8:end);
imshow(sIm,[0 255]);
(2) Take the average of each square area. Here we use a function call.
meansIm=mean_subfunction(Im);
You can create this subfunction by File | New, name it mean_subfunction.m, and type
in the following Matlab code:
function Y=mean_subfunction(X)
% Matlab Review
[row, col]=size(X);
Y=zeros(row/8, col/8);
for i=1:1:row/8
    for j=1:1:col/8
        Y(i,j)=mean2(X((i-1)*8+1:1:i*8, (j-1)*8+1:1:j*8));
    end
end
Note: you can replace the "% Matlab Review" comment with your own description of the purpose and usage of this function; once the subfunction is saved, typing "help mean_subfunction" in the Matlab Command Window will display it.
Observe this downsampled image.
imshow(meansIm,[0 255]);
(3) Compare the results of (1) and (2). Which is the better way to down-sample?
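As an aside, if the Image Processing Toolbox is available, the block averaging can also be written without explicit loops using blkproc; a sketch, assuming your Matlab version accepts the function-name string form:
% Sketch: loop-free 8x8 block averaging with the Image Processing Toolbox
meansIm2 = blkproc(double(Im), [8 8], 'mean2');
imshow(meansIm2, [0 255]);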
• Histogram
A histogram of a grayscale image shows the luminance distribution of the image: how many
pixels (or what percentage of pixels) the image has at each gray level. Although the Matlab
Image Processing Toolbox already provides histogram functions, we write our own version
here to practice Matlab.
(1) find function
find is a very powerful function that can make your Matlab program elegant if used
properly.
I = find(X) returns the indices of the non-zero elements of the vector X.
(2) length function
length returns the number of elements in a vector.
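For example (a toy vector of our own choosing):
x = [0 3 0 5 0];
idx = find(x)        % returns [2 4], the indices of the non-zero elements
n = length(idx)      % returns 2, the count of non-zero elements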
(3) We can use find and length together to obtain the number of pixels that have a
specific luminance value igray:
histogram=zeros(1,256);
for igray=0:1:255
histogram(igray+1)=length(find(Im==igray));
end
plot(histogram);
axis([0 255 min(histogram) max(histogram)]);
title('Histogram');
xlabel('Luminance value');
ylabel('Number of pixels');
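As a sanity check (a sketch, assuming the Image Processing Toolbox is installed), our counts should match the toolbox's imhist exactly:
% Compare our histogram with the toolbox's imhist
counts = imhist(Im, 256);            % 256x1 column of bin counts
max(abs(counts - histogram'))        % should print 0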
6. Assignments
(1) Audio Steganography
The word “steganography” comes from Greek and means “secret writing”. Here we apply
steganography to audio to hide a secret and inaudible message in a music file.
[Figure: stego-ed audio samples y1, y2, ..., y9, each drawn from MSB down to LSB. The LSBs of each group of eight consecutive host samples carry the eight bits of one hidden sample zk; the first sample of the group carries the LSB of zk, as the formula below shows.]
As illustrated in the above figure, a secret audio message {zk} is put in the least significant
bits of samples from a host music signal to produce a stego-ed audio, {yi}. We can extract
the hidden message {zk} from {yi} according to the following formula:
z_k = \sum_{i=8(k-1)+1}^{8k} \mathrm{LSB}(y_i) \cdot 2^{i-8(k-1)-1}, \qquad k = 1, \ldots, N/8,

where LSB(y_i) represents the operator that extracts the least significant bit from sample
y_i, and N is the number of samples in {y_i}.
Download “guitar.wav” from the course web site and write a simple M-file to extract the hidden
message {zk}.
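As a toy illustration of the extraction formula (not the assignment itself; the sample values below are made up), this sketch recovers z_1 from eight integer samples:
% Toy sketch: recover z1 from eight hypothetical integer samples (k = 1)
y = [40001 40002 40003 40004 40005 40006 40007 40008];   % made-up 16-bit values
z1 = 0;
for i = 1:1:8
    z1 = z1 + bitget(y(i), 1) * power(2, i-1);   % LSB(y_i) * 2^(i-1)
end
z1    % prints 85 for these made-up values
For the real file, remember that the samples you read are normalized doubles, so first convert them to unsigned integers as in the Bit Manipulation section above.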
(2) Image Up-Sampling (Enlargement)
The simplest way to implement image up-sampling is replication: to up-sample an M×N
image to 2M×2N, each original pixel is expanded into four pixels of the same value,
arranged in a 2×2 block. Here is an example:
1 2 
A=

3 4 
1
1
LA = 
3

3
1 2 2
1 2 2 
3 4 4

3 4 4
Download the “Girl.bmp” image file from the course web site and write a simple M-file to enlarge
it without using any “for” loops.
Hint: you may find Matlab's built-in Kronecker product function kron useful. If A and B
are M1×M2 and N1×N2 matrices, respectively, then their Kronecker product is defined as

A \otimes B \equiv \{a(m,n)B\} =
\begin{bmatrix} a(1,1)B & \cdots & a(1,M_2)B \\ \vdots & \ddots & \vdots \\ a(M_1,1)B & \cdots & a(M_1,M_2)B \end{bmatrix}
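As a quick illustration of the definition (deliberately not the replication case; picking the right second factor for the assignment is left to you):
% Kronecker product illustration: each entry a(m,n) becomes the block a(m,n)*B
A = [1 2; 3 4];
B = eye(2);
kron(A, B)    % = [1 0 2 0; 0 1 0 2; 3 0 4 0; 0 3 0 4]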
(3) Erroneously Indexed Image
The image “Noise.bmp” is an erroneously indexed image, caused by unknown reasons.
The only thing we know so far is the relation between the original values and the erroneously
indexed values. If we represent the original luminance (8 bits) in binary form as

oriL = a_7 \cdot 2^7 + a_6 \cdot 2^6 + a_5 \cdot 2^5 + a_4 \cdot 2^4 + a_3 \cdot 2^3 + a_2 \cdot 2^2 + a_1 \cdot 2^1 + a_0 \cdot 2^0,

then the erroneously indexed value can be expressed as

errL = a_0 \cdot 2^7 + a_1 \cdot 2^6 + a_2 \cdot 2^5 + a_3 \cdot 2^4 + a_4 \cdot 2^3 + a_5 \cdot 2^2 + a_6 \cdot 2^1 + a_7 \cdot 2^0.

Download the “Noise.bmp” image file and recover it.
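A minimal sketch of the bit-reversal relation on a single value (applying it efficiently to every pixel of the image is the actual exercise):
% Reverse the 8 bits of one value: 131 = 10000011b  ->  11000001b = 193
errL = 131;
oriL = 0;
for ibit = 1:1:8
    oriL = bitset(oriL, ibit, bitget(errL, 9-ibit));  % bit i of oriL = bit 9-i of errL
end
oriL    % prints 193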