ELEC 484 Final Project Report
Phase Vocoder – Design & Analysis
Colter McQuay – V00168058
Table of Contents
1 Introduction
  1.1 The Hanning Window Function
  1.2 The Windowing Process
2 The Phase Vocoder Implementation
  2.1 Phase Vocoder Analysis Module
    2.1.1 Analysis Code Description
  2.2 Phase Vocoder Synthesis Module
    2.2.2 Synthesis Code Description
3 Effects
  3.1 Time Stretching
  3.2 Pitch Shifting
  3.3 Robotization
  3.4 Whisperization
  3.5 Stable & Transient component separation
  3.6 Signal Filtering
  3.7 Denoising
  3.8 Audio Compression
    3.8.1 Thresholding frequency data
    3.8.2 Extracting largest frequency components
4 Additional Figures and Graphs
  4.1 Project Part B – Phase Vocoder Testing with Cosines
  4.2 Part C – Amplitudes in time/frequency plane
    4.2.1 Results Interpretation
  4.3 Part D – Phase variations in time/frequency plane
    4.3.1 Results Interpretation
  4.4 Effect of Cyclic Shift
    4.4.1 Sinusoid
    4.4.2 Window Centered Impulse
5 Recommendations
6 Conclusion
Abstract
Phase vocoders are extremely useful tools in audio signal processing. They allow fast and efficient
time-frequency processing of audio signals by taking the Fast Fourier Transform (FFT) of windowed
segments of the audio signal. Many audio effects can be implemented with the phase vocoder by
modifying the magnitude or phase data associated with these windowed segments.
This paper describes the implementation and functionality of a phase vocoder written in Matlab.
The phase vocoder itself is explained in detail, along with some of the effects that were achieved
with it. The author assumes that the reader has a basic knowledge of the Fast Fourier
Transform and of other basic signal processing operations and terminology.
1 Introduction
The phase vocoder is a widely used audio signal processing tool which uses the Fast Fourier
Transform (FFT) algorithm to decompose short segments of an audio signal into their frequency
content, computing one N-point FFT per segment. An N-point FFT produces a discretized
frequency-domain representation of a signal with a frequency resolution fr, where:
$$f_r = \frac{F_s}{N}$$
in which Fs is the sampling frequency and N is the number of points used to compute the FFT of
the signal. In the phase vocoder developed and described in this paper, small ‘chunks’ of the input
signal are taken and used to compute a series of time-varying frequency-domain representations of
the input signal. A window function is applied to take each chunk; this minimizes the distortion caused
by cutting the signal into pieces and improves the accuracy of the signal reconstruction.
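As a worked example of the formula above, the short Python/NumPy sketch below computes the resolution for the three window sizes used later in this report (512, 1024 and 2048 samples). The 44.1 kHz sampling rate is an assumption for illustration; the report does not state the rate of its test files.

```python
import numpy as np

def frequency_resolution(fs, n):
    """Spacing in Hz between adjacent FFT bins: fr = Fs / N."""
    return fs / n

def bin_frequencies(fs, n):
    """Centre frequency in Hz of each of the N bins: k * Fs / N for k = 0..N-1."""
    return np.arange(n) * frequency_resolution(fs, n)

fs = 44100  # assumed sampling rate (CD audio)
for n in (512, 1024, 2048):
    print(n, round(frequency_resolution(fs, n), 2))
# 512 86.13
# 1024 43.07
# 2048 21.53
```

Longer windows give finer frequency resolution at the cost of poorer time resolution, which is one reason the window size differs between the effects in section 3.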
1.1 The Hanning Window Function
The phase vocoder described in this paper uses the Hanning window to window each chunk of the input
signal. The Hanning window (Figure 1) used to window the input signal is given by the following
equation:
$$y = \frac{1}{2}\left(1 - \cos\left(\frac{2\pi \cdot (0:\mathrm{size}-1)}{\mathrm{size}}\right)\right)$$
where (0:size-1) is Matlab notation for the vector [0, 1, ..., size-1]. The expression therefore creates
a vector of window coefficients that can be multiplied element by element with another vector of the
same size (i.e. a chunk of the input signal). The Hanning window is plotted in Figure 1 below:
Figure 1 - Hanning window used to window input signal
The Hanning window is used to window the signal so that overlapping windowed segments of the input
signal can later be overlapped and added back together to reproduce the output signal without any
variations in amplitude. This works because when each window is overlapped with half of the next
window (i.e. a hop size of half the window size), the windows add up to exactly 1 in the region of
overlap. This is exactly what we want, because when reconstructing the signal we want a gain of unity.
The Hanning window is also used because it is a smooth, tapered alternative to a rectangular window.
The abrupt edges of a rectangular (truncating) window create discontinuities at the window boundaries,
which spread energy into spurious frequency components and produce more harmonics and distortion.
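The overlap-add property can be checked numerically. The NumPy sketch below is written for illustration and assumes the standard periodic Hanning window, 0.5*(1 - cos(2*pi*n/size)), i.e. dividing by size rather than size-1. It overlap-adds ten 200-sample windows spaced 100 samples apart and confirms that the sum is 1 wherever two windows overlap:

```python
import numpy as np

def hann_periodic(size):
    """Periodic Hanning window, 0.5*(1 - cos(2*pi*n/size)) for n = 0..size-1."""
    n = np.arange(size)
    return 0.5 * (1 - np.cos(2 * np.pi * n / size))

def overlap_sum(window, hop, num_windows):
    """Overlap-add copies of `window` spaced `hop` samples apart."""
    total = np.zeros(hop * (num_windows - 1) + len(window))
    for i in range(num_windows):
        total[i * hop:i * hop + len(window)] += window
    return total

w = hann_periodic(200)
s = overlap_sum(w, hop=100, num_windows=10)
# Away from the first and last half-window the sum is exactly 1.
print(np.allclose(s[100:-100], 1.0))   # True
```

Only the first and last half-windows, where a single window contributes, fall short of unity gain; this is why the edges of the signal need special handling (see the input padding in section 2.1.1).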
1.2 The Windowing Process
The input signal is ‘windowed’ by taking a chunk of the signal which is the same size as our generated
window coefficient vector (see section 1.1 The Hanning Window Function) and multiplying the two vectors
element by element. The FFT of the resulting vector is computed, and its magnitude and phase are placed
into two arrays which hold all of the time-frequency data of the windowed signal chunks. The next
chunk is then windowed by moving the window down the signal by a certain number of samples,
known as the analysis hop size. This analysis hop size, which we will denote anHop, can be defined
by the user to achieve many different effects, as we will see later in this paper. The windowing process is
shown below (Figure 2 – Windowing and overlap adding of signal) with the following parameters:
Window size = 200 samples
anHop = 100 samples
Figure 2 – Windowing and overlap adding of signal
The above diagram shows the step-by-step process of windowing the input signal (blue). Each window is
applied and then the index is advanced by anHop. The windows used are shown dashed in blue and the
resulting windowed input in red. The output signal is produced by overlapping and adding the
windowed chunks (i.e. adding the red chunks together as they appear in the figure).
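The windowing and overlap-add steps of Figure 2 can be sketched in a few lines of NumPy using the same parameters (window size 200, anHop 100). This is an illustrative translation, not the report's Matlab code, and it assumes a periodic Hanning window. Because no processing happens between the two steps, the output equals the input wherever two windows overlap:

```python
import numpy as np

def hann_periodic(size):
    return 0.5 * (1 - np.cos(2 * np.pi * np.arange(size) / size))

def window_and_overlap_add(x, window_size=200, an_hop=100):
    """Window successive chunks of x (the red chunks in Figure 2) and overlap-add them."""
    w = hann_periodic(window_size)
    num_windows = (len(x) - window_size) // an_hop + 1
    y = np.zeros(len(x))
    for i in range(num_windows):
        start = i * an_hop
        chunk = x[start:start + window_size] * w   # windowed chunk
        y[start:start + window_size] += chunk      # overlap-add
    return y

rng = np.random.default_rng(0)
x = rng.standard_normal(1000)
y = window_and_overlap_add(x)
# Inside the region covered by two windows, the output reproduces the input.
print(np.allclose(y[100:900], x[100:900]))   # True
```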
Now that the basics of signal windowing have been explained, the phase vocoder structure and modules
for analyzing and synthesizing all of these windowed chunks will be introduced and explained.
2 The Phase Vocoder Implementation
The phase vocoder implemented and discussed in this paper has the structure shown in Figure 3 below.
Figure 3 - Phase Vocoder Structure
This structure gives the programmer greater flexibility when coding new audio effects based on this
phase vocoder. As the figure shows, the analysis and synthesis modules do not change; only the
effect-specific code that alters the phase and magnitude data changes from one effect to another. This
allows many effects to be coded with the phase vocoder with minimal effort, and the structure could
easily be abstracted into a full effect plug-in.
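To make the structure concrete, here is a minimal NumPy sketch of Figure 3. The function names (analyze, synthesize, run_effect) are hypothetical, and for brevity the synthesis step assumes equal analysis and synthesis hops, so no phase unwrapping is needed (section 2.2 covers the general case). An effect is simply a function that maps (magnitude, phase) to a new (magnitude, phase):

```python
import numpy as np

def hann_periodic(size):
    return 0.5 * (1 - np.cos(2 * np.pi * np.arange(size) / size))

def analyze(x, window_size, hop):
    """Analysis module: magnitude and phase of every windowed, cyclically shifted chunk."""
    w = hann_periodic(window_size)
    starts = range(0, len(x) - window_size + 1, hop)
    spectra = np.array([np.fft.fft(np.fft.fftshift(x[s:s + window_size] * w)) for s in starts])
    return np.abs(spectra), np.angle(spectra)

def synthesize(mag, phase, hop):
    """Synthesis module for equal hops: inverse FFT, undo the shift, overlap-add."""
    num_windows, window_size = mag.shape
    y = np.zeros(hop * (num_windows - 1) + window_size)
    for i in range(num_windows):
        frame = np.fft.fftshift(np.real(np.fft.ifft(mag[i] * np.exp(1j * phase[i]))))
        y[i * hop:i * hop + window_size] += frame
    return y

def run_effect(x, effect, window_size=512, hop=256):
    """Analysis and synthesis never change; only `effect` does."""
    mag, phase = analyze(x, window_size, hop)
    mag, phase = effect(mag, phase)
    return synthesize(mag, phase, hop)

def identity(mag, phase):
    return mag, phase

def robot(mag, phase):
    return mag, np.zeros_like(phase)   # zero every phase (see section 3.3)

x = np.sin(2 * np.pi * 440 * np.arange(8192) / 44100)
y = run_effect(x, identity)
print(np.allclose(y[256:-256], x[256:-256]))   # True: unchanged data reconstructs the input
y_robot = run_effect(x, robot)
```

Swapping identity for robot changes the effect without touching the analysis or synthesis code, which is the flexibility described above.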
2.1 Phase Vocoder Analysis Module
The analysis module is used to transfer the windowed time-domain chunks into the frequency domain
by calculating the FFT of each chunk and returning the magnitude and phase information associated
with its frequency-domain representation. See Figure 4 for the code associated with the analysis
module.
function [mag,phase] = pvAnalyze(x,windowSize,anHop)
%pvAnalyze function analyzes the input signal by taking windowed chunks of
%the input and doing an fft of each chunk and placing it into an array.
%This allows for the phase and magnitude of each chunk to be separated and
%modified before being resynthesized back into an output signal.

% Set up our internal variables and pre-allocate memory
xSize=length(x);                         % Get the input signal's length
window=makeWindow(windowSize);           % Window vector used to window input
numWindows=ceil(xSize/anHop);            % Get the number of windows we need
y=zeros(1,numWindows*anHop+windowSize);  % Used only to size the padded input
Z=zeros(numWindows,windowSize);          % Pre-allocate the time-frequency array

% Pad the input signal (a row vector) with zeros if our windowing will be
% larger than the input so that we can avoid discontinuities
if(length(y)>length(x))
    x=[x zeros(1,length(y)-length(x))];
end

% Set up the start index for taking fft's of input
n_start=1;
for i = 1:numWindows
    % Advance the end index by the length of our window
    n_end=n_start+windowSize-1;
    % Take the windowed fft and place in array
    Z(i,:)=fft(fftshift(x(n_start:n_end).*window));
    % Advance the starting index by one hop
    n_start=n_start+anHop;
end

% Threshold the array of complex numbers to get rid of the very small
% numbers which produce unusable phase results
Z=threshold(Z,1e-4);
mag=abs(Z);      % Return the magnitude
phase=angle(Z);  % Return the phase
end
Figure 4 - Phase Vocoder Analysis Module Code
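For readers more familiar with Python, the following NumPy sketch mirrors pvAnalyze step by step. It is an illustrative translation, not the author's code. Two helpers are assumptions because their Matlab source is not shown in this report: make_window is taken to return a periodic Hanning window, and threshold is taken to zero every bin whose real and imaginary parts are both below the tolerance (as described under Threshold Time-Frequency Data below).

```python
import numpy as np

def make_window(size):
    # Assumed equivalent of makeWindow: periodic Hanning window
    return 0.5 * (1 - np.cos(2 * np.pi * np.arange(size) / size))

def threshold(z, tol):
    # Assumed behaviour of threshold(): zero every bin whose real and
    # imaginary parts are both smaller than tol in magnitude
    z = z.copy()
    z[(np.abs(z.real) < tol) & (np.abs(z.imag) < tol)] = 0
    return z

def pv_analyze(x, window_size, an_hop):
    """Sketch of pvAnalyze: returns (mag, phase), one row per window."""
    x = np.asarray(x, dtype=float)
    window = make_window(window_size)
    num_windows = int(np.ceil(len(x) / an_hop))
    padded_len = num_windows * an_hop + window_size
    x = np.concatenate([x, np.zeros(padded_len - len(x))])   # input padding
    z = np.zeros((num_windows, window_size), dtype=complex)
    for i in range(num_windows):
        start = i * an_hop
        chunk = x[start:start + window_size] * window          # windowing
        z[i] = np.fft.fft(np.fft.fftshift(chunk))              # cyclic shift + FFT
    z = threshold(z, 1e-4)
    return np.abs(z), np.angle(z)

x = np.random.default_rng(1).standard_normal(5000)
mag, phase = pv_analyze(x, window_size=1024, an_hop=256)
print(mag.shape)   # (20, 1024): ceil(5000/256) = 20 windows of 1024 bins
```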
2.1.1 Analysis Code Description
Variable Description
- x is the time-domain input signal (a mono row vector is expected)
- windowSize is the size of the window used to analyze the signal
- anHop is the analysis hop size, i.e. the number of samples by which successive windows are spaced
Memory Pre-allocation
The code starts off by pre-allocating memory for the arrays and variables that will be used. This is done
to speed up the processing time by avoiding dynamic memory allocation which is expensive when
processing large amounts of data.
Input Padding
The next ‘if’ statement pads the input signal with zeros so that we can use an integer number of
windows to avoid partial windowing and discontinuities.
Windowing, Cyclic shift & FFT operation
The ‘for’ loop iterates through our input signal and takes the FFT of our windowed chunks of input,
which are spaced apart by a number of samples specified by the anHop variable passed to the function.
The following line deserves a closer look, since it does several things at once:
Z(i,:)=fft(fftshift(x(n_start:n_end).*window));
Let’s break this line of code up and look at what it is actually doing, starting with the inner
part:
x(n_start:n_end).*window
This takes a chunk of the input signal and multiplies it element by element by the window
coefficients, thereby windowing the input signal (see section 1.2 The Windowing Process for more details
on this process).
The next part of code which might not make much sense at this point is:
fftshift(x(n_start:n_end).*window)
This part of the code uses a function called fftshift. This function is used to do what is called a Cyclic
Shift on the windowed signal. Some additional background information is needed about the FFT
operation before explaining the importance and use of the cyclic shift operation. The explanation of this
function and its importance is discussed in Appendix A.
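Appendix A explains the cyclic shift in detail; its core effect can be shown with a short NumPy sketch (an illustration, not the report's code). An impulse at the centre of a 16-sample frame has a spectrum whose phase alternates between 0 and π, a linear phase ramp. After fftshift moves the frame centre to sample 0, the spectrum becomes real, i.e. zero phase. The same holds for any frame that is symmetric about its centre, such as the window itself. For an even frame length fftshift is also its own inverse, which is what allows the synthesis module to undo it with a second fftshift:

```python
import numpy as np

N = 16
frame = np.zeros(N)
frame[N // 2] = 1.0                               # impulse at the window centre

plain = np.fft.fft(frame)                         # no cyclic shift
shifted = np.fft.fft(np.fft.fftshift(frame))      # cyclic shift first, as in pvAnalyze

# Without the shift, X[k] = (-1)^k: the phase alternates between 0 and pi.
print(np.allclose(plain, (-1.0) ** np.arange(N)))            # True
# With the shift the impulse sits at sample 0, so X[k] = 1: zero phase.
print(np.allclose(shifted, np.ones(N)))                      # True

# A window symmetric about its centre also gets a purely real spectrum.
w = 0.5 * (1 - np.cos(2 * np.pi * np.arange(N) / N))
print(np.allclose(np.fft.fft(np.fft.fftshift(w)).imag, 0))   # True

# For even N, applying fftshift twice restores the original frame.
print(np.array_equal(np.fft.fftshift(np.fft.fftshift(w)), w))  # True
```

For an odd frame length this last property fails, and NumPy's np.fft.ifftshift (or Matlab's ifftshift) would be needed to undo the shift.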
The next part of this code:
Z(i,:)=fft(fftshift(x(n_start:n_end).*window));
takes the FFT of the now cyclically shifted signal chunk and stores the complex-valued result as row i of
the array Z for later manipulation.
Threshold Time-Frequency Data
Z=threshold(Z,1e-4);
The complex-valued FFT results are then passed through a threshold function, which iterates through each
frequency bin of each FFT window and sets every bin below a certain threshold to zero. This is done
because a bin containing a very small complex number (both real and imaginary parts tiny, e.g. at the
level of numerical round-off) has little or no effect on the magnitude results, but its phase, which is set
by the signs and ratio of two essentially meaningless values, can lie anywhere between -π and π. Left in
place, such spurious phases would have a large effect on the phase results.
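A small NumPy illustration of why this matters. The threshold function below is a hypothetical stand-in, since the report's threshold code is not listed. Two round-off-sized numbers of equal size but different signs have very different phases, and a bin-centred cosine leaves such round-off in every bin except its own two:

```python
import numpy as np

# Equal-sized round-off values with different signs: very different phases.
print(np.angle(1e-17 + 1e-17j), np.angle(-1e-17 + 1e-17j))   # pi/4 and 3*pi/4

def threshold(z, tol):
    """Hypothetical stand-in for threshold(): zero bins whose real AND imaginary parts are below tol."""
    z = z.copy()
    small = (np.abs(z.real) < tol) & (np.abs(z.imag) < tol)
    z[small] = 0
    return z

N = 64
x = np.cos(2 * np.pi * 4 * np.arange(N) / N)   # bin-centred cosine: energy only in bins 4 and 60
X = np.fft.fft(x)
Xt = threshold(X, 1e-4)

print(np.flatnonzero(Xt))   # only bins 4 and 60 survive
print(np.angle(Xt[10]))     # 0.0: an emptied bin now has a well-defined zero phase
```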
Generate Phase & Magnitude Data
mag=abs(Z);      % Return the magnitude
phase=angle(Z);  % Return the phase
Now that the FFT windows have been thresholded, the magnitude and phase data are returned to
the calling program.
The phase vocoder synthesis module will now be discussed so that our analyzed results can be used to
generate an output signal.
2.2 Phase Vocoder Synthesis Module
The synthesis module is used to take the time-frequency data of our analyzed and modified signal and
synthesize an output signal. There are quite a few general steps that have to be taken which I will
describe and explain after introducing the code itself. Please refer to ## for the fully documented code
for this synthesis module.
function [y] = pvSynthesize(mags,phases,synthHop,anHop,taper)
% pvSynthesize is used to synthesize a signal given the magnitude and
% phase of the signal as well as the synthesis and analysis hop sizes

% Set up our internal variables and pre-allocate memory
[numWindows, windowSize]=size(mags);
numBins=windowSize;
window=makeWindow(windowSize);
y=zeros(1,numWindows*synthHop+windowSize);   % Pre-allocate our output signal
synthPhase=zeros(numWindows,numBins);

%% PHASE UNWRAPPING SECTION
% This section is responsible for unwrapping the phase.
% The nominal phase increment (per sample) for each bin is calculated by
% dividing the unit circle by the number of bins and then multiplying
% by each bin's number, i.e. omega_k = 2*pi*k/N
omega_k=2*pi*(0:numBins-1)./numBins;
% Pre-allocate our phase_increment array
phase_increment=zeros(numWindows-1,numBins);
% We want the first phase of both of our signals to be the same
synthPhase(1,:)=phases(1,:);
% Iterate through all of our windows (except the first) and calculate
% the phase increment for the analysis hop, then take the preceding
% synthesis phase and add the phase increment found for the analysis
% hop size, scaled to the synthesis hop size
for i=2:numWindows
    % The target phase for each bin in this window is found from the
    % old phase, assuming the phase advance of a perfectly bin-centered
    % cosine, i.e. the phase slope omega_k of each bin multiplied by the
    % number of samples between windows (anHop)
    target_phase=phases(i-1,:)+omega_k*anHop;
    % The deviation phase is the difference between the actual phase of
    % this bin in this window and its target phase. The principal argument
    % is taken to place the deviation between -pi and pi
    deviation_phase=princarg(phases(i,:)-target_phase);
    % Add the deviation to the ideal phase ramp for this bin (i.e. that of
    % a bin-centered cosine) and divide by anHop to get a per-sample increment
    phase_increment(i-1,:)=(omega_k.*anHop+deviation_phase)./anHop;
    % Create the new synthesis phase from the previous synthesis phase,
    % advanced by the per-sample increment times the synthesis hop size
    synthPhase(i,:)=princarg(synthPhase(i-1,:)+ ...
        phase_increment(i-1,:)*synthHop);
end
% Create our new complex output array which we will need to take the
% ifft of
Z=mags.*exp(1j*synthPhase);

%% OVERLAP & ADD SECTION
% This section takes care of the overlap and adding of the new
% synthesized signal. It does this by taking the ifft of each
% synthesized window and overlap-adding it with the current output signal.
n_start=1;
for i = 1:numWindows
    n_end=n_start+windowSize-1;
    % IFFT of this window, then undo the cyclic shift from the analysis
    tmpIfft=fftshift(real(ifft(Z(i,:))));
    % Optional window tapering to avoid phase discontinuities at the ends
    % of windows, then overlap-add with the previous windows
    if(taper==0)
        y(n_start:n_end)=y(n_start:n_end)+tmpIfft;
    else
        y(n_start:n_end)=y(n_start:n_end)+tmpIfft.*window;
    end
    n_start=n_start+synthHop;
end

% Truncate silence at the end of the output generated by the padding of
% the input signal in the analysis function
for i=length(y):-1:1
    if(y(i)~=0)
        outputEnd=i;
        break;
    end
end
y=y(1:outputEnd);
% Normalize output signal
y=y/max(abs(y));
end
Figure 5 - Synthesis Module Code
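As with the analysis module, a NumPy sketch of pvSynthesize may help readers follow the description below. It is an illustrative translation, not the report's code. make_window again assumes a periodic Hanning window, and pv_analyze is a compact analysis step (without thresholding) included only so that the example runs on its own. The usage lines at the end stretch a 440 Hz tone by a factor of 2 (synthHop = 2 × anHop):

```python
import numpy as np

def make_window(size):
    # Assumed to match makeWindow: periodic Hanning window
    return 0.5 * (1 - np.cos(2 * np.pi * np.arange(size) / size))

def princarg(phase_in):
    # Wrap phases into (-pi, pi]
    return np.mod(phase_in + np.pi, -2 * np.pi) + np.pi

def pv_analyze(x, window_size, an_hop):
    # Compact analysis (no thresholding): one FFT row per window
    window = make_window(window_size)
    num_windows = int(np.ceil(len(x) / an_hop))
    x = np.concatenate([x, np.zeros(num_windows * an_hop + window_size - len(x))])
    frames = [np.fft.fft(np.fft.fftshift(x[i * an_hop:i * an_hop + window_size] * window))
              for i in range(num_windows)]
    return np.abs(frames), np.angle(frames)

def pv_synthesize(mags, phases, synth_hop, an_hop, taper=True):
    """Sketch of pvSynthesize: rows are windows, columns are frequency bins."""
    num_windows, window_size = mags.shape
    window = make_window(window_size)
    omega_k = 2 * np.pi * np.arange(window_size) / window_size   # nominal rad/sample per bin

    # Phase unwrapping: estimate each bin's true phase increment per sample,
    # then advance the synthesis phase by that increment times synth_hop.
    synth_phase = np.zeros_like(phases)
    synth_phase[0] = phases[0]
    for i in range(1, num_windows):
        target_phase = phases[i - 1] + omega_k * an_hop
        deviation = princarg(phases[i] - target_phase)
        increment = (omega_k * an_hop + deviation) / an_hop
        synth_phase[i] = princarg(synth_phase[i - 1] + increment * synth_hop)

    # Overlap-add: IFFT, undo the cyclic shift, optionally taper, add at synth_hop spacing.
    z = mags * np.exp(1j * synth_phase)
    y = np.zeros(num_windows * synth_hop + window_size)
    for i in range(num_windows):
        frame = np.fft.fftshift(np.real(np.fft.ifft(z[i])))
        if taper:
            frame = frame * window
        start = i * synth_hop
        y[start:start + window_size] += frame

    # Truncate trailing silence, then normalize the peak to 1.
    nonzero = np.flatnonzero(y)
    if nonzero.size == 0:
        return y[:0]
    y = y[:nonzero[-1] + 1]
    return y / np.max(np.abs(y))

fs = 8000
x = np.sin(2 * np.pi * 440 * np.arange(fs) / fs)         # 1 s test tone
mag, phase = pv_analyze(x, window_size=1024, an_hop=128)
y = pv_synthesize(mag, phase, synth_hop=256, an_hop=128)
print(len(y) / len(x))                                    # a little over 2
```

The stretched output is roughly twice as long as the input while its dominant frequency stays near 440 Hz, which is the time-stretching behaviour described in section 3.1.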
2.2.2 Synthesis Code Description
Variable Description:
- mags is an array holding the magnitude portion of the desired output signal’s time-frequency data
- phases is an array holding the phase portion of the desired output signal’s time-frequency data
- synthHop is the synthesis hop size in samples. This parameter determines the behavior of the
  overlap-add and phase reconstruction of the signal.
- anHop is the hop size that was used in the analysis module (see section 2.1.1 Analysis Code
  Description for a description)
- taper allows the user to choose whether or not the output signal is window tapered (this can cause
  some distortion in certain situations)
NOTE: The size of the mags and phases arrays should be (number of windows) x (window size) in
order for the synthesis module to be able to correctly reconstruct the signal.
Memory Pre-allocation
As in the analysis module the code starts off by pre-allocating memory for the arrays and variables that
will be used. This is done to speed up the processing time by avoiding dynamic memory allocation
which is expensive when processing large amounts of data.
Phase Unwrapping
Phase unwrapping is a very important and somewhat complicated section of code, so I will explain in
depth exactly what it does. Before reading this explanation, the reader is encouraged to refer to
Appendix B for a detailed explanation of the relationship between the phase and frequency of a sinusoid
and of the need for phase unwrapping in this implementation. We will go through the code line by line
to explain the purpose of each line.
The first line of the phase unwrapping section:
omega_k=2*pi*[0:numBins-1]./numBins;
Sets up a vector that holds the nominal phase increment per sample for each frequency bin. The array
for holding all of the phase increments is then pre-allocated in memory with the following line:
phase_increment=zeros(numWindows-1,numBins);
Next the first synthesis window’s phase is set to the original first window’s phase since this window will
be in the exact same location (i.e. at the beginning of the signal so no interpolation is needed).
synthPhase(1,:)=phases(1,:);
The algorithm then enters a ‘for’ loop which iterates through the time-frequency data of all of the
windowed segments and unwraps the phase so that it can be used to calculate what the new phase should
be. To unwrap the phase, the target (nominal) phase of each bin, i.e. the phase the bin would have if it
followed the ideal phase ramp described in Appendix B, is calculated first:
target_phase=phases(i-1,:)+omega_k*anHop;
Recall that omega_k is the nominal phase increment per sample for each frequency bin, so this value is
multiplied by anHop (which is in samples) to determine the nominal phase increment between
windows. The deviation from this ideal situation is then calculated by subtracting the target phase
calculated above from the current phase. Since Matlab’s angle function returns phase angles in the
range -π to π, the principal argument of this difference must be taken to bring the deviation phase
back into that range before it is used. This is done with the princarg function
(code shown below).
function Phase=princarg(Phasein)
% This function is responsible for calculating the principal
% argument of a given phase, i.e. wrapping it into the range (-pi, pi]
Phase=mod(Phasein+pi,-2*pi)+pi;
end
deviation_phase=princarg(phases(i,:)-target_phase);
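The wrapping performed by princarg is easy to check numerically. NumPy's mod, like Matlab's, takes the sign of the divisor, so the same one-line expression carries over directly (a sketch for illustration):

```python
import numpy as np

def princarg(phase_in):
    """Principal argument: wrap phase_in into the range (-pi, pi]."""
    return np.mod(phase_in + np.pi, -2 * np.pi) + np.pi

angles = np.array([0.5, 1.5 * np.pi, -1.5 * np.pi, 7.0])
print(princarg(angles))   # 0.5, -pi/2, pi/2 and 7 - 2*pi (about 0.717)
```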
After calculating the deviation phase above, we know the difference between the measured phase
and the nominal (target) phase of the current frequency bin. This deviation phase exists because we are
performing an N-point FFT on our windowed signal. Recall (from section 1 Introduction) that this
operation produces a frequency-domain representation with a finite frequency resolution fr. The phase
vocoder therefore quantizes the actual frequencies of the signal into frequency bins, each with a fixed
centre frequency. If a frequency in the signal exactly matches the centre frequency of a bin (i.e. a
bin-centered frequency), its phase advances by exactly the nominal amount between windows and its
deviation phase is zero. However, if a frequency in the signal is slightly off bin-centre, in other words
between two bins, it is still assigned to a bin but has a non-zero deviation phase. Accumulated from
window to window, this deviation forms a ramp-like shape. It can be seen from Appendix B that a phase
ramp (with respect to time) corresponds to a frequency; however, we already know the frequency bin
(i.e. the ω in Appendix B is not zero) in which this phase ramp occurs. The frequency of the current bin
and the frequency associated with the phase ramp can therefore be added together to produce the
actual frequency of this component. The actual phase increment per sample (i.e. the measured
frequency in radians per sample) is then calculated with the following line:
phase_increment(i-1,:)=(omega_k.*anHop+deviation_phase)./anHop;
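A worked example may make this estimate concrete. The NumPy sketch below uses illustrative values that are not from the report (Fs = 8000 Hz, N = 512, anHop = 128) and analyses two consecutive frames of a 1003 Hz cosine. The peak falls in bin 64, whose nominal frequency is 1000 Hz; adding the deviation phase to the bin's nominal phase advance recovers the actual 1003 Hz:

```python
import numpy as np

def princarg(p):
    return np.mod(p + np.pi, -2 * np.pi) + np.pi

fs, N, an_hop = 8000, 512, 128
f_true = 1003.0                                  # between bins: spacing is fs/N = 15.625 Hz
x = np.cos(2 * np.pi * f_true * np.arange(N + an_hop) / fs)
w = 0.5 * (1 - np.cos(2 * np.pi * np.arange(N) / N))

# Two consecutive analysis frames anHop samples apart, cyclically shifted as in pvAnalyze
X_prev = np.fft.fft(np.fft.fftshift(x[:N] * w))
X_curr = np.fft.fft(np.fft.fftshift(x[an_hop:an_hop + N] * w))

k = int(np.argmax(np.abs(X_prev[:N // 2])))      # bin holding the peak
omega_k = 2 * np.pi * k / N                       # nominal phase increment per sample
target_phase = np.angle(X_prev[k]) + omega_k * an_hop
deviation_phase = princarg(np.angle(X_curr[k]) - target_phase)
phase_increment = (omega_k * an_hop + deviation_phase) / an_hop   # rad per sample

print(k * fs / N)                                 # 1000.0: the bin's nominal frequency
print(phase_increment * fs / (2 * np.pi))         # close to 1003: the actual frequency
```

The estimate only works while the true deviation stays within ±π between frames, which is one reason a small analysis hop relative to the window size is preferred.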
The synthesis phase is then calculated by taking the previous synthesis phase value and adding the
per-sample phase increment multiplied by the synthesis hop size (synthHop). If anHop and synthHop are
equal, the synthesis phase and the original phase will be the same. Now consider what happens when
anHop and synthHop are different. The phase increment was divided by anHop to get a phase increment
per sample; to find the synthesis phase, this per-sample increment is multiplied by synthHop. If the
synthesis hop size is larger than the analysis hop size, the synthesis phase therefore advances by a
larger increment. What does this mean? This portion of code interpolates along the measured phase
ramp (see the figure in Appendix B) so that when the windows are overlapped and added with a different
spacing (synthHop instead of anHop), each window’s phase is where it should be to maintain the
frequency that this phase ramp represents. The code that applies this interpolated phase increment is
shown below:
synthPhase(i,:)= princarg(synthPhase(i-1,:)+ phase_increment(i-1,:)*synthHop);
As you can see, it takes the previous synthesis phase and adds the newly interpolated phase increment
value to it so that the new phase will represent a point further along on the phase ramp.
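The effect of synthHop on the synthesis phase can be isolated in a short sketch. The propagate_phase function below is a NumPy port of the phase-unwrapping loop of Figure 5. The phase data are synthetic: every bin advances by 0.3 rad per sample, as the bins around a steady sinusoid do (in real signals only the bins near a component behave like this). With synthHop = anHop the original phases come back, and with synthHop = 2 × anHop the phase advances twice as far per window:

```python
import numpy as np

def princarg(p):
    return np.mod(p + np.pi, -2 * np.pi) + np.pi

def propagate_phase(phases, an_hop, synth_hop):
    """Port of the phase-unwrapping loop of pvSynthesize (rows = windows, columns = bins)."""
    num_windows, num_bins = phases.shape
    omega_k = 2 * np.pi * np.arange(num_bins) / num_bins
    synth = np.zeros_like(phases)
    synth[0] = phases[0]
    for i in range(1, num_windows):
        target_phase = phases[i - 1] + omega_k * an_hop
        deviation = princarg(phases[i] - target_phase)
        increment = (omega_k * an_hop + deviation) / an_hop   # rad per sample
        synth[i] = princarg(synth[i - 1] + increment * synth_hop)
    return synth

num_windows, num_bins, an_hop, omega = 6, 8, 4, 0.3
window_index = np.arange(num_windows)[:, None]
phases = princarg(omega * an_hop * window_index * np.ones(num_bins))   # "measured" phases

same = propagate_phase(phases, an_hop, synth_hop=an_hop)
double = propagate_phase(phases, an_hop, synth_hop=2 * an_hop)
# Compare on the unit circle so that 2*pi wrap-arounds do not matter.
print(np.allclose(np.exp(1j * same), np.exp(1j * phases)))                               # True
print(np.allclose(np.exp(1j * double), np.exp(1j * omega * 2 * an_hop * window_index)))  # True
```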
Resynthesis
The output frequency-domain representation is then generated with the following line of code which
uses the original magnitude (mags) and the newly calculated and interpolated synthesis phase array.
Z=mags.*exp(j*synthPhase);
Overlap and Add
The next section of code iterates through each re-synthesized window and overlaps and adds each
output chunk together (see Figure 2 for process details). Taking a closer look at this loop you will notice
the following line:
tmpIfft=fftshift(real(ifft(Z(i,:))));
This takes the i-th window and computes its Inverse Fast Fourier Transform (IFFT) to create a
time-domain representation. The real part of the result is taken because the IFFT output can contain
small imaginary parts (for example from numerical round-off) in addition to the real signal. The fftshift
operation is then used once again, this time to undo the cyclic shift performed in the analysis module
(for an even window size, fftshift is its own inverse). This shifts all of the samples back to their original
positions in the output window. The next few lines take the temporary output chunk and add it to the
overlapping region of the output signal.
if(taper==0)
y(n_start:n_end)=y(n_start:n_end)+tmpIfft;
else
y(n_start:n_end)=y(n_start:n_end)+tmpIfft.*window;
end
Depending on the user defined variable taper, the time domain synthesized output chunk is being
multiplied by the window coefficients before being added to the rest of the output. This process is
called ‘Window Tapering’ and is used to eliminate any phase discontinuities that may occur at the
boundaries of overlap (i.e. the edges of the output chunk). This window tapering removes a lot of
distortion that occurs as a result of awkward phase transitions on the boundaries of each window.
However, window tapering can also cause distortion in the output signal (because the magnitude is
effectively windowed a second time) when there are no phase discontinuities at the window boundaries,
which is why this parameter was made optional in this implementation. The index of the output signal
is then increased by the synthesis hop size and the next output chunk is calculated and added to the
output signal.
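One way to see why tapering can itself introduce distortion: with tapering on, each chunk is effectively weighted twice (once by the analysis window and once by the synthesis window), so the overlap-add gain depends on whether the squared window sums to a constant at the synthesis hop size. The NumPy sketch below is an illustration that assumes a periodic Hanning window. It shows that the plain window is flat at 50% overlap and the squared window is flat (with gain 3/2) at 75% overlap, but the squared window ripples between 0.5 and 1 at 50% overlap, i.e. the output amplitude is modulated at the hop rate:

```python
import numpy as np

def hann_periodic(size):
    return 0.5 * (1 - np.cos(2 * np.pi * np.arange(size) / size))

def steady_ola_gain(window_product, hop, num_windows=16):
    """Overlap-add copies of window_product spaced hop samples apart and return
    the steady-state part (the first and last window lengths are dropped)."""
    size = len(window_product)
    total = np.zeros(hop * (num_windows - 1) + size)
    for i in range(num_windows):
        total[i * hop:i * hop + size] += window_product
    return total[size:-size]

w = hann_periodic(1024)
tapered = w * w                                  # analysis window times synthesis (taper) window

gain_plain_50 = steady_ola_gain(w, 512)          # plain window, 50% overlap
gain_taper_75 = steady_ola_gain(tapered, 256)    # taper on, 75% overlap
gain_taper_50 = steady_ola_gain(tapered, 512)    # taper on, 50% overlap
print(np.ptp(gain_plain_50), np.mean(gain_taper_75), np.ptp(gain_taper_50))
# approximately 0, 1.5 and 0.5: only the last case modulates the amplitude
```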
Truncation of Silence
The next ‘for’ loop starts from the end of the generated output signal and removes the silence, which
results from the zero-padding process done in the analysis module (see section 2.1.1 - Input Padding), by
truncating the output signal.
Normalization
The output signal is then finally normalized to 0 dB by dividing the entire signal by the maximum
amplitude of the output.
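These last two steps amount to a few lines. The NumPy sketch below (the function name is hypothetical) also guards against an all-silent output, a case the Matlab loop does not handle: there, outputEnd would never be assigned and max(abs(y)) would be zero.

```python
import numpy as np

def truncate_and_normalize(y):
    """Drop trailing exact zeros (left by the input padding), then scale the peak to 1 (0 dB)."""
    nonzero = np.flatnonzero(y)
    if nonzero.size == 0:          # all-silent output: nothing to keep or normalize
        return y[:0]
    y = y[:nonzero[-1] + 1]
    return y / np.max(np.abs(y))

y = np.array([0.0, 0.2, -0.5, 0.25, 0.0, 0.0])
print(truncate_and_normalize(y))   # values 0, 0.4, -1.0, 0.5
```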
3 Effects
Using the phase vocoder described in the previous section, many different audio effects can be created.
In this section I will discuss some of the effects and applications of the phase vocoder.
NOTE: The following websites contain audio clips of these effects, the Matlab files and a PowerPoint
presentation outlining how they work:
Audio:
http://web.uvic.ca/~cjam/ELEC484/Final%20Project/Audio%20Results/
Matlab files:
http://web.uvic.ca/~cjam/ELEC484/Final%20Project/M-Files.zip
Power Point:
http://web.uvic.ca/~cjam/ELEC484/Final%20Project/Phase%20Vocoder.pptx
3.1 Time Stretching
An interesting application of the phase vocoder is time stretching: accurately stretching or compressing
the time scale of an audio signal without affecting its pitch. This effect is easily achieved with the
phase vocoder by changing the ratio of the synthesis hop size to the analysis hop size.
The frequency content of each windowed chunk is kept, but the chunks are spaced further apart
or closer together when they are overlapped and added. The effect is that the pitch is maintained
while the signal is stretched. (See Figure 6 below for documented code.)
%% Load Signal From File
[x, Fs]=audioread('../x1.wav');   % wavread in older Matlab releases
x=x(:,1)';                         % Convert to mono (row vector)
windowSize=2048;                   % Set up window size
stretchRatio=2.5;                  % Our time-stretch ratio
anHop=128;                         % Analysis hop size
% Calculate synthesis hop size based on ratio
synthHop=round(anHop*stretchRatio);
% Analyze input signal
[mag, phase]=pvAnalyze(x,windowSize,anHop);
% Generate output (don't modify phase or magnitudes)
y=pvSynthesize(mag,phase,synthHop,anHop,1);
y=y*max(abs(x));                   % Scale output to match input amplitude
Figure 6 - Time-Stretch Effect Code
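Because synthHop must be a whole number of samples, the stretch actually applied is synthHop/anHop rather than the requested ratio. The Python sketch below reproduces the hop computation of Figure 6 and the length of the output array that pvSynthesize allocates before the trailing silence is truncated. It uses floor(x + 0.5), which matches Matlab's round for positive values (Python's built-in round rounds halves to even). The 44100-sample input length is an assumed example:

```python
import math

def synth_hop_for(an_hop, ratio):
    """synthHop = round(anHop*ratio) as in Figure 6, plus the ratio actually achieved."""
    synth_hop = math.floor(an_hop * ratio + 0.5)   # Matlab-style rounding for positive values
    return synth_hop, synth_hop / an_hop

def allocated_output_length(num_samples, window_size, an_hop, synth_hop):
    """Length of y allocated by pvSynthesize: numWindows*synthHop + windowSize."""
    num_windows = math.ceil(num_samples / an_hop)
    return num_windows * synth_hop + window_size

print(synth_hop_for(128, 2.5))                          # (320, 2.5): exact
print(synth_hop_for(256, 0.7))                          # (179, 0.69921875): slightly below 0.7
print(allocated_output_length(44100, 2048, 128, 320))   # 112448
```

The second case uses the Figure 7 values; the small shortfall is why that code resamples by anHop/synthHop rather than by the nominal ratio.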
3.2 Pitch Shifting
Pitch shifting is, in a sense, the opposite of time stretching: the time scale of the audio signal should
remain the same while its pitch is modified. This can be accomplished by time-stretching the audio
signal by the ratio by which the pitch should change, and then resampling the result by the inverse of
that ratio so that its original duration is restored. Played back at the original sampling rate, this
produces a pitch-shifted audio signal. (See Figure 7 below for commented code.)
%% Load Signal From File
[x, Fs]=audioread('../x1.wav');   % wavread in older Matlab releases
x=x(:,1)';                         % Convert to mono (row vector)
windowSize=2048;                   % Set up window size
pitchRatio=0.7;                    % Our pitch shift ratio
anHop=256;                         % Analysis hop size
% Calculate synthesis hop size based on ratio
synthHop=round(anHop*pitchRatio);
% Analyze input signal
[mag, phase]=pvAnalyze(x,windowSize,anHop);
% Generate output (don't modify phase or magnitudes)
y=pvSynthesize(mag,phase,synthHop,anHop,1);
% Resample output to make pitch-shifted version of input
y=resample(y,anHop,synthHop);
y=y*max(abs(x));                   % Scale output to match input amplitude
Figure 7 - Pitch Shifting Code
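The resampling step can be illustrated on its own. In the NumPy sketch below, a 440 Hz tone lasting 179/256 of a second stands in for the time-compressed vocoder output (an assumption made so the example needs no vocoder). Linear interpolation via np.interp is a crude stand-in for Matlab's resample, which also applies an anti-aliasing filter. Stretching the tone back to about 1 s lowers its frequency to roughly 440 × 179/256, about 307.7 Hz:

```python
import numpy as np

def resample_linear(y, p, q):
    """Crude stand-in for Matlab's resample(y, p, q): linear interpolation onto
    p/q times as many samples, with no anti-aliasing filter."""
    n_out = int(round(len(y) * p / q))
    positions = np.arange(n_out) * q / p          # output sample times, in input samples
    return np.interp(positions, np.arange(len(y)), y)

def dominant_frequency(y, fs):
    spectrum = np.abs(np.fft.rfft(y * np.hanning(len(y))))
    return np.argmax(spectrum) * fs / len(y)

fs = 8000
an_hop, synth_hop = 256, 179                      # Figure 7 values (pitchRatio = 0.7)
# Stand-in for the time-stretched vocoder output: a 440 Hz tone lasting
# synth_hop/an_hop of a second (shorter, but still at 440 Hz).
stretched = np.sin(2 * np.pi * 440 * np.arange(int(fs * synth_hop / an_hop)) / fs)
shifted = resample_linear(stretched, an_hop, synth_hop)

print(len(shifted) / fs)                 # close to 1 s again
print(dominant_frequency(shifted, fs))   # close to 440 * 179 / 256, about 307.7 Hz
```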
3.3 Robotization
Since much of the pitch and frequency content is contained in the phase information of each
windowed chunk (see Appendix B for details), an interesting effect can be achieved by setting this phase
information to zero in every window. This process eliminates much of the pitch variation in the audio
signal and produces an effect that sounds much like a robot. (See Figure 8 below for commented code.)
NOTE: The underlying pitch of the resulting audio signal can be adjusted by changing the window and
analysis hop size.
%% Load Signal From File
[x, Fs]=audioread('../x1.wav');   % wavread in older Matlab releases
x=x(:,1)';                         % Convert to mono (row vector)
windowSize=512;                    % Set up window size
anHop=128;                         % Analysis hop size
synthHop=anHop;                    % Synthesis hop size
% Analyze input signal
[mag, phase]=pvAnalyze(x,windowSize,anHop);
% Set all new phases to zero (same size as the phase array)
phase=zeros(size(phase));
% Reconstruct output signal
y=pvSynthesize(mag,phase,synthHop,anHop,1);
Figure 8 - Robotization Effect Code
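Why zeroing the phase sounds robotic can be seen on a single frame. In the NumPy sketch below (illustrative; a random windowed chunk stands in for speech), setting every phase to zero makes the resynthesized frame symmetric about its centre with its peak exactly there, so every synthesis hop emits a similar centred pulse. The resulting pulse train repeats every synthHop samples, giving a buzz at roughly Fs/synthHop (about 345 Hz for an assumed Fs of 44.1 kHz and synthHop = 128), which is consistent with the note that the pitch depends on the hop size:

```python
import numpy as np

N = 512
rng = np.random.default_rng(3)
hann = 0.5 * (1 - np.cos(2 * np.pi * np.arange(N) / N))
chunk = rng.standard_normal(N) * hann                      # stand-in for a windowed speech chunk

mag = np.abs(np.fft.fft(np.fft.fftshift(chunk)))           # analysis magnitude
robot_frame = np.fft.fftshift(np.real(np.fft.ifft(mag)))   # resynthesis with every phase = 0

centre = N // 2
print(np.allclose(robot_frame[centre + 1:], robot_frame[centre - 1:0:-1]))   # True: symmetric
print(np.argmax(robot_frame) == centre)                                      # True: peak at centre
print(44100 / 128)   # pulse rate in Hz for an assumed Fs = 44.1 kHz and synthHop = 128
```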
3.4 Whisperization
Whisperization is another effect that can be obtained by modifying the phase information of each
windowed chunk. It is achieved by randomizing all of the phases in each windowed chunk
so that the correlation of phase values between windows, for any given frequency bin, is lost. (See
Figure 9 below for commented code.)
NOTE: A smaller window size should be used when applying this effect, so that the magnitude portion of
the time-frequency data captures mainly the spectral envelope of the signal rather than its individual
harmonics.
Figure 9 - Whisperization Effect Code
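As an illustration (not the code of Figure 9), the NumPy sketch below implements the idea described above: each frame keeps its magnitude but receives phases drawn uniformly from [-π, π), independently for every bin and every window. The window size (64), hop (8), test signal and random generator are assumptions chosen for the example, in line with the advice to use a small window:

```python
import numpy as np

def whisperize(x, window_size=64, hop=8, seed=0):
    """Sketch of whisperization: keep each frame's magnitude, randomize its phase."""
    rng = np.random.default_rng(seed)
    w = 0.5 * (1 - np.cos(2 * np.pi * np.arange(window_size) / window_size))
    y = np.zeros(len(x) + window_size)
    for start in range(0, len(x) - window_size + 1, hop):
        mag = np.abs(np.fft.fft(np.fft.fftshift(x[start:start + window_size] * w)))
        random_phase = rng.uniform(-np.pi, np.pi, window_size)   # new phases every window
        # Random phases break conjugate symmetry, so the real part is kept
        frame = np.fft.fftshift(np.real(np.fft.ifft(mag * np.exp(1j * random_phase))))
        y[start:start + window_size] += frame * w                 # tapered overlap-add
    return y / np.max(np.abs(y))

fs = 8000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 220 * t) * (1 + 0.5 * np.sin(2 * np.pi * 3 * t))   # tone with an envelope
y = whisperize(x)
```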
3.5 Stable & Transient component separation
The stable and transient components of a signal can be separated by observing the changes in phase
from window to window. Components whose phase deviates strongly from the expected advance from
window to window are transient signal components; conversely, stable components show only small
phase deviations from window to window. The code shown below (Figure 10) shows how the
transient and stable components can be isolated.
%% SOUND FILE
[x, Fs]=audioread('../stableTest.wav');
x=x(:,1)';
windowSize=1024;
anHop=512;
synthHop=anHop;
%Analyze Input Signal
[mag phase]=pvAnalyze(x,windowSize,anHop);
[pRows pCols]=size(phase);
% Set Threshold
thresh=.5;
% Pre-allocate new phase and magnitude arrays
newPhase=zeros(pRows,pCols);
newMag=zeros(pRows,pCols);
% make first window of new phase array equal to original phase
newPhase(1,:)=phase(1,:);
% nominal phase increment for each bin
omega_k=2*pi*[0:pCols-1]./pCols;
% Iterate through all windows
for i = 2:pRows
% Calculate the target phase for each bin based on previous phase
target_phase=phase(i-1,:)+omega_k*anHop;
% Calculate deviation from the target phase
deviation_phase=princarg(phase(i,:)-target_phase);
% Set all bins either outside or inside the threshold to zero
% NOTE: if abs(deviation_phase)<thresh is used, the stable components
% will be kept. If abs(deviation_phase)>thresh the transient
% components will be kept
newPhase(i,:)=phase(i,:).*(abs(deviation_phase)<thresh);
newMag(i,:)=mag(i,:).*(abs(deviation_phase)<thresh);
end
% Reconstruct output signal
y=pvSynthesize(newMag,newPhase,synthHop,anHop,1);
Figure 10 - Transient & Stable Component Isolation Code
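The deviation measure used in Figure 10 can be checked on a steady cosine that falls exactly on a bin centre. In this sketch princarg is written as an anonymous function that wraps a phase into [-π, π); that is an assumption about how the report's own princarg helper behaves. The window size and hop match Figure 10, and the bin index k = 40 is an arbitrary choice. The cyclic shift applied in pvAnalyze is omitted: it multiplies a given bin by the same constant in every window, so it cancels out of the deviation.

```matlab
% Phase-deviation check for a stable (on-bin) sinusoid
princarg = @(p) mod(p + pi, 2*pi) - pi;     % wrap phase into [-pi, pi) (assumed behaviour)
windowSize = 1024; anHop = 512;              % as in Figure 10
n = 0:windowSize-1;
win = 0.5 - 0.5*cos(2*pi*n/windowSize);      % periodic Hann window
k = 40;                                      % test bin (0-based), arbitrary
omega_k = 2*pi*k/windowSize;                 % nominal phase increment per sample for bin k
x = cos(omega_k*(0:anHop+windowSize-1));     % steady cosine exactly on bin k
X1 = fft(x(1:windowSize).*win);              % window i-1
X2 = fft(x(anHop+(1:windowSize)).*win);      % window i
target_phase = angle(X1(k+1)) + omega_k*anHop;
deviation_phase = princarg(angle(X2(k+1)) - target_phase);
% A stable component gives a deviation near zero, so it passes abs(deviation_phase)<thresh
```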
3.6 Signal Filtering
Digital filters can also be easily implemented using the phase vocoder by multiplying the magnitude
(frequency) response of the desired filter with the magnitude data of each windowed chunk of the input
signal. In order to have the same response as the desired filter, the phase response of the filter must
also be added to the phase of each windowed chunk of the input signal (i.e. the filter phase in each
frequency bin must be added to the phase in the corresponding bin). (See Figure 11 below for commented code)
%% Load Signal From File
[x, Fs]=audioread('../x1.wav');
x=x(:,1)';                % Convert to mono
windowSize=2048;          % Set up window Size
anHop=256;                % Analysis Hop Size
synthHop=anHop;           % Synthesis Hop Size
b=[1 0 -1];               % Filter numerator coefficients
a=[1 -1.5450 -1];         % Filter denominator coefficients
% Get frequency response of filter at windowSize points around the whole
% unit circle (one value per FFT bin); .' transposes without conjugating
filterCoefs=freqz(b,a,windowSize,'whole').';
% Analyze input signal
[mag phase]=pvAnalyze(x,windowSize,anHop);
% Get size of magnitude array
[mRows mCols]=size(mag);
% Multiply the filter frequency response with
% The time frequency data in each window and
% add the phase response to the phase data in each window
for i = 1:mRows
mag(i,:)=mag(i,:).*abs(filterCoefs);
phase(i,:)=phase(i,:)+angle(filterCoefs);
end
% Reconstruct Output Signal
y=pvSynthesize(mag,phase,synthHop,anHop,1);
Figure 11 – Signal Filtering Code
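Why multiplying magnitudes and adding phases amounts to filtering can be seen on a single frame. This hedged sketch uses only the FIR numerator b from Figure 11, so the response can be sampled with fft(b, N), which gives the same N points around the whole unit circle that freqz(b,1,N,'whole') returns. The frame length and the two trailing zeros are assumptions chosen so that the circular convolution implied by the FFT equals ordinary filtering; with longer filter responses or no padding, the end of each chunk wraps around to its start.

```matlab
% One-frame check: spectral multiplication versus time-domain filtering
b = [1 0 -1];                        % numerator coefficients from Figure 11
N = 64;                              % assumed frame length
frame = [randn(1, N-2) 0 0];         % last length(b)-1 samples zeroed to avoid wrap-around
X = fft(frame);
H = fft(b, N);                       % filter response at the N FFT bin frequencies
% Phase-vocoder style: multiply magnitudes and add phases in every bin
Y = abs(X).*abs(H) .* exp(1i*(angle(X) + angle(H)));
yFreq = real(ifft(Y));
yTime = filter(b, 1, frame);         % direct time-domain filtering
maxError = max(abs(yFreq - yTime));  % should be at round-off level
```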
3.7 Denoising
De-noising is an important application of the phase vocoder. The de-noising effect can be thought of as
a number of noise gates, one applied to each frequency bin in each window. The code shown below
(Figure 12) uses a dynamic scaling technique to attenuate low-level signals. It does this by using a
de-noising coefficient coef and calculating r, the magnitude of each time-frequency bin normalized by
the window size. The code then multiplies each bin of the time-frequency data by the ratio:
r./(r+coef)
This ratio is very small when r << coef, equal to 0.5 when r = coef, and close to unity when r >> coef.
This results in the attenuation of low-level signals and thus de-noising of the signal.
NOTE: The code below adds some low level noise to the input signal and uses the de-noising algorithm to
remove it.
%% Load Signal From File
[x, Fs]=audioread('../x1.wav');
x=x(:,1)';                        % Convert to mono
noise=0.01*randn(1,length(x));    % Create a noise signal
x=x+noise;                        % Add noise to input
windowSize=2048;                  % Set up window Size
anHop=256;                        % Analysis Hop Size
synthHop=anHop;                   % Synthesis Hop Size
% Analyze input signal
[mag, phase]=pvAnalyze(x,windowSize,anHop);
coef=0.0015;                      % Set up de-noising coefficient
f=mag.*exp(1i*phase);             % Rebuild complex time-frequency data
r=mag/windowSize;                 % Normalized magnitude r for de-noising
newF=f.*r./(r+coef);              % Attenuate low-level bins based on coef
newMag=abs(newF);                 % Get new magnitude
newPhase=angle(newF);             % Get new phase
% Reconstruct Output Signal
y=pvSynthesize(newMag,newPhase,synthHop,anHop,1);
y=y*max(abs(x));                  % Scale output to match input amplitude
Figure 12 - De-noising effect code
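To make the shape of the de-noising rule concrete, this sketch evaluates r./(r+coef) with coef from Figure 12 for r values spaced by factors of ten around coef. The r values are illustrative, not measured from x1.wav. A bin two orders of magnitude below coef is attenuated by roughly 40 dB, a bin with r = coef is halved (about -6 dB), and bins far above coef pass almost unchanged.

```matlab
% Gain applied by the de-noising rule r./(r+coef) for a range of r values
coef = 0.0015;                             % de-noising coefficient from Figure 12
r = [1.5e-5 1.5e-4 1.5e-3 1.5e-2 1.5e-1];  % normalized bin magnitudes (illustrative)
gain = r ./ (r + coef);                    % linear gain per bin
gain_dB = 20*log10(gain);                  % same gain in decibels
disp([r.' gain.' gain_dB.'])
```

Because r is real and non-negative, the rule only rescales each bin's magnitude; wherever the gain is nonzero, the angle of newF equals the original phase.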
3.8 Audio Compression
Audio file compression is an interesting application of the phase vocoder and can be implemented in
many different ways. I have implemented audio compression in two different ways using the
phase vocoder.
3.8.1 Thresholding frequency data
The first compression scheme simply sets all frequency bins that have a magnitude below a certain
threshold to zero, thus removing this data from the audio signal. (See Figure 13 below for
commented code)
%% Load Signal From File
[x, Fs]=audioread('../x1.wav');
x=x(:,1)';                  % Convert to mono
windowSize=2048;            % Set up window Size
anHop=1024;                 % Analysis Hop Size
synthHop=anHop;             % Synthesis Hop Size
% Analyze input signal
[mag, phase]=pvAnalyze(x,windowSize,anHop);
% Get number of windows and bins from magnitude array
[numWindows,numBins]=size(mag);
% preallocate arrays for new phases and magnitudes
newPhase=phase;
newMag=mag;
% Set threshold value
thresh=1.5;
% set all frequency components that are below the threshold to zero
for i = 1:numWindows
newPhase(i,:)=phase(i,:).*(mag(i,:)>thresh);
newMag(i,:)=mag(i,:).*(mag(i,:)>thresh);
end
% Reconstruct output
y=pvSynthesize(newMag,newPhase,synthHop,anHop,1);
Figure 13 - Audio Compression (threshold) Effect Code
Comments:
Using this compression scheme, the quality of the sound deteriorates as the threshold increases. This is
expected because, as the threshold is increased, more of the time-frequency data falls below it and is discarded.
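The loop in Figure 13 can also be written with a logical mask, which makes it easy to measure how much data a given threshold discards. In this hedged sketch the magnitude and phase arrays are random stand-ins for pvAnalyze output, so the printed fractions say nothing about x1.wav itself; the threshold values other than 1.5 are arbitrary.

```matlab
% Vectorized thresholding and the fraction of bins it keeps
numWindows = 50; numBins = 2048;              % assumed array size
mag = 2*abs(randn(numWindows, numBins));      % stand-in magnitudes
phase = 2*pi*rand(numWindows, numBins) - pi;  % stand-in phases
thresh = 1.5;                                 % threshold from Figure 13
keep = mag > thresh;                          % logical mask of retained bins
newMag = mag .* keep;                         % same result as the loop version
newPhase = phase .* keep;
fractionKept = nnz(keep) / numel(keep);
% Sweep the threshold: larger thresholds keep fewer bins
threshList = [0.5 1 1.5 2 3];
kept = arrayfun(@(t) nnz(mag > t) / numel(mag), threshList);
disp([threshList.' kept.'])
```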
3.8.2 Extracting largest frequency components
Another implementation of audio compression is to iterate through each windowed chunk of the input
signal and keep only the N largest frequency components (N is bins2Keep in the code). In other words,
analyze the magnitude time-frequency data and keep only the N frequency bins with the largest magnitudes.
For example, if N=1, the audio signal is represented by a single sinusoid at any given point in time.
(See Figure 14 below for commented code)
%% Load Signal From File
[x, Fs]=audioread('../x1.wav');
% [x, Fs]=audioread('../stableTest.wav');
x=x(:,1)';                  % Convert to mono
windowSize=2048;            % Set up window Size
anHop=1024;                 % Analysis Hop Size
synthHop=anHop;             % Synthesis Hop Size
% Analyze input signal
[mag, phase]=pvAnalyze(x,windowSize,anHop);
% Get number of windows and bins from magnitude array
[numWindows,numBins]=size(mag);
% preallocate arrays for new phases and magnitudes
newPhase=zeros(numWindows,numBins);
newMag=zeros(numWindows,numBins);
% Set the number of frequency bins to keep
bins2Keep=2;
% keep only the bins2Keep largest-magnitude components in each window
for i = 1:numWindows
% iterate through the window and find the maximum (bins2Keep) frequency
% components of the signal
for j = 1:bins2Keep
% find index of maximum frequency component
[maxVal maxIndex]=max(mag(i,:));
% put in the maximum magnitude and phase information into the new
% magnitude and phase arrays
newMag(i,maxIndex)=mag(i,maxIndex);
newPhase(i,maxIndex)=phase(i,maxIndex);
% Set the current maximum to zero so that it does not get used
% again
mag(i,maxIndex)=0;
end
end
% Reconstruct output
y=pvSynthesize(newMag,newPhase,synthHop,anHop,1);
Figure 14 - Audio Compression Effect (maximum frequency component) Code
Comments:
Using this compression scheme, the higher the value of the bins2Keep variable, the more frequency bins
are kept and thus the less compression is achieved. Some artifacts are introduced at low values of
bins2Keep, which is expected since the process is trying to represent the signal with fewer and fewer sinusoids.
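The inner loop of Figure 14 repeatedly searches for the maximum and zeroes it in mag, which also overwrites the analysis data. A sketch of an alternative that sorts each window once and leaves mag untouched follows; the arrays are random stand-ins for pvAnalyze output and the array size is assumed.

```matlab
% Keep the bins2Keep largest-magnitude bins of each window using sort
numWindows = 6; numBins = 32;                 % assumed array size
mag = abs(randn(numWindows, numBins));        % stand-in magnitudes
phase = 2*pi*rand(numWindows, numBins) - pi;  % stand-in phases
bins2Keep = 2;                                % as in Figure 14
[~, order] = sort(mag, 2, 'descend');         % bin indices per window, largest first
newMag = zeros(numWindows, numBins);
newPhase = zeros(numWindows, numBins);
for i = 1:numWindows
    idx = order(i, 1:bins2Keep);              % the bins2Keep strongest bins
    newMag(i, idx) = mag(i, idx);
    newPhase(i, idx) = phase(i, idx);
end
```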
4 Additional Figures and Graphs
This section provides the additional figures and graphs required and outlined as deliverables in the
project summary.
4.1 Project Part B – Phase Vocoder Testing with Cosines
This section shows the waveforms produced by the phase vocoder for different input cosine
waves. Note that window tapering was not used in these examples: this is one of the situations where
there are no phase discontinuities between windows, and the window tapering causes modulation in the
output signal's amplitude.
4.2 Part C – Amplitudes in time/frequency plane
The following section shows the figures representing the time/frequency (magnitude spectrum) data for
the cosine waves that were tested in section 4.1.
4.2.1 Results Interpretation
The following table describes and interprets the results shown above:
| Frequency (Hz) | Observations | Interpretation |
|---|---|---|
| 7.5 | Clean, very well-defined spikes in the frequency domain. | The integer number of samples per cycle leads to very clean spikes in the frequency domain. There are many cycles per window, so the frequency content is located away from DC. |
| 7.5685 | Slight variations in the magnitude of the spikes in the frequency domain; small tails are added to the spikes. | The non-integer number of samples per cycle leads to a slow shift in how many cycles are in each window. |
| Mid frequencies | Similar results as above. | |
| 1 | Large variations in magnitude in the frequency domain. Tails are also added to the spikes. The magnitude variations are symmetric since the windows encapsulate the same curves, just with opposite slopes. | The large variations in magnitude are caused by the fact that the sinusoid oscillates so slowly that the phase vocoder actually sees it as a slowly varying DC waveform. |
| 0.9 | Large variations in magnitude that are not symmetric. Tails are also present on each spike. | The non-symmetric variations are caused by the non-integer number of samples per cycle (i.e. correlation between windows is lost). |
4.3 Part D – Phase variations in time/frequency plane
The following section shows the figures representing the time/frequency (phase spectrum) data for the
cosine waves that were tested in section 4.1.
4.3.1 Results Interpretation
This section interprets the results shown in the previous section.
Integer number of samples per cycle
• For an odd number of cycles per window, the phase alternates between 0 and π from window to window.
  o This is because alternate windows contain either two negative peaks or two positive peaks, which results in a phase of π or 0 respectively.
• For an even number of cycles per window, the phase is constant at zero.
  o This is because each window has the same number of negative and positive peaks, which leads to a net phase of zero.
• For lower frequencies (i.e. fewer than 2 cycles per window), the phase information became inconsistent because, relative to each window's size, the signal was closer to a ramping DC signal than a sinusoid.
Non-integer number of samples per cycle
• Phase ramps between windows can be seen because the frequency content of the signal lies between frequency bins.
• The phase data is much messier for a non-integer number of samples per cycle.
4.4 Effect of Cyclic Shift
This section uses figures to show the effect of analyzing both a sinusoid and an impulse input
with and without the cyclic shift operation in place.
4.4.1 Sinusoid
The first image shown in this section is that of a sinusoid analyzed with the cyclic shift operation in
place. The second image shown is the same sinusoid without the cyclic shift. Note that only the phase
information is shown because the cyclic shift does not affect the time-frequency magnitude data.
Notice that when the cyclic shift is not used the correlation between the phase values in any given
window is lost and the phase values alternate between 0 and π.
Figure 15 - Sinusoid analyzed with cyclic shift
Figure 16 - Sinusoid analyzed without cyclic shift
4.4.2 Window Centered Impulse
The effect on an impulse is shown by the following two images, which contrast the window-centered
impulse analyzed with the cyclic shift and then without it. It can be clearly seen that the cyclic
shift correlates the phases and gives a much cleaner representation of the frequency spectrum than
the analysis without it.
Figure 17 - Window centered impulse analyzed with cyclic shift
Figure 18 - Window centered impulse analyzed without cyclic shift
5 Recommendations
The phase vocoder implementation can be improved with some minor modifications to the base
analysis and synthesis module code. One of these modifications is to zero pad the windowed chunks of
the input signal. Zero padding samples the spectrum of each chunk more finely (more FFT bins per
window) without lengthening the analysis window, so the ability to track fast spectral modifications
is not sacrificed.
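Zero padding can be sketched with fft(x, n), which pads x with zeros to length n before transforming. This sketch uses an assumed 256-sample Hann-windowed random frame, a padding factor of 4, and an assumed 44.1 kHz rate for the bin spacing. It confirms that every fourth padded bin is exactly an original bin, with the padded transform filling in the points between them. The frequency sampling becomes finer while the window length, and hence the time resolution, is unchanged. Two components closer together than the window's main lobe still are not separated.

```matlab
% Zero padding: finer frequency sampling with the same window length
N = 256;                                     % window length (assumed)
padFactor = 4;                               % zero-padding factor (assumed)
Fs = 44100;                                  % assumed sampling rate
n = 0:N-1;
frame = randn(1, N) .* (0.5 - 0.5*cos(2*pi*n/N));  % Hann-windowed stand-in frame
X = fft(frame);                              % N bins
Xpad = fft(frame, padFactor*N);              % zero-padded to padFactor*N bins
% Every padFactor-th padded bin coincides with an original bin
maxDiff = max(abs(Xpad(1:padFactor:end) - X));
binSpacing = Fs / N;                         % Hz between original bins
padSpacing = Fs / (padFactor*N);             % Hz between zero-padded bins
```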
6 Conclusion
A phase vocoder is an extremely useful tool for audio signal processing. Many effects and tools can be built
with the phase vocoder implementation described in this paper. Time-frequency analysis of audio
signals can seem confusing at first glance; however, when the process is broken down into small pieces
and analyzed, the rationale behind the code and algorithms becomes much more transparent. This
project was an excellent way to investigate and comb through the finer details of time-frequency and
spectral analysis.
Appendix A
FFT Operation and the Cyclic Shift
The FFT operation assumes that the origin of the input time vector is at its left-hand side. However,
since we are windowing the input signal with the Hanning window, we would like a pulse in the middle
of our window to have zero phase. If we simply take the FFT of a pulse in the middle of the window,
the resulting phase will alternate between 0 and π from one frequency bin to the next, since the
impulse is right in the middle of the window. This is because the cosine components needed to create
this impulse have coefficients that alternate between +1 and -1 from one frequency bin to the next.
To solve this issue, the cyclic shift is used: it takes the right half of the windowed signal (blue
section below) and moves it to the left-hand side, and moves the left-hand side (red section below)
to the right-hand side. The diagram below shows the cyclic shift operation.
Cyclic Shift
The effect of this operation is that an impulse centered in the middle of the window is moved to the
left-hand side of the modified window. When the FFT is computed on this modified windowed signal,
the phases of the cosines making up this impulse are all zero, which gives much better correlation
than the previous case, whose coefficients jumped from +1 to -1 to +1 from bin to bin. The effect
of the cyclic shift can be seen in the figures and results shown in section 4.4.
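The alternating-phase argument can be checked numerically. For an even window length, MATLAB's fftshift performs exactly the half-swap described above (the right half moves to the left and vice versa), so it is used here as a stand-in for the report's cyclic shift; the 8-sample window is an arbitrary choice to keep the values readable.

```matlab
% FFT of a window-centred impulse, without and with the cyclic shift
N = 8;                               % short window so the values are easy to read
frame = zeros(1, N);
frame(N/2 + 1) = 1;                  % impulse at the window centre
X_noShift = fft(frame);              % equals (-1).^k: phases alternate 0, pi
X_shift = fft(fftshift(frame));      % cyclic shift moves the impulse to sample 1
disp(real([X_noShift; X_shift]))
```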
Appendix B
Phase-Frequency relationship of sinusoids & need for phase unwrapping
This appendix will explain the relationship between instantaneous frequency and phase of a sinusoid.
To help show this relationship consider a cosine of the form:
$$x(t) = \cos(\omega t + \phi(t)) \tag{1}$$
where $\omega$ is the angular frequency of the sinusoid and $\phi(t)$ is the phase of the cosine as a
function of time. Now imagine for a moment that $\omega = 0$ for this sinusoid, which leaves us with:
$$x(t) = \cos(\phi(t)) \tag{2}$$
Now suppose that $\phi(t)$ has the following form:
$$\phi(t) = ct \tag{3}$$
where $c$ is a constant. Substituting (3) back into (2) gives:
$$x(t) = \cos(ct)$$
which, with $c = \omega$, looks exactly like (1) without any phase variations:
$$x(t) = \cos(\omega t)$$
Consequently, what this means for our phase vocoder is that a sinusoid of a specific frequency can be
thought of as having a linear phase ramp instead of a frequency component. Therefore, if we know which
frequency bin we are analyzing (i.e. the slope of the phase ramp) as well as the current window's
phase, we can extend (extrapolate) the phase ramp to find what the phase should be at an arbitrary
time (i.e. the next window). In the figure shown below, you can see the phase of a steady sinusoid
versus time (dashed blue line). The green line is the phase that Matlab returns: when it takes the
angle of an arbitrary complex number, it places the phase between –π and π (Matlab does not realize
that the angle has already gone around the unit circle once).
For this reason, we must unwrap the phase (green line) of the sinusoid to get the actual phase (blue line)
so that we can accurately determine the correct amount to increment the new phase by to maintain the
relationship given by (3).
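The wrapped-versus-unwrapped relationship can be reproduced with MATLAB's angle and unwrap functions. This sketch assumes a 3 Hz sinusoid sampled 200 times over one second, so the phase advances by much less than π per sample and unwrap can recover the ramp of (3). Differentiating the recovered ramp then returns the frequency, which is the relationship the phase vocoder relies on.

```matlab
% Wrapped versus unwrapped phase of a steady sinusoid
f0 = 3;                               % assumed frequency in Hz
c = 2*pi*f0;                          % slope of the phase ramp (rad/s), the c in (3)
t = linspace(0, 1, 200);              % 200 samples over one second
truePhase = c*t;                      % linear phase ramp (the dashed blue line)
wrapped = angle(exp(1i*truePhase));   % what angle() returns, limited to [-pi, pi]
recovered = unwrap(wrapped);          % adds multiples of 2*pi at each jump
instFreq = diff(recovered) ./ diff(t) / (2*pi);   % instantaneous frequency in Hz
```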