Final Report
4/19/05
Spring 2005: Group 3
Austin Assavavallop
William Feaster
Greg Heim
Philipp Pfeiffenberger
Wamba-Kuete Yves
ECE4006E Senior Design Project
Georgia Institute of Technology
School of Electrical and Computer Engineering
http://www.ece.gatech.edu/academic/courses/ece4006/05spring/ece4006e/group03/
I. Introductory Design Theory
The very nature of a digital signal processing (DSP) system demands simplicity, to stay
lean in computing-cycle usage and approach real-time processing, and accuracy, to ensure data
quality and robustness of the system. Thus, the design specifies a multithreaded analysis of the
input signal using autocorrelation analysis, or “Pitch Detection,” to determine note frequencies,
and time frame analysis, or “Envelope Analysis,” to determine note envelopes. Figure 1
illustrates the basic concept.

Figure 1. System Overview

A monophonic signal is input into the DSP system and is
simultaneously processed through envelope analysis and pitch detection. The “Controller” is the
host computer that consolidates the information supplied by the two threads.
A. Pitch Detection Analysis
In the autocorrelation pitch detection analysis, the octave and specific note are
determined. As with the time frame envelope analysis, presets must be calculated for optimal
operation. Presets required are the sampling rate, the number of guesses to be processed, and the
maximum octave limit on the musical scale. Initial arrays are set according to these presets and
two main loops begin processing. The outside loop traverses through each octave, while the inner
loop traverses through each note.
The autocorrelation function, visually demonstrated in Figure 2, reveals that the
amplitude values at the intersections of the signal and red lines are very closely correlated,
resulting in a very low value for the autocorrelation sum. The system samples the input signal
at 44,100 Hz, yielding 1,603 samples per frame. This array of samples is referred to as s_i(n),
n = 0..N, where i represents the frame number. To perform the autocorrelation calculations
using Eq. 1, where N_p is the autocorrelation result (the period in samples) and n is the sample
index, each frame must contain at least two full periods of the signal.

Figure 2. Autocorrelation illustration.

    N_p = arg min_{N_p}  sum_{n=0}^{N/2} | s(n) - s(n + N_p) |        (Eq. 1)
The frame size is determined by the lowest expected frequency of the input signal. This
value is used to determine the closest guess to a correct note in a particular octave. The design
limits the lowest expected frequency to 55 Hz, corresponding to an “A” in the first octave of a
standard twelve-note chromatic scale. Beginning with that note, the system calculates the
frequency of an incoming note based on the iterations of the two loops. From the calculated
frequency, the offset is also found. Next, the input note is compared sample-by-sample to the
ideal note at the calculated frequency. If the input note matches the current ideal note more
closely than the previously guessed note, the current ideal note becomes the best guess for the
input note’s frequency. If the previous guess is closer, no changes occur and the iterations resume.
This process continues through the entire musical scale to ensure the absolute best guess is
determined.
B. Envelope Analysis
In the time frame envelope analysis, the incoming monophonic signal supplied to the
system helps determine the block size for the time frame function. This preset allows the
function to set certain parameters before any processing can occur, thus allowing for quicker and
more accurate readings of input data. Once the main loop of the program starts, an automated
guess of the first note’s time frame size is generated based on where the frame starts and the size
of each block. Once time frame size is set, two more loops are created. The inner loop evaluates
individual samples from which the outer loop determines how many periods of the waveform
occur within the time frame. If no consistent periods are found, the assumed frame is expanded.
Once the time frame is reduced to the first complete period found within it, the single
note can be further analyzed.
After the first note has been calculated, the envelope and pitch analyses cooperatively
determine the rest of the notes and their envelopes. When the envelope analyzer receives the
harmonic period N_p from the pitch detection analyzer, the envelope analyzer divides the frame
into C_p one-period cycles c_i(n) (Eq. 1). C_p is determined by Eq. 2. Discrete integration over
c_i(n) by Eq. 3 gives us A_ci, the area of cycle i: the estimate of energy and amplitude.
Following Eq. 4, the average amplitude of the frame is then simply the averaged amplitude of
all cycles.

    c_i(n) = s(n + N_p * i),  0 <= n < N_p;   c_i(n) = 0,  n >= N_p     (Eq. 1)

    C_p = floor(N / N_p)                                                (Eq. 2)

    A_ci = sum_{j=0}^{N_p} | c_i(j) |                                   (Eq. 3)

    A = (1/C_p) * sum_{i=0}^{C_p - 1} A_ci                              (Eq. 4)
The slope S, or attack, of the frame can then be estimated by averaging the difference of adjacent
cycle amplitudes (Eq. 5):
    S = (1/C_p) * sum_{i=1}^{C_p - 1} ( A_ci - A_c(i-1) )               (Eq. 5)
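A minimal C sketch of Eqs. 2 through 5, under the assumption that the frame is a flat float array and that C_p stays small; all names are illustrative rather than taken from the implementation:

```c
#include <math.h>

/* Hypothetical sketch: split a frame of N samples into C_p = floor(N/N_p)
 * one-period cycles, integrate |c_i(j)| for each cycle amplitude A_ci
 * (Eq. 3), then average the amplitudes (Eq. 4) and the differences of
 * adjacent amplitudes (Eq. 5, the attack slope). */
void envelope_stats(const float *frame, int N, int Np,
                    float *avg_amp, float *slope)
{
    int Cp = N / Np;                   /* Eq. 2: number of full cycles */
    float amp[64];                     /* assumes Cp <= 64 for the sketch */
    float sum_amp = 0.0f, sum_diff = 0.0f;

    for (int i = 0; i < Cp; ++i) {
        float a = 0.0f;
        for (int j = 0; j < Np; ++j)   /* Eq. 3: discrete integration */
            a += fabsf(frame[i * Np + j]);
        amp[i] = a;
        sum_amp += a;
        if (i > 0)                     /* Eq. 5 term: adjacent difference */
            sum_diff += amp[i] - amp[i - 1];
    }
    *avg_amp = sum_amp / Cp;           /* Eq. 4: average amplitude */
    *slope   = sum_diff / Cp;          /* Eq. 5: averaged differences */
}
```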
In the situation where C_p = 4, the frames are split up as shown in Figure 3. The frame size is
shown to be one period wide within a larger frame. The original frame is split into smaller time
frames until a single period occurs within a time frame.

Figure 3. Time frame distribution.
II. Algorithm State Machine Design
The state machine, depicted in Figure 4, consists of four different states. Each plays a significant
role in ensuring system reliability.
Figure 4. Music transcription system state machine.
A. Initialization State
The Initialization state represents the starting point of the system. During this state, the
system estimates the level of noise during silence. The system remains in the Initialization state
until a significant amplitude change is detected.
B. New Note State
The New Note state is a state of uncertainty. The system attempts to determine whether or
not a note has been encountered. When the system reaches the New Note state, three options are
weighed. The first option returns the system to the Initialization state; this occurs only if the
system has yet to be in the Stable Note state and the frequency is currently unstable. The system
recognizes the signal not as a note but as high-amplitude noise. The second option moves the
system into the Stable Note state. This state transition occurs if the system detects a stable
frequency over a certain period of time. The third option allows the system to linger in the New
Note state. If the change in frequency is not consistent enough to move the system to the Stable
Note state, the system remains in the New Note state.
C. Stable Note State
The system reaches the Stable Note state when the system is certain a note has been
detected. The system stays in the Stable Note state only if the amplitude level is fairly constant or
slightly decreasing, and the frequency produced is constant. If the frequency changes or
amplitude sharply increases, the system returns to the New Note state. The system moves on to
the Pause state if the amplitude decreases significantly, down into the silence level.
D. Pause State
The Pause state marks a halt between notes: a silence of a certain duration.
The state is characterized by very low amplitudes and lack of a constant frequency. The system
leaves this state for the New Note state if the system detects a stabilized frequency and a
significant increase in amplitude. Otherwise the system remains in the Pause state.
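A minimal sketch of the four states in C, with the transition conditions condensed into illustrative boolean flags; the real system derives these conditions from amplitude and frequency measurements over several frames:

```c
/* Hypothetical sketch of the note-tracking state machine of Figure 4.
 * State names follow the report; the inputs stable_freq, amp_rise,
 * amp_drop, and been_stable summarize the amplitude/frequency tests
 * described in the text. */
typedef enum { INIT, NEW_NOTE, STABLE_NOTE, PAUSE } State;

State next_state(State s, int stable_freq, int amp_rise, int amp_drop,
                 int been_stable)
{
    switch (s) {
    case INIT:                 /* leave on a significant amplitude change */
        return amp_rise ? NEW_NOTE : INIT;
    case NEW_NOTE:
        if (stable_freq)  return STABLE_NOTE;
        if (!been_stable) return INIT;     /* loud noise, not a note */
        return NEW_NOTE;                   /* linger until freq settles */
    case STABLE_NOTE:
        if (amp_drop)                 return PAUSE;    /* fell to silence */
        if (!stable_freq || amp_rise) return NEW_NOTE; /* note changed */
        return STABLE_NOTE;
    case PAUSE:                /* wait for the next onset */
        return (stable_freq && amp_rise) ? NEW_NOTE : PAUSE;
    }
    return s;
}
```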
III. Implementation
A. Equipment
The core of the system makes use of Texas Instruments’ C67xx DSP boards, namely the
DSK C6713. The DSK C6713 was chosen for its faster speed over the older C6711 and for its
USB compatibility, which allows flexibility in use and portability. The supplied Code Composer
Studio program was used to write the DSP program in C because of its native support of the
DSK C6713 and its convenience. On the user end, MATLAB was employed to display the
resultant data in graphical form. MATLAB was chosen because of its simplicity of
implementation and the given time constraints. As an input device, a Shure SM57 instrument
microphone was used because of its accuracy and sturdiness.
B. Overview
Monophonic music is input through a microphone connected to the “mic in” input to the
DSK C6713. The input is stored in buffers of a certain frame size. The buffered data is then
processed using the aforementioned pitch detection and timing analysis algorithms. The
processed data is then relayed from the buffers to the host and further processed through the state
machine algorithm, and then presented on a graphical display.
The implementation of the monophonic decoder system required two parallel phases: the
backend raw data processing and the front-end graphical display. The analysis and processing to
extract note and time length values involved translating MATLAB code written in the early
design phase into C code optimized for the C67 family of DSP processors. The graphical display
was developed in MATLAB and displays a “piano roll” representation of the incoming data
stream.
The next two sections, “Digital Signal Processing” and “Host Processing,” discuss the
implementation in separate parts in relation to the respective machines containing the algorithms.
C. Digital Signal Processing
1. Design to Implementation
After functionality of the algorithm was verified in MATLAB during the design phase,
the algorithm was implemented in C using Code Composer Studio (CCS). The CCS
implementation was verified with the same test vectors used for the MATLAB implementation.
Storing test waveforms as files allowed the advantage of using CCS’s FileIO functionality that
simulates a buffer by periodically reloading an array from file data.
Paralleling the MATLAB implementation, the pitch detection algorithm was
implemented first and verified using one-note waveform segments. Initially, the pitch detection
algorithm ran surprisingly slowly. Inspection of the assembly code showed little utilization of the
C6713’s parallel processing capabilities.
Upon conversion to C, the timing analysis initially performed poorly as well. Although
the results were accurate, the MATLAB-oriented program structure did not lend itself to the DSP
chip architecture. The code was then re-structured to eliminate conditionals wherever possible,
ensuring that often-executed blocks performed as little arithmetic as possible.
2. Process Flow
Analog input is sampled from the “mic” input into a buffer that holds two frames, each
containing 2,000 samples, for a total of 4,000 samples. A single HST pipe employs Texas
Instruments’ new Parallel Input/Output (PIO) module, currently available only for the DSK
C6713 board, to translate live analog input to stereo digital data. The pipe connects the two-
frame buffer to the program, providing frames of data as they are ready to be processed. Figure 5
illustrates the process.
Figure 5. Data pipe process flow.
Availability and ready-status are tracked by bit-masking software interrupts (SWI) utilized
by the data pipe that calls the pitch detection function. The SWI is initialized with a mailbox
value of three. Upon a read, the mailbox value will be bit masked to a value of one, and,
following a write, will be bit masked to a value of zero, thereby activating the interrupt and
resetting the mailbox value to three. A read action will store two frames in the buffer to reduce
the delay penalty incurred by sampling the outside signal while storing a relatively large amount
of data.
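The handshake can be sketched as follows, modeled on DSP/BIOS-style SWI mailbox semantics (clearing bits with an and-not mask); the struct and helper names are hypothetical, and the "interrupt" is reduced to a counter:

```c
/* Hypothetical model of the SWI mailbox handshake described above:
 * the mailbox starts at 3 (both bits set), a buffer read clears one
 * bit (3 -> 1), a buffer write clears the other (1 -> 0); at 0 the
 * interrupt activates and the mailbox resets to 3. */
#define MBX_READY 3u           /* bits: read pending | write pending */

typedef struct {
    unsigned mailbox;
    int fired;                 /* counts SWI activations */
} Swi;

static void swi_andn(Swi *swi, unsigned mask)
{
    swi->mailbox &= ~mask;     /* clear the completed event's bit */
    if (swi->mailbox == 0) {   /* both events done: run the SWI */
        swi->fired++;
        swi->mailbox = MBX_READY;
    }
}

void on_frame_read(Swi *swi)  { swi_andn(swi, 0x2); }  /* 3 -> 1 */
void on_frame_write(Swi *swi) { swi_andn(swi, 0x1); }  /* 1 -> 0, fires */
```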
Once the pitch detection function has completed, through a second data pipe with a
similar bit-masking software interrupt setup, an offset float value representing the length of a
period in samples is supplied to the timing analysis function that, in turn, returns variance and
amplitude values. An array of floats containing the offset, variance, and amplitude is then
relayed through real-time data exchange (RTDX) to the host computer for further processing.
IV. Host Processing
A. Note Detection
Contrary to the original design, the note detection state machine resides on the host
machine. Testing showed that the state machine’s multi-frame interpretation was processed more
efficiently on the host computer. However, the core algorithm remains faithful to the
original design.
Note detection makes use of RTDX between the C67xx board and MATLAB. After each
frame of data is analyzed by the board, the MATLAB-coded program receives a float array
containing offset, amplitude, and variance. The program uses a series of “if” statements to
determine the note frequency being played. If the given period does not match the period of a
note exactly, the closest note is selected.
B. Graphical Display
The process flow ends with the graphical display. Since the DSP board and
MATLAB are involved in real-time data exchange, the graphical display, for simplicity, is
MATLAB-based. Shown in Figure 6 is the “piano roll” display. The display employs
MATLAB’s imagesc function that translates a matrix of numbers into a grayscale image. The
display is composed of two juxtaposed matrices: the left matrix containing the note labels, and
the right matrix containing the incoming stream of note data. The left matrix is created by
reading a series of JPG images containing the names of each note, A through A-flat, and
displaying them on the screen. This matrix is created only once, during the initialization phase,
and is set as the left-most column during every update. During initialization, the right side is
initialized to a matrix of all white pixels, a color value of 255 in grayscale. Then, certain rows
and columns are set to a shade of gray to create a grid. The horizontal axis displays time while
the vertical axis displays pitch. The horizontal lines are spaced to be at the edges of the note
indexes in the left matrix.
Figure 6. One octave of the “Piano Roll” representation of music in MATLAB.
When a note ends, the display is updated. The state machine sends the pitch, length, and
start time of the note to the graphical display. The pitch value determines the row where the note
will be written. The length value specifies the note’s length in columns. The start time gives the
column at which the note begins. The display then finds a position in the image based on the
start time and pitch and
proceeds to create an array of zeros, seen as a black bar, four times as long as the number of
frames for which the note was played. Thus, a length of four pixels represents one frame.
Once the matrix array has filled 250 frames in the horizontal direction, the graphical display calls
a scrolling function that drops the left fifth of the right hand matrix and appends a blank matrix
of columns (white with the gray grid) to the end of the right side. At this point the dropped data
is lost to the graphical display, but is stored in an array of notes for future reference.
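The drawing and scrolling steps can be sketched in C with a byte matrix standing in for MATLAB's image; the dimensions and names are illustrative, not taken from the implementation:

```c
#include <string.h>

/* Hypothetical sketch of the piano-roll update: a note is drawn as a
 * run of black (0) pixels, four columns per frame, in the row given by
 * its pitch; when the roll fills up, the left fifth is dropped and
 * blank (white, 255) columns are appended on the right. */
#define ROWS         12
#define COLS         1000        /* 250 frames x 4 pixels per frame */
#define PX_PER_FRAME 4

static unsigned char roll[ROWS][COLS];

void draw_note(int pitch_row, int start_frame, int len_frames)
{
    int c0 = start_frame * PX_PER_FRAME;
    int c1 = c0 + len_frames * PX_PER_FRAME;
    for (int c = c0; c < c1 && c < COLS; ++c)
        roll[pitch_row][c] = 0;              /* black bar */
}

void scroll_roll(void)
{
    int drop = COLS / 5;                     /* drop the left fifth */
    for (int r = 0; r < ROWS; ++r) {
        memmove(&roll[r][0], &roll[r][drop], COLS - drop);
        memset(&roll[r][COLS - drop], 255, drop);  /* fresh white columns */
    }
}
```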
V. Results
A. Testing
Once the coding was completed, tests were run to ensure robustness of the algorithms, the
ability to input live music, and, most importantly, real-time processing.
The algorithms initially proved unreliable and broke occasionally due to buffer lag and
initial misunderstandings of RTDX and the PIO.
Live music input proved to be much less of an issue, requiring only slight adjustments of
the noise floor calculations for proper detection.
Real-time processing was an issue from the start. For real-time operation, data
processing had to complete within 45.3 ms, the length of one buffer of 2,000 samples at a
sampling rate of 44,100 Hz (2,000 / 44,100 = 45.35 ms). Initial testing yielded processing times
much greater than the real-time threshold, so optimization procedures were sought and executed.
On the display side, the graphics lagged behind the DSP side. Since the graphical display
waits until the end of a note to display it, and since MATLAB rebuilds the array during the
update, the display appears to lag and is not as fast as a text-only display.
B. Optimization
To improve performance and increase processing speed, the algebraic structure of the
algorithm was reorganized to allow for parallel processing. Although some low-level operations
changed drastically, the high-level functionality remained the same. For instance, instead of
summing the difference of two integer arrays, the algorithm sums the product of two floating-point arrays, allowing parallel addition and multiplication. This method decreases computation
time by about a factor of two.
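The restructured inner loop might look like the following sketch, where the score is maximized (a product sum peaks when the lagged signal aligns with the original) instead of a difference sum being minimized; names are illustrative:

```c
/* Hypothetical sketch of the restructuring described above: the
 * difference sum of Eq. 1 is replaced by a floating-point
 * multiply-accumulate, which the C67x can schedule as parallel
 * multiplies and adds. The best lag is the one that MAXIMIZES this
 * score rather than minimizing a difference. */
float lag_score(const float *s, int half, int lag)
{
    float acc = 0.0f;
    for (int n = 0; n < half; ++n)
        acc += s[n] * s[n + lag];   /* pipelines well on the C67x */
    return acc;                     /* large when s(n) and s(n+lag) align */
}
```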
Texas Instruments’ freely available C67x library was also used whenever possible. This
library provided optimized routines for a number of common tasks, including convolution and
other general mathematic functions. A number of these were employed to meet real-time
deadlines.
Structurally, the Data Pipe (PIP) method of device-independent communication on the
DSP board was chosen over the Streamed Input/Output (SIO) method because of its greater
flexibility and lower overhead. The PIP method also allows the pitch detection and timing
analysis to be multithreaded.
C. Final Results
Following the optimization, the system successfully responded in real-time to live
musical input. Processing times now fall in the 25 ms range, almost half the real-time
threshold. Pitches are correctly detected 90% of the time, while note lengths are detected
to within 25 ms.
VI. Conclusion
The real-time monophonic music decoder was successfully implemented on the TI DSK
C6713 after its algorithm was verified in MATLAB. Most delays in implementation were due to
a lack of familiarity with the Code Composer Studio software suite and its powerful set of
debugging tools. However, as CCS was learned by members of the team, the rate of development
increased significantly.
Given ample time, a more involved graphical display could have been implemented into a
streamlined graphical user interface that responds in real-time. In hindsight, integration of the
DSP part of the system with the host part of the system at an earlier point in time could have
made troubleshooting easier during the testing phase.