Final Report, 4/19/05, Spring 2005: Group 3
Austin Assavavallop, William Feaster, Greg Heim, Philipp Pfeiffenberger, Wamba-Kuete Yves
ECE4006E Senior Design Project
Georgia Institute of Technology, School of Electrical and Computer Engineering
http://www.ece.gatech.edu/academic/courses/ece4006/05spring/ece4006e/group03/

I. Introductory Design Theory

The nature of a digital signal processing (DSP) system demands both simplicity, to keep computing-cycle usage lean and approach real-time processing, and accuracy, to ensure data quality and robustness. The design therefore specifies a multithreaded analysis of the input signal: autocorrelation analysis, or "Pitch Detection," determines note frequencies, while time frame analysis, or "Envelope Analysis," determines note envelopes. Figure 1 illustrates the basic concept.

Figure 1. System Overview.

A monophonic signal is input into the DSP system and is processed simultaneously through envelope analysis and pitch detection. The "Controller" is the host computer that consolidates the information supplied by the two threads.

A. Pitch Detection Analysis

The autocorrelation pitch detection analysis determines the octave and the specific note. As with the time frame envelope analysis, presets must be calculated for optimal operation. The required presets are the sampling rate, the number of guesses to be processed, and the maximum octave limit on the musical scale. Initial arrays are set according to these presets, and two main loops begin processing. The outer loop traverses each octave, while the inner loop traverses each note. The autocorrelation function, demonstrated visually in Figure 2, shows that the amplitude values at the intersections of the signal and the red lines are very closely correlated, resulting in a very low autocorrelation value.

Figure 2. Autocorrelation illustration.

The system samples the input signal at 44,100 Hz, yielding 1,603 samples per frame. This array of
samples is referred to as s_i(n), n = 0..N, where i represents the frame number. To perform the autocorrelation calculation using Eq. 1, where N_p is the autocorrelation result (the period in samples) and n is the sample index, each frame must contain at least two full periods of the signal.

$$N_p = \arg\min_{N_p} \sum_{n=0}^{N/2} \left| s(n) - s(n - N_p) \right| \quad \text{(Eq. 1)}$$

The frame size is determined by the lowest expected frequency of the input signal. This value is used to determine the closest guess to a correct note in a particular octave. The design limits the lowest expected frequency to 55 Hz, corresponding to an "A" in the first octave of a standard twelve-note chromatic scale. Beginning with that note, the system calculates the frequency of an incoming note based on the iterations of the two loops. From the calculated frequency, the offset is also found. Next, the input note is compared sample by sample to the ideal note at the calculated frequency. If the input note matches the current ideal note more closely than the previously guessed note, the current ideal note is set as the best guess for the input note's frequency. If the previous guess is closer, no changes occur and the iterations resume. This process continues through the entire musical scale to ensure the absolute best guess is determined.

B. Envelope Analysis

In the time frame envelope analysis, the incoming monophonic signal helps determine the block size for the time frame function. This preset allows the function to set certain parameters before any processing occurs, allowing quicker and more accurate readings of input data. Once the main loop of the program starts, an automated guess of the first note's time frame size is generated based on where the frame starts and the size of each block. Once the time frame size is set, two more loops are created. The inner loop evaluates individual samples, from which the outer loop determines how many periods of the waveform occur within the time frame.
If no consistent periods are found, the assumed frame is expanded. Once the time frame is reduced to the first complete period found within it, the single note found can be further analyzed. After the first note has been calculated, the envelope and pitch analyses cooperatively determine the rest of the notes and their envelopes.

When the envelope analyzer receives the harmonic period N_p from the pitch detection analyzer, it divides the frame into C_p one-period cycles c_i(n) (Eq. 1), where C_p is determined by Eq. 2. Discrete integration over c_i(n) by Eq. 3 gives A_{c_i}, the area of cycle i: the estimate of energy and amplitude.

$$c_i(n) = \begin{cases} s(n + N_p i) & 0 \le n < N_p \\ 0 & \text{otherwise} \end{cases} \quad \text{(Eq. 1)}$$

$$C_p = \lfloor N / N_p \rfloor \quad \text{(Eq. 2)}$$

$$A_{c_i} = \sum_{j=0}^{N_p} \left| c_i(j) \right| \quad \text{(Eq. 3)}$$

$$A = \frac{1}{C_p} \sum_{i=0}^{C_p - 1} A_{c_i} \quad \text{(Eq. 4)}$$

Following Eq. 4, the average amplitude of the frame is simply the average of the amplitudes of all cycles. The slope S, or attack, of the frame can then be estimated by averaging the differences of adjacent cycle amplitudes (Eq. 5):

$$S = \frac{1}{C_p} \sum_{i=1}^{C_p - 1} \left( A_{c_i} - A_{c_{i-1}} \right) \quad \text{(Eq. 5)}$$

In the situation where C_p = 4, the frames are split up as shown in Figure 3: the frame size is one period wide within a larger frame. The original frame is split into smaller time frames until a single period occurs within a time frame.

Figure 3. Time frame distribution.

II. Algorithm State Machine Design

The state machine, depicted in Figure 4, consists of four states. Each plays a significant role in ensuring system reliability.

Figure 4. Music transcription system state machine.

A. Initialization State

The Initialization state represents the starting point of the system. During this state, the system estimates the level of noise during silence. The system remains in the Initialization state until a significant amplitude change is detected.

B. New Note State

The New Note state is a state of uncertainty. The system attempts to determine whether or not a note has been encountered.
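The envelope quantities of Eqs. 2 through 5 above reduce to a short double loop. The following is an illustrative sketch, with assumed names and structure, not the team's optimized DSP code:

```c
#include <math.h>
#include <stddef.h>

typedef struct {
    double A;  /* average cycle amplitude (Eq. 4) */
    double S;  /* average slope, i.e. attack (Eq. 5) */
} envelope_t;

/* Given a frame s[] of N samples and the detected period Np, split the
 * frame into Cp one-period cycles (Eq. 2), integrate each cycle's
 * magnitude (Eq. 3), then average the cycle amplitudes (Eq. 4) and the
 * adjacent-cycle differences (Eq. 5). */
envelope_t analyze_envelope(const float *s, size_t N, size_t Np)
{
    envelope_t e = { 0.0, 0.0 };
    size_t Cp = N / Np;                     /* Eq. 2: floor(N / Np) */
    double prev = 0.0;
    for (size_t i = 0; i < Cp; i++) {
        double Ac = 0.0;                    /* Eq. 3: area of cycle i */
        for (size_t j = 0; j < Np; j++)
            Ac += fabs((double)s[i * Np + j]);
        e.A += Ac;
        if (i > 0)
            e.S += Ac - prev;               /* one Eq. 5 summand */
        prev = Ac;
    }
    e.A /= (double)Cp;                      /* Eq. 4 */
    e.S /= (double)Cp;                      /* Eq. 5 */
    return e;
}
```

For a frame of constant amplitude the slope S comes out near zero; a growing attack yields S > 0 and a decaying note yields S < 0.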
When the system reaches the New Note state, three options are weighed. The first option returns the system to the Initialization state; this occurs only if the system has yet to be in the Stable Note state and the frequency is currently unstable. The system then recognizes the signal not as a note, but as noise with high amplitudes. The second option moves the system into the Stable Note state; this transition occurs if the system detects a stable frequency over a certain period of time. The third option allows the system to linger in the New Note state: if the change in frequency is not consistent enough to move the system to the Stable Note state, the system remains in the New Note state.

C. Stable Note State

The system reaches the Stable Note state when it is certain a note has been detected. The system stays in the Stable Note state only if the amplitude level is fairly constant or slightly decreasing and the frequency produced is constant. If the frequency changes or the amplitude sharply increases, the system returns to the New Note state. The system moves on to the Pause state if a significant decrease in amplitude, down to the silence level, occurs.

D. Pause State

The Pause state marks a halt between notes: a silence level lasting a certain amount of time. The state is characterized by very low amplitudes and the lack of a constant frequency. The system leaves this state for the New Note state if it detects a stabilized frequency and a significant increase in amplitude. Otherwise, the system remains in the Pause state.

III. Implementation

A. Equipment

The core of the system uses Texas Instruments' C67xx DSP boards, namely the DSK C6713. The DSK C6713 was chosen for its faster speed over the older C6711 and for its USB compatibility, allowing flexibility in use and portability.
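The four states and transitions of Section II map compactly onto a transition function. The sketch below is illustrative: the flag names and the exact stability predicates are assumptions, since the report does not list numeric thresholds.

```c
/* The four states of the note-detection state machine (Figure 4). */
typedef enum { ST_INIT, ST_NEW_NOTE, ST_STABLE, ST_PAUSE } note_state_t;

/* Per-frame observations. How each flag is derived from the amplitude
 * and frequency measurements (thresholds, window lengths) is assumed. */
typedef struct {
    int freq_stable;  /* frequency constant over recent frames */
    int been_stable;  /* a Stable Note state was reached earlier */
    int amp_rising;   /* significant or sharp amplitude increase */
    int amp_silent;   /* amplitude fell to the silence level */
} frame_obs_t;

note_state_t next_state(note_state_t s, frame_obs_t o)
{
    switch (s) {
    case ST_INIT:      /* wait for a significant amplitude change */
        return o.amp_rising ? ST_NEW_NOTE : ST_INIT;
    case ST_NEW_NOTE:  /* the three options described above */
        if (o.freq_stable)  return ST_STABLE;    /* option 2 */
        if (!o.been_stable) return ST_INIT;      /* option 1: noise */
        return ST_NEW_NOTE;                      /* option 3: linger */
    case ST_STABLE:
        if (o.amp_silent)                   return ST_PAUSE;
        if (!o.freq_stable || o.amp_rising) return ST_NEW_NOTE;
        return ST_STABLE;
    case ST_PAUSE:     /* silence until frequency and amplitude return */
        return (o.freq_stable && o.amp_rising) ? ST_NEW_NOTE : ST_PAUSE;
    }
    return s;
}
```

Encoding the machine as a pure function of (state, observations) keeps the per-frame decision cheap and makes each transition easy to test in isolation.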
The supplied Code Composer Studio program was used to write the DSP program in C because of its native support of the DSK C6713 and its convenience. On the user end, MATLAB was employed to display the resultant data in graphical form; MATLAB was chosen for its simplicity of implementation given the time constraints. As an input device, a Shure SM57 instrument microphone was used for its accuracy and sturdiness.

B. Overview

Monophonic music is input through a microphone connected to the "mic in" input of the DSK C6713. The input is stored in buffers of a certain frame size. The buffered data is then processed using the aforementioned pitch detection and timing analysis algorithms. The processed data is relayed from the buffers to the host, further processed through the state machine algorithm, and then presented on a graphical display.

The implementation of the monophonic decoder system required two parallel phases: the back-end raw data processing and the front-end graphical display. The analysis and processing to extract note and time length values involved translating MATLAB code written in the early design phase into C code optimized for the Texas Instruments C67 family of DSP processors. The graphical display was developed in MATLAB and presents a "piano roll" representation of the incoming data stream. The next two sections, "Digital Signal Processing" and "Host Processing," discuss the implementation in relation to the respective machines containing the algorithms.

C. Digital Signal Processing

1. Design to Implementation

After the functionality of the algorithm was verified in MATLAB during the design phase, the algorithm was implemented in C using Code Composer Studio (CCS). The CCS implementation was verified with the same test vectors used for the MATLAB implementation.
Storing test waveforms as files allowed the use of CCS's FileIO functionality, which simulates a buffer by periodically reloading an array from file data. Paralleling the MATLAB implementation, the pitch detection algorithm was implemented first and verified using one-note waveform segments. Initially, the pitch detection algorithm ran surprisingly slowly; inspection of the assembly code showed little utilization of the C6713's parallel processing capabilities. Upon conversion to C, the timing analysis initially performed poorly as well. Although the results were accurate, the MATLAB-oriented program structure did not lend itself to the DSP chip architecture. The code was then restructured to eliminate conditionals wherever possible, ensuring that often-executed blocks performed as little arithmetic as possible.

2. Process Flow

Analog input is sampled from the "mic" input into a buffer that holds two frames, each containing 2,000 samples, for a total of 4,000 samples. A single HST pipe employs Texas Instruments' new Parallel Input/Output (PIO) module, currently available only for the DSK C6713 board, to translate live analog input to stereo digital data. The pipe connects the two-frame buffer to the program, providing frames of data as they are ready to be processed. Figure 5 illustrates the process.

Figure 5. Data pipe process flow.

Availability and ready status are tracked by bit-masking the software interrupt (SWI) used by the data pipe that calls the pitch detection function. The SWI is initialized with a mailbox value of three. Upon a read, the mailbox value is bit-masked to a value of one; following a write, it is bit-masked to a value of zero, thereby activating the interrupt and resetting the mailbox value to three. A read action stores two frames in the buffer to reduce the delay penalty incurred by sampling the outside signal while storing a relatively large amount of data.
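The mailbox handshake just described mirrors the DSP/BIOS SWI_andn style of mechanism: each pipe notification clears one mailbox bit, and when no bits remain the interrupt fires and the mailbox is re-armed. The following host-side model is illustrative only; the bit assignments and function names are assumptions.

```c
/* The mailbox starts at 3 (binary 11): both the read and the write
 * condition are still pending. */
#define MBOX_INIT 0x3u

unsigned mailbox = MBOX_INIT;
int pitch_swi_runs = 0;  /* counts activations of the pitch-detect SWI */

/* Clear the given bits; when none remain, "activate" the software
 * interrupt that calls the pitch detection function, then reset the
 * mailbox to three for the next frame pair. */
void swi_andn(unsigned bits)
{
    mailbox &= ~bits;
    if (mailbox == 0u) {
        pitch_swi_runs++;        /* pitch detection would run here */
        mailbox = MBOX_INIT;     /* reset to three */
    }
}

void on_pipe_read(void)  { swi_andn(0x2u); }  /* 3 -> 1 */
void on_pipe_write(void) { swi_andn(0x1u); }  /* 1 -> 0: SWI fires */
```

The interrupt therefore cannot fire until both the read and the write have been observed, which is exactly the frame-ready condition the data pipe needs.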
Once the pitch detection function has completed, an offset float value representing the length of a period in samples is supplied, through a second data pipe with a similar bit-masking software interrupt setup, to the timing analysis function, which in turn returns variance and amplitude values. An array of floats containing the offset, variance, and amplitude is then relayed through Real-Time Data Exchange (RTDX) to the host computer for further processing.

IV. Host Processing

A. Note Detection

Contrary to the original design, the note detection state machine resides on the host machine. After testing, the state machine's multi-frame interpretation proved to be processed more efficiently on the host computer. However, the core algorithm remains faithful to the original design.

Note detection makes use of RTDX between the C67xx board and MATLAB. After each frame of data is analyzed by the board, the MATLAB program receives a float array containing offset, amplitude, and variance. The program uses a series of "if" statements to determine the note frequency being played. If the given period does not exactly match the period of a note, the closest note is selected.

B. Graphical Display

The process flow ends with the graphical display. Since the DSP board and MATLAB are involved in real-time data exchange, the graphical display is, for simplicity, MATLAB-based. Shown in Figure 6 is the "piano roll" display. The display employs MATLAB's imagesc function, which translates a matrix of numbers into a grayscale image. The display comprises two juxtaposed matrices: the left matrix contains the note labels, and the right matrix contains the incoming stream of note data. The left matrix is created by reading a series of JPG images containing the names of each note, A through A-flat, and displaying them to the screen. This matrix is created only once, during the initialization phase, and is set as the leftmost column during every update.
During initialization, the right side is initialized to a matrix of all white pixels, a color value of 255 in grayscale. Then, certain rows and columns are set to a shade of gray to create a grid. The horizontal axis displays time while the vertical axis displays pitch. The horizontal lines are spaced to fall at the edges of the note indexes in the left matrix.

Figure 6. One octave of the "Piano Roll" representation of music in MATLAB.

When a note ends, the display is updated. The state machine sends the pitch, length, and start time of the note to the graphical display. The pitch value determines the row where the note will be written. The length value contains the total number of columns the note should span. The start time contains a value, in terms of column number, where the next note should start. The display then finds a position in the image based on the start time and pitch and creates an array of zeros, seen as a black bar, four times as long as the number of frames for which the note was played; that is, a length of four pixels represents one frame. Once the matrix array has filled 250 frames in the horizontal direction, the graphical display calls a scrolling function that drops the left fifth of the right-hand matrix and appends a blank matrix of columns (white with the gray grid) to the end of the right side. At this point the dropped data is lost to the graphical display, but it is stored in an array of notes for future reference.

V. Results

A. Testing

Once the coding was completed, tests were run to ensure robustness of the algorithms, the ability to input live music, and, most importantly, real-time processing. The algorithms initially proved unreliable and broke occasionally due to buffer lag and initial misunderstandings of RTDX and the PIO. Live music input proved to be much less of an issue, as it required only slight adjustments of the noise floor calculations for proper detection. From the start, real-time processing was an issue.
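The scrolling step of the graphical display described above can be modeled on a plain byte matrix. This sketch is illustrative: the 12 x 250 dimensions and the names are assumptions, and the gray grid redraw is omitted for brevity.

```c
#include <string.h>

#define ROWS  12    /* one octave of note rows (assumed) */
#define COLS  250   /* 250 frames before a scroll triggers */
#define WHITE 255   /* blank background in grayscale */

unsigned char roll[ROWS][COLS];

/* Drop the left fifth of the right-hand matrix and back-fill the
 * freed columns with white, as the display's scroll function does. */
void scroll_fifth(void)
{
    int drop = COLS / 5;  /* 50 columns */
    for (int r = 0; r < ROWS; r++) {
        memmove(&roll[r][0], &roll[r][drop], (size_t)(COLS - drop));
        memset(&roll[r][COLS - drop], WHITE, (size_t)drop);
    }
}
```

Shifting in place this way avoids reallocating the image matrix on every scroll; only the vacated right-hand columns need to be repainted.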
For real-time operation, data processing had a time limit of 45.3 ms, the length of one buffer of 2,000 samples at a rate of 44,100 Hz. Initial testing yielded processing times much greater than the real-time threshold, so optimization procedures were sought and executed. On the display side, the graphics lagged behind the DSP side: since the graphical display waits until the end of a note to display it, and since MATLAB rebuilds the array during the display, the display appears to lag and is not as fast as a text-only display.

B. Optimization

To improve performance and increase processing speed, the algebraic structure of the algorithm was reorganized to allow for parallel processing. Although some low-level operations changed drastically, the high-level functionality remained the same. For instance, instead of summing the difference of two integer arrays, the algorithm sums the product of two floating-point arrays, allowing parallel addition and multiplication. This method decreased computation time by roughly a factor of two.

Texas Instruments' freely available C67x library was also used whenever possible. This library provides optimized routines for a number of common tasks, including convolution and other general mathematical functions. A number of these were employed to meet real-time deadlines.

Structurally, the Data Pipe (PIP) method of device-independent communication on the DSP board was chosen over the Streamed Input/Output (SIO) method because of its greater flexibility and lower overhead. The PIP method also allows multithreading of the pitch detection and timing analysis.

C. Final Results

Following the optimization, the system successfully responds in real time to live musical input. Processing times now fall in the 25 ms range, almost half the real-time threshold. Pitches are correctly detected 90% of the time, while note lengths are detected to within 25 ms.

VI.
Conclusion

The real-time monophonic music decoder was successfully implemented on the TI DSK C6713 after its algorithm was verified in MATLAB. Most delays in implementation were due to a lack of familiarity with the Code Composer Studio software suite and its powerful set of debugging tools; as the team learned CCS, the rate of development increased significantly. Given ample time, a more involved graphical display could have been implemented as a streamlined graphical user interface that responds in real time. In hindsight, integrating the DSP side of the system with the host side earlier would have made troubleshooting easier during the testing phase.