Speech Recognition Calculator

ECE 4007 Section L02 - Group 08
Alfredo Herrera, John Holmes, Josh Liang, Alex Kee

Table of Contents

Introduction
Design Approach
Hardware
    Hardware Approach
    Hardware Details
        Speech Recognition Integrated Circuit (HM2007)
        Arduino Microcontroller
    Hardware Changes to Original Design
Software
    Arduino Microcontroller Programming
        Overview
        Basic Algorithm
        Modifications to Basic Algorithm
    Graphical User Interface
        Overview
        Serial Communication
        GUI Display
        Computation
        Error-checking
Marketing Approach
    Cost Analysis
Appendix A
Appendix B
Appendix C
Works Cited

Introduction

Calculators on today's market do not adequately accommodate the needs of people with limited mobility. People with limited mobility are typically described as those who have difficulty with precise movements, such as pressing the buttons on small electronic devices. Currently, the calculator market caters to consumers with limited vision, either by enlarging the buttons, by adding voice feedback when buttons are pressed, or both. While such solutions mitigate the problem, they do not fully address the problem that limited-mobility consumers encounter. A better solution is to remove physical interaction with the device entirely. Voice recognition makes this possible.
Through speech, users can operate such a device easily and accurately; limited-mobility users no longer have to fumble with button presses that require precise motor skills.

Voice recognition is typically associated with automated telephone systems, in which users speak to the system as a means of data input. While users are speaking, the system gives no direct feedback about what it has detected, so the user must wait until the end of a sequence of input for the system to confirm it. As a result, voice recognition implementations have a reputation for being inaccurate and time consuming. Our approach to these issues is to provide real-time visual feedback of the spoken data, which allows users to correct misrecognized words as they occur. In addition, conventional voice recognition systems typically require some training before they can recognize speech accurately. By implementing speaker independence, that is, speech recognition without any prior training, any number of different users could operate the device with reliable accuracy.

The primary goal of this project was to address the pitfalls of existing products. This methodology is implemented in the design of our Speech Recognition Calculator (SRC), allowing us to cater fully to the needs of our intended audience. Besides the basic operation of the calculator, we sought to provide additional features to supplement learning and ease operation. These include a remote connectivity option and an LCD display. The objective of remote connectivity is to allow a tutor to interact remotely with a student: the tutor can input equations that the student must solve. The LCD display adds further visual feedback for the user.

Design Approach

Like most voice recognition systems, the SRC is composed of three main components: voice recognition, a microprocessor, and a user interface.
The microprocessor handles communication between the voice recognition components and the user interface, as shown in Figure 1. It is programmed in the C programming language using Arduino's libraries, which allow control of the microcontroller's data pins. Those data pins connect the voice recognition components to the microprocessor. Data flow between the voice recognition components and the microprocessor is unidirectional: the microprocessor can only read data from the voice recognition components. The microprocessor must therefore perform the following functions: decoding the data signals, manipulating the decoded signals, and controlling data transmission to the user interface. The user interface provides real-time visual feedback of the spoken words and of the arithmetic calculations.

Figure 1. High-level overview of interactions between the voice recognition components, microcontroller, and user interface.

Hardware

Hardware Approach

The SRC system is divided into three main components: a graphical user interface (GUI), a speech recognition component, and a microprocessor core. The GUI is implemented in software. The microphone, analog filter, speech recognition integrated circuit (HM2007), and SRAM constitute the speech recognition components. The microcontroller is an Arduino board. Figure 2 is a block diagram of the data interactions between the individual components of the SRC design; the arrows denote the flow of data.

Figure 2: Block diagram of the speech recognition calculator.

Hardware Details

Speech Recognition Integrated Circuit (HM2007)

The HM2007 is a single-chip CMOS speech recognition integrated circuit that analyzes the analog signal obtained from the microphone [1]. Speech is captured by the microphone, filtered by an analog filter, and converted into a digital signal for analysis. The speech signal must be filtered to remove frequencies outside the range of normal speech.
Filtering out these frequencies also reduces the bandwidth of the speech signal, which lowers the required computation. Depending on the mode of the HM2007 (training or recognition), data is either written to or read from the SRAM. The SRAM is divided into data banks, each identified by its own unique 8-bit binary value. In recognition mode, the HM2007 attempts to match the speech signal against the entries in the SRAM and places the matching data bank's 8-bit value on its data bus. If no match is found, a reserved 8-bit value is output instead. Once the HM2007 matches the received speech to a stored word or phrase, the corresponding 8-bit value is held on the eight pins of the data bus, and it changes only when the HM2007 recognizes a different word. The voice recognition components therefore always present the value of the most recently recognized word.

Arduino Microcontroller

Arduino is an open-source electronics prototyping platform based on flexible, easy-to-use hardware and software [2]. The Arduino microcontroller is essential to the SRC because it provides communication between the voice recognition components and the GUI. The 8-bit data bus connects the microprocessor to the HM2007. After the Arduino reads the 8-bit data bus, the program running on the microcontroller decodes and manipulates the 8-bit signals, then sends the ASCII equivalent of the spoken word or phrase over a USB connection to the GUI. The GUI is written in the Processing language, which is based on Java. The software for the Arduino microcontroller and for the GUI is discussed in later sections.
Hardware Changes to Original Design

While we met our goal of nearly eliminating physical user interaction, some proposed features were not included in the final design. The originally selected hardware did not operate reliably, so new hardware had to be selected. The new hardware does not have the same connectivity or flexibility as the original, but it is more reliable.

As planned, the calculator was to feature speaker independence, which would allow a range of users to operate the calculator without training. However, extensive benchmark tests showed that speaker independence is not fully achievable with the current speech recognition chip (HM2007).

The current design of the Speech Recognition Calculator was not the first iteration. The original plan was to construct our own voice recognition circuitry from separately packaged HM2007 and SRAM chips and a microcontroller, assembled on a breadboard, with LEDs added as debugging aids and status indicators. These are the same components used on the prepackaged HM2007 board in the current design. Unlike the current design, however, the original used a Basic Stamp microcontroller rather than the Arduino. Ideally, the design should perform identically with either microcontroller, but this was not the case. One problem was that the Basic Stamp did not recognize the signal output by the HM2007, nor could it manipulate the data held in the HM2007; in other words, we could not get the HM2007 to interface with the Basic Stamp. Another problem was the circuitry itself. Although care was taken when connecting all the wires, there were too many wires and too many opportunities for error, which made the circuit very difficult to debug.
The circuit was rebuilt more than a dozen times in search of a wiring fault, but it never worked. Finally, the LEDs that were supposed to light whenever recognition occurred lit at random intervals, making debugging nearly impossible.

Software

Arduino Microcontroller Programming

Overview

The Arduino microcontroller is programmed with the Arduino compiler, whose language is based on the C programming language. Included with the compiler are Arduino's libraries, which give the programmer access to the external pins of the microcontroller. Every Arduino program must define two functions, setup( ) and loop( ). The setup( ) function is executed once when the Arduino microcontroller is initialized. The loop( ) function is then executed repeatedly, immediately after setup( ), for as long as the microcontroller is running.

Basic Algorithm

Within the setup( ) function, eight pins on the microcontroller are configured as inputs so that they can sense the digital logic levels output by the voice recognition components. The setup( ) function also sets the data rate at which the serial bus transmits and receives data. The serial bus is the communication link between the microcontroller and the graphical user interface, so any data the microcontroller writes to the serial bus is read by the graphical user interface, and vice versa.

The loop( ) function handles all decoding of data from the voice recognition components and all transmission of data to the graphical user interface. It begins by reading the eight input pins. Since the value held on these pins corresponds to a specific word or phrase stored in the SRAM, the loop( ) function decodes that value into the ASCII value of the word or phrase.
The decoded ASCII value is then transmitted to the graphical user interface over the serial bus. This process repeats continuously after the setup( ) function has executed. This is the basic algorithm of the Arduino microcontroller.

Modifications to Basic Algorithm

As noted in the 'Hardware Details' section, the voice recognition system continually outputs the eight-bit value of the last recognized word or phrase. Since the loop( ) function runs continuously, the basic algorithm would transmit an ASCII value to the graphical user interface on every pass, so the last recognized word is retransmitted even when no new word has been recognized. To reduce the amount of data sent to the graphical user interface, a decision step must be added. Each time the loop( ) function runs, it stores both the currently decoded ASCII value and the value decoded on the previous pass. When the two values differ, a new word or phrase has been recognized, and only then is the new ASCII value transmitted over the serial bus. The high-level flowchart is shown in Figure 3, and Appendix B lists the code used to program the Arduino microcontroller.

Figure 3: Arduino programming software flow.

Graphical User Interface

The Processing language is based on Java and places an emphasis on "images, animation, and interactions" [3]. Because Processing suits the goals of the SRC (user-friendliness and graphical interaction), it was selected to handle most of the high-level functions, which include the GUI display and the computation algorithms.

Overview

The Processing program forms the topmost layer of the SRC software design (see Figure 4).
The program runs on a personal computer (PC) with a suitable operating system (Linux, Windows, or Mac OS X) and a Java virtual machine. The lower layer is a program written in C and loaded onto the Arduino's Atmel microcontroller. Data flows in both directions between the two layers over a serial (COM) port.

Figure 4: Software layers (Processing program, serial port, C program, 8-bit bus, HM2007).

The Processing program itself is composed of three components: serial communication, the display GUI, and computation. A serial communication scheme was necessary in order to interface with the C code. The display GUI provides the user with friendly, easy-to-read visual feedback. The computation algorithms perform the basic arithmetic functions of the calculator.

Serial Communication

Serial communication between the C program and the Processing program is established by creating a Serial object in the Processing code. A baud rate of 9600 bps is used for wider compatibility, since speed is of little importance here. Figure 5 shows the sequence in which serial communication takes place.

Figure 5: Processing software flow (Serial.read(), delimiter check, GUI display, Serial.clear()).

The delimiter is necessary in order to give the user feedback while interacting with the SRC. The Processing display GUI does not display another number or operator until the delimiter (usually the word "OK") has been spoken. Likewise, the result of a computation is not displayed until a keyword (usually the word "EQUALS") has been spoken. For example, the expression 2+2 can be evaluated by speaking "2", "OK", the addition operator, "OK", "2", "OK", "EQUALS", "OK", which causes the calculator to display the result, 4.

GUI Display

A significant amount of research went into the design of the GUI display, since the SRC is intended for users with all types of disabilities, including those with low vision.
A study by Hill and Scharff suggests that the most readable combination of font color, font style, font type, and background color is green italicized Times New Roman on a yellow background [4]. The GUI features a high-contrast, high-visibility display with similar attributes, as well as a large font size, to aid visual user feedback.

Computation

Once the speech data has been read from the serial bus and interpreted by the Processing code, the Processing program stores it in a vector (a dynamic array). When the keyword "EQUALS" is detected, the vector is processed and the result is sent to the GUI display. The vector is parsed element by element, and a variable stores a running total. When an arithmetic operator is encountered, the running total and the next number are passed to a method corresponding to the operation (addition, subtraction, division, etc.), which performs the computation and returns the result as a single value.

Error-checking

Error-checking is performed on the fly with a Boolean state variable that records whether two arithmetic operators have been spoken consecutively. If they have, the Processing code keeps only the last operator spoken. Appendix C lists the code for creating the Processing GUI.

Marketing Approach

The market for people with limited mobility has never been adequately addressed. People with limited mobility have long been excluded from devices that require precise motor movements, such as the button presses on handheld calculators. The calculator market caters to consumers with limited vision, either by enlarging the buttons, by adding voice feedback when buttons are pressed, or both. While such solutions help users with limited vision, they do not fully address the problems of limited-mobility users.
The only complete solution is to remove physical interaction with the device entirely, and voice recognition technology now makes this possible. The SRC is an all-in-one, disability-friendly calculator that introduces a hands-free speech recognition and speech feedback system. The calculator can also be used for remote tutoring. Since the product options currently on the market are very limited, the price of the calculator should not impede its market success. The SRC will sell for around $285 per unit. Many existing speech recognition calculators on the market today are software products for laptop or desktop computers. The competing Orion speech recognition calculator retails for around $280. This gives the SRC an advantage, because it offers a comparable price with superior features. The SRC will be the only product on the market with hands-free speech recognition, and it will offer unparalleled support for users who need this capability.

Cost Analysis

As seen in Table A4, incorporating the Arduino board into the SRC design has lowered the total cost of materials to approximately $231. If the unassembled ImagesCo SR-06 circuit kit is purchased and assembled in volume, the cost of materials would drop considerably when the SRC is mass manufactured. The pre-assembled ImagesCo SR-07 circuit kit was used to build the prototype, which confirmed that the circuit works with the SRC; the SR-07 Speech Recognition circuit is essentially the SR-06 kit assembled and tested. Assembly and testing are included in the total manufacturing cost. The SR-07 kit cost $179.95 and the SR-06 kit costs $114.95, so using the SR-06 lowers the unit cost by $65.00. At a unit price of $285, this yields an even larger profit margin.
The overall labor cost, based on four entry-level engineers at $26 an hour for 100 hours each, is $10,400. After overhead, fringe benefits, and sales expenses, the sale of 100,000 units at $285 each yields a net profit of $13.6 million. Tables detailing the cost analysis are given in Appendix A.

Appendix A

Table A1. SRC Price Calculations
    Fringe Benefits                          30% of labor
    Overhead                                 120% of materials, labor & fringe
    Sales Expense                            10% of selling price

Table A2. Labor Cost
    Engineer 1 (100 hrs x $26)               $2,600
    Engineer 2 (100 hrs x $26)               $2,600
    Engineer 3 (100 hrs x $26)               $2,600
    Engineer 4 (100 hrs x $26)               $2,600
    Total Labor Cost                         $10,400

Table A3. Development Cost
    Parts                                    $231
    Labor                                    $10,400
    Fringe Benefits, % of Labor              $3,120
    Subtotal                                 $13,751
    Overhead, % of Matl, Labor & Fringe      $16,501
    Total                                    $44,003

Table A4. Bill of Materials for 1 Unit
    HM2007                                   $10
    8K x 8 SRAM                              $8.85
    74LS373                                  $0.80
    7448 (x 2)                               $1.50
    LCD Display                              $20
    Arduino Microcontroller Board            $49
    Keypad                                   $10
    Protoboard                               $15
    Microphone                               $5
    Wires, Resistors, and Capacitors         $12
    Parallax Internet Netburner Kit          $99
    Total                                    $231

Table A5. Determination of Selling Price (based on 100,000 units)
    Parts Cost                               $231
    Assembly Labor                           $30
    Testing Labor                            $10
    Total Labor                              $40
    Fringe Benefits, % of Labor              $12
    Subtotal                                 $283
    Overhead, % of Matl, Labor & Fringe      $340
    Subtotal, Input Costs                    $120
    Sales Expense                            $29
    Amortized Development Costs              $0
    Subtotal, All Costs                      $149
    Profit                                   $136
    Selling Price                            $285
    Net Profit                               $13,600,000

Appendix B

// Declare our functions and variables
int ReadMatchPins();
boolean Comparer(int a[], int b[]);

// The Arduino pins we read, in data-bus bit order
int pins[] = {2, 3, 6, 7, 8, 9, 4, 5};

// Our masks: the bit pattern on the data bus for each recognized word
int mask[10][8] = {
  {1, 1, 0, 1, 1, 1, 1, 1},  // zero -> OK button (previously {1, 0, 0, 0, 0, 0, 0, 0})
  {1, 0, 0, 0, 0, 1, 0, 0},  // one
  {1, 0, 0, 0, 1, 0, 0, 0},  // two
  {1, 0, 0, 0, 1, 1, 0, 0},  // three
  {1, 0, 0, 1, 0, 0, 0, 0},  // four
  {1, 0, 0, 1, 0, 1, 0, 0},  // five
  {1, 0, 0, 1, 1, 0, 0, 0},  // six
  {1, 0, 0, 1, 1, 1, 0, 0},  // seven
  {1, 0, 1, 0, 0, 0, 0, 0},  // eight
  {1, 0, 1, 0, 0, 1, 0, 0}   // nine
};

char command;

void setup() {
  // Set the data-bus pins as inputs
  for (int i = 0; i <= 7; i++) {
    pinMode(pins[i], INPUT);
  }
  Serial.begin(9600);  // originally beginSerial(9600) in early Arduino releases
}

void loop() {
  int value;
  int tempValue;
  int previousValue;
  boolean firstTime = true;
  boolean firstSend = true;

  command = Serial.read();  // originally serialRead(); returns -1 if no data
  if (command == 'r') {     // 'r' from the GUI starts streaming
    do {
      tempValue = ReadMatchPins();
      if (tempValue >= 0) {  // -1 means no match; 0 is the OK word
        value = tempValue;
        if (firstTime) {
          previousValue = value;
          firstTime = false;
        }
        if (previousValue != value) {
          firstSend = true;  // a new word has been recognized
        }
        if (firstSend) {
          Serial.write(value + 48);  // ASCII digit; originally serialWrite()
          Serial.println();          // originally printNewline()
          firstSend = false;
        }
        previousValue = value;
      }
      command = Serial.read();
    } while (command != 's');  // 's' from the GUI stops streaming
  }
}

// Returns the index of the mask matching the current pin levels,
// or -1 if nothing is matched
int ReadMatchPins() {
  int pinValues[8];
  for (int i = 0; i <= 7; i++) {
    pinValues[i] = digitalRead(pins[i]);
  }
  // Return on the first match so a match on the last mask is not overwritten
  for (int index = 0; index < 10; index++) {
    if (Comparer(pinValues, mask[index])) {
      return index;
    }
  }
  return -1;
}

// Returns true if the two 8-element arrays are identical
boolean Comparer(int a[], int b[]) {
  for (int i = 0; i <= 7; i++) {
    if (a[i] != b[i]) {
      return false;
    }
  }
  return true;
}

Appendix C

// Import libraries
import processing.serial.*;

// Declarations
String inString = new String(" ");  // Accumulated input from the serial port
String pString = new String("1");
String holder = new String("1");    // Latest chunk read from the serial port
Serial myPort;                      // The serial port
PFont fontA;                        // The display font
int equal = '=';                    // ASCII equals

// Set the left and top margin
int margin = 6;
int gap = 30;

void setup() {
  size(600, 600);
  background(0);
  fontA = loadFont("CourierNew36.vlw");
  textFont(fontA, 48);
  textAlign(LEFT);

  // List all the available serial ports:
  // println(Serial.list());

  // Select the second serial port (tested on John's computer and seems to be typical)
  myPort = new Serial(this, Serial.list()[1], 9600);

  // Read buffer until we hit the equals character
  // myPort.bufferUntil(equal);
}

void draw() {
  myPort.write(114);               // send 'r' to start the serial communication
  // myPort.write(115);            // sending 's' here causes spamming
  holder = myPort.readString();    // temporary holder; null if nothing was received
  myPort.clear();
  if (holder != null) {
    inString = inString.concat(holder);
    print(inString);
  }
  delay(2000);
}

void stringParse(String p) {
  if (p.indexOf("+") != -1) {  // test for addition
    // addition handling is not included in this listing
  }
}

Works Cited

[1] Images Scientific Instruments Inc., "How to Build a Speech Recognition Circuit," [Online]. Available: http://www.imagesco.com/articles/hm2007/SpeechRecognitionTutorial01.html. [Accessed: Dec. 1, 2007].
[2] Arduino, "Arduino Board," [Online]. Available: http://www.arduino.cc/. [Accessed: Dec. 1, 2007].
[3] B. Fry and C. Reas, "Processing," 2007. [Online]. Available: http://www.processing.org/
[4] A. L. Hill and L. F. V. Scharff, "Readability of Websites with Various Foreground/Background Color Combinations, Font Types and Word Styles," 1997. [Online]. Available: http://hubel.sfasu.edu/research/AHNCUR.html