Speech Recognition
Calculator
ECE 4007
Section L02 - Group 08
Alfredo Herrera
John Holmes
Josh Liang
Alex Kee
Table of Contents
Introduction
Design Approach
Hardware
  Hardware Approach
  Hardware Details
    Speech Recognition Integrated Circuit (HM2007)
    Arduino Microcontroller
  Hardware Changes to Original Design
Software
  Arduino Microcontroller Programming
    Overview
    Basic Algorithm
    Modifications to Basic Algorithm
  Graphical User Interface
    Overview
    Serial Communication
    GUI Display
    Computation
    Error-checking
Marketing Approach
  Cost Analysis
Appendix A
Appendix B
Appendix C
Works Cited
Introduction
Calculators on today’s market do not adequately accommodate the needs of people with
limited mobility, that is, people who have difficulty with precise movements such as
button presses on small electronics. Currently the market for accessible calculators is
aimed at consumers with limited vision: products enlarge the surface area of the buttons,
add voice feedback when buttons are pressed, or both. While such solutions mitigate the
problem, they do not fully address the difficulties that consumers with limited mobility
encounter. A better solution is to remove physical interaction with the device
altogether. Voice recognition makes this possible. Through speech, users can operate
such a device with ease and accuracy; users with limited mobility no longer have to
struggle with button presses that require very precise motor skills.
Voice recognition is typically associated with automated telephone systems, where users
speak to the system as a means of data input. While users are speaking to the system,
there is no direct feedback on the speech detected. As a result, the user must wait for
the system to verify correct data input at the end of a sequence of data. Voice
recognition implementations are therefore perceived as inaccurate and time consuming.
Our approach to these issues is to provide real time visual feedback of the data spoken.
This allows users to correct misrecognized words in real time. In addition, voice
recognition is commonly assumed to require some pre-operation training for accurate
recognition. However, by implementing speaker independence, that is, speech recognition
without any pre-operation training, an unlimited number of different users can operate
the device with reliable accuracy. The primary goal of this project was to address the
pitfalls of existing products. This approach is implemented in the design of our
Speech Recognition Calculator (SRC), allowing us to cater to the needs of our
intended audience.
Besides the basic operation of the voice recognition calculator, we sought to provide
additional features to supplement learning and to ease operation. These include a remote
connectivity option and an LCD display. The objective of remote connectivity is to allow
a tutor to interact remotely with a student. A tutor can input equations that must be
solved by the student. The LCD display provides additional visual feedback to
the user.
Design Approach
The system is composed of three main components: voice recognition, microprocessor, and
user interface. The microprocessor handles the communication between the voice
recognition components and the user interface, as shown in Figure 1.
The microprocessor is programmed in the C programming language using Arduino’s
libraries, which allow control of the data pins on the microprocessor. Those data pins
provide a means of communication between the voice recognition components and the
microprocessor.
The data flow from the voice recognition components to the microprocessor is
unidirectional, as the data can only be read from the voice recognition components. As a
result, the microprocessor must perform the following functions: decoding of data
signals, manipulation of the decoded signals, and control of data transmission to the user
interface. The user interface provides real time visual feedback of spoken words and the
arithmetic calculations.
Figure 1. High level overview of interactions between voice recognition
components, microcontroller, and user interface.
Hardware
Hardware Approach
The SRC system is divided into three main components: a graphical user interface
(GUI), a speech recognition component, and a microprocessor core. The GUI is
implemented in the software design. The microphone, analog filter, speech recognition
integrated circuit (HM2007), and the SRAM constitute the speech recognition
components. The microcontroller used is an Arduino processor. Figure 2 is a schematic
of the data interactions between the individual components that contribute to the SRC
design. The arrows denote flow of data.
Figure 2: Block diagram showcasing speech recognition calculator.
Hardware Details
Speech Recognition Integrated Circuit (HM2007)
The HM2007 is a single-chip CMOS speech recognition IC that analyzes the analog
signal obtained from the microphone [1]. Speech signals are captured with a microphone,
then filtered by an analog filter and converted into a digital signal. The speech
signal must be filtered to remove any frequencies outside the range of normal speech.
Filtering out these frequencies also reduces the bandwidth of the speech signal, which
reduces the computational power required. Depending on the operating mode of the HM2007
(training or recognition), data is either written to or read from the SRAM. The SRAM is
divided into data banks, each of which has its own unique 8 bit binary value. If
the HM2007 is in recognition mode, it will attempt to match the speech signal
with the entries in the SRAM and return the corresponding data bank’s 8 bit binary value
on its data bus. If no match is detected, a reserved 8 bit value is transmitted.
When the HM2007 matches the received speech signal to a specific stored phrase or
word, the corresponding 8 bit value is held constantly on the eight pins of the data
bus. The 8 bit value only changes when the HM2007 recognizes a different speech
signal. Therefore, the voice recognition components always transmit the value of the
last speech signal recognized.
Arduino Microcontroller
Arduino is an open-source electronics prototyping platform based on flexible, easy-to-use
hardware and software [2]. The Arduino microcontroller is essential to the design of the
SRC as it provides communication between the voice recognition components and the
graphical user interface (GUI). The 8 bit data bus provides communication between the
microprocessor and the HM2007.
After the Arduino microcontroller reads the 8 bit data bus, its program
decodes and manipulates the 8 bit signals. After processing the 8 bit signals, the
Arduino microcontroller sends the ASCII equivalent of the spoken word or phrase
through a USB connection to the GUI. The GUI is written in the Processing language,
which is based on Java. The details of the software used to program the Arduino
microcontroller and the GUI will be discussed in later sections.
Hardware Changes to Original Design
While we have met our goal by nearly eliminating all physical user interaction, some
proposed features failed to be included in our final design. The original hardware
selected for this project did not operate in a reliable manner. As a result, new hardware
had to be selected. The newer hardware does not have the same connectivity or
flexibility as the original. However, it does function in a more reliable manner.
Our calculator was originally planned to feature speaker independence, which
would have allowed a range of users to operate the
calculator without training. However, extensive benchmark tests show that speaker
independence is not entirely achievable with the current speech recognition chip
(HM2007).
The current design approach for the Speech Recognition Calculator was not the first
design iteration. The original plan was to construct our own voice
recognition circuitry. The original circuitry consisted of single-package HM2007
and SRAM IC chips and a microcontroller assembled on a breadboard. In addition,
LEDs were used as debugging aids and status indicators. These components
are identical to those in the prepackaged HM2007 kit currently employed. Unlike our
current design, the original design used the Basic Stamp microcontroller rather than the
Arduino microcontroller. Ideally, the design should perform identically with either
microcontroller, but this was not the case.
One of the problems with this design was that the Basic Stamp did not recognize the
signal output by the HM2007. Moreover, the Basic Stamp chip could not
manipulate the information held in the HM2007. In other words, we could not
get the HM2007 to interface with the Basic Stamp chip.
Another problem faced was the circuitry itself. Although care was taken when
connecting all the wires, there were too many wires and possibilities for error. It was
very difficult to debug the circuit. The circuit was built more than a dozen times in order
to see if there was a bug in the circuitry. In the end, the circuit never worked. Finally,
the LEDs that were supposed to light up whenever recognition occurred would light up at
random intervals, making debugging nearly impossible.
Software
Arduino Microcontroller Programming
Overview
The Arduino microcontroller is programmed with Arduino’s own compiler. The
compiler is based on the C programming language. Included with the compiler are
Arduino’s libraries, which allow the programmer to access the external pins
of the microcontroller. Arduino’s compiler has two reserved functions, named
setup( ) and loop( ). The setup( ) function is always
executed when the Arduino microcontroller is initialized. The loop( ) function is
evaluated continually while the Arduino microcontroller is running, starting immediately
after the setup( ) function returns.
Basic Algorithm
Within the setup( ) function, the eight pins on the microcontroller are established as
inputs. This allows the eight pins to sense the digital logic levels transmitted by the
voice recognition components. The setup( ) function also establishes the data rate at
which data is transmitted and received on the serial bus. The serial bus is the
communication link between the microcontroller and the graphical user interface.
Therefore, any data written to the serial bus by the microcontroller will be read by the
graphical user interface, and vice versa.
The loop( ) function handles all the decoding of data from the voice recognition
components and the transmission of data to the graphical user interface. This function
begins by reading the eight pins on the microcontroller. Since the value held on the
eight pins corresponds to a specific phrase or word stored in the SRAM, the loop( )
function decodes that phrase or word into its ASCII value. That ASCII value is
transmitted to the graphical user interface over the serial bus. This procedure is
repeated continuously after the setup( ) function is executed. This is the basic
algorithm of the
Arduino microcontroller.
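The decoding step can be sketched in plain C as follows. This is a minimal, host-side illustration rather than the Appendix B firmware: the names decode_digit and digit_to_ascii are hypothetical, the pin levels are passed in as an array instead of being read from hardware, and the '?' placeholder for an unmatched pattern is an assumption. The mask table is copied from Appendix B.

```c
#include <stddef.h>

#define NUM_PINS 8
#define NUM_WORDS 10

/* Bit patterns copied from the mask table in Appendix B (row index = digit). */
static const int MASKS[NUM_WORDS][NUM_PINS] = {
    {1, 1, 0, 1, 1, 1, 1, 1}, /* zero (remapped to the OK button in Appendix B) */
    {1, 0, 0, 0, 0, 1, 0, 0}, /* one */
    {1, 0, 0, 0, 1, 0, 0, 0}, /* two */
    {1, 0, 0, 0, 1, 1, 0, 0}, /* three */
    {1, 0, 0, 1, 0, 0, 0, 0}, /* four */
    {1, 0, 0, 1, 0, 1, 0, 0}, /* five */
    {1, 0, 0, 1, 1, 0, 0, 0}, /* six */
    {1, 0, 0, 1, 1, 1, 0, 0}, /* seven */
    {1, 0, 1, 0, 0, 0, 0, 0}, /* eight */
    {1, 0, 1, 0, 0, 1, 0, 0}  /* nine */
};

/* Returns the index of the matching mask (0-9), or -1 when nothing matches. */
int decode_digit(const int levels[NUM_PINS]) {
    for (int word = 0; word < NUM_WORDS; word++) {
        int matched = 1;
        for (size_t pin = 0; pin < NUM_PINS; pin++) {
            if (levels[pin] != MASKS[word][pin]) {
                matched = 0;
                break;
            }
        }
        if (matched) {
            return word;
        }
    }
    return -1;
}

/* Converts a decoded digit to its ASCII character.
   '?' is an assumed placeholder for "no match", not part of the original design. */
char digit_to_ascii(int digit) {
    return (digit >= 0 && digit <= 9) ? (char)('0' + digit) : '?';
}
```

Adding '0' to the digit is equivalent to the value+48 conversion used in Appendix B, since the ASCII code of '0' is 48.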
Modifications to Basic Algorithm
As noted above in the ‘Hardware Details’ section, the voice recognition system
continually transmits the eight bit value of the last recognized phrase or word. Since
the loop( ) function is constantly evaluated, the basic algorithm constantly transmits an
ASCII value to the graphical user interface. As a result, even if the voice recognition
system has not recognized any new phrase or word, the last recognized word is still
transmitted.
In order to decrease the amount of data sent to the graphical user interface, a decision
step must be added. Every time the loop( ) function is run, this decision step stores
the current decoded ASCII value and the decoded ASCII value from the previous
evaluation of the loop( ) function. When the previous and current ASCII values differ,
a new phrase or word has been recognized, and the new ASCII value
is transmitted over the serial bus. The high level flowchart is shown in Figure 3.
Appendix B is the code used to program the Arduino microcontroller.
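The change-detection idea can be illustrated with a short host-side C sketch. It is not the Appendix B code: the function name drop_repeats is hypothetical, and a buffer of already decoded ASCII values stands in for successive evaluations of loop( ). The sentinel value -1 is an assumption used to make sure the first value is always sent.

```c
#include <stddef.h>

/* Copies in[0..n-1] to out, skipping every value equal to the one just before it.
   Returns the number of values written. out must have room for n values. */
size_t drop_repeats(const char *in, size_t n, char *out) {
    size_t written = 0;
    int previous = -1; /* sentinel: nothing seen yet, so the first value is sent */
    for (size_t i = 0; i < n; i++) {
        int current = (unsigned char)in[i];
        if (current != previous) {
            out[written++] = in[i]; /* value changed: a new word was recognized */
        }
        previous = current;
    }
    return written;
}
```

One consequence of this scheme is that the same word spoken twice in a row cannot be told apart from a held value, which is one reason a spoken delimiter between entries is useful.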
Figure 3: Arduino programming software flow.
Graphical User Interface
The Processing language is based on the Java language and places an emphasis on
“images, animation, and interactions” [3]. Given Processing’s compatibility with the
ideals of the SRC (user-friendliness and graphical interaction), the language was selected
to handle most of the high-level functions. The high-level functions include the GUI
display and computation algorithms.
Overview
The Processing program works at the topmost layer of the software design solution used
for the SRC (see Figure 4). The program resides on a PC (personal computer) with an
appropriate operating system (Linux, Windows, or Mac OS X) and a Java virtual machine.
The lower layer is implemented in C and programmed into the Arduino’s Atmel
microcontroller. The data flow between the two layers is bidirectional and uses a
serial (COM) port.
Figure 4: Software layers (top to bottom: Processing, serial port, C, 8 bit bus, HM2007).
The Processing program itself is composed of three components: serial communication,
display GUI, and computation. A scheme for serial communication was necessary in
order to interface with the C code. The display GUI was set up to provide the user with a
friendly and easy-to-read visual feedback system. The computation algorithms take care
of the basic arithmetic functions of the calculator.
Serial Communication
Serial communication is established between the C program and the Processing program
by creating a Serial object in the Processing code. A data rate of 9600 bps is used
for communication to achieve a wider range of compatibility, as speed is of little
importance. Figure 5 shows the sequence in which serial communication takes place.
Figure 5: Processing software flow (flowchart nodes: Serial.read(), Delimiter? (Y/N),
GUI Display, Serial.clear()).
The delimiter is necessary in order to provide the user with feedback while interfacing
with the SRC. The Processing display GUI will not display another number or operator
until the delimiter has been spoken by the user (usually the word “OK”). In addition, the
result of a computation will not be displayed until a keyword has been spoken (usually
the word “EQUALS”). For example, the equation 2+2 can be evaluated using the
following scheme,
“2”, “OK”, “+”, “OK”, “2”, “OK”, “EQUALS”, “OK”
which will cause the calculator to display the number 4.
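A delimiter scheme of this kind might be sketched in C as below. This is an illustration under stated assumptions, not the Processing code from Appendix C: the TokenBuffer type and its functions are hypothetical names, and the behavior that a newly recognized word replaces an unconfirmed one (so a misrecognition can be corrected before “OK” is spoken) is an assumption about how the feedback is meant to work.

```c
#include <string.h>

#define MAX_TOKENS 32
#define MAX_WORD 16

typedef struct {
    char committed[MAX_TOKENS][MAX_WORD]; /* words confirmed with the delimiter */
    size_t count;                         /* number of confirmed words */
    char pending[MAX_WORD];               /* last recognized, unconfirmed word */
    int has_pending;
} TokenBuffer;

void token_buffer_init(TokenBuffer *tb) {
    memset(tb, 0, sizeof *tb);
}

/* Feeds one recognized word. "OK" commits the pending word;
   any other word replaces the pending word (assumed correction behavior). */
void token_buffer_feed(TokenBuffer *tb, const char *word) {
    if (strcmp(word, "OK") == 0) {
        if (tb->has_pending && tb->count < MAX_TOKENS) {
            strcpy(tb->committed[tb->count++], tb->pending);
            tb->has_pending = 0;
        }
        return;
    }
    strncpy(tb->pending, word, MAX_WORD - 1);
    tb->pending[MAX_WORD - 1] = '\0';
    tb->has_pending = 1;
}
```

Feeding the words "3", "2", "OK" commits only "2", because the unconfirmed "3" is overwritten before the delimiter arrives.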
GUI Display
A significant amount of research was invested in the design of the GUI display, given
that the SRC is intended for users with all types of disabilities, including those with
low vision. A study by Hill and Scharff suggests that the best combination of font color,
font style, font type, and background color is green italicized Times New Roman on a
yellow background [4]. The GUI features a high contrast, high visibility display with
attributes similar to those mentioned, as well as a large font size, in order to aid
visual user feedback.
Computation
Once the speech data has been read and interpreted from the serial bus by the Processing
code, the Processing program stores the data into a vector array as a means of memory
storage. Once the keyword “EQUALS” has been picked up by the program, the vector
array is processed and a result is sent to the GUI display. The vector array is parsed
element-by-element, and a variable stores a running total. When an arithmetic operator is
encountered in the vector array, two inputs are sent to a method corresponding to the
operation to be performed (addition, subtraction, division, etc.). The method performs
the computation and then returns the result as a single variable.
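The running-total evaluation can be sketched in C as follows. This is a hedged illustration, not the Processing implementation: the names apply and evaluate are hypothetical, the token array is assumed to alternate strictly between numbers and single-character operators, and strict left-to-right evaluation without operator precedence is an assumption inferred from the "running total" description.

```c
#include <stdlib.h>

/* Applies one binary operation. Returns 0 on success, -1 on an unknown
   operator or division by zero. */
static int apply(double a, char op, double b, double *out) {
    switch (op) {
    case '+': *out = a + b; return 0;
    case '-': *out = a - b; return 0;
    case '*': *out = a * b; return 0;
    case '/':
        if (b == 0.0) {
            return -1;
        }
        *out = a / b;
        return 0;
    default:
        return -1;
    }
}

/* Evaluates tokens such as {"2", "+", "3"} left to right with a running total.
   Returns 0 and stores the total in *result, or -1 on malformed input. */
int evaluate(const char *tokens[], size_t n, double *result) {
    if (n == 0 || n % 2 == 0) {
        return -1; /* need number, (operator, number)* */
    }
    double total = strtod(tokens[0], NULL);
    for (size_t i = 1; i + 1 < n; i += 2) {
        double operand = strtod(tokens[i + 1], NULL);
        if (apply(total, tokens[i][0], operand, &total) != 0) {
            return -1;
        }
    }
    *result = total;
    return 0;
}
```

Under the left-to-right assumption, the tokens 2, +, 3, *, 4 evaluate to 20 rather than 14.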
Error-checking
Error-checking is performed on-the-fly via a Boolean state variable that determines
whether or not two arithmetic operators have been spoken consecutively. If this is the
case, the Processing code will accept the last spoken operator.
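A minimal C sketch of this check is shown below. The Expression type and the function names are hypothetical and tokens are simplified to single characters; only the idea of a Boolean state variable that lets the last spoken operator win comes from the text.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_EXPR_TOKENS 64

typedef struct {
    char tokens[MAX_EXPR_TOKENS]; /* digits and operators in spoken order */
    size_t count;
    bool last_was_operator;       /* state variable: was the previous token an operator? */
} Expression;

static bool is_operator(char c) {
    return c == '+' || c == '-' || c == '*' || c == '/';
}

/* Adds a token. If two operators arrive consecutively,
   the later one replaces the earlier one. */
void expression_add(Expression *e, char token) {
    bool op = is_operator(token);
    if (op && e->last_was_operator) {
        e->tokens[e->count - 1] = token; /* accept the last spoken operator */
        return;
    }
    if (e->count == MAX_EXPR_TOKENS) {
        return; /* buffer full: ignore further input */
    }
    e->tokens[e->count++] = token;
    e->last_was_operator = op;
}
```

Initializing an Expression with zeroes (for example `Expression e = {0};`) gives an empty expression with the state variable cleared.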
Appendix C is the Processing code for creating the GUI.
Marketing Approach
The market for people with limited mobility has never been adequately addressed. People
with limited mobility have been excluded from using instruments that
require precise motor movements, such as button presses on small electronics like
handheld calculators. The market for accessible calculators is aimed at consumers with
limited vision: products enlarge the surface area of the buttons, add voice feedback when
buttons are pressed, or both. While such solutions help users with limited vision, they
do not fully address the problems of users with limited mobility. The only complete
solution is to remove physical interaction with the device altogether. With voice
recognition technology, this is now possible.
The SRC is an all-in-one, disability-friendly calculator that introduces a hands-free
speech recognition and speech feedback system. The calculator can also be used to
remotely tutor users. Since product options on the market are currently very limited,
the cost of the calculator should not impede market success. The unit price of an SRC
will be around $285.
Many existing speech recognition calculator products on the market today are software
products for laptop or desktop computers. The competing Orion speech recognition
calculator retails for around $280. This gives the SRC an advantage, because it offers
superior features at a comparable price. The SRC will be the only product on
the market to feature hands-free speech recognition capabilities. The SRC will
offer unparalleled support for users with this requirement.
Cost Analysis
As seen in Table A4, the inclusion of the Arduino board in the overall SRC
design has lowered the cost of all materials to approximately $231. If the unassembled
ImagesCo SR-06 circuit kit is purchased and mass assembled, the overall cost of
materials would be lowered further when the SRC is mass manufactured. Since the
pre-assembled ImagesCo SR-07 circuit kit was purchased to create the prototype, the
circuit is known to work with the SRC. The SR-07 Speech
Recognition circuit is essentially the assembled and tested SR-06 kit. Mass assembly and
testing will be included in the total cost of manufacturing.
The cost of the SR-07 kit was $179.95 and the cost of the SR-06 kit is $114.95. This will
allow the unit cost to drop by $65.00. At a unit price of $285, there will be an even more
significant margin of profit.
The overall labor cost, based on the salaries of four entry level engineers at $26 an hour
for 100 hours each, is determined to be $10,400. After accounting for overhead, fringe
benefits, and sales expense, the sale of 100,000 units at a price of $285 allows for a
profit of $13.6 million.
Tables outlining the details of cost analysis may be found in Appendix A.
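The arithmetic behind these figures can be checked with a few lines of C. The function names below are hypothetical; the inputs are the values stated in this section and in Table A5 (a per-unit profit of $136), and prices are handled in whole cents to avoid floating point rounding.

```c
/* Labor cost in dollars: number of engineers x hours each x hourly rate. */
long labor_cost(long engineers, long hours_each, long hourly_rate) {
    return engineers * hours_each * hourly_rate;
}

/* Saving per unit, in cents, from buying the unassembled kit instead of the
   assembled one. */
long kit_saving_cents(long assembled_cents, long unassembled_cents) {
    return assembled_cents - unassembled_cents;
}

/* Net profit in dollars for a production run. */
long long net_profit(long long units, long long profit_per_unit) {
    return units * profit_per_unit;
}
```

With the report's inputs, labor_cost(4, 100, 26) gives 10400, kit_saving_cents(17995, 11495) gives 6500 (that is, $65.00), and net_profit(100000, 136) gives 13600000.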
Appendix A
Table A1. SRC Price Calculations
Fringe Benefits    30% of labor
Overhead           120% of materials, labor & fringe
Sales Expense      10% of selling price
Table A2. Labor Cost
Engineer 1 (100 hrs x $26)    $2,600
Engineer 2                    $2,600
Engineer 3                    $2,600
Engineer 4                    $2,600
Total Labor Cost              $10,400
Table A3. Development Cost
Parts                                  $231
Labor                                  $10,400
Fringe Benefits, % of Labor            $3,120
Subtotal                               $13,751
Overhead, % of Matl, Labor & Fringe    $16,501
Total                                  $44,003
Table A4. Bill of Materials for 1 Unit
HM2007                              $10
8K x 8 SRAM                         $8.85
74LS373                             $0.80
7448 x 2                            $1.50
LCD Display                         $20
Arduino Microcontroller Board       $49
Keypad                              $10
Protoboard                          $15
Microphone                          $5
Wires, Resistors, and Capacitors    $12
Parallax Internet Netburner Kit     $99
Total                               $231
Table A5. Determination of Selling Price
Based on:                              100,000 units
Parts Cost                             $231
Assembly Labor                         $30
Testing Labor                          $10
Total Labor                            $40
Fringe Benefits, % of Labor            $12
Subtotal                               $283
Overhead, % of Matl, Labor & Fringe    $340
Subtotal, Input Costs                  $120
Sales Expense                          $29
Amortized Development Costs            $0
Subtotal, All Costs                    $149
Profit                                 $136
Selling Price                          $285
Net Profit                             $13,600,000
Appendix B
//declare our methods and variables
int ReadMatchPins();
boolean Comparer(int[], int[]);

//the pins we read
int pins[] = {2, 3, 6, 7, 8, 9, 4, 5};

//our masks
int mask[10][8] = {
  {1, 1, 0, 1, 1, 1, 1, 1}, //{1, 0, 0, 0, 0, 0, 0, 0}, //zero -> OK button
  {1, 0, 0, 0, 0, 1, 0, 0}, //one
  {1, 0, 0, 0, 1, 0, 0, 0}, //two
  {1, 0, 0, 0, 1, 1, 0, 0}, //three
  {1, 0, 0, 1, 0, 0, 0, 0}, //four
  {1, 0, 0, 1, 0, 1, 0, 0}, //five
  {1, 0, 0, 1, 1, 0, 0, 0}, //six
  {1, 0, 0, 1, 1, 1, 0, 0}, //seven
  {1, 0, 1, 0, 0, 0, 0, 0}, //eight
  {1, 0, 1, 0, 0, 1, 0, 0}, //nine
};

void setup() {
  //set pins as inputs
  for(int i=0; i<=7; i++){
    pinMode(pins[i], INPUT);
  }
  //open the serial link to the GUI at 9600 bps
  beginSerial(9600);
}
char command;

void loop(){
  int value;
  int tempValue;
  int previousValue;
  boolean firstTime = true;
  boolean firstSend = true;
  command = serialRead();
  if(command == 'r'){
    do{
      tempValue = ReadMatchPins();
      if(tempValue > 0){
        value = tempValue;
        if(firstTime){
          previousValue = value;
          firstTime = false;
        }
        if(previousValue != value){
          firstSend = true;
        }
        if(firstSend){
          serialWrite(value+48); //ascii value
          printNewline();
          firstSend = false;
        }
        previousValue = value;
      }
      command = serialRead(); //stop
    }while(command != 's');
  }
}
int ReadMatchPins(){
  int pinValues[8];
  for(int i=0; i<=7; i++){
    pinValues[i] = digitalRead(pins[i]);
  }
  //compare the pin values against each mask in turn; returning on the
  //first match keeps a match on the last mask (nine) from being overwritten
  for(int index=0; index<=9; index++){
    if(Comparer(pinValues, mask[index])){
      return index;
    }
  }
  //condition where nothing is matched returned value is -1
  return -1;
}
boolean Comparer(int a[], int b[]){
  boolean matched = true;
  for(int i=0; i<=7; i++){
    if(a[i] != b[i]){
      matched = false;
    }
  }
  return matched;
}
Appendix C
//Import libraries
import processing.serial.*;

//Declarations
String inString = new String(" "); // Input string from serial port:
String pString = new String("1");
String holder = new String("1");
Serial myPort; // The serial port:
PFont fontA; // The display font:
int equal = '='; // ASCII equals

// Set the left and top margin
int margin = 6;
int gap = 30;

void setup() {
  size(600, 600);
  background(0);
  fontA = loadFont("CourierNew36.vlw");
  textFont(fontA, 48);
  textAlign(LEFT);
  // List all the available serial ports:
  //println(Serial.list());
  //Select second serial port (tested on John's computer and seems to be typical)
  myPort = new Serial(this, Serial.list()[1], 9600);
  //Read buffer until we hit the equals character
  //myPort.bufferUntil(equal);
}
void draw()
{
  myPort.write(114); //start the Serial communication
  //myPort.write(115); causes spamming
  holder = (myPort.readString()); //temporary holder
  myPort.clear();
  if ((holder != null) )
  {
    inString = inString.concat(holder);
    print(inString);
  }
  delay(2000);
}

void stringParse(String p)
{
  if (p.indexOf("+") != -1) //test for addition
  {
  }
}
Works Cited
[1] Images Scientific Instruments Inc., “How to Build a Speech Recognition Circuit,”
    Images Scientific Instruments Inc. [Online]. Available:
    http://www.imagesco.com/articles/hm2007/SpeechRecognitionTutorial01.html.
    [Accessed: Dec. 1, 2007].
[2] Arduino inc., “Arduino Board,” [Online]. Available: http://www.arduino.cc/.
    [Accessed: Dec. 1, 2007].
[3] B. Fry and C. Reas, “Processing,” 2007. [Online]. Available:
    http://www.processing.org/
[4] A.L. Hill and L.F.V. Scharff, “Readability of Websites with Various
    Foreground/Background Color Combinations, Font Types and Word Styles,” 1997.
    [Online]. Available: http://hubel.sfasu.edu/research/AHNCUR.html