NetTalk Project
Speech Generation Using a
Neural Network
Michael J Euhardy
The Speech Generation Idea
Input: a specific letter whose sound is to
be generated
 Input: three letters on each side of it for
a total of seven letters input
 Output: the sound that should be
generated based on the input letter and
the surrounding letters
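A minimal sketch of how the seven-letter windows could be cut from a word. The slides do not say how word edges are handled, so the `_` padding character and the `letter_windows` helper are assumptions made for illustration only.

```python
def letter_windows(word, context=3, pad="_"):
    """Return one window per letter: the letter plus `context` letters per side.

    Assumption: positions that fall outside the word are filled with `pad`.
    """
    padded = pad * context + word.lower() + pad * context
    width = 2 * context + 1  # 3 + 1 + 3 = 7 letters per window
    return [padded[i:i + width] for i in range(len(word))]


windows = letter_windows("hello")
for w in windows:
    # The letter whose sound is to be generated sits in the middle (index 3).
    print(w, "->", w[3])
```

Each window yields one training example, so a word of n letters produces n examples.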
The Strategy
26 possible letters
7 input positions
Map each letter in each position to a
unique input
7*26 = 182 total inputs
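The mapping above can be sketched as a one-hot encoding: each of the 7 positions gets its own block of 26 inputs, and exactly one input in a block is set for the letter found there. The `encode_window` name is hypothetical, and treating a non-letter (such as padding at a word edge) as an all-zero block is an assumption, not something the slides state.

```python
import string

ALPHABET = string.ascii_lowercase  # 26 possible letters
POSITIONS = 7                      # 3 letters on each side plus the center


def encode_window(window):
    """Map a 7-character window to a 7*26 = 182 element 0/1 vector."""
    if len(window) != POSITIONS:
        raise ValueError("window must contain exactly 7 characters")
    vec = [0.0] * (POSITIONS * len(ALPHABET))
    for pos, ch in enumerate(window.lower()):
        idx = ALPHABET.find(ch)
        if idx >= 0:  # assumption: non-letters leave their block all zero
            vec[pos * len(ALPHABET) + idx] = 1.0
    return vec


x = encode_window("_hello_")
print(len(x), sum(x))  # 182 inputs, one active input per real letter
```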
The Strategy
57 possible sounds generated
Map to 57 output labels
The Resulting ANN
A fully connected single-layer
perceptron with 182 inputs
and 57 outputs
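For a sense of scale, here is a rough sketch of such a network's forward pass in plain Python. The slides do not describe the activation function or how the winning sound is chosen, so the random initialization, the bias terms, and the "largest weighted sum wins" rule are all assumptions.

```python
import random

N_INPUTS = 182   # 7 positions * 26 letters
N_OUTPUTS = 57   # one output per possible sound


def init_weights(seed=0):
    """Small random weights, one row of 182 weights per output unit."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(N_INPUTS)]
               for _ in range(N_OUTPUTS)]
    biases = [0.0] * N_OUTPUTS
    return weights, biases


def predict(weights, biases, x):
    """Return the index of the output unit with the largest weighted sum."""
    scores = [sum(w * xi for w, xi in zip(row, x)) + b
              for row, b in zip(weights, biases)]
    return max(range(N_OUTPUTS), key=scores.__getitem__)


weights, biases = init_weights()
n_params = N_INPUTS * N_OUTPUTS + N_OUTPUTS
print(n_params)  # 182*57 weights plus 57 biases
```

Full connectivity means every one of the 182 inputs feeds every one of the 57 outputs, which is where the 182*57 weight count comes from.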
The Findings
The trained neural network performs
very well, and the larger the training set
and the longer it is trained, the
better it performs
 The training can be an extremely long
process if a high rate of classification is
desired and the training set is large
 Space
You can’t rush training the network.
Even on a dual PIII-733 with 512MB,
it still took a really long time to train
on any data set of significant size. And just
converting all of the characters in the
data file to the matrices needed as
inputs and labels took hours.
20000 words of data with maybe 7
letters on average. That’s a matrix
Double precision in Matlab, that’s a lot
of memory
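A back-of-the-envelope estimate of that memory cost, using only the figures from these slides. It assumes one training row per letter, dense storage, and 8 bytes per double-precision value. The real data set will differ somewhat, since 7 letters per word is only a rough average.

```python
WORDS = 20_000
AVG_LETTERS = 7          # rough average from the slide
INPUTS = 7 * 26          # 182 inputs per training row
OUTPUTS = 57             # 57 output labels per training row
BYTES_PER_DOUBLE = 8     # double precision

rows = WORDS * AVG_LETTERS                      # one row per letter
input_bytes = rows * INPUTS * BYTES_PER_DOUBLE
label_bytes = rows * OUTPUTS * BYTES_PER_DOUBLE


def mib(n_bytes):
    """Convert bytes to mebibytes."""
    return n_bytes / 2**20


print(f"rows:   {rows}")
print(f"inputs: {mib(input_bytes):.1f} MiB")
print(f"labels: {mib(label_bytes):.1f} MiB")
print(f"total:  {mib(input_bytes + label_bytes):.1f} MiB")
```

Under these assumptions the two matrices alone take up roughly half of the 512MB machine, before any working copies made during training.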
Smaller data set, only 1000 words
 Lower standards of training, only train
to 80% classification
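A sketch of what "train to 80% classification" could look like as a stopping rule. The slides do not say which learning rule was used, so the classic multiclass perceptron update here is a stand-in, and `train_until` and the three-sample toy data set are hypothetical.

```python
def train_until(samples, n_inputs, n_outputs, target=0.80, max_epochs=100):
    """Train with the perceptron rule until an epoch reaches `target` accuracy.

    Accuracy is counted during the epoch, while weights are still changing.
    """
    weights = [[0.0] * n_inputs for _ in range(n_outputs)]
    accuracy, epochs_run = 0.0, 0
    for _ in range(max_epochs):
        epochs_run += 1
        correct = 0
        for x, label in samples:
            scores = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
            guess = max(range(n_outputs), key=scores.__getitem__)
            if guess == label:
                correct += 1
            else:
                # Pull the right output toward x, push the wrong one away.
                for i, xi in enumerate(x):
                    weights[label][i] += xi
                    weights[guess][i] -= xi
        accuracy = correct / len(samples)
        if accuracy >= target:  # good enough, stop early
            break
    return weights, accuracy, epochs_run


# Toy data only: three one-hot inputs, each with its own class.
toy = [([1, 0, 0], 0), ([0, 1, 0], 1), ([0, 0, 1], 2)]
_, acc, epochs = train_until(toy, n_inputs=3, n_outputs=3)
print(acc, epochs)
```

Lowering `target` trades classification rate for training time, which is the compromise described above.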
Next Time
Matlab is way too slow and way too
memory intensive
Start earlier; it’s a long process
 Multi-Layer Perceptron
I give up!
 I don’t know how Microsoft’s Narrator
does it, but I bet it doesn’t do it this