ELBOW™
Beta version - launch date – 8/7/13
*Copyright (c) 2013 Waveform Communication. All rights reserved.
THIS SOFTWARE IS PROVIDED BY WAVEFORM COMMUNICATION ``AS IS'' AND
ANY EXPRESSED OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL WAVEFORM
COMMUNICATION NOR ITS EMPLOYEES BE LIABLE FOR ANY DIRECT, INDIRECT,
INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY
WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.
Why Do We Need Free General Public License Speech Audio?
Most speech recognition (speech-to-text) engines are closed source and do not give you
access to the speech audio and transcriptions (i.e., the speech corpus) used to create the acoustic
model. Although a few small speech corpora are available for creating acoustic models, the vast
majority of corpora (especially the large corpora best suited to building good acoustic models)
must be purchased under restrictive licenses, or the corpora do not exist and must be collected.
We are collecting recordings to extend the Waveform Model to real-world speech productions
from a variety of microphones.
By using this early version of ELBOW, you are agreeing to your voice being recorded for
Waveform Communication’s speech corpus. No recordings will be disseminated in any form.
Disclaimer
This application uses the same, unchanged logic that achieved 99.2 percent accuracy on static data
and 91.1 percent on streaming .wav files of vowels produced in perfect conditions with quality
equipment. Improvements can be made, but the next business step is to look at real-world input
from a wide range of microphones. It is not realistic to expect anything close to 90 percent on
microphone input without a corpus to establish the model parameters, and no such accuracy is
claimed for this release.
The entire ELBOW engine was built in less than 2 months and achieves well over 90 percent
accuracy for vowels produced in perfect conditions. This version will not be online for long,
since we are using it to rapidly collect a speech corpus, not to demonstrate poor performance.
We do not need to display the recognized vowel, but without it users would not see how the
working ELBOW will behave. With 91 percent achieved in less than a week, a speech corpus will
significantly speed performance enhancements for microphone input.
Using ELBOW
To process a live recording, press the Record button to begin recording. The “Stop and Process”
button processes the stream to identify and display the vowel, and the application then sends
the .wav file to our server for future analysis.
The Play button will play back the recordings. The Save button may not be active in this version
since we are saving the .wav file on our web server. The “Process a folder” button allows users
to browse for a folder which then initiates the processing of each .wav file in the selected folder.
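The folder workflow described above can be sketched in Python. This is only an illustration, not ELBOW's code: the names list_wav_files, process_folder, and process_file are hypothetical, and the sketch assumes that only files directly inside the selected folder are processed and that the .wav extension is matched without regard to case.

```python
from pathlib import Path


def list_wav_files(folder):
    """Return the .wav files directly inside `folder`, sorted by path.

    Assumption: subfolders are not searched, and the extension
    match ignores case (so "a.WAV" is included).
    """
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() == ".wav"
    )


def process_folder(folder, process_file):
    """Call `process_file` on each .wav file; return results keyed by file name.

    `process_file` is a hypothetical callback standing in for whatever
    per-file vowel processing the application performs.
    """
    return {p.name: process_file(p) for p in list_wav_files(folder)}
```

Passing the per-file step in as a callback keeps the folder walk separate from the recognition logic, which this document does not describe.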
The Data button shows the raw acoustic data produced by our acoustic analysis. The columns of
acoustic data are the calculated F1/F0 ratio, pitch (F0), and the formant values F1, F2, and F3.
The columns for our decision processing are not displayed.
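As a rough illustration of one row of that table, the sketch below builds the five displayed columns from a pitch value and three formant values. The names DataRow and make_row are hypothetical, the numbers in the usage lines are made-up values chosen for easy arithmetic, and nothing here reflects how ELBOW measures pitch or formants.

```python
from typing import NamedTuple


class DataRow(NamedTuple):
    """One row of acoustic data, in the column order described above."""
    f1_over_f0: float  # ratio of first formant to pitch
    f0: float          # pitch, in Hz
    f1: float          # first formant, in Hz
    f2: float          # second formant, in Hz
    f3: float          # third formant, in Hz


def make_row(f0, f1, f2, f3):
    """Build a row, computing F1/F0 from the supplied F0 and F1."""
    if f0 <= 0:
        raise ValueError("F0 must be positive to form the F1/F0 ratio")
    return DataRow(f1 / f0, f0, f1, f2, f3)


# Illustrative values only, not measurements from ELBOW.
row = make_row(f0=120.0, f1=600.0, f2=1200.0, f3=2500.0)
print(row.f1_over_f0)  # 5.0
```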
Notes
ELBOW cuts waveform input into 20 ms windows, or slices. The data displayed through the Data
button is the raw formant data for each 20 ms slice; the slices are used collectively to
identify the vowel.
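A minimal sketch of 20 ms slicing, for readers who want to see the arithmetic. It assumes non-overlapping windows and drops any trailing partial window; this document does not say whether ELBOW overlaps its windows or how it treats the remainder. The function name slice_windows and the 16000 Hz sample rate in the usage lines are also assumptions.

```python
def slice_windows(samples, sample_rate, window_ms=20):
    """Split a sequence of samples into consecutive fixed-length windows.

    Assumptions: windows do not overlap, and a trailing partial
    window (shorter than `window_ms`) is discarded.
    """
    window_len = sample_rate * window_ms // 1000  # samples per window
    if window_len <= 0:
        raise ValueError("window is shorter than one sample")
    last_start = len(samples) - window_len
    return [
        samples[start:start + window_len]
        for start in range(0, last_start + 1, window_len)
    ]


# At an assumed 16000 Hz, a 20 ms window holds 320 samples.
samples = list(range(1000))
windows = slice_windows(samples, sample_rate=16000)
print(len(windows), len(windows[0]))
```

With 1000 samples at 16000 Hz this yields three full windows of 320 samples each, and the final 40 samples are discarded.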
This beta version is built for males only. It is not expected that this brute force approach to
collecting a speech corpus will be needed for women and children. This is a limited application
for a limited time.
More time and effort will be spent on a formal white paper when the fully functional ELBOW is
reposted to this site.
Be patient when you start the application. Also, a display of - - means that no vowel was
identified for that production.
ELBOW is independent of any open source or commercial product.