elbowtm

ELBOW™ Beta version, launch date 8/7/13

Copyright (c) 2013 Waveform Communication. All rights reserved.

THIS SOFTWARE IS PROVIDED BY WAVEFORM COMMUNICATION ``AS IS'' AND ANY EXPRESSED OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL WAVEFORM COMMUNICATION NOR ITS EMPLOYEES BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Why Do We Need Free General Public License Speech Audio?

Speech recognition (or speech-to-text) engines are closed source, and they do not give you access to the speech audio and transcriptions (i.e. the speech corpus) used to create the acoustic model. Although a few small speech corpora exist that could be used to create acoustic models, the vast majority of corpora (especially the large corpora best suited to building good acoustic models) must be purchased under restrictive licenses, or the corpora do not exist and must be collected.

We are collecting recordings to extend the Waveform Model to real world speech productions from a variety of microphones. By using this early version of ELBOW, you are agreeing to your voice being recorded for Waveform Communication’s speech corpus. No recordings will be disseminated in any form.

Disclaimer

This application uses the unchanged logic that achieved 99.2 percent accuracy on static data and 91.1 percent on the streaming .wav files of vowels produced in perfect conditions with quality equipment.
Improvements can be made, but the next business step is to look at real world input from a wide range of microphones. It is not realistic to expect anything close to 90 percent without a corpus to establish the model parameters, and no such accuracy is claimed for this release. The entire ELBOW engine has been built in less than 2 months and is well over 90 percent for vowels produced in perfect conditions. This will not be online for long, since we are using it to rapidly collect a speech corpus, not to demonstrate poor performance. We do not need to display the recognized vowel, but without it users would not see how the working ELBOW will behave. With 91 percent achieved in less than a week, a speech corpus will significantly speed performance enhancements for microphone input.

Using ELBOW

This application processes live recordings, which begin when you press the Record button. The “Stop and Process” button processes the stream to identify and display the vowel, and the application then sends the .wav file back to our server for future analysis. The Play button plays back the recording. The Save button may not be active in this version, since we are saving the .wav file on our web server. The “Process a folder” button lets users browse for a folder and then processes each .wav file in the selected folder. The Data button shows the raw acoustic input supplied by our acoustic analysis programming. The columns of acoustic data are the calculated F1/F0 ratio, the pitch (F0), and the formant values F1, F2, and F3. The columns for our decision processing are not displayed.

Notes

ELBOW cuts waveform input into 20 ms windows, or slices. The data displayed through the Data button is the raw formant data for each 20 ms slice; the slices are used collectively to identify the vowel. This beta version is built for male voices only. We do not expect this brute force approach to collecting a speech corpus to be needed for women and children.
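The 20 ms slicing and the Data columns described above can be illustrated with a minimal sketch. This is not ELBOW's code: the function and class names are hypothetical, the windows are assumed to be non-overlapping with any trailing partial window dropped, and the pitch and formant values are taken as inputs because the way ELBOW estimates them is not described here.

```python
from dataclasses import dataclass
from typing import List, Sequence

WINDOW_MS = 20  # slice length stated in the notes


def slice_samples(samples: Sequence[float], sample_rate: int,
                  window_ms: int = WINDOW_MS) -> List[Sequence[float]]:
    """Split samples into consecutive windows of window_ms milliseconds.

    Assumption: windows do not overlap and a trailing partial window
    is dropped. ELBOW's actual windowing scheme is not documented.
    """
    size = sample_rate * window_ms // 1000  # samples per window
    if size <= 0:
        raise ValueError("sample_rate too low for the requested window")
    return [samples[i:i + size]
            for i in range(0, len(samples) - size + 1, size)]


@dataclass
class SliceRow:
    """One row of per-slice acoustic data, in the column order listed above."""
    f1_over_f0: float
    f0: float  # pitch, Hz
    f1: float  # first formant, Hz
    f2: float  # second formant, Hz
    f3: float  # third formant, Hz


def make_row(f0: float, f1: float, f2: float, f3: float) -> SliceRow:
    """Build a row from already-estimated pitch and formant values."""
    if f0 <= 0:
        raise ValueError("F0 must be positive to compute F1/F0")
    return SliceRow(f1 / f0, f0, f1, f2, f3)
```

At a 16 kHz sample rate, for example, each 20 ms slice holds 320 samples, so one second of audio yields 50 slices and therefore 50 rows of acoustic data.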
This is a limited application for a limited time. More time and effort will be spent on a formal white paper when the fully functional ELBOW is reposted to this site. Be patient when you start the application. A display of - - means that no vowel was identified for that production. ELBOW is independent of any open source or commercial product.
