Controlling your household appliances through conversation.
Sean Powers
Florida Institute of Technology
ECE 5525 Final: Dr. Veton Kepuska
Date: 07 December 2010

Agenda
• Problem Statement
• How it works
• System Architecture
• Future Works
• Demonstration
• Questions

Problem Statement
Imagine you leave your house in a hurry because you are running late for work, have to pick up your kids from school, or for any other reason, and you forget to turn off the stove. You remember three blocks away, and you do not have time to turn around. You dial a phone number that is assigned to your house and ask your house to turn the stove off for you. Your house confirms that the stove will be turned off, and you are relieved that you won’t come home to a potential fire. This is the essence of the Phoning Home system.

How it works
Phoning Home is broken into four main components:
• The Voice over IP (VoIP) server, which forwards the phone speech and uses text to speech (TTS) to speak to the user.
• The Phoning Home web services, which handle the communication for all Phoning Home households.
• The speech recognition server, which is responsible for recognizing the user’s phone speech.
• The client services, which handle an individual household’s devices.

• Asterisk Server (VoIP)
• Phoning Home Services (WCF)
• Speech Recognition (CMU Sphinx)
• Client Services (WCF + MCU)

Answering incoming calls
Asterisk: Open Source Communication Project
• Turns a computer into a voice communications server
• Includes features:
    • Ability to answer incoming calls
    • Ability to generate outgoing calls
    • Ability to play and generate tones
    • Ability to integrate with web services
    • Ability to record calls
    • Ability to provide call details such as caller identification

Phone Server Application

Speech Recognition Application
Carnegie Mellon University (CMU) Sphinx-4 Speech Recognition Engine
The Sphinx-4 framework consists of three primary modules: the FrontEnd, the Decoder, and the Linguist.
The FrontEnd takes input signals and parameterizes them into a sequence of features. The Linguist translates any type of standard language model, along with information from the Dictionary and structural information from one or more sets of acoustic models, into a SearchGraph. The Decoder uses the features from the FrontEnd and the SearchGraph from the Linguist to perform the actual decoding and produce the Results.

Sphinx-4

A Wireless Appliance Reducing Energy (AWARE)
• Each device that can be controlled via the Phoning Home system is known as A Wireless Appliance Reducing Energy (AWARE) device. Every device is wirelessly controlled via the Phoning Home Master Control, which is an Atmel ATmega16 microcontroller connected via USB to the Client Services.

Phoning Home System Overview

Sequence Diagram

Issues
One issue I had to overcome when developing the demonstration was recognizing telephone speech. There are significant differences between microphone and telephone speech. From the Sphinx documentation:
“The issue with telephone audio is that it has limited range of frequencies. Unlike usual microphone recording that includes frequencies from 1 Hz to 8000 [Hz], telephone audio is passed through frequency filters. As a result telephone audio contains frequencies from 200 Hz to 3500 Hz. That makes it impossible to recognize telephone audio with usual microphone acoustic model. You need to use specialized models to recognize it.”
I ended up using the 8 kHz VoxForge acoustic model.

Future Works
Easy collection and storage of multiple utterances, which can be used to improve acoustic models
• Asterisk Server is capable of simultaneously handling multiple calls
• Server stores and catalogs utterances automatically

Future Works
Although Phoning Home is designed to let you call your house to control your appliances, it would be very useful to combine it with Dr. Kepuska’s Wake-up-Word (WuW) technology so that you can control your appliances from inside the house as well.
In a commercial product, Phoning Home would need extensive security measures to confirm that the person calling the house actually has the credentials to control its appliances. This could be as simple as a password or as complex as speaker recognition to confirm that the caller is a permitted user.

Demonstration
Virtual Phone Number: 1 (321) 710-5090

#JSGF V1.0;

/**
 * JSGF Digits Grammar for Phoning Home
 */
grammar digits;

public <command> = <polite> <startAction> room [number] <numbers> <devices> <endAction>;
<polite> = [please | kindly | could you | oh mighty computer | operator];
<startAction> = (turn | switch);
<devices> = (lights | lamps);
<endAction> = (on | off);
<numbers> = (oh | zero | one | two | three | four | five | six | seven | eight | nine);

Questions
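
As a rough illustration of which utterances the demonstration grammar accepts, its rules can be approximated with a regular expression. This is only a sketch in Python and not part of the Phoning Home code: the names `COMMAND_RE` and `parse_command` are hypothetical, and in the actual system the JSGF grammar is applied by Sphinx-4 during recognition rather than to finished text.

```python
import re

# Alternatives copied from the JSGF rules in the demonstration grammar.
POLITE = r"(?:please|kindly|could you|oh mighty computer|operator)"
START = r"(?:turn|switch)"
NUMBERS = r"(?:oh|zero|one|two|three|four|five|six|seven|eight|nine)"
DEVICES = r"(?:lights|lamps)"
END = r"(?:on|off)"

# <polite> and the literal word "number" are optional in the grammar,
# so both are wrapped in optional groups here.
COMMAND_RE = re.compile(
    rf"^(?:{POLITE} )?{START} room (?:number )?"
    rf"(?P<room>{NUMBERS}) (?P<device>{DEVICES}) (?P<state>{END})$"
)


def parse_command(utterance):
    """Return the room, device and state of a command, or None if it does not match."""
    # Lowercase and collapse whitespace so spacing and case do not matter.
    words = " ".join(utterance.lower().split())
    match = COMMAND_RE.match(words)
    return match.groupdict() if match else None
```

For example, `parse_command("please turn room number three lights off")` yields the room `three`, the device `lights`, and the state `off`, while a phrase outside the grammar such as `"turn the stove off"` yields `None`.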