PhoningHome - My FIT - Florida Institute of Technology

Controlling your household
appliances through conversation.
Sean Powers
Florida Institute of Technology
ECE 5525 Final:
Dr. Veton Kepuska
Date: 07 December 2010
Agenda
• Problem Statement
• How it works
• System Architecture
• Future Works
• Demonstration
• Questions
Problem Statement
• Imagine you leave your house in a hurry because you are running late
for work, have to pick up your kids from school, or for any other
reason, and you forget to turn off the stove. You remember three blocks
away and do not have time to turn around. You dial a phone number
that is assigned to your house and ask your house to turn the stove
off for you. Your house confirms the stove will be turned off, and you
are relieved that you won’t come home to a potential fire.
This is the essence of the Phoning Home system.
How it works
Phoning Home is broken into four main components:
• The Voice over IP (VoIP) server, which forwards the phone speech
and uses text-to-speech (TTS) to speak to the user.
• The Phoning Home web services, which handle the communication
for all Phoning Home households.
• The speech recognition server, which is responsible for
recognizing the user’s phone speech.
• The client services, which handle an individual household’s
devices.
Asterisk Server (VoIP)
Phoning Home Services (WCF)
Speech Recognition (CMU Sphinx)
Client Services (WCF + MCU)
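The flow between the four components can be sketched in a few lines of Python. This is only an illustration of the routing idea, not the actual WCF implementation: the names `Household`, `PhoningHomeServices`, `handle_call`, and the `recognize` callback are assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Household:
    """One registered household and a callable that reaches its client services."""
    phone_number: str
    send_to_client: Callable[[str], str]  # takes a command, returns a reply text


class PhoningHomeServices:
    """Hypothetical stand-in for the central services that route calls."""

    def __init__(self) -> None:
        self._households: Dict[str, Household] = {}

    def register(self, household: Household) -> None:
        self._households[household.phone_number] = household

    def handle_call(self, dialed_number: str, audio: bytes,
                    recognize: Callable[[bytes], str]) -> str:
        """Return the text that the VoIP server would speak back via TTS."""
        household = self._households.get(dialed_number)
        if household is None:
            return "Sorry, this number is not registered."
        command = recognize(audio)                # speech recognition server
        return household.send_to_client(command)  # household client services


if __name__ == "__main__":
    services = PhoningHomeServices()
    services.register(Household("5550100", lambda cmd: f"OK: {cmd}"))
    # A fake recognizer stands in for the speech recognition server.
    print(services.handle_call("5550100", b"", lambda _: "turn room one lights off"))
```

Passing the recognizer in as a callback keeps the routing logic independent of the speech engine, which mirrors the separation between the boxes above.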
Answering incoming calls
• Asterisk Open Source Communication Project
• Turns a computer into a voice communications server
• Includes features:
  • Ability to answer incoming calls
  • Ability to generate outgoing calls
  • Ability to play and generate tones
  • Ability to integrate with web services
  • Ability to record calls
  • Ability to provide call details such as caller identification
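One common way for an external program to receive call details from Asterisk is the Asterisk Gateway Interface (AGI), which sends a header of `agi_name: value` lines followed by a blank line. The slides do not say whether Phoning Home uses AGI, so the sketch below is an assumption about how caller identification could be read; the function name `read_agi_environment` is made up for this example.

```python
from typing import Dict, Iterable


def read_agi_environment(lines: Iterable[str]) -> Dict[str, str]:
    """Parse the 'agi_name: value' header; stop at the first blank line."""
    env: Dict[str, str] = {}
    for line in lines:
        line = line.strip()
        if not line:
            break  # a blank line ends the AGI header
        key, _, value = line.partition(":")
        env[key.strip()] = value.strip()
    return env


if __name__ == "__main__":
    sample_header = [
        "agi_callerid: 3215550100\n",
        "agi_extension: 100\n",
        "\n",
    ]
    env = read_agi_environment(sample_header)
    print(env["agi_callerid"], env["agi_extension"])
```

In a real AGI script the lines would come from standard input rather than from a list.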
Phone Server Application
Speech Recognition Application
• Carnegie Mellon University (CMU) Sphinx-4 Speech Recognition Engine
• The Sphinx-4 framework consists of three primary modules: the
FrontEnd, the Decoder, and the Linguist. The FrontEnd takes input
signals and parameterizes them into a sequence of features. The
Linguist translates any type of standard language model along with
information from the Dictionary and structural information from one
or more sets of Acoustic models into a SearchGraph. The Decoder uses
the features from the FrontEnd, and the SearchGraph from the Linguist
to perform the actual decoding and produce the Results.
Sphinx 4
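To make the FrontEnd step concrete, the following sketch turns a signal into a sequence of per-frame features. It is not Sphinx-4 code: log energy is used here as a deliberately simpler stand-in for the cepstral features a real front end produces, and the function name and the 25 ms / 10 ms framing at 8 kHz are assumptions chosen for illustration.

```python
import math
from typing import List


def frame_log_energy(samples: List[float], frame_len: int = 200,
                     hop: int = 80) -> List[float]:
    """Split the signal into overlapping frames and return one log-energy
    value per frame (200 samples = 25 ms and 80 samples = 10 ms at 8 kHz)."""
    features: List[float] = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        energy = sum(s * s for s in frame) / frame_len
        features.append(math.log(energy + 1e-10))  # small floor avoids log(0)
    return features


if __name__ == "__main__":
    # One second of a 440 Hz tone sampled at 8 kHz.
    tone = [math.sin(2 * math.pi * 440 * n / 8000) for n in range(8000)]
    feats = frame_log_energy(tone)
    print(len(feats), "frames")
```

A decoder would consume a feature sequence like this one frame at a time while searching the graph built by the Linguist.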
A Wireless Appliance Reducing Energy
(AWARE)
• Each device that can be controlled via the Phoning Home system
is known as A Wireless Appliance Reducing Energy (AWARE). Every
device is wirelessly controlled via the Phoning Home Master
Control, which is ultimately an Atmel ATmega16 microcontroller
connected via USB to the Client Services.
Phoning Home System Overview
Sequence Diagram
Issues
• One issue I had to overcome when developing the demonstration was
recognizing telephone speech. There are significant differences
between microphone and telephone speech. From the Sphinx
documentation:
• “The issue with telephone audio is that it has limited range of
frequencies. Unlike usual microphone recording that includes
frequencies from 1 Hz to 8000 [Hz], telephone audio is passed
through frequency filters. As a result telephone audio contains
frequencies from 200 Hz to 3500 Hz. That makes it impossible to
recognize telephone audio with usual microphone acoustic model.
You need to use specialized models to recognize it.”
• I ended up using the 8 kHz VoxForge acoustic model.
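The arithmetic behind this choice can be sketched as follows. The helper names are made up for this example, and the "coverage" ratio is only an illustrative measure, not something Sphinx computes: it compares the 200 Hz to 3500 Hz telephone band with the full band (0 Hz up to the Nyquist frequency) that a model trained at a given sample rate expects.

```python
def nyquist_hz(sample_rate_hz: float) -> float:
    """Highest frequency representable at the given sample rate."""
    return sample_rate_hz / 2.0


def model_band_coverage(sample_rate_hz: float, low_hz: float = 200.0,
                        high_hz: float = 3500.0) -> float:
    """Fraction of the model's band (0..Nyquist) that telephone audio fills."""
    nyquist = nyquist_hz(sample_rate_hz)
    usable = max(0.0, min(high_hz, nyquist) - low_hz)
    return usable / nyquist


if __name__ == "__main__":
    for rate in (8000, 16000):
        print(rate, nyquist_hz(rate), round(model_band_coverage(rate), 4))
```

Under these assumptions the telephone band fills 0.825 of what an 8 kHz model expects but only 0.4125 of what a 16 kHz microphone model expects, which is consistent with the narrowband model being the better match.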
Future Works
• Easy collection and storage of multiple utterances, which
can be used to improve acoustic models
Database
• Asterisk Server is capable of simultaneously handling
multiple calls
• Server stores and catalogs utterances automatically
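A minimal sketch of such an utterance catalog, using Python's built-in `sqlite3` module. The table layout and the function names (`open_catalog`, `store_utterance`, `recordings_for_transcript`) are assumptions for illustration; the slides do not specify how the planned database would be organized.

```python
import sqlite3
from typing import List


def open_catalog(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the catalog; the schema here is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS utterances (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               caller_id TEXT NOT NULL,
               recording_path TEXT NOT NULL,
               transcript TEXT NOT NULL)"""
    )
    return conn


def store_utterance(conn: sqlite3.Connection, caller_id: str,
                    recording_path: str, transcript: str) -> int:
    """Catalog one recorded utterance and return its row id."""
    cur = conn.execute(
        "INSERT INTO utterances (caller_id, recording_path, transcript) "
        "VALUES (?, ?, ?)",
        (caller_id, recording_path, transcript),
    )
    conn.commit()
    return cur.lastrowid


def recordings_for_transcript(conn: sqlite3.Connection,
                              transcript: str) -> List[str]:
    """All recordings of the same phrase, e.g. to adapt an acoustic model."""
    rows = conn.execute(
        "SELECT recording_path FROM utterances WHERE transcript = ? ORDER BY id",
        (transcript,),
    )
    return [row[0] for row in rows]
```

Grouping recordings by transcript is one simple way to gather many speakers saying the same command.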
Future Works
• Although Phoning Home is designed to let you call your house
to control your appliances, it would be very useful to combine
it with Dr. Kepuska’s Wake-up-Word (WuW) technology so that
you can control your appliances from inside the house as well.
• In a commercial product, Phoning Home would need extensive
security measures to confirm that the person calling the house
actually has the credentials to control its appliances. This
could be as simple as a password or as complex as speaker
recognition that confirms the caller is a permitted user.
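The simple end of that range, a spoken numeric password, could look like the sketch below. Everything here is an assumption for illustration (the function names, the idea of a spoken digit PIN, the iteration count); it only shows storing a salted hash instead of the PIN itself and comparing in constant time.

```python
import hashlib
import hmac

# Spoken digit words mapped to digit characters (hypothetical helper table).
DIGIT_WORDS = {
    "oh": "0", "zero": "0", "one": "1", "two": "2", "three": "3",
    "four": "4", "five": "5", "six": "6", "seven": "7", "eight": "8",
    "nine": "9",
}


def spoken_digits_to_pin(words: str) -> str:
    """Convert recognized text such as 'four oh seven one' to '4071'."""
    return "".join(DIGIT_WORDS[w] for w in words.lower().split())


def hash_pin(pin: str, salt: bytes) -> bytes:
    """Derive a salted hash so the PIN itself is never stored."""
    return hashlib.pbkdf2_hmac("sha256", pin.encode("utf-8"), salt, 100_000)


def verify_spoken_pin(words: str, salt: bytes, stored_hash: bytes) -> bool:
    """Return True only if the spoken digits match the stored PIN hash."""
    try:
        pin = spoken_digits_to_pin(words)
    except KeyError:
        return False  # the caller said something that is not a digit
    return hmac.compare_digest(hash_pin(pin, salt), stored_hash)
```

A short spoken PIN is weak on its own, which is why stronger measures such as speaker recognition are listed as future work.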
Demonstration
• Virtual Phone Number: 1 (321) 710-5090
#JSGF V1.0;
/**
* JSGF Digits Grammar for Phoning Home
*/
grammar digits;
public <command> = <polite> <startAction> room [number] <numbers> <devices> <endAction>;
<polite> = [please | kindly | could you | oh mighty computer | operator];
<startAction> = (turn | switch);
<devices> = (lights | lamps);
<endAction> = (on | off);
<numbers> = (oh | zero | one | two | three | four | five | six | seven | eight | nine);
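Once the recognizer returns text that matches this grammar, the client still has to turn it into an action. The sketch below mirrors the grammar with a regular expression and extracts the room number, device, and on/off state. It is a hypothetical post-processing step written for illustration (`parse_command` is not part of Sphinx-4 or of the original system).

```python
import re
from typing import Optional, Tuple

# Same digit words as the <numbers> rule; "oh" and "zero" both mean 0.
DIGITS = {"oh": 0, "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4,
          "five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9}

_POLITE = r"(?:please|kindly|could you|oh mighty computer|operator)"
_PATTERN = re.compile(
    rf"^(?:{_POLITE} )?(?:turn|switch) room (?:number )?"
    rf"(?P<number>{'|'.join(DIGITS)}) (?P<device>lights|lamps) (?P<state>on|off)$"
)


def parse_command(utterance: str) -> Optional[Tuple[int, str, bool]]:
    """Return (room, device, turn_on), or None if the text is out of grammar."""
    normalized = " ".join(utterance.lower().split())
    match = _PATTERN.match(normalized)
    if match is None:
        return None
    return (DIGITS[match.group("number")],
            match.group("device"),
            match.group("state") == "on")


if __name__ == "__main__":
    print(parse_command("please turn room number one lights off"))
    print(parse_command("turn the stove off"))
```

Note that the grammar as written covers only lights and lamps, so a request about the stove falls outside it and is rejected.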
Questions