SoundSense: Scalable Sound Sensing for People-Centric Applications on Mobile Phones

Hong Lu, Wei Pan, Nicholas D. Lane, Tanzeem Choudhury, and Andrew T. Campbell
Department of Computer Science,
Dartmouth College
Motivation
• Utilize the phone's microphone sensor to detect personalized sound events.
• Sound captured by a mobile phone's microphone is a rich source of information about the surrounding environment, social setting, conversations, activities, location, diet, etc.
What is SoundSense?
• A scalable sound sensing framework capable of identifying the meaningful sound events of a user's daily life.
• Implemented for a resource-limited device, the Apple iPhone.
• The system runs entirely on the mobile phone.
Contribution
• The first general-purpose sound event classification system designed for a large number of events.
• Able to discover the sound events that are significant in an individual user's environment.
• The complete system architecture and algorithms are implemented on the Apple iPhone.
Design Considerations
• Build a scalable sound classification system that can detect all types of sound events for different users.
• Privacy: audio recording and processing happen entirely on the mobile phone.
• Lightweight signal processing and sound classification.
Design Considerations
Phone context conditions:
• RMS is a good approximation of volume.
• RMS varies over roughly a 30% range across different contextual positions of the phone.
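As a concrete illustration, a minimal C sketch of the RMS computation (not the authors' code; the 16-bit PCM sample format is an assumption):

#include <math.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal sketch: RMS of one frame of 16-bit PCM samples,
 * normalized so the result lies in [0, 1]. The sample format
 * is an assumption, not taken from the paper. */
double frame_rms(const int16_t *samples, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        double s = samples[i] / 32768.0;  /* scale to [-1, 1) */
        sum += s * s;
    }
    return sqrt(sum / (double)n);
}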
SoundSense Architecture
• Remove frames that are silent or hard to classify.
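A hypothetical C sketch of this frame admission gate, assuming an RMS threshold for silence and a spectral-entropy test for noise-like frames (both thresholds are illustrative):

#include <stdbool.h>

/* Hypothetical admission control: drop a frame if it is silent
 * (low RMS) or hard to classify (spectral entropy close to that of
 * white noise). Threshold values are illustrative only. */
#define SILENCE_RMS_THRESHOLD   0.01
#define ENTROPY_NOISE_THRESHOLD 0.95  /* fraction of maximum entropy */

bool admit_frame(double rms, double normalized_spectral_entropy)
{
    if (rms < SILENCE_RMS_THRESHOLD)
        return false;  /* silent frame */
    if (normalized_spectral_entropy > ENTROPY_NOISE_THRESHOLD)
        return false;  /* too noise-like to classify */
    return true;
}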
SoundSense Architecture
1. Extract features that are insensitive to volume (one example feature is sketched after this list).
2. Detect the coarse-grained category of a sound: voice, music, or ambient sound.
3. Multi-level classification: decision tree and Markov model based classifiers.
4. Two levels of classification smooth the output.
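For illustration, one well-known volume-insensitive feature is the zero-crossing rate; a minimal C sketch (one of several features such a classifier could use, not necessarily the paper's exact feature set):

#include <stddef.h>
#include <stdint.h>

/* Zero-crossing rate: the fraction of adjacent sample pairs whose
 * signs differ. It depends on sign changes rather than amplitude,
 * so it is robust to recording volume. */
double zero_crossing_rate(const int16_t *samples, size_t n)
{
    size_t crossings = 0;
    for (size_t i = 1; i < n; i++)
        if ((samples[i - 1] >= 0) != (samples[i] >= 0))
            crossings++;
    return (double)crossings / (double)(n - 1);
}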
SoundSense Architecture
1. Uses previously established audio signal processing techniques.
2. At this stage, speech recognition, speaker identification, and music genre classification are applied.
SoundSense Architecture
1. Handles only ambient sound (sound other than voice and music).
2. Uses an unsupervised learning technique.
3. Detects meaningful ambient sounds (assumption: a sound's occurrence frequency and duration indicate its importance).
4. Maintains a SoundRank: a ranking of the meaningful sounds by their importance (a sketch follows this list).
5. Prompts the user if a new sound exceeds the minimum SoundRank threshold.
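A hypothetical C sketch of the SoundRank bookkeeping described above; the scoring formula and the threshold are illustrative assumptions, not the paper's actual ranking function:

/* Hypothetical bookkeeping for a learned ambient sound: occurrence
 * count and accumulated duration, combined into a rank score. The
 * product form of the score is an assumption for illustration. */
typedef struct {
    int    occurrences;    /* how often the sound has been heard */
    double total_duration; /* seconds of audio matching this sound */
} SoundModel;

double sound_rank(const SoundModel *m)
{
    return m->occurrences * m->total_duration;
}

/* Prompt the user for a label only when a new sound's score passes
 * the minimum SoundRank threshold (threshold is an assumption). */
int should_prompt_user(const SoundModel *m, double min_rank)
{
    return sound_rank(m) >= min_rank;
}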
Implementation
• Implemented in C, C++, and Objective-C.
• Developed for the Apple iPhone.
• Duty cycle of 0.64 seconds when no acoustic event is present (sketched below).
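A minimal C sketch of the duty-cycling idea, assuming the event flag comes from the preprocessing stage:

#include <stdbool.h>
#include <unistd.h>

/* Sketch: between classification passes, sleep for the 0.64 s duty
 * cycle whenever the last frame contained no acoustic event;
 * event_detected is an assumed input from the preprocessing stage. */
void duty_cycle_wait(bool event_detected)
{
    if (!event_detected)
        usleep(640000);  /* 0.64 s = 640,000 microseconds */
}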
Parameter Selection
[Pipeline: decision tree classifier → FIFO buffer → Markov model classifier]
• Increasing the buffer size (sequence length) increases accuracy.
• However, the response time of the system also increases.
• The optimal buffer size is 5 (the smoothing step is sketched below).
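A C sketch of the smoothing stage under assumed details: each coarse category keeps a first-order transition matrix over the decision tree's per-frame labels, and the buffered label sequence is scored under each matrix:

#include <math.h>

#define SEQ_LEN 5   /* FIFO buffer size chosen above */
#define N_CAT   3   /* coarse categories: voice, music, ambient */

/* The sequence is scored under each category's transition matrix
 * (initial-state prior omitted for brevity) and the most likely
 * category wins, smoothing out isolated per-frame errors. */
int smooth_labels(const int labels[SEQ_LEN],
                  const double trans[N_CAT][N_CAT][N_CAT])
{
    int best = 0;
    double best_ll = -INFINITY;
    for (int c = 0; c < N_CAT; c++) {
        double ll = 0.0;
        for (int t = 1; t < SEQ_LEN; t++)
            ll += log(trans[c][labels[t - 1]][labels[t]]);
        if (ll > best_ll) {
            best_ll = ll;
            best = c;
        }
    }
    return best;
}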
Parameter Selection
• Precision: the number of frames correctly classified as a type, divided by all frames classified as that type.
• Recall: the recognized occurrences of a frame type, divided by the overall number of occurrences of that frame type.
[Figure: precision and recall for ambient sound as a function of MFCC frame length.]
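In code form, these definitions reduce to the standard per-class precision and recall over confusion counts (a minimal C sketch):

/* Per-class counts: tp = frames of the class recognized correctly,
 * fp = frames of other classes recognized as this class,
 * fn = frames of this class that were missed. Division-by-zero
 * guards omitted for brevity. */
double precision(int tp, int fp) { return (double)tp / (double)(tp + fp); }
double recall(int tp, int fn)    { return (double)tp / (double)(tp + fn); }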
Evaluation
1. When an acoustic event is detected, CPU usage rises to about 25%; when idle, CPU usage stays below 5%.
2. The processing time for one frame (64 ms of audio) is around 20-30 ms.
Evaluation
[Confusion matrices: decision tree classifier alone vs. decision tree with the Markov model.]
• Adding the Markov model improved classification accuracy by 10% for music and speech and by 3% for ambient sound.
Evaluation
• No reliable sound signature was found to represent bus riding.
Applications
• Audio Daily Diary: logs a user's everyday events.
  – Enables queries such as how much time was spent in a certain activity (a query sketch follows this slide).
• Music Detector based on Participatory Sensing:
  – Gives users a way to discover events associated with music being played.
[Figure: audio daily diary output for a Friday and a Saturday; some music and voice samples are incorrectly classified as ambient sound.]
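A hypothetical C sketch of such a diary query; the log layout and names are assumptions for illustration:

#include <stddef.h>

/* Hypothetical diary entry: an event label plus its duration in
 * seconds. The layout is an assumption, not the paper's format. */
typedef struct {
    int    event_label;   /* e.g., driving, music, conversation */
    double duration_sec;
} DiaryEntry;

/* Total time spent in a given event type over one day's log. */
double time_spent(const DiaryEntry *log, size_t n, int event_label)
{
    double total = 0.0;
    for (size_t i = 0; i < n; i++)
        if (log[i].event_label == event_label)
            total += log[i].duration_sec;
    return total;
}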
Conclusion
• General sound classification
  – Light-weight
  – Hierarchical
• Flexible and scalable.
• All tasks implemented on the mobile phone.
• Able to identify new sounds.
• Can be used in personalized contexts.
Thank you
• Questions?