DSP Algorithms and Adaptive systems for AUDIO

DSP Algorithms and Adaptive systems
for AUDIO applications
MMSP 2014
Forni di Sopra (UD), February 16-18
A3lab people
Full professor
•  Francesco Piazza
Researcher
•  Simone Fiori
•  Stefano Squartini
•  Christian Morbidoni
Post Doc Researcher
•  Stefania Cecchi
•  Paolo Peretti
•  Emanuele Principi
•  Michele Nucci
•  Marco Grassi
Ph.D. Student
•  Marco Virgulti
•  Leonardo Gabrielli
•  Francesco Faccenda
•  Marco Fagiani
•  Marco Severini
A3lab activities
Main Research activity area
•  DSP algorithms for Audio and Multimedia applications
•  Computational Intelligence algorithms for Multimedia
•  Real-Time processing oriented
Teaching activity
•  Circuit Theory
•  Digital Signal Processing and
Computational Intelligence
•  Electrical Machines and Systems
Projects and collaboration
•  Projects Funded by public agencies
(European Community also) and
private companies
•  Several active collaborations with
International Academies and
Enterprises
A3lab facilities
Laboratory Audio-DSP
•  equipped with professional audio instrumentation (e.g., professional sound cards, microphones, loudspeakers, etc.)
Semi Anechoic Chamber
•  The chamber dimensions are 9 m x 7 m x 5 m
•  The chamber is qualified according to ISO 3745
Some A3lab projects
Funded European Projects
hArtes – FP6 [2006-2010] - 634K
DISCOVERY – eContentPlus [2006-2010] - 300K
SEMLIB – FP7 [2010-2012] - 300K
Funded Regional Projects
TASCA – POR [2010-2011] - 144K
SAIYL – POR [2009-2010] - 111K
HOMELINE – POR [2009-2010] - 40K
Moretti Forni – “Giovane Tecnologo” [2009-2010] - 40K
FBT – “Giovane Tecnologo” [2009-2010] - 40K
eDoor – 598 [2007-2008] - 56K
CMT – 598 [2006-2007] - 20K
Funded Private Projects
Line Arrays– [2007-2009] – 99K
Others– [2008-2010] – 50K
EU COST Actions
COST A32 – [2006-2010]
COST 2102 – [2006-2010]
COST 277 – [2001-2005]
A3lab collaborations
Academia/research centers (formal):
University of Illinois at Chicago (USA), South China University of Technology
(China), Fondazione Bruno Kessler (Italy), Università La Sapienza, Digital
Enterprise Research Institute (Ireland), Texas Instruments European University
Program (equipment donation received from 2010 to date).
Academia/research centers (informal) (active Erasmus links)
Riken Institute (Japan), University of Stirling (UK), University of Windsor
(Canada), University of Aachen (Germany), Fraunhofer Institute (Germany),
Escola Universitaria Politecnica de Matarò (Spain), Aalto University (Finland),
Technical University of Munich (Germany) and others
Companies/Enterprises:
Texas Instruments, Thales, Thomson, HP, Roland Europe, KORG, Faital, CMT,
FBT, Indesit, Radvision Italia, Proietti Planet, AYT, Microhard, Imolinfo, NET7, and
more
~100 scientific papers published (last 5 years)
A3lab research fields
Audio Rendering
•  Optimize the listening experience according to the
characteristics of the acoustic environment and the user
needs
Speech-interfaced Systems
•  Systems where the use of speech is involved to enable a
certain service
Digital Music
•  Digital music processing
Audio Rendering
—  Multichannel Equalization
—  Wave-field Synthesis and Analysis
—  Reverberation
—  3D audio
—  Acoustic Echo Cancellation
—  Active Noise Cancellation
Audio Rendering
Multichannel Equalization
Multipoint and frequency domain
•  Multipoint: to enlarge the sweet spot using several measurements around the listener
•  Frequency domain: to reduce the computational complexity for real-time approaches
Fixed and adaptive
•  Fixed: to compensate small environments
•  Adaptive: to consider variable environments
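The multipoint, frequency-domain idea can be illustrated with a minimal sketch. It assumes that the impulse responses measured around the listener are combined into an RMS prototype magnitude, which is then inverted with a regularization term. The function name, the RMS averaging and the `beta` value are illustrative assumptions, not the design used in the A3lab systems.

```python
import numpy as np

def multipoint_inverse_filter(irs, n_fft=1024, beta=1e-2):
    """Return a zero-phase equalizer magnitude from several measured IRs.

    irs: array of shape (n_points, n_taps), one impulse response per
    measurement position around the listener (hypothetical input layout).
    """
    irs = np.asarray(irs, dtype=float)
    # One spectrum per measurement position.
    spectra = np.fft.rfft(irs, n=n_fft, axis=1)
    # RMS prototype: a single magnitude that summarizes all positions.
    prototype = np.sqrt(np.mean(np.abs(spectra) ** 2, axis=0))
    # Regularized inversion: beta limits the gain at deep spectral notches.
    return prototype / (prototype ** 2 + beta)
```

Averaging over positions keeps the equalizer from over-correcting notches that exist at only one point, which is the intuition behind enlarging the sweet spot.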
Audio Rendering
Wave‐field Synthesis and Analysis
Reproduction systems based on stereo or multichannel techniques are designed to
obtain an optimal acoustic sensation at only one point of the environment (the
sweet spot).
WAVE FIELD SYNTHESIS (WFS): implements sound field reproduction techniques
based on loudspeaker arrays.
WAVE FIELD ANALYSIS (WFA): implements sound field recording techniques
based on microphone arrays.
Audio Rendering
Digital Effects : Reverberation
Reverberation is probably the audio effect most used by musicians during live
performances and recording sessions.
HYBRID REVERBERATOR: based on a combined approach that uses
—  measured impulse responses (IRs) for the early reflections
—  synthetic IRs for the late reflections
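A minimal sketch of the hybrid idea, assuming the synthetic late part is exponentially decaying white noise spliced after the measured early part. The split time, the `rt60` value, the level matching and the function name are illustrative assumptions; the source does not state how the synthetic tail is actually generated.

```python
import numpy as np

def hybrid_ir(measured_ir, fs, split_ms=80.0, rt60=1.2, length_s=1.5, seed=0):
    """Splice a measured early response with a synthetic late tail."""
    measured_ir = np.asarray(measured_ir, dtype=float)
    n_split = int(fs * split_ms / 1000)   # samples kept from the measurement
    n_total = int(fs * length_s)          # length of the hybrid response
    if not 0 < n_split <= len(measured_ir) or n_total <= n_split:
        raise ValueError("split point must lie inside both responses")

    early = measured_ir[:n_split]

    # Synthetic tail: white noise whose amplitude drops 60 dB in rt60 seconds.
    t = np.arange(n_total - n_split) / fs
    envelope = 10.0 ** (-3.0 * t / rt60)
    rng = np.random.default_rng(seed)
    tail = rng.standard_normal(n_total - n_split) * envelope

    # Match the tail level to the end of the measured part to avoid a jump.
    n_ref = max(1, n_split // 4)
    early_rms = np.sqrt(np.mean(early[-n_ref:] ** 2))
    tail_rms = np.sqrt(np.mean(tail[:n_ref] ** 2))
    if tail_rms > 0:
        tail *= early_rms / tail_rms

    return np.concatenate([early, tail])
```

The resulting response can be applied to a dry signal with any convolution routine, for example `np.convolve(dry, ir)`.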
Audio Rendering
3D Audio
Advanced Audio Spatializer
The system is composed of two parts:
•  a sound rendering system based on a crosstalk canceller
•  a listener position tracking system based on Kinect control.
Audio Rendering
Acoustic Echo Cancellation
Stereo acoustic echo cancellers (SAECs) have become essential with the spread
of multichannel systems, which were introduced to ensure more realistic
performance in terms of speaker localization.
DECORRELATION, which weakens the linear relationship between the two input
channels, must be introduced in order to obtain good echo cancellation.
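As an illustration of decorrelation, the sketch below uses the half-wave rectifier nonlinearity, a widely cited option in the stereo echo cancellation literature. The source does not say which decorrelation method A3lab adopts, so the technique, the `alpha` value and the function names are assumptions chosen for the example.

```python
import numpy as np

def decorrelate_stereo(left, right, alpha=0.5):
    """Add a different half-wave rectified component to each channel."""
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    # Positive half-wave on the left channel, negative half-wave on the right.
    left_out = left + alpha * 0.5 * (left + np.abs(left))
    right_out = right + alpha * 0.5 * (right - np.abs(right))
    return left_out, right_out

def correlation(a, b):
    """Pearson correlation coefficient between two signals."""
    return float(np.corrcoef(a, b)[0, 1])
```

With two channels that are scaled copies of the same source, the correlation coefficient is 1 before processing and drops below 1 afterwards; larger `alpha` values decorrelate more but distort the signal more audibly.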
Audio Rendering
Active Noise Cancellation
It is based on sound field modification by destructive wave interference, i.e.,
the principle of superposition.
•  A real-time feedback system applied to real noise recorded in a yacht
•  Quiet zone close to the pillows: microphones and loudspeakers positioned near the bed
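The superposition principle can be made concrete with a small calculation. The sketch assumes a single sinusoidal noise component and an anti-noise signal that is nominally equal in amplitude and opposite in phase; it computes the residual level for a given gain and phase mismatch. The single-tone model and the function name are simplifying assumptions, not a description of the feedback controller used in the yacht application.

```python
import numpy as np

def residual_db(gain_error_db, phase_error_deg):
    """Residual level (dB relative to the noise) after adding anti-noise.

    The anti-noise is modelled as -g * exp(j * phi) times the noise phasor,
    so the residual amplitude is |1 - g * exp(j * phi)|. A perfect match
    (0 dB, 0 degrees) gives a residual of zero, i.e. minus infinity in dB.
    """
    g = 10.0 ** (gain_error_db / 20.0)
    phi = np.deg2rad(phase_error_deg)
    return float(20.0 * np.log10(np.abs(1.0 - g * np.exp(1j * phi))))
```

Under this model a 10 degree phase error alone limits the attenuation to roughly 15 dB, and a 60 degree error gives no attenuation at all, which is why the microphones and loudspeakers need to sit close to the quiet zone.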
Speech-interfaced Systems
—  Distributed Speech-based Systems for Smart Homes
—  Pre-processing Framework for Speech-interfaced Systems
—  Speech Reinforcement
Distributed Speech-based
System for Smart Homes
—  Main Issues
–  Distributed system for recognition of building automation vocal commands and of distress calls for emergency state detection.
–  Two functional units: CMPU (Central Management and Processing Unit) and LMCU (Local Multimedia Control Unit)
[Figure: LMCU]
—  ITAAL corpus
–  20 people involved (10 men and 10 women)
–  Headset & Distant Microphones
–  Home Automation Commands & Distress Calls in Italian
Distributed Speech-based
System for Smart Homes
—  Advancements
–  Small-vocabulary speech recognizers (based on the i-vectors paradigm)
–  Vocal Effort Classification (see figures)
–  Seamless integration of Sound Identification and Novelty Detection module
[Figure: vocal effort classifier. Block labels: Speech, Neutral corpus (APASCI), GMM Training, UBM, Neutral templates, Shout templates, Supervectors Extraction, SVM Training, Model, SVM Classification, Vocal effort]
[Figure: LMCU w/ Vocal Effort Classifier]
Speech-interfaced Tabletop
—  Fostering group conversations by visualizing suitable stimuli on the tabletop display
—  Stimuli can be floating words and/or pictures. Stimuli are related to the topic of the discussion
—  Topics are obtained by capturing spoken keywords
—  Perception: captures the ongoing situation around the table (status of the system, conversation keywords).
—  Interpretation: draws the topic of the conversation based on recognized keywords and predefined topics.
—  Presentation: dynamically selects stimuli according to the status and the topic of the conversation.
Pre-processing framework for Speech-interfaced Systems
Overall Framework
—  Pre-processing framework composed of three cooperating modules in cascade
—  Speaker Diarization: it pilots the other two stages by informing them who is speaking.
—  Blind Channel Identification (BCI): the source-microphone IRs are blindly identified when one single speaker is active.
—  Speech Dereverberation: reverberation is compensated directly on the SIMO systems obtained from the original MIMO one, and the original sources are yielded as output.
[Figure: overall framework. Block labels: microphone signals x1(k) ... xN(k), Speaker Diarization, BCI, h, Speech Dereverberation, P1 ... PM]
Speaker Diarization
—  Noise robust implementation.
[Figure: speaker diarization. Block labels: Training, xn(k), Feature Extraction, GMM Training, Models; Recognition, xn(k), Feature Extraction, Identification (Majority Vote), SPK1, SPK3, ..., SPKm, Demultiplexer, P1 ... PM]
Speech-Reinforcement
—  Speech reinforcement (SR) techniques aim to increase speech intelligibility in adverse environments where communication is difficult.
—  SR system: composed of at least one microphone, an amplifier and a loudspeaker.
—  Acoustic feedback occurs due to the acoustic coupling between the microphone and the loudspeaker.
—  Suitable algorithms are needed: PEM-AFROW based solutions are adopted in this case.
—  Implementation on embedded systems and application in real environments.
Speech-Reinforcement
—  Application to the automotive dual-channel communication scenario
—  Two Acoustic Feedback and Echo Cancellation problems to solve
Digital Music
—  Music Information Retrieval
—  Digital Music Effects
—  Music Synthesis
—  Wireless Music
Music Information Retrieval
—  Acoustic Onset Detection
–  Data-driven algorithm developed in collaboration with Technical University of Munich (Germany)
–  Hybrid Feature Extraction module based on linear prediction in the wavelet domain and MFCCs
–  Detector based on Bidirectional Long Short-Term Memory Recurrent Neural Networks
–  Improvements with respect to the recent state of the art (SotA)
[Figure: onset detection pipeline. Block labels: x[n], Framing / Windowing, DWPT (coif5, dec_level=8, Nbands=25), Band Energy, Logarithm, WPEC, Compute Delta (win=2), Feature Extraction (WPEC, ASF), Neural Nets (RNN, BRNN, LSTM, BLSTM), ODF, Threshold, Peak-Picking, Onsets]
[Figure: LSTM memory block. Labels: Input, Input Gate, Forget Gate, Memory Cell, Output Gate, Output]
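The last stages of the pipeline (ODF, threshold, peak-picking, onsets) can be sketched as follows, assuming the neural network has already produced an onset detection function (ODF) with one value per frame. The fixed threshold, the minimum gap between onsets and the function name are illustrative assumptions; the source does not give the actual peak-picking parameters.

```python
import numpy as np

def pick_onsets(odf, frame_rate, threshold=0.5, min_gap_s=0.05):
    """Return onset times (seconds) from an onset detection function.

    A frame is reported as an onset when it is a local maximum of the ODF,
    reaches the threshold, and is at least min_gap_s after the last onset.
    """
    odf = np.asarray(odf, dtype=float)
    onsets = []
    last_onset = -np.inf
    for i in range(1, len(odf) - 1):
        is_peak = odf[i] > odf[i - 1] and odf[i] >= odf[i + 1]
        if is_peak and odf[i] >= threshold:
            t = i / frame_rate
            if t - last_onset >= min_gap_s:
                onsets.append(t)
                last_onset = t
    return onsets
```

For example, an ODF sampled at 100 frames per second with peaks of 0.9 at frame 10, 0.7 at frame 50 and 0.3 at frame 80 yields onsets at 0.1 s and 0.5 s with the default threshold, since the third peak stays below it.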
Digital Music Effects
—  Virtual Acoustic Feedback
–  In collaboration with Aalto University (Finland)
–  Nonlinear Digital Oscillator with a second-order peaking filter in the feedback path
–  Pitch tracking algorithm (SNAC) included to adaptively select the input tone
–  Wave Digital Triode nonlinearity included to improve realism
–  Advancements: rise-time, compressor, smoothing, gain pedal
–  PureData patch implemented on different processors
Digital Music Effects
—  Ibrida
–  PureData tool for sound hybridization
–  Wavelet domain based
–  Dynamic morphing driven by automatic onset detection
–  OSC controllable
—  Speech-driven wah-wah effect
–  Tuning the wah-wah effect by means of voice commands
–  Low-complexity speech feature extraction
–  Implementation on commercial DSP
Music Synthesis
—  Physical Model of the Clavinet
–  In collaboration with Aalto University (Finland)
–  Recording and analysis of the different issues (tone spectral characteristics, attack and decay, inharmonicity, spectrum ripple, beating, amplifier and tone switches)
–  Digital waveguide-based computational model of the Clavinet string
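To give a feel for the digital waveguide approach, the sketch below implements the simplest single-delay-loop string: a circular delay line with a two-point averaging loss filter (the Karplus-Strong special case). It is a generic textbook string, not the Clavinet model, which also has to account for the features listed above (inharmonicity, beating, amplifier and tone switches). The function name and all parameter values are assumptions.

```python
import numpy as np

def waveguide_string(f0, fs=44100, duration_s=1.0, loss=0.996, seed=0):
    """Synthesize a plucked-string tone with a single delay loop."""
    rng = np.random.default_rng(seed)
    n_delay = int(round(fs / f0))             # loop length sets the pitch
    line = rng.uniform(-1.0, 1.0, n_delay)    # noise burst as excitation
    out = np.empty(int(fs * duration_s))
    idx = 0
    for n in range(len(out)):
        out[n] = line[idx]
        nxt = (idx + 1) % n_delay
        # Loss filter: average of two neighbouring samples, slightly damped.
        line[idx] = loss * 0.5 * (line[idx] + line[nxt])
        idx = nxt
    return out
```

The averaging filter damps high partials faster than low ones, so the tone starts bright and becomes mellower as it decays, which is the basic behaviour a string model needs before instrument-specific refinements are added.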
Wireless Music
—  Wireless MUsic Studio (We-MUST)
–  HW/SW platform for wireless music
–  Based on the PureData and Jacktrip open-source software
–  Latency down to 4 ms single-link
–  Developments are currently ongoing (automatic device discovery and adaptive resampling)
—  Application example
–  BeagleBoards (BB) are used to process and send/receive the audio streams
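A latency figure of a few milliseconds constrains the audio buffer size. The arithmetic below is a rough sketch under stated assumptions: latency is modelled as a number of audio buffers (for example one at the sender and one at the receiver) plus a fixed network delay. The buffer sizes, sample rate and network delay in the usage note are hypothetical; the source only states the 4 ms single-link result.

```python
def buffer_latency_ms(frames, sample_rate):
    """Duration of one audio buffer in milliseconds."""
    return 1000.0 * frames / sample_rate

def link_latency_ms(frames, sample_rate, n_buffers=2, network_ms=0.0):
    """Latency of one link: n_buffers audio buffers plus network delay."""
    return n_buffers * buffer_latency_ms(frames, sample_rate) + network_ms
```

With these assumptions, two 64-frame buffers at 48 kHz plus 1 ms of network delay give about 3.7 ms, while 128-frame buffers would already exceed 6 ms, so small buffers are needed to stay near the reported figure.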