DSP Algorithms and Adaptive Systems for Audio Applications
MMSP 2014, Forni di Sopra (UD), February 16-18

A3lab people
Full Professor
• Francesco Piazza
Researchers
• Simone Fiori
• Stefano Squartini
• Christian Morbidoni
Post-Doc Researchers
• Stefania Cecchi
• Paolo Peretti
• Emanuele Principi
• Michele Nucci
• Marco Grassi
Ph.D. Students
• Marco Virgulti
• Leonardo Gabrielli
• Francesco Faccenda
• Marco Fagiani
• Marco Severini

A3lab activities
Main research areas
• DSP algorithms for audio and multimedia applications
• Computational intelligence algorithms for multimedia
• Real-time processing oriented
Teaching activity
• Circuit Theory
• Digital Signal Processing and Computational Intelligence
• Electrical Machines and Systems
Projects and collaborations
• Projects funded by public agencies (including the European Community) and private companies
• Several active collaborations with international academies and enterprises

A3lab facilities
Audio-DSP laboratory
• Equipped with professional audio instrumentation (e.g., professional sound cards, microphones, loudspeakers)
Semi-anechoic chamber
• Dimensions: 9 m x 7 m x 5 m
• Qualified according to ISO 3745

Some A3lab projects
Funded European projects
• hArtes - FP6 [2006-2010] - 634K
• DISCOVERY - eContentPlus [2006-2010] - 300K
• SEMLIB - FP7 [2010-2012] - 300K
Funded regional projects
• TASCA - POR [2010-2011] - 144K
• SAIYL - POR [2009-2010] - 111K
• HOMELINE - POR [2009-2010] - 40K
• Moretti Forni - "Giovane Tecnologo" [2009-2010] - 40K
• FBT - "Giovane Tecnologo" [2009-2010] - 40K
• eDoor - 598 [2007-2008] - 56K
• CMT - 598 [2006-2007] - 20K
Funded private projects
• Line Arrays [2007-2009] - 99K
• Others [2008-2010] - 50K
EU COST Actions
• COST A32 [2006-2010]
• COST 2102 [2006-2010]
• COST 277 [2001-2005]

A3lab collaborations
Academia/research centers (formal): University of Illinois at Chicago (USA), South China University of Technology (China), Fondazione Bruno Kessler (Italy), Università La Sapienza (Italy), Digital Enterprise Research Institute (Ireland), Texas Instruments European University Program (equipment donations received from 2010 to date).
Academia/research centers (informal, active Erasmus links): Riken Institute (Japan), University of Stirling (UK), University of Windsor (Canada), University of Aachen (Germany), Fraunhofer Institute (Germany), Escola Universitaria Politecnica de Mataró (Spain), Aalto University (Finland), Technical University of Munich (Germany), and others.
Companies/enterprises: Texas Instruments, Thales, Thomson, HP, Roland Europe, KORG, Faital, CMT, FBT, Indesit, Radvision Italia, Proietti Planet, AYT, Microhard, Imolinfo, NET7, and more.
~100 scientific papers published in the last 5 years.

A3lab research fields
Audio Rendering
• Optimize the listening experience according to the characteristics of the acoustic environment and the user's needs
Speech-interfaced Systems
• Systems in which speech is used to enable a certain service
Digital Music
• Digital music processing

Audio Rendering: overview
• Multichannel equalization
• Wave-field synthesis and analysis
• Reverberation
• 3D audio
• Acoustic echo cancellation
• Active noise cancellation

Audio Rendering: Multichannel Equalization
• Multipoint: enlarge the sweet spot using several measurement points around the listener
• Frequency domain: reduce the computational complexity of real-time approaches
• Fixed: compensate small environments
• Adaptive: track time-varying environments

Audio Rendering: Wave-field Synthesis and Analysis
Reproduction systems based on stereo or multichannel techniques are designed to obtain an optimal acoustic sensation in only one point of the environment (the sweet spot).
• Wave Field Synthesis (WFS) implements sound-field reproduction techniques based on loudspeaker arrays.
• Wave Field Analysis (WFA) implements sound-field recording techniques based on microphone arrays.

Audio Rendering: Digital Effects - Reverberation
Reverberation is probably the most widely used audio effect, employed by musicians during live performances and recording sessions.
Hybrid reverberator: a combined approach that uses measured impulse responses for the early reflections and a synthetic IR for the late reflections.

Audio Rendering: 3D Audio
Advanced audio spatializer; the system is composed of two parts:
• a sound rendering system based on a crosstalk canceller
• a listener position tracking system based on Kinect

Audio Rendering: Acoustic Echo Cancellation
Stereo acoustic echo cancellers (SAECs) have become essential with the spread of multichannel systems, which were introduced to deliver more realistic performance in terms of speaker localization.
Decorrelation, which weakens the linear relationship between the two input channels, must be introduced in order to obtain good echo cancellation.
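The decorrelation step above is often realized with a simple nonlinear preprocessor. Below is a minimal sketch, assuming the classic half-wave-rectifier nonlinearity from the SAEC literature; the slide does not specify which decorrelation method the lab actually uses.

```python
import numpy as np

def decorrelate_stereo(x_left, x_right, alpha=0.5):
    """Weaken the linear relationship between the two channels by adding
    a half-wave-rectified component with opposite polarity per channel.
    alpha trades decorrelation strength against audible distortion."""
    left = x_left + alpha * (x_left + np.abs(x_left)) / 2.0     # positive half-wave
    right = x_right + alpha * (x_right - np.abs(x_right)) / 2.0  # negative half-wave
    return left, right

# Toy example: two perfectly correlated channels (same source, different gains)
rng = np.random.default_rng(0)
s = rng.standard_normal(4096)
xl, xr = s, 0.8 * s
yl, yr = decorrelate_stereo(xl, xr)
rho_before = np.corrcoef(xl, xr)[0, 1]  # 1.0: channels are linearly dependent
rho_after = np.corrcoef(yl, yr)[0, 1]   # strictly below 1 after the nonlinearity
```

With alpha around 0.5 the added distortion is reported in the literature to be barely audible on speech, while the inter-channel coherence drops enough for the stereo adaptive filters to converge toward the true echo paths.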
Audio Rendering: Active Noise Cancellation
Based on sound-field modification by destructive wave interference, i.e., the superposition principle.
• A real-time feedback system applied to real noise recorded on a yacht
• Quiet zone close to the pillows: microphones and loudspeakers positioned near the bed

Speech-interfaced Systems: overview
• Distributed speech-based systems for smart homes
• Pre-processing framework for speech-interfaced systems
• Speech reinforcement

Distributed Speech-based System for Smart Homes
Main issues
– Distributed system for the recognition of building-automation vocal commands and of distress calls for emergency-state detection
– Two functional units: CMPU (Central Management and Processing Unit) and LMCU (Local Multimedia Control Unit)
ITAAL corpus
– 20 people involved (10 men and 10 women)
– Headset and distant microphones
– Home-automation commands and distress calls in Italian

Advancements
– Small-vocabulary speech recognizers (based on the i-vector paradigm)
– Vocal effort classification (see figure)
– Seamless integration of a sound identification and novelty detection module
[Block diagram: neutral speech corpus (APASCI) → GMM training → UBM; neutral and shout templates → supervector extraction → SVM training → model → SVM classification → vocal effort; LMCU with vocal effort classifier]

Speech-interfaced Tabletop
Fostering group conversations by visualizing suitable stimuli on the tabletop display.
Stimuli can be floating words and/or pictures.
Stimuli are related to the topic of the discussion; topics are obtained by capturing spoken keywords.
• Perception: captures the ongoing situation around the table (status of the system, conversation keywords)
• Interpretation: infers the topic of the conversation from recognized keywords and predefined topics
• Presentation: dynamically selects stimuli according to the status and the topic of the conversation

Pre-processing Framework for Speech-interfaced Systems
A pre-processing framework composed of three cooperating modules in cascade:
• Speaker Diarization: pilots the other two stages by informing them who is speaking; noise-robust implementation (GMM training, feature extraction, identification by majority vote)
• Blind Channel Identification: the source-microphone IRs are blindly identified when one single speaker is active
• Speech Dereverberation: reverberation is compensated directly on the SIMO systems obtained from the original MIMO one, and the original sources are yielded as output
[Block diagram: microphone signals x1(k)...xN(k) → speaker diarization → blind channel identification → speech dereverberation → demultiplexer → separated speakers SPK1...SPKm]

Speech Reinforcement
Speech reinforcement (SR) techniques aim to increase speech intelligibility in adverse environments where communication is difficult.
• An SR system is composed of at least one microphone, an amplifier, and a loudspeaker.
• Acoustic feedback occurs due to the acoustic coupling between the microphone and the loudspeaker.
• Suitable algorithms are needed: PEM-AFROW-based solutions are adopted in this case.
• Implementation on embedded systems and application in real environments.
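At the core of acoustic feedback cancellation is adaptive identification of the loudspeaker-to-microphone coupling path. Here is a minimal NLMS sketch over a hypothetical toy feedback path; PEM-AFROW additionally prefilters the signals with a prediction-error model so that adaptation also works on correlated speech, which plain NLMS does not handle well.

```python
import numpy as np

def nlms_identify(x, d, taps=32, mu=0.5, eps=1e-8):
    """Estimate the impulse response between loudspeaker signal x and
    microphone signal d with the normalized LMS algorithm."""
    w = np.zeros(taps)      # adaptive filter coefficients
    buf = np.zeros(taps)    # most recent loudspeaker samples
    for n in range(len(x)):
        buf = np.roll(buf, 1)
        buf[0] = x[n]
        e = d[n] - w @ buf                      # cancellation error
        w += mu * e * buf / (buf @ buf + eps)   # normalized update
    return w

# Hypothetical feedback path and white excitation (a stand-in for speech)
rng = np.random.default_rng(1)
h = np.concatenate(([0.0, 0.5, -0.3, 0.2], np.zeros(28)))
x = rng.standard_normal(20000)
d = np.convolve(x, h)[: len(x)]  # what the microphone picks up
w = nlms_identify(x, d)
misalignment = np.linalg.norm(w - h) / np.linalg.norm(h)  # small after convergence
```

Subtracting the estimated feedback component from the microphone signal then allows more amplifier gain before the system starts howling, which is exactly the speech-reinforcement problem described above.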
Speech Reinforcement: automotive scenario
Application to the automotive dual-channel communication scenario: two acoustic feedback and echo cancellation problems to solve.

Digital Music: overview
• Music information retrieval
• Digital music effects
• Music synthesis
• Wireless music

Music Information Retrieval: Acoustic Onset Detection
– Data-driven algorithm developed in collaboration with the Technical University of Munich (Germany)
– Hybrid feature extraction module based on linear prediction in the wavelet domain and MFCCs
– Detector based on Bidirectional Long Short-Term Memory (BLSTM) recurrent neural networks
– Improvements over the recent state of the art
[Pipeline: x[n] → framing/windowing → DWPT (coif5, decomposition level 8, 25 bands) → band energies (WPEC) → logarithm → deltas (win = 2) → neural network (RNN/BRNN/LSTM/BLSTM, with input, forget, and output gates) → onset detection function → thresholding and peak-picking → onsets]

Digital Music Effects: Virtual Acoustic Feedback
– In collaboration with Aalto University (Finland)
– Nonlinear digital oscillator with a second-order peaking filter in the feedback path
– Pitch tracking algorithm (SNAC) included to adaptively select the input tone
– Wave digital triode nonlinearity included to improve realism
– Advancements: rise time, compressor, smoothing, gain pedal
– PureData patch implemented on different processors

Digital Music Effects: Ibrida
– PureData tool for sound hybridization
– Wavelet-domain based
– Dynamic morphing driven by automatic onset detection
– OSC controllable
Digital Music Effects: Speech-driven Wah-wah
– Tuning the wah-wah effect by means of voice commands
– Low-complexity speech feature extraction
– Implementation on a commercial DSP

Music Synthesis
Physical Model of the Clavinet
– In collaboration with Aalto University (Finland)
– Recording and analysis of the instrument's behavior (tone spectral characteristics, attack and decay, inharmonicity, spectrum ripple, beating, amplifier and tone switches)
– Digital-waveguide-based computational model of the Clavinet string

Wireless Music: Wireless MUsic Studio (We-MUST)
– HW/SW platform for wireless music
– Based on the PureData and JackTrip open-source software
– Single-link latency down to 4 ms
– Developments currently ongoing (automatic device discovery and adaptive resampling)
Application example
– BeagleBoards (BB) are used to process and send/receive the audio streams
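To give a flavor of the digital waveguide modelling used for the Clavinet string above, here is the classic Karplus-Strong plucked-string algorithm, the simplest waveguide string. This is an illustrative stand-in, not the published Clavinet model, which adds dedicated loss/dispersion filters, pickup, and amplifier stages.

```python
import numpy as np

def karplus_strong(freq, dur, fs=44100, seed=0):
    """Minimal plucked-string waveguide: a delay line of ~fs/freq samples
    (the string) with a two-point averaging lowpass in the feedback loop
    (the losses), excited by a noise burst (the pluck)."""
    delay = int(round(fs / freq))  # loop length sets the pitch
    buf = np.random.default_rng(seed).uniform(-1.0, 1.0, delay)
    out = np.empty(int(dur * fs))
    for n in range(len(out)):
        out[n] = buf[n % delay]
        # feedback: average two consecutive samples -> frequency-dependent decay
        buf[n % delay] = 0.5 * (buf[n % delay] + buf[(n + 1) % delay])
    return out

tone = karplus_strong(220.0, 0.5)  # half a second of a plucked A3
```

The averaging filter has unit gain only at DC, so high partials die out first and the tone decays naturally, which is the basic mechanism a full waveguide string model refines.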