Voice Quality Monitoring - Broadband Technology 2000 Ltd

Signal Processing Technologies in Voice over IP
Eli Shoval
AudioCodes
© 2006 AudioCodes Ltd.
All rights reserved.
AudioCodes Confidential Proprietary
Scope
• The purpose of this presentation is to provide an
overview of the speech processing technologies that are
used in AudioCodes VoIP products
Outline
• Signal processing technologies in VoIP
– Line Echo Cancellation
– Acoustic Echo Cancellation
– Speech Compression
– Wideband Speech Compression
– Background Noise Reduction
– Voice Quality Monitoring
Main challenges in VoIP system design
• Bandwidth Efficiency
– Need a vocoder with the lowest bit rate that still gives voice quality
suitable for the application
– Different vocoders suit different networks (LAN, WAN, wireless)
• Inherent IP network problems
– Jitter
– Delay
– Packet Loss
• Voice Quality issues and enhancements
– Echo
– Background Noise
– Gain adjustment
Main challenges in VoIP system design (cont'd)
• Handling of non-speech signals
– Fax
– Data modems
– Caller ID
– DTMF
• Interoperability – VoIP equipment must be able to
communicate smoothly with equipment of other vendors
• Implementation efficiency
Basic DSP processing in VoIP
[Figure: block diagram of the DSP processing between the PCM highway interface and the host port interface. Blocks shown: input gain, echo canceler, in-band signaling detector, CAS signaling detector, voice/fax/data discriminator, speech encoder, fax relay Tx, data relay Tx, fax/modem bypass Tx, speech decoder, fax relay Rx, data relay Rx, fax/modem bypass Rx, in-band signaling generator, CAS signaling generator, output gain. Signals shown: voice signal, CAS signal, host data.]
Line/Electrical Echo phenomena
• Line echo exists in both TDM and IP networks because of leakage in the
2-wire/4-wire hybrid transformers
• In the PSTN network: the echo exists but is not perceptible (it
is masked)
• In the IP network: the echo is perceptible because of the added IP
delay
TDM Networks vs. IP Networks
[Figure: two call diagrams between Telephone A and Telephone B, each showing the Tx and Rx paths, acoustical echo at both telephones, and echo points E1 to E4]
• Regular TDM network: the two telephones are connected through the TDM network
• IP network: gateways GW A and GW B connect the telephones through the IP network, and the IP delay is added to the PSTN-side delay
Basic structure of Echo Canceler
[Figure: echo canceler (EC) block diagram. Far-end side: x[n]. Near-end side: s[n], v[n]. Other signals: y[n], ŷ[n], e[n], o[n]. Blocks shown: hybrid (echo path h), DC remover, echo filter estimation (h), echo filtering (h), control circuit, adder, NLP.]
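The estimation and filtering blocks in the diagram can be illustrated with a normalized LMS (NLMS) adaptive FIR filter. This is a minimal sketch, not the AC49x/AC50x algorithm: the slides do not name the adaptation method, and NLMS is only a common textbook choice. It assumes x[n] is the far-end reference, y[n] is the signal returning from the hybrid, and e[n] is the residual after the estimate ŷ[n] is subtracted. The function name, the tap count, the step size and the test echo path are all hypothetical.

```python
import math
import random


def nlms_echo_canceller(far_end, near_end, num_taps=32, mu=0.5, eps=1e-6):
    """Return (residual e[n], final tap estimates) for an NLMS FIR echo canceller."""
    weights = [0.0] * num_taps   # running estimate of the echo path h
    history = [0.0] * num_taps   # latest far-end samples, newest first
    residual = []
    for x, y in zip(far_end, near_end):
        history = [x] + history[:-1]
        y_hat = sum(w * u for w, u in zip(weights, history))  # echo filtering
        e = y - y_hat                                         # e[n] = y[n] - y_hat[n]
        power = sum(u * u for u in history) + eps             # normalization term
        step = mu * e / power
        weights = [w + step * u for w, u in zip(weights, history)]  # echo filter estimation
        residual.append(e)
    return residual, weights


def energy(samples):
    return sum(v * v for v in samples)


# Hypothetical echo path: 8 samples of pure delay followed by 3 active taps.
random.seed(0)
true_path = [0.0] * 8 + [0.5, -0.3, 0.1]
far = [random.uniform(-1.0, 1.0) for _ in range(4000)]
echo = [sum(h * far[n - k] for k, h in enumerate(true_path) if n - k >= 0)
        for n in range(len(far))]

# No near-end speech here, so the near-end input is the echo alone.
residual, estimate = nlms_echo_canceller(far, echo)
erle_db = 10 * math.log10(energy(echo[-500:]) / (energy(residual[-500:]) + 1e-30))
print("echo attenuation over the last 500 samples: %.1f dB" % erle_db)
```

The sketch adapts on every sample. A practical canceller would freeze or slow adaptation while the near-end talker is active, which is the double-talk problem the deck raises under the challenges in echo cancellation.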
Example of a sparse FIR with 3 active
windows to handle 3 hybrids
[Figure: echo path impulse response with three active windows, R1, R2 and R3]
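A sparse FIR of this kind can be sketched as a set of short coefficient windows, each placed at its own delay, so that only the active taps cost multiplications. The window positions and coefficients below are hypothetical values chosen for illustration. They are not taken from the figure.

```python
def sparse_fir(x, windows):
    """Filter x with a sparse FIR given as (start_delay, coefficients) windows."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for start, coeffs in windows:
            for k, c in enumerate(coeffs):
                idx = n - start - k          # tap at delay start + k
                if idx >= 0:
                    acc += c * x[idx]
        y.append(acc)
    return y


def dense_taps(windows, length):
    """Expand the windows into the equivalent full-length tap vector."""
    taps = [0.0] * length
    for start, coeffs in windows:
        for k, c in enumerate(coeffs):
            taps[start + k] += c
    return taps


# Three hypothetical active windows standing in for R1, R2 and R3.
windows = [(10, [0.5, -0.2]), (100, [0.25, 0.1]), (400, [0.125])]
span = 512
impulse = [1.0] + [0.0] * (span - 1)

response = sparse_fir(impulse, windows)
active_taps = sum(len(coeffs) for _, coeffs in windows)
print("active taps: %d of %d" % (active_taps, span))
```

The impulse response of the sparse filter matches the dense tap vector, while each output sample needs only the active taps instead of the full span.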
G.168 Test 2B - convergence
Challenges in Echo Cancellation
• Double talk can cause the adaptive filter to diverge –
adaptation in the AC49x/AC50x EC is robust to double talk
• Nonlinearity in the echo path cannot be modeled by the
linear FIR – the AC49x/AC50x EC has a proprietary NLP to
reduce the residual echo
• NLP attenuation can modulate the background
noise level – the AC49x/AC50x EC supports the injection of
comfort noise to overcome this issue
• The echo path can change during the call – AC49x/AC50x
EC adaptation is fast once such a change is detected
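The interaction between NLP attenuation and comfort noise can be sketched with a simple threshold suppressor. This is not the proprietary AC49x/AC50x NLP: the function name, the threshold and the fixed noise level are assumptions made for illustration. A real design would also estimate the comfort noise level from the actual background noise.

```python
import random


def nlp_with_comfort_noise(residual, far_end_active, threshold=0.02,
                           noise_level=0.005, rng=None):
    """Replace small residual echo with comfort noise while the far end is active."""
    rng = rng or random.Random(0)
    out = []
    for e, active in zip(residual, far_end_active):
        if active and abs(e) < threshold:
            # Small residual during far-end activity: treat it as echo and
            # substitute low-level noise so the line does not go silent.
            out.append(rng.gauss(0.0, noise_level))
        else:
            # Large signal (likely near-end speech) or idle far end: pass through.
            out.append(e)
    return out


residual = [0.01, -0.015, 0.3, 0.01]
far_end_active = [True, True, True, False]
cleaned = nlp_with_comfort_noise(residual, far_end_active)
print(cleaned)
```

Without the noise substitution the background would switch between its natural level and silence each time the NLP engages, which is the modulation effect described in the list above.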
Acoustic Echo Cancellation
• A similar problem to line echo cancellation, with some
additional complications:
– Longer echo paths, less sparse than the line echo path
– Worse ERL, which can even be negative, so howling must be handled
– Worse SNR due to the hands-free interface
– Rapid echo path changes
– Higher nonlinearity in the echo path due to loudspeaker characteristics
– Needed at both 8 kHz and 16 kHz sampling rates
• The AC494/AC495 has an acoustic EC that can handle
hands-free communication in IP phones.
Speech Compression
• Interoperability is a key issue in VoIP communication
systems, so vocoders are usually standardized
• AudioCodes products support a wide range of vocoders
• Transcoding between different vocoders is supported
Narrowband Vocoders
• G.711 – the most basic vocoder, 64 kbps
• G.726 – ADPCM, 32 kbps
• G.729A – the most popular LBR vocoder, 8 kbps
• G.723.1 – developed by AudioCodes, 6.3 kbps, same
quality as G.729 at a lower bit rate
• iLBC – very robust to packet loss, royalty-free, 13 kbps
• AMR – used in UMTS, 4.75-12.2 kbps
• EVRC – used in CDMA, 8.55 kbps
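The bit rates above are codec payload rates only. As a rough illustration of what they mean on an IP network, the sketch below adds the per-packet IPv4, UDP and RTP headers (20 + 8 + 12 bytes). The 20 ms packet interval is an assumption rather than a figure from the slides, and layer-2 overhead and header compression are ignored.

```python
import math

IP_UDP_RTP_HEADER_BYTES = 20 + 8 + 12   # IPv4 + UDP + RTP, no header compression


def ip_bandwidth_kbps(codec_kbps, packet_ms=20):
    """IP-layer bandwidth of one voice stream for a given packet interval."""
    payload_bytes = math.ceil(codec_kbps * packet_ms / 8)   # kbps * ms = bits
    packet_bytes = payload_bytes + IP_UDP_RTP_HEADER_BYTES
    return packet_bytes * 8 / packet_ms                     # bits per ms = kbps


for name, rate in [("G.711", 64), ("G.726", 32), ("G.729A", 8)]:
    print("%-7s %5.1f kbps payload -> %5.1f kbps on IP" % (name, rate, ip_bandwidth_kbps(rate)))
```

Under these assumptions the header overhead is a fixed 16 kbps per stream, so it dominates the total for low bit rate vocoders.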
Next generation speech compression: Wideband Vocoders
• Bandwidth: 50 Hz to 7 kHz, 16-bit samples, 16 kHz sampling rate (vs. 300 Hz to
3.4 kHz, PCM, 8 kHz sampling rate in NB speech)
• Substantially higher MOS quality
• Superior clarity
• Better Intelligibility (esp. in noise)
• Richer sound
• Similar bit rates (& cost) as NB
• Better speaker recognition (important in conferencing)
• Better quality with music signals
Comparison of NB and WB codecs
Source: ITU-T G.729.1 performance tests
Comparison of NB and WB codecs
• Listeners perceive wideband speech as much higher in
quality than narrowband speech; the difference is large, 1 point
or more on the wideband MOS scale (4.5 compared to 3.5)
• The MOS difference is even more dramatic when
comparing a current narrowband codec such as G.729 @ 8
kbps with a modern wideband codec such as G.729.1 @ 32
kbps
Wide Band Vocoders in AC49x/AC50xx
• G.722 – sub-band coding ADPCM @ 48, 56, 64 kbps,
used in some high-end conferencing systems
• G.722.2 AMR-WB – ACELP @ 12-32 kbps, used in
UMTS networks
• G.729.1 – CELP @ 8-32 kbps, used for VoIP
• G.711-WB @ 96 kbps, used for VoIP (*)
• RTA – Microsoft proprietary vocoder (*)
• Speex – royalty-free vocoder for internet applications
• SILK – Skype proprietary vocoder (*)
• (*) – roadmap
Background Noise Reduction
• A new feature planned for the AC494 6.6 release
• Used for improved hands-free communication in IP
phones
• Optimal filtering is applied in each frequency bin to suppress
the background noise with minimal effect on speech
Noise Reduction Block Diagram
[Figure: block diagram with the blocks FFT, noise energy estimation, speech energy estimation, estimate optimal gain, IFFT]
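The blocks in the diagram can be sketched with a Wiener-style gain computed per frequency bin. The slides say only that the filtering is "optimal", so the Wiener gain S / (S + N), the gain floor, the frame size and all function names here are assumptions. The sketch uses a naive DFT to stay self-contained and processes frames without windowing or overlap-add, which a real implementation would need.

```python
import cmath
import math
import random


def dft(frame):
    """Naive O(N^2) DFT; a real implementation would use an FFT."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse of dft(), keeping the real part of each output sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]


def estimate_noise_energy(noise_frames):
    """Average |X[k]|^2 over frames assumed to contain only noise."""
    spectra = [dft(frame) for frame in noise_frames]
    return [sum(abs(s[k]) ** 2 for s in spectra) / len(spectra)
            for k in range(len(spectra[0]))]


def suppress_noise(frame, noise_energy, gain_floor=0.1):
    """Scale each frequency bin of one frame by a floored Wiener-style gain."""
    cleaned = []
    for value, noise in zip(dft(frame), noise_energy):
        speech = max(abs(value) ** 2 - noise, 0.0)      # speech energy estimate
        total = speech + noise
        gain = speech / total if total > 0.0 else 1.0   # Wiener gain S / (S + N)
        cleaned.append(max(gain, gain_floor) * value)   # floor limits artifacts
    return idft(cleaned)


def snr_db(clean, processed):
    error = sum((p - c) ** 2 for p, c in zip(processed, clean))
    return 10 * math.log10(sum(c * c for c in clean) / error)


random.seed(1)
N = 64
tone = [math.sin(2 * math.pi * 4 * t / N) for t in range(N)]   # stand-in for speech


def noise_frame():
    return [random.gauss(0.0, 0.1) for _ in range(N)]


noise_energy = estimate_noise_energy([noise_frame() for _ in range(20)])
noisy_frames = [[s + v for s, v in zip(tone, noise_frame())] for _ in range(10)]
snr_in = sum(snr_db(tone, f) for f in noisy_frames) / len(noisy_frames)
snr_out = sum(snr_db(tone, suppress_noise(f, noise_energy)) for f in noisy_frames) / len(noisy_frames)
print("mean SNR in: %.1f dB, out: %.1f dB" % (snr_in, snr_out))
```

Bins dominated by the tone keep a gain close to 1, while bins that contain mostly noise are pushed toward the floor, which is the "minimal effect on speech" trade-off the slide describes.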
Noise Reduction Demo
• Male with car noise, SNR = 12 dB
– Noisy
– NR
• Male with car noise, SNR = 6 dB
– Noisy
– NR
• Female with office noise, SNR = 18 dB
– Noisy
– NR
Voice Quality Monitoring
• Telchemy VQmon algorithm – estimates MOS from
packet arrival statistics
• RTCP-XR – a standard packet format that carries the
quality parameters
• MOS-CQ – conversational MOS, which also takes into account
the influence of echo and delay on quality
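VQmon itself is proprietary, so the sketch below does not reproduce it. It shows only the standard ITU-T G.107 (E-model) mapping from a transmission rating factor R to an estimated MOS, which is a common way to turn network impairments into a MOS figure. The helper `r_factor`, its simplified subtraction of impairments, and the impairment values in the example are hypothetical. 93.2 is the commonly quoted default R when no impairments are present.

```python
def r_to_mos(r):
    """Map an E-model rating factor R to an estimated MOS (ITU-T G.107)."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1.0 + 0.035 * r + 7e-6 * r * (r - 60.0) * (100.0 - r)


def r_factor(delay_impairment, equipment_impairment, base_r=93.2):
    """Hypothetical simplification: subtract impairment terms from a base R."""
    return base_r - delay_impairment - equipment_impairment


# Listening-quality style estimate: codec and packet loss impairment only.
mos_lq = r_to_mos(r_factor(delay_impairment=0.0, equipment_impairment=20.0))
# Conversational estimate: the same call with a delay/echo impairment added.
mos_cq = r_to_mos(r_factor(delay_impairment=10.0, equipment_impairment=20.0))
print("MOS-LQ estimate: %.2f, MOS-CQ estimate: %.2f" % (mos_lq, mos_cq))
```

In this simplified view the conversational score can only be equal to or lower than the listening score, because delay and echo enter as an additional impairment.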
Voice Quality Monitoring
Telchemy VQmon accuracy vs. MOS
Summary
• We described some of the challenges in implementing
speech processing algorithms in practical VoIP products
• We described the solutions as implemented in
AudioCodes AC490x/AC50x VoIP processor products
Thank you for your time