Seminar Presentation: Adaptive Multi-Rate Wideband Speech Codec
deployment in 3G Core Network
Sergei Hyppenen
Supervisor: Professor Sven-Gustav Häggman
HELSINKI UNIVERSITY OF TECHNOLOGY
11.04.2006
© 2006 Nokia
AMRWB_depl.ppt / 2006-04-11 / SHy
Contents of the presentation
• Abbreviations
• Introduction
• AMR-WB speech codec
• Network architectures: GSM and 3G (Release 4)
• Speech transmission
• TrFO and TFO
• Out-of-Band Transcoder Control in TrFO
• TFO frames
• Lawful interception
• Signal interception simulation
• Test results: Noise floor values
• Test results: MOS quality values
• Conclusions
Abbreviations
• 3G: 3rd Generation
• ACELP: Algebraic Code-Excited Linear Prediction
• AMR-WB: Adaptive Multi-Rate Wideband speech codec
• ATM: Asynchronous Transfer Mode
• BSS: Base Station Subsystem
• CN: Core Network
• dB: decibel
• dBov: dB relative to the overload point of the digital system
• DTX: Discontinuous Transmission
• EDGE: Enhanced Data rates for Global Evolution
• G.711: PCM-based coding method with 8 kHz sampling frequency and 8-bit A-law or µ-law companding
• GSM: Global System for Mobile Communications
• HR: Half Rate speech codec
• IP: Internet Protocol
• LSB: Least Significant Bit
• MOS: Mean Opinion Score, rated on a scale of 1 to 5
• NSS: Network Sub-System
• OoBTC: Out-of-Band Transcoder Control
• TC: Transcoder
• TDM: Time Division Multiplexing
• TFO: Tandem Free Operation
• TrFO: Transcoder Free Operation
• UMTS: Universal Mobile Telecommunications System
• VAD: Voice Activity Detection
• WB-PESQ: a tool for quality evaluation [ITU-T P.862]
Introduction
• Speech contains frequencies up to about 10 kHz
• Current fixed and mobile telecommunication systems operate with a narrow audio bandwidth: 300-3400 Hz (ITU-T G.711)
• 500-3000 Hz is sufficient for understanding speech
• The sampling frequency used in digital core networks is 8000 Hz → in theory this allows transmitting signals up to 4000 Hz
• Codecs used in mobile systems lower the quality of narrowband speech even more than G.711 does
• The AMR-WB speech codec improves the quality and especially the naturalness of speech
• In EDGE and UMTS all coding modes of AMR-WB will be used, in GSM only the coding modes up to 12.65 kb/s
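The 4000 Hz figure above follows from the Nyquist limit (half the sampling frequency). A minimal sketch of that arithmetic is given below; the function name is only illustrative, and the 16 kHz value is the AMR-WB sampling rate quoted on the next slide.

```python
def max_signal_frequency_hz(sampling_rate_hz: float) -> float:
    """Highest frequency representable at a given sampling rate (Nyquist limit)."""
    return sampling_rate_hz / 2.0


# 8000 Hz is the sampling frequency of digital core networks (G.711).
narrowband_limit_hz = max_signal_frequency_hz(8000)
# 16000 Hz is the AMR-WB sampling frequency.
wideband_limit_hz = max_signal_frequency_hz(16000)

print(narrowband_limit_hz, wideband_limit_hz)
```

With 16 kHz sampling the limit is 8000 Hz, which leaves room for the 50-7000 Hz band that AMR-WB processes.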
AMR-WB speech codec
• Processes the 50-7000 Hz audio band
• Sampling: 16 kHz
• Precision: 14-bit
• Coding model: ACELP
• VAD and DTX
• Bad frame handler
• Bit rates: 6.60, 8.85, 12.65, 14.25, 15.85, 18.25, 19.85, 23.05, 23.85 kb/s
• Coding mode 12.65 kb/s produces better quality than G.711 (64 kb/s)
[Figure: four panels plotted against time: original speech, A-law coded speech, HR coded speech, AMR-WB coded speech]
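To give a feel for the listed bit rates, the sketch below converts each coding mode into the number of bits per speech frame. It assumes a 20 ms frame, which comes from the AMR-WB codec standard and is not stated on the slide; the names are illustrative only.

```python
# Assumption: 20 ms speech frames (from the AMR-WB standard, not from the slide).
FRAME_DURATION_MS = 20

# Coding modes listed on the slide, in kb/s.
AMR_WB_MODES_KBPS = [6.60, 8.85, 12.65, 14.25, 15.85, 18.25, 19.85, 23.05, 23.85]


def bits_per_frame(bit_rate_kbps: float, frame_ms: int = FRAME_DURATION_MS) -> int:
    """kb/s multiplied by milliseconds gives bits."""
    return round(bit_rate_kbps * frame_ms)


frame_sizes = {rate: bits_per_frame(rate) for rate in AMR_WB_MODES_KBPS}
g711_bits_per_20ms = bits_per_frame(64.0)

for rate, bits in frame_sizes.items():
    print(f"{rate:5.2f} kb/s -> {bits} bits per {FRAME_DURATION_MS} ms frame")
print(f"G.711 at 64 kb/s -> {g711_bits_per_20ms} bits per {FRAME_DURATION_MS} ms")
```

Under this assumption the 12.65 kb/s mode uses about a fifth of the bits that G.711 needs for the same 20 ms of speech.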
Network architectures: GSM and 3G (Release 4)
• GSM: the Transcoder (TC) is a part of the Base Station Subsystem (BSS)
• In the core Network Sub-System (NSS), speech signals are transferred in G.711 form
[Figure: GSM architecture. BSS: MS (ME and SIM), Um, BTS, Abis, BSC, Ater, TC. NSS: A, MSC, VLR, GMSC, HLR, AuC, EIR, TDM, PSTN/ISDN, other PLMN. O&M: NMS]
• 3G, Release 4: the Core Network (CN) is divided into Packet Switched (PS) and Circuit Switched (CS) domains
• The CS domain is separated into Control Plane (signaling) and User Plane (data)
• The TC has moved to the core network, but G.711 is still the most common scheme for transferring speech in the CN
[Figure: 3G Release 4 architecture. GERAN: MS, Um, BTS, Abis, BSC, Ater/Iu, Gb. UTRAN: UE, Uu, Node-B, Iub, RNC, Iu. CN CS domain: MSC Server (MSS/GCS), MGW with TC, Mc (H.248), Nb (TDM/IP/ATM), BICC CS-2 / SIP-T / ISUP, PSTN/ISDN, other PLMN. CN PS domain: SGSN, GGSN, Internet. Network Management: NMS]
Speech transmission
• In current telecommunication systems transcoding is performed at least twice
• In core networks speech signals are transferred in narrowband G.711 form, and each one-way connection requires a 64 kb/s channel
[Figure: GSM speech path. Uplink direction: MS (encoding; EFR / FR / HR coded signal, 22.8 kb/s), BTS, Abis (16 kb/s), BSC, Ater (16 kb/s), TC (decoding to G.711), A (64 kb/s), MSC. Between the MSCs: TDM, G.711. Downlink direction: MSC, A (64 kb/s), TC (encoding from G.711), Ater, BSC, Abis, BTS, MS (decoding)]
• Wideband speech cannot be transferred using the same technique
• It would require a connection speed of 16 kHz × 14 bit = 224 kb/s, which is unacceptably high
• → wideband speech should be transferred only in coded form!
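The channel rates quoted above are simply sampling frequency times bits per sample. A small sketch of that calculation follows; the function name is illustrative, and the comparison with the highest AMR-WB mode (23.85 kb/s) uses the bit rate list from the codec slide.

```python
def pcm_bit_rate_kbps(sampling_rate_hz: int, bits_per_sample: int) -> float:
    """Bit rate of an uncompressed one-way PCM stream in kb/s."""
    return sampling_rate_hz * bits_per_sample / 1000.0


g711_rate_kbps = pcm_bit_rate_kbps(8000, 8)            # narrowband G.711
wideband_pcm_rate_kbps = pcm_bit_rate_kbps(16000, 14)  # uncoded wideband speech

HIGHEST_AMR_WB_MODE_KBPS = 23.85
ratio = wideband_pcm_rate_kbps / HIGHEST_AMR_WB_MODE_KBPS

print(f"G.711: {g711_rate_kbps} kb/s")
print(f"Uncoded wideband: {wideband_pcm_rate_kbps} kb/s")
print(f"Uncoded wideband needs {ratio:.1f} times the highest AMR-WB mode")
```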
TrFO and TFO
• Transcoder Free Operation (TrFO) transfers coded speech frames as such in ATM- and IP-based networks
• Transcoder-free means that the same codec is used on both sides of a connection → Out-of-Band Transcoder Control (OoBTC) is needed
• OoBTC requires the late assignment of a radio traffic channel with forward bearer establishment in the CN (see the next slide for details)
• In Tandem Free Operation (TFO) coded frames are embedded into the least significant bits (LSB) of PCM-based signals
• TFO is utilized in TDM networks
• The TFO protocol negotiates a common codec with the distant partner by sending messages in-band
• Message bits replace the LSB of every 16th PCM sample
• When both mobile terminals have switched to a compatible codec, coded speech frames can be embedded into the PCM-based stream that was decoded from those same frames
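The in-band signalling described above can be sketched as follows. This is only an illustration of overwriting the LSB of every 16th 8-bit PCM sample: the choice of which sample in each group of 16 carries the bit, the sample value and the message bit pattern are assumptions, not the actual TFO message format.

```python
def embed_message_bits(pcm_samples: bytes, message_bits: list, step: int = 16) -> bytes:
    """Overwrite the LSB of every `step`-th 8-bit PCM sample with one message bit."""
    out = bytearray(pcm_samples)
    for i, bit in enumerate(message_bits):
        pos = i * step  # assumption: the first sample of each group carries the bit
        if pos >= len(out):
            break
        out[pos] = (out[pos] & 0xFE) | (bit & 1)
    return bytes(out)


def extract_message_bits(pcm_samples: bytes, count: int, step: int = 16) -> list:
    """Read back the LSBs written by embed_message_bits."""
    return [pcm_samples[i * step] & 1 for i in range(count) if i * step < len(pcm_samples)]


samples = bytes([0xD5] * 160)             # arbitrary sample value, 160 samples
message = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]  # hypothetical bit pattern
marked = embed_message_bits(samples, message)
print(extract_message_bits(marked, len(message)))
```

Only one sample in 16 can change, and only in its least significant bit, which is why the message channel disturbs the speech signal much less than full TFO frames do.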
Out-of-Band Transcoder Control in TrFO
• In TrFO, the codec to be used during the call has to be negotiated before the bearer establishment procedures
[Figure: two message sequence charts between UE, O-RNC, O-MSC-S, O-MGW, T-MSC-S, T-MGW, T-RNC and UE. Left: early assignment of a radio traffic channel with backward bearer establishment in CN. Right: late assignment of a radio traffic channel with forward bearer establishment in CN. Messages shown: SETUP, IAM, IAM + Bearer Information, Bearer Information, Paging, Bearer establishment, Iu UP Initialization, Nb UP Initialization, ALERTING, CONNECT]
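The idea of agreeing on one codec before the bearer is set up can be illustrated with a very small selection routine. This is a simplified sketch, not the 3GPP OoBTC procedure: it assumes the originating side offers a priority-ordered codec list and the terminating side picks the first entry it also supports. The codec lists below are hypothetical.

```python
from typing import List, Optional, Set


def select_common_codec(offered_in_priority_order: List[str],
                        supported_by_peer: Set[str]) -> Optional[str]:
    """Return the first offered codec that the peer also supports, or None."""
    for codec in offered_in_priority_order:
        if codec in supported_by_peer:
            return codec
    return None


# Hypothetical capability lists, for illustration only.
offered = ["AMR-WB", "AMR", "EFR"]
wideband_peer = {"AMR-WB", "AMR", "FR"}
narrowband_peer = {"AMR", "EFR", "FR"}
legacy_peer = {"FR"}

print(select_common_codec(offered, wideband_peer))
print(select_common_codec(offered, narrowband_peer))
print(select_common_codec(offered, legacy_peer))
```

When no common codec exists (the `None` case), the call cannot be transcoder-free and the speech has to be transcoded in the network.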
TFO frames 1
• When TFO is operational, 1, 2 or 4 LSBs of every 8-bit PCM sample are replaced by TFO frame bits
• TFO frames requiring replacement of 4 LSBs consist of the main frame part (1st and 2nd LSBs) and the extension frame part (3rd and 4th LSBs)
• During transmission through the core network, TFO frames should not be modified by noise suppression, level control or other enhancement algorithms
[Figure: embedding of TFO frames into the bits (8 ... 1) of 160 consecutive PCM samples. 8k TFO frame: 1 LSB per sample, TFO frame length = 160 bits. 16k TFO frame: 2 LSBs per sample, TFO frame length = 320 bits. 32k TFO frame: 4 LSBs per sample, TFO frame length = 640 bits, split into the main frame part and the extension frame part. The remaining sample bits are unaltered]
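The frame lengths above follow from 160 samples times the number of replaced LSBs. The sketch below computes them and shows one possible way of writing frame bits into the LSBs. The bit ordering is an assumption made for illustration; in particular it does not model the split into main and extension frame parts.

```python
SAMPLES_PER_FRAME = 160  # PCM samples spanned by one TFO frame (from the slide)


def tfo_frame_length_bits(lsbs_per_sample: int) -> int:
    """160 bits for 1 LSB, 320 bits for 2 LSBs, 640 bits for 4 LSBs."""
    return SAMPLES_PER_FRAME * lsbs_per_sample


def embed_tfo_frame(pcm_samples: bytes, frame_bits: list, lsbs_per_sample: int) -> bytes:
    """Replace the lowest `lsbs_per_sample` bits of each sample with frame bits.

    Assumption: frame bits are consumed sample by sample, most significant
    replaced bit first. The real mapping is defined by the TFO specification.
    """
    if lsbs_per_sample not in (1, 2, 4):
        raise ValueError("TFO replaces 1, 2 or 4 LSBs")
    if len(pcm_samples) != SAMPLES_PER_FRAME:
        raise ValueError("expected 160 PCM samples")
    if len(frame_bits) != tfo_frame_length_bits(lsbs_per_sample):
        raise ValueError("frame length does not match the number of replaced LSBs")
    keep_mask = 0xFF & ~((1 << lsbs_per_sample) - 1)  # upper bits stay unaltered
    out = bytearray()
    for n, sample in enumerate(pcm_samples):
        value = 0
        for bit in frame_bits[n * lsbs_per_sample:(n + 1) * lsbs_per_sample]:
            value = (value << 1) | (bit & 1)
        out.append((sample & keep_mask) | value)
    return bytes(out)


pcm = bytes([0xD0] * SAMPLES_PER_FRAME)  # arbitrary sample value
frame = [1] * tfo_frame_length_bits(4)   # dummy 640-bit frame of all ones
embedded = embed_tfo_frame(pcm, frame, 4)
print([tfo_frame_length_bits(n) for n in (1, 2, 4)])
```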
TFO frames 2
• TFO frames are different for each codec and, in the case of a multi-rate codec, for each coding mode
• TFO frames contain synchronization bits,
control and error correction bits, time
alignment bits, spare bits and actual data
bits
• Synchronization and control bits are used
only in the main part
• On the right is an example of the TFO
frames specified for the AMR-WB, the
coding mode is 23.85 kb/s
Lawful interception
• Before an operator may launch a commercial telecommunication network, it has
to provide the lawful interception service.
• The quality provided to the authorities has to be the same as or better than the quality provided to the monitored target
• PCM-based intercepted signals are directed to the authorities as such
• Coded signals are converted into PCM form
• What should be done if the intercepted signal contains TFO frames? Such a signal is noisy
• The solution is to use the passive TFO protocol
• But how bad is the noise really?
Signal interception simulation
• Theoretical noise floor values were calculated on the assumption that each bit in the signal representation adds 6 dB to the dynamic range of the signal
• In the tests, the scheme presented in the figure was simulated
• The results were verified by sending silence through the testing system
• The MOS quality values of the speech signals were also evaluated using the WB-PESQ tool
[Figure: block diagram of the simulated interception scheme. Blocks: wideband speech input, encoder, radio interface, decoder, downsampler, G.711 converter, local TFO, transit network, distant TFO, passive TFO, interface towards authorities. Signal paths carry coded speech, G.711 and G.711 (+TFO). Measurement points are labelled 1, 2a, 2b, 3 and 4]
1. Original wideband signal
2. Once transcoded wideband signal
3. Pure narrowband G.711 signal
4. Narrowband G.711 signal with possible embedded TFO frames
Test results: Noise floor values
Noise floor in dBov ("-" marks cells with no value)

Law    Corrupted bits in  Corrupted bits in  Unaltered bits in  Calculated  Measured
       G.711 sample       linear values      linear values      (approx)    (exact)
A-law  0                  0                  12                 -72         -72.26
A-law  every 16th LSB     -                  -                  -           -71.21
A-law  1                  2                  10                 -60         -64.77
A-law  2                  3                  9                  -54         -59.47
A-law  4                  5                  7                  -42         -47.59
µ-law  0                  0                  13                 -78         -78.26
µ-law  every 16th LSB     -                  -                  -           -76.47
µ-law  1                  2                  11                 -66         -74.74
µ-law  2                  3                  10                 -60         -66.44
µ-law  4                  5                  8                  -48         -51.42

Effective bits in linear level representations: A-law 12, µ-law 13
• The linear representation of A-law uses 13 bits and that of µ-law 14 bits. The first bit is the sign bit, which is not counted among the effective bits
• In theory only half of the replaced bits actually change value → the measured noise floor values are lower than the calculated ones
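The calculated column can be reproduced with the 6 dB per bit rule (the exact figure per bit is 20·log10(2), about 6.02 dB). The sketch below applies the rule to the unaltered bit counts and compares the result with the measured values. The row data are taken from the slide's table as read from the flattened layout; names are illustrative.

```python
import math

DB_PER_BIT_EXACT = 20 * math.log10(2)  # about 6.02 dB per bit
DB_PER_BIT = 6                         # rounded value used for the calculated column


def calculated_noise_floor_dbov(unaltered_bits: int) -> int:
    """Approximate noise floor when `unaltered_bits` effective bits remain."""
    return -DB_PER_BIT * unaltered_bits


# (law, corrupted LSBs in G.711 sample, unaltered linear bits, measured dBov)
ROWS = [
    ("A-law", 0, 12, -72.26),
    ("A-law", 1, 10, -64.77),
    ("A-law", 2, 9, -59.47),
    ("A-law", 4, 7, -47.59),
    ("mu-law", 0, 13, -78.26),
    ("mu-law", 1, 11, -74.74),
    ("mu-law", 2, 10, -66.44),
    ("mu-law", 4, 8, -51.42),
]

for law, lsbs, bits, measured in ROWS:
    calculated = calculated_noise_floor_dbov(bits)
    print(f"{law:6s} {lsbs} LSB: calculated {calculated} dBov, "
          f"measured {measured} dBov, difference {measured - calculated:+.2f} dB")
```

Every measured value lies at or below the calculated one, which is consistent with the remark that only part of the replaced bits actually change.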
Test results: MOS quality values
Signal file  Decoded (2a)  G.711 (3)  G.711+TFO (4)  Decoded TFO (2b)
T04          3.9           3.1        1.7            3.6
T05          4.1           3.9        1.8            3.8
T14          3.7           3.4        1.8            3.6
T18          3.7           2.9        2.1            3.6
Average      3.9           3.3        1.9            3.7
• The level of the original signals was -26 dBov and the SNR was 45 dB
• Signals decoded from TFO frames (2b) differ slightly from the directly decoded ones (2a), because the TFO protocol needs approximately 1 second to establish a connection. During that time no coded speech frames are sent
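As a consistency check, the per-file MOS values can be averaged and compared with the reported averages. The sketch assumes the column-wise reading of the table (T18 = 3.7, 2.9, 2.1, 3.6 and Average = 3.9, 3.3, 1.9, 3.7); a tolerance is used because both the inputs and the reported averages are rounded to one decimal.

```python
MOS = {
    "T04": {"Decoded (2a)": 3.9, "G.711 (3)": 3.1, "G.711+TFO (4)": 1.7, "Decoded TFO (2b)": 3.6},
    "T05": {"Decoded (2a)": 4.1, "G.711 (3)": 3.9, "G.711+TFO (4)": 1.8, "Decoded TFO (2b)": 3.8},
    "T14": {"Decoded (2a)": 3.7, "G.711 (3)": 3.4, "G.711+TFO (4)": 1.8, "Decoded TFO (2b)": 3.6},
    "T18": {"Decoded (2a)": 3.7, "G.711 (3)": 2.9, "G.711+TFO (4)": 2.1, "Decoded TFO (2b)": 3.6},
}

REPORTED_AVERAGE = {
    "Decoded (2a)": 3.9, "G.711 (3)": 3.3, "G.711+TFO (4)": 1.9, "Decoded TFO (2b)": 3.7,
}


def column_average(column: str) -> float:
    """Mean MOS of one signal type over all test files."""
    return sum(row[column] for row in MOS.values()) / len(MOS)


averages = {column: column_average(column) for column in REPORTED_AVERAGE}
# MOS lost when TFO frames are embedded into the G.711 signal.
mos_loss_from_tfo = averages["G.711 (3)"] - averages["G.711+TFO (4)"]

for column, value in averages.items():
    print(f"{column}: computed {value:.3f}, reported {REPORTED_AVERAGE[column]}")
print(f"MOS loss caused by embedded TFO frames: {mos_loss_from_tfo:.2f}")
```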
Conclusions
• The SNR values of the intercepted signals with AMR-WB-specific TFO frames were 15-25 dB (original signals at -26 dBov) and the MOS grades were below two
• If the original signals had contained noise from the start, as is usual in real phone calls, the quality would have been lower
• With lower-level test signals, -30 and -36 dBov, which correspond to intensive whispering in real-world calls, the results would have been even worse
• → the authorities will not be satisfied with the quality of the intercepted signal
• → the passive TFO protocol is indeed needed!
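The effect of the signal level can be estimated roughly as signal level minus noise floor. The sketch below is only a back-of-the-envelope estimate, not the measurement method used in the tests. It assumes that the TFO frames replace 4 LSBs and uses the measured noise floors for that case from the noise floor slide.

```python
def estimated_snr_db(signal_level_dbov: float, noise_floor_dbov: float) -> float:
    """Rough SNR estimate: distance between signal level and noise floor."""
    return signal_level_dbov - noise_floor_dbov


# Measured noise floors with 4 corrupted LSBs (values from the noise floor slide).
NOISE_FLOOR_4_LSB_DBOV = {"A-law": -47.59, "mu-law": -51.42}

# -26 dBov was used in the tests; -30 and -36 dBov are the lower levels
# mentioned in the conclusions.
SIGNAL_LEVELS_DBOV = (-26, -30, -36)

estimates = {
    (law, level): estimated_snr_db(level, floor)
    for law, floor in NOISE_FLOOR_4_LSB_DBOV.items()
    for level in SIGNAL_LEVELS_DBOV
}

for (law, level), snr in estimates.items():
    print(f"{law:6s} at {level} dBov: estimated SNR {snr:.2f} dB")
```

Each dB of reduction in signal level removes one dB of estimated SNR, which is why quieter speech would give even worse results.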