Seminar Presentation: Adaptive Multi-Rate Wideband Speech Codec Deployment in 3G Core Network
Sergei Hyppenen
Supervisor: Professor Sven-Gustav Häggman
HELSINKI UNIVERSITY OF TECHNOLOGY
11.04.2006
© 2006 Nokia AMRWB_depl.ppt / 2006-04-11 / SHy

Contents of the presentation
• Abbreviations
• Introduction
• AMR-WB speech codec
• Network architectures: GSM and 3G (Release 4)
• Speech transmission
• TrFO and TFO
• Out-of-Band Transcoder Control in TrFO
• TFO frames
• Lawful interception
• Signal interception simulation
• Test results: Noise floor values
• Test results: MOS quality values
• Conclusions

Abbreviations
• 3G: 3rd Generation
• ACELP: Algebraic Code-Excited Linear Prediction
• AMR-WB: Adaptive Multi-Rate Wideband speech codec
• ATM: Asynchronous Transfer Mode
• BSS: Base Station Subsystem
• CN: Core Network
• dB: decibel
• dBov: dB relative to the overload point of the digital system
• DTX: Discontinuous Transmission
• EDGE: Enhanced Data rates for Global Evolution
• G.711: PCM-based coding method with 8 kHz sampling frequency and 8-bit A- or µ-law companding
• GSM: Global System for Mobile Communications
• HR: Half Rate speech codec
• IP: Internet Protocol
• LSB: Least Significant Bit
• MOS: Mean Opinion Score, rated 1-5
• NSS: Network Sub-System
• OoBTC: Out-of-Band Transcoder Control
• TC: Transcoder
• TDM: Time Division Multiplexing
• TFO: Tandem Free Operation
• TrFO: Transcoder Free Operation
• UMTS: Universal Mobile Telecommunications System
• VAD: Voice Activity Detection
• WB-PESQ: a tool for quality evaluation [ITU-T: P.862]

Introduction
• Speech contains frequencies up to 10 kHz
• Current fixed and mobile telecommunication systems operate with a narrow audio bandwidth: 300-3400 Hz (ITU-T G.711)
• 500-3000 Hz is sufficient for understanding
• The sampling frequency used in digital core networks is 8000 Hz → in theory this enables transmitting signals up to
4000 Hz
• Codecs used in mobile systems lower the quality of narrowband speech even further than G.711 does
• The AMR-WB speech codec improves the quality and especially the naturalness of speech
• In EDGE and UMTS all coding modes of AMR-WB will be used; in GSM only the coding modes up to 12.65 kb/s

AMR-WB speech codec
• Processed band: 50-7000 Hz
• Sampling: 16 kHz
• Precision: 14-bit
• Coding model: ACELP
• VAD and DTX
• Bad frame handler
• Bit rates: 6.60, 8.85, 12.65, 14.25, 15.85, 18.25, 19.85, 23.05, 23.85 kb/s
• Coding mode 12.65 kb/s produces better quality than G.711 (64 kb/s)
[Figure: four panels over a time axis: original speech, A-law coded speech, HR coded speech, AMR-WB coded speech]

Network architectures: GSM and 3G (Release 4)
• GSM: the Transcoder (TC) is a part of the Base Station Subsystem (BSS)
• In the core Network Sub-System (NSS) speech signals are transferred in G.711 form
[Figure: GSM network architecture. Labels: MS (ME + SIM), Um, BTS, Abis, BSC, Ater, TC, A, MSC, VLR, GMSC, HLR, AuC, EIR, TDM, PSTN/ISDN, other PLMN, O&M, NMS; grouped into BSS and NSS]
• 3G, Release 4: the Core Network (CN) is divided into Packet Switched (PS) and Circuit Switched (CS) domains
• The CS domain is separated into a Control Plane (signaling) and a User Plane (data)
• The TC is moved to the core network, but the most common scheme for transferring speech in the CN is still G.711
[Figure: 3G Release 4 network architecture. Labels: GERAN (MS, Um, BTS, Abis, BSC), UTRAN (UE, Uu, Node-B, Iub, RNC), Ater/Iu, Iu, Gb; CN CS domain (MSC Server, MSS/GCS, BICC CS-2 / SIP-T / ISUP, Mc H.248, MGW with TC, Nb over TDM/IP/ATM); CN PS domain (SGSN, GGSN, Internet); PSTN/ISDN, other PLMN, network management (NMS)]

Speech transmission
• In current telecommunication systems transcoding is performed at least twice
• In core networks speech signals are transferred in narrowband G.711 form, and each one-way connection requires a 64 kb/s channel
[Figure: GSM speech path in uplink and downlink directions. Labels: MS, BTS, Abis, BSC, Ater, TC, A, MSC, TDM; coded signal (EFR / FR / HR) at 22.8 kb/s and 16 kb/s; G.711 at 64 kb/s; encoding and decoding in the TCs]
• Wideband speech cannot be transferred using the same technique
• It would require a 16 kHz × 14 bit = 224 kb/s connection, which is unacceptably high
• → wideband speech should be transferred only in coded form

TrFO and TFO
• Transcoder Free Operation (TrFO) transfers coded speech frames as such in ATM- and IP-based networks
• Transcoder-free means that the same codec is used on both sides of a connection → Out-of-Band Transcoder Control (OoBTC) is needed
• OoBTC requires the late assignment of a radio traffic channel with forward bearer establishment in the CN (see the next slide for details)
• In Tandem Free Operation (TFO) coded frames are merged into the least significant bits (LSBs) of PCM-based signals
• TFO is used in TDM networks
• The TFO protocol negotiates a common codec with the distant partner by sending messages in-band
• Message bits replace every 16th LSB
• Once both mobile terminals have switched to a compatible codec, the coded speech frames can be merged into the PCM-based stream that was decoded from those same frames

Out-of-Band Transcoder Control in TrFO
• In TrFO the codec to be used during the call has to be negotiated before the bearer establishment procedures
[Figure: two signaling sequences between UE, O-RNC, O-MSC-S, O-MGW, T-MSC-S, T-MGW, T-RNC and UE: one for early assignment of a radio traffic channel with backward bearer establishment in CN, one for late assignment of a radio traffic channel with forward bearer establishment in CN. Messages shown: SETUP, IAM, IAM + Bearer Information, Bearer Information, Paging, Bearer establishment, Iu UP Initialization, Nb UP Initialization, ALERTING, CONNECT]

TFO frames 1
• When TFO is operational, 1, 2 or 4 LSBs of every 8-bit PCM sample are replaced by TFO frames
• TFO frames requiring replacement of 4 LSBs consist of the main frame part (1st and 2nd LSBs) and the extension frame part (3rd and 4th LSBs)
• During transmission through the core network TFO frames must not be modified by noise suppression, level control or other enhancement algorithms
[Figure: bit layout of the 8k, 16k and 32k TFO frames over 160 PCM samples (bits 8 to 1). 8k TFO frame: 1 LSB per sample, frame length 160 bits. 16k TFO frame: 2 LSBs per sample, frame length 320 bits. 32k TFO frame: 4 LSBs per sample (main frame part and extension frame part), frame length 640 bits. The remaining sample bits are unaltered]

TFO frames 2
• TFO frames are different for each codec and, for a multi-rate codec, for each coding mode
• TFO frames contain synchronization bits, control and error correction bits, time alignment bits, spare bits and the actual data bits
• Synchronization and control bits are used only in the main part
• On the right is an example of the TFO frames specified for AMR-WB in the 23.85 kb/s coding mode

Lawful interception
• Before an operator may launch a commercial telecommunication network, it has to provide the lawful interception service
• The quality provided for the authorities has to be the same as or better than the quality provided for the monitored target
• PCM-based intercepted signals are directed to the authorities as such
• Coded signals are converted into PCM form
• What to do if the intercepted signal contains TFO frames?
After all, the signal is noisy
• The solution is to use the passive TFO protocol
• But how bad is the noise really?

Signal interception simulation
• Theoretical noise floor values were calculated on the assumption that every bit in the signal representation adds 6 dB to the dynamic range of the signal
• In the tests the scheme presented on the right was simulated
• The results were verified by sending silence through the test system
• The MOS quality values of the speech signals were also evaluated using the WB-PESQ tool
[Figure: simulated interception scheme. Blocks: radio interface, encoder, decoder, downsampler, G.711 converter, local TFO, distant TFO, transit network, passive TFO, interface towards authorities. Numbered signal points (2a and 2b are decoded outputs):]
1. Original wideband signal
2. Once transcoded wideband signal
3. Pure narrowband G.711 signal
4. Narrowband G.711 signal with possible embedded TFO frames

Test results: Noise floor values
Effective bits in the linear level representation: A-law 12, µ-law 13. Noise floor values are in dBov.

Law      Corrupted bits in    Corrupted bits     Unaltered bits     Calculated    Measured
         G.711 sample         in linear value    in linear value    (approx)      (exact)
A-law    0                    0                  12                 -72           -72.26
A-law    every 16th LSB       2                  10                 -60           -71.21
A-law    1                    2                  10                 -60           -64.77
A-law    2                    3                  9                  -54           -59.47
A-law    4                    5                  7                  -42           -47.59
µ-law    0                    0                  13                 -78           -78.26
µ-law    every 16th LSB       2                  11                 -66           -76.47
µ-law    1                    2                  11                 -66           -74.74
µ-law    2                    3                  10                 -60           -66.44
µ-law    4                    5                  8                  -48           -51.42

• The linear representation of A-law has 13 bits and that of µ-law 14 bits. The first bit is the sign bit and is not counted among the effective bits of the representation
• In theory only half of the bits are really replaced → the measured noise floor values are lower than the calculated ones

Test results: MOS quality values

Signal file    Decoded (2a)    G.711 (3)    G.711+TFO (4)    Decoded TFO (2b)
T04            3.9             3.1          1.7              3.6
T05            4.1             3.9          1.8              3.8
T14            3.7             3.4          1.8              3.6
T18            3.7             2.9          2.1              3.6
Average        3.9             3.3          1.9              3.7

• The level of the original signals was -26 dBov and the SNR 45 dB
• Signals decoded from TFO frames (2b) differ slightly from the directly decoded ones (2a), because the TFO protocol needs approximately 1 second to establish a connection. During that time no coded speech frames are sent

Conclusions
• The SNR values of the intercepted signals with AMR-WB-specific TFO frames were 15-25 dB (original signals at -26 dBov) and the MOS grades were below two
• If the original signals had contained noise from the beginning, as is usual in real phone calls, the quality would have been lower
• With lower-level test signals, -30 and -36 dBov, which correspond to intensive whispering in real-world calls, the results would have been even worse
• → the authorities will not be satisfied with the quality of the intercepted signal
• → the passive TFO protocol is indeed needed
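
The 6 dB-per-bit assumption behind the calculated noise floor values, and the 16 kHz × 14 bit requirement mentioned under Speech transmission, can be reproduced with a short sketch. This is an illustration under stated assumptions, not the simulation used in the tests: the function and constant names are hypothetical, and the mapping from replaced G.711 LSBs to corrupted linear bits is copied from the noise floor results rather than derived.

```python
# Illustrative sketch only; all names are hypothetical and the constants
# are taken from the slides, not from any codec or TFO library.

DB_PER_BIT = 6.0  # slide assumption: each bit adds about 6 dB of dynamics

# Effective (non-sign) bits of the linear representation, per the slides.
EFFECTIVE_BITS = {"A-law": 12, "mu-law": 13}

# Replaced LSBs in the 8-bit G.711 sample -> corrupted bits in the linear
# value. Copied from the results table (assumption: not derived here).
CORRUPTED_LINEAR_BITS = {0: 0, 1: 2, 2: 3, 4: 5}


def calculated_noise_floor_dbov(law: str, replaced_lsbs: int) -> float:
    """Approximate noise floor: -6 dB for every unaltered linear bit."""
    unaltered = EFFECTIVE_BITS[law] - CORRUPTED_LINEAR_BITS[replaced_lsbs]
    return -DB_PER_BIT * unaltered


def raw_pcm_bit_rate_kbps(sample_rate_hz: int, bits_per_sample: int) -> float:
    """Bit rate of uncompressed PCM in kb/s."""
    return sample_rate_hz * bits_per_sample / 1000


if __name__ == "__main__":
    for law in EFFECTIVE_BITS:
        for lsbs in CORRUPTED_LINEAR_BITS:
            floor = calculated_noise_floor_dbov(law, lsbs)
            print(f"{law}, {lsbs} replaced LSB(s): {floor:.0f} dBov")
    print("Narrowband G.711:", raw_pcm_bit_rate_kbps(8000, 8), "kb/s")
    print("Wideband 16 kHz x 14 bit:", raw_pcm_bit_rate_kbps(16000, 14), "kb/s")
```

The function returns the calculated (approximate) values only, for example -54 dBov for A-law with 2 replaced LSBs. The measured values reported in the slides are lower.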