TSB116

TIA/EIA TELECOMMUNICATIONS SYSTEMS BULLETIN

Telecommunications
IP Telephony Equipment
Voice Quality Recommendations for IP Telephony

TSB116
MARCH 2001

TELECOMMUNICATIONS INDUSTRY ASSOCIATION
The Telecommunications Industry Association Represents the Communications Sector of

NOTICE

TIA/EIA Engineering Standards and Publications are designed to serve the public interest through eliminating misunderstandings between manufacturers and purchasers, facilitating interchangeability and improvement of products, and assisting the purchaser in selecting and obtaining with minimum delay the proper product for his particular need. Existence of such Standards and Publications shall not in any respect preclude any member or nonmember of TIA/EIA from manufacturing or selling products not conforming to such Standards and Publications, nor shall the existence of such Standards and Publications preclude their voluntary use by those other than TIA/EIA members, whether the standard is to be used either domestically or internationally.

Standards and Publications are adopted by TIA/EIA in accordance with the American National Standards Institute (ANSI) patent policy. By such action, TIA/EIA does not assume any liability to any patent owner, nor does it assume any obligation whatever to parties adopting the Standard or Publication.

This Standard does not purport to address all safety problems associated with its use or all applicable regulatory requirements. It is the responsibility of the user of this Standard to establish appropriate safety and health practices and to determine the applicability of regulatory limitations before its use.

(From Project No. 4689, formulated under the cognizance of the TIA TR-41.1 Subcommittee on Multiline Terminal Systems.)
Published by
TELECOMMUNICATIONS INDUSTRY ASSOCIATION 2001
Standards and Technology Department
2500 Wilson Boulevard
Arlington, VA 22201

PRICE: Please refer to current Catalog of EIA ELECTRONIC INDUSTRIES ALLIANCE STANDARDS and ENGINEERING PUBLICATIONS or call Global Engineering Documents, USA and Canada (1-800-854-7179) International (303-397-7956)

All rights reserved
Printed in U.S.A.

PLEASE! DON'T VIOLATE THE LAW!

This document is copyrighted by the TIA and may not be reproduced without permission. Organizations may obtain permission to reproduce a limited number of copies through entering into a license agreement. For information, contact:

Global Engineering Documents
15 Inverness Way East
Englewood, CO 80112-5704
or call U.S.A. and Canada 1-800-854-7179, International (303) 397-7956

TIA/EIA/TSB116

TABLE OF CONTENTS

1. INTRODUCTION
   1.1. SUMMARY OF VOICE QUALITY RECOMMENDATIONS FOR IP TELEPHONY
2. REFERENCES
3. DEFINITIONS, ABBREVIATIONS AND ACRONYMS
   3.1. WAVEFORM CODEC AND SPEECH COMPRESSION CODEC
   3.2. ABBREVIATIONS AND ACRONYMS
4. THE E-MODEL
   4.1. TRANSMISSION RATING FACTOR “R”
   4.2. IP TELEPHONY IMPAIRMENTS AND THE E-MODEL
        4.2.1. Delay
        4.2.2. Echo
        4.2.3. Speech Compression
        4.2.4. Packet Loss
   4.3. WHAT DOES R SOUND LIKE?
   4.4. E-MODEL SYMMETRY AND PERFORMANCE
   4.5. E-MODEL ENHANCEMENTS
   4.6. E-MODEL CONVENTIONS
5. WIRELINE PSTN VOICE QUALITY BENCHMARKS
   5.1. ISDN VOICE QUALITY
   5.2. PSTN VOICE QUALITY
   5.3. TOLL COMPRESSION VOICE QUALITY
   5.4. WIRELINE PSTN VOICE QUALITY SUMMARY
6. IP TELEPHONY VOICE QUALITY ANALYSIS
   6.1. VOICE QUALITY ISSUES FOR IP TELEPHONY
   6.2. VOICE QUALITY RECOMMENDATIONS FOR IP TELEPHONY
        6.2.1. Delay
        6.2.2. Speech Compression
        6.2.3. Packet Loss
        6.2.4. Transcoding
        6.2.5. Tandeming
        6.2.6. New Gateway Loss Plan
ANNEX A (INFORMATIVE) – VOIP END-TO-END DELAY BUDGET PLANNING FOR PRIVATE NETWORKS
   A.1. VOIP END-TO-END DELAY SOURCES OVERVIEW
   A.2. VOIP END-TO-END DELAY SOURCE DEFINITIONS
   A.3. VOIP END-TO-END DELAY BUDGET CASE 1
   A.4. VOIP END-TO-END DELAY BUDGET CASE 2

FOREWORD

(This foreword is not part of this standard.)

This document is a TIA/EIA Telecommunications Technical Services Bulletin (TSB) produced by Working Group TR-41.1.2 of Committee TR-41. This TSB was developed in accordance with TIA/EIA procedural guidelines, and represents the consensus position of the Working Group and its parent subcommittee TR-41.1, which served as the formulating group.

The TR-41.1.2 VoIP Voice Quality Working Group acknowledges the contribution made by the following individuals in the development of this standard.
Name               Representing
Roger Britt        Nortel Networks (Chair/Editor)
Mark Armstrong     Nortel Networks
Dermot Kavanagh    Nortel Networks
Peter Melton       eOn Communications
Kirit Patel        Cisco Systems

Copyrighted parts of ITU-T Appendix I to Recommendation G.113 and Recommendation G.114 are used with permission of the ITU. The ITU owns the copyright for the ITU Recommendations.

The one annex in this Standard is informative and is not considered part of this Standard.

Suggestions for improvement of this standard are welcome. They should be sent to:

Telecommunications Industry Association
Engineering Department
Suite 300
2500 Wilson Boulevard
Arlington, VA 22201

1. Introduction

The objectives of this TSB are to provide end-to-end voice quality guidelines for North American IP Telephony and to provide an E-Model tutorial for IP scenarios. IP Telephony introduces several impairments, some of which are familiar and some new. The E-Model (ITU-T Recommendation G.107) is a tool that can estimate the end-to-end voice quality, taking the IP Telephony parameters and impairments into account. This TSB first describes how the E-Model handles IP Telephony impairments and then provides general design recommendations for the best possible voice quality performance irrespective of cost, available technology or customer requirements. These recommendations are illustrated with specific IP scenarios to provide an E-Model tutorial for analyzing real networks.

Because IP telephony is initially a replacement technology for the existing wireline PSTN, the focus of this document is on wireline scenarios. The impairments introduced by wireline IP packet technology can be significant. The reader should be aware that wireless and satellite technologies also introduce significant impairments and that only a few of the combined effects are illustrated here.
This TSB builds on similar work done for North American PBX private networks that was published in TIA/EIA/TSB32-A, and the focus of this document remains on providing guidelines for engineered private networks as opposed to the Internet.

The E-Model scenarios detailed in this TSB are available as two Microsoft Excel workbooks: TSB116NETEM1.xls and TSB116NETEM2.xls. Each workbook includes Version 19 of the E-Model application, which the reader can use to model other scenarios. Download these workbooks at: http://www.tiab2b.com/whitepapers.cfm?manufacturer=Nortel

1.1. Summary of Voice Quality Recommendations for IP Telephony

Section 6 uses the E-Model to develop the following IP Telephony voice quality recommendations.

Delay Rec. #1: Use G.711 end-to-end because it has the lowest Ie-value and therefore allows more delay for a given voice quality level.
Delay Rec. #2: Minimize the speech frame size and the number of speech frames per packet.
Delay Rec. #3: Actively minimize jitter buffer delay.
Delay Rec. #4: Actively minimize one-way delay.
Delay Rec. #5: Accept the E-Model results, which permit longer delays for low Ie-value codecs, like G.711, for a given R-value; see Figure 22 and Figure 27.
Delay Rec. #6: Use priority scheduling for voice-class traffic, as well as RTP header compression and data packet fragmentation on slow-speed links, to minimize the contribution of this variable delay source.
Delay Rec. #7: Avoid using slow serial links.

Speech Compression Rec. #1: Use G.711 unless the link speed demands compression.
Speech Compression Rec. #2: Speech compression codecs for wireless networks and packet networks must be rationalized to minimize transcoding issues.

Packet Loss Rec. #1: Keep (random) packet loss well below 1%.
Packet Loss Rec. #2: Use packet loss concealment with G.711.
Packet Loss Rec. #3: If other codecs are used, then use codecs that have built-in or add-on PLCs.
Packet Loss Rec. #4: New PLCs should be optimized for less than 1% of (random) packet loss.

Transcoding Rec. #1: Avoid transcoding where possible. Transcoding adds Ie and delay impairment.
Transcoding Rec. #2: For interoperability, IP gateways must support wireless codecs or IP must implement unified Transcoder Free Operation with wireless.

Tandeming Rec. #1: Avoid asynchronous tandeming if possible. Asynchronous tandeming adds Ie and delay impairment.
Tandeming Rec. #2: Synchronous tandeming of G.726 is generally permissible. The impairment is delay dependent, so long-delay DCME equipment should be avoided.

Loss Plan Rec. #1: Use TIA/EIA/TSB122-A, Voice Gateway Loss and Level Plan.

2. References

The following documents contain provisions that are referenced in this TSB. At the time of publication, the editions indicated were valid. All standards are subject to revision, and parties to agreements based on this Standard are encouraged to investigate the possibility of applying the most recent editions of the standards indicated below. ANSI and TIA maintain registers of currently valid national standards published by them.

WARNING: This document contains a reference to a work-in-progress – PN-3673 (to be published as ANSI/TIA/EIA-464-C), which is subject to change. The most current version of PN-3673 is available in the public directory of TR-41.1 at: http://ftp.tiaonline.org/tr-41/tr411/Public/Latest_Revision_of_PN-3673/

[1] ANSI/TIA/EIA-464-B (April 1996), Requirements for Private Branch Exchange (PBX) Switching Equipment.
[2] ANSI/TIA/EIA-579-A (November 1998), Transmission Requirements for Digital Wireline Telephones.
[3] ANSI/TIA/EIA-810-A (December 2000), Transmission Requirements for Narrowband Voice over IP and Voice over PCM Digital Wireline Telephones.
[4] TIA/EIA/TSB32-A (December 1998), Overall Transmission Plan Aspects for Telephony in a Private Network.
[5] TIA/EIA/TSB122-A (March 2001), Voice Gateway Loss and Level Plan Guidelines.
[6] PN-3673 (to be published as ANSI/TIA/EIA-464-C), Requirements for PBX Switching Equipment.
[7] ANSI T1.521, American National Standard for Packet Loss Concealment with ITU-T Recommendation G.711.
[8] ITU-T Recommendation G.107 (12/98) and (05/00), The E-Model, A Computational Model for use in Transmission Planning.
[9] ITU-T Recommendation G.108 (2000), Conversational impacts on end-to-end speech transmission quality – a planning guide on effects not covered by the E-Model.
[10] ITU-T Recommendation G.109 (1999), Definition of categories of speech transmission quality.
[11] ITU-T Recommendation G.113 (02/96), Transmission impairments.
[12] ITU-T Appendix I to Recommendation G.113 (1998), Transmission impairments – Appendix I: Provisional planning values for the equipment impairment factor Ie.
[13] ITU-T Recommendation G.114 (05/00), General Recommendations on the transmission quality for an entire international telephone connection.
[14] CCITT Recommendation G.131 (08/96), Control of talker echo.
[15] ITU-T Recommendation G.175 (05/00), Transmission plan aspects of special circuits and connections using the international telephone connection network.
[16] ITU-T Recommendation G.177 (2000), Transmission planning for voiceband services over hybrid Internet/PSTN connections.
[17] CCITT Recommendation G.711 (11/88), Pulse code modulation (PCM) of voice frequencies.
[18] ITU-T Recommendation G.712 (11/96), Transmission performance characteristics of pulse code modulation.
[19] ITU-T Recommendation G.723.1 (03/96), Dual rate speech coder for multimedia communications transmitting at 5.3 and 6.3 kbit/s.
[20] ITU-T Recommendation G.726 (12/90), 40, 32, 24, 16 kbit/s Adaptive Differential Pulse Code Modulation (ADPCM).
[21] ITU-T Recommendation G.729 (03/96), Coding of speech at 8 kbit/s using conjugate-structure algebraic-code-excited linear-prediction (CS-ACELP).
[22] ITU-T Recommendation P.861 (02/98), Objective quality measurement of telephone-band (300-3400 Hz) speech codecs.
[23] ETSI EG 201 377-1 (1999), Specification and measurement of speech transmission quality; Part 1: Introduction to objective comparison measurement methods for one-way speech quality across networks.

3. Definitions, Abbreviations and Acronyms

3.1. Waveform Codec and Speech Compression Codec

A codec is a combination of an analog-to-digital encoder and a digital-to-analog decoder operating in opposite directions of transmission in the same equipment. A “waveform codec” preserves the waveform of the incoming signal, and operates on a sample-by-sample basis. G.711 and G.726 are examples of waveform codecs.

Modern low bit-rate coders work on a different principle than waveform codecs. They use a model of the human speech mechanism to encode and compress speech signals, based on the analysis of a frame of input samples. The speech model parameters are sent instead of the speech waveform. This document generally uses the term “speech compression codec” for this type of codec, but it also uses the term “vocoder”. Table 1 separates codecs into the waveform and speech compression categories.

3.2. Abbreviations and Acronyms

Abbreviations and acronyms, not in common usage, which appear in this TSB, are defined below.
ACELP   Algebraic-Code-Excited Linear-Prediction
CPE     Customer Premise Equipment
DCME    Digital Circuit Multiplication Equipment
DSL     Digital Subscriber Line
ECAN    Echo Canceller
EFR     Enhanced Full Rate
ERL     Echo Return Loss
ERLE    Echo Return Loss Enhancement
GoB     Good or Better (a quality rating derived from MOS)
GSM     Global System for Mobile Telecommunications
Ie      Equipment Impairment factor (an E-Model input parameter)
IP      Internet Protocol
ISDN    Integrated Services Digital Network
MOS     Mean Opinion Score
NETEM   Network Edge Technology E-Model
NLP     Nonlinear Processor
OLR     Overall Loudness Rating
PBX     Private Branch Exchange
PCM     Pulse Code Modulation
PL      Packet Loss
PLC     Packet Loss Concealment
PoW     Poor or Worse (a quality rating derived from MOS)
PSTN    Public Switched Telephone Network
qdu     Quantization Distortion Unit(s)
QoS     Quality of Service (refers to systems of tagging packets for priority of transmission)
RLR     Receive Loudness Rating
RTP     Real-time Transport Protocol
SLR     Send Loudness Rating
STMR    Sidetone Masking Rating
TCLw    Weighted Terminal Coupling Loss
TDM     Time Division Multiplexing
TELR    Talker Echo Loudness Rating
TFO     Tandem Free Operation
TrFO    Transcoder Free Operation
VAD     Voice Activity Detector
VoIP    Voice over Internet Protocol

4. The E-Model

The objectives for this section are to:

• Demonstrate the suitability of the E-Model for estimating the voice quality of IP (Internet Protocol) Telephony,
• Explain the R-scale used by the E-Model,
• Explain what some of the IP Telephony impairments sound like.

The E-Model is a transmission-planning tool for estimating the user satisfaction of a narrowband, handset conversation, as perceived by the listener. It is not intended for predicting absolute user satisfaction. Instead, the intent is to model the performance of an unknown connection relative to a connection with known performance. The E-Model has proven to be a versatile tool that has adapted well to the impairments of IP telephony.
This document assumes that the reader is familiar with the E-Model and the basics of the following standards.

• ITU-T Recommendation G.107 (the E-Model, including a program listing)
• ITU-T Recommendation G.108 (a tutorial on the E-Model and network planning)
• ITU-T Recommendation G.109 (defines categories of speech transmission quality)
• ITU-T Recommendation G.113 (details transmission impairments)
• ITU-T Appendix I to Rec. G.113 (table of the equipment impairment factor, Ie, values)
• ITU-T Recommendation G.114 (details delay, including the expected delay for IP codecs)
• ITU-T Recommendation G.131 (details talker echo)
• ITU-T Recommendation G.177 (provides guidelines for mixed IP/PSTN connections)
• TIA/EIA/TSB32-A and ETSI Guide 201 050 (a tutorial on the E-Model and a transmission planning guide for private networks).

Figure 1 – Comparison of E-Model Output Scales and Categories

R      MOS    %GoB    %PoW
94     4.4    98.4    0.1     (G.107 default value)
90     4.3    97.0    0.2
80     4.0    89.5    1.4
70     3.6    73.6    5.9
60     3.1    50.1    17.4
50     2.6    26.6    37.7
0      1.0    0       99.8

R range      User Satisfaction
90 to 100    Very Satisfied
80 to 90     Satisfied
70 to 80     Some Users Dissatisfied
60 to 70     Many Users Dissatisfied
50 to 60     Nearly All Users Dissatisfied
0 to 50      Not Recommended

4.1. Transmission Rating Factor “R”

The output of the E-Model is a scalar called the “Rating Factor”, the “R-value”, or simply R. The scale is typically from 50 to 100, where everything below 50 is clearly unacceptable and everything above 94.15 (the maximum with the G.107 E-Model, version 19 default values) is unobtainable in narrowband (300 to 3400 Hz) telephony. The scale on the left-hand side of Figure 1 illustrates this point. The center scale labeled “User Satisfaction” shows the categories defined in G.109. This gives an indication of the overall quality of the conversation.
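The User Satisfaction bands of Figure 1 can be written as a simple lookup. The following Python sketch is illustrative only: the function name is hypothetical, and assigning a value that falls exactly on a boundary (for example R = 80) to the higher category is an assumption, since the figure does not say which side a boundary belongs to.

```python
def satisfaction_category(r: float) -> str:
    """Map an E-Model R-value to the Figure 1 User Satisfaction band.

    Assumption: a value exactly on a boundary is placed in the
    higher (better) category.
    """
    # (lower bound of band, label), ordered from best to worst
    bands = [
        (90.0, "Very Satisfied"),
        (80.0, "Satisfied"),
        (70.0, "Some Users Dissatisfied"),
        (60.0, "Many Users Dissatisfied"),
        (50.0, "Nearly All Users Dissatisfied"),
    ]
    for lower_bound, label in bands:
        if r >= lower_bound:
            return label
    return "Not Recommended"


if __name__ == "__main__":
    for r in (94.15, 85.0, 72.5, 49.0):
        print(r, "->", satisfaction_category(r))
```

Used this way, the G.107 default value of 94.15 falls in the "Very Satisfied" band and anything below 50 is reported as "Not Recommended".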
It is important to note the distinction between E-Model objective results and the results of subjective studies that are expressed using the MOS (Mean Opinion Score), %GoB (percent Good or Better) or %PoW (percent Poor or Worse) scales. In subjective testing, subjects are requested to classify the perceived quality into categories (for example, a five point scale that includes the classifications excellent, good, fair, poor and bad). In each subjective experiment, the MOS scores may differ, even for the same condition, depending on the design of the experiment, the range of conditions included in the study, etc. E-Model results, however, are calculated using the Impairment Factor method in which impairment values along the speech path (such as loss, distortion, echo, delay, noise, etc.) are combined to obtain an overall transmission rating R, which is objective and repeatable. While R can be deterministically converted into MOS, %GoB or %PoW scores, it is preferable to avoid confusion and use only the R scale for all E-Model work. For reference, the MOS, %GoB and %PoW scales are shown on the right-hand side of Figure 1.

The E-Model consists of several models that relate specific impairment parameters and their interactions to end-to-end performance. The total end-to-end performance, taking into account all factors, is estimated using the Impairment Factor method, which is based on the principle that transmission impairments can be transformed into “Psychological Factors” and these factors are additive on the “Psychological Scale”.
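As an illustration of the deterministic R-to-MOS conversion mentioned above, the sketch below uses the mapping published in ITU-T Recommendation G.107 (Annex B). The formula is not given in this TSB; it is brought in here as an assumption about which conversion is meant, and the function name is hypothetical.

```python
def r_to_mos(r: float) -> float:
    """Convert an E-Model R-value to an estimated MOS.

    Assumption: uses the G.107 Annex B mapping, which is not
    reproduced in this TSB:
        MOS = 1                                          for R <= 0
        MOS = 1 + 0.035*R + R*(R - 60)*(100 - R)*7e-6    for 0 < R < 100
        MOS = 4.5                                        for R >= 100
    """
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6


if __name__ == "__main__":
    # Scale points taken from Figure 1; values are printed, not asserted.
    for r in (94.15, 90, 80, 70, 60, 50):
        print(f"R = {r:>5}: MOS ~ {r_to_mos(r):.2f}")
```

Rounded to one decimal place, the values produced for R = 90, 80, 70, 60 and 50 agree with the MOS column of Figure 1.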
The equation for the transmission rating factor R is:

R = Ro - Is - Id - Ie + A

Where:

• Ro, the basic signal-to-noise ratio based on send and receive loudness ratings and the circuit and room noise;
• Is, the sum of real-time or simultaneous speech transmission impairments, e.g., loudness levels, sidetone and PCM quantising distortion;
• Id, the sum of delayed impairments relative to the speech signal, e.g., talker echo, listener echo and absolute delay;
• Ie, the Equipment Impairment factor for special equipment, e.g., low bit-rate coding (determined subjectively for each codec and for each % packet loss and documented in Appendix I to G.113);
• A, the Advantage factor adds to the total and improves the R-value for new services, like satellite phones, to take into account the advantage of using a new service and to reflect acceptance of lower quality by users for such services. It is assumed that the Advantage Factor will be reduced over time as the service improves and the customers get used to the benefits of the new service. It is not recommended to include a non-zero Advantage Factor for IP telephony because it is a replacement for existing services, rather than a completely new service.

The Equipment Impairment factor and the Advantage factor are unique to the E-Model, but it is the Equipment Impairment factor that makes the E-Model a powerful tool for estimating the relative user satisfaction of IP Telephony conversations.

4.2. IP Telephony Impairments and the E-Model

The four main impairments of IP telephony are:

• Delay, including delay variation and jitter
• Echo
• Speech compression
• Packet loss.

The ability to handle these impairments is one of the strengths of the E-Model.

Figure 2 – Delay Impairment of Reference Connection
[Plot: R (50 to 100, with the User Satisfaction categories) versus One-way Delay (0 to 500 ms); single curve for TELR = 65 dB.]

4.2.1. Delay

The curve in Figure 2 plots the transmission rating factor R versus delay for the reference connection shown in Figure 3. The right-hand side of Figure 2 includes the “User Satisfaction” scale for reference. Graphing R (y-axis) against delay (x-axis) gives a clear picture of how important delay is in interactive voice telephony. The reference connection curve uses the G.107 default values for all parameters except the variable delay. This gives the best possible performance for a narrowband handset conversation, over this range of delay, and therefore will be used as the “relative reference” throughout this document. The connection consists of two ideal digital telephones with G.711 codecs and some means to vary network delay from 0 to 500 ms, as shown in Figure 3. The parametric variable, TELR (talker echo loudness rating), shown in the Figure 2 legend, is explained in the next section.

Notice that there is a knee on the curve at about 175 ms. The region between 150 and 200 ms is where the delay starts to affect the dynamics of a conversation. The steeper slope of the curve after 175 ms reflects the increasing degradation of the dynamics of normal conversation with increasing delay.

Why does this happen? In a normal face-to-face conversation, after one person speaks, there is about a 200 ms break, then the other person speaks, followed by another 200 ms break and so on. This is called turn taking. When delays on the communication channel become comparable to the turn-taking pauses, there is a loss of synchronicity in the conversation and normal turn taking rules start to break down. Often, when there is added delay, one person will start talking before the other person is finished or both people will start talking simultaneously, which causes the conversation to stop and restart.
If one person dominates the conversation, then the other person will have trouble breaking in, because the dominant talker has already started again before the break reaches the less dominant talker.

There are other effects as well. For instance, the extra delay can also change the message. People interpret hesitation as evidence of openness, honesty, or confidence. Suppose someone asks the question, “Will you marry me?” and the answer, “Yes”, is delayed by some noticeable amount. The delay may be interpreted as a hesitation to reply rather than the normal operation of codecs, jitter buffers and propagation delay. The medium can distort the message.

Figure 3 – Block Diagram and E-Model Parameters for Reference Connection
[Block diagram: Side A Digital Telephone, 0 dBr point, Side B Digital Telephone, with Echo Path - Side A and Echo Path - Side B.]

Title                                   Abbrev. (Default)
Electric Circuit Noise (at 0 dBr)       Nc (-70)
Room Noise                              Po (35)
Send Loudness Rating                    SLR (8)
Receive Loudness Rating                 RLR (2)
D-factor                                D (3)
Noise Floor                             Nfor (-64)
Sidetone Masking Rating                 STMR (15)
Equipment Impairment Factor             Ie (0)
Expectation (Advantage) Factor          A (0)
Mean One-Way Delay (upper)              Tu (0)
Mean One-Way Delay (lower)              Tl (0)
Mean One-Way Delay (upper = lower)      Tul (0)
Electrical Loss (upper)                 Lu
Electrical Loss (lower)                 Ll
Electrical Loss (upper = lower)         Lul
Quantizing Distortion Units (upper)     qduu (1)
Quantizing Distortion Units (lower)     qdul (1)
Echo Return Loss                        ERL

Values shown in each Digital Set column: 35, 8, 2, 3, -64, 15, 0, 0, 0, 0, 0, 0, 0, 55.
Values shown in the 0 dBr column: -70, 0 to 500.

4.2.2. Echo

The family of curves in Figure 4 shows the effect of echo as predicted by the E-Model for the connection shown in Figure 7. To fully understand the meaning of the graph it is necessary to take a step back and explain the parametric variable TELR.
First, the definition (based on Echo Path B in Figures 5 and 6):

TELR (Side B) = SLR (Side B) + Loss in bottom path + ERL or TCLw (Side A) + Loss in top path + RLR (Side B)

The TELR for Side A is similar, but follows the opposite path, as highlighted by the arrows below the block diagrams.

Figure 5 shows a connection with two 2-to-4-wire conversions and an analog-to-digital and digital-to-analog conversion. The 2-to-4-wire conversion is called a hybrid and the amount of echo that is reflected by the hybrid is called the transhybrid loss, the echo return loss or more simply the ERL. Transhybrid loss varies depending on the impedance matching of the circuits on each side of the hybrid.

Figure 6 shows an all-digital connection. In this case, the echo is due to acoustic or electrical coupling of the two voice paths at the terminal. The echo return loss is called weighted terminal coupling loss or TCLw. TCLw is leakage in the analog portion of the digital set, i.e., the analog circuits, capacitive coupling in the handset cord, mechanical coupling from the receiver to the transmitter in the handset or acoustical coupling from the receiver to the transmitter in the handset.

TELR is the sum of the losses around the loop, from one set’s transmitter back to the receiver on the same set. The SLR (send loudness rating) and RLR (receive loudness rating) are the loss values for the same telephone. In Figure 5, loss pads in the upper and lower paths control the loss plan. These pads may add gain, which increases the echo; they may add loss, which reduces the echo; or they may be neutral (0 dB) and have no effect on the echo. The loss plan for an all-digital connection is determined by the loudness ratings of the telephones and there are no additional losses in the network, allowing “clear channel” transmission.
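The TELR definition above is a plain sum of the losses around the echo loop, which the following Python sketch illustrates. The function and parameter names are hypothetical; the example values (SLR = 8 dB, RLR = 2 dB, TCLw = 45 dB, no pad loss) are the digital-set values shown in Figure 6, and treating the pads as 0 dB for the all-digital case follows the "clear channel" description above.

```python
def telr(slr_db: float, rlr_db: float, echo_loss_db: float,
         top_path_loss_db: float = 0.0,
         bottom_path_loss_db: float = 0.0) -> float:
    """Talker echo loudness rating as the sum of losses around the loop.

    slr_db, rlr_db       loudness ratings of the talker's own telephone
    echo_loss_db         ERL (hybrid) or TCLw (terminal) at the far end
    top/bottom path      pad values in dB; a negative value models gain
    """
    return (slr_db + bottom_path_loss_db + echo_loss_db
            + top_path_loss_db + rlr_db)


def required_echo_loss(target_telr_db: float, slr_db: float, rlr_db: float,
                       top_path_loss_db: float = 0.0,
                       bottom_path_loss_db: float = 0.0) -> float:
    """Far-end ERL or TCLw needed to reach a target TELR."""
    return (target_telr_db - slr_db - rlr_db
            - top_path_loss_db - bottom_path_loss_db)


if __name__ == "__main__":
    # All-digital connection with the Figure 6 values and 0 dB pads.
    print(telr(slr_db=8, rlr_db=2, echo_loss_db=45))              # 55
    # Terminal coupling loss needed for TELR = 65 dB on the same sets.
    print(required_echo_loss(target_telr_db=65, slr_db=8, rlr_db=2))  # 55
```

The same function covers the analog case of Figure 5 once actual pad values are known; the figure labels them only as U dB and L dB, so no analog result is shown here.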
The loss plan for an analog or mixed analog/digital connection is a fine balance between providing enough loss to attenuate the echo and maintain circuit stability, while still being audible over a range of analog loops.

Back to Figure 4. Note that as TELR is reduced, the amount of end-to-end delay available to the connection for a given performance quality objective on the R scale is also reduced. The nominal loudness ratings for digital telephones are SLR = 8 dB and RLR = 2 dB. TELR has to be about 65 dB to completely remove echo, so TCLw has to be:

TCLw = TELR – SLR – RLR = 65 – 8 – 2 = 55 dB.

The ERL standard for echo cancellers (ECANs), ITU-T Recommendation G.168, specifies ERL >= 55 dB of echo path loss for ECANs in gateways, but the standard for digital sets, TIA-810-A, specifies TCLw >= 45 dB for ISDN and digital proprietary PBX telephones and TCLw >= 52 dB for IP telephones. The curve for TELR = 55 dB (45 + 8 + 2) shows that this is a good requirement for low delay connections like local ISDN and digital proprietary PBX telephones, but it is not adequate for IP telephones. Clearly, IP telephones need to have TCLw >= 55 dB for minimum echo return, just like ECANs, because they work in a long delay environment. Due to the difficulty in meeting TCLw >= 55 dB, TIA-810-A specifies a compromise value of >= 52 dB, plus a desirable value of >= 55 dB.

Figure 4 is useful for understanding the implications of double-talk on the performance of ECANs. In single-talk mode, i.e., when one person is talking and the other is silent, the convolution processor part of the echo canceller provides about 18 dB of echo return loss enhancement (ERLE), in addition to the typical analog telephone ERL of about 12 dB. The nonlinear processor (NLP) provides an additional loss of at least 25 dB, for a total of 55 dB. When both people start talking at once, the NLP drops out, leaving the connection with only 30 dB of loss.
So in double-talk mode, the echo performance drops from the TELR = 65 dB curve to the TELR = 40 dB curve, with a significant drop in R. Since there is no conclusive evidence that anyone is listening when both people are talking, the point may well be moot.

Figure 4 – E-Model Prediction of Echo Impairment
[Plot: R (50 to 100, with the User Satisfaction categories) versus One-way Delay (0 to 500 ms); curves for TELR = 65, 60, 55, 50 and 45 dB.]

Figure 5 – Echo Path for Analog Connection
[Block diagram: Side A OPS Analog Telephone, Digital PBX (hybrid and codec), Digital PSTN, Digital PBX (codec and hybrid), Side B OPS Analog Telephone; pads of U dB in the upper path and L dB in the lower path; Echo Path A and Echo Path B. Each analog telephone: SLR = 11 dB, RLR = -3 dB. Each hybrid: ERL = 14 dB.]

Figure 6 – Echo Path for Digital Connection
[Block diagram: Side A Digital Telephone, Digital PBX, Digital PSTN, Digital PBX, Side B Digital Telephone; pads of U dB in the upper path and L dB in the lower path; Echo Path A and Echo Path B. Each digital telephone: SLR = 8 dB, RLR = 2 dB, TCLw = 45 dB.]

Figure 7 – Block Diagram and E-Model Parameters for Echo Impairment
[Block diagram: Side A IP Telephone, IP Intranet (0 dBr point), Side B IP Telephone, with Echo Path - Side A and Echo Path - Side B.]

Title                                   Abbrev. (Default)
Electric Circuit Noise (at 0 dBr)       Nc (-70)
Room Noise                              Po (35)
Send Loudness Rating                    SLR (8)
Receive Loudness Rating                 RLR (2)
D-factor                                D (3)
Noise Floor                             Nfor (-64)
Sidetone Masking Rating                 STMR (15)
Equipment Impairment Factor             Ie (0)
Expectation (Advantage) Factor          A (0)
Mean One-Way Delay (upper)              Tu (0)
Mean One-Way Delay (lower)              Tl (0)
Mean One-Way Delay (upper = lower)      Tul (0)
Electrical Loss (upper)                 Lu (0)
Electrical Loss (lower)                 Ll (0)
Electrical Loss (upper = lower)         Lul (0)
Quantizing Distortion Units (upper)     qduu (1)
Quantizing Distortion Units (lower)     qdul (1)
Echo Return Loss                        ERL

Values shown in each IP Set column: 35, 8, 2, 3, -64, 15, 0, 0, 0, 0, 0, 0.5, 0.5, 35 to 55.
Values shown in the 0 dBr (IP Intranet) column: -70, 0 to 500.

Figure 7 shows the block diagram and E-Model parameters for echo impairment. It is the same as Figure 3, except the parametric variable TELR ranges from 45 to 65 dB in 5 dB steps. Actually, it is the TCLw parameter that ranges from 35 to 55 dB in 5 dB steps, but the generic term, ERL, is used in the E-Model, as shown on the bottom line of Figure 7.

4.2.3. Speech Compression

A unique feature of the E-Model is its flexibility to deal with the impairments introduced by speech compression and packet loss via the Equipment Impairment Factor (Ie). Ie values for several codecs are listed in Rec. G.113, Appendix I and for convenience are reproduced here in Table 1. The list of codecs and channel conditions included in G.113 is frequently updated, as are the Ie values. Before using these Ie values in any E-Model calculations, the reader is advised to refer to the latest revision of G.113, Appendix I. The Ie values in Table 1 were determined in subjective experiments with ideal software implementations of the codecs; the performance provided by commercial codecs may vary.

As detailed in section 4.1, Ie is subtracted from R, reflecting the reduced listener satisfaction due to distortion. Figure 8 illustrates the point by comparing the best-case curves for three popular IP codecs, G.711, G.729A and G.723.1 (6.3 kbit/s). Notice that the codecs with speech compression (G.729A and G.723.1) have larger Ie values and, therefore, can tolerate less one-way delay for a given voice quality level (R).

Figure 9 shows the block diagram with Ie as the parametric variable and delay as the independent variable. The jitter buffers and packetization modules are shown in the gateways without any delay allotment.
Instead, all the delay is shown at the 0 dBr point. This is done only for convenience. When modeling a real connection, the proper gateway delay would be entered in the gateway columns. Note that the G.711 curve in Figure 8 is the same as the TELR = 65 dB reference curve in the previous sections. NETEM determines the reflection point as the ERL value that is closest to the 0 dBr column (55 dB in this case). This is only correct if the ECAN’s tail path capacity is long enough.

Table 1 – Planning Values for the Equipment Impairment Factor Ie

Codec Type    Reference                Operating Rate (kbit/s)    Ie Value

Waveform Codecs
PCM           G.711                    64                          0
ADPCM         G.726, G.727             40                          2
ADPCM         G.721, G.726, G.727      32                          7
ADPCM         G.726, G.727             24                         25
ADPCM         G.726, G.727             16                         50

Speech Compression Codecs
LD-CELP       G.728                    16                          7
LD-CELP       G.728                    12.8                       20
CS-ACELP      G.729                    8                          10
CS-ACELP      G.729-A + VAD            8                          11
VSELP         IS-54                    8                          20
ACELP         IS-641                   7.4                        10
QCELP         IS-96a                   8                          21
RCELP         IS-127                   8                           6
VSELP         Japanese PDC             6.7                        24
RPE-LTP       GSM 06.10, Full-rate     13                         20
VSELP         GSM 06.20, Half-rate     5.6                        23
ACELP         GSM 06.60, EFR           12.2                        5
ACELP         G.723.1                  5.3                        19
MP-MLQ        G.723.1                  6.3                        15

Figure 8 – Speech Compression Impairment
[Plot of R (50 to 100) against one-way delay (0 to 500 ms), with curves for G.711, G.729A and G.723.1 and the same user satisfaction bands as Figure 4.]

Figure 9 – Block Diagram and E-Model Parameters for Speech Compression Impairment
[Block diagram: on each side, a digital telephone connects over G.711 to a gateway containing a G.7xx codec and a jitter buffer (JB); the two gateways are joined by the IP Intranet, which contains the 0 dBr point. Echo Path - Side A and Echo Path - Side B are marked.]

Title                                     Abbrev. (Default)
Electric Circuit Noise (at 0 dBr)         Nc (-70)
Room Noise                                Po (35)
Send Loudness Rating                      SLR (8)
Receive Loudness Rating                   RLR (2)
D-factor                                  D (3)
Noise Floor                               Nfor (-64)
Sidetone Masking Rating                   STMR (15)
Equipment Impairment Factor               Ie (0)
Expectation (Advantage) Factor            A (0)
Mean One-Way Delay (upper)                Tu (0)
Mean One-Way Delay (lower)                Tl (0)
Mean One-Way Delay (upper = lower)        Tul (0)
Electrical Loss (upper)                   Lu
Electrical Loss (lower)                   Ll
Electrical Loss (upper = lower)           Lul
Quantizing Distortion Units (upper)       qduu (1)
Quantizing Distortion Units (lower)       qdul (1)
Echo Return Loss                          ERL

Entries that vary or differ from the defaults:
  Equipment impairment factor             Ie = 0, 11, 15 at each IP gateway
  Mean one-way delay                      0 to 500 ms (0 dBr point)
  Quantizing distortion units             qduu = 0.5 and qdul = 0.5 at each digital set
  Echo return loss                        45 dB at each digital set, 55 dB at each IP gateway
All other entries shown are 0 or equal to the defaults.

4.2.4. Packet Loss

The provisional Equipment Impairment Factors for G.711, G.729A, G.723.1 and GSM Enhanced Full Rate (EFR) codecs under conditions of packet loss are listed in Table 2. There are three columns for G.711: one without Packet Loss Concealment (PLC) and two with PLC. The two with PLC cover random and bursty packet loss conditions. For reference, the G.711 PLC algorithms are specified in ANSI T1.521, and their performance is similar. Annex A is the algorithm that was used for the bursty column and Annex B was used for the random column.

The plot of Ie values vs packet loss in Figure 10, for the codecs in Table 2, shows the effectiveness of the PLC algorithms. Ideal performance would be a horizontal line along the x-axis, indicating no increase in impairment as packet loss increases. The nearly vertical dark blue curve for G.711 without PLC is almost the opposite of the ideal. The difference between the two G.711 curves with PLC (the green and red curves) probably reflects both the differences in the algorithms (Annex A vs Annex B of ANSI T1.521, respectively) and the effects of the test conditions (bursty vs random, respectively).

Another way to describe packet loss visually is to plot the family of curves for a given codec. Figure 11 does so for GSM EFR. As the packet loss increases from 0 to 5%, the Ie value increases from 5 to 33 and the available one-way delay drops from about 380 ms to about 35 ms at R = 60. For reference, the G.711 default curve is included as well. Note that the encoder-decoder delay is about 100 ms for GSM.
In effect, the R axis shifts over to the 200 ms mark for this scenario, before considering any other sources of delay. Also note (in Figure 12) that the ERL is set to the ideal for this simple example. Actual GSM voice quality is much lower than it appears in Figure 11. Refer to Section 6.2.3 for more packet loss examples.

Figure 12 shows the scenario block diagram with Ie as the parametric variable and with all the delay again concentrated in the 0 dBr column, rather than distributed appropriately. To avoid transcoding issues in this example, the transport between the base stations uses transcoder free operation (TrFO), meaning the GSM bits are transported over the packet network without being converted to and from G.711.

Table 2 – Provisional Planning Values for the Equipment Impairment Factor Ie under Conditions of Packet Loss for Codecs G.711, G.729A + VAD, G.723.1 + VAD and GSM EFR

Column key:
  A = G.711 without PLC (10 ms speech packet length)
  B = G.711 + PLC, Random Packet Loss (10 ms speech packet length)
  C = G.711 + PLC, Bursty Packet Loss (10 ms speech packet length)
  D = G.729A + VAD, 8 kbit/s (2 speech frames/packet)
  E = G.723.1 + VAD, 6.3 kbit/s (1 speech frame/packet)
  F = GSM 06.60 EFR, 12.2 kbit/s (1 speech frame/packet)

Packet Loss (%)     A     B     C     D     E     F
0                   0     0     0    11    15     5
0.5                 –     –     –    11    15     –
1                  25     5     5    15    19    16
1.5                 –     –     –    17    22     –
2                  35     7     7    19    24    21
3                  45    10    10    23    27    26
4                   –     –     –    26    32     –
5                  55    15    30     –     –    33
7                   –    20    35     –     –     –
8                   –     –     –    36    41     –
10                  –    25    40     –     –     –
15                  –    35    45     –     –     –
16                  –     –     –    49    55     –
20                  –    45    50     –     –     –

Figure 10 – Provisional Planning Values for the Equipment Impairment Factor Ie under Conditions of Packet Loss for Codecs G.711, G.729A + VAD and G.723.1 + VAD
[Plot of Ie (0 to 60) against packet loss (0 to 25%), with curves for G.711 without PLC, G.711 with PLC (bursty packet loss), G.711 with PLC (random packet loss), G.729A + VAD, G.723.1 + VAD and GSM 06.60 EFR.]

Figure 11 – GSM 06.60 Enhanced Full Rate Packet Loss Performance relative to G.711 without Packet Loss
[Plot of R (50 to 100) against one-way delay (0 to 500 ms), with the G.711 reference curve at PL = 0% and GSM EFR curves at PL = 0%, 1%, 2%, 3% and 5%, and the same user satisfaction bands as Figure 4. PL = Packet Loss.]

Figure 12 – Block Diagram and E-Model Parameters for Packet Loss Impairment

4.3. What Does R Sound Like?

Table 3 – Descriptions of the Sound Characteristics caused by IP Telephony Impairments

Description: Convergence echo*
Cause: A brief blast of echo at the start of a call, before the ECAN converges, or following changes to the echo path (e.g., transferred calls or switching in conference bridges).

Description: Double-talk echo*
Cause: ECANs disable the NLP when both people talk simultaneously, leaving only the low ERLE of the convolution processor. When delays are long, the residual echo may be audible.

Description: After double-talk echo*
Cause: Echo caused by double-talk, but arriving after double-talk is finished due to network delay (see next).

Description: Conversation protocol issues, like turn-taking, over-talking, break-in and who’s-in-charge problems
Cause: End-to-end delay (speech coding + packetization + jitter compensation + network routing + propagation) is too high. Because of the loss of simultaneity, the parties may perceive each other as inattentive, insincere, or rude. This will increase with increased delay, until turn-taking cues break down completely.

Description: Whirlybird distortion, or waterfall effect
Cause: CELP-based compression coding algorithms.

Description: Speech clipping at beginning and/or end of phrases or words
Cause: VAD (voice activity detector) not switching quickly enough, ECAN NLP not switching quickly enough, or VAD and ECAN interfering with each other.

Description: Background noise contrast in silence periods*
Cause: No comfort noise generator (CNG), or the CNG is not properly matching the background noise in the sending end (CNG is using stationary noise rather than modeling the actual noise, or the noise model is inadequate).

Description: Background noise transition contrast*
Cause: Comfort noise generator switches in too slowly, hang-time on VAD too long.

Description: Noise pumping*
Cause: Background noise is triggering the VAD, lack of comfort noise generator or comfort noise level does not match the actual background noise level.

Description: Dropouts/chopping/clipping
Cause: Lack of signal caused by packet loss or a problem with VAD.

Description: Clicks, pops
Cause: Packet loss with waveform codecs operating without packet loss concealment.

Description: Tonal or mechanical artifacts riding on the voice
Cause: Side effects of the packet loss concealment algorithm.

Description: Low level tones*
Cause: Created intentionally by decoders during long bursts of packet loss.

* These impairments are not modeled by the E-Model.

Now that we are confident that the E-Model is a suitable tool for IP telephony, it is time to consider what R values are possible and what these values sound like. First of all, it is important to appreciate that any particular R may be reached by multiple combinations of impairment. Therefore, the same R can have many different sound characteristics. Some are listening characteristics and some are conversation characteristics. These characteristics were discussed in general terms in Section 4.2 and will be discussed further in Section 6. Table 3 is a glossary of many of the sounds that IP telephony users will experience, along with the potential causes.

Determining what values of R are possible is one of the objectives of this TSB, and it is a much more difficult question than can be answered in this section. It will take most of Section 6 to flesh out the answer.
Figures 8 and 11 give some hints about how the R-axis works, as each codec has a family of curves that simply shift the reference curve down to lower starting points on the R-axis, due to increased noise and distortion. The delay axis is a bit more complicated. Delay can be partitioned as: speech coding/packetization + jitter compensation + transport + propagation. Section 6 will provide the necessary delay details, but for now it is sufficient to say that, unlike the PSTN, IP telephony makes little use of the region below 100 ms. The concept of regions will be explored further in Section 5. Although Section 6 will calculate specific R values for specific scenarios, it is important to remember that the quality of IP transmission depends on dynamic impairments (rather than the static impairments associated with TDM transmission). Therefore, the overall quality of IP Telephony is probabilistic in nature, and must be characterized statistically.

4.4. E-Model Symmetry and Performance

This document assumes symmetry between Side A and Side B that probably does not exist in practice. For instance, packet loss, delay and loudness ratings may be asymmetrical. Also, because the Ie values are based on ideal implementations of ITU-T codecs, the performance predicted by the E-Model may be optimistic. The performance of real codecs may be worse due to implementation constraints. In addition, not all jitter buffers behave alike. Many jitter buffers simply discard packets under overload conditions. Smart buffers wait for silence periods before discarding packets. The difference is audible. Delay variation over time may also be audible in some cases.

4.5. E-Model Enhancements

The E-Model has gained worldwide acceptance because it is a unique model based on several previously existing transmission quality models and the results of many subjective experiments conducted over the last fifty years. One of the E-Model’s limitations is that it is currently only applicable for narrowband handset operation.
Obvious enhancements are to add headset, handsfree and wideband functionality. Further work under Q.20/12 in ITU-T Study Group 12 plans to include headset and handsfree operation, but currently there is no support for developing a planning model for wideband audio.

Work has started in ITU-T Study Group 12 on a new Recommendation (P.833) to provide a detailed methodology for determining equipment impairment factors for use in G.107, from the results of subjective tests. The current methodology used for determining equipment impairment factors is not well documented and could be improved. As an alternative to the use of subjective tests, work is also planned within ITU-T Study Group 12 to develop methods for determining equipment impairment factors from objective measurements, based on the new Recommendation P.862, Perceptual Evaluation of Speech Quality (PESQ), an objective method for end-to-end speech quality assessment of narrowband telephone networks and speech codecs.

The latest revision of ITU-T Recommendation G.107 (05/00) changes the supplementary amount of equivalent circuit noise, Nos, in the E-Model to include the Lombard effect. The Lombard effect describes the tendency of a talker in a noisy environment to raise his voice by only half the amount necessary to maintain the same signal-to-noise ratio as in a quiet environment. Acknowledging this behavior requires a change to a constant in the Nos equation from 0.008 to 0.004. This change affects the default R for Ie = 0, changing it from R = 94.15 to R = 93.19. Since this change occurred during the preparation of this document, the (05/00) version of the E-Model is not used here. Instead, the (12/98) version of the E-Model (sometimes known as VTQME19), with a default R of 94.15, is used.

Also changing in ITU-T Recommendation G.107 (05/00) is the relationship between R and the number of qdus.
In G.107 (12/98), R was constant for values of qdu below about 5, but this constant region is being removed in G.107 (05/00). The slope remains the same, but it starts at qdu = 1 instead of about qdu = 5. This information is explained in G.113 (2001).

Europe has traditionally preferred about 6 dB louder sidetone than North America. Europe has favored a nominal STMR of about 12 dB and North America has favored a quieter nominal STMR of about 18 dB. Currently, the E-Model incorrectly penalizes STMRs quieter than 15 dB. TIA-810-A specifies STMR = 18 dB +/- 6 dB. An effort will be made to change the 15 dB threshold to 21 dB in a future revision of G.107 to better accommodate North American sidetone preferences.

4.6. E-Model Conventions

To obtain consistent answers, it is necessary to agree on certain conventions. These are listed below.

• Ta = T
• Tr = 2T
• If actual delay is asymmetrical (Tact side A to side B ≠ Tact side B to side A), then take the average of the two delays (Tavg side A to side B = Tavg side B to side A = [Tact side A to side B + Tact side B to side A]/2). This is appropriate because the subjective perception of delay depends on the round trip delay, not on either of the one-way delays.
• QDU minimum = 1 for all digital connections including IP telephones and gateways
• The E-Model is a relative tool rather than an absolute tool.

5. Wireline PSTN Voice Quality Benchmarks

The PSTN is a well-developed network with acceptable voice quality. Therefore, it is logical to compare the performance of IP telephony with the benchmarks established by the PSTN. Since everyone has used the PSTN, everyone has a mental model of the quality it provides. In this section, we will develop the corresponding E-Model and R values associated with PSTN performance. In this way, we can establish solid objective benchmarks for comparing IP telephony to the PSTN. The results of this benchmarking will be compared with the IP telephony scenarios in the next section.
Three representative PSTN benchmarks will be illustrated: ISDN voice quality (G.711 end-to-end), PSTN voice quality (analog telephones with nominal length analog loop at both ends and G.711 digital switching/trunking) and Toll Compression voice quality (similar to the PSTN case, using G.726 at 32 kbit/s in the digital portion). These benchmarks will then be summarized as the “existing PSTN” region in the delay vs. R space described above.

ITU-T Recommendation G.114 provides guidance on propagation delay. For fiber optic national circuits the formula for propagation plus equipment delay is:

National delay (ms) = 3 ms + (0.005 ms/km × distance in km)

Note: All delays in G.114 and in this TSB are one-way.

The 3 ms constant term makes allowance for one PCM coder/decoder pair and for five digitally switched exchanges. In addition to propagation, the 0.005 ms/km factor accounts for the delay in repeaters and regenerators. The maximum national distance is about 6000 km, which equates to a maximum national delay of 33 ms. In practice, there would typically be much less delay due to propagation and more delay associated with equipment like PBXs, compression codecs and multiplexers. For reference, as the crow flies, the distance/delay (including the 3 ms constant) between St. John's, Newfoundland and Victoria, British Columbia is about 5000 km/28 ms and between New York and San Francisco is about 4200 km/24 ms.

For international circuits, the following are the relevant guidelines from G.114:

International submarine fiber optic delay (ms) = 13 ms + 10 ms + (0.005 ms/km × distance in km)

International submarine coaxial cable system delay (ms) = 30 ms + (0.006 ms/km × distance in km)

The 13 ms constant accounts for the transmitter delay and the 10 ms accounts for the receiver delay. The 30 ms constant accounts for the total one-way digital circuit multiplication equipment (DCME) delay, using a G.726-32 codec and digital speech interpolation.
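The three G.114 planning formulas quoted above can be sketched as small helper functions. This is an illustrative sketch only: the function names are hypothetical, and the 5600 km submarine distance in the usage lines is simply an example input.

```python
# Illustrative helpers for the G.114 one-way delay planning formulas.
# Function names are hypothetical; the constants are those quoted in the text.


def national_delay_ms(distance_km):
    """Fiber optic national circuit: 3 ms + 0.005 ms/km."""
    return 3.0 + 0.005 * distance_km


def submarine_fiber_delay_ms(distance_km):
    """International submarine fiber: 13 ms (transmitter) + 10 ms (receiver) + 0.005 ms/km."""
    return 13.0 + 10.0 + 0.005 * distance_km


def submarine_coax_delay_ms(distance_km):
    """International submarine coax with DCME: 30 ms + 0.006 ms/km."""
    return 30.0 + 0.006 * distance_km


if __name__ == "__main__":
    # National examples from the text: 6000 km, 5000 km and 4200 km.
    for km in (6000, 5000, 4200):
        print(km, "km national:", round(national_delay_ms(km)), "ms")
    # Example submarine route of 5600 km (illustrative input).
    print("fiber:", round(submarine_fiber_delay_ms(5600)), "ms")
    print("coax:", round(submarine_coax_delay_ms(5600)), "ms")
```

Rounding to the nearest millisecond mirrors the precision used in the text; the formulas themselves are linear, so delays for multi-section routes can be obtained by adding the per-section results.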
The submarine cable route could be significantly longer than the way the crow flies, but here are some examples for reference. The distance/delay (including the constants) between London, England and New York is about 5600 km/51 ms (fiber) or 64 ms (coax) and between San Francisco and Hong Kong is about 11000 km/78 ms (fiber) or 96 ms (coax).

Building on the above examples, a more complicated scenario would consist of one national and two international sections. For example, London and Hong Kong are only about 10,000 km apart (via Europe and Asia), but an international call might be routed the long way around through the United States for a minimum of 20,800 km (London to New York to San Francisco to Hong Kong). The objective here is to give a feel for the amount of worst-case PSTN impairment in a hypothetical reference connection, not to represent typical connection routing. The fiber route is assumed to be G.711 all the way. It has an Ie = 0 and a total one-way delay of 153 ms (51 + 24 + 78 ms). Assuming that the national connection uses G.711 and that the tandeming is asynchronous (see Section 6.2.4), the coax route has two DCME impairments, each with an Ie of 7, for a total of 14. If, instead, the tandeming is synchronous, the Ie can be reduced to 7. The total one-way delay of the coax route is 184 ms (64 + 24 + 96 ms).

Worst-case scenarios like this do not reflect the typical user experience, particularly as it relates to delay. We typically experience delays of less than 50 ms with good echo control (most national and approximately 50% of international calls, depending on country of origin). In fact, as the following examples will show, the loss plan for analog sets introduces much of the impairment we experience. Therefore, the next sections detail three simple PSTN benchmarks that highlight the typical user experience, rather than three complicated hypothetical reference connections that mask the message in detail.
The worst-case delay of 100 ms for the PSTN benchmarks was selected as a compromise between the typical actual delays of less than 50 ms and worst-case hypothetical reference connection delays of about 200 ms. One can think of 100 ms as approximately one national and one international wireline connection.

5.1. ISDN Voice Quality

ISDN voice quality is obtained on all-digital TDM connections with the following characteristics:

• G.711 only
• Echo control in the telephones (45 dB >= TCLw >= 40 dB)
• Nominal loudness ratings of SLR = 8 dB and RLR = 2 dB and STMR = 15 dB (note sidetone discussion in Section 4.5)
• One-way delay = 100 ms maximum

Figure 13 – ISDN Quality Voice Benchmark

Figure 13 graphically illustrates ISDN quality with delay from 0 to 100 ms as the independent variable and TELR of 50, 55 and 60 dB as the parametric variable. This corresponds to TCLw values of 40, 45 and 50 dB, respectively. The rationale for using these values is that TIA-810-A increased the TCLw requirements by 5 dB and now specifies a nominal TCLw of 45 dB and a desirable TCLw of 50 dB (the yellow region). Previous digital set standards specified a nominal TCLw of 40 dB and a desirable TCLw of 45 dB (the gray region).

The maximum delay for a national connection is about 28 ms at 5000 km. As shown in Figure 13, the typical digital set TCLw values, at this distance, do not have a significant effect on R: the voice quality remains in the “very satisfactory” category. However, as the delay increases, the TCLw = 40 dB curve drops rapidly down to 80. The existing PSTN is defined by the green region between the reference curve (TCLw = 55 dB) and R = 80 and between 0 and 100 ms. Figure 14 shows the connection diagram and the E-Model parameters for the ISDN quality scenario.
Figure 14 – Block Diagram and E-Model Parameters for ISDN Voice Quality Benchmark
[Block diagram: ISDN Terminal (Side A), 0 dBr point, ISDN Terminal (Side B). Echo Path - Side A and Echo Path - Side B are marked.]

Title                                     Abbrev. (Default)
Electric Circuit Noise (at 0 dBr)         Nc (-70)
Room Noise                                Po (35)
Send Loudness Rating                      SLR (8)
Receive Loudness Rating                   RLR (2)
D-factor                                  D (3)
Noise Floor                               Nfor (-64)
Sidetone Masking Rating                   STMR (15)
Equipment Impairment Factor               Ie (0)
Expectation (Advantage) Factor            A (0)
Mean One-Way Delay (upper)                Tu (0)
Mean One-Way Delay (lower)                Tl (0)
Mean One-Way Delay (upper = lower)        Tul (0)
Electrical Loss (upper)                   Lu
Electrical Loss (lower)                   Ll
Electrical Loss (upper = lower)           Lul
Quantizing Distortion Units (upper)       qduu (1)
Quantizing Distortion Units (lower)       qdul (1)
Echo Return Loss                          ERL

Entries that vary or differ from the defaults:
  Mean one-way delay                      0 to 100 ms (0 dBr point)
  Quantizing distortion units             qduu = 0.5 and qdul = 0.5 at each ISDN terminal
  Echo return loss                        55, 45, 40 dB at each ISDN terminal
All other entries shown are 0 or equal to the defaults.

5.2. PSTN Voice Quality

Mixed analog and digital (TDM) connections, without speech compression, have PSTN voice quality and they have the following characteristics:

• G.711 only
• No echo control below 10 ms (ERL = 11 dB + 6 dB Rx loss = 17 dB)
• Echo control enabled at 10 ms (ERL = 55 dB)
• Nominal loudness ratings of SLR = 11 dB and RLR = -3 dB and STMR = 15 dB (note sidetone discussion in Section 4.5)
• One-way delay = 100 ms maximum

ITU-T Recommendation G.131 provides guidance on when to enable the ECAN, but each administration enables ECANs according to its own rules. Some administrations always have the ECANs enabled, while others wait until 22 ms. For this scenario, 10 ms was selected as a compromise because it is the delay value where the blue (TELR = 31 dB) curve intercepts the R = 80 contour.

The much lower echo control of the PSTN network before ECANs are enabled accounts for the very rapid degradation of the quality vs delay, as shown in Figure 15.
Once the ECAN is enabled, at 10 ms, there is an abrupt improvement in the voice quality due to the ECAN’s significant improvement in ERL. The red curve (with the ECAN enabled) uses analog telephone loudness ratings, with G.711 codecs in the digital segment. Compare it to the red curve in Figure 17, which uses G.726 in the digital segment with an Ie of 7. Figure 16 shows the connection diagram and the E-Model parameters for the PSTN quality scenario.

Figure 15 – PSTN Voice Quality Benchmark

Figure 16 – Block Diagram and E-Model Parameters for PSTN Voice Quality Benchmark

5.3. Toll Compression Voice Quality

All-digital or mixed analog/digital connections, with speech compression, have Toll Compression voice quality and they have the following characteristics:

• G.711 and G.726 @ 32 kbit/s (Ie = 7)
• Echo control enabled (ERL = 55 dB)
• Nominal analog loudness ratings of SLR = 11 dB and RLR = -3 dB and STMR = 15 dB (note sidetone discussion in Section 4.5)
• Nominal digital loudness ratings of SLR = 8 dB and RLR = 2 dB and STMR = 15 dB (note sidetone discussion in Section 4.5)
• One-way delay = 100 ms maximum (of which the G.726/DCME accounts for 30 ms)

The gray region in Figure 17 defines toll-compression quality. It is bounded by the red curve (mixed analog/digital connection with speech compression shown in Figure 18) and the blue curve (an all-digital connection with speech compression and ECANs enabled, but no hybrids). The main impairment is due to the reduced voice quality of the G.726 codec, but the 30 ms delay that the DCME introduces is also a significant portion of the 100 ms budget.

Figure 17 – Toll Compression Voice Quality Benchmark

Figure 18 – Block Diagram and E-Model Parameters for Toll Compression Voice Quality Benchmark

5.4. Wireline PSTN Voice Quality Summary

The wireline PSTN is characterized by:

• Analog telephones with low delay, good loudness ratings and poor echo control.
• Digital telephones with low delay, good loudness ratings and good echo control.
• Digital networks with low delay, low impairments and good echo control.

It is summarized by the green, “Existing PSTN” region in Figure 19 and it is bounded by the best G.711 performance on the top, R = 80 on the bottom and delay between 0 and 100 ms. Most of the 100 ms delay “budget” is available for propagation delay.

Other technologies like IP and wireless introduce combinations of noise/distortion and delay that reduce the voice quality relative to the existing PSTN. The objective of the next section is to illustrate the impairments introduced by IP technology with the aid of the E-Model and to provide some design guidance. Wireless technology introduces similar impairments and the combination of wireless and IP exacerbates the problem. Careful planning must be done for these two high impairment technologies to interoperate with a voice quality level approaching the existing PSTN.

Figure 19 – Existing PSTN Voice Quality

6. IP Telephony Voice Quality Analysis

Section 4 demonstrated the suitability of the E-Model as a tool for estimating the relative voice and conversation quality of IP telephony. Section 5 defined the voice quality of the “Existing PSTN”. The goal of this section is to define the region of voice quality that is acceptable for IP telephony and to establish the recommendations needed to manage the impairments introduced by IP telephony.

Figure 1 showed the G.109 categories of speech transmission quality. These six categories give a good feeling for the level of user satisfaction, but the resolution is too fine for the purposes of this document. Instead, this document compresses the five categories above R = 50 into three categories: “High”, “Medium” and “Low”.
From the user’s perspective the categories blend together as shown in Figure 20, but thresholds for the three categories have been defined to make them more useful for objective work.

Taking the “Existing PSTN” voice quality region illustrated in Figure 19 as representative of “High” voice quality, R = 80 defines the lower bound of the “High” category. Figure 19 shows that the “Existing PSTN” only uses 100 ms of the maximum acceptable delay of 250 ms for R = 80 (the intercept of the Ie = 0 curve with the R = 80 line). However, according to the E-Model, regardless of the delay value, all points on the R = 80 line have the same impairment. Therefore, the region bounded by the Ie = 0 contour and the R = 80 line constitutes the “High” category.

The intent of the “Medium” category is to define a region for a selection of compression codecs. The upper limit of the “Medium” category is the same as the lower limit of the “High” category, R = 80. To determine the lower boundary of this region, the effect of delay on the interactive nature of conversation must be considered. Basing our choice on the contour curve of a popular IP codec, G.729A + VAD (see Figure 25 in Section 6.2.1.1), and the same delay limit as the “High” category, i.e., 250 ms, we obtain a lower limit for the “Medium” category of R = 70. This is also the threshold between “Some Users Dissatisfied” and “Many Users Dissatisfied” in Figure 1, making it a suitable choice. The “Medium” category is then bounded by R = 80, R = 70 and the Ie = 0 contour.

All connections below R = 70 will suffer from some combination of distortion and long delay. The region between R = 50 and R = 70 encompasses the “Many Users Dissatisfied” and the “Nearly All Users Dissatisfied” categories in Figure 1 and therefore deserves the title “Low” voice quality. The “Low” category is then bounded by R = 70, R = 50 and the Ie = 0 contour. Figure 21 and Figure 22 summarize the recommended IP Telephony voice quality categories.
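The category thresholds defined above can be sketched as a simple classifier. The function names are hypothetical, and the treatment of values exactly on a boundary (R = 80 as “High”, R = 70 as “Medium”, R = 50 as “Low”) is an assumption based on the wording “lower bound” and “lower limit”. The R-to-MOS conversion is the standard mapping from ITU-T Recommendation G.107, not something defined in this TSB; it is included only as a convenience for relating R to an estimated MOS.

```python
# Sketch of the three-category scheme; names and boundary handling are assumptions.


def voice_quality_category(r):
    """Map a transmission rating R to the recommended IP telephony category."""
    if r >= 80:
        return "High"
    if r >= 70:
        return "Medium"
    if r >= 50:
        return "Low"
    return "Not Recommended"


def r_to_mos(r):
    """Estimated MOS from R, using the conversion given in ITU-T G.107."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6


if __name__ == "__main__":
    for r in (94.15, 80, 75, 70, 60, 50, 40):
        print(r, voice_quality_category(r), round(r_to_mos(r), 1))
```

Note that the classifier looks only at R. Whether a given connection reaches a particular R still depends on the combination of Ie and delay, as described by the contours in the text.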
Figure 20 – IP Telephony Voice Quality Regions
[Chart: voice quality continuum on the R scale, with regions High (above R = 80, up to the G.107 default value of 94), Medium (70 to 80), Low (50 to 70) and Not Recommended (0 to 50).]

R      MOS    %GOB    %POW
94     4.4    98.4     0.1
80     4.0    89.5     1.4
70     3.6    73.6     5.9
60     3.1    50.1    17.4
50     2.6    26.6    37.7
0      1.0     0      99.8

Figure 21 – Recommended IP Telephony Voice Quality Categories
[Chart: recommended IP telephony voice quality categories on the R scale, with categories High (above R = 80, up to the G.107 default value of 94), Medium (70 to 80), Low (50 to 70) and Not Recommended (0 to 50).]

R      MOS    %GOB    %POW
94     4.4    98.4     0.1
80     4.0    89.5     1.4
70     3.6    73.6     5.9
60     3.1    50.1    17.4
50     2.6    26.6    37.7
0      1.0     0      99.8

Figure 22 – Recommended IP Telephony Voice Quality Categories

6.1. Voice Quality Issues for IP Telephony

The list of voice quality issues from Section 4.1 (delay, echo, speech compression and packet loss) has been expanded in this section to cover some additional issues:

• Delay
• Speech compression
• Packet loss
• G.711 packet loss concealment (PLC)
• Transcoding
• New gateway loss plan
• Echo cancellers (ECAN)
• TCLw

G.711 PLC, transcoding and loss plans have been added to the list. The first two are very important issues for IP telephony and are covered in some detail. Loss plan details, however, are documented in TSB122-A. They are mentioned here only because this entire document assumes that the loss plan is implemented correctly. Echo has been divided into two categories: echo cancellers and TCLw. However, they were covered in sufficient detail in Sections 4 and 5, so they are not explicitly detailed in this section. In summary, for long delay environments like IP telephony, echo cancellers and IP telephones must provide at least 55 dB of ERL or 52 dB of TCLw, respectively. These requirements are documented in the appropriate standards and no further voice quality recommendations are required.

6.2. Voice Quality Recommendations for IP Telephony

6.2.1. Delay

Delay Rec. #1: Use G.711 end-to-end because it has the lowest Ie-value and therefore it allows more delay for a given voice quality level.
Delay Rec. #2: Minimize the speech frame size and the number of speech frames per packet.
Delay Rec. #3: Actively minimize jitter buffer delay.
Delay Rec. #4: Actively minimize one-way delay.
Delay Rec. #5: Accept the E-Model results, which permit longer delays for low Ie-value codecs, like G.711, for a given R-value; see Figure 22 and Figure 27.
Delay Rec. #6: Use priority scheduling for voice-class traffic, as well as RTP header compression and data packet fragmentation on slow-speed links to minimize the contribution of this variable delay source.
Delay Rec. #7: Avoid using slow serial links.

While ITU-T Recommendation G.114 is the definitive document on delay, this section provides some value-added information without reiteration. In particular, the emphasis of this TSB is to encourage the use of the E-Model in making design decisions. This provides more flexibility than simple one-way transmission time limits like 150 ms or 400 ms.

One-way delay has three components:

• Encoding/decoding/packetization + jitter buffer delay (delay variation),
• Transport delay,
• Propagation delay.

In addition to the information provided in this section, Annex A (VoIP End-to-End Delay Budget Planning for Private Networks) offers an analysis of the delay aspects of two scenarios.

6.2.1.1. Packetization and Jitter Buffer Delay

Packetization delay in a codec/vocoder comprises several components. The delay on the encoder side (the send side) consists of: the time taken to accumulate speech samples into a speech frame; the time required to compress the speech frame, if needed, for the purpose of bandwidth reduction; the time to insert the speech frame(s) into a packet and transfer the packet to the transport facility; and the firmware/hardware delays.
In addition, some vocoders use a look-ahead function, as part of the compression process, which waits for the first part of the following speech frame to provide information on how to help reconstruct the speech sample if there are any lost packets.

The packetization delay on the decoder side (the receive side) consists of: the time taken to decompress the speech frame(s) into speech samples and the firmware/hardware delays. In addition, some codecs have an add-on packet loss concealment algorithm that adds some delay. Also on the decoder side is the jitter buffer, which introduces delay to compensate for the variation in arrival time of sequential packets from the transport facility. Some documents refer to this as delay variation.

G.114 has a thorough analysis of planning guidelines for packetization delay, for the case where the compression/encoding process fully utilizes the power of the processor. G.114 provides the following formulas for calculating the minimum and maximum codec-related processing delay:

Minimum packetization delay for high-speed connections = (N + 1) x frame size + D (ms)    {1}
Maximum packetization delay for low-speed connections = (2N + 1) x frame size + D (ms)    {2}

Where: N = number of speech frames per packet; speech frame size is in ms; D = look-ahead, PLC, and additional firmware/hardware delay (where applicable) in ms.

For purposes of this document, a high-speed connection is defined as one in which the time taken to transfer the speech packets to the transport facility is insignificant with respect to the length of the speech frame size. A low-speed connection is defined as one in which the transfer time is equal to the length of the speech frame size (to maintain real-time communication, the transfer time cannot exceed the speech frame size). The ‘2’ in the ‘2N’ part of equations {2} and {4} consists of 1N for compression (same as equations {1} and {3}) and 1N for the transfer time, i.e., (1 + 1) N = 2N.
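Equations {1} and {2} are simple enough to evaluate directly. The sketch below is illustrative only: the function name and the `low_speed` flag are hypothetical, and the example values (two frames per packet, 10 ms frames, D = 5 ms) are chosen to resemble the G.729A figures quoted later in this section.

```python
def packetization_delay_ms(frames_per_packet: int, frame_ms: float,
                           extra_ms: float, low_speed: bool = False) -> float:
    """Packetization delay for a fully utilized processor.

    Equation {1} (high-speed link): (N + 1) x frame size + D
    Equation {2} (low-speed link):  (2N + 1) x frame size + D
    """
    n = frames_per_packet
    multiplier = (2 * n + 1) if low_speed else (n + 1)
    return multiplier * frame_ms + extra_ms


# Example values (assumed for illustration): N = 2, 10 ms frames, D = 5 ms.
minimum = packetization_delay_ms(2, 10, 5)                   # equation {1}
maximum = packetization_delay_ms(2, 10, 5, low_speed=True)   # equation {2}
print(minimum, maximum)
```

With these inputs the two limits are 35 ms and 55 ms.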
Equations {1} and {2} indicate that the delay due to compression (the ‘+ 1’ term) need only be considered once when calculating the total packetization delay. This is true because while the processor is executing the compression algorithm on a speech frame, the next speech frame is concurrently being accumulated. To operate in real time, the compression execution must be finished before the next speech frame is ready. Therefore, a fully utilized processor takes the duration of one speech frame to complete the compression algorithm.

Also possible, but not documented in G.114, is the situation where there is sufficient processor power to compress the speech frame almost instantaneously, or where the G.711 codec, which does not require any compression, is used. Equations {3} and {4} illustrate the sufficient processor power case by removing the ‘+ 1’ term.

Minimum packetization delay for high-speed connections = N x frame size + D (ms)    {3}
Maximum packetization delay for low-speed connections = 2N x frame size + D (ms)    {4}

Where: N = number of speech frames per packet; speech frame size is in ms; D = look-ahead, PLC, and additional firmware/hardware delay (where applicable) in ms.

The theoretical minimum and maximum packetization delays, irrespective of processing power, are then described by equations {3} and {2}, respectively. These equations define theoretical limits. Actual delays for practical implementations will lie somewhere between these limits.

Figure 23 shows a graphical representation of the packetization delay for the G.729A + VAD connection illustrated in Figure 26. G.729A has a speech frame of 10 ms and look-ahead of 5 ms. (G.711 does not have speech frames per se; it can be packetized in 1 ms increments.) There is always a trade-off between header-to-payload efficiency and packetization delay, but clearly, as more speech frames and larger speech frames are inserted into each packet, the packetization delay increases.
This translates into a lower voice quality level for a given amount of propagation delay (see Figure 25).

Due to processing power limitations, IP telephone jitter buffers are typically frame-based, meaning that the size of the buffer is a multiple of the speech-frame size. Frame-based jitter buffers can increase delay dramatically if the frame size is large. A rule of thumb for frame-based buffers is that the jitter buffer must be two times the speech-frame size. Gateways have more processing power, so they often use absolute jitter buffers that are sized to the expected delay of the transport. The benefit in doing this lies in the enhanced resolution of the jitter buffer. That is, to cancel 20 ms of jitter, it does not matter whether an absolute jitter buffer or a frame-based jitter buffer with frame size 10 or 20 ms is used. On the other hand, to cancel 21 ms of jitter, the absolute jitter buffer will introduce less delay because it can be set to 23 ms, for example (21 ms + some safety margin in increments determined by the jitter buffer resolution), instead of 30 ms.

Frame-based jitter buffer delay = 2N x frame size (ms)    {5}
Absolute jitter buffer delay = actual end-to-end delay variation + margin (ms)    {6}

Where: N = number of speech frames per packet; speech frame size is in ms.

Jitter buffers solve the lost and late packets problem by adding delay that reduces the available delay budget. Truly lost (discarded) packets will never show up. Figure 24 adds frame-based jitter buffer delay to the packetization delay shown in Figure 23. This translates into even lower voice quality for a given amount of propagation delay. The goal is to minimize jitter buffer delay. When QoS controls are used, smaller jitter buffers can be used to obtain the same performance.
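The resolution argument above (20 ms versus 21 ms of jitter) can be checked numerically. The sketch below is an interpretation rather than a method given in this bulletin: the function names are hypothetical, the frame-based buffer is modelled as the smallest whole number of frames that covers the jitter, and the 2 ms margin is the example value quoted in the text.

```python
import math


def frame_based_buffer_ms(jitter_ms: int, frame_ms: int) -> int:
    """Smallest multiple of the frame size that covers the jitter (assumed model)."""
    return math.ceil(jitter_ms / frame_ms) * frame_ms


def absolute_buffer_ms(jitter_ms: int, margin_ms: int) -> int:
    """Equation {6}: actual delay variation plus a safety margin."""
    return jitter_ms + margin_ms


def rule_of_thumb_buffer_ms(frames_per_packet: int, frame_ms: int) -> int:
    """Equation {5}: 2N x frame size."""
    return 2 * frames_per_packet * frame_ms


# 20 ms of jitter fits a 10 ms or 20 ms frame grid exactly.
print(frame_based_buffer_ms(20, 10), frame_based_buffer_ms(20, 20))
# 21 ms of jitter: frame-based (10 ms frames) versus absolute with a 2 ms margin.
print(frame_based_buffer_ms(21, 10), absolute_buffer_ms(21, 2))
```

Under this model, 21 ms of jitter costs 30 ms with 10 ms frames but only 23 ms with an absolute buffer, matching the figures in the text.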
Figure 23 – G.729A + VAD Packetization Delay, without Jitter Buffer Delay, for the case where the packetization process Fully Utilizes the Power of the Processor

Figure 24 – G.729A + VAD Packetization Delay + Frame-based Jitter Buffer Delay for the case where the packetization process Fully Utilizes the Power of the Processor

Figure 25 – Example of G.729A + VAD Allowable Propagation Delay

Figure 26 – E-Model details for the Delay Examples in Figures 23, 24 and 25

Most jitter is source-based jitter. It is a function of the link speed, packet size and network loading. Source jitter arises due to contention for the link bandwidth, both between multiple voice calls sharing the same queuing priority and between voice and data packets where a data packet has already started transmission. Source-based jitter in all low-speed packet access networks (cable, enterprise, DSL) can dwarf network jitter in high-speed networks by orders of magnitude.

6.2.1.2. Transport Delay and Jitter

Intranet carriers and corporate managed IP networks use equipment with only about 25 to 100 microseconds of delay per hop, plus about 10 to 20 ms of jitter buffer delay end-to-end to accommodate source-based jitter. Network-based jitter is usually negligible relative to source-based jitter, and intranet carriers and corporate managed IP networks design their networks for packet loss rates well below 1%. However, any contention or queue overruns will produce “jitter events”, which are brief but dramatic changes in delay relative to the mean jitter. The question then becomes, how brief? The packet loss concealment algorithms in the end terminals can deal with about 40 ms of missing speech. Between about 40 ms and about 200 ms, the speech is clipped, and after that there are speech dropouts. When calculating the mean jitter, these jitter events have to be removed and noted separately; otherwise they would significantly distort the mean jitter value.
Significant jitter events in the network can only be avoided by increasing the bandwidth and/or implementing a QoS strategy to prioritize the voice packets. For the purposes of this section, 25 ms has been reserved for transport delay plus jitter (see Figure 25) and there are no jitter events.

6.2.1.3. Propagation Delay

Once the packetization, jitter buffer and transport delay are accounted for in a given connection, all that remains is propagation delay. There are three ways to deal with propagation delay:

• Reserve a block of time for propagation delay;
• Illustrate the available propagation delay using equipment quality classes;
• Illustrate the available propagation delay on a case-by-case basis with real scenarios or hypothetical reference connections.

The advantage of reserving a block of time for propagation delay is that it is a very simple approach. The problem is: how much is appropriate? Section 5 identified the typical actual PSTN delay as less than 50 ms, but the worst-case hypothetical PSTN reference connection delay is about 200 ms. This approach is too coarse to be practical for IP telephony.

The second approach, using equipment quality classes, refers to dividing both the IP telephone packetization/jitter buffer delay and the network transport delay into a number of quality classes. The various combinations of these classes can be documented in a matrix against the “High”, “Medium” and “Low” voice quality categories, and the matrix can show how much propagation delay is available. The matrix format is compact. For example, with three delay classes of IP telephones, three delay classes of networks and three voice quality categories, a matrix with 27 entries gives a good overview of the range of available propagation delay, from none to more than enough.
While this approach both solves the resolution problem identified in the previous method and is easy to use in the table format, it does take some studying to appreciate the available propagation delay message.

The third approach is illustrated in Figure 25. It is consistent with the philosophy of this document to use the E-Model to evaluate specific scenarios. Figure 25 shows the case of G.729A + VAD with two speech frames per packet ((2 x 2 + 1) x 10 + 5 ms = 55 ms), a jitter buffer of two speech frames (2 x 10 ms = 20 ms) and a transport delay of 25 ms (from Section 6.2.1.2), for a total of 100 ms. The allowable propagation delay depends on the minimum voice quality level (remember this includes the conversation dynamics) to which the connection is allowed to drop. In this example, the intercept with R = 70, the lower limit of the “Medium” category, is the “line in the sand”. Therefore, the allowable propagation delay for this example is about 145 ms, which is sufficient for all national and most international connections. This approach provides great flexibility for evaluating the allowable propagation delay. However, it requires sufficient resources to perform the necessary E-Model analyses.

6.2.2. Speech Compression

Speech Compression Rec. #1: Use G.711 unless the link speed demands compression.
Speech Compression Rec. #2: Speech compression codecs for wireless networks and packet networks must be rationalized to minimize transcoding issues.

Speech compression adds distortion. In the E-Model, an increase in distortion is represented by an increase in the Ie-value and therefore a decrease in R. As R drops, the delay margin available for a given voice quality level also drops. This is highlighted in Figure 27, where a number of codecs from Table 1 have been plotted to illustrate their relative contours. Figure 28 shows the block diagram and E-Model parameters for this example.
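The delay budget in the Figure 25 example of Section 6.2.1.3 is a sum of three quoted components subtracted from the delay at which the codec contour crosses the chosen R threshold. The sketch below only tallies those numbers. The names are hypothetical, and the 245 ms intercept is an assumption back-calculated from the quoted allowable propagation delay of about 145 ms; the real value must be read from Figure 25 or computed with the E-Model.

```python
# Components quoted in the Figure 25 example (all in ms).
PACKETIZATION_MS = (2 * 2 + 1) * 10 + 5   # (2N + 1) x frame size + D, with N = 2
JITTER_BUFFER_MS = 2 * 10                 # two speech frames of 10 ms
TRANSPORT_MS = 25                         # reserved in Section 6.2.1.2


def allowable_propagation_ms(intercept_ms: float, fixed_ms: float) -> float:
    """Delay left for propagation before the R threshold is crossed."""
    return max(0.0, intercept_ms - fixed_ms)


fixed_total = PACKETIZATION_MS + JITTER_BUFFER_MS + TRANSPORT_MS
print(fixed_total)

# Assumption: the G.729A + VAD contour meets R = 70 near 245 ms.
ASSUMED_R70_INTERCEPT_MS = 245
print(allowable_propagation_ms(ASSUMED_R70_INTERCEPT_MS, fixed_total))
```

The three components sum to 100 ms, so an intercept near 245 ms leaves about 145 ms for propagation.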
Determining the Ie-value for a given codec is as much an art as a science, and it is useful to think of the information in Figure 27 and Table 1 as a ranking of codecs rather than a list of absolute Ie-values.

The selection of the appropriate codec for the task is complicated. One must consider the cost of the intellectual property, the speech performance, the packet loss performance, the conferencing performance (which leads to transcoding issues), the tone/DTMF/fax performance, the efficiency, the speech frame size, the delay and so on. Focusing on these details, however, misses the big picture. The implication from Figure 27 and the example in Section 6.2.4, Transcoding, is that there are simply too many codecs. The combination of impairments introduced by wireless networks and packet networks demands a rationalization of codecs. This task is beyond the scope of this document, but it explains why this document recommends the G.711 codec for IP telephony unless the link speed demands compression.

Without going into the E-Model details, Table 4 shows the number of channels that can be transported over two link speeds for three codecs. Additional capacity is achieved by trading off voice distortion and conversation dynamics. Yes, a gateway can transport 5 channels of G.729A, with two speech frames per packet, over a 212 kb/s link, but the quality of conversation on the resulting channels is reduced. Using the examples in this document for guidance, determining the R-values for Table 4 is left as an exercise for the reader.
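The capacity figures in Table 4 follow from simple packet arithmetic, sketched below. Everything beyond the table's own numbers is an assumption: the function names are hypothetical, the 54-byte overhead is inferred from the table (a 134-byte IP packet for an 80-byte payload), and the cell count is modelled as the IP packet size divided by a 48-byte cell payload, rounded up, which happens to reproduce the tabulated values.

```python
import math

IP_OVERHEAD_BYTES = 54      # assumption: inferred from Table 4 (134 - 80)
ATM_CELL_BYTES = 53         # standard ATM cell size
ATM_CELL_PAYLOAD_BYTES = 48


def atm_cells_needed(codec_kbps: int, packet_ms: int) -> int:
    """ATM cells per voice packet under the assumed model."""
    payload_bytes = codec_kbps * packet_ms // 8   # kb/s x ms = bits
    ip_packet_bytes = payload_bytes + IP_OVERHEAD_BYTES
    return math.ceil(ip_packet_bytes / ATM_CELL_PAYLOAD_BYTES)


def max_channels(link_kbps: int, codec_kbps: int, packet_ms: int) -> int:
    """Whole number of voice channels that fit on the link."""
    bits_per_packet = atm_cells_needed(codec_kbps, packet_ms) * ATM_CELL_BYTES * 8
    # Integer arithmetic avoids floating-point rounding at exact multiples.
    return (link_kbps * packet_ms) // bits_per_packet


# G.729A (8 kb/s) with two 10 ms frames per packet on a 212 kb/s link.
print(max_channels(212, 8, 20))
# G.711 (64 kb/s) with 10 ms packets on a 512 kb/s link.
print(max_channels(512, 64, 10))
```

The first call reproduces the 5-channel G.729A case discussed above.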
Table 4 – Comparison of Codecs, Link Speed and Capacity

Codec    Bit Rate  Packet Frame   Payload Size  IP Packet Size       ATM Cells  ATM Bytes  ATM Bitrate     #Channels (Max)  #Channels (Max)
         (kb/s)    Duration (ms)  (bytes)       w/ overhead (bytes)  Needed     Needed     Needed (kb/s)   at 212 kb/s      at 512 kb/s
G.711    64        10             80            134                  3          159        127.2           1                4
G.711    64        20             160           214                  5          265        106             2                4
G.711    64        30             240           294                  7          371        98.9            2                5
G.726    32        10             40            94                   2          106        84.8            2                6
G.726    32        20             80            134                  3          159        63.6            3                8
G.726    32        30             120           174                  4          212        56.5            3                9
G.729A   8         10             10            64                   2          106        84.8            2                6
G.729A   8         20             20            74                   2          106        42.4            5                12
G.729A   8         30             30            84                   2          106        28.3            7                18
G.729A   8         40             40            94                   2          106        21.2            10               24

The last two columns give the number of channels possible for each codec at a given link speed.

Figure 27 – A Comparison of Selected Codecs from Table 1
[Figure: “Speech Compression Impairment”. R (50 to 100) versus one-way delay (0 to 500 ms), with user satisfaction bands from “Very satisfactory” to “Exceptional limiting case”. Contours are shown for Ie = 0, 2, 5, 7, 11, 15, 20 and 25. Codecs plotted: G.711; GSM EFR @ 12.2 kb/s; G.726, G.727 @ 40 kb/s; G.726, G.727 @ 32 kb/s; G.728 @ 16 kb/s; G.726, G.727 @ 24 kb/s; G.723.1 @ 6.3 kb/s; G.729A + VAD @ 8 kb/s; IS-54 @ 8 kb/s; G.728 @ 12.8 kb/s.]

Figure 28 – E-Model details for the Speech Compression Example in Figure 27

6.2.3. Packet Loss

Packet Loss Rec. #1: Keep (random) packet loss well below 1%.
Packet Loss Rec. #2: Use packet loss concealment with G.711.
Packet Loss Rec. #3: If other codecs are used, then use codecs that have built-in or add-on PLCs.
Packet Loss Rec. #4: New PLCs should be optimized for less than 1% of (random) packet loss.

The packet loss rate, the distribution of the losses (random vs bursty) and the number of speech frames per packet are known to affect the subjective quality of voice on packet networks. The light blue (w/o PLC @ PL = 1%) curve in Figure 29 shows what happens to the voice quality of the G.711 codec when it encounters only 1% of packet loss without a packet loss concealment algorithm.
R drops from 94 to 69 at 0 ms. PLCs monitor the receive signal and attempt to reduce the effects of packet loss by using information in the current packet to estimate the following packet if it does not arrive in time. PLCs add about 5 ms of processing delay, but they are essential in packet networks. Vocoders intended for packet networks, like G.729 and G.723.1, are equipped with PLCs, but codecs like G.711 and G.726, which were originally intended for switched circuit networks, require PLC add-ons. As mentioned below, there are two standard methods for G.711, but there is currently no established standard for G.726. A standard is not necessary, however, since most methods are deployed at the decoder only; therefore, a proprietary algorithm can easily be employed where needed.

Based on the Ie-values in Table 2 and the scenario in Figure 30, the G.711 packet loss performance with and without PLC is illustrated in Figure 29. R at 1% packet loss without PLC is about the same as R at 10% packet loss with PLC (not shown in Figure 29), assuming a random distribution of lost packets. The red, green, blue and violet curves (w/PLC @ PL = 1% to w/PLC @ PL = 5%) show the G.711 with PLC random packet loss performance. It is clear from these curves that PLC must be used with G.711 in packet networks.

Recently, ITU-T SG16 approved the ANSI T1.521 Annex A PLC algorithm as Appendix I of Recommendation G.711. The performance of this algorithm is documented in Table 2 in the “Bursty” column, but it is not plotted in this section. T1.521 also has an Annex B with a second PLC algorithm, which is documented in Table 2 in the “Random” column. This is the algorithm that is plotted in Figure 29. Note that the terms “Random” and “Bursty” only refer to the test conditions that were used to evaluate the algorithms. Because PLC algorithms work on the receive side only, having a choice of two algorithms is acceptable.
Figure 31 and Figure 33 show the random packet loss family of curves for two popular IP coders: the G.729A + VAD and G.723.1A (6.3 kbit/s) codecs, respectively. For relative reference, the G.711 default curve is included on each graph. These graphs illustrate two points. First, on the R-axis they show how much distortion impairment each vocoder adds. Second, on the delay axis they show the reduction in delay available to the connection, for a given R. Recall the packetization and jitter buffer delay details provided in Section 6.2.1. The delay margin available within a given performance category is significantly reduced by the use of speech compression, and it is further reduced by packet loss. Figure 32 and Figure 34 show the related block diagrams and E-Model parameters.

In practice, intranet carriers and corporate managed IP networks design their networks for packet loss rates well below 1%. Much of the packet loss detail shown in this section for rates of packet loss greater than 2% is based on early experience with voice over the Internet, where Best Effort service does not provide any arrival time guarantees.

Including the GSM EFR vocoder in Section 4.2.4, this document details the family of curves for four of the five PLC algorithms/codecs that have published (provisional) packet loss information. All four PLC algorithms show reasonably consistent degradation vs packet loss.

Another way of thinking about packet loss is in terms of time rather than percent. PLCs can provide adequate “repair” of consecutive missing speech up to about 40 ms. Most packet loss is bursty in nature, i.e., occasional long losses rather than frequent short losses. In practice this means that packet loss performance is directly related to packet size: the shorter, the better.
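Thinking about loss in time rather than percent reduces to multiplying packet duration by the number of consecutive packets lost and comparing the result with the repair window of about 40 ms quoted above. The sketch below is illustrative only: the names are hypothetical, the 40 ms limit is treated as a hard threshold even though the text gives it as approximate, and the 30 ms G.723.1 frame size comes from that codec's specification rather than from this section.

```python
PLC_REPAIR_LIMIT_MS = 40   # "about 40 ms" in the text, treated here as a hard limit


def speech_lost_ms(frames_per_packet: int, frame_ms: int,
                   consecutive_packets: int = 1) -> int:
    """Duration of speech missing when consecutive packets are lost."""
    return frames_per_packet * frame_ms * consecutive_packets


def plc_can_repair(frames_per_packet: int, frame_ms: int,
                   consecutive_packets: int = 1) -> bool:
    """True if the gap fits within the assumed PLC repair window."""
    lost = speech_lost_ms(frames_per_packet, frame_ms, consecutive_packets)
    return lost <= PLC_REPAIR_LIMIT_MS


# G.729A, two 10 ms frames per packet: one lost packet versus a burst of three.
print(plc_can_repair(2, 10, 1), plc_can_repair(2, 10, 3))
# G.723.1, two 30 ms frames per packet: a single lost packet.
print(plc_can_repair(2, 30, 1))
```

Smaller packets leave room for a longer burst of lost packets before the gap exceeds the repair window.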
Figure 29 – G.711 Random Packet Loss Performance
[Figure: “G.711 Packet Loss Performance with & without PLC”. R (50 to 100) versus one-way delay (0 to 500 ms), with user satisfaction bands. Curves: G.711 @ PL = 0%; w/PLC @ PL = 1%, 2%, 3% and 5%; w/o PLC @ PL = 1% and 2%. Annotations: ∆R = 20 @ 1% PL and ∆R = 28 @ 2% PL between the w/PLC and w/o PLC curves. PL = Packet Loss; PLC = Packet Loss Concealment.]

Figure 30 – E-Model details for the G.711 Packet Loss Example in Figure 29

Figure 31 – G.729A + VAD Packet Loss Performance compared to the G.711 Reference
[Figure: “G.729A Packet Loss Performance”. R (50 to 100) versus one-way delay (0 to 500 ms), with user satisfaction bands. Curves: G.711 @ PL = 0% (reference); G.729A @ PL = 0%, 1%, 2%, 3% and 4%.]

Figure 32 – E-Model details for the G.729A + VAD Packet Loss Example in Figure 31

Figure 33 – G.723.1 (6.3 kbit/s) Packet Loss Performance compared to the G.711 Reference
[Figure: “G.723.1 Packet Loss Performance”. R (50 to 100) versus one-way delay (0 to 500 ms), with user satisfaction bands. Curves: G.711 @ PL = 0% (reference); G.723.1 @ PL = 0%, 1%, 2%, 3% and 4%.]

Figure 34 – E-Model details for the G.723.1 Packet Loss Example in Figure 33

6.2.4. Transcoding

Transcoding Rec. #1: Avoid transcoding where possible. It adds Ie and delay impairment.
Transcoding Rec. #2: For interoperability, IP gateways must support wireless codecs or IP must implement unified Transcoder Free Operation with wireless.

Transcoding is defined as two or more encodings of a signal through different types of non-G.711 codecs, separated by G.711 or linear segments. Example: GSM EFR to G.711 to G.729. Transcoding is accomplished by converting the signal to G.711 or linear. Direct conversion between arbitrary codecs is not yet possible.

Transcoding is a significant issue in wireless connections because there are several different wireless codecs. IP telephony supports several different codecs, but connections are expected to be established by handshaking to a common codec if the connection does not traverse islands of PSTN. The problem of transcoding occurs in IP-to-wireless connections because each technology may use a different set of codecs, although some wireless codecs are being adopted as options for IP.

In terms of the E-Model and impairments, transcoding has the potential to increase distortion and delay. How much distortion depends on the codecs involved. Looking down the list in Table 1, Ie-values vary from 2 to 50. Some combinations will not be noticeable, while others will be intolerable. Similarly, delay is very specific to a given scenario.

Although this document attempts to summarize complex issues with general recommendations, its paramount objective is to provide an E-Model tutorial for IP scenarios, so readers can analyze their own networks. A number of transcoding and tandeming examples are provided on the next few pages to help deal with the complexities of these situations.

Figure 36 shows a scenario with a GSM wireless telephone connected to a G.729A + VAD IP telephone. The IP telephone may be using G.729A + VAD instead of G.711 due to low speed access. There is one transcode in this connection, from GSM to G.729A + VAD and vice versa.
The resulting decrease in voice quality from the “High” category to the “Medium” category is illustrated by the green curve (GSM with G.729A + VAD transcode) in Figure 35. The GSM EFR Ie-value of 5 is added to the G.729A + VAD Ie-value of 11 for a total Ie of 16. Also, note that the amount of delay available to a given voice quality level is significantly reduced. For example, at R = 70 the available delay is about 80 ms less for the green curve compared to the red curve, which is in turn about 50 ms less than the black curve. The conclusion is that only one transcode can be tolerated before the performance drops below acceptable levels, for most combinations of non-G.711 codecs.

Transcoding E-Modeling Rule: Assume Ie-values are additive.
Note: The E-Model does not take into account the order of the codecs, which in practice may be incorrect.
Transcoding Example: Ie total = GSM EFR (Ie = 5) to G.711 (Ie = 0) to G.729A + VAD (Ie = 11) = 5 + 11 = 16

A way around this additive impairment is Transcoder Free Operation (TrFO). TrFO is an out-of-band signaling procedure that provides the capability of negotiating the same (or at least an interoperable) encoder/decoder combination between the end terminals themselves, with a direct digital (no conversion to/from G.711) connection in between. Figure 37 shows a scenario where the IP network and telephones support the wireless GSM codec, so the voice quality is raised from the green curve to the red curve (GSM only, no transcode, Ie = 5) in Figure 35.

Transcoder Free Operation E-Modeling Rule: Only one Ie-value.
Transcoder Free Operation Example: Ie total = GSM EFR all the way (Ie = 5) = 5

Transcoder Free Operation is still in the discussion stage for wireless networks. TrFO will be just as important for IP as it will be for wireless. However, interoperability must be the common goal of wireless and IP networks.
Either TrFO must be unified between IP and wireless, or the range of codecs in IP gateways must be expanded to include wireless codecs.

The next section deals with tandeming, which has some similarities to transcoding and some significant differences. Worth mentioning in this section is that while Tandem Free Operation (TFO) and TrFO appear to be the same thing, they are implemented in a different manner. A device called the transcode unit is in place for TFO, but not for TrFO. As a result, in simple terms, TrFO may have less delay, but TFO may have more network compatibility and features and be less expensive to implement. Ideally, they should be harmonized.

6.2.5. Tandeming

Tandeming Rec. #1: Avoid asynchronous tandeming if possible. It adds Ie and delay impairment.
Tandeming Rec. #2: Synchronous tandeming of G.726 is generally permissible. Impairment is delay dependent, so long delay DCME equipment should be avoided.

Tandeming, for all codecs except G.711, is defined as two or more encodings of a signal through the same type of codec, separated by analog or G.711 segments. Example: G.726 to G.711 to G.726.

Tandeming for G.711 is defined as two or more encodings of a signal through G.711, separated by analog segments. It is asynchronous tandeming.

Tandeming E-Modeling Rule for G.711: Ie for G.711 is 0, but each D/A-A/D conversion incurs a distortion impairment of 1 qdu.
Tandeming Example for G.711: qdu total = G.711 (qdu = 0.5) to analog to G.711 (qdu = 0.5) = 1

Asynchronous tandeming of waveform codecs (see Table 1), except G.711, is defined as two or more encodings of a signal through the same type of waveform codec, separated by analog segments or by G.711 segments with digital processing, such as digital pads, that interrupts the sample-by-sample flow. Delay introduced by DCME equipment may be an issue; see synchronous tandeming.

Asynchronous Tandeming E-Modeling Rule for non-G.711 Waveform codecs: Ie-values are additive. DCME delay may be an issue.
Asynchronous Tandeming Example for non-G.711 Waveform codecs: Ie total = analog set to G.711 to G.726.32 (Ie = 7) to G.711 to G.726.32 (Ie = 7) to G.711 to analog set = 14

Synchronous tandeming of waveform codecs (in practice G.726), except G.711, is defined as two or more encodings of a signal through the same type of waveform codec, separated by G.711 segments without any digital processing that interrupts the sample-by-sample flow. Some DCME equipment has tandem-avoidance capability that synchronizes the samples. In modern digital networks, waveform codec segments separated by G.711 segments synchronize naturally, without the aid of DCME tandem-avoidance capability, only when there is no digital signal processing in the G.711 segments. However, this is the typical case. As mentioned in Section 5, each piece of DCME equipment also adds about 30 ms of one-way delay, for things like voice activity detection. DCME uses TDM streams, which requires conversion from IP with media gateways. Also, DCME may introduce transcoding issues. DCME equipment was useful in the switched circuit network, but it introduces undesirable delay and Ie impairment in conjunction with wireless and IP technologies.

Synchronous Tandeming E-Modeling Rule for G.726 Waveform codecs: Ie-values are not additive. There is only one Ie-value, regardless of the number of tandems. Delay becomes the limiting factor.
Synchronous Tandeming Example for G.726 Waveform codecs: Ie total = analog set to G.711 to G.726.32 (Ie = 7) to G.711 to G.726.32 to G.711 to analog set = 7

Asynchronous tandeming of speech compression codecs is defined as two or more encodings of a signal through the same type of codec, separated by analog segments, or G.711 segments with digital processing, such as digital pads, and/or different frame boundaries for frame-based codecs, so that the frame sampling boundaries do not line up from one encoding/decoding to the next.
Asynchronous Tandeming E-Modeling Rule for Speech Compression codecs: Ie-values are additive.
Asynchronous Tandeming Example for Speech Compression codecs: Ie total = analog set to G.711 to G.729 (Ie = 10) to G.711 to G.729 (Ie = 10) to G.711 to analog set = 20

It is possible to arrange the frame boundaries of speech compression codecs to line up so that they become synchronously tandemed, but it is not commonplace. The nature of the coding process is such that there will still be some degradation, but probably less than the sum of Ie-values. This situation is probably codec-specific and there are no simple modeling rules.

Tandem Free Operation (TFO) is an in-band signaling procedure to handshake between the transcoding units that would normally interface to the PSTN with G.711 PCM. If the transcoding units determine that they are compatible (normally exactly the same codec, but some interworking scenarios are possible), then they set up a direct digital path (transport the bits) instead of going back to G.711. The key point is that the codecs in the end terminals are unaware of this procedure. TFO is just beginning to be introduced in wireless networks to avoid tandeming of speech codecs in wireless-to-wireless calls. This will raise voice quality levels and, therefore, user expectations.

From the E-Model perspective, TFO is the TrFO scenario in Figure 37. The IP network and telephones support the wireless GSM codec, so the voice quality is illustrated by the red curve (GSM only, no transcode, Ie = 5) in Figure 35.

Tandem Free Operation E-Modeling Rule: Only one Ie-value.
Tandem Free Operation Example: Ie total = GSM EFR all the way (Ie = 5) = 5

TFO was originally designed for wireless systems, but it is applicable to any packet voice network. TFO requires identical codecs at either end.
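The E-Modeling rules in Sections 6.2.4 and 6.2.5 amount to a choice between summing Ie-values and counting a single one. The sketch below restates them using only the Ie-values quoted in the worked examples. The dictionary, the function name and the `synchronous` flag are hypothetical, and treating TFO/TrFO as the same case as synchronous tandeming is a simplification made for illustration.

```python
# Ie-values quoted in the worked examples of Sections 6.2.4 and 6.2.5.
IE = {
    "G.711": 0,
    "GSM EFR": 5,
    "G.726.32": 7,
    "G.729": 10,
    "G.729A + VAD": 11,
}


def ie_total(codecs: list[str], synchronous: bool = False) -> int:
    """Total Ie for a chain of encodings.

    synchronous=False: transcoding or asynchronous tandeming, Ie-values add.
    synchronous=True:  synchronous tandeming or TFO/TrFO, only one Ie-value
                       counts; all non-G.711 codecs must be the same type.
    """
    non_g711 = [c for c in codecs if c != "G.711"]
    if not non_g711:
        return 0
    if synchronous:
        if len(set(non_g711)) != 1:
            raise ValueError("synchronous operation requires a single codec type")
        return IE[non_g711[0]]
    return sum(IE[c] for c in non_g711)


print(ie_total(["GSM EFR", "G.711", "G.729A + VAD"]))                     # transcoding
print(ie_total(["G.726.32", "G.711", "G.726.32"]))                        # asynchronous tandem
print(ie_total(["G.726.32", "G.711", "G.726.32"], synchronous=True))      # synchronous tandem
print(ie_total(["GSM EFR"], synchronous=True))                            # TFO / TrFO
```

These four calls correspond to the transcoding, asynchronous tandeming, synchronous tandeming and TFO examples in the text.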
The handshaking protocol supports selection of compatible compression modes; that is, if the two terminals support multiple codecs, the protocol includes procedures for the selection of a common-mode compression.

Figure 35 – A Comparison of the Effect of One Transcode with No Transcodes and the G.711 Reference
[Figure: “Effect of Transcoding on Impairment”. R (50 to 100) versus one-way delay (0 to 500 ms), with user satisfaction bands. Curves: G.711 Reference; GSM only, no transcodes; GSM with G.729A transcode.]

Figure 36 – E-Model details for the GSM EFR to G.729A Transcode Example in Figure 35

Figure 37 – E-Model details for the No Transcode Example in Figure 35

6.2.6. New Gateway Loss Plan

Loss Plan Rec. #1: Use TIA/EIA/TSB122-A, Voice Gateway Loss and Level Plan.

Until recently, the loss and level plan for North American CPE (customer premise equipment) was documented in ANSI/TIA/EIA-464-B. This loss plan was developed in the early 1990s for PBXs and key systems. There were a number of changes in the second half of the 1990s, necessitating the development of a new loss plan. The changes, in order of importance, were:

• The change from the IEEE loudness rating methodology (TOLR/ROLR) to the ITU loudness rating methodology (SLR/RLR) in ANSI/TIA/EIA-579-A.
• The 3 dB increase in digital set SLR and the 2 dB decrease in digital set RLR introduced in ANSI/TIA/EIA-579-A to standardize with ITU.
• The introduction of OLR = 10 dB as the objective for CPE loss plans.
• The shift from PBXs to gateways.

The final point necessitated documenting essentially the same loss plan in two different standards, a PBX version and a voice gateway version. PN-3673 is the project to revise the PBX standard. It will be published as ANSI/TIA/EIA-464-C, possibly in 2001.
An important point to note with this revision is that the loss plan is no longer mandatory. However, for successful interoperability, it should be followed as closely as possible. TIA/EIA/TSB122-A is the voice gateway version of the new loss plan. It is available from the TIA TR-41 web page.

Annex A (informative) – VoIP End-to-End Delay Budget Planning for Private Networks

This annex provides a more detailed view of VoIP one-way end-to-end delay sources in a private IP network or intranet. In this document, end-to-end delay is used synonymously with one-way delay. Section A.1 covers delay sources in an example worst-case end-to-end private network. Section A.2 defines the delay sources, and Sections A.3 and A.4 show detailed end-to-end delay budget planning in a VoIP network for a G.729A vocoder, including how the end-to-end delay is affected by the voice packet size, link speed and maximum data packet size. Although this document covers the delay budget planning for the G.729A vocoder only, the same planning rules can be applied to any other vocoder.

A.1. VoIP End-to-End Delay Sources Overview

Figure 38 below shows a VoIP end-to-end private network connection and lists the main delay sources for each section of the network. There are basically two types of delay source, fixed and variable, and each delay source in Figure 38 is listed in one of the two categories.
Figure 38 – VoIP End-to-End Delay Sources for Private Network Scenario
[Figure: end-to-end connection from the A-side through the originating voice-LAN (L1 link), originating gateway and edge router, WAN L2 link, core network routers, WAN L2 link, edge router, terminating gateway and terminating voice-LAN (L1 link) to the B-side. Delay sources listed for each section:]
A-side: Fixed: look ahead, encoding, buffer, VAD, packetizing.
Originating Voice-LAN (L1 link): Fixed: switching. Variable: voice contention, data contention.
Originating WAN L2 link: Fixed: serialization.
Core network routers: Fixed: switching, propagation, serialization. Variable: voice contention, data contention.
Terminating WAN L2 link: Fixed: serialization.
Terminating Voice-LAN (L1 link): Fixed: switching. Variable: voice contention, data contention.
B-side: Fixed: decoding. Variable: dejitter buffer.

A.2. VoIP End-to-end Delay Source Definitions

A.2.1 Vocoder Encoding

Details on the vocoder delays are from ITU-T Recommendation G.114; also see Section 5.2.1 of this document. Vocoder encoding delay consists of fixed delays: look-ahead, the encoding process and packetization. There is also the additional serialization delay to transmit the packets over the 10/100 Base-T link, but this is negligible (much less than 1 ms), so it is ignored.

A.2.2 Originating Voice-LAN

A.2.2.1 Fixed Switching Delay

This delay through the edge switch can be significant since forwarding engines in the edge switch are not very fast.

A.2.2.2 Variable Voice Contention Delay

This is the delay due to contention between voice packets for the link bandwidth. Average queue delays caused by contention between voice packets that share the same queuing priority can be modeled using the queuing theory formula for fairly constant bit rate traffic sharing a single queue:

Average voice queuing time: tQ-av = tdls * σ / (2 * (1 - σ))
Worst-case queuing time (95% of distribution): tQ-wo = 2 * tQ-av

where tdls is the voice packet link serialization delay and σ is the link utilization of voice packets.
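The voice contention estimate can be sketched in a few lines of Python. This is an illustrative sketch only: it assumes the queuing term is read as tdls * σ / (2 * (1 - σ)), which is positive for utilizations below 1, and the function name and the sample utilization of 0.3 are hypothetical values chosen for the example, not figures from this bulletin.

```python
def voice_queuing_delay_ms(t_dls_ms, sigma):
    """Return (average, worst-case) voice contention queuing delay in ms.

    t_dls_ms: voice packet link serialization delay (tdls), in ms.
    sigma:    link utilization of voice packets, as a fraction in [0, 1).
    Assumes tQ-av = tdls * sigma / (2 * (1 - sigma)) and tQ-wo = 2 * tQ-av.
    """
    if not 0.0 <= sigma < 1.0:
        raise ValueError("link utilization must be in the range [0, 1)")
    t_q_av = t_dls_ms * sigma / (2.0 * (1.0 - sigma))
    t_q_wo = 2.0 * t_q_av  # worst case, 95% of the distribution
    return t_q_av, t_q_wo


if __name__ == "__main__":
    # Hypothetical inputs: a 10-byte voice payload plus 48 overhead bytes
    # on a 128 kb/s link, with 30% of the link carrying voice.
    t_dls = (10 + 48) / (128 / 8.0)
    average, worst = voice_queuing_delay_ms(t_dls, 0.3)
    print(f"tdls = {t_dls:.3f} ms, tQ-av = {average:.2f} ms, tQ-wo = {worst:.2f} ms")
```

Because the delay grows without bound as σ approaches 1, the sketch rejects utilizations outside [0, 1) rather than returning a negative or infinite value.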
A.2.2.3 Variable Data/Voice Contention Delay

This delay is due to contention between a voice packet and a data packet, where the data packet has already started transmission. When the forwarding node uses priority-scheduling algorithms for differentiated QoS between voice and data classes, the maximum time the voice packet is delayed by the data packet is:

tD-max = (maximum data MTU bytes + 48 overhead bytes) / (link speed in kb/s / 8)

Important planning recommendation: use priority scheduling for voice-class traffic, as well as RTP header compression and data packet fragmentation on slow-speed links, to minimize the contribution of this variable delay source.

A.2.2.4 Fixed Serialization WAN Delay

This delay is due to voice packet transmission on the WAN L2 link. The link rate can vary from 56 kb/s to OC3 and up. The formula for serialization delay is:

tV-max = (voice packet bytes + 48 overhead bytes) / (link speed in kb/s / 8)

Important planning recommendation: to minimize the effect of this delay source, avoid using slow serial links in any of the end-to-end network connections.

A.2.3 Core Network

A.2.3.1 Fixed Switching Delay

This includes packet-switching engine delay (see the originating Voice-LAN section for details) and any other network multiplexing equipment delays. An estimate of 1 ms of delay for each hop is used in the calculation tables in the next section.

A.2.3.2 Fixed Propagation Delay

This is the cumulative delay due to the physical 'speed of light' limitations of propagation through the network. Details are contained in ITU-T Recommendation G.114. For the purpose of this exercise, a figure of 5 µs/km is used in the calculation tables in the next section.
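The data contention, serialization and propagation formulas above share the same arithmetic, which the following Python sketch illustrates. The function names are hypothetical; the sample inputs (128-byte data MTU, 10-byte voice packet, 128 kb/s link, 5000 km) are taken from the Case 1a budget later in this annex.

```python
OVERHEAD_BYTES = 48  # per-packet overhead used in the annex formulas


def link_time_ms(payload_bytes, link_speed_kbps):
    """Time to transmit payload plus overhead on a link, in ms.

    Implements (bytes + 48 overhead) / (link speed kb/s / 8); bytes divided
    by kilobytes per second gives milliseconds.
    """
    return (payload_bytes + OVERHEAD_BYTES) / (link_speed_kbps / 8.0)


def data_contention_max_ms(mtu_bytes, link_speed_kbps):
    """tD-max: longest wait behind a data packet already in transmission."""
    return link_time_ms(mtu_bytes, link_speed_kbps)


def voice_serialization_ms(voice_bytes, link_speed_kbps):
    """tV-max: serialization delay of one voice packet."""
    return link_time_ms(voice_bytes, link_speed_kbps)


def propagation_ms(distance_km, us_per_km=5.0):
    """Propagation delay at the 5 us/km planning figure, in ms."""
    return distance_km * us_per_km / 1000.0


if __name__ == "__main__":
    print(f"tD-max      = {data_contention_max_ms(128, 128):.1f} ms")
    print(f"tV-max      = {voice_serialization_ms(10, 128):.3f} ms")
    print(f"propagation = {propagation_ms(5000):.1f} ms")
```

Running the same functions with a 1544 kb/s link shows why the planning recommendations stress avoiding slow serial links: both tD-max and tV-max scale inversely with link speed.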
A.2.3.3 Fixed Serialization Delay Network

This is the same as defined earlier, but since the link rate in the core network is usually in the broadband range, the total effect of this delay source is small enough (< 1.5 ms) that it is ignored in the calculation tables in the next section.

A.2.3.4 Variable Voice Contention Delay

As defined earlier:

Average voice queuing time: tQ-av = tdls * σ / (2 * (1 - σ))
Worst-case queuing time (95% of distribution): tQ-wo = 2 * tQ-av
Total core network worst-case queuing time (95% of distribution): tQ-wo * (number of hops - 1)

Since the link rate in the core network is usually in the broadband range, the tdls delay source is small. In addition, σ, the link utilization ratio for voice packets, is small, so the total effect of this delay source can be ignored in the calculation tables in the next section.

A.2.3.5 Variable Data/Voice Contention Delay

As defined earlier:

tD-max = (maximum data MTU bytes + 48 overhead bytes) / (link speed in kb/s / 8)
Total core network maximum data MTU queuing time: tD-max * (number of hops - 1)

Important planning recommendation: use priority scheduling for voice-class traffic, as well as RTP header compression and data packet fragmentation on slow-speed links, to minimize the contribution of this variable delay source.

A.2.4 Terminating Voice-LAN

A.2.4.1 Fixed Serialization WAN

This delay is due to voice packet transmission on the WAN L2 link. The link rate can vary from 56 kb/s to OC3 and up. The formula for serialization delay is:

tV-max = (voice packet bytes + 48 overhead bytes) / (link speed in kb/s / 8)

Important planning recommendation: to minimize the effect of this delay source, avoid using slow serial links in any of the end-to-end network connections.

A.2.5 Vocoder Decoder

A.2.5.1 Variable Dejitter Buffer Delay

This is the delay required to buffer all the variable delays in the network so that the voice packets can be played at constant bit-rate to the decoder.
The size of the dejitter buffer is: vocoder encoding compression delay + the total variable delay in the end-to-end connection.

A.2.5.2 Fixed Decoder

Details on the vocoder decoding delay are given in ITU-T Recommendation G.114.

A.3. VoIP End-to-End Delay Budget Case 1

Table 5: Case 1a - VoIP End-to-End Delay Budget
Case 1a: L1 = 10Mb/s; L2 = 128kb/s; Data MTU max = 128

Delay type | Description | Units | G.729 10.00 Fixed (ms) | G.729 10.00 Variable (ms) | G.729 20.00 Fixed (ms) | G.729 20.00 Variable (ms)
A-side phone
Encoding process delay | Codec look ahead | ms | 5.0 | | 5.0 |
 | Encoding compression | ms | 10.0 | | 10.0 |
 | 1xbuffer | ms | 10.0 | ~ | 10.0 | ~
Packetization delay | | ms | 0.0 | ~ | 10.0 | ~
# of voice bytes/packet | | bytes | 10.0 | | 20.0 |
Originating Voice-LAN
Switching | 1 hop, @ > 100 pps | ms | 10.0 | | 10.0 |
Voice contention queuing | voice packets queuing @ 128kb/s (Max 2*SD) | ms | | 1.5 | | 2.9
Data queuing | Max. data unit 128 bytes + 48 O/H @ 128kb/s | ms | | 11.0 | | 11.0
Serialization WAN delay | Voice packet + 48 O/H @ 128kb/s | ms | 3.6 | ~ | 4.3 | ~
Core Network
Switching | 5 hops, @ > 1k pps | ms | 5.0 | | 5.0 |
Voice contention queuing | voice packets queuing @ 1544kb/s (Max 2*SD) | ms | | 0.1 | | 0.3
Data queuing | 5 hops, Max data 128 + 48 O/H @ 1544kb/s avg | ms | | 3.6 | | 3.6
Serialization core | Voice packet + 48 O/H @ 1544kb/s | ms | 1.2 | ~ | 1.4 | ~
Propagation delay | 5000km @ 5µs/km | ms | 25.0 | ~ | 25.0 | ~
Terminating Voice-LAN
Serialization WAN delay | Voice packet + 48 O/H @ 128kb/s | ms | 3.6 | ~ | 4.3 | ~
Switching | 1 hop, @ > 100 pps | ms | 10.0 | | 10.0 |
B-side phone
Dejitter buffer delay | 1 comp. delay + network variable delay | ms | 10.0 | 16.2 | 10.0 | 17.8
Decoding delay | | ms | 10.0 | ~ | 10.0 | ~
Total | | ms | 103.5 | 32.5 | 114.9 | 35.7
Min/Max | | ms | 103.5 | 135.9 | 114.9 | 150.6

Table 6: Case 1b - VoIP End-to-End Delay Budget
Case 1b: L1 = 10Mb/s; L2 = 128kb/s; Data MTU max = 512

Delay type | Description | Units | G.729 10.00 Fixed (ms) | G.729 10.00 Variable (ms) | G.729 20.00 Fixed (ms) | G.729 20.00 Variable (ms)
A-side phone
Encoding process delay | Codec look ahead | ms | 5.0 | | 5.0 |
 | Encoding compression | ms | 10.0 | | 10.0 |
 | 1xbuffer | ms | 10.0 | ~ | 10.0 | ~
Packetization delay | | ms | 0.0 | ~ | 10.0 | ~
# of voice bytes/packet | | bytes | 10.0 | | 20.0 |
Originating Voice-LAN
Switching | 1 hop, @ > 100 pps | ms | 10.0 | | 10.0 |
Voice contention queuing | voice packets queuing @ 128kb/s (Max 2*SD) | ms | | 1.5 | | 2.9
Data queuing | Max. data unit 512 bytes + 48 O/H @ 128kb/s | ms | | 35.0 | | 35.0
Serialization WAN delay | Voice packet + 48 O/H @ 128kb/s | ms | 3.6 | ~ | 4.3 | ~
Core Network
Switching | 5 hops, @ > 1k pps | ms | 5.0 | | 5.0 |
Voice contention queuing | voice packets queuing @ 1544kb/s (Max 2*SD) | ms | | 0.1 | | 0.3
Data queuing | 5 hops, Max data 512 + 48 O/H @ 1544kb/s avg | ms | | 11.6 | | 11.6
Serialization core | Voice packet + 48 O/H @ 1544kb/s | ms | 1.2 | ~ | 1.4 | ~
Propagation delay | 5000km @ 5µs/km | ms | 25.0 | ~ | 25.0 | ~
Terminating Voice-LAN
Serialization WAN delay | Voice packet + 48 O/H @ 128kb/s | ms | 3.6 | ~ | 4.3 | ~
Switching | 1 hop, @ > 100 pps | ms | 10.0 | | 10.0 |
B-side phone
Dejitter buffer delay | 1 comp. delay + network variable delay | ms | 10.0 | 48.2 | 10.0 | 49.8
Decoding delay | | ms | 10.0 | ~ | 10.0 | ~
Total | | ms | 103.5 | 96.4 | 114.9 | 99.6
Min/Max | | ms | 103.5 | 199.9 | 114.9 | 214.5

A.4. VoIP End-to-End Delay Budget Case 2

Table 7: Case 2a - VoIP End-to-End Delay Budget
Case 2a: L1 = 10Mb/s; L2 = 1544kb/s; Data MTU max = 128

Delay type | Description | Units | G.729 10.00 Fixed (ms) | G.729 10.00 Variable (ms) | G.729 20.00 Fixed (ms) | G.729 20.00 Variable (ms)
A-side phone
Encoding process delay | Codec look ahead | ms | 5.0 | | 5.0 |
 | Encoding compression | ms | 10.0 | | 10.0 |
 | 1xbuffer | ms | 10.0 | ~ | 10.0 | ~
Packetization delay | | ms | 0.0 | ~ | 10.0 | ~
# of voice bytes/packet | | bytes | 10.0 | | 20.0 |
Originating Voice-LAN
Switching | 1 hop, @ > 100 pps | ms | 10.0 | | 10.0 |
Voice contention queuing | voice packets queuing @ 1544kb/s (Max 2*SD) | ms | | 0.1 | | 0.2
Data queuing | Max. data unit 128 bytes + 48 O/H @ 1544kb/s | ms | | 0.9 | | 0.9
Serialization WAN delay | Voice packet + 48 O/H @ 1544kb/s | ms | 0.3 | ~ | 0.4 | ~
Core Network
Switching | 5 hops, @ > 1k pps | ms | 5.0 | | 5.0 |
Voice contention queuing | voice packets queuing @ 1544kb/s (Max 2*SD) | ms | | 0.1 | | 0.3
Data queuing | 5 hops, Max data 128 + 48 O/H @ 1544kb/s avg | ms | | 3.6 | | 3.6
Serialization core | Voice packet + 48 O/H @ 1544kb/s | ms | 1.2 | ~ | 1.4 | ~
Propagation delay | 5000km @ 5µs/km | ms | 25.0 | ~ | 25.0 | ~
Terminating Voice-LAN
Serialization WAN delay | Voice packet + 48 O/H @ 1544kb/s | ms | 0.3 | ~ | 0.4 | ~
Switching | 1 hop, @ > 100 pps | ms | 10.0 | | 10.0 |
B-side phone
Dejitter buffer delay | 1 comp. delay + network variable delay | ms | 10.0 | 4.8 | 10.0 | 5.1
Decoding delay | | ms | 10.0 | ~ | 10.0 | ~
Total | | ms | 96.8 | 9.6 | 107.1 | 10.2
Min/Max | | ms | 96.8 | 106.4 | 107.1 | 117.3

Table 8: Case 2b - VoIP End-to-End Delay Budget
Case 2b: L1 = 10Mb/s; L2 = 1544kb/s; Data MTU max = 512

Delay type | Description | Units | G.729 10.00 Fixed (ms) | G.729 10.00 Variable (ms) | G.729 20.00 Fixed (ms) | G.729 20.00 Variable (ms)
A-side phone
Encoding process delay | Codec look ahead | ms | 5.0 | | 5.0 |
 | Encoding compression | ms | 10.0 | | 10.0 |
 | 1xbuffer | ms | 10.0 | ~ | 10.0 | ~
Packetization delay | | ms | 0.0 | ~ | 10.0 | ~
# of voice bytes/packet | | bytes | 10.0 | | 20.0 |
Originating Voice-LAN
Switching | 1 hop, @ > 100 pps | ms | 10.0 | | 10.0 |
Voice contention queuing | voice packets queuing @ 1544kb/s (Max 2*SD) | ms | | 0.1 | | 0.2
Data queuing | Max. data unit 512 bytes + 48 O/H @ 1544kb/s | ms | | 2.9 | | 2.9
Serialization WAN delay | Voice packet + 48 O/H @ 1544kb/s | ms | 0.3 | ~ | 0.4 | ~
Core Network
Switching | 5 hops, @ > 1k pps | ms | 5.0 | | 5.0 |
Voice contention queuing | voice packets queuing @ 1544kb/s (Max 2*SD) | ms | | 0.1 | | 0.3
Data queuing | 5 hops, Max data 512 + 48 O/H @ 1544kb/s avg | ms | | 11.6 | | 11.6
Serialization core | Voice packet + 48 O/H @ 1544kb/s | ms | 1.2 | ~ | 1.4 | ~
Propagation delay | 5000km @ 5µs/km | ms | 25.0 | ~ | 25.0 | ~
Terminating Voice-LAN
Serialization WAN delay | Voice packet + 48 O/H @ 1544kb/s | ms | 0.3 | ~ | 0.4 | ~
Switching | 1 hop, @ > 100 pps | ms | 10.0 | | 10.0 |
B-side phone
Dejitter buffer delay | 1 comp. delay + network variable delay | ms | 10.0 | 14.8 | 10.0 | 15.0
Decoding delay | | ms | 10.0 | ~ | 10.0 | ~
Total | | ms | 96.8 | 29.5 | 107.1 | 30.1
Min/Max | | ms | 96.8 | 126.3 | 107.1 | 137.2
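The way the budget rows combine into the minimum and maximum end-to-end delay can be sketched as follows. This is a rough reconstruction under stated assumptions, not the method used to produce the tables: the function and parameter names are hypothetical, the voice contention values are copied from the tables rather than computed (the voice link utilization is not stated), and the core serialization and core data queuing terms are assumed to be the per-hop value times (number of hops - 1), which is consistent with the tabulated figures.

```python
OVERHEAD_BYTES = 48  # per-packet overhead used in the annex formulas


def link_time_ms(payload_bytes, link_kbps):
    """(bytes + 48 overhead) / (link speed kb/s / 8), in ms."""
    return (payload_bytes + OVERHEAD_BYTES) / (link_kbps / 8.0)


def delay_budget(voice_bytes, packetization_ms, l2_kbps, mtu_bytes,
                 edge_voice_queue_ms, core_voice_queue_ms,
                 core_kbps=1544.0, core_hops=5, distance_km=5000.0):
    """Return (fixed, variable, maximum) one-way delay in ms."""
    ser_wan = link_time_ms(voice_bytes, l2_kbps)
    ser_core = link_time_ms(voice_bytes, core_kbps) * (core_hops - 1)
    fixed = sum([
        5.0, 10.0, 10.0, packetization_ms,  # A-side: look ahead, compression, buffer
        10.0, ser_wan,                      # originating switching, WAN serialization
        1.0 * core_hops, ser_core,          # core switching at 1 ms per hop, serialization
        distance_km * 5.0 / 1000.0,         # propagation at 5 us/km
        ser_wan, 10.0,                      # terminating WAN serialization, switching
        10.0, 10.0,                         # dejitter fixed part, decoding
    ])
    network_variable = (
        edge_voice_queue_ms
        + link_time_ms(mtu_bytes, l2_kbps)
        + core_voice_queue_ms
        + link_time_ms(mtu_bytes, core_kbps) * (core_hops - 1)
    )
    # The dejitter buffer's variable part is sized to the network variable
    # delay, so the total variable delay counts that amount twice.
    variable = 2.0 * network_variable
    return fixed, variable, fixed + variable


if __name__ == "__main__":
    # Case 1a, 10-byte voice packets: L2 = 128 kb/s, data MTU = 128 bytes.
    fixed, variable, maximum = delay_budget(10, 0.0, 128.0, 128, 1.5, 0.1)
    print(f"min = {fixed:.1f} ms, variable = {variable:.1f} ms, max = {maximum:.1f} ms")
```

Changing only `mtu_bytes` from 128 to 512 in the call above shows how strongly the maximum data packet size drives the variable delay on a slow L2 link.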