TSB116
TIA/EIA
TELECOMMUNICATIONS
SYSTEMS BULLETIN
Telecommunications
IP Telephony Equipment
Voice Quality Recommendations for
IP Telephony
TSB116
MARCH 2001
TELECOMMUNICATIONS INDUSTRY ASSOCIATION
The Telecommunications Industry Association
Represents the Communications Sector of
NOTICE
TIA/EIA Engineering Standards and Publications are designed to serve the public interest
through eliminating misunderstandings between manufacturers and purchasers, facilitating
interchangeability and improvement of products, and assisting the purchaser in selecting and
obtaining with minimum delay the proper product for his particular need. Existence of such
Standards and Publications shall not in any respect preclude any member or nonmember of
TIA/EIA from manufacturing or selling products not conforming to such Standards and
Publications, nor shall the existence of such Standards and Publications preclude their voluntary
use by those other than TIA/EIA members, whether the standard is to be used either domestically
or internationally.
Standards and Publications are adopted by TIA/EIA in accordance with the American National
Standards Institute (ANSI) patent policy. By such action, TIA/EIA does not assume any liability
to any patent owner, nor does it assume any obligation whatever to parties adopting the Standard
or Publication.
This Standard does not purport to address all safety problems associated with its use or all
applicable regulatory requirements. It is the responsibility of the user of this Standard to
establish appropriate safety and health practices and to determine the applicability of regulatory
limitations before its use.
(From Project No. 4689, formulated under the cognizance of the TIA TR-41.1 Subcommittee on
Multiline Terminal Systems.)
Published by
TELECOMMUNICATIONS INDUSTRY ASSOCIATION 2001
Standards and Technology Department
2500 Wilson Boulevard
Arlington, VA 22201
PRICE: Please refer to current Catalog of
EIA ELECTRONIC INDUSTRIES ALLIANCE STANDARDS and ENGINEERING
PUBLICATIONS or call Global Engineering Documents, USA and Canada
(1-800-854-7179) International (303-397-7956)
All rights reserved
Printed in U.S.A.
PLEASE!
DON'T VIOLATE
THE
LAW!
This document is copyrighted by the TIA and may not be reproduced without
permission.
Organizations may obtain permission to reproduce a limited number of copies
through entering into a license agreement. For information, contact:
Global Engineering Documents
15 Inverness Way East
Englewood, CO 80112-5704 or call
U.S.A. and Canada 1-800-854-7179, International (303) 397-7956
TIA/EIA/TSB116
TABLE OF CONTENTS
1. INTRODUCTION
1.1. SUMMARY OF VOICE QUALITY RECOMMENDATIONS FOR IP TELEPHONY
2. REFERENCES
3. DEFINITIONS, ABBREVIATIONS AND ACRONYMS
3.1. WAVEFORM CODEC AND SPEECH COMPRESSION CODEC
3.2. ABBREVIATIONS AND ACRONYMS
4. THE E-MODEL
4.1. TRANSMISSION RATING FACTOR “R”
4.2. IP TELEPHONY IMPAIRMENTS AND THE E-MODEL
4.2.1. Delay
4.2.2. Echo
4.2.3. Speech Compression
4.2.4. Packet Loss
4.3. WHAT DOES R SOUND LIKE?
4.4. E-MODEL SYMMETRY AND PERFORMANCE
4.5. E-MODEL ENHANCEMENTS
4.6. E-MODEL CONVENTIONS
5. WIRELINE PSTN VOICE QUALITY BENCHMARKS
5.1. ISDN VOICE QUALITY
5.2. PSTN VOICE QUALITY
5.3. TOLL COMPRESSION VOICE QUALITY
5.4. WIRELINE PSTN VOICE QUALITY SUMMARY
6. IP TELEPHONY VOICE QUALITY ANALYSIS
6.1. VOICE QUALITY ISSUES FOR IP TELEPHONY
6.2. VOICE QUALITY RECOMMENDATIONS FOR IP TELEPHONY
6.2.1. Delay
6.2.2. Speech Compression
6.2.3. Packet Loss
6.2.4. Transcoding
6.2.5. Tandeming
6.2.6. New Gateway Loss Plan
ANNEX A (INFORMATIVE) – VOIP END-TO-END DELAY BUDGET PLANNING FOR PRIVATE NETWORKS
A.1. VOIP END-TO-END DELAY SOURCES OVERVIEW
A.2. VOIP END-TO-END DELAY SOURCE DEFINITIONS
A.3. VOIP END-TO-END DELAY BUDGET CASE 1
A.4. VOIP END-TO-END DELAY BUDGET CASE 2
FOREWORD
(This foreword is not part of this standard.)
This document is a TIA/EIA Telecommunications Technical Services Bulletin (TSB) produced by
Working Group TR-41.1.2 of Committee TR-41. This TSB was developed in accordance with
TIA/EIA procedural guidelines, and represents the consensus position of the Working Group and its
parent subcommittee TR-41.1, which served as the formulating group.
The TR-41.1.2 VoIP Voice Quality Working Group acknowledges the contribution made by the
following individuals in the development of this standard.
Name                 Representing
Roger Britt          Nortel Networks
Mark Armstrong       Nortel Networks
Dermot Kavanagh      Nortel Networks
Peter Melton         eOn Communications
Kirit Patel          Cisco Systems
Chair/Editor
Copyrighted parts of ITU-T Appendix I to Recommendation G.113 and Recommendation G.114 are
used with permission of the ITU. The ITU owns the copyright for the ITU Recommendations.
The one annex in this Standard is informative and is not considered part of this Standard.
Suggestions for improvement of this standard are welcome. They should be sent to:
Telecommunications Industry Association
Engineering Department
Suite 300
2500 Wilson Boulevard
Arlington, VA 22201
1. Introduction
The objectives of this TSB are to provide end-to-end voice quality guidelines for North American IP
Telephony and to provide an E-Model tutorial for IP scenarios. IP Telephony introduces several impairments,
some of which are familiar and some new. The E-Model (ITU-T Recommendation G.107) is a tool
that can estimate the end-to-end voice quality, taking the IP Telephony parameters and impairments
into account. This TSB first describes how the E-Model handles IP Telephony impairments and then
it provides general design recommendations for the best possible voice quality performance
irrespective of cost, available technology or customer requirements. These recommendations are
illustrated with specific IP scenarios to provide an E-Model tutorial for analyzing real networks.
Since IP telephony is initially a replacement technology for the existing wireline PSTN, the focus of
this document is on wireline scenarios. The impairments introduced by wireline IP packet technology
can be significant. The reader should be aware that wireless and satellite technologies also introduce
significant impairments and that only a few of the combined effects are illustrated here. This TSB
builds on similar work done for North American PBX private networks that was published in
TIA/EIA/TSB32-A and the focus of this document remains on providing guidelines for engineered
private networks as opposed to the Internet.
The E-Model scenarios detailed in this TSB are available as two Microsoft Excel workbooks:
TSB116NETEM1.xls and TSB116NETEM2.xls. Each workbook includes Version 19 of the E-Model
application, which the reader can use to model other scenarios. Download these workbooks at:
http://www.tiab2b.com/whitepapers.cfm?manufacturer=Nortel
1.1. Summary of Voice Quality Recommendations for IP Telephony
Section 6 uses the E-Model to develop the following IP Telephony voice quality recommendations.
Delay Rec. #1: Use G.711 end-to-end because it has the lowest Ie-value and therefore it allows more
delay for a given voice quality level.
Delay Rec. #2: Minimize the speech frame size and the number of speech frames per packet.
Delay Rec. #3: Actively minimize jitter buffer delay.
Delay Rec. #4: Actively minimize one-way delay.
Delay Rec. #5: Accept the E-Model results, which permit longer delays for low Ie-value codecs, like
G.711, for a given R-value; see Figure 22 and Figure 27.
Delay Rec. #6: Use priority scheduling for voice-class traffic, as well as RTP header compression
and data packet fragmentation on slow-speed links to minimize the contribution of
this variable delay source.
Delay Rec. #7: Avoid using slow serial links.
Speech Compression Rec. #1: Use G.711 unless the link speed demands compression.
Speech Compression Rec. #2: Speech compression codecs for wireless networks and packet
networks must be rationalized to minimize transcoding issues.
Packet Loss Rec. #1: Keep (random) packet loss well below 1%.
Packet Loss Rec. #2: Use packet loss concealment with G.711.
Packet Loss Rec. #3: If other codecs are used, then use codecs that have built-in or add-on PLCs.
Packet Loss Rec. #4: New PLCs should be optimized for less than 1% of (random) packet loss.
Transcoding Rec. #1: Avoid transcoding where possible; it adds Ie and delay impairment.
Transcoding Rec. #2: For interoperability, IP gateways must support wireless codecs or IP must
implement unified Transcoder Free Operation with wireless.
Tandeming Rec. #1: Avoid asynchronous tandeming if possible; it adds Ie and delay impairment.
Tandeming Rec. #2: Synchronous tandeming of G.726 is generally permissible. Impairment is
delay dependent, so long delay DCME equipment should be avoided.
Loss Plan Rec. #1: Use TIA/EIA/TSB122-A, Voice Gateway Loss and Level Plan.
2. References
The following documents contain provisions that are referenced in this TSB. At the time of
publication, the editions indicated were valid. All standards are subject to revision, and parties to
agreements based on this Standard are encouraged to investigate the possibility of applying the most
recent editions of the standards indicated below. ANSI and TIA maintain registers of currently valid
national standards published by them.
WARNING: This document contains a reference to a work-in-progress – PN-3673 (to be
published as ANSI/TIA/EIA-464-C), which is subject to change. The most current version of
PN-3673 is available in the public directory of TR-41.1 at:
http://ftp.tiaonline.org/tr-41/tr411/Public/Latest_Revision_of_PN-3673/
[1] ANSI/TIA/EIA-464-B (April 1996), Requirements for Private Branch Exchange (PBX)
Switching Equipment.
[2] ANSI/TIA/EIA-579-A (November 1998), Transmission Requirements for Digital Wireline
Telephones.
[3] ANSI/TIA/EIA-810-A (December 2000), Transmission Requirements for Narrowband Voice
over IP and Voice over PCM Digital Wireline Telephones.
[4] TIA/EIA/TSB32-A (December 1998), Overall Transmission Plan Aspects for Telephony in a
Private Network.
[5] TIA/EIA/TSB122-A (March 2001), Voice Gateway Loss and Level Plan Guidelines.
[6] PN-3673 (to be published as ANSI/TIA/EIA-464-C), Requirements for PBX switching
Equipment.
[7] ANSI T1.521, American National Standard for Packet Loss Concealment with ITU-T
Recommendation G.711.
[8] ITU-T Recommendation G.107 (12/98) and (05/00), The E-Model, A Computational Model
for use in Transmission Planning.
[9] ITU-T Recommendation G.108 (2000), Conversational impacts on end-to-end speech
transmission quality – a planning guide on effects not covered by the E-Model.
[10] ITU-T Recommendation G.109 (1999), Definition of categories of speech transmission
quality.
[11] ITU-T Recommendation G.113 (02/96), Transmission impairments.
[12] ITU-T Appendix I to Recommendation G.113 (1998), Transmission impairments – Appendix
I: Provisional planning values for the equipment impairment factor Ie.
[13] ITU-T Recommendation G.114 (05/00), General Recommendations on the transmission
quality for an entire international telephone connection.
[14] CCITT Recommendation G.131 (08/96), Control of talker echo.
[15] ITU-T Recommendation G.175 (05/00), Transmission plan aspects of special circuits and
connections using the international telephone connection network.
[16] ITU-T Recommendation G.177 (2000), Transmission planning for voiceband services over
hybrid Internet/PSTN connections.
[17] CCITT Recommendation G.711 (11/88), Pulse code Modulation (PCM) of voice frequencies.
[18] ITU-T Recommendation G.712 (11/96), Transmission performance characteristics of pulse
code modulation.
[19] ITU-T Recommendation G.723.1 (03/96), Dual rate speech coder for multimedia
communications transmitting at 5.3 and 6.3 kbit/s.
[20] ITU-T Recommendation G.726 (12/90), 40, 32, 24, 16 kbit/s Adaptive Differential Pulse
Code Modulation (ADPCM).
[21] ITU-T Recommendation G.729 (03/96), Coding of speech at 8 kbit/s using
conjugate-structure algebraic-code-excited linear-prediction (CS-ACELP).
[22] ITU-T Recommendation P.861 (02/98), Objective quality measurement of telephone-band
(300 - 3400 Hz) speech codecs.
[23] ETSI EG 201 377-1 (1999), Specification and measurement of speech transmission quality;
Part 1: Introduction to objective comparison measurement methods for one-way speech
quality across networks.
3. Definitions, Abbreviations and Acronyms
3.1. Waveform Codec and Speech Compression Codec
A codec is a combination of an analog-to-digital encoder and a digital-to-analog decoder operating in
opposite directions of transmission in the same equipment. A “waveform codec” preserves the
waveform of the incoming signal, and operates on a sample-by-sample basis. G.711 and G.726 are
examples of waveform codecs. Modern low bit-rate coders work on a different principle than
waveform codecs. They use a model of the human speech mechanism to encode and compress
speech signals, based on the analysis of a frame of input samples. The speech model parameters are
sent instead of the speech waveform. This document generally uses the term “speech compression
codec” for this type of codec, but it also uses the term “vocoder”. Table 1 separates codecs into the
waveform and speech compression categories.
3.2. Abbreviations and Acronyms
Abbreviations and acronyms, not in common usage, which appear in this TSB, are defined below.
ACELP    Algebraic-Code-Excited Linear-Prediction
CPE      Customer Premise Equipment
DCME     Digital Circuit Multiplication Equipment
DSL      Digital Subscriber Line
ECAN     Echo Canceller
EFR      Enhanced Full Rate
ERL      Echo Return Loss
ERLE     Echo Return Loss Enhancement
GoB      Good or Better (a quality rating derived from MOS)
GSM      Global System for Mobile Telecommunications
Ie       Equipment Impairment factor (an E-Model input parameter)
IP       Internet Protocol
ISDN     Integrated Services Digital Network
MOS      Mean Opinion Score
NETEM    Network Edge Technology E-Model
NLP      Nonlinear Processor
OLR      Objective Loudness Rating
PBX      Private Branch Exchange
PCM      Pulse Code Modulation
PL       Packet Loss
PLC      Packet Loss Concealment
PoW      Poor or Worse (a quality rating derived from MOS)
PSTN     Public Switched Telephone Network
qdu      Quantization Distortion Unit(s)
QoS      Quality of Service (refers to systems of tagging packets for priority of transmission)
RLR      Receive Loudness Rating
RTP      Real-Time Protocol
SLR      Send Loudness Rating
STMR     Sidetone Masking Rating
TCLw     Weighted Terminal Coupling Loss
TDM      Time Division Multiplexing
TELR     Talker Echo Loudness Rating
TFO      Tandem Free Operation
TrFO     Transcoder Free Operation
VAD      Voice Activity Detector
VoIP     Voice over Internet Protocol
4. The E-Model
The objectives for this section are to:
• Demonstrate the suitability of the E-Model for estimating the voice quality of IP (Internet
Protocol) Telephony,
• Explain the R-scale used by the E-Model,
• Explain what some of the IP Telephony impairments sound like.
The E-Model is a transmission-planning tool for estimating the user satisfaction of a narrowband,
handset conversation, as perceived by the listener. It is not intended for predicting absolute user
satisfaction. Instead, the intent is to model the performance of an unknown connection relative to a
connection with known performance. The E-Model has proven to be a versatile tool that has adapted
well to the impairments of IP telephony. This document assumes that the reader is familiar with the
E-Model and the basics of the following standards:
• ITU-T Recommendation G.107 (the E-Model, including a program listing)
• ITU-T Recommendation G.108 (a tutorial on the E-Model and network planning)
• ITU-T Recommendation G.109 (defines categories of speech transmission quality)
• ITU-T Recommendation G.113 (details transmission impairments)
• ITU-T Appendix I to Rec. G.113 (table of the equipment impairment factor, Ie, values)
• ITU-T Recommendation G.114 (details delay, including the expected delay for IP codecs)
• ITU-T Recommendation G.131 (details talker echo)
• ITU-T Recommendation G.177 (provides guidelines for mixed IP/PSTN connections)
• TIA/EIA/TSB32-A and ETSI Guide 201 050 (a tutorial on the E-Model and a transmission
planning guide for private networks).
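Figure 1 – Comparison of E-Model Output Scales and Categories
[Chart: the R scale (0 to 100) on the left, the G.109 user satisfaction categories in the center, and the
MOS, %GoB and %PoW scales on the right. Values recoverable from the chart:]
R                        MOS    %GoB    %PoW
94 (G.107 default)       4.4    98.4    0.1
90                       4.3    97.0    0.2
80                       4.0    89.5    1.4
70                       3.6    73.6    5.9
60                       3.1    50.1    17.4
50                       2.6    26.6    37.7
0                        1.0
R range      User Satisfaction
90 to 100    Very Satisfied
80 to 90     Satisfied
70 to 80     Some Users Dissatisfied
60 to 70     Many Users Dissatisfied
50 to 60     Nearly All Users Dissatisfied
0 to 50      Not Recommended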
4.1. Transmission Rating Factor “R”
The output of the E-Model is a scalar called the “Rating Factor”, the “R-value”, or simply R. The
scale is typically from 50 to 100, where everything below 50 is clearly unacceptable and everything
above 94.15 (the maximum with the G.107 E-Model, version 19 default values) is unobtainable in
narrowband (300 to 3400 Hz) telephony. The scale on the left-hand side of Figure 1 illustrates this
point. The center scale labeled “User Satisfaction” shows the categories defined in G.109. This gives
an indication of the overall quality of the conversation.
It is important to note the distinction between E-Model objective results and the results of subjective
studies that are expressed using the MOS (Mean Opinion Score), %GoB (percent Good or Better) or
%PoW (percent Poor or Worse) scales. In subjective testing, subjects are requested to classify the
perceived quality into categories (for example, a five point scale that includes the classifications
excellent, good, fair, poor and bad). In each subjective experiment, the MOS scores may differ, even
for the same condition, depending on the design of the experiment, the range of conditions included
in the study, etc. E-Model results, however, are calculated using the Impairment Factor method in
which impairment values along the speech path (such as loss, distortion, echo, delay, noise, etc.) are
combined to obtain an overall transmission rating R, which is objective and repeatable. While R
can be deterministically converted into MOS, %GoB or %PoW scores, it is preferable to avoid
confusion and use only the R scale for all E-Model work. For reference, the MOS, %GoB and %PoW
scales are shown on the right-hand side of Figure 1.
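The deterministic conversion from R to the subjective scales can be sketched as follows. This is a minimal illustration, assuming the R-to-MOS, %GoB and %PoW mappings given in ITU-T Recommendation G.107; the function names are hypothetical, and the results may differ in the last digit from the values printed in Figure 1.

```python
import math


def r_to_mos(r: float) -> float:
    """Estimated MOS for a transmission rating R (assumed G.107 mapping)."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7.0e-6


def _normal_cdf(x: float) -> float:
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def r_to_gob(r: float) -> float:
    """Percent of users rating the connection Good or Better (assumed G.107 mapping)."""
    return 100.0 * _normal_cdf((r - 60.0) / 16.0)


def r_to_pow(r: float) -> float:
    """Percent of users rating the connection Poor or Worse (assumed G.107 mapping)."""
    return 100.0 * _normal_cdf((45.0 - r) / 16.0)


if __name__ == "__main__":
    # Print the conversions at the R values marked in Figure 1.
    for r in (94, 90, 80, 70, 60, 50):
        print(f"R={r:3d}  MOS={r_to_mos(r):.1f}  "
              f"%GoB={r_to_gob(r):.1f}  %PoW={r_to_pow(r):.1f}")
```

Because the conversion is one-to-one over the useful range, nothing is gained by reporting MOS instead of R, which is consistent with the advice to use only the R scale for E-Model work.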
The E-Model consists of several models that relate specific impairment parameters and their
interactions to end-to-end performance. The total end-to-end performance, taking into account all
factors, is estimated using the Impairment Factor method, which is based on the principle that
transmission impairments can be transformed into “Psychological Factors” and these factors are
additive on the “Psychological Scale”.
The equation for the transmission rating factor R is:
R = Ro - Is - Id - Ie + A
where:
• Ro is the basic signal-to-noise ratio, based on send and receive loudness ratings and the circuit and
room noise;
• Is is the sum of real-time or simultaneous speech transmission impairments, e.g., loudness levels,
sidetone and PCM quantizing distortion;
• Id is the sum of delayed impairments relative to the speech signal, e.g., talker echo, listener echo
and absolute delay;
• Ie is the Equipment Impairment factor for special equipment, e.g., low bit-rate coding (determined
subjectively for each codec and for each % packet loss and documented in Appendix I to G.113);
• A is the Advantage factor, which adds to the total and improves the R-value for new services, like
satellite phones, to take into account the advantage of using a new service and to reflect acceptance
of lower quality by users for such services. It is assumed that the Advantage Factor will be reduced
over time as the service improves and the customers get used to the benefits of the new service. It
is not recommended to include a non-zero Advantage Factor for IP telephony because it is a
replacement for existing services, rather than a completely new service.
The Equipment Impairment factor and the Advantage factor are unique to the E-Model, but it is the
Equipment Impairment factor that makes the E-Model a powerful tool for estimating the relative user
satisfaction of IP Telephony conversations.
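The additive structure of the rating equation, and the mapping of R onto the user satisfaction categories of Figure 1, can be illustrated with a short sketch. The function names and the input values in the example are hypothetical and chosen only to show the arithmetic; they are not taken from G.107 or from any scenario in this TSB.

```python
# Lower R bound of each user satisfaction category, as shown in Figure 1.
SATISFACTION_BANDS = [
    (90.0, "Very satisfied"),
    (80.0, "Satisfied"),
    (70.0, "Some users dissatisfied"),
    (60.0, "Many users dissatisfied"),
    (50.0, "Nearly all users dissatisfied"),
]


def rating_factor(ro: float, i_s: float, i_d: float, i_e: float,
                  a: float = 0.0) -> float:
    """R = Ro - Is - Id - Ie + A; A defaults to 0, as advised for IP telephony."""
    return ro - i_s - i_d - i_e + a


def user_satisfaction(r: float) -> str:
    """Return the user satisfaction category for a rating R."""
    for lower_bound, label in SATISFACTION_BANDS:
        if r >= lower_bound:
            return label
    return "Not recommended"


if __name__ == "__main__":
    # Hypothetical impairment values, for illustration only.
    r = rating_factor(ro=94.0, i_s=0.0, i_d=4.0, i_e=11.0)
    print(r, user_satisfaction(r))
```

Because the impairments are additive on the same scale, a codec with a lower Ie leaves more of the budget available for delay (Id) at a given target R.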
4.2. IP Telephony Impairments and the E-Model
The four main impairments of IP telephony are:
• Delay, including delay variation and jitter
• Echo
• Speech compression
• Packet loss.
The ability to handle these impairments is one of the strengths of the E-Model.
Figure 2 – Delay Impairment of Reference Connection
[Plot: transmission rating factor R (50 to 100) versus one-way delay (0 to 500 ms) for TELR = 65 dB.
The user satisfaction scale on the right-hand side reads: Very satisfactory, Satisfactory, Some users
dissatisfied, Many users dissatisfied, Exceptional limiting case.]
4.2.1. Delay
The curve in Figure 2 plots the transmission rating factor R vs delay for the reference connection
shown in Figure 3. The right-hand side of Figure 2 includes the “User Satisfaction” scale for
reference.
Graphing R (y-axis) against delay (x-axis) gives a clear picture of how important delay is in
interactive voice telephony. The reference connection curve uses the G.107 default values for all
parameters except the variable delay. This gives the best possible performance for a narrowband
handset conversation, over this range of delay, and therefore will be used as the “relative reference”
throughout this document. The connection consists of two ideal digital telephones with G.711 codecs
and some means to vary network delay from 0 to 500 ms, as shown in Figure 3. The parametric
variable, TELR (talker echo loudness rating), shown in the Figure 2 legend, is explained in the next
section.
Notice that there is a knee on the curve at about 175 ms. The region between 150 and 200 ms is where
the delay starts to affect the dynamics of a conversation. The steeper slope of the curve after 175 ms
reflects the increasing degradation of the dynamics of normal conversation with increasing delay.
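The shape of this curve can be approximated with a short sketch of the absolute-delay impairment term. It assumes the expression for Idd given in ITU-T Recommendation G.107 and should be checked against that Recommendation before use; the function name is hypothetical. It covers only the absolute-delay part of Id, so it does not include the talker echo contribution that is also present in Figure 2.

```python
import math


def absolute_delay_impairment(ta_ms: float) -> float:
    """Impairment Idd due to absolute one-way delay Ta in ms (assumed G.107 form)."""
    if ta_ms <= 100.0:
        return 0.0
    x = math.log10(ta_ms / 100.0) / math.log10(2.0)
    return 25.0 * ((1.0 + x ** 6) ** (1.0 / 6.0)
                   - 3.0 * (1.0 + (x / 3.0) ** 6) ** (1.0 / 6.0)
                   + 2.0)


if __name__ == "__main__":
    # The impairment is negligible up to about 150 ms and grows quickly beyond 200 ms.
    for ta in (0, 100, 150, 175, 200, 300, 400, 500):
        print(f"Ta = {ta:3d} ms  Idd = {absolute_delay_impairment(ta):5.2f}")
```

Under this assumed expression the impairment stays well below one R point at 150 ms and reaches roughly three points at 200 ms, which matches the knee described above.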
Why does this happen? In a normal face-to-face conversation, after one person speaks, there is about
a 200 ms break, then the other person speaks, followed by another 200 ms break and so on. This is
called turn taking. When delays on the communication channel become comparable to the turn-taking
pauses, there is a loss of synchronicity in the conversation and normal turn taking rules start to break
down. Often, when there is added delay, one person will start talking before the other person is
finished or both people will start talking simultaneously, which causes the conversation to stop and
restart. If one person dominates the conversation, then the other person will have trouble breaking in,
because the dominant talker has already started again before the break reaches the less dominant
talker. There are other effects as well. For instance, the extra delay can also change the message.
People interpret hesitation as evidence of openness, honesty, or confidence. Suppose someone asks
the question, “Will you marry me?” and the answer, “Yes”, is delayed by some noticeable amount.
The delay may be interpreted as a hesitation to reply rather than the normal operation of codecs, jitter
buffers and propagation delay. The medium can distort the message.
Figure 3 – Block Diagram and E-Model Parameters for Reference Connection
[Block diagram: a Side A digital telephone and a Side B digital telephone connected through a 0 dBr
point, with Echo Path - Side A and Echo Path - Side B marked.]
Title                                   Abbrev.   (Default)
Electric Circuit Noise (at 0 dBr)       Nc        (-70)
Room Noise                              Po        (35)
Send Loudness Rating                    SLR       (8)
Receive Loudness Rating                 RLR       (2)
D-factor                                D         (3)
Noise Floor                             Nfor      (-64)
Sidetone Masking Rating                 STMR      (15)
Equipment Impairment Factor             Ie        (0)
Expectation (Advantage) Factor          A         (0)
Mean One-Way Delay (upper)              Tu        (0)
Mean One-Way Delay (lower)              Tl        (0)
Mean One-Way Delay (upper = lower)      Tul       (0)
Electrical Loss (upper)                 Lu
Electrical Loss (lower)                 Ll
Electrical Loss (upper = lower)         Lul
Quantizing Distortion Units (upper)     qduu      (1)
Quantizing Distortion Units (lower)     qdul      (1)
Echo Return Loss                        ERL
Values used for each digital set: Room Noise 35, SLR 8, RLR 2, D-factor 3, Noise Floor -64, STMR 15,
Echo Return Loss 55; all other set entries are 0.
Values used at the 0 dBr point: Electric Circuit Noise -70, Mean One-Way Delay 0 to 500 ms.
4.2.2. Echo
The family of curves in Figure 4 shows the effect of echo as predicted by the E-Model for the
connection shown in Figure 7. To fully understand the meaning of the graph it is necessary to take a
step back and explain the parametric variable TELR. First, the definition (based on Echo Path B in
Figures 5 and 6):
TELR (Side B) = SLR (Side B) + Loss in bottom path + ERL or TCLw (Side A)
+ Loss in top path + RLR (Side B)
The TELR for Side A is similar, but follows the opposite path, as highlighted by the arrows below the
block diagrams. Figure 5 shows a connection with two 2-to-4-wire conversions and an analog-to-digital
and digital-to-analog conversion. The 2-to-4-wire conversion is called a hybrid and the amount
of echo that is reflected by the hybrid is called the transhybrid loss, the echo return loss or more
simply the ERL. Transhybrid loss varies depending on the impedance matching of the circuits on
each side of the hybrid. Figure 6 shows an all-digital connection. In this case, the echo is due to
acoustic or electrical coupling of the two voice paths at the terminal. The echo return loss is called
weighted terminal coupling loss or TCLw. TCLw is leakage in the analog portion of the digital set,
i.e., the analog circuits, capacitive coupling in the handset cord, mechanical coupling from the
receiver to the transmitter in the handset or acoustical coupling from the receiver to the transmitter in
the handset.
TELR is the sum of the losses around the loop, from one set’s transmitter back to the receiver on the
same set. The SLR (send loudness rating) and RLR (receive loudness rating) are the loss values for
the same telephone. In Figure 5, loss pads in the upper and lower paths control the loss plan. These
pads may add gain, which increases the echo, they may add loss, which reduces the echo or they may
be neutral (0 dB) and have no affect on the echo. The loss plan for an all-digital connection is
determined by the loudness ratings of the telephones and there are no additional losses in the network,
allowing “clear channel” transmission. The loss plan for an analog or mixed analog/digital connection
is a fine balance between providing enough loss to attenuate the echo and maintain circuit stability,
while still being audible over a range of analog loops.
Back to Figure 4. Note that as TELR is reduced, the amount of end-to-end delay available to the
connection for a given performance quality objective on the R scale is also reduced. The nominal
loudness ratings for digital telephones are SLR = 8 dB and RLR = 2 dB. TELR has to be about 65 dB
to completely remove echo, so TCLw has to be:
TCLw = TELR – SLR – RLR = 65 – 8 – 2 = 55 dB.
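The loop sum that defines TELR, and the TCLw calculation above, can be written as a small sketch. The function and parameter names are hypothetical; the numeric values in the example are the nominal digital telephone loudness ratings and the TCLw figures quoted in this section, with the network loss pads assumed to be 0 dB.

```python
def telr(slr: float, rlr: float, echo_loss: float,
         loss_outbound: float = 0.0, loss_return: float = 0.0) -> float:
    """Talker echo loudness rating in dB: the sum of the losses around the echo loop.

    echo_loss is the far-end ERL (hybrid) or TCLw (digital set); the two
    loss arguments are the network loss pads in each direction of transmission.
    """
    return slr + loss_outbound + echo_loss + loss_return + rlr


def required_tclw(target_telr: float, slr: float = 8.0, rlr: float = 2.0) -> float:
    """TCLw in dB needed to reach a target TELR on an all-digital, 0 dB loss connection."""
    return target_telr - slr - rlr


if __name__ == "__main__":
    print(required_tclw(65.0))                    # TCLw needed for TELR = 65 dB
    for tclw in (45.0, 52.0, 55.0):
        print(tclw, telr(slr=8.0, rlr=2.0, echo_loss=tclw))
```

Since the loudness ratings of a digital set are fixed at nominal values, TCLw is the only term in the loop that the terminal designer can use to raise TELR on an all-digital connection.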
The ERL standard for echo cancellers (ECANs), ITU-T Recommendation G.168, specifies ERL >=
55 dB of echo path loss for ECANs in gateways, but the standard for digital sets, TIA-810-A,
specifies TCLw >= 45 dB for ISDN and digital proprietary PBX telephones and TCLw >= 52 dB for
IP telephones. The curve for TELR = 55 dB (45 + 8 + 2) shows that this is a good requirement for low
delay connections like local ISDN and digital proprietary PBX telephones, but it is not adequate for
IP telephones. Clearly, IP telephones need to have TCLw >= 55 dB for minimum echo return, just
like ECANs, because they work in a long delay environment. Due to the difficulty in meeting TCLw
>= 55 dB, TIA-810-A specifies a compromise value of >= 52 dB, plus a desirable value of >= 55 dB.
Figure 4 is useful for understanding the implications of double-talk on the performance of ECANs. In
single-talk mode, i.e., when one person is talking and the other is silent, the convolution processor
part of the echo canceller provides about 18 dB of echo return loss enhancement (ERLE), in addition
to the typical analog telephone ERL of about 12 dB. The nonlinear processor (NLP) provides an
additional loss of at least 25 dB, for a total of 55 dB. When both people start talking at once, the NLP
drops out leaving the connection with only 30 dB of loss. So in double-talk mode, the echo
performance drops from the TELR = 65 dB curve to the TELR = 40 dB curve, with a significant drop
in R. Since there is no conclusive evidence that anyone is listening when both people are talking, the
point may well be moot.
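The single-talk and double-talk arithmetic above can be checked with a brief sketch. The function names are hypothetical; the default values are the typical figures quoted in the preceding paragraph, and the loudness ratings are the nominal digital telephone values of SLR = 8 dB and RLR = 2 dB.

```python
def ecan_echo_path_loss(erl: float = 12.0, erle: float = 18.0,
                        nlp_loss: float = 25.0, double_talk: bool = False) -> float:
    """Total echo path loss in dB; the NLP contribution drops out during double-talk."""
    return erl + erle + (0.0 if double_talk else nlp_loss)


def telr_with_ecan(double_talk: bool, slr: float = 8.0, rlr: float = 2.0) -> float:
    """TELR in dB for a connection whose echo is controlled by the echo canceller."""
    return slr + ecan_echo_path_loss(double_talk=double_talk) + rlr


if __name__ == "__main__":
    print("single-talk:", ecan_echo_path_loss(), "dB loss, TELR",
          telr_with_ecan(double_talk=False), "dB")
    print("double-talk:", ecan_echo_path_loss(double_talk=True), "dB loss, TELR",
          telr_with_ecan(double_talk=True), "dB")
```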
Figure 4 – E-Model Prediction of Echo Impairment
[Plot: transmission rating factor R (50 to 100) versus one-way delay (0 to 500 ms), with one curve each
for TELR = 65, 60, 55, 50 and 45 dB. The user satisfaction scale reads: Very satisfactory, Satisfactory,
Some users dissatisfied, Many users dissatisfied, Exceptional limiting case.]
Figure 5 – Echo Path for Analog Connection
[Block diagram: on each side (Side A and Side B), an OPS analog telephone (SLR = 11 dB, RLR = -3 dB)
connects through a hybrid (ERL = 14 dB) and a codec (A/D and D/A) in a digital PBX. The two digital
PBXs are linked by the digital PSTN. Loss pads of U dB are shown in the upper path and L dB in the
lower path. Arrows below the diagram mark Echo Path A and Echo Path B.]
Figure 6 – Echo Path for Digital Connection
[Diagram: Side A and Side B each show a digital telephone (SLR = 8 dB, RLR = 2 dB, TCLw = 45 dB) and a digital PBX, with U dB and L dB loss labels. The two digital PBXs are joined through the digital PSTN. Echo Path A and Echo Path B are marked.]
Figure 7 – Block Diagram and E-Model Parameters for Echo Impairment
[Block diagram: IP Telephone (Side A), IP Intranet (0 dBr point), IP Telephone (Side B), with Echo Path - Side A and Echo Path - Side B marked.]

Title                                   Abbrev. (Default)
Electric Circuit Noise (at 0 dBr)       Nc (-70)
Room Noise                              Po (35)
Send Loudness Rating                    SLR (8)
Receive Loudness Rating                 RLR (2)
D-factor                                D (3)
Noise Floor                             Nfor (-64)
Sidetone Masking Rating                 STMR (15)
Equipment Impairment Factor             Ie (0)
Expectation (Advantage) Factor          A (0)
Mean One-Way Delay (upper)              Tu (0)
Mean One-Way Delay (lower)              Tl (0)
Mean One-Way Delay (upper = lower)      Tul (0)
Electrical Loss (upper)                 Lu (0)
Electrical Loss (lower)                 Ll (0)
Electrical Loss (upper = lower)         Lul (0)
Quantizing Distortion Units (upper)     qduu (1)
Quantizing Distortion Units (lower)     qdul (1)
Echo Return Loss                        ERL

Values entered per column:
IP Set (Side A and Side B): Po = 35, SLR = 8, RLR = 2, D = 3, Nfor = -64, STMR = 15, qduu = qdul = 0.5, ERL = 35 to 55; all Ie, A, delay and electrical loss entries are 0.
0 dBr point (IP Intranet): Nc = -70, mean one-way delay = 0 to 500 ms.
Figure 7 shows the block diagram and E-Model parameters for echo impairment. It is the same as
Figure 3, except the parametric variable TELR ranges from 45 to 65 dB in 5 dB steps. Actually, it is
the TCLw parameter that ranges from 35 to 55 dB in 5 dB steps, but the generic term, ERL, is used in
the E-Model, as shown on the bottom line of Figure 7.
4.2.3. Speech Compression
A unique feature of the E-Model is its flexibility to deal with the impairments introduced by speech
compression and packet loss via the Equipment Impairment Factor (Ie). Ie values for several codecs
are listed in Rec. G.113, Appendix I and for convenience are reproduced here in Table 1. The list of
codecs and channel conditions included in G.113 is frequently updated, as are the Ie values. Before
using these Ie values in any E-Model calculations, the reader is advised to refer to the latest revision
of G.113, Appendix I. The Ie values in Table 1 were determined in subjective experiments with ideal
software implementations of the codecs; the performance provided by commercial codecs may vary.
As detailed in section 4.1, Ie is subtracted from R, reflecting the reduced listener satisfaction due
to distortion. Figure 8 illustrates the point by comparing the best-case curves for three popular IP
codecs, G.711, G.729A and G.723.1 (6.3 kbit/s). Notice that the codecs with speech compression
(G.729A and G.723.1) have larger Ie values and, therefore, can tolerate less one-way delay for a
given voice quality level (R).
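As a rough illustration of how Ie lowers the starting point of each curve, the sketch below subtracts the Table 1 planning values from the default rating of R = 94.15 that this document uses for Ie = 0 (G.107, 12/98). It assumes that every other parameter stays at its default and it ignores delay and echo; the dictionary and function names are hypothetical.

```python
# Default R for Ie = 0 in the G.107 (12/98) version used by this document.
R_DEFAULT = 94.15

# Planning values from Table 1 for the three codecs compared in Figure 8.
IE_PLANNING = {
    "G.711": 0,
    "G.729A + VAD": 11,
    "G.723.1 (6.3 kbit/s)": 15,
}


def r_at_zero_delay(ie, r_default=R_DEFAULT):
    """Best-case R when only the equipment impairment Ie is subtracted."""
    return r_default - ie


for codec, ie in IE_PLANNING.items():
    print(f"{codec}: R = {r_at_zero_delay(ie):.2f}")
```

Under these assumptions G.729A starts about 11 points and G.723.1 about 15 points below the G.711 curve, which leaves less room for delay before a given R threshold is crossed.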
Figure 9 shows the block diagram with Ie as the parametric variable and delay as the independent
variable. The jitter buffers and packetization modules are shown in the gateways without any delay
allotment. Instead all the delay is shown at the 0 dBr point. This is done only for convenience. When
modeling a real connection, the proper gateway delay would be entered in the gateway columns. Note
that the G.711 curve in Figure 8 is the same as the TELR = 65 dB reference curve in the previous
sections. NETEM determines the reflection point as the ERL value that is closest to the 0 dBr column
(55 dB in this case). This is only correct if the ECAN’s tail path capacity is long enough.
Table 1 – Planning Values for the Equipment Impairment Factor Ie

Codec Type    Reference               Operating Rate (kbit/s)   Ie Value

Waveform Codecs
PCM           G.711                   64                        0
ADPCM         G.726, G.727            40                        2
ADPCM         G.721, G.726, G.727     32                        7
ADPCM         G.726, G.727            24                        25
ADPCM         G.726, G.727            16                        50

Speech Compression Codecs
LD-CELP       G.728                   16                        7
LD-CELP       G.728                   12.8                      20
CS-ACELP      G.729                   8                         10
CS-ACELP      G.729-A + VAD           8                         11
VSELP         IS-54                   8                         20
ACELP         IS-641                  7.4                       10
QCELP         IS-96a                  8                         21
RCELP         IS-127                  8                         6
VSELP         Japanese PDC            6.7                       24
RPE-LTP       GSM 06.10, Full-rate    13                        20
VSELP         GSM 06.20, Half-rate    5.6                       23
ACELP         GSM 06.60, EFR          12.2                      5
ACELP         G.723.1                 5.3                       19
MP-MLQ        G.723.1                 6.3                       15
Figure 8 – Speech Compression Impairment
[Plot: R (50 to 100) versus one-way delay (0 to 500 ms), with one curve each for G.711, G.729A and G.723.1. User satisfaction bands on the R axis: Very satisfactory (90 to 100), Satisfactory (80 to 90), Some users dissatisfied (70 to 80), Many users dissatisfied (60 to 70), Exceptional limiting case (50 to 60).]
Figure 9 – Block Diagram and E-Model Parameters for Speech Compression Impairment
[Block diagram: Digital Telephone (Side A), Gateway (G.711 to G.7xx, with jitter buffer JB), IP Intranet (0 dBr point), Gateway (JB, G.7xx to G.711), Digital Telephone (Side B), with Echo Path - Side A and Echo Path - Side B marked.]

Title                                   Abbrev. (Default)
Electric Circuit Noise (at 0 dBr)       Nc (-70)
Room Noise                              Po (35)
Send Loudness Rating                    SLR (8)
Receive Loudness Rating                 RLR (2)
D-factor                                D (3)
Noise Floor                             Nfor (-64)
Sidetone Masking Rating                 STMR (15)
Equipment Impairment Factor             Ie (0)
Expectation (Advantage) Factor          A (0)
Mean One-Way Delay (upper)              Tu (0)
Mean One-Way Delay (lower)              Tl (0)
Mean One-Way Delay (upper = lower)      Tul (0)
Electrical Loss (upper)                 Lu
Electrical Loss (lower)                 Ll
Electrical Loss (upper = lower)         Lul
Quantizing Distortion Units (upper)     qduu (1)
Quantizing Distortion Units (lower)     qdul (1)
Echo Return Loss                        ERL

Values entered per column:
Digital Set (Side A and Side B): Po = 35, SLR = 8, RLR = 2, D = 3, Nfor = -64, STMR = 15, qduu = qdul = 0.5, ERL = 45; all Ie, A, delay and electrical loss entries are 0.
IP Gateway (Side A and Side B): Ie = 0, 11, 15; ERL = 55; all other entries are 0.
0 dBr point (IP Intranet): Nc = -70, mean one-way delay = 0 to 500 ms.
4.2.4. Packet Loss
The provisional Equipment Impairment Factors for G.711, G.729A, G.723.1 and GSM Enhanced Full
Rate (EFR) codecs under conditions of packet loss are listed in Table 2. There are three columns for
G.711, one without Packet Loss Concealment (PLC) and two with PLC. The two with PLC are
further subdivided into random and bursty packet loss conditions. For reference, the G.711 PLC
algorithms are specified in ANSI T1.521. Their performance is similar. Annex A is the algorithm that
was used for the bursty column and Annex B was used for the random column.
The plot of Ie values vs packet loss in Figure 10, for the codecs in Table 2, shows the effectiveness of
the PLC algorithms. Ideal performance would be a horizontal line along the x-axis, indicating no
increase in impairment as packet loss increases. The steep, nearly vertical dark blue curve for G.711 without PLC
is almost the opposite of the ideal. The difference between the two G.711 curves with PLC (the green
and red curves) probably reflects both the differences in the algorithms (Annex A vs Annex B of
ANSI T1.521, respectively) and the effects of the test conditions (bursty vs random, respectively).
Another way to describe packet loss visually is to plot the family of curves for a given codec.
Figure 11 does so for GSM EFR. As the packet loss increases from 0 to 5%, the Ie value increases
from 5 to 33 and the available one-way delay drops from about 380 ms to about 35 ms at R = 60. For
reference, the G.711 default curve is included as well. Note that the encoder-decoder delay is about
100 ms for GSM. So really, the R axis shifts over to the 200 ms mark for this scenario, before
considering any other sources of delay. Also note (in Figure 12) that the ERL is set to the ideal for
this simple example. Actual GSM voice quality is much lower than it appears in Figure 11. Refer to
Section 6.2.3 for more packet loss examples. Figure 12 shows the scenario block diagram with Ie as
the parametric variable and with all the delay again concentrated in the 0 dBr column, rather than
distributed appropriately. To avoid transcoding issues in this example, the transport between the base
stations uses transcoder free operation (TrFO), meaning the GSM bits are transported over the packet
network without being converted to and from G.711.
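The Ie values in Table 2 are given only at discrete loss rates. A planner who needs an intermediate value might interpolate linearly between the tabulated points, as in the sketch below for GSM 06.60 EFR. Linear interpolation is an assumption made here for illustration, not something specified by G.113 or by this TSB, and the function name is hypothetical.

```python
from bisect import bisect_right

# (packet loss %, Ie) pairs for GSM 06.60 EFR, copied from Table 2.
GSM_EFR_IE = [(0, 5), (1, 16), (2, 21), (3, 26), (5, 33)]


def ie_for_loss(loss_pct, table=GSM_EFR_IE):
    """Linearly interpolate Ie between the tabulated packet loss points."""
    losses = [p for p, _ in table]
    if not losses[0] <= loss_pct <= losses[-1]:
        raise ValueError("packet loss outside the tabulated range")
    i = bisect_right(losses, loss_pct)
    if i == len(table):  # exactly the last tabulated point
        return float(table[-1][1])
    (x0, y0), (x1, y1) = table[i - 1], table[i]
    return y0 + (y1 - y0) * (loss_pct - x0) / (x1 - x0)


print(ie_for_loss(0))  # 5.0
print(ie_for_loss(4))  # 29.5 (midway between the 3% and 5% entries)
print(ie_for_loss(5))  # 33.0
```

The function refuses to extrapolate beyond 5% because Table 2 gives no GSM EFR values there.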
Table 2 – Provisional Planning Values for the Equipment Impairment Factor Ie under
Conditions of Packet Loss for Codecs G.711, G.729A + VAD, G.723.1 + VAD and GSM EFR

Columns:
(a) G.711 without PLC (10 ms speech packet length)
(b) G.711 + PLC, Random Packet Loss (10 ms speech packet length)
(c) G.711 + PLC, Bursty Packet Loss (10 ms speech packet length)
(d) G.729A + VAD, 8 kbit/s (2 speech frames/packet)
(e) G.723.1 + VAD, 6.3 kbit/s (1 speech frame/packet)
(f) GSM 06.60 EFR, 12.2 kbit/s (1 speech frame/packet)

Packet Loss (%)   (a)   (b)   (c)   (d)   (e)   (f)
0                 0     0     0     11    15    5
0.5               –     –     –     11    15    –
1                 25    5     5     15    19    16
1.5               –     –     –     17    22    –
2                 35    7     7     19    24    21
3                 45    10    10    23    27    26
4                 –     –     –     26    32    –
5                 55    15    30    –     –     33
7                 –     20    35    –     –     –
8                 –     –     –     36    41    –
10                –     25    40    –     –     –
15                –     35    45    –     –     –
16                –     –     –     49    55    –
20                –     45    50    –     –     –
Figure 10 – Provisional Planning Values for the Equipment Impairment Factor Ie under
Conditions of Packet Loss for Codecs G.711, G.729A + VAD and G.723.1 + VAD
[Plot: Ie (0 to 60) versus packet loss (0 to 25%), with one curve each for G.711 without PLC, G.711 with PLC (random packet loss), G.711 with PLC (bursty packet loss), G.729A + VAD, G.723.1 + VAD and GSM 06.60 EFR.]
Figure 11 – GSM 06.60 Enhanced Full Rate Packet Loss Performance
relative to G.711 without Packet Loss
[Plot: R (50 to 100) versus one-way delay (0 to 500 ms), with a G.711 reference curve at PL = 0% and GSM EFR curves at PL = 0%, 1%, 2%, 3% and 5%. PL = Packet Loss. User satisfaction bands on the R axis: Very satisfactory (90 to 100), Satisfactory (80 to 90), Some users dissatisfied (70 to 80), Many users dissatisfied (60 to 70), Exceptional limiting case (50 to 60).]
Figure 12 – Block Diagram and E-Model Parameters for Packet Loss Impairment
4.3. What Does R Sound Like?
Table 3 – Descriptions of the Sound Characteristics caused by IP Telephony Impairments

Description: Convergence echo*
Cause: A brief blast of echo at the start of a call, before the ECAN converges, or following changes to the echo path (e.g., transferred calls or switching in conference bridges).

Description: Double-talk echo*
Cause: ECANs disable the NLP when both people talk simultaneously, leaving only the low ERLE of the convolution processor. When delays are long, the residual echo may be audible.

Description: After double-talk echo*
Cause: Echo caused by double-talk, but arriving after double-talk is finished due to network delay (see next).

Description: Conversation protocol issues, like turn-taking, over-talking, break-in and who’s-in-charge problems
Cause: End-to-end delay (speech coding + packetization + jitter compensation + network routing + propagation) is too high. Because of the loss of simultaneity, the parties may perceive each other as inattentive, insincere, or rude. This will increase with increased delay, until turn-taking cues break down completely.

Description: Whirlybird distortion, or waterfall effect
Cause: CELP-based compression coding algorithms.

Description: Speech clipping at beginning and/or end of phrases or words
Cause: VAD (voice activity detector) not switching quickly enough, ECAN NLP not switching quickly enough, or VAD and ECAN interfering with each other.

Description: Background noise contrast in silence periods*
Cause: No comfort noise generator (CNG), or the CNG is not properly matching the background noise in the sending end (CNG is using stationary noise rather than modeling the actual noise, or the noise model is inadequate).

Description: Background noise transition contrast*
Cause: Comfort noise generator switches in too slowly, hang-time on VAD too long.

Description: Noise pumping*
Cause: Background noise is triggering the VAD, lack of comfort noise generator or comfort noise level does not match the actual background noise level.

Description: Dropouts/chopping/clipping
Cause: Lack of signal caused by packet loss or a problem with VAD.

Description: Clicks, pops
Cause: Packet loss with waveform codecs operating without packet loss concealment.

Description: Tonal or mechanical artifacts riding on the voice
Cause: Side effects of the packet loss concealment algorithm.

Description: Low level tones*
Cause: Created intentionally by decoders during long bursts of packet loss.

* These impairments are not modeled by the E-Model.
Now that we are confident that the E-Model is a suitable tool for IP telephony, it is time to consider
what R values are possible and what these values sound like. First of all, it is important to appreciate
that any particular R may be reached by multiple combinations of impairments. Therefore, a given R
can have many different sound characteristics. Some are listening characteristics and some are
conversation characteristics. These characteristics were discussed in general terms in Section 4.2 and
will be discussed further in Section 6. Table 3 is a glossary of many of the sounds that IP telephony
users will experience, along with the potential causes.
Determining what values of R are possible is one of the objectives of this TSB, and it is a much
more difficult question than can be answered in this section. It will take most of Section 6 to flesh out
the answer. Figures 8 and 11 give some hints about how the R-axis works, as each codec has a family
of curves that simply shift the reference curve down to lower starting points on the R-axis, due to
increased noise and distortion.
The delay axis is a bit more complicated. Delay can be partitioned as: speech coding/packetization +
jitter compensation + transport + propagation. Section 6 will provide the necessary delay details, but
for now it is sufficient to say that unlike the PSTN, the region below 100 ms is not well used by IP
telephony. The concept of regions will be explored further in Section 5. Although Section 6 will
calculate specific R values for specific scenarios, it is important to remember that the quality of IP
transmission depends on dynamic impairments (rather than the static impairments associated with
TDM transmission). Therefore, the overall quality of IP Telephony is probabilistic in nature, and
must be characterized statistically.
4.4. E-Model Symmetry and Performance
This document assumes symmetry between Side A and Side B that probably does not exist in
practice. For instance, packet loss, delay and loudness ratings may be asymmetrical. Also, because the
Ie values are based on ideal implementations of ITU-T codecs, the performance predicted by the
E-Model may be optimistic. The performance of real codecs may be worse due to implementation
constraints. Also, not all jitter buffers behave alike. Many jitter buffers simply discard packets under
overload conditions. Smart buffers wait for silence periods before discarding packets. The difference
is audible. Delay variation over time may also be audible in some cases.
4.5. E-Model Enhancements
The E-Model has gained worldwide acceptance because it is a unique model based on several
previously existing transmission quality models and the results of many subjective experiments
conducted over the last fifty years. One of the E-Model’s limitations is that it is currently only
applicable for narrowband handset operation. Obvious enhancements are to add headset, handsfree
and wideband functionality. Further work under Q. 20/12 in ITU-T Study Group 12 plans to include
headset and handsfree operation, but currently there is no support for developing a planning model
for wideband audio.
Work has started in ITU-T Study Group 12 on a new Recommendation (P.833) to provide a detailed
methodology for determining equipment impairment factors for use in G.107, from the results of
subjective tests. The current methodology used for determining equipment impairment factors is not
well documented and could be improved. As an alternative to the use of subjective tests, work is also
planned within ITU-T Study Group 12 to develop methods for determining equipment impairment
factors from objective measurements, based on the new Recommendation P.862, Perceptual
Evaluation of Speech Quality (PESQ), an objective method for end-to-end speech quality assessment
of narrowband telephone networks and speech codecs.
The latest revision of ITU-T Recommendation G.107 (05/00) changes the supplementary amount of
equivalent circuit noise, Nos, in the E-Model to include the Lombard effect. The Lombard effect
accounts for the behavior of a talker in a noisy environment to raise his voice only half the amount
necessary to maintain the same signal-to-noise ratio as in a quiet environment. Acknowledging this
behavior requires a change to a constant in the Nos equation from 0.008 to 0.004. This change affects
the default R for Ie = 0, changing it from R = 94.15 to R = 93.19. Since this change occurred during
the preparation of this document, the (05/00) version of the E-Model is not used here; instead, the
(12/98) version of the E-Model (sometimes known as VTQME19), with default R = 94.15, is used.
Also changing in ITU-T Recommendation G.107 (05/00) is the relationship between R and the
number of qdus. In G.107 (12/98), R was constant for values of qdu below about 5, but the constant is
being removed in G.107 (05/00). The slope remains the same, but it starts at qdu = 1 instead of about
qdu = 5. This information is explained in G.113 (2001).
Europe has traditionally preferred about 6 dB louder sidetone than North America. Europe has
favored a nominal STMR of about 12 dB and North America has favored a quieter nominal STMR of
about 18 dB. Currently, the E-Model incorrectly penalizes STMRs quieter than 15 dB. TIA-810-A
specifies STMR = 18 dB +/- 6 dB. An effort will be made to change the 15 dB threshold to 21 dB in a
future revision of G.107 to better accommodate North American sidetone preferences.
4.6. E-Model Conventions
To obtain consistent answers, it is necessary to agree on certain conventions. These are listed below.
• Ta = T
• Tr = 2T
• If actual delay is asymmetrical (Tact side A to side B ≠ Tact side B to side A), then take the average of the two delays (Tavg side A to side B = Tavg side B to side A = [Tact side A to side B + Tact side B to side A]/2). This is appropriate because the subjective perception of delay depends on the round trip delay, not on either of the one-way delays.
• QDU minimum = 1 for all digital connections including IP telephones and gateways
• The E-Model is a relative tool rather than an absolute tool.
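The delay conventions above can be sketched as a small helper. The function name and the example delays of 80 ms and 120 ms are hypothetical; the only rules applied are the ones listed: average the two one-way delays to get T, then set Ta = T and Tr = 2T.

```python
def e_model_delays(t_a_to_b_ms, t_b_to_a_ms):
    """Apply the delay conventions: average the one-way delays, Ta = T, Tr = 2T."""
    t = (t_a_to_b_ms + t_b_to_a_ms) / 2  # average of the two one-way delays
    return {"T": t, "Ta": t, "Tr": 2 * t}


# Hypothetical asymmetric connection: 80 ms one way, 120 ms the other.
print(e_model_delays(80, 120))  # {'T': 100.0, 'Ta': 100.0, 'Tr': 200.0}
```

Note that Tr equals the sum of the two actual one-way delays, which is consistent with the statement that perception depends on the round trip delay.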
5. Wireline PSTN Voice Quality Benchmarks
The PSTN is a well-developed network with acceptable voice quality. Therefore, it is logical to
compare the performance of IP telephony with the benchmarks established by the PSTN. Since
everyone has used the PSTN, everyone has a mental model of the quality it provides. In this section,
we will develop the corresponding E-Model and R values associated with PSTN performance. In this
way, we can establish solid objective benchmarks for comparing IP telephony to the PSTN. The
resulting benchmarks will be compared with the IP telephony scenarios in the next section.
Three representative PSTN benchmarks will be illustrated: ISDN voice quality (G.711 end-to-end),
PSTN voice quality (analog telephones with nominal length analog loop at both ends and G.711
digital switching/trunking) and Toll Compression voice quality (similar to the PSTN case, using
G.726 at 32 kbit/s in the digital portion). These benchmarks will then be summarized as the “existing
PSTN” region in the delay vs. R space described above.
ITU-T Recommendation G.114 provides guidance on propagation delay. For fiber optic national circuits
the formula for propagation plus equipment delay is:
National delay (ms) = 3 ms + (0.005 ms/km × distance in km)
Note: All delays in G.114 and in this TSB are one-way.
The 3 ms constant term makes allowance for one PCM coder/decoder pair and for five digitally
switched exchanges. In addition to propagation, the 0.005 ms/km factor accounts for the delay in
repeaters and regenerators. The maximum national distance is about 6000 km, which equates to a
maximum national delay of 33 ms. In practice, there would typically be much less delay due to
propagation and more delay associated with equipment like PBXs, compression codecs and
multiplexers. For reference, as the crow flies, the distance/delay (includes 3 ms constant) between St.
John's, Newfoundland and Victoria, British Columbia is about 5000 km/28 ms and between New
York and San Francisco is about 4200 km/24 ms.
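The national delay figures quoted above follow directly from the formula, as this minimal sketch shows. The function name is an assumption; the constants and distances are the ones given in the text.

```python
def national_delay_ms(distance_km):
    """One-way national fiber delay: 3 ms constant plus 0.005 ms per km."""
    return 3 + 0.005 * distance_km


# Maximum national distance, St. John's to Victoria, New York to San Francisco.
for km in (6000, 5000, 4200):
    print(f"{km} km: {national_delay_ms(km):.0f} ms")
```

This reproduces the 33 ms, 28 ms and 24 ms values in the paragraph.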
For international circuits, the following are the relevant guidelines from G.114:
International submarine fiber optic delay (ms) = 13 ms + 10 ms + (0.005 ms/km × distance in km)
International submarine coaxial cable system delay (ms) = 30 ms + (0.006 ms/km × distance in km)
The 13 ms constant accounts for the transmitter delay and the 10 ms accounts for the receiver delay.
The 30 ms constant accounts for the total one-way digital circuit multiplication equipment (DCME)
delay, using a G.726-32 codec and digital speech interpolation. The submarine cable route could be
significantly longer than the way the crow flies, but here are some examples for reference. The
distance/delay (including the constants) between London, England and New York is about 5600 km/
51 ms (fiber) or 64 ms (coax) and between San Francisco and Hong Kong is about 11000 km/ 78 ms
(fiber) or 96 ms (coax).
Building on the above examples, a more complicated scenario would consist of one national and two
international sections. For example, London and Hong Kong are only about 10,000 km apart (via
Europe and Asia), but an international call might be routed the long way around through the United
States for a minimum of 20,800 km (London to New York to San Francisco to Hong Kong). The
objective here is to give a feel for the amount of worst-case PSTN impairment in a hypothetical
reference connection, not to represent typical connection routing. The fiber route is assumed to be
G.711 all the way. It has an Ie = 0 and a total one-way delay of 153 ms.
Assuming that the national connection uses G.711 and that the tandeming is asynchronous (see Section
6.2.4), the coax route has two DCME impairments, each with an Ie of 7, for a total Ie of 14. If the
tandeming is instead synchronous, the Ie can be reduced to 7. The total one-way delay of the coax route is 184 ms.
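The totals for this hypothetical route can be reproduced by summing the three legs with the G.114 formulas quoted earlier. The sketch assumes the New York to San Francisco leg is a national fiber circuit in both cases and uses the as-the-crow-flies distances from the examples; the function and constant names are illustrative.

```python
def national_fiber_ms(km):
    return 3 + 0.005 * km


def submarine_fiber_ms(km):
    return 13 + 10 + 0.005 * km  # transmitter + receiver constants


def submarine_coax_ms(km):
    return 30 + 0.006 * km  # DCME constant


LONDON_NY_KM, NY_SF_KM, SF_HK_KM = 5600, 4200, 11000

fiber_total = (submarine_fiber_ms(LONDON_NY_KM)
               + national_fiber_ms(NY_SF_KM)
               + submarine_fiber_ms(SF_HK_KM))
coax_total = (submarine_coax_ms(LONDON_NY_KM)
              + national_fiber_ms(NY_SF_KM)
              + submarine_coax_ms(SF_HK_KM))

print(LONDON_NY_KM + NY_SF_KM + SF_HK_KM)  # 20800
print(f"fiber: {fiber_total:.0f} ms")      # fiber: 153 ms
print(f"coax: {coax_total:.0f} ms")        # coax: 184 ms
```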
Worst-case scenarios like this do not reflect the typical user experience, particularly as it relates to
delay. We typically experience delays of less than 50 ms with good echo control (most national and
approximately 50% of international calls, depending on country of origin). In fact, as the following
examples will show, the loss plan for analog sets introduces much of the impairment we experience.
Therefore, the next sections detail three simple PSTN benchmarks that highlight the typical user
experience rather than three complicated hypothetical reference connections that mask the message in
detail. The worst-case delay of 100 ms for the PSTN benchmarks was selected as a compromise
between the typical actual delays of less than 50 ms and worst-case hypothetical reference connection
delays of about 200 ms. One can think of 100 ms as approximately one national and one international
wireline connection.
5.1. ISDN Voice Quality
ISDN voice quality is obtained on all-digital TDM connections with the following characteristics:
• G.711 only
• Echo control in the telephones (45 dB >= TCLw >= 40 dB)
• Nominal loudness ratings of SLR = 8 dB and RLR = 2 dB and STMR = 15 dB (note sidetone discussion in section 4.5)
• One-way delay = 100 ms maximum
Figure 13 – ISDN Quality Voice Benchmark
Figure 13 graphically illustrates ISDN quality with delay from 0 to 100 ms as the independent
variable and TELR of 50, 55 and 60 dB as the parametric variable. This corresponds to TCLw values
of 40, 45 and 50 dB, respectively. The rationale for using these values is that TIA-810-A increased
the TCLw requirements by 5 dB and now specifies a nominal TCLw of 45 dB and a desirable TCLw
of 50 dB (the yellow region). Previous digital set standards specified a nominal TCLw of 40 dB and a
desirable TCLw of 45 dB (the gray region).
The maximum delay for a national connection is about 28 ms @ 5000 km. As shown in Figure 13, the
typical digital set TCLw values, at this distance, do not have a significant effect on R: the voice
quality remains in the “very satisfactory” category. However, as the delay increases, the
TCLw = 40 dB curve drops rapidly to R = 80.
The existing PSTN is defined by the green region between the reference curve (TCLw = 55 dB) and
R = 80 and between 0 and 100 ms. Figure 14 shows the connection diagram and the E-Model
parameters for the ISDN quality scenario.
Figure 14 – Block Diagram and E-Model Parameters for ISDN Voice Quality Benchmark
[Block diagram: ISDN Terminal (Side A), 0 dBr point, ISDN Terminal (Side B), with Echo Path - Side A and Echo Path - Side B marked.]

Title                                   Abbrev. (Default)
Electric Circuit Noise (at 0 dBr)       Nc (-70)
Room Noise                              Po (35)
Send Loudness Rating                    SLR (8)
Receive Loudness Rating                 RLR (2)
D-factor                                D (3)
Noise Floor                             Nfor (-64)
Sidetone Masking Rating                 STMR (15)
Equipment Impairment Factor             Ie (0)
Expectation (Advantage) Factor          A (0)
Mean One-Way Delay (upper)              Tu (0)
Mean One-Way Delay (lower)              Tl (0)
Mean One-Way Delay (upper = lower)      Tul (0)
Electrical Loss (upper)                 Lu
Electrical Loss (lower)                 Ll
Electrical Loss (upper = lower)         Lul
Quantizing Distortion Units (upper)     qduu (1)
Quantizing Distortion Units (lower)     qdul (1)
Echo Return Loss                        ERL

Values entered per column:
ISDN Terminal (Side A and Side B): Po = 35, SLR = 8, RLR = 2, D = 3, Nfor = -64, STMR = 15, qduu = qdul = 0.5, ERL = 55, 45, 40; all Ie, A, delay and electrical loss entries are 0.
0 dBr point: Nc = -70, mean one-way delay = 0 to 100 ms.
5.2. PSTN Voice Quality
Mixed analog and digital (TDM) connections, without speech compression, have PSTN voice quality
and they have the following characteristics:
• G.711 only
• No echo control below 10 ms (ERL = 11 dB + 6 dB Rx loss = 17 dB)
• Echo control enabled at 10 ms (ERL = 55 dB)
• Nominal loudness ratings of SLR = 11 dB and RLR = -3 dB and STMR = 15 dB (note sidetone discussion in section 4.5)
• One-way delay = 100 ms maximum
ITU-T Recommendation G.131 provides guidance on when to enable the ECAN, but each
administration enables ECANs according to its own rules. Some administrations always have the
ECANs enabled, while others wait until 22 ms. For this scenario, 10 ms was selected as a compromise
because it is the delay value where the blue (TELR = 31 dB) curve intercepts the R = 80 contour.
The much lower echo control of the PSTN network before ECANs are enabled accounts for the very
rapid degradation of the quality vs delay as shown in Figure 15. Once the ECAN is enabled, at 10 ms,
there is an abrupt improvement in the voice quality due to the ECAN’s significant improvement in
ERL. The red curve (with the ECAN enabled) uses analog telephone loudness ratings, with G.711
codecs in the digital segment. Compare it to the red curve in Figure 17, which uses G.726 in the
digital segment with an Ie of 7.
Figure 16 shows the connection diagram and the E-Model parameters for the PSTN quality scenario.
Figure 15 – PSTN Voice Quality Benchmark
Figure 16 – Block Diagram and E-Model Parameters for PSTN Voice Quality Benchmark
5.3. Toll Compression Voice Quality
All-digital or mixed analog/digital connections, with speech compression, have Toll Compression
voice quality and they have the following characteristics:
• G.711 and G.726 @ 32 kbit/s (Ie = 7)
• Echo control enabled (ERL = 55 dB)
• Nominal analog loudness ratings of SLR = 11 dB and RLR = -3 dB and STMR = 15 dB (note sidetone discussion in section 4.5)
• Nominal digital loudness ratings of SLR = 8 dB and RLR = 2 dB and STMR = 15 dB (note sidetone discussion in section 4.5)
• One-way delay = 100 ms maximum (of which the G.726/DCME accounts for 30 ms)
The gray region in Figure 17 defines toll-compression quality. It is bounded by the red curve (mixed
analog/digital connection with speech compression shown in Figure 18) and the blue curve (an
all-digital connection with speech compression and ECANs enabled, but no hybrids). The main
impairment is due to the reduced voice quality of the G.726 codec, but the 30 ms delay that the
DCME introduces is also a significant portion of the 100 ms budget.
Figure 17 – Toll Compression Voice Quality Benchmark
Figure 18 – Block Diagram and E-Model Parameters for Toll Compression Voice Quality Benchmark
5.4. Wireline PSTN Voice Quality Summary
The wireline PSTN is characterized by:
• Analog telephones with low delay, good loudness ratings and poor echo control.
• Digital telephones with low delay, good loudness ratings and good echo control.
• Digital networks with low delay, low impairments and good echo control.
It is summarized by the green, “Existing PSTN” region in Figure 19 and it is bounded by the best
G.711 performance on the top, R = 80 on the bottom and delay between 0 and 100 ms. Most of the
100 ms delay “budget” is available for propagation delay.
Other technologies like IP and wireless introduce combinations of noise/distortion and delay that
reduce the voice quality relative to the existing PSTN. The objective of the next section is to illustrate
the impairments introduced by IP technology with the aid of the E-Model and to provide some design
guidance. Wireless technology introduces similar impairments and the combination of wireless and IP
exacerbates the problem. Careful planning must be done for these two high impairment technologies
to interoperate with a voice quality level approaching the existing PSTN.
Figure 19 – Existing PSTN Voice Quality
6. IP Telephony Voice Quality Analysis
Section 4 demonstrated the suitability of the E-Model as a tool for estimating the relative voice and
conversation quality of IP telephony. Section 5 defined the voice quality of the “Existing PSTN”. The
goal of this section is to define the region of voice quality that is acceptable for IP telephony and to
establish the recommendations needed to manage the impairments introduced by IP telephony.
Figure 1 showed the G.109 categories of speech transmission quality. These six categories give a
good feeling for the level of user satisfaction, but the resolution is too fine for the purposes of this
document. Instead, this document compresses the five categories above R = 50 into three categories:
“High”, “Medium” and “Low”. From the user’s perspective the categories blend together as shown in
Figure 20, but thresholds for the three categories have been defined to make them more useful for
objective work.
Taking the “Existing PSTN” voice quality region illustrated in Figure 19 as representative of “High”
voice quality, R = 80 defines the lower bound of the “High” category. Figure 19 shows that the
“Existing PSTN” only uses 100 ms of the maximum acceptable delay of 250 ms for R = 80 (the
intercept of the Ie = 0 curve with the R = 80 line). However, according to the E-Model, regardless of
the delay value, all points on the R = 80 line have the same impairment. Therefore, the region
bounded by the Ie = 0 contour and the R = 80 line constitutes the “High” category.
The intent of the “Medium” category is to define a region for a selection of compression codecs. The
upper limit of the “Medium” category is the same as the lower limit of the “High” category, R = 80.
To determine the lower boundary of this region, the effect of delay on the interactive nature of
conversation must be considered. Basing our choice on the contour curve of a popular IP codec,
G.729A + VAD (see Figure 25 in Section 6.2.1.1), and the same delay limit as the “High” category,
i.e., 250 ms, we obtain a lower limit for the “Medium” category of R = 70. This is also the threshold
between “Some Users Dissatisfied” and “Many Users Dissatisfied” in Figure 1, making it a suitable
choice. The “Medium” category is then bounded by R = 80, R = 70 and the Ie = 0 contour.
All connections below R = 70 will suffer from some combination of distortion and long delay. The
region between R = 50 and R = 70 encompasses the “Many Users Dissatisfied” and the “Nearly All
Users Dissatisfied” categories in Figure 1 and therefore deserves the title “Low” voice quality. The
“Low” category is then bounded by R = 70, R = 50 and the Ie = 0 contour.
Figure 21 and Figure 22 summarize the recommended IP Telephony voice quality categories.
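The category thresholds defined above can be expressed as a simple classifier, shown here together with an R-to-MOS conversion. The conversion formula is the one given in ITU-T G.107 and is not restated in this TSB; it is included only to show where the MOS values listed beside the R scale come from. Assigning a value exactly on a boundary (for example R = 80) to the higher category is an assumption of this sketch, and the function names are hypothetical.

```python
def voice_quality_category(r):
    """Map an R value to the IP telephony categories defined in this section."""
    if r >= 80:
        return "High"
    if r >= 70:
        return "Medium"
    if r >= 50:
        return "Low"
    return "Not Recommended"


def r_to_mos(r):
    """R-to-MOS conversion as given in ITU-T G.107."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1 + 0.035 * r + r * (r - 60) * (100 - r) * 7e-6


for r in (94, 80, 70, 60, 50):
    print(r, voice_quality_category(r), f"{r_to_mos(r):.1f}")
```

For R = 94, 80, 70, 60 and 50 the conversion gives MOS values of about 4.4, 4.0, 3.6, 3.1 and 2.6.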
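Figure 20 – IP Telephony Voice Quality Regions
[Chart: Voice Quality Continuum. The R scale runs from the G.107 default value (R = 94) down to 0, with the regions High (above R = 80), Medium (R = 70 to 80), Low (R = 50 to 70) and Not Recommended (below R = 50).]

R     MOS   %GOB   %POW
94    4.4   98.4   0.1
80    4.0   89.5   1.4
70    3.6   73.6   5.9
60    3.1   50.1   17.4
50    2.6   26.6   37.7
0     1.0   0      99.8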
Figure 21 – Recommended IP Telephony Voice Quality Categories
Recommended IP Telephony Voice Quality Categories

R     MOS    %GOB    %POW
94    4.4    98.4     0.1    (G.107 default value)
80    4.0    89.5     1.4
70    3.6    73.6     5.9
60    3.1    50.1    17.4
50    2.6    26.6    37.7
0     1.0     0      99.8

Category           R range
High               80 to 94
Medium             70 to 80
Low                50 to 70
Not Recommended     0 to 50
Figure 22 – Recommended IP Telephony Voice Quality Categories
6.1. Voice Quality Issues for IP Telephony
The list of voice quality issues from Section 4.1 (delay, echo, speech compression and packet loss)
has been expanded in this section to cover some additional issues:
• Delay
• Speech compression
• Packet loss
• G.711 packet loss concealment (PLC)
• Transcoding
• New gateway loss plan
• Echo cancellers (ECAN)
• TCLw
G.711 PLC, transcoding and loss plans have been added to the list. The first two are very important
issues for IP telephony and are covered in some detail. Loss plan details, however, are documented in
TSB122-A. They are mentioned here only because this entire document assumes that the loss plan is
implemented correctly.
Echo has been divided into two categories: echo cancellers and TCLw. However, they were covered
in sufficient detail in Sections 4 and 5, so they are not explicitly detailed in this section. In summary,
for long-delay environments like IP telephony, echo cancellers and IP telephones must provide at least
55 dB of ERL or 52 dB of TCLw, respectively. These requirements are documented in the
appropriate standards and no further voice quality recommendations are required.
6.2. Voice Quality Recommendations for IP Telephony
6.2.1. Delay
Delay Rec. #1: Use G.711 end-to-end because it has the lowest Ie-value and therefore it allows more
delay for a given voice quality level.
Delay Rec. #2: Minimize the speech frame size and the number of speech frames per packet.
Delay Rec. #3: Actively minimize jitter buffer delay.
Delay Rec. #4: Actively minimize one-way delay.
Delay Rec. #5: Accept the E-Model results, which permit longer delays for low Ie-value codecs, like
G.711, for a given R-value; see Figure 22 and Figure 27.
Delay Rec. #6: Use priority scheduling for voice-class traffic, as well as RTP header compression
and data packet fragmentation on slow-speed links to minimize the contribution of
this variable delay source.
Delay Rec. #7: Avoid using slow serial links.
While ITU-T Recommendation G.114 is the definitive document on delay, this section provides some
value-added information without reiteration. In particular, the emphasis of this TSB is to encourage
the use of the E-Model in making design decisions. This provides more flexibility than simple one-way
transmission time limits like 150 ms or 400 ms. One-way delay has three components:
• Encoding/decoding/packetization + jitter buffer delay (delay variation),
• Transport delay,
• Propagation delay.
In addition to the information provided in this section, Annex A (VoIP End-to-End Delay Budget
Planning for Private Networks) offers an analysis of the delay aspects of two scenarios.
6.2.1.1. Packetization and Jitter Buffer Delay
Packetization delay in a codec/vocoder is comprised of several components. The delay on the encoder
side (the send side) consists of: the time taken to accumulate speech samples into a speech frame; the
time required to compress the speech frame, if needed, for the purpose of bandwidth reduction; the
time to insert the speech frame(s) into a packet and transfer the packet to the transport facility; and the
firmware/hardware delays. In addition, some vocoders use a look-ahead function, as part of the
compression process, which waits for the first part of the following speech frame to provide
information on how to help reconstruct the speech sample if there are any lost packets.
The packetization delay on the decoder side (the receive side) consists of: the time taken to
decompress the speech frame(s) into speech samples and the firmware/hardware delays. In addition,
some codecs have an add-on packet loss concealment algorithm that adds some delay.
Also, on the decoder side is the jitter buffer, which introduces delay to compensate for the variation in
arrival time of sequential packets from the transport facility. Some documents refer to this as delay
variation.
G.114 has a thorough analysis of planning guidelines for packetization delay, for the case where the
compression/encoding process fully utilizes the power of the processor. G.114 provides the following
formulas for calculating the minimum and maximum codec-related processing delay:
Minimum packetization delay for high-speed connections = (N+1) x frame size + D (ms)    {1}
Maximum packetization delay for low-speed connections = (2N+1) x frame size + D (ms)    {2}
Where: N = number of speech frames per packet; speech frame size is in ms;
D = look-ahead, PLC, and additional firmware/hardware delay (where applicable) in ms.
For purposes of this document, a high-speed connection is defined as one in which the time taken to
transfer the speech packets to the transport facility is insignificant with respect to the length of the
speech frame size. A low-speed connection is defined as one in which the transfer time is equal to the
length of the speech frame size (to maintain real-time communication, the transfer time cannot exceed
the speech frame size). The ‘2’ in the ‘2N’ part of equations {2} and {4} consists of 1N for
compression (same as equations {1} and {3}) and 1N for the transfer time, i.e., (1 + 1) N = 2N.
Equations {1} and {2} indicate that the delay due to compression (the ‘+ 1’ term) need only be
considered once when calculating the total packetization delay. This is true because while the
processor is executing the compression algorithm on a speech frame, the next speech frame is
concurrently being accumulated. To operate in real time, the compression execution must be finished
before the next speech frame is ready. Therefore, a fully utilized processor takes the duration of one
speech frame to complete the compression algorithm. Also possible, but not documented in G.114, is
the situation where there is sufficient processor power to compress the speech frame almost
instantaneously or when the G.711 codec, which does not require any compression, is used. Equations
{3} and {4} illustrate the sufficient processor power case by removing the ‘+ 1’ term.
Minimum packetization delay for high-speed connections = N x frame size + D (ms)    {3}
Maximum packetization delay for low-speed connections = 2N x frame size + D (ms)    {4}
Where: N = number of speech frames per packet; speech frame size is in ms;
D = look-ahead, PLC, and additional firmware/hardware delay (where applicable) in ms.
The theoretical minimum and maximum packetization delays, irrespective of processing power, are
then described by equations {3} and {2}, respectively. These equations define theoretical limits.
Actual delays for practical implementations will lie somewhere between these limits.
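Equations {1} to {4} differ only in whether the link is low-speed (2N instead of N) and whether the processor is fully utilized (the ‘+ 1’ term). The sketch below folds them into one function; the function and parameter names are illustrative, and the sample call assumes that D for G.729A consists of the 5 ms look-ahead only.

```python
def packetization_delay_ms(n_frames, frame_ms, d_ms, low_speed_link, full_cpu):
    """Packetization delay in ms following equations {1} to {4}.

    n_frames: speech frames per packet (N)
    frame_ms: speech frame size in ms
    d_ms:     look-ahead, PLC and firmware/hardware delay (D) in ms
    """
    multiplier = 2 * n_frames if low_speed_link else n_frames  # 2N or N
    if full_cpu:
        multiplier += 1  # the '+ 1' compression term of {1} and {2}
    return multiplier * frame_ms + d_ms


# G.729A with two 10 ms frames per packet, D assumed to be the 5 ms look-ahead.
for low_speed in (False, True):
    for full_cpu in (True, False):
        print(low_speed, full_cpu,
              packetization_delay_ms(2, 10, 5, low_speed, full_cpu))
```

For this example the theoretical limits are 25 ms (equation {3}) and 55 ms (equation {2}).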
Figure 23 shows a graphical representation of the packetization delay for the G.729A + VAD
connection illustrated in Figure 26. G.729A has a speech frame of 10 ms and look-ahead of 5 ms.
(G.711 does not have speech frames per se; it can be packetized in 1 ms increments.) There is always
a trade-off between header-to-payload efficiency and packetization delay, but clearly, as more speech
frames and larger speech frames are inserted into each packet, the packetization delay increases. This
translates into a lower voice quality level for a given amount of propagation delay (see Figure 25).
Due to processing power limitations, IP telephone jitter buffers are typically frame-based, meaning
that the size of the buffer is a multiple of speech-frame size. Frame-based jitter buffers can increase
delay dramatically if the frame size is large. A rule of thumb for frame-based buffers is that the jitter
buffer must be two times the speech-frame size. Gateways have more processing power, so they often
use absolute jitter buffers that are sized to the expected delay of the transport. The benefit in doing
this lies in the enhanced resolution of the jitter buffer. That is, to cancel 20 ms of jitter, it does not
matter whether an absolute jitter buffer or a frame-based jitter buffer with frame size 10 or 20 ms is
used. On the other hand, to cancel 21 ms jitter the absolute jitter buffer will introduce less delay
because it can be set to 23 ms, for example, (21 ms + some safety margin in increments determined
by the jitter buffer resolution) instead of 30 ms.
Frame-based jitter buffer delay = 2N x frame size (ms)    {5}
Absolute jitter buffer delay = actual end-to-end delay variation + margin (ms)    {6}
Where: N = number of speech frames per packet; speech frame size is in ms.
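The resolution argument (20 ms versus 21 ms of jitter) can be checked with a short sketch. It assumes that a frame-based buffer is sized to the smallest whole number of speech frames that covers the jitter, which is one reading of “a multiple of speech-frame size” and not the sizing rule of equation {5}. The 2 ms margin is simply the value implied by the 23 ms example, and the function names are illustrative.

```python
import math


def frame_based_buffer_ms(jitter_ms, frame_ms):
    """Smallest whole number of speech frames that covers the jitter (assumed rule)."""
    return math.ceil(jitter_ms / frame_ms) * frame_ms


def absolute_buffer_ms(jitter_ms, margin_ms):
    """Equation {6}: measured delay variation plus a safety margin."""
    return jitter_ms + margin_ms


for jitter in (20, 21):
    print(jitter,
          frame_based_buffer_ms(jitter, 10),
          frame_based_buffer_ms(jitter, 20),
          absolute_buffer_ms(jitter, 2))
```

At 20 ms of jitter both frame sizes give a 20 ms frame-based buffer; at 21 ms the 10 ms frame-based buffer jumps to 30 ms while the absolute buffer needs only 23 ms.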
Jitter buffers solve the lost and late packets problem by adding delay that reduces the available delay
budget. Truly lost (discarded) packets will never show up. Figure 24 adds frame-based jitter buffer
delay to the packetization delay shown in Figure 23. This translates into even lower voice quality for
a given amount of propagation delay. The goal is to minimize jitter buffer delay. When QoS controls
are used, smaller jitter buffers can be used to obtain the same performance.
Figure 23 – G.729A + VAD Packetization Delay, without Jitter Buffer Delay,
for the case where the packetization process Fully Utilizes the Power of the Processor
Figure 24 – G.729A + VAD Packetization Delay + Frame-based Jitter Buffer Delay
for the case where the packetization process Fully Utilizes the Power of the Processor
Figure 25 – Example of G.729A + VAD Allowable Propagation Delay
Figure 26 – E-Model details for the Delay Examples in Figures 23, 24 and 25
Most jitter is source-based jitter. It is a function of the link speed, packet size and network loading.
Source jitter arises due to contention for the link bandwidth, both by multiple voice calls sharing the
same queuing priority and between voice and data packets where a data packet has already started
transmission. Source-based jitter in all low speed packet access networks (cable, enterprise, DSL) can
dwarf network jitter in high-speed networks by orders of magnitude.
6.2.1.2. Transport Delay and Jitter
Intranet carriers and corporate managed IP networks use equipment with only about 25 to 100
microseconds of delay per hop, plus about 10 to 20 ms of jitter buffer delay end-to-end to
accommodate source-based jitter. Network-based jitter is usually negligible relative to source-based
jitter and intranet carriers and corporate managed IP networks design their networks for packet loss
rates well below 1%. However, any contention or queue overruns will produce “jitter events”, which
are brief but dramatic changes in delay relative to the mean jitter. The question then becomes, how
brief? The packet loss concealment algorithms in the end terminals can deal with about 40 ms of
missing speech. Between about 40 ms and about 200 ms, the speech is clipped and after that there are
speech dropouts. When calculating the mean jitter, these jitter events have to be removed and noted
separately, otherwise they would significantly distort the mean jitter value. Significant jitter events in
the network can only be avoided by increasing the bandwidth and/or implementing a QoS strategy to
prioritize the voice packets. For the purposes of this section, 25 ms has been reserved for transport
delay plus jitter (see Figure 25) and there are no jitter events.
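The effect of a jitter event on the listener can be sketched as a simple classification by the length of missing speech. The 40 ms and 200 ms thresholds are the approximate figures quoted above, not hard limits, and the function name and return strings are illustrative.

```python
def classify_missing_speech(missing_ms):
    """Rough audible effect of a gap in the speech stream.

    Thresholds are the approximate values quoted in the text (about 40 ms, about 200 ms).
    """
    if missing_ms <= 40:
        return "concealed by PLC"
    if missing_ms <= 200:
        return "clipped speech"
    return "speech dropout"


for gap in (20, 40, 120, 250):
    print(gap, "ms:", classify_missing_speech(gap))
```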
6.2.1.3. Propagation Delay
Once the packetization, jitter buffer and transport delay are accounted for in a given connection, all
that remains is propagation delay. There are three ways to deal with propagation delay:
• Reserve a block of time for propagation delay;
• Illustrate the available propagation delay using equipment quality classes;
• Illustrate the available propagation delay on a case-by-case basis with real scenarios or
  hypothetical reference connections.
The advantage of reserving a block of time for propagation delay is that it is a very simple approach.
The problem is: how much is appropriate? Section 5 identified the typical actual PSTN delay as less
than 50 ms, but the worst-case hypothetical PSTN reference connection delay is about 200 ms. This
approach is too coarse to be practical for IP telephony.
The second approach, using equipment quality classes, refers to dividing both the IP telephone
packetization/jitter buffer delay and the network transport delay into a number of quality classes. The
various combinations of these classes can be documented in a matrix against the “High”, “Medium”
and “Low” voice quality categories and the matrix can show how much propagation delay is
available. The matrix format is compact. For example, with three delay classes of IP telephones, three
delay classes of networks and three voice quality categories, a matrix with 27 entries gives a good
overview of the range of available propagation delay, from none to more than enough. While this
approach both solves the resolution problem identified in the previous method and is easy to use in
the table format, it does take some studying to appreciate the available propagation delay message.
The third approach is illustrated in Figure 25. It is consistent with the philosophy of this document to
use the E-Model to evaluate specific scenarios. Figure 25 shows the case of G.729A + VAD with two
speech frames per packet ((2 x 2 + 1) x 10 + 5 ms = 55 ms), a jitter buffer of two speech frames (2 x
10 ms = 20 ms) and a transport delay of 25 ms (from Section 6.2.1.2), for a total of 100 ms.
The allowable propagation delay depends on the minimum voice quality level (remember this
includes the conversation dynamics) to which the connection is allowed to drop. In this example, the
intercept with R = 70, the lower limit of the “Medium” category, is the “line in the sand”. Therefore,
the allowable propagation delay for this example is about 145 ms, which is sufficient for all national
and most international connections.
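The budget arithmetic for this example can be sketched as follows. The three fixed components are the values listed for the example (55 ms packetization, 20 ms jitter buffer, 25 ms transport). The 250 ms intercept is an assumption taken from the earlier statement that the G.729A + VAD contour meets R = 70 at a delay limit of 250 ms; the value actually read from Figure 25 may differ slightly, which is why the result only approximates the “about 145 ms” quoted above. The function name is illustrative.

```python
def delay_budget(packetization_ms, jitter_buffer_ms, transport_ms, intercept_ms):
    """Return (fixed delay, remaining propagation allowance) in ms.

    intercept_ms is the one-way delay at which the codec contour crosses the
    minimum acceptable R; it has to be read from the E-Model plot.
    """
    fixed_ms = packetization_ms + jitter_buffer_ms + transport_ms
    return fixed_ms, intercept_ms - fixed_ms


fixed, propagation = delay_budget(55, 20, 25, intercept_ms=250)
print("fixed delay:", fixed, "ms; allowable propagation delay:", propagation, "ms")
```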
This approach provides great flexibility for evaluating the allowable propagation delay. However, it
requires sufficient resources to perform the necessary E-Model analyses.
6.2.2. Speech Compression
Speech Compression Rec. #1: Use G.711 unless the link speed demands compression.
Speech Compression Rec. #2: Speech compression codecs for wireless networks and packet
networks must be rationalized to minimize transcoding issues.
Speech compression adds distortion. In the E-Model, an increase in distortion is represented by an
increase in the Ie-value and therefore a decrease in R. As R drops, the delay margin available for a
given voice quality level also drops. This is highlighted in Figure 27, where a number of codecs from
Table 1 have been plotted to illustrate their relative contours. Figure 28 shows the block diagram and
E-Model parameters for this example. Determining the Ie-value for a given codec is as much an art as
a science and it is useful to think of the information in Figure 27 and Table 1 as a ranking of codecs
rather than a list of absolute Ie-values.
The selection of the appropriate codec for the task is complicated. One must consider the cost of the
intellectual property, the speech performance, the packet loss performance, the conferencing
performance (leads to transcoding issues), the tone/DTMF/fax performance, the efficiency, the speech
frame size, the delay and so on. Focusing on the challenge of these details misses the big picture. The implication
from Figure 27 and the example in Section 6.2.4, Transcoding, is that there are simply too many
codecs. The combination of impairments introduced by wireless networks and packet networks
demands a rationalization of codecs.
This task is beyond the scope of this document, but explains why this document recommends the
G.711 codec for IP telephony unless the link speed demands compression. Without going into the E-Model
details, Table 4 shows the number of channels that can be transported over three link speeds
for three codecs. This is achieved by trading off voice distortion and conversation dynamics against
additional capacity. Yes, a gateway can transport 5 channels of G.729A, with two speech frames per
packet, over a 212 kb/s link, but the quality of conversation on the resulting channels is reduced.
Using the examples in this document for guidance, determining the R-values for Table 4 is left as an
exercise for the reader.
Table 4 – Comparison of Codecs, Link Speed and Capacity

Codec                          G.711                  G.726                  G.729A
Codec Bit Rate (kb/s)          64     64     64       32     32     32       8      8      8      8
Packet Frame Duration (ms)     10     20     30       10     20     30       10     20     30     40
Payload Size (bytes)           80     160    240      40     80     120      10     20     30     40
IP Packet Size w/ overhead     134    214    294      94     134    174      64     74     84     94
ATM Cells Needed               3      5      7        2      3      4        2      2      2      2
ATM Bytes Needed               159    265    371      106    159    212      106    106    106    106
ATM Bitrate Needed (kb/s)      127.2  106    98.9     84.8   63.6   56.5     84.8   42.4   28.3   21.2

Number of Channels Possible for each Codec at a Given Link Speed
Link Speed (kb/s)
212  #Channels (Max)           1      2      2        2      3      3        2      5      7      10
512  #Channels (Max)           4      4      5        6      8      9        6      12     18     24
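The entries in Table 4 can be reproduced with a few lines of arithmetic. The constants below are inferred from the table values, not stated in the text: 54 bytes of overhead per packet (IP packet size minus payload), 48 payload bytes per ATM cell, and 53 bytes per cell on the wire. Function names are illustrative.

```python
OVERHEAD_BYTES = 54    # inferred: IP packet size minus payload size in Table 4
ATM_CELL_PAYLOAD = 48  # inferred: payload bytes carried per ATM cell
ATM_CELL_BYTES = 53    # inferred: ATM bytes needed divided by cells needed


def atm_cells(codec_kbps, packet_ms):
    """ATM cells needed to carry one voice packet."""
    payload_bytes = codec_kbps * packet_ms // 8   # kb/s x ms = bits
    ip_bytes = payload_bytes + OVERHEAD_BYTES
    return -(-ip_bytes // ATM_CELL_PAYLOAD)       # ceiling division


def atm_bitrate_kbps(codec_kbps, packet_ms):
    """ATM bit rate consumed by one voice channel."""
    return atm_cells(codec_kbps, packet_ms) * ATM_CELL_BYTES * 8 / packet_ms


def max_channels(link_kbps, codec_kbps, packet_ms):
    """Whole channels that fit on the link, using integer arithmetic."""
    bits_per_packet = atm_cells(codec_kbps, packet_ms) * ATM_CELL_BYTES * 8
    return (link_kbps * packet_ms) // bits_per_packet


# G.729A with two 10 ms speech frames per packet (20 ms) on a 212 kb/s link.
print(atm_bitrate_kbps(8, 20), max_channels(212, 8, 20))
```

The same functions give the G.711 and G.726 columns when called with 64 or 32 kb/s.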
Figure 27 – A Comparison of Selected Codecs from Table 1
[Plot: Speech Compression Impairment. R (50 to 100) versus one-way delay (0 to 500 ms), with user
satisfaction bands (Very satisfactory, Satisfactory, Some users dissatisfied, Many users dissatisfied,
Exceptional limiting case). Contours are drawn for Ie = 0, 2, 5, 7, 11, 15, 20 and 25 and are labelled
with the codecs G.711; GSM EFR @ 12.2 kb/s; G.726, G.727 @ 40 kb/s; G.726, G.727 @ 32 kb/s;
G.728 @ 16 kb/s; G.726, G.727 @ 24 kb/s; G.723.1 @ 6.3 kb/s; G.729A + VAD @ 8 kb/s;
IS-54 @ 8 kb/s; G.728 @ 12.8 kb/s.]
Figure 28 – E-Model details for the Speech Compression Example in Figure 27
6.2.3. Packet Loss
Packet Loss Rec. #1: Keep (random) packet loss well below 1%.
Packet Loss Rec. #2: Use packet loss concealment with G.711.
Packet Loss Rec. #3: If other codecs are used, then use codecs that have built-in or add-on PLCs.
Packet Loss Rec. #4: New PLCs should be optimized for less than 1% of (random) packet loss.
The packet loss rate, the distribution of the losses (random vs bursty) and the number of speech
frames per packet are known to affect the subjective quality of voice on packet networks. The light
blue (w/o PLC @ PL = 1%) curve in Figure 29 shows what happens to the voice quality of the G.711
codec when it encounters only 1% of packet loss without a packet loss concealment algorithm. R
drops from 94 to 69 at 0 ms. PLCs monitor the receive signal and attempt to reduce the effects of
packet loss by using information in the current packet to estimate the following packet if it doesn’t
arrive in time. PLCs add about 5 ms of processing delay but they are essential in packet networks.
Vocoders intended for packet networks like G.729 and G.723.1 are equipped with PLCs, but codecs
like G.711 and G.726, which were originally intended for switched circuit networks, require PLC
add-ons. As mentioned below, there are two standard methods for G.711, but there is currently no
established standard for G.726. A standard is not necessary, however, since most methods are
deployed at the decoder only; therefore, a proprietary algorithm can easily be employed where needed.
Based on the Ie-values in Table 2 and the scenario in Figure 30, the G.711 packet loss performance
with and without PLC is illustrated in Figure 29. R at 1% packet loss without PLC is about the same
as R for 10% packet loss with PLC (not shown in Figure 29) assuming a random distribution of lost
packets. The red, green, blue and violet curves (w/PLC @ PL = 1% to w/PLC @ PL = 5%) show the
G.711 with PLC random packet loss performance. It is clear from these curves that PLC must be used
with G.711 in packet networks. Recently, ITU-T SG16 approved the ANSI T1.521 Annex A PLC
algorithm as Appendix I of Recommendation G.711. The performance of this algorithm is
documented in Table 2 in the “Bursty” column, but it is not plotted in this section. T1.521 also has an
Annex B with a second PLC algorithm, which is documented in Table 2 in the “Random” column.
This is the algorithm that is plotted in Figure 29. Note that the terms “Random” and “Bursty” only
refer to the test conditions that were used to evaluate the algorithms. Because PLC algorithms work
on the receive side only, having a choice of two algorithms is acceptable.
Figure 31 and Figure 33 show the random packet loss family of curves for two popular IP coders:
G.729A + VAD and G.723.1A (6.3 kbit/s) codecs, respectively. For relative reference, the G.711
default curve is included on each graph. These graphs illustrate two points. First, on the R-axis it
shows how much distortion impairment each vocoder adds. Second, on the delay axis it shows the
reduction in delay available to the connection, for a given R. Recall the packetization and jitter buffer
delay details provided in Section 6.2.1. The delay margin available within a given performance
category is significantly reduced by the use of speech compression and it is further reduced by packet
loss. Figure 32 and Figure 34 show the related block diagrams and E-Model parameters.
In practice, intranet carriers and corporate managed IP networks design their networks for packet loss
rates well below 1%. Much of the packet loss detail shown in this section for rates of packet loss
greater than 2% is based on early experience with voice over the Internet where Best Effort service does
not provide any arrival time guarantees. Including the GSM EFR vocoder in Section 4.2.4, this
document details the family of curves for four of the five PLC algorithms/codecs that have published
(provisional) packet loss information. All four PLC algorithms show reasonably consistent
degradation vs packet loss.
Another way of thinking about packet loss is in terms of time rather than percent. PLCs can provide
adequate “repair” of consecutive missing speech up to about 40 ms. Most packet loss is bursty in
nature, i.e., occasional long losses, rather than frequent short losses. In practice this means that packet
loss performance is directly related to packet size, the shorter, the better.
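The link between packet size and concealable loss can be made concrete with a one-line calculation. The 40 ms limit is the approximate figure quoted above; the function name is illustrative.

```python
def max_concealable_packets(packet_ms, plc_limit_ms=40):
    """Consecutive lost packets whose total duration stays within the PLC limit."""
    return plc_limit_ms // packet_ms


for packet_ms in (10, 20, 30, 60):
    print(packet_ms, "ms packets:", max_concealable_packets(packet_ms), "in a row")
```

With 10 ms packets a burst of four can still be repaired, while a single lost 60 ms packet already exceeds the limit.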
Figure 29 – G.711 Random Packet Loss Performance
[Plot: G.711 Packet Loss Performance with & without PLC. R (50 to 100, with user satisfaction bands)
versus one-way delay (0 to 500 ms). Curves: G.711 @ PL = 0%; w/PLC @ PL = 1%, 2%, 3% and 5%;
w/o PLC @ PL = 1% and 2%. Annotations: ∆R = 20 @ 1% PL; ∆R = 28 @ 2% PL.
PL = Packet Loss; w/PLC = with Packet Loss Concealment; w/o PLC = without Packet Loss Concealment.]
Figure 30 – E-Model details for the G.711 Packet Loss Example in Figure 29
Figure 31 – G.729A + VAD Packet Loss Performance compared to the G.711 Reference
[Plot: G.729A Packet Loss Performance. R (50 to 100, with user satisfaction bands) versus one-way
delay (0 to 500 ms). Curves: G.711 reference @ PL = 0%; G.729A @ PL = 0%, 1%, 2%, 3% and 4%.
PL = Packet Loss.]
Figure 32 – E-Model details for the G.729A + VAD Packet Loss Example in Figure 31
Figure 33 – G.723.1 (6.3 kbit/s) Packet Loss Performance compared to the G.711 Reference
[Plot: G.723.1 Packet Loss Performance. R (50 to 100, with user satisfaction bands) versus one-way
delay (0 to 500 ms). Curves: G.711 reference @ PL = 0%; G.723.1 @ PL = 0%, 1%, 2%, 3% and 4%.
PL = Packet Loss.]
Figure 34 – E-Model details for the G.723.1 Packet Loss Example in Figure 33
6.2.4. Transcoding
Transcoding Rec. #1: Avoid transcoding where possible. Adds Ie and delay impairment.
Transcoding Rec. #2: For interoperability, IP gateways must support wireless codecs or IP must
implement unified Transcoder Free Operation with wireless.
Transcoding is defined as two or more encodings of a signal through different types of non-G.711
codecs, separated by G.711 or linear segments. Example: GSM EFR to G.711 to G.729. Transcoding
is accomplished by converting the signal to G.711 or linear. Direct conversion between arbitrary
codecs is not yet possible.
Transcoding is a significant issue in wireless connections because there are several different wireless
codecs. IP telephony supports several different codecs, but connections are expected to be established
by handshaking to a common codec if the connection does not traverse islands of PSTN. The
problem of transcoding occurs in IP-to-wireless connections because each technology may use a
different set of codecs, although some wireless codecs are being adopted as options for IP.
In terms of the E-Model and impairments, transcoding has the potential to increase distortion and
delay. How much distortion depends on the codecs involved. Looking down the list in Table 1,
Ie-values vary from 2 to 50. Some combinations will not be noticeable, while others will be intolerable.
Similarly, delay is very specific to a given scenario. Although this document attempts to summarize
complex issues with general recommendations, its paramount objective is to provide an E-Model
tutorial for IP scenarios, so readers can analyze their own networks. A number of transcoding and
tandeming examples are provided on the next few pages to help deal with the complexities of these
situations.
Figure 36 shows a scenario with a GSM wireless telephone connected to a G.729A + VAD IP
telephone. The IP telephone may be using G.729A + VAD instead of G.711 due to low speed access.
There is one transcode in this connection from GSM to G.729A + VAD and vice versa. The resulting
decrease in voice quality from the “High” category to the “Medium” category is illustrated by the
green curve (GSM with G.729A + VAD transcode) in Figure 35. The GSM EFR Ie-value of 5 is
added to the G.729A + VAD Ie-value of 11 for a total Ie of 16. Also, note that the amount of delay
available to a given voice quality level is significantly reduced. For example, at R = 70 the available
delay is about 80 ms less for the green curve compared to the red curve, which is in turn about 50 ms
less than the black curve.
The conclusion is that only one transcode can be tolerated before the performance drops below
acceptable levels, for most combinations of non-G.711 codecs.
Transcoding E-Modeling Rule:
Assume Ie-values are additive. Note: The E-Model does not take into account the order of the codecs,
which in practice may be incorrect.
Transcoding Example:
Ie total = GSM EFR (Ie = 5) to G.711 (Ie = 0) to G.729A + VAD (Ie = 11) = 5 + 11 = 16
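The additive rule is easy to apply to any codec chain. In the sketch below the Ie-values are the ones quoted in this section. The estimate of R at zero delay as the G.107 default of 94 minus the total Ie is a simplifying assumption that ignores delay and every other impairment; it is included only to show the shift from the “High” to the “Medium” category. Names are illustrative.

```python
DEFAULT_R = 94  # G.107 default value quoted in this document


def total_ie(chain):
    """Sum the Ie-values of every codec in the connection (additive rule)."""
    return sum(ie for _name, ie in chain)


one_transcode = [("GSM EFR", 5), ("G.711", 0), ("G.729A + VAD", 11)]
no_transcode = [("GSM EFR", 5)]

for label, chain in (("one transcode", one_transcode), ("no transcode", no_transcode)):
    ie = total_ie(chain)
    # Assumed simplification: R at zero delay is the default R minus the total Ie.
    print(label, "Ie total =", ie, "approximate R at zero delay =", DEFAULT_R - ie)
```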
A way around this additive impairment is Transcoder Free Operation (TrFO). TrFO is an out-of-band
signaling procedure that provides the capability of negotiating the same (or at least an interoperable)
encoder/decoder combination between the end terminals themselves, with a direct digital (no
conversion to/from G.711) connection in between. Figure 37 shows a scenario where the IP network
and telephones support the wireless GSM codec, so the voice quality is raised from the green curve to
the red curve (GSM only, no transcode, Ie = 5) in Figure 35.
Transcoder Free Operation E-Modeling Rule: Only one Ie-value.
Transcoder Free Operation Example: Ie total = GSM EFR all the way (Ie = 5) = 5
Transcoder Free Operation is still in the discussion stage for wireless networks. TrFO will be just as
important for IP as it will be for wireless. However, interoperability must be the common goal of
wireless and IP networks. Either TrFO must be unified between IP and wireless, or the range of
codecs in IP gateways expanded to include wireless codecs.
The next section deals with tandeming, which has some similarities to transcoding and some
significant differences. Worth mentioning in this section is that while Tandem Free Operation (TFO)
and TrFO appear to be the same thing, they are implemented in a different manner. A device called
the transcode unit is in place for TFO, but not for TrFO. As a result, in simple terms, TrFO may have
less delay, but TFO may have more network compatibility and features and be less expensive to
implement. Ideally, they should be harmonized.
6.2.5. Tandeming
Tandeming Rec. #1: Avoid asynchronous tandeming if possible. Adds Ie and delay impairment.
Tandeming Rec. #2: Synchronous tandeming of G.726 is generally permissible. Impairment is
delay dependent, so long delay DCME equipment should be avoided.
Tandeming, for all codecs except G.711, is defined as two or more encodings of a signal through the
same type of codec, separated by analog or G.711 segments. Example: G.726 to G.711 to G.726.
Tandeming for G.711 is defined as two or more encodings of a signal through G.711, separated by
analog segments. It is asynchronous tandeming.
Tandeming E-Modeling Rule for G.711:
Ie for G.711 is 0, but each D/A-A/D conversion incurs a distortion impairment of 1 qdu.
Tandeming Example for G.711: qdu total = G.711 (qdu = 0.5) to analog to G.711 (qdu = 0.5) = 1
Asynchronous tandeming of waveform codecs (see Table 1), except G.711, is defined as two or more
encodings of a signal through the same type of waveform codec, separated by analog segments or by
G.711 segments with digital processing, such as digital pads, that interrupts the sample-by-sample
flow. Delay introduced by DCME equipment may be an issue; see synchronous tandeming.
Asynchronous Tandeming E-Modeling Rule for non-G.711 Waveform codecs:
Ie-values are additive. DCME delay may be an issue.
Asynchronous Tandeming Example for non-G.711 Waveform codecs:
Ie total = analog set to G.711 to G.726.32 (Ie=7) to G.711 to G.726.32 (Ie=7) to G.711 to analog set
= 14
Synchronous tandeming of waveform codecs (in practice G.726), except G.711, is defined as two or
more encodings of a signal through the same type of waveform codec, separated by G.711 segments
without any digital processing that interrupts the sample-by-sample flow. Some DCME equipment
has tandem-avoidance capability that synchronizes the samples. In modern digital networks,
waveform codec segments separated by G.711 segments synchronize naturally, without the aid of
DCME tandem-avoidance capability, only when there is no digital signal processing in the G.711
segments. However, this is the typical case. As mentioned in Section 5, each piece of DCME
equipment also adds about 30 ms of one-way delay, for things like voice activity detection. DCME
uses TDM streams, which requires conversion from IP with media gateways. Also, DCME may
introduce transcoding issues. DCME equipment was useful in the switched circuit network, but it
introduces undesirable delay and Ie impairment in conjunction with wireless and IP technologies.
Synchronous Tandeming E-Modeling Rule for G.726 Waveform codecs:
Ie-values are not additive. There is only one Ie-value, regardless of the number of tandems. Delay
becomes the limiting factor.
Synchronous Tandeming Example for G.726 Waveform codecs:
Ie total = Analog set to G.711 to G.726.32 (Ie = 7) to G.711 to G.726.32 to G.711 to analog set = 7
Asynchronous tandeming of speech compression codecs is defined as two or more encodings of a
signal through the same type of codec, separated by analog segments, or G.711 segments with digital
processing, such as digital pads, and/or different frame boundaries for frame-based codecs, so that the
frame sampling boundaries do not line up from one encoding/decoding to the next.
Asynchronous Tandeming E-Modeling Rule for Speech Compression codecs: Ie-values are additive.
Asynchronous Tandeming Example for Speech Compression codecs:
Ie total = analog set to G.711 to G.729 (Ie = 10) to G.711 to G.729 (Ie = 10) to G.711 to analog set
= 20
It is possible to arrange the frame boundaries of speech compression codecs to line up to become
synchronously tandemed, but it is not commonplace. The nature of the coding process is such that
there will still be some degradation, but probably less than the sum of Ie-values. This situation is
probably codec-specific and there are no simple modeling rules.
Tandem Free Operation (TFO) is an in-band signaling procedure to handshake between the
transcoding units that would normally interface to the PSTN with G.711 PCM. If the transcoding
units determine that they are compatible (normally exactly the same codec, but some interworking
scenarios are possible), then they set up a direct digital path (transport the bits) instead of going back
to G.711. The key point is that the codecs in the end terminals are unaware of this procedure. TFO is
just beginning to be introduced in wireless networks to avoid tandeming of speech codecs in
wireless-to-wireless calls. This will raise voice quality levels and, therefore, user expectations.
From the E-Model perspective TFO is the TrFO scenario in Figure 37. The IP network and
telephones support the wireless GSM codec, so the voice quality is illustrated by the red curve (GSM
only, no transcode, Ie = 5) in Figure 35.
Tandem Free Operation E-Modeling Rule: Only one Ie-value.
Tandem Free Operation Example: Ie total = GSM EFR all the way (Ie = 5) = 5
TFO was originally designed for wireless systems, but it is applicable to any packet voice network.
TFO requires identical codecs at either end. The handshaking protocol supports selection of
compatible compression modes; that is, if the two terminals support multiple codecs, the protocol
includes procedures for the selection of a common-mode compression.
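The E-Modeling rules given in this section can be summarized in a short sketch. The function name `total_ie` and its `synchronized` flag are illustrative assumptions and are not defined by this bulletin; the Ie-values are those used in the examples above. Synchronous tandeming of speech compression codecs is deliberately left out, since no simple modeling rule exists for that case.

```python
def total_ie(ie_values, synchronized=False):
    """Combine the Ie-values of the codec segments on one connection.

    synchronized=False: asynchronous tandeming, Ie-values are additive.
    synchronized=True:  synchronous G.726 tandeming or Tandem Free
                        Operation, only one Ie-value applies.
    """
    values = list(ie_values)
    if not values:
        return 0
    if synchronized:
        # The single-Ie rules assume the same codec in every segment.
        if len(set(values)) != 1:
            raise ValueError("synchronized segments must use the same codec")
        return values[0]
    return sum(values)


if __name__ == "__main__":
    print(total_ie([7, 7]))                     # asynchronous G.726.32: 14
    print(total_ie([7, 7], synchronized=True))  # synchronous G.726.32: 7
    print(total_ie([10, 10]))                   # asynchronous G.729: 20
    print(total_ie([5], synchronized=True))     # GSM EFR with TFO: 5
```

With a single Ie-value in the synchronized cases, delay rather than codec impairment becomes the limiting factor, as noted above.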
Figure 35 – A Comparison of the Effect of One Transcode with No Transcodes
and the G.711 Reference
[Figure: "Effect of Transcoding on Impairment". R (50 to 100) plotted against one-way delay (0 to
500 ms) for three curves: G.711 Reference; GSM only, no transcodes; GSM with G.729A transcode.
User satisfaction bands: very satisfactory, satisfactory, some users dissatisfied, many users
dissatisfied, exceptional limiting case.]
Figure 36 – E-Model details for the GSM EFR to G.729A Transcode Example in Figure 35
Figure 37 – E-Model details for the No Transcode Example in Figure 35
6.2.6. New Gateway Loss Plan
Loss Plan Rec. #1:
Use TIA/EIA/TSB122-A, Voice Gateway Loss and Level Plan.
Until recently, the loss and level plan for North American CPE (customer premises equipment) was
documented in ANSI/TIA/EIA-464-B. This loss plan was developed in the early 1990s for PBXs and
key systems. There were a number of changes in the second half of the 1990s, necessitating the
development of a new loss plan. The changes, in order of importance, were:
• The change from the IEEE loudness rating methodology (TOLR/ROLR) to the ITU loudness
rating methodology (SLR/RLR) in ANSI/TIA/EIA-579-A.
• The 3 dB increase in digital set SLR and the 2 dB decrease in digital set RLR introduced in
ANSI/TIA/EIA-579-A to standardize with ITU.
• The introduction of OLR = 10 dB as the objective for CPE loss plans.
• The shift from PBXs to gateways.
The final point necessitated documenting essentially the same loss plan in two different standards, a
PBX version and a voice gateway version.
PN-3673 is the project to revise the PBX standard. It will be published as ANSI/TIA/EIA-464-C,
possibly in 2001. An important point to note with this revision is that the loss plan is no longer
mandatory. However, for interoperability success, it must be followed as closely as possible.
TIA/EIA/TSB122-A is the voice gateway version of the new loss plan. It is available from the TIA
TR-41 web page.
Annex A (informative) – VoIP End-to-End Delay Budget Planning for Private Networks
This annex provides a more detailed view of VoIP one-way end-to-end delay sources in a private IP
network or intranet. End-to-end delay will be used synonymously with one-way delay in this
document. Section A.1 covers delay sources in an example worst-case end-to-end private network.
Sections A.2 and A.3 show detailed end-to-end delay budget planning in a VoIP network for a
G.729A vocoder and show how the end-to-end delay is affected by the voice packet size, link speed
and maximum data packet size. Although this document covers the delay budget planning for the
G.729A vocoder only, the same planning rules can be applied to any other vocoder.
A.1. VoIP End-to-End Delay Sources Overview
Figure 38 below shows a VoIP end-to-end private network connection and lists the main delay
sources for each section of the network. There are two types of delay source, fixed and variable,
and each delay source in Figure 38 is listed in one of the two categories.
Figure 38 – VoIP End-to-End Delay Sources for Private Network Scenario
[Figure: end-to-end connection from the A-side to the B-side. Elements shown: originating and
terminating voice-LAN, originating and terminating gateway, edge routers, L1 and L2 links, WAN,
and core network routers. Delay sources listed for each segment:]
A-side: Fixed: look-ahead, encoding, buffer, VAD, packetizing
Originating L1 link: Fixed: switching. Variable: voice contention, data contention
Originating L2 link (WAN): Fixed: serialization
Core network routers: Fixed: switching, propagation, serialization. Variable: voice contention,
data contention
Terminating L2 link (WAN): Fixed: serialization
Terminating L1 link: Fixed: switching. Variable: voice contention, data contention
B-side: Fixed: decoding. Variable: dejitter buffer
A.2. VoIP End-to-end Delay Source Definitions
A.2.1 Vocoder Encoding
Details on the vocoder delays are from ITU-T Recommendation G.114; also see Section 5.2.1 of this
document. The encoding delay consists of fixed delays: look-ahead, the encoding process and
packetization. There is also a serialization delay to transmit the packets over the 10/100 Base-T
link, but it is negligible (much less than 1 ms), so it is ignored.
A.2.2 Originating Voice-LAN
A.2.2.1 Fixed Switching Delay:
This delay through the edge switch can be significant since forwarding engines in the edge switch are
not very fast.
A.2.2.2 Variable Voice Contention Delay
This is the delay due to contention between voice packets for the link bandwidth. Average queue
delays caused by contention between voice packets that share the same queuing priority can be
modeled using the queuing theory formula for fairly constant bit rate traffic sharing a single queue.
The formulas are:
Average voice queuing time: tQ-av = tdls × σ / (2 × (1 − σ))
Worst-case queuing time (95% of distribution): tQ-wo = 2 × tQ-av
where tdls is the voice packet link serialization delay and σ is the link utilization by voice packets
(0 ≤ σ < 1).
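The voice contention formulas can be sketched as follows. The sketch takes the denominator as 2 × (1 − σ), the usual form for a single queue with near-constant service times, so that the result is positive for 0 ≤ σ < 1; this reading of the formula is an assumption. The function name and the sample inputs (a 58-byte packet on a 128 kb/s link at 30% voice utilization) are hypothetical.

```python
def voice_queuing_delay_ms(t_dls_ms, sigma):
    """Return (average, worst-case) voice queuing delay in ms.

    t_dls_ms: voice packet link serialization delay, in ms
    sigma:    link utilization by voice packets, 0 <= sigma < 1
    """
    if not 0.0 <= sigma < 1.0:
        raise ValueError("sigma must satisfy 0 <= sigma < 1")
    t_q_av = t_dls_ms * sigma / (2.0 * (1.0 - sigma))
    t_q_wo = 2.0 * t_q_av  # taken as the 95% point of the distribution
    return t_q_av, t_q_wo


if __name__ == "__main__":
    # Hypothetical inputs: 58 bytes at 128 kb/s, 30% voice utilization.
    t_dls = 58 * 8 / 128
    average, worst = voice_queuing_delay_ms(t_dls, 0.30)
    print(f"average {average:.2f} ms, worst case {worst:.2f} ms")
```

Because the delay grows with σ / (1 − σ), it rises sharply as voice utilization approaches the link capacity.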
A.2.2.3 Variable Data/Voice Contention Delay
This delay is due to contention between a voice packet and a data packet, where the data packet has
already started transmission. When the forwarding node uses priority-scheduling algorithms for
differentiated QoS between voice and data classes, then the maximum time the voice packet is
delayed by the data packet is:
tD-max = (Maximum # Data MTU bytes + 48 overhead)/(link speed kbps/8)
Important planning recommendation: use priority scheduling for voice-class traffic, as well as RTP
header compression and data packet fragmentation on slow-speed links, to minimize the
contribution of this variable delay source.
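As a worked illustration of the tD-max formula, the sketch below evaluates it for the two data MTU sizes and the two link speeds used in the budget tables of this annex. Dividing the link speed in kb/s by 8 gives bytes per millisecond, so the result is in milliseconds. The function and constant names are illustrative assumptions.

```python
OVERHEAD_BYTES = 48  # per-packet overhead used in this annex


def max_data_blocking_ms(data_mtu_bytes, link_kbps):
    """tD-max: time to finish sending one maximum-size data packet, in ms."""
    return (data_mtu_bytes + OVERHEAD_BYTES) * 8 / link_kbps


if __name__ == "__main__":
    for link_kbps in (128, 1544):
        for mtu in (128, 512):
            delay = max_data_blocking_ms(mtu, link_kbps)
            print(f"{mtu}-byte MTU at {link_kbps} kb/s: {delay:.1f} ms")
```

On the 128 kb/s link, reducing the maximum data unit from 512 to 128 bytes cuts the blocking time from 35.0 ms to 11.0 ms, which is the motivation for fragmenting data packets on slow links.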
A.2.2.4 Fixed Serialization WAN Delay
This delay is due to voice packet transmission on the WAN L2 link. The link rate can vary from
56 kb/s to OC3 and up. The formula for serialization delay is:
tV-max = (Voice packet bytes + 48 overhead)/(link speed kbps/8)
Important planning recommendation: in order to minimize the effect of this delay source, avoid
using slow serial links in any of the end-to-end network connections.
A.2.3 Core Network
A.2.3.1 Fixed Switching Delay
Includes packet-switching engine delay (see originating Voice-LAN section for details) and any other
network multiplexing equipment delays. An estimate of 1 ms of delay for each hop is used in the
calculation table in the next section.
A.2.3.2 Fixed Propagation Delay
This is the cumulative delay due to the physical ‘speed of light’ limitations of propagation through
the network. Details for this are contained in ITU-T Recommendation G.114. For the purpose of this
exercise, a figure of 5µs/km is used in the calculation of the table in the next section.
A.2.3.3 Fixed Serialization Delay Network
This is the same as defined earlier, but since the link rate in the core network is usually in the
broadband range, the total effect of this delay source is small enough (< 1.5 ms) that it is ignored in
the calculation table in the next section.
A.2.3.4 Variable Voice Contention Delay
As defined earlier:
Average voice queuing time: tQ-av = tdls × σ / (2 × (1 − σ))
Worst-case queuing time (95% of distribution): tQ-wo = 2 × tQ-av
Total core network worst-case queuing time (95% of distribution): tQ-wo × (number of hops − 1)
Since the link rate in the core network is usually in the broadband range, the tdls delay source is small.
In addition, σ, the link utilization ratio for voice packets, is small, so that the total effect of this delay
source can be ignored in the calculation table in the next section.
A.2.3.5 Variable Data/Voice Contention Delay
As defined earlier:
tD-max = (Maximum # Data MTU bytes + 48 overhead)/(link speed kbps/8)
Total core network maximum data MTU queuing time is: tD-max * (number of hops - 1)
Important planning recommendation: use priority scheduling for voice-class traffic, as well as RTP
header compression and data packet fragmentation on slow-speed links, to minimize the
contribution of this variable delay source.
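The core network contributions defined in A.2.3 can be combined in a short sketch. It uses the 1 ms per hop switching estimate and the 5 µs/km propagation figure given above, and takes the per-hop data queuing bound to be tD-max applied over (number of hops − 1) links. The function name, parameter names and the sample call (5 hops, 5000 km, 128-byte data MTU, 1544 kb/s links) are illustrative assumptions.

```python
def core_network_delays_ms(hops, distance_km, data_mtu_bytes, link_kbps,
                           switching_ms_per_hop=1.0,
                           propagation_us_per_km=5.0,
                           overhead_bytes=48):
    """Return the main core-network delay contributions, in ms."""
    # Maximum time a voice packet waits behind one data packet on one link.
    t_d_max = (data_mtu_bytes + overhead_bytes) * 8 / link_kbps
    return {
        "switching": hops * switching_ms_per_hop,                     # fixed
        "propagation": distance_km * propagation_us_per_km / 1000.0,  # fixed
        "data_queuing": t_d_max * (hops - 1),                         # variable
    }


if __name__ == "__main__":
    delays = core_network_delays_ms(5, 5000, 128, 1544)
    for name, value in delays.items():
        print(f"{name}: {value:.1f} ms")
```

For this sample call the propagation term (25 ms) dominates, which is typical for long-haul connections over broadband core links.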
A.2.4 Terminating Voice-LAN
A.2.4.1 Fixed Serialization WAN
This delay is due to voice packet transmission on the WAN L2 link. The link rate can vary from
56 kb/s to OC3 and up. The formula for serialization delay is:
tV-max = (voice packet bytes + 48 overhead)/(link speed kbps/8)
Important planning recommendation: in order to minimize the effect of this delay source, avoid
using slow serial links in any of the end-to-end network connections.
A.2.5 Vocoder Decoder
A.2.5.1 Variable Dejitter Buffer Delay
This is the delay required to buffer all the variable delays in the network so that the voice packets can
be played out at a constant rate to the decoder. The size of the dejitter buffer is:
vocoder encoding compression amount + the total variable delay in the end-to-end connection.
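The dejitter buffer sizing rule amounts to a simple sum, sketched below. The function name is an illustrative assumption; the sample variable delays are the four network variable entries of the first G.729 column of Table 5 (Case 1a), and 10 ms is the compression delay used there.

```python
def dejitter_buffer_ms(compression_ms, variable_delays_ms):
    """Return (fixed part, variable part, total) dejitter buffer size in ms."""
    fixed_part = compression_ms              # one vocoder compression delay
    variable_part = sum(variable_delays_ms)  # all network variable delays
    return fixed_part, variable_part, fixed_part + variable_part


if __name__ == "__main__":
    # Voice and data queuing in the originating voice-LAN and the core network.
    network_variable = [1.5, 11.0, 0.1, 3.6]
    fixed, variable, total = dejitter_buffer_ms(10.0, network_variable)
    print(f"fixed {fixed:.1f} ms, variable {variable:.1f} ms, total {total:.1f} ms")
```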
A.2.5.2 Fixed Decoder
Details on the vocoder decoding delay are given in ITU-T Recommendation G.114.
A.3. VoIP End-to-End Delay Budget Case 1
Table 5: Case 1a - VoIP End-to-End Delay Budget
Case 1a: L1 = 10Mb/s; L2 = 128kb/s; Data MTU max= 128
Codec type:                                                                      G.729    G.729     G.729    G.729
                                                                                 10.00    10.00     20.00    20.00
Delay type                                                                       Fixed    Variable  Fixed    Variable
Units                                                                            (ms)     (ms)      (ms)     (ms)
A-side phone
  Encoding process delay
    Codec Look ahead                                                      ms     5.0                5.0
    Encoding compression                                                  ms     10.0               10.0
    1xbuffer                                                              ms     10.0     ~         10.0     ~
  Packetization delay                                                     ms     0.0      ~         10.0     ~
  # of Voice bytes/packet                                                 bytes  10.0               20.0
Originating Voice-LAN
  Switching                 1 hop, @ > 100 pps                            ms     10.0               10.0
  Voice contention queuing  voice packets queuing @ 128kb/s (Max 2*SD)    ms              1.5                2.9
  Data Queuing              Max. data unit 128 bytes + 48 O/H @ 128kb/s   ms              11.0               11.0
  Serialization WAN delay   Voice packet + 48 O/H @ 128kb/s               ms     3.6      ~         4.3      ~
Core Network
  Switching                 5 hops, @ > 1k pps                            ms     5.0                5.0
  Voice contention queuing  voice packets queuing @ 1544kb/s (Max 2*SD)   ms              0.1                0.3
  Data Queuing              5 hops, Max data 128+48 O/H @ 1544kb/s avg    ms              3.6                3.6
  Serialization core        Voice packet + 48 O/H @ 1544kb/s              ms     1.2      ~         1.4      ~
  Propagation delay         5000km @ 5µs/km                               ms     25.0     ~         25.0     ~
Terminating Voice-LAN
  Serialization WAN delay   Voice packet + 48 O/H @ 128kb/s               ms     3.6      ~         4.3      ~
  Switching                 1 hop, @ > 100 pps                            ms     10.0               10.0
B-side phone
  Dejitter buffer delay     1 comp. delay + network variable delay        ms     10.0     16.2      10.0     17.8
  Decoding delay                                                          ms     10.0     ~         10.0     ~
Total                                                                            103.5    32.5      114.9    35.7
Min/Max                                                                          103.5    135.9     114.9    150.6
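The arithmetic behind Table 5 can be reproduced with a short sketch. Several points are inferred from the table values rather than stated in the text and should be treated as assumptions: core serialization and core data queuing are taken over (hops − 1) links at 1544 kb/s, and the total variable delay counts the network variable delay twice (once in the network, once in the dejitter buffer). The voice contention values are passed in as given in the table, since the voice utilization behind them is not stated. All function and parameter names are illustrative.

```python
OVERHEAD_BYTES = 48  # per-packet overhead used throughout this annex


def serialization_ms(payload_bytes, link_kbps):
    """Time to clock one packet onto a link, in ms."""
    return (payload_bytes + OVERHEAD_BYTES) * 8 / link_kbps


def delay_budget_ms(l2_kbps, data_mtu_bytes, voice_bytes, packetization_ms,
                    edge_voice_queue_ms, core_voice_queue_ms,
                    core_kbps=1544, core_hops=5, distance_km=5000):
    """Return (minimum, maximum) one-way delay in ms for one table column."""
    wan_serialization = serialization_ms(voice_bytes, l2_kbps)
    fixed = sum([
        5.0,                # codec look-ahead
        10.0,               # encoding compression
        10.0,               # 1x buffer
        packetization_ms,   # 0 ms or 10 ms in the tables
        10.0,               # originating voice-LAN switching
        wan_serialization,  # originating WAN link
        1.0 * core_hops,    # core switching, 1 ms per hop
        serialization_ms(voice_bytes, core_kbps) * (core_hops - 1),
        distance_km * 5.0 / 1000.0,  # propagation at 5 us/km
        wan_serialization,  # terminating WAN link
        10.0,               # terminating voice-LAN switching
        10.0,               # dejitter buffer, one compression delay
        10.0,               # decoding
    ])
    network_variable = (
        edge_voice_queue_ms
        + serialization_ms(data_mtu_bytes, l2_kbps)
        + core_voice_queue_ms
        + serialization_ms(data_mtu_bytes, core_kbps) * (core_hops - 1)
    )
    # The dejitter buffer absorbs the network variable delay, so that
    # delay is counted once in the network and once in the buffer.
    variable = 2.0 * network_variable
    return fixed, fixed + variable


if __name__ == "__main__":
    low, high = delay_budget_ms(128, 128, 10, 0.0, 1.5, 0.1)
    print(f"Case 1a, 10-byte packets: min {low:.1f} ms, max {high:.1f} ms")
```

Under these assumptions the sample call gives about 103.5 ms minimum and 135.9 ms maximum, in agreement with the Min/Max row of Table 5; small differences in other columns are due to rounding of the individual table entries.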
Table 6: Case 1b - VoIP End-to-End Delay Budget
Case 1b: L1 = 10Mb/s; L2 = 128kb/s; Data MTU max= 512
Codec type:                                                                      G.729    G.729     G.729    G.729
                                                                                 10.00    10.00     20.00    20.00
Delay type                                                                       Fixed    Variable  Fixed    Variable
Units                                                                            (ms)     (ms)      (ms)     (ms)
A-side phone
  Encoding process delay
    Codec Look ahead                                                      ms     5.0                5.0
    Encoding compression                                                  ms     10.0               10.0
    1xbuffer                                                              ms     10.0     ~         10.0     ~
  Packetization delay                                                     ms     0.0      ~         10.0     ~
  # of Voice bytes/packet                                                 bytes  10.0               20.0
Originating Voice-LAN
  Switching                 1 hop, @ > 100 pps                            ms     10.0               10.0
  Voice contention queuing  voice packets queuing @ 128kb/s (Max 2*SD)    ms              1.5                2.9
  Data Queuing              Max. data unit 512 bytes + 48 O/H @ 128kb/s   ms              35.0               35.0
  Serialization WAN delay   Voice packet + 48 O/H @ 128kb/s               ms     3.6      ~         4.3      ~
Core Network
  Switching                 5 hops, @ > 1k pps                            ms     5.0                5.0
  Voice contention queuing  voice packets queuing @ 1544kb/s (Max 2*SD)   ms              0.1                0.3
  Data Queuing              5 hops, Max data 512+48 O/H @ 1544kb/s avg    ms              11.6               11.6
  Serialization core        Voice packet + 48 O/H @ 1544kb/s              ms     1.2      ~         1.4      ~
  Propagation delay         5000km @ 5µs/km                               ms     25.0     ~         25.0     ~
Terminating Voice-LAN
  Serialization WAN delay   Voice packet + 48 O/H @ 128kb/s               ms     3.6      ~         4.3      ~
  Switching                 1 hop, @ > 100 pps                            ms     10.0               10.0
B-side phone
  Dejitter buffer delay     1 comp. delay + network variable delay        ms     10.0     48.2      10.0     49.8
  Decoding delay                                                          ms     10.0     ~         10.0     ~
Total                                                                            103.5    96.4      114.9    99.6
Min/Max                                                                          103.5    199.9     114.9    214.5
A.4. VoIP End-to-End Delay Budget Case 2
Table 7: Case 2a - VoIP End-to-End Delay Budget
Case 2a: L1 = 10Mb/s; L2 = 1544kb/s; Data MTU max= 128
Codec type:                                                                      G.729    G.729     G.729    G.729
                                                                                 10.00    10.00     20.00    20.00
Delay type                                                                       Fixed    Variable  Fixed    Variable
Units                                                                            (ms)     (ms)      (ms)     (ms)
A-side phone
  Encoding process delay
    Codec Look ahead                                                      ms     5.0                5.0
    Encoding compression                                                  ms     10.0               10.0
    1xbuffer                                                              ms     10.0     ~         10.0     ~
  Packetization delay                                                     ms     0.0      ~         10.0     ~
  # of Voice bytes/packet                                                 bytes  10.0               20.0
Originating Voice-LAN
  Switching                 1 hop, @ > 100 pps                            ms     10.0               10.0
  Voice contention queuing  voice packets queuing @ 1544kb/s (Max 2*SD)   ms              0.1                0.2
  Data Queuing              Max. data unit 128 bytes + 48 O/H @ 1544kb/s  ms              0.9                0.9
  Serialization WAN delay   Voice packet + 48 O/H @ 1544kb/s              ms     0.3      ~         0.4      ~
Core Network
  Switching                 5 hops, @ > 1k pps                            ms     5.0                5.0
  Voice contention queuing  voice packets queuing @ 1544kb/s (Max 2*SD)   ms              0.1                0.3
  Data Queuing              5 hops, Max data 128+48 O/H @ 1544kb/s avg    ms              3.6                3.6
  Serialization core        Voice packet + 48 O/H @ 1544kb/s              ms     1.2      ~         1.4      ~
  Propagation delay         5000km @ 5µs/km                               ms     25.0     ~         25.0     ~
Terminating Voice-LAN
  Serialization WAN delay   Voice packet + 48 O/H @ 1544kb/s              ms     0.3      ~         0.4      ~
  Switching                 1 hop, @ > 100 pps                            ms     10.0               10.0
B-side phone
  Dejitter buffer delay     1 comp. delay + network variable delay        ms     10.0     4.8       10.0     5.1
  Decoding delay                                                          ms     10.0     ~         10.0     ~
Total                                                                            96.8     9.6       107.1    10.2
Min/Max                                                                          96.8     106.4     107.1    117.3
Table 8: Case 2b - VoIP End-to-End Delay Budget
Case 2b: L1 = 10Mb/s; L2 = 1544kb/s; Data MTU max= 512
Codec type:                                                                      G.729    G.729     G.729    G.729
                                                                                 10.00    10.00     20.00    20.00
Delay type                                                                       Fixed    Variable  Fixed    Variable
Units                                                                            (ms)     (ms)      (ms)     (ms)
A-side phone
  Encoding process delay
    Codec Look ahead                                                      ms     5.0                5.0
    Encoding compression                                                  ms     10.0               10.0
    1xbuffer                                                              ms     10.0     ~         10.0     ~
  Packetization delay                                                     ms     0.0      ~         10.0     ~
  # of Voice bytes/packet                                                 bytes  10.0               20.0
Originating Voice-LAN
  Switching                 1 hop, @ > 100 pps                            ms     10.0               10.0
  Voice contention queuing  voice packets queuing @ 1544kb/s (Max 2*SD)   ms              0.1                0.2
  Data Queuing              Max. data unit 512 bytes + 48 O/H @ 1544kb/s  ms              2.9                2.9
  Serialization WAN delay   Voice packet + 48 O/H @ 1544kb/s              ms     0.3      ~         0.4      ~
Core Network
  Switching                 5 hops, @ > 1k pps                            ms     5.0                5.0
  Voice contention queuing  voice packets queuing @ 1544kb/s (Max 2*SD)   ms              0.1                0.3
  Data Queuing              5 hops, Max data 512+48 O/H @ 1544kb/s avg    ms              11.6               11.6
  Serialization core        Voice packet + 48 O/H @ 1544kb/s              ms     1.2      ~         1.4      ~
  Propagation delay         5000km @ 5µs/km                               ms     25.0     ~         25.0     ~
Terminating Voice-LAN
  Serialization WAN delay   Voice packet + 48 O/H @ 1544kb/s              ms     0.3      ~         0.4      ~
  Switching                 1 hop, @ > 100 pps                            ms     10.0               10.0
B-side phone
  Dejitter buffer delay     1 comp. delay + network variable delay        ms     10.0     14.8      10.0     15.0
  Decoding delay                                                          ms     10.0     ~         10.0     ~
Total                                                                            96.8     29.5      107.1    30.1
Min/Max                                                                          96.8     126.3     107.1    137.2