7. STANDARDS FOR INTERNET TELEPHONY AND MULTIMEDIA SYSTEMS

7.1 Overview on ITU-T Standards
ITU-T stands for the Telecommunication Standardization Sector of the International
Telecommunication Union, which is headquartered in Geneva. ITU-T is the
dominant de-jure standardization body focusing on worldwide telecommunication
standards. The practical work of ITU-T is carried out in 15 Study Groups (SG). The
Study Group in charge of multimedia standards is the recently established SG16, and Internet
telephony standardization falls within its scope. In addition to ITU-T, ISO and
IEC issue worldwide de-jure standards, such as MPEG1, which influence communication
systems and devices. Furthermore, an increasing number of de-facto bodies such as IETF,
ATM Forum, and DAVIC produce specifications that closely resemble
communication standards. Since 1996, ITU-T has taken a major step forward by
strengthening its cooperation with the de-facto bodies. This is essential for the success
of future standards efforts in the merging telecommunication and computer
communication infrastructures.
The ITU-T standards for Internet telephony are a collection of diverse standards issued by
ITU-T itself and IETF. In fact, the set of standards is not restricted to telephony;
it covers services and technology for real-time multimedia applications conveyed
across a packet switched IP network. A taxonomic map of the relevant standards appears
in Appendix 1.
The ITU-T recently approved Recommendations for low bit-rate multimedia
communication systems, which include standards for audio coding, video coding, system
control, multiplexing, and media stream synchronization and packetization. The
networks involved are PSTN, digital mobile, and Intranet/Internet or IP based WANs
(ATM or Frame Relay). The system specifications are contained in the following Recs.:
• H.323 (Visual Telephone Systems and Equipment for Local Area Networks which
Provide a Non-Guaranteed QoS) [ITU 96d]
• H.324 (Visual Telephone Systems and Equipment for PSTN) [ITU 96c]
• H.310 (Broadband Audiovisual Communication Systems and Terminals)
• H.321 (Adaptation of H.320 Visual Telephone Terminals to B-ISDN Environments)
• H.322 (Visual Telephone Systems and Terminal Equipment for LANs which Provide
Guaranteed Quality of Service)
Subsequently we will focus on the first two standards, which are the most important ones
in the short term.
The forthcoming mobile extension of H.324 bears the notation H.324/M. H.323 is
applicable not only to low bit-rate connections, but also to rates up to hundreds of
kbps. The set of Recommendations defines the technical requirements for conversational
multimedia communication systems, but it also supports non-conversational modes of
operation to a limited extent. It is worth noting that the H.324 system is a non-packet based
multimedia standard, which is compatible with H.323 at the bit stream level of speech and
video, but employs circuit switched network access.
7.2 Speech Codecs for Wireless Multimedia and Internet Telephony
Good speech quality is the most important design goal of most multimedia systems. If the
speech quality is not acceptable, the other media content tends to become useless even if its
quality is good. In this section we briefly deal with some fundamentals of standard
speech codecs, in particular from the QoS point of view. Specifically, the delay problems
related to the ITU-T G.723.1 speech codec [ITU 96a] are discussed. This codec is the key
component for low bit-rate multimedia applications as well as for Internet telephony,
because of the decision taken by the VoIP Consortium in March 1997. According to the
decision, G.723.1 was chosen as the default codec for Internet telephony.
7.2.1 Speech Coding Standards
Speech coding is a trade-off between bit-rate, quality, complexity, cost and delay. All the
current low bit-rate speech codecs, G.723.1 (5.3/6.3 kbps), G.729 (8 kbps) [ITU 96b],
GSM (13 kbps), IS-54 (7.95 kbps), IS-95 (9.6 kbps), and PDC (6.7 kbps), belong to the same
class of codecs, namely linear prediction analysis-by-synthesis (LPAS). The prevailing
algorithms are mainly based on vector excitation coding, in which the excitation is
derived from a codebook as the closest estimate to the transmitted vector. The complexity is
measured in MIPS of the signal processing needed. A speech codec requiring 30 MIPS is regarded
as complex, whereas one using less than 15 MIPS is regarded as low complexity.
G.723.1 supports two bit rates, namely 5.3 kbps and 6.3 kbps [Cox 96b]. ITU-T SG16 is
currently developing a 4 kbps speech coding standard, with a target deadline of 3Q
2000. The objectives are ambitious: speech quality, error resilience, and
performance in the presence of background noise not worse than those of the 32 kbps
ADPCM codec (G.726). The maximum allowable codec delay is 45 ms. Acceptable
performance under adverse network conditions is an essential design objective to address the
needs of mobile multimedia applications. The technology has not been decided yet, but a
likely candidate is some kind of CELP based technology.
7.2.2 Evaluation Methods of Speech Quality
Speech quality is measured using subjective methodology. The Speech Quality Expert
Group (SQEG) of ITU-T SG12 is in charge of developing quality testing methods for the
subjective tests [ITU 96k], [ITU 96l]. The commonly used test is ACR (absolute category
rating). The subjects selected for the test listen to samples of speech for 8-10 seconds and
are requested to give their score on a scale of 1-5. The average of the scores is the
mean opinion score (MOS). The robustness of a speech codec under adverse
conditions is an important design parameter. The ability to use the codec in multipoint
configurations also needs to be considered. In this case, each digital speech signal has to be
decoded, summed and re-encoded. This combined tandeming and audio bridging not only
impairs the speech quality, but doubles the delay as well. The tolerance of the speech
codec to 2-3 tandemings without severe degradation of quality is important, because
tandeming is unavoidable in the current telecommunication network. In the future
this need will decrease for two reasons. First, terminals are expected to support a
range of different speech coding standards, thus avoiding the need for transcoding. Second,
digital transcoders with rate adaptation avoid A/D and D/A conversions, which are
primary sources of degradation.
Background noise is taken into account by adding speech-correlated noise to the
samples. The measure of the noise is the MNRU (modulated noise reference unit).
Another form of degradation is quantization distortion, stemming from reduced
bandwidth. Its measure is the QDU (quantization distortion unit).
Besides the ACR test, the DCR (degradation category rating) test is also used. In DCR the
subjects first hear the original uncoded sample and then the coded one. They are requested
to rate the degradation relative to the reference sample on the following scale:
5 No perceptible distortion
4 Perceptible, but not annoying
3 Mildly annoying
2 Annoying
1 Very annoying
This test is better than ACR when the quality of the original reference sample is
already impaired, because subjects can otherwise hardly distinguish the effect of background
noise on the quality.
A MOS of 4.0 is commonly considered good quality. So-called “toll quality” pertains to
MOS values above 3. The need for improving speech quality is clearly indicated by the
recent CTIA user survey. It is likely that rapidly advancing signal and speech processing
technologies will allow a MOS of at least 4 with good performance in adverse
operating conditions for the next generation standards, even at lower bit-rates than today.
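The ACR procedure described in this section reduces to averaging the listeners' category votes. The following is a minimal sketch of that arithmetic, not a test procedure: the category labels are the conventional ACR wording and do not appear in the text above, and the votes are hypothetical illustration data.

```python
from statistics import mean

# Conventional ACR category labels (assumed here; the text only gives "1-5").
ACR_SCALE = {5: "Excellent", 4: "Good", 3: "Fair", 2: "Poor", 1: "Bad"}


def mean_opinion_score(ratings):
    """Average a list of ACR votes (integers 1..5) into a MOS."""
    if not ratings:
        raise ValueError("at least one rating is required")
    for rating in ratings:
        if rating not in ACR_SCALE:
            raise ValueError(f"rating out of range: {rating!r}")
    return mean(ratings)


# Hypothetical votes from eight listeners for one coded speech sample.
votes = [4, 4, 5, 3, 4, 4, 3, 5]
mos = mean_opinion_score(votes)
print(f"MOS = {mos:.2f}")
print("good quality" if mos >= 4.0 else "below the 4.0 'good quality' mark")
```

The same function can be applied to DCR votes, in which case the average expresses degradation relative to the reference sample rather than absolute quality.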
7.2.3 Delays in Speech Communication
Delays in speech impair human interaction and should be kept as small as possible.
Therefore, reduction of delays is a prime target in real-time communications. This tends to
be difficult due to restrictions in speech coding technology and network conditions.
Some fundamental issues of the overall delay are discussed below.
Rec. G.114 [ITU 96a] defines 400 ms as the maximum allowable overall delay. In real life,
such a long delay severely impairs human interaction. According to SQEG, users are not
distracted if the one-way delay remains below 200 ms [SQE97], which is a more
appropriate design goal than the G.114 limit. Unlike geostationary (GEO) satellites, LEO and ICO
satellite links incur fiber-like delays, which make them more appropriate for real-time
applications. The delay of a GEO satellite link is around 260 ms. Given that delays in low
bit-rate or Internet telephony applications usually exceed 200 ms, GEO links are
inappropriate for such connections.
The residual delay of a speech codec consists of codec delay and transmission delay (see
Fig. 14). The codec delay is composed of algorithmic delay, which is implementation
independent, and processing delay. The transmission delay and processing delay depend
on network conditions and implementation. A nominal system delay of 97.5 ms has been
given for G.723.1 [Cox 96b], derived from 37.5 ms of algorithmic delay,
40 ms of processing delay and 20 ms for transmission. Therefore, in subsequent
calculations, where the network portion is replaced by something else, e.g. the Internet or a
digital mobile network, we have used a figure of 80 ms for the coding delay and assessed
the network transit delay separately.
[Figure: chain of speech encoder (algorithmic delay τad1, processing delay τpd1), network
(transmission delay τtd = propagation delay of the network), and speech decoder (processing
delay τpd2, algorithmic delay τad2); the sum is the overall system delay τd]
τd = (τad1 + τad2 + τpd1 + τpd2) + τtd
Figure 1 Delay introducing entities in digital speech communication
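The delay formula above can be checked against the G.723.1 figures quoted in the text. The sketch below uses only numbers from this section (37.5 ms, 40 ms, 20 ms, the 80 ms coding figure, the 200 ms SQEG goal and the 260 ms GEO delay); the function names are illustrative, and the encoder and decoder shares are summed per category because the text does not give the split.

```python
def overall_delay_ms(algorithmic_ms, processing_ms, transmission_ms):
    """tau_d = (tau_ad1 + tau_ad2 + tau_pd1 + tau_pd2) + tau_td.

    algorithmic_ms stands for tau_ad1 + tau_ad2 and processing_ms for
    tau_pd1 + tau_pd2 (encoder and decoder shares already summed).
    """
    return (algorithmic_ms + processing_ms) + transmission_ms


# Nominal G.723.1 figures quoted in the text [Cox 96b].
G7231_ALGORITHMIC_MS = 37.5
G7231_PROCESSING_MS = 40.0
NOMINAL_TRANSMISSION_MS = 20.0

nominal = overall_delay_ms(G7231_ALGORITHMIC_MS, G7231_PROCESSING_MS,
                           NOMINAL_TRANSMISSION_MS)
print(f"nominal G.723.1 system delay: {nominal} ms")

# The text rounds the codec part (37.5 + 40 = 77.5 ms) up to 80 ms.
CODING_DELAY_MS = 80.0
ONE_WAY_GOAL_MS = 200.0  # SQEG design goal quoted in the text


def network_margin_ms(coding_ms=CODING_DELAY_MS, goal_ms=ONE_WAY_GOAL_MS):
    """Delay left for the network once the coding delay is spent."""
    return goal_ms - coding_ms


print(f"margin left for the network: {network_margin_ms()} ms")
print(f"GEO link (260 ms) fits: {260.0 <= network_margin_ms()}")
```

Under these figures the network may contribute at most 120 ms before the 200 ms goal is exceeded, which is why a GEO hop alone is already too slow.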
The speech coding scheme G.723.1 was originally designed for low bit-rate
videotelephony, specifically in the PSTN. The long delay does not matter in
videotelephony, since the coding delay of a video codec tends to be substantially longer
than that of a speech codec. Furthermore, the low repetition rate of video frames (5 Hz)
equates to a frame interval of 200 ms, which means that the audio has to be delayed in any
case to ensure lip synchronism. Therefore, it was felt in SG16 that low delay, which tends to
increase complexity and cost, was less important. The possibility of using a G.723.1 codec in
networks with inherently long delays, such as digital cellular networks or the Internet, emerged
later on, when SG15, the predecessor of SG16, decided to develop a mobile extension of
H.324. As for the Internet, nobody could foresee the option of using G.723.1 as the
default Internet telephony codec at the time when SG15 confirmed the design constraints for
G.723.1.
In a digital cellular network, a speech codec needs to cope with a much more adverse
operating environment, which inherently introduces long delays due to delay spread
equalization, echo cancellation, bit interleaving, and error control. The same goes for the
Internet, which is plagued by long residual routing delays. Therefore, to keep the overall
residual delay within reasonable limits, the delay margin left for the speech codec is much
lower than in the PSTN. It is easy to insert additional delay into digital data when needed,
but once introduced, delay cannot be taken away.
The competing audio coding scheme G.729, which lost the vote in the VoIP Forum for the
default Internet telephony codec, introduces a coding delay of only 25 ms. The SQEG has
recently pointed out that extensive tests have been made on G.729 to evaluate its
performance on tandemed connections [Hay 96]. Such test results are not available, at
least not publicly, for G.723.1. Otherwise the quality of G.723.1 and G.729 does not differ
significantly under normal conditions, i.e. without any transcoding being
present.
The good news from the VoIP decision is that G.723.1 is the default codec of H.324
systems, which facilitates interoperability between Internet telephones and H.324 terminals
(no transcoding is needed). Besides incurring additional cost, transcoding tends to
introduce additional delay. Therefore, it is highly desirable that interoperability is
provided at the peer-to-peer level. In the future, one general purpose signal processor,
equipped with a software package supporting a range of different speech coding schemes, can
be foreseen. Yet transcoding cannot be avoided completely due to the structure of the existing
telecommunication network.
In Section 8 we have assessed the delays in some basic network configurations between
G.723.1 voice over packet radio and H.323 Internet telephony as well as between
H.324/M and H.323.
7.3 ITU-T H.324 Recommendation [ITU 96d]
H.324 is a system specification for low bit-rate real-time multimedia communication. The
prime design objective was to provide the best possible performance for audio, video, and data
over low bit-rate connections (i.e. below 64 kbps). The standard was initially targeted at the
PSTN [Lin 96], but does not rule out other low bit-rate networks. For instance, a
mobile extension of H.324 is currently being developed in ITU-T SG16. The H.324 standard rests
on the existing, widely used H.320 videoconferencing system standard in the sense that
interoperability with H.320 systems at the best common mode is an important
design goal.
[Figure: H.324 terminal block diagram. Recoverable labels: equipment; user data applications
(T.120 etc.); G.723.1; path delay; H.223 Mux/Demux layer; data protocols (V.14, LAPM etc.);
control protocol H.245; SRP/LAPM; PSTN; modem control V.25ter; system control; scope of
Rec. H.324]
Figure 2 H.324 block diagram [Lin 96]
7.3.1 H.324 Functional Entities and Communication Procedures
The main building blocks and related standards of a H.324 system are depicted in Fig. 15
and the respective protocol architecture in Fig. 16.
The interface to the physical PSTN includes a V.34 modem; the V.8/V.8bis DTE-DCE
interface protocol is used for establishing the connection between the modems at start-up.
H.324 is controlled by the H.245 control protocol [ITU 96h], which is based on
logical channels. Logical channels can be opened, i.e. activated, for the media streams
needed by an application. For data protocols such as T.120 [ITU 96j] or V.70, a
bi-directional pair of logical channels needs to be opened. The T.120 stack supports multipoint
operation, which enables a H.324 terminal to join a multipoint videoconference. The
H.223 multiplexer mixes the various media streams and the demultiplexer extracts them from
the incoming data. Synchronization of the incoming data is carried out with HDLC flags at
the beginning and end of each data frame (HDLC is better known as the synchronous link
protocol for SS7).
For low bit-rate video coding, H.263 is a new scheme developed from H.261, the currently
dominating standard for H.320 videoconferencing. Although H.263 was
created for sub 64 kbps connections, it provides better quality than H.261 even at higher
bit-rates. In addition to the picture formats FCIF and QCIF, supported by both
H.261 and H.263, the latter also supports SQCIF. This is a rather low spatial resolution,
and it is debatable whether it will have any practical significance. The speech
codec of H.324 systems is G.723.1, which has been discussed in more detail in Section 7.2.3.
Transport layer error control is optional for H.324. It will be needed for enhanced
interoperability with H.324/M systems. Interoperability at level 0 is always ensured even
without support for error recovery, but then QoS is determined by the BER of the mobile
link, i.e. 10^-2 to 10^-3. For speech, the QoS remains above the toll quality threshold even at
this high bit error rate [Hay 96].
[Figure: protocol stack of a H.324 system.
Application layer: audio apps; video apps; system control; data apps.
Presentation layer: G.723.1; H.263, H.261; H.245 control protocol; T.120.
Session to link layers: SRP/LAPM; V.14, LAPM; H.223, optionally with the forthcoming ITU-T
H.223 error control (FEC+ARQ) protocol.
Physical layer: V.34 modem; V.8/V.8 bis; V.25 ter.]
Figure 3 Protocol architecture of a H.324 system
Table 1 Phases of communication between two H.324 terminals
Phase | Mode | Function
A | POTS call setup | An ordinary telephone call is set up, with ringing and answering, either automatically with V.25 ter procedures or by the called user
B | Analogue voice conversation | The calling user can originate a POTS call first and upgrade the call later on into a multimedia call using the V.8 bis protocol (AVD, alternate voice and data)
C | Multimedia call setup, establishment of digital communication for DSVD (digital simultaneous voice and data) | A start-up phase of around 10 seconds takes place, during which initial terminal negotiation and modem (equalizer) “training” for optimal line equalization take place
D | Completion of terminal negotiation | Exchange of terminal capabilities using the H.245 control protocol (duration < 1 sec.)
E | Multimedia communication mode, users can exchange multimedia information in DSVD mode | All logical channels are available during the call and can be opened or closed under user control
F | Termination of the H.324 portion of the call | All logical channels are closed and H.245 messages specify call disconnect, transfer to POTS or another digital mode such as fax or V.34 file transfer
G | | Call disconnect, switch to POTS or specified digital mode
7.3.2 Call Initialization
The phases of a H.324 call are presented in Table 9 [Lin 96]. V.8 bis allows the user
to alternate between analogue voice (POTS) and multimedia mode during the same call. In
phase C the modems communicate at 300 bps, exchanging some basic data on their
capabilities. No line equalization is needed at this low speed. The stepwise increase of the
V.34 modulation rate in 2.4 kbps increments up to a maximum speed of 28.8 kbps (33.6 kbps
for V.34Q) is then initiated, with appropriate parameter adjustments to optimize the modem
performance. Unlike packet switched mode, the synchronous operation mode of H.324
introduces negligible propagation delays, but the coin has its flip side as well.
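The stepwise rate increase can be made concrete by listing the rate ladder. This is only an arithmetic sketch: the 2.4 kbps step and the 28.8/33.6 kbps ceilings come from the text, while starting the ladder at 2.4 kbps is an assumption made for the illustration.

```python
def v34_rate_ladder_bps(max_rate_bps=28_800, step_bps=2_400):
    """Data rates from one step up to max_rate_bps, in step_bps increments."""
    if max_rate_bps % step_bps:
        raise ValueError("maximum rate must be a multiple of the step")
    return list(range(step_bps, max_rate_bps + 1, step_bps))


ladder = v34_rate_ladder_bps()
extended = v34_rate_ladder_bps(max_rate_bps=33_600)
print(len(ladder), "steps up to", ladder[-1], "bps")
print(len(extended), "steps up to", extended[-1], "bps")
```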
The start-up is the Achilles’ heel of the H.324 system. Terminals supporting only the V.8
protocol have the disadvantage that users cannot first greet each other over POTS. The called
party just hears the V.34 modem tones for 10 seconds or even more without
knowing who is calling. In any case, communication is unusable during the start-up and
there is no good solution to the problem. One possibility is to train the modems up to half
speed and start audiovisual communication at a reduced quality level. The training is then
completed during the multimedia communication phase. The duration of the
unusable communication phase is reduced by around 50% at the expense of reduced
audiovisual quality at the beginning of the call.
7.3.3 Mobile Extension of H.324 (H.324/M)
The Mobile Extension of H.324 (Annex C to H.324) is under development in ITU-T
SG16. The target deadline for Annex C is January 1998. At its Nice meeting in
February 1997, the ITU-T SG16 Experts’ Group for Very Low Bit-rate Video Telephony decided
that, contrary to the original intention, no link layer protocol will be provided over the air
interface. Instead, an error control mechanism will be provided at the end-to-end level as an
extended functionality of H.223. Due to this decision, no gateway is needed in the
network and no upgrades are needed in current cellular networks. The error control
protocol to be developed is likely to be FEC based or a combination of ARQ and FEC
error recovery technologies to achieve a net BER of, say, 10^-6. FEC schemes are usually not
capable of correcting errors fully reliably. In particular, for some error patterns, which
fortunately occur fairly seldom, FEC tends to make decoding errors. A bit interleaver spreads
burst errors over time in such a fashion that the probability of multiple errors in the
same code block is significantly lower than that of single errors. ARQ can handle those
rare patterns without a significant increase in overall latency. Used on its own, however, ARQ
tends to increase the size of the receiver buffer and the latency. Therefore, it is usually not
feasible for audio and video, in particular in cellular networks with long inherent
propagation delay. Under severe conditions, Reed-Solomon (RS), Bose-Chaudhuri-Hocquenghem
(BCH), or Rate-Compatible Punctured Convolutional (RCPC) codes are used for channel coding.
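The effect of bit interleaving described above can be illustrated with a simple block interleaver. This is a generic textbook sketch, not the interleaver of any particular cellular standard or of H.223; the block dimensions and the burst position are hypothetical.

```python
def interleave(bits, rows, cols):
    """Write bits row by row into a rows x cols array, read column by column."""
    if len(bits) != rows * cols:
        raise ValueError("input length must equal rows * cols")
    return [bits[r * cols + c] for c in range(cols) for r in range(rows)]


def deinterleave(bits, rows, cols):
    """Inverse of interleave(): restore the original row-by-row order."""
    if len(bits) != rows * cols:
        raise ValueError("input length must equal rows * cols")
    return [bits[c * rows + r] for r in range(rows) for c in range(cols)]


# Four code blocks of six bits each, one block per row (hypothetical sizes).
rows, cols = 4, 6
data = [0] * (rows * cols)

channel = interleave(data, rows, cols)
for position in range(9, 13):      # a burst of four consecutive channel errors
    channel[position] ^= 1

received = deinterleave(channel, rows, cols)
errors_per_block = [sum(received[r * cols:(r + 1) * cols]) for r in range(rows)]
print(errors_per_block)            # each code block sees a single error
```

With this layout any burst no longer than the number of rows lands at most once in each code block, which is the situation a single-error-correcting FEC code handles well. The price is the extra delay of filling the array before transmission.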
The SG16 Experts’ Group has decided to include a multilevel error control scheme in
H.324, the basic idea of which appears in Fig. 17 [Wim 97]. The scheme is intended not
exclusively for mobile networks, but for error-prone environments in general.
Depending on the terminal type, error control may be supported from level 1 to 3. All
H.324 systems can interwork at level 0, but under degraded network conditions the
QoS may remain rather poor.
Level 3: Level 2 + improved error protection of the adaptation layer
Level 2: Level 1 + improved header with multiplex payload length field
Level 1: longer synchronization flag, no HDLC 0-insertion
Level 0: plain H.223
Figure 4 Error control scheme for H.324 in error-prone environments
At level 1, the synchronization is improved by better protection of the sync flags.
Level 2 adds error robustness to the header. The algorithm for this error protection is
still open. Level 3 adds payload specific error protection to the adaptation layer resting
above the H.223 [ITU 96f] multiplexer layer, which means that the multiplexed media streams are
treated separately using different algorithms. This approach stems from the fact that
the error tolerance of different types of media components varies within a broad range.
For instance, audio is fairly error tolerant due to its high degree of redundancy, but even a
single error may be fatal for application data. Video falls between these two extremes. The
level 3 error protection algorithm has not been decided yet. The likely candidates are RS
or RCPC. RS codes are more effective for burst errors, whereas RCPC is better suited to
correcting single error patterns. Burst errors are more common in wireless networks
without bit interleaving, such as PHS and PACS. The levels are to be included in the H.245
control protocol, and terminals negotiate their error protection capabilities at the beginning
of a call. The user may choose the level that offers the best performance. The overall
performance is not necessarily improved when a higher level is selected: the additional
error protection is achieved at the expense of increased overhead, which leaves less
bandwidth for payload and impairs video quality.
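The outcome of the level negotiation can be sketched as follows. This is a hypothetical helper, not H.245 message syntax: it assumes that a terminal supporting level n also supports every lower level, which is suggested by the cumulative definition of the levels but is not stated explicitly in the text.

```python
def common_error_control_levels(max_level_a, max_level_b):
    """Levels usable between two terminals, given each one's highest level.

    Level 0 (plain H.223) is always shared, as stated in the text.
    """
    for level in (max_level_a, max_level_b):
        if level not in range(4):
            raise ValueError(f"level must be 0..3, got {level!r}")
    return list(range(min(max_level_a, max_level_b) + 1))


# A level 3 mobile terminal calling a level 1 terminal (hypothetical case).
print(common_error_control_levels(3, 1))   # levels 0 and 1 are available
# A plain PSTN H.324 terminal restricts the call to level 0.
print(common_error_control_levels(0, 3))
```

The function only returns the candidates; picking among them remains a trade-off between protection and overhead, so the highest common level is not automatically the best choice.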
The suggested protocol architecture for mobile H.324 systems is depicted in Fig. 18.
ITU-T SG16 is developing a new protocol for multi-link operation (H.Multilink), which is
scheduled for approval in January 1998.
[Figure: protocol stack of a H.324/M system.
Application layer: audio apps; video apps; system control; data apps.
Presentation layer: G.723.1; H.263, H.261; H.245 control protocol; T.120.
Session to link layers: SRP/LAPM; V.14, LAPM; H.223 with the forthcoming ITU-T H.223 error
control (FEC+ARQ) protocol; regional standards.
Physical layer: bandwidth-on-demand, e.g. multi-link reservation protocol; V.34 modem; radio
bearers (regional standards).]
Figure 5 Protocol architecture of a H.324/M system
The G.723.1 speech codec and the H.263 video codec may need modifications for acceptable
performance under adverse operating conditions. The modified versions have to be
backwards compatible with the PSTN versions. Interworking between H.324/M and H.323
terminals is depicted in Fig. 21, Section 7.4.5. The H.245 control protocol is common to
H.323 and H.324 systems, which makes terminal interoperability easier.
7.4 ITU-T H.323 Recommendation [ITU 96e]
The recently approved ITU-T Recommendation H.323 defines the system requirements
for Internet telephony and IP based real-time multimedia communications. The
recommendation has obtained wide support in the computer software industry, with
Microsoft, Intel, and IBM in the forefront. Furthermore, IMTC, representing the telecom
operators and service providers, is backing H.323. It is likely that H.323 will gain a
dominant position as the international standard for Internet telephony. Further development of
H.323 is underway in ITU-T Study Group 16. Terminals complying with the new version of
H.323 will be backwards compatible with current H.323 terminals. A block diagram of a
H.323 terminal is presented in Fig. 19. The standard is flexible in the sense that besides
audio, it supports additional media such as still image data and video, up to a full-fledged
multimedia system. System control functions and audio are mandatory, the latter for both
A-law and mu-law quantization. Unlike H.324, H.323 is not specifically a low bit-rate
system, but if needed, it can also operate across low bit-rate links.
[Figure: H.323 terminal block diagram. Within the scope of Recommendation H.323: audio codec
(G.729, G.723.1, G.711, G.722, G.728, MPEG1) with receive path delay compensation; video
codec (H.261, H.263); T.120 data interface; system control (H.245 control, H.225.0 call
control, H.225.0 RAS; legend "RAS=Registrat-" truncated in the source); H.225.0 layer; LAN
interface. Outside the scope: audio I/O devices; video I/O devices; data
equipment/applications; system control user interface.]
Figure 6 Block diagram of a H.323 terminal [ITU 96e]
The protocol architecture of a H.323 system appears in subsequent Fig. 20.
[Figure: protocol stack of a H.323 system.
Application layer: audio apps; video apps; system control; data apps.
Presentation layer: G.711, G.722, G.723.1, G.728, G.729; H.263, H.261; RTCP, H.225.0 call
signaling channel, H.245 control protocol; T.120.
Session layer: RTP; H.225.0.
Transport layer: unreliable transport (UDP); reliable transport (TCP).
Network layer: IP.
Link layer: CSMA/CD, Token Ring protocol.
Physical layer: Ethernet, Token Ring LAN.]
Figure 7 Protocol architecture for H.323
7.4.1 Audio
The mandatory speech codec of terminals complying with the H.323 Recommendation is
G.711, the 64 kbps PCM A-law or mu-law speech coding scheme. The VoIP
decision to back G.723.1 as the default codec for Internet telephony implies that the
industry has agreed to include at least these two codecs in the forthcoming H.323
terminals. The resolution does not rule out other audio codecs such as G.729, G.722, G.728
and MPEG1 audio, which appear in Fig. 19. The capability set of H.245 system control includes
all these speech codecs. Terminals exchange such information in a negotiation phase prior to
sending any payload data.
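The result of the capability exchange can be pictured as picking the first codec in the local preference order that the remote terminal also lists. The helper below is a hypothetical illustration of that idea only; it does not reproduce H.245 messages or procedures, and the preference order is invented for the example.

```python
def select_audio_codec(local_preferences, remote_capabilities):
    """Return the first locally preferred codec the remote side also supports."""
    remote = set(remote_capabilities)
    for codec in local_preferences:
        if codec in remote:
            return codec
    raise ValueError("no common audio codec")


# Hypothetical terminals; G.711 is included everywhere because it is mandatory.
local = ["G.723.1", "G.729", "G.711"]
print(select_audio_codec(local, {"G.711", "G.723.1"}))   # G.723.1
print(select_audio_codec(local, {"G.711", "G.728"}))     # G.711
```

Because G.711 is mandatory for every H.323 terminal, the fallback always succeeds between compliant endpoints, although at 64 kbps it may not fit a low bit-rate link.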
ITU-T SG16 has recommended G.729 as the default codec for terminals with audio-only
capability in the new version of H.323, scheduled for approval in January 1998. G.729
is the recently approved ITU-T 8 kbps speech coding scheme [ITU 96c]. It delivers better
voice quality, allowing 2-3 transcoding sequences, i.e. recoding after a D/A and A/D
conversion, without severe degradation of speech quality. Wireless
extensions of G.729 are under development in ITU-T. The victory of G.723.1 (the dual-rate
6.3/5.3 kbps speech codec for low bit-rate videotelephony) in the VoIP vote reflects the
choices made by a majority of the industry in regard to their forthcoming Internet
telephony products. The benefit of the inferior G.723.1 codec is better interoperability
with H.324 and H.324/M terminals. G.729 codecs are not widely deployed yet.
Slight modifications to G.723.1 may be needed for use in error-prone
environments. In any case, the modified version has to be compatible with the existing
G.723.1 version. Concerning the other audio codecs listed in Fig. 19, G.722 is wideband
64 kbps audio, and G.728 is the 16 kbps speech codec used in H.320 videoconference
terminals. The additional codecs are recommended to ensure interoperability with various
multimedia systems.
Interoperability between H.323 phone terminals on IP networks and digital cellular
networks would need support for speech codecs such as GSM, IS-54 or IS-95. This
would simplify the gateway design and keep the transcoding delay reasonably low.
7.4.2 H.263 Video [ITU 96i]
Video is optional, but if it is supported, H.261 is mandatory, although it is not suitable for
low bit-rate operation. The terminals have to be able to both encode and decode QCIF (Quarter
Common Intermediate Format) video. H.263 video coding may also be supported, and if
it is, QCIF is mandatory and CIF is optional. CIF video (352 x 288 pixels for luminance
and 176 x 144 pixels for chrominance) is not feasible on a low bit-rate connection:
CIF video with a reasonable motion rendition performance needs a connection
of around 100 kbps. ISDN is commonly used for H.320 videoconferencing terminals,
which implies that the available bit-rate is 128 kbps for audio, video and data. Therefore, if
a LAN is equipped with an ISDN H.323 gateway, CIF is recommended for the respective
H.323 multimedia terminals. In practice, most H.323 systems will probably
support both H.261 and H.263 to provide interoperability with existing H.320 ISDN
videoconferencing systems and support for low bit-rate operation.
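A rough calculation shows why CIF is demanding on a low bit-rate link. The picture dimensions come from the text; the 8 bits per sample, the two chrominance components at half resolution in each direction, and the 10 frames per second are assumptions made for this sketch.

```python
def raw_bits_per_frame(luma_width, luma_height, bits_per_sample=8):
    """Uncompressed size of one frame: one luminance plane plus two
    chrominance planes at half the luminance resolution in each direction."""
    luma = luma_width * luma_height
    chroma = (luma_width // 2) * (luma_height // 2)
    return (luma + 2 * chroma) * bits_per_sample


FRAME_RATE_HZ = 10          # assumed frame rate for the illustration
CHANNEL_BPS = 100_000       # "around 100 kbps" from the text

for name, (width, height) in {"CIF": (352, 288), "QCIF": (176, 144)}.items():
    raw_bps = raw_bits_per_frame(width, height) * FRAME_RATE_HZ
    ratio = raw_bps / CHANNEL_BPS
    print(f"{name}: {raw_bps / 1e6:.2f} Mbps raw, "
          f"about {ratio:.0f}:1 compression needed for 100 kbps")
```

QCIF carries a quarter of the samples of CIF, so at the same frame rate and channel rate the codec has four times as many bits available per sample.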
The forthcoming upgrades of H.263, bearing the notation H.263+, are particularly
important for error-prone, packet-loss affected environments. For instance, scalable
video coding enables splitting the coded video into multiple logical channels in such a
manner that some loss of data does not significantly impair the video quality. Since low
bit-rate operation is necessary over the Internet, the upgraded H.263 will improve the
video quality of H.323 systems under adverse network conditions. A wide range of
custom source formats will improve the adaptability of decoded video scenes to resizable
PC displays and windows. This enhanced flexibility is particularly attractive for wireless
terminals, because the user will perceive it as improved overall quality.
7.4.3 Data
The data channel supports telematic applications such as electronic whiteboards, file
transfer, still image transfer, database access, and audiographics conferencing. The
various data protocols, such as those of the T.120 series protocol suite, are not part of the
H.323 Recommendation.
7.4.4 System, Media Stream and QoS Control
System control for H.323 systems is defined in Rec. H.245, and media stream
synchronization and packetization in H.225.0.
All communication between endpoints is controlled by the H.245 control protocol. Logical
channels are opened for different media streams using H.245 procedures, whose messages are
transferred between endpoints across a H.245 control channel. The logical channels may
be either one-way or bi-directional, which enables different configurations in the two
transmission directions. For instance, video may be used one-way only, or the spatial
resolution of the video may be limited to SQCIF in one direction to increase temporal
performance. This flexibility is important for utilizing the scarce bandwidth as effectively as
possible in accordance with user needs.
H.245 is also needed to solve a problem faced in a mixed IP/circuit switched network
(CSN) environment, i.e. how to send DTMF signaling intact through the IP network,
which tends to cause excessive delays or tone discontinuities through packet drop-outs.
The lengths and levels of the tones of DTMF signals are determined by the ITU-T Q.35
standard, and a DTMF receiver located in the LE of the PSTN discards a skewed or distorted
DTMF signal. Furthermore, excessive propagation delays may induce the user to hang up
or repeat the touch tone sequence, causing malfunction. Therefore, the VoIP Forum has
decided to advocate inclusion of DTMF signals in the PDUs of the next version of H.245.
Different terminal capabilities are described in H.245 using ASN.1 notation. H.245 is
common to H.323 and H.324 systems, which substantially facilitates interoperability.
H.323 is intended for LANs with non-guaranteed QoS. Therefore, the standard does not offer
any QoS guarantees. The delay intolerant media stream packets (audio and video) are sent
over unreliable UDP, which means that lost packets are not retransmitted unless
specifically desired. H.245 and RTP also allow the use of TCP within limits
determined by the playback buffer size, however at the expense of additional latency.
Endpoints may use FEC based error control at the upper layers to handle isolated bit
errors. Under normal conditions, the audio and video schemes tolerate errors and packet
drop-outs without excessive deterioration of quality.
H.225.0 enables synchronization of audio and video with the aid of RTP time stamps.
Furthermore, it identifies the coding algorithm in use so that the receiving endpoint can
decode the media streams correctly. The RTCP capabilities of H.225.0 serve QoS, session
and rate control. The packets of different media types are separated by sending them to
different transport addresses of the H.225.0 layer.
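How time stamps make synchronization possible can be sketched as follows. The example
assumes that, for each stream, the receiver knows one reference pair of RTP time stamp and
wall-clock time (RTCP sender reports carry such a pair) and the media clock rate. The
8000 Hz audio and 90000 Hz video clock rates are those of the RTP audio/video profile;
the function names and the sample values are hypothetical.

```python
def rtp_to_wallclock(rtp_ts, ref_rtp_ts, ref_wallclock_s, clock_rate_hz):
    """Map a 32-bit RTP time stamp to wall-clock seconds using one reference pair."""
    # Interpret the difference as a signed 32-bit value so that
    # time stamp wrap-around near 2**32 is handled.
    diff = (rtp_ts - ref_rtp_ts) & 0xFFFFFFFF
    if diff >= 0x80000000:
        diff -= 0x100000000
    return ref_wallclock_s + diff / clock_rate_hz


def av_skew_s(audio_time_s, video_time_s):
    """Positive result: the video unit was sampled later than the audio unit."""
    return video_time_s - audio_time_s


# Hypothetical reference pairs: both were taken at wall-clock time 10.0 s.
audio_t = rtp_to_wallclock(1800, 1000, 10.0, 8000)       # 800 ticks at 8 kHz
video_t = rtp_to_wallclock(509000, 500000, 10.0, 90000)  # 9000 ticks at 90 kHz
skew = av_skew_s(audio_t, video_t)
```

Both units map to the same instant on the common time line, so they belong together at
playback; a non-zero skew would tell the receiver how long to delay the earlier stream.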
7.4.5 Wireless Interworking
The architecture for intercommunication between H.323, H.320, H.322, H.324, H.324/I
and H.324/M terminals is depicted in Fig. 21. H.324/I is a forthcoming ISDN version of
the H.324 standard. The H.323 gateway may support interworking with PSTN,
narrowband and broadband ISDN, or a mobile network. In this context, the focus is on
narrow-band only.
The prime gateway functions consist of conversion functions that adapt H.323 endpoints
to H.320 (ISDN), H.321 (B-ISDN), H.324 (PSTN), and possibly forthcoming H.324/M
terminals. Furthermore, the gateway interconnects H.323 terminals over the Internet or
a WAN to H.323 terminals residing in another LAN. Typical conversion functions are:
• bit-stream framing and multiplexing into H.221 (H.320)
• terminal negotiation and system control signaling into H.242 (control protocol of the
existing H.320 videoconferencing systems)
• call control signaling (E.164/Internet address)
• DTMF tone conversion
• audio conversion to H.320 terminals (G.723.1 to/from G.728)
• video coding conversion to H.324/M terminals (H.263 to/from H.263M)
• data protocol conversion
Intercommunication between H.324 and H.324/M terminals is ensured at level 0, which
means that no error correction is present, i.e. QoS is determined by the raw BER of the
mobile link, ranging from 10^-2 to 10^-3. In order to ensure a BER of the order of 10^-6,
the H.324 terminal needs to support the transport layer error control mechanism of the
H.324/M terminal as well (see Section 7.3.3).
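The practical difference between these BER levels can be illustrated with a short
calculation. Assuming independent bit errors (a simplification, since errors on mobile
links tend to occur in bursts), a frame of n bits is received without error with
probability (1 - BER)^n. The 100-byte frame size below is an arbitrary example value, not
a figure taken from the Recommendations.

```python
def frame_error_probability(ber: float, frame_bits: int) -> float:
    """Probability that at least one bit of a frame is corrupted,
    assuming independent bit errors with probability `ber`."""
    return 1.0 - (1.0 - ber) ** frame_bits


FRAME_BITS = 100 * 8  # assumed example frame of 100 bytes

for ber in (1e-2, 1e-3, 1e-6):
    p = frame_error_probability(ber, FRAME_BITS)
    print(f"BER {ber:.0e}: frame error probability {p:.4f}")
```

Under these assumptions nearly every frame is hit at a BER of 10^-2 and more than half
at 10^-3, whereas fewer than one frame in a thousand is affected at 10^-6.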
[Figure: interworking diagram. Labels in the figure: H.323 Multipoint Control Unit (MCU);
Gatekeeper; H.323 Gateway/IP Server; non-guaranteed QoS LAN; H.323 terminals; IP network
(Intranet/Internet/WAN); speech terminal; ISDN; guaranteed QoS LAN; H.322 terminal;
H.324 terminal; H.320 or H.324/I terminal; PSTN; SS7; E1/T1; H.324/M terminal; MSC; BSC;
BTS.]
Figure 8 Illustration of interworking between H.323 and other H series terminals
7.4.6 H.323 Extensions
The H.323 standard currently addresses terminals connected to public packet
switched networks over LANs. ITU-T SG16 is also studying the possibility of using
H.323 terminals over a copper subscriber line of the PSTN with a PPP-like protocol. In
other words, the idea of running low bit-rate multimedia applications over IP is being
investigated. There seem to be no technical obstacles other than protocol latency
and bandwidth constraints, which arise because H.323 was not originally intended for low
bit-rate applications. If these problems can be solved, the result would be a complete
Internet telephony and multimedia communication standard.
7.5 IETF standards for RTP and RTCP (RFC 1889 and RFC 1890)
The RTP protocol has been adopted for H.323. It is an end-to-end protocol operating on
top of UDP. RTP accomplishes two fundamental tasks in multimedia communications.
Since UDP does not support synchronization of different media streams, RTP includes a
synchronization mechanism that time stamps the outgoing packets, enabling the receiver
to restore the timing relationship between different media streams. The other main task of
RTP is to identify the payload format, such as the audio or video coding in use. RTCP is
an integral part of RTP; it provides QoS-oriented feedback and supports functionality
similar to that of the frame synchronous in-band signaling protocols for
videoconferencing and videotelephony in the ISDN.
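Both tasks are visible in the 12-byte fixed RTP header, which carries a payload type, a
sequence number, a time stamp and a source identifier (SSRC). The sketch below packs and
parses that header with Python's struct module. It is a minimal illustration that leaves
the padding, extension and CSRC fields at zero; the function names and the sample values
are hypothetical. Payload type 0 denotes G.711 mu-law audio in the RTP audio/video
profile.

```python
import struct

RTP_VERSION = 2
# Network byte order: 2 single bytes, 16-bit sequence number,
# 32-bit time stamp, 32-bit SSRC = 12 bytes in total.
_HEADER = struct.Struct("!BBHII")


def pack_rtp_header(payload_type, seq, timestamp, ssrc, marker=False):
    """Build a fixed RTP header with P=0, X=0 and CC=0."""
    byte0 = RTP_VERSION << 6
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    return _HEADER.pack(byte0, byte1, seq & 0xFFFF,
                        timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


def unpack_rtp_header(data):
    """Parse the fixed part of an RTP header into a dictionary."""
    byte0, byte1, seq, timestamp, ssrc = _HEADER.unpack_from(data)
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "sequence_number": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


# Example: a G.711 packet (payload type 0) with arbitrary sample values.
header = pack_rtp_header(payload_type=0, seq=1, timestamp=160, ssrc=0x12345678)
fields = unpack_rtp_header(header)
```

The receiver uses the sequence number to detect loss and reordering, and the time stamp
to reconstruct the timing of the stream.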
Applications can use RTCP to control their operation modes and to adjust session control
functions, data rate and other parameter settings. RTP will be important in packet radio
applications that contain real-time media such as voice and video. Since RTCP packets
are sent at regular intervals, they also enable a third party, e.g. a service provider, to
monitor network operation remotely.
At the beginning of an RTP session, the application (e.g. videoconferencing) defines the
desired network address and destination port addresses for both RTP and RTCP. Audio and
video are carried in separate RTP sessions, each having its own stream of RTCP packets,
which monitor and control the received quality of the audio or video. The RTCP packets
related to a given session report the reception quality to the transmitting entity. For
instance, under adverse network conditions, the transmitter may reduce the bit-rate to
improve transmission quality, i.e. BER.
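The feedback loop described above can be sketched as a simple sender-side rule. RTCP
receiver reports carry the fraction of packets lost as an 8-bit value (lost/256). The
thresholds, step sizes and rate limits below are arbitrary assumptions chosen for
illustration, and adapt_bitrate is a hypothetical name; neither RTP nor H.323 prescribes
a particular adaptation algorithm.

```python
def adapt_bitrate(current_bps, fraction_lost_byte,
                  min_bps=8000, max_bps=64000):
    """Return a new target bit-rate from one receiver report.

    fraction_lost_byte is the 8-bit 'fraction lost' field (0..255),
    i.e. the loss ratio multiplied by 256.
    """
    loss = fraction_lost_byte / 256.0
    if loss > 0.05:            # assumed threshold: heavy loss, back off
        new_bps = current_bps * 0.75
    elif loss < 0.01:          # assumed threshold: clean link, probe upwards
        new_bps = current_bps + 1000
    else:                      # moderate loss: hold the current rate
        new_bps = current_bps
    return int(max(min_bps, min(max_bps, new_bps)))


# Roughly 10% loss reported while sending at 32 kbit/s.
reduced = adapt_bitrate(32000, 26)
```

A multiplicative decrease combined with a small additive increase is a common
conservative choice; a real terminal would also have to map the target rate onto the
modes its audio and video coders actually offer.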
RTP supports dynamic channel allocation in wireless networks, and the data rate or video
coding can be changed on the fly. In the wireless realm, RTP implementations need to be
efficient in terms of low additional delay and overhead. Although RTP is not a
mobile-aware protocol, it does not include elements that would burden or retard the
transmission. The aim is to keep the number of control packets below 5% of the payload,
but in practice a much lower overhead is achievable. RTP and RTCP seem to fit well into
wireless environments, since their functionality includes elements that are very useful
for operation under adverse network conditions.
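The 5% target translates directly into a reporting interval. The sketch below follows the
basic idea of the RTP specification, in which control traffic receives a fixed share of
the session bandwidth that is divided among all participants, with a recommended minimum
interval of 5 seconds. It is simplified: the randomization of the interval and the
separate shares for senders and receivers are omitted, and the function name and sample
values are hypothetical.

```python
def rtcp_interval_s(session_bw_bps, members, avg_rtcp_packet_bytes,
                    rtcp_fraction=0.05, min_interval_s=5.0):
    """Simplified interval between RTCP reports from one participant."""
    # Bandwidth available to all control traffic, in bytes per second.
    rtcp_bw_bytes_per_s = session_bw_bps * rtcp_fraction / 8.0
    # Each of the `members` participants sends one packet per interval.
    interval = members * avg_rtcp_packet_bytes / rtcp_bw_bytes_per_s
    return max(min_interval_s, interval)


# A 64 kbit/s session with 100-byte RTCP packets (assumed sizes).
two_party = rtcp_interval_s(64000, members=2, avg_rtcp_packet_bytes=100)
conference = rtcp_interval_s(64000, members=100, avg_rtcp_packet_bytes=100)
```

In a two-party call the minimum interval applies, so the control traffic stays far below
the 5% budget, which is consistent with the remark that a much lower overhead is
achievable in practice.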
7.6 Future Directions of Wireless Multimedia Standardization
7.6.1 Potential H.323 Extensions
ITU-T SG16 and the ETSI TIPHON project are currently investigating the possibility of
using H.323 terminals across a PSTN subscriber line in a point-to-point configuration. A
new technology is needed to transfer the data between households and ITSPs (Internet
Telephony Service Providers), because a V.34 modem connection and the PPP protocol
incur far too high a latency. As the number of PCs in households is increasing rapidly,
there is an apparent need to interconnect them. Low-end routers are already available, and
the need for cheap packet switched access is clearly indicated. The new access technology
may also emerge as an ADSL or ISDN derivative. According to current knowledge,
residential H.323 access is not likely to be available in the short or medium term.
Fig. 22 illustrates the block diagram of a suggested H.323/M terminal.
An IP-based solution would have some significant advantages over H.324
systems. H.323 has by far the dominant market outlook in business communications. If
a residential version of H.323 were available, the market would be very large in the long
term, which would enable economies of scale. An H.323 terminal would put the versatility
of IP functionality at the residential user’s disposal, enabling access across
the Internet to both conversational and non-conversational Web applications.
[Figure: terminal block diagram. Labels in the figure: scope of a potential
Recommendation H.323/M; audio I/O devices; audio codec (G.729, G.723.1, G.711, G.722,
G.728); receive path delay compensation; video I/O devices; video codec (H.261, H.263);
data equipment/applications; T.120 data interface; system control user interface; system
control (H.245 control, H.225.0 call control, H.225.0 RAS control; RAS = registration,
admission and status); H.225.0M layer; multilink reservation protocol; air interface.]
Figure 9 Suggested block diagram of an H.323/M terminal
In a similar fashion, we would like to investigate the possibility of a wireless extension
of H.323. We first need to ask whether there is a need for such a standard. Given that
H.323 is foreseen to gain a dominant market position in 3-5 years, the answer is yes.
There are many business applications, in particular in the medical and utilities fields,
which would greatly benefit from a wireless H.323 standard. Furthermore, interoperability
with other H.323 terminals would be very simple, with low additional cost overhead. The
second question is whether the standard is feasible. The envisioned wireless packet radio
is unfit to act as a vehicle for such real-time services [Häm 96]. Therefore, the only
possibility is a circuit-switched connection based on multiple link reservation, such as
GSM HSCSD. A different error control scheme, which is being developed for the forthcoming
H.324 mobile extension, is needed. Correct operation requires protection for IP headers
in H.225.0M at the link layer. The routing of IMT 2000 and Teledesic is expected to offer
two-way data transfer capabilities up to ISDN 2B rates. Fig. 23 presents a potential
protocol architecture for H.323/M [Ban 97].
[Figure: protocol stack. Layers: application, presentation, session, transport, network,
link, physical. Labels in the figure: audio apps (G.723.1, G.728, G.711, G.729, G.722);
video apps (H.263, H.261); RTP; RTCP; unreliable transport (UDP); system control
(H.225.0 call signaling channel, H.245 control channel); data apps (T.120); reliable
transport (TCP); H.225.0; IP; error control; bandwidth-on-demand, e.g. multi-link
reservation protocol (regional standard); air interface (regional standard).]
Figure 10 A potential protocol architecture for H.323/M
7.6.2 IETF RSVP Standard
The IETF RSVP standard enables reservation of adequate bandwidth for real-time
applications under changing network traffic conditions. During an RSVP session,
flow-specific state information is stored in the routers and endpoint hosts [Est 96]. The
RSVP protocol enables the network to share the link resources in a controlled manner
between multiple RSVP entities. RSVP is under intense development. In particular,
admission control and policing issues need considerable further study. The behavior of
non-RSVP users under heavily congested network conditions is unclear, as are the
implications for overall network operation. The situation may lead to an explosion of
RSVP reservation requests, which in essence will have policing implications. The use of
RSVP in packet radio environments does not seem likely in the short or medium term. The
current state of the art suggests that, apart from voice-over-the-Web, there are no
real-time applications that can feasibly be carried over packet radio networks.
If multiple circuit switched wireless links are used as access paths to wireline TCP/IP
networks, the RSVP operation is transparent from the mobile network point of view. This
issue is beyond the scope of this investigation and should be addressed in conjunction with
other IETF RSVP studies.
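The admission control question raised in this section can be made concrete with a toy
model of the per-flow state kept for one link. This is only a sketch of the general idea
and not the RSVP message processing itself: the class and method names are hypothetical,
and the rule that at most 75% of the link capacity may be reserved (leaving the rest to
non-RSVP traffic) is an assumed policy.

```python
class Link:
    """Toy per-link reservation table with a simple admission test."""

    def __init__(self, capacity_bps, reservable_fraction=0.75):
        # Assumed policy: only part of the capacity can be reserved.
        self.reservable_bps = capacity_bps * reservable_fraction
        self.flows = {}  # flow id -> reserved rate in bit/s

    def reserved_bps(self):
        return sum(self.flows.values())

    def request(self, flow_id, rate_bps):
        """Admit or reject a reservation; an existing flow may be resized."""
        already = self.flows.get(flow_id, 0)
        if self.reserved_bps() - already + rate_bps > self.reservable_bps:
            return False  # rejected, existing state is left unchanged
        self.flows[flow_id] = rate_bps
        return True

    def release(self, flow_id):
        self.flows.pop(flow_id, None)


# A 128 kbit/s link, of which 96 kbit/s is reservable under the assumed policy.
link = Link(128000)
first = link.request("audio-1", 64000)   # fits
second = link.request("video-1", 64000)  # would exceed the reservable share
```

Even this toy shows why policing matters: once the reservable share is exhausted, further
requests are refused regardless of how the remaining traffic behaves.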
7.6.3 MPEG4 Standard
ISO JTC1 SC29 WG11 is working on the next generation audio and video coding scheme,
MPEG4. The success of MPEG1 and MPEG2 has encouraged ISO to launch this highly
ambitious initiative. MPEG4 will actually be more than an audio and video coding
standard; it is a complete multimedia system standard. This implies that a sophisticated
syntax and system control will be included in the standard, which enable a range of new
functions such as scalability, content-based bit stream editing and manipulation, and
the combination of synthetic and natural coding by incorporating an advanced set of
development tools [Rea 96]. MPEG4 would, for instance, make possible such exciting
properties as increased spatial resolution of the video on a selected picture area (a real
zooming feature). The current state of the work suggests that the MPEG4 standard will not
be available before the year 1999. As we have noted before, MPEG4 is not likely to be
compatible with H.263L. The MPEG4 functionality is particularly interesting from the low
bit-rate multimedia communications point of view. MPEG4 is expected to offer better
quality for both speech and video. Furthermore, it also supports audio, i.e. coding of
music and other tonal information besides speech.
Provided that MPEG4 is a success story, it will become a very strong candidate for next
generation Internet telephony and multimedia systems. The incompatibility with H.263L is
the biggest threat to this scenario, because interoperability of a potentially large base
of forthcoming H.323 systems with MPEG4 systems may require enormous investments in
gateways that would not be needed with terminals conforming to H.263L.
7.6.4 Global Initiatives for Future Wireless Standards
In the mobile network context, standardization of IMT 2000 is underway in ITU-R
TG8/1. Its scope of work makes up only a small fraction of the enormous effort needed.
The Japanese are now pushing ITU-T hard to put more effort into IMT 2000. In the USA, the
deployment of 2nd generation digital networks has begun fairly recently. Furthermore, in
the USA, much of the lower frequency block of IMT 2000 has been allocated to the rapidly
expanding PCS1900. Therefore, interest in IMT 2000 among the big US players is low. In
the national context, the standards efforts are likely to concentrate on further
development of IS-54 and IS-95. In Europe, the evolution is seen as a continuous process
from GSM [Rap 95], [Ber 96], [Cox 96a], [Ram 96], [Udd 95]. Phase 2+ of GSM includes the
HSCSD and GPRS standards, which are almost finished.
The next phase may include a wideband asymmetric air interface, possibly based on
CDMA technologies. This air interface can be seen as a terrestrial counterpart of and
contender for the Teledesic high speed 2 Mbps satellite links. Unlike Teledesic, the high
speed data services of the terrestrial systems are planned to be provided in wireless
environments with limited coverage only [Ber 96]. Therefore, the two systems partly
address different markets.