7. STANDARDS FOR INTERNET TELEPHONY AND MULTIMEDIA SYSTEMS

7.1 Overview on ITU-T Standards

ITU-T stands for the Telecommunication Standardization Sector of the International Telecommunication Union, the headquarters of which is located in Geneva. ITU-T is the dominant de-jure standardization body focusing on worldwide telecommunication standards. The practical work of ITU-T is carried out in 15 Study Groups (SG). The Study Group in charge of multimedia standards is the recently established SG16. Internet telephony standardization falls within the scope of SG16.

In addition to ITU-T, ISO and IEC issue worldwide de-jure standards, such as MPEG1, which influence communication systems and devices. Furthermore, an increasing number of de-facto bodies such as IETF, ATM Forum, and DAVIC produce specifications that closely resemble communication standards. Since 1996, a huge step forward has been taken as ITU-T has started to strengthen its cooperation with the de-facto bodies. This is essential for the success of future standards efforts as the telecommunication and computer communication infrastructures merge.

The ITU-T standards for Internet telephony are a collection of diverse standards issued by ITU-T itself and IETF. As a matter of fact, the set of standards is not restricted to telephony only, but covers services and technology for real-time multimedia applications conveyed across a packet switched IP network. A taxonomic map of the relevant standards appears in Appendix 1.

The ITU-T recently approved Recommendations for low bit-rate multimedia communication systems, which include standards for audio coding, video coding, system control, multiplexing as well as media stream synchronization and packetization. The networks involved are PSTN, digital mobile, and Intranet/Internet or IP based WANs (ATM or Frame Relay).
The system specifications are contained in the following Recommendations:

• H.323 (Visual Telephone Systems and Equipment for Local Area Networks which Provide a Non-Guaranteed QoS) [ITU 96d]
• H.324 (Visual Telephone Systems and Equipment for PSTN) [ITU 96c]
• H.310 (Broadband Audiovisual Communication Systems and Terminals)
• H.321 (Adaptation of H.320 Visual Telephone Terminals to B-ISDN Environments)
• H.322 (Visual Telephone Systems and Terminal Equipment for LANs which Provide Guaranteed Quality of Service)

In what follows we focus on the first two standards, which are the most important ones in the short term. The forthcoming mobile extension of H.324 bears the notation H.324/M. H.323 is applicable not only to low bit-rate connections, but also to higher rates up to hundreds of kbps. The set of Recommendations defines the technical requirements for conversational multimedia communication systems, but also supports non-conversational modes of operation to a limited extent. It is worth noting that H.324 is a non-packet based multimedia standard, which is compatible with H.323 at the bit stream level of speech and video, but employs circuit switched network access.

7.2 Speech Codecs for Wireless Multimedia and Internet Telephony

Good speech quality is the most important design goal of most multimedia systems. If the speech quality is not acceptable, the other media content tends to become useless even if its own quality is good. In this section we briefly deal with some fundamentals of standard speech codecs, in particular from the QoS point of view. Specifically, the delay problems related to the ITU-T G.723.1 speech codec [ITU 96a] are discussed. This codec is the key component for low bit-rate multimedia applications as well as for Internet telephony, because of the decision taken by the VoIP Consortium in March 1997. According to the decision, G.723.1 was chosen as the default codec for Internet telephony.
7.2.1 Speech Coding Standards

Speech coding is a trade-off between bit-rate, quality, complexity, cost and delay. All the current low bit-rate speech codecs, G.723.1 (5.3/6.3 kbps), G.729 (8 kbps) [ITU 96b], GSM (13 kbps), IS-54 (7.95 kbps), IS-95 (9.6 kbps), and PDC (6.7 kbps), belong to the same class of codecs, namely linear prediction analysis-by-synthesis (LPAS). The prevailing algorithms are mainly based on vector excitation coding, in which the excitation is derived from a codebook as the closest estimate of the transmitted vector. The complexity is measured in MIPS of the signal processing needed. A speech codec of 30 MIPS is regarded as a complex one, whereas a speech codec using less than 15 MIPS is a low complexity one. G.723.1 supports two bit rates, namely 5.3 kbps and 6.3 kbps [Cox 96b].

ITU-T SG16 is currently developing a 4 kbps speech coding standard, the target deadline of which is 3Q of year 2000. The objectives are ambitious: speech quality, error resilience, and performance in the presence of background noise not worse than those of the 32 kbps ADPCM codec (G.726). The maximum allowable codec delay is 45 ms. Acceptable performance under adverse network conditions is an essential design objective to address the needs of mobile multimedia applications. The technology has not been decided yet, but a likely candidate is some kind of CELP based technology.

7.2.2 Evaluation Methods of Speech Quality

Speech quality is measured using subjective methodology. The Speech Quality Expert Group (SQEG) of ITU-T SG12 is in charge of developing quality testing methods for the subjective tests [ITU 96k], [ITU 96l]. The commonly used test is ACR, absolute category rating. The subjects selected for the test listen to samples of speech for 8-10 seconds and are requested to give their score on a scale of 1-5. The average of the scores is the mean opinion score (MOS).

The robustness of a speech codec under adverse conditions is an important design parameter.
Also the ability to use the codec in multipoint configurations needs to be considered. In this case, each digital speech signal has to be decoded, summed and re-encoded. This combined tandeming and audio bridging not only impairs the speech quality, but doubles the delay as well. The ability of a speech codec to tolerate 2-3 tandem stages without severe degradation of quality is important, because tandeming is unavoidable in the current telecommunication network. In the future such need will decrease for two reasons. First, the terminals are expected to support a range of different speech coding standards, thus avoiding the need for transcoding. Second, digital transcoders with rate adaptation avoid A/D and D/A conversions, which are primary sources of degradation.

Background noise is taken into account by adding speech-correlated noise to the samples. The measure of the noise is called MNRU, modulated noise reference unit. Another form of degradation is quantization distortion, stemming from reduced bandwidth. The measure of it is QDU, quantization distortion unit.

Besides the ACR test, the DCR test is also used. In DCR the subjects first hear the original, uncoded sample and then the coded one. They are requested to rate the degradation relative to the reference sample on the following scale:

5 No perceptible distortion
4 Perceptible, but not annoying
3 Mildly annoying
2 Annoying
1 Very annoying

This test is better than ACR when the quality of the original reference sample is already impaired, because in an ACR test the subjects can hardly distinguish the effect of background noise on the quality. A MOS of 4.0 is commonly considered good quality. So-called "toll quality" pertains to MOS values above 3. The need for improving the speech quality is clearly indicated by the recent CTIA user survey.
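The scoring used in the ACR and DCR tests can be illustrated with a minimal Python sketch. It only averages five-point ratings into a MOS; the function name and the panel of eight ratings are hypothetical and are not taken from any ITU-T test plan.

```python
from statistics import mean

# ACR and DCR both use a five-point scale (1 = worst, 5 = best).
SCALE = range(1, 6)


def mean_opinion_score(ratings):
    """Average listener ratings after checking that each is on the 1-5 scale."""
    if not ratings:
        raise ValueError("at least one rating is required")
    for rating in ratings:
        if rating not in SCALE:
            raise ValueError(f"rating {rating!r} is outside the 1-5 scale")
    return mean(ratings)


# Hypothetical panel of eight listeners rating one coded speech sample.
example_ratings = [4, 4, 5, 3, 4, 4, 3, 5]
print(f"MOS = {mean_opinion_score(example_ratings):.2f}")  # MOS = 4.00
```

A real test averages over many subjects, talkers and samples; the sketch shows only the arithmetic behind a single MOS figure.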
It is likely that rapidly advancing signal and speech processing technologies will allow a MOS score of at least 4 with good performance in adverse operating conditions for the next generation standards, even at lower bit-rates than today.

7.2.3 Delays in Speech Communication

Delays in speech impair human interaction and should be kept as small as possible. Therefore, reduction of delays is a prime target in real-time communications. This tends to be difficult due to restrictions in speech coding technology and network conditions. Below, some fundamental issues of the overall delay are discussed.

Rec. G.114 [ITU 96a] defines 400 ms as the maximum allowable overall delay. In real life, such a long delay severely impairs human interaction. According to SQEG, users do not get distracted if the one-way delay remains below 200 ms [SQE97], which is a more appropriate design goal than the G.114 limit. Unlike geostationary (GEO) satellites, LEO and ICO satellite links incur fiber-like delays, which make them more appropriate for real-time applications. The delay of a GEO satellite link is around 260 ms. Given that delays in low bit-rate or Internet telephony applications are usually more than 200 ms, GEO links are inappropriate for such connections.

The residual delay of a speech codec consists of codec delay and transmission delay (see Fig. 14). The codec delay is composed of algorithmic delay, which is implementation independent, and processing delay. The transmission delay depends on network conditions and the processing delay on the implementation. As the nominal system delay for G.723.1 a figure of 97.5 ms has been given [Cox 96b], which is derived from 37.5 ms of algorithmic delay, 40 ms of processing delay and 20 ms for transmission. Therefore, in subsequent calculations, when the network portion is replaced by something else, i.e. the Internet or a digital mobile network, we have used a figure of 80 ms for coding delay and assessed the network transit delay separately.
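The delay figures quoted above can be checked with a few lines of Python. This is only a sketch of the arithmetic: the constant and function names are hypothetical, and the 80 ms coding delay is the rounded figure adopted in the text.

```python
# Figures quoted in the text for G.723.1 [Cox 96b], in milliseconds.
ALGORITHMIC_MS = 37.5
PROCESSING_MS = 40.0
TRANSMISSION_MS = 20.0

CODING_DELAY_MS = 80.0   # rounded codec figure used in later calculations
SQEG_TARGET_MS = 200.0   # one-way delay that users tolerate according to SQEG
G114_LIMIT_MS = 400.0    # maximum overall delay according to Rec. G.114
GEO_LINK_MS = 260.0      # approximate delay of a GEO satellite link


def nominal_system_delay():
    """Sum of the three components of the nominal G.723.1 system delay."""
    return ALGORITHMIC_MS + PROCESSING_MS + TRANSMISSION_MS


def network_margin(target_ms, coding_ms=CODING_DELAY_MS):
    """Delay left for the network once the codec has taken its share."""
    return target_ms - coding_ms


def fits(network_delay_ms, target_ms):
    """True if codec delay plus network delay stays within the target."""
    return network_delay_ms <= network_margin(target_ms)


print(nominal_system_delay())             # 97.5
print(network_margin(SQEG_TARGET_MS))     # 120.0
print(fits(GEO_LINK_MS, SQEG_TARGET_MS))  # False
print(fits(GEO_LINK_MS, G114_LIMIT_MS))   # True
```

Under these assumptions a GEO hop alone exceeds the 120 ms left by the 200 ms target, although it would still fit under the far looser G.114 limit.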
[Figure: the overall system delay τd is built up from the algorithmic delay τad1 and processing delay τpd1 of the speech encoder, the propagation (transmission) delay τtd of the network, and the processing delay τpd2 and algorithmic delay τad2 of the speech decoder.]

\[ \tau_d = (\tau_{ad1} + \tau_{ad2} + \tau_{pd1} + \tau_{pd2}) + \tau_{td} \]

Figure 1 Delay introducing entities in digital speech communication

The speech coding scheme G.723.1 was originally designed for low bit-rate videotelephony, specifically in the PSTN. The long delay does not matter in videotelephony, since the coding delay of a video codec tends to be substantially longer than that of a speech codec. Furthermore, the low repetition rate of video frames (5 Hz) equates to 200 ms, which means that the audio has to be delayed in any case to ensure lip synchronism. Therefore, it was felt in SG16 that low delay, which tends to increase complexity and cost, was less important.

The possibility of using a G.723.1 codec in networks with inherently long delays, such as digital cellular networks or the Internet, emerged later on, when SG15, the predecessor of SG16, decided to develop a mobile extension of H.324. As for the Internet, nobody could foresee the option of using G.723.1 as the default Internet telephony codec at the time when SG15 confirmed the design constraints for G.723.1.

In a digital cellular network, a speech codec needs to cope with a much more adverse operating environment, which inherently introduces long delays due to delay spread equalization, echo cancellation, bit interleaving, and error control. The same goes for the Internet, which is plagued by long residual routing delays. Therefore, to keep the overall residual delay within reasonable limits, the delay margin left for the speech codec is much lower than in the PSTN. It is easy to insert additional delay into digital data when needed, but once introduced, delay cannot be taken away. The competing audio coding scheme G.729, which lost the voting in the VoIP Forum for the default Internet telephony codec, introduces a coding delay of only 25 ms.
The SQEG has recently pointed out that extensive tests have been made for G.729 to evaluate its performance on tandemed connections [Hay 96]. Such test results are not available, at least not in public, for G.723.1. Otherwise, the quality of G.723.1 and G.729 does not differ significantly under normal conditions, i.e. without any transcoding being present.

The good news from the VoIP decision is that G.723.1 is the default codec of H.324 systems, which facilitates interoperability between Internet telephones and H.324 terminals (no transcoding is needed). Besides incurring additional cost, transcoding tends to introduce additional delay. Therefore, it is more than desirable that interoperability is provided on peer-to-peer level. In the future, one general purpose signal processor, equipped with a SW package to support a range of different speech coding schemes, can be foreseen. Yet transcoding cannot be avoided completely due to the structure of the existing telecommunication network. In Section 8 we have assessed the delays in some basic network configurations between G.723.1 voice over packet radio and H.323 Internet telephony as well as between H.324/M and H.323.

7.3 ITU-T H.324 Recommendation [ITU 96d]

H.324 is a system specification for low bit-rate real-time multimedia communication. The prime design objective was to provide the best possible performance for audio, video, and data over low bit-rate connections (read: sub 64 kbps). The standard was initially targeted to PSTN [Lin 96], but does not rule out other low bit-rate networks. For instance, a mobile extension of H.324 is currently being developed in ITU-T SG16. The H.324 standard rests on the existing, widely used H.320 videoconferencing system standard in the sense that interoperability with H.320 systems at the best common mode is an important design goal.

[Figure: blocks recoverable from the diagram: user data applications (T.120 etc.); G.723.1 with path delay; H.223 mux/demux; data protocols (V.14, LAPM etc.); control protocol H.245 over SRP/LAPM; system control; modem control (V.25ter); PSTN. The diagram marks the scope of Rec. H.324.]

Figure 2 H.324 block diagram [Lin 96]

7.3.1 H.324 Functional Entities and Communication Procedures

The main building blocks and related standards of a H.324 system are depicted in Fig. 15 and the respective protocol architecture in Fig. 16. The interface to the physical PSTN includes a V.34 modem, and the V.8/V.8bis DTE-DCE interface protocol is used for establishing the connection between the modems at start-up. The control of H.324 is carried out by the H.245 control protocol [ITU 96h], which is based on logical channels. The logical channels can be opened, i.e. activated, for the media streams needed by an application. For data protocols such as T.120 [ITU 96j] or V.70, a bi-directional pair of logical channels needs to be opened. The T.120 stack supports multipoint operation, which enables a H.324 terminal to join a multipoint videoconference.

The H.223 multiplexer mixes the various media streams and the demultiplexer extracts them from the incoming data. Synchronization of the incoming data is carried out with HDLC flags at the beginning and end of each data frame (HDLC is better known as the synchronous link protocol for SS7).

For low bit-rate video coding, a new scheme, H.263, has been developed from H.261, the currently dominating standard for H.320 videoconferencing. Although H.263 has been created for sub 64 kbps connections, it provides better quality than H.261 even at higher bit-rates. In addition to the picture formats FCIF and QCIF, supported by both H.261 and H.263, the latter also supports SQCIF. This is a rather low spatial resolution, and it is debatable whether it will have any practical significance.

The speech codec of H.324 systems is G.723.1, which has been discussed in more detail in Section 7.2.3.

The Transport Layer error control is optional for H.324. It will be needed for enhanced interoperability with H.324/M systems.
Interoperability on level 0 is always ensured even without support for error recovery, but then QoS is determined by the BER of the mobile link, i.e. 10^-2 to 10^-3. For speech, the QoS remains above the toll quality threshold even at this very poor BER level [Hay 96].

[Figure: protocol stack laid out against the layers Application, Presentation, Session, Transport, Network, Link and Physical. Application layer: audio apps, video apps, system control, data apps. Below them: G.723.1 (audio), H.263 and H.261 (video), H.245 control protocol (system control), T.120 (data). Lower layers: SRP/LAPM; H.223, optionally with the forthcoming ITU-T H.223 error control (FEC+ARQ) protocol; V.14, LAPM; V.34 modem with V.8/V.8 bis and V.25 ter.]

Figure 3 Protocol architecture of a H.324 system

Table 1 Phases of communication between two H.324 terminals

Phase | Mode | Function
A | POTS call setup | An ordinary telephone call is set up, with ringing and answering, either automatically with V.25 ter procedures or by the called user.
B | Analogue voice conversation | The calling user can originate a POTS call first and upgrade the call later on into a multimedia call using the V.8 bis protocol (AVD, alternate voice and data).
C | Multimedia call setup, establishment of digital communication for DSVD (digital simultaneous voice and data) | A start-up phase of around 10 seconds takes place, during which initial terminal negotiation and modem (equalizer) "training" for optimal line equalization take place.
D | Completion of terminal negotiation | Exchange of terminal capabilities using the H.245 control protocol (duration < 1 sec.).
E | Multimedia communication mode, users can exchange multimedia information in DSVD mode | All logical channels are available during the call and can be opened or closed under user control.
F | Termination of the H.324 portion of the call | All logical channels are closed and H.245 messages specify call disconnect, transfer to POTS or another digital mode such as fax or V.34 file transfer.
G | | Call disconnect, switch to POTS or specified digital mode.

7.3.2 Call Initialization

The different phases of a H.324 call are presented in Table 9 [Lin 96]. V.8 bis allows the user to alternate between analogue voice (POTS) and multimedia mode during the same call. In phase C the modems communicate at 300 bps, exchanging some basic data on their capabilities. No line equalization is needed at this low speed. The stepwise increase of the V.34 modulation rate in 2.4 kbps increments up to a maximum speed of 28.8 kbps (for V.34Q 33.6 kbps) is then initiated, with appropriate parameter adjustments to optimize the modem performance.

Unlike packet switched mode, the synchronous operation mode of H.324 introduces negligible propagation delays, but the coin has its flipside as well. The start-up is the Achilles' heel of the H.324 system. Terminals with the V.8 protocol only have the disadvantage that users cannot first greet each other using POTS. The called party just hears the V.34 modem tones for 10 seconds or even more without knowing who is calling. In any case, communication is unusable during the start-up and there is no good solution to the problem. One possibility is to train the modems up to half speed and start audiovisual communication first at a reduced quality level. The training is then completed during the multimedia communication phase. The duration of the unusable communication phase is reduced by around 50% at the expense of reduced audiovisual quality at the beginning of the call.

7.3.3 Mobile Extension of H.324 (H.324/M)

The Mobile Extension of H.324 (Annex C to H.324) is under development in ITU-T SG16. The targeted deadline for Annex C is January 1998. In its Nice meeting in February 1997, the ITU-T SG16 Experts' Group for Very Low Bit-rate Video Telephony decided that, unlike originally intended, no link layer protocol will be provided over the air interface.
Instead, an error control mechanism will be provided on end-to-end level as an extended functionality of H.223. Due to this decision, no gateway is needed in the network and no upgrades are needed in current cellular networks.

The error control protocol to be developed is likely to be FEC based or a combination of ARQ and FEC error recovery technologies to achieve a net BER of, say, 10^-6. Usually FEC schemes are not capable of correcting errors fully reliably. In particular, for some error patterns, which fortunately occur fairly seldom, FEC tends to make decoding errors. A bit interleaver spreads burst errors over the time domain in such a fashion that the probability of multiple errors in the same code block is significantly lower than that of single errors. ARQ can handle those rare patterns without a significant increase in overall latency. As a general-purpose mechanism, however, ARQ tends to increase the size of the receiver buffer and the latency. Therefore, it is usually not feasible for audio and video, in particular in cellular networks with long inherent propagation delay. Under severe conditions, Reed-Solomon (RS), Bose-Chaudhuri-Hocquenghem (BCH), or Rate-Compatible Punctured Convolutional (RCPC) codes are used for channel coding.

The SG16 Experts' Group has decided to include a multilevel error control scheme in H.324, the basic idea of which appears in Fig. 17 [Wim 97]. The scheme is not exclusively intended for mobile networks, but for error-prone environments in general. Depending on the terminal type, error control may be supported from level 1 to 3. All H.324 systems can interwork at level 0, but under degraded network conditions the QoS may remain rather poor.
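The effect of bit interleaving described above can be illustrated with a small Python sketch of a generic block interleaver. It is not the interleaver of any particular cellular standard; the depth of 4, the block length of 7 and the burst position are hypothetical values chosen for illustration.

```python
def interleave(symbols, depth, width):
    """Write row by row into a depth x width array, read it column by column."""
    if len(symbols) != depth * width:
        raise ValueError("input must contain exactly depth * width symbols")
    return [symbols[r * width + c] for c in range(width) for r in range(depth)]


def deinterleave(symbols, depth, width):
    """Inverse of interleave(): restore the original row-by-row order."""
    if len(symbols) != depth * width:
        raise ValueError("input must contain exactly depth * width symbols")
    out = [None] * (depth * width)
    k = 0
    for c in range(width):
        for r in range(depth):
            out[r * width + c] = symbols[k]
            k += 1
    return out


def errors_per_block(error_flags, width):
    """Count the flagged errors falling into each code block of `width` symbols."""
    return [sum(error_flags[i:i + width]) for i in range(0, len(error_flags), width)]


DEPTH, WIDTH = 4, 7  # hypothetical: 4 code blocks of 7 symbols each
data = list(range(DEPTH * WIDTH))
assert deinterleave(interleave(data, DEPTH, WIDTH), DEPTH, WIDTH) == data

# A burst hitting 4 consecutive symbols on the channel (positions 9..12).
burst = [1 if 9 <= k < 13 else 0 for k in range(DEPTH * WIDTH)]

print(errors_per_block(burst, WIDTH))                              # [0, 4, 0, 0]
print(errors_per_block(deinterleave(burst, DEPTH, WIDTH), WIDTH))  # [1, 1, 1, 1]
```

Without interleaving the whole burst lands in one code block; with it, each block sees a single error, which a modest FEC code is far more likely to correct. The price is the extra delay of buffering DEPTH blocks before transmission.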
Level 3: Level 2 + improved error protection of the adaptation layer
Level 2: Level 1 + improved header with multiplex payload length field
Level 1: longer synchronization flag, no HDLC 0-insertion
Level 0: plain H.223

Figure 4 Error control scheme for H.324 in error-prone environments

On the first level, the synchronization is improved by better protection of the sync flags. Level 2 adds error robustness to the header. The algorithm for the error protection is still open. Level 3 adds payload specific error protection to the adaptation layer resting above the H.223 [ITU 96f] multiplexer layer, which means that the multiplexed media streams are treated separately, based on different algorithms. This approach stems from the fact that the error tolerance of different types of media components varies within a broad range. For instance, audio is fairly error tolerant due to its high degree of redundancy, but even a single error may be fatal for application data. Video falls between these two extremes.

The level 3 error protection algorithm has not been decided yet. The likely candidates are RS or RCPC. RS codes are more effective for burst errors, whereas RCPC is better suited to the correction of single error patterns. Burst errors are more common in wireless networks without bit interleaving, such as PHS and PACS.

The levels are to be included in the H.245 control protocol, and terminals negotiate their error protection capabilities at the beginning of a call. The user may choose the level which offers the best performance. The overall performance is not necessarily improved when a higher level is selected, because the additional error protection is achieved at the expense of increased overhead. This entails less bandwidth for payload and impaired video quality. The suggested protocol architecture for mobile H.324 systems is depicted in Fig. 18. ITU-T SG16 is developing a new protocol for multi-link operation (H.Multilink), which is scheduled for approval in January 1998.
[Figure: protocol stack laid out against the layers Application, Presentation, Session, Transport, Network, Link and Physical. Application layer: audio apps, video apps, system control, data apps. Below them: G.723.1 (audio), H.263 and H.261 (video), H.245 control protocol (system control), T.120 (data). Lower layers: SRP/LAPM; H.223 with the forthcoming ITU-T H.223 error control (FEC+ARQ) protocol; V.14, LAPM; regional standards; bandwidth-on-demand, e.g. multi-link reservation protocol; V.34 modem; radio bearers (regional standards).]

Figure 5 Protocol architecture of a H.324/M system

The G.723.1 speech codec and the H.263 video codec may need modifications for acceptable performance under adverse operating conditions. The modified versions have to be backwards compatible with the PSTN versions. Interworking between H.324/M and H.323 terminals is depicted in Fig. 21, Section 7.4.5. The H.245 control protocol is common to H.323 and H.324 systems, which makes terminal interoperability easier.

7.4 ITU-T H.323 Recommendation [ITU 96e]

The recently approved ITU-T Recommendation H.323 defines the system requirements for Internet telephony and IP based real-time multimedia communications. The recommendation has obtained wide support from the computer and software industry, with Microsoft, Intel, and IBM in the forefront. Furthermore, IMTC, representing the telecom operators and service providers, is backing H.323. It is likely that H.323 will gain a dominant position as the international standard for Internet telephony. Further development of H.323 is underway in ITU-T Study Group 16. Terminals complying with the new version of H.323 will be backwards compatible with current H.323 terminals.

A block diagram of a H.323 terminal is presented in Fig. 19. The standard is flexible in the sense that, besides audio, it supports additional media such as still image data and video, up to a full-fledged multimedia system. System control functions and audio are mandatory; audio must be supported for both A-law and mu-law quantization.
Unlike the H.324 systems, H.323 is not specifically a low bit-rate system, but if needed it can also operate across low bit-rate links.

[Figure: blocks recoverable from the diagram: audio I/O devices and audio codec (G.711, G.722, G.723.1, G.728, G.729, MPEG1); video I/O devices and video codec (H.261, H.263); receive path delay compensation; data equipment/applications (T.120) and data interface; system control user interface and system control (H.245 control, H.225.0 call control, RAS over H.225.0); H.225.0 layer; LAN interface. The diagram marks the scope of Recommendation H.323.]

Figure 6 Block diagram of a H.323 terminal [ITU 96e]

The protocol architecture of a H.323 system appears in Fig. 20.

[Figure: protocol stack by layer.
Application layer: audio apps, video apps, system control, data apps
Presentation layer: G.711, G.722, G.723.1, G.728, G.729 (audio); H.263, H.261 (video); RTCP; H.225.0 call signaling channel; H.245 control protocol; T.120
Session layer: RTP, H.225.0
Transport layer: unreliable transport (UDP), reliable transport (TCP)
Network layer: IP
Link layer: CSMA/CD, Token Ring protocol
Physical layer: Ethernet, Token Ring LAN]

Figure 7 Protocol architecture for H.323

7.4.1 Audio

The mandatory speech codec of terminals complying with the H.323 Recommendation is G.711, which is the 64 kbps PCM A-law or mu-law speech coding scheme. The VoIP decision to back G.723.1 as the default codec for Internet telephony implies that the industry has agreed to include at least these two codecs in the forthcoming H.323 terminals. The resolution does not rule out other audio codecs such as G.729, G.722, G.728 and MPEG1 audio, which appear in Fig. 19. The capability set of the H.245 system control includes all the speech codecs. Terminals exchange this information in a negotiation phase prior to sending any payload data.

ITU-T SG16 has recommended G.729 as the default codec for terminals with audio-only capability in the new version of H.323, which has been determined and is due for approval in January 1998. G.729 is the recently approved ITU-T 8 kbps speech coding scheme [ITU 96c].
It delivers better voice quality and tolerates 2-3 transcoding stages, i.e. recoding after a D/A and A/D conversion, without severe degradation of speech quality. Wireless extensions of G.729 are under development in ITU-T.

The victory of G.723.1 (the dual-rate 6.3/5.3 kbps speech codec for low bit-rate videotelephony) in the VoIP voting reflects the choices made by a majority of the industry in regard to their forthcoming Internet telephony products. The benefit of the inferior G.723.1 codec is better interoperability with H.324 and H.324/M terminals. G.729 codecs are not widely deployed yet. Slight modifications may be needed for G.723.1 when it is used in error-prone environments. In any case, the modified version has to be compatible with the existing G.723.1 version.

Concerning the other audio codecs listed in Fig. 19, G.722 is wideband 64 kbps audio, and G.728 is the 16 kbps speech codec used in H.320 videoconference terminals. The additional codecs are recommended to ensure interoperability with various multimedia systems. Interoperability between H.323 phone terminals across IP networks and digital cellular networks would need the support of speech codecs such as GSM, IS-54 or IS-95. Such support simplifies the gateway design and keeps the transcoding delay reasonably low.

7.4.2 H.263 Video [ITU 96i]

Video is optional. If video is supported, H.261 is mandatory, although it is not suitable for low bit-rate operation. The terminals have to be able to both encode and decode QCIF (Quarter Common Intermediate Format) video. H.263 video coding may also be supported, in which case QCIF is mandatory and CIF is optional. CIF video (352 * 288 pixels for luminance and 176 * 144 pixels for chrominance) is not feasible on a low bit-rate connection: CIF video with a reasonable motion rendition performance needs a connection of around 100 kbps.
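To see why CIF is demanding, the uncompressed bit-rate can be estimated from the picture dimensions given above. The Python sketch below assumes 8 bits per sample, two chrominance components per picture and a frame rate of 10 frames per second; these three values, the QCIF dimensions (a quarter of CIF) and the function names are illustrative assumptions, not figures from the Recommendations cited here.

```python
BITS_PER_SAMPLE = 8  # assumption: 8-bit luminance and chrominance samples

# (width, height) of the luminance and of each chrominance component.
CIF = {"luma": (352, 288), "chroma": (176, 144)}
QCIF = {"luma": (176, 144), "chroma": (88, 72)}


def raw_bits_per_frame(fmt):
    """Uncompressed size of one picture: one luma plus two chroma components."""
    luma_w, luma_h = fmt["luma"]
    chroma_w, chroma_h = fmt["chroma"]
    samples = luma_w * luma_h + 2 * chroma_w * chroma_h
    return samples * BITS_PER_SAMPLE


def raw_bit_rate(fmt, frames_per_second):
    """Uncompressed bit-rate in bit/s at the given frame rate."""
    return raw_bits_per_frame(fmt) * frames_per_second


def required_compression(fmt, frames_per_second, channel_bps):
    """Compression ratio the video codec must reach to fit the channel."""
    return raw_bit_rate(fmt, frames_per_second) / channel_bps


FPS = 10  # hypothetical frame rate
print(raw_bits_per_frame(CIF))    # 1216512
print(raw_bits_per_frame(QCIF))   # 304128
print(round(required_compression(CIF, FPS, 100_000)))   # 122
print(round(required_compression(QCIF, FPS, 100_000)))  # 30
```

Under these assumptions a 100 kbps channel requires roughly 120:1 compression for CIF but only about 30:1 for QCIF, which is consistent with QCIF being the mandatory format.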
ISDN is commonly used for H.320 videoconferencing terminals, which implies that the available bit-rate is 128 kbps for audio, video and data. Therefore, if a LAN is equipped with an ISDN H.323 gateway, CIF is recommended for the respective H.323 multimedia terminals. In practice, most H.323 systems will probably support both H.261 and H.263 to provide interoperability with existing H.320 ISDN videoconferencing systems as well as support for low bit-rate operation.

The forthcoming upgrades of H.263, bearing the notation H.263+, are particularly important for error-prone environments affected by packet loss. For instance, scalable video coding enables splitting the coded video into multiple logical channels in such a manner that some loss of data does not significantly impair the video quality. Since low bit-rate operation is necessary over the Internet, the upgraded H.263 will improve the video quality of H.323 systems under adverse network conditions. A wide range of custom source formats will improve the adaptability of decoded video scenes to resizable PC displays and windows. This enhanced flexibility is particularly attractive for wireless terminals, because it will be seen by the user as improved overall quality.

7.4.3 Data

The data channel supports telematic applications such as electronic whiteboards, file transfer, still image transfer, database access, and audiographics conferencing. The various data protocols, such as those of the T.120 series protocol suite, are not part of the H.323 Recommendation.

7.4.4 System, Media Stream and QoS Control

System control for H.323 systems is defined in Rec. H.245, and media stream synchronization and packetization in H.225.0. All communication between endpoints is controlled by the H.245 control protocol. Logical channels are opened for different media streams by using H.245 procedures, the messages of which are transferred between endpoints across a H.245 control channel.
The logical channels may be either one-way or bi-directional, which enables different configurations in the two transmission directions. For instance, video may be used one-way only, or the spatial resolution of the video may be SQCIF only in one direction to increase temporal performance. This flexibility is important for utilizing the scarce bandwidth as effectively as possible in accordance with user needs.

H.245 is also needed to solve a problem faced in a mixed IP/circuit switched network (CSN) environment, i.e. how to send DTMF signaling intact through the IP network, which tends to cause excessive delays or tone discontinuities through packet drop-outs. The lengths and levels of the tones of DTMF signals are determined by an ITU-T Q.35 standard, and a DTMF receiver located in the LE of the PSTN discards a skewed or distorted DTMF signal. Furthermore, excessive propagation delays may induce the user to hang up or repeat the touch tone sequence, causing malfunction. Therefore, the VoIP Forum has decided to advocate inclusion of DTMF signals in the PDUs of the next version of H.245.

Different terminal capabilities are described in H.245 using ASN.1 notation. H.245 is common to H.323 and H.324 systems, which substantially facilitates interoperability.

H.323 is intended for non-guaranteed QoS LANs. Therefore, the standard does not offer any QoS guarantees. The delay intolerant media stream packets (audio and video) are sent over unreliable UDP, which means that lost packets are not retransmitted unless specifically desired. H.245 and RTP also allow the use of TCP within limits determined by the playback buffer size, however at the expense of additional latency. Endpoints may use FEC based error control at the upper layers to handle isolated bit errors. Under normal conditions, the audio and video schemes tolerate errors and packet drop-outs without excessive deterioration of quality.
H.225.0 enables synchronization of audio and video with the aid of RTP time stamps. Furthermore, it identifies the coding algorithm in use so that the receiving endpoint can decode the media streams correctly. The RTCP capabilities of H.225.0 serve QoS, session and rate control. The packets of different media types are separated by sending them to different transport addresses of the H.225.0 layer.

7.4.5 Wireless Interworking

The architecture for intercommunication between H.323, H.320, H.322, H.324, H.324/I and H.324/M terminals is depicted in Fig. 21. H.324/I is a forthcoming ISDN version of the H.324 standard. The H.323 gateway may support interworking with the PSTN, narrowband and broadband ISDN, or a mobile network. In this context, the focus is on narrowband only. The primary gateway functions are conversion functions that adapt H.323 endpoints to H.320 (ISDN), H.321 (B-ISDN), H.324 (PSTN), and possibly forthcoming H.324/M terminals. Furthermore, the gateway interconnects H.323 terminals over the Internet or a WAN to H.323 terminals residing in another LAN. Typical conversion functions are:
• bit-stream framing and multiplexing into H.221 (H.320)
• terminal negotiation and system control signaling into H.242 (control protocol of the existing H.320 videoconferencing systems)
• call control signaling (E.164/Internet address)
• DTMF tone conversion
• audio conversion to H.320 terminals (G.723.1 to/from G.728)
• video coding conversion to H.324/M terminals (H.263 to/from H.263M)
• data protocol conversion

Intercommunication between H.324 and H.324/M terminals is ensured at level 0, which means that no error correction is present, i.e. QoS is determined by the raw BER of the mobile link, ranging from 10^-2 to 10^-3. In order to ensure a BER of the order of 10^-6, the H.324 terminal needs to support the transport layer error control mechanism of the H.324/M terminal as well (see Section 7.3.3).
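To illustrate why a raw BER of 10^-2 to 10^-3 is inadequate while 10^-6 is acceptable, the sketch below estimates the probability that a single speech frame contains at least one bit error. It assumes independent bit errors, which is a simplification since mobile links tend to produce error bursts, and it uses the 24-byte frame of G.723.1 at 6.3 kbit/s as the example frame size. The function name is illustrative.

```python
def frame_error_probability(ber, frame_bits):
    """Probability that a frame of frame_bits bits has at least one bit error,
    assuming independent bit errors with probability ber each."""
    return 1.0 - (1.0 - ber) ** frame_bits


# One G.723.1 frame at 6.3 kbit/s occupies 24 bytes (30 ms of speech).
G7231_FRAME_BITS = 24 * 8

if __name__ == "__main__":
    for ber in (1e-2, 1e-3, 1e-6):
        p = frame_error_probability(ber, G7231_FRAME_BITS)
        print(f"BER {ber:.0e}: P(frame damaged) = {p:.6f}")
```

Under these assumptions roughly 85% of the frames are hit at a BER of 10^-2 and roughly 17% at 10^-3, whereas at 10^-6 only about 0.02% are affected.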
[Figure 8: Illustration of interworking between H.323 and other H-series terminals. Labels shown: H.323 terminals, H.323 multipoint control unit (MCU), gatekeeper, H.323 gateway/IP server, non-guaranteed QoS LAN, IP network (Intranet/Internet/WAN), speech terminal, ISDN, guaranteed QoS LAN, H.322 terminal, H.324 terminal, H.320 or H.324/I terminal, PSTN, SS7, E1/T1, H.324/M terminal, MSC, BSC, BTS.]

7.4.6 H.323 Extensions

The H.323 standard currently addresses terminals connected to public packet switched networks via LANs. ITU-T SG16 is also studying the possibility of using H.323 terminals over a copper subscriber line of the PSTN with a PPP-like protocol. In other words, the idea of running low bit-rate multimedia applications over IP is being investigated. Apart from protocol latency and bandwidth constraints, which arise because H.323 was not originally intended for low bit-rate applications, there seem to be no technical obstacles to doing so. If these problems can be solved, the result would be a complete Internet telephony and multimedia communication standard.

7.5 IETF standards for RTP and RTCP (RFC 1889 and RFC 1890)

RTP has been adopted for H.323. It is an end-to-end protocol operating on top of UDP. RTP accomplishes two fundamental tasks in multimedia communications. Since UDP does not support synchronization of different media streams, RTP includes a synchronization mechanism, which time stamps the outgoing packets, enabling the receiver to restore the timing relationship between different media streams. The other main task of RTP is to identify the payload format, i.e. the audio or video coding in use. RTCP is an integral part of RTP. It provides QoS-oriented feedback and supports functionality similar to that of the frame synchronous in-band signaling protocols for videoconferencing and videotelephony in the ISDN. Applications can use RTCP to control their operation modes and to adjust session control functions, data rate and other parameter settings.
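The two RTP tasks described above, time stamping and payload identification, are both visible in the 12-byte fixed RTP header. The sketch below parses that header and converts a time stamp into elapsed media time. The helper names and the 90 kHz clock in the example are illustrative assumptions. Aligning audio with video additionally relies on RTCP sender reports, which relate each stream's time stamps to a common wall clock, and is not shown here.

```python
import struct
from collections import namedtuple

RtpHeader = namedtuple(
    "RtpHeader",
    "version padding extension csrc_count marker payload_type "
    "sequence timestamp ssrc",
)


def parse_rtp_header(packet):
    """Parse the 12-byte fixed RTP header (CSRC list and extensions ignored)."""
    if len(packet) < 12:
        raise ValueError("an RTP packet carries at least a 12-byte header")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    return RtpHeader(
        version=b0 >> 6,
        padding=bool(b0 & 0x20),
        extension=bool(b0 & 0x10),
        csrc_count=b0 & 0x0F,
        marker=bool(b1 & 0x80),
        payload_type=b1 & 0x7F,   # identifies the audio or video coding
        sequence=seq,             # used for loss detection and reordering
        timestamp=ts,             # sampling instant in media clock units
        ssrc=ssrc,                # identifies the sending source
    )


def media_time_seconds(timestamp, first_timestamp, clock_rate_hz):
    """Elapsed media time since the first packet; tolerates 32-bit wraparound."""
    return ((timestamp - first_timestamp) % 2**32) / clock_rate_hz


if __name__ == "__main__":
    # Hypothetical packet: version 2, marker set, payload type 34.
    pkt = struct.pack("!BBHII", 0x80, 0x80 | 34, 7, 99000, 0x1234) + b"payload"
    hdr = parse_rtp_header(pkt)
    print(hdr)
    print(media_time_seconds(hdr.timestamp, 90000, 90000))
```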
RTP will be important in packet radio applications that contain real-time media such as voice and video. Since RTCP packets are sent at regular intervals, they also enable a third party, e.g. a service provider, to monitor network operation remotely. At the beginning of an RTP session, the application (e.g. videoconferencing) defines the desired network address and the destination port addresses for both RTP and RTCP. Audio and video are carried in separate RTP sessions, each having its own stream of RTCP packets, which monitor and control the received quality of the audio or video. The RTCP packets related to a given session report the reception quality to the transmitting entity. For instance, under adverse network conditions the transmitter may reduce the bit-rate to improve the transmission quality, i.e. the BER. RTP supports dynamic channel allocation in wireless networks, and the data rate or video coding can be changed on the fly. In the wireless realm, RTP implementations need to be efficient in terms of low additional delay and overhead. Although RTP is not a mobile-aware protocol, it contains no elements that would burden or slow down the transmission. The aim is to keep the share of control packets below 5% of the payload, but in practice a much lower overhead is achievable. RTP and RTCP seem to fit well into wireless environments, since their functionality includes elements that are very useful for operation under adverse network conditions.

7.6 Future Directions of Wireless Multimedia Standardization

7.6.1 Potential H.323 Extensions

ITU-T SG16 and the ETSI TIPHON project are currently investigating the possibility of using H.323 terminals across a PSTN subscriber line in a point-to-point configuration. A new technology is needed to transfer the data between households and Internet Telephony Service Providers (ITSPs), because a V.34 modem connection and the PPP protocol incur far too high latency.
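Part of the latency and bandwidth problem on a modem line can be illustrated with simple arithmetic. The sketch below computes the header share and the serialization time of one speech packet. It assumes one 24-byte G.723.1 frame (30 ms of speech at 6.3 kbit/s) per packet, uncompressed IPv4, UDP and RTP headers, and a 28.8 kbit/s line rate; PPP framing and the internal processing delay of the modems are ignored, so the real figures would be worse. The function name and parameters are illustrative.

```python
# IPv4 (20) + UDP (8) + RTP fixed header (12), without options or compression.
IP_UDP_RTP_HEADER_BYTES = 20 + 8 + 12


def packet_stats(payload_bytes, frame_ms, link_bps):
    """Return IP-level bit-rate, header share and serialization time for a
    stream sending one packet of payload_bytes every frame_ms milliseconds."""
    total_bytes = payload_bytes + IP_UDP_RTP_HEADER_BYTES
    return {
        "ip_rate_bps": total_bytes * 8 * 1000 / frame_ms,
        "header_share": IP_UDP_RTP_HEADER_BYTES / total_bytes,
        "serialization_ms": total_bytes * 8 * 1000 / link_bps,
    }


if __name__ == "__main__":
    # Assumed example: G.723.1 at 6.3 kbit/s over a 28.8 kbit/s modem line.
    stats = packet_stats(payload_bytes=24, frame_ms=30, link_bps=28_800)
    for name, value in stats.items():
        print(f"{name}: {value:.3f}")
```

Under these assumptions the 6.3 kbit/s speech stream occupies about 17 kbit/s at the IP level, of which 62.5% is header, and each packet takes nearly 18 ms just to be clocked onto the line.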
As the number of PCs in households is increasing rapidly, there is an apparent need to interconnect them. Low-end routers are already available, and the need for cheap packet switched access is evident. The new access technology may also emerge as an ADSL or ISDN derivative. According to current knowledge, residential H.323 access is not likely to be available in the short or medium term. Fig. 22 illustrates the block diagram of a suggested H.323/M terminal. An IP based solution would have some significant advantages over H.324 systems. The market outlook for H.323 is by far dominant in business communications. If a residential version of H.323 were available, the market would be very large in the long term, which would enable economies of scale. An H.323 terminal would put the versatility of IP functionality at the residential user’s disposal, enabling access across the Internet to both conversational and non-conversational Web applications.

[Figure 9: Suggested block diagram of an H.323/M terminal. Labels shown: scope of a potential Recommendation H.323/M; audio codec (G.711, G.722, G.723.1, G.728, G.729); audio I/O devices; receive path delay compensation; video codec (H.261, H.263); video I/O devices; T.120 data interface; data equipment/applications; system control (H.245 control, H.225.0 call control, H.225.0 RAS control; RAS = registration, admission and status); system control user interface; H.225.0M layer; multilink reservation protocol; air interface.]

In a similar fashion, we would like to investigate the possibility of a wireless extension of H.323. We first need to ask whether there is a need for such a standard. Given that H.323 is foreseen to gain a dominant market position in 3-5 years, the answer is yes. There are many business applications, in particular in the medical and utilities fields, which would greatly benefit from a wireless H.323 standard.
Furthermore, interoperability with other H.323 terminals would be very simple and would incur little additional cost. The second question is whether the standard is feasible. The envisioned wireless packet radio is unfit to act as a vehicle for such real-time services [Häm 96]. Therefore, the only possibility is a circuit-switched connection based on multiple link reservation, such as GSM HSCSD. A different error control scheme, of the kind being developed for the forthcoming H.324 mobile extension, is needed. Correct operation requires protection for the IP headers in H.225.0M at the link layer. IMT 2000 and Teledesic are expected to offer two-way data transfer capabilities at up to ISDN 2B rates. Fig. 23 presents a potential protocol architecture for H.323/M [Ban 97].

[Figure 10: A potential protocol architecture for H.323/M. Layers shown: application, presentation, session, transport, network, link and physical layer. Labels shown: audio apps (G.711, G.722, G.723.1, G.728, G.729); video apps (H.261, H.263); RTP; RTCP; H.225.0; unreliable transport (UDP); system control (H.225.0 call signaling channel, H.245 control channel); data apps (T.120); reliable transport (TCP); IP; error control; bandwidth-on-demand, e.g. multi-link reservation protocol (regional standard); air interface (regional standard).]

7.6.2 IETF RSVP Standard

The IETF RSVP standard enables the reservation of adequate bandwidth for real-time applications under changing network traffic conditions. During an RSVP session, flow-specific state information is stored in the routers and endpoint hosts [Est 96]. The RSVP protocol enables the network to share the link resources in a controlled manner between multiple RSVP entities. RSVP is under intense development. In particular, admission control and policing issues require considerable further study. The behavior of non-RSVP users under heavily congested network conditions is unclear, as are the implications for overall network operation.
The situation may lead to an explosion of RSVP reservation requests, which in essence will have policing implications. The use of RSVP in packet radio environments does not seem likely in the short or medium term. The current state of the art suggests that, apart from voice over the Web, no real-time applications can feasibly be carried over packet radio networks. If circuit switched wireless multiple links are used as access paths to wireline TCP/IP networks, RSVP operation is transparent from the mobile network's point of view. This issue is beyond the scope of this investigation and should be addressed in conjunction with other IETF RSVP studies.

7.6.3 MPEG4 Standard

ISO JTC1 SC29 WG11 is working on the next generation audio and video coding scheme, MPEG4. The success of MPEG1 and MPEG2 has encouraged ISO to launch this highly ambitious initiative. MPEG4 will actually be more than an audio and video coding standard; it is a complete multimedia system standard. This implies that a sophisticated syntax and system control will be included in the standard, which will enable a range of new functions such as scalability, content based bit stream editing and manipulation, and the combination of synthetic and natural coding by incorporating an advanced set of development tools [Rea 96]. For instance, MPEG4 would make possible such exciting properties as increased spatial resolution of the video on a selected picture area (a real zooming feature). The current state of the work suggests that the MPEG4 standard will not be available until 1999 at the earliest. As we have noted before, MPEG4 is not likely to be compatible with H.263L. The MPEG4 functionality is particularly interesting from the point of view of low bit-rate multimedia communications. MPEG4 is expected to offer better quality for both speech and video. Furthermore, it also supports general audio, i.e. the coding of music and of tonal information other than speech.
Provided that MPEG4 is a success story, it will become a very strong candidate for next generation Internet telephony and multimedia systems. The incompatibility with H.263L is the biggest threat to this scenario, because interoperability of a potentially large base of forthcoming H.323 systems with MPEG4 systems may require enormous investments in gateways that would not be needed with terminals conforming to H.263L.

7.6.4 Global Initiatives for Future Wireless Standards

In the mobile network context, standardization of IMT 2000 is underway in ITU-R TG8/1. Its scope of work makes up only a small fraction of the enormous effort needed. The Japanese are now pushing ITU-T hard to put more effort into IMT 2000. In the USA, the deployment of 2nd generation digital networks has begun fairly recently. Furthermore, much of the lower frequency block of IMT 2000 has been allocated in the USA to the rapidly expanding PCS1900. Therefore, interest in IMT 2000 among the big US players is low. At the national level, the standards efforts are likely to concentrate on further development of IS-54 and IS-95. In Europe the evolution is seen as a continuous process from GSM [Rap 95], [Ber 96], [Cox 96a], [Ram 96], [Udd 95]. Phase 2+ of GSM includes the HSCSD and GPRS standards, which are almost finished. The next phase may include a wideband asymmetric air interface, possibly based on CDMA technologies. This air interface can be seen as a terrestrial counterpart of, and contender for, the Teledesic high speed 2 Mbps satellite links. Unlike Teledesic, the high speed data services of the terrestrial systems are planned to be provided only in wireless environments with limited coverage [Ber 96]. Therefore, the two systems address partly different markets.