Thesis Defense
Olufunke Olaleye
Symbiotic Audio Communication on Interactive Transport

Technology Overview
Today's Internet traffic contains both audio and data packets. Delay-sensitive traffic (audio) shares bandwidth with other, non-sensitive traffic, as shown in Figure 1.1.
• VoIP
• Audio chat
• Online radio
• Real-time Internet lecture
• Real-time Internet conference
• Online music service
• Internet news
Figure 1.1 Bandwidth Limited Network

March 21, 2007 Symbiotic Audio Communication on Interactive Transport

Growth of Audio
While technology has moved many businesses online, the use of audio traffic over the Internet has grown rapidly. However, audio perception is highly susceptible to disturbances in temporal quality: packet loss, delay, and jitter degrade quality of service during congestion.

To maximize audio/voice quality, some proposed algorithms adapt:
• Size of the playout buffer
• Coding rate
• Packet path diversity [25]
Challenge: receiving feedback from the network about congestion.

[Chart: Global VoIP Growth, number of subscribers per year, 2005 to 2009]

GLOBAL VoIP GROWTH
Year   Total Subscribers   Growth %   Net New Subscribers
2005   24,043,303          67         9,682,349
2006   47,346,874          97         23,303,571
2007   81,618,331          72         34,271,457
2008   111,209,271         36         29,590,940
2009   133,633,938         20         22,424,668
Source: Infonetics Research, February 2006

Research Goal of Thesis
A well-engineered, end-to-end network is necessary to transmit audio over the Internet.
Goal of this thesis: reduce the delay and jitter faced by audio traffic in the network during congestion.
An efficient solution: give the sender's end (encoder) the ability to sense the state of the network and react accordingly.
R&D:
• Developed a symbiotic perceptual audio streaming mechanism that detects the underlying bandwidth of the network and modifies the target bit rate to suit the network condition.
• Combines the quantization technique to represent the audio signals accurately, without audible distortion.
Proposed TCP Interactive (iTCP):
• Operationally state-equivalent to conventional TCP, except that applications can optionally subscribe to and receive selected local end-point protocol events in real time.

The Human Auditory Perception
The human ear perceives signals in the range 20 Hz to 20,000 Hz. Sensitivity is highest between 2.5 and 5 kHz and decreases above and below this frequency band.
Perception is based on two principles:
• Threshold in quiet (sensitivity)
• Masking threshold (temporal and simultaneous masking)
Figure 2.2 Threshold in quiet and masking threshold
The human auditory system perceives signals in sub-bands of unequal width called critical bands; their unit is the Bark.

Impact of Threshold in Quiet
Signal components whose strength is below the threshold in quiet are inaudible to the human ear.
[Figure panels: Threshold in Quiet; Quantized values without adaptation; Quantized values with 25% adaptation]

Impact of Masking Threshold
[Figure panels: Quantized values with a single masker (37.5%); Quantized values with multiple maskers (50%); Final Output]

Effect of Simultaneous & Temporal Masking
Masking threshold: temporal and simultaneous masking.
[Figure: two plots titled "Effect of Multiple Signals"; x-axis: Time in ms (25 to 375); y-axis: Sound Pressure (dB); series: Signal A to Signal E]

Audio Adaptation
The audio signal passes simultaneously through the hybrid filter bank and the psychoacoustic model.
The hybrid filter bank:
• Divides the input signal into chunks of 576 samples, called granules, and into frequency sub-bands.
• Provides a specified mapping in time and frequency.
Figure 2.5 Block Diagram of the encoder
The psychoacoustic model behaves like the human auditory system:
• Computes a just-noticeable noise level in each sub-band.
• Determines the block (window) type (short/long).
• Computes the energy in each partition band (threshold calculation partition).
• Convolves the partitioned energy with the spreading function (frequency spread of masking).
• Applies pre-echo control using constants (2 or 16).
• Compares the threshold with the previous threshold and the threshold in quiet, and takes the maximum.
• Converts threshold calculation partitions to scalefactor bands and calculates the signal-to-mask ratio.

Tables for Threshold Calculation Partitions
Threshold calculation partitions are computed with the following parameters: width (number of FFT lines), minval, threshold in quiet (qthr), norm, and bval.

(44.1 kHz sampling rate, long)
no.   FFT-lines   minval   qthr     norm    bval
0     1           24.5     4.532    0.951   0
1     1           24.5     4.532    0.7     0.431
2     1           24.5     4.532    0.681   0.861
5     1           20       0.09     0.665   2.153
6     1           20       0.09     0.664   2.584
7     1           20       0.029    0.664   3.015
12    1           18       0.009    0.578   5.057
15    2           12       0.018    0.856   6.422
16    2           6        0.018    0.846   7.026
60    36          0        32.554   0.483   23.897

(44.1 kHz sampling rate, short)
no.   FFT-lines   qthr    norm    SNR (dB)   bval
0     1           4.532   0.952   -8.24      0
1     1           0.904   0.7     -8.24      1.723
2     1           0.029   0.681   -8.24      3.445
5     1           0.009   0.665   -8.24      7.609
6     1           0.009   0.664   -8.24      8.71
12    1           0.009   0.578   -7.447     13.21
13    1           0.009   0.541   -7.447     13.748
14    1           0.009   0.575   -7.447     14.241
37    7           6.33    0.57    -5.229     23.828

• SNR: for short windows, the SNR parameter is read from a table.
• norm is the normalizing constant for each sub-band.
• For low frequencies, the strength of the masking is limited by minval.
• bval is the Bark value; Bark is a non-linear frequency scale.
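The relation between an FFT line and its Bark value (bval) can be illustrated with a short sketch. It uses the Zwicker and Terhardt arctangent approximation of the Bark scale, which is a swapped-in approximation and not necessarily the formula used to produce the tables above. The function names are hypothetical, and the 1024-point FFT size for long windows is an assumption.

```python
import math


def hz_to_bark(freq_hz: float) -> float:
    """Approximate Bark value for a frequency in Hz (arctangent fit)."""
    return (13.0 * math.atan(0.00076 * freq_hz)
            + 3.5 * math.atan((freq_hz / 7500.0) ** 2))


def fft_line_to_bark(line: int,
                     sample_rate_hz: float = 44100.0,
                     fft_size: int = 1024) -> float:
    """Bark value at the frequency of one FFT line.

    fft_size=1024 is an assumed value for long windows; it is not
    stated in the slides.
    """
    return hz_to_bark(line * sample_rate_hz / fft_size)


if __name__ == "__main__":
    # Print the approximate Bark value of the first few FFT lines.
    for line in (0, 1, 2, 5):
        print(line, round(fft_line_to_bark(line), 3))
```

Under these assumptions, FFT lines 1 and 2 give values close to, though not identical with, the bval entries of partitions 1 and 2 in the long-window table.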
Tables for Converting Threshold Calculation Partitions to Scalefactor Bands
There are 21 scalefactor bands at each sampling frequency for long windows and 12 for short windows.

(44.1 kHz sampling rate, long)
no. sb   cbw   bu   bo   w1      w2
0        3     0    4    1       0.056
1        3     4    7    0.944   0.611
2        4     7    11   0.389   0.167
5        1     17   18   0.861   0.917
6        3     18   21   0.083   0.583
12       4     36   40   0.18    0.1
13       3     40   43   0.9     0.468
14       3     43   46   0.532   0.623
20       2     59   61   0.278   0.96

(44.1 kHz sampling rate, short)
no. sb   cbw   bu   bo   w1      w2
0        2     0    3    1       0.167
1        2     3    5    0.833   0.833
2        3     5    8    0.167   0.5
5        5     15   20   0.833   0.25
6        3     20   23   0.75    0.583
9        3     30   33   0.625   0.3
10       3     33   36   0.7     0.167
11       2     36   38   0.833   1

• cbw is the number of partitions converted to one scalefactor band (excluding the first and the last partition).
• bu is the index of the first partition of the band and bo the index of the last (in the tables, bu < bo for every band).
• The threshold calculation partitions are converted directly to scalefactor bands. The first partition added to the scalefactor band is weighted with w1, the last with w2.

Audio Adaptation
The noise allocation block:
• Uses the output of the psychoacoustic model, the signal-to-mask ratio.
• Uses the noise level to determine the actual quantizers and quantizer levels.
• Determines how to allocate the code bits available for quantization of the sub-band signals.
• Uses two nested iteration loops.
The spectrum (frequencies) is broken into "scalefactor bands". These bands are determined by the MPEG ISO spec. In the noise shaping/quantization code, bits are allocated among the partition bands to achieve the best possible quality.
• Distortion control loop (outer loop)
• Rate control loop (inner loop): quantizes in an iterative process
The inner loop quantizes the input signal and increases the quantizer step size until the output can be coded with the available number of bits.
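The inner (rate control) loop described above can be sketched as follows. This is a minimal illustration, not the encoder's code: the power-law quantizer follows the general form used in MPEG-1 Layer III, `count_bits` is a hypothetical stand-in for the real Huffman bit count, and the names `quantize`, `count_bits`, `rate_loop` and `max_step` are invented for the example.

```python
def quantize(xr, step):
    """Quantize spectral magnitudes with a power-law quantizer.

    A larger step gives a coarser quantization. Signs are dropped here
    because only the magnitudes matter for the bit count.
    """
    scale = 2.0 ** (step / 4.0)
    return [int((abs(x) / scale) ** 0.75 + 0.4054) for x in xr]


def count_bits(ix):
    """Stand-in cost model (assumption): zeros are free, and a nonzero
    value costs about twice its bit length plus one sign bit."""
    return sum(2 * v.bit_length() + 1 for v in ix if v != 0)


def rate_loop(xr, available_bits, max_step=400):
    """Increase the quantizer step size until the output fits the budget."""
    if available_bits < 0:
        raise ValueError("available_bits must be non-negative")
    for step in range(max_step + 1):
        ix = quantize(xr, step)
        if count_bits(ix) <= available_bits:
            return ix, step
    raise RuntimeError("bit budget not reachable within max_step")


if __name__ == "__main__":
    values, step = rate_loop([100.0, -50.0, 3.0, 0.5], available_bits=12)
    print(values, step, count_bits(values))
```

The loop returns the first step size whose quantized output fits the budget, so a generous budget leaves the step size at its starting value.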
After the inner loop completes, the outer loop checks the distortion of each scalefactor band; if the allowed distortion is exceeded, it amplifies that scalefactor band and calls the inner loop again.

Audio Adaptation
If the overall bit sum is less than the bits available to encode a frame, the best quantized values are Huffman coded to further reduce their size.
The bitstream formatting block assembles the Huffman-coded quantized sub-band signals and other side information into a bitstream, which is then passed to the network for transmission.

TCP Congestion Control
TCP provides connection-oriented, reliable delivery of data streams between two applications or hosts.
TCP uses two mechanisms to detect network congestion:
i. Retransmission timer time-out.
ii. Duplicate ACKs.
Congestion control algorithms:
• Slow start and congestion avoidance.
• Fast retransmit and fast recovery.
Figure 3.1 Slow Start/Congestion Avoidance (SSCA) mechanism.
Figure 3.2 Fast Retransmit/Fast Recovery (FRFR) mechanism

TCP Congestion Control: Internal Events and Subscribable Events
Event   Group   Denotation                                            Explanation
1       SSCA    Retransmission time-out                               Congested network / lost segment.
2       SSCA    New ACK received                                      Increment snd_cwnd exponentially or linearly.
3       SSCA    snd_cwnd reached the slow start threshold ssthresh    Switch snd_cwnd increment from exponential to linear.
4       FRFR    Third duplicate ACK received                          Lost segment; execute fast retransmit.
5       FRFR    Fourth (or later) duplicate ACK received              A segment left the network; transmit a new segment.
6       FRFR    New ACK received                                      Retransmitted segment arrived at the receiver and all out-of-order segments buffered at the receiver are acknowledged.
Table 3.1.
TCP Congestion Control Internal Events

In this thesis, for simplicity, we use event 1, the retransmission timer time-out.

iTCP: interactive TCP
An application that has subscribed to the kernel also binds a T-ware to a selected TCP event through the subscription API, as shown by lines 1 and 2 of Figure 4.
When the TCP state changes as a result of congestion in the network, an event occurs in the TCP kernel. The event monitor detects the change and responds immediately by sending a signal (3a) to the signal handler and storing the event information (3b). The signal handler catches the signal from the kernel and requests the event type (4a, 4b) from the kernel via the probing API. The appropriate Transientware module (T-ware) (5) is then triggered to serve the particular event.
[Figure 4. The iTCP extension and API. User space: Application, T-ware [1] to [n], Signal Handler. System: Socket API, Probing API, Subscription API. TCP kernel: TCP Connection, Event Monitor, Event Information, Connection State. Numbered arrows 1 to 7.]

Symbiosis Throttling Model
During congestion, detected by the time-out event (ζ = 1), the model reduces the target bit rate to a considerably lower rate or to the minimum rate.
• b_max: target bit rate in the normal state
• b_min: minimum acceptable rate
Reduction ratio (rate retraction ratio):
\[ \rho = \frac{b_{\min}}{b_{\max}} \tag{5.1} \]
The "running generation threshold" function g_T(t) is the link between the underlying TCP and the model.
[Equation 5.2: g_T(t) is defined piecewise from g_T(t - 1), with one scaling factor when ζ = 1 and another otherwise.]
Figure 5.1 Symbiosis throttling Model (Input Rate)
Figure 5.2 Symbiosis throttling Model (Output Rate)

Symbiosis Throttling Model (cont'd)
The "running control generation" function b(t):
[Equation 5.3: b(t) is defined piecewise. One case applies when ζ = 1 and is expressed in terms of b_max; the remaining cases depend on how b(t - 1) compares with g_T(t - 1) and cap the rate with min[..., b_max].]
• R_s = reservoir size
• A = actual number of bits per granule
[Equation 5.4: Estimated Target Buffer Fullness (ETBF), computed from 0.9 * T(t) and R_s.]
[Equation 5.5: T(t), computed per granule from (bit rate * 1152 / freq) and a term Z, divided by 2, with one case for ζ = 1 and one for the normal state based on b_max.]
[Reservoir update relating R_s, T and A.]

Symbiosis Throttling Model (cont'd)
[Figure: block diagram of the symbiotic encoder. Blocks: Source audio (input), T-ware (Event), Perceptual Model, Reservoir, Noise Allocation, Compute Allowed Distortion, Estimated Target Buffer Fullness, Quantization of actual energy, Quantization of amplified energy, Compute Distortion in Quantization, Compare, Code Information, Output audio stream. Signals: xr, ix, b(t), ρ, ratio(sb), xmin(sb), xfsf(sb), T(t), Rs, 0.9 * T(t).]
• A scalefactor band violates the masking threshold when (xmin - xfsf) < 0. Violated bands are amplified: xr(i) = xr(i) * ifqstep.
• ifqstep = sqrt(2) ^ ((1 + scalefac_scale) * ifq(scalefactor band))
• The result is the best quantization with the allowed distortion to shape the noise.

Symbiosis Mechanism: The T-ware
Signal handler
• Catches the signal from the kernel and invokes the appropriate T-ware.
• The encoder subscribes to the "retransmit timer out" event only.
• The T-ware calculates a frugal-state target bit rate based on the retraction ratio.
• Stores the reduced rate in the "rate.par" file.
The mechanism used to reduce delay is called T-ware. It gives the transport layer the ability to communicate with the application layer (encoder). The key element is the loss event handler, which generates a signal. The signal information is used to probe the iTCP service. Coupled with the retraction ratio ρ, the rate is reduced to achieve the objective of the thesis.
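The rate reduction performed by the signal-handler T-ware can be sketched as follows. This is a minimal illustration under stated assumptions: the iTCP subscription and probing calls are not shown because their signatures are not given here, the function names are hypothetical, the one-line format of "rate.par" is assumed, and clamping at b_min is an assumption drawn from the definition of the minimum acceptable rate.

```python
import os
import tempfile


def frugal_bitrate(current_bps, rho, b_min_bps):
    """Frugal-state target bit rate: current rate scaled by the retraction
    ratio rho, never below the minimum acceptable rate (assumed clamp)."""
    if not 0.0 < rho <= 1.0:
        raise ValueError("rho must be in (0, 1]")
    return max(b_min_bps, int(current_bps * rho))


def on_timeout_event(current_bps, rho, b_min_bps, rate_file="rate.par"):
    """Hypothetical T-ware body for the retransmission time-out event.

    Computes the reduced rate and stores it in the rate file, where the
    encoder is assumed to pick it up.
    """
    reduced = frugal_bitrate(current_bps, rho, b_min_bps)
    with open(rate_file, "w") as fh:
        fh.write(f"{reduced}\n")
    return reduced


if __name__ == "__main__":
    # Write to a temporary directory so the sketch leaves no stray files.
    path = os.path.join(tempfile.mkdtemp(), "rate.par")
    print(on_timeout_event(128_000, 0.50, 32_000, path))
```

Keeping the rate computation in a pure function separates the policy (how far to retract) from the side effect (writing the file), which makes the policy easy to test without a congested network.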
Recovery T-ware
• The recovery T-ware kicks in at time T_recovery.
• Returns the encoder bit rate to the normal rate.
Figure 6a & 6b Pseudo code of the Signal Handler and the Recovery handler

Experiment Set Up
Figure 7.1 Experiment setup

Experiment (Test Samples)
Test: 3 types of audio sound quality
• High-quality music (HighQmusic)
• Low- or poor-quality music (LowQmusic)
• Speech mixed with music (SpeechMusic)
3 running modes
• iEXP
• iOFF
• Classic
Table 7.1 Experiment control flags and running modes

Experiment and Performance Analysis
The parameters used in the experiments:
(i) Predetermined rate retraction ratio (ρ = 0.50)
(ii) Bit rate
To compute b(t), the current bit rate is multiplied by the rate retraction ratio (bit rate * ρ).
Frame information is collected for the first 800 up to 1200 audio frames of the encoder and the player.

Performance Analysis
• iEXP performs better than classic TCP.
• The delay buildup in Classic and iOFF is much higher than that experienced by iEXP (iTCP), as indicated by the step jump.
• The step jump of iEXP is much smaller as a result of the rate retraction ρ.
• iEXP recovered from the delay buildup within a few seconds, unlike Classic and iOFF.
Figure 7.2. Frame arrival delay on the three audio qualities

Performance Analysis
Assume packet j arrives at the receiver at time t_j and its expected arrival time is e_j.
Referential jitter: refJitter(j) = t_j - e_j
• refJitter(j) is negative if packet j arrives early at the receiver. It can be buffered and played at its scheduled time.
• refJitter(j) is positive if packet j arrives late at the receiver.
The player pauses and waits for packets that arrive late. The higher the delay, the higher the jitter experienced by the frames.
Figure 7.3. Referential Jitter on the three audio qualities
The step jumps in iEXP were much smaller than those in TCP-classic and iOFF. This indicates that interactive TCP reduces jitter.

Performance Analysis
Symbiotic rate reduction that occurred as a result of the rate modification between the rate controller of the encoder and the symbiosis unit:
• The plots show the target bits and the actual bits generated by the encoder for each frame.
• The rate retraction ratio of the symbiosis kicks in when a time-out event is triggered and reported by iTCP during congestion.
• The effect is visible in the plots as the number of bits drops in accordance with the rate retraction ratio.
Figure 7.4. Symbiotic Rate Reduction on the three audio qualities

Performance Analysis
Comparison of frame delay and acceptance ratio
Table 7.2 Average Frame Delay and acceptance ratio
• Delay tolerances of d = 2, 4, and 6 seconds are introduced to measure the average frame delay and acceptance ratio.
• The iEXP mode experiences a low delay and a high acceptance ratio.
• iTCP's T-ware mechanism allows the application to use sophisticated techniques to control the temporal qualities of its traffic.

Performance Analysis
Overall stream compression
• The overall delivered bits in the iEXP mode are reduced to 80 to 90% of the original bits.
Table 7.3 Percentage of total bits delivered for each mode
• The iOFF and Classic cases show no adaptation.
• The file size in the iEXP mode is reduced significantly compared to the other modes.
Table 7.4 Sizes of the audio files
• The audio quality between the sending end and the receiving end is achieved perceptually by the temporal and spectral resolution.
• iTCP provides a way to trade a satisfactory reduction in quality for the avoidance of severe frame delay.

Performance Analysis
Aim of the iOFF mode: study the overhead introduced by the event notification service.
• Increase in total transmission time for all modes.
• The iOFF mode is higher than the classic TCP mode because the event notification service is enabled.
• The iEXP mode is much smaller than the other modes, which indicates that the application-level performance gain outweighs the overhead.
[Figure 7.5 Overhead of the interactivity service. Bar chart of iOFF, iEXP and classic for HighQmusic, LowQmusic and SpeechMusic; y-axis from 0 to 18.]

Conclusions
Aim: reduce delay and jitter of audio traffic.
Solution: design an interactive and friendly application that receives state feedback from the network: the symbiotic encoder.
Achievements:
• Dynamic reduction of jitter and delay of time-sensitive traffic during congestion.
• Network congestion reduction by reducing the traffic at the source.
• Trade-off of quality for delay and jitter.
The approach is simple and does not require altering any network dynamics (e.g. fair queuing) to be optimal. Its effect is entirely at the application layer. It also further validates the usefulness and efficiency of the interactive transport control protocol (iTCP) through the idea of event notification.
Technically, this scheme is not applicable to non-elastic traffic such as simple file transfer.

Future Work
Optimization: the parameters in the scheme are user-defined but can be optimized with the symbiotic throttling model in [13].
The experiment can be performed using the other subscribable events (e.g. duplicate ACK).
BIBLIOGRAPHY
[1] Andersen D., Bansal D., Curtis D., Seshan S., and Balakrishnan H., "System Support for Bandwidth Management and Content Adaptation in Internet Applications," Proc. of OSDI'00, San Diego, CA, Oct. 2000.
[2] Balakrishnan H., Rahul H., and Seshan S., "An Integrated Congestion Management Architecture for Internet Hosts," Proc. of ACM SIGCOMM, Cambridge, MA, Sep. 1999, pp. 115-187.
[3] Catherine Boutremans, Jean-Yves Le Boudec, "Adaptive Joint Playout Buffer and FEC adjustment for the Internet Telephony," 2003.
[4] Eitan Altman, Chadi Barakat, Victor Ramos, "Queuing Analysis of Simple FEC schemes for IP Telephony," 2001.
[5] Floyd S., Handley M., Padhye J., and Widmer J., "Equation-Based Congestion Control for Unicast Applications," SIGCOMM 2000, August 2000.
[6] H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson, "RTP: A transport protocol for real-time applications," RFC 1889, 1996.
[7] Handley M., Floyd S., Padhye J., and Widmer J., "TCP Friendly Rate Control (TFRC): Protocol"
[8] ISO/IEC 11172-3. Information technology -- Coding of moving pictures and associated audio for digital storage media at up to about 1,5 Mbit/s -- Part 3.
[9] J. Mahdavi and S. Floyd, "TCP-Friendly unicast rate-based flow control," draft posted on end2end mailing list, January 1997, http://www.psc.edu/networking/papers/tcp%20friendly.html.
[10] Jacobson V. and Michael J. Karels, "Congestion Avoidance and Control," Computer Communication Review, vol. 18, no. 4, pp. 314-329, Aug. 1988.
[11] Jacobson V., "Modified TCP Congestion Avoidance Algorithm," end2end-interest mailing list, April 1990.
[12] Johnny Matta, Christine Pepin, Khosrow Lashkari, Ravi Jain, "A source and channel rate adaptation algorithm for AMR in VoIP using E-model," June 2003.
[13] Khan J. and Zaghal R., "Symbiotic Rate adaptation for Time sensitive Elastic Traffic with Interactive Transport," Journal of Computer Networks, Elsevier Science, March 2006.

BIBLIOGRAPHY (cont'd)
[14] Khan J., Zaghal R., and Gu Q., "Rate Control in an MPEG-2 Video Rate Transcoder for Transport Feedback based Quality-Rate Tradeoff," PV2002, Pittsburgh, PA, April 2002.
[15] Khan J., Zaghal R., and Gu Q., "Dynamic QoS Adaptation for Time Sensitive Traffic with Transientware," IASTED WOC'03, Banff, Canada, July 2003.
[16] Khan J. and Zaghal R., "Event Model and Application Programming Interface of TCP Interactive," Technical Report TR2003-02-02, Feb. 2003.
[17] N. Shacham and P. M. Kenney, "Packet recovery in high-speed networks using coding and buffer management," in Proc. IEEE Infocom 1999.
[18] Pradhan P., Chiueh T., and Neogi A., "Aggregate TCP Congestion Control Using Multiple Network Probing," Proc. of the 20th International Conference on Distributed Computing Systems, ICDCS 2000.
[19] Rishi Sinha, Christos Papadopoulos, Chris Kyriakakis, "Loss Concealment for Multi-channel streaming audio," June 2003.
[20] Sisalem D. and Wolisz A., "Towards TCP-Friendly Adaptive Multimedia Applications Based on RTP," Proc. of the 4th IEEE Symposium on Computers and Communications, 1998.
[21] Stevens W. R., "TCP/IP Illustrated, Volume 1: The Protocols," Addison-Wesley, 1994.
[22] Wang R., Yamada K., Sanadidi M. Y., and Gerla M., "TCP with sender-side intelligence to handle dynamic, large, leaky pipes," IEEE Journal on Selected Areas in Communications, 23(2):235-248, 2005.
[23] Wen-Tsai Liao, Janet J. C. Chen, Ming-Syan Chen, "Adaptive Recovery Techniques for Real Time Audio Stream."
[24] Wenyu Jiang, Henning Schulzrinne, "VoIP: Comparison and optimization of packet loss repair methods in VoIP perceived quality under bursty loss," May 2002.
[25] Yi J. Liang, Eckehard G. Steinbach, Bernd Girod, "Voice over IP: Real-time voice communication over the Internet using packet path diversity," October 2001.
[26] Olufunke Olaleye and Javed I. Khan, "Test Audio Set: Symbiosis Audio Streaming with iTCP," TR2007-02-01, February 2007.

Questions and Comments