QoS Requirements of Multimedia Applications

Brett Berliner, Brian Clark and Albert Hartono
Department of Computer Science and Engineering
The Ohio State University
Columbus, OH 43210
{berliner, clarkbr, hartonoa}@cse.ohio-state.edu

Abstract

With the growth of the internet and easier access to high-speed connections, more and more people are turning to computer networks for their long-distance voice communication instead of the traditional telephone system. Using the internet in place of standard telephone lines has many advantages. One of the biggest is that it eliminates the concept of “long distance”: most companies that provide internet-based voice communication charge a monthly rate rather than charging per minute as traditional telephone companies do. The most common way of transmitting voice over the internet is Voice over Internet Protocol, or VoIP. Quality of Service (QoS) requirements are very important to multimedia applications. Ensuring that these requirements are met is key to many of today’s applications, and creating new technologies that can meet stricter requirements will help create new devices in the future. This is a study of the values of the QoS requirements for multimedia applications.

Introduction

When the internet was designed, it was intended for the transfer of text and other simple data types, where the level of service did not matter. The only thing that mattered was reliability. Concepts such as delay, jitter and packet loss percentage did not affect the service; all that mattered was that the service existed. When the internet started being used for applications such as internet telephony, streaming live video and even remote surgery, things like jitter and packet loss began to matter. Multimedia, unlike text, needs service guarantees or else the services become useless.
Trying to carry on a conversation when words arrive out of order would be quite frustrating. Thus, the introduction of multimedia into the internet led to the concept of quality of service. 1. Voice Codec Name G.711 GIPS Enhanced G.711 G.723.1 G.728 G.729 Delay Component Packetization 1.1 Raw Data for VoIP The process of sending human voice over a computer network starts with a person speaking into a PC microphone. The sounds waves produced by the voice must be translated into an electrical signal in order to be sent over a network. This process of converting the analog signal to a digital one is call digitization. In order to digitize human voice effectively a sample is captured 8,000 times per second, or given a sampling rate of 8 kHz. It is standard to use 8 bits per sample which results in a minimum data transfer rate of 64 Kbps. At the application layer this digital signal is encoded and decoded by a codec. The sampling rates, bit rates and extra information about popular codecs Figure 1: Audio Compression Standards (Codecs) [1] Sampling Rate Bit Rate (Kbps) Delay Contribution Miscellaneous (kHz) (ms) 8 64 <1 8 Variable Voice activity detection 8 8 8 5.3 and 6.3 16 8 100 2 10 Optimized for low delay Voice activity detection Figure 2: Delay Contribution by Components [2] Maximum Delay Comments Contribution (ms) 30 The process of converting the actual digital signal into packets. Serialization Dejittering 40 30 Presentation 17 This delay is only incurred when using modems. Done to compensate for jitter introduced by networks. This assumes a 1x dejitter buffer is used. The delay introduced by actually presenting the information to the human recipient. 1 can be found in Figure 1. Like any other calculation, encoding and decoding voice signals takes some finite amount of time. The delay contributions of the various codecs are also presented in Figure 1. Each of these compression codecs introduces a different amount of delay. 
The delay introduced comes from various sources. The upper bounds of some of the factors contributing to this delay are presented in Figure 2[3]. while the other listens, only 50% of the full duplex connection is used at a time. Also, voice packets are not transmitted during the silence observed in between words. By not sending packets containing “dead air” approximately 10% of the bandwidth is reduced. These two techniques total up to a 60% reduction in bandwidth from silence suppression. 2. Video Once the voice is encoded with a particular codec it is transmitted over the internet using internet protocol l(IP). Video traffic is being sent more and more often in today’s internet and will only increase in the future. Applications such as video conferencing are becoming business standards and many websites, CNN.com for example, offer videos on demand. Today many homes even have digital cable television service which transmits video information over a network. Since video imaging requires lots of data, compression and reservation protocols are going to become necessary to support the future of video in networking. Since IP is a best-effort service the QoS is not perfect and some delay, loss and jitter is encountered. When the encoded signal reaches its intended destination it is decoded using the same codec used by the receiver. Finally the decoded audio signal is presented to the receiver through a speaker. 1.2 Need for Delay Reduction 2.1 Raw Data From Figures 1 and 2, one can see that the delay introduced from a codec alone can approach almost 120 ms. This number does not include the other various delays introduced by the network such as propagation delay, queuing delay and transmission delay. Based on the ITU recommendation G.114, the delay in a telephone call should be less than 100150 ms. The reasoning behind this is a psychological factor. If the delay is much more than this the caller will be dissatisfied with the service. 
Even though a delay of 100150 ms is acceptable most QoS requirements for VoIP ask for 50-80 ms of delay or less. In order to achieve studio quality picture a video stream is broken up into 30 frames per second. Each of these frames contains 525 lines. In each of these frames the y value, or luminance, is sampled at 13.5 MHz and the two chrominance values, u and v, are sampled at 6.75 MHz. This total data rate comes out to (13.5 + 6.75 + 6.75) * 8 = 216 Mbps. Due to this extremely high bit rate, obviously compression techniques are required to transmit video over the internet. For different transmission lines different compression is required. If a channel supports higher bandwidth then less compression is needed. Conversely, if a channel has lower bandwidth then a higher compression ratio is necessary to view the data. This point is illustrated in Figure 3. It can be seen that for slower channels sending video, even of lower quality, is just not feasible due to the enormous compression ratios needed. 1.3 Solutions Since delay must be minimized to ensure satisfactory telephone service we must employ some techniques to reduce this delay. The first major way to speed up voice communication is to compress the audio signal. If the sheer size of the data being transported is reduced it will arrive at the destination quicker. Some notable low bit rate compression algorithms used are ITU G.723.1 and G.729A. Another way to reduce the payload of transmitting voice over IP is to use silence suppression. Due to the fact that during normal telephone conversation one person talks Channel PC local LAN Modems ISDN T-1, DSL Ethernet T-3 Fiber Optic Video uses techniques to compress individual frames, like JPEG does, but also uses motion prediction to compress the data further. In fact, most of the time the bit rate required for video transmission is dependant solely on motion within the images. Factors such as screen size, resolution and scanning rates are almost irrelevant. 
Motion is defined in Figure 3: Required Compression Ratios for Package Television [4] NTSC TV HDTV Bit Rate 168 Mb/s 933 Mb/s 30 kb/s 5,600:1 31,000:1 56 kb/s 3,000:1 17,000:1 64 – 144 kb/s 1,166:1 6,400:1 1.5 Mb/s 112:1 622:1 10 Mb/s 17:1 93:1 42 Mb/s 4:1 22:1 200 Mb/s 1:1 5:1 2 Film Quality 2300 Mb/s 76,000:1 41,000:1 16,000:1 1,500:1 230:1 54:1 11:1 Figure 4: Number of Television Channels for Various Averaged Motions Within the Images [4] Average Very Slow Slow Normal Fast Motion Pixel Change 2 kp/s 4 kp/s 8 kp/s 16 kp/s Rate Channel Bit Rate 12 kb/s 24 kb/s 48 kb/s 96 kb/s 30 kb/s 2.5 1 0 0 PC local LAN 56 kb/s 4 2 1 0 Modems 64 – 144 kb/s 12 6 3 1 ISDN 1.5 Mb/s 125 62 31 15 T-1, DSL 10 Mb/s 833 416 208 104 Ethernet 42 Mb/s 3500 1750 875 437 T-3 200 Mb/s 16,666 8,333 4,166 2,083 Fiber Optic increments of 1k (1024) pixels/second. In normal television this translates to approximately one square inch of changed image per second. This change does not need to be in one contiguous block, it can be scattered throughout the entire image. Figure 4 illustrates this by showing the number of simultaneous channels various types of links can support for different rates of motion. 3.1 Definition One type of interactive multimedia is interactive gaming. Interactive gaming, in this case, refers to players on their own machine connecting remotely to other machines to compete in the same event against each other. The device used to connect could be a PC, a console game system or a handheld device. Each of the devices already has most of the game data, such as the engine and the graphics, so only certain data needs to be sent to the central server. This data may include character positioning and orientation, as well as their current action, and the central server sends the pertinent data to the connected computers for processing. 
Figure 5: Delay Requirements for Data Types [6] Application Video Voice Interactive Multimedia 150 150+80 15 Delay (ms) 2.2 Delay Introduced By Compression Every computation takes some time and compressing/decompressing video is no exception. More often than not, the latency introduced by this process is much greater than the latency introduced by digitization and digital processing in uncompressed format. Since most video is very data intensive a high compression ratio is needed. The greater the compression ratio used, the greater the latency introduced. Typically the delay introduced by encoding and decoding in a distribution and/or broadcast scenario is several seconds [5]. 3.2 QoS Requirements These requirements help ensure that gameplay is a smooth, realistic experience for all users with a minimum internet connection, depending on the game. Even the inability to meet one of these requirements often will completely ruin gamers’ experiences while playing. The QoS requirements that most directly affect interactive gaming are [7]: 1. a minimum amount of throughput 2. an acceptable end-to-end delay 3. low jitter 4. low packet loss rate 5. high dependability 3. Interactive Gaming Recently, interactive multimedia, such as network gaming, remote visualizations, remote surgery and tele-immersion, has become a very large part of the still developing internet. Compared to video and voice, these types of applications often have QoS requirements that are even tougher to satisfy than video and voice program. This is often due to the fact that these applications can generally not afford to lose packets or suffer from any noticeable latency, or there is a good chance the experience will be affected, if not ruined. 3.3 Throughput Among researchers, there is a belief that the lower bound on the acceptable delay from interactive multimedia is 15 ms, which is the amount of time it takes for a 66Hz monitor to draw a single frame. 
With a lower delay requirement, the monitor could not keep up, and therefore, most of these methods could not be implemented [6]. Figure 5 shows the average delay requirements for interactive multimedia in comparison to those of video and voice. Throughput is a QoS requirement that varies from game to game. Most games only require 56K dial up connections (40 kpbs) to run smoothly. For example, two of today’s most popular online games, Guild Wars and Counter-Strike, can both be played online with a 56K connection. CounterStrike, for example, only needs around 16 Kbps per connected user to avoid slowdown [7]. This number can vary greatly depending on the genre of game. Games where players have to take turns, such as Massive Multiplayer 3 Online Role Playing Games (MMORPGs) like Everquest or World of Warcraft, can allow for slower links, as the data can update while the player is waiting their turn. As a result, these games often only require a 30-40 Kbps link. This also applies to real time strategy games such as Command and Conquer, where the player tells their units what to do, and while the unit is processing, the server can send receive data. These games generally hover around 20 – 30 Kbps, although the newer the game, the higher the link speed necessary. Very new first person shooters (FPS), such as Battlefield 1942, can be played with 16 players on a 40 Kpbs connection. However, to take full advantage of all of the vehicles and weapons, as well as allow all 64 possible players at a time, each user must have a broadband connection around 250 Kbps [8]. Its sequel, Battlefield 2, needs around that level and offers no guarantees for those with less speed. In fact, for highest performance with 64 players, a link of 2 Mbps is necessary. The following table shows what type of games need around how much speed. 
Figure 6: Throughput and Delay Across Game Types Game Type Basic RTS MM Basic Intense ORPG FPS FPS 2030-40 40 250 – Throughput 40 30 250 2000 (Kbps) 150 - 150 150 50 End to End 50 200 Delay (ms) 3.4 End-to-End Delay The most important QoS requirement for online gaming is definitely end-to-end delay. End-to-end delay is a major factor in what gamers call lag, which is basically a slang term for latency. Lag refers to the delay between when a command is issued by the player, and when it happens on the screen. A large amount of lag can completely ruin a gamer’s experience. For elite performance, 50 ms or less for end-to-end delay is optimal. However, this is only absolutely necessary for certain games and game types. While 50 ms is a good value for very intense first person shooters like Battlefield 2, like throughput, older first person shooters such as Counter-Strike, and real-time strategy games and MMORPGs, can survive with a higher delay, as they do not need as fast of a link. Specifically, most of these games can run with a delay of 150 ms or below [9]. Figure 6 shows the acceptable amount of delay across game types, while Figure 7 shows how CounterStrike is affected by end to end delay. 3.5 Jitter and Packet Loss Rate Jitter will cripple an online game. Packets are timely, and any amount of jitter that allows packets to be received late, or even worse, out of order, affect gameplay the same as < 50 ms Excellent gameplay packet loss. As the paper says, even the users who suffered a delay of 40 ms, but with a jitter of 20 ms, were affected greatly. All of the users reported horrible gameplay during this experiment, and the delay never was over 100 ms, which in the game they played (Unreal Tournament 2003), would not be devastating, but some of the users could not even continue the game [11]. This is due to the jitter. If the packets don’t arrive on time, then game information can be lost. 
Even worse, if the packets arrive out of order, then the packet that arrives late is useless, and the possibly important information is lost. In network gaming, only data that is relevant to the game is usually sent. As a result, there can be almost no packet loss, since all data is important. Effectively, the packet loss has to be at 0% for a game to run smoothly. The only way that a packet loss is acceptable is if the packet contains insignificant details. However, almost no packets (if any) contain any insignificant details. Therefore, packet loss can be crippling to a game session. 3.6 Interactive Gaming Dependability Like most multimedia, network gaming generally uses UDP for transmission, due to the fact that it is a repeated transmission to a single source, and the transmission is usually small. In addition, there is no time for TCP connection establishment and acknowledgement, as speed is everything. Therefore, packets cannot be retransmitted, so the link needs to be dependable. It also needs to always be open. Even if only two players are against each other in a Battlefield 2 session, the ability to add in 62 additional players must be there. They cannot simply open up the line when it’s needed, because if there isn’t room to be opened up, the game session will fail. Also, the server must be able to handle a constant stream of packets. Although each packet is very small (around 100 bytes), the packets come at a steady stream, depending on game, varying from 30 ms to 100 ms. For most games, the maximum amount of traffic must be assumed (that is, every game room is full), so if maximum traffic is actually transmitted, the network is prepared [7]. 3.7 Mobile Network Gaming Mobile network gaming is played on devices such as PDAs and cell phones. When gaming on these devices, there is generally much less data to send than PC games, due to the processing limitations of the devices. 
As a result of these limitations, technology, bandwidth and time are rarely devoted to mobile games, meaning these games don’t particularly work well (if at all). This, however, is beginning to change as handheld gaming systems like the Sony PSP become popular. Figure 7: End-To-End Delay in Counter-Strike [10] 50-100 ms 100-150 ms 150-200 ms Good Noticeably decreased Significantly effected gameplay gameplay gameplay 4 > 200 ms Intolerable gameplay The two major transport services for PDAs and cellular phones are GPRS (General Packet Radio Service) and UMTS (Universal Mobile Telecommunications Systems). GPRS is a non-voice service that works in unison with mobile devices to send their data. GRPS is referred to as “always connected”, since it can perform an almost instant transmission. Theoretically, GRPS can send data at a rate of 171.2 Kbps, but in reality, to reach this, an operator must grant them all of the bandwidth, referred to as timeslots. Since operators will rarely, if ever, do this, in reality, the bandwidth is less than 1/8th of this at many times, and only 1/4th of it at best [12]. UMTS is a type of mobile transmission that relies on radio spectrum transmission. Again, theoretically, UMTS can transmit anywhere from 384 Kbps to 2 Mbps, but again, this is only if the space is reserved [13]. A study done in Germany on a mobile volleyball game shows the problem with UMTS and GRPS that doesn’t befall console and PC gaming – mobile networks are just not set up for over-provisioning that is necessary to meet gaming QoS requirements. For instance, the simple volleyball game had very little data to send (approximately 20 states per minute, which each state containing very little data), but the game was very tough to play, due to the fact that the delay was always between 100 and 200 ms [14]. must wait for every action to perform, they will not want to use the system. The QoS requirements that most directly affect remote visualizations are: 1. low delay 2. 
high throughput/bandwidth 3. low latency 4.2 Delay Generally, the way remote visualizations work is simple – a computer collects the input, storing it as raw data. That data is then turned into triangles, which are translated to an image, and then only the image is sent to the user, who then views the image or executes a command, modifying the data and forcing the system to take a new snapshot. Therefore, most of the data is handled at the side of the computer that is given the raw data. [16] For the calculations, the data gathering computer first has to receive the raw data. Often, this data can approach size of gigabytes or terabytes, but due to visualization algorithms, the size can often be dropped down to around 100 MB. The size of the raw data is listed here on as ‘n’. Next, the number of triangles per frame in the image can be written as K, where K is often 500,000 (but can range from 50,000 to 1,000,000). With n and K, the delay to generate the triangles from raw data is on the order of O(log(n) + K), with the actual time depending on the CPU[16]. Although the Sony PSP contains much better graphics than a PDA or a mobile phone, the reason is that it contains its own wireless card that conforms to the IEEE 802.11b standards [15]. The devices only communicate with a wireless router, which already has the setup for ensuring QoS requirements are met (as they would be for laptops or PCs connected to the wireless router). As a result, the PSPs are able to be played around a wireless router. True mobile gaming is not effective yet. With time determined, next is size. The size of the triangles in bytes is 3 (dimensions) * 3 (points) * 4 (float number) * 2 (shading) + 12 (color) = 86 bytes. So with 500,000 triangles, there will be 43 MB of data. 
Then, to make them an image, the data must be copied to the graphic card, so that will take around 43 M * the copying speed, and then an average graphic card can take around 100M triangles per second, so the time to process one frame is 500,000 / 100M, or 5 ms [16]. Then, the only delay left is how long it takes to get to the recipient, which must as small as possible, because otherwise, the phenomenon that nothing is happening can occur. If a user goes to zoom in on a data set, and the processing takes 30 seconds, it will be a highly frustrating process. However, with the large amount of data to be dealt with at the original computer, this is part of the reason remote visualizations will not work. Figure 8 shows a simplified version of how the data is collected and transferred. 4. Remote Visualizations Remote visualizations are another type of multimedia becoming a reality as the internet continues to grow. Basically, a remote user connects to a data set that is either generated or has previously been generated. The key is that the data must be interactive – for example, the user can often do things like rotate the data, zoom in, and even add in slice planes to see inside the data. Right now, however, there is just too much in the way of performing consistent remote visualizations. Figure 8: Remote Visualization Data Process 4.1 QoS Requirements To perform consistent, timely visualization transmission, a few QoS requirements need to be met. If these requirements cannot be met, the user will almost certainly be unable to have the patience to get their desired results. Since the size of the raw data is often in the gigabyte range, it is clear that remote visualizations require great precision. Thus, users need to be able to single out important parts of the data. This can often take many actions, and if a user 5 carried out from a certain distance. 
Through communication lines, digitized information can be transmitted to remote locations, enabling surgeons to operate on patients located distantly. Challenges to this concept are numerous, but the most essential limitations have been the dependability or quality of service of the communication lines and the issue of latency, which is the delay time from when the hand motion is initiated by the surgeon until the remote manipulator actually moves, and the image is shown on the surgeon’s monitor [20]. 4.3 Throughput/Bandwidth This is a big factor that keeps remote visualization from being possible. The transmission speed is the last key. Obviously, a high speed link is needed at the end, because the transmission both ways must be as quick as possible, so that the only delay is at the computer gathering data. When performing remote visualizations that do NOT work like the above one, usually, a link speed about around 700 Mbps is necessary. This is to make sure the data (which would be the 100M) is transmitted in a timely fashion. The more observers of the data the more time is needed. If that compression is put into motion, then a link of only 100 – 200 Mbps is necessary. Again, the more observers, the more the overall link will need (so 2 users will be around 200 – 400 Mbps from the central server, and so on). In addition, if stereoscopic rendering (that is, creating the illusion of depth in the image), the amount of bandwidth needed is doubled [17]. 5.1 QoS Requirements Even though there is little practical experience of telesurgery at present, it is clear that successful telesurgery will require a data transfer of robot commands, video and voice signals, text, computer data, as well as stored and real-time medical images. A list of provisional network requirements for telesurgery can therefore be identified. These QoS requirements include [18]: 1. reliability 2. an acceptable end-to-end delay 3. multiplexing of various data rates 4. 
low data error rate (BER-bit error rate) Other requirements and desirable features are likely to emerge as further telesurgery trials are conducted and more experience is gained. 4.4 Latency Although QoS data on exact requirements for latency are difficult to track down, a simple experiment using a visualization kit (VTK, in this case), can demonstrate the effect of latency on distance visualizations [26]. On VTK’s website, they offer many sample programs in C++ that use to perform visualizations. Although these occur solely on the user’s computer, the “sleep” command from <windows.h> can be used to simulate remote visualization delay. Since the command sleep(50) tells the processor to sleep for 50 ms before continuing on to the next program. Inserting a sleep(X) command wherever a command appears in the VTK program will cause the program to pause for X ms before it continues. Since every time a command is issued, the computer waits a certain amount of time to begin it, this is a very effective latency simulation. 5.2 Reliability and Error Rate In telesurgery, since human lives are at stake, the consequences of an error in transmission could be very serious, and therefore reliable techniques of networking must be acquired. Since the data rate associated with robot commands is very low (typically 19.2 Kbps), there is therefore ample scope to protect each message with error-protection coding. Each time the operator issues a command, the transmitting equipment can send it more than once to the receiving end. The receiving end can then echo the command back to the sending end. Only when the command is received and echoed correctly, say three times in succession, would the command be executed at the receiving end [18]. A simple, unscientific test reveals that around 150 ms sleep time starts to noticeably affect the program. It is not a deal breaker, but it is frustrating at times. At between 225 and 250 ms, the latency becomes impossible to work with. 
Even a simple task such as rotating the data to see a different side of the visualization becomes arduous. Both of the users who attempted to experiment with the remote data with a delay of 250 ms became too frustrated before finishing any tests with slice planes. As a result, even though these numbers are unscientific, it is clear what kind of effect latency has on remote visualizations. One apparent threat to safe telesurgery would be a power cut occurring in the network. Opportunely, telecommunications network operators employ battery arrays and ‘hot standby’ generators to take over the task of powering the network in the event of a mains power failure. A power cut on the network must be virtually unnoticed by the user [18]. 5. Telesurgery 5.3 Time-Delay Telesurgery - surgery performed at a certain distance - is one aspect of telemedicine. The introduction of robotic and computer technology into surgical operations allows dexterity to be increased and surgical procedures to be There is a major constraint that could lead to disastrous results during surgery, namely time delay. The surgeon therefore views his or her movements on the computer 6 interface as they are happening. If the surgical system were removed to a more distant site, however, it would introduce a time delay. Visualization of the operating field could be milliseconds or even seconds behind the real-time manipulations of the surgeon. Studies showed that the acceptable limit of time delay in terms of a surgeon’s perception of safety was roughly 330 ms [19]; satellite transmission, for example, would introduce a delay of more than 600 ms. tele-immersion was one of five key technologies identified as necessary for the future use of the NGI [25]: Tele-immersion. Tele-immersion will enable users in different locations to collaborate in a shared, virtual, or simulated environment as if they are in the same room. 
It is the ultimate synthesis of networking and media technologies to enhance collaborative environments. Tele-Immersive applications must combine audio, video, virtual worlds, simulations, and many other complex technologies. They will require huge bandwidth, very fast responses, and guarantees of delivery. On September 7th, 2002, the world’s first human longdistance operation was performed between New York, USA and Strasbourg, France (14,000 km distance), demonstrating the feasibility and safety of performing a complete surgical operation from remote locations. The two sites were connected through a high-speed terrestrial optical-fiber network that transports data through dedicated connections using Asynchronous Transfer Mode (ATM) technology. A bandwidth of 10 Mbps has been reserved through a network that interconnects applications at both sites using a network termination unit (NTU), which provides a multiservice path to different applications [19]. 6.1 QoS Requirements In general, the QoS requirements for tele-immersion include the following four important factors: 1. extremely high network bandwidth 2. low latency 3. constant jitter 4. guarantees of delivery 6.2 Challenges of Tele-Immersion By monitoring both NTU units at the two ends, the number of lost packets and the communication quality were measured. It was revealed that no ATM packet was lost during any surgical procedure. The round-trip delay by ATM transport was 78 − 80 ms. Adding 70 ms for video coding and decoding, plus a few milliseconds for rate adaptation and Ethernet-to-ATM packet conversion, movements executed by the surgeon in New York were apparent within 155 ms on his video screen [19]. Tele-immersion has emerged as a high-end driver for the Quality of Service (QoS), bandwidth, and reservation efforts envisioned by the NGI and Internet2 leadership. 
From a networking perspective, tele-immersion is a very challenging technology for several reasons [24]:
- The networks must be in place and tuned to support high-bandwidth applications.
- Low latency, needed for 2-way collaboration, is hard to specify and guarantee given current middleware.
- The speed of light in fiber itself is a limiting factor over transcontinental and transoceanic distances.
- Multicast, unicast, reliable and unreliable data transmissions (called "flows") need to be provided for and managed by the networks and the operating systems of supercomputer-class workstations.
- Real-time considerations for video and audio reconstruction ("streaming") are critical to achieving the feel of telepresence, whether synchronous or recorded and played back.
- The computers, too, are bandwidth limited with regard to handling very large data for collaboration.
- Simulation and data mining are open-ended in computational and bandwidth needs: there will never be quite enough computing and bits/second to fully analyze and simulate reality for scientific purposes.

Progress in all these areas, however, is expected; tele-immersion serves as an integrating technology as pieces of the solution are contributed by the community and the computer/networking industry.

Another technology that affects the time-delay requirement of telesurgery is the video compression algorithm (codec). It is important that new codecs produce video that is of higher quality, has low latency (< 100 ms), and degrades gracefully.

5.4 Simultaneous Transfer of Various Types of Data
The data transferred in telesurgery consist of video, voice, images, robot commands, text, and computer data. These data originate from different equipment and have different data rates. It is therefore crucial that the network be able to transfer data simultaneously from sources with widely differing data rates. One networking technology well suited to telesurgery is ATM, due to its ability to multiplex sources with different data rates and its low cell loss rate [18].

6. Tele-Immersion
Tele-immersion has since entered the NGI and Internet2 vocabulary. In the applications section of the Computing Research Association's "Research Challenges for the NGI,"

6.3 Lag
Lag is the term used to describe the perceived sum of all the sources of latency in a system. Typically, it is thought of as the delay between action in the real world and the perceived response of the system to that action. Lag is the critical issue for usability; reducing lag is a major technical challenge. Communications latency is only one component of tele-immersion lag. Effective solutions to reducing lag must attack the component sources of latency at all levels of the system. Sources of latency in the communications system are transmission latency, bandwidth or transfer latency, switching or routing latency, contention, and protocol latency.

Most users have difficulty manipulating objects in VR once lag exceeds 200 ms [24]. When the virtual display is coupled with the real world, as in tele-robotics, this limit is approximately 30 ms. Non-network components of the VR system together often exceed 200 to 300 ms, so there is actually very little room for wide-area communications delay in the lag budget.

6.4 Jitter
Jitter in the network affects collaborative coordination more than latency does [23]. Higher latencies with low jitter still allow collaborators to make reasonable predictions of how an environment will behave (although overall task performance will decline). High jitter, however, reduces predictability, and collaborators are forced to employ a purely sequential interaction strategy.

6.5 Tele-Immersion Flow Types
The following table, developed in discussions with Rick Stevens, director of the Math and Computer Science Division at Argonne National Lab, gives our current best estimations and opinions of the attributes of the nine types of flows simultaneously needed for an n-way compute and data-intensive audio, video, and haptic (touch) tele-immersive session [24].

Figure 9: Tele-Immersion Data Flow Types
Type        Latency    Bandwidth      Reliable  Multicast  Security  Streaming  DynQoS
Control     < 30 ms    64 Kb/s        Yes       No         High      No         Low
Text        < 100 ms   64 Kb/s        Yes       No         Medium    No         Low
Audio       < 30 ms    N x 128 Kb/s   No        Yes        Medium    Yes        Medium
Video       < 100 ms   N x 5 Mb/s     No        Yes        Low       Yes        Medium
Tracking    < 10 ms    N x 128 Kb/s   No        Yes        Low       Yes        Medium
Database    < 100 ms   > 1 GB/s       Yes       Maybe      Medium    No         High
Simulation  < 30 ms    > 1 GB/s       Mixed     Maybe      Medium    Maybe      High
Haptic      < 10 ms    > 1 Mb/s       Mixed     Maybe      Low       Maybe      High
Rendering   < 30 ms    > 1 GB/s       No        Maybe      Low       Maybe      Medium

Each row indicates a data flow type:

Control information is data used to manage the session, to authenticate users or processes, to launch processes, to control the display or tracking systems, and to communicate out of band between the world servers and VR systems.

Text provides simple communications capability within sessions for simple note taking and passing.

Audio gives ambient auditory cues, allows voice communications among users, and is used to issue commands via voice recognition and speech synthesis.

Video can allow teleconferencing or remote monitoring displayed within the virtual world.

Tracking is achieved with location and orientation sensors, and captures the position and orientation of the user. Typically this data is streamed to the computer responsible for computing the perspective of the scene. Tele-immersion requires tracking data to be shared among sites.

Database is the heart of a tele-immersion application world. The database contains the graphical models of virtual scenes, objects, and data, and since the database is used to provide the models that are rendered, it must be maintained in a coherent state across multiple sites.
Databases might be as simple as shared VRML files or as complex as multi-terabyte scientific datasets or VR extensions of video serving.

Simulation provides the basis for dynamic behaviors, like responding to the users' actions. Small-scale simulations often run on the same computer that generates the VR experience, but frequently the simulation will need a dedicated supercomputer. User input is captured and transmitted to the simulation via the network, and the simulation generates an update, which is then propagated to each user site for local rendering. Typically the data transferred to the simulation is considerably smaller than the data returned by the simulation.

Haptics include force and touch sensing/feedback devices and use a variety of sensors and actuators that are "attached" to the hands and/or legs of users. Some systems now generate haptic "images" that augment or replace visual images. Haptics are particularly sensitive to latency and jitter (instantaneous variations in latency).

Rendering is the transformation of geometric information into images for display. All VR environments primarily render graphics locally. As networks provide bandwidth adequate for compressed HDTV, however, it will become reasonable and efficient for scenes to be rendered remotely and transmitted to each site in real time.
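The flow types described above, together with the latency bounds in Figure 9, lend themselves to a small lookup structure. The sketch below is only an illustration in Python: the class, field, and function names are hypothetical, and only the Latency, Reliable, and Streaming columns of Figure 9 are transcribed. It reports which flow types a given end-to-end latency would satisfy or violate.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FlowType:
    name: str
    max_latency_ms: int  # Figure 9 gives a strict upper bound ("< N ms")
    reliable: str        # "Yes", "No" or "Mixed"
    streaming: str       # "Yes", "No" or "Maybe"


# Values transcribed from Figure 9 (Latency, Reliable, Streaming columns).
FLOWS = [
    FlowType("Control", 30, "Yes", "No"),
    FlowType("Text", 100, "Yes", "No"),
    FlowType("Audio", 30, "No", "Yes"),
    FlowType("Video", 100, "No", "Yes"),
    FlowType("Tracking", 10, "No", "Yes"),
    FlowType("Database", 100, "Yes", "No"),
    FlowType("Simulation", 30, "Mixed", "Maybe"),
    FlowType("Haptic", 10, "Mixed", "Maybe"),
    FlowType("Rendering", 30, "No", "Maybe"),
]


def flows_within_budget(measured_latency_ms: float) -> List[str]:
    """Flow types whose latency bound is met by the measured latency."""
    return [f.name for f in FLOWS if measured_latency_ms < f.max_latency_ms]


def flows_violated(measured_latency_ms: float) -> List[str]:
    """Flow types whose latency bound is not met by the measured latency."""
    return [f.name for f in FLOWS if measured_latency_ms >= f.max_latency_ms]


if __name__ == "__main__":
    for latency in (5, 20, 50):
        print(latency, "ms violates:", flows_violated(latency))
```

A single measured latency is a simplification; in practice each flow would be measured separately, and jitter would need its own check.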
Each flow-type attribute is explained in the following list:
- Latency is the sum of all delays in the system, from the speed of light in fiber, to operating system overhead, to tracker settling time and screen refresh.
- Bandwidth is the bits/second the system can transmit.
- Reliable flows are verified and retransmitted if bad.
- Multicast flows go to more than one site at once.
- Security involves encryption overhead that may or may not be warranted or legal.
- Streaming data is a constant flow of information over time, as with video, audio and tracking.
- Dynamic QoS can provide ways to service bursty high-bandwidth needs on request.

Conclusion
As the internet has evolved, the quality of service it can deliver has improved considerably. What was once only able to support simple text transfer is now capable of allowing surgery to be performed from 3,000 miles away. Simple multimedia applications such as voice traffic and streaming video are very easy to support with today's technologies, and applications that require precise data transfers with large payloads, which were once inconceivable, are starting to become commonplace. Technology will spawn more and more complex applications, each requiring finer and more strictly guaranteed quality of service. To support these future technologies, more effective means of ensuring quality of service will need to be developed as well.

REFERENCES
[1] Ballard, Buzz, Blake, Leslie, Macik, Keith, Montemayor, Ramiro, "Sip Phone Codec Testing Project", May 2004; http://voipitec.tamu.edu/nit/Final%20Presentation.pdf
[2] Voiceage: The World's Premier Supplier of Speech and Audio Codecs; http://www.voiceage.com/g729.php
[3] Birin, Gil, "Voice over Frame Relay, IP and ATM"; http://www.protocols.com/papers/voe.htm
[4] "Image and Video Compression Techniques"; http://www.autosophy.com/videcomp.htm
[5] "DTV Latency"; http://www.tvtechnology.com/features/Tech-Corner/f-rhlatency.shtml
[6] Xuan, Dong. Sept. 2005. "Delays in the Internet".
[7] "Aperto Networks Solutions: Internet Gaming"; http://apertonet.com/en/solutions/solutions/gaming.shtml
[8] "Amazon.com: System Requirements: Battlefield 2"; http://www.amazon.com/exec/obidos/tg/stores/detail//videogames/B00009V3NW/tech-data/002-01966545085624
[9] Borg, Seth, Girard, Eric, Sheldon, Nathan, "The Effect of Latency on Performance of Warcraft III"; http://web.cs.wpi.edu/~claypool/mqp/war3/mqp.pdf
[10] Almasbakk, Hans, Brekne, Tonnes, Overby, Harald, "Online Gaming: An Overview"; http://www.q2s.ntnu.no/q2sfc/uploads/online_gaming.pdf
[11] Degrande, Natalie, De Vleeschauwer, Danny, Lamotte, Wim, Monsieurs, Patrick, Quax, Peter, "Objective and Subjective Evaluation of the Influence of Small Amounts of Delay and Jitter on a Recent First Person Shooter Game"; http://www.sigcomm.org/sigcomm2004/workshop_papers/net608-quax.pdf
[12] "What is General Packet Radio Service?"; http://www.gsmworld.com/technology/gprs/intro.shtml
[13] "What is UMTS?"; http://www.umtsforum.org/servlet/dycon/ztumts/umts/Live/en/umts/What+is+UMTS_index
[14] Busse, Marcel, Effelsberg, Wolfgang, Lamparter, Bernd, Mauve, Martin, "Lightweight QoS Support for Network Mobile Gaming"; http://www.coe.montana.edu/ee/rwolff/Games%20Project/literature%20search/QoS%20suuport%20%20mobile%20games.pdf
[15] "The PSP FAQ"; http://psp.ign.com/articles/513/513175p1.html
[16] Xuan, Dong. Nov. 2005. "Preliminary Data for Remote Visualization".
[17] "ESLEA: HPC- Further Information"; http://www.eslea.uklight.ac.uk/sp_hpc_further_info.html
[18] Smithwick, M. 1995. "Network Options for Wide-Area Telesurgery." Journal of Telemedicine and Telecare 1(3):131-38.
[19] J Marescaux, J Leroy, M Gagner, F Rubino, D Mutter, M Vix, S E Butner and M K Smith, "Transatlantic Robot-Assisted Telesurgery", Nature, 413 (6854) (2001), pp. 379–80.
[20] Marescaux J, Rubino F. Telesurgery, "Trends Including Robot Assisted Technology", Business Briefing: Global Surgery, October, 2003.
[21] Butner S. E., Ghodoussi M., "Transforming a Surgical Robot for Human Telesurgery", IEEE Transactions on Robotics and Automation, Oct. 2003.
[22] Cai Meng, Tianmiao Wang, Wusheng Chou, Sheng Luan, Yuru Zhang, Zengmin Tian, "Remote surgery case: robot-assisted teleneurosurgery", Robotics and Automation, 2004. Proceedings. ICRA '04. 2004 IEEE International Conference on, Volume: 1, On page(s): 819-823 Vol.1.
[23] Jason Leigh, Oliver Yu, Dan Schonfeld, Rashid Ansari, Eric He, Atul Nayak, Jinghua Ge, Naveen Krishnapasad, Kyoung Park, Yong-joo Cho, Liujia Hu, Ray Fang, Alan Verlo, Linda Winkler, Thomas DeFanti, "Adaptive Networking for Tele-Immersion," Proc. of the 5th Immersive Projection Technology/7th Eurographics Virtual Environments Conference (IPT/EGVE), May 16-18, 2001, Stuttgart, Germany, pp. 199-208.
[24] J. Leigh, Johnson, A., DeFanti, T., et al., "A Review of Tele-Immersive Applications in the CAVE Research Network," presented at IEEE VR99, Houston, Texas, 1999. Tom DeFanti, Dan Sandin, Maxine Brown, Dave Pape, Josephine Anstey, Mike Bogucki, Greg Dawe, Andy Johnson, Tom Huang, "Technologies for Virtual Reality/Tele-Immersion Applications: Issues of Research in Image Display and Global Networking", European Commission/National Science Foundation Advanced Research Workshop on "Human-Centered Computing, Online Communities, and Virtual Environments", Judy Brown, Andy van Dam, Rae Earnshaw, Jose Encarnacao, Richard Guedj, Jennifer Preece, Ben Shneiderman, John Vance, eds., Chateau de Bonas, France, June 1-4, 1999.
[25] J.E. Smith and F.W. Weingarten (eds.), "Research Challenges for the Next Generation Internet", Computing Research Association, 1997, p. 20.
[26] "VTK: Visualization Toolkit"; www.vtk.org