QoS Requirements of Multimedia Applications

Brett Berliner, Brian Clark and Albert Hartono
Department of Computer Science and Engineering
The Ohio State University
Columbus, OH 43210
{berliner, clarkbr, hartonoa}@cse.ohio-state.edu
With the growth of the internet and easier access to high
speed internet connections, more and more people are
turning towards computer networks to handle their long
distance voice communication instead of the traditional
telephone system. Using the internet to replace standard
telephone lines has many advantages. One of the biggest
is that using the internet for voice
communication eliminates the concept of “long distance”.
Most companies that provide internet-based voice
communication charge a monthly rate and do not charge on
a per-minute basis like traditional telephone companies.
The most common way of transmitting voice over the
internet is by Voice over Internet Protocol, or VoIP.
Quality of Service requirements are very important to
multimedia applications. Ensuring that these requirements
are met is key to many of today’s applications and creating
new technologies to ensure that stricter requirements can
be met will help create new devices in the future. This is a
study on the values of the QoS Requirements for Multimedia
When the internet was designed, it was intended to be used
for transfer of text or other simple data types, where the
level of service did not matter. The only thing that mattered
was reliability. Concepts such as delay, jitter and packet
loss percentages did not affect the service; the only thing
that mattered was that the service existed. When the
internet started being used for applications such as internet
telephony, streaming live video and even remote surgery,
things like jitter and packet loss began to matter.
Multimedia, unlike text, has a need for service guarantees or
else the services become useless. Trying to carry on a
conversation when words arrive out of order would be quite
frustrating. Thus, the introduction of multimedia into the
internet led to the concept of quality of service.
1. Voice
1.1 Raw Data for VoIP
The process of sending human voice over a computer
network starts with a person speaking into a PC
microphone. The sound waves produced by the voice must
be translated into an electrical signal in order to be sent over
a network. This process of converting the analog signal to a
digital one is called digitization. To digitize human voice
effectively, a sample is captured 8,000 times per second,
that is, at a sampling rate of 8 kHz. It is standard to
use 8 bits per sample, which results in a minimum data
transfer rate of 64 Kbps. At the application layer this digital
signal is encoded and decoded by a codec. The sampling
rates, bit rates and extra information about popular codecs
can be found in Figure 1. Like any other calculation,
encoding and decoding voice signals takes some finite
amount of time. The delay contributions of the various
codecs are also presented in Figure 1. Each of these
compression codecs introduces a different amount of delay.
The delay introduced comes from various sources. The
upper bounds of some of the factors contributing to this
delay are presented in Figure 2 [3].

Figure 1: Audio Compression Standards (Codecs) [1]
[Table not recoverable from the source. Columns: Codec Name, Sampling Rate, Bit Rate (Kbps), Delay Contribution. Surviving entries: "Voice activity detection" (twice), "5.3 and 6.3", "Optimized for low delay".]

Figure 2: Delay Contribution by Components [2]
[Table not recoverable from the source. Columns: Delay Component, Maximum Delay Contribution (ms). Surviving component descriptions:]
The process of converting the actual digital signal into packets.
This delay is only incurred when using modems.
Done to compensate for jitter introduced by networks. This assumes a 1x dejitter buffer is used.
The delay introduced by actually presenting the information to the human recipient.
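The 64 Kbps figure from the digitization step can be checked with a short calculation. The sketch below is illustrative only: the function names are hypothetical, and the 20 ms packetization interval is an assumed value that does not come from the text.

```python
def pcm_bit_rate_kbps(sample_rate_hz: int, bits_per_sample: int) -> float:
    """Uncompressed bit rate in Kbps (1 Kbps = 1000 bits per second)."""
    return sample_rate_hz * bits_per_sample / 1000


def payload_bytes(bit_rate_kbps: float, packet_ms: float) -> float:
    """Bytes of voice data carried by one packet covering packet_ms of audio."""
    bytes_per_second = bit_rate_kbps * 1000 / 8
    return bytes_per_second * packet_ms / 1000


raw_rate = pcm_bit_rate_kbps(8000, 8)      # 8 kHz sampling, 8 bits per sample
print(raw_rate)                            # 64.0
print(payload_bytes(raw_rate, 20))         # 160.0 (assumed 20 ms per packet)
```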
while the other listens, only 50% of the full duplex
connection is used at a time. Also, voice packets are not
transmitted during the silence observed in between words.
By not sending packets containing “dead air”, bandwidth use
is reduced by approximately 10%. Together, these two techniques
total up to a 60% reduction in bandwidth from silence suppression.
2. Video
Once the voice is encoded with a particular codec it is
transmitted over the internet using Internet Protocol (IP).
Video traffic is being sent more and more often in today’s
internet and will only increase in the future. Applications
such as video conferencing are becoming business standards
and many websites, CNN.com for example, offer videos on
demand. Today many homes even have digital cable
television service which transmits video information over a
network. Since video imaging requires lots of data,
compression and reservation protocols are going to become
necessary to support the future of video in networking.
Since IP is a best-effort service, the QoS is not perfect and
some delay, loss and jitter are encountered. When the
encoded signal reaches its intended destination it is decoded
using the same codec used by the sender. Finally the
decoded audio signal is presented to the receiver through a
1.2 Need for Delay Reduction
2.1 Raw Data
From Figures 1 and 2, one can see that the delay introduced
from a codec alone can approach almost 120 ms. This
number does not include the other various delays introduced
by the network such as propagation delay, queuing delay
and transmission delay. Based on ITU recommendation
G.114, the delay in a telephone call should be less than
100-150 ms. The reasoning behind this is psychological:
if the delay is much more than this, the caller will be
dissatisfied with the service. Even though a delay of
100-150 ms is acceptable, most QoS requirements for VoIP ask
for 50-80 ms of delay or less.
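The comparison above amounts to a simple delay budget. The following is a minimal sketch: the 120 ms codec delay and the 150 ms limit are the figures quoted in the text, while the propagation, queuing and transmission values are made-up placeholders chosen only to show the check.

```python
G114_LIMIT_MS = 150  # upper end of the 100-150 ms range quoted from G.114


def one_way_delay_ms(codec_ms, propagation_ms, queuing_ms, transmission_ms):
    """Sum the delay components named in the text."""
    return codec_ms + propagation_ms + queuing_ms + transmission_ms


def within_budget(total_ms, limit_ms=G114_LIMIT_MS):
    """True if the total delay does not exceed the limit."""
    return total_ms <= limit_ms


# Network delays below are hypothetical values, not measurements.
total = one_way_delay_ms(codec_ms=120, propagation_ms=20,
                         queuing_ms=10, transmission_ms=5)
print(total, within_budget(total))  # 155 False
```

Even with modest network delays, a codec path near the 120 ms upper bound leaves too little room, which is why delay reduction matters.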
In order to achieve studio quality picture a video stream is
broken up into 30 frames per second. Each of these frames
contains 525 lines. In each of these frames the y value, or
luminance, is sampled at 13.5 MHz and the two
chrominance values, u and v, are sampled at 6.75 MHz.
This total data rate comes out to (13.5 + 6.75 + 6.75) * 8 =
216 Mbps. Due to this extremely high bit rate, obviously
compression techniques are required to transmit video over
the internet. For different transmission lines different
compression is required. If a channel supports higher
bandwidth then less compression is needed. Conversely, if
a channel has lower bandwidth then a higher compression
ratio is necessary to view the data. This point is illustrated
in Figure 3. It can be seen that for slower channels sending
video, even of lower quality, is just not feasible due to the
enormous compression ratios needed.
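The raw rate and the resulting compression ratios can be reproduced as follows. This is a sketch under stated assumptions: the function names are hypothetical, and the three channel rates are illustrative picks rather than the rows of Figure 3.

```python
def raw_video_rate_mbps(y_mhz=13.5, u_mhz=6.75, v_mhz=6.75, bits_per_sample=8):
    """Raw bit rate: total sampling rate (MHz) times bits per sample."""
    return (y_mhz + u_mhz + v_mhz) * bits_per_sample


def compression_ratio(source_mbps, channel_mbps):
    """How many times the source must be compressed to fit the channel."""
    return source_mbps / channel_mbps


raw = raw_video_rate_mbps()
print(raw)  # 216.0

# Illustrative channel rates (assumed for this example).
for name, mbps in [("56 kb/s", 0.056), ("1.5 Mb/s", 1.5), ("10 Mb/s", 10.0)]:
    print(f"{name}: about {compression_ratio(raw, mbps):.0f}:1")
```

The ratio grows in inverse proportion to the channel rate, which is why video over the slowest links is not practical.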
1.3 Solutions
Since delay must be minimized to ensure satisfactory
telephone service we must employ some techniques to
reduce this delay. The first major way to speed up voice
communication is to compress the audio signal. If the sheer
size of the data being transported is reduced, it will arrive at
the destination more quickly. Some notable low bit rate
compression algorithms used are ITU G.723.1 and G.729A.
Another way to reduce the payload of transmitting voice
over IP is to use silence suppression. Due to the fact that
during normal telephone conversation one person talks
Video uses techniques to compress individual frames, like
JPEG does, but also uses motion prediction to compress the
data further. In fact, most of the time the bit rate required
for video transmission depends solely on motion within
the images. Factors such as screen size, resolution and
scanning rates are almost irrelevant. Motion is defined in
increments of 1k (1024) pixels/second. In normal television
this translates to approximately one square inch of changed
image per second. This change does not need to be in one
contiguous block; it can be scattered throughout the entire
image. Figure 4 illustrates this by showing the number of
simultaneous channels various types of links can support for
different rates of motion.

Figure 3: Required Compression Ratios for Package Television [4]
[Table not recoverable from the source. Surviving entries, in source order: Bit Rate; 168 Mb/s; 933 Mb/s; 30 kb/s; 56 kb/s; 64 – 144 kb/s; 1.5 Mb/s; 10 Mb/s; 42 Mb/s; 200 Mb/s; Film Quality; 2300 Mb/s. Channel labels: PC local LAN; T-1, DSL; Fiber Optic.]

Figure 4: Number of Television Channels for Various Averaged Motions Within the Images [4]
[Table not recoverable from the source. Surviving entries: motion label Very Slow; Pixel Change: 2, 4, 8, 16 kp/s; Bit Rate: 12, 24, 48, 96 kb/s; link rates: 30 kb/s, 56 kb/s, 64 – 144 kb/s, 1.5 Mb/s, 10 Mb/s, 42 Mb/s, 200 Mb/s; channel labels: PC local LAN; T-1, DSL; Fiber Optic.]
3.1 Definition
One type of interactive multimedia is interactive gaming.
Interactive gaming, in this case, refers to players on their
own machine connecting remotely to other machines to
compete in the same event against each other. The device
used to connect could be a PC, a console game system or a
handheld device. Each of the devices already has most of
the game data, such as the engine and the graphics, so only
certain data needs to be sent to the central server. This data
may include character positioning and orientation, as well
as their current action, and the central server sends the
pertinent data to the connected computers for processing.
Figure 5: Delay Requirements for Data Types [6]
[Table not recoverable from the source. Surviving column heading: Delay (ms).]
2.2 Delay Introduced By Compression
compressing/decompressing video is no exception. More
often than not, the latency introduced by this process is
much greater than the latency introduced by digitization and
digital processing in uncompressed format. Since most
video is very data intensive a high compression ratio is
needed. The greater the compression ratio used, the greater
the latency introduced. Typically the delay introduced by
encoding and decoding in a distribution and/or broadcast
scenario is several seconds [5].
3.2 QoS Requirements
These requirements help ensure that gameplay is a smooth,
realistic experience for all users with a minimum internet
connection, depending on the game. Even the inability to
meet one of these requirements often will completely ruin
gamers’ experiences while playing. The QoS requirements
that most directly affect interactive gaming are [7]:
1. a minimum amount of throughput
2. an acceptable end-to-end delay
3. low jitter
4. low packet loss rate
5. high dependability
3. Interactive Gaming
Recently, interactive multimedia, such as network gaming,
remote visualizations, remote surgery and tele-immersion,
has become a very large part of the still developing internet.
These types of applications often have QoS requirements
that are even tougher to satisfy than those of video and voice.
This is often because these
applications generally cannot afford to lose packets or
suffer from any noticeable latency; otherwise there is a good chance
the experience will be affected, if not ruined.
3.3 Throughput
Among researchers, there is a belief that the lower bound
on the acceptable delay for interactive multimedia is 15
ms, which is the amount of time it takes for a 66 Hz monitor
to draw a single frame. With a lower delay requirement, the
monitor could not keep up, and therefore, most of these
methods could not be implemented [6]. Figure 5 shows the
average delay requirements for interactive multimedia in
comparison to those of video and voice.
Throughput is a QoS requirement that varies from game to
game. Most games only require a 56K dial-up connection
(40 Kbps) to run smoothly. For example, two of today’s
most popular online games, Guild Wars and Counter-Strike,
can both be played online with a 56K connection.
Counter-Strike, for example, only needs around 16 Kbps per
connected user to avoid slowdown [7]. This number can
vary greatly depending on the genre of game. Games where
players have to take turns, such as Massively Multiplayer
Online Role Playing Games (MMORPGs) like Everquest or
World of Warcraft, can allow for slower links, as the data
can update while the player is waiting their turn. As a
result, these games often only require a 30-40 Kbps link.
This also applies to real-time strategy games such as
Command and Conquer, where the player tells their units
what to do, and while the unit is processing, the server can
send and receive data. These games generally hover around 20
– 30 Kbps, although the newer the game, the higher the link
speed necessary. Very new first person shooters (FPS),
such as Battlefield 1942, can be played with 16 players on a
40 Kbps connection. However, to take full advantage of all
of the vehicles and weapons, as well as allow all 64 possible
players at a time, each user must have a broadband
connection of around 250 Kbps [8]. Its sequel, Battlefield 2,
needs around that level and offers no guarantees for those
with less speed. In fact, for highest performance with 64
players, a link of 2 Mbps is necessary. Figure 6 summarizes
roughly how much throughput each type of game needs.
Figure 6: Throughput and Delay Across Game Types
[Table garbled in the source. Column headings: Basic, RTS, MM, Intense. Rows: Throughput; End to End Delay (ms). Surviving values, in source order: 40, 250 –, 40, 150 - 150, 50.]
3.4 End-to-End Delay
The most important QoS requirement for online gaming is
definitely end-to-end delay. End-to-end delay is a major
factor in what gamers call lag, which is basically a slang
term for latency. Lag refers to the delay between when a
command is issued by the player, and when it happens on
the screen. A large amount of lag can completely ruin a
gamer’s experience. For elite performance, 50 ms or less for
end-to-end delay is optimal. However, this is only
absolutely necessary for certain games and game types.
While 50 ms is a good value for very intense first person
shooters like Battlefield 2, older first person shooters such as
Counter-Strike, as well as real-time strategy games and
MMORPGs, can survive with a higher delay, just as they can
with lower throughput, since they do not need as fast a link.
Specifically, most of these games can run with a delay of 150 ms or
below [9]. Figure 6 shows the acceptable amount of delay
across game types, while Figure 7 shows how Counter-Strike
is affected by end-to-end delay.
3.5 Jitter and Packet Loss Rate
Jitter will cripple an online game. Packets are time-sensitive, and
any amount of jitter that causes packets to be received late,
or even worse, out of order, affects gameplay the same way as
packet loss. In the study reported in [11], even the users who
experienced a delay of 40 ms with a jitter of 20 ms were affected
greatly. All of the users reported horrible gameplay during
this experiment, even though the delay was never over 100 ms,
which in the game they played (Unreal Tournament 2003)
would not by itself be devastating; some of the users could not
even continue the game [11]. This is due to the jitter. If the
packets don’t arrive on time, then game information can be
lost. Even worse, if the packets arrive out of order, then the
packet that arrives late is useless, and the possibly important
information is lost.
[Figure 7 entry displaced here in the source: < 50 ms, Excellent gameplay]
In network gaming, only data that is relevant to the game is
usually sent. As a result, there can be almost no packet loss,
since all data is important. Effectively, the packet loss has
to be at 0% for a game to run smoothly. The only way that
a packet loss is acceptable is if the packet contains
insignificant details. However, almost no packets (if any)
contain any insignificant details. Therefore, packet loss can
be crippling to a game session.
3.6 Interactive Gaming Dependability
Like most multimedia, network gaming generally uses UDP
for transmission, due to the fact that it is a repeated
transmission to a single source, and the transmission is
usually small. In addition, there is no time for TCP
connection establishment and acknowledgement, as speed is
everything. Therefore, packets cannot be retransmitted, so
the link needs to be dependable. It also needs to always be
open. Even if only two players are against each other in a
Battlefield 2 session, the ability to add in 62 additional
players must be there. They cannot simply open up the line
when it’s needed, because if there isn’t room to be opened
up, the game session will fail.
Also, the server must be able to handle a constant stream of
packets. Although each packet is very small (around 100
bytes), the packets arrive in a steady stream, at intervals
varying from 30 ms to 100 ms depending on the game. For most games, the
maximum amount of traffic must be assumed (that is, every
game room is full), so that if maximum traffic is actually
transmitted, the network is prepared [7].
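The packet size and spacing above translate directly into a per-player bit rate and a worst-case server load. The sketch below assumes the 30-100 ms range describes the interval between packets and ignores protocol header overhead; the function names are hypothetical.

```python
def stream_rate_kbps(packet_bytes: int, interval_ms: float) -> float:
    """Bit rate of a steady packet stream. Bits per millisecond equals Kbps."""
    return packet_bytes * 8 / interval_ms


def server_load_kbps(players: int, packet_bytes: int = 100,
                     interval_ms: float = 30) -> float:
    """Worst-case inbound load when every player slot is occupied."""
    return players * stream_rate_kbps(packet_bytes, interval_ms)


print(stream_rate_kbps(100, 100))            # 8.0 Kbps at the slow end
print(round(stream_rate_kbps(100, 30), 1))   # 26.7 Kbps at the fast end
print(round(server_load_kbps(64)))           # 1707 Kbps for a full 64-player room
```

The per-player range of roughly 8 to 27 Kbps brackets the 16 Kbps per user quoted earlier for Counter-Strike.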
3.7 Mobile Network Gaming
Mobile network gaming is played on devices such as PDAs
and cell phones. When gaming on these devices, there is
generally much less data to send than PC games, due to the
processing limitations of the devices. As a result of these
limitations, technology, bandwidth and time are rarely
devoted to mobile games, meaning these games don’t
particularly work well (if at all). This, however, is
beginning to change as handheld gaming systems like the
Sony PSP become popular.
Figure 7: End-To-End Delay in Counter-Strike [10]
[Table only partly recoverable from the source. Delay ranges: 50-100 ms, 100-150 ms, 150-200 ms, > 200 ms. Effects listed: Noticeably decreased; Significantly affected; Intolerable gameplay (> 200 ms).]
The two major transport services for PDAs and cellular
phones are GPRS (General Packet Radio Service) and
UMTS (Universal Mobile Telecommunications System).
GPRS is a non-voice service that works in unison with
mobile devices to send their data. GPRS is referred to as
“always connected”, since it can perform an almost instant
transmission. Theoretically, GPRS can send data at a rate
of 171.2 Kbps, but to reach this, an operator must
grant the user all of the bandwidth, referred to as timeslots.
Since operators will rarely, if ever, do this, in reality the
bandwidth is often less than 1/8th of this, and only
1/4th of it at best [12]. UMTS is a type of mobile
transmission that relies on radio spectrum transmission.
Again, theoretically, UMTS can transmit anywhere from
384 Kbps to 2 Mbps, but this is only if the space is
reserved [13]. A study done in Germany on a mobile
volleyball game shows the problem with UMTS and GPRS
that doesn’t befall console and PC gaming: mobile
networks are just not set up for the over-provisioning that is
necessary to meet gaming QoS requirements. For instance,
the simple volleyball game had very little data to send
(approximately 20 states per minute, with each state
containing very little data), but the game was very tough to
play, due to the fact that the delay was always between 100
and 200 ms [14].
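To put the GPRS fractions in perspective, the short sketch below converts them to Kbps and compares them with two throughput figures quoted earlier (16 Kbps per Counter-Strike user and 40 Kbps for a basic game). The helper names are hypothetical, and the comparison considers throughput only, not the delay problem described above.

```python
GPRS_PEAK_KBPS = 171.2  # theoretical maximum quoted in the text


def gprs_share_kbps(fraction: float) -> float:
    """Bandwidth available when only a fraction of the timeslots is granted."""
    return GPRS_PEAK_KBPS * fraction


def meets_throughput(required_kbps: float, fraction: float) -> bool:
    """True if the granted share covers the required throughput."""
    return gprs_share_kbps(fraction) >= required_kbps


typical = gprs_share_kbps(1 / 8)   # about 21.4 Kbps
best = gprs_share_kbps(1 / 4)      # about 42.8 Kbps
print(round(typical, 1), round(best, 1))
print(meets_throughput(16, 1 / 8))   # True: enough for 16 Kbps
print(meets_throughput(40, 1 / 8))   # False: short of 40 Kbps
```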
must wait for every action to perform, they will not want to
use the system. The QoS requirements that most directly
affect remote visualizations are:
1. low delay
2. high throughput/bandwidth
3. low latency
4.2 Delay
Generally, the way remote visualizations work is simple – a
computer collects the input, storing it as raw data. That data
is then turned into triangles, which are translated to an
image, and then only the image is sent to the user, who then
views the image or executes a command, modifying the
data and forcing the system to take a new snapshot.
Therefore, most of the data is handled at the side of the
computer that is given the raw data. [16]
For the calculations, the data gathering computer first has to
receive the raw data. Often, this data can approach sizes of
gigabytes or terabytes, but thanks to visualization algorithms,
the size can often be reduced to around 100 MB. The
size of the raw data is denoted from here on as n. Next, the
number of triangles per frame in the image can be written as
K, where K is often 500,000 (but can range from 50,000 to
1,000,000). With n and K, the delay to generate the
triangles from raw data is on the order of O(log(n) + K),
with the actual time depending on the CPU [16].
Although the Sony PSP has much better graphics than
a PDA or a mobile phone, the reason it can support network
gaming is that it contains its
own wireless card that conforms to the IEEE 802.11b
standard [15]. The devices only communicate with a
wireless router, which already has the setup for ensuring
QoS requirements are met (as they would be for laptops or
PCs connected to the wireless router). As a result, PSPs
can be played within range of a wireless router. True mobile
gaming is not effective yet.
With time determined, next is size. The size of a triangle
in bytes is 3 (dimensions) * 3 (points) * 4 (float number) * 2
(shading) + 12 (color) = 86 bytes. So with 500,000
triangles, there will be 43 MB of data. Then, to make them
an image, the data must be copied to the graphics card,
which takes around 43 MB divided by the copying speed. An
average graphics card can process around 100M triangles per
second, so the time to process one frame is 500,000 / 100M,
or 5 ms [16]. Then, the only delay left is how long it takes
to get to the recipient, which must be as small as possible,
because otherwise the user may get the impression that nothing is
happening. If a user goes to zoom in on a data
set, and the processing takes 30 seconds, it will be a highly
frustrating process. However, with the large amount of data
to be dealt with at the original computer, this is part of the
reason remote visualizations will not work. Figure 8 shows
a simplified version of how the data is collected and
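The size and rendering-time estimates for the triangle data can be reproduced with a few lines. This is a sketch with hypothetical function names. Note that the factors as listed (3 * 3 * 4 * 2 + 12) multiply out to 84 bytes, slightly below the 86 bytes quoted, so the per-triangle size is kept as a parameter rather than fixed.

```python
def triangle_bytes(dimensions=3, points=3, float_bytes=4,
                   shading_factor=2, color_bytes=12):
    """Per-triangle size computed from the factors listed in the text."""
    return dimensions * points * float_bytes * shading_factor + color_bytes


def frame_megabytes(num_triangles, bytes_per_triangle):
    """Geometry size of one frame in MB (1 MB = 10**6 bytes)."""
    return num_triangles * bytes_per_triangle / 1e6


def render_ms(num_triangles, triangles_per_second=100e6):
    """Time for the graphics card to process one frame, in milliseconds."""
    return num_triangles * 1000 / triangles_per_second


print(triangle_bytes())                  # 84
print(frame_megabytes(500_000, 86))      # 43.0, using the quoted 86 bytes
print(render_ms(500_000))                # 5.0
```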
4. Remote Visualizations
Remote visualizations are another type of multimedia
becoming a reality as the internet continues to grow.
Basically, a remote user connects to a data set that is either
generated or has previously been generated. The key is that
the data must be interactive – for example, the user can
often do things like rotate the data, zoom in, and even add
in slice planes to see inside the data. Right now, however,
there is just too much in the way of performing consistent
remote visualizations.
Figure 8: Remote Visualization Data Process
4.1 QoS Requirements
To perform consistent, timely visualization transmission, a
few QoS requirements need to be met. If these
requirements cannot be met, the user will almost certainly
lose patience before getting their desired results.
Since the size of the raw data is often in the gigabyte range,
it is clear that remote visualizations require great precision.
Thus, users need to be able to single out important parts of
the data. This can often take many actions, and if a user
carried out from a certain distance. Through communication
lines, digitized information can be transmitted to remote
locations, enabling surgeons to operate on patients located
distantly. Challenges to this concept are numerous, but the
most essential limitations have been the dependability or
quality of service of the communication lines and the issue
of latency, which is the delay time from when the hand
motion is initiated by the surgeon until the remote
manipulator actually moves, and the image is shown on the
surgeon’s monitor [20].
4.3 Throughput/Bandwidth
This is a big factor that keeps remote visualization from
being possible. The transmission speed is the last key.
Obviously, a high speed link is needed at the end, because
the transmission both ways must be as quick as possible, so
that the only delay is at the computer gathering data.
When performing remote visualizations that do NOT work
like the one above, a link speed of around 700
Mbps is usually necessary. This is to make sure the data (which
would be the 100 MB) is transmitted in a timely fashion. The
more observers of the data, the more time is needed. If
compression is applied, then a link of only 100 –
200 Mbps is necessary. Again, the more observers, the
more bandwidth the overall link will need (so 2 users will need around
200 – 400 Mbps from the central server, and so on). In
addition, if stereoscopic rendering (that is, creating the
illusion of depth in the image) is used, the amount of bandwidth
needed is doubled [17].
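The scaling rules above (linear in the number of observers, doubled for stereoscopic rendering) can be written down directly. This is a minimal sketch with hypothetical function names, using the 100 – 200 Mbps per-user range for the compressed case and the 100 MB data size quoted earlier.

```python
def required_link_mbps(users: int, per_user_mbps: float,
                       stereoscopic: bool = False) -> float:
    """Server-side link speed: scales with users, doubles for stereo."""
    total = users * per_user_mbps
    return total * 2 if stereoscopic else total


def transfer_seconds(megabytes: float, link_mbps: float) -> float:
    """Time to move a data set of the given size over the link."""
    return megabytes * 8 / link_mbps


print(required_link_mbps(2, 100), required_link_mbps(2, 200))  # 200 400
print(required_link_mbps(2, 200, stereoscopic=True))           # 800
print(round(transfer_seconds(100, 700), 2))                    # 1.14 seconds
```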
5.1 QoS Requirements
Even though there is little practical experience of
telesurgery at present, it is clear that successful telesurgery
will require a data transfer of robot commands, video and
voice signals, text, computer data, as well as stored and
real-time medical images. A list of provisional network
requirements for telesurgery can therefore be identified.
These QoS requirements include [18]:
1. reliability
2. an acceptable end-to-end delay
3. multiplexing of various data rates
4. low data error rate (BER-bit error rate)
Other requirements and desirable features are likely to
emerge as further telesurgery trials are conducted and more
experience is gained.
4.4 Latency
Although QoS data on exact requirements for latency are
difficult to track down, a simple experiment using a
visualization kit (VTK, in this case) can demonstrate the
effect of latency on distance visualizations [26]. VTK’s
website offers many sample programs in C++ that can be used
to perform visualizations. Although these run solely on
the user’s computer, the “Sleep” function from
<windows.h> can be used to simulate remote visualization
delay. The call Sleep(50) tells the program to
pause for 50 ms before continuing on to the next statement.
Inserting a Sleep(X) call wherever a command appears
in the VTK program will cause the program to pause for X
ms before it continues. Since the computer then waits a fixed
amount of time before beginning every command that is
issued, this is a very effective latency simulation.
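The same idea can be sketched in Python instead of the C++ used in the experiment. The example below does not use VTK: `rotate` is a hypothetical stand-in for an interactive command, and the decorator simply pauses before each call, in the way described above.

```python
import time
from functools import wraps


def with_simulated_latency(delay_ms: float):
    """Wrap a command so that each call first waits delay_ms milliseconds."""
    def decorator(command):
        @wraps(command)
        def delayed(*args, **kwargs):
            time.sleep(delay_ms / 1000.0)  # simulated network round trip
            return command(*args, **kwargs)
        return delayed
    return decorator


@with_simulated_latency(150)
def rotate(angle_deg: float) -> str:
    """Hypothetical stand-in for a visualization command."""
    return f"rotated by {angle_deg} degrees"


start = time.perf_counter()
result = rotate(30)
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"{result} after about {elapsed_ms:.0f} ms")
```

Changing the decorator argument makes it easy to repeat the test at different latencies without editing every command.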
5.2 Reliability and Error Rate
In telesurgery, since human lives are at stake, the
consequences of an error in transmission could be very
serious, and therefore reliable networking techniques
must be used.
Since the data rate associated with robot commands is very
low (typically 19.2 Kbps), there is ample scope to
protect each message with error-protection coding. Each
time the operator issues a command, the transmitting
equipment can send it more than once to the receiving end.
The receiving end can then echo the command back to the
sending end. Only when the command is received and
echoed correctly, say three times in succession, would the
command be executed at the receiving end [18].
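The send-and-echo scheme described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: `noisy_channel` is a made-up error model, the function names and the retry limit are hypothetical, and a real system would combine this with proper error-protection coding.

```python
import random


def noisy_channel(message: bytes, error_rate: float, rng: random.Random) -> bytes:
    """Hypothetical link that corrupts one byte with probability error_rate."""
    if message and rng.random() < error_rate:
        i = rng.randrange(len(message))
        return message[:i] + bytes([message[i] ^ 0xFF]) + message[i + 1:]
    return message


def confirmed(command: bytes, error_rate: float = 0.0, required: int = 3,
              max_attempts: int = 20, rng: random.Random = None) -> bool:
    """True once `required` consecutive round trips carry the command intact."""
    rng = rng or random.Random(0)
    streak = 0
    for _ in range(max_attempts):
        received = noisy_channel(command, error_rate, rng)  # operator to robot
        echoed = noisy_channel(received, error_rate, rng)   # robot to operator
        # The simulation can inspect both legs; a real sender sees only the echo.
        if received == command and echoed == command:
            streak += 1
        else:
            streak = 0  # any error breaks the succession
        if streak == required:
            return True
    return False


print(confirmed(b"MOVE_ARM_LEFT"))                   # True on an error-free link
print(confirmed(b"MOVE_ARM_LEFT", error_rate=1.0))   # False: never echoed intact
```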
A simple, unscientific test reveals that around 150 ms sleep
time starts to noticeably affect the program. It is not a deal
breaker, but it is frustrating at times. At between 225 and
250 ms, the latency becomes impossible to work with.
Even a simple task such as rotating the data to see a
different side of the visualization becomes arduous. Both of
the users who attempted to experiment with the remote data
with a delay of 250 ms became too frustrated before
finishing any tests with slice planes. As a result, even
though these numbers are unscientific, it is clear what kind
of effect latency has on remote visualizations.
One apparent threat to safe telesurgery would be a power
telecommunications network operators employ battery
arrays and ‘hot standby’ generators to take over the task of
powering the network in the event of a mains power failure.
A power cut on the network must be virtually unnoticed by
the user [18].
5. Telesurgery
5.3 Time-Delay
Telesurgery - surgery performed at a certain distance - is
one aspect of telemedicine. The introduction of robotic and
computer technology into surgical operations allows
dexterity to be increased and surgical procedures to be
There is a major constraint that could lead to disastrous
results during surgery, namely time delay. The surgeon
therefore views his or her movements on the computer
interface as they are happening. If the surgical system were
removed to a more distant site, however, it would introduce
a time delay. Visualization of the operating field could be
milliseconds or even seconds behind the real-time
manipulations of the surgeon. Studies showed that the
acceptable limit of time delay in terms of a surgeon’s
perception of safety was roughly 330 ms [19]; satellite
transmission, for example, would introduce a delay of more
than 600 ms.
tele-immersion was one of five key technologies identified
as necessary for the future use of the NGI [25]:
Tele-immersion. Tele-immersion will enable users in
different locations to collaborate in a shared, virtual, or
simulated environment as if they are in the same room. It is
the ultimate synthesis of networking and media technologies
to enhance collaborative environments. Tele-Immersive
applications must combine audio, video, virtual worlds,
simulations, and many other complex technologies. They
will require huge bandwidth, very fast responses, and
guarantees of delivery.
On September 7th, 2001, the world’s first human long-distance
operation was performed between New York, USA
and Strasbourg, France (14,000 km distance), demonstrating
the feasibility and safety of performing a complete surgical
operation from remote locations. The two sites were
connected through a high-speed terrestrial optical-fiber
network that transports data through dedicated connections
using Asynchronous Transfer Mode (ATM) technology. A
bandwidth of 10 Mbps was reserved through a network
that interconnects applications at both sites using a network
termination unit (NTU), which provides a multiservice path
to different applications [19].
6.1 QoS Requirements
In general, the QoS requirements for tele-immersion include
the following four important factors:
1. extremely high network bandwidth
2. low latency
3. constant jitter
4. guarantees of delivery
6.2 Challenges of Tele-Immersion
By monitoring both NTU units at the two ends, the number
of lost packets and the communication quality were
measured. It was revealed that no ATM packet was lost
during any surgical procedure. The round-trip delay by
ATM transport was 78 − 80 ms. Adding 70 ms for video
coding and decoding, plus a few milliseconds for rate
adaptation and Ethernet-to-ATM packet conversion,
movements executed by the surgeon in New York were
apparent within 155 ms on his video screen [19].
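The measured components can be summed and compared with the 330 ms perceived-safety limit and the 600 ms satellite delay quoted in Section 5.3. In the sketch below, the 5 ms conversion delay is an assumed value standing in for "a few milliseconds", and the function name is hypothetical.

```python
SAFETY_LIMIT_MS = 330  # acceptable limit reported in [19]


def total_delay_ms(transport_round_trip_ms, video_codec_ms, conversion_ms):
    """Delay between the surgeon's movement and seeing it on the screen."""
    return transport_round_trip_ms + video_codec_ms + conversion_ms


# 80 ms ATM round trip, 70 ms video coding/decoding, 5 ms assumed conversion.
measured = total_delay_ms(80, 70, 5)
print(measured)                       # 155
print(SAFETY_LIMIT_MS - measured)     # 175 ms of margin
print(600 <= SAFETY_LIMIT_MS)         # False: satellite delay exceeds the limit
```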
Tele-immersion has emerged as a high-end driver for the
Quality of Service (QoS), bandwidth, and reservation
efforts envisioned by the NGI and Internet2 leadership.
From a networking perspective, tele-immersion is a very
challenging technology for several reasons [24]:
• The networks must be in place and tuned to
support high-bandwidth applications
• Low latency, needed for 2-way collaboration, is
hard to specify and guarantee given current
• The speed of light in fiber itself is a limiting factor
over transcontinental and transoceanic distances
• Multicast, unicast, reliable and unreliable data
transmissions (called “flows”) need to be provided
for and managed by the networks and the operating
systems of supercomputer-class workstations
• Real-time considerations for video and audio
reconstruction (“streaming”) are critical to
achieving the feel of telepresence, whether
synchronous or recorded and played back
• The computers, too, are bandwidth limited with
regard to handling very large data for collaboration
• Simulation and data mining are open-ended in
computational and bandwidth needs: there will
never be quite enough computing and bits/second
to fully analyze, and simulate reality for scientific
Another technology which impacts the time-delay
requirement of telesurgery is video compression algorithms
(codecs). It is important that new codecs produce video that
is of higher quality, low latency (< 100 ms), and degrades
5.4 Simultaneous Transfer of Various Types of
The various transferred data in telesurgery consist of video,
voice, images, robot commands, text, and computer data.
These data originate from different equipment and
have different data rates. Therefore, it is crucial that the
network be able to simultaneously transfer
data from sources with widely differing data rates. One
networking technology that is well suited to telesurgery is
ATM, due to its special ability to multiplex
sources with different data rates and its low cell loss rate
6.3 Lag
6. Tele-Immersion
Lag is the term used to describe the perceived sum of all the
sources of latency in a system. Typically, it is thought of as
the delay between action in the real world and the perceived
response of the system to that action. Lag is the critical
Tele-immersion has since entered the NGI and Internet2
vocabulary. In the applications section of the Computing
Research Association’s “Research Challenges for the NGI,”
< 30 ms
< 100 ms
< 30 ms
< 100 ms
< 10 ms
< 100 ms
< 30 ms
< 10 ms
< 30 ms
Figure 9: Tele-Immersion Data Flow Types
Multicast Security
> 1GB/s
> 1GB/s
> 1 Mb/s
issue for usability; reducing lag is a major technical
challenge. Communications latency is only one component
of tele-immersion lag. Effective solutions to reducing lag
must attack the component sources of latency at all levels of
the system. Sources of latency in the communications
system are transmission latency, bandwidth or transfer
latency, switching or routing latency, contention, and
protocol latency.
Most users have difficulty manipulating objects in VR once
lag exceeds 200 ms [24]. When the virtual display is
coupled with the real world, as in tele-robotics, this limit is
approximately 30 ms. Non-network components of the VR
system often together exceed 200 – 300 ms, so there is
actually very little room for wide-area communications
delay in the lag budget.
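The lag budget described above is simple arithmetic, sketched below. The 200 ms and 30 ms limits come from the text; every component value is a hypothetical placeholder, chosen only to show how little of the budget remains for the network.

```python
VR_LAG_LIMIT_MS = 200.0            # manipulation limit in VR, from the text
TELEROBOTICS_LAG_LIMIT_MS = 30.0   # approximate limit when coupled to the real world


def total_lag_ms(components):
    """Lag is treated as the sum of all latency sources, in milliseconds."""
    return sum(components.values())


def network_budget_ms(limit_ms, non_network_ms):
    """Milliseconds left for wide-area communication; negative means over budget."""
    return limit_ms - non_network_ms


# Hypothetical non-network components of a VR system (illustrative values).
non_network = {"tracking": 40.0, "simulation": 60.0, "rendering": 50.0, "display": 30.0}
used = total_lag_ms(non_network)
print(f"non-network lag: {used:.0f} ms")
print(f"left for the network (VR): {network_budget_ms(VR_LAG_LIMIT_MS, used):.0f} ms")
print(f"left for the network (tele-robotics): "
      f"{network_budget_ms(TELEROBOTICS_LAG_LIMIT_MS, used):.0f} ms")
```

A negative result means the end systems alone already exceed the limit, so no network, however fast, could bring the session within it.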
6.4 Jitter
Jitter in the network affects collaborative
coordination more than latency does [23]. Higher latencies with low
jitter still allow collaborators to make reasonable
predictions of how an environment will behave (although
overall task performance will decline). High jitter, however,
reduces predictability, and collaborators are then forced to
employ a purely sequential interaction strategy.
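One common way to quantify jitter is the running interarrival jitter estimate defined for RTP in RFC 3550. The sketch below uses that estimator as an illustration; it is not necessarily the measure used in [23], and the timestamps are made-up values in milliseconds.

```python
def interarrival_jitter(send_times, recv_times):
    """Running jitter estimate in the style of RFC 3550.

    For consecutive packets, D is the change in transit time; the estimate
    moves 1/16 of the way toward |D| with each packet.
    """
    jitter = 0.0
    for i in range(1, len(send_times)):
        d = (recv_times[i] - recv_times[i - 1]) - (send_times[i] - send_times[i - 1])
        jitter += (abs(d) - jitter) / 16.0
    return jitter


send = [0, 20, 40, 60]                 # packets sent every 20 ms (illustrative)
steady_slow = [t + 150 for t in send]  # high but constant latency
erratic_fast = [10, 35, 48, 75]        # lower latency, varying from packet to packet

print("high latency, steady :", interarrival_jitter(send, steady_slow))
print("low latency, erratic :", interarrival_jitter(send, erratic_fast))
```

The steady path has high latency yet no jitter, which matches the case in which collaborators can still predict how the environment will behave.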
6.5 Tele-Immersion Flow Types
Progress in all these areas, however, is expected; tele-immersion
serves as an integrating technology as pieces of
the solution are contributed by the community and the
computer/networking industry. The following table,
developed in discussions with Rick Stevens, director of the
Math and Computer Science Division at Argonne National
Lab, gives our current best estimates and opinions of the
attributes of the nine types of flows simultaneously needed for
an n-way compute- and data-intensive audio, video, and
haptic (touch) tele-immersive session [24].
Each row describes one data flow type:
 Control information: data used to manage the
session, to authenticate users or processes, to
launch processes, to control the display or tracking
systems, and to communicate out of band between
the world servers and VR systems.
 Text provides simple communications capability
within sessions for simple note taking and passing.
 Audio gives ambient auditory cues, allows voice
communications among users, and is used to issue
commands via voice recognition and speech
 Video can allow teleconferencing or remote
monitoring displayed within the virtual world.
 Tracking is achieved with location and orientation
sensors, and captures the position and orientation
of the user. Typically this data is streamed to the
computer responsible for computing the
perspective of the scene. Tele-immersion requires
tracking data to be shared among sites.
 Database is the heart of a tele-immersion
application world. The database contains the
graphical models of virtual scenes, objects, and
data, and since the database is used to provide the
models that are rendered, it must be maintained in
a coherent state across multiple sites. Databases
might be as simple as shared VRML files or as
complex as multi-terabyte scientific datasets, VR
extensions of video serving.
 Simulation provides the basis for dynamic
behaviors, like responding to the users’ actions.
Small-scale simulations often run on the computer
also generating the VR experience, but frequently
the simulation will need a dedicated
supercomputer. User input is captured and
transmitted to the simulation via the network and
the simulation will generate an update, which is
then propagated to each user site for local
rendering. Typically the data transferred to the
simulation is considerably smaller than the data
returned by the simulation.
 Haptics include force and touch sensing/feedback
devices and use a variety of sensors and actuators
that are “attached” to the hands and/or legs of
users. Some systems now generate haptic “images”
that augment or replace visual images. Haptics are
particularly sensitive to latency and jitter
(instantaneous variations in latency).
 Rendering is the transformation of geometric
information into images for display. All VR
environments primarily render graphics locally. As
networks provide bandwidth adequate for
compressed HDTV, however, it will become
reasonable and efficient for scenes to be rendered
remotely and transmitted to each site in real time.
Each flow-type attribute is explained in the following list:
 Latency is the sum of all delays in the system, from
the speed of light in fiber, to operating system
overhead, to tracker settling time and screen
 Bandwidth is the bits/second the system can
 Reliable flows are verified and retransmitted if bad
 Multicast flows go to more than one site at once
 Security involves encryption overhead that may or
may not be warranted or legal
 Streaming data is a constant flow of information
over time, as with video, audio and tracking
 Dynamic QoS can provide ways to service bursty
high-bandwidth needs on request
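The attributes listed above map naturally onto a small record per flow type. The sketch below is one possible representation, not a structure taken from [24]: the class name, field names, and the example values for a haptic flow are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowSpec:
    """Requirements of one flow type, one field per attribute in the list above."""
    name: str
    max_latency_ms: float      # latency bound
    min_bandwidth_bps: float   # bandwidth needed
    reliable: bool             # verified and retransmitted if bad
    multicast: bool            # goes to more than one site at once
    secure: bool               # encryption required
    streaming: bool            # constant flow over time
    dynamic_qos: bool          # bursty needs serviced on request


def violations(spec, measured_latency_ms, measured_bandwidth_bps):
    """Return the names of the requirements that the measurements fail to meet."""
    problems = []
    if measured_latency_ms >= spec.max_latency_ms:
        problems.append("latency")
    if measured_bandwidth_bps < spec.min_bandwidth_bps:
        problems.append("bandwidth")
    return problems


# Hypothetical requirements for a haptic flow (illustrative values only).
haptics = FlowSpec("haptics", 10.0, 1_000_000, False, True, False, True, False)
print(violations(haptics, measured_latency_ms=25.0, measured_bandwidth_bps=2_000_000))
```

Keeping each flow's requirements in one record makes it straightforward to check all nine flows of a session against measured network conditions.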
[9] Borg, Seth, Girard, Eric, Sheldon, Nathan, “The Effect
of Latency on Performance of Warcraft III”;
[10] Almasbakk, Hans, Brekne, Tonnes, Overby, Harald,
“Online Gaming: An Overview”;
[11] Degrande, Natalie, De Vleeschauwer, Danny, Lamotte,
Wim, Monsieurs, Patrick, Quax, Peter, “Objective and
Subjective Evaluation of the Influence of Small Amounts of
Delay and Jitter on a Recent First Person Shooter Game”;
As the internet has evolved, the quality of service it can
deliver has improved. What was once only able to support
simple text transfer is now capable of allowing surgery to
be performed from 3,000 miles away. Simple multimedia
applications such as voice traffic and streaming video are
easy to support with today’s technologies, and applications
that require precise data transfers with large payloads,
once inconceivable, are starting to become
commonplace. Technology will spawn more and more
complex applications, each requiring finer and more strictly
guaranteed quality of service. To support these
future technologies, more effective means of ensuring
quality of service will need to be developed as well.
