LECTURE 1:
Unit-1: Basics of Multimedia Technology:
1. Computers, communication and entertainment
In the era of information technology, we are dealing with free flow of information with no barriers of distance. Take
the case of internet as one of the simplest examples. You can view and download information across the globe within
a reasonable time, if you have a good speed of connection.
Let us spend a moment thinking about the forms in which we access this information. The simplest and the most common of
these is printed text. In every web page, some text material is always present, although the volume of textual
content may vary from page to page. The text materials are supported with graphics, still pictures, animations, video
clips, audio commentaries and so on. All, or at least more than one, of these media, which we can collectively call
"multimedia", are present to convey the information that the web site developers want to communicate to the world
community at large. All these media are therefore utilized to present the information in a meaningful way, in an
attractive style.
The internet is not the only kind of information dissemination involving multiple media. Let us look at some other
examples as well. Television involves two media, audio and video, which should be presented together in a
synchronized manner. If we present the audio ahead of the video, or the video ahead of the audio in time, the results
are far from pleasant. Loss of lip synchronization is noticeable if the audio and the video presentations differ by
as little as 150 milliseconds. If the time lead or lag is of the order of seconds, the purpose of the presentation may
be lost entirely. Say, in some distance learning program, the teacher is explaining something written on a blackboard.
If the audio and the video differ significantly in time, a student will not be able to follow the lecture at all.
So television is also a multimedia system, and now we understand one more requirement of multimedia signals: the
multimedia signals must be synchronized, and if it is not possible to make them absolutely synchronized, they should
at least meet a stringent specification for how much lack of synchronization can be tolerated.
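To make the 150-millisecond figure concrete, the following short Python sketch flags audio/video timestamp pairs whose skew exceeds a tolerance. The function name, the timestamp units and the example values are purely illustrative and are not part of any particular player's API.

```python
# Minimal sketch: flag audio/video presentation-timestamp pairs whose skew
# exceeds a tolerance. The 150 ms figure follows the text above; the
# function name and timestamp values are illustrative only.

SYNC_TOLERANCE_MS = 150  # skew at which loss of lip sync becomes noticeable

def check_sync(audio_pts_ms, video_pts_ms, tolerance_ms=SYNC_TOLERANCE_MS):
    """Return (in_sync, skew_ms) for one audio/video timestamp pair."""
    skew = audio_pts_ms - video_pts_ms   # positive: audio leads the video
    return abs(skew) <= tolerance_ms, skew

print(check_sync(1040, 1000))   # (True, 40)   -> tolerable
print(check_sync(1300, 1000))   # (False, 300) -> lip sync visibly lost
```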
Television is an example where there is only a unidirectional flow of multimedia information, from the transmitter to
the receiver. In standard broadcast television, there is no flow of information in the reverse direction, unless you use
a different device and channel, say, by talking to the television presenter over the telephone. On the internet, of course, you
have interactivity in the sense that you can navigate around the information and make selections through hyperlinks,
but the bulk of the information flow is from the web server to the users. Some applications require a
free flow of multimedia signals between two or more nodes, as is the case with video conferencing, or what we
should more appropriately call multimedia teleconferencing. In multimedia teleconferencing, the nodes,
physically located in different parts of the globe, are equipped with microphones, cameras, a computer supporting
text, graphics and animations, and other supporting devices if required. For example, suppose five eminent
doctors across the continents are holding a live medical conference to discuss a patient's condition. The doctors
should not only see and talk to each other; all of them should observe the patient at the same time and have access to the
patient's medical history, live readings and graphs from the monitoring instruments, visual renderings of the data, and so on. In a
multimedia teleconferencing application of this nature, one must ensure that the end-to-end delays and the turnaround
times are minimal. Moreover, the end-to-end delays between different nodes should not differ significantly
from each other; that is, the delay jitter must be small.
Multimedia: An Introduction
What is Multimedia?

A simple definition of multimedia is ‘multimedia can be any combination of text, graphics, sound, animation
and video, to effectively communicate ideas to users’
Multimedia = Multi + media
Multi = many
Media = medium or means by which information is stored,
transmitted, presented or perceived.
Other definitions of multimedia are:
Definition 1
“Multimedia is any combination of text, graphic art, sound, animation and video delivered to you by computer or
other electronic means.”
Definition 2
“Multimedia is the presentation of a (usually interactive) computer application, incorporating media
elements such as text, graphics, video, animation and sound on computer.”
Types of Multimedia Presentation
Multimedia presentations can be categorized into two types: linear multimedia and interactive multimedia.
a. Linear Multimedia
o In linear multimedia the users have very little control over the presentation. The users simply sit back and watch the
presentation, which normally plays from start to end, or even loops continually, to present the information.
o A movie is a common type of linear multimedia.
b. Interactive Multimedia
o In interactive multimedia, users dictate the flow of delivery. The users control the delivery of elements
and decide what is presented and when.
o Users have the ability to move around or follow different paths through the information presentation.
o The advantage of interactive multimedia is that complex domains of information can easily be
presented, whereas the disadvantage is that users might get lost in the massive "information highway".
o Interactive multimedia is useful for information archives (encyclopedias), education,
training and entertainment.
Multimedia System Characteristics
The four major characteristics of a multimedia system are:
a. Multimedia systems must be computer controlled.
b. All multimedia components are integrated.
c. The interface to the final user may permit interactivity.
d. The information must be represented digitally.
I) Computer controlled
Computers are used for:
o Producing the content of the information – e.g. by using authoring tools, image editors, and sound and video editors.
o Storing the information – providing large and shared capacity for multimedia information.
o Transmitting the information – through the network.
o Presenting the information to the end user – making direct use of computer peripherals such
as display devices (monitors) or sound generators (speakers).
II) Integrated
All multimedia components (audio, video, text, graphics) used in the system must be somehow integrated.
o Examples:
 Every device, such as the microphone and camera, is connected to and controlled by a single computer.
 A single type of digital storage is used for all media types.
 Video sequences are shown on the computer screen instead of a TV monitor.
III) Interactivity
 Three levels of interactivity:
Level 1: Interactivity strictly at the level of information delivery. Users select the time at which the
presentation starts, its order, its speed and the form of the presentation itself.
Level 2: Users can modify or enrich the content of the information, and this modification is recorded.
Level 3: Actual processing of the users' input, with the computer generating a genuine result based on that input.
IV)
Digitally represented
 Digitization: the process of transforming an analog signal into a digital signal.
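As a small illustration of what digitization involves, the sketch below samples a 1 kHz analog tone and quantizes each sample to an 8-bit code. The sample rate and bit depth here are arbitrary choices for the example, not values prescribed above.

```python
# Illustrative digitization: sample a 1 kHz analog tone at 8000 samples/s
# and quantize each sample to 8 bits (256 levels). Parameters are arbitrary.
import math

SAMPLE_RATE = 8000       # samples per second (assumed for the example)
BITS = 8                 # bits per sample
LEVELS = 2 ** BITS       # number of quantization levels

def digitize(duration_s=0.001, freq_hz=1000.0):
    samples = []
    n = int(SAMPLE_RATE * duration_s)
    for i in range(n):
        t = i / SAMPLE_RATE
        analog = math.sin(2 * math.pi * freq_hz * t)    # analog value in [-1, 1]
        code = round((analog + 1) / 2 * (LEVELS - 1))   # map to integer 0..255
        samples.append(code)
    return samples

print(digitize())   # eight integer codes representing 1 ms of the tone
```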
LECTURE 2:
Framework for multimedia systems
For multimedia communication, we have to make judicious use of all the media at our disposal. We have audio,
video, graphics and text as the sources, but first and foremost we need a system which can acquire the
separate media streams and process them together into an integrated multimedia stream. Fig. 1.1 shows the elements
involved in a multimedia transmitter.
Devices like cameras, microphones, keyboards, mice, touch-screens, storage media, etc. are required to feed inputs
from different sources. All further processing till the transmission is done by the computer. The data acquisition from
multiple media is followed by data compression to eliminate inherent redundancies present in the media streams. This
is followed by inter-media synchronization by insertion of time-stamps, integration of individual media streams and
finally the transmission of integrated multimedia stream through a communication channel, which can be a wired or a
wireless medium. The destination end should have a corresponding interface to receive the integrated multimedia
stream through the communication channel. At the receiver, a reversal of the processes involved during transmission
is required.
Fig. 1.2 shows the elements involved in a multimedia receiver.
The media extractor separates the integrated media stream into individual media streams, which undergo decompression and are then presented in a synchronized manner, according to their time-stamps, on different playback units,
such as monitors, loudspeakers, printers/plotters, recording devices, etc.
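The chain described above (acquire, compress, time-stamp, integrate, transmit, then reverse the steps at the receiver) can be pictured with a small conceptual sketch. Everything below is a simplification under stated assumptions: compression is stood in for by zlib, and "integration" is just an interleaved, time-ordered list of chunks rather than a real container or transport format.

```python
# Conceptual sketch of the transmitter/receiver chain described above.
# Compression is faked with zlib and "integration" is an interleaved,
# time-ordered list of (media, timestamp, payload) chunks; real systems
# use proper codecs and container/transport formats.
import zlib
from dataclasses import dataclass

@dataclass
class Chunk:
    media: str        # "audio", "video", "text", ...
    timestamp: float  # presentation time in seconds (the inserted time-stamp)
    payload: bytes

def transmit(streams):
    """streams: dict mapping media name -> list of (timestamp, raw bytes)."""
    chunks = [Chunk(m, ts, zlib.compress(raw))     # compress each media unit
              for m, units in streams.items()
              for ts, raw in units]
    chunks.sort(key=lambda c: c.timestamp)         # integrate by time-stamp
    return chunks                                  # "sent" over the channel

def receive(chunks):
    """Reverse the process: separate media, decompress, keep playback order."""
    playback = {}
    for c in chunks:
        playback.setdefault(c.media, []).append((c.timestamp,
                                                 zlib.decompress(c.payload)))
    return playback

sent = transmit({"audio": [(0.00, b"a0"), (0.02, b"a1")],
                 "video": [(0.00, b"v0")]})
print(receive(sent))
```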
The subject of multimedia is studied from different perspectives in different universities and institutes. In some of the
multimedia courses, the emphasis is on how to generate multimedia production, the authoring tools and the software
associated with these etc. In this course, we don’t cover any of the multimedia production aspects at all. Rather, this
course focuses on the multimedia systems and the technology associated with multimedia signal processing and
communication. We have already posed the technical challenges. In the coming lessons, we shall see in detail how
these challenges, such as compression and synchronization, can be overcome, how the multimedia standards have
been designed to ensure effective multimedia communication, how to integrate the associated media and how to
index and retrieve multimedia sequences. Other than the introduction, this course has been divided into the following
modules:
(i) Basics of Image Compression and Coding.
(ii) Orthogonal Transforms for Image Compression.
(iii) Temporal Redundancies in Video sequences.
(iv) Real-time Video Coding.
(v) Multimedia standards.
(vi) Continuity and synchronization.
(vii) Audio Coding.
(viii) Indexing, Classification and Retrieval.
(ix) Multimedia applications.
MULTIMEDIA DEVICES:
1. CONNECTIONS:
Among the many devices (computer, monitor, disk drive, video disc drive, etc.) there are so many wires and connections that
the setup can resemble the intensive care ward of a hospital.
The equipment required for a multimedia project depends on the content of the project as well as its design. If you can
find content such as sound effects, music, graphic art, QuickTime or AVI movies to use in your project, you may not
need extra tools for making your own. Multimedia developers have separate equipment for digitizing sound from a
microphone or tape, and for scanning photos and other printed matter.
Connection               Transfer rate
Serial port              115 kbits/s
Standard parallel port   115 kbits/s
Original USB             12 Mbits/s
IDE                      3.3-16.7 Mbits/s
SCSI-1                   5 Mbits/s
SCSI-2                   10 Mbits/s
Ultra SCSI               20 Mbits/s
Ultra 2 SCSI             40 Mbits/s
Wide Ultra 2 SCSI        40 Mbits/s
Ultra 3 SCSI             80 Mbits/s
Wide Ultra 3 SCSI        160 Mbits/s
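The rates in the table translate directly into transfer times. The quick sketch below computes the time to move a file over a few of the listed connections; the rates come from the table, while the 100 MB file size is just an illustrative figure.

```python
# Back-of-the-envelope transfer times for a 100 MB asset over a few of the
# connections listed above. Rates are taken from the table; the file size
# is just an example.
RATES_MBITS = {                 # megabits per second
    "Serial port": 0.115,
    "Original USB": 12,
    "Ultra 2 SCSI": 40,
    "Wide Ultra 3 SCSI": 160,
}

FILE_MB = 100                   # megabytes
for name, rate in RATES_MBITS.items():
    seconds = FILE_MB * 8 / rate          # megabytes -> megabits, divide by rate
    print(f"{name:18s}: {seconds:8.1f} s")
```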
SCSI
Small Computer System Interface, or SCSI (pronounced ['skʌzi][1]), is a set of standards for physically
connecting and transferring data between computers and peripheral devices. The SCSI standards define commands,
protocols, and electrical and optical interfaces. SCSI is most commonly used for hard disks and tape drives, but it can
connect a wide range of other devices, including scanners and CD drives. The SCSI standard defines command sets
for specific peripheral device types; the presence of "unknown" as one of these types means that in theory it can be
used as an interface to almost any device, but the standard is highly pragmatic and addressed toward commercial
requirements.
 SCSI is an intelligent interface: it hides the complexity of the physical format. Every device attaches to the SCSI
bus in a similar manner.
 SCSI is a peripheral interface: up to 8 or 16 devices can be attached to a single bus. There can be any number
of hosts and peripheral devices, but there should be at least one host.
 SCSI is a buffered interface: it uses handshake signals between devices. SCSI-1 and SCSI-2 have the option of
parity error checking. Starting with SCSI-U160 (part of SCSI-3), all commands and data are error checked by
a CRC32 checksum.
 SCSI is a peer-to-peer interface: the SCSI protocol defines communication from host to host, host to
peripheral device, and peripheral device to peripheral device. However, most peripheral devices are exclusively
SCSI targets, incapable of acting as SCSI initiators, i.e. unable to initiate SCSI transactions themselves.
Therefore peripheral-to-peripheral communications are uncommon, but possible in most SCSI applications.
The Symbios Logic 53C810 chip is an example of a PCI host interface that can act as a SCSI target.
LECTURE 3:
IDE, ATA, EIDE, ultra ATA, Ultra IDE
The ATA (Advanced Technology Attachment) standard is a standard interface that allows you to connect storage
peripherals to PC computers. The ATA standard was approved on May 12, 1994 by ANSI (document X3.221-1994).
Despite the official name "ATA", this standard is better known by the commercial term IDE (Integrated Drive
Electronics) or Enhanced IDE (EIDE or E-IDE).
The ATA standard was originally intended for connecting hard drives; however, an extension called ATAPI (ATA
Packet Interface) was developed in order to be able to interface other storage peripherals (CD-ROM drives, DVD-ROM drives, etc.) on an ATA interface.
Since the Serial ATA standard (written S-ATA or SATA) has emerged, which allows you to transfer data over a serial
link, the term "Parallel ATA" (written PATA or P-ATA) sometimes replaces the term "ATA" in order to differentiate
between the two standards.
The Principle
The ATA standard allows you to connect storage peripherals directly with the motherboard thanks to a ribbon cable,
which is generally made up of 40 parallel wires and three connectors (usually a blue connector for the motherboard
and a black connector and a grey connector for the two storage peripherals).
On the cable, one of the peripherals must be declared the master and the other the slave. By convention,
the far connector (black) is reserved for the master peripheral and the middle connector (grey) for the slave
peripheral. A mode called cable select (abbreviated as CS or C/S) allows the master and slave peripherals to be
defined automatically, as long as the computer's BIOS supports this functionality.
USB:
Universal Serial Bus (USB) connects more than computers and peripherals. It has the power to connect you with a
whole new world of PC experiences. USB is your instant connection to the fun of digital photography or the limitless
creative possibilities of digital imaging. You can use USB to connect with other people through the power of PC telephony and video conferencing. Once you've tried USB, we think you'll grow quite attached to it! In information
technology, Universal Serial Bus (USB) is a serial bus standard to interface devices to a host computer. USB was
designed to allow many peripherals to be connected using a single standardized interface socket and to improve the
Plug and play capabilities by allowing hot swapping, that is, by allowing devices to be connected and disconnected
without rebooting the computer or turning off the device. Other convenient features include providing power to low-consumption devices without the need for an external power supply and allowing many devices to be used without
requiring manufacturer specific, individual device drivers to be installed.
USB is intended to replace many legacy varieties of serial and parallel ports. USB can connect computer peripherals
such as mice, keyboards, PDAs, gamepads and joysticks, scanners, digital cameras, printers, personal media players,
and flash drives. For many of those devices USB has become the standard connection method. USB was originally
designed for personal computers, but it has become commonplace on other devices such as PDAs and video game
consoles, and as a bridging power cord between a device and an AC adapter plugged into a wall plug for charging
purposes. As of 2008, there were about 2 billion USB devices in the world.
The design of USB is standardized by the USB Implementers Forum (USB-IF), an industry standards body
incorporating leading companies from the computer and electronics industries. Notable members have included
Agere (now merged with LSI Corporation), Apple Inc., Hewlett-Packard, Intel, NEC, and Microsoft.
Firewire
Presentation of FireWire Bus (IEEE 1394)
The IEEE 1394 bus (name of the standard to which it makes reference) was developed at the end of 1995 in order to provide an
interconnection system that allows data to circulate at a high speed and in real time. The company Apple gave it the commercial
name "FireWire", which is how it is most commonly known. Sony also gave it commercial name, i.Link. Texas Instruments
preferred to call it Lynx.
FireWire is a port that exists on some computers that allows you to connect peripherals (particularly digital cameras) at a very
high bandwidth. There are expansion boards (generally in PCI or PC Card / PCMCIA format) that allow you to equip a computer
with FireWire connectors. FireWire connectors and cables can be easily spotted thanks to their shape as well as the FireWire logo.
FireWire Connectors
There are different FireWire connectors for each of the IEEE 1394 standards.
 The IEEE 1394a standard specifies two connectors:
o 1394a-1995 connectors;
o 1394a-2000 connectors, called mini-DV because they are used on Digital Video (DV) cameras.
 The IEEE 1394b standard specifies two types of connectors that are designed so that 1394b Beta cables can be plugged into
Beta and Bilingual connectors, but 1394b Bilingual cables can only be plugged into Bilingual connectors:
o 1394b Beta connectors;
o 1394b Bilingual connectors.
How the FireWire Bus Works
The IEEE 1394 bus has about the same structure as the USB bus except that it is a cable made up of six wires (2 pairs for the
data and the clock and 2 wires for the power supply) that allow it to reach a bandwidth of 800 Mb/s (soon it should be able to
reach 1.6 Gb/s, or even 3.2 Gb/s down the road). The two wires for the clock are the major difference between the USB bus and
the IEEE 1394 bus, i.e. the possibility of operating in two transfer modes:

Asynchronous transfer mode: this mode is based on a transmission of packets at variable time intervals. This means that the
host sends a data packet and waits to receive a receipt notification from the peripheral. If the host receives a receipt notification,
it sends the next data packet. Otherwise, the first packet is resent after a certain period of time.

Synchronous (isochronous) mode: this mode allows data packets of specific sizes to be sent at regular intervals. A node called the Cycle Master
is in charge of sending a synchronisation packet (called a Cycle Start packet) every 125 microseconds. This way, no receipt
notification is necessary, which guarantees a set bandwidth. Moreover, given that no receipt notification is necessary, the
method of addressing a peripheral is simplified and the saved bandwidth allows you to gain throughput.
Another innovation of the IEEE 1394 standard: bridges (systems that allow you to link buses to other buses) can be used.
Peripheral addresses are set with a node (i.e. peripheral) identifier encoded on 16 bits. This identifier is divided into two fields: a
10-bit field that identifies the bridge and a 6-bit field that specifies the node. Therefore, it is possible to connect 1,023 bridges
(that is, 2^10 - 1), on each of which there can be 63 nodes (2^6 - 1), which means it is possible to address 65,535 peripherals! The IEEE 1394
standard allows hot swapping. While the USB bus is intended for peripherals that do not require a lot of resources (e.g. a mouse
or a keyboard), the IEEE 1394 bandwidth is larger and is intended to be used for new, unknown multimedia (video acquisition,
etc.).
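The 16-bit node identifier described above (a 10-bit bridge/bus field plus a 6-bit node field) can be illustrated with a few lines of bit manipulation. The function names are invented for this sketch only.

```python
# Sketch of the 16-bit IEEE 1394 node identifier described above: a 10-bit
# bus (bridge) field followed by a 6-bit physical node field. Function
# names are invented for this illustration.
def pack_node_id(bus_id, node_id):
    assert 0 <= bus_id < 1024 and 0 <= node_id < 64
    return (bus_id << 6) | node_id        # high 10 bits: bus, low 6 bits: node

def unpack_node_id(identifier):
    return identifier >> 6, identifier & 0x3F

nid = pack_node_id(bus_id=5, node_id=17)
print(hex(nid), unpack_node_id(nid))      # 0x151 (5, 17)
```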
LECTURE 4:
MEMORY STORAGE DEVICES:
The memory requirement of a multimedia project is estimated as the space required on a floppy disk, hard
disk or CD-ROM, not the random access memory used while your computer is running. You must have a sense of how much
storage the project content (color images, text, and the programming code that glues it all together) will require.
If you are making a multimedia project, you will also need memory for storing and archiving the working files used during
production, such as audio and video files.
RAM
Random-access memory (usually known by its acronym, RAM) is a form of computer data storage. Today it takes
the form of integrated circuits that allow the stored data to be accessed in any order (i.e., at random). The word
random thus refers to the fact that any piece of data can be returned in a constant time, regardless of its physical
location and whether or not it is related to the previous piece of data.[1]
ROM:
Read-only memory (usually known by its acronym, ROM) is a class of storage media used in computers and other
electronic devices. Because data stored in ROM cannot be modified (at least not very quickly or easily), it is mainly
used to distribute firmware. In its strictest sense, ROM refers only to mask ROM (the oldest type of solid state
ROM), which is fabricated with the desired data permanently stored in it, and thus can never be modified. However,
more modern types such as EPROM and flash EEPROM can be erased and re-programmed multiple times; they are
still described as "read-only memory"(ROM) because the reprogramming process is generally infrequent,
comparatively slow, and often does not permit random access writes to individual memory locations. Despite the
simplicity of mask ROM, economies of scale and field-programmability often make reprogrammable technologies
more flexible and inexpensive, so that mask ROM is rarely used in new products as of 2007.
Zip, Jaz, SyQuest and optical storage devices:
For years the SyQuest 44 MB removable cartridges were the most widely used portable medium among multimedia
developers. Zip drives, with their likewise inexpensive 100 MB, 250 MB and 750 MB cartridges built on floppy disk
technology, significantly eroded SyQuest's market share for removable media. Iomega's Jaz cartridges, built
on hard drive technology, provide one or two gigabytes of removable storage and have transfer rates fast enough
for multimedia developers.
Other storage devices are:
 Digital versatile disc (DVD)
 Flash or thumb drive
 CD-ROM players
 CD recorders
 CD-RW
INPUT DEVICES:
 Keyboard
 Mice
 Trackball
 Touchscreen
 Magnetic card encoders and readers
 Graphics tablets
 Scanners
 Optical character recognition (OCR) devices
 Infrared remotes
 Voice recognition systems
 Digital cameras
OUTPUT HARDWARE:
Presentation of the audio and visual components of your multimedia project requires hardware that may not be
attached to the computer itself, such as speakers, amplifiers and monitors. There is no better test of the benefit of
good output hardware than to feed the audio output of your computer into an external amplifier system.
Some output devices are:
 Audio devices
 Amplifiers and speakers
 Portable media players
 Monitors
 Video devices
 Projectors
 Printers
Communication Devices:
Communication among workgroup members and with the client is essential for the efficient and assured
completion of a project. When you need it immediately, an internet connection is required. If you and your client
are both connected via the internet, a combination of communication by e-mail and FTP (File Transfer Protocol)
may be the most cost-effective and efficient solution for creative developers and managers. Various
communication devices are listed below:
 Modem
 ISDN and DSL
 Cable modems
LECTURE 5:
CD-AUDIO, CD ROM, CD-I:
CD Audio (or CD-DA -Digital Audio):
The first widely used CDs were music CDs that appeared in the early 1980s. CD Audio is the format for storing
recorded music in digital form, as on the CDs commonly found in music stores. The Red Book
standards, created by Sony and Philips, specify details such as the size of the pits and lands, how
the audio is organized and where it is located on the CD, as well as how errors are corrected. A CD Audio disc can
hold up to about 75 minutes of uncompressed sound. To provide high quality, the music is sampled at 44.1 kHz,
16-bit stereo. Because of the high-quality sound of audio CDs, they quickly became very popular. Other CD formats
evolved from the Red Book standards.
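The Red Book parameters quoted above fix the data rate of CD audio, and a line or two of arithmetic shows roughly how much data 74-75 minutes of such audio represents. This is only back-of-the-envelope arithmetic from the stated sampling parameters.

```python
# Red Book arithmetic: uncompressed PCM at 44.1 kHz, 16 bits per sample,
# two channels. Only the stated parameters are used.
SAMPLE_RATE = 44_100        # samples per second, per channel
BITS = 16
CHANNELS = 2

bytes_per_second = SAMPLE_RATE * (BITS // 8) * CHANNELS     # 176,400 B/s
minutes = 74
total_bytes = bytes_per_second * 60 * minutes
print(f"{bytes_per_second} bytes/s, about {total_bytes / 1e6:.0f} million bytes "
      f"for {minutes} minutes of audio")
```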
CDROM:
Although the Red Book standards were excellent for audio, they were useless for data, text, graphics and
video. The Yellow Book standards built upon the Red Book, adding specifications for a track to accommodate data,
thus establishing a format for storing data, including video and audio, in digital form on a compact disc. CD-ROM
also provided a better error-checking scheme, which is important for data. One drawback of the Yellow Book
standards is that they allowed various manufacturers to determine their own methods of organizing and accessing
data. This led to incompatibilities across computer platforms. For the first few years of its existence, the Compact
Disc was a medium used purely for audio. However, in 1985 the Yellow Book CD-ROM standard was established by
Sony and Philips, which defined a non-volatile optical computer data storage medium using the same physical
format as audio compact discs, readable by a computer with a CD-ROM drive.
CD-I (Interactive):
CD-i or Compact Disc Interactive is the name of an interactive multimedia CD player developed and marketed
by Royal Philips Electronics N.V. CD-i also refers to the multimedia Compact Disc standard utilized by the CD-i
console, also known as Green Book, which was co-developed by Philips and Sony in 1986 (not to be confused with
MMCD, the pre-DVD format also co-developed by Philips and Sony). The first Philips CD-i player, released in 1991
and initially priced around USD $700, is capable of playing interactive CD-i discs, Audio CDs, CD+G
(CD+Graphics), Karaoke CDs, and Video CDs (VCDs), though the latter requires an optional "Digital Video Card" to
provide MPEG-1 decoding. Developed by Philips in 1986, the specifications for CD-I were published in the Green
Book. CD-I is a platform-specific format; it requires a CD-I player, with a proprietary operating system, attached to a
television set. Because of the need for specific CD-I hardware, this format has had only marginal success in the
consumer market. One of the benefits of CD-I is its ability to synchronize sound and pictures on a single track of the
disc.
PRESENTATION DEVICE AND USER INTERFACE IN MULTIMEDIA
LECTURE 6:
2. LANs and multimedia: internet, World Wide Web & multimedia distribution networks - ATM & ADSL
• Networks
Telephone networks dedicate a set of resources that forms a complete path from end to end for the duration of the
telephone connection. The dedicated path guarantees that the voice data can be delivered from one end to the other
end in a smooth and timely way, but the resources remain dedicated even when there is no talking. In contrast, digital
packet networks, for communication between computers, use time-shared resources (links, switches, and routers) to
send packets through the network. The use of shared resources allows computer networks to be used at high
utilization, because even small periods of inactivity can be filled with data from a different user. The high utilization
and shared resources create a problem with respect to the timely delivery of video and audio over data networks.
Current research centers around reserving resources for time-sensitive data, which will make digital data networks
more like telephone voice networks.
Internet
The Internet and intranets, which use the TCP protocol suite, are the most important delivery vehicles for multimedia
objects. TCP provides communication sessions between applications on hosts, sending streams of bytes for which
delivery is always guaranteed by means of acknowledgments and retransmission. User Datagram Protocol (UDP) is a
``best-effort'' delivery protocol (some messages may be lost) that sends individual messages between hosts. Internet
technology is used on single LANs and on connected LANs within an organization, which are sometimes called
intranets, and on ``backbones'' that link different organizations into one single global network. Internet technology
allows LANs and backbones of totally different technologies to be joined together into a single, seamless network.
Part of this is achieved through communications processors called routers. Routers can be accessed from two or more
networks, passing data back and forth as needed. The routers communicate information on the current network
topology among themselves in order to build routing tables within each router. These tables are consulted each time a
message arrives, in order to send it to the next appropriate router, eventually resulting in delivery.
Token ring
Token ring [31] is a hardware architecture for passing packets between stations on a LAN. Since a single circular
communication path is used for all messages, there must be a way to decide which station is allowed to send at any
time. In token ring, a ``token,'' which gives a station the right to transmit data, is passed from station to station. The
data rate of a token ring network is 16 Mb/s.
Ethernet
Ethernet [31] LANs use a common wire to transmit data from station to station. Mediation between transmitting
stations is done by having stations listen before sending, so that they will not interfere with each other. However, two
stations could begin to send at the same time and collide, or one station could start to send significantly later than
another but not know it because of propagation delay. In order to detect these other situations, stations continue to
listen while they transmit and determine whether their message was possibly garbled by a collision. If there is a
collision, a retransmission takes place (by both stations) a short but random time later. Ethernet LANs can transmit
data at 10 Mb/s. However, when multiple stations are competing for the LAN, the throughput may be much lower
because of collisions and retransmissions.
Switched Ethernet
Switches may be used at a hub to create many small LANs where one large one existed before. This reduces
contention and permits higher throughput. In addition, Ethernet is being extended to 100Mb/s throughput. The
combination, switched Ethernet, is much more appropriate to multimedia than regular Ethernet, because existing
Ethernet LANs can support only about six MPEG video streams, even when nothing else is being sent over the LAN.
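The "about six MPEG video streams" figure follows from dividing the usable Ethernet throughput by the roughly 1.5 Mb/s rate of an MPEG-1 stream. The sketch below shows the arithmetic; the usable-throughput fraction is an assumption, since a shared Ethernet rarely sustains its nominal 10 Mb/s.

```python
# Rough capacity arithmetic behind the "about six MPEG streams" figure.
# The 1.5 Mb/s MPEG-1 rate is from the text; the usable-throughput fraction
# is an assumption, since shared Ethernet rarely sustains its full 10 Mb/s
# because of collisions and retransmissions.
def max_streams(link_mbps, usable_fraction, stream_mbps=1.5):
    return int(link_mbps * usable_fraction // stream_mbps)

print(max_streams(10, 0.9))     # shared 10 Mb/s Ethernet   -> about 6 streams
print(max_streams(16, 0.9))     # 16 Mb/s token ring        -> about 9 streams
print(max_streams(100, 0.9))    # switched/fast Ethernet    -> about 60 streams
```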
ATM
Asynchronous Transfer Mode(ATM) [29, 32] is a new packet-network protocol designed for mixing voice, video, and
data within the same network. Voice is digitized in telephone networks at 64 Kb/s (kilobits per second), which must
be delivered with minimal delay, so very small packet sizes are used. On the other hand, video data and other
business data usually benefit from quite large block sizes. An ATM packet consists of 48 octets (the term used in
communications for eight bits, called a byte in data processing) of data preceded by five octets of control information.
An ATM network consists of a set of communication links interconnected by switches. Communication is preceded
by a setup stage in which a path through the network is determined to establish a circuit. Once a circuit is established,
53-octet packets may be streamed from point to point.
ATM networks can be used to implement parts of the Internet by simulating links between routers in separate
intranets. This means that the ``direct'' intranet connections are actually implemented by means of shared ATM links
and switches.
ATM, both between LANs and between servers and workstations on a LAN, will support data rates that will allow
many users to make use of motion video on a LAN.
• Data-transmission techniques
Modems
Modulator/demodulators, or modems, are used to send digital data over analog channels by means of a carrier signal
(sine wave) modulated by changing the frequency, phase, amplitude, or some combination of them in order to
represent digital data. (The result is still an analog signal.) Modulation is performed at the transmitting end and
demodulation at the receiving end. The most common use for modems in a computer environment is to connect two
computers over an analog telephone line. Because of the quality of telephone lines, the data rate is commonly limited
to 28.8 Kb/s. For transmission of customer analog signals between telephone company central offices, the signals are
sampled and converted to ``digital form'' (actually, still an analog signal) for transmission between offices. Since the
customer voice signal is represented by a stream of digital samples at a fixed rate (64 Kb/s), the data rate that can be
achieved over analog telephone lines is limited.
ISDN
Integrated Service Digital Network (ISDN) extends the telephone company digital network by sending the digital
form of the signal all the way to the customer. ISDN is organized around 64Kb/s transmission speeds, the speed used
for digitized voice. An ISDN line was originally intended to simultaneously transmit a digitized voice signal and a
64Kb/s data stream on a single wire. In practice, two channels are used to produce a 128Kb/s line, which is faster
than the 28.8Kb/s speed of typical computer modems but not adequate to handle MPEG video.
ADSL
Asymmetric Digital Subscriber Lines (ADSL) [33-35] extend telephone company twisted-pair wiring to yet greater
speeds. The lines are asymmetric, with an outbound data rate of 1.5 Mb/s and an inbound rate of 64 Kb/s. This is
suitable for video on demand, home shopping, games, and interactive information systems (collectively known as
interactive television), because 1.5 Mb/s is fast enough for compressed digital video, while a much slower ``back
channel'' is needed for control. ADSL uses very high-speed modems at each end to achieve these speeds over twisted-pair wire.
ADSL is a critical technology for the Regional Bell Operating Companies (RBOCs), because it allows them to use
the existing twisted-pair infrastructure to deliver high data rates to the home.
• Cable systems
Cable television systems provide analog broadcast signals on a coaxial cable, instead of through the air, with the
attendant freedom to use additional frequencies and thus provide a greater number of channels than over-the-air
broadcast. The systems are arranged like a branching tree, with ``splitters'' at the branch points. They also require
amplifiers for the outbound signals, to make up for signal loss in the cable. Most modern cable systems use fiber
optic cables for the trunk and major branches and use coaxial cable for only the final loop, which services one or two
thousand homes. The root of the tree, where the signals originate, is called the head end.
Cable modems
Cable modems are used to modulate digital data, at high data rates, into an analog 6-MHz-bandwidth TV-like signal.
These modems can transfer 20 to 40 Mb/s in a frequency bandwidth that would have been occupied by a single
analog TV signal, allowing multiple compressed digital TV channels to be multiplexed over a single analog channel.
The high data rate may also be used to download programs or World Wide Web content or to play compressed video.
Cable modems are critical to cable operators, because they enable them to compete with the RBOCs using ADSL.
Set-top box
The STB is an appliance that connects a TV set to a cable system, terrestrial broadcast antenna, or satellite broadcast
antenna. The STB in most homes has two functions. First, in response to a viewer's request with the remote-control
unit, it shifts the frequency of the selected channel to either channel 3 or 4, for input to the TV set. Second, it is used
to restrict access and block channels that are not paid for. Addressable STBs respond to orders that come from the
head end to block and unblock channels.
• Admission control
Digital multimedia systems that are shared by multiple clients can deliver multimedia data to a limited number of
clients. Admission control is the function which ensures that once delivery starts, it will be able to continue with the
required quality of service (ability to transfer isochronous data on time) until completion. The maximum number of
clients depends upon the particular content being used and other characteristics of the system.
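In essence, admission control is a simple bookkeeping check: a new stream is admitted only if the bandwidth already committed plus the new request still fits within capacity. The sketch below is a minimal illustration of that idea; the class name and the numbers are illustrative, and real systems also account for disk, buffer and CPU budgets.

```python
# Minimal sketch of admission control: admit a new stream only if the
# bandwidth already committed, plus the new request, still fits within
# capacity. Names and numbers are illustrative.
class AdmissionController:
    def __init__(self, capacity_mbps):
        self.capacity = capacity_mbps
        self.committed = 0.0

    def request(self, stream_mbps):
        if self.committed + stream_mbps <= self.capacity:
            self.committed += stream_mbps
            return True    # admitted: the promised quality can be sustained
        return False       # refused up front rather than degrading everyone

ac = AdmissionController(capacity_mbps=10)
print([ac.request(1.5) for _ in range(8)])   # six admitted, last two refused
```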
• Digital watermarks
Because it is so easy to transmit perfect copies of digital objects, many owners of digital content wish to control
unauthorized copying. This is often to ensure that proper royalties have been paid. Digital watermarking [38, 39]
consists of making small changes in the digital data that can later be used to determine the origin of an unauthorized
copy. Such small changes in the digital data are intended to be invisible when the content is viewed. This is very
similar to the ``errors'' that mapmakers introduce in order to prove that suspect maps are copies of their maps. In other
circumstances, a visible watermark is applied in order to make commercial use of the image impractical.
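One common way of making such small, invisible changes is to hide an identifier in the least-significant bits of pixel values. The toy sketch below shows only why the change is visually negligible; it is not a description of any robust commercial watermarking scheme, which typically works in a transform domain and is designed to survive compression and editing.

```python
# Toy illustration of the idea: hide an origin identifier in the least-
# significant bit of 8-bit pixel values, so each pixel changes by at most
# one grey level. This shows why the change is invisible; it is not a
# robust watermarking scheme.
def embed(pixels, bits):
    return [(p & ~1) | b for p, b in zip(pixels, bits)]   # overwrite the LSB

def extract(pixels):
    return [p & 1 for p in pixels]

image = [200, 201, 198, 55, 56, 57, 120, 121]   # illustrative 8-bit samples
mark  = [1, 0, 1, 1, 0, 0, 1, 0]                # bits identifying this copy
marked = embed(image, mark)
print(marked)                    # pixel values change by at most 1
print(extract(marked) == mark)   # True: the origin can be recovered
```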
LECTURE 7:
Multimedia architecture
In this section we show how the multimedia technologies are organized in order to create multimedia systems, which
in general consist of suitable organizations of clients, application servers, and storage servers that communicate
through a network. Some multimedia systems are confined to a stand-alone computer system with content stored on
hard disks or CD-ROMs. Distributed multimedia systems communicate through a network and use many shared
resources, making quality of service very difficult to achieve and resource management very complex.
• Single-user stand-alone systems
Stand-alone multimedia systems use CD-ROM disks and/or hard disks to hold multimedia objects and the scripting
metadata to orchestrate the playout. CD-ROM disks are inexpensive to produce and hold a large amount of digital
data; however, the content is static--new content requires creation and physical distribution of new disks for all
systems. Decompression is now done by either a special decompression card or a software application that runs on
the processor. The technology trend is toward software decompression.
• Multi-user systems
Video over LANs
Stand-alone multimedia systems can be converted to networked multimedia systems by using client-server remote-file-system technology to enable the multimedia application to access data stored on a server as if the data were on a
local storage medium. This is very convenient, because the stand-alone multimedia application does not have to be
changed. LAN throughput is the major challenge in these systems. Ethernet LANs can support less than 10 Mb/s, and
token rings 16 Mb/s. This translates into six to ten 1.5Mb/s MPEG video streams. Admission control is a critical
problem. The OS/2* LAN server is one of the few products that support admission control [40]. It uses priorities with
token-ring messaging to differentiate between multimedia traffic and lower-priority data traffic. It also limits the
multimedia streams to be sure that they do not sum to more than the capacity of the LAN. Without some type of
resource reservation and admission control, the only way to give some assurance of continuous video is to operate
with small LANs and make sure that the server is on the same LAN as the client. In the future, ATM and fast
Ethernet will provide capacity more appropriate to multimedia.
Direct Broadcast Satellite
Direct Broadcast Satellite (DBS), which broadcasts up to 80 channels from a satellite at high power, arrived in 1995
as a major force in the delivery of broadcast video. The high power allows small (18-inch) dishes with line-of-sight to
the satellite to capture the signal. MPEG compression is used to get the maximum number of channels out of the
bandwidth. The RCA/Hughes service employs two satellites and a backup to provide 160 channels. This large
number of channels allows many premium and special-purpose channels as well as the usual free channels. Many
more pay-per-view channels can be provided than in conventional cable systems. This allows enhanced pay-per-view,
in which the same movie is shown with staggered starting times of half an hour or an hour.
DBS requires a set-top box with much more function than a normal cable STB. The STB contains a demodulator to
reconstruct the digital data from the analog satellite broadcast. The MPEG compressed form is decompressed, and a
standard TV signal is produced for input to the TV set. The STB uses a telephone modem to periodically verify that
the premium channels are still authorized and report on use of the pay-per-view channels so that billing can be done.
Interactive TV and video to the home
Interactive TV and video to the home [2-5] allow viewers to select, interact with, and control video play on a TV set
in real time. The user might be viewing a conventional movie, doing home shopping, or engaging in a network game.
The compressed video flowing to the home requires high bandwidth, from 1.5 to 6 Mb/s, while the return path, used
for selection and control, requires far lower bandwidth.
The STB used for interactive TV is similar to that used for DBS. The demodulation function depends upon the
network used to deliver the digital data. A microprocessor with memory for limited buffering as well as an MPEG
decompression chip is needed. The video is converted to a standard TV signal for input to the TV set. The STB has a
remote-control unit, which allows the viewer to make choices from a distance. Some means are needed to allow the
STB to relay viewer commands back to the server, depending upon the network being used.
Cable systems appear to be broadcast systems, but they can actually be used to deliver different content to each home.
Cable systems often use fiber optic cables to send the video to converters that place it on local loops of coaxial cable.
If a fiber cable is dedicated to each final loop, which services 500 to 1500 homes, there will be enough bandwidth to
deliver an individual signal to many of those houses. The cable can also provide the reverse path to the cable head
end. Ethernet-like protocols can be used to share the same channel with the other STBs in the local loop. This
topology is attractive to cable companies because it uses the existing cable plant. If the appropriate amplifiers are not
present in the cable system for the back channel, a telephone modem can be used to provide the back channel.
As mentioned above, the asymmetric data rates of ADSL are tailored for interactive TV. The use of standard twisted-pair wire, which has been brought to virtually every house, is attractive to the telephone industry. However, the
twisted pair is a more noisy medium than coaxial cable, so more expensive modems are needed, and distances are
limited. ADSL can be used at higher data rates if the distance is further reduced.
Interactive TV architectures are typically three-tier, in which the client and server tiers interact through an application
server. (In three-tier systems, the tier-1 systems are clients, the tier-2 systems are used for application programs, and
the tier-3 systems are data servers.) The application tier is used to separate the logic of looking up material in indexes,
maintaining the shopping state of a viewer, interacting with credit card servers, and other similar functions from the
simple function of delivering multimedia objects.
The key research questions about interactive TV and video-on-demand are not computer science questions at all.
Rather, they are the human-factors issues concerning ease of the on-screen interface and, more significantly, the
marketing questions regarding what home viewers will find valuable and compelling.
Internet over cable systems
World Wide Web browsing allows users to see a rich text, video, sound, and graphics interface and allows them to
access other information by clicking on text or graphics. Web pages are written in HyperText Markup Language
(HTML) and use an application communications protocol called HTTP. The user responses, which select the next
page or provide a small amount of text information, are normally quite short. On the other hand, the graphics and
pictures require many times the number of bytes to be transmitted to the client. This means that distribution systems
that offer asymmetric data rates are appropriate.
Cable TV systems can be used to provide asymmetric Internet access for home computers in ways that are very
similar to interactive TV over cable. The data being sent to the client is digitized and broadcast over a prearranged
channel over all or part of the cable system. A cable modem at the client end tunes to the right channel and
demodulates the information being broadcast. It must also filter the information destined for the particular station
from the information being sent to other clients. The low-bandwidth reverse channel is the same low-frequency band
that is used in interactive TV. As with interactive TV, a telephone modem might be used for the reverse channel. The
cable head end is then attached to the Internet using a router. The head end is also likely to offer other services that
Internet Service Providers sell, such as permanent mailboxes. This asymmetric connection would not be appropriate
for a Web server or some other type of commerce server on the Internet, because servers transmit too much data for
the low-speed return path. The cable modem provides the physical link for the TCP/IP stack in the client computer.
The client software treats this environment just like a LAN connected to the Internet.
Video servers on a LAN
LAN-based multimedia systems [4, 6, 15] go beyond the simple, client-server, remote file system type of video
server, to advanced systems that offer a three-tier architecture with clients, application servers, and multimedia
servers. The application servers provide applications that interact with the client and select the video to be shown. On
a company intranet, LAN-based multimedia could be used for just-in-time education, on-line documentation of
procedures, or video messaging. On the Internet, it could be used for a video product manual, interactive video
product support, or Internet commerce. The application server chooses the video to be shown and causes it to be sent
to the client.
There are three different ways that the application server can cause playout of the video: By giving the address of the
video server and the name of the content to the client, which would then fetch it from the video server; by
communicating with the video server and having it send the data to the client; and by communicating with both to set
up the relationship.
The transmission of data to the client may be in push mode or pull mode. In push mode, the server sends data to the
client at the appropriate rate. The network must have quality-of-service guarantees to ensure that the data gets to the
client on time. In pull mode, the client requests data from the server, and thus paces the transmission.
The current protocols for Internet use are TCP and UDP. TCP sets up sessions, and the server can push the data to the
client. However, the ``moving-window'' algorithm of TCP, which prevents client buffer overrun, creates
acknowledgments that pace the sending of data, thus making it in effect a pull protocol. Another issue in Internet
architecture is the role of firewalls, which are used at the gateway between an intranet and the Internet to keep
potentially dangerous or malicious Internet traffic from getting onto the intranet. UDP packets are normally never
allowed in. TCP sessions are allowed, if they are created from the inside to the outside. A disadvantage of TCP for
isochronous data is that error detection and retransmission is automatic and required--whereas it is preferable to
discard garbled video data and just continue.
Resource reservation is just beginning to be incorporated on the Internet and intranets. Video will be considered to
have higher priority, and the network will have to ensure that there is a limit to the amount of high-priority traffic that
can be admitted. All of the routers on the path from the server to the client will have to cooperate in the reservation
and the use of priorities.
Video conferencing
Video conferencing, which will be used on both intranets and the Internet, uses multiple data types, and serves
multiple clients in the same conference. Video cameras can be mounted near a PC display to capture the user's
picture. In addition to the live video, these systems include shared white boards and show previously prepared
visuals. Some form of mediation is needed to determine which participant is in control. Since the type of multimedia
data needed for conferencing requires much lower data rates than most other types of video, low-bit-rate video, using
approximately eight frames per second and requiring tens of kilobits per second, will be used with small window
sizes for the ``talking heads'' and most of the other visuals. Scalability of a video conferencing system is important,
because if all participants send to all other participants, the traffic goes up as the square of the number of participants.
This can be made linear by having all transmissions go through a common server. If the network has a multicast
facility, the server can use that to distribute to the participants.
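The scalability remark can be made concrete with a stream count: a full mesh among N participants needs on the order of N squared one-way streams, whereas relaying through a common server keeps each participant's traffic constant, roughly 2N streams in total on the access links. The sketch below is only that arithmetic.

```python
# Stream-count arithmetic behind the scalability remark: a full mesh of N
# participants needs N*(N-1) one-way streams, while relaying everything
# through a common server needs roughly 2N streams on the access links
# (one uplink and one downlink per participant).
def mesh_streams(n):
    return n * (n - 1)

def server_streams(n):
    return 2 * n

for n in (4, 8, 16):
    print(n, mesh_streams(n), server_streams(n))
# 4 -> 12 vs 8,  8 -> 56 vs 16,  16 -> 240 vs 32
```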
LECTURE 8:
Multimedia Software
 Familiar Tools
 Multimedia Authoring Tools
 Elemental Tools
Familiar Tools
 Word Processors
_ Microsoft Word
_ WordPerfect
 Spreadsheets
_ Excel
 Databases
_ Q+E Database/VB
 Presentation Tools
_ PowerPoint
MULTIMEDIA AUTHORING TOOL:
A multimedia authoring tool is a program that helps you write multimedia applications. A multimedia authoring tool
enables you to create a final application merely by linking together objects, such as a paragraph of text, an
illustration, or a song. They are used exclusively for applications that present a mixture of textual, graphical, and
audio data.
With multimedia authoring software you can make video productions including CDs and DVDs, design interactivity
and user interface, animations, screen savers, games, presentations, interactive training and simulations.
Types of authoring tools:
There are basically three types of authoring tools. These are as following.
 Card- or Page-based Tools
 Icon-based Tools
 Time-based Tools
Card- or Page-based Tools
In these authoring systems, elements are organized as pages of a book or stack of cards.
The authoring system lets you link these pages or cards into organized sequence and they also allow you to play
sound elements and launch animations and digital videos. Page-based authoring systems are object-oriented: the
objects are the buttons, graphics, etc. Each object may contain a programming script activated when an event
related to that object occurs.
EX: Visual Basic.
Icon-based Tools
Icon-based, event-driven tools provide a visual programming approach to organizing and presenting multimedia.
First you build the flowchart of events, tasks and decisions by using appropriate icons from a library. These icons
can include menu choices, graphic images and sounds. When the flowchart is built, you can add your content:
text, graphics, animations, sounds and video movies.
EX: Authorware Professional
Time-based Tools:
Time-based authoring tools are the most common of multimedia authoring tools. In these authoring systems,
elements are organized along a time line. They are best to use when you have a message with a beginning and
an end. Sequentially organized graphic frames are played back at a speed that you can set. Other elements (such
as audio events) are triggered at a given time or location in the sequence of events.
EX: Animation Works Interactive
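As a toy illustration of the timeline idea, the sketch below places media events at points along a single time line and fires those reached by the playback position. The event names and structure are invented for this example and do not correspond to any real authoring tool's format.

```python
# Toy sketch of the timeline idea behind time-based authoring tools: media
# events sit at points on a single time line and fire when playback reaches
# them. The events and actions are invented for this example.
timeline = [
    (0.0, "show", "title.bmp"),
    (2.0, "play", "intro.wav"),
    (5.5, "show", "scene1.bmp"),
    (8.0, "stop", "intro.wav"),
]

def play(timeline, until):
    """Fire every event scheduled at or before `until` seconds, in order."""
    for t, action, asset in sorted(timeline):
        if t <= until:
            print(f"t={t:4.1f}s  {action} {asset}")

play(timeline, until=6.0)   # fires the first three events
```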
LECTURE 9:
Elemental Tools:
Elemental tools help us work with the important basic elements of your project: its graphics, images, sound, text
and moving pictures.
Elemental tools include:
 Painting And Drawing Tools
 CAD And 3-D Drawing Tools
 Image Editing Tools
 OCR Software
 Sound Editing Programs
 Tools For Creating Animations And Digital Movies
 Helpful Accessories
Painting And Drawing Tools:
Painting and drawing tools are the most important items in your toolkit because the impact
of the graphics in your project will likely have the greatest influence on the end user.
Painting software is dedicated to producing excellent bitmapped images.
Drawing software is dedicated to producing line art that is easily printed to paper. Drawing packages include
powerful and expensive computer-aided design (CAD) software.
Ex: DeskDraw, DeskPaint, Designer
CAD And 3-D Drawing Tools
CAD (computer-aided design) is software used by architects, engineers, drafters,
artists and others to create precision drawings or technical illustrations. It can be used to create two-dimensional
(2-D) drawings or three-dimensional (3-D) models. The CAD images can spin about in space, with lighting conditions
exactly simulated and shadows properly drawn. With CAD software you can stand in front of your work and view
it from any angle, making judgments about its design.
Ex: AutoCAD
Image Editing Tools
Image editing applications are specialized and powerful tools for enhancing and retouching
existing bitmapped images. These programs are also indispensable for rendering images used in multimedia
presentations. Modern versions of these programs also provide many of the features and tools of painting and
drawing programs, and can be used to create images from scratch as well as images digitized from scanners,
digital cameras or artwork files created by painting or drawing packages.
Ex: Photoshop
OCR Software
Often you will have printed matter and other text to incorporate into your project, but no
electronic text file. With Optical Character Recognition
(OCR) software, a flat-bed scanner and your computer you can save many hours of typing printed words and get
the job done faster and more accurately.
Ex: Perceive
Sound Editing Programs
Sound editing tools for both digitized and MIDI sound let you see music as well as hear
it. By drawing the representation of the sound in a waveform, you can cut, copy, paste and edit segments of the sound
with great precision, and make your own sound effects.
Using editing tools to make your own MIDI files requires knowing about keys, notations and instruments and you
will need a MIDI synthesizer or device connected to the computer.
Ex: SoundEdit Pro
Tools For Creating Animations And Digital Movies
Animations and digital movies are sequences of bitmapped graphic scenes (frames), rapidly
played back. But animations can also be made within an authoring system by rapidly changing the location of
objects to generate an appearance of motion.
Movie-making tools let you edit and assemble video clips captured from a camera, animations, scanned images and
other digitized movie segments. The completed clip, often with added transitions and visual effects, can be played
back.
Ex: Animator Pro and SuperVideo Windows
Helpful Accessories
No multimedia toolkit is complete without a few indispensable utilities to perform
some odd but repeated tasks. These are the accessories. For example, a screen-grabber is essential: because
bitmap images are so common in multimedia, it is important to have a tool for grabbing all or part of the screen
display so that you can import it into your authoring system or copy it into an image-editing application.
LECTURE 10:
Anti-aliasing
One of the most important techniques in making graphics and text easy to read and pleasing to the eye on-screen is
anti-aliasing. Anti-aliasing is a cheaty way of getting round the low 72dpi resolution of the computer monitor and
making objects appear as smooth as if they'd just stepped out of a 1200dpi printer (nearly).
Take a look at these images. The letter a on the left is un-anti-aliased and looks coarse compared
with the letter on the right.
If we zoom in we can see better what's happening. Look at how the un-anti-aliased example below left breaks up
curves into steps and jagged outcrops. This is what gives the letter its coarse appearance. The example on the right is
the same letter, same point size and everything, but with anti-aliasing turned on in Photoshop's text tool. Notice how
the program has substituted shades of grey around the lines which would otherwise be broken across a pixel.
But anti-aliasing is more than just making something slightly fuzzy so that you can't see the jagged edges: it's a way
of fooling the eye into seeing straight lines and smooth curves where there are none.
To see how anti-aliasing works, let's take a diagonal line drawn across a set of pixels. In the example left the pixels
are marked by the grid: real pixels don't look like that of course, but the principle is the same.
Pixels around an un-anti-aliased line can only be part of the line or not part of it: so the computer draws the line as a
jagged set of pixels roughly approximating the course of our original nice smooth line. (Trivia fact: anti-aliasing was
invented at MIT's Media Lab. So glad they do do something useful there....)
When the computer anti-aliases the line it works out how much of each in-between pixel would be covered by the
diagonal line and draws that pixel as an intermediate shade between background and foreground. In our simple-minded example here this is shades of grey. This close up, the anti-aliasing is obvious and actually looks worse than
the un-anti-aliased version, but try taking your glasses off, stepping a few yards back from the screen and screwing up
your eyes a bit to emulate the effect of seeing the line on a VGA monitor covered in crud at its right size. Suddenly a
nice, smooth line pops into view.
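To make the coverage idea above concrete, here is a minimal Python sketch (not how Photoshop actually does it): each pixel is supersampled and shaded by the fraction of it that lies under a made-up diagonal line.

# Hypothetical illustration: anti-alias the line y = 0.5 * x on a small grid
# by supersampling each pixel and shading it by its estimated coverage.
WIDTH, HEIGHT, SUB = 16, 8, 4          # pixel grid and sub-samples per axis

def coverage(px, py):
    """Fraction of pixel (px, py) lying under the line y = 0.5 * x."""
    hits = 0
    for i in range(SUB):
        for j in range(SUB):
            x = px + (i + 0.5) / SUB   # sub-sample position inside the pixel
            y = py + (j + 0.5) / SUB
            if y <= 0.5 * x:
                hits += 1
    return hits / (SUB * SUB)

# 0.0 = background (white), 1.0 = foreground (black); in-between = grey shades
image = [[coverage(x, y) for x in range(WIDTH)] for y in range(HEIGHT)]
for row in image:
    print(" ".join(f"{v:.2f}" for v in row))

Pixels fully inside or outside the line come out 0 or 1; the in-between values are the grey halo that fools the eye.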
So how does one go about anti-aliasing an image? Just be grateful you don't have to do it by hand. Most screen
design programs, including Photoshop and Paint Shop Pro, include anti-alias options for things like text and line tools.
The important thing is simply to remember to do it, and to do it at the appropriate time.
There are far too many graphics out on the Web that are perfectly well-designed, attractive and fitted to their purpose
but end up looking amateurish because they haven't been anti-aliased. Equally, there are plenty of graphics that have
turned to visual mush because they've been overworked with the anti-alias tool.
Generally, I guess, the rules are these:
Always anti-alias text except when the text is very small. This is to taste but I reckon on switching off anti-aliasing in
Photoshop below about 12 points. If you're doing a lot with text this size, you really oughtn't be putting it in a graphic
but formatting ASCII instead.
Always anti-alias rasterised EPSs (see the accompanying page for details). Except when you don't want to, of course.
If attempting to anti-alias something manually, or semi-manually, such as by putting a grey halo round a block black
graphic, then only apply the effect at the last possible stage. And always, always, always bear in mind the target
background colour. It's a fat lot of good anti-aliasing a piece of blue text on a white background, if the target page is
orange, because the anti-aliased halo is going to be shades of white-orange. I spent two hours re-colouring in a logo
after doing exactly that. Doh!
Never confuse blur and anti-aliasing. The former is a great help in making things appear nice and smooth if applied to
specific parts of an image, but it'll make your image just look runny if used all over.
That's about it. Anti-aliasing is of immense importance, especially in turning EPSs into something pleasant to look at
onscreen, as I explain in the next couple of pages.
LECTURE 11:
ANIMATION
Animation is achieved by adding motion to still images or objects. It may also be defined as the creation of moving
pictures one frame at a time. Animation grabs attention, and makes a multimedia product more interesting and
attractive.
There are a few types of animation:
Layout transition
It is the simplest form of animation. Examples of transitions are spiral, stretch and zoom.
Process / information transition
Animation can be used to describe complex information or processes in an easier way, such as providing visual cues (e.g. how things work).
Object movement
Object movements are more complex animations, such as an animated GIF or animated scenes.
How does animation work?
- Animation is possible because of:
o A biological phenomenon known as persistence of vision: an object seen by the human eye remains chemically mapped on the eye's retina for a brief time after viewing.
o A psychological phenomenon called phi: the human mind needs to conceptually complete a perceived action.
- The combination of persistence of vision and phi makes it possible for a series of images that are changed very slightly and very rapidly, one after another, to seemingly blend together into a visual illusion of movement.
- E.g. a few cels or frames of a rotating compass logo, when continuously and rapidly changed, make the arrow of the compass appear to be spinning.
- Still images are flashed in sequence to provide the illusion of animation.
- The speed at which the images change is called the frame rate.
- Film is typically delivered at 24 frames per second (fps).
- In reality the projector light flashes twice per frame, increasing the flicker rate to 48 times per second and removing any perceptible flicker.
- The more interruptions per second, the more continuous the beam of light appears, and the smoother the animation.
Animation Techniques
- Cel animation
o A series of progressively different graphics is used for each frame of film.
o Made famous by Disney.
- Stop motion
o Miniature three-dimensional sets (stage, objects) are used.
o Objects are moved carefully between shots.
- Computer animation (digital cel and sprite animation)
o Employs the same logic and procedural concepts as cel animation.
o Objects are drawn using 3-D modelling software.
o Objects and background are drawn on different layers, which can be put on top of one another.
o Sprite animation is animation of a moving object (a sprite).
- Computer animation (key frame animation)
o Key frames are drawn to provide the pose and detailed characteristics of characters at important points in the animation.
o E.g. specify the start and end of a walk, or the top and bottom of a fall.
o 3-D modelling and animation software then performs the tweening process.
o Tweening fills the gaps between the key frames to create a smooth animation (see the sketch after this list).
- Hybrid technique
o A technique that mixes cel and 3-D computer animation; it may also include live footage.
- Kinematics
o The study of the motion of jointed structures (such as people).
o Realistic animation of such movement can be complex; the latest technology uses motion capture for complex movement animation.
- Morphing
o The process of transitioning from one image to another.
o When morphing, a few key elements (such as the nose in both images) are set to share the same location in the final image.
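A minimal Python sketch of the tweening step mentioned above: linear interpolation of one property between two key-frame poses. The key-frame values and frame count are invented for illustration.

# Linear tweening between two key frames of a single property (x position).
start_x, end_x = 0.0, 100.0     # key-frame poses (hypothetical values)
frames = 11                     # number of frames including both key frames

for f in range(frames):
    t = f / (frames - 1)        # 0.0 at the first key frame, 1.0 at the last
    x = (1 - t) * start_x + t * end_x
    print(f"frame {f}: x = {x:.1f}")

Real tweening interpolates many properties at once (position, rotation, shape), often with easing curves rather than a straight line.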
LECTURE 12:
VIDEO ON DEMAND
- Video can add great impact to your multimedia presentation due to its ability to draw people's attention.
- Video is also very hardware-intensive; it places the highest performance demands on your computer.
o Storage and bandwidth: full-screen, uncompressed video uses over 20 megabytes per second (MBps) of bandwidth and storage space, and the processor must be capable of handling that much data in real time (a worked estimate follows below).
- To get the highest video performance, we should:
o Use video compression hardware to allow you to work with full-screen, full-motion video.
o Use a sophisticated audio board to allow you to use CD-quality sounds.
o Install a super-fast RAID (Redundant Array of Independent Disks) system that will support high-speed data transfer rates.
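As a rough check of the "over 20 megabytes per second" figure, here is the arithmetic for one common full-screen format (640×480 pixels, 24-bit color, 30 frames per second); the exact rate depends on resolution and frame rate.

# Uncompressed data rate for full-screen, full-motion 24-bit video.
width, height = 640, 480
bytes_per_pixel = 3            # 24-bit RGB
fps = 30

bytes_per_second = width * height * bytes_per_pixel * fps
print(bytes_per_second / 1_000_000, "MB per second")   # about 27.6 MB/s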
Analog vs Digital Video
- Digital video is beginning to replace analog video in both the professional (production house and broadcast station) and consumer video markets.
- Digital video offers superior quality at a given cost.
- Why?
o Digital video reduces the generational losses suffered by analog video.
o Digital mastering means that quality will never be an issue.
Obtaining Video Clips
- If using analog video, we need to convert it to digital format first (in other words, we need to digitize the analog video).
- Sources for analog video include:
o Existing video content: beware of licensing and copyright issues.
o New footage (i.e. shoot your own video): ask permission from all the persons who appear or speak, as well as permission for any audio or music used.
How Video Works (Video Basics)
- Light passes through the camera lens and is converted to an electronic signal by a Charge Coupled Device (CCD).
- Most consumer-grade cameras have a single CCD.
- Professional-grade cameras have three CCDs, one each for the red, green and blue color information.
- The output of the CCD is processed by the camera into a signal containing three channels of color information and a synchronization pulse (sync).
- If each channel of color information is transmitted as a separate signal on its own conductor, the signal output is called RGB, which is the preferred method for higher-quality and professional video work.
LECTURE 13:
IMAGES:
An image (from Latin imago) is an artifact, usually two-dimensional (a picture), that has a similar
appearance to some subject—usually a physical object or a person.
The word image is also used in the broader sense of any two-dimensional figure such as a map, a graph, a pie chart,
or an abstract painting. In this wider sense, images can also be rendered manually, such as by drawing, painting,
carving, rendered automatically by printing or computer graphics technology, or developed by a combination of
methods, especially in a pseudo-photograph.
A volatile image is one that exists only for a short period of time. This may be a reflection of an object by a mirror, a
projection of a camera obscura, or a scene displayed on a cathode ray tube. A fixed image, also called a hard copy, is
one that has been recorded on a material object, such as paper or textile by photography or digital processes.
A mental image exists in an individual's mind: something one remembers or imagines. The subject of an image need
not be real; it may be an abstract concept, such as a graph, function, or "imaginary" entity. For example, Sigmund
Freud claimed to have dreamt purely in aural-images of dialogues. The development of synthetic acoustic
technologies and the creation of sound art have led to a consideration of the possibilities of a sound-image made up of
irreducible phonic substance beyond linguistic or musicological analysis.
Still image
A still image is a single static image, as distinguished from a moving image (see below). This phrase is used in
photography, visual media and the computer industry to emphasize that one is not talking about movies, or in very
precise or pedantic technical writing such as a standard.
A film still is a photograph taken on the set of a movie or television program during production, used for promotional
purposes.
CAPTURING AND EDITING IMAGE:
The image you see on your monitor is a bitmap stored in video memory, updated about every
1/60 of a second or faster, depending upon your monitor's scan rate. As you assemble images for your multimedia
project, you may need to capture and store an image directly from your screen. The simplest way to capture what you
see on your screen is to press the proper keys on your keyboard.
- Both the Macintosh and Windows environments have a clipboard where text and images are temporarily stored when you cut or copy them within an application. In Windows, when you press Print Screen, a copy of your screen image goes to the clipboard. From the clipboard you can paste the captured bitmap into an application.
- Screen-capture utilities for Macintosh and Windows go a step further and are indispensable to the multimedia artist. With a keystroke they let you select an area of the screen and save the selection in various formats.
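A minimal sketch of a programmatic screen grab in Python, assuming the third-party Pillow library is installed (its ImageGrab helper works on Windows and macOS); this is just one way to get the screen bitmap into a file for an authoring tool.

# Grab the current screen contents and save them as a bitmap file.
from PIL import ImageGrab       # Pillow's screen-capture helper

screen = ImageGrab.grab()       # captures the full screen as an image
screen.save("capture.png")      # save it for import into an authoring system
print(screen.size, screen.mode)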
Image editing
The best way to manipulate a bitmap is to use an image-editing program. These king-of-the-mountain programs
let you not only retouch images but also do tricks like placing your face at the helm of a square-rigger.
In addition to letting you enhance and make composite images, image-editing software also allows
you to alter and distort images. A colored image of a red rose can be changed into a blue or purple
rose. Morphing is another effect that can be used to manipulate still images or to create
bizarre animated transformations. Morphing allows you to smoothly blend two images so that one image
seems to melt into the other.
SCANNING IMAGE:
Document scanning or image scanning is the action or process of converting text and graphic paper
documents, photographic film, photographic paper or other files to digital images. This analog-to-digital
conversion process is required for computer users to be able to view such material as electronic files.
LECTURE:14
Color palette:
In computer graphics, a palette is either a given, finite set of colors for the management of digital
images (that is, a color palette), or a small on-screen graphical element for choosing from a limited set of choices,
not necessarily colors (such as a tools palette).
Depending on the context (an engineer's technical specification, an advertisement, a programmers' guide,
an image file specification, a user's manual, etc.) the term palette and related terms such as Web palette and RGB
palette, for example, can have somewhat different meanings.
The following are some of the widely used meanings for color palette in computing.
- The total number of colors that a given system is able to generate or manage; the term full palette is often encountered in this sense. For example, Highcolor displays are said to have a 16-bit RGB palette.
- The limited fixed color selection that a given display adapter can offer when its hardware registers are appropriately set (fixed palette selection). For example, the Color Graphics Adapter (CGA) can be set to show the so-called palette #1 or palette #2 in color graphics mode: two combinations of 4 fixed colors each.
- The limited selection of colors that a given system is able to display simultaneously, generally picked from a wider full palette; the terms selected colors or picked colors are also used. In this case, the color selection is always chosen by software, either by the user or by a program. For example, the standard VGA display adapter is said to provide a palette of 256 simultaneous colors from a total of 262,144 different colors.
- The hardware registers of the display subsystem into which the selected colors' values are loaded in order to show them, also referred to as the hardware palette or Color Look-Up Table (CLUT); see the sketch after this list. For example, the hardware registers of the Commodore Amiga are known both as its color palette and its CLUT, depending on the source.
- A given color selection officially standardized by some body or corporation; default palette or system palette are also used for this meaning. For example, the well-known Web colors for use with Internet browsers, or the Microsoft Windows default palette.
- The limited color selection inside a given indexed-color image file such as GIF, although the expressions color table or color map are also generally used.
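A minimal Python sketch of how an indexed-color image and its color look-up table work together; the palette and pixel indices below are made up for illustration.

# Indexed color: pixels store small palette indices, the palette (CLUT)
# stores the actual RGB values.
palette = [                      # hypothetical 4-entry color look-up table
    (0, 0, 0),                   # 0: black
    (255, 255, 255),             # 1: white
    (255, 0, 0),                 # 2: red
    (0, 0, 255),                 # 3: blue
]
pixels = [                       # a tiny 2x4 indexed image
    [0, 1, 2, 3],
    [3, 2, 1, 0],
]

# "Displaying" the image means looking every index up in the palette.
rgb_image = [[palette[i] for i in row] for row in pixels]
print(rgb_image[0][2])           # (255, 0, 0)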
vector images
A vector image is made up of a series of mathematical instructions. These instructions define the lines
and shapes that make up a vector image. As well as shape, size and orientation, the file stores information about the
outline and fill colour of these shapes.
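To make "a series of mathematical instructions" concrete, here is a small Python sketch that writes a tiny SVG file by hand; the shapes and colors are arbitrary.

# A vector image is stored as drawing instructions, not pixels.
svg = """<svg xmlns="http://www.w3.org/2000/svg" width="200" height="100">
  <rect x="10" y="10" width="80" height="80" fill="orange" stroke="black"/>
  <circle cx="150" cy="50" r="40" fill="none" stroke="blue" stroke-width="3"/>
</svg>
"""
with open("example.svg", "w") as f:
    f.write(svg)                 # scales to any size without losing detail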
Features common to vector images
- Scalability/resolution independence: images display at any size without loss of detail
- File sizes are usually smaller than raster images
- Easily convertible to raster formats (rasterising)
- Unable to display on the Web without a browser plug-in or conversion to a raster format
File formats for vector images
There are three types of file format for storing vector images:
1. Native vector image file formats e.g. Adobe Illustrator (AI), Adobe FreeHand (FH11), CorelDraw (CDR)
Native file formats are created and used by drawing software. They are usually proprietary and best
supported by the programs that create them. Native file formats can be saved in Web compatible formats, or
as metafiles for printing.
2. Metafiles/Page Description Language (PDL) e.g. Encapsulated PostScript (EPS), Windows Metafile
(WMF), Computer Graphic Metafile (CGM), Adobe Portable Document Format (PDF)
Metafiles contain images (vector and raster) and the instructions for displaying them. They can act as
containers for vector images when native formats are not supported. Unlike WMF, which only works in
Windows, EPS is platform independent
3. Web compatible vector image file formats e.g. Adobe Flash (SWF) and Scalable Vector Graphics (SVG)
Flash and SVG are the two main vector image standards for the Web. Both support animation (see Vector
Images Part II) and both require browser plug-ins. Flash is almost ubiquitous and widely supported, but
remains proprietary. SVG is an 'open' standard based on XML, but is currently not nearly as well supported by
browsers.
LECTURE 15:
JPEG (pronounced "jay-peg") is a standardized image compression mechanism. JPEG stands for
Joint Photographic Experts Group, the original name of the committee that wrote the standard. JPEG is
designed for compressing either full-color or gray-scale images of natural, real-world scenes. It works well on
photographs, naturalistic artwork, and similar material; not so well on lettering, simple cartoons, or line
drawings. JPEG handles only still images, but there is a related standard called MPEG for motion pictures. JPEG
is "lossy," meaning that the decompressed image isn't quite the same as the one you started with. (There are
lossless image compression algorithms, but JPEG achieves much greater compression than is possible with
lossless methods.) JPEG is designed to exploit known limitations of the human eye, notably the fact that small
color changes are perceived less accurately than small changes in brightness. Thus, JPEG is intended for
compressing images that will be looked at by humans. If you plan to machine-analyze your images, the small
errors introduced by JPEG may be a problem for you, even if they are invisible to the eye.
A useful property of JPEG is that the degree of lossiness can be varied by adjusting compression parameters ( Q
factor). This means that the image maker can trade off file size against output image quality. You can make
*extremely* small files if you don't mind poor quality; this is useful for applications such as indexing image
archives. Conversely, if you aren't happy with the output quality at the default compression setting, you can
jack up the quality until you are satisfied, and accept lesser compression.
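A minimal sketch of this size/quality trade-off using the Pillow library in Python (assuming an existing photo.jpg; the quality values are arbitrary examples).

# Save the same photograph at different JPEG quality settings and compare sizes.
import os
from PIL import Image

img = Image.open("photo.jpg")           # hypothetical source image
for quality in (90, 50, 10):            # higher quality = larger file
    name = f"out_q{quality}.jpg"
    img.save(name, "JPEG", quality=quality)
    print(name, os.path.getsize(name), "bytes")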
Another important aspect of JPEG is that decoders can trade off decoding speed against image quality, by using
fast but inaccurate approximations to the required calculations. Some viewers obtain remarkable speedups in
this way. (Encoders can also trade accuracy for speed, but there's usually less reason to make such a sacrifice
when writing a file.)
JPEG compresses very well indeed when working with its intended type of image (photographs and suchlike). For full-color
images, the uncompressed data is normally 24 bits/pixel. The best known lossless compression methods can
compress such data about 2:1 on average. JPEG can typically achieve 10:1 to 20:1 compression without visible
loss, bringing the effective storage requirement down to 1 to 2 bits/pixel. 30:1 to 50:1 compression is possible
with small to moderate defects, while for very-low-quality purposes such as previews or archive indexes, 100:1
compression is quite feasible. An image compressed 100:1 with JPEG takes up the same space as a full-color
one-tenth-scale thumbnail image, yet it retains much more detail than such a thumbnail.
JPEG compression
The compression method is usually lossy, meaning that some visual quality is lost in the process and cannot be
restored. There are variations on the standard baseline JPEG that are lossless; however, these are not widely
supported.
There is also an interlaced "Progressive JPEG" format, in which data is compressed in multiple passes of
progressively higher detail. This is ideal for large images that will be displayed while downloading over a slow
connection, allowing a reasonable preview after receiving only a portion of the data. However, progressive JPEGs are
not as widely supported, and even some software which does support them (such as some versions of Internet
Explorer) only displays the image once it has been completely downloaded.
There are also many medical imaging systems that create and process 12-bit JPEG images. The 12-bit JPEG format
has been part of the JPEG specification for some time, but again, this format is not as widely supported.
Lossless editing
A number of alterations to a JPEG image can be performed losslessly (that is, without recompression and the
associated quality loss) as long as the image size is a multiple of one MCU block (Minimum Coded Unit) (usually 16
pixels in both directions, for 4:2:0).
Blocks can be rotated in 90 degree increments, flipped in the horizontal, vertical and diagonal axes and moved about
in the image. Not all blocks from the original image need to be used in the modified one.
The top and left of a JPEG image must lie on a block boundary, but the bottom and right need not do so. This limits
the possible lossless crop operations, and also what flips and rotates can be performed on an image whose edges do
not lie on a block boundary for all channels.
When using lossless cropping, if the bottom or right side of the crop region is not on a block boundary then the rest of
the data from the partially used blocks will still be present in the cropped file and can be recovered relatively easily
by anyone with a hex editor and an understanding of the format.
It is also possible to transform between baseline and progressive formats without any loss of quality, since the only
difference is the order in which the coefficients are placed in the file.
JPEG-DCT encoding and quantization:
JPEG-DCT encoding
Next, each component (Y, Cb, Cr) of each 8×8 block is converted to a frequency-domain representation, using a
normalized, two-dimensional type-II discrete cosine transform (DCT).
As an example, consider one such 8×8 block of 8-bit samples. Before computing the DCT of the subimage, its gray
values are shifted from a positive range to one centered around zero. For an 8-bit image each pixel has 256 possible
values: [0, 255]. To center around zero it is necessary to subtract half the number of possible values, or 128.
Subtracting 128 from each pixel value yields pixel values in the range [−128, 127].
The next step is to take the two-dimensional DCT, which is given by:

G_{u,v} = \frac{1}{4}\,\alpha(u)\,\alpha(v) \sum_{x=0}^{7} \sum_{y=0}^{7} g_{x,y} \cos\left[\frac{(2x+1)u\pi}{16}\right] \cos\left[\frac{(2y+1)v\pi}{16}\right]

The DCT transforms the 64 pixels into a linear combination of 64 basis patterns (indexed horizontally by u and vertically by v), where
- u is the horizontal spatial frequency, for the integers 0 ≤ u < 8,
- v is the vertical spatial frequency, for the integers 0 ≤ v < 8,
- α(u) is a normalizing scale factor, equal to 1/√2 for u = 0 and 1 otherwise,
- g_{x,y} is the pixel value at coordinates (x, y),
- G_{u,v} is the DCT coefficient at coordinates (u, v).
If we perform this transformation on an example block and round to the nearest integer, we obtain a matrix of DCT
coefficients. Note the rather large value in the top-left corner: this is the DC coefficient. The remaining 63 coefficients are called
the AC coefficients. The advantage of the DCT is its tendency to aggregate most of the signal in one corner of the
result, as may be seen above. The quantization step to follow accentuates this effect while simultaneously reducing
the overall size of the DCT coefficients, resulting in a signal that is easy to compress efficiently in the entropy stage.
The DCT temporarily increases the bit-depth of the image, since the DCT coefficients of an 8-bit/component image
take up to 11 or more bits (depending on fidelity of the DCT calculation) to store. This may force the codec to
temporarily use 16-bit bins to hold these coefficients, doubling the size of the image representation at this point; they
are typically reduced back to 8-bit values by the quantization step. The temporary increase in size at this stage is not a
performance concern for most JPEG implementations, because typically only a very small part of the image is stored
in full DCT form at any given time during the image encoding or decoding process.
Quantization
The human eye is good at seeing small differences in brightness over a relatively large area, but not so good at
distinguishing the exact strength of a high frequency brightness variation. This allows one to greatly reduce the
amount of information in the high frequency components. This is done by simply dividing each component in the
frequency domain by a constant for that component, and then rounding to the nearest integer. This is the main lossy
operation in the whole process. As a result of this, it is typically the case that many of the higher frequency
components are rounded to zero, and many of the rest become small positive or negative numbers, which take many
fewer bits to store.
A typical quantization matrix, as specified in the original JPEG Standard [5], uses small divisors for the low-frequency
coefficients and much larger divisors for the high-frequency ones. The quantized DCT coefficients are computed with

B_{j,k} = \mathrm{round}\left(\frac{G_{j,k}}{Q_{j,k}}\right), \qquad j, k = 0, 1, \ldots, 7

where G is the matrix of unquantized DCT coefficients, Q is the quantization matrix, and B is the matrix of quantized
DCT coefficients. (Note that this is element-wise division, in no way matrix multiplication.) Applying such a
quantization matrix to the DCT coefficient matrix from above rounds most of the high-frequency coefficients to zero.
For example, the DC coefficient −415, divided by its quantization value of 16 and rounded to the nearest integer,
becomes round(−415/16) = round(−25.94) = −26.
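A minimal numpy sketch of these two steps, implementing the DCT formula above directly; the input block is random and the flat quantization step of 16 is only a stand-in for the table in the standard.

# Sketch of the 8x8 DCT-II followed by quantization, straight from the formula
# above; the input block and the flat quantization step (16) are stand-ins,
# not the table from the JPEG standard.
import numpy as np

def dct2_8x8(block):
    """Naive 2-D type-II DCT of an 8x8 block (values already shifted by -128)."""
    G = np.zeros((8, 8))
    for u in range(8):
        for v in range(8):
            a_u = 1 / np.sqrt(2) if u == 0 else 1.0
            a_v = 1 / np.sqrt(2) if v == 0 else 1.0
            s = 0.0
            for x in range(8):
                for y in range(8):
                    s += block[x, y] * np.cos((2 * x + 1) * u * np.pi / 16) \
                                     * np.cos((2 * y + 1) * v * np.pi / 16)
            G[u, v] = 0.25 * a_u * a_v * s
    return G

rng = np.random.default_rng(0)
g = rng.integers(0, 256, (8, 8)).astype(float) - 128   # level-shifted pixels
G = dct2_8x8(g)
Q = np.full((8, 8), 16.0)            # stand-in quantization matrix
B = np.round(G / Q).astype(int)      # quantized coefficients (element-wise)
print(B)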
LECTURE:16
Entropy coding
Main article: Entropy encoding
Zigzag ordering of JPEG image components
Entropy coding is a special form of lossless data compression. It involves arranging the image components in a
"zigzag" order employing run-length encoding (RLE) algorithm that groups similar frequencies together, inserting
length coding zeros, and then using Huffman coding on what is left.
The JPEG standard also allows, but does not require, the use of arithmetic coding, which is mathematically superior
to Huffman coding. However, this feature is rarely used as it is covered by patents and because it is much slower to
encode and decode compared to Huffman coding. Arithmetic coding typically makes files about 5% smaller.
The zigzag sequence for the above quantized coefficients is shown below. (The format shown is just for ease of
understanding/viewing.)
−26, −3, 0, −3, −2, −6, 2, −4, 1, −4, 1, 1, 5, 1, 2, −1, 1, −1, 2, 0, 0, 0, 0, 0, −1, −1,
and the remaining 38 entries are all 0.
If the i-th block is represented by Bi and positions within each block are represented by (p,q) where p = 0, 1, ..., 7 and
q = 0, 1, ..., 7, then any coefficient in the DCT image can be represented as Bi(p,q). Thus, in the above scheme, the
order of encoding pixels (for the i-th block) is Bi(0,0), Bi(0,1), Bi(1,0), Bi(2,0), Bi(1,1), Bi(0,2), Bi(0,3), Bi(1,2) and
so on.
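A minimal Python sketch of that zig-zag visiting order for an 8×8 block (index pairs only, without the run-length or Huffman steps).

def zigzag_order(n=8):
    """Return the (row, col) visiting order used by JPEG's zig-zag scan."""
    order = []
    for d in range(2 * n - 1):                # d = row + col (anti-diagonal)
        cells = [(p, d - p) for p in range(n) if 0 <= d - p < n]
        if d % 2 == 0:
            cells.reverse()                   # even diagonals run bottom-left to top-right
        order.extend(cells)
    return order

# First few positions: (0,0), (0,1), (1,0), (2,0), (1,1), (0,2), (0,3), (1,2) ...
print(zigzag_order()[:8])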
Baseline sequential JPEG encoding and decoding processes
This encoding mode is called baseline sequential encoding. Baseline JPEG also supports progressive encoding.
While sequential encoding encodes coefficients of a single block at a time (in a zigzag manner), progressive encoding
encodes similar-positioned coefficients of all blocks in one go, followed by the next positioned coefficients of all
blocks, and so on. So, if the image is divided into N 8×8 blocks {B0, B1, B2, ..., BN−1}, then progressive encoding
encodes Bi(0,0) for all blocks, i.e., for all i = 0, 1, 2, ..., N-1. This is followed by encoding Bi(0,1) coefficient of all
blocks, followed by Bi(1,0)-th coefficient of all blocks, then Bi(0,2)-th coefficient of all blocks, and so on. It should
be noted here that once all similar-positioned coefficients have been encoded, the next position to be encoded is the
one occurring next in the zigzag traversal as indicated in the figure above. It has been found that Baseline Progressive
JPEG encoding usually gives better compression as compared to Baseline Sequential JPEG due to the ability to use
different Huffman tables (see below) tailored for different frequencies on each "scan" or "pass" (which includes
similar-positioned coefficients), though the difference is not too large.
In the rest of the article, it is assumed that the coefficient pattern generated is due to sequential mode.
In order to encode the above generated coefficient pattern, JPEG uses Huffman encoding. JPEG has a special
Huffman code word for ending the sequence prematurely when the remaining coefficients are zero.
Using this special code word: "EOB", the sequence becomes:
−26, −3, 0, −3, −2, −6, 2, −4, 1, −4, 1, 1, 5, 1, 2, −1, 1, −1, 2, 0, 0, 0, 0, 0, −1, −1, EOB
JPEG's other code words represent combinations of (a) the number of significant bits of a coefficient, including sign,
and (b) the number of consecutive zero coefficients that follow it. (Once you know how many bits to expect, it takes
1 bit to represent the choices {-1, +1}, 2 bits to represent the choices {-3, -2, +2, +3}, and so forth.) In our example
block, most of the quantized coefficients are small numbers that are not followed immediately by a zero coefficient.
These more-frequent cases will be represented by shorter code words.The JPEG standard provides general-purpose
Huffman tables; encoders may also choose to generate Huffman tables optimized for the actual frequency
distributions in images being encoded.
LECTURE:17
GIF
GIF, which stands for Graphics Interchange Format, is a lossless method of compression. All that means
is that when the program that creates a GIF squashes the original image down it takes care not to lose
any data. It uses a simple substitution method of compression. If the algorithm comes across several
parts of the image that are the same, say a sequence of digits like this, 1 2 3 4 5, 1 2 3 4 5, 1 2 3 4 5, it
makes the number 1 stand for the sequence 1 2 3 4 5 so that you could render the same sequence 1 1 1,
obviously saving a lot of space. It stores the key to this (1 = 1 2 3 4 5) in a hash table, which is attached
to the image so that the decoding program can unscramble it.
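A minimal Python sketch of the substitution idea just described: a toy dictionary coder (not the actual LZW algorithm GIF uses), applied to the sequence from the text.

# Toy substitution coder: a repeated sequence is replaced by a short key,
# and the key-to-sequence table travels with the data so it can be undone.
data = [1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
pattern = (1, 2, 3, 4, 5)

table = {"A": pattern}                       # the "hash table" attached to the image
encoded = []
i = 0
while i < len(data):
    if tuple(data[i:i + len(pattern)]) == pattern:
        encoded.append("A")                  # one key stands for the whole run
        i += len(pattern)
    else:
        encoded.append(data[i])
        i += 1

decoded = []
for token in encoded:
    decoded.extend(table[token] if token in table else [token])

print(encoded)                # ['A', 'A', 'A']
print(decoded == data)        # True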
The maximum compression available with a GIF therefore depends on the amount of repetition there is
in an image. A flat colour will compress well - sometimes even down to one tenth of the original file
size - while a complex, non-repetitive image will fare worse, perhaps only saving 20% or so.
There are problems with GIFs. One is that they are limited to a palette of 256 colours or less.
Compuserve, which created the GIF, did at one point say it would attempt to produce a 24-bit version of
the GIF, but then along came problem number two: Unisys. Unisys discovered that it owned some
patents to key parts of the GIF compression technology, and has started demanding fees from every
company whose software uses the (freely available) GIF code. This has somewhat stifled development.
There is a 24-bit, license-free GIFalike called the PNG format, but this has yet to take off.
JPEG
JPEG, on the other hand, is a lossy compression method. In other words, to save space it just throws
away parts of an image. Obviously you can't just go around discarding any old piece of information so
what the JPEG algorithm does is first divide the image into squares (you can see these squares on badly-compressed JPEGs).
Then it uses a piece of mathematics called Discrete Cosine Transformation to turn the square of data
into a set of curves, some small, some big, that go together to make up the image. This is where the
lossy bit comes in: depending on how much you want to compress the image the algorithm throws away
the less significant part of the data (the smaller curves) which adds less to the overall "shape" of the
image.
This means that, unlike GIF, you get a say in how much you want to compress an image by. However
the lossy compression method can generate artifacts - unwanted effects such as false colour and
blockiness - if not used carefully.
PNG - Portable Network Graphics
(.PNG file extension, the pronunciation 'Ping' is specifically mentioned in the PNG Specification). PNG needs to be
mentioned. PNG is not the number one file format, but you will want to know about it. PNG is not so popular yet, but
its appeal is growing as people discover what it can do.
PNG was designed recently, with the experience advantage of knowing all that went before. The original purpose of
PNG was to be a royalty-free GIF and LZW replacement (see LZW next page). However PNG supports a large set of
technical features, including superior lossless compression from LZ77. Compression in PNG is called the ZIP
method, and is like the 'deflate' method in PKZIP (and is royalty free).
But the big deal is that PNG incorporates special preprocessing filters that can greatly improve the lossless
compression efficiency, especially for typical gradient data found in 24 bit photographic images. This filter
preprocessing causes PNG to be a little slower than other formats when reading or writing the file (but all types of
compression require processing time).
Photoshop 7 and Elements 2.0 correct this now, but earlier Adobe versions did not store or read the ppi number to
scale print size in PNG files (Adobe previously treated PNG like GIF in this respect, indicated 72 ppi regardless). The
ppi number never matters on the video screen or web, but it was a serious usability flaw for printing purposes.
Without that stored ppi number, we must scale the image again every time we print it. If we understand this, it should
be no big deal, and at home, we probably automatically do that anyway (digital cameras do the same thing with their
JPG files). But sending a potentially unsized image to a commercial printer is a mistake, and so TIF files should be
used in that regard.
Most other programs do store and use the correct scaled resolution value in PNG files. PNG stores resolution
internally as pixels per meter, so when calculating back to pixels per inch, some programs may show excessive
decimal digits, perhaps 299.999 ppi instead of 300 ppi (no big deal).
PNG has additional unique features, like an Alpha channel for a variable transparency mask (any RGB or Grayscale
pixel can be say 79% transparent and other pixels may individually have other transparency values). If indexed color,
palette values may have similar variable transparency values. PNG files may also contain an embedded Gamma value
so the image brightness can be viewed properly on both Windows and Macintosh screens. These should be wonderful
features, but in many cases these extra features are not implemented properly (if at all) in many programs, and so
these unique features must be ignored for web pages. However, this does not interfere with using the standard
features, specifically for the effective and lossless compression.
Netscape 4.04 and MS IE 4.0 browsers added support for PNG files on web pages, not to replace JPG, but to replace
GIF for graphics. For non-web and non-graphic use, PNG would compete with TIF. Most image programs support
PNG, so basic compatibility is not an issue. You may really like PNG.
PNG may be of great interest, because its lossless compression is well suited for master copy data, and because PNG
is a noticeably smaller file than LZW TIF. Perhaps about 25% smaller than TIF LZW for 24 bit files, and perhaps
about 10% to 30% smaller than GIF files for indexed data.
Different images will have varying compression sizes, but PNG is an excellent replacement for GIF and 24 bit TIFF
LZW files. PNG does define 48 bit files, but I don't know of any programs that support 48 bit PNG (not too many
support 48 bit in any form).
Here are some representative file sizes for a 9.9 megabyte 1943x1702 24-bit RGB color image:
File type    File size
TIFF         9.9 megs
TIFF LZW     8.4 megs
PNG          6.5 megs
JPG          1.0 megs   (1.0 / 9.9 is about 10% of the original file size)
BMP          9.9 megs
Seems to me that PNG is an excellent replacement for TIFF too.
TIFF - Tag Image File Format
(.TIF file extension, pronounced Tif) TIFF is the format of choice for archiving important images. TIFF is THE
leading commercial and professional image standard. TIFF is the most universal and most widely supported format
across all platforms, Mac, Windows, Unix. Data up to 48 bits is supported.
TIFF supports most color spaces, RGB, CMYK, YCbCr, etc. TIFF is a flexible format with many options. The data
contains tags to declare what type of data follows. New types are easy to invent, and this versatility can cause
incompatibly, but about any program anywhere will handle the standard TIFF types that we might encounter. TIFF
can store data with bytes in either PC or Mac order (Intel or Motorola CPU chips differ in this way). This choice
improves efficiency (speed), but all major programs today can read TIFF either way, and TIFF files can be exchanged
without problem.
Several compression formats are used with TIF. TIF with G3 compression is the universal standard for fax and multi-page line art documents.
TIFF image files optionally use LZW lossless compression. Lossless means there is no quality loss due to
compression. Lossless guarantees that you can always read back exactly what you thought you saved, bit-for-bit
identical, without data corruption. This is a critical factor for archiving master copies of important images. Most
image compression formats are lossless, with JPG and Kodak PhotoCD PCD files being the main exceptions.
Compression works by recognizing repeated identical strings in the data, and replacing the many instances with one
instance, in a way that allows unambiguous decoding without loss. This is fairly intensive work, and any compression
method makes files slower to save or open.
LZW is most effective when compressing solid indexed colors (graphics), and is less effective for 24 bit continuous
photo images. Featureless areas compress better than detailed areas. LZW is more effective for grayscale images than
color. It is often hardly effective at all for 48 bit images (VueScan 48 bit TIF LZW is an exception to this, using an
efficient data type that not all others use ).
LZW is Lempel-Ziv-Welch, named for Israeli researchers Abraham Lempel and Jacob Ziv, who published IEEE
papers in 1977 and 1978 (now called LZ77 and LZ78) which were the basis for most later work in compression.
Terry Welch built on this, and published and patented a compression technique that is now called LZW. This is the
1984 patent, held by Unisys (then Sperry), involved in TIF LZW and GIF (and V.42bis for modems). There was much
controversy about the royalty for LZW in GIF, but a royalty was always paid for LZW in TIF files and for V.42bis
modems. International patents expired in mid-2004.
Image programs of any stature will provide LZW, but simple or free programs often do not pay LZW patent royalty
to provide LZW, and then its absence can cause an incompatibility for compressed files.
It is not necessary to say much about TIF. It works, it's important, it's great, it's practical, it's the standard universal
format for high quality images, it simply does the best job the best way. Give TIF very major consideration, both for
photos and documents, especially for archiving anything where quality is important.
But TIF files for photo images are generally pretty large. Uncompressed TIFF files are about the same size in bytes as
the image size in memory. Regardless of the novice view, this size is a plus, not a disadvantage. Large means lots of
detail, and it's a good thing. 24 bit RGB image data is 3 bytes per pixel. That is simply how large the image data is,
and TIF LZW stores it with recoverable full quality in a lossless format (and again, that's a good thing). $200 today
buys BOTH a 320 GB 7200 RPM disk and 512 MB of memory so it is quite easy to plan for and deal with the size.
There are situations for less serious purposes when the full quality may not always be important or necessary. JPEG
files are much smaller, and are suitable for non-archival purposes, like photos for read-only email and web page use,
when small file size may be more important than maximum quality. JPG has its important uses, but be aware of the
large price in quality that you must pay for the small size of JPG, it is not without cost.
Graphic Interchange Format (GIF)
(.GIF file extension) There have been raging debates about the pronunciation. The designers of GIF say it is correctly
pronounced to sound like Jiff. But that seems counter-intuitive, and up in my hills, we say it sounding like Gift
(without the t).
GIF was developed by CompuServe to show images online (in 1987 for 8 bit video boards, before JPG and 24 bit
color were in use). GIF uses indexed color, which is limited to a palette of only 256 colors (next page). GIF was a
great match for the old 8 bit 256 color video boards, but is inappropriate for today's 24 bit photo images.
GIF files do NOT store the image's scaled resolution ppi number, so scaling is necessary every time one is printed.
This is of no importance for screen or web images. GIF file format was designed for CompuServe screens, and
screens don't use ppi for any purpose. Our printers didn't print images in 1987, so it was useless information, and
CompuServe simply didn't bother to store the printing resolution in GIF files.
GIF is still an excellent format for graphics, and this is its purpose today, especially on the web. Graphic images (like
logos or dialog boxes) use few colors. Being limited to 256 colors is not important for a 3 color logo. A 16 color GIF
is a very small file, much smaller, and more clear than any JPG, and ideal for graphics on the web.
Graphics generally use solid colors instead of graduated shades, which limits their color count drastically, which is
ideal for GIF's indexed color. GIF uses lossless LZW compression for relatively small file size, as compared to
uncompressed data. GIF files offer optimum compression (smallest files) for solid color graphics, because objects of
one exact color compress very efficiently in LZW. The LZW compression is lossless, but of course the conversion to
only 256 colors may be a great loss. JPG is much better for 24 bit photographic images on the web. For those
continuous tone images, the JPG file is also very much smaller (although lossy). But for graphics, GIF files will be
smaller, and better quality, and (assuming no dithering) pure and clear without JPG artifacts.
If GIF is used for continuous tone photo images, the limited color can be poor, and the 256 color file is quite large as
compared to JPG compression, even though it is 8 bit data instead of 24 bits. Photos might typically contain 100,000
different color values, so the image quality of photos is normally rather poor when limited to 256 colors. 24 bit JPG is
a much better choice today. The GIF format may not even be offered as a save choice until you have reduced the
image to 256 colors or less.
So for graphic art or screen captures or line art, GIF is the format of choice for graphic images on the web. Images
like a company logo or screen shots of a dialog box should be reduced to 16 colors if possible and saved as a GIF for
smallest size on the web. A complex graphics image that may look bad at 16 colors might look very good at say 48
colors (or it may require 256 colors if photo-like). But often 16 colors is fine for graphics, with the significance that
the fewer number of colors, the smaller the file, which is extremely important for web pages.
GIF optionally offers transparent backgrounds, where one palette color is declared transparent, so that the background
can show through it. The GIF File - Save As dialog box usually has an Option Button to specify which one GIF
palette index color is to be transparent.
Interlacing is an option that quickly shows the entire image in low quality, and the quality sharpens as the file
download completes. Good for web images, but it makes the file slightly larger.
GIF files use a palette of indexed colors, and if you thought 24 bit RGB color was kinda complicated, then you ain't
seen nuthin' yet (next page).
For GIF files, a 24 bit RGB image requires conversion to indexed color. More specifically, this means conversion to
256 colors, or less. Indexed Color can only have 256 colors maximum. There are however selections of different
ways to convert to 256 colors.
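A minimal sketch of that conversion using the Pillow library in Python; the file names and the 16-color choice are arbitrary.

# Reduce a 24-bit image to a small indexed palette and save it as a GIF.
from PIL import Image

img = Image.open("logo.png").convert("RGB")   # hypothetical 24-bit source graphic
indexed = img.quantize(colors=16)             # pick the best 16 colors (indexed mode)
indexed.save("logo.gif")                      # GIF stores the palette plus LZW-compressed indices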
LECTURE 18
The Digital Representation of Sound
The world is continuous. Time marches on and on and there are plenty of things that we could measure
at any instant. For example, weather forecasters might keep an ongoing recording of the temperature, or
the barometric pressure. If you are in the hospital, then the nurses might be keeping a record of your
temperature, or your heart rate (EKG), or your brain waves (EEG), or your insurance coverage. Any
one of these records gives you a function f(t), where at a given time t, f(t) would be the value of the
particular statistic that interests you. These sorts of functions are called time series.
These are examples of "continuous" functions. What we mean by this is that at any instant
of time, the functions take on a well-defined value, so that, if plotted, they make squiggly
of time, the functions take on a well-defined value, so that as we can see by the above figures, they make squiggly
line graphs, which, if traced out by a number two pencil (please have one handy at all times!) could be done
without the pencil ever leaving the paper (Ladies and gentlemen! Notice that at no time, did the pencil leave the
paper!). This might also be called an "analog" function.
Now, it's true that to illustrate the idea of a graph, we could have used a lot of simpler things (like the Dow Jones
average, or a rainfall chart, or an actual EKG). But, you've all seen stuff like that, and also, we're really nerdy (well,
one of us isn't), so we thought these would be really, like, way cool (totally).
Of course, the time series that interest us are those that represent sound. In particular, what we want to
do is take these time series, stick them on the computer and start to play with them!
Now, if you're paying attention, then you may realize that at this moment we're in a bit of a bind. The
type of time series that we've been describing is a continuous function. That is, at every instant in time,
we could write down a number that is the value of the function at that instant —whether it be how much
your eardrum has been displaced, what is your temperature, what is your heartrate, etc. But, this is an
infinite list of numbers (any one of which may have an infinite expansion, like π = 3.14159...) and no
matter how big your computer is, you're going to have a pretty tough time fitting an infinite collection of
numbers on your spanking new hard drive.
So, how do we do it? That's the problem that we'll start to investigate in this chapter. How can we
represent sound as a finite collection of numbers that can be stored efficiently, in a finite amount of
space, on your computer, and played back, and manipulated at will! In short, how do we represent sound
digitally?!?!
Here's a simpler restatement of the basic problem: computers basically store a finite list of numbers
(which can then be thought of as a long list of 0s and 1s). These numbers also have a finite precision. A
continuous function would be a list infinitely long! What is a poor electroacoustic musician to do?
(Well, one thing to do would be to remember our mentions of sampling in the previous chapter).
Somehow we have to come up with a finite list of numbers which does a good job of representing our
continuous function. We do it with samples of the original function, at every few instants (of some
predetermined rate, called the sampling rate) recording the value of the function. For example, maybe
we only record the temperature every 5 minutes. For sounds we need to go a lot faster, and often use a
special device which grabs instantaneous amplitudes at rapid, audio rates (called an Analog to Digital
converter, or ADC).
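A minimal Python sketch of that sampling step, using a pure sine wave as the "continuous" function; the frequency, sampling rate and duration are arbitrary.

# Sample a 440 Hz sine wave at 8000 samples per second for 5 milliseconds.
import math

frequency = 440.0          # Hz, the "continuous" function sin(2*pi*f*t)
sample_rate = 8000         # samples per second (the sampling rate)
duration = 0.005           # seconds

samples = [math.sin(2 * math.pi * frequency * n / sample_rate)
           for n in range(int(duration * sample_rate))]
print(len(samples), "samples:", [round(s, 3) for s in samples[:5]])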
A continuous function is also called an analog function, and to restate the problem, we have to convert
analog functions to lists of samples, or digital functions, the fundamental way that computers store
information. In computers, think of this function not as a function of time (which it is) but as a function
of position in computer memory. That is, we store these functions as lists of numbers in computer
memory, and as we read through them we are basically creating a discrete function of time of individual
amplitude values.
Encoding Analog Signals
Amplitude modulation (AM) can be used to send any type of information, and is not limited to sending just numerical
information. In fact, usually in amplitude modulated systems the voltage levels do not change abruptly as in the
example of Figures 5.9 through 5.8, but they vary continuously over a range of voltage values. One common shape
for the way voltages vary in an amplitude modulation communication system is the sinusoid or sine wave shown in
Figure 5.15.
Where Digital Meets Analog
A traditional telephone network operates with analog signals, whereas computers work with digital signals. Therefore
a device is required to convert the computer's digital signal to an analog signal compatible with the phone line
(modulation). This device must also convert the incoming analog signal from phone line to a digital signal
(demodulation). Such a device is called a modem; its name is derived from this process of modulation/demodulation.
A modem is also known as Data Circuit Terminating Equipment (DCE), which is used to connect a computer or data
terminal to a network. Logically, a PC or data terminal is called Data Terminal Equipment (DTE).
A modem's transmission speed can be represented by either data rate or baud rate. The data rate is the number of bits
which a modem can transmit in one second. The baud rate is the number of 'symbols' which a modem can transmit in
one second.
The carrier signal on a telephone line has a bandwidth of 4000 Hz. Figure 7.2 shows one cycle of telephone carrier
signals. The following types of modulation are used to convert digital signals to analog signals:
- Amplitude Shift Keying (ASK)
- Frequency Shift Keying (FSK)
- Phase Shift Keying (PSK)
- Quadrature Amplitude Modulation (QAM)
Amplitude Shift Keying
In Amplitude Shift Keying (ASK), the amplitude of the signal changes. This also may be referred to as Amplitude
Modulation (AM). The receiver recognizes these modulation changes as voltage changes, as shown in Figure 5.17.
The smaller amplitude is represented by zero and the larger amplitude is represented by one. Each cycle is
represented by one bit, with the maximum bits per second determined by the speed of the carrier signal. In this case,
the baud rate is equal to the number of bits per second.
Figure 5.17
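A minimal Python sketch along these lines: one carrier cycle per bit, with a small amplitude standing for 0 and a large amplitude for 1 (the carrier resolution and amplitude values are arbitrary).

# Amplitude Shift Keying: the bit value selects the carrier amplitude.
import math

bits = [1, 0, 1, 1, 0]
samples_per_bit = 8                    # points used to draw one carrier cycle
amplitude = {0: 0.2, 1: 1.0}           # small amplitude = 0, large amplitude = 1

waveform = []
for bit in bits:
    for n in range(samples_per_bit):
        phase = 2 * math.pi * n / samples_per_bit   # one full cycle per bit
        waveform.append(amplitude[bit] * math.sin(phase))

print([round(v, 2) for v in waveform[:16]])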
Frequency Shift Keying. With Frequency Shift Keying (FSK), a zero is represented by no change to the
frequency of the original signal, while a one is represented by a change to the frequency of the original signal. This is
shown in Figure 5.18. Frequency modulation is a term often used in place of FSK.
Figure 5.18
Phase Shift Keying. Using the Phase Shift Keying (PSK) modulation method, the phase of the signal is changed
to represent ones and zeros. Figure 5.19 shows a 90-degree phase shift. Figures 5.20(a), (b), and (c) show the
original signals with a 90-degree shift, a 180-degree shift and a 270-degree shift, respectively.
Figure 5.19
Figure 5.20
LECTURE 19:
SUBBAND CODING:
Sub-band coding (SBC) is any form of transform coding that breaks a signal into a number of different frequency
bands and encodes each one independently. This decomposition is often the first step in data compression for audio
and video signals.
Basic Principles
The utility of SBC is perhaps best illustrated with a specific example. When used for audio compression, SBC
exploits what might be considered a deficiency of the human auditory system. Human ears are normally sensitive to a
wide range of frequencies, but when a sufficiently loud signal is present at one frequency, the ear will not hear
weaker signals at nearby frequencies. We say that the louder signal masks the softer ones. The louder signal is called
the masker, and the point at which masking occurs is known, appropriately enough, as the masking threshold.
The basic idea of SBC is to enable a data reduction by discarding information about frequencies which are masked.
The result necessarily differs from the original signal, but if the discarded information is chosen carefully, the
difference will not be noticeable.
A basic SBC scheme
To enable higher quality compression, one may use subband coding. First, a digital filter bank divides the input signal
spectrum into some number (e.g., 32) of subbands. The psychoacoustic model looks at the energy in each of these
subbands, as well as in the original signal, and computes masking thresholds using psychoacoustic information. Each
of the subband samples is quantized and encoded so as to keep the quantization noise below the dynamically
computed masking threshold. The final step is to format all these quantized samples into groups of data called frames,
to facilitate eventual playback by a decoder.
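A minimal numpy sketch of the first step only: splitting a frame's spectrum into 32 equal subbands and measuring the energy in each, using an FFT in place of a real analysis filter bank; the signal is random and the band count follows the example above.

# Split a signal's spectrum into 32 equal subbands and measure each band's energy.
import numpy as np

rng = np.random.default_rng(1)
signal = rng.standard_normal(1024)          # stand-in for one frame of audio
spectrum = np.fft.rfft(signal)              # frequency-domain view of the frame

bands = np.array_split(np.abs(spectrum) ** 2, 32)   # 32 subbands
energy = [band.sum() for band in bands]

# A real coder would compare these energies against psychoacoustic masking
# thresholds and allocate quantizer bits per subband accordingly.
print([round(e, 1) for e in energy[:8]])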
Decoding is much easier than encoding, since no psychoacoustic model is involved. The frames are unpacked,
subband samples are decoded, and a frequency-time mapping reconstructs an output audio signal.
Over the last five to ten years, SBC systems have been developed by many of the key companies and laboratories in
the audio industry. Beginning in the late 1980s, a standardization body called the Motion Picture Experts Group
(MPEG) developed generic standards for coding of both audio and video. Subband coding resides at the heart of the
popular MP3 format (more properly known as MPEG 1 audio layer III), for example.
Fourier methods applied to image processing
Background
The theory introduced for one dimensional signals above carries over to two dimensional signals with minor changes.
Our basis functions now depend on two variables (one in the x-direction and one in the y-direction) and also two
frequencies, one in each direction. See Exercises 2.15-2.18 in [1] for more details. The corresponding l2-norm for a
two-dimensional signal now becomes

\|A\|_{\ell^2} = \left( \sum_{i,j} |a_{ij}|^2 \right)^{1/2}    (2)

where a_{ij} are the elements in the matrix representing the two-dimensional signal. It is computed in Matlab
using the Frobenius norm.
Displaying the spectrum
When you display the DFT of a two dimensional signal in Matlab, the default setting is that the low frequencies are
displayed towards the edges of the plot and high frequencies in the center. However, in many situations one displays
the spectrum with the low frequencies in the center, and the high frequencies near the edges of the plot. Which way
you choose is up to you. You can accomplish the latter alternative by using the command fftshift().
There is a very useful trick to enhance the visualization of the spectrum of a two dimensional signal. If you take the
logarithm of the gray-scale, this usually gives a better plot of the frequency distribution. In case you want to display
the spectrum fA, I recommend to type imshow(log(abs(fA))) for a nice visualization of the frequency distribution.
You may also want to use fftshift as described above, but that is more a matter of taste.
Exercises
Exercise 12: The two dimensional Fourier basis
Define the basis function F_{m,n} for an N×N matrix by

F_{m,n}(x, y) = e^{2\pi i (mx + ny)/N},  for x, y = 0, 1, ..., N−1.    (3)
To get a feeling for what these functions look like, plot the real and imaginary part of F1,1, F2,0, F0,2 and F4,4. Do this
by first evaluating these functions on a square grid (128 by 128) and then display the resulting matrix using the
command imshow.
Finally, form a sum of two (real) basis functions such that the resulting imshow-plot ``resembles'' a chess board.
You may have to re-scale the matrix elements before you display the matrix using imshow. You can accomplish this
using the mat2gray() command.
Exercise 13: Thresholding
To compress an image, set elements in the Fourier space of a signal with a magnitude less than some number
epsilon to zero. (A matrix with many zeros can be stored very efficiently.)
Write a function threshold that takes a matrix and a scalar representing epsilon as input. The function should loop through
all elements in the matrix and put all elements with a magnitude less than epsilon to zero and compute and print (on
the screen) the compression ratio (use definition in previous lab and assume that every non-zero element is ``one
piece of information'' that needs to be stored). Finally, the function should return the resulting matrix.
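The lab itself uses Matlab, but here is a minimal Python/numpy sketch of the same threshold function and its use on a Fourier-transformed image; the epsilon value and the random stand-in image are arbitrary.

# Threshold an image's 2-D Fourier transform and report the compression ratio.
import numpy as np

def threshold(A, epsilon):
    """Zero all elements of A with magnitude below epsilon; print the ratio."""
    B = np.where(np.abs(A) < epsilon, 0, A)
    nonzero = np.count_nonzero(B)
    print("compression ratio:", A.size / max(nonzero, 1))
    return B

rng = np.random.default_rng(2)
image = rng.random((128, 128))          # stand-in for a grayscale image
fA = np.fft.fft2(image)
fA_compressed = threshold(fA, epsilon=50.0)
reconstructed = np.real(np.fft.ifft2(fA_compressed))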
Exercise 14: Image compression
Using the function threshold, compress (by performing thresholding in the Fourier domain) the two images pichome.jpg and basket.jpg you worked with in the previous lab. Discuss the compression in terms of visual quality,
compression rate and the l2-error. Is the performance different for the two images and if so why? How does this
compression method compare to the SVD-compression for the previous lab?
Exercise 15: Low pass filtering
A ``naive'' way to low pass filter an image (that is, remove high frequencies of the image) is to Fourier transform the
image, put all elements representing high frequencies of the Fourier transformed signal equal to zero, and then take
an inverse transform of the signal.
Design a low pass filter that removes the highest frequencies in both the x-direction and the y-direction.
Experiment by removing more and more high frequencies and see how this changes the filtered image.
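One possible Python/NumPy sketch of this naive filter, assuming a square cutoff region around the (centered) zero frequency and a cutoff smaller than half the image size:

    import numpy as np

    def naive_lowpass(image, cutoff):
        """Naive low pass filter: zero every Fourier coefficient whose (centered)
        frequency index exceeds `cutoff` in either direction, then invert."""
        F = np.fft.fftshift(np.fft.fft2(image))        # low frequencies in the center
        rows, cols = F.shape
        r0, c0 = rows // 2, cols // 2
        mask = np.zeros_like(F)
        mask[r0 - cutoff:r0 + cutoff + 1, c0 - cutoff:c0 + cutoff + 1] = 1
        filtered = np.fft.ifft2(np.fft.ifftshift(F * mask))
        return np.real(filtered)                       # imaginary residue is numerical noise

Decreasing `cutoff` removes more and more high frequencies, which is the experiment the exercise asks for.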
LECTURE: 20
Audio compression
Audio compression is a form of data compression designed to reduce the size of audio files. Audio compression
algorithms are implemented in computer software as audio codecs. Generic data compression algorithms perform
poorly with audio data, seldom reducing file sizes much below 87% of the original, and are not designed for use in
real time. Consequently, specific audio "lossless" and "lossy" algorithms have been created. Lossy algorithms provide
far greater compression ratios and are used in mainstream consumer audio devices.
As with image compression, both lossy and lossless compression algorithms are used in audio compression, lossy
being the most common for everyday use. In both lossy and lossless compression, information redundancy is reduced,
using methods such as coding, pattern recognition and linear prediction to reduce the amount of information used to
describe the data.
For most practical audio applications, the trade-off of slightly reduced audio quality is clearly outweighed by the
savings: users cannot perceive any difference, and space requirements are substantially reduced. For example, on one
CD one can fit an hour of high fidelity uncompressed music, less than 2 hours of music compressed losslessly, or 7
hours of music compressed in the MP3 format at medium bit rates.
Lossless audio compression
Lossless audio compression allows one to preserve an exact copy of one's audio files, in contrast to the irreversible
changes from lossy compression techniques such as Vorbis and MP3. Compression ratios are similar to those for
generic lossless data compression (around 50–60% of original size), and substantially less than for lossy compression
(which typically yield 5–20% of original size).
Use
The primary uses of lossless encoding are:
Archives
For archival purposes, one naturally wishes to maximize quality.
Editing
Editing lossily compressed data leads to digital generation loss, since the decoding and re-encoding introduce
artifacts at each generation. Thus audio engineers use lossless compression.
Audio quality
Being lossless, these formats completely avoid compression artifacts. Audiophiles thus favor lossless
compression.
A specific application is to store lossless copies of audio, and then produce lossily compressed versions for a digital
audio player. As formats and encoders improve, one can produce updated lossily compressed files from the lossless
master.
As file storage and communications bandwidth have become less expensive and more available, lossless audio
compression has become more popular.
Lossy audio compression
Lossy audio compression is used in an extremely wide range of applications. In addition to the direct applications
(mp3 players or computers), digitally compressed audio streams are used in most video DVDs; digital television;
streaming media on the internet; satellite and cable radio; and increasingly in terrestrial radio broadcasts. Lossy
compression typically achieves far greater compression than lossless compression (data of 5 percent to 20 percent of
the original stream, rather than 50 percent to 60 percent), by discarding less-critical data.
The innovation of lossy audio compression was to use psychoacoustics to recognize that not all data in an audio
stream can be perceived by the human auditory system. Most lossy compression reduces perceptual redundancy by
first identifying sounds which are considered perceptually irrelevant, that is, sounds that are very hard to hear.
Typical examples include high frequencies, or sounds that occur at the same time as louder sounds. Those sounds are
coded with decreased accuracy or not coded at all.
Musical Instrument Digital Interface
Interfaces
MIDI connector diagram
The physical MIDI interface uses DIN 5/180° connectors. Opto-isolating connections are used, to prevent ground
loops occurring among connected MIDI devices. Logically, MIDI is based on a ring network topology, with a
transceiver inside each device. The transceivers physically and logically separate the input and output lines, meaning
that MIDI messages received by a device in the network not intended for that device will be re-transmitted on the
output line (MIDI-OUT). This introduces a delay, one that is long enough to become audible on larger MIDI rings.
MIDI-THRU ports started to be added to MIDI-compatible equipment soon after the introduction of MIDI, in order
to improve performance. The MIDI-THRU port avoids the aforementioned retransmission delay by linking the MIDI-THRU
port to the MIDI-IN socket almost directly. The difference between the MIDI-OUT and MIDI-THRU ports is
that data coming from the MIDI-OUT port has been generated on the device containing that port. Data that comes out
of a device's MIDI-THRU port, however, is an exact duplicate of the data received at the MIDI-IN port.
Such chaining together of instruments via MIDI-THRU ports is unnecessary with the use of MIDI "patch bay,"
"mult" or "Thru" modules consisting of a MIDI-IN connector and multiple MIDI-OUT connectors to which multiple
instruments are connected. Some equipment has the ability to merge MIDI messages into one stream, but this is a
specialized function and is not universal to all equipment.
All MIDI-compatible instruments have a built-in MIDI interface. Some computers' sound cards have a built-in MIDI
interface, whereas others require an external MIDI interface connected to the computer via the game port, the newer
DA-15 connector, a USB connector, FireWire or Ethernet. MIDI connectors are defined by the MIDI interface
standard.
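What actually travels over these connectors are short byte sequences. As an illustration, a standard MIDI Note On channel message is three bytes: a status byte (0x90 plus the channel number) followed by a note number and a velocity. A small Python sketch, not tied to any particular MIDI library:

    def note_on(channel, note, velocity):
        """Build a standard 3-byte MIDI Note On message.

        channel:  0-15  (MIDI channels are usually displayed as 1-16)
        note:     0-127 (60 = middle C)
        velocity: 1-127 (0 is treated as Note Off by many devices)
        """
        assert 0 <= channel <= 15 and 0 <= note <= 127 and 0 <= velocity <= 127
        return bytes([0x90 | channel, note, velocity])

    # Example: middle C on channel 1 at moderate velocity.
    msg = note_on(0, 60, 100)
    print(msg.hex())   # '903c64'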
LECTURE 22:
Speech coding
Speech coding is the application of data compression of digital audio signals containing speech. Speech coding uses
speech-specific parameter estimation using audio signal processing techniques to model the speech signal, combined
with generic data compression algorithms to represent the resulting modeled parameters in a compact bitstream.
The two most important applications of speech coding are mobile telephony and Voice over IP.
The techniques used in speech coding are similar to those used in audio data compression and audio coding, where
knowledge of psychoacoustics is used to transmit only data that is relevant to the human auditory system. For
example, in narrowband speech coding, only information in the frequency band 400 Hz to 3500 Hz is transmitted, but
the reconstructed signal is still adequate for intelligibility.
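As a rough illustration of such band-limiting, the sketch below applies a Butterworth band-pass filter with the 400-3500 Hz passband mentioned above. It assumes SciPy is available and an 8 kHz sampling rate, which is typical for narrowband telephony; the filter order and the synthetic test signal are arbitrary choices for the example:

    import numpy as np
    from scipy.signal import butter, sosfilt

    fs = 8000                      # assumed narrowband sampling rate
    low, high = 400.0, 3500.0      # band mentioned in the text

    # 4th-order Butterworth band-pass, in second-order sections for numerical stability.
    sos = butter(4, [low / (fs / 2), high / (fs / 2)], btype='bandpass', output='sos')

    def band_limit(speech):
        """Keep only the 400-3500 Hz band of a speech signal sampled at 8 kHz."""
        return sosfilt(sos, speech)

    # Quick check with a synthetic signal: a 200 Hz tone (outside the band)
    # plus a 1 kHz tone (inside the band).
    t = np.arange(fs) / fs
    x = np.sin(2 * np.pi * 200 * t) + np.sin(2 * np.pi * 1000 * t)
    y = band_limit(x)              # the 200 Hz component is strongly attenuated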
Speech coding differs from other forms of audio coding in that speech is a much simpler signal than most other audio
signals, and that there is a lot more statistical information available about the properties of speech. As a result, some
auditory information which is relevant in audio coding can be unnecessary in the speech coding context. In speech
coding, the most important criterion is preservation of intelligibility and "pleasantness" of speech, with a constrained
amount of transmitted data.
It should be emphasised that the intelligibility of speech covers, besides the actual literal content, also speaker
identity, emotion, intonation and timbre, all of which are important for perfect intelligibility. The more abstract concept of
pleasantness of degraded speech is a different property from intelligibility, since degraded speech can be
completely intelligible yet subjectively annoying to the listener.
In addition, most speech applications require low coding delay, as long coding delays interfere with speech
interaction.
SPEECH RECOGNITION AND GENERATION TECHNIQUE
Speech is a complex audio signal, made up of a large number of component sound waves. Speech can easily be
captured in wave form, transmitted and reproduced by common equipment; this is how the telephone has
worked for a century.
However, once we move up the complexity scale and try to make a computer understand the message encoded
in speech, the actual wave form is unreliable. Vastly different sounds can produce similar wave forms, while a
subtle change in inflection can transform a phoneme's wave form into something completely alien. In fact,
much of the speech signal is of no value to the recognition process. Worse still: any reasonably accurate
mathematical representation of the entire signal would be far too large to manipulate in real time.
Therefore, a manageable number of discriminating features must somehow be extracted from the wave before
recognition can take place. A common scheme involves "cepstral coefficients" (cepstral is a mangled form of
spectral); the recognizer collects 8,000 speech samples per second and extracts a "feature vector" of at most a
few dozen numbers from each one, through a mathematical analysis process that is far beyond the scope of
this article. From now on, when I mention "wave form" or "signal", I am actually talking about these collections
of feature vectors.
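To make the idea of a cepstral feature vector a little more concrete, here is a deliberately simplified Python/NumPy sketch; real recognizers use mel-frequency cepstral coefficients, filter banks and considerably more processing, so treat this only as an outline of the spectrum, logarithm, inverse-transform idea behind "cepstral":

    import numpy as np

    def cepstral_features(frame, n_coeffs=13):
        """Very simplified cepstral analysis of one windowed speech frame."""
        windowed = frame * np.hamming(len(frame))
        spectrum = np.abs(np.fft.rfft(windowed)) + 1e-10   # avoid log(0)
        cepstrum = np.fft.irfft(np.log(spectrum))
        return cepstrum[:n_coeffs]                          # keep the first few coefficients

    # 8,000 samples per second, analysed in short frames (e.g. 20 ms = 160 samples).
    fs = 8000
    frame = np.random.randn(160)          # stand-in for one frame of speech samples
    features = cepstral_features(frame)   # a small "feature vector"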
Acoustic Pattern Recognition
A speech recognizer is equipped with two crucial data structures:
• A database of typical wave forms for all of the phonemes (i.e., basic component sounds) of a language. Since the pronunciation of many phonemes varies with context, the database usually contains a number of different wave forms, or "allophones", for each phoneme. Databases containing 200, 800 or even 10,000 allophones of the English language can be purchased on the open market.
• A lexicon containing transcriptions of all the words known to the system into a phonetic language. There must be a "letter" in this phonetic alphabet for each allophone in the acoustic database. A good lexicon will contain several transcriptions for most words; the word "the", for example, can be pronounced "duh" or "dee", so it should have at least these two entries in the lexicon.
Limitations of Speech Recognition
For all of the effort which dozens of PhD's have been putting into their work for years, speech recognition is
nowhere near Star Trek yet. Among the unresolved issues:
• Plain old speech recognizers are dumb. Even those smart enough to recognize complete sentences and equipped with language models will only spit out collections of words. It's up to someone or something else to make sense of them. (This is why most, if not all, speech input systems also include a natural language understanding unit; I will describe this component in detail next month.)
• Speech recognition works best in quiet, controlled environments. Trying to make it work in Quake III noise levels is not very effective.
• The larger the vocabulary, the easier it is to confuse a recognizer. If the vocabulary contains true homonyms, the system is in trouble.
• Speech recognition is a processor hog; it can easily eat up the equivalent of a 300 MHz Pentium II, leaving chump change for the rest of the application.
• It is a lot easier to differentiate between long words; unfortunately, most common words are short.
Speech Generation
Finally, a few words about automated speech generation. While the commercial tools tend to be very
easy to use (i.e., one function call, passing a string of text and receiving a WAV file in return), speech quality is
questionable at best. For games, this is rarely acceptable; unless you want a robot-like voice, you should have
an actor record the computer character(s)' most common responses, and use word-stitching for the rest.
AUDIO SYNTHESIS
Audio synthesis environments comprise a wide and varying range of software and hardware
configurations. Even different versions of the same environment can differ dramatically. Because of this broad
variability, certain aspects of different systems cannot be directly compared. Moreover, some levels of comparison
are either very difficult to objectively quantify, or depend purely on personal preference.
Some of the commonly considered subjective attributes for comparison include:
• Usability (how difficult is it for beginners to generate some kind of meaningful output)
• Ease of use (how steep is the learning curve for average and advancing users)
• Sound "quality" (which environment produces the most subjectively appealing sound)
• Creative flow (in what ways does the environment affect the creative process, e.g. guiding the user in certain directions)
LECTURE 23:
Image Compression
Image here refers to not only still images but also motion-pictures and compression is the process
used to reduce the physical size of a block of information.
Compression is simply representing information more efficiently; "squeezing the air" out of the data,
so to speak. It takes advantage of three common qualities of graphical data; they are often redundant,
predictable or unnecessary.
Today, compression has made a great impact on the storage of large volumes of image data. Hardware and
software for compression and decompression are increasingly being made part of the computer platform. For
example, Kay and Levine (1995, pg. 22) state that "… the System 7 operating system of Macintosh computers now
includes a compression engine offering several types of compression and decompression."
Compression does have its trade-offs. The more efficient the compression technique, the more complicated the
algorithm, and thus the more computational resources or time it takes to decompress. This affects speed, which
matters relatively little for still images but a great deal for motion pictures: surely you do not want to see your
favourite movie appearing frame by frame in front of you.
In this section I will introduce some compression techniques such as RLE (Run-Length Encoding), Huffman coding
and arithmetic coding. You will then see that some of these techniques later play an important role in JPEG.
Run-Length Encoding (RLE)
RLE is the simplest form of compression technique and is widely supported by most bitmap file formats
such as TIFF, BMP, and PCX. RLE performs compression regardless of the type of information
stored, but the content of the information does affect how efficiently it can be compressed.
Algorithm
RLE works by reducing the physical size of a repeating string of characters. Each reduced string is indicated by a
flag. This flag can be a special value that never appears in the data stream, or it can be any byte value that may also
appear in the data stream. If the latter is used, an approach known as byte-stuffing has to be applied so that the
decoder can differentiate between a flag and a character belonging to the data stream. Let's illustrate how both RLE
and byte-stuffing work. This illustration is based on Steinmetz and Nahrstedt.
Given that we have the following uncompressed data :
ABBBCCCCCDEFGGGGHI
and the encoder is set such that if a character appears consecutively more than 3 times (n = 3), the repeated string is
compressed. Note: in some encoders this value of n might be different.
Let's take the flag to be '!'.
Depending on the encoder's algorithm, one or more bytes may be used to indicate the number of repeating
characters; in this case we choose 1 byte. Now what happens if the exclamation mark '!' appears in the data stream
itself: is the decoder going to interpret it as a flag or as a character belonging to the data? This is where
byte-stuffing comes in. Whenever the encoder encounters a character that matches the flag, it stuffs a second copy of
that character into the data stream. Therefore, when the decoder reads two exclamation marks, it knows that one of
them belongs to the data stream and automatically removes the other.
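A small Python sketch of this scheme is given below. The exact byte layout used by Steinmetz and Nahrstedt is not reproduced here; the sketch assumes one possible layout (character, flag, single-digit count, so runs of at most 9), the rule that runs longer than n = 3 are compressed, and doubling of literal flag characters:

    FLAG = '!'

    def rle_encode(data, n=3):
        """Run-length encode `data`; runs longer than n become <char><FLAG><count>.
        A literal FLAG in the data is byte-stuffed by doubling it ("!!")."""
        out, i = [], 0
        while i < len(data):
            ch, run = data[i], 1
            while i + run < len(data) and data[i + run] == ch:
                run += 1
            if ch == FLAG:                     # stuff every literal flag character
                out.append(FLAG * 2 * run)
            elif run > n:                      # compress runs longer than n
                out.append(ch + FLAG + str(run))
            else:
                out.append(ch * run)
            i += run
        return ''.join(out)

    def rle_decode(data):
        out, i = [], 0
        while i < len(data):
            if data[i] == FLAG:
                if data[i + 1] == FLAG:        # stuffed flag -> one literal '!'
                    out.append(FLAG)
                else:                          # repeat the previous character, count in total
                    out.append(out[-1] * (int(data[i + 1]) - 1))
                i += 2
            else:
                out.append(data[i])
                i += 1
        return ''.join(out)

    encoded = rle_encode("ABBBCCCCCDEFGGGGHI")
    print(encoded)                             # ABBBC!5DEFG!4HI
    assert rle_decode(encoded) == "ABBBCCCCCDEFGGGGHI"

Note that the run of three Bs is left uncompressed (it is not longer than n = 3), while the runs of five Cs and four Gs are replaced by flagged counts.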
Variants on Run-Length Encoding
It was stated by Murray and VanRyper (1994, Encyclopedia of Graphics File Formats) that there are
variants of RLE. The most common one is encoding an image sequentially along the X-axis
(starting from the top left corner) and slowly propagating down the Y-axis (ending at the bottom left
corner). This is shown in Figure 2.1 a). Encoding can also be done along the Y-axis (starting from
the top left corner) and slowly propagating across the X-axis (ending at the bottom right corner). This
is shown in Figure 2.1 b). Other forms include encoding in two-dimensional tiles, where we can select
the size of each tile in terms of the number of pixels. The last variant mentioned by Murray and
VanRyper is encoding in a zig-zag manner across the image. We can expect the last two encoding
algorithms to be used only in highly specialised applications.
Advantages and Disadvantages of RLE
Advantages:
• The algorithm is simple to implement
• It is fast to execute
• It produces lossless compression of images
Disadvantage:
• The compression ratio is not as high as that of more advanced compression algorithms.
LECTURE 31:
APPLICATIONS OF MULTIMEDIA
Multimedia presentations may be viewed in person on stage, projected, transmitted, or played locally with a media
player. A broadcast may be a live or recorded multimedia presentation. Broadcasts and recordings can be either
analog or digital electronic media technology. Digital online multimedia may be downloaded or streamed. Streaming
multimedia may be live or on-demand.
Multimedia games and simulations may be used in a physical environment with special effects, with multiple users in
an online network, or locally with an offline computer, game system, or simulator.
The various formats of technological or digital multimedia may be intended to enhance the users' experience, for
example to make it easier and faster to convey information, or, in entertainment and art, to transcend everyday
experience.
A lasershow is a live multimedia performance.
Enhanced levels of interactivity are made possible by combining multiple forms of media content. Online multimedia
is increasingly becoming object-oriented and data-driven, enabling applications with collaborative end-user
innovation and personalization on multiple forms of content over time. Examples range from photo galleries on Web
sites that combine user-updated images (pictures) and titles (text), to simulations whose coefficients, events,
illustrations, animations or videos are modifiable, allowing the multimedia "experience"
to be altered without reprogramming. In addition to seeing and hearing, Haptic technology enables virtual objects to
be felt. Emerging technology involving illusions of taste and smell may also enhance the multimedia experience.
History of the term
In 1965 the term Multi-media was used to describe the Exploding Plastic Inevitable, a performance that combined
live rock music, cinema, experimental lighting and performance art.
In the intervening forty years the word has taken on different meanings. In the late 1970s the term was used to
describe presentations consisting of multi-projector slide shows timed to an audio track. In the 1990s it
took on its current meaning. In common usage, the term multimedia refers to an electronically delivered combination
of media, including video, still images, audio and text, presented in such a way that it can be accessed interactively.[1] Much of the
content on the web today falls within this definition as understood by millions.
Some computers which were marketed in the 1990s were called "multimedia" computers because they incorporated a
CD-ROM drive, which allowed for the delivery of several hundred megabytes of video, picture, and audio data.
Use of Multimedia
In Business and Industry
Multimedia is helpful for presentations, marketing, product demonstrations, instant messaging, employee training,
advertisement and selling products all over the world via virtually unlimited web-based technologies.
For example, pilots train and practice on multimedia virtual systems before going on an actual flight, and a
salesperson can learn about a product line through demonstrations.
Multimedia is also used as a way to help present information to shareholders, superiors and coworkers.
The picture above shows a multimedia presentation using PowerPoint. Corporate presentations may combine all forms of
multimedia.
Virtual Reality
Virtual reality uses multimedia content. Applications and delivery platforms of multimedia are virtually limitless.
Multimedia finds its application in various areas including, but not limited to, advertisements, art, education,
entertainment, engineering, medicine, mathematics, business, scientific research and spatial temporal applications.
Several examples are as follows:
Creative industries
Creative industries use multimedia for a variety of purposes ranging from fine arts, to entertainment, to commercial
art, to journalism, to media and software services provided for any of the industries listed below. An individual
multimedia designer may cover the spectrum throughout their career. Requests for their skills range from technical, to
analytical, to creative.
Commercial
Much of the electronic old and new media utilized by commercial artists is multimedia. Exciting presentations are
used to grab and keep attention in advertising. Industrial, business to business, and interoffice communications are
often developed by creative services firms for advanced multimedia presentations beyond simple slide shows to sell
ideas or liven up training. Commercial multimedia developers may be hired to design for governmental services and
nonprofit services applications as well.
Entertainment and fine arts
In addition, multimedia is heavily used in the entertainment industry, especially to develop special effects in movies
and animations. Multimedia games are a popular pastime and are software programs available either as CD-ROMs or
online. Some video games also use multimedia features. Multimedia applications that allow users to actively
participate instead of just sitting by as passive recipients of information are called Interactive Multimedia. In the Arts
there are multimedia artists, whose minds are able to blend techniques using different media that in some way
incorporates interaction with the viewer. One of the most relevant could be Peter Greenaway who is melding Cinema
with Opera and all sorts of digital media. Another approach entails the creation of multimedia that can be displayed in
a traditional fine arts arena, such as an art gallery. Although multimedia display material may be volatile, the
survivability of the content is as strong as any traditional media. Digital recording material may be just as durable and
infinitely reproducible with perfect copies every time.
Education
In Education, multimedia is used to produce computer-based training courses (popularly called CBTs) and reference
works like encyclopedias and almanacs. A CBT lets the user go through a series of presentations, text about a
particular topic, and associated illustrations in various information formats. Edutainment is an informal term used to
describe combining education with entertainment, especially multimedia entertainment.
Learning theory in the past decade has expanded dramatically because of the introduction of multimedia. Several
lines of research have evolved (e.g. Cognitive load, Multimedia learning, and the list goes on). The possibilities for
learning and instruction are nearly endless.
Engineering
Software engineers may use multimedia in Computer Simulations for anything from entertainment to training such as
military or industrial training. Multimedia for software interfaces is often created as a collaboration between creative
professionals and software engineers.
Mathematical and Scientific Research
In Mathematical and Scientific Research, multimedia is mainly used for modelling and simulation. For example, a
scientist can look at a molecular model of a particular substance and manipulate it to arrive at a new substance.
Representative research can be found in journals such as the Journal of Multimedia.
Medicine
In Medicine, doctors can be trained by watching a virtual surgery, or they can simulate how the human body is
affected by diseases spread by viruses and bacteria and then develop techniques to prevent them.
Multimedia at home
Multimedia can be used at home through television, the internet or CD media. Home shopping, gardening,
cooking and home design are areas where multimedia can be a useful, or even better, way to get information.
LECTURE 32:
Virtual Reality Information
Introduction
We often receive requests from students, hobbyists, or interested parties for more information about Virtual
Reality (VR). The following is an attempt to provide an introduction to the topic, and to also supply some
links where further information may be found.
What is Virtual Reality (VR)?
Virtual Reality is generally a Computer Generated (CG) environment that makes the user think that he/she
is in the real environment. One may also experience a virtual reality by simply imagining it, like Alice in
Wonderland, but we will focus on computer generated virtual realities for this discussion.
The virtual world is hosted on a computer in the form of a database (e.g. terrain database or environment
database). The database resides in the memory of the computer. The database generally consists of
points in space (vertices), as well as textures (images). Vertices may be connected to form planes,
commonly referred to as polygons. Each polygon consists of at least three vertices. The polygon could
have a specific color, and the color could be shaded, or the polygon could have a texture pasted onto it.
Virtual objects will consist of polygons. A virtual object will have a position (x, y, z), an orientation (yaw,
pitch, roll) as well as attributes (e.g. gravity or elasticity).
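A minimal sketch of such a virtual-object record, written in Python purely for illustration (the field names are assumptions for this example, not any particular engine's API):

    from dataclasses import dataclass, field
    from typing import List, Tuple

    Vertex = Tuple[float, float, float]          # a point in space (x, y, z)
    Polygon = Tuple[int, ...]                    # indices of at least three vertices

    @dataclass
    class VirtualObject:
        """Minimal sketch of one object in a virtual-world database."""
        vertices: List[Vertex]                   # geometry: points in space
        polygons: List[Polygon]                  # planes formed by connecting vertices
        position: Vertex = (0.0, 0.0, 0.0)       # (x, y, z)
        orientation: Vertex = (0.0, 0.0, 0.0)    # (yaw, pitch, roll)
        attributes: dict = field(default_factory=dict)  # e.g. {"gravity": True, "elasticity": 0.3}

    # A single colored triangle as the simplest possible object.
    triangle = VirtualObject(
        vertices=[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
        polygons=[(0, 1, 2)],
        attributes={"color": (255, 0, 0)},
    )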
The virtual world is rendered with a computer. Rendering involves the process of calculating the scene that
must be displayed (on a flat plane) for a virtual camera view, from a specific point, at a specific orientation
and with a specific field of view (FOV). In the past the central processing unit (CPU) of the computer was
mainly used for rendering (so-called software rendering). Lately we have graphics processing units (GPUs)
that render the virtual world to a display screen (so-called hardware rendering). The GPUs are normally
situated on graphics accelerator cards, but may also be situated directly on the motherboard of the
computer. Hardware rendering is generally much faster than software rendering.
The virtual environment (also sometimes referred to as a synthetic environment) may be experienced with
a Desktop VR System, or with an Immersive VR System.
With Desktop VR a computer screen is normally used as the display medium. The user views the virtual
environment on the computer screen. In order to experience the virtual environment, the user must look at
the screen the whole time.
With Immersive VR the user is 'immersed in' or 'surrounded by' the virtual environment. This may be
achieved by using:
A Multi-Display System
or
A Head Mounted Display (HMD)
Immersive VR Systems provide the user with a wider field of view than Desktop VR Systems.
With Multi-Display Systems the field of view (FOV) of the user is extended by using several computer
monitors, or projectors. When using projectors, the image may be front-projected or back-projected onto
the viewing screen. Many simulators utilize three screens (forward view, left view, right view) to provide an
extended FOV. A configuration where the user is surrounded by projection screens is sometimes
referred to as a cave environment. The image may also be projected on a dome that may vary in shape
and size. With a multi-display system the user may look around as if in the real world.
A Head Mounted Display (HMD) consists of two miniature displays that are mounted in front of the user's
eyes with a headmount. Special optics enable the user to view the miniature screens. The HMD also
contains two headphones, so that the user may also experience the virtual environment aurally. The HMD
is normally fitted with a Head Tracker. The position (x, y, z) and orientation (yaw, pitch, roll) of the user's
head is tracked by means of the Head Tracker. As the user looks around, the position and orientation
information is continuously relayed to the host computer. The computer calculates the appropriate view
(virtual camera view) that the user should see in the virtual environment, and this is displayed on the
miniature displays. For example, let's assume that the virtual environment is the inside of a car, and that
the user is sitting behind the steering wheel. If the user looks forward, the head tracker will measure this
orientation, and relay it to the computer. The computer would then calculate the forward view, and the user
will see the windscreen, wipers and bonnet of the car (the user will obviously also see the outside world, or
out of window (OOW) view). If the user looks down, the computer will present a view of the steering wheel.
If the user looks further down, the accelerator pedal, clutch (if present) and brake pedal will be shown. The
orientation information may also be used to experience stereo and 3-D sound. If the user looks straight
forward, he/she will hear the engine noise of the car. The volume and phase will be equal for the right and
left ear. If the user looks to the left, the volume of the engine noise will be higher in the right ear and lower
in the left ear. Trackers that only track the orientation (yaw, pitch, roll) are referred to as 3 degree of
freedom, or 3 DOF, trackers, while trackers that also track the position (x, y, z) are referred to as 6 DOF
trackers.
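The stereo cue described above can be sketched with a simple panning law. The Python snippet below is a toy model only; the yaw sign convention (positive to the listener's left) and the sine-based panning are assumptions made for the illustration, not how any particular VR system computes spatial audio:

    import math

    def engine_gains(head_yaw_deg, source_yaw_deg=0.0):
        """Toy model of the head-tracked stereo cue.

        Yaw is measured in degrees in the horizontal plane, positive to the
        listener's left. Returns (left_gain, right_gain), each in [0, 1].
        """
        # Angle of the source in the head's frame: positive = to the listener's left.
        relative = math.radians(source_yaw_deg - head_yaw_deg)
        pan = math.sin(relative)               # +1 = fully left, -1 = fully right
        left = (1.0 + pan) / 2.0
        right = (1.0 - pan) / 2.0
        return left, right

    print(engine_gains(0.0))    # looking straight ahead: (0.5, 0.5), equal in both ears
    print(engine_gains(45.0))   # head turned to the left: engine louder in the right ear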
Applications of Virtual Reality (VR)
Virtual Reality is an ideal training and visualization medium.
VR is ideal for the training of operators that perform tasks in dangerous or hazardous environments. The
trainee may practice the procedure in virtual reality first, before graduating to reality-based training. The
trainee may be exposed to life-threatening scenarios, under a safe and controlled environment. Examples
of dangerous or hazardous environments may be found in the following fields:
Aviation
Automotive
Chemical
Defense
High Voltage
Industrial
Marine
Medical
Mining
Nuclear Energy
VR is also an ideal tool to train operators for:
This paper describes the development and impact of new visually-coupled system (VCS) equipment designed to
support engineering and human factors research in the military aircraft cockpit environment. VCS represents an
advanced man-machine interface (MMI). Its potential to improve aircrew situational awareness seems enormous, but
its superiority over the conventional cockpit MMI has not been established in a conclusive and rigorous fashion.
What has been missing is a "systems" approach to technology advancement that is comprehensive enough to produce
conclusive results concerning the operational viability of the VCS concept and verify any risk factors that might be
involved with its general use in the cockpit. The advanced VCS configuration described here has been ruggedized for
use in military aircraft environments and was dubbed the Virtual Panoramic Display (VPD). It was designed to
answer the VCS portion of the "systems" problem, and is implemented as a modular system whose performance can
be tailored to specific application requirements. The overall system concept and the design of the two most important
electronic subsystems that support the helmet-mounted components -- a new militarized version of the magnetic
helmet mounted sight and correspondingly similar helmet display electronics -- are discussed in detail. Significant
emphasis is given to illustrating how particular design features in the hardware improve overall system performance
and support research activities.
A study was conducted to compare and validate two visually coupled system (VCS) installations, one in a
moving-base flight simulator and a second in a Bell 205 research helicopter. Standard low-level
maneuvering tasks were used to examine changes in handling qualities. Pilots also assessed two levels of
control augmentation: rate damped and translational rate command. The system handling qualities
degraded whenever the VCS was used. Pilots reported that there were system deficiencies which
increased their workload and prevented them from achieving desired task performance. The decline in
handling qualities was attributed principally to the reduction in image quality while flying the helicopter with
the VCS. The primary factors affecting performance included a reduction in image resolution, a reduction
in the field of view, system latencies, and the limitations of the simulator mathematical model. Control
augmentation improved the system handling qualities in the simulator and should be investigated further
as an effective strategy for overcoming VCS limitations.
The Visually-Coupled System Computer-Generated Imagery (VCS-CGI) Interface program had two main phases.
The objective for the first phase was to successfully interface the various subcomponents (helmet-mounted sight,
helmet-mounted displays, and advanced computer-generated image system) and demonstrate their compatibility and
feasibility for use in wide field of view, air-to-ground visual simulation. The objective for the second phase was to
conduct a systematic exploration and evaluation of various system parameters that could affect display quality.
LECTURE 33:
Visually Coupled Systems
The capability to “look and shoot” was but a fantasy in the days of the Flash Gordon and Buck Rogers comic strips.
Soon today’s Air Force pilot will be able to aim his weapons by a mere glance and fire along his line of sight by the
simple push of a button. Systematic research and development of visual-coupling concepts, to improve man’s
relationship with his machine, are helping to bring a “look and shoot” capability closer to operational reality.
Recent combat experience has shown that many tactical, reconnaissance/strike, and air-superiority systems are
operator-limited by both the task loading placed on the crew and the design of the interface between the operator and
his machine. As long as tactical weapon systems are used in a high-threat environment, the flight profiles necessary
for survivability will dictate that the operator perform all essential tasks effectively, accurately, and, most important,
expeditiously. A well-designed interface lets him use his natural perceptual and motor abilities optimally. Such
limiting factors are especially critical in weapon delivery missions where visual target acquisition and weapon aiming are
task requirements.
Since 1965, in an attempt to improve aircraft man-machine design, human-factors engineers of the Aerospace
Medical Research Laboratory (AMRL) at Wright-Patterson AFB, Ohio, (a unit of Aerospace Medical Division) have
been pioneering techniques to “visually couple” the operator to his weapon system.
A visually coupled system is more correctly a special subsystem that integrates the natural visual and motor skills of
an operator with the machine he is controlling. An operator visually searches for, finds, and tracks an object of
interest. His line of sight is measured and used to aim sensors and/or weapons toward the object. Information related
to his visual/motor task from sensors, weapons, or central data sources is fed back directly to his vision by special
displays so as to enhance his task performance. In other words, he looks at the target, and the sensors/weapons
automatically point at the target. Simultaneously with the display, he verifies where sensors/weapons are looking. He
visually fine-tunes their aim, and he shoots at what he sees.
Two functions are performed: a line-of-sight sensing/control function and a display feedback function. Although each
may be used separately, a fully visually coupled system includes both. Thus, it is a unique control/display subsystem
in which man’s line of sight is measured and used for control, and visual information is fed back directly to his eyes
for his attention and use.
Currently a helmet-mounted sight is used to measure head position and line of sight. An early version of a helmet
sight was used in an in-flight evaluation at Tyndall AFB in 1969. Various experimental sights have undergone flight
tests. The U.S. Navy has produced a similar sight for operational use in F-4J and F-4B aircraft.
A helmet-mounted display is used to feed back information to the eye. An early bulky experimental display
completely occluded outside vision to the right eye. Later versions permit a see-through capability, which allows
simultaneous viewing of the display and the outside world scene. Many experimental display improvements are under
study, but display flight-test experience is still limited. Research and development efforts are under way to reduce
size, weight, and profile and to increase the performance of future visual coupling devices. Before looking at
development progress toward operational reality, let’s explain in general terms how such sights, displays, and
visually coupled systems are now mechanized and discuss their potential capabilities.
helmet sight components and capabilities
In the mid-sixties Honeywell selected, as one way to mechanize line-of-sight determination, an electrooptical
technique for determining helmet position and the line of sight through a reticle. (Figure 1) Rotating parallel fanlike
planes of infrared energy from the sight surveying units (mounted on canopy rails) scan two photo diodes on the side
of the helmet. Timing signals from the scanners and diodes are processed by a digital computer (sight electronics
unit) to determine line of sight. Such line-of-sight information can be used to point a variety of other subsystems.
A helmet-mounted sight facilitates wide off-boresight sensor or weapon aiming and speeds target acquisition. It
permits continuous visual attention to the target outside the cockpit while sensors/weapons are slewed, and the hands
are free from slewing control. The sight capitalizes on the ease and accuracy of the operator’s natural eye/head
tracking abilities. His natural outside-the-cockpit spatial orientation is used throughout target acquisition.
helmet display components and capabilities
In an experimental helmet-mounted display video and symbolic signals are received from various alternative aircraft
subsystems. Cathode-ray tube (CRT) imagery is projected directly to the eye of the operator in such a way that it
appears to be focused upon real-world background. A collimation lens performs the focus at infinity. The combiner
reflects the imagery into the eye much as a dental mirror does; however, it permits the eye to see through to the real-world scene simultaneously. Thus it essentially combines the display and real-world scenes for the eye.
A small helmet display could substitute effectively for a large conventional panel-mounted display and would give
the operator the benefits of larger display with a high-quality image. The designer benefits from an overall subsystem
weight and space savings. These advantages accrue from the simple fact that in target detection it is image size upon
the retina of the eye which counts. A one-inch-diameter CRT display, presented as a virtual image1 and placed one
inch in front of the eye, results in approximately the same image size on the retina as a 21-inch CRT mounted
on a panel 21 inches away from the eye.2 Miniature CRT technology can now provide sufficient resolution to make a
high-quality helmet-mounted image display practical.
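A quick check of the geometry behind this claim: the visual angle subtended by a flat display of size s at distance d is 2·arctan(s / 2d), which comes out the same for 1 inch viewed from 1 inch as for 21 inches viewed from 21 inches. A small Python calculation:

    import math

    def visual_angle_deg(size_in, distance_in):
        """Visual angle subtended by a display of the given size at the given
        distance (both in inches): theta = 2 * atan(size / (2 * distance))."""
        return math.degrees(2 * math.atan(size_in / (2 * distance_in)))

    print(visual_angle_deg(1, 1))    # 1-inch CRT viewed from 1 inch   -> ~53.1 degrees
    print(visual_angle_deg(21, 21))  # 21-inch CRT viewed from 21 in.  -> ~53.1 degrees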
Even though most aircraft panels cannot accommodate large CRT’s, it is important that the displayed imagery be
large enough to be detected and identified by the eye. In other words, the image size detection capabilities of the
sensor, the display, and the eye should be made as compatible as possible. Helmet-mounted displays offer designers a
new way to achieve this compatibility. They offer the operator continuous head-up, captive-attention viewing. When
the display is used alone, selected sensor imagery, flight control/display symbols, or other subsystem status
information can be directly presented to the eye no matter where the operator is looking. However, comprehensive
analyses and ground and in-flight evaluations of the operator’s capability to use the information must be carried out if
operator effectiveness is to be realized.
visually coupled systems components and capabilities
The helmet-mounted sight and display are combined as a system integrated with the operator's vision.
Mechanization of the full system involves integration of the sight and display components into a lightweight helmet,
development of a visor that automatically varies light transmission to ensure appropriate display brightness contrast,
and improvements in the electronic and optic components.
When they are combined and matched with seekers, sensors, central data computers, and/or flight control subsystems,
entirely new control/display capabilities can be provided to the user: a hemispheric head-up display that is compatible
with the operator’s outside-the-cockpit spatial orientation; sensor extensions of the operator’s vision (e.g., it is
possible to position sensors so the operator “looks” through aircraft structures); visual control of the aircraft and
weapons; and visual communications between crew members and between aircraft.
Potential visual coupling applications with aircraft and remotely piloted vehicle fire control, flight control,
reconnaissance, navigation, weapon delivery, and communications subsystems are many. In a night attack mission,
for example, a low-light-level television scene can be displayed, superimposed on the real world, off-boresight, and
through aircraft structure. Flight control and weapons data are provided in addition to the ground scene on the
display.
Visually coupled systems can also be used to input line-of-sight angle information into central computers in order to
update navigation; to identify a target location for restrike, reconnaissance, or damage assessment; and to
communicate coordinate locations in real time with other aircraft or with command and control systems. By means of
intracockpit visual communication, one operator can cue another operator on where to look for targets of interest.
Similar nonverbal communication between forward air control and attack aircraft is conceivable.
LECTURE 34:
VISUALLY COUPLED SYSTEM DEVELOPMENT
Visually coupled system development is merely highlighted here to give a glimpse of progress. No attempt is made to
be comprehensive but rather to give a feel for some of the choices and changes that led to the current objectives of the
Air Force engineering development efforts. Until 1971 these efforts were mainly exploratory. Since March 1971, the
Aerospace Medical Research Laboratory has pursued exploratory development of advanced concepts and engineering
development of visual coupling devices. Progress to date indicates that these devices will soon be ready for Air Force
operational use.
helmet-mounted sight development
Historically, it is not possible to trace the basic line-of-sight and display-feedback concepts to specific originators.
Some credit for the sighting concept should go to behavioral scientists who in the late forties and early fifties were
engrossed in systematic analyses of pilot eye movements to determine instrument scan and visual search patterns. 3
Initial applied sighting efforts in government and industry concerned the accuracy of head and eye tracking in the
laboratory.4 It was apparent that accuracy and effectiveness were functions of the head and/or eye position sensing
techniques. Applications had to await practical sensing technologies. Head position tracking has received the most
applied emphasis. Eye position tracking continues to be explored. It was also evident that the proof for any sighting
technique would be in its accuracy and acceptability in flight.
The Army, Navy, Air Force, and industry have pursued complementary developments of helmet-mounted sights.
Two especially noteworthy early approaches to line-of-sight sensing, a mechanical head-position sensing system by
Sperry, Utah Division, and an electrooptical helmet-position sensing system by Minneapolis Honeywell, were
developed to the brassboard5 stage for testing in the 1965 through 1967 time period.
Sperry’s sight is a mechanically linked system where helmet position is determined in a manner similar to the
working of drafting-board arms. A reticle in front of the operator’s eye fixes the line of sight in reference to the
helmet and its mechanical linkage. A magnetic attachment provides a quick disconnect capability. 6 Under Army
contracts, Sperry’s mechanical head-position tracker was evaluated in UH-1 and AH-1G helicopters, starting in 1967.
Subsequent testing has led to a production contract to use the mechanical helmet-mounted sight as a backup target
acquisition aid on certain Cobra helicopters. The Air Force pursued the mechanical sight approach in early AC-130
gunship aircraft.
on the helmet and cockpit-mounted photo diode detection surfaces. A Honeywell technique employs ultrasonic sound
ranging and sensing.
Projected advanced sight improvements include enlarging from 1 cubic foot to 2 cubic feet the head-motion envelope
(motion box) within which helmet position can be determined. Also sighting accuracy is to be improved, as is the
effective coverage of line-of-sight azimuth and elevation angles. Improvements will further reduce helmet weight to
51 ounces and will reduce costs per unit while increasing its reliability. Helmet-mounted sight technology is being
integrated with helmet display developments (described below) to form the Air Force’s fully visually coupled system.
THE FUTURE OF VISUALLY COUPLED SYSTEMS
Visually coupled systems are under Army consideration for night attack applications in the Cobra helicopter and as a
technology for future advanced attack helicopters. The Navy is considering them for its AGILE and F-14 programs.
The Air Force is investigating potential applications for air-to-air and air-to-ground target acquisition and weapon
delivery, off-boresight weapon guidance, reconnaissance, navigation, and remotely piloted vehicle control. Currently,
the Air Force is conducting or planning flight tests in conjunction with the AGM-65 Maverick, C-130 gunship, laser
designation systems, and weapons guidance developments.
The immediate future will be determined by progress made in current development and test projects. Advanced
technology will be transitioned into engineer development prototypes for testing. Interim helmet-mounted sights will
be ready for Air Force application testing late in 1974, and advanced versions of sights, displays, and combined
subsystems will be ready in 1977.
The Aerospace Medical Research Laboratory is already looking beyond the technology discussed here to future
visual coupling concepts that can further improve operator performance. One such line-of-sight sensing concept will
determine eye position rather than head position. A display patterned after the human eye is also under study. These
concepts are briefly discussed:
• The Aerospace Medical Research Laboratory is attempting to determine line of sight accurately from a small
eye-movement sensor that could be located in the cockpit instrument panel. This remote oculometer
determines line of sight from the reflection angle of a beam of infrared “light” projected onto the cornea of the
eye. Currently, in the laboratory the eye can be tracked within a one-cubic-foot motion box with an accuracy
of one degree. Should this technique prove to be practical, line-of-sight sensing and control could be possible
without any encumbrance on the head.18
• A promising visually coupled system display technique employs dual-resolution fields of view, high
resolution with a zoom capability in the center and low resolution in the periphery. 19 This concept, patterned
after human vision, offers considerable potential in target search and identification as a means of coupling the
operator with high-magnification sensors and high-power optics. Associated sensor slewing control
techniques enable the operator to feel he has been moved closer to the target while using his natural visual
search capabilities.
• Also, several display-related technologies can be incorporated into visual coupling devices. For example,
predictor displays can be readily exploited. Color display for improved infrared sensor target detection is also
a possibility.20
In summary, the “look and shoot” capability is around the corner. Systematic R&D pursuit of visual coupling
technology is opening many possible applications. Development of components has progressed sufficiently that
operational applications are feasible. Although helmet sighting technology is further along than helmet display, full
visually coupled systems capabilities should be available to the Air Force in 1977. Operator and system performance
will be appreciably enhanced by the application of visual coupling devices.
LECTURE 35:
VR Operating Shell
VR poses a true challenge for the underlying software environment, usually referred to as the VR operating shell.
Such a system must integrate real-time three-dimensional graphics, large object-oriented modelling and database
techniques, event-driven simulation techniques, and overall dynamics based on multithreaded distributed
techniques. The emerging VR operating shells, such as Trix at Autodesk, Inc., VEOS at HIT Lab, and Body Electric
at VPL, Inc., share many design features with the MOVIE system. A multiserver network of multithreading
interpreters of high-level object-oriented language seems to be the optimal software technology in the VR domain.
We expect MOVIE to play an important role in the planned VR projects at Syracuse University, described in the
previous section. The system is capable of providing both the overall infrastructure (VR operating shell) and the
high-performance computational model for addressing new challenges in computational science, stimulated by VR
interfaces. In particular, we intend to address research topics in biological vision on visual perception limits (in
association with analogous constraints on VR technology), research topics in machine vision (in association with
high-performance support for the ``non-encumbered'' VR interfaces), and neural network research topics (in
association with the tracking and real-time control problems emerging in VR environments). From the software
engineering perspective, MOVIE can be used both as the base MovieScript-based software development platform and
as the integration environment which allows us to couple and synchronize various external VR software packages
involved in the planned projects.
Figure illustrates the MOVIE-based high-performance VR system planned at NPAC and discussed in the previous
section. High-performance computing, high-quality three-dimensional graphics, and VR peripherals modules are
mapped on an appropriate set of MovieScript threads. The overall synchronization necessary, for example, to sustain
the constant frame rate, is accomplished in terms of the real-time component of the MovieScript scheduling model.
The object-oriented interpreted multithreading language model of MovieScript provides the critical mix of
functionalities, necessary to cope efficiently with prototyping in such complex software and hardware environments.
Planned High-End Virtual Reality Environment at NPAC. New parallel systems: CM-5, nCUBE2 and DECmpp are
connected by the fast HIPPI network and integrated with distributed FDDI clusters, high-end graphics machines, and
VR peripherals by mapping all these components on individual threads of the VR MOVIE server. Overall
synchronization is achieved by the real-time support within the MOVIE scheduling model. Although the figure
presents only one ``human in the loop,'' the model can also support in a natural way the multiuser, shared virtual
worlds with remote access capabilities and with a variety of interaction patterns among the participants.
The MOVIE model-based high-performance VR server at NPAC could be employed in a variety of visualization-intensive R&D projects. It could also provide a powerful shared VR environment, accessible from remote sites.
MovieScript-based communication protocol and remote server programmability within the MOVIE network assure
satisfactory performance of shared distributed virtual worlds also for low-bandwidth communication media such as
telephone lines.
From the MOVIE perspective, we see VR as an asymptotic goal in the GUI area, or the ``ultimate'' user interface.
Rather than directly build the specific VR operating shell, which would be short-lived given the current state of the
art in VR peripherals, we instead construct the VR support in a graded fashion, closely following existing and
emerging standards. A natural strategy is to extend the present MovieScript GUI sector, based on Motif and three-dimensional servers, by some minimal VR operating shell support.
Two possible public domain standard candidates in this area to be evaluated are VEOS from HIT Lab and MR
(Minimal Reality) from the University of Alberta. We also plan to experiment with the Presence toolkit from DEC
and with the VR_Workbench system from SimGraphics, Inc.
Parallel with evaluating emerging standard candidates, we will also attempt to develop a custom MovieScript-based
VR operating shell. Present VR packages typically split into the static CAD-style authoring system for building
virtual worlds and the dynamic real-time simulation system for visiting these worlds. The general-purpose support for
both components is already present in the current MovieScript design: an interpretive object-oriented model with
strong graphics support for the authoring system and a multithreading multiserver model for the simulation system.
A natural next step is to merge both components within the common language model of MovieScript so that new
virtual worlds could also be designed in the dynamic immersive mode. The present graphics speed limitations do not
currently allow us to visit worlds much more complex than just Boxvilles of various flavors, but this will change in
coming years. Simple solids can be modelled in the conventional mouse-based CAD style, but with the growing
complexity of the required shapes and surfaces, more advanced tools such as VR gloves become much more
functional. This is illustrated in the figure below, where we present a natural transition from the CAD-style to VR-style modelling
environment. Such VR-based authoring systems will dramatically accelerate the process of building virtual worlds in
areas such as industrial or fashion design, animation, art, and entertainment. They will also play a crucial role in
designing nonphysical spaces, for example for hypermedia navigation through complex databases, where there are no
established VR technologies and the novel immersion ideas can be created only by active, dynamic human
participation in the interface design process.
Examples of the Glove-Based VR Interfaces for CAD and Art Applications. The upper figure illustrates the planned
tool for interactive sculpturing or some complex, irregular CAD tasks. A set of ``chisels'' will be provided, starting
from the simplest ``cutting plane'' tool to support the glove-controlled polygonal geometry modelling. The lower
figure illustrates a more advanced interface for the glove-controlled surface modelling. Given the sufficient resolution
of the polygonal surface representation and the HPC support, one can generate the illusion of smooth, plastic
deformations for various materials. Typical applications of such tools include fashion design, industrial (e.g.,
automotive) design, and authoring systems for animation. The ultimate goal in this direction is a virtual world
environment for creating new virtual worlds.
VR-related Technologies
Other VR-related technologies combine virtual and real environments. Motion trackers are employed to monitor the
movements of dancers or athletes for subsequent studies in immersive VR. The technologies of 'Augmented Reality'
allow for the viewing of real environments with superimposed virtual objects. Telepresence systems (e.g.,
telemedicine, telerobotics) immerse a viewer in a real world that is captured by video cameras at a distant location
and allow for the remote manipulation of real objects via robot arms and manipulators.
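The superimposition step behind such augmented reality and telepresence systems can be illustrated with a small pinhole-camera projection in Python. The pose and intrinsic values below are made up for the example; a real system would obtain them from the head or camera tracker and from camera calibration.

```python
import numpy as np

def project_points(points_world, R, t, fx, fy, cx, cy):
    """Project 3-D points (world frame) into pixel coordinates of a tracked camera.

    R, t           : camera pose from the tracker (world-to-camera rotation, translation)
    fx, fy, cx, cy : pinhole intrinsics of the video camera
    """
    pts_cam = (R @ points_world.T).T + t          # world frame -> camera frame
    x = pts_cam[:, 0] / pts_cam[:, 2]             # perspective division
    y = pts_cam[:, 1] / pts_cam[:, 2]
    return np.stack([fx * x + cx, fy * y + cy], axis=1)

# Corners of a virtual 10 cm cube sitting about 2 m in front of the camera.
cube = np.array([[x, y, z] for x in (0, .1) for y in (0, .1) for z in (2.0, 2.1)])
pixels = project_points(cube, R=np.eye(3), t=np.zeros(3),
                        fx=800, fy=800, cx=320, cy=240)
# Drawing 'pixels' on top of the live video frame yields the augmented view.
```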
Applications
As the technologies of virtual reality evolve, the applications of VR become virtually unlimited. It is expected that VR
will reshape the interface between people and information technology by offering new ways for the communication
of information, the visualization of processes, and the creative expression of ideas.
Note that a virtual environment can represent any three-dimensional world that is either real or abstract. This includes
real systems like buildings, landscapes, underwater shipwrecks, spacecraft, archaeological excavation sites, human
anatomy, sculptures, crime scene reconstructions, solar systems, and so on. Of special interest is the visual and
sensory representation of abstract systems like magnetic fields, turbulent flow structures, molecular models,
mathematical systems, auditorium acoustics, stock market behavior, population densities, information flows, and any
other conceivable system, including artistic and creative work of an abstract nature. These virtual worlds can be
animated, interactive, and shared, and they can expose behavior and functionality.
Figure: Real and abstract virtual worlds (Michigan Stadium, flow structure).
Useful applications of VR include training in a variety of areas (military, medical, equipment operation, etc.),
education, design evaluation (virtual prototyping), architectural walk-through, human factors and ergonomic studies,
simulation of assembly sequences and maintenance tasks, assistance for the handicapped, study and treatment of
phobias (e.g., fear of heights), entertainment, and much more.
7. Available VR Software Systems
[NOTE: This section is badly out of date. Most of the information is from the 1993 version of this paper. It does not
address VRML or other newer systems. Search Yahoo or another service to find newer systems.]
There are currently quite a number of different efforts to develop VR technology. Each of these projects has
different goals and approaches to the overall VR technology. Large and small university labs have projects underway
(UNC, Cornell, U. Rochester, etc.). ARPA, NIST, the National Science Foundation, and other branches of the US
Government are investing heavily in VR and other simulation technologies. There are industry-supported laboratories
too, like the Human Interface Technology Laboratory (HITL) in Seattle and the Japanese NTT project. Many
existing and startup companies are also building and selling world-building tools (Autodesk, IBM, Sense8,
VREAM).
There are two major categories of available VR software: toolkits and authoring systems. Toolkits are
programming libraries, generally for C or C++, that provide a set of functions with which a skilled programmer can
create VR applications. Authoring systems are complete programs with graphical interfaces for creating worlds
without resorting to detailed programming. These usually include some sort of scripting language in which to
describe complex actions, so they are not really non-programming tools, just much simpler programming. The
programming libraries are generally more flexible and have faster renderers than the authoring systems, but you must
be a very skilled programmer to use them. (Note to developers: if I fail to mention your system below, please let me
know and I will try to remember to include it when, and if, I update this document again.)
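The difference between the two categories can be made concrete with a short Python sketch. Python is used only for illustration; real toolkits of that period were C/C++ libraries, and all names below are invented for this sketch rather than taken from any actual product.

```python
# Toolkit style: a programmer assembles the world through explicit library calls.
class Scene:
    def __init__(self):
        self.nodes = []

    def add(self, shape, position):
        self.nodes.append({"shape": shape, "position": list(position)})

    def render_frame(self):
        # Stand-in for a real renderer: just report what would be drawn.
        return [f"{n['shape']} at {n['position']}" for n in self.nodes]

scene = Scene()
scene.add("cube", (0, 0, -2))
scene.add("cone", (1, 0, -3))
print(scene.render_frame())

# Authoring-system style: the same world as a declarative description plus small
# scripted behaviours; the authoring tool, not the user, turns this into the
# explicit calls shown above.
world_description = [
    {"shape": "cube", "position": [0, 0, -2], "on_touch": "spin"},
    {"shape": "cone", "position": [1, 0, -3], "on_touch": "play_sound"},
]
authored = Scene()
for entry in world_description:
    authored.add(entry["shape"], entry["position"])
```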
Lesson Plan
-- MMT --
The readings referred to in the table below are recommended material from:
A – "Multimedia: Making It Work" by Tay Vaughan
B – "Multimedia Systems" by John F. Koegel Buford
C – "Multimedia: Sound and Video" by Lozano
Lecture | Topics | Readings

Unit 1: Basics of Multimedia
1 | Computers, communication, entertainment; multimedia: an introduction | B-2.1, A-3 to 7
2 | Framework for multimedia; multimedia devices | A-173, 149, 56, 142, 143, B-1.3.1
3 | CD-ROM, CD-Audio; multimedia presentation and authoring | B-3.4, A-232 to 257
4 | Professional development tools; LAN, internet, World Wide Web | B-7.3, A-284 to 297
5 | ATM, ADSL; vector graphics; 3D graphics programs | A-230 to 245, A-301 to 315
6 & 7 | Animation techniques, shading, anti-aliasing, morphing, video on demand | B-13.3, B-17.6, A-331 to 332

Unit 2: Image Compression and Standards
8 & 9 | Making still images; editing and capturing images | A-134 to 139, A-141 to 142
10 | Scanning an image; computer color models | A-143, A-151 to 159
11 | Color palettes; vector drawing | A-144 to 146, C-6.3.2
12 | 3D drawing and rendering | A-146 to 151
13 | JPEG objectives and architecture | B-6.5.1 to 6.5.2
14 | JPEG DCT encoding and quantization | B-6.5.3
15 | JPEG statistical coding; JPEG predictive lossless coding | B-6.5.4, B-6.5.5
16 | JPEG performance | B-4.5.1
17 | GIF, PNG, TIFF, BMP | B-6.5.6

Unit 3: Audio and Video
18 | Digital representation of sound | A-359, A-162, A-355
19 | Subband encoding | B-4.3
20 | Fourier method | B-4.3.2, C-6.4
21 | Transmission of digital sound | B-4.3.2, A-271
22 | Digital audio signal processing | B-4.4
23 | Stereophonic and quadraphonic signal processing | B-4.5, C-7.5
24 | Editing sampled sound | B-4.5.3, A-291
25 | MPEG audio; audio compression and decompression | A-207, A-276 to 278
26 | Speech generation and recognition | C-9.4, A-415, B-8.5.9
27 | MIDI | A-106 to 116
28 | MPEG motion video compression standard | B-6.7
29 | DVI technology | B-6.8
30 | Time-based media representation and delivery | B-7.1 to 7.6

Unit 4: Virtual Reality
31 | Applications of multimedia | A-5 to 11
32 | Intelligent multimedia | B-18.3
33 | Desktop virtual reality | A-339 to 340
34 | VR operating systems | Notes from the internet
35 | Visually coupled system requirements | B-18.5
Record of Lectures Taken
Lecture No | Date | Topics Covered | Remarks
ASSIGNMENT-1
GURGAON INSTITUTE OF TECHNOLOGY AND MANAGEMENT
SUBJECT: Multimedia
CLASS: B.Tech 4th Sem
1. Define Multimedia. Give its applications in various areas.
2. Discuss different standards of CD-ROM. Define session management in the Orange Book standard of CD-ROM.
3. What is the utilization of high-speed devices in multimedia? Define SCSI and IDE devices.
4. Discuss the transport layer protocols used to handle multimedia data operations.
5. Write short notes on the following:
(a) Multimedia Authoring Tools.
(b) Synchronous and Isochronous Data.
(c) TCP and XTP
(d) ATM and FDDI
(e) ADSL
(f) Vector Graphics
ASSIGNMENT-2
GURGAON INSTITUTE OF TECHNOLOGY AND MANAGEMENT
SUBJECT: Multimedia
CLASS: B.Tech 4th Sem
1. Give various animation techniques in detail.
2. Explain multimedia servers and databases.
3. Explain multimedia distribution networks.
4. How are 3D graphics programs executed using multimedia?
5. Briefly describe tweening and morphing.
ASSIGNMENT-3
GURGAON INSTITUTE OF TECHNOLOGY AND MANAGEMENT
SUBJECT: Multimedia
CLASS: B.Tech 4th Sem
1. Explain the MPEG motion video compression standard.
2. Differentiate between MPEG-2 and MPEG-4.
3. Why is compression required in multimedia technologies?
4. Explain lossless and lossy compression techniques.
5. Explain JPEG DCT encoding and quantization. Explain quantization noise.
6. Compare the performance of JPEG with BMP and GIF.
7. How are capturing and editing done on still images?
8. Explain the various high-speed devices used in multimedia technologies.
9. Explain JPEG statistical encoding. How is it different from JPEG predictive lossless coding?
10. Write short notes on the following:
(a) Computer color models
(b) Color palettes
(c) Vector drawing
(d) 3D drawing and rendering
ASSIGNMENT-4
GURGAON INSTITUTE OF TECHNOLOGY AND MANAGEMENT
SUBJECT: Multimedia
CLASS: B.Tech 4th Sem
1. Explain intelligent multimedia systems. Explain their components.
2. Explain the QuickTime architecture for the Macintosh system. Explain its relevance to virtual reality operating systems.
3. What do you mean by VR? Explain intelligent VR software systems.
4. Explain applications of VR in various fields.
5. What are virtual environment displays and orientation making?
6. Write short notes on the following:
(a) MIME applications
(b) Zig-zag ordering
(c) Desktop VR
(d) Applications of multimedia
Questions on Audio and Video (Unit 3)
1. Describe the digital representation of sound.
2. How are analog signals encoded?
3. Explain the transmission of digital sound.
4. Define a speech recognition system and its utilization in daily life.
5. Explain digital audio signal processing. Briefly explain stereophonic and quadraphonic signal processing techniques.
6. Define the MPEG motion video compression standard.
7. Explain hybrid encoding methods.
8. Define DVI technology.
9. What is MIDI?
10. Explain time-based media representation and delivery.
11. Write short notes on the following:
(a) Subband coding
(b) Fourier method
(c) Time-domain sampled representation of sound
(d) Audio synthesis