LECTURE 1: Unit 1: Basics of Multimedia Technology
1. Computers, communication and entertainment

In the era of information technology we deal with a free flow of information, with no barriers of distance. Take the Internet as one of the simplest examples. You can view and download information from across the globe within a reasonable time, provided you have a good connection speed. Let us spend a moment thinking about the forms in which we access this information. The simplest and most common is printed text. Every web page contains some text, although the volume of textual content varies from page to page. The text is supported by graphics, still pictures, animations, video clips, audio commentaries and so on. All, or at least more than one, of these media, which we can collectively call "multimedia", are present to convey the information that the web site developers want to present to the world community at large. All these media are therefore used to present the information in a meaningful way and in an attractive style.

The Internet is not the only kind of information dissemination involving multiple media. Let us look at some other examples as well. In television, two media are involved, audio and video, which must be presented together in a synchronized manner. If we present the audio ahead of the video, or the video ahead of the audio, the results are far from pleasant. Loss of lip synchronization is noticeable even if the audio and video presentations differ by just 150 milliseconds or more. If the time lead or lag is of the order of seconds, one may lose the purpose of the presentation entirely. Say, in a distance learning programme, the teacher is explaining something written on a blackboard: if the audio and the video differ significantly in time, a student will not be able to follow the lecture at all. So television is also multimedia, and now we understand one more requirement of multimedia signals: they must be synchronized, and if absolute synchronization is not possible, they should at least follow a stringent specification within which the lack of synchronization can be tolerated.

Television is an example where there is only a unidirectional flow of multimedia information, from the transmitter to the receiver. In standard broadcast television there is no flow of information in the reverse direction, unless you use a different device and channel, say by talking to the television presenter over the telephone. On the Internet, of course, you have interactivity in the sense that you can navigate around the information and make selections through hyperlinks, but the bulk of the information flow is from the web server to the users. In some applications we require a free flow of multimedia signals between two or more nodes, as is the case with video conferencing, or what we should more appropriately call multimedia teleconferencing. In multimedia teleconferencing, the nodes, physically located in different parts of the globe, are equipped with microphones, cameras, a computer supporting text, graphics and animations, and other supporting devices if required.
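To make the lip-synchronization requirement above concrete, here is a minimal sketch (in Python) that checks whether an audio frame and a video frame, each carrying a presentation timestamp, are close enough to be played together. The 150 ms tolerance comes from the discussion above; the function name and data layout are invented for illustration and do not belong to any particular standard.

```python
# Illustrative check of audio/video lip-sync skew against a tolerance.
# The 150 ms figure is the perceptual threshold quoted in the text above.

LIP_SYNC_TOLERANCE_MS = 150

def in_sync(audio_ts_ms: float, video_ts_ms: float,
            tolerance_ms: float = LIP_SYNC_TOLERANCE_MS) -> bool:
    """Return True if the audio and video presentation timestamps
    (in milliseconds) are close enough for acceptable lip sync."""
    return abs(audio_ts_ms - video_ts_ms) <= tolerance_ms

# Example: video lags audio by 90 ms -> tolerable; by 400 ms -> noticeable.
print(in_sync(1000.0, 1090.0))   # True
print(in_sync(1000.0, 1400.0))   # False
```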
Consider, for example, five eminent doctors across the continents holding a live medical conference to discuss a patient's condition. The doctors should not only see and talk to each other; all of them should be able to observe the patient at the same time and have access to the patient's medical history, live readings and graphs from the monitoring instruments, visual renderings of the data and so on. In a multimedia teleconferencing application of this nature, one must ensure that the end-to-end delays and the turnaround times are minimal. Moreover, the end-to-end delays between different nodes should not differ significantly from each other; that is, the delay jitter must be small.

Multimedia: an introduction

What is Multimedia?
A simple definition of multimedia is: 'multimedia can be any combination of text, graphics, sound, animation and video, used to effectively communicate ideas to users'.
Multimedia = Multi + media
Multi = many
Media = medium or means by which information is stored, transmitted, presented or perceived.

Other definitions of multimedia are:
Definition 1: "Multimedia is any combination of text, graphic art, sound, animation and video delivered to you by computer or other electronic means."
Definition 2: "Multimedia is the presentation of a (usually interactive) computer application, incorporating media elements such as text, graphics, video, animation and sound on a computer."

Types of Multimedia Presentation
Multimedia presentations can be categorized into two types: linear multimedia and interactive multimedia.
a. Linear Multimedia
o In linear multimedia the users have very little control over the presentation; they simply sit back and watch it. The presentation normally plays from start to end, or even loops continually to present the information.
o A movie is a common type of linear multimedia.
b. Interactive Multimedia
o In interactive multimedia, users dictate the flow of delivery. They control the delivery of elements and decide what is presented and when.
o Users have the ability to move around or follow different paths through the information presentation.
o The advantage of interactive multimedia is that a complex domain of information can easily be presented; the disadvantage is that users might get lost in the massive "information highway".
o Interactive multimedia is useful for information archives (encyclopedias), education, training and entertainment.

Multimedia System Characteristics
The four major characteristics of a multimedia system are:
a. Multimedia systems must be computer controlled.
b. All multimedia components are integrated.
c. The interface to the final user may permit interactivity.
d. The information must be represented digitally.

I) Computer controlled
The computer is used for:
o Producing the content of the information, e.g. by using authoring tools, image editors, and sound and video editors.
o Storing the information, providing large and shared capacity for multimedia information.
o Transmitting the information through the network.
o Presenting the information to the end user, making direct use of computer peripherals such as a display device (monitor) or a sound generator (speakers).

II) Integrated
All multimedia components (audio, video, text, graphics) used in the system must be somehow integrated. For example:
o Every device, such as a microphone or camera, is connected to and controlled by a single computer.
o A single type of digital storage is used for all media types.
o Video sequences are shown on the computer screen instead of a TV monitor.

III) Interactivity
Three levels of interactivity:
Level 1: Interactivity strictly on information delivery.
Users select the time at which the presentation starts, the order, the speed and the form of the presentation itself.
Level 2: Users can modify or enrich the content of the information, and this modification is recorded.
Level 3: Actual processing of the users' input, with the computer generating a genuine result based on that input.

IV) Digitally represented
Digitization: the process of transforming an analog signal into a digital signal.

LECTURE 2: Framework for multimedia systems

For multimedia communication, we have to make judicious use of all the media at our disposal. We have audio, video, graphics and text as the sources, but first and foremost we need a system that can acquire the separate media streams and process them together into an integrated multimedia stream. Figure 1.1 shows the elements involved in a multimedia transmitter. Devices like cameras, microphones, keyboards, mice, touch-screens, storage media etc. are required to feed inputs from the different sources. All further processing up to transmission is done by the computer. Data acquisition from the multiple media is followed by data compression, to eliminate the inherent redundancies present in the media streams. This is followed by inter-media synchronization through the insertion of time-stamps, integration of the individual media streams and, finally, transmission of the integrated multimedia stream through a communication channel, which can be a wired or a wireless medium. The destination end should have a corresponding interface to receive the integrated multimedia stream from the communication channel. At the receiver, a reversal of the processes involved during transmission is required. Figure 1.2 shows the elements involved in a multimedia receiver. The media extractor separates the integrated media stream into individual media streams, which undergo decompression and are then presented in a synchronized manner, according to their time-stamps, on different playback units such as monitors, loudspeakers, printers/plotters, recording devices etc.

The subject of multimedia is studied from different perspectives in different universities and institutes. In some multimedia courses the emphasis is on how to create multimedia productions, the authoring tools and the software associated with them. This course does not cover multimedia production at all. Rather, it focuses on multimedia systems and the technology associated with multimedia signal processing and communication. We have already posed the technical challenges. In the coming lessons we shall see in detail how these challenges, such as compression and synchronization, can be overcome, how the multimedia standards have been designed to ensure effective multimedia communication, how to integrate the associated media, and how to index and retrieve multimedia sequences. Other than this introduction, the course is divided into the following modules:
(i) Basics of Image Compression and Coding.
(ii) Orthogonal Transforms for Image Compression.
(iii) Temporal Redundancies in Video Sequences.
(iv) Real-time Video Coding.
(v) Multimedia Standards.
(vi) Continuity and Synchronization.
(vii) Audio Coding.
(viii) Indexing, Classification and Retrieval.
(ix) Multimedia Applications.
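The transmitter chain described above (acquire, compress, time-stamp, integrate, transmit) and its mirror image at the receiver can be summarized in a short sketch. This is only a schematic illustration of the ordering of the stages; the stream format, the choice of a generic compressor and the function names are invented for the example and do not correspond to any particular standard.

```python
import time
import zlib

def capture(source: str) -> bytes:
    """Stand-in for data acquisition from a camera, microphone, etc."""
    return f"payload from {source}".encode()

def transmitter(sources):
    """Acquire, compress, time-stamp and integrate media streams."""
    integrated = []
    for name in sources:
        raw = capture(name)
        compressed = zlib.compress(raw)        # remove redundancy
        timestamp = time.time()                # basis for inter-media synchronization
        integrated.append((name, timestamp, compressed))
    return integrated                          # sent over the channel

def receiver(integrated):
    """Separate, decompress and order the streams for synchronized playout."""
    for name, timestamp, compressed in sorted(integrated, key=lambda s: s[1]):
        raw = zlib.decompress(compressed)
        print(f"{timestamp:.3f}  play {name}: {raw.decode()}")

receiver(transmitter(["camera", "microphone", "keyboard"]))
```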
MULTIMEDIA DEVICES

1. CONNECTIONS:
Among the many devices (computer, monitor, disk drive, video disk drive and so on) there are so many wires and connections that the setup can resemble the intensive care ward of a hospital. The equipment required for a multimedia project depends on the content of the project as well as its design. If you can find content such as sound effects, music, graphic art, QuickTime or AVI movies to use in your project, you may not need extra tools for making your own. Multimedia developers have separate equipment for digitizing sound from microphones or tapes and for scanning photos and other printed matter.

Connections and their nominal transfer rates:
o Serial port: 115 Kbits/s
o Standard parallel port: 115 Kbits/s
o Original USB: 12 Mbits/s
o IDE: 3.3-16.7 Mbits/s
o SCSI-1: 5 Mbits/s
o SCSI-2: 10 Mbits/s
o Ultra SCSI: 20 Mbits/s
o Ultra 2 SCSI: 40 Mbits/s
o Wide Ultra 2 SCSI: 40 Mbits/s
o Ultra 3 SCSI: 80 Mbits/s
o Wide Ultra 3 SCSI: 160 Mbits/s

SCSI
Small Computer System Interface, or SCSI (pronounced ['skʌzi]), is a set of standards for physically connecting and transferring data between computers and peripheral devices. The SCSI standards define commands, protocols, and electrical and optical interfaces. SCSI is most commonly used for hard disks and tape drives, but it can connect a wide range of other devices, including scanners and CD drives. The SCSI standard defines command sets for specific peripheral device types; the presence of "unknown" as one of these types means that in theory it can be used as an interface to almost any device, but the standard is highly pragmatic and addressed toward commercial requirements.
o SCSI is an intelligent interface: it hides the complexity of the physical format. Every device attaches to the SCSI bus in a similar manner.
o SCSI is a peripheral interface: up to 8 or 16 devices can be attached to a single bus. There can be any number of hosts and peripheral devices, but there should be at least one host.
o SCSI is a buffered interface: it uses handshake signals between devices. SCSI-1 and SCSI-2 have the option of parity error checking. Starting with SCSI-U160 (part of SCSI-3), all commands and data are error-checked by a CRC32 checksum.
o SCSI is a peer-to-peer interface: the SCSI protocol defines communication from host to host, host to peripheral device, and peripheral device to peripheral device. However, most peripheral devices are exclusively SCSI targets, incapable of acting as SCSI initiators, that is, unable to initiate SCSI transactions themselves. Therefore peripheral-to-peripheral communications are uncommon, but possible in most SCSI applications. The Symbios Logic 53C810 chip is an example of a PCI host interface that can act as a SCSI target.
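To get a feel for the transfer rates listed above, the following sketch converts a few of them into the time needed to move a 650 MB CD image. The rates are taken from the table as given; treat the results as illustrative only, since sustained throughput on a real bus is lower than its nominal rate.

```python
# Rough transfer-time comparison for a 650 MB payload over the nominal
# rates listed in the connection table above (figures are illustrative).

PAYLOAD_MBITS = 650 * 8  # 650 megabytes expressed in megabits

rates_mbits_per_s = {
    "Serial port": 0.115,
    "Original USB": 12,
    "SCSI-2": 10,
    "Ultra 2 SCSI": 40,
    "Ultra 3 SCSI": 80,
}

for name, rate in rates_mbits_per_s.items():
    seconds = PAYLOAD_MBITS / rate
    print(f"{name:15s} {seconds/60:8.1f} minutes")
```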
LECTURE 3: IDE, ATA, EIDE, Ultra ATA, Ultra IDE

The ATA (Advanced Technology Attachment) standard is a standard interface that allows you to connect storage peripherals to PC computers. The ATA standard was approved by ANSI on May 12, 1994 (document X3.221-1994). Despite the official name "ATA", this standard is better known by the commercial terms IDE (Integrated Drive Electronics) or Enhanced IDE (EIDE or E-IDE). The ATA standard was originally intended for connecting hard drives; however, an extension called ATAPI (ATA Packet Interface) was developed in order to be able to interface other storage peripherals (CD-ROM drives, DVD-ROM drives, etc.) on an ATA interface. Since the Serial ATA standard (written S-ATA or SATA) emerged, which transfers data over a serial link, the term "Parallel ATA" (written PATA or P-ATA) is sometimes used instead of "ATA" in order to differentiate between the two standards.

The Principle
The ATA standard allows you to connect storage peripherals directly to the motherboard by means of a ribbon cable, which is generally made up of 40 parallel wires and three connectors (usually a blue connector for the motherboard, and a black connector and a grey connector for the two storage peripherals). On the cable, one of the peripherals must be declared the master and the other the slave. By convention, the far connector (black) is reserved for the master peripheral and the middle connector (grey) for the slave peripheral. A mode called cable select (abbreviated CS or C/S) allows the master and slave peripherals to be defined automatically, as long as the computer's BIOS supports this functionality.

USB:
Universal Serial Bus (USB) connects more than computers and peripherals. It has the power to connect you with a whole new world of PC experiences. USB is your instant connection to the fun of digital photography or the limitless creative possibilities of digital imaging. You can use USB to connect with other people through the power of PC telephony and video conferencing. Once you've tried USB, we think you'll grow quite attached to it!

In information technology, Universal Serial Bus (USB) is a serial bus standard for interfacing devices to a host computer. USB was designed to allow many peripherals to be connected using a single standardized interface socket and to improve plug-and-play capabilities by allowing hot swapping, that is, by allowing devices to be connected and disconnected without rebooting the computer or turning off the device. Other convenient features include providing power to low-consumption devices without the need for an external power supply, and allowing many devices to be used without requiring manufacturer-specific, individual device drivers to be installed. USB is intended to replace many legacy varieties of serial and parallel ports. USB can connect computer peripherals such as mice, keyboards, PDAs, gamepads and joysticks, scanners, digital cameras, printers, personal media players, and flash drives. For many of those devices USB has become the standard connection method. USB was originally designed for personal computers, but it has become commonplace on other devices such as PDAs and video game consoles, and as a bridging power cord between a device and an AC adapter plugged into a wall socket for charging purposes. As of 2008 there were about 2 billion USB devices in the world. The design of USB is standardized by the USB Implementers Forum (USB-IF), an industry standards body incorporating leading companies from the computer and electronics industries. Notable members have included Agere (now merged with LSI Corporation), Apple Inc., Hewlett-Packard, Intel, NEC, and Microsoft.

FireWire
Presentation of the FireWire Bus (IEEE 1394)
The IEEE 1394 bus (named after the standard to which it refers) was developed at the end of 1995 in order to provide an interconnection system that allows data to circulate at high speed and in real time. The company Apple gave it the commercial name "FireWire", which is how it is most commonly known. Sony also gave it a commercial name, i.Link, and Texas Instruments preferred to call it Lynx.
FireWire is a port that exists on some computers and allows you to connect peripherals (particularly digital cameras) at very high bandwidth. There are expansion boards (generally in PCI or PC Card / PCMCIA format) that allow you to equip a computer with FireWire connectors. FireWire connectors and cables can be easily spotted thanks to their shape as well as the FireWire logo.

FireWire Connectors
There are different FireWire connectors for each of the IEEE 1394 standards. The IEEE 1394a standard specifies two connectors:
o 1394a-1995 connectors.
o 1394a-2000 connectors, called mini-DV because they are used on Digital Video (DV) cameras.
The IEEE 1394b standard specifies two types of connectors, designed so that 1394b Beta cables can be plugged into Beta and Bilingual connectors, but 1394b Bilingual cables can only be plugged into Bilingual connectors:
o 1394b Beta connectors.
o 1394b Bilingual connectors.

How the FireWire Bus Works
The IEEE 1394 bus has roughly the same structure as the USB bus, except that it uses a cable made up of six wires (two pairs for the data and the clock, and two wires for the power supply) that allow it to reach a bandwidth of 800 Mb/s (soon it should be able to reach 1.6 Gb/s, or even 3.2 Gb/s down the road). The two wires for the clock are the major difference between the USB bus and the IEEE 1394 bus, namely the possibility of operating in two transfer modes:
o Asynchronous transfer mode: this mode is based on the transmission of packets at variable time intervals. The host sends a data packet and waits to receive a receipt notification from the peripheral. If the host receives a receipt notification, it sends the next data packet; otherwise, the first packet is resent after a certain period of time.
o Synchronous (isochronous) mode: this mode allows data packets of specific sizes to be sent at regular intervals. A node called the Cycle Master is in charge of sending a synchronisation packet (called a Cycle Start packet) every 125 microseconds. No receipt notification is necessary, which guarantees a fixed bandwidth. Moreover, because no receipt notification is necessary, the method of addressing a peripheral is simplified and the saved bandwidth allows you to gain throughput.

Another innovation of the IEEE 1394 standard is that bridges (systems that allow buses to be linked to other buses) can be used. Peripheral addresses are set with a node (i.e. peripheral) identifier encoded on 16 bits. This identifier is divided into two fields: a 10-bit field that identifies the bridge and a 6-bit field that specifies the node. It is therefore possible to connect 1,023 bridges (2^10 - 1), each carrying up to 63 nodes (2^6 - 1), which makes it possible to address over 64,000 peripherals. The IEEE 1394 standard allows hot swapping. While the USB bus is intended for peripherals that do not require much bandwidth (e.g. a mouse or a keyboard), the IEEE 1394 bandwidth is larger and is intended for new multimedia uses (video acquisition, etc.).
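The 16-bit split described above (a 10-bit bus/bridge field and a 6-bit node field) is easy to illustrate with a little bit manipulation. The helper names below are made up for the example.

```python
# Packing and unpacking the 16-bit IEEE 1394 node identifier described above:
# upper 10 bits identify the bus/bridge, lower 6 bits identify the node.

def pack_node_id(bus_id, node):
    assert 0 <= bus_id < 1024 and 0 <= node < 64
    return (bus_id << 6) | node

def unpack_node_id(node_id):
    return node_id >> 6, node_id & 0x3F

nid = pack_node_id(bus_id=5, node=12)
print(hex(nid), unpack_node_id(nid))   # 0x14c (5, 12)

# Maximum usable addresses: 1023 bridges x 63 nodes each.
print((2**10 - 1) * (2**6 - 1))        # 64449
```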
LECTURE 4: MEMORY AND STORAGE DEVICES:
Estimating the memory requirement of a multimedia project means estimating the space required on a floppy disk, hard disk or CD-ROM, not the random access memory used while your computer is running. To do this you must have a sense of the project's content: the colour images, the text and the programming code that glues it all together all take up storage. If you are making multimedia, you will also need memory for storing and archiving the working files used during production, as well as the audio and video files.

<1> RAM
Random-access memory (usually known by its acronym, RAM) is a form of computer data storage. Today it takes the form of integrated circuits that allow the stored data to be accessed in any order (i.e., at random). The word random refers to the fact that any piece of data can be returned in a constant time, regardless of its physical location and whether or not it is related to the previous piece of data.

ROM:
Read-only memory (usually known by its acronym, ROM) is a class of storage media used in computers and other electronic devices. Because data stored in ROM cannot be modified (at least not very quickly or easily), it is mainly used to distribute firmware. In its strictest sense, ROM refers only to mask ROM (the oldest type of solid-state ROM), which is fabricated with the desired data permanently stored in it and thus can never be modified. However, more modern types such as EPROM and flash EEPROM can be erased and re-programmed multiple times; they are still described as "read-only memory" (ROM) because the reprogramming process is generally infrequent, comparatively slow, and often does not permit random-access writes to individual memory locations. Despite the simplicity of mask ROM, economies of scale and field-programmability often make reprogrammable technologies more flexible and inexpensive, so that mask ROM was rarely used in new products as of 2007.

Zip, Jaz, SyQuest and optical storage devices:
For years the SyQuest 44 MB removable cartridge was the most widely used portable medium among multimedia developers. Zip drives, with their likewise inexpensive 100 MB, 250 MB and 750 MB cartridges built on floppy disk technology, significantly eroded SyQuest's market share for removable media. Iomega's Jaz cartridges, built on hard drive technology, provide one or two gigabytes of removable storage and have transfer rates fast enough for multimedia developers.
Other storage devices are:
o Digital versatile disc (DVD)
o Flash or thumb drive
o CD-ROM players
o CD recorders
o CD-RW drives

INPUT DEVICES:
o Keyboard
o Mice
o Trackball
o Touchscreen
o Magnetic card encoders and readers
o Graphics tablets
o Scanners
o Optical character recognition devices
o Infrared remotes
o Voice recognition systems
o Digital cameras

OUTPUT HARDWARE:
Presentation of the audio and visual components of your multimedia project requires hardware that may not be attached to the computer itself, such as speakers, amplifiers and monitors. There is no greater test of the benefit of good output hardware than to feed the audio output of your computer into an external amplifier system. Some output devices are:
o Audio devices
o Amplifiers and speakers
o Portable media players
o Monitors
o Video devices
o Projectors
o Printers

COMMUNICATION DEVICES:
Communication among workgroup members and with the client is essential for the efficient and assured completion of a project. When you need it immediately, an Internet connection is required. If you and your client are both connected to the Internet, a combination of communication by e-mail and FTP (File Transfer Protocol) may be the most cost-effective and efficient solution for creative developers and managers. Various communication devices are listed below:
o Modems
o ISDN and DSL
o Cable modems
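Returning to the point made at the start of this lecture about estimating the storage a project will need, a short back-of-the-envelope sketch follows. The resolutions, durations and the uncompressed assumption are chosen purely for illustration.

```python
# Back-of-the-envelope storage estimates for typical uncompressed media
# (illustrative figures: real projects vary and usually use compression).

MB = 1024 * 1024

def image_bytes(width, height, bytes_per_pixel=3):
    """Uncompressed bitmap: width x height x bytes per pixel."""
    return width * height * bytes_per_pixel

def audio_bytes(seconds, sample_rate=44100, bits=16, channels=2):
    """Uncompressed PCM audio, CD quality by default."""
    return seconds * sample_rate * (bits // 8) * channels

def video_bytes(seconds, width=640, height=480, fps=25):
    """Uncompressed video: one full image per frame."""
    return seconds * fps * image_bytes(width, height)

print(f"640x480 image : {image_bytes(640, 480) / MB:7.1f} MB")
print(f"1 min CD audio: {audio_bytes(60) / MB:7.1f} MB")
print(f"1 min video   : {video_bytes(60) / MB:7.1f} MB")
```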
LECTURE 5: CD-AUDIO, CD-ROM, CD-I:

CD Audio (or CD-DA, Compact Disc Digital Audio):
The first widely used CDs were music CDs, which appeared in the early 1980s. CD-DA is the format for storing recorded music in digital form, as on the CDs commonly found in music stores. The Red Book standards, created by Sony and Philips, give specifications such as the size of the pits and lands, how the audio is organized and where it is located on the CD, and how errors are corrected. CD Audio discs can hold up to 75 minutes of sound. To provide the highest quality, the music is sampled at 44.1 kHz, 16-bit stereo. Because of the high-quality sound of audio CDs, they quickly became very popular. Other CD formats evolved from the Red Book standards.

CD-ROM:
Although the Red Book standards were excellent for audio, they were not suitable for data, text, graphics and video. The Yellow Book standards built upon the Red Book, adding specifications for a track to accommodate data, thus establishing a format for storing data, including video and audio, in digital form on a compact disc. CD-ROM also provided a better error-checking scheme, which is important for data. One drawback of the Yellow Book standards is that they allowed individual manufacturers to determine their own methods of organizing and accessing data, which led to incompatibilities across computer platforms. For the first few years of its existence, the Compact Disc was a medium used purely for audio. However, in 1985 the Yellow Book CD-ROM standard was established by Sony and Philips, which defined a non-volatile optical computer data storage medium using the same physical format as audio compact discs, readable by a computer with a CD-ROM drive.

CD-I (Compact Disc Interactive):
CD-i, or Compact Disc Interactive, is the name of an interactive multimedia CD player developed and marketed by Royal Philips Electronics N.V. CD-i also refers to the multimedia Compact Disc standard used by the CD-i console, also known as the Green Book, which was co-developed by Philips and Sony in 1986 (not to be confused with MMCD, the pre-DVD format also co-developed by Philips and Sony). The first Philips CD-i player, released in 1991 and initially priced around USD 700, is capable of playing interactive CD-i discs, Audio CDs, CD+G (CD+Graphics), Karaoke CDs, and Video CDs (VCDs), though the latter requires an optional "Digital Video Card" to provide MPEG-1 decoding. Developed by Philips in 1986, the specifications for CD-I were published in the Green Book. CD-I is a platform-specific format; it requires a CD-I player, with a proprietary operating system, attached to a television set. Because of the need for specific CD-I hardware, this format has had only marginal success in the consumer market. One of the benefits of CD-I is its ability to synchronize sound and pictures on a single track of the disc.

PRESENTATION DEVICES AND USER INTERFACES IN MULTIMEDIA

LECTURE 6: 2. LANs and multimedia: Internet, World Wide Web and multimedia distribution networks (ATM and ADSL)

• Networks
Telephone networks dedicate a set of resources that forms a complete path from end to end for the duration of the telephone connection. The dedicated path guarantees that the voice data can be delivered from one end to the other in a smooth and timely way, but the resources remain dedicated even when there is no talking. In contrast, digital packet networks, used for communication between computers, use time-shared resources (links, switches, and routers) to send packets through the network. The use of shared resources allows computer networks to operate at high utilization, because even small periods of inactivity can be filled with data from a different user.
The high utilization and shared resources create a problem with respect to the timely delivery of video and audio over data networks. Current research centres on reserving resources for time-sensitive data, which will make digital data networks more like telephone voice networks.

Internet
The Internet and intranets, which use the TCP protocol suite, are the most important delivery vehicles for multimedia objects. TCP provides communication sessions between applications on hosts, sending streams of bytes for which delivery is always guaranteed by means of acknowledgments and retransmission. User Datagram Protocol (UDP) is a "best-effort" delivery protocol (some messages may be lost) that sends individual messages between hosts. Internet technology is used on single LANs, on connected LANs within an organization (which are sometimes called intranets), and on "backbones" that link different organizations into one single global network. Internet technology allows LANs and backbones of totally different technologies to be joined together into a single, seamless network. Part of this is achieved through communications processors called routers. Routers can be accessed from two or more networks, passing data back and forth as needed. The routers exchange information on the current network topology among themselves in order to build routing tables within each router. These tables are consulted each time a message arrives, in order to send it to the next appropriate router, eventually resulting in delivery.

Token ring
Token ring [31] is a hardware architecture for passing packets between stations on a LAN. Since a single circular communication path is used for all messages, there must be a way to decide which station is allowed to send at any time. In token ring, a "token", which gives a station the right to transmit data, is passed from station to station. The data rate of a token ring network is 16 Mb/s.

Ethernet
Ethernet [31] LANs use a common wire to transmit data from station to station. Mediation between transmitting stations is done by having stations listen before sending, so that they will not interfere with each other. However, two stations could begin to send at the same time and collide, or one station could start to send significantly later than another but not know it because of propagation delay. In order to detect these situations, stations continue to listen while they transmit and determine whether their message was possibly garbled by a collision. If there is a collision, a retransmission takes place (by both stations) a short but random time later. Ethernet LANs can transmit data at 10 Mb/s. However, when multiple stations are competing for the LAN, the throughput may be much lower because of collisions and retransmissions.

Switched Ethernet
Switches may be used at a hub to create many small LANs where one large one existed before. This reduces contention and permits higher throughput. In addition, Ethernet is being extended to 100 Mb/s throughput. The combination, switched Ethernet, is much more appropriate for multimedia than regular Ethernet, because existing Ethernet LANs can support only about six MPEG video streams, even when nothing else is being sent over the LAN.
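The figure of about six MPEG streams on a classic Ethernet LAN can be sanity-checked with a one-line calculation using nominal rates only; real shared Ethernet delivers less because of collisions and other traffic. The 1.5 Mb/s MPEG-1 stream rate is the one used elsewhere in these notes.

```python
# Rough capacity check: how many 1.5 Mb/s MPEG-1 streams fit on an Ethernet LAN?
# Nominal rates only; collisions and competing traffic reduce the real number.

ETHERNET_MBPS = 10.0
FAST_ETHERNET_MBPS = 100.0
MPEG1_STREAM_MBPS = 1.5

print(int(ETHERNET_MBPS // MPEG1_STREAM_MBPS), "streams on 10 Mb/s Ethernet")        # 6
print(int(FAST_ETHERNET_MBPS // MPEG1_STREAM_MBPS), "streams on 100 Mb/s Ethernet")  # 66
```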
ATM
Asynchronous Transfer Mode (ATM) [29, 32] is a packet-network protocol designed for mixing voice, video, and data within the same network. Voice is digitized in telephone networks at 64 Kb/s (kilobits per second) and must be delivered with minimal delay, so very small packet sizes are used. On the other hand, video data and other business data usually benefit from quite large block sizes. An ATM packet consists of 48 octets (the term used in communications for eight bits, called a byte in data processing) of data preceded by five octets of control information. An ATM network consists of a set of communication links interconnected by switches. Communication is preceded by a setup stage in which a path through the network is determined in order to establish a circuit. Once a circuit is established, 53-octet packets may be streamed from point to point. ATM networks can be used to implement parts of the Internet by simulating links between routers in separate intranets. This means that the "direct" intranet connections are actually implemented by means of shared ATM links and switches. ATM, both between LANs and between servers and workstations on a LAN, will support data rates that allow many users to make use of motion video on a LAN.

• Data-transmission techniques

Modems
Modulator/demodulators, or modems, are used to send digital data over analog channels by means of a carrier signal (a sine wave) modulated by changing its frequency, phase, amplitude, or some combination of them in order to represent digital data. (The result is still an analog signal.) Modulation is performed at the transmitting end and demodulation at the receiving end. The most common use for modems in a computer environment is to connect two computers over an analog telephone line. Because of the quality of telephone lines, the data rate is commonly limited to 28.8 Kb/s. For transmission of customer analog signals between telephone company central offices, the signals are sampled and converted to digital form for transmission between offices. Since the customer voice signal is represented by a stream of digital samples at a fixed rate (64 Kb/s), the data rate that can be achieved over analog telephone lines is limited.

ISDN
Integrated Services Digital Network (ISDN) extends the telephone company digital network by sending the digital form of the signal all the way to the customer. ISDN is organized around 64 Kb/s transmission speeds, the speed used for digitized voice. An ISDN line was originally intended to simultaneously transmit a digitized voice signal and a 64 Kb/s data stream on a single wire. In practice, two channels are used to produce a 128 Kb/s line, which is faster than the 28.8 Kb/s speed of typical computer modems but not adequate to handle MPEG video.

ADSL
Asymmetric Digital Subscriber Lines (ADSL) [33-35] extend telephone company twisted-pair wiring to yet greater speeds. The lines are asymmetric, with an outbound data rate of 1.5 Mb/s and an inbound rate of 64 Kb/s. This is suitable for video on demand, home shopping, games, and interactive information systems (collectively known as interactive television), because 1.5 Mb/s is fast enough for compressed digital video, while a much slower "back channel" is sufficient for control. ADSL uses very high-speed modems at each end to achieve these speeds over twisted-pair wire. ADSL is a critical technology for the Regional Bell Operating Companies (RBOCs), because it allows them to use the existing twisted-pair infrastructure to deliver high data rates to the home.
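A few of the numbers quoted in this lecture fit together in a short worked example: the per-cell overhead of ATM, and how many cells per second a 1.5 Mb/s compressed video stream needs. Only figures already given above are used.

```python
# ATM cell arithmetic using the figures quoted above:
# 48 payload octets + 5 header octets = 53-octet cells.

PAYLOAD_OCTETS = 48
HEADER_OCTETS = 5
CELL_OCTETS = PAYLOAD_OCTETS + HEADER_OCTETS

overhead = HEADER_OCTETS / CELL_OCTETS
print(f"Header overhead per cell: {overhead:.1%}")           # ~9.4%

# Cells per second needed to carry a 1.5 Mb/s compressed video stream.
stream_bits_per_s = 1.5e6
cells_per_s = stream_bits_per_s / (PAYLOAD_OCTETS * 8)
print(f"Cells per second for 1.5 Mb/s: {cells_per_s:.0f}")    # ~3906
```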
• Cable systems
Cable television systems provide analog broadcast signals on a coaxial cable, instead of through the air, with the attendant freedom to use additional frequencies and thus provide a greater number of channels than over-the-air broadcast. The systems are arranged like a branching tree, with "splitters" at the branch points. They also require amplifiers for the outbound signals, to make up for signal loss in the cable. Most modern cable systems use fiber optic cables for the trunk and major branches and use coaxial cable only for the final loop, which serves one or two thousand homes. The root of the tree, where the signals originate, is called the head end.

Cable modems
Cable modems are used to modulate digital data, at high data rates, into an analog 6-MHz-bandwidth TV-like signal. These modems can transfer 20 to 40 Mb/s in a frequency band that would otherwise have been occupied by a single analog TV signal, allowing multiple compressed digital TV channels to be multiplexed over a single analog channel. The high data rate may also be used to download programs or World Wide Web content, or to play compressed video. Cable modems are critical to cable operators, because they enable them to compete with the RBOCs using ADSL.

Set-top box
The set-top box (STB) is an appliance that connects a TV set to a cable system, terrestrial broadcast antenna, or satellite broadcast antenna. The STB in most homes has two functions. First, in response to a viewer's request with the remote-control unit, it shifts the frequency of the selected channel to either channel 3 or 4, for input to the TV set. Second, it is used to restrict access and block channels that are not paid for. Addressable STBs respond to orders that come from the head end to block and unblock channels.

• Admission control
Digital multimedia systems that are shared by multiple clients can deliver multimedia data to only a limited number of clients. Admission control is the function which ensures that once delivery starts, it will be able to continue with the required quality of service (the ability to transfer isochronous data on time) until completion. The maximum number of clients depends upon the particular content being used and other characteristics of the system.

• Digital watermarks
Because it is so easy to transmit perfect copies of digital objects, many owners of digital content wish to control unauthorized copying, often to ensure that proper royalties have been paid. Digital watermarking [38, 39] consists of making small changes in the digital data that can later be used to determine the origin of an unauthorized copy. Such small changes are intended to be invisible when the content is viewed. This is very similar to the "errors" that mapmakers introduce in order to prove that suspect maps are copies of their maps. In other circumstances, a visible watermark is applied in order to make commercial use of the image impractical.
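As a toy illustration of the idea of hiding small, invisible changes in the data, the sketch below embeds a bit pattern in the least significant bits of a list of 8-bit pixel values. Real watermarking schemes are far more sophisticated and robust to compression and editing; this is only meant to show the principle of imperceptible modification, and the function names are invented.

```python
# Toy least-significant-bit watermark: changing a pixel value by at most 1
# out of 255 is visually imperceptible, yet the pattern can be read back.

def embed(pixels, bits):
    """Overwrite the LSB of each pixel with one watermark bit."""
    return [(p & ~1) | b for p, b in zip(pixels, bits)]

def extract(pixels, n):
    """Read the first n watermark bits back out of the pixels."""
    return [p & 1 for p in pixels[:n]]

image = [120, 37, 240, 85, 199, 14, 63, 172]   # 8-bit grey levels
mark = [1, 0, 1, 1, 0, 0, 1, 0]                # owner's bit pattern

stamped = embed(image, mark)
print(stamped)              # each value differs from the original by at most 1
print(extract(stamped, 8))  # -> [1, 0, 1, 1, 0, 0, 1, 0]
```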
LECTURE 7: Multimedia architecture
In this section we show how the multimedia technologies are organized in order to create multimedia systems, which in general consist of suitable organizations of clients, application servers, and storage servers that communicate through a network. Some multimedia systems are confined to a stand-alone computer system with content stored on hard disks or CD-ROMs. Distributed multimedia systems communicate through a network and use many shared resources, making quality of service very difficult to achieve and resource management very complex.

• Single-user stand-alone systems
Stand-alone multimedia systems use CD-ROM disks and/or hard disks to hold multimedia objects and the scripting metadata to orchestrate the playout. CD-ROM disks are inexpensive to produce and hold a large amount of digital data; however, the content is static: new content requires creation and physical distribution of new disks for all systems. Decompression is now done either by a special decompression card or by a software application that runs on the processor. The technology trend is toward software decompression.

• Multi-user systems

Video over LANs
Stand-alone multimedia systems can be converted to networked multimedia systems by using client-server remote-file-system technology to enable the multimedia application to access data stored on a server as if the data were on a local storage medium. This is very convenient, because the stand-alone multimedia application does not have to be changed. LAN throughput is the major challenge in these systems. Ethernet LANs can support less than 10 Mb/s, and token rings 16 Mb/s. This translates into six to ten 1.5 Mb/s MPEG video streams. Admission control is a critical problem. The OS/2 LAN server is one of the few products that support admission control [40]. It uses priorities with token-ring messaging to differentiate between multimedia traffic and lower-priority data traffic. It also limits the multimedia streams to make sure that they do not sum to more than the capacity of the LAN. Without some type of resource reservation and admission control, the only way to give some assurance of continuous video is to operate with small LANs and make sure that the server is on the same LAN as the client. In the future, ATM and fast Ethernet will provide capacity more appropriate to multimedia.

Direct Broadcast Satellite
Direct Broadcast Satellite (DBS), which broadcasts up to 80 channels from a satellite at high power, arrived in 1995 as a major force in the delivery of broadcast video. The high power allows small (18-inch) dishes with line-of-sight to the satellite to capture the signal. MPEG compression is used to get the maximum number of channels out of the bandwidth. The RCA/Hughes service employs two satellites and a backup to provide 160 channels. This large number of channels allows many premium and special-purpose channels as well as the usual free channels. Many more pay-per-view channels can be provided than in conventional cable systems. This allows enhanced pay-per-view, in which the same movie is shown with staggered starting times of half an hour or an hour. DBS requires a set-top box with much more function than a normal cable STB. The STB contains a demodulator to reconstruct the digital data from the analog satellite broadcast. The MPEG compressed form is decompressed, and a standard TV signal is produced for input to the TV set. The STB uses a telephone modem to periodically verify that the premium channels are still authorized and to report on use of the pay-per-view channels so that billing can be done.

Interactive TV and video to the home
Interactive TV and video to the home [2-5] allow viewers to select, interact with, and control video play on a TV set in real time. The user might be viewing a conventional movie, doing home shopping, or engaging in a network game. The compressed video flowing to the home requires high bandwidth, from 1.5 to 6 Mb/s, while the return path, used for selection and control, requires far lower bandwidth. The STB used for interactive TV is similar to that used for DBS.
The demodulation function depends upon the network used to deliver the digital data. A microprocessor with memory for limited buffering, as well as an MPEG decompression chip, is needed. The video is converted to a standard TV signal for input to the TV set. The STB has a remote-control unit, which allows the viewer to make choices from a distance. Some means are needed to allow the STB to relay viewer commands back to the server, depending upon the network being used. Cable systems appear to be broadcast systems, but they can actually be used to deliver different content to each home. Cable systems often use fiber optic cables to send the video to converters that place it on local loops of coaxial cable. If a fiber cable is dedicated to each final loop, which serves 500 to 1500 homes, there will be enough bandwidth to deliver an individual signal to many of those houses. The cable can also provide the reverse path to the cable head end. Ethernet-like protocols can be used to share the same channel with the other STBs in the local loop. This topology is attractive to cable companies because it uses the existing cable plant. If the appropriate amplifiers are not present in the cable system for the back channel, a telephone modem can be used to provide it. As mentioned above, the asymmetric data rates of ADSL are tailored for interactive TV. The use of standard twisted-pair wire, which has been brought to virtually every house, is attractive to the telephone industry. However, twisted pair is a noisier medium than coaxial cable, so more expensive modems are needed, and distances are limited. ADSL can be used at higher data rates if the distance is further reduced.

Interactive TV architectures are typically three-tier, in which the client and server tiers interact through an application server. (In three-tier systems, the tier-1 systems are clients, the tier-2 systems are used for application programs, and the tier-3 systems are data servers.) The application tier is used to separate the logic of looking up material in indexes, maintaining the shopping state of a viewer, interacting with credit card servers, and other similar functions from the simple function of delivering multimedia objects. The key research questions about interactive TV and video-on-demand are not computer science questions at all. Rather, they are the human-factors issues concerning ease of use of the on-screen interface and, more significantly, the marketing questions regarding what home viewers will find valuable and compelling.

Internet over cable systems
World Wide Web browsing allows users to see a rich text, video, sound, and graphics interface and allows them to access other information by clicking on text or graphics. Web pages are written in HyperText Markup Language (HTML) and use an application communications protocol called HTTP. The user responses, which select the next page or provide a small amount of text information, are normally quite short. On the other hand, the graphics and pictures require many times that number of bytes to be transmitted to the client. This means that distribution systems offering asymmetric data rates are appropriate. Cable TV systems can be used to provide asymmetric Internet access for home computers in ways that are very similar to interactive TV over cable. The data being sent to the client is digitized and broadcast over a prearranged channel over all or part of the cable system.
A cable modem at the client end tunes to the right channel and demodulates the information being broadcast. It must also filter the information destined for the particular station from the information being sent to other clients. The low-bandwidth reverse channel is the same low-frequency band that is used in interactive TV. As with interactive TV, a telephone modem might be used for the reverse channel. The cable head end is then attached to the Internet using a router. The head end is also likely to offer other services that Internet Service Providers sell, such as permanent mailboxes. This asymmetric connection would not be appropriate for a Web server or some other type of commerce server on the Internet, because servers transmit too much data for the low-speed return path. The cable modem provides the physical link for the TCP/IP stack in the client computer. The client software treats this environment just like a LAN connected to the Internet.

Video servers on a LAN
LAN-based multimedia systems [4, 6, 15] go beyond the simple, client-server, remote-file-system type of video server to advanced systems that offer a three-tier architecture with clients, application servers, and multimedia servers. The application servers provide applications that interact with the client and select the video to be shown. On a company intranet, LAN-based multimedia could be used for just-in-time education, on-line documentation of procedures, or video messaging. On the Internet, it could be used for a video product manual, interactive video product support, or Internet commerce. The application server chooses the video to be shown and causes it to be sent to the client. There are three different ways in which the application server can cause playout of the video: by giving the address of the video server and the name of the content to the client, which then fetches it from the video server; by communicating with the video server and having it send the data to the client; or by communicating with both to set up the relationship.

The transmission of data to the client may be in push mode or pull mode. In push mode, the server sends data to the client at the appropriate rate; the network must have quality-of-service guarantees to ensure that the data gets to the client on time. In pull mode, the client requests data from the server, and thus paces the transmission. The current protocols for Internet use are TCP and UDP. TCP sets up sessions, and the server can push the data to the client. However, the "moving-window" algorithm of TCP, which prevents client buffer overrun, creates acknowledgments that pace the sending of data, thus making it in effect a pull protocol. Another issue in Internet architecture is the role of firewalls, which are used at the gateway between an intranet and the Internet to keep potentially dangerous or malicious Internet traffic from getting onto the intranet. UDP packets are normally never allowed in. TCP sessions are allowed if they are created from the inside to the outside. A disadvantage of TCP for isochronous data is that error detection and retransmission are automatic and required, whereas for video it is preferable to discard garbled data and just continue. Resource reservation is just beginning to be incorporated on the Internet and intranets. Video will be considered to have higher priority, and the network will have to ensure that there is a limit to the amount of high-priority traffic that can be admitted. All of the routers on the path from the server to the client will have to cooperate in the reservation and the use of priorities.
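The admission-control idea that keeps recurring in these lectures (accept a new stream only if the already-admitted streams plus the new one still fit within the reserved capacity) can be sketched in a few lines. The class name and the numbers below are invented for illustration.

```python
# Minimal sketch of bandwidth-based admission control: a new stream is
# admitted only if the total admitted rate stays within the link capacity
# reserved for multimedia traffic. Names and numbers are illustrative.

class AdmissionController:
    def __init__(self, capacity_mbps: float):
        self.capacity = capacity_mbps
        self.admitted = []                      # rates of active streams

    def request(self, rate_mbps: float) -> bool:
        if sum(self.admitted) + rate_mbps <= self.capacity:
            self.admitted.append(rate_mbps)
            return True                         # stream admitted
        return False                            # would violate QoS, reject

ac = AdmissionController(capacity_mbps=10.0)
print([ac.request(1.5) for _ in range(8)])
# -> the first six 1.5 Mb/s streams are admitted, the last two are rejected
```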
Video conferencing
Video conferencing, which will be used on both intranets and the Internet, uses multiple data types and serves multiple clients in the same conference. Video cameras can be mounted near a PC display to capture the user's picture. In addition to the live video, these systems include shared whiteboards and show previously prepared visuals. Some form of mediation is needed to determine which participant is in control. Since the type of multimedia data needed for conferencing requires much lower data rates than most other types of video, low-bit-rate video, using approximately eight frames per second and requiring tens of kilobits per second, will be used with small window sizes for the "talking heads" and most of the other visuals. Scalability of a video conferencing system is important, because if all participants send to all other participants, the traffic goes up as the square of the number of participants. This can be made linear by having all transmissions go through a common server. If the network has a multicast facility, the server can use that to distribute the data to the participants.
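The scaling argument above is easy to quantify: with full-mesh distribution every participant sends a copy of its stream to every other participant, while with a central server each participant sends one stream up and receives one combined stream back. The comparison below assumes one stream per sender, purely to show the shape of the growth.

```python
# Stream count in a conference: full mesh versus a central server.
# Assumes each participant sends a single stream (illustrative model).

def full_mesh_streams(n: int) -> int:
    """Every participant sends to every other participant."""
    return n * (n - 1)

def via_server_streams(n: int) -> int:
    """Each participant sends one stream up and receives one stream down."""
    return 2 * n

for n in (4, 8, 16, 32):
    print(n, full_mesh_streams(n), via_server_streams(n))
# the second column grows quadratically, the third only linearly
```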
LECTURE 8: Multimedia Software
o Familiar Tools
o Multimedia Authoring Tools
o Elemental Tools

Familiar Tools
Word processors: Microsoft Word, WordPerfect
Spreadsheets: Excel
Databases: Q+E Database/VB
Presentation tools: PowerPoint

MULTIMEDIA AUTHORING TOOLS:
A multimedia authoring tool is a program that helps you write multimedia applications. A multimedia authoring tool enables you to create a final application merely by linking together objects, such as a paragraph of text, an illustration, or a song. They are used primarily for applications that present a mixture of textual, graphical, and audio data. With multimedia authoring software you can make video productions (including CDs and DVDs), design interactivity and user interfaces, and create animations, screen savers, games, presentations, interactive training and simulations.

Types of authoring tools:
There are basically three types of authoring tools:
o Card- or page-based tools
o Icon-based tools
o Time-based tools

Card- or Page-based Tools
In these authoring systems, elements are organized as the pages of a book or a stack of cards. The authoring system lets you link these pages or cards into an organized sequence, and it also allows you to play sound elements and launch animations and digital videos. Page-based authoring systems are object-oriented: the objects are buttons, graphics and so on. Each object may contain a programming script that is activated when an event related to that object occurs. Example: Visual Basic.

Icon-based Tools
Icon-based, event-driven tools provide a visual programming approach to organizing and presenting multimedia. First you build a flowchart of events, tasks and decisions by using appropriate icons from a library. These icons can include menu choices, graphic images and sounds. When the flowchart is built, you can add your content: text, graphics, animations, sounds and video movies. Example: Authorware Professional.

Time-based Tools
Time-based authoring tools are the most common multimedia authoring tools. In these authoring systems, elements are organized along a timeline. They are best used when you have a message with a beginning and an end. Sequentially organized graphic frames are played back at a speed that you can set. Other elements (such as audio events) are triggered at a given time or location in the sequence of events. Example: Animation Works Interactive.

LECTURE 9: Elemental Tools
Elemental tools help us work with the important basic elements of a project: its graphics, images, sound, text and moving pictures. Elemental tools include:
o Painting and drawing tools
o CAD and 3-D drawing tools
o Image editing tools
o OCR software
o Sound editing programs
o Tools for creating animations and digital movies
o Helpful accessories

Painting and Drawing Tools:
Painting and drawing tools are the most important items in your toolkit, because the impact of the graphics in your project will likely have the greatest influence on the end user. Painting software is dedicated to producing excellent bitmapped images. Drawing software is dedicated to producing line art that is easily printed to paper. Drawing packages include powerful and expensive computer-aided design (CAD) software. Examples: DeskDraw, DeskPaint, Designer.

CAD and 3-D Drawing Tools
CAD (computer-aided design) software is used by architects, engineers, drafters, artists and others to create precision drawings or technical illustrations. It can be used to create two-dimensional (2-D) drawings or three-dimensional models. CAD images can be spun about in space, with lighting conditions exactly simulated and shadows properly drawn. With CAD software you can stand in front of your work and view it from any angle, making judgments about its design. Example: AutoCAD.

Image Editing Tools
Image editing applications are specialized and powerful tools for enhancing and retouching existing bitmapped images. These programs are also indispensable for rendering images used in multimedia presentations. Modern versions of these programs also provide many of the features and tools of painting and drawing programs, and can be used to create images from scratch as well as to work on images digitized from scanners or digital cameras, or on artwork files created by painting or drawing packages. Example: Photoshop.

OCR Software
Often you will have printed matter and other text to incorporate into your project, but no electronic text file. With Optical Character Recognition (OCR) software, a flat-bed scanner and your computer, you can save many hours of typing printed words and get the job done faster and more accurately. Example: Perceive.

Sound Editing Programs
Sound editing tools for both digitized and MIDI sound let you see music as well as hear it. By drawing a representation of the sound as a waveform, you can cut, copy, paste and edit segments of the sound with great precision and make your own sound effects. Using editing tools to make your own MIDI files requires knowledge of keys, notation and instruments, and you will need a MIDI synthesizer or device connected to the computer. Example: SoundEdit Pro.

Tools for Creating Animations and Digital Movies
Animations and digital movies are sequences of bitmapped graphic scenes (frames), rapidly played back. Animations can also be made within an authoring system by rapidly changing the location of objects to generate an appearance of motion. Movie-making tools let you edit and assemble video clips captured from cameras, animations, scanned images and other digitized movie segments. The completed clip, often with added transitions and visual effects, can then be played back. Examples: Animator Pro and SuperVideo Windows.

Helpful Accessories
No multimedia toolkit is complete without a few indispensable utilities for performing odd but repeated tasks. These are the accessories.
For example, a screen-grabber is essential: because bitmap images are so common in multimedia, it is important to have a tool for grabbing all or part of the screen display so that you can import it into your authoring system or copy it into an image-editing application.

LECTURE 10: Anti-aliasing
One of the most important techniques in making graphics and text easy to read and pleasing to the eye on-screen is anti-aliasing. Anti-aliasing is a cheaty way of getting round the low 72 dpi resolution of the computer monitor and making objects appear (nearly) as smooth as if they'd just stepped out of a 1200 dpi printer. Take a look at these images: the letter a on the left is un-anti-aliased and looks coarse compared with the letter on the right. If we zoom in we can see better what's happening. Look at how the un-anti-aliased example on the left breaks up curves into steps and jagged outcrops. This is what gives the letter its coarse appearance. The example on the right is the same letter, at the same point size and everything, but with anti-aliasing turned on in Photoshop's text tool. Notice how the program has substituted shades of grey around the lines that would otherwise be broken across a pixel.

But anti-aliasing is more than just making something slightly fuzzy so that you can't see the jagged edges: it's a way of fooling the eye into seeing straight lines and smooth curves where there are none. To see how anti-aliasing works, let's take a diagonal line drawn across a set of pixels. In the example the pixels are marked by the grid: real pixels don't look like that, of course, but the principle is the same. Pixels around an un-anti-aliased line can only be part of the line or not part of it, so the computer draws the line as a jagged set of pixels roughly approximating the course of our original nice smooth line. (Trivia fact: anti-aliasing was invented at MIT's Media Lab. So glad they do something useful there....) When the computer anti-aliases the line, it works out how much of each in-between pixel would be covered by the diagonal line and draws that pixel as an intermediate shade between background and foreground; in our simple-minded example here, shades of grey. This close up the anti-aliasing is obvious and actually looks worse than the un-anti-aliased version, but try taking your glasses off, stepping a few yards back from the screen and screwing up your eyes a bit to emulate the effect of seeing the line at its right size on a VGA monitor covered in crud. Suddenly a nice, smooth line pops into view.

So how does one go about anti-aliasing an image? Just be grateful you don't have to do it by hand. Most screen design programs, including Photoshop and Paint Shop Pro, include anti-alias options for things like text and line tools. The important thing is simply to remember to do it, and to do it at the appropriate time. There are far too many graphics out on the Web that are perfectly well designed, attractive and fitted to their purpose but end up looking amateurish because they haven't been anti-aliased. Equally, there are plenty of graphics that have turned to visual mush because they've been overworked with the anti-alias tool.
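To make the coverage idea described above concrete, here is a small sketch that shades each pixel by the fraction of it covered by an ideal shape, estimated by supersampling. It is a deliberately simple box-coverage model producing grey levels only, not the algorithm any particular paint program uses.

```python
# Simple coverage-based anti-aliasing by supersampling: each pixel is shaded
# by the fraction of sub-pixel samples that fall inside the ideal shape.
# Here the "shape" is the half-plane under the line y = 0.5 * x + 1.

def inside(x: float, y: float) -> bool:
    return y <= 0.5 * x + 1.0

def coverage(px: int, py: int, samples: int = 4) -> float:
    """Fraction of the pixel (px, py) covered by the shape, estimated
    with an n x n grid of sample points inside the pixel square."""
    hits = 0
    for i in range(samples):
        for j in range(samples):
            sx = px + (i + 0.5) / samples
            sy = py + (j + 0.5) / samples
            hits += inside(sx, sy)
    return hits / (samples * samples)

# Print an 8x8 patch: 0 = background, 9 = fully covered, in between = grey.
for row in range(7, -1, -1):
    print("".join(str(round(9 * coverage(col, row))) for col in range(8)))
```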
Always anti-alias rasterised EPSs (see the accompanying page for details). Except when you don't want to, of course. If attempting to anti-alias something manually, or semi-manually, such as by putting a grey halo round a block black graphic, then only apply the effect at the last possible stage. And always, always, always bear in mind the target background colour. It's a fat lot of good anti-aliasing a piece of blue text on a white background, if the target page is orange, because the anti-aliased halo is going to be shades of white-orange. I spent two hours re-colouring in a logo after doing exactly that. Doh! Never confuse blur and anti-aliasing. The former is a great help in making things appear nice and smooth if applied to specific parts of an image, but it'll make your image just look runny if used all over. That's about it. Anti-aliasing is of immense importance, especially in turning EPSs into something pleasant to look at onscreen, as I explain in the next couple of pages. LECTURE 11: ANIMATION Animation is achieved by adding motion to still image or objects, It may also be defined as the creation of moving pictures one frame at a time.Animation grabs attention, and makes a multimedia product more interesting and attractive. There are a few types of animation Layout transition It is the simplest form of animation Example of transitions is spiral, stretch. Zoom Process / information transition Animation can be used to describe complex information/ process in an easier way. Such as performing visual cues (e.g. how things work) Object Movement Object movement which are more complex animations such as animated gif or animated scenes. How does animation works? Animation is possible because of:o A biological phenomenon known as persistence of vision An object seen by human eye remains chemically mapped on the eye’s retina for a brief time after viewing o A psychological phenomenon called phi. Human’s mind need to conceptually complete the perceived action. The combination of persistence of vision and phi make it possible for a series of images that are changed very slightly and very rapidly, one after another, to seemingly blend together into a visual illusion of movement. Eg. A few cells or frames of rotating logo, when continuously and rapidly changed, the arrow of the compass is perceived to be spinning. Still images are flashed in sequence to provide the illusion of animation. The speed of the image changes is called the frame rate. Film is typically delivered at 24 frames per second (fps) In reality the projector light flashes twice per frame, this increasing the flicker rate to 48 times per second to remove any flicker image. The more interruptions per second, the more continuous the beam of light appears, the smoother the animation. Animation Techniques Cel animation o a series of progressively different graphics are used for each frame of film. o Made famous by Disney. Stop motion o Miniatures three-dimensional sets are used (stage, objects) o Objects are moved carefully between shots. Computer Animation (Digital cel & sprite animation) o Employ the same logic and procedural concept of cel animation o Objects are drawn using 3D modelling software. o Objects and background are drawn on different layers, which can be put on top of one another. o Sprite animation – animation on moving object (sprite) Computer Animation (Key Frame Animation) o Key frames are drawn to provide the pose a detailed characteristic of characters at important points in the animation Eg. 
Specify the start and end of a walk, or the top and bottom of a fall. o 3D modelling and animation software will do the tweening process. o Tweening fills the gaps between the key frames to create a smooth animation. Hybrid Technique o A technique that mixes cel and 3D computer animation. It may also include live footage. Kinematics o The study of the motion of jointed structures (such as people). o Realistic animation of such movement can be complex; the latest technology uses motion capture for complex movement animation. Morphing o The process of transitioning from one image to another. When morphing, a few key elements (such as the nose in both images) are set to share the same location (in the final image).
LECTURE 12: VIDEO ON DEMAND
Video can add great impact to your multimedia presentation due to its ability to draw people's attention. Video is also very hardware-intensive (it places the highest performance demands on your computer): o Storage issue: full-screen, uncompressed video uses over 20 megabytes per second (MBps) of bandwidth and storage space. o Processor issue: the processor must handle very large amounts of data delivered in real time.
To get the highest video performance, we should: o Use video compression hardware to allow you to work with full-screen, full-motion video. o Use a sophisticated audio board to allow you to use CD-quality sounds. o Install a superfast RAID (Redundant Array of Independent Disks) system that will support high-speed data transfer rates.
Analog vs Digital Video: Digital video is beginning to replace analog in both professional (production house and broadcast station) and consumer video markets. Digital video offers superior quality at a given cost. Why? o Digital video reduces the generational losses suffered by analog video. o Digital mastering means that quality will never be an issue.
Obtaining Video Clips: If using analog video, we need to convert it to digital format first (in other words, we need to digitize the analog video). Sources for analog video can be: o Existing video content - beware of licensing and copyright issues. o New footage (i.e. shoot your own video) - ask permission from all the persons who appear or speak, as well as permission for any audio or music used.
How video works (Video Basics): Light passes through the camera lens and is converted to an electronic signal by a Charge Coupled Device (CCD). Most consumer-grade cameras have a single CCD; professional-grade cameras have three CCDs, one each for the Red, Green and Blue color information. The output of the CCD is processed by the camera into a signal containing three channels of color information and a synchronization pulse (sync). If each channel of color information is transmitted as a separate signal on its own conductor, the signal output is called RGB, which is the preferred method for higher-quality and professional video work.
LECTURE 13: IMAGES: An image (from Latin imago) is an artifact, usually two-dimensional (a picture), that has a similar appearance to some subject—usually a physical object or a person. The word image is also used in the broader sense of any two-dimensional figure such as a map, a graph, a pie chart, or an abstract painting. In this wider sense, images can also be rendered manually, such as by drawing, painting or carving, rendered automatically by printing or computer graphics technology, or developed by a combination of methods, especially in a pseudo-photograph. A volatile image is one that exists only for a short period of time.
This may be a reflection of an object by a mirror, a projection of a camera obscura, or a scene displayed on a cathode ray tube. A fixed image, also called a hard copy, is one that has been recorded on a material object, such as paper or textile, by photography or digital processes. A mental image exists in an individual's mind: something one remembers or imagines. The subject of an image need not be real; it may be an abstract concept, such as a graph, function, or "imaginary" entity. For example, Sigmund Freud claimed to have dreamt purely in aural images of dialogues. The development of synthetic acoustic technologies and the creation of sound art have led to a consideration of the possibilities of a sound-image made up of irreducible phonic substance beyond linguistic or musicological analysis.
Still image: A still image is a single static image, as distinguished from a moving image (see below). This phrase is used in photography, visual media and the computer industry to emphasize that one is not talking about movies, or in very precise or pedantic technical writing such as a standard. A film still is a photograph taken on the set of a movie or television program during production, used for promotional purposes.
CAPTURING AND EDITING IMAGES: The image you see on your monitor is a bitmap stored in video memory, updated about every 1/60 of a second or faster, depending upon your monitor's scan rate. As you assemble images for your multimedia project, you may need to capture and store an image directly from your screen. The simplest way to capture what you see on your screen is to press the appropriate key on your keyboard. Both the Macintosh and Windows environments have a clipboard where text and images are temporarily stored when you cut or copy them within an application. In Windows, when you press Print Screen, a copy of your screen image goes to the clipboard, and from the clipboard you can paste the captured bitmap into an application. Screen capture utilities for Macintosh and Windows go a step further and are indispensable to multimedia artists: with a keystroke they let you select an area of the screen and save the selection in various formats.
Image editing: The way to manipulate a bitmap is to use an image-editing program. These "king of the mountain" programs let you not only retouch an image but also do tricks like placing your face at the helm of a square-rigger. In addition to letting you enhance and make composite images, image editing also allows you to alter and distort an image: a colored image of a red rose can be changed into a blue or purple rose. Morphing is another effect that can be used to manipulate still images or to create bizarre animated transformations. Morphing allows you to smoothly blend two images so that one image seems to melt into the other.
SCANNING IMAGES: Document scanning or image scanning is the action or process of converting text and graphic paper documents, photographic film, photographic paper or other material to digital images. This analog-to-digital (A/D) conversion process is required for computer users to be able to view the material as electronic files.
LECTURE:14 Color palette: In computer graphics, a palette is either a given, finite set of colors for the management of digital images (that is, a color palette), or a small on-screen graphical element for choosing from a limited set of choices, not necessarily colors (such as a tools palette).
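To make the color-palette idea concrete before the detailed meanings below, here is a minimal sketch (plain Python; the palette entries and the tiny "image" are arbitrary example values) of how an indexed-color image stores one small palette index per pixel and looks the actual RGB values up in a color table:

```python
# A minimal sketch of indexed color: the image stores one small index per
# pixel, and a palette (color look-up table, CLUT) maps each index to RGB.
# The palette values below are arbitrary example colors, not from any standard.

palette = [
    (0, 0, 0),        # index 0: black
    (255, 255, 255),  # index 1: white
    (255, 0, 0),      # index 2: red
    (0, 0, 255),      # index 3: blue
]

# A tiny 2x4 "image" of palette indices (one byte per pixel in a real file).
indexed_image = [
    [0, 1, 2, 3],
    [3, 2, 1, 0],
]

# Displaying the image means replacing each index with its RGB triple.
rgb_image = [[palette[i] for i in row] for row in indexed_image]

for row in rgb_image:
    print(row)
```

With at most 256 palette entries, each pixel needs only one byte; this is exactly the trade-off that comes up again for GIF later in these notes.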
Depending on the context (an engineer's technical specification, an advertisement, a programmers' guide, an image file specification, a user's manual, etc.) the term palette and related terms such as Web palette and RGB palette can have somewhat different meanings. The following are some of the widely used meanings of color palette in computing:
o The total number of colors that a given system is able to generate or manage; the term full palette is often encountered in this sense. For example, Highcolor displays are said to have a 16-bit RGB palette.
o The limited, fixed color selection that a given display adapter can offer when its hardware registers are appropriately set (fixed palette selection). For example, the Color Graphics Adapter (CGA) can be set to show the so-called palette #1 or palette #2 in color graphic mode: two combinations of 4 fixed colors each.
o The limited selection of colors that a given system is able to display simultaneously, generally picked from a wider full palette; the terms selected colors or picked colors are also used. In this case, the color selection is always chosen by software, either by the user or by a program. For example, the standard VGA display adapter is said to provide a palette of 256 simultaneous colors from a total of 262,144 different colors.
o The hardware registers of the display subsystem into which the selected colors' values are loaded in order to show them, also referred to as the hardware palette or Color Look-Up Table (CLUT). For example, the hardware registers of the Commodore Amiga are known both as its color palette and its CLUT, depending on the source.
o A given color selection officially standardized by some body or corporation; default palette or system palette are also used for this meaning. For example, the well-known Web colors for use with Internet browsers, or the Microsoft Windows default palette.
o The limited color selection inside a given indexed color image file such as GIF, although the expressions color table or color map are also generally used.
Vector images
A vector image is made up of a series of mathematical instructions. These instructions define the lines and shapes that make up a vector image. As well as shape, size and orientation, the file stores information about the outline and fill colour of these shapes.
Features common to vector images:
o Scalability/resolution independence - images display at any size without loss of detail
o File sizes are usually smaller than raster images
o Easily convertible to raster formats (rasterising)
o Unable to display on the Web without a browser plug-in or conversion to a raster format
File formats for vector images
There are three types of file format for storing vector images:
1. Native vector image file formats, e.g. Adobe Illustrator (AI), Adobe FreeHand (FH11), CorelDraw (CDR). Native file formats are created and used by drawing software. They are usually proprietary and best supported by the programs that create them. Native file formats can be saved in Web compatible formats, or as metafiles for printing.
2. Metafiles/Page Description Languages (PDL), e.g. Encapsulated PostScript (EPS), Windows Metafile (WMF), Computer Graphics Metafile (CGM), Adobe Portable Document Format (PDF). Metafiles contain images (vector and raster) and the instructions for displaying them. They can act as containers for vector images when native formats are not supported. Unlike WMF, which only works in Windows, EPS is platform independent.
3. Web compatible vector image file formats, e.g. Adobe Flash (SWF) and Scalable Vector Graphics (SVG). Flash and SVG are the two main vector image standards for the Web. Both support animation (see Vector Images Part II) and both require browser plug-ins. Flash is almost ubiquitous and widely supported, but remains proprietary. SVG is an 'open' standard based on XML, but is currently not nearly as well supported by browsers.
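To illustrate the point that a vector image is just a series of drawing instructions, here is a minimal sketch that writes a tiny SVG file from Python. The shapes and colours are arbitrary examples; what matters is that the file stores instructions (shape, position, size, outline and fill), not pixels:

```python
# Write a tiny SVG vector image by hand. Each element below is an
# instruction (a shape with position, size, outline and fill), not pixels,
# so the image can be scaled to any size without loss of detail.
svg = """<svg xmlns="http://www.w3.org/2000/svg" width="200" height="120">
  <rect x="10" y="10" width="120" height="60" fill="lightblue" stroke="navy"/>
  <circle cx="160" cy="70" r="30" fill="none" stroke="red" stroke-width="3"/>
  <line x1="10" y1="100" x2="190" y2="100" stroke="black"/>
</svg>
"""

with open("example.svg", "w") as f:
    f.write(svg)

print("Wrote example.svg -", len(svg), "bytes of drawing instructions")
```

Opened in a browser, those few hundred bytes render crisply at any zoom level, which is the scalability/resolution independence listed above.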
LECTURE 15: JPEG (pronounced "jay-peg") is a standardized image compression mechanism. JPEG stands for Joint Photographic Experts Group, the original name of the committee that wrote the standard. JPEG is designed for compressing either full-color or gray-scale images of natural, real-world scenes. It works well on photographs, naturalistic artwork, and similar material; not so well on lettering, simple cartoons, or line drawings. JPEG handles only still images, but there is a related standard called MPEG for motion pictures. JPEG is "lossy," meaning that the decompressed image isn't quite the same as the one you started with. (There are lossless image compression algorithms, but JPEG achieves much greater compression than is possible with lossless methods.) JPEG is designed to exploit known limitations of the human eye, notably the fact that small color changes are perceived less accurately than small changes in brightness. Thus, JPEG is intended for compressing images that will be looked at by humans. If you plan to machine-analyze your images, the small errors introduced by JPEG may be a problem for you, even if they are invisible to the eye. A useful property of JPEG is that the degree of lossiness can be varied by adjusting compression parameters (the Q factor). This means that the image maker can trade off file size against output image quality. You can make *extremely* small files if you don't mind poor quality; this is useful for applications such as indexing image archives. Conversely, if you aren't happy with the output quality at the default compression setting, you can jack up the quality until you are satisfied, and accept lesser compression. Another important aspect of JPEG is that decoders can trade off decoding speed against image quality, by using fast but inaccurate approximations to the required calculations. Some viewers obtain remarkable speedups in this way. (Encoders can also trade accuracy for speed, but there's usually less reason to make such a sacrifice when writing a file.)
How well does JPEG compress? Very well indeed, when working with its intended type of image (photographs and suchlike). For full-color images, the uncompressed data is normally 24 bits/pixel. The best known lossless compression methods can compress such data about 2:1 on average. JPEG can typically achieve 10:1 to 20:1 compression without visible loss, bringing the effective storage requirement down to 1 to 2 bits/pixel. 30:1 to 50:1 compression is possible with small to moderate defects, while for very-low-quality purposes such as previews or archive indexes, 100:1 compression is quite feasible. An image compressed 100:1 with JPEG takes up the same space as a full-color one-tenth-scale thumbnail image, yet it retains much more detail than such a thumbnail.
JPEG compression: The compression method is usually lossy, meaning that some visual quality is lost in the process and cannot be restored. There are variations on the standard baseline JPEG that are lossless; however, these are not widely supported. There is also an interlaced "Progressive JPEG" format, in which data is compressed in multiple passes of progressively higher detail. This is ideal for large images that will be displayed while downloading over a slow connection, allowing a reasonable preview after receiving only a portion of the data. However, progressive JPEGs are not as widely supported, and even some software which does support them (such as some versions of Internet Explorer) only displays the image once it has been completely downloaded.
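The quality/size trade-off described above is easy to see in practice. Here is a small sketch using the Pillow library (assuming it is installed, and assuming an input file named photo.png); the exact sizes will of course depend on the image:

```python
# Save the same image at several JPEG quality settings and compare file sizes.
# Requires Pillow (pip install Pillow); "photo.png" is an assumed input file.
import os
from PIL import Image

img = Image.open("photo.png").convert("RGB")   # JPEG has no alpha channel

for q in (95, 75, 50, 10):
    name = f"photo_q{q}.jpg"
    # 'quality' is the Q-factor knob; 'progressive' writes a progressive JPEG.
    img.save(name, "JPEG", quality=q, progressive=True)
    print(f"quality={q:3d}  ->  {os.path.getsize(name):8d} bytes")
```

Running this on a photograph typically shows the file shrinking dramatically as the quality setting drops, with visible artifacts only appearing at the low end.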
There are also many medical imaging systems that create and process 12-bit JPEG images. The 12-bit JPEG format has been part of the JPEG specification for some time, but again, this format is not as widely supported.
Lossless editing: A number of alterations to a JPEG image can be performed losslessly (that is, without recompression and the associated quality loss) as long as the image size is a multiple of one MCU (Minimum Coded Unit) block (usually 16 pixels in both directions, for 4:2:0 subsampling). Blocks can be rotated in 90 degree increments, flipped in the horizontal, vertical and diagonal axes and moved about in the image. Not all blocks from the original image need to be used in the modified one. The top and left of a JPEG image must lie on a block boundary, but the bottom and right need not do so. This limits the possible lossless crop operations, and also what flips and rotates can be performed on an image whose edges do not lie on a block boundary for all channels. When using lossless cropping, if the bottom or right side of the crop region is not on a block boundary, then the rest of the data from the partially used blocks will still be present in the cropped file and can be recovered relatively easily by anyone with a hex editor and an understanding of the format. It is also possible to transform between baseline and progressive formats without any loss of quality, since the only difference is the order in which the coefficients are placed in the file.
JPEG-DCT encoding and quantization:
JPEG-DCT encoding: Next, each component (Y, Cb, Cr) of each 8×8 block is converted to a frequency-domain representation, using a normalized, two-dimensional type-II discrete cosine transform (DCT). As an example, take one such 8×8 8-bit subimage. Before computing the DCT of the subimage, its gray values are shifted from a positive range to one centered around zero. For an 8-bit image each pixel has 256 possible values: [0, 255]. To center around zero it is necessary to subtract half the number of possible values, or 128. Subtracting 128 from each pixel value yields pixel values on [−128, 127]. The next step is to take the two-dimensional DCT, which is given by

G(u,v) = \frac{1}{4}\,\alpha(u)\,\alpha(v)\sum_{x=0}^{7}\sum_{y=0}^{7} g(x,y)\,\cos\!\left[\frac{(2x+1)u\pi}{16}\right]\cos\!\left[\frac{(2y+1)v\pi}{16}\right]

where
o u is the horizontal spatial frequency, for the integers 0 ≤ u < 8,
o v is the vertical spatial frequency, for the integers 0 ≤ v < 8,
o \alpha(k) is a normalizing function, equal to 1/\sqrt{2} for k = 0 and 1 otherwise,
o g(x,y) is the pixel value at coordinates (x,y), and
o G(u,v) is the DCT coefficient at coordinates (u,v).
The DCT transforms the 64 pixel values into a linear combination of 64 basis patterns, indexed horizontally by u and vertically by v. If we perform this transformation on the (shifted) example block and round to the nearest integer, we get an 8×8 matrix of coefficients with a rather large value in the top-left corner. This is the DC coefficient; the remaining 63 coefficients are called the AC coefficients. The advantage of the DCT is its tendency to aggregate most of the signal in one corner of the result. The quantization step to follow accentuates this effect while simultaneously reducing the overall size of the DCT coefficients, resulting in a signal that is easy to compress efficiently in the entropy stage.
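As a sketch of the step just described (not the reference JPEG implementation), the 8×8 type-II DCT can be computed directly from the formula above. The sample block used here is an arbitrary example, not the matrix from the standard:

```python
# Level-shift an 8x8 block and apply the 2-D type-II DCT from the formula above.
# The input block is an arbitrary example, not the matrix from the standard.
import math

def dct_8x8(block):
    """Return the 8x8 DCT coefficients G(u, v) of an 8x8 pixel block."""
    def alpha(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    g = [[p - 128 for p in row] for row in block]   # shift [0,255] -> [-128,127]
    G = [[0.0] * 8 for _ in range(8)]
    for u in range(8):
        for v in range(8):
            s = 0.0
            for x in range(8):
                for y in range(8):
                    s += (g[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / 16)
                          * math.cos((2 * y + 1) * v * math.pi / 16))
            G[u][v] = 0.25 * alpha(u) * alpha(v) * s
    return G

block = [[(x * 8 + y * 4) % 256 for y in range(8)] for x in range(8)]
coeffs = dct_8x8(block)
print("DC coefficient:", round(coeffs[0][0]))
print("first AC coefficients:", [round(c) for c in coeffs[0][1:4]])
```

Even for this synthetic block, most of the energy ends up in the low-frequency corner, which is what the quantization step exploits.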
The DCT temporarily increases the bit-depth of the image, since the DCT coefficients of an 8-bit/component image take up to 11 or more bits (depending on the fidelity of the DCT calculation) to store. This may force the codec to temporarily use 16-bit bins to hold these coefficients, doubling the size of the image representation at this point; they are typically reduced back to 8-bit values by the quantization step. The temporary increase in size at this stage is not a performance concern for most JPEG implementations, because typically only a very small part of the image is stored in full DCT form at any given time during the image encoding or decoding process.
Quantization: The human eye is good at seeing small differences in brightness over a relatively large area, but not so good at distinguishing the exact strength of a high frequency brightness variation. This allows one to greatly reduce the amount of information in the high frequency components. This is done by simply dividing each component in the frequency domain by a constant for that component, and then rounding to the nearest integer. This is the main lossy operation in the whole process. As a result, it is typically the case that many of the higher frequency components are rounded to zero, and many of the rest become small positive or negative numbers, which take many fewer bits to store. A typical quantization matrix, as specified in the original JPEG Standard[5], has small values (such as 16) for the low-frequency entries near the top-left corner and much larger values for the high-frequency entries toward the bottom-right. The quantized DCT coefficients are computed as

B(j,k) = \mathrm{round}\!\left(\frac{G(j,k)}{Q(j,k)}\right) \quad \text{for } j,k = 0,1,\dots,7,

where G is the matrix of unquantized DCT coefficients, Q is the quantization matrix, and B is the matrix of quantized DCT coefficients. (Note that this is element-by-element division, in no way matrix multiplication.) Using such a quantization matrix with the DCT coefficient matrix from above, most of the high-frequency coefficients round to zero. For example, using −415 (the DC coefficient) with a corresponding quantization value of 16 and rounding to the nearest integer gives round(−415/16) = −26, the first entry of the zigzag sequence below.
LECTURE:16 Entropy coding
(Figure: zigzag ordering of JPEG image components.) Entropy coding is a special form of lossless data compression. It involves arranging the image components in a "zigzag" order, employing a run-length encoding (RLE) algorithm that groups similar frequencies together and inserts length-coded zeros, and then using Huffman coding on what is left. The JPEG standard also allows, but does not require, the use of arithmetic coding, which is mathematically superior to Huffman coding. However, this feature is rarely used as it is covered by patents and because it is much slower to encode and decode compared to Huffman coding. Arithmetic coding typically makes files about 5% smaller. The zigzag sequence for the above quantized coefficients is shown below. (The format shown is just for ease of understanding/viewing.)
−26 −3 −3 2 1 −1 0 0 0 0 0 0 −2 −4 1 1 0 0 0 0 0 −6 1 5 −1 0 0 0 0 0 −4 1 2 −1 0 0 0 0 2 0 −1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
If the i-th block is represented by Bi and positions within each block are represented by (p,q), where p = 0, 1, ..., 7 and q = 0, 1, ..., 7, then any coefficient in the DCT image can be represented as Bi(p,q). Thus, in the above scheme, the order of encoding coefficients (for the i-th block) is Bi(0,0), Bi(0,1), Bi(1,0), Bi(2,0), Bi(1,1), Bi(0,2), Bi(0,3), Bi(1,2) and so on.
(Figure: baseline sequential JPEG encoding and decoding processes.) This encoding mode is called baseline sequential encoding. Baseline JPEG also supports progressive encoding.
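Before looking at the two encoding modes in more detail, here is a minimal sketch of the quantization and zigzag steps just described. The coefficient block and the flat quantization table are toy stand-ins, not the tables from the standard:

```python
# Quantize an 8x8 block of DCT coefficients and read it out in zigzag order.
# The coefficient block and the flat quantization table are toy examples.

# Build the JPEG zigzag scan order: walk the anti-diagonals, alternating
# direction, starting at the DC coefficient (0, 0).
ZIGZAG = sorted(
    ((p, q) for p in range(8) for q in range(8)),
    key=lambda pq: (pq[0] + pq[1],
                    pq[0] if (pq[0] + pq[1]) % 2 else -pq[0]),
)

def quantize(G, Q):
    """Element-wise B(j,k) = round(G(j,k) / Q(j,k)) -- the lossy step."""
    return [[round(G[j][k] / Q[j][k]) for k in range(8)] for j in range(8)]

def zigzag_scan(B):
    """Read the quantized block out in zigzag order, dropping trailing zeros."""
    seq = [B[p][q] for p, q in ZIGZAG]
    while seq and seq[-1] == 0:
        seq.pop()
    return seq + ["EOB"]            # end-of-block marker

# Toy data: a block whose energy is concentrated in the low-frequency corner.
G = [[-415 if (j, k) == (0, 0) else max(0, 40 - 10 * (j + k)) for k in range(8)]
     for j in range(8)]
Q = [[16] * 8 for _ in range(8)]    # flat quantization table, just for the demo

print(zigzag_scan(quantize(G, Q)))
```

The printed sequence starts with −26 (the quantized DC coefficient) and ends with the EOB marker once only zeros remain, mirroring the run shown above.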
While sequential encoding encodes coefficients of a single block at a time (in a zigzag manner), progressive encoding encodes similar-positioned coefficients of all blocks in one go, followed by the next positioned coefficients of all blocks, and so on. So, if the image is divided into N 8×8 blocks {B0,B1,B2, ..., Bn-1}, then progressive encoding encodes Bi(0,0) for all blocks, i.e., for all i = 0, 1, 2, ..., N-1. This is followed by encoding Bi(0,1) coefficient of all blocks, followed by Bi(1,0)-th coefficient of all blocks, then Bi(0,2)-th coefficient of all blocks, and so on. It should be noted here that once all similar-positioned coefficients have been encoded, the next position to be encoded is the one occurring next in the zigzag traversal as indicated in the figure above. It has been found that Baseline Progressive JPEG encoding usually gives better compression as compared to Baseline Sequential JPEG due to the ability to use different Huffman tables (see below) tailored for different frequencies on each "scan" or "pass" (which includes similar-positioned coefficients), though the difference is not too large. In the rest of the article, it is assumed that the coefficient pattern generated is due to sequential mode. In order to encode the above generated coefficient pattern, JPEG uses Huffman encoding. JPEG has a special Huffman code word for ending the sequence prematurely when the remaining coefficients are zero. Using this special code word: "EOB", the sequence becomes: −26 −3 −3 2 1 −1 0 0 −2 −4 1 1 0 −6 1 5 −1 0 −4 1 2 −1 2 0 0 −1 EOB JPEG's other code words represent combinations of (a) the number of significant bits of a coefficient, including sign, and (b) the number of consecutive zero coefficients that follow it. (Once you know how many bits to expect, it takes 1 bit to represent the choices {-1, +1}, 2 bits to represent the choices {-3, -2, +2, +3}, and so forth.) In our example block, most of the quantized coefficients are small numbers that are not followed immediately by a zero coefficient. These more-frequent cases will be represented by shorter code words.The JPEG standard provides general-purpose Huffman tables; encoders may also choose to generate Huffman tables optimized for the actual frequency distributions in images being encoded. LECTURE:17 GIF GIF, which stands for Graphics Interchange Format, is a lossless method of compression. All that means is that when the program that creates a GIF squashes the original image down it takes care not to lose any data. It uses a simple substitution method of compression. If the algorithm comes across several parts of the image that are the same, say a sequence of digits like this, 1 2 3 4 5, 1 2 3 4 5, 1 2 3 4 5, it makes the number 1 stand for the sequence 1 2 3 4 5 so that you could render the same sequence 1 1 1, obviously saving a lot of space. It stores the key to this (1 = 1 2 3 4 5) in a hash table, which is attached to the image so that the decoding program can unscramble it. The maximum compression available with a GIF therefore depends on the amount of repetition there is in an image. A flat colour will compress well - sometimes even down to one tenth of the original file size - while a complex, non-repetitive image will fare worse, perhaps only saving 20% or so. There are problems with GIFs. One is that they are limited to a palette of 256 colours or less. 
CompuServe, which created the GIF, did at one point say it would attempt to produce a 24-bit version of the GIF, but then along came problem number two: Unisys. Unisys discovered that it owned some patents to key parts of the GIF compression technology, and started demanding fees from every company whose software uses the (freely available) GIF code. This has somewhat stifled development. There is a 24-bit, license-free GIF-alike called the PNG format, but this has yet to take off.
JPEG: JPEG, on the other hand, is a lossy compression method. In other words, to save space it just throws away parts of an image. Obviously you can't just go around discarding any old piece of information, so what the JPEG algorithm does is first divide the image into squares (you can see these squares on badly-compressed JPEGs). Then it uses a piece of mathematics called the Discrete Cosine Transformation to turn the square of data into a set of curves, some small, some big, that go together to make up the image. This is where the lossy bit comes in: depending on how much you want to compress the image, the algorithm throws away the less significant part of the data (the smaller curves), which adds less to the overall "shape" of the image. This means that, unlike GIF, you get a say in how much you want to compress an image by. However, the lossy compression method can generate artifacts - unwanted effects such as false colour and blockiness - if not used carefully.
PNG - Portable Network Graphics (.PNG file extension; the pronunciation 'Ping' is specifically mentioned in the PNG Specification). PNG needs to be mentioned. PNG is not the number one file format, but you will want to know about it. PNG is not so popular yet, but its appeal is growing as people discover what it can do. PNG was designed recently, with the experience advantage of knowing all that went before. The original purpose of PNG was to be a royalty-free GIF and LZW replacement (see LZW next page). However, PNG supports a large set of technical features, including superior lossless compression from LZ77. Compression in PNG is called the ZIP method, and is like the 'deflate' method in PKZIP (and is royalty free). But the big deal is that PNG incorporates special preprocessing filters that can greatly improve the lossless compression efficiency, especially for typical gradient data found in 24 bit photographic images. This filter preprocessing causes PNG to be a little slower than other formats when reading or writing the file (but all types of compression require processing time). Photoshop 7 and Elements 2.0 correct this now, but earlier Adobe versions did not store or read the ppi number to scale print size in PNG files (Adobe previously treated PNG like GIF in this respect, and indicated 72 ppi regardless). The ppi number never matters on the video screen or web, but it was a serious usability flaw for printing purposes. Without that stored ppi number, we must scale the image again every time we print it. If we understand this, it should be no big deal, and at home, we probably automatically do that anyway (digital cameras do the same thing with their JPG files). But sending a potentially unsized image to a commercial printer is a mistake, and so TIF files should be used in that regard. Most other programs do store and use the correct scaled resolution value in PNG files.
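The ppi bookkeeping discussed above is just a unit conversion between pixels per inch and the pixels-per-metre value stored in the file; a quick sketch of the arithmetic behind the 299.999-instead-of-300 effect mentioned next (300 ppi is an arbitrary example):

```python
# Convert a print resolution in pixels per inch to the integer pixels-per-metre
# value a PNG file stores, then convert back to see the small rounding error.
INCH_IN_METRES = 0.0254

ppi = 300                                    # example print resolution
stored_ppm = round(ppi / INCH_IN_METRES)     # stored as an integer, e.g. 11811
recovered_ppi = stored_ppm * INCH_IN_METRES  # ~299.9994 ppi

print(stored_ppm, round(recovered_ppi, 4))
```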
PNG stores resolution internally as pixels per meter, so when calculating back to pixels per inch, some programs may show excessive decimal digits, perhaps 299.999 ppi instead of 300 ppi (no big deal). PNG has additional unique features, like an alpha channel for a variable transparency mask (any RGB or grayscale pixel can be, say, 79% transparent, and other pixels may individually have other transparency values). If indexed color, palette values may have similar variable transparency values. PNG files may also contain an embedded gamma value so the image brightness can be viewed properly on both Windows and Macintosh screens. These should be wonderful features, but in many cases these extra features are not implemented properly (if at all) in many programs, and so these unique features must be ignored for web pages. However, this does not interfere with using the standard features, specifically the effective and lossless compression. Netscape 4.04 and MS IE 4.0 browsers added support for PNG files on web pages, not to replace JPG, but to replace GIF for graphics. For non-web and non-graphic use, PNG would compete with TIF. Most image programs support PNG, so basic compatibility is not an issue. You may really like PNG. PNG may be of great interest because its lossless compression is well suited for master copy data, and because PNG is a noticeably smaller file than LZW TIF: perhaps about 25% smaller than TIF LZW for 24 bit files, and perhaps about 10% to 30% smaller than GIF files for indexed data. Different images will have varying compression sizes, but PNG is an excellent replacement for GIF and 24 bit TIFF LZW files. PNG does define 48 bit files, but I don't know of any programs that support 48 bit PNG (not too many support 48 bit in any form). Here are some representative file sizes for a 9.9 megabyte 1943x1702 24-bit RGB color image:
File type    File size
TIFF         9.9 megs
TIFF LZW     8.4 megs
PNG          6.5 megs
JPG          1.0 megs   (1.0 / 9.9 is about 10% of the original size)
BMP          9.9 megs
Seems to me that PNG is an excellent replacement for TIFF too.
TIFF - Tag Image File Format (.TIF file extension, pronounced Tif). TIFF is the format of choice for archiving important images. TIFF is THE leading commercial and professional image standard. TIFF is the most universal and most widely supported format across all platforms: Mac, Windows, Unix. Data up to 48 bits is supported. TIFF supports most color spaces: RGB, CMYK, YCbCr, etc. TIFF is a flexible format with many options. The data contains tags to declare what type of data follows. New types are easy to invent, and this versatility can cause incompatibility, but about any program anywhere will handle the standard TIFF types that we might encounter. TIFF can store data with bytes in either PC or Mac order (Intel or Motorola CPU chips differ in this way). This choice improves efficiency (speed), but all major programs today can read TIFF either way, and TIFF files can be exchanged without problem. Several compression formats are used with TIF. TIF with G3 compression is the universal standard for fax and multipage line art documents. TIFF image files optionally use LZW lossless compression. Lossless means there is no quality loss due to compression. Lossless guarantees that you can always read back exactly what you thought you saved, bit-for-bit identical, without data corruption. This is a critical factor for archiving master copies of important images. Most image file formats are lossless, with JPG and Kodak PhotoCD PCD files being the main exceptions.
Compression works by recognizing repeated identical strings in the data, and replacing the many instances with one instance, in a way that allows unambiguous decoding without loss. This is fairly intensive work, and any compression method makes files slower to save or open. LZW is most effective when compressing solid indexed colors (graphics), and is less effective for 24 bit continuous-tone photo images. Featureless areas compress better than detailed areas. LZW is more effective for grayscale images than color. It is often hardly effective at all for 48 bit images (VueScan 48 bit TIF LZW is an exception to this, using an efficient data type that not all others use). LZW is Lempel-Ziv-Welch, named for Israeli researchers Abraham Lempel and Jacob Ziv, who published IEEE papers in 1977 and 1978 (now called LZ77 and LZ78) which were the basis for most later work in compression. Terry Welch built on this, and published and patented a compression technique that is now called LZW. This is the 1984 Sperry patent (Sperry later became part of Unisys) involved in TIF LZW and GIF (and V.42bis for modems). There was much controversy about a royalty for LZW in GIF, but a royalty was always paid for LZW in TIF files and for v.42bis modems. International patents recently expired in mid-2004. Image programs of any stature will provide LZW, but simple or free programs often do not pay the LZW patent royalty to provide LZW, and then its absence can cause an incompatibility for compressed files. It is not necessary to say much about TIF. It works, it's important, it's great, it's practical, it's the standard universal format for high quality images, it simply does the best job the best way. Give TIF very major consideration, both for photos and documents, especially for archiving anything where quality is important. But TIF files for photo images are generally pretty large. Uncompressed TIFF files are about the same size in bytes as the image size in memory. Regardless of the novice view, this size is a plus, not a disadvantage. Large means lots of detail, and it's a good thing. 24 bit RGB image data is 3 bytes per pixel. That is simply how large the image data is, and TIF LZW stores it with recoverable full quality in a lossless format (and again, that's a good thing). $200 today buys BOTH a 320 GB 7200 RPM disk and 512 MB of memory, so it is quite easy to plan for and deal with the size. There are situations, for less serious purposes, when the full quality may not always be important or necessary. JPEG files are much smaller, and are suitable for non-archival purposes, like photos for read-only email and web page use, when small file size may be more important than maximum quality. JPG has its important uses, but be aware of the large price in quality that you must pay for the small size of JPG; it is not without cost.
Graphics Interchange Format (GIF) (.GIF file extension). There have been raging debates about the pronunciation. The designers of GIF say it is correctly pronounced to sound like Jiff. But that seems counter-intuitive, and up in my hills, we say it sounding like Gift (without the t). GIF was developed by CompuServe to show images online (in 1987, for 8 bit video boards, before JPG and 24 bit color were in use). GIF uses indexed color, which is limited to a palette of only 256 colors (next page). GIF was a great match for the old 8 bit 256 color video boards, but is inappropriate for today's 24 bit photo images. GIF files do NOT store the image's scaled resolution ppi number, so scaling is necessary every time one is printed.
This is of no importance for screen or web images. GIF file format was designed for CompuServe screens, and screens don't use ppi for any purpose. Our printers didn't print images in 1987, so it was useless information, and CompuServe simply didn't bother to store the printing resolution in GIF files. GIF is still an excellent format for graphics, and this is its purpose today, especially on the web. Graphic images (like logos or dialog boxes) use few colors. Being limited to 256 colors is not important for a 3 color logo. A 16 color GIF is a very small file, much smaller, and more clear than any JPG, and ideal for graphics on the web. Graphics generally use solid colors instead of graduated shades, which limits their color count drastically, which is ideal for GIF's indexed color. GIF uses lossless LZW compression for relatively small file size, as compared to uncompressed data. GIF files offer optimum compression (smallest files) for solid color graphics, because objects of one exact color compress very efficiently in LZW. The LZW compression is lossless, but of course the conversion to only 256 colors may be a great loss. JPG is much better for 24 bit photographic images on the web. For those continuous tone images, the JPG file is also very much smaller (although lossy). But for graphics, GIF files will be smaller, and better quality, and (assuming no dithering) pure and clear without JPG artifacts. If GIF is used for continuous tone photo images, the limited color can be poor, and the 256 color file is quite large as compared to JPG compression, even though it is 8 bit data instead of 24 bits. Photos might typically contain 100,000 different color values, so the image quality of photos is normally rather poor when limited to 256 colors. 24 bit JPG is a much better choice today. The GIF format may not even be offered as a save choice until you have reduced the image to 256 colors or less. So for graphic art or screen captures or line art, GIF is the format of choice for graphic images on the web. Images like a company logo or screen shots of a dialog box should be reduced to 16 colors if possible and saved as a GIF for smallest size on the web. A complex graphics image that may look bad at 16 colors might look very good at say 48 colors (or it may require 256 colors if photo-like). But often 16 colors is fine for graphics, with the significance that the fewer number of colors, the smaller the file, which is extremely important for web pages. GIF optionally offers transparent backgrounds, where one palette color is declared transparent, so that the background can show through it. The GIF File - Save As dialog box usually has an Option Button to specify which one GIF palette index color is to be transparent. Interlacing is an option that quickly shows the entire image in low quality, and the quality sharpens as the file download completes. Good for web images, but it makes the file slightly larger. GIF files use a palette of indexed colors, and if you thought 24 bit RGB color was kinda complicated, then you ain't seen nuthin' yet (next page). For GIF files, a 24 bit RGB image requires conversion to indexed color. More specifically, this means conversion to 256 colors, or less. Indexed Color can only have 256 colors maximum. There are however selections of different ways to convert to 256 colors. LECTURE 18 The Digital Representation of Sound The world is continuous. Time marches on and on and there are plenty of things that we could measure at any instant. 
For example, weather forecasters might keep an ongoing recording of the temperature, or the barometric pressure. If you are in the hospital, then the nurses might be keeping a record of your temperature, or your heart rate (EKG), or your brain waves (EEG), or your insurance coverage. Any one of these records gives you a function f(t), where at a given time t, f(t) would be the value of the particular statistic that interests you. These sorts of functions are called time series. These are examples of "continuous" functions. What we mean by this is that at any instant of time, the functions take on a well-defined value, so that, plotted against time, they make squiggly line graphs, which, if traced out by a number two pencil (please have one handy at all times!) could be done without the pencil ever leaving the paper (Ladies and gentlemen! Notice that at no time did the pencil leave the paper!). This might also be called an "analog" function. Now, it's true that to illustrate the idea of a graph, we could have used a lot of simpler things (like the Dow Jones average, or a rainfall chart, or an actual EKG). But you've all seen stuff like that, and also, we're really nerdy (well, one of us isn't), so we thought these would be really, like, way cool (totally). Of course, the time series that interest us are those that represent sound. In particular, what we want to do is take these time series, stick them on the computer and start to play with them! Now, if you're paying attention, then you may realize that at this moment we're in a bit of a bind. The type of time series that we've been describing is a continuous function. That is, at every instant in time, we could write down a number that is the value of the function at that instant—whether it be how much your eardrum has been displaced, what your temperature is, what your heart rate is, etc. But this is an infinite list of numbers (any one of which may have an infinite expansion, like π = 3.14159...), and no matter how big your computer is, you're going to have a pretty tough time fitting an infinite collection of numbers on your spanking new hard drive. So, how do we do it? That's the problem that we'll start to investigate in this chapter. How can we represent sound as a finite collection of numbers that can be stored efficiently, in a finite amount of space, on your computer, and played back, and manipulated at will? In short, how do we represent sound digitally?!?! Here's a simpler restatement of the basic problem: computers basically store a finite list of numbers (which can then be thought of as a long list of 0s and 1s). These numbers also have a finite precision. A continuous function would be a list infinitely long! What is a poor electroacoustic musician to do? (Well, one thing to do would be to remember our mentions of sampling in the previous chapter.) Somehow we have to come up with a finite list of numbers which does a good job of representing our continuous function. We do it with samples of the original function, at every few instants (at some predetermined rate, called the sampling rate) recording the value of the function. For example, maybe we only record the temperature every 5 minutes.
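Here is a minimal sketch of that sampling idea: a continuous function (a sine wave standing in for a sound pressure curve) is reduced to a finite list of numbers by recording its value at a fixed sampling rate. The rate and duration are arbitrary example values:

```python
# Sample a "continuous" function at a fixed sampling rate, turning it into
# a finite list of numbers -- the job an ADC does for real sound.
import math

def f(t):
    """A stand-in for a continuous signal: a 440 Hz sine wave."""
    return math.sin(2 * math.pi * 440 * t)

sampling_rate = 8000            # samples per second (example value)
duration = 0.005                # seconds of signal to capture
num_samples = int(sampling_rate * duration)

samples = [f(n / sampling_rate) for n in range(num_samples)]

print(f"{num_samples} samples:", [round(s, 3) for s in samples[:8]], "...")
```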
For sounds we need to go a lot faster, and often use a special device which grabs instantaneous amplitudes at rapid, audio rates (called an Analog to Digital converter, or ADC). A continuous function is also called an analog function, and to restate the problem, we have to convert analog functions to lists of samples, or digital functions, the fundamental way that computers store information. In computers, think of this function not as a function of time (which it is) but as a function of position in computer memory. That is, we store these functions as lists of numbers in computer memory, and as we read through them we are basically creating a discrete function of time of individual amplitude values. Encoding Analog Signals Amplitude modulation (AM) can be used to send any type of information, and is not limited to sending just numerical information. In fact, usually in amplitude modulated systems the voltage levels do not change abruptly as in the example of Figures 5.9 through 5.8, but they vary continuously over a range of voltage values. One common shape for the way voltages vary in an amplitude modulation communication system is the sinusoid or sine wave shown in Figure 5.15. Where Digital Meets Analog A traditional telephone network operates with analog signals, whereas computers work with digital signals. Therefore a device is required to convert the computer's digital signal to an analog signal compatible with the phone line (modulation). This device must also convert the incoming analog signal from phone line to a digital signal (demodulation). Such a device is called a modem; its name is derived from this process of modulation/demodulation. A modem is also known as Data Circuit Terminating Equipment (DCE), which is used to connect a computer or data terminal to a network. Logically, a PC or data terminal is called Data Terminal Equipment (DTE). A modem's transmission speed can be represented by either data rate or baud rate. The data rate is the number of bits which a modem can transmit in one second. The baud rate is the number of 'symbols' which a modem can transmit in one second. The carrier signal on a telephone line has a bandwidth of 4000 Hz. Figure 7.2 shows one cycle of telephone carrier signals. The following types of modulation are used to convert digital signals to analog signals: Amplitude Shift Keying(ASK) Frequency Shift Keying (FSK) Phase Shift Keying (PSK) Quadrature Amplitude Modulation (QAM) Amplitude Shift Keying In Amplitude Shift Keying (ASK), the amplitude of the signal changes. This also may be referred to as Amplitude Modulation (AM). The receiver recognizes these modulation changes as voltage changes, as shown in Figure 5.17. The smaller amplitude is represented by zero and the larger amplitude is represented by one. Each cycle is represented by one bit, with the maximum bits per second determined by the speed of the carrier signal. In this case, the baud rate is equal to the number of bits per second. Figure 5.17 Frequency Shift Keying. With Frequency Shift Keying (FSK), a zero is represented by no change to the frequency of the original signal, while a one is represented by a change to the frequency of the original signal. This is shown in Figure 5.18. Frequency modulation is a term often used in place of FSK. Figure 5.18 Phase Shift Keying. Using the Phase Shift Keying (PSK) modulation method, the phase of the signal is changed to represent ones and zeros. Figure 5.19 shows a 90-degree phase shift. 
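As a rough sketch of the three keying schemes just described (the carrier frequency, bit pattern, and shift sizes are arbitrary example values, not taken from the figures):

```python
# Toy ASK / FSK / PSK modulators: each maps a bit to one carrier cycle's worth
# of samples, changing amplitude, frequency or phase respectively.
import math

RATE = 8000          # samples per second (example value)
F_CARRIER = 1000     # carrier frequency in Hz (example value)
SAMPLES_PER_BIT = RATE // F_CARRIER   # one carrier cycle per bit, for simplicity

def modulate(bits, scheme):
    out = []
    for bit in bits:
        amp, freq, phase = 1.0, F_CARRIER, 0.0
        if scheme == "ASK":
            amp = 1.0 if bit else 0.3                    # large vs small amplitude
        elif scheme == "FSK":
            freq = 2 * F_CARRIER if bit else F_CARRIER   # shift the frequency
        elif scheme == "PSK":
            phase = math.pi / 2 if bit else 0.0          # 90-degree phase shift
        for n in range(SAMPLES_PER_BIT):
            t = n / RATE
            out.append(amp * math.sin(2 * math.pi * freq * t + phase))
    return out

bits = [0, 1, 1, 0]
for scheme in ("ASK", "FSK", "PSK"):
    wave = modulate(bits, scheme)
    print(scheme, "first samples:", [round(s, 2) for s in wave[:4]])
```

This is only a sketch: a real modem shapes the transitions between symbols, whereas here each bit simply restarts the carrier.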
Figures 5.20(a), (b), and (c) show the original signals with a 90-degree shift, a 180-degree shift and a 270-degree shift, respectively.
LECTURE 19: SUBBAND CODING: Sub-band coding (SBC) is any form of transform coding that breaks a signal into a number of different frequency bands and encodes each one independently. This decomposition is often the first step in data compression for audio and video signals.
Basic Principles: The utility of SBC is perhaps best illustrated with a specific example. When used for audio compression, SBC exploits what might be considered a deficiency of the human auditory system. Human ears are normally sensitive to a wide range of frequencies, but when a sufficiently loud signal is present at one frequency, the ear will not hear weaker signals at nearby frequencies. We say that the louder signal masks the softer ones. The louder signal is called the masker, and the point at which masking occurs is known, appropriately enough, as the masking threshold. The basic idea of SBC is to enable a data reduction by discarding information about frequencies which are masked. The result necessarily differs from the original signal, but if the discarded information is chosen carefully, the difference will not be noticeable.
A basic SBC scheme: To enable higher quality compression, one may use subband coding. First, a digital filter bank divides the input signal spectrum into some number (e.g., 32) of subbands. The psychoacoustic model looks at the energy in each of these subbands, as well as in the original signal, and computes masking thresholds using psychoacoustic information. Each of the subband samples is quantized and encoded so as to keep the quantization noise below the dynamically computed masking threshold. The final step is to format all these quantized samples into groups of data called frames, to facilitate eventual playback by a decoder. Decoding is much easier than encoding, since no psychoacoustic model is involved: the frames are unpacked, subband samples are decoded, and a frequency-time mapping reconstructs an output audio signal. Over the last five to ten years, SBC systems have been developed by many of the key companies and laboratories in the audio industry. Beginning in the late 1980s, a standardization body called the Moving Picture Experts Group (MPEG) developed generic standards for coding of both audio and video. Subband coding resides at the heart of the popular MP3 format (more properly known as MPEG-1 Audio Layer III), for example.
Fourier methods applied to image processing
Background: The theory introduced for one dimensional signals above carries over to two dimensional signals with minor changes. Our basis functions now depend on two variables (one in the x-direction and one in the y-direction) and also on two frequencies, one in each direction. See Exercises 2.15-2.18 in [1] for more details. The corresponding l2-norm for a two dimensional signal now becomes

\|A\| = \sqrt{\sum_{i}\sum_{j} |a_{ij}|^{2}},    (2)

where a_{ij} are the elements in the matrix A representing the two dimensional signal. It is computed in Matlab using the Frobenius norm, norm(A,'fro').
Displaying the spectrum: When you display the DFT of a two dimensional signal in Matlab, the default setting is that the low frequencies are displayed towards the edges of the plot and the high frequencies in the center. However, in many situations one displays the spectrum with the low frequencies in the center and the high frequencies near the edges of the plot. Which way you choose is up to you.
You can accomplish the latter alternative by using the command fftshift(). There is a very useful trick to enhance the visualization of the spectrum of a two dimensional signal. If you take the logarithm of the gray-scale, this usually gives a better plot of the frequency distribution. In case you want to display the spectrum fA, I recommend typing imshow(log(abs(fA))) for a nice visualization of the frequency distribution. You may also want to use fftshift as described above, but that is more a matter of taste.
Exercises
Exercise 12: The two dimensional Fourier basis. Define the basis function F_{m,n} for an N×N matrix by

F_{m,n}(j,k) = e^{2\pi i (mj + nk)/N}, \qquad j,k = 0,1,\dots,N-1.    (3)

To get a feeling for what these functions look like, plot the real and imaginary parts of F_{1,1}, F_{2,0}, F_{0,2} and F_{4,4}. Do this by first evaluating these functions on a square grid (128 by 128) and then displaying the resulting matrix using the command imshow. Finally, form a sum of two (real) basis functions such that the resulting imshow plot "resembles" a chess board. You may have to re-scale the matrix elements before you display the matrix using imshow; you can accomplish this using the mat2gray() command.
Exercise 13: Thresholding. To compress an image, set elements in the Fourier space of a signal with a magnitude less than some number epsilon to zero. (A matrix with many zeros can be stored very efficiently.) Write a function threshold that takes a matrix and a scalar representing epsilon as input. The function should loop through all elements in the matrix, put all elements with a magnitude less than epsilon to zero, and compute and print (on the screen) the compression ratio (use the definition from the previous lab and assume that every non-zero element is "one piece of information" that needs to be stored). Finally, the function should return the resulting matrix.
Exercise 14: Image compression. Using the function threshold, compress (by performing thresholding in the Fourier domain) the two images pichome.jpg and basket.jpg you worked with in the previous lab. Discuss the compression in terms of visual quality, compression rate and the l2-error. Is the performance different for the two images and, if so, why? How does this compression method compare to the SVD compression from the previous lab?
Exercise 15: Low pass filtering. A "naive" way to low pass filter an image (that is, remove high frequencies of the image) is to Fourier transform the image, set all elements representing high frequencies of the Fourier transformed signal equal to zero, and then take an inverse transform of the signal. Design a low pass filter that removes the highest frequencies in both the x-direction and in the y-direction. Experiment by removing more and more high frequencies and see how this changes the filtered image.
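For reference, here is a NumPy sketch of the thresholding idea in Exercise 13 (the Matlab version would use fft2/ifft2 in the same way); the epsilon value and the toy test matrix are arbitrary, and the compression ratio reported is simply the fraction of coefficients kept:

```python
# A NumPy sketch of Exercise 13: zero out small Fourier coefficients and
# report a simple compression ratio (total elements vs. non-zero kept).
import numpy as np

def threshold(A, epsilon):
    """Zero all Fourier-domain elements of A with magnitude below epsilon."""
    F = np.fft.fft2(A)
    F[np.abs(F) < epsilon] = 0
    kept = np.count_nonzero(F)
    print(f"compression ratio: {A.size / kept:.1f} : 1  ({kept} coefficients kept)")
    return np.real(np.fft.ifft2(F))

# Toy "image": a smooth gradient plus a little noise.
rng = np.random.default_rng(0)
img = np.add.outer(np.arange(64), np.arange(64)).astype(float)
img += rng.normal(scale=2.0, size=img.shape)

approx = threshold(img, epsilon=500.0)
print("l2 error (Frobenius norm, as in (2)):", np.linalg.norm(img - approx))
```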
LECTURE: 20 Audio compression
Audio compression is a form of data compression designed to reduce the size of audio files. Audio compression algorithms are implemented in computer software as audio codecs. Generic data compression algorithms perform poorly with audio data, seldom reducing file sizes much below 87% of the original, and are not designed for use in real time. Consequently, specific audio "lossless" and "lossy" algorithms have been created. Lossy algorithms provide far greater compression ratios and are used in mainstream consumer audio devices. As with image compression, both lossy and lossless compression algorithms are used in audio compression, lossy being the most common for everyday use. In both lossy and lossless compression, information redundancy is reduced, using methods such as coding, pattern recognition and linear prediction to reduce the amount of information used to describe the data. The trade-off of slightly reduced audio quality is clearly outweighed for most practical audio applications, where users cannot perceive any difference and space requirements are substantially reduced. For example, on one CD, one can fit an hour of high fidelity music, less than 2 hours of music compressed losslessly, or 7 hours of music compressed in MP3 format at medium bit rates.
Lossless audio compression: Lossless audio compression allows one to preserve an exact copy of one's audio files, in contrast to the irreversible changes from lossy compression techniques such as Vorbis and MP3. Compression ratios are similar to those for generic lossless data compression (around 50–60% of original size), and substantially less than for lossy compression (which typically yields 5–20% of original size).
Uses: The primary uses of lossless encoding are:
o Archives - for archival purposes, one naturally wishes to maximize quality.
o Editing - editing lossily compressed data leads to digital generation loss, since the decoding and re-encoding introduce artifacts at each generation; thus audio engineers use lossless compression.
o Audio quality - being lossless, these formats completely avoid compression artifacts, so audiophiles favor lossless compression.
A specific application is to store lossless copies of audio, and then produce lossily compressed versions for a digital audio player. As formats and encoders improve, one can produce updated lossily compressed files from the lossless master. As file storage and communications bandwidth have become less expensive and more available, lossless audio compression has become more popular.
Lossy audio compression: Lossy audio compression is used in an extremely wide range of applications. In addition to the direct applications (MP3 players or computers), digitally compressed audio streams are used in most video DVDs, digital television, streaming media on the internet, satellite and cable radio, and increasingly in terrestrial radio broadcasts. Lossy compression typically achieves far greater compression than lossless compression (data of 5 percent to 20 percent of the original stream, rather than 50 percent to 60 percent) by discarding less-critical data. The innovation of lossy audio compression was to use psychoacoustics to recognize that not all data in an audio stream can be perceived by the human auditory system. Most lossy compression reduces perceptual redundancy by first identifying sounds which are considered perceptually irrelevant, that is, sounds that are very hard to hear. Typical examples include high frequencies, or sounds that occur at the same time as louder sounds. Those sounds are coded with decreased accuracy or not coded at all.
Musical Instrument Digital Interface
Interfaces: (Figure: MIDI connector diagram.) The physical MIDI interface uses DIN 5/180° connectors. Opto-isolating connections are used to prevent ground loops occurring among connected MIDI devices. Logically, MIDI is based on a ring network topology, with a transceiver inside each device. The transceivers physically and logically separate the input and output lines, meaning that MIDI messages received by a device in the network that are not intended for that device will be re-transmitted on the output line (MIDI-OUT).
This introduces a delay, one that is long enough to become audible on larger MIDI rings. MIDI-THRU ports began to be added to MIDI-compatible equipment soon after the introduction of MIDI in order to improve performance. The MIDI-THRU port avoids the retransmission delay described above by linking the MIDI-THRU port to the MIDI-IN socket almost directly. The difference between the MIDI-OUT and MIDI-THRU ports is that data coming from the MIDI-OUT port has been generated on the device containing that port, whereas data that comes out of a device's MIDI-THRU port is an exact duplicate of the data received at the MIDI-IN port. Such chaining of instruments via MIDI-THRU ports becomes unnecessary with MIDI "patch bay," "mult" or "Thru" modules, which consist of one MIDI-IN connector and multiple MIDI-OUT connectors to which multiple instruments are connected. Some equipment can merge MIDI messages into one stream, but this is a specialized function and is not universal to all equipment. All MIDI-compatible instruments have a built-in MIDI interface. Some computers' sound cards have a built-in MIDI interface, whereas others require an external MIDI interface connected to the computer via the game port, the newer DA-15 connector, a USB connector, FireWire or Ethernet. MIDI connectors are defined by the MIDI interface standard.

LECTURE 22: Speech coding
Speech coding is the application of data compression to digital audio signals containing speech. Speech coding uses speech-specific parameter estimation based on audio signal processing techniques to model the speech signal, combined with generic data compression algorithms to represent the resulting modeled parameters in a compact bitstream. The two most important applications of speech coding are mobile telephony and Voice over IP. The techniques used in speech coding are similar to those in audio data compression and audio coding, where knowledge of psychoacoustics is used to transmit only data that is relevant to the human auditory system. For example, in narrowband speech coding only information in the frequency band 400 Hz to 3500 Hz is transmitted, but the reconstructed signal is still adequate for intelligibility.

Speech coding differs from other forms of audio coding in that speech is a much simpler signal than most other audio signals, and much more statistical information is available about the properties of speech. As a result, some auditory information that is relevant in audio coding can be unnecessary in the speech coding context. In speech coding, the most important criterion is preservation of intelligibility and "pleasantness" of speech, with a constrained amount of transmitted data. It should be emphasised that the intelligibility of speech includes, besides the actual literal content, speaker identity, emotions, intonation, timbre and other qualities that are all important for perfect intelligibility. The more abstract concept of pleasantness of degraded speech is a different property from intelligibility, since degraded speech may be completely intelligible yet subjectively annoying to the listener. In addition, most speech applications require low coding delay, as long coding delays interfere with speech interaction.

SPEECH RECOGNITION AND GENERATION TECHNIQUES
Speech is a complex audio signal, made up of a large number of component sound waves.
Speech can easily be captured in wave form, transmitted and reproduced by common equipment; this is how the telephone has worked for a century. However, once we move up the complexity scale and try to make a computer understand the message encoded in speech, the actual wave form is unreliable. Vastly different sounds can produce similar wave forms, while a subtle change in inflection can transform a phoneme's wave form into something completely alien. In fact, much of the speech signal is of no value to the recognition process. Worse still, any reasonably accurate mathematical representation of the entire signal would be far too large to manipulate in real time. Therefore, a manageable number of discriminating features must somehow be extracted from the wave before recognition can take place. A common scheme involves "cepstral coefficients" (cepstral is a mangled form of spectral): the recognizer collects 8,000 speech samples per second and extracts a "feature vector" of at most a few dozen numbers from each one, through a mathematical analysis process that is far beyond the scope of this article. From now on, when I mention "wave form" or "signal", I am actually talking about these collections of feature vectors.

Acoustic Pattern Recognition
A speech recognizer is equipped with two crucial data structures:
A database of typical wave forms for all of the phonemes (i.e., basic component sounds) of a language. Since the pronunciation of many phonemes varies with context, the database usually contains a number of different wave forms, or "allophones", for each phoneme. Databases containing 200, 800 or even 10,000 allophones of the English language can be purchased on the open market.
A lexicon containing transcriptions of all the words known to the system into a phonetic language. There must be a "letter" in this phonetic alphabet for each allophone in the acoustic database. A good lexicon will contain several transcriptions for most words; the word "the", for example, can be pronounced "duh" or "dee", so it should have at least these two entries in the lexicon.

Limitations of Speech Recognition
For all the effort that dozens of PhDs have been putting into their work for years, speech recognition is nowhere near Star Trek yet. Among the unresolved issues:
Plain old speech recognizers are dumb. Even those smart enough to recognize complete sentences and equipped with language models will only spit out collections of words; it is up to someone or something else to make sense of them. (This is why most, if not all, speech input systems also include a natural language understanding unit; I will describe this component in detail next month.)
Speech recognition works best in quiet, controlled environments. Trying to make it work at Quake III noise levels is not very effective.
The larger the vocabulary, the easier it is to confuse a recognizer. If the vocabulary contains true homonyms, the system is in trouble.
Speech recognition is a processor hog; it can easily eat up the equivalent of a 300 MHz Pentium II, leaving chump change for the rest of the application.
It is a lot easier to differentiate between long words; unfortunately, most common words are short.

Speech Generation
Finally, a few words about automated speech generation. While the commercial tools tend to be very easy to use (i.e., one function call, passing a string of text and receiving a WAV file in return), speech quality is questionable at best.
For games, this is rarely acceptable; unless you want a robot-like voice, you should have an actor record the computer characters' most common responses and use word-stitching for the rest.

AUDIO SYNTHESIS
Audio synthesis environments comprise a wide and varying range of software and hardware configurations. Even different versions of the same environment can differ dramatically. Because of this broad variability, certain aspects of different systems cannot be directly compared. Moreover, some levels of comparison are either very difficult to quantify objectively or depend purely on personal preference. Commonly considered subjective attributes for comparison include:
Usability (how difficult it is for beginners to generate some kind of meaningful output)
Ease of use (how steep the learning curve is for average and advancing users)
Sound "quality" (which environment produces the most subjectively appealing sound)
Creative flow (in what ways the environment affects the creative process, e.g. guiding the user in certain directions)

LECTURE 23: Image Compression
Image here refers not only to still images but also to motion pictures, and compression is the process used to reduce the physical size of a block of information. Compression is simply representing information more efficiently; "squeezing the air" out of the data, so to speak. It takes advantage of three common qualities of graphical data: it is often redundant, predictable or unnecessary. Today, compression has made a great impact on the storage of large volumes of image data. Even hardware and software for compression and decompression are increasingly being made part of a computer platform. For example, Kay and Levine (1995, pg. 22) state that "… the System 7 operating system of Macintosh computers, now includes a compression engine offering several types of compression and decompression."

Compression does have its trade-offs. The more efficient the compression technique, the more complicated the algorithm will be, and thus the more computational resources or time it requires to decompress. This tends to affect speed. Speed is not so important for still images but matters a great deal for motion pictures: surely you do not want to see your favourite movie appearing frame by frame in front of you. In this section I will introduce some techniques for compression such as RLE (Run-Length Encoding), Huffman and Arithmetic coding. You will then see that some of these compression techniques play an important role later in JPEG.

Run-Length Encoding (RLE)
RLE is the simplest form of compression technique and is widely supported by most bitmap file formats such as TIFF, BMP, and PCX. RLE performs compression regardless of the type of information stored, but the content of the information does affect its efficiency in compressing the information.

Algorithm
RLE works by reducing the physical size of a repeating string of characters. The reduced string is then indicated by a flag. The flag can be a special value that never appears in the data stream, or it can be an ordinary byte value that may also appear in the data stream. If the latter is used, an approach known as byte-stuffing has to be used so that the decoder can differentiate between the flag and a character belonging to the data stream. Let's illustrate how both RLE and byte-stuffing work.
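Before walking through the worked example, here is a minimal Python sketch of the flag-based RLE scheme with byte-stuffing just described. The run threshold n = 3, the '!' flag and the single count value follow the description above; the function name, the use of a printable digit for the count, and the output layout (flag, count, character) are assumptions made only for illustration.

    def rle_encode(data, flag='!', n=3):
        # Runs longer than n characters become flag + count + character.
        # A literal flag character in the data is doubled (byte-stuffing)
        # so the decoder can tell it apart from the compression marker.
        out = []
        i = 0
        while i < len(data):
            ch = data[i]
            run = 1
            while i + run < len(data) and data[i + run] == ch:
                run += 1
            if run > n:
                out.append(flag + str(run) + ch)          # e.g. CCCCC -> !5C
            else:
                literal = ch * run
                out.append(literal.replace(flag, flag + flag))  # stuff literal flags
            i += run
        return ''.join(out)

    print(rle_encode("ABBBCCCCCDEFGGGGHI"))   # -> ABBB!5CDEF!4GHI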
This illustration is based on Steinmetz and Nahrstedt. Suppose we have the following uncompressed data: ABBBCCCCCDEFGGGGHI, and the encoder is set so that if a character appears consecutively more than three times (n = 3), the repeated run is compressed. (Note: in some encoders this value of n may be different.) Let's take the flag to be '!'. Depending on the encoder's algorithm, one or more bytes may be used to indicate the number of repeating characters; in this case we choose one byte. Now, what happens if the exclamation mark '!' itself appears in the data stream: will the decoder interpret it as a flag or as a character of the data? This is where byte-stuffing comes in. If the encoder encounters a character that matches the flag, it stuffs another copy of that character into the data stream. Therefore, when the decoder reads two exclamation marks, it knows that one of them belongs to the data stream and automatically removes the other.

Variants on Run-Length Encoding
Murray and VanRyper (1994, Encyclopedia of Graphics File Formats) describe several variants of RLE. The most common one encodes an image sequentially along the X-axis (starting from the top left corner) and slowly propagates down the Y-axis (ending at the bottom left corner), as shown in Figure 2.1 a). Encoding can also be done along the Y-axis (starting from the top left corner), slowly propagating across the X-axis (ending at the bottom right corner), as shown in Figure 2.1 b). Other forms include encoding in two-dimensional tiles, where the size of each tile can be selected in terms of the number of pixels, and, lastly, encoding in a zig-zag manner across the image. The last two encoding algorithms can be expected to be used only in highly specialised applications.

Advantages and Disadvantages of RLE
Advantages: the algorithm is simple to implement, fast to execute, and produces lossless compression of images.
Disadvantage: the compression ratio is not as high as that of more advanced compression algorithms.

LECTURE 31: APPLICATIONS OF MULTIMEDIA
Multimedia presentations may be viewed in person on stage, projected, transmitted, or played locally with a media player. A broadcast may be a live or recorded multimedia presentation. Broadcasts and recordings can use either analog or digital electronic media technology. Digital online multimedia may be downloaded or streamed; streaming multimedia may be live or on-demand. Multimedia games and simulations may be used in a physical environment with special effects, with multiple users in an online network, or locally with an offline computer, game system, or simulator. The various formats of technological or digital multimedia may be intended to enhance the users' experience, for example to make it easier and faster to convey information, or, in entertainment and art, to transcend everyday experience. A laser show is a live multimedia performance. Enhanced levels of interactivity are made possible by combining multiple forms of media content. Online multimedia is increasingly becoming object-oriented and data-driven, enabling applications with collaborative end-user innovation and personalization of multiple forms of content over time.
Examples of these range from multiple forms of content on web sites, such as photo galleries with user-updated images (pictures) and titles (text), to simulations whose coefficients, events, illustrations, animations or videos are modifiable, allowing the multimedia "experience" to be altered without reprogramming. In addition to seeing and hearing, haptic technology enables virtual objects to be felt. Emerging technology involving illusions of taste and smell may also enhance the multimedia experience.

History of the term
In 1965 the term multi-media was used to describe the Exploding Plastic Inevitable, a performance that combined live rock music, cinema, experimental lighting and performance art. In the intervening forty years the word has taken on different meanings. In the late 1970s the term was used to describe presentations consisting of multi-projector slide shows timed to an audio track. In the 1990s it took on its current meaning: in common usage, multimedia refers to an electronically delivered combination of media, including video, still images, audio and text, presented in such a way that it can be accessed interactively.[1] Much of the content on the web today falls within this definition as understood by millions. Some computers marketed in the 1990s were called "multimedia" computers because they incorporated a CD-ROM drive, which allowed for the delivery of several hundred megabytes of video, picture, and audio data.

Use of Multimedia
In Business and Industry
Multimedia is helpful for presentations, marketing, product demonstrations, instant messaging, employee training, advertising and selling products all over the world via virtually unlimited web-based technologies. For example, pilots get training and practice on multimedia virtual systems before going on an actual flight, and salespeople can learn about a product line through demonstrations. Multimedia is also used to help present information to shareholders, superiors and coworkers. (Figure: a multimedia presentation created with PowerPoint.) Corporate presentations may combine all of these forms of media content.

Virtual Reality
Virtual reality uses multimedia content. Applications and delivery platforms of multimedia are virtually limitless. Multimedia finds its application in various areas including, but not limited to, advertisements, art, education, entertainment, engineering, medicine, mathematics, business, scientific research and spatial-temporal applications. Several examples follow.

Creative industries
Creative industries use multimedia for a variety of purposes ranging from fine arts, to entertainment, to commercial art, to journalism, to media and software services provided for any of the industries listed below. An individual multimedia designer may cover the spectrum throughout their career, and requests for their skills range from technical, to analytical, to creative.

Commercial
Much of the electronic old and new media utilized by commercial artists is multimedia. Exciting presentations are used to grab and keep attention in advertising. Industrial, business-to-business, and interoffice communications are often developed by creative services firms as advanced multimedia presentations that go beyond simple slide shows to sell ideas or liven up training. Commercial multimedia developers may also be hired to design applications for governmental and nonprofit services.
Entertainment and fine arts
Multimedia is heavily used in the entertainment industry, especially to develop special effects in movies and animations. Multimedia games, available either on CD-ROM or online, are a popular pastime, and some video games also use multimedia features. Multimedia applications that allow users to actively participate instead of just sitting by as passive recipients of information are called interactive multimedia. In the arts there are multimedia artists who blend techniques using different media and in some way incorporate interaction with the viewer. One of the most relevant is Peter Greenaway, who is melding cinema with opera and all sorts of digital media. Another approach entails the creation of multimedia that can be displayed in a traditional fine-arts arena, such as an art gallery. Although multimedia display material may be volatile, the survivability of the content is as strong as with any traditional medium, and digital recording material may be just as durable and infinitely reproducible with perfect copies every time.

Education
In education, multimedia is used to produce computer-based training courses (popularly called CBTs) and reference works such as encyclopedias and almanacs. A CBT lets the user go through a series of presentations, text about a particular topic, and associated illustrations in various information formats. Edutainment is an informal term used to describe the combination of education with entertainment, especially multimedia entertainment. Learning theory has expanded dramatically in the past decade because of the introduction of multimedia; several lines of research have evolved (e.g. cognitive load and multimedia learning), and the possibilities for learning and instruction are nearly endless.

Engineering
Software engineers may use multimedia in computer simulations for anything from entertainment to training, such as military or industrial training. Multimedia for software interfaces is often created as a collaboration between creative professionals and software engineers.

Mathematical and Scientific Research
In mathematical and scientific research, multimedia is mainly used for modelling and simulation. For example, a scientist can look at a molecular model of a particular substance and manipulate it to arrive at a new substance. Representative research can be found in journals such as the Journal of Multimedia.

Medicine
In medicine, doctors can get trained by watching a virtual surgery, or they can simulate how the human body is affected by diseases spread by viruses and bacteria and then develop techniques to prevent them.

Multimedia at home
Multimedia can be used at home through television, the internet or CD media. Home shopping, gardening, cooking and home design are areas where multimedia can be a useful, or better, way to get information.

LECTURE 32: Virtual Reality Information
Introduction
We often receive requests from students, hobbyists, and interested parties for more information about Virtual Reality (VR). The following is an attempt to provide an introduction to the topic and to supply some links where further information may be found.

What is Virtual Reality (VR)?
Virtual Reality is generally a computer-generated (CG) environment that makes the user think that he or she is in the real environment.
One may also experience a virtual reality by simply imagining it, like Alice in Wonderland, but we will focus on computer-generated virtual realities for this discussion. The virtual world is hosted on a computer in the form of a database (e.g. a terrain database or environment database), which resides in the memory of the computer. The database generally consists of points in space (vertices) as well as textures (images). Vertices may be connected to form planes, commonly referred to as polygons, and each polygon consists of at least three vertices. A polygon could have a specific color, the color could be shaded, or the polygon could have a texture pasted onto it. Virtual objects consist of polygons. A virtual object has a position (x, y, z), an orientation (yaw, pitch, roll) and attributes (e.g. gravity or elasticity).

The virtual world is rendered with a computer. Rendering is the process of calculating the scene that must be displayed (on a flat plane) for a virtual camera view, from a specific point, at a specific orientation and with a specific field of view (FOV). In the past the central processing unit (CPU) of the computer was mainly used for rendering (so-called software rendering). Today, graphics processing units (GPUs) render the virtual world to a display screen (so-called hardware rendering). GPUs are normally situated on graphics accelerator cards, but may also be situated directly on the motherboard of the computer. Hardware rendering is generally much faster than software rendering.

The virtual environment (also sometimes referred to as a synthetic environment) may be experienced with a Desktop VR system or with an Immersive VR system. With Desktop VR, a computer screen is normally used as the display medium; the user views the virtual environment on the computer screen and must look at the screen the whole time to experience it. With Immersive VR, the user is 'immersed in' or 'surrounded by' the virtual environment. This may be achieved by using either a multi-display system or a head-mounted display (HMD). Immersive VR systems provide the user with a wider field of view than Desktop VR systems.

With multi-display systems, the field of view (FOV) of the user is extended by using several computer monitors or projectors. When projectors are used, the image may be front-projected or back-projected onto the viewing screen. Many simulators utilize three screens (forward view, left view, right view) to provide an extended FOV. The configuration in which the user is surrounded by projection screens is sometimes referred to as a cave environment. The image may also be projected onto a dome that may vary in shape and size. With a multi-display system the user may look around as if in the real world.

A head-mounted display (HMD) consists of two miniature displays that are mounted in front of the user's eyes with a headmount. Special optics enable the user to view the miniature screens. The HMD also contains two headphones, so that the user may experience the virtual environment aurally as well. The HMD is normally fitted with a head tracker, which tracks the position (x, y, z) and orientation (yaw, pitch, roll) of the user's head. As the user looks around, the position and orientation information is continuously relayed to the host computer.
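As a rough illustration of what the host computer does with this tracked pose, the sketch below builds a world-to-camera (view) matrix from a position (x, y, z) and an orientation (yaw, pitch, roll). The rotation order (yaw, then pitch, then roll), the axis conventions and the function names are assumptions for illustration; actual HMD systems define their own conventions.

    import numpy as np

    def rotation_from_yaw_pitch_roll(yaw, pitch, roll):
        # Rotation for yaw (about the vertical y-axis), then pitch (x), then
        # roll (z); angles in radians.
        cy, sy = np.cos(yaw), np.sin(yaw)
        cp, sp = np.cos(pitch), np.sin(pitch)
        cr, sr = np.cos(roll), np.sin(roll)
        Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
        Rx = np.array([[1, 0, 0], [0, cp, -sp], [0, sp, cp]])
        Rz = np.array([[cr, -sr, 0], [sr, cr, 0], [0, 0, 1]])
        return Ry @ Rx @ Rz

    def view_matrix(position, yaw, pitch, roll):
        # 4x4 world-to-camera matrix: the inverse of the head's pose.
        R = rotation_from_yaw_pitch_roll(yaw, pitch, roll)
        t = np.asarray(position, dtype=float)
        V = np.eye(4)
        V[:3, :3] = R.T            # inverse rotation
        V[:3, 3] = -R.T @ t        # inverse translation
        return V

    # Pose reported by a 6 DOF tracker: head 1.2 m up, turned 30 degrees left.
    V = view_matrix((0.0, 1.2, 0.0), yaw=np.radians(30), pitch=0.0, roll=0.0)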
The computer calculates the appropriate view (virtual camera view) that the user should see in the virtual environment, and this is displayed on the miniature displays. For example, let's assume that the virtual environment is the inside of a car, and that the user is sitting behind the steering wheel. If the user looks forward, the head tracker measures this orientation and relays it to the computer. The computer then calculates the forward view, and the user sees the windscreen, wipers and bonnet of the car (the user will obviously also see the outside world, or out-of-window (OOW) view). If the user looks down, the computer presents a view of the steering wheel; if the user looks further down, the accelerator pedal, clutch (if present) and brake pedal are shown. The orientation information may also be used to experience stereo and 3-D sound. If the user looks straight ahead, he or she will hear the engine noise of the car, with equal volume and phase in the left and right ears. If the user looks to the left, the volume of the engine noise will be higher in the right ear and lower in the left ear. Trackers that track only the orientation (yaw, pitch, roll) are referred to as 3 degree-of-freedom, or 3 DOF, trackers, while trackers that also track the position (x, y, z) are referred to as 6 DOF trackers.

Applications of Virtual Reality (VR)
Virtual Reality is an ideal training and visualization medium. VR is ideal for the training of operators who perform tasks in dangerous or hazardous environments. The trainee may practice the procedure in virtual reality first, before graduating to reality-based training, and may be exposed to life-threatening scenarios in a safe and controlled environment. Examples of dangerous or hazardous environments may be found in fields such as aviation, automotive, chemical, defense, high voltage, industrial, marine, medical, mining and nuclear energy. VR is also an ideal tool for training operators in many other tasks.

This paper describes the development and impact of new visually coupled system (VCS) equipment designed to support engineering and human-factors research in the military aircraft cockpit environment. VCS represents an advanced man-machine interface (MMI). Its potential to improve aircrew situational awareness seems enormous, but its superiority over the conventional cockpit MMI has not been established in a conclusive and rigorous fashion. What has been missing is a "systems" approach to technology advancement that is comprehensive enough to produce conclusive results concerning the operational viability of the VCS concept and to verify any risk factors that might be involved with its general use in the cockpit. The advanced VCS configuration described here has been ruggedized for use in military aircraft environments and was dubbed the Virtual Panoramic Display (VPD). It was designed to answer the VCS portion of the "systems" problem, and is implemented as a modular system whose performance can be tailored to specific application requirements. The overall system concept and the design of the two most important electronic subsystems that support the helmet-mounted components -- a new militarized version of the magnetic helmet-mounted sight and correspondingly similar helmet display electronics -- are discussed in detail.
Significant emphasis is given to illustrating how particular design features in the hardware improve overall system performance and support research activities.

A study was conducted to compare and validate two visually coupled system (VCS) installations, one in a moving-base flight simulator and a second in a Bell 205 research helicopter. Standard low-level maneuvering tasks were used to examine changes in handling qualities. Pilots also assessed two levels of control augmentation: rate damped and translational rate command. The system handling qualities degraded whenever the VCS was used. Pilots reported that there were system deficiencies which increased their workload and prevented them from achieving desired task performance. The decline in handling qualities was attributed principally to the reduction in image quality while flying the helicopter with the VCS. The primary factors affecting performance included a reduction in image resolution, a reduction in the field of view, system latencies, and the limitations of the simulator mathematical model. Control augmentation improved the system handling qualities in the simulator and should be investigated further as an effective strategy for overcoming VCS limitations.

The Visually-Coupled System Computer-Generated Imagery (VCS-CGI) Interface program had two main phases. The objective of the first phase was to successfully interface the various subcomponents (helmet-mounted sight, helmet-mounted displays, and advanced computer-generated image system) and demonstrate their compatibility and feasibility for use in wide field-of-view, air-to-ground visual simulation. The objective of the second phase was to conduct a systematic exploration and evaluation of various system parameters that could affect display quality.

LECTURE 33: Visually Coupled Systems
The capability to "look and shoot" was but a fantasy in the days of the Flash Gordon and Buck Rogers comic strips. Soon today's Air Force pilot will be able to aim his weapons by a mere glance and fire along his line of sight with the simple push of a button. Systematic research and development of visual-coupling concepts, to improve man's relationship with his machine, is helping to bring a "look and shoot" capability closer to operational reality. Recent combat experience has shown that many tactical, reconnaissance/strike, and air-superiority systems are operator-limited, both by the task loading placed on the crew and by the design of the interface between the operator and his machine. As long as tactical weapon systems are used in a high-threat environment, the flight profiles necessary for survivability will dictate that the operator perform all essential tasks effectively, accurately, and, most important, expeditiously. A well-designed interface lets him use his natural perceptual and motor abilities optimally. Such limiting factors are especially critical in weapon delivery missions where visual target acquisition and weapon aiming are task requirements.

Since 1965, in an attempt to improve aircraft man-machine design, human-factors engineers of the Aerospace Medical Research Laboratory (AMRL) at Wright-Patterson AFB, Ohio (a unit of Aerospace Medical Division), have been pioneering techniques to "visually couple" the operator to his weapon system. A visually coupled system is, more precisely, a special subsystem that integrates the natural visual and motor skills of an operator with the machine he is controlling.
An operator visually searches for, finds, and tracks an object of interest. His line of sight is measured and used to aim sensors and/or weapons toward the object. Information related to his visual/motor task from sensors, weapons, or central data sources is fed back directly to his vision by special displays so as to enhance his task performance. In other words, he looks at the target, and the sensors/weapons automatically point at the target. Simultaneously, via the display, he verifies where the sensors/weapons are looking, visually fine-tunes their aim, and shoots at what he sees. Two functions are performed: a line-of-sight sensing/control function and a display feedback function. Although each may be used separately, a fully visually coupled system includes both. Thus, it is a unique control/display subsystem in which man's line of sight is measured and used for control, and visual information is fed back directly to his eyes for his attention and use.

Currently a helmet-mounted sight is used to measure head position and line of sight. An early version of a helmet sight was used in an in-flight evaluation at Tyndall AFB in 1969, and various experimental sights have undergone flight tests. The U.S. Navy has produced a similar sight for operational use in F-4J and F-4B aircraft. A helmet-mounted display is used to feed information back to the eye. An early, bulky experimental display completely occluded outside vision to the right eye; later versions permit a see-through capability, which allows simultaneous viewing of the display and the outside-world scene. Many experimental display improvements are under study, but display flight-test experience is still limited. Research and development efforts are under way to reduce the size, weight, and profile and to increase the performance of future visual coupling devices. Before looking at development progress toward operational reality, let's explain in general terms how such sights, displays, and visually coupled systems are now mechanized and discuss their potential capabilities.

Helmet sight components and capabilities
In the mid-sixties Honeywell selected, as one way to mechanize line-of-sight determination, an electrooptical technique for determining helmet position and the line of sight through a reticle (Figure 1). Rotating parallel fanlike planes of infrared energy from the sight surveying units (mounted on the canopy rails) scan two photo diodes on the side of the helmet. Timing signals from the scanners and diodes are processed by a digital computer (the sight electronics unit) to determine the line of sight. Such line-of-sight information can be used to point a variety of other subsystems. A helmet-mounted sight facilitates wide off-boresight sensor or weapon aiming and speeds target acquisition. It permits continuous visual attention to the target outside the cockpit while sensors/weapons are slewed, and the hands are free from slewing control. The sight capitalizes on the ease and accuracy of the operator's natural eye/head tracking abilities, and his natural outside-the-cockpit spatial orientation is used throughout target acquisition.

Helmet display components and capabilities
In an experimental helmet-mounted display, video and symbolic signals are received from various alternative aircraft subsystems. Cathode-ray tube (CRT) imagery is projected directly to the eye of the operator in such a way that it appears to be focused on the real-world background. A collimation lens performs the focus at infinity.
The combiner reflects the imagery into the eye much as a dental mirror does; however, it permits the eye to see through to the real-world scene simultaneously. Thus it essentially combines the display and real-world scenes for the eye. A small helmet display could substitute effectively for a large conventional panel-mounted display and would give the operator the benefits of a larger display with a high-quality image, while the designer benefits from an overall subsystem weight and space savings. These advantages accrue from the simple fact that in target detection it is the image size upon the retina of the eye that counts. A one-inch-diameter CRT display presented as a virtual image and placed one inch in front of the eye results in approximately the same image size on the retina as a 21-inch CRT mounted on a panel 21 inches away from the eye. Miniature CRT technology can now provide sufficient resolution to make a high-quality helmet-mounted image display practical. Even though most aircraft panels cannot accommodate large CRTs, it is important that the displayed imagery be large enough to be detected and identified by the eye. In other words, the image-size detection capabilities of the sensor, the display, and the eye should be made as compatible as possible. Helmet-mounted displays offer designers a new way to achieve this compatibility. They offer the operator continuous head-up, captive-attention viewing. When the display is used alone, selected sensor imagery, flight control/display symbols, or other subsystem status information can be presented directly to the eye no matter where the operator is looking. However, comprehensive analyses and ground and in-flight evaluations of the operator's capability to use the information must be carried out if operator effectiveness is to be realized.

Visually coupled systems components and capabilities
The helmet-mounted sight and display are combined as a system integrated with the operator's vision. Mechanization of the full system involves integration of the sight and display components into a lightweight helmet, development of a visor that automatically varies light transmission to ensure appropriate display brightness contrast, and improvements in the electronic and optic components. When they are combined and matched with seekers, sensors, central data computers, and/or flight control subsystems, entirely new control/display capabilities can be provided to the user: a hemispheric head-up display that is compatible with the operator's outside-the-cockpit spatial orientation; sensor extensions of the operator's vision (e.g., it is possible to position sensors so that the operator "looks" through aircraft structures); visual control of the aircraft and weapons; and visual communications between crew members and between aircraft. Potential visual coupling applications with aircraft and remotely piloted vehicle fire control, flight control, reconnaissance, navigation, weapon delivery, and communications subsystems are many. In a night attack mission, for example, a low-light-level television scene can be displayed, superimposed on the real world, off-boresight, and through aircraft structure. Flight control and weapons data are provided on the display in addition to the ground scene.
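The retinal image-size claim above can be checked with a small visual-angle calculation: a display of width s viewed from distance d subtends an angle of roughly 2*arctan(s/(2d)), so a 1-inch display at 1 inch and a 21-inch display at 21 inches both subtend about 53 degrees. The short numerical check below is illustrative only and is not from the original article.

    import math

    def visual_angle_deg(size, distance):
        # Angle subtended at the eye by a display of the given width and distance.
        return math.degrees(2 * math.atan(size / (2 * distance)))

    print(visual_angle_deg(1, 1))    # ~53.1 degrees: 1-inch CRT at 1 inch
    print(visual_angle_deg(21, 21))  # ~53.1 degrees: 21-inch CRT at 21 inches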
Visually coupled systems can also be used to input line-of-sight angle information into central computers in order to update navigation; to identify a target location for restrike, reconnaissance, or damage assessment; and to communicate coordinate locations in real time with other aircraft or with command and control systems. By means of intra-cockpit visual communication, one operator can cue another operator on where to look for targets of interest. Similar nonverbal communication between forward air control and attack aircraft is conceivable.

LECTURE 34: VISUALLY COUPLED SYSTEM DEVELOPMENT
Visually coupled system development is merely highlighted here to give a glimpse of progress. No attempt is made to be comprehensive, but rather to give a feel for some of the choices and changes that led to the current objectives of the Air Force engineering development efforts. Until 1971 these efforts were mainly exploratory. Since March 1971, the Aerospace Medical Research Laboratory has pursued exploratory development of advanced concepts and engineering development of visual coupling devices. Progress to date indicates that these devices will soon be ready for Air Force operational use.

Helmet-mounted sight development
Historically, it is not possible to trace the basic line-of-sight and display-feedback concepts to specific originators. Some credit for the sighting concept should go to behavioral scientists who, in the late forties and early fifties, were engrossed in systematic analyses of pilot eye movements to determine instrument scan and visual search patterns. Initial applied sighting efforts in government and industry concerned the accuracy of head and eye tracking in the laboratory. It was apparent that accuracy and effectiveness were functions of the head and/or eye position sensing techniques, so applications had to await practical sensing technologies. Head position tracking has received the most applied emphasis, while eye position tracking continues to be explored. It was also evident that the proof of any sighting technique would be in its accuracy and acceptability in flight. The Army, Navy, Air Force, and industry have pursued complementary developments of helmet-mounted sights. Two especially noteworthy early approaches to line-of-sight sensing were developed to the brassboard stage for testing in the 1965 through 1967 time period: a mechanical head-position sensing system by Sperry, Utah Division, and an electrooptical helmet-position sensing system by Minneapolis Honeywell.

Sperry's sight is a mechanically linked system in which helmet position is determined in a manner similar to the working of drafting-board arms. A reticle in front of the operator's eye fixes the line of sight in reference to the helmet and its mechanical linkage, and a magnetic attachment provides a quick-disconnect capability. Under Army contracts, Sperry's mechanical head-position tracker was evaluated in UH-1 and AH-1G helicopters, starting in 1967. Subsequent testing has led to a production contract to use the mechanical helmet-mounted sight as a backup target acquisition aid on certain Cobra helicopters. The Air Force pursued the mechanical sight approach in early AC-130 gunship aircraft. Honeywell's electrooptical approach, described earlier, determines helmet position from scanning beams detected on the helmet and cockpit-mounted photo diode detection surfaces. Another Honeywell technique employs ultrasonic sound ranging and sensing. Projected advanced sight improvements include enlarging the head-motion envelope (motion box) within which helmet position can be determined from 1 cubic foot to 2 cubic feet.
Also, sighting accuracy is to be improved, as is the effective coverage of line-of-sight azimuth and elevation angles. Improvements will further reduce helmet weight to 51 ounces and will reduce cost per unit while increasing reliability. Helmet-mounted sight technology is being integrated with the helmet display developments (described below) to form the Air Force's fully visually coupled system.

THE FUTURE OF VISUALLY COUPLED SYSTEMS
Visually coupled systems are under Army consideration for night attack applications in the Cobra helicopter and as a technology for future advanced attack helicopters. The Navy is considering them for its AGILE and F-14 programs. The Air Force is investigating potential applications for air-to-air and air-to-ground target acquisition and weapon delivery, off-boresight weapon guidance, reconnaissance, navigation, and remotely piloted vehicle control. Currently, the Air Force is conducting or planning flight tests in conjunction with the AGM-65 Maverick, the C-130 gunship, laser designation systems, and weapons guidance developments. The immediate future will be determined by progress made in current development and test projects. Advanced technology will be transitioned into engineering development prototypes for testing. Interim helmet-mounted sights will be ready for Air Force application testing late in 1974, and advanced versions of sights, displays, and combined subsystems will be ready in 1977.

The Aerospace Medical Research Laboratory is already looking beyond the technology discussed here to future visual coupling concepts that can further improve operator performance. One such line-of-sight sensing concept will determine eye position rather than head position; a display patterned after the human eye is also under study. These concepts are briefly discussed below.

The Aerospace Medical Research Laboratory is attempting to determine line of sight accurately from a small eye-movement sensor that could be located in the cockpit instrument panel. This remote oculometer determines line of sight from the reflection angle of a beam of infrared "light" projected onto the cornea of the eye. Currently, in the laboratory, the eye can be tracked within a one-cubic-foot motion box with an accuracy of one degree. Should this technique prove practical, line-of-sight sensing and control could be possible without any encumbrance on the head.

A promising visually coupled system display technique employs dual-resolution fields of view: high resolution with a zoom capability in the center and low resolution in the periphery. This concept, patterned after human vision, offers considerable potential in target search and identification as a means of coupling the operator with high-magnification sensors and high-power optics. Associated sensor slewing control techniques enable the operator to feel he has been moved closer to the target while using his natural visual search capabilities. Several display-related technologies can also be incorporated into visual coupling devices; for example, predictor displays can be readily exploited, and color display for improved infrared sensor target detection is also a possibility.

In summary, the "look and shoot" capability is around the corner. Systematic R&D pursuit of visual coupling technology is opening many possible applications. Development of components has progressed sufficiently that operational applications are feasible.
Although helmet sighting technology is further along than helmet display, full visually coupled system capabilities should be available to the Air Force in 1977. Operator and system performance will be appreciably enhanced by the application of visual coupling devices.

LECTURE 35: VR Operating Shell
VR poses a true challenge for the underlying software environment, usually referred to as the VR operating shell. Such a system must integrate real-time three-dimensional graphics, large object-oriented modelling and database techniques, event-driven simulation techniques, and overall dynamics based on multithreading distributed techniques. The emerging VR operating shells, such as Trix at Autodesk, Inc., VEOS at HIT Lab, and Body Electric at VPL, Inc., share many design features with the MOVIE system. A multiserver network of multithreading interpreters of a high-level object-oriented language seems to be the optimal software technology in the VR domain. We expect MOVIE to play an important role in the planned VR projects at Syracuse University, described in the previous section. The system is capable of providing both the overall infrastructure (VR operating shell) and the high-performance computational model for addressing new challenges in computational science stimulated by VR interfaces. In particular, we intend to address research topics in biological vision on visual perception limits, in association with analogous constraints on VR technology; research topics in machine vision, in association with high-performance support for the "non-encumbered" VR interfaces; and neural network research topics, in association with the tracking and real-time control problems emerging in VR environments.

From the software engineering perspective, MOVIE can be used both as the base MovieScript-based software development platform and as the integration environment which allows us to couple and synchronize the various external VR software packages involved in the planned projects. The figure illustrates the MOVIE-based high-performance VR system planned at NPAC and discussed in the previous section. High-performance computing, high-quality three-dimensional graphics, and VR peripherals modules are mapped onto an appropriate set of MovieScript threads. The overall synchronization necessary, for example, to sustain a constant frame rate is accomplished in terms of the real-time component of the MovieScript scheduling model. The object-oriented, interpreted, multithreading language model of MovieScript provides the critical mix of functionalities necessary to cope efficiently with prototyping in such complex software and hardware environments.

(Figure: Planned High-End Virtual Reality Environment at NPAC. New parallel systems (CM-5, nCUBE2 and DECmpp) are connected by the fast HIPPI network and integrated with distributed FDDI clusters, high-end graphics machines, and VR peripherals by mapping all these components onto individual threads of the VR MOVIE server. Overall synchronization is achieved by the real-time support within the MOVIE scheduling model.)

Although the figure presents only one "human in the loop," the model can also support, in a natural way, multiuser shared virtual worlds with remote access capabilities and with a variety of interaction patterns among the participants. The MOVIE model-based high-performance VR server at NPAC could be employed in a variety of visualization-intensive R&D projects. It could also provide a powerful shared VR environment, accessible from remote sites.
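The constant-frame-rate synchronization mentioned above can be pictured with a generic frame-pacing loop such as the sketch below. This is not the MOVIE or MovieScript scheduler; the target rate, the function names and the polling and rendering callables are hypothetical, and the sketch only illustrates the idea of sleeping off the remainder of each frame budget so that frames are issued at a steady rate.

    import time

    FRAME_RATE = 30.0                     # target frames per second (assumed)
    FRAME_TIME = 1.0 / FRAME_RATE

    def run_frames(poll_trackers, render, num_frames=300):
        # Small fixed-rate loop: poll input, render, then sleep off whatever
        # time is left in the frame budget to hold a constant frame rate.
        next_deadline = time.monotonic()
        for _ in range(num_frames):
            pose = poll_trackers()        # read head/glove trackers
            render(pose)                  # draw the scene for this pose
            next_deadline += FRAME_TIME
            remaining = next_deadline - time.monotonic()
            if remaining > 0:
                time.sleep(remaining)
            else:
                next_deadline = time.monotonic()   # frame overran; resynchronize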
The MovieScript-based communication protocol and remote server programmability within the MOVIE network assure satisfactory performance of shared distributed virtual worlds even over low-bandwidth communication media such as telephone lines. From the MOVIE perspective, we see VR as an asymptotic goal in the GUI area, or the "ultimate" user interface. Rather than directly building a specific VR operating shell, which would be short-lived given the current state of the art in VR peripherals, we instead construct the VR support in a graded fashion, closely following existing and emerging standards. A natural strategy is to extend the present MovieScript GUI sector, based on Motif and three-dimensional servers, with some minimal VR operating shell support. Two possible public domain standard candidates to be evaluated in this area are VEOS from HIT Lab and MR (Minimal Reality) from the University of Alberta. We also plan to experiment with the Presence toolkit from DEC and with the VR_Workbench system from SimGraphics, Inc.

In parallel with evaluating emerging standard candidates, we will also attempt to develop a custom MovieScript-based VR operating shell. Present VR packages typically split into a static CAD-style authoring system for building virtual worlds and a dynamic real-time simulation system for visiting these worlds. General-purpose support for both components is already present in the current MovieScript design: an interpretive object-oriented model with strong graphics support for the authoring system, and a multithreading multiserver model for the simulation system. A natural next step is to merge both components within the common language model of MovieScript so that new virtual worlds can also be designed in the dynamic immersive mode. Present graphics speed limitations do not allow us to visit worlds much more complex than Boxvilles of various flavors, but this will change in coming years. Simple solids can be modelled in the conventional mouse-based CAD style, but with the growing complexity of the required shapes and surfaces, more advanced tools such as VR gloves become much more functional. This is illustrated in the figure, where we present a natural transition from the CAD-style to the VR-style modelling environment. Such VR-based authoring systems will dramatically accelerate the process of building virtual worlds in areas such as industrial or fashion design, animation, art, and entertainment. They will also play a crucial role in designing nonphysical spaces, for example for hypermedia navigation through complex databases, where there are no established VR technologies and novel immersion ideas can be created only by active, dynamic human participation in the interface design process.

(Figure: Examples of the Glove-Based VR Interfaces for CAD and Art Applications.) The upper figure illustrates the planned tool for interactive sculpturing or other complex, irregular CAD tasks. A set of "chisels" will be provided, starting from the simplest "cutting plane" tool, to support glove-controlled polygonal geometry modelling. The lower figure illustrates a more advanced interface for glove-controlled surface modelling. Given sufficient resolution of the polygonal surface representation and HPC support, one can generate the illusion of smooth, plastic deformations of various materials. Typical applications of such tools include fashion design, industrial (e.g., automotive) design, and authoring systems for animation.
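The glove-controlled "plastic" surface deformation described above can be approximated, in its simplest form, by displacing mesh vertices near a grab point with a smooth falloff. The Gaussian falloff, the parameter names and the flat test grid below are assumptions made only for illustration; they are not the interface shown in the figure.

    import numpy as np

    def deform(vertices, grab_point, push, radius=0.2):
        # Move each vertex along `push`, weighted by a Gaussian falloff of its
        # distance from `grab_point`, giving the look of a smooth plastic dent.
        vertices = np.asarray(vertices, dtype=float)      # shape (n, 3)
        d = np.linalg.norm(vertices - grab_point, axis=1)
        weights = np.exp(-(d / radius) ** 2)              # 1 near the hand, ~0 far away
        return vertices + weights[:, None] * np.asarray(push, dtype=float)

    # Flat 50x50 grid of vertices in the xy-plane, pushed upward near the centre.
    xs, ys = np.meshgrid(np.linspace(0, 1, 50), np.linspace(0, 1, 50))
    grid = np.column_stack([xs.ravel(), ys.ravel(), np.zeros(xs.size)])
    dented = deform(grid, grab_point=np.array([0.5, 0.5, 0.0]), push=(0, 0, 0.1))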
The ultimate goal in this direction is a virtual-world environment for creating new virtual worlds.

VR-related Technologies
Other VR-related technologies combine virtual and real environments. Motion trackers are employed to monitor the movements of dancers or athletes for subsequent studies in immersive VR. The technologies of 'Augmented Reality' allow for the viewing of real environments with superimposed virtual objects. Telepresence systems (e.g., telemedicine, telerobotics) immerse a viewer in a real world that is captured by video cameras at a distant location and allow for the remote manipulation of real objects via robot arms and manipulators.

Applications
As the technologies of virtual reality evolve, the applications of VR become literally unlimited. It is assumed that VR will reshape the interface between people and information technology by offering new ways for the communication of information, the visualization of processes, and the creative expression of ideas. Note that a virtual environment can represent any three-dimensional world that is either real or abstract. This includes real systems like buildings, landscapes, underwater shipwrecks, spacecraft, archaeological excavation sites, human anatomy, sculptures, crime scene reconstructions, solar systems, and so on. Of special interest is the visual and sensual representation of abstract systems like magnetic fields, turbulent flow structures, molecular models, mathematical systems, auditorium acoustics, stock market behavior, population densities, information flows, and any other conceivable system, including artistic and creative work of an abstract nature. These virtual worlds can be animated, interactive and shared, and can expose behavior and functionality. (Figure: real and abstract virtual worlds, Michigan Stadium and a flow structure.)

Useful applications of VR include training in a variety of areas (military, medical, equipment operation, etc.), education, design evaluation (virtual prototyping), architectural walk-throughs, human factors and ergonomic studies, simulation of assembly sequences and maintenance tasks, assistance for the handicapped, study and treatment of phobias (e.g., fear of heights), entertainment, and much more.

7. Available VR Software Systems
[NOTE: This section is BADLY out of date. Most of the information is from the 1993 version of this paper. It does not address VRML or other new systems. Search Yahoo or another service to find new systems.]
There are currently quite a number of different efforts to develop VR technology. Each of these projects has different goals and approaches to the overall VR technology. Large and small university labs have projects underway (UNC, Cornell, U.Rochester, etc.). ARPA, NIST, the National Science Foundation and other branches of the US Government are investing heavily in VR and other simulation technologies. There are industry-supported laboratories too, like the Human Interface Technologies Laboratory (HITL) in Seattle and the Japanese NTT project. Many existing and startup companies are also building and selling world-building tools (Autodesk, IBM, Sense8, VREAM). There are two major categories of available VR software: toolkits and authoring systems. Toolkits are programming libraries, generally for C or C++, that provide a set of functions with which a skilled programmer can create VR applications. Authoring systems are complete programs with graphical interfaces for creating worlds without resorting to detailed programming.
These usually include some sort of scripting language in which to describe complex actions, so they are not really non-programming, just much simpler programming. The programming libraries are generally more flexible and have faster renderers than the authoring systems, but you must be a very skilled programmer to use them. (Note to developers: if I fail to mention your system below, please let me know and I will try to remember to include it when, and if, I update this document again.)

Lesson Plan -- MMT
The readings referred to in the table below are recommended material from:
A – "Multimedia: Making It Work" by Tay Vaughan
B – "Multimedia Systems" by John F. Buford
C – "Multimedia: Sound and Video" by Lozano

Lectures by unit: Unit-1: 1, 2, 3, 4, 5, 6 & 7; Unit-2: 8 & 9, 10, 11, 12, 13, 14, 15, 16, 17; Unit-3: 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30; Unit-4: 31, 32, 33, 34, 35.

Topics and readings (in the order listed in the table):
Basics of multimedia; computers, communication, entertainment; multimedia: an introduction; framework for multimedia, multimedia devices; CD-ROM, CD-Audio; multimedia presentation and authoring; professional development tools; LAN, internet, World Wide Web, ATM, ADSL; vector graphics, 3D graphics programs -- B-2.1, A-3 to 7
Animation techniques, shading, anti-aliasing, morphing, video on demand; image compression and standards; making still images, editing and capturing images; scanning an image, computer color models -- A-173, 149, 56, 142, 143; B-1.3.1; B-3.4; A-232 to 257; B-7.3; A-284 to 297; A-230 to 245; A-301 to 315; B-13.3; B-17.6; A-331 to 332; A-134 to 139; A-141 to 142; A-143; A-151 to 159
Color palettes, vector drawing; 3D drawing and rendering; JPEG objectives and architecture; JPEG DCT encoding and quantization; JPEG statistical coding, JPEG predictive lossless coding; JPEG performance; GIF, PNG, TIFF, BMP; AUDIO & VIDEO: digital representation of sound; subband encoding; Fourier method; transmission of digital sound; digital audio signal processing -- A-144 to 146, C-6.3.2; A-146 to 151; B-6.5.1-6.5.2; B-6.5.3; B-6.5.4, B-6.5.5
Stereophonic and quadraphonic signal processing; editing sampled sound; MPEG audio, audio compression and decompression; speech generation and recognition; MIDI; MPEG motion video compression standard; DVI technology; time-based media representation and delivery; VIRTUAL REALITY: applications of multimedia, intelligent multimedia; desktop virtual reality -- B-4.5.1; B-6.5.6; A-359, A-162, A-355; B-4.3, B-4.3.2, C-6.4; B-4.3.2, A-271; B-4.4; B-4.5, C-7.5; B-4.5.3, A-291; A-207, A-276 to 278; C-9.4, A-415, B-8.5.9; A-106 to 116; B-6.7; B-6.8; B-7.1 to 7.6; A-5 to 11; B-18.3; A-339 to 340
34: VR operating system -- notes from the internet
35: Visually coupled system requirements -- B-18.5

Record of Lectures Taken (Lecture No. / Date / Topics covered / Remarks)

ASSIGNMENT-1
GURGAON INSTITUTE OF TECHNOLOGY AND MANAGEMENT
SUBJECT: Multimedia, CLASS: B.Tech 4th sem
1. Define multimedia. Give its applications in various areas.
2. Discuss different standards of CD-ROM. Define session management in the Orange Book standard of CD-ROM.
3. What is the utilization of high-speed devices in multimedia? Define SCSI and IDE devices.
4. Discuss the transport layer protocols used to handle multimedia data.
5. Write short notes on the following: (a) Multimedia authoring tools (b) Synchronous and isochronous data (c) TCP and XTP (d) ATM and FDDI (e) ADSL (f) Vector graphics

ASSIGNMENT-2
GURGAON INSTITUTE OF TECHNOLOGY AND MANAGEMENT
SUBJECT: Multimedia, CLASS: B.Tech 4th sem
1. Give various animation techniques in detail.
2. Explain multimedia servers and databases.
3. Explain multimedia distribution networks.
4. How are 3D graphics programs executed using multimedia?
5. Briefly describe tweening and morphing.
ASSIGNMENT-3
GURGAON INSTITUTE OF TECHNOLOGY AND MANAGEMENT
SUBJECT: Multimedia    CLASS: B.Tech 4th sem
1. Explain the MPEG motion video compression standard.
2. Differentiate between MPEG-2 and MPEG-4.
3. Why is compression required in multimedia technologies?
4. Explain lossless and lossy compression techniques.
5. Explain JPEG-DCT encoding and quantization. Explain quantization noise.
6. Compare the performance of JPEG with BMP and GIF.
7. How are capturing and editing done for still images?
8. Explain various high-speed devices used in multimedia technologies.
9. Explain JPEG statistical encoding. How is it different from JPEG predictive lossless coding?
10. Write short notes on the following:
(a) Computer color models
(b) Color palettes
(c) Vector drawing
(d) 3D drawing and rendering

ASSIGNMENT-4
GURGAON INSTITUTE OF TECHNOLOGY AND MANAGEMENT
SUBJECT: Multimedia    CLASS: B.Tech 4th sem
1. Explain an intelligent multimedia system. Explain its components.
2. Explain the QuickTime architecture for the Macintosh system. Explain its relevance to virtual reality operating systems.
3. What do you mean by VR? Explain intelligent VR software systems.
4. Explain the applications of VR in various fields.
5. What are virtual environment displays and orientation making?
6. Write short notes on the following:
(a) MIME applications
(b) Zig-zag ordering
(c) Desktop VR
(d) Applications of multimedia

Additional questions – Audio & Video (Unit 3):
1. Describe the digital representation of sound.
2. How are analog signals encoded?
3. Explain the transmission of digital sound.
4. Define a speech recognition system and its utilization in daily life.
5. Explain digital audio signal processing; briefly explain stereophonic and quadraphonic signal processing techniques.
6. Define the MPEG motion video compression standard.
7. Explain hybrid encoding methods.
8. Define DVI technology.
9. What is MIDI?
10. Explain time-based media representation and delivery.
11. Write short notes on the following:
(a) Subband coding
(b) Fourier method
(c) Time-domain sampled representation of sound
(d) Audio synthesis