Audio, Images, and Video
Topics in Multimedia Systems (Θέματα Συστημάτων Πολυμέσων)

- Audio Representations
- Sound Perception
- Video Representations
- Visual Perception

Outline
- Basic Sound Concepts
- Fourier Analysis
- Analog to Digital Conversion
- Sampling and Quantization
- Analog Transmission of Signals
- Digital Audio Examples
- Audio Compression
- Voice Compression / Coding
- Basic Imaging Concepts
- TV Scanning Pattern (NTSC)
- RGB Color Video
- Analog, Traditional TV Standards
- Image Aspect Ratios

Basic Sound Concepts
- Sound is produced by the vibration of matter
- Sound is not transmitted in a vacuum
- Sound is represented by a waveform (analog representation)
  - period (T)
  - frequency (f), with f = 1/T
  - amplitude

Fourier Analysis
Fourier theorem: any (reasonably behaved) periodic function g(t) with period T can be expressed as

$$ g(t) = \frac{c}{2} + \sum_{n=1}^{\infty} a_n \sin(2\pi n f t) + \sum_{n=1}^{\infty} b_n \cos(2\pi n f t) $$

where f = 1/T is the fundamental frequency, and a_n, b_n, and the constant term c are the Fourier coefficients.

Fourier Analysis Example
- Assume the finite signal is extended by identical repetition to make it periodic
- Harmonics: possibly infinitely many
- The root mean square (RMS) amplitude of the Fourier coefficients of the n-th harmonic is $\sqrt{a_n^2 + b_n^2}$; its square is proportional to the power (energy) of the signal at that harmonic (frequency)

Low-pass Filter Approximation
[Figure]

Analog Transmission of Signals
- Modulation of a carrier
- Types of analog signal modulation: AM, FM, phase modulation
- Digital signal modulation (figure panels)
  - (a) example digital signal
  - (b) AM: non-zero amplitude represents 1
  - (c) FM: higher frequency represents 1
  - (d) phase modulation

Analog to Digital Conversion
[Figure]

Sampling and Quantization
- Sampling
  - converts a continuous-time signal to a discrete-time signal
  - the result is still analog (continuous amplitude)
  - Nyquist theorem: a sampled signal can be fully reproduced (recovered without any loss of information) if the sampling frequency is at least twice the maximum frequency of the signal
  - [Figure: sampled sine wave]
- Quantization
  - [Figure: 2-bit quantization (fractional)]
  - converts the analog amplitude (at discrete time instants) to discrete values
  - the result is a digital signal
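The two steps on the Sampling and Quantization slide can be sketched in a few lines of Python. This is a minimal illustration, not the course's own code: the function names (`sample_sine`, `quantize`, `reconstruct`), the uniform quantizer, and the mid-point reconstruction rule are all assumptions chosen for clarity.

```python
import math


def sample_sine(freq_hz, sample_rate_hz, n_samples, amplitude=1.0):
    """Sampling: evaluate a continuous sine wave at the instants t = k / fs.

    The result is discrete in time but still continuous in amplitude.
    """
    return [amplitude * math.sin(2 * math.pi * freq_hz * k / sample_rate_hz)
            for k in range(n_samples)]


def quantize(samples, bits, full_scale=1.0):
    """Uniform quantization (an assumed scheme): map each amplitude in
    [-full_scale, +full_scale] to one of 2**bits integer codes."""
    levels = 2 ** bits
    step = 2 * full_scale / levels
    codes = []
    for x in samples:
        code = int((x + full_scale) / step)       # floor, since the value is >= 0
        codes.append(min(max(code, 0), levels - 1))  # clamp the top edge
    return codes


def reconstruct(codes, bits, full_scale=1.0):
    """Map each code back to the mid-point of its quantization interval."""
    step = 2 * full_scale / 2 ** bits
    return [-full_scale + (c + 0.5) * step for c in codes]


# A 1 kHz sine sampled at 8 kHz (well above the 2 kHz Nyquist rate),
# then quantized with 2 bits, i.e. 4 levels.
samples = sample_sine(1000.0, 8000.0, 8)
codes = quantize(samples, bits=2)
approx = reconstruct(codes, bits=2)
print(codes)
print(max(abs(x - y) for x, y in zip(samples, approx)))
```

With this reconstruction rule the quantization error never exceeds half a step (0.25 for 2 bits at full scale 1.0); adding bits per sample halves the step each time, which is why more bits give higher quality.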
[Figure: sampled and quantized sine wave]

DPCM, Delta Modulation and ADPCM
- Differential PCM (DPCM): the difference from the previous value is represented
- Delta modulation: 1-bit (per sample) difference
- Adaptive Differential PCM (ADPCM): variable number of bits per sample
- A 56/64 kb/s signal is approximated well with 8 kb/s

Understanding Audio
- Sound is a time-varying signal with frequencies from about 20 Hz to about 20 kHz
- Sound is composed of many signals added together
- Low frequencies: vowels and bass
- High frequencies: consonants
- Humans perceive sounds over the entire range but are most sensitive to frequencies of about 2 kHz to 4 kHz
- Hearing depends strongly on the room and the environment
- Sounds are masked by overlapping sounds

Digital Audio Examples
- Voice
  - 4 kHz voice (telephone)
  - 8 kHz sampling rate
  - 8 bits per sample (linear quantization)
  - 8,000 x 8 = 64 kb/s digital signal
- Stereo CD-quality audio
  - 22 kHz CD-quality audio
  - 44 kHz sampling rate (actually 44,100 samples/s), Pulse Code Modulation (PCM)
  - 16-bit (linear) quantization
  - 2 channels (stereo)
  - 44,100 x 16 x 2 = 1,411,200 b/s, about 1.4 Mb/s

Audio Representations
- The minimum sampling frequency is twice the highest frequency to be sampled (Nyquist theorem)

Critical Bands
[Figure: critical bands (electroacoustics)]

Soundfield
[Figure]

Impulse Response
[Figure]

Audio Noise Masking
[Figure]

Audio Compression
- A bit rate of 1.411 Mb/s for stereo music exceeds most access rates, and even 64 kb/s for speech exceeds the access rate of a dial-up modem user
- PCM-encoded speech and music is therefore rarely used on the Internet
- Compression techniques are used to reduce the bit rate of the stream; popular compression techniques for speech include:
  - GSM (13 kb/s)
  - G.729 (8 kb/s)
  - G.723.1 (both 6.3 and 5.3 kb/s)
  - a large number of proprietary techniques, including those used by RealNetworks
- MP3 (MPEG layer 3): a popular compression technique for near-CD-quality stereo music
  - reduces the bit rate for music to 128 or 112 kb/s
  - very little sound degradation
  - an MP3 file can be broken up into pieces, and each piece is still playable
  - this headerless file format allows MP3 music files to be streamed across the Internet
  - the MP3 compression standard is complex: it uses psychoacoustic masking, redundancy reduction, and bit reservoir buffering

Voice Compression / Coding
- Source coding
  - characteristics of the source (voice/speech) are exploited
  - vocoder: 64 kb/s => ~8 kb/s with no loss of perceived quality
- Speech recognition / synthesis
  - speech recognition, i.e., speech-to-"text" conversion
  - transmission of the text
  - speech synthesis, i.e., "text"-to-speech conversion

Audio Quality as a Function of Quantization
[Figure: audio quality versus bits/sample]

Audio Quality as a Function of Technology
[Figure: audio quality versus bit rate (kb/s)]

Related Topics not Covered
- Music
  - MIDI: Musical Instrument Digital Interface
- Speech
  - speech generation (text-to-speech conversion)
  - speech-to-text conversion
    - speaker-dependent
    - speaker-independent
  - speaker recognition/identification

Discussion
- Higher quality
  - Filter the input
  - More bits per sample (e.g., 10, 12, 16)
  - More channels (e.g., stereo, quadraphonic)
- Digital processing
  - Reshape the impulse response to simulate a different room
  - Move the perceived location from which the sound comes
  - Cover missing samples
  - Mix multiple signals (e.g., for a conference)
  - Cancel echoes

Interactivity Time Constraints
- Maximum delay before speakers hear their own voice as an echo: 100 ms
- Maximum allowable round-trip time: 300 ms

Importance of Sound
- Passive viewing (e.g., film, video playback)
  - Very sensitive to sound breaks
  - Visual channel more important (ask filmmakers!)
  - Occasional frame drops are tolerated
- Video conferencing
  - Sound channel is more important
  - Visual channel still conveys information
  - Some people report that video teleconference users turn off the video

Producing High Quality Audio
- Eliminate background noise
  - A directional mic gives more control
  - Some audio systems will cancel wind noise
- One microphone per speaker
- Keep the sound levels balanced
- Sweeten the soundtrack with interesting sound effects

Audio vs. Video
- Some people argue that sound is easy and video is hard because of the disparity in data rates
- Not true: audio is every bit as hard as video; it is just different!
- We will learn about audio and video just as we learned about printing with the introduction of desktop publishing

Understanding Video
- Analog video is a continuous signal that drives a CRT
- Composite video combines luminance and chrominance into one signal
- Component video keeps the luminance and chrominance signals separate

Computer Video Controller Standards
Format: pixels (horizontal x vertical) x color depth (bits per pixel)
- Color Graphics Adapter (CGA): 320 x 200 x 2
- Enhanced Graphics Adapter (EGA): 640 x 350 x 4
- Video Graphics Array (VGA): 640 x 480 x 8
- Super VGA (SVGA): 800 x 600 (x 16..24)
- Extended Graphics Array (XGA): 1024 x 768 x 24

Basic Imaging Concepts
- We see objects because of reflected light
- Visible wavelengths: approximately 380-780 nm
  - Red: 700 nm
  - Green: 546 nm
  - Blue: 436 nm
- Monochromatic and non-monochromatic sources
- Luminance
- Chrominance
- Images vs.
  Graphics

PC Bus Transfer Rates
[Figure]

Scan and Retrace
[Figure]

RGB Color Video
[Figure]

Color TV: RGB to YUV
[Figure; recovered label: RED]

Analog, Traditional TV Standards
- NTSC (National Television System Committee)
  - 525 lines, interlaced (2:1), 4:3 aspect ratio
  - 60 fields/s (actually 59.94 fields/s, i.e., about 30 frames/s)
  - YIQ color, 6 MHz analog signal:
      Y = 0.30 R + 0.59 G + 0.11 B
      I = 0.60 R - 0.28 G - 0.32 B
      Q = 0.21 R - 0.52 G + 0.31 B
  - 4.2 MHz luminance; 1.5 MHz for each chrominance signal; VCR: 0.5 MHz chrominance
- PAL (Phase Alternating Line) and SECAM (Séquentiel Couleur Avec Mémoire)
  - 625 lines, interlaced (2:1), 4:3 aspect ratio
  - 50 fields/s (25 frames/s)
  - YUV color, 8 MHz analog signal:
      Y = 0.30 R + 0.59 G + 0.11 B
      U = 0.493 (B - Y) = -0.15 R - 0.29 G + 0.44 B
      V = 0.877 (R - Y) = 0.62 R - 0.52 G - 0.10 B

Down Sampling Color Video
[Figure]

Interlacing
[Figure]

TV Scanning Pattern (NTSC)
[Figure]

Many Video Formats
- Analog: NTSC, PAL, SECAM
- Digital: CCIR 601, D1-D5, HDTV

NTSC Video (525 lines, 60 fields/s)
- 525 scan lines repeated 29.97 times per second (i.e., 33.37 ms/frame)
- Interlaced scan lines divide each frame into 2 fields of 262.5 lines each (i.e., 16.68 ms/field)
- 20 lines are reserved for control information at the beginning of each field
  - ...so only 485 lines carry visible data
  - ...laserdisc and S-VHS display around 420 lines
  - ...normal broadcast TV displays around 320 lines
- Each line lasts 63.6 µs (10.9 µs blanked)

NTSC Signal Composition
[Figure]

PAL Video (625 lines, 50 fields/s)
- 625 scan lines repeated 25 times per second (i.e., 40 ms/frame)
- Interlaced scan lines divide each frame into 2 fields of 312.5 lines each (i.e., 20 ms/field)
- Approximately 20% more lines than NTSC
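The frame, field, and line durations quoted on the NTSC and PAL slides all follow from the line count and the frame rate. The sketch below reproduces that arithmetic; the function names are hypothetical, and the exact NTSC frame rate of 30000/1001 frames/s is an assumption used in place of the rounded 29.97 given on the slide.

```python
def scan_timing(lines_per_frame, frames_per_second, fields_per_frame=2):
    """Return frame, field, and line durations for an interlaced TV system."""
    frame_ms = 1000.0 / frames_per_second
    field_ms = frame_ms / fields_per_frame
    line_us = 1000.0 * frame_ms / lines_per_frame
    return {"frame_ms": frame_ms, "field_ms": field_ms, "line_us": line_us}


def visible_lines(lines_per_frame, reserved_per_field, fields_per_frame=2):
    """Lines left for picture data after the reserved lines of every field."""
    return lines_per_frame - fields_per_frame * reserved_per_field


# Assumption: NTSC color runs at 30000/1001 frames/s (about 29.97).
ntsc = scan_timing(525, 30000 / 1001)
pal = scan_timing(625, 25)

for name, t in (("NTSC", ntsc), ("PAL", pal)):
    print(f"{name}: {t['frame_ms']:.2f} ms/frame, "
          f"{t['field_ms']:.2f} ms/field, {t['line_us']:.1f} us/line")

# 20 reserved lines in each of the 2 NTSC fields
print(visible_lines(525, 20))
```

The same function gives 64 µs per line for PAL, slightly longer than the NTSC line, because PAL has more lines per frame but fewer frames per second.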
Digital Video Representations
- Digital Component Video (D1/D5, SMPTE RP125)
  - Maintains separate signals for luminance and color
  - 27 MB/s data rate, either parallel or serial
  - Subsampled color signals, 4:2:2
  - Each pixel is 2 bytes: (CB0,Y0)(CR0,Y1)(CB2,Y2)...
- Digital Composite Video (D2/D3, SMPTE 244M)
  - 14.3 MB/s data rate, either parallel or serial
  - Subsampled color signals, 4:2:2
  - Each pixel is 1 byte

Image Aspect Ratios
[Figure]

TV Systems: Spatial Parameters
[Figure]

TV Systems: Temporal Parameters
[Figure]

Human Perception
- The visual system accepts 20 fps as smooth motion
- More sensitive to low frequencies
- Luminance is more important than chrominance
- Noise is noise
- Vision emphasizes edge detection
  - Horizontal lines are detected better than vertical lines
  - Vertical lines are detected better than diagonal lines
- Visual masking by significant luminance changes

Producing High Quality Video
- Need a high-quality camera
  - S-video (S-VHS, Hi8) is better than composite
  - 3-chip is better than 1-chip
- Lights, lights, lights, ...
- Experiment with filters to change apparent colors
- Shoot the scene from different angles and cut between them to create visual stimulation
- Study film/video techniques
  - Let a person exit the scene without moving the camera
  - Keep the orientation of images correct

Jargon/Standards
Here are the actual standards:
- G.711: A-law/µ-law encodings
- G.721: ADPCM at 32 kb/s
- G.723: ADPCM at 24 kb/s and 40 kb/s
- GSM 06.10: 8 kHz, 13 kb/s, i.e., about 1.65 kB/s (used in Europe)
- LPC (FIPS-1015): Linear Predictive Coding (2.4 kb/s)
- CELP (FIPS-1016): Code-Excited LPC (4.8 kb/s)
The emerging standard common formats:
- 8 kHz, 8-bit µ-law, mono
- 22 kHz, 8-bit unsigned linear, mono and stereo
- 44 kHz, 16-bit signed, mono and stereo
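The raw bit rate of each uncompressed format listed above is the product of sample rate, bits per sample, and channel count. The sketch below works through that arithmetic; the function name `pcm_bit_rate` is hypothetical, and reading "22 kHz" and "44 kHz" as 22,050 and 44,100 samples/s is an assumption based on the CD figure given earlier in these notes.

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels):
    """Raw (uncompressed) bit rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


# (label, sample rate in Hz, bits per sample, channels)
# Assumption: "22 kHz" means 22,050 Hz and "44 kHz" means 44,100 Hz.
formats = [
    ("8 kHz 8-bit mu-law mono", 8000, 8, 1),
    ("22 kHz 8-bit linear mono", 22050, 8, 1),
    ("22 kHz 8-bit linear stereo", 22050, 8, 2),
    ("44 kHz 16-bit mono", 44100, 16, 1),
    ("44 kHz 16-bit stereo", 44100, 16, 2),
]

for label, rate, bits, channels in formats:
    bps = pcm_bit_rate(rate, bits, channels)
    print(f"{label}: {bps} b/s ({bps / 1000:.1f} kb/s)")
```

The first and last rows reproduce the 64 kb/s telephone voice rate and the 1,411,200 b/s CD rate from the Digital Audio Examples slide, which is a quick check that the formula matches the figures used in these notes.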