Audio, Images, and Video

Audio Representations
Sound Perception
Video Representations
Visual Perception
Topics in Multimedia Systems (Θέματα Συστημάτων Πολυμέσων)
Basic Sound Concepts
Fourier Analysis
Analog to Digital Conversion
Sampling and Quantization
Analog Transmission of Signals
Digital Audio Examples
Audio Compression
Voice Compression / Coding
Basic Imaging Concepts
TV Scanning Pattern (NTSC)
RGB Color Video
Analog, Traditional TV Standards
Image Aspect Ratios
Basic Sound Concepts
produced by vibration of matter
sound is not transmitted in a vacuum
represented by a waveform (analog representation)
period (T)
frequency (f), where f = 1/T
amplitude
Fourier Analysis
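Fourier Theorem:
any (reasonably behaved) periodic function g(t) with period T can be expressed as
$$g(t) = \frac{1}{2}c + \sum_{n=1}^{\infty} a_n \sin(2\pi n f t) + \sum_{n=1}^{\infty} b_n \cos(2\pi n f t)$$
where f = 1/T is the fundamental frequency and a_n, b_n and c are the Fourier coefficients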
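For reference, the coefficients are obtained by integrating the signal over one period (with f = 1/T); these are the standard formulas for the series in the form ½c + Σ a_n sin + Σ b_n cos:

$$a_n = \frac{2}{T}\int_0^T g(t)\sin(2\pi n f t)\,dt, \qquad b_n = \frac{2}{T}\int_0^T g(t)\cos(2\pi n f t)\,dt, \qquad c = \frac{2}{T}\int_0^T g(t)\,dt$$

so ½c is simply the average value of g(t) over one period.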
Fourier Analysis Example
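Assume identical extension (the signal is repeated) to make it periodic
harmonics: possibly infinitely many
the root-mean-square (RMS) amplitude of the Fourier coefficients at harmonic n, $\sqrt{a_n^2 + b_n^2}$, measures the signal's strength at that harmonic (frequency); its square is proportional to the power (energy) at that frequency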
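As a sketch of this example, assume the signal is a hypothetical bit pattern sent over and over (the identical extension above), with period T = 1 and each bit occupying T/N. For such a piecewise-constant signal the coefficient integrals have closed forms, so the RMS amplitude per harmonic can be computed directly. The bit pattern and all function names are illustrative assumptions.

```python
import math


def fourier_coefficients(bits, n):
    """Return (a_n, b_n) for a 0/1 bit pattern repeated with period T = 1."""
    num_bits = len(bits)
    w = 2 * math.pi * n
    a_n = b_n = 0.0
    for k, bit in enumerate(bits):
        if bit:
            # A 1-bit is a constant pulse on [t0, t1): integrate sin/cos exactly.
            t0, t1 = k / num_bits, (k + 1) / num_bits
            a_n += (math.cos(w * t0) - math.cos(w * t1)) / (math.pi * n)
            b_n += (math.sin(w * t1) - math.sin(w * t0)) / (math.pi * n)
    return a_n, b_n


def dc_coefficient(bits):
    """Return c, so that c / 2 is the average value of the signal."""
    return 2 * sum(bits) / len(bits)


def rms_amplitude(bits, n):
    """RMS amplitude sqrt(a_n^2 + b_n^2) of harmonic n."""
    a_n, b_n = fourier_coefficients(bits, n)
    return math.hypot(a_n, b_n)


if __name__ == "__main__":
    pattern = [0, 1, 1, 0, 0, 0, 1, 0]  # hypothetical 8-bit pattern
    print("c =", dc_coefficient(pattern))
    for n in range(1, 9):
        print(n, round(rms_amplitude(pattern, n), 4))
```

Squaring each printed amplitude gives a quantity proportional to the energy carried at that harmonic.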
Low-pass Filter Approximation
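A channel that behaves like a low-pass filter passes only the first few harmonics, so the receiver sees a truncated Fourier series. The sketch below evaluates c/2 plus the first H harmonic terms for a hypothetical repeated bit pattern (period T = 1); the helper names and the pattern are assumptions for illustration.

```python
import math


def coefficients(bits, n):
    """(a_n, b_n) for a 0/1 bit pattern repeated with period T = 1."""
    num_bits, w = len(bits), 2 * math.pi * n
    a_n = b_n = 0.0
    for k, bit in enumerate(bits):
        if bit:
            t0, t1 = k / num_bits, (k + 1) / num_bits
            a_n += (math.cos(w * t0) - math.cos(w * t1)) / (math.pi * n)
            b_n += (math.sin(w * t1) - math.sin(w * t0)) / (math.pi * n)
    return a_n, b_n


def lowpass_approximation(bits, harmonics, t):
    """Evaluate c/2 + sum over n = 1..harmonics of (a_n sin + b_n cos) at time t."""
    value = sum(bits) / len(bits)  # c / 2, the average level
    for n in range(1, harmonics + 1):
        a_n, b_n = coefficients(bits, n)
        value += a_n * math.sin(2 * math.pi * n * t) + b_n * math.cos(2 * math.pi * n * t)
    return value


if __name__ == "__main__":
    pattern = [0, 1, 1, 0, 0, 0, 1, 0]  # hypothetical
    centre_of_second_bit = 1.5 / len(pattern)  # this bit is a 1
    for h in (1, 2, 4, 8, 32):
        print(h, round(lowpass_approximation(pattern, h, centre_of_second_bit), 3))
```

With only one or two harmonics the received level is far from the transmitted 1; as the passband admits more harmonics the approximation approaches the original square wave.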
Analog Transmission of Signals
Modulation
Types of Analog Signal Modulation
Carrier
AM
FM
Phase Modulation
Digital Signal Modulation
(a) example digital signal
(b) AM: non-zero amplitude represents 1
(c) FM: higher frequency represents 1
(d) Phase Modulation
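The digital schemes (b) to (d) can be sketched by generating carrier samples for each bit. The carrier frequencies, the number of samples per bit, and the choice of binary phase shift keying (phase 0 for 0, phase 180° for 1) for (d) are illustrative assumptions, not parameters taken from the figure.

```python
import math


def modulate(bits, scheme, samples_per_bit=16, f_low=2.0, f_high=4.0):
    """Return carrier samples for a list of bits.

    Frequencies are in carrier cycles per bit period (hypothetical values).
    scheme: "AM" (amplitude), "FM" (frequency) or "PM" (binary phase shift).
    """
    out = []
    for i, bit in enumerate(bits):
        for s in range(samples_per_bit):
            t = i + s / samples_per_bit  # time measured in bit periods
            if scheme == "AM":
                # On-off keying: non-zero amplitude represents 1.
                out.append(bit * math.sin(2 * math.pi * f_low * t))
            elif scheme == "FM":
                # The higher carrier frequency represents 1.
                f = f_high if bit else f_low
                out.append(math.sin(2 * math.pi * f * t))
            elif scheme == "PM":
                # A 180-degree phase shift represents 1.
                phase = math.pi if bit else 0.0
                out.append(math.sin(2 * math.pi * f_low * t + phase))
            else:
                raise ValueError(f"unknown scheme: {scheme}")
    return out


if __name__ == "__main__":
    for scheme in ("AM", "FM", "PM"):
        samples = modulate([0, 1, 0, 1], scheme)
        print(scheme, [round(x, 2) for x in samples[:8]])
```

Using a whole number of carrier cycles per bit keeps the FM waveform continuous at bit boundaries in this sketch.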
Analog to Digital Conversion
Sampling and Quantization
Sampling
convert a continuous-time signal to a discrete-time signal
the result is still analog (continuous amplitude)
[Figure: sampled sine wave]
Nyquist Theorem
a sampled signal can be fully reproduced (fully recovered without any loss of information) if the sampling frequency is at least twice the maximum frequency of the signal (strictly, more than twice: a sine wave sampled at exactly twice its frequency may have every sample land on a zero crossing)
Quantization
convert the analog amplitude (at discrete time instants) to discrete values
the result is a digital signal
[Figure: 2-bit (fractional) quantization; sampled and quantized sine wave]
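The two steps can be sketched separately: sampling evaluates the signal at t = k / f_s, and quantization maps each continuous amplitude to one of 2^b levels. The uniform mid-rise quantizer over [-1, 1] used here is an assumption chosen for simplicity; the figure's exact quantizer is not specified.

```python
import math


def sample_sine(freq_hz, sample_rate_hz, num_samples, amplitude=1.0):
    """Sample A*sin(2*pi*f*t) at t = k / f_s (amplitude stays continuous)."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * k / sample_rate_hz)
            for k in range(num_samples)]


def quantize(x, bits):
    """Uniform mid-rise quantizer for x in [-1, 1] with 2**bits levels.

    Returns the reconstruction level at the centre of the chosen interval.
    """
    levels = 2 ** bits
    step = 2.0 / levels
    index = math.floor((x + 1.0) / step)
    index = min(max(index, 0), levels - 1)  # clamp x = 1.0 into the top level
    return -1.0 + (index + 0.5) * step


if __name__ == "__main__":
    samples = sample_sine(1000, 8000, 8)  # 1 kHz tone sampled at 8 kHz
    print([round(s, 3) for s in samples])
    print([quantize(s, 2) for s in samples])  # 2-bit quantization: 4 levels
```

With 2 bits the only output levels are -0.75, -0.25, 0.25 and 0.75, and the quantization error never exceeds half a step.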
DPCM, Delta Modulation and ADPCM
Differential PCM (DPCM): the difference from the previous sample value is represented
Delta Modulation: the difference is represented with 1 bit per sample
Adaptive Differential PCM (ADPCM): variable number of bits per sample
56/64 kb/s approximated well with 8 kb/s
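A minimal sketch of the first two schemes, under simplifying assumptions: the DPCM here sends exact integer differences (a practical DPCM coder would also quantize the differences), and the delta modulator uses a fixed, hypothetical step size. ADPCM, which adapts the quantization to the signal, is not shown.

```python
def dpcm_encode(samples):
    """Differential PCM: send each sample as the difference from the previous one."""
    diffs, previous = [], 0
    for x in samples:
        diffs.append(x - previous)
        previous = x
    return diffs


def dpcm_decode(diffs):
    """Rebuild the samples by accumulating the differences."""
    samples, current = [], 0
    for d in diffs:
        current += d
        samples.append(current)
    return samples


def delta_modulate(samples, step):
    """Delta modulation: 1 bit per sample (1 = step up, 0 = step down)."""
    bits, approx = [], 0.0
    for x in samples:
        bit = 1 if x >= approx else 0
        approx += step if bit else -step
        bits.append(bit)
    return bits


def delta_demodulate(bits, step):
    """Rebuild the staircase approximation that the encoder was tracking."""
    out, approx = [], 0.0
    for bit in bits:
        approx += step if bit else -step
        out.append(approx)
    return out


if __name__ == "__main__":
    pcm = [10, 12, 15, 15, 13]
    print(dpcm_encode(pcm))  # small differences need fewer bits than full samples
    ramp = [0.5 * k for k in range(10)]
    print(delta_demodulate(delta_modulate(ramp, 1.0), 1.0))
```

The delta modulator tracks the input only while the signal changes by less than one step per sample; faster changes cause slope overload, which is one motivation for adaptive schemes.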
Understanding Audio
Sound is a time-varying signal with frequencies from about 20 Hz to about 20 kHz
Sound is composed of many signals added together
Low frequencies are vowels and bass
High frequencies are consonants
Humans perceive sounds over the entire range, but are most sensitive to frequencies from about 2 kHz to 4 kHz
Hearing depends strongly on the room and the environment
Sounds can be masked by other, overlapping sounds
Digital Audio Examples
Voice
4 kHz voice bandwidth (telephone)
8 kHz sampling rate
8 bits per sample (linear quantization)
64 Kb/s digital signal
Stereo CD-quality Audio
22 kHz audio bandwidth (CD quality)
16-bit (linear) quantization (Pulse Code Modulation, PCM)
44 kHz sampling rate (actually, 44,100 samples per second)
2 channels (stereo)
1,411,200 b/s ≈ 1.4 Mb/s
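Both bit rates above follow from multiplying the sampling rate, the bits per sample and the number of channels, with the sampling rate set to twice the audio bandwidth by the Nyquist rule. A minimal sketch of that arithmetic; the function names are hypothetical helpers.

```python
def nyquist_rate(max_freq_hz):
    """Minimum sampling rate for a signal band-limited to max_freq_hz."""
    return 2 * max_freq_hz


def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels=1):
    """Uncompressed PCM bit rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


if __name__ == "__main__":
    voice = pcm_bit_rate(nyquist_rate(4000), 8)      # telephone voice
    cd = pcm_bit_rate(44_100, 16, channels=2)        # stereo CD audio
    print(voice, "b/s")
    print(cd, "b/s")
    print(cd * 60 // 8, "bytes per minute of CD audio")
```

For example, one minute of uncompressed stereo CD audio is 1,411,200 × 60 / 8 = 10,584,000 bytes, roughly 10.6 MB.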
Audio Representations
The minimum sampling frequency is twice the highest
frequency to be sampled (Nyquist Theorem)
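To see why the sampling frequency must exceed twice the highest frequency, the sketch below computes the frequency a pure tone appears to have after sampling. Tones above f_s/2 fold back (alias) onto lower frequencies and become indistinguishable from them. The helper names are illustrative.

```python
import math


def apparent_frequency(signal_hz, sample_rate_hz):
    """Frequency observed after sampling a pure tone, folded into [0, f_s/2]."""
    f = signal_hz % sample_rate_hz
    return min(f, sample_rate_hz - f)


def sampled_cosine(freq_hz, sample_rate_hz, num_samples):
    """Samples of cos(2*pi*f*t) taken at t = k / f_s."""
    return [math.cos(2 * math.pi * freq_hz * k / sample_rate_hz)
            for k in range(num_samples)]


if __name__ == "__main__":
    fs = 8000  # telephone-style sampling rate
    for tone in (1000, 3000, 5000, 7000):
        print(tone, "Hz appears as", apparent_frequency(tone, fs), "Hz")
```

At 8 kHz sampling, a 5 kHz cosine produces exactly the same samples as a 3 kHz cosine, which is why the input is low-pass filtered before sampling.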
Critical Bands
ELECTRO ACOUSTICS
Soundfield
Impulse Response
Audio Noise Masking
Audio Compression
A bit rate of 1.411 Mbps for stereo music exceeds most access rates, and even 64 Kbps for speech exceeds the access rate of a dial-up modem user
PCM-encoded speech and music are therefore rarely used in the Internet.
Compression techniques are used to reduce the bit rate of the stream; popular compression techniques for speech include
GSM (13 Kbps)
G.729 (8 Kbps)
G.723.1 (6.3 and 5.3 Kbps)
a large number of proprietary techniques, including those used by RealNetworks
MP3 (MPEG Audio Layer 3): popular compression technique for near CD-quality stereo music
reduces the bit rate for music to 128 or 112 Kbps
very little sound degradation
an MP3 file can be broken up into pieces, and each piece is still playable
because the format needs no global file header (each frame carries its own header), MP3 music files can be streamed across the Internet
the MP3 compression standard is complex: it uses psychoacoustic masking, redundancy reduction and bit reservoir buffering
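The rates quoted above translate into compression ratios relative to uncompressed PCM, and into a simple check of whether a constant-rate stream fits an access link. The 56 kb/s dial-up rate below is an assumed typical value, not a figure from the slide; the helper names are hypothetical.

```python
PCM_STEREO_CD_BPS = 1_411_200  # 44,100 samples/s x 16 bits x 2 channels
PCM_SPEECH_BPS = 64_000        # 8,000 samples/s x 8 bits

CODEC_BPS = {
    "GSM": 13_000,
    "G.729": 8_000,
    "MP3 (music)": 128_000,
}

DIALUP_BPS = 56_000  # assumed typical dial-up modem rate


def compression_ratio(original_bps, compressed_bps):
    """How many times smaller the compressed stream is."""
    return original_bps / compressed_bps


def fits_link(stream_bps, link_bps):
    """True if a constant-rate stream can be carried in real time on the link."""
    return stream_bps <= link_bps


if __name__ == "__main__":
    for name, bps in CODEC_BPS.items():
        original = PCM_STEREO_CD_BPS if name.startswith("MP3") else PCM_SPEECH_BPS
        ratio = compression_ratio(original, bps)
        print(f"{name}: {ratio:.1f}x smaller, fits dial-up: {fits_link(bps, DIALUP_BPS)}")
```

MP3 at 128 Kbps is about 11 times smaller than CD-quality PCM, yet it still exceeds a 56 kb/s modem, while the speech codecs fit comfortably.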
Voice Compression / Coding
Source Coding
source (voice/speech)
characteristics exploited
vocoder
64 Kb/s => ~ 8 Kb/s
with no loss of perceived quality
Speech Recognition / Synthesis
speech recognition
i.e., speech-to-"text" conversion
transmission of text
speech synthesis
i.e., "text"-to-speech conversion
Audio Quality as a Function of Quantization
[Figure: audio quality vs. bits/sample]
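The trend in this figure can be quantified with the signal-to-quantization-noise ratio (SQNR). The sketch below is an illustration added here, not taken from the slide: it quantizes a full-scale sine with a hypothetical uniform quantizer and compares the measured SQNR with the well-known rule of thumb of about 6.02 dB per bit plus 1.76 dB.

```python
import math


def quantize(x, bits):
    """Uniform mid-rise quantizer for x in [-1, 1] (hypothetical helper)."""
    levels = 2 ** bits
    step = 2.0 / levels
    index = min(max(math.floor((x + 1.0) / step), 0), levels - 1)
    return -1.0 + (index + 0.5) * step


def measured_sqnr_db(bits, num_samples=50_000, cycles_per_sample=0.0123457):
    """SQNR of a full-scale sine after quantization, in dB."""
    signal = noise = 0.0
    for k in range(num_samples):
        x = math.sin(2 * math.pi * cycles_per_sample * k)
        error = quantize(x, bits) - x
        signal += x * x
        noise += error * error
    return 10 * math.log10(signal / noise)


def rule_of_thumb_sqnr_db(bits):
    """Classic approximation for a full-scale sine: 6.02 dB per bit + 1.76 dB."""
    return 6.02 * bits + 1.76


if __name__ == "__main__":
    for b in (4, 8, 12, 16):
        print(b, round(measured_sqnr_db(b), 1), round(rule_of_thumb_sqnr_db(b), 1))
```

Each extra bit per sample halves the quantization step and so buys roughly 6 dB of SQNR, which is why moving from 8-bit telephone samples to 16-bit CD samples is so audible.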
Audio Quality as a Function of Technology
[Figure: audio quality vs. bit rate (Kb/s)]
Related Topics not Covered
Music
MIDI: Musical Instrument Digital Interface
Speech
speech generation (text to speech conversion)
speech to text conversion
speaker-dependent
speaker-independent
speaker recognition/identification
Discussion
Higher quality
Filter input
More bits per sample (e.g., 10, 12, 16, etc.)
More channels (e.g., stereo, quadraphonic, etc.)
Digital processing
Reshape impulse response to simulate a different room
Move perceived location from which sound comes
Cover missing samples
Mix multiple signals (e.g., for conferencing)
Cancel echoes
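Two of these operations are easy to sketch. Simulating a different room means convolving the signal with that room's impulse response (the taps used here are a hypothetical toy example), and mixing conference signals means adding samples while keeping the sum in range (16-bit PCM with saturation is an assumption).

```python
def convolve(signal, impulse_response):
    """Direct-form convolution: y[n] = sum over k of h[k] * x[n - k]."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, x in enumerate(signal):
        for k, h in enumerate(impulse_response):
            out[i + k] += x * h
    return out


def mix_16bit(*tracks):
    """Sum equal-length 16-bit PCM tracks, saturating to the int16 range."""
    return [max(-32768, min(32767, sum(samples))) for samples in zip(*tracks)]


if __name__ == "__main__":
    dry = [1.0, 0.0, 0.0, 0.0, 0.5]
    room = [1.0, 0.0, 0.3, 0.1]  # hypothetical direct path plus two reflections
    print(convolve(dry, room))
    print(mix_16bit([30000, -100, 200], [10000, 50, -200]))
```

Clipping at the 16-bit limits is the simplest overflow policy; real mixers often scale or compress the sum instead.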
Interactivity Time Constraints
Maximum delay before you notice your own voice coming back (echo): 100 ms
Maximum allowable round-trip time: 300 ms
Importance of Sound
Passive viewing (e.g., film, video playback, etc.)
Very sensitive to sound breaks
Visual channel more important (ask film makers!)
Tolerate occasional frame drops
Video conferencing
Sound channel is more important
Visual channel still conveys information
Some people report that video teleconference users turn off video
Producing High Quality Audio
Eliminate background noise
Directional mic gives more control
Some audio systems will cancel wind noise
One microphone per speaker
Keep the sound levels balanced
Sweeten the sound track with interesting sound effects
Audio vs. Video
Some people argue that sound is easy and video is hard because of the disparity in data rates
Not true: audio is every bit as hard as video; it is just different!
We will learn about audio and video just as we learned about printing
with the introduction of desktop publishing
Understanding Video
Analog video is a continuous signal that drives a CRT
Composite video combines luminance and chrominance into one
signal
Component video separates luminance and chrominance signals
Computer Video Controller Standards
Resolution (horizontal x vertical pixels) x color depth (bits per pixel)
Color Graphics Adapter (CGA): 320 x 200 x 2
Enhanced Graphics Adapter (EGA): 640 x 350 x 4
Video Graphics Array (VGA): 640 x 480 x 8
Super VGA (SVGA): 800 x 600 (x 16..24)
Extended Graphics Array (XGA): 1024 x 768 x 24
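The depth column determines both the number of colors and the display memory each mode needs. A minimal sketch, assuming tightly packed pixels with no padding and taking the upper end (24 bits) of the SVGA range; the helper names are hypothetical.

```python
# Modes from the list above: (width, height, depth in bits per pixel).
MODES = {
    "CGA": (320, 200, 2),
    "EGA": (640, 350, 4),
    "VGA": (640, 480, 8),
    "SVGA": (800, 600, 24),  # upper end of the 16..24 range
    "XGA": (1024, 768, 24),
}


def framebuffer_bytes(width, height, depth_bits):
    """Memory for one full screen, assuming tightly packed pixels."""
    return width * height * depth_bits // 8


def color_count(depth_bits):
    """Number of distinct colors a pixel of this depth can encode."""
    return 2 ** depth_bits


def raw_video_bps(width, height, depth_bits, frames_per_s):
    """Uncompressed bit rate if the whole screen changes every frame."""
    return width * height * depth_bits * frames_per_s


if __name__ == "__main__":
    for name, (w, h, d) in MODES.items():
        print(f"{name}: {framebuffer_bytes(w, h, d)} bytes, {color_count(d)} colors")
```

At 640 x 480, 24 bits per pixel and 30 frames/s, raw video needs 221,184,000 b/s (about 221 Mb/s), which helps explain why uncompressed video strains buses and networks.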
Basic Imaging Concepts
We see objects because of reflected light
Visible wavelengths: about 380-780 nm
Red: 700 nm
Green: 546 nm
Blue: 436 nm
Monochromatic and non-monochromatic sources
Luminance
Chrominance
Images vs. Graphics
PC Bus Transfer Rates
Scan and Retrace
RGB Color Video
Color TV: RGB to YUV
Analog, Traditional TV Standards
NTSC (National Television System Committee)
525 lines, interlaced (2:1), 4:3 aspect ratio
60 fields/s (actually 59.94 fields/s, i.e., about 30 frames/s)
6 MHz analog signal
YIQ:
Y = 0.30 R + 0.59 G + 0.11 B
I = 0.60 R - 0.28 G - 0.32 B
Q = 0.21 R - 0.52 G + 0.31 B
4.2 MHz luminance, 1.5 MHz chrominance each; VCR: 0.5 MHz chrominance
PAL (Phase Alternating Line) & SECAM (Séquentiel Couleur à Mémoire)
625 lines, interlaced (2:1), 4:3 aspect ratio
50 fields/s (25 frames/s)
YUV:
Y = 0.30 R + 0.59 G + 0.11 B
U = 0.493 (B-Y) = -0.15 R - 0.29 G + 0.44 B
V = 0.877 (R-Y) = 0.62 R - 0.52 G - 0.10 B
8 MHz analog signal
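The conversions above can be checked numerically. This sketch uses the slide's Y, I and Q coefficients and the colour-difference definitions U = 0.493 (B - Y) and V = 0.877 (R - Y), with R, G, B assumed normalised to [0, 1]. Any grey input (R = G = B) should give zero chrominance, which is a quick test of the signs.

```python
def luma(r, g, b):
    """Luminance Y shared by NTSC and PAL on this slide."""
    return 0.30 * r + 0.59 * g + 0.11 * b


def rgb_to_yiq(r, g, b):
    """NTSC YIQ with the slide's coefficients."""
    i = 0.60 * r - 0.28 * g - 0.32 * b
    q = 0.21 * r - 0.52 * g + 0.31 * b
    return luma(r, g, b), i, q


def rgb_to_yuv(r, g, b):
    """PAL YUV from the colour-difference definitions."""
    y = luma(r, g, b)
    return y, 0.493 * (b - y), 0.877 * (r - y)


if __name__ == "__main__":
    for name, rgb in {"white": (1, 1, 1), "red": (1, 0, 0), "blue": (0, 0, 1)}.items():
        print(name, [round(v, 3) for v in rgb_to_yuv(*rgb)])
```

Because U and V are scaled differences from luminance, they vanish for every shade of grey; only coloured inputs produce chrominance.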
Down Sampling Color Video
Interlacing
TV Scanning Pattern (NTSC)
Many Video Formats
Analog - NTSC, PAL, SECAM
Digital - CCIR 601, D1-D5, HDTV
NTSC Video (525-lines, 60-fields/s)
525 scan lines repeated 29.97 times per second (i.e., 33.37 ms/frame)
Interlaced scan lines divide the frame into 2 fields of 262.5 lines each (i.e., 16.68 ms/field)
20 lines at the beginning of each field are reserved for control information
...so only 485 lines carry visible data
...laserdisc and S-VHS display around 420 lines
...normal broadcast TV displays around 320 lines
Each line lasts 63.6 µs (10.9 µs of it blanked)
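The timing figures on this slide follow from the line count and frame rate alone. A minimal sketch of that arithmetic; the helper name and its dictionary keys are hypothetical.

```python
def scan_timing(lines_per_frame, frames_per_s, blank_lines_per_field=0):
    """Derive 2:1 interlaced scan timing from line count and frame rate."""
    frame_ms = 1000.0 / frames_per_s
    return {
        "frame_ms": frame_ms,
        "field_ms": frame_ms / 2,  # two fields per frame
        "lines_per_field": lines_per_frame / 2,
        "line_us": 1e6 / (lines_per_frame * frames_per_s),
        "visible_lines": lines_per_frame - 2 * blank_lines_per_field,
    }


if __name__ == "__main__":
    ntsc = scan_timing(525, 29.97, blank_lines_per_field=20)
    for key, value in ntsc.items():
        print(key, round(value, 2))
```

Calling `scan_timing(625, 25)` gives the corresponding PAL values: 40 ms/frame, 20 ms/field, 312.5 lines per field and 64 µs per line.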
NTSC Signal Composition
PAL Video (625-lines, 50-fields/s)
625 scan lines repeated 25 times per second (i.e., 40 ms/frame)
Interlaced scan lines divide the frame into 2 fields of 312.5 lines each (i.e., 20 ms/field)
Approximately 20% more lines than NTSC
Digital Video Representations
Digital Component Video (D1/D5, SMPTE RP125)
Maintain separate signals for luminance and color
27 MB/s data rate, either parallel or serial
Subsampled color signals 4:2:2
Each pixel is 2 bytes: (CB0,Y0)(CR0,Y1)(CB2,Y2)...
Digital Composite Video (D2/D3, SMPTE 244M)
14.3 MB/s data rate, either parallel or serial
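Sampled at four times the color subcarrier frequency (about 14.3 MHz for NTSC); color is carried inside the composite signal rather than subsampled separately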
Each pixel is 1 byte
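Both data rates follow from the sampling structure. As a sketch, assume the CCIR 601 rates for 4:2:2 component video (luminance sampled at 13.5 MHz, each colour-difference signal at half that rate, 8 bits per sample) and, for composite video, sampling at four times the NTSC colour subcarrier of 315/88 MHz. The helper `pack_422_line` is hypothetical and only illustrates the Cb Y Cr Y byte order listed for component video.

```python
NTSC_SUBCARRIER_HZ = 315e6 / 88  # about 3.579545 MHz


def component_422_bytes_per_s(luma_hz=13.5e6, bits_per_sample=8):
    """4:2:2: Y sampled at luma_hz, Cb and Cr each at half that rate."""
    samples_per_s = luma_hz + 2 * (luma_hz / 2)
    return samples_per_s * bits_per_sample / 8


def composite_bytes_per_s(subcarrier_hz=NTSC_SUBCARRIER_HZ, bits_per_sample=8):
    """Composite video sampled at four times the colour subcarrier."""
    return 4 * subcarrier_hz * bits_per_sample / 8


def pack_422_line(y, cb, cr):
    """Interleave one line as Cb0 Y0 Cr0 Y1 Cb2 Y2 Cr2 Y3 ...

    y has one sample per pixel; cb and cr have one sample per pixel pair,
    so the packed line averages 2 bytes per pixel.
    """
    if len(y) % 2 or len(cb) != len(y) // 2 or len(cr) != len(y) // 2:
        raise ValueError("need an even number of pixels and half-rate chroma")
    packed = []
    for pair in range(len(y) // 2):
        packed += [cb[pair], y[2 * pair], cr[pair], y[2 * pair + 1]]
    return packed


if __name__ == "__main__":
    print(component_422_bytes_per_s() / 1e6, "MB/s component")
    print(round(composite_bytes_per_s() / 1e6, 1), "MB/s composite")
```

The 27 MB/s component rate is simply 13.5 M luminance samples plus 2 × 6.75 M chrominance samples per second at one byte each.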
Image Aspect Ratios
TV Systems: Spatial Parameters
TV Systems: Temporal Parameters
Human Perception
Vision system accepts 20fps as smooth motion
More sensitive to low frequencies
Luminance more important than chrominance
Noise is noise
Vision emphasizes edge detection
Detect horizontal lines better than vertical lines
Detect vertical lines better than diagonal lines
Visual masking by significant luminance changes
Producing High Quality Video
Need high quality camera
S-video (SVHS, Hi8mm) better than composite
3 chip better than 1 chip
Lights, lights, lights, ...
Experiment with filters to change apparent colors
Shoot scene from different angles and cut
between them to create visual stimulation
Study film/video techniques
Let person exit the scene without moving camera
Keep orientation of images correct
Jargon/Standards
Here are the actual standards
G.711 - A-LAW/µ-LAW encodings
G.721 - ADPCM at 32 kb/s
G.723 - ADPCM at 24 kb/s and 40 kb/s
GSM 06.10 - 8 kHz sampling, 1.65 kB/s (about 13 kb/s) (used in Europe)
LPC (FIPS-1015) - Linear Predictive Coding (2.4 kb/s)
CELP (FIPS-1016) - Code Excited LPC (4.8 kb/s)
The emerging common formats
8 kHz 8-bit µ-law mono
22 kHz 8-bit unsigned linear mono and stereo
44 kHz 16-bit signed mono and stereo
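The µ-law encoding named above compresses the amplitude logarithmically before 8-bit quantization, so quiet sounds get finer steps than loud ones. This sketch uses the continuous µ-law characteristic with µ = 255; G.711 itself specifies a segmented, piecewise-linear approximation with its own bit layout, so this is not a bit-exact G.711 codec, and the 8-bit helpers are hypothetical.

```python
import math

MU = 255  # mu value used by mu-law


def mu_law_compress(x, mu=MU):
    """Continuous mu-law characteristic for x in [-1, 1]."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mu_law_expand(y, mu=MU):
    """Inverse of mu_law_compress."""
    return math.copysign(((1 + mu) ** abs(y) - 1) / mu, y)


def encode_8bit(x):
    """Compress, then quantize uniformly to a signed code in -127..127."""
    return round(mu_law_compress(x) * 127)


def decode_8bit(code):
    """Map the code back to [-1, 1] and undo the compression."""
    return mu_law_expand(code / 127)


if __name__ == "__main__":
    for x in (0.001, 0.01, 0.1, 0.5, 1.0):
        code = encode_8bit(x)
        print(x, code, round(decode_8bit(code), 5))
```

A signal at 1% of full scale already uses code 29 of 127, whereas linear 8-bit quantization would give it only one or two levels.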