Digital Media Dr. Jim Rowan ITEC 2110 Audio

What is audio?
First, some demos
• Can you hear this?
– http://www.freemosquitoringtones.org/hearing_test/
– “mosquito ring tone”
• Audio illusion “Creep”
– http://www.youtube.com/watch?v=ugriWSmRxcM
The nature of sound
First, a video from ted.com
http://www.wimp.com/howsound/
• Three types we will discuss
– 1) Environmental sound
(sounds found in the environment)
• There are two special classes of audio
– 2) Music
– 3) Speech
The nature of sound
• Environmental sounds
– Provide information about the listener's current
surroundings
• Music and Speech
– Functionally and uniquely different from other
sounds
– Music
• Carries a cultural status
• Can be represented by non-sound: MIDI
• Can be represented by a musical score
– Speech
• Linguistic content
• Lends itself to special compression
And it’s complicated…
• Converting energy to vibrations and back
• Transported through some medium
– Either air or some other compressible medium
• Consider speech
– Starts as an electrical signal (brain & nerves)
– Ends as an electrical signal (brain & nerves)
– But…
No… it’s REALLY complicated…
http://en.wikipedia.org/wiki/Ear
– Starts as an electrical signal (brain & nerves) ==>
– Muscle movement (vocal cords)
• Vibrates a column of air sending out a series of
compression waves in the air
– Compression waves cause the ear membrane (eardrum) to
vibrate ==>
– Moves 3 tiny bones ==>
– Causes waves in the liquid in the inner ear ==>
– Bends tiny hair cells immersed in the liquid ==>
– When bent they fire ==>
– Sends electrical signals to the cerebral cortex
– Processed by the temporal cortex
Audio Illusions
• Play a 200 Hz pure tone
– Softly at first
– Gradually increase the volume
– Most listeners will report that the tone drops in
pitch as the volume increases
• Play a 2000 Hz pure tone
– Softly at first
– Gradually increase the volume
– Most listeners will report that the tone rises in
pitch as the volume increases
Audio Illusions
• Complex tones are reported to have a lower
pitch than a pure tone of the same frequency
• Frequencies above human hearing affect how
the lower frequencies are perceived even
though they can’t be “heard”
Why do you think…
• You can’t tell where some sounds come from
(like some alarms for instance)
• You only need one subwoofer when you
need at least two speakers for everything else
• You can’t tell where sound is coming from
underwater
• Two things running at the same speed make
a “beating” sound
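One way to approach the "beating" question above: when two pure tones of nearly equal frequency are added, the sum can be rewritten as a single tone at the average frequency whose loudness swells and fades at the difference frequency. The sketch below is only an illustration; the function names and the 440 Hz / 444 Hz tones are assumptions chosen for the example, not values from the lecture.

```python
import math

def beat_frequency(f1_hz, f2_hz):
    """Rate (in Hz) at which the combined loudness swells and fades."""
    return abs(f1_hz - f2_hz)

def two_tone_sum(f1_hz, f2_hz, t):
    """Sum of two equal-amplitude sine tones at time t (seconds)."""
    return math.sin(2 * math.pi * f1_hz * t) + math.sin(2 * math.pi * f2_hz * t)

def envelope_form(f1_hz, f2_hz, t):
    """The same signal written as a tone at the average frequency
    multiplied by a slowly varying envelope (sum-to-product identity)."""
    average = (f1_hz + f2_hz) / 2
    half_difference = (f1_hz - f2_hz) / 2
    envelope = 2 * math.cos(2 * math.pi * half_difference * t)
    return envelope * math.sin(2 * math.pi * average * t)

# Hypothetical example: two machines humming at 440 Hz and 444 Hz
beats_per_second = beat_frequency(440, 444)
```

The envelope's magnitude peaks twice per cycle of the half-difference cosine, which is why the audible beat rate is the full difference between the two frequencies.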
Why do you think… (cont)
• With your eyes closed you can’t tell whether a
sound is in front of you or behind you
• You hear sound that isn’t there (tinnitus)
• Phantom sounds
– Heard… but not there
• Masking sounds
– Not simply drowning them out
– Can mask a sound that occurs before the
masking sound actually starts
Why do you think… (cont)
• You can hear your name in a noisy
room
– Cocktail party effect
– http://en.wikipedia.org/wiki/Cocktail_party_effect
– Still very much a subject of research
Why? It’s complicated!
• http://en.wikipedia.org/wiki/Psychoacoustics
• Psychoacoustics
– The study of human sound perception
– The study of the psychological and
physiological effects of sound
Why? It’s complicated!
• Sound is a physical phenomenon that is
interpreted through the human perceptual
system
– Wavelength affects stereo hearing
• The distance between your ears relative to the wavelength
– Speed of sound affects stereo hearing
• The faster the sound travels, the wider apart your ears need
to be
– You can tell where a sound comes from if
• the wavelength is long enough and
• the speed at which sound travels is slow enough to allow the
waves to arrive at your ears at different times
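A rough worked example of the arrival-time idea above, as a sketch only. The ear spacing (0.2 m) and the two speeds of sound (about 343 m/s in air, about 1480 m/s in water) are approximate values assumed for illustration; they do not come from the slides.

```python
def max_arrival_delay_s(ear_spacing_m, speed_of_sound_m_s):
    """Largest possible difference in arrival time between the two ears,
    for a sound coming directly from one side."""
    return ear_spacing_m / speed_of_sound_m_s

def wavelength_m(frequency_hz, speed_of_sound_m_s):
    """Wavelength of a tone travelling at the given speed."""
    return speed_of_sound_m_s / frequency_hz

EAR_SPACING_M = 0.2           # assumed, roughly the width of a human head
SPEED_IN_AIR_M_S = 343.0      # approximate value for room-temperature air
SPEED_IN_WATER_M_S = 1480.0   # approximate value for fresh water

delay_air = max_arrival_delay_s(EAR_SPACING_M, SPEED_IN_AIR_M_S)
delay_water = max_arrival_delay_s(EAR_SPACING_M, SPEED_IN_WATER_M_S)

print(f"air:   {delay_air * 1e6:.0f} microseconds")
print(f"water: {delay_water * 1e6:.0f} microseconds")
```

Under these assumptions the delay available in water is less than a quarter of the delay in air, which is consistent with the earlier observation that sounds are hard to locate underwater.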
Processing Audio
• How can we look at sound?
• What do you want to see?
• Waveform displays
– Summed amplitude of all frequencies & time
– Amplitude & frequency components at one point in
time
– Amplitude & frequency & time
Summed amplitude across all frequencies & time
(more examples of this form, then some other forms of audio display)
Amplitude & frequency components at one point in time
– Figure: pipe organ audio
Amplitude & frequency & time
– Figure: pipe organ audio
Summed amplitude & time
– Figure: “joe took father’s shoe bench out”
Amplitude & frequency & time
– Here… the amplitude (volume) is shown as increasingly darkening areas
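The "amplitude & frequency components at one point in time" view can be computed from a short block of samples with a discrete Fourier transform. The sketch below is a deliberately slow, plain-Python version for illustration; the function name and the two-tone test signal are assumptions, and real tools would use an FFT library instead.

```python
import cmath
import math

def dft_magnitudes(samples):
    """Amplitude of each frequency bin from 0 up to half the sample count."""
    n = len(samples)
    magnitudes = []
    for k in range(n // 2 + 1):
        total = sum(
            samples[t] * cmath.exp(-2j * math.pi * k * t / n)
            for t in range(n)
        )
        magnitudes.append(abs(total) / n)
    return magnitudes

# Hypothetical test signal: a strong tone in bin 5 plus a weaker one in bin 12
n = 64
signal = [
    math.sin(2 * math.pi * 5 * t / n) + 0.5 * math.sin(2 * math.pi * 12 * t / n)
    for t in range(n)
]
mags = dft_magnitudes(signal)
strongest_bin = max(range(len(mags)), key=lambda k: mags[k])
```

Repeating this on successive blocks of samples and drawing each result as one column, with larger magnitudes drawn darker, gives the "amplitude & frequency & time" display.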
Digitized audio
• As we have seen earlier this semester
– Sample rate & quantization level
– Reducing the sample rate is less noticeable than
reducing the quantization level
• Jitter is a problem
– Slight changes in timing cause problems
• Frequencies above 20 kHz?
– Though they can’t be heard, they show up as aliases in the
reconstructed signal unless they are filtered out before sampling
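A small sketch of where an aliased tone ends up, assuming an ideal sampler with no filter in front of it. The function name and the example frequencies are illustrative assumptions.

```python
def alias_frequency_hz(tone_hz, sample_rate_hz):
    """Apparent frequency of a pure tone after sampling at sample_rate_hz.

    Tones below half the sample rate come back unchanged; anything
    higher folds back into the range 0 .. sample_rate_hz / 2.
    """
    folded = tone_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)

# Hypothetical example: an inaudible 25 kHz tone sampled at 44.1 kHz
apparent = alias_frequency_hz(25000, 44100)
print(apparent)
```

In this example the inaudible 25 kHz tone reappears at 19.1 kHz, inside the audible range.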
Audio Dithering
Weird… add noise… get a better-sounding result
• Add random noise to the original signal
• This noise causes rapid transitioning
between the few quantized levels
• Makes audio with few quantization
levels seem more acceptable
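A minimal sketch of why added noise can help, assuming a very coarse quantizer (step size 1.0) and a constant input of 0.3 that falls between two levels. The names, the uniform noise shape, and the fixed seed are assumptions made for the example.

```python
import random

def quantize(value, step):
    """Snap a value to the nearest multiple of step."""
    return step * round(value / step)

def quantize_with_dither(value, step, rng):
    """Add noise of up to half a step either way, then quantize."""
    noise = rng.uniform(-step / 2, step / 2)
    return quantize(value + noise, step)

rng = random.Random(0)   # fixed seed so the run is repeatable
true_value = 0.3
step = 1.0

plain = [quantize(true_value, step) for _ in range(10000)]
dithered = [quantize_with_dither(true_value, step, rng) for _ in range(10000)]

plain_mean = sum(plain) / len(plain)
dithered_mean = sum(dithered) / len(dithered)
```

Without dither every sample lands on 0 and the value 0.3 is lost entirely. With dither the output flips rapidly between 0 and 1, and its average stays close to 0.3, which is the "rapid transitioning between the few quantized levels" described above.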
Audio processing terms to know
• Clipping
– occurs when the amplitude exceeds the maximum level the system can represent
– …but you don’t know how high the amplitude will
be before the performance is recorded
• Noise gate
– has an amplitude threshold
• Notch filter
– removes 60-cycle (60 Hz) hum
• Low pass filter
• High pass filter
• Time stretching (or shrinking… Limbaugh)
• Pitch alteration
• Envelope shaping (modifying attack)
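Two of the terms above, clipping and the noise gate, are simple enough to sketch per sample. This is a simplified illustration: the function names and values are assumptions, and a practical noise gate would track the signal's loudness over time rather than test each sample on its own.

```python
def hard_clip(samples, limit):
    """Flatten any sample whose amplitude exceeds the limit.
    This is what happens when a recording level is set too high."""
    return [max(-limit, min(limit, s)) for s in samples]

def noise_gate(samples, threshold):
    """Silence every sample whose amplitude is below the threshold."""
    return [s if abs(s) >= threshold else 0.0 for s in samples]

# Hypothetical sample values on a -1.0 .. 1.0 scale
clipped = hard_clip([0.2, 1.5, -2.0], 1.0)
gated = noise_gate([0.01, -0.5, 0.03, 0.8], 0.05)
```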
One thing about humans…
• We can actively “filter out” what we don’t want
to hear
– remember the cocktail party effect?
• Over time we don’t hear the pops and snaps
of a vinyl record
– Have you ever recorded something that you
thought would be good only to play it back and
hear the air conditioner or traffic roaring in the
background?
• A piece of software can’t do this…
– …not yet anyway!
Compressing sound files
• Take the opposite approach from the
one you took with images
– With images you can toss out the high
frequencies
– With audio you can’t… high frequency
changes are highly significant
Compressing sound: Voice
• Remove silence
– Similar to RLE
• Non-linear quantization
• “companding”
– Quiet sounds are represented in greater detail than loud
ones
• Mu-law (North America and Japan)
• A-law (Europe)
– Fits a dynamic range that would otherwise require 12 bits into 8
bits
– 4096 (2**12) ==> 256 (2**8)
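A sketch of mu-law companding using the continuous formula with mu = 255. Telephone codecs use a segmented approximation of this curve, so treat the code as an illustration of the idea rather than a bit-exact codec; the function names and the 0..255 code layout are assumptions.

```python
import math

MU = 255  # the value used for 8-bit mu-law

def mu_law_compress(x):
    """Map x in [-1, 1] to [-1, 1], stretching out the quiet values."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mu_law_expand(y):
    """Inverse of mu_law_compress."""
    return math.copysign(((1 + MU) ** abs(y) - 1) / MU, y)

def encode_8bit(x):
    """Compress, then quantize uniformly to one of 256 codes (0..255)."""
    y = mu_law_compress(x)
    return int(round((y + 1) / 2 * 255))

def decode_8bit(code):
    """Turn a code back into an approximate sample value."""
    y = code / 255 * 2 - 1
    return mu_law_expand(y)

quiet_codes = encode_8bit(0.02) - encode_8bit(0.01)   # codes spent on a quiet span
loud_codes = encode_8bit(1.00) - encode_8bit(0.99)    # codes spent on a loud span
```

The same 0.01-wide span of amplitude gets many more codes when it is quiet than when it is loud, which is the "greater detail for quiet sounds" described above.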
Compressing sound: Voice
• Differential Pulse Code Modulation (DPCM)
– Related to temporal (inter-frame) video compression
• It predicts what the next sample will be
• It sends the difference between the prediction and the actual sample rather than the absolute value
• Not as effective for sound as it is for images
• Adaptive DPCM (ADPCM)
– Dynamically varies the sample step size
• Large differences are encoded using large steps
• Small differences are encoded using small steps
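A minimal DPCM sketch, assuming the simplest possible predictor: the next sample is predicted to equal the previous one. The function names and sample values are made up for illustration, and the adaptive step size of ADPCM is not shown.

```python
def dpcm_encode(samples):
    """Predict each sample as the previous one and emit the prediction error."""
    previous = 0
    differences = []
    for sample in samples:
        differences.append(sample - previous)
        previous = sample
    return differences

def dpcm_decode(differences):
    """Rebuild the samples by accumulating the differences."""
    previous = 0
    samples = []
    for difference in differences:
        previous += difference
        samples.append(previous)
    return samples

# Hypothetical slowly changing signal
samples = [100, 102, 105, 104, 104, 101]
differences = dpcm_encode(samples)
print(differences)
```

After the first value, the differences are much smaller numbers than the samples themselves, so they can be stored in fewer bits.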
Sound compression that is based on perception
• The idea is to remove what doesn’t matter
• Based on the psycho-acoustic model
– Threshold of hearing
• Remove sounds too quiet to be heard
– High and low frequencies not as important (for
voice)
Questions?