Multimedia-Systems: Audio Ralf Steinmetz Dr. L.Wolf, Dr. S.Fischer

http://www.kom.e-technik.tu-darmstadt.de
http://www.ipsi.gmd.de
© Ralf Steinmetz
Multimedia-Systems:
Audio
Prof. Dr.-Ing. Ralf Steinmetz
Dr. L.Wolf, Dr. S.Fischer
TU Darmstadt - Darmstadt University of Technology,
Dept. of Electrical Engineering and Information Technology, Dept. of Computer Science
KOM - Industrial Process and System Communications, Tel.+49 6151 166151,
Merckstr. 25, D-64283 Darmstadt, Germany, Ralf.Steinmetz@KOM.tu-darmstadt.de Fax. +49 6151 166152
GMD - German National Research Center for Information Technology
IPSI - Integrated Publication and Information Systems Institute, Tel.+49 6151 869869
Dolivostr. 15, D-64293 Darmstadt, Germany, Ralf.Steinmetz@darmstadt.gmd.de Fax. +49 6151 869870
05-audio.fm 1 22.October.99
Scope
[Figure: scope of the lecture series, layered into Usage, Services, Systems and Basics. Topics shown: Applications, Learning & Teaching, Design, User Interfaces, Content Processing, Documents, Security, Synchronization, Group Communications, Databases, Media-Server, Programming, Operating Systems, Communications, Optical Memories, Quality of Service, Networks, Compression, Computer Architectures, Image & Graphics, Animation, Video, Audio]
Overview
1. Basic Knowledge: Physics of Acoustics
2. Analog Audio Technology
3. Computer Based Digital Audio - Advantages of Using Computers
4. Analog to Digital - Theory of Sampling
5. Music - Producing (MIDI) and Storing (CD, DAT, MD)
6. Speech
1. Basic Knowledge: Physics of Acoustics
Sound can be thought of in one of two ways:
• In the time domain
• In the frequency domain
• Transformation is accomplished by a Fourier transform
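[Figure: the same signal shown in the time domain (axis t, Time) and in the frequency domain (axis f, Frequency, with f0 marked)]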
Fourier:
• Any waveform can be represented by a sum of sine waves
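The relation between the two views can be sketched in a few lines of Python. This is only an illustration: the sampling rate of 64 Hz, the two sine frequencies (5 Hz and 12 Hz) and the helper name dft_magnitudes are assumptions chosen for the example, and the transform is a naive discrete Fourier transform rather than an optimized FFT.

```python
import cmath
import math

def dft_magnitudes(samples):
    """Return the normalized magnitude of every DFT bin for real samples."""
    n = len(samples)
    mags = []
    for k in range(n):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(samples))
        mags.append(abs(acc) / n)
    return mags

# Time domain: one second sampled at 64 Hz, built as a sum of two sine waves.
RATE = 64  # illustrative value, so that bin k corresponds to k Hz
signal = [1.0 * math.sin(2 * math.pi * 5 * t / RATE)
          + 0.5 * math.sin(2 * math.pi * 12 * t / RATE)
          for t in range(RATE)]

# Frequency domain: the two strongest bins below RATE/2 are the sine frequencies.
mags = dft_magnitudes(signal)
peaks = sorted(range(RATE // 2), key=lambda k: mags[k], reverse=True)[:2]
print(sorted(peaks))  # [5, 12]
```

Each bin magnitude is half the amplitude of the corresponding sine wave, because a real sine splits its energy between a positive and a negative frequency bin.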
Psycho-acoustics: Amplitude Response and Frequency
Fletcher-Munson curves show:
• particularly sensitive response between 1 kHz and 6 kHz
• this effect is exploited for coding and compression
[Figure: Fletcher-Munson curves; x-axis: Frequency (Hz), 20 Hz to 10 kHz; y-axis: Sound Pressure Level (dB)]
Masking Threshold in Frequency Domain
[Figure: masking patterns for maskers at fm = 0.25 kHz, 1 kHz and 4 kHz, together with the absolute threshold of hearing; x-axis: frequency (kHz), 0.02 to 20; y-axis: Sound Pressure Level (dB)]
• masker: narrowband random noise
• mean frequencies 250 Hz, 1 kHz and 4 kHz
• related bandwidths 100 Hz, 160 Hz and 700 Hz, respectively
i.e., the width of the masked region depends on frequency
Masking Threshold in Frequency Domain
[Figure: masking patterns of a narrowband noise masker at several levels up to 100 dB; x-axis: frequency (kHz), 0.02 to 20; y-axis: Sound Pressure Level (dB)]
• masker: narrowband random noise
i.e., the width of the masked region depends on amplitude
Masking in Frequency Domain
[Figure: masking patterns of a sine-wave masker at several levels up to 90 dB; x-axis: frequency (kHz), 0.02 to 20; y-axis: SPL (dB)]
Comparison: Sine waves vs. random noise (used before)
• depends on frequency
• i.e., similar to narrowband random noise
Masking in Time Domain
[Figure: regions of pre-masking, simultaneous masking and post-masking around a masker; x-axis: time (ms); y-axis: level (dB)]
• occurs before and after the masking event
• depends (to some extent) on amplitude
2. Analog Audio Technology
Well understood and widely used
Adequate quality for recorded productions in the past
Input/output:
• Microphones / speakers
• Tape recorders
• Connectors, amplifiers, and mixers
• Metering, signal levels
Processing:
• Dolby B/C, dbx
• Effect units
Not covered here in detail; we concentrate on digital systems.
3. Computer Based Digital Audio - Advantages of Using Computers
Motivation:
• Direct access, storage and compression of data
• User interface often better than with dedicated equipment
• Use of data in multimedia environment
• Conversion into different formats
• Exchange with different users / sites / machines
• Access to digital audio by addition of simple inexpensive cards
[Figure: example applications: score printing, music education, control of MIDI instruments, direct-to-disk recording]
Input Devices: Sound Cards
• Sample rates: 44.1 kHz, 22 kHz and 11 kHz
• Resolution: 16 and 32 bit
• Frequency response: 19 kHz +/- 0.5 dB, 20 kHz +0/-3 dB
• Software: record & edit, file format .wav
• Audio converters: A/D 64x sample; D/A 8x interpolation, 64x oversample
• MIDI: IN, OUT, THRU
• Synthesizer: 4M ROM, 126 instruments, 32 voices, 22 kHz and 11 kHz
4. Analog to Digital - Theory of Sampling
From analog to digital:
1. Sample: Measure signal value
2. Hold: Store value temporarily
3. Code / Digitize: Represent sampled value by an integer number
[Figure: analog signal converted into a digital signal]
Steps in sampling a source are usually:
• Removal of frequencies greater than upper limit (low-pass filter)
• Conversion to digital form with an AD converter (digitization)
• Assignment of values into discrete levels (quantization)
Important factors:
• Sampling rate: Number of sampled values per second
• Quantization depth: Number of bits per digitized value
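The two factors above can be made concrete with a minimal sketch. It assumes the analog signal is available as a Python function of time with values in [-1, 1]; the function name sample_and_quantize, the 1 kHz test tone and the chosen rate and depth are illustrative, and the low-pass filter step is omitted.

```python
import math

def sample_and_quantize(signal, num_samples, sampling_rate, bits):
    """Sample signal(t) at the given rate and code each value as an
    integer in 0 .. 2**bits - 1 (uniform quantization)."""
    levels = 2 ** bits
    codes = []
    for n in range(num_samples):
        value = signal(n / sampling_rate)        # sample (and hold) one value
        value = max(-1.0, min(1.0, value))       # clip to the converter range
        codes.append(round((value + 1.0) / 2.0 * (levels - 1)))  # code as integer
    return codes

def tone(t):
    """Illustrative analog source: a 1 kHz sine wave."""
    return math.sin(2 * math.pi * 1000 * t)

# 8 samples at 8 kHz with 8 bits per sample cover one period of the tone.
codes = sample_and_quantize(tone, num_samples=8, sampling_rate=8000, bits=8)
print(codes)
```

Raising the sampling rate yields more codes per second; raising the quantization depth yields a wider integer range per code.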
Theory of Sampling (cont.)
Sampling rate determined by properties of recorded sound:
• Nyquist: “For lossless digitization, the sampling rate should be at least twice the maximum frequency contained in the signal”
• Music typically extends from 20 Hz to 20 kHz
• Speech 100 Hz to 10 kHz,
major energy in band from 200 Hz to 4 kHz
Quantization depth determined by desired sound quality:
• Typically 8 (256 levels) or
• 16 (65,536 levels)
Samples always taken per channel: 2x for stereo
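What goes wrong below the Nyquist rate can be shown with a small sketch. The rate of 8 kHz and the tones at 2 kHz and 6 kHz are assumed example values, and the helper names are hypothetical. A 6 kHz tone sampled at 8 kHz produces the same sample values as a 2 kHz tone (with inverted sign), so the two cannot be told apart after sampling.

```python
import math

def sample_tone(freq_hz, sampling_rate, num_samples):
    """Sample a pure sine tone of the given frequency."""
    return [math.sin(2 * math.pi * freq_hz * n / sampling_rate)
            for n in range(num_samples)]

def alias_frequency(freq_hz, sampling_rate):
    """Apparent frequency of a pure tone after sampling, folded into
    the range 0 .. sampling_rate / 2."""
    f = freq_hz % sampling_rate
    return min(f, sampling_rate - f)

RATE = 8000                                # Nyquist limit is RATE / 2 = 4 kHz
in_band = sample_tone(2000, RATE, 16)      # below the limit
too_high = sample_tone(6000, RATE, 16)     # above the limit

# The 6 kHz samples equal the 2 kHz samples with inverted sign.
indistinguishable = all(abs(a + b) < 1e-9 for a, b in zip(in_band, too_high))
print(indistinguishable, alias_frequency(6000, RATE))  # True 2000
```

This is why frequencies above the upper limit are removed by a low-pass filter before the signal is sampled.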
Audio Quality of Common Appliances
Audio Device    Frequency Response (Bandwidth)   Signal-to-Noise Ratio   Total Harmonic Distortion
CD              20 Hz - 20,000 Hz                98 dB                   0.005%
Cassette tape   20 Hz - 17,000 Hz                75 dB                   0.01%
FM Radio        20 Hz - 15,000 Hz                75 dB                   0.01%
AM Radio        50 Hz - 5,000 Hz                 60 dB                   0.1%
Telephone       300 Hz - 3,400 Hz                42 dB                   Poor
Popular Sampling Rates
Sampling Rate (Hz)   Used As...
8000                 Telephony Standard, Popular in UNIX Workstations
11000                Quarter of CD rate, Popular on Macintosh
16000                G.722 Standard (Federal Standard)
18900                CD-ROM XA Rate
22000                Half CD rate, Macintosh rate
32000                Japanese HDTV, British TV audio, Long play DAT
37800                CD XA Standard
44056                Professional audio industry
44100                CD Rate
48000                DAT Rate
Coding Methods (1)
Coding = representation of sampled values by integers / bits
Common coding methods are:
• PCM (Pulse Code Modulation)
• integer value = (quantized) sampled value
• simple but requires high number of bits
• DPCM (Differential PCM)
• integer value = difference between current value and predicted value
• prediction based on previous values
• requires fewer bits than PCM for same quality
• DM (Delta Modulation)
• as DPCM but only differences of 1 and -1 allowed
• requires minimal number of bits but quality can be poor
• ADPCM (Adaptive Differential PCM):
• as DPCM but adapts predictor to signal characteristics
• also adapts width of quantization steps to signal characteristics
• better quality than DPCM with same storage requirements
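The difference-based methods can be sketched as follows. This is a simplified illustration, not a standardized codec: the predictor is assumed to be simply the previous sample, the DPCM differences are left unquantized (so decoding is exact), and the function names and sample values are made up for the example.

```python
def dpcm_encode(samples):
    """DPCM: code each sample as the difference to the predicted value.
    Assumption: the prediction is the previous sample (0 at the start)."""
    prediction = 0
    diffs = []
    for s in samples:
        diffs.append(s - prediction)
        prediction = s
    return diffs

def dpcm_decode(diffs):
    """Rebuild the samples by accumulating the differences."""
    prediction = 0
    samples = []
    for d in diffs:
        prediction += d
        samples.append(prediction)
    return samples

def dm_encode(samples, step=1):
    """Delta modulation: only +1 or -1 is sent per sample; the running
    approximation follows the signal in fixed steps."""
    approx = 0
    bits = []
    for s in samples:
        bit = 1 if s >= approx else -1
        approx += bit * step
        bits.append(bit)
    return bits

pcm = [100, 102, 105, 104, 101, 101]
diffs = dpcm_encode(pcm)
print(diffs)                       # [100, 2, 3, -1, -3, 0]
print(dpcm_decode(diffs) == pcm)   # True
print(dm_encode([0, 1, 2, 3, 3, 2]))  # [1, 1, 1, 1, -1, -1]
```

After the first value the differences are small numbers that fit into fewer bits than the PCM values themselves. Delta modulation needs only one bit per sample but cannot follow steep changes, which is where its quality suffers.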
Coding Methods (2)
Companding Methods:
• ‘companding’: compress during record, expand during playback
• µ-Law, A-Law
• logarithmic quantization (reduce noise for low volume signals)
• linear PCM with 14 / 13 bit then table lookup yields 8 bit companded value
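The logarithmic characteristic can be sketched with the continuous µ-law curve, using µ = 255. Note the assumptions: this is the textbook formula rather than the segmented table lookup used in practice, samples are normalized to [-1, 1], and the helper names and the uniform 8-bit coding step are illustrative.

```python
import math

MU = 255  # parameter of the continuous mu-law curve

def mu_law_compress(x):
    """Map a sample x in [-1, 1] to a companded value in [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mu_law_expand(y):
    """Inverse mapping, applied during playback."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

def to_8bit(value):
    """Uniformly quantize a value in [-1, 1] to a signed 8-bit code."""
    return round(value * 127)

for x in (0.01, 0.1, 1.0):
    y = mu_law_compress(x)
    # Linear code versus companded code for the same input sample.
    print(x, to_8bit(x), to_8bit(y))
```

A quiet sample such as 0.01 lands on a much larger code after companding than with linear coding, so small signals are spread over more quantization levels while large signals share fewer.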
[Figure: output signal versus input signal (both axes 0 to 100) for the PCM mapping and the µ-Law/A-Law mapping; the µ-Law/A-Law mapping gives increased resolution for small signals and decreased resolution for large signals]
Comparison of Sampling/Coding Techniques
Mode    Bits   Samples per sec   Noise level   Freq. (Hz)   Mono storage (bytes/sec)   Stereo storage (bytes/sec)
PCM     16     44,100            v. low        20 - 20K     88,200                     176,400
A-Law   8      44,100            low           20 - 20K     44,100                     N/A
ADPCM   16     high              low           20 - 18K     22,050                     N/A
PCM     8      22,050            low           20 - 9.2K    22,050                     44,100
A-Law   8      22,050            low           20 - 9.2K    22,050                     44,100
ADPCM   16     music             low           20 - 7K      11,025                     22,050
PCM     8      11,025            high          20 - 4.5K    11,025                     22,050
A-Law   8      11,025            low           20 - 4.5K    11,025                     22,050
ADPCM   16     speech            low           20 - 3K      5,500                      N/A
A-Law   16     8,000             low           20 - 3K      8,000                      16,000
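The storage figures for the uncompressed PCM rows follow directly from sampling rate, bits per sample and number of channels. A minimal sketch of that arithmetic (the function name is illustrative):

```python
def storage_bytes_per_second(sampling_rate, bits_per_sample, channels=1):
    """Uncompressed storage rate: samples/sec x bits/sample x channels / 8."""
    return sampling_rate * bits_per_sample * channels // 8

print(storage_bytes_per_second(44100, 16))     # 88200  (16-bit PCM, mono)
print(storage_bytes_per_second(44100, 16, 2))  # 176400 (16-bit PCM, stereo)
print(storage_bytes_per_second(22050, 8, 2))   # 44100  (8-bit PCM, stereo)

# One minute of 16-bit stereo at 44.1 kHz:
minute = 60 * storage_bytes_per_second(44100, 16, 2)
print(minute)  # 10584000
```

The ADPCM rows store fewer bytes than this formula gives for the same sampling rate, because only coded differences are kept.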
5. Music - Producing (MIDI) and Storing (CD, DAT, MD)
Musical Instrument Digital Interface (MIDI) is:
• Bidirectional
• Device independent
• Resolution independent
Transmits:
• Events, not audio data
• Quicker, less storage needed
• Each note assigned a discrete value
• Other properties can also be coded
• 16 channels available
• Sending
• Receiving and passing on data
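To illustrate what "events" means, the sketch below builds MIDI Note On and Note Off channel messages as raw bytes: a status byte carrying the message type and one of the 16 channels, followed by the note number and the velocity. The helper names are illustrative, and the sketch only builds the bytes; it does not talk to any MIDI device.

```python
NOTE_ON = 0x90   # status nibble for Note On
NOTE_OFF = 0x80  # status nibble for Note Off

def _channel_message(status, channel, note, velocity):
    """Pack a 3-byte channel voice message (channel 0..15, data 0..127)."""
    if not 0 <= channel <= 15:
        raise ValueError("channel must be 0..15")
    if not (0 <= note <= 127 and 0 <= velocity <= 127):
        raise ValueError("note and velocity must be 0..127")
    return bytes([status | channel, note, velocity])

def note_on(channel, note, velocity):
    return _channel_message(NOTE_ON, channel, note, velocity)

def note_off(channel, note, velocity=0):
    return _channel_message(NOTE_OFF, channel, note, velocity)

msg = note_on(channel=0, note=60, velocity=100)  # note 60 is middle C
print(msg.hex())                # 903c64
print(note_off(0, 60).hex())    # 803c00
```

A complete note costs two 3-byte messages regardless of how long it sounds, whereas one second of 16-bit stereo audio at 44.1 kHz takes 176,400 bytes.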
Media for Digital Audio
Compact Disc Digital Audio (CD-DA):
• Optical disk technology developed jointly by Philips and Sony
Philips Digital Compact Cassette (DCC):
• Cassette format digital tape - writable
• Suffers from longer seek times than DAT
Digital Audio Tape (DAT):
• 2 track format - smaller than a conventional cassette
• Quality as good as CD-DA
• Often DAT players have digital input/output
• Popularity hampered by SCMS (Serial Copy Management System)
• Suitable for data backup, e.g. on Silicon Graphics Iris
• Long seek times
Sony MiniDisc (MD):
• 2.5 inch writable optical disc
• 74 minutes storage possible at 44.1 kHz with compression
• Quality slightly less than CD-DA and DCC
Recording Market
Motivation:
• Better quality recording needed:
• Inadequate quality after 3 or 4 ‘bounces’
• Hassle-free mix down of tracks:
• Constantly reorganising track layout
• Overdubbing a small portion is troublesome
• Import/export from samplers easier
Features:
• 8 track format on popular media:
• S-VHS tape (Alesis), Hi-8 video tape (Tascam)
• Often expandable to 32/64 tracks by the addition of other units
• Real-time transfer of data between units
6. Speech
Speech:
• As produced and perceived by humans
History:
• 1850 - Helmholtz modelled vocal tract by mechanical resonators
• 1940 - First synthesis of speech using electrical oscillations
Issues:
• Output:
• By playback
• By synthesis
• Input:
• By recognition
• Transmission
Speech Synthesis including Playback
Requirements:
• Real-time output
• Understandable and natural
• ‘Infinite’ vocabulary (for most applications)
Techniques:
• Reproduction of recorded speech (finite vocabulary)
• Assembly
Speech Playback Terminology
Dissection of text:
• Phoneme: smallest unit of speech (~40)
• Allophone: phoneme in its environment
• Morpheme: smallest unit with unique meaning
• Voiced sound (m,l,w ...)
• Voiceless sound (f,s,p)
[Figure: hierarchy of units, from largest to smallest: Text, Sentence, Clause, Word, Syllable (20000), Diphone (1400), Phone (40) / Allophone]
Example
Word: Strolch
• Phonemes: S, t, r, O, l, c
• Diphones: -S, St, tr, rO, Ol, lc, c-
• Half syllables: StrO, lc
• Syllable: StrOlc
• Structure: consonant (Str), vowel (O), consonant (lc)
Speech Playback Problems
“Assembly” technique:
• Assemble single units (phonemes, diphones, ...)
• Character set analogy
Problems of speech synthesis:
• Co-articulation:
• Strongly influenced by the previous and succeeding elements
• Inertia of the vocal system
• Prosody (the study of speaking style):
• Depends on semantics
• Pronunciation:
• How a word is said can change the perceived meaning
• Example (German): “Wachs-tube” (wax tube) and “Wach-stube” (guard room)
Speech Synthesis in the Frequency Domain
Principle:
• Assembly of formants
[Figure: processing chain Text, Transcription, Phonetic Language, Synthesis, Speech; supporting components: Dictionary, Phonetic Library, Interactive Improvements]
Problems:
• Similar to speech synthesis in the time domain
Speech Input in General
Speech input:
• Speaker Detection/Recognition (Who?)
• Authentication (Verification)
• Identification
• Speech (What?)
• ... (How?, e.g. to detect truth)
Speech Recognition
[Figure: components of speech recognition: Speech, then acoustic and phonetic analysis (using sound patterns and word models), then syntactic analysis (using syntax), then semantic analysis (using semantics), yielding the recognised sentence / text]
Speech Transmission
[Figure: quality of speech (very bad; bad, sufficient; acceptable; good; excellent) versus data rate of encoded signal (1 to 64 kbps), with curves for 1980, 1990 and 2000]
Methods:
• Signal coding
• Parametric coding
• Recognition & synthetic reproduction
Speech Transmission
Signal coding:
• Characteristics:
• Simple technique
• Only signal coded
• No inherent knowledge of speech properties
• Telephone example:
• Normal signal: 64 kbit/sec
• DPCM: 56 kbit/sec
• ADPCM: 32 kbit/sec
Speech Transmission
Parametric Coding:
• Speech properties taken into account
• Extension of sub-band coding
• Mobile radio example:
• 13 kbit/sec
• Working towards 6 kbit/sec
Recognition & synthetic reproduction:
• Input words are recognised by an intelligent system
• ‘Codes’ representing the speech are transmitted
• Reconstructed at the other end by synthesis
• Data rate as low as 50 bit/sec
• Quality not acceptable (today)