http://www.kom.e-technik.tu-darmstadt.de http://www.ipsi.gmd.de © Ralf Steinmetz

Multimedia-Systems: Audio

Prof. Dr.-Ing. Ralf Steinmetz
Dr. L. Wolf, Dr. S. Fischer

TU Darmstadt - Darmstadt University of Technology,
Dept. of Electrical Engineering and Information Technology, Dept. of Computer Science
KOM - Industrial Process and System Communications, Tel. +49 6151 166151,
Merckstr. 25, D-64283 Darmstadt, Germany, Ralf.Steinmetz@KOM.tu-darmstadt.de, Fax. +49 6151 166152

GMD - German National Research Center for Information Technology
IPSI - Integrated Publication and Information Systems Institute, Tel. +49 6151 869869,
Dolivostr. 15, D-64293 Darmstadt, Germany, Ralf.Steinmetz@darmstadt.gmd.de, Fax. +49 6151 869870

05-audio.fm, 22.October.99

Scope

[Figure: scope of the lecture, organized in the layers Usage, Services, Systems and Basics; topics: applications, learning & teaching, content processing, documents, design, security, optical memories, group communications, synchronization, databases, media server, user interfaces, programming, operating systems, communications, quality of service, networks, compression, computer architectures, image & graphics, animation, video, audio]

Overview

1. Basic Knowledge: Physics of Acoustics
2. Analog Audio Technology
3. Computer-Based Digital Audio - Advantages of Using Computers
4. Analog to Digital - Theory of Sampling
5. Music - Producing (MIDI) and Storing (CD, DAT, MD)
6. Speech
1. Basic Knowledge: Physics of Acoustics

Sound can be described in one of two ways:
• In the time domain (amplitude as a function of time t)
• In the frequency domain (amplitude as a function of frequency f)
• The transformation between the two is accomplished by a Fourier transform

[Figure: the same signal shown over time t and as a spectrum over frequency f, with fundamental frequency f0]

Fourier:
• Any periodic waveform can be represented by a sum of sine waves

Psychoacoustics: Amplitude Response and Frequency

[Figure: Fletcher-Munson equal-loudness curves; sound pressure level (dB) vs. frequency (Hz)]

Fletcher-Munson curves show:
• The ear is particularly sensitive between 1 kHz and 6 kHz
• This effect is exploited for coding and compression

Masking Threshold in Frequency Domain

[Figure: masking patterns of narrowband random noise and the absolute threshold of hearing; sound pressure level (dB) vs. frequency (kHz)]

• Masker: narrowband random noise
• Center frequencies 250 Hz, 1 kHz and 4 kHz
• Corresponding bandwidths 100 Hz, 160 Hz and 700 Hz, i.e., the width of the masking pattern depends on frequency

Masking Threshold in Frequency Domain (cont.)

[Figure: masking patterns of narrowband random noise at several masker levels; sound pressure level (dB) vs. frequency (kHz)]

• Masker: narrowband random noise
• i.e., the width of the masking pattern depends on amplitude

Masking in Frequency Domain

[Figure: masking patterns of sine-wave maskers; sound pressure level (dB) vs. frequency (kHz)]

Comparison: sine waves vs. random noise (used before)
• Masking depends on frequency
• i.e., similar to narrowband random noise
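The Fourier view introduced at the start of this section can be made concrete with a naive discrete Fourier transform (DFT) computed directly from its definition (an FFT gives the same result faster). The test signal is an assumed example: one second of a 5 Hz sine plus a half-amplitude 12 Hz sine, sampled 64 times per second. In the frequency domain the two largest magnitudes appear exactly at 5 Hz and 12 Hz.

```python
import cmath
import math

def dft_magnitudes(samples):
    """Naive O(N^2) DFT from the definition; returns |X[k]| for k = 0..N/2."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(samples)))
        for k in range(n // 2 + 1)
    ]

# Time domain: one second sampled at 64 Hz (assumed example values)
fs = 64
signal = [
    math.sin(2 * math.pi * 5 * i / fs) + 0.5 * math.sin(2 * math.pi * 12 * i / fs)
    for i in range(fs)
]

# Frequency domain: with a one-second window, bin k corresponds to k Hz
mags = dft_magnitudes(signal)
peaks = sorted(range(len(mags)), key=lambda k: mags[k], reverse=True)[:2]
print(sorted(peaks))                          # [5, 12]
print(round(mags[5], 6), round(mags[12], 6))  # 32.0 16.0 (= N * amplitude / 2)
```

All other bins are numerically zero, so the sum of two sine waves is fully described by two spectral lines.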
Masking in Time Domain

[Figure: pre-masking, simultaneous masking and post-masking around a masker; sound level (dB) vs. time (ms)]

• Masking occurs after and before the event (post-masking and pre-masking)
• Depends (to some extent) on amplitude

2. Analog Audio Technology

Well understood and widely used; adequate quality for recorded productions in the past.

Input/output:
• Microphones / speakers
• Tape recorders
• Connectors, amplifiers, and mixers
• Metering, signal levels

Processing:
• Dolby B/C, dbx noise reduction
• Effect units

Not covered here in detail; we concentrate on digital systems.

3. Computer-Based Digital Audio - Advantages of Using Computers

Motivation:
• Direct access, storage and compression of data
• User interface often better than with dedicated equipment
• Use of data in a multimedia environment
• Conversion into different formats
• Exchange with different users / sites / machines
• Access to digital audio by adding simple, inexpensive cards

Applications: score printing, music education, control of MIDI instruments, direct-to-disk recording

Input Devices: Sound Cards

• Sample rates: 44.1 kHz, 22 kHz and 11 kHz
• Resolution: 16 and 32 bit
• Frequency response: 19 kHz ±0.5 dB, 20 kHz +0/-3 dB
• Software: record & edit
• File formats: .wav
• Audio converters: A/D with 64x oversampling; D/A with 8x interpolation and 64x oversampling
• MIDI IN, OUT, THRU
• Synthesizer: 4M ROM, 126 instruments, 32 voices
4. Analog to Digital - Theory of Sampling

From analog to digital:
1. Sample: measure the signal value
2. Hold: store the value temporarily
3. Code / digitize: represent the sampled value by an integer number

[Figure: an analog signal converted into a digital signal]

Steps in sampling a source are usually:
• Removal of frequencies above the upper limit (low-pass filter)
• Conversion to digital form with an A/D converter (digitization)
• Assignment of values to discrete levels (quantization)

Important factors:
• Sampling rate: number of sampled values per second
• Quantization depth: number of bits per digitized value

Theory of Sampling (cont.)

Sampling rate is determined by the properties of the recorded sound:
• Nyquist: for lossless digitization, the sampling rate must be more than twice the highest frequency contained in the signal
• Music typically extends from 20 Hz to 20 kHz, so it needs a sampling rate above 40 kHz; the CD rate of 44.1 kHz satisfies this
• Speech: 100 Hz to 10 kHz, with the major energy in the band from 200 Hz to 4 kHz

Quantization depth is determined by the desired sound quality:
• Typically 8 bits (256 levels) or
• 16 bits (65,536 levels)

Samples are always taken per channel: 2x for stereo

Audio Quality of Common Appliances

Audio device    Frequency response (bandwidth)  Signal-to-noise ratio  Total harmonic distortion
CD              20 Hz - 20,000 Hz               98 dB                  0.005%
Cassette tape   20 Hz - 17,000 Hz               75 dB                  0.01%
FM radio        20 Hz - 15,000 Hz               75 dB                  0.01%
AM radio        50 Hz - 5,000 Hz                60 dB                  0.1%
Telephone       300 Hz - 3,400 Hz               42 dB                  poor
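The sampling steps and the two important factors (sampling rate, quantization depth) can be sketched in a few lines. This is a minimal illustration under assumptions: an ideal sine serves as the analog source, a simple uniform quantizer maps [-1, 1] onto 2^bits integer codes, and no anti-aliasing low-pass filter is applied, so `alias_frequency` shows what that filter prevents. All function names are hypothetical.

```python
import math

def sample_sine(freq_hz, rate_hz, n):
    """Sample (measure and hold) n values of a unit-amplitude sine wave."""
    return [math.sin(2 * math.pi * freq_hz * i / rate_hz) for i in range(n)]

def quantize(value, bits):
    """Uniform quantizer: map a value in [-1, 1] to one of 2**bits integer codes."""
    levels = 2 ** bits
    code = round((value + 1) / 2 * (levels - 1))  # scale to [0, levels - 1], round to nearest level
    return max(0, min(levels - 1, code))          # clip values outside [-1, 1]

def alias_frequency(freq_hz, rate_hz):
    """Apparent frequency of a pure tone after sampling (folded into 0..rate/2)."""
    f = freq_hz % rate_hz
    return min(f, rate_hz - f)

def bytes_per_second(rate_hz, bits, channels):
    """Storage for uncompressed PCM: samples are taken per channel."""
    return rate_hz * bits * channels // 8

codes = [quantize(x, 8) for x in sample_sine(1_000, 8_000, 8)]
print(codes)                              # one period of a 1 kHz tone as 8-bit codes
print(2 ** 8, 2 ** 16)                    # 256 65536 quantization levels
print(bytes_per_second(44_100, 16, 1))    # 88200  (mono, CD quality)
print(bytes_per_second(44_100, 16, 2))    # 176400 (stereo)
print(alias_frequency(5_000, 8_000))      # 3000
```

A 5 kHz tone sampled at the 8 kHz telephony rate is indistinguishable from a 3 kHz tone, which is why frequencies above half the sampling rate are removed before digitization. The storage figures agree with the 16-bit PCM entries in the comparison of sampling/coding techniques.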
Popular Sampling Rates

Sampling rate (Hz)  Used as ...
8000                Telephony standard, popular in UNIX workstations
11025               Quarter of CD rate, popular on Macintosh (often quoted as 11 kHz)
16000               G.722 standard (Federal Standard)
18900               CD-ROM XA rate
22050               Half CD rate, Macintosh rate (often quoted as 22 kHz)
32000               Japanese HDTV, British TV audio, long-play DAT
37800               CD-XA standard
44056               Professional audio industry
44100               CD rate
48000               DAT rate

Coding Methods (1)

Coding = representation of sampled values by integers / bits

Common coding methods are:
• PCM (Pulse Code Modulation)
  • integer value = (quantized) sampled value
  • simple, but requires a high number of bits
• DPCM (Differential PCM)
  • integer value = difference between current value and predicted value
  • prediction based on previous values
  • requires fewer bits than PCM for the same quality
• DM (Delta Modulation)
  • as DPCM, but only differences of +1 and -1 allowed
  • requires the minimal number of bits, but quality can be poor
• ADPCM (Adaptive Differential PCM)
  • as DPCM, but adapts the predictor to signal characteristics
  • also adapts the width of the quantization steps to signal characteristics
  • better quality than DPCM with the same storage requirements

Coding Methods (2)

Companding methods:
• 'Companding': compress during recording, expand during playback
• µ-law, A-law
  • logarithmic quantization (reduces noise for low-volume signals)
  • linear PCM with 14 bits (µ-law) / 13 bits (A-law), then a table lookup yields the 8-bit companded value

[Figure: output vs. input signal for linear PCM mapping and for µ-law / A-law mapping; the companded mapping gives increased resolution for small signals and decreased resolution for large signals]
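The coding methods above can be sketched as follows. Assumptions: DPCM uses the simplest predictor (the previous sample) and leaves the differences unquantized, so it is lossless here, whereas a real coder quantizes the differences with fewer bits; DM uses a fixed step size; the µ-law function is the continuous characteristic with µ = 255, while G.711 codecs use a segmented approximation implemented by table lookup. ADPCM's adaptation of predictor and step size is not shown. All function names are hypothetical.

```python
import math

def dpcm_encode(samples):
    """DPCM with a previous-sample predictor; stores only the prediction error."""
    predicted = 0
    diffs = []
    for s in samples:
        diffs.append(s - predicted)
        predicted = s
    return diffs

def dpcm_decode(diffs):
    """Invert dpcm_encode by accumulating the differences."""
    value = 0
    out = []
    for d in diffs:
        value += d
        out.append(value)
    return out

def dm_encode(samples, step):
    """Delta modulation: 1 bit per sample, the approximation moves by +step or -step."""
    approx = 0
    bits, recon = [], []
    for s in samples:
        bit = 1 if s > approx else 0
        approx += step if bit else -step
        bits.append(bit)
        recon.append(approx)
    return bits, recon

def mu_law_compress(x, mu=255):
    """Continuous mu-law characteristic for x in [-1, 1]."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)

def mu_law_expand(y, mu=255):
    """Inverse of mu_law_compress."""
    return math.copysign(((1 + mu) ** abs(y) - 1) / mu, y)

samples = [100, 102, 105, 107, 106, 104]
print(dpcm_encode(samples))                   # [100, 2, 3, 2, -1, -2]
print(dm_encode([0, 1, 3, 6, 8, 8], step=2))  # ([0, 1, 1, 1, 1, 1], [-2, 0, 2, 4, 6, 8])
print(mu_law_compress(0.01))                  # small input, large share of the output range
```

The delta-modulation output lags behind the steep rise (slope overload), one reason its quality can be poor. The µ-law curve maps an input of 1% of full scale to about 23% of the output range, which is the increased resolution for small signals described above.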
Comparison of Sampling/Coding Techniques

Mode   Bits/sample  Samples/sec  Noise level  Freq. (Hz)  Mono storage (bytes/s)  Stereo storage (bytes/s)
PCM    16           44,100       very low     20 - 20K    88,200                  176,400
A-Law  8            44,100       low          20 - 20K    44,100                  N/A
ADPCM  16           high         low          20 - 18K    22,050                  N/A
PCM    8            22,050       low          20 - 9.2K   22,050                  44,100
A-Law  8            22,050       low          20 - 9.2K   22,050                  44,100
ADPCM  16           music        low          20 - 7K     11,025                  22,050
PCM    8            11,025       high         20 - 4.5K   11,025                  22,050
A-Law  8            11,025       low          20 - 4.5K   11,025                  22,050
ADPCM  16           speech       low          20 - 3K     5,500                   N/A
A-Law  16           8,000        low          20 - 3K     8,000                   16,000

5. Music - Producing (MIDI) and Storing (CD, DAT, MD)

Musical Instrument Digital Interface (MIDI) is:
• Bidirectional
• Device independent
• Resolution independent

Transmits:
• Events, not sampled audio data
• Quicker, less storage needed
• Each note assigned a discrete value
• Other properties can also be coded
• 16 channels available
• Sending, receiving and passing on data (MIDI OUT, IN, THRU)

Media for Digital Audio

Compact Disc Digital Audio (CD-DA):
• Optical disc technology developed jointly by Philips and Sony

Philips Digital Compact Cassette (DCC):
• Cassette-format digital tape, writable
• Suffers from longer seek times than DAT

Digital Audio Tape (DAT):
• 2-track format, smaller than a conventional cassette
• Quality as good as CD-DA
• DAT players often have digital input/output
• Popularity hampered by SCMS (Serial Copy Management System)
• Suitable for data backup, e.g. on Silicon Graphics Iris
• Long seek times

Sony MiniDisc:
• 2.5-inch writable optical disc
• 74 minutes of storage possible at 44.1 kHz with compression
• Quality slightly lower than CD-DA and DCC
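To make "events, not sampled audio data" concrete, the following sketch builds raw MIDI 1.0 channel messages and compares their size with sampled audio as stored on CD-DA or DAT. A Note On has status byte 0x9n and a Note Off 0x8n, where n (0..15) selects one of the 16 channels, followed by two 7-bit data bytes (note number and velocity). The helper names are hypothetical, timing is ignored (a sequencer or MIDI file adds it separately), and the pitch formula assumes the common convention that note 69 is A4 = 440 Hz in equal temperament.

```python
def _channel_message(status, channel, note, velocity):
    """Build a 3-byte MIDI channel message; channel 1..16 is encoded as 0..15."""
    if not 1 <= channel <= 16:
        raise ValueError("MIDI provides 16 channels (1..16)")
    if not (0 <= note <= 127 and 0 <= velocity <= 127):
        raise ValueError("MIDI data bytes are 7-bit values (0..127)")
    return bytes([status | (channel - 1), note, velocity])

def note_on(channel, note, velocity):
    return _channel_message(0x90, channel, note, velocity)

def note_off(channel, note, velocity=64):
    return _channel_message(0x80, channel, note, velocity)

def note_frequency(note):
    """Equal-tempered pitch, assuming note 69 = A4 = 440 Hz."""
    return 440.0 * 2 ** ((note - 69) / 12)

# Middle C (note 60) on channel 1: start and end events (timing not included)
events = note_on(1, 60, 100) + note_off(1, 60)
print(events.hex())                  # 903c64803c40
print(len(events))                   # 6
# One second of CD-quality stereo PCM: 44,100 samples/s x 16 bit x 2 channels
print(44_100 * 16 * 2 // 8)          # 176400
print(round(note_frequency(60), 2))  # 261.63
```

Six bytes describe the note, whereas one second of the same sound as CD-quality stereo PCM needs 176,400 bytes; the price is that the actual sound is produced by the receiving synthesizer.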
Recording Market

Motivation:
• Better-quality recording needed:
  • inadequate quality after 3-4 'bounces'
• Hassle-free mixdown of tracks:
  • constant re-organizing of the track layout
  • overdubbing a small portion is troublesome
• Import/export from samplers easier

Features:
• 8-track format on popular media:
  • S-VHS tape (Alesis), Hi-8 video tape (Tascam)
• Often expandable to 32/64 tracks by adding further units
• Real-time transfer of data between units

6. Speech

Speech:
• As produced and perceived by humans

History:
• 1850 - Helmholtz modelled the vocal tract with mechanical resonators
• 1940 - First synthesis of speech using electrical oscillations

Issues:
• Output:
  • by playback
  • by synthesis
• Input:
  • by recognition
• Transmission

Speech Synthesis including Playback

Requirements:
• Real-time output
• Understandable and natural
• 'Infinite' vocabulary (for most applications)

Techniques:
• Reproduction of recorded speech (finite vocabulary)
• Assembly

Speech Playback Terminology

Dissection of text:
• Phoneme: smallest unit of speech (~40)
• Allophone: a phoneme as realized in its environment
• Morpheme: smallest unit with a unique meaning
• Voiced sound (m, l, w, ...)
• Voiceless sound (f, s, p)

[Figure: hierarchy of units: text, sentence, clause, word, syllable (~20,000), diphone (~1,400), phone (~40), allophone]

Example: "Strolch"

Phonemes:        S  t  r  O  l  c
Diphones:        -S  St  tr  rO  Ol  lc  c-
Half syllables:  StrO  lc
Syllable:        Strolch (StrOlc)
Structure:       consonant(s), vowel, consonant(s)

Speech Playback Problems

'Assembly' technique:
• Assemble single units (phonemes, diphones, ...)
• Analogous to composing text from a character set

Problems of speech synthesis:
• Co-articulation:
  • strongly influenced by the preceding and succeeding elements
  • inertia of the vocal system
• Speaking style (prosody):
  • depends on semantics
• Pronunciation:
  • how something is said can change the perceived meaning
  • example: "Wachs-tube" (tube of wax) vs. "Wach-stube" (guardroom)

Speech Synthesis in the Frequency Domain

Principle:
• Assembly of formants

[Figure: text -> transcription -> phonetic language -> synthesis -> speech, supported by a dictionary, a phonetic library and interactive improvements]

Problems:
• Similar to speech synthesis in the time domain
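The diphone row of the "Strolch" example follows mechanically from the phoneme sequence: each diphone is the transition between two neighbouring phonemes, with "-" marking silence at the word boundaries. The sketch below reproduces that row and adds a naive concatenation step in the spirit of the 'assembly' technique. The inventory of recorded units is hypothetical dummy data, and plain concatenation deliberately ignores co-articulation and prosody, the problems listed above.

```python
def diphones(phonemes, pause="-"):
    """Transitions between neighbouring phonemes, including those from and to silence."""
    padded = [pause] + list(phonemes) + [pause]
    return [a + b for a, b in zip(padded, padded[1:])]

def assemble(units, inventory):
    """Naive assembly: concatenate the stored samples of each unit in order."""
    samples = []
    for unit in units:
        samples.extend(inventory[unit])  # KeyError if a unit was never recorded
    return samples

strolch = ["S", "t", "r", "O", "l", "c"]
units = diphones(strolch)
print(units)  # ['-S', 'St', 'tr', 'rO', 'Ol', 'lc', 'c-']

# Hypothetical inventory: every diphone "recording" is 3 dummy samples
inventory = {u: [0.0, 0.0, 0.0] for u in units}
print(len(assemble(units, inventory)))  # 21
```

Diphones capture the transition between two sounds, which is why an inventory of roughly 1,400 of them sounds more natural than about 40 isolated phones while staying far smaller than about 20,000 syllables.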
Speech Input in General

[Figure: classification of speech input: speech recognition (what is said?); speaker recognition (who is speaking?), split into identification and verification (authentication); and how something is said (e.g., to detect truth)]

Speech Recognition Components

[Figure: speech -> acoustic and phonetic analysis (sound patterns, word models) -> syntactic analysis (syntax) -> semantic analysis (semantics) -> recognized sentence / text]

Speech Transmission

[Figure: quality of speech (very bad, bad/sufficient, acceptable, good, excellent) vs. data rate of the encoded signal (1 to 64 kbit/s), with curves labelled 1980, 1990 and 2000]

Methods:
• Signal coding
• Parametric coding
• Recognition & synthetic reproduction

Speech Transmission (cont.)

Signal coding:
• Characteristics:
  • simple technique
  • only the signal is coded
  • no inherent knowledge of speech properties
• Telephone example:
  • normal signal: 64 kbit/s
  • DPCM: 56 kbit/s
  • ADPCM: 32 kbit/s

Speech Transmission (cont.)

Parametric coding:
• Speech properties taken into account
• Extension of sub-band coding
• Mobile radio example:
  • 13 kbit/s
  • working towards 6 kbit/s

Recognition & synthetic reproduction:
• Input words are recognized by an intelligent system
• 'Codes' representing the speech are transmitted
• Speech is reconstructed at the other end by synthesis
• Data rate as low as 50 bit/s
• Quality not acceptable (today)
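The transmission rates can be checked with simple arithmetic. Assuming the 8,000 samples/s telephony rate from the table of popular sampling rates, the signal-coding rates correspond to 8, 7 and 4 bits per sample. The sketch also expresses each method's data rate as a compression ratio relative to the 64 kbit/s signal; for parametric and recognition-based coding, bits per sample are not meaningful, so only the ratio is computed. The dictionary labels are illustrative.

```python
SAMPLE_RATE = 8_000  # telephony sampling rate in samples/s

# Signal (waveform) coding: bit rate = samples/s x bits per sample
signal_coding = {"Normal signal": 64_000, "DPCM": 56_000, "ADPCM": 32_000}
for name, bit_rate in signal_coding.items():
    print(f"{name}: {bit_rate // SAMPLE_RATE} bits per sample")

# Compression ratio of every method relative to the 64 kbit/s telephone signal
reference = signal_coding["Normal signal"]
all_methods = {
    **signal_coding,
    "Parametric coding (mobile radio)": 13_000,
    "Recognition & synthetic reproduction": 50,
}
ratios = {name: reference / rate for name, rate in all_methods.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}:1")
```

Recognition and synthetic reproduction at 50 bit/s would be a 1280:1 reduction, which shows why its quality lags so far behind the waveform coders.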