05.basicAudioFeature

Basic Features of Audio Signals (音訊的基本特徵)
Jyh-Shing Roger Jang (張智星)
http://www.cs.nthu.edu.tw/~jang
MIR Lab, CS Dept, Tsing Hua Univ., Hsinchu, Taiwan

Audio Features
• Four commonly used audio features
  • Volume
  • Pitch
  • Zero crossing rate
  • Timbre
• Our goal
  • These features can be perceived subjectively, but we need to compute them quantitatively for further processing and recognition.

Audio Features in Time Domain
• Audio features presented in the time domain
  • Fundamental period
  • Intensity
  • Timbre: waveform within a fundamental period (FP)

Audio Features in Frequency Domain
• Volume: magnitude of the spectrum
• Pitch: distance between harmonics
• Timbre: smoothed spectrum
[Figure: spectrum annotated with intensity, pitch frequency, first formant F1, and second formant F2]

Demo: Real-time Spectrogram
• Try "dspstfft_audio" under MATLAB
[Figures: spectrum; spectrogram]

Steps for Audio Feature Extraction
• Frame blocking
  • Frame duration of 20 ms or so
• Feature extraction
  • Volume, zero-crossing rate, pitch, MFCC, etc.
• Endpoint detection
  • Usually based on volume and zero-crossing rate

Frame Blocking
[Figure: waveform divided into overlapping frames, with a zoomed-in view of a single frame]
• Sample rate = 11025 Hz
• Frame size = 256 samples
• Overlap = 84 samples (hop size = 256 - 84 = 172 samples)
• Frame rate = 11025/(256 - 84) ≈ 64 frames/sec

Intensity (I)
• Intensity
  • Visual cue: amplitude of vibration
• Computation
  • Volume: $vol = \sum_{i=1}^{n} |s_i|$
  • Log energy (in decibels): $energy = 10 \log_{10} \left( \sum_{i=1}^{n} s_i^2 \right)$
• Characteristics
  • Influenced by microphone types
  • Influenced by microphone setups
  • Perceived volume is influenced by frequency and timbre

Intensity (II)
• To avoid DC drifting
  • DC drifting: the vibration is not centered around zero
• Computation
  • Volume: $vol = \sum_{i=1}^{n} |s_i - \mathrm{median}(\mathbf{s})|$
  • Log energy (in decibels): $energy = 10 \log_{10} \left( \sum_{i=1}^{n} (s_i - \mathrm{mean}(\mathbf{s}))^2 \right)$
• Theoretical background (How to prove?)
  • $\mathbf{s} = [s_1, s_2, \dots, s_n] \Rightarrow \arg\min_x \sum_{i=1}^{n} |s_i - x| = \mathrm{median}(\mathbf{s})$
  • $\mathbf{s} = [s_1, s_2, \dots, s_n] \Rightarrow \arg\min_x \sum_{i=1}^{n} (s_i - x)^2 = \mathrm{mean}(\mathbf{s})$

Intensity (III)
• Examples
  • Please refer to the online tutorial.

Pitch
• Definition
  • Pitch is also known as the fundamental frequency, which is equal to the number of fundamental periods within a second. The unit used here is Hertz (Hz).
  • More commonly, pitch is expressed in semitones, which can be converted from pitch in Hertz:
    $semitone = 69 + 12 \log_2 \left( \frac{Hz}{440} \right)$

Pitch Computation (I)
• Pitch of tuning forks
  $ff = \frac{16000}{(187 - 7)/5} = 439.56 \text{ Hz}$
  $pitch = 69 + 12 \log_2 \left( \frac{ff}{440} \right) = 68.98 \text{ semitone}$
  (The stated 439.56 Hz corresponds to a fundamental period of 36.4 samples, that is, 182 samples over 5 periods; the sample indices as printed span 180 samples, which would give 444.44 Hz.)

Pitch Computation (II)
• Pitch of speech
  $ff = \frac{16000}{(477 - 75)/3} = 119.403 \text{ Hz}$
  $pitch = 69 + 12 \log_2 \left( \frac{ff}{440} \right) = 46.42 \text{ semitone}$

Statistics of Mandarin Chinese
• 5401 characters; each character is associated with at least one base syllable and a tone
• 411 base syllables, and most syllables have 4 tones, so we have 1501 tonal syllables
• Tone is characterized by the pitch curves:
  • Tone 1: high-high
  • Tone 2: low-high
  • Tone 3: high-low-high
  • Tone 4: high-low
• Some examples of tones:
  • 1242: 清華大學 (Tsing Hua University)
  • 1234: 三民主義 (Three Principles of the People), 優柔寡斷 (indecisive), 搭達打大, 依宜以易, 夫福府負
  • ?????: 美麗大教堂 (beautiful cathedral), 滷蛋有夠鹹 (Taiwanese: "the braised egg is really salty")

Sinusoidal Signals
• How to generate a stream of sinusoidal signals
    fs = 16000;                   % sample rate in Hz
    duration = 3;                 % duration in seconds
    f = 440;                      % frequency of the sinusoid in Hz
    t = (1:fs*duration)/fs;       % time axis in seconds
    y = 0.8*sin(2*pi*f*t);        % sinusoid with amplitude 0.8
    plot(t, y);
    axis([0.6, 0.65, -1, 1]);     % zoom in on a 50 ms window
    sound(y, fs);                 % play the signal

Zero Crossing Rate
• Zero crossing rate (ZCR)
  • The number of zero crossings in a frame.
• Characteristics:
  • Noise and unvoiced sounds have high ZCR.
  • ZCR is commonly used in endpoint detection, especially in detecting the start and end of unvoiced sounds.
  • To distinguish noise/silence from unvoiced sounds, we usually add a bias before computing ZCR.

ZCR Computations
• Two types of ZCR definition
  • If a sample with zero value is counted as a zero crossing, the value of ZCR is higher; otherwise it is lower.
  • The choice affects the ZCR, especially when the sample rate is low.
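The two ZCR definitions can be compared with a minimal MATLAB sketch. The sample values in `frame` are hypothetical, and detecting a crossing through the sign of the product of neighboring samples is one common way to count, assumed here rather than taken from the slides.

```matlab
% Hypothetical, already zero-justified frame (values are not from the slides).
frame = [3 1 0 -2 -1 0 0 2 -1];

% Product of each sample with its right-hand neighbor.
neighborProduct = frame(1:end-1) .* frame(2:end);

% Definition 1: only strict sign changes count (product < 0),
% so a sample that is exactly zero never contributes.
zcrStrict = sum(neighborProduct < 0);

% Definition 2: zero-valued samples also count (product <= 0).
zcrWithZeros = sum(neighborProduct <= 0);

fprintf('strict: %d, counting zeros: %d\n', zcrStrict, zcrWithZeros);
```

For this frame the strict count is 1 and the count that includes zeros is 6. The gap grows when many samples are exactly zero, as is typical for coarsely quantized or low-sample-rate recordings.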
• Other considerations
  • Zero-justification (shifting the frame so that it is centered at zero, for example by subtracting its mean) is required.
  • ZCR with shift can be used to distinguish between unvoiced sounds and silence. (How to determine the shift amount?)

ZCR Examples
• Please refer to the online tutorial.
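As a rough illustration of how frame blocking, volume, log energy, and ZCR with shift fit together, here is a minimal MATLAB sketch. The test signal is synthetic (low-level alternating jitter standing in for background noise, followed by a 440 Hz tone), and the shift amount of 0.01 is an assumed value chosen only to sit above the jitter amplitude; neither comes from the slides.

```matlab
fs = 11025;                        % sample rate from the frame-blocking slide
frameSize = 256;                   % samples per frame
overlap = 84;                      % samples shared by adjacent frames
hop = frameSize - overlap;         % hop size = 172 samples

% Synthetic signal (assumption): 0.2 s of tiny jitter, then 0.2 s of a tone.
n = round(0.2*fs);
noise = 0.001 * (-1).^(1:n);       % deterministic stand-in for background noise
tone = 0.8 * sin(2*pi*440*(1:n)/fs);
y = [noise, tone];

shift = 0.01;                      % assumed shift amount for the shifted ZCR

frameCount = floor((length(y) - overlap) / hop);
vol = zeros(1, frameCount);
logEnergy = zeros(1, frameCount);
zcr = zeros(1, frameCount);
zcrShift = zeros(1, frameCount);

for k = 1:frameCount
    frame = y((k-1)*hop + (1:frameSize));
    % Volume with median subtraction, log energy with mean subtraction.
    vol(k) = sum(abs(frame - median(frame)));
    logEnergy(k) = 10*log10(sum((frame - mean(frame)).^2));
    % Zero-justify the frame, then count strict sign changes.
    z = frame - mean(frame);
    zcr(k) = sum(z(1:end-1) .* z(2:end) < 0);
    % ZCR with shift: the bias suppresses crossings caused by tiny jitter.
    zcrShift(k) = sum((z(1:end-1) - shift) .* (z(2:end) - shift) < 0);
end

fprintf('frame rate: %.1f frames/sec, %d frames\n', fs/hop, frameCount);
fprintf('first frame: zcr = %d, shifted zcr = %d\n', zcr(1), zcrShift(1));
fprintf('last frame:  zcr = %d, shifted zcr = %d\n', zcr(end), zcrShift(end));
```

In the jitter-only frames the plain ZCR is very high while the shifted ZCR drops to zero; in the tone frames the two stay close. This is the behavior the shift is meant to exploit. A real recording would need a shift amount tuned to its noise floor.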
