XVI Session of the Russian Acoustical Society, Moscow, November 14-18, 2005

A.F. Khromatidi, I.B. Starchenko

ESTIMATION OF ACOUSTICAL PARAMETERS OF EMOTIONAL SPEECH

Taganrog State University of Radioengineering, Department of Hydroacoustics and Medical Engineering

Russia, 347928, Taganrog, Nekrasovsky, 44

E-mail: khromatidi@yandex.ru, star@tsure.ru

In recent years, the development of technologies and systems for automatic speech recognition and synthesis has drawn considerable interest to the acoustic features of speech. Researchers are trying to understand how the brain, after receiving information about changes in sound pressure over time, extracts information about the condition of the speaker. This article addresses that question. It presents experimental data and the methods of analysis used: spectral, phonemic and statistical.

A speech signal has a dual nature. On the one hand, it is an ordinary acoustic signal, which represents the propagation of the energy of acoustic oscillations in an elastic medium. Like other acoustic signals, it can be described as a sound wave, that is, as alternating compression and rarefaction of the particles of the medium. The shape of the wavefront depends on the properties of the source and the conditions of propagation.

That is why, like any other acoustic signal, speech is characterized by a number of objective features: the dependence of sound pressure on time (the temporal structure of the sound wave), the duration of phonation, the spectrum, the position of the source in space and so on. On the other hand, speech as a physical event evokes certain auditory sensations, such as pitch, loudness, timbre, localization and so on [1,7].

Ten students (average age 19 years) took part in the investigation, which examined the connection between the acoustic parameters of speech and the emotional state of the speaker. The speech signals were recorded with a professional microphone (placed 40 cm from the subject's mouth), an ASUS P4 notebook and the Cool Edit Pro and Praat software. Recordings were made in the following situations: a) a normal situation, in which words and phrases were recorded while the subjects were relaxed; b) an “emotional” situation, in which the recording was made after preliminary stimulation of emotional states [3,6].

The experimental data were recorded in a laboratory of the Department of Physiology of Higher Nervous Activity and Psychophysiology, Saint-Petersburg University.

The analysis of the acoustic characteristics of speech began with the recording of the speech signal.

Preliminary analysis was carried out in Cool Edit Pro. As a first step of the computer processing, an oscillogram was produced, that is, a record of the averaged sound pressure over a limited time interval. Examples of oscillograms recorded with Cool Edit Pro are shown in Fig. 1.

Fig. 1. Oscillogram of the word «karton» (a - relaxation, b - anger, c - happiness, d - sorrow)

The recording parameters in Cool Edit Pro were: sampling frequency 44100 Hz, bit depth 16 bit, mono channel. Files were saved in two formats: uncompressed audio (PCM) *.wav and ASCII *.txt. The duration of the audio files ranged from 0.03 s to 1.3 s, with an average size of 100 KB. The average number of points in the ASCII files was 40,000.
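The relation between duration, number of samples and file size for these recording parameters can be checked with a short calculation. The sketch below is only an illustration: the function names are hypothetical, and the 44-byte header is an assumption that holds for a canonical PCM WAV file without extra chunks.

```python
SAMPLE_RATE_HZ = 44100   # sampling frequency stated in the text
BIT_DEPTH = 16           # bits per sample stated in the text
CHANNELS = 1             # mono
WAV_HEADER_BYTES = 44    # assumption: canonical PCM WAV header, no extra chunks


def samples_for_duration(seconds):
    """Number of samples in a recording of the given duration."""
    return round(seconds * SAMPLE_RATE_HZ)


def duration_for_samples(n_samples):
    """Duration in seconds of a recording with n_samples points."""
    return n_samples / SAMPLE_RATE_HZ


def pcm_file_bytes(n_samples):
    """Approximate size of an uncompressed PCM WAV file in bytes."""
    return WAV_HEADER_BYTES + n_samples * CHANNELS * BIT_DEPTH // 8


longest = samples_for_duration(1.3)           # samples in the longest file
average_seconds = duration_for_samples(40000)  # about 0.9 s
average_bytes = pcm_file_bytes(40000)          # about 80 KB
print(longest, round(average_seconds, 3), average_bytes)
```

Under these assumptions, a file with 40,000 points lasts about 0.9 s and occupies about 80 KB, which is of the same order as the average size reported above.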

After the speech signals were recorded, the following types of analysis were carried out: phonetic, spectral and statistical. The Cool Edit Pro and Praat software were used.

The Praat window with an analyzed signal is shown in the figure below.

Fig. 2. Praat window (the selected part of the signal is a stressed vowel)

This type of analysis makes it possible to monitor the intensity distribution over the phonation of different words, as well as the course of the formant tracks. Formants are the most important parameters in speech research. The first formant characterizes the degree of opening of the vocal tract in the lips-oral cavity section; the second formant characterizes the degree of change in tongue shape.

Besides tracking formants, Praat can calculate the fundamental frequency (pitch), intensity and duration of the speech signal.
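To illustrate what estimating the fundamental frequency involves, here is a minimal sketch based on the autocorrelation peak of a single voiced frame. It is not Praat's algorithm and not the procedure used in this study. The function name, the 75-500 Hz search range and the synthetic test frame are all assumptions made for the example.

```python
import math


def estimate_f0(samples, sample_rate, f_min=75.0, f_max=500.0):
    """Estimate the fundamental frequency (Hz) of one voiced frame.

    Picks the lag with the largest unnormalized autocorrelation inside
    the assumed search range f_min..f_max (illustrative values).
    """
    n = len(samples)
    mean = sum(samples) / n
    x = [s - mean for s in samples]            # remove the DC offset
    lag_min = int(sample_rate / f_max)         # shortest period considered
    lag_max = min(int(sample_rate / f_min), n - 1)
    best_lag, best_value = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        # Fewer terms at longer lags bias the result towards the
        # shortest period, which helps to avoid octave errors.
        value = sum(x[i] * x[i + lag] for i in range(n - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return sample_rate / best_lag


# Synthetic 40 ms frame: 220 Hz fundamental plus a weaker second harmonic.
fs = 44100
frame = [math.sin(2 * math.pi * 220 * i / fs)
         + 0.5 * math.sin(2 * math.pi * 440 * i / fs)
         for i in range(1764)]
f0 = estimate_f0(frame, fs)
print(round(f0, 1))  # close to 220 Hz, limited by the integer lag resolution
```

The resolution of such an estimate is limited to whole-sample lags, so production tools refine the peak by interpolation.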

The data obtained with this software were entered into tables in MS Excel. Such tables were created for all subjects. The words (“karton”, “tiho”, “posuda”, “moloko”) were pronounced on instruction with specified emotions (relaxed, happiness, anger, disgust, irritation, sadness, doubt). A statistical analysis was then carried out on the collected data. Its results are shown as pie charts in Fig. 3.

[Pie charts. Panels: Maximal f1; Maximal intensity of vowel under stress; Maximal duration of vowel under stress; Maximal f0; Minimal f1; Minimal duration of vowel under stress; Minimal intensity of vowel under stress; Minimal f0. Legend categories: anger, disgust, doubt, happiness, irritation, sadness, relaxed, test.]

Fig. 3. Pie charts of statistical analysis of different parameters of speech signals depending on emotions
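One plausible way to obtain percentage shares of this kind is to count, for each recorded item, which emotion gave the maximal (or minimal) value of a parameter. The sketch below assumes that reading; the text does not state the exact procedure. The function name and all numbers in it are invented for illustration and are not the study's data.

```python
from collections import Counter


def extreme_emotion_shares(measurements, pick=max):
    """Percentage of items in which each emotion gives the extreme value.

    measurements: list of dicts mapping emotion -> parameter value,
    one dict per recorded item (for example one subject and one word).
    pick: max for the "Maximal" charts, min for the "Minimal" charts.
    """
    winners = Counter(pick(item, key=item.get) for item in measurements)
    total = sum(winners.values())
    return {emotion: 100.0 * count / total
            for emotion, count in winners.items()}


# Hypothetical f0 values in Hz, invented for this example only.
f0_hz = [
    {"relaxed": 180, "anger": 260, "sadness": 170},
    {"relaxed": 190, "anger": 240, "sadness": 175},
    {"relaxed": 200, "anger": 230, "sadness": 250},
    {"relaxed": 185, "anger": 270, "sadness": 160},
]
max_shares = extreme_emotion_shares(f0_hz, max)
min_shares = extreme_emotion_shares(f0_hz, min)
print(max_shares)  # {'anger': 75.0, 'sadness': 25.0}
print(min_shares)  # {'sadness': 75.0, 'relaxed': 25.0}
```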

We obtained the following results, which indicate changes in the acoustic characteristics of speech depending on emotional state:

• "anger" - the frequency of phonation and the frequency of the first formant increase sharply compared with neutral speech; articulation is very clear;

• "fear" - the frequency of phonation is lower than in "anger" and contains sharp peaks; articulation is more distinct than in the “normal” condition;

• "sadness" - small variations in the frequency of phonation, slow articulation, long vowels, irregularity. The frequency of phonation falls monotonously towards the end of the phrase; sometimes tremor appears;

• "norm" - minimal duration of the stressed vowel; consonants are sometimes unclear, but vowels are always clear.

From the analysis of the subjects' speech we conclude that emotions lead to statistically significant changes in a large number of parameters, listed below:

1. The spectrum of the speech signal becomes deformed, which changes its energy and tends to shift the spectrum. The spectrum concentrates in the low-frequency region for negative emotions and in the high-frequency region for positive emotions.

2. The formant structure of speech also changes. Formant values and the bandwidths of the formant regions increase by 70-150% under strong emotions. Spectral moments can be used to detect these shifts.

3. The envelopes of words change severely.

4. The fundamental frequency increases by 150-300% under strong emotions compared with the normal condition. Modulation appears.

5. Actively experienced emotions («happiness», «excitement» and so on) are accompanied by an increase in the amplitude of the speech signal.

6. The tempo of articulation changes, as does the ratio between the duration of intonation and pauses in speech.
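The spectral moments mentioned in the list above can be sketched as follows. The first moment (spectral centroid) shows where the energy of the spectrum is concentrated, and the square root of the second central moment shows how widely it is spread. This is an illustrative sketch only: the function names, the direct DFT, the 8000 Hz sampling rate and the pure test tones are assumptions chosen to keep the example short, not the procedure used in this study.

```python
import cmath
import math


def power_spectrum(samples):
    """Power spectrum for bins 0..n/2 via a direct DFT (slow but simple)."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(samples[i] * cmath.exp(-2j * math.pi * k * i / n)
                  for i in range(n))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def spectral_moments(samples, sample_rate):
    """Return (centroid, spread) of the power spectrum, both in Hz."""
    power = power_spectrum(samples)
    n = len(samples)
    freqs = [k * sample_rate / n for k in range(len(power))]
    total = sum(power)
    centroid = sum(f * p for f, p in zip(freqs, power)) / total
    variance = sum((f - centroid) ** 2 * p
                   for f, p in zip(freqs, power)) / total
    return centroid, math.sqrt(variance)


# Two synthetic tones standing in for "low" and "high" spectra.
fs = 8000  # assumption: reduced rate so the direct DFT stays fast
low = [math.sin(2 * math.pi * 200 * i / fs) for i in range(400)]
high = [math.sin(2 * math.pi * 2000 * i / fs) for i in range(400)]
low_centroid, low_spread = spectral_moments(low, fs)
high_centroid, high_spread = spectral_moments(high, fs)
print(round(low_centroid), round(high_centroid))
```

A shift of the spectrum towards high frequencies appears as a larger centroid, so comparing centroids between neutral and emotional recordings of the same word is one simple way to quantify the shift.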

The authors gratefully acknowledge E. Lyakso, Doctor of Biology, Professor of the Department of Physiology of Higher Nervous Activity and Psychophysiology, Saint-Petersburg University, for great help and valuable advice during the work with the experimental data.

REFERENCES

1. Galunov V.I. Speech, emotions and personality: problems, perspectives // Speech, emotions and personality: materials and reports of the All-Union symposium. L., 1978. (in Russian)

2. Galunov V.I., Manerov V.H. Ways of solving the problem of creating systems for detecting the emotional condition of a person // Questions of cybernetics. Vol. 22. M., 1976. P. 95-114. (in Russian)

3. Gubachev Y., Iovlev B.V., Karvasarskiy B.D. Emotional stress in conditions of norm and pathology. L.: Medicine, 1976. 224 p. (in Russian)

4. Jinkin I. Mechanisms of speech. M.: APN, 1958. 370 p. (in Russian)

5. Lukianov A.N., Frolov M.V. Signals of condition of human-operator. M.: Science, 1968. 267 p. (in Russian)

6. Milovanova G.B. Integral estimation of the emotional condition of a person by his vegetative functions // Methods and techniques of investigations of operator activity. M.: Science, 1985. P. 7-11. (in Russian)

7. Petelin R., Petelin Y. Audio studio in PC. SPb.: BHV - Saint-Petersburg, 1998. 256 p. (in Russian)
