1 Fundamental terms
1.1 Sound waves: production and perception
1.2 Frequency
1.3 Amplitude
1.4 Acoustic vibrations
1.5 Simple and complex waves
2 Distinctive features of sounds
2.1 Pitch
2.2 Loudness
2.3 Quality
3 Machine analysis
3.1 Waveforms
3.2 Spectrograms
3.3 Analysis of vowels
3.3.1 Oral vowels
3.3.2 Nasal vowels
3.4 Analysis of consonants
3.4.1 Stops
3.4.2 Nasals
3.4.3 lateral /l/
3.4.4 trill /r/
3.4.5 Glides
3.4.6 fricatives

1 Fundamental terms

1.1 Sound waves: production and perception

Sound waves have their origin in the acoustic interference which produces vibration carried by a propagation medium (i.e. the substance or object through which the sound travels, usually air).
The vibration consists of small and rapidly occurring variations in air pressure.

Figure 1: Diagrammatic representation of fluctuations in air pressure caused by a vibrating tuning fork (from Ladefoged, Elements of acoustic phonetics).

In the case of speech, the variations are caused by the actions of the articulatory organs of the speaker that are superimposed on the outgoing stream of air (-> outward AM). When the sound wave reaches the ear it causes vibration of the eardrum. The eardrum is connected to a chain of bones which transmit its vibrations to the liquid in the inner ear. The vibrating liquid stimulates the nerves which lead to the auditory sensation area of the brain so that there is a sensation of hearing.

Figure 2: A schematic diagram of the mechanism of the ear (from Ladefoged, Elements of acoustic phonetics).

1.2 Frequency

Each sound wave is characterized by a specific frequency and amplitude. Frequency refers to the number of complete cycles of the wave per unit of time (i.e. it describes how close together in time the successive peaks are).

Figure 3: A periodic wave (from Davenport & Hannahs, Introducing phonetics and phonology).

Frequency is measured in cycles per second (hertz, Hz). One cycle is the movement of the wave from the rest position (B) to the peak (C) and back to the rest position, then to the trough (A) and back to the rest position. A sound wave whose frequency is 100 Hz has 100 cycles in a second.

Figure 4: A wave of a 20 Hz frequency (from Davenport & Hannahs, Introducing phonetics and phonology).

The fundamental frequency (F0) of a voiced speech sound is the frequency of vocal fold vibration. Depending on the size of the vocal apparatus, the human voice produces sounds within the following ranges:
- male: 80-220 Hz
- female: 120-300 Hz
- children: 200-500 Hz

1.3 Amplitude

Amplitude describes the size of the pressure variation: it is the maximum displacement of the wave from the rest position (in a symmetrical wave, half the distance between the peak and the trough).

1.4 Acoustic vibrations

In order to be heard, a vibration must have a frequency in the range 20-20,000 Hz.
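The relation between frequency and cycles described in 1.2, and the audible range given above, can be checked numerically. The following is only a minimal sketch: the sampling rate of 8000 Hz, the small phase offset and all function names are assumptions made for illustration and do not come from the sources cited in this script.

```python
import math

AUDIBLE_RANGE_HZ = (20.0, 20000.0)  # range of audible frequencies given in the text


def sine_wave(freq_hz, duration_s, sample_rate_hz, phase=0.1):
    """Sample a simple wave (pure tone).

    The small phase offset is an assumption of this sketch: it keeps the
    samples from landing exactly on the rest position.
    """
    n = int(round(duration_s * sample_rate_hz))
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate_hz + phase)
            for i in range(n)]


def count_cycles(samples):
    """Count upward crossings of the rest position (one per cycle)."""
    return sum(1 for a, b in zip(samples, samples[1:]) if a < 0 <= b)


def is_audible(freq_hz):
    """True if the frequency lies within the audible range."""
    low, high = AUDIBLE_RANGE_HZ
    return low <= freq_hz <= high


wave = sine_wave(100, 1.0, 8000)   # one second of a 100 Hz wave
cycles = count_cycles(wave)        # number of cycles in that second
period_ms = 1000.0 / cycles        # duration of one cycle in milliseconds
print(cycles, period_ms, is_audible(100), is_audible(15))
```

Counting crossings of the rest position works only for a simple wave like this one; for real speech the fundamental frequency has to be estimated with more robust methods.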
Vibrations can take different courses:
a) periodic – the amplitude of the wave takes the same value at regular time intervals, so the frequency of the wave can be determined. Periodic vibrations have a musical quality (vowels and sonorant consonants: glides, liquids and nasals).
b) aperiodic – the amplitude of the wave takes random values, so the frequency of the wave cannot be determined. Aperiodic vibrations have a less musical quality, e.g. buzz, murmur, hiss etc. (voiceless obstruents).
c) mixed – the aperiodic course is superimposed on the periodic one (voiced obstruents).

Figure 5: Examples of complex waves of a periodic (top), aperiodic (middle) and mixed (bottom) vibration (from Dukiewicz).

1.5 Simple and complex waves

Speech sounds consist of complex waves which result from the superposition of a number of simple waves (pure tones) – harmonics whose frequencies are whole-number multiples of the basic frequency of the sound. E.g. a sound (periodic or mixed vibration) with a basic frequency of 100 Hz will have harmonics of 200, 300, 400 Hz etc.

Figure 6: Two simple waves of frequency 100 and 500 cps.

Figure 7: The complex wave resulting from superposition of two simple waves of 100 and 500 cps (from Ladefoged, Elements of acoustic phonetics).

2 Distinctive features of sounds

Two sounds of the same duration (length) can differ with respect to:

2.1 Pitch

Pitch refers to the subjective impression of the “height” of the sound (perception by humans). It is related to the frequency of the vibration, which is an acoustic (objective) measure indicating the “height” of the sound. Two sounds of a different frequency can be perceived as having the same pitch.

2.2 Loudness

Loudness is related to the amplitude of the sound in the same way as pitch relates to frequency. The higher the amplitude, the louder the sound is perceived to be. Amplitude is affected by the efficiency and distance of the propagating medium: the larger the distance, the less audible the sound becomes; some materials, e.g.
wood, are more efficient in carrying sounds than air.

2.3 Quality

Two sounds of the same duration, frequency and amplitude can still differ due to differences in quality (or colouring). These differences result from the shape of the propagation medium (hence differences in the perception of the same phoneme produced by different speakers, as well as differences in vowel quality resulting from different shapes of the vocal tract) and from the material enclosing that medium (in the case of musical instruments, e.g. a flute made of metal vs. a wooden violin). Depending on the features (shape and material) of the propagation medium, some harmonics of the sound will be emphasized and others will be weakened.

3 Machine analysis

There are different techniques which make it possible to “see” sounds:

3.1 Waveforms

A waveform illustrates variations in the air pressure associated with speech sounds. In the waveform, pulses corresponding to the vibrations of the vocal folds can be seen. What can we read from a waveform?
a) amplitude
b) F0
c) the manner of articulation (to some extent):
- vowels, approximants and nasals – pulses (voicing), high amplitude and energy (vowels, approximants and, lastly, nasals)
- voiced obstruents (plosives, fricatives and affricates) – pulses, low energy and amplitude (fricative segments, plosives)
- voiceless obstruents – empty spaces in the case of stops, aperiodic variation in the amplitude in the case of fricatives and the fricative component of an affricate

Figure 8: Waveform of an utterance: Ostatnie przygody Korowiowa i Behemota (male speaker).

3.2 Spectrograms

A spectrogram is a display of variation in the frequency domain over time and is produced by a spectrograph. On the spectrogram you can see vertical lines which correspond to pulsations of the vocal folds. In the frequency domain it can be seen that certain frequencies are emphasized (dark marks) – they occur as horizontal lines or irregular striations (in higher frequencies).
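To illustrate how an analysis in the frequency domain recovers the emphasized frequencies, the sketch below builds the complex wave of Figures 6 and 7 (simple waves of 100 and 500 cps) and computes its spectrum with a naive discrete Fourier transform. It is not how a spectrograph works internally; the sampling rate, the length of the analysed stretch, the two amplitudes (1.0 and 0.5) and all names are assumptions chosen for illustration.

```python
import math

SAMPLE_RATE_HZ = 2000  # assumed sampling rate
DURATION_S = 0.1       # assumed length of the analysed stretch


def complex_wave(components, n, sample_rate_hz):
    """Superpose simple waves given as (frequency_hz, amplitude) pairs."""
    return [sum(a * math.sin(2 * math.pi * f * i / sample_rate_hz)
                for f, a in components)
            for i in range(n)]


def magnitude_spectrum(samples, sample_rate_hz):
    """Naive discrete Fourier transform: {frequency_hz: amplitude}."""
    n = len(samples)
    spectrum = {}
    for k in range(n // 2):
        re = sum(s * math.cos(2 * math.pi * k * i / n)
                 for i, s in enumerate(samples))
        im = sum(s * math.sin(2 * math.pi * k * i / n)
                 for i, s in enumerate(samples))
        spectrum[k * sample_rate_hz / n] = 2 * math.hypot(re, im) / n
    return spectrum


n = int(round(SAMPLE_RATE_HZ * DURATION_S))
# Amplitudes 1.0 and 0.5 are illustrative; the figures do not state them.
wave = complex_wave([(100, 1.0), (500, 0.5)], n, SAMPLE_RATE_HZ)
spectrum = magnitude_spectrum(wave, SAMPLE_RATE_HZ)

# Frequencies that carry noticeable energy (the "dark marks").
peaks = sorted(f for f, m in spectrum.items() if m > 0.1)
print(peaks)
```

A spectrogram repeats this kind of analysis for many short consecutive stretches of the signal and displays the results side by side along the time axis.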
The specific emphasized frequencies above the F0 that appear on the spectrogram are called formants. The frequency of a formant depends on the size and shape of the vocal tract, so in a spectrographic analysis it provides information on the place and manner of articulation. In the analysis of speech the first four formants are taken into account and they are marked as F1, F2, F3 and F4 (from the lowest to the highest on the frequency scale). F1 and F2 are the most important indicators of vowel quality, whereas the higher formants reflect the speaker’s characteristics (voice quality). Changes in formant frequencies which occur in the flow of articulation, when the setting of the vocal tract is changed from one sound to another, are called transitions.

Spectrograms provide a reliable basis for the analysis of such aspects of speech sounds as duration, F0 and phonetic features (e.g. aspiration) and provide the information necessary for the identification of different speech sounds (on the basis of e.g. formant frequencies, transitions and the display of the vocal cord pulsation).

Figure 9: Spectrogram of an utterance: Ostatnie przygody Korowiowa i Behemota (male speaker).

3.3 Analysis of vowels

3.3.1 Oral vowels

The auditory quality of a vowel is the result of the specific variations in the air pressure due to the vocal tract shape superimposed on the F0 produced by the vocal cords. Generally, the frequency of F2 decreases from front to back vowels. The frequency of F1 changes with the vertical position (height) of the tongue and increases from high to low vowels.

Figure 10: The range of formant frequencies of Polish vowels pronounced in the context of the 6 Polish vowels pronounced by 10 speakers (after W. Jassem)

3.3.2 Nasal vowels

In Polish there are two nasal vowels. In the orthographic transcription they are symbolized by [ą] and [ę]. These vowels are always realized as diphthongs (contrary to the monophthongal realization as in e.g.
French): they consist of a sequence of /o/ and /e/ (for [ą] and [ę] respectively) followed by a nasal segment: /m/, /n/, /ɲ/, /ŋ/, /w̃/ or /j̃/. The polysegmental structure of Polish nasal vowels is reflected in the spectrogram. The part of the display corresponding to the oral vowel /o/ or /e/ has formant frequencies characteristic of that vowel. The display changes with the occurrence of the nasal segment – the frequencies of F2 and F3 as well as the energy of all the formants decrease. The transitions of F2 are characteristic of the subsequent nasal segment.

Figure 11: Spectrogram of the word „pęk”: you can see that the nasal vowel [ę] is realized as /e/ followed by velar /ŋ/.

3.4 Analysis of consonants

The spectrographic analysis of consonants is more complex, because often the identification of the consonant is possible only on the basis of the “behavior” of the vowel formants. However, it is possible to identify a consonant on the basis of the acoustic correlates of consonantal features:
- voicing: vertical striations corresponding to the vibrations of the vocal folds
- place of articulation: characteristic transitions and locus (see below)
- manner of articulation: characteristic formant structure and other properties

3.4.1 Stops

Stops appear as gaps in the patterns on the spectrogram followed by a burst of noise (voiceless) or a sharp beginning of formant structure (voiced). Stops are identified on the basis of their effect on the adjacent (following or preceding) vowel: the transitions of the vowel formants have a different place of origin (or ending), called the locus. The correspondence between locus and place of articulation is not straightforward, because the actual point of origin (or ending) of the formant depends on the vowel. The position of that part of the tongue which is not involved in the formation of the closure will be that of the vowel, and at the moment of the release the formant frequencies will be determined by the shape of the vocal tract as a whole.
- bilabial: locus of F2 and F3 comparatively low; a rapid increase of all three formant frequencies at the release of the closure and a rapid decrease at the moment of the closure
- dental (also post-dental): locus of F2 about 1700-1800 Hz; a rapid increase in F1 and a slight decrease in F2 and F3 at the release of the closure, a decrease in all three formant frequencies at the moment of the closure
- velar: F2 usually high; a rapid increase (at release) and decrease (at closure) in F1, a rapid decrease in F2 and F3 at the release and a narrowing of the distance between F2 and F3 towards the closure

Figure 12: A spectrogram of the words „bab”, „dad”, “gag” (Brit. Eng. accent, from Ladefoged).

3.4.2 Nasals

Nasal consonants have a formant structure similar to that of vowels but with nasal formants at about 250, 2500 and 3250 Hz (they are independent of the place of articulation and therefore have considerably stable values; they occur at a distance of 800-1000 Hz). The higher formants are considerably reduced in intensity. Nasal consonants can be distinguished on the basis of the length of the acoustically effective part of the oral tract (i.e. the space between the closure formed by the articulators and the nasal tract and pharynx). Generally, the shorter it is, the higher the frequencies of the formants. Nasals have an effect on the neighbouring vowels similar to that of stops – depending on the vowel quality and place of articulation, the transitions of the vowel formants have a different direction and locus.

Antiformants are characteristic of nasals. They are the effect of the blocked oral part of the vocal tract and can be defined as significant minima of energy occurring at specific frequencies. Their frequencies are affected by the acoustically effective part of the oral part of the vocal tract in the same way as formants. In the list below F0 stands for the antiformant frequency (not the fundamental frequency) and is shown between the two formants between which it falls:
- /m/: F2 – F0 – F3 (F0 = 800 Hz)
- /n/: F3 – F0 – F4 (F0 = 1400 Hz)
- /ɲ/: F4 – F0 – F5 (F0 = 3000 Hz)

Figure 13: A spectrogram of „pin”, “Tim”, “king” (Brit.
English accent, from Ladefoged)

/maŋgo/
/dana/
/daɲa/

3.4.3 lateral /l/

Like nasals, laterals have a formant structure similar to that of vowels.
- the mean frequency of F1 is about 400 Hz
- the energy of F3 or F4 is higher than the energy of F2
- the distance between F3, F4 and F5 is smaller than in vowels (about 1500 Hz)
- there is an antiformant at 3500-4500 Hz: it results from the characteristic shape of the vocal tract (obstruction of the airstream at a point along the center of the oral tract)

lubię /lubje/
liczę /litʃe/

3.4.4 trill /r/

The consonant /r/ consists of vocalic and consonantal segments occurring one after another. They are of very short duration and the vocalic segment is usually longer than the consonantal one. In the flow of articulation /r/ is most often produced as a sequence of two consonantal segments separated by a vocalic segment, with a total duration of 20-30 milliseconds. In the context of vowels with a considerably low F2, /r/ has the following features:
- F2 (locus) = 1250 Hz, F3 (locus) = 1500 Hz (1600)
- the loci of formants F3, F4 and F5 of the consonantal segment are lower than the corresponding formants of the vocalic segments
- the transitions of the formants (which are significant) occur over the length of the vocalic segments
- the transition of F2 is positive and the transitions of F3, F4 and F5 are negative towards the consonantal segment
- the greatest difference in frequency can be observed between the consonantal segment and the locus of the F3 of the vowel (it can be greater than 1000 Hz)

/torty/
/orka/

3.4.5 Glides

Like nasals, lateral /l/ and trill /r/, the two approximants /j/ and /w/ have a formant structure similar to that of vowels, but characterized by dynamic changes in the formants.
/j/

In the context of an adjacent vowel with a considerably low F2, /j/ has the following features:
- F1 is stable
- loci of the formants: F1 = 250 Hz, F2 = 2300 Hz, F3 = 3300 Hz, F4 = 3400 Hz
- the energy of F2 of /j/ is lower than the energy of the adjacent vowel
- at the start of /j/ the F3 is steady (F3 = 3300 Hz) and then the transition to the locus of the F3 of the adjacent vowel appears. Over the length of the vowel the transition changes direction (from falling to rising)
- contrary to F1 and F2, the frequency of F4 is higher for /j/ than for the adjacent vowel; the distance between F4 and F3 is only 100 Hz

In the context of a preceding vowel with a considerably low F2, such as /u/, /j/ has the following features:
- formant transitions are generally symmetrical to those occurring when the vowel follows /j/
- F3 approaches F2 instead of F4
- loci of the formants: F1 = 250 Hz, F2 = 2300 Hz, F3 = 2500 Hz, F4 = 3500 Hz
- except for F1, the formants are not steady

/kruj/

In the context of a preceding vowel with a considerably high F2 (such as /ɨ/), /j/ has the following features:
- minor changes in the F2 frequency of the vowel
- steady F3 and F4 of the vowel
- no characteristic “double angle” (which occurs when the vowel has a considerably low F2)
- nearly equal distance between F2, F3 and F4
- loci of the formants: F1 = 250 Hz, F2 = 2400 Hz, F3 = 3050 Hz, F4 = 3750 Hz

/krɨj/

/w/

- contrary to /j/, the formant frequencies are steadier in sequences of /w/ + vowel than in sequences of vowel + /w/
- loci of the formants: F1 = 300-500 Hz, F2 = 700-900 Hz, F3 = 2500-2700 Hz, F4 = 2900-3200 Hz
- the energy of the formants is significantly lower than that of the vowel, except for those sequences where the F1 locus of the vowel and of /w/ have the same frequency
- there is no tendency to create a “double angle”

/piwci/
/stuw/

3.4.6 fricatives

Fricatives are characterized by a random noise pattern located especially in higher frequency regions, but dependent on the place of articulation.
The aperiodic vibrations in higher frequency regions are displayed as irregular striations – dark vertical lines in the upper part of the spectrogram. The main resonant frequency (marked as the darkest part on the spectrogram) rises as the size of the oral cavity decreases (i.e. the further forward in the mouth the obstruction is).

/f/, /v/ 4500-7000 Hz
Figure 14: A spectrogram of the word /farmer/: notice the initial /f/ with characteristic fricative noise in the frequency range of 4500-7000 Hz.

/s/, /z/ 4000 Hz
Figure 15: A spectrogram of the word sad /sat/: notice the initial /s/ with characteristic fricative noise at a frequency of 4000 Hz.

/ʃ/, /ʒ/ 3000 Hz
Figure 16: A spectrogram of the word szal /ʃal/: notice the initial /ʃ/ with characteristic fricative noise at a frequency of 3000 Hz.

/ɕ/, /ʑ/ 2000 Hz
Figure 17: A spectrogram of the word siad /ɕat/: notice the initial /ɕ/ with characteristic fricative noise at a frequency of 2000 Hz.

/x/ 1000 Hz
Figure 18: A spectrogram of the word harcerz /xartseʃ/: notice the initial /x/ with characteristic fricative noise at a frequency of 1000 Hz.

The intensity of the noise is generally lower for voiced than for voiceless fricatives. F1 is only visible in the case of /f/, /v/ and /x/ (due to the effect of the antiformant in the case of the other fricatives). Its values vary from 400 to 1000 Hz and are higher for the labiodental than for the velar fricatives. The highest frequency range of the noise is observed for /f/, /v/.
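The noise frequencies listed above can be used as a simple lookup when reading a spectrogram. The sketch below only restates that list as a table and picks the entry closest to a measured noise frequency; the function names and the "closest entry" rule are assumptions made for illustration. Actual identification of a fricative also relies on voicing, intensity and the transitions of the neighbouring vowels.

```python
# Main noise frequency (Hz) of each fricative pair, as listed in this section.
# Single values are stored as a band with equal lower and upper limits.
FRICATIVE_NOISE_HZ = {
    "/f/, /v/": (4500, 7000),
    "/s/, /z/": (4000, 4000),
    "/ʃ/, /ʒ/": (3000, 3000),
    "/ɕ/, /ʑ/": (2000, 2000),
    "/x/": (1000, 1000),
}


def distance_to_band(freq_hz, band):
    """Distance in Hz from a frequency to a (low, high) band; 0 if inside."""
    low, high = band
    if freq_hz < low:
        return low - freq_hz
    if freq_hz > high:
        return freq_hz - high
    return 0


def closest_fricatives(noise_peak_hz):
    """Return the pair whose listed noise frequency is closest (hypothetical rule)."""
    return min(FRICATIVE_NOISE_HZ,
               key=lambda label: distance_to_band(noise_peak_hz,
                                                  FRICATIVE_NOISE_HZ[label]))


for measured_hz in (5200, 3100, 1200):
    print(measured_hz, closest_fricatives(measured_hz))
```

The lookup cannot separate the voiced and voiceless member of a pair; for that, the presence of vertical striations (voicing) and the lower noise intensity of voiced fricatives have to be considered.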