Fundamental terms

1 Fundamental terms
1.1 Sound waves: production and perception
1.2 Frequency
1.3 Amplitude
1.4 Acoustic vibrations
1.5 Simple and complex waves
2 Distinctive features of sounds
2.1 Pitch
2.2 Loudness
2.3 Quality
3 Machine analysis
3.1 Waveforms
3.2 Spectrograms
3.3 Analysis of vowels
3.3.1 Oral vowels
3.3.2 Nasal vowels
3.4 Analysis of consonants
3.4.1 Stops
3.4.2 Nasals
3.4.3 Lateral /l/
3.4.4 Trill /r/
3.4.5 Glides
3.4.6 Fricatives
1 Fundamental terms
1.1 Sound waves: production and perception
Sound waves originate in a disturbance that produces vibration carried by a propagation medium (i.e. the substance or object through which the sound travels, usually air). The vibration consists of small, rapidly occurring variations in air pressure.
Figure 1: Diagrammatic representation of fluctuations in air pressure caused by a vibrating tuning fork (from Ladefoged, Elements of acoustic phonetics).
In the case of speech, the variations are caused by the movements of the speaker's articulatory organs, which are superimposed on the outgoing stream of air (cf. the egressive airstream mechanism).
When the sound wave reaches the ear, it causes the eardrum to vibrate. The eardrum is connected to a chain of small bones which transmit its vibrations to the liquid in the inner ear. The vibrating liquid stimulates the nerves leading to the auditory area of the brain, which produces the sensation of hearing.
Figure 2: A schematic diagram of the mechanism of the ear (from Ladefoged, Elements of acoustic phonetics).
1.2 Frequency
Each sound wave is characterized by a specific frequency and amplitude.
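Frequency refers to how often the wave repeats itself, i.e. the number of complete cycles per second. It is inversely related to the time between successive peaks of the wave (the period): the closer together the peaks are, the higher the frequency.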
Figure 3: A periodic wave (from Davenport & Hannahs, Introducing phonetics and phonology).
Frequency is measured in cycles per second, or hertz (Hz). One cycle is the movement of the wave from the rest position (B) up to the peak (C), back to rest, down to the trough (A) and back to rest again.
A sound wave whose frequency is 100 Hz has 100 cycles in a second.
Figure 4: A wave with a frequency of 20 Hz (from Davenport & Hannahs, Introducing phonetics and phonology).
The fundamental frequency (F0) of a voiced speech sound is the frequency of vocal fold vibration. Depending on the size of the vocal apparatus, the F0 of the human voice typically falls within the following ranges:
80-220 Hz (men)
120-300 Hz (women)
200-500 Hz (children)
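Because frequency and period are reciprocal (T = 1/f), the F0 ranges above can also be expressed as the duration of one vocal-fold cycle. The short Python sketch below does this arithmetic; the dictionary simply restates the list above, and the name `period_ms` is our own.

```python
# F0 ranges restated from the list above (Hz); typical values, not hard limits.
F0_RANGES = {"male": (80, 220), "female": (120, 300), "children": (200, 500)}


def period_ms(freq_hz: float) -> float:
    """Duration of one cycle in milliseconds, using T = 1 / f."""
    return 1000.0 / freq_hz


# A 100 Hz wave completes 100 cycles per second, so each cycle lasts 10 ms.
print(period_ms(100))  # 10.0

for group, (low, high) in F0_RANGES.items():
    # The higher the frequency, the shorter the period, so the bounds swap.
    print(f"{group}: {low}-{high} Hz -> {period_ms(high):.1f}-{period_ms(low):.1f} ms per cycle")
```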
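1.3 Amplitude
Amplitude refers to the size of the variation in air pressure. It is usually measured as the maximum distance of the wave from its rest position (the height of the peak); the distance between the peak and the trough is the peak-to-peak amplitude.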
1.4 Acoustic vibrations
To be audible to humans, a vibration must have a frequency in the range of roughly 20-20,000 Hz.
Vibrations can follow different patterns:
a) periodic – the wave pattern repeats at regular time intervals (the amplitude returns to the same value after each period), so the frequency of the wave can be determined. Periodic vibrations have a musical quality (vowels and sonorant consonants: glides, liquids and nasals).
b) aperiodic – the amplitude of the wave varies randomly, so no frequency can be determined. Aperiodic vibrations sound noise-like rather than musical, e.g. buzz, murmur or hiss (voiceless obstruents).
c) mixed – an aperiodic component is superimposed on a periodic one (voiced obstruents).
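The three patterns can be told apart numerically with autocorrelation: a periodic signal matches a copy of itself shifted by one period almost perfectly, noise does not, and a mixed signal falls in between. The Python sketch below is one simple way to do this, assuming an F0 search range of 60-500 Hz; the 0.9 and 0.4 thresholds are arbitrary illustrative choices, not standard values, and the synthetic signals merely stand in for real speech.

```python
import math
import random


def autocorr_peak(signal, fs, f0_min=60.0, f0_max=500.0):
    """Return (r, lag): the highest normalised autocorrelation r found at lags
    corresponding to f0_min..f0_max Hz, and the lag (in samples) where it occurs."""
    best_r, best_lag = 0.0, None
    for lag in range(int(fs / f0_max), int(fs / f0_min) + 1):
        a, b = signal[:-lag], signal[lag:]
        num = sum(x * y for x, y in zip(a, b))
        den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
        if den > 0 and num / den > best_r:
            best_r, best_lag = num / den, lag
    return best_r, best_lag


def classify_vibration(signal, fs, hi=0.9, lo=0.4):
    """Label a signal 'periodic', 'mixed' or 'aperiodic' and estimate its
    frequency when one exists. The thresholds are illustrative assumptions."""
    r, lag = autocorr_peak(signal, fs)
    if r <= lo:
        return "aperiodic", None  # no repeating pattern: frequency undetermined
    kind = "periodic" if r >= hi else "mixed"
    return kind, fs / lag  # one period lasts `lag` samples, so f = fs / lag


fs = 8000  # sampling rate in Hz
tone = [math.sin(2 * math.pi * 100 * n / fs) for n in range(800)]  # 100 Hz, 0.1 s
rng = random.Random(1)
noise = [rng.gauss(0.0, 0.5) for _ in range(800)]  # hiss-like random signal
mixed = [t + z for t, z in zip(tone, noise)]  # noise superimposed on the tone

print(classify_vibration(tone, fs))  # ('periodic', 100.0)
print(classify_vibration(noise, fs)[0])
print(classify_vibration(mixed, fs)[0])
```

A higher-quality pitch tracker would refine the lag between samples and guard against octave errors; the fixed lag range here keeps the sketch short.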
Figure 5: Examples of complex waves of a periodic (top), aperiodic (middle) and mixed (bottom) vibration (from
Dukiewicz).
1.5 Simple and complex waves
Speech sounds consist of complex waves, which result from the superposition of a number of simple waves (pure tones). In periodic sounds these components are harmonics, whose frequencies are whole-number multiples of the fundamental frequency of the sound. For example, a sound (periodic or mixed vibration) with a fundamental frequency of 100 Hz will have harmonics at 200, 300, 400 Hz, etc.
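The relation between a fundamental, its harmonics and a complex wave can be sketched in a few lines of Python. The 8 kHz sampling rate and the 0.5 amplitude given to the 500 Hz component are assumptions made for illustration; the figures below do not state the amplitudes.

```python
import math


def harmonics(f0, count):
    """First `count` harmonics: whole-number multiples of the fundamental f0."""
    return [k * f0 for k in range(1, count + 1)]


def complex_wave(components, fs, n_samples):
    """Superpose simple waves given as (frequency_hz, amplitude) pairs."""
    return [sum(a * math.sin(2 * math.pi * f * n / fs) for f, a in components)
            for n in range(n_samples)]


print(harmonics(100, 4))  # [100, 200, 300, 400]

# Two simple waves of 100 and 500 Hz added together (cf. Figures 6 and 7).
fs = 8000
wave = complex_wave([(100, 1.0), (500, 0.5)], fs, n_samples=160)
# Because 500 Hz is a whole-number multiple of 100 Hz, the complex wave
# repeats every 1/100 s, i.e. every 80 samples at 8 kHz.
print(max(abs(wave[n] - wave[n + 80]) for n in range(80)) < 1e-9)  # True
```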
Figure 6: Two simple waves of frequency 100 and 500 cps.
Figure 7: The complex wave resulting from superposition of two simple waves of 100 and 500 cps (from Ladefoged,
Elements of acoustic phonetics).
2 Distinctive features of sounds
Two sounds of the same duration (length) can differ with respect to:
2.1 Pitch
Pitch is the subjective impression of how “high” or “low” a sound is, as perceived by humans. It is related to the frequency of the vibration, which is the acoustic (objective) measure of the same property. The relation is not one-to-one: two sounds of different frequency can be perceived as having the same pitch.
2.2 Loudness
Loudness is related to the amplitude of the sound in the same way as pitch is related to frequency. The greater the amplitude, the louder the sound is perceived to be.
Amplitude is also affected by the distance the sound travels and by how efficiently the medium carries it: the greater the distance, the less audible the sound becomes, and some materials, e.g. wood, carry sound more efficiently than air.
2.3 Quality
Two sounds of the same duration, frequency and amplitude can still differ in quality (or colouring, also called timbre). Differences in quality result from differences in the shape of the medium in which the sound resonates (hence the differences in the perception of the same phoneme produced by different speakers, as well as the differences in vowel quality produced by different shapes of the vocal tract) and in the material enclosing that medium (in the case of musical instruments, e.g. a metal flute vs. a wooden violin). Depending on these features (shape and material), some harmonics of the sound are emphasized and others are weakened.
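The idea that a resonating body emphasizes some harmonics and weakens others can be sketched numerically. In the Python example below, two hypothetical resonators with different resonance frequencies act on the same harmonic series (F0 = 100 Hz), and a different harmonic ends up strongest in each case. The 1/k source spectrum, the Gaussian-shaped resonance curve, its 100 Hz bandwidth and the resonance frequencies are all illustrative assumptions, not a model of any particular instrument or vocal tract.

```python
import math


def harmonic_amplitudes(f0, n_harmonics, resonances, bandwidth=100.0):
    """Return (frequency, amplitude) for each harmonic of f0 after a
    hypothetical resonator emphasises components near `resonances` (Hz)."""
    out = []
    for k in range(1, n_harmonics + 1):
        freq = k * f0
        source = 1.0 / k  # assumed source spectrum: harmonics weaken as 1/k
        gain = sum(math.exp(-((freq - r) / bandwidth) ** 2) for r in resonances)
        out.append((freq, source * gain))
    return out


def strongest_harmonic(spectrum):
    """Frequency of the harmonic with the largest amplitude."""
    return max(spectrum, key=lambda pair: pair[1])[0]


# Same F0 and the same harmonics, passed through two differently shaped resonators.
shape_a = harmonic_amplitudes(100, 30, resonances=[300, 2300])
shape_b = harmonic_amplitudes(100, 30, resonances=[700, 1100])
print(strongest_harmonic(shape_a), strongest_harmonic(shape_b))  # 300 700
```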
3 Machine analysis
There are various techniques that make it possible to “see” sounds:
3.1 Waveforms
A waveform shows the variations in air pressure associated with speech sounds over time. In voiced stretches, pulses corresponding to the vibrations of the vocal folds can be seen.
What can we read from a waveform?
a) amplitude
b) F0
c) the manner of articulation (to some extent):
vowels, approximants and nasals – voicing pulses with high amplitude and energy (highest in vowels, then approximants, and lowest in nasals)
voiced obstruents (plosives, fricatives and affricates) – voicing pulses with low energy and amplitude (in fricative segments and plosives)
voiceless obstruents – flat, empty stretches for stops; aperiodic variation in amplitude for fricatives and for the fricative component of affricates
Figure 8: Waveform of an utterance: Ostatnie przygody Korowiowa i Behemota (male speaker).
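Some of the cues listed above can be approximated automatically. The Python sketch below labels short frames of a waveform using two simple measures: RMS amplitude (very low in the silent gap of a voiceless stop closure) and zero-crossing rate (high for the aperiodic noise of voiceless fricatives, low for voiced pulses). This energy/zero-crossing combination is a classic heuristic, not the method behind Figure 8; the frame length and thresholds are arbitrary assumptions, the signals are synthetic stand-ins, and the mixed case of voiced obstruents is not handled.

```python
import math
import random


def frame_features(signal, frame_len):
    """(rms, zero_crossing_rate) for consecutive, non-overlapping frames."""
    features = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
        features.append((rms, crossings / (frame_len - 1)))
    return features


def label_frames(signal, frame_len=160, silence_rms=0.05, noise_zcr=0.3):
    """Label each frame 'gap', 'noise' or 'voiced' (illustrative thresholds)."""
    labels = []
    for rms, zcr in frame_features(signal, frame_len):
        if rms < silence_rms:
            labels.append("gap")  # near-silence, e.g. a voiceless stop closure
        elif zcr > noise_zcr:
            labels.append("noise")  # rapid sign changes: aperiodic friction noise
        else:
            labels.append("voiced")  # slow, regular oscillation: vocal-fold pulses
    return labels


fs = 8000
vowel = [math.sin(2 * math.pi * 120 * n / fs) for n in range(800)]  # stand-in vowel
closure = [0.0] * 320  # silent gap
rng = random.Random(0)
frication = [rng.gauss(0.0, 0.3) for _ in range(480)]  # stand-in hiss
print(label_frames(vowel + closure + frication))
```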
3.2 Spectrograms
A spectrogram displays how the frequency content of a sound varies over time; it is produced by a spectrograph.
On the spectrogram you can see vertical lines, which correspond to the pulses of the vocal folds.
Along the frequency axis, certain frequencies are emphasized (dark marks); they appear as horizontal bands or, in the higher frequencies, as irregular striations. The emphasized frequencies above F0 that appear on the spectrogram are called formants.
Formant frequencies depend on the size and shape of the vocal tract, so in spectrographic analysis they provide information on the place and manner of articulation.
In speech analysis, the first four formants are taken into account; they are labelled F1, F2, F3 and F4 (from the lowest to the highest on the frequency scale). F1 and F2 are the most important indicators of vowel quality, whereas the higher formants reflect the speaker's individual characteristics (voice quality).
In connected speech, the changes in formant frequencies that occur as the vocal tract moves from the setting for one sound to the next are called transitions.
Spectrograms provide a reliable basis for analysing such aspects of speech sounds as duration, F0 and phonetic features (e.g. aspiration), and they provide the information needed to identify different speech sounds (e.g. from formant frequencies, transitions and the display of vocal fold pulses).
Figure 9: Spectrogram of an utterance: Ostatnie przygody Korowiowa i Behemota (male speaker).
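A spectrograph's display can be imitated digitally by cutting the signal into short overlapping frames and measuring how much energy each frame has at each frequency. The Python sketch below does this with a Hann window and a plain discrete Fourier transform; it is deliberately simple and slow (practical tools use an FFT, e.g. NumPy's `numpy.fft.rfft` or SciPy's `scipy.signal.spectrogram`), and the 256-sample frames, 128-sample hop and the synthetic test signal are assumptions. Large magnitudes correspond to the dark marks on a printed spectrogram, and following the strongest peak from frame to frame shows how a change in frequency over time (as in a formant transition) appears.

```python
import cmath
import math


def spectrogram(signal, frame_len=256, hop=128):
    """Magnitude spectrogram as a list of frames; in each frame, bin k holds
    the magnitude at k * fs / frame_len Hz (k = 0 .. frame_len // 2)."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]  # Hann window reduces spectral leakage
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = [s * w for s, w in zip(signal[start:start + frame_len], window)]
        frames.append([
            abs(sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                    for n, x in enumerate(chunk)))
            for k in range(frame_len // 2 + 1)  # naive DFT, positive frequencies
        ])
    return frames


def peak_frequencies(frames, fs, frame_len=256):
    """Frequency (Hz) of the strongest bin in each frame."""
    return [max(range(len(f)), key=f.__getitem__) * fs / frame_len for f in frames]


fs = 8000
# Synthetic signal: 500 Hz for 512 samples, then 1500 Hz for 512 samples.
signal = ([math.sin(2 * math.pi * 500 * n / fs) for n in range(512)]
          + [math.sin(2 * math.pi * 1500 * n / fs) for n in range(512)])
peaks = peak_frequencies(spectrogram(signal), fs)
print(peaks[0], peaks[-1])  # 500.0 1500.0
```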
3.3 Analysis of vowels
3.3.1 Oral vowels
The auditory quality of a vowel results from the specific variations in air pressure imposed by the shape of the vocal tract on the F0 produced by the vocal folds. Generally, F2 decreases from front to back vowels. F1 changes with the vertical position of the tongue (vowel height): it increases from high to low vowels.
Figure 10: The range of formant frequencies of the 6 Polish vowels pronounced by 10 speakers (after W. Jassem).
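The two tendencies described above can be written as a simple comparison of (F1, F2) pairs. The Python sketch below only illustrates those rules: the formant values in the usage line are hypothetical round numbers, not measurements from Figure 10, and real vowel identification has to allow for the overlapping ranges produced by different speakers.

```python
def compare_vowels(a, b):
    """Describe vowel b relative to vowel a, given (F1, F2) pairs in Hz.

    Follows the rules of thumb above: F1 rises from high to low vowels,
    F2 falls from front to back vowels."""
    (f1_a, f2_a), (f1_b, f2_b) = a, b
    if f1_b > f1_a:
        height = "lower"
    elif f1_b < f1_a:
        height = "higher"
    else:
        height = "same height"
    if f2_b < f2_a:
        backness = "further back"
    elif f2_b > f2_a:
        backness = "further front"
    else:
        backness = "same backness"
    return height, backness


# Hypothetical values: a high front vowel vs. a low, more back vowel.
print(compare_vowels((300, 2200), (750, 1300)))  # ('lower', 'further back')
```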
3.3.2 Nasal vowels
In Polish there are two nasal vowels. In spelling they are written ⟨ą⟩ and ⟨ę⟩. These vowels are realized as diphthongs (unlike the monophthongal realization in, e.g., French): they consist of /o/ or /e/ (for ą and ę respectively) followed by a nasal segment: /m/, /n/, /ɲ/, /ŋ/, /w̃/ or /j̃/.
The polysegmental structure of Polish nasal vowels is reflected in the spectrogram.
The part of the display corresponding to the oral vowel /o/ or /e/ has the formant frequencies characteristic of that vowel. The display changes with the onset of the nasal segment: the frequencies of F2 and F3, as well as the energy of all the formants, decrease. The F2 transitions are characteristic of the following nasal segment.
Figure 11: Spectrogram of the word „pęk”: the nasal vowel ę is realized as /e/ followed by the velar nasal /ŋ/.
3.4 Analysis of consonants
The spectrographic analysis of consonants is more complex, because a consonant can often be identified only from the “behavior” of the neighbouring vowel formants. However, it is also possible to identify a consonant from the acoustic correlates of consonantal features:
• voicing: vertical striations corresponding to the vibrations of the vocal folds
• place of articulation: characteristic transitions and locus (see below)
• manner of articulation: characteristic formant structure and other cues
3.4.1 Stops
Stops appear as gaps in the spectrogram pattern, followed by a burst of noise (voiceless stops) or a sharp onset of formant structure (voiced stops).
Stops are identified mainly by their effect on the following (or preceding) vowel: the vowel formant transitions start (or end) at different points depending on the place of articulation; this point is called the locus.
The correspondence between locus and place of articulation is not straightforward, because the actual point where a formant transition starts (or ends) depends on the vowel. The parts of the tongue not involved in forming the closure are already in the position for the vowel, so at the moment of release the formant frequencies are determined by the shape of the vocal tract as a whole.
• bilabial: F2 and F3 loci comparatively low; a rapid rise in all three formant frequencies at the release of the closure and a rapid fall at the moment of closure
• dental (also post-dental): F2 locus at about 1700-1800 Hz; a rapid rise in F1 and a slight fall in F2 and F3 at the release, and a fall in all three formant frequencies at the moment of closure
• velar: F2 usually high; a rapid rise in F1 at the release and a fall at the closure; a rapid fall in F2 and F3 at the release, and F2 and F3 drawing closer together towards the closure
Figure 12: A spectrogram of the words „bab”, „dad”, „gag” (Brit. Eng. accent, from Ladefoged).
3.4.2 Nasals
Nasal consonants have a formant structure similar to that of vowels, but with nasal formants at about 250, 2500 and 3250 Hz. These are independent of the place of articulation and therefore have fairly stable values; they occur at intervals of about 800-1000 Hz. The higher formants are considerably reduced in intensity.
Nasal consonants can be distinguished by the length of the acoustically effective part of the oral tract, i.e. the space between the closure formed by the articulators and the point where the oral tract joins the pharynx and the nasal tract. Generally, the shorter this space, the higher the formant frequencies.
Nasals affect the neighbouring vowels in a similar way to stops: depending on the vowel quality and the place of articulation, the vowel formant transitions differ in direction and locus.
Antiformants are characteristic of nasals. They result from the blocked oral part of the vocal tract, which acts as a closed side branch, and can be defined as marked energy minima at specific frequencies. Their frequencies depend on the acoustically effective part of the oral tract in the same way as formant frequencies do: the shorter it is, the higher the antiformant.
/m/: antiformant between F2 and F3 (about 800 Hz)
/n/: antiformant between F3 and F4 (about 1400 Hz)
/ɲ/: antiformant between F4 and F5 (about 3000 Hz)
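A common simplification treats the closed oral cavity of a nasal as a uniform tube closed at one end; its lowest resonance, at c / 4L, shows up in the output as an antiformant. Under that assumption, and taking the speed of sound c as roughly 35,000 cm/s, the Python sketch below converts the antiformant values listed above into implied lengths of the closed oral branch, which shows the inverse relation between length and frequency described in the text. The tube model and the value of c are assumptions, and the resulting lengths are rough estimates rather than measurements.

```python
SPEED_OF_SOUND_CM_S = 35_000  # approximate speed of sound in warm, moist air


def quarter_wave_hz(length_cm, n=1):
    """n-th resonance of a uniform tube closed at one end: (2n - 1) * c / (4 * L)."""
    return (2 * n - 1) * SPEED_OF_SOUND_CM_S / (4 * length_cm)


def branch_length_cm(antiformant_hz):
    """Invert the first quarter-wave resonance: L = c / (4 * f)."""
    return SPEED_OF_SOUND_CM_S / (4 * antiformant_hz)


# Antiformant frequencies from the list above.
for nasal, af in [("/m/", 800), ("/n/", 1400), ("/ɲ/", 3000)]:
    print(f"{nasal}: {af} Hz -> closed oral branch about {branch_length_cm(af):.1f} cm")

# Sanity check with the commonly used 17.5 cm uniform-tube approximation.
print([quarter_wave_hz(17.5, n) for n in (1, 2, 3)])  # [500.0, 1500.0, 2500.0]
```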
Figure 13: A spectrogram of „pin”, „Tim”, „king” (Brit. English accent, from Ladefoged).
Spectrograms of /maŋgo/, /dana/ and /daɲa/.
3.4.3 Lateral /l/
Like nasals, /l/ has a formant structure similar to that of vowels:
• the mean frequency of F1 is about 400 Hz
• the energy of F3 or F4 is higher than that of F2
• the distance between F3, F4 and F5 is smaller than in vowels (about 1500 Hz)
• there is an antiformant at 3500-4500 Hz, resulting from the characteristic shape of the vocal tract (the airstream is obstructed at a point along the center of the oral tract and escapes at the sides)
Spectrograms of lubię /lubje/ and liczę /litʃe/.
3.4.4 Trill /r/
The consonant /r/ consists of vocalic and consonantal segments that alternate. They are very short, and the vocalic segment is usually longer than the consonantal one. In connected speech, /r/ is most often produced as two consonantal segments separated by a vocalic segment, with a total duration of 20-30 ms.
In the context of vowels with a relatively low F2, /r/ has the following features:
• F2 locus = 1250 Hz, F3 locus = 1500 Hz (or 1600 Hz)
• the loci of F3, F4 and F5 in the consonantal segment are lower than the corresponding formants of the vocalic segment
• the formant transitions, which are considerable, extend over the length of the vocalic segment
• towards the consonantal segment, the F2 transition is positive (rising) and the F3, F4 and F5 transitions are negative (falling)
• the greatest frequency difference is observed between the consonantal segment and the F3 locus of the vowel (it can exceed 1000 Hz)
Spectrograms of /torty/ and /orka/.
3.4.5 Glides
Like nasals, lateral /l/ and trill /r/, the two approximants /j/ and /w/ have a formant structure similar to that of vowels, but characterized by dynamic changes in the formants.
/j/
When followed by a vowel with a relatively low F2, /j/ has the following features:
• F1 is stable
• loci of the formants: F1 = 250 Hz, F2 = 2300 Hz, F3 = 3300 Hz, F4 = 3400 Hz
• the energy of F2 in /j/ is lower than in the adjacent vowel
• at the start of /j/, F3 is steady (F3 = 3300 Hz); it then moves towards the F3 locus of the adjacent vowel, and over the length of the vowel the transition changes direction (from falling to rising)
• in contrast to F1 and F2, the frequency of F4 is higher for /j/ than for the adjacent vowel; the distance between F4 and F3 is only 100 Hz
When preceded by a vowel with a relatively low F2, such as /u/, /j/ has the following features:
• the formant transitions are generally mirror images of those occurring when the vowel follows /j/
• F3 approaches F2 instead of F4
• loci of the formants: F1 = 250 Hz, F2 = 2300 Hz, F3 = 2500 Hz, F4 = 3500 Hz
• except for F1, the formants are not steady
Spectrogram of /kruj/.
When preceded by a vowel with a relatively high F2 (such as /ɨ/), /j/ has the following features:
• minor changes in the F2 frequency of the vowel
• steady F3 and F4 of the vowel
• no characteristic “double angle” (which occurs when the vowel has a relatively low F2)
• nearly equal distances between F2, F3 and F4
• loci of the formants: F1 = 250 Hz, F2 = 2400 Hz, F3 = 3050 Hz, F4 = 3750 Hz
Spectrogram of /krɨj/.
/w/
• in contrast to /j/, the formant frequencies are steadier in /w/ + vowel sequences than in vowel + /w/ sequences
• loci of the formants: F1 = 300-500 Hz, F2 = 700-900 Hz, F3 = 2500-2700 Hz, F4 = 2900-3200 Hz
• the energy of the formants is significantly lower than that of the vowel, except in sequences where the F1 locus of the vowel and that of /w/ have the same frequency
• there is no tendency to create a “double angle”
Spectrograms of /piwci/ and /stuw/.
3.4.6 Fricatives
Fricatives are characterized by a random noise pattern, located mainly in the higher frequency regions but dependent on the place of articulation. The aperiodic vibrations in the higher frequencies appear as irregular striations: dark vertical lines in the upper part of the spectrogram.
The main resonant frequency (the darkest part of the noise on the spectrogram) rises as the oral cavity in front of the constriction becomes smaller, i.e. the further forward in the mouth the obstruction is.
/f/, /v/: 4500-7000 Hz
Figure 14: A spectrogram of the word /farmer/: notice the initial /f/ with characteristic fricative noise in the 4500-7000 Hz region.
/s/, /z/: 4000 Hz
Figure 15: A spectrogram of the word sad /sat/: notice the initial /s/ with characteristic fricative noise in the region of 4000 Hz.
/ʃ/, /ʒ/: 3000 Hz
Figure 16: A spectrogram of the word szal /ʃal/: notice the initial /ʃ/ with characteristic fricative noise in the region of 3000 Hz.
/ɕ/, /ʑ/: 2000 Hz
Figure 17: A spectrogram of the word siad /ɕat/: notice the initial /ɕ/ with characteristic fricative noise in the region of 2000 Hz.
/x/: 1000 Hz
Figure 18: A spectrogram of the word harcerz /xartseʃ/: notice the initial /x/ with characteristic fricative noise in the region of 1000 Hz.
The noise is generally less intense in voiced than in voiceless fricatives.
F1 is visible only for /f/, /v/ and /x/; in the other fricatives it is masked by the effect of an antiformant. Its values range from 400 to 1000 Hz and are higher for the labiodental than for the velar fricatives.
The noise extends over the highest frequency range in /f/, /v/.
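The noise regions listed above can be turned into a crude lookup: given the frequency region where the fricative noise is strongest, pick the fricative pair whose listed region is nearest. The Python sketch below does just that; treating the single values as zero-width bands and measuring distance in plain Hz are simplifying assumptions, and real identification also relies on cues such as noise intensity, F1 and formant transitions.

```python
# Noise regions of Polish fricatives as listed above (Hz). Single values are
# treated as zero-width bands, which is a simplification.
FRICATIVE_BANDS = [
    ("/f/, /v/", 4500, 7000),
    ("/s/, /z/", 4000, 4000),
    ("/ʃ/, /ʒ/", 3000, 3000),
    ("/ɕ/, /ʑ/", 2000, 2000),
    ("/x/", 1000, 1000),
]


def distance_to_band(freq_hz, low, high):
    """0 inside the band, otherwise the distance in Hz to its nearest edge."""
    if freq_hz < low:
        return low - freq_hz
    if freq_hz > high:
        return freq_hz - high
    return 0


def guess_fricative(noise_hz):
    """Return the fricative pair whose listed noise region is closest."""
    label, _, _ = min(FRICATIVE_BANDS,
                      key=lambda band: distance_to_band(noise_hz, band[1], band[2]))
    return label


for measured in (6000, 3100, 1200):
    print(measured, guess_fricative(measured))
```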