speech_LECT15

DEVELOPMENT AND NEURAL CONTROL OF SPEECH PRODUCTION
Definitions of Speech and Language
In considering how humans and other species communicate, it
is clear that certain symbols must be transmitted from one
individual to another for meaning to be conveyed. We do not
otherwise know what the "sender" means to convey, so we must rely
on signals of one kind or another. These may be sound waves, or
writing or other man-made images, or body movements such as shrugging,
making a fist, or, for more detail, sign language.
Signals are sent and received through some type of language.
A language may be defined as an arbitrary system of signs or
symbols used according to prescribed rules to convey meaning
within a linguistic community (Kent, 1988). Animals other than
humans have meaningful communications, such as wagging of tails or
barking and baring of fangs; they are much broader in meaning
(less capable of detail) than human language. We humans also use
body language at times, to communicate an overall emotional or
affective background to our conversations.
Speech is one mode of using language; it is a system that
relates meaning with sound. The fact that humans rely so heavily
on speech for communication is probably a consequence of the
organization of our nervous systems: Humans are uniquely disposed
to symbolically transforming our experiences. Since the dawn of
social interaction, we have drawn pictures on the walls of caves
and more recently on paper, canvas, or the sides of freeway ramps,
to communicate to others how we feel about something. Given the
flexible muscular apparatus of the mouth and pharynx, and our
propensity toward language, it was only a matter of time until we
learned to convert squeaks and grunts to the symbol-rich noises of
speech. (Whales and other cetaceans are also capable of producing
a wide variety of sounds, leading to speculation that they may
communicate on a similarly complex level.) Humans like to
symbolize experience, and speech is one of the principal ways.
Development of Language and Speech
Language
The rather stereotypical way that language develops in
infants also sheds light on the role of the nervous system in
making detailed informational sounds. It used to be said that
children learn to speak (the language) by reinforcement of the
babbling sounds they make. When he or she says a word correctly, a
parent smiles and the child files the word away as a "good" sound.
We now know this is not correct, and that children would have to
babble for several hundred years to learn enough words in this
way. Instead, Chomsky (1959) and others have argued convincingly
that children are born with an innate ability to learn and to use
language. This really means to use the rules of language, because
language is governed by the rules of morphology and syntax, which
tell how we arrange sounds into words and sentences. Morphology
describes the rules governing changes in word units. A morpheme is
the smallest meaningful unit of a word; e.g. "work" is one
morpheme, "worked" is two morphemes. Syntax covers the rules of
word order, or how words are put together to form a sentence.
The way in which children learn language is illustrated in
the book by Perkins (1971): "When, near his first birthday, he
speaks his first meaningful word (the child) gives evidence of
having broken the semantic and phonemic codes. His first word has
meaning. It expresses an idea. It might, for example, be `moo,'
which, because it is night and no cows are in sight, can be taken
to refer to the moon in the sky....An adult can translate it into
mature linguistic form: `That object up there is the moon.'"
One clue about the way children learn languages comes
from the study of "parentese", the type of speech that adults
adopt for talking to newborn babies. This is melodious, with
drawn-out vowel sounds: "Hellooooo, Baaa-beee. How are
yooooooo?" etc. Dr. Kuhl at the University of Washington studied how
mothers talk to their newborns in America, Sweden and Russia
and found they all use the same kind of speech. This teaches
the babies how the vowels sound in each language. There are
nine vowel sounds in English, five in Russian and sixteen in
Swedish.
We can see that children learn the rules of grammar by the
mistakes they make. "He gived it to me," correctly follows the
rules of past tense-making. The speaker has not yet learned that
"give" is an irregular verb. A computer will probably never be
able to translate English syntax, because it can’t learn the
difference between "Time flies like an arrow," and "Fruit flies
like a banana."
Speech
An approximate timetable for the appearance of different
stages of infant utterances is shown in Table I. These times
represent average values, and should not be taken as diagnostic of
any dysfunction.
Table I. Typical phonological development in the young child
________________________________________________________________
Age             Stage                       Characteristics
________________________________________________________________
0 - 6 months    Vocalization                Sound production in larynx
                Cooing                      Production of posterior
                                            sounds, "coo" or "goo"
                Expansion                   More complex sounds
6 - 10 months   Canonical babbling          Real syllables, "bababa"
                                            or "dadada"
6 - 18 months   Variegated babbling         Increasingly varied
                                            consonants and vowels
12 months       Appearance of true speech   One-word utterances
18 months       "                           Two-word phrases
30 months       "                           More complex utterances
________________________________________________________________
In the phonation stage, sounds are produced in the larynx but
there is little resonance or reinforcing vibrations. The mouth may
be nearly closed, and the resulting sounds have a nasal quality.
The cooing stage involves some primitive syllable formation, and
rounded vowels such as "oo". In the expansion stage the child
gains increasing control of the vocal mechanism; sounds such as
squeals, clicks, growls, buzzing of the lips or yells may be
produced. Vowels become more fully resonant. The canonical
babbling stage (also known as reduplicated babbling) shows the
appearance of true consonants and is easily recognized. Parents
say the child is "talking," although no meaningful speech is
produced. In variegated babbling, the sounds are repeated less and
sound more like speech.
True speech, of course, has meaning when it first appears in
the child's utterances. One-word utterances such as "Mama" or
"ball" are recognizable as speech if they are produced when the
child sees the named person or object. The next step is two-word
phrases, a giant act of combination.
These may sound like "Daddy go," "More cookie," etc., although
made-up words may also be used. From here on sentences become ever
more complex, depending on how much the child reads or how little
he or she watches television.
Mechanism of Voice Production in the Adult
The vocal organs are the lungs, the trachea, the larynx
(containing the vocal cords), the throat (pharynx), the nose and
mouth. Together, these organs form an intricate "tube" extending
from the lungs to the lips. The part of the tube lying above the
larynx is called the vocal tract and consists of the pharynx,
mouth and nose. The shape of the vocal tract can be varied
extensively by moving the tongue, lips, and other parts of the
tract.
The source of energy for speech production is the steady
stream of air that comes from the lungs as we exhale. When we
breathe, the exhaled air is made audible for speech by vibrating
the air stream through vocal cord action, a process known as
phonation. The vocal cords (or "vocal folds") consist of an
epithelial covering and layers of tissue overlying the
thyroarytenoid (or vocalis) muscles (Sataloff, 1992).
Resonance in the pharynx, nose and mouth then gives the voice its
characteristic overtones. Specific vowel and consonant sounds are
articulated by changing the shapes of resonators and completely or
partially obstructing the flow of air. Thus, the four phases of
voice production are known as respiration, phonation, resonance
and articulation.
Respiration
The muscles of respiration - intercostal, diaphragm, back and
chest muscles - provide the force for moving the airstream and
vibrating the vocal folds. Respiration thus controls the loudness
of the voice. The patterns of inspiration and expiration during
breathing and speech are shown in Figure 3. The lines indicate a
measure of lung inflation such as chest diameter. Normal breathing
consists of a rapid intake and expulsion of air, as indicated.
During speech the air is inspired rapidly and released at a much
slower, almost constant rate until the lung volume has reached a
minimum.
Phonation
In order for phonation to occur, the vocal folds must be
brought sufficiently close together to touch during their
vibration; this is shown in Figure 1, Part A. Different muscles
may affect the position or tension of the folds, as shown in Part
B. For instance, contraction of the lateral cricoarytenoid muscles
rotates the arytenoid cartilages and brings the folds together. The
cartilages do not rotate with each vibration cycle; they rotate only
to position the folds at the start of a speech sound. As we produce
a sound, the vocal cords open and
close rapidly, chopping up the steady airstream into a series of
puffs. We hear this rapid sequence of puffs as a buzz.
The pitch or frequency of the voice is changed by (1)
contraction of the muscles in the vocal folds, increasing their
tension, and (2) changing the shape of the folds. The edges may be
made thin and pointed or thick and well-rounded. From slow-motion
pictures it has been found that the vocal folds actually lengthen
as the pitch of singing rises. This is opposite to what might be
expected, but the effect of lengthening is offset by the increase
in tension. The length of the vocal folds averages 15 mm in men
and 11 mm in women, which is why men have deeper voices. At
puberty the male vocal cords grow to about twice their previous
length over a short period of time. This lowers the voice about an
octave, although the adjustment to the new length results in some
cracking and squeaking of the voice.
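The interplay of length and tension can be sketched with the ideal
vibrating-string formula, f = (1/2L) * sqrt(T/mu). The vocal folds are
layered tissue, not ideal strings, and the tension and mass values
below are purely illustrative assumptions, but the model shows how a
rise in tension can more than offset the pitch-lowering effect of
greater length:

```python
import math

def string_frequency(length_m, tension_n, mass_per_length):
    """Fundamental frequency of an ideal vibrating string:
    f = (1 / 2L) * sqrt(T / mu)."""
    return (1.0 / (2.0 * length_m)) * math.sqrt(tension_n / mass_per_length)

# Illustrative (not measured) tension and mass-per-length values.
mu = 0.0012   # kg/m, assumed

# At equal tension, the longer (male-length) folds vibrate more slowly.
f_short = string_frequency(0.011, 0.6, mu)   # ~11 mm folds
f_long = string_frequency(0.015, 0.6, mu)    # ~15 mm folds
print(f_long < f_short)

# Stretching the folds longer while raising tension can raise pitch:
# the tension increase more than offsets the added length.
f_stretched = string_frequency(0.018, 2.4, mu)
print(f_stretched > f_long)
```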
Resonance
Vowels (open, relatively unobstructed sounds) are made up of
a fundamental frequency determined by the vibration of the vocal
folds and two or more formants, or higher-frequency sounds
produced by resonance in the pharynx, nose and mouth. This effect
is similar to blowing across the tops of two or more bottles; the
frequencies which resonate at the particular lengths of the
bottles are reinforced, resulting in fairly clear tones; the total
sound is the sum of the tones produced.
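The bottle analogy can be made quantitative with the simplest acoustic
model of the vocal tract: a uniform tube closed at the glottis and open
at the lips, whose resonances fall at F_n = (2n - 1) * c / 4L. This
ignores the tract's real, varying cross-section, but for a typical
17 cm adult tract it lands close to the formants measured for a neutral
vowel:

```python
def tube_formants(length_m, n_formants=3, speed_of_sound=343.0):
    """Resonances of a uniform tube closed at one end (glottis) and
    open at the other (lips): F_n = (2n - 1) * c / (4L).
    A crude model of a neutral vocal tract."""
    return [(2 * n - 1) * speed_of_sound / (4.0 * length_m)
            for n in range(1, n_formants + 1)]

# A typical adult vocal tract is roughly 17 cm long.
for i, f in enumerate(tube_formants(0.17), start=1):
    print(f"F{i} = {f:.0f} Hz")
# F1 = 504 Hz, F2 = 1513 Hz, F3 = 2522 Hz -- near the ~500, 1500,
# 2500 Hz formants usually quoted for a neutral ("uh"-like) vowel.
```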
By means of a spectrogram, as shown in Figure 5, the
different frequencies present in vowel sounds may be visualized
directly. Part A shows a spectrogram of the sound [i] (phonetic
representation of the "ee" sound such as in "heat") and Part B of
the sound [u], as in "hoot." The darkness of the bands corresponds
to the amount of energy at a particular frequency. The left side
of Part A indicates three main frequencies present in the [i]
sound. These three tones together give a recognizable [i]; the
other frequencies are higher harmonics and breathiness. Part B
shows the major sound in [u] is the fundamental, about 300 Hz. The
[u] sound is an almost pure tone resulting from the vibration of
the vocal folds and some resonance in the mouth. To make an [i]
sound, the fundamental is produced with the vocal folds, one
formant is produced in the mouth and another between the teeth and
lips.
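A single column of such a spectrogram is just the magnitude spectrum of
one short, windowed frame of the signal. The sketch below builds a
crude vowel-like waveform from three sine components (the frequencies
are illustrative, chosen to fall on exact FFT bins, not to match real
formant values) and recovers them as spectral peaks:

```python
import numpy as np

fs = 8000                          # sample rate, Hz
t = np.arange(0, 0.5, 1 / fs)

# Crude vowel-like signal: a "fundamental" plus two "formants".
signal = (1.0 * np.sin(2 * np.pi * 125 * t)
          + 0.6 * np.sin(2 * np.pi * 500 * t)
          + 0.4 * np.sin(2 * np.pi * 2500 * t))

# One spectrogram column: magnitude spectrum of a windowed frame.
N = 1024
frame = signal[:N] * np.hanning(N)
mags = np.abs(np.fft.rfft(frame))
freqs = np.fft.rfftfreq(N, 1 / fs)

# The dark bands of a spectrogram correspond to local peaks here.
is_peak = (mags[1:-1] > mags[:-2]) & (mags[1:-1] > mags[2:])
peak_bins = np.where(is_peak)[0] + 1
top3 = peak_bins[np.argsort(mags[peak_bins])[-3:]]
print(sorted(float(f) for f in freqs[top3]))   # [125.0, 500.0, 2500.0]
```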
By plotting spectrograms continuously during speaking, it is
possible to make voiceprints, or frequency plots of different
speakers while pronouncing certain words or phrases. These were
once thought to be specific to a given person, because of the
individuality of the anatomical structures and timing of speech.
(After all, most voices are distinct even over the telephone,
which severely limits high and low frequencies). Later, however, it
was realized that different individuals may have identical
voiceprints, so they are not useful for legal identification.
Articulation
Vowels are articulated by altering the shape of the vocal
tract, by changing tongue position or lip or jaw configuration.
One familiar example is the diphthong [ai], as in "high."
Diphthongs are a special class of vowels where there are two or
more vowel sounds and the tongue, jaw and lips change positions to
move from one sound to another. Thus, [ai] is a combination of "a"
[a] and "ee" [i]. To go from the first sound to the second the
tongue moves from a low to a high position and the lips retract.
Consonants are sounds produced by partial or complete
interruption of the airstream. They may be voiced or voiceless,
depending on whether the vocal folds are vibrated during the
consonant. For instance, the bilabial p is voiceless, as in "pit,"
while the bilabial b is voiced, as in "bit." One system of
classification of consonants in English is shown in Table II.
Table II. Types of English consonants
________________________________________________________________
Manner of       Consonants                  Mechanism of production
articulation
________________________________________________________________
Nasal           [m], [n], [ng]              Obstruction of mouth, nasal
                                            passage open
Plosive         [p], [b], [t], [d],         Complete stoppage of air
                [k], [g]
Fricative       [f], [v], [th], [s],        Partial stoppage of air
                [z], [sh], [zh]
Affricate       [ch], [j]                   Partial stoppage, then rapid
                                            release
Lateral         [l]                         Air leaves sides of mouth
Glide           [w], [r], [y]               Shape of resonators changes
________________________________________________________________
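The classification in Table II can be expressed as a simple lookup
table. This sketch follows the standard grouping (with [w] among the
glides and [s], [z] among the fricatives); the function name is an
illustrative choice, not from the source:

```python
# Manner of articulation for English consonants (standard grouping).
MANNER = {
    "nasal":     ["m", "n", "ng"],
    "plosive":   ["p", "b", "t", "d", "k", "g"],
    "fricative": ["f", "v", "th", "s", "z", "sh", "zh"],
    "affricate": ["ch", "j"],
    "lateral":   ["l"],
    "glide":     ["w", "r", "y"],
}

def manner_of(consonant):
    """Return the manner of articulation for a consonant symbol,
    or None if it is not in the table."""
    for manner, members in MANNER.items():
        if consonant in members:
            return manner
    return None

print(manner_of("b"))    # plosive
print(manner_of("sh"))   # fricative
```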
The positions of oral structures during production of some
consonants are shown in Figure 6. The velar consonants [k] and [g]
(voiceless and voiced) are made by raising the posterior tongue
against the palate, thereby blocking the air stream and then
suddenly releasing. The bilabial [m] is made with the lips sealed
and the soft palate lowered, creating continuity of the oral and
nasal cavities, followed by opening the lips and releasing the
air stream. [p] and [b] are also bilabials, but with the soft palate
raised, sealing off the nasal cavity. The fricatives [f] and [v]
are made by placing the upper incisors against the lower lip, with
continuous but constricted airflow, giving them a turbulent or
hissing sound. [t] and [d] are produced with the tip of the tongue
just behind the upper alveolar ridge. The [th] sound is made with
the tongue between the teeth, the [s] with a small space between
the tongue tip and alveolar ridge.
Articulation of consonants (and vowels) takes place
continuously during speech. It is remarkable that both speakers
and listeners learn to distinguish so many sounds during the short
period of time required to say something, and extract the
significant meaning from the rapidly-changing sounds.
Neural Mechanisms of Speech Production
Innervation of laryngeal muscles
The innervation of the laryngeal muscles is shown in Table
III:
Muscle                     Innervation   Action
________________________________________________________________
Thyroarytenoid             X             Relaxes vocal fold
Lateral cricoarytenoid     X             Brings vocal folds together
Transverse arytenoid       X             Closes inlet of larynx by
                                         bringing arytenoid cartilages
                                         together
Posterior cricoarytenoid   X             Separates vocal folds
Efferent pathways for speech
Since so many of the same muscles are used for speech
production as for mastication and swallowing, it is natural that a
large amount of overlap exists in the motor pathways for these
different behaviors. The pyramidal neurons for speech are also
located in the facial area of the motor cortex, but additional
cortical areas are involved in speech that are not used for
mastication or swallowing: (1) Wernicke's area, located in the
topmost gyrus of the temporal lobe, has important associative
functions in creating and recognizing speech sounds. Damage to
this area, as from a stroke or trauma, may result in Wernicke's
aphasia, which includes an inability to comprehend or remember
words (dysnomia). This can lead to so-called "empty speech," where
the structure is grammatically correct but lacks content and
meaning (Lezak, 1976). Auditory comprehension of patients with
Wernicke's aphasia is poor, and is considered to be poorer than
their speech production. (2) Broca's area, the lower portion of
the anterior motor association area in the frontal lobe,
regulates the motor organization and patterning of speech.
Patients with lesions in this area cannot organize the muscles of
speech to form sounds or pattern groups of sounds into words. They
may thus be dysfluent, although their ability to comprehend
language is unimpaired.
With modern imaging techniques, it is possible to record the
regional blood flow, indicating increased neuronal activity, in
different areas of the brain during various language tasks (Posner
and Raichle, 1994). This is shown in Figure 8.
Passively viewing words elicits activity in the visual cortex.
Listening to words generates activity in Broca's and Wernicke's
areas, as does generating verbs. However, speaking words of many
kinds also excites activity in parietal areas. Thus, language
processing apparently involves cortical regions besides the
classical language areas.
Several other neuromuscular conditions may interfere with
normal fluent speech, including dysarthrias, or types of weakness,
paralysis or incoordination. Some diseases which have associated
dysarthrias are parkinsonism, amyotrophic lateral sclerosis, and
cerebellar disease. Such conditions may of course disrupt the
normal patterns of mastication and swallowing as well.
The variety of sounds which can be made with the oral
apparatus indicates the complex sensorimotor integration which
occurs with speech. Much of this occurs at the level of the
medulla, and involves the motor nuclei of cranial nerves V, VII,
and X and the nucleus ambiguus. Sensory feedback about the
position of oral structures is provided to the speaker by oral
mechanoreceptors. The assessment of oral sensory function often
includes a test for oral stereognosis, in which the shapes of
small objects placed in the mouth are matched against drawings of
the different objects.
Feedback about the quality of sound produced is obtained from
auditory receptors. Both oral sensory function and hearing are
vital to development of high quality speech. In this way, speech
production is similar to mastication or swallowing, both of which
depend on sensory feedback.
Effects of Dental Conditions on Speech
Although several types of malocclusions, traumas or
developmental defects may affect the quality of speech, it is
remarkable to what extent the patient is able to compensate for a
particular defect (Lawson and Bond, 1968). Some of the more
obvious conditions and their effects on speech are shown in Table
IV.
Table IV. Effects of dental and orofacial defects on speech
________________________________________________________________
Condition
Speech sound(s) affected
________________________________________________________________
Anterior open bite
[f], [v]
Recessive mandible
[p], [b], [m]
Prognathism
[f], [v]
Cleft palate
nasal speech
________________________________________________________________
The condition of open bite, if extreme, can prevent the upper
teeth from touching the lower lip, interfering with the fricatives
[f] and [v]. A recessive mandible may not allow the correct
apposition of the upper and lower lips to produce the bilabials
[p], [b] and [m]. Prognathism, or a jutting jaw, may likewise not
permit the upper teeth to contact the lower lip. In the condition
of cleft palate the bones of the hard palate may fail to grow
together or to join with the central portion of the nose. This
leads to a permanent opening between the nasal cavity and oral
cavity, giving most speech sounds a strong nasal quality. However,
cleft palate is repairable with surgery or an appliance worn over
the opening, and many other defects can be compensated. For
instance, a good [f] or [v] can be made by placing the lower teeth
against the upper lip. Nevertheless, the practice of modern
dentistry includes an awareness of the effects of restorations and
prostheses on patient speech and comfort.