DEVELOPMENT AND NEURAL CONTROL OF SPEECH PRODUCTION

Definitions of Speech and Language

In considering how humans and other species communicate, it is clear that certain symbols must be transmitted from one individual to another for meaning to be conveyed. We do not otherwise know what the "sender" means to convey, so we must rely on signals of one kind or another. These may be sound waves, writing or other images, or body movements such as shrugging, making a fist, or, for more detail, sign language. Signals are sent and received through some type of language. A language may be defined as an arbitrary system of signs or symbols used according to prescribed rules to convey meaning within a linguistic community (Kent, 1988). Animals other than humans have meaningful communications, such as wagging of tails or barking and baring of fangs, but these are much broader in meaning (less capable of detail) than human language. We humans also use body language at times, to communicate an overall emotional or affective background to our conversations.

Speech is one mode of using language; it is a system that relates meaning with sound. The fact that humans rely so heavily on speech for communication is probably a consequence of the organization of our nervous systems: humans are uniquely disposed to transforming our experiences symbolically. Since the dawn of social interaction, we have drawn pictures on the walls of caves, and more recently on paper, canvas, or the sides of freeway ramps, to communicate to others how we feel about something. Given the flexible muscular apparatus of the mouth and pharynx, and our propensity toward language, it was only a matter of time until we learned to convert squeaks and grunts into the symbol-rich noises of speech. (Whales and other cetaceans are also capable of producing a wide variety of sounds, leading to speculation that they may communicate on a similarly complex level.)
Humans like to symbolize experience, and speech is one of the principal ways of doing so.

Development of Language and Speech

Language

The rather stereotypical way that language develops in infants also sheds light on the role of the nervous system in making detailed informational sounds. It used to be said that children learn to speak by reinforcement of the babbling sounds they make: when he or she says a word correctly, a parent smiles, and the child files the word away as a "good" sound. We now know this is not correct; children would have to babble for several hundred years to learn enough words in this way. Instead, Chomsky (1959) and others have argued convincingly that children are born with an innate ability to learn and to use language. This really means to use the rules of language, because language is governed by the rules of morphology and syntax, which tell how we arrange sounds into words and sentences. Morphology describes the rules governing changes in word units. A morpheme is the smallest meaningful unit of a word; e.g., "work" is one morpheme, while "worked" is two morphemes. Syntax covers the rules of word order, or how words are put together to form a sentence.

The way in which children learn language is illustrated in the book by Perkins (1971): "When, near his first birthday, he speaks his first meaningful word, (the child) gives evidence of having broken the semantic and phonemic codes. His first word has meaning. It expresses an idea. It might, for example, be 'moo,' which, because it is night and no cows are in sight, can be taken to refer to the moon in the sky....An adult can translate it into mature linguistic form: 'That object up there is the moon.'"

One clue about the way children learn languages comes from the study of "parentese," the type of speech that adults adopt for talking to newborn babies. This is melodious, with drawn-out vowel sounds: "Hellooooo, Baaa-beee. How are yooooooo?" etc. Dr. Kuhl at the U.
of Washington studied how mothers talk to their newborns in America, Sweden, and Russia, and found that they all use the same kind of speech. This teaches the babies how the vowels sound in each language. There are nine vowel sounds in English, five in Russian, and sixteen in Swedish.

We can see that children learn the rules of grammar by the mistakes they make. "He gived it to me" correctly follows the rules of past-tense formation; the speaker has not yet learned that "give" is an irregular verb. A computer will probably never be able to translate English syntax, because it can't learn the difference between "Time flies like an arrow" and "Fruit flies like a banana."

Speech

An approximate timetable for the appearance of different stages of infant utterances is shown in Table I. These times represent average values, and should not be taken as diagnostic of any dysfunction.

Table I. Typical phonological development in the young child
________________________________________________________________
Age             Stage                 Characteristics
________________________________________________________________
0 - 6 months    Vocalization          Sound production in larynx
                Cooing                Production of posterior sounds,
                                      "coo" or "goo"
                Expansion             More complex sounds
6 - 10 months   Canonical babbling    Real syllables, "bababa" or
                                      "dadada"
6 - 18 months   Variegated babbling   Increasingly varied consonants
                                      and vowels
12 months       Appearance of         One-word utterances
                true speech
18 months       "                     Two-word phrases
30 months       "                     More complex utterances
________________________________________________________________

In the phonation stage, sounds are produced in the larynx but there is little resonance or reinforcing vibration. The mouth may be nearly closed, and the resulting sounds have a nasal quality. The cooing stage involves some primitive syllable formation and rounded vowels such as "oo." In the expansion stage the child gains increasing control of the vocal mechanism; sounds such as squeals, clicks, growls, buzzing of the lips, or yells may be produced. Vowels become more fully resonant.
The canonical babbling stage (also known as reduplicated babbling) shows the appearance of true consonants and is easily recognized. Parents say the child is "talking," although no meaningful speech is produced. In variegated babbling, the sounds are repeated less and sound more like speech. True speech, of course, has meaning when it first appears in the child's utterances. One-word utterances such as "Mama" or "ball" are recognizable as speech if they are produced when the child sees the named person or object. The next step, two-word phrases, is a giant act of combination. These may take forms like "Daddy go" or "More cookie," although made-up words may also be used. From here on, sentences become ever more complex, depending on how much the child reads or how little he or she watches television.

Mechanism of Voice Production in the Adult

The vocal organs are the lungs, the trachea, the larynx (containing the vocal cords), the throat (pharynx), the nose, and the mouth. Together, these organs form an intricate "tube" extending from the lungs to the lips. The part of the tube lying above the larynx is called the vocal tract and consists of the pharynx, mouth, and nose. The shape of the vocal tract can be varied extensively by moving the tongue, lips, and other parts of the tract. The source of energy for speech production is the steady stream of air that comes from the lungs as we exhale. The exhaled air is made audible for speech by vibrating the air stream through vocal cord action, a process known as phonation. The vocal cords (or "vocal folds") consist of an epithelial covering and layers of tissue overlying the thyroarytenoid (or vocalis) muscles (Sataloff, 1992). Resonance in the pharynx, nose, and mouth then gives the voice its characteristic overtones. Specific vowel and consonant sounds are articulated by changing the shapes of the resonators and completely or partially obstructing the flow of air.
Thus, the four phases of voice production are known as respiration, phonation, resonance, and articulation.

Respiration

The muscles of respiration - the intercostals, diaphragm, and back and chest muscles - provide the force for moving the airstream and vibrating the vocal folds. Respiration thus controls the loudness of the voice. The patterns of inspiration and expiration during breathing and speech are shown in Figure 3. The lines indicate a measure of lung inflation such as chest diameter. Normal breathing consists of a rapid intake and expulsion of air, as indicated. During speech, the air is inspired rapidly and released at a much slower, almost constant rate until the lung volume has reached a minimum.

Phonation

In order for phonation to occur, the vocal folds must be brought sufficiently close together to touch during their vibration; this is shown in Figure 1, Part A. Different muscles may affect the position or tension of the folds, as shown in Part B. For instance, contraction of the lateral cricoarytenoid muscles rotates the arytenoid cartilages and brings the folds together. The cartilages do not rotate with each vibration, but only to start the speech sound. As we produce a sound, the vocal cords open and close rapidly, chopping the steady airstream into a series of puffs. We hear this rapid sequence of puffs as a buzz. The pitch or frequency of the voice is changed by (1) contraction of the muscles in the vocal folds, increasing their tension, and (2) changing the shape of the folds: the edges may be made thin and pointed or thick and well-rounded. From slow-motion pictures it has been found that the vocal folds actually lengthen as the pitch of singing rises. This is opposite to what might be expected, but the effect of lengthening is offset by the increase in tension. The length of the vocal folds averages 15 mm in men and 11 mm in women, which is one reason men have deeper voices.
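The interplay of length and tension described above can be illustrated with a simple vibrating-string model. This is only an idealization - the vocal folds are not uniform strings, and the tension and mass values below are assumed round numbers for illustration, not physiological measurements - but it shows why a rise in tension can more than offset an increase in length:

```python
import math

def string_frequency(length_m, tension_n, mass_per_m):
    """Fundamental frequency of an ideal vibrating string:
    f = (1 / 2L) * sqrt(T / mu)."""
    return math.sqrt(tension_n / mass_per_m) / (2.0 * length_m)

# Assumed illustrative values, not measured vocal-fold data.
mu = 0.001  # effective mass per unit length, kg/m

f_base = string_frequency(0.015, 0.5, mu)          # 15 mm fold, baseline tension
f_longer = string_frequency(0.018, 0.5, mu)        # lengthening alone LOWERS pitch
f_longer_tenser = string_frequency(0.018, 1.5, mu) # added tension more than compensates

print(f_longer < f_base < f_longer_tenser)  # True
```

In the same model, frequency varies inversely with length, which is consistent with the longer male folds (15 mm vs. 11 mm) producing a deeper voice.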
At puberty the male vocal cords grow to about twice their previous length over a short period of time. This lowers the voice about an octave, although adjusting to the new length results in some cracking and squeaking of the voice.

Resonance

Vowels (open, relatively unobstructed sounds) are made up of a fundamental frequency, determined by the vibration of the vocal folds, and two or more formants, higher-frequency sounds produced by resonance in the pharynx, nose, and mouth. The effect is similar to blowing across the tops of two or more bottles: the frequencies which resonate at the particular lengths of the bottles are reinforced, resulting in fairly clear tones, and the total sound is the sum of the tones produced. By means of a spectrogram, as shown in Figure 5, the different frequencies present in vowel sounds may be visualized directly. Part A shows a spectrogram of the sound [i] (the phonetic representation of the "ee" sound, as in "heat") and Part B of the sound [u], as in "hoot." The darkness of the bands corresponds to the amount of energy at a particular frequency. The left side of Part A indicates three main frequencies present in the [i] sound. These three tones together give a recognizable [i]; the other frequencies are higher harmonics and breathiness. Part B shows that the major sound in [u] is the fundamental, about 300 Hz. The [u] sound is an almost pure tone resulting from the vibration of the vocal folds and some resonance in the mouth. To make an [i] sound, the fundamental is produced with the vocal folds, one formant is produced in the mouth, and another between the teeth and lips. By plotting spectrograms continuously during speaking, it is possible to make voiceprints, or frequency plots of different speakers pronouncing certain words or phrases. These were once thought to be specific to a given person, because of the individuality of the anatomical structures and timing of speech.
(After all, most voices are distinct even over the telephone, which severely limits high and low frequencies.) Later, however, it was realized that different individuals may have identical voiceprints, so they are not useful for legal identification.

Articulation

Vowels are articulated by altering the shape of the vocal tract, by changing tongue position or lip or jaw configuration. One familiar example is the diphthong [ai], as in "high." Diphthongs are a special class of vowels in which there are two or more vowel sounds, and the tongue, jaw, and lips change positions to move from one sound to another. Thus, [ai] is a combination of "a" [a] and "ee" [i]. To go from the first sound to the second, the tongue moves from a low to a high position and the lips retract.

Consonants are sounds produced by partial or complete interruption of the airstream. They may be voiced or voiceless, depending on whether the vocal folds vibrate during the consonant. For instance, the bilabial [p] is voiceless, as in "pit," while the bilabial [b] is voiced, as in "bit." One system of classification of consonants in English is shown in Table II.

Table II. Types of English consonants
________________________________________________________________
Manner of      Consonants                       Mechanism of
articulation                                    production
________________________________________________________________
Nasal          [m], [n], [ng]                   Obstruction of mouth,
                                                nasal passage open
Plosive        [p], [b], [k], [g],              Complete stoppage of air
               [t], [d]
Fricative      [f], [v], [w], [th],             Partial stoppage of air
               [sh], [zh]
Affricate      [ch], [j]                        Partial stoppage, then
                                                rapid release
Lateral        [l]                              Air leaves sides of mouth
Glide          [r], [y]                         Shape of resonators changes
________________________________________________________________

The positions of the oral structures during production of some consonants are shown in Figure 6.
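The classification in Table II can be expressed as a small lookup table. This is just a sketch of the table's own categories, writing the chapter's bracket symbols as plain strings; the voicing set follows the voiced/voiceless distinction described above (English [th] can be either voiced or voiceless and is treated as voiceless here for simplicity):

```python
# Manner of articulation for English consonants, following Table II.
MANNER = {
    "m": "nasal", "n": "nasal", "ng": "nasal",
    "p": "plosive", "b": "plosive", "k": "plosive",
    "g": "plosive", "t": "plosive", "d": "plosive",
    "f": "fricative", "v": "fricative", "w": "fricative",
    "th": "fricative", "sh": "fricative", "zh": "fricative",
    "ch": "affricate", "j": "affricate",
    "l": "lateral",
    "r": "glide", "y": "glide",
}

# Consonants during which the vocal folds vibrate.
VOICED = {"b", "d", "g", "v", "zh", "j",
          "m", "n", "ng", "l", "r", "w", "y"}

def describe(consonant):
    """Return a description such as 'voiceless plosive' for 'p'."""
    voicing = "voiced" if consonant in VOICED else "voiceless"
    return f"{voicing} {MANNER[consonant]}"

print(describe("p"))  # voiceless plosive, as in "pit"
print(describe("b"))  # voiced plosive, as in "bit"
```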
The velar consonants [k] and [g] (voiceless and voiced, respectively) are made by raising the posterior tongue against the palate, thereby blocking the air stream, and then suddenly releasing it. The bilabial [m] is made with the lips sealed and the palate lowered, creating continuity of the oral and nasal cavities, followed by opening the lips and suddenly releasing the air stream. [p] and [b] are also bilabials, but with the palate raised, sealing off the nasal cavity. The fricatives [f] and [v] are made by placing the upper incisors against the lower lip, with continuous but constricted airflow, giving them a turbulent or hissing sound. [t] and [d] are produced with the tip of the tongue just behind the upper alveolar ridge. The [th] sound is made with the tongue between the teeth, the [s] with a small space between the tongue tip and the alveolar ridge.

Articulation of consonants (and vowels) takes place continuously during speech. It is remarkable that both speakers and listeners learn to distinguish so many sounds during the short period of time required to say something, and to extract the significant meaning from the rapidly changing sounds.

Neural Mechanisms of Speech Production

Innervation of laryngeal muscles

The innervation of the laryngeal muscles is shown in Table III:

________________________________________________________________
Muscle                     Innervation   Action
________________________________________________________________
Thyroarytenoid             X             Relaxes vocal fold
Lateral cricoarytenoid     X             Brings vocal folds together
Transverse arytenoid       X             Closes inlet of larynx by
                                         bringing arytenoid cartilages
                                         together
Posterior cricoarytenoid   X             Separates vocal folds
________________________________________________________________

Efferent pathways for speech

Since so many of the same muscles are used for speech production as for mastication and swallowing, it is natural that a large amount of overlap exists in the motor pathways for these different behaviors.
The pyramidal neurons for speech are also located in the facial area of the motor cortex, but other cortical areas are important for speech that are not used for mastication or swallowing: (1) Wernicke's area, located in the topmost gyrus of the temporal lobe, has important associative functions in creating and recognizing speech sounds. Damage to this area, as from a stroke or trauma, may result in Wernicke's aphasia, which includes an inability to comprehend or remember words (dysnomia). This can lead to so-called "empty speech," in which the structure is grammatically correct but lacks content and meaning (Lezac, 1976). Auditory comprehension in patients with Wernicke's aphasia is poor, and is considered to be poorer than their speech production. (2) Broca's area, the lower portion of the anterior motor association area in the frontal lobe, regulates the motor organization and patterning of speech. Patients with lesions in this area cannot organize the muscles of speech to form sounds or pattern groups of sounds into words. They may thus be dysfluent, although their ability to comprehend language is unimpaired.

With modern imaging techniques, it is possible to record regional blood flow, an indicator of increased neuronal activity, in different areas of the brain during various language tasks (Posner and Raichle, 1994). This is shown in Figure 8. Passively viewing words elicits activity in the visual cortex. Listening to words generates activity in Broca's and Wernicke's areas, as does generating verbs. However, speaking words of many kinds also excites activity in parietal areas. Thus, language processing apparently involves cortical regions beyond the classical language areas.

Several other neuromuscular conditions may interfere with normal fluent speech, including the dysarthrias, or types of weakness, paralysis, or incoordination. Some diseases with associated dysarthrias are parkinsonism, amyotrophic lateral sclerosis, and cerebellar disease.
Such conditions may of course disrupt the normal patterns of mastication and swallowing as well. The variety of sounds which can be made with the oral apparatus indicates the complex sensorimotor integration which occurs in speech. Much of this occurs at the level of the medulla, and involves the motor nuclei of cranial nerves V, VII, and X and the nucleus ambiguus. Sensory feedback about the position of the oral structures is provided to the speaker by oral mechanoreceptors. The assessment of oral sensory function often includes a test for oral stereognosis, in which the shapes of small objects placed in the mouth are matched against drawings of the different objects. Feedback about the quality of the sound produced is obtained from auditory receptors. Both oral sensory function and hearing are vital to the development of high-quality speech. In this way, speech production is similar to mastication and swallowing, both of which depend on sensory feedback.

Effects of Dental Conditions on Speech

Although several types of malocclusions, traumas, or developmental defects may affect the quality of speech, it is remarkable to what extent the patient is able to compensate for a particular defect (Lawson and Bond, 1968). Some of the more obvious conditions and their effects on speech are shown in Table IV.

Table IV. Effects of dental and orofacial defects on speech
________________________________________________________________
Condition              Speech sound(s) affected
________________________________________________________________
Anterior open bite     [f], [v]
Recessive mandible     [p], [b], [m]
Prognathism            [f], [v]
Cleft palate           nasal speech
________________________________________________________________

The condition of open bite, if extreme, can prevent the upper teeth from touching the lower lip, interfering with the fricatives [f] and [v]. A recessive mandible may not allow the correct apposition of the upper and lower lips to produce the bilabials [p], [b], and [m].
Prognathism, or a jutting jaw, may likewise not permit the upper teeth to contact the lower lip. In the condition of cleft palate, the bones of the hard palate may fail to grow together or to join with the central portion of the nose. This leads to a permanent opening between the nasal cavity and the oral cavity, giving most speech sounds a strong nasal quality. However, cleft palate is repairable with surgery or with an appliance worn over the opening, and many other defects can be compensated for. For instance, a good [f] or [v] can be made by placing the lower teeth against the upper lip. Nevertheless, the practice of modern dentistry includes an awareness of the effects of restorations and prostheses on patient speech and comfort.