Auditory Physiology

The primary aim of this chapter will be to review the physiological mechanisms that are involved in two basic and extraordinarily important functions of the auditory system: (1) conversion of the vibratory energy that reaches the ear drum into a series of neural impulses on the auditory nerve (this is called transduction), and (2) the spectrum analysis function of the auditory system; that is, the ability of the auditory system to break a complex sound wave into its individual frequency components.

Before getting into the details, it might be useful to consider some of the fundamental capabilities of the auditory system which, from any point of view, are nothing short of awe inspiring. A brief and by no means exhaustive list appears below.

- The faintest sound that can be detected by the human ear is so weak that it moves the ear drum a distance that is equivalent to one-tenth the diameter of a hydrogen molecule. If the ear were slightly more sensitive we would hear the random particle oscillations known as Brownian motion.
- The most intense sound that can be heard without causing pain is approximately 140 dB more intense than a barely detectable sound. This means that the dynamic range of the ear – the ratio of the most intense sound that can be heard without pain to the intensity of a barely audible sound – is an astounding 100 trillion to 1.
- The frequency range of human hearing runs from approximately 20 Hz to 20,000 Hz, a range of about 10 octaves.
- For signal levels approximating conversational speech, the ear can detect frequency differences that are on the order of 0.1%, or approximately 1 Hz for a 1,000 Hz test signal (Wier, Jesteadt, & Green, 1977). Later in the chapter we will see that the auditory system utilizes an elegant mechanism that delivers sounds of different frequencies to different physical locations along the cochlea; i.e., a sound of one frequency will produce the greatest neural activity at one physical location while a sound of a slightly different frequency will activate a different location. The difference in frequency that a listener can barely detect corresponds to a difference in physical location along the cochlea of about 10 microns (1 micron = one millionth of a meter, or one thousandth of a millimeter). This distance, in turn, is approximately the width of a single auditory receptor cell (Davis and Silverman, 1970).
- Under ideal conditions listeners can detect intensity differences as small as 0.6 dB (Gulick, Gescheider, and Frisina, 1989).
- Listeners can locate the source of a sound based on differences in the time of arrival between the two ears that are as small as 10 μs (i.e., 10 millionths of a second).

Further, the anatomy that supports this processing is a miracle of miniaturization. For example, the middle ear cavity is approximately 3 mm in width and approximately 15 mm in the vertical dimension (Zemlin, 1968), with roughly the volume of a sugar cube. The cochlea, which contains the auditory receptors, is even smaller, at approximately 5 mm in height and approximately 9 mm in diameter at its widest point (Gelfand, 1990).

Figure 4-1. The three functional subdivisions of the auditory system. Reprinted from Deutsch and Richards (1979).

Overview of the Auditory System

The auditory system can be divided into three major functional subsystems: the conductive mechanism, the sensorineural mechanism, and the central auditory system (see Figure 4-1).
In terms of anatomical structures, the conductive mechanism consists of the pinna, the ear canal (also known as the external auditory meatus), the ear drum (also known as the tympanic membrane), and the middle ear, which contains three very small bones called the auditory ossicles. The primary function of the conductive mechanism is to transmit the vibrations that are picked up at the tympanic membrane to the fluid-filled inner ear, which contains the auditory receptors. However, as we shall see, the middle ear also accomplishes a pressure amplification trick which significantly enhances the sensitivity of the ear.

The sensorineural mechanism consists of the structures of the cochlea and the auditory nerve, also known as the 8th cranial nerve. The auditory nerve conveys neural impulses between the cochlea and the brain stem, which is part of the central auditory system. The inner ear contains specialized sensory receptor cells called hair cells. These cells are responsible for converting the vibratory energy that enters the auditory system into nerve impulses that are transmitted to the central nervous system via the auditory nerve. In addition to the conversion of vibratory energy into neural impulses, the cochlea also carries out a spectrum analysis in which the low-frequency components of the signal are directed to one end of the cochlea and the high-frequency components are directed to the other end. As will be seen later in this chapter, the precise role that is played by this frequency analysis is only partially understood.

The electrical signals that are generated by the hair cells in the inner ear are carried by the auditory nerve to the central auditory system, which consists of structures in the brain stem and auditory cortex. It is often said that the central auditory system is responsible for higher level functions of auditory analysis, such as the "... recognition, interpretation, and integration of auditory information ..." (Deutsch & Richards, 1979). There is little question that the central auditory system is, in fact, heavily involved in higher level functions such as speech recognition and the ability to recognize familiar voices and familiar melodies. However, the central auditory system also plays a very important role in relatively low-level aspects of auditory analysis, such as sound localization, pitch perception and, quite possibly, spectrum analysis.

The Conductive Mechanism

The Outer Ear

The outermost portion of the conductive mechanism is a cartilaginous structure called the pinna, also known as the auricle (see Figure 4-2). While the roughly funnel-like shape of the auricle might lead one to believe that the structure plays some role in sound gathering, this appears not to be the case (von Bekesy & Rosenblith, 1958). A prominent visual characteristic of the auricle is its rather convoluted shape, consisting of a number of ridges, grooves, and depressions. It appears that this complex topography, along with other factors, plays some role in sound localization (von Bekesy & Rosenblith, 1958; Batteau, 1967; Freedman & Fisher, 1968).

Sound is conducted to the tympanic membrane through the external auditory meatus, also known as the ear canal. The lateral two-thirds of the ear canal is cartilaginous and the medial third is bone. The general shape of the ear canal approximates that of a uniform tube, open at the lateral end and closed medially by the tympanic membrane.
The tube averages approximately 2.3 cm in length (Wiener & Ross, 1946). Recall that the resonant frequency pattern of a uniform tube which is closed at one end (by the ear drum in this case) can be determined if its length is known. Using the formula from Chapter 3, the lowest resonant frequency of the ear canal should be approximately 3800 Hz (F1 = 35,000/(4 · 2.3) = 35,000/9.2 = 3804 Hz). This figure agrees well with experimental data (Wiener & Ross, 1946; Fleming, 1939), although estimates vary. This resonance is partially responsible for the heightened sensitivity of the auditory system to frequencies in the middle portion of the spectrum, which will be discussed later.

Figure 4-2. The pinna or auricle. (Reprinted from Zemlin, 1968)

The sound wave that enters the ear canal sets the tympanic membrane into vibration. When instantaneous air pressure is relatively high (compression), the membrane will be forced inward, and when instantaneous air pressure is relatively low (rarefaction), the membrane will be forced outward. Consequently, the inward and outward movements of the tympanic membrane mirror those of the sound wave that is driving it; for example, if the tympanic membrane is excited by a 500 Hz sinusoid, the tympanic membrane will move inward and outward sinusoidally at 500 Hz. In general, the instant-to-instant displacements of the tympanic membrane will mirror the instantaneous air pressure waveform that is driving the membrane.

Figure 4-3. The ear canal and middle ear cavity. Reprinted from Denes and Pinson, The Speech Chain, 1993, W.H. Freeman & Co.

The Middle Ear

The middle ear or tympanic cavity is an air-filled chamber whose volume approximates that of a sugar cube (see Figure 4-3). The middle ear communicates with the nasopharynx via the Eustachian tube. This tube is approximately 35 mm in length in adults and angles downward and forward to connect the anterior wall of the tympanic cavity with the nasopharynx. The tube is normally closed, but opens during yawning and swallowing. When the tube opens, air can travel either into or out of the middle ear to create an equilibrium between the air pressure inside the tympanic cavity and that of the outside air. The Eustachian tube also plays an important role in allowing fluids to drain from the middle ear into the nasopharynx.

In terms of the broad overview presented here, the most important structures in the tympanic cavity are the three ossicles, a series of very small bones referred to collectively as the ossicular chain (see Figure 4-4). The largest of the ossicles is the malleus, which attaches directly to the tympanic membrane. The head of the malleus articulates with the incus, which in turn connects to a very small stirrup-shaped bone called the stapes. The stapes ends in an oval plate called the footplate. The stapes footplate attaches to an opening into the labyrinth called the oval window. The labyrinth is a fluid-filled structure that contains the cochlea and the vestibular system, which is responsible for our sense of balance. The stapes footplate is attached to the oval window via a circular ligament called the annular ligament. Directly below the oval window is a second opening into the labyrinth called the round window. The round window is covered by a very small membrane called the internal tympanic membrane.

A reasonable question to ask about the auditory system is why we have a tympanic membrane and ossicular chain at all.
Since a primary effect of these structures is to transmit vibrations to the fluid-filled structures of the inner ear, then why isn't the oval window simply covered with a flexible membrane that is driven directly by the sound wave? Aquatic animals, in fact, make use of a "direct-drive" system with no middle ear. A system of this kind would work in land animals as well, but for reasons that are explained below, a substantial loss of energy would result.

Figure 4-4. The auditory ossicles. (From Yost and Nielson, 1985).

The key to understanding the role that is played by the tympanic membrane and ossicular chain is to appreciate the energy loss that occurs when a sound wave is transmitted from the air medium in which the sound is initially generated to the fluid medium that exists inside the inner ear. We know from everyday experience that we do not hear airborne sound very well when we are underwater. The primary reason for this is that there exists an impedance mismatch between the air medium in which the airborne sound is initially generated and the fluid medium into which the vibratory disturbance must be transmitted in order for our underwater listener to hear the sound. Impedance is the total opposition to the flow of energy,1 and the mismatch results from the fact that air is a low-impedance medium while water (and other similar fluids) is a high-impedance medium. These differences in impedance can be demonstrated simply by running a cupped hand through air and water.

1 Impedance consists of three distinct components: resistance, capacitive reactance (also known as compliant reactance), and mass reactance (also known as inductive reactance or inertive reactance). Resistance is simply the dissipation of energy due to friction. When the head of a thumb tack is rubbed back and forth on the surface of a table, the tack heats up because of the friction of the two surfaces. Capacitive reactance is opposition that is offered due to the elastic properties of an object. For example, when you push against a spring, compressing it beyond its resting state, the spring generates a force that opposes the applied force. The same kind of opposition to an applied force occurs when a spring is stretched beyond its resting state. Mass reactance is opposition due to the inertial properties of objects; that is, the tendency of a resting object to remain at rest, and the tendency of a moving object to remain in motion. Impedance is the vector sum of resistance, capacitive reactance, and mass reactance, with vector sum simply indicating that these three quantities need to be added using the Pythagorean theorem.

There is a general rule that states that energy is reflected back toward the source when a signal reaches the boundary between two media whose impedances do not match. In the case of air and fluid, the impedance mismatch is quite large, and when the signal reaches the air-fluid boundary, only 1/1,000th of the energy is absorbed into the fluid medium, with the remainder being reflected back toward the source. Represented on a decibel scale, the loss of signal intensity is 30 dB. In the formula below, the signal intensity on the airborne side of the air-fluid boundary serves as the reference intensity, and the signal intensity on the fluid side of the boundary serves as the measured intensity.

dB = 10 log10 (Im/Ir) = 10 log10 (1/1,000) = 10 (-3) = -30 dB

The negative sign here simply means that the signal will be 30 dB weaker on the fluid side of the boundary.
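Both back-of-the-envelope figures derived so far – the roughly 3800 Hz ear canal resonance and the roughly 30 dB air-to-fluid transmission loss – are easy to reproduce. The short sketch below is illustrative only; the function names are ours, and the constants are simply the values quoted in the text.

```python
import math

def quarter_wave_resonance(length_cm, speed_of_sound_cm_s=35_000):
    """Lowest resonance of a uniform tube closed at one end: F1 = c / (4 * L)."""
    return speed_of_sound_cm_s / (4 * length_cm)

def intensity_ratio_in_db(i_measured, i_reference):
    """Intensity ratio expressed in decibels: dB = 10 * log10(Im / Ir)."""
    return 10 * math.log10(i_measured / i_reference)

# Ear canal: a 2.3 cm tube closed at the medial end by the tympanic membrane.
print(f"Ear canal F1 ~= {quarter_wave_resonance(2.3):.0f} Hz")            # ~3804 Hz

# Air-to-fluid boundary: only about 1/1,000 of the incident energy is absorbed.
print(f"Transmission loss ~= {intensity_ratio_in_db(1, 1_000):.0f} dB")   # -30 dB
```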
Consequently, if the airborne sound wave were to directly drive a simple membrane covering the oval window, a 30 dB loss in signal intensity would occur at the air-fluid boundary. This is not a minor loss of energy. As we will see in the chapter on auditory perception, a 10 dB decrease in intensity corresponds to a decrease of approximately one-half in our subjective impression of loudness. This means that a 50 dB signal, for example, sounds only one-eighth as loud as an 80 dB signal.

One of the primary functions of the middle ear is to amplify pressure so as to overcome a large portion of this energy loss. This is accomplished in two ways: (1) an increase in pressure that occurs when the vibrations that are picked up on the relatively large surface area of the tympanic membrane are focused on the very small surface area of the stapes footplate, and (2) an increase in force (and therefore pressure as well) that occurs as a result of the mechanical lever action of the ossicular chain. The "area trick," known as the condensation effect, is by far the more important of the two effects.

Recall from Chapter 2 that there is an important distinction between force and pressure: force is the amount of push or pull on an object, and is the product of mass and acceleration; pressure, on the other hand, is force per unit area. A major implication of this relationship is that pressure can be amplified without a change in force simply by decreasing the area over which the force is delivered. This is the design principle underlying thumb tacks and knives with sharp cutting edges, and exactly this principle is at work in the middle ear as the energy that is delivered to the relatively large area of the tympanic membrane is focused on the very small area at the stapes footplate. The amount of pressure amplification that results from this concentration of force is proportional to the ratio of the two areas that are involved. The effective area of the tympanic membrane is approximately 0.594 cm², while the area of the stapes footplate is approximately 0.032 cm² (Durrant & Lovrinic, 1984). Consequently, pressure at the stapes footplate will be approximately 18.6 times greater than pressure at the tympanic membrane (0.594/0.032 = 18.6).

This pressure amplification can be represented on a decibel scale. Since we are talking about an increase in pressure, the pressure version of the decibel formula is needed:

dB = 20 log10 (0.594/0.032) = 20 log10 (18.6) = 20 (1.27) = 25.4 dB

Consequently, of the 30 dB that would be lost at the air-fluid boundary, the condensation effect makes up for roughly 25 dB.

Figure 4-5. The mechanical lever advantage of the ossicular chain. Adapted from Denes and Pinson, The Speech Chain, 1993, W.H. Freeman & Co.

A small amount of additional amplification results from the lever action of the ossicular chain. The basic idea is that the ossicular chain is suspended by ligaments in such a way as to form a lever system, with the fulcrum on the body of the incus. One arm of the lever system consists of the malleus while the other arm consists of the incus (see Figure 4-5). The malleus lever arm is approximately 30% longer than the incus lever arm, producing a lever ratio of 1.3:1. Since the force amplification that occurs in any lever system is proportional to the ratio of the lengths of the two lever arms, force will be amplified by a factor of 1.3.
Pressure is the force per unit area, so this increase in force means that pressure will also be amplified by a factor of 1.3. Represented on a decibel scale, this amounts to:

dB = 20 log10 (1.3) = 20 (0.11) = 2.3 dB

(Notice that the pressure version of the decibel formula is being used here rather than the intensity version. That is because the lever advantage produces an increase in force and, therefore, pressure.) If this 2.3 dB pressure amplification is added to the 25.4 dB that is produced by the condensation effect, we find that the combined action of the middle ear system results in a pressure amplification of 25.4 + 2.3 = 27.7 dB, nearly all of the 30 dB that would otherwise be lost at the air-fluid boundary.

The Sensorineural Mechanism

The two major auditory structures of the sensorineural mechanism are the cochlea and the auditory nerve. The cochlea is one portion of a larger structure called the labyrinth. As noted earlier, the labyrinth contains both the cochlea (the organ of hearing) and the vestibular system (the organ of balance). The three major divisions of the labyrinth are shown in Figure 4-6. The snail-shaped portion of the labyrinth is the cochlea, which contains the hair cells and many other structures that are important for hearing. The upper portion of the labyrinth contains three structures called the semicircular canals, which are part of the vestibular system. The middle portion of the labyrinth is called the vestibule. The oval window and round window are openings into the vestibule.

Figure 4-6. The labyrinth. From Zemlin (1968).

Figure 4-7. The bony labyrinth (panel a) and the membranous labyrinth (the unshaded portion of panel b). Reprinted from Minifie, Hixon, and Williams (1973).

The portion of the labyrinth that is shown in panel a of Figure 4-7 is a hollowed-out and fluid-filled bony shell called the bony or osseous labyrinth. Fully contained within the bony labyrinth is a fluid-filled structure called the membranous labyrinth, which can be thought of as something like a convoluted water balloon that fits inside the bony labyrinth (see panel b of Figure 4-7). The fluid that courses through the membranous labyrinth is called endolymph and the fluid outside the membranous labyrinth is called perilymph. Two bulges in the membranous labyrinth called the utricle and saccule are part of the vestibular system. The portion of the membranous labyrinth that is contained within the cochlea is called the cochlear duct.

The end of the cochlea that is closest to the vestibule is called the base or basal end, and the end that is furthest from the vestibule is called the apex or apical end. The cochlea is divided into three canals or scalae: the scala vestibuli, which lies above the cochlear duct, the scala tympani, which lies below the cochlear duct, and the cochlear duct itself, which is also known as the scala media. The three canals are shown in highly schematic form in a partially unrolled cochlea in Figure 4-8. A small gap at the apical end of the cochlea called the helicotrema allows the perilymph in the scala vestibuli and the scala tympani to communicate.

Anatomy of the Cochlea

Some of the views that are shown of the cochlea can be a bit difficult to interpret simply because of the coiled shape. Since the coiling is strictly a space-saving feature that has essentially no effect on cochlear physiology, the cochlea is often shown in unrolled form. Panels a and b of Figure 4-9 show views that result from two kinds of cuts through an unrolled cochlea.
Panel c shows a highly schematic picture of what the view would look like if a cut were made through the cochlea in its coiled-up form. A more detailed picture of the view in panel c can be seen in Figure 4-10. Shown in this figure are the basal, middle, and apical turns of the cochlea, which are wrapped around a bony core called the modiolus. We can imagine building a structure similar to the cochlear portion of the labyrinth by coiling a length of garden hose approximately 2 3/4 turns around wet plaster, and then allowing the plaster to dry. The plaster is analogous to the modiolus, and the garden hose is the cochlea. Entering through a tunnel in the modiolus is the cochlear branch of the auditory nerve. The vestibular branch of the 8th cranial nerve, which is not shown in this figure, innervates sensory receptors in the vestibular system.

Figure 4-8. Schematic of a partially unrolled cochlea showing the scala vestibuli, the scala media, and the scala tympani. Adapted from Zemlin (1968).

Figure 4-9. Cuts through an unrolled (panels a and b) and rolled (panel c) cochlea. Reprinted from Deutsch and Richards (1979).

Figure 4-10. The cochlea, modiolus, and auditory nerve. Reprinted from Deutsch and Richards (1979).

Figure 4-11. A cross-section of the cochlea. Reprinted from Stevens (1951).

Figure 4-11 shows a detailed view of a single cross-section through the cochlea, corresponding to the cut shown in panel b of Figure 4-9. Fibers from the auditory nerve enter the cochlear duct through a tunnel in a thin shelf of bone called the spiral lamina. The opening in the spiral lamina through which the auditory nerve fibers enter is called the habenula perforata. The spiral lamina is covered with a layer of fibrous tissue called the limbus. As Figure 4-10 shows, the cochlear duct is a triangular-shaped partition of the cochlea that is formed by two membranes: Reissner's membrane, which separates the cochlear duct from the scala vestibuli, and the basilar membrane, which separates the cochlear duct from the scala tympani. The basilar membrane is held in place by the spiral ligament. Covering the spiral ligament in the scala media is a layer of endolymph-secreting vascular tissue called the stria vascularis.

The set of structures that are resting on the basilar membrane are referred to collectively as the organ of Corti. A more detailed look at the organ of Corti is shown in Figure 4-12. Emerging from the limbus and lying immediately above the hair cells is a gelatinous membrane called the tectorial membrane. The hair cells are arranged in rows consisting of a single inner hair cell (IHC) and either three or four outer hair cells (OHC), with three OHCs being more common. The hair-like structures emerging from the tops of the hair cells are called cilia. The structure and function of hair cells will be discussed later in this chapter. The human cochlea contains approximately 3,000-3,500 arrangements such as those shown in Figure 4-12, consisting of approximately 3,000-3,500 IHCs and approximately 10,000-12,000 OHCs. In later discussions we will refer to this unit consisting of one IHC and three or four OHCs as a channel. The hair cells are innervated by approximately 30,000 auditory nerve fibers (Spoendlin, 1989), which connect to the base of the hair cells. The overwhelming majority (~98%) of these auditory nerve fibers are afferent, i.e., conveying neural impulses away from the hair cells in the direction of the central nervous system.
In turn, the overwhelming majority (~95%) of these afferent fibers are connected to the IHCs as opposed to the OHCs, meaning that it is almost exclusively the IHCs that are responsible for conveying sensory information to the central nervous system (Spoendlin, 1974). A rough estimate is that there are approximately 10 auditory nerve fibers connected to each IHC. Individual auditory nerve fibers that innervate IHCs typically supply a single IHC, rather than sprouting branches to many other IHCs. Exactly the opposite is true of fibers that innervate OHCs, where a single nerve fiber branches to innervate many OHCs. These differences in innervation patterns are shown in Figure 4-13.

As will be seen below, the hair cells generate an electrical signal in response to the traveling wave motion of the basilar membrane. These electrical disturbances in turn cause depolarization of the auditory nerve fibers that are attached at the base of the hair cells. The physiology of nerve fibers and the precise meaning of the term depolarization will be explained later in this chapter. For the time being, it is necessary only to understand that the ultimate result of the action of the organ of Corti is the generation of an electrical spike on the auditory nerve fibers that innervate the hair cells. The nature of the basilar membrane traveling wave and the mechanisms that are thought to be involved in the generation of the electrical disturbance in the hair cells will be described below.

Figure 4-12. A detailed view of the organ of Corti. Reprinted from Stevens (1951).

Figure 4-13. Innervation patterns for inner and outer hair cells. Note that a single IHC is typically innervated by many nerve fibers, while individual nerve fibers innervating OHCs typically branch to supply several receptor cells. After Spoendlin (1979). [check this citation]

Figure 4-14. The cochlear duct is formed by two membranes: Reissner's membrane above, and the basilar membrane below. In this simplified drawing the duct is represented as a single structure called the cochlear partition. Inward movement of the stapes causes a downward deflection of the cochlear partition, and the fluid pressure is resolved by an outward deformation at the round window. Outward motion of the stapes has the opposite effect: the partition is deflected upward, and the fluid pressure is resolved by an inward deformation at the round window. Based on von Bekesy (1960) and reprinted from Durrant and Lovrinic (1984).

Figure 4-15. The basilar membrane varies continuously in stiffness from base to apex. The greater stiffness of the membrane at the base makes the basal end respond better to high frequencies than low frequencies, while the opposite is true of the apical end. After von Bekesy (1960), Rhode (1973), and Durrant & Lovrinic (1984).

The Traveling Wave

Figure 4-14 shows a simplified view of an uncoiled cochlea. The cochlear duct in this figure has been greatly simplified and is represented as though it were a single membrane, attached on either side and running from the base to the helicotrema. We know, of course, that the cochlear duct is formed by two membranes: Reissner's membrane above and the basilar membrane below. These two membranes are often referred to collectively as the cochlear partition, and for the purposes of understanding the movement dynamics of this system we can consider the partition as consisting of just a single membrane.
Further, it is primarily the mechanical properties of the basilar membrane that control the most important movement characteristics of the cochlear partition. The single most important fact about the basilar membrane is that its stiffness varies systematically from the base to the apex (see Figure 4-15). Specifically, the basilar membrane is stiffer at the base than at the apex. Recall from our discussion of spring-and-mass systems in Chapter 3 that natural vibrating frequency increases as stiffness increases. This means that the basal end of the basilar membrane, which is stiff, will respond best to high frequencies, and the apical end, which is less stiff, will respond best to low frequencies.

Figure 4-16. The basilar membrane traveling wave. Panel a shows a sequence of snapshots of the traveling wave (reprinted from Ranke, 1942). Panel b shows a single snapshot of the traveling wave (reprinted from Tonndorf, 1960).

When an acoustic stimulus is delivered to the ear, the vibratory pattern is picked up at the tympanic membrane and transmitted through the ossicles, resulting in inward and outward movements at the stapes footplate. As shown in Figure 4-14, the inward pressure of the stapes on the incompressible cochlear fluid causes the cochlear partition to deflect in the direction of the scala tympani. This fluid pressure is resolved by an outward deflection of the internal tympanic membrane, which covers the round window. Similarly, during a rarefaction phase the outward motion of the stapes will cause the cochlear partition to deflect in the direction of the scala vestibuli, pulling the internal tympanic membrane inward. These upward and downward deflections of the cochlear partition result in the generation of a displacement pattern with a highly specific shape called a traveling wave.

The basilar membrane traveling wave was discovered in a series of ingenious experiments by Georg von Bekesy, which earned him the Nobel Prize in Physiology or Medicine. As shown in Figure 4-16, the traveling wave moves from the base toward the apex, with the amplitude rising rather gradually to a peak and then decaying rather abruptly. Panel a shows what a sequence of snapshots of the traveling wave might look like. The smooth curve in panel a is the envelope or amplitude envelope of the traveling wave. A more detailed view of a single snapshot of the traveling wave pattern is shown in panel b of Figure 4-16.

As shown in Figure 4-15, the point along the basilar membrane where the traveling wave reaches its maximum amplitude will be strongly affected by the frequency of the input signal: high frequency signals will reach peak amplitude near the basal end of the cochlea, where the basilar membrane is stiffer, while low frequency signals will reach peak amplitude near the apical end of the cochlea, where the basilar membrane is less stiff. What this means is that low-frequency signals will be directed toward the apical end of the basilar membrane, high-frequency signals will be directed toward the basal end of the basilar membrane, and mid-frequency signals will be directed toward an appropriate place in the middle of the basilar membrane.
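This qualitative statement, low frequencies toward the apex and high frequencies toward the base, can be made concrete with a standard place-frequency approximation. The sketch below uses the commonly cited Greenwood map for the human cochlea; that function is not part of this chapter, and the constants are given here only to illustrate how place might relate to frequency.

```python
def greenwood_frequency(position_from_apex, a_hz=165.4, alpha=2.1, k=0.88):
    """Approximate best frequency (Hz) at a point on the basilar membrane.

    position_from_apex: 0.0 at the apical end, 1.0 at the basal end.
    Constants are commonly cited human values for the Greenwood place-frequency map.
    """
    return a_hz * (10 ** (alpha * position_from_apex) - k)

# Apex responds best to low frequencies, base to high frequencies.
for x in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"{x:4.2f} of the way from apex to base -> ~{greenwood_frequency(x):8.0f} Hz")
```

Running the sketch gives best frequencies from roughly 20 Hz at the apex to roughly 20,000 Hz at the base, consistent with the frequency range of human hearing cited at the beginning of this chapter; away from the apical end the mapping is roughly logarithmic, so equal distances along the membrane correspond to roughly equal numbers of octaves.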
As will be seen below, this frequency-dependent behavior of the basilar membrane traveling wave will be reflected in the pattern of 8th-nerve electrical activity; that is, for low-frequency signals, 8th-nerve electrical activity will be greatest for fibers connected to hair cells at the apical end of the cochlea, while for high-frequency signals, 8th-nerve electrical activity will be greatest for fibers connected to hair cells at the basal end of the cochlea. This relationship between the frequency of the input signal and the point of maximum basilar membrane motion is one of the most important properties of cochlear analysis, and it is the key to understanding what has been called the place theory or the rate-place model of auditory spectrum analysis, which will be discussed later in this chapter.

Figure 4-17. The upward and downward movement of the basilar membrane produces a shearing force on the hair cell cilia, causing them to pivot at their base. Reprinted from Ryan and Davis (1976).

There is one additional fact about basilar membrane motion that is necessary for understanding how the sensorineural system converts vibration into neural impulses. As shown in Figure 4-17, the upward and downward movement of the basilar membrane produces something called a shearing force on the hair cell cilia that results in the side-to-side movement of the cilia. In other words, as the basilar membrane vibrates up and down, the cilia are alternately forced away from and then toward the spiral lamina.2 As explained below, it is this movement of the cilia that produces excitation of the hair cells which, in turn, results in the depolarization of the auditory nerve fibers that innervate the hair cells.

2 The mechanism of cilia motion that is shown in Figure 4-17 is the most widely accepted view. As simple and intuitive as this model might seem, it may well be incorrect. The motion pattern shown in Figure 4-17 will have to suffice for the relatively cursory review presented here. For our purposes the important point is that movement of the hair cell cilia occurs as a direct result of the movement of the basilar membrane. This much is well established, although the detailed mechanism is not well understood. See Gelfand (1990) and Zwislocki (1984, 1985, 1986) for a discussion of the potential problems with the view presented in Figure 4-17.

Hair-Cell Transduction

All sensory receptors are examples of a general class of device called transducers. In all cases the function of a transducer is to convert energy of one form into energy of a different form. Common examples of transducers include microphones, which convert acoustic energy into electrical energy, and loudspeakers, which perform precisely the opposite type of transduction, converting the electrical energy coming from an amplifier into acoustic energy. In the case of sensory receptors, the job is to convert stimulation of various sorts into an electrochemical code consisting of a sequence of neural impulses. In the visual system the incoming optical stimulus is converted by the rods and cones of the retina into a series of neural impulses on the optic nerve that are interpreted by the brain as a visual image.
In the auditory system, the type of transduction that takes place involves the conversion of the mechanical vibration that reaches the basilar membrane into a series of impulses on the auditory nerve. While there are many aspects of auditory transduction that remain poorly understood, there is complete agreement that the site of transduction is the hair cell, which generates an electrical potential that stimulates impulses in the 8th-nerve fibers that are connected to its base. The chain of events, which will be described below, includes the following: (1) the vibration of the basilar membrane causes the hair-cell cilia to bend at their base, (2) this "shearing" of the cilia results in the flow of electrical current through the hair cell that is called the receptor current, (3) the receptor current stimulates the release by the hair cell of neurotransmitter chemicals, and (4) uptake of this neurotransmitter substance by dendrites in an 8th-nerve fiber connected to the base of the hair cell stimulates an all-or-none action potential in the nerve fiber.

A schematic drawing of a single hair cell is shown in Figure 4-18. In close proximity to the base of the hair cell are auditory nerve fibers.3 The cilia projecting from the top of the hair cell serve a crucial function in transduction. Note that the cilia are arranged according to height, with the shortest cilia being closest to the spiral lamina. This feature can be seen clearly in the electron micrograph shown in Figure 4-19. Also shown in the schematic drawing in Figure 4-18 is a series of very thin filaments called transduction links that serve to attach the adjacent cilia of differing heights. The cilia themselves are exceedingly stiff and are effectively hinged at their base. As a result, the application of a force to the hair bundle causes the cilia to pivot at their base rather than bow. Figure 4-21 shows the position of the cilia at rest (panel a), and after the application of a force either in the direction of the tall cilia (panel b), or in the direction of the short cilia (panel c). For reasons that are explained below, it is the cilia motion pattern shown in panel b, in response to the movement of the basilar membrane, that is ultimately responsible for the stimulation of the 8th-nerve fiber.

Figure 4-18. A hair cell. Note that the cilia are arranged according to height and interconnected by fine filaments called transduction links. Adapted from Hudspeth (1994).

Figure 4-19. Transmission electron micrograph of a longitudinal section of outer hair cells. Reprinted from Kimura (1966).

To see how the transduction process occurs it is necessary to understand two important electrical potentials that exist within the cochlea. An electrode inserted into the body of the hair cell will record a negative voltage of approximately -60 millivolts (mV), while an electrode inserted into the endolymphatic fluid in the scala media (which is electrically separated from the hair cell body) will record a positive voltage of approximately +80 mV. Consequently, the difference in electrical potential between the hair cell body and the endolymphatic fluid that lies outside of the cell body is approximately 140 mV. As will be seen below, this difference in electrical potential serves as a biological battery that supplies the energy source for the generation of the receptor current.

3 The single afferent fiber and single efferent fiber that are shown in Figure 4-18 should not be taken too seriously. Recall that: (a) the typical IHC is innervated by several afferent fibers, (b) the great majority of afferent fibers innervate IHCs rather than OHCs, and (c) only about 2% of all fibers innervating hair cells are efferent.
Figure 4-20. A simple electrical circuit consisting of a battery in series with a variable resistor and a current meter.

Figure 4-21. Hair cell cilia: (a) at rest, (b) being displaced in the direction of the tall hairs (i.e., away from the modiolus), and (c) being displaced in the direction of the short hairs (i.e., toward the modiolus).

Figure 4-22. Davis' model of hair cell function. Reprinted from Davis (1965).

Figure 4-23. Model of hair cell transduction proposed by Pickles (1984) and Hudspeth (1985). Adapted from Hudspeth (1994).

The theory of hair cell function that has enjoyed the widest acceptance is a surprisingly straightforward model (at least in broad strokes) that was proposed a number of years ago by Davis (1965), although many important details of hair cell transduction have only been uncovered within the last few years. To understand how the Davis model works, consider the simple electrical circuit in Figure 4-20. The circuit consists of a battery connected in series with a device called a variable resistor (also known as a rheostat). The meter has been placed in the circuit simply to record how much electrical current is flowing. A variable resistor is simply an electrical resistor whose resistance value can be varied. Volume control dials on devices such as televisions and radios are variable resistors, as are the dimmer dials that are often found in dining rooms. Turning the volume down is a matter of setting the dial on the variable resistor to a high resistance position, limiting the flow of electrical current to a small value. In our simple circuit, this high resistance value would be reflected by a very small deflection on the current meter. On the other hand, turning the volume up involves setting the dial on the variable resistor to a low resistance position, resulting in a large flow of electrical current and, consequently, the current meter would show a large deflection.

The battery in this simple circuit corresponds to the roughly 140 mV difference in electrical potential between the hair cell body and the endolymph. According to Davis, the hair cell cilia behave like variable resistors whose resistance values change as they pivot at their base. These changes in electrical resistance modulate the flow of ions between the endolymphatic fluid and the hair cell. (An ion is an atom with either a surplus of electrons, giving it a negative charge, or a deficit of electrons, giving it a positive charge.) A drop in resistance is accompanied by the flow of electrical current, and this current flow is the receptor current. It is now known that the specific type of cilia motion that produces the required resistance drop is movement in the direction of the taller cilia; that is, the kind of motion that is depicted in panel b of Figure 4-21. Electrical resistance offered by the cilia when they are standing straight up (Figure 4-21, panel a) is very high and becomes even higher when the cilia are sheared in the direction of the shorter cilia (Figure 4-21, panel c), resulting in inhibition of the receptor current rather than excitation. A more complex version of the electrical circuit envisioned by Davis is shown in Figure 4-22.

A theory proposed by Pickles et al. (1984) and Hudspeth (1985) attempts to explain why this change in electrical resistance occurs when the hair bundle pivots in the direction of the taller cilia. A schematic of the model is shown in Figure 4-23.
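Before turning to the details of that model, the Davis variable-resistance idea can be summarized with a toy calculation. Everything below is invented purely for illustration; the resistance values have no physiological meaning, and only the direction of the changes mirrors the description above.

```python
# Toy version of the Davis circuit: a fixed "battery" (the roughly 140 mV
# difference between endolymph and hair-cell interior) in series with a
# variable resistance set by the position of the cilia. Ohm's law: I = V / R.
BATTERY_MV = 140.0

# Hypothetical resistance values (arbitrary units), invented for illustration.
resistance_by_cilia_position = {
    "at rest":             20.0,  # high resistance, little current
    "toward tall cilia":    2.0,  # resistance drops -> receptor current flows
    "toward short cilia":  60.0,  # resistance rises -> current is inhibited
}

for position, resistance in resistance_by_cilia_position.items():
    current = BATTERY_MV / resistance
    print(f"{position:>20}: relative receptor current = {current:5.1f}")
```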
Recall that very fine filaments called transduction links connect adjacent cilia of different heights. According to this "gate-spring" model, movement of the hair bundle in the direction of the taller cilia has the effect of stretching these transduction links, while movement of the hair bundle in the direction of the shorter cilia has the effect of compressing the transduction links (see Figure 4-23). As seen in the figure, stretching the spring-like transduction links has the effect of opening a pore or "molecular gate," allowing ions to flow. On the other hand, movement in the direction of the shorter cilia, which compresses the transduction links, has the effect of squeezing the molecular gate closed, inhibiting the flow of ions. As a result, the receptor current tends to be generated primarily during that half of each cycle of vibration that results in movement of the hair bundle in the direction of the taller cilia.

Figure 4-24. Relationship between the receptor potential generated by the hair cells and the input signal. The receptor potential is a graded response that preserves the shape of the input signal, except that it is halfwave rectified.

To complete the transduction story, it is necessary to note that the receptor current (i.e., the electrical current that flows through the hair cell) stimulates the release by the hair cell of neurotransmitter chemicals which, in turn, stimulate the depolarization of the auditory nerve fiber. Although the causal link between these two events is well established, the precise mechanism that relates the generation of the receptor current to the release of neurotransmitter substance is currently not well understood.

A crucial fact about the nerve-stimulating electrical disturbance that is generated by the hair cell is that it is a graded signal. This means that the instantaneous amplitude of the hair cell current varies continuously depending on the instantaneous amplitude of the shearing force that is applied to the hair bundle (which, in turn, varies continuously depending on the amplitude of the basilar membrane traveling wave). To say that the receptor current varies continuously with the amplitude of the shearing force simply means that when the shearing force is low the receptor current will be low, when the shearing force is large the receptor current will be high, and when the shearing force is intermediate in size the receptor current will be intermediate.

Figure 4-24 is an idealized representation showing how the receptor current varies over time for two input signals. The main point to be made about this figure is that the changes over time in receptor current faithfully model the shape of the input signal, with one critical exception: since the hair cell is stimulated to generate a receptor current by shearing of the cilia in one direction only, the "bottom half" of the signal is missing. The name that is given to this process in which only one polarity of a signal is preserved is halfwave rectification. The main point, then, is that the hair cell receptor current faithfully models a halfwave rectified version of the input signal (though with some restrictions, which will not be discussed here). The graded nature of the receptor current stands in contrast to the all-or-none nature of the electrical potential that is generated by the auditory nerve fiber.
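The graded, halfwave-rectified character of the receptor current can be sketched numerically. The snippet below is purely schematic (ours, not taken from the literature); amplitudes are in arbitrary units.

```python
import math

def receptor_current(drive):
    """Graded response that preserves the drive waveform but passes only one
    polarity (halfwave rectification): deflection toward the taller cilia
    excites the hair cell; deflection toward the shorter cilia does not."""
    return max(drive, 0.0)

# A 500 Hz sinusoidal "shearing force," sampled at 10 kHz (arbitrary amplitude).
sample_rate = 10_000
for n in range(0, 40, 4):
    t = n / sample_rate
    drive = math.sin(2 * math.pi * 500 * t)
    print(f"t = {t * 1000:4.1f} ms   drive = {drive:+.2f}   "
          f"receptor current = {receptor_current(drive):.2f}")
```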
As will be seen, this graded, continuously varying receptor current will be translated by the auditory nerve into a sequence of discrete on-off pulses called action potentials. The mechanism involved in the generation of these action potentials on the auditory nerve will be described below.

Figure 4-25. The dendrites of one neuron synapsing with the axon of an adjacent neuron. Reprinted from Denes and Pinson, The Speech Chain, 1993, W.H. Freeman & Co.

Nerve Impulses

Neurons are highly specialized cells that form the basic building blocks of the nervous system. The human brain contains approximately 10 billion neurons. Neurons can vary considerably in their detailed structure, but all neurons share a common architecture, which is illustrated in Figure 4-25. The portion of the neuron containing the nucleus is called the cell body. The long projection extending from the cell body, called the axon, carries electrochemical information away from the cell body. Axons terminate in branch-like endings called nerve endings. Axon lengths can vary considerably from one neuron to the next, with the longest axons extending a meter or more. The bushy endings on the other side of the cell body are called dendrites; these extensions convey electrochemical information in the direction of the cell body. The microscopic spaces that exist between the nerve endings of an axon and the dendrites of an adjacent neuron are called synapses.

Effector Cells and Receptor Cells

Neurons communicate not only with other neurons, but also with two other kinds of specialized cells: effector cells and receptor cells. A common example of an effector cell is a muscle fiber, which receives its stimulus to contract from a neuron. Receptor cells, on the other hand, receive sensory information from stimuli such as light, sound, and touch, and convey this information to adjacent sensory neurons. In the auditory system, hair cells serve as receptor cells, and the neurons that convey information from the hair cells to the central nervous system are auditory nerve fibers.

Generation of an Action Potential

The electrochemical signal that is generated by a neuron is called an action potential. The energy source that supplies the power for the generation of an action potential is a difference in electrical charge between the cytoplasm inside the neuron and the extracellular fluid that lies outside of the cell membrane. If one electrode is placed inside the cell membrane of a neuron in its resting state and a second electrode is placed in the extracellular fluid just outside of the cell membrane, a voltmeter will show an electrical potential of about -50 mV, with the cell body being negative with respect to the extracellular fluid (see Figure 4-26, panel a). The difference in charge exists because of unequal concentrations of positively and negatively charged ions within the cell as compared to the extracellular fluid. The neuron in this state is said to be polarized, and the difference in charge can be thought of as constituting a biological battery in the same sense as the electrical potential that serves as the energy source for the generation of the receptor potential in the Davis hair cell model.

Figure 4-26. Propagation of an action potential along a nerve fiber.

There are several different kinds of events that can stimulate the depolarization of a neuron, resulting in the propagation of an action potential along the axon.
The most important of these events is the uptake of neurotransmitter chemicals by the dendrites of the neuron. These neurotransmitter chemicals are released either by an adjacent neuron or an adjacent receptor cell, such as a hair cell. Depolarization of the neuron begins with an increase in the permeability of the cell membrane, allowing positively charged ions to enter the cell and negatively charged ions to exit. The result is a very rapid change in the electrical potential of the cell. Panel b of Figure 4-26 shows the electrical state of a neuron at a particular location that the depolarization disturbance has reached.

An imperfect but useful analogy can be drawn between the propagation of an action potential on an axon and the combustion that propagates along the length of a fuse. The analogy is useful in making three important points about the propagation of an action potential. First, the event is a self-sustaining chain reaction. In the case of the fuse, the combustion in one local area of the fuse causes an adjacent area to burn, and in the case of the action potential, it is the local electrochemical disturbance that spreads to adjoining regions of the axon. Second, the energy that supports the propagation of the action potential comes from the fiber itself, and not the stimulating event, just as the energy that is responsible for the propagation of combustion along a fuse comes from the fuse and not the match that was used to light the fuse. Finally, combustion along a fuse is an all-or-none event, meaning that the fuse will either burn or fail to burn, and the amount of heat that is generated along the fuse will not be graded depending on the size of the match that was used to light it. This all-or-none law is one of the most fundamental and important properties of neural coding: the neuron either depolarizes or it does not, and the amplitude of the action potential is not graded according to the amplitude of the stimulating event. In relation to the transduction process, this means that the graded receptor potential generated by the hair cell will be translated not into a correspondingly graded neural event, but into a discrete, all-or-none action potential on the auditory nerve.

Figure 4-27. Changes in voltage over time in an action potential.

Figure 4-27 shows what a typical action potential looks like. Action potentials are measured by placing a very small recording electrode inside the membrane of an axon. Consequently, the graph shows the changes that occur over time in the electrical potential within the cell at one particular location on the axon. The same pattern is repeated at different points in time as the disturbance propagates along the axon. The graph begins at the approximately -50 mV resting potential of the neuron. The very rapid swing to about +40 mV occurs when the cell membrane permeability increases, allowing positive ions to enter. This rapid swing from about -50 mV to about +40 mV occurs over a brief interval of approximately 0.5 ms, and this portion of the action potential is called the spike potential. An active process within the cell rapidly repolarizes the neuron by pumping positive ions out through the cell membrane and, if the neuron remains undisturbed, the electrical potential eventually returns to the resting potential of about -50 mV.

Absolute and Relative Refractory Periods

There is one crucial aspect of the fuse analogy that does not apply to neurons: once a fuse burns it cannot be relit.
Neurons, on the other hand, repolarize shortly after generating an action potential and can be stimulated to fire again. However, if the stimulating event occurs less than about 1 ms after the generation of a spike potential, the fiber will not fire, no matter how strong the stimulus. This interval of approximately 1 ms is called the absolute refractory period, and it simply means that spike potentials cannot occur more frequently than about once every millisecond. This corresponds to a maximum firing rate of 1/0.001 s = 1,000 spikes per second, or 1,000 Hz. Following the absolute refractory interval is a longer interval of about 1-10 ms that is called the relative refractory period. A neuron can fire during the relative refractory period, but the threshold for stimulating the neuron is elevated. For example, at 2 ms following a neural spike the neuron is capable of firing again since the absolute refractory period has been exceeded; however, since the threshold is elevated a relatively strong stimulus is required.

It is important to appreciate that the firing of neurons is a probabilistic event, meaning that it has a random rather than deterministic character. The probability of a second firing increases with: (a) increases in the amplitude of the stimulating event (i.e., the neuron is more likely to fire when strongly stimulated), and (b) increases in the time that elapses since the previous spike potential (i.e., the neuron is more likely to fire if a long interval has elapsed since the previous spike potential). As we will see later in this chapter, the concepts of refractory periods and the probabilistic nature of neural firing patterns have important implications for neural coding of sound.

Excitation versus Inhibition

When a neural spike arrives at a nerve ending, neurotransmitters are released into the synaptic space where they are taken up by the dendrites of adjoining neurons. To this point we have been speaking as though the release of neurotransmitters at synaptic junctures always had the effect of stimulating an action potential. However, synaptic junctures may be either excitatory – increasing the likelihood of an action potential in an adjoining neuron, or inhibitory – decreasing the likelihood of an action potential in an adjoining neuron. These inhibitory connections are quite important and play a central role in a class of contrast-enhancing phenomena called lateral suppression or lateral inhibition, which we will not be discussing in this text. [Omit this? if not, at least give a citation]

Signal Coding on the Auditory Nerve

Sensory Nerves as Encoders

We have seen that the receptor potential that is generated by the hair cells is a graded or continuously varying signal that is a faithful model of the input signal, with the important exception that it is halfwave rectified; that is, the "bottom half" of the signal is not represented since the hair cells are excited by cilia shearing in one direction only (see Figure 4-24). In fact, the signal exists in graded or continuous form at all points up to and including the hair cell potential: (a) as continuous variations in instantaneous air pressure over time prior to the tympanic membrane, (b) as continuous variations in instantaneous displacement over time all the way from the tympanic membrane to the basilar membrane, and (c) as continuous variations in instantaneous voltage over time at the hair cell.
The electrical signals on the auditory nerve, however, have a very different character, since auditory nerve fibers carry a sequence of all-or-none on-off pulses. [a figure is needed here - see parkins-houde paper] The importance of this difference between the graded receptor potential and the discrete on-off pulses on the auditory nerve cannot be overstated. What this means is that the auditory nerve cannot represent the input signal in a completely straightforward way; for example, the all-or-none law means that the auditory nerve cannot simply generate weak pulses when the receptor potential is weak and strong pulses when the receptor potential is strong. The auditory nerve must find some way to encode, using on-off pulses, the receptor potentials that are generated at various places along the basilar membrane.

The key word here is encode, and to understand the nature of this encoding process and, in fact, the basic structure of all encoding operations, it might be helpful to consider the kind of encoding that occurs in the transmission of messages using Morse Code. In Morse Code the units that must be encoded are letters and a few control characters such as STOP. This is accomplished by assigning a code to each character consisting of a unique sequence of long and short electrical pulses. Imagine a device consisting of an optical scanner, software that would recognize the characters on a page of text, and an encoding circuit that would produce the appropriate sequence of long and short electrical pulses for each character. The main point is that the device has done more than simply convert from optical energy to electrical energy; it has translated or encoded the message into an entirely different kind of language; that is, from the language of letter shapes to the language of pulse widths.

In the case of the auditory system, the "message" that needs to be encoded is the sound wave arriving at the tympanic membrane or, alternatively, its spectrum. The signal is preserved in halfwave rectified form in the hair cell receptor potential that drives the auditory nerve. The kind of translation that is occurring is from a graded, continuous signal to a sequence of on-off pulses. The question, then, is how might this continuous signal (or its spectrum) be coded on the auditory nerve using on-off pulses? Since the pulses do not vary appreciably in amplitude, the number of dimensions that might be exploited is fairly limited. Three characteristics of auditory nerve firing patterns that might be exploited in this coding scheme are: (1) the time of occurrence of the pulse, (2) the rate at which the neurons fire (i.e., whether a large or small number of spikes occur in a given time interval), and (3) the physical location of the nerve fiber (i.e., whether the nerve fiber is connected to a hair cell on the basal end of the cochlea, the apical end, or somewhere in between). These dimensions need not be treated separately. For example, it is possible to examine the rate of neural activity for fibers connected at various positions along the basilar membrane, which combines the firing rate parameter with the physical position parameter.
This is the essence of "place coding" or "rate-place coding," described below.

Rate-Place Coding

To understand rate-place coding, it is necessary to recall that the basal end of the basilar membrane, which is stiffer than the apical end, responds better to high frequencies than low frequencies, while the opposite is true of the apical end of the basilar membrane. Consequently, higher frequency pure tones will produce the largest basilar membrane movement amplitude toward the base, while lower frequency pure tones will produce the largest basilar membrane movement amplitude toward the apex (see Figures 4-15 and 4-16). The same basic principle applies to complex signals consisting of many frequency components: the lower frequency components of the input signal will be directed toward the apical end of the basilar membrane, and the higher frequency components will be directed toward the basal end. The basic idea of rate-place coding is that this spatial separation of frequency components will be reflected in the pattern of auditory nerve activity. As shown in Figure 4-28, two signals differing in frequency will show different patterns of 8th nerve electrical activity, with lower frequency signals showing more activity at the apical end and higher frequency signals showing more activity at the basal end. "Amount of neural activity" here is simply firing rate: the number of spikes per unit time in neurons connected at various places along the cochlea. The basic idea, then, is that auditory nerve activity toward the base codes high frequency, while auditory nerve activity at the apex codes low frequency.

Figure 4-28. The basilar membrane displacement patterns for two sinusoids differing in frequency (top panels) and the auditory nerve firing rate patterns that would likely be associated with each of these signals (bottom panels). When the amplitude of basilar membrane movement is high, auditory nerve firing rate is high.

The representation in Figure 4-28 can be viewed as a spectrum of sorts, broadly analogous to a Fourier amplitude spectrum, with two differences: (1) the frequency scale is backwards, since low frequencies are on the right and high frequencies are on the left, and (2) the spectrum is quite coarse relative to the kind of spectrum that can be obtained by Fourier analysis; that is, the pure tone produces activity over a rather wide area of the cochlea. The first point is of no consequence, since Mother Nature has no bias toward reading from left to right, but the second point may have considerable relevance. This issue will be discussed below.

The data shown in Figure 4-28 are hypothetical, and no such pattern has ever been directly observed. The reason is that collecting this kind of data would require recording auditory nerve firing patterns simultaneously from a large number of fibers at various positions along the cochlea, and methods for making these kinds of recordings from a large number of spatially separated neurons do not currently exist. Rate-place coding, however, can be inferred from two techniques that make use of recordings from single auditory nerve fibers. One technique involves the measurement of neural tuning curves from single neurons, and the other involves the measurement of frequency response curves, also from single neurons.

Figure 4-29. Neural tuning curves for auditory nerve fibers with three different characteristic frequencies (CF). The threshold of the fiber is lowest (i.e., sensitivity is greatest) at the characteristic frequency of the fiber. Data from Kiang and Moxon (1974).
Neural Tuning Curves

Neural tuning curves are measured by placing an electrode into a neuron and determining the threshold of the fiber over a wide range of signal frequencies. The threshold is simply the signal intensity that is required to obtain a measurable response from the neuron. Neural tuning curves for three different neurons are shown in Figure 4-29. Measuring the threshold of a neuron is not simply a matter of increasing the signal intensity until the neuron fires. This is because neurons fire spontaneously even in the absence of an acoustic stimulus. The rate at which a neuron fires in relative quiet is called the spontaneous rate of the neuron. The threshold of a neuron, then, is the intensity required for the neuron to fire at a rate that is measurably above its spontaneous rate.

The main point to be noted about the tuning curves in Figure 4-29 is that each neuron has a much lower threshold at some frequencies than others. The sharp dip in each tuning curve represents the lowest threshold and therefore the frequency at which the neuron is most sensitive. This is called the characteristic frequency (CF) or best frequency (BF) of the neuron. The terms characteristic frequency and best frequency need to be interpreted carefully. Finding that a given neuron has a CF of 12,000 Hz, for example, does not reveal anything about the structural properties of the neuron that cause it to "resonate" at 12,000 Hz; rather, the CF of 12,000 Hz means that the neuron is connected to a hair cell that is located toward the high frequency (basal) end of the basilar membrane. In other words, the "best frequency" of a neuron is determined not by its internal properties but by its location along the basilar membrane. If a neuron has a CF of 12,000 Hz, it is because it innervates a hair cell that is located at a point along the length of the cochlea where the basilar membrane responds best to a frequency of 12,000 Hz. If this 12,000 Hz CF neuron were "unplugged" from its hair cell near the basal end of the cochlea and attached to a hair cell located at the apical end, the CF would shift to a lower frequency, since the neuron would then be driven by the movement of a portion of the basilar membrane that is maximally sensitive to lower frequencies. Consequently, although CF is measured by recording the electrical activity of a nerve fiber, the best frequency of the fiber is actually controlled by the mechanical properties of the basilar membrane.

Notice also that the tuning curves are asymmetrical; that is, the slopes are much sharper on the high frequency side than the low frequency side. This asymmetry is a direct result of the asymmetry in the envelope of the basilar membrane traveling wave. This point will be addressed in the section below on auditory nerve frequency response curves.

The relationship that exists on the auditory nerve between CF and the physical location of the nerve fiber along the basilar membrane is called tonotopic organization. As will be seen later in this chapter, tonotopic organization is a fundamental architectural property of the auditory system: it is preserved not only on the auditory nerve but throughout the entire auditory system, up to and including the auditory cortex.
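The measurement logic just described can be summarized in a short computational sketch. The fiber model below is entirely made up: the spontaneous rate, the 30 dB-per-octave threshold slopes, and the 20 spikes-per-second response criterion are arbitrary assumptions, and real tuning curves are not symmetrical around CF. The sketch is meant only to show how a threshold-versus-frequency curve is assembled by raising the tone level until the firing rate exceeds the spontaneous rate by some criterion.

    import numpy as np

    SPONT_RATE = 10.0   # assumed spontaneous rate, spikes per second

    def firing_rate(freq_hz, level_db, cf_hz=12000.0):
        # Toy fiber: the threshold is lowest at CF and rises (symmetrically
        # here, unlike real fibers) by an assumed 30 dB per octave away from CF.
        octaves_from_cf = abs(np.log2(freq_hz / cf_hz))
        threshold_db = 20.0 + 30.0 * octaves_from_cf
        return SPONT_RATE + 5.0 * max(0.0, level_db - threshold_db)

    def neural_threshold(freq_hz, criterion=20.0):
        # Raise the tone level until the driven rate exceeds the spontaneous
        # rate by the criterion; that level is taken as the fiber's threshold.
        for level_db in range(0, 101):
            if firing_rate(freq_hz, level_db) > SPONT_RATE + criterion:
                return level_db
        return None

    # A tuning curve: threshold as a function of tone frequency, lowest at CF.
    for f in (3000, 6000, 12000, 16000, 20000):
        print(f, "Hz:", neural_threshold(f), "dB")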
Frequency Response Curves of Auditory Nerve Fibers

Another way to observe rate-place coding on the auditory nerve is to measure frequency response curves of individual fibers. For reasons that are explained below, this method is, in some respects, more revealing of the kind of frequency analysis that is carried out by the cochlea. The method is conceptually identical to the one described in Chapter 3 for measuring the frequency response of a filter. The method described earlier involves driving the filter with pure tones of constant amplitude at various frequencies, from the lowest frequency of interest to the highest. The frequency response curve is the amplitude of the signal at the output of the filter as a function of the frequency of a constant amplitude input signal. The shape of the frequency response curve tells us what frequencies will be allowed to pass through the filter and what frequencies will be attenuated. To measure the frequency response curve of an auditory nerve fiber, a recording electrode is placed in the nerve fiber and its firing rate is measured as pure tones are delivered at a variety of input frequencies; the amplitude of the input signal is held constant. The frequency response curve of the fiber shows firing rate on the y axis and the frequency of the constant amplitude input signal on the x axis. The firing rate measure is equivalent to the output amplitude measure that was described in Chapter 3 for determining the frequency response curve of a filter.

Figure 4-30 shows frequency response curves for two neurons with different CFs. The measurements were made by Rose et al. (1971) from a squirrel monkey using a rather low presentation level of 45 dB SPL. (The low presentation level is quite important, as will be discussed below.) One fiber has a CF of 900 Hz and the other fiber has a CF of 1,700 Hz. Note that the two frequency response curves resemble bandpass filters; that is, the fibers respond with maximum output at their CF, with fairly sharp drops on either side. Again, the tendency of these fibers to respond with high firing rates at their CF does not reveal anything about the fibers except that the 1,700 Hz CF fiber innervates a hair cell that is closer to the basal end of the basilar membrane than the 900 Hz CF fiber. The filtering effect, then, can be attributed to the frequency selective behavior of the basilar membrane and not the nerve fiber or the hair cell.

Figure 4-30. Frequency response curves for two neurons with different characteristic frequencies. The figure shows the firing rate of the two nerve fibers to sinusoids whose amplitude is always the same, but whose frequency varies. Data from Rose et al. (1971).

Findings such as those presented in Figure 4-30 have given rise to a view of the cochlea as a filter bank; that is, a bank of some 3,000-3,500 overlapping bandpass filters of the kind shown in Figure 4-30. This range of 3,000-3,500 comes from the approximate number of hair cell channels in the cochlea, with each channel consisting of 1 IHC and 3-4 OHCs.[4] Each of these channels can be thought of as analogous to the bandpass filter that is used in a radio tuner: each channel allows a band of energy through, while attenuating signal components of higher or lower frequency. By measuring the output of each of these channels (i.e., the firing rate), a spectrum could be reconstructed. Since each channel is maximally sensitive to signal components at the fiber CF, the firing rate at each CF reflects the amount of signal energy at that frequency. This is the essence of what is meant by rate-place coding: the firing rate at each channel codes the amount of signal energy at the CF corresponding to that channel.
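The filter bank idea can be sketched in a few lines of code. Everything in the sketch below is an assumption made for the sake of illustration: the number of channels, their spacing, the roughly one-third-octave channel spread, and the test frequencies are invented values, and the "firing rate" is simply the energy passed by each channel rather than an actual neural response. The point is only to show how a set of channel outputs, ordered by CF (that is, by place), amounts to a coarse spectrum of the input.

    import numpy as np

    fs = 20000
    t = np.arange(0, 0.1, 1.0 / fs)
    # Test signal: two pure tones (500 and 3,000 Hz, chosen arbitrarily).
    x = np.sin(2 * np.pi * 500 * t) + 0.5 * np.sin(2 * np.pi * 3000 * t)

    # Fourier amplitude spectrum of the input.
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), 1.0 / fs)

    # A crude "cochlear filter bank": 30 channels with CFs spaced
    # logarithmically from 100 to 8,000 Hz, each weighting the spectrum with a
    # Gaussian on a log-frequency axis (spread of about 1/3 octave).
    cfs = np.geomspace(100.0, 8000.0, 30)
    rates = []
    for cf in cfs:
        with np.errstate(divide="ignore"):
            weight = np.exp(-0.5 * (np.log2(freqs / cf) / 0.33) ** 2)
        weight[freqs == 0] = 0.0
        # The "firing rate" of the channel stands in for the energy it passes.
        rates.append(np.sum(weight * spectrum))

    # The channel outputs, ordered by CF (i.e., by place along the basilar
    # membrane), form a coarse rate-place spectrum: the largest values appear
    # in the channels whose CFs are closest to 500 and 3,000 Hz.
    for cf, rate in zip(cfs, rates):
        print(round(cf), round(rate, 1))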
[4] In terms of the signals that are generated on the auditory nerve, a channel can essentially be considered to be a single IHC, since the great majority of afferent auditory nerve fibers innervate IHCs rather than OHCs.

Figure 4-31. Frequency response curves for a single auditory nerve fiber at eight different signal intensities. Note that the frequency response curves are relatively narrow at low presentation levels but become very broad at intensities that are typical of speech. Data from Rose et al. (1971).

Problems with Rate-Place Coding

There is no doubt that filtering of the general type that is shown in Figure 4-30 takes place in the cochlea, but does the auditory system actually derive a spectrum using rate-place coding? Opinions on this question have been divided for many years. One of the main questions is whether the bandpass filters that make up the cochlear filter bank are sufficiently selective to account for what is known about the frequency discrimination abilities of listeners. The term "selective" here simply means narrow; that is, a selective or narrow-band filter passes a narrow band of frequencies, with sharp slopes on either side. If frequency discrimination ability is to be explained on the basis of the cochlear filter bank, then the bandpass filters need to be very narrow, since frequency discrimination abilities are stunningly good: in the middle portion of the spectrum, one just noticeable difference in frequency corresponds to a distance along the basilar membrane of approximately 10 microns, or roughly the diameter of a single inner hair cell (Davis and Silverman, 1970). Consequently, in order for rate-place coding to work, the bandwidths of the filters at each channel would have to be sufficiently narrow that relatively little energy is allowed to "spill" into an adjacent channel.

There is some reason to believe that the cochlear filter bank is too broadly tuned to explain frequency discrimination abilities. Although frequency response curves tend to be fairly narrow when signal intensities are low, there is very good evidence that filter bandwidths become quite broad at even moderate signal intensities. Figure 4-31 shows a family of frequency response curves for an individual auditory nerve fiber, recorded from a squirrel monkey in the study by Rose et al. (1971). This particular auditory nerve fiber has a CF of 1,700 Hz. The eight separate curves represent the frequency response curve measured at eight different signal levels, at 10 dB intervals from 25 to 95 dB SPL. Notice first of all that the frequency response curves reveal a certain amount of frequency selectivity; that is, the fiber responds better to frequencies at or near the 1,700 Hz CF than to other frequencies, despite the fact that the intensity of the input signal is held constant for each of the individual curves. However, notice that the degree of frequency selectivity is strongly affected by signal level. Specifically, the frequency response curves become much broader (i.e., less frequency selective) at higher signal intensities. For example, at 35-45 dB SPL the shapes of the frequency response curves resemble those of a bandpass filter, with fairly sharp drops in firing rate on either side of the CF. However, at levels that are more typical of speech (e.g., 65-85 dB SPL), the filter shapes become considerably broader.
This is especially true on the low frequency side of the frequency response curves; that is, the fibers show considerable activity at frequencies that are much lower than the CF. In fact, at the higher presentation levels the filter shapes begin to resemble lowpass filters more than bandpass filters. What this means is that the relationship between place and frequency, which is the essence of place coding, is not nearly as strong at typical speech levels as it is at very low stimulus levels. Some theorists have argued that the frequency selectivity that is shown in these frequency response curves is far too coarse to account for the excellent frequency discrimination ability of human listeners.

The final point that needs to be discussed regarding the frequency response curves in Figure 4-31 is the asymmetry. The slopes of the frequency response curves are considerably sharper on the high frequency side (~100-500 dB/octave) than on the low frequency side (~8-12 dB/octave). This is the same kind of asymmetry that was seen earlier in the neural tuning curve data, and both effects are due to the asymmetry in the envelope of the basilar membrane traveling wave. This may seem counterintuitive, since the traveling wave envelope has a sharper slope on the low frequency (apical) side than the high frequency (basal) side. To understand why this does, in fact, make sense, suppose that we were to measure the frequency response of a fiber with a CF of 1,000 Hz. In measuring the frequency response curve of the fiber, we begin with low frequency pure tones and move to higher frequencies, each time holding the intensity constant and measuring the firing rate of the neuron. Since we are progressing from low frequencies to high frequencies, the point of maximum amplitude in the basilar membrane traveling wave moves systematically from the apex to the base. Figure 4-32 shows what the traveling wave envelope would look like for three sinusoids: (1) a pure tone at the CF of the fiber (1,000 Hz), (2) a pure tone that is lower in frequency than the CF, and (3) a pure tone that is higher in frequency than the CF. The arrow in roughly the center of the figure shows the approximate location of the 1,000 Hz CF fiber that is being recorded from. The firing rate at this location will be strongly correlated with the amplitude of the traveling wave at this position on the cochlea. The main point to notice is that the pure tone that is lower than the CF would be expected to cause much more activity in the 1,000 Hz CF fiber than the pure tone that is higher in frequency than the CF, due to the asymmetry in the traveling wave envelope. This is why the slopes of the frequency response curves in Figure 4-31 tend to be much sharper on the high frequency side than the low frequency side.

Figure 4-32. Basilar membrane traveling wave envelopes for signals at three different frequencies. If a fiber with a CF of 1,000 Hz is being recorded from, the lower frequency signal will produce more activity in that fiber than the higher frequency signal. This is why tuning curves and frequency response curves have sharper slopes on the high frequency side than the low frequency side.

Summary of Rate-Place Coding

In summary, the essence of rate-place coding is that the vibratory characteristics of the basilar membrane are such that high frequency signals tend to cause greater neural activity in auditory nerve fibers connected at the base than at the apex, while the opposite is true of low frequency signals.
This relationship between the characteristic frequency or best frequency of an auditory nerve fiber and spatial location along the basilar membrane is called tonotopic organization. Tonotopic organization may be observed experimentally by recording the threshold of individual nerve fibers at different frequencies, resulting in a neural tuning curve. It can also be observed by measuring the frequency response curves of individual nerve fibers. Experimental findings using these two techniques have given rise to the view of the cochlea as a filter bank, with each of some 3,000-3,500 channels passing a band of frequencies. An auditory spectrum might be coded as variations in the firing rate at the output of each of these channels. However, there is some uncertainty about whether the filter bank provides enough frequency resolution to account for the excellent frequency discrimination that is shown by listeners.

Synchrony Coding

The basic idea behind synchrony coding is that the period of the input signal will be preserved in the interval that elapses between successive spikes on auditory nerve fibers. For example, if the input signal has a period of 10 ms (f = 100 Hz), the pulse train produced on the auditory nerve will tend to have an interspike interval of 10 ms. This type of coding, then, exploits the time of occurrence parameter in neural firing patterns. The idea is shown in highly simplified form in Figure 4-33: a 100 Hz signal will produce a 100 Hz pulse train on the auditory nerve or, stated differently, the interspike interval will match the period of the input signal. According to synchrony coding, the auditory spectrum is assumed to be derived not by the filtering action of the cochlea, but by the measurement of interspike intervals in the central nervous system, where neural firing patterns are analyzed.

The very simple kind of synchrony coding that is shown in Figure 4-33 is a very old idea that dates back to Rutherford (1886). When the theory was first proposed, the limits imposed on maximum firing rates by neural refractory periods were not known, and it was thought that an individual nerve fiber could fire at the very high rates that prevail at the high end of the human frequency range (~20 kHz). Recall that the absolute refractory period is approximately 1 ms, meaning that a neuron cannot fire at a rate exceeding about 1,000 spikes per second. An individual nerve fiber, in fact, can sustain this top rate only for very brief bursts. For sustained signals, the relative refractory period must also be taken into account. The relative refractory period is a longer interval ranging from about 1-10 ms, and when this interval is taken into account, an individual fiber can sustain an average firing rate of perhaps 300-400 spikes per second or less.

One implication of these firing rate limits would seem to be that the synchrony principle could accurately code only very low signal frequencies, perhaps 300 Hz or lower. However, a rather simple elaboration of synchrony coding called the volley principle allows this low frequency limit to be greatly exceeded. The basic idea behind the volley principle, which was proposed by Wever (1949), is shown in Figure 4-34.

Figure 4-34. The volley principle. Adapted from Wever (1949).
The assumption made by Wever is that, while each individual neuron may not be able to fire fast enough to produce one pulse for every cycle of the waveform, the sum of the activity of several neurons will show a train of pulses whose interspike interval matches the period of the input signal.[5] According to this view, then, the interspike interval, which codes the period of the input signal, must be measured not from an individual nerve fiber, but from the summed output of a group of nerve fibers.

[5] Figure 4-34, which has been widely reproduced in many texts in auditory physiology, does a rather good job of capturing the basic idea behind Wever's volley principle. However, the figure is misleading in one respect: it shows an unrealistically orderly firing pattern for the individual neurons, in which each neuron fires, misses exactly four cycles, fires again, and so on, and the adjacent neurons are offset from one another by exactly one pulse. Such orderly behavior is not actually obtained experimentally, but the basic principle of measuring the interspike interval from a group of neurons does not depend on this unrealistically orderly behavior.

This coding scheme does not depend on individual nerve fibers firing at the same frequency as the input signal, but it does depend on the ability of nerve fibers to remain synchronized to the input waveform. In other words, a neuron may fire on one cycle, miss several cycles, fire on another cycle, miss several more, and so on. However, for the coding scheme to work, the spikes that do occur need to remain in synchrony with the input waveform. Remaining in synchrony means that the fiber tends to fire at roughly the same point in the cycle. A fiber that was out of synchrony, for example, might fire once at a positive peak, next at a zero crossing, again at a negative peak, and so on. Auditory nerve fibers appear to be able to maintain synchrony for signal frequencies up to approximately 4,000-5,000 Hz, although there is no sharp cutoff. Consequently, synchrony coding, along with the summing-across-fibers assumption provided by the volley principle, appears to be capable of coding frequency up to about 4,000-5,000 Hz.
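The volley idea lends itself to a simple simulation. The sketch below is not a model of real auditory nerve fibers: the number of fibers, the firing probability, the 3 ms refractory period, and the assumption of phase locking to the peak of each cycle are all arbitrary choices. The point is only that, although no single fiber can fire on every cycle of a 1,000 Hz tone, the pooled activity of a group of fibers contains a spike on nearly every cycle, so the interval between successive volleys approximates the 1 ms period of the signal.

    import numpy as np

    rng = np.random.default_rng(0)

    freq = 1000.0                  # stimulus frequency (Hz); period = 1 ms
    period = 1.0 / freq
    n_cycles = 2000
    n_fibers = 20
    refractory = 0.003             # assumed ~3 ms minimum interspike interval

    # Each fiber is phase locked: when it fires, it fires at the peak of a
    # cycle. Because of the refractory period it cannot fire on every cycle
    # of a 1,000 Hz tone, so it fires on a random subset of cycles.
    fires = np.zeros((n_fibers, n_cycles), dtype=bool)
    for i in range(n_fibers):
        last_cycle = -10
        for k in range(n_cycles):
            if (k - last_cycle) * period >= refractory and rng.random() < 0.3:
                fires[i, k] = True
                last_cycle = k

    # Single fibers average well under 1,000 spikes/s ...
    rate_per_fiber = fires.sum(axis=1).mean() / (n_cycles * period)
    print("mean single-fiber rate:", round(rate_per_fiber), "spikes/s")

    # ... yet the pooled activity contains a volley on nearly every cycle,
    # and the interval between successive volleys approximates the period.
    volley = fires.any(axis=0)
    print("fraction of cycles with at least one pooled spike:",
          round(volley.mean(), 3))
    volley_times = np.flatnonzero(volley) * period
    print("median interval between volleys (ms):",
          round(np.median(np.diff(volley_times)) * 1000, 2))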
Place-Synchrony Coding

Rate-place coding and synchrony coding are both very old ideas with long and complex histories. The earliest well formulated version of place theory was proposed by von Helmholtz (1857) more than a century ago, and it was later significantly modified and extended in the Nobel-Prize-winning work of von Bekesy (1960) that culminated in the traveling wave theory that has been discussed at great length in this chapter. Synchrony coding has a nearly equally long history, dating back to the first well formulated version described by Rutherford (1886). For many years the principal debate in auditory physiology centered on the question of which of these two divergent approaches was correct. A view that has become rather common is that these two coding schemes are not incompatible. A compromise view called place-synchrony coding or place-volley theory holds that frequency is coded both by rate-place mechanisms and by synchrony mechanisms. In the low frequencies, where synchrony is well maintained, synchrony coding is thought to be dominant, while in the higher frequencies, where synchrony is not well maintained, place coding is dominant. There is assumed to be no abrupt shift between synchrony coding and place coding, and for a rather broad range of frequencies in the middle of the spectrum, perhaps from 1,500 to 5,000 Hz, both kinds of coding are assumed to contribute to frequency analysis.

Poisson Coding

There is one additional coding scheme that may play an important role in auditory analysis. The scheme is called Poisson (pwah-SOAN) coding, and it typically receives little or no attention in introductory discussions of auditory physiology and, in the view of the authors, has perhaps received too little attention among professional scientists. While the coding scheme has an unfamiliar-sounding name, the basic principle is not complicated. The basic idea behind Poisson coding is shown in Figure 4-35, which displays a slightly simplified version of data from a study by Rose et al. (1971). In this study recordings were made from single auditory nerve fibers of a squirrel monkey. The figure shows pulse probability as a function of time; that is, the probability that a pulse will occur at any given point in time. (Note that this scheme, like synchrony coding, exploits the time of occurrence parameter of auditory nerve firing patterns.) Superimposed on the pulse probability function is the signal that was presented to the animal. The signal is the smooth curve, and pulse probability is shown as a bar graph. Pulse probability is simply the number of spikes that occur in a given small interval of time. The similarity between the shape of the pulse probability function and the shape of the input signal is striking.

Figure 4-35. An illustration of Poisson coding. The figure shows the instantaneous probability of a pulse on an individual auditory nerve fiber (i.e., the number of spikes occurring in a given time interval) as a function of time. Superimposed on the pulse probability function, which is plotted as a bar graph, is the input signal. The main lesson of this figure is that instantaneous pulse probability is high when instantaneous signal amplitude is high, and vice versa. Consequently, the pulse probability function preserves the shape of the input waveform, except that the function, like the receptor potential, is halfwave rectified. Data are from the Rose et al. (1971) measurements of 8th nerve firing patterns in the squirrel monkey.

What the Rose et al. data show is that the instantaneous probability of an auditory nerve pulse is directly proportional to the instantaneous amplitude of the signal that is being coded. In other words, if the signal amplitude at a given instant is high, the fiber is very likely to fire and, conversely, if the signal amplitude at a given instant is low, the fiber is not very likely to fire. Engineers refer to this coding scheme as a Poisson point process or a Poisson code. The origin of the name is unimportant; what is important is the stunning similarity between the pulse probability code and the input waveform. The pulse probability function can be seen as a very direct coding of the graded receptor potential which, in turn, is a faithful representation of the input signal.[6] (Note that both the receptor potential and the pulse probability function are halfwave rectified. Again, this is because hair cells are stimulated to generate a receptor potential by cilia shearing in one direction only. Since it is the receptor potential that stimulates the generation of action potentials in neurons, spike probability is near zero in one half of each cycle.)
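The pulse-probability idea can also be illustrated with a short simulation. All of the values below are invented for the example (the 500 Hz test frequency, the scale factor relating amplitude to spike probability, and the 5,000 repetitions), and generating a spike independently in each sample with a probability proportional to the halfwave-rectified amplitude is only a rough stand-in for real auditory nerve behavior. The point is that when spikes are accumulated over many repetitions, the spike-count histogram takes on the shape of the halfwave-rectified input waveform, just as in the Rose et al. data.

    import numpy as np

    rng = np.random.default_rng(1)

    fs = 20000                              # sampling rate (arbitrary)
    freq = 500.0                            # test frequency (arbitrary)
    t = np.arange(0, 1.0 / freq, 1.0 / fs)  # one cycle of the stimulus
    signal = np.sin(2 * np.pi * freq * t)

    # Instantaneous spike probability: proportional to the halfwave-rectified
    # instantaneous amplitude (the 0.02 scale factor is an arbitrary choice).
    p_spike = 0.02 * np.maximum(signal, 0.0)

    # Accumulate spikes over many repetitions of the cycle, mimicking either
    # many stimulus cycles on one fiber or one cycle summed over many fibers.
    n_reps = 5000
    spike_counts = np.zeros_like(t)
    for _ in range(n_reps):
        spike_counts += (rng.random(len(t)) < p_spike)

    # The spike-count histogram follows the halfwave-rectified waveform:
    # counts are highest near the positive peak of the cycle and are zero
    # during the negative half of the cycle.
    peak_bin = int(np.argmax(spike_counts))
    print("bin with the most spikes, as a fraction of the cycle:",
          round(t[peak_bin] * freq, 3))              # close to 0.25
    print("spikes during the negative half-cycle:",
          int(spike_counts[len(t) // 2:].sum()))     # 0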
What is the Poisson code good for? Unfortunately, as with many aspects of auditory analysis, the answer is not currently known. In very general terms, one possibility is that the primary function of the cochlea is simply to preserve the detailed shape of the input signal (in halfwave rectified form), and the auditory nerve serves to code this signal with on-off pulses using the Poisson pulse-probability code. The Poisson coded signal is then passed on to the central nervous system for analysis. According to this view, the "heavy lifting" of spectrum analysis is carried out not by the cochlea but by purely neural mechanisms in the central nervous system. The neural mechanisms that might be involved in carrying out this spectrum analysis at central levels, if they exist at all, are currently not known. Neural circuits that are capable of deriving something equivalent to a Fourier amplitude spectrum from the Poisson coded signal have been hypothesized, but it is not known whether such circuits actually exist (see Houde & Hillenbrand, 1997).

[6] A point worth noting about the Rose et al. data is that the pulse probability functions were derived by summing the activity of an individual fiber over many hundreds of cycles of the input waveform. Being a probabilistic process, the Poisson code will be very ragged looking on any individual fiber for a single presentation of a stimulus. Since a listener perceives sound quality in real time based on single presentations of stimuli, it must be the case that the Poisson code is derived not on a single nerve fiber but by summing pulse probabilities over many nerve fibers. Some rough calculations suggest that the Poisson code is accumulated by summing the activity of something on the order of 1,000 fibers (Houde and Hillenbrand, 1996).

REFERENCES

Batteau, D.W. (1967). The role of the pinna in human localization. Proceedings of the Royal Society, Series B, 168 (no. 1011), 158-180.

von Bekesy, G., & Rosenblith, W.A. (1958). The mechanical properties of the ear. In S.S. Stevens (Ed.), Handbook of Experimental Psychology. New York: John Wiley & Sons.

Davis, H. (1965). A model for transducer action in the cochlea. Cold Spring Harbor Symposium on Quantitative Biology, 30: Sensory Receptors. Cold Spring Harbor, NY: Cold Spring Harbor Laboratory of Quantitative Biology, 181-190.

Fleming, N. (1939). Resonance in the external auditory meatus. Nature, 142, 642-643.

Freedman, S.J., & Fisher, H.G. (1968). The role of the pinna in auditory localization. In S.J. Freedman (Ed.), The Neuropsychology of Spatially Oriented Behavior (Chapter 8). Homewood, IL: Dorsey Press.

Houde, R.A., & Hillenbrand, J.M. A comb filter theory of auditory spectrum analysis. Manuscript under review.

Spoendlin, H., & Schrott, A. (1989). Analysis of the human auditory nerve. Hearing Research, 43, 25-38.

Wiener, F.M., & Ross, D.A. (1939). Pressure distribution in the auditory canal in a progressive sound field. Journal of the Acoustical Society of America, 18, 401-408.

Wier, C.C., Jestaedt, W., & Green, D.M. (1977). Frequency discrimination as a function of frequency and sensation level. Journal of the Acoustical Society of America, 61, 178-184.