The Brain Bases of Phantom Auditory Phenomena: From Tinnitus to Hearing Voices
Cynthia Gayle Wible, Ph.D.*
Harvard Medical School
VA Boston Healthcare System
940 Belmont Street
Psychiatry 116A
Brockton, MA 02301
The phenomenology and neural bases of phantom auditory perceptions are reviewed. A variety
of phantom auditory phenomena are discussed, from tinnitus to hearing voices. It is claimed that
the phenomenology or qualia of the hallucinatory experience may correspond to how the
auditory system is organized into functional regions (neural architecture) and how auditory
percepts are represented at the single neuron level within this system. There may be a one-to-one correspondence between the type of experience (e.g., hearing a tone versus hearing a voice)
and the representational qualities or aspects of sound representation in different parts of the
auditory and speech processing stream or system. The literature does not support the supposition
that certain features of auditory hallucinations correspond to either neurological or psychiatric
disease. Clinical aspects of auditory hallucinations that may be of interest to practitioners
whose patients experience them are also discussed.
Keywords: auditory hallucinations, tinnitus, music hallucinations, voices, schizophrenia
The phenomenology of phantom auditory phenomena.
Phantom auditory phenomena or auditory hallucinations (AH) consist of auditory perception in
the absence of external stimulation. AH range from hearing short simple sounds to hearing
voices (speech). Voice hallucinations also range from simple short lived phenomena (hearing
one’s name called) to hearing full sentences or dialogues and the feeling of a presence of a being
or entity. This apparent continuum of complexity may be related to the characteristics of the
brain regions that are involved in the hallucination; this will be discussed in the next section.
AH can arise for a number of reasons, but are thought to arise from central brain dysfunction or
abnormal activity in the brain. Even tinnitus, a simple form of auditory hallucination, which is
characterized by hearing a buzzing, ringing or tone sound, is thought to result from abnormal
brain activity and is not usually alleviated by peripheral treatments1. So, although tinnitus is
often precipitated by hearing loss or peripheral damage, the experience of hearing tinnitus
sound(s) comes from abnormal activity in the brain. Tinnitus is common and has been
estimated to occur in up to 20% of the population 2. Like tinnitus, musical hallucinations are
most often associated with either hearing loss or neurological damage, but this is not the case
with voice hallucinations3. If voices are heard in neurologically-impaired patients, they are often
localized on one side and hence don’t have the same experiential quality as hearing a “real”
voice4. Auditory hallucinations of voices or auditory verbal hallucinations (AVH) are the most
frequent symptom of schizophrenia. Bleuler 5 observed that “Almost every schizophrenic who is
hospitalized hears voices, occasionally or continually.” Silbersweig & Stern6 estimated that up
to 74% of schizophrenic patients hear voices, which can be experienced as conversing with each
other or commenting on ongoing behavior and are often sentences or dialogs, not just single
words. Schizophrenic AVH are usually very different from the common auditory hallucination
of hearing one’s name that occurs in neurologically healthy individuals7. Schizophrenic AVH or
voices are experienced as coming from a person or a presence and seem “real” as if someone is
actually speaking. There is often an elaborate system of beliefs and attributions concerning the
voice and its origins. There may be more than one voice and the voices can have conversations
with each other. Some patients hear voices that comment on their ongoing behavior. Some have
the feeling that the voice constitutes another person inside of their body. The voices may also
have a frightening or punitive tone. A first person account of what it is like to hear voices appeared in
a special issue of the journal Cognitive Neuropsychiatry that was dedicated to the topic of
auditory hallucinations; the following is a quote from an article by Cockshutt8:
“For me, and I can only speak for myself, the voices are externalized …. and real. There
is no point in pretending otherwise. I could say that I understand that they are a false
manifestation of my internal thoughts. The truth is that for me that is the unreal aspect of
it all because by pretending to believe that the voices are unreal I am, in essence, creating
a false reality.”
In summary, auditory hallucinations form a continuum of complexity. On one end are simple
tones and noises, followed by music which is more complex and organized. Voice hallucinations
may be next in the continuum, and can also be categorized within the speech system as simple
(hearing brief single words or unintelligible speech-like sounds) or more complex (hearing a
voice or voices speaking full sentences or dialogs accompanied by a feeling that the voice has an
external source or is real or corporeal). The next section of this article will describe
representational aspects of the auditory speech system in the brain that may correspond to the
phenomenological aspects of auditory hallucinations.
The brain bases of phantom auditory phenomena: Tones and Speech.
In the previous section, a continuum of complexity in AH was described that can range from
hearing simple tones to hearing a voice with a feeling of a source or a feeling that it comes from
a real person. The latter type of hallucination is most often associated with schizophrenia
(hearing a voice narrative with a feeling of a source or presence). However, brain lesions can
produce schizophrenia-like voice hallucinations9. Therefore, the type of hallucination
experienced may depend on the brain areas involved, not the type of disease. The cortex is made
up of small functional regions of neurons that respond similarly; the size and juxtaposition of
these cortical maps is what I refer to as “architecture.” The adjacency or cortical location of a
map can convey an abundance of information about what type of inputs it receives, the output of
the computation, and the function of the region. For example, neurons within a region might all
respond to faces more than other visual attributes or they might respond to visual objects and
also exhibit size and shape constancy. These attributes constitute information about the neuronal
representation. The size and shape of these regions vary considerably, especially beyond
primary cortical regions where a columnar organization may be present. I will review evidence
that the continuum of complexity in AH matches the representational structure and cortical
architecture of progressively higher order auditory cortical regions. This type of correspondence
between hallucinations and cortical architecture and representation has been found in the visual
system. Unlike the language system, the visual system of non-human primates is comparable to that of humans and
has been mapped in great detail. Visual cortex is divided into different regions whose neurons
respond primarily to color, to objects, to faces and other visual categories. A one-to-one
correspondence was found between the type of visual hallucination experienced in human
subjects and activity in visual regions such that color hallucinations were associated with activity
in visual regions representing color, face hallucinations were associated with activity in those
cortical regions representing faces and so on10,11.
Within this framework, hearing simple tones or ringing sounds would correspond to a
dysfunction of more primary, tonotopically-organized regions of cortex. It is well known from
single unit or single neuron recording in animals that the primary auditory cortex is tonotopically
organized into maps where neurons within a small region respond optimally to tones within a
specific frequency and nearby regions respond to surrounding frequencies. Mirror symmetric
tonotopic maps resembling those previously found in macaque monkeys have now been found in
the human primary auditory cortex12. These investigators used very high resolution functional
magnetic resonance imaging (FMRI) and found mirror symmetric tonotopic maps in Heschl’s
gyrus or primary auditory cortex that shared a low frequency border. Primary auditory cortex
has been implicated as a generator of tinnitus in animals and human subjects. For example, the
modulation of (both excitatory and inhibitory) activity in primary auditory cortex was found to
be the basis of tinnitus in a recent report using an animal model whose results replicated previous
work13. If primary auditory cortex is responsible for the experience of tinnitus, then interventions
that change brain activity in this region should affect tinnitus in human subjects. Human
neuroimaging and magnetoencephalography (MEG) studies show that the primary auditory
cortex is reorganized in subjects with tinnitus. There is an expansion of the frequency
representation in the auditory cortex that corresponds to the perceived tinnitus frequencies; the
degree of the shift is related to the severity of the tinnitus 14,15. These findings have been used to
successfully treat tinnitus in a patient with auditory nerve damage who was deaf in the left ear.
FMRI was used to map out the auditory response in primary auditory cortex and to visualize the
abnormally-activated regions. This FMRI activity was used as a guide to place electrodes for
focal extradural electrical stimulation of the primary auditory cortex. This procedure was
reported to suppress tinnitus completely15. Recently, researchers have begun to investigate
hypotheses about the interactions between other regions and auditory cortex in inhibiting
tinnitus, but current evidence suggests that the perception is clearly generated within primary
auditory cortex16. Hence, abnormal activity or over-activation in the primary auditory cortex that
is tonotopically organized corresponds to the experience of hearing tones or simple sounds when
none are present in the environment. The next section will explore the possibility that more
complex auditory hallucinations such as speech are also a result of neural overactivation and that
the phenomenological aspects of the hallucinations can be linked to the neuronal representational
properties and architecture of higher order cortical regions that are further up in the stream of
auditory processing.
Neural bases of speech perception and production
This section describes progressively higher order auditory processing stages that form the neural basis of
speech perception. Although these regions are subsequent to primary auditory cortex, the
processing of information is thought to be highly recursive and interactive and does not proceed
in a strictly linear fashion. For speech perception, a spectral-temporal analysis of the auditory
signal is performed in primary auditory cortex, and an auditory phonological representation is
activated in the middle portion of the superior temporal sulcus (STS) (see Figure 1). A more
posterior region of the STS houses amodal (or multimodal) phonological representations and can
be activated by written or spoken words. To initiate speech production, these representations are
then translated into prearticulatory motor codes for the vocal tract in
the Sylvian parietal-temporal area (Spt), a region near the temporal-parietal boundary (this
summary is based on a model and synthesis from Hickok and Poeppel17,18). Spt interacts with
middle and inferior frontal regions to produce the articulatory codes for speech production. The
posterior STS and Spt are active during speech perception, production and during subvocal
rehearsal for working memory. Phonemic representations in the STS also make contact with
widespread semantic representations in the temporal lobe and other regions. Over-activation that
extends beyond primary auditory cortex would be predicted to cause the perception of phonemes
(perhaps in the form of incomprehensible words) or whole single word auditory representations.
Hug et al.4 described an epileptic patient who exhibited a progressive auditory hallucination such
that tonal tinnitus transformed into the perception of noise and then finally to incomprehensible
voices, as would be predicted from the structure of the auditory speech system.
Voices are usually experienced within a social context and within conversation with another
person. Under naturalistic conditions, auditory voice and visual face and body gestures are
experienced simultaneously. The audiovisual nature of the speech signal is reflected in an area
that is in the posterior region of the STS, or PSTS. A large portion of this region is dedicated to
audiovisual speech processing in human subjects. This area (especially the right PSTS) functions
in the recognition of voices, as opposed to recognizing verbal or semantic content19. The PSTS
is part of a system that extends upward into the inferior parietal region where intentions are
formed and sent to motor regions of the brain. The PSTS and inferior parietal regions are often
referred to as the temporal-parietal occipital junction (TPJ). This system, or collection of tightly
functionally coupled regions, has interesting representational and architectural properties that
may correspond to characteristics present in schizophrenia-like voice hallucinations.
The audiovisual signal that conveys speech also contains information about person identity
(agency), as well as information about intentions and emotional state (in the form of prosody and
emotional gestures). TPJ functionality reflects the fact that the voice is experienced
simultaneously with social representations of persons and emotion. A dominant role of this
region is in the representation of dynamic multimodal gestures (primarily sight, sound and
touch), including audio-visual speech20. Audiovisual speech representation is adjacent to or
partially overlapping in the TPJ with the neural territory responsible for prosody perception
(prosody is the melodic quality of the voice that can convey both emotion and meaning), the
perception of emotional expressions, the perception of eye gaze, and the perception of social
attention21-27.
The coding of agency (or intention or purpose) is automatically activated or perceived along with
gestures or movements and is an inherent part of the representation. Jellema et al.28 recorded
neuronal activity within monkey STS that combined information about reaching or grasping with
activity related to attention or gaze direction. This combination of information results in a
cellular representation of the intentionality of movements28. FMRI studies in human subjects
show that the PSTS was the only brain region that responded differentially to intentional
versus unintentional movement, leading to the conclusion that this region is a core substrate for
conveying the perception of agency29. The consequence of this coupling of gesture with a code
for agency is that if the speech representations in TPJ were erroneously activated, then there
would be an accompanying feeling of agency or a feeling of someone acting. Hence, over-activation of audio-visual speech gestures could cause the perception of a voice and the feeling
of a presence with intentions. This could be the basis of a feeling of a source or presence that
accompanies the auditory hallucination of a voice in schizophrenia-like voice hallucinations.
This could also be an underlying reason for the often elaborate system of beliefs about the
purpose and origin of the voice. There is direct evidence for these suppositions: Cortical
stimulation of the TPJ in a non-psychotic human subject produced a feeling of a shadowy
presence, and the subject imbued this presence with certain intentions30. In other words, the
voice hallucinations feel real because they activate the part of cortex that corresponds to the
perception of speaking to another person; at this cortical level of processing, the intention of the
person and the audio-visual speech signal are encoded and perceived automatically and
When audio-visual representations are activated in the PSTS, neurons provide rapid feedback
input to unimodal sensory regions, and especially to the auditory cortex in both animals and
humans 26,27,31. This feedback is automatic and occurs without conscious attention. Hence, the
over-activation or erroneous activation of audio-visual speech representations excites earlier
auditory regions and could perpetuate abnormal neural activity.
Another aspect of schizophrenia-like voice hallucinations is that the voices often speak in dialogs
or narratives. The TPJ (bilaterally) is preferentially involved in narrative comprehension,
as compared to word or even sentence comprehension28. Therefore, this region may be used to
both perceive and construct narratives. This observation might account for the fact that
schizophrenic subjects are deficient in generating or building linguistic context and in
understanding narrative32.
The TPJ is also selectively involved in the theory of mind or the ability to attribute and represent
others’ mental states (also an important part of social communication and understanding speech
and actions) as well as in self representation21,22,25. Figure 2 shows the overlapping functionality
in this region and depicts the approximate cortical regions for many of the functions discussed in
this section 33-38. In fact, the TPJ may be the core region in the brain that underlies the
perception of social interaction24,33. Simply put, over-activation of this region may cause the
perception of social interaction such as being in a conversation with another person or hearing a
conversation. As discussed above, the overlap between voice and dynamic person and emotion
representation may be the basis for the experience of a voice and of a presence that constitutes
schizophrenia-like auditory hallucinations.
Figure 2. Summary figure of the overlap of functional
regions in the TPJ (inferior parietal and PSTS) involved in
eye gaze (red); audiovisual speech (light blue); self
representation (yellow); theory of mind/agency (green);
emotional perception of faces and prosody (dark blue). Re-representation of data respectively from references34-38.
There are several lines of evidence that the TPJ is involved in schizophrenia-like auditory
hallucinations26,27. First, a lesion in the TPJ, or epilepsy, can cause schizophrenia-like psychoses,
including AVH9,27. One of the most accurate but difficult ways to study auditory voice
hallucinations is the symptom capture method. In this method, the brain is imaged during the
hallucination and also during periods when the hallucination is absent. One of the best symptom
capture studies of schizophrenic voice hallucinations was performed using a patient whose
hallucinations had a periodicity (the hallucination would last for approximately 26 seconds with
a period of silence for 26 seconds) that matched well with the requirements of FMRI 39.
This study showed that activity in the PSTS and middle temporal region was evident
immediately before the hallucination. Also, the neural activity persisted throughout the
hallucination and spread to inferior parietal and then to frontal cortex. This case study is
consistent with the supposition that schizophrenia-like auditory hallucinations of voices arise
from activity in the TPJ40. Additional supporting evidence for this theory comes from reports
that transcranial magnetic stimulation (TMS) applied to the TPJ can alleviate
schizophrenic auditory hallucinations (as well as other symptoms)41. TMS uses a coil to apply
electromagnetic energy to the patient’s scalp/skull and has the ability to suppress or alter patterns
of neural activity in brain regions beneath the coil.
Music Hallucinations
Where in the auditory processing stream does the perception of music occur? There is relatively
little research on the perception of musical hallucinations and on music perception in general,
apart from other types of auditory perception. Apparently, the neural territory for music
perception overlaps with speech perception in the superior temporal cortex 42,43. However, music
was found to activate more dorsomedial regions, including insular and inferior parietal regions42.
A review of musical hallucinations concluded that they can result from underlying pathology in
the ear or the brain44. The findings that music activates more dorsomedial regions are consistent
with a report of a patient who developed musical hallucinations after resection of a right insular
glioma45. However, additional studies are needed to determine the underlying neural basis of
music hallucinations because the findings are somewhat divergent in this area of research46,47.
Clinical aspects and implications of auditory hallucinations.
Auditory hallucinations -- especially music and voice hallucinations -- are thought to be underreported. Brain regions involved in producing auditory hallucinations are also involved in
normal hearing, auditory attention, and working memory tasks. Therefore, all of these functions
may be impaired in patients with auditory hallucinations. These factors should be considered
when evaluating patients who may be experiencing hallucinations. Oliver Sacks describes
several case studies of persons with musical hallucinations in his book Musicophilia 48,
which provides insight into these unusual experiences, including the emotional reactions and
coping strategies of patients with music hallucinations. This book is recommended reading for
patients with music hallucinations, their family members and health care providers.
Auditory hallucinations can be associated with various etiologies, including brain damage,
epilepsy, psychiatric disorders or hearing loss. One report of patients referred for hearing
impairment showed a high prevalence of auditory hallucinations (33%) that consisted of
“humming or buzzing (35.9%), shushing (12.8%), beating or tapping (10.6%), ringing (7.7%),
other individual sounds (15.4%), multiple sounds (12.6%), voices (2.5%) or music (2.5%)”49.
Patients who experience music or voice hallucinations would probably benefit from a
neurologic or psychiatric consultation. This is especially true if the hallucinations are present
for extended periods (months or longer), and are distressful or interfering with daily function.
One study indicated that patients who hear auditory hallucinations, and especially voices, felt
that they would benefit if they could describe the experience in more detail with health
practitioners50. Clinicians asking questions about the content and sensory aspects of the
hallucination may comfort patients and may also provide valuable information about treatment
strategies and referrals.
Voice hallucinations in particular can be a prominent and sometimes defining feature of
schizophrenia. Schizophrenia is often accompanied by other symptoms such as hallucinations of
people and delusions (e.g., people are spying on me, watching me, etc.) as well as affective
inexpressiveness and attention deficits. However, voice hallucinations that comment on the
patient’s actions or that consist of hearing two or more voices conversing may alone indicate
schizophrenia if they are experienced for more than two months (see the Diagnostic and
Statistical Manual of Mental Disorders for diagnostic criteria and more specific information).
Because voices can sometimes command patients to harm themselves or others, clinicians should
be vigilant for this type of hallucination. Strategies exist that can be used to assess the negative
characteristics of unpleasant voices and to assess the potential risk resulting from hearing voices.
Behavior management protocols (such as listening to music and other coping skills) can be
taught to patients with persistent auditory hallucinations51.
This article reviewed evidence for understanding auditory hallucinations within a framework that
is based on the functional architecture and single neuron representational properties of the
auditory processing stream. Hallucinations of short-lived sounds may correspond to the aberrant
activation of more primary auditory regions. Hallucinations of voices and a feeling of a presence
or that someone is speaking may correspond to the activation of audiovisual speech
representations in the TPJ. In the normal brain, this region is activated when someone is actually
speaking. These gesture representations automatically convey agency and intention, and
synchronize speech regions. The TPJ is also used to perceive and construct narratives, a feature
that may account for the elaborate conversational nature of schizophrenia-like auditory voice
hallucinations. Auditory hallucinations are frequently reported by patients in audiology clinics.
The brain regions that were described to be involved in auditory hallucinations are also used for
speech perception, speech production and working memory. Some of these neural regions are
also used for affect perception and production (in the form of gestures and prosody) and social
attention. Therefore, clinicians should be sensitive to the fact that hallucinations may interfere
with these functions. Patients who experience hallucinations report a need to discuss the
phenomenological aspects of the hallucinations and their meaning with caretakers. An
assumption underlying the framework described in this article is that the experiences seem very
real and are not under the control of the patient. The type of auditory hallucination (simple or
schizophrenia-like) does not show a one-to-one correspondence to the cause (e.g. psychosis,
epilepsy, hearing loss). This observation should be kept in mind for patients who are referred to
other specialties (neurology, psychiatry) because the intensity, type of hallucination or distress
caused by the hallucination need to be considered in order to identify and implement an effective
treatment program.
This work was supported by an NIMH grant: 1 R01 MH067080-01A2 and by the Harvard
Neuro-Discovery Center (formerly HCNR). Funded also by the Biomedical Informatics Research
Network (U24RR021992); National Institute of Mental Health.
Conflict of Interest Statement: The author declares that the research was conducted in the
absence of any commercial or financial relationships that could be construed as a potential
conflict of interest.
Figure Legends.
Figure 1. Several posterior language regions that are involved in the perception and production
of speech (ref). The primary auditory cortex is shown in green. A middle portion of the STS (yellow)
houses auditory phonological representations. A region in the posterior STS (dark blue) is
involved in amodal phonological representation. The Sylvian parietal-temporal area (Spt), shown in light blue
only in the left hemisphere, provides the interface between the
phonological networks in the bilateral superior temporal gyrus and STS and the articulatory
networks in the anterior or prefrontal language system.
Figure 2. Summary figure of the overlap of functional regions in the TPJ (inferior parietal and
PSTS) involved in eye gaze (red); audiovisual speech (light blue); self representation (yellow);
theory of mind/agency (green); emotional perception of faces and prosody (dark blue). Re-representation of data respectively from references (Nummenmaa et al., 2010; Wright et al.,
2003; Blanke and Arzy, 2005; Young, Dodell-Feder, & Saxe, 2010; Adolphs et al., 2002).
