The Maximum Intelligible Range of the Human Voice

Sponsoring Committee:
Professor Agnieszka Roginska, Chairperson
Professor Tae Hong Park
Professor Brian Gill
THE MAXIMUM INTELLIGIBLE RANGE OF THE HUMAN VOICE
Braxton Boren
Program in Music Technology
Department of Music and Performing Arts Professions
Submitted in partial fulfillment
of the requirements for the degree of
Doctor of Philosophy in the
Steinhardt School of Culture, Education, and Human Development
New York University
2014
Copyright © 2014 Braxton Boren
ACKNOWLEDGEMENTS
The saying “it takes a village to raise a child” still applies to those of us who
find ourselves slowly emerging from childhood around the age of thirty with the end
(we hope!) of our schooling in sight. For me at least it has taken a large number of
villagers, and I am pleased to be able to thank them all without fear of an orchestra
playing me off the stage.
First I must thank my father, who early in life instilled a love of history in
my brothers and me that he was able to further enrich later on as our teacher. Dad
made me read Franklin’s Autobiography in high school, and later in life reminded
me of the Whitefield experiment and its similarity to other archaeoacoustic work I
had done. Simply put, none of this project would have come about without him.
Secondly, I am extremely grateful to my supervisor, Agnieszka Roginska,
who from the start has been supportive and encouraging even when I embarked on
a direction of study without direct precedent in our (or perhaps any other) department. Her probing questions and boundless imagination allowed my work to take
on breadth without sacrificing depth, and though at times I felt overwhelmed by
the new avenues of study opened up by our conversations, the research produced
was always stronger for it. Thanks also to the other members of my committee,
Brian Gill and Tae Hong Park, who each in their own way exemplify the interdisciplinary work that allows Steinhardt to be such a great nexus of expertise for the
entire university.
I would like to thank the many historians whose comments, suggestions, and
insights have supplemented my lack of training in that area to hopefully produce
something coherent enough to make a small contribution to that field. This includes
George Boudreau, who graciously gave me a comprehensive tour of the Philadelphia Market Street area early on and supplied many important details in the early
modeling stage. Thanks also to David Bebbington, Mike Breidenbach, Joyce Chaplin, Lee Gatiss, Deborah Howard, Digby James, David Ceri Jones, Karen Kupperman, Jerome Mahaffey, Mark Noll, Elizabeth Pardoe, Richard Rath, Harry Stout,
and Peter Williams for all the comments, hints, and suggestions that guided me
along the path toward the very finicky details I needed for this project.
Thanks go to the scientists whose input helped solidify my own understanding of the processes involved and helped affirm that “applied research” is not a dirty
word. This especially includes David Lubman, for his devotion to rigorous analysis in archaeoacoustics to generate real quantitative data. Thanks also
to David Bradley, Ken Cunefare, Bengt-Inge Dalenback, Malcolm Longair, and
Charles Ross for their helpful suggestions and encouragement along the way.
Thanks to all my friends in the Music and Audio Research Lab over the years:
to Areti Andreopoulou for showing me the ropes and in general showing patience
and grace to me when I had no right to expect it; to Andrew Madden for his infectious happiness; to Justin Matthew for his microphone calibration software; to
Marc Wilhite for his help meticulously setting up microphone arrays; to Rachel Bittner for making me feel old and thus giving me extra motivation to graduate; to Finn
Upham for deep conversations; to Aron Glennon for shallow conversations (equally
necessary); to Taemin Cho for years of LaTeX expertise; to Jon Forsyth for sharing
my esoteric musical tastes; to Eric Humphrey and Uri Nieto for coffee, beer, and
solidarity; and to Michael Musick for shouldering several of my responsibilities as
I retreated from lab duties to actually finish the dissertation.
My appreciation goes to Steinhardt for providing a grant to go to London
to survey and measure the sites of Whitefield’s crowds, and to Blair and Melissa
Heuer for giving me a place to stay there. Thanks also to the many vocalists who
participated in the recording sessions involved in this research.
Thanks to my brothers for both majoring in history and making me feel insecure for majoring in music. Thanks to my mother for giving me a job in junior
high combing through old newspapers (little did I know how useful that skill would
prove later on).
And thanks last and most of all to my wife Laura, who followed me to New
York when neither of us was really sure we wanted to go there. She has celebrated
with me in the good times, and wept with me in the bad. If there is any quality in
this work, she deserves at least half the credit for the countless ways she has loved
and supported me throughout the last four years.
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
ACRONYMS
CHAPTER
I INTRODUCTION
  Scope of this Study
  Construction of Audience
  Assessment of Intelligibility
  Motivation
  Dissertation Outline
  Contributions
  Associated Publications by the Author
  Peer-Reviewed Articles
  Conference Papers
II BACKGROUND
  History of Whitefield’s Crowds
  Archaeoacoustics
  Acoustical Simulation
  Acoustics of the Spoken Voice
  Directivity
  Maximum Level
III ACOUSTIC DIRECTIVITY OF VOCAL PRODUCTION MODES
  Measurement Procedure
  Results
  Normalized Radiation Patterns
  Absolute Radiation Patterns
  Discussion
IV ANALYSIS OF FRANKLIN’S EXPERIMENT
  Franklin’s Experiment
  Diffraction Effects
  Noise Sources in Eighteenth-Century Philadelphia
  Discussion
V ACOUSTIC SIMULATION OF FRANKLIN’S EXPERIMENT
  Makeup of the Colonial City
  Modeling Procedure
  Geometry
  Sound Attenuation Simulation
  Speech Intelligibility
  Background Noise
  Atmospheric Conditions
  Results
  Discussion
VI MAXIMUM AVERAGED AND PEAK VOCAL SPLS
  Method
  Pilot Study
  Spoken and Sung Voice
  Analysis
  Average Levels for Speech
  Gender Differences
  Spoken Levels For Singers
  Peak Spread
  Standard Deviation by Level
  Back vs. Mask Levels
  Discussion
VII MODELING THE SITES OF WHITEFIELD’S LONDON CROWDS
  Locations
  Moorfields
  Kennington Common
  Mayfair
VIII SIMULATIONS OF WHITEFIELD’S SERMONS IN LONDON
  Simulation Results: Base Conditions
  Moorfields
  Kennington Common
  Mayfair
  Other Factors
  Environmental Factors
  Geometric factors
  Crowd Density
  Final Crowd Estimates
IX CONCLUSION
  Findings
  Implications
  Future Work
  Summing Up
BIBLIOGRAPHY
APPENDIX
A FULL VOCAL SPL MEASUREMENTS
B HISTORY OF SOUND IN COLONIAL PHILADELPHIA
LIST OF TABLES
1 Review of maximum SPLs measured in previous studies
2 Absorption coefficients by octave band center frequency (Hz) for each material used in Market Street model
3 Octave band averaged sound pressure (dB) at 1 m for both background noise sources
4 Simulated LAeq values (dB) for Whitefield’s voice based on background noise distance and minimum STI value
5 Leq values for pilot study, in dBA, for Conversational, Theatrical, and Maximal Levels
6 Lpk values for pilot study, in dBA
7 Spk values for pilot study, in dBA
8 Absorption coefficients for buildings and crowds at the Moorfields
9 Absorption coefficients for brick walls near Mayfair
10 Moorfields simulated MIA (m²) for each vocal SPL and background noise level
11 Kennington simulated MIA (m²) for each vocal SPL and background noise level
12 Mayfair simulated MIA (m²) for each vocal SPL and background noise level
13 Simulated changes in MIA resulting from changes in temperature in Moorfields
14 Simulated changes in MIA for Moorfields resulting from changes relative to 50% humidity
15 Simulated changes in MIA resulting from changes in source elevation angle in Moorfields
16 Maximum simulated MIA and crowd size for each site at 90 dBA vocal level
17 Maximum reported crowd size for each site
18 Leq values for speech, in dBA
19 Leq values for back sung voice, in dBA
20 Leq values for mask sung voice, in dBA
21 Lpk values for speech, in dBA
22 Lpk values for back sung voice, in dBA
23 Lpk values for mask sung voice, in dBA
LIST OF FIGURES
1 Schlieren photography showing wave propagation in a concert hall from Rindel (2002)
2 Optical ray method showing attenuation of individual ray paths from Rindel (2002)
3 Diagram of microphone array used for measurements
4 Aligning a vocalist with the measurement array
5 Normalized overall levels for the vocalists, intoning vowels on C4
6 Normalized overall levels for the vocalists, speech (a and b) and song (c and d)
7 Normalized third-octave bands for actress’s monologue
8 Normalized third-octave bands for actress’s vowels
9 Normalized 10000 Hz bands for musical theater singer’s song and vowels
10 Normalized third-octave bands for opera singer’s song
11 Absolute overall levels for the vocalists, intoning vowels on C4
12 Absolute overall levels for the vocalists, speech (a and b) and song (c and d)
13 Absolute third-octave bands for opera singer’s song
14 Absolute third-octave bands for actor’s vowels
15 Inset of Clarkson-Biddle Map of Philadelphia showing Market Street
16 Diagram of Franklin’s position (BF) in relation to sources on Front Street
17 Diffraction at Franklin’s Position using Kurze-Anderson Formula
18 Diffraction at Franklin’s Position using Maekawa’s Solution
19 William Breton, Old Court House & Second Friend’s Meeting, 1830, Library Company of Philadelphia
20 Inset of George Heap’s East Prospect of the City of Philadelphia, 1752, New York Public Library
21 AutoCAD model of Market Street area, extruded from Clarkson-Biddle map
22 Predicted logarithmic attenuation from Whitefield to Franklin
23 Summed pressure-squared echogram from Whitefield to Franklin
24 Mean peak spread, dBA
25 Standard Deviation for the 9 Singers, dBA
26 Mean dB Increase from Back to Mask Voice
27 Inset of John Rocque’s 1746 Map of London showing the Moorfields
28 Sketchup Model of the Upper and Middle Moorfields
29 Map of Kennington Manor, including the Common, based on Hodskinson and Middleton’s survey, 1785
30 Modeling Kennington Common in Sketchup
31 Inset of John Rocque’s 1746 Map of London showing Mayfair
32 Unsigned wood print of Chesterfield House, 1760
33 Sketchup Model of Mayfair
34 Simulated STI at Moorfields for different background noise conditions
35 Simulated STI at Kennington for different background noise conditions
36 Simulated STI at Mayfair for different background noise conditions
37 Atmospheric absorption at different humidity levels, from Harris (1966)
38 Male vocal directivity pattern, in octave bands, used for Whitefield’s voice
39 Change in MIA based on source height and angle at Mayfair
ACRONYMS
BEM: Boundary Element Method
CT: Cone Tracing
FDE: Finite Difference Equation
FEM: Finite Element Method
GIS: Geographical Information System
ISM: Image Source Model
MIA: Minimally Intelligible Area
PE: Parabolic Equation
RMS: root mean square
RT: Ray Tracing
SPL: Sound Pressure Level
STI: Speech Transmission Index
WE: Wave Equation
“We shed as we pick up, like travellers who must carry everything in
their arms, and what we let fall will be picked up by those behind. The
procession is very long and life is very short. We die on the march. But
there is nothing outside the march so nothing can be lost to it.”
-Tom Stoppard, Arcadia
CHAPTER I
INTRODUCTION
The subject of this dissertation is the intelligible range of George Whitefield’s
open-air oratory in eighteenth-century America and Britain. Benjamin Franklin
doubted the accounts he heard of the Anglican preacher Whitefield addressing
30,000 or more congregants at open-air venues in London. When Whitefield came
to Philadelphia in 1739, Franklin performed one of the earliest recorded ‘archaeoacoustic’ experiments:
[Whitefield] had a loud and clear Voice, and articulated his Words and
Sentences so perfectly that he might be heard and understood at a great
Distance, especially as his Auditories, however numerous, observ’d the
most exact Silence. He preach’d one Evening from the Top of the Court
House Steps, which are in the middle of Market Street, and on the West
Side of Second Street which crosses it at right angles. Both Streets
were fill’d with his Hearers to a considerable Distance. Being among
the hindmost in Market Street, I had the Curiosity to learn how far
he could be heard, by retiring backwards down the Street towards the
River; and I found his Voice distinct till I came near Front Street, when
some Noise in that Street, obscur’d it. Imagining then a Semicircle, of
which my Distance should be the Radius, and that it were fill’d with
Auditors, to each of whom I allow’d two square feet, I computed that
he might well be heard by more than Thirty Thousand. This reconcil’d
me to the Newspaper Accounts of his having preach’d to 25,000 People
in the Fields, and to the ancient Histories of Generals haranguing whole
Armies, of which I had sometimes doubted. (Franklin, 1793).
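Franklin’s arithmetic can be restated as a short calculation. The sketch below is illustrative only: the function names are my own, and the radius it prints is back-computed from the figures he states (a semicircle, two square feet per listener, thirty thousand listeners) rather than taken from any distance he reports in this passage.

```python
import math

SQ_FT_PER_LISTENER = 2.0  # Franklin's stated allowance per auditor


def semicircle_audience(radius_ft, area_per_listener=SQ_FT_PER_LISTENER):
    """Number of listeners that fit in a semicircle of the given radius."""
    area = math.pi * radius_ft ** 2 / 2.0  # area of a semicircle, in square feet
    return area / area_per_listener


def implied_radius(listeners, area_per_listener=SQ_FT_PER_LISTENER):
    """Radius (ft) of the semicircle needed to hold the given number of listeners."""
    return math.sqrt(2.0 * listeners * area_per_listener / math.pi)


if __name__ == "__main__":
    r = implied_radius(30_000)
    print(f"Implied radius: {r:.1f} ft")
    print(f"Audience at that radius: {semicircle_audience(r):.0f}")
```

Under these assumptions a crowd of 30,000 corresponds to a semicircle of roughly 195 feet in radius, which shows how strongly the estimate depends on both the assumed radius and the assumed space per listener.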
Though novel by eighteenth-century standards, Franklin’s experiment ignores
some important acoustic phenomena. Advances in physics and computational technology have transformed archaeoacoustic research, allowing much more detailed
descriptions of how acoustic spaces would have sounded in the past.
Using modern simulation techniques and Franklin’s data, it is possible to
model eighteenth-century Philadelphia to calculate how loud Whitefield’s voice
would have been during Franklin’s experiment. This information may then be used
to insert a virtual Whitefield into a model of his largest crowds in London, simulating how many people could have heard his unamplified voice at once and allowing
a measure for the accuracy of Franklin’s original calculation. Since Whitefield’s
crowds are among the largest recorded in history, this research addresses not only
Franklin’s specific question but also the general question of the maximum free field
range of the unamplified human voice.
The study of history, as with most of human culture, prizes visual cues over
auditory ones. This is partly due to the neurological composition of these two sensory systems and partly because, musical scores being a notable exception, very
little auditory information is encoded into the archaeological or historical record.
Because of this, our common conception of history is reduced to something like a
picture-book of frozen images in time. However, the auditory system has a much
greater resolution in the time domain than the visual system, and because of this
auditory cues are a primary way in which we experience the flow of time. Hearing
the past is a valuable way to understand the lives of people in the past by experiencing time as a transient, flowing medium rather than a series of famous paintings in
a history book. Since hearing events from the past is quite difficult, this constitutes
a major problem for our ability to holistically understand history.
Franklin’s experiment represents a desire to investigate an important historical event – Whitefield’s sermons in London – by recording a smaller piece of measurable data, and extrapolating mathematically the larger question of his maximum
crowd size. Unfortunately, modern historians addressing the same question fall into
the opposing traps of either treating the crowd size as unknowable or taking Franklin’s estimate at face value. Both of these approaches blatantly ignore
the advances in knowledge that are possible because of the progress of science and
technology over the past 250 years.
The goal of this study is to use Franklin’s measured data, combined with
modern understandings of sound propagation and psychoacoustics, to estimate the
Sound Pressure Level (SPL) of Whitefield’s voice and how many people could have
intelligibly heard him at once. Through a combination of historical and archaeological research, acoustical modeling, and laboratory measurements of the human
voice, I intend to complete Franklin’s experiment as I believe he would were he
alive today.
Scope of this Study
Construction of Audience
This study is interested primarily in answering the specific research question Franklin
was addressing: how many people could hear Whitefield’s unamplified voice at
once? In particular, this study addresses the question in the same way his experiment did. This means that the virtual audience will essentially be populated
entirely with virtual Benjamin Franklins, with equal hearing to Franklin himself.
Franklin had no reported record of hearing loss and was relatively young
(33) at the time of his experiment, and so this method allows a fair generalization
of the hearing capacity of Whitefield’s audiences. In addition, different minimum
values of the Speech Transmission Index (STI) may be implemented in the model
to account for different levels of hearing acuity for Franklin himself. These in turn propagate through
the entire experiment, since better hearing yields a lower minimum STI, which
in turn leads to a lower simulated loudness for Whitefield’s voice. This allows a
wider range of possibilities to be considered while maintaining Franklin’s original
experiment design.
Of course, it is always possible to construct a theoretical audience with better
or worse hearing than Franklin or taller or shorter or in different geometric configurations, but the focus of this study is primarily to answer Franklin’s question as he
would have, given 250 years of advances in scientific knowledge and technology.
Franklin may thus be considered a more or less average citizen listener based on
what we know about him.
In addition, it is tempting to use the simulated impulse response of Whitefield’s acoustic system to generate an auralization to allow us to hear what his oratory would have sounded like. This is an effective technique for ensemble music,
which averages out the individual characteristics of single musicians and allows a
good approximation of how a specific piece would have sounded in the past (Boren,
Longair, & Orlowski, 2013). But in Whitefield’s case we have little specific information about his accent, which would make an anechoic recording of a trained actor
mere speculation for the subjective character of his voice itself. While Franklin’s
experiment provides good data for an analysis of maximum intelligible range, it
does not provide enough information for an auralization.
Assessment of Intelligibility
This study, like Franklin’s, will focus on a generalized intelligibility rating rather
than describing specific factors for different words or phrases, despite the existence of specific research into that question. For instance, there has been extensive
work done on the subjective intelligibility of different speech and frequency bands
(Houtgast, Steeneken, & Plomp, 1980). In the field of vocal directivity, some attention has been paid to the analysis of the radiation patterns of individual phonemes
(Katz & D’Alessandro, 2007). And we do have some anecdotal data about Whitefield’s power over specific words: the famous English Shakespearean actor David
Garrick reported that Whitefield “could make his audiences weep or tremble merely
by varying his pronunciation of the word Mesopotamia.” Garrick also said he would
give one hundred guineas if he could pronounce ‘O!’ like Whitefield (Wakeley,
1871).
Yet in the end, these individual data points do not afford us a rigorous enough
body of evidence to look into such precise questions in a simulation of Whitefield’s
voice. Even Franklin, who had a living, breathing Whitefield present during his
experiment, did not try to find out anything so specific: instead, he merely sought
an averaged ‘intelligibility’ rating rather than focusing on the intelligibility of a
specific word or phrase. The field of archaeoacoustics seldom even offers historical examples with quantitative averaged data such as Franklin’s, let alone any such
information about specific phonemes, nor is it clear how the summation of such specific research questions would cohere to answer the broader historical question of
Whitefield’s maximum crowd size. Thus this study will focus on the use of the STI,
which uses known data about the understandability of specific frequency bands to
produce an averaged metric that correlates well with subjective human speech intelligibility. In a similar manner, radiation pattern data considered within the context
of this study will simply use root mean square (RMS) acoustic energy per frequency
band, which is standard for directivity measurements (Chu & Warnock, 2002; Katz
& D’Alessandro, 2007). This will allow straightforward calculations of the STI
within the simulated acoustic system, allowing the best estimate of Whitefield’s
‘average’ intelligibility. While not every speaker at the same level will yield equivalent perceptual intelligibility, the power of Whitefield’s articulation, rather than
being considered separately, is included in the virtual level simulation here for the
purposes of the STI. Thus a given vocal level estimate of X dB could also refer to
a level of (X − 1) dB with clearer articulation.
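The averaging idea behind the STI can be sketched in a few lines. This is a simplified illustration and not the procedure used in this study: it takes hypothetical modulation transfer values per octave band, converts each to an apparent signal-to-noise ratio clipped to ±15 dB, and averages the resulting transmission indices. The equal band weights are an assumption made for brevity; the standardized method uses specific band weights and masking corrections that are omitted here.

```python
import math


def transmission_index(m):
    """Map a modulation transfer value m (0 < m < 1) to a transmission index."""
    snr_app = 10.0 * math.log10(m / (1.0 - m))  # apparent signal-to-noise ratio, dB
    snr_app = max(-15.0, min(15.0, snr_app))    # clip to the +/-15 dB range
    return (snr_app + 15.0) / 30.0              # rescale to the range 0..1


def simplified_sti(band_m, weights=None):
    """Weighted mean of per-band transmission indices.

    band_m: one list of modulation transfer values per octave band.
    weights: hypothetical band weights; equal weighting is assumed if omitted.
    """
    band_ti = [sum(transmission_index(m) for m in ms) / len(ms) for ms in band_m]
    if weights is None:
        weights = [1.0 / len(band_ti)] * len(band_ti)
    return sum(w * ti for w, ti in zip(weights, band_ti))
```

In this simplified form, a modulation transfer value of 0.5 in every band corresponds to an apparent signal-to-noise ratio of 0 dB and an index of 0.5, while values near 1 saturate at an index of 1.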
Motivation
This type of research is fairly novel, as it involves an insular problem with an insular solution – but the problem and solution exist in entirely separate academic
departments. The methodology of estimating a historical source’s SPL based on
the STI at a receiver point is entirely new, owing to the fact that there are few if
any historical accounts as detailed as Franklin’s in discussing the threshold of intelligibility at a specific location. This methodology will contribute toward a broader
body of research aimed at reconstructing the sounds of the past based on historical
evidence and principles of physics.
In addition to novelty, there is scientific significance to a quantitative study
of the maximum range of the human voice. While this topic is less important for
modern acoustic engineering because of amplification technology, some maximum
limits of the human voice have been explored within the vocal acoustics community
(Kent, Kent, & Rosenbek, 1987; Coleman, Mabis, & Hinson, 1977). Whitefield’s
example, while not reproducible because of its place in history, still offers a unique
framework for this research question, as it would not be feasible to undertake a
present experiment with 30,000 listeners. In addition, the in-depth modeling process will expand the understanding of the techniques and data needed to efficiently
and accurately simulate acoustics in large outdoor venues. This research combines
the study of maximum vocal level, sound radiation patterns, computational acoustic
propagation algorithms, and the perceptual quantification of intelligibility to provide a more concrete answer to how many people a single voice can reach on its
own strength.
This research is also significant from a historical point of view. Whitefield,
though not as famous today, was one of the first transcontinental celebrities and was
probably known to more Americans in the first half of the eighteenth century than
any public figure except George II. The revivals in Britain spurred by Whitefield
and John Wesley led to significant social change, including prison reform and the
abolition of the slave trade (Dallimore, 1970). In the colonies, Whitefield’s role
in the First Great Awakening helped establish a more coherent and independent
American identity and may have contributed to the American Revolutionary cause
(Mahaffey, 2007). The question of how large Whitefield’s crowds were is of historical significance, as is the relation between the actual crowd sizes and the estimates,
often circulated by Whitefield’s supporters (Lambert, 1994).
Beyond Whitefield’s own significance, this project also
involves checking the experimental work of Benjamin Franklin, the first scientist
of any distinction in the colonies. In addition to his work on electricity,
Franklin also wrote and theorized about the physics of sound throughout his lifetime
(Franklin, n.d., 1749), including the question he addressed here: how many people
could hear a single orator’s voice? While novel given the technology and knowledge
of his day, Franklin’s experiment failed to account for the role of sound absorption,
and it also assumed a uniform intelligible radius without referencing any measure
of the sound power used to generate the speaker’s voice. Using modern acoustic
modeling technology to account for these factors will allow a better estimate of how
large Whitefield’s crowds could have been and give some measure of the accuracy
of Franklin’s original experimental result.
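To see why an intelligible radius cannot be separated from the level of the source, consider the simplest free-field case. The sketch below is an assumption-laden illustration rather than the simulation method used in this study: it treats the speaker as a point source with spherical spreading, and the atmospheric absorption coefficient is a hypothetical placeholder. It ignores the ground, reflections, diffraction, and vocal directivity.

```python
import math


def level_at_distance(level_1m_db, distance_m, absorption_db_per_m=0.0):
    """Free-field level (dB) at distance_m from a point source.

    level_1m_db: source level referenced to 1 m, in dB.
    absorption_db_per_m: assumed atmospheric absorption (hypothetical value).
    """
    if distance_m <= 0:
        raise ValueError("distance_m must be positive")
    spreading_loss = 20.0 * math.log10(distance_m)       # inverse-square spreading
    air_loss = absorption_db_per_m * (distance_m - 1.0)  # loss beyond the 1 m reference
    return level_1m_db - spreading_loss - air_loss
```

Under these assumptions the level drops by 6 dB for each doubling of distance, so raising the source level by 6 dB doubles the radius at which any given level is reached, before absorption and background noise are taken into account.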
Finally, in addition to analyzing Franklin’s experiment because of his personal significance, there is also anthropological value to the question Franklin was
raising. He was personally skeptical of narrative accounts of generals haranguing large armies, and sought to address this using scientific first principles with
Whitefield as his test case. Since Whitefield’s crowds are the largest reported for
an unamplified voice in recent history in a city where 80,000 people actually lived,
Whitefield’s case and Franklin’s data provide a touchstone for a much broader investigation of the maximum crowd that could have gathered to hear a unified message in the pre-amplified era. This will provide a helpful empirical framework for
investigating religious revivals like Whitefield’s, military communication channels
such as Franklin was investigating, and also more broadly the size of gatherings in
pre-literary cultures, where a strong oral tradition helped cement a single people
group together.
While Whitefield’s case is singular because of Franklin’s recorded data, a rigorous analysis of the extreme case allows the construction of a more general framework of how to treat historical instances of unamplified speakers addressing large
audiences. Based on geometry and material composition of the sites in question,
climatological data, and background noise estimates, various historical speakers’
maximum crowd size could be indexed to their maximum vocal level. Research
into the maximum level of the spoken voice could also be used to separate trained
orators from speakers with more normal vocal ranges.
Dissertation Outline
Chapter II gives a description of the state-of-the-art in the fields necessary to complete this research. In particular, the fields of archaeoacoustics, computer
acoustic simulation, and acoustics of the human voice are examined in detail.
Chapter III examines the directivity of the human voice at high levels for trained
actors and opera singers. An experimental set of measurements is taken to
examine the effect of different vocal modes of production on vocal directivity
in the horizontal plane.
Chapter IV describes the specifics of Franklin’s experiment in Philadelphia and
discusses the possible sound sources that may have existed near Front Street.
Using a geometrical analysis of diffraction effects at Franklin’s position, conclusions are drawn about Franklin’s position during the experiment.
Chapter V details the geometry and material composition of the Market Street area
where Franklin conducted his experiment. These data are used to construct a
computer model which then provides an estimate of Whitefield’s on-axis SPL
based on the intelligibility at Franklin’s position.
Chapter VI explores the maximum average and peak level of the human voice at
a fixed distance. A larger set of trained actors and opera singers is measured for several different classifiers of on-axis SPL. These findings are then
compared to the estimates for Whitefield’s voice and the existing literature.
Chapter VII describes the occasion and locations of Whitefield’s largest crowds in
London. The geography and history of the Moorfields, Kennington Common,
and Mayfair are discussed, along with the specific crowd estimates for each
site. Based on the geometrical and material data available, computer models
of the sites of the three largest crowds are constructed.
Chapter VIII uses the data for Whitefield’s estimated SPL and the acoustic computer models to investigate the intelligible area reached by his voice with
respect to several other factors. Based on these simulations and an estimate
of the average crowd density, final estimates are provided for the maximum
acoustic limit of Whitefield’s crowds.
Chapter IX concludes this thesis. This chapter presents a summary of the findings
of this study and the diverse research methods used to reach them.
Contributions
The primary contributions of this dissertation are listed below:
• An empirical investigation of the relationship between vocal production modes
and acoustic directivity.
• Many specific historical details about the material and geometric composition
of Philadelphia and London in 1739.
• New information about the soundscape of colonial Philadelphia and specific
octave-band noise data for carriages on gravel roads.
• A novel diffraction analysis to pinpoint more precisely Franklin’s position
during his experiment.
• An acoustic computer model of Market Street in 1739 to re-create the first
phase of Franklin’s experiment.
• An estimate of George Whitefield’s on-axis LAeq at 1 m.
• A comprehensive organization of previous research on vocal SPL measurements
as well as new experimental data on the maximum SPL achievable by a human voice.
• Acoustic computer models of Whitefield preaching at the Moorfields, Kennington Common, and Mayfair, to re-create the second phase of Franklin’s
experiment.
• The first rigorous estimates of the maximum intelligible range of a single
unamplified human voice under a variety of conditions.
Associated Publications by the Author
This thesis covers much of the work presented in the publications listed below:
Peer-Reviewed Articles
• Boren, B. (2014). The Maximum Intelligible Range of the Human Voice.
Journal of the Acoustical Society of America (submission).
• Boren, B. (2012). Sounds of the City: The Colonial Era. The Encyclopedia
of Greater Philadelphia.
Conference Papers
• Boren, B., Roginska, A., & Gill, B. (2013). Maximum Averaged and Peak
Levels of Vocal Sound Pressure. 135th Audio Engineering Society Convention, New York, NY.
• Boren, B. & Roginska, A. (2013). Sound radiation of trained vocalizers. Proceedings of Meetings on Acoustics: 21st International Congress on Acoustics, Montreal, Canada.
• Boren, B. & Roginska, A. (2012). Analysis of noise sources in colonial
Philadelphia. Internoise 2012, New York, NY.
CHAPTER II
BACKGROUND
History of Whitefield’s Crowds
The reported sizes of Whitefield’s crowds are among the largest recorded audiences
in history for a single unamplified speaker (Dallimore, 1970). However, historical
studies offer few methods for estimating the crowd sizes more precisely than the
estimates of the day: Dallimore (1970) and Stout (1991) believe that the reported
numbers should be reduced by a factor of one-half, while Mahaffey (2012) has
suggested looking at population data to find the largest possible audience within
a ten mile radius. But in the end these methods are highly speculative, as were
the estimates from Whitefield and his contemporaries. Even with crowds that are
photographed extensively, there is still today a large relative standard error in crowd
estimation techniques (R. Watson & Yip, 2011).
The question may arise as to how accurate the original crowd estimates are
or why we should place much confidence in them, as blind estimates are frequently
inflated (Jacobs, 1967). However, the estimates of Whitefield’s crowds were often
said to have been ‘computed’, suggesting that a more rigorous approach may have
been taken. Indeed, one such account in The Gentleman’s Magazine indicates that
a modern sort of density-area calculation had been used to arrive at an estimate of
20,000 (“The Gentleman’s Magazine”, 1739).
Still, some historians remain skeptical about the veracity of the period accounts
of the crowds, believing instead that Whitefield or his publicist William
Seward “fabricated crowd estimates” as a publicity stunt (Lambert, 1994). To this
two things may be said: first, while a reading of Whitefield’s journals (Whitefield,
1756) may give evidence toward his overconfidence, his sincerity comes across with
equal strength. While his estimates may have been too high, when he wrote that he
“really believed” his crowds numbered a certain amount, Whitefield’s dedication to
personal piety suggests that he never deliberately inflated the reported crowd sizes
or encouraged others to do so. Secondly, it is useful in such cases to consider not
only Whitefield’s friends but also his enemies. Perhaps because Whitefield’s field-preaching was such a new phenomenon, his opposition in the established church
generally saw his massive crowds as mostly a mark against his credibility. One Anglican priest in Boston opposed Whitefield but agreed with the estimate of 20,000
hearers gathered to hear Whitefield there in 1740 (Lambert, 1994). One letter to
Franklin’s Pennsylvania Gazette, however, did assert that the crowd estimates were
too large, though Franklin added that the letter came close to invective (Franklin,
1740).
While the historical record is far from clear on this issue, it can be said that
some of Whitefield’s crowd estimates appear to have been actual numerical estimates rather than blind guesses, and that the majority of his supporters and detractors seemed to agree that the crowds were the largest they had seen. In Chapter IV
we will consider additional historical evidence as to why an acoustical method may
offer the best empirical estimate of Whitefield’s maximum crowd size.
Archaeoacoustics
Recent movements within digital humanities use computational techniques to provide quantitative data for research within less strictly quantitative disciplines. Often
such projects must delve deeply into an unrelated empirical field to find the tools
necessary to address a question in the humanities, such as using Music Information
Retrieval to investigate classical theories of tuning in Indian music (Serra, Koduri,
Miron, & Serra, 2011) or using Geographical Information System (GIS) technology to address the role of topography in the Battle of Gettysburg (Knowles, 2008).
Archaeoacoustics, or archaeological acoustics, is a similar emerging discipline between the fields of archaeology, history, musicology, physics, and acoustics. It aims
to provide a lens to past soundscapes and help understand how sound affected the
past as experienced by the people of a given time period.
Research in this field has examined acoustical effects and questions of intentionality in prehistoric and neolithic monuments (Scarre & Lawson, 2006; Abel et
al., 2008). Archaeoacoustic researchers have used instrument modeling techniques
to synthesize an estimate of how ancient instruments would have sounded based
on physical descriptions and drawings (Andreopoulou & Roginska, 2012). Others
use wave theory to explain recorded descriptions of sound propagation, such as the
acoustical shadows reported during U.S. Civil War battles (Ross, 1999).
Even pure historians have begun to pay attention to the transient nature of sound in
reconstructing how people experienced the past (Smith, 1999; Rath, 2003).
More recent work has begun to examine the question of acoustics in architectural design from periods before the physics of sound were well understood
(Orlowski, 2006; Howard & Moretti, 2010). In interior spaces especially, acoustics acts as a bridge from architectural history to other aspects of history, including
music, theatre, and religious liturgy. In addition, recent studies using quantitative
acoustic measurements have allowed a more sophisticated empirical analysis of
existing acoustical spaces of historical importance (Bonsi, Longair, Garsed, & Orlowski, 2008). Computational acoustic modeling, calibrated
according to objective measurements, has been used to estimate the acoustical effects of crowds, tapestries, and changes in geometry that such spaces would have
encountered in the past (Boren & Longair, 2011; Boren et al., 2013). Another recent project combines visual and acoustic modeling to simulate the experience of
hearing John Donne preaching at the pre-fire St. Paul’s Cathedral in London (Wall,
Stephens, & Markham, 2012). Because of the wide interdisciplinary nature of the
field, archaeoacoustic research can be based on qualitative interpretation or quantitative analysis, and often requires a nuanced blend of both to provide meaningful
results.
Acoustical Simulation
Though the understanding that sound travels in a wave goes back to Aristotle, no
attempt to simulate the motion of sound in a real environment was made until 1843
with the invention of the ripple tank (Rindel, 2002). By exciting water vibrations
and using hard surfaces to model the walls of a cross-section of a room, this method
allowed a coarse simulation of the 2-dimensional motion within a room. Because
of the vastly different physical properties of water, air, and room surfaces, the ripple
tank could not do much more than show the wave motion, however. By the early
twentieth century, pioneering acoustician Wallace Sabine used Schlieren photography to implement a similar visualization technique within a real room (fig. 1). This
involved filling the room with smoke and backlighting it, then photographing the
motion from an impulsive spark, thus giving some indication of how a sound wave
would move within an actual space.
Figure 1: Schlieren photography showing wave propagation in a concert hall from
Rindel (2002)
The first ray-based acoustical simulations were implemented through the optical beam method, in which a single light source is made to give off light rays in
many directions (fig. 2). By darkening or lightening surfaces, some degree of reflection and absorption could be simulated. However, this only worked for simulating
high frequency sound, as the wavelengths of optical light are very small in relation to room surfaces. A later method used lasers to improve simulation precision,
but it retains the basic modeling procedure (Rindel, 2002).
Figure 2: Optical ray method showing attenuation of individual ray paths from Rindel
(2002)
These early methods were all non-auditory. They helped shape the modern
understanding of room acoustics, but they did not allow any auralization of how a
simulated space would sound. The next big push in the twentieth century was the
method of building scaled physical models of a space. By scaling the wavelengths
of sound to the proportion of the model, a tiny dummy head could be used to generate an approximate binaural room impulse response for auralization (?, ?). The
drawbacks of this method include accounting for the lowpass filtering effects of air
absorption (addressed through using dried air at 2-3% humidity, or oxygen-free nitrogen, or post-filtering if the air has a very homogeneous distribution) as well as
the fact that higher accuracy requires bigger models, which becomes more expensive (Rindel, 2002). In spite of its drawbacks, however, this method is still used by
many acoustic consulting firms (Kleiner, Dalenback, & Svensson, 1993).
The acoustic wave equation (eq. 1) describes the behavior of a sound wave in
a medium based on its pressure p, wave velocity c, and time t (∇² is the Laplacian
operator in 3D Cartesian space). Because this is a second-order partial differential
equation, it must be solved numerically at millions of discrete points in space and
time, requiring significant computational power.
\[ \nabla^2 p = \frac{1}{c^2} \frac{\partial^2 p}{\partial t^2} \tag{1} \]
Computational Wave Equation (WE) models produce a high level of accuracy,
but consequently they require a high level of precision during the modeling process,
requiring complex acoustic impedance values instead of a simple Sabine absorption
coefficient (Olesen, 1997). They can accurately model resonance, focusing effects,
diffraction, and refraction, although they still require statistical techniques to account for scattering in rough complex geometries. However, since each solution
to the wave equation is a single-frequency phasor, a dense array of calculations is
needed to get full octave-band information for a single point in a room. Since the
number of affected room modes is proportional to the cube of frequency, it is incredibly computationally expensive to calculate wave equation solutions for a wide
frequency range, for large spaces, or for large arrays of listener positions (Rindel,
2000). For these reasons, WE models are typically used in non-auditory acoustic
simulations or noise engineering contexts in which the acoustic system is relatively
small. Popular WE models include Finite Element Method (FEM) models, which
generate a mesh of the modeled acoustic medium, and Boundary Element Method
(BEM) models, which form a mesh of the system’s boundaries and assume a homogeneous medium contained within (Kleiner et al., 1993). FEM systems work better
for simulating refraction effects in heterogeneous media, while BEM systems work
better when the volume to surface area ratio is high. Finite Difference Equation
(FDE) systems discretize the wave equation using a Taylor Series approximation,
but their accuracy is reduced at high frequencies without good data for the characteristic impedance of any boundary materials (Olesen, 1997). Parabolic Equation
(PE) systems have become popular for modeling long-range acoustic propagation
underwater or outdoors, but these too require precise data about surface impedance
and the temperature gradient within the acoustic medium (West, Gilbert, & Sack,
1992; White & Gilbert, 1989).
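As a rough illustration of the finite-difference idea described above, the following sketch discretizes the one-dimensional form of eq. 1 with a second-order central-difference (leapfrog) scheme. The grid spacing, number of steps, speed of sound, pulse width, and fixed-boundary treatment are all illustrative assumptions and are not taken from this study or from any particular software package.

```python
import math

def simulate_wave_1d(n_points=201, n_steps=80, c=343.0, dx=0.05, courant=0.9):
    """Leapfrog finite-difference solution of the 1D wave equation.

    All parameter values are illustrative assumptions. Pressure is held at
    zero at both ends (a simple fixed boundary, chosen only for brevity).
    """
    dt = courant * dx / c          # time step chosen to satisfy the CFL limit
    r2 = (c * dt / dx) ** 2        # squared Courant number
    centre = n_points // 2
    # Initial condition: a narrow Gaussian pressure pulse (assumed 0.2 m scale).
    p_now = [math.exp(-((i - centre) * dx / 0.2) ** 2) for i in range(n_points)]
    p_prev = list(p_now)           # approximately zero initial velocity
    for _ in range(n_steps):
        p_next = [0.0] * n_points
        for i in range(1, n_points - 1):
            laplacian = p_now[i + 1] - 2.0 * p_now[i] + p_now[i - 1]
            p_next[i] = 2.0 * p_now[i] - p_prev[i] + r2 * laplacian
        p_prev, p_now = p_now, p_next
    return p_now, dt

if __name__ == "__main__":
    field, dt = simulate_wave_1d()
    peak_index = max(range(len(field)), key=lambda i: field[i])
    print(f"time step = {dt:.2e} s, one pulse peak near grid index {peak_index}")
```

Even this toy case needs a time step tied to the grid spacing, and a full three-dimensional model would require the same update at every grid point for every frequency band of interest, which is the computational burden noted above.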
In essence the two great advantages of WE models are their handling of wave
phenomena and their numerical accuracy. The tradeoffs for these are their requirement of huge computational resources and high precision input data. Since this
project does not require real-time results, the computational requirement is not particularly relevant. However, because of the historical nature of this project, most
inputs can only be broadly estimated rather than precisely measured, which negates
any possible gains in precision. The ability to simulate wave phenomena accurately
also requires precise measurements of geometry or atmospheric temperatures at different strata, which are impossible to obtain for the dates in question. Thus a WE
simulation for archaeological acoustic simulation seems somewhat like giving an
answer with five significant digits when the inputs only contained one. Because
this research involves a linear line-of-sight calculation with only tangential questions of diffraction and refraction, it may be best to handle those side issues generally and then perform calculations using simulation methods more robust to general
absorption and scattering data.
The next most accurate computer simulation techniques are geometrical models, which model sound as rays rather than as waves. This allows them to focus
only on the sound paths necessary to correctly generate the room impulse response
at the listener’s position. The oldest of these is the Image Source Model (ISM),
which, though known earlier, was first implemented numerically by Allen and
Berkley for simulating impulse responses in rectangular office buildings (1979). It
was later extended to a vector-based approach by Borish, whose technique allowed
it to be implemented for any arbitrary polyhedron (1984). The basic principle of
ISM is that reflections from a source may be modeled as the effect of a virtual
source in a virtual room, mirrored across the reflecting wall. ISM allows
an efficient way to quickly simulate all reflections within a given radial distance.
However, for complicated geometries, many of the possible virtual sources will not
be visible to the listener and thus will not affect the eventual simulated impulse
response. For instance, a normal, somewhat complex room geometry could produce as many as 10^19 possible virtual higher-order sources, out of which only 2500
are viable (Borish, 1984). Because of this, ISM is only tenable for modeling early
reflections.
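To make the virtual-source idea concrete, the sketch below computes the six first-order image sources of a rectangular room by mirroring the source across each wall, then reports the arrival delay of each path at a listener. The room dimensions, source and listener positions, and speed of sound are hypothetical values chosen for illustration, and the code is not a reproduction of any published ISM implementation.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed (air at roughly 20 degrees C)

def first_order_images(source, room):
    """Mirror the source across each of the six walls of a rectangular room.

    The room spans [0, Lx] x [0, Ly] x [0, Lz]; `room` is (Lx, Ly, Lz).
    """
    images = []
    for axis in range(3):
        for wall in (0.0, room[axis]):
            image = list(source)
            image[axis] = 2.0 * wall - source[axis]  # reflect across the wall plane
            images.append(tuple(image))
    return images

def arrival_delays(source, listener, room):
    """Return sorted delays (s) of the direct sound and first-order reflections."""
    paths = [source] + first_order_images(source, room)
    return sorted(math.dist(p, listener) / SPEED_OF_SOUND for p in paths)

if __name__ == "__main__":
    room = (10.0, 6.0, 3.0)        # hypothetical room dimensions in metres
    source = (2.0, 3.0, 1.5)
    listener = (8.0, 3.0, 1.5)
    for delay in arrival_delays(source, listener, room):
        print(f"{delay * 1000:.2f} ms")
```

In a rectangular room every image is visible to the listener, which is why the method is so efficient there. For arbitrary polyhedra each candidate image would also need a visibility check, and it is those checks that discard most higher-order sources.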
The Ray Tracing (RT) method instead sends out a uniform distribution of
rays from a modeled source, computing the rays’ reflection paths and attenuation
to simulate an acoustic impulse response at the point of a receiver (Rindel, 2000).
This is essentially the converse of ISM, creating virtual listeners in virtual rooms
rather than virtual sources. The rays are one-dimensional, while the
listener has a small volume and detects all rays that pass through it. Given unlimited computational resources, ISM and RT would give the same result. But RT may not necessarily
find all reflection paths in increasing reflection order, which makes it less effective for simulating early reflections (Borish, 1984). However, it is easier to apply
statistical scattering methods in RT, which makes it ideal for modeling late-field
reverberation. A variant of this approach is Cone Tracing (CT), which uses small
beams with a definite cross section, while the virtual listeners are modeled as points
(B.-I. Dalenback, 1996). This increases the angle with increased distance from the
source, which allows it to find virtual listeners more uniformly. However, the system must be carefully engineered (using triangular beams) to ensure that the beams
do not intersect.
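The core step of ray tracing, following a single ray through successive specular reflections while attenuating its energy, can be sketched as follows. The two-dimensional rectangular geometry, the single absorption coefficient shared by all walls, and the function names are assumptions made for illustration. Practical ray tracers add scattering, frequency-dependent absorption, and receiver detection.

```python
import math

def reflect(direction, normal):
    """Specular reflection of a direction vector about a unit surface normal."""
    dot = sum(d * n for d, n in zip(direction, normal))
    return tuple(d - 2.0 * dot * n for d, n in zip(direction, normal))

def trace_ray(position, direction, width, height, absorption, max_bounces):
    """Follow one ray inside a 2D rectangular room [0, width] x [0, height].

    Returns a list of (hit_point, remaining_energy) after each wall hit.
    `absorption` is a single assumed energy absorption coefficient for all walls.
    """
    energy = 1.0
    hits = []
    x, y = position
    dx, dy = direction
    for _ in range(max_bounces):
        # Distance along the ray to each wall it is heading toward.
        tx = ((width if dx > 0 else 0.0) - x) / dx if dx != 0 else math.inf
        ty = ((height if dy > 0 else 0.0) - y) / dy if dy != 0 else math.inf
        t = min(tx, ty)
        x, y = x + t * dx, y + t * dy
        if tx <= ty:
            normal = (-1.0, 0.0) if dx > 0 else (1.0, 0.0)
        else:
            normal = (0.0, -1.0) if dy > 0 else (0.0, 1.0)
        dx, dy = reflect((dx, dy), normal)
        energy *= 1.0 - absorption     # energy lost at each reflection
        hits.append(((x, y), energy))
    return hits

if __name__ == "__main__":
    for point, energy in trace_ray((1.0, 1.0), (0.6, 0.8), 4.0, 3.0, 0.1, 5):
        print(f"hit at ({point[0]:.2f}, {point[1]:.2f}), energy {energy:.3f}")
```

A full simulation would launch many such rays in a uniform distribution and record those that pass through a receiver volume, which is why reflection paths are found in no particular order.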
In reality, the most accurate results come from a hybrid method that makes
best use of the different techniques. Odeon* and EASE† both use a hybrid of ISM
and RT for early and late reflections, respectively, and CATT-Acoustic uses a hybrid
of ISM and CT. Vorländer found that hybrid methods outperformed single-algorithm methods, and that those software packages (like Odeon and CATT) which
modeled acoustic scattering performed best (Vorländer, 1995). These programs all
use some variant of Lambert’s cosine law to calculate probability distributions for
scattered sound, which allows a more diffuse field and keeps the software from
* http://www.odeon.dk/, accessed 7/22/2014.
† http://ease.afmg.eu/, accessed 7/22/2014.
overestimating the reverberation times in virtual rooms. While these methods do
not take wavelength into account, they can simulate it by performing separate calculations for different frequency bands and using the absorption data accordingly.
In addition, many hybrid methods now include additional algorithms to account
for diffraction effects and reflection-based-scattering, which previously only WE
models could simulate (Rindel, Nielsen, & Christensen, 2009). The use of hybrid
models in outdoor acoustics has been verified in multiple studies, which have found
that hybrid systems work accurately as long as early reflection surfaces are adequately modeled (Lisa, Rindel, & Christensen, 2004; Mori, Yoshino, S. Satoh, &
Tachibana, 2011). There do exist geometrical software packages specifically intended for outdoor acoustic simulation which allow more precise handling of wind
noise and diffraction effects as well.* Again, due to the lack of information on historical wind speeds and the linear nature of the acoustic systems in question, for the
purposes of this project any of the available hybrid modeling
Acoustics of the Spoken Voice
The acoustic properties of the human voice have been investigated from various
perspectives, with the result that much that is known about the acoustic system is
confined within specific disciplines such as communication disorders, voice recognition, noise control engineering, and music performance. This research will focus
only on the spatial directivity of the voice and the maximum vocal SPL producible
by trained vocalists.
* https://kluedo.ub.uni-kl.de/frontdoor/index/index/docId/2051, accessed 7/22/2014.
Directivity
The directivity of the human voice has been a subject of interest for a variety of
applications for over 70 years. Different studies have focused on measuring radiation patterns for knowledge of microphone placement (Dunn & Farnsworth,
1939), experimental verification of physical theory (Flanagan, 1960), architectural
design (Chu & Warnock, 2002; McKendree, 1986), vocal performance practice
(Cabrera, Davis, & Connolly, 2011), and computer simulation and auralization
(Katz & D’Alessandro, 2007).
The methods used in these measurements varied from a single ‘exploring’
microphone (Dunn & Farnsworth, 1939) to the more extensive arc arrays of microphones used in the most detailed studies (Chu & Warnock, 2002; Monson, Hunter,
& Story, 2012). These techniques have possible error introduced due to the necessity of the subject repeating a single block of speech for each measurement position.
Other studies have used a single static array of microphones to measure directivity within a single plane, allowing a detailed investigation without the need for a
dedicated laboratory measurement apparatus and avoiding any error introduced by
changes in the subject’s vocal delivery (McKendree, 1986; Cabrera et al., 2011).
The findings of these studies have not always been in exact agreement, but
gradually consensus is building around the independence of the radiation pattern
against several factors. For instance, McKendree (1986) reported differences in
directivity based on gender, but later studies by Chu and Warnock (2002) and Monson (2012) by and large were not able to support this conclusion. Chu and Warnock
(2002) similarly found differences in radiation pattern for different loudness levels,
but this was only investigated for a single subject. Monson (2012) did not observe
the same effect, although he did report some increases in directionality at high frequencies for loud speech.
Marshall and Meyer (1985) reported differences in individual phonemes, and
Katz (2007) also found significant differences between sung vowels within specific
mid-frequency bands. Monson (2012) also found variation in radiation patterns for
different voiceless fricatives, presumably because of differences in mouth shape
and frequency content for these phonemes. Katz (2007) investigated different sung
vocal techniques, including ‘projected’ and ‘focused’ voice, but found that these
techniques did not appreciably affect the radiation pattern of the voice.
Maximum Level
Greater vocal levels are found for trained vocalists in comparison to untrained vocalists (Akerlund, Gramming, & Sundberg, 1992). For the sung or spoken voice,
two similar phenomena known alternatively as the Singer’s or Speaker’s Formant
occur in trained vocalists to merge multiple vocal formants into a single spectral
peak around 3 kHz (Nawka, Anders, Cebulla, & Zurakowski, 1997; Sundberg,
2001). Some studies have suggested this is a cluster of formants 3 and 4, while
others have interpreted it as a cluster of formants 4 and 5.* In the case of singers,
this allows the voice to stand out against the typical frequency contour of a symphony orchestra. But because of the nonlinear frequency-dependent sensitivity of
the auditory system, it also helps concentrate more sound energy into the frequencies most important for speech intelligibility (1-4 kHz).
The perception of loudness is dependent on more than SPL alone (G. D. Allen,
1971), but this investigation will be purely limited to the maximum peak or averaged
* www.phys.unsw.edu.au/jw/voice.html, accessed 7/22/2014.
SPL produced by vocalists. To the author’s knowledge, no study has comprehensively surveyed the existing literature on maximum vocal SPLs since Kent et al.
(1987). Kent’s study summarized different series of measurements, while acknowledging that they were recorded at different distances from the vocalist’s mouth.
Here some attempt will be made to scale all such measurements to the predicted
SPL at 1 m using the classical formula for free field sound attenuation (eq. 2). In
practice the drop-off will be less abrupt for some close measurements, due to near-field behavior of the sound field close to the vocalist’s mouth. Still, this formula
allows a good approximation for comparing some of the large SPLs reported at
short distances to later measurements taken at 1 m.
\[ \Delta L = 20 \log_{10}\!\left( \frac{r_1}{r_2} \right) \tag{2} \]
Many of the studies that have collected maximum SPL measurements have
used them for comparison with other factors, and so the types of SPL measurements
and vocal signals analyzed vary. Some studies (Mendes, Rothman, Sapienza, &
Brown, 2003; Coleman et al., 1977; Coleman, 1994) did not specify the time of
integration for SPL recording, so it is assumed some form of fast-integrated Lp
was used. Some studies (Mendes et al., 2003; Akerlund et al., 1992; Coleman,
1994; Gramming, Sundberg, & Ternström, 1988) only provided graphs and not
exact measurement values, so the dB values reported here may contain ± 1 dB of
error. In addition, some studies (Awan, 1991; Akerlund et al., 1992; Leino, 2009)
only report the mean level for all their participants, indicating that higher values
were measured but not reported directly.
The highest SPL reported in Kent’s (Kent et al., 1987) summary was from
Coleman’s 1977 study of fundamental frequency and SPL (Coleman et al., 1977).
Coleman reported the fast-integrated full spectrum SPL recorded for a 2 s sung
phonation. This yielded a max Lp of 126 dB for a male and 122 for a female,
recorded at 6 inches from the mouth. These high values are a consequence of a
very close distance to the vocalist; using an attenuation factor of -16 dB from eq. 2
gives estimated SPLs of 110 and 106 dB, respectively at 1 m. These values are still
high but more similar to the maximum peak values seen in other studies.
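The distance scaling of eq. 2 can be sketched in a few lines. The function names are hypothetical, r1 is taken as the measurement distance and r2 as the 1 m target, and the calculation ignores the near-field effects mentioned above, so it should be read as an approximation only.

```python
import math

def level_change_db(r_measured_m, r_target_m=1.0):
    """Free-field level change (eq. 2) when moving from r_measured to r_target."""
    return 20.0 * math.log10(r_measured_m / r_target_m)

def spl_at_1m(spl_db, distance_cm):
    """Scale an SPL measured at `distance_cm` to the predicted value at 1 m."""
    return spl_db + level_change_db(distance_cm / 100.0)

if __name__ == "__main__":
    # Values taken from the text: 126 dB and 122 dB at 6 inches (15.24 cm).
    for spl in (126.0, 122.0):
        print(f"{spl:.0f} dB at 15.24 cm -> {spl_at_1m(spl, 15.24):.1f} dB at 1 m")
```

For a measurement at 6 inches the level change is about -16.3 dB, which rounds to the -16 dB attenuation factor used above.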
Table 1
Review of maximum SPLs measured in previous studies
Study      Participants     Type                  Dist. (cm)   Max SPL     SPL @ 1 m
Coleman    10 m. adults     Fast Lp, 2 s phon.    15.24        126 dB      110 dB
Coleman    12 f. adults     Fast Lp, 2 s phon.    15.24        122 dB      106 dB
Gramming   9 m. singers     Leq, sung triads      30           105 dB      95 dB
Awan       20 singers       Fast Lp, 3 s phon.    30.48        112.5 dB    102 dB
Akerlund   10 f. singers    Leq, 30 s speech      30           93 dB       83 dB
Akerlund   10 f. singers    Leq, 2 s phon.        30           118 dB      108 dB
Coleman    20 singers       Lp, 4 s phon.         15           114 dB      98 dB
Mendes     14 singers       Lp, 6 s phon.         2            118 dB      84 dB
Sundberg   31 speakers      Leq, 40 s speech      30           100.3 dB    90 dB
Leino      14 students      Mean Leq, speech      40           72.8 dB     65 dB
Many studies have measured either instantaneous Lp or Leq for constant phonations a few seconds in length. Leq is defined as the time average of the SPL:
\[ L_{eq} = 10 \log_{10}\!\left[ \frac{1}{T} \int_0^T \left( \frac{p(t)}{p_0} \right)^2 dt \right] \tag{3} \]
where T is the integration time, p(t) is the sound pressure as a function of time, and p0 is the reference pressure (20 µPa in air). For
sustained tones with no pauses or silences, Leq will be higher than the Leq for normal speech or song. For this reason, studies that have measured short sustained
phonations, whether they are reporting instantaneous Lp or Leq , are describing a
quantity more similar to a peak value when applied to continuous speech or song.
This can be seen in the study by Akerlund (1992), which measured Leq for normal
speech and a 2 s phonation. The maximum value of the phonation’s Leq was 25 dB
greater than that of the speech.
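The effect of pauses on Leq can be illustrated with a discrete version of eq. 3 applied to synthetic signals. The sine tone, sample rate, and pause pattern below are assumptions standing in for real recordings. The sketch isolates only the time-averaging effect and is not meant to reproduce the 25 dB difference reported above, which also reflects differences in vocal effort.

```python
import math

P_REF = 20e-6  # reference pressure in Pa (standard for airborne sound)

def leq_db(pressure_samples):
    """Equivalent continuous level: 10*log10 of the mean of (p/p0)^2."""
    mean_square = sum((p / P_REF) ** 2 for p in pressure_samples) / len(pressure_samples)
    return 10.0 * math.log10(mean_square)

def tone(amplitude_pa, freq_hz, duration_s, sample_rate=8000):
    """Synthetic sine tone used here as a stand-in for a sustained phonation."""
    n = int(duration_s * sample_rate)
    return [amplitude_pa * math.sin(2 * math.pi * freq_hz * i / sample_rate)
            for i in range(n)]

if __name__ == "__main__":
    sustained = tone(1.0, 250.0, 2.0)
    # Same tone level, but silent for three quarters of the time (a crude
    # stand-in for the pauses in running speech).
    with_pauses = tone(1.0, 250.0, 0.5) + [0.0] * int(1.5 * 8000)
    print(f"sustained:   {leq_db(sustained):.1f} dB")
    print(f"with pauses: {leq_db(with_pauses):.1f} dB")
```

Because the silent portions contribute no energy but still count toward T, the signal with pauses has a lower Leq even though its level while sounding is identical.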
Table 1 summarizes the maximum recorded SPLs of the relevant literature,
including the corresponding estimated level at 1 m for comparison. The maximum fast-integrated Lp is similar but not identical to the Lpk value, but these measurements give a good overview of the highest average and instantaneous levels
recorded for the human voice. In particular, Sundberg (Sundberg & Nordenberg,
2006) reports an Leq corresponding to about 90 dB at 1 m, the highest time-averaged
value reported in the literature.
CHAPTER III
ACOUSTIC DIRECTIVITY OF VOCAL PRODUCTION MODES
Before constructing an acoustic model of Whitefield preaching, it is important to ask whether we possess sufficient information to model him as an acoustic
source. A virtual acoustic source requires an SPL value, a spectrum of frequencies over which the sound power is divided, and an acoustic directivity pattern,
which determines how the sound spreads out in space with respect to the source’s
directional orientation. The SPL value is the variable for which we are trying to
solve, so that can be left as an unknown for the present. The spectrum is fairly
consistent across speakers of the same sex, so simulation engines generally use a
single averaged spectrum for male and female talkers (B. Dalenback, 2011). While
Whitefield’s voice may have diverged slightly from the average male spectrum,
those deviations are effectively unknown.* The nonlinear loudness sensitivity of
the auditory system to different frequencies is already characterized by the STI,
and unfortunately the frequency spectrum is not a simple one-dimensional system
whose deviations from a mean value may be neatly evaluated in either direction.
Rather the spectrum is a complex multidimensional system with no clear preferential deviation from a mean male frequency profile based on the historical evidence.
* For instance, if Whitefield’s voice had a noticeable Speaker’s Formant, he may have had a relative peak at 3 kHz compared to the average male spectrum. But this would also have resulted in a loss of energy at other frequency bands, so without more specific data an averaged spectrum is the best approximation we have.
Because of this, it is better to use a mean spectrum while noting that extreme deviations could make differences within a given SPL based on the auditory system and
frequency-dependent air absorption.
Apart from level and spectrum, however, Whitefield’s vocal directivity pattern could have significantly affected his maximum audience size. Existing literature on this subject mainly focuses on conversational speech or the sung voice
(e.g., Chu & Warnock, 2002; Cabrera et al., 2011, as mentioned in Chapter II).
But trained vocalists often employ multiple modes of vocal production (Katz &
D’Alessandro, 2007), and it is unknown whether this could cause significant deviations from averaged vocal directivity databases. This chapter analyzes the effects of
vocal production methods on the horizontal acoustic radiation pattern of the trained
voice.
Measurement Procedure
This research involved measuring the radiation patterns of trained vocalists employing different vocal production techniques at high levels. Two male singers
were measured, one a professional opera singer and the other a classically trained
musical theater singer. Two actors were also measured, one male and one female.
For each mode of production, each vocalist first intoned five vowels on C4: /i/, /eI/,
/a/, /o/, and /u/, for about two seconds each. This was less to see the effect on any
given phoneme than to investigate the changes without broadband fricatives whose
content is presumably less affected by a change in vocal production mode. After
the vowels, the singers performed about 30 seconds of a prepared song, while the
actors performed a monologue of the same length.
C4 was chosen because it was within the common range of both male and
female vocalists. Since it is towards the bottom of a female’s typical range and
toward the middle to top of a male’s range, the absolute loudness achievable for
different genders will experience some difference. However, keeping the note the
same assists the directivity analysis procedure by ensuring that the fundamental and
harmonics are uniform for each measurement, whereas raising the note for a female
would lead to many irregularities at specific octave bands. In addition, because
these results are intended primarily for making comparisons of different production
techniques for each vocalist at specific third-octave bands, the change in absolute
loudness matters less than the overall directivity by frequency. Furthermore, since
Whitefield’s voice was subjectively described as somewhat high, this pitch corresponds to a good range in the male voice to observe possible directivity effects for
different vocal production modes.
The singers and actors employed different vocabularies to describe different
‘placements’ of the voice corresponding to different production methods, so three
methods were chosen for the singers and four for the actors. The singers’ production methods were ‘back,’ ‘forward,’ and ‘in the mask’, corresponding to perceived
resonances in the rear of the mouth, the front of the mouth, and the sinus cavities
respectively.* The actors’ production methods were a ‘chest’ voice corresponding
to a felt resonance in the front of the speaker’s chest, a ‘mask’ voice similar to that
employed by the singers, a ‘head’ voice in which the voice is felt resonating at the
top of the speaker’s head, and a ‘back resonance’ voice in which the speaker shifts
the resonance to the rear of the torso.
The measurements were conducted using 13 Earthworks M30 measurement
* These categories were self-reported. While there were audible spectral differences between the methods, an expert listener expressed that the singers were not properly achieving the ‘mask’ and ‘front’ voices.
Figure 3: Diagram of microphone array used for measurements
microphones. These microphones have extended flat frequency responses, and each
was calibrated within a 0.21 dB range using pink noise at 95 dBZ in a hemi-anechoic
environment. The microphones were spaced along a semicircle at 15-degree intervals and aligned to the height of the center of each vocalist’s mouth. Assuming
vocal symmetry, the data were mirrored to form 360-degree radiation patterns in the
horizontal plane. Only the horizontal plane at the level of the vocalist’s mouth was
analyzed because more data are available for this plane (McKendree, 1986; Cabrera
et al., 2011; Monson et al., 2012) and it is easier to measure without a dedicated
microphone arc. Each vocalist was aligned to the center of the semicircle at their
measured height (fig. 4). No apparatus was used to keep the vocalists’ heads in
place, but they were observed to keep very still during the measurements.
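The array geometry and the symmetry assumption can be sketched as follows. The convention that 0 degrees lies directly in front of the vocalist, and the function names, are assumptions made for illustration.

```python
def microphone_angles(n_mics=13, step_deg=15.0):
    """Angles (degrees) of microphones on a semicircle, 0 = directly in front."""
    return [i * step_deg for i in range(n_mics)]

def mirror_to_full_circle(levels_db, step_deg=15.0):
    """Mirror semicircle levels (0..180 degrees) to a full 360-degree pattern.

    Assumes left-right symmetry of the voice, so the level at angle a is
    reused at 360 - a. The 0 and 180 degree points are not duplicated.
    """
    angles = [i * step_deg for i in range(len(levels_db))]
    mirrored_angles = [360.0 - a for a in reversed(angles[1:-1])]
    mirrored_levels = list(reversed(levels_db[1:-1]))
    return angles + mirrored_angles, list(levels_db) + mirrored_levels

if __name__ == "__main__":
    # Placeholder levels (dB re on-axis), not measured data.
    half = [0.0, -0.2, -0.5, -1.0, -1.8, -2.7, -3.5, -4.4, -5.2, -6.0, -6.5, -6.8, -7.0]
    full_angles, full_levels = mirror_to_full_circle(half)
    print(len(full_angles), "points from", full_angles[0], "to", full_angles[-1], "degrees")
```

Thirteen microphones at 15-degree spacing span 0 to 180 degrees, so mirroring the eleven interior positions yields 24 points around the full circle.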
Because the measurements were conducted in a live auditorium, the measurement distance was reduced to 60 cm from 1 m to better capture the acoustic
near field. The auditorium’s mid-frequency reverberation time was measured to
Figure 4: Aligning a vocalist with the measurement array
be 1.3 s using the Schroeder-integrated decay curves for a balloon popped in the
space. Though the room doubtless creates some smoothing of the measured radiation patterns, an auditorium was preferable to an anechoic environment because the
vocalists needed to project their voices at very high levels (Cabrera et al., 2011).
Anechoic rooms often feel perceptually unnatural and can lead to reduction in vocal level as a result (Katz & D’Alessandro, 2007). Although Katz (2007) attempted to reduce
this factor in anechoic measurements by producing artificial reverberation through
headphones, this was not an option for the purposes of these measurements since
for many of the vocal production methods the vocalists needed to first feel the resonance in a specific part of their head, often feeling their heads* with their hands
before recording in order to find the purest version of a specific ‘voice.’ In addition, the measurements were not used for absolute comparisons to other anechoic
* While
open headphones would probably have felt more natural, the need to feel the tops of their
heads made any type of headphones problematic for this purpose.
32
measurements but only to make differential comparisons between the methods and
subjects within this study.
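The reverberation-time estimate mentioned above relies on Schroeder backward integration of an impulse response. The following is a minimal sketch of that technique under stated assumptions: the fit range of -5 to -25 dB is an assumption (the text does not give the evaluation range), no band filtering is applied, and an ideal synthetic exponential decay stands in for the balloon-pop recording. All function names are hypothetical.

```python
import math


def schroeder_decay_db(impulse_response):
    """Backward-integrated energy decay curve in dB, 0 dB at t = 0."""
    energy = [x * x for x in impulse_response]
    total = 0.0
    remaining = []
    for e in reversed(energy):
        total += e
        remaining.append(total)
    remaining.reverse()
    # Trailing zero-energy samples (if any) are dropped to avoid log10(0).
    return [10.0 * math.log10(r / remaining[0]) for r in remaining if r > 0.0]


def reverberation_time(impulse_response, sample_rate,
                       start_db=-5.0, end_db=-25.0):
    """Estimate the 60 dB decay time from a line fit to the decay curve."""
    decay = schroeder_decay_db(impulse_response)
    points = [(i / sample_rate, level) for i, level in enumerate(decay)
              if end_db <= level <= start_db]
    if len(points) < 2:
        raise ValueError("not enough decay range for a fit")
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_l = sum(l for _, l in points) / n
    sxy = sum((t - mean_t) * (l - mean_l) for t, l in points)
    sxx = sum((t - mean_t) ** 2 for t, _ in points)
    slope_db_per_s = sxy / sxx
    return -60.0 / slope_db_per_s


# Synthetic decay whose amplitude falls by 60 dB in 1.3 s (illustration only).
sample_rate = 1000
target_rt = 1.3
ir = [math.exp(-math.log(1000.0) * (n / sample_rate) / target_rt)
      for n in range(3 * sample_rate)]
estimate = reverberation_time(ir, sample_rate)
```

For a real recording, the impulse response would first be filtered into the mid-frequency bands of interest, and the fit range would be limited by the noise floor.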
Results
Normalized Radiation Patterns
The analysis of the data consisted of RMS levels computed overall and in third-octave bands for all the vocalists. Data were first normalized to observe any patterns
in the directivity itself independent from spectrum. Figure 5 shows the overall levels
for all four vocalists intoning the five vowels at C4.* Figure 6 shows the overall
levels for the actors’ monologues and the singers’ songs, respectively.
It is observed that the radiation patterns remain generally unaffected by the
production method used, with a few variations of about 1 dB or less. For the actress’s vowels, the ‘back resonance’ directivity was relatively 1-1.5 dB louder at
rear positions than the other production methods (fig. 5b), but this pattern was not
seen in the male actor’s vowels (fig. 5a) or in the actress’s monologue (fig.
6b). In general, the overall level does not display a significant difference between
the average level for speech or song and that of intoned vowels.
The data were also analyzed in third-octave bands to show the effects of frequency on directivity. Selected bands are shown here for the sake of brevity. Figure
7 shows radiation patterns at three bands for the actress’s monologue. While the
expected increase in directionality with increased frequency can be observed, the
effect of the production method on directivity is very small (less than 1 dB) in most bands. The results for the actor’s monologue are so similar to the actress’s monologue that they are omitted here.
* Note that these are dB comparisons of RMS pressure alone, meaning that a dB comparison on the scale of pressure squared would simply be multiplied by 2 for each plot.
Figure 5: Normalized overall levels for the vocalists, intoning vowels on C4
Panels (polar plots, 0 to −8 dB): (a) Actor and (b) Actress, comparing Chest, Mask, Head, and Back Resonance; (c) Musical Theater Singer and (d) Opera Singer, comparing Back, Forward, and Mask.
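The band labels used in this chapter (251, 501, 1000, 1995, and 10000 Hz) are consistent with exact base-ten third-octave midband frequencies. Assuming that convention, the sketch below shows how those centers and the corresponding band edges can be computed; the function names and band indices are illustrative and not taken from the original analysis.

```python
def third_octave_center(band_index):
    """Exact base-ten midband frequency in Hz: 10 ** (band_index / 10)."""
    return 10.0 ** (band_index / 10.0)


def third_octave_edges(band_index):
    """Lower and upper band-edge frequencies in Hz for one band."""
    center = third_octave_center(band_index)
    half_width = 10.0 ** (1.0 / 20.0)  # half of one tenth of a decade
    return center / half_width, center * half_width


# Band indices whose centers match the labels used in the figures.
bands = {index: third_octave_center(index) for index in (24, 27, 30, 33, 40)}
labels = [round(frequency) for frequency in bands.values()]
```

Under this convention the 251 Hz band spans roughly 224 to 282 Hz, which contains the fundamental of C4 (about 262 Hz).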
Figure 8 shows the same frequency bands for the actress’s intoned vowels.
Larger variations between production modes can be seen at low frequencies, especially at the band centered at 251 Hz, close to the fundamental frequency of the C4
on which the vowels were sung. The modes of production become more uniform as the frequency increases. Compared with the monologue data in Figure 7, this suggests that the low-frequency directivity of running speech is smoother than that of vowel production alone.
Figure 6: Normalized overall levels for the vocalists, speech (a and b) and song (c and d)
Panels (polar plots, 0 to −8 dB): (a) Actor and (b) Actress monologues, comparing Chest, Mask, Head, and Back Resonance; (c) Musical Theater Singer and (d) Opera Singer songs, comparing Back, Forward, and Mask.
Figure 9 shows the third-octave band data at 10000 Hz for the musical theater
singer’s song and vowels. While both the actress’s speech (fig. 7c) and vowels (fig.
8c) were smooth at high frequencies, the musical theater singer’s vowels experienced more variation in normalized directivity even at 10000 Hz (fig. 9b). Again,
the vowels-only case shows the greatest variation in radiation pattern, but when
averaged over a song segment (fig. 9a), these variations quickly diminish. Future
work may need to focus more on the directivity of separate phonemes, as a single vocalist’s instantaneous acoustic radiation can differ significantly from a long-term average.
Figure 7: Normalized third-octave bands for actress’s monologue
Panels (polar plots, 0 to −8 dB): (a) 251 Hz, (b) 1000 Hz, (c) 10000 Hz, each comparing Chest, Mask, Head, and Back Resonance.
Figure 8: Normalized third-octave bands for actress’s vowels
Panels (polar plots, 0 to −8 dB): (a) 251 Hz, (b) 1000 Hz, (c) 10000 Hz, each comparing Chest, Mask, Head, and Back Resonance.
The opera singer’s song directivity (fig. 10) also showed small variations
across vocal production mode in most bands. However, unlike the previous two
examples, this singer’s voice showed its greatest variation at the 1995 Hz band,
with the radiation patterns becoming somewhat more uniform at the 10000 Hz band.
Unlike the musical theater singer, the opera singer’s vowels and song data were both
extremely similar in directivity. This may be a consequence of the style of the aria being sung, which had many long vowel notes and fewer fricatives than the musical theater singer’s song.
Figure 9: Normalized 10000 Hz bands for musical theater singer’s song and vowels
Panels (polar plots, 0 to −8 dB): (a) Song, (b) Vowels, each comparing Back, Forward, and Mask.
Absolute Radiation Patterns
After a normalized analysis, ‘absolute’ radiation patterns were also plotted relative
only to the greatest level out of the vocal production modes used.* While the shapes
of these radiation patterns were earlier seen to vary little, the changes in absolute
level between different production modes can be informative. In figures 11c and
12c, the musical theater singer’s three production modes are not only the same
shape but remarkably consistent in overall level as well, although we have already
seen that this uniformity is not always present in individual frequency bands.
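The difference between the ‘normalized’ and ‘absolute’ presentations can be made concrete with a short sketch. It assumes that each normalized pattern is referenced to its own on-axis level (the text does not state the normalization reference explicitly), and it follows the stated convention that 0 dB in the absolute plots is the on-axis level of the loudest mode. The mode names and levels below are hypothetical.

```python
def normalized_patterns(levels_by_mode):
    """Reference each mode to its own on-axis level (first entry)."""
    return {mode: [level - levels[0] for level in levels]
            for mode, levels in levels_by_mode.items()}


def absolute_patterns(levels_by_mode):
    """Reference every mode to the highest on-axis level among the modes."""
    reference = max(levels[0] for levels in levels_by_mode.values())
    return {mode: [level - reference for level in levels]
            for mode, levels in levels_by_mode.items()}


# Hypothetical levels in dB at three angles (on-axis first), illustration only.
measured = {
    "Chest": [80.0, 78.5, 75.0],
    "Head": [77.0, 75.5, 72.0],
}
normalized = normalized_patterns(measured)
absolute = absolute_patterns(measured)
```

In this example the two modes have identical normalized shapes, while the absolute view preserves the 3 dB offset between them.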
But in figure 11, a and b, we see that the actors’ two chest modes and head
modes grouped more or less together, though not in the same way. For the male
actor, the chest modes were uniformly louder, while for the female actor they were softer than the two head modes. In fact, we can see from an absolute analysis that while the actress’s ‘back resonance’ mode had the highest relative level in the rear, the two head modes were still absolutely louder at those positions.
* 0 dB = the level of the mode with greatest level on-axis.
Figure 10: Normalized third-octave bands for opera singer’s song
Panels (polar plots, 0 to −8 dB): (a) 251 Hz, (b) 1995 Hz, (c) 10000 Hz, each comparing Back, Forward, and Mask.
In-depth analysis of the absolute directivity differences amounts essentially
to a spatial spectrum, indexed first by position and then by frequency. At most
frequency bands, the radiation exhibits similar patterns to what has already been
seen. But at some frequencies, the absolute levels diverge drastically from this simple uniformity.
For instance, in figure 13 the opera singer’s normalized directivity is nearly
the same at each frequency band. But while the absolute levels of each production
mode are almost identical at 251 (a) or 1000 Hz (c), the 501 Hz band (b) shows a
large change in level for each vocal mode. This pattern was observed for the opera
singer’s intoned vowels as well.
Figure 14 shows the spectral progression of all four production modes in
three third-octave bands for the actor’s intoned vowels. At 1000 Hz (a), we see
the ‘back resonance’ and ‘chest’ voices grouping together, while the ‘mask’ and
‘head’ voices are lower. Then at 1995 Hz (b) the ‘chest’ voice groups with the
lower level voices while the ‘back resonance’ voice is about 3 dB louder than the rest. Finally at 10000 Hz (c), the two chest voices group together and the two head voices are together at a lower level, similar to their arrangement in (a).
Figure 11: Absolute overall levels for the vocalists, intoning vowels on C4
Panels (polar plots, 0 to −8 dB): (a) Actor and (b) Actress, comparing Chest, Mask, Head, and Back Resonance; (c) Musical Theater Singer and (d) Opera Singer, comparing Back, Forward, and Mask.
Figure 12: Absolute overall levels for the vocalists, speech (a and b) and song (c and d)
Panels (polar plots, 0 to −8 dB): (a) Actor and (b) Actress monologues, comparing Chest, Mask, Head, and Back Resonance; (c) Musical Theater Singer and (d) Opera Singer songs, comparing Back, Forward, and Mask.
Discussion
Overall it will be seen that the vocal production modes chosen for this study had
a generally small effect on horizontal voice directivity. It is possible that different
effects could be observed for vertical directivity, but it seems unlikely based on
previous studies that have examined both dimensions. At specific frequencies, and
presumably for specific phonemes, the effects are greater, but no larger patterns regarding this behavior have been observed. Changes observed in this study were mainly on the order of 2 dB or less. While greater variations may have been masked by the smoothing effects of recording in a diffuse environment, this is the same type of environment in which most performances by trained vocalists occur. If this is the only variation able to be controlled by a trained vocalist, it may not be a salient effect in diffuse environments, especially if any other sound sources are present.
Figure 13: Absolute third-octave bands for opera singer’s song
Panels (polar plots, 0 to −8 dB): (a) 251 Hz, (b) 501 Hz, (c) 1000 Hz, each comparing Back, Forward, and Mask.
Figure 14: Absolute third-octave bands for actor’s vowels
Panels (polar plots, 0 to −8 dB): (a) 1000 Hz, (b) 1995 Hz, (c) 10000 Hz, each comparing Chest, Mask, Head, and Back Resonance.
This study also argues against some common notions about vocal directivity.
For instance, some of the vocalists measured actively predicted that the ‘mask’ mode would radiate more energy forward than the other production modes. However, the recorded data for all four vocalists do not support that conclusion.
I have heard it said in both classical and musical theater circles that operatic
singers radiate more circularly than musical theater singers. Since both singers
in this study sang different songs, the only example for comparison is their sung
vowels on C4 (fig. 5). However, the measured data show that these two singers have
fairly similar vocal radiation patterns. A larger sample size would be necessary to
investigate this further, but this example casts some doubt on the claim. Most likely
this piece of folk wisdom derived from a common misconception: the conflation
of directivity and spectrum. It is possible that the perceived omnidirectionality of
classical singers comes from the fact that male opera singers usually radiate more
low-frequency sound energy than male musical theater singers, thus leading to the
perception that their voices radiate differently in space rather than in frequency.
Finally for the main purpose of this dissertation, this study indicates that specific oratorical production methods may affect overall level somewhat but have a
small impact on the voice’s radiation pattern. Thus more comprehensive anechoic
datasets on male voice directivity for loud speech are sufficiently robust for the
purpose of simulating Whitefield’s oratory.
CHAPTER IV
ANALYSIS OF FRANKLIN’S EXPERIMENT
Franklin’s Experiment
In Chapter II, we examined the question of Whitefield’s crowd sizes from the perspective of historical scholarship. While a purely historical method may be informative, for the numerical question being considered it is also useful to use physical
and mathematical reasoning in conjunction with historical evidence.
Interestingly, toward the end of his life Whitefield revised many of his journals and removed passages written in his youth that he had come to view as “justly
exceptionable.” This included changing any estimates of crowds that were greater
than 20,000 to “so many thousand that many went away because they could not
hear” (Lambert, 1994). Though Whitefield’s voice was often described as “the roar
of a lion” (Stout, 1991), he himself admitted that his largest gatherings were still
limited by the audible range of his voice. Thus, it may well be that Franklin’s
acoustical experiment provided the best method for estimating the maximum size
of Whitefield’s audiences. Unfortunately, many historians simply take Franklin at
his word without bothering to look into the details behind his calculation.
Franklin specifies that he was assuming a semicircular radiation pattern based
on a uniform intelligible radius equal to the distance he measured from Whitefield’s
position. He also lists his assumed crowd density as 2 square feet per person, which
allows the minimally intelligible area and intelligible radius to be inferred from his reported crowd estimate. If we take his estimate of 30,000 auditors to be the exact answer to his calculation, then
\[ 30{,}000 \text{ listeners} = \frac{\text{Area}}{2\ \text{sq. ft./listener}} \tag{4} \]
so
\[ \text{Area} = 60{,}000 \text{ sq. ft.} \approx 5{,}575 \text{ m}^2 \tag{5} \]
Since we know that for a semicircle,
\[ \text{Area} = \frac{1}{2}\pi r^2 \tag{6} \]
we have
\[ r = \sqrt{\frac{2 \cdot 5{,}575}{\pi}} \approx 60 \text{ m} \tag{7} \]
This value of r is about half the distance to Franklin’s reported position near Front Street (Fig. 15). Since area is the integral of the semicircumference (which varies linearly with the radius), doubling the intelligible distance would quadruple the intelligible area to about 23,000 m².
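The arithmetic behind equations 4 to 7 can be checked with a short sketch. It uses only Franklin’s stated density of 2 square feet per listener and the semicircular assumption; the function names are hypothetical.

```python
import math

SQFT_TO_SQM = 0.3048 ** 2  # one square foot in square meters


def intelligible_radius_m(listeners, sqft_per_listener=2.0):
    """Radius of the semicircle that holds the given crowd."""
    area_m2 = listeners * sqft_per_listener * SQFT_TO_SQM
    return math.sqrt(2.0 * area_m2 / math.pi)


def semicircle_capacity(radius_m, sqft_per_listener=2.0):
    """Number of listeners that fit in a semicircle of the given radius."""
    area_m2 = 0.5 * math.pi * radius_m ** 2
    return area_m2 / (sqft_per_listener * SQFT_TO_SQM)


radius = intelligible_radius_m(30_000)
doubled = semicircle_capacity(2.0 * radius)
```

Because capacity scales with the square of the radius, doubling the radius yields four times the crowd, i.e. about 120,000 listeners at the same density.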
Some have asserted that Franklin merely miscalculated* or that he measured the distance in strides and misremembered it as feet (Liberman, 2005). Yet it seems unlikely that Franklin was actually referring to a position only 60 m from Whitefield, which would have been nearer to Letitia Court, a small alleyway, than to Front Street, which was one of the most important streets in Philadelphia at the time.
* See the editor’s note in Franklin’s Autobiography, p. 179
Figure 15: Inset of Clarkson-Biddle Map of Philadelphia showing Market Street
Another possible explanation is that he calculated the higher figure initially, but only
reported it as “more than Thirty Thousand.” While this may seem strange to modern
readers, it is actually very much in keeping with Franklin’s self-professed “Habit
of expressing my self in Terms of modest Diffidence, never using when I advance
any thing that may possibly be disputed...” (Franklin, 1793) After all, Franklin’s
experiment, while perhaps the best that could be quickly calculated, was only an
approximation, and it is likely he was aware of its shortcomings.* Also in Poor
Richard Improved (Franklin, 1749), Franklin describes a similar thought experiment using the same density for soldiers in formation. He ends this by stating that
“There are many voices that may be heard at 100 yards distance,” which suggests
that he specifically remembered measuring a distance at least this large. It seems
reasonable as well that Franklin calculated a number far larger than the accounts of Whitefield’s crowds, but realized that while his estimation was extreme, it made the figure of 30,000 listeners seem more believable.
* For instance, it had been known since the ancient Greeks designed their amphitheaters that the human voice’s radiation is not perfectly semicircular (Orlowski, 2006).
Diffraction Effects
Though Franklin lists his position as ‘near Front Street’, he does not list a specific position. Thus the first question that must be asked is where exactly Franklin
was in relation to Front Street when the noise source obscured Whitefield’s speech.
Though Franklin does not specify, it seems reasonable to assume that he was on-axis for Whitefield’s sermon, which would put him in the center of Market Street
as he walked away from Whitefield. It will be seen (Fig. 16) that Franklin’s visible
area on the south section of Front Street forms a triangle whose area doubles as
Franklin’s distance to Front Street, d, halves. The geometry for the north section
is similar though not identical, but there are historical reasons to suspect the noise
source was to the south of Market Street, which will be discussed further later in
this chapter.
The question arises as to whether Franklin was hearing diffracted noise around
the corner of the intersection, or whether he had a direct path to the noise source itself. The nature of the building at this corner will be discussed later, but because the
buildings in early Philadelphia were made of brick (Cotter, Roberts, & Parrington, 1992), it seems safe to focus only on diffracted sound and assume the noise conducted through the building itself is negligible. Assigning the origin in a coordinate plane to the point of diffraction (the blue dot in Fig. 16), Franklin’s position is
\[ BF = (-d, w_1) \tag{8} \]
where w1 is half the width of Market Street.
Figure 16: Diagram of Franklin’s position (BF) in relation to sources on Front Street
To compare a visible source to one which is not visible, we may assign representative points c1 and c2, where c1 is the centroid of the triangle representing the visible area on Front Street, and c2 is the centroid of the triangle of non-visible area bordering the visible area (Fig. 16). These have coordinates
\[ c_1 = \left( \frac{2w_2}{3},\ \frac{-w_1 w_2}{3d} \right) \tag{9} \]
and
\[ c_2 = \left( \frac{w_2}{3},\ \frac{-2 w_1 w_2}{3d} \right) \tag{10} \]
where w2 is the full width of Front Street. It will be seen that c1 approaches the border of Market Street as d becomes large, and that Franklin’s distance to c1 is simply
\[ r_{c1} = \sqrt{ \left( \frac{2w_2}{3} + d \right)^2 + \left( w_1 + \frac{w_1 w_2}{3d} \right)^2 } \tag{11} \]
while the distance to c2 is similarly
\[ r_{c2} = \sqrt{ \left( \frac{w_2}{3} + d \right)^2 + \left( w_1 + \frac{2 w_1 w_2}{3d} \right)^2 } \tag{12} \]
The shortest path of diffracted sound is rA + rB, where rA is the distance from the source to the diffraction point and rB is the distance from the diffraction point to the receiver. For a source at c2, these are then
\[ r_A = \sqrt{ \left( \frac{w_2}{3} \right)^2 + \left( \frac{2 w_1 w_2}{3d} \right)^2 } \tag{13} \]
and
\[ r_B = \sqrt{d^2 + w_1^2} \tag{14} \]
The path difference δ = rA + rB − rc2 is used to calculate the Fresnel number, N, by
\[ N = 2\delta/\lambda \tag{15} \]
where λ is the wavelength of the sound being diffracted.
The simplest approximation for calculating frequency-based attenuation is to
treat the building blocking Franklin’s position as a planar screen, a practice that
allows an intuitive understanding of the relationship between orthogonal screen
height and wavelength (Maekawa, 1968; Meyer, 2009). This method of using an
equivalent screen for wedge-based diffraction is not as accurate as Pierce’s more rigorous solution using the Fresnel auxiliary functions (Pierce, 1974). But since this is a more general historical application and Franklin’s position is well within the ‘shadow zone’ of the diffraction, the ‘equivalent screen’ method provides a good approximation.
One of the most popular methods in noise control engineering is the Kurze-Anderson formula (Kurze & Anderson, 1971), equation 16 below, which models Maekawa’s measured data closely for large values of N and within 1.5 dB whenever N < 1 (Menounou, 2001). This formula has the advantage of being calculated solely from the Fresnel number:
\[ \Delta L = 5 + 20 \log \frac{\sqrt{2\pi N}}{\tanh \sqrt{2\pi N}} \ \text{(dB)} \tag{16} \]
Using measured values from Fig. 15, we find that w1 = 50 and w2 ≈ 55.
We may use these to find values of the projected attenuation based on Franklin’s
distance d from Market Street and the frequency† (in Hz) of a noise source located
at c2 relative to Franklin’s position (Fig. 17).
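The chain from geometry to attenuation can be sketched as follows. This is an illustration under several assumptions rather than the calculation used for Fig. 17: w1, w2, and d are taken to be in feet (the text gives no unit for w1 and w2), the speed of sound uses the common linear approximation 331.3 + 0.606 T, and rA is computed directly from the coordinates of c2, following its definition as the source-to-corner distance. All function names are hypothetical.

```python
import math

FT_TO_M = 0.3048


def speed_of_sound(temp_c):
    """Approximate speed of sound in air in m/s (linear approximation)."""
    return 331.3 + 0.606 * temp_c


def path_difference(d, w1, w2):
    """Path difference r_A + r_B - r_c2 for a source at centroid c2.

    The diffraction point is the origin and the receiver is at (-d, w1).
    """
    c2_x = w2 / 3.0
    c2_y = -2.0 * w1 * w2 / (3.0 * d)
    r_a = math.hypot(c2_x, c2_y)            # source to diffraction point
    r_b = math.hypot(d, w1)                 # diffraction point to receiver
    r_c2 = math.hypot(c2_x + d, w1 - c2_y)  # straight line, source to receiver
    return r_a + r_b - r_c2


def kurze_anderson_db(fresnel_number):
    """Barrier attenuation in dB from the Fresnel number (equation 16)."""
    if fresnel_number <= 0.0:
        return 5.0  # limiting value as N approaches zero
    x = math.sqrt(2.0 * math.pi * fresnel_number)
    return 5.0 + 20.0 * math.log10(x / math.tanh(x))


def attenuation_db(d_ft, freq_hz, w1_ft=50.0, w2_ft=55.0, temp_c=4.5):
    """Projected attenuation for a receiver d_ft from Front Street."""
    delta = path_difference(d_ft * FT_TO_M, w1_ft * FT_TO_M, w2_ft * FT_TO_M)
    wavelength = speed_of_sound(temp_c) / freq_hz
    return kurze_anderson_db(2.0 * delta / wavelength)
```

Sweeping `attenuation_db` over a grid of distances and frequencies would produce a surface comparable in form to Fig. 17, although the exact values depend on the unit and speed-of-sound assumptions noted above.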
Using the more complex analytical description of Maekawa’s solution (Kurze, 1974), we can use
\[ \Delta L = 10 \log \frac{4\pi^2 \delta}{\lambda} - 20 \log \frac{r_{c2}}{r_A + r_B} + 10 \log \left( 1 + \frac{r_{c2}}{r_A + r_B} \right) - 20 \log \left( 1 + \frac{\sin(\varphi/2)}{\sin\left(\theta + (\varphi/2)\right)} \right) \tag{17} \]
when N ≥ 1, and using the correction term
\[ \Delta L_R = 20 \log \frac{2\pi \sqrt{N/2}}{\tanh\left( \pi \sqrt{N/2} \right)} \tag{18} \]
as a substitute for the first term in eqn. 17 when N < 1. The attenuation projected by this formula is almost identical to that of the Kurze-Anderson approximation. The full result of this solution is shown in Fig. 18.
† The speed of sound c was calculated using a temperature of 4.5 °C, the average for Philadelphia in November, when Franklin’s account took place. This is discussed at length in the next chapter.
Figure 17: Diffraction at Franklin’s Position using Kurze-Anderson Formula
Figure 18: Diffraction at Franklin’s Position using Maekawa’s Solution
Analyzing Figures 17 and 18, three things stand out:
1. The average attenuation for a source in the non-visible triangle from figure 16 would have been quite high: nearly 30 dB for high frequencies and large values of d.
2. The only instances in which attenuation would have been small (i.e. less than 10 dB) would have been for low-frequency sounds, probably less than 100 Hz, or for very small values of d.
3. Any diffracted noise would have been lower than the optimum frequency range to mask the human voice, meaning that it would have had to be very loud to still distract Franklin after being attenuated.
In addition, any higher-frequency noise would have been almost completely shielded by the building until Franklin came within about 5 feet (≈ 1.5 meters) of Front Street. Therefore if the noise source did not contain a large amount of low-frequency acoustic energy, the background noise would have been close to zero for slightly larger values of d. To answer the question of the nature of the noise source, however, we will need to further consult the historical and archaeological record.
Noise Sources in Eighteenth-Century Philadelphia
Though the issue of noise in cities seems like a modern problem, dwellers in colonial Philadelphia began to complain about noise sources within the first century of
the city’s existence.* The Quaker Meetinghouse at 2nd and Market Streets, next to
the Courthouse from which Whitefield preached, would later be abandoned because
excessive street noise disrupted the Friends’ silent worship services (Rath, 2003).
When William Penn first laid out the city, Front Street was to be a broad promenade along the river and thus would have contained the various wharves and docks
where workers were loading ships. But soon merchants began constructing places
of business closer to the water, and by 1739 Front Street would have already been
cut off from the noise of the river (Cotter et al., 1992).
Rath’s study of colonial Philadelphia found that the two main sources of noise
were carriage traffic and people (both from children’s games and human voices)
(Rath, 2003). The question of carriage noise is complex, as the noise resulting can
depend on the type of carriage, wheels, and especially the type of road on which it
is traveling. While the carriage wheels may have been iron-wrought by 1739 (Rath,
2003) along with the horses’ shoes, the composition of Front Street itself in 1739
is difficult to ascertain. The city of Philadelphia would not take responsibility for paving its streets until 1762 (Cotter et al., 1992), and the streets up to that point were often taken care of by the residents and businesses who lived near them. Franklin himself put together a scheme for paving and cleaning part of Philadelphia in the 1750s (Franklin, 1793). Traveling to Philadelphia in 1748, Swedish botanist Peter Kalm remarked of the streets that
...some are paved, others are not, and it seems less necessary since the ground is sandy, and therefore absorbs the wet. But in most of the streets is a pavement of flags, a fathom or more broad, laid before the houses...(Kalm, 1770)
* This discussion is limited to the noise sources that were likely to affect Franklin from Front Street. A broader history of noise in colonial Philadelphia is included in Appendix B.
There is some agreement that Market Street itself was probably paved to some
extent by the 1720s (Boudreau, 2012b, 2012a; Jackson, 1918) although that ‘pavement’ would probably have consisted of what today we would call gravel (Hershey,
1975; Boren, 2012). As Front Street was one of the more important thoroughfares
of the early city, it is likely that it too would have had such a treatment, but no explicit references to the street have been found in the historical literature, positively
or negatively. Front Street likely either consisted of a similar gravel pavement, possibly with a flagstone sidewalk nearer the buildings, or else was still made of the
sandy soil that Kalm described. In either case, a carriage traveling on such surfaces
would have been less prone to the more impulsive attacks associated with cobblestone streets and certainly would not have generated enough low-frequency noise
to obscure Whitefield’s speech through diffraction. Within the visible section of
the street, however, such a carriage would have been a viable noise source once
Franklin was close to Front Street.
Though it might be thought that the crowd listening to Whitefield would have
been a significant noise source, Franklin specifically mentioned how silent they were (Franklin, 1793), and Whitefield himself commented that his audiences in America were even quieter than those in Britain (Stout, 1991). But by 1739 Whitefield was becoming quite a celebrity, and as such was drawing more than simply a devotional audience. Indeed, as Whitefield’s audiences grew, it is recorded that groups often gathered on the edge of his congregants, either social elites or the lowbrow inhabitants of London’s Moorfields, causing some noise around the periphery of the crowd (Dallimore, 1970; Wakeley, 1872).
Such a gathering may have been the source of noise Franklin discussed, and
Front Street was the ideal setting: the street itself was the site of many merchants’
clearinghouses, and the lively coffee houses that “opened directly into the life of
the streets” were concentrated there as well (Cotter et al., 1992). In particular, the
London Coffee House, the city’s “pulsating heart of excitement, enterprise, and patriotism” was located exactly at the southwest corner of Market Street and Front
Street, very near Franklin’s position (Ukers, 1922). Though the more famous Second London Coffee House was not founded until 1754, a chronicler lists its predecessor at the same intersection (though probably not the same corner), founded in
1702. A second account disputes this location, but the second chronicler is unsure
of the exact location and may be referring to a different establishment. In addition,
the southern part of Front Street had been home to at least two other coffee shops by
1739 (Ukers, 1922). In any case, there is no doubt that Front Street was a bustling
center of cultural life for the city, and was a likely spot for those who did not wish
to join Whitefield’s congregation on Market Street.
Discussion
Though Franklin misremembered some facts in his autobiography, it seems likely
that he correctly recalls the facts of his experiment, especially his position near
Front Street and the noise source he records there. Based on the geometry of Market Street in 1739, we have shown that diffracted noise would have been greatly
attenuated except for low frequencies and small distances to Front Street. The middle frequencies of Whitefield’s voice that could carry to Front Street yet remain intelligible would still have been higher than the frequencies that could have been diffracted without significant energy loss.
Based on the historical evidence, conversation outside a store or coffee house
along Front Street seems the most likely source of noise that would have reached
Franklin’s position. This would have the added benefit of occupying the same frequency range as Whitefield’s voice and thus be most likely to mask his sermon.
Another possibility is noise from a carriage along Front Street, although the street
would probably have been composed of gravel rather than hard cobblestone. But
the noise generated from either of these sources would not have contained the large
amounts of low-frequency energy necessary to diffract around a corner to Franklin’s
position.
It seems most logical to conclude that Franklin encountered the noise abruptly
by coming very close to Front Street and obtaining a direct line-of-sight to the noise,
which then reduced the intelligibility of Whitefield’s speech. While Franklin was
approaching the corner, the diffraction would have caused a gradual increase in the
noise until it had a direct path to him, avoiding any binary classification dilemmas. Since Franklin had a direct line of sight to Whitefield, Franklin’s experiment
may now be simulated using geometric acoustic modeling techniques, which do not
natively account for wave phenomena. Geometrical simulation programs do however provide a wide range of tools for comparing the effects of spatial and spectral
changes on STI, which is ideal for this application. In addition, this will require
measured values for both possible noise sources, which may then be put into a
virtual model to calculate the amount of noise at a series of receiver positions for
Franklin along Market Street. Varying source distance within reasonable limits will
allow high and low estimates for the SPL required for Whitefield to retain minimal
intelligibility.
CHAPTER V
ACOUSTIC SIMULATION OF FRANKLIN’S EXPERIMENT
Franklin’s experiment certainly provides an important clue to the true range
of Whitefield’s voice, but we should not take it at face value without considering
the advances in physics that have been made in the past three centuries. Given these
advances in acoustical knowledge, Franklin’s actual
calculation is less important than his recorded data: the maximum intelligible distance of Whitefield’s voice. The analysis in Chapter IV has suggested that Franklin
would have had to have been within about 1.5 m of Front Street to be able to
hear significant masking noise from a source in that street. This allows a better
estimate of the maximum intelligible distance of about 121 m along the ground.
This piece of data can be used to reconstruct fully the acoustic system from Whitefield to Franklin, yielding an estimate of the source magnitude necessary to achieve
minimum intelligibility at Franklin’s position.
The goal of this chapter is to estimate, based on the data from Franklin’s
experiment, the time-averaged SPL of George Whitefield’s speaking voice at a distance of 1 m. This data can be used to simulate the maximum crowd size that could
have heard Whitefield in the sites in London where he attracted his largest crowds.
Makeup of the Colonial City
The first step toward building a model of the acoustic conditions present during
Franklin’s experiment is to determine the geometrical layout of the ground, buildings, and people that would have been nearby during Whitefield’s sermon. While
many period maps of Philadelphia exist, most of these depict only congruent boxes
for the various buildings that made up the city. The earliest map of the city that includes scaled drawings of buildings and streets is the Clarkson-Biddle map of 1762
(Snyder, 1975), as shown in figure 15.
Most of the Market Street area has changed dramatically since Franklin’s experiment, and most of the buildings that would have been present then, including
the court house, no longer exist. Because of this, most of the geometrical information about the area must be reconstructed from the Clarkson-Biddle map and period
drawings of the area, such as William Breton’s 1830 watercolor rendering of the
court house (fig. 19).
Figure 19: William Breton, Old Court House & Second Friend’s Meeting, 1830, Library Company of Philadelphia
The primary material composition of the area can be determined through historical and archaeological research. The Clarkson-Biddle map describes the houses
of the city as being made of brick. While most of these buildings no longer exist, the brick exterior of nearby Christ Church was completed during the 1730s
and provides a good basis for the sizes of bricks that would have been used in the
other buildings. This brick, along with glass windows and wooden doors visible
in many drawings, accounts for most of the reflective surfaces on the buildings on
Market Street. As discussed in Chapter IV, the material composition of Market
Street was probably more similar to gravel than smooth pavement. Since measured
acoustic absorption data are available for all these materials, a geometrical computer model should be able to accurately recreate the acoustic conditions present
during Franklin’s experiment.
Figure 20: Inset of George Heap’s East Prospect of the City of Philadelphia, 1752,
New York Public Library
Modeling Procedure
Geometry
The Market Street area was modeled geometrically in AutoCAD first by making a
2-dimensional trace of the Clarkson-Biddle map. This was scaled to the width of
Market Street itself, which was laid out to be 100 feet wide (30.48 m) (Cotter et al.,
1992), and measurements on the site confirmed that this value is still accurate today.
Heights were estimated by proportions of horizontal to vertical measurements in
drawings like fig. 19 and broader views such as George Heap’s East Prospect of the
City of Philadelphia (fig. 20).
These estimates were used to extrude the 2D drawing into a 3D model. The
ground area from the court house to Front Street was lowered linearly corresponding to a measured drop in elevation of 2.1 m using Google Earth’s elevation database.*
Windows and doors were modeled for the court house but not for other buildings, as
previous research indicates that such precision is only needed very close to acoustic sources and receivers for outdoor models (Mori et al., 2011). The area directly
around the court house was modeled as a series of planes 1.5 m high representing the crowd listening to Whitefield. The crowd was modeled with a total area of
about 1000 m², corresponding to the estimates of Whitefield’s Philadelphia crowds
as about 6,000 people (Tyerman, 1877) and Franklin’s assumed density of about
0.186 m² per person. This yielded a final CAD model (fig. 21) that could be imported into CATT-Acoustic, a geometrical acoustic modeling program.
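The scale and crowd-area figures above can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the original modeling workflow; the function names are illustrative. It converts the 100-foot street width to metres and multiplies the reported crowd size by the assumed area per person.

```python
FOOT_IN_M = 0.3048  # exact definition of the international foot


def feet_to_metres(feet: float) -> float:
    """Convert a length in feet to metres."""
    return feet * FOOT_IN_M


def crowd_area_m2(people: int, area_per_person_m2: float) -> float:
    """Ground area occupied by a crowd at a uniform density."""
    return people * area_per_person_m2


if __name__ == "__main__":
    # Values taken from the text: 100 ft street width, 6,000 people,
    # 0.186 m^2 per person.
    print(f"Street width: {feet_to_metres(100):.2f} m")
    print(f"Crowd area:   {crowd_area_m2(6000, 0.186):.0f} m^2")
```

The product comes to roughly 1,100 m², which is consistent with the modeled crowd area of "about 1000 m²" given the uncertainty in both the crowd estimate and the density.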
In CATT the absorption coefficients for all acoustic surfaces were obtained
from the publicly available ODEON library,† and are shown in table 2.
* http://www.google.com/earth, accessed 7/22/2014.
† available from www.odeon.dk, accessed 7/22/2014.
Figure 21: AutoCAD model of Market Street area, extruded from Clarkson-Biddle
map: (a) seen from East; (b) seen from above
Table 2
Absorption coefficients by octave band center frequency (Hz) for each material used
in Market Street model
Surface                   63    125   250   500   1000  2000  4000
Brick, 19 holes, 60 mm    0.14  0.14  0.28  0.45  0.90  0.45  0.65
Gravel                    0.25  0.25  0.60  0.65  0.70  0.75  0.80
Audience area             0.60  0.60  0.74  0.88  0.96  0.93  0.85
Windows                   0.35  0.35  0.25  0.18  0.12  0.07  0.04
Solid wooden door         0.14  0.14  0.10  0.06  0.08  0.10  0.10
A source was placed at Whitefield’s position with an IEC male standard spectrum, facing outward. It was determined in Chapter III that existing voice directivity
datasets were sufficient for this simulation, and so the CATT standard male spoken
voice directivity pattern was applied to the virtual source. A receiver representing Franklin was positioned in the center of Market Street, 1.5 m from the edge
of Front Street as previously discussed. Franklin’s position was 1.75 m above the
ground level, corresponding to his known height. Whitefield’s exact height is not
known, but he was described as being of medium height, so he was also modeled at
1.75 m above the court house steps. This gave a linear distance of 121.6 m between
source and receiver, approaching Franklin’s position at a vertical angle of 2.6°.
Sound Attenuation Simulation
In an ideal free field, sound, like other wave phenomena, experiences inverse-square
attenuation simply from spreading out in space. When plotted on a logarithmic decibel (dB) scale, this is often referred to as “6 dB per distance doubling,” following
the free field attenuation formula (eq. 2). This equation is based on spherical source
radiation with no reflecting surfaces present. Free field conditions are often used
as an approximation to the acoustic conditions present in outdoor locations because
of the lack of reinforcing reflection buildup found in interior spaces. In the case of
Market Street, however, the initial sound is highly channeled toward Franklin’s position due to the court house behind Whitefield and the buildings lining the street on
either side. These reinforcing reflections negate free field behavior near the source,
but the overall decay will approach free field conditions at a sufficient distance from
the source.
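As a point of reference for the free field behavior described above, the following is a minimal sketch of spherical spreading loss. Equation 2 is not reproduced in this section, so the code assumes the standard form 20 log10(r/r0) relative to a 1 m reference; the function name is illustrative.

```python
import math


def free_field_attenuation_db(distance_m: float, reference_m: float = 1.0) -> float:
    """Level drop (dB) from spherical spreading between reference_m and distance_m.

    Assumes an ideal point source with no reflections and no air absorption.
    """
    if distance_m <= 0 or reference_m <= 0:
        raise ValueError("distances must be positive")
    return 20.0 * math.log10(distance_m / reference_m)


if __name__ == "__main__":
    # 2 m illustrates one distance doubling; 121.6 m is the source-receiver
    # distance given in the text.
    for distance in (2.0, 4.0, 121.6):
        loss = free_field_attenuation_db(distance)
        print(f"{distance:6.1f} m: {loss:5.1f} dB below the 1 m level")
```

Each doubling of distance adds about 6 dB of loss, and under these assumptions the loss over the 121.6 m path from Whitefield to Franklin is a little under 42 dB.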
The modeling environment, CATT-Acoustic, offers three different CT algorithms based on the density of cones and length of impulse response desired
for modeling purposes (B. Dalenback, 2011). The first algorithm is optimized
for speed, while the others are slower and use a higher density of cones. While
the slower algorithms are recommended for detailed interior auralizations, in tests
the simulated levels at Franklin’s position from all three algorithms were identical
within 0.1 dB. This is an advantage of CT algorithms with respect to RT: high ray
counts are needed for RT applications at large distances due to the decreasing relative size of a receiver (Borish, 1984), but cones constitute a uniform solid angle at
all distances. Thus in an outdoor case with no reverberation, even a low-order CT
algorithm will yield a good model.
[Plot: Market Street attenuation compared to free field. Curves: Market Street; Free Field. Vertical axis: Attenuation (dB), −10 to −45. Horizontal axis: Distance (m) from source, logarithmic, 10^0 to 10^3.]
Figure 22: Predicted logarithmic attenuation from Whitefield to Franklin
The overall Z-weighted pressure attenuation along Market Street was tested
in the computer model by placing virtual receivers along the center of the street,
beginning at 4 m from Whitefield’s position, and doubling the distance until 128
m, just beyond Franklin’s position. As expected, this shows an initial decay less
steep than that of a free field, while at greater distances its overall slope becomes
very close to that of a free field decay (fig. 22). It should be noted that the virtual
model also includes some high-frequency attenuation from air absorption that is not
accounted for in equation 2, indicating that the deviation from a free field model is
slightly greater even than shown here. The model predicts that Whitefield’s vocal
SPL at Franklin’s position was about 7 dB greater than it would have been at an
equivalent distance in a more open environment.
Speech Intelligibility
After this initial analysis, it is useful to consider closely the acoustic system consisting of Whitefield’s voice, the Market Street area, and Franklin himself, since even
though reflections may increase the overall level they may still reduce speech intelligibility, as in the case of highly reverberant rooms. By examining the full-spectrum
pressure-squared echogram at Franklin’s position (fig. 23), it is evident that Whitefield’s voice is mainly aided by a single strong reflection from the court house itself
behind him. The other reflections from surrounding buildings are more spread out
in time and much weaker due to longer transmission paths and additional loss from
surface absorption. Most importantly, the principal reflection reaches Franklin’s
position only about 40 ms after the initial wavefront, within the 50 ms limit traditionally accepted as the cutoff time for reflections to enhance rather than degrade
speech intelligibility (Bradley, Reich, & Norcross, 1999).
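The timing argument above can be made concrete with a short sketch. It is illustrative only: it assumes the common linear approximation for the speed of sound in air, c ≈ 331.3 + 0.606 T (T in °C), and uses the 4.5 °C temperature adopted for the model later in this chapter. The function names are hypothetical.

```python
def speed_of_sound_m_s(temp_c: float) -> float:
    """Approximate speed of sound in air (m/s) using a linear fit in temperature."""
    return 331.3 + 0.606 * temp_c


def extra_path_m(delay_s: float, temp_c: float) -> float:
    """Additional path length travelled by a reflection arriving delay_s after the direct sound."""
    return delay_s * speed_of_sound_m_s(temp_c)


def is_useful_reflection(delay_s: float, limit_s: float = 0.050) -> bool:
    """True if the reflection arrives within the limit traditionally taken to aid speech."""
    return delay_s <= limit_s


if __name__ == "__main__":
    delay = 0.040  # principal reflection delay from the echogram, in seconds
    print(f"Extra path length: {extra_path_m(delay, 4.5):.1f} m")
    print(f"Within 50 ms limit: {is_useful_reflection(delay)}")
```

Under these assumptions a 40 ms delay corresponds to a path roughly 13 m longer than the direct one, which is plausible for a reflection off a building directly behind the speaker.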
The STI requires not only the reflection data for a given source-receiver combination, but also the background noise level at the receiver (Houtgast et al., 1980),
and for this reason it was used to index Whitefield’s loudness rather than a purely
time-based measure like C50. STI uses both reflection data and background noise
to calculate the overall signal degradation, which is pegged to an effective signal to
noise ratio. This is then used to calculate the numerical STI quantifier from 0-1 for a
series of octave bands. These bands are averaged using a weighting function for the
non-linear frequency response of the auditory system to produce a single quantifier
for a given acoustic system’s intelligibility. Before applying any STI calculations,
it is necessary to approximate the background noise that Franklin described that
“obscur’d” Whitefield’s voice.
Figure 23: Summed pressure-squared echogram from Whitefield to Franklin
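The link between effective signal-to-noise ratio and the 0-1 index described above can be sketched for a single band. This is a deliberate simplification and not the calculation performed by CATT-Acoustic: it assumes steady background noise is the only degradation (no reflections), and it omits the full method's set of modulation frequencies, octave-band weighting, and auditory masking corrections. The function names are illustrative.

```python
import math


def modulation_index(snr_db: float) -> float:
    """Modulation transfer value when steady noise is the only degradation."""
    return 1.0 / (1.0 + 10.0 ** (-snr_db / 10.0))


def transmission_index(m: float) -> float:
    """Map a modulation transfer value to a 0-1 transmission index.

    The effective SNR is clipped to the range -15 dB to +15 dB before scaling.
    """
    if m <= 0.0:
        return 0.0
    if m >= 1.0:
        return 1.0
    effective_snr_db = 10.0 * math.log10(m / (1.0 - m))
    clipped = max(-15.0, min(15.0, effective_snr_db))
    return (clipped + 15.0) / 30.0


if __name__ == "__main__":
    for snr in (-9.0, -6.0, -3.0, 0.0):
        ti = transmission_index(modulation_index(snr))
        print(f"SNR {snr:5.1f} dB -> index {ti:.2f}")
```

In this noise-only, single-band simplification, index values of 0.2, 0.3, and 0.4 correspond to effective signal-to-noise ratios of −9, −6, and −3 dB, so each 0.1 step in the index is worth about 3 dB of signal level.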
Background Noise
Chapter IV concluded that the two most likely candidates for a noise source on
Front Street were either a conversation around the corner or a horse and carriage
moving down the street. The time-averaged levels for both of these sources were
found to be very similar: measurements of conversational speech (Boren & Roginska, 2013) matched closely the IEC standard of 59.5 dBA for normal speech
(B. Dalenback, 2011). Several different horses and carriages were measured at
approximately 1 m on the gravel circle around the Cherry Hill Fountain in New
York’s Central Park. The time-average (Leq) of these measurements ranged from
60-63 dBA. The largest measured LAeq values were used in the model and are listed
in table 3.
Table 3
Octave band averaged sound pressure (dB) at 1 m for both background noise sources
Source                     125   250   500   1000  2000  4000  8000  16000
IEC Normal Vocal Effort    51    57    60    54    49    44    39    34
Maximum Carriage Noise     57    54    58    60    56    56    51    45
The exact noise level at Franklin’s position is trickier to determine, however. A single point source relatively far down Front
Street would be attenuated nearly 30 dB according to a purely theoretical model.
However, this would have made the level at Franklin’s position slightly above 30
dBA, similar to that of a modern recording studio, at a time when he was complaining about the noise level, which seems suspect. Even at times of relative quiet
outside, wind alone can generate noise around 40 dBA, which is still low enough
to qualify for acoustical sustainability credits in educational buildings.* It is not
possible to measure a similar minimum level on site today since motorized traffic
and US Interstate 95 now significantly increase the noise level at the intersection to
70-80 dBZ. Because of this, a more general framework was developed for “near”
and “far” sources experiencing respective attenuation of 10 dB and 15 dB, depending on their distance to Franklin’s position. This yielded background noise levels
of about 50 and 45 dBA, depending on the exact octave-band results used from table 3. Because the carriage contained more energy in the most important octave
bands for STI (2-4 kHz (Steeneken & Houtgast, 1980)), this virtual source yielded
a louder virtual Whitefield. While these higher frequency bands would be more
subject to atmospheric air absorption than the lower bands, at a short distance (20
m or less) these effects would be about 1 dB or less, depending on the humidity and
temperature (ANSI, 2009).
* http://www.usgbc.org/leed, accessed 7/22/2014.
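The “near” and “far” background levels can be approximated from the octave-band values in table 3. The sketch below is illustrative rather than a reproduction of the model: it assumes the table 3 values are unweighted band levels, applies the standard A-weighting corrections at octave-band center frequencies, sums the bands on an energy basis, and then subtracts the 10 dB and 15 dB attenuations uniformly across all bands. The function and variable names are hypothetical.

```python
import math

# Standard A-weighting corrections (dB) at octave-band center frequencies (Hz).
A_WEIGHTING_DB = {
    125: -16.1, 250: -8.6, 500: -3.2, 1000: 0.0,
    2000: 1.2, 4000: 1.0, 8000: -1.1, 16000: -6.6,
}

# Octave-band levels (dB) at 1 m from table 3, assumed here to be unweighted.
IEC_NORMAL_SPEECH = {125: 51, 250: 57, 500: 60, 1000: 54,
                     2000: 49, 4000: 44, 8000: 39, 16000: 34}
MAX_CARRIAGE = {125: 57, 250: 54, 500: 58, 1000: 60,
                2000: 56, 4000: 56, 8000: 51, 16000: 45}


def a_weighted_level(band_levels_db: dict) -> float:
    """Energy sum of octave-band levels after applying A-weighting, in dBA."""
    total = sum(10.0 ** ((level + A_WEIGHTING_DB[freq]) / 10.0)
                for freq, level in band_levels_db.items())
    return 10.0 * math.log10(total)


def level_at_receiver(source_dba: float, attenuation_db: float) -> float:
    """Broadband level after a frequency-independent attenuation."""
    return source_dba - attenuation_db


if __name__ == "__main__":
    for name, bands in (("speech", IEC_NORMAL_SPEECH), ("carriage", MAX_CARRIAGE)):
        source = a_weighted_level(bands)
        near = level_at_receiver(source, 10.0)
        far = level_at_receiver(source, 15.0)
        print(f"{name}: {source:.1f} dBA at 1 m, near {near:.1f}, far {far:.1f}")
```

With these assumptions the speech row sums to within a fraction of a decibel of the 59.5 dBA quoted for normal speech, giving near and far levels of about 50 and 45 dBA.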
Atmospheric Conditions
The temperature during Whitefield’s sermon is not known exactly. Whitefield
recorded preaching from the court house steps on the 8th, 9th, and 10th of November, 1739 in his first trip to Philadelphia (Tyerman, 1877). Franklin recorded an
additional sermon on the following Sunday, November 11th (Franklin, 1739). The
US National Climatic Data Center does not possess any recorded temperature data
for Philadelphia prior to 1767.* Peter Kalm, the Scandinavian botanist, traveled
throughout the American colonies ten years later and spent time in Philadelphia
during November of 1749. During this time he recorded the morning and afternoon temperatures using the newly-developed Celsius thermometer (Kalm, 1770).
His recorded temperatures give an average of about 5 °C, similar to the November
temperature in Philadelphia today.† While this is not determinative of the weather
ten years prior, this clue was used to base the temperature and humidity data in
the model on current normal values for Philadelphia in November: 4.5 °C and 50%
humidity, respectively.
* http://www.ncdc.noaa.gov/, accessed 7/22/2014.
† http://weatherspark.com/averages/31282/11/10/Philadelphia-Pennsylvania-United-States, accessed 7/22/2014.
Results
The computer model was used to simulate the minimum SPL on-axis 1 m from
Whitefield’s mouth that would be necessary to generate a minimal value of the STI
based on “near” and “far” background noise sources. The value of the STI that
defines the threshold of intelligibility varies among different people: since STI is
a classifier of the external acoustic system to the listener, actual subjective intelligibility will depend on the listener’s hearing perception. An STI of 0.3 is usually
seen as the value at which intelligibility becomes “bad” (Hodgson, 2002). Because
of this, STI values from 0.2 (corresponding to better than average hearing) to 0.4
(corresponding to below average hearing) were simulated. There is no evidence that
Franklin had hearing loss, and he was still relatively young (33) at the time. Because of this, the “normal” threshold of 0.3 is probably the best measure. However,
the higher and lower values are included as well, since this variable can later be included in simulations of Whitefield’s crowds in London by using the minimum STI
(instead of the maximum intelligible distance) to estimate how many people with
identical hearing (i.e. how many Benjamin Franklins) could have heard Whitefield
speak, as Franklin’s original experiment did.
Table 4
Simulated LAeq values (dB) for Whitefield’s voice based on background noise
distance and minimum STI value
Vocal Noise
Minimum STI:     0.2   0.3   0.4
Close Source:    86    90    93
Far Source:      81    85    88
Carriage Noise
Minimum STI:     0.2   0.3   0.4
Close Source:    90    95    99
Far Source:      85    90    95
Table 4 shows the simulated values of Whitefield’s on-axis SPL, in dBA, for
both possible noise sources. It will be noted that every estimate is greater than the
IEC standard “Loud” voice, which is about 74 dBA. Many of the estimates, depending on their assumptions, posit that Whitefield’s voice was much louder than this
standard, up to and beyond 90 dBA. The highest estimates are probably unreasonable, and greatly exceed the maximum on-axis vocal Leq in the existing literature, as
discussed in Chapter II. However, the more moderate estimates of a time-averaged
Leq of 90 dBA closely match the maximum measured values.
Discussion
The only acoustical factor not accounted for by a geometric model is atmospheric
refraction. This may occur due to changes in wave velocity throughout a medium
due either to wind or varying temperature (Piercy, Embleton, & Sutherland, 1977).
Wind data are of course not obtainable for the day in question, but based on both
Franklin’s and Whitefield’s references to the quiet of the scene, it seems reasonable
to assume the day was not particularly blustery, especially since Whitefield often
mentioned the wind if it was an acoustical factor (Dallimore, 1970). Temperature
gradient data are likewise unavailable, but under normal conditions the temperature
is either fairly steady or is in a “lapse” state, in which air gets colder with increasing
elevation. This causes sound waves to bend upward and attenuate faster, and this
effect would have been increased due to Whitefield’s elevated position. This means
that adding any refraction effects into the model would only increase the simulated
SPL for Whitefield’s voice, which already approaches or exceeds the existing maximum measured values of vocal SPL discussed in Chapter II. The opposite effect,
a temperature inversion that carries sound farther than normal, is rare and occurs
chiefly at night or early morning, or sometimes after a rainstorm (Ross, 1999).
Since there is no evidence of a temperature inversion, the total refraction effects on
Franklin’s experiment may be presumed to be minimal.
It is known that training (Mendes et al., 2003) and youth (Kent et al., 1987)
both contribute to maximum vocal output, and Whitefield had both on his side as
he spoke for hours per day though only 24 years old at the time. While simulated
SPL values greater than those verified experimentally should be viewed with caution, the computer model predicts that Whitefield’s average SPL during Franklin’s
experiment could have exceeded 90 dBA at 1 m, indicating that Whitefield might
well have been one of the loudest people that ever lived.
CHAPTER VI
MAXIMUM AVERAGED AND PEAK VOCAL SPLS
In the previous chapter we showed that acoustical models of Franklin’s experiment suggest an estimate of Whitefield’s average vocal SPL at 1 m from 81-99
dBA. These estimates were roughly evenly distributed around a median value of 90
dBA, with values from 85-95 dBA projected for most combinations of noise level
and STI minimum. The survey of existing literature in Chapter II showed a limit
of about 90 dBA for time-averaged measurements. To test this further, it will be
useful to make a series of vocal SPL measurements for trained actors and singers to
compare their maximum levels.
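The summary statistics quoted at the start of this chapter can be checked against the twelve simulated values in table 4. The following is a minimal sketch; the dictionary layout and names are illustrative, and the values are transcribed from table 4 (noise source, source distance, then minimum STI of 0.2, 0.3, and 0.4).

```python
from statistics import median

# Simulated LAeq values (dBA) at 1 m, transcribed from table 4.
TABLE_4 = {
    ("vocal", "close"): (86, 90, 93),
    ("vocal", "far"): (81, 85, 88),
    ("carriage", "close"): (90, 95, 99),
    ("carriage", "far"): (85, 90, 95),
}


def all_estimates(table: dict) -> list:
    """Flatten the table into a single list of level estimates."""
    return [value for row in table.values() for value in row]


def count_in_range(values: list, low: float, high: float) -> int:
    """Number of estimates falling within [low, high] inclusive."""
    return sum(1 for v in values if low <= v <= high)


if __name__ == "__main__":
    estimates = all_estimates(TABLE_4)
    print(f"Range:  {min(estimates)}-{max(estimates)} dBA")
    print(f"Median: {median(estimates)} dBA")
    print(f"Within 85-95 dBA: {count_in_range(estimates, 85, 95)} of {len(estimates)}")
```

The flattened values span 81 to 99 dBA with a median of 90 dBA, and ten of the twelve combinations fall between 85 and 95 dBA.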
In Chapter III, vocal production modes were not found to have a significant
effect on acoustic radiation pattern, but some vocalists believed that certain vocal
placements were more effective at reaching an audience than others. It was hypothesized that this was because different vocal resonances might correspond to
differences in overall sound pressure rather than directivity. Absolute SPLs of subjective dynamic levels measured in a controlled environment could examine this
theory further.
The maximum peak and average SPL values for trained vocalists are important because training can significantly increase potential vocal output (Mendes et
al., 2003; Awan, 1991). The IEC standard of vocal level for acoustic simulation
assumes a 3-level scheme of ‘normal,’ ‘raised,’ and ‘loud,’ with the top category
corresponding to an Leq of about 74 dBA (B. Dalenback, 2011). But Whitefield’s
example suggests that there exists some headroom above this designation, corresponding to an even louder ‘maximal’ level of speech.
Even if this level is only achievable by some trained vocalists, this is important to consider since trained vocalists are disproportionately represented in
recording studios, concert halls, and theatres. For recording engineers, knowing
the maximum peak and average SPLs producible by vocalists can be useful for
pre-calibrating vocal microphones. For live sound engineers, acoustic conditions
may be very different for a very loud trained vocalist and require less loudspeaker
reinforcement as a result.
Method
Pilot Study
To first investigate the possible average maximum level of the spoken voice, a pilot
study was undertaken using one professional actor and one professional actress.
This is too small a sample size to extrapolate aggregated data. However, since the
goal of this research was principally to investigate the extreme outliers, this study
was meant only to explore the maximum levels achievable by trained vocalists. The
vocalists were measured in the live room of the James Dolan Recording Studio at
NYU. The room’s dimensions are 9 m by 4.6 m by 3 m. The vocalists were aligned to
exactly 1 m in front of an on-axis Micro-SPL measurement condenser microphone
attached to an XL2 Sound Level Meter, logging peak and averaged values of LA
and LZ in 1 s intervals. The meter’s sensitivity range was set to Lp values of 30-130
dB. The vocalists were not restrained in any way, but were observed to keep still
during the measurements.
Both speakers were instructed to recite a short monologue, about 30 to 60 s in
length, from memory at three different subjective loudness levels: ‘conversational’
speech, ‘theatrical’ speech (defined as the level necessary to be intelligible to an
audience in a small theatre with no amplification system), and ‘maximal’ speech
(defined as the loudest achievable without shouting or screaming).
While some studies have used noise played over headphones to induce a
higher SPL out of vocalists (Akerlund et al., 1992), for some vocalists this condition has actually reduced maximum sound pressure (Gramming et al., 1988). Since
all vocalists measured for this study were highly trained, no headphones or noise
conditions were used, ensuring that the participants felt natural during the experiment.
The levels recorded will be reported chiefly in dBA, though the LZ values
were generally within 1 dB of the LA values. Both speakers in the pilot study
achieved levels comparable to the ‘loud’ designation in (B. Dalenback, 2011) at
their ‘theatrical’ level, and both were able to produce an average level slightly over
90 dBA at their ‘maximal’ level (table 5). The corresponding Z-weighted values
were about 0.5 dB greater for both vocalists.
Table 5
Leq values for pilot study, in dBA, for Conversational, Theatrical, and Maximal Levels
Participant     Conv.   Thea.   Max.
P1 - actor      64.1    77.9    90.1
P2 - actress    65.3    73.4    90.7
Table 6
Lpk values for pilot study, in dBA
Participant     Conv.   Thea.   Max.
P1 - actor      93.6    106.0   113.2
P2 - actress    92.3    98.7    113.3
Table 6 shows the highest peak value measured for each speech designation.
It can be seen that the peak value in a given monologue was routinely 20-30 dB
greater than the average level for that period. To investigate this difference, we
define the peak spread of a given vocal measurement as follows:
\[ S_{pk} = 20 \log_{10}\left(\frac{p_{pk}}{p_{eq}}\right) = L_{pk} - L_{eq} \qquad (19) \]
where Spk is the peak spread, ppk is the peak pressure, and peq is the average
pressure. Table 7 gives the peak spread for both participants in the pilot study based
on the A-weighted levels. It can be seen that Spk decreased with increasing vocal
level, such that the ‘maximal’ voice contained the least variation in pressure.
Table 7
Spk values for pilot study, in dBA
Participant     Conv.   Thea.   Max.
P1 - actor      29.5    28.1    23.1
P2 - actress    27.0    25.3    22.6
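The peak spread of equation 19 can be computed either from levels or from pressures. The sketch below is illustrative: it recomputes the table 7 values from the Lpk and Leq values in tables 5 and 6, and confirms that the pressure-ratio form gives the same result. The function names and the use of the 20 µPa reference pressure for the conversion are assumptions of this example.

```python
import math

P_REF = 20e-6  # reference sound pressure in Pa (assumed for the level/pressure conversion)

# (Leq, Lpk) pairs in dBA from tables 5 and 6: conversational, theatrical, maximal.
PILOT_LEVELS = {
    "P1 - actor": [(64.1, 93.6), (77.9, 106.0), (90.1, 113.2)],
    "P2 - actress": [(65.3, 92.3), (73.4, 98.7), (90.7, 113.3)],
}


def peak_spread_db(l_pk: float, l_eq: float) -> float:
    """Peak spread as the difference between peak and equivalent levels (dB)."""
    return l_pk - l_eq


def level_to_pressure(level_db: float) -> float:
    """Convert a sound pressure level (dB re 20 µPa) to pressure in Pa."""
    return P_REF * 10.0 ** (level_db / 20.0)


def peak_spread_from_pressures(p_pk: float, p_eq: float) -> float:
    """Peak spread from the ratio of peak to average pressure (dB)."""
    return 20.0 * math.log10(p_pk / p_eq)


if __name__ == "__main__":
    for participant, pairs in PILOT_LEVELS.items():
        spreads = [round(peak_spread_db(l_pk, l_eq), 1) for l_eq, l_pk in pairs]
        print(participant, spreads)
```

Both forms are equivalent because a difference of levels in decibels is the logarithm of a pressure ratio, so the reference pressure cancels.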
Spoken and Sung Voice
The pilot study showed that trained vocalists could indeed reach average levels of
90 dBA, and raised other interesting questions about the relationship between
peak and average values in vocal output at different levels. After this, a broader
series of measurements was conducted on 9 trained singers in the same space, using
the same setup and equipment. The vocalists included 6 females (5 sopranos and
1 mezzo soprano) and 3 males (2 baritones and 1 tenor). The same three spoken
voice designations were measured for these participants.
In addition, the singers also sang a piece, about 30 to 60 s long, from their
repertoire at three different dynamic levels: pianissimo (pp), mezzo forte (mf), and
fortissimo (ff). While it is known that frequency content is strongly correlated with
short-term SPL (Coleman et al., 1977), the singers were merely instructed to select
a piece from their actual repertoire that they could sing as loudly as possible while
retaining a ‘musical’ tone. Frequency of the highest note was not a criterion for
selection, but singers were instructed to retain the original key of the pieces to
ensure that the pieces were not artificially amplified by modulation. To investigate
the role of different vocal resonances, the vocalists sang all three dynamic levels
using both the ‘back’ and ‘mask’ voices, for 6 total sung measurements. The ‘back’
voice places the primary vocal resonance at the rear of the vocal tract, similar to a
yawn in its most extreme form. The ‘mask’ voice uses the resonances of the sinus
cavities at the front of the head.
Appendix A lists the full Leq and Lpk for the spoken, back sung, and mask
sung voices for the 9 singers, along with each vocalist’s numerical ID and vocal
range.
Analysis
Average Levels for Speech
The mean of the 9 singers’ spoken levels was 59.0 dBA for ‘conversational’ speech,
69.9 dBA for ‘theatrical’ speech, and 79.6 dBA for ‘maximal’ speech. Each of these
values increases by 1-2 dB if the two actors from the pilot study are included in
the data, indicating that a population of all trained actors may achieve even higher
mean values. Even with singers, however, the mean ‘maximal’ LAeq was still about
6 dB higher than the ‘loud’ level of 74 dBA used in the IEC standard (B. Dalenback,
2011). Individual vocalists were able to exceed the ‘loud’ level by up to 15 dB.
Gender Differences
The mean Leq values were higher among male vocalists than female vocalists at the
highest vocal levels for both speech and sung conditions, consistent with previous
research (Kent et al., 1987). For the sung conditions the males showed a larger
dynamic range overall, as their mean Leq was lower for the pianissimo condition,
but this may be a consequence of the small sample size of male singers (n=3). This
is not to say that females cannot produce equally high levels – in fact, the highest
recorded Leq for speech, 90.7 dBA, was produced by the actress in the pilot
study.
Spoken Levels For Singers
Another interesting aspect of the recorded data is the differences between sung and
spoken data for the trained singers. While previous studies have conclusively shown
that vocal training can lead to higher maximum SPL (Mendes et al., 2003; Akerlund
et al., 1992), this effect was lessened for some singers during the spoken conditions,
as many of the singers produced maximum Leq values that were much lower than
those of the two actors. In fact, vocalists 3, 5, and 6 had maximum values of spoken
Leq that were 5 or more dB lower than the Leq for their pianissimo mask voice! This
was not the case for other singers, such as vocalist 7, who was able to produce sung
and spoken maximums of Leq similar to the levels produced by the trained actors.
This suggests that some trained singers may have different mental frameworks for
spoken vs. sung voice, which increases their maximum sung SPL more than their
maximum spoken SPL.
Peak Spread
Figure 24: Mean peak spread, dBA
After the pilot study, it had been anticipated that the peak spread would be reduced as the SPL of the measured signals increased. However, the mean spread for
all vocalists measured actually shows a slight increase in peak spread with increasing SPL for both sung conditions (fig. 24). The highest mean Spk value for speech
was found at the medium dynamic (‘theatrical’ level). This persisted whether calculating the mean for all eleven vocalists or just for the nine singers. While the
two actors’ peak spread was greatly reduced for their loudest speaking voice, the
singers increased their sung Lpk slightly faster than their Leq as their dynamic level
increased.
Standard Deviation by Level
Figure 25: Standard Deviation for the 9 Singers, dBA
In addition to measuring the range of an individual’s pressure variations from
the mean, it is also helpful to examine the total variance in Leq across all the singers
via the standard deviations of the A-weighted levels (fig. 25). The spoken voice
conditions show a clear increase in standard deviation with increasing level, indicating that the singers’ SPLs were more dispersed as they spoke with greater effort.
Interestingly, this trend was not observed in the standard deviations for either of the
sung voices, which stayed in a similar range at all three levels. It is possible that
because the subjects were primarily trained as singers rather than speakers, they had
more precision as a group in their sung levels than in their spoken levels.
Back vs. Mask Levels
Figure 26: Mean dB Increase from Back to Mask Voice
Figure 26 shows the average dB difference between the ‘mask’ voice and the
‘back’ voice. As had been hypothesized, trained singers usually interpreted the
same dynamic levels at lower SPLs for the ‘back’ voice. This difference is greatest
at pianissimo and decreases with increasing dynamic level. Participants 4, 5, and
6 each showed a difference greater than 7 dB for pianissimo. It is possible this
difference in subjective level stems from the greater damping of the ‘back’ voice
due to its placement in the rear of the vocal tract. Since singers may judge their
dynamic level based more on vocal effort than absolute SPL, an equivalent vocal
effort may lead to lower relative output pressure for the back voice.
Discussion
When comparing maximum SPL measurements in the literature, averaged and peak
levels should be distinguished based on the nature of the experiment. Both past
studies and this current experiment have yielded maximum Leq values of 90-91 dB,
as well as maximum Lpk values in the range 110-114 dB. The difference between
peak and average values fluctuates between about 20 and 30 dB, and it may possibly
behave differently for trained actors versus trained singers.
For the purposes of simulating George Whitefield’s voice, this study confirms
that averaged values of around 90 dBA are perfectly possible 1 m from a speaker.
While it is conceivable that his voice may have been louder than any of the vocalists
measured so far, any estimates above this measured maximum should be viewed
with caution until they can be experimentally verified.
CHAPTER VII
MODELING THE SITES OF WHITEFIELD’S LONDON CROWDS
The final step to investigate Whitefield’s maximum crowd size and check
Franklin’s own estimate is to model the acoustic propagation of Whitefield’s voice
at the locations in London where his largest crowds were reported: the Moorfields,
Kennington Common, and Mayfair. The material and geometric composition of
these sites in Whitefield’s time must be established in order to model accurately the acoustic system at each site.
Locations
Moorfields
Background
The Moorfields was a park in London outside the Moorgate near the homes of
many of Whitefield’s most devoted followers, near where both Whitefield’s and
John Wesley’s devotees would later build their headquarters (Dallimore, 1970). Its
wide open space functioned as something of a city mall and attracted the detritus of society. This was literally true in the case of the “not inodorous” heaps of
refuse and open sewers that began to accumulate there in the seventeenth century
(Thornbury, 1878). It was also metaphorically true in the case of the lower class of
society who gathered there for “bear-baiting, merry-andrew shows, wrestling, cudgel playing and dog fights” (Dallimore, 1970). Though more respectable Anglican
clerics avoided the area for this very reason, Whitefield, the consummate evangelist,
saw only lost souls in need of his message of the new birth. Whitefield’s open-air
preaching in London had begun in late April, 1739, when the leaders of St Mary’s
church at Islington had refused to allow him to preach there after initially inviting him. Whitefield took this refusal as license to head outdoors, and he promptly
preached to a crowd that gathered in the churchyard outside (Tyerman, 1877).
On Sunday, April 29 (all dates from the Julian Calendar), Whitefield ventured
out into the Moorfields for the first time. According to John Gillies, who wrote the
first biography of Whitefield,
Opportunities of preaching in a more regular way being now denied
him, and his preaching in the fields being attended with a remarkable
blessing, he judged it his duty to go on in this practice, and ventured
the following Sunday into Moorfields. Public notice having been given,
and the thing being new and singular, upon coming out of the coach,
he found an incredible number of people assembled. Many had told
him that he should never come again out of that place alive. He went
in, however, between two of his friends ; who, by the pressure of the
crowd, were soon parted entirely from him, and were obliged to leave
him to the mercy of the rabble. But these, instead of hurting him,
formed a lane for him, and carried him along to the middle of the fields,
(where a table had been placed, which was broken in pieces by the
crowd,) and afterwards back again to the wall that then parted the upper
and lower Moorfields ; from whence he preached without molestation,
to an exceeding great multitude in the lower fields. (Gillies, 1772)
This first crowd at Moorfields was estimated at 10,000 people, but these numbers would grow as Whitefield began preaching there each Sunday while he was in
London. The next week, his crowd was estimated at 20,000, and the week after
(May 13) he reported:
Preached this morning to a prodigious number of people in Moorfields
and collected for the orphans £52 19s. 6d., above £20 of which was
in half-pence. Indeed, they almost wearied me in receiving their mites
and they were more than one man could carry home. Went to public
worship twice and preached in the evening to near sixty thousand people. Many went away because they could not hear, but God enabled
me to speak so that the best part of them could understand me well,
and it is very remarkable what a deep silence is preserved while I am
speaking. (Whitefield, 1756)
The quote above would seem to indicate that Whitefield’s evening crowd,
estimated at 60,000, was observed at the Moorfields. However, Whitefield’s publicist William Seward the next day reported crowds of 50,000 at the Moorfields and
60,000 at Kennington Common (Lambert, 1994). It was Whitefield’s usual practice
to preach at Moorfields in the morning and Kennington in the evening, so this may
be the case. However, on such occasions in his journal he usually named both locations specifically. If Seward’s account is correct, the highest reported crowd for
the Moorfields is then 50,000. Under the alternate interpretation, the estimate of
60,000 would be attributed to the Moorfields rather than Kennington Common.
Figure 27: Inset of John Rocque’s 1746 Map of London showing the Moorfields
Modeling
Because of Whitefield’s specific attachment to this area, there is also more specific
historical data available as to his position there. The Moorfields (fig. 27) was divided into three portions, designated as the lower, middle, and upper Moorfields.
The lower Moorfields (today Finsbury Circus) was the largest portion, with more
trees and greenery shown on John Rocque’s map of the area, based on surveys carried out from 1737 to 1746.* Gillies’s account is the most specific reference to
the exact spot where Whitefield preached at any of the sites in London. However,
* http://www.motco.com/map/81002/
the upper and lower Moorfields proper are not directly adjacent to one another, so
Gillies must be referring to either the border between the upper and middle Moorfields or the middle and lower Moorfields. It will be noted from Rocque’s map that
the lower Moorfields’ northern edge is made of a line of trees, thus lessening the
audience that could have heard Whitefield if he had preached there. In addition,
William Denton’s account of the area mentions a “low wall” separating the upper
and middle Moorfields but mentions no wall between the middle and lower Moorfields (Denton, 1883). Though they were slightly smaller, the upper and middle
portions contained more wide open space and were nearer to Whitefield’s tabernacle north of the upper Moorfields. Thus it seems likely that Gillies misspoke and
was referring to the wall between the upper and middle Moorfields.
The Moorfields was modeled geometrically using a Sketchup rendering of
the ground area based on existing data from the Google Maps database (fig. 28).
Because the area of Whitefield’s preaching has since been developed, no elevation
data was available except for the heights of the buildings present there today. To
check the topography, points were selected from streets around the area on each
side, which showed that the area was quite flat and lacked any significant changes
in elevation. This was confirmed by a foot survey of the area as it exists today.
The tree lines in the Moorfields were modeled as shown in the Rocque map,
with an assumed height of 5 m. Since unnecessary planes can reduce the accuracy of
a geometrical acoustical model (Rindel, 2010) and are not generally recommended
unless they are very close to an acoustic source (Mori et al., 2011), the tree lines
were simplified as planes that were 10% acoustically absorbent and 30% acoustically transparent with a large mid-frequency scattering coefficient of 0.5, allowing
half of incident sound through and providing few specular reflections. As this was
not an important London neighborhood, few period prints or drawings depict the
Figure 28: Sketchup Model of the Upper and Middle Moorfields
exact arrangement of the bordering areas next to the Moorfields. The buildings lining the edges of the Moorfields were modeled three stories (about 10 m) tall with
a sturdy wood construction on their facades with a low mid-frequency scattering
coefficient of 0.1. The absorption data for the wooden buildings surrounding Moorfields are shown in table 8. It will be noted, however, that the precise reflective
characteristics of these buildings only become acoustically relevant if it is predicted
that the edges of the open ground could hear Whitefield clearly. The audience occupying the entire area of the site was modeled based on an average density of 2
persons per square meter (absorption data shown in table 8). The question of the
correct density will be addressed in Chapter VIII, but as this is the densest audience
value for which measured absorption data exist, it is not possible to use a denser
value at any rate. In addition, since humans are more efficient absorbers at low
densities, the extra Sabine absorption per m² will not increase greatly once the density is
already high (Meyer, 2009). Moreover, since these environments are
reasonably free field and lack reverberation effects, even a slight change in crowd
absorption should not significantly affect the STI calculations, which depend much
more on the direct sound level and background noise in these cases.
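The reliance on direct sound and background noise can be illustrated with a minimal sketch, which is not part of the acoustic model itself. It assumes ideal spherical spreading from a point source (a loss of about 6 dB per doubling of distance) and deliberately ignores air absorption, vocal directivity, and the STI calculation. The function names and example levels are hypothetical choices made for illustration.

```python
import math


def direct_level_db(level_at_1m_db, distance_m):
    """Direct sound level at distance_m, assuming ideal spherical spreading.

    Assumption: point source in a free field, no air absorption or directivity.
    """
    if distance_m <= 0:
        raise ValueError("distance_m must be positive")
    return level_at_1m_db - 20.0 * math.log10(distance_m)


def snr_db(level_at_1m_db, noise_db, distance_m):
    """Signal-to-noise ratio between the direct sound and a uniform noise floor."""
    return direct_level_db(level_at_1m_db, distance_m) - noise_db


if __name__ == "__main__":
    # Illustrative values only: a 90 dBA source at 1 m against 50 dBA of noise.
    for distance in (1, 10, 50, 100):
        print(distance, round(snr_db(90.0, 50.0, distance), 1))
```

Under these idealized assumptions a 90 dBA source heard against 50 dBA of noise reaches a signal-to-noise ratio of 0 dB at 100 m. The full simulations include the factors omitted here, so their results should be expected to differ.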
Table 8
Absorption coefficients for buildings and crowds at the Moorfields
Frequency (Hz):         125    250    500    1000   2000   4000
Wood Absorption:        0.11   0.07   0.03   0.01   0.01   0.02
Audience Absorption:    0.26   0.46   0.87   0.99   0.99   0.99
Whitefield’s mouth was modeled at a height of 1.75 m, standing atop a stone
fence of 1 m in height in the center of the border between the upper and middle
Moorfields. The crowd was also modeled at a height of 1.75 m, giving Whitefield an effective
height of 1 m above his listeners and ensuring a direct line of sight to those in the crowd because of
the flatness of the area. The total area of the entire Moorfields region is about 22
acres (89,000 m²), but it will be noted that the combination of the tree line and the
concavity of the outer border significantly reduces the area that would have had a
direct line of sight to Whitefield’s preaching position.
Kennington Common
Background
Kennington Common (today called Kennington Park), near the Manor of Vauxhall,
was the most wide open of the three sites. Like the Moorfields, it had gained a
reputation as a dangerous section of the city because of its history as an execution
ground. It was the home of “vicious sports and drunken brawlings,” a place where
“the harlot and pick-pocket sought the victims of their trades, and...the mob assembled, ready for any act of violence” (Dallimore, 1970). Whitefield was again
drawn to such a large unreached audience, and he himself joined the spectacle by
preaching there regularly. On May 6, 1739, he spoke to a crowd estimated at 50,000,
and on June 3, to another possibly larger (Whitefield, 1756). As mentioned before,
depending on the interpretation of Whitefield’s journal entry, it is possible that his
crowd of May 13, 1739 (estimated at 60,000) was observed at Kennington Common
instead of the Moorfields.
Modeling
Figure 29: Map of Kennington Manor, including the Common, based on Hodskinson
and Middleton’s survey, 1785
Kennington Common was unfortunately not included in Rocque’s map of the
city, but another map of the Common and surrounding area (segments 11 and 12
in fig. 29) shows that the common occupied essentially the same space as the park
does today. This makes Kennington Park the least-developed and best-preserved of
the sites being modeled.
As with Moorfields, Kennington was modeled in Sketchup based on the Google
Maps data for the area (fig. 30). It is the most topographically varied of the three
London sites, containing a slight hillock towards its center. This is still a very
slight variation from a perfectly flat plain, however. Based on period images and
descriptions, the Common did not have buildings close to it that could act as potential boundary reflectors. Since the area was a Common in 1739 (as opposed to
a park today), Kennington would have been used for livestock grazing and would
have most likely lacked trees, unlike its current layout. The extant map of Kennington Common also does not indicate any trees, and thus none were included in the
Sketchup model. The lack of trees or buildings makes Kennington the most wide
open of the three sites investigated, containing also the most raw area for fitting in
a large crowd independent of acoustical factors.
Figure 30: Modeling Kennington Common in Sketchup
Mayfair
Background
On June 1, 1739, Whitefield reported that he
...preached in the evening, at a place called Mayfair, near Hyde Park
Corner. The congregation, I believe, consisted of near eighty thousand
people. It was by far the largest I ever preached to yet. In the time
of my prayer there was a little noise, but they kept a deep silence during my whole discourse. A high and very commodious scaffold was
erected for me to stand upon, and though I was weak in myself, yet
God strengthened me to speak so loud, that most could hear, and so
powerfully, that most, I believe, could feel. (Dallimore, 1970)
The region he described, called Mayfair (fig. 31), is now known for being
one of the most expensive areas in London (partially because of its top position in
the British version of the board game Monopoly*). While its high land values have
led to extensive development there today, in Whitefield’s day it was still a wide-open area named for the traditional fair that had been held there in May since the
sixteenth century in the fields outside St. James’s Hospital (Walford, 1878).
Modeling
Situated between Hyde Park on its west and Piccadilly to its south, Mayfair in Whitefield’s day had an overall area of about 23 acres (93,000 m²), but much of the area
in its southwest corner, closest to Hyde Park, would have been obscured by the
* http://www.propertywire.com/news/europe/uk-properties-monopoly-prices-201306277943.html, accessed 7/22/2014.
Figure 31: Inset of John Rocque’s 1746 Map of London showing Mayfair
presence of Chesterfield House (fig. 32), a large manor surrounded by a high wall,
as well as another smaller walled estate to the east. Both would have reduced
the total usable area, by their subtractive presence as well as by their shadowing effect, obscuring sound paths from the main fair location to the northeast.
Unlike the other sites, Whitefield preached at Mayfair only once during his
annus mirabilis of 1739. While he preached to relatively large reported numbers
at other sites around London and the rest of Britain, the numbers reported are
small compared to those recorded for Moorfields and Kennington Common. But
this number of 80,000 was by far the largest ever reported for Whitefield’s crowds,
and though it was a single incident, it deserves investigation solely because of the
audacity of its claim. Psychologically, it is possible that a similar-sized crowd in a
Figure 32: Unsigned wood print of Chesterfield House, 1760
setting unfamiliar to Whitefield and his followers would have seemed perceptually
larger. However, Whitefield had also moved from the margins of the city (both
socially and geographically) to the center, so it seems at least plausible that this
may have actually been the largest crowd he attracted. The fact that a scaffold was
specifically constructed for his visit to this site may also indicate a larger degree of
planning and perhaps a larger crowd. The final Mayfair Sketchup model is shown
in figure 33.
Because this was not a regular spot for Whitefield’s preaching, there is much
less recorded evidence about where exactly he was positioned during the sermon.
The mention of the scaffold constructed for him suggests that perhaps he was located toward the northeastern corner of the area, near the site of the historical fair.
There is no historical data recording the location of a platform or other speaker’s
position at the fair to the author’s knowledge. Based on the shape of the area and the
directivity of the voice, Whitefield was positioned within the model in front of the
Figure 33: Sketchup Model of Mayfair
only building in the area, near the street that led to Berkeley Square (fig. 31). This
position would have opened up the largest range of his voice to the crowd gathered
based on the shape of the site’s bordering buildings. While there was a small area
behind him which would have been occluded by this building, his voice would have
been aided by a strong early reflection from the building behind him, similar to that
from the court house doors in Philadelphia. The building facades were modeled
as wood with absorption properties identical to those used in the Moorfields, and
the crowd was also constructed of the same density (table 8). The walls enclosing
Chesterfield House and the other walled estate were modeled as 3 m tall and made
of brick, with the absorption coefficients shown in table 9.
Table 9
Absorption coefficients for brick walls near Mayfair
Frequency (Hz):      125    250    500    1000   2000   4000
Brick Absorption:    0.14   0.28   0.45   0.90   0.45   0.65
These three models can be used for a more rigorous analysis of Whitefield’s
preaching at each site. Separate simulations will allow a better understanding of the
factors that apply to Whitefield’s preaching in general or only to specific sites. In
addition, other variables may be altered to investigate their significance to Whitefield’s total intelligible range. The next chapter describes the simulation process
and the final estimates for Whitefield’s crowds.
CHAPTER VIII
SIMULATIONS OF WHITEFIELD’S SERMONS IN LONDON
Simulation Results: Base Conditions
To test the range of Whitefield’s voice, each of the three sites was simulated in an
acoustic computer model. Each simulation was carried out under ‘base’ conditions:
11.5° Celsius, 50% humidity, with Whitefield’s orientation directly forward (0° elevation). After the base conditions had been evaluated, these three variables were
altered systematically to predict any differential changes that these factors might
have had on the previous simulations. In the Mayfair model, where Whitefield
mentioned preaching from a scaffold, additional simulations also investigated the
role of added height to his maximum crowd size.
We may define the Minimally Intelligible Area (MIA) for a given model as
the amount of area at that site for which the STI is greater than or equal to the
defined minimum STI value. Each site’s MIA was simulated using three different
source SPLs: 85 dBA, 90 dBA, and 95 dBA. Since Franklin’s method incorporated
both Whitefield’s vocal level and Franklin’s hearing acuity into a single measurement, the simulated loudness for Whitefield is dependent on Franklin having normal
hearing, defined as a lower bound of intelligibility at STI = 0.3. There is no record
of Franklin having hearing loss, but if he had slightly worse than normal hearing
(lower bound of STI at 0.4), Whitefield’s voice would have had to have been about
5 dB louder, or about 95 dBA. Conversely, if Franklin’s hearing was slightly better
than normal (lower bound of STI at 0.2), Whitefield’s voice may have been 5
dB lower, or about 85 dBA. Since the model’s crowd is essentially populated with
virtual Benjamin Franklins, this can be addressed in the simulations by simultaneously adjusting the source SPL and the minimum STI threshold. This allows the
simulation to take into account a wider range of factors while retaining the same
data Franklin originally measured. The pairing of a noisier crowd and a louder
Whitefield also addresses the Lombard Effect, which is the tendency of humans
to subconsciously raise their voices in the presence of greater background noise
(Pick, Siegel, Fox, Garber, & Kearney, 1989), since Whitefield would likely have
achieved his greatest vocal levels when the crowd was noisier. The estimated crowd
that could fit within such an area will depend finally on a density estimate, which
will be addressed later on.
Sound level measurements were taken at each of the three sites during a visit
in summer 2013. The Mayfair area, being close to Hyde Park Corner, had considerable noise almost continually from tourists and motorized traffic. The Moorfields
area was quieter but still is completely developed at the site of Whitefield’s preaching and may not be an accurate indicator of ambient noise levels in Whitefield’s
time. Kennington Common, however, remains in a similar condition to its original
layout and in its southeastern end is perceptually free from traffic noise. During
the periods of relative quiet (interrupted only by planes overhead leaving Heathrow
airport) the ambient noise level there was measured to be as low as 50 dBA. It is
possible that the city was quieter in 1739, or that Whitefield’s auditors were not as
silent as he often described them to be, so simulations were carried out for background noise levels of 45, 50, and 55 dBA.
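The pairing of source level with minimum STI, crossed with the three background noise levels, gives nine base conditions per site. A minimal sketch of that bookkeeping follows. The dictionary keys and the function name are illustrative assumptions; the thresholds are those stated above (0.2, 0.3, and 0.4 for 85, 90, and 95 dBA).

```python
from itertools import product

# Source level (dBA at 1 m) paired with the minimum STI counted as intelligible.
STI_THRESHOLD_BY_SPL = {85: 0.2, 90: 0.3, 95: 0.4}

# Background noise levels (dBA) simulated at each site.
NOISE_LEVELS_DBA = (45, 50, 55)


def base_conditions():
    """Return the nine (noise, source, threshold) combinations, noise level first."""
    return [
        {
            "source_dba": spl,
            "noise_dba": noise,
            "min_sti": STI_THRESHOLD_BY_SPL[spl],
        }
        for noise, spl in product(NOISE_LEVELS_DBA, sorted(STI_THRESHOLD_BY_SPL))
    ]


if __name__ == "__main__":
    for condition in base_conditions():
        print(condition)
```

Ordering the combinations by noise level and then by source level matches the panel order used in the STI map figures for each site.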
The simulations were performed with an acoustic CT algorithm in CATT-Acoustic v9.0 (B. Dalenback, 2011). For each simulation, site plots were generated
showing the projected STI values over the included area. As with the Philadelphia
simulations, various cone densities were used in initial tests, which showed no effect
on the final calculation because of the free field nature of the environment. Because
of this, low cone densities (about 10,000 cones) were used in each simulation to
reduce processing time. CATT exported a grid of 2 m x 2 m squares for each site
with the projected STI value for each square. A customized MATLAB script was
used to calculate the amount of area with STI above or equal to a given input value
for each of the vocal SPL conditions.
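The original analysis used a customized MATLAB script that is not reproduced here. The following Python sketch shows one way the same area count could be made, assuming the exported grid is available as rows of STI values with None marking squares that lie outside the site. The grid values and all names are invented for illustration.

```python
# Each exported grid square is 2 m x 2 m, so it contributes 4 m^2 of area.
CELL_AREA_M2 = 2.0 * 2.0


def minimally_intelligible_area(sti_grid, min_sti, cell_area_m2=CELL_AREA_M2):
    """Sum the area of all grid squares whose STI is at least min_sti.

    sti_grid: iterable of rows; each entry is an STI value in [0, 1],
              or None for a square outside the modeled site (an assumption
              about the data layout, not the CATT export format).
    """
    count = sum(
        1
        for row in sti_grid
        for sti in row
        if sti is not None and sti >= min_sti
    )
    return count * cell_area_m2


if __name__ == "__main__":
    # Hypothetical 3 x 3 grid, not taken from any simulation.
    toy_grid = [
        [0.62, 0.41, 0.30],
        [0.45, 0.29, None],
        [0.31, 0.18, None],
    ]
    for threshold in (0.2, 0.3, 0.4):
        print(threshold, minimally_intelligible_area(toy_grid, threshold))
```

Because each square covers 4 m², any area computed this way is a multiple of 4 m², which is consistent with the MIA values tabulated in this chapter.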
Moorfields
The simulated MIA for the Moorfields under base conditions is shown in table 10.
Recall that the minimum STI simulated decreases as the vocal SPL increases, such
that the signal-to-noise ratio is not the only determinant of the MIA. It will be
seen that within the range of background noise levels considered, the MIA is quite
sensitive to changes in noise level, decreasing by over 75% with a 10 dB noise
increase. This suggests that, similar to the accounts in his journals, Whitefield
might have been able to reach far greater crowds when they remained relatively
quiet. As his preaching became more of a spectacle he attracted larger numbers of
people, but the number that could have heard him clearly may have decreased as a
result.
In contrast, we see that the absolute SPL of Whitefield’s voice, once coupled
to the hearing of our crowd of virtual Benjamin Franklins, does not have a large
effect on the simulated MIA. A 10 dB increase in Whitefield’s vocal SPL from 85
to 95 dBA (corresponding to an increase in minimal STI from 0.2 to 0.4) increases
MIA by only 5% at low crowd noise levels.
Table 10
Moorfields simulated MIA (m²) for each vocal SPL and background noise level
Noise Level     Vocal SPL
                85 dBA      90 dBA      95 dBA
45 dBA          38,508      39,412      40,372
50 dBA          22,508      25,124      29,084
55 dBA           8,164      10,304      14,036
As the background noise level increases, the MIA gains from a higher vocal SPL increase to about 30% at 50 dBA,
and about 72% at 55 dBA. This suggests that at low noise levels the linked STI-Vocal SPL system works nearly linearly, but as the background noise increases,
there are higher gains associated with a louder voice. This may seem surprising as
physical acoustics is essentially linear, but psychoacoustics is not, which accounts
for the differences shown here: as the overall SPL increases (due to both Whitefield’s voice and the crowd’s background noise), the auditory system will admit
more sound from frequency bands other than the 1-4 kHz region most important to
speech intelligibility. Thus the overall STI value may decrease as these frequency
bands mask the 1-4 kHz region, causing the MIA to decrease as well.
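The percentage gains quoted above follow directly from table 10. As a check, the sketch below recomputes the gain in MIA from an 85 dBA to a 95 dBA source at each noise level. The variable and function names are illustrative only.

```python
# Moorfields MIA in m^2 from table 10: noise level (dBA) -> vocal SPL (dBA) -> area.
MOORFIELDS_MIA_M2 = {
    45: {85: 38508, 90: 39412, 95: 40372},
    50: {85: 22508, 90: 25124, 95: 29084},
    55: {85: 8164, 90: 10304, 95: 14036},
}


def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return 100.0 * (new - old) / old


def vocal_gain_percent(table, noise_dba, low_spl=85, high_spl=95):
    """MIA gain at one noise level when the source rises from low_spl to high_spl."""
    row = table[noise_dba]
    return percent_change(row[low_spl], row[high_spl])


if __name__ == "__main__":
    for noise in sorted(MOORFIELDS_MIA_M2):
        print(noise, round(vocal_gain_percent(MOORFIELDS_MIA_M2, noise), 1))
```

The results are 4.8%, 29.2%, and 71.9% at 45, 50, and 55 dBA respectively, in line with the rounded figures cited in the text.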
Figure 34 shows the STI maps for all nine combinations of source and background noise levels. STI values from 0-0.3 are usually classified as ‘bad’, 0.3-0.4
as ‘poor’, 0.4-0.5 as ‘fair’, 0.5-0.65 as ‘good’, 0.65-0.8 as ‘very good’, and 0.8-1.0
as ‘excellent’ (Hodgson, 2002). Observe that the STI shows a cardioid-like directivity pattern, similar to those seen for the voice at mid and high frequencies. This
is a result of the STI’s weighting function, which emphasizes the octave frequency
bands at 1, 2, and 4 kHz to model the non-linear loudness sensitivity of the human
auditory system (Houtgast et al., 1980). Though low frequencies are the least attenuated
by air absorption, they are also the least important to the STI’s value over the
simulated audience plane.
For the extreme cases of 95 dBA source level and 45 dBA noise level, the MIA
is projected to cover most of the upper and middle Moorfields except those portions
occluded by treelines. This suggests that in a wider area under these “perfect storm”
conditions, a single voice could reach an even larger area. However, as mentioned
before, no experimental data has shown the existence of a voice that can sustain an
Leq of 95 dBA, so these highest figures should be viewed with caution, as should
the lowest noise level of 45 dBA, which is extremely quiet and would be unlikely
to be sustained in a large crowd for an extended period of time. However, the more
modest center condition of 90 dBA vocal level and 50 dBA background noise yields
an impressive area of over 25,000 m². As the background noise level might fluctuate,
the best estimate for all these simulations is probably between the center area of
25,000 m² and the more modest figure of about 10,000 m² for the 55 dBA noise
level.
Based on these more moderate source and noise level assumptions, Whitefield’s voice is not projected to be minimally intelligible toward the edges of the
Moorfields, which suggests that the area acts as a reasonably free field environment for the purposes of reflection tracing. The building reflections would only
reinforce the STI when listeners were very close to them so that the reflections arrived soon after the direct sound (usually within 50 ms). Farther from the buildings
(that is, closer to Whitefield) the reflections could conceivably degrade the STI, but
by that point the sound would be much more diminished than at the edge of the
crowd, where it was already small in comparison to the background noise. After
the effects of high frequency air absorption and normal intensity drop-off, these reflections would not be strong enough to significantly affect STI closer to the source.
Figure 34: Simulated STI at Moorfields for different background noise conditions. Panels: (a)-(c) 45 dBA noise with 85, 90, and 95 dBA source; (d)-(f) 50 dBA noise with 85, 90, and 95 dBA source; (g)-(i) 55 dBA noise with 85, 90, and 95 dBA source.
This suggests that the exact absorption coefficients for the buildings surrounding
the Moorfields are not a significant factor as long as they are in the range of fairly
absorbent materials (wood or brick) which are assumed to have been in use at that
time. Much more reflective surfaces (e.g. a Palladian classical marble facade) may
have been able to return enough sound intensity to slightly affect the MIA, but there
is no period evidence of such a structure to the author’s knowledge.
Kennington Common
The simulated base condition MIA is shown for Kennington Common in table 11.
Many of the same patterns observed for the Moorfields can be seen in the projections for Kennington: increases in background noise quickly shrink the MIA for a
given vocal level, and increases in vocal level have a somewhat weaker effect that
is diminished further at low noise levels.
Table 11
Kennington simulated MIA (m²) for each vocal SPL and background noise level
Noise Level     Vocal SPL
                85 dBA      90 dBA      95 dBA
45 dBA          57,212      63,408      67,872
50 dBA          22,476      27,292      36,512
55 dBA           7,572       9,612      13,508
The MIA for Kennington Common under each vocal and noise condition is
shown in figure 35. It can be seen that the upper bound on the MIA under the most
generous acoustic conditions is merely the boundary of the common itself. If the
model were to extend to the roads and adjacent land plots shown in figure 29, these
simulations would likely give an even larger estimate for the MIA. However, as
mentioned previously, these conditions are included more out of theoretical curiosity than a realistic expectation that they represented the full acoustic system during
Whitefield’s sermons.
Figure 35: Simulated STI at Kennington for different background noise conditions. Panels: (a)-(c) 45 dBA noise with 85, 90, and 95 dBA source; (d)-(f) 50 dBA noise with 85, 90, and 95 dBA source; (g)-(i) 55 dBA noise with 85, 90, and 95 dBA source.
The most reasonable range of MIA values (defined again as the vocal level
of 90 dBA, from 50 to 55 dBA crowd noise) is similar to that for Moorfields, as we
would expect since in those ranges both can be considered to be relatively free field.
The slight differences between the two may be traced to the greater topographical
variation in Kennington Common and the tree lines at Moorfields, which block some
direct sound paths. Under the least generous condition (85 dBA vocal level and 55 dBA
crowd noise) the MIA is 7,572 m², slightly lower than but similar to the Moorfields
figure of 8,164 m².
Mayfair
The projected MIA for each vocal and crowd noise level is shown in table 12 for
Mayfair under base conditions. The same general trends may be observed in the
simulations for Mayfair as for the other two sites, indicating that the space functions
as a relatively free field under moderate acoustic conditions. Only under the lowest
noise condition (45 dBA) are there major differences between sites, because at that
noise level Whitefield’s voice is projected to fill most of whatever site it is placed
in. Moorfields and Mayfair, both smaller and more closed in, thus have an upper
bound to their MIA based only on area and independent of acoustical factors. While
Mayfair’s upper bound is significantly smaller than Kennington’s, it is still greater
than Moorfields’ largest MIA value by over 50%. Figure 36 shows the simulated
STI for Mayfair under each pair of noise and vocal levels.
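The observation that the sites differ substantially only at the lowest noise level can be checked against tables 10 through 12. The sketch below uses the 90 dBA column of each table and computes the spread between the largest and smallest site MIA, relative to the smallest. The names are illustrative assumptions.

```python
# MIA in m^2 at a 90 dBA vocal level, keyed by site and then noise level (dBA),
# taken from tables 10, 11, and 12.
MIA_AT_90_DBA_M2 = {
    "Moorfields": {45: 39412, 50: 25124, 55: 10304},
    "Kennington": {45: 63408, 50: 27292, 55: 9612},
    "Mayfair": {45: 58024, 50: 26964, 55: 9812},
}


def site_spread_percent(mia_by_site, noise_dba):
    """Spread between largest and smallest site MIA, as a percent of the smallest."""
    values = [by_noise[noise_dba] for by_noise in mia_by_site.values()]
    return 100.0 * (max(values) - min(values)) / min(values)


if __name__ == "__main__":
    for noise in (45, 50, 55):
        print(noise, round(site_spread_percent(MIA_AT_90_DBA_M2, noise), 1))
```

The spread is 60.9% at 45 dBA but only 8.6% and 7.2% at 50 and 55 dBA, which agrees with the description of the sites as relatively free field under moderate conditions.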
Based on the potential audience area at each site, it is clear that Kennington,
if filled, could have hosted a larger crowd than that at Mayfair. Yet specific details
about the extent of each crowd are usually lacking in the historical accounts, and
it may be that the huge crowd reported at Mayfair filled a much greater portion of
that site’s area on a single occasion. However, Whitefield usually preached at a
given site regularly, allowing larger crowds to form with each visit,
which suggests that the Mayfair estimate may be overly generous.
Table 12
Mayfair simulated MIA (m²) for each vocal SPL and background noise level
Noise Level     Vocal SPL
                85 dBA      90 dBA      95 dBA
45 dBA          53,448      58,024      61,660
50 dBA          22,336      26,964      35,240
55 dBA           7,588       9,812      13,916
Other Factors
Having investigated the general relationship between noise, vocal level, and MIA
for all three sites under base conditions, it is also useful to expand the variables
investigated to estimate their significance to the final MIA simulation. First environmental factors will be investigated, followed by geometric factors. Since under
moderate acoustic conditions the sites were all found to behave similarly, each variable will be addressed for a single site, with the differential changes to the base
condition model presented according to a change in the variable being
addressed. Each of the sites addressed in this section was simulated using a vocal
level of 90 dBA and a crowd noise level of 50 dBA, so the differential effects may
differ under other vocal and noise level conditions.
Figure 36: Simulated STI at Mayfair for different background noise conditions. Panels: (a)-(c) 45 dBA noise with 85, 90, and 95 dBA source; (d)-(f) 50 dBA noise with 85, 90, and 95 dBA source; (g)-(i) 55 dBA noise with 85, 90, and 95 dBA source.
Environmental Factors
Temperature
Historical weather data before the nineteenth century is rare, as was the case in
the model of Philadelphia’s Market Street. However, the British HadCET dataset
contains average monthly temperatures for London dating back to 1698 (Manley,
1953). Unfortunately its day-to-day mean temperatures do not begin until 1772,
but even as estimates these monthly figures provide a better starting point than is
otherwise available to us for a project of this nature since so many empirical data
points have been lost to the passage of time.
The dataset’s value of about 11.5° Celsius was used for the base conditions
above. The differential effect of temperature was then estimated for a model of the Moorfields by running separate simulations every 2 degrees Celsius for values higher and
lower than the base model’s temperature. Table 13 shows the relative effects on the
simulated MIA for changes in the model’s temperature.
Table 13
Simulated changes in MIA resulting from changes in temperature in Moorfields

Δ °C      Δ MIA (m²)
-4        -4620
-2        -244
 0         0
+2        +216
+4        +440
+6        +624

Since the speed of sound in air (or any medium) is dependent on the ambient
energy of the particles that constitute that medium (i.e. its temperature), the theoretical acoustic impedance of a sound wave is also dependent on temperature, though
less so than it is on humidity (ANSI, 2009; Harris, 1966). The differential analysis
shows that over a wide range of temperatures from 9.5° C to 17.5° C the overall
simulated MIA is fairly constant (within a range of 1000 m²). However, for a very
low temperature of 7.5° C the simulated MIA does show significant change, with a
projected drop of 4620 m².
This result seems striking but requires two caveats: first, the average temperature figure for May 1739 of about 11.5° C is already lower than would be expected
for London in May,* and if the HadCET dataset is not reliable, it is possible a
warmer value may be more appropriate. For instance, the average temperature for
June 1739 in the dataset is about 4° warmer than that for May, and Whitefield was
speaking on the cusp of the two months. Secondly, in real life scenarios temperature
and humidity are correlated over certain intervals, but here they are being treated
as independent variables. In an outdoor acoustic environment, as the temperature
dropped closer to freezing we would normally expect the humidity to approach
zero, which would decrease atmospheric absorption from water vapor, increase the
audible range of a sound source, and thus increase the MIA for the site. Since this
is only projected at extremely low temperatures, it seems safe to treat the MIA simulation as a reasonably good estimate based on the temperature data available to
us.
* http://www.metoffice.gov.uk/public/weather/climate/city-of-london-greater-london, accessed 7/22/2014.
Humidity
Air humidity is the single most important environmental factor in determining the
acoustic absorption of sound by air. The water vapor particles in air serve as an absorbent obstacle for high frequencies whose wavelengths are very small, and over
large enough distances this causes significant high frequency attenuation independent of normal free field intensity loss. At very low humidities, the air is so dry
that air absorption loss is very small and high frequencies travel much farther. The
high frequency loss per unit distance is highest around 10-20% humidity, as shown
in figure 37. As the humidity increases beyond this point, the additional water particles in the air transfer sound more efficiently between one another and thus high
frequency attenuation decreases more or less monotonically as humidity increases.
As the high frequency bands are very important to the calculation of STI, changes
in humidity will have a significant effect on the projected MIA.
Figure 37: Atmospheric absorption at different humidity levels, from (Harris, 1966)
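The humidity dependence described above can be illustrated numerically. The following is a minimal sketch, assuming the closed-form pure-tone absorption expression of the kind published in ISO 9613-1; it is not the code used for the simulations reported here and may differ in detail from the ANSI method cited above. The function name, the standard-pressure default, and the 4 kHz example frequency are assumptions chosen for illustration.

```python
import math

def air_absorption_db_per_m(freq_hz, temp_c, rel_humidity_pct, pressure_kpa=101.325):
    """Pure-tone atmospheric absorption in dB/m (ISO 9613-1 style expression)."""
    T = temp_c + 273.15           # absolute temperature, K
    T0 = 293.15                   # reference temperature, K
    T01 = 273.16                  # triple-point temperature of water, K
    p_rel = pressure_kpa / 101.325

    # Saturation vapour pressure relative to the reference pressure
    p_sat_rel = 10.0 ** (-6.8346 * (T01 / T) ** 1.261 + 4.6151)
    # Molar concentration of water vapour, in percent
    h = rel_humidity_pct * p_sat_rel / p_rel

    # Relaxation frequencies of oxygen and nitrogen, Hz
    fr_o = p_rel * (24.0 + 4.04e4 * h * (0.02 + h) / (0.391 + h))
    fr_n = p_rel * (T / T0) ** -0.5 * (
        9.0 + 280.0 * h * math.exp(-4.170 * ((T / T0) ** (-1.0 / 3.0) - 1.0)))

    # Classical, oxygen-relaxation, and nitrogen-relaxation contributions
    classical = 1.84e-11 / p_rel * (T / T0) ** 0.5
    oxygen = 0.01275 * math.exp(-2239.1 / T) / (fr_o + freq_hz ** 2 / fr_o)
    nitrogen = 0.1068 * math.exp(-3352.0 / T) / (fr_n + freq_hz ** 2 / fr_n)
    return 8.686 * freq_hz ** 2 * (classical + (T / T0) ** -2.5 * (oxygen + nitrogen))

# Example: 4 kHz loss over 100 m at the base temperature of 11.5 deg C
for rh in (10, 20, 30, 50, 80):
    loss = 100.0 * air_absorption_db_per_m(4000.0, 11.5, rh)
    print(f"{rh:3d}% RH: {loss:.2f} dB per 100 m")
```

Under these assumptions the 4 kHz loss is greatest near 20% relative humidity and falls steadily at higher humidities, which is the behavior the differential simulations rely on.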
The nonlinear relationship between humidity and air absorption makes definite high frequency loss calculations difficult when humidity data is not known,
since the same loss factor can often correspond to two different humidities. However, the driest humidity levels are expected during winter or within arid climates,
neither of which is generally associated with late spring in southern England.
Since London’s current average humidities for May range from 50% to 60%,* it
seems safe to consider humidity levels greater than 20%, over which air absorption
should decrease consistently. Table 14 shows the projected differential changes in
MIA based on changes in humidity by decade of percentage points relative to 50%
humidity.
Table 14
Simulated changes in MIA for Moorfields resulting from changes relative to 50%
humidity

Δ % Humidity     Δ MIA (m²)
-30%             -3108
-20%             -1464
-10%             -628
+0%               0
+10%             +528
+20%             +884
+30%             +1192

* http://www.bbc.com/weather/2643743, accessed 7/22/2014.
As expected, the simulated MIA is much lower around 20% humidity and
increases monotonically with higher humidity. The rate of increase over this interval is greater in the lower humidity values (increasing MIA by about 1600 m²
from 20% to 30%) and decreases for each subsequent decade. While no mention of
cold or dry weather is made by Whitefield or his followers, they did occasionally
mention rain at their gatherings during this time period (Whitefield, 1756), which
suggests that the spring was not unusually dry.
However, beyond ruling out the extremes of very low humidity or 100% humidity (when we know it was not raining), there still remains a wide range of humidities possible for the dates being simulated for which we have little to no historical evidence. These simulations can help quantify the uncertainty in these ranges,
but given the empirical data available the best precision we can attain at this point
is to say that the base condition simulations should probably be given a margin of
error of ±1500 m² based on the environmental conditions on the day of the specific sermons preached. But based on current averages for London’s
humidity, our starting guess of 50% humidity still appears to be a good estimate for
a colder-than-average year such as 1739 based on the little data available.
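As a rough check on the size of this margin, the differential values in tables 13 and 14 can be combined. The following is a minimal sketch, assuming that the temperature and humidity effects simply add (the simulations varied each factor separately, so this is an illustrative assumption, not a simulated result) and that a "plausible" window spans 9.5-17.5 °C and 30-80% humidity (also a hypothetical choice).

```python
# Differential MIA values (m^2) from Tables 13 and 14
# (Moorfields, 90 dBA vocal level, 50 dBA noise).
DELTA_MIA_TEMPERATURE = {-4: -4620, -2: -244, 0: 0, 2: 216, 4: 440, 6: 624}   # key: change in deg C
DELTA_MIA_HUMIDITY = {-30: -3108, -20: -1464, -10: -628, 0: 0,
                      10: 528, 20: 884, 30: 1192}                            # key: change in % humidity

def combined_range(temp_steps, humidity_steps):
    """Return (lowest, highest) summed MIA change over the chosen table rows.

    Summing the two effects is an illustrative assumption only.
    """
    totals = [DELTA_MIA_TEMPERATURE[t] + DELTA_MIA_HUMIDITY[h]
              for t in temp_steps for h in humidity_steps]
    return min(totals), max(totals)

# Hypothetical plausible window: 9.5-17.5 deg C and 30-80 % humidity
low, high = combined_range(temp_steps=[-2, 0, 2, 4, 6],
                           humidity_steps=[-20, -10, 0, 10, 20, 30])
print(low, high)
```

Under these assumptions the combined change stays within roughly two thousand square meters of the base value in either direction, the same order as the margin suggested above.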
Geometric factors
Since the geometric arrangements used in the baseline acoustic predictions are only
estimates based on available historical data, it is useful also to consider the significance of slight changes in geometry on the final MIA simulations. The base
conditions assumed a directional orientation of 0° elevation (that is, dead-center)
for Whitefield’s mouth and a height of 1 m above the crowd for Moorfields and
Kennington and 5 m above the crowd for Mayfair. How might changes in Whitefield’s direction or height above the crowd have affected the range of his voice?
Directional orientation
Since the goal of these simulations is to estimate the average acoustic conditions
during one of Whitefield’s sermons, directly forward seems the most likely average
position of his head over time. However, the acoustic directivity of the voice is
sensitive to slight changes in angle, and it may be useful to investigate their significance in this context. The effect of slight changes in source elevation angle was
simulated for the Moorfields, with the change in MIA shown in table 15.
Table 15
Simulated changes in MIA resulting from changes in source elevation angle in
Moorfields

Δ Elevation Angle     Δ MIA (m²)
-1°                   -76
 0°                   +0
+1°                   -128
+2.5°                 -320

In general, these simulations suggest that slight changes in elevation angle
are much less significant to the final MIA estimate than other historical unknowns,
such as weather data. A flat source angle yields the greatest MIA, but slight changes
in average angle of ±1° decrease the intelligible area only slightly. At a more
elevated angle of 2.5°, the MIA decreases more because of the directivity of the
voice, which becomes more attenuated at positions lower than the mouth. This
effect is especially pronounced in the frequency range from 1-4 kHz, which is most
critical for speech intelligibility. Figure 38 shows the decrease in level off axis
with increasing frequency in the directivity pattern of the male voice dataset used
in these simulations (as might be expected, the lower frequency bands are more
omnidirectional).
Figure 38: Male vocal directivity pattern, in octave bands, used for Whitefield’s voice.
Panels: (a) 1 kHz, (b) 2 kHz, (c) 4 kHz.
Source Height
For the majority of his sermons Whitefield preached at informal locations, from
such elevation as could be arranged - a hillside in Blackheath, the wall in Moorfields, a table, a tree stump, and even a tombstone are all mentioned as substitute
pulpits that he used at some point (Dallimore, 1970; Gillies, 1772; Whitefield, 1756;
Lambert, 1994). For the Moorfields and Kennington, it seems likely that he had a
similarly modest elevation above his crowd since no other apparatus is ever mentioned in the accounts of his sermons. But in the case of his sermon at Mayfair he
specifically mentioned a tall scaffold constructed for him to preach from. This was
modeled as 5 m tall under the base condition simulation, but since a lower height or
a different angle may have had a significant effect on his vocal range, each height
from 1 to 5 m (in 1 m increments) and each source orientation elevation angle were
simulated to estimate the differential effects with respect to the base conditions.
It will be noted that these simulations did not include any model of the scaffold
itself, which could be more or less obstructing or reflecting based on its construction. Thus these examples investigate only the effects of the source geometry on
the final intelligible area. Figure 39 shows the simulated change in MIA for each
combination of height and angle.
Figure 39: Change in MIA based on source height and angle at Mayfair
While a first guess might have supposed that increasing height would have
increased the area that could be reached by Whitefield’s voice, these simulations
show the opposite: increasing height yields a lower MIA for any source orientation
angle. This is partly because the sites in question do not contain enough topographical diversity for new area to be reached with greater source height, and partly
because more height means more air absorption and intensity drop-off as the sound
must travel through a greater distance to the farthest listeners. It must be said also
that in the ideal system simulated here, 1 m of height was still enough to ensure
a perfect line of sight from Whitefield to each of the identical Benjamin Franklins
in the virtual audience. In a real audience with varying heights, some shorter individuals might benefit from a slightly larger height if it were enough to avoid sound
obstruction from other audience members.
Within a single height, raised source elevation angles show a decrease in MIA
at all elevations. As we would expect, however, the optimal source angle changes
as the scaffold gets higher. At 1 m above the audience (similar to the cases at
Moorfields and Kennington) a flat (0°) angle gives the greatest MIA value. As the
source height increases, the optimal angle decreases to -1° at 2 m and 3 m, -2° at 4
m, and roughly the same for -2° and -3° at 5 m. This is because at higher elevations
more of the lower-head attenuation seen in figure 38 becomes salient for audience
members if the head angle is kept flat.
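The shift toward downward optimal angles can be related to simple geometry. The following is a minimal sketch, assuming a flat site, source heights measured above the listeners' ears (as in the simulated audience), and a listener 100 m from the source; the 100 m distance and the function name are hypothetical choices made only for illustration.

```python
import math

def angle_to_listener_deg(source_height_m, horizontal_distance_m):
    """Elevation angle from the talker's mouth to a listener's ears.

    source_height_m is the height of the mouth above the listener's ears;
    a negative result means the listener lies below the horizontal axis.
    """
    return -math.degrees(math.atan2(source_height_m, horizontal_distance_m))

# Angle toward a distant listener for each simulated scaffold height
for height in range(1, 6):
    angle = angle_to_listener_deg(height, 100.0)
    print(f"height {height} m: listener at {angle:.2f} deg")
```

Under these assumptions a listener 100 m away sits about 0.6° below the horizontal for a 1 m source height and nearly 3° below it for a 5 m height, which is in line with the optimal angles moving from 0° toward -2° or -3° as the scaffold gets higher.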
Thus it seems that even if the height of the base condition example was correct, the estimated MIA should be increased somewhat to account for a likely downward source angle to better reach his audience. The extreme lowest height of 1 m
seems unlikely since Whitefield thought it important to mention that he was much
higher than normal. Without more quantification, we cannot say more specifically
which estimate is the best. But it seems safe to rule out the upward source angles to
begin with. Furthermore, we can notice that at the downward angles all the scaffold
heights converge somewhat into a smaller margin of uncertainty. If we guess that
Whitefield was at a height from 2-5 m and that he was speaking downward at an
angle of at least -2°, then we should increase the Mayfair MIA estimate by about
400-700 m².
Crowd Density
Having investigated thoroughly the MIA of Whitefield’s voice at each site under a
variety of noise, environmental, and geometrical factors, we are now left with the
question of how many people could actually fit into the area that Whitefield’s voice
could fill. As mentioned before, the simulations were carried out with the maximum
crowd density of 0.5 m² per person, though it was argued that greater density would
not greatly increase absorption or affect the final STI calculation significantly.
Though Franklin was a bit vague on some of the details of his experiment,
he did explicitly state that he used a density estimate of 2 ft² per person, or a little
less than 0.2 m² (Franklin, 1793). This is equivalent to a figure he used years later in another calculation published in Poor Richard Improved (Franklin,
1749). Franklin seems to have based these figures on the maximum number of people that could possibly have fit into a given area. It might be thought, therefore, that
to complete Franklin’s experiment all that is necessary is that we update the MIA
estimate using modern technology, substitute in Franklin’s original estimate, and be
done with it. However, the science of crowd estimation has also progressed since
Franklin’s time, and it is worth acknowledging the advances in that field as well.
In 1967 a newspaper reporter named Herbert Jacobs published one of the
first examinations of crowd estimation, motivated by the overenthusiastic “wild
guesses” he saw published by his colleagues (Jacobs, 1967). Jacobs meticulously
counted heads in photographs of actual crowd assemblies and found that the estimates given by event organizers and reporters were often much higher than the actual figure he was able to count. Given that both organizers and the media may have
an implicit bias toward larger estimates, Jacobs suggested that an Area ∗ Density
calculation might lead to a better overall estimate.
Jacobs’s method was later updated by Seidler et al. (Seidler, Meyer, & Gillivray,
1976) and Swank (Swank & Clapp, 1999) to include better accounting of variable
crowd density and sampling methods. A more recent update on the state of the art
in this field is given by (R. Watson & Yip, 2011). While the exact methodologies of
these studies cannot be adopted due to the lack of photographic evidence, they do
provide some insight into a reasonable density estimate over an entire crowd.
Under current crowd estimation techniques, a density of 4 persons per m²
(and, by extension, also Franklin’s estimate of about 5 persons per m²) is classified
as “mosh pit conditions” (R. Watson & Yip, 2011). These, as in Franklin’s account,
represent the most people that can be fit into a given space. However, they only
occur over very small areas and in social environments (such as their namesake) in
which being in direct contact with the people around oneself is acceptable. Even
supposing that close to Whitefield the excitement of his celebrity led to conditions
close to this, eighteenth-century notions of propriety suggest that even the densest
part of the crowd might leave more space between audience members than is found
in a modern mosh pit. In addition, since there is no more detailed evidence by
which to assign variable density levels, we are forced to do our best instead to find
an average density level for the entire crowd.
Another important consideration towards estimating an appropriate average
crowd density for 1739 London is the average size of the people themselves. While
estimating historical average population size (like average vocal sound pressure) is
tricky due to lack of certain types of written evidence (Wachter & Trussell, 1982), it
seems likely that the average male height in early eighteenth-century England was
about 165 cm (Komlos & Cinnirella, 2005), about 10 cm shorter than the average
British male today.* The smaller average size might suggest a concomitant greater
possible crowd density.
* http://www.theguardian.com/uk/2002/aug/28/science.research, accessed 7/22/2014.
However, in contrast, the English hoop skirt was in vogue in 1739 (Chrisman,
1996), and since Whitefield and the Methodist revivals in general tended to disproportionately attract women (Dallimore, 1970) this would have made a significant difference in the average area per person in Whitefield’s crowds. Even in the most dense
sections around Whitefield, hoop skirts would have contributed toward an upper
bound for density less than the figure Franklin used (which, it should be recalled,
he also applied to a model of soldiers standing in formation later on (Franklin,
1749)).
Certainly an average density estimate of 4-5 persons per m² seems unreasonable over areas as large as those investigated here. It seems likely that while greater
density pockets probably existed around Whitefield, the entire crowd would likely
have spread out to a comfortable interpersonal distance farther out. This overall
density is less dependent on individuals’ sizes and more on social conventions and
notions of propriety. Therefore it seems reasonable to look to modern crowd estimation literature for a best estimate of average crowd density.
Watson defines a lower average density of “strong” conditions as about 2
persons per m² (0.5 m² per person), which is in general the highest natural density
achieved by a large crowd over a significant amount of area (R. Watson & Yip,
2011). This seems more in keeping with the subjective descriptions of Whitefield’s
crowds* and thus will be adopted for the maximum crowd estimates for each of
Whitefield’s sites. Since Franklin’s density factor is much higher, for any given
MIA value the estimates adopted here may be multiplied by a constant value of

\[
\frac{0.5\,\mathrm{m}^2}{2\,\mathrm{ft}^2} \approx 2.7 \tag{20}
\]

to obtain the crowd values using Franklin’s original density factor.
* In a description in The Gentleman’s Magazine that year, the ‘computed’ value of 20,000 people
over 3 acres seems to assume a density value of about 1.6 persons per m², just slightly lower than
this value (“The Gentleman’s Magazine”, 1739)
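The unit conversions behind equation (20) and the accompanying footnote can be checked directly. The following is a minimal sketch; the variable names are illustrative, and the only inputs not taken from the text are the standard definitions of the international foot and acre.

```python
FT_TO_M = 0.3048                 # international foot, in meters
ACRE_TO_M2 = 4046.8564224        # international acre, in square meters

franklin_area_m2 = 2 * FT_TO_M ** 2     # Franklin: 2 ft^2 per person
adopted_area_m2 = 0.5                   # density adopted in this study: 0.5 m^2 per person

franklin_density = 1 / franklin_area_m2             # persons per m^2
scale_factor = adopted_area_m2 / franklin_area_m2   # equation (20)
magazine_density = 20_000 / (3 * ACRE_TO_M2)        # Gentleman's Magazine: 20,000 over 3 acres

print(f"Franklin area per person: {franklin_area_m2:.4f} m^2")
print(f"Franklin density: {franklin_density:.2f} persons per m^2")
print(f"Scale factor to Franklin's density: {scale_factor:.2f}")
print(f"Gentleman's Magazine density: {magazine_density:.2f} persons per m^2")
```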
Final Crowd Estimates
The density estimate of 0.5 m² per person has the added benefit that it only requires
multiplying each MIA value by 2 to obtain the approximate crowd that could fit
in such an area. For crowd estimates under specific conditions, the MIA figures in
tables 10, 11, and 12 may be used in addition to the correction factors suggested
previously. Rather than repeating all the MIA data times 2, here we will simply give
the high and low estimates for how many people Whitefield could have reached with
his voice at the London locations. These will focus on the vocal level of 90 dBA
and background noise levels of 50 and 55 dBA , with the understanding that other
combinations of source and noise levels are also possible. These two noise levels
will be classified as the ‘low’ and ‘high’ noise conditions, respectively. Though
higher or lower levels cannot be ruled out, these two levels are the best guesses
we have based on current information as to the levels of Whitefield’s crowds when
they were being relatively quiet. Since there is such a large variation over this 5
dB interval, this range will suffice for answering most of our questions about the
acoustic limits for crowd size. All values will be rounded to the nearest hundred
for simplicity since this study makes no claim to greater precision (and the results are perhaps no
more precise than to the nearest thousand).
The base condition estimates for Moorfields yielded an MIA of about 25,100
m² at 50 dBA noise and about 10,300 m² at 55 dBA. For Kennington, the low
noise condition MIA was about 27,300 m² and the high condition about 9,600 m².
At Mayfair, these were about 27,000 m² and 9,800 m², respectively. Due to environmental conditions these figures could be revised by about ±1500 m², with
a larger decrease if the humidity was close to 20%. Source angles in Moorfields
and Kennington indicated that the flat angle in the base conditions was the optimal
orientation for increasing MIA. At Mayfair it was shown that the base condition
estimate should be increased by 400-700 m². Since the humidity is not known,
neither the high nor low condition will be assumed, but the 50% humidity level will
be reported here, with the understanding that significant variance is possible due to
atmospheric conditions. The increase factor for Mayfair, however, will be incorporated as it seems to highlight an actual deficiency in the base condition calculation.
The maximum Mayfair adjustment, +700 m², is added to the Mayfair MIA at 50
dBA while the lower value of +400 m² is added at 55 dBA since the differential circumference of the smaller area at 55 dBA would also yield a lower increase in area.
These lead to the final estimates of the upper limits of Whitefield’s MIA and crowd
size, shown in table 16.
Table 16
Maximum simulated MIA and crowd size for each site at 90 dBA vocal level

Noise Level       Moorfields     Kennington     Mayfair
MIA (m²)
  50 dBA          25,100         27,300         27,700
  55 dBA          10,300          9,600         10,200
Crowd Size
  50 dBA          50,200         54,600         55,400
  55 dBA          20,600         19,200         20,400

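The arithmetic behind table 16 can be reproduced from the rounded base condition figures. The following is a minimal sketch; the data structure and function name are illustrative assumptions, while the numbers are the rounded MIA values, the Mayfair corrections, and the 0.5 m² per person density given in the text.

```python
AREA_PER_PERSON_M2 = 0.5   # density adopted in this study

# Rounded base condition MIA (m^2) at 90 dBA vocal level, keyed by noise level in dBA
BASE_MIA = {
    "Moorfields": {50: 25_100, 55: 10_300},
    "Kennington": {50: 27_300, 55: 9_600},
    "Mayfair": {50: 27_000, 55: 9_800},
}
# Source height and angle correction applied to Mayfair only (m^2)
MAYFAIR_ADJUSTMENT = {50: 700, 55: 400}

def final_estimates():
    """Return {(site, noise_dBA): (MIA in m^2, crowd size)} as in Table 16."""
    table = {}
    for site, by_noise in BASE_MIA.items():
        for noise, mia in by_noise.items():
            if site == "Mayfair":
                mia += MAYFAIR_ADJUSTMENT[noise]
            table[(site, noise)] = (mia, round(mia / AREA_PER_PERSON_M2))
    return table

estimates = final_estimates()
for (site, noise), (mia, crowd) in estimates.items():
    print(f"{site:10s} {noise} dBA: MIA {mia:6d} m^2, crowd {crowd:6d}")
```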
Even under these restrictions there is a wide variance allowable based mainly
on the noise level in the crowd. Noise can account for a factor of about 2.5 in
the final crowd limit (and adopting Franklin’s density estimate adds another similar
factor, for a combined variation of about 2.5² = 6.25, or 625%). One might be tempted at this point
to throw up one’s hands and admit that the historians were right to consider these
questions unknowable.
However, not all is lost, and in fact much useful knowledge can be extracted
from the simulations. First, as mentioned before, Franklin’s density estimate does
not seem reasonable as an average value to describe the entire crowd at any of the
sites. Secondly, noise is not unknowable day to day like humidity is - Whitefield’s
journals and other accounts provide subjective descriptions of the crowd noise on
specific dates. So we have reason to believe that on some days the crowd noise
approached the lower limit of 50 dBA . Similarly, we have evidence to suggest that
Whitefield’s voice could reach 90 dBA on his best days (and that he was reasonably consistent while healthy). Certainly there was some variation between these
quantities, but between the two of them we can outline a general interval of 5 dB
in the combined signal/noise ratio over which these high and low values probably
appeared from day to day.
Given this outlook, we can now begin to evaluate the crowd sizes reported
from Whitefield’s day. Assuming we can trust William Seward’s assertion of 60,000
people to Kennington instead of Moorfields, the greatest reported crowd sizes for
each site are shown in table 17.
Table 17
Maximum reported crowd size for each site

Moorfields     Kennington     Mayfair
50,000         60,000         80,000

It will be seen that even under the most generous acoustic conditions there is
no indication that Whitefield could have reasonably reached a crowd of 80,000 people
at Mayfair (though to be fair he himself doubted that the entire crowd could hear
him that day (Whitefield, 1756)). If we add an additional few thousand for favorable
environmental conditions or perhaps a temperature inversion carrying his voice farther than usual, we could imagine that under very ideal circumstances Whitefield’s
voice could have reached nearly 60,000 people. But such effects are quite speculative, and not verifiable based on specific data available. However, on his best days,
it does seem possible that Whitefield could have been heard intelligibly by a crowd
of 50,000 people.
Interestingly, Franklin’s original calculation of Whitefield’s range reached an
MIA estimate similar to those given here. His radius of about 121 m yields a final
estimate of

\[
\mathrm{MIA} = \frac{1}{2}\pi r^2 \approx 23{,}000\,\mathrm{m}^2 \tag{21}
\]

This is only slightly less than the maximum values shown here. This indicates that
despite his historical and technological limits, Franklin’s base experiment was still
a good first-order estimate. His semicircular radiation pattern would have included
extra area to the sides and excluded other area behind, still giving a good answer
all things considered. Franklin’s overly generous density factor presented problems
for the final calculation, but his method for obtaining the MIA for Whitefield still
seems valid. It is doubtful that any of us could have done better had we been in a
similar situation.
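Equation (21) and the crowd sizes it implies can be checked with a few lines of arithmetic. The following is a minimal sketch using the 121 m radius and the two density factors discussed above; the variable names are illustrative.

```python
import math

FRANKLIN_RADIUS_M = 121.0                        # Franklin's estimated range
FRANKLIN_AREA_PER_PERSON_M2 = 2 * 0.3048 ** 2    # 2 ft^2 per person
ADOPTED_AREA_PER_PERSON_M2 = 0.5                 # density adopted in this study

# Semicircular intelligible area, equation (21)
semicircle_mia = 0.5 * math.pi * FRANKLIN_RADIUS_M ** 2

crowd_franklin = semicircle_mia / FRANKLIN_AREA_PER_PERSON_M2
crowd_adopted = semicircle_mia / ADOPTED_AREA_PER_PERSON_M2

print(f"Semicircular MIA: {semicircle_mia:.0f} m^2")
print(f"Crowd at Franklin's density: {crowd_franklin:.0f}")
print(f"Crowd at 0.5 m^2 per person: {crowd_adopted:.0f}")
```

With Franklin's density the semicircle holds well over 100,000 people, while the density adopted here gives a figure a little under 50,000, close to the upper estimates in table 16.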
Recall that due to his high density factor, Franklin calculated a crowd size
greater than 100,000 but then only reported 30,000. This was perhaps a combination of his New England modesty and the fact that Whitefield’s crowds of 30,000
were the instigator for Franklin’s experiment in the first place. Like Franklin, we
are interested not only in the peak of Whitefield’s popularity, but in the large gatherings he continued to attract over many years across Britain and America. The
largest of these were usually estimated at 20,000-30,000 people. We can see that
under our least ideal acoustic conditions here Whitefield still could have been
heard by 20,000 people, and with slight variations in vocal level, crowd noise, and
crowd density, he could probably have spoken clearly to 30,000 people on most
days. While a single crowd of 60,000 is more impressive, two crowds of 30,000
accomplish roughly the same effect from the perspective of Whitefield’s itinerant
ministry. When it is considered in the context of the hundreds of such crowds he
attracted over his lifetime, Whitefield probably spoke directly to more individuals
than any orator in history.
CHAPTER IX
CONCLUSION
Findings
This work has investigated the range of George Whitefield’s voice and the accuracy of Benjamin Franklin’s auditory experiment to find Whitefield’s maximum
audience size. The investigation has required historical, archaeological, and meteorological research as well as physics-based reasoning and numerical acoustic
simulations. The evidence discussed here makes a strong case for the trustworthiness of the acoustic models constructed during this research. These models suggest
that Whitefield, along with other trained vocalists, could produce average vocal
SPL values of about 90 dBA at a distance of 1 m. Based on Whitefield’s vocal level,
it has been simulated that Whitefield could have reached a crowd of up to 50,000
people under ideal acoustic conditions. Even assuming higher noise levels or lower
crowd density, the majority of Whitefield’s large crowds of 20,000-30,000 seem
acoustically reasonable based on the data provided by Franklin’s experiment. Since
Whitefield’s voice is projected to be as loud as any measured voice today, the crowd
sizes projected here may also be good maximum values for any human gathering in
the pre-amplified era.
Franklin’s MIA estimation is slightly lower but still very close to those generated by the computer models, indicating that his semicircular assumption still
provides a good first-order approximation for this quantity without further information
about source directivity or environmental contributions. However, Franklin’s
density value is probably overly optimistic by at least a factor of 2. Thus this work
provides a better lens for understanding Franklin’s early scientific approach before
his better-known work in electricity.
Implications
Since the publication of the C.P. Snow essay The Two Cultures (Snow, 1959) it
has been a common scene for humanists to be nervous about scientists’ claims to
represent the future of human knowledge. It is possible that much of the resistance
to Digital Humanities research stems from an unwillingness to cede intellectual
ground to overly confident scientists wielding equations and computers.
However, properly considered, this should not be an area of conflict because
science and the humanities have not only different tools, but also different goals.*
Science, while ill-equipped to handle questions of meaning, value, or purpose, is
quite well suited to counting, which is the basis of this project and indeed most of
physics at a fundamental level. Humanities disciplines like history require empirical
facts to interpret, and science provides the best tools for providing these basic facts
for further analysis.
In a similar way, it is not the author’s intention to “run the table” on epistemological authority on a historical issue. Rather, a historically significant numerical
question has been examined from a scientific viewpoint to provide a quantitative
answer (or at least to quantify the uncertainty remaining in the answer). It is hoped
that trained historians will further apply and interpret the findings from this research
in a broader historical context.
* Indeed, Snow himself was concerned that without sufficient scientific understanding, humanists
and others would be overly deferential to scientists in positions of authority (Snow, 1960).
Future Work
This work has examined the maximum intelligible range of the human voice through
the lens of George Whitefield, who has been shown to be an extreme outlier in maximum vocal level based on Benjamin Franklin’s recorded data. The framework for
determining unamplified vocal range may now be applied more generally to cases
of orators, trained and untrained, throughout history. Since these other speakers
have no such detailed description of their intelligible ranges, more guesswork will
have to be used to determine their approximate speaking SPL: for instance, Alexander the Great would likely have been trained in oratory as part of his education, and
may be assumed to have spoken to his armies at a greater level than Moses at Sinai, who
was "slow of speech and of tongue." * Based on the projected maximum level for
Whitefield’s voice and that measured for trained vocalists today, the maximum pressure generated by the human voice does not seem to have changed greatly over the
past 300 years despite changes in amplification technology during that time. Thus
it seems fair to assume that the greatest orators like Cicero or Demosthenes may
have been able to sustain levels near that of Whitefield, while less trained speakers
may have had levels nearer the IEC standard of loud speech.
Aside from the question of absolute vocal SPL, the rest of the framework laid
out in this study may be adapted in a straightforward manner to analyze the MIA of
famous speeches based on environment and geometry. This can be combined with
density estimates to project the effective crowd size for many famous addresses
throughout history. These additional analyses will contain greater error than for
Whitefield, but will still provide a crucial step toward a quantitative description of
the limits of human gatherings in the pre-amplified era.
* Exodus 4:10 (English Standard Version)
Summing Up
Whitefield declared in 1739 that
The Christian world is in a deep sleep. Nothing but a loud voice can
waken them out of it! (Vaudry, 2003)
This statement nicely captures Whitefield’s lasting significance. Not only did his relentless travel schedule and single-minded devotion to his mission succeed in awakening a religious movement that sparked lasting social, political, and ecclesiastical
reform, but he also did so with the loudest of voices - one that (metaphorically
speaking, of course* ) continues to resound through the ages.
* Acoustic metaphors should always be used sparingly, especially in scientific works.
BIBLIOGRAPHY
Abel, J., Rick, J., Huang, P., Kolar, M., Smith, J., & J. Chowning. (2008). On the
Acoustics of the Underground Galleries of Ancient Chavin de Huantar, Peru.
In Acoustics ‘08. Paris. 15
Akerlund, L., Gramming, P., & Sundberg, J. (1992, January). Phonetogram and
averages of sound pressure levels and fundamental frequencies of speech:
Comparison between female singers and nonsingers. Journal of Voice, 6(1),
55–63. 24, 25, 27, 72, 75
Allen, G. D. (1971). Acoustic Level and Vocal Effort as Cues for the Loudness of
Speech. Journal of the Acoustical Society of America, 49(6B), 1831–1841.
24
Allen, J., & Berkley, D. (1979). Image method for efficiently simulating small
room acoustics. Journal of the Acoustical Society of America, 65(4), 943–950.
20
Andreopoulou, A., & Roginska, A. (2012). Computer-Aided Estimation of the
Athenian Agora Aulos Scales Based on Physical Modeling. In Proceedings
of the 133rd Audio Engineering Society Convention. San Francisco, CA. 15
ANSI. (2009). Method for Calculation of the Absorption of Sound by the Atmosphere (Tech. Rep.). American National Standards Institute. 66, 105
Awan, S. (1991). Phonetographic profiles and F0-SPL characteristics of untrained
versus trained vocal groups. Journal of Voice, 5(1), 41–50. 25, 70
Bonsi, D., Longair, M., Garsed, P., & Orlowski, R. (2008). Acoustic and audience
response analyses of eleven Venetian churches. In Acoustics ’08 (pp. 3087–
3092). Paris. 16
Boren, B. (2012). Sounds of the City: The Colonial Era. Retrieved from
http://philadelphiaencyclopedia.org/archive/sounds-of-the-city-the-colonial-era/ 53
Boren, B., & Longair, M. (2011). A Method for Acoustic Modeling of Past Soundscapes. In Proceedings of the Acoustics of Ancient Theatres Conference. Patras, Greece. 16
Boren, B., Longair, M., & Orlowski, R. (2013). Acoustic Simulation of Renaissance
Venetian Churches. Acoustics in Practice, 1(2), 17–28. 4, 16
Boren, B., & Roginska, A. (2013). Maximum Averaged and Peak Levels of Vocal Sound Pressure. In Proceedings of the 135th Audio Engineering Society
Convention. New York, NY. 64
Borish, J. (1984). Extension of the image model to arbitrary polyhedra. Journal of
the Acoustical Society of America, 75(6), 1827–1836. 20, 21, 62
Boudreau, G. (2012a). Independence: A Guide to Historic Philadelphia. Westholme Publishing. 53
Boudreau, G. (2012b). Personal Communication. 53
Bradley, J. S., Reich, R., & Norcross, S. G. (1999). A just noticeable difference in
C50 for speech. Applied Acoustics, 58(2), 99–108. 63
Bridenbaugh, C. (1964). Cities in the Wilderness: The First Century of Urban Life
in America 1625-1742. New York: Alfred A. Knopf. 143
Bridenbaugh, C. (1971). Cities in Revolt: Urban Life in America, 1743-1776 (2nd
ed.). London, New York: Oxford University Press. 142
Bridenbaugh, C., & Bridenbaugh, J. (1942). Rebels and Gentlemen; Philadelphia
in the Age of Franklin. New York: Reynal and Hitchcock. 141, 143, 144
Cabrera, D., Davis, P. J., & Connolly, A. (2011, November). Long-term horizontal
vocal directivity of opera singers: effects of singing projection and acoustic
environment. Journal of Voice, 25(6), e291–e303. 23, 29, 31, 32
Chrisman, K. (1996). Unhoop the Fair Sex: The Campaign Against the Hoop
Petticoat in Eighteenth-Century England. Eighteenth-Century Studies, 30(1),
5–23. 115
Chu, W. T., & Warnock, A. C. C. (2002). Detailed Directivity of Sound Fields
Around Human Talkers (Vol. 61; Tech. Rep.). National Research Council
Canada. doi: http://dx.doi.org/10.4224/20378930 6, 23, 29
Coleman, R. (1994, September). Dynamic intensity variations of individual choral
singers. Journal of Voice, 8(3), 196–201. 25
Coleman, R., Mabis, J., & Hinson, J. (1977). Fundamental Frequency-Sound Pressure Level Profiles of Adult Male and Female Voices. Journal of Speech and
Hearing Research, 20, 197–204. 7, 25, 74
Cotter, J., Roberts, D., & Parrington, M. (1992). The Buried Past: An Archaeological History of Philadelphia. University of Pennsylvania Press. 46, 52, 54,
59, 142
Dalenback, B. (2011). CATT-Acoustic v9. Gothenburg, Sweden: CATT. 28, 61,
64, 71, 72, 75, 95
Dalenback, B.-I. (1996). Room acoustic prediction based on a unified treatment of
diffuse and specular reflection. Journal of the Acoustical Society of America,
100(2). 21
Dallimore, A. (1970). George Whitefield; the life and times of the great evangelist
of the eighteenth-century revival. London: Banner of Truth Trust. 7, 13, 54,
68, 80, 81, 87, 89, 111, 115
Denton, W. (1883). Records of St. Giles’ Cripplegate. London: George Bell and
Sons. 84
Dunn, H. K., & Farnsworth, D. W. (1939). Exploration of Pressure Field Around the
Human Head During Speech. Journal of the Acoustical Society of America,
10(3), 184–199. 23
Flanagan, J. L. (1960). Analog measurements of sound radiation from the mouth.
Journal of the Acoustical Society of America, 32(12), 1613–1620. 23
Franklin, B. (n.d.). The papers of Benjamin Franklin (L. Labaree, Ed.). New Haven:
Yale University Press. Retrieved from franklinpapers.org 8
Franklin, B. (1739). The Pennsylvania Gazette, Nov. 15. Philadelphia. 66
Franklin, B. (1740). The Pennsylvania Gazette, May 8. Philadelphia. 14
Franklin, B. (1749). Poor Richard, Improved. Philadelphia, PA: Benjamin Franklin.
8, 45, 114, 116
Franklin, B. (1793). The Autobiography of Benjamin Franklin (2nd ed.). New
Haven and London: Yale University Press. 2, 45, 52, 53, 114
The Gentleman’s Magazine. (1739). The Gentleman’s Magazine, 9, 162. 13, 116
Gillies, J. (1772). Memoirs of the Life of the Reverend George Whitefield, MA.
Oswestry, UK: Quinta Press. 81, 111
Gillingham, H. E., & Drowne, S. (1924). Dr. Solomon Drowne. The Pennsylvania
Magazine of History and Biography, 48(3), 227–250. 143
Gramming, P., Sundberg, J., & Ternström, S. (1988). Relationship between changes
in voice pitch and loudness. Journal of Voice, 2(2), 118–126. 25, 72
Harris, C. (1966). Absorption of Sound in Air versus Humidity and Temperature.
Journal of the Acoustical Society of America, 40(1), 141–159. xiii, 105, 107
Hershey, W. (1975). Independence Hall Sidewalk Salvage Project (Tech. Rep.).
Philadelphia: Independence National Historical Park Library. 53, 142
Hodgson, M. (2002). Rating, ranking, and understanding acoustical quality in
university classrooms. Journal of the Acoustical Society of America, 112(2),
568–575. 67, 97
Houtgast, T., Steeneken, H. J. M., & Plomp, R. (1980). Predicting speech intelligibility in rooms from the modulation transfer function. I. General room
acoustics. Acustica, 46(1), 60–72. 5, 63, 97
Howard, D., & Moretti, L. (2010). Sound and Space in Renaissance Venice. Yale
University Press. 15
Jackson, J. (1918). Market Street Philadelphia: The Most Historic Highway in
America, Its Merchants, Its Story. Patterson and White. 53, 142
Jacobs, B. H. A. (1967). To count a crowd. Columbia Journalism Review, 5, 37–40.
13, 114
Kalm, P. (1770). The America of 1750: Peter Kalm’s travels in North America:
the English version of 1770. New York: Dover Publications. 53, 66
Katz, B., & D’Alessandro, C. (2007). Directivity Measurements of the Singing
Voice. In 19th International Congress on Acoustics (Vol. 10, pp. 45–50). 5,
6, 23, 24, 29, 32
Kent, R., Kent, J., & Rosenbek, J. (1987). Maximum Performance Tests of Speech
Production. Journal of Speech and Hearing Disorders, 52, 367–387. 7, 25,
69, 75
Kleiner, M., Dalenback, B.-I., & Svensson, P. (1993). Auralization-An Overview.
Journal of the Audio Engineering Society, 41(11), 861–875. 18, 19
Knowles, K. (2008). What could Lee see at Gettysburg? In Placing history: how
maps, spatial data, and GIS are changing historical scholarship (pp. 235–
266). Redlands, CA: ESRI Press. 15
Komlos, J., & Cinnirella, F. (2005). European Heights in the Early 18th Century.
Vierteljahrschrift für Sozial- und Wirtschaftsgeschichte, 94, 271–284. 115
Kurze, U. J. (1974, March). Noise reduction by barriers. The Journal of the
Acoustical Society of America, 55(3), 504. 49
Kurze, U. J., & Anderson, G. (1971, January). Sound attenuation by barriers.
Applied Acoustics, 4(1), 35–53. 49
Lambert, F. (1994). Pedlar in Divinity. Princeton, NJ: Princeton University Press.
7, 14, 43, 82, 111
Leino, T. (2009, November). Long-term average spectrum in screening of voice
quality in speech: untrained male university students. Journal of Voice, 23(6),
671–6. 25
Liberman, M. (2005). Counting People. Retrieved 9/21/2012, from
http://itre.cis.upenn.edu/~myl/languagelog/archives/002487.html 44
Lisa, M., Rindel, J., & Christensen, C. (2004). Predicting the acoustics of ancient
open-air theatres: the importance of calculation methods and geometrical details. In Joint Baltic-Nordic Acoustics Meeting 2004. 22
Maekawa, Z. (1968). Noise reduction by screens. Applied Acoustics, 1(3), 157–
173. 48
Mahaffey, J. (2007). Preaching Politics: The Religious Rhetoric of George Whitefield and the Founding of a New Nation. Waco, TX: Baylor University Press.
7
Mahaffey, J. (2012). Personal Communication. 13
Manley, G. (1953). The mean temperature of Central England, 1698 to 1952.
Quarterly Journal of the Royal Meteorological Society, 12, 317–342. 105
Marshall, A. H., & Meyer, J. (1985). The directivity and auditory impressions of
singers. Acustica, 58, 130–140. 24
McKendree, F. S. (1986). Directivity indices of human talkers in English speech.
In Internoise 86 (pp. 911–916). Cambridge. 23, 31
Mendes, A. P., Rothman, H. B., Sapienza, C., & Brown, W. (2003, December). Effects of vocal training on the acoustic parameters of the singing voice. Journal
of Voice, 17(4), 529–543. 25, 69, 70, 75
Menounou, P. (2001, October). A correction to Maekawa’s curve for the insertion loss behind barriers. The Journal of the Acoustical Society of America,
110(4), 1828. 49
Meyer, J. (2009). Acoustics and the Performance of Music (5th ed.). Springer. 48,
85
Monson, B. B., Hunter, E. J., & Story, B. H. (2012, July). Horizontal directivity
of low- and high-frequency energy in speech and singing. The Journal of the
Acoustical Society of America, 132(1), 433–41. 23, 24, 31
Mori, J., Yoshino, D., Satoh, S., & Tachibana, H. (2011). Prediction of outdoor
sound propagation by applying geometrical sound simulation technique. In
Internoise 2011. Osaka, Japan. 22, 59, 84
Nawka, T., Anders, L. C., Cebulla, M., & Zurakowski, D. (1997, December). The
speaker’s formant in male voices. Journal of Voice, 11(4), 422–8. 24
Olesen, S. K. (1997). Low Frequency Room Simulation using Finite Difference
Equations. In Proceedings of the 102nd Audio Engineering Society Convention. Munich, Germany. 18, 19
Orlowski, R. (2006). Acoustics and Architectural Form. In Architettura e musica
nella venezia del rinascimento. Bruno Mondadori. 15, 45
Pick, H. L., Siegel, G. M., Fox, P. W., Garber, S. R., & Kearney, J. K. (1989).
Inhibiting the Lombard effect. The Journal of the Acoustical Society of
America, 85(2), 894. Retrieved from http://scitation.aip.org/content/asa/journal/jasa/85/2/10.1121/1.397561
doi: 10.1121/1.397561 95
Pierce, A. (1974). Diffraction of sound around corners and over wide barriers.
Journal of the Acoustical Society of America, 55(5), 941–955. 49
Piercy, J. E., Embleton, T. F., & Sutherland, L. C. (1977, June). Review of noise
propagation in the atmosphere. The Journal of the Acoustical Society of
America, 61(6), 1403–18. 68
Rath, R. (2003). How early America sounded. Cornell University Press. 15, 52,
140, 144
Rindel, J. (2000). The Use of Computer Modeling in Room Acoustics. Journal of
Vibroengineering, 3(4), 219–224. 19, 21
Rindel, J. (2002). Modelling in Auditorium Acoustics - From Ripple Tank and
Scale Models to Computer Simulations. In Proceedings of the 2002 Forum
Acusticum. Sevilla, Spain. 16, 17, 18
Rindel, J. (2010). Room Acoustic Prediction Modelling. In El XXIII Encontro da
Sociedade Brasileira de Acustica. Salvador, Brazil. 84
Rindel, J., Nielsen, G., & Christensen, C. (2009). Diffraction around corners and
over wide barriers in room acoustic simulations. In The Sixteenth International Congress on Sound and Vibration. 22
Ross, C. (1999). Outdoor Sound Propagation in the U.S. Civil War. Echoes, 9(1).
15, 68
Scarre, C., & Lawson, G. (Eds.). (2006). Archaeoacoustics. Cambridge, UK:
McDonald Institute for Archaeological Research. 15
Scharf, J., & Westcott, T. (1884). The History of Philadelphia, 1609-1884, Vol. I.
Philadelphia, PA: L.H. Everts and Co. 141, 142, 143
Seidler, J., Meyer, K., & Gillivray, L. M. (1976). Collecting Data on Crowds
and Rallies: A New Method of Stationary Sampling. Social Forces, 55(2),
507–519. 114
Serra, J., Koduri, K., Miron, M., & Serra, X. (2011). Assessing the Tuning of
Sung Indian Classical Music. In 12th International Conference on Music
Information Retrieval (ISMIR-11). Miami, FL. 15
Sivitz, P., & Smith, B. (2012). Philadelphia and Its People in Maps: The 1790s.
Retrieved from http://philadelphiaencyclopedia.org/archive/philadelphia-and-its-people-in-maps-the-1790s/ 140
Smith, B. (1999). The Acoustic World of Early Modern England: Attending to the
O-Factor. Chicago: University of Chicago Press. 15
Snow, C. P. (1959). The Two Cultures and the Scientific Revolution. New York:
Cambridge University Press. 123
Snow, C. P. (1960). Science and Government. Cambridge, MA: Harvard University
Press. 123
Snyder, M. (1975). City of Independence: Views of Philadelphia Before 1800. New
York: Praeger Publishers. 57, 140, 141
Steeneken, H. J. M., & Houtgast, T. (1980). A physical method for measuring
speech transmission quality. Journal of the Acoustical Society of America,
67(1), 318–326. 65
Stout, H. (1991). The Divine Dramatist: George Whitefield and the Rise of Modern
Evangelicalism. Grand Rapids, MI: William B. Eerdmans Publishing Company. 13, 43, 53
Sundberg, J. (2001, June). Level and center frequency of the singer’s formant.
Journal of Voice, 15(2), 176–86. 24
Sundberg, J., & Nordenberg, M. (2006). Effects of vocal loudness variation on spectrum balance as reflected by the alpha measure of long-term-average spectra
of speech. The Journal of the Acoustical Society of America, 120(1), 453–
457. 27
Swank, E., & Clapp, J. (1999). Some Methodological Concerns When Estimating
the Size of Organizing Activities. Journal of Community Practice, 6(3), 49–
69. 114
Thornbury, W. (1878). Moorfields and Finsbury. In Old and New London: Volume 2
(Vol. 2, pp. 196–208). London, Paris, and New York: Cassell, Petter, Galpin,
and Co. 80
Tyerman, L. (1877). The life of the Rev. George Whitefield. New York: Anson D.
F. Randolph and Company. 59, 66, 81
Ukers, W. (1922). All About Coffee. The Tea and Coffee Trade Journal Company.
54
Vaudry, R. (2003). Anglicans and the Atlantic World. McGill-Queen’s University
Press. 125
Vorländer, M. (1995). International Round Robin on Room Acoustical Computer Simulations. In 15th International Congress on Acoustics Proceedings.
Trondheim. 21
Wachter, K. W., & Trussell, J. (1982). Estimating Historical Heights. Journal of
the American Statistical Association, 77(378), 279–293. 115
Wakeley, J. (1871). The Prince of Pulpit Orators: A Portraiture of Rev. George
Whitefield, M.A. (2nd ed.). New York: Carlton and Lanahan. 5
Wakeley, J. (1872). Anecdotes of the Rev. George Whitefield, M.A., with Biographical Sketch. Hodder and Stoughton. 54
Walford, E. (1878). Mayfair. In Old and New London: Volume 4 (pp. 345–359).
London, Paris, and New York: Cassell, Petter, Galpin, and Co. 89
Wall, J., Stephens, J., & Markham, B. (2012). Virtual Paul’s Cross Project. Retrieved 2/20/2013, from http://vpcp.chass.ncsu.edu/ 16
Watson, J. (1830). Annals of Philadelphia. E.L. Carey and A. Hart. 142, 143, 144
Watson, R., & Yip, P. (2011). How many were there when it mattered? Significance, 8(3), 104–107. 13, 114, 115, 116
West, M., Gilbert, K., & Sack, R. (1992, January). A tutorial on the parabolic equation (PE) model used for long range sound propagation in the atmosphere.
Applied Acoustics, 37(1), 31–49. 19
White, M. J., & Gilbert, K. E. (1989, January). Application of the parabolic equation to the outdoor propagation of sound. Applied Acoustics, 27(3), 227–238.
19
Whitefield, G. (1756). The Works of George Whitefield: Journals. Oswestry, UK:
Quinta Press. 14, 82, 87, 109, 111, 120
APPENDIX A
FULL VOCAL SPL MEASUREMENTS
Table 18
Leq values for speech, in dBA

Participant       Conv.   Thea.    Max.
1 - mez. sop.      60.1    65.3    72.4
2 - soprano        57.3    72.6    86.6
3 - soprano        58.0    65.5    73.3
4 - baritone       61.7    75.4    84.3
5 - soprano        56.2    68.0    73.1
6 - soprano        55.6    61.4    69.7
7 - baritone       63.7    76.2    90.3
8 - soprano        62.8    71.0    79.9
9 - tenor          55.7    73.3    86.5

Table 19
Leq values for back sung voice, in dBA

Participant          pp      mf      ff
1 - mez. sop.      70.8    73.8    77.1
2 - soprano        69.3    75.0    82.2
3 - soprano        78.5    79.8    83.7
4 - baritone       68.0    79.0    83.8
5 - soprano        69.3    82.5    84.9
6 - soprano        69.0    76.1    82.3
7 - baritone       68.5    80.4    86.8
8 - soprano        71.3    76.1    80.4
9 - tenor          63.3    73.7    88.4
Table 20
Leq values for mask sung voice, in dBA

Participant          pp      mf      ff
1 - mez. sop.      71.3    73.9    76.9
2 - soprano        67.3    75.7    82.1
3 - soprano        80.2    82.5    82.2
4 - baritone       75.6    81.6    86.4
5 - soprano        78.1    79.5    83.4
6 - soprano        76.7    82.2    88.1
7 - baritone       69.6    84.2    90.8
8 - soprano        71.9    77.9    83.1
9 - tenor          68.1    78.6    90.7

Table 21
Lpk values for speech, in dBA

Participant       Conv.   Thea.    Max.
1 - mez. sop.      81.6    88.3    94.0
2 - soprano        79.0    97.0   107.8
3 - soprano        82.8    91.7   100.7
4 - baritone       85.7    97.1   107.4
5 - soprano        77.6    91.3    94.5
6 - soprano        77.7    84.3    93.7
7 - baritone       88.4   102.6   113.1
8 - soprano        88.2    98.1   104.1
9 - tenor          79.6    98.1   112.4
Table 22
Lpk values for back sung voice, in dBA

Participant          pp      mf      ff
1 - mez. sop.      88.9    93.8    95.7
2 - soprano        89.6    94.6   102.7
3 - soprano        97.8   100.4   105.5
4 - baritone       89.4   100.6   105.4
5 - soprano        88.1   103.6   107.2
6 - soprano        94.2    99.2   103.5
7 - baritone       88.9   101.7   109.6
8 - soprano        94.3    98.6   101.3
9 - tenor          78.0    91.6   108.2

Table 23
Lpk values for mask sung voice, in dBA

Participant          pp      mf      ff
1 - mez. sop.      88.7    91.8    96.4
2 - soprano        85.4    98.1   105.0
3 - soprano       101.3   102.4   103.1
4 - baritone       97.2   104.4   109.4
5 - soprano       101.5   101.6   105.0
6 - soprano        99.9   105.0   112.0
7 - baritone       92.9   108.5   110.8
8 - soprano        93.2    98.8   103.0
9 - tenor          83.9    96.9   113.0
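As one way the tabulated values might be compared, the sketch below subtracts each participant's maximum-effort speech Leq (Table 18, Max. column) from the corresponding Lpk (Table 21, Max. column). The variable and function names are assumptions introduced here for illustration, and the peak-minus-Leq difference is offered only as a convenient summary, not as an analysis performed in this study.

```python
# Max. speech levels in dBA, copied from Table 18 (Leq) and Table 21 (Lpk).
PARTICIPANTS = [
    "1 - mez. sop.", "2 - soprano", "3 - soprano", "4 - baritone",
    "5 - soprano", "6 - soprano", "7 - baritone", "8 - soprano", "9 - tenor",
]
LEQ_MAX = [72.4, 86.6, 73.3, 84.3, 73.1, 69.7, 90.3, 79.9, 86.5]
LPK_MAX = [94.0, 107.8, 100.7, 107.4, 94.5, 93.7, 113.1, 104.1, 112.4]


def peak_minus_leq(lpk_values, leq_values):
    """Per-participant difference Lpk - Leq in dB, rounded to 0.1 dB."""
    if len(lpk_values) != len(leq_values):
        raise ValueError("level lists must have the same length")
    return [round(lpk - leq, 1) for lpk, leq in zip(lpk_values, leq_values)]


differences = peak_minus_leq(LPK_MAX, LEQ_MAX)
for name, diff in zip(PARTICIPANTS, differences):
    print(f"{name:<14} {diff:5.1f} dB")

# Participant with the highest maximum-effort Leq.
loudest = PARTICIPANTS[LEQ_MAX.index(max(LEQ_MAX))]
print("Highest Max. Leq:", loudest)
```

The same function can be applied to the sung-voice tables by substituting the ff columns of Tables 19 and 22, or of Tables 20 and 23.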
APPENDIX B
HISTORY OF SOUND IN COLONIAL PHILADELPHIA
Soon after its founding, Philadelphia crossed the threshold from a
mere rural agglomeration into a true city, complete with an urban soundscape. In
contrast to the countryside, where large distances and tree lines weakened the intensity of sound traveling between farms, within the city neighbors had no choice
but to hear the diverse noises that resulted from both private and public endeavors. Despite William Penn’s vision of a city spread between the Delaware River
and the Schuylkill, Philadelphia remained densely concentrated along the Delaware
throughout the eighteenth century (Sivitz & Smith, 2012). Sounds from the private
sphere intruded into public life without hindrance: women’s batting staffs, street
criers, and bells sounded loudly throughout the city’s domestic, commercial, and
religious life (Rath, 2003).
The intersection of Market and Second Street was central to William Penn’s
plan for Philadelphia, and the soundscape at this point quickly began to diverge
from that of the countryside. As early as 1682 it was the site of a simple cage for
the city’s criminal offenders with no sound insulation whatsoever (Snyder, 1975),
and a later prison built in the middle of Market Street was labeled a nuisance by the
city’s Grand Jury in 1702. The space in the middle of Market Street next became
host to the bleating of sheep, who were pastured on the common area by the town
butcher. Plenty of human noise followed as well: Philadelphia’s market was moved
from Front Street to the intersection of Market and Second, meeting on Wednesdays
and Saturdays. A bell rang out to signify the opening of the market, which quickly
gave Market Street (originally called High Street) its name (Scharf & Westcott,
1884). Bells were an especially common means of sonic communication in the
colonial city, present in churches, clocks, and schools, not to mention what would
later be called the Liberty Bell, which served to call the members of the legislature
to work.
Another bell would sit atop the Old Court House, built in 1707 at the same
intersection, which brought more focus to Market and Second Street over the next
few decades as the city’s only important public building (Snyder, 1975; Bridenbaugh & Bridenbaugh, 1942). Both the city and county courts met in the Court
House, whose foundation stood on brick arches that allowed the market stalls to extend under the building itself (Scharf & Westcott, 1884). Since the market’s hours
were strictly defined, presumably noise from the stalls did not significantly disturb
the proceedings within the Court House.
Instrumental music and singing added to the sounds of the city. Religious
leaders, including Whitefield, discouraged secular or instrumental music, preferring
only devotional hymns during church services, and in 1740 only three churches in
the city possessed an organ. Gradually, as the influence of Whitefield’s revivals
waned, instrumental music became more popular. By 1774 every worship service
in Philadelphia used an organ except the silent worship of the Society of Friends.
Public houses were often centers of popular music, and David Lockwood’s tavern
even had a “Musical Clock,” which played “Sonatas, Concertos, Marches, Minuets,
Jiggs, and Scots Airs.” (Bridenbaugh & Bridenbaugh, 1942)
In contrast to the riotous atmosphere at many taverns, the upper echelons of
Philadelphia society preferred the more subdued conversation at the city’s growing
number of coffee houses, including the famous London Coffee House at the corner
of Front and Market Street, which opened in 1754. It began as a place for civil
conversations and business transactions between merchants and traders, but this led
to the Coffee House’s use as an all-purpose auction house for horses, carriages, and
even slaves. This louder commercial soundscape would eventually give way to the
crackles of bonfires and shouts of revolutionary mobs: as the conflict with Britain
worsened in the 1760s, the street in front of the Coffee House became the site of
protests against the Stamp Act and later the burning in effigy of British officials
(Scharf & Westcott, 1884).
Indeed, carriages and wagons coming from the countryside to the wharves
along the river, as well as the whip cracks of their drivers, continually generated
noise (Bridenbaugh, 1971). In the first half of the eighteenth century, streets were
not often paved, and what pavement there was consisted of what archaeologists call
pebblestone (Hershey, 1975), similar to gravel (Jackson, 1918). Dirt and gravel
roads provided less rigid surfaces than a hard cobblestone pavement, reducing the
noise from cartwheels and horses’ hooves while producing a slight hiss from the
small particles of stone sticking to wheels (as in the audio example). But as the city
developed more in the second half of the century, streets became more uniformly
and solidly paved (Cotter et al., 1992) and wheels were more likely to be lined with
iron (Bridenbaugh, 1971), both of which increased the radiated noise throughout
the city. John Fanning Watson’s Annals of Philadelphia in the Olden Time included
several anecdotes of citizens hearing voices or artillery over great distances, and
older residents told him that it was easier to hear distant sounds when the city had
fewer carriages and its streets were unpaved (J. Watson, 1830). These sounds, along with those
of the herds of livestock occasionally found moving through the city, ensured that
the colonial city’s soundscape was not entirely divorced from the sounds of the
countryside around it, as the later industrialized city’s would be.
While the carriages heading to the wharves generated noise throughout Philadelphia, the Delaware riverfront itself provided another diverse soundscape on the edge
of the growing city. The riverfront also had associations with intemperance, as one
of the earliest river landing sites was named for a tavern on that site that predated the
city. Caves along the river also served as grog shops, which further contributed to
the lower-class reputation of the waterfront through the frivolity, songs, and brawls
that went along with their wares. These led to the passing of strict laws against
drunkenness as early as 1682-3. Soon, however, the wealthy founders of Philadelphia built large wharves along the river, and the waterfront’s soundscape was further
infused with clanking anchors and chains, the groaning of masts and riggings, the
speech of sailors and merchants, and the loading and unloading of carriages, wagons, and ships (Bridenbaugh, 1964).
The growing noise led the citizens of Philadelphia to take action while the
city was still relatively young. As early as 1732, the city drew up a noise ordinance
restricting gatherings and noise-making on Sundays (Scharf & Westcott, 1884),
possibly for the peace of the Friends’ worship services. But as the city continued to
develop and increase in loudness and density, its growing commercial noise
prompted wealthy citizens to begin moving to the suburbs (Bridenbaugh
& Bridenbaugh, 1942). In a letter to his brother in Rhode Island, Doctor Solomon
Drowne wrote from Philadelphia in 1774, “I almost envy you your pleasant situation
on Mendon’s pleasant Hill, remote from Noise & Confusion. Here the thundering
of Coaches, Chariots, Chaises, Waggons, Drays, and the whole Fraternity of Noise
almost continually assails our Ears.” (Gillingham & Drowne, 1924) As early as
1711, Robert Fairman mentioned in a letter the benefits of a plantation “out of the
noise of Philadelphia, but in site of it.” (J. Watson, 1830) In 1770, the principal of a
private academy north of Philadelphia likewise advertised his college as being “free
from the noise of the city.” (J. Watson, 1830)
Nights in the city were quieter by and large, marked by the occasional “crying of the hour” by the night watchman. Occasionally, however, disturbances did
break out when tavern brawls erupted into the streets. In the years leading up to the
American Revolution, British soldiers in the city may have affected transatlantic relations through their nighttime noisemaking: in 1769, a young Englishman named
Alexander Macraby recorded that he, along with several officers and a band, routinely paraded through the streets at midnight and played under the windows of
young women, “which,” he added, “they esteem a high compliment.” Occasionally
their celebrations passed into the countryside, such as when Macraby wrote that
they took seven sleighs with fiddlers on horseback “to a public house a few miles
from town, where we danced, sung and romped and eat and drank, and kicked away
care from morning till night.” (Bridenbaugh & Bridenbaugh, 1942)
While some scholars have argued that colonial Philadelphia was even louder
than a modern urban environment (Rath, 2003), the crucial difference is not the
overall loudness over time but the different textures of the soundscapes. Modern
city noise is characterized by relatively continuous background noise from engines,
generators, and ventilation systems. In contrast, eighteenth-century Philadelphians heard many short, impulsive sounds rising over a very quiet pre-electric background. Only at the busiest times did the colonial city have enough individual
sources of noise to blend into anything continuous enough to be perceived as background noise. The rest of the time these sharp sounds would ring out into the
foreground of public attention, one of the many growing pains for the young city.