Sponsoring Committee: Professor Agnieszka Roginska, Chairperson
Professor Tae Hong Park
Professor Brian Gill

THE MAXIMUM INTELLIGIBLE RANGE OF THE HUMAN VOICE

Braxton Boren

Program in Music Technology
Department of Music and Performing Arts Professions

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Steinhardt School of Culture, Education, and Human Development
New York University
2014

Copyright © 2014 Braxton Boren

ACKNOWLEDGEMENTS

The saying “it takes a village to raise a child” still applies to those of us who find ourselves slowly emerging from childhood around the age of thirty with the end (we hope!) of our schooling in sight. For me at least it has taken a large number of villagers, and I am pleased to be able to thank them all without fear of an orchestra playing me off the stage.

First I must thank my father, who early in life instilled a love of history in my brothers and me that he was able to further enrich later on as our teacher. Dad made me read Franklin’s Autobiography in high school, and later in life reminded me of the Whitefield experiment and its similarity to other archaeoacoustic work I had done. Simply put, none of this project would have come about without him.

Secondly, I am extremely grateful to my supervisor, Agnieszka Roginska, who from the start has been supportive and encouraging even when I embarked on a direction of study without direct precedent in our (or perhaps any other) department. Her probing questions and boundless imagination allowed my work to take on breadth without sacrificing depth, and though at times I felt overwhelmed by the new avenues of study opened up by our conversations, the research produced was always stronger for it. Thanks also to the other members of my committee, Brian Gill and Tae Hong Park, who each in their own way exemplify the interdisciplinary work that allows Steinhardt to be such a great nexus of expertise for the entire university.

I would like to thank the many historians whose comments, suggestions, and insights have supplemented my lack of training in that area to hopefully produce something coherent enough to make a small contribution to that field. This includes George Boudreau, who graciously gave me a comprehensive tour of the Philadelphia Market Street area early on and supplied many important details in the early modeling stage. Thanks also to David Bebbington, Mike Breidenbach, Joyce Chaplin, Lee Gatiss, Deborah Howard, Digby James, David Ceri Jones, Karen Kupperman, Jerome Mahaffey, Mark Noll, Elizabeth Pardoe, Richard Rath, Harry Stout, and Peter Williams for all the comments, hints, and suggestions that guided me along the path toward the very finicky details I needed for this project.

Thanks go to the scientists whose input helped solidify my own understanding of the processes involved and helped affirm that “applied research” is not a dirty word. This includes, above all, David Lubman, for his devotion to rigorous analysis in archaeoacoustics to generate real quantitative data. Thanks also to David Bradley, Ken Cunefare, Bengt-Inge Dalenback, Malcolm Longair, and Charles Ross for their helpful suggestions and encouragement along the way.

Thanks to all my friends in the Music and Audio Research Lab over the years: to Areti Andreopoulou for showing me the ropes and in general showing patience and grace to me when I had no right to expect it; to Andrew Madden for his infectious happiness; to Justin Matthew for his microphone calibration software; to Marc Wilhite for his help meticulously setting up microphone arrays; to Rachel Bittner for making me feel old and thus giving me extra motivation to graduate; to Finn Upham for deep conversations; to Aron Glennon for shallow conversations (equally necessary); to Taemin Cho for years of LaTeX expertise; to Jon Forsyth for sharing my esoteric musical tastes; to Eric Humphrey and Uri Nieto for coffee, beer, and solidarity; and to Michael Musick for shouldering several of my responsibilities as I retreated from lab duties to actually finish the dissertation.

My appreciation goes to Steinhardt for providing a grant to go to London to survey and measure the sites of Whitefield’s crowds, and to Blair and Melissa Heuer for giving me a place to stay there. Thanks also to the many vocalists who participated in the recording sessions involved in this research.

Thanks to my brothers for both majoring in history and making me feel insecure for majoring in music. Thanks to my mother for giving me a job in junior high combing through old newspapers (little did I know how useful that skill would prove later on). And thanks last and most of all to my wife Laura, who followed me to New York when neither of us was really sure we wanted to go there. She has celebrated with me in the good times, and wept with me in the bad. If there is any quality in this work, she deserves at least half the credit for the countless ways she has loved and supported me throughout the last four years.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
ACRONYMS

CHAPTER

I  INTRODUCTION
    Scope of this Study
    Construction of Audience
    Assessment of Intelligibility
    Motivation
    Dissertation Outline
    Contributions
    Associated Publications by the Author
    Peer-Reviewed Articles
    Conference Papers

II  BACKGROUND
    History of Whitefield’s Crowds
    Archaeoacoustics
    Acoustical Simulation
    Acoustics of the Spoken Voice
    Directivity
    Maximum Level

III  ACOUSTIC DIRECTIVITY OF VOCAL PRODUCTION MODES
    Measurement Procedure
    Results
    Normalized Radiation Patterns
    Absolute Radiation Patterns
    Discussion

IV  ANALYSIS OF FRANKLIN’S EXPERIMENT
    Franklin’s Experiment
    Diffraction Effects
    Noise Sources in Eighteenth-Century Philadelphia
    Discussion

V  ACOUSTIC SIMULATION OF FRANKLIN’S EXPERIMENT
    Makeup of the Colonial City
    Modeling Procedure
    Geometry
    Sound Attenuation
    Simulation
    Speech Intelligibility
    Background Noise
    Atmospheric Conditions
    Results
    Discussion

VI  MAXIMUM AVERAGED AND PEAK VOCAL SPLS
    Method
    Pilot Study
    Spoken and Sung Voice Analysis
    Average Levels for Speech
    Gender Differences
    Spoken Levels For Singers
    Peak Spread
    Standard Deviation by Level
    Back vs. Mask Levels
    Discussion

VII  MODELING THE SITES OF WHITEFIELD’S LONDON CROWDS
    Locations
    Moorfields
    Kennington Common
    Mayfair

VIII  SIMULATIONS OF WHITEFIELD’S SERMONS IN LONDON
    Simulation Results: Base Conditions
    Moorfields
    Kennington Common
    Mayfair
    Other Factors
    Environmental Factors
    Geometric factors
    Crowd Density
    Final Crowd Estimates

IX  CONCLUSION
    Findings
    Implications
    Future Work
    Summing Up

BIBLIOGRAPHY

A  FULL VOCAL SPL MEASUREMENTS
B  HISTORY OF SOUND IN COLONIAL PHILADELPHIA

LIST OF TABLES

1  Review of maximum SPLs measured in previous studies
2  Absorption coefficients by octave band center frequency (Hz) for each material used in Market Street model
3  Octave band averaged sound pressure (dB) at 1 m for both background noise sources
4  Simulated LAeq values (dB) for Whitefield’s voice based on background noise distance and minimum STI value
5  Leq values for pilot study, in dBA, for Conversational, Theatrical, and Maximal Levels
6  Lpk values for pilot study, in dBA
7  Spk values for pilot study, in dBA
8  Absorption coefficients for buildings and crowds at the Moorfields
9  Absorption coefficients for brick walls near Mayfair
10  Moorfields simulated MIA (m²) for each vocal SPL and background noise level
11  Kennington simulated MIA (m²) for each vocal SPL and background noise level
12  Mayfair simulated MIA (m²) for each vocal SPL and background noise level
13  Simulated changes in MIA resulting from changes in temperature in Moorfields
14  Simulated changes in MIA for Moorfields resulting from changes relative to 50% humidity
15  Simulated changes in MIA resulting from changes in source elevation angle in Moorfields
16  Maximum simulated MIA and crowd size for each site at 90 dBA vocal level
17  Maximum reported crowd size for each site
18  Leq values for speech, in dBA
19  Leq values for back sung voice, in dBA
20  Leq values for mask sung voice, in dBA
21  Lpk values for speech, in dBA
22  Lpk values for back sung voice, in dBA
23  Lpk values for mask sung voice, in dBA

LIST OF FIGURES

1  Schlieren photography showing wave propagation in a concert hall from Rindel (2002)
2  Optical ray method showing attenuation of individual ray paths from Rindel (2002)
3  Diagram of microphone array used for measurements
4  Aligning a vocalist with the measurement array
5  Normalized overall levels for the vocalists, intoning vowels on C4
6  Normalized overall levels for the vocalists, speech (a and b) and song (c and d)
7  Normalized third-octave bands for actress’s monologue
8  Normalized third-octave bands for actress’s vowels
9  Normalized 10000 Hz bands for musical theater singer’s song and vowels
10  Normalized third-octave bands for opera singer’s song
11  Absolute overall levels for the vocalists, intoning vowels on C4
12  Absolute overall levels for the vocalists, speech (a and b) and song (c and d)
13  Absolute third-octave bands for opera singer’s song
14  Absolute third-octave bands for actor’s vowels
15  Inset of Clarkson-Biddle Map of Philadelphia showing Market Street
16  Diagram of Franklin’s position (BF) in relation to sources on Front Street
17  Diffraction at Franklin’s Position using Kurze-Anderson Formula
18  Diffraction at Franklin’s Position using Maekawa’s Solution
19  William Breton, Old Court House & Second Friend’s Meeting, 1830, Library Company of Philadelphia
20  Inset of George Heap’s East Prospect of the City of Philadelphia, 1752, New York Public Library
21  AutoCAD model of Market Street area, extruded from Clarkson-Biddle map
22  Predicted logarithmic attenuation from Whitefield to Franklin
23  Summed pressure-squared echogram from Whitefield to Franklin
24  Mean peak spread, dBA
25  Standard Deviation for the 9 Singers, dBA
26  Mean dB Increase from Back to Mask Voice
27  Inset of John Rocque’s 1746 Map of London showing the Moorfields
28  Sketchup Model of the Upper and Middle Moorfields
29  Map of Kennington Manor, including the Common, based on Hodskinson and Middleton’s survey, 1785
30  Modeling Kennington Common in Sketchup
31  Inset of John Rocque’s 1746 Map of London showing Mayfair
32  Unsigned wood print of Chesterfield House, 1760
33  Sketchup Model of Mayfair
34  Simulated STI at Moorfields for different background noise conditions
35  Simulated STI at Kennington for different background noise conditions
36  Simulated STI at Mayfair for different background noise conditions
37  Atmospheric absorption at different humidity levels, from (Harris, 1966)
38  Male vocal directivity pattern, in octave bands, used for Whitefield’s voice
39  Change in MIA based on source height and angle at Mayfair

ACRONYMS

BEM : Boundary Element Method.
CT : Cone Tracing.
FDE : Finite Difference Equation.
FEM : Finite Element Method.
GIS : Geographical Information System.
ISM : Image Source Model.
MIA : Minimally Intelligible Area.
PE : Parabolic Equation.
RMS : root mean square.
RT : Ray Tracing.
SPL : Sound Pressure Level.
STI : Speech Transmission Index.
WE : Wave Equation.

“We shed as we pick up, like travellers who must carry everything in their arms, and what we let fall will be picked up by those behind. The procession is very long and life is very short. We die on the march.
But there is nothing outside the march so nothing can be lost to it.”
-Tom Stoppard, Arcadia

CHAPTER I

INTRODUCTION

The subject of this dissertation is the intelligible range of George Whitefield’s open-air oratory in eighteenth-century America and Britain. Benjamin Franklin doubted the accounts he heard of the Anglican preacher Whitefield addressing 30,000 or more congregants at open-air venues in London. When Whitefield came to Philadelphia in 1739, Franklin performed one of the earliest recorded ‘archaeoacoustic’ experiments:

[Whitefield] had a loud and clear Voice, and articulated his Words and Sentences so perfectly that he might be heard and understood at a great Distance, especially as his Auditories, however numerous, observ’d the most exact Silence. He preach’d one Evening from the Top of the Court House Steps, which are in the middle of Market Street, and on the West Side of Second Street which crosses it at right angles. Both Streets were fill’d with his Hearers to a considerable Distance. Being among the hindmost in Market Street, I had the Curiosity to learn how far he could be heard, by retiring backwards down the Street towards the River; and I found his Voice distinct till I came near Front Street, when some Noise in that Street, obscur’d it. Imagining then a Semicircle, of which my Distance should be the Radius, and that it were fill’d with Auditors, to each of whom I allow’d two square feet, I computed that he might well be heard by more than Thirty Thousand. This reconcil’d me to the Newspaper Accounts of his having preach’d to 25,000 People in the Fields, and to the ancient Histories of Generals haranguing whole Armies, of which I had sometimes doubted. (Franklin, 1793).

Though novel by eighteenth-century standards, Franklin’s experiment ignores some important acoustic phenomena.
Advances in physics and computational technology have transformed archaeoacoustic research, allowing much more detailed descriptions of how acoustic spaces would have sounded in the past. Using modern simulation techniques and Franklin’s data, it is possible to model eighteenth-century Philadelphia to calculate how loud Whitefield’s voice would have been during Franklin’s experiment. This information may then be used to insert a virtual Whitefield into a model of his largest crowds in London, simulating how many people could have heard his unamplified voice at once and allowing a measure for the accuracy of Franklin’s original calculation. Since Whitefield’s crowds are among the largest recorded in history, this research addresses not only Franklin’s specific question but also the general question of the maximum free field range of the unamplified human voice.

The study of history, as with most of human culture, prizes visual cues over auditory ones. This is partly due to the neurological composition of these two sensory systems and partly because, musical scores being a notable exception, very little auditory information is encoded into the archaeological or historical record. Because of this, our common conception of history is reduced to something like a picture-book of frozen images in time. However, the auditory system has a much greater resolution in the time domain than the visual system, and because of this auditory cues are a primary way in which we experience the flow of time. Hearing the past is a valuable way to understand the lives of people in the past by experiencing time as a transient, flowing medium rather than a series of famous paintings in a history book. Since hearing events from the past is quite difficult, this constitutes a major problem for our ability to holistically understand history.
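
As a rough illustration of the arithmetic in Franklin’s account quoted above, the following sketch reproduces his semicircle calculation. It uses only the two quantities he states (a semicircular audience and two square feet per listener). Franklin does not report the radius he paced out, so the radius here is back-computed from his figure of 30,000 rather than taken from the record, and the function names are illustrative only.

```python
import math

# Franklin's stated allowance of floor area per listener, in square feet.
SQ_FT_PER_PERSON = 2.0


def crowd_from_radius(radius_ft, area_per_person_sqft=SQ_FT_PER_PERSON):
    """Number of listeners that fit in a semicircle of the given radius."""
    semicircle_area = 0.5 * math.pi * radius_ft ** 2
    return semicircle_area / area_per_person_sqft


def radius_from_crowd(crowd, area_per_person_sqft=SQ_FT_PER_PERSON):
    """Radius (ft) of the semicircle needed to hold `crowd` listeners."""
    return math.sqrt(2.0 * crowd * area_per_person_sqft / math.pi)


if __name__ == "__main__":
    # Back-compute the radius implied by Franklin's "Thirty Thousand".
    r = radius_from_crowd(30_000)
    print(f"Radius implied by 30,000 listeners: {r:.1f} ft")
    print(f"Crowd at that radius: {crowd_from_radius(r):.0f}")
```

Under these assumptions a crowd of 30,000 corresponds to a semicircle of roughly 195 ft radius. The calculation treats intelligibility as a single fixed radius and makes the result very sensitive to the assumed area per person.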

Franklin’s experiment represents a desire to investigate an important historical event – Whitefield’s sermons in London – by recording a smaller piece of measurable data, and extrapolating mathematically the larger question of his maximum crowd size. Unfortunately, modern historians addressing the same question fall into the opposing traps of either considering the crowd size unknowable or taking Franklin’s estimate at face value. Both of these approaches blatantly ignore the advances in knowledge that are possible because of the progress of science and technology over the past 250 years. The goal of this study is to use Franklin’s measured data, combined with modern understandings of sound propagation and psychoacoustics, to estimate the Sound Pressure Level (SPL) of Whitefield’s voice and how many people could have intelligibly heard him at once. Through a combination of historical and archaeological research, acoustical modeling, and laboratory measurements of the human voice, I intend to complete Franklin’s experiment as I believe he would were he alive today.

Scope of this Study

Construction of Audience

This study is interested primarily in answering the specific research question Franklin was addressing: how many people could hear Whitefield’s unamplified voice at once? In particular, this study addresses the question in the same way his experiment did. This means that the virtual audience will essentially be populated entirely with virtual Benjamin Franklins, with equal hearing to Franklin himself. Franklin had no reported record of hearing loss and was relatively young (33) at the time of his experiment, and so this method allows a fair generalization of the hearing capacity of Whitefield’s audiences. In addition, different minimum values of the Speech Transmission Index (STI) may be implemented in the model to account for different levels of hearing ability for Franklin himself.
These in turn propagate through the entire experiment, since better hearing yields a lower minimum STI, which in turn leads to a lower simulated loudness for Whitefield’s voice. This allows a wider range of possibilities to be considered while maintaining Franklin’s original experiment design. Of course, it is always possible to construct a theoretical audience with better or worse hearing than Franklin or taller or shorter or in different geometric configurations, but the focus of this study is primarily to answer Franklin’s question as he would have, given 250 years of advances in scientific knowledge and technology. Franklin may thus be considered a more or less average citizen listener based on what we know about him.

In addition, it is tempting to use the simulated impulse response of Whitefield’s acoustic system to generate an auralization to allow us to hear what his oratory would have sounded like. This is an effective technique for ensemble music, which averages out the individual characteristics of single musicians and allows a good approximation of how a specific piece would have sounded in the past (Boren, Longair, & Orlowski, 2013). But in Whitefield’s case we have little specific information about his accent, which would make an anechoic recording of a trained actor mere speculation for the subjective character of his voice itself. While Franklin’s experiment provides good data for an analysis of maximum intelligible range, it does not provide enough information for an auralization.

Assessment of Intelligibility

This study, like Franklin’s, will focus on a generalized intelligibility rating rather than describing specific factors for different words or phrases, despite the existence of specific research into that question. For instance, there has been extensive work done on the subjective intelligibility of different speech and frequency bands (Houtgast, Steeneken, & Plomp, 1980).
In the field of vocal directivity, some attention has been paid to the analysis of the radiation patterns of individual phonemes (Katz & D’Alessandro, 2007). And we do have some anecdotal data about Whitefield’s power over specific words: the famous English Shakespearean actor David Garrick reported that Whitefield “could make his audiences weep or tremble merely by varying his pronunciation of the word Mesopotamia.” Garrick also said he would give one hundred guineas if he could pronounce ‘O!’ like Whitefield (Wakeley, 1871). Yet in the end, these individual data points do not afford us a rigorous enough body of evidence to look into such precise questions in a simulation of Whitefield’s voice. Even Franklin, who had a living, breathing Whitefield present during his experiment, did not try to find out anything so specific: instead, he merely sought an averaged ‘intelligibility’ rating rather than focusing on the intelligibility of a specific word or phrase. The field of archaeoacoustics seldom even offers historical examples with quantitative averaged data such as Franklin’s, let alone any such information about specific phonemes, nor is it clear how the summation of such specific research questions would cohere to answer the broader historical question of Whitefield’s maximum crowd size. Thus this study will focus on the use of the STI, which uses known data about the understandability of specific frequency bands to produce an averaged metric that correlates well with subjective human speech intelligibility. In a similar manner, radiation pattern data considered within the context of this study will simply use root mean square (RMS) acoustic energy per frequency band, which is standard for directivity measurements (Chu & Warnock, 2002; Katz & D’Alessandro, 2007). This will allow straightforward calculations of the STI within the simulated acoustic system, allowing the best estimate of Whitefield’s ‘average’ intelligibility.
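
To make the idea of a band-averaged intelligibility metric concrete, here is a minimal sketch of a heavily simplified, noise-only STI-style calculation. It is not the procedure used in this study: it assumes no reverberation, omits the masking and hearing-threshold corrections of the full standardized method, and uses equal band weights as a placeholder for the standardized weights. The function names and the example levels are hypothetical. In the noise-only case the apparent signal-to-noise ratio in each band equals the actual one, so each band's ratio is clipped to ±15 dB and mapped linearly onto a 0 to 1 transmission index.

```python
def transmission_index(snr_db):
    """Map a band signal-to-noise ratio (dB) onto a 0-1 transmission index.

    The ratio is clipped to the range -15 dB to +15 dB and then scaled linearly.
    """
    clipped = max(-15.0, min(15.0, snr_db))
    return (clipped + 15.0) / 30.0


def sti_noise_only(speech_db, noise_db, weights=None):
    """Weighted mean of per-band transmission indices (simplified sketch).

    speech_db, noise_db: per-octave-band levels in dB at the listener.
    weights: optional band weights summing to 1. Equal weights are an
    assumption made here for illustration, not the standardized values.
    """
    if len(speech_db) != len(noise_db):
        raise ValueError("speech and noise must have the same number of bands")
    n = len(speech_db)
    if weights is None:
        weights = [1.0 / n] * n
    if len(weights) != n:
        raise ValueError("weights must match the number of bands")
    return sum(
        w * transmission_index(s - nz)
        for w, s, nz in zip(weights, speech_db, noise_db)
    )


if __name__ == "__main__":
    # Hypothetical octave-band levels (dB) at a listener position.
    speech = [52.0, 55.0, 54.0, 50.0, 46.0, 42.0, 38.0]
    noise = [50.0, 48.0, 45.0, 42.0, 40.0, 38.0, 36.0]
    print(f"Simplified STI: {sti_noise_only(speech, noise):.2f}")
```

A listener position would then count as intelligible when this value meets or exceeds whatever minimum STI is assumed for the listener.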
While not every speaker at the same level will yield equivalent perceptual intelligibility, the power of Whitefield’s articulation, rather than being considered separately, is included in the virtual level simulation here for the purposes of the STI. Thus a given vocal level estimate of X dB could also refer to a level of (X − 1) dB with clearer articulation.

Motivation

This type of research is fairly novel, as it involves an insular problem with an insular solution – but the problem and solution exist in entirely separate academic departments. The methodology of estimating a historical source’s SPL based on the STI at a receiver point is entirely new, owing to the fact that there are few if any historical accounts as detailed as Franklin’s in discussing the threshold of intelligibility at a specific location. This methodology will contribute toward a broader body of research aimed at reconstructing the sounds of the past based on historical evidence and principles of physics.

In addition to novelty, there is scientific significance to a quantitative study of the maximum range of the human voice. While this topic is less important for modern acoustic engineering because of amplification technology, some maximum limits of the human voice have been explored within the vocal acoustics community (Kent, Kent, & J. Rosenbek, 1987; Coleman, Mabis, & Hinson, 1977). Whitefield’s example, while not reproducible because of its place in history, still offers a unique framework for this research question, as it would not be feasible to undertake a present experiment with 30,000 listeners. In addition, the in-depth modeling process will expand the understanding of the techniques and data needed to efficiently and accurately simulate acoustics in large outdoor venues.
This research combines the study of maximum vocal level, sound radiation patterns, computational acoustic propagation algorithms, and the perceptual quantification of intelligibility to provide a more concrete answer to how many people a single voice can reach on its own strength.

This research is also significant from a historical point of view. Whitefield, though not as famous today, was one of the first transcontinental celebrities and was probably known to more Americans in the first half of the eighteenth century than any public figure except George II. The revivals in Britain spurred by Whitefield and John Wesley led to significant social change, including prison reform and the abolition of the slave trade (Dallimore, 1970). In the colonies, Whitefield’s role in the First Great Awakening helped establish a more coherent and independent American identity and may have contributed to the American Revolutionary cause (Mahaffey, 2007). The question of how large Whitefield’s crowds were is of historical significance, as is the relation between the actual crowd sizes and the estimates, often circulated by Whitefield’s supporters (Lambert, 1994).

Beyond Whitefield’s own significance, this project also involves checking the experimental work of Benjamin Franklin, the first scientist of any distinction in the colonies. In addition to his work on electricity, Franklin also wrote and theorized about the physics of sound throughout his lifetime (Franklin, n.d., 1749), including the question he addressed here: how many people could hear a single orator’s voice? While novel given the technology and knowledge of his day, Franklin’s experiment failed to account for the role of sound absorption, and it also assumed a uniform intelligible radius without referencing any measure of the sound power used to generate the speaker’s voice.
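
The two factors Franklin left out can be illustrated with a minimal free-field sketch, given here under stated assumptions rather than as the simulation method of this study. It combines spherical spreading (a 6 dB loss per doubling of distance) with a single broadband absorption coefficient in dB per metre, and finds the distance at which the level falls to a chosen threshold. The source level, threshold, and absorption coefficient below are hypothetical placeholders, and real atmospheric absorption varies with frequency, temperature, and humidity.

```python
import math


def level_at_distance(level_at_1m_db, distance_m, absorption_db_per_m=0.0):
    """Free-field level (dB) at distance_m, given the level at 1 m.

    Applies spherical spreading plus a single broadband absorption term.
    """
    if distance_m < 1.0:
        raise ValueError("distance must be at least 1 m")
    spreading_loss = 20.0 * math.log10(distance_m)
    absorption_loss = absorption_db_per_m * (distance_m - 1.0)
    return level_at_1m_db - spreading_loss - absorption_loss


def max_range(level_at_1m_db, threshold_db, absorption_db_per_m=0.0,
              upper_m=10_000.0):
    """Largest distance (m) at which the level still meets threshold_db.

    Uses bisection, which works because the level falls monotonically
    with distance.
    """
    if level_at_distance(level_at_1m_db, 1.0, absorption_db_per_m) < threshold_db:
        return 0.0
    if level_at_distance(level_at_1m_db, upper_m, absorption_db_per_m) >= threshold_db:
        return upper_m
    lo, hi = 1.0, upper_m
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if level_at_distance(level_at_1m_db, mid, absorption_db_per_m) >= threshold_db:
            lo = mid
        else:
            hi = mid
    return lo


if __name__ == "__main__":
    # Hypothetical values: 90 dB at 1 m, 50 dB needed at the listener.
    for alpha in (0.0, 0.005):
        r = max_range(90.0, 50.0, absorption_db_per_m=alpha)
        print(f"absorption {alpha} dB/m -> range {r:.1f} m")
```

In this sketch the radius is an output of the source level, the absorption, and the required level at the listener, not a fixed property of the speaker.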
Using modern acoustic modeling technology to account for these factors will allow a better estimate of how large Whitefield’s crowds could have been and give some measure of the accuracy of Franklin’s original experimental result.

Finally, in addition to analyzing Franklin’s experiment because of his personal significance, there is also anthropological value to the question Franklin was raising. He was personally skeptical of narrative accounts of generals haranguing large armies, and sought to address this using scientific first principles with Whitefield as his test case. Since Whitefield’s crowds are the largest reported for an unamplified voice in recent history in a city where 80,000 people actually lived, Whitefield’s case and Franklin’s data provide a touchstone for a much broader investigation of the maximum crowd that could have gathered to hear a unified message in the pre-amplified era. This will provide a helpful empirical framework for investigating religious revivals like Whitefield’s, military communication channels such as Franklin was investigating, and also more broadly the size of gatherings in pre-literate cultures, where a strong oral tradition helped cement a single people group together. While Whitefield’s case is singular because of Franklin’s recorded data, a rigorous analysis of the extreme case allows the construction of a more general framework of how to treat historical instances of unamplified speakers addressing large audiences. Based on geometry and material composition of the sites in question, climatological data, and background noise estimates, various historical speakers’ maximum crowd size could be indexed to their maximum vocal level. Research into the maximum level of the spoken voice could also be used to separate trained orators from speakers with more normal vocal ranges.

Dissertation Outline

Chapter II gives a description of the state-of-the-art in the fields necessary to complete this research.
In particular, the fields of archaeoacoustics, computer acoustic simulation, and acoustics of the human voice are examined in detail.

Chapter III examines the directivity of the human voice at high levels for trained actors and opera singers. An experimental set of measurements is taken to examine the effect of different vocal modes of production on vocal directivity in the horizontal plane.

Chapter IV describes the specifics of Franklin’s experiment in Philadelphia and discusses the possible sound sources that may have existed near Front Street. Using a geometrical analysis of diffraction effects at Franklin’s position, conclusions are drawn about Franklin’s position during the experiment.

Chapter V details the geometry and material composition of the Market Street area where Franklin conducted his experiment. These data are used to construct a computer model which then provides an estimate of Whitefield’s on-axis SPL based on the intelligibility at Franklin’s position.

Chapter VI explores the maximum average and peak level of the human voice at a fixed distance. A larger set of trained actors and opera singers is measured for several different classifiers of on-axis SPL. These findings are then compared to the estimates for Whitefield’s voice and the existing literature.

Chapter VII describes the occasion and locations of Whitefield’s largest crowds in London. The geography and history of the Moorfields, Kennington Common, and Mayfair are discussed, along with the specific crowd estimates for each site. Based on the geometrical and material data available, computer models of the sites of the three largest crowds are constructed.

Chapter VIII uses the data for Whitefield’s estimated SPL and the acoustic computer models to investigate the intelligible area reached by his voice with respect to several other factors.
Based on these simulations and an estimate of the average crowd density, final estimates are provided for the maximum acoustic limit of Whitefield’s crowds.

Chapter IX concludes this thesis. This chapter presents a summary of the findings of this study and the diverse research methods used to achieve its findings.

Contributions

The primary contributions of this dissertation are listed below:

• An empirical investigation of the relationship between vocal production modes and acoustic directivity.
• Many specific historical details about the material and geometric composition of Philadelphia and London in 1739.
• New information about the soundscape of colonial Philadelphia and specific octave-band noise data for carriages on gravel roads.
• A novel diffraction analysis to pinpoint more precisely Franklin’s position during his experiment.
• An acoustic computer model of Market Street in 1739 to re-create the first phase of Franklin’s experiment.
• An estimate of George Whitefield’s on-axis LAeq at 1 m.
• A comprehensive organization of previous research on vocal SPL measurements as well as new experimental data on the maximum SPL achievable by a human voice.
• Acoustic computer models of Whitefield preaching at the Moorfields, Kennington Common, and Mayfair, to re-create the second phase of Franklin’s experiment.
• The first rigorous estimates of the maximum intelligible range of a single unamplified human voice under a variety of conditions.

Associated Publications by the Author

This thesis covers much of the work presented in the publications listed below:

Peer-Reviewed Articles

• Boren, B. (2014). The Maximum Intelligible Range of the Human Voice. Journal of the Acoustical Society of America (submission).
• Boren, B. (2012). Sounds of the City: The Colonial Era. The Encyclopedia of Greater Philadelphia.

Conference Papers

• Boren, B., Roginska, A., & Gill, B. (2013). Maximum Averaged and Peak Levels of Vocal Sound Pressure. 135th Audio Engineering Society Convention, New York, NY.
• Boren, B. & Roginska, A. (2013). Sound radiation of trained vocalizers. Proceedings of Meetings on Acoustics: 21st International Congress on Acoustics, Montreal, Canada.
• Boren, B. & Roginska, A. (2012). Analysis of noise sources in colonial Philadelphia. Internoise 2012, New York, NY.

CHAPTER II

BACKGROUND

History of Whitefield’s Crowds

The reported sizes of Whitefield’s crowds are among the largest recorded audiences in history for a single unamplified speaker (Dallimore, 1970). However, historical studies offer few methods for estimating the crowd sizes more precisely than the estimates of the day: Dallimore (1970) and Stout (1991) believe that the reported numbers should be reduced by a factor of one-half, while Mahaffey (2012) has suggested looking at population data to find the largest possible audience within a ten mile radius. But in the end these methods are highly speculative, as were the estimates from Whitefield and his contemporaries. Even with crowds that are photographed extensively, there is still today a large relative standard error in crowd estimation techniques (R. Watson & Yip, 2011).

The question may arise as to how accurate the original crowd estimates are or why we should place much confidence in them, as blind estimates are frequently inflated (Jacobs, 1967). However, the estimates of Whitefield’s crowds were often said to have been ‘computed’, suggesting that a more rigorous approach may have been taken. Indeed, one such account in The Gentleman’s Magazine indicates that a modern sort of density-area calculation had been used to arrive at an estimate of 20,000 (“The Gentleman’s Magazine”, 1739).

Still, some historians remain skeptical about the veracity of the period accounts of the crowds, believing instead that Whitefield or his publicist William Seward “fabricated crowd estimates” as a publicity stunt (Lambert, 1994).
To this two things may be said. First, while a reading of Whitefield’s journals (Whitefield, 1756) may give evidence of his overconfidence, his sincerity comes across with equal strength. His estimates may have been too high, but when he wrote that he “really believed” his crowds numbered a certain amount, his dedication to personal piety suggests that he never deliberately inflated the reported crowd sizes or encouraged others to do so. Secondly, it is useful in such cases to consider not only Whitefield’s friends but also his enemies. Perhaps because Whitefield’s field-preaching was such a new phenomenon, his opposition in the established church generally saw his massive crowds as a mark against his credibility. One Anglican priest in Boston opposed Whitefield but agreed with the estimate of 20,000 hearers gathered to hear Whitefield there in 1740 (Lambert, 1994). One letter to Franklin’s Pennsylvania Gazette, however, did assert that the crowd estimates were too large, though Franklin added that the letter came close to invective (Franklin, 1740).

While the historical record is far from clear on this issue, it can be said that some of Whitefield’s crowd estimates appear to have been actual numerical estimates rather than blind guesses, and that the majority of his supporters and detractors seemed to agree that the crowds were the largest they had seen. In Chapter IV we will consider additional historical evidence as to why an acoustical method may offer the best empirical estimate of Whitefield’s maximum crowd size.

Archaeoacoustics

Recent movements within digital humanities use computational techniques to provide quantitative data for research within less strictly quantitative disciplines.
Often such projects must delve deeply into an unrelated empirical field to find the tools necessary to address a question in the humanities, such as using Music Information Retrieval to investigate classical theories of tuning in Indian music (Serra, Koduri, Miron, & Serra, 2011) or using Geographical Information System (GIS) technology to address the role of topography in the Battle of Gettysburg (Knowles, 2008). Archaeoacoustics, or archaeological acoustics, is a similar emerging discipline at the intersection of archaeology, history, musicology, physics, and acoustics. It aims to provide a lens on past soundscapes and to help understand how sound was experienced by the people of a given time period. Research in this field has examined acoustical effects and questions of intentionality in prehistoric and neolithic monuments (Scarre & Lawson, 2006; Abel et al., 2008). Archaeoacoustic researchers have used instrument modeling techniques to synthesize an estimate of how ancient instruments would have sounded based on physical descriptions and drawings (Andreopoulou & Roginska, 2012). Others have used wave theory to explain the acoustical shadows reported during U.S. Civil War battles (Ross, 1999). Even pure historians have begun to pay attention to the transient nature of sound in reconstructing how people experienced the past (Smith, 1999; Rath, 2003).

More recent work has begun to examine the question of acoustics in architectural design from periods before the physics of sound were well understood (Orlowski, 2006; Howard & Moretti, 2010). In interior spaces especially, acoustics acts as a bridge from architectural history to other aspects of history, including music, theatre, and religious liturgy.
In addition, recent studies using quantitative acoustic measurements have allowed a more sophisticated empirical analysis of existing acoustical spaces of historical importance (Bonsi, Longair, Garsed, & Orlowski, 2008). Computational acoustic modeling, calibrated according to objective measurements, has also been used to estimate the acoustical effects of crowds, tapestries, and changes in geometry that such spaces would have encountered in the past (Boren & Longair, 2011; Boren et al., 2013). Another recent project combines visual and acoustic modeling to simulate the experience of hearing John Donne preaching at the pre-fire St. Paul’s Cathedral in London (Wall, Stephens, & Markham, 2012). Because of the wide interdisciplinary nature of the field, archaeoacoustic research can be based on qualitative interpretation or quantitative analysis, and often requires a nuanced blend of both to provide meaningful results.

Acoustical Simulation

Though the understanding that sound travels in a wave goes back to Aristotle, no attempt to simulate the motion of sound in a real environment was made until 1843 with the invention of the ripple tank (Rindel, 2002). By exciting water vibrations and using hard surfaces to model the walls of a cross-section of a room, this method allowed a coarse simulation of the 2-dimensional wave motion within a room. Because of the vastly different physical properties of water, air, and room surfaces, however, the ripple tank could not do much more than show the wave motion. By the early twentieth century, pioneering acoustician Wallace Sabine used Schlieren photography to implement a similar visualization technique within a real room (fig. 1). This involved filling the room with smoke and backlighting it, then photographing the motion from an impulsive spark, thus giving some indication of how a sound wave would move within an actual space.
Figure 1: Schlieren photography showing wave propagation in a concert hall, from Rindel (2002)

The first ray-based acoustical simulations were implemented through the optical beam method, in which a single light source is made to give off light rays in many directions (fig. 2). By darkening or lightening surfaces, some degree of reflection and absorption could be simulated. However, this only worked for simulating high-frequency sound, as the wavelengths of visible light are very small in relation to room surfaces. A later method used lasers to improve simulation precision, but it retained the basic modeling procedure (Rindel, 2002).

Figure 2: Optical ray method showing attenuation of individual ray paths, from Rindel (2002)

These early methods were all non-auditory. They helped shape the modern understanding of room acoustics, but they did not allow any auralization of how a simulated space would sound. The next big push in the twentieth century was the method of building scaled physical models of a space. By scaling the wavelengths of sound to the proportion of the model, a tiny dummy head could be used to generate an approximate binaural room impulse response for auralization (?, ?). The drawbacks of this method include the need to account for the lowpass filtering effects of air absorption (addressed through using dried air at 2-3% humidity, or oxygen-free nitrogen, or post-filtering if the air has a very homogeneous distribution) as well as the fact that higher accuracy requires bigger models, which become more expensive (Rindel, 2002). In spite of its drawbacks, however, this method is still used by many acoustic consulting firms (Kleiner, Dalenback, & Svensson, 1993).

The acoustic wave equation (eq. 1) describes the behavior of a sound wave in a medium based on its pressure p, wave velocity c, and time t (∇² is the Laplacian operator in 3D Cartesian space).
Because this is a second-order partial differential equation, it must be solved numerically at millions of discrete points in space and time, requiring significant computational power.

\[ \nabla^2 p = \frac{1}{c^2} \frac{\partial^2 p}{\partial t^2} \qquad (1) \]

Computational Wave Equation (WE) models produce a high level of accuracy, but they consequently demand equally precise input data, such as complex acoustic impedance values instead of a simple Sabine absorption coefficient (Olesen, 1997). They can accurately model resonance, focusing effects, diffraction, and refraction, although they still require statistical techniques to account for scattering in rough complex geometries. However, since each solution to the wave equation is a single-frequency phasor, a dense array of calculations is needed to get full octave-band information for a single point in a room. Since the number of affected room modes is proportional to the cube of frequency, it is incredibly computationally expensive to calculate wave equation solutions for a wide frequency range, for large spaces, or for large arrays of listener positions (Rindel, 2000). For these reasons, WE models are typically used in non-auditory acoustic simulations or noise engineering contexts in which the acoustic system is relatively small.

Popular WE models include Finite Element Method (FEM) models, which generate a mesh of the modeled acoustic medium, and Boundary Element Method (BEM) models, which form a mesh of the system’s boundaries and assume a homogeneous medium contained within (Kleiner et al., 1993). FEM systems work better for simulating refraction effects in heterogeneous media, while BEM systems work better when the volume to surface area ratio is high. Finite Difference Equation (FDE) systems discretize the wave equation using a Taylor Series approximation, but their accuracy is reduced at high frequencies without good data for the characteristic impedance of any boundary materials (Olesen, 1997).
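To make the finite-difference idea concrete, the following is a minimal sketch in Python of a one-dimensional version of eq. 1, discretized with second-order central differences in space and time (the Taylor-series approximation mentioned above). It is an illustration only and not the method of any package discussed in this chapter: the function name, the grid size, the Gaussian starting pulse, and the fixed (zero-pressure) ends are all assumptions chosen for simplicity.

```python
import math


def simulate_wave_1d(n_points=201, length_m=10.0, c=343.0, courant=1.0, n_steps=50):
    """Leapfrog finite-difference solution of the 1-D wave equation.

    Illustrative assumptions: Gaussian initial pressure pulse, zero initial
    velocity, and pressure held at zero at both ends of the domain.
    """
    dx = length_m / (n_points - 1)      # spatial step
    dt = courant * dx / c               # time step; stability needs courant <= 1
    r2 = (c * dt / dx) ** 2             # squared Courant number

    def second_difference(p, i):
        # Central-difference stand-in for the Laplacian (without the 1/dx^2 factor)
        return p[i + 1] - 2.0 * p[i] + p[i - 1]

    # Gaussian pulse centred in the domain (width is an arbitrary choice)
    x0, width = length_m / 2.0, 0.3
    p_now = [math.exp(-((i * dx - x0) / width) ** 2) for i in range(n_points)]
    p_now[0] = p_now[-1] = 0.0

    # Zero initial velocity: the fictitious step before t = 0 mirrors the first step
    p_prev = p_now[:]
    for i in range(1, n_points - 1):
        p_prev[i] = p_now[i] + 0.5 * r2 * second_difference(p_now, i)

    for _ in range(n_steps):
        p_next = [0.0] * n_points       # end points stay at zero
        for i in range(1, n_points - 1):
            p_next[i] = (2.0 * p_now[i] - p_prev[i]
                         + r2 * second_difference(p_now, i))
        p_prev, p_now = p_now, p_next
    return p_now
```

With the default settings the initial pulse splits into two half-amplitude pulses travelling in opposite directions, as the one-dimensional wave equation predicts. Even this toy case needs one update per grid point per time step, which hints at why three-dimensional models covering a wide frequency range become so costly.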
Parabolic Equation (PE) systems have become popular for modeling long-range acoustic propagation underwater or outdoors, but these too require precise data about surface impedance and the temperature gradient within the acoustic medium (West, Gilbert, & Sack, 1992; White & Gilbert, 1989).

In essence the two great advantages of WE models are their handling of wave phenomena and their numerical accuracy. The tradeoffs are their requirement of huge computational resources and high-precision input data. Since this project does not require real-time results, the computational requirement is not particularly relevant. However, because of the historical nature of this project, most inputs can only be broadly estimated rather than precisely measured, which negates any possible gains in precision. The ability to simulate wave phenomena accurately also requires precise measurements of geometry or of atmospheric temperatures at different strata, which are impossible to obtain for the dates in question. Thus using a WE model for archaeological acoustic simulation seems somewhat like giving an answer with five significant digits when the inputs contained only one. Because this research involves a linear line-of-sight calculation with only tangential questions of diffraction and refraction, it may be best to handle those side issues generally and then perform calculations using simulation methods more robust to general absorption and scattering data.

The next most accurate computer simulation techniques are geometrical models, which model sound as a ray rather than as a wave. This allows them to focus only on the sound paths necessary to correctly generate the room impulse response at the listener’s position. The oldest of these is the Image Source Model (ISM), which, though known earlier, was first implemented numerically by Allen and Berkeley for simulating impulse responses in rectangular office buildings (1970).
It was later extended to a vector-based approach by Borish, whose technique allowed it to be implemented for any arbitrary polyhedron (1984). The basic principle of ISM is that a reflection from a source may be modeled as the contribution of a virtual source in a virtual room, mirrored across the reflecting wall. ISM provides an efficient way to simulate all reflections within a given radial distance. However, for complicated geometries, many of the possible virtual sources will not be visible to the listener and thus will not affect the eventual simulated impulse response. For instance, a normal, somewhat complex room geometry could produce as many as 10^19 possible virtual higher-order sources, out of which only 2500 are viable (Borish, 1984). Because of this, ISM is only tenable for modeling early reflections.

The Ray Tracing (RT) method instead sends out a uniform distribution of rays from a modeled source, computing the rays’ reflection paths and attenuation to simulate an acoustic impulse response at the point of a receiver (Rindel, 2000). This essentially has the opposite effect of creating virtual listeners in virtual rooms as opposed to the virtual sources of ISM. The rays are one-dimensional, while the listener has a small volume and detects all rays that pass through it. Given infinite computational resources, ISM and RT would give the same result. But RT may not necessarily find all reflection paths in increasing reflection order, which makes it less effective for simulating early reflections (Borish, 1984). However, it is easier to apply statistical scattering methods in RT, which makes it ideal for modeling late-field reverberation. A variant of this approach is Cone Tracing (CT), which uses small beams with a definite cross section, while the virtual listeners are modeled as points (B.-I. Dalenback, 1996). Because each beam’s cross section grows with distance from the source, CT finds virtual listeners more uniformly.
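The image-source principle described above can be sketched in a few lines of Python for the simplest case, a rectangular room with first-order reflections only. This is a hedged illustration, not the algorithm of any particular package: the function names, the room dimensions, and the source and listener positions are hypothetical, the walls are assumed perfectly reflecting, and the speed of sound is taken as 343 m/s. No visibility test is included, since every first-order image in a rectangular box is visible from every interior point; the arbitrary polyhedra handled by Borish’s extension would need such a test.

```python
import math


def first_order_images(source, room):
    """Mirror a source across each of the six walls of a box-shaped room.

    room = (Lx, Ly, Lz), with walls at 0 and L on each axis (an assumption).
    Returns six virtual source positions, one per wall.
    """
    images = []
    for axis in range(3):
        for wall in (0.0, room[axis]):
            image = list(source)
            image[axis] = 2.0 * wall - source[axis]   # reflect across the wall plane
            images.append(tuple(image))
    return images


def arrival_times(source, listener, room, c=343.0):
    """Sorted arrival times (s) of the direct sound and first-order reflections."""
    paths = [source] + first_order_images(source, room)
    return sorted(math.dist(p, listener) / c for p in paths)


# Hypothetical 4 m x 3 m x 2.5 m room
times = arrival_times((1.0, 1.0, 1.0), (3.0, 2.0, 1.0), (4.0, 3.0, 2.5))
```

Each reflection is replaced by a straight-line path from a mirrored source, so the earliest entry in `times` is always the direct sound. Higher orders would be obtained by mirroring the images again, which is where the rapid growth in candidate sources noted above comes from.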
However, a CT system must be carefully engineered (using triangular beams) to ensure that the beams do not intersect.

In reality, the most accurate results come from a hybrid method that makes best use of the different techniques. Odeon* and EASE† both use a hybrid of ISM and RT for early and late reflections, respectively, and CATT-Acoustic uses a hybrid of ISM and CT. Vorländer found that hybrid methods outperformed single-algorithm methods, and that those software packages (like Odeon and CATT) which modeled acoustic scattering performed best (Vorländer, 1995). These programs all use some variant of Lambert’s cosine law to calculate probability distributions for scattered sound, which allows a more diffuse field and keeps the software from overestimating the reverberation times in virtual rooms. While these methods do not take wavelength into account, they can approximate its effects by performing separate calculations for different frequency bands and using the absorption data accordingly. In addition, many hybrid methods now include additional algorithms to account for diffraction effects and reflection-based scattering, which previously only WE models could simulate (Rindel, Nielsen, & Christensen, 2009).

* http://www.odeon.dk/, accessed 7/22/2014.
† http://ease.afmg.eu/, accessed 7/22/2014.

The use of hybrid models in outdoor acoustics has been verified in multiple studies, which have found that hybrid systems work accurately as long as early reflection surfaces are adequately modeled (Lisa, Rindel, & Christensen, 2004; Mori, Yoshino, S. Satoh, & Tachibana, 2011). There do exist geometrical software packages specifically intended for outdoor acoustic simulation which allow more precise handling of wind noise and diffraction effects as well.*
Again, due to the lack of information on historical wind speeds and the linear nature of the acoustic systems in question, for the purposes of this project any of the available hybrid modeling

* https://kluedo.ub.uni-kl.de/frontdoor/index/index/docId/2051, accessed 7/22/2014.

Acoustics of the Spoken Voice

The acoustic properties of the human voice have been investigated from various perspectives, with the result that much of what is known about the voice as an acoustic system is confined within specific disciplines such as communication disorders, voice recognition, noise control engineering, and music performance. This research will focus only on the spatial directivity of the voice and the maximum vocal SPL producible by trained vocalists.

Directivity

The directivity of the human voice has been a subject of interest for a variety of applications for over 70 years. Different studies have focused on measuring radiation patterns for knowledge of microphone placement (Dunn & Farnsworth, 1939), experimental verification of physical theory (Flanagan, 1960), architectural design (Chu & Warnock, 2002; McKendree, 1986), vocal performance practice (Cabrera, Davis, & Connolly, 2011), and computer simulation and auralization (Katz & D’Alessandro, 2007). The methods used in these measurements varied from a single ‘exploring’ microphone (Dunn & Farnsworth, 1939) to the more extensive arc arrays of microphones used in the most detailed studies (Chu & Warnock, 2002; Monson, Hunter, & Story, 2012). These techniques can introduce error because the subject must repeat a single block of speech for each measurement position. Other studies have used a single static array of microphones to measure directivity within a single plane, allowing a detailed investigation without the need for a dedicated laboratory measurement apparatus and avoiding any error introduced by changes in the subject’s vocal delivery (McKendree, 1986; Cabrera et al., 2011).
The findings of these studies have not always been in exact agreement, but consensus is gradually building around the independence of the radiation pattern from several factors. For instance, McKendree (1986) reported differences in directivity based on gender, but later studies by Chu and Warnock (2002) and Monson et al. (2012) by and large were not able to support this conclusion. Chu and Warnock (2002) similarly found differences in radiation pattern for different loudness levels, but this was only investigated for a single subject. Monson et al. (2012) did not observe the same effect, although they did report some increases in directionality at high frequencies for loud speech. Marshall and Meyer (1985) reported differences between individual phonemes, and Katz and D’Alessandro (2007) also found significant differences between sung vowels within specific mid-frequency bands. Monson et al. (2012) also found variation in radiation patterns for different voiceless fricatives, presumably because of differences in mouth shape and frequency content for these phonemes. Katz and D’Alessandro (2007) investigated different sung vocal techniques, including ‘projected’ and ‘focused’ voice, but found that these techniques did not appreciably affect the radiation pattern of the voice.

Maximum Level

Greater vocal levels are found for trained vocalists in comparison to untrained vocalists (Akerlund, Gramming, & Sundberg, 1992). For the sung or spoken voice, two similar phenomena known respectively as the Singer’s and Speaker’s Formant occur in trained vocalists, merging multiple vocal formants into a single spectral peak around 3 kHz (Nawka, Anders, Cebulla, & Zurakowski, 1997; Sundberg, 2001). Some studies have suggested this is a cluster of formants 3 and 4, while others have interpreted it as a cluster of formants 4 and 5.* In the case of singers, this allows the voice to stand out against the typical frequency contour of a symphony orchestra.
But because of the nonlinear frequency-dependent sensitivity of the auditory system, it also helps concentrate more sound energy into the frequencies most important for speech intelligibility (1-4 kHz).

* www.phys.unsw.edu.au/jw/voice.html, accessed 7/22/2014.

The perception of loudness is dependent on more than SPL alone (G. D. Allen, 1971), but this investigation will be limited purely to the maximum peak or averaged SPL produced by vocalists. To the author’s knowledge, no study has comprehensively surveyed the existing literature on maximum vocal SPLs since Kent et al. (1987). Kent’s study summarized different series of measurements, while acknowledging that they were recorded at different distances from the vocalist’s mouth. Here some attempt will be made to scale all such measurements to the predicted SPL at 1 m using the classical formula for free-field sound attenuation (eq. 2). In practice the drop-off will be less abrupt for some close measurements, due to near-field behavior of the sound field close to the vocalist’s mouth. Still, this formula allows a good approximation for comparing some of the large SPLs reported at short distances to later measurements taken at 1 m.

\[ \Delta L = 20 \log_{10} \left( \frac{r_1}{r_2} \right) \qquad (2) \]

where r_1 is the distance at which the level was measured and r_2 is the distance to which it is scaled (here 1 m).

Many of the studies that have collected maximum SPL measurements have used them for comparison with other factors, and so the types of SPL measurements and vocal signals analyzed vary. Some studies (Mendes, Rothman, Sapienza, & Brown, 2003; Coleman et al., 1977; Coleman, 1994) did not specify the time of integration for SPL recording, so it is assumed some form of fast-integrated Lp was used. Some studies (Mendes et al., 2003; Akerlund et al., 1992; Coleman, 1994; Gramming, Sundberg, & Ternström, 1988) only provided graphs and not exact measurement values, so the dB values reported here may contain ± 1 dB of error.
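As a worked illustration of the distance scaling in eq. 2, the short Python sketch below converts a level measured close to the mouth into the predicted free-field level at 1 m. The function names are hypothetical, and the example values (126 dB measured at 6 inches, i.e. 15.24 cm) are taken from the measurements reviewed in this section; near-field effects are ignored, as they are in eq. 2 itself.

```python
import math


def level_change_db(r_measured_m, r_target_m=1.0):
    """Free-field level change (eq. 2) when moving from r_measured to r_target."""
    return 20.0 * math.log10(r_measured_m / r_target_m)


def spl_at_1m(spl_db, r_measured_m):
    """Predicted SPL at 1 m for a level measured at r_measured_m metres."""
    return spl_db + level_change_db(r_measured_m, 1.0)


# 126 dB measured at 15.24 cm scales by about -16.3 dB, i.e. roughly 110 dB at 1 m
delta = level_change_db(0.1524)
scaled = spl_at_1m(126.0, 0.1524)
```

Because the correction depends only on the ratio of distances, halving the measurement distance always adds about 6 dB to the measured level, which is why very close microphone positions produce such large raw values.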
In addition, some studies (Awan, 1991; Akerlund et al., 1992; Leino, 2009) only report the mean level for all their participants, indicating that higher values were measured but not reported directly.

The highest SPL reported in the summary by Kent et al. (1987) was from Coleman’s 1977 study of fundamental frequency and SPL (Coleman et al., 1977). Coleman reported the fast-integrated full-spectrum SPL recorded for a 2 s sung phonation. This yielded a maximum Lp of 126 dB for a male and 122 dB for a female, recorded at 6 inches from the mouth. These high values are a consequence of the very close distance to the vocalist; using an attenuation factor of -16 dB from eq. 2 gives estimated SPLs of 110 and 106 dB, respectively, at 1 m. These values are still high but more similar to the maximum peak values seen in other studies.

Table 1
Review of maximum SPLs measured in previous studies

Study      Participants    Type                  Dist. (cm)   Max SPL     SPL @ 1 m
Coleman    10 m. adults    Fast Lp, 2 s phon.    15.24        126 dB      110 dB
Coleman    12 f. adults    Fast Lp, 2 s phon.    15.24        122 dB      106 dB
Gramming   9 m. singers    Leq, sung triads      30           105 dB      95 dB
Awan       20 singers      Fast Lp, 3 s phon.    30.48        112.5 dB    102 dB
Akerlund   10 f. singers   Leq, 30 s speech      30           93 dB       83 dB
Akerlund   10 f. singers   Leq, 2 s phon.        30           118 dB      108 dB
Coleman    20 singers      Lp, 4 s phon.         15           114 dB      98 dB
Mendes     14 singers      Lp, 6 s phon.         2            118 dB      84 dB
Sundberg   31 speakers     Leq, 40 s speech      30           100.3 dB    90 dB
Leino      14 students     Mean Leq, speech      40           72.8 dB     65 dB

Many studies have measured either instantaneous Lp or Leq for constant phonations a few seconds in length. Leq is the level corresponding to the time-averaged squared sound pressure:

\[ L_{eq} = 10 \log_{10} \left[ \frac{1}{T} \int_0^T \left( \frac{p(t)}{p_0} \right)^2 dt \right] \qquad (3) \]

where T is the integration time, p(t) is the pressure as a function of time, and p_0 is the reference pressure. For sustained tones with no pauses or silences, Leq will be higher than the Leq for normal speech or song.
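The effect of pauses on Leq can be illustrated with a small Python sketch of a discrete-time version of eq. 3. Everything here is an assumption made for illustration: the function name, the 20 µPa reference pressure (the standard value in air), and the two synthetic signals, which are idealized stand-ins and not data from any of the studies reviewed above.

```python
import math

P_REF = 20e-6  # reference pressure in air, 20 micropascals (standard value)


def leq_db(pressure_samples):
    """Discrete approximation of eq. 3 for uniformly sampled pressure in Pa."""
    mean_square = sum(p * p for p in pressure_samples) / len(pressure_samples)
    return 10.0 * math.log10(mean_square / P_REF ** 2)


# Hypothetical signal: a 250 Hz tone with 1 Pa RMS, sampled at 8 kHz for 1 s
fs, f0 = 8000, 250.0
tone = [math.sqrt(2.0) * math.sin(2.0 * math.pi * f0 * n / fs) for n in range(fs)]

# The same tone, silent for the second half (a crude stand-in for pauses)
gated = tone[: fs // 2] + [0.0] * (fs // 2)

sustained_leq = leq_db(tone)   # about 94 dB
paused_leq = leq_db(gated)     # about 3 dB lower than the sustained tone
```

Silence during half of the integration time halves the mean squared pressure and so lowers Leq by about 3 dB. Real speech contains pauses and quieter segments of varying length, so its Leq can fall much further below the level of a sustained phonation.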
For this reason, studies that have measured short sustained phonations, whether they report instantaneous Lp or Leq, are describing a quantity more similar to a peak value when applied to continuous speech or song. This can be seen in the study by Akerlund et al. (1992), which measured Leq for normal speech and for a 2 s phonation. The maximum value of the phonation’s Leq was 25 dB greater than that of the speech.

Table 1 summarizes the maximum recorded SPLs of the relevant literature, including the corresponding estimated level at 1 m for comparison. The maximum fast-integrated Lp is similar but not identical to the Lpk value, but these measurements give a good overview of the highest average and instantaneous levels recorded for the human voice. In particular, Sundberg and Nordenberg (2006) report an Leq corresponding to about 90 dB at 1 m, the highest time-averaged value reported in the literature.

CHAPTER III

ACOUSTIC DIRECTIVITY OF VOCAL PRODUCTION MODES

Before constructing an acoustic model of Whitefield preaching, it is important to ask whether we possess sufficient information to model him as an acoustic source. A virtual acoustic source requires an SPL value, a spectrum of frequencies over which the sound power is divided, and an acoustic directivity pattern, which determines how the sound spreads out in space with respect to the source’s directional orientation. The SPL value is the variable for which we are trying to solve, so that can be left as an unknown for the present. The spectrum is fairly consistent across speakers of the same sex, so simulation engines generally use a single averaged spectrum for male and female talkers (B. Dalenback, 2011).
While Whitefield’s voice may have diverged slightly from the average male spectrum, those deviations are effectively unknown.* The nonlinear loudness sensitivity of the auditory system to different frequencies is already characterized by the STI, and unfortunately the frequency spectrum is not a simple one-dimensional system whose deviations from a mean value may be neatly evaluated in either direction. Rather, the spectrum is a complex multidimensional system with no clear preferential deviation from a mean male frequency profile based on the historical evidence. Because of this, it is better to use a mean spectrum while noting that extreme deviations could change the results for a given SPL, since both the auditory system and air absorption are frequency-dependent.

* For instance, if Whitefield’s voice had a noticeable Speaker’s Formant, he may have had a relative peak at 3 kHz compared to the average male spectrum. But this would also have resulted in a loss of energy at other frequency bands, so without more specific data an averaged spectrum is the best approximation we have.

Apart from level and spectrum, however, Whitefield’s vocal directivity pattern could have significantly affected his maximum audience size. Existing literature on this subject mainly focuses on conversational speech or the sung voice (e.g., Chu & Warnock, 2002; Cabrera et al., 2011, as mentioned in Chapter II). But trained vocalists often employ multiple modes of vocal production (Katz & D’Alessandro, 2007), and it is unknown whether this could cause significant deviations from averaged vocal directivity databases. This chapter analyzes the effects of vocal production methods on the horizontal acoustic radiation pattern of the trained voice.

Measurement Procedure

This research involved measuring the radiation patterns of trained vocalists employing different vocal production techniques at high levels.
Two male singers were measured, one a professional opera singer and the other a classically trained musical theater singer. Two actors were also measured, one male and one female. For each mode of production, each vocalist first intoned five vowels on C4: /i/, /eI/, /a/, /o/, and /u/, for about two seconds each. This was less to see the effect on any given phoneme than to investigate the changes without broadband fricatives, whose content is presumably less affected by a change in vocal production mode. After the vowels, the singers performed about 30 seconds of a prepared song, while the actors performed a monologue of the same length.

C4 was chosen because it was within the common range of both male and female vocalists. Since it is toward the bottom of a female’s typical range and toward the middle to top of a male’s range, the absolute loudness achievable will differ somewhat between genders. However, keeping the note the same assists the directivity analysis by ensuring that the fundamental and harmonics are uniform for each measurement, whereas raising the note for a female would lead to many irregularities at specific octave bands. In addition, because these results are intended primarily for making comparisons of different production techniques for each vocalist at specific third-octave bands, the change in absolute loudness matters less than the overall directivity by frequency. Furthermore, since Whitefield’s voice was subjectively described as somewhat high, this pitch corresponds to a good range in the male voice to observe possible directivity effects for different vocal production modes.

The singers and actors employed different vocabularies to describe different ‘placements’ of the voice corresponding to different production methods, so three methods were chosen for the singers and four for the actors.
The singers’ production methods were ‘back,’ ‘forward,’ and ‘in the mask,’ corresponding to perceived resonances in the rear of the mouth, the front of the mouth, and the sinus cavities, respectively.* The actors’ production methods were a ‘chest’ voice corresponding to a felt resonance in the front of the speaker’s chest, a ‘mask’ voice similar to that employed by the singers, a ‘head’ voice in which the voice is felt resonating at the top of the speaker’s head, and a ‘back resonance’ voice in which the speaker shifts the resonance to the rear of the torso.

* These categories were self-reported. While there were audible spectral differences between the methods, an expert listener expressed that the singers were not properly achieving the ‘mask’ and ‘front’ voices.

The measurements were conducted using 13 Earthworks M30 measurement microphones. These microphones have extended flat frequency responses, and each was calibrated within a 0.21 dB range using pink noise at 95 dBZ in a hemi-anechoic environment. The microphones were spaced along a semicircle at 15-degree intervals and aligned to the height of the center of each vocalist’s mouth. Assuming vocal symmetry, the data were doubled to form 360-degree radiation patterns in the horizontal plane. Only the horizontal plane at the level of the vocalist’s mouth was analyzed because more data are available for this plane (McKendree, 1986; Cabrera et al., 2011; Monson et al., 2012) and it is easier to measure without a dedicated microphone arc.

Figure 3: Diagram of microphone array used for measurements (array radius labeled 60 cm)

Each vocalist was aligned to the center of the semicircle at their measured height (fig. 4). No apparatus was used to keep the vocalists’ heads in place, but they were observed to keep very still during the measurements. Because the measurements were conducted in a live auditorium, the measurement distance was reduced to 60 cm from 1 m to better capture the acoustic near field.
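The doubling of the semicircular data into a full horizontal pattern can be sketched as follows. This is a hedged illustration of one way to do it, not a record of the processing actually used: it assumes that the 13 microphones sit at 0, 15, ..., 180 degrees with 0 degrees directly in front of the vocalist, that left-right symmetry holds, and that the pattern is normalized to its maximum value. The function name and the example levels are hypothetical.

```python
def full_circle_pattern(half_levels_db):
    """Mirror semicircle levels into a full circle, normalized to 0 dB maximum.

    half_levels_db: levels at equally spaced angles from 0 to 180 degrees
    (13 values for 15-degree spacing). Returns (angles_deg, levels_db)
    covering 0 to 345 degrees.
    """
    n = len(half_levels_db)
    step = 180.0 / (n - 1)
    # Reuse the interior points in reverse order; 0 and 180 degrees lie on the
    # symmetry axis and are therefore not duplicated.
    mirrored = list(half_levels_db) + list(half_levels_db[-2:0:-1])
    peak = max(mirrored)
    angles = [i * step for i in range(len(mirrored))]
    return angles, [level - peak for level in mirrored]


# Hypothetical levels falling by 0.5 dB per microphone from front to rear
example_half = [90.0 - 0.5 * i for i in range(13)]
angles, levels = full_circle_pattern(example_half)
```

Thirteen measured positions yield 24 directions in the mirrored pattern, so a level measured at 165 degrees also appears at 195 degrees. Normalizing to the maximum removes absolute level from the comparison, which suits differential comparisons between production methods.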
The auditorium’s mid-frequency reverberation time was measured to be 1.3 s using the Schroeder-integrated decay curves for a balloon popped in the space. Though the room doubtless creates some smoothing of the measured radiation patterns, an auditorium was preferable to an anechoic environment because the vocalists needed to project their voices at very high levels (Cabrera et al., 2011). Anechoic rooms often feel perceptually unnatural and can lead to a reduction in vocal level as a result (Katz & D’Alessandro, 2007). Although Katz and D’Alessandro (2007) attempted to reduce this factor in anechoic measurements by producing artificial reverberation through headphones, this was not an option for these measurements: for many of the vocal production methods the vocalists needed to first feel the resonance in a specific part of their head, often feeling their heads* with their hands before recording in order to find the purest version of a specific ‘voice.’ In addition, the measurements were not used for absolute comparisons to anechoic measurements from other studies but only to make differential comparisons between the methods and subjects within this study.

Figure 4: Aligning a vocalist with the measurement array

* While open headphones would probably have felt more natural, the need to feel the tops of their heads made any type of headphones problematic for this purpose.

Results

Normalized Radiation Patterns

The analysis of the data consisted of RMS levels of overall directivity and in third-octave bands for all the vocalists. Data were first normalized to observe any patterns in the directivity itself independent from spectrum. Figure 5 shows the overall levels for all four vocalists intoning the five vowels at C4.* Figure 6 shows the overall levels for the actors’ monologues and the singers’ songs, respectively. It is observed that the radiation patterns remain generally unaffected by the production method used, with a few variations of about 1 dB or less.
For the actress’s vowels, the ‘back resonance’ directivity was 1 to 1.5 dB louder in relative terms at rear positions than the other production methods (fig. 5b), but this pattern was not seen in the male actor’s vowels (fig. 5a) or in the actress’s monologue (fig. 6b). In general, the overall levels show no significant difference between speech or song and intoned vowels.

* Note that these are dB comparisons of RMS pressure alone, meaning that a dB comparison on the scale of pressure squared would simply be multiplied by 2 for each plot.

Figure 5: Normalized overall levels for the vocalists, intoning vowels on C4. Polar plots (0 to −8 dB) for (a) Actor, (b) Actress, (c) Musical Theater Singer, (d) Opera Singer.

The data were also analyzed in third-octave bands to show the effects of frequency on directivity. Selected bands are shown here for the sake of brevity. Figure 7 shows radiation patterns at three bands for the actress’s monologue. While the expected increase in directionality with increased frequency can be observed, the effect of the production method on directivity is very small (less than 1 dB) in most bands. The results for the actor’s monologue are so similar to the actress’s monologue that they are omitted here. Figure 8 shows the same frequency bands for the actress’s intoned vowels.
Larger variations between production modes can be seen at low frequencies, especially at the band centered at 251 Hz, close to the fundamental frequency of the C4 on which the vowels were sung. The modes of production become more uniform as the frequency increases, suggesting that the directionality of low-frequency speech is smoother than that of vowel production alone.

Figure 6: Normalized overall levels for the vocalists, speech (a and b) and song (c and d). Polar plots (0 to −8 dB) for (a) Actor, (b) Actress, (c) Musical Theater Singer, (d) Opera Singer.

Figure 9 shows the third-octave band data at 10000 Hz for the musical theater singer’s song and vowels. While both the actress’s speech (fig. 7c) and vowels (fig. 8c) were smooth at high frequencies, the musical theater singer’s vowels showed more variation in normalized directivity even at 10000 Hz (fig. 9b). Again, the vowels-only case shows the greatest variation in radiation pattern, but when averaged over a song segment (fig. 9a), these variations quickly diminish.
Future work may need to focus more on the directivity of separate phonemes, as a single vocalist’s instantaneous acoustic radiation can differ significantly from a long-term average.

Figure 7: Normalized third-octave bands for actress’s monologue. Polar plots (0 to −8 dB) at (a) 251 Hz, (b) 1000 Hz, (c) 10000 Hz.

Figure 8: Normalized third-octave bands for actress’s vowels. Polar plots (0 to −8 dB) at (a) 251 Hz, (b) 1000 Hz, (c) 10000 Hz.

The opera singer’s song directivity (fig. 10) also showed small variations across vocal production mode in most bands. However, unlike the previous two examples, this singer’s voice showed its greatest variation at the 1995 Hz band, with the radiation patterns becoming somewhat more uniform at the 10000 Hz band. Unlike the musical theater singer, the opera singer’s vowels and song data were both extremely similar in directivity.
This may be a consequence of the style of the aria being sung, which had many long vowel notes and fewer fricatives than the musical theater singer’s song.

Figure 9: Normalized 10000 Hz bands for musical theater singer’s song and vowels. Polar plots (0 to −8 dB) for (a) Song, (b) Vowels.

Absolute Radiation Patterns

After the normalized analysis, ‘absolute’ radiation patterns were also plotted relative only to the greatest level out of the vocal production modes used.* While the shapes of these radiation patterns were earlier seen to vary little, the changes in absolute level between different production modes can be informative. In figures 11c and 12c, the musical theater singer’s three production modes are not only the same shape but remarkably consistent in overall level as well, although we have already seen that this uniformity is not always present in individual frequency bands. But in figure 11, a and b, we see that the actors’ two chest modes and two head modes grouped more or less together, though not in the same way. For the male actor, the chest modes were uniformly louder, while for the female actor they were softer than the two head modes.

* 0 dB = the level of the mode with greatest level on-axis.

Figure 10: Normalized third-octave bands for opera singer’s song. Polar plots (0 to −8 dB) at (a) 251 Hz, (b) 1995 Hz, (c) 10000 Hz.
In fact, we can see from an absolute analysis that while the actress’s ‘back resonance’ mode had the highest relative level in the rear, the two head modes were still absolutely louder at those positions. In-depth analysis of the absolute directivity differences amounts essentially to a spatial spectrum, indexed first by position and then by frequency. At most frequency bands, the radiation exhibits patterns similar to those already seen. But at some frequencies, the patterns diverge drastically from simple uniformity. For instance, in figure 13 the opera singer’s normalized directivity is nearly the same at each frequency band. But while the absolute levels of each production mode are almost identical at 251 Hz (a) and 1000 Hz (c), the 501 Hz band (b) shows a large change in level for each vocal mode. This pattern was observed for the opera singer’s intoned vowels as well. Figure 14 shows the spectral progression of all four production modes in three third-octave bands for the actor’s intoned vowels. At 1000 Hz (a), we see the ‘back resonance’ and ‘chest’ voices grouping together, while the ‘mask’ and ‘head’ voices are lower. Then at 1995 Hz (b) the ‘chest’ voice groups with the lower level voices while the ‘back resonance’ voice is about 3 dB louder than the rest.

Figure 11: Absolute overall levels for the vocalists, intoning vowels on C4. Polar plots (0 to −8 dB) for (a) Actor, (b) Actress, (c) Musical Theater Singer, (d) Opera Singer.
Finally, at 10000 Hz (c), the two chest voices group together and the two head voices group together at a lower level, similar to their arrangement in (a).

Figure 12: Absolute overall levels for the vocalists, speech (a and b) and song (c and d). Polar plots (0 to −8 dB) for (a) Actor, (b) Actress, (c) Musical Theater Singer, (d) Opera Singer.

Discussion

Overall, the vocal production modes chosen for this study had a generally small effect on horizontal voice directivity. It is possible that different effects could be observed for vertical directivity, but this seems unlikely based on previous studies that have examined both dimensions.
At specific frequencies, and presumably for specific phonemes, the effects are greater, but no larger patterns regarding this behavior have been observed. Changes observed in this study were mainly on the order of 2 dB or less. While greater variations may exist that were hidden by the smoothing effects of recording in a diffuse environment, this is the same type of environment in which most performances for trained vocalists occur. If this is the only variation able to be controlled by a trained vocalist, it may not be a salient effect in diffuse environments, especially if any other sound sources are present.

Figure 13: Absolute third-octave bands for opera singer’s song. Polar plots (0 to −8 dB) at (a) 251 Hz, (b) 501 Hz, (c) 1000 Hz.

Figure 14: Absolute third-octave bands for actor’s vowels. Polar plots (0 to −8 dB) at (a) 1000 Hz, (b) 1995 Hz, (c) 10000 Hz.

This study also argues against some common notions about vocal directivity. For instance, some of the vocalists measured actively predicted that the ‘mask’ mode would radiate more energy forward than the other production modes. However, the recorded data for all four vocalists do not support that conclusion.
I have heard it said in both classical and musical theater circles that operatic singers radiate more circularly than musical theater singers. Since the two singers in this study sang different songs, the only example for comparison is their sung vowels on C4 (fig. 5). However, the measured data show that these two singers have fairly similar vocal radiation patterns. A larger sample size would be necessary to investigate this further, but this example casts some doubt on the claim. Most likely this piece of folk wisdom derived from a common misconception: the conflation of directivity and spectrum. It is possible that the perceived omnidirectionality of classical singers comes from the fact that male opera singers usually radiate more low-frequency sound energy than male musical theater singers, thus leading to the perception that their voices radiate differently in space rather than in frequency.

Finally, for the main purpose of this dissertation, this study indicates that specific oratorical production methods may affect overall level somewhat but have a small impact on the voice’s radiation pattern. Thus more comprehensive anechoic datasets on male voice directivity for loud speech are sufficiently robust for the purpose of simulating Whitefield’s oratory.

CHAPTER IV

ANALYSIS OF FRANKLIN’S EXPERIMENT

Franklin’s Experiment

In Chapter II, we examined the question of Whitefield’s crowd sizes from the perspective of historical scholarship. While a purely historical method may be informative, for the numerical question being considered it is also useful to use physical and mathematical reasoning in conjunction with historical evidence.
Interestingly, toward the end of his life Whitefield revised many of his journals and removed passages written in his youth that he had come to view as “justly exceptionable.” This included changing any estimates of crowds that were greater than 20,000 to “so many thousand that many went away because they could not hear” (Lambert, 1994). Though Whitefield’s voice was often described as “the roar of a lion” (Stout, 1991), he himself admitted that his largest gatherings were still limited by the audible range of his voice. Thus, it may well be that Franklin’s acoustical experiment provided the best method for estimating the maximum size of Whitefield’s audiences.

Unfortunately, many historians simply take Franklin at his word without bothering to look into the details behind his calculation. Franklin specifies that he was assuming a semicircular radiation pattern based on a uniform intelligible radius equal to the distance he measured from Whitefield’s position. He also lists his assumed crowd density as 2 square feet per person, which allows the minimally intelligible area and intelligible radius to be inferred from his reported crowd estimate. If we take his estimate of 30,000 auditors to be the exact answer to his calculation, then

\[ 30{,}000 \ \text{listeners} = \frac{\text{Area}}{2 \ \text{sq. ft.} / \text{listener}} \tag{4} \]

so

\[ \text{Area} = 60{,}000 \ \text{sq. ft.} \approx 5{,}575 \ \text{m}^2 \tag{5} \]

Since we know that for a semicircle,

\[ \text{Area} = \frac{1}{2} \pi r^2 \tag{6} \]

we have

\[ r = \sqrt{\frac{2 \cdot 5{,}575}{\pi}} \approx 60 \ \text{m} \tag{7} \]

This value of r is about half the distance to Franklin’s reported position near Front Street (Fig. 15). Since the area of a semicircle grows as the square of its radius, doubling the intelligible distance would quadruple the intelligible area to about 23,000 m². Some have asserted that Franklin merely miscalculated* or that he measured the distance in strides and misremembered it as feet.
(Liberman, 2005) Yet it seems unlikely that Franklin was actually referring to a position only 60 m from Whitefield, which would have been nearer to Letitia Court, a small alleyway, than to Front Street, which was one of the most important streets in Philadelphia at the time.

* See the editor’s note in Franklin’s Autobiography, p. 179

Figure 15: Inset of Clarkson-Biddle Map of Philadelphia showing Market Street

Another possible explanation is that he calculated the higher figure initially, but only reported it as “more than Thirty Thousand.” While this may seem strange to modern readers, it is actually very much in keeping with Franklin’s self-professed “Habit of expressing my self in Terms of modest Diffidence, never using when I advance any thing that may possibly be disputed...” (Franklin, 1793). After all, Franklin’s experiment, while perhaps the best that could be quickly calculated, was only an approximation, and it is likely he was aware of its shortcomings.* Also, in Poor Richard Improved (Franklin, 1749), Franklin describes a similar thought experiment using the same density for soldiers in formation. He ends this by stating that “There are many voices that may be heard at 100 yards distance,” which suggests that he specifically remembered measuring a distance at least this large. It seems reasonable as well that Franklin calculated a number far larger than the accounts of Whitefield’s crowds, but realized that while his estimation was extreme, it made the figure of 30,000 listeners seem more believable.

* For instance, it had been known since the ancient Greeks designed their amphitheaters that the human voice’s radiation is not perfectly semicircular (Orlowski, 2006).

Diffraction Effects

Though Franklin lists his position as ‘near Front Street’, he does not give a specific location. Thus the first question that must be asked is where exactly Franklin was in relation to Front Street when the noise source obscured Whitefield’s speech.
Though Franklin does not specify, it seems reasonable to assume that he was on-axis for Whitefield’s sermon, which would put him in the center of Market Street as he walked away from Whitefield. It will be seen (Fig. 16) that Franklin’s visible area on the south section of Front Street forms a triangle whose area doubles as Franklin’s distance to Front Street, d, halves. The geometry for the north section is similar though not identical, but there are historical reasons to suspect the noise source was to the south of Market Street, which will be discussed further later in this chapter.

The question arises as to whether Franklin was hearing diffracted noise around the corner of the intersection, or whether he had a direct path to the noise source itself. The nature of the building at this corner will be discussed later, but because the buildings in early Philadelphia were made of brick (Cotter, Roberts, & Parrington, 1992), it seems safe to focus only on diffracted sound and assume the noise conducted through the building itself is negligible. Assigning the origin in a coordinate plane to the point of diffraction (the blue dot in Fig. 16), Franklin’s position is

\[ \mathrm{BF} = (-d, \ w_1) \tag{8} \]

where $w_1$ is half the width of Market Street.

Figure 16: Diagram of Franklin’s position (BF) in relation to sources on Front Street

To compare a visible source to one which is not visible, we may assign representative points $c_1$ and $c_2$, where $c_1$ is the centroid of the triangle representing the visible area on Front Street, and $c_2$ is the centroid of the triangle of non-visible area bordering the visible area (Fig. 16). These have coordinates

\[ c_1 = \left( \frac{2 w_2}{3}, \ \frac{-w_1 w_2}{3d} \right) \tag{9} \]

and

\[ c_2 = \left( \frac{w_2}{3}, \ \frac{-2 w_1 w_2}{3d} \right) \tag{10} \]

where $w_2$ is the full width of Front Street.
It will be seen that $c_1$ approaches the border of Market Street as d becomes large, and that Franklin’s distance to $c_1$ is simply

\[ r_{c_1} = \sqrt{ \left( \frac{2 w_2}{3} + d \right)^2 + \left( w_1 + \frac{w_1 w_2}{3d} \right)^2 } \tag{11} \]

while the distance to $c_2$ is similarly

\[ r_{c_2} = \sqrt{ \left( \frac{w_2}{3} + d \right)^2 + \left( w_1 + \frac{2 w_1 w_2}{3d} \right)^2 } \tag{12} \]

The shortest path of diffracted sound is $r_A + r_B$, where $r_A$ is the distance from the source to the diffraction point and $r_B$ is the distance from the diffraction point to the receiver. These are then

\[ r_A = \sqrt{ \left( \frac{2 w_2}{3} \right)^2 + \left( \frac{w_1 w_2}{3d} \right)^2 } \tag{13} \]

and

\[ r_B = \sqrt{ d^2 + w_1^2 } \tag{14} \]

The path difference $\delta = r_A + r_B - r_{c_2}$ is used to calculate the Fresnel number, N, by

\[ N = 2\delta / \lambda \tag{15} \]

where λ is the wavelength of the sound being diffracted. The simplest approximation for calculating frequency-based attenuation is to treat the building blocking Franklin’s position as a planar screen, a practice that allows an intuitive understanding of the relationship between orthogonal screen height and wavelength (Maekawa, 1968; Meyer, 2009). This method of using an equivalent screen for wedge-based diffraction is not as accurate as Pierce’s more rigorous solution using the Fresnel auxiliary functions (Pierce, 1974). But since this is a more general historical application and Franklin’s position is well within the ‘shadow zone’ of the diffraction, the ‘equivalent screen’ method provides a good approximation.

One of the most popular methods in noise control engineering is the Kurze-Anderson formula (Kurze & Anderson, 1971), equation 16 below, which models Maekawa’s measured data closely for large values of N and within 1.5 dB whenever N < 1 (Menounou, 2001). This formula has the advantage of being calculated solely from the Fresnel number:

\[ \Delta L = 5 + 20 \log \frac{\sqrt{2 \pi N}}{\tanh \sqrt{2 \pi N}} \ \text{(dB)} \tag{16} \]

Using measured values from Fig. 15, we find that $w_1 = 50$ ft and $w_2 \approx 55$ ft.
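The chain from street geometry to attenuation can be sketched as follows. This is an illustrative reading of equations 10, 12, 14, 15, and 16, not the code used to produce the figures. The function names are hypothetical, and $r_A$ is taken here as the distance from $c_2$ to the diffraction point, following the verbal definition of $r_A$ as the source-to-corner distance; that choice is an assumption of this sketch. All lengths, including the wavelength, must share one unit.

```python
import math

def fresnel_number(d, w1, w2, wavelength):
    """Fresnel number N = 2*delta/lambda for a hidden source at the centroid c2.

    d          : Franklin's distance to Front Street
    w1         : half the width of Market Street
    w2         : full width of Front Street
    wavelength : wavelength of the noise, in the same unit as the lengths
    """
    # Centroid of the non-visible triangle (equation 10); origin at the corner.
    c2_x = w2 / 3.0
    c2_y = -2.0 * w1 * w2 / (3.0 * d)

    r_a = math.hypot(c2_x, c2_y)        # source to corner (assumed: measured from c2)
    r_b = math.hypot(d, w1)             # corner to Franklin (equation 14)
    r_c2 = math.hypot(c2_x + d, w1 - c2_y)  # straight line, Franklin to c2 (equation 12)

    delta = r_a + r_b - r_c2            # extra path length around the corner
    return 2.0 * delta / wavelength     # equation 15


def kurze_anderson_db(n):
    """Barrier attenuation in dB from the Fresnel number (equation 16)."""
    if n < 0:
        raise ValueError("this sketch only covers sources in the shadow zone (N >= 0)")
    x = math.sqrt(2.0 * math.pi * n)
    if x == 0.0:
        return 5.0  # limit of x / tanh(x) as x -> 0 is 1
    return 5.0 + 20.0 * math.log10(x / math.tanh(x))


# Example: street widths in feet from the text, Franklin 50 ft from Front Street,
# and an assumed 1 ft wavelength (roughly a 1 kHz tone).
n = fresnel_number(d=50.0, w1=50.0, w2=55.0, wavelength=1.0)
print(f"N = {n:.2f}, attenuation = {kurze_anderson_db(n):.1f} dB")
```

Since the attenuation depends on geometry only through N, sweeping `d` and `wavelength` over a grid gives the kind of distance-by-frequency surface described for the diffraction figures.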
We may use these to find values of the projected attenuation based on Franklin’s distance d from Front Street and the frequency† (in Hz) of a noise source located at $c_2$ relative to Franklin’s position (Fig. 17).

† The speed of sound c was calculated using a temperature of 4.5 °C, the average for Philadelphia in November, when Franklin’s account took place. This is discussed at length in the next chapter.

Figure 17: Diffraction at Franklin’s Position using Kurze-Anderson Formula

Using the more complex analytical description of Maekawa’s solution (Kurze, 1974), we can use

\[ \Delta L = 10 \log \frac{4 \pi^2 \delta}{\lambda} - 20 \log \frac{r_{c_2}}{r_A + r_B} + 10 \log \left( 1 + \frac{r_{c_2}}{r_A + r_B} \right) - 20 \log \left( 1 + \frac{\sin(\phi/2)}{\sin\left(\theta + (\phi/2)\right)} \right) \tag{17} \]

when N ≥ 1, and using the correction term

\[ \Delta L_R = 20 \log \frac{2 \pi \sqrt{N/2}}{\tanh\left( \pi \sqrt{N/2} \right)} \tag{18} \]

as a substitute for the first term in eqn. 17 when N < 1. The attenuation projected by this formula is almost identical to that of the Kurze-Anderson approximation. The full result of this solution is shown in Fig. 18.

Figure 18: Diffraction at Franklin’s Position using Maekawa’s Solution

Analyzing Figures 17 and 18, three things stand out:

1. The average attenuation for a source in the non-visible triangle from figure 16 would have been quite high, nearly 30 dB for high frequencies and large values of d.

2. The only instances in which attenuation would have been small (i.e. less than 10 dB) would have been for low-frequency sounds, probably less than 100 Hz, or for very small values of d.

3. Any diffracted noise would have been lower than the optimum frequency range to mask the human voice, meaning that it would have had to be very loud to still distract Franklin after being attenuated.

In addition, any higher-frequency noise would have been almost completely shielded by the building until Franklin came within about 5 feet (≈ 1.5 meters) of Front Street.
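The conversion from frequency to wavelength depends on the speed of sound at the footnoted temperature of 4.5 °C. A minimal sketch follows, assuming the common ideal-gas approximation for dry air, c ≈ 331.3·sqrt(1 + T/273.15) m/s; the dissertation does not state which formula was used, so both the formula and the function names are assumptions.

```python
import math

def speed_of_sound_m_s(temp_c):
    """Approximate speed of sound in dry air (ideal-gas approximation, assumed here)."""
    return 331.3 * math.sqrt(1.0 + temp_c / 273.15)


def wavelength_m(freq_hz, temp_c=4.5):
    """Wavelength in metres at the given frequency and air temperature."""
    return speed_of_sound_m_s(temp_c) / freq_hz


# Wavelengths at a few third-octave band centres used elsewhere in this work.
for f in (100.0, 251.0, 1000.0, 10000.0):
    print(f"{f:7.0f} Hz -> {wavelength_m(f):.3f} m")
```

Low frequencies have wavelengths of several metres, comparable to the street dimensions, which is why only they diffract around the corner without heavy loss.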
Therefore, if the noise source did not contain a large amount of low-frequency acoustic energy, the background noise would have been close to zero for slightly larger values of d. To answer the question of the nature of the noise source, however, we will need to further consult the historical and archaeological record.

Noise Sources in Eighteenth-Century Philadelphia

Though the issue of noise in cities seems like a modern problem, dwellers in colonial Philadelphia began to complain about noise sources within the first century of the city’s existence.* The Quaker Meetinghouse at 2nd and Market Streets, next to the Courthouse from which Whitefield preached, would later be abandoned because excessive street noise disrupted the Friends’ silent worship services (Rath, 2003). When William Penn first laid out the city, Front Street was to be a broad promenade along the river and thus would have contained the various wharves and docks where workers were loading ships. But soon merchants began constructing places of business closer to the water, and by 1739 Front Street would have already been cut off from the noise of the river (Cotter et al., 1992).

Rath’s study of colonial Philadelphia found that the two main sources of noise were carriage traffic and people (both from children’s games and human voices) (Rath, 2003). The question of carriage noise is complex, as the resulting noise can depend on the type of carriage, wheels, and especially the type of road on which it is traveling. While the carriage wheels may have been iron-wrought by 1739 (Rath, 2003) along with the horses’ shoes, the composition of Front Street itself in 1739 is difficult to ascertain. The city of Philadelphia would not take responsibility for paving its streets until 1762 (Cotter et al., 1992), and the streets up to that point were often taken care of by the residents and businesses who lived near them.
Franklin himself put together a scheme for paving and cleaning part of Philadelphia in the 1750s (Franklin, 1793). Traveling to Philadelphia in 1748, Swedish botanist Peter Kalm remarked of the streets that

...some are paved, others are not, and it seems less necessary since the ground is sandy, and therefore absorbs the wet. But in most of the streets is a pavement of flags, a fathom or more broad, laid before the houses...(Kalm, 1770)

* This discussion is limited to the noise sources that were likely to affect Franklin from Front Street. A broader history of noise in colonial Philadelphia is included in Appendix B.

There is some agreement that Market Street itself was probably paved to some extent by the 1720s (Boudreau, 2012b, 2012a; Jackson, 1918), although that ‘pavement’ would probably have consisted of what today we would call gravel (Hershey, 1975; Boren, 2012). As Front Street was one of the more important thoroughfares of the early city, it is likely that it too would have had such a treatment, but no explicit references to the street have been found in the historical literature, positively or negatively. Front Street likely either consisted of a similar gravel pavement, possibly with a flagstone sidewalk nearer the buildings, or else was still made of the sandy soil that Kalm described. In either case, a carriage traveling on such surfaces would have been less prone to the more impulsive attacks associated with cobblestone streets and certainly would not have generated enough low-frequency noise to obscure Whitefield’s speech through diffraction. Within the visible section of the street, however, such a carriage would have been a viable noise source once Franklin was close to Front Street.
Though it might be thought that the crowd listening to Whitefield would have been a significant noise source, Franklin specifically mentioned how silent they were (Franklin, 1793), and Whitefield himself commented that his audiences in America were even quieter than those in Britain (Stout, 1991). But by 1739 Whitefield was becoming quite a celebrity, and as such was drawing more than simply a devotional audience. Indeed, as Whitefield’s audiences grew, it is recorded that groups often gathered on the edge of his congregants, either social elites or the lowbrow inhabitants of London’s Moorfields, causing some noise around the periphery of the crowd (Dallimore, 1970; Wakeley, 1872). Such a gathering may have been the source of noise Franklin discussed, and Front Street was the ideal setting: the street itself was the site of many merchants’ clearinghouses, and the lively coffee houses that “opened directly into the life of the streets” were concentrated there as well (Cotter et al., 1992). In particular, the London Coffee House, the city’s “pulsating heart of excitement, enterprise, and patriotism,” was located exactly at the southwest corner of Market Street and Front Street, very near Franklin’s position (Ukers, 1922). Though the more famous Second London Coffee House was not founded until 1754, a chronicler lists its predecessor at the same intersection (though probably not the same corner), founded in 1702. A second account disputes this location, but the second chronicler is unsure of the exact location and may be referring to a different establishment. In addition, the southern part of Front Street had been home to at least two other coffee shops by 1739 (Ukers, 1922). In any case, there is no doubt that Front Street was a bustling center of cultural life for the city, and was a likely spot for those who did not wish to join Whitefield’s congregation on Market Street.
Discussion

Though Franklin misremembered some facts in his autobiography, it seems likely that he correctly recalled the facts of his experiment, especially his position near Front Street and the noise source he records there. Based on the geometry of Market Street in 1739, we have shown that diffracted noise would have been greatly attenuated except for low frequencies and small distances to Front Street. The middle frequencies of Whitefield’s voice that could carry to Front Street yet remain intelligible would still have been higher than the frequencies that could have been diffracted without significant energy loss.

Based on the historical evidence, conversation outside a store or coffee house along Front Street seems the most likely source of noise that would have reached Franklin’s position. Such conversation would also have occupied the same frequency range as Whitefield’s voice and thus been most likely to mask his sermon. Another possibility is noise from a carriage along Front Street, although the street would probably have been composed of gravel rather than hard cobblestone. But the noise generated from either of these sources would not have contained the large amounts of low-frequency energy necessary to diffract around a corner to Franklin’s position. It seems most logical to conclude that Franklin encountered the noise abruptly by coming very close to Front Street and obtaining a direct line-of-sight to the noise, which then reduced the intelligibility of Whitefield’s speech. While Franklin was approaching the corner, the diffraction would have caused a gradual increase in the noise until it had a direct path to him, avoiding any binary classification dilemmas. Since Franklin had a direct line of sight to Whitefield, Franklin’s experiment may now be simulated using geometric acoustic modeling techniques, which do not natively account for wave phenomena.
Geometrical simulation programs do, however, provide a wide range of tools for comparing the effects of spatial and spectral changes on STI, which is ideal for this application. In addition, this will require measured values for both possible noise sources, which may then be put into a virtual model to calculate the amount of noise at a series of receiver positions for Franklin along Market Street. Varying source distance within reasonable limits will allow high and low estimates for the SPL required for Whitefield to retain minimal intelligibility.

CHAPTER V

ACOUSTIC SIMULATION OF FRANKLIN’S EXPERIMENT

Franklin’s experiment certainly provides an important clue to the true range of Whitefield’s voice, but we should not take it at face value without considering the advances in physics that have been made in the past three centuries. Given these advances in acoustical knowledge, Franklin’s actual calculation is less important than his recorded data: the maximum intelligible distance of Whitefield’s voice. The analysis in Chapter IV has suggested that Franklin would have had to have been within about 1.5 m of Front Street to be able to hear significant masking noise from a source in that street. This allows a better estimate of the maximum intelligible distance of about 121 m along the ground. This piece of data can be used to reconstruct fully the acoustic system from Whitefield to Franklin, yielding an estimate of the source magnitude necessary to achieve minimum intelligibility at Franklin’s position.

The goal of this chapter is to estimate, based on the data from Franklin’s experiment, the time-averaged SPL of George Whitefield’s speaking voice at a distance of 1 m. This estimate can be used to simulate the maximum crowd size that could have heard Whitefield in the sites in London where he attracted his largest crowds.
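For reference, Franklin’s own semicircular calculation from Chapter IV can be reproduced in a few lines and applied to the revised 121 m distance. This sketch only restates Franklin’s simplified assumptions (uniform semicircular coverage, 2 square feet per listener) and is not the acoustic simulation developed in this chapter; the function names are hypothetical.

```python
import math

SQFT_TO_M2 = 0.09290304  # exact conversion: (0.3048 m)^2

def intelligible_radius_m(listeners, sqft_per_listener=2.0):
    """Radius of the semicircle implied by a crowd size at Franklin's assumed density."""
    area_m2 = listeners * sqft_per_listener * SQFT_TO_M2
    # Semicircle: area = (1/2) * pi * r^2, solved for r.
    return math.sqrt(2.0 * area_m2 / math.pi)


def semicircle_area_m2(radius_m):
    """Area of a semicircle of the given radius."""
    return 0.5 * math.pi * radius_m ** 2


def listeners_in_semicircle(radius_m, sqft_per_listener=2.0):
    """Crowd size that fills a semicircle at Franklin's assumed density."""
    return semicircle_area_m2(radius_m) / (sqft_per_listener * SQFT_TO_M2)


print(f"Radius implied by 30,000 listeners: {intelligible_radius_m(30000):.1f} m")
print(f"Semicircle area at 121 m: {semicircle_area_m2(121.0):.0f} m^2")
print(f"Listeners at Franklin's density: {listeners_in_semicircle(121.0):.0f}")
```

The last figure is what Franklin’s idealized method would give for a 121 m radius; it ignores voice directivity, buildings, and background noise, which is why the simulation below is needed.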
Makeup of the Colonial City

The first step toward building a model of the acoustic conditions present during Franklin’s experiment is to determine the geometrical layout of the ground, buildings, and people that would have been nearby during Whitefield’s sermon. While many period maps of Philadelphia exist, most of these depict only congruent boxes for the various buildings that made up the city. The earliest map of the city that includes scaled drawings of buildings and streets is the Clarkson-Biddle map of 1762 (Snyder, 1975), as shown in figure 15. Most of the Market Street area has changed dramatically since Franklin’s experiment, and most of the buildings that would have been present then, including the court house, no longer exist. Because of this, most of the geometrical information about the area must be reconstructed from the Clarkson-Biddle map and period drawings of the area, such as William Breton’s 1830 watercolor rendering of the court house (fig. 19).

Figure 19: William Breton, Old Court House & Second Friend’s Meeting, 1830, Library Company of Philadelphia

The primary material composition of the area can be determined through historical and archaeological research. The Clarkson-Biddle map describes the houses of the city as being made of brick. While most of these buildings no longer exist, the brick exterior of nearby Christ Church was completed during the 1730s and provides a good basis for the sizes of bricks that would have been used in the other buildings. This brick, along with glass windows and wooden doors visible in many drawings, accounts for most of the reflective surfaces on the buildings on Market Street. As discussed in Chapter IV, the material composition of Market Street was probably more similar to gravel than smooth pavement.
Since measured acoustic absorption data are available for all these materials, a geometrical computer model should be able to accurately recreate the acoustic conditions present during Franklin’s experiment.

Figure 20: Inset of George Heap’s East Prospect of the City of Philadelphia, 1752, New York Public Library

Modeling Procedure

Geometry

The Market Street area was modeled geometrically in AutoCAD first by making a 2-dimensional trace of the Clarkson-Biddle map. This was scaled to the width of Market Street itself, which was laid out to be 100 feet wide (30.48 m) (Cotter et al., 1992), and measurements on the site confirmed that this value is still accurate today. Heights were estimated by proportions of horizontal to vertical measurements in drawings like fig. 19 and broader views such as George Heap’s East Prospect of the City of Philadelphia (fig. 20). These estimates were used to extrude the 2D drawing into a 3D model. The ground area from the court house to Front Street was lowered linearly corresponding to a measured drop in elevation of 2.1 m using Google Earth’s elevation database.* Windows and doors were modeled for the court house but not for other buildings, as previous research indicates that such precision is only needed very close to acoustic sources and receivers for outdoor models (Mori et al., 2011). The area directly around the court house was modeled as a series of planes 1.5 m high representing the crowd listening to Whitefield. The crowd was modeled with a total area of about 1000 m², corresponding to the estimates of Whitefield’s Philadelphia crowds as about 6,000 people (Tyerman, 1877) and Franklin’s assumed density of about 0.186 m² per person. This yielded a final CAD model (fig. 21) that could be imported into CATT-Acoustic, a geometrical acoustic modeling program. In CATT the absorption coefficients for all acoustic surfaces were obtained from the publicly available ODEON library,† and are shown in table 2.
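As a quick arithmetic check on the geometric inputs quoted above, the following Python sketch recomputes the street width and the crowd area. The variable names are illustrative only, and the comparison of 0.186 m² per person with two square feet is an assumption introduced here for the check rather than a statement from this section.

```python
# Unit and area checks for the Market Street model inputs.
FT_TO_M = 0.3048  # exact length of the international foot in metres

# Market Street was laid out 100 feet wide.
street_width_m = 100 * FT_TO_M

# Crowd estimate and assumed density quoted in the text.
crowd_size = 6000
area_per_person_m2 = 0.186
crowd_area_m2 = crowd_size * area_per_person_m2

# Assumption for comparison only: two square feet per person.
two_sq_ft_m2 = 2 * FT_TO_M ** 2

print(f"Street width: {street_width_m:.2f} m")
print(f"Crowd area: {crowd_area_m2:.0f} m^2")
print(f"Two square feet: {two_sq_ft_m2:.3f} m^2")
```

The product of crowd size and density comes to roughly 1100 m², which is in line with the modeled crowd area of about 1000 m².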
* http://www.google.com/earth, accessed 7/22/2014.
† Available from www.odeon.dk, accessed 7/22/2014.

Figure 21: AutoCAD model of Market Street area, extruded from Clarkson-Biddle map. (a) Seen from East. (b) Seen from above.

Table 2
Absorption coefficients by octave band center frequency (Hz) for each material used in Market Street model

Surface                   63     125    250    500    1000   2000   4000
Brick, 19 holes, 60 mm    0.14   0.14   0.28   0.45   0.90   0.45   0.65
Gravel                    0.25   0.25   0.60   0.65   0.70   0.75   0.80
Audience area             0.60   0.60   0.74   0.88   0.96   0.93   0.85
Windows                   0.35   0.35   0.25   0.18   0.12   0.07   0.04
Solid wooden door         0.14   0.14   0.10   0.06   0.08   0.10   0.10

A source was placed at Whitefield’s position with an IEC male standard spectrum, facing outward. It was determined in Chapter III that existing voice directivity datasets were sufficient for this simulation, and so the CATT standard male spoken voice directivity pattern was applied to the virtual source. A receiver representing Franklin was positioned in the center of Market Street, 1.5 m from the edge of Front Street as previously discussed. Franklin’s position was 1.75 m above the ground level, corresponding to his known height. Whitefield’s exact height is not known, but he was described as being of medium height, so he was also modeled at 1.75 m above the court house steps. This gave a linear distance of 121.6 m between source and receiver, approaching Franklin’s position at a vertical angle of 2.6°.

Sound Attenuation Simulation

In an ideal free field, sound, like other wave phenomena, experiences inverse-square attenuation simply from spreading out in space. When plotted on a logarithmic decibel (dB) scale, this is often referred to as “6 dB per distance doubling,” following the free field attenuation formula (eq. 2). This equation is based on spherical source radiation with no reflecting surfaces present.
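The free field behavior described above can be illustrated with a short Python sketch. It assumes the standard point-source relation L(r) = L(r0) − 20 log10(r/r0); equation 2 itself is not reproduced in this section, so this form, the function name, and the 1 m reference distance are assumptions.

```python
import math

def free_field_attenuation_db(r_m, r_ref_m=1.0):
    """Spherical spreading loss in dB between r_ref_m and r_m.

    Assumes an ideal point source with no reflections and no air absorption.
    """
    if r_m <= 0 or r_ref_m <= 0:
        raise ValueError("distances must be positive")
    return 20.0 * math.log10(r_m / r_ref_m)

# Loss for one doubling of distance (the "6 dB per doubling" rule).
per_doubling = free_field_attenuation_db(2.0)

# Loss from 1 m to the modeled source-receiver distance of 121.6 m.
to_franklin = free_field_attenuation_db(121.6)

for r in (4, 8, 16, 32, 64, 128):
    print(f"{r:4d} m: {free_field_attenuation_db(r):5.1f} dB")
print(f"Per doubling: {per_doubling:.2f} dB")
print(f"1 m to 121.6 m: {to_franklin:.1f} dB")
```

Under these assumptions the spreading loss from 1 m to 121.6 m is a little under 42 dB. This is only the idealized baseline; the street model is expected to depart from it wherever reflections or air absorption matter.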
Free field conditions are often used as an approximation to the acoustic conditions present in outdoor locations because of the lack of reinforcing reflection buildup found in interior spaces. In the case of Market Street, however, the initial sound is highly channeled toward Franklin’s position due to the court house behind Whitefield and the buildings lining the street on either side. These reinforcing reflections negate free field behavior near the source, but the overall decay will approach free field conditions at a sufficient distance from the source.

The modeling environment, CATT-Acoustic, offers three different CT algorithms based on the density of cones and length of impulse response desired for modeling purposes (B. Dalenback, 2011). The first algorithm is optimized for speed, while the others are slower and use a higher density of cones. While the slower algorithms are recommended for detailed interior auralizations, in tests the simulated levels at Franklin’s position from all three algorithms were identical to within 0.1 dB. This is an advantage of CT algorithms with respect to RT: high ray counts are needed for RT applications at large distances due to the decreasing relative size of a receiver (Borish, 1984), but cones constitute a uniform solid angle at all distances. Thus in an outdoor case with no reverberation, even a low-order CT algorithm will yield a good model.

Figure 22: Predicted logarithmic attenuation from Whitefield to Franklin (plot titled “Market Street attenuation compared to free field”; curves: Market Street, Free Field; axes: attenuation in dB against distance in m from source)

The overall Z-weighted pressure attenuation along Market Street was tested in the computer model by placing virtual receivers along the center of the street, beginning at 4 m from Whitefield’s position, and doubling the distance until 128 m, just beyond Franklin’s position.
As expected, this shows an initial decay less steep than that of a free field, while at greater distances its overall slope becomes very close to that of a free field decay (fig. 22). It should be noted that the virtual model also includes some high-frequency attenuation from air absorption that is not accounted for in equation 2, indicating that the deviation from a free field model is slightly greater even than shown here. The model predicts that Whitefield’s vocal SPL at Franklin’s position was about 7 dB greater than it would have been at an equivalent distance in a more open environment.

Speech Intelligibility

After this initial analysis, it is useful to consider closely the acoustic system consisting of Whitefield’s voice, the Market Street area, and Franklin himself, since even though reflections may increase the overall level they may still reduce speech intelligibility, as in the case of highly reverberant rooms. By examining the full-spectrum pressure-squared echogram at Franklin’s position (fig. 23), it is evident that Whitefield’s voice is mainly aided by a single strong reflection from the court house itself behind him. The other reflections from surrounding buildings are more spread out in time and much weaker due to longer transmission paths and additional loss from surface absorption. Most importantly, the principal reflection reaches Franklin’s position only about 40 ms after the initial wavefront, within the 50 ms limit traditionally accepted as the cutoff time for reflections to enhance rather than degrade speech intelligibility (Bradley, Reich, & Norcross, 1999). The STI requires not only the reflection data for a given source-receiver combination, but also the background noise level at the receiver (Houtgast et al., 1980), and for this reason it was used to index Whitefield’s loudness rather than a purely time-based measure like C50.
STI uses both reflection data and background noise to calculate the overall signal degradation, which is pegged to an effective signal to noise ratio. This is then used to calculate the numerical STI quantifier from 0-1 for a series of octave bands. These bands are averaged using a weighting function for the non-linear frequency response of the auditory system to produce a single quantifier for a given acoustic system’s intelligibility. Before applying any STI calculations, it is necessary to approximate the background noise that Franklin described as having “obscur’d” Whitefield’s voice.

Figure 23: Summed pressure-squared echogram from Whitefield to Franklin

Background Noise

Chapter IV concluded that the two most likely candidates for a noise source on Front Street were either a conversation around the corner or a horse and carriage moving down the street. The time-averaged levels for both of these sources were found to be very similar: measurements of conversational speech (Boren & Roginska, 2013) matched closely the IEC standard of 59.5 dBA for normal speech (B. Dalenback, 2011). Several different horses and carriages were measured at approximately 1 m on the gravel circle around the Cherry Hill Fountain in New York’s Central Park. The time-averaged levels (Leq) of these measurements ranged from 60-63 dBA. The largest measured LAeq values were used in the model and are listed in table 3.

Table 3
Octave band averaged sound pressure level (dB) at 1 m for both background noise sources

Source                     125   250   500   1000   2000   4000   8000   16000
IEC Normal Vocal Effort    51    57    60    54     49     44     39     34
Maximum Carriage Noise     57    54    58    60     56     56     51     45

The exact noise level at Franklin’s position is harder to determine, however. A single point source relatively far down Front Street would be attenuated by nearly 30 dB according to a purely theoretical model.
However, this would have made the level at Franklin’s position slightly above 30 dBA, similar to that of a modern recording studio, at a time when he was complaining about the noise level, which seems suspect. Even at times of relative quiet outside, wind alone can generate noise around 40 dBA, which is still low enough to qualify for acoustical sustainability credits in educational buildings.* It is not possible to measure a similar minimum level on site today since motorized traffic and US Interstate 95 now significantly increase the noise level at the intersection to 70-80 dBZ. Because of this, a more general framework was developed for “near” and “far” sources experiencing respective attenuation of 10 dB and 15 dB, depending on their distance to Franklin’s position. This yielded background noise levels of about 50 and 45 dBA, depending on the exact octave-band results used from table 3. Because the carriage contained more energy in the most important octave bands for STI (2-4 kHz; Steeneken & Houtgast, 1980), this virtual source yielded a louder virtual Whitefield. While these higher frequency bands would be more subject to atmospheric air absorption than the lower bands, at a short distance (20 m or less) these effects would be about 1 dB or less, depending on the humidity and temperature (ANSI, 2009).

* http://www.usgbc.org/leed, accessed 7/22/2014.

Atmospheric Conditions

The exact temperature during Whitefield’s sermon is not known. Whitefield recorded preaching from the court house steps on the 8th, 9th, and 10th of November, 1739 in his first trip to Philadelphia (Tyerman, 1877). Franklin recorded an additional sermon on the following Sunday, November 11th (Franklin, 1739). The US National Climatic Data Center does not possess any recorded temperature data for Philadelphia prior to 1767.* Peter Kalm, the Scandinavian botanist, traveled throughout the American colonies ten years later and spent time in Philadelphia during November of 1749.
During this time he recorded the morning and afternoon temperatures using the newly-developed Celsius thermometer (Kalm, 1770). His recorded temperatures give an average of about 5 °C, similar to the November temperature in Philadelphia today.† While this is not determinative of the weather ten years prior, this clue was used to base the temperature and humidity data in the model on current normal values for Philadelphia in November: 4.5 °C and 50% humidity, respectively.

* http://www.ncdc.noaa.gov/, accessed 7/22/2014.
† http://weatherspark.com/averages/31282/11/10/Philadelphia-Pennsylvania-United-States, accessed 7/22/2014.

Results

The computer model was used to simulate the minimum SPL on-axis 1 m from Whitefield’s mouth that would be necessary to generate a minimal value of the STI based on “near” and “far” background noise sources. The value of the STI that defines the threshold of intelligibility varies among different people: since STI is a classifier of the external acoustic system to the listener, actual subjective intelligibility will depend on the listener’s hearing perception. An STI of 0.3 is usually seen as the value at which intelligibility becomes “bad” (Hodgson, 2002). Because of this, STI values from 0.2 (corresponding to better than average hearing) to 0.4 (corresponding to below average hearing) were simulated. There is no evidence that Franklin had hearing loss, and he was still relatively young (33) at the time. Because of this, the “normal” threshold of 0.3 is probably the best measure. However, the higher and lower values are included as well, since this variable can later be included in simulations of Whitefield’s crowds in London by using the minimum STI (instead of the maximum intelligible distance) to estimate how many people with identical hearing (i.e. how many Benjamin Franklins) could have heard Whitefield speak, as Franklin’s original experiment did.
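To make the STI thresholds above more concrete, the following Python sketch computes a simplified, noise-only STI from octave-band signal-to-noise ratios. It is not the CATT-Acoustic calculation used in this chapter: it ignores reflections, auditory masking, and the hearing threshold, and the equal band weights are a placeholder assumption rather than the standardized weighting. All function and parameter names are hypothetical.

```python
import math

def transmission_index(snr_db):
    """Transmission index (0-1) for one band when only stationary noise degrades the signal."""
    # Limit extreme inputs so the arithmetic below stays well defined.
    snr_db = max(-60.0, min(60.0, snr_db))
    # Modulation reduction factor due to noise alone.
    m = 1.0 / (1.0 + 10.0 ** (-snr_db / 10.0))
    # Effective signal-to-noise ratio recovered from the modulation factor.
    snr_eff = 10.0 * math.log10(m / (1.0 - m))
    # Clip to the +/-15 dB range, then map linearly onto 0-1.
    snr_eff = max(-15.0, min(15.0, snr_eff))
    return (snr_eff + 15.0) / 30.0

def simplified_sti(band_snr_db, weights=None):
    """Weighted mean of band transmission indices (equal weights unless given)."""
    if weights is None:
        weights = [1.0] * len(band_snr_db)  # placeholder assumption
    total = sum(weights)
    return sum(w * transmission_index(s) for w, s in zip(weights, band_snr_db)) / total

def uniform_snr_for_sti(sti):
    """Uniform band SNR (dB) giving the requested STI in this simplified model."""
    return 30.0 * sti - 15.0

for target in (0.2, 0.3, 0.4):
    snr = uniform_snr_for_sti(target)
    print(f"STI {target:.1f}: uniform SNR {snr:+.1f} dB, check {simplified_sti([snr] * 7):.2f}")
```

In this simplified picture the thresholds of 0.2, 0.3, and 0.4 correspond to uniform band signal-to-noise ratios of about −9, −6, and −3 dB. It should not be expected to reproduce the levels obtained from the full model, which uses band-specific spectra and the simulated reflections.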
Table 4
Simulated LAeq values (dB) for Whitefield’s voice based on background noise distance and minimum STI value

Vocal Noise
Minimum STI:      0.2   0.3   0.4
Close Source:     86    90    93
Far Source:       81    85    88

Carriage Noise
Minimum STI:      0.2   0.3   0.4
Close Source:     90    95    99
Far Source:       85    90    95

Table 4 shows the simulated values of Whitefield’s on-axis SPL, in dBA, for both possible noise sources. It will be noted that every estimate is greater than the IEC standard “Loud” voice, which is about 74 dBA. Many of the estimates, depending on their assumptions, posit that Whitefield’s voice was much louder than this standard, up to and beyond 90 dBA. The highest estimates are probably unreasonable, and greatly exceed the maximum on-axis vocal Leq in the existing literature, as discussed in Chapter II. However, the more moderate estimates of a time-averaged Leq of 90 dBA closely match the maximum measured values.

Discussion

The only acoustical factor not accounted for by a geometric model is atmospheric refraction. This may occur due to changes in wave velocity throughout a medium due either to wind or varying temperature (Piercy, Embleton, & Sutherland, 1977). Wind data are of course not obtainable for the day in question, but based on both Franklin’s and Whitefield’s references to the quiet of the scene, it seems reasonable to assume the day was not particularly blustery, especially since Whitefield often mentioned the wind if it was an acoustical factor (Dallimore, 1970). Temperature gradient data are likewise unavailable, but under normal conditions the temperature is either fairly steady or is in a “lapse” state, in which air gets colder with increasing elevation. This causes sound waves to bend upward and attenuate faster, and this effect would have been increased due to Whitefield’s elevated position.
This means that adding any refraction effects into the model would only increase the simulated SPL for Whitefield’s voice, which already approaches or exceeds the existing maximum measured values of vocal SPL discussed in Chapter II. The opposite effect, a temperature inversion that carries sound farther than normal, is rare and occurs chiefly at night or early morning, or sometimes after a rainstorm (Ross, 1999). Since there is no evidence of a temperature inversion, the total refraction effects on Franklin’s experiment may be presumed to be minimal.

It is known that training (Mendes et al., 2003) and youth (Kent et al., 1987) both contribute to maximum vocal output, and Whitefield had both on his side as he spoke for hours per day though only 24 years old at the time. While simulated SPL values greater than those verified experimentally should be viewed with caution, the computer model predicts that Whitefield’s average SPL during Franklin’s experiment could have exceeded 90 dBA at 1 m, indicating that Whitefield might well have been one of the loudest people that ever lived.

CHAPTER VI

MAXIMUM AVERAGED AND PEAK VOCAL SPLS

In the previous chapter we showed that acoustical models of Franklin’s experiment suggest an estimate of Whitefield’s average vocal SPL at 1 m from 81-99 dBA. These estimates were roughly evenly distributed around a median value of 90 dBA, with values from 85-95 dBA projected for most combinations of noise level and STI minimum. The survey of existing literature in Chapter II showed a limit of about 90 dBA for time-averaged measurements. To test this further, it will be useful to make a series of vocal SPL measurements for trained actors and singers to compare their maximum levels. In Chapter III, vocal production modes were not found to have a significant effect on acoustic radiation pattern, but some vocalists believed that certain vocal placements were more effective at reaching an audience than others.
It was hypothesized that this was because different vocal resonances might correspond to differences in overall sound pressure rather than directivity. Absolute SPLs of subjective dynamic levels measured in a controlled environment could examine this theory further.

The maximum peak and average SPL values for trained vocalists are important because training can significantly increase potential vocal output (Mendes et al., 2003; Awan, 1991). The IEC standard of vocal level for acoustic simulation assumes a 3-level scheme of ‘normal,’ ‘raised,’ and ‘loud,’ with the top category corresponding to an Leq of about 74 dBA (B. Dalenback, 2011). But Whitefield’s example suggests that there exists some headroom above this designation, corresponding to an even louder ‘maximal’ level of speech. Even if this level is only achievable by some trained vocalists, this is important to consider since trained vocalists are disproportionately represented in recording studios, concert halls, and theatres. For recording engineers, knowing the maximum peak and average SPLs producible by vocalists can be useful for pre-calibrating vocal microphones. For live sound engineers, acoustic conditions may be very different for a very loud trained vocalist and require less loudspeaker reinforcement as a result.

Method

Pilot Study

To first investigate the possible average maximum level of the spoken voice, a pilot study was undertaken using one professional actor and one professional actress. This is too small a sample size to extrapolate aggregated data. However, since the goal of this research was principally to investigate the extreme outliers, this study was meant only to explore the maximum levels achievable by trained vocalists. The vocalists were measured in the live room of the James Dolan Recording Studio at NYU. The room’s dimensions are 9 m by 4.6 m by 3 m.
The vocalists were aligned to exactly 1 m in front of an on-axis Micro-SPL measurement condenser microphone attached to an XL2 Sound Level Meter, logging peak and averaged values of LA and LZ in 1 s intervals. The meter’s sensitivity range was set to Lp values of 30-130 dB. The vocalists were not restrained in any way, but were observed to keep still during the measurements.

Both speakers were instructed to recite a short monologue, about 30 to 60 s in length, from memory at three different subjective loudness levels: ‘conversational’ speech, ‘theatrical’ speech (defined as the level necessary to be intelligible to an audience in a small theatre with no amplification system), and ‘maximal’ speech (defined as the loudest achievable without shouting or screaming). While some studies have used noise played over headphones to induce a higher SPL out of vocalists (Akerlund et al., 1992), for some vocalists this condition has actually reduced maximum sound pressure (Gramming et al., 1988). Since all vocalists measured for this study were highly trained, no headphones or noise conditions were used, ensuring that the participants felt natural during the experiment.

The levels recorded will be reported chiefly in dBA, though the LZ values were generally within 1 dB of the LA values. Both speakers in the pilot study achieved levels comparable to the ‘loud’ designation in (B. Dalenback, 2011) at their ‘theatrical’ level, and both were able to produce an average level slightly over 90 dBA at their ‘maximal’ level (table 5). The corresponding Z-weighted values were about 0.5 dB greater for both vocalists.

Table 5
Leq values for pilot study, in dBA, for Conversational, Theatrical, and Maximal Levels

Participant      Conv.   Thea.   Max.
P1 - actor       64.1    77.9    90.1
P2 - actress     65.3    73.4    90.7

Table 6
Lpk values for pilot study, in dBA

Participant      Conv.   Thea.   Max.
P1 - actor       93.6    106.0   113.2
P2 - actress     92.3    98.7    113.3

Table 6 shows the highest peak value measured for each speech designation. It can be seen that the peak value in a given monologue was routinely 20-30 dB greater than the average level for that period. To investigate this difference, we define the peak spread of a given vocal measurement as follows:

$$ S_{pk} = 20 \log_{10} \frac{p_{pk}}{p_{eq}} = L_{pk} - L_{eq} \qquad (19) $$

where Spk is the peak spread, ppk is the peak pressure, and peq is the average pressure. Table 7 gives the peak spread for both participants in the pilot study based on the A-weighted levels. It can be seen that Spk decreased with increasing vocal level, such that the ‘maximal’ voice contained the least variation in pressure.

Table 7
Spk values for pilot study, in dBA

Participant      Conv.   Thea.   Max.
P1 - actor       29.5    28.1    23.1
P2 - actress     27.0    25.3    22.6

Spoken and Sung Voice

The pilot study showed that trained vocalists could indeed reach average levels of 90 dBA, and raised other interesting questions about the relationship between peak and average values in vocal output at different levels. After this, a broader series of measurements was conducted on 9 trained singers in the same space, using the same setup and equipment. The vocalists comprised 6 females (5 sopranos and 1 mezzo soprano) and 3 males (2 baritones and 1 tenor). The same three spoken voice designations were measured for these participants. In addition, the singers also sang a piece, about 30 to 60 s long, from their repertoire at three different dynamic levels: pianissimo (pp), mezzo forte (mf), and fortissimo (ff). While it is known that frequency content is strongly correlated with short-term SPL (Coleman et al., 1977), the singers were merely instructed to select a piece from their actual repertoire that they could sing as loudly as possible while retaining a ‘musical’ tone.
Frequency of the highest note was not a criterion for selection, but singers were instructed to retain the original key of the pieces to ensure that the pieces were not artificially amplified by modulation. To investigate the role of different vocal resonances, the vocalists sang all three dynamic levels using both the ‘back’ and ‘mask’ voices, for 6 total sung measurements. The ‘back’ voice places the primary vocal resonance at the rear of the vocal tract, similar to a yawn in its most extreme form. The ‘mask’ voice uses the resonances of the sinus cavities at the front of the head. Appendix A lists the full Leq and Lpk for the spoken, back sung, and mask sung voices for the 9 singers, along with each vocalist’s numerical ID and vocal range.

Analysis

Average Levels for Speech

The mean of the 9 singers’ spoken levels was 59.0 dBA for ‘conversational’ speech, 69.9 dBA for ‘theatrical’ speech, and 79.6 dBA for ‘maximal’ speech. Each of these values increases by 1-2 dB if the two actors from the pilot study are included in the data, indicating that a population of all trained actors may achieve even higher mean values. Even with singers, however, the mean ‘maximal’ LAeq was still about 6 dB higher than the ‘loud’ level of 74 dBA used in the IEC standard (B. Dalenback, 2011). Individual vocalists were able to exceed the ‘loud’ level by up to 15 dB.

Gender Differences

The mean Leq values were higher among male vocalists than female vocalists at the highest vocal levels for both speech and sung conditions, consistent with previous research (Kent et al., 1987). For the sung conditions the males showed a larger dynamic range overall, as their mean Leq was lower for the pianissimo condition, but this may be a consequence of the small sample size of male singers (n=3). This is not to say that females cannot produce equally high levels; in fact, the highest recorded Leq for speech, 90.7 dBA, was produced by the actress in the pilot study.
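Because the peak spread of equation 19 is simply the difference between the peak and average levels, the pilot-study values can be recomputed directly from tables 5 and 6. The Python sketch below does this; the dictionary layout and the function name are illustrative only.

```python
# Pilot-study levels in dBA, ordered as (Conv., Thea., Max.).
LEVELS = ("Conv.", "Thea.", "Max.")
LEQ_DBA = {
    "P1 - actor": (64.1, 77.9, 90.1),
    "P2 - actress": (65.3, 73.4, 90.7),
}
LPK_DBA = {
    "P1 - actor": (93.6, 106.0, 113.2),
    "P2 - actress": (92.3, 98.7, 113.3),
}

def peak_spread(lpk, leq):
    """Peak spread Spk = Lpk - Leq for each level, rounded to 0.1 dB."""
    return tuple(round(p - e, 1) for p, e in zip(lpk, leq))

spreads = {name: peak_spread(LPK_DBA[name], LEQ_DBA[name]) for name in LEQ_DBA}

for name, values in spreads.items():
    print(name, dict(zip(LEVELS, values)))
```

The recomputed differences agree with the entries of table 7, and for both pilot participants the spread narrows as the vocal level rises.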
Spoken Levels For Singers

Another interesting aspect of the recorded data is the differences between sung and spoken data for the trained singers. While previous studies have conclusively shown that vocal training can lead to higher maximum SPL (Mendes et al., 2003; Akerlund et al., 1992), this effect was lessened for some singers during the spoken conditions, as many of the singers produced maximum Leq values that were much lower than those of the two actors. In fact, vocalists 3, 5, and 6 had maximum values of spoken Leq that were 5 dB or more lower than the Leq for their pianissimo mask voice! This was not the case for other singers, such as vocalist 7, who was able to produce sung and spoken maximums of Leq similar to the levels produced by the trained actors. This suggests that some trained singers may have different mental frameworks for spoken vs. sung voice, which increases their maximum sung SPL more than their maximum spoken SPL.

Peak Spread

Figure 24: Mean peak spread, dBA

After the pilot study, it had been anticipated that the peak spread would be reduced as the SPL of the measured signals increased. However, the mean spread for all vocalists measured actually shows a slight increase in peak spread with increasing SPL for both sung conditions (fig. 24). The highest mean Spk value for speech was found at the medium dynamic (‘theatrical’ level). This persisted whether calculating the mean for all eleven vocalists or just for the nine singers. While the two actors’ peak spread was greatly reduced for their loudest speaking voice, the singers increased their sung Lpk slightly faster than their Leq as their dynamic level increased.

Standard Deviation by Level

Figure 25: Standard Deviation for the 9 Singers, dBA

In addition to measuring the range of an individual’s pressure variations from the mean, it is also helpful to examine the total variance in Leq across all the singers via the standard deviations of the A-weighted levels (fig. 25).
The spoken voice conditions show a clear increase in standard deviation with increasing level, indicating that the singers’ SPLs were more dispersed as they spoke with greater effort. Interestingly, this trend was not observed in the standard deviations for either of the sung voices, which stayed in a similar range at all three levels. It is possible that because the subjects were primarily trained as singers rather than speakers, they had more precision as a group in their sung levels than in their spoken levels.

Back vs. Mask Levels

Figure 26: Mean dB Increase from Back to Mask Voice

Figure 26 shows the average dB difference between the ‘mask’ voice and the ‘back’ voice. As had been hypothesized, trained singers usually interpreted the same dynamic levels at lower SPLs for the ‘back’ voice. This difference is greatest at pianissimo and decreases with increasing dynamic level. Participants 4, 5, and 6 each showed a difference greater than 7 dB for pianissimo. It is possible this difference in subjective level stems from the greater damping of the ‘back’ voice due to its placement in the rear of the vocal tract. Since singers may judge their dynamic level based more on vocal effort than absolute SPL, an equivalent vocal effort may lead to lower relative output pressure for the back voice.

Discussion

When comparing maximum SPL measurements in the literature, averaged and peak levels should be distinguished based on the nature of the experiment. Both past studies and this current experiment have yielded maximum Leq values of 90-91 dB, as well as maximum Lpk values in the range 110-114 dB. The difference between peak and average values fluctuates between about 20 and 30 dB, and it may possibly behave differently for trained actors versus trained singers. For the purposes of simulating George Whitefield’s voice, this study confirms that averaged values of around 90 dBA are perfectly possible 1 m from a speaker.
While it is conceivable that his voice may have been louder than any of the vocalists measured so far, any estimates above this measured maximum should be viewed with caution until they can be experimentally verified.

CHAPTER VII

MODELING THE SITES OF WHITEFIELD’S LONDON CROWDS

The final step to investigate Whitefield’s maximum crowd size and check Franklin’s own estimate is to model the acoustic propagation of Whitefield’s voice at the locations in London where his largest crowds were reported: the Moorfields, Kennington Common, and Mayfair. The material and geometric composition of these sites in Whitefield’s time must be established in order to model the acoustic system at each site accurately.

Locations

Moorfields

Background

The Moorfields was a park in London outside the Moorgate near the homes of many of Whitefield’s most devoted followers, near where both Whitefield’s and John Wesley’s devotees would later build their headquarters (Dallimore, 1970). Its wide open space functioned as something of a city mall and attracted the detritus of society. This was literally true in the case of the “not inodorous” heaps of refuse and open sewers that began to accumulate there in the seventeenth century (Thornbury, 1878). It was also metaphorically true in the case of the lower class of society who gathered there for “bear-baiting, merry-andrew shows, wrestling, cudgel playing and dog fights” (Dallimore, 1970). Though more respectable Anglican clerics avoided the area for this very reason, Whitefield, the consummate evangelist, saw only lost souls in need of his message of the new birth.

Whitefield’s open-air preaching in London had begun in late April, 1739, when the leaders of St Mary’s church at Islington had refused to allow him to preach there after initially inviting him. Whitefield took this refusal as license to head outdoors, and he promptly preached to a crowd that gathered in the churchyard outside (Tyerman, 1877).
On Sunday, April 29 (all dates from the Julian Calendar), Whitefield ventured out into the Moorfields for the first time. According to John Gillies, who wrote the first biography of Whitefield,

Opportunities of preaching in a more regular way being now denied him, and his preaching in the fields being attended with a remarkable blessing, he judged it his duty to go on in this practice, and ventured the following Sunday into Moorfields. Public notice having been given, and the thing being new and singular, upon coming out of the coach, he found an incredible number of people assembled. Many had told him that he should never come again out of that place alive. He went in, however, between two of his friends; who, by the pressure of the crowd, were soon parted entirely from him, and were obliged to leave him to the mercy of the rabble. But these, instead of hurting him, formed a lane for him, and carried him along to the middle of the fields, (where a table had been placed, which was broken in pieces by the crowd,) and afterwards back again to the wall that then parted the upper and lower Moorfields; from whence he preached without molestation, to an exceeding great multitude in the lower fields. (Gillies, 1772)

This first crowd at Moorfields was estimated at 10,000 people, but these numbers would grow as Whitefield began preaching there each Sunday while he was in London. The next week, his crowd was estimated at 20,000, and the week after (May 13) he reported:

Preached this morning to a prodigious number of people in Moorfields and collected for the orphans £52 19s. 6d., above £20 of which was in half-pence. Indeed, they almost wearied me in receiving their mites and they were more than one man could carry home. Went to public worship twice and preached in the evening to near sixty thousand people.
Many went away because they could not hear, but God enabled me to speak so that the best part of them could understand me well, and it is very remarkable what a deep silence is preserved while I am speaking. (Whitefield, 1756)

The quote above would seem to indicate that Whitefield’s evening crowd, estimated at 60,000, was observed at the Moorfields. However, Whitefield’s publicist William Seward the next day reported crowds of 50,000 at the Moorfields and 60,000 at Kennington Common (Lambert, 1994). It was Whitefield’s usual practice to preach at Moorfields in the morning and Kennington in the evening, so this may be the case. However, on such occasions in his journal he usually named both locations specifically. If Seward’s account is correct, the highest reported crowd for the Moorfields is then 50,000. Under the alternate interpretation, the estimate of 60,000 would be attributed to the Moorfields rather than Kennington Common.

Figure 27: Inset of John Rocque’s 1746 Map of London showing the Moorfields

Modeling

Because of Whitefield’s specific attachment to this area, there is also more specific historical data available as to his position there. The Moorfields (fig. 27) was divided into three portions, designated as the lower, middle, and upper Moorfields. The lower Moorfields (today Finsbury Circus) was the largest portion, with more trees and greenery shown on John Rocque’s map of the area, based on surveys carried out from 1737 to 1746.* Gillies’s account is the most specific reference to the exact spot where Whitefield preached at any of the sites in London.

* http://www.motco.com/map/81002/

However, the upper and lower Moorfields proper are not directly adjacent to one another, so Gillies must be referring to either the border between the upper and middle Moorfields or the middle and lower Moorfields.
It will be noted from Rocque’s map that the lower Moorfields’ northern edge is made of a line of trees, thus lessening the audience that could have heard Whitefield if he had preached there. In addition, William Denton’s account of the area mentions a “low wall” separating the upper and middle Moorfields but mentions no wall between the middle and lower Moorfields (Denton, 1883). Though they were slightly smaller, the upper and middle portions contained more wide open space and were nearer to Whitefield’s tabernacle north of the upper Moorfields. Thus it seems likely that Gillies misspoke and was referring to the wall between the upper and middle Moorfields.

The Moorfields was modeled geometrically using a Sketchup rendering of the ground area based on existing data from the Google Maps database (fig. 28). Because the area of Whitefield’s preaching has since been developed, no elevation data was available except for the heights of the buildings present there today. To check the topography, points were selected from streets around the area on each side, which showed that the area was quite flat and lacked any significant changes in elevation. This was confirmed by a foot survey of the area as it exists today.

The tree lines in the Moorfields were modeled as shown in the Rocque map, with an assumed height of 5 m. Since unnecessary planes can reduce the accuracy of a geometrical acoustical model (Rindel, 2010) and are not generally recommended unless they are very close to an acoustic source (Mori et al., 2011), the tree lines were simplified as planes that were 10% acoustically absorbent and 30% acoustically transparent with a large mid-frequency scattering coefficient of 0.5, allowing half of incident sound through and providing few specular reflections.
As this was not an important London neighborhood, few period prints or drawings depict the exact arrangement of the bordering areas next to the Moorfields. The buildings lining the edges of the Moorfields were modeled three stories (about 10 m) tall with a sturdy wood construction on their facades with a low mid-frequency scattering coefficient of 0.1. The absorption data for the wooden buildings surrounding Moorfields are shown in table 8. It will be noted, however, that the precise reflective characteristics of these buildings only become acoustically relevant if it is predicted that the edges of the open ground could hear Whitefield clearly.

Figure 28: Sketchup Model of the Upper and Middle Moorfields

The audience occupying the entire area of the site was modeled based on an average density of 2 persons per square meter (absorption data shown in table 8). The question of the correct density will be addressed in Chapter VIII, but as this is the densest audience value for which measured absorption data exist, it is not possible to use a denser value at any rate. In addition, since humans are more efficient absorbers at low densities, the extra Sabine absorption per m² will not increase greatly as density is increased beyond an already high value (Meyer, 2009). Since these environments are reasonably free field and lack reverberation effects, even a slight change in crowd absorption should not significantly affect the STI calculations, which depend much more on the direct sound level and background noise in these cases.

Table 8
Absorption coefficients for buildings and crowds at the Moorfields

Frequency (Hz):        125    250    500    1000   2000   4000
Wood Absorption:       0.11   0.07   0.03   0.01   0.01   0.02
Audience Absorption:   0.26   0.46   0.87   0.99   0.99   0.99

Whitefield’s mouth was modeled at a height of 1.75 m, standing atop a stone fence of 1 m in height in the center of the border between the upper and middle Moorfields.
The crowd was also modeled as 1.75 m, giving Whitefield an effective height of 1 m and ensuring a direct line of sight to those in the crowd because of the flatness of the area. The total area of the entire Moorfields region is about 22 acres (89,000 m²), but it will be noted that the combination of the tree line and the concavity of the outer border significantly reduces the area that would have had a direct line of sight to Whitefield’s preaching position.

Kennington Common

Background

Kennington Common (today called Kennington Park), near the Manor of Vauxhall, was the most wide open of the three sites. Like the Moorfields, it had gained a reputation as a dangerous section of the city because of its history as an execution ground. It was the home of “vicious sports and drunken brawlings,” a place where “the harlot and pick-pocket sought the victims of their trades, and...the mob assembled, ready for any act of violence” (Dallimore, 1970). Whitefield was again drawn to such a large unreached audience, and he himself joined the spectacle by preaching there regularly. On May 6, 1739, he spoke to a crowd estimated at 50,000, and on June 3, to another possibly larger (Whitefield, 1756). As mentioned before, depending on the interpretation of Whitefield’s journal entry, it is possible that his crowd of May 13, 1739 (estimated at 60,000) was observed at Kennington Common instead of the Moorfields.

Modeling

Figure 29: Map of Kennington Manor, including the Common, based on Hodskinson and Middleton’s survey, 1785

Kennington Common was unfortunately not included in Rocque’s map of the city, but another map of the Common and surrounding area (segments 11 and 12 in fig. 29) shows that the common occupied essentially the same space as the park does today. This makes Kennington Park the least-developed and best-preserved of the sites being modeled. As with Moorfields, Kennington was modeled in Sketchup based on the Google Maps data for the area (fig. 30).
It is the most topographically varied of the three London sites, containing a slight hillock towards its center. This is still a very slight variation from a perfectly flat plain, however. Based on period images and descriptions, the Common did not have buildings close to it that could act as potential boundary reflectors. Since the area was a Common in 1739 (as opposed to a park today), Kennington would have been used for livestock grazing and would have most likely lacked trees, unlike its current layout. The extant map of Kennington Common also does not indicate any trees, and thus none were included in the Sketchup model. The lack of trees or buildings makes Kennington the most wide open of the three sites investigated, containing also the most raw area for fitting in a large crowd independent of acoustical factors.

Figure 30: Modeling Kennington Common in Sketchup

Mayfair

Background

On June 1, 1739, Whitefield reported that he

...preached in the evening, at a place called Mayfair, near Hyde Park Corner. The congregation, I believe, consisted of near eighty thousand people. It was by far the largest I ever preached to yet. In the time of my prayer there was a little noise, but they kept a deep silence during my whole discourse. A high and very commodious scaffold was erected for me to stand upon, and though I was weak in myself, yet God strengthened me to speak so loud, that most could hear, and so powerfully, that most, I believe, could feel. (Dallimore, 1970)

The region he described, called Mayfair (fig. 31), is now known for being one of the most expensive areas in London (partially because of its top position in the British version of the board game Monopoly*). While its high land values have led to extensive development there today, in Whitefield’s day it was still a wide-open area named for the traditional fair that had been held there in May since the sixteenth century in the fields outside St. James’s Hospital (Walford, 1878).
* http://www.propertywire.com/news/europe/uk-properties-monopoly-prices-201306277943.html, accessed 7/22/2014.

Modeling

Situated between Hyde Park on its west and Piccadilly to its south, Mayfair in Whitefield’s day had an overall area of about 23 acres (93,000 m²), but much of the area in its southwest corner, closest to Hyde Park, would have been obscured by the presence of Chesterfield House (fig. 32), a large manor surrounded by a high wall, as well as another smaller walled estate to the east. Both would have reduced the total area, by their subtractive presence as well as by their shadowing effect, obscuring sound paths from the main fair location to the northeast.

Figure 31: Inset of John Rocque’s 1746 Map of London showing Mayfair

Unlike the other sites, Whitefield preached at Mayfair only once during his annus mirabilis of 1739. While he preached to relatively large reported numbers at other sites around London and the rest of Britain, the numbers reported are small compared to those recorded for Moorfields and Kennington Common. But this number of 80,000 was by far the largest ever reported for Whitefield’s crowds, and though it was a single incident, it deserves investigation solely because of the audacity of its claim. Psychologically, it is possible that a similar-sized crowd in a setting unfamiliar to Whitefield and his followers would have seemed perceptually larger. However, Whitefield had also moved from the margins of the city (both socially and geographically) to the center, so it seems at least plausible that this may have actually been the largest crowd he attracted. The fact that a scaffold was specifically constructed for his visit to this site may also indicate a larger degree of planning and perhaps a larger crowd.

Figure 32: Unsigned wood print of Chesterfield House, 1760

The final Mayfair Sketchup model is shown in figure 33.
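As a quick check on the site areas quoted in this chapter (about 22 acres for the Moorfields and about 23 acres for Mayfair), the following Python sketch converts acres to square meters. The function name and the rounding to the nearest thousand are illustrative assumptions rather than part of the original analysis; the conversion factor is the standard international acre.

```python
# One international acre expressed in square meters (standard definition).
SQ_M_PER_ACRE = 4046.8564224


def acres_to_sq_m(acres: float) -> float:
    """Convert an area in acres to square meters."""
    return acres * SQ_M_PER_ACRE


# Areas quoted in the text; the dictionary itself is only for illustration.
quoted_acres = {"Moorfields": 22, "Mayfair": 23}

for site, acres in quoted_acres.items():
    area = acres_to_sq_m(acres)
    # Round to the nearest thousand square meters, as the text does.
    print(f"{site}: {acres} acres is about {round(area, -3):,.0f} m^2")
```

Both rounded values agree with the figures given in the text, which suggests the quoted areas are simple unit conversions of the acreage estimates.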
Because this was not a regular spot for Whitefield’s preaching, there is much less recorded evidence about where exactly he was positioned during the sermon. The mention of the scaffold constructed for him suggests that perhaps he was located toward the northeastern corner of the area, near the site of the historical fair. To the author’s knowledge, there is no historical data recording the location of a platform or other speaker’s position at the fair. Based on the shape of the area and the directivity of the voice, Whitefield was positioned within the model in front of the only building in the area, near the street that led to Berkeley Square (fig. 31). This position would have opened up the largest range of his voice to the crowd gathered based on the shape of the site’s bordering buildings. While there was a small area behind him which would have been occluded by this building, his voice would have been aided by a strong early reflection from the building behind him, similar to that from the court house doors in Philadelphia.

Figure 33: Sketchup Model of Mayfair

The building facades were modeled as wood with absorption properties identical to those used in the Moorfields, and the crowd was also modeled at the same density (table 8). The walls enclosing Chesterfield House and the other walled estate were modeled as 3 m tall and made of brick, with the absorption coefficients shown in table 9.

Table 9
Absorption coefficients for brick walls near Mayfair

Frequency (Hz):     125    250    500    1000   2000   4000
Brick Absorption:   0.14   0.28   0.45   0.90   0.45   0.65

These three models can be used for a more rigorous analysis of Whitefield’s preaching at each site. Separate simulations will allow a better understanding of the factors that apply to Whitefield’s preaching in general or only to specific sites. In addition, other variables may be altered to investigate their significance to Whitefield’s total intelligible range.
The next chapter describes the simulation process and the final estimates for Whitefield’s crowds.

CHAPTER VIII

SIMULATIONS OF WHITEFIELD’S SERMONS IN LONDON

Simulation Results: Base Conditions

To test the range of Whitefield’s voice, each of the three sites was simulated in an acoustic computer model. Each simulation was carried out under ‘base’ conditions: 11.5° Celsius, 50% humidity, with Whitefield’s orientation directly forward (0° elevation). After the base conditions had been evaluated, these three variables were altered systematically to predict any differential changes that these factors might have had on the previous simulations. In the Mayfair model, where Whitefield mentioned preaching from a scaffold, additional simulations also investigated the effect of added height on his maximum crowd size.

We may define the Minimally Intelligible Area (MIA) for a given model as the amount of area at that site for which the STI is greater than or equal to the defined minimum STI value. Each site’s MIA was simulated using three different source SPLs: 85 dBA, 90 dBA, and 95 dBA. Since Franklin’s method incorporated both Whitefield’s vocal level and Franklin’s hearing acuity into a single measurement, the simulated loudness for Whitefield is dependent on Franklin having normal hearing, defined as a lower bound of intelligibility at STI = 0.3. There is no record of Franklin having hearing loss, but if he had slightly worse than normal hearing (lower bound of STI at 0.4), Whitefield’s voice would have had to have been about 5 dB louder, or about 95 dBA. Conversely, if Franklin’s hearing was slightly better than normal (lower bound of STI at 0.2), Whitefield’s voice may have been 5 dB lower, or about 85 dBA. Since the model’s crowd is essentially populated with virtual Benjamin Franklins, this can be addressed in the simulations by simultaneously adjusting the source SPL and the minimum STI threshold.
This allows the simulation to take into account a wider range of factors while retaining the same data Franklin originally measured. The pairing of a noisier crowd and a louder Whitefield also addresses the Lombard Effect, which is the tendency of humans to subconsciously raise their voices in the presence of greater background noise (Pick, Siegel, Fox, Garber, & Kearney, 1989), since Whitefield would likely have achieved his greatest vocal levels when the crowd was noisier. The estimated crowd that could fit within such an area will depend finally on a density estimate, which will be addressed later on.

Sound level measurements were taken at each of the three sites during a visit in summer 2013. The Mayfair area, being close to Hyde Park Corner, had considerable noise almost continually from tourists and motorized traffic. The Moorfields area was quieter, but the site of Whitefield’s preaching is now completely developed and may not be an accurate indicator of ambient noise levels in Whitefield’s time. Kennington Common, however, remains in a similar condition to its original layout and in its southeastern end is perceptually free from traffic noise. During the periods of relative quiet (interrupted only by planes overhead leaving Heathrow airport) the ambient noise level there was measured to be as low as 50 dBA. It is possible that the city was quieter in 1739, or that Whitefield’s auditors were not as silent as he often described them to be, so simulations were carried out for background noise levels of 45, 50, and 55 dBA.

The simulations were performed using an acoustic cone-tracing (CT) algorithm in CATT-Acoustic v9.0 (B. Dalenback, 2011). For each simulation, site plots were generated showing the projected STI values over the included area. As with the Philadelphia simulations, various cone densities were used in initial tests, which showed no effect on the final calculation because of the free field nature of the environment.
Because of this, low cone densities (about 10,000 cones) were used in each simulation to reduce processing time. CATT exported a grid of 2 m × 2 m squares for each site with the projected STI value for each square. A customized MATLAB script was used to calculate the amount of area with STI above or equal to a given input value for each of the vocal SPL conditions.

Moorfields

The simulated MIA for the Moorfields under base conditions is shown in table 10. Recall that the minimum STI simulated increases as the vocal SPL increases, such that the signal-to-noise ratio is not the only determinant of the MIA. It will be seen that within the range of background noise levels considered, the MIA is quite sensitive to changes in noise level, decreasing by about 65% to 79%, depending on vocal level, with a 10 dB noise increase. This suggests that, similar to the accounts in his journals, Whitefield might have been able to reach far greater crowds when they remained relatively quiet. As his preaching became more of a spectacle he attracted larger numbers of people, but the number that could have heard him clearly may have decreased as a result.

Table 10
Moorfields simulated MIA (m²) for each vocal SPL and background noise level

                 Vocal SPL
Noise Level      85 dBA    90 dBA    95 dBA
45 dBA           38,508    39,412    40,372
50 dBA           22,508    25,124    29,084
55 dBA            8,164    10,304    14,036

In contrast, we see that the absolute SPL of Whitefield’s voice, once coupled to the hearing of our crowd of virtual Benjamin Franklins, does not have a large effect on the simulated MIA. A 10 dB increase in Whitefield’s vocal SPL from 85 to 95 dBA (corresponding to an increase in minimal STI from 0.2 to 0.4) increases MIA by only 5% at low crowd noise levels. As the background noise level increases, the MIA gains from a higher vocal SPL increase to about 30% at 50 dBA, and about 72% at 55 dBA.
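The area calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the MATLAB script used in the study: the function name, the use of `None` for cells without a simulated value, and the example STI values are all assumptions. Only the 2 m × 2 m cell size and the pairing of vocal SPL with minimum STI come from the text.

```python
from typing import Iterable, Optional

CELL_SIDE_M = 2.0  # side length of each exported grid square, per the text

# Paired conditions described in the text: vocal SPL (dBA) -> minimum STI.
MIN_STI_FOR_SPL = {85: 0.2, 90: 0.3, 95: 0.4}


def minimally_intelligible_area(
    sti_values: Iterable[Optional[float]],
    min_sti: float,
    cell_side_m: float = CELL_SIDE_M,
) -> float:
    """Return the area (m^2) of grid cells whose STI is >= min_sti.

    Cells given as None (assumed here to mean "no simulated value",
    for example outside the audience plane) are ignored.
    """
    cell_area = cell_side_m ** 2
    count = sum(1 for sti in sti_values if sti is not None and sti >= min_sti)
    return count * cell_area


# Hypothetical STI values for six grid cells (NOT simulation output).
example_grid = [0.62, 0.41, 0.30, 0.29, 0.18, None]

# For the 90 dBA condition the threshold is STI >= 0.3.
area_90 = minimally_intelligible_area(example_grid, MIN_STI_FOR_SPL[90])
print(f"MIA for the example grid at 90 dBA: {area_90} m^2")
```

Note that each vocal SPL condition would have its own exported grid in practice; the same example grid is reused here only to keep the sketch short.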
This suggests that at low noise levels the linked STI-vocal SPL system works nearly linearly, but as the background noise increases, there are higher gains associated with a louder voice. This may seem surprising as physical acoustics is essentially linear, but psychoacoustics is not, which accounts for the differences shown here: as the overall SPL increases (due to both Whitefield’s voice and the crowd’s background noise), the auditory system will admit more sound from frequency bands other than the 1-4 kHz region most important to speech intelligibility. Thus the overall STI value may decrease as these frequency bands mask the 1-4 kHz region, causing the MIA to decrease as well.

Figure 34 shows the STI maps for all nine combinations of source and background noise levels. STI values from 0-0.3 are usually classified as ‘bad’, 0.3-0.4 as ‘poor’, 0.4-0.5 as ‘fair’, 0.5-0.65 as ‘good’, 0.65-0.8 as ‘very good’, and 0.8-1.0 as ‘excellent’ (Hodgson, 2002). Observe that the STI shows a cardioid-like directivity pattern, similar to those seen for the voice at mid and high frequencies. This is a result of the STI’s weighting function, which emphasizes the octave frequency bands at 1, 2, and 4 kHz to model the non-linear loudness sensitivity of the human auditory system (Houtgast et al., 1980). Though low frequencies are the least attenuated by air absorption, they are also the least important to the STI’s value over the simulated audience plane.

For the extreme cases of 95 dBA source level and 45 dBA noise level, the MIA is projected to cover most of the upper and middle Moorfields except those portions occluded by tree lines. This suggests that in a wider area under these “perfect storm” conditions, a single voice could reach an even larger area.
However, as mentioned before, no experimental data has shown the existence of a voice that can sustain an Leq of 95 dBA, so these highest figures should be viewed with caution, as should the lowest noise level of 45 dBA, which is extremely quiet and would be unlikely to be sustained in a large crowd for an extended period of time. However, the more modest center condition of 90 dBA vocal level and 50 dBA background noise yields an impressive area of over 25,000 m². As the background noise level might fluctuate, the best estimate for all these simulations is probably between the center area of 25,000 m² and the more modest figure of about 10,000 m² for the 55 dBA noise level.

Based on these more moderate source and noise level assumptions, Whitefield’s voice is not projected to be minimally intelligible toward the edges of the Moorfields, which suggests that the area acts as a reasonably free field environment for the purposes of reflection tracing. The building reflections would only reinforce the STI when listeners were very close to them so that the reflections arrived soon after the direct sound (usually within 50 ms). Farther from the buildings (that is, closer to Whitefield) the reflections could conceivably degrade the STI, but by that point the sound would be much more diminished than at the edge of the crowd, where it was already small in comparison to the background noise. After the effects of high frequency air absorption and normal intensity drop-off, these reflections would not be strong enough to significantly affect STI closer to the source.
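To give a sense of scale for the 50 ms window mentioned above, the following Python sketch converts it into a listener-to-wall distance. It rests on two assumptions that are not from the original text: the standard dry-air approximation for the speed of sound, and a simplified geometry in which the source, listener, and wall lie on one line perpendicular to the wall, so that the reflected path is longer than the direct path by twice the listener-to-wall distance.

```python
import math


def speed_of_sound(temp_c: float) -> float:
    """Approximate speed of sound in dry air (m/s) at temp_c degrees Celsius."""
    return 331.3 * math.sqrt(1.0 + temp_c / 273.15)


def max_wall_distance(window_s: float, temp_c: float) -> float:
    """Largest listener-to-wall distance (m) for which the wall reflection
    arrives within window_s seconds of the direct sound.

    Assumes normal incidence: extra path length = 2 * distance to the wall.
    """
    return speed_of_sound(temp_c) * window_s / 2.0


BASE_TEMP_C = 11.5      # base-condition temperature used in the simulations
EARLY_WINDOW_S = 0.050  # 50 ms window mentioned in the text

distance = max_wall_distance(EARLY_WINDOW_S, BASE_TEMP_C)
print(f"Reflection arrives within 50 ms if the listener is within {distance:.1f} m of the wall")
```

Under these assumptions only listeners within roughly 8 to 9 m of a facade would receive its reflection inside the 50 ms window, which is consistent with the statement that reinforcement occurs only very close to the buildings.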
Figure 34: Simulated STI at Moorfields for different background noise conditions. Panels (a)-(c): 45 dBA noise; (d)-(f): 50 dBA noise; (g)-(i): 55 dBA noise; each with 85, 90, and 95 dBA source levels in turn.

This suggests that the exact absorption coefficients for the buildings surrounding the Moorfields are not a significant factor as long as they are in the range of fairly absorbent materials (wood or brick) which are assumed to have been in use at that time. Much more reflective surfaces (e.g. a Palladian classical marble facade) may have been able to return enough sound intensity to slightly affect the MIA, but there is no period evidence of such a structure to the author’s knowledge.

Kennington Common

The simulated base condition MIA is shown for Kennington Common in table 11. Many of the same patterns observed for the Moorfields can be seen in the projections for Kennington: increases in background noise quickly shrink the MIA for a given vocal level, and increases in vocal level have a somewhat weaker effect that is diminished further at low noise levels.

Table 11
Kennington simulated MIA (m²) for each vocal SPL and background noise level

                 Vocal SPL
Noise Level      85 dBA    90 dBA    95 dBA
45 dBA           57,212    63,408    67,872
50 dBA           22,476    27,292    36,512
55 dBA            7,572     9,612    13,508

The MIA for Kennington Common under each vocal and noise condition is shown in figure 35. It can be seen that the upper bound on the MIA under the most generous acoustic conditions is simply the boundary of the common itself. If the model were to extend to the roads and adjacent land plots shown in figure 29, these simulations would likely give an even larger estimate for the MIA.
However, as mentioned previously, these conditions are included more out of theoretical curiosity than a realistic expectation that they represented the full acoustic system during Whitefield’s sermons.

Figure 35: Simulated STI at Kennington for different background noise conditions. Panels (a)-(c): 45 dBA noise; (d)-(f): 50 dBA noise; (g)-(i): 55 dBA noise; each with 85, 90, and 95 dBA source levels in turn.

The most reasonable range of MIA values (defined again as the vocal level of 90 dBA, from 50 to 55 dBA crowd noise) is similar to that for Moorfields, as we would expect since in those ranges both can be considered to be relatively free field. The slight differences between the two may be traced to the greater topographical change in Kennington Common and the tree lines at Moorfields which block some direct sound paths. The MIA for the least generous condition (85 dBA vocal level and 55 dBA crowd noise) is 7,572 m², slightly smaller than but similar to the Moorfields figure of 8,164 m².

Mayfair

The projected MIA for each vocal and crowd noise level is shown in table 12 for Mayfair under base conditions. The same general trends may be observed in the simulations for Mayfair as for the other two sites, indicating that the space functions as a relatively free field under moderate acoustic conditions. Only under the lowest noise condition (45 dBA) are there major differences between sites, because at that noise level Whitefield’s voice is projected to fill most of whatever site it is placed in. Moorfields and Mayfair, both smaller and more closed in, thus have an upper bound to their MIA based only on area and independent of acoustical factors. While Mayfair’s upper bound is significantly smaller than Kennington’s, it is still greater than the Moorfields’ largest MIA value by over 50%.
Figure 36 shows the simulated STI for Mayfair under each pair of noise and vocal levels. Based on the potential audience area at each site, it is clear that Kennington, if filled, could have hosted a larger crowd than that at Mayfair. Yet specific details about the extent of each crowd are usually lacking in the historical accounts, and it may be that the huge crowd reported at Mayfair filled a much greater portion of that site’s area on a single occasion. However, Whitefield usually preached repeatedly at the same site, which allowed larger crowds to form with each visit; since he preached at Mayfair only once, the Mayfair estimate may be overly generous.

Table 12
Mayfair simulated MIA (m²) for each vocal SPL and background noise level

                 Vocal SPL
Noise Level      85 dBA    90 dBA    95 dBA
45 dBA           53,448    58,024    61,660
50 dBA           22,336    26,964    35,240
55 dBA            7,588     9,812    13,916

Other Factors

Having investigated the general relationship between noise, vocal level, and MIA for all three sites under base conditions, it is also useful to expand the variables investigated to estimate their significance to the final MIA simulation. First environmental factors will be investigated, followed by geometric factors. Since under moderate acoustic conditions the sites were all found to behave similarly, each variable will be addressed for a single site, with the differential changes to the base condition model presented according to a change in the variable being addressed. Each of the sites addressed in this section was simulated using a vocal level of 90 dBA and a crowd noise level of 50 dBA, so the differential effects may differ under other vocal and noise level conditions.
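The percentage comparisons drawn from the base-condition tables above can be reproduced with a short Python sketch. The MIA values are transcribed from tables 10 through 12; the dictionary layout and function names are illustrative assumptions.

```python
# Simulated MIA (m^2) from tables 10-12: site -> noise (dBA) -> vocal SPL (dBA).
MIA = {
    "Moorfields": {
        45: {85: 38508, 90: 39412, 95: 40372},
        50: {85: 22508, 90: 25124, 95: 29084},
        55: {85: 8164, 90: 10304, 95: 14036},
    },
    "Kennington": {
        45: {85: 57212, 90: 63408, 95: 67872},
        50: {85: 22476, 90: 27292, 95: 36512},
        55: {85: 7572, 90: 9612, 95: 13508},
    },
    "Mayfair": {
        45: {85: 53448, 90: 58024, 95: 61660},
        50: {85: 22336, 90: 26964, 95: 35240},
        55: {85: 7588, 90: 9812, 95: 13916},
    },
}


def gain_from_louder_voice(site: str, noise_dba: int) -> float:
    """Percent MIA increase when vocal SPL rises from 85 to 95 dBA."""
    row = MIA[site][noise_dba]
    return 100.0 * (row[95] - row[85]) / row[85]


def loss_from_noisier_crowd(site: str, vocal_dba: int) -> float:
    """Percent MIA decrease when crowd noise rises from 45 to 55 dBA."""
    quiet = MIA[site][45][vocal_dba]
    noisy = MIA[site][55][vocal_dba]
    return 100.0 * (quiet - noisy) / quiet


def largest_mia(site: str) -> int:
    """Largest simulated MIA at a site across all nine conditions."""
    return max(value for row in MIA[site].values() for value in row.values())


for site in MIA:
    gains = [round(gain_from_louder_voice(site, n), 1) for n in (45, 50, 55)]
    losses = [round(loss_from_noisier_crowd(site, v), 1) for v in (85, 90, 95)]
    print(site, "gain % at 45/50/55 dBA noise:", gains,
          "| loss % at 85/90/95 dBA voice:", losses)

ratio = largest_mia("Mayfair") / largest_mia("Moorfields")
print(f"Mayfair / Moorfields largest MIA ratio: {ratio:.2f}")
```

Laying the three tables out in one structure makes it easy to see that the sensitivity to crowd noise is of the same order at every site, while the benefit of a louder voice grows as the crowd becomes noisier.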
Figure 36: Simulated STI at Mayfair for different background noise conditions. Panels (a)-(c): 45 dBA noise; (d)-(f): 50 dBA noise; (g)-(i): 55 dBA noise; each with 85, 90, and 95 dBA source levels in turn.

Environmental Factors

Temperature

Historical weather data before the nineteenth century is rare, as was the case in the model of Philadelphia’s Market Street. However, the British HadCET dataset contains average monthly temperatures for London dating back to 1698 (Manley, 1953). Unfortunately its day-to-day mean temperatures do not begin until 1772, but even as estimates these monthly figures provide a better starting point than is otherwise available to us for a project of this nature since so many empirical data points have been lost to the passage of time. The dataset’s value of about 11.5° Celsius was used for the base conditions above. The differential effect of temperature was then evaluated for the Moorfields model by running separate simulations every 2 degrees Celsius for values higher and lower than the base model’s temperature. Table 13 shows the relative effects on the simulated MIA for changes in the model’s temperature.

Table 13
Simulated changes in MIA resulting from changes in temperature in Moorfields

∆ °C     ∆ MIA (m²)
-4       -4620
-2       -244
 0        0
+2       +216
+4       +440
+6       +624

Since the speed of sound in air (or any medium) is dependent on the ambient energy of the particles that constitute that medium (i.e. its temperature), the theoretical acoustic impedance of a sound wave is also dependent on temperature, though less so than it is on humidity (ANSI, 2009; Harris, 1966). The differential analysis shows that over a wide range of temperatures from 9.5° C to 17.5° C the overall simulated MIA is fairly constant (within a range of 1000 m²).
However, for a very low temperature of 7.5° C the simulated MIA does show significant change, with a projected drop of 4620 m². This result seems striking but requires two caveats: first, the average temperature figure for May 1739 of about 11.5° C is already lower than would be expected for London in May,* and if the HadCET dataset is not reliable, it is possible a warmer value may be more appropriate. For instance, the average temperature for June 1739 in the dataset is about 4° warmer than that for May, and Whitefield was speaking on the cusp of the two months. Secondly, in real life scenarios temperature and humidity are correlated over certain intervals, but here they are being treated as independent variables. In an outdoor acoustic environment, as the temperature dropped closer to freezing we would normally expect the humidity to approach zero, which would decrease atmospheric absorption from water vapor, increase the audible range of a sound source, and thus increase the MIA for the site. Since this is only projected at extremely low temperatures, it seems safe to treat the MIA simulation as a reasonably good estimate based on the temperature data available to us.

* http://www.metoffice.gov.uk/public/weather/climate/city-of-london-greater-london, accessed 7/22/2014.

Humidity

Air humidity is the single most important environmental factor in determining the acoustic absorption of sound by air. The water vapor particles in air serve as an absorbent obstacle for high frequencies whose wavelengths are very small, and over large enough distances this causes significant high frequency attenuation independent of normal free field intensity loss. At very low humidities, the air is so dry that air absorption loss is very small and high frequencies travel much farther. The high frequency loss per unit distance is highest around 10-20% humidity, as shown in figure 37.
As the humidity increases beyond this point, the additional water particles in the air transfer sound more efficiently between one another and thus high frequency attenuation decreases more or less monotonically as humidity increases. As the high frequency bands are very important to the calculation of STI, changes in humidity will have a significant effect on the projected MIA.

Figure 37: Atmospheric absorption at different humidity levels, from (Harris, 1966)

The nonlinear relationship between humidity and air absorption makes definite high frequency loss calculations difficult when the humidity is not known, since the same loss factor can often correspond to two different humidities. However, the driest humidity levels are expected during winter or within arid climates, neither of which is generally associated with late spring in southern England. Since London’s current average humidities for May range from 50% to 60%,* it seems safe to consider only humidity levels greater than 20%, over which air absorption should decrease consistently. Table 14 shows the projected differential changes in MIA based on changes in humidity in steps of 10 percentage points relative to 50% humidity.

* http://www.bbc.com/weather/2643743, accessed 7/22/2014.

Table 14
Simulated changes in MIA for Moorfields resulting from changes relative to 50% humidity

∆ % Humidity    ∆ MIA (m²)
-30%            -3108
-20%            -1464
-10%            -628
+0%             0
+10%            +528
+20%            +884
+30%            +1192

As expected, the simulated MIA is much lower around 20% humidity and increases monotonically with higher humidity. The rate of increase over this interval is greater at the lower humidity values (increasing MIA by about 1600 m² from 20% to 30%) and decreases for each subsequent step of 10 percentage points. While no mention of cold or dry weather is made by Whitefield or his followers, they did occasionally mention rain at their gatherings during this time period (Whitefield, 1756), which suggests that the spring was not unusually dry.
However, beyond ruling out the extremes of very low humidity or 100% humidity (when we know it was not raining), there still remains a wide range of humidities possible for the dates being simulated for which we have little to no historical evidence. These simulations can help quantify the uncertainty in these ranges, but given the empirical data available the best precision we can attain at this point is to say that the base condition simulations should probably be given a margin of error of ±1500 m² based on the environmental conditions on the day of the specific sermons preached. But based on London’s current average humidity, our starting guess of 50% humidity still appears to be a good estimate for a colder-than-average year such as 1739, given the little data available.

Geometric Factors

Since the geometric arrangements used in the baseline acoustic predictions are only estimates based on available historical data, it is useful also to consider the significance of slight changes in geometry on the final MIA simulations. The base conditions assumed a directional orientation of 0° elevation (that is, dead-center) for Whitefield’s mouth and a height of 1 m above the crowd for Moorfields and Kennington and 5 m above the crowd for Mayfair. How might changes in Whitefield’s direction or height above the crowd have affected the range of his voice?

Directional Orientation

Since the goal of these simulations is to estimate the average acoustic conditions during one of Whitefield’s sermons, directly forward seems the most likely average position of his head over time. However, the acoustic directivity of the voice is sensitive to slight changes in angle, and it may be useful to investigate their significance in this context. The effect of slight changes in source elevation angle was simulated for the Moorfields, with the change in MIA shown in table 15.
Table 15
Simulated changes in MIA resulting from changes in source elevation angle in Moorfields

∆ Elevation Angle    ∆ MIA (m²)
-1°                  -76
0°                   +0
+1°                  -128
+2.5°                -320

In general, these simulations suggest that slight changes in elevation angle are much less significant to the final MIA estimate than other historical unknowns, such as weather data. A flat source angle yields the greatest MIA, but slight changes in average angle of ±1° decrease the intelligible area only slightly. At a more elevated angle of 2.5°, the MIA decreases more because of the directivity of the voice, which becomes more attenuated at positions lower than the mouth. This effect is especially pronounced in the frequency range from 1-4 kHz, which is most critical for speech intelligibility. Figure 38 shows the decrease in level off axis with increasing frequency in the directivity pattern of the male voice dataset used in these simulations (as might be expected, the lower frequency bands are more omnidirectional).

Figure 38: Male vocal directivity pattern, in octave bands, used for Whitefield’s voice. Panels: (a) 1 kHz; (b) 2 kHz; (c) 4 kHz.

Source Height

For the majority of his sermons Whitefield preached at informal locations, from such elevation as could be arranged - a hillside in Blackheath, the wall in Moorfields, a table, a tree stump, and even a tombstone are all mentioned as substitute pulpits that he used at some point (Dallimore, 1970; Gillies, 1772; Whitefield, 1756; Lambert, 1994). For the Moorfields and Kennington, it seems likely that he had a similarly modest elevation above his crowd since no other apparatus is ever mentioned in the accounts of his sermons. But in the case of his sermon at Mayfair he specifically mentioned a tall scaffold constructed for him to preach from.
This was modeled as 5 m tall under the base condition simulation, but since a lower height or a different angle may have had a significant effect on his vocal range, each height from 1 to 5 m (in 1 m increments) and each source elevation angle were simulated to estimate the differential effects with respect to the base conditions. Note that these simulations did not include any model of the scaffold itself, which could be more or less obstructing or reflecting depending on its construction. Thus these examples investigate only the effects of the source geometry on the final intelligible area. Figure 39 shows the simulated change in MIA for each combination of height and angle.

Figure 39: Change in MIA based on source height and angle at Mayfair

While a first guess might have supposed that increasing height would have increased the area that could be reached by Whitefield’s voice, these simulations show the opposite: increasing height yields a lower MIA for any source orientation angle. This is partly because the sites in question do not contain enough topographical diversity for new area to be reached with greater source height, and partly because more height means more air absorption and intensity drop-off as the sound must travel a greater distance to the farthest listeners. It must be said also that in the ideal system simulated here, 1 m of height was still enough to ensure a perfect line of sight from Whitefield to each of the identical Benjamin Franklins in the virtual audience. In a real audience with varying heights, some shorter individuals might benefit from a slightly greater source height if it were enough to avoid sound obstruction from other audience members.

Within a single height, raised source elevation angles show a decrease in MIA at all elevations. As we would expect, however, the optimal source angle changes as the scaffold gets higher.
At 1 m above the audience (similar to the cases at Moorfields and Kennington) a flat (0°) angle gives the greatest MIA value. As the source height increases, the optimal angle decreases to -1° at 2 m and 3 m, -2° at 4 m, and roughly the same for -2° and -3° at 5 m. This is because at higher elevations more of the lower-head attenuation seen in figure 38 becomes salient for audience members if the head angle is kept flat. Thus it seems that even if the height of the base condition example was correct, the estimated MIA should be increased somewhat to account for a likely downward source angle to better reach his audience. The extreme lowest height of 1 m seems unlikely since Whitefield thought it important to mention that he was much higher than normal. Without more quantification, we cannot say more specifically which estimate is the best. But it seems safe to rule out the upward source angles to begin with. Furthermore, we can notice that at the downward angles all the scaffold heights converge somewhat into a smaller margin of uncertainty. If we guess that Whitefield was at a height from 2-5 m and that he was speaking downward at an angle of at least -2°, then we should increase the Mayfair MIA estimate by about 400-700 m².

Crowd Density

Having investigated thoroughly the MIA of Whitefield’s voice at each site under a variety of noise, environmental, and geometrical factors, we are now left with the question of how many people could actually fit into the area that Whitefield’s voice could fill. As mentioned before, the simulations were carried out with the maximum crowd density of 0.5 m² per person, though it was argued that greater density would not greatly increase absorption or affect the final STI calculation significantly. Though Franklin was a bit vague on some of the details of his experiment, he did explicitly state that he used a density estimate of 2 ft² per person, or a little less than 0.2 m² (Franklin, 1793).
This is equivalent to a later figure he used during another calculation published in Poor Richard Improved years later (Franklin, 1749). Franklin seems to have based these figures on the maximum number of people that could possibly have fit into a given area. It might be thought, therefore, that to complete Franklin’s experiment all that is necessary is that we update the MIA estimate using modern technology, substitute in Franklin’s original estimate, and be done with it. However, the science of crowd estimation has also progressed since Franklin’s time, and it is worth acknowledging the advances in that field as well.

In 1967 a newspaper reporter named Herbert Jacobs published one of the first examinations of crowd estimation, motivated by the overenthusiastic “wild guesses” he saw published by his colleagues (Jacobs, 1967). Jacobs meticulously counted heads in photographs of actual crowd assemblies and found that the estimates given by event organizers and reporters were often much higher than the actual figure he was able to count. Given that both organizers and the media may have an implicit bias toward larger estimates, Jacobs suggested that an Area × Density calculation might lead to a better overall estimate. Jacobs’s method was later updated by Seidler et al. (Seidler, Meyer, & Gillivray, 1976) and Swank (Swank & Clapp, 1999) to include better accounting of variable crowd density and sampling methods. A more recent update on the state of the art in this field is given by (R. Watson & Yip, 2011). While the exact methodologies of these studies cannot be adopted due to the lack of photographic evidence, they do provide some insight into a reasonable density estimate over an entire crowd.

Under current crowd estimation techniques, a density of 4 persons per m² (and, by extension, also Franklin’s estimate of about 5 persons per m²) is classified as “mosh pit conditions” (R. Watson & Yip, 2011).
These, as in Franklin’s account, describe the most people that can be fit into a given space. However, they only occur over very small areas and in social environments (such as their namesake) in which being in direct contact with the people around oneself is acceptable. Even supposing that close to Whitefield the excitement of his celebrity led to conditions close to this, eighteenth-century notions of propriety suggest that even the densest part of the crowd might leave more space between audience members than is found in a modern mosh pit. In addition, since there is no more detailed evidence by which to assign variable density levels, we are forced to do our best instead to find an average density level for the entire crowd.

Another important consideration towards estimating an appropriate average crowd density for 1739 London is the average size of the people themselves. While estimating historical average population size (like average vocal sound pressure) is tricky due to lack of certain types of written evidence (Wachter & Trussell, 1982), it seems likely that the average male height in early eighteenth-century England was about 165 cm (Komlos & Cinnirella, 2005), about 10 cm shorter than the average British male today.* The smaller average size might suggest a concomitant greater possible crowd density. However, in contrast, the English hoop skirt was in vogue in 1739 (Chrisman, 1996), and since Whitefield and the Methodist revivals in general tended to disproportionately attract women (Dallimore, 1970) this would have made a significant difference in the average area per person in Whitefield’s crowds. Even in the most dense sections around Whitefield, hoop skirts would have contributed toward an upper bound for density less than the figure Franklin used (which, it should be recalled, he also applied to a model of soldiers standing in formation later on (Franklin, 1749)).

* http://www.theguardian.com/uk/2002/aug/28/science.research, accessed 7/22/2014.
Certainly an average density estimate of 4-5 persons per m² seems unreasonable over areas as large as those investigated here. It seems likely that while greater density pockets probably existed around Whitefield, the entire crowd would likely have spread out to a comfortable interpersonal distance farther out. This overall density is less dependent on individuals’ sizes and more on social conventions and notions of propriety. Therefore it seems reasonable to look to modern crowd estimation literature for a best estimate of average crowd density. Watson defines a lower average density of “strong” conditions as about 2 persons per m² (0.5 m² per person), which is in general the highest natural density achieved by a large crowd over a significant amount of area (R. Watson & Yip, 2011). This seems more in keeping with the subjective descriptions of Whitefield’s crowds* and thus will be adopted for the maximum crowd estimates for each of Whitefield’s sites. Since Franklin’s density factor is much higher, for any given MIA value the estimates adopted here may be multiplied by a constant value of

\[ \frac{0.5\ \mathrm{m}^2}{2\ \mathrm{ft}^2} \approx 2.7 \tag{20} \]

to obtain the crowd values using Franklin’s original density factor.

* In a description in The Gentleman’s Magazine that year, the ‘computed’ value of 20,000 people over 3 acres seems to assume a density value of about 1.6 persons per m², just slightly lower than this value (“The Gentleman’s Magazine”, 1739).

Final Crowd Estimates

The density estimate of 0.5 m² per person has the added benefit that it only requires multiplying each MIA value by 2 to obtain the approximate crowd that could fit in such an area. For crowd estimates under specific conditions, the MIA figures in tables 10, 11, and 12 may be used in addition to the correction factors suggested previously. Rather than repeating all the MIA data times 2, here we will simply give the high and low estimates for how many people Whitefield could have reached with his voice at the London locations.
These will focus on the vocal level of 90 dBA and background noise levels of 50 and 55 dBA, with the understanding that other combinations of source and noise levels are also possible. These two noise levels will be classified as the ‘low’ and ‘high’ noise conditions, respectively. Though higher or lower levels cannot be ruled out, these two levels are the best guesses we have based on current information as to the levels of Whitefield’s crowds when they were being relatively quiet. Since there is such a large variation over this 5 dB interval, this range will suffice for answering most of our questions about the acoustic limits for crowd size. All values will be rounded to the nearest hundred for simplicity since this study makes no claim to greater precision (and perhaps no more precise than to the nearest thousand).

The base condition estimates for Moorfields yielded an MIA of about 25,100 m² at 50 dBA noise and about 10,300 m² at 55 dBA. For Kennington, the low noise condition MIA was about 27,300 m² and the high condition about 9,600 m². At Mayfair, these were about 27,000 m² and 9,800 m², respectively. Due to environmental conditions these figures could be revised by about ±1500 m², with a larger decrease if the humidity was close to 20%. Source angles in Moorfields and Kennington indicated that the flat angle in the base conditions was the optimal orientation for increasing MIA. At Mayfair it was shown that the base condition estimate should be increased by +400-700 m². Since the humidity is not known, neither the high nor low condition will be assumed, but the 50% humidity level will be reported here, with the understanding that significant variance is possible due to atmospheric conditions. The increase factor for Mayfair, however, will be incorporated as it seems to highlight an actual deficiency in the base condition calculation.
The maximum Mayfair adjustment, +700 m², is added to the Mayfair MIA at 50 dBA while the lower value of +400 m² is added at 55 dBA, since the differential circumference of the smaller area at 55 dBA would also yield a lower increase in area. These lead to the final estimates of the upper limits of Whitefield’s MIA and crowd size, shown in table 16.

Table 16
Maximum simulated MIA and crowd size for each site at 90 dBA vocal level

Noise Level    Moorfields    Kennington    Mayfair
MIA (m²)
50 dBA         25,100        27,300        27,700
55 dBA         10,300        9,600         10,200
Crowd Size
50 dBA         50,200        54,600        55,400
55 dBA         20,600        19,200        20,400

Even under these restrictions there is a wide variance allowable based mainly on the noise level in the crowd. Noise can account for a factor of about 2.5 in the final crowd limit (and adopting Franklin’s density estimate adds another similar factor, for a variation of about 2.5² = 6.25, or 625%). We might be tempted at this point to throw up our hands and admit that the historians were right to consider these questions unknowable. However, not all is lost, and in fact much useful knowledge can be extracted from the simulations. First, as mentioned before, Franklin’s density estimate does not seem reasonable as an average value to describe the entire crowd at any of the sites. Secondly, noise is not unknowable day to day like humidity is - Whitefield’s journals and other accounts provide subjective descriptions of the crowd noise on specific dates. So we have reason to believe that on some days the crowd noise approached the lower limit of 50 dBA. Similarly, we have evidence to suggest that Whitefield’s voice could reach 90 dBA on his best days (and that he was reasonably consistent while healthy). Certainly there was some variation between these quantities, but between the two of them we can outline a general interval of 5 dB in the combined signal/noise ratio over which these high and low values probably appeared from day to day.
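The arithmetic behind table 16 and the density comparisons above can be sketched in a few lines of Python. This is an illustrative sketch only, not code used in the study: every name in it (for example `crowd_size` and `franklin_factor`) is hypothetical, the MIA inputs are the rounded values reported in table 16, and the unit conversions assume the international foot and acre.

```python
# Illustrative sketch only; all names are hypothetical and not from the study.
SQ_FT_IN_M2 = 0.3048 ** 2      # one square foot in m^2 (international foot)
ACRE_IN_M2 = 4046.8564224      # one international acre in m^2

AREA_PER_PERSON_M2 = 0.5                       # "strong" density adopted in the text
FRANKLIN_AREA_PER_PERSON_M2 = 2 * SQ_FT_IN_M2  # Franklin's 2 ft^2 per person

# Rounded MIA values (m^2) from table 16, keyed by background noise level in dBA.
MIA_M2 = {
    50: {"Moorfields": 25_100, "Kennington": 27_300, "Mayfair": 27_700},
    55: {"Moorfields": 10_300, "Kennington": 9_600, "Mayfair": 10_200},
}


def crowd_size(mia_m2, area_per_person_m2=AREA_PER_PERSON_M2):
    """Area x density estimate: people who fit in an intelligible area."""
    return round(mia_m2 / area_per_person_m2)


# Crowd sizes at the adopted density (the lower half of table 16).
crowds = {
    noise: {site: crowd_size(mia) for site, mia in sites.items()}
    for noise, sites in MIA_M2.items()
}

# Multiplier that converts these crowds to Franklin's density (equation 20).
franklin_factor = AREA_PER_PERSON_M2 / FRANKLIN_AREA_PER_PERSON_M2

# Density implied by the Gentleman's Magazine figure of 20,000 people on 3 acres.
magazine_density = 20_000 / (3 * ACRE_IN_M2)

if __name__ == "__main__":
    for noise, sites in crowds.items():
        print(noise, "dBA:", sites)
    print("Franklin factor:", round(franklin_factor, 2))
    print("Magazine density (persons per m^2):", round(magazine_density, 2))
```

Keeping the area per person as a parameter makes it easy to repeat the calculation for any other density assumption without touching the MIA values.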
Given this outlook, we can now begin to evaluate the crowd sizes reported from Whitefield’s day. Assuming William Seward’s figure of 60,000 people can be attributed to Kennington rather than Moorfields, the greatest reported crowd sizes for each site are shown in table 17.

Table 17
Maximum reported crowd size for each site

Moorfields    Kennington    Mayfair
50,000        60,000        80,000

It will be seen that even under the most generous acoustic conditions there is no indication that Whitefield could have reasonably reached a crowd of 80,000 people at Mayfair (though to be fair he himself doubted that the entire crowd could hear him that day (Whitefield, 1756)). If we add an additional few thousand for favorable environmental conditions or perhaps a temperature inversion carrying his voice farther than usual, we could imagine that under very ideal circumstances Whitefield’s voice could have reached nearly 60,000 people. But such effects are quite speculative, and not verifiable based on the specific data available. However, on his best days, it does seem possible that Whitefield could have been heard intelligibly by a crowd of 50,000 people.

Interestingly, Franklin’s original calculation of Whitefield’s range reached an MIA estimate similar to those given here. His radius of about 121 m yields a final estimate of

\[ \mathrm{MIA} = \frac{1}{2}\pi r^2 \approx 23{,}000\ \mathrm{m}^2 \tag{21} \]

This is only slightly less than the maximum values shown here. This indicates that despite his historical and technological limits, Franklin’s base experiment was still a good first-order estimate. His semicircular radiation pattern would have included extra area to the sides and excluded other area behind, still giving a good answer all things considered. Franklin’s overly generous density factor presented problems for the final calculation, but his method for obtaining the MIA for Whitefield still seems valid. It is doubtful that any of us could have done better had we been in a similar situation.
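Franklin’s semicircular estimate can be checked numerically. The Python sketch below is purely illustrative and was not part of the original analysis: the names `semicircle_area`, `franklin_crowd`, and `adopted_crowd` are hypothetical, the radius is the approximate 121 m quoted above, and the conversion assumes the international foot.

```python
import math

# Illustrative sketch only; all names are hypothetical and not from the study.
FRANKLIN_RADIUS_M = 121.0                        # approximate radius quoted in the text
FRANKLIN_AREA_PER_PERSON_M2 = 2 * 0.3048 ** 2    # Franklin's 2 ft^2 per person, in m^2
ADOPTED_AREA_PER_PERSON_M2 = 0.5                 # density adopted in this study


def semicircle_area(radius_m):
    """Area of a half disc, Franklin's assumed radiation pattern."""
    return 0.5 * math.pi * radius_m ** 2


franklin_mia = semicircle_area(FRANKLIN_RADIUS_M)

# The same area filled at Franklin's density and at the density adopted here.
franklin_crowd = franklin_mia / FRANKLIN_AREA_PER_PERSON_M2
adopted_crowd = franklin_mia / ADOPTED_AREA_PER_PERSON_M2

if __name__ == "__main__":
    print("Semicircular MIA (m^2):", round(franklin_mia))
    print("Crowd at 2 ft^2 per person:", round(franklin_crowd))
    print("Crowd at 0.5 m^2 per person:", round(adopted_crowd))
```

Under these assumptions the area rounds to about 23,000 m², and Franklin’s density gives a crowd well above 100,000, whereas the 0.5 m² per person figure gives a crowd a little under the 50 dBA estimates in table 16.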
Recall that due to his high density factor, Franklin calculated a crowd size greater than 100,000 but then only reported 30,000. This was perhaps a combination of his New England modesty and the fact that Whitefield’s crowds of 30,000 were the instigator for Franklin’s experiment in the first place. Like Franklin, we are interested not only in the peak of Whitefield’s popularity, but in the large gatherings he continued to attract over many years across Britain and America. The largest of these were usually estimated at 20,000-30,000 people. We can see that even under our least ideal acoustic conditions here Whitefield still could have been heard by 20,000 people, and with slight variations in vocal level, crowd noise, and crowd density, he could probably have spoken clearly to 30,000 people on most days. While a single crowd of 60,000 is more impressive, two crowds of 30,000 accomplish roughly the same effect from the perspective of Whitefield’s itinerant ministry. When it is considered in the context of the hundreds of such crowds he attracted over his lifetime, Whitefield probably spoke directly to more individuals than any orator in history.

CHAPTER IX

CONCLUSION

Findings

This work has investigated the range of George Whitefield’s voice and the accuracy of Benjamin Franklin’s auditory experiment to find Whitefield’s maximum audience size. The investigation has required historical, archaeological, and meteorological research as well as physics-based reasoning and numerical acoustic simulations. The evidence discussed here makes a strong case for the trustworthiness of the acoustic models constructed during this research. These models suggest that Whitefield, along with other trained vocalists, could produce average vocal SPL values of about 90 dBA at a distance of 1 m. Based on Whitefield’s vocal level, it has been simulated that Whitefield could have reached a crowd of up to 50,000 people under ideal acoustic conditions.
Even assuming higher noise levels or lower crowd density, the majority of Whitefield’s large crowds of 20,000-30,000 seem acoustically reasonable based on the data provided by Franklin’s experiment. Since Whitefield’s voice is projected to be as loud as any measured voice today, the crowd sizes projected here may also be good maximum values for any human gathering in the pre-amplified era.

Franklin’s MIA estimation is slightly lower but still very close to those generated by the computer models, indicating that his semicircular assumption still provides a good first-order approximation for this quantity without further information about source directivity or environmental contributions. However, Franklin’s density value is probably overly optimistic by at least a factor of 2. Thus this work provides a better lens for understanding Franklin’s early scientific approach before his better-known work in electromagnetism.

Implications

Since the publication of the C.P. Snow essay The Two Cultures (Snow, 1959) it has been a common scene for humanists to be nervous about scientists’ claims to represent the future of human knowledge. It is possible that much of the resistance to Digital Humanities research stems from an unwillingness to cede intellectual ground to overly confident scientists wielding equations and computers. However, properly considered, this should not be an area of conflict because science and the humanities have not only different tools, but also different goals.* Science, while ill-equipped to handle questions of meaning, value, or purpose, is quite well suited to counting, which is the basis of this project and indeed most of physics at a fundamental level. Humanities disciplines like history require empirical facts to interpret, and science provides the best tools for providing these basic facts for further analysis. In a similar way, it is not the author’s intention to “run the table” on epistemological authority on a historical issue.
Rather, a historically significant numerical question has been examined from a scientific viewpoint to provide a quantitative answer (or at least to quantify the uncertainty remaining in the answer). It is hoped that trained historians will further apply and interpret the findings from this research in a broader historical context.

* Indeed, Snow himself was concerned that without sufficient scientific understanding, humanists and others would be overly deferential to scientists in positions of authority (Snow, 1960).

Future Work

This work has examined the maximum intelligible range of the human voice through the lens of George Whitefield, who has been shown to be an extreme outlier in maximum vocal level based on Benjamin Franklin’s recorded data. The framework for determining unamplified vocal range may now be applied more generally to cases of orators, trained and untrained, throughout history. Since these other speakers have no such detailed description of their intelligible ranges, more guesswork will have to be used to determine their approximate speaking SPL: for instance, Alexander the Great would likely have been trained in oratory as part of his education, and may be assumed to have spoken to his armies at a greater level than Moses at Sinai, who was “slow of speech and of tongue.”* Based on the projected maximum level for Whitefield’s voice and that measured for trained vocalists today, the maximum pressure generated by the human voice does not seem to have changed greatly over the past 300 years despite changes in amplification technology during that time. Thus it seems fair to assume that the greatest orators like Cicero or Demosthenes may have been able to sustain levels near that of Whitefield, while less trained speakers may have had levels nearer the IEC standard of loud speech.
Aside from the question of absolute vocal SPL, the rest of the framework laid out in this study may be adapted in a straightforward manner to analyze the MIA of famous speeches based on environment and geometry. This can be combined with density estimates to project the effective crowd size for many famous addresses throughout history. These additional analyses will contain greater error than for Whitefield, but will still provide a crucial step toward a quantitative description of the limits of human gatherings in the pre-amplified era.

* Exodus 4:10 (English Standard Version)

Summing Up

Whitefield declared in 1739 that

The Christian world is in a deep sleep. Nothing but a loud voice can waken them out of it! (Vaudry, 2003)

This statement nicely captures Whitefield’s lasting significance. Not only did his relentless travel schedule and single-minded devotion to his mission succeed in awakening a religious movement that sparked lasting social, political, and ecclesiastical reform, but he also did so with the loudest of voices - one that (metaphorically speaking, of course*) continues to resound through the ages.

* Acoustic metaphors should always be used sparingly, especially in scientific works.

BIBLIOGRAPHY

Abel, J., Rick, J., Huang, P., Kolar, M., Smith, J., & Chowning, J. (2008). On the Acoustics of the Underground Galleries of Ancient Chavin de Huantar, Peru. In Acoustics ’08. Paris.
Akerlund, L., Gramming, P., & Sundberg, J. (1992, January). Phonetogram and averages of sound pressure levels and fundamental frequencies of speech: Comparison between female singers and nonsingers. Journal of Voice, 6(1), 55–63.
Allen, G. D. (1971). Acoustic Level and Vocal Effort as Cues for the Loudness of Speech. Journal of the Acoustical Society of America, 49(6B), 1831–1841.
Allen, J., & Berkeley, D. (1970). Image method for efficiently simulating small room acoustics. Journal of the Acoustical Society of America, 65, 943–950.
Andreopoulou, A., & Roginska, A. (2012). Computer-Aided Estimation of the Athenian Agora Aulos Scales Based on Physical Modeling. In Proceedings of the 133rd Audio Engineering Society Convention. San Francisco, CA.
ANSI. (2009). Method for Calculation of the Absorption of Sound by the Atmosphere (Tech. Rep.). American National Standards Institute.
Awan, S. (1991). Phonetographic profiles and F0-SPL characteristics of untrained versus trained vocal groups. Journal of Voice, 5(1), 41–50.
Bonsi, D., Longair, M., Garsed, P., & Orlowski, R. (2008). Acoustic and audience response analyses of eleven Venetian churches. In Acoustics ’08 (pp. 3087–3092). Paris.
Boren, B. (2012). Sounds of the City: The Colonial Era. Retrieved from http://philadelphiaencyclopedia.org/archive/sounds-of-the-city-the-colonial-era/
Boren, B., & Longair, M. (2011). A Method for Acoustic Modeling of Past Soundscapes. In Proceedings of the Acoustics of Ancient Theatres Conference. Patras, Greece.
Boren, B., Longair, M., & Orlowski, R. (2013). Acoustic Simulation of Renaissance Venetian Churches. Acoustics in Practice, 1(2), 17–28.
Boren, B., & Roginska, A. (2013). Maximum Averaged and Peak Levels of Vocal Sound Pressure. In Proceedings of the 135th Audio Engineering Society Convention. New York, NY.
Borish, J. (1984). Extension of the image model to arbitrary polyhedra. Journal of the Acoustical Society of America, 75(6), 1827–1836.
Boudreau, G. (2012a). Independence: A Guide to Historic Philadelphia. Westholme Publishing.
Boudreau, G. (2012b). Personal Communication.
Bradley, J. S., Reich, R., & Norcross, S. G. (1999). A just noticeable difference in C50 for speech. Applied Acoustics, 58(58), 99–108.
Bridenbaugh, C. (1964). Cities in the Wilderness: The First Century of Urban Life in America 1625-1742. New York: Alfred A. Knopf.
Bridenbaugh, C. (1971). Cities in Revolt: Urban Life in America, 1743-1776 (2nd ed.).
London, New York: Oxford University Press.
Bridenbaugh, C., & Bridenbaugh, J. (1942). Rebels and Gentlemen; Philadelphia in the Age of Franklin. New York: Reynal and Hitchcock.
Cabrera, D., Davis, P. J., & Connolly, A. (2011, November). Long-term horizontal vocal directivity of opera singers: effects of singing projection and acoustic environment. Journal of Voice, 25(6), e291–e303.
Chrisman, K. (1996). Unhoop the Fair Sex: The Campaign Against the Hoop Petticoat in Eighteenth-Century England. Eighteenth-Century Studies, 30(1), 5–23.
Chu, W. T., & Warnock, A. C. C. (2002). Detailed Directivity of Sound Fields Around Human Talkers (Vol. 61; Tech. Rep.). National Research Council Canada. doi: http://dx.doi.org/10.4224/20378930
Coleman, R. (1994, September). Dynamic intensity variations of individual choral singers. Journal of Voice, 8(3), 196–201.
Coleman, R., Mabis, J., & Hinson, J. (1977). Fundamental Frequency-Sound Pressure Level Profiles of Adult Male and Female Voices. Journal of Speech and Hearing Research, 20, 197–204.
Cotter, J., Roberts, D., & Parrington, M. (1992). The Buried Past: An Archaeological History of Philadelphia. University of Pennsylvania Press.
Dalenback, B. (2011). CATT-Acoustic v9. Gothenburg, Sweden: CATT.
Dalenback, B.-I. (1996). Room acoustic prediction based on a unified treatment of diffuse and specular reflection. Journal of the Acoustical Society of America, 100(2).
Dallimore, A. (1970). George Whitefield; the life and times of the great evangelist of the eighteenth-century revival. London: Banner of Truth Trust.
Denton, W. (1883). Records of St. Giles’ Cripplegate. London: George Bell and Sons.
Dunn, H. K., & Farnsworth, D. W. (1939). Exploration of Pressure Field Around the Human Head During Speech. Journal of the Acoustical Society of America, 10(3), 184–199.
Flanagan, J.
L. (1960). Analog measurements of sound radiation from the mouth. Journal of the Acoustical Society of America, 32(12), 1613–1620. 23 128 Franklin, B. (n.d.). The papers of Benjamin Franklin (L. Labaree, Ed.). New Haven: Yale University Press. Retrieved from franklinpapers.org 8 Franklin, B. (1739). The Pennsylvania Gazette, Nov. 15. Philadelphia. 66 Franklin, B. (1740). The Pennsylvania Gazette, May 8. Philadelphia. 14 Franklin, B. (1749). Poor Richard, Improved. Philadelphia, PA: Benjamin Franklin. 8, 45, 114, 116 Franklin, B. (1793). The Autobiography of Benjamin Franklin (2nd ed.). New Haven and London: Yale University Press. 2, 45, 52, 53, 114 The Gentleman’s Magazine. (1739). The Gentleman’s Magazine, 9, 162. 13, 116 Gillies, J. (1772). Memoirs of the Life of the Reverend George Whitefield, MA. Oswestry, UK: Quinta Press. 81, 111 Gillingham, H. E., & Drowne, S. (1924). Dr. Solomon Drowne. The Pennsylvania Magazine of History and Biography, 48(3), 227–250. 143 Gramming, P., Sundberg, J., & Ternström, S. (1988). Relationship between changes in voice pitch and loudness. Journal of Voice, 2(2), 118–126. 25, 72 Harris, C. (1966). Absorption of Sound in Air versus Humidity and Temperature. Journal of the Acoustical Society of America, 40(1), 141–159. xiii, 105, 107 Hershey, W. (1975). Independence Hall Sidewalk Salvage Project (Tech. Rep.). Philadelphia: Independence National Historical Park Library. 53, 142 Hodgson, M. (2002). Rating, ranking, and understanding acoustical quality in university classrooms. Journal of the Acoustical Society of America, 112(2), 568–575. 67, 97 Houtgast, T., Steeneken, H. J. M., & Plomp, R. (1980). Predicting speech intelligibility in rooms from the modulation transfer function. I. General room acoustics. Acustica, 46(1), 60–72. 5, 63, 97 Howard, D., & Moretti, L. (2010). Sound and Space in Renaissance Venice. Yale University Press. 15 129 Jackson, J. (1918). 
Market Street Philadelphia: The Most Historic Highway in America, Its Merchants, Its Story. Patterson and White. 53, 142 Jacobs, B. H. A. (1967). To count a crowd. Columbia Journalism Review, 5, 37–40. 13, 114 Kalm, P. (1770). The America of 1750 : Peter Kalm’s travels in North America : the English version of 1770. New York: Dover Publications. 53, 66 Katz, B., & D’Alessandro, C. (2007). Directivity Measurements of the Singing Voice. In 19th International Congress on Acoustics (Vol. 10, pp. 45–50). 5, 6, 23, 24, 29, 32 Kent, R., Kent, J., & J. Rosenbek. (1987). Maximum Performance Tests of Speech Production. Journal of Speech and Hearing Disorders, 52, 367–387. 7, 25, 69, 75 Kleiner, M., Dalenback, B.-I., & Svensson, P. (1993). Auralization-An Overview. Journal of the Audio Engineering Society, 41(11), 861–875. 18, 19 Knowles, K. (2008). What could Lee see at Gettysburg? In Placing history: how maps, spatial data, and GIS are changing historical scholarship (pp. 235– 266). Redlands, CA: ESRI Press. 15 Komlos, J., & Cinnirella, F. (2005). European Heights in the Early 18th Century. Vierteljahrschrift für Sozial- und Wirtschaftsgeschichte, 94, 271–284. 115 Kurze, U. J. (1974, March). Noise reduction by barriers. The Journal of the Acoustical Society of America, 55(3), 504. 49 Kurze, U. J., & Anderson, G. (1971, January). Sound attenuation by barriers. Applied Acoustics, 4(1), 35–53. 49 Lambert, F. (1994). Pedlar in Divinity. Princeton, NJ: Princeton University Press. 7, 14, 43, 82, 111 Leino, T. (2009, November). Long-term average spectrum in screening of voice quality in speech: untrained male university students. Journal of Voice, 23(6), 671–6. 25 130 Liberman, M. (2005). Counting People. Retrieved 9/21/2012, from http://itre.cis.upenn.edu/~myl/languagelog/ archives/002487.html 44 Lisa, M., Rindel, J., & Christensen, C. (2004). Predicting the acoustics of ancient open-air theatres: the importance of calculation methods and geometrical details. 
In Joint Baltic-Nordic Acoustics Meeting 2004. 22 Maekawa, Z. (1968). Noise reduction by screens. Applied Acoustics, 1(3), 157– 173. 48 Mahaffey, J. (2007). Preaching Politics : The Religious Rhetoric of George Whitefield and the Founding of a New Nation. Waco, TX: Baylor University Press. 7 Mahaffey, J. (2012). Personal Communication. 13 Manley, G. (1953). The mean temperature of Central England, 1698 to 1952. Quarterly Journal of the Royal Meteorological Society, 12, 317–342. 105 Marshall, A. H., & Meyer, J. (1985). The directivity and auditory impressions of singers. Acustica, 58, 130–140. 24 McKendree, F. S. (1986). Directivity indices of human talkers in English speech. In Internoise 86 (pp. 911–916). Cambridge. 23, 31 Mendes, A. P., Rothman, H. B., Sapienza, C., & Brown, W. (2003, December). Effects of vocal training on the acoustic parameters of the singing voice. Journal of Voice, 17(4), 529–543. 25, 69, 70, 75 Menounou, P. (2001, October). A correction to Maekawa’s curve for the insertion loss behind barriers. The Journal of the Acoustical Society of America, 110(4), 1828. 49 Meyer, J. (2009). Acoustics and the Performance of Music (5th ed.). Springer. 48, 85 Monson, B. B., Hunter, E. J., & Story, B. H. (2012, July). Horizontal directivity of low- and high-frequency energy in speech and singing. The Journal of the 131 Acoustical Society of America, 132(1), 433–41. 23, 24, 31 Mori, J., Yoshino, D., S. Satoh, & Tachibana, H. (2011). Prediction of outdoor sound propagation by applying geometrical sound simulation technique. In Internoise 2011. Osaka, Japan. 22, 59, 84 Nawka, T., Anders, L. C., Cebulla, M., & Zurakowski, D. (1997, December). The speaker’s formant in male voices. Journal of Voice, 11(4), 422–8. 24 Olesen, S. K. (1997). Low Frequency Room Simulation using Finite Difference Equations. In Proceedings of the 102nd Audio Engineering Society Convention. Munich, Germany. 18, 19 Orlowski, R. (2006). Acoustics and Architectural Form. 
In Architettura e musica nella venezia del rinascimento. Bruno Mondadori. 15, 45 Pick, H. L., Siegel, G. M., Fox, P. W., Garber, S. R., & Kearney, J. K. (1989). Inhibiting the Lombard effect. The Journal of the Acoustical Society of America, 85(2), 894. Retrieved from http://scitation.aip.org/ content/asa/journal/jasa/85/2/10.1121/1.397561 doi: 10.1121/1.397561 95 Pierce, A. (1974). Diffraction of sound around corners and over wide barriers. Journal of the Acoustical Society of America, 55(5), 941–955. 49 Piercy, J. E., Embleton, T. F., & Sutherland, L. C. (1977, June). Review of noise propagation in the atmosphere. The Journal of the Acoustical Society of America, 61(6), 1403–18. 68 Rath, R. (2003). How early America sounded. Cornell University Press. 15, 52, 140, 144 Rindel, J. (2000). The Use of Computer Modeling in Room Acoustics. Journal of Vibroengineering, 3(4), 219–224. 19, 21 Rindel, J. (2002). Modelling in Auditorium Acoustics - From Ripple Tank and Scale Models to Computer Simulations. In Proceedings of the 2002 Forum Acusticum. Sevilla, Spain. 16, 17, 18 132 Rindel, J. (2010). Room Acoustic Prediction Modelling. In El XXIII Encontro da Sociedade Brasileira de Acustica. Salvador, Brazil. 84 Rindel, J., Nielsen, G., & Christensen, C. (2009). Diffraction around corners and over wide barriers in room acoustic simulations. In The Sixteenth International Congress on Sound and Vibration. 22 Ross, C. (1999). Outdoor Sound Propagation in the U.S. Civil War. Echoes, 9(1). 15, 68 Scarre, C., & Lawson, G. (Eds.). (2006). Archaeoacoustics. Cambridge, UK: McDonald Institute for Archaeological Research. 15 Scharf, J., & Westcott, T. (1884). The History of Philadelphia, 1609-1884, Vol. I. Philadelphia, PA: L.H. Everts and Co. 141, 142, 143 Seidler, J., Meyer, K., & Gillivray, L. M. (1976). Collecting Data on Crowds and Rallies: A New Method of Stationary Sampling. Social Forces, 55(2), 507–519. 114 Serra, J., Koduri, K., Miron, M., & Serra, X. (2011). 
Assessing the Tuning of Sung Indian Classical Music. In 12th International Conference on Music Information Retrieval (ISMIR-11). Miami, FL. 15 Sivitz, P., & Smith, B. (2012). Philadelphia and Its People in Maps: The 1790s. Retrieved from http://philadelphiaencyclopedia.org/ archive/philadelphia-and-its-people-in-maps-the -1790s/ 140 Smith, B. (1999). The Acoustic World of Early Modern England: Attending to the O-Factor. Chicago: University of Chicago Press. 15 Snow, C. P. (1959). The Two Cultures and the Scientific Revolution. New York: Cambridge University Press. 123 Snow, C. P. (1960). Science and Government. Cambridge, MA: Harvard University Press. 123 Snyder, M. (1975). City of Independence: Views of Philadelphia Before 1800. New 133 York: Praeger Publishers. 57, 140, 141 Steeneken, H. J. M., & Houtgast, T. (1980). A physical method for measuring speech transmission quality. Journal of the Acoustical Society of America, 67(1), 318–326. 65 Stout, H. (1991). The Divine Dramatist: George Whitefield and the Rise of Modern Evangelicalism. Grand Rapids, MI: William B. Eerdmans Publishing Company. 13, 43, 53 Sundberg, J. (2001, June). Level and center frequency of the singer’s formant. Journal of Voice, 15(2), 176–86. 24 Sundberg, J., & Nordenberg, M. (2006). Effects of vocal loudness variation on spectrum balance as reflected by the alpha measure of long-term-average spectra of speech. The Journal of the Acoustical Society of America, 120(1), 453– 457. 27 Swank, E., & Clapp, J. (1999). Some Methodological Concerns When Estimating the Size of Organizing Activities. Journal of Community Practice, 6(3), 49– 69. 114 Thornbury, W. (1878). Moorfields and Finsbury. In Old and New London: Volume 2 (Vol. 2, pp. 196–208). London, Paris, and New York: Cassell, Petter, Galpin, and Co. 80 Tyerman, L. (1877). The life of the Rev. George Whitefield. New York: Anson D. F. Randolph and Company. 59, 66, 81 Ukers, W. (1922). All About Coffee. The Tea and Coffee Trade Journal Company. 
Vaudry, R. (2003). Anglicans and the Atlantic World. McGill-Queen’s University Press.
Vorländer, M. (1995). International Round Robin on Room Acoustical Computer Simulations. In 15th International Congress on Acoustics Proceedings. Trondheim.
Wachter, K. W., & Trussell, J. (1982). Estimating Historical Heights. Journal of the American Statistical Association, 77(378), 279–293.
Wakeley, J. (1871). The Prince of Pulpit Orators: A Portraiture of Rev. George Whitefield, M.A. (2nd ed.). New York: Carlton and Lanahan.
Wakeley, J. (1872). Anecdotes of the Rev. George Whitefield, M.A., with Biographical Sketch. Hodder and Stoughton.
Walford, E. (1878). Mayfair. In Old and New London: Volume 4 (pp. 345–359). London, Paris, and New York: Cassell, Petter, Galpin, and Co.
Wall, J., Stephens, J., & Markham, B. (2012). Virtual Paul’s Cross Project. Retrieved 2/20/2013, from http://vpcp.chass.ncsu.edu/
Watson, J. (1830). Annals of Philadelphia. E.L. Carey and A. Hart.
Watson, R., & Yip, P. (2011). How many were there when it mattered? Significance, 8(3), 104–107.
West, M., Gilbert, K., & Sack, R. (1992, January). A tutorial on the parabolic equation (PE) model used for long range sound propagation in the atmosphere. Applied Acoustics, 37(1), 31–49.
White, M. J., & Gilbert, K. E. (1989, January). Application of the parabolic equation to the outdoor propagation of sound. Applied Acoustics, 27(3), 227–238.
Whitefield, G. (1756). The Works of George Whitefield: Journals. Oswestry, UK: Quinta Press.

APPENDIX A

FULL VOCAL SPL MEASUREMENTS

Table 18 Leq values for speech, in dBA

Participant      Conv.   Thea.   Max.
1 - mez. sop.    60.1    65.3    72.4
2 - soprano      57.3    72.6    86.6
3 - soprano      58.0    65.5    73.3
4 - baritone     61.7    75.4    84.3
5 - soprano      56.2    68.0    73.1
6 - soprano      55.6    61.4    69.7
7 - baritone     63.7    76.2    90.3
8 - soprano      62.8    71.0    79.9
9 - tenor        55.7    73.3    86.5

Table 19 Leq values for back sung voice, in dBA

Participant      pp      mf      ff
1 - mez. sop.    70.8    73.8    77.1
2 - soprano      69.3    75.0    82.2
3 - soprano      78.5    79.8    83.7
4 - baritone     68.0    79.0    83.8
5 - soprano      69.3    82.5    84.9
6 - soprano      69.0    76.1    82.3
7 - baritone     68.5    80.4    86.8
8 - soprano      71.3    76.1    80.4
9 - tenor        63.3    73.7    88.4

Table 20 Leq values for mask sung voice, in dBA

Participant      pp      mf      ff
1 - mez. sop.    71.3    73.9    76.9
2 - soprano      67.3    75.7    82.1
3 - soprano      80.2    82.5    82.2
4 - baritone     75.6    81.6    86.4
5 - soprano      78.1    79.5    83.4
6 - soprano      76.7    82.2    88.1
7 - baritone     69.6    84.2    90.8
8 - soprano      71.9    77.9    83.1
9 - tenor        68.1    78.6    90.7

Table 21 Lpk values for speech, in dBA

Participant      Conv.   Thea.   Max.
1 - mez. sop.    81.6    88.3    94.0
2 - soprano      79.0    97.0    107.8
3 - soprano      82.8    91.7    100.7
4 - baritone     85.7    97.1    107.4
5 - soprano      77.6    91.3    94.5
6 - soprano      77.7    84.3    93.7
7 - baritone     88.4    102.6   113.1
8 - soprano      88.2    98.1    104.1
9 - tenor        79.6    98.1    112.4

Table 22 Lpk values for back sung voice, in dBA

Participant      pp      mf      ff
1 - mez. sop.    88.9    93.8    95.7
2 - soprano      89.6    94.6    102.7
3 - soprano      97.8    100.4   105.5
4 - baritone     89.4    100.6   105.4
5 - soprano      88.1    103.6   107.2
6 - soprano      94.2    99.2    103.5
7 - baritone     88.9    101.7   109.6
8 - soprano      94.3    98.6    101.3
9 - tenor        78.0    91.6    108.2

Table 23 Lpk values for mask sung voice, in dBA

Participant      pp      mf      ff
1 - mez. sop.    88.7    91.8    96.4
2 - soprano      85.4    98.1    105.0
3 - soprano      101.3   102.4   103.1
4 - baritone     97.2    104.4   109.4
5 - soprano      101.5   101.6   105.0
6 - soprano      99.9    105.0   112.0
7 - baritone     92.9    108.5   110.8
8 - soprano      93.2    98.8    103.0
9 - tenor        83.9    96.9    113.0

APPENDIX B

HISTORY OF SOUND IN COLONIAL PHILADELPHIA

Soon after its founding, Philadelphia quickly crossed the threshold from a mere rural agglomeration into a true city, complete with an urban soundscape.
In contrast to the countryside, where large distances and tree lines weakened the intensity of sound traveling between farms, neighbors within the city had no choice but to hear the diverse noises that resulted from both private and public endeavors. Despite William Penn’s vision of a city spread between the Delaware River and the Schuylkill, Philadelphia remained densely concentrated along the Delaware throughout the eighteenth century (Sivitz & Smith, 2012). Sounds from the private sphere intruded into public life without hindrance: women’s batting staffs, street criers, and bells sounded loudly throughout the city’s domestic, commercial, and religious life (Rath, 2003).

The intersection of Market and Second Street was central to William Penn’s plan for Philadelphia, and the soundscape at this point quickly began to diverge from that of the countryside. As early as 1682 it was the site of a simple cage for the city’s criminal offenders with no sound insulation whatsoever (Snyder, 1975), and a later prison built in the middle of Market Street was labeled a nuisance by the city’s Grand Jury in 1702. The space in the middle of Market Street next became host to the bleating of sheep, which were pastured on the common area by the town butcher. Plenty of human noise followed as well: Philadelphia’s market was moved from Front Street to the intersection of Market and Second, where it met on Wednesdays and Saturdays. A bell rang out to signal the opening of the market, and the market quickly gave Market Street (originally called High Street) its name (Scharf & Westcott, 1884).

Bells were an especially common means of sonic communication in the colonial city, present in churches, clocks, and schools, not to mention what would later be called the Liberty Bell, which served to call the members of the legislature to work.
Another bell sat atop the Old Court House, built in 1707 at the same intersection; as the city’s only important public building, it brought more focus to Market and Second Street over the next few decades (Snyder, 1975; Bridenbaugh & Bridenbaugh, 1942). Both the city and county courts met in the Court House, whose foundation stood on brick arches that allowed the market stalls to extend under the building itself (Scharf & Westcott, 1884). Since the market’s hours were strictly defined, noise from the stalls presumably did not significantly disturb the proceedings within the Court House.

Instrumental music and singing added to the sounds of the city. Religious leaders, including Whitefield, discouraged secular or instrumental music, preferring only devotional hymns during church services, and in 1740 only three churches in the city possessed an organ. Gradually, as the influence of Whitefield’s revivals waned, instrumental music became more popular. By 1774 every worship service in Philadelphia used an organ except the silent worship of the Society of Friends. Public houses were often centers of popular music, and David Lockwood’s tavern even had a “Musical Clock,” which played “Sonatas, Concertos, Marches, Minuets, Jiggs, and Scots Airs” (Bridenbaugh & Bridenbaugh, 1942).

In contrast to the riotous atmosphere at many taverns, the upper echelons of Philadelphia society preferred the more subdued conversation at the city’s growing number of coffee houses, including the famous London Coffee House at the corner of Front and Market Street, which opened in 1754. It began as a place for civil conversations and business transactions between merchants and traders, but this led to the Coffee House’s use as an all-purpose auction house for horses, carriages, and even slaves.
This louder commercial soundscape would eventually give way to the crackles of bonfires and shouts of revolutionary mobs: as the conflict with Britain worsened in the 1760s, the street in front of the Coffee House became the site of protests against the Stamp Act and later the burning in effigy of British officials (Scharf & Westcott, 1884).

Carriages and wagons traveling from the countryside to the wharves along the river, along with the whip cracks of their drivers, generated noise continually (Bridenbaugh, 1971). In the first half of the eighteenth century, streets were not often paved, and what pavement there was consisted of what archaeologists call pebblestone (Hershey, 1975), similar to gravel (Jackson, 1918). Dirt and gravel roads provided less rigid surfaces than a hard cobblestone pavement, reducing the noise from cartwheels and horses’ hooves while producing a slight hiss from the small particles of stone sticking to wheels (as in the audio example). But as the city developed further in the second half of the century, streets became more uniformly and solidly paved (Cotter et al., 1992) and wheels were more likely to be lined with iron (Bridenbaugh, 1971), both of which increased the radiated noise throughout the city. John Fanning Watson’s Annals of Philadelphia in the Olden Time included several anecdotes of citizens hearing voices or artillery over great distances, and older residents told him that it was easier to hear distant sounds when the city had fewer carriages and its streets were unpaved (J. Watson, 1830). These sounds, along with those of the herds of livestock occasionally found moving through the city, ensured that the colonial city’s soundscape was not entirely divorced from the sounds of the countryside around it, as the later industrialized city’s would be.

While the carriages heading to the wharves generated noise throughout Philadelphia, the Delaware riverfront itself provided another diverse soundscape on the edge of the growing city.
The riverfront also had associations with intemperance, as one of the earliest river landing sites was named for a tavern on that site that predated the city. Caves along the river also served as grog shops, which further contributed to the lower-class reputation of the waterfront through the frivolity, songs, and brawls that went along with their wares. These led to the passing of strict laws against drunkenness as early as 1682–3. Soon, however, the wealthy founders of Philadelphia built large wharves along the river, and the waterfront’s soundscape was further infused with clanking anchors and chains, the groaning of masts and riggings, the speech of sailors and merchants, and the loading and unloading of carriages, wagons, and ships (Bridenbaugh, 1964).

The growing noise led the citizens of Philadelphia to take action while the city was still relatively young. As early as 1732, the city drew up a noise ordinance restricting gatherings and noise-making on Sundays (Scharf & Westcott, 1884), possibly for the peace of the Friends’ worship services. But as the city continued to develop and increase in loudness and density, its growing commercial noise prompted wealthy citizens to begin moving to the suburbs (Bridenbaugh & Bridenbaugh, 1942). In a letter to his brother in Rhode Island, Doctor Solomon Drowne wrote from Philadelphia in 1774: “I almost envy you your pleasant situation on Mendon’s pleasant Hill, remote from Noise & Confusion. Here the thundering of Coaches, Chariots, Chaises, Waggons, Drays, and the whole Fraternity of Noise almost continually assails our Ears” (Gillingham & Drowne, 1924). As early as 1711, Robert Fairman mentioned in a letter the benefits of a plantation “out of the noise of Philadelphia, but in site of it” (J. Watson, 1830). In 1770, the principal of a private academy north of Philadelphia likewise advertised his college as being “free from the noise of the city” (J. Watson, 1830).

Nights in the city were quieter by and large, marked by the occasional “crying of the hour” by the night watchman. Occasionally, however, disturbances did break out when tavern brawls spilled into the streets. In the years leading to the American Revolution, British soldiers in the city may have affected transatlantic relations through their nighttime noisemaking: in 1769, a young Englishman named Alexander Macraby recorded that he, along with several officers and a band, routinely paraded through the streets at midnight and played under the windows of young women, “which,” he added, “they esteem a high compliment.” Occasionally their celebrations passed into the countryside, such as when Macraby wrote that they took seven sleighs with fiddlers on horseback “to a public house a few miles from town, where we danced, sung and romped and eat and drank, and kicked away care from morning till night” (Bridenbaugh & Bridenbaugh, 1942).

While some scholars have argued that colonial Philadelphia was even louder than a modern urban environment (Rath, 2003), the crucial difference is not the overall loudness over time but the different textures of the soundscapes. Modern city noise is characterized by relatively continuous background noise from engines, generators, and ventilation systems. In contrast, eighteenth-century Philadelphians heard many short, impulsive sounds rising over a very quiet pre-electric background. Only at the busiest times did the colonial city have enough individual sources of noise to blend into anything continuous enough to be perceived as background noise. The rest of the time these sharp sounds would ring out into the foreground of public attention, one of the many growing pains for the young city.