The Humming Hum: Background Noise as a Carrier of ENF Artifacts in Mobile Device Audio Recordings

Niklas Fechner and Matthias Kirchner
Department of Information Systems, University of Münster, Germany
{nfech_01 | matthias.kirchner}@uni-muenster.de

Abstract
Audio forensics based on fluctuations in the electrical network frequency (ENF) has become one of the major approaches for the authentication of digital audio recordings. Yet little is known about the circumstances and preconditions under which battery-powered devices leave ENF artifacts in their recordings. Our study with multiple mobile recording devices confirms the hypothesis that background noise, generated by mains-powered electronic devices in proximity to the recording device, is a carrier of ENF artifacts. Experiments in an indoor setting suggest a very high robustness and indicate the presence of ENF artifacts even multiple rooms apart from the noise source.

Keywords
digital audio forensics, electrical network frequency, mobile devices, background noise

I. INTRODUCTION

The Audio Engineering Society defines an authentic audio recording as [1, p. 6] “a recording made simultaneously with the acoustic events it purports to have recorded, and in a manner fully and completely consistent with the method of recording claimed by the party who produced the recording; a recording free from unexplained artifacts, alterations, additions, deletions, or edits.” The literature on digital audio forensics is rich in proposals to assess the various aspects of the above definition based on inherent characteristics of questioned audio recordings [2]. Relevant questions of interest include, amongst others, inference about the provenance [3] and the acoustic environment of a recording [4]–[6], the identification of recording devices [7]–[9] or their audio codecs [10]–[12], forensic speaker identification [13], as well as the detection of processing artifacts [14]–[17].
Electrical network frequency (ENF) artifacts in digital audio recordings are among the most actively researched signal characteristics in the context of forensic audio authentication. Grigoras [18] first reported that mains-powered devices are likely to capture supposedly random deviations from the nominal network frequency (typically 50 Hz in Europe, and 60 Hz in most parts of the Americas) in their recordings. Such fluctuations are inevitable because of the continuously changing and relatively unpredictable generation-load (im)balance in the respective network. Multiple independent studies suggest that ENF artifacts are stable across the whole network and distinctive enough to serve as a “natural fingerprint” of a recording [18]–[24, amongst others]. One advantage of ENF-based forensics is its wide applicability to different aspects of audio authentication. Because of their stochastic character, ENF traces can help to determine or to verify the time of a recording, if reference data of the purported source network and time frame is available. Yet ENF fluctuations also vary considerably across different networks. Aggregated statistical measures, extracted from a questioned recording only, can be sufficient to determine the source network [25]. Also audio editing is detectable by means of ENF-based forensics. Missing or misplaced frequency traces may indicate processed parts of an audio file [18], and so do abrupt phase changes in the ENF frequency band [26]. Early on, ENF artifacts have also been reported to be present in audio recordings acquired with battery-powered devices, i. e., recorders that are not directly connected to the mains grid [19], [20]. This observation is particularly relevant for practical casework, considering the omnipresence of mobile audio recording devices (e. g., smartphones) nowadays. 
Yet surprisingly little is known about the exact circumstances that would explain when and to what degree mobile device audio signals carry ENF artifacts. Early beliefs that the presence of a strong enough electromagnetic field would be the main precondition were questioned soon after [27]. This paper sets out to explore another possible scenario in more detail. Specifically, Chai et al. [28] suggested very recently that (audible) background noise is a carrier of ENF artifacts if the noise source is a mains-powered electronic device. We follow this trail equipped with a more sophisticated experimental setup than the original study, which allows us to learn more about the strength and robustness of ENF artifacts in mobile device recordings. Hence, we expect our findings to contribute to a better understanding and assessment of practical forensic analyses.

The remainder of this paper is organized as follows. Section II reviews the status quo in mobile device ENF-based forensics. Section III details our experimental setup, before Sect. IV discusses our observations and results. Section V concludes the paper.

II. MOBILE DEVICE ENF-FORENSICS

It is now a widely accepted assumption that digital audio data from mains-powered recording devices carries traces of ENF fluctuations with high probability [18]–[24]. Depending on the network, typical deviations from the nominal network frequency are in the range of ± 50 mHz to ± 200 mHz, with a distribution that generally appears to resemble a Gaussian fairly well [22], [29]. ENF artifacts can be extracted from audio recordings at low signal-to-noise ratios through the use of a band-pass filter around the nominal network frequency, typically combined with spectral methods or the analysis of zero-crossings. ENF-related artifacts may also exist at higher-order harmonics of the nominal frequency [30].
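The zero-crossing analysis mentioned above can be illustrated with a minimal sketch. It assumes that the input has already been band-pass filtered around the nominal network frequency, so that a single tone dominates. The function name `zero_crossing_frequency`, the linear interpolation between samples, and the synthetic 50.02 Hz test tone are illustrative assumptions and are not taken from any of the cited studies.

```python
import math


def zero_crossing_frequency(samples, fs):
    """Estimate the dominant frequency (in Hz) from rising zero crossings.

    Assumes `samples` is already band-limited to a narrow band around the
    frequency of interest (hypothetical pre-processing, not shown here).
    """
    crossings = []
    for n in range(1, len(samples)):
        a, b = samples[n - 1], samples[n]
        if a < 0.0 <= b:
            # Linear interpolation locates the crossing between two samples.
            frac = -a / (b - a)
            crossings.append((n - 1 + frac) / fs)
    if len(crossings) < 2:
        raise ValueError("need at least two rising zero crossings")
    # Number of full periods divided by the time they span.
    return (len(crossings) - 1) / (crossings[-1] - crossings[0])


# Synthetic example: a 10 s tone slightly above the nominal 50 Hz.
fs = 1000.0
f_true = 50.02
tone = [math.sin(2.0 * math.pi * f_true * n / fs + 0.3) for n in range(int(10 * fs))]
estimate = zero_crossing_frequency(tone, fs)
print(f"estimated frequency: {estimate:.4f} Hz")
```

Averaging over many periods, as done here, trades temporal resolution for accuracy; a practical tracker would apply the same idea to short consecutive segments.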
In general, the strength and presence of ENF artifacts is believed to depend on the recording device’s internal circuitry design and electromagnetic compatibility (EMC) characteristics [27]. As for battery-powered mobile devices, the situation is more complex. Grigoras [18] alluded to the possibility that a direct connection to the mains grid is not necessarily a precondition for capturing ENF artifacts, as long as the recording is made close enough to other mains-powered equipment. Experiments by Kajstura et al. [19] confirmed this conjecture, but Brixen [20] observed that only certain recording devices may capture ENF artifacts while others won’t. According to Brixen’s study, electric cables in the recording devices’ proximity were not sufficient. Electromagnetic fields of varying strength were generally ruled out as the source of ENF artifacts in recordings from devices with electret microphones (as opposed to positive results for dynamic microphones). These observations are of particular practical relevance, because electret microphones are the most common type of microphone in typical consumer devices. Chai et al. [28] very recently confirmed Brixen’s results in that recorders with electret microphones do not capture ENF artifacts through electrical or magnetic fields. Instead, the authors suggested that acoustic mains hum is a carrier of ENF traces. In general, every kind of background noise generated by mains-powered devices in proximity to the recording device might indirectly transfer network frequency fluctuations, very similar to flickering fluorescent light sources in indoor video recordings [31]. Chai et al. [28] used an air conditioning fan as noise source in their experiments. ENF artifacts were present when the fan was switched on, but no obvious characteristic traces could be found when the fan was not running. When the recording device was wrapped with soundproof material, the strength of ENF artifacts decreased. 
Several additional factors can impact the strength and the presence of ENF artifacts in mobile device recordings. Due to technical restrictions or device-internal digital signal processing, mobile digital recorders may not be able to capture the signal. Specifically, many consumer devices are inherently incapable of recording very low frequencies around the fundamental 50 / 60 Hz band. Relatively strong compression (e. g., Adaptive Multi-Rate compression, AMR) can also limit the available frequency range. Hence, ENF artifacts at higher-order harmonics of the fundamental network frequency are of particular interest in forensic analyses of mobile device recordings.

III. EXPERIMENTS WITH AUDIBLE HUM

Mains hum in audio systems is one of the most well-known network-induced acoustic noise sources. It can occur because of grounding imbalances, or it may be transferred magnetically or capacitively. The oftentimes hard-to-eliminate humming sound can be clearly audible. In our Münster-based study (nominal network frequency in Germany: 50 Hz), we use it as a prototypical background noise source to investigate to what extent acoustic signals carry artifacts due to ENF fluctuations and how well such artifacts are captured by mobile recording devices in relatively close proximity. Other potential (indoor) noise sources are fans [28], power adapters, lights, and fridges, among many others.

A. Recording Devices

We use three different battery-powered recording devices in our experiments: a Samsung Galaxy S2 with Android 4.2 installed, an iPhone 4s with iOS 6 installed, as well as a Tascam R-05 digital recorder. All three devices are equipped with electret microphones. We expect that the smartphones (Galaxy S2 and iPhone 4s) use microphones of lower quality than the digital recorder.
Typical frequency responses of electret microphones may well extend beyond a 20 Hz to 20 kHz interval, yet noise reduction efforts inside the recording device usually impose a considerably narrower frequency response. We use a third-party application to circumvent an overly aggressive low-frequency cut-off in the iPhone, as older versions of Apple’s iOS damp frequency components below 250 Hz by default.1

Figure 1 gives an overview of the recording characteristics of the test devices. The spectrograms (Hanning window length: 16,000 samples, overlap: 70 samples) were obtained from uncompressed and unprocessed WAV PCM format recordings of a 10 second probe chirp with a linearly increasing signal frequency in the range [20, 200] Hz. For comparison, we also include the spectrogram for a Sennheiser PC 360 headset, which was connected to a mains-powered computer during the recording. The spectrograms indicate that the two smartphones in our test set capture low-frequency components only to a limited degree. Spectral components around and below 50 Hz are subject to considerable damping. The Tascam digital recorder is, as expected, more sensitive. Yet all mobile device recordings are generally in stark contrast to the Sennheiser PC 360 data, which not only contains the full frequency range but also has strong spectral lines at 50 Hz and 150 Hz throughout the whole recording. While the more uniform frequency response suggests the absence of strong post-processing, the two additional frequency components indicate the to-be-expected presence of ENF artifacts (also at higher-order harmonics) in the recording.

1 http://blog.faberacoustical.com/2012/ios/iphone/finally-ios-6-kills-the-filter-on-headset-and-mic-inputs

[Figure 1: four spectrogram panels (Samsung Galaxy S2 (Android 4.2), iPhone 4s (iOS 6), Tascam R-05, Sennheiser PC 360); axes: time [s], frequency [Hz].]
Figure 1. Spectrograms of unprocessed 10-second audio recordings (linear chirp probe signal) from mobile devices in our test set. A corresponding spectrogram for a Sennheiser PC 360 headset, connected to a mains-powered computer, is displayed for reference. The artifacts in the iPhone spectrogram after about 6.5 seconds are due to clipping.

B. Noise Source

We generate audible mains hum by connecting a dual RCA audio cable to the audio input of a typical active subwoofer system (Teufel Concept M series), located indoors, inside a typical German apartment. The cable has a length of 2 meters. One end of the cable is left unconnected. When the subwoofer is turned on, a humming sound is clearly noticeable from the speaker. Electronic devices in the same room other than the subwoofer and the battery-powered audio recorders are switched off during the experiments.

C. Soundproof Box

Chai et al. [28] wrapped recording devices with soundproof material to isolate the impact of the acoustic noise source on ENF artifacts in audio recordings. Experiments with their construction, a cylinder built from four layers of recycled rubber material, indicated weakened ENF artifacts as compared to recordings without soundproofing.
We follow this approach in our experiments, but replace the hands-on construction with a more sophisticated soundproof box. Ideally, any attempt to verify that ENF characteristics are transmitted over the acoustic channel would require a complete absorption across the whole frequency range. Practically, this requirement is very hard to meet with feasible efforts and consumer-grade materials. As a compromise, we carefully designed our box to provide a good trade-off between effectiveness, complexity and usability.

Specifically, we use both porous absorbers and resonant absorbers. The former generally have higher sound absorption coefficients than resonant absorbers, but lose efficiency towards lower frequencies. Resonant absorbers are thus also necessary to cover typical frequency ranges of ENF artifacts and their harmonics. The final construction is depicted in Fig. 2. It consists of two nested boxes, providing the necessary cavity to absorb low-frequency components. The boxes themselves are built from medium-density fibreboards (MDF), a material often used for loudspeaker design. Penetrating sound waves let the MDF material resonate, which transforms parts of the sound energy into kinetic energy. All MDF joints are siliconed. The sealing is necessary for good absorption properties. The removable lid is equipped with a special window rubber seal construction. The cavity between both boxes is filled with fine sand, a porous absorber with excellent sound absorption properties. The inside of the inner box is further lined with a thick layer of solid, densely porous foam.

Figure 2. Design and structure of the soundproof box used in our experiments.
Figure 3 visualizes the empirical sound attenuation characteristics of our soundproof box, as we determined them based on the sound energy of various audio recordings of a linear chirp signal from devices inside (transmitted energy) and outside (incident energy) the box.2 An attenuation coefficient of 1 indicates the total attenuation of a specific frequency component. The gray graph depicts the absorption of spectral components for one particular recording with the Galaxy S2 mobile phone. The red graph corresponds to the average over all test recordings. The measurements reflect the challenges of achieving a complete sound absorption, in particular for lower frequencies. Overall, however, the soundproof box exhibits very good characteristics also for ENF-related frequency ranges.

2 Note that we do not measure the exact sound absorption coefficient, i. e., the ratio of absorbed sound energy and incident energy, of the box. So-called in situ measurements are a complex process, impaired by reflective distortions and background noise [32], and beyond the scope of our study.

[Figure 3: attenuation factor (0.92 to 1.00) over frequency [Hz] (0 to 4,000); legend: single measurement, average.]
Figure 3. Sound attenuation characteristics of the soundproof box. Single measurement (gray) and average (red) over multiple audio recordings of linear chirp signals with devices inside and outside the box.

D. Analysis of ENF Artifacts

Similar to most studies in the literature, we analyze ENF artifacts in the short-time Fourier transform (STFT) domain [22]. A number of pre-processing steps are necessary to keep the computational load tractable and to ensure a high signal-to-noise ratio. Specifically, we first downsample the audio recordings to sampling rates of 300 Hz, 600 Hz or 900 Hz, conditional on the order of the harmonic of interest (first, second, or third, respectively). A fourth-order Butterworth low-pass filter is used for anti-aliasing. The downsampled signals are then fed into a bandpass filter (fourth-order Butterworth, ± 0.5 Hz) to extract the frequency range of interest. As we found that there is no single best setting for STFT temporal and spectral resolutions, we report different combinations by varying both the length and the overlap of analysis windows.

E. Reference Data

We use reference ENF signals for comparison with the measurements from the mobile device audio recordings. The reference data was recorded independently in Dresden, Germany, approximately 500 km away from the location of our experiments in Münster. The ENF was measured directly off the grid. A step-down circuitry and a Schmitt-trigger convert the raw ENF data to a continuous square wave signal. The elapsed time between the zero crossings is measured with the 100 MHz clock of a BeagleBone Black board.3 Although the measurement does not include any time adjustments per GPS, the results have a high accuracy with deviations in the low mHz-range.

IV. EXPERIMENTAL RESULTS

Based on the setup described in the previous section, we conducted a number of experiments to better understand how acoustic signals transfer ENF artifacts into mobile device audio recordings. If not stated otherwise, all devices were set to store uncompressed PCM audio data at 8 kHz / 16 bit. We report our main findings in the following.
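For readers who want to experiment with the analysis chain of Sect. III-D, the following is a minimal sketch under stated assumptions, not the implementation used in this study. It relies on NumPy and SciPy. The function name `extract_enf`, the low-pass cut-off at 0.4 times the target rate, the polyphase resampler, the zero-padded FFT length, the Hann window, and the quadratic peak interpolation are all illustrative choices that the text does not specify. Only the target sampling rates (300, 600, 900 Hz), the fourth-order Butterworth filters, and the ± 0.5 Hz band follow the description above.

```python
from fractions import Fraction

import numpy as np
from scipy import signal


def extract_enf(x, fs, nominal=50.0, harmonic=1, win_s=3.0, hop_s=1.0, nfft=1 << 14):
    """Track the dominant frequency near nominal*harmonic; returns (times, freqs)."""
    fs_ds = 300 * harmonic              # 300 / 600 / 900 Hz, as in Sect. III-D
    centre = nominal * harmonic

    # Fourth-order Butterworth low-pass for anti-aliasing, followed by
    # rational-rate resampling (resample_poly adds its own FIR filter; assumption).
    sos_lp = signal.butter(4, 0.4 * fs_ds, btype="low", fs=fs, output="sos")
    ratio = Fraction(int(fs_ds), int(fs))
    y = signal.resample_poly(signal.sosfiltfilt(sos_lp, x),
                             ratio.numerator, ratio.denominator)

    # Fourth-order Butterworth band-pass, +/- 0.5 Hz around the harmonic.
    sos_bp = signal.butter(4, [centre - 0.5, centre + 0.5],
                           btype="bandpass", fs=fs_ds, output="sos")
    y = signal.sosfiltfilt(sos_bp, y)

    nper, hop = int(win_s * fs_ds), int(hop_s * fs_ds)
    nfft = max(nfft, nper)              # never truncate an analysis window
    window = np.hanning(nper)
    bins = np.fft.rfftfreq(nfft, d=1.0 / fs_ds)
    band = np.flatnonzero((bins >= centre - 0.5) & (bins <= centre + 0.5))

    times, freqs = [], []
    for start in range(0, len(y) - nper + 1, hop):
        mag = np.abs(np.fft.rfft(y[start:start + nper] * window, nfft))
        k = band[np.argmax(mag[band])]
        # Quadratic interpolation of the log-magnitude peak for sub-bin accuracy.
        a, b, c = np.log(mag[k - 1:k + 2] + 1e-12)
        delta = 0.5 * (a - c) / (a - 2.0 * b + c)
        freqs.append(bins[k] + delta * (bins[1] - bins[0]))
        times.append((start + nper / 2) / fs_ds)
    return np.array(times), np.array(freqs)


# Synthetic check: a noisy 50.03 Hz tone sampled at 8 kHz for 40 s.
fs = 8000
t = np.arange(40 * fs) / fs
rng = np.random.default_rng(0)
x = np.sin(2 * np.pi * 50.03 * t) + 0.1 * rng.standard_normal(t.size)
times, freqs = extract_enf(x, fs)
print(f"median estimate: {np.median(freqs):.3f} Hz over {len(freqs)} frames")
```

Longer windows (`win_s`) sharpen the spectral resolution at the cost of temporal resolution, which mirrors the trade-off between the different STFT settings reported in the results.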
A. Baseline Results

In a first baseline experiment, we recorded ten minutes of background audio with each mobile device in our test set when the noise source was switched either on or off. The recording devices were placed at a distance of approximately 30 cm from the noise source. We recorded with all devices simultaneously. Figure 4 depicts the amplitudes of the resulting frequency spectra. The right panel of the figure corresponds to recordings with mains-generated hum. Strong spectral components around multiples of the fundamental network frequency (50 Hz) are clearly noticeable for all three devices. Such artifacts are missing in the corresponding hum-free spectra on the left, indicating that the humming background noise might indeed carry traces of ENF fluctuations.

A closer inspection confirms that the suspicious frequency components of audio signals with audible hum are with high certainty characteristic ENF artifacts. Figure 5 reports the extracted frequency signals for the 50 Hz band at different STFT temporal and spectral resolutions. Each panel also displays the synced reference signal, represented with a temporal resolution of one second. Observe that all extracted signals closely resemble the reference data, although certain STFT parameter settings appear to yield more accurate estimates. Independent of all tested STFT settings, none of the hum-free audio recordings revealed ENF-like signal traces (not depicted here for the sake of brevity). The situation is generally similar for the analysis of higher-order harmonics. Figure 6 depicts exemplary results for the 100 Hz (left panel) and 150 Hz band (right panel), respectively. In accordance with prior work [28], the 150 Hz band seems more suitable to accurately measure ENF artifacts.

3 http://beagleboard.org/Products/BeagleBone

[Figure 4: two panels (noise source switched off, noise source switched on); axes: frequency [Hz], spectral amplitude.]
Figure 4. Amplitudes of frequency spectra of indoor test recordings with the mains-powered noise source switched off (left) and on (right).

B. Robustness of the Acoustic Channel

The above observations suggest that mobile device audio recordings in close proximity to our noise source contain characteristic traces of the ENF. Because all recording devices are equipped with electret microphones, we suspect that these artifacts are not transferred through electric or magnetic fields [20], [28]. For a better understanding of the acoustic channel, we repeated the experiments with recording devices placed inside the soundproof box (cf. Sect. III-C). After sealing the box, we consecutively recorded five minutes of audio data with the noise source switched off, before it was switched on for another five minutes. The devices inside the box were approximately 40 cm apart from the noise source. Figure 7 displays the extracted frequency signals in the 50 Hz band. The graphs for all three recording devices clearly reflect the two different phases. ENF artifacts seem to be present only when the noise source was activated (starting at second 300), while the first halves of the recordings do not exhibit clear signs of network characteristics. Overall, the ENF estimates appear much less accurate than before in Fig. 5. This is an effect of the soundproof box and its strong sound attenuation (cf. Fig. 3).

Figure 8 gives a more detailed account of the impact of soundproofing with respect to relevant harmonics of the fundamental network frequency. The left panel of the figure compares the harmonics’ spectral amplitudes in recordings inside and outside the box on a normalized logarithmic scale. The right panel displays the corresponding sound attenuation coefficients per harmonic. As before, a coefficient of 1 denotes the full suppression of the respective harmonic frequency. The graphs indicate that soundproofing reduces the amplitude of the first harmonic (50 Hz) to only about 1.5 % of the benchmark outside the box. The energy of the higher-order harmonics is generally lower, and it is further decreased inside the box. What is most interesting here (and of immediate practical relevance) is the robustness of characteristic ENF traces, which survive even severe damping of the carrier signal.

In a more close-to-real world scenario, we further recorded ten minutes of audio data in other rooms inside and outside the same apartment. The setup was the same as before, i. e., the noise source was switched off during the first five minutes of the recording, and then activated for another five minutes. The hum was not audible outside the room of the noise source.
Figure 9 depicts the extracted ENF signals in the 50 Hz band, along with the synced reference data. Note that the distance of the recording device (the Samsung Galaxy S2) to the noise source is increasing from panels a) to c). The first setting corresponds to a directly neighboring room, separated from the noise source by a massive brick wall. More recordings were made two rooms apart (three walls), and in the hallway outside the apartment (five walls). Despite the additional distances, the results generally resemble our findings from the earlier experiments. No meaningful ENF artifacts are noticeable without the generated background noise. However, Figures 9 a) and b) clearly suggest the presence of characteristic ENF traces even in rooms that are up to 10 meters apart from the activated noise source. Here, the spectral energy of the extracted signals is comparable to the measurements inside the soundproof box. Only the last case turned out to be too challenging, cf. Fig. 9 c). A distance of 20 meters and five massive brick walls introduced too much damping and additional noise on the channel to extract an ENF-like signal from the second half of the audio recording. Overall, however, the results from this series of experiments seem very encouraging with respect to practical applications and confirm the high robustness of ENF artifacts in acoustic signals.

[Figure 5: four panels (freq. resolution: 0.5 Hz, time resolution: 1 s; freq. resolution: 0.25 Hz, time resolution: 3 s; freq. resolution: 0.12 Hz, time resolution: 7 s; freq. resolution: 0.03 Hz, time resolution: 29 s); legend: reference, iPhone 4s, Galaxy S2, Tascam R-05; axes: time [s], frequency [Hz].]
Figure 5. STFT analysis of ENF traces in the 50 Hz band for varying temporal and spectral resolutions (noise source activated).
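The relation between the residual amplitude of a harmonic inside the soundproof box and the attenuation factor discussed in Sect. IV-B can be made explicit with a short worked example. The sketch assumes that the per-harmonic attenuation factor is one minus the ratio of the spectral amplitudes inside and outside the box. The text does not state this definition; it is merely consistent with the reported residual of about 1.5 % and a factor of 0.985 for the 50 Hz harmonic. The function names and the decibel conversion are illustrative additions.

```python
import math


def attenuation_factor(amplitude_inside, amplitude_outside):
    """Assumed definition: 1 means full suppression, 0 means no attenuation."""
    if amplitude_outside <= 0.0:
        raise ValueError("outside amplitude must be positive")
    return 1.0 - amplitude_inside / amplitude_outside


def level_change_db(amplitude_inside, amplitude_outside):
    """Amplitude ratio expressed in decibels (negative values mean damping)."""
    return 20.0 * math.log10(amplitude_inside / amplitude_outside)


# 50 Hz harmonic: amplitude outside normalized to 1, about 1.5 % remains inside.
outside = 1.0
inside = 0.015
factor = attenuation_factor(inside, outside)
change_db = level_change_db(inside, outside)
print(f"attenuation factor: {factor:.3f}, level change: {change_db:.1f} dB")
```

Under this assumed definition, a residual of 1.5 % corresponds to a drop of roughly 36 dB, which gives a sense of how weak the carrier can become while ENF traces remain detectable.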
50.02 50.02 99.95 49.98 49.96 99.90 150.00 50.00 149.90 49.98 49.96 149.80 00 60 60 120 180 180 240 240 300 300 360 360 420 420 480 480 540 540 600 600 120 00 time [s] [s] time 60 60 120 180 180 240 240 300 300 360 360 420 420 480 480 540 540 600 600 120 time [s] [s] time freq. resolution: Hz, time resolution: 7 s traces in the 100 Hz (left) and 150 Hz (right) freq. resolution: 0.03 source Hz, timeactivated). resolution: 29 s Figure 6. 0.12 STFT analysis of ENF band (noise 50.00 49.98 50.02 frequency [Hz] frequency [Hz] 50.02 50.00 49.98 freq. resolution: 0.06 Hz, time resolution: 15 s frequency [Hz] 50.05 50.00 reference iPhone 4s Galaxy S2 Tascam R-05 49.95 49.90 0 60 120 180 240 300 360 420 480 540 600 time [s] Figure 7. STFT analysis of ENF traces in the 50 Hz band when the recording devices were placed inside the soundproof box. The noise source was activated after the first 300 seconds of recording. normalized spectral amplitude attenuation 0.985 0.975 0.984 0.981 0.979 0.945 1 outside the box inside the box 0.1 0.01 0.001 0.0001 50 100 150 200 250 300 harmonic [Hz] Figure 8. Attenuation of the ENF harmonics when the recording device (Samsung Galaxy S2) was placed inside the soundproof box. Spectral amplitudes from recordings inside and outside the box, normalized to the amplitude of the 50 Hz harmonic without soundproofing and depicted on a logarithmic scale. Corresponding attenuation factors are reported in the upper part of the figure. V. S UMMARY AND C ONCLUDING R EMARKS This paper has presented an experimental investigation into the role of background noise as a carrier of ENF artifacts in mobile device audio recordings. Our examination of recordings from three different mobile devices supports the recent hypothesis [28] that mains-powered noise sources in proximity to the recording device can cause traces of the typical ENF fluctuations. Most importantly, our experiments indicate a very high robustness of these artifacts. 
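As a rough aid for reading the attenuation factors reported with Figure 8, the following minimal sketch converts them to decibels. It assumes that an attenuation factor a denotes the fraction of spectral amplitude removed by the soundproof box, so that a fraction 1 - a remains; the text does not spell out this definition. Pairing the printed factors with the harmonics in left-to-right order is likewise an assumption, and all names (HARMONICS_HZ, ATTENUATION, residual_amplitude, attenuation_db, summarize) are hypothetical.

```python
import math

# Attenuation factors as printed in the upper part of Figure 8. Pairing them
# with the harmonics in this order is an assumption of this sketch.
HARMONICS_HZ = [50, 100, 150, 200, 250, 300]
ATTENUATION = [0.985, 0.975, 0.984, 0.981, 0.979, 0.945]


def residual_amplitude(attenuation: float) -> float:
    """Fraction of the amplitude remaining inside the box (assumed to be 1 - a)."""
    if not 0.0 <= attenuation < 1.0:
        raise ValueError("attenuation factor must lie in [0, 1)")
    return 1.0 - attenuation


def attenuation_db(attenuation: float) -> float:
    """Amplitude loss in decibels; larger positive values mean stronger damping."""
    return -20.0 * math.log10(residual_amplitude(attenuation))


def summarize():
    """Return (harmonic in Hz, attenuation factor, loss in dB) for each harmonic."""
    return [
        (freq, factor, round(attenuation_db(factor), 1))
        for freq, factor in zip(HARMONICS_HZ, ATTENUATION)
    ]


if __name__ == "__main__":
    for freq, factor, loss_db in summarize():
        print(f"{freq:3d} Hz: factor {factor:.3f} -> about {loss_db:.1f} dB")
```

Under these assumptions, an attenuation factor of 0.99 corresponds to an amplitude loss of 40 dB, which gives a sense of how weak the acoustic signal inside the box is compared to the unshielded setting.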
Characteristic ENF traces were still detectable even after placing the recording devices in a dedicated soundproof box with approximately 99 % sound attenuation. Measurements in an indoor setting further suggest the presence of ENF artifacts multiple rooms apart from the then inaudible noise source.

Overall, our results re-emphasize the immense potential of ENF-based forensics for audio authentication purposes. In anticipation of increasingly better signal models, estimation methods, and detection techniques [24], [33], it seems relatively safe to predict a further maturing of the field. Yet these theory-driven advances can only bear fruit to the best possible extent in practical casework if they are backed by a clear and precise understanding of the genuine cause(s) of ENF artifacts in audio recordings. As current studies report indications based on few samples only (with our work being no exception), future work still has to conduct large-scale empirical experiments to infer how likely real-world audio recordings are to contain distinctive ENF artifacts (possibly conditional on certain environmental characteristics). Along these lines, the influence of different recording devices, software, and compression algorithms also needs to be considered. This indicates strong parallels to the field of digital image forensics, where these and related empirical questions already have a comparatively longer tradition [34], [35].

ACKNOWLEDGEMENTS

The authors thank dence GmbH (http://dence.de), Dresden, Germany, for measuring and providing the ENF reference data used in this study.

[Figure 9: three panels, a) distance: 5 m, b) distance: 10 m, c) distance: 20 m; frequency [Hz] over time [s] (0 to 600 s), showing the reference and the Galaxy S2 (freq. resolution: 0.06 Hz, time resolution: 15 s).]
Figure 9. STFT analysis of ENF traces in the 50 Hz band when the recording device (Samsung Galaxy S2) was placed outside the room of the noise source, a) 5 meters and 1 massive wall, b) 10 meters and 3 massive walls, and c) 20 meters and 5 massive walls apart. The noise source was activated after the first 300 seconds of recording.

REFERENCES

[1] Audio Engineering Society, “AES recommended practice for forensic purposes - Managing recorded audio materials intended for examination,” AES Standard AES27-1996 (r2007), 2007.
[2] S. Gupta, S. Cho, and C.-C. J. Kuo, “Current developments and future trends in audio authentication,” IEEE MultiMedia, vol. 19, no. 1, pp. 50–59, 2012.
[3] V. A. Balasubramaniyan, A. Poonawalla, M. Ahamad, M. T. Hunter, and P. Traynor, “Pindr0p: Using single-ended audio features to determine call provenance,” in ACM Conference on Computer and Communications Security. ACM Press, 2010, pp. 109–120.
[4] H. Malik and H. Farid, “Audio forensics from acoustic reverberation,” in IEEE International Conference on Acoustics, Speech and Signal Processing, 2010, pp. 1710–1713.
[5] N. Peters, H. Lei, and G. Friedland, “Name that room: Room identification using acoustic features in a recording,” in ACM International Conference on Multimedia. ACM Press, 2012, pp. 841–844.
[6] H. Malik, “Acoustic environment identification and its applications to audio forensics,” IEEE Transactions on Information Forensics and Security, vol. 8, no. 11, pp. 1827–1837, 2013.
[7] R. Buchholz, C. Kraetzer, and J. Dittmann, “Microphone classification using Fourier coefficients,” in Information Hiding, 11th International Workshop, ser. Lecture Notes in Computer Science, S. Katzenbeisser and A.-R. Sadeghi, Eds., vol. 5806. Springer, 2009, pp. 235–246.
[8] D. Garcia-Romero and C. Y.
Espy-Wilson, “Automatic acquisition device identification from speech recordings,” in IEEE International Conference on Acoustics, Speech and Signal Processing, 2010, pp. 1806–1809.
[9] C. Hanilçi and F. Ertas, “Optimizing acoustic features for source cell-phone recognition using speech signals,” in ACM Workshop on Information Hiding and Multimedia Security. ACM Press, 2013, pp. 141–148.
[10] R. Böhme and A. Westfeld, “Feature-based encoder classification of compressed audio streams,” Multimedia Systems, vol. 11, no. 2, pp. 108–120, 2005.
[11] D. Luo, W. Luo, R. Yang, and J. Huang, “Compression history identification for digital audio signal,” in IEEE International Conference on Acoustics, Speech and Signal Processing, 2012, pp. 1733–1736.
[12] F. Jenner and A. Kwasinski, “Highly accurate non-intrusive speech forensics for codec identifications from observed decoded signals,” in IEEE International Conference on Acoustics, Speech and Signal Processing, 2012, pp. 1737–1740.
[13] P. Rose, Forensic Speaker Identification. CRC Press, 2003.
[14] X. Pan, X. Zhang, and S. Lyu, “Detecting splicing in digital audios using local noise level estimation,” in IEEE International Conference on Acoustics, Speech and Signal Processing, 2012, pp. 1841–1844.
[15] R. Yang, Z. Qu, and J. Huang, “Exposing MP3 audio forgeries using frame offsets,” ACM Transactions on Multimedia Computing, Communications, and Applications, vol. 8, no. 2S, 2012.
[16] J. Chen, S. Xiang, W. Liu, and H. Huang, “Exposing digital audio forgeries in time domain by using singularity analysis with Wavelets,” in ACM Workshop on Information Hiding and Multimedia Security. ACM Press, 2013, pp. 149–158.
[17] T. Bianchi, A. D. Rosa, M. Fontani, G. Rocciolo, and A. Piva, “Detection and classification of double compressed MP3 audio tracks,” in ACM Workshop on Information Hiding and Multimedia Security. ACM Press, 2013, pp. 159–164.
[18] C. Grigoras, “Digital audio recording analysis: the electric network frequency (ENF) criterion,” Speech, Language and the Law, vol. 12, no. 1, pp. 63–76, 2005.
[19] M. Kajstura, A. Trawinska, and J. Hebenstreit, “Application of the electrical network frequency (ENF) criterion: A case of a digital recording,” Forensic Science International, vol. 155, no. 2–3, pp. 165–171, 2005.
[20] E. B. Brixen, “Techniques for the authentication of digital audio recordings,” in 122nd AES Convention, 2007.
[21] M. Huijbregtse and Z. J. Geradts, “Using the ENF criterion for determining the time of recording of short digital audio recordings,” in Computational Forensics, Third International Workshop, ser. Lecture Notes in Computer Science, Z. J. Geradts, K. Y. Franke, and C. J. Veenman, Eds., vol. 5718. Springer, 2009, pp. 116–124.
[22] A. J. Cooper, “An automated approach to the electric network frequency (ENF) criterion - theory and practice,” International Journal of Speech Language and the Law, vol. 16, no. 2, pp. 193–218, 2009.
[23] A. J. Cooper, “Further considerations for the analysis of ENF data for forensic audio and video applications,” International Journal of Speech Language and the Law, vol. 18, no. 1, pp. 99–120, 2011.
[24] R. Garg, A. L. Varna, and M. Wu, “Modeling and analysis of electric network frequency signal for timestamp verification,” in IEEE International Workshop on Information Forensics and Security, 2012, pp. 67–72.
[25] A. Hajj-Ahmad, R. Garg, and M. Wu, “ENF based location classification of sensor recordings,” in IEEE International Workshop on Information Forensics and Security, 2013.
[26] D. P. Nicolalde and J. A. Apolinário, Jr., “Evaluating digital audio authenticity with spectral distances and ENF phase change,” in IEEE International Conference on Acoustics, Speech, and Signal Processing, 2009, pp. 1417–1420.
[27] E. B. Brixen, “ENF; Quantification of the magnetic field,” in AES 32nd International Conference, 2008.
[28] J. Chai, F. Liu, Z. Yuan, R. W. Conners, and Y. Liu, “Source of ENF in battery-powered digital recordings,” in 135th AES Convention, 2013.
[29] Y. Liu, Z. Yuan, P. N. Markham, R. W. Conners, and Y. Liu, “Application of power system frequency for digital audio authentication,” IEEE Transactions on Power Delivery, vol. 27, no. 4, pp. 1820–1828, 2012.
[30] E. B. Brixen, “Further investigation into the ENF criterion for forensic authentication,” in 123rd AES Convention, 2007.
[31] R. Garg, A. L. Varna, and M. Wu, “Seeing ENF: Natural time stamp for digital video via optical sensing and signal processing,” in ACM International Conference on Multimedia. ACM Press, 2011, pp. 23–32.
[32] M. Garai, “Measurement of the sound-absorption coefficient in situ: The reflection method using periodic pseudo-random sequences of maximum length,” Applied Acoustics, vol. 39, no. 1–2, pp. 119–139, 1993.
[33] L. Fu, P. N. Markham, R. W. Conners, and Y. Liu, “An improved Discrete Fourier Transform-based algorithm for electric network frequency extraction,” IEEE Transactions on Information Forensics and Security, vol. 8, no. 7, pp. 1173–1181, 2013.
[34] R. Böhme, F. Freiling, T. Gloe, and M. Kirchner, “Multimedia forensics is not computer forensics,” in Computational Forensics, Third International Workshop, ser. Lecture Notes in Computer Science, Z. J. Geradts, K. Y. Franke, and C. J. Veenman, Eds., vol. 5718. Springer, 2009, pp. 90–103.
[35] H. T. Sencar and N. Memon, Eds., Digital Image Forensics: There is More to a Picture Than Meets the Eye. Springer, 2013.