The Humming Hum: Background Noise as a Carrier of ENF Artifacts

The Humming Hum:
Background Noise as a Carrier of ENF Artifacts in Mobile Device Audio Recordings
Niklas Fechner and Matthias Kirchner
Department of Information Systems, University of Münster, Germany
{nfech_01 | matthias.kirchner}@uni-muenster.de
Abstract
Audio forensics based on fluctuations in the electrical network frequency (ENF) has become one of the major approaches
for the authentication of digital audio recordings. Yet little is known about the circumstances and preconditions under which
battery-powered devices leave ENF artifacts in their recordings. Our study with multiple mobile recording devices confirms the
hypothesis that background noise, generated by mains-powered electronic devices in proximity to the recording device, is a
carrier of ENF artifacts. Experiments in an indoor setting suggest a very high robustness and indicate the presence of ENF
artifacts even multiple rooms apart from the noise source.
Keywords
digital audio forensics, electrical network frequency, mobile devices, background noise
I. INTRODUCTION
The Audio Engineering Society defines an authentic audio recording as [1, p. 6]
“a recording made simultaneously with the acoustic events it purports to have recorded, and in a manner fully and
completely consistent with the method of recording claimed by the party who produced the recording; a recording
free from unexplained artifacts, alterations, additions, deletions, or edits.”
The literature on digital audio forensics is rich in proposals to assess the various aspects of the above definition based on
inherent characteristics of questioned audio recordings [2]. Relevant questions of interest include, amongst others, inference
about the provenance [3] and the acoustic environment of a recording [4]–[6], the identification of recording devices [7]–[9]
or their audio codecs [10]–[12], forensic speaker identification [13], as well as the detection of processing artifacts [14]–[17].
Electrical network frequency (ENF) artifacts in digital audio recordings are among the most actively researched signal
characteristics in the context of forensic audio authentication. Grigoras [18] first reported that mains-powered devices are
likely to capture supposedly random deviations from the nominal network frequency (typically 50 Hz in Europe, and 60 Hz
in most parts of the Americas) in their recordings. Such fluctuations are inevitable because of the continuously changing and
relatively unpredictable generation-load (im)balance in the respective network. Multiple independent studies suggest that ENF
artifacts are stable across the whole network and distinctive enough to serve as a “natural fingerprint” of a recording [18]–[24,
amongst others]. One advantage of ENF-based forensics is its wide applicability to different aspects of audio authentication.
Because of their stochastic character, ENF traces can help to determine or to verify the time of a recording, if reference data
of the purported source network and time frame is available. Yet ENF fluctuations also vary considerably across different
networks. Aggregated statistical measures, extracted from a questioned recording only, can be sufficient to determine the
source network [25]. Audio editing is also detectable by means of ENF-based forensics. Missing or misplaced frequency
traces may indicate processed parts of an audio file [18], and so do abrupt phase changes in the ENF frequency band [26].
Early on, ENF artifacts have also been reported to be present in audio recordings acquired with battery-powered devices,
i. e., recorders that are not directly connected to the mains grid [19], [20]. This observation is particularly relevant for practical
casework, considering the omnipresence of mobile audio recording devices (e. g., smartphones) nowadays. Yet surprisingly
little has been known about the exact circumstances that would explain when and to what degree mobile device audio signals
carry ENF artifacts. Early beliefs that the presence of a strong enough electromagnetic field would be the main precondition
were questioned soon after [27]. This paper sets out to explore another possible scenario in more detail. Specifically, Chai
et al. [28] suggested very recently that (audible) background noise is a carrier of ENF artifacts, if the noise source is a
mains-powered electronic device. We follow this trail equipped with a more sophisticated experimental setup than the original
study, which will allow us to learn more about the strength and robustness of ENF artifacts in mobile device recordings.
Hence, we expect our findings to contribute to a better understanding and assessment of practical forensic analyses.
The remainder of this paper is organized as follows. The following Sect. II reviews the status quo in mobile device
ENF-based forensics. Section III details our experimental setup, before Sect. IV discusses our observations and results.
Section V concludes the paper.
II. MOBILE DEVICE ENF-FORENSICS
It is now a widely-accepted assumption that digital audio data from mains-powered recording devices carries traces of
ENF fluctuations with high probability [18]–[24]. Depending on the network, typical deviations from the nominal network
frequency are in the range of ± 50 mHz to ± 200 mHz, with a distribution that generally appears to resemble a Gaussian
fairly well [22], [29]. ENF artifacts can be extracted from audio recordings at low signal-to-noise ratios through the use
of a band-pass filter around the nominal network frequency, typically combined with spectral methods or the analysis of
zero-crossings. ENF-related artifacts may also exist at higher-order harmonics of the nominal frequency [30]. In general,
the strength and presence of ENF artifacts is believed to depend on the recording device’s internal circuitry design and
electromagnetic compatibility (EMC) characteristics [27].
As for battery-powered mobile devices, the situation is more complex. Grigoras [18] alluded to the possibility that a
direct connection to the mains grid is not necessarily a precondition for capturing ENF artifacts, as long as the recording
is made close enough to other mains-powered equipment. Experiments by Kajstura et al. [19] confirmed this conjecture,
but Brixen [20] observed that only certain recording devices capture ENF artifacts while others do not. According to
Brixen’s study, electric cables in the recording devices’ proximity were not sufficient. Electromagnetic fields of varying
strength were generally ruled out as the source of ENF artifacts in recordings from devices with electret microphones (as
opposed to positive results for dynamic microphones). These observations are of particular practical relevance, because
electret microphones are the most common type of microphone in typical consumer devices.
Chai et al. [28] very recently confirmed Brixen’s results in that recorders with electret microphones do not capture ENF
artifacts through electrical or magnetic fields. Instead, the authors suggested that acoustic mains hum is a carrier of ENF
traces. In general, every kind of background noise generated by mains-powered devices in proximity to the recording device
might indirectly transfer network frequency fluctuations, very similar to flickering fluorescent light sources in indoor video
recordings [31]. Chai et al. [28] used an air conditioning fan as noise source in their experiments. ENF artifacts were present
when the fan was switched on, but no obvious characteristic traces could be found when the fan was not running. When the
recording device was wrapped with soundproof material, the strength of ENF artifacts decreased.
Several additional factors can impact the strength and the presence of ENF artifacts in mobile device recordings. Due to
technical restrictions or device-internal digital signal processing, mobile digital recorders may not be able to capture the signal.
Specifically, many consumer devices are inherently incapable of recording very low frequencies around the fundamental
50 / 60 Hz band. Relatively strong compression (e. g., Adaptive Multi-Rate compression, AMR) can also limit the available
frequency range. Hence, ENF artifacts at higher-order harmonics of the fundamental network frequency are generally also of
particular interest in forensic analyses of mobile device recordings.
III. EXPERIMENTS WITH AUDIBLE HUM
Mains hum in audio systems is one of the most well-known network-induced acoustic noise sources. It can occur because
of grounding imbalances, or it may be transferred magnetically or capacitively. The oftentimes hard-to-eliminate humming
sound can be clearly audible. In our Münster-based study (nominal network frequency in Germany: 50 Hz), we use it as a
prototypical background noise source to investigate to what extent acoustic signals carry artifacts due to ENF fluctuations and
how well such artifacts are captured by mobile recording devices in relatively close proximity. Other potential (indoor) noise
sources are fans [28], power adapters, lights, fridges, amongst many others.
A. Recording Devices
We use three different battery-powered recording devices in our experiments: a Samsung Galaxy S2 with Android 4.2
installed, an iPhone 4s with iOS 6 installed, as well as a Tascam R-05 digital recorder. All three devices are equipped with
electret microphones. We expect that the smartphones (Galaxy S2 and iPhone 4s) use microphones of lower quality than the
digital recorder. Typical frequency responses of electret microphones may well extend beyond a 20 Hz to 20 kHz interval, yet
noise reduction efforts inside the recording device usually impose a considerably narrower frequency response. We use a
third-party application to circumvent an overly aggressive low-frequency cut-off in the iPhone, as older versions of Apple’s iOS
attenuate frequency components below 250 Hz by default.1
Figure 1 gives an overview of the recording characteristics of the test devices. The spectrograms (Hanning window length:
16,000 samples, overlap: 70 samples) were obtained from uncompressed and unprocessed WAV PCM format recordings of a
10 second probe chirp with a linearly increasing signal frequency in the range [20, 200] Hz. For comparison, we also include
the spectrogram for a Sennheiser PC 360 headset, which was connected to a mains-powered computer during the recording.
The spectrograms indicate that the two smartphones in our test set capture low-frequency components only to a limited
degree. Spectral components around and below 50 Hz are subject to considerable damping. The Tascam digital recorder is,
as expected, more sensitive. Yet all mobile device recordings are generally in stark contrast to the Sennheiser PC 360 data,
which not only contains the full frequency range but also has strong spectral lines at 50 Hz and 150 Hz throughout the whole
recording. While the more uniform frequency response suggests the absence of strong post-processing, the two additional
frequency components indicate the to-be-expected presence of ENF artifacts (also at higher-order harmonics) in the recording.
Figure 1. Spectrograms of unprocessed 10-second audio recordings (linear chirp probe signal) from mobile devices in our test set. A corresponding spectrogram for a Sennheiser PC 360 headset, connected to a mains-powered computer, is displayed for reference. The artifacts in the iPhone spectrogram after about 6.5 seconds are due to clipping. [Panels: Samsung Galaxy S2 (Android 4.2), iPhone 4s (iOS 6), Tascam R-05, Sennheiser PC 360; axes: time [s] vs. frequency [Hz].]
1 http://blog.faberacoustical.com/2012/ios/iphone/finally-ios-6-kills-the-filter-on-headset-and-mic-inputs
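The probe signal and spectrogram settings described above can be sketched as follows. This is a minimal illustration, not the code used in the study: the window length (16,000 samples), overlap (70 samples) and chirp range ([20, 200] Hz over 10 seconds) are taken from the text, whereas the sampling rate of 8 kHz and the function names `probe_chirp` and `chirp_spectrogram` are assumptions made for this example.

```python
import numpy as np
from scipy.signal import chirp, spectrogram


def probe_chirp(fs=8000, duration=10.0, f0=20.0, f1=200.0):
    """Linear chirp from f0 to f1 Hz; fs = 8 kHz is an assumption."""
    t = np.arange(int(fs * duration)) / fs
    return t, chirp(t, f0=f0, t1=duration, f1=f1, method="linear")


def chirp_spectrogram(x, fs=8000, nperseg=16000, noverlap=70):
    """Hann-window spectrogram, restricted to the 0-200 Hz range of interest."""
    f, t, sxx = spectrogram(x, fs=fs, window="hann",
                            nperseg=nperseg, noverlap=noverlap)
    keep = f <= 200.0
    return f[keep], t, sxx[keep]


if __name__ == "__main__":
    _, x = probe_chirp()
    f, t, sxx = chirp_spectrogram(x)
    # Dominant frequency per time frame; it should rise along the chirp.
    print(f[np.argmax(sxx, axis=0)])
```

Replacing `x` with a recording of the played-back chirp would show which parts of the sweep a device actually captures, which is the comparison Figure 1 makes.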
B. Noise Source
We generate audible mains hum by connecting a dual RCA audio cable to the audio input of a typical active subwoofer
system (Teufel Concept M series), located indoors, inside a typical German apartment. The cable has a length of 2 meters.
One end of the cable is left unconnected. When the subwoofer is turned on, a humming sound is clearly noticeable from the
speaker. Electronic devices in the same room other than the subwoofer and the battery-powered audio recorders are switched off
during the experiments.
C. Soundproof Box
Chai et al. [28] wrapped recording devices with soundproof material to isolate the impact of the acoustic noise source on
ENF artifacts in audio recordings. Experiments with their construction—a cylinder built from four layers of recycled rubber
material—indicated weakened ENF artifacts as compared to recordings without soundproofing. We follow this approach in
our experiments, but replace the hands-on construction with a more sophisticated soundproof box. Ideally, any attempt to
verify that ENF characteristics are transmitted over the acoustic channel would require a complete absorption across the
whole frequency range. Practically, this requirement is very hard to meet with feasible efforts and consumer-grade materials.
As a compromise, we carefully designed our box to provide a good trade-off between effectiveness, complexity and usability.
Figure 2. Design and structure of the soundproof box used in our experiments.
Specifically, we use both porous absorbers and resonant absorbers. The former generally have higher sound absorption
coefficients than resonant absorbers, but lose efficiency towards lower frequencies. Resonant absorbers are thus also necessary
to cover typical frequency ranges of ENF artifacts and their harmonics. The final construction is depicted in Fig. 2. It consists
of two nested boxes, providing the necessary cavity to absorb low-frequency components. The boxes themselves are built from
medium-density fibreboards (MDF), a material often used for loudspeaker design. Penetrating sound waves let the MDF
material resonate, which transforms parts of the sound energy into kinetic energy. All MDF joints are sealed with silicone; the sealing
is necessary for good absorption properties. The removable lid is equipped with a special window rubber seal construction.
The cavity between both boxes is filled with fine sand, a porous absorber with excellent sound absorption properties. The
inside of the inner box is further lined with a thick layer of solid, densely porous foam.
Figure 3 visualizes the empirical sound attenuation characteristics of our soundproof box, as we determined it based on
the sound energy of various audio recordings of a linear chirp signal from devices inside (transmitted energy) and outside
(incident energy) the box.2 An attenuation coefficient of 1 indicates the total attenuation of a specific frequency component.
The gray graph depicts the absorption of spectral components for one particular recording with the Galaxy S2 mobile phone.
The red graph corresponds to the average over all test recordings. The measurements reflect the challenges of achieving a
complete sound absorption in particular for lower frequencies. Overall, however, the soundproof box exhibits very good
characteristics also for ENF-related frequency ranges.
2 Note that we do not measure the exact sound absorption coefficient, i. e., the ratio of absorbed sound energy and incident energy, of the box. So-called
in situ measurements are a complex process—impaired by reflective distortions and background noise [32]—and beyond the scope of our study.
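One possible reading of the empirical attenuation measure is sketched below. The text only states that it is derived from the sound energy of recordings inside (transmitted) and outside (incident) the box and that a value of 1 means total attenuation. The formula 1 - E_inside / E_outside per frequency bin, the use of Welch power spectra, and the name `attenuation_factor` are assumptions made for this example; the exact computation used in the study may differ.

```python
import numpy as np
from scipy.signal import welch


def attenuation_factor(inside, outside, fs, nperseg=4096):
    """Per-frequency attenuation, assumed here as 1 - P_inside / P_outside.

    `inside` and `outside` are recordings of the same probe signal made
    inside and outside the box. A value of 1 means total attenuation.
    """
    f, p_in = welch(inside, fs=fs, nperseg=nperseg)
    _, p_out = welch(outside, fs=fs, nperseg=nperseg)
    # Guard against division by zero in empty frequency bins.
    p_out = np.maximum(p_out, np.finfo(float).tiny)
    return f, 1.0 - p_in / p_out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    outside = rng.standard_normal(80000)
    inside = 0.1 * outside          # amplitude reduced to 10 %
    f, a = attenuation_factor(inside, outside, fs=8000)
    print(a[:5])                    # energy ratio 0.01, so about 0.99
```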
Figure 3. Sound attenuation characteristics of the soundproof box. Single measurement (gray) and average over multiple audio recordings (red) of linear chirp signals with devices inside and outside the box. [Axes: frequency [Hz] vs. attenuation factor.]
D. Analysis of ENF Artifacts
Similar to most studies in the literature, we analyze ENF artifacts in the short-time Fourier transform (STFT) domain [22].
A number of pre-processing steps are necessary to keep the computational load tractable and to ensure a high signal-to-noise
ratio. Specifically, we first downsample the audio recordings to sampling rates of 300 Hz, 600 Hz or 900 Hz, conditional
on the order of the harmonic of interest (first, second, or third, respectively). A fourth-order Butterworth low-pass filter is
used for anti-aliasing. The downsampled signals are then fed into a bandpass filter (fourth-order Butterworth, ± 0.5 Hz) to
extract the frequency range of interest. As we found that there is no single best setting for STFT temporal and spectral
resolutions, we report different combinations by varying both the length and the overlap of analysis windows.
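The pre-processing and STFT analysis of Sect. III-D can be sketched as follows. This is an illustrative sketch under several assumptions, not the implementation used in the study: zero-phase filtering (`sosfiltfilt`), polyphase resampling (`resample_poly`, which applies its own additional FIR filter), the low-pass cutoff at 0.45 times the target rate, the default window and hop lengths, and the parabolic peak interpolation are all choices made for this example, as is the function name `extract_enf`. Note also that SciPy's `butter(4, ...)` with `btype="bandpass"` yields a filter of overall order 8, so "fourth-order" may map to a different argument depending on the convention.

```python
from fractions import Fraction

import numpy as np
from scipy.signal import butter, resample_poly, sosfiltfilt, stft


def extract_enf(x, fs, harmonic=1, nominal=50.0,
                win_s=8.0, hop_s=1.0, half_band=0.5):
    """Track the dominant frequency near harmonic * nominal over time."""
    fs_ds = 300 * harmonic            # 300 / 600 / 900 Hz as in the text
    f0 = nominal * harmonic

    # Anti-aliasing low-pass (Butterworth), then downsampling.
    sos_lp = butter(4, 0.45 * fs_ds, btype="low", fs=fs, output="sos")
    ratio = Fraction(fs_ds, int(fs))
    y = resample_poly(sosfiltfilt(sos_lp, x),
                      ratio.numerator, ratio.denominator)

    # Narrow band-pass of +/- half_band Hz around the harmonic.
    sos_bp = butter(4, [f0 - half_band, f0 + half_band],
                    btype="bandpass", fs=fs_ds, output="sos")
    y = sosfiltfilt(sos_bp, y)

    # STFT; window length and hop set spectral and temporal resolution.
    nperseg = int(round(win_s * fs_ds))
    hop = int(round(hop_s * fs_ds))
    f, t, z = stft(y, fs=fs_ds, window="hann",
                   nperseg=nperseg, noverlap=nperseg - hop)
    mag = np.abs(z)

    # Strongest bin inside the band, per time frame.
    band = np.flatnonzero((f >= f0 - half_band) & (f <= f0 + half_band))
    k = band[np.argmax(mag[band], axis=0)]

    # Parabolic interpolation on log-magnitudes for sub-bin precision.
    cols = np.arange(mag.shape[1])
    eps = np.finfo(float).tiny
    a, b, c = (np.log(mag[k + d, cols] + eps) for d in (-1, 0, 1))
    denom = a - 2.0 * b + c
    delta = np.divide(0.5 * (a - c), denom,
                      out=np.zeros_like(denom), where=denom != 0)
    delta = np.clip(delta, -0.5, 0.5)
    return t, (k + delta) * (f[1] - f[0])
```

Calling `extract_enf(x, 8000, harmonic=3)` would analyze the 150 Hz band at a sampling rate of 900 Hz. Longer windows (`win_s`) improve the spectral resolution at the cost of temporal resolution, which is the trade-off varied across the reported settings.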
E. Reference Data
We use reference ENF signals for comparison with the measurements from the mobile device audio recordings. The reference
data was recorded independently in Dresden, Germany, approximately 500 km away from the location of our experiments in
Münster. The ENF was measured directly off the grid. A step-down circuitry and a Schmitt-trigger convert the raw ENF data
to a continuous square wave signal. The elapsed time between the zero crossings is measured with the 100 MHz clock of a
BeagleBone Black board.3 Although the measurement does not include any time adjustments per GPS, the results have a
high accuracy with deviations in the low mHz-range.
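The conversion from clock ticks between zero crossings to a frequency estimate can be illustrated with a short sketch. Only the 100 MHz clock rate is taken from the text. Whether an interval spans a half or a full period is not stated, so `crossings_per_cycle=2` (every sign change counted) is an assumption, as are the function names and the per-second averaging scheme.

```python
import numpy as np

CLOCK_HZ = 100_000_000  # 100 MHz timer clock, as stated in the text


def ticks_to_frequency(ticks, clock_hz=CLOCK_HZ, crossings_per_cycle=2):
    """Instantaneous frequency from tick counts between zero crossings."""
    ticks = np.asarray(ticks, dtype=float)
    return clock_hz / (crossings_per_cycle * ticks)


def per_second_average(ticks, clock_hz=CLOCK_HZ, crossings_per_cycle=2):
    """Average frequency per elapsed second of measurement time."""
    ticks = np.asarray(ticks, dtype=np.int64)
    # Index of the second in which each interval ends.
    sec = (np.cumsum(ticks) - 1) // clock_hz
    count = np.bincount(sec)
    dur = np.bincount(sec, weights=ticks) / clock_hz
    valid = count > 0
    return count[valid] / (crossings_per_cycle * dur[valid])


if __name__ == "__main__":
    # 1,000,000 ticks per half period correspond to exactly 50 Hz.
    print(ticks_to_frequency([1_000_000, 999_600]))
    print(per_second_average([1_000_000] * 300))
```

At 50 Hz, a single tick of a half-period interval changes the estimate by about 50 µHz, which is consistent with the stated accuracy in the low mHz range.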
IV. EXPERIMENTAL RESULTS
Based on the setup described in the previous section, we conducted a number of experiments to better understand how
acoustic signals transfer ENF artifacts into mobile device audio recordings. If not stated otherwise, all devices were set to
store uncompressed PCM audio data at 8 kHz / 16 bit. We report our main findings in the following.
A. Baseline Results
In a first baseline experiment, we recorded ten minutes of background audio with each mobile device in our test set,
with the noise source switched either on or off. The recording devices were placed at a distance of approximately 30 cm
from the noise source. We recorded with all devices simultaneously. Figure 4 depicts the amplitudes of the resulting frequency
spectra. The right panel of the figure corresponds to recordings with mains-generated hum. Strong spectral components around
multiples of the fundamental network frequency (50 Hz) are clearly noticeable for all three devices. Such artifacts are missing
in the corresponding hum-free spectra on the left, indicating that the humming background noise might indeed carry traces
of ENF fluctuations.
A closer inspection confirms that the suspicious frequency components of audio signals with audible hum are with high
certainty characteristic ENF artifacts. Figure 5 reports the extracted frequency signals for the 50 Hz band at different STFT
temporal and spectral resolutions. Each panel also displays the synced reference signal, represented with a temporal resolution
of one second. Observe that all extracted signals closely resemble the reference data, although certain STFT parameter settings
appear to yield more accurate estimates. Independent of all tested STFT settings, none of the hum-free audio recordings
revealed ENF-like signal traces (not depicted here for the sake of brevity). The situation is generally similar for the analysis
of higher-order harmonics. Figure 6 depicts exemplary results for the 100 Hz (left panel) and 150 Hz band (right panel),
respectively. In accordance with prior work [28], the 150 Hz band seems more suitable to accurately measure ENF artifacts.
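The comparison with the reference data is visual in this paper. For readers who want a numerical counterpart, the following sketch shows one simple way to quantify the resemblance between an extracted frequency trace and a time-aligned reference sampled once per second. The choice of Pearson correlation and root-mean-square deviation, and the name `compare_to_reference`, are assumptions made for this example and are not part of the reported analysis.

```python
import numpy as np


def compare_to_reference(t_est, f_est, f_ref, ref_step_s=1.0):
    """Correlation and RMS deviation between estimate and reference.

    t_est, f_est: time stamps [s] and frequencies [Hz] of the estimate.
    f_ref: reference frequencies on a regular grid starting at t = 0,
           assumed to be already synchronized with the recording.
    """
    t_ref = np.arange(len(f_ref)) * ref_step_s
    # Evaluate the reference at the time stamps of the estimate.
    ref_at_est = np.interp(t_est, t_ref, f_ref)
    rmse = float(np.sqrt(np.mean((f_est - ref_at_est) ** 2)))
    corr = float(np.corrcoef(f_est, ref_at_est)[0, 1])
    return corr, rmse


if __name__ == "__main__":
    t_ref = np.arange(600.0)
    f_ref = 50.0 + 0.02 * np.sin(2 * np.pi * t_ref / 120.0)
    t_est = np.arange(0.0, 600.0, 3.0)       # 3 s temporal resolution
    f_est = np.interp(t_est, t_ref, f_ref) + 0.001
    print(compare_to_reference(t_est, f_est, f_ref))
```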
3 http://beagleboard.org/Products/BeagleBone
Figure 4. Amplitudes of frequency spectra of indoor test recordings with mains-powered noise source switched off (left) and on (right). [Panels: noise source switched off, noise source switched on; curves: iPhone 4s, Galaxy S2, Tascam R-05; axes: frequency [Hz] vs. spectral amplitude.]
B. Robustness of the Acoustic Channel
The above observations suggest that mobile device audio recordings in close proximity to our noise source contain
characteristic traces of the ENF. Because all recording devices are equipped with electret microphones, we suspect that these
artifacts are not transferred through electric or magnetic fields [20], [28]. For a better understanding of the acoustic channel,
we repeated the experiments with recording devices placed inside the soundproof box (cf. Sect. III-C). After sealing the box,
we consecutively recorded five minutes of audio data with the noise source switched off, before it was switched on for
another five minutes. The devices inside the box were approximately 40 cm apart from the noise source. Figure 7 displays
the extracted frequency signals in the 50 Hz band. The graphs for all three recording devices clearly reflect the two different
phases. ENF artifacts seem to be only present when the noise source was activated (starting at second 300), while the first
halves of the recordings do not exhibit clear signs of network characteristics. Overall, the ENF estimates appear much less
accurate than before in Fig. 5. This is an effect of the soundproof box and its strong sound attenuation (cf. Fig. 3).
Figure 8 gives a more detailed account of the impact of soundproofing with respect to relevant harmonics of the fundamental
network frequency. The left panel of the figure compares the harmonics’ spectral amplitudes in recordings inside and outside
the box on a normalized logarithmic scale. The right panel displays the corresponding sound attenuation coefficients per
harmonic. As before, a coefficient of 1 denotes the full suppression of the respective harmonic frequency. The graphs indicate
that soundproofing reduces the amplitude of the first harmonic (50 Hz) to only about 1.5 % of the benchmark outside the
box. The energy of the higher-order harmonics is generally lower, and it is further decreased inside the box. What is most
interesting here (and of immediate practical relevance) is the robustness of characteristic ENF traces, which survive even
severe damping of the carrier signal.
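The relation between the reported attenuation coefficient and the remaining amplitude can be made explicit with a small worked example. It assumes that the coefficient equals one minus the ratio of the amplitudes inside and outside the box, which is consistent with the pairing of about 1.5 % remaining amplitude and a coefficient of 0.985 at 50 Hz. The conversion to decibels and the function names are additions made for this illustration.

```python
import math


def remaining_amplitude(attenuation):
    """Fraction of the outside amplitude that is still measured inside."""
    return 1.0 - attenuation


def attenuation_db(attenuation):
    """Amplitude reduction in decibels (negative values mean damping)."""
    return 20.0 * math.log10(remaining_amplitude(attenuation))


if __name__ == "__main__":
    a_50hz = 0.985                            # value reported for 50 Hz
    print(remaining_amplitude(a_50hz))        # about 0.015, i.e. 1.5 %
    print(round(attenuation_db(a_50hz), 1))   # about -36.5 dB
```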
In a more close-to-real world scenario, we further recorded ten minutes of audio data in other rooms inside and outside
the same apartment. The setup was the same as before, i. e., the noise source was switched off during the first five minutes
of the recording, and then activated for another five minutes. The hum was not audible outside the room of the noise source.
Figure 9 depicts the extracted ENF signals in the 50 Hz band, along with the synced reference data. Note that the distance
of the recording device (the Samsung Galaxy S2) to the noise source is increasing from panels a) to c). The first setting
corresponds to a directly neighboring room, separated from the noise source by a massive brick wall. More recordings were
made two rooms apart (three walls), and in the hallway outside the apartment (five walls). Despite the additional distances,
the results generally resemble our findings from the earlier experiments. No meaningful ENF artifacts are noticeable without
the generated background noise. However, Figures 9 a) and b) clearly suggest the presence of characteristic ENF traces even
in rooms that are up to 10 meters apart from the activated noise source. Here, the spectral energy of the extracted signals is
comparable to the measurements inside the soundproof box. Only the last case turned out to be too challenging, cf. Fig. 9 c).
A distance of 20 meters and five massive brick walls introduced too much damping and additional noise on the channel to
extract an ENF-like signal from the second half of the audio recording. Overall, however, the results from this series of
experiments seem very encouraging with respect to practical applications and confirm the high robustness of ENF artifacts in
acoustic signals.
reference
50.02
50.02
Galaxy S2
50.00
50.00
49.98
49.98
Tascam R-05
freq. resolution: 0.25 Hz, time resolution: 3 s
freq. resolution: 0.25 Hz, time resolution: 3 s
50.02
50.02
frequency[Hz]
[Hz]
frequency
frequency[Hz]
[Hz]
frequency
iPhone 4s
freq. resolution: 0.5 Hz, time resolution: 1 s
freq. resolution: 0.5 Hz, time resolution: 1 s
49.96
49.96
50.00
50.00
49.98
49.98
49.96
49.96
00
60
60
120 180
180 240
240 300
300 360
360 420
420 480
480 540
540 600
600
120
00
60
60
time [s]
[s]
time
50.00
50.00
49.98
49.98
freq. resolution: 0.03 Hz, time resolution: 29 s
freq. resolution: 0.03 Hz, time resolution: 29 s
50.02
50.02
frequency[Hz]
[Hz]
frequency
frequency[Hz]
[Hz]
frequency
time [s]
[s]
time
freq. resolution: 0.12 Hz, time resolution: 7 s
freq. resolution: 0.12 Hz, time resolution: 7 s
50.02
50.02
120 180
180 240
240 300
300 360
360 420
420 480
480 540
540 600
600
120
50.00
50.00
49.98
49.98
49.96
49.96
49.96
49.96
00
60
60
120
120 180
180 240
240 300
300 360
360 420
420 480
480 540
540 600
600
time
time [s]
[s]
00
60
60
120
120 180
180 240
240 300
300 360
360 420
420 480
480 540
540 600
600
time
time [s]
[s]
5. STFT
analysis
of traces
ENF traces
Hz for
band
for varying
temporal
and spectral
resolutions
(noise
source
activated).
FigureFigure
1. STFT
analysis
of ENF
in the in
50the
Hz 50
band
varying
temporal
and spectral
resolutions
when the
noise
source
was activated.
[Figure 6: two spectrogram panels showing frequency [Hz] versus time [s] (0 to 600), left around 100 Hz (99.90 to 100.05) and right around 150 Hz (149.80 to 150.10). Legend: reference, iPhone 4s, Galaxy S2, Tascam R-05. Freq. resolution: 0.1 Hz, time resolution: 9 s.]
Figure 6. STFT analysis of ENF traces in the 100 Hz (left) and 150 Hz (right) band (noise source activated).
[Figure 7: frequency [Hz] (49.90 to 50.05) versus time [s] (0 to 600). Legend: reference, iPhone 4s, Galaxy S2, Tascam R-05. Freq. resolution: 0.06 Hz, time resolution: 15 s.]
Figure 7. STFT analysis of ENF traces in the 50 Hz band when the recording devices were placed inside the soundproof box. The noise source was
activated after the first 300 seconds of recording.
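One way to put a number on the visual comparison with the reference is to correlate an extracted trace with the reference separately before and after the noise source is switched on. The sketch below is an illustration only and not a procedure described in this paper. The function names and the use of the Pearson correlation coefficient are assumptions, and both input sequences are assumed to be sampled on the same time grid.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_before_after(trace, reference, switch_index):
    """Correlate trace and reference on either side of switch_index,
    the frame at which the noise source is activated (hypothetical helper)."""
    before = pearson(trace[:switch_index], reference[:switch_index])
    after = pearson(trace[switch_index:], reference[switch_index:])
    return before, after
```

If the noise source carries ENF artifacts, one would expect the second value to be clearly higher than the first. With a time resolution of 15 s and activation after 300 s, `switch_index` would be roughly 20.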
[Figure 8: normalized spectral amplitude (logarithmic scale, 0.0001 to 1) per harmonic [Hz] at 50, 100, 150, 200, 250, and 300 Hz, for recordings outside the box and inside the box. Attenuation values printed in the upper part of the figure, in extraction order: 0.985, 0.975, 0.984, 0.981, 0.979, 0.945.]
Figure 8. Attenuation of the ENF harmonics when the recording device (Samsung Galaxy S2) was placed inside the soundproof box. Spectral amplitudes
from recordings inside and outside the box, normalized to the amplitude of the 50 Hz harmonic without soundproofing and depicted on a logarithmic scale.
Corresponding attenuation factors are reported in the upper part of the figure.
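For orientation, the arithmetic behind such attenuation factors can be sketched as follows, under the assumption (not stated explicitly in the caption) that the factor for a harmonic is defined as one minus the ratio of the amplitude inside the box to the amplitude outside the box. The function names are hypothetical, and the amplitude values are made up for illustration; they are not measurements from this study.

```python
import math


def attenuation_factors(outside, inside):
    """Return {harmonic_hz: 1 - inside/outside} (assumed definition)."""
    return {h: 1.0 - inside[h] / outside[h] for h in outside}


def attenuation_db(factor):
    """Express an attenuation factor as a (negative) amplitude gain in dB."""
    return 20.0 * math.log10(1.0 - factor)


# Made-up spectral amplitudes, for illustration only.
outside = {50: 2.0, 100: 0.5}
inside = {50: 0.03, 100: 0.0125}
factors = attenuation_factors(outside, inside)
```

Under this definition, a factor of 0.99 corresponds to an amplitude ratio of 0.01, which is a drop of 40 dB.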
V. SUMMARY AND CONCLUDING REMARKS
This paper has presented an experimental investigation into the role of background noise as a carrier of ENF artifacts
in mobile device audio recordings. Our examination of recordings from three different mobile devices supports the recent
hypothesis [28] that mains-powered noise sources in proximity to the recording device can cause traces of the typical ENF
fluctuations. Most importantly, our experiments indicate a very high robustness of these artifacts. Characteristic ENF traces
were still detectable even after placing the recording devices in a dedicated soundproof box with approximately 99 % sound
attenuation. Measurements in an indoor setting further suggest the presence of ENF artifacts multiple rooms away from
the then-inaudible noise source. Overall, our results re-emphasize the immense potential of ENF-based forensics for audio
authentication purposes. In anticipation of increasingly better signal models, estimation methods and detection techniques
[24], [33], it seems relatively safe to predict a further maturing of the field. Yet these theory-driven advances can only bear
fruit in practical casework if they are backed by a clear and precise understanding of the genuine
cause(s) of ENF artifacts in audio recordings. As current studies report indications based on only a few samples (our
work being no exception), future work still has to conduct large-scale empirical experiments to infer how likely real-world
audio recordings are to contain distinctive ENF artifacts (possibly conditional on certain environmental characteristics). Along
these lines, the influence of different recording devices, software, and compression algorithms also needs to be considered.
This indicates strong parallels to the field of digital image forensics, where these and related empirical questions already have
a comparatively longer tradition [34], [35].
ACKNOWLEDGEMENTS
The authors thank dence GmbH (http://dence.de), Dresden, Germany, for measuring and providing the ENF reference data used in this study.
[Figure 9: three panels for a) distance: 5 m, b) distance: 10 m, and c) distance: 20 m, each showing frequency [Hz] (49.90 to 50.05) versus time [s] (0 to 600) for the reference and the Galaxy S2 recording (freq. resolution: 0.06 Hz, time resolution: 15 s).]
Figure 9. STFT analysis of ENF traces in the 50 Hz band when the recording device (Samsung Galaxy S2) was placed outside the room of the noise
source, a) 5 meters and 1 massive wall, b) 10 meters and 3 massive walls, and c) 20 meters and 5 massive walls apart. The noise source was activated after
the first 300 seconds of recording.
REFERENCES
[1] Audio Engineering Society, “AES recommended practice for forensic purposes — Managing recorded audio materials intended for
examination,” AES Standard AES27-1996 (r2007), 2007.
[2] S. Gupta, S. Cho, and C.-C. J. Kuo, “Current developments and future trends in audio authentication,” IEEE MultiMedia, vol. 19,
no. 1, pp. 50–59, 2012.
[3] V. A. Balasubramaniyan, A. Poonawalla, M. Ahamad, M. T. Hunter, and P. Traynor, “Pindr0p: Using single-ended audio features to
determine call provenance,” in ACM Conference on Computer and Communications Security. ACM Press, 2010, pp. 109–120.
[4] H. Malik and H. Farid, “Audio forensics from acoustic reverberation,” in IEEE International Conference on Acoustics, Speech and
Signal Processing, 2010, pp. 1710–1713.
[5] N. Peters, H. Lei, and G. Friedland, “Name that room: Room identification using acoustic features in a recording,” in ACM International
Conference on Multimedia. ACM Press, 2012, pp. 841–844.
[6] H. Malik, “Acoustic environment identification and its applications to audio forensics,” IEEE Transactions on Information Forensics
and Security, vol. 8, no. 11, pp. 1827–1837, 2013.
[7] R. Buchholz, C. Kraetzer, and J. Dittmann, “Microphone classification using Fourier coefficients,” in Information Hiding, 11th
International Workshop, ser. Lecture Notes in Computer Science, S. Katzenbeisser and A.-R. Sadeghi, Eds., vol. 5806. Springer,
2009, pp. 235–246.
[8] D. Garcia-Romero and C. Y. Espy-Wilson, “Automatic acquisition device identification from speech recordings,” in IEEE International
Conference on Acoustics, Speech and Signal Processing, 2010, pp. 1806–1809.
[9] C. Hanilçi and F. Ertas, “Optimizing acoustic features for source cell-phone recognition using speech signals,” in ACM Workshop on
Information Hiding and Multimedia Security. ACM Press, 2013, pp. 141–148.
[10] R. Böhme and A. Westfeld, “Feature-based encoder classification of compressed audio streams,” Multimedia Systems, vol. 11, no. 2,
pp. 108–120, 2005.
[11] D. Luo, W. Luo, R. Yang, and J. Huang, “Compression history identification for digital audio signal,” in IEEE International Conference
on Acoustics, Speech and Signal Processing, 2012, pp. 1733–1736.
[12] F. Jenner and A. Kwasinski, “Highly accurate non-intrusive speech forensics for codec identifications from observed decoded signals,”
in IEEE International Conference on Acoustics, Speech and Signal Processing, 2012, pp. 1737–1740.
[13] P. Rose, Forensic Speaker Identification. CRC Press, 2003.
[14] X. Pan, X. Zhang, and S. Lyu, “Detecting splicing in digital audios using local noise level estimation,” in IEEE International
Conference on Acoustics, Speech and Signal Processing, 2012, pp. 1841–1844.
[15] R. Yang, Z. Qu, and J. Huang, “Exposing MP3 audio forgeries using frame offsets,” ACM Transactions on Multimedia Computing,
Communications, and Applications, vol. 8, no. 2S, 2012.
[16] J. Chen, S. Xiang, W. Liu, and H. Huang, “Exposing digital audio forgeries in time domain by using singularity analysis with
Wavelets,” in ACM Workshop on Information Hiding and Multimedia Security. ACM Press, 2013, pp. 149–158.
[17] T. Bianchi, A. D. Rosa, M. Fontani, G. Rocciolo, and A. Piva, “Detection and classification of double compressed MP3 audio tracks,”
in ACM Workshop on Information Hiding and Multimedia Security. ACM Press, 2013, pp. 159–164.
[18] C. Grigoras, “Digital audio recording analysis: the electric network frequency (ENF) criterion,” Speech, Language and the Law,
vol. 12, no. 1, pp. 63–76, 2005.
[19] M. Kajstura, A. Trawinska, and J. Hebenstreit, “Application of the electrical network frequency (ENF) criterion: A case of a digital
recording,” Forensic Science International, vol. 155, no. 2–3, pp. 165–171, 2005.
[20] E. B. Brixen, “Techniques for the authentication of digital audio recordings,” in 122nd AES Convention, 2007.
[21] M. Huijbregtse and Z. J. Geradts, “Using the ENF criterion for determining the time of recording of short digital audio recordings,”
in Computational Forensics, Third International Workshop, ser. Lecture Notes in Computer Science, Z. J. Geradts, K. Y. Franke, and
C. J. Veenman, Eds., vol. 5718. Springer, 2009, pp. 116–124.
[22] A. J. Cooper, “An automated approach to the electric network frequency (ENF) criterion - theory and practice,” International Journal
of Speech Language and the Law, vol. 16, no. 2, pp. 193–218, 2009.
[23] A. J. Cooper, “Further considerations for the analysis of ENF data for forensic audio and video applications,” International Journal of Speech
Language and the Law, vol. 18, no. 1, pp. 99–120, 2011.
[24] R. Garg, A. L. Varna, and M. Wu, “Modeling and analysis of electric network frequency signal for timestamp verification,” in IEEE
International Workshop on Information Forensics and Security, 2012, pp. 67–72.
[25] A. Hajj-Ahmad, R. Garg, and M. Wu, “ENF based location classification of sensor recordings,” in IEEE International Workshop on
Information Forensics and Security, 2013.
[26] D. P. Nicolalde and J. A. Apolinário, Jr., “Evaluating digital audio authenticity with spectral distances and ENF phase change,” in
IEEE International Conference on Acoustics, Speech, and Signal Processing, 2009, pp. 1417–1420.
[27] E. B. Brixen, “ENF; Quantification of the magnetic field,” in AES 32nd International Conference, 2008.
[28] J. Chai, F. Liu, Z. Yuan, R. W. Conners, and Y. Liu, “Source of ENF in battery-powered digital recordings,” in 135th AES Convention,
2013.
[29] Y. Liu, Z. Yuan, P. N. Markham, R. W. Conners, and Y. Liu, “Application of power system frequency for digital audio authentication,”
IEEE Transactions on Power Delivery, vol. 27, no. 4, pp. 1820–1828, 2012.
[30] E. B. Brixen, “Further investigation into the ENF criterion for forensic authentication,” in 123rd AES Convention, 2007.
[31] R. Garg, A. L. Varna, and M. Wu, “Seeing ENF: Natural time stamp for digital video via optical sensing and signal processing,” in
ACM International Conference on Multimedia. ACM Press, 2011, pp. 23–32.
[32] M. Garai, “Measurement of the sound-absorption coefficient in situ: The reflection method using periodic pseudo-random sequences
of maximum length,” Applied Acoustics, vol. 39, no. 1–2, pp. 119–139, 1993.
[33] L. Fu, P. N. Markham, R. W. Conners, and Y. Liu, “An improved Discrete Fourier Transform-based algorithm for electric network
frequency extraction,” IEEE Transactions on Information Forensics and Security, vol. 8, no. 7, pp. 1173–1181, 2013.
[34] R. Böhme, F. Freiling, T. Gloe, and M. Kirchner, “Multimedia forensics is not computer forensics,” in Computational Forensics,
Third International Workshop, ser. Lecture Notes in Computer Science, Z. J. Geradts, K. Y. Franke, and C. J. Veenman, Eds., vol.
5718. Springer, 2009, pp. 90–103.
[35] H. T. Sencar and N. Memon, Eds., Digital Image Forensics: There is More to a Picture Than Meets the Eye. Springer, 2013.