Mining Knowledge in Noisy Audio Data Andnej CZYZEWSKI

From: KDD-96 Proceedings. Copyright © 1996, AAAI (www.aaai.org). All rights reserved.
Mining Knowledge in Noisy Audio Data
Andnej CZYZEWSKI
TechnicalUniversityof GdaW, Facultyof Electronics,SoundEngineeringDept.
Narutowicza11112;
80-952Gdausk,Poland.
andrzej@next.elka.pg.gda.pl
Abstract
This paper demonstratesa KDD methodapplied to
audio data analysis, particularly, it presents
possibilitieswhich result from replacing traditional
methodsof analysisand acousticsignal processing
by KDD algorithmswhen restoringaudio recordings
at&ctedby strongnoise.
Introduction
m-l-,,
C--II-CL2----a?------L-- .--L--,-.-z-A_
1yplGdl dppllCdUOIlS or cmnpurer
l~llnologlGs
10
audioacousticsonly rarely considerthe opportunitiesof
data processingwith the use of methodswhich stem
from KDD approach.What is essentialhere is the fact
that methods of analysis and signal processing
developedon the basisof speechacousticshavenot been
transferre4lrespectivelyso far to other relatedareas,e.g.
as an algorithm of intelligent analysisand processing
of
the musical audio signal. In the meantimethe area of
audio acoustics has an extensive demand for
applicationsof intelligent signalprocessing.
The paper demonstratesa KDD methodapplied
to audio data analysis,which was studiedat the Sound
Engineering Department of the Gdat+iskTechnical
University. Particularly, it presentspossibilitieswhich
result from replacing traditional methods of analysis
and acousticsignalprocessingby KDD algorithmswhen
restoringaudiorecordingsaRcted by strongnoise.
Previously, the parallel algorithm applied
to the removal of clicks has been tested(Czyzewski
1994, 1995a) and the rough set method applied to
noise suppressionin old audio recordings was tried
(Czyzewski1995b).A new conceptof perceptualcoding
allowing for noise reductionin old musical recordings
stemmed from a modification of KDD applications
investieated
zvzewski
199%
,d-+,
-_.. --- -.
. -,
p----r-nreviouslv
-.-- ---, bv
-, the author K’
1995c). Perceptual coding provides the way of
processingaudio signal in such a way that the portions
of signal which are perceptibleto human hearingsense
are to be encodedwhile the remainingportionsof signal
or noise are to be rejected.The algorithm processes
signalin subbandsof the frequencyscalecorresponding
220
KDD-96
to the critical bandsof hearing. The rough set method
was employedto building the knowledgebaseof signal
and distortionsin sucha way that it becomespossibleto
automaticallycontrol the maskingthresholdin order to
maintainthe noiseaffectingaudio signalsnot audibleto
listeners,
Detailsof the elaboratedand testedalgorithmswill be
presentedand results of their application discussed.
Some general conclusions concerning knowledge
acquisition of audio signal affected by noise and
distortionswill be added.Potentialtelecommunications
and multimediaappiicationswiii be quoted.
Rough set approach to mining signal and
noise data
The idea of using KDD approachto the removal
of continuous noise from old recordings uses the
perceptualcoding schemeenhancedby the intelligent
decisionalgorithmbasedon the rough set method. The
rule set usedfor the determinationof thresholdsfor the
selectionof eligible componentsof the signal is obtained
by learning from examples.Hence, the data mining
processallows one to discernbetweensignal and noise
portions of the audio material. Consequently,the
maskingthresholdlevel can be determinedfor eachdata
frame allowingone to makethe noiseinaudibleafter the
executionof the perceptualcodingprocedure.
Before the details of the elaboratedalgorithm are
presenteda brief introduction to the domains of rough
sets and of perceptualcoding of audio data will be
provided.
Basic concepts of the rough set theory
The Booleantraditional logic, which is employed
by computersfor general use, stems from clhntofs
formulationof the definition of a set and operationson
sets.However,as numerousexamplesprove, computers
which work on this basis,are not good enoughto solve
many practical problems which require automatic
inference,especiallyin situationswhen the data being
analyzedcarry a certain inaccuracyor irreproducibility
and when there is no possibility to create a precise
enoughmodel of the decisiveprocess.In such cases,
overcomingthe axioms of Cantor’sdefinition of sets
may turn out to be of purposeand very useN. This
situation obviously correspondsto the problems of
discerning between signal and noise componentsin
noisyaudiopatterns. Overcoming the iimitations
related to Cantor’sdefinition of the set is possibleby
:-P...i..,-,.l.eh
+l.n+
kn..a
+rr
Ipjllvllllg
U&F-“..:ram,...+
IGqyuu~,,~cur
LuaLCLLUG“a*
ret.l.,...ewan;a..
lAlulLualI~~
IUIVCL”
be strictly defined, that is of a set which is precisely
definedby its elements.By doing so it is possibleto
define the set based on its lower and upper
approximations.Sucha set, sinceit is not definedfully,
may includeelementswhich belongto it mauytimes.
JJppr
Apprownahon /
gouudary Region
\
Lower
ADDrOxdtiOn
UNIVERSE-U
APPROXIMATION
SPACE -AS
LOWERANDUPPERAPROXRdATIONS-LA&UP
ROUGH
------.luuJULT
SET Bt DEPENDENCIES
-IRS & D
- Ii
CORE-C
Fig. 1. Illustration of elementarynotions relatedto the
roughsettheory.
Overcomingthe traditional axioms applied in the
caseof rough setscausesthe logic basedon the rough
set theory to acquire completely new featureswhich
make it an extremely useful tool for solving many
problemswhich require an intelligent data analysis,
searchingfor hiddenrelationshipsbetweenthe data and
even making the right decisionsin situations when
incompleteandpartially contraryantecedents
exist.
The fundamentalsof this theory were announced
by Pawlak in the early 1980’s (pawlak, 1982). The
theorywasquickly adoptedby the scientificworld and is
now one of the fast developingmethodsof artificial
;~+dl;“c.~M
ulK.uqj~~~
mn-.lnl,
(rawmn,
1aa1\
r=lJ,.
I%*
111v
mn;n
lua.ul
+.wa.sno
LQIILW
.*1ntLarl
Inatbu
+rr
L”
defining a rough set are illustrated schematicallyin
Fig. 1. More detaileddefinitionsof theseterms can be
found in the literature(Slowinskied., 1992).
The knowledgeis representedin a systembased
on rough sets in a tabular form composedof the
following elements:
a finite setof objects,P - is a finite set of attributes,D is a specialdecisiveattributedeterminedby the expert,
P is a collectionof all conditionalattributesin P, vo,
representsthe area of decisiveattributes,F - is called
the knowledgefunction.
An inductivelearningsystembasedon rough set
.~ a-.-.-.consistsof the iearning componentfor the automauc
rule derivation from training samples and of the
inference system used for decision-making at the
recognitionstage.Rulesare expressionsin the following
format:
if <conditionl>n2~.
then <decision>.
. and~ndition d
Conditions are based on attributes. Attributes
shouldbe equalto concretevalues or shouldbelong to
certain ranges.The decisionalways use a single rule
matchingapproach.
The knowledgebasein rough set methodcan be
convenientlyrepresented
in the form of a decisiontable,
in which rows representob&cts and c0imns represent
attributes.In the last column decision attributes are
collected.The main task for the learning phase (rule
generationprocedure)is to find the minimum number
of maximum composedsets that cover the so called
positive region J%%s) providing one of the basic
notions of the rough set theory @ ‘awl& 1982).
Consequently,
two typesof rules are to be derivedfrom
the
UIV &.4cinn
UVIIY*“II
t-hlw
WVIV.
rclrtcain
W.-L
nrbc
md
nnccihlp
SWVY s4AI.s
yv”LuY’”
ndec
*UIIY.
.An
Y.
importantparameterof possiblerules, reflecting quality
of them is the roughmeasurek&s definedas follows:
pm=IXnYI
1’1
!U
where:X is the conceptand Y is the set of all
examplesdescribedby the rule.
Spatial, Text, & Multimedia
221
An additionalparameterwas proposedby the author
doting
one to optimize the rule generationprocess.
This parameterwas called the rule strength r and is
definedas follows:
where:c - numberof casesconformingthe rule,
n
p - the neutral point of the rough
measure.
The neutral point of the rough measure‘fl provides
one of someparametersof the rule generationsystemto
be set by its operator. This parameterallows one to
regulatethe influence of possible(uncertain)rules on
the processof decisionmaking.More detailsconcerning
the rough set conceptsare to be found in the rich
literature(Slowidskied., 1992;Wojcik, 1993).
Perceptual coding algorithm
Empirical data reveal that some signals are
inaudiblein the presenceof othersboth in time and in
frequencydomain. The level of the barely audiblepure
tone dependson the ditferenceof frequenciesof the
signal componentsand the masking tone, and on the
excitationlevel of the masker.Usually,this dependence
is presentedas the function of frequencyin a linear
scale or in a logarithmic scale (Zwicker&Zwicker
1991). It can be observed that the shape varies
.A.....%.%.”
T..
“:-:Cn,.nCLr
..F..-.,.....x..”
4,.
&v.,....m..-.bmlllt;E;~.
3lpulclaluy &uJluullj
LOncqllcmby
u1 rL.
WIE;
literature,someattemptsto calculatethat shapecan be
found (Krohkowski, 1996).
Another well-proven phenomenonis the fact that
humanauditorysystemanalysesspectrumrangeswhich
correspondto subbands,called critical bands. The
stimuli frequencieswithin a critical band are perceived
similarly and processedindependentlyfrom those of
other critical bands. Becauseof the need to use the
critical bandsscale,the Bark unit for this was defined,
which is of the width of one critical band. It can be
observed that Bark-to-Hertz dependencecan be
approximatedby the straightline: up to the 8th Bark in
the linear frequencyscaleand from the 8th Bark in the
logarithmic one. The above property was taken into
accountwhen convertingto Bark units in the eiaborated
algorithm. Moreover,it turns out that maskingcurves
assumealmostan uniform shapewhen calculatedversus
Bark units. One can noticethat the slopesof the curves
at the lower frequenciesare constant,whereasat the
higher ones,the slopedependson the excitationlevel of
the masker.
222
KDD-96
Basing on the above indications, the elaborated
perceptual
algorithm for the encoder performs the
- __
folIowingoperations:
Step 1. On the basis of the sampling frequency,
valuesof frequenciesfor consecutivespectrallines are
computed.Next, they are convertedto Bark units.
Step2. The continuousseriesof 16-bit samplesare
groupedinto blocks of the length of 512 samples.This
&.P
“I-
ie UOY~IIYU
ncrr~md
10
tn
L” lm
LN
1 t.r.mnrnmi~~
ss
w*n,p”“~oY
lustw~c.n
uwL,,YvII
hinh
1LL611
spectralresolution (long blocks) and time resolution
requirements(short blocks). Since at the borders of
blocks some audible distortions may occur, so called
‘block effects’, thus overlapping technique with the
lengthof the fold equalsto 32 wasapplied.
Step 3. Becauseof the Digital Fourier Transform
properties,blocks containing samples are windowed
before the processing is made. Rectangular-cosine
shapedwindow was used calculatedaccordingto the
following formula:
d&u(i) =
I
data - sin(: eG‘) ,
i e(O,L-1)
data,
i E(L,N-L-1)
A u
datacos($-$),
i E(N-&N-l)
(3)
where:
L - the sizeof the fold,
N - the lengthof the block.
Step4. The FFT procedureis executed.Amplitudes
and phases of spectral lines are calculated using
complex representations.Since these values are
symmetrical,only half of them is further processed.The
remainingpart is restoredin the decoder.
Step 5. The amplitudesof spectrallines are sorted
accordingto decreasingvaluesof amplitude.
Step 6. Simultaneous masking procedure is
executed.Spectrallines with amplitudevalue remaining
below the masking threshold are discarded.Masking
curves are representedby uniform curves. For the
practicaluse, shapesof thesecurvesare approximated
hv
“J
ctraioht
YU.a~..
lin~c
I...WY.
Thp
clone
L1.W Ya”p
nf
“a
the
line
UX” 11-w
at
u.
thP
ua”
lnw~r
1”..ll
frequenciesis set to 27 dB/Bark, whereasat the higher
ones,the slopeS (with minus sign) is expressedas:
S =22-0.2-I [dB/Bark]
(4)
where:
I - denotesthe ievei of the masker.
For eachnon-maskedspectralline, there is a mask
level computed,i.e. the excitation value of barely
audibletoneof the respectivefrequency.
Step7. Within every critical band for each spectral
line, the minimum mask level is chosen.This approach
ensures that after the quantization of non-masked
components,
the quantizationnoiseis maintainedbelow
the audiblelevel.
Step 8. Subjectivelevel of noise is estimatedby a
human listener. If its value is below the mask level in
everycritical band- no further actionsare performed.If
not, new maskingthresholdsare establishedto shift the
noise below the masking level in each particular
subband.This procedureis conceivedas a flexible one
in order to not to diminish nor to neglect eligible
componentswithin the band.
Step 9. After the perceptual processing,every
spectmlcomponentis quantizedor set to 0 (if masked).
It would be redundantto encodemaskedvahles:ml&
only these componentswith non-zeromagnitudesare
processed.However,there is a needto know, for which
frequenciesspectmllines were neglectedif they were.
Therefore,an additional piece of information is sent,
which defineswhetherconsecutivecomplexcomponents
are transmittedor not. And thus, everypackageof data
is precededby 256 one-bitflags. If the flag is setto 0 at
the kth position,it meansthat the magnitudeand phase
for the kth spectml componentis not processed(the
maskingoccurred).On the other hand if the kth value
is set to 1, it denotesthat the packageincludesthe kth
complexcomponent.
utili= all 16-bit representationwhereasthe value can
matchonly a part of it. Thus, it can be assumedthat the
numberof bits usedto encodethesevaluesis constantin
a critical band within the time duration of several
frames.In a case,when the word length changes,one
needsto sendonly a new valuefor appropriatesubband.
Consequently,
the datapackageis precededby a number
of one-bitflags. When the kth flag is set to 0, it means
that the number of bits used to encode spectml
componentsin the kth critical band did not change.
When the flag is equal to 1, it denotesthat a new
number for thesebits will be transmittedfor the kth
band. The number of one-bit flags is equal to the
number of critical bands determinedby the given
nnms.1:..n
&.-..a..-,
341qJlll1~
UGquGll~y.
During the maskingprocedure,it is evahratedthe
permissiblenoiselevel in eachcritical baud. The level
is used for the quantimtion and should be known to
restoresoundcorrectly. Thus, there is a needto store
values of the levels for every subband.Fortunately,it
turns out that it is essentialto precedethe audiopackage
by a numberof flags informing whetherthe noiselevel
in consecutivecriticai bandschanged.If so, a new 16bit valueof the level for appropriatebandis transmitted.
As previously,the numberof the one-bitflags refer to
the numberof utilized critical bands.
Consequently,each SlZsample frame is storedin
the format as follows:
256 of one-bitflags of non-maskedspectrallines,
up to 25 of one-bitflags. They describethe change
of the word length engaged to encode complex
componentsin a critical band,
up to 25 of one-bitflags. They concernthe changeof
permissiblenoiselevel in a critical band,
up to 25 of 4-bit values of a new value of word
length usedto encodespectralcomponentsin a critical
band,
16-bit valuesof a new permissiblenoise level in a
critical band,
vdaes of ---r-----amnlitudes --and nhasen
of non-masked
r----
spectrallines.The value of phasescan be encodedusing
5 bits, what is recommendedin the literature.
Consequently,amplitude and phase of the constant
complexcomponentis encoded.
The decoderusesthe abovedatafor the additive
synthesisof soundon the basis of spectralcomponents
which havebeenqualifiedas non-maskedones.
Knoyledge acquisition of signal and of noise
Previously, the rough set approach to the
determinationof spectralcomponentsobtained in the
McAulay-Quatieri analysis was tried by the author
(Czyzewski1995b,1995c).Similar KDD procedureto
the one elaboratedpreviously was exploited in the
current experiments to derive rules allowing to
automaticallyselectthe optimal level of the masking
thresholdin the perceptualcodingffiltrationprocedure.
The maskingthresholdinfluencing the selectionof
spectralcomponentsin the encodershouldbe updated
frameby framebasingon the rule set.This rule setis to
be acquiredon the basis of examplesprocessedduring
the learningphase.Correspondingly,threeclassesare to
be defined: thresholdlow (too low), thresholdmedium
or balanced(right) and thresholdhigh (too high). The
expressions in brackets correspond to subjective
assessment
of the effectof filtration. When the threshold
is too low, then the noise is clearly audible.When the
threshold is too high, many eligible componentsare
removed,so the resulting soundis clean but poor and
distorted. Balanced threshold allows one to remove
noisewithout discardingeligible signal components.As
resultsfrom aboveindications,thresholdvaluesneedto
be quantized.The quantizationconsistsin replacingreal
values of masking threshold by the range
representations
which are utilized as decisionattributes
(Slowinski ed., 1992). Practically, the uuiform
. ... . . ..r.-..r...---^
..-,,...-.,.I L.-.r,A
,.^ uL up
.a73-I--”
rC
yuat1uuuw1
walaGlupluyGu
UiibGuUll
1illl~CSUl
magnitudeof threshold.
Practically,the learuing procedureconsistsin
selecting some short fragments of the recording,
automatically setting various threshold levels and
assessingthe effect subjectivelywhen playing back
thosefragmentsafter the resynthesis.Correspondingly,
Spatial, Text, Q Multimedia
223
each sample fragment is to be represented by a set of
threshold values in each critical band and the decision
comes from the human expert. That is the way the
knowledge base is built withregard to expert subjective
assessm&ts of individual examples. Normally, it is
suBicient to choose some examples corresponding to the
most characteristic
fragments of the recording.
Typically, up to 5 percent of the whole material should
be chosen and assessed on the basis of 2 to 3 seconds
portions. This produces a stream of exemplary data to
be added to the relevant classes labeled as “low”, “right”
and too “high”. Consequently, the knowledge base is
build up to be applicable to the selected fragments
(certainly) and to the rest of the whole recording
(possibly). The generalization capabilities of KDD
algorithms the rough set belongs to proved to work well
also with the new patterns representing the material not
employed to the %aining. The rule base represents
knowledge acquired during the training of the
algorithm. Thfi rule set may contain rules of the
following form:
(5)
where: W,) = V*J*,...J*,~ denotes that in the
frequency band No. k the threshold value is to be set to
I is
i EC l,2,...,16 > .
the
quantized
level
,
,
k E< 1,2,...,24 >
Then, the magnitude of the threshold set in kth
critical band is as follows:
4 = 61,.
[m]
(6)
The exemplary rules presented above are to be
determined automatically through the learning from
fragments of the recording, packet after packet. As each
packet represents the result of FFI’ tran&orm of 512
samples, thus for each 1s portion of audio sampled with
the frequency 22.05 kHz as much as 22.050/512 z 43
rules may be generated.
Thus, initially after typical learning procedure
employing 10 or 15 s of exemplary audio material the
decision table is constructed containing several
hundreds of rules. Usually, such a data collection is
superfluous. This feature results from the fact that
musical tones duration usually exceeds single packet
length. Consequently, many rules and attributes W, )
are discarded after the rough set based reduction of the
obtained decision table and the resulting knowledge
base is automatically compacted. Subsequently, the
224
KDD-96
acquired rule base is used for the automatic setting
thresholds for all subbands and all consecutive packets
of the whole recording. For the concrete combination of
input data many rules are firing, some of them certain
(rough measure equal to 1) and some uncertain (rough
measure lower than 1) - see eq. 1. According to the
rough set method principles, the decision always comes
from the single firing rule being the strongest one - see
eq. 2. Consequently, the specuum filtering thresholds
are to be updated according to the winning rules.
Subsequently, the current packet is processed using
threshold values update controlled by the rule
conditional attributes. Results of processing noisy
recordings with this method are presented in the next
paragraph.
Results
Music affected by strong noise was used for the
described experiments. The
analysis-resynthesis
algorithm with the “intelligent” threshold update was
applied among others to an exemplary fragment of the
song performed by Edith Piaf taken from a very noisy
record.
After the perceptual filtration is executed, which is
supported by the rough set-based control of masking
threshold, the rectified signal was obtained revealing
enhanced subjective quality. Results of analyses of
music material made before and after the processing are
presented in Fig. 2. As is seen from Fig. 2a, spectral
analysis of the musical fragment reveals that signal
components are accompanied by very strong noise. The
noise is broadband (so called hiss) and its components
are strong within the whole range of frequency (up to
l/2 sampling freq which was equal to 22.05 kHz).
Fig. 2b presents spectral analysis of the fragment
restored with the perceptual coding algorithm with not
properly selected masking thresholds - some eligible
signal components are weakened while the higher
components of hiss remain still beyond the masking
threshold. Fig. 2c presents the result of signal
restoration with the perceptual coding algorithm
controlled by the rule set derived from eight
characteristic portions of the whole recording (time
duration of each portion: 2 to 3 s).
Conclusions
KDD algorithms should find their way to more audio
applications. The previously conducted experiments
related to neural network implementation to the removal
of impulse distortions (Czyzewski 1994, 1995a) and the
presented exemplary application
related to the
restoration of noisy audio recordings may support this
opinion. There are many potential applications of
Acknowledgment
intelligent algorithms applied to the removal of noise
and distortions. Some of them might be used in
telecommunications and digital broadcasting, in
databasesand multimedia-related systems as data
reductionand noisesuppression
techniques.
The presentedresearchprojectwas sponsoredby the
Committeefor ScientiIic Research,Warsaw, Poland.
GrantNo. 8 TllD 002 08.
References
1. CzyzewskiA 1994, Artificial IntelligencsBased
Processingof Old Audio Recordings.In Preprintof 97th
Audio EngineeringSocietyConvention.San Francisco
(Preprint No. 3885).
a
,I
22
3.9
4.4
53
6R
7J
9)
2. Cmki
A 1995a. Some Methods For
DetectionAnd InterpolationOf ImpulsiveDistortionsIn
#-%,A
*..A:- n A^^..AZ.., T,
lx.,, mnm ~wmACLlD~l,,...Lol.,...
UIU tusuw ntxiwuuqp.
m TIVC;. l~oo
vvumm~vt,
on Applications of Signal Processingto Audio and
Acoustics.MohonkMountain,N.Y., U.S.A.
1.9
Time(seconds)
3. Cqzewski A. 1995b.ManagingNoisy Data in the
AI-based Processingof Old Audio Recordings. In
Proceedingsof InternationalSymposiumon Intelligent
DataAnalysis.Baden-Baden,17-19August, 1995.
4. Czyzemki A 1995c.New Learning Algorithms
for the Processingof Old Audio Recordings.In Preprint
I ,.a..
or 99th Audio Engineering Society Convenuon.New
York (Preprint No. 4078).
5. Pawlak Z. (1982). Rough sets. Journal of
Computerand InformationScience,~01.11, No.5.
I.
,I
I.
22
I.9
4A
$3
II
7J
81
93
Time koadrl
6. PawlakZ. (1993).Rough Sets- PresentStateand
the Future. Foundationsof Computing and Decision
Sciences,
vol. 18,No.3-4.
7. Slowinski R, ed. 1992. Intelligent Decision
Suppo~.balk
on ~piica~oiis Ed evens of
the Rough Sets Theory. Kluwer Academic Publisher,
Dordrecht/Bostonln.
m
6664
urn.
ml.
1
9.
.I
I.99
2.w
136
4s
534
cm96
7.92
834
The (reconds)
Fig. 2 Resultsof perceptualfiltering of soundwith the
useof the rule base.
- 1 ungmm
.--t-1--, e------L
-e--au auaro
-.-AZ-recorurng
-____AZ-- dutxieu
,me-r-A
a./
uagmemor
by strongnoise,
b./ The same fragment restoredwith the use of
insuflicientlytrainal algorithm,
c./ Fragmentas in Fig. (a) processedemployingthe
final versionof the rule set.
93
8. Wojcik Z.M. 1993. Rough Sets for Intelligent
ImageFiltering, In Proc. of RoughSetsand Knowledge
DiscoveryWorkshop(RSKD),Ban& Canada.
9. Zwicker E., Zwicker U. 1991.Audio Engineering
and Psychoacoustics:
Matching Signals to the Final
Receiver,the Human
Auditory
System,J. Audio Eng.
-.- _^C.
Sot., voi. 39 (pp. I 13-126).
10. Krolikowski R 1996. Noise reduction in old
musical recordings using the perceptual coding of
audio.Forthcoming.
Suatial. Text. Q Multimedia
22s