EE422G Signals and Systems Laboratory
Noise Analysis
Kevin D. Donohue
Electrical and Computer Engineering, University of Kentucky

Noise as a Random Variable

Noise is modeled as a random variable (RV), which is defined as a function that maps an event into a real number. There are 2 main characterizations of noise that are important for designing signal processing and communication systems:
1. The distribution of amplitudes is characterized by the probability density function (pdf), or its integral, the cumulative distribution function (cdf).
2. The correlation or influence between neighboring noise samples is characterized by the autocorrelation (AC), or its Fourier transform, the power spectral density (PSD).

The PDF

Given a random signal x[n] with pdf $p_X(x)$, the probability of its value falling between a and b is given by:
$$\Pr\{a < X \le b\} = \int_a^b p_X(x)\,dx$$
The cdf is the probability that the RV X is less than or equal to x, denoted by:
$$\Pr\{X \le x\} = P_X(x) = \int_{-\infty}^{x} p_X(\lambda)\,d\lambda$$

PDF and CDF Example

Consider an exponential noise distribution with pdf:
$$p_X(x) = 0.9\exp(-0.9x) \text{ for } x \ge 0, \text{ and } 0 \text{ elsewhere}$$
Compute the cdf and find values of x such that the probability of being less than x1 is 0.5132 and less than x2 is 0.9544. Find the probability of x being between x1 and x2.

[Figure: pdf and cdf plotted against x from 0 to 5. The cdf is marked at (x = 0.8, 0.5132) and (x = 3.43, 0.9544).]

Parametric PDF Estimate

If the form of the distribution is known, then only the parameters of the distribution need to be estimated from the data. For example, consider a Gaussian distribution with pdf given by:
$$p_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$
with mean $\mu$ and variance $\sigma^2$ (standard deviation $\sigma$).
So if N data samples ($x_n$) are collected, the sample mean and variance are estimated by:
$$\hat{\mu} = \frac{1}{N}\sum_{n=0}^{N-1} x_n$$
$$\hat{\sigma}^2 = \frac{1}{N-1}\sum_{n=0}^{N-1} (x_n - \hat{\mu})^2$$

Threshold Design

Assume that a noise process is exponential with unknown mean.
Given 10 noise samples, estimate a threshold to detect a signal of greater power, such that the probability of false alarm is 1 out of 10k tests.

Sample Data: s = [0.20, 1.17, 0.69, 3.70, 1.3, 5.55, 0.46, 0.70, 0.36, 0.34]

Model:
$$p_X(x) = \frac{1}{b}\exp\left(-\frac{x}{b}\right) \text{ for } x \ge 0$$

Compute the sample mean as an estimate of the b parameter:
$$\hat{b} = \frac{1}{10}\sum_{i=1}^{10} s_i = 1.45$$
Use it in the cdf to show a relationship between the false alarm probability and the threshold value:
$$P_{fa} = 1 - \mathrm{cdf}(T) = \exp\left(-\frac{T}{b}\right)$$
$$T = -b\ln(P_{fa}) \approx -\hat{b}\ln(P_{fa})$$
$$T \approx -1.45\ln(1/10000) = 13.35$$

PDF Non-Parametric Estimation

If the form of the distribution is unknown, it can be estimated with few assumptions using a normalized histogram operation, which estimates the PDF over short intervals (bins) based on the percentage of sample data occurring in the bin:
$$\Pr[a < x \le b] = \int_a^b p(x)\,dx \approx \frac{\text{samples} \in [a, b]}{\text{total samples collected}}$$
$$\int_a^b p(x)\,dx = (b-a)\,p(x_{ab}) \text{ for some } x_{ab} \in [a, b]$$
Combining the two, the pdf value in a bin is estimated as the fraction of samples in the bin divided by the bin width (b - a).

[Figure: two plots against x with vertical axes from 0 to 0.8.]

Signal Power and Moments

The RMS value of a random signal is the square root of its second moment.
$$S_{rms} = \sqrt{\frac{1}{T}\int_T s^2(t)\,dt} \approx \sqrt{\frac{1}{N}\sum_{i=1}^{N} s_i^2}$$
If the signal is zero mean, the standard deviation is equivalent to the RMS value.
Find the RMS value of: $s(t) = 3 + 5\sin(2\pi 100 t)$ V
Find the RMS value of a Gaussian random noise process with mean 0.5 and variance of 4.

Signal to Noise Ratio

Signal to noise ratio in decibels (dB) is defined as the following power ratio:
$$\mathrm{SNR}_{dB} = 10\log_{10}\left(\frac{\sigma_s^2}{\sigma_n^2}\right) = 20\log_{10}\left(\frac{\sigma_s}{\sigma_n}\right)$$
where $\sigma_s$ is the RMS value of the signal and $\sigma_n$ is the RMS value of the noise.
Assume a zero mean signal has an RMS value of 2 and zero mean Gaussian noise has an RMS value of 1. The noise and signal will be added together to simulate a signal in noise. What must the noise signal be multiplied by so that the resulting SNRdB is -2 dB?

Correlation

The correlation indicates how similar one signal is to another.
Related to this is the covariance, which removes the signal mean:
$$\sigma_{xy} = E[(x-\mu_x)(y-\mu_y)] = \int_y\int_x (x-\mu_x)(y-\mu_y)\,p_{XY}(x,y)\,dx\,dy$$
Correlation is the same as above without the mean subtraction. For zero mean signals, the covariance and correlation are identical. To estimate correlation from stationary (i.e. statistics do not change over time) random signal segments of length N, the sample correlation is used:
$$R_{xy}[k] = \frac{1}{N-k}\sum_{n=k}^{N-1} x[n-k]\,y[n]$$

Autocorrelation

A signal can be correlated with a delayed version of itself to determine influence (statistically) as samples get further apart. This is the autocorrelation (AC) function:
$$R_{yy}[k] = \frac{1}{N}\sum_{n=k}^{N-1} y[n-k]\,y[n]$$
This is referred to as the biased AC, because it divides by N rather than by the number of terms in the sum, N - k. The index k is often called the lag, which represents the relative delay or shift between the signals. The Matlab function xcorr() can be used to perform these operations. Note that for k = 0 the second moment (the signal energy scaled by 1/N) is computed. This will always be the largest value for all AC lags. The AC is often normalized by this value so the zero lag becomes 1.

Convolution and Correlation

Recall the convolution operation:
$$w[k] = \sum_{n=0}^{N-1} x[k-n]\,y[n]$$
where samples of x outside the range 0 to N - 1 are taken as zero.
The main difference between convolution and correlation is time reversal of one of the signals before the multiply and sum operation.
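The time-reversal relationship can be checked numerically. The following is a minimal sketch, not part of the original lab material, assuming two short equal-length row vectors whose values and names are purely illustrative. It evaluates the correlation sum of x[n-k] y[n] directly for lags k = 0 to N-1 and compares it with conv() applied to a time-reversed copy of x.

```matlab
% Example data (illustrative values, not from the lab)
x = [1 2 3 4];
y = [2 0 -1 3];
N = length(x);

% Direct correlation sum for lags k = 0..N-1 (no 1/N scaling yet)
r_direct = zeros(1, N);
for k = 0:N-1
    % x[n-k]*y[n] summed over n = k..N-1 (shifted to 1-based indexing)
    r_direct(k+1) = sum(x(1:N-k) .* y(k+1:N));
end

% Same sums from conv() after time-reversing x
w = conv(fliplr(x), y);   % length 2N-1, covers lags -(N-1)..(N-1)
r_conv = w(N:end);        % element N is lag 0, so keep lags 0..N-1

% Biased estimate: divide the sums by N
R_biased = r_conv / N;

% Largest difference between the two methods (0 for this integer data)
disp(max(abs(r_direct - r_conv)))
```

The full output of conv() also contains the negative lags in elements 1 to N-1, which is why only the second half is kept here.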
The conv() operation in Matlab can be used to implement correlation (with some minor modifications to the input arguments):
$$R[k] = \frac{1}{N}\sum_{n=k}^{N-1} x[n-k]\,y[n]$$

Autocorrelation Example

[Figure: x(n) plotted against n from -20 to 20.]

Power Spectral Density

The power spectral density (PSD) of a random signal is its average squared-magnitude spectrum (phase is considered irrelevant and is removed). For stationary random signals the PSD can be estimated from the average of squared DFT magnitudes:
$$S[m] = \frac{1}{G}\sum_{g=0}^{G-1} \left|V_g[m]\right|^2$$
From long data segments the PSD is estimated from shorter overlapping segments, referred to as Welch's method. For a single segment g, the FFT is computed from:
$$V_g[m] = \frac{1}{N}\sum_{n=0}^{N-1} v[Lg+n]\,w[n]\exp\left(-j\frac{2\pi nm}{N_{FFT}}\right) \text{ for } 0 \le m < N_{FFT}$$
where w[n] is a window function used to taper the data down at the edge of each extracted segment, and L is an increment (usually less than N) to obtain an overlap between segments. $N_{FFT}$ is the number of FFT points, usually obtained from padding with zeros. The Matlab function pwelch() can be used to implement the PSD computation.

[Figure: PSD estimation process with hopping window method: windowed segments, FFT magnitudes, average.]

Power Spectral Density Estimation

The power spectral density (PSD) estimation performance is affected by window length, window taper, and zero padding.
1. Window length determines frequency resolution (i.e. the ability to distinguish between 2 closely spaced frequencies). The smallest resolvable frequency spacing is inversely proportional to the window duration T, so a longer window gives finer resolution:
$$\Delta f \propto \frac{1}{T}$$
2.
Window tapering lowers resolution but reduces sidelobe artifacts called spectral leakage.
3. Zero padding increases the number of grid points on the PSD frequency axis. It does this through interpolation, so resolution is not improved, but the finer grid spacing can improve the estimated location of a peak or a null.

MATLAB Tips for Axes

If plotting a time signal (vector), the X-axis vector must have the same number of points as the signal vector. Given the sampling frequency is stored in variable fs in Hz and the signal vector is stored in variable sig, a time axis vector can be generated by:
>> tax = [0:length(sig)-1]/fs; % time axis in seconds
>> plot(tax,sig)
If plotting the frequency spectrum vector pd from an NFFT point DFT, the frequency axis vector can be created from:
>> fax = fs*[0:length(pd)-1]/length(pd); % Frequency axis in Hz
>> plot(fax,abs(pd))
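
The axis tips and the segment-averaging PSD estimate described earlier can be combined in one short script. The following is a minimal sketch, not the lab's reference solution, written with base MATLAB functions only (fft rather than pwelch). The test signal, sampling rate, segment length N, increment L, Hann taper, and NFFT are all assumed values chosen for illustration, and the sketch averages squared magnitudes, the usual convention for a power spectrum.

```matlab
fs = 1000;                               % assumed sampling frequency in Hz
tax = (0:2*fs-1)/fs;                     % time axis in seconds (2 s of data)
sig = sin(2*pi*100*tax) + 0.5*randn(size(tax));   % 100 Hz tone in noise

N = 256;                                 % segment length (assumed)
L = 128;                                 % increment, gives 50% overlap (assumed)
NFFT = 1024;                             % FFT points after zero padding (assumed)
w = 0.5 - 0.5*cos(2*pi*(0:N-1)/(N-1));   % Hann taper computed directly

G = floor((length(sig) - N)/L) + 1;      % number of complete segments
pd = zeros(1, NFFT);
for g = 0:G-1
    seg = sig(L*g + (1:N)) .* w;         % v[Lg+n] w[n] for n = 0..N-1
    Vg = fft(seg, NFFT)/N;               % zero-padded DFT with 1/N scaling
    pd = pd + abs(Vg).^2;                % accumulate squared magnitudes
end
pd = pd/G;                               % average over the G segments

fax = fs*(0:length(pd)-1)/length(pd);    % frequency axis in Hz
half = 1:NFFT/2;                         % plot 0 to fs/2 only
plot(fax(half), 10*log10(pd(half)))
xlabel('Frequency (Hz)'); ylabel('PSD estimate (dB)')

[~, imax] = max(pd(half));               % index of the largest peak
fpeak = fax(imax);                       % peak location in Hz
```

Changing N, the taper, or NFFT in this sketch is one way to observe the three effects listed under Power Spectral Density Estimation: a longer N narrows the peak, removing the taper raises the sidelobes, and a larger NFFT only refines the frequency grid.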