Many techniques in speech processing require the manipulation of probabilities and statistics.
The two principal application areas we will encounter are:
Statistical pattern recognition.
Modeling of linear systems.
It is customary to refer to the probability of an event.
An event is a certain set of possible outcomes of an experiment or trial.
Outcomes are assumed to be mutually exclusive and, taken together, to cover all possibilities.
To any event A we can assign a number,
P(A), which satisfies the following axioms:
P(A)≥0.
P(S) =1.
If A and B are mutually exclusive, then
P(A+B) = P(A)+P(B).
The number P(A) is called the probability of A.
Some immediate consequences:
If $\bar{A}$ is the complement of A, then $A + \bar{A} = S$ and $P(\bar{A}) = 1 - P(A)$.
P(∅), the probability of the impossible event, is 0.
P(A) ≤ 1.
If two events A and B are not mutually exclusive, we can show that
P(A+B)=P(A)+P(B)-P(AB).
The conditional probability of an event A, given that event B has occurred, is defined as:
$$P(A \mid B) = \frac{P(AB)}{P(B)}$$
We can infer P(B|A) by means of Bayes’ theorem:
$$P(B \mid A) = \frac{P(A \mid B)\, P(B)}{P(A)}$$
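As a quick numeric illustration of Bayes' theorem (a minimal sketch in Python; the prior and likelihood values are invented for the example, not taken from the text):

```python
# Hypothetical numbers chosen only to illustrate the formula.
p_B = 0.01             # P(B): prior probability of event B
p_A_given_B = 0.90     # P(A|B)
p_A_given_notB = 0.05  # P(A|~B)

# Total probability: P(A) = P(A|B) P(B) + P(A|~B) P(~B)
p_A = p_A_given_B * p_B + p_A_given_notB * (1 - p_B)

# Bayes' theorem: P(B|A) = P(A|B) P(B) / P(A)
p_B_given_A = p_A_given_B * p_B / p_A
print(round(p_B_given_A, 4))  # ~0.1538
```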
Events A and B may have nothing to do with each other; in that case they are said to be independent.
Two events are independent if
P(AB)=P(A)P(B).
From the definition of conditional probability:
$$P(A \mid B) = \frac{P(AB)}{P(B)} = \frac{P(A)\,P(B)}{P(B)} = P(A), \qquad P(B \mid A) = P(B)$$
Three events A, B, and C are independent only if:
$$P(AB) = P(A)\,P(B), \quad P(AC) = P(A)\,P(C), \quad P(BC) = P(B)\,P(C), \quad P(ABC) = P(A)\,P(B)\,P(C)$$
A random variable is a number chosen at random as the outcome of an experiment.
Random variables may be real or complex and may be discrete or continuous.
In speech processing, the random variables we encounter are most often real and discrete.
We can characterize a random variable by its probability distribution or by its probability density function (pdf).
Random Variables (distribution function):
The distribution function for a random variable y is the probability that y does not exceed some value u:
$$F_y(u) = P(y \le u)$$
and
$$P(u < y \le v) = F_y(v) - F_y(u)$$
Random Variables (probability density function):
The probability density function is the derivative of the distribution:
$$f_y(u) = \frac{d}{du} F_y(u), \qquad P(u < y \le v) = \int_u^v f_y(y)\,dy$$
and
$$F_y(\infty) = 1, \qquad \int_{-\infty}^{\infty} f_y(y)\,dy = 1$$
Random Variables (expected value):
We can also characterize a random variable by its statistics.
The expected value of g(x) is written
E{g(x)} or <g(x)> and defined as
Continuous random variable:
$$\langle g(x) \rangle = \int_{-\infty}^{\infty} g(x)\, f(x)\, dx$$
Discrete random variable:
$$\langle g(x) \rangle = \sum_x g(x)\, p(x)$$
Random Variables (moments):
The statistics of greatest interest are the moments of X.
The kth moment of X is the expected value of $X^k$.
For a discrete random variable:
$$m_k = \langle X^k \rangle = \sum_x x^k\, p(x)$$
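A minimal sketch of computing the kth moment directly from a probability mass function; the fair-die pmf below is an assumed example, not from the text:

```python
# Assumed example: a fair six-sided die, p(x) = 1/6 for x = 1..6.
pmf = {x: 1 / 6 for x in range(1, 7)}

def moment(pmf, k):
    """kth moment: m_k = sum over x of x**k * p(x)."""
    return sum((x ** k) * p for x, p in pmf.items())

print(moment(pmf, 1))  # mean, 3.5
print(moment(pmf, 2))  # second moment, ~15.167
```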
Random Variables (mean & variance):
The first moment of X is its mean, $\bar{X}$:
Continuous: $\bar{X} = \int_{-\infty}^{\infty} x\, f(x)\, dx$
Discrete: $\bar{X} = \sum_x x\, p(x)$
The second central moment, also known as the variance of p(x), is given by
$$\sigma_x^2 = \sum_x (x - \bar{x})^2\, p(x) = m_2 - \bar{X}^2$$
To estimate the statistics of a random variable, we repeat the experiment which generates the variable a large number of times.
If the experiment is run N times, then each value x will occur approximately Np(x) times; thus
$$\hat{m}_k = \frac{1}{N} \sum_{i=1}^{N} x_i^k, \qquad \hat{\bar{x}} = \frac{1}{N} \sum_{i=1}^{N} x_i$$
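A small sketch of estimating moments by averaging N repeated trials, per the formulas above; the die-roll experiment and sample size are assumed for illustration:

```python
import random

random.seed(0)
N = 100_000
# Assumed experiment: N independent rolls of a fair die.
samples = [random.randint(1, 6) for _ in range(N)]

m1_hat = sum(samples) / N                 # estimate of the mean
m2_hat = sum(x * x for x in samples) / N  # estimate of the 2nd moment
var_hat = m2_hat - m1_hat ** 2            # variance via m2 - mean^2

print(m1_hat, var_hat)  # close to 3.5 and 35/12 ≈ 2.917
```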
Random Variables (uniform density):
A random variable has a uniform density on the interval (a, b) if:
$$F_X(x) = \begin{cases} 0, & x < a \\ (x - a)/(b - a), & a \le x \le b \\ 1, & x > b \end{cases}
\qquad
f_X(x) = \begin{cases} 1/(b - a), & a \le x \le b \\ 0, & \text{otherwise} \end{cases}$$
Its variance is
$$\sigma^2 = \frac{1}{12}(b - a)^2$$
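A quick simulation check of the uniform variance formula $\sigma^2 = (b - a)^2/12$; the interval (a, b) and sample count are arbitrary choices:

```python
import random

random.seed(0)
a, b = 2.0, 5.0                # arbitrary interval
N = 200_000
xs = [random.uniform(a, b) for _ in range(N)]

mean = sum(xs) / N
var = sum((x - mean) ** 2 for x in xs) / N
print(var, (b - a) ** 2 / 12)  # both close to 0.75
```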
Random Variables (Gaussian density):
The Gaussian, or normal, density function is given by:
$$n(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x - \mu)^2 / 2\sigma^2}$$
Random Variables (…Gaussian density):
The distribution function of a normal variable is:
$$N(x; \mu, \sigma) = \int_{-\infty}^{x} n(u; \mu, \sigma)\, du$$
If we define the error function as
$$\mathrm{erf}(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-u^2/2}\, du$$
then
$$N(x; \mu, \sigma) = \mathrm{erf}\!\left(\frac{x - \mu}{\sigma}\right)$$
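A sketch of evaluating $N(x; \mu, \sigma)$ numerically. Note that the erf defined above is the standard normal distribution function, whereas Python's math.erf is the conventional error function, so a rescaling is needed:

```python
import math

def slide_erf(x):
    """The erf defined above: (1/sqrt(2*pi)) * integral of exp(-u^2/2) up to x.
    This equals the standard normal CDF; math.erf is the conventional error
    function, hence the rescaling."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def normal_cdf(x, mu, sigma):
    """N(x; mu, sigma) = erf((x - mu) / sigma) in the notation above."""
    return slide_erf((x - mu) / sigma)

print(normal_cdf(0.0, 0.0, 1.0))  # 0.5
print(normal_cdf(1.0, 0.0, 1.0))  # ~0.8413
```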
Two Random Variables:
If two random variables x and y are to be considered together, they can be described in terms of their joint probability density f(x, y) or, for discrete variables, p(x, y).
Two random variables are independent if
$$p(x, y) = p(x)\, p(y)$$
Two Random Variables (…continued):
Given a function g(x, y), its expected value is defined as:
Continuous:
$$\langle g(x, y) \rangle = \iint g(x, y)\, f(x, y)\, dx\, dy$$
Discrete:
$$\langle g(x, y) \rangle = \sum_x \sum_y g(x, y)\, p(x, y)$$
And the joint moment for two discrete random variables is:
$$m_{ij} = \sum_x \sum_y x^i y^j\, p(x, y)$$
Two Random Variables (…continued):
Moments are estimated in practice by averaging repeated measurements:
$$\hat{m}_{ij} = \frac{1}{N} \sum_{n=1}^{N} x_n^i\, y_n^j$$
A measure of the dependence of two random variables is their correlation, which is their joint second moment:
$$m_{11} = \langle xy \rangle = \sum_{x, y} x\, y\, p(x, y)$$
Two Random Variables (…continued):
The joint second central moment of x, y is their covariance:
$$\sigma_{xy} = \langle (x - \bar{x})(y - \bar{y}) \rangle = m_{11} - \bar{x}\,\bar{y}$$
If x and y are independent then their covariance is zero.
The correlation coefficient of x and y is their covariance normalized to their standard deviations:
$$r_{xy} = \frac{\sigma_{xy}}{\sigma_x\, \sigma_y}$$
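A minimal sketch of estimating the covariance and correlation coefficient from samples; the linear model generating the correlated pair (x, y) is an assumed example:

```python
import random

random.seed(0)
N = 100_000
xs, ys = [], []
for _ in range(N):
    x = random.gauss(0.0, 1.0)
    # Assumed model: y depends partly on x, so the pair is correlated.
    y = 0.6 * x + 0.8 * random.gauss(0.0, 1.0)
    xs.append(x)
    ys.append(y)

mx = sum(xs) / N
my = sum(ys) / N
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / N
sx = (sum((x - mx) ** 2 for x in xs) / N) ** 0.5
sy = (sum((y - my) ** 2 for y in ys) / N) ** 0.5
r = cov / (sx * sy)
print(cov, r)  # both ≈ 0.6 for this model
```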
Two Random Variables (…Gaussian Random Variables):
Two random variables x and y are jointly Gaussian if their density function is:
$$n(x, y) = \frac{1}{2\pi \sigma_x \sigma_y \sqrt{1 - r^2}} \exp\!\left[ -\frac{1}{2(1 - r^2)} \left( \frac{x^2}{\sigma_x^2} - \frac{2 r x y}{\sigma_x \sigma_y} + \frac{y^2}{\sigma_y^2} \right) \right]$$
where
$$r = r_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$$
Two Random Variables (…Sum of Random Variables):
The expected value of the sum of two random variables is :
$$\langle x + y \rangle = \langle x \rangle + \langle y \rangle$$
This is true whether x and y are independent or not. We also have:
$$\langle c\,x \rangle = c\,\langle x \rangle, \qquad \left\langle \sum_i x_i \right\rangle = \sum_i \langle x_i \rangle$$
Two Random Variables (…Sum of Random Variables):
The variance of the sum of two independent random variables is:
$$\sigma_{x+y}^2 = \sigma_x^2 + \sigma_y^2$$
If two random variables are independent, the probability density of their sum is the convolution of the densities of the individual variables:
Continuous: $f_{x+y}(z) = \int_{-\infty}^{\infty} f_x(u)\, f_y(z - u)\, du$
Discrete: $p_{x+y}(z) = \sum_u p_x(u)\, p_y(z - u)$
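A small sketch of the discrete convolution formula for the density of a sum; the two fair-die pmfs are an assumed example:

```python
# Assumed example: two independent fair dice.
p_x = {k: 1 / 6 for k in range(1, 7)}
p_y = {k: 1 / 6 for k in range(1, 7)}

# p_{x+y}(z) = sum over u of p_x(u) * p_y(z - u)
p_sum = {}
for u, pu in p_x.items():
    for v, pv in p_y.items():
        p_sum[u + v] = p_sum.get(u + v, 0.0) + pu * pv

print(p_sum[7])             # 6/36 ≈ 0.167, the most likely total
print(sum(p_sum.values()))  # 1.0
```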
Central Limit Theorem (informal paraphrase):
If many independent random variables are summed, the probability density function (pdf) of the sum tends toward the Gaussian density, no matter what their individual densities are.
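A simple simulation sketch of this effect: summing uniform variables (the number of terms and trials are arbitrary choices) gives a sum whose mean and variance match theory and whose histogram is close to a Gaussian:

```python
import random

random.seed(0)
n_terms = 12   # number of independent uniform(0,1) variables summed (arbitrary)
N = 50_000     # number of trials (arbitrary)

sums = [sum(random.random() for _ in range(n_terms)) for _ in range(N)]

mean = sum(sums) / N
var = sum((s - mean) ** 2 for s in sums) / N
# Each term has mean 0.5 and variance 1/12, so the sum should have
# mean ≈ 6 and variance ≈ 1, with an approximately Gaussian histogram.
print(mean, var)
```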
Multivariate Normal Density
The normal density function can be generalized to any number of random variables.
Let X be the random vector $X = \mathrm{Col}[X_1, X_2, \ldots, X_n]$. Then
$$N(x) = \frac{1}{(2\pi)^{n/2}\, |R|^{1/2}} \exp\!\left[-\tfrac{1}{2}\, Q(x - \bar{x})\right]$$
where
$$Q(x - \bar{x}) = (x - \bar{x})^{T} R^{-1} (x - \bar{x})$$
The matrix R is the covariance matrix of X (R is positive-definite):
$$R = \left\langle (x - \bar{x})(x - \bar{x})^{T} \right\rangle$$
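A sketch of evaluating the multivariate normal density from the formula above with NumPy; the function name, mean vector, and covariance matrix R are assumptions made for illustration:

```python
import numpy as np

def mvn_density(x, mean, R):
    """N(x) = (2*pi)^(-n/2) |R|^(-1/2) exp(-0.5 * Q(x - mean))."""
    x = np.asarray(x, dtype=float)
    mean = np.asarray(mean, dtype=float)
    R = np.asarray(R, dtype=float)
    n = x.size
    diff = x - mean
    Q = diff @ np.linalg.inv(R) @ diff  # quadratic form Q(x - xbar)
    norm = (2.0 * np.pi) ** (n / 2.0) * np.sqrt(np.linalg.det(R))
    return float(np.exp(-0.5 * Q) / norm)

# Illustrative 2-D case with an assumed covariance matrix.
mean = [0.0, 0.0]
R = [[1.0, 0.5],
     [0.5, 2.0]]
print(mvn_density([0.0, 0.0], mean, R))  # peak value of the density
```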
A random function is one arising as the outcome of an experiment.
Random functions need not be functions of time, but in all cases of interest to us they will be.
A discrete stochastic process is characterized by a set of probability densities of the form
$$p(x_1, x_2, x_3, \ldots, x_n,\; t_1, t_2, t_3, \ldots, t_n)$$
If the individual values of the random signal are independent, then
$$p(x_1, x_2, \ldots, x_n,\; t_1, t_2, \ldots, t_n) = p(x_1, t_1)\, p(x_2, t_2) \cdots p(x_n, t_n)$$
If these individual probability densities are all the same, then we have a sequence of independent, identically distributed samples (i.i.d.).
The mean is the expected value of x(t):
$$\bar{x}(t) = \langle x(t) \rangle = \sum_x x\, p(x, t)$$
The autocorrelation function is the expected value of the product $x(t_1)\, x(t_2)$:
$$r(t_1, t_2) = \langle x(t_1)\, x(t_2) \rangle = \sum_{x_1, x_2} x_1 x_2\, p(x_1, x_2, t_1, t_2)$$
Mean and autocorrelation can be determined in two ways:
The experiment can be repeated many times and the average taken over all these functions. Such an average is called an ensemble average.
Take any one of these functions as being representative of the ensemble and find the average from a number of samples of this one function. This is called a time average.
If the time average and ensemble average of a random function are the same, it is said to be ergodic .
A random function is said to be stationary if its statistics do not change as a function of time.
Any ergodic function is also stationary.
For a stationary signal we have:
$$\bar{x}(t) = \bar{x} \ (\text{a constant}), \qquad p(x_1, x_2, t_1, t_2) = p(x_1, x_2, \tau), \quad \text{where } \tau = t_2 - t_1$$
And the autocorrelation function is:
$$r(\tau) = \sum_{x_1, x_2} x_1 x_2\, p(x_1, x_2, \tau)$$
When x(t) is ergodic, its mean and autocorrelation are:
$$\bar{x} = \lim_{N \to \infty} \frac{1}{2N + 1} \sum_{t = -N}^{N} x(t)$$
$$r(\tau) = \langle x(t)\, x(t + \tau) \rangle = \lim_{N \to \infty} \frac{1}{2N + 1} \sum_{t = -N}^{N} x(t)\, x(t + \tau)$$
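A minimal sketch of the corresponding time-average estimates over a finite record; the white Gaussian signal is an assumed ergodic example, and the finite record length stands in for the limit N → ∞:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000
x = rng.normal(0.0, 1.0, N)  # assumed ergodic signal: white Gaussian noise

mean_est = x.mean()          # time-average estimate of the mean

def autocorr(x, tau):
    """Time-average estimate of r(tau) = <x(t) x(t + tau)> over a finite record."""
    if tau == 0:
        return float(np.mean(x * x))
    return float(np.mean(x[:-tau] * x[tau:]))

print(mean_est)                        # ~0
print(autocorr(x, 0), autocorr(x, 1))  # ~1 (the variance) and ~0
```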
The cross-correlation of two ergodic random functions is:
$$r_{xy}(\tau) = \langle x(t)\, y(t + \tau) \rangle = \lim_{N \to \infty} \frac{1}{2N + 1} \sum_{t = -N}^{N} x(t)\, y(t + \tau)$$
The subscript xy indicates a cross-correlation.
Random Functions (power & cross spectral density):
The Fourier transform of r(τ) (the autocorrelation function of an ergodic random function) is called the power spectral density of x(t):
$$S(\omega) = \sum_{\tau} r(\tau)\, e^{-j\omega\tau}$$
The cross-spectral density of two ergodic random functions is:
$$S_{xy}(\omega) = \sum_{\tau} r_{xy}(\tau)\, e^{-j\omega\tau}$$
Random Functions (…power density):
For an ergodic signal x(t), r(τ) can be written as:
$$r(\tau) = x(\tau) * x(-\tau)$$
Then from elementary Fourier transform properties,
$$S(\omega) = X(\omega)\, X(-\omega) = X(\omega)\, X^{*}(\omega) = |X(\omega)|^2$$
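A sketch verifying the relation $S(\omega) = |X(\omega)|^2$ on a finite record, where the DFT stands in for the Fourier transform and the autocorrelation is circular; the record length and signal are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 256
x = rng.normal(0.0, 1.0, N)    # arbitrary finite record

X = np.fft.fft(x)
S_from_X = np.abs(X) ** 2      # |X(omega)|^2 on the DFT frequency grid

# Circular autocorrelation r(tau) = sum_t x(t) x((t + tau) mod N),
# obtained here via the inverse DFT of |X|^2.
r = np.fft.ifft(S_from_X).real
S_from_r = np.fft.fft(r).real  # Fourier transform of r(tau)

print(np.allclose(S_from_X, S_from_r))  # True: S(omega) = |X(omega)|^2
```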
Random Functions (White Noise):
If all values of a random signal are uncorrelated,
$$r(\tau) = \sigma^2\, \delta(\tau)$$
then this random function is called white noise.
The power spectrum of white noise is constant:
$$S(\omega) = \sigma^2$$
White noise is a mixture of all frequencies.
Random Signals in Linear Systems:
Let T[ ] represent the linear operation; then
$$\langle T[x(t)] \rangle = T[\langle x(t) \rangle]$$
Given a system with impulse response h(n),
$$y(n) = x(n) * h(n), \qquad \langle y(n) \rangle = \langle x(n) \rangle * h(n)$$
A stationary signal applied to a linear system yields a stationary output:
$$r_{yy}(\tau) = r_{xx}(\tau) * h(\tau) * h(-\tau), \qquad S_{yy}(\omega) = S_{xx}(\omega)\, |H(\omega)|^2$$
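A sketch of checking $S_{yy}(\omega) = S_{xx}(\omega)\, |H(\omega)|^2$ by filtering white noise through an assumed FIR impulse response and comparing averaged periodograms (a crude spectral estimate, chosen here only for illustration):

```python
import numpy as np
from numpy.fft import fft

rng = np.random.default_rng(0)
N = 1 << 16
x = rng.normal(0.0, 1.0, N)     # stationary white input, S_xx ≈ 1

h = np.array([1.0, 0.5, 0.25])  # assumed FIR impulse response h(n)
y = np.convolve(x, h)[:N]       # y(n) = x(n) * h(n)

nfft = 1024
def avg_periodogram(sig):
    """Average |FFT|^2 / L over non-overlapping segments: a crude PSD estimate."""
    segs = sig[: (len(sig) // nfft) * nfft].reshape(-1, nfft)
    return np.mean(np.abs(fft(segs, axis=1)) ** 2, axis=0) / nfft

S_xx = avg_periodogram(x)
S_yy = avg_periodogram(y)
H = fft(h, nfft)                # frequency response on the same grid

# S_yy(omega) should track S_xx(omega) * |H(omega)|^2
ratio = S_yy / (S_xx * np.abs(H) ** 2)
print(ratio[:5])                # values near 1.0
```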