PPT-2 - Laboratory for Remote Sensing Hydrology and Spatial

STATISTICS
Random Variables and
Distribution Functions
Professor Ke-Sheng Cheng
Department of Bioenvironmental Systems Engineering
National Taiwan University
Definition of random variable (RV)
• For a given probability space $(\Omega, \mathcal{A}, P[\cdot])$, a
random variable, denoted by $X$ or $X(\cdot)$, is a
function with domain $\Omega$ and counterdomain the
real line. The function $X(\cdot)$ must be such that the
set $A_r$, denoted by $A_r = \{\omega : X(\omega) \le r\}$, belongs
to $\mathcal{A}$ for every real number $r$.
• Unlike probability, which is defined on the
event space, a random variable is defined on
the sample space.
3/14/2016
Laboratory for Remote Sensing Hydrology and Spatial Modeling,
Dept of Bioenvironmental Systems Engineering, National Taiwan Univ.
Random experiment → Sample space → Event space → Probability space
P{1 , 2 } is defined whereas X {1 , 2} is not defined.
P X    r   P Ar   P : X ()  r
P{1 , 2}  P X  X (1 ) or X  X (2 )
Cumulative distribution function
(CDF)
• The cumulative distribution function of a
random variable X, denoted by $F_X(\cdot)$, is
defined to be
$$F_X(x) = P[X \le x] = P\{\omega : X(\omega) \le x\}, \qquad \forall x \in \mathbb{R}$$
• Consider the experiment of tossing two fair
coins. Let random variable X denote the
number of heads. CDF of X is
$$F_X(x) = \begin{cases} 0, & x < 0 \\ 0.25, & 0 \le x < 1 \\ 0.75, & 1 \le x < 2 \\ 1, & 2 \le x \end{cases}$$
$$F_X(x) = 0.25\, I_{[0,1)}(x) + 0.75\, I_{[1,2)}(x) + I_{[2,\infty)}(x)$$
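As a rough check, the indicator-function form of the two-coin CDF can be coded directly. The helper name `indicator` is a hypothetical choice for this sketch.

```python
def indicator(lo, hi):
    """Return the indicator function of the half-open interval [lo, hi)."""
    return lambda x: 1 if lo <= x < hi else 0

I_01 = indicator(0, 1)
I_12 = indicator(1, 2)
I_2inf = indicator(2, float('inf'))

def F_X(x):
    """CDF of the number of heads in two fair coin tosses."""
    return 0.25 * I_01(x) + 0.75 * I_12(x) + 1.0 * I_2inf(x)

for x in (-1, 0, 0.5, 1, 1.5, 2, 10):
    print(x, F_X(x))
```

Because each interval is closed on the left, the function takes its new value at each jump point, which matches the right-continuity of a CDF.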
Indicator function or indicator
variable
• Let $\Omega$ be any space with points $\omega$ and A any
subset of $\Omega$. The indicator function of A,
denoted by $I_A(\cdot)$, is the function with domain
$\Omega$ and counterdomain equal to the set
consisting of the two real numbers 0 and 1
defined by
$$I_A(\omega) = \begin{cases} 1 & \text{if } \omega \in A \\ 0 & \text{if } \omega \notin A \end{cases}$$
Discrete random variables
• A random variable X will be defined to be discrete if
the range of X is countable.
• If X is a discrete random variable with values
$x_1, x_2, \ldots, x_n, \ldots$, then the function denoted by
$f_X(\cdot)$ and defined by
$$f_X(x) = \begin{cases} P[X = x_j] & \text{if } x = x_j,\ j = 1, 2, \ldots, n, \ldots \\ 0 & \text{if } x \ne x_j \end{cases}$$
is defined to be the discrete density function of X.
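A minimal sketch of a discrete density function, assuming the two-coin experiment used for the CDF example (X is the number of heads). The density is built by counting equally likely outcomes; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Assumed experiment: two fair coin tosses, X = number of heads.
outcomes = list(product('HT', repeat=2))
counts = Counter(o.count('H') for o in outcomes)

def f_X(x):
    """Discrete density: P[X = x_j] at a mass point x_j, and 0 elsewhere."""
    return Fraction(counts[x], len(outcomes))

print([f_X(x) for x in (0, 1, 2)])  # mass at the points x_j
print(f_X(0.5))                     # zero away from the mass points
```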
Continuous random variables
• A random variable X will be defined to be
continuous if there exists a function $f_X(\cdot)$ such
that $F_X(x) = \int_{-\infty}^{x} f_X(u)\,du$ for every real number x.
• The function $f_X(\cdot)$ is called the probability
density function of X.
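The relation between a density and its CDF can be checked numerically. The sketch below assumes an exponential density with rate 1 as the example (not taken from the slides) and approximates the integral with the trapezoidal rule, truncating the lower limit at a finite value.

```python
import math

def f_X(u, rate=1.0):
    """Assumed example density: exponential with the given rate."""
    return rate * math.exp(-rate * u) if u >= 0 else 0.0

def cdf_from_pdf(pdf, x, lower=-10.0, n=100_000):
    """Approximate F_X(x) = integral of pdf from -inf to x (trapezoidal rule).

    The infinite lower limit is truncated at `lower`, which is adequate when
    the density is negligible below that point.
    """
    if x <= lower:
        return 0.0
    h = (x - lower) / n
    total = 0.5 * (pdf(lower) + pdf(x))
    for i in range(1, n):
        total += pdf(lower + i * h)
    return total * h

x = 2.0
print(cdf_from_pdf(f_X, x))   # numerical integral of the density
print(1 - math.exp(-x))       # closed-form exponential CDF for comparison
```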
Properties of a CDF
$$F_X(-\infty) = \lim_{x \to -\infty} F_X(x) = 0$$
$$F_X(+\infty) = \lim_{x \to +\infty} F_X(x) = 1$$
$$F_X(a) \le F_X(b) \quad \text{for } a < b$$
$F_X(\cdot)$ is continuous from the right, i.e.
$$\lim_{0 < h \to 0} F_X(x + h) = F_X(x)$$
Properties of a PDF
$$f_X(x) \ge 0 \qquad \forall x \in \mathbb{R}$$
$$\int_{-\infty}^{\infty} f_X(x)\,dx = 1$$
Example 1
• Determine which of the following are valid
distribution functions:
$$F_X(x) = \begin{cases} 1 - e^{-2x}/2, & x \ge 0 \\ e^{2x}/2, & x < 0 \end{cases}$$
$$F_X(x) = \frac{x}{a}\,[u(x - a) - u(x - 2a)], \qquad u(x) = \begin{cases} 1, & x \ge 0 \\ 0, & x < 0 \end{cases}$$
Example 2
• Determine the real constant a, for arbitrary real
constants m and b > 0, such that
$$f_X(x) = a\, e^{-|x - m| / b}, \qquad x \in \mathbb{R}$$
is a valid density function.
• The function $f_X(x)$ is symmetric about m, so
$$\int_{-\infty}^{\infty} f_X(x)\,dx = 2 \int_{m}^{\infty} a\, e^{-(x - m)/b}\,dx = 2ab \int_{0}^{\infty} e^{-y}\,dy = 2ab = 1$$
$$\Rightarrow\ a = \frac{1}{2b}$$
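The result a = 1/(2b) can be sanity-checked by numerical integration. The sketch below assumes the density has the form a·exp(-|x - m|/b), picks arbitrary values m = 3 and b = 2, and truncates the integral at m ± 40b, where the tails are negligible.

```python
import math

def density(x, m, b):
    """a * exp(-|x - m| / b) with the derived constant a = 1 / (2b)."""
    a = 1.0 / (2.0 * b)
    return a * math.exp(-abs(x - m) / b)

def integrate(f, lo, hi, n=200_000):
    """Composite trapezoidal rule on [lo, hi] with n subintervals."""
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, n):
        total += f(lo + i * h)
    return total * h

m, b = 3.0, 2.0  # arbitrary illustrative values
area = integrate(lambda x: density(x, m, b), m - 40 * b, m + 40 * b)
print(area)  # should be very close to 1
```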
Characterizing random variables
• Cumulative distribution function
• Probability density function
– Expectation (expected value)
– Variance
– Moments
– Quantile
– Median
– Mode
Expectation of a random variable
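• The expectation (or mean, expected value) of
X, denoted by $\mu_X$ or $E(X)$, is defined by:
$$E[X] = \sum_j x_j f_X(x_j)$$ if X is discrete with values $x_1, x_2, \ldots$, and
$$E[X] = \int_{-\infty}^{\infty} x f_X(x)\,dx$$ if X is continuous with density $f_X(\cdot)$.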
Rules for expectation
• Let X and Xi be random variables and c be any
real constant.
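The rules themselves did not survive in this text, so the sketch below assumes the standard ones: E[c] = c, E[cX] = cE[X], and E[X1 + X2] = E[X1] + E[X2]. It checks them exactly on an assumed sample space of two fair coin tosses; all names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Assumed sample space: (first toss, second toss), with 1 = head and 0 = tail.
OMEGA = list(product([0, 1], repeat=2))
P_OUTCOME = Fraction(1, len(OMEGA))  # equally likely outcomes

def E(rv):
    """Expectation of a random variable given as a function on OMEGA."""
    return sum(rv(w) * P_OUTCOME for w in OMEGA)

def X1(w):
    return w[0]  # indicator of a head on the first toss

def X2(w):
    return w[1]  # indicator of a head on the second toss

c = 5
print(E(lambda w: c) == c)                              # E[c] = c
print(E(lambda w: c * X1(w)) == c * E(X1))              # E[cX] = c E[X]
print(E(lambda w: X1(w) + X2(w)) == E(X1) + E(X2))      # E[X1 + X2] = E[X1] + E[X2]
```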
$$X(t) = 25 \sin(\omega t), \qquad E[X(t)] = ?$$
Variance of a random variable
• $\sigma_X = \sqrt{\mathrm{Var}(X)} \ge 0$ is called the standard
deviation of X.
$$\mathrm{Var}[X] = \sigma_X^2 = E[(X - \mu_X)^2] = E[X^2] - (E[X])^2$$
• Variance characterizes the dispersion of data
with respect to the mean. Thus, shifting a
density function does not change its variance.
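Both the identity Var[X] = E[X²] - (E[X])² and the shift invariance of variance can be verified exactly on a small example. The sketch assumes the two-coin density (X = number of heads) and an arbitrary shift of 10.

```python
from fractions import Fraction

# Assumed example: number of heads in two fair coin tosses.
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def mean(pmf):
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """Var[X] = E[(X - mu)^2]."""
    mu = mean(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

def variance_shortcut(pmf):
    """Var[X] = E[X^2] - (E[X])^2."""
    return sum(x * x * p for x, p in pmf.items()) - mean(pmf) ** 2

# Shift the whole density by 10: the mean moves, the variance does not.
shifted = {x + 10: p for x, p in pmf.items()}

print(variance(pmf), variance_shortcut(pmf))
print(mean(shifted), variance(shifted))
```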
Rules for variance
• Two random variables are said to be
independent if knowledge of the value
assumed by one gives no clue to the value
assumed by the other.
• Events A and B are defined to be independent
if and only if
$$P[AB] = P[A \cap B] = P[A]\,P[B]$$
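The product rule for independent events can be tested by direct enumeration. The sketch assumes two fair coin tosses; the events A, B, and C are illustrative choices.

```python
from fractions import Fraction
from itertools import product

# Assumed sample space: two tosses of a fair coin.
OMEGA = set(product('HT', repeat=2))

def P(event):
    """Probability of an event under equally likely outcomes."""
    return Fraction(len(event & OMEGA), len(OMEGA))

def independent(A, B):
    """True if and only if P[A and B] = P[A] P[B]."""
    return P(A & B) == P(A) * P(B)

A = {w for w in OMEGA if w[0] == 'H'}  # head on the first toss
B = {w for w in OMEGA if w[1] == 'H'}  # head on the second toss
C = {w for w in OMEGA if 'H' in w}     # at least one head

print(independent(A, B))  # the two tosses do not influence each other
print(independent(A, C))  # knowing A occurred changes the chance of C
```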
Moments and central moments of a
random variable
Properties of moments
Quantile
• The qth quantile of a random variable X,
denoted by $\xi_q$, is defined as the smallest
number $\xi$ satisfying $F_X(\xi) \ge q$.
[Figure: quantile illustrated for a discrete uniform distribution]
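For a discrete random variable, the "smallest number whose CDF value reaches q" can be found by accumulating the density over the sorted mass points. The sketch below assumes a fair six-sided die as the discrete uniform example; the function name `quantile` is illustrative.

```python
from fractions import Fraction

def quantile(pmf, q):
    """Smallest mass point xi with F_X(xi) >= q, for 0 < q <= 1."""
    cumulative = 0
    for x in sorted(pmf):
        cumulative += pmf[x]
        if cumulative >= q:
            return x
    raise ValueError("q must not exceed 1")

# Assumed example: discrete uniform distribution on {1, ..., 6}.
die = {k: Fraction(1, 6) for k in range(1, 7)}

print(quantile(die, Fraction(1, 2)))     # F(3) = 1/2 already reaches q
print(quantile(die, Fraction(51, 100)))  # F(3) < q, so the next point is needed
```

Exact fractions are used so that the comparison `cumulative >= q` is not affected by floating-point rounding at the jump points.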
Median and mode
• The median of a random variable is the 0.5th
quantile, or $\xi_{0.5}$.
• The mode of a random variable X is defined as
the value u at which $f_X(u)$ is the maximum
of $f_X(\cdot)$.
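A short sketch of both definitions for a discrete density, assuming the two-coin example (X = number of heads). The median is taken as the smallest mass point whose cumulative probability reaches 0.5.

```python
from fractions import Fraction

# Assumed example: number of heads in two fair coin tosses.
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def median(pmf):
    """The 0.5th quantile: smallest x with F_X(x) >= 1/2."""
    cumulative = 0
    for x in sorted(pmf):
        cumulative += pmf[x]
        if cumulative >= Fraction(1, 2):
            return x

def mode(pmf):
    """A value at which the density attains its maximum."""
    return max(pmf, key=pmf.get)

print(median(pmf), mode(pmf))
```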
Note: For a unimodal, positively skewed distribution, the
mean is typically the highest measure of
central tendency and the mode the lowest.
For a negatively skewed distribution the order is
typically reversed: the mean is the lowest and the
mode the highest. In both cases the median
usually falls between the mean and the mode.
This ordering is a rule of thumb rather than a
theorem; it can fail, particularly for discrete
distributions.
Moment generating function
Usage of MGF
• MGF can be used to express moments in terms
of PDF parameters and such expressions can
again be used to express mean, variance,
coefficient of skewness, etc. in terms of PDF
parameters.
• Random variables with the same MGF have
the same probability distribution.
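The MGF definition did not survive in this text; the sketch below assumes the standard one, m_X(t) = E[e^{tX}], whose k-th derivative at t = 0 gives the k-th moment E[X^k]. It uses the two-coin density and finite-difference derivatives, so the results are approximations.

```python
import math

# Assumed example: number of heads in two fair coin tosses.
pmf = {0: 0.25, 1: 0.5, 2: 0.25}

def mgf(t):
    """m_X(t) = E[exp(tX)] for the discrete density above."""
    return sum(math.exp(t * x) * p for x, p in pmf.items())

def derivative(f, t, order, h=1e-3):
    """Central finite-difference approximation of the 1st or 2nd derivative."""
    if order == 1:
        return (f(t + h) - f(t - h)) / (2 * h)
    if order == 2:
        return (f(t + h) - 2 * f(t) + f(t - h)) / h ** 2
    raise ValueError("only orders 1 and 2 are supported")

m1 = derivative(mgf, 0.0, 1)  # approximates E[X]
m2 = derivative(mgf, 0.0, 2)  # approximates E[X^2]
var = m2 - m1 ** 2            # Var[X] = E[X^2] - (E[X])^2
print(m1, m2, var)
```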
• The moment generating function of a sum of
independent random variables is the product
of the moment generating functions of
individual random variables.
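The product property can be checked numerically for two independent discrete variables. The sketch assumes the MGF is E[e^{tX}] and uses two illustrative densities; the density of the sum is built by convolution, which is valid only under independence.

```python
import math
from itertools import product

pmf_x = {0: 0.5, 1: 0.5}                # assumed: one fair coin toss
pmf_y = {1: 1 / 3, 2: 1 / 3, 3: 1 / 3}  # assumed: uniform on {1, 2, 3}

def mgf(pmf, t):
    """E[exp(tX)] for a discrete density given as {value: probability}."""
    return sum(math.exp(t * x) * p for x, p in pmf.items())

def pmf_of_sum(px, py):
    """Density of X + Y when X and Y are independent (discrete convolution)."""
    out = {}
    for (x, p), (y, q) in product(px.items(), py.items()):
        out[x + y] = out.get(x + y, 0.0) + p * q
    return out

pmf_s = pmf_of_sum(pmf_x, pmf_y)
for t in (-1.0, 0.0, 0.5, 1.0):
    lhs = mgf(pmf_s, t)
    rhs = mgf(pmf_x, t) * mgf(pmf_y, t)
    print(t, lhs, rhs, math.isclose(lhs, rhs, rel_tol=1e-9))
```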
Expected value of a function of a random
variable
• If Y = g(X),
$$E[g(X)] = \int_{-\infty}^{\infty} g(x) f_X(x)\,dx = E[Y] = \int_{-\infty}^{\infty} y f_Y(y)\,dy$$
$$\mathrm{Var}[X] = E[(X - \mu_X)^2] = \int_{-\infty}^{\infty} (x - \mu_X)^2 f_X(x)\,dx$$
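The equality E[g(X)] = E[Y] can be illustrated with a discrete analogue, where sums replace the integrals. The sketch assumes X is uniform on {-2, -1, 0, 1, 2} and g(x) = x²; several x values map to the same y, so the density of Y is obtained by adding their probabilities.

```python
from collections import defaultdict
from fractions import Fraction

# Assumed example: X uniform on {-2, -1, 0, 1, 2}, and Y = g(X) = X^2.
f_X = {x: Fraction(1, 5) for x in (-2, -1, 0, 1, 2)}

def g(x):
    return x * x

# Route 1: average g(x) against the density of X.
E_gX = sum(g(x) * p for x, p in f_X.items())

# Route 2: derive the density of Y, then average y against it.
f_Y = defaultdict(Fraction)
for x, p in f_X.items():
    f_Y[g(x)] += p
E_Y = sum(y * p for y, p in f_Y.items())

print(dict(f_Y))
print(E_gX, E_Y)
```

The first route avoids deriving the density of Y altogether, which is the practical value of the formula.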
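[Figure: graph of Y = g(X); a value y on the Y axis corresponds to the points x1, x2, x3 on the X axis, illustrating $E[g(X)] = \int_{-\infty}^{\infty} g(x) f_X(x)\,dx = E[Y] = \int_{-\infty}^{\infty} y f_Y(y)\,dy$]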
Theorem
Chebyshev Inequality
• The Chebyshev inequality gives a bound,
which does not depend on the distribution of X,
for the probability of particular events
described in terms of a random variable and its
mean and variance.
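The statement of the inequality did not survive in this text. The sketch below assumes the common form P[|X - μ| ≥ kσ] ≤ 1/k² for k > 0 and compares the exact tail probability with the bound for a fair die, an illustrative choice. The comparison is done on squared quantities so that exact fractions can be used throughout.

```python
from fractions import Fraction

# Assumed example: a fair six-sided die.
die = {k: Fraction(1, 6) for k in range(1, 7)}

def mean(pmf):
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    mu = mean(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

def tail_probability(pmf, k):
    """Exact P[|X - mu| >= k * sigma], via (x - mu)^2 >= k^2 * sigma^2."""
    mu, var = mean(pmf), variance(pmf)
    return sum(p for x, p in pmf.items() if (x - mu) ** 2 >= k * k * var)

for k in (Fraction(1), Fraction(3, 2), Fraction(2)):
    exact = tail_probability(die, k)
    bound = 1 / (k * k)
    print(k, exact, bound, exact <= bound)
```

The bound holds for every k but is often loose, which is the price of not using any information about the distribution beyond its mean and variance.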