# Notes 8 - Wharton Statistics Department ```Statistics 510: Notes 8
I. Random Variables
So far, we have been defining probability functions in
terms of the elementary outcomes making up an
experiment’s sample space.
Thus, if two fair dice were tossed, a probability was
assigned to each of the 36 possible pairs of upturned faces:
P((3,2))=1/36, P((2,3))=1/36, P((4,6))=1/36 and so on.
We have seen that in certain situations some attribute of an
outcome may hold more interest for the experimenter than
the outcome itself.
A craps player, for example, may be concerned only that he
throws a 7, not whether the 7 was the result of a 5 and a 2, a
4 and a 3 or a 6 and a 1.
That being the case, it makes sense to replace the 36-member sample space of (x,y) pairs with the more relevant
(and simpler) 11-member set of all possible two-dice sums,
$S = \{x + y : x + y = 2, 3, \ldots, 12\}$.
This redefinition of the sample space not only changes the
number of outcomes in the space (from 36 to 11) but also
changes the probability structure. In the original sample
space, all 36 outcomes are equally likely. In the revised
sample space, the 11 outcomes are not equally likely. The
probability of getting a sum equal to 2 is 1/36[=P((1,1))],
but the probability of getting a sum equal to 3 is
2/36[=P((1,2))+P((2,1))].
In general, rules for redefining sample spaces – like going
from (x,y)’s to (x+y)’s – are called random variables.
A random variable is simply a function that is defined on
the sample space of the experiment and that assigns a
numerical value to each possible outcome of the
experiment. We denote random variables by uppercase
letters, often X, Y or Z.
A random variable that can take on a finite or at most
countably infinite number of values is said to be discrete; a
random variable that can take on values in an interval of
real numbers, bounded or unbounded, is said to be
continuous.
We will focus on discrete random variables in Chapter 4
and consider continuous random variables in Chapter 5.
Associated with each discrete random variable X is a
probability mass function (pmf) $p(a)$ that gives the
probability that X equals a:
$$p(a) = P\{X = a\} = P(\{s \in S \mid X(s) = a\}).$$
Example 1: Suppose two fair dice are tossed. Let X be the
random variable that is the sum of the two upturned faces.
X is a discrete random variable since it has finitely many
possible values (the 11 integers 2, 3, ..., 12). The
probability mass function of X is
P(X=2)=1/36
P(X=3)=2/36
P(X=4)=3/36
P(X=5)=4/36
P(X=6)=5/36
P(X=7)=6/36
P(X=8)=5/36
P(X=9)=4/36
P(X=10)=3/36
P(X=11)=2/36
P(X=12)=1/36
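The table above can be reproduced by brute-force enumeration. The following is a minimal sketch, not part of the original notes; the names `outcomes`, `counts` and `pmf` are chosen here for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate the 36 equally likely (x, y) outcomes of two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

# The random variable X maps each outcome to the sum of its faces.
counts = Counter(x + y for x, y in outcomes)

# p(a) = (number of outcomes with X = a) / 36, kept as exact fractions.
pmf = {a: Fraction(c, len(outcomes)) for a, c in sorted(counts.items())}

for a in pmf:
    print(f"P(X={a}) = {counts[a]}/36")
```

Counting outcomes per sum is exactly the definition p(a) = P({s in S | X(s) = a}) applied to an equally likely sample space.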
It is often instructive to present the probability mass
function in a graphical format, plotting $p(x_i)$ on the y-axis
against $x_i$ on the x-axis.
Example 2: Three balls are to be randomly selected without
replacement from an urn containing balls numbered 1
through 20. Let X denote the largest number selected.
X is a random variable taking on values 3, 4, ..., 20. Since
we select the balls randomly, each of the $\binom{20}{3}$
combinations of the balls is equally likely to be chosen. The
probability mass function is
$$P\{X = i\} = \frac{\binom{i-1}{2}}{\binom{20}{3}}, \quad i = 3, \ldots, 20.$$
This equation follows because the number of selections that
result in the event $\{X = i\}$ is just the number of selections
that result in the ball numbered i and two of the balls
numbered 1 through i-1 being chosen, and there are
$\binom{i-1}{2}$ such selections.
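As a sanity check on the counting argument, the closed form can be compared with a tally over every possible draw. This is an illustrative sketch only; `N`, `K`, `formula`, `tally` and `brute` are names introduced here, not in the notes.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

N, K = 20, 3  # balls numbered 1..N, K drawn without replacement

# Closed form from the text: P{X = i} = C(i-1, 2) / C(20, 3).
formula = {i: Fraction(comb(i - 1, K - 1), comb(N, K)) for i in range(K, N + 1)}

# Brute force: tally the largest ball over all C(20, 3) equally likely draws.
tally = {i: 0 for i in range(K, N + 1)}
for draw in combinations(range(1, N + 1), K):
    tally[max(draw)] += 1
brute = {i: Fraction(c, comb(N, K)) for i, c in tally.items()}

print(formula == brute)       # True
print(sum(formula.values()))  # 1
```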
Suppose the random variable X can take on values $x_1, x_2, \ldots$.
Since the probability mass function is a probability function
on the redefined sample space that considers values of X,
we have that
$$\sum_{i=1}^{\infty} P(X = x_i) = 1.$$
[This follows from
$$1 = P(S) = P\left(\bigcup_{i=1}^{\infty} \{X = x_i\}\right) = \sum_{i=1}^{\infty} P(X = x_i).$$]
Example 3: Independent trials, consisting of the flipping of
a coin having probability p of coming up heads, are
continually performed until either a head occurs or a total
of n flips is made. Let X be the random variable that
denotes the number of times the coin is flipped. The
probability mass function for X is
$$\begin{aligned}
P\{X = 1\} &= P\{H\} = p \\
P\{X = 2\} &= P\{(T, H)\} = (1 - p)p \\
P\{X = 3\} &= P\{(T, T, H)\} = (1 - p)^2 p \\
&\;\;\vdots \\
P\{X = n - 1\} &= P\{(\underbrace{T, T, \ldots, T}_{n-2}, H)\} = (1 - p)^{n-2} p \\
P\{X = n\} &= P\{(\underbrace{T, T, \ldots, T}_{n-1}, T), (\underbrace{T, T, \ldots, T}_{n-1}, H)\} = (1 - p)^{n-1}
\end{aligned}$$
As a check, note that
$$\begin{aligned}
\sum_{i=1}^{n} P\{X = i\} &= \sum_{i=1}^{n-1} p(1 - p)^{i-1} + (1 - p)^{n-1} \\
&= p\left[\frac{1 - (1 - p)^{n-1}}{1 - (1 - p)}\right] + (1 - p)^{n-1} \\
&= 1 - (1 - p)^{n-1} + (1 - p)^{n-1} \\
&= 1.
\end{aligned}$$
II. Expected Value
Probability mass functions provide a global overview of a
random variable’s behavior. Detail that explicit, though, is
not always necessary – or even helpful. Often, we
want to focus the information contained in the pmf by
summarizing certain of its features with single numbers.
The first feature of a pmf that we will examine is central
tendency, a term referring to the “average” value of a
random variable.
The most frequently used measure for describing central
tendency is the expected value.
Motivation for expected value: Let $X_1, \ldots, X_n$ be the sums
of the two dice in n independent throws of two dice. The
mean of $X_1, \ldots, X_n$ is
$$\sum_{i=2}^{12} i \cdot \frac{\#\text{ of throws in which } X = i}{n}.$$
From the frequentist definition of probability, as n becomes
large, $\frac{\#\text{ of throws in which } X = i}{n}$ becomes close to $p(i)$,
so that the mean of $X_1, \ldots, X_n$ becomes close to
$$\sum_{i=2}^{12} i \cdot p(i).$$
This last quantity, $\sum_{i=2}^{12} i \cdot p(i)$, is called the
expected value of X; it is what the mean value of X over
many repeated experiments converges to.
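The convergence described here can be observed in a small simulation. This is a rough sketch, not part of the notes: the function `mean_dice_sum`, the seed, and the sample sizes are all assumptions made for illustration, and the printed means will vary with the seed.

```python
import random

def mean_dice_sum(n, seed=0):
    """Average of the two-dice sum over n simulated throws."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total = sum(rng.randint(1, 6) + rng.randint(1, 6) for _ in range(n))
    return total / n

# The sample mean should settle near the weighted sum of i * p(i) as n grows.
for n in (10, 1_000, 100_000):
    print(n, mean_dice_sum(n))
```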
Generally, for a discrete random variable, the expected
value of a random variable X is a weighted average of the
possible values X can take on, each value being weighted
by the probability that X assumes it:
$$E[X] = \sum_{x : p(x) > 0} x\,p(x).$$
Example 1 continued: The expected value of the random
variable X is
$$\begin{aligned}
E[X] &= 2(1/36) + 3(2/36) + 4(3/36) + 5(4/36) + 6(5/36) + 7(6/36) \\
&\quad + 8(5/36) + 9(4/36) + 10(3/36) + 11(2/36) + 12(1/36) = 7.
\end{aligned}$$
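The weighted sum is easy to evaluate mechanically. In the sketch below, `expected_value` and `dice_pmf` are names introduced here, and the expression (6 - |a - 7|)/36 is only a compact restatement of the eleven probabilities listed in Example 1, not a formula from the notes.

```python
from fractions import Fraction

def expected_value(pmf):
    """E[X] = sum of x * p(x) over the values x with p(x) > 0."""
    return sum(x * p for x, p in pmf.items() if p > 0)

# pmf of the two-dice sum: p(a) = (6 - |a - 7|) / 36 for a = 2, ..., 12.
dice_pmf = {a: Fraction(6 - abs(a - 7), 36) for a in range(2, 13)}

print(expected_value(dice_pmf))  # 7
```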
Example 2 continued: The expected value of the random
variable X is
$$E(X) = \sum_{i=3}^{20} i \, \frac{\binom{i-1}{2}}{\binom{20}{3}} = 15.75.$$
IV. Expectation of a Function of a Random Variable
(Chapter 4.4)
Suppose we are given a discrete random variable X along
with its pmf and that we want to compute the expected
value of some function of X, say g(X).
One approach is to directly determine the pmf of g(X).
Example 3: Let X denote a random variable that takes on
the values -1, 0, 1 with respective probabilities
P{X=-1}=.2, P{X=0}=.5, P{X=1}=.3.
Compute $E(X^2)$.
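One way to carry out the direct approach is sketched below. The symbol Y for the new random variable X^2 is introduced here for illustration and does not appear in the notes.

```latex
\begin{align*}
Y &= X^2 \text{ takes the values } 0 \text{ and } 1, \\
P\{Y = 1\} &= P\{X = -1\} + P\{X = 1\} = .2 + .3 = .5, \\
P\{Y = 0\} &= P\{X = 0\} = .5, \\
E[X^2] = E[Y] &= 1(.5) + 0(.5) = .5 .
\end{align*}
```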
Although the procedure we used in Example 3 will always
enable us to compute the expected value of g(X) from
knowledge of the pmf of X, there is another way of thinking
about E[ g ( X )] . Noting that g(X) will equal g(x) whenever
X is equal to x, it seems reasonable that E[ g ( X )] should
just be a weighted average of the values g(x) with g(x)
being weighted by the probability that X is equal to x.
Proposition 4.1: If X is a discrete random variable that
takes on one of the values $x_i$, $i \ge 1$, with respective
probabilities $p(x_i)$, then for any real-valued function g,
$$E[g(X)] = \sum_i g(x_i)\, p(x_i).$$
Applying the proposition to Example 3,
$$E(X^2) = (-1)^2(.2) + 0^2(.5) + 1^2(.3) = .5$$
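The two routes to E[g(X)] can be placed side by side in code. This is a sketch under the pmf given in the example, with g(x) = x^2; the names `pmf_x`, `pmf_y`, `direct` and `via_proposition` are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction

pmf_x = {-1: Fraction(1, 5), 0: Fraction(1, 2), 1: Fraction(3, 10)}

def g(x):
    return x ** 2

# Route 1: build the pmf of Y = g(X) by pooling x values with the same g(x).
pmf_y = defaultdict(Fraction)
for x, p in pmf_x.items():
    pmf_y[g(x)] += p
direct = sum(y * p for y, p in pmf_y.items())

# Route 2 (Proposition 4.1): weight g(x) by p(x) without forming pmf_y.
via_proposition = sum(g(x) * p for x, p in pmf_x.items())

print(direct, via_proposition)  # 1/2 1/2
```

The pooling loop in Route 1 is the same regrouping of terms by the value of g(x) that the proposition's proof relies on.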
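Proof: Group together the terms of $\sum_i g(x_i) p(x_i)$ that share the
same value of $g(x_i)$, and let $y_j$, $j \ge 1$, denote the distinct
values of $g(x_i)$, $i \ge 1$. Then
$$\begin{aligned}
\sum_i g(x_i)\, p(x_i) &= \sum_j \sum_{i : g(x_i) = y_j} g(x_i)\, p(x_i) \\
&= \sum_j y_j \sum_{i : g(x_i) = y_j} p(x_i) \\
&= \sum_j y_j P\{g(X) = y_j\} \\
&= E[g(X)].
\end{aligned}$$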
A corollary of Proposition 4.1 is:
Corollary 4.1: If a and b are constants, then
$$E[aX + b] = aE[X] + b.$$
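Proof:
$$\begin{aligned}
E[aX + b] &= \sum_{x : p(x) > 0} (ax + b)\, p(x) \\
&= a \sum_{x : p(x) > 0} x\, p(x) + b \sum_{x : p(x) > 0} p(x) \\
&= aE[X] + b,
\end{aligned}$$
where the last step uses $\sum_{x : p(x) > 0} p(x) = 1$.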
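The corollary can be spot-checked on a concrete pmf. The sketch below reuses the pmf with values -1, 0, 1 from the earlier example; the constants a = 3 and b = 2 and the helper `expected_value` are hypothetical choices made only for illustration.

```python
from fractions import Fraction

def expected_value(pmf, g=lambda x: x):
    """E[g(X)] = sum of g(x) * p(x) over x with p(x) > 0 (Proposition 4.1)."""
    return sum(g(x) * p for x, p in pmf.items() if p > 0)

# pmf with values -1, 0, 1 and probabilities .2, .5, .3.
pmf = {-1: Fraction(1, 5), 0: Fraction(1, 2), 1: Fraction(3, 10)}
a, b = 3, 2  # hypothetical constants chosen for illustration

lhs = expected_value(pmf, lambda x: a * x + b)  # E[aX + b]
rhs = a * expected_value(pmf) + b               # aE[X] + b
print(lhs, rhs)  # 23/10 23/10
```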
```