N03-Discrete Random Variables

BIOINF 2118
Definition of a Random Variable
Consider an experiment with a sample space $\mathcal{X}$. A random variable is a real-valued
function that is defined on the sample space $\mathcal{X}$. We write
$X : \mathcal{X} \to \mathbb{R}$.
Capital Roman letters = random variables.
Lower case Roman letters = values of a random variable.
We use A for a subset of the sample space $\mathcal{X}$, and B for a subset of $\mathbb{R}$.
Random Variable Example
Consider an experiment in which we roll two dice.
The sample space is $\{(i, j) : i, j \in \{1, \dots, 6\}\}$, the 36 ordered pairs (first die, second die).
Let X denote the sum of the two dice. X is a random variable.
Let Y denote the value of the first die. Y is a random variable.
Let Z denote the value of the first die divided by the value of the second die. Z is a r.v.
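The point that X, Y and Z are all just functions on the same sample space can be sketched in a few lines of Python. This is only an illustration: the names `sample_space`, `X`, `Y` and `Z` are chosen here to mirror the notes, and outcomes are assumed to be ordered pairs (first die, second die).

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 ordered pairs (first die, second die).
sample_space = list(product(range(1, 7), repeat=2))

# Each random variable is a function from an outcome to a real number.
def X(outcome):
    return outcome[0] + outcome[1]            # sum of the two dice

def Y(outcome):
    return outcome[0]                         # value of the first die

def Z(outcome):
    return Fraction(outcome[0], outcome[1])   # first die divided by second die

outcome = (2, 4)
print(X(outcome), Y(outcome), Z(outcome))     # 6 2 1/2
```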
Types of Random Variables
Discrete random variable:
• Sample space $\mathcal{X}$ is countable (usually {0,1}, {0,1,...,n}, or {0,1,2,...}).
• The distribution is described by a probability mass function.
Continuous random variable:
• $\mathcal{X}$ is an interval (usually [0,1], [0, ∞), or (-∞, ∞)).
• Therefore $\mathcal{X}$ is UNcountably infinite.
• The distribution is described by a probability density function.
RVs discard information
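Consider the sample space $\mathcal{X}_{\text{dice}}$ of results from throwing two dice.
The size, or cardinality, of $\mathcal{X}_{\text{dice}}$, written $|\mathcal{X}_{\text{dice}}|$, is 36 (or 21 if the dice are indistinguishable).
Let S = the sum of the two dice. The sample space for S is $\mathcal{X}_{\text{sum}} = \{2, 3, \dots, 12\} \subset \mathbb{R}$.
$|\mathcal{X}_{\text{sum}}|$ = _____?
Then S is a function $S : \mathcal{X}_{\text{dice}} \to \mathcal{X}_{\text{sum}}$, with $S((d_1, d_2)) = d_1 + d_2$.
Let B be the event “S = 4”. So $A = S^{-1}(B) = \{(1,3), (2,2), (3,1)\}$ and $S(A) = B = \{4\}$.
Three outcomes in $\mathcal{X}_{\text{dice}}$ map to the same value of S in $\mathcal{X}_{\text{sum}}$.
You’ve “lost information”. (But is it valuable information? Later we define “sufficient statistic”.)
Then, for fair dice, $\Pr(S = 4) = \Pr(A) = 3/36 = 1/12$.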
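As a rough sketch of how three outcomes collapse to one value, the preimage of the event “S = 4” can be found by filtering the dice sample space. The code assumes distinguishable, fair dice (all 36 ordered outcomes equally likely); the variable names are illustrative only.

```python
from fractions import Fraction
from itertools import product

dice_space = list(product(range(1, 7), repeat=2))   # 36 ordered outcomes

def S(outcome):
    return sum(outcome)                             # sum of the two dice

B = {4}                                             # the event "S = 4"
A = [w for w in dice_space if S(w) in B]            # preimage of B under S
prob_B = Fraction(len(A), len(dice_space))          # assumes equally likely outcomes

print(A)        # [(1, 3), (2, 2), (3, 1)]
print(prob_B)   # 1/12
```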
Distribution of a RV
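For any random variable S, the probability distribution of S specifies the probability that S is in (almost) any
subset A of $\mathbb{R}$:
$\Pr(S \in A) = \Pr(\{x \in \mathcal{X} : S(x) \in A\}) = \Pr(S^{-1}(A))$.
The first “Pr” is on the sample space of S, which is a subset of $\mathbb{R}$. (A is a subset of $\mathbb{R}$.)
The other two “Pr”s are on the original full sample space $\mathcal{X}$.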
Distribution of a Discrete RV
The distribution of a discrete random variable may be represented by a probability mass
function (p.m.f.) f defined by:
$f(s) = \Pr(S = s)$
for every possible value s of the random variable S.
P.M.F. : The Dice Example
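The p.m.f. of S (the sum of two fair dice) is:
$f(s) = \frac{6 - |s - 7|}{36}$ for s = 2, 3, …, 12, and f(s) = 0 otherwise.
That is, f(2) = 1/36, f(3) = 2/36, …, f(7) = 6/36, …, f(11) = 2/36, f(12) = 1/36.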
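One way to tabulate this p.m.f. is to count how many of the 36 equally likely outcomes give each sum. The sketch below assumes fair dice and checks the counts against the closed form (6 - |s - 7|)/36; the names `counts` and `pmf` are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count how many ordered outcomes (a, b) produce each sum.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))

# p.m.f. of S under the assumption that all 36 outcomes are equally likely.
pmf = {s: Fraction(c, 36) for s, c in sorted(counts.items())}

for s, prob in pmf.items():
    # Each line shows s, f(s), and the closed-form value for comparison.
    print(s, prob, Fraction(6 - abs(s - 7), 36))
```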
Some Common Discrete Distributions
The symbol ~ is read thus: “is distributed as”. (Notice we can use other letters: S, Z, X, ..)
Discrete Uniform: S ~ Unif({1,…,k}) if Pr(S=s) = 1/k for s=1,…,k.
(We will encounter this with the “bootstrap” method, using the R function sample( ).)
Bernoulli: Z ~ Bernoulli(p) if
$\Pr(Z = 1) = p$ and $\Pr(Z = 0) = 1 - p$.
(Kind of exciting: flipping a coin at the beginning of the Super Bowl?)
Binomial: X ~ Bin(n,p) if
$\Pr(X = x) = \binom{n}{x} p^x (1 - p)^{n - x}$
for x ∈ {0,1,…,n}.
Also exciting!! Why? It counts things.
Consider a sequence of Bernoulli(p) outcomes Z1,..., Zn which are i.i.d.
i.i.d. = “independent, identically distributed”.
Then
$X = Z_1 + \dots + Z_n \sim \text{Bin}(n, p)$.
The Z’s could be Super Bowl coin flips over the years... or response outcomes for some patients.
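The claim that a sum of i.i.d. Bernoulli outcomes is binomial can be checked numerically for small n. The sketch below enumerates all 2^n sequences of 0s and 1s, adds up their probabilities by the value of the sum, and compares the result with the binomial formula. The function names and the values n = 4, p = 0.3 are arbitrary choices for illustration.

```python
from itertools import product
from math import comb

def binom_pmf(x, n, p):
    """Binomial formula: Pr(X = x) for X ~ Bin(n, p)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def pmf_of_sum(n, p):
    """Distribution of Z1 + ... + Zn by enumerating all 2^n outcome sequences."""
    pmf = [0.0] * (n + 1)
    for zs in product((0, 1), repeat=n):
        prob = 1.0
        for z in zs:                      # independence: multiply the probabilities
            prob *= p if z == 1 else 1 - p
        pmf[sum(zs)] += prob              # sequences with the same sum are pooled
    return pmf

n, p = 4, 0.3
enumerated = pmf_of_sum(n, p)
formula = [binom_pmf(x, n, p) for x in range(n + 1)]
for x in range(n + 1):
    print(x, round(enumerated[x], 6), round(formula[x], 6))
```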
The Binomial Coefficient $\binom{n}{x} = \frac{n!}{x!\,(n - x)!}$ is read “n choose x”,
because it is the number of ways to choose a subset of x things from a set of size n.
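For n = 3:

| Outcome in sample space $\mathcal{X}$ | X = Z1 + Z2 + Z3 |
|---|---|
| HHH | 3 |
| HHT | 2 |
| HTH | 2 |
| HTT | 1 |
| THH | 2 |
| THT | 1 |
| TTH | 1 |
| TTT | 0 |

| x | $\binom{3}{x}$ | # subsets = number of outcomes in $\mathcal{X}$ that map to x |
|---|---|---|
| 0 | $\binom{3}{0}$ | 1 |
| 1 | $\binom{3}{1}$ | 3 |
| 2 | $\binom{3}{2}$ | 3 |
| 3 | $\binom{3}{3}$ | 1 |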
See also the document N03-Discrete Random Variables-whiteboards.docx
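Multinomial: $(M_1, \dots, M_k) \sim \text{Multinomial}(n; p_1, \dots, p_k)$ if
$\Pr(M_1 = m_1, \dots, M_k = m_k) = \frac{n!}{m_1! \cdots m_k!}\, p_1^{m_1} \cdots p_k^{m_k}$.
k = # categories, $\sum_{j=1}^{k} m_j = n$, $\sum_{j=1}^{k} p_j = 1$.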
See the tables from last week’s class – the prisoners’ picnic: what’s k?
Study the notation! Ask for clarification if unfamiliar.
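To make the multinomial notation concrete, here is a minimal sketch of the p.m.f., assuming the usual form n!/(m_1! ⋯ m_k!) · p_1^{m_1} ⋯ p_k^{m_k}. The function name `multinomial_pmf` and the fair-die example are illustrative and do not come from the course materials.

```python
from fractions import Fraction
from math import factorial, prod

def multinomial_pmf(counts, probs):
    """Pr(M1 = m1, ..., Mk = mk) for category counts m and probabilities p."""
    n = sum(counts)                  # the counts must add up to n
    coef = factorial(n)
    for m in counts:
        coef //= factorial(m)        # n! / (m1! * ... * mk!), always an integer
    return coef * prod(p**m for p, m in zip(probs, counts))

# Hypothetical example: roll a fair die 6 times (k = 6 categories)
# and see each face exactly once.
print(multinomial_pmf([1] * 6, [Fraction(1, 6)] * 6))   # 5/324

# With k = 2 categories the multinomial reduces to the binomial.
print(multinomial_pmf([2, 1], [0.5, 0.5]))              # 0.375
```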
Exercise: You have 10 scrabble tiles: S T A T I S T I C S. If you scramble them face down, then put
them in a line, and turn them over, what is the chance that they spell “STATISTICS”?
Hint: How many permutations (orderings) are there? (“Permutation” is a new word.)
In how many of those do all 3 S’s land on the S spots? Etc.
What are the p’s? What are the m’s?
See the document
“multinomial and the probability of getting the right letters in the right order.docx”
Geometric Distribution
Notation:
X ~ Geom(p) or X ~ NegBin(1,p) , where 0<p<1
Negative Binomial
Notation:
X ~ NegBin(r,p) , where r is a positive integer and 0<p<1.
The pmf is:
$f(x) = \binom{r + x - 1}{x}\, p^r (1 - p)^x$ for x = 0, 1, 2, ….
This distribution may describe the number of tails obtained while repeatedly flipping a coin
until r heads are obtained, for a coin that has probability p of landing heads.
Confusing: it counts tails, not heads. The # of heads is fixed.
The # tails is UNBOUNDED.
The negative binomial differs from binomial ONLY in the STOPPING RULE.
They have the “same” (proportional) likelihood function.
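A small numerical sketch of the negative binomial, under the parametrization described above (X counts tails before the r-th head, and p is the probability of heads). The geometric case is taken as r = 1, following the notation Geom(p) = NegBin(1, p). Function names are illustrative, and other books and packages may parametrize differently.

```python
from math import comb

def nbinom_pmf(x, r, p):
    """Pr(X = x): x tails before the r-th head, with Pr(head) = p."""
    return comb(x + r - 1, x) * p**r * (1 - p)**x

def geom_pmf(x, p):
    """Geometric as the special case r = 1: x tails before the first head."""
    return nbinom_pmf(x, 1, p)

r, p = 3, 0.4
# The number of tails is unbounded, so the sum is truncated at a large value.
total = sum(nbinom_pmf(x, r, p) for x in range(500))
print(total)                 # very close to 1
print(geom_pmf(2, 0.5))      # 0.125 (tail, tail, head)
```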
Poisson Distribution
X ~ Poisson(λ) if
$\Pr(X = x) = \frac{e^{-\lambda} \lambda^x}{x!}$
for x=0,1,2,….
The Poisson distribution is VERY exciting!
Often appropriate for count data when there is no natural upper bound.
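As an illustrative check, the Poisson p.m.f. e^{-λ} λ^x / x! can be evaluated directly and its mean and variance approximated by truncating the infinite sum. The value λ = 2.5 and the truncation point 60 are arbitrary choices made for this sketch.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """Pr(X = x) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**x / factorial(x)

lam = 2.5
xs = range(60)   # truncation: the terms beyond this are negligible for lam = 2.5
total = sum(poisson_pmf(x, lam) for x in xs)
mean = sum(x * poisson_pmf(x, lam) for x in xs)
var = sum((x - mean) ** 2 * poisson_pmf(x, lam) for x in xs)
print(total, mean, var)   # approximately 1, lam, lam
```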
Markov chains
Suppose the sample space at each time t = 1,2,3,... is {A, B, C, D}, called “states”.
At each time t, we’ll write the current state as $X_t$.
The key assumption is: given the current state $X_t$, the distribution of the next state $X_{t+1}$ is the same
regardless of the earlier states $X_1, \dots, X_{t-1}$.
This is a “memoryless” property, a special case of conditional independence:
$\Pr(X_{t+1} \mid X_1, \dots, X_{t-1}, X_t) = \Pr(X_{t+1} \mid X_t)$.
Markov chains are tremendously useful in many ways,
especially for (a) modeling processes, (b) devising computational methods.
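A minimal simulation sketch of a Markov chain on the states {A, B, C, D}. The transition probabilities below are hypothetical and are not taken from the notes. The point is only that each step looks at the current state and nothing earlier.

```python
import random

states = ["A", "B", "C", "D"]

# Hypothetical transition probabilities: for each current state,
# the list gives Pr(next state = A, B, C, D). Each row sums to 1.
transition = {
    "A": [0.5, 0.5, 0.0, 0.0],
    "B": [0.25, 0.5, 0.25, 0.0],
    "C": [0.0, 0.25, 0.5, 0.25],
    "D": [0.0, 0.0, 0.5, 0.5],
}

def simulate_chain(start, n_steps, rng):
    """Simulate X_1, ..., X_{n_steps + 1}, starting from the state `start`."""
    path = [start]
    for _ in range(n_steps):
        current = path[-1]
        # Memoryless step: the draw uses only the current state.
        nxt = rng.choices(states, weights=transition[current])[0]
        path.append(nxt)
    return path

rng = random.Random(2118)    # fixed seed so the run is reproducible
path = simulate_chain("A", 10, rng)
print("".join(path))
```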
Cumulative Distribution Function
Another representation of the distribution of a random variable is given by the
cumulative distribution function (c.d.f).
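The cdf of a random variable X is the function F defined by:
$F(x) = \Pr(X \le x)$
for $-\infty < x < \infty$.
F is a non-decreasing function, continuous from the right, with
$\lim_{x \to -\infty} F(x) = 0$ and $\lim_{x \to \infty} F(x) = 1$.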
CDF: The discrete case
$F(x) = \sum_{t \le x} f(t)$, where f is the p.m.f. and the sum is over the possible values t of X.
This is a step function.
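The step-function behavior can be seen by building a discrete c.d.f. as a running sum of the p.m.f. The sketch below does this for a binomial; `binom_cdf` is a hand-written illustration of the definition F(x) = Pr(X ≤ x), not a library routine.

```python
from math import comb, floor

def binom_pmf(t, n, p):
    return comb(n, t) * p**t * (1 - p)**(n - t)

def binom_cdf(x, n, p):
    """F(x) = Pr(X <= x) for any real x: sum the p.m.f. over values t <= x."""
    if x < 0:
        return 0.0
    top = min(floor(x), n)                 # largest possible value not exceeding x
    return sum(binom_pmf(t, n, p) for t in range(top + 1))

n, p = 3, 0.5
# F is flat between integers and jumps at 0, 1, 2, 3.
for x in (-0.5, 0, 0.9, 1, 1.5, 2, 3, 10):
    print(x, binom_cdf(x, n, p))
```

F(1) and F(1.5) agree because no probability mass lies strictly between 1 and 2.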
In R, the CDFs are obtained from the functions beginning with “p” for probability: pbinom,
ppois, etc.

| Distribution | r: random | p: CDF | q: quantile | d: prob mass (“density”) |
|---|---|---|---|---|
| binom | rbinom() | pbinom() | qbinom() | dbinom() |
| geom | rgeom() | pgeom() | qgeom() | dgeom() |
| nbinom | rnbinom() | pnbinom() | qnbinom() | dnbinom() |
| pois | rpois() | ppois() | qpois() | dpois() |
| multinom | rmultinom() | (not in base R) | (not in base R) | dmultinom() |
| Variable | Bernoulli | Binomial | Geometric | Negative Binomial | Poisson |
|---|---|---|---|---|---|
| Type | Discrete | Discrete | Discrete | Discrete | Discrete |
| Sample space | {0,1} | {0,1,…,n} | {0,1,…} | {0,1,…} | {0,1,…} |
| Meaning of x | #(heads) | #(heads) | #(tails) | #(tails) | count |
| Meaning of p or λ | Pr(heads) | Pr(heads) | Pr(tails) | Pr(tails) | Pr(count) |
| Sample size | 1 | n | 1 | r | 1 (?) |
| Pr | | | | | |
| E(X) | p | np | | | λ |
| V(X) | p(1 - p) | np(1 - p) | | | λ |
| CV(X) | | | | | |
Binomial: Sum of independent Bernoulli trials with the same probability of success for each trial.
Stopping rule: The total sample size (n) is fixed in advance.
x = number of successes (heads) in the first n trials.
Geometric: Independent Bernoulli trials.
Stopping rule: Stop at the first success.
x = number of failures (tails) before the first success.
Negative Binomial: Sum of independent Bernoulli trials with the same probability of success for each trial.
Stopping rule: The number of successes is fixed in advance (r).
x = number of failures before the rth success.
(CAUTION: the parametrization is not entirely consistent across books and
software packages.)