Assignment-1 DA671: Introduction To Reinforcement Learning
24 January 2023, Due by: 5 February 2023
1 Probability Review
(a) The probability mass function of a random variable $X$ is given by
$$p(i) = c\,\frac{\lambda^i}{i!}, \quad i = 0, 1, 2, \ldots,$$
where $\lambda$ is some positive value. Find (i) $P(X > 0)$ and (ii) $P(X > 3)$.
(b) Let X represent the difference between the number of heads and the number of tails obtained
when a coin is tossed n times. Find E[X] when n = 3.
(c) How many people are needed so that the probability that at least one of them has the same
birthday as you is greater than 1/2?
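The three parts above can be checked numerically. The following is a minimal Python sketch, not a model solution. It assumes that the constant in part (a) is $c = e^{-\lambda}$ (which follows from $\sum_i \lambda^i / i! = e^{\lambda}$), that the coin in part (b) is fair and $X$ is the signed difference heads minus tails, and that birthdays in part (c) are independent and uniform over 365 days. The function names and the value `lam = 2.0` are hypothetical, chosen only for illustration.

```python
import math
from itertools import product


def poisson_tail(lam, k):
    """P(X > k) for p(i) = c * lam**i / i!, assuming c = exp(-lam)."""
    c = math.exp(-lam)  # normalising constant: sum_i lam**i / i! = exp(lam)
    return 1.0 - sum(c * lam**i / math.factorial(i) for i in range(k + 1))


def expected_head_tail_difference(n):
    """E[heads - tails] for n fair tosses, by enumerating all 2**n outcomes."""
    total = 0
    for outcome in product("HT", repeat=n):
        heads = outcome.count("H")
        total += heads - (n - heads)
    return total / 2**n


def people_needed(threshold=0.5, days=365):
    """Smallest n with 1 - ((days - 1) / days)**n > threshold."""
    n = 0
    while 1.0 - ((days - 1) / days) ** n <= threshold:
        n += 1
    return n


if __name__ == "__main__":
    lam = 2.0  # hypothetical value; the assignment leaves lambda symbolic
    print(poisson_tail(lam, 0), poisson_tail(lam, 3))
    print(expected_head_tail_difference(3))
    print(people_needed())
```

If part (b) is instead read as the absolute difference, the summand in `expected_head_tail_difference` would need `abs(...)`, and the result changes accordingly.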
2 Markov Chain
(a) A certain calculating machine uses only the letters A and B. It is supposed to transmit
one of these letters through several stages. However, at every stage, there is a probability r
that the letter that enters the stage will be changed when it leaves and a probability s = 1 − r
that it won’t. Form a Markov chain to represent the process of transmission by taking as
states the letters A and B. What is the matrix of transition probabilities? Assume that
the process begins in state A and moves through two stages of transmission. What is the
probability that the machine, after two stages, produces the letter A (i.e., the correct
letter)?
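(b) Check whether the following Markov chains are irreducible and aperiodic.
$$\text{(i)}\ \begin{pmatrix} 0.5 & 0.5 \\ 0.5 & 0.5 \end{pmatrix} \qquad
\text{(ii)}\ \begin{pmatrix} 1/3 & 0 & 2/3 \\ 0 & 1 & 0 \\ 0 & 1/5 & 4/5 \end{pmatrix} \qquad
\text{(iii)}\ \begin{pmatrix} 1/2 & 1/2 & 0 \\ 0 & 1/2 & 1/2 \\ 1/3 & 1/3 & 1/3 \end{pmatrix}$$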
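Both parts of this question can be explored with a few lines of Python. The sketch below is illustrative only. For part (a) it builds the two-state matrix with a hypothetical value `r = 1/4` and raises it to the second power. For part (b) it relies on a standard fact: a finite chain is irreducible and aperiodic exactly when some power of its transition matrix has all entries positive, and for an $n \times n$ matrix it suffices to check powers up to $(n-1)^2 + 1$ (Wielandt's bound). The helper names and the example matrices are assumptions, not part of the assignment.

```python
from fractions import Fraction


def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(p, steps):
    """P multiplied by itself `steps` times (steps = 0 gives the identity)."""
    n = len(p)
    result = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for _ in range(steps):
        result = mat_mul(result, p)
    return result


def is_irreducible_and_aperiodic(p):
    """True if some P^k, 1 <= k <= (n-1)^2 + 1, has only positive entries."""
    n = len(p)
    # Work with the zero/non-zero pattern only, so rounding cannot interfere.
    adj = [[entry > 0 for entry in row] for row in p]
    reach = adj
    for _ in range((n - 1) ** 2 + 1):
        if all(all(row) for row in reach):
            return True
        reach = [[any(reach[i][k] and adj[k][j] for k in range(n))
                  for j in range(n)] for i in range(n)]
    return False


# Part (a) with a hypothetical flip probability r; states are ordered (A, B).
r = Fraction(1, 4)
s = 1 - r
transmission = [[s, r], [r, s]]
two_stage = mat_pow(transmission, 2)  # entry [0][0] is P(A after two stages | start in A)

# Illustrative matrix (not from the assignment): a chain that always flips state.
always_flip = [[0, 1], [1, 0]]
```

Here `two_stage[0][0]` equals `s*s + r*r`, the probability of either no change or two changes. Calling `is_irreducible_and_aperiodic(always_flip)` returns `False` because that chain has period 2, while `is_irreducible_and_aperiodic(transmission)` returns `True`.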
3 A Programming Exercise
In the course, you have seen that $P^n \to \pi$ as $n \to \infty$. In other words, if a Markov chain is run for
a long enough time, then the probability of being in a state does not depend on the initial state.
Hence, $P^n$ converges to a matrix in which all the rows are the same. Consider a two-state Markov
chain with
$$P = \begin{pmatrix} 1-p & p \\ p & 1-p \end{pmatrix}.$$
Compute $P^n$, measure the Kullback–Leibler (KL) divergence between the two rows of $P^n$, and plot it
as a function of $n$. Stop when the KL divergence between the two rows of $P^n$ (each representing a
Bernoulli-distributed random variable) becomes zero. Do this for $p = 0.8$ and $p = 0.9$. Explain the
behavior observed in the plot.
Note: The KL divergence between the two rows of a matrix
$\begin{pmatrix} a & 1-a \\ b & 1-b \end{pmatrix}$ is defined as
$$D(a\|b) = a \log\frac{a}{b} + (1-a) \log\frac{1-a}{1-b}.$$
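One possible starting point for the exercise is sketched below in plain Python. It is a hedged outline rather than a reference solution: the function names, the `max_steps` cap, and the choice of natural logarithm are assumptions. The cap is there because, in floating-point arithmetic, the divergence may reach exactly zero or may stall at rounding noise, so the loop should not rely on the stopping condition alone.

```python
import math


def kl_bernoulli(a, b):
    """D(a||b) = a log(a/b) + (1-a) log((1-a)/(1-b)), with 0 log 0 taken as 0.

    Assumes 0 < b < 1 so that both ratios are defined.
    """
    total = 0.0
    for x, y in ((a, b), (1.0 - a, 1.0 - b)):
        if x > 0.0:
            total += x * math.log(x / y)
    return total


def kl_between_rows(p, max_steps=1000):
    """KL divergence between the two rows of P^n for n = 1, 2, ...

    Stops once the divergence is no longer positive or after max_steps powers.
    """
    base = [[1.0 - p, p], [p, 1.0 - p]]
    power = [row[:] for row in base]  # holds P^n, starting at n = 1
    values = []
    for _ in range(max_steps):
        # Each row is (q, 1 - q), so its first entry identifies the distribution.
        kl = kl_bernoulli(power[0][0], power[1][0])
        values.append(kl)
        if kl <= 0.0:
            break
        power = [[sum(power[i][k] * base[k][j] for k in range(2))
                  for j in range(2)] for i in range(2)]
    return values


if __name__ == "__main__":
    for p in (0.8, 0.9):
        values = kl_between_rows(p)
        print(p, len(values), values[:5])
```

The returned list can be passed to any plotting tool, with `n = 1, 2, ...` on the horizontal axis; a logarithmic vertical axis tends to make the decay easier to read. For this particular $P$, the entries of $P^n$ are $\tfrac{1}{2} \pm \tfrac{1}{2}(1-2p)^n$, so how fast the two rows approach each other is governed by $|1 - 2p|$.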