Reinforcement Learning Assignment: Probability, Markov Chains

Assignment-1
DA671: Introduction To Reinforcement Learning
24 January 2023, Due by: 5 February 2023

1 Probability Review

(a) The probability mass function of a random variable $X$ is given by $p(i) = c \lambda^i / i!$, $i = 0, 1, 2, \ldots$, where $\lambda$ is some positive value. Find (i) $P(X > 0)$ and (ii) $P(X > 3)$.

(b) Let $X$ represent the difference between the number of heads and the number of tails obtained when a coin is tossed $n$ times. Find $E[X]$ when $n = 3$.

(c) How many people are needed so that the probability that at least one of them has the same birthday as you is greater than 1/2?

2 Markov Chain

(a) A certain calculating machine uses only the letters A and B. It is supposed to transmit one of these letters through several stages. However, at every stage, there is a probability $r$ that the letter that enters the stage will be changed when it leaves and a probability $s = 1 - r$ that it won’t. Form a Markov chain to represent the process of transmission, taking the letters A and B as states. What is the matrix of transition probabilities? Assume that the process begins in state A and moves through two stages of transmission. What is the probability that the machine, after two stages, produces the letter A (i.e., the correct letter)?

(b) Check whether the following Markov chains are irreducible and aperiodic.

(i) $\begin{pmatrix} 0.5 & 0.5 \\ 1 & 0 \end{pmatrix}$

(ii) $\begin{pmatrix} 1/3 & 0 & 2/3 \\ 0 & 0.5 & 0.5 \\ 0 & 1/5 & 4/5 \end{pmatrix}$

(iii) $\begin{pmatrix} 1/2 & 1/2 & 0 \\ 0 & 1/2 & 1/2 \\ 1/3 & 1/3 & 1/3 \end{pmatrix}$

3 A Programming Exercise

In the course, you have seen that $P^n \to \pi$ as $n \to \infty$. In other words, if a Markov chain is run for a long enough time, the probability of being in a state does not depend on the initial state. Hence, $P^n$ converges to a matrix whose rows are all the same.

Consider a two-state Markov chain with

$$P = \begin{pmatrix} 1-p & p \\ p & 1-p \end{pmatrix}.$$

Compute $P^n$, measure the Kullback–Leibler (KL) divergence between the two rows of $P^n$, and plot it as a function of $n$. Stop when the KL divergence between the two rows of $P^n$ (representing two Bernoulli distributed random variables) becomes zero. Do this for $p = 0.8$ and $p = 0.9$. Explain the behavior observed in the plot.

Note: The KL divergence between the two rows of a matrix $\begin{pmatrix} a & 1-a \\ b & 1-b \end{pmatrix}$ is defined as

$$D(a \| b) = a \log \frac{a}{b} + (1 - a) \log \frac{1-a}{1-b}.$$
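As a starting point for the programming exercise, the loop over $n$ might be sketched as below. This is a minimal sketch, not a reference solution: the helper names `matmul2`, `kl_bernoulli`, and `kl_trace`, the tolerance `tol`, and the cap `max_n` are all assumptions introduced here for illustration. It uses plain Python lists so that it runs without extra libraries, and it leaves the plotting step out.

```python
import math


def matmul2(A, B):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def kl_bernoulli(a, b):
    """D(a||b) = a log(a/b) + (1-a) log((1-a)/(1-b)), taking 0 log 0 as 0."""
    total = 0.0
    for x, y in ((a, b), (1.0 - a, 1.0 - b)):
        if x > 0.0:
            total += x * math.log(x / y)
    return total


def kl_trace(p, tol=1e-12, max_n=1000):
    """Return [D_1, D_2, ...], where D_n is the KL divergence between the
    two rows of P^n. Stops once D_n <= tol (assumed stand-in for "zero")
    or after max_n powers (assumed safety cap)."""
    P = [[1.0 - p, p], [p, 1.0 - p]]
    Pn = P  # Pn holds P^n, starting at n = 1
    trace = []
    for _ in range(max_n):
        # Row 0 is (a, 1-a) and row 1 is (b, 1-b) in the notation of the note.
        d = kl_bernoulli(Pn[0][0], Pn[1][0])
        trace.append(d)
        if d <= tol:
            break
        Pn = matmul2(Pn, P)
    return trace


if __name__ == "__main__":
    for p in (0.8, 0.9):
        trace = kl_trace(p)
        print(f"p = {p}: stopped at n = {len(trace)}")
```

The stopping test compares against a small tolerance rather than exact zero because floating-point rounding makes the point at which the computed divergence reaches exactly zero depend on the machine arithmetic. The returned list can be passed to any plotting tool; a logarithmic vertical axis tends to make the decay easier to read.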
