Exercise 2.2 Assume the following sample space:

Ω = {is-noun, has-plural-s, is-adjective, is-verb}

And the function f: 2^Ω → [0,1] with the following values:

x       {is-noun}   {has-plural-s}   {is-adjective}   {is-verb}
f(x)    0.45        0.2              0.25             0.3

Can f be extended to all of 2^Ω such that it is a well-formed probability distribution? If not, how would you model these data probabilistically?

No. The four singleton events are disjoint and together make up Ω, so additivity would require P(Ω) = 0.45 + 0.2 + 0.25 + 0.3 = 1.2. In a well-formed probability distribution, P(Ω) = 1. Since ‘has-plural-s’ is dependent on ‘is-noun,’ a better way to model these data probabilistically is to define Ω as {is-noun, is-adjective, is-verb}, so that the probabilities sum to one (0.45 + 0.25 + 0.3 = 1). ‘Has-plural-s’ can then be defined as a subset of ‘is-noun.’ In that case, P(has-plural-s) = 0.2 = P(has-plural-s ∩ is-noun). Therefore, P(has-plural-s|is-noun) = 0.2/0.45 ≈ 0.44.

Exercise 2.3 Compute the probability of the event ‘A period occurs after a three-letter word and this period indicates an abbreviation (not an end-of-sentence marker),’ assuming the following probabilities:

P(is-abbreviation|three-letter-word) = 0.8
P(three-letter-word) = 0.0003

Let ‘is-abbreviation’ be event A and ‘three-letter-word’ be event B. The probability of the event ‘A period occurs after a three-letter word and this period indicates an abbreviation (not an end-of-sentence marker)’ can be expressed as P(A ∩ B). We know that P(A|B) = 0.8 and P(B) = 0.0003. We also know that P(A|B) = P(A ∩ B)/P(B). Therefore, P(A ∩ B) = P(A|B) P(B) = 0.8 * 0.0003 = 0.00024.

Exercise 2.4 Are X and Y as defined in the following table independently distributed?

x    y    p(X = x, Y = y)
0    0    0.32
0    1    0.08
1    0    0.48
1    1    0.12

X and Y are independently distributed if p(X = x, Y = y) = p(X = x) p(Y = y) for every pair of values x and y. Summing the rows of the table gives the marginals: p(X = 0) = 0.32 + 0.08 = 0.4, p(X = 1) = 0.48 + 0.12 = 0.6, p(Y = 0) = 0.32 + 0.48 = 0.8, and p(Y = 1) = 0.08 + 0.12 = 0.2. The products are 0.4 * 0.8 = 0.32, 0.4 * 0.2 = 0.08, 0.6 * 0.8 = 0.48, and 0.6 * 0.2 = 0.12, which match the joint probabilities in all four rows. X and Y are therefore independent of each other.

Exercise 2.5 In example 5, we worked out the expectation of the sum of two dice in terms of the expectation of rolling one die.
Show that one gets the same result if one calculates the expectation for two dice directly.

Example 5 computes the expectation of the sum of two dice in terms of the expectation of rolling one die as follows:

E(X) = E(Y + Y) = E(Y) + E(Y) = 3½ + 3½ = 7

Alternatively, the expectation can be computed directly from the distribution of the sum of two dice:

E(X) = 2(1/36) + 3(2/36) + 4(3/36) + 5(4/36) + 6(5/36) + 7(6/36) + 8(5/36) + 9(4/36) + 10(3/36) + 11(2/36) + 12(1/36) = 252/36 = 7

Exercise 2.6 Consider the set of grades you have received for courses taken in the last two years. Convert them to an appropriate numerical scale. What is the appropriate distribution for modeling them?

Since the binomial distribution is the only discrete probability distribution covered in detail, we can attempt to use it to model my grades in the last two years. I have taken six classes in the last two years, with grades of A, C+, A-, A, B+, A. Because binomial distributions model trials with two outcomes, let us define our outcomes as A or not A (thus, a grade of A would be a successful outcome). Let us assume that out of the sample space {A, A-, B+, B, B-, C+, C, C-, D+, D, D-, F}, each possible outcome has an equal probability of 1/12 and that the trials are independent of one another. Thus the probability of earning an A is p = 1/12, and the probability of earning another grade is 1 - p = 11/12. With n trials and r successes, the binomial distribution is

$b(r; n, p) = \binom{n}{r} p^{r} (1 - p)^{n - r}$

Given this information, the binomial distribution of my grades can be represented as follows (see note 1):

Trials (n)   Successes (r)   Binomial distribution b(r; n, 1/12)
1            1               $\binom{1}{1}\left(\frac{1}{12}\right)^{1}\left(\frac{11}{12}\right)^{0} = \frac{1}{12} \approx 0.0833$
2            1               $\binom{2}{1}\left(\frac{1}{12}\right)^{1}\left(\frac{11}{12}\right)^{1} = \frac{22}{12^{2}} \approx 0.1528$
3            1               $\binom{3}{1}\left(\frac{1}{12}\right)^{1}\left(\frac{11}{12}\right)^{2} = \frac{363}{12^{3}} \approx 0.2101$
4            2               $\binom{4}{2}\left(\frac{1}{12}\right)^{2}\left(\frac{11}{12}\right)^{2} = \frac{726}{12^{4}} \approx 0.0350$
5            2               $\binom{5}{2}\left(\frac{1}{12}\right)^{2}\left(\frac{11}{12}\right)^{3} = \frac{13310}{12^{5}} \approx 0.0535$
6            3               $\binom{6}{3}\left(\frac{1}{12}\right)^{3}\left(\frac{11}{12}\right)^{3} = \frac{26620}{12^{6}} \approx 0.0089$

Note 1: I’m not sure if this is what the problem was asking for, nor am I sure that I calculated the number of successes correctly – for each trial, I counted the number of successes that had occurred up to and including that trial. Also, the solution for the binomial distribution seems extremely small!

Exercise 2.7 Skipped – problem requires corpus data.

Exercise 2.8 Skipped for now – requires differentiation.
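
The arithmetic in Exercises 2.3 through 2.6 is easy to get wrong by hand, so a short script can serve as a sanity check. The following is only a minimal sketch: the helper names `marginal`, `is_independent`, and `binomial_pmf` are made up for illustration, exact fractions are used to avoid floating-point rounding, and the value p = 1/12 is the uniform-grade assumption made in Exercise 2.6, not something given by the exercise.

```python
from fractions import Fraction
from itertools import product
from math import comb

# Exercise 2.3: P(A and B) = P(A|B) * P(B)
p_a_given_b = Fraction(8, 10)    # P(is-abbreviation | three-letter-word)
p_b = Fraction(3, 10000)         # P(three-letter-word)
p_a_and_b = p_a_given_b * p_b

# Exercise 2.4: joint table p(X = x, Y = y), copied from the exercise
joint = {
    (0, 0): Fraction(32, 100),
    (0, 1): Fraction(8, 100),
    (1, 0): Fraction(48, 100),
    (1, 1): Fraction(12, 100),
}


def marginal(joint, axis, value):
    """Sum the joint probabilities where the variable at `axis` equals `value`."""
    return sum(p for outcome, p in joint.items() if outcome[axis] == value)


def is_independent(joint):
    """Return True if p(x, y) == p(x) * p(y) for every cell of the table."""
    return all(
        p == marginal(joint, 0, x) * marginal(joint, 1, y)
        for (x, y), p in joint.items()
    )


# Exercise 2.5: expectation of the sum of two fair dice, by enumerating
# all 36 equally likely (first die, second die) outcomes
faces = range(1, 7)
expected_sum = sum(Fraction(a + b, 36) for a, b in product(faces, repeat=2))


def binomial_pmf(r, n, p):
    """Binomial probability b(r; n, p) of r successes in n trials."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)


if __name__ == "__main__":
    print("2.3  P(A and B) =", float(p_a_and_b))
    print("2.4  independent:", is_independent(joint))
    print("2.5  E(sum of two dice) =", expected_sum)
    # Exercise 2.6, assuming p = 1/12: three A grades in six classes
    print("2.6  b(3; 6, 1/12) =", float(binomial_pmf(3, 6, Fraction(1, 12))))
```

Using `Fraction` rather than floats means the independence test compares exact values, so a table that is independent on paper is not rejected because of rounding.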