IEOR 165 Homework 3
Due March 19, 2015

Question 1. Use the one-sample Kolmogorov-Smirnov test at significance level 0.05 to test whether the following observations come from a standard uniform distribution.

0.276 0.612 0.19 0.452 0.966 0.89 0.483 0.682

Discuss the advantages and the limitations of the Kolmogorov-Smirnov test.

Question 2. Let $X_1, X_2, \dots, X_n$ be i.i.d. random variables, each with the same cumulative distribution function $F_X(x) = P(X_i < x)$. Let $X_{\max} = \max\{X_1, X_2, \dots, X_n\}$. What is the cdf of $X_{\max}$?

Question 3. Suppose that $X_1, X_2, \dots, X_n$ form a random sample from a uniform distribution on the interval $(0, \theta)$, and that the following hypotheses are to be tested:

$H_0: \theta \ge 2$
$H_1: \theta < 2$

Let $X_{\max} = \max\{X_1, X_2, \dots, X_n\}$, and consider a test whose rejection region contains all the outcomes for which $X_{\max} \le 1.5$.
a. Determine the power function of the test.
b. Determine the size of the test.

Question 4. Let $A_i$ denote categorical variables. Draw the simplicial complex associated with each null hypothesis below.
a. $A_1$, $A_2$ and $A_3$ are pairwise dependent but not jointly dependent.
b. $A_1$, $A_2$, $A_3$ and $A_4$ are jointly dependent, and $A_5$ is independent of $A_1$, $A_2$, $A_3$ and $A_4$.
c. $A_1$ and $A_2$ are dependent and independent of $A_3$, $A_4$ and $A_5$. $A_3$, $A_4$ and $A_5$ are jointly dependent.
d. $A_1$, $A_2$, $A_3$, $A_4$ and $A_5$ are independent.

Question 5.
a. Let $X_1, \dots, X_n$ be i.i.d. with probability mass function $P_\theta(X = x) = \theta^x (1 - \theta)^{1 - x}$ for $x = 0, 1$ and $0 \le \theta \le 1/2$. Find the MLE of $\theta$.
b. Let $Y_i = aX_i + \epsilon_i$ where $\epsilon_i \sim \mathrm{Uniform}(0, \theta)$. Find the MLE of $\theta$.

Question 6. Suppose i.i.d. data come from $X_i \sim N(\mu, \sigma^2)$ where $\sigma^2 = 1$. Consider the hypotheses

$H_0: \mu = 2$
$H_1: \mu = 1$

Assuming the power of the test is 0.90 and the significance level is 0.05, use a Monte Carlo algorithm to determine the threshold $k$.

Solution 1.
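For reference, the two columns of absolute differences in the table below correspond to the two-sided Kolmogorov-Smirnov statistic. The following is a sketch in our own notation: $X_{(i)}$ for the $i$-th smallest observation and $D_n$ for the statistic are assumptions, since the handout only writes $D_{\max}$.

```latex
D_n = \max_{1 \le i \le n} \max\left\{
        \left| \hat F(X_{(i)}) - F(X_{(i)}) \right|,\;
        \left| \hat F(X_{(i-1)}) - F(X_{(i)}) \right|
      \right\},
\qquad
\hat F(X_{(i)}) = \frac{i}{n}, \quad \hat F(X_{(0)}) = 0, \quad
F(x) = x \ \text{ for } 0 \le x \le 1 .
```

Both differences are needed because the empirical cdf jumps at each observation, so the largest gap can occur just before or just after a jump.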
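Sorted X_i   F(X_i)   F̂(X_i)   F̂(X_{i-1})   |F̂(X_i) - F(X_i)|   |F̂(X_{i-1}) - F(X_i)|
0.190        0.190    0.125    0.000        0.065               0.190
0.276        0.276    0.250    0.125        0.026               0.151
0.452        0.452    0.375    0.250        0.077               0.202
0.483        0.483    0.500    0.375        0.017               0.108
0.612        0.612    0.625    0.500        0.013               0.112
0.682        0.682    0.750    0.625        0.068               0.057
0.890        0.890    0.875    0.750        0.015               0.140
0.966        0.966    1.000    0.875        0.034               0.091

$D_{\max} = 0.202$ and $D_{0.05,8} = 0.457$. Since $D_{\max} < D_{\text{critical}}$, $H_0$ cannot be rejected.

Advantages
- Can work with very small samples
- No class intervals need to be chosen, so there is no subjectivity in binning

Limitations
- Only applies to continuous distributions
- Assumes all parameters of the hypothesized distribution are known
- More sensitive near the center of the distribution than at the tails

Solution 2. By independence,

$$
\begin{aligned}
F_{X_{\max}}(x) &= P(X_{\max} < x) \\
&= P(X_1 < x, X_2 < x, \dots, X_n < x) \\
&= P(X_1 < x)\,P(X_2 < x) \cdots P(X_n < x) \\
&= F_X(x)^n .
\end{aligned}
$$

Solution 3.
a.
$$
\begin{aligned}
\beta(\theta) &= P_\theta(\{X_1, X_2, \dots, X_n\} \in R) \\
&= P_\theta(\max\{X_1, X_2, \dots, X_n\} \le 1.5) \\
&= P_\theta(X \le 1.5)^n \quad \text{where } X \sim \mathrm{Unif}(0, \theta) \\
&= \begin{cases} 1 & \text{if } \theta \le 1.5 \\ \left(\dfrac{1.5}{\theta}\right)^n & \text{if } \theta > 1.5 \end{cases}
\end{aligned}
$$

b. Since $\beta(\theta)$ is decreasing in $\theta$ for $\theta > 1.5$, the supremum over $H_0$ is attained at $\theta = 2$:
$$
\alpha = \sup_{\theta \in H_0} \beta(\theta) = \sup_{\theta \ge 2} \left(\frac{1.5}{\theta}\right)^n = \left(\frac{1.5}{2}\right)^n = \left(\frac{3}{4}\right)^n .
$$

Solution 4.
[Figures: simplicial complexes for parts a, b, c and d; the drawings are missing from this copy.]

Solution 5.
a. The likelihood function is
$$
L_n(\theta) = \prod_{i=1}^n P_\theta(X = X_i) = \theta^{\sum_{i=1}^n X_i} (1 - \theta)^{n - \sum_{i=1}^n X_i} .
$$
The log-likelihood function is
$$
\ell_n(\theta) = \left(\sum_{i=1}^n X_i\right) \log \theta + \left(n - \sum_{i=1}^n X_i\right) \log(1 - \theta) .
$$
Taking the first derivative and setting it to 0, we have
$$
\frac{d\ell_n}{d\theta}(\theta) = \frac{\sum_{i=1}^n X_i}{\theta} - \frac{n - \sum_{i=1}^n X_i}{1 - \theta} = 0 \;\Rightarrow\; \theta = \bar X_n .
$$
Also,
$$
\frac{d^2\ell_n}{d\theta^2}(\theta) = -\frac{\sum_{i=1}^n X_i}{\theta^2} - \frac{n - \sum_{i=1}^n X_i}{(1 - \theta)^2} < 0,
$$
so $\ell_n(\theta)$ has a global maximum at $\theta = \bar X_n$. However, since $0 \le \theta \le 1/2$, $\ell_n(\theta)$ can attain this global maximum only when $0 \le \bar X_n \le 1/2$. We know that $\bar X_n \ge 0$. When $\bar X_n > 1/2$, for every feasible $\theta \le 1/2$,
$$
\frac{d\ell_n}{d\theta}(\theta) = \frac{\sum_{i=1}^n X_i}{\theta} - \frac{n - \sum_{i=1}^n X_i}{1 - \theta} = \frac{\sum_{i=1}^n X_i - n\theta}{\theta(1 - \theta)} = \frac{n(\bar X_n - \theta)}{\theta(1 - \theta)} > 0 .
$$
So $\ell_n(\theta)$ is increasing in $\theta$ on the feasible interval and has its maximum at $\theta = 1/2$. Therefore, the MLE of $\theta$ is $\hat\theta = \min\{\bar X_n, 1/2\}$.

b. The likelihood function is
$$
L_n(\theta) = \left(\frac{1}{\theta}\right)^n I\!\left(\min_i (Y_i - aX_i) \ge 0\right) I\!\left(\max_i (Y_i - aX_i) \le \theta\right) .
$$
It is zero for $\theta < \max_i (Y_i - aX_i)$ and decreasing in $\theta$ beyond that point; therefore, the MLE of $\theta$ is $\hat\theta = \max_i (Y_i - aX_i)$.

Solution 6.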
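The script that follows rejects $H_0$ when a likelihood ratio exceeds a threshold. As a sketch of what that statistic is, assuming $\sigma = 1$ is known and writing $\mu_0 = 2$, $\mu_1 = 1$ and $\Lambda$ for the ratio (these symbols are ours; the script calls the quantity `LX` or `LY` and the threshold `c`):

```latex
\begin{aligned}
\Lambda = \frac{L(\mu_1)}{L(\mu_0)}
  &= \exp\!\left( -\tfrac{1}{2}\sum_{i=1}^n (X_i - \mu_1)^2
                  + \tfrac{1}{2}\sum_{i=1}^n (X_i - \mu_0)^2 \right) \\
  &= \exp\!\left( n(\mu_0 - \mu_1)\left( \frac{\mu_0 + \mu_1}{2} - \bar X_n \right) \right)
   = \exp\!\left( n\left( \tfrac{3}{2} - \bar X_n \right) \right), \\[4pt]
\Lambda > c \;&\Longleftrightarrow\; \bar X_n < \frac{3}{2} - \frac{\ln c}{n} .
\end{aligned}
```

Under these assumptions, a threshold on the likelihood ratio is equivalent to a threshold on the sample mean, with small sample means counting as evidence for $H_1$.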
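    % Monte Carlo search for the smallest sample size N and likelihood-ratio
    % threshold c giving estimated size <= alpha and estimated power >= beta.
    % normrnd requires the Statistics and Machine Learning Toolbox.
    alpha = 0.05;             % desired size
    beta  = 0.9;              % desired power
    M     = 100;              % repetition count
    gamma = zeros(1, M);      % rejection indicators when data come from H1
    delta = zeros(1, M);      % rejection indicators when data come from H0
    H1    = 1;                % alternative hypothesis
    H0    = 2;                % null hypothesis
    sigma = 1;                % stdev
    N     = 2:50;             % sample size vector
    c     = 0.5:0.5:20;       % threshold vector
    a = zeros(1, length(c));  % estimated size for each threshold
    b = zeros(1, length(c));  % estimated power for each threshold
    sample_size = 0;
    threshold   = 0;

    for j = 1:length(N)
        for k = 1:length(c)
            for s = 1:M
                X = normrnd(H0, sigma, [1 N(j)]);   % sample under H0
                Y = normrnd(H1, sigma, [1 N(j)]);   % sample under H1
                % likelihood ratio L(H1)/L(H0); the division by 2 uses sigma = 1
                LX = exp(-sum((X-H1).^2)/2 + sum((X-H0).^2)/2);
                LY = exp(-sum((Y-H1).^2)/2 + sum((Y-H0).^2)/2);
                delta(s) = LX > c(k);               % reject although H0 is true
                gamma(s) = LY > c(k);               % reject when H1 is true
            end
            a(k) = sum(delta)/M;
            if a(k) > alpha   % to find the max a value less than alpha
                a(k) = 0;
            end
            b(k) = sum(gamma)/M;
        end
        [maxim, lambda] = max(a);
        % maxim > 0 guards against every entry of a having been zeroed out,
        % in which case lambda would point at an invalid threshold
        if maxim > 0 && b(lambda) >= beta
            sample_size = N(j);
            threshold   = c(lambda);
            return
        end
    end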
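As a rough cross-check on the simulation, the size and power of this test have closed forms, because the likelihood ratio computed in the script equals $\exp(n(3/2 - \bar X_n))$ when $\sigma = 1$. The sketch below assumes exactly that setting; $\bar k$ (the equivalent cutoff on the sample mean) and $\Phi$ (the standard normal cdf) are our notation, not the handout's.

```latex
\begin{aligned}
\text{reject } H_0 \;&\Longleftrightarrow\; \bar X_n < \bar k, \qquad \bar k = \tfrac{3}{2} - \frac{\ln c}{n}, \\
\text{size} &= P_{\mu = 2}(\bar X_n < \bar k) = \Phi\!\left( \sqrt{n}\,(\bar k - 2) \right), \\
\text{power} &= P_{\mu = 1}(\bar X_n < \bar k) = \Phi\!\left( \sqrt{n}\,(\bar k - 1) \right), \\[4pt]
\text{size} \le 0.05,\ \text{power} \ge 0.90
  \;&\Longrightarrow\; \sqrt{n} \ge z_{0.95} + z_{0.90} \approx 1.645 + 1.282 = 2.927
  \;\Longrightarrow\; n \ge 9, \\
n = 9,\ \text{size} = 0.05
  \;&\Longrightarrow\; \bar k \approx 2 - \frac{1.645}{3} \approx 1.452, \quad
  c = e^{\,n(3/2 - \bar k)} \approx 1.54, \quad
  \text{power} \approx \Phi(1.355) \approx 0.91 .
\end{aligned}
```

The script searches thresholds on a grid with step 0.5 and uses only 100 repetitions per estimate, so its reported sample size and threshold need not match these values exactly.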