Lecture 8: Estimation
DS-GA 1002 Statistical and Mathematical Models
http://www.cims.nyu.edu/~cfgranda/pages/DSGA1002_fall15
Carlos Fernandez-Granda
11/9/2015

Estimation

Aim: Estimate a sample of a random variable X from a sample of a random vector Y

Assumption: We know their joint distribution

Outline:
- Estimation of continuous random variables
- Estimation of discrete random variables

Estimation of continuous random variables

Mean square error

A common error metric is the mean square error

  E[(X - g(Y))^2]

The conditional expectation of X given Y is the optimal estimator:

  E[X | Y] = \arg\min_g E[(X - g(Y))^2]

Example: Gangue

A mine produces a mixture of gangue (worthless material) and ore
- Ore: uniformly distributed between 0 and 1 metric ton
- Gangue: uniformly distributed between 0 and 1 metric ton

If the mixture weighs y, what is the best estimate of the amount of ore?
(A numerical check appears in the sketches at the end of these notes.)

Estimation of discrete random variables

Setting

X is discrete and takes a small number of values x_1, ..., x_m
- The MSE is not a very reasonable error metric
- The conditional mean is not necessarily restricted to {x_1, ..., x_m}
- It makes more sense to choose the best value within {x_1, ..., x_m}

Maximum-likelihood estimator

The likelihood function is

  L_y(x) := p_{Y|X}(y | x)  if Y is discrete,
  L_y(x) := f_{Y|X}(y | x)  if Y is continuous

The maximum-likelihood (ML) estimator is

  g_ML(y) := \arg\max_{u \in \{x_1, ..., x_m\}} L_y(u)
           = \arg\max_{u \in \{x_1, ..., x_m\}} \log L_y(u)

Maximum-a-posteriori estimator

The maximum-a-posteriori (MAP) estimator is

  g_MAP(y) := \arg\max_{u \in \{x_1, ..., x_m\}} p_{X|Y}(u | y)

The MAP estimator is optimal in terms of probability of error

Example: Sending bits

Communication channel: we send a bit X, with prior knowledge

  p_X(1) = 1/4,  p_X(0) = 3/4

Due to noise in the channel we observe

  Y_i = X + Z_i,  1 <= i <= n,

where Z_1, Z_2, ..., Z_n are iid Gaussian with zero mean and unit variance

- What is the ML estimator?
- What is the MAP estimator?
- What are the probabilities of error? (See the derivation and simulation sketch at the end of these notes.)

[Figure: ML vs MAP. Probability of error as a function of n (0 to 20) for the ML and MAP estimators.]
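Sketch: the gangue example

A minimal Monte Carlo sketch (not part of the original slides) for the gangue example. With ore X and gangue W independent and uniform on [0, 1], the observed mixture is Y = X + W; conditioned on Y = y, X is uniform over the feasible interval, and one can check that the MMSE estimate E[X | Y = y] works out to y/2. The code below corroborates this empirically; all variable names are illustrative.

    import numpy as np

    rng = np.random.default_rng(0)

    # Simulate ore X and gangue W, both uniform on [0, 1] metric ton (iid).
    n_samples = 10**6
    ore = rng.uniform(0.0, 1.0, n_samples)
    gangue = rng.uniform(0.0, 1.0, n_samples)
    mixture = ore + gangue  # observed total weight Y = X + W

    # Empirical conditional mean E[X | Y ~ y] for a few mixture weights y,
    # estimated from samples whose mixture falls in a narrow window around y.
    for y in [0.3, 0.8, 1.2, 1.7]:
        window = np.abs(mixture - y) < 0.01
        print(f"y = {y:.1f}: empirical E[X | Y = y] = {ore[window].mean():.3f},"
              f" y/2 = {y / 2:.3f}")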
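Sketch: ML and MAP rules for the sending-bits example

For the sending-bits example, both decision rules reduce to thresholding the sample mean of the observations. The model is the slide's; the algebra below is a supplementary derivation:

  \log L_y(x) = -\frac{1}{2} \sum_{i=1}^n (y_i - x)^2 + \text{const.}

ML rule: decide 1 iff \log L_y(1) > \log L_y(0), which simplifies to

  g_{ML}(y) = 1  \iff  \frac{1}{n} \sum_{i=1}^n y_i > \frac{1}{2}.

MAP rule: decide 1 iff p_X(1) L_y(1) > p_X(0) L_y(0). Taking logs and using p_X(1) = 1/4, p_X(0) = 3/4 shifts the threshold by (\log 3)/n:

  g_{MAP}(y) = 1  \iff  \frac{1}{n} \sum_{i=1}^n y_i > \frac{1}{2} + \frac{\log 3}{n}.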
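Sketch: simulating the probabilities of error

A simulation sketch (my own code, not from the course materials) that applies the two thresholds derived above and estimates each rule's probability of error as a function of n. It should qualitatively reproduce the ML-vs-MAP figure referenced in the slides.

    import numpy as np

    rng = np.random.default_rng(1)

    def error_probabilities(n, trials=200_000):
        """Monte Carlo estimate of P(error) for the ML and MAP rules."""
        # Draw bits with the slide's prior: P(X = 1) = 1/4, P(X = 0) = 3/4.
        x = (rng.random(trials) < 0.25).astype(float)
        # Observations Y_i = X + Z_i with Z_i iid standard Gaussian;
        # both rules depend only on the sample mean of the Y_i.
        y_mean = x + rng.standard_normal((trials, n)).mean(axis=1)
        ml_decision = (y_mean > 0.5).astype(float)
        map_decision = (y_mean > 0.5 + np.log(3) / n).astype(float)
        return (ml_decision != x).mean(), (map_decision != x).mean()

    for n in [1, 2, 5, 10, 20]:
        p_ml, p_map = error_probabilities(n)
        print(f"n = {n:2d}: P(error) ML = {p_ml:.3f}, MAP = {p_map:.3f}")

Consistent with the figure, the MAP rule should make fewer errors for small n, where the prior carries real information, and the two curves should converge as n grows, since the threshold shift (\log 3)/n vanishes.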