Lab 3b: Distribution of the mean

Outline
• Distribution of the mean: Normal (Gaussian)
  – Central Limit Theorem
• “Justification” of the mean as the best estimate
• “Justification” of the error propagation formula
  – Propagation of errors
  – Standard deviation of the mean

Related courses: 640:104 Elementary Combinatorics and Probability; 960:211-212 Statistics I, II.

Finite number of repeated measurements

For a finite set of N repeated measurements x_i, the average is the best estimate of the true mean, and the square of the standard deviation is the best estimate of the true variance:

    \bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i

    \sigma_x^2 = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})^2

The same set of data also provides the best estimate of the variance of the mean:

    \sigma_{\bar{x}}^2 = \frac{\sigma_x^2}{N}, \qquad \sigma_{\bar{x}} = \frac{\sigma_x}{\sqrt{N}}

The Central Limit Theorem

The mean is a sum of N independent random variables:

    \bar{x} = \frac{x_1 + x_2 + \cdots + x_N}{N}

so the PDF of the mean is the PDF of a sum of N independent random variables. In probability theory, the central limit theorem (CLT) states that, under fairly general conditions, the mean of a sufficiently large number of independent random variables, each with finite mean and variance, is approximately normally distributed:

    P(\bar{x}) \xrightarrow{\;N \to \infty\;} P_G(\bar{x})

*In practice, N just needs to be large enough.

Probability distribution function of the mean: an example of the CLT

[Figure: PDFs of x_1, x_1 + x_2, x_1 + x_2 + x_3, and x_1 + x_2 + x_3 + x_4, showing the distribution of the sum approaching a Gaussian shape.] For this example, N > 4 is “enough”.

More examples of the CLT: http://www.statisticalengineering.com/central_limit_theorem.html

The significant consequence of the CLT

Almost any set of repeated measurements of independent random variables can be effectively described by a Normal (Gaussian) distribution. E.g., N measurements x_i (i = 1, 2, …, N) of a random variable x can be divided into m subsets of sizes n_j (j = 1, 2, …, m):

    N = n_1 + n_2 + \cdots + n_m

Define the mean of each subset as a measurement of a new random variable y:

    y_j = \bar{x}_j = \frac{1}{n_j} \sum_{i=1}^{n_j} x_i

If the n_j are large enough, the PDF of the y_j is approximately a Normal (Gaussian) one.

“Justification” of the mean as the best estimate

Assume all the measured values follow a normal distribution G_{X,\sigma}(x) with unknown parameters X and \sigma.
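The convergence of P(x̄) toward a Gaussian can be checked numerically. The following is a minimal sketch (assuming NumPy; the uniform parent distribution, subset size n = 50, and the number of subsets are illustrative choices, not part of the lab):

```python
import numpy as np

rng = np.random.default_rng(0)

# 10 000 "subsets", each the mean of n samples from Uniform(0, 1),
# which has true mean 0.5 and variance 1/12.  The parent distribution
# is flat, yet the distribution of the means comes out Gaussian.
n = 50
means = rng.uniform(0.0, 1.0, size=(10_000, n)).mean(axis=1)

# CLT prediction: the means are ~ normal with sigma_mean = sigma / sqrt(n).
predicted_sd = np.sqrt(1.0 / 12.0) / np.sqrt(n)
observed_sd = means.std(ddof=1)

# For a Gaussian, about 68.3% of values lie within one sigma of the mean.
frac_within_1sd = np.mean(np.abs(means - 0.5) < predicted_sd)

print(f"predicted sd {predicted_sd:.4f}, observed {observed_sd:.4f}, "
      f"fraction within 1 sigma {frac_within_1sd:.3f}")
```

Replacing the uniform parent with any other finite-variance distribution (exponential, Poisson, …) gives the same Gaussian behavior of the means, which is the point of the theorem.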
    G_{X,\sigma}(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x - X)^2 / 2\sigma^2}

The probability of obtaining a value x_1 in a small interval dx_1 is:

    \mathrm{Prob}(x \text{ between } x_1 \text{ and } x_1 + dx_1) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x_1 - X)^2/2\sigma^2}\, dx_1 \propto \frac{1}{\sigma}\, e^{-(x_1 - X)^2/2\sigma^2}

Similarly, the probability of obtaining a value x_i (i = 1, …, N) in a small interval dx_i is:

    \mathrm{Prob}(x \text{ between } x_i \text{ and } x_i + dx_i) \propto \frac{1}{\sigma}\, e^{-(x_i - X)^2/2\sigma^2}

“Justification” of the mean as the best estimate (cont.)

The probability of N measurements with values x_1, x_2, …, x_N is the product:

    \mathrm{Prob}_{X,\sigma}(x_1, \ldots, x_N) = \mathrm{Prob}(x_1)\,\mathrm{Prob}(x_2)\cdots\mathrm{Prob}(x_N) \propto \frac{1}{\sigma^N}\, e^{-\sum_{i=1}^{N} (x_i - X)^2 / 2\sigma^2}

Here x_1, x_2, …, x_N are known, because they are the measured values; X and \sigma are the unknown parameters, and we want their best estimates from Prob_{X,\sigma}(x_1, …, x_N). A reasonable procedure is the principle of maximum likelihood: maximize Prob_{X,\sigma}(x_1, …, x_N) with respect to X, which is the same as minimizing

    \sum_{i=1}^{N} \frac{(x_i - X)^2}{2\sigma^2}

Setting the derivative with respect to X to zero:

    \sum_{i=1}^{N} (x_i - X) = 0 \quad\Rightarrow\quad \text{best estimate of } X = \frac{1}{N}\sum_{i=1}^{N} x_i = \bar{x}

Best estimate of the variance

Similarly, we can maximize Prob_{X,\sigma}(x_1, …, x_N) with respect to \sigma to get the best estimate of \sigma:

    -\frac{N}{\sigma^{N+1}}\, e^{-\sum_i (x_i - X)^2/2\sigma^2} + \frac{1}{\sigma^N}\,\frac{\sum_i (x_i - X)^2}{\sigma^3}\, e^{-\sum_i (x_i - X)^2/2\sigma^2} = 0

    \Rightarrow\quad \sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - X)^2

However, we don’t really know X. In practice we have to replace X with the mean value \bar{x}, which reduces the accuracy; to account for this, N is replaced with N − 1:

    \text{best estimate of } \sigma = \sqrt{\frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})^2}

Justification of the error propagation formula (I)

Measured quantity plus a constant: q = x + A \Rightarrow \bar{q} = X + A, \sigma_q = \sigma_x.

Measured quantity multiplied by a fixed number: q = Bx \Rightarrow \bar{q} = BX, \sigma_q = |B|\,\sigma_x.

Sum of two measured quantities, q = x + y:

    P(q) = \iint_{x + y = q} P_x(x)\, P_y(y)\, dx\, dy = \int P_x(x)\, P_y(q - x)\, dx

Assuming X = Y = 0:

    P_x(x) \propto e^{-x^2/2\sigma_x^2}, \qquad P_y(y) \propto e^{-y^2/2\sigma_y^2}

Justification of the error propagation formula (II)

Sum of two measured quantities, q = x + y:

    P(q) = \int P_x(x)\, P_y(q - x)\, dx \propto \int e^{-x^2/2\sigma_x^2}\, e^{-(q - x)^2/2\sigma_y^2}\, dx \propto e^{-q^2 / 2(\sigma_x^2 + \sigma_y^2)}

so the sum is again Gaussian, with width

    \sigma_q = \sqrt{\sigma_x^2 + \sigma_y^2}

Justification of the error propagation formula (III)

The general case: q = q(x).
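The quadrature rule σ_q = √(σ_x² + σ_y²) obtained from the convolution can be verified by simulation. A minimal sketch (assuming NumPy; the widths σ_x = 2, σ_y = 3 and the sample size are arbitrary illustrative values):

```python
import numpy as np

rng = np.random.default_rng(1)

# Two independent Gaussian "measurements" with X = Y = 0 and made-up
# widths sigma_x = 2 and sigma_y = 3 (illustrative values only).
sigma_x, sigma_y = 2.0, 3.0
x = rng.normal(0.0, sigma_x, 200_000)
y = rng.normal(0.0, sigma_y, 200_000)

# The convolution result says the width of q = x + y is the quadrature
# sum sqrt(sigma_x^2 + sigma_y^2), NOT the linear sum sigma_x + sigma_y.
q = x + y
predicted = np.hypot(sigma_x, sigma_y)   # sqrt(sigma_x**2 + sigma_y**2)
observed = q.std(ddof=1)
print(f"predicted {predicted:.3f}, observed {observed:.3f}")
```

The observed width also comes out clearly smaller than σ_x + σ_y, which is why errors "add in quadrature" rather than linearly.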
Assume \sigma_x is small enough that we are concerned only with values of x close to X. Using a Taylor expansion:

    q(x) \approx q(X) + \left.\frac{\partial q}{\partial x}\right|_{X} (x - X)

The first term is a constant and the derivative is evaluated at X, so by the rules above:

    \bar{q} = q(X), \qquad \sigma_q = \left|\frac{\partial q}{\partial x}\right| \sigma_x

Justification of the error propagation formula (IV)

The more general case: q = q(x, y). Assume \sigma_x and \sigma_y are small enough that we are concerned only with values of x close to X and of y close to Y. Using a Taylor expansion:

    q(x, y) \approx q(X, Y) + \frac{\partial q}{\partial x}(x - X) + \frac{\partial q}{\partial y}(y - Y)

    \bar{q} = q(X, Y), \qquad \sigma_q = \sqrt{\left(\frac{\partial q}{\partial x}\,\sigma_x\right)^2 + \left(\frac{\partial q}{\partial y}\,\sigma_y\right)^2}

Standard deviation of the mean

    \bar{x} = \frac{x_1 + x_2 + \cdots + x_N}{N}

Assume the x_i are normally distributed and have the same X and \sigma_x. Treating q = \bar{x} as a function of the x_i, each partial derivative is \partial q / \partial x_i = 1/N, so error propagation gives:

    \sigma_{\bar{x}} = \sqrt{\sum_{i=1}^{N} \left(\frac{\sigma_x}{N}\right)^2} = \sqrt{N}\,\frac{\sigma_x}{N} = \frac{\sigma_x}{\sqrt{N}}

Lab 3b: test the CLT

1. One run of the decay-rate (R = N/Δt) measurement
   a. Sampling rate: 0.5 second/sample (Δt)
   b. Run time: 30 seconds (# of measurements: n = 60)
   c. Calculate \bar{N}, \sigma_N, and \sigma_{\bar{N}}; plot a histogram.
2. Repeat the 30-sec run 20 times (a set of runs)
   a. Remember to save the data.
   b. Plot histograms of the 21 means with proper bin widths.
   c. Make sure you answer all the questions.
3. Sample-size (n = T/Δt) dependence
   – Keep the same configuration; vary the run time.
   – Record \bar{N} and \sigma_{\bar{N}}; plot \bar{N} vs. log n with error bar = \sigma_{\bar{N}}.

* “Origin” (OriginLab®) is more convenient than Matlab.
http://www.physics.rutgers.edu/rcem/~ahogan/326.html
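The per-run statistics of step 1c can be computed in a few lines. A hedged sketch (assuming NumPy rather than Origin/Matlab; the Poisson counts with a made-up rate of 25 per interval stand in for real detector data):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical stand-in for one 30 s run: n = 60 half-second samples,
# Poisson-distributed with an invented true rate of 25 counts per
# interval.  In the lab these numbers come from the detector.
n = 60
counts = rng.poisson(25.0, size=n)

mean_N = counts.mean()                 # best estimate of the true mean
sigma_N = counts.std(ddof=1)           # best estimate of sigma (N-1 rule)
sigma_mean = sigma_N / np.sqrt(n)      # standard deviation of the mean

print(f"N_bar = {mean_N:.2f}, sigma_N = {sigma_N:.2f}, "
      f"sigma_N_bar = {sigma_mean:.2f}")
```

Note `ddof=1` selects the N − 1 denominator derived above; the error bar on the run's mean is σ_N/√n, a factor √60 ≈ 7.7 smaller than the spread of individual samples.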