Short Resume of Statistical Terms
Fall 2013
By Yaohang Li, Ph.D.

Review
• Last Class
  – Introduction to Monte Carlo
• This Class
  – Important Statistics Terms
    • Random Events
      – Independence of Random Events
      – Axioms on Random Events
    • Random Variables
      – Independence of Random Variables
    • CDF
    • PDF
    • Expectation
      – Characteristics of Expectation
    • Moments of a Distribution
      – rth moment
      – rth central moment
    • Mean
    • Variance
    • Standard Deviation
    • Covariance
      – Characteristics of covariance
    • Review of Statistics and Probability Terms
    • Important Distributions
    • Central Limit Theorem
    • Estimand and Estimator
• Next Class
  – Monte Carlo for Integration

Random Events and Probability
• Random Event
  – An event which has a chance of happening
• Probability
  – A numerical measure of that chance
  – Lies between 0 and 1, both inclusive
• Terminology
  – P(A)
    • The probability that an event A occurs
  – P(A+B+…)
    • The probability that at least one of the events A, B, … occurs
  – P(AB…)
    • The probability that all the events A, B, … occur
  – P(A|B)
    • The probability that the event A occurs when it is known that the event B occurs
    • Conditional probability of A given B

Axioms in Probability
• P(A+B+…) ≤ P(A)+P(B)+…
  – If at most one of the events A, B, … can occur, they are called exclusive, and equality holds
  – If at least one of the events A, B, … must occur, they are called exhaustive, and then
    P(A+B+…) = 1
• P(AB) = P(A|B)P(B)
  – If P(A|B) = P(A), A and B are independent
    • The chance of A occurring is uninfluenced by the occurrence of B

Random Variables and Distributions
• Random variable ($\eta$)
  – A number that characterizes each of a set of exclusive and exhaustive events
• Cumulative Distribution Function (CDF)
  – $F(y) = P(\eta \le y)$
  – The probability that the event which occurs has a value not exceeding a prescribed y
  – $F(+\infty) = 1$ and $F(-\infty) = 0$
  – F(y) is a non-decreasing function of y

Expectation
• If $g(\eta)$ is a function of $\eta$, the expectation (or mean value) of g is denoted and defined by
    $E g(\eta) = \int g(y)\,dF(y)$
  – Stieltjes integral
  – The integral is taken over all values of y
• Explanation
  – Continuous random events
    • F(y) is differentiable with derivative $f(y) = dF(y)/dy$
        $E g(\eta) = \int g(y) f(y)\,dy$
  – Discrete random events
    • F(y) is a step function with steps of height $f_i$ at the points $y_i$
        $E g(\eta) = \sum_i g(y_i) f_i$
• Probability Density Function (pdf)
  – f(y) and $f_i$ are the probability density functions

More on Expectation
• The statistical physicist uses another notation for expectation
  – Suppose $p_i$ is the probability density function
• How about if g(x) is a constant function?

Linear Combination of the Expectation Values

Multi-dimensional Distribution
• Multi-dimensional Random Variable
  – Represented using a vector $\boldsymbol{\eta}$
• Multi-dimensional CDF
  – $F(\mathbf{y}) = P(\boldsymbol{\eta} \le \mathbf{y})$
    • $\boldsymbol{\eta} \le \mathbf{y}$ means that each coordinate of $\boldsymbol{\eta}$ is not greater than the corresponding coordinate of $\mathbf{y}$
• Expectation
    $E g(\boldsymbol{\eta}) = \int g(\mathbf{y})\,dF(\mathbf{y})$
  – Continuous multidimensional events
      $E g(\boldsymbol{\eta}) = \int g(\mathbf{y}) f(\mathbf{y})\,d\mathbf{y}$
    • where
      $f(\mathbf{y}) = f(y_1, y_2, \dots, y_k) = \frac{\partial^k F(y_1, y_2, \dots, y_k)}{\partial y_1\,\partial y_2 \cdots \partial y_k}$

Independence of Random Variables
• Consider a set of exhaustive and exclusive events, each characterized by a pair of numbers $\eta$ and $\zeta$, for which F(y,z) is the distribution. G(y) is the CDF of $\eta$ and H(z) is the CDF of $\zeta$.
  – $F(y,z) = P(\eta \le y, \zeta \le z)$
  – $G(y) = P(\eta \le y)$
  – $H(z) = P(\zeta \le z)$
• If it so happens that
  – F(y,z) = G(y)H(z) for all y and z
  – the random variables $\eta$ and $\zeta$ are called independent

Characteristics of Expectations
    $E \sum_i g_i(\eta_i) = \sum_i E g_i(\eta_i)$
• Holds regardless of whether or not the random variables $\eta_i$ are independent
    $E \prod_i g_i(\eta_i) = \prod_i E g_i(\eta_i)$
• Holds when the $\eta_i$ are mutually independent; it generally fails otherwise

Moments of Distribution
• rth moment of a distribution
  – $E(\eta^r)$
• Principal moment
  – $\mu = E(\eta)$
• rth central moment
  – $\mu_r = E\{(\eta - \mu)^r\}$
• Most important moments
  – $\mu = E(\eta)$, known as the mean of $\eta$
    • Measure of location of a random variable
  – $\mu_2$, known as the variance of $\eta$ (usually abbreviated “var”)
    • Measure of dispersion about the mean
  – standard deviation $\sigma = \sqrt{\mu_2}$
  – coefficient of variation
    • $\sigma / \mu$

Covariance
• Definition of covariance (usually abbreviated “cov”)
  – If $\eta$ and $\zeta$ are random variables with means $\mu$ and $\nu$, respectively, the quantity $E\{(\eta - \mu)(\zeta - \nu)\}$ is called the covariance of $\eta$ and $\zeta$
  – If $\eta$ and $\zeta$ are independent, the covariance is 0
    • Why?
  – Also, $\mathrm{cov}(\eta, \eta) = \mathrm{var}(\eta)$
    • Why?

Important Formula of Covariance
    $\mathrm{var}\left(\sum_{i=1}^{k} \eta_i\right) = \sum_{i=1}^{k} \sum_{j=1}^{k} \mathrm{cov}(\eta_i, \eta_j)$

Correlation Coefficient
• Definition
    $\rho = \mathrm{cov}(\eta, \zeta) / \sqrt{\mathrm{var}(\eta)\,\mathrm{var}(\zeta)}$
  – Always between +1 and -1
  – If $\rho = 0$, they are uncorrelated
  – If $\rho < 0$, they are negatively correlated
  – If $\rho > 0$, they are positively correlated

Important Distributions
• Uniform Distribution
• Exponential Distribution
• Binomial Distribution
• Poisson Distribution
• Normal Distribution

Uniform Distribution
• Uniform Distribution (Rectangular Distribution)
  – A distribution with constant probability density
  – Mean?
  – Variance?
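
The two questions on the Uniform Distribution slide can be checked numerically. The sketch below assumes a uniform distribution on an interval [a, b] (the slide does not name the endpoints) and uses the standard closed forms, mean (a+b)/2 and variance (b-a)²/12, comparing them with moments computed from random draws. The function names, the interval [2, 5], and the seed are illustrative choices, not part of the slides.

```python
import random
import statistics


def uniform_moments(a, b):
    """Closed-form mean and variance of the uniform distribution on [a, b]."""
    mean = (a + b) / 2.0
    variance = (b - a) ** 2 / 12.0
    return mean, variance


def sampled_moments(a, b, n, seed=0):
    """Mean and variance computed from n pseudo-random draws on [a, b]."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    draws = [rng.uniform(a, b) for _ in range(n)]
    return statistics.fmean(draws), statistics.pvariance(draws)


# Assumed example interval [2, 5]; any a < b works.
exact = uniform_moments(2.0, 5.0)
approx = sampled_moments(2.0, 5.0, 100_000)
print("exact  (mean, variance):", exact)
print("sample (mean, variance):", approx)
```

The sampled values should agree with the closed forms to within the sampling error, which shrinks as the number of draws grows.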
Exponential Distribution
• Exponential Distribution
  – mean $1/\lambda$, where $\lambda$ is the rate parameter
  – variance $1/\lambda^2$

Binomial Distribution
• Binomial Distribution
  – Discrete probability distribution $P_p(n|N)$ of obtaining exactly n successes out of N Bernoulli trials
  – Each Bernoulli trial is true with probability p and false with probability q = 1-p
    $P_p(n|N) = \binom{N}{n} p^n q^{N-n} = \frac{N!}{n!\,(N-n)!}\, p^n (1-p)^{N-n}$

Poisson Distribution
• Poisson Distribution
  – The limit of the Binomial Distribution as $N \to \infty$ with $\nu = Np$ held fixed
  – Mean is $\nu$
  – Variance is $\nu$
    $P_\nu(n) = \lim_{N \to \infty} P_p(n|N) = \frac{\nu^n e^{-\nu}}{n!}$

Normal Distribution
• Normal Distribution (Gaussian Distribution)
  – Bell curve
  – De Moivre developed the normal distribution as an approximation to the binomial distribution

Normal Distribution in Data Analysis
• About 68.27% of the data will be found within one SD either side of the mean (±1SD)
• About 95.45% of the data will be found within two SD either side of the mean (±2SD)
• About 99.73% of the data will be found within three SD either side of the mean (±3SD)

Central Limit Theorem
• Central Limit Theorem
  – The sum of n independent random variables has an approximately normal distribution when n is large
    • The random variables may follow an arbitrary distribution (with finite variance)

Central Limit Theorem in Practice
• In practice
  – n = 10 is a reasonably large number
  – n = 25 is rather large (effectively infinite)

Estimation
• Monte Carlo Computation
  – Goal: estimating the unknown numerical value of some parameter of some distribution
    • The parameter is called an estimand
• Sample
  – The available data (may consist of a number of observed random variables)
  – The number of observations in the sample is called the sample size
• Estimator
  – mean
    • $(\eta_1 + \eta_2 + \dots + \eta_n)/n$
  – weighted average
    • $(w_1\eta_1 + w_2\eta_2 + \dots + w_n\eta_n)/(w_1 + w_2 + \dots + w_n)$
    • May be a better estimator
• Connection between the sample and the estimand
  – The estimand is a parameter of the distribution of the random variables constituting the sample

Sampling Distribution
• Parent Distribution
  – We can represent the sample by a vector $\boldsymbol{\eta}$ with coordinates $\eta_1, \eta_2, \eta_3, \dots, \eta_n$
  – The distribution of $\eta_1, \eta_2, \eta_3, \dots, \eta_n$ is called the Parent Distribution
  – To
    estimate the estimand $\theta$ (a parameter of the Parent Distribution), we use some function $t(\boldsymbol{\eta})$
    • t is an estimator
• Sampling Distribution
  – $\boldsymbol{\eta}$ is a random variable, and so is $t(\boldsymbol{\eta})$
    • If we repeated the experiment, we should expect to get a different value of $\boldsymbol{\eta}$
  – Since $\boldsymbol{\eta}$ varies from experiment to experiment, $t(\boldsymbol{\eta})$ has a distribution, called the sampling distribution
  – If $t(\boldsymbol{\eta})$ is to be close to $\theta$, then the sampling distribution ought to be closely concentrated around $\theta$

Measuring Sampling Distribution
• The bias of t
  – The difference between $\theta$ and the average value of $t(\boldsymbol{\eta})$
  – $\beta = E\{t(\boldsymbol{\eta}) - \theta\}$
  – t is an unbiased estimator if $\beta = 0$
• The sampling variance of t
  – $\sigma_t^2 = \mathrm{var}\{t(\boldsymbol{\eta})\} = E\{[t(\boldsymbol{\eta}) - E t(\boldsymbol{\eta})]^2\} = E\{[t - \theta - \beta]^2\}$
• If $\beta$ and $\sigma_t^2$ are small, t is a good estimator

Important Estimators
• Mean of the parent distribution
    $\bar{\eta} = (\eta_1 + \eta_2 + \dots + \eta_n)/n$
  – standard error $\sigma/\sqrt{n}$ (estimated by $s/\sqrt{n}$)
• Variance of the parent distribution
    $s^2 = (\eta_1^2 + \eta_2^2 + \dots + \eta_n^2 - n\bar{\eta}^2)/(n-1)$
  – standard error $s^2/\sqrt{0.5n}$ (approximately, for a near-normal parent distribution)

Efficiency
• Goal of Monte Carlo Work
  – Obtain a respectably small standard error in the final result
  – More random samples can lead to better accuracy
    • Not very rewarding: the standard error falls only as $1/\sqrt{n}$
  – Variance Reduction Methods

Summary
• Important Statistics Terms
  – Random Events
    • Independence of Random Events
    • Axioms on Random Events
  – Random Variables
    • Independence of Random Variables
  – CDF
  – PDF
  – Expectation
    • Characteristics of Expectation
  – Moments of a Distribution
    • rth moment
    • rth central moment
  – Mean
  – Variance
  – Standard Deviation
  – Covariance
    • Characteristics of covariance
  – Correlation Coefficient

Summary (Cont.)
• Important Distributions
  – Uniform Distribution
  – Exponential Distribution
  – Binomial Distribution
  – Poisson Distribution
  – Normal Distribution
• Estimation
  – Sample
  – Estimand
  – Parent Distribution
  – Sampling Distribution
  – Estimator
    • Important estimators
  – Buffon’s Needle

What do I want you to do?
• Review Slides
• Review basic probability/statistics concepts
• Work on your Assignment 1
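
As a review aid, the two estimators on the Important Estimators slide can be sketched in a few lines. The code below computes the sample mean, the variance estimate s² = (Σηᵢ² − n·η̄²)/(n−1), and the two standard errors quoted on the slide. The function name, the normal parent distribution (mean 1, standard deviation 2), the sample size, and the seed are all assumptions chosen for illustration. The s²/√(0.5n) standard error is a normal-theory approximation, which is why a normal parent is used here.

```python
import math
import random


def estimate_mean_and_variance(sample):
    """Return (mean, s2, se_mean, se_var) for a list of observations.

    mean    : (eta_1 + ... + eta_n) / n
    s2      : (sum of eta_i^2 - n * mean^2) / (n - 1)
    se_mean : s / sqrt(n), the estimated standard error of the mean
    se_var  : s2 / sqrt(0.5 * n), approximate standard error of s2
              (assumes a near-normal parent distribution)
    """
    n = len(sample)
    mean = sum(sample) / n
    s2 = (sum(x * x for x in sample) - n * mean * mean) / (n - 1)
    se_mean = math.sqrt(s2 / n)
    se_var = s2 / math.sqrt(0.5 * n)
    return mean, s2, se_mean, se_var


# Illustrative parent distribution: normal with mean 1 and variance 4.
rng = random.Random(2013)  # fixed seed so the run is repeatable
observations = [rng.gauss(1.0, 2.0) for _ in range(10_000)]
mean_hat, s2_hat, se_mean_hat, se_var_hat = estimate_mean_and_variance(observations)
print(f"mean     ~ {mean_hat:.4f} +/- {se_mean_hat:.4f}")
print(f"variance ~ {s2_hat:.4f} +/- {se_var_hat:.4f}")
```

Quadrupling the sample size only halves both standard errors, which is the point made on the Efficiency slide about extra samples being not very rewarding.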