Review of Basic Probability Concepts

References:
• Wooldridge, Appendix B
• Danao, Rolando. Introduction to Statistics and Econometrics. Diliman: The University of the Philippines Press.

Review of Probability Concepts
I. Probability
II. Random Variables, Probability Distributions, Features of Probability Distributions
III. Special Probability Distributions

Experiment
An experiment is any procedure that can, at least in theory, be infinitely repeated and has a well-defined set of outcomes, e.g. tossing a coin and counting the number of times a head turns up.

Sample space, event
The sample space of a random experiment is the set of all possible outcomes of the experiment. Each outcome is called a sample point. A subset of a sample space is called an event.

Experiment 1: rolling a single die.
Sample space: S = {1, 2, 3, 4, 5, 6} (the sample points)
Event of an "odd outcome": A = {1, 3, 5}
Event of an "even outcome": B = {?}

Experiment 2: tossing a coin until a tail appears.
S = {T, HT, HHT, …}
Event that a tail occurs on the second toss: A = {HT}

Events
Given two events A and B,
A ∪ B = "union of A and B" = either A or B occurs
A ∩ B = "intersection of A and B" = both A and B occur
If A is an event, the statement "A = Ф" means A is impossible.

Let A be an event in sample space S. The complementary event of A, denoted Aᶜ, is defined as
Aᶜ = {x ∈ S | x ∉ A}.
Thus A ∪ Aᶜ = S and A ∩ Aᶜ = Ф.

Two events A and B are said to be mutually exclusive if and only if they have no elements in common, i.e. A ∩ B = Ф. Equivalently, the occurrence of A implies the non-occurrence of B, and vice versa.

Probability
The probability of an event is the chance or likelihood of this event occurring, measured by a number between 0 and 1.
Equivalent statements:
The probability of an event is 3/4.
= The event occurs with 75 percent probability.
= The odds against the event occurring are 25 to 75, or 1 to 3.

Classical definition of probability
Let S be a sample space consisting of N mutually exclusive and equally likely outcomes. The probability of a single outcome is defined as 1/N. The probability of an event is the sum of the probabilities of the sample points in the event, denoted P(A) or Pr(A).
This is the classical definition of probability. It requires that there be a finite number of outcomes and that the outcomes be mutually exclusive.

Example
Experiment: toss a coin twice.
S = {HH, HT, TH, TT}, N = 4
A = event that a head turns up in both tosses; P(A) = 1/4 = 25 percent
B = event that at least one head occurs; P(B) = ?

Experiment: roll a fair die once.
S = {1, 2, 3, 4, 5, 6}, N = 6
A = event that an odd number turns up = {1, 3, 5}
P(A) = P(1) + P(3) + P(5) = 1/6 + 1/6 + 1/6 = 3/6

Probability set function
Let S be a sample space and A an event.
(i) 0 ≤ P(A) ≤ 1
(ii) P(S) = 1
(iii) For mutually exclusive events A1 and A2, P(A1 ∪ A2) = P(A1) + P(A2)
A real-valued function P that satisfies the preceding three axioms on a sample space S is a probability set function.

Theorems
(i) P(Aᶜ) = 1 − P(A)
(ii) P(Ф) = 0
(iii) If A ⊆ B, then P(A) ≤ P(B).
(iv) If A and B are any two events, then P(A ∪ B) = P(A) + P(B) − P(A ∩ B).

Conditional probability
Let A and B be events in a sample space S. The conditional probability of event A, given that B has occurred or is certain to occur, is defined as
P(A | B) = P(A ∩ B)/P(B), provided P(B) ≠ 0.

Example 3: rolling a die. Let A be the event of rolling a "3" and B the event of an "odd outcome". Then A = {3}, B = {1, 3, 5}, and P(A | B) = (1/6)/(3/6) = 1/3.
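To make the classical definition and conditional probability concrete, here is a minimal Python sketch that enumerates the die's sample space and reproduces Example 3; the Fraction type is used only so the arithmetic stays exact.

    from fractions import Fraction

    # Classical definition: N equally likely sample points, each with probability 1/N.
    S = {1, 2, 3, 4, 5, 6}
    N = len(S)

    def prob(event):
        """P(event) = number of sample points in the event / N."""
        return Fraction(len(event & S), N)

    A = {3}          # event "a 3 turns up"
    B = {1, 3, 5}    # event "odd outcome"

    # Conditional probability: P(A|B) = P(A ∩ B) / P(B), provided P(B) != 0.
    p_A_given_B = prob(A & B) / prob(B)

    print(prob(A))        # 1/6
    print(prob(B))        # 1/2
    print(p_A_given_B)    # 1/3, matching Example 3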
Example: data from a survey of 2,000 individuals selected at random (Danao).

Sex of worker      Male (M)   Female (F)   Total
Employed (E)          850         600       1450
Unemployed (U)        300         250        550
Total                1150         850       2000

P(E) = 1450/2000, P(U) = 550/2000, P(M) = 1150/2000, P(F) = 850/2000
The probability that a male worker is employed is:
P(E | M) = P(E ∩ M)/P(M) = (850/2000)/(1150/2000) = 850/1150

Independence
Two events A and B are said to be independent (statistically independent, stochastically independent) when the occurrence of one event does not affect the probability of the occurrence of the other, that is, iff P(A | B) = P(A) and P(B | A) = P(B).
Equivalently, two events are independent if and only if P(A ∩ B) = P(A)P(B).
Events A, B and C are independent iff they are pairwise independent and P(A ∩ B ∩ C) = P(A)P(B)P(C).

Example 5: a card is drawn from a deck of 52 cards.
Let A be the event that a queen is drawn and B the event that a spade is drawn. n = 52.
A = {QH, QD, QS, QC}, B = {2S, …, JS, QS, KS, AS}
P(A) = 4/52 = 1/13, P(B) = 13/52 = 1/4
What is the probability that the queen of spades is drawn?
P(A ∩ B) = 1/52 = (1/13)(1/4) = P(A)P(B)
Therefore, A and B are independent.

II. Random Variables, Probability Distributions

Random variables
It is more convenient to use numbers to represent elements of a sample space. For outcomes that are not numbers, this is accomplished by associating a number with each outcome, e.g. instead of S = {H, T}, use {1, 0}. This assignment of numbers is called a random variable.
Formally, a random variable is a real-valued function defined on a sample space. If X is a random variable defined on a sample space S, then the range of X, denoted X(S), is the set
X(S) = {x ∈ R | x = X(s), s ∈ S},
where R is the set of real numbers.

Let X: S → R be a random variable. If x ∈ R, the symbol "X = x" means "X takes the value x"; that is, there is an s ∈ S such that X(s) = x.

Example: tossing a fair coin twice. S = {HH, TT, HT, TH}. Let the random variable X = number of heads that turn up. So:

Sample point   HH   TT   HT   TH
X               2    0    1    1

The event "X = 1" is the set {s ∈ S | X(s) = 1} = {HT, TH}.

Discrete random variable
A discrete random variable can take only a finite or "countably infinite" number of values (values that can be put in a one-to-one correspondence with the positive integers). We will concentrate on the former.

Example: a lottery with first prize P100,000, second prize P5,000, and third prize P500.75. Prize money is a discrete random variable since it has only four (a finite number of) possible outcomes: P0.00, P500.75, P5,000.00, P100,000.00.

Simplest example: Bernoulli random variable
A random variable that can take only the values zero and one is called a Bernoulli random variable. The event X = 1 is a "success"; the event X = 0 is a "failure". To describe it completely, we need only know P(X = 1).
P(X = 1) = θ reads "the probability that X equals one is θ".
Notation: X ~ Bernoulli(θ), read as "X has a Bernoulli distribution with probability of success equal to θ".

More generally, any discrete random variable is completely described by listing its possible values and the associated probability that it takes on each value. A list of all the possible values taken by a discrete random variable along with their chances of occurring is called a probability function or probability density function (pdf).
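As a small illustration of describing a discrete random variable by listing its values and probabilities, the sketch below writes the Bernoulli(θ) pdf as a Python dictionary (θ = 0.75 is an arbitrary choice for the illustration, not a value from these notes) and checks that the probabilities are nonnegative and sum to one.

    # A minimal sketch: the pdf of X ~ Bernoulli(theta) as a value -> probability map.
    theta = 0.75                      # illustrative probability of "success"
    pdf = {1: theta, 0: 1 - theta}    # P(X=1) = theta, P(X=0) = 1 - theta

    # The two defining properties of a discrete pdf: f(x) >= 0 and the probabilities sum to 1.
    assert all(p >= 0 for p in pdf.values())
    assert abs(sum(pdf.values()) - 1) < 1e-12

    print(pdf[1])   # P(X = 1) = 0.75
    print(pdf[0])   # P(X = 0) = 0.25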
Example: rolling a die. Each face (one dot through six dots) maps to x = 1, …, 6, and each value has probability 1/6:

x       1     2     3     4     5     6
f(x)   1/6   1/6   1/6   1/6   1/6   1/6

Same experiment as before: S = {HH, TT, TH, HT}, with P(HH) = P(TT) = P(TH) = P(HT) = 1/4. Let X be the number of times heads turn up:

s          TT    HT, TH    HH
x           0       1       2
P(X = x)   1/4     1/2     1/4

P(X = 0) = P(TT) = 1/4
P(X = 1) = P(HT ∪ TH) = 1/4 + 1/4 = 1/2
P(X = 2) = P(HH) = 1/4

Probability density functions
A discrete random variable X has a pdf, f(x), which is the probability that X takes on the value x. Let X take on k possible values x1, …, xk, with probabilities pj = P(X = xj), j = 1, 2, …, k. The probability density function (pdf) of X is
f(xj) = pj, j = 1, 2, …, k, and f(x) = 0 for any x not equal to some xj,
with the following properties:
(i) f(x) ≥ 0
(ii) Σx f(x) = 1

[Diagram omitted: relationship of X, P and f — X maps sample points in S to values x in R, and f assigns each x a probability between 0 and 1.]

Continuous random variable
A variable X is a continuous random variable if it takes on any particular real value with zero probability: P(X = a) = 0. (X can take on so many possible values that we cannot count them or match them up with the positive integers, so logical consistency dictates that X takes on each individual value with probability zero.)
Random variables that take on numerous values are best treated as continuous. Examples: gross national product (GNP), money supply, interest rates, the price of eggs, household income, expenditure on clothing.

For a continuous rv:
• We use the pdf of a continuous rv only to compute events involving a range of values. If a and b are constants with a < b, the probability that X lies between a and b, P(a < X < b), is the area under the pdf between the points a and b.
• It is easiest to work with the cumulative distribution function (cdf).

If X is any random variable, its cdf is defined for any real number x by F(x) ≡ P(X ≤ x).
For a discrete rv, this is obtained by summing the pdf over all values xj such that xj ≤ x:
F(x) = Σ_{xj ≤ x} f(xj)
For a continuous rv, F(x) is the area under the pdf to the left of the point x:
F(x) = P(X ≤ x) = ∫_{−∞}^{x} f(u) du

Continuous rv: the cdf is the area under the pdf at or below x, and the total area under the curve is 1.
P(X ≤ a) = area under the curve to the left of a = F(a)
P(X ≤ b) = area under the curve to the left of b = F(b)
P(a ≤ X ≤ b) = F(b) − F(a)

Remarks
F(x) is simply a probability; it is always between 0 and 1.
If x1 ≤ x2, then P(X ≤ x1) ≤ P(X ≤ x2), that is, F(x1) ≤ F(x2): the cdf is a non-decreasing function of x.
Important properties:
For any number c, P(X > c) = 1 − F(c).
For any numbers a < b, P(a < X ≤ b) = F(b) − F(a).
For a continuous rv, it does not matter whether the inequalities are strict or not. That is, P(X ≥ c) = P(X > c) and P(a < X < b) = P(a ≤ X ≤ b) = P(a < X ≤ b) = P(a ≤ X < b).

Joint distributions
Let X and Y be discrete random variables. Then (X, Y) have a joint distribution, which is fully described by the joint probability density function of (X, Y):
f_{X,Y}(x, y) = P(X = x, Y = y) for x in the range of X and y in the range of Y, and 0 elsewhere.
The joint pdf of X and Y (discrete rvs) satisfies:
(1) f_{X,Y}(x, y) ≥ 0
(2) Σx Σy f_{X,Y}(x, y) = 1
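A minimal sketch that builds a joint pdf for two fair coin tosses (X = 1 if the first toss is heads, Y = 1 if the second is heads — indicator definitions chosen only for this illustration) and checks properties (1) and (2) above.

    from fractions import Fraction
    from itertools import product

    half = Fraction(1, 2)

    # Joint pdf of (X, Y) stored as a {(x, y): probability} map; the tosses are fair and independent.
    joint = {(x, y): half * half for x, y in product([0, 1], repeat=2)}

    # Property (1): every joint probability is nonnegative.
    assert all(p >= 0 for p in joint.values())
    # Property (2): the joint probabilities sum to one.
    assert sum(joint.values()) == 1

    # The joint pdf answers questions about both variables at once, e.g. P(X = 1, Y = 1).
    print(joint[(1, 1)])   # 1/4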
Example
A population of N individuals consists of employed males (em), employed females (ef), unemployed males (um) and unemployed females (uf).
Experiment: draw a person at random. S = {s1, s2, …, sN}.
Define the following random variables on S:
X(s) = 1 if s is female, 0 if s is male
Y(s) = 1 if s is employed, 0 if s is unemployed
e.g. the event {X = 0} has em + um elements, the event {Y = 1} has em + ef elements, and so forth.
P(X = 0) = (em + um)/N
P(Y = 1) = ?

Similarly, the joint probabilities are:
P(X = 0, Y = 1) = em/N
P(X = 1, Y = 1) = ef/N
P(X = 0, Y = 0) = um/N
P(X = 1, Y = 0) = uf/N
The joint pdf of X and Y is defined as:
f_{X,Y}(0, 1) = em/N
f_{X,Y}(0, 0) = um/N
f_{X,Y}(1, 1) = ef/N
f_{X,Y}(1, 0) = uf/N
f_{X,Y}(x, y) = 0 elsewhere

Example (cont'd)
Note that Σx Σy f_{X,Y}(x, y) = f(0, 0) + f(0, 1) + f(1, 0) + f(1, 1) = um/N + em/N + uf/N + ef/N = 1.

Independence
Random variables X and Y are said to be independent iff
f_{X,Y}(x, y) = fX(x) fY(y) for all x, y,
where fX(x) is the pdf of X and fY(y) is the pdf of Y. This must hold for all pairs x, y. If random variables are not independent, they are said to be dependent.
The pdfs fX and fY are called the marginal probability density functions (to distinguish them from the joint pdf f_{X,Y}).
If X and Y are independent, then knowing the outcome of X does not change the probabilities of the possible outcomes of Y, and vice versa.

Note: the marginal probability density functions fX(x) and fY(y), for discrete random variables, can be obtained by summing f(x, y) over the values of Y to obtain fX(x), and over the values of X to obtain fY(y):
fX(x) = Σy f(x, y)
fY(y) = Σx f(x, y)

Example (see Danao 2.3.3 for the continuous case)
Recall the example above. The joint pdf of X and Y is:
f_{X,Y}(0, 1) = em/N, f_{X,Y}(0, 0) = um/N, f_{X,Y}(1, 1) = ef/N, f_{X,Y}(1, 0) = uf/N, and f_{X,Y}(x, y) = 0 elsewhere.
The marginal pdf of X is: fX(0) = (em/N) + (um/N), fX(1) = (ef/N) + (uf/N), fX(x) = 0 elsewhere.
The marginal pdf of Y is: fY(0) = (um/N) + (uf/N), fY(1) = (em/N) + (ef/N), fY(y) = 0 elsewhere.

Conditional distributions
We want to know how one random variable, Y, relates to one or more other variables. Let X and Y be random variables with joint pdf f(x, y). The conditional pdf of Y given X = x is
f_{Y|X}(y|x) = f_{X,Y}(x, y)/fX(x)
The interpretation is most easily seen in the discrete case: f_{Y|X}(y|x) = P(Y = y | X = x), "the probability that Y = y given that X = x".

From the last example: X(s) = 1 if s is female, 0 if s is male; Y(s) = 1 if s is employed, 0 if s is unemployed. Summarizing the joint pdf and marginal pdfs:

           y = 1          y = 0          fX
x = 0      em/N           um/N           (em + um)/N
x = 1      ef/N           uf/N           (ef + uf)/N
fY         (em + ef)/N    (um + uf)/N

What is the probability of drawing a female given that the individual is employed?
f_{X|Y}(1|1) = f_{X,Y}(1, 1)/fY(1) = (ef/N)/[(em + ef)/N] = ef/(em + ef)

Conditional distributions and independence
An important feature of conditional distributions: if X and Y are independent rvs, then knowledge of the value taken on by X tells us nothing about the probability that Y takes on various values (and vice versa):
f_{Y|X}(y|x) = fY(y) and f_{X|Y}(x|y) = fX(x)
Are X and Y independent in the example? Compare f_{X,Y}(0, 0) = um/N with fX(0) fY(0) = [(em + um)/N][(um + uf)/N].

Example: a basketball player shooting 2 free throws. Let X be the Bernoulli random variable indicating that she makes the first free throw and Y the Bernoulli random variable indicating that she makes the second. Suppose she is an 80% free-throw shooter, and X and Y are independent.
What is the probability of the player making both free throws?
X ~ Bernoulli(.8), Y ~ Bernoulli(.8). P(X = 1, Y = 1) = ?
Now assume instead that the conditional density is:
f_{Y|X}(1|1) = .85, f_{Y|X}(0|1) = .15
f_{Y|X}(1|0) = .70, f_{Y|X}(0|0) = .30
Thus, the probability of making the 2nd free throw depends on whether the 1st was made. If the 1st was made, then the chance of making the 2nd is _____. If the 1st was missed, the chance of making the 2nd is _____.
Are X and Y independent? Compute P(X = 1, Y = 1), assuming P(X = 1) = .8.
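A minimal sketch working through the free-throw numbers above: first the independent case, then the dependent case using the stated conditional density (the calculation P(X = 1, Y = 1) = f_{Y|X}(1|1)·P(X = 1) is just the definition of the conditional pdf rearranged).

    # Independent case: X ~ Bernoulli(.8), Y ~ Bernoulli(.8), X and Y independent.
    p_x1 = 0.8
    p_both_indep = p_x1 * 0.8          # P(X=1, Y=1) = P(X=1)P(Y=1) = 0.64

    # Dependent case: the conditional pdf of Y given X from the notes; keys are (y, x).
    f_y_given_x = {(1, 1): 0.85, (0, 1): 0.15,
                   (1, 0): 0.70, (0, 0): 0.30}

    # From f_{Y|X}(y|x) = f_{X,Y}(x,y)/f_X(x) we get f_{X,Y}(x,y) = f_{Y|X}(y|x) * f_X(x).
    p_both_dep = f_y_given_x[(1, 1)] * p_x1     # 0.85 * 0.8 = 0.68

    print(p_both_indep)   # 0.64
    print(p_both_dep)     # 0.68 -- differs from 0.64, so X and Y are not independent here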
Some features of probability distributions
Measures of central tendency
Measures of variability
Measures of association between two random variables

Measure of central tendency: expected value
The most familiar measure of central tendency. It employs all available data in the computation and is strongly influenced by extreme values. The expected value of X can be a number that is not a possible outcome of X.

Expected value
If X is a random variable, the expected value (or expectation) of X, denoted E(X) (and sometimes μX or just μ), is a weighted average of all possible values of X, where the weights are determined by the pdf. It is sometimes called the population mean, when we want to emphasize that X represents some variable in a population.
It is simplest in the case where X is a discrete rv:
E(X) = x1 f(x1) + x2 f(x2) + … + xk f(xk) = Σ_{j=1}^{k} xj f(xj)
If X is a continuous rv, then
E(X) = ∫ x f(x) dx

Given a random variable X and a function g(·), we can create a new random variable g(X). The expected value of g(X) is:
E[g(X)] = Σ_{j=1}^{k} g(xj) fX(xj)   (discrete)
E[g(X)] = ∫ g(x) fX(x) dx   (continuous)
Note: [E(X)]² ≠ E(X²) in general. That is, for a nonlinear function g(X), E[g(X)] ≠ g[E(X)].
If X and Y are random variables taking on values (x1, …, xk) and (y1, …, ym), then g(X, Y) is a random variable for any function g, and its expectation is:
E[g(X, Y)] = Σ_{h=1}^{k} Σ_{j=1}^{m} g(xh, yj) f_{X,Y}(xh, yj)

Properties of expected values
E.1 For any constant c, E(c) = c.
E.2 For any constants a and b, E(aX + b) = aE(X) + b.
E.3 If {a1, a2, …, an} are constants and {X1, X2, …, Xn} are random variables, then
E(Σ_{i=1}^{n} ai Xi) = Σ_{i=1}^{n} ai E(Xi).
As a special case, with each ai = 1, E(Σ_{i=1}^{n} Xi) = Σ_{i=1}^{n} E(Xi).

Median
The value that divides an ordered data set (array) into two equal parts; a value below which half of the data fall.
Characteristics: a positional measure; not influenced by extreme values; may not be an actual value in the data set.
Finding the median (Med(X) or Md):
If X is discrete, arrange the data in an array (ordered values). Let X(i) be the ith observation in the array, i = 1, 2, …, n. If n is odd, the median is the observation in position (n + 1)/2. If n is even, the median is the mean of the two middle values in the array.
If X is continuous, the median is the value such that 1/2 of the area under the pdf is to the left of Md and 1/2 is to the right.
If X has a symmetric distribution about the value μ, then μ is both the expected value and the median.

Measures of variability: variance and standard deviation
Measures of dispersion indicate the extent to which individual items in a series are scattered about an average, and are used as a measure of the reliability of the average value.
Measures of absolute dispersion (variance, standard deviation) are used to describe the variability of a data set. Measures of relative dispersion (e.g. the coefficient of variation) are used to compare two or more data sets with different means and different units of measurement.

Variance and standard deviation
For a random variable X, let μ = E(X). We need a number that tells us how far X is from μ, on average. One such number is the variance, which tells us the expected squared distance from X to its mean:
Var(X) ≡ E[(X − μ)²], denoted σX² or just σ².
Note that σ² = E(X²) − μ².
The standard deviation of a random variable is simply the positive square root of the variance: sd(X) ≡ +√Var(X). The standard deviation is often referred to as a measure of "volatility".
If there is a large amount of variation in the data set, the data values will be far from the mean, and the standard deviation will be large.
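A minimal sketch computing E(X) and Var(X) = E(X²) − μ² for the fair-die pdf from the earlier example (each value 1–6 with probability 1/6); the result E(X) = 3.5 also illustrates that the expected value need not be a possible outcome of X.

    from fractions import Fraction

    # pdf of a fair die: each face has probability 1/6.
    pdf = {x: Fraction(1, 6) for x in range(1, 7)}

    # E(X) = sum_j x_j f(x_j)
    mu = sum(x * p for x, p in pdf.items())

    # Var(X) = E[(X - mu)^2] = E(X^2) - mu^2
    ex2 = sum(x**2 * p for x, p in pdf.items())
    var = ex2 - mu**2

    print(mu)                  # 7/2 = 3.5, not itself a possible outcome of X
    print(var)                 # 35/12
    print(float(var) ** 0.5)   # standard deviation, about 1.708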
If there is only a small amount of variation in the data set, the data values will be close to the mean, and the standard deviation will be small.

Characteristics of the standard deviation: it is affected by the value of every observation, it may be distorted by a few extreme values, and it is never negative.

Properties of the variance
VAR.1 Var(X) = 0 iff there is a constant c such that P(X = c) = 1, in which case E(X) = c. [The variance of a constant is zero.]
SD.1 For any constant c, sd(c) = 0.
VAR.2 For any constants a and b, Var(aX + b) = a²Var(X). [Adding a constant to a rv does not change the variance; multiplying by a constant changes the variance by a factor equal to the square of that constant.]
SD.2 sd(aX + b) = |a| sd(X). If a > 0, then sd(aX) = a·sd(X).

Standardizing a random variable
Suppose that, given a random variable X, we define a new random variable by subtracting off its mean and dividing by its standard deviation:
Z = (X − μ)/σ,
which we can write as Z = aX + b, where a ≡ 1/σ and b ≡ −μ/σ. Then
E(Z) = aE(X) + b = 0 and Var(Z) = a²Var(X) = 1.
Z has mean zero and variance one. This procedure is known as standardizing the random variable X, and Z is called a standardized random variable.

Coefficient of variation
A commonly used measure of relative dispersion. The coefficient of variation uses two measures, the mean and the standard deviation. It is expressed as a percentage, removing the unit of measurement and thus allowing comparison of two or more data sets. Being unit-less, it is used to compare the scatter of one distribution with the scatter of another.
The formula of the coefficient of variation is
CV = (σ/μ) × 100%

Measures of association: features of joint and conditional distributions
Summary measures of how, on average, two random variables vary with one another.
The covariance between two random variables X and Y, sometimes called the population covariance, is defined as the expected value of the product (X − μX)(Y − μY):
Cov(X, Y) ≡ E[(X − μX)(Y − μY)], denoted σXY.
This measures the amount of linear dependence between two random variables. If positive, the two random variables move in the same direction (on average, when X is above its mean, Y is also). If negative, they move in opposite directions (when X is above its mean, Y is below). Interpreting the magnitude of a covariance can be tricky.
Note (show!):
Cov(X, Y) = E[(X − μX)(Y − μY)] = E[(X − μX)Y] = E[X(Y − μY)] = E(XY) − μX μY
If E(X) = 0 or E(Y) = 0, then Cov(X, Y) = E(XY).

Some properties
COV.1 If X and Y are independent, then Cov(X, Y) = 0. [This follows from E(XY) = E(X)E(Y).] The converse is not true.
COV.2 For any constants a1, b1, a2, b2: Cov(a1X + b1, a2Y + b2) = a1a2 Cov(X, Y). This implies the covariance between two random variables can be altered simply by multiplying one or both by a constant; it depends on the units of measurement.
COV.3 |Cov(X, Y)| ≤ sd(X) sd(Y)

Correlation coefficient (ρXY)
Corr(X, Y) = Cov(X, Y)/[sd(X) sd(Y)] = σXY/(σX σY), sometimes denoted ρXY. This scales the covariance into a unitless number.
Again, it is a measure of the strength of the linear relationship between X and Y. It always has the same sign as Cov(X, Y). Its magnitude is easier to interpret than the size of the covariance because of CORR.1.

Properties
CORR.1 −1 ≤ Corr(X, Y) ≤ 1
A ρXY close to 1 or −1 indicates a strong linear relationship, but it does not necessarily imply that X causes Y or Y causes X. If ρXY = 1, there is a perfect positive linear relationship, and we can write Y = a + bX for some constant a and constant b > 0. If ρXY = 0, then there is no linear relationship between X and Y, and they are said to be uncorrelated random variables. A value of 0, however, does not mean lack of association.
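A minimal sketch that computes Cov(X, Y) = E(XY) − μXμY and Corr(X, Y) for the female/employment indicators, reusing the counts from the survey of 2,000 workers earlier in the notes (em = 850, ef = 600, um = 300, uf = 250).

    from fractions import Fraction

    N = 2000
    counts = {(0, 1): 850,   # (x, y): x = 1 if female, y = 1 if employed
              (1, 1): 600,   # em = 850, ef = 600, um = 300, uf = 250 from the survey table
              (0, 0): 300,
              (1, 0): 250}
    joint = {xy: Fraction(c, N) for xy, c in counts.items()}

    # Marginal means of the Bernoulli indicators.
    mu_x = sum(x * p for (x, y), p in joint.items())   # P(X = 1)
    mu_y = sum(y * p for (x, y), p in joint.items())   # P(Y = 1)

    # Cov(X, Y) = E(XY) - mu_x * mu_y
    e_xy = sum(x * y * p for (x, y), p in joint.items())
    cov = e_xy - mu_x * mu_y

    # Corr(X, Y) = Cov(X, Y) / (sd(X) * sd(Y)); for a Bernoulli indicator, Var = p(1 - p).
    sd_x = (float(mu_x) * (1 - float(mu_x))) ** 0.5
    sd_y = (float(mu_y) * (1 - float(mu_y))) ** 0.5
    corr = float(cov) / (sd_x * sd_y)

    print(float(cov))   # about -0.0081: slightly negative
    print(corr)         # about -0.037: a weak linear association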
CORR.2 For any constants a1, b1, a2, b2 with a1a2 > 0:
Corr(a1X + b1, a2Y + b2) = Corr(X, Y)
If a1a2 < 0, then Corr(a1X + b1, a2Y + b2) = −Corr(X, Y).
That is, the correlation is invariant to the units of measurement of either X or Y.

More properties of the variance
VAR.3 For constants a and b,
Var(aX + bY) = a²Var(X) + b²Var(Y) + 2ab Cov(X, Y)
If X and Y are uncorrelated, then Var(X + Y) = Var(X) + Var(Y). Var(X − Y) = ?
VAR.4 Extended to more than two random variables: if {X1, X2, …, Xn} are pairwise uncorrelated random variables and {ai, i = 1, …, n} are constants, then
Var(a1X1 + … + anXn) = a1²Var(X1) + … + an²Var(Xn)

Conditional expectation (or conditional mean) — when Y is related to X, possibly in a nonlinear fashion
Call Y the explained variable (say, hourly wage) and X the explanatory variable (say, years of formal education). How does the distribution of wages change with education level? We summarize the relationship by looking at the conditional expectation of Y given X, also called the conditional mean: the expected value of Y given that we know the outcome of X.
Conditional expectation: E(Y | X = x) or E(Y | x). When Y is a discrete rv taking values {y1, …, ym},
E(Y | x) = Σ_{j=1}^{m} yj f_{Y|X}(yj | x)
This is a weighted average of the possible values of Y, but now the weights reflect the fact that X has taken on a specific value. E(Y | x) is a function of x, which tells us how the expected value of Y varies with x.

Example
Let (X, Y) represent the population of all working individuals, where X is years of education and Y is hourly wage. Then E(Y | X = 10) is the average hourly wage for all people in the population with 10 years of education (a high school education). The expected value of hourly wage can be found at each level of education. In econometrics, we can specify simple functions that capture this relationship, e.g. a linear function:
E(WAGE | EDUC) = 15.65 + 3.50 EDUC
(Note: conditional expectations can also be nonlinear functions.)

Useful properties of the conditional mean
CE.1 E[c(X) | X] = c(X), for any function c(X).
CE.2 E[a(X)Y + b(X) | X] = a(X)E(Y | X) + b(X).
CE.3 If X and Y are independent, then E(Y | X) = E(Y): the expected value of Y given X does not depend on X (e.g. if wages were independent of education, then the average wages of high school and college graduates would be the same). In particular, if U and X are independent and E(U) = 0, then E(U | X) = 0.
CE.4 E[E(Y | X)] = E(Y). If we first obtain E(Y | X) as a function of X and then take the expected value of this, we end up with E(Y).
CE.5 If E(Y | X) = E(Y), then Cov(X, Y) = 0 (and so Corr(X, Y) = 0). In fact, every function of X is uncorrelated with Y. If knowledge of X does not change the expected value of Y, then X and Y must be uncorrelated (implying that if X and Y are correlated, then E(Y | X) must depend on X). The converse is not true: if X and Y are uncorrelated, E(Y | X) could still depend on X.
From CE.4 and CE.5: if U and X are random variables such that E(U | X) = 0, then E(U) = 0 and U and X are uncorrelated.

Conditional variance
Given random variables X and Y, the variance of Y conditional on X = x is the variance associated with the conditional distribution of Y given X = x:
Var(Y | X = x) ≡ E{[Y − E(Y | x)]² | x} = E(Y² | x) − [E(Y | x)]²  (why?)
CV.1 If X and Y are independent, then Var(Y | X) = Var(Y).
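A minimal sketch illustrating the conditional mean and property CE.4 (the law of iterated expectations) using the free-throw numbers from earlier: P(X = 1) = .8 and the conditional density f_{Y|X} stated above.

    # Free-throw example: X = 1 if the 1st shot is made (P = .8), Y = 1 if the 2nd is made.
    f_x = {1: 0.8, 0: 0.2}
    f_y_given_x = {1: {1: 0.85, 0: 0.15},   # f_y_given_x[x][y] = P(Y = y | X = x)
                   0: {1: 0.70, 0: 0.30}}

    # Conditional mean: E(Y|x) = sum_y y * f_{Y|X}(y|x)
    def cond_mean(x):
        return sum(y * p for y, p in f_y_given_x[x].items())

    # CE.4 (law of iterated expectations): E[E(Y|X)] = E(Y)
    e_y_iterated = sum(cond_mean(x) * px for x, px in f_x.items())

    # Direct computation of E(Y) from the joint pdf f_{X,Y}(x,y) = f_{Y|X}(y|x) f_X(x)
    e_y_direct = sum(y * f_y_given_x[x][y] * f_x[x] for x in f_x for y in (0, 1))

    print(cond_mean(1), cond_mean(0))   # 0.85, 0.70 -- E(Y|x) varies with x
    print(e_y_iterated, e_y_direct)     # both 0.82, consistent with CE.4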
III. Special Probability Distributions
The probability that a random variable takes values in an interval can be computed from each of these distributions. Examples of widely used distributions for continuous variables: the normal distribution, the t distribution, the F distribution, and the chi-square distribution.

The normal distribution
A random variable X is said to be normal if its pdf is of the form
f(x) = [1/(σ√(2π))] exp{−(1/2)[(x − μ)/σ]²}, x ∈ R
X has a normal distribution with expected value μ and variance σ².

Remarks on the normal distribution
The normal distribution is completely characterized by two parameters, μ (the mean) and σ (the standard deviation). The graph of the normal pdf is the familiar bell curve. It is symmetric with respect to μ, and μ is also the median of X. The maximum value of the pdf is attained at x = μ. The larger σ is, the flatter f(x) is at x = μ.
To indicate that X is a normal random variable with mean μ and variance σ²: X ~ N(μ, σ²).

Standard normal distribution
If a random variable Z ~ N(0, 1), then we say it has a standard normal distribution (SND). The standard normal cdf is denoted Φ(z) and is obtained as the area under the standard normal pdf to the left of z.
We can use the standard normal cdf to compute the probability of an event involving a standard normal rv:
P(Z > z) = 1 − Φ(z)
P(Z < −z) = P(Z > z)
P(a ≤ Z ≤ b) = Φ(b) − Φ(a)
P(|Z| > c) = P(Z > c) + P(Z < −c) = 2P(Z > c) = 2[1 − Φ(c)]

Example
To get P(x1 < X < x2) where X ~ N(μ, σ²), we write
P(x1 < X < x2) = P(x1 < σZ + μ < x2) = P((x1 − μ)/σ < Z < (x2 − μ)/σ)
Let X ~ N(5, 4). What is P(6 < X < 8)?
P(6 < X < 8) = P((6 − 5)/2 < Z < (8 − 5)/2) = P(0.5 < Z < 1.5) = 0.9332 − 0.6915 = 0.2417
This is the area under the standard normal curve between 0.5 and 1.5.

Example
Assume that monthly family incomes in urban Philippines are normally distributed with μ = 16,000 and σ = 2,000. What is the probability that a family picked at random will have an income between 15,000 and 18,000, i.e. P(15,000 < X < 18,000)?
Compute the Z values:
Z1 = (15,000 − 16,000)/2,000 = −0.5
Z2 = (18,000 − 16,000)/2,000 = 1
Find the area between Z1 = −0.5 and Z2 = 1. Using the table of Z values:
P(−0.5 < Z < 1) = 0.8413 − 0.3085
or = P(0 < Z < 1) + P(−0.5 < Z < 0) = 0.3413 + 0.1915 = 0.5328, or 53.28 percent.

NOR.2 Let X ~ N(μ, σ²) and let Y = a + bX. Then Y ~ N(a + bμ, b²σ²).
NOR.3 If X and Y are jointly normally distributed, then they are independent iff Cov(X, Y) = 0.
NOR.4 Any linear combination of independent normal rvs has a normal distribution. In particular, let X1, X2, …, Xn be independent with Xi ~ N(μi, σi²). Then for real numbers c1, c2, …, cn,
c1X1 + c2X2 + … + cnXn ~ N(Σ ci μi, Σ ci² σi²)

Corollary
Let X1, X2, …, Xn be mutually independent normal random variables, identically distributed as N(μ, σ²). Then the distribution of X̄ is N(μ, σ²/n). (That is, the average of independent, identically distributed normal random variables has a normal distribution.)

Example
Consider Zi ~ N(0, 1), i = 1, …, 25. What is P(Z̄ < 0.2)?
Z̄ = (1/25) Σ_{i=1}^{25} Zi ~ N(0, 1/25).
So Z = (Z̄ − 0)/√(1/25) = 5Z̄ ~ N(0, 1).
Hence, P(Z̄ < 0.2) = P(5Z̄ < 1) = P(Z < 1) = 0.8413.

Other features of the normal distribution
Zero skewness. (The third standardized moment measures the degree of asymmetry, or departure from symmetry, of a distribution:
α3 = E(Z³) = E[(X − μ)³]/σ³.)
Kurtosis distinguishes between symmetric distributions; it equals 3 for a normal distribution:
α4 = E(Z⁴) = E[(X − μ)⁴]/σ⁴
In Excel (and in some other textbooks), kurtosis is measured as (α4 − 3), referred to as excess kurtosis. If excess kurtosis > 0, the distribution has fatter tails than the normal distribution, such as with the t distribution. If it is < 0, the distribution has thinner tails (a rarer situation).
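A minimal sketch reproducing the income example with the standard normal cdf; it assumes SciPy is available and uses scipy.stats.norm, standardizing exactly as above.

    from scipy.stats import norm

    mu, sigma = 16_000, 2_000   # monthly family income, X ~ N(16,000, 2,000^2)

    # Standardize: P(15,000 < X < 18,000) = P(-0.5 < Z < 1) = Phi(1) - Phi(-0.5)
    z1 = (15_000 - mu) / sigma   # -0.5
    z2 = (18_000 - mu) / sigma   #  1.0
    p_standardized = norm.cdf(z2) - norm.cdf(z1)

    # Equivalently, work with the N(mu, sigma^2) cdf directly.
    p_direct = norm.cdf(18_000, loc=mu, scale=sigma) - norm.cdf(15_000, loc=mu, scale=sigma)

    print(round(p_standardized, 4))   # 0.5328
    print(round(p_direct, 4))         # 0.5328, matching the table-based calculation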
Positive vs. negative skewness
Positive (skewed to the right): the distribution tapers more to the right than to the left; there is a longer tail to the right, with more concentration of values below than above the mean.
Negative (skewed to the left): the distribution tapers more to the left than to the right; there is a longer tail to the left, with more concentration of values above than below the mean.
Note: rarely do we find data characteristically skewed to the left.

Chi-square distribution
Obtained directly from independent standard normal random variables. Let Zi, i = 1, …, n, be independent random variables, each distributed as standard normal. Define a new random variable as the sum of the squares of the Zi:
X = Σ_{i=1}^{n} Zi²
Then X has a chi-square distribution with n degrees of freedom (df), denoted X ~ χ²ₙ (or χ²(n)).
Notes: the chi-square is non-negative (its pdf is positive only for x > 0) and is not symmetric about any point. For small values of df it is skewed to the right, but as df increases the curve approaches the normal curve. The expected value of X is n, and its variance is 2n.
Relation to the normal distribution: the square of a standard normal random variable is ~ χ²(1).

t distribution
The workhorse of classical statistics and multiple regression analysis. It is obtained from a standard normal and a chi-square random variable. Let Z have a standard normal distribution and X a chi-square distribution with n df, and assume Z and X are independent. Then the random variable
T = Z/√(X/n)
has a t distribution with n df, denoted T ~ tₙ or t(n).
Its shape is similar to the SND, except that it is more spread out and so has thicker tails. The expected value of a t-distributed random variable is 0, and its variance is n/(n − 2) for n > 2 (the variance does not exist for n ≤ 2 because the distribution is so spread out). As the df increase, the variance approaches 1, so for large df the t distribution can be approximated by the SND.

F distribution
Will be used for testing hypotheses in the context of multiple regression analysis. Let X1 ~ χ²(k1) and X2 ~ χ²(k2), and suppose they are independent. Then the random variable
F = (X1/k1)/(X2/k2)
has an F distribution with (k1, k2) degrees of freedom, denoted F ~ F(k1, k2).
It is characterized by two parameters: k1, the numerator degrees of freedom, and k2, the denominator degrees of freedom. It is skewed to the right.
The square of a t-distributed random variable has an F distribution: if X ~ t(n), then X² ~ F(1, n).
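A minimal sketch (assuming SciPy) checking the last fact numerically: for X ~ t(n), P(X² ≤ c) = P(−√c ≤ X ≤ √c), which should match the F(1, n) cdf at c; n = 10 and c = 3 are arbitrary illustrative choices.

    from scipy.stats import t, f

    n = 10   # df for the t distribution (illustrative choice)

    # If X ~ t(n), then X^2 ~ F(1, n): the cdfs should agree.
    c = 3.0
    p_from_t = t.cdf(c**0.5, n) - t.cdf(-(c**0.5), n)   # P(X^2 <= c) = P(-sqrt(c) <= X <= sqrt(c))
    p_from_f = f.cdf(c, 1, n)                           # F(1, n) cdf at c

    print(round(p_from_t, 6), round(p_from_f, 6))       # identical

    # The same relation links critical values: the square of the two-sided 5% t critical
    # value equals the upper-tail 5% F(1, n) critical value.
    print(round(t.ppf(0.975, n) ** 2, 4))               # ~4.9646
    print(round(f.ppf(0.95, 1, n), 4))                  # ~4.9646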