STA 671 – Ch 4 – Probability and Probability Distribution Notes

4.1 Probability in inference

Classical interpretation = equally likely outcomes in games of chance
Outcome = possible distinct result
Event = collection of outcomes
Relative frequency interpretation of probability = n(E)/n
Personal/subjective probability

#-----------------------------------------------------------------------------------------------------------------#

4.2 Finding the Probability of an Event

Probability of an event ≈ (number of times the event occurs)/(number of times it might occur)

OL example – Toss two coins. Outcomes = HH, HT, TH, TT.
What is the probability of exactly one head in two tosses of fair coins?

R simulation ...
my.samp <- sample(x=c("HH", "HT", "TH", "TT"), size=1000, replace=TRUE)
sum(my.samp=="HT" | my.samp=="TH")                    # count expts with exactly one head
[1] 493
sum(my.samp=="HT" | my.samp=="TH")/length(my.samp)    # est. prob
[1] 0.493

#-----------------------------------------------------------------------------------------------------------------#

4.3 Basic Event Relations and Probability Laws

Events A and B; 0 <= P(A) <= 1
UNION = either A or B (or both) occurs
INTERSECTION = both A and B occur
MUTUALLY EXCLUSIVE = occurrence of one event excludes the possibility of the other event
If A and B are mutually exclusive, then Pr(A or B) = Pr(A) + Pr(B)
COMPLEMENT of event A = the event that A does NOT occur
Probability of the union (in general): Pr(A or B) = Pr(A) + Pr(B) - Pr(A and B)
A Venn diagram helps visualize these relations.

Aside: R functions related to set manipulation
union(x, y)
intersect(x, y)
setdiff(x, y)
setequal(x, y)
is.element(el, set)

A <- c(1, 3, 5)
B <- c(4, 5)
union(A,B)
[1] 1 3 5 4
intersect(A,B)
[1] 5
is.element(4,A)
[1] FALSE
is.element(4,B)
[1] TRUE

#-----------------------------------------------------------------------------------------------------------------#

4.4 Conditional Probability and Independence

Pr(A | B) = Pr(A AND B)/Pr(B), where Pr(B) > 0
Pr(A AND B) = Pr(A | B) * Pr(B)

Pr(False POSITIVE) = Pr(Test POSITIVE | Disease Free)
Pr(False NEGATIVE) = Pr(Test NEGATIVE | Disease Present)

INDEPENDENT
[Def'n] Occurrence of event A is not
dependent on the occurrence of event B.
[Implication] Pr(A | B) = Pr(A)
Pr(A ∩ B) = Pr(A | B) * Pr(B) = Pr(A) * Pr(B) (if A is INDEPENDENT of B)
Don't confuse INDEPENDENCE and MUTUALLY EXCLUSIVE.

#-----------------------------------------------------------------------------------------------------------------#

4.5 Bayes' Formula – not covered in class

#-----------------------------------------------------------------------------------------------------------------#

4.6 Variables: discrete and continuous

RANDOM VARIABLE
- qualitative RV – categorical responses
- quantitative RV – response varies in numerical magnitude
DISCRETE – countable outcomes [counting]
CONTINUOUS – uncountable/continuum of outcomes [measuring]

#-----------------------------------------------------------------------------------------------------------------#

4.7 Probability distributions for discrete RVs

P(y) = probability distribution
Properties:
1. 0 ≤ P(y) ≤ 1
2. ∑ P(y) = 1
3. Pr(y1 or y2) = P(y1) + P(y2) for distinct values y1 and y2

EXAMPLE: Number of heads in 3 tosses
y      0    1    2    3
P(y)  1/8  3/8  3/8  1/8

#-----------------------------------------------------------------------------------------------------------------#

4.8 Binomial

Binomial Experiment
1. n identical trials in an experiment
2. each trial results in one of two outcomes – S/F
3. P(S) = π
4. trials are independent
5. RV y = # of successes in n trials

EXAMPLES:
y = # who survive given n exposed in a toxicology study
y = # who support a candidate out of n sampled in politics

P(y) = [n!/(y!(n-y)!)] * π^y * (1-π)^(n-y)
where n = # trials, π = P(S), and n! = n(n-1)(n-2)…(2)(1)

R function – dbinom – calculates P(y)
dbinom(x=c(0,1,2,3),size=3,prob=0.5)
[1] 0.125 0.375 0.375 0.125
pp <- dbinom(x=c(0,1,2,3),size=3,prob=0.5)
barplot(pp,names.arg=0:3)

[Figure: barplot of P(y) for y = 0, 1, 2, 3]

MEAN (μ) and STANDARD DEVIATION (σ) of the Binomial
μ = nπ
σ = sqrt(nπ(1-π))

# different BINOMIALs illustrated for n=20
# Consider the shape of P(y) for binomial distributed random variables with n=20 and π = 0.05, 0.25, 0.50 or 0.80.
Values that y can assume: y = 0, 1, 2, …, 20

[Figure: four panels of P(y), each with the MEAN marked – Bin(n=20,p=0.05), Bin(n=20,p=0.25),
 Bin(n=20,p=0.50), Bin(n=20,p=0.80); x-axis: Number of Successes, y-axis: Probability]

Q: What do you notice about the shape of this probability function?

R code to generate the different Binomial plots
pp05 <- dbinom(x=0:20, size=20, prob=0.05)
pp25 <- dbinom(x=0:20, size=20, prob=0.25)
pp50 <- dbinom(x=0:20, size=20, prob=0.50)
pp80 <- dbinom(x=0:20, size=20, prob=0.80)
par.old.mar <- par()$mar
par(mfrow=c(2,2))
par(mar=c(2,2,2,2))   # may lose some of margin text
plot(0:20, pp05, main="Bin(n=20,p=0.05)", type="h", lwd=4, ylim=c(0,.40),
     ylab="Probability", xlab="Number of Successes")
abline(v=(20*.05))
mtext("MEAN", side=1, at=20*.05)
plot(0:20, pp25, main="Bin(n=20,p=0.25)", type="h", lwd=4, ylim=c(0,.40),
     ylab="Probability", xlab="Number of Successes")
abline(v=(20*.25))
mtext("MEAN", side=1, at=20*.25)
plot(0:20, pp50, main="Bin(n=20,p=0.50)", type="h", lwd=4, ylim=c(0,.40),
     ylab="Probability", xlab="Number of Successes")
abline(v=(20*.5))
mtext("MEAN", side=1, at=20*.50)
plot(0:20, pp80, main="Bin(n=20,p=0.80)", type="h", lwd=4, ylim=c(0,.40),
     ylab="Probability", xlab="Number of Successes")
abline(v=(20*.80))
mtext("MEAN", side=1, at=20*.80)

#-----------------------------------------------------------------------------------------------------------------#

4.9 Probability distributions for continuous RVs

Suppose outcomes take values over a continuum. For example, a random variable that corresponds to weight, cholesterol level, etc. could be conceptualized in this manner.

As an example, let Y = time after class begins when students arrive.
Consider a relative frequency histogram corresponding to Y for 10, 100, 1000 or 10000 students.

Q: What will more data do for you with respect to a relative frequency histogram?
* smaller bins
* smooth curve describing this variable's distribution, say f(y)

Properties of f(y)
* This differs from P(y) for a discrete RV – f(y) does NOT EQUAL Pr(y)
* For a continuous RV, the probability of an individual value is zero: Pr(y = a) = 0
* For a continuous RV, the probability of an interval, Pr(a < y < b), = area under f(y) over the interval
  For example, suppose y = weight in pounds.
  What is Pr(y = 182.738509138920102 lbs)? Pr(y = any specific value) = 0.
  What is Pr(178 < y < 182)? This can be determined.
* Total area under the curve = 1

[Figure: histograms of rlnorm(10), rlnorm(100), rlnorm(1000) and rlnorm(10000) –
 more data gives a smoother picture of the distribution]

R code to generate this (sampling log-normal variates)
hist(rlnorm(10),breaks=seq(from=0,to=40,length=10))
hist(rlnorm(100),breaks=seq(from=0,to=40,length=15))
hist(rlnorm(1000),breaks=seq(from=0,to=40,length=25))
hist(rlnorm(10000),breaks=seq(from=0,to=50,length=50))

#-----------------------------------------------------------------------------------------------------------------#

4.10 NORMAL

The "all star" of continuous distributions – symmetric, unimodal, bell-shaped (sometimes called the Gaussian distribution)
* described by two parameters (μ = mean; σ = standard deviation)
Notation: y ~ N(μ, σ) – read "y is distributed as a normal RV with mean μ and std dev σ"
STANDARD NORMAL has mean = 0 and std. dev. = 1; z ~ N(0,1)

[Figure: density curves of N(0,1) (solid) and N(2,2) (dashed)]

R code to generate figures
xx <- seq(from=-4,to=6,length=1000)
yy0 <- dnorm(xx)
yy2 <- dnorm(xx,mean=2,sd=sqrt(2))
plot(xx,yy0,type="l",lty=1,lwd=2)
lines(xx,yy2,type="l",lty=2,lwd=2)
text(-1.5,.3,"N(0,1)")
text(4,.2,"N(2,2)")

Areas under normal distributions are the same when intervals are expressed in terms of standard deviations. Huh? Suppose y ~ N(500, 100) (e.g.
SAT-V) and z ~ N(0,1).
Pr(400 < y < 600) = Pr(-1 < z < 1)
[both reflect the area under a normal curve from one std dev below the mean to one std dev above the mean]

So what? If y ~ N(μ, σ) then z = (y - μ)/σ ~ N(0, 1)
Z score = measure of distance – how many SDs an observation is away from the mean

Example 4.14: y = daily milk production of a Guernsey cow – y ~ N(70, 13)
a. Pr(milk production for a cow chosen at random will be less than 60 pounds) = Pr(y < 60)
Pr(y < 60) = Pr((y - μ)/σ < (60 - 70)/13) = Pr(z < -0.77)
# R function to find area under N(0,1) less than some point
pnorm(-0.77)
[1] 0.2206499
pnorm(60, mean=70, sd=13)
[1] 0.2208782
# answers above differ due to rounding of -10/13 to -0.77
pnorm(-10/13)
[1] 0.2208782
b. Pr(y > 90)
1-pnorm(90, mean=70, sd=13)
[1] 0.0619679
c. Pr(60 < y < 90) = Pr(y < 90) - Pr(y < 60)
pnorm(90, mean=70, sd=13) - pnorm(60, mean=70, sd=13)
[1] 0.717154

Empirical Rule (revisited)
# proportion of points within 1 SD of mean for normal data
# = Pr(-1 < z < 1)
pnorm(1) - pnorm(-1)
[1] 0.6826895
# How about within 2 SD of the mean?
pnorm(2) - pnorm(-2)
[1] 0.9544997
# How about within 3 SD of the mean?
pnorm(3) - pnorm(-3)
[1] 0.9973002

Can we find the point for a normal distribution that cuts off a certain fraction of the distribution below it? I.e., can we find a 100pth percentile?

Example 4.15: SAT ~ N(500, 100)
Proportion of scores below 350? **** Sketch figure?
Pr(SAT < 350) =
pnorm(350, mean=500, sd=100)
[1] 0.0668072
What is the 10%-tile of the SAT scores?
What is the 10%-tile of N(0,1)?
qnorm(0.1)
[1] -1.281552
Mean(SAT) + z(10%tile) * SD(SAT)
500 + qnorm(0.1)*100
[1] 371.8448
or 372 if you round to an integer score
Can also do this directly in R …
qnorm(0.1,mean=500,sd=100)
[1] 371.8448

#-----------------------------------------------------------------------------------------------------------------#

4.11 Random Sampling

Random sample of size "n" from a population of size "N" – all samples of size "n" are equally likely.
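A minimal R sketch of drawing a simple random sample (not from the notes; the population of unit labels and the seed are made up for illustration). Sampling without replacement with sample() gives every possible sample of size n the same chance of selection.

```r
# Hypothetical population: N = 500 labeled units; draw a simple
# random sample of n = 10 without replacement.
set.seed(671)                       # only so the illustration is reproducible
N <- 500
population <- 1:N                   # unit labels 1, 2, ..., N
n <- 10
my.srs <- sample(x=population, size=n, replace=FALSE)
my.srs                              # labels of the 10 sampled units
length(my.srs)                      # n = 10; no label repeats since replace=FALSE
```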
#-----------------------------------------------------------------------------------------------------------------#

4.12 Sampling Distributions

SAMPLING DISTRIBUTION = the distribution of a statistic over repeated samples
* "meta-experiment" – how can we think about the distribution of a statistic?
* illustrate …

sampling.ybar.unif <- function(nsims=2000, nsamp=5) {
  xx <- matrix(runif(nsims*nsamp), nrow=nsims)
  ybar <- apply(xx, 1, mean)
  par(mfrow=c(1,2), mar=c(2,2,2,2))
  hist(xx, main="data sampled")
  hist(ybar, main="distribution of ybar")
  par(mfrow=c(1,1), mar=c(5.1,4.1,4.1,2.1))
}
sampling.ybar.unif(nsamp=5)

[Figure: "data sampled" (uniform) and "distribution of ybar" histograms]

sampling.ybar.exp <- function(nsims=2000, nsamp=5) {
  xx <- matrix(rexp(nsims*nsamp), nrow=nsims)
  ybar <- apply(xx, 1, mean)
  par(mfrow=c(1,2), mar=c(2,2,2,2))
  hist(xx, main="data sampled")
  hist(ybar, main="distribution of ybar")
  par(mfrow=c(1,1), mar=c(5.1,4.1,4.1,2.1))
}
sampling.ybar.exp(nsamp=25)

[Figure: "data sampled" (exponential) and "distribution of ybar" histograms]

* Compare to "nsamp" = 5

Theorem 4.1: CENTRAL LIMIT THEOREM for ybar
Let ybar denote the sample mean computed from a random sample of n measurements from a population having mean μ and finite standard deviation σ. Then
μ(ybar) = μ   AND   σ(ybar) = σ/sqrt(n)
When n is large, the sampling distribution of ybar will be approximately normal (the approximation becomes more precise as n increases). The sampling distribution of ybar is exactly normal when the population distribution is normal.

ASIDE: the CLT can also be expressed in terms of sums instead of ybar (see Theorem 4.2)

#-----------------------------------------------------------------------------------------------------------------#

4.13 Normal Approx.
to Binomial

* Application of the CLT
* the proportion of successes in a sample can be thought of as a mean of random variables that assume two values (y=1 if success and y=0 if failure)
* y = number of successes in n binomial trials (CLT for sums theorem)
μ = nπ   AND   σ = sqrt(nπ(1-π))
y approximately ~ N(nπ, sqrt(nπ(1-π)))
[Best approximation if π is not too close to 0 or 1 – rule of thumb: nπ ≥ 5 and n(1-π) ≥ 5]
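A quick R check of the approximation, with n and π chosen here only for illustration (n = 100 and π = 0.3, so nπ = 30 and n(1-π) = 70 both exceed 5). The exact binomial probability from pbinom() is compared to the area under the approximating normal curve; the 0.5 added to the cutoff is the usual continuity correction for approximating a discrete distribution by a continuous one.

```r
n <- 100
p <- 0.3                                # p plays the role of pi = P(S)
mu <- n*p                               # mean of y: n*pi = 30
sigma <- sqrt(n*p*(1-p))                # sd of y: sqrt(n*pi*(1-pi)) ~ 4.58
# Exact binomial probability Pr(y <= 25)
exact <- pbinom(25, size=n, prob=p)
# Normal approximation with continuity correction: area below 25.5
approx <- pnorm(25.5, mean=mu, sd=sigma)
c(exact=exact, approx=approx)           # the two values agree closely
```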