STA 671 – Ch 4 – Probability and Probability Distribution Notes
# 4.1 Probability in inference
Classical interpretation = equally likely outcomes in games of chance
Outcome = possible distinct result
Event = collection of outcomes
Relative frequency interpretation of probability = n(E)/n, the proportion of n repetitions in which event E occurs
Personal/Subjective probability
#-----------------------------------------------------------------------------------------------------------------# 4.2 Finding Probability of an Event
Number of times event occurs/Number of times it might occur
Example - Toss two coins. Outcomes = HH, HT, TH, TT
What is the probability of exactly one head in two tosses of fair coins?
R simulation ...
my.samp <- sample(x=c("HH","HT","TH","TT"), size=1000, replace=TRUE)
# count experiments with exactly one head
sum(my.samp=="HT" | my.samp=="TH")
[1] 493
# estimated probability (the exact answer is 2/4 = 0.5)
sum(my.samp=="HT" | my.samp=="TH")/length(my.samp)
[1] 0.493
#-----------------------------------------------------------------------------------------------------------------# 4.3 Basic Event Relations and Prob laws
Events A or B
0 <= P(A) <= 1
UNION = either A or B
MUTUALLY EXCLUSIVE = occurrence of one event excludes possibility of other event
A and B mutually exclusive then Pr(A or B) = Pr(A) + Pr(B)
COMPLEMENT of event A is the event that A does NOT occur
UNION
INTERSECTION
Probability of the union
Venn Diagram
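As a small sketch of the addition rule (my own example, not from the notes): enumerate the six equally likely faces of one fair die and check that Pr(A or B) = Pr(A) + Pr(B) - Pr(A and B).

```r
# Hypothetical example: one roll of a fair die.
outcomes <- 1:6
A <- outcomes %% 2 == 0   # A = roll is even: {2, 4, 6}
B <- outcomes >= 4        # B = roll is at least 4: {4, 5, 6}
mean(A | B)                       # Pr(A or B) = 5/6
mean(A) + mean(B) - mean(A & B)   # addition rule gives the same value
```

Note A and B are not mutually exclusive here (they share 4 and 6), which is why Pr(A and B) must be subtracted.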
Aside: R functions related to set manipulation
union(x, y)
intersect(x, y)
setdiff(x, y)
setequal(x, y)
is.element(el, set)
A <- c(1, 3, 5)
B <- c(4, 5)
union(A,B)
[1] 1 3 5 4
intersect(A,B)
[1] 5
is.element(4,A)
[1] FALSE
is.element(4,B)
[1] TRUE
#-----------------------------------------------------------------------------------------------------------------# 4.4 Conditional Prob and Independence
Pr( A | B ) = Pr( A AND B )/Pr(B) where Pr(B) > 0
Pr( A AND B ) = Pr( A | B ) * Pr(B)
Pr(False POSITIVE) = Pr(Test POSITIVE | Disease ABSENT)
Pr(False NEGATIVE) = Pr(Test NEGATIVE | Disease PRESENT)
INDEPENDENT
[Def'n] Occurrence of event A is not dependent on the occurrence of event B.
[Implication]
Pr( A | B ) = Pr(A)
Pr( A ∩ B ) = Pr( A | B ) * Pr(B) = (if A INDEP of B) = Pr(A) * Pr(B)
Don't confuse INDEPENDENCE and MUTUALLY EXCLUSIVE
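A quick enumeration (my own example, not from the notes) makes the conditional-probability formula and the independence check concrete: with two fair dice, all 36 ordered rolls are equally likely.

```r
# All 36 equally likely ordered rolls of two fair dice.
rolls <- expand.grid(d1 = 1:6, d2 = 1:6)
A <- rolls$d1 == 6              # first die shows a six
B <- rolls$d1 + rolls$d2 == 7   # the sum is seven
mean(A & B) / mean(B)           # Pr(A | B) = 1/6
mean(A)                         # Pr(A) = 1/6, so knowing B doesn't change A
```

Since Pr(A | B) = Pr(A), these events are independent, yet they are not mutually exclusive (the roll (6, 1) is in both).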
#-----------------------------------------------------------------------------------------------------------------# 4.5 Bayes' Formula
- not covered in class
#-----------------------------------------------------------------------------------------------------------------# 4.6 Variables: discrete and continuous
RANDOM VARIABLE
- qualitative RV - categorical responses
- quantitative RV - response varies in numerical magnitude
DISCRETE - countable outcomes [counting]
CONTINUOUS - uncountable/continuum of outcomes [measuring]
#-----------------------------------------------------------------------------------------------------------------# 4.7 Prob. dist'ns for discrete RV
P(y) = probability distribution
Properties:
1. 0 ≤ P(y) ≤ 1
2. ∑ P(y) = 1
3. Prob(y1 or y2) = P(y1) + P(y2) for distinct values y1 ≠ y2
EXAMPLE: Number of heads in 3 tosses

 y    P(y)
 0    1/8
 1    3/8
 2    3/8
 3    1/8
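The three properties can be checked directly in R for this table (a sketch using the values above):

```r
py <- c(1, 3, 3, 1) / 8   # P(y) for y = 0, 1, 2, 3 (heads in 3 tosses)
all(py >= 0 & py <= 1)    # property 1: TRUE
sum(py)                   # property 2: sums to 1
py[2] + py[4]             # Pr(y = 1 or y = 3) = 4/8 = 0.5
```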
#-----------------------------------------------------------------------------------------------------------------# 4.8 Binomial
Binomial Experiment
1. n identical trials in an experiment
2. Each trial results in one of two outcomes - S/F
3. P(S) = π
4. Trials are independent
5. RV y = # of successes in n trials
EXAMPLE:
y = # survive given n exposed in a toxicology study
y = # support candidate out of n sampled in politics
P(y) = [n! / (y! (n-y)!)] π^y (1-π)^(n-y)
where n = # trials, π = P(S), and n! = n(n-1)(n-2)…(2)(1)
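The formula can be evaluated by hand in R (a sketch with n = 3, π = 0.5, y = 2, matching the coin-toss table above):

```r
n <- 3; p <- 0.5; y <- 2
# n! / (y! (n-y)!) * pi^y * (1-pi)^(n-y)
factorial(n) / (factorial(y) * factorial(n - y)) * p^y * (1 - p)^(n - y)
# equivalently: choose(n, y) * p^y * (1 - p)^(n - y); both give 3/8 = 0.375
```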
R function – dbinom – calculates P(y)
dbinom(x=c(0,1,2,3),size=3,prob=0.5)
[1] 0.125 0.375 0.375 0.125
pp <- dbinom(x=c(0,1,2,3),size=3,prob=0.5)
barplot(pp,names.arg=0:3)
[Figure: barplot of P(y) for y = 0, 1, 2, 3]
MEAN (μ) and STANDARD DEVIATION (σ) of Binomial
μ = nπ
σ = √(nπ(1-π))
#
# different BINOMIALs illustrated for n=20
#
Consider the shape of P(y) for binomial distributed random variables with n=20 and π = 0.05,
0.25, 0.50 or 0.80.
Values that y can assume: y=0, 1, 2, …, 20
[Figure: four panels - Bin(n=20,p=0.05), Bin(n=20,p=0.25), Bin(n=20,p=0.50), Bin(n=20,p=0.80); x-axis = Number of Successes (0 to 20), y-axis = Probability (0 to 0.4), vertical line marking the MEAN in each panel]
Q: What do you notice about the shape of these probability functions?
R code to generate the different Binomial plots
pp05 <- dbinom(x=0:20, size=20, prob=0.05)
pp25 <- dbinom(x=0:20, size=20, prob=0.25)
pp50 <- dbinom(x=0:20, size=20, prob=0.50)
pp80 <- dbinom(x=0:20, size=20, prob=0.80)
par.old.mar <- par()$mar
par(mfrow=c(2,2))
par(mar=c(2,2,2,2)) # may lose some of margin text
plot(0:20, pp05, main="Bin(n=20,p=0.05)", type="h", lwd=4, ylim=c(0,.40),
     ylab="Probability", xlab="Number of Successes")
abline(v=20*.05)
mtext("MEAN", side=1, at=20*.05)
plot(0:20, pp25, main="Bin(n=20,p=0.25)", type="h", lwd=4, ylim=c(0,.40),
     ylab="Probability", xlab="Number of Successes")
abline(v=20*.25)
mtext("MEAN", side=1, at=20*.25)
plot(0:20, pp50, main="Bin(n=20,p=0.50)", type="h", lwd=4, ylim=c(0,.40),
     ylab="Probability", xlab="Number of Successes")
abline(v=20*.50)
mtext("MEAN", side=1, at=20*.50)
plot(0:20, pp80, main="Bin(n=20,p=0.80)", type="h", lwd=4, ylim=c(0,.40),
     ylab="Probability", xlab="Number of Successes")
abline(v=20*.80)
mtext("MEAN", side=1, at=20*.80)
#-----------------------------------------------------------------------------------------------------------------# 4.9 Prob. Dist'ns for continuous RV
Suppose outcomes take values over a continuum. For example, a random variable that
corresponds to weight, cholesterol levels, etc. could be conceptualized in this manner.
As an example, let Y = time after class begins when students arrive
Consider a relative frequency histogram corresponding to Y for 10, 100, 1000 or 10000 students.
Q: What will more data do for you with respect to a relative frequency histogram?
* smaller bins
* smooth curve describing this variable’s distribution, say f(y)
Properties of f(y)
* This differs from P(y) for a discrete RV - f(y) does NOT EQUAL Pr(y)
* For a continuous RV, the probability of any individual value is zero: Pr(y = a) = 0
* For a continuous RV, the probability of an interval, Pr(a<y <b) = area under f(y) over interval
For example, suppose y=weight in pounds
What is the Pr(y = 182.738509138920102 lbs)? Pr(y = any specific value) = 0
What is the Pr(178 < y < 182)? This can be determined.
* Total area under curve = 1
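These area statements can be checked numerically (a sketch using the standard normal density, which is introduced formally in section 4.10):

```r
integrate(dnorm, lower = -Inf, upper = Inf)$value  # total area under f(y) = 1
integrate(dnorm, lower = -1, upper = 1)$value      # Pr(-1 < y < 1), about 0.683
integrate(dnorm, lower = 2, upper = 2)$value       # Pr(y = 2) = 0: no area over a point
```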
[Figure: histograms of rlnorm(10), rlnorm(100), rlnorm(1000), and rlnorm(10000); y-axis = Frequency]
R code to generate this (sampling log-normal variates)
par(mfrow=c(2,2))   # arrange the four histograms in a 2 x 2 grid
hist(rlnorm(10),breaks=seq(from=0,to=40,length=10))
hist(rlnorm(100),breaks=seq(from=0,to=40,length=15))
hist(rlnorm(1000),breaks=seq(from=0,to=40,length=25))
hist(rlnorm(10000),breaks=seq(from=0,to=50,length=50))
#-----------------------------------------------------------------------------------------------------------------# 4.10 NORMAL
The “all star” of continuous distributions – symmetric, unimodal, bell-shaped (sometimes called
Gaussian distribution)
* described by two parameters (μ = mean; σ = standard deviation)
Notation: y ~ N(μ, σ) – read "y is distributed as a normal RV with mean μ and std dev σ"
STANDARD NORMAL has mean = 0 and std. dev. = 1, z ~ N(0,1)
[Figure: density curves for N(0,1) and N(2,2) plotted from -4 to 6]
R code to generate figures
xx<- seq(from=-4,to=6,length=1000)
yy0 <- dnorm(xx)
yy2 <- dnorm(xx,mean=2,sd=sqrt(2))
plot(xx,yy0,type="l",lty=1,lwd=2)
lines(xx,yy2,type="l",lty=2,lwd=2)
text(-1.5,.3,"N(0,1)")
text(4,.2,"N(2,2)")
Areas under normal distributions are the same when intervals are
expressed in term of standard deviations. Huh?
Suppose y~N(500,100)
(e.g. SAT-V) and z~N(0,1)
Pr(400 < y < 600) = Pr(-1 < z < 1)
[both reflect the area under a normal curve from one std dev
below the mean to one std dev above the mean]
So what?
If y ~ N(μ, σ) then z = (y - μ)/σ ~ N(0, 1)
Z score = measure of distance – how many SDs is an observation away from the mean
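The SAT-V example above can be checked in R either on the original scale or after standardizing (a sketch; both calls compute the same area):

```r
# y ~ N(500, 100): Pr(400 < y < 600) on the original scale ...
pnorm(600, mean = 500, sd = 100) - pnorm(400, mean = 500, sd = 100)
# ... equals Pr(-1 < z < 1) after standardizing z = (y - 500)/100
pnorm(1) - pnorm(-1)   # both about 0.6827
```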
Example 4.14: y=daily milk production of Guernsey cow – y ~ N(70, 13)
a. Pr(milk production for a cow chosen at random will be less than 60 pounds) = Pr(y<60)
Pr(y < 60) = Pr((y - μ)/σ < (60 - 70)/13) = Pr(z < -0.77)
# R function to find area under N(0,1) less than some point
pnorm(-0.77)
[1] 0.2206499
pnorm(60, mean=70, sd=13)
[1] 0.2208782
# answers above differ due to rounding of -10/13 to -0.77
pnorm(-10/13)
[1] 0.2208782
b. Pr(y > 90)
1-pnorm(90, mean=70, sd=13)
[1] 0.0619679
c. Pr(60 < y < 90) = Pr(y < 90) – Pr(y < 60)
pnorm(90, mean=70, sd=13) - pnorm(60, mean=70, sd=13)
[1] 0.717154
Empirical Rule (revisited)
# proportion of points within 1 SD of mean for normal data
# = Pr(-1 < z < 1)
pnorm(1) - pnorm(-1)
[1] 0.6826895
# How about within 2 SD of the mean?
pnorm(2) - pnorm(-2)
[1] 0.9544997
# How about within 3 SD of the mean?
pnorm(3) - pnorm(-3)
[1] 0.9973002
Can we find the point for a normal distribution that cuts off a certain fraction of the distribution
below it? I.e. can we find a 100pth percentile?
Example 4.15 SAT
SAT ~ N(500, 100)
Proportion of scores below 350?
**** Sketch figure?
Pr(SAT < 350) = pnorm(350, mean=500, sd=100)
[1] 0.0668072
What is the 10%-tile of the SAT scores?
What is the 10%-tile of N(0,1)?
qnorm(0.1)
[1] -1.281552
Mean(SAT) + z(10%tile) * SD(SAT)
500 + qnorm(0.1)*100
[1] 371.8448
(about 372 if rounded to an integer score)
Can also do this directly in R …
qnorm(0.1,mean=500,sd=100)
[1] 371.8448
#-----------------------------------------------------------------------------------------------------------------# 4.11 Random Sampling
Random sample of size "n" from a population of size "N" – all samples of size "n" are equally
likely.
#-----------------------------------------------------------------------------------------------------------------# 4.12 Sampling Distributions
SAMPLING DISTRIBUTION = the distribution of a statistic over repeated samples
* “meta-experiment” – how can we think about the distribution of a statistic?
* illustrate …
sampling.ybar.unif <- function(nsims=2000,nsamp=5) {
xx <- matrix(runif(nsims*nsamp),nrow=nsims)
ybar <- apply(xx,1,mean)
par(mfrow=c(1,2), mar=c(2,2,2,2))
hist(xx, main="data sampled")
hist(ybar, main="distribution of ybar")
par(mfrow=c(1,1), mar=c(5.1,4.1,4.1,2.1))
}
sampling.ybar.unif(nsamp=5)
[Figure: two panels - "data sampled" (uniform between 0 and 1) and "distribution of ybar"; y-axis = Frequency]
sampling.ybar.exp <- function(nsims=2000,nsamp=5) {
xx <- matrix(rexp(nsims*nsamp),nrow=nsims)
ybar <- apply(xx,1,mean)
par(mfrow=c(1,2), mar=c(2,2,2,2))
hist(xx, main="data sampled")
hist(ybar, main="distribution of ybar")
par(mfrow=c(1,1), mar=c(5.1,4.1,4.1,2.1))
}
sampling.ybar.exp(nsamp=25)
[Figure: two panels - "data sampled" (exponential, right-skewed) and "distribution of ybar"; y-axis = Frequency]
* Compare to “nsamp” = 5
Theorem 4.1: CENTRAL LIMIT THEOREM for ȳ
Let ȳ denote the sample mean computed from a random sample of n measurements from a
population having mean μ and finite standard deviation σ. Let
μ_ȳ = μ  AND  σ_ȳ = σ/√n
When n is large, the sampling distribution of ȳ will be approximately normal (the
approximation becomes more precise as n increases). The sampling distribution of ȳ is
exactly normal when the population distribution is normal.
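A quick simulation (my own check, not from the notes) illustrates both claims of the theorem, sampling from a uniform population with mean 1/2 and SD 1/√12:

```r
set.seed(42)                 # hypothetical seed, for reproducibility only
nsims <- 5000; n <- 25
ybar <- replicate(nsims, mean(runif(n)))  # 5000 sample means, each from n = 25
c(mean(ybar), 0.5)                     # mean of ybar ~ population mean mu
c(sd(ybar), (1 / sqrt(12)) / sqrt(n))  # sd of ybar ~ sigma / sqrt(n)
hist(ybar)                             # roughly bell-shaped, per the CLT
```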
ASIDE: can also be expressed in terms of sums instead of y (see Theorem 4.2)
#-----------------------------------------------------------------------------------------------------------------# 4.13 Normal Approx. to Binomial
* Application of the CLT
* proportion of successes in a sample can be thought of as a mean of random variables that
assume two values (y=1 if success and y=0 if failure)
* y = number of successes with n binomial trials (CLT for sum theorem)
μ = nπ  AND  σ = √(nπ(1-π))
y approximately ~ N(μ, σ)
[Best approximation if π not too close to 0 or 1 -- nπ ≥ 5 and n(1-π) ≥ 5]
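A sketch comparing an exact binomial probability with its normal approximation (my own numbers: n = 100 and π = 0.3, so nπ = 30 and n(1-π) = 70 both exceed 5; the extra 0.5 below is a standard continuity correction, not covered in the notes):

```r
n <- 100; p <- 0.3
mu <- n * p                          # 30
sigma <- sqrt(n * p * (1 - p))       # sqrt(21), about 4.58
pbinom(35, size = n, prob = p)       # exact Pr(y <= 35)
pnorm(35.5, mean = mu, sd = sigma)   # normal approximation, close to exact
```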