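Chapter 3: Joint Distributions

Range Space of (X, Y)
- Discrete two-dimensional RV: the number of possible values of (X(s), Y(s)) is finite or countably infinite.
- Continuous two-dimensional RV: (X(s), Y(s)) can assume any value in some region of the Euclidean plane $\mathbb{R}^2$, e.g. time.

Joint Probability Function
- Discrete RV: $f_{X,Y}(x, y) = P(X = x, Y = y)$.
- Continuous RV: $f_{X,Y}(x, y)$ is a joint density; probabilities are obtained by integrating it over a region.
- Properties: $f_{X,Y}(x, y) \ge 0$, and the total probability (sum or double integral over the range space) equals 1.

Marginal Probability Distribution
- Discrete RV: sum the joint p.f. over the other variable, $f_X(x) = \sum_y f_{X,Y}(x, y)$ and $f_Y(y) = \sum_x f_{X,Y}(x, y)$.
- Continuous RV: starting from f(x, y), integrate y away to get $f_X(x) = \int f_{X,Y}(x, y)\,dy$, and integrate x away to get $f_Y(y) = \int f_{X,Y}(x, y)\,dx$.

Conditional Probability Function
$f_{Y|X}(y \mid x) = f_{X,Y}(x, y) / f_X(x)$ for $f_X(x) > 0$ (same form for discrete and continuous RVs).

Independent Random Variables
X and Y are independent if and only if $f_{X,Y}(x, y) = f_X(x) f_Y(y)$ for all x and y. Consequently $f_{X,Y}(x, y) > 0$ if and only if $f_X(x)$ and $f_Y(y)$ are both positive, so the range space $R_{X,Y}$ must be a product space.
Conclusion: if $R_{X,Y}$ is not a product space, then X and Y are not independent.

Expectation
For any two-variable function g(x, y):
- Discrete RV: $E[g(X, Y)] = \sum_x \sum_y g(x, y) f_{X,Y}(x, y)$.
- Continuous RV: $E[g(X, Y)] = \iint g(x, y) f_{X,Y}(x, y)\,dx\,dy$.
Remember: for E(XY) multiply the p.f. by xy, e.g. $\iint xy\, f(x, y)\,dx\,dy$; for E(X) multiply by x.

Covariance
General: $\mathrm{cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)] = E(XY) - E(X)E(Y)$, evaluated with sums for discrete RVs and integrals for continuous RVs.
Properties
(1) cov(X, Y) = E(XY) − E(X)E(Y).
(2) If X and Y are independent, then cov(X, Y) = 0. However, cov(X, Y) = 0 does not imply that X and Y are independent.
(3) cov(aX + b, cY + d) = ac·cov(X, Y).
(4) $V(aX + bY) = a^2 V(X) + b^2 V(Y) + 2ab\,\mathrm{cov}(X, Y)$.
In particular V(X + Y) = V(X) + V(Y) + 2cov(X, Y) and V(X − Y) = V(X) + V(Y) − 2cov(X, Y). If X and Y are independent, cov(X, Y) = 0, so V(X ± Y) = V(X) + V(Y).

Examples
DISCRETE: find E(Y − X) and cov(X, Y).
Method 1: apply the definition directly, $E(Y - X) = \sum_x \sum_y (y - x) f(x, y)$.
Method 2: use the marginals.
E(Y) = 0·(1/8) + 1·(3/8) + 2·(3/8) + 3·(1/8) = 1.5
E(X) = 0·(1/2) + 1·(1/2) = 0.5
E(Y − X) = E(Y) − E(X) = 1.5 − 0.5 = 1
E(XY) = (0)(0)(1/8) + (0)(1)(1/4) + (0)(2)(1/8) + ... + (1)(3)(1/8) = 1
cov(X, Y) = E(XY) − E(X)E(Y) = 1 − (0.5)(1.5) = 0.25
CONTINUOUS: given a joint p.d.f., find cov(X, Y) from cov(X, Y) = E(XY) − E(X)E(Y), computing each expectation by integration.

Chapter 4: Special Probability Distributions

Discrete Uniform Distribution
If RV X assumes the values x1, x2, ..., xk with equal probability, then X follows a discrete uniform distribution.
p.f.: $f_X(x) = 1/k$
Mean: $\mu = E(X) = \frac{1}{k}\sum_{i=1}^{k} x_i$
Variance: $V(X) = \frac{1}{k}\sum_{i=1}^{k} (x_i - \mu)^2$

II) Negative Binomial Distribution
X = number of trials until the kth success occurs. X ∼ NB(k, p), where p is the probability of success.
p.f.: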
$f_X(x) = \binom{x-1}{k-1} p^k (1-p)^{x-k}$, for x = k, k+1, k+2, ...
Mean & Variance: E(X) = k/p, $V(X) = (1-p)k/p^2$.

Note on the discrete uniform distribution: its p.f. applies for x = x1, x2, ..., xk and is 0 otherwise; its variance can also be computed as $E(X^2) - (E(X))^2$.

Bernoulli Trial
A random experiment with only two possible outcomes: success or failure.
p.f.: the binomial p.f. with n = 1, $f_X(x) = p^x (1-p)^{1-x}$, for x = 0, 1.
Mean & Variance: E(X) = p, V(X) = p(1 − p).

Bernoulli Process
A sequence of repeatedly performed independent and identical Bernoulli trials.

Distributions derived from the Bernoulli trial and process
(1) Binomial Distribution
(2) Negative Binomial Distribution (with the Geometric Distribution as a special case)
(3) Poisson Distribution

I) Binomial Distribution
A binomial random variable counts the number of successes in n trials of a Bernoulli process. X ∼ B(n, p), where p is the probability of success.
The probability of getting exactly x successes is $P(X = x) = \binom{n}{x} p^x (1-p)^{n-x}$, for x = 0, 1, ..., n.
Mean & Variance: E(X) = np, V(X) = np(1 − p).

Geometric Distribution: Special Case of NB
X = number of i.i.d. Bernoulli(p) trials until the first success occurs; then X follows a geometric distribution, denoted X ∼ G(p).
p.f.: $f_X(x) = (1-p)^{x-1} p$, for x = 1, 2, 3, ...
Mean & Variance: E(X) = 1/p, $V(X) = (1-p)/p^2$.

III) Poisson Distribution
X denotes the number of events occurring in a fixed period of time or fixed region. X ∼ Poisson(λ), where λ is the expected number of occurrences during the given period or region.
p.f.: $f_X(x) = e^{-\lambda} \lambda^x / x!$, for x = 0, 1, 2, ...
Mean & Variance: E(X) = λ, V(X) = λ.
Poisson Process: a continuous-time process in which the expected number of occurrences in an interval of length T is αT. The number of occurrences in that interval follows a Poisson(αT) distribution.

Poisson Approximation to the Binomial Distribution
Let X ∼ B(n, p). Suppose that n → ∞ and p → 0 in such a way that λ = np remains constant. Then, approximately, X ∼ Poisson(np).
The approximation is good when n ≥ 20 and p ≤ 0.05, or when n ≥ 100 and np ≤ 10.

Continuous Distributions

Continuous Uniform Distribution
X ∼ U(a, b). A random variable X is said to follow a uniform distribution over the interval (a, b) if its probability density function is
p.d.f.: $f_X(x) = 1/(b-a)$ for a ≤ x ≤ b, and 0 otherwise.
Mean & Variance: E(X) = (a + b)/2, $V(X) = (b-a)^2/12$.

"No-Memory" Property of the Exponential Distribution
Suppose that X has an exponential distribution with parameter λ > 0.
Then for any two positive numbers s and t, we have P(X > s + t | X > s) = P(X > t).

Exponential Distribution
X ∼ Exp(λ). RV X is said to follow an exponential distribution with parameter λ > 0 if its p.d.f. is
p.d.f.: $f_X(x) = \lambda e^{-\lambda x}$ for x > 0, and 0 otherwise.
Mean & Variance: E(X) = 1/λ, $V(X) = 1/\lambda^2$.
Alternative p.d.f. (parameterised by the mean μ = 1/λ): $f_X(x) = \frac{1}{\mu} e^{-x/\mu}$ for x > 0, and 0 otherwise. For example, if E(T) = 5, then λ = 1/5.
Mean & Variance: E(X) = μ, $V(X) = \mu^2$.

Normal Distribution
X ∼ N(μ, σ²). RV X is said to follow a normal distribution with parameters μ and σ² if its probability density function is
p.d.f.: $f_X(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$, for −∞ < x < ∞.
Mean & Variance: E(X) = μ, V(X) = σ².

Quantile
The αth (upper) quantile (0 ≤ α ≤ 1) of the RV X is the number $x_\alpha$ that satisfies $P(X \ge x_\alpha) = \alpha$.
We denote by $z_\alpha$ the αth (upper) quantile (or 100α percentage point) of Z ∼ N(0, 1), so $P(Z \ge z_\alpha) = \alpha$. That is, $z_\alpha$ is the value on the x-axis with area α to its right, e.g. $z_{0.05} = 1.645$ and $z_{0.01} = 2.326$.

Normal Approximation to the Binomial Distribution
Let X ∼ B(n, p), so that E(X) = np and V(X) = np(1 − p). As n → ∞ with p held constant, the normal distribution can be used to approximate the binomial distribution.
Rule of thumb: use the normal approximation when np > 5 and n(1 − p) > 5.
Recall: when n → ∞, p → 0 and np remains constant, the Poisson distribution is used to approximate the binomial distribution instead.

Chapter 5: Sampling & Sampling Distributions

Population: the totality of all possible outcomes or observations of a survey or experiment.
- Finite population: consists of a finite number of elements.
- Infinite population: consists of an infinitely large (countable or uncountable) number of elements, e.g. the results of all possible rolls of a pair of dice, or the depths at all conceivable positions of a lake.

Sample: any subset of a population.

Simple Random Sample
A simple random sample of n members is a sample chosen such that every subset of n observations of the population has the same probability of being selected.

Other Distributions (χ², t, and F)
These are examples of distributions derived from random samples from a normal distribution.

χ² Distribution
If Z is a standard normal variable, a random variable with the same distribution as Z² is called a χ² random variable with one degree of freedom.
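The definition of the χ² distribution with one degree of freedom can be checked numerically. The sketch below uses the identity P(Z² ≤ c) = P(−√c ≤ Z ≤ √c); the cut-off `c_val` is an arbitrary illustrative value, not one taken from the notes.

```r
# Numerical check that Z^2 behaves like a chi-square(1) variable when Z ~ N(0, 1).
# For any c > 0: P(Z^2 <= c) = P(-sqrt(c) <= Z <= sqrt(c)).
c_val <- 3.84  # illustrative cut-off, chosen arbitrarily

p_from_normal <- pnorm(sqrt(c_val)) - pnorm(-sqrt(c_val))  # computed from N(0, 1)
p_from_chisq  <- pchisq(c_val, df = 1)                     # computed from chi-square(1)

print(c(normal = p_from_normal, chisq = p_from_chisq))
```

The two printed probabilities should agree up to floating-point error for any positive cut-off.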
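χ² Distribution with n Degrees of Freedom
A χ² random variable with n degrees of freedom is denoted χ²(n). If Y ∼ χ²(n), then χ²(n; α) is the value satisfying P(Y > χ²(n; α)) = α.
E(Y) = n (the degrees of freedom) and Var(Y) = 2n (twice the degrees of freedom). For large n, χ²(n) is approximately N(n, 2n).
If S² is the variance of a random sample of size n taken from a normal population having variance σ², then the random variable $(n-1)S^2/\sigma^2$ has a χ² distribution with n − 1 degrees of freedom.
Example: a random sample of size 6 is drawn from a N(μ, 4) population, and S² is the sample variance. Find c such that P(S² > c) = 0.05. Here $5S^2/4 \sim \chi^2(5)$, so $c = \tfrac{4}{5}\chi^2(5; 0.05)$.
Example: find the probability that a random sample of 25 observations, from a normal population with variance σ² = 6, will have a variance S² (a) greater than 9.1. With n − 1 = 24 degrees of freedom, $P(S^2 > 9.1) = P(\chi^2(24) > 24(9.1)/6) = P(\chi^2(24) > 36.4)$.

t Distribution
Suppose Z ∼ N(0, 1) and U ∼ χ²(n). If Z and U are independent, then $T = Z / \sqrt{U/n}$ follows the t-distribution with n degrees of freedom.
If T ∼ t(n), then E(T) = 0 and var(T) = n/(n − 2) for n > 2.
If T ∼ t(n), then $t_{n;\alpha}$ is the value satisfying $P(T > t_{n;\alpha}) = \alpha$.
The t-distribution approaches N(0, 1) as the parameter n → ∞. When n ≥ 30, we can replace it by N(0, 1).

F Distribution
Suppose U ∼ χ²(m) and V ∼ χ²(n) are independent. Then the distribution of the random variable $F = (U/m)/(V/n)$ is called an F-distribution with (m, n) degrees of freedom.
For X ∼ F(m, n), E(X) = n/(n − 2) for n > 2.
F(m, n; α) is the value satisfying P(F > F(m, n; α)) = α.
Example: S1² and S2² represent the variances of independent random samples of size n1 = 8 and n2 = 12, taken from normal populations with equal variances, so S1²/S2² ∼ F(7, 11). In R, pf(4.89, 7, 11) = 0.9900295 is the lower-tail probability P(S1²/S2² ≤ 4.89).

Sample Mean and Sample Variance
Sample mean: $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$.
Sample variance: $S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2$.
Mean & Variance of X̄: $E(\bar{X}) = \mu$ and $V(\bar{X}) = \sigma^2/n$.
Standard Error: the standard deviation of the sample mean, denoted $\sigma_{\bar{X}} = \sigma/\sqrt{n}$, as opposed to σ, which is the standard deviation of the population.
Central Limit Theorem: for a random sample of large size n from a population with mean μ and finite variance σ², X̄ is approximately N(μ, σ²/n).

Chapter 6: Estimation

Two types of statistical inference methods
1. Estimation of population parameters (Chap 6)
2. Testing hypotheses about the parameter values (Chap 7)

Unbiased Estimator
An estimator $\hat{\theta}$ is unbiased for θ when $E(\hat{\theta}) = \theta$, where θ can be p, μ, or σ². For example, S² is an unbiased estimator of σ² since E(S²) = σ².

Summary of Test Statistic, Max Error of Estimate & Sample Size
With E denoting the maximum error of estimate, we say that X̄ ± E has probability (1 − α) of containing μ.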
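The statement that X̄ ± E contains μ with probability (1 − α) can be illustrated with a small sketch. It assumes the known-variance case, for which the standard maximum error is E = z(α/2)·σ/√n; the summary table itself is not reproduced in these notes, and all input values below are hypothetical.

```r
# Sketch of a (1 - alpha) confidence interval for mu with KNOWN sigma.
# All inputs are hypothetical illustration values, not data from the notes.
xbar  <- 50     # observed sample mean
sigma <- 4      # known population standard deviation
n     <- 25     # sample size
alpha <- 0.05   # 1 - alpha = 95% confidence level

z_half <- qnorm(alpha / 2, lower.tail = FALSE)  # upper alpha/2 quantile of N(0, 1)
E      <- z_half * sigma / sqrt(n)              # maximum error of estimate
ci     <- c(lower = xbar - E, upper = xbar + E)

# Smallest n that keeps the maximum error within a target value (here 1 unit)
target_E <- 1
n_needed <- ceiling((z_half * sigma / target_E)^2)

print(E)
print(ci)
print(n_needed)
```

Rearranging E = z(α/2)·σ/√n for n gives the sample-size calculation in the last step; rounding up keeps the error at or below the target.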
Confidence Interval
An interval estimate (a, b) is one that we are "fairly certain" contains the parameter. This "fairly certain" can be quantified by the degree of confidence, also known as the confidence level (1 − α), in the sense that P(a < μ < b) = 1 − α. The interval (a, b) is called the (1 − α) confidence interval.

Comparing Two Populations
- Independent, known and unequal variances: population variances known and not equal; both populations normally distributed or both n ≥ 30.
- Independent, large samples with unknown variances: population variances unknown and not equal; both n ≥ 30.
- Independent, small samples with equal variances: population variances unknown but the same; both n < 30 and normally distributed. The interval uses the t distribution with n1 + n2 − 2 degrees of freedom.
- Independent, large samples with equal variances: population variances unknown but the same; both n ≥ 30 and normally distributed.
- Paired data: work with d, the difference within each pair.

Chapter 7: Hypothesis Testing

Both the null and the alternative hypothesis are statements about a population. In this chapter they are statements about the mean of a population.

Rejection Region (significance level α)
- H1: μ ≠ μ0. Reject H0 if $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$ (z test); if $t < -t_{n-1;\alpha/2}$ or $t > t_{n-1;\alpha/2}$ (t test).
- H1: μ < μ0. Reject H0 if $z < -z_\alpha$ (z test); if $t < -t_{n-1;\alpha}$ (t test).
- H1: μ > μ0. Reject H0 if $z > z_\alpha$ (z test); if $t > t_{n-1;\alpha}$ (t test).

Rejection Region Using the p-Value
- If p-value < α, reject H0.
- If p-value ≥ α, do not reject H0.

Tests Comparing Means: Independent Samples
H0: μ1 − μ2 = δ0
- Known population variances: requires normal populations, or n1 and n2 ≥ 30. The test statistic is built from σ1² and σ2².
- Unknown population variances: requires n1 and n2 ≥ 30. The test statistic uses the sample variances S1² and S2² in place of σ1² and σ2².

Type I vs Type II Error
- Rejecting H0 when H0 is true is called a Type I error.
- Not rejecting H0 when H0 is false is called a Type II error.
- The probability of making a Type I error is called the level of significance, denoted by α.
- With β denoting the probability of a Type II error, we define 1 − β = P(Reject H0 | H0 is false) to be the power of the test.
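The definitions of α, β and power can be made concrete with a small calculation. The sketch below assumes an upper-tailed z test with known σ; the hypotheses, the alternative mean `mu1` and all other numbers are hypothetical values chosen only for illustration.

```r
# Sketch: Type I error, Type II error and power for an upper-tailed z test.
# Hypothetical setting: H0: mu = 20 vs H1: mu > 20, with sigma known.
mu0   <- 20     # mean under H0
mu1   <- 22     # one assumed true mean under H1 (hypothetical)
sigma <- 4      # known population standard deviation
n     <- 16     # sample size
alpha <- 0.05   # level of significance

se   <- sigma / sqrt(n)                              # standard error of the sample mean
crit <- mu0 + qnorm(alpha, lower.tail = FALSE) * se  # reject H0 when xbar > crit

type1 <- pnorm(crit, mean = mu0, sd = se, lower.tail = FALSE)  # P(reject H0 | H0 true)
power <- pnorm(crit, mean = mu1, sd = se, lower.tail = FALSE)  # P(reject H0 | mu = mu1)
beta  <- 1 - power                                             # P(Type II error | mu = mu1)

print(c(type1 = type1, power = power, beta = beta))
```

By construction the Type I error probability equals α, while the power depends on which alternative mean is assumed to be true.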
Test Statistic (single mean)
- Known variance: population variance is known; n sufficiently large (≥ 30). Use the z statistic.
- Unknown variance: population variance unknown; distribution normal. Use the t statistic with n − 1 degrees of freedom.
The rejection region then follows from the alternative hypothesis.

Two Independent Samples, Variances Unknown but Equal
- Conditions: population variances unknown but equal; normal distributions; n1 and n2 < 30.
- The statistic uses the pooled standard deviation $S_p = \sqrt{(SD_1^2 + SD_2^2)/2}$ (this simple average of the two sample variances applies when n1 = n2).
- Test against the t distribution with n − 2 degrees of freedom, where n = n1 + n2.

Paired Data
Let $D_i = X_i - Y_i$. For the null hypothesis H0: $\mu_D = \mu_{D0}$, the test statistic is $T = (\bar{D} - \mu_{D0}) / (S_D/\sqrt{n})$.
- If n < 30 and the population is normally distributed: $T \sim t_{n-1}$.
- If n ≥ 30: T ∼ N(0, 1) approximately.
The hypothesised population mean difference $\mu_{D0}$ is usually 0.

Using R Functions
R functions used in these notes: pbinom, ppois, pexp and pf.

Revision Questions

Joint Distributions
Tutorial 5 Q6 / Tutorial 6 Q1
Suppose that X and Y are RVs having the given joint probability function.
a) Find E(Y | X = 2).
E(Y | X = 2) = 1(1/4) + 3(2/4) + 5(1/4) = 3.
Alternatively, since X and Y are independent, E(Y | X = 2) = E(Y) = 1(0.25) + 3(0.5) + 5(0.25) = 3.
b) Find E(XY).
E(XY) = (2)(1)(0.10) + (2)(3)(0.20) + (2)(5)(0.10) + (4)(1)(0.15) + (4)(3)(0.30) + (4)(5)(0.15) = 9.6.
(The joint probabilities follow from independence together with P(X = 2) = 0.4 and the marginal of Y used in part a; as a check, E(X)E(Y) = (3.2)(3) = 9.6.)

Q2
Are X and Y independent? Find $f_X(x)$ by integrating the joint p.d.f. over y, find $f_Y(y)$ by integrating over x, then multiply them and compare the product with f(x, y).
Find Cov(X, Y) and V(X + Y).

A fast food restaurant operates a drive-up facility and a walk-up window. On a randomly selected day, let X = the proportion of time that the drive-up facility is in use (at least one customer is being served or waiting to be served) and Y = the proportion of time that the walk-up window is in use. Suppose that the joint probability density function of (X, Y) is given.
(iii) Given that the drive-up facility is busy 80% of the time, what is the probability that the walk-up window is busy at most half the time?
(iv) Given that the drive-up facility is busy 80% of the time, what is the expected proportion of time that the walk-up window is busy?

Special Probability Distributions
Tutorial 7
1. A box contains 2 red marbles and 98 blue ones. Draws are made at random with replacement. In n draws from the box, there is better than a 50% chance for a red marble to appear at least once. What is the smallest possible value for n?
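The smallest n for the marble question can also be found by direct search. This is a minimal sketch: it increases n until P(at least one red) = 1 − 0.98^n first exceeds 0.5, then cross-checks the result with `pbinom`. The variable names are illustrative.

```r
# Smallest n with P(at least one red marble in n draws) > 0.5,
# drawing with replacement from 2 red and 98 blue marbles.
p_red <- 2 / 100   # probability of a red marble on each draw

n <- 1
while (1 - (1 - p_red)^n <= 0.5) {
  n <- n + 1       # keep adding draws until the chance exceeds 50%
}

# Cross-check with the binomial upper tail: P(X >= 1) = P(X > 0)
p_at_least_one <- pbinom(0, size = n, prob = p_red, lower.tail = FALSE)

print(n)
print(p_at_least_one)
```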
Solution (Q1): let X be the number of red marbles in n draws, so X ∼ B(n, 0.02). We need P(X ≥ 1) > 0.5, i.e. $P(X = 0) = (1 - 0.02)^n < 0.5$, which gives n > log(0.5)/log(0.98) = 34.31, so n = 35.
Check by trial and error: pbinom(0, 35, 0.02, lower.tail=FALSE) = 0.5069254, so n = 35.

Q4: Three people toss a fair coin and the odd man pays for coffee. If the coins all turn up the same, they are tossed again. Find the probability that fewer than 4 tosses are needed.
P(failure on a round) = P(HHH) + P(TTT) = 2(1/2)(1/2)(1/2) = 1/4, so P(success on a round) = 3/4.
P(X < 4) = 3/4 + (1/4)(3/4) + (1/4)^2(3/4) = 63/64.

6. A notice is sent to all owners of a certain type of automobile, asking them to bring their cars to a dealer to check for the presence of a particular type of defect. Suppose that only 0.05% of the cars have the defect. Consider a random sample of 10,000 cars.
(b) What is the (approximate) probability that at least 10 sampled cars have the defect?
Use the Poisson approximation since n is large and p is small, with λ = np = 10000(0.0005) = 5. Then P(X ≥ 10) = 1 − P(X ≤ 9):
ppois(9, 5, lower.tail = F)

Q7: A company rents time on a computer for periods of t hours, for which it receives $600 an hour. The number of times the computer breaks down during t hours is a random variable having the Poisson distribution with λ = 0.8t, and if the computer breaks down x times during t hours, it costs $50x^2$ dollars to fix it. How should the company select t in order to maximize its expected profit?

Tutorial 8
3. The time (in hours) required to repair a machine is an exponentially distributed random variable with parameter λ = 1/2. What is the conditional probability that a repair takes at least 10 hours, given that its duration exceeds 9 hours?
X ∼ Exp(1/2). P(X ≥ 10 | X > 9) equals P(X > 1) by the memoryless property of the exponential distribution.
pexp(1, 1/2, lower.tail = F) = 0.6065307

Q8: A coin is tossed 400 times. Use the normal approximation to find the probability of obtaining between 185 and 210 heads inclusive.
Y = number of heads in 400 tosses of a coin
Y ∼ Binomial(n = 400, p = 0.5)
E(Y) = np = 400(0.5) = 200.
V(Y) = np(1 − p) = 400(0.5)(0.5) = 100, so approximately Y ∼ N(200, 10^2).
With the continuity correction, P(185 ≤ Y ≤ 210) ≈ P(184.5 ≤ Y ≤ 210.5) = P(−1.55 ≤ Z ≤ 1.05).

Estimation
Tutorial 9
Q6: Let X be a binomial random variable with parameters n and p.
(a) With U = X/n, E(U) = E(X)/n = np/n = p. Since E(U) = p, U is an unbiased estimator of p. Since E(V) ≠ p, V is a biased estimator of p.

Hypothesis Testing
Q4: A normal population with unknown variance has a mean of 20. Is one likely to obtain a random sample of size 9 from this population with a standard deviation of 4.1 and a mean larger than or equal to 24? If not, what conclusion would you draw?
If μ = 20, then $T = (\bar{X} - 20)/(S/\sqrt{9}) \sim t_8$, and the observed value is t = (24 − 20)/(4.1/3) ≈ 2.93. Since $t_{8;0.01} = 2.896$, P(T ≥ 2.93) < 0.01.
So we conclude that μ > 20: this probability is very small, showing that a sample mean of 24 is very unlikely if the population mean is really 20.

Tutorial 10
Suppose we wish to test the hypothesis H0: μ = 2 vs H1: μ ≠ 2 and found a two-sided p-value of 0.03. Separately, a 95% confidence interval for μ is computed to be (1.5, 4.0). Are these two results compatible? Why or why not?
SOLUTION
No. A p-value of 0.03 means that we reject the null hypothesis at the 0.05 level. On the other hand, the 95% CI contains the null value of 2, which means that we should not reject the null hypothesis at the 0.05 level. So these two results are not compatible.

Other Questions
Population of fish: mean length 54 mm, SD 4.5 mm.
A random sample of four fish is chosen from the population. Find the probability that all four fish are between 51 and 60 mm long.
Continuing from the previous part, find the probability that the mean length of the four fish in the sample is between 51 and 60 mm.
Remember to divide by √n here: the sample mean has standard deviation σ/√n = 4.5/√4 = 2.25, not 4.5.
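The two fish questions differ only in which standard deviation is used, which a short calculation makes visible. This sketch assumes that fish lengths are normally distributed and that the four fish are independent; neither assumption is stated explicitly in the notes above.

```r
# Sketch for the fish questions, ASSUMING lengths ~ N(54, 4.5^2) and independent fish.
mu    <- 54    # population mean length (mm)
sigma <- 4.5   # population standard deviation (mm)
n     <- 4     # sample size

# (1) All four individual fish are between 51 and 60 mm
p_one      <- pnorm(60, mean = mu, sd = sigma) - pnorm(51, mean = mu, sd = sigma)
p_all_four <- p_one^n            # independence: multiply the four probabilities

# (2) The sample MEAN is between 51 and 60 mm: use the standard error sigma / sqrt(n)
se     <- sigma / sqrt(n)
p_mean <- pnorm(60, mean = mu, sd = se) - pnorm(51, mean = mu, sd = se)

print(c(one_fish = p_one, all_four = p_all_four, sample_mean = p_mean))
```

The sample mean is far more likely to fall in the interval than all four individual fish are, because its spread is σ/√n rather than σ.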