PS4 Solutions: Probability & Statistics Problem Set

PS4 Solutions
Arturo Valdivia
November 10, 2020

Contents
1
2 ISIR 6.4.6.
3
4 ISIR 7.7.2.
5 ISIR 7.7.4.
6 ISIR 7.7.7.
7
8

1

Let $X$ be a continuous random variable. Find the expected value, the median, the interquartile range (iqr) of $X$, and $P(0.5 < X < 1.5)$ when the PDF is:

(a)
$$f(x) = \begin{cases} 0.3 & 0 \le x < 1 \\ 0.7 & 1 \le x < 2 \\ 0 & \text{otherwise} \end{cases}$$

One can use the areas approach to solve this problem; i.e.,

[Figure: plot of the PDF, with height 0.3 on [0, 1) and height 0.7 on [1, 2); horizontal axis x from 0.0 to 2.0]

Based on the figure we can easily state that $EX = 0.5 \times 0.3 + 1.5 \times 0.7 = 1.2$, where 0.5 and 1.5 are the midpoints of the two intervals and 0.3 and 0.7 are the "heights" of the two components of the p.d.f., respectively (each interval has width 1, so the heights are also the areas).

Alternatively, we can compute the expected value as
$$EX = \int_0^1 x(0.3)\,dx + \int_1^2 x(0.7)\,dx = 0.15x^2\Big|_0^1 + 0.35x^2\Big|_1^2 = 0.15 + 0.35(3) = 1.2.$$

Since $\int_0^1 0.3\,dt = 0.3t\big|_0^1 = 0.3$, the median $x$ must satisfy $0.3 + \int_1^x 0.7\,dt = 0.5$; that is, $0.7t\big|_1^x = 0.2 \implies x = q_2 = 1.285714$.

In the same way, $q_1$ is obtained by solving $0.3t\big|_0^x = 0.25$ and $q_3$ by solving $0.3 + \int_1^x 0.7\,dt = 0.75$, implying that $q_1 = 0.833333$ and $q_3 = ((0.75 - 0.3)/0.7) + 1 = 1.642857$. The iqr is $q_3 - q_1 = 1.642857 - 0.833333 = 0.809524$.

Finally, we need $P(0.5 < X < 1.5) = P(X < 1.5) - P(X < 0.5)$. We have $P(X < 1.5) = 0.3 + \int_1^{1.5} 0.7\,dt = 0.3 + 0.7(0.5) = 0.65$, and $P(X < 0.5) = 0.3(0.5) = 0.15$. Thus, $P(0.5 < X < 1.5) = 0.50$.

(b)
$$f(x) = \begin{cases} 2(x-1) & 1 \le x \le 2 \\ 0 & \text{otherwise} \end{cases}$$

One can use the areas approach to solve this problem; i.e.,

[Figure: plot of the PDF $2(x-1)$ on [1, 2], rising to height 2 at x = 2, with x = 1.666 marked on the horizontal axis]

Since we have a triangular area, the expected value lies two-thirds of the way from the vertex at $x = 1$ to the tallest side at $x = 2$; i.e., $EX = 1 + (2 - 1) \times (2/3) = 1.666667$.

Alternatively, we can compute the expected value as
$$EX = \int_1^2 x \cdot 2(x-1)\,dx = \frac{2x^3}{3}\Big|_1^2 - x^2\Big|_1^2 = \left(\frac{16}{3} - \frac{2}{3}\right) - 3 = 1.666667.$$

The median is the value $x$ such that $\int_1^x f(t)\,dt = 0.5$. Therefore we need to solve $\int_1^x 2(t-1)\,dt = 0.5$; that is, $(t^2 - 2t)\big|_1^x = 0.5 \implies x^2 - 2x + 1 = 0.5 \implies (x-1)^2 = 0.5 \implies x = q_2 = 1.707107$.
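As a numerical cross-check of the expected value and median just derived for part (b), the following is a minimal sketch in base R. The names f, cdf, ex, and med are introduced only for this illustration and are not part of the original solution; the sketch assumes that numerical integration (integrate) and root finding (uniroot) are acceptable substitutes for the closed-form work.

```r
# Density from part (b): f(x) = 2(x - 1) on [1, 2], 0 elsewhere
f <- function(x) ifelse(x >= 1 & x <= 2, 2 * (x - 1), 0)

# Expected value: integrate x * f(x) over the support [1, 2]
ex <- integrate(function(x) x * f(x), lower = 1, upper = 2)$value

# CDF on the support: F(x) = integral of f from 1 to x
cdf <- function(x) integrate(f, lower = 1, upper = x)$value

# Median: the root of F(x) - 0.5; the search interval starts slightly
# above 1 so that every integral has distinct limits
med <- uniroot(function(x) cdf(x) - 0.5, interval = c(1.1, 2), tol = 1e-9)$root

c(ex, med)
```

Up to numerical tolerance, the two values should agree with 1.666667 and 1.707107. Replacing 0.5 by 0.25 or 0.75 in the uniroot call gives the quartiles by the same route.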
In the same way, $q_1$ is obtained by solving $(x-1)^2 = 0.25$ and $q_3$ by solving $(x-1)^2 = 0.75$, implying that $q_1 = 1.5$ and $q_3 = 1.866025$. The iqr is $q_3 - q_1 = 1.866025 - 1.5 = 0.366025$.

Finally, we need $P(0.5 < X < 1.5) = P(X < 1.5) - P(X < 0.5)$. Using the fact that $q_1 = 1.5$, we have $P(X < 1.5) = 0.25$, and since the density is zero outside $1 \le x \le 2$, $P(X < 0.5) = 0$. Thus, $P(0.5 < X < 1.5) = 0.25$.

2 ISIR 6.4.6.

A random variable $X \sim \text{Uniform}(5, 15)$ has population mean $\mu = EX = 10$ and population variance $\sigma^2 = \operatorname{Var}X = 25/3$. Let $Y$ denote a normal random variable with the same mean and variance.

(a) Consider $X$. What is the ratio of its interquartile range to its standard deviation, iqr$/\sigma$?

We know that $\text{iqr}(X) = q_3(X) - q_1(X)$, so we need $q_{0.75}(X)$ and $q_{0.25}(X)$; that is, $q_{0.75}(X) = 12.5$ and $q_{0.25}(X) = 7.5$. Therefore, $\text{iqr}/\sigma = 5/\sqrt{25/3} = \sqrt{3} \approx 1.732$.

* In this question there was a typo. As you can see, the theoretical variance of the Uniform(5, 15) distribution is not 225. Therefore, although strictly incorrect, we will grant full credit for the following solution: $\text{iqr}(X)/SD(X) = (q_3(X) - q_1(X))/SD(X) = (12.5 - 7.5)/\sqrt{\operatorname{Var}(X)} = 5/15 = 1/3$.

(b) Consider $Y$. What is the ratio of its interquartile range to its standard deviation?

We need $q_{0.75}(Y)$ and $q_{0.25}(Y)$; that is,

    q1 <- qnorm(p=0.25,mean=10,sd=sqrt(25/3)); q1
    ## [1] 8.052916
    q3 <- qnorm(p=0.75,mean=10,sd=sqrt(25/3)); q3
    ## [1] 11.94708
    # iqr holds the ratio iqr/sigma, not the iqr itself
    iqr<- (q3-q1)/sqrt(25/3); iqr
    ## [1] 1.34898

$q_{0.75}(Y) = 11.94708$ and $q_{0.25}(Y) = 8.052916$. Therefore, $\text{iqr}/\sigma = 1.34898$.

3

Create the following functions in R:

(a) my.iqr(x): If x is a vector, then my.iqr(x) returns the iqr of x.

    my.iqr <- function(x){
      unname(quantile(x = x,probs = 0.75)-quantile(x = x,probs = 0.25))
    }

(b) iqr.sq(x): If x is a vector, then iqr.sq(x) returns the ratio of its interquartile range to its standard deviation.

    iqr.sq <- function(x){
      my.iqr(x)/sqrt(mean(x^2)-(mean(x)^2))
    }

(c) Use the functions my.iqr() and iqr.sq() on the following vectors:

i. A random sample of 5000 numbers from a standard normal distribution (use a random seed so the results can be replicated).

    set.seed(320520)
    x <- rnorm(5000)
    my.iqr(x)
    ## [1] 1.319493
    iqr.sq(x)
    ## [1] 1.325863

ii. The variable births in data frame US_births_1994_2003 from package fivethirtyeight.

    library(fivethirtyeight)
    x <- US_births_1994_2003$births
    my.iqr(x)
    ## [1] 3429.75
    iqr.sq(x)
    ## [1] 1.845626

iii. The vector composed of the first 1000 values of the variable births.

    library(fivethirtyeight)
    x <- US_births_1994_2003$births[1:1000]
    my.iqr(x)
    ## [1] 2949.5
    iqr.sq(x)
    ## [1] 1.84589

4 ISIR 7.7.2.

Let $\vec{x}$ denote the following sample of pulse rates of Peruvian indigenous people [1].

[1] T. A. Ryan, Jr., B. L. Joiner, and B. F. Ryan (1985). The Minitab Student Handbook. Duxbury Press, Boston, pp. 317-318. These data appear as Data Set 345 in A Handbook of Small Data Sets.

(a) Graph the empirical cdf of $\vec{x}$.

    x <- c(88, 76, 84, 64, 60, 64, 60, 64, 68, 74, 68, 68, 72,
           76, 72, 52, 72, 64, 60, 56, 72, 88, 80, 76, 64, 72,
           60, 76, 88, 72, 64, 60, 60, 72, 92, 80, 72, 64, 68)
    plot(ecdf(x), main="ECDF of X")

[Figure: "ECDF of X"; horizontal axis x (50 to 90), vertical axis Fn(x) (0 to 1)]

(b) Compute the plug-in estimates of the population mean and variance.

    #mean
    mean(x)
    ## [1] 70.30769
    #variance
    mean(x^2)-(mean(x)^2)
    ## [1] 87.90533

(c) Compute the plug-in estimates of the population median and interquartile range.

    #median
    median(x)
    ## [1] 72
    #or equivalently
    quantile(x,0.5)
    ## 50%
    ##  72
    #iqr
    quantile(x,0.75)-quantile(x,0.25)
    ## 75%
    ##  12

(d) Compute the ratio of the plug-in estimate of the interquartile range to the square root of the plug-in estimate of the variance.

    #iqr/sd
    num <- quantile(x,0.75)-quantile(x,0.25)
    den <- sqrt(mean(x^2)-(mean(x)^2))
    num/den
    ##      75%
    ## 1.279893

(e) Construct a boxplot.

    boxplot(x)

[Figure: boxplot of x; axis from 60 to 90]

(f) Construct a normal probability plot.

    qqnorm(x); qqline(x)

[Figure: "Normal Q-Q Plot"; horizontal axis Theoretical Quantiles (-2 to 2), vertical axis Sample Quantiles (60 to 90)]

(g) Construct a kernel density estimate.
    plot(density(x),main="Density of X")

[Figure: "Density of X"; horizontal axis from 40 to 100 (N = 39, Bandwidth = 3.874), vertical axis Density (0.00 to 0.04)]

(h) Do you think that this sample was drawn from a normal distribution? Why or why not?

First, we should keep in mind that we have relatively few observations from which to draw firm conclusions about the underlying distribution of the data. The normal probability plot deviates somewhat from the reference line drawn by qqline, but on the other hand we observe some symmetry in the data, since the median and mean are very close. Together, these two pieces of evidence suggest that the data might be modeled by a normal distribution, although we cannot confirm that it exhibits strong normality.

5 ISIR 7.7.4.

The following sample, $\vec{x}$, was observed and sorted:

(a) Graph the empirical cdf of $\vec{x}$.

    x <- scan("https://mtrosset.pages.iu.edu/StatInfeR/Data/sample774.dat")
    plot(ecdf(x), main="ECDF of X")

[Figure: "ECDF of X"; horizontal axis x (0 to 8), vertical axis Fn(x) (0 to 1)]

(b) Calculate the plug-in estimates of the mean, the variance, the median, and the interquartile range.

    #mean
    mean(x)
    ## [1] 1.4876
    #variance
    mean(x^2)- mean(x)^2
    ## [1] 2.787554
    #median
    median(x)
    ## [1] 1.076
    #or equivalently
    quantile(x,0.5)
    ##   50%
    ## 1.076
    #iqr
    iqr <- unname(quantile(x,0.75)-quantile(x,0.25))
    iqr
    ## [1] 1.10775

(c) Take the square root of the plug-in estimate of the variance and compare it to the plug-in estimate of the interquartile range. Do you think that $\vec{x}$ was drawn from a normal distribution? Why or why not?

    #Ratio of the data
    iqr/sqrt(mean(x^2)-mean(x)^2)
    ## [1] 0.6634835
    #Ratio of the normal distribution
    (qnorm(.75)-qnorm(.25))/1
    ## [1] 1.34898

The corresponding ratio for the normal distribution is 1.34898, while the ratio for the data is 0.66348. Based on this criterion, the data do not seem to have come from a normal distribution.

(d) Use the qqnorm function to create a normal probability plot. Do you think that $\vec{x}$ was drawn from a normal distribution? Why or why not?
    qqnorm(x); qqline(x)

[Figure: "Normal Q-Q Plot"; horizontal axis Theoretical Quantiles (-2 to 2), vertical axis Sample Quantiles (0 to 6)]

Based on the Q-Q plot, the sample quantiles deviate from the normal quantiles (the reference line); therefore there is not enough evidence to claim normality.

(e) Now consider the transformed sample $\vec{y}$ produced by replacing each $x_i$ with its natural logarithm. If $\vec{x}$ is stored in the vector x, then $\vec{y}$ can be computed by the following R command:

    > y <- log(x)

Do you think that $\vec{y}$ was drawn from a normal distribution? Why or why not?

    y <- log(x)
    #Ratio of the transformed data
    #(here iqr is still the IQR of x computed in part (b))
    iqr/sqrt(mean(y^2)-mean(y)^2)
    ## [1] 1.291286
    #The ratio for y itself needs the IQR of y:
    iqr.y <- unname(quantile(y,0.75)-quantile(y,0.25))
    iqr.y/sqrt(mean(y^2)-mean(y)^2)
    #qqplot
    qqnorm(y); qqline(y)

[Figure: "Normal Q-Q Plot"; horizontal axis Theoretical Quantiles (-2 to 2), vertical axis Sample Quantiles (-1.5 to 1.5)]

After the log transformation the data approach normality (although we still have relatively few observations).

6 ISIR 7.7.7.

Consider an urn that contains 10 tickets, labelled {1, 1, 1, 1, 2, 5, 5, 10, 10, 10}. From this urn, I propose to draw (with replacement) n = 40 tickets. I am interested in the sum, $Y$, of the 40 ticket values that I draw.

(a) Write an R function named urn.model that simulates this experiment, i.e., evaluating urn.model is like observing a value, $y$, of the random variable $Y$.

First, let's define a variable urn

    urn <- c(1,1,1,1,2,5,5,10,10,10)
    urn
    ## [1] 1 1 1 1 2 5 5 10 10 10

and a function urn.model that receives an "urn" and the number "n" of tickets to draw,

    urn.model <- function(urn,n){
      samp1 <- sample(urn,n,replace=TRUE)
      y <- sum(samp1)
      y
    }

Then, proceed with the sampling,

    urn.model(urn = urn, n = 40)
    ## [1] 159

This is a random sampling process: every time we run this code, it produces a different sample and a different sum.

(b) Use urn.model to generate a sample, $\vec{y} = \{y_1, \ldots, y_{25}\}$, of n = 25 observed sums. The random variable $Y$ is discrete. Does it appear that the distribution of $Y$ can be approximated by a normal distribution? Why or why not?
    #initializing the sample vector with 25 zeros
    n <- 25
    Y <- rep(0,n)
    #filling the sample vector
    for(i in 1:n){
      Y[i] <- urn.model(urn = urn, n = 40)
    }
    #exploring the density of Y
    plot(density(Y))

[Figure: "density.default(x = Y)"; horizontal axis from 100 to 250 (N = 25, Bandwidth = 10.23), vertical axis Density (0.000 to 0.015)]

    qqnorm(Y); qqline(Y)

[Figure: "Normal Q-Q Plot"; horizontal axis Theoretical Quantiles (-2 to 2), vertical axis Sample Quantiles (120 to 220)]

Depending on the samples drawn, the plots change somewhat. While $Y$ may not be far from a normal distribution, a sample of 25 sums is too small to conclude this with any certainty. We can expect, however, that the distribution of $Y$ will approach normality as the number of tickets drawn increases. Also, having more than 25 observed sums could be helpful.

7

Let $X$ be a discrete random variable with probability mass function
$$P(X = x) = \begin{cases} 0.6 & x = 2 \\ 0.1 & x = 4 \\ 0.3 & x = 8 \\ 0 & \text{otherwise.} \end{cases}$$

(a) $EX = 2(0.6) + 4(0.1) + 8(0.3) = 4$ and $EX^2 = 4(0.6) + 16(0.1) + 64(0.3) = 23.2$, so $\operatorname{Var}X = 23.2 - 4^2 = 7.2$. For the mean of $n = 100$ draws (the sample size used in part (b)), $E\bar{X} = 4$ and $\operatorname{Var}\bar{X} = 7.2/100 = 0.072$.

(b)

    xvec = c(rep(2,6), rep(4,1), rep(8,3))
    vec.means = replicate(2000, mean(sample(xvec, 100, replace = TRUE)))
    est.EXbar = mean(vec.means)
    est.VarXbar = mean(vec.means^2) - mean(vec.means)^2
    c(est.EXbar, est.VarXbar)
    ## [1] 4.0071900 0.0723437

Both estimates are very close to the theoretical values 4 and 0.072.

(c)

    hist(vec.means)

[Figure: "Histogram of vec.means"; horizontal axis vec.means (3.0 to 5.0), vertical axis Frequency (0 to 500)]

    plot(density(vec.means))

[Figure: "density.default(x = vec.means)"; horizontal axis from 3.0 to 5.0 (N = 2000, Bandwidth = 0.05287), vertical axis Density (0.0 to 1.5)]

    qqnorm(vec.means)
    qqline(vec.means)

[Figure: "Normal Q-Q Plot"; horizontal axis Theoretical Quantiles (-3 to 3), vertical axis Sample Quantiles (3.5 to 5.0)]

    IQR(vec.means)/sqrt(est.VarXbar)
    ## [1] 1.33845

The sample does seem to have been drawn from a normal distribution: the ratio is close to the normal value 1.34898.

(d)

    #(i)
    1 - pnorm(3.1, est.EXbar, sqrt(est.VarXbar))
    ## [1] 0.999628
    #(ii)
    mean(vec.means > 3.1)
    ## [1] 1

8

Assume that one can of Coke weighs on average 355 grams and one can of Pepsi weighs on average 354 grams, and that both have a standard deviation of 1 gram.
If you select at random 36 cans of Coke and 48 cans of Pepsi, what is the probability that the average weight of the Coke cans is greater than the average weight of the Pepsi cans?

Let $X_1, \ldots, X_{36} \overset{iid}{\sim} N(\mu = 355, \sigma^2 = 1)$ represent the random sample of Coke weights and $Y_1, \ldots, Y_{48} \overset{iid}{\sim} N(\mu = 354, \sigma^2 = 1)$ the random sample of Pepsi weights. So, $\bar{X} \sim N(\mu = 355, \sigma^2 = 1/36)$ and $\bar{Y} \sim N(\mu = 354, \sigma^2 = 1/48)$.

We then have $P(\bar{X} > \bar{Y}) = P(\bar{X} - \bar{Y} > 0) = 1 - P(\bar{X} - \bar{Y} < 0)$. Since the two samples are independent, $\bar{X} - \bar{Y} \sim N(\mu = 355 - 354 = 1, \sigma^2 = 1/36 + 1/48 = 0.0486)$. Then $P(\bar{X} > \bar{Y}) = 1 - \text{pnorm}(0, 1, \sqrt{0.0486})$; in R,

    1-pnorm(0,1,sqrt(0.0486))
    ## [1] 0.9999971

that is, $P(\bar{X} > \bar{Y}) \approx 1$. In other words, it is almost certain that the average weight of the Coke cans would be greater than the average weight of the Pepsi cans.
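The normal-theory answer for Problem 8 can also be checked by simulation. The sketch below is only an illustration: the names sd.diff, p.exact, n.rep, diffs, and p.sim are introduced here, the seed and the number of replications are arbitrary choices, and the individual can weights are assumed to be normally distributed, as in the solution above.

```r
# Exact probability: X-bar - Y-bar ~ N(1, 1/36 + 1/48)
sd.diff <- sqrt(1/36 + 1/48)
p.exact <- 1 - pnorm(0, mean = 1, sd = sd.diff)

# Monte Carlo check (seed and replication count are arbitrary)
set.seed(1)
n.rep <- 10000
diffs <- replicate(n.rep,
                   mean(rnorm(36, mean = 355, sd = 1)) -
                   mean(rnorm(48, mean = 354, sd = 1)))

# Share of replications in which the Coke average exceeds the Pepsi average
p.sim <- mean(diffs > 0)

c(p.exact, p.sim)
```

Because the event that the Pepsi average is larger has probability of roughly 3 in a million, most runs of this size will report a simulated proportion of exactly 1; the simulation confirms the order of magnitude rather than the final digits.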
