Uploaded by Raunav Sharma

F20S350 PS04 Sol

PS4 Solutions
Arturo Valdivia
November 10, 2020
Contents
1
2 ISIR 6.4.6.
3
4 ISIR 7.7.2.
5 ISIR 7.7.4.
6 ISIR 7.7.7.
7
8
1
Let X be a continuous random variable. Find the expected value, the median, the interquartile range (iqr) of
X, and P (0.5 < X < 1.5) when the PDF is:
(a)
$$f(x) = \begin{cases} 0.3 & 0 \le x < 1 \\ 0.7 & 1 \le x < 2 \\ 0 & \text{otherwise} \end{cases}$$
One can use the areas-approach to solve this problem; i.e.,
[Figure: plot of the p.d.f. f(x) against x, a step of height 0.3 on [0, 1) and a step of height 0.7 on [1, 2).]
Based on the figure we can easily state that EX = 0.5 × 0.3 + 1.5 × 0.7 = 1.2, where 0.3 and 0.7 are the
“heights” of the two components of the p.d.f., respectively.
Alternatively, we can compute the expected value as
$$EX = \int_0^1 x(0.3)\,dx + \int_1^2 x(0.7)\,dx = 0.15x^2\,\big|_0^1 + 0.35x^2\,\big|_1^2 = 0.15 + 0.35(3) = 1.2.$$
Since $\int_0^1 0.3\,dt = 0.3t\,\big|_0^1 = 0.3$, the median x has to satisfy $0.3 + \int_1^x 0.7\,dt = 0.5$; that is, $0.7t\,\big|_1^x = 0.2 \implies x = q_2 = 1.285714$.
In the same way, q1 is obtained by solving $0.3t\,\big|_0^x = 0.25$ and q3 by solving $0.3 + \int_1^x 0.7\,dt = 0.75$, implying that q1 = 0.25/0.3 = 0.833333 and q3 = ((0.75 - 0.3)/0.7) + 1 = 1.642857. The iqr is q3 - q1 = 1.642857 - 0.833333 = 0.809524.
Finally, we need P(0.5 < X < 1.5) = P(X < 1.5) - P(X < 0.5). We have that $P(X < 1.5) = 0.3 + \int_1^{1.5} 0.7\,dt = 0.3 + 0.7(0.5) = 0.65$, and P(X < 0.5) = 0.3(0.5) = 0.15. Thus, P(0.5 < X < 1.5) = 0.50.
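The hand computations for part (a) can be cross-checked numerically. The following is a minimal sketch, not part of the original solution; the names dens.a, cdf.a and quant.a are hypothetical helpers introduced only for this check.

```r
# p.d.f. from part (a): 0.3 on [0, 1), 0.7 on [1, 2), 0 elsewhere
dens.a <- function(x) ifelse(x >= 0 & x < 1, 0.3, ifelse(x >= 1 & x < 2, 0.7, 0))

# c.d.f. written out piecewise (hypothetical helper)
cdf.a <- function(x) {
  ifelse(x < 0, 0, ifelse(x < 1, 0.3 * x, ifelse(x < 2, 0.3 + 0.7 * (x - 1), 1)))
}

# quantile function: solve cdf.a(x) = p numerically on [0, 2]
quant.a <- function(p) {
  uniroot(function(x) cdf.a(x) - p, interval = c(0, 2), tol = 1e-10)$root
}

# expected value, integrating each piece separately to avoid the jump at x = 1
EX.a <- integrate(function(x) x * dens.a(x), 0, 1)$value +
  integrate(function(x) x * dens.a(x), 1, 2)$value

med.a  <- quant.a(0.5)
iqr.a  <- quant.a(0.75) - quant.a(0.25)
prob.a <- cdf.a(1.5) - cdf.a(0.5)

c(EX = EX.a, median = med.a, iqr = iqr.a, prob = prob.a)
```

The four values should agree with 1.2, 1.285714, 0.809524 and 0.50 up to numerical tolerance.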
(b)
$$f(x) = \begin{cases} 2(x - 1) & 1 \le x \le 2 \\ 0 & \text{otherwise} \end{cases}$$
One can use the areas-approach to solve this problem; i.e.,
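[Figure: plot of the p.d.f. 2(x - 1) against x on [1, 2], a triangle rising from 0 at x = 1 to 2 at x = 2, with the expected value 1.666 marked.]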
Since the area under the p.d.f. is a right triangle, the expected value (the centroid of the triangle) lies one third of the base away from the tallest side, that is, two thirds of the way from 1 to 2: EX = 1 + (2 - 1) × (2/3) = 1.666667.
Alternatively, we can compute the expected value as
$$EX = \int_1^2 x \cdot 2(x - 1)\,dx = \frac{2x^3}{3}\,\Big|_1^2 - x^2\,\big|_1^2 = \left(\frac{16}{3} - \frac{2}{3}\right) - 3 = 1.666667.$$
The median is the x value such that $\int_1^x f(t)\,dt = 0.5$. Therefore we need to solve $\int_1^x 2(t - 1)\,dt = 0.5$; that is,
$$(t^2 - 2t)\,\big|_1^x = 0.5 \implies x^2 - 2x + 1 = 0.5 \implies (x - 1)^2 = 0.5 \implies x = q_2 = 1.707107.$$
In the same way, q1 is obtained by solving $(x - 1)^2 = 0.25$ and q3 by solving $(x - 1)^2 = 0.75$, implying that q1 = 1.5 and q3 = 1.866025. The iqr is q3 - q1 = 1.866025 - 1.5 = 0.366025.
Finally, we need P(0.5 < X < 1.5) = P(X < 1.5) - P(X < 0.5). Using the fact that q1 = 1.5, we have that P(X < 1.5) = 0.25, and since the density is zero outside 1 ≤ x ≤ 2, P(X < 0.5) = 0. Thus, P(0.5 < X < 1.5) = 0.25.
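As with part (a), the results for the triangular density can be verified numerically. This is only a sketch under the same approach; dens.b, cdf.b and quant.b are hypothetical names, not part of the original solution.

```r
# p.d.f. from part (b): 2(x - 1) on [1, 2], 0 elsewhere
dens.b <- function(x) ifelse(x >= 1 & x <= 2, 2 * (x - 1), 0)

# c.d.f. by numerical integration (intended for scalar x in [1, 2])
cdf.b <- function(x) integrate(dens.b, lower = 1, upper = x)$value

# quantile function: solve cdf.b(x) = p on [1, 2]
quant.b <- function(p) {
  uniroot(function(x) cdf.b(x) - p, interval = c(1, 2), tol = 1e-10)$root
}

EX.b   <- integrate(function(x) x * dens.b(x), 1, 2)$value
med.b  <- quant.b(0.5)
iqr.b  <- quant.b(0.75) - quant.b(0.25)
# P(0.5 < X < 1.5) equals P(X < 1.5) because the density is 0 below 1
prob.b <- cdf.b(1.5)

c(EX = EX.b, median = med.b, iqr = iqr.b, prob = prob.b)
```

The values should match 1.666667, 1.707107, 0.366025 and 0.25 up to numerical tolerance.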
2 ISIR 6.4.6.
A random variable $X \sim \text{Uniform}(5, 15)$ has population mean $\mu = EX = 10$ and population variance $\sigma^2 = \text{Var}\,X = 25/3$. Let Y denote a normal random variable with the same mean and variance.
(a) Consider X. What is the ratio of its interquartile range to its standard deviation, iqr/σ?
We know that iqr(X) = q3(X) - q1(X). Then we need $q_{0.75}(X)$ and $q_{0.25}(X)$; that is, $q_{0.75}(X) = 12.5$ and $q_{0.25}(X) = 7.5$. Therefore, $\text{iqr}/\sigma = 5/\sqrt{25/3} = \sqrt{3} \approx 1.732$.
* In this question there was a typo. As you see, the theoretical variance of the Uniform(5, 15) distribution is not 225. Therefore, although strictly incorrect, we will grant full credit for the following solution:
$\text{iqr}(X)/SD(X) = (q_3(X) - q_1(X))/SD(X) = (12.5 - 7.5)/\sqrt{\text{Var}(X)} = 5/\sqrt{225} = 5/15 = 1/3$.
(b) Consider Y . What is the ratio of its interquartile range to its standard deviation?
We need q0.75 (Y ) and q0.25 (Y ); that is,
q1 <- qnorm(p=0.25,mean=10,sd=sqrt(25/3)); q1
## [1] 8.052916
q3 <- qnorm(p=0.75,mean=10,sd=sqrt(25/3)); q3
## [1] 11.94708
iqr<- (q3-q1)/sqrt(25/3); iqr
## [1] 1.34898
That is, $q_{0.75}(Y) = 11.94708$ and $q_{0.25}(Y) = 8.052916$. Therefore, iqr/σ = 1.34898.
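Both ratios can be obtained directly from the quantile functions qunif and qnorm. The sketch below is an added check, not part of the original solution; it also illustrates that the normal ratio does not depend on the chosen mean and variance, since it equals the ratio for a standard normal.

```r
# Uniform(5, 15): iqr over standard deviation
iqr.unif   <- qunif(0.75, min = 5, max = 15) - qunif(0.25, min = 5, max = 15)
sd.unif    <- sqrt((15 - 5)^2 / 12)          # variance of Uniform(a, b) is (b - a)^2 / 12
ratio.unif <- iqr.unif / sd.unif             # equals sqrt(3)

# Normal with the same mean and variance
sd.norm    <- sqrt(25 / 3)
iqr.norm   <- qnorm(0.75, mean = 10, sd = sd.norm) - qnorm(0.25, mean = 10, sd = sd.norm)
ratio.norm <- iqr.norm / sd.norm

# The same ratio for a standard normal
ratio.std  <- qnorm(0.75) - qnorm(0.25)

c(uniform = ratio.unif, normal = ratio.norm, standard.normal = ratio.std)
```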
3
Create the following functions in R: (a) my.iqr(x): If x is a vector, then my.iqr(x) returns the iqr of x
my.iqr <- function(x){
unname(quantile(x = x,probs = 0.75)-quantile(x = x,probs = 0.25))
}
(b) iqr.sq(x): If x is a vector, then iqr.sq(x) returns the ratio of its interquartile range to its standard
deviation
iqr.sq <- function(x){
my.iqr(x)/sqrt(mean(x^2)-(mean(x)^2))
}
(c) Use the functions my.iqr() and iqr.sq() on the following vectors:
i. A random sample of 5000 numbers from a standard normal distribution (use a random seed so the
results can be replicated).
set.seed(320520)
x <- rnorm(5000)
my.iqr(x)
## [1] 1.319493
iqr.sq(x)
## [1] 1.325863
ii. The variable births in data frame US_births_1994_2003 from package fivethirtyeight
library(fivethirtyeight)
x <- US_births_1994_2003$births
my.iqr(x)
## [1] 3429.75
iqr.sq(x)
## [1] 1.845626
iii. The vector composed of the first 1000 values of the variable births
library(fivethirtyeight)
x <- US_births_1994_2003$births[1:1000]
my.iqr(x)
## [1] 2949.5
iqr.sq(x)
## [1] 1.84589
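It may help to see how these functions relate to R's built-ins. The sketch below is an added illustration (plug.sd is a hypothetical name): my.iqr agrees with IQR(), while the plug-in standard deviation used in iqr.sq differs from sd() by the factor sqrt((n - 1)/n), because sd() divides by n - 1 rather than n.

```r
my.iqr <- function(x) {
  unname(quantile(x = x, probs = 0.75) - quantile(x = x, probs = 0.25))
}

# plug-in standard deviation, as used inside iqr.sq (hypothetical helper name)
plug.sd <- function(x) sqrt(mean(x^2) - mean(x)^2)

set.seed(320520)
x <- rnorm(5000)
n <- length(x)

# my.iqr matches the built-in IQR (both use the default quantile type)
same.iqr <- isTRUE(all.equal(my.iqr(x), IQR(x)))

# plug-in sd equals sd() rescaled by sqrt((n - 1) / n)
same.sd <- isTRUE(all.equal(plug.sd(x), sd(x) * sqrt((n - 1) / n)))

c(same.iqr = same.iqr, same.sd = same.sd)
```

For n = 5000 the two standard deviations differ only in the fourth significant digit, so either choice gives nearly the same ratio here.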
4 ISIR 7.7.2.
Let $\vec{x}$ denote the following sample of pulse rates of Peruvian indigenous people (see footnote 1).
(a) Graph the empirical cdf of $\vec{x}$.
x <- c(88, 76, 84, 64, 60, 64, 60, 64, 68, 74, 68, 68, 72, 76, 72, 52, 72, 64, 60,
56, 72, 88, 80, 76, 64, 72, 60, 76, 88, 72, 64, 60, 60, 72, 92, 80, 72, 64, 68)
plot(ecdf(x), main="ECDF of X")
[Figure: ECDF of X; Fn(x) against x.]
(b) Compute the plug-in estimates of the population mean and variance.
#mean
mean(x)
## [1] 70.30769
#variance
mean(x^2)-(mean(x)^2)
## [1] 87.90533
(c) Compute the plug-in estimates of the population median and interquartile range.
#median
median(x) #or equivalently
## [1] 72
quantile(x,0.5)
## 50%
## 72
#iqr
quantile(x,0.75)-quantile(x,0.25)
## 75%
## 12
1 T. A. Ryan, Jr., B. L. Joiner, and B. F. Ryan (1985). The Minitab Student Handbook. Duxbury Press, Boston, pp. 317-318. These data appear as Data Set 345 in A Handbook of Small Data Sets.
(d) Compute the ratio of the plug-in estimate of the interquartile range to the square root of the plug-in
estimate of the variance.
#iqr/sd
num <- quantile(x,0.75)-quantile(x,0.25)
den <- sqrt(mean(x^2)-(mean(x)^2))
num/den
##      75%
## 1.279893
(e) Construct a boxplot.
boxplot(x)
[Figure: boxplot of x.]
(f) Construct a normal probability plot.
qqnorm(x); qqline(x)
[Figure: Normal Q-Q Plot; Sample Quantiles against Theoretical Quantiles.]
(g) Construct a kernel density estimate.
plot(density(x),main="Density of X")
[Figure: Density of X; kernel density estimate, N = 39, Bandwidth = 3.874.]
(h) Do you think that this sample was drawn from a normal distribution? Why or why not?
First, we have relatively few observations (n = 39), too few to draw firm conclusions about the underlying distribution of the data. The points in the normal probability plot deviate somewhat from the reference line, but on the other hand we observe some symmetry in the data, since the median (72) and the mean (70.31) are close. Taken together, these two pieces of evidence suggest that the data might be modeled by a normal distribution, although we cannot confirm that the sample exhibits strong normality.
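One further informal check, used elsewhere in these solutions, is to compare the sample ratio iqr/sd with the value for a normal distribution. The sketch below is an added illustration using the data from part (a); it is a heuristic comparison, not a formal test.

```r
x <- c(88, 76, 84, 64, 60, 64, 60, 64, 68, 74, 68, 68, 72, 76, 72, 52, 72, 64, 60,
       56, 72, 88, 80, 76, 64, 72, 60, 76, 88, 72, 64, 60, 60, 72, 92, 80, 72, 64, 68)

# plug-in ratio for the sample, as in part (d)
ratio.x <- unname(quantile(x, 0.75) - quantile(x, 0.25)) / sqrt(mean(x^2) - mean(x)^2)

# the same ratio for any normal distribution
ratio.norm <- qnorm(0.75) - qnorm(0.25)

c(sample = ratio.x, normal = ratio.norm)
```

The sample ratio (about 1.28) is reasonably close to the normal value (about 1.35), which is consistent with the cautious conclusion above.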
5 ISIR 7.7.4.
The following sample, $\vec{x}$, was observed and sorted:
(a) Graph the empirical cdf of $\vec{x}$.
x <- scan("https://mtrosset.pages.iu.edu/StatInfeR/Data/sample774.dat")
plot(ecdf(x), main="ECDF of X")
[Figure: ECDF of X; Fn(x) against x.]
(b) Calculate the plug-in estimates of the mean, the variance, the median, and the interquartile range.
#mean
mean(x)
## [1] 1.4876
#variance
mean(x^2)- mean(x)^2
## [1] 2.787554
#median
median(x) #or equivalently
## [1] 1.076
quantile(x,0.5)
##   50%
## 1.076
#iqr
iqr <- unname(quantile(x,0.75)-quantile(x,0.25))
iqr
## [1] 1.10775
(c) Take the square root of the plug-in estimate of the variance and compare it to the plug-in estimate of the interquartile range. Do you think that $\vec{x}$ was drawn from a normal distribution? Why or why not?
#Ratio of the data
iqr/sqrt(mean(x^2)-mean(x)^2)
## [1] 0.6634835
#Ratio of the normal distribution
(qnorm(.75)-qnorm(.25))/1
## [1] 1.34898
The corresponding ratio for the normal distribution is 1.34898, while the ratio for the data is 0.66348. Based on this criterion, the data do not seem to have come from a normal distribution.
(d) Use the qqnorm function to create a normal probability plot. Do you think that $\vec{x}$ was drawn from a normal distribution? Why or why not?
qqnorm(x); qqline(x)
[Figure: Normal Q-Q Plot; Sample Quantiles against Theoretical Quantiles.]
Based on the normal probability plot, the sample quantiles deviate clearly from the reference line of normal quantiles; therefore, there is not enough evidence to claim normality.
(e) Now consider the transformed sample $\vec{y}$ produced by replacing each $x_i$ with its natural logarithm. If $\vec{x}$ is stored in the vector x, then $\vec{y}$ can be computed by the following R command:
> y <- log(x)
Do you think that $\vec{y}$ was drawn from a normal distribution? Why or why not?
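y <- log(x)
#Ratio of the transformed data
#(the iqr must be recomputed for y; the object iqr from part (b) is the iqr of x)
iqr.y <- unname(quantile(y,0.75)-quantile(y,0.25))
iqr.y/sqrt(mean(y^2)-mean(y)^2) #compare with the normal ratio 1.34898
#qqplot
qqnorm(y); qqline(y)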
[Figure: Normal Q-Q Plot; Sample Quantiles against Theoretical Quantiles.]
After the log transformation the data are much closer to normal (although we still have relatively few observations).
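The effect of the log transformation on the iqr/sd ratio can be illustrated with simulated data. The sketch below does not use the actual sample from this exercise; it assumes a lognormal sample (x.sim, a hypothetical stand-in), for which the logarithm is exactly normal, and iqr.sd.ratio is a hypothetical helper.

```r
# plug-in ratio of interquartile range to standard deviation (hypothetical helper)
iqr.sd.ratio <- function(v) {
  unname(quantile(v, 0.75) - quantile(v, 0.25)) / sqrt(mean(v^2) - mean(v)^2)
}

set.seed(774)
# assumption: a right-skewed sample whose logarithm is normal
x.sim <- rlnorm(10000, meanlog = 0, sdlog = 1)

ratio.raw  <- iqr.sd.ratio(x.sim)        # well below the normal value
ratio.log  <- iqr.sd.ratio(log(x.sim))   # iqr and sd both computed on the logs
ratio.norm <- qnorm(0.75) - qnorm(0.25)  # about 1.349

c(raw = ratio.raw, log = ratio.log, normal = ratio.norm)
```

Note that the ratio for the transformed data uses the interquartile range of the logs, not that of the original values.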
6 ISIR 7.7.7.
Consider an urn that contains 10 tickets, labelled
{1, 1, 1, 1, 2, 5, 5, 10, 10, 10}
From this urn, I propose to draw (with replacement) n = 40 tickets. I am interested in the sum, Y , of the 40
ticket values that I draw.
(a) Write an R function named urn.model that simulates this experiment, i.e., evaluating urn.model is
like observing a value, y, of the random variable Y .
First, let’s define a variable urn
urn <- c(1,1,1,1,2,5,5,10,10,10)
urn
##  [1]  1  1  1  1  2  5  5 10 10 10
and a function urn.model, that receives an “urn” and a number “n” of needed samples,
urn.model <- function(urn,n){
samp1 <- sample(urn,n,replace=TRUE)
y <- sum(samp1)
y
}
Then, proceed with the sampling,
urn.model(urn = urn, n = 40)
## [1] 159
This is a random sampling process: every time we run this code, it produces a different sample and a different sum.
(b) Use urn.model to generate a sample, $\vec{y} = \{y_1, \ldots, y_{25}\}$, of n = 25 observed sums. The random variable Y is discrete. Does it appear that the distribution of Y can be approximated by a normal distribution? Why or why not?
#initializing the sample vector with 25 zeros
n <- 25
Y <- rep(0,n)
#filling the sample vector
for(i in 1:n){
Y[i] <- urn.model(urn = urn, n = 40)
}
#exploring the density of Y
plot(density(Y))
[Figure: density.default(x = Y); kernel density estimate, N = 25, Bandwidth = 10.23.]
qqnorm(Y); qqline(Y)
[Figure: Normal Q-Q Plot; Sample Quantiles against Theoretical Quantiles.]
Depending on the samples drawn, the plots change somewhat. While Y may not be far from a normal distribution, a sample of 25 observed sums is too small to conclude this with any certainty. We can expect, however, that the distribution of Y will approach normality as the number of tickets drawn per sum increases. Also, having more than 25 observed sums would be helpful.
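To see what a normal approximation for Y would look like, we can compute the mean and standard deviation of Y from the urn and compare them with a larger simulation. This is an added sketch, not part of the original solution; the 5000 replications and the seed are arbitrary choices.

```r
urn <- c(1, 1, 1, 1, 2, 5, 5, 10, 10, 10)
n.draws <- 40

# mean and variance of a single ticket drawn at random from the urn
mu.ticket  <- mean(urn)                      # 4.6
var.ticket <- mean(urn^2) - mean(urn)^2      # 14.64

# Y is a sum of 40 independent draws
mu.Y <- n.draws * mu.ticket                  # 184
sd.Y <- sqrt(n.draws * var.ticket)           # about 24.2

# a larger simulation than the 25 sums used above (arbitrary seed and size)
set.seed(1)
sums <- replicate(5000, sum(sample(urn, n.draws, replace = TRUE)))

c(theory.mean = mu.Y, sim.mean = mean(sums), theory.sd = sd.Y, sim.sd = sd(sums))
```

With many more observed sums, qqnorm(sums) gives a much clearer picture of how well a normal distribution with these parameters fits.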
7
Let X be a discrete random variable with probability mass function
$$P(X = x) = \begin{cases} 0.6 & x = 2 \\ 0.1 & x = 4 \\ 0.3 & x = 8 \\ 0 & \text{otherwise.} \end{cases}$$
(a) $EX = 4$, $\text{Var}\,X = 7.2$, $E\bar{X} = 4$, $\text{Var}\,\bar{X} = 0.072$
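The values in part (a) follow from the definitions of expectation and variance for a discrete random variable. The sketch below is an added check; it assumes a sample size of n = 100, which is the size used in the simulation code of part (b), and the variable names are hypothetical.

```r
x.vals <- c(2, 4, 8)
p.vals <- c(0.6, 0.1, 0.3)
n <- 100   # assumed sample size, taken from the simulation in part (b)

EX      <- sum(x.vals * p.vals)             # E X = sum of x * P(X = x)
VarX    <- sum(x.vals^2 * p.vals) - EX^2    # Var X = E X^2 - (E X)^2
EXbar   <- EX                               # the sample mean is unbiased
VarXbar <- VarX / n                         # variance of the mean of n iid draws

c(EX = EX, VarX = VarX, EXbar = EXbar, VarXbar = VarXbar)
```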
(b)
xvec = c(rep(2,6), rep(4,1), rep(8,3))
vec.means = replicate(2000, mean(sample(xvec, 100, replace = TRUE)))
est.EXbar = mean(vec.means)
est.VarXbar = mean(vec.means^2) - mean(vec.means)^2
c(est.EXbar, est.VarXbar)
## [1] 4.0071900 0.0723437
Very close values indeed.
(c)
hist(vec.means)
[Figure: Histogram of vec.means; Frequency against vec.means.]
plot(density(vec.means))
[Figure: density.default(x = vec.means); kernel density estimate, N = 2000, Bandwidth = 0.05287.]
qqnorm(vec.means)
qqline(vec.means)
[Figure: Normal Q-Q Plot; Sample Quantiles against Theoretical Quantiles.]
IQR(vec.means)/sqrt(est.VarXbar)
## [1] 1.33845
The sample does appear to have been drawn from a normal distribution: the ratio 1.33845 is close to the normal value 1.34898.
(d)
#(i)
1 - pnorm(3.1, est.EXbar, sqrt(est.VarXbar))
## [1] 0.999628
#(ii)
mean(vec.means > 3.1)
## [1] 1
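The two answers in part (d) are consistent with each other. The sketch below is an added illustration using the theoretical values from part (a) rather than the simulated estimates: under the normal approximation, fewer than one of the 2000 simulated means is expected to fall below 3.1, so an observed proportion of exactly 1 is not surprising.

```r
mu.xbar <- 4
sd.xbar <- sqrt(7.2 / 100)     # standard deviation of the sample mean for n = 100

# standardized distance of 3.1 from the mean
z <- (3.1 - mu.xbar) / sd.xbar

# normal approximation to P(Xbar > 3.1)
p.normal <- pnorm(3.1, mean = mu.xbar, sd = sd.xbar, lower.tail = FALSE)

# expected number of the 2000 simulated means that fall below 3.1
expected.below <- 2000 * (1 - p.normal)

c(z = z, p.normal = p.normal, expected.below = expected.below)
```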
8
Assume that a can of Coke weighs 355 grams on average and a can of Pepsi weighs 354 grams on average, and that both weights have a standard deviation of 1 gram. If you select at random 36 cans of Coke and 48 cans of Pepsi, what is the probability that the average weight of the Coke cans is greater than the average weight of the Pepsi cans?
Let $X_1, \ldots, X_{36} \overset{iid}{\sim} N(\mu = 355, \sigma^2 = 1)$ represent the random sample of Coke weights and $Y_1, \ldots, Y_{48} \overset{iid}{\sim} N(\mu = 354, \sigma^2 = 1)$ the random sample of Pepsi weights. So, $\bar{X} \sim N(\mu = 355, \sigma^2 = 1/36)$ and $\bar{Y} \sim N(\mu = 354, \sigma^2 = 1/48)$.
We then have $P(\bar{X} > \bar{Y}) = P(\bar{X} - \bar{Y} > 0) = 1 - P(\bar{X} - \bar{Y} < 0)$. Since the two samples are independent, $\bar{X} - \bar{Y} \sim N(\mu = 355 - 354 = 1, \sigma^2 = 1/36 + 1/48 = 0.0486)$. Then, $P(\bar{X} > \bar{Y}) = 1 - \text{pnorm}(0, 1, \sqrt{0.0486})$; in R,
1-pnorm(0,1,sqrt(0.0486))
## [1] 0.9999971
that is, P (X̄ > Ȳ ) ≈ 1. In other words, it is almost certain that the average weight of coke cans would be
greater than the average weight of pepsi cans.
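The result can also be checked by simulation. The sketch below is an added illustration under the same assumptions as the solution (normally distributed weights and independent samples); the number of replications and the seed are arbitrary.

```r
# exact probability under the normal model, without rounding the variance
mu.diff <- 355 - 354
sd.diff <- sqrt(1 / 36 + 1 / 48)
p.exact <- pnorm(0, mean = mu.diff, sd = sd.diff, lower.tail = FALSE)

# Monte Carlo check: simulate both sample means many times
set.seed(2020)
n.rep <- 20000
xbar <- replicate(n.rep, mean(rnorm(36, mean = 355, sd = 1)))   # Coke sample means
ybar <- replicate(n.rep, mean(rnorm(48, mean = 354, sd = 1)))   # Pepsi sample means
p.sim <- mean(xbar > ybar)

c(exact = p.exact, simulated = p.sim)
```

Because the event that the Pepsi average exceeds the Coke average has probability of roughly 3 in a million, a simulation of this size will usually report a proportion of 1 or very close to it.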