
Estimation

Estimators
Let’s consider the dataset gradebook again. The following code has been run to load the data and the libraries we’ll need:
library(tidyverse)
library(patchwork)
gradebook <- read.csv("https://raw.githubusercontent.com/sta238/data/main/gradebook.csv") %>%
  mutate(final = 0.1*assignment1 + 0.1*assignment2 + 0.3*midterm + 0.5*exam,
         A = ifelse(final >= 0.8, 1, 0))
Notice that two new columns are added: final is the final mark for the course, calculated as a weighted average of the course components, and A is an indicator of whether or not the student received a grade of A or higher in the course.
I want to estimate the proportion of the students who got a grade of A or higher, represented by θ . We will use three different approaches: the
maximum likelihood principle, Bayes rule, and the bootstrap principle.
Maximum Likelihood Estimation
What is the distribution for the event that a student gets a grade of A or higher?
✗ Binomial
✗ Beta
✓ Bernoulli
✗ Normal
Correct! Each student's indicator A is a single 0/1 outcome, so it follows a Bernoulli(θ) distribution.
Write a function for the likelihood and plot it. In class, we derived the maximum likelihood estimator for θ, θ̂ MLE. Calculate a maximum likelihood estimate using the data in gradebook and include it in your plot as a red, dashed vertical line.
likelihood <- function(x, theta) {
  # Bernoulli likelihood: product of theta^x * (1 - theta)^(1 - x) over the observations
  prod(theta^x * (1 - theta)^(1 - x))
}

thetahat <- mean(gradebook$A)  # the MLE for a Bernoulli proportion is the sample mean

tibble(para_values = seq(0.01, 0.99, by = 0.01)) %>%
  mutate(lik = purrr::map_dbl(para_values, ~ likelihood(gradebook$A, .x))) %>%
  ggplot(aes(x = para_values, y = lik)) +
  geom_line(color = "blue") +
  geom_vline(xintercept = thetahat, linetype = "dashed", color = "red") +
  labs(x = "theta", y = "Likelihood",
       caption = stringr::str_c("The red dashed line shows that the MLE is ", round(thetahat, 3))) # provides an informative caption
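As a quick sanity check (not part of the original exercise; the toy vector x below is made up and stands in for gradebook$A), we can maximize the Bernoulli log-likelihood numerically and confirm it agrees with the closed-form MLE θ̂ = x̄:

```r
# Numerically maximize the Bernoulli log-likelihood and compare to the sample mean
loglik <- function(theta, x) sum(x * log(theta) + (1 - x) * log(1 - theta))

x <- c(1, 0, 0, 1, 1, 0, 1, 1)  # toy 0/1 data standing in for gradebook$A
opt <- optimize(loglik, interval = c(0.001, 0.999), x = x, maximum = TRUE)
opt$maximum  # numerically close to mean(x) = 0.625
```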
Bayesian Inference
Let’s put a beta prior on θ. In class, we derived the posterior distribution for a beta prior and a likelihood from the same family of distributions as we’re using here. Write a function to plot the prior in blue and the posterior in purple. Plot the function for 3-4 sets of values for the hyperparameters that could represent your belief, indicating the values chosen on each plot (try using
subtitle = str_c("With hyperparameters alpha: ", a, " and beta: ", b) ).
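Before writing the plotting function, it may help to recall the conjugate update (a sketch with made-up toy data, not gradebook): a Beta(a, b) prior combined with n Bernoulli observations gives a Beta(a + Σx, b + n − Σx) posterior.

```r
# Beta-Bernoulli conjugate update with toy data
a <- 2; b <- 2                        # prior hyperparameters
x <- c(1, 0, 1, 1, 0)                 # toy 0/1 observations
alpha_post <- a + sum(x)              # 2 + 3 = 5
beta_post  <- b + length(x) - sum(x)  # 2 + 2 = 4
c(alpha_post, beta_post)              # posterior is Beta(5, 4)
```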
# Beta prior density and its conjugate posterior under the Bernoulli likelihood
prior <- function(theta, a, b) dbeta(theta, a, b)
posterior <- function(theta, a, b) {
  n <- length(gradebook$A)
  sumx <- sum(gradebook$A)
  dbeta(theta, a + sumx, b + n - sumx)
}

hyperparameters <- data.frame(a = c(0.5, 1, 2, 3),
                              b = c(0.5, 1, 2, 3))

plots <- lapply(1:nrow(hyperparameters), function(i) {
  a <- hyperparameters$a[i]
  b <- hyperparameters$b[i]
  ggplot(data.frame(theta = c(0, 1)), aes(x = theta)) +
    stat_function(fun = prior, args = list(a = a, b = b), color = "blue") +
    stat_function(fun = posterior, args = list(a = a, b = b), color = "purple") +
    labs(title = "Prior and Posterior Distributions",
         subtitle = str_c("With hyperparameters alpha: ", a, " and beta: ", b)) +
    theme_minimal()
})
plots
(Four prior/posterior plots are produced, one for each pair of hyperparameter values.)
To get a Bayesian estimate for θ, let’s compute the posterior mean, that is, the expectation of the posterior distribution. (We will discuss some other options for Bayesian estimators when we revisit the topic in a later lecture.)
Given a Beta-distributed random variable Y ∼ Beta(α, β), the expectation is
𝔼[Y] = α / (α + β)
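As an optional check (the values of α and β below are toy choices, not tied to the data), the formula can be verified by simulation:

```r
# Verify E[Y] = alpha / (alpha + beta) for Y ~ Beta(alpha, beta) by simulation
set.seed(42)
alpha <- 3
beta <- 7
mean(rbeta(1e5, alpha, beta))  # close to alpha / (alpha + beta) = 0.3
```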
Choose your favourite prior and compute θ̂ Bayes = 𝔼[θ|x]. Plot the prior (blue) and posterior (purple), and indicate the Bayes estimate with a dashed red line. Use code similar to that suggested for the MLE plot to add an informative caption.
# hyperparameters of the prior
a <- 2
b <- 2

# numerical data summaries
n <- length(gradebook$A)
sumx <- sum(gradebook$A)

# posterior parameters and posterior mean
alpha <- a + sumx
beta <- b + n - sumx
theta_bayes <- alpha / (alpha + beta)

## plot ##
ggplot(data.frame(theta = c(0, 1)), aes(x = theta)) +
  stat_function(fun = dbeta, args = list(shape1 = a, shape2 = b), color = "blue") +
  stat_function(fun = dbeta, args = list(shape1 = alpha, shape2 = beta), color = "purple") +
  geom_vline(xintercept = theta_bayes, linetype = "dashed", color = "red") +
  labs(x = "theta", y = "Density",
       caption = stringr::str_c("The red dashed line shows that the Bayes estimate is ", round(theta_bayes, 3)))
Compare MLE & Bayes
The estimates I got for θ from MLE and Bayes are very similar. To consider how they may differ, take a sample of n students from the class and find the MLE and Bayes estimates for that sample. Do this for all values of n in the code below. Plot the resulting estimates, with different colours for the different methods. Some of the code is sketched out for you.
# hyperparameters of the prior
a <- 2
b <- 2
# list of sample sizes
n <- c(20, 30, 50, 150, 200)

mle_estimate <- function(x) {
  mean(x)
}
bayes_estimate <- function(x) {
  alpha <- a + sum(x)
  beta <- b + length(x) - sum(x)
  alpha / (alpha + beta)
}

set.seed(238)  # for reproducible samples
estimates <- purrr::map_dfr(n, function(size) {
  x <- sample(gradebook$A, size)
  tibble(n = size, MLE = mle_estimate(x), Bayes = bayes_estimate(x))
}) %>%
  pivot_longer(c(MLE, Bayes), names_to = "Method", values_to = "Estimate")

ggplot(estimates, aes(x = n, y = Estimate, color = Method)) +
  geom_line() +
  geom_point() +
  labs(title = "Estimates of theta (MLE vs. Bayes)", x = "Sample Size (n)")
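One way to see why the two estimators agree more closely as n grows (a sketch using simulated Bernoulli data rather than gradebook): with a Beta(a, b) prior, the gap is θ̂ Bayes − θ̂ MLE = (a − (a + b)·x̄) / (a + b + n), which shrinks at rate 1/n.

```r
# The gap between the Bayes estimate and the MLE shrinks as n grows
a <- 2; b <- 2
set.seed(1)
for (n in c(20, 200, 2000)) {
  x <- rbinom(n, 1, 0.3)  # simulated 0/1 data with true theta = 0.3
  mle <- mean(x)
  bayes <- (a + sum(x)) / (a + b + n)
  cat("n =", n, " |Bayes - MLE| =", round(abs(bayes - mle), 4), "\n")
}
```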
(Plot: "Estimates of θ (MLE vs. Bayes)" — Estimate against Sample Size (n) for n = 20, 30, 50, 150, 200, with the Bayes and MLE estimates shown in different colours.)