
Z test statistic p-value

Ricardo Zaragoza
October 24, 2018
This document was generated using RStudio-Markdown
How can we use the z test statistic to calculate the p-value through the 68%-95%-99.7% rule?
For example, if the z test statistic in the example below were -1, how could we calculate the p-value using the
68%-95%-99.7% rule?
Answer
We can apply the 68%-95%-99.7% rule to normal distributions. This rule says that for a normal distribution
with mean µ and standard deviation σ:
• About 68% of the observations fall within one standard deviation σ of the mean µ.
• About 95% of the observations fall within 2σ of the mean µ.
• About 99.7% of the observations fall within 3σ of the mean µ.
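The percentages in the rule are rounded. As a quick check, this short R sketch computes the exact share of a normal distribution within k standard deviations of the mean with base R's `pnorm()`:

```r
# Share of a normal distribution within k standard deviations of the mean:
# P(-k <= Z <= k) = pnorm(k) - pnorm(-k)
k <- 1:3
coverage <- pnorm(k) - pnorm(-k)
round(coverage, 4)
# [1] 0.6827 0.9545 0.9973
```

So the first bullet is about 68.27%, which rounds to 68%.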
To calculate the z-score, that is, to measure exactly how many standard deviations our observations x lie from the
mean, we standardize these observations. To do so, we need to estimate the mean and the standard
deviation of the population we are analyzing.
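A minimal sketch of standardization in R. It assumes simulated heights drawn with `rnorm()` (hypothetical data for illustration, not the Height_data.csv file used below) and uses the divide-by-N population standard deviation, matching the calculation later in this document:

```r
# Simulated heights for illustration only (hypothetical data, not Height_data.csv)
set.seed(1)
x <- rnorm(1000, mean = 68, sd = 1.9)

# Population mean and divide-by-N population standard deviation
mu_x <- mean(x)
sigma_x <- sqrt(mean((x - mu_x)^2))

# Standardize: number of standard deviations each observation lies from the mean
z_x <- (x - mu_x) / sigma_x

# The standardized values have mean 0 and population SD 1 (up to rounding)
round(c(mean = mean(z_x), sd = sqrt(mean(z_x^2))), 6)
```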
First, we can use some data to see the distribution of height.
This demonstration is based on: Sean Dolinar exercise https://stats.seandolinar.com/calculating-z-scores-with-r/
#Loading your data set
data <- read.csv('Height_data.csv')
# We use the command str to see how the data is structured
str(data)
## 'data.frame':  25000 obs. of  3 variables:
##  $ ï..Index      : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Height.Inches.: num  65.8 71.5 69.4 68.2 67.8 ...
##  $ Weight.Pounds.: num  113 136 153 142 144 ...
# Generate an object with Height
height <- data$Height.Inches.  # full column name, as shown by str() above
# How many observations do we have?
N <- length(height)
N
## [1] 25000
# Let's look at the histogram to see what kind of distribution the data have
hist(height) #histogram
[Figure: Histogram of height (x-axis: height, about 60 to 75 inches; y-axis: Frequency)]
The histogram is roughly bell-shaped and symmetric, so a normal distribution is a reasonable model for these heights.
Here we estimate the parameters of the population: the standard deviation and the mean.
# Population parameter calculations
# sd() divides by N - 1; multiplying by sqrt((N - 1)/N) gives the divide-by-N population SD
pop_sd <- sd(height)*sqrt((length(height)-1)/(length(height)))
pop_mean <- mean(height)
# Population mean
pop_mean
## [1] 67.99311
#Population standard deviation
pop_sd
## [1] 1.901641
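To see what the `sqrt((N - 1)/N)` factor does, here is a small sketch on a made-up vector whose population standard deviation is exactly 2. `sd()` divides by N - 1, so rescaling it recovers the divide-by-N value:

```r
# Made-up vector: mean 5, sum of squared deviations 32
v <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(v)

sample_sd     <- sd(v)                        # divides by n - 1: sqrt(32 / 7)
pop_sd_v      <- sd(v) * sqrt((n - 1) / n)    # same rescaling as pop_sd above
pop_sd_direct <- sqrt(mean((v - mean(v))^2))  # divide-by-n definition

round(c(sample_sd, pop_sd_v, pop_sd_direct), 4)
# [1] 2.1381 2.0000 2.0000
```

With N = 25000 the factor sqrt(24999/25000) is about 0.99998, so for the height data the population and sample standard deviations differ by only about 0.002%.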
Using the population mean [µ = 67.99] and standard deviation [σ = 1.90], we can calculate the z-score for
any given value of x. In this example I’ll use 71.78 for x.
$$ z = \frac{x_i - \mu}{\sigma} $$
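The formula translates directly into a small R function. `z_score()` is a hypothetical helper name used only for illustration. Plugging in the rounded values quoted above gives about 1.995, slightly different from the 1.991 computed below with the unrounded mean and standard deviation:

```r
# Hypothetical helper: distance of x from mu in units of sigma
z_score <- function(x, mu, sigma) {
  (x - mu) / sigma
}

# Rounded parameters from the text: mu = 67.99, sigma = 1.90
z_score(71.78, mu = 67.99, sigma = 1.90)  # about 1.995 (above the mean)
z_score(64.19, mu = 67.99, sigma = 1.90)  # about -2 (below the mean)
```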
#z-statistic calculation
# Generating the z-score
z <- (71.78 - pop_mean)/pop_sd
# The z-score for x = 71.78 is:
z
## [1] 1.991378
This gives us a score of 1.99, which means that someone who is 71.78 inches tall is 1.99 standard deviations to
the right of (above) the mean of this population. According to the 68%-95%-99.7% rule, 95% of the population
would lie within µ ± 2σ = 67.99 ± 2(1.90), that is, between 64.19 and 71.79 inches tall; the remaining 5% would
be less than or equal to 64.19 or greater than or equal to 71.79 inches tall.
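The interval arithmetic can be checked in R with the rounded parameters. This sketch also uses `pnorm()` to show that the share outside ±2σ is about 4.55%, close to the rule's 5%:

```r
mu    <- 67.99  # rounded population mean from the text
sigma <- 1.90   # rounded population standard deviation from the text

# mu - 2*sigma and mu + 2*sigma
bounds <- mu + c(-2, 2) * sigma
bounds
# [1] 64.19 71.79

# Share of a normal distribution outside +-2 standard deviations
1 - (pnorm(2) - pnorm(-2))
# [1] 0.04550026
```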
Here, we want to test the hypothesis:
H0 : µ = 71.79
Ha : µ ≠ 71.79
Now, we can use the z-score to estimate the probability (p-value) of finding someone who is less than or
equal to 64.19 or greater than or equal to 71.79 inches tall in this specific population.
# Calculating the p-value. This is the probability of finding someone who is
# less than or equal to 64.19 or greater than or equal to 71.79 inches tall
# in this specific population
# P-value
2*pnorm(-abs(z))
## [1] 0.0464393
Thus, we are estimating the probability of the observations falling to the left of the blue line or to the right of
the red line:
#Histogram of the heights distribution
hist(height) #histogram
abline(v=64.19, col = "blue", lty = 2, lwd = 2)
abline(v=71.79, col = "red", lty = 2, lwd = 2)
legend(72, 4000, c("64.19", "71.79"),
       col = c("blue", "red"), lty = c(2, 2), lwd = c(2, 2))
[Figure: Histogram of height with dashed vertical lines at 64.19 (blue) and 71.79 (red); x-axis: height, y-axis: Frequency]
Remember, we are assuming that the height of this population follows a normal distribution, with a mean
[µ = 67.99] and standard deviation [σ = 1.90].
Conclusion:
Around 4.64% of the time we would find someone who is less than or equal to 64.19 or greater than or equal
to 71.79 inches tall. Since 0.046 < 0.05, we can reject the null hypothesis that the mean µ = 71.79 at the 5% significance level.
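The decision in this conclusion compares the p-value with a significance level α = 0.05. A minimal sketch of that step follows; `two_sided_test()` is a hypothetical helper name, and α = 0.05 is the conventional level assumed throughout this document:

```r
# Hypothetical helper: two-tailed p-value and decision at significance level alpha
two_sided_test <- function(z, alpha = 0.05) {
  p_value <- 2 * pnorm(-abs(z))   # area in both tails beyond |z|
  list(p_value = p_value, reject_h0 = p_value < alpha)
}

two_sided_test(1.991378)  # p_value about 0.0464, reject_h0 TRUE
two_sided_test(1)         # p_value about 0.3173, reject_h0 FALSE
```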
What if we want to know the p-value for someone who is 71.79 inches tall or taller? We can also use the
z-statistic to find it. In this case, we compute a right-tail (one-tailed) p-value.
# Calculating the p-value. This is the probability of finding someone who is 71.79
# inches tall or taller in this specific population
# P-value
pnorm(-abs(z))
## [1] 0.02321965
Conclusion:
Around 2.32% of the time, we would find someone who is 71.79 inches tall or taller. Since 0.023 < 0.05, we can
reject the null hypothesis that the mean µ = 71.79 at the 5% significance level.
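Here `pnorm(-abs(z))` returns the right-tail area only because z is positive; for a negative z it returns the left tail instead. A sketch of an explicit right-tail version uses the `lower.tail` argument of base R's `pnorm()`; `right_tail()` is a hypothetical helper name:

```r
# Hypothetical helper: right-tail p-value P(Z >= z), valid for any sign of z
right_tail <- function(z) pnorm(z, lower.tail = FALSE)

right_tail(2)     # about 0.0228, equal to pnorm(-abs(2))
right_tail(-2)    # about 0.9772: most of the curve lies above z = -2
pnorm(-abs(-2))   # about 0.0228: this is the left tail P(Z <= -2)
```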
Apply the 68%-95%-99.7% rule:
The purpose of these exercises was to review what a z-statistic is, what the 68%-95%-99.7% rule means,
and how we estimate the p-value for hypothesis testing. Now, we can simply substitute the values of the
68%-95%-99.7% rule (±1, ±2, ±3 standard deviations from the mean) into the two-tailed p-value
calculation; you can do the one-tailed case yourself in the same way.
# P-value of two tails for the 68% or +-1 standard deviation
2*pnorm(-abs(1))
## [1] 0.3173105
# P-value of two tails for the 95% or +-2 standard deviations
2*pnorm(-abs(2))
## [1] 0.04550026
# P-value of two tails for the 99.7% or +-3 standard deviations
2*pnorm(-abs(3))
## [1] 0.002699796
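For the one-tailed case mentioned above, each p-value is half of the corresponding two-tailed value, because the normal curve is symmetric. A short sketch using base R's `pnorm()`:

```r
# One-tailed p-values for k = 1, 2, 3 standard deviations
k <- 1:3
one_tail <- pnorm(-k)

# Same values derived from the rule's coverage: half of the area outside +-k
from_coverage <- (1 - (pnorm(k) - pnorm(-k))) / 2

round(one_tail, 4)
# [1] 0.1587 0.0228 0.0013
```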
What if the z test statistic is 1?
For consistency we assume that x = 71.79 has a z score of 1:
# P-value of two tails for the 68% or +-1 standard deviation
2*pnorm(-abs(1))
## [1] 0.3173105
# P-value of one tail for the 68% or +1 standard deviation
pnorm(-abs(1))
## [1] 0.1586553
We can see that if the z test statistic is 1, then for two tails 31.73% of the time we would find someone who is
less than or equal to 64.19 or greater than or equal to 71.79 inches tall, so we fail to reject the null
hypothesis that the mean µ = 71.79. For one tail, around 15.86% of the time we would find someone who
is 71.79 inches tall or taller, so we again fail to reject the null hypothesis that the mean µ = 71.79.
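The original question asked about z = -1. Because the normal curve is symmetric, the two-tailed p-value is the same as for z = 1, and the one-tailed p-value in the direction of the statistic is the left tail. A minimal sketch:

```r
z_obs <- -1

# Two-tailed: same as for z = 1 because of symmetry
2 * pnorm(-abs(z_obs))
# [1] 0.3173105

# One-tailed in the direction of z (left tail): P(Z <= -1)
pnorm(z_obs)
# [1] 0.1586553
```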
Remember:
A low p-value (p-value < 0.05) provides evidence against the null hypothesis, so we reject it at the 5% significance
level. A high p-value (p-value > 0.05) means the data do not provide enough evidence to reject the null hypothesis;
it does not prove that the null hypothesis is true.
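The 0.05 threshold corresponds to a specific z value. With `qnorm()`, the inverse of `pnorm()`, this sketch finds the |z| beyond which a two-tailed p-value drops below 0.05; it is about 1.96 rather than the rule's rounded 2:

```r
# |z| that gives a two-tailed p-value of exactly 0.05 (2.5% in each tail)
z_crit <- qnorm(1 - 0.05 / 2)
round(z_crit, 4)
# [1] 1.96

# Check: two-tailed p-value at z_crit (0.05 up to floating-point rounding)
2 * pnorm(-z_crit)
```

So |z| > 1.96 corresponds to p < 0.05; the earlier z of 1.991378 just clears this threshold.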