# Z_test_statistic_p-value

Ricardo Zaragoza  
October 24, 2018

This document was generated using RStudio-Markdown.

How can we use the z test statistic to calculate the p-value through the 68%-95%-99.7% rule? For example, if the z test statistic in the example below is -1, how can we calculate the p-value using the 68%-95%-99.7% rule?

## Answer

We can apply the 68%-95%-99.7% rule to normal distributions. This rule says that for a normal distribution with mean µ and standard deviation σ:

- 68% of the observations fall within one standard deviation σ of the mean µ.
- 95% of the observations fall within 2σ of the mean µ.
- 99.7% of the observations fall within 3σ of the mean µ.

To calculate the z-score, that is, to find exactly how many standard deviations an observation x lies from the mean, we standardize the observation. To do so, we need the mean and the standard deviation of the population we are analyzing.

First, we use some data to look at the distribution of height. This demonstration is based on Sean Dolinar's exercise: https://stats.seandolinar.com/calculating-z-scores-with-r/

```r
# Load the data set (fileEncoding strips the byte-order mark from the first column name)
data <- read.csv('Height_data.csv', fileEncoding = 'UTF-8-BOM')

# Use str() to see how the data is structured
str(data)
## 'data.frame': 25000 obs. of 3 variables:
## $ Index         : int 1 2 3 4 5 6 7 8 9 10 ...
## $ Height.Inches.: num 65.8 71.5 69.4 68.2 67.8 ...
## $ Weight.Pounds.: num 113 136 153 142 144 ...

# Store the height column in its own vector (full column name avoids partial matching)
height <- data$Height.Inches.

# How many observations do we have?
N <- length(height)
N
## [1] 25000

# Look at the histogram to see what kind of distribution the data has
hist(height)
```

[Figure: Histogram of height (x-axis: height, in inches; y-axis: Frequency)]

The histogram is bell-shaped and roughly symmetric, so the data look approximately normally distributed. Here we estimate the parameters of the population: the mean and the standard deviation.
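Before computing them, note that the calculation treats the 25,000 heights as the whole population. R's `sd()` divides by N − 1 (the sample standard deviation), so it is rescaled by sqrt((N − 1)/N) to obtain the population value, which divides by N. A minimal sketch on a small made-up vector (toy values, not the height data) checks that this rescaling matches the direct population formula:

```r
# Toy vector (hypothetical values, not taken from Height_data.csv)
x <- c(64.2, 66.5, 67.9, 68.4, 69.1, 71.8)
n <- length(x)

# Sample standard deviation: divides the squared deviations by n - 1
sample_sd <- sd(x)

# Population standard deviation computed directly: divides by n
pop_sd_direct <- sqrt(sum((x - mean(x))^2) / n)

# Rescaling sd() the same way as in the height example
pop_sd_rescaled <- sample_sd * sqrt((n - 1) / n)

all.equal(pop_sd_direct, pop_sd_rescaled)  # returns TRUE
```

With N = 25,000 the correction factor sqrt(24999/25000) is about 0.99998, so for the height data the two standard deviations differ only in the fifth significant digit.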
```r
# Population parameter calculations
# sd() uses the sample (N - 1) denominator; rescale it to the population (N) denominator
pop_sd <- sd(height) * sqrt((length(height) - 1) / length(height))
pop_mean <- mean(height)

# Population mean
pop_mean
## [1] 67.99311

# Population standard deviation
pop_sd
## [1] 1.901641
```

Using the population mean (µ = 67.99) and standard deviation (σ = 1.90), we can calculate the z-score for any given value of x. In this example I'll use x = 71.78.

$$z = \frac{x_i - \mu}{\sigma}$$

```r
# z-statistic calculation for x = 71.78
z <- (71.78 - pop_mean) / pop_sd

# The z-score for x = 71.78 is:
z
## [1] 1.991378
```

This gives a score of 1.99, which means that someone who is 71.78 inches tall is 1.99 standard deviations to the right of the mean of this population. According to the 68%-95%-99.7% rule, 95% of the population lies within µ ± 2σ = 67.99 ± 2(1.90), that is, between 64.19 and 71.79 inches tall; the remaining 5% is 64.19 inches tall or shorter, or 71.79 inches tall or taller.

Here, we want to test the hypothesis:

$$H_0: \mu = 71.79 \qquad H_a: \mu \neq 71.79$$

Now we can use the z-score to estimate the probability (the p-value) of finding someone who is 64.19 inches tall or shorter, or 71.79 inches tall or taller, in this specific population.

```r
# Two-tailed p-value: the probability of finding someone who is 64.19 inches
# tall or shorter, or 71.79 inches tall or taller, in this population
2 * pnorm(-abs(z))
## [1] 0.0464393
```

Thus, we are estimating the probability of the observations to the left of the blue line and to the right of the red line:

```r
# Histogram of the height distribution with the two cut-offs
hist(height)
abline(v = 64.19, col = "blue", lty = 2, lwd = 2)
abline(v = 71.79, col = "red", lty = 2, lwd = 2)
legend(72, 4000, c("64.19", "71.79"), col = c("blue", "red"), lty = c(2, 2), lwd = c(2, 2))
```

[Figure: Histogram of height with dashed reference lines at 64.19 (blue) and 71.79 (red)]

Remember, we are assuming that the height of this population follows a normal distribution with mean µ = 67.99 and standard deviation σ = 1.90.
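The p-value of about 0.046 is only as reliable as this normality assumption. One hedged way to see what the assumption means is to simulate a population that is exactly normal with the same parameters and count how often a simulated height falls at least as far from the mean as x; with the real data you would run the same count on `height` itself, for example `mean(height <= 64.19 | height >= 71.79)`. The sketch below is an illustration with assumed, rounded parameters (µ = 67.99, σ = 1.90), not a result computed from Height_data.csv:

```r
set.seed(2018)  # make the simulation reproducible

mu    <- 67.99  # assumed population mean (rounded from the example)
sigma <- 1.90   # assumed population standard deviation (rounded)
x     <- 71.78  # observation from the example

# z-score and two-tailed p-value under the normal model
z <- (x - mu) / sigma
p_normal <- 2 * pnorm(-abs(z))

# Simulate a large, exactly normal population with the same parameters
sim <- rnorm(1e6, mean = mu, sd = sigma)

# Share of simulated heights at least as far from the mean as x (both tails)
p_simulated <- mean(abs(sim - mu) >= abs(x - mu))

c(normal = p_normal, simulated = p_simulated)
```

If the share computed on the real `height` vector is close to the normal-theory value, the normal approximation is reasonable in this tail; a large gap would suggest that the 68%-95%-99.7% rule is not a good description of these data.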
Conclusion: around 4.64% of the time we would find someone who is 64.19 inches tall or shorter, or 71.79 inches tall or taller. Since this p-value is below 0.05, we can reject the null hypothesis that the mean µ = 71.79.

What if we want to know the p-value of someone 71.79 inches tall or taller? We can also use the z-statistic to find it. In this case, we compute a one-tailed (right-tail) p-value.

```r
# Right-tail p-value: the probability of finding someone who is 71.79 inches
# tall or taller in this population. By symmetry of the normal curve, the area
# above z equals the area below -|z|.
pnorm(-abs(z))
## [1] 0.02321965
```

Conclusion: around 2.32% of the time we would find someone who is 71.79 inches tall or taller, and we can reject the null hypothesis that the mean µ = 71.79.

## Apply the 68%-95%-99.7% rule

The purpose of these exercises was to review what a z-statistic is, what the 68%-95%-99.7% rule means, and how we estimate the p-value for hypothesis testing. Now we can simply substitute the z-values of the 68%-95%-99.7% rule (±1, ±2 and ±3 standard deviations from the mean) into the two-tailed p-value; you can do the one-tailed case yourself.

```r
# Two-tailed p-value for the 68% rule, or ±1 standard deviation
2 * pnorm(-abs(1))
## [1] 0.3173105

# Two-tailed p-value for the 95% rule, or ±2 standard deviations
2 * pnorm(-abs(2))
## [1] 0.04550026

# Two-tailed p-value for the 99.7% rule, or ±3 standard deviations
2 * pnorm(-abs(3))
## [1] 0.002699796
```

Each of these p-values is simply the share of the distribution outside the band: 1 − 0.6827 = 0.3173, 1 − 0.9545 = 0.0455 and 1 − 0.9973 = 0.0027. The rule's 68%, 95% and 99.7% are rounded versions of these coverages.

What if the z test statistic is 1? For consistency, we assume that x = 71.79 has a z-score of 1 (with the mean unchanged, this implies σ = 71.79 − 67.99 = 3.80). The two-tailed p-value for z = -1 is identical, because only |z| enters `2 * pnorm(-abs(z))`.

```r
# Two-tailed p-value for the 68% rule, or ±1 standard deviation
2 * pnorm(-abs(1))
## [1] 0.3173105

# One-tailed p-value for +1 standard deviation
pnorm(-abs(1))
## [1] 0.1586553
```

We can see that if the z test statistic is 1, for two tails 31.73% of the time we would find someone who is 64.19 inches tall or shorter (67.99 − 3.80), or 71.79 inches tall or taller; therefore, we fail to reject the null hypothesis that the mean µ = 71.79.
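For the one-tailed case the direction matters: with z = -1 the natural one-tailed p-value is the left tail, while with z = 1 it is the right tail. A small helper, `p_values()` (a hypothetical name introduced here for illustration, not part of base R), computes all three tail areas with `pnorm()`:

```r
# Hypothetical helper: left-tail, right-tail and two-tailed p-values for a z-score
p_values <- function(z) {
  c(
    left  = pnorm(z),                      # P(Z <= z)
    right = pnorm(z, lower.tail = FALSE),  # P(Z >= z)
    two   = 2 * pnorm(-abs(z))             # P(|Z| >= |z|)
  )
}

p_values(-1)  # left tail about 0.1587, two-tailed about 0.3173
p_values(1)   # right tail about 0.1587, two-tailed about 0.3173
```

For z = -1, the right-tail area would be about 0.8413, so the tail must match the alternative hypothesis chosen before looking at the data.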
For one tail, around 15.87% of the time we would find someone who is 71.79 inches tall or taller, so we fail to reject the null hypothesis that the mean µ = 71.79.

Remember: a low p-value (p-value < 0.05) is taken as evidence against the null hypothesis, while a high p-value (p-value > 0.05) only means that the data do not provide enough evidence to reject it; it does not show that the null hypothesis is true.
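The 68%-95%-99.7% rule rounds the multipliers to whole standard deviations. Going the other way, `qnorm()` returns the exact z needed for a given central coverage, which shows that the 95% cut-off is about 1.96σ rather than exactly 2σ. A minimal sketch, reusing the rounded parameters µ = 67.99 and σ = 1.90 from the example:

```r
mu    <- 67.99  # population mean from the example (rounded)
sigma <- 1.90   # population standard deviation from the example (rounded)

coverage <- c(0.68, 0.95, 0.997)

# Exact z multiplier that leaves (1 - coverage) / 2 in each tail
z_exact <- qnorm(1 - (1 - coverage) / 2)

# Corresponding height intervals mu ± z * sigma
data.frame(
  coverage = coverage,
  z        = round(z_exact, 3),
  lower    = round(mu - z_exact * sigma, 2),
  upper    = round(mu + z_exact * sigma, 2)
)
```

With the exact multiplier of about 1.96, the 95% interval is roughly 64.27 to 71.71 inches instead of the rule-of-thumb 64.19 to 71.79 inches.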