Simple Test of Hypothesis
__________________________________________________________________________________________________
Lesson 10
Simple Tests of Hypothesis
(First Semester 2021)

Left-Tailed Test   Ha: µ < value
Right-Tailed Test  Ha: µ > value
Two-Tailed Test    Ha: µ ≠ value

Source: Online Statistics Education: A Multimedia Course of Study (http://onlinestatbook.com/). Project Leader: David M. Lane, Rice University.
Link of Online Calculator: http://onlinestatbook.com/2/calculators/normal_dist.html
Note: Concepts were used with written permission from the author.

Let’s Hit These:
At the end of this lesson, the students must have:
- comprehended the concepts of Statistics;
- identified the mean, standard deviation, and variance using a spreadsheet and an online calculator; and
- appreciated the different applications of Statistics and the use of online calculators.

Simple Tests of Hypothesis

Statistical inference is the process of drawing conclusions about a population parameter on the basis of sample information.

Statistical hypothesis - an assumption about a population parameter. It is also described as:
- an assertion subject to verification;
- an assumption used as the basis for action;
- a guess or prediction made by the researcher regarding the possible outcome of the study.

Hypothesis testing refers to the formal procedures used by statisticians or researchers to accept or reject statistical hypotheses. It is the process of making an inference or generalization about population parameters based on the results of a study on samples. The hypothesis under test is the assertion we hold as true until we have sufficient statistical evidence to conclude otherwise. In hypothesis testing, one draws conclusions about the population using sample data.

There are two types of statistical hypotheses.

A. Null hypothesis. The null hypothesis, denoted by Ho, is usually the hypothesis that sample observations result purely from chance.
- It must always express the idea of non-significance of a difference or relationship.
- It is the hypothesis to be tested.
- It is the hypothesis the researcher usually hopes to reject.

Examples on how to state the null hypothesis:
a. There is no significant relationship between the respondents’ sex and academic performance.
b. One variable does not depend on the other variable.
c. Two variables are independent of each other.

B. Alternative hypothesis. The alternative hypothesis, denoted by H1 or Ha, is the hypothesis that sample observations are influenced by some non-random cause.
- It is the hypothesis we conclude is true if Ho is rejected.
- It states that there is an effect, a difference, or a relationship.
- It generally represents the idea that the researcher wants to prove.

Examples on how to state the alternative hypothesis:
a. There is a significant relationship between the respondents’ sex and academic performance.
b. One variable depends on the other variable.
c. Two variables are dependent on each other.

Must watch (Introduction to Hypothesis Testing): https://www.youtube.com/watch?v=plAiYXYaqY0&t=77s

Can We Accept the Null Hypothesis?

Some researchers say that a hypothesis test can have one of two outcomes: (1) you accept the null hypothesis or (2) you reject the null hypothesis. Many statisticians, however, take issue with the notion of "accepting the null hypothesis." Instead, they say: you reject the null hypothesis or you fail to reject the null hypothesis.

Why the distinction between "acceptance" and "failure to reject"? Acceptance implies that the null hypothesis is true. Failure to reject implies only that the data are not sufficiently persuasive for us to prefer the alternative hypothesis over the null hypothesis.

Source: Berman, H. (2020). What is Hypothesis Testing?. Retrieved from: https://stattrek.com/hypothesis-test/hypothesis-testing.aspx
Note: Concepts were used with written permission from the author. Stat Trek was founded by Harvey Berman.
All of Stat Trek’s analytical tools and training services are free and accessible to anyone who visits the website.

Decision Errors

Two types of errors can result from a hypothesis test.

1. Type I error (α error). A Type I error occurs when the researcher rejects a null hypothesis that is true.
- It is the error of concluding that there is something (a difference, a change, or an effect) when in reality there is none.
- In short: rejecting a true Ho.
- Example: Ho: Juan is not guilty. If the judge convicts Juan when in fact he is not guilty, the court commits a Type I error.
- α is read as "alpha". It is the level of significance, that is, the probability of committing a Type I error.

2. Type II error (β error). A Type II error occurs when the researcher fails to reject a null hypothesis that is false.
- It is the error of concluding that there is nothing (no difference, no change, or no effect) when in reality there is.
- In short: failing to reject a false Ho.
- Example: Ho: Juan is not guilty. If the judge acquits Juan when in fact he is guilty, the court commits a Type II error.
- β is read as "beta". It is the probability of committing a Type II error.
- The probability of not committing a Type II error, 1 − β, is called the power of the test.

Decision             Ho is actually true    Ho is actually false
Reject Ho            Type I error           Correct decision
Fail to reject Ho    Correct decision       Type II error

Must watch (Type I and Type II Errors): https://www.youtube.com/watch?v=Sdw2E7Xi0Q0

Decision Rules

1. P-value. The P-value measures how well the sample data agree with the null hypothesis. Suppose the test statistic is equal to S. The P-value is the probability of observing a test statistic at least as extreme as S, assuming the null hypothesis is true. If the P-value is less than the significance level, we reject the null hypothesis.
Note: The lower the P-value, the stronger the evidence against the null hypothesis.

2. Region of acceptance.
The region of acceptance is a range of values. If the test statistic falls within the region of acceptance, the null hypothesis is not rejected. The region of acceptance is defined so that the chance of making a Type I error is equal to the significance level. It is also called the non-rejection region.

The set of values outside the region of acceptance is called the region of rejection. If the test statistic falls within the region of rejection, the null hypothesis is rejected. In such cases, we say that the hypothesis has been rejected at the α level of significance.

Source: Berman, H. (2020). What is Hypothesis Testing?. Retrieved from: https://stattrek.com/hypothesis-test/hypothesis-testing.aspx
Note: Concepts were used with written permission from the author.

[Figure: Regions of rejection and acceptance under the normal curve. The area between z = -1.96 and z = 1.96 is 0.95, so the required area in each tail is A = (1 – 0.95) ÷ 2 = 0.025.]
Source: From Wikimedia Commons. Region of rejections or acceptance. Retrieved from: https://commons.wikimedia.org/wiki/File:Region_of_rejections_or_acceptance.png

One-Tailed and Two-Tailed Tests

A test of a statistical hypothesis, where the region of rejection is on only one side of the sampling distribution, is called a one-tailed test. For example, suppose the null hypothesis states that the mean is less than or equal to 10. The alternative hypothesis would be that the mean is greater than 10. The region of rejection would consist of a range of numbers located on the right side of the sampling distribution; that is, a set of numbers greater than 10.

A test of a statistical hypothesis, where the region of rejection is on both sides of the sampling distribution, is called a two-tailed test. For example, suppose the null hypothesis states that the mean is equal to 10. The alternative hypothesis would be that the mean is less than 10 or greater than 10.
The region of rejection would consist of a range of numbers located on both sides of the sampling distribution; that is, the region of rejection would consist partly of numbers that were less than 10 and partly of numbers that were greater than 10.

Source: Berman, H. (2020). What is Hypothesis Testing?. Retrieved from: https://stattrek.com/hypothesis-test/hypothesis-testing.aspx
Note: Concepts were used with written permission from the author.

The acceptance region is the set of test-statistic values for which the null hypothesis is not rejected. The critical region (region of rejection) is the set of values for which the null hypothesis is rejected; when Ho is true, the test statistic falls in the critical region with probability α.

In summary:

One-Tailed Test
- It is used if the alternative hypothesis is directional (< or >).
- Examples: The yield is greater than… The mean score is less than…

Two-Tailed Test
- It is used if the alternative hypothesis is non-directional (≠).
- Example: Score A is not equal to score B.

There is no fixed rule for identifying whether a test of hypothesis is one-tailed or two-tailed. Generally, if the direction of the difference is not given in the statement of the hypothesis, we use a two-tailed test. Conversely, if the statement of the hypothesis includes a direction of difference, such as at least, at most, increase, decrease, majority, minority, larger, taller, high, low, more than, or less than, we use a one-tailed test.

Source: Raj Chand Takuri (2019). Testing of Hypothesis. In SlideShare. Retrieved from: https://www.slideshare.net/RajThakuri/testing-of-hypotheses

Read: “Understanding Hypothesis Tests: Significance Levels (Alpha) and P values in Statistics”
Link: https://blog.minitab.com/blog/adventures-in-statistics-2/understanding-hypothesis-tests-significance-levels-alpha-and-p-values-in-statistics

Level of Significance of a Test

The level of significance is:
- the maximum probability of rejecting the null hypothesis (Ho) when it is in fact true;
- the maximum risk of a Type I error that the researcher is prepared to take;
- the probability of rejecting a true hypothesis;
- a measure of the strength of the evidence that must be present in your sample before you reject the null hypothesis and conclude that the effect is statistically significant.

A 5% significance level means that we accept 5 chances in 100 of rejecting the null hypothesis when it is actually true. Equivalently, when the null hypothesis is true, the test will correctly fail to reject it 95% of the time. It indicates a 5% risk of concluding that a difference exists when there is no actual difference.

Degrees of Freedom
- It measures the number of scores that are free to vary when computing the sum of squares for sample data.
- It refers to the maximum number of logically independent values, which are values that have the freedom to vary, in the data sample.

In testing the difference between two means, the z-test or the t-test may be used.
- The z-test is used when the population standard deviation is known. It is also used when the sample is large (n ≥ 30).
- The t-test is used when only the sample standard deviation is known. It is applied when the sample is small (n < 30).

Steps in Hypothesis Testing
1. Formulate the null hypothesis (Ho) that there is no significant difference between the items being compared. State the alternative hypothesis (Ha), which is adopted in case Ho is rejected.
2. Set the level of significance, α. Typically the 0.05 or the 0.01 level is used.
3. Determine the test to be used.
4. Determine the degrees of freedom (df) and the tabular value for the test. For a single sample, df = number of items – 1 = n – 1. For two samples, df = n1 + n2 – 2, where n1 is the number of items in the first sample and n2 is the number of items in the second sample. For a z-test, use the table of critical values of z based on the area under the normal curve. For a t-test, look up the tabular value in the table of the t-distribution.
5. Compute z or t as needed, using the appropriate formula from those that follow.
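The steps above can be sketched in code for the simplest case, a sample mean compared with a population mean using z = (x̄ − µ) / (σ / √n). This is a minimal illustration rather than a prescribed procedure: the function name `one_sample_z_test` and all input numbers are assumptions made up for the example, and the test is assumed to be two-tailed. It uses only Python's standard library.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(sample_mean, pop_mean, pop_sd, n, alpha=0.05):
    """Two-tailed z-test of a sample mean against a population mean.

    Hypothetical helper written for this illustration only.
    """
    std_normal = NormalDist()  # standard normal: mean 0, sd 1

    # Step 4: tabular (critical) value for a two-tailed test.
    critical = std_normal.inv_cdf(1 - alpha / 2)

    # Step 5: computed value, z = (x_bar - mu) / (sigma / sqrt(n)).
    z = (sample_mean - pop_mean) / (pop_sd / sqrt(n))

    # Two-tailed p-value: area in both tails beyond |z|.
    p_value = 2 * (1 - std_normal.cdf(abs(z)))

    # Decision: reject Ho when |z| exceeds the critical value.
    reject = abs(z) > critical
    return z, critical, p_value, reject


# Made-up inputs: Ho: mu = 80, Ha: mu != 80, alpha = 0.05.
z, critical, p_value, reject = one_sample_z_test(
    sample_mean=83, pop_mean=80, pop_sd=10, n=36, alpha=0.05
)
print(f"z = {z:.2f}, critical value = ±{critical:.2f}, p-value = {p_value:.4f}")
print("Reject Ho" if reject else "Fail to reject Ho")
```

With these made-up inputs the computed z is smaller in absolute value than the critical value, so the sketch reports a failure to reject Ho; the p-value rule (p-value > α) leads to the same decision.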
A. z-test

a. Sample Mean Compared with Population Mean

$$ z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} $$

where:
\bar{x} = sample mean
µ = population mean
σ = population standard deviation
n = number of items within the sample

b. Comparing Two Sample Means

$$ z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} $$

where:
\bar{x}_1 = mean of the first sample
\bar{x}_2 = mean of the second sample
s1 = standard deviation of the first sample
s2 = standard deviation of the second sample
n1 = number of items in the first sample
n2 = number of items in the second sample

c. Comparing Two Sample Proportions

$$ z = \frac{P_1 - P_2}{\sqrt{\dfrac{P_1 q_1}{n_1} + \dfrac{P_2 q_2}{n_2}}} $$

where:
P1 = proportion of the first sample
P2 = proportion of the second sample
q1 = 1 – P1
q2 = 1 – P2
n1 = number of items in the first sample
n2 = number of items in the second sample

B. t-test

a. Sample Mean Compared with Population Mean

$$ t = \frac{\bar{x} - \mu}{s / \sqrt{n - 1}} $$

where:
\bar{x} = sample mean
µ = population mean
s = sample standard deviation
n = number of items within the sample

b. Comparing Two Sample Means (independent-measures t statistic)

$$ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\left[\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\right]\left[\dfrac{1}{n_1} + \dfrac{1}{n_2}\right]}} $$

where:
\bar{x}_1 = mean of the first sample
\bar{x}_2 = mean of the second sample
s1 = standard deviation of the first sample
s2 = standard deviation of the second sample
n1 = number of items in the first sample
n2 = number of items in the second sample

c.
t-Test for correlated or dependent data (repeated-measures design)

$$ t = \frac{\sum D}{\sqrt{\dfrac{n \sum D^2 - \left(\sum D\right)^2}{n - 1}}} $$

where:
D = the difference between the two scores (posttest – pretest)
n = number of paired observations
∑D² = the sum of the squares of the differences between the posttest and pretest
∑D = the sum of the differences between the posttest and pretest

Recall: Decision Rules in Rejecting the Null Hypothesis

Using the computed value and the tabular (critical) value:
If |computed value| > tabular value: reject Ho and accept Ha.
If |computed value| < tabular value: do not reject Ho (fail to reject Ho).
Absolute values are compared for a two-tailed test. For a one-tailed test, the computed value must also lie in the tail named by Ha.

Using the p-value and the level of significance:
If p-value < level of significance (α): reject Ho and accept Ha.
If p-value > level of significance (α): do not reject Ho (fail to reject Ho).

Read: “Failing to Reject the Null Hypothesis”
Link: https://statisticsbyjim.com/hypothesis-testing/failing-reject-null-hypothesis/

Note: In this subject, we will consider only the z-test for comparing two sample means, the t-test for comparing two sample means (independent data), and the t-test for correlated or dependent data.

Tables Used in Identifying the Tabular or Critical Values of t and z

[Table: t-distribution critical values]
Source: Published in Statology by Zach (2020). How to Read the t-Distribution Table. Retrieved from: https://www.statology.org/how-to-read-t-distribution-table/

z-Table (critical values)

Test          Level of Significance (α)
              0.01       0.05
One-tailed    ±2.33      ±1.65
Two-tailed    ±2.58      ±1.96

Before we start the application of the topic, please do the following steps in your spreadsheet. We have to install the Data Analysis ToolPak.
1. Click Data and check whether the Data Analysis ToolPak is installed.
2.
If it is installed, you can see the Data Analysis icon.
3. If it is not installed, proceed as follows:
a. Click File, click Options, and click Add-Ins.
b. Select Analysis ToolPak, click Go, tick the Analysis ToolPak check box, and click OK.
c. The Data Analysis ToolPak is now installed.

Practice Activity Using Excel
(This activity will help you in our next activities.)

A. Using MS Excel or Spreadsheet

Find the mean, standard deviation, and variance of the following data.

Scores: 88, 87, 78, 90, 88

Note: A function in Excel and Google Sheets is a built-in formula. All functions begin with the equal sign (=) followed by the function's name, such as AVERAGE, STDEV, SUM, etc.

1. Encode the data in Excel. In this example the data are encoded in Column B from Row 3 to Row 7, so the range is B3:B7.
2. Solve for the mean (x̄), or average, by typing =AVERAGE(B3:B7). You may type the range (B3:B7) or highlight the data. Then press Enter and the value will be displayed. Round off the number to the nearest hundredth.
3. Solve for the standard deviation (s or sd) by typing =STDEV(B3:B7). Press Enter. Round off sd to the nearest hundredth.

Results:
x̄ = 86.20
s = 4.71
s² = 22.18

The variance (s²) is calculated by squaring the standard deviation: s² = (4.71)*(4.71) = 22.18. Note that this squares the rounded value 4.71. Squaring the unrounded standard deviation gives 22.20, which is the value the online calculator reports.

B. Using Online Calculator
(Link: https://www.mathsisfun.com/data/standard-deviation-calculator.html )

Data (Scores): 88, 87, 78, 90, 88

1. Click the link and input the data.
2. Click Sample. Consider the given data as a sample drawn from the population. (If the data come from the entire population, click Population.)
[Screenshot: calculator output showing the mean, the variance, and the standard deviation]

Results:
x̄ = 86.2
s = 4.7
s² = 22.2

Note that this online calculator displays numbers rounded off to the nearest tenth. Just copy the numbers.

Source: Pierce, Rod. (18 Apr 2020). "Math is Fun". Math Is Fun. Retrieved 6 Jul 2020 from http://www.mathsisfun.com/index.htm
Note: Concepts were used with written permission from the author.

C. Using Another Online Calculator
(Link: https://www.socscistatistics.com/descriptive/variance/default.aspx )

Data (Scores): 88, 87, 78, 90, 88

1. Click the link and input the data.
2. Click Sample and click Calculate. The calculator then displays the results.

Source: Stangroom, J. (2020, September 17). Standard Deviation and Variance Calculator. Retrieved from: https://www.socscistatistics.com/descriptive/variance/default.aspx
Note: The web site offers free resources for students and researchers working with statistics in the social sciences.

Let’s Do This: Activity No. 13
The Use of Spreadsheet and Online Calculators

Problem: With the given data, identify the mean, sample standard deviation, and variance using a spreadsheet or an online calculator.

Links:
https://www.mathsisfun.com/data/standard-deviation-calculator.html
https://www.socscistatistics.com/descriptive/variance/default.aspx

Data (Savings/Day): 120, 145, 150, 160, 170, 170, 195, 200, 210, 220

Answers:
Mean: _____________________
Standard Deviation: _____________________ (Sample)
Variance: _____________________
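As a cross-check on the spreadsheet and online-calculator results, the practice scores from this lesson (88, 87, 78, 90, 88) can also be run through Python's standard `statistics` module. This is only a sketch of one alternative tool, not part of the lesson's required procedure; the variable names are chosen for illustration. It also shows the difference between the Sample and Population options: the sample formulas divide by n – 1, and the population formulas divide by n.

```python
from statistics import mean, pstdev, pvariance, stdev, variance

# Practice data from this lesson (not the Activity No. 13 data).
scores = [88, 87, 78, 90, 88]

x_bar = mean(scores)

# "Sample" option: divides the sum of squared deviations by n - 1.
s = stdev(scores)
s2 = variance(scores)

# "Population" option: divides the sum of squared deviations by n.
sigma = pstdev(scores)
sigma2 = pvariance(scores)

print(f"mean                = {x_bar:.2f}")
print(f"sample sd (s)       = {s:.2f}")
print(f"sample variance     = {s2:.2f}")
print(f"population sd       = {sigma:.2f}")
print(f"population variance = {sigma2:.2f}")
```

The sample variance printed here is computed from the unrounded deviations, so it agrees with the online calculator's 22.2 rather than with the 22.18 obtained by squaring the rounded standard deviation 4.71.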