Principles of Statistics
Assoc. Prof. Dr. Abdul Hamid b. Hj. Mar Iman
Former Director, Centre for Real Estate Studies
Faculty of Geoinformation Science and Engineering, Universiti Teknologi Malaysia, Skudai, Johor.
E-mail: hamid@fksg.utm.my

Hypothesis Testing

Content:
• Concepts of hypothesis testing
• Test of statistical significance
• Hypothesis testing one variable at a time

Hypothesis
• Unproven proposition
• Supposition that tentatively explains certain facts or phenomena
• Assumption about the nature of the world
• E.g. the mean price of three-bedroom single-storey houses in Skudai is RM 155,000.

Hypothesis (contd.)
• An unproven proposition or supposition that tentatively explains certain facts or phenomena:
  – Null hypothesis
  – Alternative hypothesis
• The null hypothesis is that there is no systematic relationship between independent variables (IVs) and dependent variables (DVs).
• The research hypothesis is that any relationship observed in the data is real.

Null Hypothesis
• Statement about the status quo
• No difference
• Statistically expressed as:
  $H_0: b = 0$
  where $b$ is any parameter, estimated from the sample, that is used to describe the population.

Alternative Hypothesis
• Statement that indicates the opposite of the null hypothesis
• There is a difference
• Statistically expressed as:
  $H_1: b \ne 0$
  $H_1: b < 0$
  $H_1: b > 0$

Significance Level
• Critical probability in choosing between Ho and H1.
• Put simply, the cut-off point (COP) beyond which a result is judged too unlikely to be due to chance.
• Tells how likely it is that a result is due to chance
• The most common confidence level, used to mean “something is good enough to be believed”, is .95, which corresponds to a significance level of .05.
• It means a result this extreme would arise by chance only 5% of the time if Ho were true.
• What is the COP at the 95% level?

Significance Level (contd.)
• Denoted as $\alpha$
• Tells how much of the probability mass is in the tails of a given distribution
• The significance level selected is typically .05 or .01
• A result whose probability under Ho falls below $\alpha$ is too unlikely to warrant support for the null hypothesis
• In other words, such a result gives strong grounds to support the alternative hypothesis
• Main purpose of statistical testing: to decide whether the null hypothesis can be rejected

Significance Level (contd.)
$P[-1.96 \le Z \le 1.96] = 1 - \alpha = 0.95$
$P[Z \ge Z_c] = P[Z \le -Z_c] = \alpha/2$

Suppose we have the following relationship:
$Y_i = \beta + e_i, \quad i = 1, \dots, T, \quad e_i \sim N(0, \sigma^2)$   (1)
The least squares estimator for $\beta$ is:
$b = \sum_{i=1}^{T} Y_i / T$   (2)
with the following properties:
1) $E[b] = \beta$   (3a)
2) $\mathrm{Var}(b) = E[(b - \beta)^2] = \sigma^2/T$   (3b)
3) $b \sim N(\beta, \sigma^2/T)$   (3c)
The standardized normal random variable for $b$ is:
$Z = \dfrac{b - \beta}{\sqrt{\sigma^2/T}} \sim N(0, 1)$   (4)
The critical value of $Z$, i.e. $Z_c$, such that $\alpha = 0.05$ of the probability mass is in the tails of the distribution, is given by:
$P[Z \ge 1.96] = P[Z \le -1.96] = 0.025$   (5a)
and
$P[-1.96 \le Z \le 1.96] = 1 - 0.05 = 0.95$   (5b)
Substituting the standardized normal variable (Eqn. 4) into Eqn (5b), we get:
$P\left[-1.96 \le \dfrac{b - \beta}{\sqrt{\sigma^2/T}} \le 1.96\right] = 0.95$   (6)
Solving for $\beta$, we get:
$P[b - 1.96\,\sigma/\sqrt{T} \le \beta \le b + 1.96\,\sigma/\sqrt{T}] = 0.95$   (7)
In general:
$P[b - Z_c\,\sigma/\sqrt{T} \le \beta \le b + Z_c\,\sigma/\sqrt{T}] = 1 - \alpha$   (8a)
Also:
$P\left[\dfrac{b - \beta}{\sigma/\sqrt{T}} \le -Z_c\right] = P\left[\dfrac{b - \beta}{\sigma/\sqrt{T}} \ge Z_c\right] = \alpha/2$ (2-tail test)   (8b)

Example
You suspect that the mean rental of 225 purpose-built office units in Johor is RM 3.00/sq.ft. If the std. dev. is RM 1.50/sq.ft., what is the 95% confidence interval of the mean?
The null hypothesis that the mean is equal to 3.0:
$H_0: \mu = 3.0$
The alternative hypothesis that the mean is not equal to 3.0:
$H_1: \mu \ne 3.0$

[Figure: A Sampling Distribution, centred at $\mu = 3.0$, with $\alpha/2 = .025$ in each tail and the critical values $X_L$ and $X_U$ still to be found]
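The interval in Eqn (8a) can be checked numerically. The following is a minimal Python sketch, assuming the standard library's `statistics.NormalDist` for the normal quantile; the function names `z_critical` and `z_interval` are illustrative and do not come from the slides. It uses the office-rental figures from the example ($\mu = 3.0$, std. dev. 1.5, 225 units).

```python
from math import sqrt
from statistics import NormalDist


def z_critical(alpha: float) -> float:
    """Two-tailed critical value Zc, leaving alpha/2 in each tail."""
    return NormalDist().inv_cdf(1 - alpha / 2)


def z_interval(center: float, sigma: float, n: int, alpha: float = 0.05):
    """Limits center -/+ Zc * sigma / sqrt(n), as in Eqn (8a)."""
    se = sigma / sqrt(n)      # standard error of the mean
    zc = z_critical(alpha)    # about 1.96 when alpha = 0.05
    return center - zc * se, center + zc * se


# Office-rental example: mean 3.0, std. dev. 1.5, n = 225
lower, upper = z_interval(3.0, 1.5, 225)
print(round(lower, 3), round(upper, 3))  # prints: 2.804 3.196
```

Changing `alpha` to 0.01 widens the interval, because less probability mass is left in the tails.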
Critical values of $\mu$
Critical value, upper limit:
$\mu + Z S_{\bar X}$ or $\mu + Z \dfrac{S}{\sqrt{n}}$
$= 3.0 + 1.96\left(\dfrac{1.5}{\sqrt{225}}\right) = 3.0 + 1.96(0.1) = 3.0 + .196 = 3.196$

Critical values of $\mu$
Critical value, lower limit:
$\mu - Z S_{\bar X}$ or $\mu - Z \dfrac{S}{\sqrt{n}}$
$= 3.0 - 1.96\left(\dfrac{1.5}{\sqrt{225}}\right) = 3.0 - 1.96(0.1) = 3.0 - .196 = 2.804$

[Figure: Region of Rejection, showing the lower limit, $\mu = 3.0$ and the upper limit]

[Figure: Hypothesis Test of $\mu = 3.0$, marking 2.804, $\mu = 3.0$, 3.196 and 3.78 on the axis]

Type I and Type II Errors
                 Accept null          Reject null
Null is true     Correct, no error    Type I error
Null is false    Type II error        Correct, no error

Type I and Type II Errors in Hypothesis Testing
State of Null Hypothesis      Decision
in the Population             Accept Ho            Reject Ho
Ho is true                    Correct, no error    Type I error
Ho is false                   Type II error        Correct, no error

Example
You estimate the average price, $\mu$, of single- and double-storey houses in Malaysia’s major industrialised towns to be RM 1,600/sq.m. Based on a sample of 101 houses, you found that the mean price, $\bar X$, is RM 1,579.44/sq.m. with a std. dev. of RM 350.13/sq.m.
(a) Would you reject your initial estimate at the 0.05 significance level?
(b) What is the confidence interval of the mean price at the 5% significance level?

Answer (a)
$H_0: \mu = 1{,}600$
$H_1: \mu \ne 1{,}600$
Test statistic: $Z = \dfrac{1{,}579.44 - 1{,}600}{350.13/\sqrt{101}} \approx -0.59$
$P[Z \ge Z_c] = P[Z \le -Z_c] = \alpha/2 = 0.025$
From the Z-table, $Z_c = 1.96$.
Since $|Z| = 0.59 < Z_c$, do not reject Ho.
∴ The initial estimate of RM 1,600/sq.m. is not rejected.

Answer (b)
$1{,}579.44 - 1.96(34.84) =$ RM 1,511.15 (lower limit)
$1{,}579.44 + 1.96(34.84) =$ RM 1,647.73 (upper limit)

PARAMETRIC STATISTICS
NONPARAMETRIC STATISTICS

t-Distribution
• Symmetrical, bell-shaped distribution
• Mean of zero and a standard deviation slightly above one, approaching one as the degrees of freedom increase
• Shape influenced by degrees of freedom

Degrees of Freedom
• Abbreviated d.f.
• Number of observations minus the number of constraints

Confidence Interval Estimate Using the t-distribution
$\mu = \bar X \pm t_{c.l.} S_{\bar X}$
Upper limit $= \bar X + t_{c.l.} \dfrac{S}{\sqrt{n}}$
Lower limit $= \bar X - t_{c.l.} \dfrac{S}{\sqrt{n}}$

Confidence Interval Estimate Using the t-distribution
Notation ($\mu$, $\bar X$, $t_{c.l.}$, $S_{\bar X}$, $S$, $n$):
• $\mu$ = population mean
• $\bar X$ = sample mean
• $t_{c.l.}$ = critical value of t at a specified confidence level
• $S_{\bar X}$ = standard error of the mean
• $S$ = sample standard deviation
• $n$ = sample size

Confidence Interval Estimate Using the t-distribution
$\mu = \bar X \pm t_{c.l.} S_{\bar X}$
$\bar X = 3.7$, $S = 2.66$, $n = 17$
Upper limit $= 3.7 + 2.12\left(2.66/\sqrt{17}\right) = 5.07$
Lower limit $= 3.7 - 2.12\left(2.66/\sqrt{17}\right) = 2.33$

Hypothesis Test Using the t-Distribution

Univariate Hypothesis Test Utilizing the t-Distribution
Suppose that a production manager believes the average number of defective assemblies each day to be 20. The factory records the number of defective assemblies for each of the 25 days it was open in a given month. The mean $\bar X$ was calculated to be 22, and the standard deviation, $S$, to be 5.
$H_0: \mu = 20$
$H_1: \mu \ne 20$
$S_{\bar X} = S/\sqrt{n} = 5/\sqrt{25} = 1$

Univariate Hypothesis Test Utilizing the t-Distribution
The researcher desired 95 percent confidence, so the significance level is .05. The researcher must then find the upper and lower limits of the confidence interval to determine the region of rejection. Thus, the value of t is needed. For 24 degrees of freedom (n − 1 = 25 − 1), the t-value is 2.064.
Lower limit: $\mu - t_{c.l.} S_{\bar X} = 20 - 2.064\left(5/\sqrt{25}\right) = 20 - 2.064(1) = 17.936$
Upper limit: $\mu + t_{c.l.} S_{\bar X}$
$= 20 + 2.064\left(5/\sqrt{25}\right) = 20 + 2.064(1) = 22.064$

Univariate Hypothesis Test t-Test
$t_{obs} = \dfrac{\bar X - \mu}{S_{\bar X}} = \dfrac{22 - 20}{1} = \dfrac{2}{1} = 2$
Since $t_{obs} = 2$ is smaller than the critical t-value of 2.064, the null hypothesis is not rejected at the .05 level.

Testing a Hypothesis about a Distribution
• Chi-Square test
• Test for significance in the analysis of frequency distributions
• Compare observed frequencies with expected frequencies
• “Goodness of Fit”

Chi-Square Test
$\chi^2 = \sum_i \dfrac{(O_i - E_i)^2}{E_i}$

Chi-Square Test
• $\chi^2$ = chi-square statistic
• $O_i$ = observed frequency in the ith cell
• $E_i$ = expected frequency in the ith cell

Chi-Square Test: Estimation for Expected Number for Each Cell
$E_{ij} = \dfrac{R_i C_j}{n}$
• $R_i$ = total observed frequency in the ith row
• $C_j$ = total observed frequency in the jth column
• $n$ = sample size

Univariate Hypothesis Test: Chi-square Example
$\chi^2 = \dfrac{(O_1 - E_1)^2}{E_1} + \dfrac{(O_2 - E_2)^2}{E_2}$
$\chi^2 = \dfrac{(60 - 50)^2}{50} + \dfrac{(40 - 50)^2}{50} = 4$

Hypothesis Test of a Proportion
• $\pi$ is the population proportion
• $p$ is the sample proportion
• $\pi$ is estimated with $p$

Hypothesis Test of a Proportion
$H_0: \pi = .5$
$H_1: \pi \ne .5$
$S_p = \sqrt{\dfrac{(0.6)(0.4)}{100}} = \sqrt{\dfrac{.24}{100}} = \sqrt{.0024} = .04899$
$Z_{obs} = \dfrac{p - \pi}{S_p} = \dfrac{.6 - .5}{.04899} = \dfrac{.1}{.04899} = 2.04$

Hypothesis Test of a Proportion: Another Example
$n = 1{,}200$, $p = .20$
$S_p = \sqrt{\dfrac{pq}{n}} = \sqrt{\dfrac{(.2)(.8)}{1200}} = \sqrt{\dfrac{.16}{1200}} = \sqrt{.000133} = .0115$

Hypothesis Test of a Proportion: Another Example
$Z = \dfrac{p - \pi}{S_p} = \dfrac{.20 - .15}{.0115} = \dfrac{.05}{.0115} = 4.348$
The Z value exceeds 1.96, so the null hypothesis should be rejected at the .05 level. Indeed, it is significant beyond the .001 level.
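The three test statistics worked through above (t, chi-square and the Z test of a proportion) can be reproduced with a short Python sketch. It is only an illustration under stated assumptions: the function names are hypothetical, the critical t-value 2.064 is copied from the slides rather than computed, and the standard error of the proportion is taken from the sample proportion, as in the worked examples.

```python
from math import sqrt
from statistics import NormalDist


def t_statistic(sample_mean, mu0, s, n):
    """Observed t: (sample mean - hypothesized mean) / standard error."""
    return (sample_mean - mu0) / (s / sqrt(n))


def chi_square(observed, expected):
    """Sum of (O_i - E_i)^2 / E_i over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def proportion_z(p, pi0, n):
    """Z for a proportion; standard error sqrt(p(1-p)/n) uses the
    sample proportion p (assumption: same convention as the slides)."""
    se = sqrt(p * (1 - p) / n)
    return (p - pi0) / se


def two_tailed_p(z):
    """Two-tailed p-value of a standard normal statistic."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


T_CRITICAL_24_DF = 2.064  # taken from the slides: 24 d.f., .05 level, two-tailed

t_obs = t_statistic(22, 20, 5, 25)       # defective-assemblies example
chi2 = chi_square([60, 40], [50, 50])    # chi-square example
z1 = proportion_z(0.6, 0.5, 100)         # first proportion example
z2 = proportion_z(0.20, 0.15, 1200)      # second proportion example

print(f"t_obs = {t_obs:.3f}, reject Ho: {abs(t_obs) > T_CRITICAL_24_DF}")
print(f"chi-square = {chi2:.3f}")
print(f"Z (n=100) = {z1:.3f}, two-tailed p = {two_tailed_p(z1):.4f}")
print(f"Z (n=1200) = {z2:.3f}, two-tailed p = {two_tailed_p(z2):.6f}")
```

The second Z comes out near 4.33 rather than 4.348 because the code keeps the standard error unrounded, whereas the slide rounds it to .0115 before dividing. The conclusion is unchanged.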