Construction Engineering 221
Sampling and Mean Comparison

• We have talked about several sampling distributions:
  – z (normal) distribution, used to estimate the probability of a measurement occurrence
  – n (binomial) distribution, used to estimate the probability of a counting occurrence
  – r (correlation) distribution, used to estimate relatedness between two variables within a population (the extension is regression)

• Another important distribution is the t-distribution, used to estimate population means
• If you draw a sample from one population (engineers, men, heavy drinkers, truck buyers) and compare it to a sample from a different population (accountants, women, non-drinkers, van buyers) on some randomly distributed variable, how will you know whether the differences are “real” or merely a spurious fluke of random measurement error?

• You use the t-statistic for population mean comparisons.
• Estimating the population mean:
  – For large samples, the sample mean is an unbiased estimator of the population mean
  – For large samples, the sample mean is also approximately normally distributed
  – Normally distributed sample means can be used with confidence intervals and margins of error to make judgements about mean comparisons

• How big should a sample be?
  – For a 95% confidence interval (2 standard deviations), n = 1/e^2, where e is the margin of error (1%, 2%, etc.) written as a fraction
  – If you want to be 95% sure that the sample estimate is within 1% of the population value, you must sample 10,000 people (n = 1/0.01^2). If you can live with a 5% margin of error, you need only sample 400 people (n = 1/0.05^2)
  – Usually pick the confidence level and margin of error ahead of time, based on criticality and other factors

• If you are comparing 2 means, use the t-statistic
• If you are comparing two percentages, use the z-statistic
• If you are comparing 3 or more means, use the F-statistic
• If you are comparing 3 or more percentages, use the chi-square statistic

• The hypothesis that assumes the populations are alike (no difference in the means) is the null hypothesis
• You test the null hypothesis to determine the likelihood that it is true

[Figure: distribution sketch labeled “Mean sample 1” and “Mean sample 2”, annotated “Unlikely (95%) that the samples came from the same populations”]

• Assume you are testing an admixture to make concrete more “pumpable”, but don’t want to diminish early strength
• You test 25 cylinders of regular concrete (control group) and 25 cylinders of concrete with the admixture.

                               Variable 1   Variable 2
Mean                               2496.6       2438.6
Variance                         16791.08     15851.08
Observations                           25           25
Pearson Correlation              -0.10432
Hypothesized Mean Difference            0
df                                     24
t Stat                           1.527462
P(T<=t) one-tail                  0.06986
t Critical one-tail              1.710882
P(T<=t) two-tail                 0.139719
t Critical two-tail              2.063898
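
The two calculations in these slides, the n = 1/e^2 sample-size rule and the t Stat in the concrete-cylinder output, can be checked with a short Python sketch. This is only an illustration under stated assumptions: the function names `sample_size_95` and `paired_t_from_summary` are hypothetical, and the output is assumed to come from a paired two-sample t-test (inferred from the Pearson Correlation row and df = 24, which equals 25 - 1). The critical value is copied from the output rather than computed.

```python
import math


def sample_size_95(margin_pct):
    """Rule-of-thumb sample size for a 95% interval: n = 1/e^2.

    margin_pct is the margin of error in percent (1 means 1%).
    (100 / margin_pct)^2 is the same as 1 / (margin_pct / 100)^2.
    """
    return math.ceil((100.0 / margin_pct) ** 2)


def paired_t_from_summary(mean1, mean2, var1, var2, r, n):
    """Return (t statistic, degrees of freedom) for a paired comparison.

    Uses summary statistics only. ASSUMPTION: the data are paired, so the
    variance of the differences is var1 + var2 - 2 * r * sd1 * sd2.
    """
    var_diff = var1 + var2 - 2.0 * r * math.sqrt(var1 * var2)
    std_err = math.sqrt(var_diff / n)  # standard error of the mean difference
    return (mean1 - mean2) / std_err, n - 1


# Two-tail critical value copied from the slide's output (5% level, df = 24).
T_CRITICAL_TWO_TAIL = 2.063898

if __name__ == "__main__":
    print("n for 1% margin:", sample_size_95(1))
    print("n for 5% margin:", sample_size_95(5))

    # Summary values copied from the slide's output.
    t_stat, df = paired_t_from_summary(
        mean1=2496.6, mean2=2438.6,
        var1=16791.08, var2=15851.08,
        r=-0.10432, n=25,
    )
    print(f"t = {t_stat:.4f}, df = {df}")
    if abs(t_stat) < T_CRITICAL_TWO_TAIL:
        print("Fail to reject the null hypothesis at the 5% level (two-tail)")
    else:
        print("Reject the null hypothesis at the 5% level (two-tail)")
```

With the values from the output, the sketch gives t of about 1.527 with 24 degrees of freedom, which is below the two-tail critical value of 2.063898. On that reading, the 58-unit difference between the two means is not large enough to reject the null hypothesis at the 5% level, which agrees with the reported two-tail probability of 0.139719.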