Lecture-4-Introduction-to-Statistics

Engineering Data Analysis
Engr. Dustin Glenn Cuevas, MSCE

Announcement
• Recitation #4 is scheduled after this discussion
• Seatwork #4 is to be uploaded tomorrow, due date: Nov. 8, 11:59 p.m.
• Quiz #4 is scheduled on Monday (Nov 8)
• Midterm Exam is scheduled on Monday (Nov 8)

Descriptive Statistics

Quartiles, Deciles, Percentiles
When an ordered set of data is divided into four equal parts, the division points are called quartiles. The interquartile range is IQR = $Q_3 - Q_1$.
When an ordered set of data is divided into ten equal parts, the division points are called deciles.
When an ordered set of data is divided into 100 equal parts, the division points are called percentiles.

For percentiles, the kth locator (the position of the $P$th percentile in the ordered data of size $n$) can be solved using:
$$k\text{th} = \frac{P(n + 1)}{100}$$
For example, with n = 9 observations the third quartile (P = 75) is located at 75(9 + 1)/100 = 7.5, that is, between the 7th and 8th ordered values.

The following are the ages of nine employees of Ayala Corporation: 24, 28, 33, 33, 37, 39, 47, 51, 99
• Compute the value of the third quartile.
• Where does the age of 28 fall in relation to the ages of the employees?

Sample Problem
The following are the test scores of 12 reviewees, arranged in increasing order: 53, 58, 68, 73, 75, 76, 79, 80, 85, 88, 91, 99
• Find the value of the 62nd percentile.
• Find the percentile rank for the score 85.
• Find the interquartile range.

Inferential Statistics

Statistical Inference
Statistical methods are used to make decisions and draw conclusions about populations. This aspect of statistics is generally called statistical inference. Statistical inference may be divided into two major areas: parameter estimation and hypothesis testing.
Example of parameter estimation: consider a population of "height of adult males in the Philippines".

Why do we estimate?
It is too difficult and expensive to collect data from the whole population.
Example: acceptance sampling. An engineer checks only a sample of bolts to conclude that all the bolts are "okay".

Estimation procedure:
1. Select a random and representative sample
2. Collect information from the sample
3. Calculate the sample statistic
4. Assign a value to the population parameter

Point Estimate
Before the data are collected, the observations are considered to be random variables, say, $X_1, X_2, \ldots, X_n$. Therefore, any function of the observations, or any statistic, is also a random variable.
A point estimate of some population parameter is a single numerical value of a statistic. The statistic is called the point estimator.
The random variables $X_1, X_2, \ldots, X_n$ are a random sample of size $n$ if (a) the $X_i$'s are independent random variables and (b) every $X_i$ has the same probability distribution.

Central Limit Theorem
If $X_1, X_2, \ldots, X_n$ is a random sample of size $n$ taken from a population (either finite or infinite) with mean $\mu$ and finite variance $\sigma^2$, and if $\bar{X}$ is the sample mean, the limiting form of the distribution of
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$
as $n \to \infty$, is the standard normal distribution.
Illustration: consider the lognormal distribution function.

Standard Error
Suppose that we are sampling from a normal distribution with mean $\mu$ and variance $\sigma^2$. The distribution of $\bar{X}$ is then normal with mean $\mu$ and variance $\sigma^2/n$, so the standard error of $\bar{X}$ is
$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$
If we did not know $\sigma$ but substituted the sample standard deviation $S$ into the preceding equation, the estimated standard error of $\bar{X}$ would be
$$SE(\bar{X}) = \hat{\sigma}_{\bar{X}} = \frac{S}{\sqrt{n}}$$

Statistical Intervals for a Single Sample

Confidence Interval
A confidence interval runs from a lower limit a to an upper limit b around the point estimate:
point estimate ± margin of error
The confidence level is $(1 - \alpha) \times 100\%$. For example, $(1 - \alpha) \times 100\% = 95\%$ gives $\alpha = 0.05$, the significance level. The interval covers a central area of $1 - \alpha$, leaving $\alpha/2$ in each tail.

An interval estimate for a population parameter is called a confidence interval.
Three factors affect the margin of error:
Variability: how spread out a set of data is. If the population has a large standard deviation, it is more difficult to pinpoint an estimate for the population parameter. The larger the variability, the larger the margin of error.
Sample Size: the larger your sample, the more sure you can be that it truly reflects the population.
This indicates that, for a given confidence level, the larger your sample size, the narrower your confidence interval.
Confidence: the probability that an interval calculated from a sample will contain the population parameter. Typical confidence levels are 95%, 90%, and 99%.

Confidence Interval on the Mean of a Normal Distribution, Variance Known
If $\bar{x}$ is the sample mean of a random sample of size $n$ from a normal population with known variance $\sigma^2$, a $100(1 - \alpha)\%$ confidence interval on $\mu$ is given by
$$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
where $z_{\alpha/2}$ is the upper $\alpha/2$ percentage point of the standard normal distribution, so that an area of $\alpha/2$ lies in each tail and $1 - \alpha$ lies in between.

Sample Problem
• What is the $z_{\alpha/2}$ for a 95% confidence interval?
• What is the $z_{\alpha/2}$ for a 75% confidence interval?
• What is the $z_{\alpha/2}$ for an 80% confidence interval?

Sample Problem
ASTM Standard E23 defines standard test methods for notched bar impact testing of metallic materials. The Charpy V-notch (CVN) technique measures impact energy and is often used to determine whether or not a material experiences a ductile-to-brittle transition with decreasing temperature. Ten measurements of impact energy (J) on specimens of A238 steel cut at 60°C are as follows: 64.1, 64.7, 64.5, 64.6, 64.5, 64.3, 64.6, 64.8, 64.2, and 64.3. Assume that impact energy is normally distributed with $\sigma = 1$ J. We want to find a 95% CI for $\mu$, the mean impact energy.

One-sided Confidence Bounds
It is also possible to obtain one-sided confidence bounds for $\mu$ by setting either the lower bound $l = -\infty$ or the upper bound $u = \infty$ and replacing $z_{\alpha/2}$ by $z_\alpha$.
A $100(1 - \alpha)\%$ upper-confidence bound for $\mu$ is
$$\mu \le \bar{x} + z_\alpha\frac{\sigma}{\sqrt{n}}$$
A $100(1 - \alpha)\%$ lower-confidence bound for $\mu$ is
$$\mu \ge \bar{x} - z_\alpha\frac{\sigma}{\sqrt{n}}$$
In each case the whole area $\alpha$ lies in a single tail.

Sample Problem
The same impact testing data from the last example are used to construct a lower, one-sided 95% confidence interval for the mean impact energy.

Large Sample Confidence Interval on the Mean
When $n$ is large, the quantity
$$\frac{\bar{X} - \mu}{S/\sqrt{n}}$$
has an approximate standard normal distribution.
Consequently,
$$\bar{x} - z_{\alpha/2}\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\frac{s}{\sqrt{n}}$$
is a large-sample confidence interval for $\mu$, with confidence level of approximately $100(1 - \alpha)\%$. It turns out that when $n$ is large, replacing $\sigma$ by the sample standard deviation $S$ has little effect on the distribution of $Z$.

Confidence Interval on the Mean of a Normal Distribution, Variance Unknown
Let $X_1, X_2, \ldots, X_n$ be a random sample from a normal distribution with unknown mean $\mu$ and unknown variance $\sigma^2$. The random variable
$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$
has a t distribution with $n - 1$ degrees of freedom. The t probability density function with $k$ degrees of freedom is
$$f(x) = \frac{\Gamma[(k+1)/2]}{\sqrt{\pi k}\,\Gamma(k/2)} \cdot \frac{1}{[(x^2/k) + 1]^{(k+1)/2}}, \quad -\infty < x < \infty$$
The t distribution was introduced by William S. Gosset.

If $\bar{x}$ and $s$ are the mean and standard deviation of a random sample from a normal distribution with unknown variance $\sigma^2$, a $100(1 - \alpha)\%$ confidence interval on $\mu$ is given by
$$\bar{x} - t_{\alpha/2,n-1}\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + t_{\alpha/2,n-1}\frac{s}{\sqrt{n}}$$
where $t_{\alpha/2,n-1}$ is the upper $100\alpha/2$ percentage point of the t distribution with $n - 1$ degrees of freedom.

T-distribution
The term degrees of freedom results from the fact that the $n$ deviations $x_1 - \bar{x}, x_2 - \bar{x}, \ldots, x_n - \bar{x}$ always sum to zero, and so specifying the values of any $n - 1$ of these quantities automatically determines the remaining one.
$$s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}$$
The number of degrees of freedom is the number of independent pieces of information in the data. $n - 1$ of the sample values can take on any value. However, the $n$th value must then be one specific value in order to attain the sample mean $\bar{x}$.
In Excel: TINV(α, degrees of freedom), where α is the two-tailed probability (for example, 0.05 for a 95% interval), not the confidence level.

The difference between z-distribution and t-distribution
The normal (z) distribution is used when the population is assumed to be normal and $\sigma$ is known. The t distribution is similar to the normal distribution, just with fatter tails, and is used when $\sigma$ is estimated by $s$. Both assume a normally distributed population. t distributions have higher kurtosis than normal distributions: the probability of getting values very far from the mean is larger with a t distribution than with a normal distribution.
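The z-based sample problems above (the critical values for 95%, 75%, and 80% confidence, and the two-sided and lower one-sided intervals for the Charpy impact data with σ = 1 J) can be checked with a short sketch. It uses only the Python standard library; the function names `z_critical`, `z_interval`, and `z_lower_bound` are illustrative and not part of the lecture.

```python
from math import sqrt
from statistics import NormalDist, mean


def z_critical(confidence, two_sided=True):
    """Return z_{alpha/2} (two-sided) or z_alpha (one-sided) for a confidence level in (0, 1)."""
    alpha = 1.0 - confidence
    tail = alpha / 2 if two_sided else alpha
    # inv_cdf gives the point with the requested area to its left.
    return NormalDist().inv_cdf(1.0 - tail)


def z_interval(data, sigma, confidence=0.95):
    """Two-sided confidence interval on the mean when sigma is known."""
    xbar = mean(data)
    margin = z_critical(confidence) * sigma / sqrt(len(data))
    return xbar - margin, xbar + margin


def z_lower_bound(data, sigma, confidence=0.95):
    """One-sided lower confidence bound on the mean when sigma is known."""
    xbar = mean(data)
    return xbar - z_critical(confidence, two_sided=False) * sigma / sqrt(len(data))


# Charpy V-notch impact energies (J) from the sample problem.
cvn = [64.1, 64.7, 64.5, 64.6, 64.5, 64.3, 64.6, 64.8, 64.2, 64.3]

for level in (0.95, 0.75, 0.80):
    print(f"z for a {level:.0%} two-sided interval: {z_critical(level):.3f}")

lo, hi = z_interval(cvn, sigma=1.0)
lower = z_lower_bound(cvn, sigma=1.0)
print(f"95% CI on mu: {lo:.2f} J to {hi:.2f} J")
print(f"95% lower bound on mu: {lower:.2f} J")
```

The one-sided bound sits closer to the sample mean than the lower limit of the two-sided interval, because all of α is placed in a single tail and z_α is smaller than z_{α/2}.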
Sample Problem
An article in the Journal of Materials Engineering ["Instrumented Tensile Adhesion Tests on Plasma Sprayed Thermal Barrier Coatings" (1989, Vol. 11(4), pp. 275–282)] describes the results of tensile adhesion tests on 22 U-700 alloy specimens. The load at specimen failure is as follows (in megapascals):
We want to find a 95% CI on $\mu$.

Confidence Interval on the Variance and Standard Deviation of a Normal Distribution
Sometimes confidence intervals on the population variance or standard deviation are needed.

Chi-square distribution
Let $X_1, X_2, \ldots, X_n$ be a random sample from a normal distribution with mean $\mu$ and variance $\sigma^2$, and let $S^2$ be the sample variance. Then the random variable
$$X^2 = \frac{(n - 1)S^2}{\sigma^2}$$
has a chi-square ($\chi^2$) distribution with $n - 1$ degrees of freedom. The probability density function of a $\chi^2$ random variable with $k$ degrees of freedom is
$$f(x) = \frac{1}{2^{k/2}\,\Gamma(k/2)}\, x^{(k/2) - 1} e^{-x/2}, \quad x > 0$$

Confidence Interval on the Variance
If $s^2$ is the sample variance from a random sample of $n$ observations from a normal distribution with unknown variance $\sigma^2$, then a $100(1 - \alpha)\%$ confidence interval on $\sigma^2$ is
$$\frac{(n - 1)s^2}{\chi^2_{\alpha/2,n-1}} \le \sigma^2 \le \frac{(n - 1)s^2}{\chi^2_{1-\alpha/2,n-1}}$$
where $\chi^2_{\alpha/2,n-1}$ and $\chi^2_{1-\alpha/2,n-1}$ are the upper and lower $100\alpha/2$ percentage points of the chi-square distribution with $n - 1$ degrees of freedom.

One-Sided Confidence Bounds on the Variance
The $100(1 - \alpha)\%$ lower and upper confidence bounds on $\sigma^2$ are
$$\frac{(n - 1)s^2}{\chi^2_{\alpha,n-1}} \le \sigma^2 \quad \text{and} \quad \sigma^2 \le \frac{(n - 1)s^2}{\chi^2_{1-\alpha,n-1}}$$
respectively.

Sample Problem
An automatic filling machine is used to fill bottles with liquid detergent. A random sample of 20 bottles results in a sample variance of fill volume of $s^2 = 0.0153$ (fluid ounce)². If the variance of fill volume is too large, an unacceptable proportion of bottles will be under- or overfilled. We will assume that the fill volume is approximately normally distributed. Determine the 95% upper confidence bound for the population standard deviation.
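The upper bound asked for in the filling-machine problem follows directly from the one-sided formula above. The sketch below is a minimal illustration under two assumptions: the sample variance is read as s² = 0.0153 (fluid ounce)², and the critical value χ²(0.95, 19) = 10.117 is taken from a standard chi-square table rather than computed, since the Python standard library has no chi-square quantile function. The name `variance_upper_bound` is illustrative.

```python
from math import sqrt


def variance_upper_bound(s2, n, chi2_lower_crit):
    """Upper confidence bound on sigma^2: (n - 1) * s^2 / chi^2_{1-alpha, n-1}."""
    return (n - 1) * s2 / chi2_lower_crit


# Assumption: table value of chi^2_{0.95, 19}, the point with 95% of the area to its right.
CHI2_095_19 = 10.117

var_bound = variance_upper_bound(s2=0.0153, n=20, chi2_lower_crit=CHI2_095_19)
sd_bound = sqrt(var_bound)  # a bound on sigma is the square root of the bound on sigma^2

print(f"95% upper bound on the variance: {var_bound:.4f} (fluid ounce)^2")
print(f"95% upper bound on the standard deviation: {sd_bound:.2f} fluid ounce")
```

Taking the square root is valid here because the square root is an increasing function, so the inequality on σ² carries over to σ.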
Confidence Interval for a Population Proportion
Recall from the binomial distribution:
$$f(x) = \binom{n}{x} p^x (1 - p)^{n - x}, \quad x = 0, 1, \ldots, n$$
where $p$ = proportion of success and $1 - p$ = proportion of failure.
Consider $x$ = number of observations in the class of interest and $n$ = sample size. The point estimate of the population proportion is $\hat{p} = x/n$.

If $\hat{p}$ is the proportion of observations in a random sample of size $n$ that belongs to a class of interest, an approximate $100(1 - \alpha)\%$ confidence interval on the proportion $p$ of the population that belongs to this class is
$$\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} \le p \le \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$$
where $z_{\alpha/2}$ is the upper $\alpha/2$ percentage point of the standard normal distribution.

Sample Problem
In a random sample of 85 automobile engine crankshaft bearings, 10 have a surface finish that is rougher than the specifications allow. Determine a 95% two-sided confidence interval for the proportion of automobile engine crankshaft bearings with a surface finish that is rougher than the specifications allow.

Sample Size for a Specified Error on a Binomial Proportion
$$n = \left(\frac{z_{\alpha/2}}{E}\right)^2 p(1 - p)$$
An estimate of $p$ is required to use this equation. If an estimate of $p$ is not available, we use the fact that $p(1 - p)$ reaches its maximum, 0.25, at $p = 0.5$:
$$n = \left(\frac{z_{\alpha/2}}{E}\right)^2 (0.25)$$

Sample Problem
Using the crankshaft bearing data from the previous example (10 of 85 bearings rougher than the specifications allow):
• How large a sample is required if we want to be 95% confident that the error in using $\hat{p}$ to estimate $p$ is less than 0.05?
• How large a sample is required if we want to be 95% confident that the error in using $\hat{p}$ to estimate $p$ is less than 0.05 regardless of the value of $p$?
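The crankshaft bearing problems above (10 rough bearings out of 85, a 95% two-sided interval, and the sample sizes for an error of 0.05) can be worked through with a short standard-library sketch. It assumes the two-sided critical value z_{α/2} in the sample-size formula and rounds n up to the next whole unit; the function names are illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist


def z_two_sided(confidence):
    """z_{alpha/2} for a two-sided interval at the given confidence level."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def proportion_interval(x, n, confidence=0.95):
    """Approximate (normal-based) two-sided confidence interval on a proportion."""
    p_hat = x / n
    margin = z_two_sided(confidence) * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def sample_size(error, confidence=0.95, p=0.5):
    """Smallest n with margin of error <= error; p = 0.5 is the conservative default."""
    return ceil((z_two_sided(confidence) / error) ** 2 * p * (1 - p))


p_lo, p_hi = proportion_interval(x=10, n=85)
n_from_estimate = sample_size(error=0.05, p=10 / 85)
n_conservative = sample_size(error=0.05)

print(f"95% CI on p: {p_lo:.3f} to {p_hi:.3f}")
print(f"n using p-hat = 10/85: {n_from_estimate}")
print(f"n regardless of p: {n_conservative}")
```

A hand calculation that first rounds p̂ to 0.12 and z to 1.96 gives a slightly larger sample size than the unrounded computation, so small differences from a textbook answer are expected.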
One-sided Confidence Bounds on a Binomial Proportion
The approximate $100(1 - \alpha)\%$ lower and upper confidence bounds are
$$\hat{p} - z_\alpha\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} \le p \quad \text{and} \quad p \le \hat{p} + z_\alpha\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$$
respectively.

Prediction Interval for Future Observation
In some problem situations, we may be interested in predicting a future observation of a variable. Suppose that $X_1, X_2, \ldots, X_n$ is a random sample from a normal population. We wish to predict the value $X_{n+1}$, a single future observation. A $100(1 - \alpha)\%$ prediction interval on $X_{n+1}$ is
$$\bar{x} - t_{\alpha/2,n-1}\, s\sqrt{1 + \frac{1}{n}} \le X_{n+1} \le \bar{x} + t_{\alpha/2,n-1}\, s\sqrt{1 + \frac{1}{n}}$$

Sample Problem
An article in the Journal of Materials Engineering ["Instrumented Tensile Adhesion Tests on Plasma Sprayed Thermal Barrier Coatings" (1989, Vol. 11(4), pp. 275–282)] describes the results of tensile adhesion tests on 22 U-700 alloy specimens. The load at specimen failure is as follows (in megapascals):
We plan to test a 23rd specimen. Determine a 95% prediction interval on the load at failure for this specimen.
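Because the specimen data are not reproduced here, the sketch below does not solve the sample problem. It only illustrates the prediction-interval formula next to the t-based confidence interval on the mean, using summary statistics that are purely hypothetical (x̄ = 10.0, s = 2.0, n = 25) and the table value t(0.025, 24) = 2.064 passed in as a parameter. The function names are illustrative.

```python
from math import sqrt


def prediction_interval(xbar, s, n, t_crit):
    """Prediction interval on one future observation: xbar +/- t * s * sqrt(1 + 1/n)."""
    margin = t_crit * s * sqrt(1 + 1 / n)
    return xbar - margin, xbar + margin


def t_confidence_interval(xbar, s, n, t_crit):
    """Confidence interval on the mean: xbar +/- t * s / sqrt(n)."""
    margin = t_crit * s / sqrt(n)
    return xbar - margin, xbar + margin


# Hypothetical summary statistics, NOT the U-700 alloy data from the problem.
XBAR, S, N = 10.0, 2.0, 25
T_025_24 = 2.064  # assumption: table value of t_{0.025, 24}

pi_lo, pi_hi = prediction_interval(XBAR, S, N, T_025_24)
ci_lo, ci_hi = t_confidence_interval(XBAR, S, N, T_025_24)

print(f"95% prediction interval: {pi_lo:.2f} to {pi_hi:.2f}")
print(f"95% confidence interval on the mean: {ci_lo:.2f} to {ci_hi:.2f}")
```

The prediction interval is always wider than the confidence interval on the mean, because it accounts for the variability of the single future observation as well as the uncertainty in the estimate of μ.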
