Uploaded by MARTIN DWAINE LIBAS

Lecture-4-Introduction-to-Statistics

Engineering Data Analysis
Engr. Dustin Glenn Cuevas, MSCE
Announcement
• Recitation #4 is scheduled after this discussion
• Seatwork #4 is to be uploaded tomorrow, due date: Nov. 8, 11:59 p.m.
• Quiz #4 is scheduled on Monday (Nov 8)
• Midterm Exam is scheduled on Monday (Nov 8)
Descriptive Statistics
Quartiles, Deciles, Percentiles
When an ordered set of data is divided into four equal parts, the division points are called quartiles.
IQR = Q3 − Q1 → Interquartile Range
When an ordered set of data is divided into ten equal parts, the division points are called deciles.
When an ordered set of data is divided into 100 equal parts, the division points are called percentiles.
For Percentiles
The kth locator can be solved using:
$$ k = \frac{P}{100}(n + 1) $$
The following are the ages of nine employees of Ayala Corporation:
24, 28, 33, 33, 37, 39, 47, 51, 99
Compute the value of the third quartile.
Where does the age of 28 fall in relation to the ages of the employees?
Sample Problem
The following are the test scores of 12 reviewees, arranged in increasing order:
53, 58, 68, 73, 75, 76, 79, 80, 85, 88, 91, 99
Find the value of the 62nd percentile.
Find the percentile rank for the score 85.
Find the interquartile range.
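The locator formula can be checked with a short Python sketch. It assumes linear interpolation between neighbouring values when the locator k is not a whole number, and it assumes the percentile rank is found by inverting the locator, P = 100k/(n + 1). Both are common conventions, but the method intended in class may differ. The function names are illustrative only.

```python
import math

def percentile(sorted_data, p):
    """p-th percentile using the locator k = p(n + 1)/100 (1-based position)."""
    n = len(sorted_data)
    k = p * (n + 1) / 100
    k = min(max(k, 1), n)            # keep the locator inside the data
    lower = math.floor(k)
    frac = k - lower
    if lower >= n:
        return float(sorted_data[-1])
    # Assumption: interpolate linearly between positions lower and lower + 1
    return sorted_data[lower - 1] + frac * (sorted_data[lower] - sorted_data[lower - 1])

def percentile_rank(sorted_data, value):
    """Assumed convention: invert the locator, P = 100k/(n + 1)."""
    k = sorted_data.index(value) + 1  # 1-based position of the first match
    return 100 * k / (len(sorted_data) + 1)

ages = [24, 28, 33, 33, 37, 39, 47, 51, 99]
scores = [53, 58, 68, 73, 75, 76, 79, 80, 85, 88, 91, 99]

q3_ages = percentile(ages, 75)
p62_scores = percentile(scores, 62)
iqr_scores = percentile(scores, 75) - percentile(scores, 25)

print("Q3 of ages:", q3_ages)
print("Percentile rank of age 28:", percentile_rank(ages, 28))
print("62nd percentile of scores:", round(p62_scores, 2))
print("Percentile rank of score 85:", round(percentile_rank(scores, 85), 2))
print("IQR of scores:", iqr_scores)
```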
Inferential Statistics
Statistical Inference
Statistical methods are used to make decisions and draw conclusions about populations. This aspect of statistics is generally called statistical inference.
Statistical inference may be divided into two major areas: parameter estimation and hypothesis testing.
Example of Parameter Estimation:
Consider a population of “height of adult male in the Philippines”
Why do we estimate?
It is too difficult and expensive to collect data from the whole population.
Example:
Acceptance Sampling – an engineer will only check a sample of bolts to conclude that all the bolts are “okay”
Estimation Procedure:
1. Select a random and representative sample
2. Collect information from the sample
3. Calculate the sample statistic
4. Assign a value to the population parameter
Point Estimate
We know that before the data are collected, the observations are considered to be random variables, say, X1, X2, …, Xn. Therefore, any function of the observations, or any statistic, is also a random variable.
A point estimate of some population parameter is a single numerical value of a statistic. The statistic is called the point estimator.
The random variables X1, X2, …, Xn are a random sample of size n if (a) the Xi’s are independent random variables and (b) every Xi has the same probability distribution.
Central Limit Theorem
If X1, X2, …, Xn is a random sample of size n taken from a population (either finite or infinite) with mean μ and finite variance σ², and if $\bar{X}$ is the sample mean, the limiting form of the distribution of
$$ Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} $$
as n → ∞ is the standard normal distribution.
Central Limit Theorem
Consider the lognormal distribution function.
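A small simulation can illustrate the theorem for a skewed population such as the lognormal. The sketch below is only an illustration: the lognormal parameters (0 and 1 on the log scale), the sample size n = 50, and the 2000 repetitions are all assumed values, not taken from the lecture. If the theorem holds, the standardized sample means should have a mean near 0 and a standard deviation near 1.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Assumed lognormal parameters (mean and sd of the underlying normal)
MU_LOG, SIGMA_LOG = 0.0, 1.0

# Exact mean and standard deviation of the lognormal population
pop_mean = math.exp(MU_LOG + SIGMA_LOG ** 2 / 2)
pop_sd = math.sqrt((math.exp(SIGMA_LOG ** 2) - 1) * math.exp(2 * MU_LOG + SIGMA_LOG ** 2))

def standardized_means(n, reps=2000):
    """Draw `reps` samples of size n and return Z = (xbar - mu) / (sigma / sqrt(n))."""
    zs = []
    for _ in range(reps):
        sample = [random.lognormvariate(MU_LOG, SIGMA_LOG) for _ in range(n)]
        xbar = statistics.fmean(sample)
        zs.append((xbar - pop_mean) / (pop_sd / math.sqrt(n)))
    return zs

z_values = standardized_means(n=50)
z_mean = statistics.fmean(z_values)
z_sd = statistics.stdev(z_values)
print("mean of Z:", round(z_mean, 3))
print("sd of Z:  ", round(z_sd, 3))
```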
Standard Error
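Suppose that we are sampling from a normal distribution with mean μ and variance σ². The distribution of $\bar{X}$ is then normal with mean μ and variance σ²/n, so the standard error of $\bar{X}$ is
$$ \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} $$
If we did not know σ but substituted the sample standard deviation S into the preceding equation, the estimated standard error of $\bar{X}$ would be
$$ SE(\bar{X}) = \hat{\sigma}_{\bar{X}} = \frac{S}{\sqrt{n}} $$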
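As a quick numerical illustration, the sketch below computes both versions of the standard error for the ten Charpy impact-energy measurements used in a sample problem later in this lecture (where σ = 1 J is assumed known). The variable names are illustrative only.

```python
import math
import statistics

# Impact energy (J) from the Charpy V-notch sample problem in this lecture
energy = [64.1, 64.7, 64.5, 64.6, 64.5, 64.3, 64.6, 64.8, 64.2, 64.3]
n = len(energy)

sigma = 1.0                       # population standard deviation stated in the problem
s = statistics.stdev(energy)      # sample standard deviation (divides by n - 1)

se_known = sigma / math.sqrt(n)   # standard error when sigma is known
se_estimated = s / math.sqrt(n)   # estimated standard error when S replaces sigma

print("sigma / sqrt(n) =", round(se_known, 4))
print("s / sqrt(n)     =", round(se_estimated, 4))
```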
Statistical Intervals for a Single Sample
Confidence Interval
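Point estimate ± margin of error, giving an interval from a to b around the point estimate
Confidence level: (1 − α) × 100%
Example: (1 − α) × 100% = 95%, so α = 0.05 (significance level)
[Figure: distribution curve with central area 1 − α and area α/2 in each tail]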
Confidence Interval
An interval estimate for a population parameter is called a confidence interval.
Three factors that affect the margin of error:
Variability – refers to how spread out a set of data is. If the population has a large standard deviation, it is more difficult to pinpoint an estimate for the population parameter. The larger the variability, the larger the margin of error.
Sample Size
The larger your sample, the more sure you can be that it truly reflects the population. This indicates that for a given confidence level, the larger your sample size, the narrower your confidence interval.
Confidence
Confidence refers to the probability that an interval calculated from a sample will contain the population parameter.
Typical confidence levels are 95%, 90%, and 99%.
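Confidence Interval on the Mean of a Normal Distribution, Variance Known
If $\bar{x}$ is the sample mean of a random sample of size n from a normal population with known variance σ², a 100(1 − α)% confidence interval on μ is given by
$$ \bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}} $$
[Figure: standard normal curve with central area 1 − α and area α/2 in each tail]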
Sample Problem
What is $z_{\alpha/2}$ for a 95% confidence interval?
What is $z_{\alpha/2}$ for a 75% confidence interval?
What is $z_{\alpha/2}$ for an 80% confidence interval?
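These critical values can be looked up in a standard normal table or computed directly. The sketch below uses `NormalDist.inv_cdf` from Python's standard `statistics` module. The helper name `z_critical` is an illustrative choice, not something from the lecture.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Upper alpha/2 point of the standard normal for a two-sided interval."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.95, 0.75, 0.80):
    print(f"{level:.0%} confidence: z = {z_critical(level):.4f}")
```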
Sample Problem
ASTM Standard E23 defines standard test methods for notched bar impact testing of metallic materials. The Charpy V-notch (CVN) technique measures impact energy and is often used to determine whether or not a material experiences a ductile-to-brittle transition with decreasing temperature. Ten measurements of impact energy (J) on specimens of A238 steel cut at 60 °C are as follows: 64.1, 64.7, 64.5, 64.6, 64.5, 64.3, 64.6, 64.8, 64.2, and 64.3. Assume that impact energy is normally distributed with σ = 1 J. We want to find a 95% CI for μ, the mean impact energy.
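One way to carry out this computation is sketched below in Python, applying the known-variance interval with σ = 1 J as stated in the problem. The variable names are illustrative.

```python
import math
import statistics
from statistics import NormalDist

energy = [64.1, 64.7, 64.5, 64.6, 64.5, 64.3, 64.6, 64.8, 64.2, 64.3]
sigma = 1.0          # known population standard deviation (J)
confidence = 0.95

n = len(energy)
xbar = statistics.fmean(energy)
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # z_{alpha/2}
margin = z * sigma / math.sqrt(n)                     # margin of error

ci_lower, ci_upper = xbar - margin, xbar + margin
print(f"xbar = {xbar:.2f} J")
print(f"95% CI for mu: {ci_lower:.2f} J to {ci_upper:.2f} J")
```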
One-sided Confidence Bounds
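It is also possible to obtain one-sided confidence bounds for μ by setting either the lower bound l = −∞ or the upper bound u = ∞ and replacing $z_{\alpha/2}$ by $z_{\alpha}$.
A 100(1 − α)% upper-confidence bound for μ is
$$ \mu \le \bar{x} + z_{\alpha}\frac{\sigma}{\sqrt{n}} $$
A 100(1 − α)% lower-confidence bound for μ is
$$ \mu \ge \bar{x} - z_{\alpha}\frac{\sigma}{\sqrt{n}} $$
[Figure: standard normal curves with area 1 − α on one side of the bound and area α in a single tail]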
Sample Problem
The same data for impact testing from the last example are used to construct a lower, one-sided 95% confidence interval for the mean impact energy.
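A minimal Python sketch of the lower bound follows. The only change from the two-sided case is that all of α sits in one tail, so $z_{\alpha}$ replaces $z_{\alpha/2}$. Variable names are illustrative.

```python
import math
import statistics
from statistics import NormalDist

energy = [64.1, 64.7, 64.5, 64.6, 64.5, 64.3, 64.6, 64.8, 64.2, 64.3]
sigma = 1.0          # known population standard deviation (J)
alpha = 0.05

n = len(energy)
xbar = statistics.fmean(energy)
z_alpha = NormalDist().inv_cdf(1 - alpha)        # one-sided critical value
lower_bound = xbar - z_alpha * sigma / math.sqrt(n)

print(f"z_alpha = {z_alpha:.3f}")
print(f"95% lower confidence bound: mu >= {lower_bound:.2f} J")
```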
Large Sample Confidence Interval on the Mean
When n is large, the quantity
$$ \frac{\bar{X} - \mu}{S / \sqrt{n}} $$
has an approximate standard normal distribution. Consequently,
$$ \bar{x} - z_{\alpha/2}\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\frac{s}{\sqrt{n}} $$
is a large-sample confidence interval for μ, with confidence level of approximately 100(1 − α)%.
It turns out that when n is large, replacing σ by the sample standard deviation S has little effect on the distribution of Z.
Confidence Interval on the Mean of a Normal Distribution, Variance Unknown
Let X1, X2, …, Xn be a random sample from a normal distribution with unknown mean μ and unknown variance σ². The random variable
$$ T = \frac{\bar{X} - \mu}{S / \sqrt{n}} $$
has a t distribution with n − 1 degrees of freedom.
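The t probability density function with k degrees of freedom is
$$ f(x) = \frac{\Gamma[(k+1)/2]}{\sqrt{\pi k}\,\Gamma(k/2)} \cdot \frac{1}{[(x^2/k) + 1]^{(k+1)/2}}, \quad -\infty < x < \infty $$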
The t distribution was introduced by William S. Gosset.
Confidence Interval on the Mean of a Normal Distribution, Variance Unknown
If $\bar{x}$ and s are the mean and standard deviation of a random sample from a normal distribution with unknown variance σ², a 100(1 − α)% confidence interval on μ is given by
$$ \bar{x} - t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}} $$
where $t_{\alpha/2,\,n-1}$ is the upper 100α/2 percentage point of the t distribution with n − 1 degrees of freedom.
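The interval can be sketched in Python as follows. The standard library has no inverse t function, so this sketch obtains the critical value by integrating the t density numerically (Simpson's rule) and bisecting. That is an illustrative shortcut, not the usual table lookup, and the helper names are hypothetical. As an illustration it reuses the ten Charpy impact-energy measurements from the earlier sample problem, now treating σ as unknown.

```python
import math
import statistics

def t_pdf(x, k):
    """Density of the t distribution with k degrees of freedom."""
    c = math.exp(math.lgamma((k + 1) / 2) - math.lgamma(k / 2)) / math.sqrt(math.pi * k)
    return c * (1 + x * x / k) ** (-(k + 1) / 2)

def t_cdf(x, k, steps=2000):
    """P(T <= x): Simpson's rule on [0, |x|] plus symmetry about zero."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, k) + t_pdf(a, k)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, k)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_quantile(p, k):
    """Value t with P(T <= t) = p, found by bisection."""
    lo, hi = -100.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, k) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

energy = [64.1, 64.7, 64.5, 64.6, 64.5, 64.3, 64.6, 64.8, 64.2, 64.3]
n = len(energy)
xbar = statistics.fmean(energy)
s = statistics.stdev(energy)
alpha = 0.05

t_crit = t_quantile(1 - alpha / 2, n - 1)     # t_{alpha/2, n-1}
margin = t_crit * s / math.sqrt(n)
t_ci = (xbar - margin, xbar + margin)

print(f"t critical value (9 df): {t_crit:.3f}")
print(f"95% CI for mu: {t_ci[0]:.3f} to {t_ci[1]:.3f}")
```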
T-distribution
The term degrees of freedom results from the fact that the n deviations $x_1 - \bar{x}, x_2 - \bar{x}, \ldots, x_n - \bar{x}$ always sum to zero, and so specifying the values of any n − 1 of these quantities automatically determines the remaining one.
$$ s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1} $$
The number of degrees of freedom is the number of independent pieces of information in the data.
n − 1 of the sample values can take on any value. However, the nth value must be specific in order to attain the sample mean, $\bar{x}$.
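In Excel: TINV(probability, degrees of freedom), where probability is the two-tailed α = 1 − confidence level (for example, 0.05 for 95% confidence).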
The difference between z-distribution and t-distribution
Normal distributions are used when the population distribution is assumed to be normal. The t distribution is similar to the normal distribution, just with fatter tails. Both assume a normally distributed population. t distributions have higher kurtosis than normal distributions. The probability of getting values very far from the mean is larger with a t distribution than with a normal distribution.
Sample Problem
An article in the Journal of Materials Engineering [“Instrumented Tensile Adhesion Tests on Plasma Sprayed Thermal Barrier Coatings” (1989, Vol. 11(4), pp. 275–282)] describes the results of tensile adhesion tests on 22 U-700 alloy specimens. The load at specimen failure is as follows (in megapascals):
We want to find a 95% CI on μ.
Confidence Interval on the Variance and Standard Deviation of a Normal Distribution
Sometimes confidence intervals on the population variance or standard deviation are needed.
Chi-square distribution
Let X1, X2, …, Xn be a random sample from a normal distribution with mean μ and variance σ², and let S² be the sample variance. Then the random variable
$$ X^2 = \frac{(n - 1)S^2}{\sigma^2} $$
has a chi-square (χ²) distribution with n − 1 degrees of freedom.
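The probability density function of a χ² random variable with k degrees of freedom is
$$ f(x) = \frac{1}{2^{k/2}\,\Gamma(k/2)}\, x^{(k/2) - 1} e^{-x/2}, \quad x > 0 $$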
Confidence Interval on the Variance
If s² is the sample variance from a random sample of n observations from a normal distribution with unknown variance σ², then a 100(1 − α)% confidence interval on σ² is
$$ \frac{(n-1)s^2}{\chi^2_{\alpha/2,\,n-1}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{1-\alpha/2,\,n-1}} $$
One-Sided Confidence Bounds on the Variance
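The 100(1 − α)% lower and upper confidence bounds on σ² are
$$ \frac{(n-1)s^2}{\chi^2_{\alpha,\,n-1}} \le \sigma^2 \quad \text{and} \quad \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{1-\alpha,\,n-1}} $$
respectively.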
Sample Problem
An automatic filling machine is used to fill bottles with liquid detergent. A random sample of 20 bottles results in a sample variance of fill volume of s² = 0.01532 (fluid ounce). If the variance of fill volume is too large, an unacceptable proportion of bottles will be under- or overfilled. We will assume that the fill volume is approximately normally distributed. Determine the 95% upper confidence bound for the population standard deviation.
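The upper bound can be sketched in Python as below. Two assumptions are made loudly. First, the sample variance is taken as s² = 0.0153 (fluid ounce)², which is one reading of the value printed in the problem; substitute the intended value if it differs. Second, the chi-square percentage point is found by numerically integrating the density and bisecting, an illustrative stand-in for a table lookup that assumes at least 2 degrees of freedom. Helper names are hypothetical.

```python
import math

def chi2_pdf(x, k):
    """Chi-square density with k >= 2 degrees of freedom."""
    return x ** (k / 2 - 1) * math.exp(-x / 2) / (2 ** (k / 2) * math.gamma(k / 2))

def chi2_cdf(x, k, steps=2000):
    """P(X^2 <= x) by Simpson's rule on [0, x]."""
    if x <= 0:
        return 0.0
    h = x / steps
    total = chi2_pdf(0.0, k) + chi2_pdf(x, k)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * chi2_pdf(i * h, k)
    return total * h / 3

def chi2_quantile(p, k):
    """Value c with P(X^2 <= c) = p (lower-tail probability), by bisection."""
    lo, hi = 0.0, k + 20 * math.sqrt(2 * k) + 20
    for _ in range(80):
        mid = (lo + hi) / 2
        if chi2_cdf(mid, k) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

n = 20
s2 = 0.0153      # ASSUMED reading of the sample variance, (fluid ounce)^2
alpha = 0.05

# chi^2_{1-alpha, n-1} is the point with upper-tail area 1 - alpha,
# i.e. lower-tail area alpha.
chi2_point = chi2_quantile(alpha, n - 1)
var_upper = (n - 1) * s2 / chi2_point
sd_upper = math.sqrt(var_upper)

print(f"chi-square point (19 df): {chi2_point:.3f}")
print(f"95% upper bound on sigma^2: {var_upper:.4f}")
print(f"95% upper bound on sigma:   {sd_upper:.3f}")
```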
Confidence Interval for a Population Proportion
Recall from the binomial distribution:
$$ f(x) = \binom{n}{x} p^x (1 - p)^{n - x}, \quad x = 0, 1, \ldots, n $$
where
p = proportion of success
1 − p = proportion of failure
Consider x = number of observations in the class of interest and n = sample size. The point estimate of p is
$$ \hat{p} = \frac{x}{n} $$
Confidence Interval for a Population Proportion
If $\hat{p}$ is the proportion of observations in a random sample of size n that belongs to a class of interest, an approximate 100(1 − α)% confidence interval on the proportion p of the population that belongs to this class is
$$ \hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} \le p \le \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} $$
where $z_{\alpha/2}$ is the upper α/2 percentage point of the standard normal distribution.
Sample Problem
In a random sample of 85 automobile engine crankshaft bearings, 10 have a surface finish that is rougher than the specifications allow. Determine a 95% two-sided confidence interval for the proportion of automobile engine crankshaft bearings with a surface finish that is rougher than the specifications allow.
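A short Python sketch of this interval follows, using the normal approximation given above with x = 10 and n = 85 from the problem. Variable names are illustrative.

```python
import math
from statistics import NormalDist

x, n = 10, 85            # rough bearings, sample size
confidence = 0.95

p_hat = x / n
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)    # z_{alpha/2}
margin = z * math.sqrt(p_hat * (1 - p_hat) / n)

p_lower, p_upper = p_hat - margin, p_hat + margin
print(f"p_hat = {p_hat:.4f}")
print(f"95% CI for p: {p_lower:.4f} to {p_upper:.4f}")
```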
Sample Size for a Specified Error on a Binomial Proportion
$$ n = \left(\frac{z_{\alpha/2}}{E}\right)^2 p(1 - p) $$
An estimate of p is required to use this equation.
If an estimate of p is not available, we use the fact that p(1 − p) is largest at p = 0.5, where it equals 0.25:
$$ n = \left(\frac{z_{\alpha/2}}{E}\right)^2 (0.25) $$
Sample Problem
Recall the previous example: in a random sample of 85 automobile engine crankshaft bearings, 10 have a surface finish that is rougher than the specifications allow.
How large a sample is required if we want to be 95% confident that the error in using $\hat{p}$ to estimate p is less than 0.05, using the estimate of p from the previous example?
How large a sample is required if we want to be 95% confident that the error in using $\hat{p}$ to estimate p is less than 0.05 regardless of the value of p?
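Both questions can be answered with the sample-size formula, as in the Python sketch below. It assumes the two-sided critical value $z_{\alpha/2}$ and rounds up to the next whole observation. The function name `sample_size` is illustrative.

```python
import math
from statistics import NormalDist

def sample_size(error, confidence, p=0.5):
    """Smallest n with margin of error at most `error`; p = 0.5 is the worst case."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z / error) ** 2 * p * (1 - p))

# Using the estimate from the bearing example, p_hat = 10/85
n_with_estimate = sample_size(0.05, 0.95, p=10 / 85)

# Regardless of p: use p(1 - p) = 0.25
n_worst_case = sample_size(0.05, 0.95)

print("n using p_hat = 10/85:", n_with_estimate)
print("n regardless of p:    ", n_worst_case)
```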
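One-sided Confidence Bounds on a Binomial Proportion
The approximate 100(1 − α)% lower and upper confidence bounds are
$$ \hat{p} - z_{\alpha}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} \le p \quad \text{and} \quad p \le \hat{p} + z_{\alpha}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} $$
respectively.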
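As an illustration, the sketch below evaluates both one-sided 95% bounds for the crankshaft bearing data used earlier in this lecture (10 rough bearings out of 85). Each bound is a separate 95% statement, and the two together do not form a 95% two-sided interval. Variable names are illustrative.

```python
import math
from statistics import NormalDist

x, n = 10, 85
alpha = 0.05

p_hat = x / n
z_alpha = NormalDist().inv_cdf(1 - alpha)          # one-sided critical value
se = math.sqrt(p_hat * (1 - p_hat) / n)

p_lower_bound = p_hat - z_alpha * se   # 95% lower bound: p >= this value
p_upper_bound = p_hat + z_alpha * se   # 95% upper bound: p <= this value

print(f"95% lower confidence bound: p >= {p_lower_bound:.4f}")
print(f"95% upper confidence bound: p <= {p_upper_bound:.4f}")
```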
Prediction Interval for Future Observation
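In some problem situations, we may be interested in predicting a future observation of a variable.
Suppose that X1, X2, …, Xn is a random sample from a normal population. We wish to predict the value $X_{n+1}$, a single future observation. A 100(1 − α)% prediction interval on this future observation is
$$ \bar{x} - t_{\alpha/2,\,n-1}\, s\sqrt{1 + \frac{1}{n}} \le X_{n+1} \le \bar{x} + t_{\alpha/2,\,n-1}\, s\sqrt{1 + \frac{1}{n}} $$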
Sample Problem
An article in the Journal of Materials Engineering [“Instrumented Tensile Adhesion Tests on Plasma Sprayed Thermal Barrier Coatings” (1989, Vol. 11(4), pp. 275–282)] describes the results of tensile adhesion tests on 22 U-700 alloy specimens. The load at specimen failure is as follows (in megapascals):
We plan to test a 23rd specimen. Determine a 95% prediction interval on the load at failure for this specimen.
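The prediction-interval formula can be sketched in Python as below. Because the 22 load values are not listed in these notes, the sketch instead applies the formula to the ten Charpy impact-energy measurements from the earlier sample problem, purely as an illustration. The t critical value is obtained by numerical integration and bisection, an assumed stand-in for a table lookup, and the helper names are hypothetical.

```python
import math
import statistics

def t_pdf(x, k):
    """Density of the t distribution with k degrees of freedom."""
    c = math.exp(math.lgamma((k + 1) / 2) - math.lgamma(k / 2)) / math.sqrt(math.pi * k)
    return c * (1 + x * x / k) ** (-(k + 1) / 2)

def t_cdf(x, k, steps=2000):
    """P(T <= x): Simpson's rule on [0, |x|] plus symmetry about zero."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, k) + t_pdf(a, k)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, k)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_quantile(p, k):
    """Value t with P(T <= t) = p, found by bisection."""
    lo, hi = -100.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, k) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Illustration data: impact energy (J) from the Charpy sample problem
data = [64.1, 64.7, 64.5, 64.6, 64.5, 64.3, 64.6, 64.8, 64.2, 64.3]
n = len(data)
xbar = statistics.fmean(data)
s = statistics.stdev(data)
alpha = 0.05

t_value = t_quantile(1 - alpha / 2, n - 1)
pi_margin = t_value * s * math.sqrt(1 + 1 / n)     # note the extra "1 +" term
pi_lower, pi_upper = xbar - pi_margin, xbar + pi_margin

print(f"95% prediction interval for the next observation: "
      f"{pi_lower:.3f} to {pi_upper:.3f}")
```

The prediction interval is wider than the confidence interval on μ from the same data because it accounts for the variability of a single new observation as well as the uncertainty in $\bar{x}$.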