Applied Statistics I
Liang Zhang
Department of Mathematics, University of Utah
July 1, 2008

Weibull Distribution

Definition
A random variable $X$ is said to have a Weibull distribution with parameters $\alpha$ and $\beta$ ($\alpha > 0$, $\beta > 0$) if the pdf of $X$ is

$$f(x; \alpha, \beta) = \begin{cases} \dfrac{\alpha}{\beta^{\alpha}}\, x^{\alpha-1} e^{-(x/\beta)^{\alpha}} & x \ge 0 \\ 0 & x < 0 \end{cases}$$

Remark:
1. The family of Weibull distributions was introduced by the Swedish physicist Waloddi Weibull in 1939.
2. We use $X \sim \mathrm{WEB}(\alpha, \beta)$ to denote that the rv $X$ has a Weibull distribution with parameters $\alpha$ and $\beta$.
3. When $\alpha = 1$, the pdf becomes

$$f(x; \beta) = \begin{cases} \dfrac{1}{\beta}\, e^{-x/\beta} & x \ge 0 \\ 0 & x < 0 \end{cases}$$

which is the pdf of an exponential distribution with parameter $\lambda = 1/\beta$. Thus the exponential distribution is a special case of both the gamma and Weibull distributions.
4. There are gamma distributions that are not Weibull distributions and vice versa, so neither family is a subset of the other.

Proposition
Let $X$ be a random variable such that $X \sim \mathrm{WEB}(\alpha, \beta)$. Then

$$E(X) = \beta\,\Gamma\!\left(1 + \frac{1}{\alpha}\right) \quad\text{and}\quad V(X) = \beta^{2}\left\{\Gamma\!\left(1 + \frac{2}{\alpha}\right) - \left[\Gamma\!\left(1 + \frac{1}{\alpha}\right)\right]^{2}\right\}$$

The cdf of $X$ is

$$F(x; \alpha, \beta) = \begin{cases} 1 - e^{-(x/\beta)^{\alpha}} & x \ge 0 \\ 0 & x < 0 \end{cases}$$
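The mean, variance and cdf formulas for the Weibull distribution can be evaluated directly. The following is a minimal sketch using only Python's standard library; the function names and the parameter values $\alpha = 2$, $\beta = 1.5$ are illustrative assumptions, and the percentile formula is not stated on the slides but follows from solving $p = F(x)$ for $x$.

```python
import math

def weibull_cdf(x, alpha, beta):
    """F(x; alpha, beta) = 1 - exp(-(x/beta)^alpha) for x >= 0, else 0."""
    if x < 0:
        return 0.0
    return 1.0 - math.exp(-((x / beta) ** alpha))

def weibull_mean(alpha, beta):
    """E(X) = beta * Gamma(1 + 1/alpha)."""
    return beta * math.gamma(1.0 + 1.0 / alpha)

def weibull_var(alpha, beta):
    """V(X) = beta^2 * (Gamma(1 + 2/alpha) - Gamma(1 + 1/alpha)^2)."""
    g1 = math.gamma(1.0 + 1.0 / alpha)
    g2 = math.gamma(1.0 + 2.0 / alpha)
    return beta ** 2 * (g2 - g1 ** 2)

def weibull_percentile(p, alpha, beta):
    """Invert the cdf: x = beta * (-ln(1 - p))^(1/alpha), for 0 < p < 1."""
    return beta * (-math.log(1.0 - p)) ** (1.0 / alpha)

# Illustrative parameter values (an assumption of this sketch).
alpha, beta = 2.0, 1.5
print("E(X)      =", weibull_mean(alpha, beta))
print("V(X)      =", weibull_var(alpha, beta))
print("P(X>1.5)  =", 1.0 - weibull_cdf(1.5, alpha, beta))  # equals exp(-1)
print("95th pct  =", weibull_percentile(0.95, alpha, beta))
```

Setting `alpha = 1` reduces these formulas to the exponential case of Remark 3: the mean is $\beta$ and the variance is $\beta^2$.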
Example: The shear strength (in pounds) of a spot weld is a Weibull distributed random variable, $X \sim \mathrm{WEB}(400, 2/3)$.
a. Find $P(X > 410)$.
b. Find $P(X > 410 \mid X > 390)$.
c. Find $E(X)$ and $V(X)$.
d. Find the 95th percentile.

In practical situations, $X$ often has a positive minimum value $\gamma = \min(X) > 0$, and the excess $X - \gamma$ has a Weibull distribution.

Example (Problem 74): Let $X$ = the time (in $10^{-1}$ weeks) from shipment of a defective product until the customer returns the product. Suppose that the minimum return time is $\gamma = 3.5$ and that the excess $X - 3.5$ over the minimum has a Weibull distribution with parameters $\alpha = 2$ and $\beta = 1.5$.
a. What is the cdf of $X$?
b. What are the expected return time and variance of return time?
c. Compute $P(X > 5)$.
d. Compute $P(5 \le X \le 8)$.

Lognormal Distribution

Definition
A nonnegative rv $X$ is said to have a lognormal distribution if the rv $Y = \ln(X)$ has a normal distribution.
The resulting pdf of a lognormal rv when $\ln(X)$ is normally distributed with parameters $\mu$ and $\sigma$ is

$$f(x; \mu, \sigma) = \begin{cases} \dfrac{1}{\sqrt{2\pi}\,\sigma x}\, e^{-[\ln(x)-\mu]^{2}/(2\sigma^{2})} & x > 0 \\ 0 & x \le 0 \end{cases}$$

Remark:
1. We use $X \sim \mathrm{LOGN}(\mu, \sigma^2)$ to denote that the rv $X$ has a lognormal distribution with parameters $\mu$ and $\sigma$.
2. Notice that the parameter $\mu$ is not the mean of $X$ and $\sigma^2$ is not its variance, i.e. $\mu \ne E(X)$ and $\sigma^2 \ne V(X)$; they are the mean and variance of $\ln(X)$.

Proposition
If $X \sim \mathrm{LOGN}(\mu, \sigma^2)$, then

$$E(X) = e^{\mu + \sigma^{2}/2} \quad\text{and}\quad V(X) = e^{2\mu + \sigma^{2}} \cdot \left(e^{\sigma^{2}} - 1\right)$$

The cdf of $X$ is, for $x > 0$,

$$F(x; \mu, \sigma) = P(X \le x) = P[\ln(X) \le \ln(x)] = P\!\left(Z \le \frac{\ln(x) - \mu}{\sigma}\right) = \Phi\!\left(\frac{\ln(x) - \mu}{\sigma}\right)$$

where $\Phi(z)$ is the cdf of the standard normal rv $Z$.
Example (Problem 115): Let $I_i$ be the input current to a transistor and $I_0$ be the output current. Then the current gain is proportional to $\ln(I_0/I_i)$. Suppose the constant of proportionality is 1 (which amounts to choosing a particular unit of measurement), so that current gain $= X = \ln(I_0/I_i)$. Assume $X$ is normally distributed with $\mu = 1$ and $\sigma = 0.05$.
a. What is the probability that the output current is more than twice the input current?
b. What are the expected value and variance of the ratio of output to input current?
c. What value $r$ is such that there is only a 5% chance that the ratio of output to input current exceeds $r$?
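Since $X = \ln(I_0/I_i)$ is normal, the ratio $I_0/I_i = e^{X}$ is lognormal with $\mu = 1$ and $\sigma = 0.05$, so the lognormal proposition applies to each part of Problem 115. The sketch below is one way to set up the computation with Python's standard library; the function and variable names are assumptions of this sketch, not notation from the slides.

```python
import math
from statistics import NormalDist

Z = NormalDist()          # standard normal, provides Phi and its inverse
MU, SIGMA = 1.0, 0.05     # parameters of ln(I0/Ii) from Problem 115

def lognormal_cdf(x, mu, sigma):
    """F(x) = Phi((ln x - mu) / sigma) for x > 0, else 0."""
    if x <= 0:
        return 0.0
    return Z.cdf((math.log(x) - mu) / sigma)

def lognormal_mean(mu, sigma):
    """E(X) = exp(mu + sigma^2 / 2)."""
    return math.exp(mu + sigma ** 2 / 2.0)

def lognormal_var(mu, sigma):
    """V(X) = exp(2 mu + sigma^2) * (exp(sigma^2) - 1)."""
    return math.exp(2.0 * mu + sigma ** 2) * (math.exp(sigma ** 2) - 1.0)

def lognormal_percentile(p, mu, sigma):
    """Invert the cdf: x = exp(mu + sigma * z_p), z_p the N(0,1) percentile."""
    return math.exp(mu + sigma * Z.inv_cdf(p))

p_a = 1.0 - lognormal_cdf(2.0, MU, SIGMA)       # part a: P(ratio > 2)
mean_b = lognormal_mean(MU, SIGMA)              # part b: expected ratio
var_b = lognormal_var(MU, SIGMA)                # part b: variance of ratio
r_c = lognormal_percentile(0.95, MU, SIGMA)     # part c: P(ratio > r) = 0.05

print(p_a, mean_b, var_b, r_c)
```

Part a comes out extremely close to 1 because $\ln 2 \approx 0.693$ lies more than six standard deviations below $\mu = 1$.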
Beta Distribution

Definition
A random variable $X$ is said to have a beta distribution with parameters $\alpha$, $\beta$ (both positive), $A$, and $B$ if the pdf of $X$ is

$$f(x; \alpha, \beta, A, B) = \begin{cases} \dfrac{1}{B-A} \cdot \dfrac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\cdot\Gamma(\beta)} \left(\dfrac{x-A}{B-A}\right)^{\alpha-1} \left(\dfrac{B-x}{B-A}\right)^{\beta-1} & A \le x \le B \\ 0 & \text{otherwise} \end{cases}$$

The case $A = 0$, $B = 1$ gives the standard beta distribution.

Remark: We use $X \sim \mathrm{BETA}(\alpha, \beta, A, B)$ to denote that the rv $X$ has a beta distribution with parameters $\alpha$, $\beta$, $A$, and $B$.

Proposition
If $X \sim \mathrm{BETA}(\alpha, \beta, A, B)$, then

$$E(X) = A + (B - A) \cdot \frac{\alpha}{\alpha+\beta} \quad\text{and}\quad V(X) = \frac{(B-A)^{2}\,\alpha\beta}{(\alpha+\beta)^{2}(\alpha+\beta+1)}$$
Example (Problem 127): An individual's credit score is a number calculated based on that person's credit history which helps a lender determine how much he/she should be loaned or what credit limit should be established for a credit card. An article in the Los Angeles Times gave data which suggested that a beta distribution with parameters $A = 150$, $B = 850$, $\alpha = 8$, $\beta = 2$ would provide a reasonable approximation to the distribution of American credit scores. [Note: credit scores are integer-valued.]
a. Let $X$ represent a randomly selected American credit score. What are the mean value and standard deviation of this random variable? What is the probability that $X$ is within 1 standard deviation of its mean value?
b. What is the approximate probability that a randomly selected score will exceed 750 (which lenders consider a very good score)?
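The beta mean and variance formulas, together with probabilities obtained by integrating the pdf, can be evaluated numerically for the credit-score parameters $A = 150$, $B = 850$, $\alpha = 8$, $\beta = 2$. The sketch below uses composite Simpson's rule for the integration, which is a choice made here and not a method from the slides; it also treats the score as continuous and ignores any continuity correction for integer-valued scores. All function names are hypothetical.

```python
import math

def beta_pdf(x, a, b, A, B):
    """Beta pdf on [A, B]; this sketch assumes a >= 1 and b >= 1."""
    if x < A or x > B:
        return 0.0
    u = (x - A) / (B - A)
    c = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return c * u ** (a - 1) * (1.0 - u) ** (b - 1) / (B - A)

def simpson(f, lo, hi, n=2000):
    """Composite Simpson's rule with an even number n of subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(lo + k * h)
    return total * h / 3.0

def beta_cdf(x, a, b, A, B):
    """P(X <= x), obtained by integrating the pdf from A to x."""
    if x <= A:
        return 0.0
    return simpson(lambda t: beta_pdf(t, a, b, A, B), A, min(x, B))

def beta_mean(a, b, A, B):
    return A + (B - A) * a / (a + b)

def beta_var(a, b, A, B):
    return (B - A) ** 2 * a * b / ((a + b) ** 2 * (a + b + 1))

a, b, A, B = 8, 2, 150, 850
mean = beta_mean(a, b, A, B)
sd = math.sqrt(beta_var(a, b, A, B))
p_within_1sd = beta_cdf(mean + sd, a, b, A, B) - beta_cdf(mean - sd, a, b, A, B)
p_above_750 = 1.0 - beta_cdf(750, a, b, A, B)

print(mean, sd, p_within_1sd, p_above_750)
```

For these particular parameters the standard beta cdf also has the closed form $9y^8 - 8y^9$ with $y = (x - A)/(B - A)$, which gives an independent check on the numerical integral.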
Probability Plot

Example: There is a machine available for cutting corks intended for use in wine bottles. We want to find out the distribution of the diameters of the corks produced by that machine. Assume we have a sample of 10 corks produced by that machine, and their diameters are recorded as follows:

3.0879  3.2546  2.8970  2.7377  2.7740
2.6030  3.5931  3.1253  2.4756  2.5133
Sample Percentile

Recall: The $(100p)$th percentile of the distribution of a continuous rv $X$, denoted by $\eta(p)$, is defined by

$$p = F(\eta(p)) = \int_{-\infty}^{\eta(p)} f(y)\,dy.$$

In words, the $(100p)$th percentile $\eta(p)$ is the value such that $100p\%$ of the $X$ values lie below $\eta(p)$.

Similarly, we can define a sample percentile in the same manner: the $(100p)$th sample percentile $x_p$ is the value such that $100p\%$ of the sample values lie below $x_p$.

Unfortunately, $x_p$ may not be a sample value for some $p$. E.g. for the previous example, what is the 35th percentile of the ten sample values?
Definition
Assume we have a sample of size $n$. Order the $n$ sample observations from smallest to largest. Then the $i$th smallest observation in the list is taken to be the $[100(i - 0.5)/n]$th sample percentile.

Remark:
1. Why "$i - 0.5$"? We regard each sample observation as being half in the lower group and half in the upper group. E.g. if $n = 9$, then the sample median is the 5th smallest observation, and this observation is regarded as two parts: one in the lower half and one in the upper half.
2. Once the percentage values $100(i - 0.5)/n$ ($i = 1, 2, \ldots, n$) have been calculated, sample percentiles corresponding to intermediate percentages can be obtained by linear interpolation.
Example: for the previous example, the $[100(i - 0.5)/n]$th sample percentiles are tabulated as follows:

Observation   Sample percentile
2.4756        100(1 - 0.5)/10  =  5%
2.5133        100(2 - 0.5)/10  = 15%
2.6030        100(3 - 0.5)/10  = 25%
2.7377        100(4 - 0.5)/10  = 35%
2.7740        100(5 - 0.5)/10  = 45%
2.8970        100(6 - 0.5)/10  = 55%
3.0879        100(7 - 0.5)/10  = 65%
3.1253        100(8 - 0.5)/10  = 75%
3.2546        100(9 - 0.5)/10  = 85%
3.5931        100(10 - 0.5)/10 = 95%

The 10th percentile would be (2.4756 + 2.5133)/2 = 2.49445.
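The sample percentile definition and the linear interpolation rule can be written as a short function. This is a sketch with a hypothetical function name; the slides do not say what to do for percentages below $100(0.5)/n$ or above $100(n - 0.5)/n$, so raising an error there is an assumption made here.

```python
def sample_percentile(data, pct):
    """Sample percentile using the 100(i - 0.5)/n convention.

    The i-th smallest observation is the [100(i - 0.5)/n]th percentile;
    intermediate percentages are obtained by linear interpolation.
    """
    xs = sorted(data)
    n = len(xs)
    levels = [100.0 * (i - 0.5) / n for i in range(1, n + 1)]
    if pct < levels[0] or pct > levels[-1]:
        # Assumption of this sketch: no extrapolation beyond the sample.
        raise ValueError("percentage outside the range covered by the sample")
    for k in range(n - 1):
        lo, hi = levels[k], levels[k + 1]
        if lo <= pct <= hi:
            w = (pct - lo) / (hi - lo)          # interpolation weight
            return xs[k] + w * (xs[k + 1] - xs[k])
    return xs[-1]                                # only reached when n == 1

corks = [3.0879, 3.2546, 2.8970, 2.7377, 2.7740,
         2.6030, 3.5931, 3.1253, 2.4756, 2.5133]

print(sample_percentile(corks, 10))   # midpoint of the two smallest values
print(sample_percentile(corks, 35))   # the 4th smallest observation
```

With the cork data, the 10th percentile falls halfway between the 5% and 15% observations, which reproduces the value 2.49445 computed above.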
Idea for Quantile-Quantile Plot:
1. Determine the $[100(i - 0.5)/n]$th sample percentile for a given sample.
2. Find the corresponding $[100(i - 0.5)/n]$th percentile from the population with the assumed distribution; for example, if the assumed distribution is standard normal, then find the corresponding $[100(i - 0.5)/n]$th percentile from the standard normal distribution.
3. Consider the (population percentile, sample percentile) pairs, i.e.
   ($[100(i - 0.5)/n]$th percentile of the distribution, $i$th smallest sample observation).
4. Each pair, plotted as a point on a two-dimensional coordinate system, should fall close to a 45° line. Substantial deviations of the plotted points from a 45° line cast doubt on the assumption that the distribution under consideration is the correct one.
Example 4.29: The value of a certain physical constant is known to an experimenter. The experimenter makes $n = 10$ independent measurements of this value using a particular measurement device and records the resulting measurement errors (error = observed value - true value). These observations appear in the following table.

Percentage            5      15      25      35      45
Sample observation   -1.91   -1.25   -0.75   -0.53    0.20

Percentage            55     65      75      85      95
Sample observation    0.35    0.72    0.87    1.40    1.56

Is it plausible that the random variable "measurement error" has a standard normal distribution?
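The (population percentile, sample percentile) pairs needed to judge this can be generated as follows. This is a minimal sketch that uses `statistics.NormalDist` from the Python standard library for the standard normal percentiles; the function name `qq_pairs` and the gap summary are illustrative assumptions, and the slides judge the fit by eye from the plot.

```python
from statistics import NormalDist

# Measurement errors from Example 4.29.
errors = [-1.91, -1.25, -0.75, -0.53, 0.20, 0.35, 0.72, 0.87, 1.40, 1.56]

def qq_pairs(sample, dist=NormalDist()):
    """Return (population percentile, i-th smallest observation) pairs.

    The i-th smallest observation is paired with the [100(i - 0.5)/n]th
    percentile of the assumed distribution (standard normal by default).
    """
    xs = sorted(sample)
    n = len(xs)
    return [(dist.inv_cdf((i - 0.5) / n), x) for i, x in enumerate(xs, start=1)]

pairs = qq_pairs(errors)
for q, x in pairs:
    print(f"z percentile {q:7.3f}   observation {x:6.2f}")

# Vertical distance of each point from the 45-degree line y = x.
gaps = [abs(x - q) for q, x in pairs]
print("largest distance from the 45-degree line:", max(gaps))
```

Points that stay close to the line $y = x$ support the standard normal assumption; how close is close enough remains a judgment call, especially with only ten observations.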
We first find the corresponding population distribution percentiles, in this case, the z percentiles:

Percentage            5       15       25       35       45
Sample observation   -1.91    -1.25    -0.75    -0.53     0.20
z percentile         -1.645   -1.037   -0.675   -0.385   -0.126

Percentage            55      65       75       85       95
Sample observation    0.35     0.72     0.87     1.40     1.56
z percentile          0.126    0.385    0.675    1.037    1.645

What about the first example? We are only interested in whether the ten sample observations come from a normal distribution.
Recall:

$$\{(100p)\text{th percentile for } N(\mu, \sigma^{2})\} = \mu + \{(100p)\text{th percentile for } N(0, 1)\} \cdot \sigma$$

If $\mu = 0$, then the pairs ($\sigma \cdot$ [z percentile], observation) fall on a 45° line, which has slope 1. Therefore the pairs ([z percentile], observation) fall on a line passing through (0, 0) (i.e., one with y-intercept 0) but having slope $\sigma$ rather than 1. Now for $\mu \ne 0$, the y-intercept is $\mu$ instead of 0.

Normal Probability Plot
A plot of the $n$ pairs ($[100(i - 0.5)/n]$th z percentile, $i$th smallest observation) on a two-dimensional coordinate system is called a normal probability plot. If the sample observations are in fact drawn from a normal distribution with mean value $\mu$ and standard deviation $\sigma$, the points should fall close to a straight line with slope $\sigma$ and y-intercept $\mu$. Thus a plot for which the points fall close to some straight line suggests that the assumption of a normal population distribution is plausible.

First Example:

Percentage            5        15       25       35       45
Sample observation    2.4756   2.5133   2.6030   2.7377   2.7740
z percentile         -1.645   -1.037   -0.675   -0.385   -0.126

Percentage            55       65       75       85       95
Sample observation    2.8970   3.0879   3.1253   3.2546   3.5931
z percentile          0.126    0.385    0.675    1.037    1.645

A nonnormal population distribution can often be placed in one of the following three categories:
1. It is symmetric and has "lighter tails" than does a normal distribution; that is, the density curve declines more rapidly out in the tails than does a normal curve.
2. It is symmetric and heavy-tailed compared to a normal distribution.
3. It is skewed.
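Returning to the cork-diameter sample, the statement that the points of a normal probability plot should lie near a line with slope $\sigma$ and y-intercept $\mu$ can be explored numerically. The sketch below fits a line to the (z percentile, observation) pairs by ordinary least squares and reports the correlation of the points. The least-squares fit and the correlation summary are assumptions of this sketch, since the slides assess linearity visually, and the function names are hypothetical.

```python
import math
from statistics import NormalDist, mean

corks = [3.0879, 3.2546, 2.8970, 2.7377, 2.7740,
         2.6030, 3.5931, 3.1253, 2.4756, 2.5133]

def normal_plot_points(sample):
    """Pairs ([100(i - 0.5)/n]th z percentile, i-th smallest observation)."""
    xs = sorted(sample)
    n = len(xs)
    z = NormalDist()
    return [(z.inv_cdf((i - 0.5) / n), x) for i, x in enumerate(xs, start=1)]

def fit_line(points):
    """Least-squares slope, intercept and correlation for (q, x) points."""
    qs = [q for q, _ in points]
    xs = [x for _, x in points]
    qbar, xbar = mean(qs), mean(xs)
    sqq = sum((q - qbar) ** 2 for q in qs)
    sxx = sum((x - xbar) ** 2 for x in xs)
    sqx = sum((q - qbar) * (x - xbar) for q, x in points)
    slope = sqx / sqq
    intercept = xbar - slope * qbar
    r = sqx / math.sqrt(sqq * sxx)
    return slope, intercept, r

slope, intercept, r = fit_line(normal_plot_points(corks))
print("slope (rough estimate of sigma):  ", slope)
print("intercept (rough estimate of mu): ", intercept)
print("correlation of the plotted points:", r)
```

Because the z percentiles are symmetric about 0, the fitted intercept coincides with the sample mean of the diameters, in line with the interpretation of the y-intercept as $\mu$.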
Examples of the three categories:
Symmetric and "light-tailed": e.g. uniform distribution
Symmetric and heavy-tailed: e.g. Cauchy distribution with pdf $f(x) = 1/[\pi(1 + x^{2})]$ for $-\infty < x < \infty$
Skewed: e.g. lognormal distribution

Some guidance on probability plots for normal distributions (from the book Fitting Equations to Data (2nd ed.), Cuthbert Daniel and Fred Wood, Wiley, New York, 1980):
1. For sample sizes smaller than 30, there is typically greater variation in the appearance of the probability plot.
2. Only for much larger sample sizes does a linear pattern generally predominate.
Therefore, when a plot is based on a small sample size, only a very substantial departure from linearity should be taken as conclusive evidence of nonnormality.

Definition
Consider a family of probability distributions involving two parameters, $\theta_1$ and $\theta_2$, and let $F(x; \theta_1, \theta_2)$ denote the corresponding cdf's. The parameters $\theta_1$ and $\theta_2$ are said to be location and scale parameters, respectively, if $F(x; \theta_1, \theta_2)$ is a function of $(x - \theta_1)/\theta_2$.

e.g.
1. Normal distributions $N(\mu, \sigma)$: $F(x; \mu, \sigma) = \Phi\!\left(\dfrac{x - \mu}{\sigma}\right)$.
2. The extreme value distribution with cdf $F(x; \theta_1, \theta_2) = 1 - e^{-e^{(x - \theta_1)/\theta_2}}$.

For the Weibull distribution, with cdf

$$F(x; \alpha, \beta) = 1 - e^{-(x/\beta)^{\alpha}},$$

the parameter $\beta$ is a scale parameter but $\alpha$ is NOT a location parameter. $\alpha$ is usually referred to as a shape parameter.

Fortunately, if $X$ has a Weibull distribution with shape parameter $\alpha$ and scale parameter $\beta$, then the transformed variable $\ln(X)$ has an extreme value distribution with location parameter $\theta_1 = \ln(\beta)$ and scale parameter $\theta_2 = 1/\alpha$.

The gamma distribution also has a shape parameter $\alpha$. However, there is no transformation $h(\cdot)$ such that $h(X)$ has a distribution that depends only on location and scale parameters. Thus, before we construct a probability plot, we have to estimate the shape parameter from the sample data.
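The claim that $\ln(X)$ has an extreme value distribution with $\theta_1 = \ln(\beta)$ and $\theta_2 = 1/\alpha$ when $X$ is Weibull can be checked numerically, since $P(\ln X \le \ln x) = P(X \le x)$. The sketch below compares the two cdf's at a few points; the parameter values and function names are illustrative assumptions. It also includes a hypothetical helper that builds the points of a Weibull probability plot by pairing percentiles of the standard extreme value distribution ($\theta_1 = 0$, $\theta_2 = 1$) with the logarithms of the ordered observations.

```python
import math

def weibull_cdf(x, alpha, beta):
    """F(x; alpha, beta) = 1 - exp(-(x/beta)^alpha) for x >= 0, else 0."""
    if x < 0:
        return 0.0
    return 1.0 - math.exp(-((x / beta) ** alpha))

def extreme_value_cdf(y, theta1, theta2):
    """F(y; theta1, theta2) = 1 - exp(-exp((y - theta1) / theta2))."""
    return 1.0 - math.exp(-math.exp((y - theta1) / theta2))

def extreme_value_percentile(p):
    """Percentile of the standard extreme value cdf: ln(-ln(1 - p))."""
    return math.log(-math.log(1.0 - p))

def weibull_plot_points(sample):
    """Pairs (standard extreme value percentile, ln of i-th smallest value).

    If the sample is Weibull, these points should lie near a line with
    slope 1/alpha and intercept ln(beta). Requires positive observations.
    """
    xs = sorted(sample)
    n = len(xs)
    return [(extreme_value_percentile((i - 0.5) / n), math.log(x))
            for i, x in enumerate(xs, start=1)]

# Illustrative parameter values (an assumption of this sketch).
alpha, beta = 2.0, 1.5
diffs = []
for x in (0.5, 1.0, 2.0, 3.0):
    lhs = weibull_cdf(x, alpha, beta)                       # P(X <= x)
    rhs = extreme_value_cdf(math.log(x), math.log(beta), 1.0 / alpha)
    diffs.append(abs(lhs - rhs))
print("largest cdf discrepancy:", max(diffs))
```

Because the transformed variable depends only on location and scale parameters, the straight-line interpretation used for normal probability plots carries over to $\ln(X)$ without first estimating $\alpha$.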