Six Sigma Training
Dr. Robert O. Neidigh
Dr. Robert Setaputra

Variable Types – Page 235
Attribute Data – each observation is classified into a category or is a count of occurrences of a phenomenon; also referred to as classification or categorical data. Examples: gender, reasons for defects, and votes for candidates
Measurement Data – results from a measurement taken on an item or person of interest; also called continuous or variables data. Examples: height, weight, temperature, and cycle time

Measures of Central Tendency
Measures that describe or quantify the middle of a data set.
Mean – average of all data points
Median – value such that at least half the data points are less than or equal to the value and at least half the data points are greater than or equal to the value
Mode – value in the data set that occurs most frequently
First Quartile – value such that at least 25% of the data points are less than or equal to the value and at least 75% of the data points are greater than or equal to the value
Third Quartile – value such that at least 75% of the data points are less than or equal to the value and at least 25% of the data points are greater than or equal to the value
Minitab example on Page 249

Measures of Variation
Measures that describe or quantify the amount of spread or variation in a data set.
Range – distance from the smallest data point to the largest data point
Variance and Standard Deviation – measures of how much the data points fluctuate around the mean (the standard deviation is the square root of the variance)
Minitab example on Page 249

What is standard deviation?
Standard deviation is a measure of variation within a data set. The larger the standard deviation, the more variation in the data set, and vice versa. Technically, standard deviation measures variation about the mean. Roughly speaking, it is the typical distance between a data point and the mean; more precisely, it is the square root of the average squared distance from the mean.
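To make these definitions concrete, here is a minimal Python sketch using the standard-library `statistics` module on a small, hypothetical set of ten cycle times (the data are invented for illustration). The helper `quartile_by_definition` is a hypothetical name; it applies the slide's "at least 25% / at least 75%" wording literally and returns the smallest qualifying data value, so its quartiles can differ from Minitab's or other software's, which typically interpolate. Note that `pvariance` divides by N (population), while `variance` and `stdev` divide by n − 1 (sample).

```python
import statistics

# Hypothetical cycle times (minutes) for ten jobs; illustrative data only.
cycle_times = [12, 15, 11, 18, 15, 14, 20, 15, 13, 17]


def quartile_by_definition(data, p):
    """Smallest data value q with at least p of the data <= q and at least
    (1 - p) of the data >= q, following the slide's wording literally.
    When several values qualify, the smallest is returned (many textbooks
    and software packages interpolate instead)."""
    xs = sorted(data)
    n = len(xs)
    for q in xs:
        at_or_below = sum(1 for x in xs if x <= q)
        at_or_above = sum(1 for x in xs if x >= q)
        if at_or_below >= p * n and at_or_above >= (1 - p) * n:
            return q
    raise ValueError("data must be non-empty")  # only reached for empty data


mean = statistics.mean(cycle_times)
median = statistics.median(cycle_times)
mode = statistics.mode(cycle_times)
q1 = quartile_by_definition(cycle_times, 0.25)
q3 = quartile_by_definition(cycle_times, 0.75)
data_range = max(cycle_times) - min(cycle_times)
pop_var = statistics.pvariance(cycle_times)  # population: divide by N
samp_var = statistics.variance(cycle_times)  # sample: divide by n - 1
samp_sd = statistics.stdev(cycle_times)      # square root of the sample variance

# For this data: mean 15, median 15.0, mode 15, Q1 13, Q3 17, range 9
print(f"mean={mean}, median={median}, mode={mode}, Q1={q1}, Q3={q3}")
print(f"range={data_range}, population variance={pop_var:.3f}, "
      f"sample variance={samp_var:.3f}, sample SD={samp_sd:.3f}")
```

The literal definition is kept deliberately simple so each line maps to a phrase on the slide; for large data sets a sorted-index approach would be faster.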
Instructor notes: motivate measures of variation through examples of small data sets. When computing the variance, divide by N for a population and by n − 1 for a sample. Show students the Normal.xls file.

Continuous Probability Distributions
Can assume an infinite number of values within a given range
The probability of any single point is zero
Probabilities are measured over intervals
The area under the curve over an interval is the probability
Calculating probabilities requires calculus. Ugh!!!
The normal probability distribution is one type
Fortunately, normal probabilities have already been calculated and are contained in a table

Characteristics of the Normal Probability Distribution
1) Bell-shaped
2) Symmetrical
3) Mean, median, and mode are equal
4) Asymptotic – the tails approach but never touch the X-axis
5) Completely described by its two parameters, the mean (µ) and standard deviation (σ)
6) There are an infinite number of possible normal probability distributions

How do we calculate probabilities?
Since there are an infinite number of normal distributions, how can we possibly calculate probabilities for all of them? Fortunately, all normal distributions share a property that allows us to do so: the probability of a value above (or below) a point that lies a given number of standard deviations above (or below) the mean is the same for every normal distribution. Therefore, the probabilities for the standard normal distribution (µ = 0 and σ = 1) can be used for every other normal distribution. These probabilities are found in the standard normal probability table. Our task is to convert any normal distribution to the standard normal; this is called standardizing.

How do we standardize?
First, find the distance between the point of interest and the mean of its normal distribution. Then express this distance in units of standard deviation by dividing it by the standard deviation. The result is called a Z-value, and it tells us how many standard deviations above or below the mean the point is.
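The standardizing argument can be checked numerically. This sketch assumes Python 3.8+ for `statistics.NormalDist`, whose `cdf` method stands in for the standard normal table; `z_value` is a hypothetical helper name. For three different normal distributions, it compares the probability of a value below the point 1.5 standard deviations above the mean, computed directly on the original scale, with the standard normal probability for the corresponding Z-value.

```python
from statistics import NormalDist

# Standard normal distribution (mu = 0, sigma = 1); its cdf plays the role of the table.
STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def z_value(point, mean, sd):
    """Standard deviations the point lies above (+) or below (-) the mean."""
    return (point - mean) / sd


# Three different normal distributions as (mean, standard deviation).
# The first two reuse parameters that appear later in this section; the third is arbitrary.
distributions = [(70.0, 10.0), (8.025, 0.125), (-3.0, 0.5)]

k = 1.5  # examine the point 1.5 standard deviations above each mean
results = []
for mean, sd in distributions:
    point = mean + k * sd
    z = z_value(point, mean, sd)
    direct = NormalDist(mean, sd).cdf(point)  # probability on the original scale
    via_table = STANDARD_NORMAL.cdf(z)        # same probability from the standard normal
    results.append((direct, via_table))
    print(f"mean={mean}, sd={sd}: Z = {z:.2f}, "
          f"P(X < point) = {direct:.4f}, P(Z < z) = {via_table:.4f}")
```

Every row reports the same probability, which is exactly why one standard normal table serves all normal distributions.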
If the Z-value is positive, the point is above the mean; if the Z-value is negative, the point is below the mean.
Z-value = (point − mean) / standard deviation
The standard normal table used here always gives the probability of having a value less than the Z-value.

Finding the probability of having a value less than a given point
Find the Z-value for the given point; it tells us how many standard deviations above or below the mean the point is
Look up the probability in the standard normal table
This is the probability of having a value less than the given point
Example: μ = 70 and σ = 10; find the probability of having a value less than 66
(Figure: normal curve with X-axis marked 40 to 100 and Z-axis marked −3 to 3)

Finding the probability of having a value greater than a given point
Find the Z-value for the given point; it tells us how many standard deviations above or below the mean the point is
Look up the probability in the standard normal table; this is the probability of having a value less than the given point
Subtract this probability from one to find the probability of having a value greater than the given point
Example: μ = 70 and σ = 10; find the probability of having a value greater than 56
(Figure: normal curve with X-axis marked 40 to 100 and Z-axis marked −3 to 3)

Finding the probability of having a value between two points
Find the Z-values for the two points; they tell us how many standard deviations above or below the mean each point is
Look up the probabilities for the two Z-values in the standard normal table; each is the probability of having a value less than the corresponding point
Subtract the probability associated with the smaller Z-value from the probability associated with the larger Z-value
This is the probability of having a value between the two points
Example: μ = 70 and σ = 10; find the probability of having a value between 57 and 76
(Figure: normal curve with X-axis marked 40 to 100 and Z-axis marked −3 to 3)

Finding the point on a normal distribution associated with a given probability
Find the probability in the body of the standard normal table; because the table gives "less than" probabilities, a probability of being greater than the point must first be subtracted from one
Find the Z-value associated with that probability
Convert the Z-value to a point on the normal distribution: point = mean + (Z-value × standard deviation)
Example: μ = 70 and σ = 10; find the value such that 70% of the charge amounts will be greater than that amount
(Figure: normal curve with X-axis marked 40 to 100 and Z-axis marked −3 to 3)

Sampling Methods
Reasons for sampling:
Too time-consuming to check the entire population
Too expensive to check the entire population
Sample results are adequate
Destructive testing
Impossible to check the entire population

Sampling Definitions
Simple random sample – each item in the population has the same probability of being selected
Sampling error – difference between a sample mean and the population mean
Sampling distribution of the sample mean – probability distribution of all possible sample means of a given sample size
Standard error of the mean – standard deviation of the sampling distribution of sample means (average sampling error)

When is the sampling distribution normal?
If the population distribution is normal, the sampling distribution is normal for any sample size
If the sample size is at least thirty, the sampling distribution is approximately normal whatever the shape of the population (central limit theorem; n ≥ 30 is a rule of thumb)

Properties of the sampling distribution
The mean of the sampling distribution (µx-bar) equals the population mean (µ)
The standard error (σx-bar) equals the population standard deviation (σ) divided by the square root of the sample size (n)
Once we know the mean and standard error of the sampling distribution and know that it is normal, we can compute probabilities with the same Z-value steps as before, using the standard error in place of σ: $Z = (\bar{X} - \mu_{\bar{X}})/\sigma_{\bar{X}}$

Notation
$\mu_{\bar{X}} = \mu \qquad \sigma_{\bar{X}} = \sigma/\sqrt{n}$

Example
Captain D’s tuna is sold in cans that have a net weight of 8 ounces. The weights are normally distributed with a mean of 8.025 ounces and a standard deviation of 0.125 ounces. You take a sample of 36 cans.
$\mu_{\bar{X}} = 8.025 \qquad \sigma_{\bar{X}} = 0.125/\sqrt{36} = 0.020833$
What is the probability of having a sample mean greater than 8.03 ounces?
(Figure: sampling distribution of the sample mean; X-bar axis marked 7.962 to 8.088, one standard error apart, and Z-axis marked −3 to 3)
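The four table-lookup recipes above, and all three questions asked about the Captain D’s tuna sample mean, can be reproduced with the sketch below. It assumes Python 3.8+ (`statistics.NormalDist`, whose `cdf` and `inv_cdf` methods replace the forward and reverse table lookups); all function names are hypothetical. Answers read from a printed table may differ in the last digit because Z-values are rounded to two decimals there.

```python
from math import sqrt
from statistics import NormalDist

PHI = NormalDist()  # standard normal (mu = 0, sigma = 1); stands in for the table


def z_value(point, mean, sd):
    return (point - mean) / sd


def prob_less(point, mean, sd):
    # Recipe 1: the table value for the Z-value is P(value < point).
    return PHI.cdf(z_value(point, mean, sd))


def prob_greater(point, mean, sd):
    # Recipe 2: one minus the "less than" probability.
    return 1.0 - prob_less(point, mean, sd)


def prob_between(low, high, mean, sd):
    # Recipe 3: larger Z-value's probability minus the smaller Z-value's.
    return prob_less(high, mean, sd) - prob_less(low, mean, sd)


def point_with_prob_less(p, mean, sd):
    # Recipe 4: Z-value for a "less than" probability, then mean + Z * sd.
    return mean + PHI.inv_cdf(p) * sd


mu, sigma = 70, 10
print(f"P(X < 66)      = {prob_less(66, mu, sigma):.4f}")
print(f"P(X > 56)      = {prob_greater(56, mu, sigma):.4f}")
print(f"P(57 < X < 76) = {prob_between(57, 76, mu, sigma):.4f}")
# 70% greater than the point is the same as 30% less than it.
print(f"70% exceed     = {point_with_prob_less(0.30, mu, sigma):.2f}")

# Tuna example: same recipes on the sampling distribution of the mean,
# with the standard error sigma / sqrt(n) in place of sigma.
mu_xbar = 8.025
se = 0.125 / sqrt(36)
print(f"standard error         = {se:.6f}")
print(f"P(xbar > 8.03)         = {prob_greater(8.03, mu_xbar, se):.4f}")
print(f"P(xbar < 7.995)        = {prob_less(7.995, mu_xbar, se):.4f}")
print(f"P(7.995 < xbar < 8.03) = {prob_between(7.995, 8.03, mu_xbar, se):.4f}")
```

The tuna lines reuse the same functions unchanged; only the spread argument switches from σ to the standard error σ/√n, which is the point of the sampling-distribution slides.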
What is the probability of having a sample mean less than 7.995 ounces?
(Figure: sampling distribution of the sample mean; X-bar axis marked 7.962 to 8.088, one standard error apart, and Z-axis marked −3 to 3)
What is the probability of having a sample mean between 7.995 ounces and 8.03 ounces?
(Figure: sampling distribution of the sample mean; X-bar axis marked 7.962 to 8.088, one standard error apart, and Z-axis marked −3 to 3)

Hypothesis Testing
Hypothesis – a statement about a population developed for the purpose of testing
Hypothesis test – a procedure based on sample evidence and probability theory to determine whether the hypothesis is a reasonable statement
Key Point – any time a decision is made about a population based on sample data, an incorrect decision may be made

Type I and Type II Errors
Type I Error – rejecting a true null hypothesis
Type II Error – failing to reject (accepting) a false null hypothesis
Unfortunately, for a fixed sample size, the probability of a Type I Error (α) is inversely related to the probability of a Type II Error (β): if we decrease the probability of a Type I Error, the probability of a Type II Error increases, and vice versa.
What are Type I and Type II errors in the U.S. legal system?
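The inverse relationship between α and β can be illustrated for one specific case, sketched below under stated assumptions: a right-tailed test of a mean with known σ (a z-test, not yet covered in these slides), hypothetical hypotheses H0: µ = 8.0 versus H1: µ > 8.0, σ = 0.125, n = 36, and a true mean of 8.025 borrowed from the tuna example. `type_ii_error` is a hypothetical helper name, and Python 3.8+ is assumed for `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist

PHI = NormalDist()  # standard normal

# Hypothetical setup (not from the slides): H0: mu = 8.0 vs H1: mu > 8.0,
# known sigma = 0.125, n = 36, and the true mean is actually 8.025.
mu0, mu_true, sigma, n = 8.0, 8.025, 0.125, 36
se = sigma / sqrt(n)  # standard error of the sample mean


def type_ii_error(alpha):
    """beta for a right-tailed z-test of the mean with known sigma."""
    # Reject H0 when the sample mean exceeds this cutoff (Type I error rate = alpha).
    critical_mean = mu0 + PHI.inv_cdf(1 - alpha) * se
    # Type II error: the sample mean falls below the cutoff although H1 is true.
    return NormalDist(mu_true, se).cdf(critical_mean)


alphas = [0.10, 0.05, 0.01]
betas = [type_ii_error(a) for a in alphas]
for a, b in zip(alphas, betas):
    print(f"alpha = {a:.2f} -> beta = {b:.4f}")
```

With n held fixed, shrinking α pushes the rejection cutoff higher, so β grows; the usual way to lower both at once is to increase the sample size.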