Measure - Six Sigma Statistics

Measure Phase
Six Sigma Statistics
Welcome to Measure
Process Discovery
Six Sigma Statistics
Basic Statistics
Descriptive Statistics
Normal Distribution
Assessing Normality
Special Cause / Common Cause
Graphing Techniques
Measurement System Analysis
Process Capability
Wrap Up & Action Items
OSSS LSS Green Belt v9.1 - Measure Phase
2
© OpenSourceSixSigma, LLC
Purpose of Basic Statistics
The purpose of Basic Statistics is to:
• Provide a numerical summary of the data being analyzed.
  – Data (n)
    • Factual information organized for analysis.
    • Numerical or other information represented in a form suitable for processing by computer.
    • Values from scientific experiments.
• Provide the basis for making inferences about the future.
• Provide the foundation for assessing process capability.
• Provide a common language to be used throughout an organization to describe processes.
Relax… it won’t be that bad!
Statistical Notation – Cheat Sheet
Σ : Summation
s : The Standard Deviation of sample data
σ : The Standard Deviation of population data
s² : The variance of sample data
σ² : The variance of population data
R : The range of data
R-bar : The average range of data
k : Multi-purpose notation, e.g. # of subgroups, # of classes
|x| : The absolute value of some term
>, < : Greater than, less than
≥, ≤ : Greater than or equal to, less than or equal to
X : An individual value, an observation
X1 : A particular (1st) individual value
Xi : For each, all, individual values
X-bar : The mean, average of sample data
X-double-bar : The grand mean, grand average
µ : The mean of population data
p : A proportion of sample data
P : A proportion of population data
n : Sample size
N : Population size
Parameters vs. Statistics
Population:
All the items that have the “property of interest” under study.
Frame:
An identifiable subset of the population.
Sample:
A significantly smaller subset of the population used to make an inference.
[Figure: a Population from which several Samples are drawn]
Population Parameters:
– Arithmetic descriptions of a population
– µ, σ, P, σ², N
Sample Statistics:
– Arithmetic descriptions of a sample
– X-bar, s, p, s², n
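The distinction between parameters and statistics can be sketched in a few lines of Python. This is only an illustration: the population below is simulated and hypothetical (it is not the course data set), and the names `population` and `sample` are chosen here for readability. `statistics.pstdev` divides by N (a population parameter), while `statistics.stdev` divides by n - 1 (a sample statistic).

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: 10,000 simulated part lengths (not course data)
population = [random.gauss(5.0, 0.01) for _ in range(10_000)]

# Population parameters: computed from every item in the population
N = len(population)
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)   # divides by N

# Sample statistics: computed from a much smaller subset
sample = random.sample(population, 200)
n = len(sample)
x_bar = statistics.fmean(sample)
s = statistics.stdev(sample)            # divides by n - 1

print(f"Parameters: N={N}, mu={mu:.4f}, sigma={sigma:.4f}")
print(f"Statistics: n={n}, x_bar={x_bar:.4f}, s={s:.4f}")
```

The sample statistics land close to the population parameters without matching them exactly, which is why they are used to make inferences rather than treated as the true values.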
Types of Data
Attribute Data (Qualitative)
– Is always binary: there are only two possible values (0, 1)
• Yes, No
• Go, No go
• Pass/Fail
Variable Data (Quantitative)
– Discrete (Count) Data
• Can be categorized in a classification and is based on counts.
– Number of defects
– Number of defective units
– Number of customer returns
– Continuous Data
• Can be measured on a continuum; it has decimal subdivisions that are meaningful
– Time
– Money
– Pressure
– Conveyor Speed
– Material feed rate
Discrete Variables
Discrete Variable | Possible values for the variable
The number of defective needles in boxes of 100 diabetic syringes | 0, 1, 2, …, 100
The number of individuals in groups of 30 with a Type A personality | 0, 1, 2, …, 30
The number of surveys returned out of 300 mailed in a customer satisfaction study | 0, 1, 2, …, 300
The number of employees out of 100 who have finished high school or obtained a GED | 0, 1, 2, …, 100
The number of times you need to flip a coin before a head appears for the first time | 1, 2, 3, … (note: there is no upper limit because you might need to flip forever before the first head appears)
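The coin-flip example can be explored with a short simulation. This is a rough sketch that assumes a fair coin and uses a hypothetical helper, `flips_until_first_head`; the seed and the 10,000 trials are arbitrary choices for illustration.

```python
import random

def flips_until_first_head(rng):
    """Count fair-coin flips up to and including the first head."""
    flips = 1
    while rng.random() >= 0.5:   # values below 0.5 are treated as "heads"
        flips += 1
    return flips

rng = random.Random(42)          # fixed seed so the sketch is reproducible
counts = [flips_until_first_head(rng) for _ in range(10_000)]

print("smallest count:", min(counts))
print("largest count: ", max(counts))
print("average count: ", sum(counts) / len(counts))
```

Every result is a whole number of at least 1, and the largest count depends on the run. The variable is discrete even though it has no fixed upper limit.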
Continuous Variables
Continuous Variable | Possible values for the variable
The length of prison time served for individuals convicted of first degree murder | All the real numbers between a and b, where a is the smallest amount of time served and b is the largest
The household income for households with incomes less than or equal to $30,000 | All the real numbers between a and $30,000, where a is the smallest household income in the population
The blood glucose reading for those individuals having glucose readings equal to or greater than 200 | All real numbers between 200 and b, where b is the largest glucose reading in all such individuals
Definitions of Scaled Data
• Understanding the nature of data and how to represent it can affect the types of statistical tests possible.
• Nominal Scale – data consists of names, labels, or categories. It cannot be arranged in an ordering scheme. No arithmetic operations are performed on nominal data.
• Ordinal Scale – data is arranged in some order, but differences between data values either cannot be determined or are meaningless.
• Interval Scale – data can be arranged in some order, and differences in data values are meaningful and can be interpreted.
• Ratio Scale – data can be ranked, and all arithmetic operations including division can be performed (division by zero is of course excluded). Ratio level data has an absolute zero, and a value of zero indicates a complete absence of the characteristic of interest.
Nominal Scale
Qualitative Variable | Possible nominal level data values for the variable
Blood Types | A, B, AB, O
State of Residence | Alabama, …, Wyoming
Country of Birth | United States, China, other
Time to weigh in!
Ordinal Scale
Qualitative Variable | Possible Ordinal level data values
Automobile Sizes | Subcompact, compact, intermediate, full size, luxury
Product rating | Poor, good, excellent
Baseball team classification | Class A, Class AA, Class AAA, Major League
Interval Scale
Interval Variable | Possible Scores
IQ scores of students in BlackBelt Training | 100… (the difference between scores is measurable and has meaning, but a difference of 20 points between 100 and 120 does not indicate that one student is 1.2 times more intelligent)
Ratio Scale
Ratio Variable | Possible Scores
Grams of fat consumed per adult in the United States | 0… (If person A consumes 25 grams of fat and person B consumes 50 grams, we can say that person B consumes twice as much fat as person A. If person C consumes zero grams of fat per day, we can say there is a complete absence of fat consumed on that day. Note that a ratio is interpretable and an absolute zero exists.)
Converting Attribute Data to Continuous Data
• Continuous Data is always more desirable
• In many cases Attribute Data can be converted to
Continuous
• Which is more useful?
– 15 scratches or Total scratch length of 9.25”
– 22 foreign materials or 2.5 fm/square inch
– 200 defects or 25 defects/hour
Descriptive Statistics
Measures of Location (central tendency)
– Mean
– Median
– Mode
Measures of Variation (dispersion)
– Range
– Interquartile Range
– Standard deviation
– Variance
Descriptive Statistics
Open the MINITAB™ Project “Measure Data Sets.mpj” and
select the worksheet “basicstatistics.mtw”
Measures of Location
Mean is:
• Commonly referred to as the average.
• The arithmetic balance point of a distribution of data.
Stat>Basic Statistics>Display Descriptive Statistics…>Graphs…
>Histogram of data, with normal curve
Sample: $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$
Population: $\mu = \frac{\sum_{i=1}^{N} X_i}{N}$
[Figure: Histogram (with Normal Curve) of Data; x-axis: Data (4.97 to 5.02), y-axis: Frequency; Mean 5.000, StDev 0.01007, N 200]
Descriptive Statistics: Data
Variable  N    N*  Mean    SE Mean   StDev   Minimum  Q1      Median  Q3      Maximum
Data      200  0   4.9999  0.000712  0.0101  4.9700   4.9900  5.0000  5.0100  5.0200
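A small Python sketch of the "balance point" idea. The five readings are hypothetical and chosen only for illustration; they are not the course worksheet. The last calculation uses the StDev and N reported for the histogram to show where the SE Mean column comes from, since the standard error of the mean is StDev / sqrt(N).

```python
import math

# Hypothetical readings, for illustration only (not the course worksheet)
data = [4.97, 4.99, 5.00, 5.01, 5.02]

mean = sum(data) / len(data)

# Balance point: deviations above and below the mean cancel out
deviations = [x - mean for x in data]
balance = sum(deviations)

# SE Mean = StDev / sqrt(N), using the values reported for the histogram
se_mean = 0.01007 / math.sqrt(200)

print(f"mean = {mean:.3f}")
print(f"sum of deviations = {balance:.1e}")   # zero up to rounding error
print(f"SE Mean = {se_mean:.6f}")
```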
Measures of Location
Median is:
• The mid-point, or 50th percentile, of a distribution of data.
• Arrange the data from low to high, or high to low.
– It is the single middle value in the ordered list if there is an odd number of observations.
– It is the average of the two middle values in the ordered list if there is an even number of observations.
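The odd/even rule above can be sketched as follows. The function name `median` and both data lists are hypothetical, written here only to make the two cases concrete; Python's built-in `statistics.median` applies the same rule.

```python
import statistics

def median(values):
    """Middle value of the ordered data (average of the two middle values if even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # odd: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2   # even: average of two middle values

odd_data = [4.99, 5.01, 4.97, 5.00, 5.02]    # hypothetical, 5 observations
even_data = [4.99, 5.01, 4.97, 5.00]         # hypothetical, 4 observations

print(median(odd_data), statistics.median(odd_data))
print(median(even_data), statistics.median(even_data))
```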
[Figure: Histogram (with Normal Curve) of Data; x-axis: Data (4.97 to 5.02), y-axis: Frequency; Mean 5.000, StDev 0.01007, N 200]
Descriptive Statistics: Data
Variable  N    N*  Mean    SE Mean   StDev   Minimum  Q1      Median  Q3      Maximum
Data      200  0   4.9999  0.000712  0.0101  4.9700   4.9900  5.0000  5.0100  5.0200
Measures of Location
Trimmed Mean is a:
Compromise between the Mean and Median.
• The Trimmed Mean is calculated by eliminating a specified percentage
of the smallest and largest observations from the data set and then
calculating the average of the remaining observations
• Useful for data with potential extreme values.
Stat>Basic Statistics>Display Descriptive Statistics…>Statistics…> Trimmed Mean
Descriptive Statistics: Data
Variable  N    N*  Mean    SE Mean   TrMean  StDev   Minimum  Q1      Median  Q3      Maximum
Data      200  0   4.9999  0.000712  4.9999  0.0101  4.9700   4.9900  5.0000  5.0100  5.0200
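A minimal sketch of a Trimmed Mean. The helper `trimmed_mean`, its default of 5% per end, and the rule of rounding the trimmed count down are all assumptions made for illustration; statistical packages may choose the percentage and the rounding differently. The data are hypothetical and include one extreme value on purpose.

```python
import statistics

def trimmed_mean(values, trim_fraction=0.05):
    """Mean after dropping trim_fraction of the observations from each end."""
    if not 0 <= trim_fraction < 0.5:
        raise ValueError("trim_fraction must be in [0, 0.5)")
    ordered = sorted(values)
    k = int(len(ordered) * trim_fraction)   # observations dropped per end, rounded down
    return statistics.fmean(ordered[k:len(ordered) - k])

# Hypothetical data with one extreme value (9.99)
data = [4.99, 5.00, 5.01, 5.00, 4.98, 5.02, 5.00, 4.99, 5.01, 9.99]

plain = statistics.fmean(data)
trimmed = trimmed_mean(data, trim_fraction=0.10)   # drops 1 value from each end

print(f"mean = {plain:.4f}, trimmed mean = {trimmed:.4f}")
```

With the extreme value present, the plain Mean is pulled upward while the Trimmed Mean stays near the bulk of the data.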
Measures of Location
Mode is:
The most frequently occurring value in a distribution of data.
Mode = 5
[Figure: Histogram (with Normal Curve) of Data; x-axis: Data (4.97 to 5.02), y-axis: Frequency; Mean 5.000, StDev 0.01007, N 200]
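Finding the Mode can be sketched with Python's `statistics` module. The data are hypothetical; `statistics.multimode` is used for the second list because a data set can have more than one most frequent value.

```python
import statistics

# Hypothetical readings: 5.00 occurs most often
data = [4.99, 5.00, 5.01, 5.00, 4.98, 5.00, 5.01]
mode = statistics.mode(data)

# When values tie for most frequent, multimode returns all of them
tied = statistics.multimode([1, 1, 2, 2, 3])

print("mode:", mode)
print("tied modes:", tied)
```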
Measures of Variation
Range is the:
Difference between the largest observation and the smallest
observation in the data set.
• A small range would indicate a small amount of variability and a large
range a large amount of variability.
Descriptive Statistics: Data
Variable  N    N*  Mean    SE Mean   StDev   Minimum  Q1      Median  Q3      Maximum
Data      200  0   4.9999  0.000712  0.0101  4.9700   4.9900  5.0000  5.0100  5.0200
Interquartile Range is the:
Difference between the 75th percentile and the 25th percentile.
Use Range or Interquartile Range when the data distribution is Skewed.
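As a quick check, the Range and Interquartile Range can be computed directly from the Minimum, Q1, Q3 and Maximum in the Descriptive Statistics output. The second half of the sketch works from raw data; that list is hypothetical, and the quartile method used by `statistics.quantiles` is only one of several conventions, so other software may report slightly different quartiles.

```python
import statistics

# Values taken from the Descriptive Statistics output
minimum, q1, q3, maximum = 4.9700, 4.9900, 5.0100, 5.0200

data_range = maximum - minimum   # largest minus smallest observation
iqr = q3 - q1                    # 75th percentile minus 25th percentile

print(f"Range = {data_range:.2f}, IQR = {iqr:.2f}")

# From raw data (hypothetical readings); quantiles returns [Q1, Q2, Q3] for n=4
raw = [4.97, 4.98, 4.99, 4.99, 5.00, 5.00, 5.01, 5.01, 5.02]
quartiles = statistics.quantiles(raw, n=4)
raw_iqr = quartiles[2] - quartiles[0]

print(f"quartiles = {quartiles}, IQR = {raw_iqr:.3f}")
```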
Measures of Variation
Standard Deviation is:
Roughly the typical deviation of values from the Mean for a distribution of data (the square root of the Variance).
A “unit of measure” for distances from the Mean.
Use when data are symmetrical.
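Population: $\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$
Sample: $s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$
Descriptive Statistics: Data
Variable  N    N*  Mean    SE Mean   StDev   Minimum  Q1      Median  Q3      Maximum
Data      200  0   4.9999  0.000712  0.0101  4.9700   4.9900  5.0000  5.0100  5.0200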
Cannot calculate population Standard Deviation because this is sample data.
Measures of Variation
Variance is the:
Average squared deviation of each individual data point from the
Mean.
Sample: $s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$
Population: $\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$
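A step-by-step sketch of the Variance calculation, with the Standard Deviation taken as its square root. The eight values are hypothetical and chosen so the arithmetic is easy to follow. The only difference between the population and sample versions is the divisor (N versus n - 1).

```python
import math
import statistics

# Hypothetical data chosen so the arithmetic is easy to follow
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)
mean = sum(data) / n

# Squared deviation of each individual data point from the Mean
squared_deviations = [(x - mean) ** 2 for x in data]

population_variance = sum(squared_deviations) / n         # divide by N
sample_variance = sum(squared_deviations) / (n - 1)       # divide by n - 1

# Standard Deviation is the square root of the Variance
population_sd = math.sqrt(population_variance)
sample_sd = math.sqrt(sample_variance)

print(f"population: variance = {population_variance:.3f}, sd = {population_sd:.3f}")
print(f"sample:     variance = {sample_variance:.3f}, sd = {sample_sd:.3f}")
```

The results agree with `statistics.pvariance`, `statistics.variance`, `statistics.pstdev` and `statistics.stdev` from the standard library.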
Normal Distribution
The Normal Distribution is the most recognized distribution in
statistics.
What are the characteristics of a Normal Distribution?
– Only random error is present
– Process free of assignable cause
– Process free of drifts and shifts
So what is present when the data is Non-normal?
The Normal Curve
The normal curve is a smooth, symmetrical, bell-shaped
curve, generated by the density function.
It is the most useful continuous probability model as
many naturally occurring measurements such as
heights, weights, etc. are approximately Normally
Distributed.
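For reference, the standard Normal density function is $f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$. The sketch below evaluates it in Python; the function name `normal_pdf` is chosen here for illustration, and the result is cross-checked against `statistics.NormalDist` from the standard library.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Height of the Normal curve at x for a given Mean and Standard Deviation."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

peak = normal_pdf(0.0)                    # the curve is highest at the Mean
left, right = normal_pdf(-1.0), normal_pdf(1.0)
library_value = NormalDist(0.0, 1.0).pdf(1.0)

print(f"height at the mean: {peak:.4f}")
print(f"height at -1 and +1: {left:.4f}, {right:.4f}")   # symmetrical
```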
Normal Distribution
Each combination of Mean and Standard Deviation generates a
unique normal curve:
“Standard” Normal Distribution
– Has μ = 0 and σ = 1
– Data from any Normal Distribution can be made to
fit the standard Normal by converting raw scores
to standard scores.
– Z-scores measure how many Standard Deviations from the
mean a particular data-value lies.
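A short sketch of converting raw scores to Z-scores. The first calculation uses the Maximum, Mean and StDev from the Descriptive Statistics output. The five-value list is hypothetical and is treated as a complete population purely for illustration, so that the standardized values come out with a Mean of 0 and a Standard Deviation of 1.

```python
import statistics

def z_score(x, mean, std_dev):
    """Number of Standard Deviations that x lies from the Mean."""
    return (x - mean) / std_dev

# Maximum, Mean and StDev from the Descriptive Statistics output
z_max = z_score(5.0200, 4.9999, 0.0101)
print(f"The maximum lies {z_max:.2f} standard deviations above the mean")

# Hypothetical data, treated as a whole population for illustration
data = [4.97, 4.99, 5.00, 5.01, 5.02]
mu = statistics.fmean(data)
sigma = statistics.pstdev(data)
standardized = [z_score(x, mu, sigma) for x in data]

print([round(z, 2) for z in standardized])
```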
Normal Distribution
The area under the curve between any 2 points represents the
proportion of the distribution between those points.
The area between the Mean and any other point depends upon the Standard Deviation.
[Figure: Normal curve with μ and x marked on the horizontal axis]
Convert any raw score to a Z-score using the formula:
$Z = \frac{x - \mu}{\sigma}$
Refer to a set of Standard Normal Tables to find the proportion between μ and x.
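Instead of a printed table, the proportion between μ and x can be looked up with the Standard Normal cumulative distribution in Python's `statistics.NormalDist`. This is a sketch only: the helper name `proportion_between_mean_and` is hypothetical, and the example assumes the data really are Normally Distributed with the Mean and StDev shown for the histogram.

```python
from statistics import NormalDist

def proportion_between_mean_and(x, mu, sigma):
    """Proportion of a Normal Distribution lying between the Mean and x."""
    z = (x - mu) / sigma                     # convert the raw score to a Z-score
    return abs(NormalDist().cdf(z) - 0.5)    # half the area lies on each side of the Mean

# Standard Normal check: area between the Mean and Z = 1
one_sigma = proportion_between_mean_and(1.0, 0.0, 1.0)

# Using the Mean and StDev reported for the histogram, with x = 5.01
example = proportion_between_mean_and(5.01, 5.000, 0.01007)

print(f"between mean and Z = 1: {one_sigma:.4f}")
print(f"between 5.000 and 5.01: {example:.4f}")
```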
The Empirical Rule
The Empirical Rule…
[Figure: Normal curve with the horizontal axis marked from -6 to +6 standard deviations]
68.27% of the data will fall within +/- 1 standard deviation
95.45% of the data will fall within +/- 2 standard deviations
99.73% of the data will fall within +/- 3 standard deviations
99.9937% of the data will fall within +/- 4 standard deviations
99.999943% of the data will fall within +/- 5 standard deviations
99.9999998% of the data will fall within +/- 6 standard deviations
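These percentages can be reproduced for a Normal Distribution with the error function: the proportion within +/- k standard deviations of the Mean is erf(k / sqrt(2)). The sketch below uses only Python's `math` module; the function name `coverage_percent` is chosen here for illustration.

```python
import math

def coverage_percent(k):
    """Percent of a Normal Distribution within +/- k standard deviations of the Mean."""
    return 100 * math.erf(k / math.sqrt(2))

coverage = {k: coverage_percent(k) for k in range(1, 7)}

for k, pct in coverage.items():
    print(f"+/- {k} standard deviations: {pct:.7f} %")
```

The rule applies only to data that follow a Normal Distribution; skewed or otherwise Non-normal data will show different proportions.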