Data Collection and Sample Size Considerations

• Measure
• Kaizen Facilitation
Objectives
• Review Data Collection Plan Definition and Goals
• Discuss Sampling Principles
• Identify Key Elements of a Data Collection Plan
Data Collection Must Be Planned
• A Data Collection Plan is an organized, written strategy for gathering
information for your Project
• Goals
• Data are representative of the process
• Data are reliable, every time
• Only relevant data are collected
• All the necessary data are collected
• Resources are used effectively
What to Measure?
• Outputs (Y’s)
• Product or Service produced or delivered by the process
• Measures include cycle time, customer satisfaction, cost
• Process Variables (X’s)
• Those variables that influence the output and are generally controllable by
those who operate the process
• Input Variables (X’s)
• Materials and information used by the process to create the outputs. (Inputs
are often outside the control of the process owner)
Remember the Data Types?
• Qualitative
• Judgment / Feeling (porridge too hot, job takes too long)
• Quantitative
• Attribute Data
• Discrete
• Binary, Ordinal, Categorical, Individual Count
• e.g. accuracy, defects
• Continuous Data
• Variables
• Measured
• e.g. time, physical measures
Attribute Data Characteristics
• Use for sorting, primarily
• Less informative than Continuous (Variables) data
• Need large sample sizes to predict capability
• Need to define opportunities for defects to be meaningful
• Definition of exact Quality Characteristics in a “taste test”
• Examples: misspelled words on page, mis-loaded containers
[Figure: Pepsi vs. Coke]
Continuous Data Characteristics
• Best type if available or can be gathered
• Most information for a given sample size
• Information on capability and shape of distribution
• Graphical and Statistical Analysis available
• Examples: LTI’s, service cycle times
[Figure: I-MR Chart of Cycle Time - Hrs. Individual Value panel: UCL = 5.763, X-bar = 1.935, LCL = -1.893. Moving Range panel: UCL = 4.702, MR-bar = 1.439, LCL = 0. Horizontal axis: Observation (1 to 73).]
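The deck does not say how the limits in the I-MR chart were computed. The sketch below assumes the conventional I-MR constants for a moving range of span 2 (2.66 for the individuals chart, 3.267 for the moving range chart) and checks them against the X-bar and MR-bar values printed on the chart. The function names are illustrative only.

```python
from statistics import mean

# Conventional I-MR constants for a moving range of span 2 (an assumption;
# the slides do not state which constants were used)
E2 = 2.66    # 3 / d2, with d2 = 1.128
D4 = 3.267   # upper-limit factor for the moving range chart

def limits_from_averages(x_bar, mr_bar):
    """Control limits from the process average and the average moving range."""
    return {
        "x_ucl": x_bar + E2 * mr_bar,
        "x_center": x_bar,
        "x_lcl": x_bar - E2 * mr_bar,
        "mr_ucl": D4 * mr_bar,
        "mr_center": mr_bar,
        "mr_lcl": 0.0,
    }

def imr_limits(values):
    """Control limits computed directly from a sequence of individual values."""
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    return limits_from_averages(mean(values), mean(moving_ranges))

# Averages read off the chart: X-bar = 1.935, MR-bar = 1.439
chart = limits_from_averages(1.935, 1.439)
for name, value in chart.items():
    print(f"{name}: {value:.3f}")
```

With the chart's averages this gives individuals limits that round to 5.763 and -1.893, matching the figure. The moving range upper limit comes out near 4.70; the small gap from the printed 4.702 is presumably rounding in the published averages.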
Continuous Types of Measures
• There are two primary types of (Y) output metrics
• Effectiveness (VoC)
• Efficiency (VoB)
• A third type of metric could be around Quality
• Which may be attribute or continuous data
Effectiveness Measures
• Degree to which Customers’ needs/requirements are met or
exceeded
• On-time Delivery
• Accuracy (e.g. Billing Process)
• Ease of use
• Performance
• Serviceability
• Price
• Value
Efficiency Measures
• Amount of Business resources allocated in meeting or exceeding
Customer needs/requirements
• Total Cycle Time
• Machine Time
• Processing Time
• Waiting Time
• Per Unit Costs
• Rework Costs
• Inspection/Audit Costs
How Much Data is Needed?
• It is often impractical to collect all the data from every aspect of your
process
• When there is too much data
• When too much time is required to sample all the data
• When measurement is costly
• In these cases data sampling is used
• Sound conclusions can be made from a relatively small amount of
data
Purpose and Advantages of Using Samples
• Sampling refers to the practice of evaluating (inspecting) a portion
(sample) of a lot (population) for the purpose of inferring information
about the entire lot
• Statistically speaking, the properties of the sample distribution are
used to infer the properties of the population distribution
• Sampling makes possible the study of a large population
• Sampling is for economy, speed, and accuracy
Considerations in Data Sampling
Factor: Example
• What type: Complaints, Defects, Problems
• When: Year, Month, Week, Day, Hour
• Where: Region, City, Site, Quadrant
• Who: BU, Department, Individual
NOTE: These questions should be answered
within the data collection plan
Principles of Data Sampling
• Population: a set that includes all data measurements of interest to the project leader (the collection of all responses or counts that are of interest)
• Sample: a subset of the population
• Sampling Unit: an individual unit of a sample
• Samples must be:
• Representative
• Adequate
• Random
Requirements of Data Sampling
• Based on the ‘operational definition’ of the
output (Y) and other factors (Xs) to be recorded,
determine a sampling plan
• Sampling plan must be:
• Representative: all occurring conditions, locations
and times
• Adequate: statistically significant conclusions can be
drawn about long term and short term performance
• Random: data gathered free from bias
Requirements of a Representative Sample
• Sample data must represent all segments
• Physical locations
• Shifts
• Days of Week
• Months
• Seasons
• Avoid bias
• Collecting only when convenient (omitting night/ weekend shifts)
• Collecting only from responsive individuals
Requirements of an Adequate Sample
• Sampling sizes must be adequate to achieve statistical significance
• Sample size to achieve statistical significance varies with each
analytical tool
• Statistical significance may or may not be the same as practical
significance
• Larger sample sizes increase confidence
– refer to guidelines
Several Ways to Ensure Random Samples
• Randomization helps ensure data are representative
• Randomization helps ensure data are free from bias
• Sampling approaches include:
• Pure Random Sampling (each unit has equal chance)
• Stratified Sampling (select from different groups/classes)
• Systematic or Interval Sampling (every 15 minutes, every 4th unit,
sweep across a location from left to right, etc)
• Cluster Sampling (large geographic areas to deal with)
• Sub-grouping (sample the output of a step or activity at some
frequency, usually a time increment, e.g. pull 5 samples at 10
a.m., 12 p.m., 2 p.m., and 4 p.m.)
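The first three sampling approaches listed above can be sketched with the Python standard library. This is a minimal illustration, not a prescribed method: the function names, the `units` list, and the `shift` field are all hypothetical.

```python
import random

def simple_random_sample(units, n, rng):
    """Pure random sampling: every unit has an equal chance of selection."""
    return rng.sample(units, n)

def systematic_sample(units, interval, rng):
    """Systematic sampling: every `interval`-th unit from a random start."""
    start = rng.randrange(interval)
    return units[start::interval]

def stratified_sample(units, stratum_of, n_per_stratum, rng):
    """Stratified sampling: a random sample drawn from within each group."""
    strata = {}
    for unit in units:
        strata.setdefault(stratum_of(unit), []).append(unit)
    return {
        name: rng.sample(members, min(n_per_stratum, len(members)))
        for name, members in strata.items()
    }

# ASSUMPTION: 60 made-up units, one third of them from the night shift
rng = random.Random(42)
units = [{"id": i, "shift": "day" if i % 3 else "night"} for i in range(1, 61)]

random_pick = simple_random_sample(units, 10, rng)
every_fourth = systematic_sample(units, 4, rng)
per_shift = stratified_sample(units, lambda u: u["shift"], 5, rng)
```

Passing an explicit `random.Random` instance keeps the selection reproducible, which makes it easier to document the sampling decisions in the data collection plan.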
Dealing with Differences – ‘Within, Between’
• Random samples are meant to be drawn from a “homogeneous
group” or “lot”, but the group is sometimes not homogeneous
because different machines, people, locations, or shifts are
involved
• With stratified sampling, random samples are drawn from
each “group” of processes that are different
• Stratify data collection efforts by:
• Shift / Time of Day
• Item or Type of Service
• Location (Gate, Yard)
• Equipment utilized (Top Pick vs. Strad)
Sampling Must Catch The Variation
• Sudden Change
• Hugging or Bunching
• Cycling
• Trending
Patterns of Sample Data
• When we collect sample output data (Y), we want to know not only
what its patterns are, but also what other factors are related to the
pattern in Y
• Two concepts are used to describe the factors related to these
patterns:
• Segmentation – external factors
• Stratification – internal and process measures
• If easy, gather other process data as well:
• shift, time, operator, part number, material type, etc…
Examine External Factors through Segmentation
• Used to identify differences between different factors or processes
• Components from different suppliers
• Example below (l to r): in-person, fax, telephone, on-line
• Segmentation may point out major drivers of defects or correlation to the
output Y
Stratification for Internal and Process Measures
• Stratification is a data analysis technique by
which project Y data is sorted according to
relevant subgroups called levels or strata
• Understanding differences between levels may reveal the
root cause, which in turn leads to improvement
of the process/project
Data Collection Plan - Stratification Example
• Project Y’s (what the team must change to influence Customer Satisfaction)
• Loan Application Cycle Time
• Stratified X’s (what the team must study and/or change to influence Project Y’s)
• Location: Phoenix, New Orleans, Houston, Jacksonville
• Size of Loan: Small, Medium, Big
This strategy tries to identify variation within locations and to check
whether there is a relationship between loan size and location.
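As a rough sketch of what this stratification looks like in practice, the snippet below sorts cycle times by location, by loan size, and by both together. The records are invented purely for illustration and the `stratify` helper is a hypothetical name, not part of the original material.

```python
from collections import defaultdict
from statistics import mean

# ASSUMPTION: made-up records of (location, loan size, cycle time in days)
records = [
    ("Phoenix", "Small", 4.0),
    ("Phoenix", "Medium", 7.0),
    ("Phoenix", "Big", 10.0),
    ("Houston", "Small", 6.0),
    ("Houston", "Medium", 9.0),
    ("Houston", "Big", 12.0),
]

def stratify(records, key):
    """Average cycle time per stratum, where `key` picks the stratum."""
    groups = defaultdict(list)
    for location, size, cycle_time in records:
        groups[key(location, size)].append(cycle_time)
    return {stratum: mean(times) for stratum, times in groups.items()}

by_location = stratify(records, lambda location, size: location)
by_size = stratify(records, lambda location, size: size)
by_both = stratify(records, lambda location, size: (location, size))

print(by_location)
print(by_size)
```

Comparing the per-stratum averages side by side is one simple way to see whether location, loan size, or their combination appears to drive the Project Y.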
How Much Data Do I Need?
• The amount of data required depends greatly on:
• The process you’re collecting it from
• What you’re trying to represent
• The difference you’re trying to detect
• Your confidence levels (discussed later)
• As a general rule of thumb, collect enough data to satisfy the sampling
requirements discussed earlier (representative, adequate, random)
• More is always better…
Sample Sizes for Data Displays
Rules of Thumb
Tool or Statistic: Minimum Sample Size
• Mean: 10 - 15
• Standard Deviation: 20
• Proportion Defective (P): 30
• Histogram or Pareto: 25 - 50
• Control Chart: 20 - 30
But I Only Have 5 Data Points?
• In some situations you may not have enough data to collect a sample
• If you run into this situation it’s important to understand:
• How to deal with these small sample sizes?
• How it affects your ability to use statistics?
Risks With Small Samples – Margin of Error
• It is all about precision, tolerance for
risk and cost.
• For samples smaller than 1000, we always
have to think about how confident we want
to be that estimates are within a particular
range (level of confidence and risk), and
how small we want that range to be (level
of precision). Unfortunately, they go in
opposite directions. Higher levels of
confidence require greater ranges (margins
of error) in small sample sizes.
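The trade-off between confidence and precision can be made concrete with a small calculation. The sketch below assumes the estimate is a proportion and uses the normal-approximation margin of error, z * sqrt(p(1 - p) / n); that choice of formula is an assumption for illustration, and the approximation itself is rough for very small samples.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(p, n, confidence=0.95):
    """Normal-approximation margin of error for a sample proportion."""
    # Two-sided critical value, e.g. about 1.96 for 95% confidence
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sqrt(p * (1 - p) / n)

# Worst case p = 0.5; sample sizes chosen only to show the trend
for n in (5, 30, 100, 1000):
    row = ", ".join(
        f"{int(c * 100)}%: +/-{margin_of_error(0.5, n, c):.3f}"
        for c in (0.90, 0.95, 0.99)
    )
    print(f"n = {n:4d} -> {row}")
```

Reading across a row shows the margin widening as confidence rises; reading down a column shows it shrinking as the sample grows.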
Dealing With Small Samples
• Expected effects may not be fully accurate, so be upfront about the limitations
and document your sampling strategies, decisions, and criteria
• See it as an opportunity to keep evaluation costs low, recognizing that a large
study without sufficient resources can under-power results
• Be cautious about reporting percentages, or avoid them entirely; report fractions
instead (e.g. 3 of 5), as percentages can be misleading and may overstate results
Questions to Consider
• What type of data analysis will be conducted?
• Will subgroups be compared?
• What is the probability of the event occurring?
• How much error is tolerable (confidence interval)?
• How much precision do we need?
• How confident do we need to be that the true population value falls within the
confidence interval?
• What is the budget? Can we afford the desired sample?
• What is the population size? Large? Small/Finite?
• If unknown, assume it to be large ( >100,000)
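Several of these questions (event probability, tolerable error, confidence, population size) are the inputs to a common sample-size calculation for a proportion. The sketch below uses the standard normal-approximation formula with a finite population correction; treating the measure as a proportion is an assumption, and the function name and defaults are hypothetical.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_proportion(p=0.5, error=0.05, confidence=0.95, population=None):
    """Sample size needed to estimate a proportion within +/- `error`."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Size for an effectively unlimited population
    n = z ** 2 * p * (1 - p) / error ** 2
    if population is not None:
        # Finite population correction shrinks the requirement for small lots
        n = n / (1 + (n - 1) / population)
    return ceil(n)

large = sample_size_for_proportion()                  # population treated as large
finite = sample_size_for_proportion(population=1000)  # small, known population
print(large, finite)
```

Using p = 0.5 when the event probability is unknown gives the most conservative (largest) sample size.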
Before Collecting Data
• One task remains before collecting data
• Validation of the Measurement System…
• Systems Audit – Data Validation
• Measurement System Analysis (MSA)
• Gage calibration
• Gage repeatability and reproducibility (GR&R) for Variables Data
• MSA must be completed before Data Collection!
Validation helps understand total variation attributed to
operator methods, bias in effort, gage discrepancies …
Data Collection Plan Example
Customize Your Form and Format – Each process and project has unique requirements
Collecting and Recording Data
3 Elements:
• A procedure
• What will be measured (Y and X’s)
• What segmentation or stratification will be recorded
• Sampling Plan (what, where, when, how much)
• Who will record and with what instrument
• Measurement System Validation method
• A checklist
• Assure all factors, segments, and strata are included
• A form
• Collect data
Data Collection ‘Procedure’ Example
What data are you going to need? What are you going to do with it?
Data Collection ‘Checklist’ Example
Develop Your Form and Format – Each process and project has unique requirements
Data Collection ‘Form’ Example
Develop Your Form and Format – Each process and project has unique requirements
Review
• Review Data Collection Plan Definition and Goals
• Discuss Sampling Principles
• Identify Key Elements of a Data Collection Plan
Exercise
• Objectives:
• Collect a data sample
• Calculate the sample mean and standard deviation of the total distribution
• Plot a histogram of the data total distribution
• Test the data for normality
• Do some data mining
• Procedure:
• Set up ‘helicopter’ and keep all conditions fixed
• Except: Wing Length & Body Width
• Change Those Randomly
• Record flight time values in Minitab for 30 launches
• Perform appropriate analysis
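The exercise itself is done in Minitab. For readers without it, the first three objectives can be approximated with the Python standard library, as sketched below. The flight times are simulated placeholders rather than real launch data, and `histogram_counts` is a hypothetical helper; the normality test is left to a statistics package.

```python
import random
from statistics import mean, stdev

def histogram_counts(values, bins):
    """Count how many values fall into each of `bins` equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid zero width when all values are equal
    counts = [0] * bins
    for v in values:
        # The maximum value is placed in the last bin
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts

# ASSUMPTION: simulated flight times in seconds, standing in for 30 real launches
rng = random.Random(0)
flight_times = [rng.gauss(2.0, 0.2) for _ in range(30)]

print(f"mean = {mean(flight_times):.3f}, std dev = {stdev(flight_times):.3f}")
for i, count in enumerate(histogram_counts(flight_times, 6)):
    print(f"bin {i}: {'*' * count}")
```

Replacing `flight_times` with the recorded values is the only change needed to run this on actual exercise data.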