Data Collection and Sample Size Considerations
• Measure
• Kaizen Facilitation

Objectives
• Review Data Collection Plan Definition and Goals
• Discuss Sampling Principles
• Identify Key Elements of a Data Collection Plan

Data Collection Must Be Planned
• A Data Collection Plan is an organized, written strategy for gathering information for your project
• Goals
  • Data are representative of the process
  • Data are reliable, every time
  • Only relevant data are collected
  • All the necessary data are collected
  • Resources are used effectively

What to Measure?
• Outputs (Y’s)
  • Product or service produced or delivered by the process
  • Measures include cycle time, customer satisfaction, cost
• Process Variables (X’s)
  • Those variables that influence the output and are generally controllable by those who operate the process
• Input Variables (X’s)
  • Materials and information used by the process to create the outputs (inputs are often outside the control of the process owner)

Remember the Data Types?
• Qualitative
  • Judgment / feeling (porridge too hot, job takes too long)
• Quantitative
  • Attribute Data
    • Discrete
    • Binary, Ordinal, Categorical, Individual Count
    • e.g. accuracy, defects, etc.
  • Continuous Data
    • Variables
    • Measured
    • e.g. time, physical measures, etc.

Attribute Data Characteristics
• Use for sorting, primarily
• Less informative than Continuous (Variables) data
• Need large sample sizes to predict capability
• Need to define opportunities for defects to be meaningful
• Definition of exact Quality Characteristics in a “taste test” (Pepsi vs. Coke)
• Examples: misspelled words on page, mis-loaded containers

Continuous Data Characteristics
• Best type if available or can be gathered
• Most information for a given sample size
• Information on capability and shape of distribution
• Graphical and Statistical Analysis available
• Examples: LTI’s, service cycle times

[Figure: I-MR Chart of Cycle Time (Hrs), horizontal axis Observation. Individual Value panel: UCL = 5.763, X-bar = 1.935, LCL = -1.893. Moving Range panel: UCL = 4.702, MR-bar = 1.439, LCL = 0.]

Types of Measures
• There are two primary types of output (Y) metrics
  • Effectiveness (VoC)
  • Efficiency (VoB)
• A third type of metric could be around Quality
  • Which may be attribute or continuous data

Effectiveness Measures
• Degree to which Customers’ needs/requirements are met or exceeded
  • On-time Delivery
  • Accuracy (e.g. Billing Process)
  • Ease of use
  • Performance
  • Serviceability
  • Price
  • Value

Efficiency Measures
• Amount of Business resources allocated in meeting or exceeding Customer needs/requirements
  • Total Cycle Time
  • Machine Time
  • Processing Time
  • Waiting Time
  • Per Unit Costs
  • Rework Costs
  • Inspection/Audit Costs

How Much Data is Needed?
• It is often impractical to collect all the data from every aspect of your process
  • When there is too much data
  • When too much time is required to sample all the data
  • When measurement is costly
• In these cases data sampling is used
• Sound conclusions can be made from a relatively small amount of data

Purpose and Advantages of Using Samples
• Sampling refers to the practice of evaluating (inspecting) a portion (sample) of a lot (population) for the purpose of inferring information about the entire lot
• Statistically speaking, the properties of the sample distribution are used to infer the properties of the population distribution
• Sampling makes possible the study of a large population
• Sampling is for economy, speed, and accuracy

Considerations in Data Sampling
Factor       Example
What type    Complaints, Defects, Problems
When         Year, Month, Week, Day, Hour
Where        Region, City, Site, Quadrant
Who          BU, Department, Individual
NOTE: These questions should be answered within the data collection plan

Principles of Data Sampling
• Population: a set which includes all data measurements of interest to the project leader (the collection of all responses or counts that are of interest)
• Sample: a subset of the population
• Sampling Unit: an individual unit of a sample
• Samples must be:
  • Representative
  • Adequate
  • Random

Requirements of Data Sampling
• Based on the ‘operational definition’ of the output (Y) and other factors (X’s) to be recorded, determine a sampling plan
• Sampling plan must be:
  • Representative: covers all occurring conditions, locations and times
  • Adequate: statistically significant conclusions can be drawn about long term and short term performance
  • Random: data gathered free from bias

Requirements of a Representative Sample
• Sample data must represent all segments
  • Physical locations
  • Shifts
  • Days of Week
  • Months
  • Seasons
• Avoid bias
  • Collecting only when convenient (omitting night/weekend shifts)
  • Collecting only from responsive individuals

Requirements of an Adequate Sample
• Sample sizes must be adequate to achieve statistical significance
• Sample size to achieve statistical significance varies with each analytical tool
• Statistical significance may or may not be the same as practical significance
• Larger sample sizes increase confidence (refer to guidelines)

Several Ways to Ensure Random Samples
• Randomization helps ensure data is representative
• Randomization helps ensure data is free from bias
• Sampling approaches include:
  • Pure Random Sampling (each unit has an equal chance)
  • Stratified Sampling (select from different groups/classes)
  • Systematic or Interval Sampling (every 15 minutes, every 4th unit, sweep across a location from left to right, etc.)
  • Cluster Sampling (large geographic areas to deal with)
  • Sub-grouping (sample the output of a step or activity with some frequency, usually a time increment, e.g. pull 5 samples at 10 a.m., 12 p.m., 2 p.m., and 4 p.m.)

Dealing with Differences – ‘Within, Between’
• Random samples should come from a “homogeneous group” or “lot”, but sometimes the lot is not homogeneous because different machines, people, locations or shifts are involved
• With stratified sampling, random samples are drawn from each “group” of processes that are different
• Stratify data collection efforts by:
  • Shift / Time of Day
  • Item or Type of Service
  • Location (Gate, Yard)
  • Equipment utilized (Top Pick vs. Strad)

Sampling Must Catch The Variation
[Figure: four patterns of variation: Sudden Change, Hugging or Bunching, Cycling, Trending]

Patterns of Sample Data
• When we collect sample output data (Y), we want to know not only what its patterns are, but also what other factors are related to the pattern in Y
• Two concepts are used to describe the factors related to these patterns:
  • Segmentation – external factors
  • Stratification – internal and process measures
• If easy, gather other process data as well:
  • shift, time, operator, part number, material type, etc.

Examine External Factors through Segmentation
• Used to identify differences between different factors or processes
  • Components from different suppliers
  • Example (figure, left to right): in-person, fax, telephone, on-line
• Segmentation may point out major drivers of defects or correlation to the output Y

Stratification for Internal and Process Measures
• Stratification is a data analysis technique by which project Y data is sorted according to relevant subgroups called levels or strata
• Understanding level differences may lead to the root cause, which in turn leads to improvement of the process/project

Data Collection Plan - Stratification Example
• Project Y’s (what the team must change to influence Customer Satisfaction)
  • Loan Application Cycle Time
• Stratified X’s (what the team must study and/or change to influence Project Y’s)
  • Location: Phoenix, New Orleans, Houston, Jacksonville
  • Size of Loan: Small, Medium, Big
• This strategy tries to identify variation within locations and, perhaps, whether there is a relationship between loan size and location

How Much Data Do I Need?
• The amount of data required depends greatly on:
  • The process you’re collecting it from
  • What you’re trying to represent
  • The difference you’re trying to detect
  • Your confidence levels (discussed later)
• As a general rule of thumb it should be enough to cover the sampling aspects we discussed
• More is generally better, within the limits of cost and time

Sample Sizes for Data Displays
Rules of Thumb
Tool or Statistic            Minimum Sample Size
Mean                         10 - 15
Standard Deviation           20
Proportion Defective (P)     30
Histogram or Pareto          25 - 50
Control Chart                20 - 30

But I Only Have 5 Data Points?
• In some situations you may not have enough data to collect a sample
• If you run into this situation it’s important to understand:
  • How to deal with these small sample sizes
  • How they affect your ability to use statistics

Risks With Small Samples – Margin of Error
• It is all about precision, tolerance for risk, and cost.
• For samples smaller than 1000, we always have to think about how confident we want to be that estimates are within a particular range (level of confidence and risk), and how small we want that range to be (level of precision). Unfortunately, they go in opposite directions: for a given sample size, a higher level of confidence requires a wider range (margin of error), and the effect is largest in small samples.

Dealing With Small Samples
• Estimated effects may not be fully accurate, so be upfront about the limitations and document your sampling strategies, decisions, and criteria
• See it as an opportunity to keep evaluation costs low, recognizing that a large study without sufficient resources can also produce under-powered results
• Hesitate to report percentages, or don't report them at all; report fractions instead, as percentages can be misleading and may overstate results

Questions to Consider
• What type of data analysis will be conducted?
• Will subgroups be compared?
• What is the probability of the event occurring?
• How much error is tolerable (confidence interval)?
• How much precision do we need?
• How confident do we need to be that the true population value falls within the confidence interval?
• What is the budget? Can we afford the desired sample?
• What is the population size? Large? Small/Finite?
  • If unknown, assume it to be large (>100,000)

Before Collecting Data
• One task remains before collecting data
• Validation of the Measurement System
  • Systems Audit – Data Validation
  • Measurement System Analysis (MSA)
  • Gage calibration
  • Gage repeatability and reproducibility (GR&R) for Variables Data
• MSA must be completed before Data Collection!
Validation helps understand total variation attributed to operator methods, bias in effort, gage discrepancies …

Data Collection Plan Example
Customize Your Form and Format – Each process and project has unique requirements

Collecting and Recording Data
3 Elements:
• A procedure
  • What will be measured (Y and X’s)
  • What segmentation or stratification will be recorded
  • Sampling Plan (what, where, when, how much)
  • Who will record and with what instrument
  • Measurement System Validation method
• A checklist
  • Assure all factors, segments, and strata are included
• A form
  • Collect data

Data Collection ‘Procedure’ Example
What data are you going to need? What are you going to do with it?
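
For the "how much" part of a sampling plan, the opposing pull between confidence and precision noted under Risks With Small Samples can be made concrete with the usual normal-approximation formulas for a proportion. The sketch below is an illustration only and is not part of the original material: the function names are hypothetical, p = 0.5 is a worst-case assumption for the probability of the event, and a finite population correction is applied only when a population size is supplied. The normal approximation is itself rough for very small samples, so treat the numbers as a guide.

```python
import math
from statistics import NormalDist


def z_value(confidence):
    """Two-sided standard normal critical value for a confidence level such as 0.95."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def margin_of_error(p, n, confidence=0.95):
    """Approximate margin of error for an observed proportion p from n samples."""
    return z_value(confidence) * math.sqrt(p * (1 - p) / n)


def required_sample_size(margin, confidence=0.95, p=0.5, population=None):
    """Approximate sample size needed to estimate a proportion within +/- margin.

    p=0.5 is the most conservative assumption (largest sample).
    population=None means "unknown, assume large"; otherwise a finite
    population correction shrinks the requirement.
    """
    z = z_value(confidence)
    n0 = z ** 2 * p * (1 - p) / margin ** 2
    if population is not None:
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)


if __name__ == "__main__":
    # Same 30 samples, two confidence levels: higher confidence, wider margin.
    for conf in (0.90, 0.95, 0.99):
        print(conf, round(margin_of_error(0.5, 30, conf), 3))
    # Sample size for +/- 5 points at 95% confidence, large vs. finite population.
    print(required_sample_size(0.05))
    print(required_sample_size(0.05, population=1000))
```

Under these assumptions, a margin of plus or minus 5 percentage points at 95% confidence calls for 385 units from a large population and 278 when the population is 1,000, which is why the population-size question matters mainly for small, finite populations.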
Data Collection ‘Checklist’ Example
Develop Your Form and Format – Each process and project has unique requirements

Data Collection ‘Form’ Example
Develop Your Form and Format – Each process and project has unique requirements

Review
• Review Data Collection Plan Definition and Goals
• Discuss Sampling Principles
• Identify Key Elements of a Data Collection Plan

Exercise
• Objectives:
  • Collect a data sample
  • Calculate the sample mean and standard deviation of the total distribution
  • Plot a histogram of the total distribution of the data
  • Test the data for normality
  • Do some data mining
• Procedure:
  • Set up ‘helicopter’ and keep all conditions fixed
    • Except: Wing Length & Body Width
    • Change those randomly
  • Record flight time values in Minitab for 30 launches
  • Perform appropriate analysis
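
The exercise is meant to be run in Minitab, but the same analysis steps can be sketched in Python for anyone who wants to check the arithmetic. Everything below is an assumption made for illustration: the function names are hypothetical, the flight times are simulated rather than measured, and the Jarque-Bera statistic is swapped in for whatever normality test your Minitab session runs, purely because it needs only the standard library. With 30 values its p-value is a rough approximation.

```python
import math
import random
import statistics


def summarize(values):
    """Return (n, sample mean, sample standard deviation with n - 1 divisor)."""
    return len(values), statistics.mean(values), statistics.stdev(values)


def histogram_counts(values, bins=6):
    """Return (lower_edge, upper_edge, count) for equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid zero width if all values are equal
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)  # top edge goes in last bin
        counts[idx] += 1
    return [(lo + i * width, lo + (i + 1) * width, c) for i, c in enumerate(counts)]


def jarque_bera(values):
    """Return (JB statistic, approximate p-value) from skewness and kurtosis.

    A small p-value suggests the data are not normally distributed.
    """
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    if m2 == 0:
        raise ValueError("all values are identical; normality cannot be tested")
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    p_value = math.exp(-jb / 2.0)  # chi-square survival function, 2 degrees of freedom
    return jb, p_value


if __name__ == "__main__":
    random.seed(1)
    # Simulated stand-in for 30 recorded flight times (seconds); not real data.
    flight_times = [random.gauss(2.0, 0.3) for _ in range(30)]

    n, mean, sd = summarize(flight_times)
    print(f"n={n} mean={mean:.3f} sd={sd:.3f}")
    for lower, upper, count in histogram_counts(flight_times):
        print(f"{lower:5.2f} to {upper:5.2f} | {'*' * count}")
    jb, p = jarque_bera(flight_times)
    print(f"Jarque-Bera={jb:.3f} approx p={p:.3f}")
```

Replacing the simulated list with the 30 recorded flight times gives the mean, standard deviation, a text histogram, and a rough normality check; the "data mining" step would then compare these by Wing Length and Body Width, which this sketch does not attempt.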