Measuring Complexity

John David Kendrick
Business Process Management, Inc.
April 21, 2010
My Purpose
• To provide a framework for measuring the
complexity of a hardware or software product
• To inform you about Cluster Analysis and
Principal Component Analysis
• To present examples where this approach was
applied to estimate the complexity of a
hardware product and a software product
Agenda
• Introduction - John David Kendrick
• Overview of the Problem
• Background
– Cluster Analysis
– Principal Component Analysis
• Presentation of the Technique and Examples
My Education
• Master of Engineering, Simulation and Modeling,
Arizona State University
• Master of Applied Statistics, Penn State University
• MBA, Financial Economics, University of
Pittsburgh
• BA Economics, University of Pittsburgh
• BS Math and Computer Science, University of
Pittsburgh
• BS Physics, Purdue University
My Professional Certifications
• Certified Six Sigma Master Black Belt, SigmaPro
• ASQ Certifications:
– Six Sigma Black Belt
– Reliability Engineer
– Software Quality Engineer
– Quality Manager / Organizational Excellence
• Lean Certifications:
– Lean Enterprise, Arizona State University
– Lean Manufacturing Management, IDL Systems
• Certified Reliability Estimation, SigmaPro
Professional Experience
• Naval Air Warfare Center - Aircraft Division
• US Army (CIO-G6, TRADOC, MNRA/CIO-G1)
• Motorola (Networks Division)
• Qwest Communications (US West)
• Freddie Mac (Single Family Housing, Program
Management Office, Center of Excellence)
• AT&T (Global Network Operations)
• Ygomi, LLC
My Publications & Quality Promotions
• John Kendrick and Daniel Saaty, "Analytic Hierarchy Process (AHP) for Six Sigma Project Selection and Portfolio Optimization", Six Sigma Forum Magazine, American Society for Quality, 8/2007
• John Kendrick, "Data Stratification for Champions", Six Sigma Forum Magazine, American Society for Quality, 8/2008
• John Kendrick, "The Importance of a Proper SPC Subgroup Sampling Technique", Quality Digest, 11/2009
• John Kendrick, "Poor Technique", Six Sigma Forum Magazine, American Society for Quality, 3/2010
• Speaker, ASQ Section 618, March 2008
• Speaker, ASQ Section 702, April 2010
• Webinars:
– John Kendrick, "Using Discrete Event Simulation to Improve IT Help Desk Operations for ITIL Problems and Incidents", American Society for Quality, Service Division, 8/2008 (Registered Audience: 250)
What is Complexity?
Complexity - The level in difficulty in solving
mathematically posed problems as measured
by the time, number of steps or arithmetic
operations, or memory space required (called
time complexity, computational complexity,
and space complexity, respectively).
SOURCE: www.dictionary.com
Common Questions
• How long will it take to make a …?
• How many defects can we expect when we are
making a …?
• How many problems can we expect from our
customers?
• How many hours (in FTE) will it take to make a…?
The Approach
• Group “similar” objects using Cluster Analysis
• Use Principal Component Analysis to
mathematically describe the objects and the
group
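As a minimal sketch of the two steps above, the Python below groups hypothetical objects and then describes each group by its range of scores on the first principal component, which is the form the examples later in this deck report. Plain k-means with farthest-first starting centers is an assumption made for illustration (the slides do not say which clustering algorithm was used), and the data and function names are hypothetical. The sign of a principal component is arbitrary, so ranges that are negative in the examples may come out positive here.

```python
import numpy as np

# Sketch of "cluster first, then describe with PCA" on hypothetical data.
# Assumption: plain k-means stands in for the unspecified clustering algorithm.

def standardize(X):
    """Center each variable and scale it to unit sample standard deviation."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def kmeans(Z, k, n_iter=100):
    """Lloyd's k-means with deterministic farthest-first initialization."""
    centers = [Z[0]]
    for _ in range(1, k):
        # Next center: the object farthest from all centers chosen so far.
        d = np.min([np.linalg.norm(Z - c, axis=1) for c in centers], axis=0)
        centers.append(Z[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        # Assign every object to its nearest center, then recompute the centers.
        dist = np.linalg.norm(Z[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def first_pc_scores(Z):
    """Scores on the first principal component of the correlation matrix."""
    R = np.corrcoef(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)   # eigenvalues in ascending order
    return Z @ eigvecs[:, -1]              # last column = largest eigenvalue

def cluster_pc1_ranges(X, k):
    """Step 1: group the objects. Step 2: describe each group by its PC1 range."""
    Z = standardize(np.asarray(X, dtype=float))
    labels = kmeans(Z, k)
    pc1 = first_pc_scores(Z)
    return {int(j): (float(pc1[labels == j].min()), float(pc1[labels == j].max()))
            for j in np.unique(labels)}

# Hypothetical objects described by three size-type measurements.
X = [[10, 2, 5], [11, 2, 6], [12, 3, 5],
     [40, 8, 20], [42, 9, 21], [41, 8, 19],
     [80, 15, 40], [82, 16, 41], [79, 15, 39]]
print(cluster_pc1_ranges(X, k=3))
```

Because each cluster is summarized by a PC1 interval, a new object can later be scored on PC1 and compared against those intervals.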
Advantages Over Alternative Methods
• No guessing / “gut feel” / reliance on instinct
(breaks down with more than three dimensions)
• More Natural - Discriminant Analysis establishes
the groups before the analysis, rather than deriving
them from the objects under examination
• Greater Flexibility –
– Many types of measurements can be applied to the
objects
– Many types of cluster analysis algorithms are available
• In my experience – this always works!
What is Cluster Analysis?
• “Cluster Analysis is the art of finding groups in data.” (1,p.1)
• Popular in the 1980s
• Considered a branch of pattern recognition and artificial
intelligence.
• Consider these two groups:
– {a, b, c} and {A, B, C}
– What makes the elements similar?
We all agree that there are two groups, but how did we establish
these groups? What rules did we use? What associations are
there between the elements? How did we determine the
criteria for the measurements or which attributes to use for the
groups?
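A small Python sketch makes the point concrete: the groups come from whichever rule is applied, not from the letters themselves. The helper `group_by_rule` is hypothetical.

```python
def group_by_rule(objects, rule):
    """Assign each object to a group keyed by the value the rule returns for it."""
    groups = {}
    for obj in objects:
        groups.setdefault(rule(obj), []).append(obj)
    return groups

letters = ["a", "B", "b", "C", "c", "A"]

# Rule 1: upper case versus lower case gives the two groups on the slide.
print(group_by_rule(letters, str.isupper))  # {False: ['a', 'b', 'c'], True: ['B', 'C', 'A']}

# Rule 2: ignoring case gives three different groups from the same objects.
print(group_by_rule(letters, str.lower))    # {'a': ['a', 'A'], 'b': ['B', 'b'], 'c': ['C', 'c']}
```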
How to Proceed?
• Like many statistical approaches, this is as much an art as a
science
• Understand the question to be answered
• Determine the significant attributes
• Establish and validate a measurement system for each
attribute
• Devise a rule (or algorithm) to assign objects to groups
based on instances of the attributes
• VERIFY AND VALIDATE THE MODEL
“All models are wrong, some models are useful.”
– Dr. George Box and Dr. John Fowler
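One common way to verify a clustering, assumed here rather than taken from the slides, is the silhouette width: each object compares its average distance to its own cluster with its average distance to the nearest other cluster. A minimal Python sketch with hypothetical data (the function name is also hypothetical):

```python
import math

def silhouette_widths(points, labels):
    """Silhouette width s(i) = (b(i) - a(i)) / max(a(i), b(i)) for each object.

    a(i): average distance from object i to the other members of its own cluster.
    b(i): smallest average distance from object i to the members of another cluster.
    """
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("silhouettes need at least two clusters")

    def avg_dist(i, cluster):
        others = [j for j, lab in enumerate(labels) if lab == cluster and j != i]
        return sum(math.dist(points[i], points[j]) for j in others) / len(others) if others else 0.0

    widths = []
    for i, own in enumerate(labels):
        if labels.count(own) == 1:
            widths.append(0.0)  # common convention for a singleton cluster
            continue
        a = avg_dist(i, own)
        b = min(avg_dist(i, other) for other in clusters if other != own)
        widths.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return widths

pts = [(0, 0), (0, 1), (10, 0), (10, 1)]   # hypothetical objects
print(sum(silhouette_widths(pts, [0, 0, 1, 1])) / 4)  # sensible grouping
print(sum(silhouette_widths(pts, [0, 1, 0, 1])) / 4)  # deliberately poor grouping
```

An average width close to 1 suggests well-separated groups; an average below 0 suggests that many objects sit closer to another cluster than to their own.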
Data Preparation
• Types of Data
– Variables (Continuous) Data
– Attribute (Discrete) Data
– Nominal Data
– Ordinal Data
– Ratio Data
– Interval Data
• Distribution of the data
– Linear
– Exponential
• Units selected
• Data Centered?
• Transformation Needed?
The groupings will change depending on the data type, units used,
how the data is dispersed, and the clustering algorithm selected.
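To illustrate how the units alone can change a grouping, the sketch below assigns a new object T to whichever of two existing objects is nearer, first with size measured in KLOC and then in LOC. All data and names are hypothetical. Rescaling size by 1000 flips the raw answer, while standardizing each variable (here to mean 0 and sample standard deviation 1) gives the same answer in both units.

```python
import math
import statistics

def nearest(target, candidates):
    """Name of the candidate closest to target in Euclidean distance."""
    return min(candidates, key=lambda name: math.dist(target, candidates[name]))

def standardize(rows):
    """Rescale every variable (column) to mean 0 and sample standard deviation 1."""
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    sds = [statistics.stdev(c) for c in cols]
    return [tuple((v - m) / s for v, m, s in zip(row, means, sds)) for row in rows]

# Hypothetical objects measured as (size, duration in hours); T is the new object.
in_kloc = {"A": (100.0, 1.0), "B": (110.0, 9.0), "T": (104.0, 8.0)}
in_loc = {k: (size * 1000, hours) for k, (size, hours) in in_kloc.items()}

for units, data in [("KLOC", in_kloc), ("LOC", in_loc)]:
    raw = nearest(data["T"], {k: data[k] for k in "AB"})
    z = dict(zip(data, standardize(list(data.values()))))
    std = nearest(z["T"], {k: z[k] for k in "AB"})
    print(f"{units}: raw nearest = {raw}, standardized nearest = {std}")
```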
Ways to Measure Associations
Between Objects
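Distance
Contribution from each element
Mean absolute deviation: $\mathrm{MAD}_f = \frac{1}{n}\sum_{i=1}^{n} \lvert x_{i,f} - m_f \rvert$
Mean: $m_f = \frac{1}{n}\sum_{i=1}^{n} x_{i,f}$
[Figure: scatter of objects in two groups, labeled x and y, annotated "minimize"]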
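The mean and mean absolute deviation are commonly used to standardize each variable f before computing distances, $z_{i,f} = (x_{i,f} - m_f) / \mathrm{MAD}_f$, where $\mathrm{MAD}_f$ is the mean absolute deviation of variable f, so that no single variable dominates the distance. A minimal Python sketch, assuming that use, with hypothetical data and function names:

```python
import math

def mad_standardize(column):
    """z_if = (x_if - m_f) / s_f, with m_f the mean and s_f the mean absolute deviation."""
    n = len(column)
    m_f = sum(column) / n
    s_f = sum(abs(x - m_f) for x in column) / n
    if s_f == 0:
        raise ValueError("variable is constant; it cannot be standardized")
    return [(x - m_f) / s_f for x in column]

def euclidean(p, q):
    """Euclidean distance: each standardized variable contributes its squared difference."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Hypothetical data: rows are objects, columns are variables on very different scales.
rows = [(2.0, 100.0), (4.0, 300.0), (6.0, 200.0), (8.0, 400.0)]
cols = [mad_standardize(list(c)) for c in zip(*rows)]
z_rows = list(zip(*cols))
print(cols)                               # [[-1.5, -0.5, 0.5, 1.5], [-1.5, 0.5, -0.5, 1.5]]
print(euclidean(z_rows[0], z_rows[1]))    # sqrt(5)
```

After standardization both variables are on the same footing, even though the second column's raw values are about 50 times larger.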
Ways to Measure Dissimilarities
Between Variables
Correlation
Generalization of Euclidean distance
Accounts for covariance
$$D_M(x) = \sqrt{(x - \mu)^{T} S^{-1} (x - \mu)}$$
$$x = (x_1, x_2, \ldots, x_n), \qquad \mu = (\mu_1, \mu_2, \ldots, \mu_n)$$
where $S$ is the covariance matrix.
Mahalanobis distance is a measure of divergence, or distance, between groups. It is used to measure the similarity of an unknown set of objects to a known set of objects. It accounts for correlations between variables and is independent of scale.
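A minimal NumPy sketch of the Mahalanobis distance, assuming the mean vector μ and covariance matrix S are estimated from a known group of objects (the data and function names are hypothetical). With S equal to the identity matrix it reduces to Euclidean distance, and rescaling a variable leaves the distance unchanged.

```python
import numpy as np

def mahalanobis(x, mu, S):
    """D_M(x) = sqrt((x - mu)^T S^{-1} (x - mu)); solves S y = (x - mu) instead of inverting S."""
    diff = np.asarray(x, dtype=float) - np.asarray(mu, dtype=float)
    return float(np.sqrt(diff @ np.linalg.solve(np.asarray(S, dtype=float), diff)))

def mahalanobis_to_group(x, group):
    """Distance from a new object x to a known group (rows = objects, columns = variables)."""
    group = np.asarray(group, dtype=float)
    return mahalanobis(x, group.mean(axis=0), np.cov(group, rowvar=False))

known = [[1, 2], [2, 3], [3, 5], [4, 4], [5, 7]]   # hypothetical known group
print(mahalanobis([3, 4], [0, 0], np.eye(2)))       # identity S: Euclidean distance, 5.0
print(mahalanobis_to_group([6, 5], known))
```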
Which Algorithm to Select?
Partitioning Methods
– Partitions of the Space
– Partitions around Medoids
– Fuzzy Analysis
[Figure: fuzzy analysis, object X lying among clusters A, B, and C; other points labeled 1, 2, or 3 by cluster]
“X is 90% associated with A, 5% with B, and 5% with C”
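The fuzzy statement about X can be reproduced in spirit with the membership rule from fuzzy c-means (Bezdek), swapped in here for illustration; Kaufman and Rousseeuw's fuzzy method, FANNY, computes memberships by minimizing a different objective. The cluster centers and the fuzzifier m = 2 are assumptions, so the percentages will not match the 90/5/5 split on the slide.

```python
import math

def fuzzy_memberships(x, centers, m=2.0):
    """Fuzzy c-means membership of object x in each cluster.

    u_k = 1 / sum_j (d_k / d_j) ** (2 / (m - 1)), where d_k is the distance from x to
    center k and m > 1 is the fuzzifier. The memberships are non-negative and sum to 1.
    """
    d = {name: math.dist(x, c) for name, c in centers.items()}
    for name, dk in d.items():
        if dk == 0:  # x sits exactly on a center: full membership there
            return {other: 1.0 if other == name else 0.0 for other in centers}
    p = 2.0 / (m - 1.0)
    return {k: 1.0 / sum((d[k] / d[j]) ** p for j in d) for k in d}

centers = {"A": (0.0, 0.0), "B": (6.0, 0.0), "C": (0.0, 6.0)}  # hypothetical cluster centers
for name, u in fuzzy_memberships((1.0, 1.0), centers).items():
    print(f"X is {u:.0%} associated with {name}")
```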
Which Algorithm to Select?
Hierarchical Methods
– Agglomerative
– Divisive
Example: A taxonomy that breaks down types of species.
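Agglomerative clustering starts with every object in its own group and repeatedly merges the two closest groups. A minimal pure-Python sketch using single linkage (the distance between the closest pair of members), which is only one of several possible linkage rules; the data and function name are hypothetical, and the divisive direction is not shown.

```python
import math

def agglomerate(points, n_clusters):
    """Single-linkage agglomerative clustering; returns lists of object indices."""
    clusters = [[i] for i in range(len(points))]   # every object starts alone

    def linkage(c1, c2):
        # Single linkage: distance between the closest pair of members.
        return min(math.dist(points[i], points[j]) for i in c1 for j in c2)

    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest linkage distance.
        a, b = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[a].extend(clusters[b])   # merge cluster b into cluster a
        del clusters[b]                   # b > a, so index a is unaffected
    return [sorted(c) for c in clusters]

points = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]  # hypothetical objects
print(agglomerate(points, 2))  # [[0, 1, 2, 3], [4]]
```

Recording the order of the merges, rather than stopping at a fixed number of clusters, yields the tree that a taxonomy resembles.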
Principal Component Analysis
• Principal component analysis is a linear
transformation from the original coordinate
system in p-dimensional space to a new orthogonal
coordinate system
• The new coordinate system is oriented along the
directions of maximum variation of the data in
the original coordinate system
• The variables in the new coordinate system
are uncorrelated
Principal Component Analysis
• Variable-directed technique
• Will the first few components account for most of the
variation?
• The goal is to describe the data with a smaller number
of variables, hence a data reduction method
• The coordinates of the transformation are constructed
from the correlation matrix
• No assumptions are made about the distribution of the
original data
• Interpreting the data in the transformed space is
an art…
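A minimal NumPy sketch of PCA built from the correlation matrix, as described above: standardize the variables, take the eigenvectors of the correlation matrix as the new orthogonal axes, and project the data onto them. The data and function name are hypothetical. The eigenvalues sum to the number of variables p, so each eigenvalue divided by p is the share of variation that component accounts for, and the component scores are uncorrelated.

```python
import numpy as np

def pca_correlation(X):
    """PCA from the correlation matrix.

    Returns eigenvalues (largest first), loadings (one column per component),
    and scores (the data expressed in the new orthogonal coordinate system).
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardize each variable
    R = np.corrcoef(X, rowvar=False)                   # p x p correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)               # symmetric matrix, ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Z @ eigvecs
    return eigvals, eigvecs, scores

# Hypothetical data: 10 objects, 3 variables (the first two strongly correlated).
X = [[2.5, 2.4, 1.0], [0.5, 0.7, 2.1], [2.2, 2.9, 0.9], [1.9, 2.2, 1.5], [3.1, 3.0, 0.3],
     [2.3, 2.7, 1.2], [2.0, 1.6, 1.4], [1.0, 1.1, 2.0], [1.5, 1.6, 1.8], [1.1, 0.9, 2.2]]
eigvals, loadings, scores = pca_correlation(X)
print("share of variation:", np.round(eigvals / eigvals.sum(), 3))
```

If the first one or two shares are large, the objects can be described with that many component scores instead of all p original variables.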
Principal Component Analysis
[Figure: original data in C3 (axes x1, x2, x3) with principal component axes p1 and p2; label: "Can examine in C1"]
Example 1: Software Development
The principal component index “PC1Effort” is associated with the clusters in the following way:
If -3021 < PC1Effort < -2219 then the Level of Effort is associated with Cluster 1 – Low Level of Effort
If -7085 < PC1Effort < -6063 then the Level of Effort is associated with Cluster 2 – Medium Level of Effort
If -11155 < PC1Effort < -9940 then the Level of Effort is associated with Cluster 3 – High Level of Effort
Verify and Validate the Model!
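The cluster rule above is a simple interval lookup. A Python sketch using the slide's PC1Effort boundaries; the slides do not show how PC1Effort is computed for a new project, so the score is taken as an input, and the function name is hypothetical. A score that falls between the reported ranges returns None, which is a natural point to apply the validation step rather than force an assignment.

```python
# PC1Effort intervals per cluster, copied from the slide (treated as open intervals).
EFFORT_CLUSTERS = [
    (-3021, -2219, 1, "Low"),
    (-7085, -6063, 2, "Medium"),
    (-11155, -9940, 3, "High"),
]

def classify_effort(pc1_effort):
    """Return (cluster number, level of effort), or None if the score is outside every range."""
    for low, high, cluster, level in EFFORT_CLUSTERS:
        if low < pc1_effort < high:
            return cluster, level
    return None

print(classify_effort(-2500))   # (1, 'Low')
print(classify_effort(-4000))   # None: between the Low and Medium ranges
```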
Example 1: Software Development
[Figure: clusters plotted along pc1, labeled High, Med, and Low]
High Complexity = (400, 475) hours
Medium Complexity = (160, 190) hours
Low Complexity = (80, 90) hours
Verify and Validate the Model!
Example 2: Electronic
Product Defects
Expected Defects = High
Expected Defects = Med
Expected Defects = Low
Example 2: Electronic
Product Defects
PC1defects = -2.75*components - 2.72*levels - 2.75*solder_joints
PC2defects = 0.007*components + 0.202*levels + 0.156*solder_joints
Cluster 1: PC1defects = (-723, -571) and PC2defects = (25, 34)
Cluster 2: PC1defects = (-1185, -1108) and PC2defects = (49, 53)
Cluster 3: PC1defects = (-1424, -1377) and PC2defects = (63, 65)
Verify and Validate the Model!
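The two index formulas and the cluster ranges can be applied to a new product as follows. The coefficients and ranges are copied from the slide; treating each range as an open interval and requiring both PC1defects and PC2defects to fall inside the same cluster's ranges are assumptions, as are the function names and the example inputs.

```python
# Coefficients and cluster ranges copied from the slide.
def pc_defect_scores(components, levels, solder_joints):
    """Compute the two principal component indices for one product."""
    pc1 = -2.75 * components - 2.72 * levels - 2.75 * solder_joints
    pc2 = 0.007 * components + 0.202 * levels + 0.156 * solder_joints
    return pc1, pc2

DEFECT_CLUSTERS = {
    1: ((-723, -571), (25, 34)),
    2: ((-1185, -1108), (49, 53)),
    3: ((-1424, -1377), (63, 65)),
}

def classify_defects(pc1, pc2):
    """Return the cluster whose PC1 and PC2 ranges both contain the scores, else None."""
    for cluster, ((lo1, hi1), (lo2, hi2)) in DEFECT_CLUSTERS.items():
        if lo1 < pc1 < hi1 and lo2 < pc2 < hi2:
            return cluster
    return None

print(classify_defects(-650, 30))   # 1
# Hypothetical product: PC1defects is about -577.2 (inside Cluster 1's range),
# but PC2defects is about 18.3 (below it), so no cluster matches.
print(classify_defects(*pc_defect_scores(components=100, levels=10, solder_joints=100)))  # None
```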
Example 2: Electronic
Product Defects
Expected Defects = High = (252, 317)
Expected Defects = Med = (180, 210)
Expected Defects = Low = (128, 150)
Recap of Agenda
• Introduction - John David Kendrick
• Overview of the Problem
• Background
– Cluster Analysis
– Principal Component Analysis
• Presentation of the Technique and Examples
Questions?
John David Kendrick
Business Process Management, Inc.
(480) 307-0541
jkendri274@earthlink.net
Thank You!
John David Kendrick
References:
1. Kaufman, Leonard and Rousseeuw, Peter J., Finding Groups in Data, ©1990, John Wiley & Sons, New York, New York, Chapters 1-5.
2. Johnson, Richard and Wichern, Dean, Applied Multivariate Statistical Analysis, ©1988, Prentice Hall, Englewood Cliffs, New Jersey, Chapter 8.