http://www.juran.com
Scatter Diagrams
A scatter diagram (also known as a scatter plot) is a graphic representation of the
relationship between two variables. It helps us visualize the apparent relationship
between two variables that are plotted in pairs. In the Six Sigma quality improvement
DMAIC methodology, scatter diagrams are usually used to explore relationships in the
Analyze Phase. They are used to help verify the potential root causes because the premise
is that a change in the cause (the X) will produce a change in the effect (the Y). Although
we would like to claim causation, based on a scatter diagram we can only claim
correlation.
The Analyze Phase in DMAIC is essentially a fact-based search for cause-effect
relationships based on the ideas formulated in the Measure Phase. We start with the
symptom of a problem—the measurable “effect” (the Y). Next, through the use of the
cause-effect diagram, we theorize about the possible “causes” (the Xs). Then we collect
data and search for those possible causes that have the strongest influence on the effect. If
we can eliminate or control these causes, we will eliminate or control the effect; the
symptom and the problem will be gone.
While the cause-effect diagram helps a team develop theories about possible causes, the
scatter diagram helps them analyze data to verify or disprove those theories. The scatter
diagram is an ideal way to display data when an improvement team is trying to evaluate a
cause-effect relationship of paired Y and X data. Scatter diagrams are best suited to
paired data in which both the Y and the X are continuous. [Note: Scatter diagrams can
also be used with ranked data and certain discrete Xs, but we’ll discuss that another time.]
Because the data on cause-effect relationships almost always display variation, the scatter
diagram is better than a simple table of numbers for summarizing information. The
graphic nature of the scatter diagram helps a team to “see” the relationships between the
variables. To be successful in constructing and analyzing scatter diagrams you will need a
good theory, correctly paired data, accuracy, complete information, and representative
data. You must also be aware of the potential pitfalls including stratification, range of the
data, range of operation, effect of scale, numerical summaries, confounding factors,
correlation without physical understanding, and data problems.
Visual interpretation of scatter diagrams provides a useful, but sometimes limited,
analysis of the relationship between two variables. If a team is examining many
cause-effect relationships simultaneously, they may find it difficult to determine which has the
strongest correlation. Calculating the correlation coefficients provides a useful
enhancement to the scatter diagrams in these situations. This correlation coefficient is
known as Pearson’s r. In other cases, a team may need to have a more precise,
mathematical description of the relationship between the variables (i.e., finding the
descriptive equation for the “cause” variable to produce a desired “effect”). In these
situations, a regression analysis must be performed to enhance the scatter diagram.
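As a minimal sketch of what finding the descriptive equation involves, the snippet below fits a least-squares line Y = a + bX to a handful of paired values. The data values and the function name fit_line are made up for illustration and are not from this article; in practice a statistics package would do this calculation.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of X, and sum of cross-products of deviations.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all X values are identical; the slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical paired data (X = cause, Y = effect), for illustration only.
x_values = [1, 2, 3, 4, 5]
y_values = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = fit_line(x_values, y_values)
print(f"Y = {intercept:.2f} + {slope:.2f} * X")
```

For these made-up points the fitted line works out to roughly Y = 0.05 + 1.99 X. Such an equation should only be trusted within the range of X values that were actually observed.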
All Rights Reserved, Juran Institute, Inc.
Typical Patterns of Correlation are shown and described below:

Strong Positive: If one variable increases at the same time the other variable
increases, they are said to be positively correlated.

Strong Negative: If one variable decreases at the same time the other variable
increases, or vice versa, they are said to be negatively correlated.

Complex: The data points are scattered in a curved pattern. The shape may look like
a rainbow or an arch. The two variables are correlated, though not linearly. As X
increases, Y first increases, then it decreases (or vice versa).

Weak Relationships: A weak correlation does not necessarily mean that the factor
being studied is not a cause. It may simply be a weak cause or a cause that requires
the presence of another contributing factor to bring about the effect. In this latter case,
both the factor under study and the contributing factor are perfectly good causes; you
just need them both to be active simultaneously to get the effect.

No Relationship: The data points are scattered in a shapeless pattern. You can
conclude that the two variables are not correlated over the ranges for which the data
was collected.
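
The Complex pattern is also a case where a single number can mislead, which is one reason numerical summaries are listed among the pitfalls. As a small sketch with made-up values (not data from this article), the arch-shaped points below follow Y = 25 - (X - 5)^2 exactly, yet their linear correlation coefficient is zero. The helper name pearson_r is an illustrative choice.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


# Hypothetical arch-shaped data: Y rises, peaks at X = 5, then falls.
x_values = list(range(11))
y_values = [25 - (x - 5) ** 2 for x in x_values]

r = pearson_r(x_values, y_values)
print(f"r = {r:.3f}")  # the rising and falling halves cancel, so r is zero
```

Y is completely determined by X here, so a coefficient of zero says only that there is no straight-line relationship. This is why the scatter diagram should be examined before relying on the coefficient.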
Example:
A financial services company that serves the “middle market” of investors had a team
improve service to its customers in order to increase its market share of assets under
management. The team had already observed that there was wide variation among their
account executives in the amount of new business.
Now what do you think of the competing theories? What else should the team do?
For hands-on practice, the reader can copy and paste this data set into MINITAB®:
New Business   Number   Size
       10282       78    132
       12279      147     84
       16702      217     77
       10277      106     97
       10844      138     79
        9387      127     74
       15593      121    129
       12792       91    141
       13977      158     88
       10954      121     91
        8074       57    142
        6433      129     50
       16856      149    113
       17962      122    147
        7008       25    280
       15804      125    126
       14157       89    159
        9589       38    252
        6688      107     63
        9380       77    122
       16174      178     91
       11382       62    184
       17190      168    102
        5248       97     54
       13140      145     91
       18102      138    131
       13609      189     72
        6466       40    162
       12740      150     85
       12492      153     82
        8790       80    110
       11736       86    136
        8598      120     72
        8707       97     90
       14957      201     74
       10262       82    125
       12042       56    215
       11362      127     89
        6927      105     66
       12653      107    118
       13331       94    142
       12250      150     82
        9825      102     96
       13953      118    118
        4163       39    107
       12104       64    189
       13740       80    172
       12588       75    168
       12088      181     67
       10578       89    119
       12039      104    116
Using MINITAB®:
Select: Graph > Scatterplot
    Simple
    Y Variables: New Business
    X Variables: Number
Select: Graph > Scatterplot
    Simple
    Y Variables: New Business
    X Variables: Size
There appears to be a strong positive relationship (positive correlation) between New
Business and Number. There does NOT appear to be any relationship between New
Business and Size.
We can confirm this numerically next.
The most common way to measure association is using the Correlation Coefficient.
MINITAB® uses the Pearson Product Moment Correlation Coefficient.



The Correlation Coefficient r: Always falls between –1 and +1. It is positive when
one variable tends to increase as the other increases, and negative when one variable
tends to decrease as the other increases. A value near 0 indicates little or no linear
relationship.
A Positive Correlation: Occurs when the values of both variables move in the same
direction. As one goes up, so does the other. As one goes down, so does the other.
A Negative Correlation: Occurs when the value of one variable increases while the
other decreases.
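For reference, the standard definition of the Pearson product-moment correlation coefficient for n paired observations (x_i, y_i) is given below. The symbols \bar{x} and \bar{y}, denoting the sample means, are notation introduced here and do not appear in the original article.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

The numerator is positive when X and Y tend to sit on the same side of their means together and negative when they tend to sit on opposite sides, which matches the two descriptions above.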
We can test the statistical significance of the correlation.
The Correlation Test is based on the hypotheses:
Ho: There is no linear relationship between X and Y (the correlation is zero)
Ha: There is a linear relationship between X and Y (the correlation is not zero)
As such, the p-value may be used to evaluate the significance of the relationship:
If p-value ≤ α, reject the null (in other words, the relationship is significant).
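The usual test statistic for these hypotheses is t = r * sqrt(n - 2) / sqrt(1 - r^2), compared against a t distribution with n - 2 degrees of freedom. It is assumed here, not stated in this article, that this is the test behind the p-values MINITAB® reports. The sketch below computes t and, to stay within the Python standard library, approximates the two-sided p-value with a normal distribution, which is reasonable only for larger samples. The function names and the choice of α = 0.05 are illustrative; the values r = 0.578, r = 0.033, and n = 51 are the coefficients and row count from this article's example.

```python
import math
from statistics import NormalDist


def correlation_t_statistic(r, n):
    """t statistic for Ho: no linear relationship, given correlation r and sample size n."""
    if n < 3:
        raise ValueError("need at least three paired observations")
    if abs(r) >= 1:
        raise ValueError("r must lie strictly between -1 and +1")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def approx_two_sided_p(t):
    """Two-sided p-value from a normal approximation.

    Assumption: used only to avoid third-party libraries; the exact test
    uses a t distribution with n - 2 degrees of freedom.
    """
    return 2 * (1 - NormalDist().cdf(abs(t)))


alpha = 0.05  # significance level chosen for illustration
examples = [("New Business vs Number", 0.578), ("New Business vs Size", 0.033)]
for label, r in examples:
    t = correlation_t_statistic(r, 51)
    p = approx_two_sided_p(t)
    decision = "reject Ho" if p <= alpha else "fail to reject Ho"
    print(f"{label}: t = {t:.2f}, approx p = {p:.3f} -> {decision}")
```

With 51 pairs, a coefficient of 0.578 gives a t statistic of about 4.96, far beyond what chance alone would plausibly produce, while 0.033 gives a t statistic of about 0.23.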
In the example, select Stat > Basic Statistics > Correlation:
Select all 3 variables:
The printout in the Session window displays:
Correlations: New Business, Number, Size

          New Business    Number
Number           0.578
                 0.000

Size             0.033    -0.698
                 0.818     0.000

Cell Contents: Pearson correlation
               P-Value
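We can conclude that there is a statistically significant correlation between New Business
and Number (p-value = 0.000 as displayed to three decimal places, so reject the null
hypothesis), with a positive correlation coefficient of 0.578.
There is no statistically significant correlation between New Business and Size
(correlation coefficient of 0.033, p-value = 0.818, so we fail to reject the null hypothesis).
Both of these results confirm the scatter plots shown earlier.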
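To cross-check the MINITAB® printout outside MINITAB®, the three pairwise coefficients can be recomputed from the data set listed earlier in this article. This is a sketch in plain Python: the variable and function names are illustrative, and the values are transcribed from the data listing on the assumption that the three columns are paired row by row in the order shown. If the transcription is right, the results should land close to the printed 0.578, 0.033, and -0.698.

```python
import math

new_business = [
    10282, 12279, 16702, 10277, 10844, 9387, 15593, 12792, 13977, 10954,
    8074, 6433, 16856, 17962, 7008, 15804, 14157, 9589, 6688, 9380,
    16174, 11382, 17190, 5248, 13140, 18102, 13609, 6466, 12740, 12492,
    8790, 11736, 8598, 8707, 14957, 10262, 12042, 11362, 6927, 12653,
    13331, 12250, 9825, 13953, 4163, 12104, 13740, 12588, 12088, 10578,
    12039,
]
number = [
    78, 147, 217, 106, 138, 127, 121, 91, 158, 121,
    57, 129, 149, 122, 25, 125, 89, 38, 107, 77,
    178, 62, 168, 97, 145, 138, 189, 40, 150, 153,
    80, 86, 120, 97, 201, 82, 56, 127, 105, 107,
    94, 150, 102, 118, 39, 64, 80, 75, 181, 89,
    104,
]
size = [
    132, 84, 77, 97, 79, 74, 129, 141, 88, 91,
    142, 50, 113, 147, 280, 126, 159, 252, 63, 122,
    91, 184, 102, 54, 91, 131, 72, 162, 85, 82,
    110, 136, 72, 90, 74, 125, 215, 89, 66, 118,
    142, 82, 96, 118, 107, 189, 172, 168, 67, 119,
    116,
]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


r_nb_number = pearson_r(new_business, number)
r_nb_size = pearson_r(new_business, size)
r_number_size = pearson_r(number, size)

print(f"New Business vs Number: {r_nb_number:.3f}")
print(f"New Business vs Size:   {r_nb_size:.3f}")
print(f"Number vs Size:         {r_number_size:.3f}")
```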