
Multivariate Analysis of Manufacturing Data
by
Ronald Cao
Submitted to the Department of Electrical Engineering and Computer
Science
in partial fulfillment of the requirements for the degree of
Master of Engineering and Bachelor of Science in Electrical Engineering
at the
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
June 1997
© Massachusetts Institute of Technology 1997. All rights reserved.
The author hereby grants to MIT permission to reproduce and distribute
publicly paper and electronic copies of this thesis document in whole or in
part, and to grant others the right to do so.
Author: Department of Electrical Engineering and Computer Science, May 23, 1997

Certified by: David H. Staelin, Professor of Electrical Engineering, Thesis Supervisor

Accepted by: Frederic R. Morgenthaler, Chairman, Department Committee on Graduate Students
Multivariate Analysis of Manufacturing Data
by
Ronald Cao
Submitted to the Department of Electrical Engineering and Computer Science
on May 23, 1997, in partial fulfillment of the
requirements for the degree of
Master of Engineering and Bachelor of Science in Electrical Engineering
Abstract
With the advancement of technology, manufacturing systems have become increasingly complex. Currently, many continuous-time manufacturing processes are operated by a complicated array of computers which monitor thousands of control variables. It has become more
difficult for managers and operators to determine sources of parameter variation and to
control and maintain the efficiency of their manufacturing processes.
The goal of this thesis is to present a sequence of multivariate analysis techniques that can
be applied to the analysis of information-rich data sets from web manufacturing processes.
The focus is on three main areas: identifying outliers, determining relationships among variables, and grouping variables. The questions asked are 1) how to effectively separate outliers
from the main population? 2) how to determine correlations among variables or subprocesses? and 3) what are the best methods to categorize and group physically significant
variables within a multivariate manufacturing data set?
Results of various experiments focused on the above three areas include 1) both normalized Euclidean distance and principal component analysis are effective in separating the
outliers from the main population, 2) correlation analysis of Poisson-distributed defect densities shows the difficulties in determining the true correlation between varibles, and 3) both
principal component analysis with robust correlation matrix and principal component analysis with frequency-filtered variables are effective in grouping variables. Hopefully these
results can lead to more comprehensive research in the general area of data analysis of
manufacturing processes in the future.
Thesis Supervisor: David H. Staelin
Title: Professor of Electrical Engineering
Acknowledgments
It has been an incredibly meaningful and fulfilling five years at MIT. The following are just
some of the names of the people who have made tremendous contributions to my intellectual
development and my personal growth.
* My advisor, Professor David Staelin, who provided me with the guidance that I needed
on my research and thesis. He has inspired me with insightful ideas and thought-provoking concepts. In addition, he has given me the freedom to explore my ideas as
well as many valuable suggestions for experimentations.
* My lab partners: Junehee Lee, Michael Shwartz, Carlos Caberra, and Bill Blackwell.
Many thanks to Felicia Brady.
* Dean Bonnie Walters, Professor George W. Pratt, Professor Kirk Kolenbrander, Professor Lester Thurow, and Deborah Ullrich.
* All the friends I have made through my college life, especially my good friends David
Steel and Jake Seid and the brothers of Lambda Chi Alpha Fraternity.
* Most of all, I would like to thank my parents for their endless love and support. They
have been there through every phase of my personal and professional development.
Thank you!
Contents

1 Introduction
  1.1 Background
  1.2 Previous Work
  1.3 Objective
  1.4 Thesis Organization

2 Basic Analysis Tools
  2.1 Data Set
  2.2 Preprocessing
    2.2.1 Missing Data
    2.2.2 Constant Variables
    2.2.3 Normalization
  2.3 Outlier Analysis
    2.3.1 Definition
    2.3.2 Causes of Outliers
    2.3.3 Effects of Outliers
    2.3.4 Outlier Detection
  2.4 Correlation Analysis
  2.5 Spectral Analysis
  2.6 Principal Component Analysis
    2.6.1 Basic Concept
    2.6.2 Geometric Representation
    2.6.3 Mathematical Definition

3 Web Process 1
  3.1 Background
  3.2 Data
  3.3 Preprocessing
  3.4 Feature Characterization
    3.4.1 In-Line Data
    3.4.2 End-of-Line Data
  3.5 Correlation Analysis
    3.5.1 Streak-Streak Correlation
    3.5.2 Streak-Cloud Correlation
    3.5.3 Interpretation
  3.6 Poisson Distribution
    3.6.1 Method
    3.6.2 Results
  3.7 Principal Component Analysis
    3.7.1 PCA of the End-of-Line Data
    3.7.2 PCA of the In-Line Data
    3.7.3 Interpretation

4 Web Process 2
  4.1 Background
  4.2 Data
  4.3 Preprocessing
  4.4 Feature Characterization
    4.4.1 In-Line Variables
    4.4.2 Quality Variables
  4.5 Outlier Analysis
    4.5.1 Normalized Euclidean Distance
    4.5.2 Time Series Model - PCA
    4.5.3 Identifying Outlying Variables
  4.6 Variable Grouping
    4.6.1 Principal Component Analysis
    4.6.2 PCA with Robust Correlation Matrix
    4.6.3 PCA with Frequency-Filtered Variables

5 Conclusion and Suggested Work
List of Figures

2-1 (a) Plot in Original Axes (b) Plot in Transformed Axes
3-1 Time-Series Behavior of a Typical In-Line Variable
3-2 Cross-Web Position of Defects Over Time
3-3 Streak and Cloud Defects
3-4 The 10 Densest Streaks Over Time
3-5 Correlation Coefficients Between Streaks Using (a) standard time block, (b) double-length time block, (c) quadruple-length time block
3-6 Correlation Coefficients Between Streak and Cloud with Time Blocks of Length 1 to Length 100
3-7 (a) Cloud Distribution Using Fixed-Length Time Blocks, (b) Ideal Poisson Distribution Using the Same Fixed-Length Time Blocks
3-8 Distributions of Cloud Defects Using x2, x4, and x6 Time Blocks
3-9 Ideal Poisson Distributions Using x2, x4, and x6 Time Blocks
3-10 First 3 Principal Components of the End-of-Line Data
3-11 Percent of Variance Captured by PCs
3-12 First 4 PCs of the In-Line Data
4-1 Ten Types of In-Line Variable Behavior
4-2 Normalized Euclidean Distance
4-3 Outlier and Normal Behavior Based on Normalized Euclidean Distance
4-4 First Ten Principal Components of Web Process 2 Data Set
4-5 High-Pass Filters with Wp=0.1 and Wp=0.3
4-6 First Ten Principal Components from 90% High-Pass Filtered Data
4-7 First Ten Principal Components from 70% High-Pass Filtered Data
4-8 Variables Identified that Contribute to Transient Outliers in Regions 1 and 4
4-9 The First Principal Component and the Corresponding Eigenvector from Process 2 Data
4-10 The First Ten Principal Components from 738 Variables
4-11 The First Ten Eigenvectors from 738 Variables
4-12 First 10 Eigenvectors of 738 Variables
4-13 Histograms of the First 10 Eigenvectors of 738 Variables
4-14 Magnitude of Correlation Coefficients of 738 Variables in Descending Order in (a) Normal Scale, (b) Log Scale
4-15 First 10 Eigenvectors Calculated from Robust Correlation Matrix with Cutoff = 0.06
4-16 Histograms of the First 10 Eigenvectors Calculated from Robust Correlation Matrix with Cutoff = 0.06
4-17 First 10 Eigenvectors Calculated from Robust Correlation Matrix with Cutoff = 0.10
4-18 Histograms of the First 10 Eigenvectors Calculated from Robust Correlation Matrix with Cutoff = 0.10
4-19 First 10 Eigenvectors Calculated from Robust Correlation Matrix with Cutoff = 0.15
4-20 Histograms of the First 10 Eigenvectors Calculated from Robust Correlation Matrix with Cutoff = 0.15
4-21 First 10 Eigenvectors Calculated from Robust Correlation Matrix with Cutoff = 0.18
4-22 Histograms of the First 10 Eigenvectors Calculated from Robust Correlation Matrix with Cutoff = 0.18
4-23 A Comparison of Eigenvectors Calculated from (a) Original Correlation Matrix, (b) Robust Correlation Matrix with Cutoff = 0.06, (c) Robust Correlation Matrix with Cutoff = 0.10, (d) Robust Correlation Matrix with Cutoff = 0.15, (e) Robust Correlation Matrix with Cutoff = 0.18
4-24 A Comparison of Histograms of the Eigenvectors
4-25 (a) High-Pass Filter with Wp = 0.3, (b) Band-Pass Filter with Wp = [0.2, 0.4]
4-26 First 10 Eigenvectors Calculated from High-Pass Filtered (Wp=0.1) Variables
4-27 Histograms of First 10 Eigenvectors Calculated from High-Pass Filtered (Wp=0.1) Variables
4-28 First 10 Eigenvectors Calculated from High-Pass Filtered (Wp=0.3) Variables
4-29 Histograms of First 10 Eigenvectors Calculated from High-Pass Filtered (Wp=0.3) Variables
4-30 First 10 Eigenvectors Calculated from High-Pass Filtered (Wp=0.4) Variables
4-31 Histograms of First 10 Eigenvectors Calculated from High-Pass Filtered (Wp=0.4) Variables
4-32 First 10 Eigenvectors Calculated from Band-Pass Filtered (Wp=[0.2, 0.4]) Variables
4-33 Histograms of First 10 Eigenvectors Calculated from Band-Pass Filtered (Wp=[0.2, 0.4]) Variables
4-34 First 10 Eigenvectors Calculated from Band-Pass Filtered (Wp=[0.2, 0.3]) Variables
4-35 Histograms of First 10 Eigenvectors Calculated from Band-Pass Filtered (Wp=[0.2, 0.3]) Variables
List of Tables

2.1 12 Observations of 2 Variables
Chapter 1
Introduction
1.1
Background
With the development of technology, manufacturing systems are becoming increasingly complex. A typical continuous-time manufacturing process may be controlled and monitored by thousands of parameters such as temperature and pressure. With higher customer standards and higher operating costs, manufacturing companies are constantly seeking new ways to increase efficiency and reduce cost.
The Leaders for Manufacturing (LFM) Program is a joint effort among leading U.S. manufacturing firms and both the School of Engineering and the Sloan School of Management
from the Massachusetts Institute of Technology. The goal of LFM is to identify, discover,
and translate into practice the critical factors that underlie world-class manufacturing. MIT
faculty and students and participating LFM companies have identified seven major themes
of cooperation.
These research themes are Product and Process Life Cycle, Scheduling
and Logistics Control, Variation Reduction, Design and Operation of Manufacturing Systems, Integrated Analysis and Development, Next Generation Manufacturing, and Culture,
Learning, and Organizational Change.
The research and analysis presented in this thesis is directly related to Leaders For
Manufacturing Research Group 4 (RG4) whose focus is variation reduction in manufacturing
processes. Understanding variations and methods to reduce them can help companies to
improve yields, reduce defects, decrease product cycle time, and generate higher quality
products.
In order to gain this understanding, RG4 attempts to answer questions such as 1) how to effectively determine which process parameters to monitor and control? 2) what are useful techniques for determining multivariate relationships among control and quality variables? and 3) how to best communicate results and findings to managers and engineers at the participating companies?
There are many types of manufacturing processes in industry. The types of processes that this thesis focuses on are referred to as web processes. The particular characteristic
associated with a web process is that the end product is in the form of sheets with the
appropriate thickness, width and length and can be packaged into rolls or sliced into sheets.
Although multivariate analysis methods are applied to two data sets collected from web
processes, most of the tools discussed in this thesis can also be applied to analyze data from
other types of processes.
1.2
Previous Work
My research builds on the work conducted by previous LFM RG4 research assistants. In his
Master's thesis titled "The treatment of Outliers and Missing Data in Multivariate Manufacturing Data", Timothy Derksen developed strategies for dealing with outliers and missing data in large multivariate manufacturing data sets.[2] He compared the effectiveness of statistics based on standard versus robust estimates of the mean, standard deviation, and the
correlation matrix in separating the outliers from the main population.
In addition, he
developed maximum likelihood methods to treat missing data in large multivariate manufacturing data. Mark Rawizza's "Time-Series Analysis of Multivariate Manufacturing Data
Sets" [4] discussed various data analysis tools used in engineering and applied them to manufacturing data sets. He used fundamental preprocessing and data-reduction techniques such
as principal component analysis to present and reorganize manufacturing data. Furthermore,
he experimented with ARMA models and neural networks to assess the predictability of data
sets collected from both web processes and wafer processes.
1.3
Objective
The objective of this thesis is to apply a series of multivariate techniques to analyze information-rich data sets from continuous web manufacturing processes. In particular, much of the analysis is based on having an understanding of the physics behind the manufacturing process. Combining multivariate analysis tools with an understanding of the underlying physics can produce results and insights that can be very valuable to company managers.
The questions asked in this thesis are: 1) how to effectively separate outliers from the
main population?
2) how to determine relationships among variables and subprocesses?
and 3) what are the best methods to categorize and group variables within an information-rich multivariate data set? Results of various experiments focused on these three areas are
discussed. Hopefully they can lead to more comprehensive research in the general area of
data analysis of manufacturing processes in the future.
1.4
Thesis Organization
This thesis is divided into four major sections. Chapter 2 presents an overview of the major
multivariate analysis tools and methods used in the rest of the thesis. These tools deal with
preprocessing of the original data, outlier identification and analysis, correlation analysis,
spectral analysis, and principal component analysis.
Chapter 3, the second major section, utilizes the multivariate tools presented in Chapter
2 to analyze a web manufacturing data set. With the data set divided into in-line variables
and quality variables, the objective is to perform multivariate analysis on these two sets
of variables separately and to determine multivariate linear relationships between them. In
addition, correlation analysis is performed on Poisson-distributed defect densities.
Chapter 4, the third section, applies the basic tools to analyze a data set from a different manufacturing web process, where the in-line variables and the quality variables are
not identified. The analysis focuses on experimenting with ways to more effectively separate variables utilizing principal components. Experimental results show that PCA with
robust correlation coefficients and PCA with frequency-filtered variables are more effective
in grouping and identifying the variables that are correlated with each other.
Chapter 5, the final section, summarizes the important insights gained and suggests
possible areas of continued research.
Chapter 2
Basic Analysis Tools
2.1
Data Set
A data set contains information about a group of variables. The information is the values
of these variables for different times or situations. For example, we might have a data set
that consists of weather information for 50 states. There might be 20 variables such as
rainfall, average temperature, and dew point temperature, and 50 observations of these 20
variables representing the 50 states. The data set might also be the same 20 variables and
50 observations representing daily measurements of each of these 20 variables for one state
over 50 days. Both data sets can be represented as an m x n matrix, where m = 50 is the
total number of observations and n = 20 is the total number of variables.
The data sets used in this thesis are recorded from continuous-time web manufacturing systems. A typical data set may consist of measurements of thousands of variables for thousands of observations recorded over days. The variables can be categorized as either in-line variables or end-of-line variables. The in-line variables of a manufacturing system control
and monitor the operation of the manufacturing process. Some typical in-line variables are
temperature, pressure, volume, speed, and so on. Furthermore, end-of-line variables, also
referred to as quality variables, provide managers and technicians with information on the
quality of the end product of the manufacturing process. Some typical quality variables are
defect size, defect location, thickness and strength.
2.2
Preprocessing
Preprocessing the data is an integral part of data analysis. Very rarely can large new data
sets be used unaltered for multivariate analysis. The following are three major parts of
preprocessing:
2.2.1
Missing Data
Within a raw manufacturing data set, very rarely are all the observations complete, especially when measurements are collected over days. Often, parts of machines or subprocesses
are shut down for maintenance or testing purposes. As a result, certain parameters are not or
cannot be recorded. These missing observations need to be treated before any multivariate
analysis. Timothy Derksen, in "The treatment of Outliers and Missing Data in Multivariate
Manufacturing Data", investigated methods of detecting, characterizing, and treating missing data in large multivariate manufacturing data sets.[2] In general, if a variable has most
of its observations missing, the variable should be removed completely from the data set.
Otherwise, the missing observations can be estimated using the EM algorithm described by
Little.[6]
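As an illustration only, the following sketch (Python with NumPy; the array names are hypothetical) performs a simple iterative, regression-based imputation. It is a rough stand-in for, not an implementation of, the EM approach of Little cited above.

```python
import numpy as np

def impute_missing(X, max_frac_missing=0.5, n_iter=10):
    """Rough imputation sketch; not the EM algorithm of Little [6].
    X is an (m, n) array with np.nan marking missing entries."""
    X = np.array(X, dtype=float)
    keep = np.isnan(X).mean(axis=0) <= max_frac_missing   # drop mostly-missing variables
    X = X[:, keep]
    miss = np.isnan(X)
    X_filled = np.where(miss, np.nanmean(X, axis=0), X)   # start from column means
    for _ in range(n_iter):                               # refine by regressing on the others
        for j in range(X.shape[1]):
            if not miss[:, j].any():
                continue
            A = np.column_stack([np.delete(X_filled, j, axis=1), np.ones(len(X_filled))])
            coef, *_ = np.linalg.lstsq(A[~miss[:, j]], X_filled[~miss[:, j], j], rcond=None)
            X_filled[miss[:, j], j] = A[miss[:, j]] @ coef
    return X_filled, keep
```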
2.2.2
Constant Variables
Multivariate analysis allows for understanding variable behavior in a multi-dimensional
world. Any variables that are constant over time do not exhibit any information relevant to
multivariate analysis. As a result, variables that have zero variance should be removed from
the data set.
2.2.3
Normalization
For a web process data set that contains n variables and m observations, the n variables
can consist of both control and quality parameters such as temperature, pressure, speed,
thickness, density, and volume. Since all these variables are most likely measured in different
units, it is often very difficult to compare their relative values. To deal with this comparison
problem, normalization is applied to the variables.
For a given m x n matrix with i = 1, 2, ..., m observations and j = 1, 2, ..., n variables, where the value of the ith observation of the jth variable is denoted as X_{ij}, the corresponding value in the normalized data set is denoted as Z_{ij}. Normalization is commonly defined as the following:

Z_{ij} = \frac{X_{ij} - \bar{X}_j}{\sigma_j}    (2.1)

where

\bar{X}_j = \frac{1}{m} \sum_{i=1}^{m} X_{ij}    (2.2)

\sigma_j = \sqrt{\frac{1}{m-1} \sum_{i=1}^{m} (X_{ij} - \bar{X}_j)^2}    (2.3)

In words, to calculate the normalized Z_{ij} for any ith observation and jth variable, we take the corresponding value X_{ij}, subtract the mean \bar{X}_j of the jth variable, and divide the result by the standard deviation \sigma_j of the jth variable. In the rest of this thesis, a variable that is said to be normalized is normalized to zero mean (\bar{Z}_j = 0) and unit variance (\sigma_{Z_j}^2 = 1).
There are benefits and drawbacks with performing normalization before multivariate
analysis. The following are reasons for normalization:
* Normalization causes the variables to be unit-less. For example, if the unit of X_{ij} is meters, X_{ij} - \bar{X}_j is also in meters. When the result is divided by \sigma_j, also measured in meters, the final value Z_{ij} will be unit-less. As a result of normalization, variables originally measured in different units can be compared with each other.
* Normalization causes all the variables to be weighted equally. Since the normalized
variables are zero-mean and unit-variance, each variable is weighted equally in determining correlations among variables. Normalization is especially important before
performing multivariate analyses such as principal component analysis, because it gives
each variable equal importance. More on normalization and principal component analysis will be discussed in Section 2.6.
* Normalization is a way of protecting proprietary information inherent in the original
data. By taking away the mean and reshaping the variance, the information that is
proprietary can be removed. Protecting proprietary information is a very important
part of LFM's contract with its participating companies.
The following is one of the drawbacks of normalization:
* Normalization may increase the noise level. Since normalizing causes all the variables
to have unit variance, it is likely that some measured noise will be scaled so that it rivals
the more significant variables. As a result, normalization may distort the information
in the original data set by increasing the noise level.
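The preprocessing steps above can be summarized in a short sketch (Python with NumPy; the raw data matrix name is hypothetical): constant variables are dropped and the remaining ones are normalized according to Equations 2.1-2.3.

```python
import numpy as np

def preprocess(X):
    """Drop zero-variance (constant) variables, then normalize each remaining
    column to zero mean and unit variance (Equations 2.1-2.3)."""
    X = np.asarray(X, dtype=float)
    sigma = X.std(axis=0, ddof=1)            # sample standard deviation sigma_j
    keep = sigma > 0                         # constant variables carry no information
    Z = (X[:, keep] - X[:, keep].mean(axis=0)) / sigma[keep]
    return Z, keep

# usage on an m x n raw data matrix (hypothetical name):
# Z, kept_columns = preprocess(raw_data)
```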
2.3
Outlier Analysis
The detection and treatment of outliers is an important preliminary step before performing statistical analysis. This section defines outliers, names the causes and effects of outliers, and presents
some univariate and multivariate tools of detecting outliers.
2.3.1
Definition
Outliers are defined as a set of observations that are inconsistent with the rest of the data.
It is very important to understand that outliers are defined relative to the main population.
2.3.2
Causes of Outliers
The following are the causes of outliers:
1. Extreme members - Since manufacturing data consist of variables recorded over thousands of observations, it is possible that some observations can occasionally exhibit
extreme values.
2. Contaminants - These are observations that should not be grouped with the main
population. For example, if the main population is a set of observations consisting the
weight of apples, the weight of an orange is considered a contaminant if it is placed
in the same group. In a manufacturing process, a contaminant can be an observation
made while a machine is broken amidst observations made while the machine is properly
operating.
2.3.3
Effects of Outliers
Statistical analysis without the removal of outliers can produce skewed and misleading results. Outliers can potentially drastically alter the sample mean and variance of a population. In addition, outliers, especially contaminants, can incorrectly signal the occurrence
of extreme excursions in a manufacturing process when the process is actually operating
normally.
2.3.4
Outlier Detection
For a data set with n variables and m observations, a potential outlier is a point that
lies outside of the main cluster formed by the general population. The following are some
methods used to determine outliers.
Univariate Method
A univariate method of detecting outliers is to calculate the number of standard deviations an observation lies from the mean:

z_{ij} = \frac{x_{ij} - \bar{x}_j}{\sigma_j}    (2.4)

where x_{ij} is the value of observation i and variable j, \bar{x}_j is the sample mean of variable j, and \sigma_j is the sample standard deviation of variable j. Observations where |z_{ij}| > K_j, where K_j is a constant for variable j, can be categorized as outliers. Depending on the range of the values of observations for each variable, the value of K_j can be adjusted. To determine gross outliers, the value of K_j can be set to be large.
Multivariate Methods
Equation 2.4 can be extended so that it represents a multivariate measure of the distance of all the variables away from the origin. Observations where |z_{ij}| > K for some variable j, where K is a constant, can be treated as points lying outside of an n-dimensional cube centered on the sample mean. This multivariate method is very similar to the univariate one, except the value of K is constant for all variables. Similarly, this method can be effective in determining gross outliers, but in a manufacturing environment where most of the variables are correlated, this method is limited in its effectiveness in identifying outliers.
A more robust multivariate method to detect outliers involves calculating the Euclidean distance to the origin of the n-dimensional space after all n variables are normalized to zero mean and unit variance. The square of the normalized Euclidean distance is defined as the following:

d_i^2 = \sum_{j=1}^{n} \frac{(x_{ij} - \bar{x}_j)^2}{s_j^2}    (2.5)

where x_{ij} is the value of observation i for variable j, \bar{x}_j is the sample mean of variable j, and s_j^2 is the sample variance of variable j. Observations with d_i^2 > K, where K is a constant, lie outside of an ellipsoid centered around the origin and are considered outliers.
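A minimal sketch of this method, assuming a data matrix X with observations in rows (Python with NumPy); the threshold K times n used here is purely illustrative, since the constant must be chosen for the data at hand.

```python
import numpy as np

def normalized_euclidean_outliers(X, K=3.0):
    """Flag outlying observations using the squared normalized Euclidean
    distance of Equation 2.5.  The threshold K * n is illustrative only."""
    X = np.asarray(X, dtype=float)
    z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # normalized variables
    d2 = (z ** 2).sum(axis=1)                          # d_i^2 for each observation
    return d2, d2 > K * X.shape[1]                     # distances and outlier flags
```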
2.4
Correlation Analysis
In a multivariate manufacturing environment, it is often desirable to measure the linear
relationship between pairs of variables or among groups of variables. By understanding these
relationships among variables, managers can gain insights into the manufacturing process.
One method of determining the linear relationship between variables is to calculate their
covariance and correlation.
Given two variables i and j, with m observations, the sample covariance s_{ij} measures the linear relationship between the two variables and is defined as the following:

s_{ij} = \frac{1}{m-1} \sum_{k=1}^{m} (x_{ki} - \bar{x}_i)(x_{kj} - \bar{x}_j)    (2.6)
For n variables, the sample covariance matrix S = (sij) is the matrix of sample variances
and covariances of combinations of the n variables:
S = (s_{ij}) = \begin{pmatrix} s_{11} & s_{12} & \cdots & s_{1n} \\ s_{21} & s_{22} & \cdots & s_{2n} \\ \vdots & \vdots & & \vdots \\ s_{n1} & s_{n2} & \cdots & s_{nn} \end{pmatrix}    (2.7)
where the diagonal of S contains the sample variances of the n variables, and the rest of the matrix contains the sample covariances of all pairs of variables. The covariance of the ith and jth variables, s_{ij}, is defined by Equation 2.6, and the variance of the ith variable, s_{ii} = s_i^2, is defined as the following:

s_{ii} = s_i^2 = \frac{1}{m-1} \sum_{k=1}^{m} (x_{ki} - \bar{x}_i)^2    (2.8)
Since the covariance depends on the scale of measurement of variable i and j, it is difficult
to compare covariances between different pairs of variables. For example, if we change the
unit of a variable from meters to miles, that covariance will also change. To solve this
problem, we can normalize the covariance by dividing by the standard deviations of the two
variables. The normalized covariance is called a correlation.
The sample correlation matrix R can be obtained from the sample covariance matrix and is defined as:

R = \begin{pmatrix} 1 & r_{12} & \cdots & r_{1n} \\ r_{21} & 1 & \cdots & r_{2n} \\ \vdots & \vdots & & \vdots \\ r_{n1} & r_{n2} & \cdots & 1 \end{pmatrix}    (2.9)

where r_{ij}, the sample correlation coefficient of the ith and jth variables, is defined as the following:

r_{ij} = \frac{s_{ij}}{\sqrt{s_{ii}}\,\sqrt{s_{jj}}} = \frac{s_{ij}}{s_i s_j}    (2.10)
Since the correlation of a variable with itself is equal to 1, the diagonal elements of matrix R in Equation 2.9 are all 1s. In addition, note that if the variables are all normalized to unit variance such that s_{ii} = 1 and s_{jj} = 1, then the correlation matrix R is equal to the covariance matrix S. Since most of the multivariate analysis discussed in this paper deals with normalized variables, R is often substituted for S.
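Equations 2.6-2.10 can be illustrated with a short sketch (Python with NumPy); the input is any m x n data matrix.

```python
import numpy as np

def covariance_and_correlation(X):
    """Sample covariance matrix S (Equation 2.7) and sample correlation
    matrix R (Equation 2.9) of an m x n data matrix X."""
    X = np.asarray(X, dtype=float)
    S = np.cov(X, rowvar=False, ddof=1)     # s_ij for all pairs of variables
    s = np.sqrt(np.diag(S))                 # sample standard deviations s_i
    R = S / np.outer(s, s)                  # r_ij = s_ij / (s_i * s_j)
    return S, R

# For variables already normalized to unit variance, R equals S, and R agrees
# with np.corrcoef(X, rowvar=False).
```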
2.5
Spectral Analysis
Fourier Transform
Fourier transforms can be an excellent tool to gain insight into the behavior of variables in the frequency domain. For the jth variable observed at times i = 1, ..., m, the Fourier transform is defined as:

X_j(e^{j\omega}) = \sum_{i=1}^{m} x_{ij}\, e^{-j\omega i}    (2.11)

Autocorrelation Function
The autocorrelation function looks at a variable's correlation with itself over time. A typical random signal is more correlated with itself over a short time lag than over a long time lag. The autocorrelation of variable x_j is:

R_{x_j}(\tau) = E[x_{ij}\, x_{(i-\tau)j}]    (2.12)

Power-Spectral Density
Power-Spectral Density (PSD) is the Fourier transform of the autocorrelation function of a random signal x_j(t):

P_{x_j}(\omega) = \mathcal{F}(R_{x_j}(\tau))    (2.13)

where \mathcal{F} is the Fourier transform operator and R_{x_j}(\tau) is the autocorrelation of the random signal x_j(t). For simplicity, the ensemble average of x_j(t) is assumed to be zero without any loss of generality. The calculation of the autocorrelation requires the ensemble average of x_j(t)\,x_j(t-\tau). Since our data consist of one sample sequence for each variable, this ensemble average is impossible to obtain. One technique to get around this problem is to assume the sequence is ergodic. Then the PSD is the magnitude squared of the Fourier transform:

P_{x_j}(\omega) = |\mathcal{F}(x_j(t))|^2    (2.14)
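Under the ergodicity assumption above, a minimal sketch of Equation 2.14 (Python with NumPy; a unit sampling interval is assumed) is:

```python
import numpy as np

def power_spectral_density(x):
    """PSD estimate of one variable as |F(x)|^2 (Equation 2.14), assuming the
    sequence is ergodic; the mean is removed and unit sample spacing is assumed."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    X = np.fft.rfft(x)
    psd = np.abs(X) ** 2 / len(x)                # periodogram estimate
    freqs = np.fft.rfftfreq(len(x), d=1.0)       # d is the sampling interval
    return freqs, psd
```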
2.6
2.6.1
Principal Component Analysis
Basic Concept
Principal components analysis (PCA) is a mathematical method for expressing a data set in
an alternative way. The method involves using linear combinations of the original variables to transform the data set onto a set of orthogonal axes. The main objective of principal
component analysis is two-fold: 1) data reduction, and 2) interpretation.
Principal components analysis is often referred to as data reduction rather than data
restatement, because it preserves the information contained in the original data in a quite
succinct way. Principal component analysis takes advantage of the relationship among the
variables to reduce the size of the data while maintaining most of the variance in the original
set. A data set with n variables and m observations can be reduced to a data set with k principal components and m observations, where k < n. In addition, since PCA transforms the original data into a new set of axes, it often reveals relationships that are buried in the original data set. As a result, PCA is a powerful tool in multivariate analysis.
2.6.2
Geometric Representation
Principal components analysis can be best understood in terms of geometric representation.
We can start with a simple two-dimensional example. Table 2.1 shows 12 observations of 2 variables, X_1 and X_2.

Observation:   1   2   3   4   5   6   7   8   9  10  11  12
X_1:           8   4   5   3   1   2   0  -1  -3  -4  -5  -8
X_2:           4   6   2  -2   3   0  -3  -2   2  -2  -6  -1

Table 2.1: 12 Observations of 2 Variables
Figure 2-1 represents the data in Table 2.1 using two different sets of axes. The points in Figure 2-1a are interpreted relative to the original set of axes, while the same points are interpreted relative to a new set of orthogonal axes in Figure 2-1b. The information is preserved as the axes are rotated.

Figure 2-1: (a) Plot in Original Axes (b) Plot in Transformed Axes
Similar to Figure 2-1b, principal components are defined as a transformed set of coordinate axes, obtained from the original data, that describe the information content of the data set. In a 2-dimensional data set, the first principal component is defined as the new axis that captures most of the variability of the original data set. The second principal component, perpendicular to the first one, is the axis that captures the second biggest variance.
The principal components are calculated in a minimal squared-distance sense. The distance
is defined as the perpendicular distance from the points to the candidate axis. The first
principal component is the axis where the sum of the squared-distance from the data points
to the axis is minimal among all possible candidates. The second principal component is
taken perpendicular to the first one, for which the sum of the squared distance is the second
smallest.
In a multivariate data set that extends over more than 2 dimensions, PCA finds directions in which the multi-variable data contain large variances (and therefore much information). The first principal component has the direction in which the data have the biggest variance. The
principal component has the direction in which the data have the biggest variance. The
direction of the second principal component is that with the biggest variance among the
directions which are orthogonal to the direction of the first principal component, and so on.
After a few principal components, the remaining variance of the rest is typically small enough
so that we can ignore them without losing much information. As a result, the original data
set with n dimensions (n variables) can be reduced to a new data set with k dimensions (k
principal components) where k < n.
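The rotation described above can be reproduced on the Table 2.1 data with a few lines of Python/NumPy; the eigenvectors of the 2 x 2 covariance matrix define the transformed axes of Figure 2-1b, and the leading eigenvalue shows how much of the variance a single principal component retains.

```python
import numpy as np

# The 12 observations of X1 and X2 from Table 2.1 (columns: X1, X2).
X = np.array([[ 8,  4], [ 4,  6], [ 5,  2], [ 3, -2], [ 1,  3], [ 2,  0],
              [ 0, -3], [-1, -2], [-3,  2], [-4, -2], [-5, -6], [-8, -1]], float)

Xc = X - X.mean(axis=0)                   # center the cloud of points
S = np.cov(Xc, rowvar=False)              # 2 x 2 sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(S)      # eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

Z = Xc @ eigvecs                          # coordinates on the rotated axes
print("variances along PC1, PC2:", eigvals)
print("fraction of variance on PC1:", eigvals[0] / eigvals.sum())
```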
2.6.3
Mathematical Definition
Principal component analysis takes advantage of the correlation among variables to find a new set of variables which capture most of the variation within the data set in as few dimensions as possible. The following is the mathematical definition [3]:
Given a data set with n variables and m observations, the first principal component must satisfy the following conditions:

1. z_1 is a linear function of the original variables:

z_1 = w_{11}X_1 + w_{12}X_2 + ... + w_{1n}X_n    (2.15)

where w_{11}, w_{12}, ..., w_{1n} are constants defining the linear function.

2. Scaling of the new variable z_1:

w_{11}^2 + w_{12}^2 + ... + w_{1n}^2 = 1    (2.16)

3. Of all the linear functions of the original variables that satisfy the above two conditions, pick the z_1 that has the maximum variance.
Consequently, the second principal component must satisfy the following conditions:

1. z_2 is a linear function of the original variables:

z_2 = w_{21}X_1 + w_{22}X_2 + ... + w_{2n}X_n    (2.17)

where w_{21}, w_{22}, ..., w_{2n} are constants defining the linear function.

2. Scaling of the new variable z_2:

w_{21}^2 + w_{22}^2 + ... + w_{2n}^2 = 1    (2.18)

3. z_1 and z_2 must be perpendicular:

w_{11}w_{21} + w_{12}w_{22} + ... + w_{1n}w_{2n} = 0    (2.19)

4. The values of z_2 must be uncorrelated with the values of z_1.

5. Of all the linear functions of the original variables that satisfy the above conditions, pick the z_2 that captures as much as possible of the remaining variance.
For a data set with n variables, there are a total of n possible principal components. Each component is a linear combination of the original set of variables, is perpendicular to the previously selected components, has values uncorrelated with those of the previous components, and explains as much as possible of the remaining variance in the data.
In summary,
z_1 = w_1'X = w_{11}X_1 + w_{12}X_2 + ... + w_{1n}X_n
z_2 = w_2'X = w_{21}X_1 + w_{22}X_2 + ... + w_{2n}X_n
  ...
z_n = w_n'X = w_{n1}X_1 + w_{n2}X_2 + ... + w_{nn}X_n    (2.20)

where the random variable X' = [X_1, X_2, X_3, ..., X_n] has a covariance matrix S with eigenvalue-eigenvector pairs (\lambda_1, e_1), (\lambda_2, e_2), ..., (\lambda_n, e_n), where \lambda_1 \ge \lambda_2 \ge ... \ge \lambda_n \ge 0. The principal components are the uncorrelated linear combinations z_1, z_2, z_3, ..., z_n whose variances, Var(z_i) = w_i'Sw_i, are maximized.
It can be shown that the principal components depend solely on the covariance matrix S of X_1, X_2, X_3, ..., X_n. This is a very important concept to understand. As described earlier, the axes of the original data set can be rotated by multiplying each X_i by an orthogonal matrix W:

z_i = WX_i    (2.21)

Since W is orthogonal, W'W = I, and the distance of X_i to the origin is unchanged:

z_i'z_i = (WX_i)'(WX_i) = X_i'W'WX_i = X_i'X_i    (2.22)

Thus an orthogonal matrix transforms X_i to a point z_i that is the same distance from the origin, with the axes rotated.
Since the new variables z_1, z_2, z_3, ..., z_n in z = WX are uncorrelated, the sample covariance matrix of z must be of the form:

S_z = \begin{pmatrix} s_{z_1}^2 & 0 & \cdots & 0 \\ 0 & s_{z_2}^2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & s_{z_n}^2 \end{pmatrix}    (2.23)

If z = WX, then S_z = WSW', and thus:

WSW' = \begin{pmatrix} s_{z_1}^2 & 0 & \cdots & 0 \\ 0 & s_{z_2}^2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & s_{z_n}^2 \end{pmatrix}    (2.24)
where S is the sample covariance matrix of X. In linear algebra, we know that given C'SC
= D, where C is an orthogonal matrix, S is a symmetric matrix, and D is a diagonal matrix,
the columns of the matrix C must be normalized eigenvectors of S. Since Equation 2.24
shows that orthogonal matrix W diagonalizes S, W must equal the transpose of the matrix
C whose columns are normalized eigenvectors of S. W can be written as the following:
W = \begin{pmatrix} w_1' \\ w_2' \\ \vdots \\ w_n' \end{pmatrix}    (2.25)

where w_i' is the ith normalized eigenvector of S, written as a row. The principal components are the transformed variables z_1 = w_1'X, z_2 = w_2'X, ..., z_n = w_n'X in z = WX. For example, z_1 = w_{11}X_1 + w_{12}X_2 + ... + w_{1n}X_n.
In addition, the diagonal elements in Equation 2.24 are the eigenvalues of S. Thus the eigenvalues \lambda_1, \lambda_2, ..., \lambda_n of S are the variances of the principal components z_i = w_i'X:

s_{z_i}^2 = \lambda_i    (2.26)

Since the eigenvalues of S are the variances of the principal components, the percentage of the total variance captured by the first k principal components can be represented as:

\% \text{ of Variance Captured} = \frac{\lambda_1 + \lambda_2 + \cdots + \lambda_k}{\sum_{i=1}^{n} s_{ii}}    (2.27)
The following is a summary of some interesting and useful properties of principal components (Johnson, Wichern, p. 342):
* Principal components are uncorrelated.
* Principal components have variances equal to the eigenvalues of the covariance matrix
S of the original data.
* The rows of the orthogonal matrix W correspond to the eigenvectors of S.
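These properties can be checked numerically with the following sketch (Python with NumPy; X is any hypothetical m x n data matrix), which computes the principal components from the eigenvectors of the covariance matrix as in Equations 2.20-2.26.

```python
import numpy as np

def pca(X):
    """Principal components from the eigenvectors of the covariance matrix S.
    Returns the scores z, the eigenvalues, and W whose rows are eigenvectors."""
    Xc = X - X.mean(axis=0)
    S = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(S)
    order = np.argsort(eigvals)[::-1]
    eigvals, W = eigvals[order], eigvecs[:, order].T   # rows of W are eigenvectors of S
    Z = Xc @ W.T                                       # z = W x for every observation
    return Z, eigvals, W

# Numerical checks of the properties above, for any m x n data matrix X:
#   np.cov(pca(X)[0], rowvar=False) is (nearly) diagonal -> the PCs are uncorrelated
#   its diagonal equals the eigenvalues                  -> variances = eigenvalues
#   W @ S @ W.T is the same diagonal matrix (Eq. 2.24)   -> rows of W are eigenvectors
```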
Chapter 3
Web Process 1
3.1
Background
The data set used in this chapter is collected from a continuous web manufacturing process
where more than 850 in-line control parameters are constantly monitored.
The end-of-line data comes from an optical scanner sensitive to small light-scattering defects where 8
important quality parameters are measured with high precision. In this chapter, some of the
analysis tools described in Chapter 2 are utilized to characterize the multivariate behavior
of the in-line data, the multi-variate behavior of the end-of-line data, and the statistical
relationship between the two.
3.2
Data
The data set from Web Process 1 consists of two major groups of variables: in-line variables
and end-of-line variables. The in-line data set consists of physical parameters that control
the production process, while the end-of-line data are parameters that indicate the quality
of the end product. The combined data set represents information for the manufacturing of
115 rolls of the end product.
The in-line data set contains 854 control parameters, measured approximately every 61
seconds for 4320 observations. The end-of-line data consist of 4836 measurements of 8 quality
parameters. The values of these quality parameters are collected by a real-time scanner that
sweeps across the web at constant frequency. One of the 8 quality variables is an indicator
of the type of defect that occurs at the end-of-line.
3.3
Preprocessing
As discussed in Chapter 2, raw data sets often need to be preprocessed before any multivariate analyses are performed. In the following two sections, techniques are applied to both
the in-line and end-of-line data in order to present more effectively the information in the
original data.
End-of-Line Data
Sometimes the values of variables are not numeric. As a result, the
information contained in the variables must be encoded before any analysis. How to encode
these non-numeric values depends on the type of information they convey. For example, the
end-of-line data contains a variable that categorizes the different types of defects. There are
a total of 8 defect types, and each one of them is simply assigned a numeric value.
In-Line Data
There are a total of 854 in-line parameters, measured approximately every
61 seconds for a period of 3 days. Of all these parameters, 194 variables are constant over
the entire period. These 194 variables can be discarded without any further investigation.
In addition, 222 variables are also eliminated, because they are simply averages of other
parameters. Consequently, 438 in-line parameters are left for analysis.
3.4
Feature Characterization
Before performing any multivariate analysis, much insight can be obtained from examining
the data in the time and space domain.
3.4.1
In-Line Data
The in-line variables show fluctuations over time. This means the physical process does not
remain steady. It tends to change "significantly" over time. The following is a plot of the
behavior of a typical in-line parameter over time.
Figure 3-1: Time-Series Behavior of a Typical In-Line Variable
3.4.2
End-of-line Data
The end-of-line data include the sizes, shapes, positions, and times of defects. Figure 3-2 is a
visual representation of the positions and times of the defects. The horizontal axis represents
the cross-web position, and the vertical axis represents the times when defects occur. Each
point on the graph represents a defect spot at a particular time and at a particular position
across the web. One can simply imagine Figure 3-2 as one big sheet of the end-product where
the defects location are marked by dots. If the web moves at a fairly constant speed, the
variable, time, on the vertical axis is highly correlated with down-web position of defects.
Figure 3-2: Cross-Web Position of Defects Over Time
Figure 3-2 shows a number of interesting features:
n"~
* Defects can be categorized into streaks and cloud. Some defects tend to occur at
the same cross-web position over time (streaks), while others appear to occur fairly
randomly on the web (cloud).
* Defects are significantly denser on the left side of the web than the right side.
* For certain periods of time, there are no defect occurrences across the web. These could represent 'perfect' times when the manufacturing process is running without any defects.
Perfect Observations
Figure 3-2 shows that there are certain periods of time where
no single defect occurs across the web. There are two possible scenarios that can explain
these 'perfect' observations: 1) At these observations, the manufacturing process is running
perfectly and all the control parameters are at their optimal levels. Consequently, there are
no defects. 2) These 'perfect' observations are simply the result of the process being shut
down and the scanner recording no information.
After some investigation, it was discovered that the manufacturing process is occasionally
shut down for various maintenance reasons, such as the cleaning of rollers, etc. Since the
scanner continues to operate during these periods, no defects are recorded on the web. As
a result, it was determined that the 'perfect' observations are simply contaminants that do
not have any physical significance.
Figure 3-3: Streak and Cloud Defects
Cloud and Streak Defects
Figure 3-3 shows that the end-of-line defects can be separated into streak defects and cloud defects. The threshold for differentiating between streak and cloud defects is 10 defect counts per unit distance across the web, where this distance is
cloud defects is 10 defects counts per unit distance across the web, where this distance is
0.1 percent of the web width. In other words, if there are more than 10 defect counts for
all times that occur within a certain unit distance block across the web, then these defects
within this block are counted as a defect streak. Defects within any blocks totaling less than
10 are categorized as cloud defects.
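For illustration, the separation rule above can be sketched as follows (Python with NumPy). The array of cross-web defect positions is a hypothetical input, and the bin count of 1000 simply reflects the 0.1-percent bin width.

```python
import numpy as np

def label_streak_or_cloud(cross_web_pos, web_width, threshold=10, n_bins=1000):
    """Label each defect as 'streak' or 'cloud' using the rule above: cross-web
    bins of width 0.1% of the web whose total defect count (over all times)
    exceeds `threshold` are treated as streaks."""
    edges = np.linspace(0.0, web_width, n_bins + 1)
    bin_idx = np.clip(np.digitize(cross_web_pos, edges) - 1, 0, n_bins - 1)
    counts = np.bincount(bin_idx, minlength=n_bins)     # defects per cross-web bin
    streak_bin = counts > threshold
    return np.where(streak_bin[bin_idx], "streak", "cloud")
```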
3.5
Correlation Analysis
In order to improve quality and reduce defect rate, an interesting question that a plant
manager might ask is "are the occurrences of streak and cloud defects related to some
common physical factors, or are they caused by separate physical phenomena?"
Figure 3-3 shows that these two types of defects can be clearly separated from each other
and seem to resemble two separate physical processes. The cloud defects seem to be fairly
randomly distributed, while the streak defects are concentrated on the left side of the web and
seem somewhat correlated. Correlation analysis is a good method to apply here to determine
the relationships between streaks and between streak and cloud defects. Understanding the
correlations among streaks and clouds is a good beginning to understanding the underlying
physics that causes the defects.
3.5.1
Streak-Streak Correlation
Figure 3-3 indicates that most streaks occur near the left edge of the web. This suggests
that streak defects are not randomly generated.
Some physical characteristic particular to the left side of the web could be causing the streak defects. To test this hypothesis,
the correlation coefficients between all 45 combinational pairs of the 10 densest streaks are
calculated. If the streaks are caused by some common factor, the correlation coefficients
between streak would be close to 1 or -1. Conversely, if the streaks are not caused by some
common factor, the correlation coefficient should be closer to 0.
Figure 3-4: The 10 Densest Streaks Over Time
Method
Figure 3-4 shows the ten densest streaks on the web, which are used for correlation
analysis. In order to calculate the correlation coefficients between streaks, each streak is
divided into approximately 257 time blocks. Consequently, a single streak can be represented
as a 257-element vector, where each element is the total defect count within each time block.
The correlation coefficient between any two 257-element vectors can be calculated using
Equation 2.10. Furthermore, each streak can also be divided into time blocks of other lengths. The same procedure can be applied to calculate the correlation coefficients of the streaks using time blocks of different lengths.
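A sketch of this procedure (Python with NumPy; the per-defect time stamps and streak labels are hypothetical input arrays): each streak is binned into time blocks and the correlation coefficients of the resulting count vectors are averaged over all streak pairs.

```python
import numpy as np

def streak_correlations(defect_times, streak_labels, n_blocks=257):
    """Correlation coefficients between defect streaks: each streak becomes a
    vector of defect counts per time block, and Equation 2.10 is applied to
    every pair (np.corrcoef).  Returns the full matrix and the mean over pairs."""
    edges = np.linspace(defect_times.min(), defect_times.max(), n_blocks + 1)
    streaks = np.unique(streak_labels)
    counts = np.vstack([np.histogram(defect_times[streak_labels == s], bins=edges)[0]
                        for s in streaks])
    R = np.corrcoef(counts)                          # one row of counts per streak
    iu = np.triu_indices(len(streaks), k=1)          # the 45 distinct pairs for 10 streaks
    return R, R[iu].mean()
```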
Results
Figure 3-5a shows the correlation coefficients for all 45 combinations of the 10
defect streaks using the standard-length time blocks. Although most of the streaks are
positively correlated, the average correlation coefficient is only 0.08652.
Figure 3-5b and Figure 3-5c show the correlation coefficients of the 10 defect streaks using
double-length and quadruple-length time blocks respectively. There is a small but steady
increase in the correlation coefficients as the length of the time blocks are increased. For
double-length time blocks the average correlation coefficient is 0.1054, and for quadruplelength time blocks, the average correlation coefficient is 0.122.
The increase in correlation coefficients with increasing time-block length indicates that while the streaks are somewhat correlated, most of the correlation is buried in high-frequency noise. As the time blocks lengthen, some of the high-frequency noise is filtered out, resulting in higher correlation coefficients. But due to the uncertainty of the signal-to-noise ratio of the process, the 'true' correlation coefficients between the streaks are still uncertain.

Figure 3-5: Correlation Coefficients Between Streaks Using (a) standard time block, (b) double-length time block, (c) quadruple-length time block
3.5.2
Streak-Cloud Correlation
This part of the analysis focuses on determining correlations between the cloud defects and
the streak defects. If the two types of defects are highly correlated, there should exist some
common process parameters that cause the occurrence of both streak and cloud defects. If
they are not highly correlated, the two types of defects are most likely caused by separate
process parameters.
In addition, comparing the correlation coefficients calculated using
different time-blocks can present some information as to the frequency range where the two
types of defects are most correlated or uncorrelated.
Results
By using time blocks varying from length 1 to length 100, the correlation coefficients are calculated between the streak and cloud defects. Figure 3-6 shows that the streak and cloud defects are positively correlated. For time blocks on the order of length 1 to length 20, the correlation coefficient is approximately 0.2. As the length of the time blocks increases to the order of 70 to 100, the correlation coefficient gradually increases to an average of approximately 0.35.

Figure 3-6: Correlation Coefficients Between Streak and Cloud with Time Blocks of Length 1 to Length 100
The positive correlations between the streak and cloud defects indicate that they are related to some common underlying physics. In addition, the analysis shows that the correlation
is higher when longer time blocks are used. This suggests that some of the high-frequency
noise is filtered out as the time block increases, resulting in a more accurate representation
of the correlation coefficients between the streaks and clouds.
3.5.3
Interpretation
The defect data show the difficulties in determining the correlation coefficients between two
processes when the true signal-to-noise ratio is unknown. For example, in section 3.5.1, it
is shown that the correlation coefficients between the streaks increase as the time blocks
are lengthened. Lengthening the time blocks, in effect, removed some of the high frequency
Poisson noise component, resulting in a more accurate representation of the correlation
coefficients. More analysis needs to be done to quantify the effect of Poisson noise on the
correlation coefficient so that the 'true' correlation coefficients between streaks and between
streaks and clouds can be identified.
3.6
Poisson Distribution
Figure 3-3a shows that cloud defects seem to be randomly generated and fairly evenly distributed across the web, suggesting a Poisson distribution. In this section, analyses are
performed to look at the distribution of cloud defects over time. The key is to find out
whether or not the cloud defects exhibit a Poisson distribution, and if they do, over what
frequency range.
3.6.1
Method
Similar to the method utilized for determining correlations between streak and cloud defects,
the entire set of cloud defects is divided into time blocks of a certain length. Once the total
cloud defect count is determined for each time block, a histogram of the total defect count in
each time block is presented and compared to a plot of a typical Poisson distribution. Time
blocks of different lengths can be used to determine if there are certain frequency ranges
where the cloud defects resemble Poisson distributions.
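A sketch of this method (Python with NumPy and SciPy; the defect-time array and block length are placeholders): bin the cloud defects into time blocks and compare the histogram of per-block counts with a Poisson distribution of the same mean.

```python
import numpy as np
from scipy.stats import poisson

def cloud_vs_poisson(defect_times, block_length):
    """Histogram of cloud-defect counts per time block compared with an ideal
    Poisson distribution having the same mean defect density."""
    edges = np.arange(defect_times.min(), defect_times.max() + block_length, block_length)
    counts = np.histogram(defect_times, bins=edges)[0]   # defects in each time block
    lam = counts.mean()                                  # average defects per block
    k = np.arange(counts.max() + 1)
    observed = np.bincount(counts, minlength=len(k))
    expected = poisson.pmf(k, lam) * len(counts)         # ideal Poisson counts
    return k, observed, expected
```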
Figure 3-7: (a) Cloud Distribution Using Fixed-Length Time Blocks, (b) Ideal Poisson Distribution Using the Same Fixed-Length Time Blocks
3.6.2
Results
A time block of a certain fixed length is selected to determine the histogram of the defect
density. For the selected standard length, the average number of cloud defects in each standard time block is approximately 2. Figure 3-7a shows the histogram of these cloud defects
using these fixed-length time blocks, and Figure 3-7b shows an ideal Poisson distribution
generated with the same average defect density using the same standard-length time blocks.
A comparison of Figure 3-7a and 3-7b shows that the cloud defect distribution does not
resemble a Poisson distribution for these standard-length time blocks.
Figure 3-8: Distributions of Cloud Defects Using x2, x4, and x6 Time Blocks
Figure 3-9: Ideal Poisson Distributions Using x2, x4, and x6 Time Blocks
Figure 3-8 presents the histograms of cloud defect density using 2 times, 4 times and
6 times the length of the standard time blocks used in Figure 3-7. Figure 3-9 shows the
ideal Poisson distributions with the same average defect density as the cloud distributions
generated using time blocks x2, x4, and x6 the standard length in Figure 3-7. A comparison
of Figure 3-8 to Figure 3-9 shows that the cloud defect density does not exhibit a Poisson distribution when measured using small time blocks. But as the length of the time
block is increased, the distribution of the cloud defects becomes similar to that of a Poisson
distribution.
3.7
Principal Component Analysis
As discussed in Section 2.6, principal component analysis (PCA), also referred to as the
Karhunen-Loeve transformation (KLT), is a powerful tool in multi-variable data analysis. In
a multi-variable data space, the number of variables that have to be observed simultaneously
can be enormous. As a result, PCA is applied to reduce the number of variables without
losing much information and to interpret the data using a different set of axes.
3.7.1
PCA of the End-of-Line Data
Principal component analysis is applied to 7 of the 8 end-of-line quality variables, excluding the variable that characterizes the defect type. Figure 3-10 shows the time-series behavior of the first 3 principal components. Figure 3-11 shows the accumulated variance captured as a function of the number of principal components used. One can see that approximately 90% of the information contained in the 7 end-of-line variables is captured by the first 3 principal components.
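The variance-capture calculation behind Figure 3-11 follows Equation 2.27. A minimal sketch (Python with NumPy, operating on normalized variables) returns the number of principal components needed to reach a target fraction such as 90%:

```python
import numpy as np

def components_for_variance(X, target=0.90):
    """Number of principal components needed to capture a target fraction of
    the total variance of the normalized variables (Equation 2.27)."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    eigvals = np.linalg.eigvalsh(np.cov(Z, rowvar=False))[::-1]   # descending
    captured = np.cumsum(eigvals) / eigvals.sum()
    return int(np.searchsorted(captured, target)) + 1, captured
```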
3.7.2
PCA of the In-Line Data
Principal component analysis is applied to 438 in-line variables. Figure 3-12 displays the
first 4 principal components of the in-line data. Changes in the principal components imply
that the production process fluctuates over time.
Figure 3-10: First 3 Principal Components of the End-of-Line Data
Figure 3-11: Percent of Variance Captured by PCs
Figure 3-12: First 4 PCs of the In-Line Data
3.7.3
Interpretation
A comparison of the two sets of principal components presented in Figure 3-10 and Figure 3-12 can reveal some interesting insight into the nature of the relationship between the
in-line and the end-of-line data. Since the principal components are another way of representing the process variables, fluctuations in the principal components indicate fluctuations
in the underlying process. As indicated before, the principal components of the in-line data
fluctuate noticeably in time. Assuming there exists a close relationship between the in-line
data and the end-of-line data, the principal components of the end-of-line data should also
show similar fluctuations. However, Figure 3-10 does not confirm this. Instead, the principal
components of the end-of-line data seem to behave completely independently of the principal components of the in-line data. As a result, PCA shows that there is no strong linear
relationship between the in-line and the end-of-line data from web process 1.
Chapter 4
Web Process 2
4.1
Background
Web process 2 is a multi-staged manufacturing system that takes raw materials and transforms them into the final product through a number of sequential subprocesses. Raw materials are introduced into the first stage, and after going through certain chemical and physical
changes, the desired output is produced. Next, the output from the first stage becomes the
input for the second stage. Again, under certain control parameters, input is transformed into
output. This process is repeated a number of times, as input turns into output, and output
becomes input. The output of the final stage of this multistaged manufacturing process
becomes the final product. It must be noted that the output from each stage can be very
different from the input. As a result, the final product or the output of the final stage is
often nothing like the initial input.
Each stage in this multi-staged manufacturing process can be treated as a subprocess.
Although the subprocesses can occasionally be shut down for maintenance or testing purposes, they are continuous processes with real-time controlling and monitoring parameters. But between these stages, there can be a certain amount of delay as material is transferred between subprocesses. The output of one stage often does not immediately become the input for the next stage. Due to various factors such as supplier power and customer demand, production within certain stages can be sped up or slowed down, resulting in delays between subprocesses. Understanding these delays can be important when performing multivariate data analysis.
4.2
Data
The data set for web process 2 contains 1518 variables recorded every time unit for a total of
2000 observations. These variables can be either control or monitor variables. As mentioned
in a previous chapter, control variables are also referred to as in-line variables, and monitor
variables can be called quality variables. Since the variables in the data set are arranged in
alphabetical order, the order of the data does not have any physical significance. In other
words, the variables from all the different subprocesses are scrambled together. In addition,
they are not separated into either in-line or quality variables nor are they grouped according
to subprocesses.
4.3
Preprocessing
A major fraction of the data set containing 1518 variables and 2000 observations is either corrupted or missing. After unwanted variables and observations are removed, the remaining working data set contains 1010 variables and 1961 observations, which is about 65 percent of the original data set. Next, variables whose sample variance is zero are deleted from the data set, because they contain no useful information. The remaining data set contains 1961 recordings of 860 variables, which include both control and quality parameters. Before applying multivariate analysis, the variables are all normalized to zero mean and unit variance using the methods discussed in Section 2.3.1.
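A minimal preprocessing sketch along these lines is shown below, assuming the raw table is available as a CSV file; the file name, the completeness threshold, and the pandas-based cleanup are illustrative assumptions, not the exact procedure used here.

    import pandas as pd

    df = pd.read_csv("web_process_2.csv")                 # hypothetical file name
    df = df.select_dtypes("number")                       # keep numeric variables only

    df = df.dropna(axis=1, thresh=int(0.9 * len(df)))     # drop heavily corrupted variables
    df = df.dropna(axis=0)                                # drop remaining bad observations
    df = df.loc[:, df.var() > 0]                          # remove zero-variance variables

    normalized = (df - df.mean()) / df.std()              # zero mean, unit variance
    print(normalized.shape)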
4.4
4.4.1
Feature Characterization
Quality Variables
Unlike web process 1, the quality variables for web process 2 do not record the actual physical
location of defects. Instead, the quality variables are various modified physical parameters.
As a result, no one figure can capture the quality of the final product.
4.4.2
In-Line Variables
Since there are many in-line variables, it would be very difficult to analyze them in depth one
by one. But a simple look at the behavior of individual variables over time could provide
some valuable insight before performing any multivariate analysis. The following are 10
typical types of behavior associated with the in-line variables:
Figure 4-1: Ten Types of In-Line Variable Behavior
These graphs present some interesting features with regards to the behavior of the manufacturing process. Each of the above 10 plots represents a group of variables with a particular
type of behavior. Almost all the in-line variables can be categorized into one of the ten types
of behavior described below:
1. Variable 1 - represents the set of variables whose values remain fairly constant except
for sharp transient outliers at certain observations.
2. Variable 2 - represents a set of variables that increase linearly and reset themselves
periodically.
3. Variable 3 - belongs to a group of variables that tend to remain constant for a period
of time before jumping to another value.
4. Variable 4 - generally low-frequency quantized behavior with sharp transient outliers.
5. Variable 5 - linear time-series behavior.
6. Variable 6 - high-frequency oscillatory behavior that drifts over time.
7. Variable 7 - high-frequency periodic behavior that is confined tightly within a certain
range.
8. Variable 8 - fairly random high-frequency behavior.
9. Variable 9 - high-frequency behavior with relatively small amplitudes compared to sharp
transient outliers.
10. Variable 10 - high-frequency behavior with a lower bound.
4.5
Outlier Analysis
As defined in Section 2.3.1, outliers are observations that are inconsistent with the rest of
the data. Identifying and understanding outliers in a manufacturing setting can be very
important to plant managers whose goals are to eliminate variation and to reduce defects.
The plant managers are interested in knowing the answer to the following questions:
1. Are the outliers simple extensions of the normal behavior?
2. If not, is there any physical significance behind the outliers?
3. If so, can the outliers be grouped according to their physical causes?
4.5.1
Normalized Euclidean Distance
The normalized Euclidean distance method, as explained in Section 2.3.4, is a good way to
identify outliers. In this case, this method is applied to 860 variables and 1961 observations.
The plot of the normalized Euclidean distance in Figure 4-2 shows that there are at least
two distinct populations in the data set. One group of observations, where the normalized
Euclidean distance is above approximately 1000, shows sharp and spiky behavior over time,
Figure 4-2: Normalized Euclidean Distance
while the other group of observations, where the normalized Euclidean distance is less than
1000, shows slow-moving and fairly constant time-series behavior.
In order to define outliers, it can be assumed that in a properly functioning multivariate manufacturing environment, all the process parameters operate within a certain normal range of values, both individually and collectively. Consequently, behavior outside this normal range can be categorized as outlier behavior contributed mostly by contaminants.
Figure 4-2 shows that an appropriate normal range of behavior can be defined as observations with normalized Euclidean distance less than 1000, and the outlier set corresponds to observations with normalized Euclidean distance greater than 1000. Figure 4-3 is a plot of these two separated groups: 1) the normal set, and 2) the outlier set. The time-series behaviors of the normalized Euclidean distance of these two sets do not seem to be extensions of each other. The normal set exhibits fairly constant and stable behavior, while the outlier set is transient and very unstable.
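A minimal sketch of this separation follows, assuming one plausible reading of the Section 2.3.4 definition (the distance of each standardized observation from the sample mean) and synthetic data in place of the real 860-variable set; the 1000 cutoff is the value quoted above.

    import numpy as np

    def normalized_euclidean_distance(X):
        # Distance of each observation from the mean, with every variable
        # scaled to unit variance (assumed form of the Section 2.3.4 measure)
        Xn = (X - X.mean(axis=0)) / X.std(axis=0)
        return np.sqrt((Xn ** 2).sum(axis=1))

    rng = np.random.default_rng(2)
    X = rng.normal(size=(1961, 860))                    # synthetic working data set

    d = normalized_euclidean_distance(X)
    threshold = 1000.0                                  # cutoff used in Figure 4-2
    normal_set = X[d < threshold]
    outlier_set = X[d >= threshold]
    print(len(normal_set), "normal observations,", len(outlier_set), "outliers")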
4.5.2
Time Series Model - PCA
In addition to normalized Euclidean distance, various time-series methods, such as principal
component analysis (PCA), can also be good methods for identifying outliers. PCA groups together variables that are correlated with each other. Since multivariate outliers are produced by sets of variables that exhibit similar 'outlying' behavior at certain times, PCA should
Figure 4-3: Outlier and Normal Behavior Based on Normalized Euclidean Distance
be able to group together the variables that contribute to the same outliers. PCA should be more effective in grouping outliers than the normalized Euclidean distance method, because it groups physically significant variables together rather than simply grouping the observations into two populations.
Principal Component Analysis
Figure 4-4 represents the first 10 principal components
calculated from the data set containing 860 variables and 1961 observations. Similar to the
plot of the normalized Euclidean distance in Figure 4-2, Figure 4-4 shows that the principal components also exhibit sharp discontinuities at certain observations where their values jump very sharply. A more careful look at Figure 4-4 shows that the first ten principal
components exhibit two major types of outliers.
* 1st type - Step outliers. This set of outliers is associated with the 1st and 3rd principal
components, where the values of the principal components between approximately
observation 970 and 1220 are substantially different than the values of most of the
other observations.
* 2nd type - Transient outliers. They are associated with the 3rd through the 10th
principal components, where the outlier values are very different from the rest of the
population only for very brief periods of time.
Figure 4-4: First Ten Principal Components of Web Process 2 Data Set
These two types of outliers seem to be controlled by two different sets of physical parameters. The first type of outlier takes place when the principal component jumps suddenly from a certain range of values to a different range of values, stays there for a period of time, and jumps back to the original range of values. The second type of outlier is a transient outlier that occurs abruptly and for brief periods of time. The contrasting behavior of these two
types of outliers indicates that they are controlled by separate underlying physical processes.
Looking at the 3rd through 10th principal components associated with transient outliers,
we can identify two distinct groups within the transient outlier set.
* The first group is associated with the 3rd, 4th, and 5th principal components, where
their values exhibit sharp changes at approximately observations 100, 1750, and 1950
occurring with similar relative proportions.
* The second group is associated with the 7th, 8th, 9th, and 10th principal components,
where their values change sharply at approximately observation 600.
PCA with Frequency Filtering Transient outliers occur when the values of the principal
components abruptly jump to a very different value for a short period of time before returning
to the original values. As discovered from Figure 4-4, there are two kinds of transient outliers
associated with the 3rd through 10th principal components. The figure shows that the 1st
kind of transient outliers is spread out over the 3rd, 4th, 5th, and 6th principal components,
while the second kind is spread out over the 7th, 8th, 9th, and 10th principal components. PCA collapses variables that behave similarly onto the same dimensions. But in this case, each of the two kinds of transient outliers is spread out over more than one dimension. One
hypothesis for this phenomenon is that the original data set is dominated by low-frequency
behavior. As a result, the first few principal components are also dominated by low-frequency
behavior, and the high-frequency transient outliers are not dominant enough to be grouped
to the same dimensions.
Since we know that the transient outliers are associated with high-frequency behavior,
one way of collapsing the transient outliers into a smaller number of dimensions is to perform
high-pass filtering on the original data set before applying PCA. The idea here is that with
the low-frequency components filtered out, PCA can group the high-frequency transient
outliers much more effectively.
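A hedged sketch of this preprocessing step is given below; the Butterworth high-pass filter stands in for the filters of Figure 4-5, whose exact design is not reproduced here, and the data matrix is synthetic.

    import numpy as np
    from scipy.signal import butter, filtfilt

    def highpass_filter(X, wp, order=4):
        # High-pass filter every column of X; wp is the normalized cutoff
        # frequency (1.0 = Nyquist), analogous to the Wp values in the text
        b, a = butter(order, wp, btype="highpass")
        return filtfilt(b, a, X, axis=0)

    rng = np.random.default_rng(3)
    X = rng.normal(size=(1961, 50))                     # synthetic normalized variables

    X_hp = highpass_filter(X, wp=0.1)                   # remove the lowest 10% of the band
    R = np.corrcoef(X_hp, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)
    idx = np.argsort(eigvals)[::-1]
    pcs = X_hp @ eigvecs[:, idx[:10]]                   # first ten principal components
    print(pcs.shape)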
Figure 4-5: High-Pass Filters with Wp=0.1 and Wp = 0.3
Figure 4-5 shows the graphs of two high-pass filters utilized to remove the low-frequency
components in the original variables. Figure 4-6 presents the first 10 principal components
Figure 4-6: First Ten Principal Components from 90% High-Pass Filtered Data
Figure 4-7: First Ten Principal Components from 70% High-Pass Filtered Data
calculated from variables with the lowest 10 percent frequency range filtered out. Figure 4-7
presents the first 10 principal components calculated from variables with the lowest 30%
frequency range being filtered out.
Figure 4-6 and Figure 4-7 show that PCA, obtained from high-pass filtered variables,
does remove the low-frequency behavior but does not effectively separate the different kinds
of outlier behavior. For the two kinds of transient outliers discussed in the previous section,
PCA with frequency filtering still spreads each one of them out over more than one principal
component.
The first kind of transient outlier appears in the 3rd, 4th, 8th, and 10th principal components in Figure 4-6, where the lowest 10 percent of the variable frequency range is removed. Although the second kind appears predominantly in the 6th principal component, it also shows up in the 5th and 9th principal components. In Figure 4-7, where the lowest 30 percent of the variable frequency range is filtered out, the two kinds of transient outliers appear slightly better defined. The first kind mostly occupies the 6th principal component, while the second kind mostly shows up in the 5th principal component.
4.5.3
Identifying Outlying Variables
From a plant manager's point of view, outliers represent shifts and changes in process parameters that can potentially affect the quality of the end product. As a result, we want to develop methods and tools to help managers identify the physics behind these outlier behaviors. To understand the underlying physics, we need to determine which variables or combinations of variables contribute to which outliers. This way, we are able to analyze the variables and determine the causes of the outliers. In this section I will present some methods to group the variables according to their contributions to the outliers.
Transient Outliers
Focusing on the transient outliers in the top plot of Figure 4-3, which
does not include the set of outliers from observation 1000 to 1200, we can see there are
mainly 4 regions where the values of the normalized distance jump up suddenly and return
quickly. These 4 regions are located approximately at observation 100, 600, 1750, and 1950.
The goal here is to determine which variables contribute to these different transient outliers
in these 4 regions. One method to find the contributing variables is to find the variables
that also experience sudden changes in values at the same observations corresponding to the
4 transient-outlier regions.
Since a manufacturing data set often contains hundreds of variables, looking at the time
behavior of each variable might be burdensome in determining the causes of transient outliers.
The following procedure is a simpler way to find the contributing variables. For a data set Xij, where i represents the observation number 1, ..., n, and j represents the variable number 1, ..., m, D is a difference matrix with dimensions (n - 1) x m, whose rows are equal to the differences between adjacent rows of X. Dij is defined as:

Dij = X(i+1)j - Xij    (4.1)
Let M be a row vector of length m where the jth element represents the average of the jth column of the difference matrix D. The 4 transient outlier regions are represented by i = 100, 600, 1750, and 1950 respectively. The variables that contribute to region 1, where i = 100, are the variables with index j that satisfy Dij >> Mj. The variables that contribute to the other outlier regions can be determined by using the appropriate i's.
The basic method described here is to determine the variables whose greatest change between two consecutive observations, relative to their average change, occurs at the observations corresponding to the 4 transient-outlier regions. Thus, we can determine a set of variables that contributes to the transient outlier behavior in each of the 4 regions.
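A minimal sketch of this procedure follows; the absolute-value form of the difference, the factor of 10 used to approximate "much greater than", and the synthetic data are all assumptions made for illustration.

    import numpy as np

    rng = np.random.default_rng(4)
    X = rng.normal(size=(1961, 860))                    # synthetic (n x m) data set

    D = np.abs(np.diff(X, axis=0))                      # |change between adjacent observations|
    M = D.mean(axis=0)                                  # average change of each variable

    regions = [100, 600, 1750, 1950]                    # observations of the 4 transient-outlier regions
    factor = 10.0                                       # assumed meaning of "much greater than"

    for i in regions:
        # D[i - 1] is the change from observation i-1 to observation i (0-indexed)
        contributing = np.where(D[i - 1] > factor * M)[0]
        print(f"region at observation {i}: {contributing.size} contributing variables")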
Results show that this method is effective in determining the variables that contribute
to the transient outliers in the different regions. Figure 4-8 is a plot of a set of identified
variables that correspond to outliers in regions 1 and 4.
Eigenvector Analysis
The eigenvectors associated with the principal components reveal
how much each variable contributes to the time-series behavior of the corresponding principal
components. As a result, eigenvectors associated with principal components that exhibit
outlier behavior can potentially reveal the variables that contribute most to the outliers.
* Step Outliers - Since the first principal component represents most of the behavior associated with the step outliers, the corresponding eigenvector can provide information
Figure 4-8: Variables Identified that Contribute to Transient Outliers in Region 1 and 4
as to the contributing variables. Figure 4-9 presents plots of the first principal component and the corresponding eigenvector. As discussed in Section 4.5.1, we can see that
Figure 4-9: The First Principal Component and the Corresponding Eigenvector from Process
2 Data
the 1st principal component is dominated by the outlier behavior from approximately
observation 1000 to 1200. The corresponding eigenvector shows that 122 variables are
weighted significantly more than the rest of the variables. These 122 variables are the
main causes for this type of outlier.
* Transient Outliers - With the 122 variables that contribute to the step outliers removed, the eigenvectors of the correlation matrix and the associated principal components are calculated for the remaining 738 variables and 1961 observations. Figure 4-10 shows the first 10 principal components, and Figure 4-11 shows the corresponding eigenvectors.
Figure 4-10: The First Ten Principal Components from 738 Variables
Figure 4-11: The First Ten Eigenvectors from 738 Variables
Since the two kinds of transient outliers are spread out over the first 10 principal components, the variables that contribute to these two kinds of transient outliers are spread out over the 10 eigenvectors. As a result, Figure 4-11 shows that all the variables are weighted fairly evenly in the determination of the principal components, and it is hard to determine which variables contribute most to the outliers from examining the eigenvectors. Eigenvector analysis is not very effective in isolating variables when the outlier behavior is spread out over many principal components.
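For the step-outlier case, where a single eigenvector does isolate the contributing variables, the weight-inspection step can be sketched as follows; the data, the correlation matrix, and the weight threshold are placeholders rather than the values used in this chapter.

    import numpy as np

    rng = np.random.default_rng(5)
    X = rng.normal(size=(1961, 860))                    # synthetic stand-in for the data set
    R = np.corrcoef(X, rowvar=False)

    eigvals, eigvecs = np.linalg.eigh(R)
    idx = np.argsort(eigvals)[::-1]
    w1 = eigvecs[:, idx[0]]                             # eigenvector of the first principal component

    # Flag variables whose weights are far larger than typical (threshold is illustrative)
    threshold = 3.0 * np.median(np.abs(w1))
    heavy = np.where(np.abs(w1) > threshold)[0]
    print(f"{heavy.size} variables dominate the first eigenvector")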
4.6
Variable Grouping
This section addresses the question: how can variables that are related to some common
physical process be effectively grouped together? In order to group or separate variables, it
is very important to have some understanding of the underlying physics of the manufacturing
process. In web process 2, the manufacturing process is divided into subprocesses, where the
output of one subprocess becomes the input of another. The data set contains both control
and quality variables from all the subprocesses. It is reasonable to hypothesize that variables
that are related to the same subprocess are more correlated with each other than variables
from different subprocesses.
The following 3 subsections present 3 methods of variable separation.
Section 4.6.1
presents principal component analysis with a focus on examining the associated eigenvectors
calculated from the correlation matrix of the original data. Section 4.6.2 and Section 4.6.3
introduce two methods where some of the noise in the original data set is removed before performing PCA. Based on the hypothesis that variables related to the same subprocess are more correlated than variables from different subprocesses, Section 4.6.2 shows how a more robust correlation matrix can be used to calculate PCA. In Section 4.6.3, variables are frequency-filtered prior to the calculation of the correlation matrix. Results will show that these two methods are more effective than standard PCA in capturing the variable groups within the original data set.
4.6.1
Principal Component Analysis
Principal component analysis groups variables together based on the variables' correlation
with each other. Recall from Section 2.6.3 that the ith principal component is a weighted linear combination of the original set of variables, obtained from the following relationship:

zi = wi'X    (4.2)

where wi' is the transpose of the eigenvector associated with the ith principal component, zi, and X is the original data set.
Equation 4.2 shows that the eigenvectors characterize the weights of the variables in the calculation of the principal components, and the eigenvectors reveal which variables contribute the most to the time behavior of the corresponding principal components. As a result, examining the eigenvectors obtained from the correlation matrix of the original data can provide important information as to how the variables are grouped.
Figure 4-12: First 10 Eigenvectors of 738 Variables
Figure 4-13: Histograms of the First 10 Eigenvectors of 738 Variables
Observation
Figure 4-12 shows the eigenvectors associated with the first 10 principal
components obtained from the correlation matrix of the 738 variables, and Figure 4-13 shows the histograms of the eigenvectors. In Figure 4-13, the symbol 'ht' represents the height of the middle bar. Since the sum of the areas under the bars for each eigenvector is the same, the height of the middle bar is a good indication of how widely the values of the eigenvector are distributed. The plots of the first 10 eigenvectors and the associated histograms reveal that almost all the variables are weighted towards their respective principal components. No single eigenvector is dominated by a small number of variables.
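A small sketch of this 'ht' statistic is given below; the bin count and the synthetic weight vectors are assumptions, since the exact binning used in the figures is not stated.

    import numpy as np

    def middle_bar_height(w, bins=30):
        # Height of the histogram bar containing zero: a taller middle bar means
        # the eigenvector weights are more tightly concentrated around zero
        counts, edges = np.histogram(w, bins=bins)
        middle = np.clip(np.searchsorted(edges, 0.0) - 1, 0, len(counts) - 1)
        return counts[middle]

    rng = np.random.default_rng(6)
    broad = rng.normal(scale=0.05, size=738)            # widely spread weights
    narrow = rng.normal(scale=0.005, size=738)          # tightly concentrated weights
    print(middle_bar_height(broad), middle_bar_height(narrow))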
Interpretation Based on the assumption that variables from the same subprocess are
highly correlated due to their common link to the underlying physics of the subprocess,
variables from different subprocesses should be less correlated. Since the data set contains
variables from many independent subprocesses, it is expected that many of these variables
will be uncorrelated with each other. Consequently, for each eigenvector calculated from
the correlation matrix, there should be variables that make significant contribution to the
associated principal components, and there should also be variables that make little or no
contribution to the associated principal components.
Contrary to the above hypothesis, Figure 4-12 shows that almost all the variables in the
first 10 eigenvectors contribute to some extent to the time behavior of the first 10 principal
components. This is not consistent with the initial assumption that variables from different subprocesses should not be correlated and, thus, should not all contribute to the same principal components. From this, we can conclude that there is a lot of noise in the original data set, resulting in accidental correlations between variables from different subprocesses. Thus PCA does not group the variables as well as it could. Methods should be developed to
improve the signal-to-noise ratio of the eigenvectors so that PCA can better categorize the
variables collected from different subprocesses.
4.6.2
PCA with Robust Correlation Matrix
Ideally, only variables that are from the same subprocess or that are controlled by the same
underlying physics should contribute to the same principal component. Consequently, it can
be assumed that the variables from different subprocesses are correlated mostly by accident,
and the correlation between them can be considered as noise. One method to improve the
signal-to-noise ratio in the eigenvectors of the correlation matrix is to create a more robust
correlation matrix by eliminating some of the accidental correlations between variables.
Figure 4-14: Magnitude of Correlation Coefficients of 738 Variables in Descending Order in
(a) Normal Scale, (b) Log Scale
Method
Figure 4-14 shows the magnitudes of the correlation coefficients, rij, calculated from the 738 variables, arranged in descending order on both a normal scale and a log scale. In order to create a more robust correlation matrix, where some of the accidental correlations are removed, let ε be the cutoff correlation coefficient: any coefficient with |rij| < ε is set to 0, while any coefficient with |rij| >= ε retains its value. Principal component analysis is performed using this more robust correlation matrix to determine the grouping of the variables.
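A minimal sketch of this thresholding step is given below, using synthetic data and the cutoff values examined in the results that follow; the diagonal is kept at 1 so the thresholded matrix remains a valid input to the eigen-decomposition.

    import numpy as np

    def robust_correlation(X, cutoff):
        # Correlation matrix with small (presumably accidental) coefficients zeroed out
        R = np.corrcoef(X, rowvar=False)
        R_robust = np.where(np.abs(R) < cutoff, 0.0, R)
        np.fill_diagonal(R_robust, 1.0)
        return R_robust

    rng = np.random.default_rng(7)
    X = rng.normal(size=(1961, 100))                    # synthetic normalized data set

    for eps in [0.06, 0.10, 0.15, 0.18]:
        R_eps = robust_correlation(X, eps)
        eigvals, eigvecs = np.linalg.eigh(R_eps)
        w1 = eigvecs[:, np.argmax(eigvals)]             # first eigenvector
        print(f"cutoff {eps:.2f}: largest |weight| = {np.abs(w1).max():.3f}")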
Results
Cutoff = 0.06
Figure 4-15 shows the first ten eigenvectors calculated from the robust correlation matrix where ε = 0.06, and Figure 4-16 shows the corresponding histograms.
Cutoff = 0.10
Figure 4-17 shows the first ten eigenvectors obtained from the robust correlation matrix where correlation coefficients below 0.1 are set to 0, and Figure 4-18 shows the corresponding histograms.
Figure 4-15: First 10 Eigenvectors Calculated from Robust Correlation Matrix with Cutoff = 0.06
Figure 4-16: Histograms of the first 10 Eigenvectors Calculated from Robust Correlation
Matrix with Cutoff = 0.06
Figure 4-17: First 10 Eigenvectors Calculated from Robust Correlation Matrix with Cutoff = 0.10
Figure 4-18: Histograms of the first 10 Eigenvectors Calculated from Robust Correlation
Matrix with Cutoff = 0.10
Cutoff = 0.15
Figure 4-19 shows the first ten eigenvectors calculated from the robust correlation matrix where ε = 0.15, and Figure 4-20 shows the corresponding histograms.
Figure 4-19: First 10 Eigenvectors calculated from Correlation Matrix with Cutoff at 0.15
Figure 4-20: Histogram of First 10 Eigenvectors calculated from Robust Correlation Matrix
with Cutoff at 0.15
Cutoff = 0.18
Figure 4-21 shows the first 10 eigenvectors obtained from the robust correlation matrix with a cutoff coefficient of 0.18, and Figure 4-22 shows the corresponding histograms.
Figure 4-21: First 10 Eigenvectors calculated from Robust Correlation Matrix with Cutoff at 0.18
Figure 4-22: Histogram of First 10 Eigenvectors calculated from Robust Correlation Matrix
with Cutoff at 0.18
Interpretation The figures of eigenvectors obtained from correlation matrices with different cutoff values show that a more robust correlation matrix is more effective in grouping
correlated variables. As the cutoff values increase, eigenvectors show that the variables that
contribute to the associated principal components are weighted more, while the variables
that do not contribute to the associated principal components are weighted less. In addition, the histograms of these eigenvectors show that the distributions of the eigenvectors become increasingly narrow as the cutoff correlation coefficient is increased. As was discussed before, a narrower distribution means a taller middle bar. The average height of the middle bar of the first 10 eigenvectors increases from 86.0 to 98.1, to 113.3, to 133.7, and to 137.8 as the cutoff level increases from 0 to 0.06, to 0.10, to 0.15, and to 0.18. A narrower distribution means that for each eigenvector, only a small number of variables contribute to the associated principal component, while most of the variables make minimal or no contribution to the associated principal component. This is consistent with the hypothesis
that variables from different subprocesses should not all contribute to the same principal
components.
Comparison
Figure 4-23 is a comparison of 5 eigenvectors calculated from 5 different cor-
relation matrices obtained from different cutoff values. The plots show that the eigenvectors calculated using the more robust correlation matrices group the variables much better than the eigenvector calculated using the original correlation matrix. One can see that the signal-to-noise ratio of the values of the eigenvector increases as the cutoff correlation coefficient level increases.
The histograms of the eigenvectors in Figure 4-24 also indicate the improvements in the signal-to-noise ratio. The figure shows that the distributions of the eigenvectors get narrower as the cutoff coefficients are increased. The height of the middle bar increases from 78 to 270 as the cutoff coefficient increases from 0 to 0.18. This means that as the correlation
matrix becomes more robust, accidentally correlated variables are weighted less, while the
significant variables are weighted more.
Figure 4-23: A Comparison of Eigenvectors Calculated from (a) the Original Correlation Matrix, (b) Robust Correlation Matrix with Cutoff=0.06, (c) Robust Correlation Matrix with Cutoff=0.10, (d) Robust Correlation Matrix with Cutoff=0.15, and (e) Robust Correlation Matrix with Cutoff=0.18
Figure 4-24: A Comparison of Histograms of the Eigenvectors
4.6.3
PCA with Frequency-Filtered Variables
Figure 4-10 in Section 4.6.1 shows that, with the exception of transient outliers, the first few principal components of the 738 variables are mostly dominated by low-frequency behavior. In Section 4.5.2, attempts were made to isolate transient outliers by performing PCA on variables with the low-frequency components removed. In the previous section, it was shown that a robust correlation coefficient matrix can be effective in separating variables and reducing noise in PCA. In this section, we hope that PCA with frequency-filtered variables can also remove the noise components in the original data set and more effectively group the variables.
Method
Attempts are made to perform principal component analysis after the original set of variables is frequency filtered. The idea is that noise can be filtered out in certain frequency bands, so that PCA can show the same promising results with regard to grouping variables as PCA with the robust correlation matrix.
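A hedged sketch of this variant is shown below; as before, the Butterworth band-pass filter merely stands in for the filters of Figure 4-25, and the data are synthetic.

    import numpy as np
    from scipy.signal import butter, filtfilt

    def bandpass_filter(X, wp, order=4):
        # Band-pass filter each column of X; wp = [low, high] in normalized
        # frequency (1.0 = Nyquist), analogous to Wp = [0.2, 0.4] in the text
        b, a = butter(order, wp, btype="bandpass")
        return filtfilt(b, a, X, axis=0)

    rng = np.random.default_rng(8)
    X = rng.normal(size=(1961, 100))                    # synthetic normalized variables

    X_bp = bandpass_filter(X, [0.2, 0.4])
    R = np.corrcoef(X_bp, rowvar=False)                 # correlation matrix of filtered variables
    eigvals, eigvecs = np.linalg.eigh(R)
    print("largest eigenvalue after band-pass filtering:", round(float(eigvals.max()), 3))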
Frequency Filters
Figure 4-25 shows samples of a high-pass filter and a band-pass filter used to remove noise in the original data set.
Figure 4-25: (a) High-Pass Filter with Wp = 0.3 (b) Band-Pass Filter with Wp=[0.2, 0.4]
Results
For variables that are high-pass filtered with Wp=0.1, Figure 4-26 illustrates
the first ten eigenvectors calculated from the variables' correlation matrix. Figure 4-27 shows histograms of the eigenvectors.
Figure 4-26: First 10 Eigenvectors Calculated from High-Pass Filtered (Wp=0.1) Variables
Figure 4-27: Histograms of First 10 Eigenvectors Calculated from High-Pass Filtered
(Wp=0.1) Variables
Figure 4-28 shows the eigenvectors calculated from the correlation matrix of high-pass filtered variables with Wp=0.3. Figure 4-29 shows the histograms of the eigenvectors.
Figure 4-30 shows the eigenvectors calculated from the correlation matrix of high-pass filtered variables with Wp=0.4. Figure 4-31 shows the histograms of the eigenvectors.
Figure 4-28: First 10 Eigenvectors Calculated from High-Pass Filtered (Wp=0.3) Variables
Figure 4-29: Histograms of First 10 Eigenvectors Calculated from High-Pass Filtered (Wp=0.3) Variables
Figure 4-30: First 10 Eigenvectors Calculated from High-Pass Filtered (Wp=0.4) Variables
Figure 4-31: Histograms of First 10 Eigenvectors Calculated from High-Pass Filtered
(Wp=0.4) Variables
Figure 4-32 shows the eigenvectors calculated from the correlation matrix of band-pass filtered (Wp=[0.2, 0.4]) variables. Figure 4-33 shows the histograms of the eigenvectors.
Figure 4-32: First 10 Eigenvectors Calculated from Band-Pass Filtered (Wp=[0.2, 0.4])
Variables
Figure 4-33: Histograms of First 10 Eigenvectors Calculated from Band-Pass Filtered
(Wp=[0.2, 0.4]) Variables
Figure 4-34 shows the eigenvectors calculated from the correlation matrix of band-pass filtered (Wp=[0.2, 0.3]) variables. Figure 4-35 shows the histograms of the eigenvectors.
Interpretation
The figures of eigenvectors obtained from the correlation matrices of high-pass and band-pass filtered variables show that PCA with frequency-filtered variables is also
Figure 4-34: First 10 Eigenvectors Calculated from Band-Pass Filtered (Wp=[0.2, 0.3])
Variables
Figure 4-35: Histograms of First 10 Eigenvectors Calculated from Band-Pass Filtered
(Wp=[0.2, 0.3]) Variables
effective in grouping correlated variables together. The histograms indicate that the distributions of these eigenvectors are much narrower than the distributions of the eigenvectors
associated with the correlation matrix of the original data. The average heights of the middle
bars of the first 10 eigenvectors associated with the filtered variables obtained from filters
with normalized pass-band frequency Wp = 0.1, 0.3, 0.4, [0.2, 0.4], and [0.2, 0.3] are 155, 237, 230, 128, and 114, respectively. Compared to an average middle-bar height of 86 for the distributions of the first 10 eigenvectors of the original data set, the distributions of the eigenvectors from frequency-filtered variables are much narrower. Consequently, it is reasonable to state that
by removing the noise in the original data set, PCA with frequency-filtered variables improves the signal-to-noise ratio of the eigenvectors, where significant variables are weighted
more, and accidentally correlated variables are weighted less.
Chapter 5
Conclusion and Suggested Work
This thesis presented various methods for analyzing multivariate data sets from continuous web manufacturing processes. The analysis techniques described in Chapter 2 were applied to two data sets from two different web processes in Chapter 3 and Chapter 4. These analysis techniques, combined with an understanding of the physics of the manufacturing processes, can produce insights into information-rich data sets. Experimental results show that both normalized Euclidean distance and principal component analysis are effective in separating the outliers from the main population. Correlation analysis on Poisson-distributed defect densities shows the difficulties in determining the true correlation between variables when the signal-to-noise ratio of the underlying processes is unknown. Principal component analysis is a good way to determine the existence of linear relationships between sets of variables. Based on the hypothesis that variables from the same subprocess are more correlated than variables from different subprocesses, both principal component analysis with
robust correlation matrix and principal component analysis with frequency-filtered variables
are effective in grouping variables.
Hopefully, the results of my experiments can lead to more research in the area of multivariate analysis of manufacturing data in the future. Other multivariate methods can be
explored to identify and to treat outliers. In addition, mathematical models can be built to
determine the effects of Poisson noise on the calculation of correlation between processes.
Furthermore, mathematical methods can be developed to quantify the effects of non-linear operations on the correlation matrices, both on the removal of noise and on the effectiveness of grouping variables. Combining a solid understanding of the underlying physics with a mastery of analysis techniques can lead to tremendous progress in the area of data analysis of
manufacturing data.