Multivariate Analysis of Manufacturing Data

by Ronald Cao

Submitted to the Department of Electrical Engineering and Computer Science in partial fulfillment of the requirements for the degrees of Master of Engineering and Bachelor of Science in Electrical Engineering at the MASSACHUSETTS INSTITUTE OF TECHNOLOGY, June 1997.

(c) Massachusetts Institute of Technology 1997. All rights reserved.

The author hereby grants to MIT permission to reproduce and distribute publicly paper and electronic copies of this thesis document in whole or in part, and to grant others the right to do so.

Author: Department of Electrical Engineering and Computer Science, May 23, 1997

Certified by: David H. Staelin, Professor of Electrical Engineering, Thesis Supervisor

Accepted by: Frederic R. Morgenthaler, Department Committee on Graduate Students

Multivariate Analysis of Manufacturing Data

by Ronald Cao

Submitted to the Department of Electrical Engineering and Computer Science on May 23, 1997, in partial fulfillment of the requirements for the degrees of Master of Engineering and Bachelor of Science in Electrical Engineering

Abstract

With the advancement of technology, manufacturing systems have become increasingly complex. Currently, many continuous-time manufacturing processes are operated by a complicated array of computers which monitor thousands of control variables. It has become more difficult for managers and operators to determine sources of parameter variation and to control and maintain the efficiency of their manufacturing processes. The goal of this thesis is to present a sequence of multivariate analysis techniques that can be applied to the analysis of information-rich data sets from web manufacturing processes. The focus is on three main areas: identifying outliers, determining relationships among variables, and grouping variables. The questions asked are 1) how to effectively separate outliers from the main population, 2) how to determine correlations among variables or subprocesses, and 3) what are the best methods to categorize and group physically significant variables within a multivariate manufacturing data set. Results of various experiments focused on the above three areas include 1) both normalized Euclidean distance and principal component analysis are effective in separating the outliers from the main population, 2) correlation analysis of Poisson-distributed defect densities shows the difficulties in determining the true correlation between variables, and 3) both principal component analysis with a robust correlation matrix and principal component analysis with frequency-filtered variables are effective in grouping variables. Hopefully these results can lead to more comprehensive research in the general area of data analysis of manufacturing processes in the future.

Thesis Supervisor: David H. Staelin
Title: Professor of Electrical Engineering

Acknowledgments

It has been an incredibly meaningful and fulfilling five years at MIT. The following are just some of the names of the people who have made tremendous contributions to my intellectual development and my personal growth.

* My advisor, Professor David Staelin, who provided me with the guidance that I needed on my research and thesis. He has inspired me with insightful ideas and thought-provoking concepts. In addition, he has given me the freedom to explore my ideas as well as many valuable suggestions for experimentation.
* My lab partners: Junehee Lee, Michael Shwartz, Carlos Caberra, and Bill Blackwell. Many thanks to Felicia Brady.

* Dean Bonnie Walters, Professor George W. Pratt, Professor Kirk Kolenbrander, Professor Lester Thurow, and Deborah Ullrich.

* All the friends I have made through my college life, especially my good friends David Steel and Jake Seid and the brothers of Lambda Chi Alpha Fraternity.

* Most of all, I would like to thank my parents for their endless love and support. They have been there through every phase of my personal and professional development. Thank you!

Contents

1 Introduction
  1.1 Background
  1.2 Previous Work
  1.3 Objective
  1.4 Thesis Organization

2 Basic Analysis Tools
  2.1 Data Set
  2.2 Preprocessing
    2.2.1 Missing Data
    2.2.2 Constant Variables
    2.2.3 Normalization
  2.3 Outlier Analysis
    2.3.1 Definition
    2.3.2 Causes of Outliers
    2.3.3 Effects of Outliers
    2.3.4 Outlier Detection
  2.4 Correlation Analysis
  2.5 Spectral Analysis
  2.6 Principal Component Analysis
    2.6.1 Basic Concept
    2.6.2 Geometric Representation
    2.6.3 Mathematical Definition

3 Web Process 1
  3.1 Background
  3.2 Data
  3.3 Preprocessing
  3.4 Feature Characterization
    3.4.1 In-Line Data
    3.4.2 End-of-Line Data
  3.5 Correlation Analysis
    3.5.1 Streak-Streak Correlation
    3.5.2 Streak-Cloud Correlation
    3.5.3 Interpretation
  3.6 Poisson Distribution
    3.6.1 Method
    3.6.2 Results
  3.7 Principal Component Analysis
    3.7.1 PCA of the End-of-Line Data
    3.7.2 PCA of the In-Line Data
    3.7.3 Interpretation

4 Web Process 2
  4.1 Background
  4.2 Data
  4.3 Preprocessing
  4.4 Feature Characterization
    4.4.1 Quality Variables
    4.4.2 In-Line Variables
  4.5 Outlier Analysis
    4.5.1 Normalized Euclidean Distance
    4.5.2 Time Series Model - PCA
    4.5.3 Identifying Outlying Variables
  4.6 Variable Grouping
    4.6.1 Principal Component Analysis
    4.6.2 PCA with Robust Correlation Matrix
    4.6.3 PCA with Frequency-Filtered Variables

5 Conclusion and Suggested Work

List of Figures

2-1 (a) Plot in Original Axes (b) Plot in Transformed Axes
3-1 Time-Series Behavior of a Typical In-Line Variable
3-2 Cross-Web Position of Defects Over Time
3-3 Streak and Cloud Defects
3-4 The 10 Densest Streaks Over Time
3-5 Correlation Coefficients Between Streaks Using a) standard time block, b) double-length time block, c) quadruple-length time block
3-6 Correlation Coefficients Between Streak and Cloud with Time Blocks of length 1 to length 100
3-7 (a) Cloud Distribution Using Fixed-Length Time Blocks, (b) Ideal Poisson Distribution Using the Same Fixed-Length Time Blocks
3-8 Distributions of Cloud Defects Using x2, x4, and x6 Time Blocks
3-9 Ideal Poisson Distributions Using x2, x4, and x6 Time Blocks
3-10 First 3 Principal Components of the End-of-Line Data
3-11 Percent of Variance Captured by PCs
3-12 First 4 PCs of the In-Line Data
4-1 Ten Types of In-Line Variable Behavior
4-2 Normalized Euclidean Distance
4-3 Outlier and Normal Behavior Based on Normalized Euclidean Distance
4-4 First Ten Principal Components of Web Process 2 Data Set
4-5 High-Pass Filters with Wp = 0.1 and Wp = 0.3
4-6 First Ten Principal Components from 90% High-Pass Filtered Data
4-7 First Ten Principal Components from 70% High-Pass Filtered Data
4-8 Variables Identified that Contribute to Transient Outliers in Regions 1 and 4
4-9 The First Principal Component and the Corresponding Eigenvector from Process 2 Data
4-10 The First Ten Principal Components from 738 Variables
4-11 The First Ten Eigenvectors from 738 Variables
4-12 First 10 Eigenvectors of 738 Variables
4-13 Histograms of the First 10 Eigenvectors of 738 Variables
4-14 Magnitude of Correlation Coefficients of 738 Variables in Descending Order in (a) Normal Scale, (b) Log Scale
4-15 First 10 Eigenvectors Calculated from Robust Correlation Matrix with Cutoff = 0.06
4-16 Histograms of the First 10 Eigenvectors Calculated from Robust Correlation Matrix with Cutoff = 0.06
4-17 First 10 Eigenvectors Calculated from Robust Correlation Matrix with Cutoff = 0.10
4-18 Histograms of the First 10 Eigenvectors Calculated from Robust Correlation Matrix with Cutoff = 0.10
4-19 First 10 Eigenvectors Calculated from Robust Correlation Matrix with Cutoff = 0.15
4-20 Histograms of First 10 Eigenvectors Calculated from Robust Correlation Matrix with Cutoff = 0.15
4-21 First 10 Eigenvectors Calculated from Robust Correlation Matrix with Cutoff = 0.18
4-22 Histograms of First 10 Eigenvectors Calculated from Robust Correlation Matrix with Cutoff = 0.18
4-23 A Comparison of Eigenvectors Calculated from (a) Original Correlation Matrix, (b) Robust Correlation Matrix with Cutoff = 0.06, (c) Robust Correlation Matrix with Cutoff = 0.10, (d) Robust Correlation Matrix with Cutoff = 0.15, (e) Robust Correlation Matrix with Cutoff = 0.18
4-24 A Comparison of Histograms of the Eigenvectors
4-25 (a) High-Pass Filter with Wp = 0.3 (b) Band-Pass Filter with Wp = [0.2, 0.4]
4-26 First 10 Eigenvectors Calculated from High-Pass Filtered (Wp = 0.1) Variables
4-27 Histograms of First 10 Eigenvectors Calculated from High-Pass Filtered (Wp = 0.1) Variables
4-28 First 10 Eigenvectors Calculated from High-Pass Filtered (Wp = 0.3) Variables
4-29 Histograms of First 10 Eigenvectors Calculated from High-Pass Filtered (Wp = 0.3) Variables
4-30 First 10 Eigenvectors Calculated from High-Pass Filtered (Wp = 0.4) Variables
4-31 Histograms of First 10 Eigenvectors Calculated from High-Pass Filtered (Wp = 0.4) Variables
4-32 First 10 Eigenvectors Calculated from Band-Pass Filtered (Wp = [0.2, 0.4]) Variables
4-33 Histograms of First 10 Eigenvectors Calculated from Band-Pass Filtered (Wp = [0.2, 0.4]) Variables
4-34 First 10 Eigenvectors Calculated from Band-Pass Filtered (Wp = [0.2, 0.3]) Variables
4-35 Histograms of First 10 Eigenvectors Calculated from Band-Pass Filtered (Wp = [0.2, 0.3]) Variables

List of Tables

2.1 12 Observations of 2 Variables

Chapter 1

Introduction

1.1 Background

With the development of technology, manufacturing systems are becoming increasingly complex. A typical continuous-time manufacturing process may be controlled and monitored by thousands of parameters such as temperature and pressure. With higher customer standards and higher operating costs, manufacturing companies are constantly creating new ways to increase efficiency and reduce cost.

The Leaders for Manufacturing (LFM) Program is a joint effort among leading U.S. manufacturing firms and both the School of Engineering and the Sloan School of Management at the Massachusetts Institute of Technology. The goal of LFM is to identify, discover, and translate into practice the critical factors that underlie world-class manufacturing. MIT faculty and students and participating LFM companies have identified seven major themes of cooperation. These research themes are Product and Process Life Cycle; Scheduling and Logistics Control; Variation Reduction; Design and Operation of Manufacturing Systems; Integrated Analysis and Development; Next Generation Manufacturing; and Culture, Learning, and Organizational Change.
The research and analysis presented in this thesis is directly related to Leaders for Manufacturing Research Group 4 (RG4), whose focus is variation reduction in manufacturing processes. Understanding variations and methods to reduce them can help companies improve yields, reduce defects, decrease product cycle time, and generate higher quality products. In order to gain this understanding, RG4 attempts to answer questions such as 1) how to effectively determine which process parameters to monitor and control, 2) what are useful techniques to determine multivariate relationships among control and quality variables, and 3) how to best communicate results and findings to managers and engineers at the participating companies.

There are many types of manufacturing processes in industry. The type of process that this thesis focuses on is referred to as a web process. The particular characteristic associated with a web process is that the end product is in the form of sheets with the appropriate thickness, width and length and can be packaged into rolls or sliced into sheets. Although the multivariate analysis methods are applied here to two data sets collected from web processes, most of the tools discussed in this thesis can also be applied to analyze data from other types of processes.

1.2 Previous Work

My research builds on the work conducted by previous LFM RG4 research assistants. In his Master's thesis titled "The Treatment of Outliers and Missing Data in Multivariate Manufacturing Data", Timothy Derksen developed strategies for dealing with outliers and missing data in large, multivariate, manufacturing data.[2] He compared the effectiveness of statistics based on standard versus robust estimates of the mean, standard deviation, and the correlation matrix in separating the outliers from the main population. In addition, he developed maximum likelihood methods to treat missing data in large multivariate manufacturing data.

Mark Rawizza's "Time-Series Analysis of Multivariate Manufacturing Data Sets" [4] discussed various data analysis tools used in engineering and applied them to manufacturing data sets. He used fundamental preprocessing and data-reduction techniques such as principal component analysis to present and reorganize manufacturing data. Furthermore, he experimented with ARMA models and neural networks to assess the predictability of data sets collected from both web processes and wafer processes.

1.3 Objective

The objective of this thesis is to apply a series of multivariate techniques to analyze information-rich data sets from continuous web manufacturing processes. In particular, much of the analysis is based on having an understanding of the physics behind the manufacturing process. Combining multivariate analysis tools with an understanding of the underlying physics can produce results and insights that can be very valuable to company managers.

The questions asked in this thesis are: 1) how to effectively separate outliers from the main population, 2) how to determine relationships among variables and subprocesses, and 3) what are the best methods to categorize and group variables within an information-rich multivariate data set. Results of various experiments focused on these three areas are discussed. Hopefully they can lead to more comprehensive research in the general area of data analysis of manufacturing processes in the future.

1.4 Thesis Organization

This thesis is divided into four major sections.
Chapter 2 presents an overview of the major multivariate analysis tools and methods used in the rest of the thesis. These tools deal with preprocessing of the original data, outlier identification and analysis, correlation analysis, spectral analysis, and principal component analysis.

Chapter 3, the second major section, utilizes the multivariate tools presented in Chapter 2 to analyze a web manufacturing data set. With the data set divided into in-line variables and quality variables, the objective is to perform multivariate analysis on these two sets of variables separately and to determine multivariate linear relationships between them. In addition, correlation analysis is performed on Poisson-distributed defect densities.

Chapter 4, the third section, applies the basic tools to analyze a data set from a different manufacturing web process, where the in-line variables and the quality variables are not identified. The analysis focuses on experimenting with ways to more effectively separate variables utilizing principal components. Experimental results show that PCA with robust correlation coefficients and PCA with frequency-filtered variables are more effective in grouping and identifying the variables that are correlated with each other.

Chapter 5, the final section, summarizes the important insights gained and suggests possible areas of continued research.

Chapter 2

Basic Analysis Tools

2.1 Data Set

A data set contains information about a group of variables. The information is the values of these variables for different times or situations. For example, we might have a data set that consists of weather information for 50 states. There might be 20 variables such as rainfall, average temperature, and dew point temperature, and 50 observations of these 20 variables representing the 50 states. The data set might also be the same 20 variables and 50 observations representing daily measurements of each of these 20 variables for one state over 50 days. Both data sets can be represented as an m x n matrix, where m = 50 is the total number of observations and n = 20 is the total number of variables.

The data sets used in this thesis are recorded from continuous-time web manufacturing systems. A typical data set may consist of measurements of thousands of variables for thousands of observations recorded over days. The variables can be categorized as either in-line variables or end-of-line variables. The in-line variables of a manufacturing system control and monitor the operation of the manufacturing process. Some typical in-line variables are temperature, pressure, volume, speed, and so on. End-of-line variables, also referred to as quality variables, provide managers and technicians with information on the quality of the end product of the manufacturing process. Some typical quality variables are defect size, defect location, thickness and strength.

2.2 Preprocessing

Preprocessing the data is an integral part of data analysis. Very rarely can large new data sets be used unaltered for multivariate analysis. The following are three major parts of preprocessing.

2.2.1 Missing Data

Within a raw manufacturing data set, very rarely are all the observations complete, especially when measurements are collected over days. Often, parts of machines or subprocesses are shut down for maintenance or testing purposes. As a result, certain parameters are not or cannot be recorded. These missing observations need to be treated before any multivariate analysis.

Timothy Derksen, in "The Treatment of Outliers and Missing Data in Multivariate Manufacturing Data", investigated methods of detecting, characterizing, and treating missing data in large multivariate manufacturing data sets.[2] In general, if a variable has most of its observations missing, the variable should be removed completely from the data set. Otherwise, the missing observations can be estimated using the EM algorithm described by Little.[6]
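As a rough illustration of this screening step, the following sketch (not part of the original thesis; the 50% threshold, function name, and the column-mean fill are illustrative assumptions standing in for the EM-based treatment) drops variables that are mostly missing and fills the remaining gaps.

    import numpy as np

    def screen_missing(X, max_missing_frac=0.5):
        """Drop variables (columns) that are mostly missing; crudely fill the rest.

        X is an m x n array with observations as rows and NaN marking missing
        values.  The threshold and the column-mean fill are placeholders for
        the EM algorithm referenced above, not a reproduction of it.
        """
        missing_frac = np.mean(np.isnan(X), axis=0)
        keep = missing_frac <= max_missing_frac        # variables worth keeping
        X_kept = X[:, keep]
        col_means = np.nanmean(X_kept, axis=0)         # placeholder imputation
        filled = np.where(np.isnan(X_kept), col_means, X_kept)
        return filled, keep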
2.2.2 Constant Variables

Multivariate analysis allows for understanding variable behavior in a multi-dimensional world. Any variables that are constant over time do not exhibit any information relevant to multivariate analysis. As a result, variables that have zero variance should be removed from the data set.

2.2.3 Normalization

For a web process data set that contains n variables and m observations, the n variables can consist of both control and quality parameters such as temperature, pressure, speed, thickness, density, and volume. Since all these variables are most likely measured in different units, it is often very difficult to compare their relative values. To deal with this comparison problem, normalization is applied to the variables.

For a given m x n matrix with i = 1, 2, ..., m observations and j = 1, 2, ..., n variables, where the value of the ith observation of the jth variable is denoted X_{ij}, the corresponding value in the normalized data set is denoted Z_{ij}. Normalization is commonly defined as the following:

    Z_{ij} = \frac{X_{ij} - \bar{X}_j}{\sigma_j}    (2.1)

where

    \bar{X}_j = \frac{1}{m} \sum_{i=1}^{m} X_{ij}    (2.2)

    \sigma_j = \sqrt{\frac{1}{m-1} \sum_{i=1}^{m} (X_{ij} - \bar{X}_j)^2}    (2.3)

In words, to calculate the normalized Z_{ij} for any ith observation and jth variable, we take the corresponding value X_{ij}, subtract the mean \bar{X}_j of the jth variable, and divide the result by the standard deviation \sigma_j of the jth variable. In the rest of this thesis, a variable that is said to be normalized is normalized to zero mean and unit variance.

There are benefits and drawbacks to performing normalization before multivariate analysis. The following are reasons for normalization:

* Normalization causes the variables to be unit-less. For example, if the unit of X_{ij} is meters, X_{ij} - \bar{X}_j is also in meters. When the result is divided by \sigma_j, also measured in meters, the final value Z_{ij} will be unit-less. As a result of normalization, variables originally measured in different units can be compared with each other.

* Normalization causes all the variables to be weighted equally. Since the normalized variables are zero-mean and unit-variance, each variable is weighted equally in determining correlations among variables. Normalization is especially important before performing multivariate analyses such as principal component analysis, because it gives each variable equal importance. More on normalization and principal component analysis will be discussed in Section 2.6.

* Normalization is a way of protecting proprietary information inherent in the original data. By taking away the mean and reshaping the variance, the information that is proprietary can be removed. Protecting proprietary information is a very important part of LFM's contract with its participating companies.

The following is one of the drawbacks of normalization:

* Normalization may increase the noise level. Since normalizing causes all the variables to have unit variance, it is likely that some measured noise will be scaled so that it rivals the more significant variables. As a result, normalization may distort the information in the original data set by increasing the noise level.
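As a concrete illustration of Sections 2.2.2 and 2.2.3, the short sketch below (illustrative code, not part of the original thesis; the function name is an assumption) removes zero-variance variables and normalizes the remainder to zero mean and unit variance as in Equations 2.1-2.3.

    import numpy as np

    def preprocess(X):
        """Drop zero-variance variables and z-score the rest (Eq. 2.1-2.3).

        X is an m x n array: m observations (rows), n variables (columns).
        """
        std = X.std(axis=0, ddof=1)                 # sample standard deviation, Eq. 2.3
        keep = std > 0                              # constant variables carry no information
        Xk = X[:, keep]
        Z = (Xk - Xk.mean(axis=0)) / std[keep]      # Eq. 2.1
        return Z, keep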
2.3 Outlier Analysis

The detection and treatment of outliers is an important preliminary step to performing statistical analysis. This section defines outliers, names the causes and effects of outliers, and presents some univariate and multivariate tools for detecting outliers.

2.3.1 Definition

Outliers are defined as a set of observations that are inconsistent with the rest of the data. It is very important to understand that outliers are defined relative to the main population.

2.3.2 Causes of Outliers

The following are the causes of outliers:

1. Extreme members - Since manufacturing data consist of variables recorded over thousands of observations, it is possible that some observations can occasionally exhibit extreme values.

2. Contaminants - These are observations that should not be grouped with the main population. For example, if the main population is a set of observations consisting of the weights of apples, the weight of an orange is considered a contaminant if it is placed in the same group. In a manufacturing process, a contaminant can be an observation made while a machine is broken amidst observations made while the machine is properly operating.

2.3.3 Effects of Outliers

Statistical analysis without the removal of outliers can produce skewed and misleading results. Outliers can potentially drastically alter the sample mean and variance of a population. In addition, outliers, especially contaminants, can incorrectly signal the occurrence of extreme excursions in a manufacturing process when the process is actually operating normally.

2.3.4 Outlier Detection

For a data set with n variables and m observations, a potential outlier is a point that lies outside of the main cluster formed by the general population. The following are some methods used to determine outliers.

Univariate Method

A univariate method of detecting outliers is to calculate, for each variable, the number of standard deviations an observation lies from the mean:

    z_{ij} = \frac{x_{ij} - \bar{x}_j}{\sigma_j}    (2.4)

where x_{ij} is the value of observation i and variable j, \bar{x}_j is the sample mean of variable j, and \sigma_j is the sample standard deviation of variable j. Observations where |z_{ij}| > K_j, where K_j is a constant for variable j, can be categorized as outliers. Depending on the range of the values of observations for each variable, the value of K_j can be adjusted. To determine gross outliers, the value of K_j can be set to be large.

Multivariate Methods

Equation 2.4 can be extended so that it represents a multivariate measure of the distance of all the variables away from the origin. Observations where |z_{ij}| > K, where K is a constant, can be treated as points lying outside of an n-dimensional cube centered on the sample mean. This multivariate method is very similar to the univariate one, except that the value of K is constant for all variables. Similarly, this method can be effective in determining gross outliers, but in a manufacturing environment where most of the variables are correlated, it is limited in its effectiveness in identifying outliers.

A more robust multivariate method to detect outliers involves calculating the Euclidean distance to the origin of the n-dimensional space after all n variables are normalized to zero mean and unit variance. The square of the normalized Euclidean distance is defined as the following:

    d_i^2 = \sum_{j=1}^{n} \frac{(x_{ij} - \bar{x}_j)^2}{s_j^2}    (2.5)

where x_{ij} is the value of observation i for variable j, and s_j^2 is the sample variance of variable j. Observations with d_i^2 > K, where K is a constant, lie outside of an ellipsoid centered around the origin and are considered outliers.
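The sketch below (illustrative, not from the thesis; the threshold K is an assumed input and the data are assumed to contain no constant variables, per Section 2.2.2) computes the squared normalized Euclidean distance of Equation 2.5 for every observation and flags those beyond the chosen cutoff.

    import numpy as np

    def normalized_distance_outliers(X, K):
        """Flag observations whose squared normalized Euclidean distance exceeds K.

        X: m x n data matrix (rows = observations).  Returns (d2, outlier_mask).
        """
        xbar = X.mean(axis=0)
        s2 = X.var(axis=0, ddof=1)                   # sample variances (assumed nonzero)
        d2 = np.sum((X - xbar) ** 2 / s2, axis=1)    # Eq. 2.5 for each observation
        return d2, d2 > K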
2.4 Correlation Analysis

In a multivariate manufacturing environment, it is often desirable to measure the linear relationship between pairs of variables or among groups of variables. By understanding these relationships among variables, managers can gain insights into the manufacturing process. One method of determining the linear relationship between variables is to calculate their covariance and correlation.

Given two variables i and j, each with m observations, the sample covariance s_{ij} measures the linear relationship between the two variables and is defined as the following:

    s_{ij} = \frac{1}{m-1} \sum_{k=1}^{m} (x_{ki} - \bar{x}_i)(x_{kj} - \bar{x}_j)    (2.6)

For n variables, the sample covariance matrix S = (s_{ij}) is the matrix of sample variances and covariances of combinations of the n variables:

    S = (s_{ij}) =
    \begin{bmatrix}
    s_{11} & s_{12} & \cdots & s_{1n} \\
    s_{21} & s_{22} & \cdots & s_{2n} \\
    \vdots & \vdots & \ddots & \vdots \\
    s_{n1} & s_{n2} & \cdots & s_{nn}
    \end{bmatrix}    (2.7)

where the diagonal of S represents the sample variances of the n variables, and the rest of the matrix represents all the possible sample covariances of pairs of variables. The covariance of the ith and jth variables, s_{ij}, is defined by Equation 2.6, and the variance of the ith variable, s_{ii} = s_i^2, is defined as the following:

    s_i^2 = \frac{1}{m-1} \sum_{k=1}^{m} (x_{ki} - \bar{x}_i)^2    (2.8)

Since the covariance depends on the scale of measurement of variables i and j, it is difficult to compare covariances between different pairs of variables. For example, if we change the unit of a variable from meters to miles, the covariance will also change. To solve this problem, we can normalize the covariance by dividing by the standard deviations of the two variables. The normalized covariance is called a correlation. The sample correlation matrix R can be obtained from the sample covariance matrix and is defined as:

    R =
    \begin{bmatrix}
    1 & r_{12} & \cdots & r_{1n} \\
    r_{21} & 1 & \cdots & r_{2n} \\
    \vdots & \vdots & \ddots & \vdots \\
    r_{n1} & r_{n2} & \cdots & 1
    \end{bmatrix}    (2.9)

where r_{ij}, the sample correlation coefficient of the ith and jth variables, is defined as the following:

    r_{ij} = \frac{s_{ij}}{s_i s_j}    (2.10)

Since the correlation of a variable with itself is equal to 1, the diagonal elements of the matrix R in Equation 2.9 are all 1s. In addition, notice that if the variables are all normalized to unit variance such that s_{ii} = 1 and s_{jj} = 1, then the correlation matrix R is equal to the covariance matrix S. Since most of the multivariate analysis discussed in this thesis deals with normalized variables, R is often substituted for S.

2.5 Spectral Analysis

Fourier Transform

The Fourier transform can be an excellent tool to gain insight into the behavior of variables in the frequency domain. For the jth variable observed at times i = 1, ..., m, the Fourier transform is defined as:

    X_j(e^{j\omega}) = \sum_{i=1}^{m} x_{ij} e^{-j\omega i}    (2.11)

Autocorrelation Function

The autocorrelation function looks at a variable's correlation with itself over time. A typical random signal is more correlated with itself over a short time lag than over a long time lag. The autocorrelation of variable x_j is:

    R_{x_j}(\tau) = E[x_{ij} x_{(i-\tau)j}]    (2.12)

Power-Spectral Density

The power-spectral density (PSD) is the Fourier transform of the autocorrelation function of a random signal x_j(t):

    P_{x_j}(\omega) = \mathcal{F}(R_{x_j}(\tau))    (2.13)

where \mathcal{F} is the Fourier transform operator and R_{x_j}(\tau) is the autocorrelation of the random signal x_j(t). For simplicity, the ensemble average of x_j(t) is assumed to be zero without any loss of generality.

The calculation of the autocorrelation requires the ensemble average of x_j(t) x_j(t - \tau). Since our data consist of one sample sequence for each variable, this ensemble average is impossible to obtain. One technique to get around this problem is to assume the sequence is ergodic. Then the PSD is the magnitude squared of the Fourier transform:

    P_{x_j}(\omega) = |\mathcal{F}(x_j(t))|^2    (2.14)
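Both of these tools reduce to a few lines of code. The sketch below (illustrative; not taken from the thesis) computes the sample correlation matrix of Equation 2.10 for a data matrix and a periodogram-style PSD estimate of Equation 2.14 for a single zero-mean sequence under the ergodic assumption discussed above.

    import numpy as np

    def correlation_matrix(Z):
        """Sample correlation matrix (Eq. 2.10); rows of Z are observations."""
        return np.corrcoef(Z, rowvar=False)

    def psd_estimate(x):
        """PSD estimate (Eq. 2.14) as the squared magnitude of the Fourier transform.

        A 1/m scaling is often added in practice; it is omitted here to match
        the form of Eq. 2.14.
        """
        spectrum = np.fft.rfft(x - x.mean())
        return np.abs(spectrum) ** 2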
2.6 Principal Component Analysis

2.6.1 Basic Concept

Principal component analysis (PCA) is a mathematical method for expressing a data set in an alternative way. The method involves using linear combinations of the original variables to transform the data set onto a set of orthogonal axes. The main objective of principal component analysis is two-fold: 1) data reduction, and 2) interpretation.

Principal component analysis is often referred to as data reduction rather than data restatement, because it preserves the information contained in the original data in a quite succinct way. Principal component analysis takes advantage of the relationships among the variables to reduce the size of the data while maintaining most of the variance in the original set. A data set with n variables and m observations can be reduced to a data set with k principal components and m observations, where k < n. In addition, since PCA transforms the original data onto a new set of axes, it often reveals relationships that are buried in the original data set. As a result, PCA is a powerful tool in multivariate analysis.

2.6.2 Geometric Representation

Principal component analysis can be best understood in terms of a geometric representation. We can start with a simple two-dimensional example. Table 2.1 shows 12 observations of 2 variables, X1 and X2.

    Observation:  1   2   3   4   5   6   7   8   9  10  11  12
    X1:           8   4   5   3   1   2   0  -1  -3  -4  -5  -8
    X2:           4   6   2  -2   3   0  -3  -2   2  -2  -6  -1

Table 2.1: 12 Observations of 2 Variables

Figure 2-1 represents the data in Table 2.1 using two different sets of axes. The points in Figure 2-1a are interpreted relative to the original set of axes, while the same points are interpreted relative to a new set of orthogonal axes in Figure 2-1b. The information is preserved as the axes are rotated.

Figure 2-1: (a) Plot in Original Axes (b) Plot in Transformed Axes

Similar to Figure 2-1b, principal components are defined as a transformed set of coordinate axes, obtained from the original data, used to describe the information content of the data set. In a two-dimensional data set, the first principal component is defined as the new axis that captures most of the variability of the original data set. The second principal component, perpendicular to the first one, is the axis that captures the second biggest variance.

The principal components are calculated in a minimal squared-distance sense. The distance is defined as the perpendicular distance from the points to the candidate axis. The first principal component is the axis for which the sum of the squared distances from the data points to the axis is minimal among all possible candidates. The second principal component is taken perpendicular to the first one, and is the axis for which the sum of the squared distances is the second smallest.

In a multivariate data set that extends over more than 2 dimensions, PCA finds the directions in which the multi-variable data contain large variances (and therefore much information). The first principal component has the direction in which the data have the biggest variance. The direction of the second principal component is that with the biggest variance among the directions which are orthogonal to the direction of the first principal component, and so on. After a few principal components, the remaining variance is typically small enough that the rest can be ignored without losing much information. As a result, the original data set with n dimensions (n variables) can be reduced to a new data set with k dimensions (k principal components), where k < n.
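A minimal sketch of this two-dimensional rotation, using the Table 2.1 values (the code itself is illustrative and not part of the thesis): the eigenvectors of the sample covariance matrix give the transformed axes of Figure 2-1b.

    import numpy as np

    # The 12 observations of (X1, X2) from Table 2.1.
    X = np.array([[8, 4], [4, 6], [5, 2], [3, -2], [1, 3], [2, 0],
                  [0, -3], [-1, -2], [-3, 2], [-4, -2], [-5, -6], [-8, -1]], dtype=float)

    Xc = X - X.mean(axis=0)                  # center the data
    S = np.cov(Xc, rowvar=False)             # 2 x 2 sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(S)     # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]        # largest variance first
    W = eigvecs[:, order]                    # columns are the principal axes
    scores = Xc @ W                          # coordinates in the rotated axes (Fig. 2-1b)
    print(eigvals[order])                    # variances along the principal axes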
2.6.3 Mathematical Definition

Principal component analysis takes advantage of the correlation among variables to find a new set of variables which captures most of the variation within the data set in as few dimensions as possible. The following is the mathematical definition [3]:

Given a data set with n variables and m observations, the first principal component must satisfy the following conditions:

1. z_1 is a linear function of the original variables:

    z_1 = w_{11} X_1 + w_{12} X_2 + \cdots + w_{1n} X_n    (2.15)

where w_{11}, w_{12}, ..., w_{1n} are constants defining the linear function.

2. Scaling of the new variable z_1:

    w_{11}^2 + w_{12}^2 + \cdots + w_{1n}^2 = 1    (2.16)

3. Of all the linear functions of the original variables that satisfy the above two conditions, pick the z_1 that has the maximum variance.

Consequently, the second principal component must satisfy the following conditions:

1. z_2 is a linear function of the original variables:

    z_2 = w_{21} X_1 + w_{22} X_2 + \cdots + w_{2n} X_n    (2.17)

where w_{21}, w_{22}, ..., w_{2n} are constants defining the linear function.

2. Scaling of the variable z_2:

    w_{21}^2 + w_{22}^2 + \cdots + w_{2n}^2 = 1    (2.18)

3. z_1 and z_2 must be perpendicular:

    w_{11} w_{21} + w_{12} w_{22} + \cdots + w_{1n} w_{2n} = 0    (2.19)

4. The values of z_2 must be uncorrelated with the values of z_1.

5. Of all the linear functions of the original variables that satisfy the above conditions, pick the z_2 that captures as much as possible of the remaining variance.

For a data set with n variables, there are a total of n possible principal components. Each component is a linear combination of the original set of variables, is perpendicular to the previously selected components, has values uncorrelated with the values of the previous components, and explains as much as possible of the remaining variance in the data. In summary,

    z_1 = w_1' X = w_{11} X_1 + w_{12} X_2 + \cdots + w_{1n} X_n
    z_2 = w_2' X = w_{21} X_1 + w_{22} X_2 + \cdots + w_{2n} X_n
    \vdots
    z_n = w_n' X = w_{n1} X_1 + w_{n2} X_2 + \cdots + w_{nn} X_n    (2.20)

where the random variable X' = [X_1, X_2, X_3, ..., X_n] has covariance matrix S with eigenvalue-eigenvector pairs (\lambda_1, e_1), (\lambda_2, e_2), ..., (\lambda_n, e_n), where \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n \ge 0. The principal components are the uncorrelated linear combinations z_1, z_2, z_3, ..., z_n whose variances, Var(z_i) = w_i' S w_i, are maximized.

It can be shown that the principal components depend solely on the covariance matrix S of X_1, X_2, X_3, ..., X_n. This is a very important concept to understand. As described earlier, the axes of the original data set can be rotated by multiplying each X_i by an orthogonal matrix W:

    z_i = W X_i    (2.21)

Since W is orthogonal, W'W = I, and the distance to the origin is unchanged:

    z_i' z_i = (W X_i)'(W X_i) = X_i' W' W X_i = X_i' X_i    (2.22)

Thus an orthogonal matrix transforms X_i to a point z_i that is the same distance from the origin, with the axes rotated. The new variables z_1, z_2, z_3, ..., z_n in z = WX are uncorrelated, so the sample covariance matrix of z must be of the form:

    S_z =
    \begin{bmatrix}
    s_{z_1}^2 & 0 & \cdots & 0 \\
    0 & s_{z_2}^2 & \cdots & 0 \\
    \vdots & \vdots & \ddots & \vdots \\
    0 & 0 & \cdots & s_{z_n}^2
    \end{bmatrix}    (2.23)

If z = WX, then S_z = W S W', and thus:

    W S W' =
    \begin{bmatrix}
    s_{z_1}^2 & 0 & \cdots & 0 \\
    0 & s_{z_2}^2 & \cdots & 0 \\
    \vdots & \vdots & \ddots & \vdots \\
    0 & 0 & \cdots & s_{z_n}^2
    \end{bmatrix}    (2.24)

where S is the sample covariance matrix of X. From linear algebra, we know that given C'SC = D, where C is an orthogonal matrix, S is a symmetric matrix, and D is a diagonal matrix, the columns of the matrix C must be normalized eigenvectors of S. Since Equation 2.24 shows that the orthogonal matrix W diagonalizes S, W must equal the transpose of the matrix C whose columns are normalized eigenvectors of S. W can be written as the following:

    W =
    \begin{bmatrix}
    w_1' \\
    w_2' \\
    \vdots \\
    w_n'
    \end{bmatrix}    (2.25)

where w_i' is the ith normalized eigenvector of S. The principal components are the transformed variables z_1 = w_1'X, z_2 = w_2'X, ..., z_n = w_n'X in z = WX. For example, z_1 = w_{11} X_1 + w_{12} X_2 + \cdots + w_{1n} X_n.

In addition, the diagonal elements in Equation 2.24 are the eigenvalues of S. Thus the eigenvalues \lambda_1, \lambda_2, ..., \lambda_n of S are the variances of the principal components z_i = w_i'X:

    s_{z_i}^2 = \lambda_i    (2.26)

Since the eigenvalues of S are the variances of the principal components, the percentage of the total variance captured by the first k principal components can be represented as:

    \% \text{ of Variance Captured} = \frac{\lambda_1 + \lambda_2 + \cdots + \lambda_k}{\sum_{i=1}^{n} s_{ii}}    (2.27)

The following is a summary of some interesting and useful properties of principal components (Johnson, Wichern, p. 342):

* Principal components are uncorrelated.
* Principal components have variances equal to the eigenvalues of the covariance matrix S of the original data.
* The rows of the orthogonal matrix W correspond to the eigenvectors of S.
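These properties can be checked numerically. The sketch below (illustrative code, not from the thesis; the random test data are an assumption) forms the sample covariance matrix, takes its eigendecomposition, and confirms that the component scores are uncorrelated with variances equal to the eigenvalues, as in Equations 2.24 and 2.26.

    import numpy as np

    def pca(Z):
        """Return eigenvalues, eigenvectors (rows of W), and component scores."""
        S = np.cov(Z, rowvar=False)               # sample covariance matrix
        eigvals, eigvecs = np.linalg.eigh(S)
        order = np.argsort(eigvals)[::-1]         # sort by decreasing variance
        W = eigvecs[:, order].T                   # rows of W are eigenvectors of S
        scores = (Z - Z.mean(axis=0)) @ W.T       # principal components z = Wx
        return eigvals[order], W, scores

    # Check on random data: Cov(scores) should be diagonal with the eigenvalues.
    rng = np.random.default_rng(0)
    Z = rng.normal(size=(200, 5))
    lam, W, scores = pca(Z)
    print(np.allclose(np.cov(scores, rowvar=False), np.diag(lam)))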
Chapter 3

Web Process 1

3.1 Background

The data set used in this chapter is collected from a continuous web manufacturing process where more than 850 in-line control parameters are constantly monitored. The end-of-line data come from an optical scanner sensitive to small light-scattering defects, where 8 important quality parameters are measured with high precision. In this chapter, some of the analysis tools described in Chapter 2 are utilized to characterize the multivariate behavior of the in-line data, the multivariate behavior of the end-of-line data, and the statistical relationship between the two.

3.2 Data

The data set from Web Process 1 consists of two major groups of variables: in-line variables and end-of-line variables. The in-line data set consists of physical parameters that control the production process, while the end-of-line data are parameters that indicate the quality of the end product. The combined data set represents information for the manufacturing of 115 rolls of the end product.

The in-line data set contains 854 control parameters, measured approximately every 61 seconds for 4320 observations. The end-of-line data consist of 4836 measurements of 8 quality parameters. The values of these quality parameters are collected by a real-time scanner that sweeps across the web at constant frequency. One of the 8 quality variables is an indicator of the type of defect that occurs at the end-of-line.
There are a total of 8 defect types, and each one of them is simply assigned a numeric value. In-Line Data There are a total of 854 in-line parameters, measured approximately every 61 seconds for a period of 3 days. Of all these parameters, 194 variables are constant over the entire period. These 194 variables can be discarded without any further investigation. In addition, 222 variables are also eliminated, because they are simply averages of other parameters. Consequently, 438 in-line parameters are left for analysis. 3.4 Feature Characterization Before performing any multivariate analysis, much insight can be obtained from examining the data in the time and space domain. 3.4.1 In-Line Data The in-line variables show fluctuations over time. This means the physical process does not remain steady. It tends to change "significantly" over time. The following is a plot of the behavior of a typical in-line parameter over time. --- Figure 3-1: Time-Series Behavior of a Typical In-Line Variable 3.4.2 End-of-line Data The end-of-line data, include the sizes, shapes, positions and times of defects. Figure 3-2 is a visual representation of the positions and times of the defects. The horizontal axis represents the cross-web position, and the vertical axis represents the times when defects occur. Each point on the graph represents a defect spot at a particular time and at a particular position across the web. One can simply imagine Figure 3-2 as one big sheet of the end-product where the defects location are marked by dots. If the web moves at a fairly constant speed, the variable, time, on the vertical axis is highly correlated with down-web position of defects. Position of Defects on Web I,. I~J4 ;*i7*' EI.r · JI- · i'·- I 1 Ii ::z:t.:. 5'. 10 U~ · · · ... :.: . I:,. 20 30 WKdh 40 50 Figure 3-2: Cross-Web Position of Defects Over Time Figure 3-2 shows a number of interesting features: n"~ * Defects can be categorized into streaks and cloud. Some defects tend to occur at the same cross-web position over time (streaks), while others appear to occur fairly randomly on the web (cloud). * Defects are significantly denser on the left side of the web than the right side. * For certain periods of time, there are no defect occurrences across the web. The could represent 'perfect' times when the manufacturing process is running without any defects. Perfect Observations Figure 3-2 shows that there are certain periods of time where no single defect occurs across the web. There are two possible scenarios that can explain these 'perfect' observations: 1) At these observations, the manufacturing process is running perfectly and all the control parameters are at their optimal levels. Consequently, there are no defects. 2) These 'perfect' observations are simply the result of the process being shut down and the scanner recording no information. After some investigation, it was discovered that the manufacturing process is occasionally shut down for various maintenance reasons, such as the cleaning of rollers, etc. Since the scanner continues to operate during these periods, no defects are recorded on the web. As a result, it was determined that the 'perfect' observations are simply contaminants that do not have any physical significance. 01loued 4= 35M awo i·z ).· ~':·; '·' i·5 · ~':.·· :. ·· f··L ·· ·' ' ~ -:z~~ . L ,· ~c':' ' -~··P "000 1S r: :·· V · ·· -- 0 '' ~·:·' _l;~_ri·. ~·I·~ '' ~':.·~:' ~. · · · I· `rr.. · · r·4·i .· 'r =' C r. 2o 30 Sekec · · i ·.:" -·· ··· ,· · a40 i a Oft-k0.1" :010 *rjH 3500. 1 2000. 
3.5 Correlation Analysis

In order to improve quality and reduce the defect rate, an interesting question that a plant manager might ask is "are the occurrences of streak and cloud defects related to some common physical factors, or are they caused by separate physical phenomena?" Figure 3-3 shows that these two types of defects can be clearly separated from each other and seem to resemble two separate physical processes. The cloud defects seem to be fairly randomly distributed, while the streak defects are concentrated on the left side of the web and seem somewhat correlated. Correlation analysis is a good method to apply here to determine the relationships between streaks and between streak and cloud defects. Understanding the correlations among streaks and clouds is a good beginning to understanding the underlying physics that causes the defects.

3.5.1 Streak-Streak Correlation

Figure 3-3 indicates that most streaks occur near the left edge of the web. This suggests that streak defects are not randomly generated. Some physical characteristic particular to the left side of the web could be causing the streak defects. To test this hypothesis, the correlation coefficients between all 45 combinational pairs of the 10 densest streaks are calculated. If the streaks are caused by some common factor, the correlation coefficients between streaks should be close to 1 or -1. Conversely, if the streaks are not caused by some common factor, the correlation coefficients should be closer to 0.

Figure 3-4: The 10 Densest Streaks Over Time

Method

Figure 3-4 shows the ten densest streaks on the web, which are used for correlation analysis. In order to calculate the correlation coefficients between streaks, each streak is divided into approximately 257 time blocks. Consequently, a single streak can be represented as a 257-element vector, where each element is the total defect count within each time block. The correlation coefficient between any two 257-element vectors can be calculated using Equation 2.10. Furthermore, each streak can also be divided into time blocks of other lengths, and the same procedure can be applied to calculate the correlation coefficients of the streaks using different-length time blocks.
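The sketch below (illustrative; the function and variable names are assumptions, not the thesis code) bins each streak's defect times into fixed-length time blocks and computes the correlation coefficient of Equation 2.10 for every pair of streaks, which is one way the quantities plotted in Figure 3-5 can be obtained.

    import numpy as np
    from itertools import combinations

    def bin_counts(defect_times, n_blocks, t_max):
        """Total defect count of one streak in each of n_blocks equal time blocks."""
        edges = np.linspace(0.0, t_max, n_blocks + 1)
        counts, _ = np.histogram(defect_times, bins=edges)
        return counts

    def pairwise_streak_correlations(streaks, n_blocks, t_max):
        """Correlation coefficient (Eq. 2.10) for every pair of streaks.

        streaks: list of arrays, each holding the defect times of one streak.
        """
        vectors = [bin_counts(s, n_blocks, t_max) for s in streaks]
        return {(i, j): np.corrcoef(vectors[i], vectors[j])[0, 1]
                for i, j in combinations(range(len(streaks)), 2)}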
Results

Figure 3-5a shows the correlation coefficients for all 45 combinations of the 10 defect streaks using the standard-length time blocks. Although most of the streaks are positively correlated, the average correlation coefficient is only 0.08652. Figure 3-5b and Figure 3-5c show the correlation coefficients of the 10 defect streaks using double-length and quadruple-length time blocks respectively. There is a small but steady increase in the correlation coefficients as the length of the time blocks is increased: for double-length time blocks the average correlation coefficient is 0.1054, and for quadruple-length time blocks the average correlation coefficient is 0.122.

Figure 3-5: Correlation Coefficients Between Streaks Using a) standard time block, b) double-length time block, c) quadruple-length time block

The increase in correlation coefficients with increasing time blocks indicates that while the streaks are somehow correlated, most of the correlations are buried in high-frequency noise. As the time blocks lengthen, some of the high-frequency noise is filtered out, resulting in higher correlation coefficients. But due to the uncertainty of the signal-to-noise ratio of the process, the 'true' correlation coefficients between the streaks are still uncertain.

3.5.2 Streak-Cloud Correlation

This part of the analysis focuses on determining correlations between the cloud defects and the streak defects. If the two types of defects are highly correlated, there should exist some common process parameters that cause the occurrence of both streak and cloud defects. If they are not highly correlated, the two types of defects are most likely caused by separate process parameters. In addition, comparing the correlation coefficients calculated using different time blocks can provide some information as to the frequency range where the two types of defects are most correlated or uncorrelated.

Results

Using time blocks varying from length 1 to length 100, the correlation coefficients are calculated between the streak and cloud defects. Figure 3-6 shows that the streak and cloud defects are positively correlated. For time blocks on the order of length 1 to length 20, the correlation coefficient is approximately 0.2. As the length of the time blocks increases to the order of 70 to 100, the correlation coefficient gradually increases to an average of approximately 0.35.

Figure 3-6: Correlation Coefficients Between Streak and Cloud with Time Blocks of length 1 to length 100

The positive correlations between the streak and cloud defects indicate that they are related to some common underlying physics. In addition, the analysis shows that the correlation is higher when longer time blocks are used. This suggests that some of the high-frequency noise is filtered out as the time block increases, resulting in a more accurate representation of the correlation coefficients between the streaks and clouds.

3.5.3 Interpretation

The defect data show the difficulties in determining the correlation coefficients between two processes when the true signal-to-noise ratio is unknown. For example, in Section 3.5.1, it is shown that the correlation coefficients between the streaks increase as the time blocks are lengthened. Lengthening the time blocks, in effect, removes some of the high-frequency Poisson noise component, resulting in a more accurate representation of the correlation coefficients. More analysis needs to be done to quantify the effect of Poisson noise on the correlation coefficient so that the 'true' correlation coefficients between streaks and between streaks and clouds can be identified.

3.6 Poisson Distribution

Figure 3-3a shows that cloud defects seem to be randomly generated and fairly evenly distributed across the web, suggesting a Poisson distribution. In this section, analyses are performed to look at the distribution of cloud defects over time. The key is to find out whether or not the cloud defects exhibit a Poisson distribution, and if they do, over what frequency range.
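A minimal sketch of the comparison described next (illustrative code, not the thesis's; the block length is an assumed parameter): bin the cloud-defect times into fixed-length blocks, then compare the histogram of counts with the ideal Poisson probabilities for the same average defect density.

    import numpy as np
    from scipy.stats import poisson

    def cloud_count_histogram(defect_times, block_length, t_max):
        """Observed distribution of cloud-defect counts per time block."""
        edges = np.arange(0.0, t_max + block_length, block_length)
        counts, _ = np.histogram(defect_times, bins=edges)
        observed = np.bincount(counts) / counts.size     # empirical P(k defects per block)
        return counts, observed

    def ideal_poisson(counts):
        """Ideal Poisson probabilities with the same average defect density."""
        lam = counts.mean()
        k = np.arange(counts.max() + 1)
        return poisson.pmf(k, lam)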
3.6.1 Method

Similar to the method utilized for determining correlations between streak and cloud defects, the entire set of cloud defects is divided into time blocks of a certain length. Once the total cloud defect count is determined for each time block, a histogram of the total defect count per time block is presented and compared to a plot of a typical Poisson distribution. Time blocks of different lengths can be used to determine if there are certain frequency ranges where the cloud defects resemble Poisson distributions.

Figure 3-7: (a) Cloud Distribution Using Fixed-Length Time Blocks, (b) Ideal Poisson Distribution Using the Same Fixed-Length Time Blocks

3.6.2 Results

A time block of a certain fixed length is selected to determine the histogram of the defect density. For the selected standard length, the average number of cloud defects in each standard time block is approximately 2. Figure 3-7a shows the histogram of these cloud defects using these fixed-length time blocks, and Figure 3-7b shows an ideal Poisson distribution generated with the same average defect density using the same standard-length time blocks. A comparison of Figures 3-7a and 3-7b shows that the cloud defect distribution does not resemble a Poisson distribution for these standard-length time blocks.

Figure 3-8: Distributions of Cloud Defects Using x2, x4, and x6 Time Blocks

Figure 3-9: Ideal Poisson Distributions Using x2, x4, and x6 Time Blocks

Figure 3-8 presents the histograms of cloud defect density using 2 times, 4 times and 6 times the length of the standard time blocks used in Figure 3-7. Figure 3-9 shows the ideal Poisson distributions with the same average defect density as the cloud distributions generated using time blocks x2, x4, and x6 the standard length in Figure 3-7. A comparison of Figure 3-8 to Figure 3-9 shows that the cloud defect density does not exhibit a Poisson distribution when measured using small time blocks. But as the length of the time block is increased, the distribution of the cloud defects becomes similar to that of a Poisson distribution.

3.7 Principal Component Analysis

As discussed in Section 2.6, principal component analysis (PCA), also referred to as the Karhunen-Loeve transformation (KLT), is a powerful tool in multi-variable data analysis. In a multi-variable data space, the number of variables that have to be observed simultaneously can be enormous. As a result, PCA is applied to reduce the number of variables without losing much information and to interpret the data using a different set of axes.

3.7.1 PCA of the End-of-Line Data

Principal component analysis is applied to 7 of the 8 end-of-line quality variables, excluding the variable that characterizes the defect type. Figure 3-10 shows the time-series behavior of the first 3 principal components. Figure 3-11 shows the accumulated variance captured as a function of the number of principal components used. One can see that approximately 90% of the information contained in the 7 end-of-line variables is captured by the first 3 principal components.

3.7.2 PCA of the In-Line Data

Principal component analysis is applied to the 438 in-line variables. Figure 3-12 displays the first 4 principal components of the in-line data. Changes in the principal components imply that the production process fluctuates over time.
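The cumulative variance curve of Figure 3-11 follows directly from Equation 2.27. The sketch below (illustrative, not the thesis code; the 90% target is taken from the discussion above) computes the fraction of total variance captured by the leading components and the smallest number of components reaching the target.

    import numpy as np

    def variance_captured(Z, target=0.90):
        """Cumulative fraction of variance captured by the leading PCs (Eq. 2.27)."""
        S = np.cov(Z, rowvar=False)
        eigvals = np.sort(np.linalg.eigvalsh(S))[::-1]    # descending eigenvalues
        cumulative = np.cumsum(eigvals) / np.trace(S)
        k = int(np.searchsorted(cumulative, target) + 1)  # smallest k reaching the target
        return cumulative, k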
Figure 3-10 shows the time-series behavior of the first 3 principal components. Figure 3-11 shows the accumulated variance captured as a function of the number of principal components used. One can see that approximately 90% of the information contained in the 7 in-line variables are captured by the first 3 principal components. 3.7.2 PCA of the In-Line Data Principal component analysis is applied to 438 in-line variables. Figure 3-12 displays the first 4 principal components of the in-line data. Changes in the principal components imply that the production process fluctuates over time. 10 di 5 -5- 0 o 500 o 1000 1 ooo 1500 2000 25M0 3000 5oo 2000 200 Figureoo 3 Principal 1003-10 Firsoo Tie Tk- 2ooo Components 3000 3=0 4000 45'00 5000 3500 4000 4500 of the 000 End-oo sof-Line Data Figure 3-10: First 3 Principal Components of the End-of-Line Data I j Figure 3-11: Percent of Variance Captured by PCs 10CLi0 -10. -201 5/15pm 6/4noon 524pm 5A8Smm 20 0-I 5I10aM 5/2 dam 5/1 5pm 5/32amn 24pm 54 10Opm 5/3 noon 5/4lam Figure 3-12: First 4 PCs of the In-Line Data 3.7.3 Interpretation A comparison of the two sets of principal components presented in Figure 3-10 and Figure 3-12 can reveal some interesting insight into the nature of the relationship between the in-line and the end-of-line data. Since the principal components are another way of representing the process variables, fluctuations in the principal components indicate fluctuations in the underlying process. As indicated before, the principal components of the in-line data fluctuate noticeably in time. Assuming there exists a close relationship between the in-line data and the end-of-line data, the principal components of the end-of-line data should also show similar fluctuations. However, Figure 3-10 does not confirm this. Instead, the principal components of the end-of-line data seem to behave completely independently of the principal components of the in-line data. As a result, PCA shows that there is no strong linear relationship between the in-line and the end-of-line data from web process 1. Chapter 4 Web Process 2 4.1 Background Web process 2 is a multi-staged manufacturing system that takes raw materials and transforms them into the final product through a number of sequential subprocesses. Raw materials are introduced into the first stage, and after going through certain chemical and physical changes, the desired output is produced. Next, the output from the first stage becomes the input for the second stage. Again, under certain control parameters, input is transform into output. This process is repeated a number of times, as input turns into output, and output becomes input. The output of the final stage of this multistaged manufacturing process becomes the final product. It must be noted that the output from each stage can be very different from the input. As a result, the final product or the output of the final stage is often nothing like the initial input. Each stage in this multi-staged manufacturing process can be treated as a subprocess. Although the subprocesses can be occasionally shut down for maintenance or testing purposes, they are continous processes with real-time controlling and monitoring parameters. But between these stages, there can be certain amount of delay as material is transferred between subprocesses. The output of one stage often does not immediately become the input for the next stage. 
Due to various factors such as supplier power and customer demand, production within certain stages can be sped up or slowed down, resulting in delays between subprocesses. Understanding these delays can be important when performing multivariate data analysis.

4.2 Data

The data set for web process 2 contains 1518 variables recorded every time unit for a total of 2000 observations. These variables can be either control or monitor variables. As mentioned in a previous chapter, control variables are also referred to as in-line variables, and monitor variables can be called quality variables. Since the variables in the data set are arranged in alphabetical order, the order of the data does not have any physical significance. In other words, the variables from all the different subprocesses are scrambled together. In addition, they are not separated into either in-line or quality variables, nor are they grouped according to subprocesses.

4.3 Preprocessing

A major fraction of the data set containing 1518 variables and 2000 observations is either corrupted or missing. After unwanted variables and observations are removed, the remaining working data set contains 1010 variables and 1961 observations, which is about 65 percent of the original data set. Next, variables whose sample variance is zero are deleted from the data set, because they contain no useful information. The remaining data set contains 1961 recordings of 860 variables, which include both control and quality parameters. Before applying multivariate analysis, the variables are all normalized to zero mean and unit variance using the methods discussed in Section 2.2.3.

4.4 Feature Characterization

4.4.1 Quality Variables

Unlike web process 1, the quality variables for web process 2 do not record the actual physical location of defects. Instead, the quality variables are various modified physical parameters. As a result, no one figure can capture the quality of the final product.

4.4.2 In-Line Variables

Since there are many in-line variables, it would be very difficult to analyze them in depth one by one. But a simple look at the behavior of individual variables over time could provide some valuable insight before performing any multivariate analysis. The following are 10 typical types of behavior associated with the in-line variables:

Figure 4-1: Ten Types of In-Line Variable Behavior

These graphs present some interesting features with regards to the behavior of the manufacturing process. Each of the above 10 plots represents a group of variables with a particular type of behavior. Almost all the in-line variables can be categorized into one of the ten types of behavior described below:

1. Variable 1 - represents the set of variables whose values remain fairly constant except for sharp transient outliers at certain observations.

2. Variable 2 - represents a set of variables that increase linearly and reset themselves periodically.

3. Variable 3 - belongs to a group of variables that tend to remain constant for a period of time before jumping to another value.
4. Variable 4 - generally low-frequency quantized behavior with sharp transient outliers.

5. Variable 5 - linear time-series behavior.

6. Variable 6 - high-frequency oscillatory behavior that drifts over time.

7. Variable 7 - high-frequency periodic behavior that is confined tightly within a certain range.

8. Variable 8 - fairly random high-frequency behavior.

9. Variable 9 - high-frequency behavior with relatively small amplitudes compared to sharp transient outliers.

10. Variable 10 - high-frequency behavior with a lower bound.

4.5 Outliers Analysis

As defined in Section 2.3.1, outliers are observations that are inconsistent with the rest of the data. Identifying and understanding outliers in a manufacturing setting can be very important to plant managers whose goals are to eliminate variation and to reduce defects. The plant managers are interested in knowing the answers to the following questions:

1. Are the outliers simple extensions of the normal behavior?

2. If not, are there any physical significances behind the outliers?

3. If so, can the outliers be grouped according to these physical significances?

4.5.1 Normalized Euclidean Distance

The normalized Euclidean distance method, as explained in Section 2.3.4, is a good way to identify outliers. In this case, the method is applied to 860 variables and 1961 observations.

Figure 4-2: Normalized Euclidean Distance

The plot of the normalized Euclidean distance in Figure 4-2 shows that there are at least two distinct populations in the data set. One group of observations, where the normalized Euclidean distance is above approximately 1000, shows sharp and spiky behavior over time, while the other group of observations, where the normalized Euclidean distance is less than 1000, shows slow-moving and fairly constant time-series behavior. In order to define outliers, it can be assumed that in a properly functioning multivariate manufacturing environment, all the process parameters operate within a certain normal range of values both individually and collectively. Consequently, behavior outside this normal range can be categorized as outlier behavior contributed mostly by contaminants. Figure 4-2 shows that an appropriate normal range of behavior can be defined as observations with normalized Euclidean distance less than 1000, and the outlier set corresponds to observations with normalized Euclidean distance greater than 1000. Figure 4-3 is a plot of these two separated groups: 1) the normal set, and 2) the outlier set. The time-series behavior of the normalized Euclidean distance of these two sets does not seem to be extensions of each other. The normal set exhibits fairly constant and stable behavior, while the outlier set is transient and very unstable.

Figure 4-3: Outlier and Normal Behavior Based on Normalized Euclidean Distance
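A minimal sketch of this screening step is given below. It assumes the normalized Euclidean distance is simply the Euclidean norm of each observation after every variable has been scaled to zero mean and unit variance (consistent with the normalization described earlier, though the exact implementation of Section 2.3.4 is not reproduced here); the variable names, the synthetic data, and the planted outlier are illustrative, and the 1000 threshold mirrors the cutoff read off Figure 4-2 rather than a general rule.

```python
import numpy as np

def normalized_euclidean_distance(X):
    """Distance of each observation from the multivariate mean, computed
    after every variable is scaled to zero mean and unit variance.

    X is an (n_observations x n_variables) array; returns one distance
    per observation.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    return np.sqrt((Z ** 2).sum(axis=1))

# Illustrative split into a "normal" set and an "outlier" set.
rng = np.random.default_rng(1)
X = rng.normal(size=(1961, 860))
X[100] += 100.0                       # plant one artificial outlying observation
d = normalized_euclidean_distance(X)
threshold = 1000                      # read off the distance plot (cf. Figure 4-2)
normal_idx = np.where(d <= threshold)[0]
outlier_idx = np.where(d > threshold)[0]
print(len(normal_idx), "normal observations,", len(outlier_idx), "outliers")
```

For any other data set the threshold would have to be chosen from the corresponding distance plot rather than reused.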
4.5.2 Time Series Model - PCA

In addition to normalized Euclidean distance, various time-series methods, such as principal component analysis (PCA), can also be good methods to identify outliers. PCA groups together variables that are correlated with each other. Since multivariate outliers are produced by sets of variables that exhibit similar 'outlying' behavior at certain times, PCA should be able to group together the variables that contribute to the same outliers. PCA should be more effective in grouping outliers than the normalized Euclidean distance method, because it groups variables that are physically significant together rather than simply grouping the observations into two populations.

Principal Component Analysis

Figure 4-4 represents the first 10 principal components calculated from the data set containing 860 variables and 1961 observations. Similar to the plot of the normalized Euclidean distance in Figure 4-2, Figure 4-4 shows that the principal components also exhibit sharp discontinuities at certain observations where their values jump very sharply. A more careful look at Figure 4-4 shows that the first ten principal components exhibit two major types of outliers.

* 1st type - Step outliers. This set of outliers is associated with the 1st and 3rd principal components, where the values of the principal components between approximately observation 970 and 1220 are substantially different from the values of most of the other observations.

* 2nd type - Transient outliers. They are associated with the 3rd through the 10th principal components, where the outlier values are very different from the rest of the population only for very brief periods of time.

Figure 4-4: First Ten Principal Components of Web Process 2 Data Set

These two types of outliers seem to be controlled by two different sets of physical parameters. The first type of outlier takes place when the principal component jumps suddenly from a certain range of values to a different range of values, stays there for a period of time, and jumps back to the original range of values. The second type of outliers are transient outliers that occur abruptly and for brief periods of time. The contrasting behavior of these two types of outliers indicates that they are controlled by separate underlying physical processes. Looking at the 3rd through 10th principal components associated with transient outliers, we can identify two distinct groups within the transient outlier set.

* The first group is associated with the 3rd, 4th, and 5th principal components, where their values exhibit sharp changes at approximately observations 100, 1750, and 1950, occurring with similar relative proportions.

* The second group is associated with the 7th, 8th, 9th, and 10th principal components, where their values change sharply at approximately observation 600.

PCA with Frequency Filtering

Transient outliers occur when the values of the principal components abruptly jump to a very different value for a short period of time before returning to the original values. As discovered from Figure 4-4, there are two kinds of transient outliers associated with the 3rd through 10th principal components. The figure shows that the first kind of transient outlier is spread out over the 3rd, 4th, 5th, and 6th principal components, while the second kind is spread out over the 7th, 8th, 9th, and 10th principal components. PCA collapses variables that are similar onto the same dimensions.
But in this case, each of the two kinds of transient outliers is spread out over more than one dimension. One hypothesis for this phenomenon is that the original data set is dominated by low-frequency behavior. As a result, the first few principal components are also dominated by low-frequency behavior, and the high-frequency transient outliers are not dominant enough to be grouped onto the same dimensions. Since we know that the transient outliers are associated with high-frequency behavior, one way of collapsing the transient outliers into a smaller number of dimensions is to perform high-pass filtering on the original data set before applying PCA. The idea here is that with the low-frequency components filtered out, PCA can group the high-frequency transient outliers much more effectively.

Figure 4-5: High-Pass Filters with Wp=0.1 and Wp=0.3

Figure 4-6: First Ten Principal Components from 90% High-Pass Filtered Data

Figure 4-7: First Ten Principal Components from 70% High-Pass Filtered Data

Figure 4-5 shows the graphs of two high-pass filters utilized to remove the low-frequency components in the original variables. Figure 4-6 presents the first 10 principal components calculated from variables with the lowest 10 percent of the frequency range filtered out. Figure 4-7 presents the first 10 principal components calculated from variables with the lowest 30 percent of the frequency range filtered out. Figure 4-6 and Figure 4-7 show that PCA applied to high-pass filtered variables does remove the low-frequency behavior but does not effectively separate the different kinds of outlier behavior. For the two kinds of transient outliers discussed in the previous section, PCA with frequency filtering still spreads each one of them out over more than one principal component. The first kind of transient outlier appears in the 3rd, 4th, 8th, and 10th principal components in Figure 4-6, where the lowest 10 percent of the variable frequency range is removed. Although the second kind appears predominantly in the 6th principal component, it also shows up in the 5th and 9th principal components. In Figure 4-7, where the lowest 30 percent of the variable frequency range is filtered out, the two kinds of transient outliers appear slightly better defined.
The first kind mostly occupies the 6th principal component, while the second kind mostly shows up in the 5th principal component.

4.5.3 Identifying Outlying Variables

From a plant manager's point of view, outliers represent shifts and changes in process parameters that can potentially affect the quality of the end product. As a result, we want to develop methods and tools to help managers identify the physics behind these outlier behaviors. To understand the underlying physics, we need to determine which variables or combinations of variables contribute to which outliers. This way, we are able to analyze the variables and determine the causes of the outliers. In this section I will present some methods to group the variables according to their contributions to the outliers.

Transient Outliers

Focusing on the transient outliers in the top plot of Figure 4-3, which does not include the set of outliers from observation 1000 to 1200, we can see that there are mainly 4 regions where the values of the normalized distance jump up suddenly and return quickly. These 4 regions are located approximately at observations 100, 600, 1750, and 1950. The goal here is to determine which variables contribute to the transient outliers in these 4 regions. One method to find the contributing variables is to find the variables that also experience sudden changes in value at the observations corresponding to the 4 transient-outlier regions. Since a manufacturing data set often contains hundreds of variables, looking at the time behavior of each variable would be burdensome in determining the causes of transient outliers. The following procedure is a simpler way to find the contributing variables. For a data set X_ij, where i represents the observation number 1, ..., n, and j represents the variable number 1, ..., m, D is a difference matrix with dimensions (n - 1) x m, whose rows are equal to the differences between adjacent rows of X. D_ij is defined as:

D_ij = X_(i+1)j - X_ij    (4.1)

Let M be a row vector of length m whose jth element is the average of the jth column of the difference matrix D. The 4 transient outlier regions are represented by i = 100, 600, 1750, and 1950 respectively. The variables that contribute to region 1, where i = 100, are the variables with index j that satisfy D_ij >> M_j. The variables that contribute to the other outlier regions can be determined by using the appropriate i's. The basic method described here is to find the variables whose greatest change between two consecutive observations, compared to their average change, occurs at the observations corresponding to the 4 transient outlier regions. Thus, we can determine a set of variables that contributes to the transient outlier behavior in each of the 4 regions. Results show that this method is effective in determining the variables that contribute to the transient outliers in the different regions. Figure 4-8 is a plot of a set of identified variables that correspond to the outliers in regions 1 and 4.

Eigenvector Analysis

The eigenvectors associated with the principal components reveal how much each variable contributes to the time-series behavior of the corresponding principal components. As a result, eigenvectors associated with principal components that exhibit outlier behavior can potentially reveal the variables that contribute most to the outliers.
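As a rough illustration of this idea, the sketch below ranks variables by the magnitude of their weights in the eigenvector of a chosen principal component. It is a hedged example with synthetic data and illustrative names, not the analysis code used on the web process 2 data set; the planted step over observations 970 to 1220 only mimics the kind of behavior described above.

```python
import numpy as np

def top_contributors(X, component=0, n_top=10):
    """Rank variables by |weight| in the eigenvector of one principal component.

    X is an (n_observations x n_variables) array.  The eigenvectors of the
    correlation matrix are the weights w_i in z_i = w_i^T x, so a large
    |weight| marks a variable that drives that component's time behavior.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    R = np.corrcoef(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]
    w = eigvecs[:, order[component]]              # eigenvector of the chosen PC
    ranked = np.argsort(np.abs(w))[::-1]          # variables, most influential first
    return ranked[:n_top], w[ranked[:n_top]]

# Variables whose weights dominate the first principal component would be
# candidates for causing a step outlier that dominates that component.
rng = np.random.default_rng(2)
X = rng.normal(size=(1961, 50))
X[970:1220, :5] += 8.0                 # a step shared by the first five variables
idx, weights = top_contributors(X, component=0, n_top=5)
print("Most influential variables:", idx)
```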
* Step Outliers - Since the first principal component represents most of the behavior associated with the step outliers, the corresponding eigenvector can provide information as to the contributing variables. Figure 4-9 presents plots of the first principal component and the corresponding eigenvector. As discussed in Section 4.5.1, we can see that the 1st principal component is dominated by the outlier behavior from approximately observation 1000 to 1200. The corresponding eigenvector shows that 122 variables are weighted significantly more than the rest of the variables. These 122 variables are the main causes of this type of outlier.

Figure 4-8: Variables Identified that Contribute to Transient Outliers in Regions 1 and 4

Figure 4-9: The First Principal Component and the Corresponding Eigenvector from Process 2 Data

* Transient Outliers - With the 122 variables that contribute to the step outliers removed, the eigenvectors of the correlation matrix and the associated principal components are calculated for the remaining 738 variables and 1961 observations. Figure 4-10 shows the first 10 principal components, and Figure 4-12 shows the corresponding eigenvectors.

Figure 4-10: The First Ten Principal Components from 738 Variables

Figure 4-11: The First Ten Eigenvectors from 738 Variables

Since the two kinds of transient outliers are spread out over the first 10 principal components, the variables that contribute to these two kinds of transient outliers are spread out over the 10 eigenvectors. As a result, Figure 4-12 shows that all the variables are weighted fairly evenly in the determination of the principal components, and it is hard to determine which variables contribute most to the outliers from examining the eigenvectors. Eigenvector analysis is not very effective in isolating variables when the outlier behaviors are spread out over many principal components.

4.6 Variable Grouping

This section addresses the question: how can variables that are related to some common physical process be effectively grouped together? In order to group or separate variables, it is very important to have some understanding of the underlying physics of the manufacturing process. In web process 2, the manufacturing process is divided into subprocesses, where the output of one subprocess becomes the input of another. The data set contains both control and quality variables from all the subprocesses.
It is reasonable to hypothesize that variables that are related to the same subprocess are more correlated with each other than variables from different subprocesses. The following 3 subsections present 3 methods of variable separation. Section 4.6.1 presents principal component analysis with a focus on examining the associated eigenvectors calculated from the correlation matrix of the original data. Section 4.6.2 and Section 4.6.3 introduce two methods where some of the noise in the original data set is removed before performing PCA. Based on the hypothesis that variables related to the same subprocess are more correlated than variables from different subprocesses, Section 4.6.2 shows how a more robust correlation matrix can be used to calculate PCA. In Section 4.6.3, variables are frequency-filtered prior to the calculation of the correlation matrix. Results will show that these two methods are more effective than standard PCA in capturing the variable groups within the original data set.

4.6.1 Principal Component Analysis

Principal component analysis groups variables together based on the variables' correlation with each other. Recall from Section 2.6.3 that the ith principal component is a weighted linear combination of the original set of variables, obtained from the following relationship:

z_i = w_i^T X    (4.2)

where w_i is the eigenvector associated with the ith principal component z_i, and X is the original data set. Equation 4.2 shows that the eigenvectors characterize the weights of the variables in the calculation of the principal components, and the eigenvectors reveal which variables contribute the most to the time behavior of the corresponding principal components. As a result, examining the eigenvectors obtained from the correlation matrix of the original data can provide important information as to how the variables are grouped.

Figure 4-12: First 10 Eigenvectors of 738 Variables

Figure 4-13: Histograms of the First 10 Eigenvectors of 738 Variables

Figure 4-12 shows the eigenvectors associated with the first 10 principal components obtained from the correlation matrix of 738 variables, and Figure 4-13 shows the histograms of the eigenvectors. In Figure 4-13, the symbol 'ht' represents the height of the middle bar. Since the sum of the area under the bars for each eigenvector is the same, the height of the middle bar is a good indication of how widely the values of the eigenvector are distributed. The plots of the first 10 eigenvectors and the associated histograms reveal that almost all the variables are weighted towards their respective principal components. No single eigenvector is dominated by a small number of variables.

Interpretation

Based on the assumption that variables from the same subprocess are highly correlated due to their common link to the underlying physics of the subprocess, variables from different subprocesses should be less correlated.
Since the data set contains variables from many independent subprocesses, it is expected that many of these variables will be uncorrelated with each other. Consequently, for each eigenvector calculated from the correlation matrix, there should be variables that make a significant contribution to the associated principal component, and there should also be variables that make little or no contribution to the associated principal component. Contrary to the above hypothesis, Figure 4-12 shows that almost all the variables in the first 10 eigenvectors contribute to some extent to the time behavior of the first 10 principal components. This is not consistent with the initial assumption that variables from different subprocesses should not be correlated and, thus, should not all contribute to the same principal components. From this, we can conclude that there is a lot of noise in the original data set, resulting in accidental correlations between variables from different subprocesses. Thus PCA does not group the variables as well as it could. Methods should be developed to improve the signal-to-noise ratio of the eigenvectors so that PCA can better categorize the variables collected from different subprocesses.

4.6.2 PCA with Robust Correlation Matrix

Ideally, only variables that are from the same subprocess or that are controlled by the same underlying physics should contribute to the same principal component. Consequently, it can be assumed that the variables from different subprocesses are correlated mostly by accident, and the correlation between them can be considered as noise. One method to improve the signal-to-noise ratio in the eigenvectors of the correlation matrix is to create a more robust correlation matrix by eliminating some of the accidental correlations between variables.

Figure 4-14: Magnitude of Correlation Coefficients of 738 Variables in Descending Order in (a) Normal Scale, (b) Log Scale

Method

Figure 4-14 shows the magnitudes of the correlation coefficients r_ij calculated from the 738 variables, arranged in descending order on both a normal scale and a log scale. In order to create a more robust correlation matrix, where some of the accidental correlations are removed, let ε be the cutoff correlation coefficient: every |r_ij| < ε is set to 0, and every |r_ij| > ε maintains its value. Principal component analysis is performed using this more robust correlation matrix to determine the grouping of the variables.
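A minimal sketch of this thresholding step is shown below, looping over the cutoff values examined in the results that follow. The synthetic data, the function name, and the single correlated group are illustrative and not drawn from the thesis; note also that zeroing small coefficients means the thresholded matrix is no longer guaranteed to be positive semi-definite, which is harmless for comparing eigenvectors but worth keeping in mind.

```python
import numpy as np

def robust_pca_eigvecs(X, cutoff=0.06):
    """Eigenvectors of a thresholded ("robust") correlation matrix.

    Correlation coefficients with |r_ij| < cutoff are set to zero before the
    eigendecomposition, so weak (presumably accidental) correlations no
    longer influence the principal components.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    R = np.corrcoef(Z, rowvar=False)
    R_robust = np.where(np.abs(R) < cutoff, 0.0, R)
    np.fill_diagonal(R_robust, 1.0)               # keep a unit diagonal
    eigvals, eigvecs = np.linalg.eigh(R_robust)
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]

# Illustrative comparison of how concentrated the leading eigenvector is
# as the cutoff is raised; only the first ten variables share a common factor.
rng = np.random.default_rng(3)
X = rng.normal(size=(1961, 100))
X[:, :10] += np.outer(rng.normal(size=1961), np.ones(10))   # one correlated group
for cutoff in (0.0, 0.06, 0.10, 0.15, 0.18):
    _, V = robust_pca_eigvecs(X, cutoff)
    print(f"cutoff={cutoff:4.2f}  largest |weight| in 1st eigenvector:",
          round(float(np.max(np.abs(V[:, 0]))), 3))
```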
Results

Cutoff = 0.06  Figure 4-15 shows the first ten eigenvectors calculated from the robust correlation matrix where ε = 0.06, and Figure 4-16 shows the corresponding histograms.

Cutoff = 0.10  Figure 4-17 shows the first ten eigenvectors obtained from the robust correlation matrix where correlation coefficients below 0.1 are set to 0, and Figure 4-18 shows the corresponding histograms.

Figure 4-15: First 10 Eigenvectors Calculated from Robust Correlation Matrix with Cutoff = 0.06

Figure 4-16: Histograms of the First 10 Eigenvectors Calculated from Robust Correlation Matrix with Cutoff = 0.06

Figure 4-17: First 10 Eigenvectors Calculated from Robust Correlation Matrix with Cutoff = 0.10

Figure 4-18: Histograms of the First 10 Eigenvectors Calculated from Robust Correlation Matrix with Cutoff = 0.10

Cutoff = 0.15  Figure 4-19 shows the first ten eigenvectors calculated from the robust correlation matrix where ε = 0.15, and Figure 4-20 shows the corresponding histograms.

Figure 4-19: First 10 Eigenvectors Calculated from Robust Correlation Matrix with Cutoff = 0.15

Figure 4-20: Histograms of the First 10 Eigenvectors Calculated from Robust Correlation Matrix with Cutoff = 0.15

Cutoff = 0.18  Figure 4-21 shows plots of the first 10 eigenvectors obtained from the robust correlation matrix with cutoff coefficient equal to 0.18, and Figure 4-22 shows the histograms of the first 10 eigenvectors.

Figure 4-21: First 10 Eigenvectors Calculated from Robust Correlation Matrix with Cutoff = 0.18

Figure 4-22: Histograms of the First 10 Eigenvectors Calculated from Robust Correlation Matrix with Cutoff = 0.18

Interpretation

The figures of eigenvectors obtained from correlation matrices with different cutoff values show that a more robust correlation matrix is more effective in grouping correlated variables.
As the cutoff value increases, the eigenvectors show that the variables that contribute to the associated principal components are weighted more, while the variables that do not contribute to the associated principal components are weighted less. In addition, the histograms of these eigenvectors show that the distributions of the eigenvectors become increasingly narrow as the cutoff correlation coefficient is increased. As was discussed before, a narrower distribution means a taller middle bar. The average height of the middle bar of the first 10 eigenvectors increases from 86.0 to 98.1, to 113.3, to 133.7, and to 137.8 as the cutoff level increases from 0 to 0.06, to 0.10, to 0.15, and to 0.18. A narrower distribution means that for each eigenvector, only a small number of variables contribute to the associated principal component, while most of the variables make minimal or no contribution to the associated principal component. This is consistent with the hypothesis that variables from different subprocesses should not all contribute to the same principal components.

Comparison

Figure 4-23 is a comparison of 5 eigenvectors calculated from 5 different correlation matrices obtained with different cutoff values. The plots show that the eigenvectors calculated using the more robust correlation matrices group the variables much better than the eigenvector calculated using the original correlation matrix. One can see that the signal-to-noise ratio of the values of the eigenvector increases as the cutoff correlation coefficient increases. The histograms of the eigenvectors in Figure 4-24 also indicate the improvement in the signal-to-noise ratio. The figure shows that the distributions of the eigenvectors get narrower as the cutoff coefficients are increased. The height of the middle bar increases from 78 to 270 as the cutoff coefficient increases from 0 to 0.18. This means that as the correlation matrix becomes more robust, accidentally correlated variables are weighted less, while the significant variables are weighted more.

Figure 4-23: A Comparison of Eigenvectors Calculated from (a) the Original Correlation Matrix, (b) Robust Correlation Matrix with Cutoff=0.06, (c) Robust Correlation Matrix with Cutoff=0.10, (d) Robust Correlation Matrix with Cutoff=0.15, (e) Robust Correlation Matrix with Cutoff=0.18

Figure 4-24: A Comparison of Histograms of the Eigenvectors

4.6.3 PCA with Frequency-Filtered Variables

Figure 4-10 in Section 4.5.3 shows that, with the exception of the transient outliers, the first few principal components of the 738 variables are mostly dominated by low-frequency behavior. In Section 4.5.2, attempts were made to isolate transient outliers by performing PCA on variables with the low-frequency components removed. In the previous section, it was shown that a robust correlation matrix can be effective in separating variables and reducing noise in PCA.
In this section, we hope that PCA with frequency-filtered variables can also remove the noise components in the original data set and more effectively group the variables.

Method

Attempts are made to perform principal component analysis after the original set of variables is frequency-filtered. The idea is that noise can be filtered out in certain frequency bands, so that PCA can show the same promising results with regards to grouping variables as PCA with the robust correlation matrix. Figure 4-25 shows samples of a high-pass filter and a band-pass filter used to remove noise in the original data set.

Figure 4-25: (a) High-Pass Filter with Wp = 0.3, (b) Band-Pass Filter with Wp = [0.2, 0.4]
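A minimal sketch of the filter-then-PCA procedure is given below. The thesis does not specify the filter design, so a Butterworth high-pass filter from SciPy, with a normalized cutoff standing in for the Wp pass-band edge, is assumed here; the synthetic drift-plus-oscillation data and all names are illustrative rather than taken from the web process 2 data set.

```python
import numpy as np
from scipy import signal

def highpass_then_pca(X, wp=0.3, filt_order=4):
    """High-pass filter every variable, then run PCA on the filtered data.

    wp is the normalized cutoff frequency (1.0 = Nyquist), standing in for
    the Wp pass-band edge quoted in the figures.  A Butterworth filter is
    assumed; the thesis does not state the filter design used.
    """
    b, a = signal.butter(filt_order, wp, btype="highpass")
    Xf = signal.filtfilt(b, a, X, axis=0)          # zero-phase filtering per variable
    Z = (Xf - Xf.mean(axis=0)) / Xf.std(axis=0)
    R = np.corrcoef(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)
    order_idx = np.argsort(eigvals)[::-1]
    return eigvals[order_idx], eigvecs[:, order_idx]

# Illustrative run: low-frequency drift shared by all variables, plus a small
# high-frequency group that should dominate the leading eigenvector after filtering.
rng = np.random.default_rng(4)
n = 1961
drift = np.cumsum(rng.normal(size=n))
X = rng.normal(size=(n, 40)) + drift[:, None]      # drift dominates every variable
X[:, :5] += np.sin(np.linspace(0, 800 * np.pi, n))[:, None]
eigvals, eigvecs = highpass_then_pca(X, wp=0.3)
top5 = np.argsort(np.abs(eigvecs[:, 0]))[::-1][:5]
print("Variables weighted most in the first eigenvector:", top5)
```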
Results

For variables that are high-pass filtered with Wp = 0.1, Figure 4-26 illustrates the first ten eigenvectors calculated from the variables' correlation matrix, and Figure 4-27 shows the histograms of the eigenvectors.

Figure 4-26: First 10 Eigenvectors Calculated from High-Pass Filtered (Wp=0.1) Variables

Figure 4-27: Histograms of First 10 Eigenvectors Calculated from High-Pass Filtered (Wp=0.1) Variables

Figure 4-28 shows the eigenvectors calculated from the correlation matrix of high-pass filtered variables with Wp = 0.3, and Figure 4-29 shows the histograms of the eigenvectors. Figure 4-30 shows the eigenvectors calculated from the correlation matrix of high-pass filtered variables with Wp = 0.4, and Figure 4-31 shows the histograms of the eigenvectors.

Figure 4-28: First 10 Eigenvectors Calculated from High-Pass Filtered (Wp=0.3) Variables

Figure 4-29: Histograms of First 10 Eigenvectors Calculated from High-Pass Filtered (Wp=0.3) Variables

Figure 4-30: First 10 Eigenvectors Calculated from High-Pass Filtered (Wp=0.4) Variables

Figure 4-31: Histograms of First 10 Eigenvectors Calculated from High-Pass Filtered (Wp=0.4) Variables

Figure 4-32 shows the eigenvectors calculated from the correlation matrix of band-pass filtered (Wp=[0.2, 0.4]) variables, and Figure 4-33 shows the histograms of the eigenvectors.

Figure 4-32: First 10 Eigenvectors Calculated from Band-Pass Filtered (Wp=[0.2, 0.4]) Variables

Figure 4-33: Histograms of First 10 Eigenvectors Calculated from Band-Pass Filtered (Wp=[0.2, 0.4]) Variables

Figure 4-34 shows the eigenvectors calculated from the correlation matrix of band-pass filtered (Wp=[0.2, 0.3]) variables, and Figure 4-35 shows the histograms of the eigenvectors.

Figure 4-34: First 10 Eigenvectors Calculated from Band-Pass Filtered (Wp=[0.2, 0.3]) Variables

Figure 4-35: Histograms of First 10 Eigenvectors Calculated from Band-Pass Filtered (Wp=[0.2, 0.3]) Variables

Interpretation

The figures of eigenvectors obtained from the correlation matrices of high-pass and band-pass filtered variables show that PCA with frequency-filtered variables is also effective in grouping correlated variables together. The histograms indicate that the distributions of these eigenvectors are much narrower than the distributions of the eigenvectors associated with the correlation matrix of the original data. The average heights of the middle bars of the first 10 eigenvectors associated with the filtered variables, obtained from filters with normalized pass-band frequencies Wp = 0.1, 0.3, 0.4, [0.2, 0.4], and [0.2, 0.3], are 155, 237, 230, 128, and 114 respectively.
Compared to an average middle bar of 86 for the distribution of the first 10 eigenvectors of the original data set, the distributions of the eigenvectors from frequency-filtered variables are much narrower. Consequently, it is reasonable to state that by removing the noise in the original data set, PCA with frequency-filtered variables improves the signal-to-noise ratio of the eigenvectors, where significant variables are weighted more and accidentally correlated variables are weighted less.

Chapter 5

Conclusion and Suggested Work

This thesis presented various methods for analyzing multivariate data sets from continuous web manufacturing processes. The analysis techniques described in Chapter 2 were applied to two data sets from two different web processes in Chapter 3 and Chapter 4. These analysis techniques, combined with an understanding of the physics of the manufacturing processes, can produce insights into information-rich data sets. Experimental results show that both normalized Euclidean distance and principal component analysis are effective in separating the outliers from the main population. Correlation analysis on Poisson-distributed defect densities shows the difficulties in determining the true correlation between variables when the signal-to-noise ratio of the underlying processes is unknown. Principal component analysis is a good way to determine the existence of linear relationships between sets of variables. Based on the hypothesis that variables from the same subprocess are more correlated than variables from different subprocesses, both principal component analysis with a robust correlation matrix and principal component analysis with frequency-filtered variables are effective in grouping variables. Hopefully, the results of my experiments can lead to more research in the area of multivariate analysis of manufacturing data in the future. Other multivariate methods can be explored to identify and to treat outliers. In addition, mathematical models can be built to determine the effects of Poisson noise on the calculation of correlation between processes. Furthermore, mathematical methods can be developed to quantify the effects of non-linear operations on the correlation matrices on the removal of noise and on the effectiveness of grouping variables. Combining a solid understanding of the underlying physics with a mastery of analysis techniques can lead to tremendous progress in the area of data analysis of manufacturing data.