
Data Mining
Manufacturing
Data
Dave E. Stevens
Eastman Chemical Company
Kingsport, TN
Presentation Outline
• Intro: Data Mining Manufacturing Data
• Data Preparation
• Principal Component Analysis
• Partial Least Squares
• PLS Discriminant Analysis
Manufacturing Data
Then and Now
• 40 Years Ago
- Few Measurements
- Temp, Press., Flows
• Today
- Many Measurements
- Very Often
- Creates Large Data Sets
• Purposes For Measuring
- Process “State”
- Relationships (X, X to Y)
- Classification
- Optimization
Concerns With Current
Manufacturing Data
• Dimensionality (large):
- >1000 process variables every few seconds
- >10 quality variables every few hours
- Data overload: the analyst concentrates on only a few variables and ignores most of the information!
• Collinearity: There are not 1000 independent things at work. Only a few underlying events affect all the variables, so the variables are all highly correlated.
• Noise:
• Missing Data:
Multivariate Data Concept
[Figure: BreakLoad Control Chart and Elongation Control Chart, with the question "Is This Process In Control?"]
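The point behind the two control charts can be sketched numerically. The example below is a minimal illustration on synthetic data (the numbers are invented, not taken from the slides): two correlated variables are charted separately with 3-sigma limits, and Hotelling's T² is used as one common multivariate distance. The choice of T² is an assumption; the slides do not name a specific statistic.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic, illustrative data (not from the slides): two correlated variables.
n = 200
break_load = rng.normal(50.0, 2.0, n)
elongation = 0.5 * break_load + rng.normal(0.0, 0.3, n)
X = np.column_stack([break_load, elongation])

mean = X.mean(axis=0)
std = X.std(axis=0, ddof=1)
cov_inv = np.linalg.inv(np.cov(X, rowvar=False))

def within_univariate_limits(x):
    """True if every variable is inside its own mean +/- 3 sigma limits."""
    return bool(np.all(np.abs(x - mean) <= 3.0 * std))

def hotelling_t2(x):
    """Squared multivariate distance from the mean, using the covariance."""
    d = x - mean
    return float(d @ cov_inv @ d)

# Both observations sit 2 sigma from the mean on each variable.
along = mean + np.array([2.0, 2.0]) * std     # follows the usual correlation
against = mean + np.array([2.0, -2.0]) * std  # breaks the usual correlation

for name, obs in [("along", along), ("against", against)]:
    print(name, within_univariate_limits(obs), round(hotelling_t2(obs), 1))
```

Both observations pass the individual charts, but the one that breaks the correlation pattern has a much larger T², which is the kind of upset that separate charts can miss.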
Data Preparation
• Data collected in a Process Data Historian will include both process up times and down times, as recorded from the instrumentation
• Need a software tool that makes it easy to clean the data and do initial Exploratory Data Analyses
• JMP Software
– Interactive Graphing
– Removal of Outliers
• Graphically or Variable Selection Criteria
– Join and/or Subset Data Tables
– Statistical Analyses
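The slides do this cleaning interactively in JMP. As a rough stand-in, the sketch below shows the same two steps in plain Python: dropping rows recorded while the process was down, then removing outliers by a selection rule. The column names, the running-flow threshold, and the median/MAD rule are all assumptions made for illustration, and the data values are invented.

```python
from statistics import median

# Hypothetical historian export: one dict per time stamp (values invented).
rows = [
    {"time": 0, "feed_flow": 101.0, "temp": 250.2},
    {"time": 1, "feed_flow": 99.5, "temp": 249.8},
    {"time": 2, "feed_flow": 0.0, "temp": 35.0},     # process down
    {"time": 3, "feed_flow": 0.0, "temp": 34.0},     # process down
    {"time": 4, "feed_flow": 100.2, "temp": 250.5},
    {"time": 5, "feed_flow": 100.8, "temp": 912.0},  # instrument spike
    {"time": 6, "feed_flow": 99.9, "temp": 250.1},
    {"time": 7, "feed_flow": 100.4, "temp": 249.6},
]

MIN_RUNNING_FLOW = 10.0  # assumed threshold that marks the process as "up"

def is_running(row):
    """Treat the process as up when the feed flow exceeds the threshold."""
    return row["feed_flow"] >= MIN_RUNNING_FLOW

def drop_outliers(rows, key, limit=5.0):
    """Keep rows whose value lies within `limit` MADs of the median."""
    values = [r[key] for r in rows]
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return list(rows)
    return [r for r in rows if abs(r[key] - med) / mad <= limit]

running = [r for r in rows if is_running(r)]
clean = drop_outliers(running, "temp")
print([r["time"] for r in clean])
```

Removing the down time first matters: otherwise the shutdown readings would distort the median and spread used to judge outliers.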
Principal
Components
Analysis
Understanding Relationships
Between Process Variables
Principal Component Analysis
• Principal Component Analysis is a projection technique
• Raw data are first “centered” and “scaled”
• Each Principal Component represents a direction through the data that captures the maximum amount of raw data variation not already captured by earlier components
• For each Principal Component (a), a new data value is generated for each obs. (i) as a linear combination of the raw X variables (k = 1, ..., K):
t_{i,a} = b_{a,1} X_{i,1} + b_{a,2} X_{i,2} + ... + b_{a,K} X_{i,K} for each obs. i
where the b’s are loadings (each between -1 and 1)
• Each successive Principal Component represents less and less of the raw data variation
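The steps in this list (center, scale, project, compute scores as linear combinations) can be sketched with NumPy. This is a minimal illustration using the singular value decomposition, which is one standard way to obtain the components. The function name `pca` and the synthetic data are assumptions, not part of the slides.

```python
import numpy as np

def pca(X, n_components):
    """Return scores t[i, a], loadings b[a, k] and explained variance ratios."""
    X = np.asarray(X, dtype=float)
    # "Center" and "scale": zero mean and unit variance for every variable.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Rows of Vt are unit-length directions, ordered by captured variation.
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    loadings = Vt[:n_components]
    # Score = linear combination of the scaled variables for each observation.
    scores = Z @ loadings.T
    explained = (s ** 2 / np.sum(s ** 2))[:n_components]
    return scores, loadings, explained

# Synthetic example: three variables driven by one underlying event plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(60, 1))
X = latent @ np.array([[1.0, -0.8, 0.5]]) + 0.2 * rng.normal(size=(60, 3))

scores, loadings, explained = pca(X, n_components=2)
print(np.round(loadings, 2))
print(np.round(explained, 3))
```

Because each loading vector has unit length, every individual loading falls between -1 and 1, and the explained variance ratios decrease from one component to the next.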
Principal Component Analysis
Fundamentals
[Figure: observations in X1, X2, X3 space with their projections onto the 1st PC and 2nd PC]
PCA: Scores
[Figure: obs. i in x1, x2, x3 space projected onto the 1st PC and 2nd PC lines, giving the scores t_{i,1} and t_{i,2}]
The scores t_{i,a} (observation i, dimension a) are the places along the component lines where the observations are projected.
PCA: Loadings
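[Figure: the 1st PC line in x1, x2, x3 space, with the angles a1, a2, a3 between the component line and the coordinate axes; Cos(a) = X/PC]
The loadings p_{a,k} (dimension a, variable k) indicate the importance of variable k to the given dimension. p_{a,k} is the direction cosine (the cosine of the angle a_k in the figure) of the given component line vs. the x_k coordinate axis.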
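The direction-cosine reading of the loadings can be checked directly. In the sketch below (synthetic data; all variable names are illustrative assumptions), the cosine of the angle between the first component line and each coordinate axis is computed from the dot product and compared with the loading itself.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic correlated data in three variables, then centered and scaled.
mix = np.array([[1.0, 0.8, 0.3],
                [0.0, 0.6, 0.2],
                [0.0, 0.0, 0.5]])
X = rng.normal(size=(100, 3)) @ mix
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# First principal component direction (unit length).
_, _, Vt = np.linalg.svd(Z, full_matrices=False)
p1 = Vt[0]

# Cosine of the angle between the component line and each coordinate axis.
axes = np.eye(3)
cosines = np.array([
    p1 @ axes[k] / (np.linalg.norm(p1) * np.linalg.norm(axes[k]))
    for k in range(3)
])
angles_deg = np.degrees(np.arccos(np.clip(cosines, -1.0, 1.0)))

print(np.round(p1, 3))
print(np.round(cosines, 3))
print(np.round(angles_deg, 1))
```

The cosines equal the loadings, and their squares sum to 1, so a loading near +1 or -1 means the component line runs almost parallel to that variable's axis.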
PCA Example
• 10 process responses obtained on each observation
• Data represented weekly process response averages
• Data spanned 10 months
• Objective: Determine if the system was stable.
PCA Score Plot
[Figure: score plot of PC #1 vs. PC #2, with a Process Shift marked at June 30 (5_30)]
PCA Loadings Plot
[Figure: loadings plot of Loadings PC #1 vs. Loadings PC #2 for variables X1 through X10, shown next to the PC #1 vs. PC #2 score plot with the Process Shift marked at June 30 (5_30)]
Relative to the process shift, X1 and X5 were high in value and X4 and X8 were low in value. The positively correlated variables were X1 with X5 and X4 with X8. The negatively correlated variables were X1 and X5 versus X4 and X8.
Process variable X1 increased in value when the system shifted from the left side to the right side of the PCA score plot.
Variables X1 and X5 were positively correlated.
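How a shift shows up in both plots can be reproduced on simulated data. The sketch below is purely illustrative: the 43 weekly observations, the shift week, and the shift size are invented assumptions, not the data behind the slides. X1 and X5 are raised and X4 and X8 are lowered after the shift, and the first component is then inspected.

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulated weekly averages for 10 variables (X1..X10 are columns 0..9).
n_weeks, n_vars, shift_week = 43, 10, 26
X = rng.normal(size=(n_weeks, n_vars))
after = np.arange(n_weeks) >= shift_week
for k in (0, 4):          # X1 and X5 go up after the shift
    X[after, k] += 5.0
for k in (3, 7):          # X4 and X8 go down after the shift
    X[after, k] -= 5.0

Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
_, _, Vt = np.linalg.svd(Z, full_matrices=False)

p1 = Vt[0]
if p1[0] < 0:             # the sign of a component is arbitrary; fix it on X1
    p1 = -p1
t1 = Z @ p1               # PC #1 scores, one per week

print("PC #1 loadings:", np.round(p1, 2))
print("mean score before shift:", round(float(t1[~after].mean()), 2))
print("mean score after shift: ", round(float(t1[after].mean()), 2))
```

Variables with loadings of the same sign on PC #1 move together, variables with opposite signs move against each other, and the weeks after the shift land on the side of the score plot where X1 and X5 have their loadings.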
Partial
Least Squares
Technique
Understanding Relationships
Between Process & Response Variables
Partial Least Squares
Fundamentals
[Figure: X Space (axes X1, X2, X3) and Y Space (axes Y1, Y2, Y3), each showing projections of the observations onto planes]
TA Filter Example
• Objective: Relate Filtrate, TA Catalyst and
Dryer Temp to Filter Speed, Vacuum, Wash
Acid, Weir Level, Nash Discharge Pressure
and Feed Tank Temperature
– Keep Filtrate High, TA Catalyst Low
• Data: 12-hour averages from PI collected over a four-month period
TA Filter
TA Filter Relationships
[Figure: PLS weight plot relating the process variables to the responses]
Higher filter speed and vacuum pressure increased the filtrate flow and catalyst content but lowered the dryer temp. Higher weir level, Nash discharge pressure and Op tank temp increased filtrate flow. Wash acid flow had no driving effect on the responses.
PLS Results
• Obtain Weight Plots (Previous Slide)
– Shows the inter-relationships between
the Xs and Ys
• Obtain Regression Coefficients
– Can be used to generate response surface
plot
• Display Variables Important to Prediction
(VIP)
• Display Residual Plots and Distance to the
Model Plot
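The outputs listed above (weights, regression coefficients, VIP) can be sketched with a small NIPALS-style PLS in NumPy. This is an illustration under stated assumptions, not the SIMCA-P implementation: the function names are hypothetical, the data are synthetic, and VIP is computed with the commonly used variable-importance formula. The sixth X variable is given no effect on the responses on purpose.

```python
import numpy as np

def autoscale(A):
    """Center each column and scale it to unit variance."""
    return (A - A.mean(axis=0)) / A.std(axis=0, ddof=1)

def pls_nipals(X, Y, n_components, tol=1e-10, max_iter=500):
    """NIPALS PLS on already autoscaled X (n x K) and Y (n x M)."""
    X, Y = X.copy(), Y.copy()
    n, K = X.shape
    M = Y.shape[1]
    W = np.zeros((K, n_components))   # X weights (unit length per component)
    P = np.zeros((K, n_components))   # X loadings
    Q = np.zeros((M, n_components))   # Y weights
    T = np.zeros((n, n_components))   # X scores
    for a in range(n_components):
        u = Y[:, 0].copy()
        t_old = np.zeros(n)
        for _ in range(max_iter):
            w = X.T @ u
            w /= np.linalg.norm(w)
            t = X @ w
            q = Y.T @ t / (t @ t)
            u = Y @ q / (q @ q)
            if np.linalg.norm(t - t_old) <= tol * np.linalg.norm(t):
                break
            t_old = t
        p = X.T @ t / (t @ t)
        X -= np.outer(t, p)           # deflate X and Y before the next component
        Y -= np.outer(t, q)
        W[:, a], P[:, a], Q[:, a], T[:, a] = w, p, q, t
    B = W @ np.linalg.solve(P.T @ W, Q.T)   # regression coefficients (K x M)
    return W, Q, T, B

def vip(W, Q, T):
    """Variable importance: weights squared, weighted by explained Y variation."""
    K = W.shape[0]
    ssy = np.sum(T ** 2, axis=0) * np.sum(Q ** 2, axis=0)
    return np.sqrt(K * (W ** 2 @ ssy) / ssy.sum())

# Synthetic data: 6 process variables, 3 responses, last X has no effect.
rng = np.random.default_rng(7)
C = np.array([[1.0, 0.8, -0.6], [0.9, 0.7, 0.0], [0.7, 0.0, 0.5],
              [0.6, -0.5, 0.0], [0.5, 0.0, -0.7], [0.0, 0.0, 0.0]])
X_raw = rng.normal(size=(200, 6))
Y_raw = X_raw @ C + 0.1 * rng.normal(size=(200, 3))

Xs, Ys = autoscale(X_raw), autoscale(Y_raw)
W, Q, T, B = pls_nipals(Xs, Ys, n_components=3)
r2 = 1.0 - np.sum((Ys - Xs @ B) ** 2) / np.sum(Ys ** 2)
vip_scores = vip(W, Q, T)
print(np.round(vip_scores, 2), round(float(r2), 3))
```

The squared VIP values always sum to the number of X variables, so a value well below 1 (as for the no-effect variable here) marks a variable that contributes little to the model. The coefficient matrix `B` applies to the autoscaled data and could be used to draw a response surface.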
Correlation
Does Not
Always Mean
Causation
PLS Discriminant
Technique
Determine What Drives Data Groups
To Be Different
Objective
• Given groups of data from a particular
process, determine what makes the groups
different with respect to the given
measurements.
• Example: TA %T
– Measurements: 4-HMB, TMA, TPAD, 4-HBA, 4-CBA, IPA, BA, PTAD, p-TA, 2,7-DCF, 2,6-DCF, 4-4-DCB, 3,5-DCF, 9-F-2-CA, 9-F-4-CA, 2,6-DCA, 4,4-DCS, L*, a*, b*, .1%, .9%, Mean, %T
– Daily Numbers
– Data taken from Convey Line #1 and #2
TA %T
[Figure: %T series plot with the Convey Line #2 data marked. Annotation: Removed Above Data From PLS Disc. Analysis]
PLS Discriminant Analysis
[Figure: PLS discriminant analysis plot showing the High %T and Low %T groups]
What Measurements
Separated the Groups?
The high %T group ($DA1) was high in %T, .1%, Mean and L*. The low %T group ($DA2) had several measurements that were high in value and positively correlated (see next slide for details).
The low %T group ($DA2) had several variables that were correlated and high in value: 4,4’-DCS, 4-CBA, TMA and p-TA.
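The idea of finding what separates two groups can be sketched as a one-component PLS on a dummy class variable. The example below is a simplification under stated assumptions: it uses a single 0/1 column instead of the two dummy columns ($DA1, $DA2) shown in the slides, five generic measurements instead of the real ones, and synthetic data with invented group differences.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic groups: 30 observations each, 5 generic measurements.
n_per_group, n_vars = 30, 5
high = rng.normal(size=(n_per_group, n_vars))
low = rng.normal(size=(n_per_group, n_vars))
high[:, 0] += 3.0      # measurement 0 runs high in the "high" group
low[:, 1:3] += 3.0     # measurements 1 and 2 run high in the "low" group

X = np.vstack([high, low])
y = np.r_[np.ones(n_per_group), np.zeros(n_per_group)]  # 1 = high, 0 = low

# Center and scale X; center the dummy response.
Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
yc = y - y.mean()

# First PLS component for a single response: weights proportional to X'y.
w = Xs.T @ yc
w /= np.linalg.norm(w)
t = Xs @ w             # discriminant scores, one per observation

is_high = y == 1
print("weights:", np.round(w, 2))
print("mean score, high group:", round(float(t[is_high].mean()), 2))
print("mean score, low group: ", round(float(t[~is_high].mean()), 2))
```

Measurements with large positive weights run high in the group coded 1, large negative weights point to measurements that run high in the other group, and weights near zero mark measurements that do not help separate the groups.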
Computer Software
• JMP Software
– http://www.jmpdiscovery.com
• SIMCA-P from Umetrics
– http://www.umetrics.com