Multistep Virtual Metrology Approaches for
Semiconductor Manufacturing Processes
Presenter: Simone Pampuri (University of Pavia, Italy)
Authors:
Simone Pampuri, University of Pavia, Italy
Andrea Schirru, University of Pavia, Italy
Gian Antonio Susto, University of Padova, Italy
Cristina De Luca, Infineon Technologies AT, Austria
Alessandro Beghi, University of Padova, Italy
Giuseppe De Nicolao, University of Pavia, Italy
Introduction
 Collaboration between University of Pavia (Italy), University of
Padova (Italy) and Infineon Technologies AT (Austria)
 Activity funded by the European project EU IMPROVE: Implementing Manufacturing science solutions to
increase equiPment pROductiVity and fab pErformance
 Duration: 42 months (since Jan 2009)
 Total funding: 37.7 M€
 32 partners, including
• Semiconductor fabs
• Academic institutions
• Research centers
• Software houses
 Thematic Work Packages
Contents
1 Motivations
2 Machine Learning
3 Multilevel framework
4 Multistep VM
5 Results and Conclusions
What is Virtual Metrology?
 In semiconductor manufacturing, measurement
operations are costly and time-consuming
 Only a small part of the production is actually measured
 Virtual metrology exploits sensor and logistic information to predict the process outcome
[Diagram: Sensor Data, Recipe Data and Logistic Data feed the VM block; the resulting Predictive Information is provided to Controllers, Sampling tools and Decision tasks]
Machine learning (in a nutshell)
 Machine learning algorithms create models from observed data (the training dataset), using little or no prior information about the physical system
[Diagram: Training dataset (Input X, Output Y) → Learning Algorithm → Model f(X)]
 The model is then able to predict patterns similar to the observed ones
[Diagram: Input (Xnew) → Model → Prediction (Ynew)]
 Most famous algorithm: Ordinary Least Squares (OLS), which consists in solving the optimization problem defined by the squared-error loss function
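For reference, the OLS problem can be written as below. The notation is an assumption and not necessarily the one used on the original slide: X is the n × p input matrix, Y the vector of n observed outputs and β the vector of p coefficients. The closed-form expression on the right holds when XᵀX is invertible.

```latex
\hat{\beta}_{\mathrm{OLS}}
  = \arg\min_{\beta} \; \lVert Y - X\beta \rVert_2^2
\qquad\Longrightarrow\qquad
\hat{\beta}_{\mathrm{OLS}} = (X^\top X)^{-1} X^\top Y
```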
The curse of dimensionality
 Problem: the so-called “curse of dimensionality”
The number of selected predictors grows almost linearly
with the number of candidate predictors
 Consequence: the predictive power of machine learning
models reduces as the number of candidate predictors
increases
 In semiconductor manufacturing,
it is common to have hundreds
of candidate predictors: how to
tackle the problem?
 Regularization (or Penalization) methods
1943, Ridge (or Tikhonov) regression: in order to improve the least squares method, stable (“easier”) solutions are encouraged by penalizing coefficients through the parameter a
• Best value for the hyperparameter is chosen via validation
• Computationally easy (closed form solution)
• No sparse solution
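The closed-form solution mentioned above can be illustrated with a minimal pure-Python sketch. The function names, the name `alpha` for the penalty hyperparameter and the toy data are assumptions made for illustration; the sketch solves (XᵀX + alpha·I)β = Xᵀy by Gaussian elimination and is not meant as the authors' implementation.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ridge_fit(X, y, alpha):
    """Closed-form ridge: solve (X^T X + alpha * I) beta = X^T y."""
    p = len(X[0])
    gram = [[sum(row[i] * row[j] for row in X) + (alpha if i == j else 0.0)
             for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve_linear_system(gram, xty)


# Toy data (made up): y = 2*x1 - 1*x2 exactly.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
y = [2.0, -1.0, 1.0, 3.0]
beta_ols = ridge_fit(X, y, alpha=0.0)     # alpha = 0 reduces to OLS
beta_ridge = ridge_fit(X, y, alpha=10.0)  # a larger alpha shrinks the coefficients
print(beta_ols, beta_ridge)
```

With `alpha=0.0` the toy coefficients are recovered; with a positive `alpha` the coefficients shrink toward zero, but none of them becomes exactly zero, which matches the "no sparse solution" remark.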
1996 – today, L1-penalized methods: by constraining the solution to belong to a hyper-octahedron, sparse models can be obtained (variable selection). Most famous example: LASSO
• Best value for the hyperparameter is chosen via validation
• Sparse solution (variable selection)
• Solved by iterative algorithms (e.g. SMO)
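To show how an iterative solver produces exact zeros, here is a minimal sketch. Note that it does not use SMO, the example named above: it uses cyclic coordinate descent with soft-thresholding, a different and widely used iterative scheme for the LASSO, chosen only because it is short. The objective 0.5·‖y − Xβ‖² + alpha·‖β‖₁, the function names and the toy data are assumptions.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; exactly zero inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_sweeps=200):
    """Minimize 0.5 * ||y - X b||^2 + alpha * ||b||_1, one coefficient at a time."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            # Correlation of column j with the residual that ignores coefficient j.
            rho = 0.0
            for i in range(n):
                partial = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - partial)
            z = sum(X[i][j] ** 2 for i in range(n))
            beta[j] = soft_threshold(rho, alpha) / z if z > 0 else 0.0
    return beta


# Toy data (made up): the output depends only on the first predictor.
X = [[1.0, 0.5], [2.0, -0.5], [3.0, 0.5], [4.0, -0.5]]
y = [2.0, 4.0, 6.0, 8.0]
beta = lasso_coordinate_descent(X, y, alpha=1.0)
print(beta)
```

On this toy example the second coefficient is set exactly to zero, which is the variable-selection effect of the L1 penalty, while the first coefficient is slightly shrunk.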
The hierarchical variability
 We deal every day with multiple levels of variability:
 Each piece of equipment has several chambers
 In some cases, these chambers are split into sub-chambers
 Different process groups and recipes run on the same equipment
 Simple (“naive”) solution: create one
model for every possible combination
of factors
 We’ll never have enough data to do that, especially for low-volume recipes
 Better solution: handle those different
levels of variability inside the model
Multilevel Techniques: Multilevel Ridge Regression (RR) & Multilevel Lasso
The Multilevel Transform
 First step is to create an extended input matrix to reflect the
relationships between the j clusters. For instance, in the
case of j mutually exclusive nodes,
 The input matrix reflects the dependency on logistic paths
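One plausible form of such an extended matrix is sketched below. It is assumed for illustration and not taken from the slide: each row keeps a block shared by all clusters and appends one block per cluster that is non-zero only for the cluster the row belongs to. The function name, the cluster encoding (integers starting at 0) and the example data are hypothetical.

```python
def multilevel_transform(X, cluster_ids, n_clusters):
    """Return rows of the form [x | 0 ... x ... 0]: shared block + cluster blocks."""
    p = len(X[0])
    extended = []
    for x, c in zip(X, cluster_ids):
        row = list(x)  # shared block, seen by every cluster
        for k in range(n_clusters):
            # Cluster-specific block: a copy of x for the row's own cluster, zeros otherwise.
            row.extend(x if k == c else [0.0] * p)
        extended.append(row)
    return extended


# Hypothetical example: 3 wafers, 2 process variables, 2 chambers (clusters 0 and 1).
X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
chambers = [0, 1, 0]
X_ext = multilevel_transform(X, chambers, n_clusters=2)
for row in X_ext:
    print(row)
```

Under this construction, a penalized regression fitted on `X_ext` learns one set of coefficients common to all clusters plus per-cluster deviations, so low-volume clusters can borrow information from the shared block.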
Standard scenario
 Production flow: a sequence of steps; each step represents an operation that must be performed on a wafer in order to obtain a specific result
 Each step is performed by a different piece of equipment (composed of multiple chambers):
 Which wafer is processed by which equipment is known (logistic information)
 Information about the processed wafer (e.g. sensor readings and recipe setup) might be available
 On some equipment a “single step” VM system is already in place (estimated measurements for each processed wafer are available)
Cascade Multistep VM
 This approach allows building a pipeline in which the predictive information is propagated forward to contribute to further model estimation.
 The generation of the multilevel input matrix consists in replacing the j-th cluster’s process variables with the corresponding VM estimation (VM-j)
Pros:
o Small overhead appended to the input space
o Computational effort very similar to the “single step” VM case
Cons:
o Steps without “single step” VM must be excluded
o There might be some information loss between two or more steps
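The cascade idea can be sketched as follows: the estimate produced by an upstream single-step VM is appended as one extra input column for the downstream model, in place of the upstream process variables. All names and numbers below are hypothetical, and the upstream VM is assumed to be an already-trained linear model.

```python
def linear_predict(coefficients, row):
    """Prediction of a linear model: dot product of coefficients and inputs."""
    return sum(c * v for c, v in zip(coefficients, row))


def cascade_inputs(target_step_rows, upstream_vm_predictions):
    """Append each wafer's upstream VM estimate to its target-step variables."""
    return [list(row) + [vm]
            for row, vm in zip(target_step_rows, upstream_vm_predictions)]


# Hypothetical numbers, one row per wafer.
upstream_coeffs = [0.5, -1.0]              # assumed, already-trained upstream VM
upstream_rows = [[2.0, 1.0], [4.0, 0.0]]   # upstream step process variables
target_rows = [[10.0, 0.1], [12.0, 0.2]]   # target step process variables

vm_estimates = [linear_predict(upstream_coeffs, r) for r in upstream_rows]
cascade_X = cascade_inputs(target_rows, vm_estimates)
print(cascade_X)
```

Only one column per upstream step is added, which is why the overhead on the input space stays small; anything the upstream VM does not capture is no longer visible to the downstream model.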
Process and Logistic Multistep VM
 With this approach, all the relevant logistic, process and
recipe information from all the considered steps is included in
the input set
 In this case, the generation of input matrix fully follows the
previous Multilevel Transform
Pros:
o Steps with no (or meaningless) measurements can be included
o All the available information is provided to the learning algorithm
Cons:
o Input space dimension is significantly increased by this approach
o More observations are needed to train the learning algorithm
Scenario
 Production flow used to validate the methodologies:
1. Chemical Vapor Deposition (CVD)
2. Thermal Oxidation
3. Coating
4. Lithography
 Target: post-litho CDs
 Dataset: 583 wafers (anonymized)
 Hyper-parameter tuning: 10-fold cross-validation
 Multistep VM setups:
 CVD-Litho Cascade
 CVD-Litho Process and Full Logistic
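The hyper-parameter tuning step can be sketched as a k-fold cross-validation loop that returns an average validation RMSE for one hyperparameter setting. This is a minimal illustration under assumptions: the helper names and the `fit`/`predict` callables are hypothetical, and the folds are contiguous, whereas in practice the samples would usually be shuffled first.

```python
import math


def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def rmse(y_true, y_pred):
    """Root mean squared error between two equally long sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def cross_validated_rmse(X, y, fit, predict, k=10):
    """Average validation RMSE over k folds for one hyperparameter setting."""
    scores = []
    for fold in k_fold_indices(len(y), k):
        held_out = set(fold)
        train = [i for i in range(len(y)) if i not in held_out]
        model = fit([X[i] for i in train], [y[i] for i in train])
        preds = [predict(model, X[i]) for i in fold]
        scores.append(rmse([y[i] for i in fold], preds))
    return sum(scores) / len(scores)


folds = k_fold_indices(583, 10)  # 583 wafers, as in the scenario above
print([len(f) for f in folds])
```

To tune a hyperparameter, `cross_validated_rmse` would be called once per candidate value (with `fit` closing over that value) and the candidate with the lowest average RMSE retained.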
Cascade
The cascade VM further improves the VM performance when using RR. This result might be related to the additional hidden knowledge provided by the intermediate CVD metrology prediction.
With the LASSO, the cascade approach performs worse. It should be noted that this is the only case in which the extended input space does not improve the predictive performance.
Process and Full Logistic
Validation RMSE results for Ridge Regression: the full step choice clearly improves the predictive performance.
LASSO is consistently outperformed by Ridge Regression on the dataset used for the experiment; nevertheless, the extended input space also proves fruitful in this case, compared with the Lithography-based approach.
Best Lasso and Best RR
The best overall results for Ridge
Regression are obtained with the
cascade approach and by considering all
the process steps.
For the LASSO, the best overall results are
obtained by considering the extended
process values for all the involved steps.
Conclusions
 Research and design of Multistep VM strategies targeted
to specific semiconductor manufacturing needs
 Main features:
 Enhancing the precision and accuracy of regular VM systems
 Taking into account process steps without measurements
 Tests showed promising results; however, the strategy to be implemented must be carefully designed:
 Sample size and relevance of the steps are fundamental criteria to obtain the best performance
Thanks for your attention!