Uploaded by neilgorman15

Neural Modeling for Si-MOSFET Manufacturing Optimization

advertisement
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI
10.1109/ACCESS.2020.3019933, IEEE Access
Date of current version August 25, 2020
Digital Object Identifier
Neural Approach for Modeling and
Optimizing Si-MOSFET Manufacturing
HYUN-CHUL CHOI1 , HYEOK YUN2 , JUN-SIK YOON2 , (Member, IEEE), and ROCK-HYUN
BAEK2 , (Member, IEEE)
1
Department of Electronic Engineering, Yeungnam University, Gyeongsan, 38541 Republic of Korea (e-mail: pogary@ynu.ac.kr)
Department of Electrical Engineering, Pohang University of Science and Technology, Pohang, 37673 Republic of Korea (e-mail: {myska315, junsikyoon,
rh.baek}@postech.ac.kr)
2
Corresponding author: Rock-Hyun Beak (e-mail: rh.baek@postech.ac.kr).
This work was supported in part by the Ministry of Trade, Industry & Energy under Grant 10080617, in part by the Basic Science Research
Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT & Future Planning
(NRF-2018R1C1B6004056, NRF-2020R1A4A4079777), in part by the Korea Semiconductor Research Consortium Support Program for
the Development of the Future Semiconductor Device, in part by the Semiconductor Industry Collaborative Project between POSTECH
and SK Hynix Inc., and in part by IC Design.
ABSTRACT An optimal design of semiconductor device and its process uniformity are critical factors
affecting desired figure-of-merits as well as reducing fabrication cost of fixing possible malfunctioning in
semiconductor manufacturing. Two main tasks in optimal device design for semiconductor manufacturing,
i.e., parameter optimization and modeling, have been typically used either to characterize the devices
by understanding how each parameter affects the device performances or to calibrate the parameters for
SPICE circuit simulation. However, there still remains limitations in describing the relationship between all
manufacturing parameters and figure-of-merits using several simple equations human experts can utilize.
Even with the best model currently available, the optimal design of semiconductor device heavily relies on
experiences of human experts and deals with time-consuming ad-hoc trials and non-holistic approaches. In
this paper, we propose a new approach for data-based accurate electrical modeling of transistor, which is
the most fundamental unit device of semiconductor, and fast optimization of its manufacturing parameters.
Instead of the previous analytic approaches, finding finite equations derived from semiconductor physics,
we utilize machine learning technique and neural networks to find appropriate modeling functions from
data pairs of parameters and figure-of-merits. And for given desired figure-of-merits, we find optimal
manufacturing parameters in holistic manner by using the learned functions of neural networks and
fast gradient-based optimization method. Experimental results show that our neural-network-based-model
directly estimate figure-of-merits with competitive accuracy and that our holistic optimization technique
accurately and rapidly adapts the manufacturing parameters to meet desired figure-of-merits.
INDEX TERMS Transistor, Semiconductor manufacturing, Holistic parameter optimization, Machine
learning, Neural networks, Gradient descent method
I. INTRODUCTION
RANSISTOR process is the most basic and core technology in semiconductor industry. Thus, developing a
transistor that is high-performing and low-power-consuming
is of a great importance. Typically finding optimal manufacturing parameters for high performing and low power
consuming transistor requires a repeated wafers fabrication
with hundreds of unit process steps to measure its electrical
performances and a feedback loop to unit process of manufacturing. This procedure dramatically increases research and
development cost because it can take up to several weeks.
T
The duration of the procedure has been extended with the
shrinkage of technology node.
To reduce the time consuming trials, multiple equationbased models have been suggested in analytic or numerical
forms either to physically understand device characteristics
[1] or to model I-V and C-V through tens of fitting parameters for SPICE circuit simulation [2], [3]. However, these
modeling equations do not explain the relationship between
manufacturing parameters and figure-of-merits at a time.
There is no simple equation to delineate the relationship and
ultimately to meet given desired figure-of-merits by under1
VOLUME 4, 2016
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI
10.1109/ACCESS.2020.3019933, IEEE Access
Choi et al.: Neural Approach for Modeling and Optimizing Si-MOSFET Manufacturing
standing their correlations. In addition, even with those equations, manufacturing parameter optimization heavily relies
on experiences of human experts and has been dealt with adhoc trials and non-holistic approaches. For instance, human
experts design a transistor to optimize DC performance firstly
by maximizing on/off current ratio from I-V characteristics
and AC performance secondly by adjusting capacitance from
C-V characteristics because of the antagonistic effects between I-V and C-V characteristics on DC/AC performances
of transistor when its structure changes.
In this paper, we aim to overcome time-consuming and
non-holistic optimization of transistor manufacturing parameters. For this end, we propose a new approach for both databased precise transistor modeling and fast optimization of
transistor manufacturing parameters. Instead of the previous
analytic approaches in finding finite equations derived from
semiconductor physics, we utilize artificial intelligence (AI),
namely neural networks [4], [5], and machine learning technique, to automatically find appropriate modeling functions
from a number of data pairs of input manufacturing parameters and output figure-of-merits. Subsequently, we find
optimal manufacturing parameters for given desired figureof-merits in automatic and holistic manner via gradient-based
optimization technique and the learned modeling functions.
AI has been previously applied for monitoring the excursion, i.e., out of boundary figure-of-merits in unit process of
transistor manufacturing [6], but there is no AI-based study
on analyzing the relationship between overall manufacturing
procedure and figure-of-merits of transistor. Our holistic
approach simultaneously optimizes the entire manufacturing
procedure in the whole parameter space while conventional
sequential optimization for each unit process results in a
sub-optimal transistor due to its limited parameter-searching
ability. The contributions of this work are as follows:
1) Physics-free modeling and high compatibility: In semiconductor field, different models with regards to material, scale, structure, and usage of transistor are conventionally required. However, our AI model replaces multiple models with one artificial neural network model,
which is independent of those physical characteristics
and able to be applied to various transistor only with
their input/output data pairs. In the aspect of variability,
our method demonstrates the correlation between manufacturing parameters and device figure-of-merits from
a number of data even for nanoscale devices suffering
from physical ambiguity.
2) Fast modeling and optimization: It takes several hours
for excellent human expert to compactly model a given
transistor. In contrast, our AI method takes only several
tens of seconds to complete the same task. Similarly,
our AI method takes only a couple of minutes to find
optimal manufacturing parameters for given device
figure-of-merits.
3) Fast prediction without accuracy penalty: To speed up
TCAD simulation in transistor design, less physical parameters and more empirical parameters are necessary.
This results in low accuracy of simulation result. However, our AI method uses a compact neural network
for fast prediction and guarantees high accuracy as the
number of data increases.
4) Holistic and flexible optimization: Our method finds
manufacturing parameters to optimize AC performance of transistor by simultaneously considering IV and C-V parameters. In addition, our method can
also selectively optimize the performance suitable for
application of transistor, which is useful for flexible
design of transistor for System-On-Chip application.
In the remained of this paper, we will describe the details
of our AI-based method in sec.II, experimentally prove the
superiority of our method in sec.III, and conclude this work
in sec.IV.
II. METHOD
A. TRANSISTOR STRUCTURE AND PARAMETERS
Since the purpose of this work is not to find a new low-scaled
devices but to optimize transistor and to analyze its variation,
we designed 32-nm node high-K metal gate transistor, a 2-D
planar MOSFET which is easily dealt with in data acquisition
compared to 3-D FinFET in the current 7 nm technology
node, and simulated electrical characteristics using Sentaurus
TCAD simulator [7]. The transistor structure was designed
using structure editor with parameters based on Intel 32nm node [8]. Values and ranges of structure parameters are
shown in fig.1 and summarized in table 1. All doping profiles
were assumed as the Gaussian distribution. The junction
gradient lengths for source/drain (Lsdj ) and halo implants
(Lhaloj ) indicate the lengths where the source/drain (Nsd )
and halo (Nhalo ) implants are reduced by 10 times, respectively.
TABLE 1: Structure input parameters of high-K metal gate
transistor based on Intel’s 32-nm node [8]
Input parameter
Symbol
Unit
Reference Max
Gate length
Lg
nm
30
34.5
Spacer length
Lsp(s/d)
nm
20
21
Contact length
Lcon(s/d)
nm
40
42
Oxide thickness
Tox
nm
0.7
0.805
High-K thickness
Thk
nm
2
2.3
Doping
Nch
1018 cm−3
1
1.25
concentration
Nhalo(s/d) 1018 cm−3
5
6.25
Nsd(s/d) 1020 cm−3
1
1.25
Junction gradient Lhaloj(s/d) nm/dec
10
10.5
Lsdj(s/d)
nm/dec
25
26.25
Gate stack height
Tg
nm
50
52.5
S/D epi height
Ts/d(s/d)
nm
10
10.5
Min
25.5
19
38
0.595
1.7
0.75
3.75
0.75
9.5
23.75
47.5
9.5
B. NEURAL-NETWORK-BASED MODELING FOR
POWER AND DELAY PREDICTION
In order to avoid very complex mathematical equations of
semiconductor physics and to make a general model for
multiple devices, we utilize one of the recently popularized
hyperparametric models, namely artificial neural networks
(ANN) [4], [5], and machine learning techniques to model
2
VOLUME 4, 2016
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI
10.1109/ACCESS.2020.3019933, IEEE Access
1. 논문 figure 및 table
Choi et al.: Neural Approach for Modeling and Optimizing Si-MOSFET Manufacturing
of i-th training sample (Ip i ,Dy i ) for i = 1..N as the only
loss function. Each MLP for y = power or delay, is trained
to minimize this loss function Ly (3) by backpropagation
procedure [13], which updates the hyperparameters of MLP
from initial random values. Minimizing Ly in log scale is
O i
identical to minimizing log-scaled ratio, |log( Dyy i )|, which
considers both scale-sensitive ratio and log-scaled error symmetric regarding to predicted and target values.
N
1 X
||log(Oy i ) − log(Dy i )||2 , y = power or delay.
N i
(3)
One could suggest using a single MLP, which outputs both
power and delay from given input manufacturing parameters.
Such an approach is also possible and, in actuality, more
efficient in the size of neural networks because two different
outputs share the same hidden layer. However, in training
procedure, a multi-objective loss function Lm of two objectives (3) corresponding to power and delay should be defined
and usually has a form of weighted summation (4) of Lpower
and Ldelay with ambiguous weights wpower and wdelay .
Ly =
FIGURE 1: Geometric structure of high-K metal gate transistor based on Intel’s 32 nm node [8].
log log
.
.
.
Here, we used log values in input and output of Fy for y =
power or delay to avoid network training failure caused by
large scale differences between input and output values.
We follow the data flow and operations presented in fig.3
for MLP training. We use mean squared error (MSE), Ly (3),
between model output log(Oy i ) and desired output log(Dy i )
log
.
.
.
(2)
log log
.
.
.
log(Odelay ) = Fdelay (log(Ip )).
log
.
.
.
(1)
Manufacturing parameters (Ip)
.
.
.
log(Opower ) = Fpower (log(Ip )),
(4)
Since training a single MLP by minimizing Lm may result in
overly fitted Fy with a larger weight wy or less optimized Fy
with a small weight wy , we need to repeat MLP training process with varying weights to get satisfactory values of Lpower
and Ldelay to determine appropriate values for the weights.
In contrast, when using two MLPs for two outputs as in our
approach (fig.2), it is unnecessary to find appropriate weights
for multiple loss functions and each MLP is independently
trained with its own loss function Ly (3).
.
.
.
functions that output figure-of-merits of transistor with the
given input vector. The unit structure of ANN is perceptron
[9], which models after the biological function of neuronal
synapse as a linear combination of inputs followed by a
non-linear activation function. ANN with several layers of
multiple perceptrons is called fully connected multi-layer
perceptron (MLP) [10], [11]. The advantage of MLP is in
making a flexible model for new device by simply changing
the number of input and output perceptrons, regarding to the
dimension of manufacturing parameters and output figureof-merits. In this work, we use two of 3-layered MLP (inputhidden-output layers) as shown in fig.2 because the structure
is simple enough to control non-linearity function by changing the number of perceptrons in the hidden layer.
By using these two MLPs, we model two functions of
transistor, Fpower (1) and Fdelay (2), which respectively
output log-scaled power log(Opower ) and delay log(Odelay )
of a device from given input log-scaled vector log(Ip ) of
manufacturing parameters p, where power and delay, or
power delay products (PDP) are figure-of-merits for digital
logic applications [12].
Lm = wpower · Lpower + wdelay · Ldelay .
Fdelay
Fpower
exp
exp
Power (Opower)
Delay (Odelay)
FIGURE 2: 3-layered MLPs for estimating power and delay
from given manufacturing parameters. Flexible nonlinear
function can be implemented by using MLPs with varying
number of perceptrons.
3
VOLUME 4, 2016
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI
10.1109/ACCESS.2020.3019933, IEEE Access
Choi et al.: Neural Approach for Modeling and Optimizing Si-MOSFET Manufacturing
Backpropagation
Latent values zp
Limiter g(x)
Manufacturing
Parameters
log(Ipi)
.
.
.
log(Oyi)
.
.
.
C. AUTOMATIC OPTIMIZATION OF MANUFACTURING
PARAMETERS WITHIN CONTROLLABLE RANGE
Fdelay
Power
log(Opower)
Delay
log(Odelay)
log(PDP)
Once two models of power and delay are ready to use, we can
not only forwardly predict power or delay from given manufacturing parameters but also backwardly optimize manufacturing parameters for a better performance of transistor. PDP
is a commonly used measurement of transistor performance
and should be as small as possible. Since our neural networks
are designed to output log-scaled power and delay (fig.2),
log-scaled PDP can be represented as the summation of two
outputs of our networks (5).
log(P DP (x)) = log(Opower (x) · Odelay (x))
= log(Opower (x)) + log(Odelay (x))
= Fpower (x) + Fdelay (x),
x = log(Ip ).
Fpower
.
.
.
.
.
.
FIGURE 3: MLP training for a certain figure-of-merit. Given
training sample (Ip i ,Dy i ), MLP is trained to minimize Ly by
backpropagation.
Update latent values by
gradient descent method
.
.
.
Target specification
log(Dyi)
log(Ip)
.
.
.
Fy
Manufacturing
Parameters
Ly
FIGURE 4: Automatic optimization of manufacturing parameters with fully connected multi-layer perceptron (MLP):
pre-trained networks for predicting log-scaled power and
delay are used to calculate log-scaled PDP and limiter g(x)
is used to force the manufacturing parameters Ip be in the
controllable range.
P DP (xt ) ≤ TP DP or G(x) ≤ TG or dx ≤ Tdx .
(5)
As shown in fig.4, we find the optimal set of manufacturing
parameters, which minimizes this log-scaled PDP value by
utilizing our trained neural networks and the gradient descent
method [14], [15]. Gradient descent method operates as described here: initial randomized manufacturing parameters,
Ip0 , and learning rate γ are set as a preliminary step. Then,
as the first step, the gradient of log-scaled PDP with respect
to manufacturing parameter G(x) (6) is calculated with the
initial manufacturing parameters (x = log(Ip0 )). As the
second step, the initial manufacturing parameters are updated
to move to the negative direction of the gradient with a certain
learning rate γ as (7) at time stamp t = 0. Those two steps are
repeated with the updated manufacturing parameters as the
time stamp t increases until PDP value meets a termination
condition (8), i.e., PDP less than a certain threshold value
TP DP or vanishing gradient or no variation in manufacturing
parameters.
d(log(P DP (x)))
dx
dFpower (x) dFdelay (x)
=
+
,
dx
dx
(6)
xt+1 = xt − γ · G(xt ), xt = log(Ipt ),
(7)
G(x) =
(8)
Here, the manufacturing parameters obtained from this
gradient descent method are not guaranteed to be in the
controllable range of real semiconductor manufacturing because the method may find an optimal solution along the
entire hypersurface of our trained neural networks beyond the
training data space, without any constraint of the acceptable
range of solution in reality. Thus setting a controllable range
of manufacturing parameters in semiconductor industry is
critical because of both the limitation of manufacturing
hardware and the preferred parameter setting. To provide a
controllable optimal solution, as shown in fig.4, we use a
limiter function [16] g(zp ) (9) and optimize latent variables
zp instead of manufacturing parameters Ip themselves.
g(zp ) = sig(zp )·(log(Ipmax )−log(Ipmin ))+log(Ipmin ). (9)
g(zp ) is a differentiable function that is defined by maximum
and minimum values (Ipmax , Ipmin ) of the available range
of manufacturing parameters and a sigmoid function sig(zp )
[17] (10).
1
sig(zp ) =
.
(10)
1 + e−zp
Because the output range is from 0 to 1 (sig(zp ) ∈
[0, 1]), g(zp ) becomes log(Ipmax ) when zp goes to +∞
and log(Ipmin ) when zp goes to −∞. Therefore, to find
optimal manufacturing parameters in the controllable range
[Ipmin , Ipmax ], we optimize zp without range constraint and
calculate Ip = g(zp ).
4
VOLUME 4, 2016
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI
10.1109/ACCESS.2020.3019933, IEEE Access
Choi et al.: Neural Approach for Modeling and Optimizing Si-MOSFET Manufacturing
For numerical efficiency, instead of calculating complex
analytic gradient (6) of neural network function, we use
numerical gradient [18] Ĝ (11) in our process optimization,
where two feed-forward operations of (5) with the manufacturing parameters x + τ2 and x − τ2 and one division by
τ ≈ 0 are needed. Thus, updating equation of latent variables
(12) is also used instead of the equation for manufacturing
parameters (7).
Ĝ(x) =
log(P DP ◦ g(x + τ2 )) − log(P DP ◦ g(x − τ2 ))
,
τ
(11)
xt+1 = xt − γ · Ĝ(xt ), xt = zpt .
(12)
III. EXPERIMENTS
A. EXPERIMENTAL SETUP
(a) Id -Vg curves
For experimental data set, we gathered 1,000 samples of
random manufacturing parameters in the reference range between min and max values specified in table 1 and simulated
1,000 samples of Id -Vg and Cgg -Vg characteristics of the devices as shown in fig.5 by using Sentaurus TCAD simulator.
Before the electrical simulation, we obtained the geometrical
parameters such as gate length and equivalent oxide thickness
announced in [8] and [19]. Then, we varied the doping profiles from source/drain to channel to calibrate subthreshold
characteristics between simulated and Intel measured data
[8]. Simulated I-V characteristics correspond to 32-nm node
HP (High Performance) and LOP (Low Operating Power)
targets [19] about the range of the threshold voltage and
on/off current ratio which are related to electrostatic characteristics like subthreshold swing and drain-induced barrier
lowering. To obtain power (14) and delay (15), effective
current Ief f is extracted from IH and IL (13) using [21].
We extracted IH at gate voltage, Vg =Vdd and drain voltage,
Vd =0.5Vdd and IL at Vg =0.5Vdd and Vd =Vdd .
IH
Ief f = (IH − IL )/ ln
,
(13)
IL
power = Ief f · Vdd + Iof f · Vdd ,
delay =
Cgg · Vdd
.
2 · Ief f
(14)
(15)
Among these 1,000 samples of {manufacturing parameters, delay and power} pairs, 850 samples were used to train
the models of delay and power, the other 150 samples were
used to test the delay and power prediction performance of
the trained models.
For TCAD simulation, drift-diffusion transport model was
calculated self-consistently with Poisson and the carrier continuity equations. Density-gradient model was applied to
consider the quantum confinements of carriers. Multivalley
modified local-density approximation was used with sixband and two-band k.p band structures for holes and electrons, respectively. Canali model [20] for carrier velocity
saturation at high electric field, Masetti model [22] to consider doping-dependent mobility, and Lombardi model [23]
to consider mobility degradation at Si-SiO2 interface were
(b) Cgg -Vg curves
FIGURE 5: (a) I-V and (b) C-V characteristics of transistors
for the extraction of power and delay. Parameter ranges are
selected as full range (black lines) and 5% limiter (red lines).
used. Doping-dependent Shockley-Read-Hall [24] and Auger
[25] generation-recombination models were used along with
Hurkx band-to-band tunneling model [26] to consider the
gate-induced drain leakage. Operation voltage is fixed to 1.0
V, and off-state current is fixed at 1 nA for the averaged
device.
As a pre-processing of data set, we took logarithm of each
element of the 850 train samples and then normalized to
have (mean, standard deviation)=(0, 1) across samples. This
pre-processing of eliminating bias and deviation differences
between data helps multi-layered perceptron (MLP) to learn
a function in the restricted range of domain and is known to
enhance training performance [27]. We performed the same
pre-processing for the 150 test samples by using the mean
and standard deviation of the train samples.
All experiments were operated in MATLAB console on
a laptop computer with Intel i7-8550U (Quadcore, 1.8GHz)
CPU and 16 GByte RAM.
5
VOLUME 4, 2016
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI
10.1109/ACCESS.2020.3019933, IEEE Access
Choi et al.: Neural Approach for Modeling and Optimizing Si-MOSFET Manufacturing
We built two neural network models, one for predicting delay
and the other for power of transistor, (fig.2) as 3-layered
MLPs of 19 nodes for input layer, 7 nodes for hidden layer,
and 1 node for output layer, where the number of nodes
in the hidden layer were experimentally determined as the
smallest number to learn regression between manufacturing
parameters and delay or power without overfitting. In details,
the two models were trained, one to minimize power loss of
(3) when y=power, the other to minimize delay loss of (3)
when y=delay. During the training phase, randomly selected
700 samples out of 850 samples (termed training set) were
used to update neural network parameters using LevenbergMarquardt (LM) optimizer. The other 150 samples (termed
validation set) were used find the stopping point for training
iteration based on the generalization performance of the
trained neural networks. In our experiments, we stopped the
iteration of LM optimizer when the loss Ly for validation
set continuously remained or increased during the previous 6
epochs. We repeated this training process for varying number
of nodes in the hidden node and used the smallest number of
nodes that produce small training error for training set and
near-zero difference between errors of training and validation
sets.
Since the performance of a trained network differs based
on its initial random parameter values, we repeated this
training procedure with the same training and validation sets
but with different initial randomized values of network parameters. And then, we took the best network which resulted
the least error for training and validation sets. To find an
appropriate number of repetition with a stable learning performance, we measured average MSEs and their variations of
10 -6
13
delay (Fdelay ) and power (Fpower ) models from 10 trials of
training with each repeat number. As shown in fig.6, average
loss and its variation decrease as the repeat number increases
but no critical improvement was resulted after 40 repeats.
Therefore, considering moderate training time, we concluded
to use the best models from 40 trained networks.
For the trained models, we firstly analyzed the learning
ability and generalization performance of our neural models
by monitoring the Loss (3) reduction during training process. Figure 7 shows loss (MSE error) reductions of the
normalized data sets (sec.III-A) during training process. For
both training (blue lines) and validation (green lines) sets,
the loss gradually decreased as the training epoch continued
until optimizer stopped (fig.7a and fig.7b). Even for test data
set (red lines), which was not shown in training procedure,
the same loss to the validation set was achieved. Figure
8 shows output regression lines of our two trained neural
models. The regression plots of delay model (fig.8a) and
power model (fig.8b) show almost ideal linear lines with R
values very close to 1.0 for all kinds of data sets. Based on
Best Validation Performance is 1.0668e-05 at epoch 29
10 0
Train
Validation
Test
Best
Mean Squared Error (mse)
B. PERFORMANCE EVALUATION OF POWER AND
DELAY MODELING
10 -2
10 -4
10 -6
mse error vs repeat num
0
5
10
15
20
25
30
35
35 Epochs
delay error
power error
(a) Delay loss reduction
12
10
Train
Validation
Test
Best
10 -2
Mean Squared Error (mse)
| log(predicted/gt) | 2
Best Validation Performance is 7.8511e-06 at epoch 83
10 -1
11
9
8
7
6
1
10
20
30
40
50
60
70
80
90
10 -3
10 -4
10 -5
100
repeat num
FIGURE 6: Model error vs. number of repeated training:
each curve represents average loss with standard deviation
from 10 trials corresponding to each number of repeated
training.
10 -6
0
10
20
30
40
50
60
70
80
89 Epochs
(b) Power loss reduction
FIGURE 7: Loss vs. epoch during training process
6
VOLUME 4, 2016
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI
10.1109/ACCESS.2020.3019933, IEEE Access
Choi et al.: Neural Approach for Modeling and Optimizing Si-MOSFET Manufacturing
Validation: R=0.99679
0.1
0.05
0
-0.05
-0.05
0
0.05
0.1
0.15
Target
0.15
0.1
0.05
0
-0.05
-0.05
0.05
0
-0.05
0.05
0.1
0.15
Output ~= 1*Target + 0.00037
Output ~= 0.99*Target + 0.00039
0.1
0
0.1
0.15
All: R=0.99775
Data
Fit
Y=T
-0.05
0.05
Target
Test: R=0.99688
0.15
0
Data
Fit
Y=T
0.15
C. PERFORMANCE EVALUATION OF AUTOMATIC
MANUFACTURING PARAMETER OPTIMIZATION
0.1
0.05
0
-0.05
-0.05
0
Target
0.05
0.1
0.15
Target
(a) Delay regression lines
Validation: R=0.99757
Data
Fit
Y=T
0.05
0
-0.05
-0.1
-0.15
-0.15
-0.1
-0.05
0
0.05
Target
Output ~= 0.99*Target + 0.00057
Output ~= 1*Target + 5.3e-06
Training: R=0.99872
Data
Fit
Y=T
0.05
0
-0.05
-0.1
-0.15
-0.15
0
-0.05
-0.1
-0.15
-0.15
-0.1
-0.05
0
-0.05
0
0.05
All: R=0.99839
0.05
Output ~= 1*Target + 9.4e-05
Output ~= 1*Target + 3e-05
0.05
-0.1
Target
Test: R=0.9976
Data
Fit
Y=T
of our neural models to the simulated delays and powers
(ground truth) for the 150 test samples in sec.III-A. As shown
in fig.9, the differences between predicted and simulated
values are very small (left panel of fig.9), less than 5 % of
the simulated values (right panel of fig.9), and this prediction
accuracy is good enough to apply our neural models to
predict delay and power in a real manufacturing process.
Our neural modeling took 80 seconds to train two neural
networks 40 times. Additionally, compared to TCAD simulation that takes 3 min, our method takes less than 1 ms to
perform a forward prediction of delay and power for a certain
manufacturing parameter.
Data
Fit
Y=T
0.05
0
In the optimization process represented in fig.4, we fixed our
trained neural models of delay and power in sec.III-B and
used them as Fdelay and Fpower respectively. And then, we
generated random initial values of manufacturing parameters
and updated them by gradient descent method as described
in sec.II-C with numerical gradient (11), update equation
(12), and objective function (5). Random initial values of
manufacturing parameters were generated under the uniform
distribution in a reasonable range, i.e., from Ipmin to Ipmax of
(9) where Ipmin = 0.95×reference value in table 1 (-5%) and
Ipmax = 1.05×reference value in table 1 (+5%). The small
deviation τ = 2 × 10−8 was used for calculating accurate numerical gradient (11) and the learning rate γ = 103 was used
for update equation (12) for fast convergence. TP DP = 0,
TG = 10−5 , and Tdx = 10−5 were used for termination
condition (8).
Since gradient descent method does not always guarantee
global optimum from any initial value, we repeated this
optimization process with different initial values and took
the best result as the optimal manufacturing parameters.
To find the necessary number of repetition of optimization
process to obtain stable optimal result, we firstly measured
-0.05
-0.1
10 -12
delay: predicted vs gt
gt
predicted
2.2
-0.15
-0.1
Target
-0.05
0
0.05
Target
2
1.8
1.6
(b) Power regression lines
delay error
0.1
2.4
-0.15
predicted/gt-1
0.15
Data
Fit
Y=T
delay
Data
Fit
Y=T
Output ~= 1*Target + -0.00018
Output ~= 1*Target + 0.00049
Training: R=0.99815
0.05
0
-0.05
1.4
-0.1
FIGURE 8: Regression lines of data set after training process
0
50
10
-4
100
150
0
sample indices
power: predicted vs gt
predicted/gt-1
these results, we concluded that our neural models learned
the deterministic equations of delay and power for given
manufacturing parameters and that our neural models have
a generalization performance to predict delay or power for
unseen manufacturing parameters in the same perturbation
range of training data.
Secondly, to evaluate the modeling accuracy of our neural
approach, we compared the predicted delays and powers
power
2.6
2.4
2.2
2
gt
predicted
1.8
100
150
sample indices
power error
0.1
2.8
50
0.05
0
-0.05
-0.1
0
50
100
sample indices
150
0
50
100
150
sample indices
FIGURE 9: Comparison of predicted and simulated (gt)
values of test data set: the predicted values are located in
between ±5% bounds of ground truth values.
7
VOLUME 4, 2016
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI
10.1109/ACCESS.2020.3019933, IEEE Access
Choi et al.: Neural Approach for Modeling and Optimizing Si-MOSFET Manufacturing
TABLE 2: Optimized values vs. number of repeated optimization process with 5% limiter: values in parentheses are standard
deviations corresponding to the above averaged values.
Repeat number
1
0.3071
(0.9429)
0.3110
simulated
(0.9578)
0.1661
predicted
(0.2284)
0.1684
simulated
(0.2705)
0.1849
predicted
(0.3095)
0.1847
simulated
(0.2990)
predicted
PDP ×10−15 (×10−18 )
Delay ×10−11 (×10−13 )
Power ×10−3 (×10−5 )
10
0.3066
(0.0079)
0.3118
(0.6863)
0.1684
(0.0308)
0.1699
(0.0041)
0.1820
(0.0339)
0.1835
(0.0365)
20
0.3066
(0.0002)
0.3120
(0.0007)
0.1685
(0.0005)
0.1699
(0.0005)
0.1819
(0.0006)
0.1836
(0.0005)
30
0.3066
(0.0002)
0.3120
(0.0047)
0.1685
(0.0003)
0.1699
(0.0019)
0.1819
(0.0003)
0.1836
(0.0018)
the best PDP value for each repeat number and secondly
calculated mean and standard deviation from 10 trials. Figure
10a and the second row of table 2 show the measured values
where predicted optimum and simulated optimum represent
PDP values corresponding to the optimized manufacturing
parameters from our neural model and from Sentaurus TCAD
simulator, respectively. We find that the predicted PDP is
saturated at around 0.3066 × 10−15 when repeat number
is ≥ 10 and its deviation diminished to 0.0002 × 10−18 .
Even without any repeat (when repeat number is 1), the
resulted PDP value (0.3071 × 10−15 with standard deviation
of 0.9429 × 10−18 ) is in close proximity to the saturated
optimal value. Compared to the ground truth (simulated optimum), the predicted optimum has slightly lower value. This
difference is expected and caused by the model prediction
error shown in fig.9 but the difference within ±5% bounds
of simulated optimum (upper +5% and lower -5% bounds of
fig.10a) is not significant in real manufacturing process.
TABLE 3: The optimized manufacturing parameters from
40 repeated optimization process with 5% limiter (Avg:
averaged value, Std: standard deviation, Std / Avg: relative
deviation): all values were measured from 10 trials of 40
repeated optimization process.
Parameter
Lcon(d)
Lcon(s)
Lg
Lhaloj(d)
Lhaloj(s)
Lsdj(d)
Lsdj(s)
Lsp(d)
Lsp(s)
Nch
Nhalo(d)
Nhalo(s)
Nsd(d)
Nsd(s)
Tg
Thk
Tox
Ts/d(d)
Ts/d(s)
Avg
42
38
28.5
9.5
10.5
26.2
23.8
19
21
1.0496 ×1018
4.7500 ×1018
5.2500 ×1018
1.0500 ×1020
0.9500 ×1020
47.5
2.1
0.735
9.5
10.5
Std
Std / Avg
8.4961 ×10−7 2.0229 ×10−5
4.3151 ×10−7 1.1355 ×10−5
8.5249 ×10−16 2.9912 ×10−14
6.9458 ×10−9 7.3114 ×10−7
2.4959 ×10−7 2.3771 ×10−5
6.3633 ×10−10 2.4241 ×10−8
5.7586 ×10−9 2.4247 ×10−7
4.6348 ×10−12 2.4394 ×10−10
1.1685 ×10−9 5.5644 ×10−8
1.4607 ×1014 1.3916 ×10−4
1.2944 ×1013 2.7251 ×10−6
2.9841 ×1013 5.6841 ×10−6
3.4904 ×1014 3.3242 ×10−6
3.8750 ×1013 4.0789 ×10−7
2.2813 ×10−7 4.8028 ×10−6
1.4153 ×10−9 6.7379 ×10−7
1.4366 ×10−10 1.9545 ×10−7
1.4790 ×10−8 1.5568 ×10−6
4.1235 ×10−10 3.9271 ×10−8
40
0.3066
(0.0002)
0.3120
(0.0006)
0.1685
(0.0006)
0.1699
(0.0004)
0.1819
(0.0007)
0.1836
(0.0004)
50
0.3066
(0.0002)
0.3120
(0.0009)
0.1685
(0.0004)
0.1699
(0.0006)
0.1819
(0.0005)
0.1836
(0.0007)
60
0.3066
(0.0001)
0.3120
(0.0006)
0.1685
(0.0005)
0.1699
(0.0004)
0.1819
(0.0006)
0.1836
(0.0004)
70
0.3066
(0.0002)
0.3120
(0.0046)
0.1685
(0.0005)
0.1699
(0.0019)
0.1819
(0.0005)
0.1836
(0.0017)
80
0.3066
(0.0001)
0.3120
(0.0007)
0.1685
(0.0006)
0.1699
(0.0005)
0.1819
(0.0006)
0.1836
(0.0005)
90
0.3066
(0.0001)
0.3120
(0.0047)
0.1685
(0.0005)
0.1699
(0.0019)
0.1819
(0.0005)
0.1836
(0.0018)
100
0.3066
(0.0002)
0.3120
(0.0047)
0.1685
(0.0004)
0.1699
(0.0019)
0.18219
(0.0004)
0.1836
(0.0018)
When repeat number is 20, delay (fig.10b) and power
(fig.10c) converged to the best predicted optimum, 0.1685 ×
10−11 (third row of table 2) and 0.1819 × 10−3 (fourth
row of table 2) respectively, and are stable with very small
standard deviations, 0.0005 × 10−13 (second row of table
2) and 0.0006 × 10−5 (third row of table 2) respectively.
Compared to the simulated optima of delay and power, the
predicted optima have very small difference within ± 5%
bounds (fig.10b and fig.10c) of the simulated optima. Based
on these results, we decided to repeat optimization process
40 times (20 + marginal 20) to achieve the lowest PDP
performance with stable delay and power.
Our optimized PDP and its corresponding delay and power
are compared to the values of 150 test samples (III-A) in
fig.11. As shown in fig.11a, our optimized PDP (green line) is
lower than the minimum PDP of the test samples. However,
the minimum PDP was not resulted with the minima of
both delay and power. Instead, our optimized manufacturing
parameters have relatively high delay (fig.11b) and relatively
low power (fig.11c). This result is partially consistent with
the previous optimizing technique of human expert where
human expert firstly minimizes power (DC characteristics)
by maintaining good short channel characteristics and then
tries to minimize delay (AC characteristics) under the low
power condition. However, our neural optimization result
differs in that it does not insist on the lowest power and delay;
rather our approach resulted in a slightly higher power, yet
achieved the lowest PDP with a certain delay. Therefore, as
an holistic optimization, our neural approach is promising
in finding optimal manufacturing parameters for the lowest
PDP while the commonly used sequential optimization in
real manufacturing, which firstly sticks to the minimal power
and secondly searches minimal delay, does not guarantee the
lowest PDP. Benefits of our neural approach will be further
proved by quantitative comparing with human expert and the
previously used reference design in sec.III-E.
Table 3 presents the average and standard deviation values
of each optimized manufacturing parameter when repeat num
is 40. Each parameter converged to a certain optimal value
(Avg) with a very small deviation (Std) of 10−4 or less scale
of its corresponding Avg value (Std / Avg). Figure 12 shows
the optimized transistor structure of the manufacturing pa-
8
VOLUME 4, 2016
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI
10.1109/ACCESS.2020.3019933, IEEE Access
Choi et al.: Neural Approach for Modeling and Optimizing Si-MOSFET Manufacturing
rameters shown in table 3. Since our optimization method has
no constraint of source/drain symmetry, the resulted optimal
transistor has an asymmetric structure unlike commonly used
symmetric structure [8]. Based on the table 2, this asymmetric structure achieved 17% lower PDP (0.3120 × 10−15 ,
12% higher delay (0.1699 × 10−11 ), and 26% lower power
10 -16
3.3
average and deviation of optimized pdp vs repeat num
3.25
predicted optimum
simulated optimum
upper bound (+5%)
lower bound (-5%)
pdp
3.2
(0.1836 × 10−3 ) than reference values, 0.3749 × 10−15 ,
0.1516 × 10−11 , and 0.2473 × 10−3 respectively, which were
obtained using TCAD simulation with the reference structure
[8] in table 1.
Although asymmetric structure is not common and beyond
the present manufacturing ability, this optimization result
shows that our neural approach can suggest theoretical direction for MOSFET structure as well as optimal parameter values for better performance beyond the conventional
source/drain symmetric structure. Our optimization method
can also find a set of optimal source/drain symmetric manufacturing parameters by applying our optimization technique
3.15
3.1
3.05
10 -16
5
3
Test sample PDP vs optimized PDP
pdp of test samples
optimized pdp
2.95
0
10
20
30
40
50
60
70
80
90
4.5
100
PDP
repeat num
(a) Optimized PDP vs repeat number
10 -12
1.8
average and deviation of optimized delay vs repeat num
delay
3.5
predicted optimum
simulated optimum
upper bound (+5%)
lower bound (-5%)
1.75
4
3
0
50
100
150
sample index
1.7
(a) Test sample PDP vs optimized PDP
10 -12
Test sample delay vs optimized delay
1.65
delay of test samples
optimized delay
2.4
2.2
1.6
10
20
30
40
50
60
70
80
90
100
Delay
0
repeat num
(b) Optimized delay vs repeat number
2
1.8
1.6
10 -4
1.95
average and deviation of optimized power vs repeat num
1.4
1.2
power
0
predicted optimum
simulated optimum
upper bound (+5%)
lower bound (-5%)
1.9
1.85
50
100
150
sample index
(b) Test sample delay vs optimized delay
1.8
10 -4
3
Test sample power vs optimized power
power of test samples
optimized power
2.8
1.75
0
10
20
30
40
50
60
70
80
90
100
repeat num
(c) Optimized power vs repeat number
FIGURE 10: Optimized values of PDP, delay, and power
vs the number of repeated optimization process: predicted
optimum represents the output value from our neural model
corresponding to the optimized manufacturing parameters,
simulated optimum represents the values from TCAD simulator, upper and lower bounds represents 5% higher and lower
values respectively than the simulated optimum when repeat
number is 20.
Power
2.6
1.7
2.4
2.2
2
1.8
1.6
0
50
100
150
sample index
(c) Test sample power vs optimized power
FIGURE 11: Test samples and our optimized values of PDP,
delay, and power: the optimized values were obtained from
30 repeated optimization process with 5% limiter.
9
VOLUME 4, 2016
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI
10.1109/ACCESS.2020.3019933, IEEE Access
1. 논문 figure 및 table
Choi et al.: Neural Approach for Modeling and Optimizing Si-MOSFET Manufacturing
DopingConcentration (cm-3)
1.053∙1020
1.524∙1016
2.094∙1012
G
S
-7.822∙1014
-5.402∙1018
D
Sub
FIGURE 12: Transistor structure optimized by our neural
approach with 5% limiter
TABLE 4: Parameters listed in descending order of sensitivity (dy): values with ’-’ sign are negatively correlated with
PDP.
Parameter PDP sensitivity (dy)
Lg
6.8710 ×10−17
Nsd(s)
3.7977 ×10−17
Tox
-2.9362 ×10−17
Lsdj(s)
2.1586 ×10−17
Thk
-1.7121 ×10−17
Lsp(s)
-1.7022 ×10−17
Nsd(d)
-1.3028 ×10−17
Nhalo(s)
-1.2078 ×10−17
Lsdj(d)
-1.0712 ×10−17
Nhalo(d)
1.0056 ×10−17
Parameter PDP sensitivity (dy)
Ts/d(s)
-7.1042 ×10−18
Lsp(d)
5.1539 ×10−18
Ts/d(d)
3.7916 ×10−18
Lhaloj(d)
3.4982 ×10−18
Tg
2.5350 ×10−18
Nch
-1.1027 ×10−18
Lcon(s)
9.1408 ×10−19
Lhaloj(s) -8.1754 ×10−19
Lcon(d)
-4.1220 ×10−20
of sec.II-C assuming the same parameter for source/drain.
The symmetric structure optimization will be dealt with in
sec.III-E.
D. CORRELATION AND SENSITIVITY ANALYSIS
We additionally analyzed the effects of each manufacturing
parameter on PDP value in optimization process by using
our trained neural models of sec.III-B and the optimized
manufacturing parameters of sec.III-C. Each graph of fig.13,
fig.14, and fig.15 shows transitions of figure-of-merits, i.e.,
PDP, delay, and power, respectively for each manufacturing
parameter varying from Min to Max of table 1, while the
other parameters are fixed at the optimal values of table
3. We calculated the differences between the minimum and
maximum values of figure-of-merits as the sensitivity (dy)
of parameter, i.e., how significantly a parameter affects the
figure-of-merit. These graphs can be generated in a second
from our neural models without using repeated TCAD simulations.
From these graphs, we firstly find that PDP and each
manufacturing parameter have an almost linear correlation
as shown in fig.13. This linear correlation resulted in the
optima (red circles in fig.13) on one bound of 5% limiter
(vertical dashed lines or dash dot lines in fig.13) which
has the lowest PDP. Once we found how manufacturing
parameters are correlated with PDP, we can easily decide if
each parameters should be changed furthermore and to which
direction each parameter should be moved. Moreover, the
monotonic linear correlation supports the usage of a wider
limiter in our optimization process up to full range, and thus
each parameter can be moved toward the boundary of the
limiter for even further PDP optimization.
Secondly, we find that source/drain parameter pairs have
the opposite directional correlations with PDP, but with very
similar sensitivities. Highly-doped source region by reducing parasitic resistance and lightly-doped drain region by
decreasing gate-induced drain leakage and improving short
channel effects (SCEs) are preferred whenever possible [28]–
[30]. For example, Lsdj(d) and Lsdj(s) are negatively and
positively correlated with PDP respectively and these two
correlations have similar slopes in fig.13. This supports to
use asymmetric structure for a better PDP because the positive and negative PDP increments of the same source/drain
parameter pair compensate each other.
TABLE 5: Parameters listed in descending order of delay and
power sensitivities (dy): parameters (0,0) are insensitive to
both delay and power, Parameters (±,0) or (0, ±) are sensitive to only delay or power respectively. Parameters (±,±)
or (±,∓) are sensitive to both delay and power in a same
direction or opposite directions. Each group of parameters
are sorted in a descending order in regards to both delay and
power sensitivities.
Parameters (0,0)
Lcon(d)
Lhaloj(d)
Lcon(s)
Tg
Parameters (±,0)
Lsdj(d)
Nsd(d)
Nhalo(d)
Lsp(d)
Ts/d(d)
Parameters (0,±)
Tox
Parameters (±,±)
Lg
Parameters (±,∓)
Nsd(s)
Nhalo(s)
Lsdj(s)
Lsp(s)
Nch
Ts/d(s)
Lhaloj(s)
Thk
Delay sensitivity (dy)
1.0306 ×10−14
1.1255 ×10−14
-4.4003 ×10−15
-1.5111 ×10−16
Delay sensitivity (dy)
-7.5297 ×10−14
-7.4103 ×10−14
4.0869 ×10−14
3.5057 ×10−14
3.2589 ×10−14
Delay sensitivity (dy)
-1.7130 ×10−15
Delay sensitivity (dy)
2.8058 ×10−13
Delay sensitivity (dy)
-4.6829 ×10−13
3.0951 ×10−13
-2.7648 ×10−13
2.2607 ×10−13
1.8004 ×10−13
1.1989 ×10−13
7.0730 ×10−14
3.5765 ×10−14
Power sensitivity (dy)
-1.1441 ×10−6
8.5549 ×10−7
1.0203 ×10−6
1.5210 ×10−6
Power sensitivity (dy)
1.6981 ×10−6
2.5266 ×10−7
1.5459 ×10−6
-7.1222 ×10−7
-1.2461 ×10−6
Power sensitivity (dy)
-1.7213 ×10−5
Power sensitivity (dy)
1.0096 ×10−5
Power sensitivity (dy)
7.1839 ×10−5
-4.2408 ×10−5
5.0168 ×10−5
-3.9581 ×10−5
-2.0630 ×10−5
-1.8421 ×10−5
-8.4648 ×10−6
-1.4148 ×10−5
10
VOLUME 4, 2016
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI
10.1109/ACCESS.2020.3019933, IEEE Access
Choi et al.: Neural Approach for Modeling and Optimizing Si-MOSFET Manufacturing
1
0
1
0
39
40
41
42
0
38
39
40
x=Lcon(d)
10 -16
10 -17
3
2
pdp vs Lsdj(s)
10 -16
10 -17
32
34
2
3
2
0
0
0
24
24.5
26
19
1
pdp vs N halo(s)
10
10 -17
dy = -1.2078
4
y=pdp
2
-16
3
2
5.5
x=N halo(d)
10 -16
4
4
4.5
10 18
10 -16
10 -17
2
1
5.5
10 18
2
2
x=T hk
2.2
11
0.65
0.7
0.75
0.8
20.5
4
3
2
21
8
pdp vs N sd(s)
10
10 -17
2
9
10
10
11
10 17
pdp vs Tg
dy = 2.535
10 -18
49
51
3
2
12
10 19
48
50
52
x=T g
pdp vs Ts/d(s)
dy = -7.1042
4
2
12
0
9
x=N sd(s)
10 -16
3
11
1
8
10 -18
-16
4
3
pdp vs Ts/d(d)
dy = 3.7916
10 -18
x=N ch
dy = 3.7977
10 19
10 -18
3
2
1
0
0.6
-16
4
12
1
0
1.8
10
x=N sd(d)
10 -16
3
20
0
9
pdp vs N ch
dy = -1.1027
4
1
8
1
0
19.5
10
10 -17
10.5
0
19
pdp vs N sd(d)
2
6
10 -17
10 -16
10 -17
x=Lsp(s)
3
pdp vs Tox
dy = -2.9362
4
y=pdp
3
5
x=N halo(s)
pdp vs Thk
dy = -1.7121
21
0
6
y=pdp
5
20.5
1
0
4.5
20
dy = -1.3028
4
1
0
-16
10
1
x=Lsp(d)
y=pdp
10
10 -17
3
4
19.5
y=pdp
pdp vs N halo(d)
dy = 1.0056
4
25.5
y=pdp
10
-16
25
x=Lsdj(s)
pdp vs Lsp(s)
2
0
26
9.5
x=Lhaloj(s)
3
1
25.5
10.5
dy = -1.7022
4
1
25
0
10
10 -16
10 -18
1
24.5
2
1
9.5
pdp vs Lsp(d)
10 -19
3
x=Lhaloj(d)
dy = 5.1539
4
3
x=Lsdj(d)
y=pdp
30
1
24
y=pdp
28
x=Lg
dy = 2.1586
4
y=pdp
y=pdp
pdp vs Lsdj(d)
2
0
26
x=Lcon(s)
dy = -1.0712
4
42
y=pdp
10 -16
41
3
pdp vs Lhaloj(s)
dy = -8.1754
4
1
y=pdp
38
2
10 -16
10 -18
y=pdp
2
3
pdp vs Lhaloj(d)
dy = 3.4982
4
y=pdp
optimum
-5% bounds
+5% bounds
1
3
10 -16
10 -17
dy = 6.871
4
y=pdp
y=FPDP(x)
2
pdp vs Lg
10 -16
10 -19
y=pdp
3
pdp vs Lcon(s)
dy = 9.1408
4
y=pdp
y=pdp
10 -16
10 -20
dy = -4.122
y=pdp
pdp vs Lcon(d)
10 -16
4
0
9.5
x=T ox
10
x=T s/d(d)
10.5
9.5
10
10.5
x=T s/d(s)
FIGURE 13: PDP transition for varying manufacturing parameters around the optimal value of 5% limiter and asymmetric
source/drain structure
Thirdly, PDP is insensitive to some manufacturing parameters. For example, PDP is almost constant (|dy| < 3.066
×10−18 , less than 1% of the optimal predicted PDP 0.3066
×10−15 shown in table.2) for varying Lcond(s/d) , Lhaloj(s) ,
Nch , and Tg in fig.13. This means that not all parameters
but only some significant parameters with high slope (high
sensitivity) are selectively necessary for achieving an optimal PDP [31], [32]. Considering only significant parameters
in the transistor optimization process is beneficial because
finding an optimum from a lower dimensional search space
is faster and easier. Table 4 shows the sorted list of parameters
in descending order in regards to their PDP sensitivities (dy)
where Lg , Nsd(s) , Tox , and Lsdj(s) are significant factors
for PDP optimization based on their PDP sensitivities. This
analysis is consistent with the trend of conventional semiconductor industry [33], [34].
We also performed a similar analysis on the effect of
manufacturing parameters in delay and power control by
monitoring their sensitivities (dy) as shown in fig.14 and
fig.15. We categorized manufacturing parameters into five
groups regarding their sensitivities (dy) to delay and power,
i.e., both insensitive (0,0), delay-only sensitive (±,0), poweronly sensitive (0,±), both sensitive (±,±), both sensitive
but opposite (±, ∓). Here, we assumed that parameters of
sensitivity (dy) less than 1% of optimal predicted delay
(0.1685 ×10−11 in table.2) or power (0.1819 ×10−3 in
table.2) are insensitive. Table 5 shows the parameter groups
in descending order of sensitivities (dy) to delay and power.
As we can expect from table 4, Lcond(s/d) , Lhaloj(d) , and
Tg of very low PDP sensitivity are in the both insensitive
(0,0) group and have no effect in optimization process. Nch
and Lhaloj(s) of low PDP sensitivity are categorized as the
group of both sensitive but opposite (±,∓) and, therefore,
these parameters are useful in controlling delay and power of
transistor without changing PDP. By using the parameters in
delay-only (±,0) sensitive group like Lsdj(d) or power-only
(0,±) sensitive group like Tox , we can effectively control
delay or power of transistor independently.
Based on this correlation and sensitivity analysis, we find a
possibility to further reduce PDP. To this end, we enlarged the
range of limiter from 5% of reference manufacturing parameter values to full ranges between Min and Max in table 1 and
performed the same automatic parameter optimization process mentioned in sec.III-C. Table 6 shows optimized values
of manufacturing parameters with the full range limiter. The
resulted parameters positively correlated with PDP as shown
in table 4, i.e., Lg , Nhalo(d) , and Nsd(s) , were decreased and
parameters negatively correlated with PDP shown in table 4,
i.e., Nch , Nhalo(s) , Nsd(d) , Thk , and Tox , were increased to
the direction of reducing PDP. The other parameters were
not changed because they were already in full range with 5%
limiter. This full range optimization resulted in about 29%
lower PDP (0.2661×10−15 ) than the TCAD simulated value
using the reference structure (0.3749×10− 15) as shown in
11
VOLUME 4, 2016
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI
10.1109/ACCESS.2020.3019933, IEEE Access
Choi et al.: Neural Approach for Modeling and Optimizing Si-MOSFET Manufacturing
0.5
0
0.5
40
41
42
39
40
42
0.5
10 -12
10 -13
1.5
1
0.5
0
25
25.5
24
24.5
10 -12
10 -14
25.5
10 -13
1
0.5
0
5
5.5
4
1.5
1
0.5
8
2.2
11
0.65
0.7
0.75
8
9
10
10 -12
10 -13
10 17
delay vs Tg
10 -16
dy = -1.5111
2
1.5
1
0
9
10
11
12
10 -12
49
50
51
52
x=T g
delay vs Ts/d(s)
10 -13
dy = 1.1989
2
48
10 19
x=N sd(s)
10 -14
12
0.5
8
delay vs Ts/d(d)
dy = 3.2589
11
x=N ch
delay vs Nsd(s)
10 19
1
0.8
1
21
1
1.5
1
0.5
0
0.6
20.5
1.5
12
0.5
x=T hk
20
dy = -4.6829
2
1.5
0
2
10
x=N sd(d)
2
0.5
1.8
9
10 -12
1
1.5
0
6
1.5
0
10 -12
10 -14
10 -13
0
19.5
0.5
10 18
10 -15
dy = -1.713
2
y=delay
2
5.5
delay vs Nch
0.5
19
delay vs Nsd(d)
10.5
dy = 1.8004
2
x=Lsp(s)
1
delay vs Tox
10 -12
10 -14
dy = 3.5765
5
x=N halo(s)
delay vs Thk
10 -12
4.5
10 -12
10 -13
1
0
6
10 18
x=N halo(d)
delay vs Lsp(s)
1.5
21
0.5
y=delay
4.5
20.5
1.5
0
4
20
dy = -7.4103
2
10
x=Lhaloj(s)
0
19.5
10 -12
1.5
9.5
0.5
19
delay vs Nhalo(s)
10.5
dy = 2.2607
2
1
26
y=delay
y=delay
1
0.5
10 -12
10 -14
x=Lsp(d)
dy = 3.0951
2
1.5
10
x=Lhaloj(d)
delay vs Lsp(d)
x=Lsdj(s)
dy = 4.0869
2
25
0
9.5
0
26
delay vs Nhalo(d)
34
0.5
x=Lsdj(d)
10 -12
32
1.5
0
24.5
30
dy = 3.5057
2
y=delay
1
delay vs Lsdj(s)
1
0.5
x=Lg
dy = -2.7648
2
1.5
24
28
10 -14
1.5
0
26
y=delay
10 -12
10 -14
y=delay
y=delay
41
x=Lcon(s)
dy = -7.5297
1
0
38
delay vs Lsdj(d)
10 -12
2
1.5
delay vs Lhaloj(s)
dy = 7.073
2
0.5
y=delay
39
x=Lcon(d)
y=delay
1
0
38
y=delay
1.5
10 -12
10 -14
y=delay
1
delay vs Lhaloj(d)
dy = 1.1255
2
y=delay
0.5
2
1.5
10 -12
10 -13
y=delay
optimum
-5% bounds
+5% bounds
delay vs Lg
dy = 2.8058
y=delay
y=Fdelay (x)
1
10 -12
10 -15
y=delay
1.5
delay vs Lcon(s)
dy = -4.4003
2
y=delay
y=delay
2
10 -12
10 -14
dy = 1.0306
y=delay
delay vs Lcon(d)
10 -12
0
9.5
10
x=T ox
10.5
9.5
10
x=T s/d(d)
10.5
x=T s/d(s)
FIGURE 14: Delay transition for varying manufacturing parameter around the optimal value of 5% limiter and asymmetric
source/drain structure
0
1
0
40
41
42
0
38
39
power vs L sdj(d)
10 -4
10 -06
y=power
2
1
24.5
power vs L sdj(s)
25
10 -4
25.5
1
26
24.5
25
25.5
26
10
0
10
5
5.5
x=N halo(d)
10 -4
2
1
4
4.5
10 -4
10 -05
1
0
5.5
2
x=T hk
2.2
-4
8
10
11
10 -4
0.7
x=T ox
19.5
10
0.75
0.8
20
20.5
power vs N ch
dy = -1.2461
2
1
21
-4
8
power vs N sd(s)
10
1
9
10
11
12
10 17
power vs T g
dy = 1.521
10 -06
49
51
2
1
0
8
10 19
9
10
11
12
10 19
x=N sd(s)
10 -4
10 -06
1
-4
10 -05
2
12
2
10 -05
dy = -2.063
x=N ch
dy = 7.1839
power vs T s/d(d)
10.5
0
19
10 -07
0
0.65
10 -4
10 -05
1
power vs N sd(d)
9
10
x=Lsp(s)
x=N sd(d)
10 -05
1
0.6
power vs L sp(s)
0
6
2
9.5
x=Lhaloj(s)
2
21
1
10 18
0
1.8
20.5
2
power vs T ox
dy = -1.7213
y=power
2
5
x=N halo(s)
10.5
dy = -3.9581
0
6
10 18
power vs T hk
dy = -1.4148
20
dy = 2.5266
y=power
4.5
19.5
10 -05
0
4
10 -4
x=Lsp(d)
dy = -4.2408
y=power
1
10
0
19
1
0
9.5
10 -07
1
10 -06
2
x=Lhaloj(d)
2
-4 power vs N halo(s)
10 -06
2
34
power vs L sp(d)
x=Lsdj(s)
dy = 1.5459
32
0
24
-4 power vs N halo(d)
30
dy = -7.1222
y=power
10
28
10 -05
2
x=Lsdj(d)
1
0
26
0
24
y=power
42
power vs L haloj(s)
dy = -8.4648
2
x=Lg
dy = 5.0168
0
y=power
41
y=power
dy = 1.6981
y=power
40
x=Lcon(s)
y=power
x=Lcon(d)
10 -4
1
y=power
39
2
10 -4
10 -07
dy = 8.5549
48
50
52
x=T g
power vs T s/d(s)
dy = -1.8421
y=power
38
power vs L haloj(d)
y=power
2
10 -4
10 -05
y=power
optimum
-5% bounds
+5% bounds
power vs L g
dy = 1.0096
y=power
y=Fpower (x)
10 -4
10 -06
y=power
2
1
power vs L con(s)
dy = 1.0203
y=power
y=power
10 -4
10 -06
dy = -1.1441
y=power
power vs L con(d)
10 -4
10 -05
2
1
0
9.5
10
x=T s/d(d)
10.5
9.5
10
10.5
x=T s/d(s)
FIGURE 15: Power transition for varying manufacturing parameter around the optimal value of 5% limiter and asymmetric
source/drain structure
12
VOLUME 4, 2016
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI
10.1109/ACCESS.2020.3019933, IEEE Access
Choi et al.: Neural Approach for Modeling and Optimizing Si-MOSFET Manufacturing
TABLE 6: The optimized manufacturing parameters from 40
repeated optimization processes with the full range limiter
(Avg: averaged value, Std: standard deviation, Std / Avg:
relative deviation): all values were measured from 10 trials
of 40 repeated optimization processes.
Parameter
Lcon(d)
Lcon(s)
Lg
Lhaloj(d)
Lhaloj(s)
Lsdj(d)
Lsdj(s)
Lsp(d)
Lsp(s)
Nch
Nhalo(d)
Nhalo(s)
Nsd(d)
Nsd(s)
Tg
Thk
Tox
Ts/d(d)
Ts/d(s)
Avg
Std
42
4.5363 ×10−7
38
8.1926 ×10−7
25.5
0
9.5
1.3720 ×10−8
10.5
4.2806 ×10−8
26.2
4.2387 ×10−8
23.8
1.5486 ×10−12
19
6.6142 ×10−8
21
4.0887 ×10−11
1.2500 ×1018 2.4293 ×1013
3.7500 ×1018 2.8333 ×107
6.2500 ×1018 1.1724 ×1013
1.2500 ×1020 2.5697 ×106
0.7500 ×1020 1.9691 ×105
47.5
8.0618 ×10−7
2.3
4.5714 ×10−19
0.805
0
9.5
2.8810 ×10−8
10.5
1.6864 ×10−8
Std / Avg
1.0801 ×10−5
2.1559 ×10−5
0
1.4442 ×10−6
4.0768 ×10−6
1.6147 ×10−6
6.5204 ×10−11
3.4812 ×10−6
1.9470 ×10−9
1.9435 ×10−5
7.5556 ×10−12
1.8759 ×10−6
2.0558 ×10−14
2.6255 ×10−15
1.6972 ×10−5
1.9876 ×10−16
0
3.0326 ×10−6
1.6061 ×10−6
table 7.
E. COMPARISON BETWEEN HUMAN EXPERT AND OUR
NEURAL APPROACH
For a fair comparison of optimization performances between
our neural optimization method and human expert, we set
the same search ranges of manufacturing parameters, i.e., 5%
limiter (sec.III-C) and full range limiter (from Min to Max of
table 1), and used the conventional source/drain symmetric
structure [8] for both methods.
In case of our neural approach, we performed additional
automatic parameter optimization with our trained models,
previously mentioned in sec.III-B as described in sec.III-C
but with source/drain symmetric constraint, i.e., using a single parameter for each pair of source/drain parameters which
has a symbol with (s/d) in table 1.
In case of human expert, an experienced human expert optimized the PDP of the conventional MOSFET as described
here: to minimize the PDP, gate capacitance Cgg should
be minimized according to the PDP, product of delay and
power. Gate length was first tuned to reduce the Cgg . Then,
equivalent oxide thickness (EOT), directly related to Tox and
Thk , was controlled. In this process, it was revealed that the
EOT does not affect the SCEs of the device in this work,
so a large EOT was set to reduce the Cgg . Both Tg and Tsd
were decreased to minimum in order to reduce the fringing
capacitance between gate stack and S/D regions. Lcon was
also decreased to minimum in order to reduce the junction capacitance. Doping-related parameters, Nch , Nhalo , and Nsd ,
were decreased to reduce the overlap and intrinsic channel
capacitances. Finally, Lsp was set to the middle value due to
the trade-off between overlap and fringing capacitances.
1) Comparison in figure-of-Merits: PDP, delay, and power
Table 7 and fig.16 present and compare figure-of-merits of
transistors optimized by our neural approaches and human
expert. In case of 5% limiter, both human expert and our
neural method (Neural sym) achieved about 7% less PDP and
delay and similar power compared to those resulted from the
reference structure. In case of full range limiter, both methods
achieved about 20% less PDP and power and slightly less
delay than those resulted from the reference structure.
When comparing results from human expert and our
method (Neural sym), both showed similar delays, and powers; while PDPs were also similar, our method resulted in
a slightly higher PDP compared to that resulted by human
expert. . Our method has 1 to 2% error in modeling (fig.9 in
sec.III-B) and optimization (fig.10 in sec.III-C). And, due to
the linear correlation of low complexity (fig.13, fig.14, and
fig.15), human expert easily found the near best manufacturing parameters. This resulted in a lower PDP in Neural sym
but the difference was not significant, coming to less than
1.5%.
We presented optimized manufacturing parameters by our
neural approach in table 8 and by human expert in table
9. Lcond(s/d) , Lsdj(s/d) , Lsp(s/d) , and Tg of two methods
have different values. These differences are resulted from
the different optimization objectives of two methods, where
our neural method selects parameter values to obtain lower
PDP as described in sec.II-C while human expert minimizes
Cgg only for lower PDP according to the PDP equation as
described in sec.III-E. Nonetheless, these parameters with
different values are not significant in minimizing PDP because PDP is almost insensitive to these parameters based on
the sensitivity analysis in symmetric source/drain structure
(table 10). The other parameters have the same or similar
values in both methods. These results demonstrate that our
neural approach achieved a comparable optimization result
to the equation-based minimization of human expert without
knowing any equation through the entire process.
In the case of full range limiter, as the search range
TABLE 7: TCAD simulated values of PDP, delay, and power
for the optimized manufacturing parameters: ’Neural’ represents the optimization with our neural models and asymmetric source/drain structure, ’Neural sym’ with our neural
models and symmetric source/drain structure, ’Human expert’ with human optimization and symmetric source/drain
structure, and ’Reference structure’ shows the outputs of
TCAD simulation with the reference parameter values [8] in
table 1.
Optimization method
PDP×10−15 Delay×10−11 Power×10−3
Neural (5% limiter)
0.3120
0.1699
0.1836
Neural sym (5% limiter)
0.3535
0.1409
0.2509
Human expert (5% limiter)
0.3487
0.1435
0.2430
Neural (full range)
0.2661
0.2272
0.1171
Neural sym (full range)
0.3014
0.1490
0.2023
Human expert (full range)
0.2965
0.1445
0.2052
Reference structure [8]
0.3749
0.1516
0.2473
13
VOLUME 4, 2016
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI
10.1109/ACCESS.2020.3019933, IEEE Access
Relative Delay
Relative PDP
Choi et al.: Neural Approach for Modeling and Optimizing Si-MOSFET Manufacturing
1
0.9
0.8
0.7
0.6
0.5
0.4
0.3
0.2
0.1
0
1.5
1.4
1.3
1.2
1.1
1
0.9
0.8
0.7
0.6
0.5
0.4
0.3
0.2
0.1
0
PDP in symmetric structure. In addition, there is higher
chances to find better parameters to obtain a lower PDP
in asymmetric structure since asymmetric structure has 19
degree-of-freedom (DOF), which has seven more variables
than symmetric structure with 12 DOF.
Comparison of figure-of-merit in pdp
Comparison of figure-of-merit in delay
Neural (5% limiter)
Neural sym (5% limiter)
Human expert (5% limiter)
Neural (full range)
Neural sym (full range)
Human expert (full range)
Reference structure
Relative Power
Comparison of figure-of-merit in power
1
0.9
0.8
0.7
0.6
0.5
0.4
0.3
0.2
0.1
0
FIGURE 16: Comparison of methods in figure-of-merits:
PDP, delay, and power in table 7 were normalized so that the
values of reference structure become 1.0 for easy comparison.
TABLE 8: Optimized manufacturing parameters by our neural approach under the constraint of symmetric source/drain
(Neural sym)
Parameter Value (5% limiter) Value (full range)
Lcon(s/d)
42
42
Lg
28.5
25.5
Lhaloj(s/d)
9.5
9.5
Lsdj(s/d)
26.2
26.2
Lsp(s/d)
19
19
Nch
0.9500 ×1018
0.7500 ×1018
Nhalo(s/d)
4.7501 ×1018
3.7500 ×1018
Nsd(s/d)
0.9501 ×1020
0.7500 ×1020
Tg
52.5
52.5
Thk
2.1
2.3
Tox
0.735
0.805
Ts/d(s/d)
9.5
9.5
increased from 5% of reference parameter values to the full
range in table 1, the parameters positively correlated with
PDP (table 10), i.e., Nch , Nhalo(s/d) , and Nsd(s/d) , were
decreased and the parameters negatively correlated with PDP
(table 10), i.e., Thk and Tox , were increased in order to get
minimal PDP in both methods (table 8 and table 9).
2) Comparison of symmetric/asymmetric structures
We find that our neural method with asymmetric structure
(Neural) optimized PDP much less than with symmetric
structure (Neural sym) as shown in table 7 and fig.16. As
mentioned in sec.III-C, two parameters in a source/drain pair
are oppositely correlated to PDP and this resulted in a higher
3) Comparison in optimization speed
Table 11 shows the elapsed time to get optimal manuscript
parameters with our neural method and human expert. Our
neural method consumed 80 seconds for training two neural
models described in sec.III-B and another 80 seconds for automatically optimizing manufacturing parameters described
in sec.III-C. Once the models are trained, then our method
only required 80 seconds for additional optimization. In
contrast, human expert consumed 3 to 4 hours to obtain
optimal manufacturing parameters because of a lot of trial &
error tuning with TCAD simulations. Although these speeds
are not the most optimized for both methods, we can roughly
conclude that our neural method is about 100 times faster (2
to 3 minutes) than that of human expert (180 to 240 minutes).
Our neural approach requires a thousand of training samples generated by TCAD simulations. However, this data
generation is off-line process which can be done by random
sampling without delicate grid point control and in parallel
utilizing multi-cores within several sampling time. Because
of this, data acquisition time is not usually significantly considered in machine-learning-related works. Similarly, there
are several papers about neural approach for the optimization
of devices and circuits which do not consider the data acquisition time either [16], [35], [36]. Even when considering
data acquisition time, our neural method requires only several
TCAD simulation time for generating whole training samples
through parallel computing while human expert needs lots of
sequential simulation time because of a bundle of sequential
trial and error tuning processes. In our experiment, TCAD
simulation time for a data sample required 1 minute and we
used 3 parallel simulations only because of license limitation.
Although this resulted in 333 minutes (about 5.5 hours) to
obtain 1000 training samples, if we use more number of
parallel simulations, data acquisition time can be reduced
TABLE 9: Optimized manufacturing parameters by human expert with conventional MOSFET of symmetric
source/drain
Parameter Value (5% limiter) Value (full range)
Lcon(s/d)
38
40
Lg
28.5
25.5
Lhaloj(s/d)
9.5
9.5
Lsdj(s/d)
23.75
23.8
Lsp(s/d)
19
20
Nch
0.9500 ×1018
0.7500 ×1018
Nhalo(s/d)
4.7500 ×1018
3.7500 ×1018
Nsd(s/d)
0.9500 ×1020
0.7500 ×1020
Tg
47.5
47.5
Thk
2.1
2.3
Tox
0.735
0.805
Ts/d(s/d)
9.5
9.5
14
VOLUME 4, 2016
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI
10.1109/ACCESS.2020.3019933, IEEE Access
Choi et al.: Neural Approach for Modeling and Optimizing Si-MOSFET Manufacturing
TABLE 10: Parameters listed in descending order of sensitivity (dy) of symmetric source/drain structure: values with ’-’
sign are negatively correlated with PDP. (See appendix.A for
the corresponding transitions of figure-of-merits)
Parameter PDP sensitivity (dy)
Lg
9.6238 ×10−17
Tox
-3.7973 ×10−17
Thk
-2.6635 ×10−17
Nch
8.5829 ×10−18
Nsd(s/d)
5.4069 ×10−18
Lsdj(s/d) -5.3249 ×10−18
Parameter PDP sensitivity (dy)
Lcon(s/d)
-4.8686 ×10−18
Lsp(s/d)
4.7325 ×10−18
Tg
-4.5583 ×10−18
Lhaloj(s/d)
4.0593 ×10−18
Nhalo(s/d)
2.9812 ×10−18
Ts/d(s/d)
1.4272 ×10−18
down to 1 min (when 1000 parallel simulations are used),
a single data sample time.
IV. CONCLUSION
In this paper, we proposed a new framework to automatically
optimize manufacturing parameters of transistor by using
neural networks. As the first step, our method trained two
neural networks for delay and power models of a transistor
utilizing a training data set from TCAD simulation. Then, an
optimal set of manufacturing parameters that minimizes PDP
was found automatically by using the neural network models
and gradient descent method in desired ranges of parameters.
We experimentally proved the superiority and efficiency
of our neural approach compared to the previous Intel 32
nm node [8] and human expert’s MOSFET optimization
in symmetric source/drain structure. Figure-of-merits of our
method were significantly better than the TCAD simulated
values with the reference structure [8], i.e., about 7% less
PDP with 5% limiter and about 20% less PDP with full
range limiter. Additionally, Figure-of-merits of our method
were comparable to human expert’s optimization resulted via
time-consuming trial & error with TCAD simulations, while
our method only took a couple of minutes to do the same
optimization.
Moreover, it was possible to analyze the effect of each
manufacturing parameter on figure-of-merits when our neural models were utilized. Without a huge amount of TCAD
simulations, our neural models obtained sensitivities of parameters to PDP, delay, and power by predicting these figureof-merits for varying parameters. Then, we categorized and
gave certain weights to manufacturing parameters regarding
their sensitivities to PDP, delay, and power. For example, Lg ,
Tox , Thk are highly correlated to PDP but Lcond(s/d) and Tg
are uncorrelated to any of PDP, delay, and power.
Our neural approach is not limited to a specific device and
matured technology but can be expanded to general devices
and new technology as long as a number of input/output data
are given because machine learning itself does not use any
device physics considering device structure and scale.
Even though we did not perform real device experiments
but focused on simulated results as done in the machinelearning-based circuit designing method [16] which includes
only simulated results without implementing real circuit
device, based on our experience, the simulation results are
rather correct and well matched to the real devices.
There would be concerning about gathering training data
from real factory. In real factory, for variation in unit manufacturing process, they cannot examine all wafer but some
dies as representative samples. Our work is focusing on
optimizing variation in unit manufacturing process with the
varying range from min to max in table 1 and, in the same
manner, we used 1000 data samples for machine learning
which can also be obtained in real factory without heavy expense. For output data, it would be burdensome if we measure
full Ids -Vgs and Cgg -Vgs data through gate voltage sweep,
but several Ids and Cgg values at some specific gate voltages
can be obtained in real factory with no heavy expense. For
input data, process parameters can be obtained from the
devices at the scribe line. Randomized structural parameters
in Table I are the outcomes of the process parameters such as
dose, temperature, and gas flow, but the accuracy of machine
learning is not affected by the type of inputs. If there are some
missing data, we can adopt generative adversarial network
(GAN) as a realistic data generator to complete them.
Our experiments demonstrated that PDP of transistor can
be much lower to about 29% less than that of the reference
structure, if asymmetric source/drain structure is adopted.
In addition, PDP can be further improved with even wider
limiter beyond Min and Max shown in table 1 because
linear transitions of figure-of-merits for full range limiter
exhibit monotonic functions (see figures in appendix.B and
C). Since our neural approach is not limited to a specific
figure-of-merit or scale of device, we can apply our method
to a smaller device for better figure-of-merits and to control
each figure-of-merit, such as lowering down the relatively
high delay in asymmetric structure in table 7 by defining
an appropriate objective function alternative to eq.3. There
is a need to further analyze asymmetric structure and wider
limiter for real application because they are not compatible
with the previous 32 nm node with symmetric structure,
and subsequently, there is an additional fabrication cost for
asymmetric structure. This is remained as our future work.
REFERENCES
[1] A. Khakifirooz, O. M. Nayfeh, and D. Antoniadis, "A Simple Semiempirical Short-Channel MOSFET Current–Voltage Model Continuous Across
All Regions of Operation and Employing Only Physical Parameters,"
IEEE Trans. Electron Devices, vol. 56, no. 8, pp. 1674-1680, Aug. 2009,
doi: 10.1109/TED.2009.2024022.
[2] BSIM Models: http://bsim.berkeley.edu/models/
[3] Predictive Technology Model: http://ptm.asu.edu/
[4] C. M. Bishop, Neural Networks for Pattern Recognition. Oxford University Press, 1995.
[5] J. Schmidhuber, "Deep Learning in Neural Networks: An Overview,"
Neural Networks, vol. 61, pp. 85–117, 2015.
[6] A. Ceyhan, J. Quijas, S. Jain, H.-Y Liu, W. E. Gifford, and S. Chakravarty,
"Machine Learning-enhanced Multi-dimensional Co-Optimization of Sub10nm Technology Node Options," 2019 IEEE International Electron
Devices Meeting (IEDM), San Francisco, CA, USA, 2019, pp. 36.6.136.6.4, doi: 10.1109/IEDM19573.2019.8993557.
[7] Synopsys Inc., Mountain View, CA, Version N-2017.09, 2017.
[8] S. Natarajan, M. Armstrong, M. Bost, R. Brain, M. Brazier, C-H Chang,
V. Chikarmane, M. Childs, H. Deshpande,K. Dev, G. Ding, T. Ghani, O.
Golonzka, W. Han, J. He*, R. Heussner, R. James, I. Jin, C. Kenyon,
S. Klopcic, S-H. Lee, M. Liu, S. Lodha, B. McFadden, A. Murthy, L.
15
VOLUME 4, 2016
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI
10.1109/ACCESS.2020.3019933, IEEE Access
Choi et al.: Neural Approach for Modeling and Optimizing Si-MOSFET Manufacturing
TABLE 11: Timing comparison of our neural approach and human expert: timing of our method (Neural or Neural sym) was
measured with MATLAB script (not optimized) and timing of human expert was roughly measured in hour. N represents the
number of parallel processes used in training data generation.
Optimization method
Neural or Neural sym
Human expert
Time consumption
1000
× 1 min (data acquisition) + 80 s (model training) + 80 s (optimizing) = 1000
+ 2.7 min
N
N
Neiberg, J. Neirynck, P. Packan, S. Pae, C. Parker, C. Pelto, L. Pipes,J.
Sebastian, J. Seiple, B. Sell, S. Sivakumar, B. Song, K. Tone, T. Troeger,
C. Weber, M. Yang, A. Yeoh, and K. Zhang "A 32 nm logic technology
featuring 2nd-generation high-k+ metal-gate transistors, enhanced channel
strain and 0.171µm2 SRAM cell size in a 291Mb array," 2008 IEEE
International Electron Devices Meeting, San Francisco, CA, 2008, pp. 1-3,
doi: 10.1109/IEDM.2008.4796777.
[9] P. Werbos, "Beyond Regression: New Tools for Prediction and Analysis in
the Behavioral Sciences,", PhD Thesis, Harvard University Press, 1975.
[10] K. Hornik, M. Stinchcombe, and H. White, "Multilayer feedforward
networks are universal approximators," Neural Networks, vol. 2, no. 5, pp.
359-366, 1989.
[11] K. Funahashi, "On the approximate realization of continuous mappings by
neural networks," Neural Networks, vol. 2, no. 3, pp. 183-192, 1989.
[12] Yoshimura, Matsuoka, and Kakumu, "New CMOS shallow junction well
FET structure (CMOS-SJET) for low power-supply voltage," 1992 International Technical Digest on Electron Devices Meeting, San Francisco,
CA, USA, 1992, pp. 909-912, doi: 10.1109/IEDM.1992.307504.
[13] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, "Learning representations by back-propagating errors," Nature, vol. 323, pp. 533–536, 1986.
[14] G. Arfken, "The Method of Steepest Descents," in Mathematical Methods
for Physicists, 3rd edition. Orlando, FL, Academic Press, pp. 428-436,
1985.
[15] W. T. Vetterling, S. A. Teukolsky, W. H. Press, and B. P. Flannery,
Numerical Recipes in FORTRAN: the art of scientific computing, 2nd
edition. Cambridge, U.K.; Cambridge University Press, pp. 414, 1992.
[16] G. Zhang, H. He, and D. Katabi, “Circuit-GNN: Graph Neural Networks
for Distributed Circuit Design,” in International Conference on Machine
Learning, pp. 7364-7373. 2019.
[17] D. H. von Seggern, CRC Standard Curves and Surfaces With Mathematica,
2nd ed. London, U.K.: Chapman & Hall, 2006.
[18] W. T. Vetterling, S. A. Teukolsky, W. H. Press, and B. P. Flannery,
Numerical Recipes in FORTRAN: the art of scientific computing, 2nd
edition. Cambridge, U.K.; Cambridge University Press, pp. 180-184, 1992.
[19] D. P. Darcy and C. F. Kemerer, "The international technology roadmap for
semiconductors," 2009.
[20] C. Canali, G. Majni, R. Minder, and G. Ottaviani, "Electron and hole drift
velocity measurements in silicon and their empirical relation to electric
field and temperature," in IEEE Transactions on Electron Devices, vol. 22,
no. 11, pp. 1045-1047, Nov. 1975, doi: 10.1109/T-ED.1975.18267.
[21] M. H. Na, E. J. Nowak, W. Haensch, and J. Cai, "The effective
drive current in CMOS inverters," Digest. International Electron Devices Meeting, San Francisco, CA, USA, 2002, pp. 121-124, doi:
10.1109/IEDM.2002.1175793.
[22] G. Masetti, M. Severi, and S. Solmi, "Modeling of carrier mobility against
carrier concentration in arsenic-, phosphorus-, and boron-doped silicon,"
in IEEE Transactions on Electron Devices, vol. 30, no. 7, pp. 764-769,
July 1983, doi: 10.1109/T-ED.1983.21207.
[23] C. Lombardi, S. Manzini, A. Saporito, and M. Vanzi, "A physically
based mobility model for numerical simulation of nonplanar devices," in
IEEE Transactions on Computer-Aided Design of Integrated Circuits and
Systems, vol. 7, no. 11, pp. 1164-1171, Nov. 1988, doi: 10.1109/43.9186.
[24] J. G. Fossum, "Computer-aided numerical analysis of silicon solar cells,"
Solid-State Electron., vol. 19, pp. 269-277, 1976.
[25] L. Huldt, N. G. Nilsson, and K. G. Svantesson, “The temperature dependence of band-to-band Auger recombination in silicon,” Appl.Phys. Lett.,
vol. 35, no. 10, pp. 776–777, 1979.
[26] G. A. M. Hurkx, D. B. M. Klaassen, and M. P. G. Knuvers, "A new
recombination model for device simulation including tunneling," in IEEE
Transactions on Electron Devices, vol. 39, no. 2, pp. 331-338, Feb. 1992,
doi: 10.1109/16.121690.
[27] K. Kuźniar and M. Zajac,
˛ "Some methods of pre-processing input data
for neural networks," Computer Assisted Methods in Engineering and
Science, vol. 22, no. 2, pp. 141–151, jan. 2017.
3 to 4 hours
[28] J.-S. Yoon, T. Rim, J. Kim, M. Meyyappan, C.-K. Baek, and Y.-H. Jeong,
"Vertical gate-all-around junctionless nanowire transistors with asymmetric diameters and underlap lengths," Appl. Phys. Lett., vol. 105, no. 10, p.
102105, 2014.
[29] G. C. Patil and S. Qureshi, "Asymmetric drain underlap dopant-segregated
Schottky barrier ultrathin-body SOI MOSFET for low-power mixed-signal
circuits," Semicond. Sci. Technol., vol. 28, no. 4, pp. 045002-1-045002-9,
Feb. 2013.
[30] A. Goel, S. Gupta, A. Bansal, M. Chiang, and K. Roy, "Double-gate
MOSFETs with aymmetric drain underlap: A device-circuit co-design and
optimization perspective for SRAM," 2009 Device Research Conference,
University Park, PA, 2009, pp. 57-58, doi: 10.1109/DRC.2009.5354884.
[31] R. Pal, M. Togo, Y. Yong, L. Vanamurthy, S. Muralidharan, X. Zhang,
R. Carter, M. Eller, and S. Samavedam, "Variation improvement for
manufacturable FINFET technology," 2015 IEEE International Electron
Devices Meeting (IEDM), Washington, DC, 2015, pp. 21.2.1-21.2.4, doi:
10.1109/IEDM.2015.7409748.
[32] J. K. Lorenz, A. Asenov, E. Baer, S. Barraud, F. Kluepfel, C. Millar, and M.
Nedjalkov, “Process variability for devices at and beyond the 7 nm node,”
ECS J. Solid State Sci. Technol., vol. 7, no. 7, pp. 595–601, Jan. 2018, doi:
10.1149/2.0051811jss.
[33] J. Zhuge, R. Wang, R. Huang, J. Zou, X. Huang, D.-W. Kim, D. Park,
X. Zhang, and Y. Wang, "Experimental investigation and design optimization guidelines of characteristic variability in silicon nanowire CMOS
technology," 2009 IEEE International Electron Devices Meeting (IEDM),
Baltimore, MD, 2009, pp. 1-4, doi: 10.1109/IEDM.2009.5424421.
[34] S. Mudanai, "Overcoming variation challenge," IEDM Short Course, Dec.
2018.
[35] F. Feng, V. Gongal-Reddy, C. Zhang, J. Ma, and Q. Zhang, "Parametric
Modeling of Microwave Components Using Adjoint Neural Networks and
Pole-Residue Transfer Functions With EM Sensitivity Analysis," in IEEE
Transactions on Microwave Theory and Techniques, vol. 65, no. 6, pp.
1955-1975, June 2017, doi: 10.1109/TMTT.2017.2650904.
[36] C. Zhang, J. Jin, W. Na, Q. Zhang, and M. Yu, "Multivalued Neural
Network Inverse Modeling and Applications to Microwave Filters," in
IEEE Transactions on Microwave Theory and Techniques, vol. 66, no. 8,
pp. 3781-3797, Aug. 2018, doi: 10.1109/TMTT.2018.2841889.
HYUN-CHUL CHOI received the B.S. degree in
Electronic and Electrical engineering from Korea Advanced Institute of Science and Technology (KAIST), Korea, in 2002 and the M.S. and
the Ph.D. degrees in Electronic and Electrical
engineering from Pohang University of Science
and Technology (POSTECH), Korea, in 2005 and
2011 respectively. From 2011 to 2014, he was a
Research Scientist with Multimedia Development
Laboratory of Daum Communications. He had
been an Assistant Professor with the Electronic Engineering Department,
Yeungnam University, Korea from 2014 to 2020 and has been an Associate
Professor from 2020. His research interests include multimedia retrieval,
computational photography, 3D photogrammetry, object tracking and recognition based on machine learning and neural networks.
16
VOLUME 4, 2016
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI
10.1109/ACCESS.2020.3019933, IEEE Access
Choi et al.: Neural Approach for Modeling and Optimizing Si-MOSFET Manufacturing
HYEOK YUN received the B.S. degree in electrical engineering from Pohang University of Science and Technology (POSTECH), Republic of
Korea, in 2019. He is currently pursuing the M.S.
degree in electrical engineering from POSTECH.
His research interests include variability of multigate field-effect transistors (FinFETs, nanowire
FETs, and nanosheet FETs) and machine learning.
JUN-SIK YOON (S’14-M’16) received the B.S.
degree in electrical engineering and Ph.D. degree
in creative IT engineering from Pohang University
of Science and Technology (POSTECH), Republic
of Korea, in 2012 and 2016, respectively. He was a
postdoctoral research fellow in POSTECH (20162018). From 2019, he is a research assistant professor in electrical engineering from POSTECH.
His research interests include characterization and
simulation of advanced nanoscale devices (fin,
gate-all-around, tunneling, nanosheet FETs) and applications (chemical
sensor, solar cell).
ROCK-HYUN BAEK (S’08-M’11) received the
B.S. degree in electrical engineering from Korea
University, and the M.S. and Ph.D. degrees in
electrical engineering from the Pohang University
of Science and Technology (POSTECH), Republic
of Korea, in 2004, 2006, and 2011, respectively.
He was a Postdoctoral Researcher and Technical Engineer with SEMATECH in Albany, NY
(2011–2015). He was a Senior Device Engineer
with SAMSUNG Research and Development Center (Pathfinding TEAM), Korea (2015–2017). Since 2017, he has been
an Assistant Professor in electrical engineering with POSTECH, Pohang,
Korea. His research interests include technology benchmark by characterization, simulation, and modeling of advanced devices and materials (fin,
gateall-around, nanosheet FETs, 3D-NAND, 3DIC, SiGe, Ge, III-V, etc).
17
VOLUME 4, 2016
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI
10.1109/ACCESS.2020.3019933, IEEE Access
Choi et al.: Neural Approach for Modeling and Optimizing Si-MOSFET Manufacturing
APPENDIX. TRANSITIONS OF FIGURE-OF-MERITS
A. SYMMETRIC STRUCTURE AND 5% LIMITER
1
0
1
0
39
40
41
42
0
26
28
30
x=Lcon(s/d)
10
2
10.5
pdp vs N halo(s/d)
10 -16
10 -18
3
2
10 -16
3
2
1
0
0
0
0
11
12
10 -16
10 -17
dy = -3.7973
y=pdp
2
5
5.5
1
6
8
9
10 21
10
25.5
26
19
19.5
11
12
pdp vs Tg
20.5
21
pdp vs Thk
10 -16
dy = -4.5583
10 -18
10 -17
dy = -2.6635
4
3
2
1
0
48
49
10 22
x=N sd(s/d)
20
x=Lsp(s/d)
50
51
52
1.8
2
x=T g
2.2
10 -3
x=T hk
pdp vs Ts/d(s/d)
10 -18
dy = 1.4272
4
3
4.5
x=N halo(s/d)
pdp vs Tox
10 -16
4
4
10 20
25
2
1
10
24.5
3
1
x=N ch
0
4
1
9
2
x=Lsdj(s/d)
10 -18
10 -18
3
1
24
pdp vs N sd(s/d)
dy = 5.4069
4
pdp vs Lsp(s/d)
dy = 4.7325
4
0
9.5
y=pdp
3
2
x=Lhaloj(s/d)
dy = 2.9812
4
y=pdp
y=pdp
10 -16
10 -18
dy = 8.5829
8
y=pdp
34
x=Lg
pdp vs N ch
10 -16
4
32
3
1
y=pdp
38
2
10 -16
10 -18
y=pdp
2
4
3
pdp vs Lsdj(s/d)
dy = -5.3249
y=pdp
optimum
-5% bounds
+5% bounds
1
10 -16
10 -18
dy = 4.0593
4
3
pdp vs Lhaloj(s/d)
y=pdp
y=pdp
y=FPDP(x)
10 -16
10 -17
dy = 9.6238
4
3
2
pdp vs Lg
10 -16
10 -18
dy = -4.8686
4
y=pdp
pdp vs Lcon(s/d)
y=pdp
10 -16
3
2
1
0
0
6
6.5
7
7.5
8
9.5
10
10 -4
x=T ox
10.5
10 -3
x=T s/d(s/d)
FIGURE 17: PDP transition for varying manufacturing parameter around the optimal value of 5% limiter and symmetric
source/drain structure
optimum
-5% bounds
+5% bounds
38
39
40
0
41
42
28
30
10 -12
10 -14
1
delay vs Nhalo(s/d)
9
10
11
1.5
1
10 -12
10 -12
10 -14
y=delay
1
0.5
5
5.5
6
1.5
1
25
25.5
26
delay vs Tg
dy = -2.6953
10
x=N sd(s/d)
11
19.5
2
10 -12
10 -15
1.5
1
12
10 22
20
20.5
21
delay vs Thk
dy = 2.1652
2
10 -14
1.5
1
0.5
0
9
19
x=Lsp(s/d)
0.5
8
10 21
24.5
0
48
49
50
x=T g
51
52
1.8
2
x=T hk
2.2
10 -3
delay vs Ts/d(s/d)
dy = 9.2618
2
1.5
4.5
x=N halo(s/d)
delay vs Tox
dy = 4.8036
2
4
0
10 -12
10 -14
0
12
1
0.5
24
delay vs Nsd(s/d)
10 -14
1.5
x=Lsdj(s/d)
0.5
10 20
x=N ch
10.5
dy = -2.6213
2
0
8
10
10 -12
10 -14
0.5
0
1
delay vs Lsp(s/d)
dy = 1.4137
2
0
9.5
y=delay
1.5
1.5
x=Lhaloj(s/d)
dy = 4.6125
2
y=delay
y=delay
delay vs Nch
0.5
y=delay
34
x=Lg
dy = -3.0722
2
32
10 -12
10 -14
0.5
0
26
x=Lcon(s/d)
10 -12
1
0.5
delay vs Lsdj(s/d)
dy = -5.0382
2
1.5
y=delay
0
1
0.5
10 -12
10 -15
y=delay
y=Fdelay (x)
dy = 4.13
2
1.5
delay vs Lhaloj(s/d)
y=delay
1
0.5
10 -12
10 -13
y=delay
2
1.5
delay vs Lg
dy = 1.3915
y=delay
10 -12
10 -15
dy = -2.3782
2
y=delay
delay vs Lcon(s/d)
y=delay
10 -12
10 -15
1.5
1
0.5
0
0
6
6.5
7
x=T ox
7.5
8
10 -4
9.5
10
x=T s/d(s/d)
10.5
10 -3
FIGURE 18: Delay transition for varying manufacturing parameter around the optimal value of 5% limiter and symmetric
source/drain structure
18
VOLUME 4, 2016
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI
10.1109/ACCESS.2020.3019933, IEEE Access
Choi et al.: Neural Approach for Modeling and Optimizing Si-MOSFET Manufacturing
y=Fpower (x)
optimum
-5% bounds
+5% bounds
0
39
1
41
42
28
30
32
34
1
0
9.5
10
10.5
10 -07
dy = 9.0688
2
1
0
24
24.5
25
25.5
26
19
19.5
20
20.5
x=Lsdj(s/d)
x=Lsp(s/d)
power vs N ch
-4 power vs N halo(s/d)
-4 power vs N sd(s/d)
power vs T g
power vs T hk
y=power
10
10 -06
2
1
0
9
10
11
4
10 20
dy = -3.6069
4.5
5
5.5
10
10 -05
y=power
1
0
2
dy = -2.7868
1
10 -4
10 -06
2
1
0
8
9
10
x=N sd(s/d)
11
12
10 22
dy = -2.3005
21
10 -05
2
1
0
48
49
50
x=T g
51
52
1.8
2
x=T hk
2.2
10 -3
-4 power vs T s/d(s/d)
dy = -5.9577
2
6
10 21
x=N halo(s/d)
power vs T ox
10 -4
10 -06
0
12
x=N ch
dy = 8.2887
y=power
dy = -5.8396
y=power
10
10 -05
0
10
1
2
x=Lhaloj(s/d)
1
-4
2
power vs L sp(s/d)
10 -4
10 -06
dy = 4.8184
0
26
power vs L sdj(s/d)
x=Lg
2
8
10 -4
10 -06
x=Lcon(s/d)
dy = 1.149
y=power
2
y=power
10 -4
40
power vs L haloj(s/d)
dy = 2.1773
0
38
y=power
10 -4
10 -05
y=power
2
1
power vs L g
dy = 4.3252
y=power
y=power
10 -4
10 -06
dy = -3.0649
y=power
power vs L con(s/d)
y=power
10 -4
10 -07
2
1
0
6
6.5
7
x=T ox
7.5
8
10 -4
9.5
10
x=T s/d(s/d)
10.5
10 -3
FIGURE 19: Power transition for varying manufacturing parameter around the optimal value of 5% limiter and symmetric
source/drain structure
19
VOLUME 4, 2016
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI
10.1109/ACCESS.2020.3019933, IEEE Access
Choi et al.: Neural Approach for Modeling and Optimizing Si-MOSFET Manufacturing
B. ASYMMETRIC STRUCTURE AND FULL RANGE LIMITER
38
39
40
1
1
0
41
42
39
40
10 -16
10 -18
3
2
pdp vs Lsdj(s)
10 -16
10 -17
30
32
34
2
3
2
0
0
0
0
24
24.5
26
19
10 -18
y=pdp
1
10 -16
10 -17
dy = -1.8368
4
2
pdp vs N halo(s)
3
2
1
0
5.5
x=N halo(d)
10 -16
4
4
4.5
10 18
10 -16
10 -17
2
5.5
10 18
2
0
0
2.2
0.6
0.65
0.7
x=T ox
0.75
0.8
2
y=pdp
8
9
10
11
pdp vs N sd(s)
10 -16
10 -17
pdp vs Tg
dy = 2.0643
4
12
10 17
x=N ch
2
10 -18
3
2
1
0
8
10 19
9
10
11
x=N sd(s)
10 -16
10 -18
10 -18
3
21
3
12
12
10 19
48
49
50
51
52
x=T g
pdp vs Ts/d(s)
dy = -8.9216
4
2
0
2
11
3
1
20.5
dy = 3.7011
4
pdp vs Ts/d(d)
dy = 3.8077
4
1
x=T hk
10
x=N sd(d)
10 -16
3
20
0
9
pdp vs N ch
dy = -6.8628
4
1
8
1
1.8
19.5
10 -16
10 -17
10.5
0
19
pdp vs N sd(d)
2
6
10 -17
10 -16
10 -17
x=Lsp(s)
3
pdp vs Tox
dy = -2.1821
4
y=pdp
3
5
x=N halo(s)
pdp vs Thk
dy = -1.3483
21
0
6
y=pdp
5
20.5
1
0
4.5
20
dy = -1.1779
4
10
1
x=Lsp(d)
y=pdp
10 -16
3
4
19.5
y=pdp
pdp vs N halo(d)
dy = 8.363
4
25.5
y=pdp
10 -16
25
x=Lsdj(s)
pdp vs Lsp(s)
2
1
26
9.5
x=Lhaloj(s)
3
1
25.5
10.5
dy = -1.7696
4
1
25
0
10
10 -16
10 -18
1
24.5
2
1
9.5
pdp vs Lsp(d)
10 -18
3
x=Lhaloj(d)
dy = 2.2025
4
3
x=Lsdj(d)
y=pdp
28
x=Lg
dy = 2.0045
4
y=pdp
y=pdp
pdp vs Lsdj(d)
2
pdp vs Lhaloj(s)
dy = -3.2007
4
3
0
26
x=Lcon(s)
dy = -5.8985
24
y=pdp
42
y=pdp
10 -16
41
10 -16
10 -18
1
0
38
x=Lcon(d)
4
2
y=pdp
0
2
4
3
pdp vs Lhaloj(d)
dy = 2.8912
y=pdp
optimum
-5% bounds
+5% bounds
4
3
10 -16
10 -17
dy = 5.833
y=pdp
y=FPDP(x)
1
pdp vs Lg
10 -16
10 -19
y=pdp
2
pdp vs Lcon(s)
dy = 2.617
4
3
y=pdp
y=pdp
4
10 -16
10 -19
dy = -8.9053
y=pdp
pdp vs Lcon(d)
10 -16
10 -18
3
2
1
0
9.5
10
x=T s/d(d)
10.5
9.5
10
10.5
x=T s/d(s)
FIGURE 20: PDP transition for varying manufacturing parameter around the optimal value of full range limiter and asymmetric
source/drain structure
20
VOLUME 4, 2016
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI
10.1109/ACCESS.2020.3019933, IEEE Access
Choi et al.: Neural Approach for Modeling and Optimizing Si-MOSFET Manufacturing
0.5
0
0.5
40
41
42
39
40
42
0.5
10 -12
10 -13
1.5
1
0.5
0
25
25.5
24
24.5
10 -12
10 -14
25.5
10 -13
1
0.5
0
5
5.5
4
1.5
1
0.5
8
10 -12
1
2.2
11
0.65
0.7
0.75
8
9
10
10 -12
10 -13
10 17
delay vs Tg
10 -17
dy = 9.4408
2
1.5
1
0
9
10
11
12
10 -12
49
50
51
52
x=T g
delay vs Ts/d(s)
10 -13
dy = 1.5153
2
48
10 19
x=N sd(s)
10 -14
12
0.5
8
delay vs Ts/d(d)
11
x=N ch
delay vs Nsd(s)
10 19
1
0.8
1
21
1
1.5
1
0.5
0
0.6
20.5
1.5
12
0.5
x=T hk
20
dy = -5.1099
2
1.5
0
2
10
dy = 2.7258
2
0.5
1.8
9
x=N sd(d)
10 -16
1.5
0
6
1.5
0
10 -12
10 -14
10 -13
0
19.5
0.5
10 18
dy = -2.4486
2
y=delay
2
5.5
delay vs Nch
0.5
19
delay vs Nsd(d)
10.5
dy = 1.8355
2
x=Lsp(s)
1
delay vs Tox
10 -12
10 -14
dy = 4.2984
5
x=N halo(s)
delay vs Thk
10 -12
4.5
10 -12
10 -13
1
0
6
10 18
x=N halo(d)
delay vs Lsp(s)
1.5
21
0.5
y=delay
4.5
20.5
1.5
0
4
20
dy = -6.0569
2
10
x=Lhaloj(s)
0
19.5
10 -12
1.5
9.5
0.5
19
delay vs Nhalo(s)
10.5
dy = 3.4838
2
1
26
y=delay
y=delay
1
0.5
10 -12
10 -14
x=Lsp(d)
dy = 3.9259
2
1.5
10
x=Lhaloj(d)
delay vs Lsp(d)
x=Lsdj(s)
dy = 2.9866
2
25
0
9.5
0
26
delay vs Nhalo(d)
34
0.5
x=Lsdj(d)
10 -12
32
1.5
0
24.5
30
dy = 1.1252
2
y=delay
1
delay vs Lsdj(s)
1
0.5
x=Lg
dy = -4.6585
2
1.5
24
28
10 -14
1.5
0
26
y=delay
10 -12
10 -14
y=delay
y=delay
41
x=Lcon(s)
dy = -3.3448
1
0
38
delay vs Lsdj(d)
10 -12
2
1.5
delay vs Lhaloj(s)
dy = 8.399
2
0.5
y=delay
39
x=Lcon(d)
y=delay
1
0
38
y=delay
1.5
10 -12
10 -15
y=delay
1
delay vs Lhaloj(d)
dy = 7.4521
2
y=delay
0.5
2
1.5
10 -12
10 -13
y=delay
optimum
-5% bounds
+5% bounds
delay vs Lg
dy = 3.4514
y=delay
y=Fdelay (x)
1
10 -12
10 -15
y=delay
1.5
delay vs Lcon(s)
dy = -6.849
2
y=delay
y=delay
2
10 -12
10 -14
dy = 1.458
y=delay
delay vs Lcon(d)
10 -12
0
9.5
10
x=T ox
10.5
9.5
10
x=T s/d(d)
10.5
x=T s/d(s)
FIGURE 21: Delay transition for varying manufacturing parameter around the optimal value of full range limiter and
asymmetric source/drain structure
0
1
0
40
41
42
0
38
39
power vs L sdj(d)
10 -4
10 -07
y=power
2
1
24.5
power vs L sdj(s)
25
10 -4
25.5
1
26
24.5
25
25.5
26
10
0
10
5
5.5
2
1
10 -4
4
10 -4
10 -06
1
0
5.5
2
x=T hk
2.2
-4
8
10
11
10 -4
0.7
x=T ox
19.5
10
0.75
0.8
20
20.5
dy = 2.5153
power vs N ch
2
1
21
-4
8
power vs N sd(s)
10
9
10
11
12
10 17
power vs T g
dy = 1.0191
1
10 -06
2
1
0
8
10 19
9
10
11
12
10 19
x=N sd(s)
10 -4
10 -07
48
49
50
51
52
x=T g
power vs T s/d(s)
dy = -1.4579
1
-4
10 -05
2
12
2
10 -05
dy = -1.5816
x=N ch
dy = 6.4157
power vs T s/d(d)
10.5
0
19
10 -06
0
0.65
10 -4
10 -05
1
power vs N sd(d)
9
10
x=Lsp(s)
x=N sd(d)
10 -05
1
0.6
power vs L sp(s)
0
6
2
9.5
x=Lhaloj(s)
2
21
1
10 18
0
1.8
20.5
2
power vs T ox
dy = -1.0816
y=power
2
5
x=N halo(s)
power vs T hk
dy = -9.468
4.5
10.5
dy = -3.5641
0
6
10 18
x=N halo(d)
20
dy = -2.1501
y=power
4.5
19.5
10 -05
0
4
10 -4
x=Lsp(d)
dy = -4.0321
y=power
1
10
0
19
1
0
9.5
10 -07
1
10 -06
2
x=Lhaloj(d)
2
-4 power vs N halo(s)
10 -06
2
34
power vs L sp(d)
x=Lsdj(s)
dy = 2.3271
32
0
24
-4 power vs N halo(d)
30
dy = 4.16
y=power
10
28
10 -05
2
x=Lsdj(d)
1
0
26
0
24
y=power
42
power vs L haloj(s)
dy = -6.9062
2
x=Lg
dy = 4.8424
0
y=power
41
y=power
dy = -9.0664
y=power
40
x=Lcon(s)
y=power
x=Lcon(d)
10 -4
1
y=power
39
2
10 -4
10 -07
dy = 9.8468
y=power
38
power vs L haloj(d)
y=power
2
10 -4
10 -06
y=power
optimum
-5% bounds
+5% bounds
power vs L g
dy = 7.1119
y=power
y=Fpower (x)
10 -4
10 -07
y=power
2
1
power vs L con(s)
dy = 5.4265
y=power
y=power
10 -4
10 -06
dy = -1.3261
y=power
power vs L con(d)
10 -4
10 -05
2
1
0
9.5
10
x=T s/d(d)
10.5
9.5
10
10.5
x=T s/d(s)
FIGURE 22: Power transition for varying manufacturing parameter around the optimal value of full range limiter and
asymmetric source/drain structure
21
VOLUME 4, 2016
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI
10.1109/ACCESS.2020.3019933, IEEE Access
Choi et al.: Neural Approach for Modeling and Optimizing Si-MOSFET Manufacturing
C. SYMMETRIC STRUCTURE AND FULL RANGE LIMITER
1
0
1
0
39
40
41
42
0
26
28
30
x=Lcon(s/d)
10
2
10.5
pdp vs N halo(s/d)
10 -16
10 -18
3
2
10 -16
3
2
1
0
0
0
0
11
12
10 -16
10 -17
dy = -3.3479
y=pdp
2
5
5.5
1
6
8
9
10 21
10
25.5
26
19
19.5
11
pdp vs Tg
12
20.5
21
pdp vs Thk
10 -16
10 -18
dy = -4.9389
10 -17
dy = -2.3419
4
3
2
1
0
48
49
10 22
x=N sd(s/d)
20
x=Lsp(s/d)
50
51
52
1.8
2
x=T g
2.2
10 -3
x=T hk
pdp vs Ts/d(s/d)
10 -18
dy = 3.1733
4
3
4.5
x=N halo(s/d)
pdp vs Tox
10 -16
4
4
10 20
25
2
1
10
24.5
3
1
x=N ch
0
4
1
9
2
x=Lsdj(s/d)
10 -17
10 -18
3
1
24
pdp vs N sd(s/d)
dy = 1.1964
4
pdp vs Lsp(s/d)
dy = 7.6336
4
0
9.5
y=pdp
3
2
x=Lhaloj(s/d)
dy = 6.7781
4
y=pdp
y=pdp
10 -16
10 -18
dy = 8.3441
8
y=pdp
34
x=Lg
pdp vs N ch
10 -16
4
32
3
1
y=pdp
38
2
10 -16
10 -17
y=pdp
2
4
3
pdp vs Lsdj(s/d)
dy = -1.0839
y=pdp
optimum
-5% bounds
+5% bounds
10 -16
10 -18
dy = 6.5552
4
3
pdp vs Lhaloj(s/d)
y=pdp
y=pdp
y=FPDP(x)
1
10 -16
10 -17
dy = 9.4324
4
3
2
pdp vs Lg
10 -16
10 -18
dy = -3.3914
4
y=pdp
pdp vs Lcon(s/d)
y=pdp
10 -16
3
2
1
0
0
6
6.5
7
7.5
8
9.5
10
10 -4
x=T ox
10.5
10 -3
x=T s/d(s/d)
FIGURE 23: PDP transition for varying manufacturing parameter around the optimal value of full range limiter and symmetric
source/drain structure
optimum
-5% bounds
+5% bounds
38
39
40
0
41
42
28
30
10 -12
10 -14
1
delay vs Nhalo(s/d)
10 -12
10 -14
0
1.5
1
10
11
10 -12
10 -12
10 -14
y=delay
1
0.5
5
5.5
6
1.5
1
25
25.5
26
delay vs Tg
dy = -1.2668
10
x=N sd(s/d)
19.5
2
10 -12
10 -15
1.5
1
11
12
10 22
20
20.5
21
delay vs Thk
dy = 2.2993
2
10 -14
1.5
1
0.5
0
9
19
x=Lsp(s/d)
0.5
8
10 21
24.5
0
48
49
50
x=T g
51
52
1.8
2
x=T hk
2.2
10 -3
delay vs Ts/d(s/d)
dy = 1.2539
2
1.5
4.5
x=N halo(s/d)
delay vs Tox
dy = 5.3032
2
4
0
10 -12
0
12
10 20
x=N ch
1
0.5
24
10 -14
10 -14
1.5
x=Lsdj(s/d)
0.5
0
9
10.5
delay vs Nsd(s/d)
dy = 3.9025
2
0.5
8
10
delay vs Lsp(s/d)
dy = -2.1134
2
0
9.5
y=delay
1.5
1
x=Lhaloj(s/d)
dy = 5.9283
2
y=delay
y=delay
delay vs Nch
0.5
y=delay
34
x=Lg
dy = -3.5379
2
32
1.5
0.5
0
26
x=Lcon(s/d)
10 -12
1
0.5
y=delay
0
1
0.5
10 -12
10 -15
dy = 9.9258
2
1.5
delay vs Lsdj(s/d)
y=delay
y=Fdelay (x)
10 -12
10 -15
dy = 1.3708
2
1.5
delay vs Lhaloj(s/d)
y=delay
1
0.5
10 -12
10 -13
y=delay
2
1.5
delay vs Lg
dy = 1.4561
y=delay
10 -12
10 -15
dy = 4.8083
2
y=delay
delay vs Lcon(s/d)
y=delay
10 -12
10 -15
1.5
1
0.5
0
0
6
6.5
7
x=T ox
7.5
8
10 -4
9.5
10
x=T s/d(s/d)
10.5
10 -3
FIGURE 24: Delay transition for varying manufacturing parameter around the optimal value of full range limiter and symmetric
source/drain structure
22
VOLUME 4, 2016
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI
10.1109/ACCESS.2020.3019933, IEEE Access
Choi et al.: Neural Approach for Modeling and Optimizing Si-MOSFET Manufacturing
y=Fpower (x)
optimum
-5% bounds
+5% bounds
0
39
1
41
42
1
28
30
32
34
2
1
0
9.5
10
10.5
10 -06
dy = 8.7667
2
1
0
24
24.5
25
25.5
26
19
19.5
20
20.5
x=Lsdj(s/d)
x=Lsp(s/d)
power vs N ch
-4 power vs N halo(s/d)
-4 power vs N sd(s/d)
power vs T g
power vs T hk
2
1
0
9
10
11
4
10 20
dy = -3.3257
4.5
5
5.5
10
10 -05
y=power
1
0
2
dy = -3.3708
1
10 -4
10 -06
2
1
0
8
9
10
x=N sd(s/d)
11
12
10 22
dy = -2.0639
21
10 -05
2
1
0
48
49
50
x=T g
51
52
1.8
2
x=T hk
2.2
10 -3
-4 power vs T s/d(s/d)
dy = 2.1003
2
6
10 21
x=N halo(s/d)
power vs T ox
10 -4
10 -06
0
12
x=N ch
dy = 2.7522
y=power
y=power
10
10 -06
dy = -3.7648
y=power
10
10 -05
0
10
2
power vs L sp(s/d)
10 -4
10 -06
x=Lhaloj(s/d)
1
-4
dy = -9.3457
0
26
power vs L sdj(s/d)
x=Lg
2
8
10 -4
10 -06
x=Lcon(s/d)
dy = 1.1543
y=power
2
y=power
10 -4
40
power vs L haloj(s/d)
dy = 4.5198
0
38
y=power
10 -4
10 -05
y=power
2
1
power vs L g
dy = 4.2202
y=power
y=power
10 -4
10 -06
dy = -3.1715
y=power
power vs L con(s/d)
y=power
10 -4
10 -06
2
1
0
6
6.5
7
x=T ox
7.5
8
10 -4
9.5
10
x=T s/d(s/d)
10.5
10 -3
FIGURE 25: Power transition for varying manufacturing parameter around the optimal value of full range limiter and symmetric
source/drain structure
23
VOLUME 4, 2016
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/.
Download