This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2020.3019933, IEEE Access Date of current version August 25, 2020 Digital Object Identifier Neural Approach for Modeling and Optimizing Si-MOSFET Manufacturing HYUN-CHUL CHOI1 , HYEOK YUN2 , JUN-SIK YOON2 , (Member, IEEE), and ROCK-HYUN BAEK2 , (Member, IEEE) 1 Department of Electronic Engineering, Yeungnam University, Gyeongsan, 38541 Republic of Korea (e-mail: pogary@ynu.ac.kr) Department of Electrical Engineering, Pohang University of Science and Technology, Pohang, 37673 Republic of Korea (e-mail: {myska315, junsikyoon, rh.baek}@postech.ac.kr) 2 Corresponding author: Rock-Hyun Beak (e-mail: rh.baek@postech.ac.kr). This work was supported in part by the Ministry of Trade, Industry & Energy under Grant 10080617, in part by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT & Future Planning (NRF-2018R1C1B6004056, NRF-2020R1A4A4079777), in part by the Korea Semiconductor Research Consortium Support Program for the Development of the Future Semiconductor Device, in part by the Semiconductor Industry Collaborative Project between POSTECH and SK Hynix Inc., and in part by IC Design. ABSTRACT An optimal design of semiconductor device and its process uniformity are critical factors affecting desired figure-of-merits as well as reducing fabrication cost of fixing possible malfunctioning in semiconductor manufacturing. Two main tasks in optimal device design for semiconductor manufacturing, i.e., parameter optimization and modeling, have been typically used either to characterize the devices by understanding how each parameter affects the device performances or to calibrate the parameters for SPICE circuit simulation. However, there still remains limitations in describing the relationship between all manufacturing parameters and figure-of-merits using several simple equations human experts can utilize. Even with the best model currently available, the optimal design of semiconductor device heavily relies on experiences of human experts and deals with time-consuming ad-hoc trials and non-holistic approaches. In this paper, we propose a new approach for data-based accurate electrical modeling of transistor, which is the most fundamental unit device of semiconductor, and fast optimization of its manufacturing parameters. Instead of the previous analytic approaches, finding finite equations derived from semiconductor physics, we utilize machine learning technique and neural networks to find appropriate modeling functions from data pairs of parameters and figure-of-merits. And for given desired figure-of-merits, we find optimal manufacturing parameters in holistic manner by using the learned functions of neural networks and fast gradient-based optimization method. Experimental results show that our neural-network-based-model directly estimate figure-of-merits with competitive accuracy and that our holistic optimization technique accurately and rapidly adapts the manufacturing parameters to meet desired figure-of-merits. INDEX TERMS Transistor, Semiconductor manufacturing, Holistic parameter optimization, Machine learning, Neural networks, Gradient descent method I. INTRODUCTION RANSISTOR process is the most basic and core technology in semiconductor industry. Thus, developing a transistor that is high-performing and low-power-consuming is of a great importance. Typically finding optimal manufacturing parameters for high performing and low power consuming transistor requires a repeated wafers fabrication with hundreds of unit process steps to measure its electrical performances and a feedback loop to unit process of manufacturing. This procedure dramatically increases research and development cost because it can take up to several weeks. T The duration of the procedure has been extended with the shrinkage of technology node. To reduce the time consuming trials, multiple equationbased models have been suggested in analytic or numerical forms either to physically understand device characteristics [1] or to model I-V and C-V through tens of fitting parameters for SPICE circuit simulation [2], [3]. However, these modeling equations do not explain the relationship between manufacturing parameters and figure-of-merits at a time. There is no simple equation to delineate the relationship and ultimately to meet given desired figure-of-merits by under1 VOLUME 4, 2016 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/. This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2020.3019933, IEEE Access Choi et al.: Neural Approach for Modeling and Optimizing Si-MOSFET Manufacturing standing their correlations. In addition, even with those equations, manufacturing parameter optimization heavily relies on experiences of human experts and has been dealt with adhoc trials and non-holistic approaches. For instance, human experts design a transistor to optimize DC performance firstly by maximizing on/off current ratio from I-V characteristics and AC performance secondly by adjusting capacitance from C-V characteristics because of the antagonistic effects between I-V and C-V characteristics on DC/AC performances of transistor when its structure changes. In this paper, we aim to overcome time-consuming and non-holistic optimization of transistor manufacturing parameters. For this end, we propose a new approach for both databased precise transistor modeling and fast optimization of transistor manufacturing parameters. Instead of the previous analytic approaches in finding finite equations derived from semiconductor physics, we utilize artificial intelligence (AI), namely neural networks [4], [5], and machine learning technique, to automatically find appropriate modeling functions from a number of data pairs of input manufacturing parameters and output figure-of-merits. Subsequently, we find optimal manufacturing parameters for given desired figureof-merits in automatic and holistic manner via gradient-based optimization technique and the learned modeling functions. AI has been previously applied for monitoring the excursion, i.e., out of boundary figure-of-merits in unit process of transistor manufacturing [6], but there is no AI-based study on analyzing the relationship between overall manufacturing procedure and figure-of-merits of transistor. Our holistic approach simultaneously optimizes the entire manufacturing procedure in the whole parameter space while conventional sequential optimization for each unit process results in a sub-optimal transistor due to its limited parameter-searching ability. The contributions of this work are as follows: 1) Physics-free modeling and high compatibility: In semiconductor field, different models with regards to material, scale, structure, and usage of transistor are conventionally required. However, our AI model replaces multiple models with one artificial neural network model, which is independent of those physical characteristics and able to be applied to various transistor only with their input/output data pairs. In the aspect of variability, our method demonstrates the correlation between manufacturing parameters and device figure-of-merits from a number of data even for nanoscale devices suffering from physical ambiguity. 2) Fast modeling and optimization: It takes several hours for excellent human expert to compactly model a given transistor. In contrast, our AI method takes only several tens of seconds to complete the same task. Similarly, our AI method takes only a couple of minutes to find optimal manufacturing parameters for given device figure-of-merits. 3) Fast prediction without accuracy penalty: To speed up TCAD simulation in transistor design, less physical parameters and more empirical parameters are necessary. This results in low accuracy of simulation result. However, our AI method uses a compact neural network for fast prediction and guarantees high accuracy as the number of data increases. 4) Holistic and flexible optimization: Our method finds manufacturing parameters to optimize AC performance of transistor by simultaneously considering IV and C-V parameters. In addition, our method can also selectively optimize the performance suitable for application of transistor, which is useful for flexible design of transistor for System-On-Chip application. In the remained of this paper, we will describe the details of our AI-based method in sec.II, experimentally prove the superiority of our method in sec.III, and conclude this work in sec.IV. II. METHOD A. TRANSISTOR STRUCTURE AND PARAMETERS Since the purpose of this work is not to find a new low-scaled devices but to optimize transistor and to analyze its variation, we designed 32-nm node high-K metal gate transistor, a 2-D planar MOSFET which is easily dealt with in data acquisition compared to 3-D FinFET in the current 7 nm technology node, and simulated electrical characteristics using Sentaurus TCAD simulator [7]. The transistor structure was designed using structure editor with parameters based on Intel 32nm node [8]. Values and ranges of structure parameters are shown in fig.1 and summarized in table 1. All doping profiles were assumed as the Gaussian distribution. The junction gradient lengths for source/drain (Lsdj ) and halo implants (Lhaloj ) indicate the lengths where the source/drain (Nsd ) and halo (Nhalo ) implants are reduced by 10 times, respectively. TABLE 1: Structure input parameters of high-K metal gate transistor based on Intel’s 32-nm node [8] Input parameter Symbol Unit Reference Max Gate length Lg nm 30 34.5 Spacer length Lsp(s/d) nm 20 21 Contact length Lcon(s/d) nm 40 42 Oxide thickness Tox nm 0.7 0.805 High-K thickness Thk nm 2 2.3 Doping Nch 1018 cm−3 1 1.25 concentration Nhalo(s/d) 1018 cm−3 5 6.25 Nsd(s/d) 1020 cm−3 1 1.25 Junction gradient Lhaloj(s/d) nm/dec 10 10.5 Lsdj(s/d) nm/dec 25 26.25 Gate stack height Tg nm 50 52.5 S/D epi height Ts/d(s/d) nm 10 10.5 Min 25.5 19 38 0.595 1.7 0.75 3.75 0.75 9.5 23.75 47.5 9.5 B. NEURAL-NETWORK-BASED MODELING FOR POWER AND DELAY PREDICTION In order to avoid very complex mathematical equations of semiconductor physics and to make a general model for multiple devices, we utilize one of the recently popularized hyperparametric models, namely artificial neural networks (ANN) [4], [5], and machine learning techniques to model 2 VOLUME 4, 2016 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/. This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2020.3019933, IEEE Access 1. 논문 figure 및 table Choi et al.: Neural Approach for Modeling and Optimizing Si-MOSFET Manufacturing of i-th training sample (Ip i ,Dy i ) for i = 1..N as the only loss function. Each MLP for y = power or delay, is trained to minimize this loss function Ly (3) by backpropagation procedure [13], which updates the hyperparameters of MLP from initial random values. Minimizing Ly in log scale is O i identical to minimizing log-scaled ratio, |log( Dyy i )|, which considers both scale-sensitive ratio and log-scaled error symmetric regarding to predicted and target values. N 1 X ||log(Oy i ) − log(Dy i )||2 , y = power or delay. N i (3) One could suggest using a single MLP, which outputs both power and delay from given input manufacturing parameters. Such an approach is also possible and, in actuality, more efficient in the size of neural networks because two different outputs share the same hidden layer. However, in training procedure, a multi-objective loss function Lm of two objectives (3) corresponding to power and delay should be defined and usually has a form of weighted summation (4) of Lpower and Ldelay with ambiguous weights wpower and wdelay . Ly = FIGURE 1: Geometric structure of high-K metal gate transistor based on Intel’s 32 nm node [8]. log log . . . Here, we used log values in input and output of Fy for y = power or delay to avoid network training failure caused by large scale differences between input and output values. We follow the data flow and operations presented in fig.3 for MLP training. We use mean squared error (MSE), Ly (3), between model output log(Oy i ) and desired output log(Dy i ) log . . . (2) log log . . . log(Odelay ) = Fdelay (log(Ip )). log . . . (1) Manufacturing parameters (Ip) . . . log(Opower ) = Fpower (log(Ip )), (4) Since training a single MLP by minimizing Lm may result in overly fitted Fy with a larger weight wy or less optimized Fy with a small weight wy , we need to repeat MLP training process with varying weights to get satisfactory values of Lpower and Ldelay to determine appropriate values for the weights. In contrast, when using two MLPs for two outputs as in our approach (fig.2), it is unnecessary to find appropriate weights for multiple loss functions and each MLP is independently trained with its own loss function Ly (3). . . . functions that output figure-of-merits of transistor with the given input vector. The unit structure of ANN is perceptron [9], which models after the biological function of neuronal synapse as a linear combination of inputs followed by a non-linear activation function. ANN with several layers of multiple perceptrons is called fully connected multi-layer perceptron (MLP) [10], [11]. The advantage of MLP is in making a flexible model for new device by simply changing the number of input and output perceptrons, regarding to the dimension of manufacturing parameters and output figureof-merits. In this work, we use two of 3-layered MLP (inputhidden-output layers) as shown in fig.2 because the structure is simple enough to control non-linearity function by changing the number of perceptrons in the hidden layer. By using these two MLPs, we model two functions of transistor, Fpower (1) and Fdelay (2), which respectively output log-scaled power log(Opower ) and delay log(Odelay ) of a device from given input log-scaled vector log(Ip ) of manufacturing parameters p, where power and delay, or power delay products (PDP) are figure-of-merits for digital logic applications [12]. Lm = wpower · Lpower + wdelay · Ldelay . Fdelay Fpower exp exp Power (Opower) Delay (Odelay) FIGURE 2: 3-layered MLPs for estimating power and delay from given manufacturing parameters. Flexible nonlinear function can be implemented by using MLPs with varying number of perceptrons. 3 VOLUME 4, 2016 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/. This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2020.3019933, IEEE Access Choi et al.: Neural Approach for Modeling and Optimizing Si-MOSFET Manufacturing Backpropagation Latent values zp Limiter g(x) Manufacturing Parameters log(Ipi) . . . log(Oyi) . . . C. AUTOMATIC OPTIMIZATION OF MANUFACTURING PARAMETERS WITHIN CONTROLLABLE RANGE Fdelay Power log(Opower) Delay log(Odelay) log(PDP) Once two models of power and delay are ready to use, we can not only forwardly predict power or delay from given manufacturing parameters but also backwardly optimize manufacturing parameters for a better performance of transistor. PDP is a commonly used measurement of transistor performance and should be as small as possible. Since our neural networks are designed to output log-scaled power and delay (fig.2), log-scaled PDP can be represented as the summation of two outputs of our networks (5). log(P DP (x)) = log(Opower (x) · Odelay (x)) = log(Opower (x)) + log(Odelay (x)) = Fpower (x) + Fdelay (x), x = log(Ip ). Fpower . . . . . . FIGURE 3: MLP training for a certain figure-of-merit. Given training sample (Ip i ,Dy i ), MLP is trained to minimize Ly by backpropagation. Update latent values by gradient descent method . . . Target specification log(Dyi) log(Ip) . . . Fy Manufacturing Parameters Ly FIGURE 4: Automatic optimization of manufacturing parameters with fully connected multi-layer perceptron (MLP): pre-trained networks for predicting log-scaled power and delay are used to calculate log-scaled PDP and limiter g(x) is used to force the manufacturing parameters Ip be in the controllable range. P DP (xt ) ≤ TP DP or G(x) ≤ TG or dx ≤ Tdx . (5) As shown in fig.4, we find the optimal set of manufacturing parameters, which minimizes this log-scaled PDP value by utilizing our trained neural networks and the gradient descent method [14], [15]. Gradient descent method operates as described here: initial randomized manufacturing parameters, Ip0 , and learning rate γ are set as a preliminary step. Then, as the first step, the gradient of log-scaled PDP with respect to manufacturing parameter G(x) (6) is calculated with the initial manufacturing parameters (x = log(Ip0 )). As the second step, the initial manufacturing parameters are updated to move to the negative direction of the gradient with a certain learning rate γ as (7) at time stamp t = 0. Those two steps are repeated with the updated manufacturing parameters as the time stamp t increases until PDP value meets a termination condition (8), i.e., PDP less than a certain threshold value TP DP or vanishing gradient or no variation in manufacturing parameters. d(log(P DP (x))) dx dFpower (x) dFdelay (x) = + , dx dx (6) xt+1 = xt − γ · G(xt ), xt = log(Ipt ), (7) G(x) = (8) Here, the manufacturing parameters obtained from this gradient descent method are not guaranteed to be in the controllable range of real semiconductor manufacturing because the method may find an optimal solution along the entire hypersurface of our trained neural networks beyond the training data space, without any constraint of the acceptable range of solution in reality. Thus setting a controllable range of manufacturing parameters in semiconductor industry is critical because of both the limitation of manufacturing hardware and the preferred parameter setting. To provide a controllable optimal solution, as shown in fig.4, we use a limiter function [16] g(zp ) (9) and optimize latent variables zp instead of manufacturing parameters Ip themselves. g(zp ) = sig(zp )·(log(Ipmax )−log(Ipmin ))+log(Ipmin ). (9) g(zp ) is a differentiable function that is defined by maximum and minimum values (Ipmax , Ipmin ) of the available range of manufacturing parameters and a sigmoid function sig(zp ) [17] (10). 1 sig(zp ) = . (10) 1 + e−zp Because the output range is from 0 to 1 (sig(zp ) ∈ [0, 1]), g(zp ) becomes log(Ipmax ) when zp goes to +∞ and log(Ipmin ) when zp goes to −∞. Therefore, to find optimal manufacturing parameters in the controllable range [Ipmin , Ipmax ], we optimize zp without range constraint and calculate Ip = g(zp ). 4 VOLUME 4, 2016 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/. This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2020.3019933, IEEE Access Choi et al.: Neural Approach for Modeling and Optimizing Si-MOSFET Manufacturing For numerical efficiency, instead of calculating complex analytic gradient (6) of neural network function, we use numerical gradient [18] Ĝ (11) in our process optimization, where two feed-forward operations of (5) with the manufacturing parameters x + τ2 and x − τ2 and one division by τ ≈ 0 are needed. Thus, updating equation of latent variables (12) is also used instead of the equation for manufacturing parameters (7). Ĝ(x) = log(P DP ◦ g(x + τ2 )) − log(P DP ◦ g(x − τ2 )) , τ (11) xt+1 = xt − γ · Ĝ(xt ), xt = zpt . (12) III. EXPERIMENTS A. EXPERIMENTAL SETUP (a) Id -Vg curves For experimental data set, we gathered 1,000 samples of random manufacturing parameters in the reference range between min and max values specified in table 1 and simulated 1,000 samples of Id -Vg and Cgg -Vg characteristics of the devices as shown in fig.5 by using Sentaurus TCAD simulator. Before the electrical simulation, we obtained the geometrical parameters such as gate length and equivalent oxide thickness announced in [8] and [19]. Then, we varied the doping profiles from source/drain to channel to calibrate subthreshold characteristics between simulated and Intel measured data [8]. Simulated I-V characteristics correspond to 32-nm node HP (High Performance) and LOP (Low Operating Power) targets [19] about the range of the threshold voltage and on/off current ratio which are related to electrostatic characteristics like subthreshold swing and drain-induced barrier lowering. To obtain power (14) and delay (15), effective current Ief f is extracted from IH and IL (13) using [21]. We extracted IH at gate voltage, Vg =Vdd and drain voltage, Vd =0.5Vdd and IL at Vg =0.5Vdd and Vd =Vdd . IH Ief f = (IH − IL )/ ln , (13) IL power = Ief f · Vdd + Iof f · Vdd , delay = Cgg · Vdd . 2 · Ief f (14) (15) Among these 1,000 samples of {manufacturing parameters, delay and power} pairs, 850 samples were used to train the models of delay and power, the other 150 samples were used to test the delay and power prediction performance of the trained models. For TCAD simulation, drift-diffusion transport model was calculated self-consistently with Poisson and the carrier continuity equations. Density-gradient model was applied to consider the quantum confinements of carriers. Multivalley modified local-density approximation was used with sixband and two-band k.p band structures for holes and electrons, respectively. Canali model [20] for carrier velocity saturation at high electric field, Masetti model [22] to consider doping-dependent mobility, and Lombardi model [23] to consider mobility degradation at Si-SiO2 interface were (b) Cgg -Vg curves FIGURE 5: (a) I-V and (b) C-V characteristics of transistors for the extraction of power and delay. Parameter ranges are selected as full range (black lines) and 5% limiter (red lines). used. Doping-dependent Shockley-Read-Hall [24] and Auger [25] generation-recombination models were used along with Hurkx band-to-band tunneling model [26] to consider the gate-induced drain leakage. Operation voltage is fixed to 1.0 V, and off-state current is fixed at 1 nA for the averaged device. As a pre-processing of data set, we took logarithm of each element of the 850 train samples and then normalized to have (mean, standard deviation)=(0, 1) across samples. This pre-processing of eliminating bias and deviation differences between data helps multi-layered perceptron (MLP) to learn a function in the restricted range of domain and is known to enhance training performance [27]. We performed the same pre-processing for the 150 test samples by using the mean and standard deviation of the train samples. All experiments were operated in MATLAB console on a laptop computer with Intel i7-8550U (Quadcore, 1.8GHz) CPU and 16 GByte RAM. 5 VOLUME 4, 2016 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/. This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2020.3019933, IEEE Access Choi et al.: Neural Approach for Modeling and Optimizing Si-MOSFET Manufacturing We built two neural network models, one for predicting delay and the other for power of transistor, (fig.2) as 3-layered MLPs of 19 nodes for input layer, 7 nodes for hidden layer, and 1 node for output layer, where the number of nodes in the hidden layer were experimentally determined as the smallest number to learn regression between manufacturing parameters and delay or power without overfitting. In details, the two models were trained, one to minimize power loss of (3) when y=power, the other to minimize delay loss of (3) when y=delay. During the training phase, randomly selected 700 samples out of 850 samples (termed training set) were used to update neural network parameters using LevenbergMarquardt (LM) optimizer. The other 150 samples (termed validation set) were used find the stopping point for training iteration based on the generalization performance of the trained neural networks. In our experiments, we stopped the iteration of LM optimizer when the loss Ly for validation set continuously remained or increased during the previous 6 epochs. We repeated this training process for varying number of nodes in the hidden node and used the smallest number of nodes that produce small training error for training set and near-zero difference between errors of training and validation sets. Since the performance of a trained network differs based on its initial random parameter values, we repeated this training procedure with the same training and validation sets but with different initial randomized values of network parameters. And then, we took the best network which resulted the least error for training and validation sets. To find an appropriate number of repetition with a stable learning performance, we measured average MSEs and their variations of 10 -6 13 delay (Fdelay ) and power (Fpower ) models from 10 trials of training with each repeat number. As shown in fig.6, average loss and its variation decrease as the repeat number increases but no critical improvement was resulted after 40 repeats. Therefore, considering moderate training time, we concluded to use the best models from 40 trained networks. For the trained models, we firstly analyzed the learning ability and generalization performance of our neural models by monitoring the Loss (3) reduction during training process. Figure 7 shows loss (MSE error) reductions of the normalized data sets (sec.III-A) during training process. For both training (blue lines) and validation (green lines) sets, the loss gradually decreased as the training epoch continued until optimizer stopped (fig.7a and fig.7b). Even for test data set (red lines), which was not shown in training procedure, the same loss to the validation set was achieved. Figure 8 shows output regression lines of our two trained neural models. The regression plots of delay model (fig.8a) and power model (fig.8b) show almost ideal linear lines with R values very close to 1.0 for all kinds of data sets. Based on Best Validation Performance is 1.0668e-05 at epoch 29 10 0 Train Validation Test Best Mean Squared Error (mse) B. PERFORMANCE EVALUATION OF POWER AND DELAY MODELING 10 -2 10 -4 10 -6 mse error vs repeat num 0 5 10 15 20 25 30 35 35 Epochs delay error power error (a) Delay loss reduction 12 10 Train Validation Test Best 10 -2 Mean Squared Error (mse) | log(predicted/gt) | 2 Best Validation Performance is 7.8511e-06 at epoch 83 10 -1 11 9 8 7 6 1 10 20 30 40 50 60 70 80 90 10 -3 10 -4 10 -5 100 repeat num FIGURE 6: Model error vs. number of repeated training: each curve represents average loss with standard deviation from 10 trials corresponding to each number of repeated training. 10 -6 0 10 20 30 40 50 60 70 80 89 Epochs (b) Power loss reduction FIGURE 7: Loss vs. epoch during training process 6 VOLUME 4, 2016 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/. This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2020.3019933, IEEE Access Choi et al.: Neural Approach for Modeling and Optimizing Si-MOSFET Manufacturing Validation: R=0.99679 0.1 0.05 0 -0.05 -0.05 0 0.05 0.1 0.15 Target 0.15 0.1 0.05 0 -0.05 -0.05 0.05 0 -0.05 0.05 0.1 0.15 Output ~= 1*Target + 0.00037 Output ~= 0.99*Target + 0.00039 0.1 0 0.1 0.15 All: R=0.99775 Data Fit Y=T -0.05 0.05 Target Test: R=0.99688 0.15 0 Data Fit Y=T 0.15 C. PERFORMANCE EVALUATION OF AUTOMATIC MANUFACTURING PARAMETER OPTIMIZATION 0.1 0.05 0 -0.05 -0.05 0 Target 0.05 0.1 0.15 Target (a) Delay regression lines Validation: R=0.99757 Data Fit Y=T 0.05 0 -0.05 -0.1 -0.15 -0.15 -0.1 -0.05 0 0.05 Target Output ~= 0.99*Target + 0.00057 Output ~= 1*Target + 5.3e-06 Training: R=0.99872 Data Fit Y=T 0.05 0 -0.05 -0.1 -0.15 -0.15 0 -0.05 -0.1 -0.15 -0.15 -0.1 -0.05 0 -0.05 0 0.05 All: R=0.99839 0.05 Output ~= 1*Target + 9.4e-05 Output ~= 1*Target + 3e-05 0.05 -0.1 Target Test: R=0.9976 Data Fit Y=T of our neural models to the simulated delays and powers (ground truth) for the 150 test samples in sec.III-A. As shown in fig.9, the differences between predicted and simulated values are very small (left panel of fig.9), less than 5 % of the simulated values (right panel of fig.9), and this prediction accuracy is good enough to apply our neural models to predict delay and power in a real manufacturing process. Our neural modeling took 80 seconds to train two neural networks 40 times. Additionally, compared to TCAD simulation that takes 3 min, our method takes less than 1 ms to perform a forward prediction of delay and power for a certain manufacturing parameter. Data Fit Y=T 0.05 0 In the optimization process represented in fig.4, we fixed our trained neural models of delay and power in sec.III-B and used them as Fdelay and Fpower respectively. And then, we generated random initial values of manufacturing parameters and updated them by gradient descent method as described in sec.II-C with numerical gradient (11), update equation (12), and objective function (5). Random initial values of manufacturing parameters were generated under the uniform distribution in a reasonable range, i.e., from Ipmin to Ipmax of (9) where Ipmin = 0.95×reference value in table 1 (-5%) and Ipmax = 1.05×reference value in table 1 (+5%). The small deviation τ = 2 × 10−8 was used for calculating accurate numerical gradient (11) and the learning rate γ = 103 was used for update equation (12) for fast convergence. TP DP = 0, TG = 10−5 , and Tdx = 10−5 were used for termination condition (8). Since gradient descent method does not always guarantee global optimum from any initial value, we repeated this optimization process with different initial values and took the best result as the optimal manufacturing parameters. To find the necessary number of repetition of optimization process to obtain stable optimal result, we firstly measured -0.05 -0.1 10 -12 delay: predicted vs gt gt predicted 2.2 -0.15 -0.1 Target -0.05 0 0.05 Target 2 1.8 1.6 (b) Power regression lines delay error 0.1 2.4 -0.15 predicted/gt-1 0.15 Data Fit Y=T delay Data Fit Y=T Output ~= 1*Target + -0.00018 Output ~= 1*Target + 0.00049 Training: R=0.99815 0.05 0 -0.05 1.4 -0.1 FIGURE 8: Regression lines of data set after training process 0 50 10 -4 100 150 0 sample indices power: predicted vs gt predicted/gt-1 these results, we concluded that our neural models learned the deterministic equations of delay and power for given manufacturing parameters and that our neural models have a generalization performance to predict delay or power for unseen manufacturing parameters in the same perturbation range of training data. Secondly, to evaluate the modeling accuracy of our neural approach, we compared the predicted delays and powers power 2.6 2.4 2.2 2 gt predicted 1.8 100 150 sample indices power error 0.1 2.8 50 0.05 0 -0.05 -0.1 0 50 100 sample indices 150 0 50 100 150 sample indices FIGURE 9: Comparison of predicted and simulated (gt) values of test data set: the predicted values are located in between ±5% bounds of ground truth values. 7 VOLUME 4, 2016 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/. This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2020.3019933, IEEE Access Choi et al.: Neural Approach for Modeling and Optimizing Si-MOSFET Manufacturing TABLE 2: Optimized values vs. number of repeated optimization process with 5% limiter: values in parentheses are standard deviations corresponding to the above averaged values. Repeat number 1 0.3071 (0.9429) 0.3110 simulated (0.9578) 0.1661 predicted (0.2284) 0.1684 simulated (0.2705) 0.1849 predicted (0.3095) 0.1847 simulated (0.2990) predicted PDP ×10−15 (×10−18 ) Delay ×10−11 (×10−13 ) Power ×10−3 (×10−5 ) 10 0.3066 (0.0079) 0.3118 (0.6863) 0.1684 (0.0308) 0.1699 (0.0041) 0.1820 (0.0339) 0.1835 (0.0365) 20 0.3066 (0.0002) 0.3120 (0.0007) 0.1685 (0.0005) 0.1699 (0.0005) 0.1819 (0.0006) 0.1836 (0.0005) 30 0.3066 (0.0002) 0.3120 (0.0047) 0.1685 (0.0003) 0.1699 (0.0019) 0.1819 (0.0003) 0.1836 (0.0018) the best PDP value for each repeat number and secondly calculated mean and standard deviation from 10 trials. Figure 10a and the second row of table 2 show the measured values where predicted optimum and simulated optimum represent PDP values corresponding to the optimized manufacturing parameters from our neural model and from Sentaurus TCAD simulator, respectively. We find that the predicted PDP is saturated at around 0.3066 × 10−15 when repeat number is ≥ 10 and its deviation diminished to 0.0002 × 10−18 . Even without any repeat (when repeat number is 1), the resulted PDP value (0.3071 × 10−15 with standard deviation of 0.9429 × 10−18 ) is in close proximity to the saturated optimal value. Compared to the ground truth (simulated optimum), the predicted optimum has slightly lower value. This difference is expected and caused by the model prediction error shown in fig.9 but the difference within ±5% bounds of simulated optimum (upper +5% and lower -5% bounds of fig.10a) is not significant in real manufacturing process. TABLE 3: The optimized manufacturing parameters from 40 repeated optimization process with 5% limiter (Avg: averaged value, Std: standard deviation, Std / Avg: relative deviation): all values were measured from 10 trials of 40 repeated optimization process. Parameter Lcon(d) Lcon(s) Lg Lhaloj(d) Lhaloj(s) Lsdj(d) Lsdj(s) Lsp(d) Lsp(s) Nch Nhalo(d) Nhalo(s) Nsd(d) Nsd(s) Tg Thk Tox Ts/d(d) Ts/d(s) Avg 42 38 28.5 9.5 10.5 26.2 23.8 19 21 1.0496 ×1018 4.7500 ×1018 5.2500 ×1018 1.0500 ×1020 0.9500 ×1020 47.5 2.1 0.735 9.5 10.5 Std Std / Avg 8.4961 ×10−7 2.0229 ×10−5 4.3151 ×10−7 1.1355 ×10−5 8.5249 ×10−16 2.9912 ×10−14 6.9458 ×10−9 7.3114 ×10−7 2.4959 ×10−7 2.3771 ×10−5 6.3633 ×10−10 2.4241 ×10−8 5.7586 ×10−9 2.4247 ×10−7 4.6348 ×10−12 2.4394 ×10−10 1.1685 ×10−9 5.5644 ×10−8 1.4607 ×1014 1.3916 ×10−4 1.2944 ×1013 2.7251 ×10−6 2.9841 ×1013 5.6841 ×10−6 3.4904 ×1014 3.3242 ×10−6 3.8750 ×1013 4.0789 ×10−7 2.2813 ×10−7 4.8028 ×10−6 1.4153 ×10−9 6.7379 ×10−7 1.4366 ×10−10 1.9545 ×10−7 1.4790 ×10−8 1.5568 ×10−6 4.1235 ×10−10 3.9271 ×10−8 40 0.3066 (0.0002) 0.3120 (0.0006) 0.1685 (0.0006) 0.1699 (0.0004) 0.1819 (0.0007) 0.1836 (0.0004) 50 0.3066 (0.0002) 0.3120 (0.0009) 0.1685 (0.0004) 0.1699 (0.0006) 0.1819 (0.0005) 0.1836 (0.0007) 60 0.3066 (0.0001) 0.3120 (0.0006) 0.1685 (0.0005) 0.1699 (0.0004) 0.1819 (0.0006) 0.1836 (0.0004) 70 0.3066 (0.0002) 0.3120 (0.0046) 0.1685 (0.0005) 0.1699 (0.0019) 0.1819 (0.0005) 0.1836 (0.0017) 80 0.3066 (0.0001) 0.3120 (0.0007) 0.1685 (0.0006) 0.1699 (0.0005) 0.1819 (0.0006) 0.1836 (0.0005) 90 0.3066 (0.0001) 0.3120 (0.0047) 0.1685 (0.0005) 0.1699 (0.0019) 0.1819 (0.0005) 0.1836 (0.0018) 100 0.3066 (0.0002) 0.3120 (0.0047) 0.1685 (0.0004) 0.1699 (0.0019) 0.18219 (0.0004) 0.1836 (0.0018) When repeat number is 20, delay (fig.10b) and power (fig.10c) converged to the best predicted optimum, 0.1685 × 10−11 (third row of table 2) and 0.1819 × 10−3 (fourth row of table 2) respectively, and are stable with very small standard deviations, 0.0005 × 10−13 (second row of table 2) and 0.0006 × 10−5 (third row of table 2) respectively. Compared to the simulated optima of delay and power, the predicted optima have very small difference within ± 5% bounds (fig.10b and fig.10c) of the simulated optima. Based on these results, we decided to repeat optimization process 40 times (20 + marginal 20) to achieve the lowest PDP performance with stable delay and power. Our optimized PDP and its corresponding delay and power are compared to the values of 150 test samples (III-A) in fig.11. As shown in fig.11a, our optimized PDP (green line) is lower than the minimum PDP of the test samples. However, the minimum PDP was not resulted with the minima of both delay and power. Instead, our optimized manufacturing parameters have relatively high delay (fig.11b) and relatively low power (fig.11c). This result is partially consistent with the previous optimizing technique of human expert where human expert firstly minimizes power (DC characteristics) by maintaining good short channel characteristics and then tries to minimize delay (AC characteristics) under the low power condition. However, our neural optimization result differs in that it does not insist on the lowest power and delay; rather our approach resulted in a slightly higher power, yet achieved the lowest PDP with a certain delay. Therefore, as an holistic optimization, our neural approach is promising in finding optimal manufacturing parameters for the lowest PDP while the commonly used sequential optimization in real manufacturing, which firstly sticks to the minimal power and secondly searches minimal delay, does not guarantee the lowest PDP. Benefits of our neural approach will be further proved by quantitative comparing with human expert and the previously used reference design in sec.III-E. Table 3 presents the average and standard deviation values of each optimized manufacturing parameter when repeat num is 40. Each parameter converged to a certain optimal value (Avg) with a very small deviation (Std) of 10−4 or less scale of its corresponding Avg value (Std / Avg). Figure 12 shows the optimized transistor structure of the manufacturing pa- 8 VOLUME 4, 2016 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/. This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2020.3019933, IEEE Access Choi et al.: Neural Approach for Modeling and Optimizing Si-MOSFET Manufacturing rameters shown in table 3. Since our optimization method has no constraint of source/drain symmetry, the resulted optimal transistor has an asymmetric structure unlike commonly used symmetric structure [8]. Based on the table 2, this asymmetric structure achieved 17% lower PDP (0.3120 × 10−15 , 12% higher delay (0.1699 × 10−11 ), and 26% lower power 10 -16 3.3 average and deviation of optimized pdp vs repeat num 3.25 predicted optimum simulated optimum upper bound (+5%) lower bound (-5%) pdp 3.2 (0.1836 × 10−3 ) than reference values, 0.3749 × 10−15 , 0.1516 × 10−11 , and 0.2473 × 10−3 respectively, which were obtained using TCAD simulation with the reference structure [8] in table 1. Although asymmetric structure is not common and beyond the present manufacturing ability, this optimization result shows that our neural approach can suggest theoretical direction for MOSFET structure as well as optimal parameter values for better performance beyond the conventional source/drain symmetric structure. Our optimization method can also find a set of optimal source/drain symmetric manufacturing parameters by applying our optimization technique 3.15 3.1 3.05 10 -16 5 3 Test sample PDP vs optimized PDP pdp of test samples optimized pdp 2.95 0 10 20 30 40 50 60 70 80 90 4.5 100 PDP repeat num (a) Optimized PDP vs repeat number 10 -12 1.8 average and deviation of optimized delay vs repeat num delay 3.5 predicted optimum simulated optimum upper bound (+5%) lower bound (-5%) 1.75 4 3 0 50 100 150 sample index 1.7 (a) Test sample PDP vs optimized PDP 10 -12 Test sample delay vs optimized delay 1.65 delay of test samples optimized delay 2.4 2.2 1.6 10 20 30 40 50 60 70 80 90 100 Delay 0 repeat num (b) Optimized delay vs repeat number 2 1.8 1.6 10 -4 1.95 average and deviation of optimized power vs repeat num 1.4 1.2 power 0 predicted optimum simulated optimum upper bound (+5%) lower bound (-5%) 1.9 1.85 50 100 150 sample index (b) Test sample delay vs optimized delay 1.8 10 -4 3 Test sample power vs optimized power power of test samples optimized power 2.8 1.75 0 10 20 30 40 50 60 70 80 90 100 repeat num (c) Optimized power vs repeat number FIGURE 10: Optimized values of PDP, delay, and power vs the number of repeated optimization process: predicted optimum represents the output value from our neural model corresponding to the optimized manufacturing parameters, simulated optimum represents the values from TCAD simulator, upper and lower bounds represents 5% higher and lower values respectively than the simulated optimum when repeat number is 20. Power 2.6 1.7 2.4 2.2 2 1.8 1.6 0 50 100 150 sample index (c) Test sample power vs optimized power FIGURE 11: Test samples and our optimized values of PDP, delay, and power: the optimized values were obtained from 30 repeated optimization process with 5% limiter. 9 VOLUME 4, 2016 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/. This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2020.3019933, IEEE Access 1. 논문 figure 및 table Choi et al.: Neural Approach for Modeling and Optimizing Si-MOSFET Manufacturing DopingConcentration (cm-3) 1.053∙1020 1.524∙1016 2.094∙1012 G S -7.822∙1014 -5.402∙1018 D Sub FIGURE 12: Transistor structure optimized by our neural approach with 5% limiter TABLE 4: Parameters listed in descending order of sensitivity (dy): values with ’-’ sign are negatively correlated with PDP. Parameter PDP sensitivity (dy) Lg 6.8710 ×10−17 Nsd(s) 3.7977 ×10−17 Tox -2.9362 ×10−17 Lsdj(s) 2.1586 ×10−17 Thk -1.7121 ×10−17 Lsp(s) -1.7022 ×10−17 Nsd(d) -1.3028 ×10−17 Nhalo(s) -1.2078 ×10−17 Lsdj(d) -1.0712 ×10−17 Nhalo(d) 1.0056 ×10−17 Parameter PDP sensitivity (dy) Ts/d(s) -7.1042 ×10−18 Lsp(d) 5.1539 ×10−18 Ts/d(d) 3.7916 ×10−18 Lhaloj(d) 3.4982 ×10−18 Tg 2.5350 ×10−18 Nch -1.1027 ×10−18 Lcon(s) 9.1408 ×10−19 Lhaloj(s) -8.1754 ×10−19 Lcon(d) -4.1220 ×10−20 of sec.II-C assuming the same parameter for source/drain. The symmetric structure optimization will be dealt with in sec.III-E. D. CORRELATION AND SENSITIVITY ANALYSIS We additionally analyzed the effects of each manufacturing parameter on PDP value in optimization process by using our trained neural models of sec.III-B and the optimized manufacturing parameters of sec.III-C. Each graph of fig.13, fig.14, and fig.15 shows transitions of figure-of-merits, i.e., PDP, delay, and power, respectively for each manufacturing parameter varying from Min to Max of table 1, while the other parameters are fixed at the optimal values of table 3. We calculated the differences between the minimum and maximum values of figure-of-merits as the sensitivity (dy) of parameter, i.e., how significantly a parameter affects the figure-of-merit. These graphs can be generated in a second from our neural models without using repeated TCAD simulations. From these graphs, we firstly find that PDP and each manufacturing parameter have an almost linear correlation as shown in fig.13. This linear correlation resulted in the optima (red circles in fig.13) on one bound of 5% limiter (vertical dashed lines or dash dot lines in fig.13) which has the lowest PDP. Once we found how manufacturing parameters are correlated with PDP, we can easily decide if each parameters should be changed furthermore and to which direction each parameter should be moved. Moreover, the monotonic linear correlation supports the usage of a wider limiter in our optimization process up to full range, and thus each parameter can be moved toward the boundary of the limiter for even further PDP optimization. Secondly, we find that source/drain parameter pairs have the opposite directional correlations with PDP, but with very similar sensitivities. Highly-doped source region by reducing parasitic resistance and lightly-doped drain region by decreasing gate-induced drain leakage and improving short channel effects (SCEs) are preferred whenever possible [28]– [30]. For example, Lsdj(d) and Lsdj(s) are negatively and positively correlated with PDP respectively and these two correlations have similar slopes in fig.13. This supports to use asymmetric structure for a better PDP because the positive and negative PDP increments of the same source/drain parameter pair compensate each other. TABLE 5: Parameters listed in descending order of delay and power sensitivities (dy): parameters (0,0) are insensitive to both delay and power, Parameters (±,0) or (0, ±) are sensitive to only delay or power respectively. Parameters (±,±) or (±,∓) are sensitive to both delay and power in a same direction or opposite directions. Each group of parameters are sorted in a descending order in regards to both delay and power sensitivities. Parameters (0,0) Lcon(d) Lhaloj(d) Lcon(s) Tg Parameters (±,0) Lsdj(d) Nsd(d) Nhalo(d) Lsp(d) Ts/d(d) Parameters (0,±) Tox Parameters (±,±) Lg Parameters (±,∓) Nsd(s) Nhalo(s) Lsdj(s) Lsp(s) Nch Ts/d(s) Lhaloj(s) Thk Delay sensitivity (dy) 1.0306 ×10−14 1.1255 ×10−14 -4.4003 ×10−15 -1.5111 ×10−16 Delay sensitivity (dy) -7.5297 ×10−14 -7.4103 ×10−14 4.0869 ×10−14 3.5057 ×10−14 3.2589 ×10−14 Delay sensitivity (dy) -1.7130 ×10−15 Delay sensitivity (dy) 2.8058 ×10−13 Delay sensitivity (dy) -4.6829 ×10−13 3.0951 ×10−13 -2.7648 ×10−13 2.2607 ×10−13 1.8004 ×10−13 1.1989 ×10−13 7.0730 ×10−14 3.5765 ×10−14 Power sensitivity (dy) -1.1441 ×10−6 8.5549 ×10−7 1.0203 ×10−6 1.5210 ×10−6 Power sensitivity (dy) 1.6981 ×10−6 2.5266 ×10−7 1.5459 ×10−6 -7.1222 ×10−7 -1.2461 ×10−6 Power sensitivity (dy) -1.7213 ×10−5 Power sensitivity (dy) 1.0096 ×10−5 Power sensitivity (dy) 7.1839 ×10−5 -4.2408 ×10−5 5.0168 ×10−5 -3.9581 ×10−5 -2.0630 ×10−5 -1.8421 ×10−5 -8.4648 ×10−6 -1.4148 ×10−5 10 VOLUME 4, 2016 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/. This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2020.3019933, IEEE Access Choi et al.: Neural Approach for Modeling and Optimizing Si-MOSFET Manufacturing 1 0 1 0 39 40 41 42 0 38 39 40 x=Lcon(d) 10 -16 10 -17 3 2 pdp vs Lsdj(s) 10 -16 10 -17 32 34 2 3 2 0 0 0 24 24.5 26 19 1 pdp vs N halo(s) 10 10 -17 dy = -1.2078 4 y=pdp 2 -16 3 2 5.5 x=N halo(d) 10 -16 4 4 4.5 10 18 10 -16 10 -17 2 1 5.5 10 18 2 2 x=T hk 2.2 11 0.65 0.7 0.75 0.8 20.5 4 3 2 21 8 pdp vs N sd(s) 10 10 -17 2 9 10 10 11 10 17 pdp vs Tg dy = 2.535 10 -18 49 51 3 2 12 10 19 48 50 52 x=T g pdp vs Ts/d(s) dy = -7.1042 4 2 12 0 9 x=N sd(s) 10 -16 3 11 1 8 10 -18 -16 4 3 pdp vs Ts/d(d) dy = 3.7916 10 -18 x=N ch dy = 3.7977 10 19 10 -18 3 2 1 0 0.6 -16 4 12 1 0 1.8 10 x=N sd(d) 10 -16 3 20 0 9 pdp vs N ch dy = -1.1027 4 1 8 1 0 19.5 10 10 -17 10.5 0 19 pdp vs N sd(d) 2 6 10 -17 10 -16 10 -17 x=Lsp(s) 3 pdp vs Tox dy = -2.9362 4 y=pdp 3 5 x=N halo(s) pdp vs Thk dy = -1.7121 21 0 6 y=pdp 5 20.5 1 0 4.5 20 dy = -1.3028 4 1 0 -16 10 1 x=Lsp(d) y=pdp 10 10 -17 3 4 19.5 y=pdp pdp vs N halo(d) dy = 1.0056 4 25.5 y=pdp 10 -16 25 x=Lsdj(s) pdp vs Lsp(s) 2 0 26 9.5 x=Lhaloj(s) 3 1 25.5 10.5 dy = -1.7022 4 1 25 0 10 10 -16 10 -18 1 24.5 2 1 9.5 pdp vs Lsp(d) 10 -19 3 x=Lhaloj(d) dy = 5.1539 4 3 x=Lsdj(d) y=pdp 30 1 24 y=pdp 28 x=Lg dy = 2.1586 4 y=pdp y=pdp pdp vs Lsdj(d) 2 0 26 x=Lcon(s) dy = -1.0712 4 42 y=pdp 10 -16 41 3 pdp vs Lhaloj(s) dy = -8.1754 4 1 y=pdp 38 2 10 -16 10 -18 y=pdp 2 3 pdp vs Lhaloj(d) dy = 3.4982 4 y=pdp optimum -5% bounds +5% bounds 1 3 10 -16 10 -17 dy = 6.871 4 y=pdp y=FPDP(x) 2 pdp vs Lg 10 -16 10 -19 y=pdp 3 pdp vs Lcon(s) dy = 9.1408 4 y=pdp y=pdp 10 -16 10 -20 dy = -4.122 y=pdp pdp vs Lcon(d) 10 -16 4 0 9.5 x=T ox 10 x=T s/d(d) 10.5 9.5 10 10.5 x=T s/d(s) FIGURE 13: PDP transition for varying manufacturing parameters around the optimal value of 5% limiter and asymmetric source/drain structure Thirdly, PDP is insensitive to some manufacturing parameters. For example, PDP is almost constant (|dy| < 3.066 ×10−18 , less than 1% of the optimal predicted PDP 0.3066 ×10−15 shown in table.2) for varying Lcond(s/d) , Lhaloj(s) , Nch , and Tg in fig.13. This means that not all parameters but only some significant parameters with high slope (high sensitivity) are selectively necessary for achieving an optimal PDP [31], [32]. Considering only significant parameters in the transistor optimization process is beneficial because finding an optimum from a lower dimensional search space is faster and easier. Table 4 shows the sorted list of parameters in descending order in regards to their PDP sensitivities (dy) where Lg , Nsd(s) , Tox , and Lsdj(s) are significant factors for PDP optimization based on their PDP sensitivities. This analysis is consistent with the trend of conventional semiconductor industry [33], [34]. We also performed a similar analysis on the effect of manufacturing parameters in delay and power control by monitoring their sensitivities (dy) as shown in fig.14 and fig.15. We categorized manufacturing parameters into five groups regarding their sensitivities (dy) to delay and power, i.e., both insensitive (0,0), delay-only sensitive (±,0), poweronly sensitive (0,±), both sensitive (±,±), both sensitive but opposite (±, ∓). Here, we assumed that parameters of sensitivity (dy) less than 1% of optimal predicted delay (0.1685 ×10−11 in table.2) or power (0.1819 ×10−3 in table.2) are insensitive. Table 5 shows the parameter groups in descending order of sensitivities (dy) to delay and power. As we can expect from table 4, Lcond(s/d) , Lhaloj(d) , and Tg of very low PDP sensitivity are in the both insensitive (0,0) group and have no effect in optimization process. Nch and Lhaloj(s) of low PDP sensitivity are categorized as the group of both sensitive but opposite (±,∓) and, therefore, these parameters are useful in controlling delay and power of transistor without changing PDP. By using the parameters in delay-only (±,0) sensitive group like Lsdj(d) or power-only (0,±) sensitive group like Tox , we can effectively control delay or power of transistor independently. Based on this correlation and sensitivity analysis, we find a possibility to further reduce PDP. To this end, we enlarged the range of limiter from 5% of reference manufacturing parameter values to full ranges between Min and Max in table 1 and performed the same automatic parameter optimization process mentioned in sec.III-C. Table 6 shows optimized values of manufacturing parameters with the full range limiter. The resulted parameters positively correlated with PDP as shown in table 4, i.e., Lg , Nhalo(d) , and Nsd(s) , were decreased and parameters negatively correlated with PDP shown in table 4, i.e., Nch , Nhalo(s) , Nsd(d) , Thk , and Tox , were increased to the direction of reducing PDP. The other parameters were not changed because they were already in full range with 5% limiter. This full range optimization resulted in about 29% lower PDP (0.2661×10−15 ) than the TCAD simulated value using the reference structure (0.3749×10− 15) as shown in 11 VOLUME 4, 2016 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/. This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2020.3019933, IEEE Access Choi et al.: Neural Approach for Modeling and Optimizing Si-MOSFET Manufacturing 0.5 0 0.5 40 41 42 39 40 42 0.5 10 -12 10 -13 1.5 1 0.5 0 25 25.5 24 24.5 10 -12 10 -14 25.5 10 -13 1 0.5 0 5 5.5 4 1.5 1 0.5 8 2.2 11 0.65 0.7 0.75 8 9 10 10 -12 10 -13 10 17 delay vs Tg 10 -16 dy = -1.5111 2 1.5 1 0 9 10 11 12 10 -12 49 50 51 52 x=T g delay vs Ts/d(s) 10 -13 dy = 1.1989 2 48 10 19 x=N sd(s) 10 -14 12 0.5 8 delay vs Ts/d(d) dy = 3.2589 11 x=N ch delay vs Nsd(s) 10 19 1 0.8 1 21 1 1.5 1 0.5 0 0.6 20.5 1.5 12 0.5 x=T hk 20 dy = -4.6829 2 1.5 0 2 10 x=N sd(d) 2 0.5 1.8 9 10 -12 1 1.5 0 6 1.5 0 10 -12 10 -14 10 -13 0 19.5 0.5 10 18 10 -15 dy = -1.713 2 y=delay 2 5.5 delay vs Nch 0.5 19 delay vs Nsd(d) 10.5 dy = 1.8004 2 x=Lsp(s) 1 delay vs Tox 10 -12 10 -14 dy = 3.5765 5 x=N halo(s) delay vs Thk 10 -12 4.5 10 -12 10 -13 1 0 6 10 18 x=N halo(d) delay vs Lsp(s) 1.5 21 0.5 y=delay 4.5 20.5 1.5 0 4 20 dy = -7.4103 2 10 x=Lhaloj(s) 0 19.5 10 -12 1.5 9.5 0.5 19 delay vs Nhalo(s) 10.5 dy = 2.2607 2 1 26 y=delay y=delay 1 0.5 10 -12 10 -14 x=Lsp(d) dy = 3.0951 2 1.5 10 x=Lhaloj(d) delay vs Lsp(d) x=Lsdj(s) dy = 4.0869 2 25 0 9.5 0 26 delay vs Nhalo(d) 34 0.5 x=Lsdj(d) 10 -12 32 1.5 0 24.5 30 dy = 3.5057 2 y=delay 1 delay vs Lsdj(s) 1 0.5 x=Lg dy = -2.7648 2 1.5 24 28 10 -14 1.5 0 26 y=delay 10 -12 10 -14 y=delay y=delay 41 x=Lcon(s) dy = -7.5297 1 0 38 delay vs Lsdj(d) 10 -12 2 1.5 delay vs Lhaloj(s) dy = 7.073 2 0.5 y=delay 39 x=Lcon(d) y=delay 1 0 38 y=delay 1.5 10 -12 10 -14 y=delay 1 delay vs Lhaloj(d) dy = 1.1255 2 y=delay 0.5 2 1.5 10 -12 10 -13 y=delay optimum -5% bounds +5% bounds delay vs Lg dy = 2.8058 y=delay y=Fdelay (x) 1 10 -12 10 -15 y=delay 1.5 delay vs Lcon(s) dy = -4.4003 2 y=delay y=delay 2 10 -12 10 -14 dy = 1.0306 y=delay delay vs Lcon(d) 10 -12 0 9.5 10 x=T ox 10.5 9.5 10 x=T s/d(d) 10.5 x=T s/d(s) FIGURE 14: Delay transition for varying manufacturing parameter around the optimal value of 5% limiter and asymmetric source/drain structure 0 1 0 40 41 42 0 38 39 power vs L sdj(d) 10 -4 10 -06 y=power 2 1 24.5 power vs L sdj(s) 25 10 -4 25.5 1 26 24.5 25 25.5 26 10 0 10 5 5.5 x=N halo(d) 10 -4 2 1 4 4.5 10 -4 10 -05 1 0 5.5 2 x=T hk 2.2 -4 8 10 11 10 -4 0.7 x=T ox 19.5 10 0.75 0.8 20 20.5 power vs N ch dy = -1.2461 2 1 21 -4 8 power vs N sd(s) 10 1 9 10 11 12 10 17 power vs T g dy = 1.521 10 -06 49 51 2 1 0 8 10 19 9 10 11 12 10 19 x=N sd(s) 10 -4 10 -06 1 -4 10 -05 2 12 2 10 -05 dy = -2.063 x=N ch dy = 7.1839 power vs T s/d(d) 10.5 0 19 10 -07 0 0.65 10 -4 10 -05 1 power vs N sd(d) 9 10 x=Lsp(s) x=N sd(d) 10 -05 1 0.6 power vs L sp(s) 0 6 2 9.5 x=Lhaloj(s) 2 21 1 10 18 0 1.8 20.5 2 power vs T ox dy = -1.7213 y=power 2 5 x=N halo(s) 10.5 dy = -3.9581 0 6 10 18 power vs T hk dy = -1.4148 20 dy = 2.5266 y=power 4.5 19.5 10 -05 0 4 10 -4 x=Lsp(d) dy = -4.2408 y=power 1 10 0 19 1 0 9.5 10 -07 1 10 -06 2 x=Lhaloj(d) 2 -4 power vs N halo(s) 10 -06 2 34 power vs L sp(d) x=Lsdj(s) dy = 1.5459 32 0 24 -4 power vs N halo(d) 30 dy = -7.1222 y=power 10 28 10 -05 2 x=Lsdj(d) 1 0 26 0 24 y=power 42 power vs L haloj(s) dy = -8.4648 2 x=Lg dy = 5.0168 0 y=power 41 y=power dy = 1.6981 y=power 40 x=Lcon(s) y=power x=Lcon(d) 10 -4 1 y=power 39 2 10 -4 10 -07 dy = 8.5549 48 50 52 x=T g power vs T s/d(s) dy = -1.8421 y=power 38 power vs L haloj(d) y=power 2 10 -4 10 -05 y=power optimum -5% bounds +5% bounds power vs L g dy = 1.0096 y=power y=Fpower (x) 10 -4 10 -06 y=power 2 1 power vs L con(s) dy = 1.0203 y=power y=power 10 -4 10 -06 dy = -1.1441 y=power power vs L con(d) 10 -4 10 -05 2 1 0 9.5 10 x=T s/d(d) 10.5 9.5 10 10.5 x=T s/d(s) FIGURE 15: Power transition for varying manufacturing parameter around the optimal value of 5% limiter and asymmetric source/drain structure 12 VOLUME 4, 2016 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/. This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2020.3019933, IEEE Access Choi et al.: Neural Approach for Modeling and Optimizing Si-MOSFET Manufacturing TABLE 6: The optimized manufacturing parameters from 40 repeated optimization processes with the full range limiter (Avg: averaged value, Std: standard deviation, Std / Avg: relative deviation): all values were measured from 10 trials of 40 repeated optimization processes. Parameter Lcon(d) Lcon(s) Lg Lhaloj(d) Lhaloj(s) Lsdj(d) Lsdj(s) Lsp(d) Lsp(s) Nch Nhalo(d) Nhalo(s) Nsd(d) Nsd(s) Tg Thk Tox Ts/d(d) Ts/d(s) Avg Std 42 4.5363 ×10−7 38 8.1926 ×10−7 25.5 0 9.5 1.3720 ×10−8 10.5 4.2806 ×10−8 26.2 4.2387 ×10−8 23.8 1.5486 ×10−12 19 6.6142 ×10−8 21 4.0887 ×10−11 1.2500 ×1018 2.4293 ×1013 3.7500 ×1018 2.8333 ×107 6.2500 ×1018 1.1724 ×1013 1.2500 ×1020 2.5697 ×106 0.7500 ×1020 1.9691 ×105 47.5 8.0618 ×10−7 2.3 4.5714 ×10−19 0.805 0 9.5 2.8810 ×10−8 10.5 1.6864 ×10−8 Std / Avg 1.0801 ×10−5 2.1559 ×10−5 0 1.4442 ×10−6 4.0768 ×10−6 1.6147 ×10−6 6.5204 ×10−11 3.4812 ×10−6 1.9470 ×10−9 1.9435 ×10−5 7.5556 ×10−12 1.8759 ×10−6 2.0558 ×10−14 2.6255 ×10−15 1.6972 ×10−5 1.9876 ×10−16 0 3.0326 ×10−6 1.6061 ×10−6 table 7. E. COMPARISON BETWEEN HUMAN EXPERT AND OUR NEURAL APPROACH For a fair comparison of optimization performances between our neural optimization method and human expert, we set the same search ranges of manufacturing parameters, i.e., 5% limiter (sec.III-C) and full range limiter (from Min to Max of table 1), and used the conventional source/drain symmetric structure [8] for both methods. In case of our neural approach, we performed additional automatic parameter optimization with our trained models, previously mentioned in sec.III-B as described in sec.III-C but with source/drain symmetric constraint, i.e., using a single parameter for each pair of source/drain parameters which has a symbol with (s/d) in table 1. In case of human expert, an experienced human expert optimized the PDP of the conventional MOSFET as described here: to minimize the PDP, gate capacitance Cgg should be minimized according to the PDP, product of delay and power. Gate length was first tuned to reduce the Cgg . Then, equivalent oxide thickness (EOT), directly related to Tox and Thk , was controlled. In this process, it was revealed that the EOT does not affect the SCEs of the device in this work, so a large EOT was set to reduce the Cgg . Both Tg and Tsd were decreased to minimum in order to reduce the fringing capacitance between gate stack and S/D regions. Lcon was also decreased to minimum in order to reduce the junction capacitance. Doping-related parameters, Nch , Nhalo , and Nsd , were decreased to reduce the overlap and intrinsic channel capacitances. Finally, Lsp was set to the middle value due to the trade-off between overlap and fringing capacitances. 1) Comparison in figure-of-Merits: PDP, delay, and power Table 7 and fig.16 present and compare figure-of-merits of transistors optimized by our neural approaches and human expert. In case of 5% limiter, both human expert and our neural method (Neural sym) achieved about 7% less PDP and delay and similar power compared to those resulted from the reference structure. In case of full range limiter, both methods achieved about 20% less PDP and power and slightly less delay than those resulted from the reference structure. When comparing results from human expert and our method (Neural sym), both showed similar delays, and powers; while PDPs were also similar, our method resulted in a slightly higher PDP compared to that resulted by human expert. . Our method has 1 to 2% error in modeling (fig.9 in sec.III-B) and optimization (fig.10 in sec.III-C). And, due to the linear correlation of low complexity (fig.13, fig.14, and fig.15), human expert easily found the near best manufacturing parameters. This resulted in a lower PDP in Neural sym but the difference was not significant, coming to less than 1.5%. We presented optimized manufacturing parameters by our neural approach in table 8 and by human expert in table 9. Lcond(s/d) , Lsdj(s/d) , Lsp(s/d) , and Tg of two methods have different values. These differences are resulted from the different optimization objectives of two methods, where our neural method selects parameter values to obtain lower PDP as described in sec.II-C while human expert minimizes Cgg only for lower PDP according to the PDP equation as described in sec.III-E. Nonetheless, these parameters with different values are not significant in minimizing PDP because PDP is almost insensitive to these parameters based on the sensitivity analysis in symmetric source/drain structure (table 10). The other parameters have the same or similar values in both methods. These results demonstrate that our neural approach achieved a comparable optimization result to the equation-based minimization of human expert without knowing any equation through the entire process. In the case of full range limiter, as the search range TABLE 7: TCAD simulated values of PDP, delay, and power for the optimized manufacturing parameters: ’Neural’ represents the optimization with our neural models and asymmetric source/drain structure, ’Neural sym’ with our neural models and symmetric source/drain structure, ’Human expert’ with human optimization and symmetric source/drain structure, and ’Reference structure’ shows the outputs of TCAD simulation with the reference parameter values [8] in table 1. Optimization method PDP×10−15 Delay×10−11 Power×10−3 Neural (5% limiter) 0.3120 0.1699 0.1836 Neural sym (5% limiter) 0.3535 0.1409 0.2509 Human expert (5% limiter) 0.3487 0.1435 0.2430 Neural (full range) 0.2661 0.2272 0.1171 Neural sym (full range) 0.3014 0.1490 0.2023 Human expert (full range) 0.2965 0.1445 0.2052 Reference structure [8] 0.3749 0.1516 0.2473 13 VOLUME 4, 2016 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/. This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2020.3019933, IEEE Access Relative Delay Relative PDP Choi et al.: Neural Approach for Modeling and Optimizing Si-MOSFET Manufacturing 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 1.5 1.4 1.3 1.2 1.1 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 PDP in symmetric structure. In addition, there is higher chances to find better parameters to obtain a lower PDP in asymmetric structure since asymmetric structure has 19 degree-of-freedom (DOF), which has seven more variables than symmetric structure with 12 DOF. Comparison of figure-of-merit in pdp Comparison of figure-of-merit in delay Neural (5% limiter) Neural sym (5% limiter) Human expert (5% limiter) Neural (full range) Neural sym (full range) Human expert (full range) Reference structure Relative Power Comparison of figure-of-merit in power 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 FIGURE 16: Comparison of methods in figure-of-merits: PDP, delay, and power in table 7 were normalized so that the values of reference structure become 1.0 for easy comparison. TABLE 8: Optimized manufacturing parameters by our neural approach under the constraint of symmetric source/drain (Neural sym) Parameter Value (5% limiter) Value (full range) Lcon(s/d) 42 42 Lg 28.5 25.5 Lhaloj(s/d) 9.5 9.5 Lsdj(s/d) 26.2 26.2 Lsp(s/d) 19 19 Nch 0.9500 ×1018 0.7500 ×1018 Nhalo(s/d) 4.7501 ×1018 3.7500 ×1018 Nsd(s/d) 0.9501 ×1020 0.7500 ×1020 Tg 52.5 52.5 Thk 2.1 2.3 Tox 0.735 0.805 Ts/d(s/d) 9.5 9.5 increased from 5% of reference parameter values to the full range in table 1, the parameters positively correlated with PDP (table 10), i.e., Nch , Nhalo(s/d) , and Nsd(s/d) , were decreased and the parameters negatively correlated with PDP (table 10), i.e., Thk and Tox , were increased in order to get minimal PDP in both methods (table 8 and table 9). 2) Comparison of symmetric/asymmetric structures We find that our neural method with asymmetric structure (Neural) optimized PDP much less than with symmetric structure (Neural sym) as shown in table 7 and fig.16. As mentioned in sec.III-C, two parameters in a source/drain pair are oppositely correlated to PDP and this resulted in a higher 3) Comparison in optimization speed Table 11 shows the elapsed time to get optimal manuscript parameters with our neural method and human expert. Our neural method consumed 80 seconds for training two neural models described in sec.III-B and another 80 seconds for automatically optimizing manufacturing parameters described in sec.III-C. Once the models are trained, then our method only required 80 seconds for additional optimization. In contrast, human expert consumed 3 to 4 hours to obtain optimal manufacturing parameters because of a lot of trial & error tuning with TCAD simulations. Although these speeds are not the most optimized for both methods, we can roughly conclude that our neural method is about 100 times faster (2 to 3 minutes) than that of human expert (180 to 240 minutes). Our neural approach requires a thousand of training samples generated by TCAD simulations. However, this data generation is off-line process which can be done by random sampling without delicate grid point control and in parallel utilizing multi-cores within several sampling time. Because of this, data acquisition time is not usually significantly considered in machine-learning-related works. Similarly, there are several papers about neural approach for the optimization of devices and circuits which do not consider the data acquisition time either [16], [35], [36]. Even when considering data acquisition time, our neural method requires only several TCAD simulation time for generating whole training samples through parallel computing while human expert needs lots of sequential simulation time because of a bundle of sequential trial and error tuning processes. In our experiment, TCAD simulation time for a data sample required 1 minute and we used 3 parallel simulations only because of license limitation. Although this resulted in 333 minutes (about 5.5 hours) to obtain 1000 training samples, if we use more number of parallel simulations, data acquisition time can be reduced TABLE 9: Optimized manufacturing parameters by human expert with conventional MOSFET of symmetric source/drain Parameter Value (5% limiter) Value (full range) Lcon(s/d) 38 40 Lg 28.5 25.5 Lhaloj(s/d) 9.5 9.5 Lsdj(s/d) 23.75 23.8 Lsp(s/d) 19 20 Nch 0.9500 ×1018 0.7500 ×1018 Nhalo(s/d) 4.7500 ×1018 3.7500 ×1018 Nsd(s/d) 0.9500 ×1020 0.7500 ×1020 Tg 47.5 47.5 Thk 2.1 2.3 Tox 0.735 0.805 Ts/d(s/d) 9.5 9.5 14 VOLUME 4, 2016 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/. This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2020.3019933, IEEE Access Choi et al.: Neural Approach for Modeling and Optimizing Si-MOSFET Manufacturing TABLE 10: Parameters listed in descending order of sensitivity (dy) of symmetric source/drain structure: values with ’-’ sign are negatively correlated with PDP. (See appendix.A for the corresponding transitions of figure-of-merits) Parameter PDP sensitivity (dy) Lg 9.6238 ×10−17 Tox -3.7973 ×10−17 Thk -2.6635 ×10−17 Nch 8.5829 ×10−18 Nsd(s/d) 5.4069 ×10−18 Lsdj(s/d) -5.3249 ×10−18 Parameter PDP sensitivity (dy) Lcon(s/d) -4.8686 ×10−18 Lsp(s/d) 4.7325 ×10−18 Tg -4.5583 ×10−18 Lhaloj(s/d) 4.0593 ×10−18 Nhalo(s/d) 2.9812 ×10−18 Ts/d(s/d) 1.4272 ×10−18 down to 1 min (when 1000 parallel simulations are used), a single data sample time. IV. CONCLUSION In this paper, we proposed a new framework to automatically optimize manufacturing parameters of transistor by using neural networks. As the first step, our method trained two neural networks for delay and power models of a transistor utilizing a training data set from TCAD simulation. Then, an optimal set of manufacturing parameters that minimizes PDP was found automatically by using the neural network models and gradient descent method in desired ranges of parameters. We experimentally proved the superiority and efficiency of our neural approach compared to the previous Intel 32 nm node [8] and human expert’s MOSFET optimization in symmetric source/drain structure. Figure-of-merits of our method were significantly better than the TCAD simulated values with the reference structure [8], i.e., about 7% less PDP with 5% limiter and about 20% less PDP with full range limiter. Additionally, Figure-of-merits of our method were comparable to human expert’s optimization resulted via time-consuming trial & error with TCAD simulations, while our method only took a couple of minutes to do the same optimization. Moreover, it was possible to analyze the effect of each manufacturing parameter on figure-of-merits when our neural models were utilized. Without a huge amount of TCAD simulations, our neural models obtained sensitivities of parameters to PDP, delay, and power by predicting these figureof-merits for varying parameters. Then, we categorized and gave certain weights to manufacturing parameters regarding their sensitivities to PDP, delay, and power. For example, Lg , Tox , Thk are highly correlated to PDP but Lcond(s/d) and Tg are uncorrelated to any of PDP, delay, and power. Our neural approach is not limited to a specific device and matured technology but can be expanded to general devices and new technology as long as a number of input/output data are given because machine learning itself does not use any device physics considering device structure and scale. Even though we did not perform real device experiments but focused on simulated results as done in the machinelearning-based circuit designing method [16] which includes only simulated results without implementing real circuit device, based on our experience, the simulation results are rather correct and well matched to the real devices. There would be concerning about gathering training data from real factory. In real factory, for variation in unit manufacturing process, they cannot examine all wafer but some dies as representative samples. Our work is focusing on optimizing variation in unit manufacturing process with the varying range from min to max in table 1 and, in the same manner, we used 1000 data samples for machine learning which can also be obtained in real factory without heavy expense. For output data, it would be burdensome if we measure full Ids -Vgs and Cgg -Vgs data through gate voltage sweep, but several Ids and Cgg values at some specific gate voltages can be obtained in real factory with no heavy expense. For input data, process parameters can be obtained from the devices at the scribe line. Randomized structural parameters in Table I are the outcomes of the process parameters such as dose, temperature, and gas flow, but the accuracy of machine learning is not affected by the type of inputs. If there are some missing data, we can adopt generative adversarial network (GAN) as a realistic data generator to complete them. Our experiments demonstrated that PDP of transistor can be much lower to about 29% less than that of the reference structure, if asymmetric source/drain structure is adopted. In addition, PDP can be further improved with even wider limiter beyond Min and Max shown in table 1 because linear transitions of figure-of-merits for full range limiter exhibit monotonic functions (see figures in appendix.B and C). Since our neural approach is not limited to a specific figure-of-merit or scale of device, we can apply our method to a smaller device for better figure-of-merits and to control each figure-of-merit, such as lowering down the relatively high delay in asymmetric structure in table 7 by defining an appropriate objective function alternative to eq.3. There is a need to further analyze asymmetric structure and wider limiter for real application because they are not compatible with the previous 32 nm node with symmetric structure, and subsequently, there is an additional fabrication cost for asymmetric structure. This is remained as our future work. REFERENCES [1] A. Khakifirooz, O. M. Nayfeh, and D. Antoniadis, "A Simple Semiempirical Short-Channel MOSFET Current–Voltage Model Continuous Across All Regions of Operation and Employing Only Physical Parameters," IEEE Trans. Electron Devices, vol. 56, no. 8, pp. 1674-1680, Aug. 2009, doi: 10.1109/TED.2009.2024022. [2] BSIM Models: http://bsim.berkeley.edu/models/ [3] Predictive Technology Model: http://ptm.asu.edu/ [4] C. M. Bishop, Neural Networks for Pattern Recognition. Oxford University Press, 1995. [5] J. Schmidhuber, "Deep Learning in Neural Networks: An Overview," Neural Networks, vol. 61, pp. 85–117, 2015. [6] A. Ceyhan, J. Quijas, S. Jain, H.-Y Liu, W. E. Gifford, and S. Chakravarty, "Machine Learning-enhanced Multi-dimensional Co-Optimization of Sub10nm Technology Node Options," 2019 IEEE International Electron Devices Meeting (IEDM), San Francisco, CA, USA, 2019, pp. 36.6.136.6.4, doi: 10.1109/IEDM19573.2019.8993557. [7] Synopsys Inc., Mountain View, CA, Version N-2017.09, 2017. [8] S. Natarajan, M. Armstrong, M. Bost, R. Brain, M. Brazier, C-H Chang, V. Chikarmane, M. Childs, H. Deshpande,K. Dev, G. Ding, T. Ghani, O. Golonzka, W. Han, J. He*, R. Heussner, R. James, I. Jin, C. Kenyon, S. Klopcic, S-H. Lee, M. Liu, S. Lodha, B. McFadden, A. Murthy, L. 15 VOLUME 4, 2016 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/. This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2020.3019933, IEEE Access Choi et al.: Neural Approach for Modeling and Optimizing Si-MOSFET Manufacturing TABLE 11: Timing comparison of our neural approach and human expert: timing of our method (Neural or Neural sym) was measured with MATLAB script (not optimized) and timing of human expert was roughly measured in hour. N represents the number of parallel processes used in training data generation. Optimization method Neural or Neural sym Human expert Time consumption 1000 × 1 min (data acquisition) + 80 s (model training) + 80 s (optimizing) = 1000 + 2.7 min N N Neiberg, J. Neirynck, P. Packan, S. Pae, C. Parker, C. Pelto, L. Pipes,J. Sebastian, J. Seiple, B. Sell, S. Sivakumar, B. Song, K. Tone, T. Troeger, C. Weber, M. Yang, A. Yeoh, and K. Zhang "A 32 nm logic technology featuring 2nd-generation high-k+ metal-gate transistors, enhanced channel strain and 0.171µm2 SRAM cell size in a 291Mb array," 2008 IEEE International Electron Devices Meeting, San Francisco, CA, 2008, pp. 1-3, doi: 10.1109/IEDM.2008.4796777. [9] P. Werbos, "Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences,", PhD Thesis, Harvard University Press, 1975. [10] K. Hornik, M. Stinchcombe, and H. White, "Multilayer feedforward networks are universal approximators," Neural Networks, vol. 2, no. 5, pp. 359-366, 1989. [11] K. Funahashi, "On the approximate realization of continuous mappings by neural networks," Neural Networks, vol. 2, no. 3, pp. 183-192, 1989. [12] Yoshimura, Matsuoka, and Kakumu, "New CMOS shallow junction well FET structure (CMOS-SJET) for low power-supply voltage," 1992 International Technical Digest on Electron Devices Meeting, San Francisco, CA, USA, 1992, pp. 909-912, doi: 10.1109/IEDM.1992.307504. [13] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, "Learning representations by back-propagating errors," Nature, vol. 323, pp. 533–536, 1986. [14] G. Arfken, "The Method of Steepest Descents," in Mathematical Methods for Physicists, 3rd edition. Orlando, FL, Academic Press, pp. 428-436, 1985. [15] W. T. Vetterling, S. A. Teukolsky, W. H. Press, and B. P. Flannery, Numerical Recipes in FORTRAN: the art of scientific computing, 2nd edition. Cambridge, U.K.; Cambridge University Press, pp. 414, 1992. [16] G. Zhang, H. He, and D. Katabi, “Circuit-GNN: Graph Neural Networks for Distributed Circuit Design,” in International Conference on Machine Learning, pp. 7364-7373. 2019. [17] D. H. von Seggern, CRC Standard Curves and Surfaces With Mathematica, 2nd ed. London, U.K.: Chapman & Hall, 2006. [18] W. T. Vetterling, S. A. Teukolsky, W. H. Press, and B. P. Flannery, Numerical Recipes in FORTRAN: the art of scientific computing, 2nd edition. Cambridge, U.K.; Cambridge University Press, pp. 180-184, 1992. [19] D. P. Darcy and C. F. Kemerer, "The international technology roadmap for semiconductors," 2009. [20] C. Canali, G. Majni, R. Minder, and G. Ottaviani, "Electron and hole drift velocity measurements in silicon and their empirical relation to electric field and temperature," in IEEE Transactions on Electron Devices, vol. 22, no. 11, pp. 1045-1047, Nov. 1975, doi: 10.1109/T-ED.1975.18267. [21] M. H. Na, E. J. Nowak, W. Haensch, and J. Cai, "The effective drive current in CMOS inverters," Digest. International Electron Devices Meeting, San Francisco, CA, USA, 2002, pp. 121-124, doi: 10.1109/IEDM.2002.1175793. [22] G. Masetti, M. Severi, and S. Solmi, "Modeling of carrier mobility against carrier concentration in arsenic-, phosphorus-, and boron-doped silicon," in IEEE Transactions on Electron Devices, vol. 30, no. 7, pp. 764-769, July 1983, doi: 10.1109/T-ED.1983.21207. [23] C. Lombardi, S. Manzini, A. Saporito, and M. Vanzi, "A physically based mobility model for numerical simulation of nonplanar devices," in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 7, no. 11, pp. 1164-1171, Nov. 1988, doi: 10.1109/43.9186. [24] J. G. Fossum, "Computer-aided numerical analysis of silicon solar cells," Solid-State Electron., vol. 19, pp. 269-277, 1976. [25] L. Huldt, N. G. Nilsson, and K. G. Svantesson, “The temperature dependence of band-to-band Auger recombination in silicon,” Appl.Phys. Lett., vol. 35, no. 10, pp. 776–777, 1979. [26] G. A. M. Hurkx, D. B. M. Klaassen, and M. P. G. Knuvers, "A new recombination model for device simulation including tunneling," in IEEE Transactions on Electron Devices, vol. 39, no. 2, pp. 331-338, Feb. 1992, doi: 10.1109/16.121690. [27] K. Kuźniar and M. Zajac, ˛ "Some methods of pre-processing input data for neural networks," Computer Assisted Methods in Engineering and Science, vol. 22, no. 2, pp. 141–151, jan. 2017. 3 to 4 hours [28] J.-S. Yoon, T. Rim, J. Kim, M. Meyyappan, C.-K. Baek, and Y.-H. Jeong, "Vertical gate-all-around junctionless nanowire transistors with asymmetric diameters and underlap lengths," Appl. Phys. Lett., vol. 105, no. 10, p. 102105, 2014. [29] G. C. Patil and S. Qureshi, "Asymmetric drain underlap dopant-segregated Schottky barrier ultrathin-body SOI MOSFET for low-power mixed-signal circuits," Semicond. Sci. Technol., vol. 28, no. 4, pp. 045002-1-045002-9, Feb. 2013. [30] A. Goel, S. Gupta, A. Bansal, M. Chiang, and K. Roy, "Double-gate MOSFETs with aymmetric drain underlap: A device-circuit co-design and optimization perspective for SRAM," 2009 Device Research Conference, University Park, PA, 2009, pp. 57-58, doi: 10.1109/DRC.2009.5354884. [31] R. Pal, M. Togo, Y. Yong, L. Vanamurthy, S. Muralidharan, X. Zhang, R. Carter, M. Eller, and S. Samavedam, "Variation improvement for manufacturable FINFET technology," 2015 IEEE International Electron Devices Meeting (IEDM), Washington, DC, 2015, pp. 21.2.1-21.2.4, doi: 10.1109/IEDM.2015.7409748. [32] J. K. Lorenz, A. Asenov, E. Baer, S. Barraud, F. Kluepfel, C. Millar, and M. Nedjalkov, “Process variability for devices at and beyond the 7 nm node,” ECS J. Solid State Sci. Technol., vol. 7, no. 7, pp. 595–601, Jan. 2018, doi: 10.1149/2.0051811jss. [33] J. Zhuge, R. Wang, R. Huang, J. Zou, X. Huang, D.-W. Kim, D. Park, X. Zhang, and Y. Wang, "Experimental investigation and design optimization guidelines of characteristic variability in silicon nanowire CMOS technology," 2009 IEEE International Electron Devices Meeting (IEDM), Baltimore, MD, 2009, pp. 1-4, doi: 10.1109/IEDM.2009.5424421. [34] S. Mudanai, "Overcoming variation challenge," IEDM Short Course, Dec. 2018. [35] F. Feng, V. Gongal-Reddy, C. Zhang, J. Ma, and Q. Zhang, "Parametric Modeling of Microwave Components Using Adjoint Neural Networks and Pole-Residue Transfer Functions With EM Sensitivity Analysis," in IEEE Transactions on Microwave Theory and Techniques, vol. 65, no. 6, pp. 1955-1975, June 2017, doi: 10.1109/TMTT.2017.2650904. [36] C. Zhang, J. Jin, W. Na, Q. Zhang, and M. Yu, "Multivalued Neural Network Inverse Modeling and Applications to Microwave Filters," in IEEE Transactions on Microwave Theory and Techniques, vol. 66, no. 8, pp. 3781-3797, Aug. 2018, doi: 10.1109/TMTT.2018.2841889. HYUN-CHUL CHOI received the B.S. degree in Electronic and Electrical engineering from Korea Advanced Institute of Science and Technology (KAIST), Korea, in 2002 and the M.S. and the Ph.D. degrees in Electronic and Electrical engineering from Pohang University of Science and Technology (POSTECH), Korea, in 2005 and 2011 respectively. From 2011 to 2014, he was a Research Scientist with Multimedia Development Laboratory of Daum Communications. He had been an Assistant Professor with the Electronic Engineering Department, Yeungnam University, Korea from 2014 to 2020 and has been an Associate Professor from 2020. His research interests include multimedia retrieval, computational photography, 3D photogrammetry, object tracking and recognition based on machine learning and neural networks. 16 VOLUME 4, 2016 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/. This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2020.3019933, IEEE Access Choi et al.: Neural Approach for Modeling and Optimizing Si-MOSFET Manufacturing HYEOK YUN received the B.S. degree in electrical engineering from Pohang University of Science and Technology (POSTECH), Republic of Korea, in 2019. He is currently pursuing the M.S. degree in electrical engineering from POSTECH. His research interests include variability of multigate field-effect transistors (FinFETs, nanowire FETs, and nanosheet FETs) and machine learning. JUN-SIK YOON (S’14-M’16) received the B.S. degree in electrical engineering and Ph.D. degree in creative IT engineering from Pohang University of Science and Technology (POSTECH), Republic of Korea, in 2012 and 2016, respectively. He was a postdoctoral research fellow in POSTECH (20162018). From 2019, he is a research assistant professor in electrical engineering from POSTECH. His research interests include characterization and simulation of advanced nanoscale devices (fin, gate-all-around, tunneling, nanosheet FETs) and applications (chemical sensor, solar cell). ROCK-HYUN BAEK (S’08-M’11) received the B.S. degree in electrical engineering from Korea University, and the M.S. and Ph.D. degrees in electrical engineering from the Pohang University of Science and Technology (POSTECH), Republic of Korea, in 2004, 2006, and 2011, respectively. He was a Postdoctoral Researcher and Technical Engineer with SEMATECH in Albany, NY (2011–2015). He was a Senior Device Engineer with SAMSUNG Research and Development Center (Pathfinding TEAM), Korea (2015–2017). Since 2017, he has been an Assistant Professor in electrical engineering with POSTECH, Pohang, Korea. His research interests include technology benchmark by characterization, simulation, and modeling of advanced devices and materials (fin, gateall-around, nanosheet FETs, 3D-NAND, 3DIC, SiGe, Ge, III-V, etc). 17 VOLUME 4, 2016 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/. This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2020.3019933, IEEE Access Choi et al.: Neural Approach for Modeling and Optimizing Si-MOSFET Manufacturing APPENDIX. TRANSITIONS OF FIGURE-OF-MERITS A. SYMMETRIC STRUCTURE AND 5% LIMITER 1 0 1 0 39 40 41 42 0 26 28 30 x=Lcon(s/d) 10 2 10.5 pdp vs N halo(s/d) 10 -16 10 -18 3 2 10 -16 3 2 1 0 0 0 0 11 12 10 -16 10 -17 dy = -3.7973 y=pdp 2 5 5.5 1 6 8 9 10 21 10 25.5 26 19 19.5 11 12 pdp vs Tg 20.5 21 pdp vs Thk 10 -16 dy = -4.5583 10 -18 10 -17 dy = -2.6635 4 3 2 1 0 48 49 10 22 x=N sd(s/d) 20 x=Lsp(s/d) 50 51 52 1.8 2 x=T g 2.2 10 -3 x=T hk pdp vs Ts/d(s/d) 10 -18 dy = 1.4272 4 3 4.5 x=N halo(s/d) pdp vs Tox 10 -16 4 4 10 20 25 2 1 10 24.5 3 1 x=N ch 0 4 1 9 2 x=Lsdj(s/d) 10 -18 10 -18 3 1 24 pdp vs N sd(s/d) dy = 5.4069 4 pdp vs Lsp(s/d) dy = 4.7325 4 0 9.5 y=pdp 3 2 x=Lhaloj(s/d) dy = 2.9812 4 y=pdp y=pdp 10 -16 10 -18 dy = 8.5829 8 y=pdp 34 x=Lg pdp vs N ch 10 -16 4 32 3 1 y=pdp 38 2 10 -16 10 -18 y=pdp 2 4 3 pdp vs Lsdj(s/d) dy = -5.3249 y=pdp optimum -5% bounds +5% bounds 1 10 -16 10 -18 dy = 4.0593 4 3 pdp vs Lhaloj(s/d) y=pdp y=pdp y=FPDP(x) 10 -16 10 -17 dy = 9.6238 4 3 2 pdp vs Lg 10 -16 10 -18 dy = -4.8686 4 y=pdp pdp vs Lcon(s/d) y=pdp 10 -16 3 2 1 0 0 6 6.5 7 7.5 8 9.5 10 10 -4 x=T ox 10.5 10 -3 x=T s/d(s/d) FIGURE 17: PDP transition for varying manufacturing parameter around the optimal value of 5% limiter and symmetric source/drain structure optimum -5% bounds +5% bounds 38 39 40 0 41 42 28 30 10 -12 10 -14 1 delay vs Nhalo(s/d) 9 10 11 1.5 1 10 -12 10 -12 10 -14 y=delay 1 0.5 5 5.5 6 1.5 1 25 25.5 26 delay vs Tg dy = -2.6953 10 x=N sd(s/d) 11 19.5 2 10 -12 10 -15 1.5 1 12 10 22 20 20.5 21 delay vs Thk dy = 2.1652 2 10 -14 1.5 1 0.5 0 9 19 x=Lsp(s/d) 0.5 8 10 21 24.5 0 48 49 50 x=T g 51 52 1.8 2 x=T hk 2.2 10 -3 delay vs Ts/d(s/d) dy = 9.2618 2 1.5 4.5 x=N halo(s/d) delay vs Tox dy = 4.8036 2 4 0 10 -12 10 -14 0 12 1 0.5 24 delay vs Nsd(s/d) 10 -14 1.5 x=Lsdj(s/d) 0.5 10 20 x=N ch 10.5 dy = -2.6213 2 0 8 10 10 -12 10 -14 0.5 0 1 delay vs Lsp(s/d) dy = 1.4137 2 0 9.5 y=delay 1.5 1.5 x=Lhaloj(s/d) dy = 4.6125 2 y=delay y=delay delay vs Nch 0.5 y=delay 34 x=Lg dy = -3.0722 2 32 10 -12 10 -14 0.5 0 26 x=Lcon(s/d) 10 -12 1 0.5 delay vs Lsdj(s/d) dy = -5.0382 2 1.5 y=delay 0 1 0.5 10 -12 10 -15 y=delay y=Fdelay (x) dy = 4.13 2 1.5 delay vs Lhaloj(s/d) y=delay 1 0.5 10 -12 10 -13 y=delay 2 1.5 delay vs Lg dy = 1.3915 y=delay 10 -12 10 -15 dy = -2.3782 2 y=delay delay vs Lcon(s/d) y=delay 10 -12 10 -15 1.5 1 0.5 0 0 6 6.5 7 x=T ox 7.5 8 10 -4 9.5 10 x=T s/d(s/d) 10.5 10 -3 FIGURE 18: Delay transition for varying manufacturing parameter around the optimal value of 5% limiter and symmetric source/drain structure 18 VOLUME 4, 2016 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/. This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2020.3019933, IEEE Access Choi et al.: Neural Approach for Modeling and Optimizing Si-MOSFET Manufacturing y=Fpower (x) optimum -5% bounds +5% bounds 0 39 1 41 42 28 30 32 34 1 0 9.5 10 10.5 10 -07 dy = 9.0688 2 1 0 24 24.5 25 25.5 26 19 19.5 20 20.5 x=Lsdj(s/d) x=Lsp(s/d) power vs N ch -4 power vs N halo(s/d) -4 power vs N sd(s/d) power vs T g power vs T hk y=power 10 10 -06 2 1 0 9 10 11 4 10 20 dy = -3.6069 4.5 5 5.5 10 10 -05 y=power 1 0 2 dy = -2.7868 1 10 -4 10 -06 2 1 0 8 9 10 x=N sd(s/d) 11 12 10 22 dy = -2.3005 21 10 -05 2 1 0 48 49 50 x=T g 51 52 1.8 2 x=T hk 2.2 10 -3 -4 power vs T s/d(s/d) dy = -5.9577 2 6 10 21 x=N halo(s/d) power vs T ox 10 -4 10 -06 0 12 x=N ch dy = 8.2887 y=power dy = -5.8396 y=power 10 10 -05 0 10 1 2 x=Lhaloj(s/d) 1 -4 2 power vs L sp(s/d) 10 -4 10 -06 dy = 4.8184 0 26 power vs L sdj(s/d) x=Lg 2 8 10 -4 10 -06 x=Lcon(s/d) dy = 1.149 y=power 2 y=power 10 -4 40 power vs L haloj(s/d) dy = 2.1773 0 38 y=power 10 -4 10 -05 y=power 2 1 power vs L g dy = 4.3252 y=power y=power 10 -4 10 -06 dy = -3.0649 y=power power vs L con(s/d) y=power 10 -4 10 -07 2 1 0 6 6.5 7 x=T ox 7.5 8 10 -4 9.5 10 x=T s/d(s/d) 10.5 10 -3 FIGURE 19: Power transition for varying manufacturing parameter around the optimal value of 5% limiter and symmetric source/drain structure 19 VOLUME 4, 2016 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/. This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2020.3019933, IEEE Access Choi et al.: Neural Approach for Modeling and Optimizing Si-MOSFET Manufacturing B. ASYMMETRIC STRUCTURE AND FULL RANGE LIMITER 38 39 40 1 1 0 41 42 39 40 10 -16 10 -18 3 2 pdp vs Lsdj(s) 10 -16 10 -17 30 32 34 2 3 2 0 0 0 0 24 24.5 26 19 10 -18 y=pdp 1 10 -16 10 -17 dy = -1.8368 4 2 pdp vs N halo(s) 3 2 1 0 5.5 x=N halo(d) 10 -16 4 4 4.5 10 18 10 -16 10 -17 2 5.5 10 18 2 0 0 2.2 0.6 0.65 0.7 x=T ox 0.75 0.8 2 y=pdp 8 9 10 11 pdp vs N sd(s) 10 -16 10 -17 pdp vs Tg dy = 2.0643 4 12 10 17 x=N ch 2 10 -18 3 2 1 0 8 10 19 9 10 11 x=N sd(s) 10 -16 10 -18 10 -18 3 21 3 12 12 10 19 48 49 50 51 52 x=T g pdp vs Ts/d(s) dy = -8.9216 4 2 0 2 11 3 1 20.5 dy = 3.7011 4 pdp vs Ts/d(d) dy = 3.8077 4 1 x=T hk 10 x=N sd(d) 10 -16 3 20 0 9 pdp vs N ch dy = -6.8628 4 1 8 1 1.8 19.5 10 -16 10 -17 10.5 0 19 pdp vs N sd(d) 2 6 10 -17 10 -16 10 -17 x=Lsp(s) 3 pdp vs Tox dy = -2.1821 4 y=pdp 3 5 x=N halo(s) pdp vs Thk dy = -1.3483 21 0 6 y=pdp 5 20.5 1 0 4.5 20 dy = -1.1779 4 10 1 x=Lsp(d) y=pdp 10 -16 3 4 19.5 y=pdp pdp vs N halo(d) dy = 8.363 4 25.5 y=pdp 10 -16 25 x=Lsdj(s) pdp vs Lsp(s) 2 1 26 9.5 x=Lhaloj(s) 3 1 25.5 10.5 dy = -1.7696 4 1 25 0 10 10 -16 10 -18 1 24.5 2 1 9.5 pdp vs Lsp(d) 10 -18 3 x=Lhaloj(d) dy = 2.2025 4 3 x=Lsdj(d) y=pdp 28 x=Lg dy = 2.0045 4 y=pdp y=pdp pdp vs Lsdj(d) 2 pdp vs Lhaloj(s) dy = -3.2007 4 3 0 26 x=Lcon(s) dy = -5.8985 24 y=pdp 42 y=pdp 10 -16 41 10 -16 10 -18 1 0 38 x=Lcon(d) 4 2 y=pdp 0 2 4 3 pdp vs Lhaloj(d) dy = 2.8912 y=pdp optimum -5% bounds +5% bounds 4 3 10 -16 10 -17 dy = 5.833 y=pdp y=FPDP(x) 1 pdp vs Lg 10 -16 10 -19 y=pdp 2 pdp vs Lcon(s) dy = 2.617 4 3 y=pdp y=pdp 4 10 -16 10 -19 dy = -8.9053 y=pdp pdp vs Lcon(d) 10 -16 10 -18 3 2 1 0 9.5 10 x=T s/d(d) 10.5 9.5 10 10.5 x=T s/d(s) FIGURE 20: PDP transition for varying manufacturing parameter around the optimal value of full range limiter and asymmetric source/drain structure 20 VOLUME 4, 2016 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/. This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2020.3019933, IEEE Access Choi et al.: Neural Approach for Modeling and Optimizing Si-MOSFET Manufacturing 0.5 0 0.5 40 41 42 39 40 42 0.5 10 -12 10 -13 1.5 1 0.5 0 25 25.5 24 24.5 10 -12 10 -14 25.5 10 -13 1 0.5 0 5 5.5 4 1.5 1 0.5 8 10 -12 1 2.2 11 0.65 0.7 0.75 8 9 10 10 -12 10 -13 10 17 delay vs Tg 10 -17 dy = 9.4408 2 1.5 1 0 9 10 11 12 10 -12 49 50 51 52 x=T g delay vs Ts/d(s) 10 -13 dy = 1.5153 2 48 10 19 x=N sd(s) 10 -14 12 0.5 8 delay vs Ts/d(d) 11 x=N ch delay vs Nsd(s) 10 19 1 0.8 1 21 1 1.5 1 0.5 0 0.6 20.5 1.5 12 0.5 x=T hk 20 dy = -5.1099 2 1.5 0 2 10 dy = 2.7258 2 0.5 1.8 9 x=N sd(d) 10 -16 1.5 0 6 1.5 0 10 -12 10 -14 10 -13 0 19.5 0.5 10 18 dy = -2.4486 2 y=delay 2 5.5 delay vs Nch 0.5 19 delay vs Nsd(d) 10.5 dy = 1.8355 2 x=Lsp(s) 1 delay vs Tox 10 -12 10 -14 dy = 4.2984 5 x=N halo(s) delay vs Thk 10 -12 4.5 10 -12 10 -13 1 0 6 10 18 x=N halo(d) delay vs Lsp(s) 1.5 21 0.5 y=delay 4.5 20.5 1.5 0 4 20 dy = -6.0569 2 10 x=Lhaloj(s) 0 19.5 10 -12 1.5 9.5 0.5 19 delay vs Nhalo(s) 10.5 dy = 3.4838 2 1 26 y=delay y=delay 1 0.5 10 -12 10 -14 x=Lsp(d) dy = 3.9259 2 1.5 10 x=Lhaloj(d) delay vs Lsp(d) x=Lsdj(s) dy = 2.9866 2 25 0 9.5 0 26 delay vs Nhalo(d) 34 0.5 x=Lsdj(d) 10 -12 32 1.5 0 24.5 30 dy = 1.1252 2 y=delay 1 delay vs Lsdj(s) 1 0.5 x=Lg dy = -4.6585 2 1.5 24 28 10 -14 1.5 0 26 y=delay 10 -12 10 -14 y=delay y=delay 41 x=Lcon(s) dy = -3.3448 1 0 38 delay vs Lsdj(d) 10 -12 2 1.5 delay vs Lhaloj(s) dy = 8.399 2 0.5 y=delay 39 x=Lcon(d) y=delay 1 0 38 y=delay 1.5 10 -12 10 -15 y=delay 1 delay vs Lhaloj(d) dy = 7.4521 2 y=delay 0.5 2 1.5 10 -12 10 -13 y=delay optimum -5% bounds +5% bounds delay vs Lg dy = 3.4514 y=delay y=Fdelay (x) 1 10 -12 10 -15 y=delay 1.5 delay vs Lcon(s) dy = -6.849 2 y=delay y=delay 2 10 -12 10 -14 dy = 1.458 y=delay delay vs Lcon(d) 10 -12 0 9.5 10 x=T ox 10.5 9.5 10 x=T s/d(d) 10.5 x=T s/d(s) FIGURE 21: Delay transition for varying manufacturing parameter around the optimal value of full range limiter and asymmetric source/drain structure 0 1 0 40 41 42 0 38 39 power vs L sdj(d) 10 -4 10 -07 y=power 2 1 24.5 power vs L sdj(s) 25 10 -4 25.5 1 26 24.5 25 25.5 26 10 0 10 5 5.5 2 1 10 -4 4 10 -4 10 -06 1 0 5.5 2 x=T hk 2.2 -4 8 10 11 10 -4 0.7 x=T ox 19.5 10 0.75 0.8 20 20.5 dy = 2.5153 power vs N ch 2 1 21 -4 8 power vs N sd(s) 10 9 10 11 12 10 17 power vs T g dy = 1.0191 1 10 -06 2 1 0 8 10 19 9 10 11 12 10 19 x=N sd(s) 10 -4 10 -07 48 49 50 51 52 x=T g power vs T s/d(s) dy = -1.4579 1 -4 10 -05 2 12 2 10 -05 dy = -1.5816 x=N ch dy = 6.4157 power vs T s/d(d) 10.5 0 19 10 -06 0 0.65 10 -4 10 -05 1 power vs N sd(d) 9 10 x=Lsp(s) x=N sd(d) 10 -05 1 0.6 power vs L sp(s) 0 6 2 9.5 x=Lhaloj(s) 2 21 1 10 18 0 1.8 20.5 2 power vs T ox dy = -1.0816 y=power 2 5 x=N halo(s) power vs T hk dy = -9.468 4.5 10.5 dy = -3.5641 0 6 10 18 x=N halo(d) 20 dy = -2.1501 y=power 4.5 19.5 10 -05 0 4 10 -4 x=Lsp(d) dy = -4.0321 y=power 1 10 0 19 1 0 9.5 10 -07 1 10 -06 2 x=Lhaloj(d) 2 -4 power vs N halo(s) 10 -06 2 34 power vs L sp(d) x=Lsdj(s) dy = 2.3271 32 0 24 -4 power vs N halo(d) 30 dy = 4.16 y=power 10 28 10 -05 2 x=Lsdj(d) 1 0 26 0 24 y=power 42 power vs L haloj(s) dy = -6.9062 2 x=Lg dy = 4.8424 0 y=power 41 y=power dy = -9.0664 y=power 40 x=Lcon(s) y=power x=Lcon(d) 10 -4 1 y=power 39 2 10 -4 10 -07 dy = 9.8468 y=power 38 power vs L haloj(d) y=power 2 10 -4 10 -06 y=power optimum -5% bounds +5% bounds power vs L g dy = 7.1119 y=power y=Fpower (x) 10 -4 10 -07 y=power 2 1 power vs L con(s) dy = 5.4265 y=power y=power 10 -4 10 -06 dy = -1.3261 y=power power vs L con(d) 10 -4 10 -05 2 1 0 9.5 10 x=T s/d(d) 10.5 9.5 10 10.5 x=T s/d(s) FIGURE 22: Power transition for varying manufacturing parameter around the optimal value of full range limiter and asymmetric source/drain structure 21 VOLUME 4, 2016 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/. This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2020.3019933, IEEE Access Choi et al.: Neural Approach for Modeling and Optimizing Si-MOSFET Manufacturing C. SYMMETRIC STRUCTURE AND FULL RANGE LIMITER 1 0 1 0 39 40 41 42 0 26 28 30 x=Lcon(s/d) 10 2 10.5 pdp vs N halo(s/d) 10 -16 10 -18 3 2 10 -16 3 2 1 0 0 0 0 11 12 10 -16 10 -17 dy = -3.3479 y=pdp 2 5 5.5 1 6 8 9 10 21 10 25.5 26 19 19.5 11 pdp vs Tg 12 20.5 21 pdp vs Thk 10 -16 10 -18 dy = -4.9389 10 -17 dy = -2.3419 4 3 2 1 0 48 49 10 22 x=N sd(s/d) 20 x=Lsp(s/d) 50 51 52 1.8 2 x=T g 2.2 10 -3 x=T hk pdp vs Ts/d(s/d) 10 -18 dy = 3.1733 4 3 4.5 x=N halo(s/d) pdp vs Tox 10 -16 4 4 10 20 25 2 1 10 24.5 3 1 x=N ch 0 4 1 9 2 x=Lsdj(s/d) 10 -17 10 -18 3 1 24 pdp vs N sd(s/d) dy = 1.1964 4 pdp vs Lsp(s/d) dy = 7.6336 4 0 9.5 y=pdp 3 2 x=Lhaloj(s/d) dy = 6.7781 4 y=pdp y=pdp 10 -16 10 -18 dy = 8.3441 8 y=pdp 34 x=Lg pdp vs N ch 10 -16 4 32 3 1 y=pdp 38 2 10 -16 10 -17 y=pdp 2 4 3 pdp vs Lsdj(s/d) dy = -1.0839 y=pdp optimum -5% bounds +5% bounds 10 -16 10 -18 dy = 6.5552 4 3 pdp vs Lhaloj(s/d) y=pdp y=pdp y=FPDP(x) 1 10 -16 10 -17 dy = 9.4324 4 3 2 pdp vs Lg 10 -16 10 -18 dy = -3.3914 4 y=pdp pdp vs Lcon(s/d) y=pdp 10 -16 3 2 1 0 0 6 6.5 7 7.5 8 9.5 10 10 -4 x=T ox 10.5 10 -3 x=T s/d(s/d) FIGURE 23: PDP transition for varying manufacturing parameter around the optimal value of full range limiter and symmetric source/drain structure optimum -5% bounds +5% bounds 38 39 40 0 41 42 28 30 10 -12 10 -14 1 delay vs Nhalo(s/d) 10 -12 10 -14 0 1.5 1 10 11 10 -12 10 -12 10 -14 y=delay 1 0.5 5 5.5 6 1.5 1 25 25.5 26 delay vs Tg dy = -1.2668 10 x=N sd(s/d) 19.5 2 10 -12 10 -15 1.5 1 11 12 10 22 20 20.5 21 delay vs Thk dy = 2.2993 2 10 -14 1.5 1 0.5 0 9 19 x=Lsp(s/d) 0.5 8 10 21 24.5 0 48 49 50 x=T g 51 52 1.8 2 x=T hk 2.2 10 -3 delay vs Ts/d(s/d) dy = 1.2539 2 1.5 4.5 x=N halo(s/d) delay vs Tox dy = 5.3032 2 4 0 10 -12 0 12 10 20 x=N ch 1 0.5 24 10 -14 10 -14 1.5 x=Lsdj(s/d) 0.5 0 9 10.5 delay vs Nsd(s/d) dy = 3.9025 2 0.5 8 10 delay vs Lsp(s/d) dy = -2.1134 2 0 9.5 y=delay 1.5 1 x=Lhaloj(s/d) dy = 5.9283 2 y=delay y=delay delay vs Nch 0.5 y=delay 34 x=Lg dy = -3.5379 2 32 1.5 0.5 0 26 x=Lcon(s/d) 10 -12 1 0.5 y=delay 0 1 0.5 10 -12 10 -15 dy = 9.9258 2 1.5 delay vs Lsdj(s/d) y=delay y=Fdelay (x) 10 -12 10 -15 dy = 1.3708 2 1.5 delay vs Lhaloj(s/d) y=delay 1 0.5 10 -12 10 -13 y=delay 2 1.5 delay vs Lg dy = 1.4561 y=delay 10 -12 10 -15 dy = 4.8083 2 y=delay delay vs Lcon(s/d) y=delay 10 -12 10 -15 1.5 1 0.5 0 0 6 6.5 7 x=T ox 7.5 8 10 -4 9.5 10 x=T s/d(s/d) 10.5 10 -3 FIGURE 24: Delay transition for varying manufacturing parameter around the optimal value of full range limiter and symmetric source/drain structure 22 VOLUME 4, 2016 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/. This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2020.3019933, IEEE Access Choi et al.: Neural Approach for Modeling and Optimizing Si-MOSFET Manufacturing y=Fpower (x) optimum -5% bounds +5% bounds 0 39 1 41 42 1 28 30 32 34 2 1 0 9.5 10 10.5 10 -06 dy = 8.7667 2 1 0 24 24.5 25 25.5 26 19 19.5 20 20.5 x=Lsdj(s/d) x=Lsp(s/d) power vs N ch -4 power vs N halo(s/d) -4 power vs N sd(s/d) power vs T g power vs T hk 2 1 0 9 10 11 4 10 20 dy = -3.3257 4.5 5 5.5 10 10 -05 y=power 1 0 2 dy = -3.3708 1 10 -4 10 -06 2 1 0 8 9 10 x=N sd(s/d) 11 12 10 22 dy = -2.0639 21 10 -05 2 1 0 48 49 50 x=T g 51 52 1.8 2 x=T hk 2.2 10 -3 -4 power vs T s/d(s/d) dy = 2.1003 2 6 10 21 x=N halo(s/d) power vs T ox 10 -4 10 -06 0 12 x=N ch dy = 2.7522 y=power y=power 10 10 -06 dy = -3.7648 y=power 10 10 -05 0 10 2 power vs L sp(s/d) 10 -4 10 -06 x=Lhaloj(s/d) 1 -4 dy = -9.3457 0 26 power vs L sdj(s/d) x=Lg 2 8 10 -4 10 -06 x=Lcon(s/d) dy = 1.1543 y=power 2 y=power 10 -4 40 power vs L haloj(s/d) dy = 4.5198 0 38 y=power 10 -4 10 -05 y=power 2 1 power vs L g dy = 4.2202 y=power y=power 10 -4 10 -06 dy = -3.1715 y=power power vs L con(s/d) y=power 10 -4 10 -06 2 1 0 6 6.5 7 x=T ox 7.5 8 10 -4 9.5 10 x=T s/d(s/d) 10.5 10 -3 FIGURE 25: Power transition for varying manufacturing parameter around the optimal value of full range limiter and symmetric source/drain structure 23 VOLUME 4, 2016 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/.