Extraction of Interconnect Parasitic Capacitance Matrix Based on Deep Neural Network

Yaoyao Ma 1,2,3, Xiaoyu Xu 4, Shuai Yan 4, Yaxing Zhou 4, Tianyu Zheng 4, Zhuoxiang Ren 4,5,* and Lan Chen 1,3

1 Institute of Microelectronics of the Chinese Academy of Sciences, Beijing 100029, China
2 University of Chinese Academy of Sciences, Beijing 100049, China
3 Beijing Key Laboratory of Three-Dimensional and Nanometer Integrated Circuit Design Automation Technology, Beijing 100029, China
4 Institute of Electrical Engineering, Chinese Academy of Sciences, Beijing 100190, China
5 Group of Electrical and Electronic Engineering of Paris, Sorbonne Université, Université Paris-Saclay, CentraleSupélec, CNRS, 75005 Paris, France
* Correspondence: zhuoxiang.ren@upmc.fr

Abstract: Interconnect parasitic capacitance extraction is crucial in analyzing the delay and crosstalk of VLSI circuits. This paper uses a deep neural network (DNN) to predict the parasitic capacitance matrix of a two-dimensional pattern. To save DNN training time, the neural network's output includes only the coupling capacitances in the matrix, and the total capacitances are obtained by summing the corresponding predicted coupling capacitances. In this way, coupling and total capacitances are obtained simultaneously using a single neural network. Moreover, we introduce a mirror flip method to augment the datasets computed by the finite element method (FEM), which doubles the dataset size and reduces the data preparation effort. We then compare the prediction accuracy of the DNN with that of another neural network, ResNet; the result shows that the DNN performs better in this case. Furthermore, to verify the method's efficiency, the total capacitances calculated from the trained DNN are compared with a network (named DNN-2) that takes the total capacitances as an extra output. The prediction accuracy of the two methods is very close, indicating that our method is reliable and saves the training workload for the total capacitance. Finally, a solving efficiency comparison shows that the average computation time of the trained DNN for one case is no more than 2% of that of the FEM.

Keywords: interconnect wire; parasitic capacitance matrix; data augmentation; DNN; ResNet

1. Introduction

With the shrinking of the feature size, the interconnect wires' delay caused by parasitic parameters has exceeded the gate delay and has become the main part of the chip's total delay [1]. In addition, the parasitic parameters of interconnect wires cause crosstalk between wires, which may corrupt the transmitted signal logic. Parasitic parameters consist of resistance, capacitance, and inductance; among them, parasitic capacitance plays a significant role owing to its extraction complexity and critical influence on signal integrity analysis [2,3].
In advanced process technology nodes, the accuracy requirement for parasitic capacitance extraction is a relative error of less than 5% [4]. Parasitic capacitance extraction methods can be divided into two categories: field solver methods and pattern-matching methods [5]. The field solver method solves Maxwell's equations through traditional numerical methods, such as the floating random walk (FRW) [6], the finite element method (FEM) [7], and the boundary element method (BEM) [8,9]. It has high accuracy but suffers from high calculation costs for full-chip extraction. The pattern-matching method first segments the layout and obtains the interconnect wires' patterns; it then matches the corresponding pattern in the pattern library; in the end, it calculates each pattern's capacitance through pre-built capacitance models stored in the pattern library. It is fast and is often used for parasitic extraction at the full-chip level. Figure 1 shows the flow of the pattern-matching method.

The capacitance extraction dimension has changed from 1D, 2D, and 2.5D to 3D, and the extraction accuracy is improved accordingly [10]. Directly using the 3D model causes huge memory and time costs for full-chip extraction [11]. The 2.5D method is an alleviation of the 3D method, which considers the 3D effect by a combination of two orthogonal 2D patterns [12]. It achieves a good balance between accuracy and efficiency and is used in several commercial electronic design automation (EDA) tools, such as StarRC [13] and Calibre xRC [14].

Figure 1. The flow of parasitic capacitance extraction using the pattern-matching method [3,15].

With the continuous development of big data technology and high-performance computing, machine learning and deep learning have been applied to many fields [16]. In addition to common fields such as image analysis and natural language processing [17], they have also been implemented in the field of electronic and electrical engineering [18–20]. Some researchers have used machine learning and deep learning to solve the problem of parasitic parameter extraction. Kasai et al. [21] employed a multilayer perceptron (MLP) to extract the total capacitance and partial coupling capacitances of 3D interconnect wires.
The output dimension of the MLP is one; hence, multiple neural networks must be trained to predict the total capacitance and the coupling capacitances of the same pattern. Besides, its prediction of the total capacitance was not very accurate, since some errors exceeded 10%. Li et al. [22] combined adjacent capacitance formulas through the Levenberg–Marquardt least square method, reducing the number of capacitance formulas in the pattern library. They also constructed a classifier for automatic pattern matching through a deep neural network (DNN), which improved pattern-matching efficiency. Yang et al. [11] leveraged a convolutional neural network (CNN), ResNet-34, to extract parasitic capacitances of 2D interconnect wires. For the total capacitance and the coupling capacitances, that work trains two separate ResNet networks for prediction. Furthermore, it also compared the prediction performance of the CNN with the MLP, showing that the CNN's accuracy is higher. Abouelyazid et al. [4] focused on interconnect patterns containing metal connectivity and trained a DNN model to predict one specific coupling capacitance. Abouelyazid et al. [5] took the trapezoidal shape, wire thickness, dielectric layer thickness, etc., into the model variation and employed a DNN and support vector regression (SVR), respectively, to predict one specific coupling capacitance of 2D patterns. Abouelyazid et al. [23] employed a DNN to predict the parasitic capacitance matrix. For the total capacitances and coupling capacitances in the matrix, that work utilized two separate DNN networks for prediction. Moreover, a hybrid extraction flow was proposed.
It can identify the accuracy of three extraction methods, field-solver, rule-based, and DNN-based, and choose the fastest extraction method that meets the user's required accuracy.

In the parasitic capacitance matrix, the total capacitance on the matrix diagonal equals the sum of the remaining coupling capacitances in the same row. Using this constraint, we train only one DNN model to predict the capacitance matrix in this study. The neural network's output includes only the coupling capacitances in the capacitance matrix, and the total capacitances are obtained by summing the corresponding predicted coupling capacitances. The trained DNN model can be used as a capacitance model in the pattern-matching method shown in Figure 1. The main contributions of this work are as follows. First, we propose a mirror flip method to realize data augmentation. Second, we use the DNN to predict the parasitic capacitance matrix. The neural network output includes merely the coupling capacitances in the matrix; the total capacitance is obtained by calculating the sum of the coupling capacitances. Table 1 shows the comparison of our work with other state-of-the-art works.

Table 1. The comparison among different capacitance extraction methods, including our works.

| Property | Abouelyazid et al. [4] | Abouelyazid et al. [5] | Kasai et al. [21] | Yang et al. [11] | Abouelyazid et al. [23] | This Work |
|---|---|---|---|---|---|---|
| Prediction content | One coupling capacitance | One coupling capacitance | One capacitance | One total capacitance and its corresponding coupling capacitances | Capacitance matrix | Capacitance matrix |
| Pattern dimension | 2D | 2D | 3D | 2D | 3D | 2D |
| Data augmentation | No | No | No | No | No | Yes |
| Type of the method | DNN | DNN, SVR | MLP | ResNet | DNN | DNN |
| Number of neural networks | 1 | 1 | 1 | 2 | 2 | 1 |

The rest of the paper is organized as follows. Section 2 first recalls the definition of the parasitic capacitance matrix; it then introduces the data acquisition flow, the data augmentation method, and the DNN structure used in this work; finally, the density-based data representation for the ResNet input and the ResNet structure used in this work are described in detail. In Section 3, we first utilize the DNN to predict the parasitic capacitance matrix of a 2D pattern. Then, ResNet is employed to predict the parasitic capacitance matrix for comparison. Next, to further verify our method, the calculated total capacitances from the trained DNN are compared with another network (named DNN-2) that takes the total capacitances as the output. After that, the prediction performance of the DNN under different training set sizes is tested, and the solving efficiency of the trained DNN and the FEM is compared. In the end, we apply our method to a new pattern to verify its feasibility. Section 4 discusses the experimental results and future works. Section 5 concludes this work.

2. Materials and Methods

2.1. Parasitic Capacitance Matrix

For multi-conductor interconnect systems, capacitance extraction means calculating the capacitance matrix between those conductors. The coupling capacitance C_ij between conductors i and j satisfies Equation (1), where V_ij represents the potential difference between i and j, and Q_j denotes the total charge on the surface of conductor j.

$$C_{ij} = \frac{Q_j}{V_{ij}} \quad (j \neq i) \qquad (1)$$
For an electrostatic system containing n conductors, the capacitance matrix is defined as Equation (2), where q_i (i = 1, 2, ..., n) denotes the total charge on conductor i and u_i is the potential of conductor i. C_ii on the matrix diagonal represents the total capacitance or self-capacitance of conductor i, whose value equals the sum of the remaining coupling capacitances C_ij (j = 1, 2, ..., n, j ≠ i) in the same row.

The method to calculate the capacitance matrix is roughly as follows. The potential of conductor i is set as u_i = 1, and the potential of the other conductors is set as u_j = 0 (j = 1, 2, ..., n, j ≠ i); then V_ij = 1, and C_ij can be obtained by computing the total charge on the surface of conductor j. In addition, the total capacitance of conductor i is the capacitance of this conductor to the ground, and the potential difference between them is also 1; hence, the total capacitance C_ii equals the charge on the surface of conductor i. Repeating this process, the entire capacitance matrix can be determined. Moreover, since the capacitance between two conductors is a single value, the capacitance matrix is symmetric.

$$\begin{bmatrix} q_1 \\ q_2 \\ \vdots \\ q_n \end{bmatrix} = \begin{bmatrix} C_{11} & C_{12} & \cdots & C_{1n} \\ C_{21} & C_{22} & \cdots & C_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ C_{n1} & C_{n2} & \cdots & C_{nn} \end{bmatrix} \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix} \qquad (2)$$
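As an illustration of the symmetry and row-sum properties above, the following Python sketch (ours, not part of the paper's tooling) assembles a small capacitance matrix from given coupling values and fills in the diagonal totals as row sums.

```python
import numpy as np

def assemble_capacitance_matrix(couplings, n):
    """Build an n x n capacitance matrix from coupling capacitances.

    `couplings` maps an index pair (i, j) with i < j to C_ij; the diagonal
    totals C_ii are then filled in as the sums of each row's couplings.
    """
    C = np.zeros((n, n))
    for (i, j), value in couplings.items():
        C[i, j] = C[j, i] = value  # the matrix is symmetric
    np.fill_diagonal(C, C.sum(axis=1))  # C_ii = sum of C_ij, j != i
    return C

# Toy 3-conductor example with arbitrary values (fF).
C = assemble_capacitance_matrix({(0, 1): 1.2, (0, 2): 0.4, (1, 2): 0.7}, n=3)
assert np.allclose(C, C.T)  # symmetry check
```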
2.2. Dataset Acquisition

In the pattern-matching method, 2.5D extraction is a way that considers the 3D effect with a combination of two orthogonal 2D patterns. Moreover, each 2D pattern is defined by the combination of different metal layers and the number of conductors on each layer [11]. Figure 2 shows the 2D pattern studied in this work. It contains three metal layers, layer k, layer m, and layer n, and a large substrate ground. Each layer contains one or more interconnect conductors. The substrate ground and the conductors are embedded in the dielectric, whose relative permittivity is 3.9. The studied pattern contains five conductors and one substrate ground. Conductor 1 is always in the center, and its central position x_1 is a constant, which equals 0. Hence, there are nine geometric variables in total, including the center positions and widths of the conductors, namely x_2, x_3, x_4, x_5 and w_1, w_2, w_3, w_4, w_5.

Figure 2. The illustration of the studied pattern and its geometry variables.

The capacitance matrix of this pattern is defined in Equation (3). It ignores the substrate ground's total capacitance and only considers the coupling capacitances between the substrate ground and the other conductors.

$$C = \begin{bmatrix} C_{11} & C_{12} & C_{13} & C_{14} & C_{15} & C_{16} \\ C_{21} & C_{22} & C_{23} & C_{24} & C_{25} & C_{26} \\ C_{31} & C_{32} & C_{33} & C_{34} & C_{35} & C_{36} \\ C_{41} & C_{42} & C_{43} & C_{44} & C_{45} & C_{46} \\ C_{51} & C_{52} & C_{53} & C_{54} & C_{55} & C_{56} \end{bmatrix} \qquad (3)$$

The acquisition of datasets is a crucial step in deep learning. In order to achieve broad coverage of the sampling range, we employ the Latin hypercube sampling (LHS) [24] method. The studied pattern contains nine geometric variables. Supposing N sets of data are sampled for each parameter, the flow of LHS is roughly as follows. First, randomly sample N values within the range of each geometric variable. Then randomly match the sampling results, so that a dataset of size N × 9 is obtained. After obtaining the parameter sampling results, we construct simulation models and utilize the FEM program to compute their capacitance matrices. To sum up, the flow of the dataset acquisition is shown in Figure 3, and the above process is conducted through the FEM software EMPbridge developed by the authors' team [25], whose model building and solving result interfaces are shown in Figure 4.

Figure 3. The flow of datasets acquisition.

Figure 4. The model building and solving result interfaces of EMPbridge: (a) Model building interface; (b) Solving result interface.
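For readers who want to reproduce the sampling step, a basic Latin hypercube sampler can be written in a few lines of NumPy. The sketch below is our illustration (the paper itself uses the EMPbridge tool chain for model building and solving), and the bounds shown are those of Table 3.

```python
import numpy as np

def latin_hypercube(n_samples, bounds, seed=0):
    """Stratified LHS over the given per-parameter (low, high) bounds."""
    rng = np.random.default_rng(seed)
    dim = len(bounds)
    # One uniform draw inside each of the n_samples strata of [0, 1).
    u = (np.arange(n_samples)[:, None] + rng.random((n_samples, dim))) / n_samples
    # Shuffle the strata independently per dimension (the random matching step).
    for d in range(dim):
        rng.shuffle(u[:, d])
    lo = np.array([b[0] for b in bounds])
    hi = np.array([b[1] for b in bounds])
    return lo + u * (hi - lo)

# Nine geometric variables: x2, x3, x4, x5, w1..w5 (ranges from Table 3).
bounds = [(1.0, 2.05), (-2.05, -1.0), (-1.0, 1.0), (-1.0, 1.0),
          (0.09, 0.9), (0.09, 0.9), (0.09, 0.9), (0.081, 3.0), (0.09, 3.0)]
X = latin_hypercube(25_000, bounds)  # shape (25000, 9)
```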
2.3. Data Augmentation

The size of the datasets also has a crucial impact on the deep learning prediction effect. When the size is small, due to a lack of sufficient supervision information, it easily leads to poor generalization ability of the model, and the phenomenon of over-fitting is prone to occur [26]. However, the acquisition of enough datasets, such as with the FEM simulation, is often accompanied by heavy time and labor costs. Data augmentation is a way to alleviate this problem. It can expand the original data by setting a transformation rule without additional computation. Common transformation operations include rotation, mirror flip, cropping, etc. [27].

This work introduces a mirror flip method, as shown in Figure 5, to augment the datasets. After flipping, the geometric variables of the conductors change as follows. The position and width of conductor 1 are unchanged; the widths of conductors 4 and 5 are unchanged, but their horizontal positions are multiplied by −1; for conductors 2 and 3, their widths are directly exchanged, and their horizontal coordinates are first multiplied by −1 and then exchanged. The transformation rule of the parasitic capacitances is shown in Figure 6, where C represents the original value before the mirror flip, and C̃ denotes the value of the new capacitance matrix.

Figure 5. The mirror flip illustration of the studied pattern.

Figure 6. The transformation rule between the original data and the new data.

According to the transformation rule, a new set of data can be obtained, and the datasets are doubled in size. It is worth remembering that this data augmentation method is also applicable to other similar patterns once a transformation rule is determined.
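The flip costs only one relabeling per sample. The sketch below shows our reading of it: the geometric rule is the one stated above, and we assume the capacitance rule of Figure 6 reduces to swapping the labels of conductors 2 and 3 (since the flipped geometry is the same pattern with those two conductors exchanged); this index permutation is an inference from the figure, not a formula quoted from it.

```python
import numpy as np

def mirror_flip(geom, C):
    """Mirror flip of one sample of the studied pattern.

    geom: [x2, x3, x4, x5, w1, w2, w3, w4, w5] (x1 = 0 is fixed).
    C: 6 x 6 symmetric capacitance matrix (Equation (3) completed with its
    symmetric sixth row); index 5 is the substrate ground.
    """
    x2, x3, x4, x5, w1, w2, w3, w4, w5 = geom
    # Conductor 1 unchanged; 4 and 5 keep widths, positions negate;
    # 2 and 3 negate positions and exchange roles (widths swap as well).
    new_geom = [-x3, -x2, -x4, -x5, w1, w3, w2, w4, w5]
    perm = [0, 2, 1, 3, 4, 5]  # relabel conductors 2 and 3 (0-based 1 and 2)
    return new_geom, C[np.ix_(perm, perm)]
```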
2.4. Prediction of Parasitic Capacitance Matrix Based on the DNN

In the parasitic capacitance matrix, the total capacitance equals the sum of the remaining coupling capacitances in the same row. With this information, the DNN output includes only coupling capacitances in this work, and the total capacitance is indirectly obtained by summing the predicted coupling capacitances. The input of the DNN is a vector containing the geometric variables, and the output is a vector containing all coupling capacitances.

Taking the studied pattern as an example, its parasitic capacitance matrix has been given in Equation (3). Due to the matrix's symmetric property, there are 15 non-repetitive coupling capacitances in total, which are C_12, C_13, C_14, C_15, C_16, C_23, C_24, C_25, C_26, C_34, C_35, C_36, C_45, C_46, and C_56. Therefore, as shown in Figure 7, the input dimension of the DNN is nine, that is, all geometric variables; the output dimension is fifteen, including all coupling capacitances. The batch normalization (BN) layer can mitigate the vanishing gradient problem, since it shifts the input data distribution to the unsaturated region of the activation function [28]. Hence, we add a BN layer before each activation function in the DNN structure, although this is not shown in Figure 7. Equation (4) is the loss function, which is defined by the mean square error (MSE) between the predicted value and the reference value in the datasets. In Equation (4), N represents the size of the datasets, and K denotes the output dimension of the DNN, which is fifteen here. C_ij is the predicted value, and C'_ij represents the reference value.

$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\frac{1}{K}\sum_{j=1}^{K}\left|C_{ij}-C'_{ij}\right|^{2} \qquad (4)$$

Figure 7. The illustration of DNN structure.
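A minimal PyTorch sketch of this network is given below. The input/output dimensions, the BN-before-activation placement, the Adam optimizer, and the MSE loss follow the text; the hidden sizes and the Swish (SiLU) activation anticipate the tuned values reported in Section 3.1, and the rest is generic scaffolding rather than the authors' exact code.

```python
import torch
import torch.nn as nn

class CouplingCapDNN(nn.Module):
    """MLP mapping 9 geometric variables to 15 coupling capacitances."""
    def __init__(self, n_in=9, n_out=15, hidden=500, n_hidden=4):
        super().__init__()
        layers, width = [], n_in
        for _ in range(n_hidden):
            # BN layer before each activation function (Section 2.4).
            layers += [nn.Linear(width, hidden), nn.BatchNorm1d(hidden), nn.SiLU()]
            width = hidden
        layers.append(nn.Linear(width, n_out))
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

model = CouplingCapDNN()
loss_fn = nn.MSELoss()  # Equation (4)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

x = torch.randn(32, 9)       # dummy batch of geometric variables
y_ref = torch.randn(32, 15)  # dummy reference coupling capacitances
loss = loss_fn(model(x), y_ref)
loss.backward()
optimizer.step()
```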
2.5. Density-Based Data Representation

For comparison, we employ the convolutional neural network ResNet to predict the capacitance matrix. A density-based data representation method [11,29] is used to describe the geometric characteristics of the metal layers and serves as the input of ResNet. This method can effectively describe the layout feature [23]. Furthermore, it is suitable for the feature extraction operation performed by the convolutional neural network. In this way, the conductor placement of a metal layer can be represented as a vector.

The extraction window width W determines the extraction range of the pattern in the horizontal direction and the variation range of the geometric variables. A simulation experiment is conducted to determine the extraction window width W of the studied pattern. In the simulation, conductor 1 and conductor 3 take the minimum width under the process, while conductor 4 and conductor 5 take a very large width. Conductor 3 is then moved away from conductor 1 until the coupling capacitance between them is less than 1% of the total capacitance of conductor 1. The distance between conductor 1 and conductor 3 at this point, multiplied by two, is the width of the extraction window [5,11].

Once the extraction window size W is determined, the flow of this data representation method is as follows. First, the extraction window is divided into L cells at equal intervals. To accurately distinguish different geometries, the cell width should not be more than half the minimum spacing s_min between two interconnects [11]. Therefore, in this study, we let the cell width equal half the minimum spacing, and the number of cells L is determined by Equation (5) [23].

$$L = \frac{W}{0.5 \times s_{\min}} \qquad (5)$$

Second, taking one layer as an example, a one-dimensional vector with a size of L is created. The vector value is determined by the ratio of the conductor area to the cell area at each position. When the conductor occupies the entire cell area, the vector value is 1; when the conductor occupies part of a cell, the vector value is taken as the ratio of the occupied area to the cell area. The illustration of the density-based data representation of one layer is shown in Figure 8.

Figure 8. The density-based data representation of one metal layer.
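Because the conductors of one layer are rectangles of equal thickness, the density vector reduces to interval overlaps in the horizontal direction. The function below is an illustrative implementation under that assumption; the example values loosely mirror the studied pattern's middle layer but are otherwise arbitrary.

```python
import numpy as np

def density_vector(centers, widths, window, n_cells):
    """Density-based representation of one metal layer.

    centers, widths: horizontal centers and widths of the layer's conductors;
    window: extraction window width W (centered at x = 0);
    n_cells: number of cells L. Returns a length-L vector of occupancy ratios.
    """
    edges = np.linspace(-window / 2, window / 2, n_cells + 1)
    cell_w = window / n_cells
    v = np.zeros(n_cells)
    for x, w in zip(centers, widths):
        lo, hi = x - w / 2, x + w / 2
        # Overlap of [lo, hi] with each cell [edges[k], edges[k+1]].
        overlap = np.minimum(edges[1:], hi) - np.maximum(edges[:-1], lo)
        v += np.clip(overlap, 0.0, None) / cell_w
    return np.clip(v, 0.0, 1.0)

# Middle layer of the studied pattern: conductors 1, 2, 3 (x1 = 0).
layer_m = density_vector(centers=[0.0, 1.5, -1.5], widths=[0.3, 0.2, 0.2],
                         window=5.04, n_cells=112)
```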
of the density-based data representation one layer is oneshown layerin isFigure shown8.in Figure 8. Figure 8. 8. The density-based representation one metal layer. Figure The density-based data data representation ofof one metal layer. PredictionofofParasitic Parasitic Capacitance Capacitance Matrix onon ResNet 2.6.2.6. Prediction MatrixBased Based ResNet ResNet was proposed by He [30] in 2015, which solves the network degradation ResNet was proposed by He [30] in 2015, which solves the network degradation problem of deep networks. In this study, we modify the convolution kernel size of ResNet problem ofinput deepdata networks. this wethe modify thecapacitance convolution kernel of ResNet to fit our and thenInuse it study, to predict parasitic matrix. Thesize network to structure fit our input data and then use it to predict the parasitic capacitance matrix. ResNet-18 used in this work is shown in Figure 9, including 17 two-dimensional The network structure ResNet-18 used in this work is shown Figure 9,7including 17 twoconvolutional layers, one fully connected layer, and two poolingin layers. “1 × conv, 64, /2” dimensional layers, fullyofconnected layer, layer and two in the figureconvolutional indicates that the outputone channel the convolutional is 64,pooling the size oflayers. kernel is 1 × 7, andindicates the stride that is 2. The line in the is called “1 the × 7convolution conv, 64,/2” in the figure theconnection output channel of figure the convolutional a shortcut. The solid indicates that kernel the input output of the residual are layer is 64, the size of line the convolution is and 1 × 7, and the stride is 2.module The connection directly connected. The dotted line means the input and output of the residual module are line in the figure is called a shortcut. The solid line indicates that the input and output of inconsistent in size and cannot be directly connected. Hence a 1 × 1 convolutional kernel is the residual module are directly connected. The dotted line means the input and output added, which makes the input and output the same size. A detailed illustration of these of two the residual are inconsistent in size cannot be directly connected. Hence a shortcutsmodule is also given in Figure 9. The lossand function definition of the ResNet is the 1 ×same 1 convolutional kernel is added, which makes the input and output the same size. A as Equation (4). detailed illustration of these two shortcuts is also given in Figure 9. The loss function definition of the ResNet is the same as Equation (4). Electronics 2023, 2023, 12, 12, 1440 Electronics x FOR PEER REVIEW 9 of10 18 of 19 Figure9.9.The Theneural neuralnetwork network structure of the modified ResNet-18. (a) The concrete structure Figure structure of the modified ResNet-18. (a) The concrete structure of theof the dotted line; (b) The concrete structure of the solid line. dotted line; (b) The concrete structure of the solid line. 3.3.Experiments Experimentsand andResults Results The research case this experiment hashas been introduced in Section 2.2. It hasIt nine The research caseinin this experiment been introduced in Section 2.2. has nine geometric variables. Equation (3) defines its parasitic capacitance matrix, which includes geometric variables. Equation (3) defines its parasitic capacitance matrix, which includes fifteen non-repetitive coupling capacitances. In this study, only the coupling capacitances fifteen non-repetitive coupling capacitances. 
In this study, only the coupling capacitances are considered during the deep learning training, and the total capacitance is calculated by are considered during the deep learning training, and the total capacitance is calculated summing up coupling capacitances. The structure and geometric variables’ sampling range bythe summing upthe coupling capacitances. The structure and variables’ sampling of case meet 55 nm process requirements. Moreover, thisgeometric work is also applicable to rangeprocess of thetechnology. case meet Table the 55 nm the process requirements. Moreover, work is also other 2 lists first four layers’ process criterionthis of the 55 nm, applicable to other process technology. Table 2 lists the first four layers’ process criterion which includes the interconnect wires’ thickness, the minimum wire width wmin , and the of the 55 nm, which interconnect wires’ thickness, theresearch minimum wire width minimum spacing sminincludes betweenthe two wires of each metal layer. This selects the ๐ค๐ค๐๐๐๐๐๐ , and the minimum ๐ ๐ ๐๐๐๐๐๐ between of neach metal layer. combination of layer 2, layer spacing 3, and layer 4, that is, k =two 2, mwires = 3, and = 4 in Figure 2. This research selects the combination of layer 2, layer 3, and layer 4, that is, ๐๐ = 2, ๐๐ = 3, and Table Multilayer interconnect 55 nm process standard (first four metal layers) [11]. ๐๐ = 42. in Figure 2. Layer Thickness (µm) wmin (µm) smin (µm) Table 2. Multilayer interconnect 55 nm process standard (first four metal layers) [11]. 1 Layer 2 13 4 2 3 4 0.1 Thickness 0.16 (๐๐๐๐) 0.2 0.1 0.2 0.16 0.2 0.2 0.054 ๐๐0.081 ๐๐๐๐๐๐ (๐๐๐๐) 0.09 0.054 0.09 0.081 0.09 0.09 0.108 ๐ ๐ ๐๐๐๐๐๐0.08 (๐๐๐๐) 0.09 0.108 0.09 0.08 0.09 0.09 Electronics FOR PEER REVIEW Electronics 2023, 2023, 12, 12, x1440 1110ofof19 18 As shown thethe extraction window width is taken as W as = 56 , where As shown in inFigure Figure10, 10, extraction window width is taken ๐๐× =s56 ๐ ๐ ๐๐๐๐๐๐ , min× smin = ๐ ๐ 0.09 µm. For a higher precision in reference values, the solving domain width of where = 0.09 ๐๐๐๐. For a higher precision in reference values, the solving domain ๐๐๐๐๐๐ numerical computation is taken as is 10 taken µm. Once the๐๐๐๐ window W is determined, . Once width width of numerical computation as 10 the window width ๐๐ the is ResNet input data dimension L of one metal layer can be computed using Equation (5), and determined, the ResNet input data dimension ๐ฟ๐ฟ of one metal layer can be computed the result is L =(5), 112. Since three metal layers, theare input data sizelayers, of the the ResNet is using Equation and the there resultare is ๐ฟ๐ฟ = 112. Since there three metal input 3 × 112. data size of the ResNet is 3 × 112. Figure 10. 10. The Theillustration illustration of of extraction extraction window window width width and and the the solving solving domain domain width. width. Figure Traditional multilayer thethe Manhattan structure [31],[31], that Traditional multilayer interconnect interconnectrouting routingfollows follows Manhattan structure is, placing adjacent layers’ interconnects orthogonally to each other. In this way, the that is, placing adjacent layers’ interconnects orthogonally to each other. In this way, the crosstalk caused by coupling capacitances can be minimized [32]. The studied pattern crosstalk caused by coupling capacitances can be minimized [32]. The studied pattern shown in in Figure geometry from thethe layout. 
Layers k and shown Figure 22 isisaacross-section cross-sectionview viewofofa small a small geometry from layout. Layers ๐๐ n are regarded as horizontal routing here, and the three conductors in the middle layer are and ๐๐ are regarded as horizontal routing here, and the three conductors in the middle regarded as verticalas page routing. , w2 , and๐ค๐คw,3 ๐ค๐ค represent the width of the interconnects. layer are regarded vertical pagew1routing. 1 2 , and ๐ค๐ค3 represent the width of the Hence, we let the maximum variation range of w , w , and w not exceed ten times the 2 1 range interconnects. Hence, we let the maximum variation of ๐ค๐ค31 , ๐ค๐ค2 , and ๐ค๐ค3 not exceed minimum width. Finally, the sampling range of each geometric variable is listed in Table 3. ten times the minimum width. Finally, the sampling range of each geometric variable is listed in Table 3. Table 3. Parameters’ sampling range. Table 3.Parameter Parameters’ sampling range. Range Sampling Parameter Sampling Range x2 [1.0, 2.05] w2 [0.09, 0.9] Parameter Sampling Range Parameter Sampling Range [−2.05, −1.0] ๐ค๐ค2w3 ๐ฅ๐ฅ2 x3 [1.0, 2.05] [0.09,[0.09, 0.9] 0.9] x [−1.0, 1.0] [0.081, 3.0] ๐ค๐ค3w4 ๐ฅ๐ฅ3 4 [−2.05, −1.0] [0.09, 0.9] x5 [−1.0, 1.0] w5 [0.09, 3.0] ๐ค๐ค4 ๐ฅ๐ฅ4 w1 [−1.0, 1.0] [0.081, 3.0] [0.09, 0.9] ๐ค๐ค5 ๐ฅ๐ฅ5 [−1.0, 1.0] [0.09, 3.0] ๐ค๐ค [0.09, 0.9] The 1deep learning framework used in the study is PyTorch. The hardware configuration as follows: CPU Intel Core used i7-12700KF, NVIDIA RTX3080Ti, 16GB Theisdeep learning framework in theGPU study is PyTorch. The with hardware RAM. For the is studied pattern, 25,000 of i7-12700KF, samplings were and the reference configuration as follows: CPU Intelsets Core GPUgenerated, NVIDIA RTX3080Ti, with capacitance values were computed using theofFEM solver using 16GB RAM.matrix For the studied pattern, 25,000 sets samplings were EMPbridge generated, software. and the Finally, 50,000 sets of data werevalues obtained after using the using data augmentation method dereference capacitance matrix were computed the FEM solver using scribed in Section 2.2. The total datasets were randomly split into three parts: 80% of the EMPbridge software. Finally, 50,000 sets of data were obtained after using the data sampling wasmethod used asdescribed the training set, 10%2.2. was used the validation set, and split 10% into was augmentation in Section The totalasdatasets were randomly used as the testing set. Too-small coupling capacitances are usually not considered in the three parts: 80% of the sampling was used as the training set, 10% was used as the extraction [11,23]. thiswas work, all as coupling capacitances are considered the neural validation set, andIn 10% used the testing set. Too-small couplingduring capacitances are network training stage. However, in the testing stage, the coupling capacitances whose usually not considered in the extraction [11,23]. In this work, all coupling capacitances are values are less thanthe 1%neural of the corresponding total capacitance arein specially treated. When considered during network training stage. However, the testing stage, the calculating the predicted total capacitance, all coupling capacitances are included in the calcoupling capacitances whose values are less than 1% of the corresponding total culation. When evaluating the prediction accuracy of the coupling capacitances, the small capacitance are specially treated. 
3.1. Experiment Results

The hyperparameters of the neural network have a great influence on the prediction effect, and different combinations of hyperparameters may cause a significant difference in the model's performance. For scenarios with few kinds of hyperparameters, the grid search method is commonly used. However, the DNN has many hyperparameters that need to be tuned, and the grid search method becomes time-consuming. To alleviate this problem, we utilize an automated hyperparameter tuning method, the tree-structured Parzen estimators (TPE) method [33]. TPE is based on the Bayesian optimization algorithm, and it selects the next set of hyperparameters according to the evaluation results of the existing hyperparameter combinations [34]. The TPE method can find a better or the best choice among all hyperparameter combinations within the maximum evaluation number.

For the DNN training, the optimizer is Adam [35]. Weights and biases are initialized by the Xavier initialization [36]. The remaining hyperparameters are determined with TPE, and their search space is shown in Table 4. The maximum evaluation number is 100. The objective function of the TPE is the average relative error of the predicted coupling capacitances of the trained neural network on the testing set, and the procedure is implemented using the Hyperopt Python package [37]. In the end, the best-performing hyperparameter combination for this studied pattern is as follows: the activation function is Swish; the batch size is 32; the learning rate is 10⁻⁴; the number of hidden layers is 4, and each hidden layer has 500 neurons.

Table 4. Hyperparameters' space for TPE.

| Hyperparameter | Search Range |
|---|---|
| Activation function | ReLU [38], Tanh, Swish [39] |
| Batch size | 32, 64, 128, 256, 512 |
| Learning rate | 10⁻², 10⁻³, 10⁻⁴, 10⁻⁵ |
| Number of hidden layers | 1, 2, 3, 4, 5, 6 |
| Neurons in each hidden layer | 50, 100, 200, 300, 400, 500, 600 |
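For orientation, a Hyperopt skeleton of such a TPE search is sketched below. The search space mirrors Table 4; `train_and_evaluate` is a hypothetical helper standing in for the paper's objective (train the DNN with the given hyperparameters and return the average relative test error of the coupling capacitances).

```python
from hyperopt import fmin, tpe, hp, Trials

space = {  # mirrors Table 4
    "activation": hp.choice("activation", ["relu", "tanh", "swish"]),
    "batch_size": hp.choice("batch_size", [32, 64, 128, 256, 512]),
    "lr": hp.choice("lr", [1e-2, 1e-3, 1e-4, 1e-5]),
    "n_layers": hp.choice("n_layers", [1, 2, 3, 4, 5, 6]),
    "n_neurons": hp.choice("n_neurons", [50, 100, 200, 300, 400, 500, 600]),
}

def objective(params):
    # Hypothetical helper: trains the DNN with `params` and returns the
    # average relative error of the coupling capacitances on the testing set.
    return train_and_evaluate(params)

trials = Trials()
best = fmin(objective, space, algo=tpe.suggest, max_evals=100, trials=trials)
print(best)
```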
We can see that DNN performs better both in the coupling capacitance and total capacitance prediction in this research case. Furthermore, the training time of DNN is only 31% of ResNet-18. In addition, those two networks’ prediction accuracy on the total capacitance all meets the accuracy requirements (<5%) in the advanced process criterion, showing the computation method of total capacitance introduced in this work is reliable. Table 5. The predictive performance of different neural networks on the coupling capacitance (the relative error is only evaluated for coupling capacitances greater than 1% of the corresponding total capacitance). Network Training Time (h) Average Relative Error Maximum Relative Error Ratio (>5%) DNN ResNet-18 0.89 2.90 0.12% 0.21% 10.66% 10.69% 0.01% 0.05% Network DNN ResNet-18 Training Time (h) 0.89 2.90 Average Relative Error 0.12% 0.21% Maximum Relative Error 10.66% 10.69% Ratio (>5%) 0.01% 0.05% Electronics 2023, 12, 1440 12 of 18 Table 6. The predictive performance of different neural networks on the total capacitance. Average of differentMaximum Ratio Table 6. The predictive performance neural networks on the total capacitance. Network Network DNN ResNet-18 DNN ResNet-18 Relative Error Relative Error Average Maximum 0.05% 2.76% Relative Error Relative Error 0.05% 0.05% 4.45% 2.76% 0.05% 4.45% (>5%) 0 0 Ratio (>5%) 0 0 Figure 11 shows the relative error distribution of two neural networks for the coupling capacitance and the total capacitance, respectively. The larger errors of coupling Figure 11 shows the relative error distribution of two neural networks for the coupling capacitance usually occur at lower capacitance values. In addition, the total capacitance capacitance and the total capacitance, respectively. The larger errors of coupling capacierror ofat DNN more concentrated that of tancedistribution usually occur loweriscapacitance values. Inthan addition, theResNet-18. total capacitance error distribution of DNN is more concentrated than that of ResNet-18. (a) DNN (b) ResNet-18 Figure 11. The Therelative relative distribution of thecoupling predicted coupling and total Figure 11. errorerror distribution of the predicted capacitance and capacitance total capacitance: capacitance: DNN; (b) ResNet-18. (a) DNN; (b)(a) ResNet-18. 3.2.The TheComparison Comparison of forfor Predicting TotalTotal Capacitance UsingUsing the DNN 3.2. of Two TwoMethods Methods Predicting Capacitance the DNN In this study, the total capacitance is obtained indirectly by calculating the sum of coupling capacitances. To compare this method with the way that directly predicts the total capacitance using DNN, we constructed another DNN model called DNN-2. The input of this neural network is consistent with the DNN structure introduced in Section 2.4, but Electronics 2023, 12, 1440 In this study, the total capacitance is obtained indirectly by calculating the sum of coupling capacitances. To compare this method with the way that directly predicts the total capacitance using DNN, we constructed another DNN model called DNN-2. The 13 of 18 input of this neural network is consistent with the DNN structure introduced in Section 2.4, but the output dimension of DNN-2 is five, corresponding to the five total capacitances of this research case. The training sets, validation sets, and testing sets are the outputwith dimension DNN-23.1. is five, to the five total capacitances of this consistent those inofSection The corresponding approach of hyperparameters’ settings of DNNresearch case. 
The training sets, validation sets, and testing sets are consistent with those in 2 is the same as those of the DNN introduced in Section 3.1. Section 3.1. The approach of hyperparameters’ settings of DNN-2 is the same as those of Finally, after trial and error, the hyperparameters’ combination with better the DNN introduced in Section 3.1. performance is as follows. The activation function is Tanh. The learning rate is 10−3 . The Finally, after trial and error, the hyperparameters’ combination betterinperforbatch size is 64. The number of hidden layers is 3, and the number ofwith neurons each −3 . The batch mance is as follows. The activation function is Tanh. The learning rate is 10 hidden layer is 500. The prediction results of DNN-2 are shown in Table 7. At the same size the is 64. The number of hidden 3, and the number neurons hidden time, prediction results of DNNlayers on theistotal capacitance in of Section 3.1 in areeach given as a layer is 500. The prediction results of DNN-2 are shown in Table 7. At the same time, the comparison. From the table, we can see that the predictive performance of the two DNN prediction results of DNN on the total capacitance in Section 3.1 are given as a comparison. models is very close. Therefore, the method used in this study is reliable. Moreover, the From also the table, we can the predictive the for twopredicting DNN models result indicates thatsee ourthat method can saveperformance the trainingofcost totalis very close. Therefore, the method used in this study is reliable. Moreover, the result also capacitance. indicates that our method can save the training cost for predicting total capacitance. Table 7. The prediction results of two methods using DNN for total capacitance. Table 7. The prediction results of two methods using DNN for total capacitance. Training Average Maximum Training Average Maximum Relative Error Relative Error Method Time (h) Time (h) Relative Error Relative Error DNN 0.89 0.05% 2.76% DNN 0.89 0.05% DNN-2 0.35 0.05% 4.96% 2.76% Method DNN-2 0.35 0.05% 4.96% Ratio (>5%) Ratio (>5%) 0 0 0 0 3.3. The Predictive Performance of DNN under Different Training Set Sizes 3.3. The Predictive Performance of DNN under Different Training Set Sizes This subsection tests the predictive performance of DNN under different training set Thisvalidation subsectionset tests the predictive performance of as DNN under 3.1. different training set sizes. The and testing set are still the same in Section The ratio of the sizes. The validation set and testing set are still the same as in Section 3.1. The ratio of the training set size gradually increases from 10% to 80%; that is, the size of training sets varies training set size gradually increases from 10% to 80%; that is, the size of training sets varies from 5000 to 40,000. The structure of DNN is also the same as in Section 3.1. Finally, the from 5000 to 40,000. The structure of DNN is also the same as in Section 3.1. Finally, the average relative error and the ratio of errors greater than 5% of each trained model on the average relative error and the ratio of errors greater than 5% of each trained model on the testing set are drawn in Figure 12. It can be seen that both the average relative error and testing set are drawn in Figure 12. It can be seen that both the average relative error and the ratio (>5%) overall decrease with the increase of the training set size. Moreover, when the ratio (>5%) overall decrease with the increase of the training set size. 
Moreover, when the size of training sets is more than 20,000, the change trend in both factors slows down. the size of training sets is more than 20,000, the change trend in both factors slows down. (a) Coupling capacitance (b) Total capacitance Figure of average averagerelative relative error of errors greater than 5%different under Figure12. 12. Distribution Distribution of error andand the the ratioratio of errors greater than 5% under different sizes of training sets: (a) Coupling capacitance; (b) Total capacitance. sizes of training sets: (a) Coupling capacitance; (b) Total capacitance. 3.4. The Comparison of Solving Time between the FEM and the Trained DNN To compare the solving efficiency between the trained DNN and the FEM, we select 100 sets of data in the testing set and compute their parasitic capacitance matrix with the trained DNN and FEM, respectively. Since both the trained DNN and FEM need to process the input data in advance (i.e., FEM needs to generate mesh, and the trained DNN needs 3.4. The Comparison of Solving Time between the FEM and the Trained DNN Electronics 2023, 12, 1440 To compare the solving efficiency between the trained DNN and the FEM, we select 14 of 18 100 sets of data in the testing set and compute their parasitic capacitance matrix with the trained DNN and FEM, respectively. Since both the trained DNN and FEM need to process the input data in advance (i.e., FEM needs to generate mesh, and the trained DNN to construct the input vector), the data preparation time time is ignored here,here, and and onlyonly the needs to construct the input vector), the data preparation is ignored solving time is counted. The programming language of DNN is Python, and it uses GPU the solving time is counted. The programming language of DNN is Python, and it uses for computation. The programming language of the of FEM C++.isFinally, average GPU for computation. The programming language theisFEM C++. the Finally, the computation time of those two methods for one case is shown in Table 8. The solving time average computation time of those two methods for one case is shown in Table 8. The of the trained DNN takes only 2% of FEM. solving time of the trained DNN takes only 2% of FEM. Table 8. The average computation time of the FEM and the trained DNN for one testing case. Table 8. The average computation time of the FEM and the trained DNN for one testing case. Method Method FEM FEM Thetrained trained DNN DNN The Time (ms) (ms) Time 759.49 759.49 14.64 14.64 3.5. The The Verification Verification of of aa New New Pattern Pattern 3.5. To further verify our our method, method, we we experiment experiment with with aa new new pattern pattern called called pattern-2, pattern-2, To further verify whose geometry and parameters are shown in Figure 13. In pattern-2, conductor is whose geometry and parameters are shown in Figure 13. In pattern-2, conductor 11 is centered, conductor conductor22and andconductor conductor 4 are always right of center the center centered, 4 are always on on thethe right sideside of the line, line, and and conductor 3 and conductor 5 are always to the left of the center line. We choose the conductor 3 and conductor 5 are always to the left of the center line. We choose the same same metal layers as the pattern in Figure 2, but each layer contains a different number of metal layers as the pattern in Figure 2, but each layer contains a different number of conductors. Subsequently, we built a new DNN network called DNN-3. Since the geometric conductors. 
3.5. The Verification of a New Pattern

To further verify our method, we experiment with a new pattern called pattern-2, whose geometry and parameters are shown in Figure 13. In pattern-2, conductor 1 is centered, conductor 2 and conductor 4 are always on the right side of the center line, and conductor 3 and conductor 5 are always on the left side of the center line. We choose the same metal layers as for the pattern in Figure 2, but each layer contains a different number of conductors. Subsequently, we built a new DNN network called DNN-3. Since the geometric variables and the total number of conductors of pattern-2 are the same as those of the pattern in Figure 2, the input and output layers used for DNN-3 are the same as in Figure 7.

Figure 13. The illustration of pattern-2 and its geometry variables.

The metal layers selected in pattern-2 are the same as for the pattern in Figure 2; hence, their extraction window width is also the same. The sampling range of each geometric parameter in pattern-2 is shown in Table 9. We utilized the field solver EMPBridge to generate 25,000 sets of data and used the data augmentation method introduced in Section 2.3 to augment the datasets, finally obtaining 50,000 sets of data. A total of 80% of the datasets were used as training sets, 10% as validation sets, and 10% as testing sets.

Table 9. Parameters' sampling range of pattern-2.

  Parameter   Sampling Range     Parameter   Sampling Range
  x2          [1.04, 1.52]       w2          [0.081, 2.0]
  x3          [−1.52, −1.04]     w3          [0.081, 2.0]
  x4          [1.05, 1.52]       w4          [0.09, 2.0]
  x5          [−1.52, −1.05]     w5          [0.09, 2.0]
  w1          [0.09, 0.9]        -           -

The approach to setting the hyperparameters of DNN-3 is the same as that of the DNN introduced in Section 3.1. In the end, the best hyperparameter combination obtained with TPE is as follows. The activation function is Swish. The batch size is 64. The learning rate is 10^−4. The number of hidden layers is 2, and each hidden layer has 500 neurons. The prediction performance of DNN-3 on the testing set is shown in Table 10. It can be seen that our method performs well on the new example: the average relative error of the coupling capacitance is only 0.10%, the average relative error of the total capacitance is only 0.04%, and the prediction accuracy of the total capacitance meets the accuracy requirement of advanced technology (errors < 5%) in all cases.

Table 10. The predictive performance of DNN-3 on the coupling capacitances and total capacitances.

  Property               Training Time (h)   Average Relative Error   Maximum Relative Error   Ratio (>5%)
  Coupling capacitance   0.32                0.10%                     5.04%                    0.00%
  Total capacitance      -                   0.04%                     0.37%                    0
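For readers unfamiliar with TPE, the following sketch shows what such a search could look like using the hyperopt library. The search space mirrors the hyperparameters discussed in this paper, but the exact ranges, the evaluation budget, and the train_and_validate() helper are hypothetical.

```python
from hyperopt import fmin, tpe, hp

# Candidate hyperparameters, mirroring those tuned in this paper.
space = {
    "activation": hp.choice("activation", ["relu", "tanh", "swish"]),
    "n_layers": hp.choice("n_layers", [2, 3, 4]),
    "n_neurons": hp.choice("n_neurons", [100, 300, 500]),
    "lr": hp.loguniform("lr", -9.2, -4.6),   # roughly 1e-4 to 1e-2
    "batch_size": hp.choice("batch_size", [32, 64, 128]),
}

def objective(params):
    # train_and_validate is a hypothetical helper: it trains the MLP with
    # the given hyperparameters and returns the validation loss to minimize.
    return train_and_validate(params)

best = fmin(fn=objective, space=space, algo=tpe.suggest, max_evals=50)
```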
4. Discussion

This study uses the DNN to predict the parasitic capacitance matrix of a 2D pattern. The network output includes only the coupling capacitances; the total capacitance is obtained by summing the corresponding coupling capacitances. Moreover, a data augmentation method is employed, doubling the dataset's size. For comparison, this study also utilized ResNet to predict the parasitic capacitance matrix. The above experiments lead to the following observations.

• The data augmentation technique obtains new data from the original dataset without additional computation. Section 2.2 introduces a mirror flip method that doubles the size of the datasets and reduces the data preparation effort (a sketch of this flip is given after this list).
• In this paper, we train only one DNN model to predict the capacitance matrix. The neural network's output includes only the coupling capacitances in the capacitance matrix, and the total capacitances are obtained by summing the corresponding predicted coupling capacitances. For the research case, the experimental results in Table 6 show that the prediction accuracy of the total capacitance obtained by this method meets the advanced process requirement (<5%).
• The comparison in Table 7 indicates that the predictive performance of our method is very close to that of the method that directly predicts the total capacitance with a DNN (DNN-2). The training time of our method is 0.89 h and that of DNN-2 is 0.35 h; hence, our method takes only about 72% of the combined training time of the two networks yet predicts the total capacitances and coupling capacitances simultaneously. In addition, the hyperparameter tuning of DNN-2 is time-consuming, and our method saves this process as well. These results indicate that the method introduced in this work is reliable and can save the training cost for predicting the total capacitance.
• In [11], ResNet is used to predict the total capacitance and coupling capacitances, showing good precision. To compare this kind of neural network's prediction performance with that of the DNN for our research case, we also utilized ResNet to predict the capacitance matrix. The results show that the DNN performs better than ResNet for this case, and the training time of the DNN is only 31% of that of ResNet-18.
• We tested the prediction performance of the DNN under different dataset sizes. As the training data size increases, the average relative error and the ratio of errors greater than 5% of the trained models gradually decrease. Furthermore, when the training data size exceeds 20,000, both metrics improve more slowly.
• In addition, the calculation efficiency of the FEM and the trained DNN was compared. The average computing time of the trained DNN for one case is only about 2% of that of the FEM, which shows that the trained DNN is highly efficient. Furthermore, we validated our method on another pattern called pattern-2. The results show that our method performs well on the new pattern, indicating its feasibility.
• This experiment only considers horizontal changes in the interconnect geometry. In real manufacturing, however, there are more process variations, such as the permittivity of the dielectric layers and the thickness of the interconnect wires. Therefore, future work on parasitic capacitance prediction will take more variables into consideration.
• When there are corners or connections in the interconnect wires [4,40], the two-dimensional analysis is no longer applicable. We plan to address the parasitic capacitance prediction for this kind of problem in the next step. The neural network's input data representation is critical here, and one approach is to use a density-based data representation of the interconnect's top view [23].
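As noted in the first bullet above, here is a hedged sketch of the mirror-flip augmentation, written against the pattern-2 parameterization of Table 9 as an assumed example: flipping the structure about the center line negates the x coordinates, swaps the roles of the left/right conductor pairs (2↔3, 4↔5), and therefore permutes the entries of the capacitance matrix instead of recomputing them. The index bookkeeping below is an illustration, not the authors' implementation.

```python
import numpy as np

def mirror_flip(x, w, C):
    """Generate the mirrored sample of a FEM-computed case.

    x: positions of conductors 2..5 (shape (4,), signed offsets from center);
    w: widths of conductors 1..5 (shape (5,));
    C: symmetric 5x5 capacitance matrix indexed by conductor (0-based).
    """
    perm = [0, 2, 1, 4, 3]          # conductor relabeling 1,3,2,5,4 after the flip
    x_flip = -x[[1, 0, 3, 2]]       # negate positions; left/right pairs trade places
    w_flip = w[perm]                # widths follow the conductor relabeling
    C_flip = C[np.ix_(perm, perm)]  # permute rows and columns of the matrix
    return x_flip, w_flip, C_flip
```

Because the flipped sample is obtained by relabeling alone, it costs no extra field-solver run, which is why the method doubles the dataset essentially for free.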
5. Conclusions

This study used a DNN to predict the parasitic capacitance matrix of a two-dimensional pattern. The neural network's output includes only the coupling capacitances in the capacitance matrix, and the total capacitances are obtained by summing the corresponding coupling capacitances, thus saving the cost of training for the total capacitance. For the research case in this study, the experimental results show that the average relative error of the predicted total capacitance on the testing set is only 0.05%, and the average relative error of the predicted coupling capacitance is only 0.12%, showing good prediction accuracy. Furthermore, a data augmentation method was introduced, reducing the data preparation effort. Our study can be used to construct capacitance models for the pattern-matching method.

Author Contributions: Conceptualization, Y.M., X.X., S.Y. and Z.R.; Data curation, Y.M., Y.Z. and T.Z.; Formal analysis, Y.M., X.X., S.Y. and Z.R.; Funding acquisition, X.X., S.Y. and Z.R.; Investigation, S.Y. and L.C.; Methodology, Y.M., X.X. and S.Y.; Project administration, Z.R.; Resources, Y.M., X.X., Y.Z., T.Z. and Z.R.; Software, Y.M., S.Y., Y.Z., T.Z. and Z.R.; Supervision, X.X., S.Y., Z.R. and L.C.; Validation, Y.M.; Visualization, Y.M.; Writing—original draft, Y.M.; Writing—review and editing, X.X., S.Y. and Z.R. All authors have read and agreed to the published version of the manuscript.

Funding: This research was funded by the Institute of Electrical Engineering, CAS (Y960211CS3, E155620101, E139620101).

Data Availability Statement: The data that support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest: The authors declare no conflict of interest.

References
1. Gong, W.; Yu, W.; Lü, Y.; Tang, Q.; Zhou, Q.; Cai, Y. A parasitic extraction method of VLSI interconnects for pre-route timing analysis. In Proceedings of the 2010 International Conference on Communications, Circuits and Systems (ICCCAS), Chengdu, China, 28–30 July 2010; pp. 871–875. [CrossRef]
2. Ren, D.; Xu, X.; Qu, H.; Ren, Z. Two-dimensional parasitic capacitance extraction for integrated circuit with dual discrete geometric methods. J. Semicond. 2015, 36, 045008. [CrossRef]
3. Yu, W.; Song, M.; Yang, M. Advancements and Challenges on Parasitic Extraction for Advanced Process Technologies. In Proceedings of the 26th Asia and South Pacific Design Automation Conference (ASP-DAC), Tokyo, Japan, 18–21 January 2021; pp. 841–846. [CrossRef]
4. Abouelyazid, M.S.; Hammouda, S.; Ismail, Y. Connectivity-Based Machine Learning Compact Models for Interconnect Parasitic Capacitances. In Proceedings of the 2021 ACM/IEEE 3rd Workshop on Machine Learning for CAD (MLCAD), Raleigh, NC, USA, 30 August–3 September 2021. [CrossRef]
5. Abouelyazid, M.S.; Hammouda, S.; Ismail, Y. Fast and Accurate Machine Learning Compact Models for Interconnect Parasitic Capacitances Considering Systematic Process Variations. IEEE Access 2022, 10, 7533–7553. [CrossRef]
6. Yu, W.; Zhuang, H.; Zhang, C.; Hu, G.; Liu, Z. RWCap: A Floating Random Walk Solver for 3-D Capacitance Extraction of Very-Large-Scale Integration Interconnects. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 2013, 32, 353–366. [CrossRef]
7. Xu, J.; Li, H.; Yin, W.-Y.; Mao, J.; Li, L.-W. Capacitance Extraction of Three-Dimensional Interconnects Using Element-by-Element Finite Element Method (EBE-FEM) and Preconditioned Conjugate Gradient (PCG) Technique. IEICE Trans. Electron. 2007, 90, 179–188.
8. Yu, W.; Wang, Z. Enhanced QMM-BEM solver for three-dimensional multiple-dielectric capacitance extraction within the finite domain. IEEE Trans. Microw. Theory Tech. 2004, 52, 560–566. [CrossRef]
9. Nabors, K.; White, J. FastCap: A multipole accelerated 3D capacitance extraction program. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 1991, 10, 1447–1459. [CrossRef]
10. Kao, W.H. Parasitic Extraction: Current State of the Art and Future Trends. Proc. IEEE 2001, 89, 487–490. [CrossRef]
11. Yang, D.; Yu, W.; Guo, Y.; Liang, W. CNN-Cap: Effective Convolutional Neural Network Based Capacitance Models for Full-Chip Parasitic Extraction. In Proceedings of the 2021 IEEE/ACM International Conference on Computer Aided Design (ICCAD), Munich, Germany, 1–4 November 2021; pp. 1–9.
12. Cong, J.; He, L.; Kahng, A.B.; Noice, D.; Shirali, N.; Yen, S.H. Analysis and Justification of a Simple, Practical 2 1/2-D Capacitance Extraction Methodology. In Proceedings of the 34th Annual Design Automation Conference, Anaheim, CA, USA, 13 June 1997; pp. 1–6.
13. Synopsys StarRC—Golden Signoff Extraction. Available online: https://www.synopsys.com/implementation-and-signoff/signoff/starrc.html (accessed on 3 February 2023).
14. Siemens Calibre xRC Extraction. Available online: https://eda.sw.siemens.com/en-US/ic/calibre-design/circuit-verification/xrc/ (accessed on 3 February 2023).
15. What Is Parasitic Extraction. Available online: https://www.synopsys.com/glossary/what-is-parasitic-extraction.html (accessed on 6 February 2023).
16. Fu, X.; Wu, M.; Tiong, R.L.K.; Zhang, L. Data-driven real-time advanced geological prediction in tunnel construction using a hybrid deep learning approach. Autom. Constr. 2023, 146, 104672. [CrossRef]
17. Wang, H.; Qin, K.; Zakari, R.Y.; Liu, G.; Lu, G. Deep Neural Network Based Relation Extraction: An Overview. Neural Comput. Appl. 2021, 34, 4781–4801. [CrossRef]
18. Lu, S.; Xu, Q.; Jiang, C.; Liu, Y.; Kusiak, A. Probabilistic load forecasting with a non-crossing sparse-group Lasso-quantile regression deep neural network. Energy 2022, 242, 122955. [CrossRef]
19. Sharma, H.; Marinovici, L.; Adetola, V.; Schaef, H.T. Data-driven modeling of power generation for a coal power plant under cycling. Energy AI 2023, 11, 100214. [CrossRef]
20. Yurt, R.; Torpi, H.; Mahouti, P.; Kizilay, A.; Koziel, S. Buried Object Characterization Using Ground Penetrating Radar Assisted by Data-Driven Surrogate-Models. IEEE Access 2023, 11, 13309–13323. [CrossRef]
21. Kasai, R.; Kanamoto, T.; Imai, M.; Kurokawa, A.; Hachiya, K. Neural Network-Based 3D IC Interconnect Capacitance Extraction. In Proceedings of the 2019 2nd International Conference on Communication Engineering and Technology (ICCET), Nagoya, Japan, 12–15 April 2019; pp. 168–172. [CrossRef]
22. Li, Z.; Shi, W. Layout Capacitance Extraction Using Automatic Pre-Characterization and Machine Learning. In Proceedings of the 2020 21st International Symposium on Quality Electronic Design (ISQED), Santa Clara, CA, USA, 25–26 March 2020; pp. 457–464. [CrossRef]
23. Abouelyazid, M.S.; Hammouda, S.; Ismail, Y. Accuracy-Based Hybrid Parasitic Capacitance Extraction Using Rule-Based, Neural-Networks, and Field-Solver Methods. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 2022, 41, 5681–5694. [CrossRef]
24. McKay, M.D.; Beckman, R.J.; Conover, W.J. A Comparison of Three Methods for Selecting Values of Input Variables in the Analysis of Output from a Computer Code. Technometrics 1979, 21, 239–245.
25. EMPbridge. Available online: http://www.empbridge.com/index.php (accessed on 15 February 2023).
26. Lashgari, E.; Liang, D.; Maoz, U. Data augmentation for deep-learning-based electroencephalography. J. Neurosci. Methods 2020, 346, 108885. [CrossRef]
27. Zhang, K.; Cao, Z.; Wu, J. Circular Shift: An Effective Data Augmentation Method for Convolutional Neural Network on Image Classification. In Proceedings of the 2020 IEEE International Conference on Image Processing (ICIP), Abu Dhabi, United Arab Emirates, 25–28 October 2020; pp. 1676–1680.
28. Calik, N.; Güneş, F.; Koziel, S.; Pietrenko-Dabrowska, A.; Belen, M.A.; Mahouti, P. Deep-learning-based precise characterization of microwave transistors using fully-automated regression surrogates. Sci. Rep. 2023, 13, 1445. [CrossRef]
29. Wen, W.; Li, J.; Lin, S.; Chen, J.; Chang, S. A Fuzzy-Matching Model with Grid Reduction for Lithography Hotspot Detection. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 2014, 33, 1671–1680. [CrossRef]
30. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
31. Hong, X.; Xue, T.; Jin, H.; Cheng, C.K.; Kuh, E.S. TIGER: An efficient timing-driven global router for gate array and standard cell layout design. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 2006, 16, 1323–1331. [CrossRef]
32. Wong, S.C.; Lee, G.Y.; Ma, D.J.; Chao, C.J. An empirical three-dimensional crossover capacitance model for multilevel interconnect VLSI circuits. IEEE Trans. Semicond. Manuf. 2002, 13, 219–227. [CrossRef]
33. Bergstra, J.; Bardenet, R.; Bengio, Y.; Kégl, B. Algorithms for Hyper-Parameter Optimization. In Proceedings of the International Conference on Neural Information Processing Systems, Granada, Spain, 12–15 December 2011; pp. 2546–2554.
34. Passos, D.; Mishra, P. A tutorial on automatic hyperparameter tuning of deep spectral modelling for regression and classification tasks. Chemom. Intell. Lab. Syst. 2022, 223, 104520. [CrossRef]
35. Kingma, D.; Ba, J. Adam: A Method for Stochastic Optimization. arXiv 2014, arXiv:1412.6980.
36. Glorot, X.; Bengio, Y. Understanding the difficulty of training deep feedforward neural networks. J. Mach. Learn. Res. 2010, 9, 249–256.
37. Bergstra, J.; Yamins, D.; Cox, D. Making a science of model search: Hyperparameter optimization in hundreds of dimensions for vision architectures. In Proceedings of the International Conference on Machine Learning, Atlanta, GA, USA, 16–21 June 2013; pp. 115–123.
38. Nair, V.; Hinton, G.E. Rectified Linear Units Improve Restricted Boltzmann Machines. In Proceedings of the 27th International Conference on Machine Learning (ICML-10), Haifa, Israel, 21–24 June 2010.
39. Ramachandran, P.; Zoph, B.; Le, Q.V. Searching for Activation Functions. arXiv 2017, arXiv:1710.05941.
40. Huang, C.C.; Oh, K.S.; Wang, S.L.; Panchapakesan, S. Improving the accuracy of on-chip parasitic extraction. In Proceedings of the Electrical Performance of Electronic Packaging, San Jose, CA, USA, 27–29 October 1997; pp. 42–45. [CrossRef]

Disclaimer/Publisher's Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.