Applied Soft Computing 136 (2023) 110067

Contents lists available at ScienceDirect
Applied Soft Computing
journal homepage: www.elsevier.com/locate/asoc

Developing GAN-boosted Artificial Neural Networks to model the rate of drilling bit penetration

Mohammad Hassan Sharifinasab 1, Mohammad Emami Niri 2,∗, Milad Masroor 3
Institute of Petroleum Engineering, School of Chemical Engineering, College of Engineering, University of Tehran, Tehran, Iran

∗ Corresponding author.
E-mail addresses: hassan.sharifi75@ut.ac.ir (M.H. Sharifinasab), emami.m@ut.ac.ir (M. Emami Niri), milad.masroor7275@ut.ac.ir (M. Masroor).
1 M.Sc. holder of petroleum drilling engineering.
2 Assistant Professor.
3 M.Sc. holder of petroleum production engineering.

https://doi.org/10.1016/j.asoc.2023.110067
1568-4946/© 2023 Elsevier B.V. All rights reserved.

article info
Article history: Received 14 July 2022; Received in revised form 16 January 2023; Accepted 22 January 2023; Available online 2 February 2023.
Keywords: GAN-Boosted Neural Networks; Convolutional Neural Network; Residual structure; Principal component analysis; ROP modeling; Sensitivity analysis.

abstract

The goal of achieving a single model for estimating the rate of drilling bit penetration (ROP) with high accuracy has been the subject of many efforts. Analytical methods and, later, data-based techniques have been utilized for this purpose. However, despite their partial effectiveness, these methods were inadequate for establishing models with sufficient generality. Based on deep learning (DL) concepts, this study develops an innovative approach that produces general, boosted models capable of estimating ROP more accurately than other techniques. A vital component of this approach is a deep Artificial Neural Network (ANN) structure known as the Generative Adversarial Network (GAN). The GAN structure is combined with regressor (predictive) ANNs to boost their performance when estimating the target parameter. The predictive ANNs of this study include Multilayer Perceptron Neural Network (MLP-NN) and 1-Dimensional Convolutional Neural Network (1D-CNN) structures. More specifically, the key idea of our approach is to utilize GAN's capability to produce fake samples comparable to true samples in predictive ANNs. The proposed approach therefore introduces a two-step predictive model development procedure. In the first step, the GAN structure is trained with the target of the problem as the input feature. GAN's generator part, which can produce fake ROP samples similar to the true ones after training, is frozen and then replaces the output layer's neuron of the predictive ANNs. In the second step, the final predictive ANNs carrying the frozen trained generator, called GAN-Boosted Neural Networks (GB-NNs), are trained to make predictions. Because this approach reduces the computational load of the predictive model training process and increases its quality, the performance of the predictive ANN models is improved. An additional innovation of this research is the use of a residual structure during 1D-CNN network training, which improved the performance of the 1D-CNN by combining the input data with the features extracted from the inputs. This study revealed that the GB-Res 1D-CNN model, a GAN-Boosted 1-Dimensional Convolutional Neural Network with a Residual structure, results in the most accurate prediction.
The validity of the GB-Res 1D-CNN model is confirmed by its successful implementation in a blind well. As the final step of this study, we conducted a sensitivity analysis to identify the effect of different parameters on the predicted ROP. As expected, the DS and DT parameters significantly affect the model-estimated ROP.
© 2023 Elsevier B.V. All rights reserved.

1. Introduction

The rate of drilling bit penetration (ROP) refers to the speed at which a drilling bit penetrates a formation to deepen the borehole. It is often measured at each depth level in units of speed, usually meters per hour. The ROP is an essential parameter of drilling operations, as it can greatly affect drilling performance and economic efficiency [1].

ROP depends on drilling parameters and formation characteristics. Drilling parameters can be categorized into controllable and uncontrollable factors [2]. The controllable parameters, such as the drilling fluid rate (Q), drill string rotation speed (DS), stand pipe pressure (SPP), bit torque (T), and bit weight (WOB), can be modified without adversely affecting drilling operations. On the other hand, uncontrollable factors, such as the drilling fluid density and rheological properties, and the drilling bit diameter, cannot be easily altered due to geological and economic concerns [3]. Depending on the borehole conditions and formation properties, the drilling engineer can adjust the controllable parameters, based on experience or according to equations and models, to achieve the maximum drilling speed. Therefore, it is necessary to derive mathematical equations/models that quantify the relationship between the controllable drilling parameters and ROP. This can contribute to determining the optimal values of the controllable parameters under any drilling operational conditions [2,4].

ROP estimation models can be classified into physics-based (or traditional) and data-driven (i.e., statistical or machine learning) models [5]. The traditional ROP prediction models are developed based on physical laws and incorporate the effects of drilling parameters and formation characteristics into the prediction process [6–8]. However, they often result in weak predictions because they rely on empirical coefficients, which require supporting data (such as bit parameters and mud properties) and conform to only one facies type (as lithology plays a vital role in determining the empirical coefficients) [9]. The coefficients of the traditional models must be fitted to the offset wells as accurately as possible, which is often a challenging problem in practice [10–12].

In this context, intelligent techniques, particularly machine learning (ML) methods, have been utilized to address the above-mentioned problems of traditional models. ML involves providing computers with the ability to analyze and uncover relationships among the features in a dataset using mathematical algorithms. In fact, machines can learn from data samples and facilitate the solution of nonlinear or complex problems that are difficult to solve analytically [13–15].

Over the years, ML techniques have been widely used by scientists and engineers to address complex modeling, design, and optimization problems [16–20]. In particular, subsurface drilling specialists have employed ML-based approaches to monitor, identify, predict, and describe the trends and patterns among various geological and drilling parameters in real time [e.g., 21–24]. In this context, ANNs have been the most popular choice for ROP prediction, as reported by various researchers [e.g., 25–33]. Some other forms of shallow ML algorithms have also been utilized in ROP modeling; for example, Random Forest (RF) [27,34] and Support Vector Machine (SVM) [9,25] have been used to construct ROP estimation models. Despite the acceptable results obtained by shallow ANNs (e.g., ANNs with a single hidden layer) in ROP estimation, due to the complexity of the relationships between ROP and its related parameters, the generalizability of this ML method is not entirely satisfactory. In fact, as the complexity of the relationships between the parameters increases, the capability of shallow ANNs decreases because (1) shallow ANNs cannot communicate among their input parameters, and (2) as the number of training samples increases, the performance of shallow ANNs does not considerably improve (especially beyond 7000 samples).

The recent progress in DL and image-based artificial intelligence (AI) methods has shifted studies toward them. DL techniques usually have more layers and thus a greater capability to deal with complex problems. They can also communicate between parameters in the input layer, which leads to the extraction of complex nonlinear patterns and accounts for the effects of parameters on each other [35]. Convolutional Neural Networks (CNNs) are a widely used subset of DL techniques. Convolutional units called kernels operate in the hidden layers of CNNs and place the input layer under the convolution operation; this commonly involves a layer that produces a dot product between the kernel and the input-layer matrix. Unlike in shallow ANNs, this operation allows the CNN parameters to be adjusted based on the relationships between the input matrix elements. In addition, the kernels enable the network to utilize data of various dimensions, such as numerical data [36] and image data [37]. CNNs with various structures have been used in many classification and regression problems thanks to the ability of convolution kernels to extract features effectively and prevent information loss and overfitting. For example, Matinkia et al. [38] applied the CNN method to construct a regression model to estimate ROP. Masroor et al. [39,40] showed that, compared to shallow ML methods, the 1D-CNN (compatible with 1D samples) enables better handling of a regression problem. They also showed that transferring the samples' dimension space into a higher-dimensional space, in order to use 2D convolution units, and implementing a particular structure called the residual structure can improve the performance of convolutional neural networks for prediction. Skip connections in residual structures cause the patterns in the input layer to end up next to the patterns extracted by the feature extraction section. As a result, linear relationships in the input layer flow directly into the learning section along with the extracted features.

In this study, we propose a boosting technique based on the Generative Adversarial Network (GAN) method to reinforce shallow and deep ANNs (producing what we call GB-NNs) in the ROP estimation problem. In fact, a GAN structure is incorporated into the model before launching the learning process. The whole process is performed by a two-step training procedure, consisting of (i) a GAN-training step, which combines the predictive ANNs with the GAN's generator, and (ii) a GB-NNs-training step, which results in reduced overfitting and a more accurate ROP prediction.
We demonstrate that (1) boosting shallow and deep models with a GAN structure can enhance accuracy in ROP estimation; (2) the residual form of the 1D-CNN improves its performance; and (3) the deep ANNs outperform the shallow ones. Finally, as the most accurate model, we introduce the GB-Res 1D-CNN model, a deep NN model with a residual structure reinforced by the GAN network.

Since an essential preprocessing step is choosing the optimal features as the network input, it is necessary to reduce the number of features while maintaining the accuracy and performance of the model. Among the techniques available to extract the most relevant features from the input dataset, we employed Principal Component Analysis (PCA) [41] to reduce the complexity of the models and improve their performance.

2. Theory

This study aims to enhance the performance of the predictive ANNs in the ROP modeling problem. The GAN structure is utilized for this purpose; in fact, a part of the GAN structure called the Generator is incorporated into predictive ANNs, leading to the development of boosted structures known as GB-NNs. Therefore, this section introduces ANNs and describes the networks used, namely the MLP-NN, 1D-CNN, and GAN structures. Finally, the development process of GB-NNs is explained.

2.1. Artificial Neural Networks (ANNs)

ANN is a widely used supervised ML technique for solving regression and classification problems. Based upon a network of interconnected artificial neurons, it represents an information processing procedure with characteristics similar to the behavior of an actual brain. ANN involves training a set of parameters called weights and biases on a dataset to determine a function as in Eq. (1):

f(·): R^m → R^o    (1)

where m is the number of dimensions of the input, and o is that of the output. A non-linear approximator can be learned for classification or regression from a set of features and a target. Typically, ANNs are composed of three main components (an input layer, an output layer, and a hidden part that may contain one or more layers), including non-linear neurons. Therefore, every neuron has a non-linear activation function (except for the input nodes). While several activation functions exist, sigmoid and ReLU are the most common. In an ANN, the structure, architecture, and processors change according to the type of the problem. The following sections examine the types of ANNs and their underlying architectures used in this study.

2.1.1. Multilayer Perceptron Neural Network (MLP-NN)

One of the simplest types of ANNs is a fully connected neural network, the MLP-NN with a single hidden layer. This network has an input layer, a hidden layer with a varying number of neurons, and an output layer, all linked by weights. The output layer may consist of a single neuron (for regression tasks) or multiple neurons (for classification tasks). The structure of an MLP-NN with a single hidden layer for a regression task is shown in Fig. 1.

Fig. 1. The structure of MLP-NN with a single hidden layer.

The output of this network can be expressed as Eq. (2):

F(X) = f((σ(X^T · θ_hidden-input))^T · W_output-hidden)    (2)

where X is the feature matrix, θ_hidden-input is the matrix of weights connecting the hidden layer to the input layer, W_output-hidden is the matrix of weights connecting the output layer to the hidden layer, σ is a non-linear activation function, and f is a linear activation function.
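For readers who prefer code to notation, the forward pass of Eq. (2) can be sketched in a few lines of NumPy. The layer sizes below (8 inputs, 50 hidden neurons) are illustrative placeholders rather than the tuned values of this study, and the helper names are ours:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def mlp_forward(X, W_hidden, b_hidden, w_out, b_out):
    """Forward pass of a single-hidden-layer MLP regressor, mirroring Eq. (2).

    X        : (n_samples, n_features) feature matrix
    W_hidden : (n_features, n_hidden) input-to-hidden weights
    b_hidden : (n_hidden,) hidden-layer biases
    w_out    : (n_hidden,) hidden-to-output weights
    b_out    : scalar output bias
    """
    hidden = relu(X @ W_hidden + b_hidden)   # sigma(...): non-linear hidden layer
    return hidden @ w_out + b_out            # f(...): linear output activation

# toy usage with random weights
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))                  # 5 samples, 8 features
y_hat = mlp_forward(X, rng.normal(size=(8, 50)), np.zeros(50),
                    rng.normal(size=50), 0.0)
print(y_hat.shape)                           # (5,)
```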
2.1.2. One-Dimensional Convolutional Neural Network (1D-CNN)

A 1D-CNN consists of two major parts: the feature extraction part (which includes convolutional layers and pooling layers) and the learning part (which consists of fully connected layers). There are numerous combinations of layers within these two sections. Additional layers are typically used, such as the flatten layer, which converts the output of the feature extraction section (the feature map) into a one-dimensional matrix to be used in the learning section. It is also common to incorporate a dropout layer to prevent overfitting. The architecture of a deep 1D-CNN is schematically illustrated in Fig. 2; two convolutional layers, two pooling layers, and a fully connected layer are included in this network.

Fig. 2. A general structure of 1D-CNN.

Convolutional layer: In 1D-CNNs, the convolutional layer, consisting of convolution kernel units, is the core component of the network. A kernel is a spatial numerical matrix that operates as a processing unit (analogous to the neurons of fully connected neural networks) and performs convolution on the input data. Each kernel unit moves along the input matrix and produces a feature matrix, called a feature map, that enters the next layer. As shown in Fig. 2, the kernel units are stacked in the convolutional layer. The kernels' size, number, stride, and padding all influence the convolutional layer's structure. The kernel size is a spatial area representing a local (or restricted) receptive field over the input feature matrix. The kernel is called one-dimensional when the input matrix is a vector. The kernel's stride determines how it moves through the inputs, and the padding technique involves adding or removing values at the input borders to allow the kernel unit to traverse the entire dimension smoothly.

Fig. 3 illustrates the process carried out by a one-dimensional kernel unit. As the kernel moves over different regions of the input matrix, it performs the dot product at each position. The kernel's step can be adjusted through the stride parameter: if the stride is 1, the kernel shifts by one pixel at each iteration, and if it is 2, it moves two pixels per iteration. Moreover, the padding parameter determines how the kernel covers the input matrix. For example, if padding is 0 (Fig. 4A), no border is added to the input matrix, so the kernel center is never placed on the border pixels of the input matrix. In the case of padding 1 (Fig. 4B), an extra margin with zero values is added to the input matrix border.

Fig. 3. Convolution operation through a kernel unit.
Fig. 4. Moving kernel on input matrix. (A) Padding = 0. (B) Padding = 1.

The size of the feature map can be expressed as Eq. (3):

Feature map size = (Input volume size − Kernel size + 2 × Padding) / Stride + 1    (3)

For the kernel center to span all pixels of the input matrix (i.e., for the feature map to have the same size as the input matrix at a stride of 1), the padding value must be as in Eq. (4):

Padding value = (Kernel size − 1) / 2    (4)

Pooling layer: Pooling reduces the size of the feature map (Fig. 5). With a maximum pooling layer, the highest value in every region is selected, while with an average pooling layer, the average value is chosen. A pooling layer is typically placed after each convolutional layer.

Fig. 5. Pooling operation.

Flatten layer: The flatten layer is commonly used in the transition from the convolutional layers to the fully connected layers. This layer reshapes the produced feature maps so that they can enter the fully connected layer.

Fully connected layer(s): This section is responsible for the learning role. It consists of fully connected neural layers, in which each node is connected to all nodes in the previous layer.
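Eqs. (3) and (4) are easy to sanity-check with a small helper; this sketch is ours and assumes integer-valued sizes:

```python
def feature_map_size(input_size: int, kernel_size: int,
                     padding: int = 0, stride: int = 1) -> int:
    """Output length of a 1D convolution, following Eq. (3)."""
    return (input_size - kernel_size + 2 * padding) // stride + 1

def same_padding(kernel_size: int) -> int:
    """Padding that preserves the input length at stride 1, Eq. (4)."""
    return (kernel_size - 1) // 2

print(feature_map_size(100, 3))                    # 98: no padding shrinks the map
print(feature_map_size(100, 3, same_padding(3)))   # 100: 'same' padding
```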
2.1.3. Residual 1D-Convolutional Neural Network (Res 1D-CNN)

As part of this study, to improve the model's performance, the general architecture of the 1D-CNN is modified by adding a residual structure [42]. The more complex and deep an ANN becomes, the more likely it is to suffer from vanishing/exploding gradients, which results in performance degradation. He et al. [42] developed a deep residual architecture called ResNet to overcome this problem. The ResNet architecture is made up of a stack of residual blocks. In a residual block, a layer's output is taken and added to the output of a deeper layer. This crossing of layers in the residual architecture is called a skip connection, or shortcutting. By introducing a residual form, the depth of the model can be increased without adding additional parameters to the training process or causing further computational complexity. Fig. 6 illustrates the configuration of a single residual block and its skip connection. X, the output from the preceding layer, is used as the input of another layer (e.g., a convolutional block), where the function F converts it to F(X). Following the transformation, the original data X is added to the transformed result, F(X) + X being the final output of the residual block.

Fig. 6. Residual block; each rectangle represents a function layer (F).

Fig. 7 illustrates a schematic overview of the developed Res 1D-CNN architecture based on the discussed modifications. Throughout this study, the input layer is positioned next to the output of the 1D-CNN feature extraction section. As a result, the patterns in the input layer remain intact alongside the extracted features, and the learning section of the 1D-CNN can use more information.

Fig. 7. The architecture of the Res 1D-CNN model. The input layer is followed by a sequence of convolutional layers at a residual block to extract features from the map. It is also concatenated with the flattened feature map through a skip connection.
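A minimal Keras sketch of a Res 1D-CNN in the spirit of Fig. 7 follows. The skip connection concatenates the raw input with the flattened feature map, as described above; the filter counts (32, 8, 32) and the 50-neuron dense layer echo Table 6, but the kernel sizes and other details are our assumptions, not the authors' exact implementation:

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_res_1d_cnn(n_features: int) -> Model:
    """Res 1D-CNN sketch: raw input is concatenated with the flattened
    convolutional feature map via a skip connection before the fully
    connected learning section (cf. Fig. 7)."""
    inputs = layers.Input(shape=(n_features, 1))

    # feature extraction section
    x = layers.Conv1D(32, 3, padding="same", activation=tf.nn.leaky_relu)(inputs)
    x = layers.Conv1D(8, 3, padding="same", activation=tf.nn.leaky_relu)(x)
    x = layers.Conv1D(32, 3, padding="same", activation=tf.nn.leaky_relu)(x)
    x = layers.Flatten()(x)

    # skip connection: input-layer patterns flow directly to the learning section
    skip = layers.Flatten()(inputs)
    x = layers.Concatenate()([x, skip])

    # learning section
    x = layers.Dense(50, activation="relu")(x)
    outputs = layers.Dense(1, activation="linear")(x)
    return Model(inputs, outputs)

model = build_res_1d_cnn(n_features=6)
model.compile(optimizer="adam", loss="mse")
model.summary()
```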
2.1.4. Generative Adversarial Network (GAN)

A GAN works as a two-part deep neural network designed to receive actual data (true samples) and generate synthetic data (fake samples) with the greatest similarity to the real ones. This deep ANN contains two different neural components: a generator and a discriminator. Through the training process, the generator section of the network acquires the capability to convert a random matrix drawn from a normal distribution, called the latent matrix, into synthetic data similar to the real data. The discriminator section, in turn, is configured through the training process to discern the generated synthetic data from the true data. The training process therefore comprises a competitive interaction between the generator and the discriminator, ultimately leading to a network capable of producing data similar to the actual ones. Fig. 8 illustrates a schematic view of the GAN structure. The latent matrix in the generator section is used to obtain a fake target value. This fake target value and the true target value enter the discriminator section. Afterward, this section returns a probability between 0 and 1, where 1 indicates that the target value is real and 0 indicates that it is fake, i.e., produced by the generator network.

Fig. 8. Schematic representation of the GAN structure, consisting of a generator and a discriminator. The network receives a random matrix called the latent matrix and creates a fake target sample similar to a true target sample.

2.2. Development of GAN-Boosted Neural Networks (GB-NNs)

Boosting ANNs with a GAN structure means reinforcing predictive ANNs through a GAN structure. This modification involves combining an arbitrary predictive ANN that receives the inputs with a GAN structure that has already encountered the output of the problem. Thus, the network training is conducted in two stages. In the first stage, a GAN structure is trained using the true target values contained in the data, which are the output of the problem. Then the frozen GAN generator section is extracted and replaces the output layer of the predictive ANN, and the resulting final ANN enters the second stage of training. In this context, the term frozen refers to the immutability of weights and biases throughout the training process; the weights and biases of the generator do not change when it is integrated with the predictive ANN during the GB-NN training process. Figs. 9 and 10 show an unmodified (unboosted) predictive ANN and a modified (boosted) predictive ANN, respectively.

Fig. 9. Schematic of an unboosted predictive ANN. This network can be MLP-NN, 1D-CNN, or Res 1D-CNN, based on the hidden layers' number, type, and structure.

According to Fig. 9, an ANN comprises an input layer, a hidden neural layer (or a section containing several layers), and an output layer with one neuron. Depending on the number, type, and structure of the layers in the neural network, it may be an MLP-NN with a single hidden layer, a 1D-CNN, or a Res 1D-CNN. In the context of our problem (ROP prediction), the ANN's hidden part receives inputs such as drilling parameters and conventional well logs through the input layer. In the output layer, the output of the hidden part, which is a matrix of specific dimensions, enters a neuron, which then estimates the target value.

Assume that a GAN structure is familiar with the target samples and that its generator can convert a latent matrix into a fake target value that closely matches the true one. As illustrated in Fig. 10, this generator can be removed from the GAN structure and substituted for the neuron in the output layer of a predictive ANN. As a result, the ANN's hidden part is regulated in a way that should produce a normally distributed matrix to feed the frozen generator. This reduces the computational load of the predictive ANN for the following reasons. (1) The weights linking the latent matrix, produced by the ANN's hidden part, to the generator are frozen during the training of the predictive ANN; only those connecting the input layer to the hidden part need to be altered. (2) Given that adjusting the weights of the connections between the input layer and the hidden part should yield a matrix with a normal distribution, the complexity of the derivative calculations in the training process is lower. With this method, the hidden part's output enters the generator as a latent matrix, and the generator, rather than an output-layer neuron, estimates the target value. Fig. 11 shows a flowchart of the two-step training process of the GB-NNs.

Fig. 10. Schematic of a predictive ANN boosted by a pre-trained frozen GAN generator (GB-NN). The generator is frozen and placed into the predictive ANN in place of the single output neuron once the weights of the generator network have been adjusted to produce fake ROP in the GAN training phase.

Fig. 11. A flowchart of the training process of GB-NNs.
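The two-step procedure can be outlined in Keras as follows. This is a sketch under stated assumptions: a 10-dimensional latent matrix, the GAN and GB-MLP layer sizes of Table 6, and our own function and variable names; the authors' original code is not public.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, Model, Sequential

LATENT_DIM = 10  # assumed size of the latent matrix fed to the generator

# ---- Step 1: train a GAN on the true ROP samples alone ------------------
generator = Sequential([
    layers.Dense(20, activation=tf.nn.leaky_relu, input_shape=(LATENT_DIM,)),
    layers.Dense(10, activation=tf.nn.leaky_relu),
    layers.Dense(1, activation="linear"),
])
discriminator = Sequential([
    layers.Dense(128, activation=tf.nn.leaky_relu, input_shape=(1,)),
    layers.Dense(1, activation="sigmoid"),
])
discriminator.compile(optimizer="adam", loss="binary_crossentropy")

# adversarial model: the generator followed by a frozen discriminator
discriminator.trainable = False
gan = Sequential([generator, discriminator])
gan.compile(optimizer="adam", loss="binary_crossentropy")

def train_gan(rop_samples, epochs=2000, batch_size=16):
    """rop_samples: array of shape (n, 1) holding the true ROP values."""
    for _ in range(epochs):
        real = rop_samples[np.random.randint(0, len(rop_samples), batch_size)]
        noise = np.random.normal(size=(batch_size, LATENT_DIM))
        fake = generator.predict(noise, verbose=0)
        # discriminator step: tell true ROP values from generated ones
        discriminator.train_on_batch(
            np.vstack([real, fake]),
            np.vstack([np.ones((batch_size, 1)), np.zeros((batch_size, 1))]))
        # generator step: try to fool the (frozen) discriminator
        gan.train_on_batch(np.random.normal(size=(batch_size, LATENT_DIM)),
                           np.ones((batch_size, 1)))

# ---- Step 2: build the GB-NN around the frozen generator ----------------
def build_gb_mlp(n_features: int) -> Model:
    generator.trainable = False                 # freeze weights and biases
    inputs = layers.Input(shape=(n_features,))
    hidden = layers.Dense(50, activation=tf.nn.leaky_relu)(inputs)
    latent = layers.Dense(LATENT_DIM, activation="linear")(hidden)
    outputs = generator(latent)                 # generator replaces the output neuron
    model = Model(inputs, outputs)
    model.compile(optimizer="adam", loss="mse")
    return model
```

The same frozen `generator` can equally terminate the 1D-CNN or Res 1D-CNN hidden parts; only the hidden section inside `build_gb_mlp` changes.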
3. Method

As illustrated in Fig. 12, the proposed workflow for ROP prediction consists of three main stages: data gathering and description (Stage 1), data preprocessing and feature selection (Stage 2), and model training and evaluation (Stage 3).

Fig. 12. The proposed workflow for the ROP modeling process: Stage (1) data gathering and description, Stage (2) data preprocessing and feature selection, and Stage (3) model training and evaluation.

3.1. Data gathering and description

The dataset used in this study comprises two categories: drilling data and well-logging data. The drilling data can be obtained from conventional drilling-related logs, such as master and drilling logs, which result from measuring and adjusting the parameters of drilling operations at the surface. Other resources used to extract and analyze data related to well-drilling operations include Final Drilling Reports, Daily Drilling Reports, and Daily Mud Logging Reports. Well-logging data, as the name implies, are obtained by well-logging tools and are measurements of physical properties of the subsurface formations. Both data types, which differ from well to well, are placed next to each other to form the dataset. In this study, drilling and well-logging data of three wells from an oil field in South-West Iran were used to build the final database (two training wells, A and B, and one blind well, C). It should also be noted that the data belong to a specific depth interval of each well (the Sarvak Formation, which is composed of limestone interspersed with layers of shale and anhydrite). Table 1 lists the definitions of all parameters present in the dataset, and Table 2 presents their summary statistics for wells A, B, and C.

Table 1. Drilling and well-logging parameters in the dataset.

Drilling parameters (source: the rig site acquisition system):
- ROP, m/h — drilling bit penetration speed.
- DS, rpm — the drill string's rotational speed.
- WOB, klbf — the downward force exerted by the drill collars on the drilling bit.
- SPP, psi — total pressure loss occurring in the drilling system due to the drilling fluid pressure drop within the fluid circulation path along the well.
- Torque, klbf·ft — the energy required to overcome the rotational friction against the wellbore, the viscous force between the drill string and drilling fluid, and the torque of the drilling bit to rotate it at the bottom of the hole.
- MW, gr/cc — drilling fluid density.
- Q, gpm — the flow rate of drilling fluid.

Well-logging parameters (source: wireline/LWD (Logging While Drilling) tools; LWD refers to a well-logging technique that provides real-time measurement of the physical characteristics of a subsurface formation as it is drilled):
- CGR, GAPI — natural radiation caused by potassium and thorium in the formation.
- DT, µs/ft — the delay time of compressional waves.
- NPHI, DEC — compensated neutron porosity log.
- RHOB, gr/cc — compensated bulk density log.

Table 2. The summary statistics of drilling and well-logging variables for each of the three wells (Minimum / Maximum / Mean / SD).

Well A (6551 samples):
ROP: 0.56 / 19.05 / 10.0707 / 4.3533
WOB: 1.9401 / 18.2543 / 11.2312 / 3.8642
DS: 46.0 / 138.0 / 124.0146 / 16.9024
SPP: 478.0 / 2268.0 / 1967.0361 / 318.3065
Torque: 1.9 / 7.6 / 5.6104 / 0.9104
MW: 1.3 / 1.36 / 1.3414 / 0.0126
Q: 212.74 / 502.9 / 441.6996 / 45.4125
CGR: 2.9766 / 46.7163 / 7.177 / 3.8537
DT: 49.1127 / 110.6902 / 66.8133 / 8.0663
NPHI: 0.014 / 0.3266 / 0.1172 / 0.0668
RHOB: 1.9201 / 2.7506 / 2.5005 / 0.1279

Well B (6801 samples):
ROP: 0.1339 / 27.027 / 7.8517 / 4.9718
WOB: 2.0283 / 35.0535 / 14.1828 / 4.973
DS: 12.1 / 156.0 / 114.6806 / 22.5396
SPP: 560.0 / 3336.0 / 2661.6506 / 602.0977
Torque: 0.7933 / 13.334 / 6.3906 / 1.5358
MW: 1.16 / 1.53 / 1.3263 / 0.036
Q: 202.6199 / 1948.0043 / 741.8476 / 160.8168
CGR: 5.3483 / 46.9711 / 11.4139 / 3.9147
DT: 42.4247 / 120.3512 / 65.2282 / 7.8237
NPHI: 0.01 / 0.49 / 0.112 / 0.0727
RHOB: 2.1386 / 2.7283 / 2.4936 / 0.1129

Well C (6631 samples):
ROP: 0.42 / 11.32 / 5.5568 / 2.1178
WOB: 0.8818 / 20.4809 / 11.9254 / 3.4468
DS: 82 / 125 / 114.0482 / 8.186
SPP: 1908 / 2467 / 2246.3163 / 113.9434
Torque: 2.74 / 7.28 / 5.1966 / 0.716
MW: 1.3 / 1.38 / 1.343 / 0.0107
Q: 503.89 / 593.99 / 556.2272 / 20.9716
CGR: 2.2746 / 68.3711 / 7.264 / 6.0735
DT: 49.819 / 108.3841 / 64.605 / 7.9607
NPHI: 0.0189 / 0.4333 / 0.1323 / 0.0673
RHOB: 2.1596 / 2.6975 / 2.4659 / 0.1271
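As an illustration of how a Table 2 style summary can be produced, a short pandas sketch follows; the file name and column layout are hypothetical:

```python
import pandas as pd

# Hypothetical file layout: one row per depth step, columns as in Table 1.
cols = ["ROP", "WOB", "DS", "SPP", "Torque", "MW", "Q", "CGR", "DT", "NPHI", "RHOB"]
well_a = pd.read_csv("well_A.csv", usecols=cols)   # placeholder path

# min, max, mean, and standard deviation per variable, as in Table 2
summary = well_a.agg(["min", "max", "mean", "std"]).T
print(summary.round(4))
```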
3.2. Data preprocessing

3.2.1. Outlier elimination and denoising

Outliers and noisy samples in a dataset can adversely affect an ML system's performance [43–45]. An outlier is a sample with an unreasonable value. For example, in the dataset used for this research, a sample with a WOB or DS equal to zero and a ROP greater than zero can be considered an outlier. Such samples are not physically meaningful and must be identified and manually removed from the database.

To efficiently handle the noisy data, a smoothing filter called Savitzky–Golay (SG) [46] was employed. The SG filter reduces noise by fitting a polynomial function to the data and replacing the original values with smoother ones. Within a selected interval, which should contain an odd number of points, a polynomial function of order n is fitted to m points based on the least-squares error. Increasing the polynomial order, or decreasing the number of points within the interval, preserves more of the original data structure and reduces the level of smoothing. It is therefore vital to properly determine the polynomial order and the number of points within the interval. We conducted a sensitivity analysis to determine the optimal settings for these two parameters, considering polynomial orders of 1 to 5 and interval sizes of 3 to 49 points. An ANN structure with a single hidden layer containing 10 neurons (equal to the number of input parameters) was trained for each combination of the two parameters. The ANN was trained on the filtered training dataset using the Adam optimizer [47] (an adaptive gradient descent method commonly used in back-propagation (BP) algorithms for training feed-forward neural networks [48,49]), and its performance was evaluated on the filtered blind dataset. Lastly, the correlation coefficient between the actual and predicted values was calculated, as shown in Fig. 13. According to Fig. 13, the optimal polynomial order and interval size are 1 and 45, respectively. Figs. 14, 15, and 16 show the graphs of the various drilling and well-log parameters for all three wells before (red) and after (black) passing through the SG filter.
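A sketch of this cleaning step, assuming the data sit in a pandas DataFrame with one column per parameter; `clean_well_data` is our name, and the SG settings are the ones selected in Fig. 13:

```python
import pandas as pd
from scipy.signal import savgol_filter

def clean_well_data(df: pd.DataFrame) -> pd.DataFrame:
    """Drop physically meaningless samples, then denoise each channel with
    the Savitzky-Golay settings of Fig. 13 (polynomial order 1, 45-point
    window)."""
    # outlier rule from the text: zero WOB or DS cannot produce ROP > 0
    mask = ((df["WOB"] == 0) | (df["DS"] == 0)) & (df["ROP"] > 0)
    df = df.loc[~mask].reset_index(drop=True)

    smoothed = df.copy()
    for col in df.columns:
        smoothed[col] = savgol_filter(df[col], window_length=45, polyorder=1)
    return smoothed
```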
Fig. 13. Correlation coefficients from the ANN model for various interval sizes and the SG filter's polynomial order.
Fig. 14. Graph comparing measured data (red lines) with denoised data (black lines) for well A.
Fig. 15. Graph comparing measured data (red lines) with denoised data (black lines) for well B.
Fig. 16. Graph comparing measured data (red lines) with denoised data (black lines) for well C.

3.2.2. Feature selection

A feature selection procedure reduces the number of input variables, and hence the computational requirements, and enhances the predictive model performance. We used three feature selection methods in this study to determine the key input variables in the training dataset (wells A and B): (1) linear regression analysis, (2) non-linear analysis using ANNs, and (3) PCA.

Linear regression analysis: Linear regression analysis uses correlation coefficients to examine linear relationships between the input variables and the target parameter(s). Features with a small linear correlation with the parameter of interest can be identified and removed. Using this method, it is also possible to identify and remove input features with a high linear relationship with each other, so that duplicate patterns are not introduced into the training process. A heat map of feature correlations is displayed in Fig. 17, which shows the correlation coefficients between all parameters. It can be seen that the correlation between ROP and Q (mud flow rate) is negligible. Also, SPP has a significant linear correlation (R² greater than 0.9) with Q, and thus Q is removed from the dataset. Furthermore, there is a high linear correlation between DT, RHOB, and NPHI, which means that two of these three variables should be dropped; which one should remain is addressed in the next steps.

Fig. 17. Feature correlation heat map that shows correlation coefficients between every two features.
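The linear screening step can be expressed compactly with pandas; the 0.1/0.9 thresholds below are illustrative, not values reported in the paper:

```python
import pandas as pd

def screen_linear_features(df: pd.DataFrame, target: str = "ROP",
                           low: float = 0.1, high: float = 0.9) -> tuple:
    """Flag removal candidates: features nearly uncorrelated with the target,
    and feature pairs so correlated that they carry duplicate patterns."""
    corr = df.corr().abs()
    weak = [c for c in corr.columns
            if c != target and corr.loc[c, target] < low]
    redundant = [(a, b) for a in corr.columns for b in corr.columns
                 if a < b and target not in (a, b) and corr.loc[a, b] > high]
    return weak, redundant
```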
Non-linear analysis using ANNs: An examination of the non-linear relationships between the features and the target parameter can be carried out using ANNs [50]. In an ANN with a hidden layer, each input feature is multiplied by its weights, and the sum of these products enters a non-linear function to estimate the target parameter; the weights assigned to each input feature can therefore be interpreted as a measure of the non-linear correlation between that feature and the target parameter.

If the vector X (x1, x2, …, xM) is the ANN input layer and the vector Wj (w1j, w2j, …, wMj) holds the weights of neuron j (j = 1, 2, …, N) in the hidden layer, the output of neuron j can be expressed as Eq. (5):

o_j = σ(X^T W_j)    (5)

where σ is the non-linear activation function applied in the ANN's hidden layer, which non-linearly transmits the content of a hidden-layer neuron to the next layer.

Fig. 18. A schematic view of an ANN with a single hidden layer. Each node (input feature) in the input layer is connected to the hidden-layer neurons by weights. The weights indicate the influence of the inputs on the hidden-layer neurons. For instance, weight w11 indicates the impact of feature 1 on hidden-layer neuron 1, while weight w21 indicates the effect of feature 2 on hidden-layer neuron 1.

As illustrated in Fig. 18, N weights attach each feature to the N neurons in the hidden layer, and the greater the weight attaching a feature to neuron j, the greater the effect of that feature on neuron j. As shown in Eq. (6), the magnitude of the effect of a particular feature xi on the ANN hidden layer (its feature importance) can be determined by summing the absolute values of the weights attached to that feature and dividing by the corresponding sum over all features. In other words, feature importance is a kind of non-linear correlation between the feature and the ANN output.

feature importance_xi = Σ_{j=1}^{N} |w_ij| / ( Σ_{i=1}^{M} Σ_{j=1}^{N} |w_ij| )    (6)

In Table 3, the non-linear correlations between the features in the training dataset and the ROP parameter are reported. These values were derived from the ANN trained in the data denoising section that had the highest R².

Table 3. Non-linear correlations between the dataset features and ROP (feature importance), derived by the ANN concept.
DS: 0.474447165
WOB: 0.164021691
SPP: 0.037629097
Torque: 0.101870934
MW: 0.019082706
Q: 0.031409586
CGR: 0.065480516
DT: 0.238409536
NPHI: 0.025232925
RHOB: 0.033376421

The linear regression analysis concluded that two parameters among DT, NPHI, and RHOB must be removed due to their close linear relationship; however, their high linear correlation with the ROP parameter made it challenging to choose which one to retain. After examining the non-linear relationship between these three parameters and ROP, we determined that the NPHI and RHOB parameters are significantly less important than DT and should be excluded from the dataset.
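Eq. (6) amounts to a column-sum of absolute weights; a small NumPy helper makes this explicit (extracting `W_hidden` from a trained Keras model is shown in the comment):

```python
import numpy as np

def feature_importance(W_hidden: np.ndarray) -> np.ndarray:
    """Eq. (6): per-feature importance from the input-to-hidden weight
    matrix of a trained single-hidden-layer ANN.

    W_hidden : (n_features, n_hidden) weights w_ij
    returns  : (n_features,) importances that sum to 1
    """
    per_feature = np.abs(W_hidden).sum(axis=1)   # sum over j of |w_ij|
    return per_feature / per_feature.sum()       # normalize over all features

# e.g., with Keras: W_hidden = model.layers[0].get_weights()[0]
```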
Principal component analysis (PCA): PCA is a widely used technique for reducing the dimensionality of datasets [51]. It aims to reduce the number of variables in a dataset while preserving as much information as possible. By definition, PCA finds directions, called Principal Components (PCs), that exhibit the highest possible variance, i.e., the directions carrying the most information. The original space is transformed into a new one using PCA (Fig. 19). The PCs are uncorrelated variables generated by linearly combining the existing variables such that the first PC has the most variance (information), the second PC has the most of the remaining variance, and so forth.

Fig. 19. PCA converts the data space from (X, Y) to (X′, Y′).

It is necessary to standardize the data before applying PCA. The standardization of the dataset can be achieved using Eq. (7):

x_{i,std} = (x_i − average_x) / standard deviation_x    (7)

In the next step, the covariance matrix is calculated as Eq. (8):

Cov. Matrix = [ Cov(p1, p1) ⋯ Cov(p1, pN); ⋮ ⋱ ⋮; Cov(pN, p1) ⋯ Cov(pN, pN) ],  N = number of variables in the dataset    (8)

where Cov(pi, pj) indicates the covariance of the ith and jth parameters in the dataset.

Once the covariance matrix is generated, the eigenvalues (λ1, λ2, …, λN), which indicate the variation of the corresponding PCs, and the eigenvectors (v1, v2, …, vN) are calculated. For each eigenvalue there is a corresponding eigenvector with N relative coefficients (vi = RC1i, RC2i, …, RCNi), where RCji represents the effect of feature j on eigenvector i. Following the calculation of the eigenvalues and eigenvectors, the PCs that exhibit the highest eigenvalues are selected. Fig. 20 shows that, among all ten PCs, 97.86% of the total dataset variation is accounted for by the first six PCs (PC1 to PC6), while the other four PCs comprise only 2.14% of the total variation. As a result, the 10-dimensional dataset of this study (a dataset with ten variables) can be condensed into a 6-dimensional dataset.

Fig. 20. Percentage of variation (λi / Σ_{i=1}^{10} λi) for the different PCs.
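In practice, the standardization of Eq. (7) and the eigen-decomposition can be delegated to scikit-learn; a sketch, assuming the denoised 10-variable training matrix `X`:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def pca_reduce(X: np.ndarray, n_keep: int = 6):
    """Standardize (Eq. 7), then extract the leading PCs.
    n_keep=6 follows the 97.86% variance threshold of Fig. 20."""
    X_std = StandardScaler().fit_transform(X)
    pca = PCA(n_components=n_keep).fit(X_std)
    print("explained variance ratios:", pca.explained_variance_ratio_)
    print("cumulative:", pca.explained_variance_ratio_.cumsum()[-1])
    return pca.transform(X_std), pca.components_  # scores and eigenvectors
```

`pca.components_` holds the relative coefficients reported in Table 4; note that the paper inspects these coefficients to decide which original variables to drop, rather than modeling in the projected space.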
Table 4 lists the eigenvectors for the first six PCs. The ten relative coefficients in each vi indicate the significance of the corresponding parameters in PCi.

Table 4. Relative coefficients of the first six PCs (v1 to v6) for the parameters DS, WOB, SPP, Torque, MW, Q, CGR, DT, NPHI, and RHOB:
−0.28095 −0.4214 −0.43723 −0.39923 −0.26045 0.048657 −0.03736 −0.13074 0.245571 −0.93617 0.047275 −0.19252 0.018546 0.043967 −0.02924 0.036426 0.012969 0.046121 −0.27941 −0.23612 −0.01542 0.854505 0.275147 0.212894 −0.10539 −0.25636 0.092881 −0.81423 −0.02465 0.325951 0.058472 0.107607 0.266467 −0.21197 −0.11927 0.276378 0.196753 −0.30271 −0.30614 −0.15363 −0.25305 0.181297 −0.43038 −0.44038 0.444217 0.024681 −0.44764 −0.20187 0.232894 0.232689 −0.23217 0.31161 −0.33444 0.754833 0.160608 −0.21609 0.260013 0.091791 0.08324 −0.01008

A parameter's importance in a PC increases as its absolute relative coefficient increases. For example, the two highest absolute relative coefficients in PC3 are −0.93617 and 0.245571, which relate to MW and Torque, respectively, meaning that a slight change in these parameters has a substantial impact on PC3. The dataset dimension should be reduced from 10 to 6. Accordingly, one can identify the four parameters (marked in red in Table 4) that are the least significant in each of the six PCs; the parameters that are insignificant across the PCs can then be removed. MW is among the four least important parameters in four directions (PC1, PC2, PC5, and PC6). Also, as indicated above, the mean of the ANN's weights for this parameter was negligible in the non-linear analysis section. Thus, this parameter is removed from the dataset. Q, which was proposed for removal in the linear regression analysis section, has the lowest relative coefficient in three directions (PC1, PC4, and PC6). Additionally, the linear regression and non-linear ANN analyses indicated a small correlation between this parameter and ROP. Therefore, it has been removed from the dataset. Given that the variable Q is a function of SPP, it is noteworthy that using both in modeling would bias the model toward repetitive patterns. The RHOB and NPHI variables, which should be omitted according to the non-linear analysis section, are among the four parameters with minimal effect on the variation in three PCs (PC2, PC3, and PC5). In addition, our linear analysis revealed a high correlation between DT and both of these parameters: all three are measures of porosity and are related to each other, which makes one of them sufficient for modeling. Therefore, RHOB and NPHI are excluded from the dataset.

3.3. Model training and evaluation

As already mentioned, the dataset used in this study contains samples from wells A and B (the training dataset) and well C (the blind dataset). To predict ROP, we used the MLP-NN with a single hidden layer (a shallow neural network), the 1D-CNN (a deep convolutional neural network), and the Res 1D-CNN (a residual deep convolutional neural network) structures, before and after the boosting process. We also applied a widely accepted physics-based model (the Bingham model) for ROP estimation as a baseline approach. Each of the applied ANNs has its own set of hyperparameters that need to be adjusted before model training, as these hyperparameters affect the accuracy and performance of the models [52]. The PSO algorithm was used to determine the optimal set of hyperparameters for each network [53]. Before boosting the predictive ANNs, training is carried out on the training dataset, with 30% of the training section serving as samples for setting the hyperparameters and as validation samples to prevent overtraining. To train the GB-NNs, it is first necessary to adjust the GAN generator's weights with ROP samples. Accordingly, GAN training was performed using the ROP samples in the training dataset. The structure of the predictive ANNs is initialized once the GAN is trained. Next, the GAN's generator, in the frozen state, replaces the output-layer neuron within each predictive network to form the GB-NN structure. In training the GB-NNs, the dataset splitting is the same as for training the unboosted ANNs. The final validity and performance of both the unboosted NNs and the GB-NNs are then evaluated on the blind dataset using the statistical metrics in Table 5. Fig. 21 illustrates how the data are split for training and evaluating the models.

Table 5. Mathematical metrics used in this study to evaluate the performance of the models (xi: measured values, yi: predicted values, m: number of samples).

Correlation coefficient (R):
R = Σ_{i=1}^{m} (xi − x̄)(yi − ȳ) / √( Σ_{i=1}^{m} (xi − x̄)² · Σ_{i=1}^{m} (yi − ȳ)² )

Root Mean Square Error (RMSE):
RMSE = ( (1/m) Σ_{i=1}^{m} (xi − yi)² )^(1/2)

Mean Absolute Error (MAE):
MAE = (1/m) Σ_{i=1}^{m} |xi − yi|

Mean Absolute Percentage Error (MAPE):
MAPE = (1/m) Σ_{i=1}^{m} |(xi − yi)/xi|

Fig. 21. The segmentation of data during the modeling and validation processes. The depth interval of the Sarvak Formation was selected as the length of the dataset in wells A, B, and C. Data from wells A and B are used for the model's training process (blue). Data from well C is used for the final validation of the model (green).
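The four metrics of Table 5 in NumPy form (x: measured ROP, y: predicted ROP), for use on the blind well:

```python
import numpy as np

def evaluate(x: np.ndarray, y: np.ndarray) -> dict:
    """Metrics of Table 5 for a measured/predicted ROP pair."""
    r = (np.sum((x - x.mean()) * (y - y.mean()))
         / np.sqrt(np.sum((x - x.mean())**2) * np.sum((y - y.mean())**2)))
    rmse = np.sqrt(np.mean((x - y) ** 2))
    mae = np.mean(np.abs(x - y))
    mape = np.mean(np.abs((x - y) / x))
    return {"R": r, "RMSE": rmse, "MAE": mae, "MAPE": mape}

# usage: print(evaluate(rop_measured_well_c, rop_predicted_well_c))
```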
4. Experiments, results, and discussions

4.1. Hyper-parameters setting

Each of the three GB-NNs has specific hyper-parameters tuned with the PSO optimization algorithm. The tuned hyper-parameter values of GB-MLP-NN, GB-1D-CNN, and GB-Res 1D-CNN are reported in Table 6. It is worth noting that PSO also determined the optimal hyper-parameter values for the unboosted ANNs.

Table 6. PSO-optimized values for the hyper-parameters of GAN, GB-MLP, GB-1D-CNN, and GB-Res 1D-CNN.

GAN — optimizer: Adam; learning rate: 0.002; activation functions: fully connected layers, Leaky ReLU and Linear; batch size: 16; epochs: 2000; early stopping on validation loss: no; layers: generator 3, discriminator 2; neurons: generator 20, 10, and 1; discriminator 128 and 1.

GB-MLP — optimizer: Adam; learning rate: 0.00113; activation functions: fully connected layers, Leaky ReLU and Linear; batch size: 16; epochs: 2000; early stopping on validation loss: yes, patience 50; layers: 2 fully connected; neurons: 50 and 1.

GB-1D-CNN — optimizer: Adam; learning rate: 0.00021; activation functions: Conv1D layers, Leaky ReLU; fully connected layers, ReLU and Linear; batch size: 8; epochs: 2000; early stopping on validation loss: yes, patience 50; layers: 3 Conv1D and 2 fully connected; filters/neurons: Conv1D 32, 8, and 32; dense 50 and 1.

GB-Res 1D-CNN — optimizer: Adam; learning rate: 0.0001; activation functions: Conv1D layers, Leaky ReLU; fully connected layers, ReLU and Linear; batch size: 64; epochs: 2000; early stopping on validation loss: yes, patience 50; layers: 3 Conv1D and 2 fully connected; filters/neurons: Conv1D 32, 8, and 32; dense 50 and 1.

4.2. Effect of GAN structure

To highlight the effect of applying the GAN structure on the predictive ANNs' performance in our ROP prediction problem, we compared the GB-NNs with the unboosted ANN models (no GAN structure involved). Based on Fig. 22, the GB-NNs have lower errors than the unboosted ANNs, as the error line (blue line) for the GB-NNs is closer to a straight line at zero. Fig. 23 shows the actual ROP values versus those estimated by the predictive ANNs. A higher accumulation of black dots around the blue line (correlation line) and greater conformity of the red line (regression line) with the blue line indicate improved results for the GB-NNs.

Fig. 22. A comparison of the performance of unboosted predictive ANNs (top row) and GB-NNs (bottom row) in estimating the value of ROP. The red line displays the actual value of ROP, the black line displays the estimated value, and the blue line shows the model's error.

Fig. 23. The linear regression between the actual ROP values and the estimated values for unboosted predictive ANNs (top row) and GB-NNs (bottom row).

Also, according to the statistical parameters reported in Table 7, adding the GAN's pre-trained generator structure does help improve the ROP prediction performance. This improvement occurs for two reasons: (1) the generator weights do not need to be adjusted during model training, and (2) the weights of the hidden part only need to be adjusted so that the output of this part of the network is a matrix with a normal distribution. In other words, with the addition of a pre-trained generator, the network becomes deeper while the computational load and the complexity of the network training process are reduced, which ultimately enhances model performance.

Besides, we compared the ANN methods (GB-NNs and original ANNs) with a physics-based method, the Bingham model [6]. All methods were trained and evaluated using the same training dataset (wells A and B) and blind dataset (well C). The statistical results of all approaches on the blind dataset are reported in Table 7. As shown, our proposed ANN models outperform the physics-based method in terms of the employed performance metrics.

Table 7. Statistical results of the ANN methods and the physics-based Bingham method on the blind dataset (well C).

Method / R / RMSE / MAE / MAPE / Error standard dev.
GB-MLP-NN: 0.9287 / 0.6600 / 0.4656 / 0.0990 / 0.658
GB-1D-CNN: 0.9443 / 0.6016 / 0.4326 / 0.0925 / 0.6
GB-Res 1D-CNN: 0.9690 / 0.4526 / 0.3245 / 0.0713 / 0.4304
Unboosted MLP-NN: 0.9030 / 1.0070 / 0.6950 / 0.1652 / 0.979
Unboosted 1D-CNN: 0.9270 / 0.6743 / 0.4746 / 0.0971 / 0.6685
Unboosted Res 1D-CNN: 0.9520 / 0.5356 / 0.3740 / 0.0821 / 0.5303
Bingham: 0.7275 / 1.8260 / 1.4570 / 0.2393 / 1.315
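For context, the Bingham baseline has only two constants to fit. The form below is the one commonly attributed to Bingham (1965) in the ROP literature, ROP = a·(WOB/d_b)^b·N; treat it as an assumption, since the paper does not restate the equation:

```python
import numpy as np
from scipy.optimize import curve_fit

def bingham_rop(X, a, b):
    """A common form of the Bingham (1965) model: ROP = a * (WOB/d_b)**b * N,
    where N is the rotary speed (DS) and d_b is the bit diameter."""
    wob, ds, d_bit = X
    return a * (wob / d_bit) ** b * ds

# fit the two constants on the training wells (placeholder arrays):
# popt, _ = curve_fit(bingham_rop, (wob_train, ds_train, d_bit), rop_train)
```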
4.3. Results and statistics of the proposed GB-NNs

This section presents the final result of each GB-NN on the training and blind datasets. The ROP predicted by the three models (GB-MLP-NN, GB-1D-CNN, and GB-Res 1D-CNN) is compared with the measured ROP in wells A, B, and C to assess each model's performance and accuracy. For this purpose, the corresponding values of the R, RMSE, MAE, and MAPE performance metrics are calculated for each evaluation. The performance of the GB-MLP-NN, GB-1D-CNN, and GB-Res 1D-CNN models on the blind dataset (well C) is shown in Figs. 24 to 26.

Figs. 24(A) to 26(A) illustrate the ROP profile of the blind well. The ROP predicted by the GB-Res 1D-CNN (solid black line) is more consistent with the real one (solid red line) than those predicted by the GB-MLP-NN and GB-1D-CNN methods. Besides, Figs. 24(B) to 26(B) show the cross plots of the measured versus predicted ROP for the GB-MLP-NN, GB-1D-CNN, and GB-Res 1D-CNN methods in well C. The black dots representing the predicted ROP in Fig. 26(B) are closer to the red and blue lines than those in Figs. 24(B) and 25(B), demonstrating the strong correlation between the measured and predicted ROP values in Fig. 26(B). The statistical parameters of the predicted results are listed in Table 8. The GB-Res 1D-CNN supplies the highest R-value and the lowest RMSE, MAE, and MAPE values, which demonstrates that the GB-Res 1D-CNN can more effectively identify the correlations between the conventional well logs and the reservoir ROP.

Table 8. Statistical results of GB-MLP-NN, GB-1D-CNN, and GB-Res 1D-CNN on the blind dataset (well C).

Method / R / RMSE / MAE / MAPE
GB-MLP-NN: 0.9287 / 0.6600 / 0.4656 / 0.0990
GB-1D-CNN: 0.9443 / 0.6016 / 0.4326 / 0.0925
GB-Res 1D-CNN: 0.9690 / 0.4526 / 0.3245 / 0.0713

The errors between the target values and the predicted outputs are shown in Figs. 24(C) to 26(C). The relatively small errors between the measured and predicted ROP values (solid blue line) demonstrate that the ROP predicted by the proposed GB-Res 1D-CNN is reliable; smaller error values mean higher confidence in the predictions, while positions with larger errors correspond to large deviations between the predicted and measured ROP. Figs. 24(D) to 26(D) show the histograms of the error between the measured and predicted ROP values for the GB-MLP-NN, GB-1D-CNN, and GB-Res 1D-CNN methods in well C. The GB-Res 1D-CNN's average and standard deviation of the error values are relatively small, which means that the predicted result based on the GB-Res 1D-CNN method is reliable.

Fig. 24. ROP prediction performance of GB-MLP-NN on the blind dataset. (A) True (targets) versus predicted (outputs) ROP values. (B) Cross-plot showing the true versus estimated ROP values. (C) The relative deviation of GB-MLP-NN for ROP prediction versus the relevant true ROP data samples. (D) Histogram of the error between the true and estimated ROP values.

Fig. 25. ROP prediction performance of GB-1D-CNN on the blind dataset. (A) True (targets) versus predicted (outputs) ROP values. (B) Cross-plot showing the true versus estimated ROP values. (C) The relative deviation of GB-1D-CNN for ROP prediction versus the relevant true ROP data samples. (D) Histogram of the error between the true and estimated ROP values.

The error-bar comparison of all three models on wells A, B, and C is shown in Fig. 27. The GB-Res 1D-CNN method performs best (R closest to 1 and RMSE, MAE, and MAPE values closest to 0).

Fig. 27. Comparison of the applied GB-NN models on the training (wells A and B) and blind (well C) datasets in terms of (A) correlation coefficient (R), (B) root mean squared error (RMSE), (C) mean absolute error (MAE), and (D) mean absolute percentage error (MAPE).

When we compare the performance of the GB-CNNs (GB-1D-CNN and GB-Res 1D-CNN) with the GB-MLP-NN method, the performance of the GB-CNNs on both the training and blind datasets is better than that of the GB-MLP-NN method (Table 8). This is because of the efficient feature extraction of the 1D-CNN. Besides the inherent superiority of the GB-1D-CNN method owing to its deeper structure and pattern recognition potential, the applied modification (the residual structure) has further improved its performance. According to the statistical parameters reported in Table 8, adding the residual structure does help improve the ROP prediction performance. This improvement arises because the residual structure positions the input layer next to the output of the 1D-CNN feature extraction section; as a result, the patterns in the input layer remain intact alongside the extracted features, and the learning section of the 1D-CNN can use more information. Considering all the output results, the GB-Res 1D-CNN has shown satisfactory stability and generalization capability.
Fig. 26. ROP prediction performance of GB-Res 1D-CNN on the blind dataset. (A) True (targets) versus predicted (outputs) ROP values. (B) Cross-plot showing the true versus estimated ROP values. (C) The relative deviation of GB-Res 1D-CNN for ROP prediction versus the relevant true ROP data samples. (D) Histogram of the error between the true and estimated ROP values.

4.4. Sensitivity analysis

Once the GB-Res 1D-CNN model was validated, an ROP sensitivity analysis using the samples of well C was conducted; the results are shown in Fig. 28. To perform a sensitivity analysis, a baseline case must be defined. The baseline case in this study is one in which all the input features take the average value of the well C samples; it is located at the center of the graph. When each parameter is changed with respect to the baseline case, the resulting change in ROP is shown as a percentage. This analysis shows that the DS feature has the greatest impact on ROP among the features, as was also demonstrated in the linear and non-linear analyses of the actual well samples. This impact is quite significant; for example, a 5% increase in DS increases the ROP by approximately 30%. Another feature with a significant effect on ROP is DT: when DT increases, ROP rises dramatically. As NPHI and RHOB were eliminated in the linear and non-linear analyses, DT represents the compaction and porosity of the rocks; a change in DT therefore indicates a change in rock strength [54], which explains ROP's sensitivity to DT. Theoretically, the Torque and WOB factors, which are proportional to the energy needed to overcome the rock's mechanical specific energy (MSE), should increase ROP, assuming other conditions remain constant [55]. The model results also reflect this effect, although ROP's sensitivity to Torque and WOB is relatively low. An increase in the CGR feature, which indicates more clay minerals in the rock formation, represents a softening of the drilled rock; under ideal hole-cleaning conditions, this softening increases the drilling speed. The sensitivity analysis indicates that this factor has no significant effect on ROP compared to DS and DT. Consequently, the negative relationship between CGR and ROP in the linear analysis of the features may be attributable to the influence of other features that significantly impact ROP.

Fig. 28. GB-Res 1D-CNN model's sensitivity analysis plot.
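A one-at-a-time perturbation loop of the kind behind Fig. 28 can be sketched as follows; the function works with any trained regressor exposing predict(), and the perturbation levels are illustrative:

```python
import numpy as np

def sensitivity(model, X_well_c: np.ndarray,
                changes=(-0.10, -0.05, 0.05, 0.10)) -> dict:
    """One-at-a-time sensitivity around a baseline built from the feature
    means of well C, in the spirit of Fig. 28."""
    baseline = X_well_c.mean(axis=0, keepdims=True)
    rop_base = np.ravel(model.predict(baseline))[0]
    results = {}
    for i in range(X_well_c.shape[1]):
        row = []
        for c in changes:
            perturbed = baseline.copy()
            perturbed[0, i] *= (1.0 + c)         # change one feature only
            rop = np.ravel(model.predict(perturbed))[0]
            row.append(100.0 * (rop - rop_base) / rop_base)  # % ROP change
        results[i] = row
    return results
```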
5. Conclusion

This study proposed an innovative DL-based approach to estimate ROP using drilling data and conventional well logs. Using the GAN structure, we boosted ANN models and developed GB-NN models. As a first step, the GAN structure was trained; then the generator part, responsible for converting the latent matrix to ROP, was substituted for the output-layer neuron of the predictive ANNs. We concluded that the GAN boosting technique enhances the ROP prediction performance of ANNs. The main reasons for this improvement are as follows: (1) a reduction in the computational load during the training process, because the generator of the GAN is frozen; and (2) the hidden part is adapted to produce an output with a normal distribution, which enhances the training quality of the network. As side results of this study, the following conclusions are also drawn: (1) The ANN models outperform the commonly used physics-based Bingham model. (2) The 1D-CNN model performs better, in both boosted and unboosted modes, than the MLP-NN model, which is a shallow ANN. This is due to the communication between the features of the input layer provided by the convolution units and the extraction of features from the parameters through convolution sequences. (3) The Res 1D-CNN model outperforms the 1D-CNN in both boosted and unboosted modes due to its residual structure: the input layer is added to the convolution section's output, so the input-layer patterns remain intact alongside the features extracted from them, and the fully connected section can obtain more information.

CRediT authorship contribution statement

Mohammad Hassan Sharifinasab: Conceptualization, Methodology, Software, Coding, Investigation, Writing – original draft, Formal analysis. Mohammad Emami Niri: Conceptualization, Validation, Resources, Supervision, Writing – review & editing. Milad Masroor: Conceptualization, Methodology, Software, Coding, Investigation, Writing – original draft, Formal analysis.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

The data that has been used is confidential.

References

[1] L.F.F. Barbosa, A. Nascimento, M.H. Mathias, J.A. de Carvalho Jr., Machine learning methods applied to drilling rate of penetration prediction and optimization–A review, J. Pet. Sci. Eng. 183 (2019) 106332, http://dx.doi.org/10.1016/j.petrol.2019.106332.
[2] A. Alsaihati, S. Elkatatny, H. Gamal, Rate of penetration prediction while drilling vertical complex lithology using an ensemble-learning model, J. Pet. Sci. Eng. 208 (2022) 109335, http://dx.doi.org/10.1016/j.petrol.2021.109335.
[3] A.T. Bourgoyne, K.K. Millheim, M.E. Chenevert, F.S. Young, Applied Drilling Engineering, Vol. 2, Society of Petroleum Engineers, Richardson, 1986, p. 514.
[4] O. Bello, J. Holzmann, T. Yaqoob, C. Teodoriu, Application of artificial intelligence methods in drilling system design and operations: A review of the state of the art, J. Artif. Intell. Soft Comput. Res. 5 (2015) http://dx.doi.org/10.1515/jaiscr-2015-0024.
[5] C. Hegde, H. Daigle, H. Millwater, K. Gray, Analysis of rate of penetration (ROP) prediction in drilling using physics-based and data-driven models, J. Pet. Sci. Eng. 159 (2017) 295–306, http://dx.doi.org/10.1016/j.petrol.2017.09.020.
[6] G. Bingham, A new approach to interpreting rock drillability, Technical Manual Reprint Oil Gas J. 1965 (1965) 93.
[7] A.T. Bourgoyne, F.S. Young, A multiple regression approach to optimal drilling and abnormal pressure detection, Soc. Petrol. Eng. J. 14 (04) (1974) 371–384, http://dx.doi.org/10.2118/4238-PA.
[8] H.R. Motahhari, G. Hareland, J.A.
CRediT authorship contribution statement

Mohammad Hassan Sharifinasab: Conceptualization, Methodology, Software, Coding, Investigation, Writing – original draft, Formal analysis. Mohammad Emami Niri: Conceptualization, Validation, Resources, Supervision, Writing – review & editing. Milad Masroor: Conceptualization, Methodology, Software, Coding, Investigation, Writing – original draft, Formal analysis.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

The data that has been used is confidential.

References

[1] L.F.F. Barbosa, A. Nascimento, M.H. Mathias, J.A. de Carvalho Jr., Machine learning methods applied to drilling rate of penetration prediction and optimization–A review, J. Pet. Sci. Eng. 183 (2019) 106332, http://dx.doi.org/10.1016/j.petrol.2019.106332.
[2] A. Alsaihati, S. Elkatatny, H. Gamal, Rate of penetration prediction while drilling vertical complex lithology using an ensemble-learning model, J. Pet. Sci. Eng. 208 (2022) 109335, http://dx.doi.org/10.1016/j.petrol.2021.109335.
[3] A.T. Bourgoyne, K.K. Millheim, M.E. Chenevert, F.S. Young, Applied Drilling Engineering, Vol. 2, Society of Petroleum Engineers, Richardson, 1986, p. 514.
[4] O. Bello, J. Holzmann, T. Yaqoob, C. Teodoriu, Application of artificial intelligence methods in drilling system design and operations: A review of the state of the art, J. Artif. Intell. Soft Comput. Res. 5 (2015) http://dx.doi.org/10.1515/jaiscr-2015-0024.
[5] C. Hegde, H. Daigle, H. Millwater, K. Gray, Analysis of rate of penetration (ROP) prediction in drilling using physics-based and data-driven models, J. Pet. Sci. Eng. 159 (2017) 295–306, http://dx.doi.org/10.1016/j.petrol.2017.09.020.
[6] G. Bingham, A new approach to interpreting rock drillability, Technical Manual Reprint Oil Gas J. 1965 (1965) 93.
[7] A.T. Bourgoyne, F.S. Young, A multiple regression approach to optimal drilling and abnormal pressure detection, Soc. Petrol. Eng. J. 14 (04) (1974) 371–384, http://dx.doi.org/10.2118/4238-PA.
[8] H.R. Motahhari, G. Hareland, J.A. James, Improved drilling efficiency technique using integrated PDM and PDC bit parameters, J. Can. Pet. Technol. 49 (10) (2010) 45–52, http://dx.doi.org/10.2118/141651-PA.
[9] C. Soares, K. Gray, Real-time predictive capabilities of analytical and machine learning rate of penetration (ROP) models, J. Pet. Sci. Eng. 172 (2019) 934–959, http://dx.doi.org/10.1016/j.petrol.2018.08.083.
[10] A. Bahari, A. Baradaran Seyed, Trust-region approach to find constants of Bourgoyne and Young penetration rate model in Khangiran Iranian gas field, in: Latin American & Caribbean Petroleum Engineering Conference, OnePetro, 2007, http://dx.doi.org/10.2118/107520-MS.
[11] H. Rahimzadeh, M. Mostofi, A. Hashemi, A new method for determining Bourgoyne and Young penetration rate model constants, Petrol. Sci. Technol. 29 (9) (2011) 886–897, http://dx.doi.org/10.1080/10916460903452009.
[12] M. Najjarpour, H. Jalalifar, S. Norouzi-Apourvari, Half a century experience in rate of penetration management: Application of machine learning methods and optimization algorithms–A review, J. Pet. Sci. Eng. 208 (2022) 109575, http://dx.doi.org/10.1016/j.petrol.2021.109575.
[13] A. Al-AbdulJabbar, A.A. Mahmoud, S. Elkatatny, Artificial neural network model for real-time prediction of the rate of penetration while horizontally drilling natural gas-bearing sandstone formations, Arab. J. Geosci. 14 (2) (2021) 1–14, http://dx.doi.org/10.1007/s12517-021-06457-0.
[14] O. Hazbeh, S.K.Y. Aghdam, H. Ghorbani, N. Mohamadian, M.A. Alvar, J. Moghadasi, Comparison of accuracy and computational performance between the machine learning algorithms for rate of penetration in directional drilling well, Petrol. Res. 6 (3) (2021) 271–282, http://dx.doi.org/10.1016/j.ptlrs.2021.02.004.
[15] M. Emami Niri, R. Amiri Kolajoobi, M.K. Arbat, M.S. Raz, Metaheuristic optimization approaches to predict shear-wave velocity from conventional well logs in sandstone and carbonate case studies, J. Geophys. Eng. 15 (3) (2018) 1071–1083, http://dx.doi.org/10.1088/1742-2140/aaaba2.
[16] H. Haddadpour, M. Emami Niri, Uncertainty assessment in reservoir performance prediction using a two-stage clustering approach: Proof of concept and field application, J. Pet. Sci. Eng. 204 (2021) 108765, http://dx.doi.org/10.1016/j.petrol.2021.108765.
[17] Y. Haghshenas, M. Emami Niri, S. Amini, R.A. Kolajoobi, A physically-supported data-driven proxy modeling based on machine learning classification methods: Application to water front movement prediction, J. Pet. Sci. Eng. 196 (2021) 107828, http://dx.doi.org/10.1016/j.petrol.2020.107828.
[18] M.S. Jamshidi Gohari, M. Emami Niri, J. Ghiasi-Freez, Improving permeability estimation of carbonate rocks using extracted pore network parameters: A gas field case study, Acta Geophys. 69 (2) (2021) 509–527, http://dx.doi.org/10.1007/s11600-021-00563-z.
[19] R.A. Kolajoobi, H. Haddadpour, M. Emami Niri, Investigating the capability of data-driven proxy models as solution for reservoir geological uncertainty quantification, J. Pet. Sci. Eng. 205 (2021) 108860, http://dx.doi.org/10.1016/j.petrol.2021.108860.
[20] Q. Gao, L. Wang, Y. Wang, C. Wang, Crushing analysis and multiobjective crashworthiness optimization of foam-filled ellipse tubes under oblique impact loading, Thin-Walled Struct. 100 (2016) 105–112, http://dx.doi.org/10.1016/j.tws.2015.11.020.
[21] O.E. Agwu, J.U. Akpabio, S.B. Alabi, A. Dosunmu, Artificial intelligence techniques and their applications in drilling fluid engineering: A review, J. Pet. Sci. Eng. 167 (2018) 300–315, http://dx.doi.org/10.1016/j.petrol.2018.04.019.
[22] A. Gowida, S. Elkatatny, A. Abdulraheem, Application of artificial neural network to predict formation bulk density while drilling, Petrophysics-SPWLA J. Formation Eval. Reserv. Description 60 (05) (2019) 660–674, http://dx.doi.org/10.30632/PJV60N5-2019a9.
[23] E.A. Løken, J. Løkkevik, D. Sui, Data-driven approaches tests on a laboratory drilling system, J. Petrol. Explor. Product. Technol. 10 (7) (2020) 3043–3055, http://dx.doi.org/10.1007/s13202-020-00870-z.
[24] A. Alsaihati, S. Elkatatny, A.A. Mahmoud, A. Abdulraheem, Use of machine learning and data analytics to detect downhole abnormalities while drilling horizontal wells, with real case study, J. Energy Resources Technol. 143 (4) (2021) http://dx.doi.org/10.1115/1.4048070.
[25] K. Amar, A. Ibrahim, Rate of penetration prediction and optimization using advances in artificial neural networks, a comparative study, in: Proceedings of the 4th International Joint Conference on Computational Intelligence, Barcelona, Spain, 2012, pp. 5–7.
[26] M.M. Amer, A.S. Dahab, A.A.H. El-Sayed, An ROP predictive model in Nile Delta area using artificial neural networks, in: SPE Kingdom of Saudi Arabia Annual Technical Symposium and Exhibition, OnePetro, 2017, http://dx.doi.org/10.2118/187969-MS.
[27] S. Eskandarian, P. Bahrami, P. Kazemi, A comprehensive data mining approach to estimate the rate of penetration: Application of neural network, rule based models and feature ranking, J. Petrol. Sci. Eng. 156 (2017) 605–615, http://dx.doi.org/10.1016/j.petrol.2017.06.039.
[28] A.K. Abbas, S. Rushdi, M. Alsaba, M.F. Al Dushaishi, Drilling rate of penetration prediction of high-angled wells using artificial neural networks, J. Energy Resources Technol. 141 (11) (2019) http://dx.doi.org/10.1115/1.4043699.
[29] A. Ahmed, A. Ali, S. Elkatatny, A. Abdulraheem, New artificial neural networks model for predicting rate of penetration in deep shale formation, Sustainability 11 (22) (2019) 6527, http://dx.doi.org/10.3390/su11226527.
[30] S.B. Ashrafi, M. Anemangely, M. Sabah, M.J. Ameri, Application of hybrid artificial neural networks for predicting rate of penetration (ROP): A case study from Marun oil field, J. Pet. Sci. Eng. 175 (2019) 604–623, http://dx.doi.org/10.1016/j.petrol.2018.12.013.
[31] F. Hadi, H. Altaie, E. AlKamil, Modeling rate of penetration using artificial intelligent system and multiple regression analysis, in: Abu Dhabi International Petroleum Exhibition & Conference, OnePetro, 2019, http://dx.doi.org/10.2118/197663-MS.
[32] A. Al-AbdulJabbar, S. Elkatatny, A. Abdulhamid Mahmoud, T. Moussa, D. Al-Shehri, M. Abughaban, A. Al-Yami, Prediction of the rate of penetration while drilling horizontal carbonate reservoirs using the self-adaptive artificial neural networks technique, Sustainability 12 (4) (2020) 1376, http://dx.doi.org/10.3390/su12041376.
[33] S. Elkatatny, Real-time prediction of rate of penetration in S-shape well profile using artificial intelligence models, Sensors 20 (12) (2020) 3506, http://dx.doi.org/10.3390/s20123506.
[34] C. Hegde, S. Wallace, K. Gray, Using trees, bagging, and random forests to predict rate of penetration during drilling, in: SPE Middle East Intelligent Oil and Gas Conference and Exhibition, OnePetro, 2015, http://dx.doi.org/10.2118/176792-MS.
[35] S. Chauhan, L. Vig, M. De Filippo De Grazia, M. Corbetta, S. Ahmad, M. Zorzi, A comparison of shallow and deep learning methods for predicting cognitive performance of stroke patients from MRI lesion images, Front. Neuroinform. 13 (53) (2019) http://dx.doi.org/10.3389/fninf.2019.00053.
[36] S. Harbola, V. Coors, One-dimensional convolutional neural network architectures for wind prediction, Energy Convers. Manage. 195 (2019) 70–75, http://dx.doi.org/10.1016/j.enconman.2019.05.007.
[37] C.L. Yang, Z.X. Chen, C.Y. Yang, Sensor classification using convolutional neural network by encoding multivariate time series as two-dimensional colored images, Sensors 20 (1) (2019) 168, http://dx.doi.org/10.3390/s20010168.
[38] M. Matinkia, A. Sheykhinasab, S. Shojaei, A. Vojdani Tazeh Kand, A. Elmi, M. Bajolvand, M. Mehrad, Developing a new model for drilling rate of penetration prediction using convolutional neural network, Arab. J. Sci. Eng. (2022) 1–33, http://dx.doi.org/10.1007/s13369-022-06765-x.
[39] M. Masroor, M. Emami Niri, A.H. Rajabi-Ghozloo, M.H. Sharifinasab, M. Sajjadi, Application of machine and deep learning techniques to estimate NMR-derived permeability from conventional well logs and artificial 2D feature maps, J. Petrol. Explor. Product. Technol. (2022) 1–17, http://dx.doi.org/10.1007/s13202-022-01492-3.
[40] M. Masroor, M.E. Niri, M.H. Sharifinasab, A multiple-input deep residual convolutional neural network for reservoir permeability prediction, Geoenergy Sci. Eng. (2023) 211420, http://dx.doi.org/10.1016/j.geoen.2023.211420.
[41] C. Gallo, V. Capozzi, Feature selection with non linear PCA: A neural network approach, J. Appl. Math. Phys. 7 (10) (2019) 2537–2554, http://dx.doi.org/10.4236/jamp.2019.710173.
[42] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778, http://dx.doi.org/10.1109/CVPR.2016.90.
[43] I. Bratko, Machine learning in artificial intelligence, Artif. Intell. Eng. 8 (3) (1993) 159–164, http://dx.doi.org/10.1016/0954-1810(93)90002-W.
[44] M. Cardiff, P.K. Kitanidis, Fitting data under omnidirectional noise: A probabilistic method for inferring petrophysical and hydrologic relations, Math. Geosci. 42 (8) (2010) 877–909, http://dx.doi.org/10.1007/s11004-010-9301-x.
[45] L.P. Garcia, A.C. de Carvalho, A.C. Lorena, Effect of label noise in the complexity of classification problems, Neurocomputing 160 (2015) 108–119, http://dx.doi.org/10.1016/j.neucom.2014.10.085.
[46] A. Savitzky, M.J. Golay, Smoothing and differentiation of data by simplified least squares procedures, Anal. Chem. 36 (8) (1964) 1627–1639, http://dx.doi.org/10.1021/ac60214a047.
[47] D.P. Kingma, J. Ba, Adam: A method for stochastic optimization, arXiv preprint arXiv:1412.6980, 2014, http://dx.doi.org/10.48550/arXiv.1412.6980.
[48] G.D. Garson, Interpreting neural-network connection weights, AI Expert 6 (4) (1991) 46–51, https://dl.acm.org/doi/abs/10.5555/129449.129452.
[49] A.T. Goh, Back-propagation neural networks for modeling complex systems, Artif. Intell. Eng. 9 (3) (1995) 143–151, http://dx.doi.org/10.1016/0954-1810(94)00011-S.
[50] C.H. Su, C.H. Cheng, A hybrid fuzzy time series model based on ANFIS and integrated nonlinear feature selection method for forecasting stock, Neurocomputing 205 (2016) 264–273, http://dx.doi.org/10.1016/j.neucom.2016.03.068.
[51] R.A. Kolajoobi, H. Haddadpour, M.E. Niri, Investigating the capability of data-driven proxy models as solution for reservoir geological uncertainty quantification, J. Pet. Sci. Eng. 205 (2021) 108860, http://dx.doi.org/10.1016/j.petrol.2021.108860.
[52] E. Brenjkar, E.B. Delijani, Computational prediction of the drilling rate of penetration (ROP): A comparison of various machine learning approaches and traditional models, J. Pet. Sci. Eng. 210 (2022) 110033, http://dx.doi.org/10.1016/j.petrol.2021.110033.
[53] M. Foysal, F. Ahmed, N. Sultana, T.A. Rimi, M.H. Rifat, Convolutional neural network hyper-parameter optimization using particle swarm optimization, in: Emerging Technologies in Data Mining and Information Security, Springer, Singapore, 2021, pp. 363–373, http://dx.doi.org/10.1007/978-981-33-4367-2_35.
[54] M.D. Zoback, Reservoir Geomechanics, Cambridge University Press, 2010.
[55] R. Teale, The concept of specific energy in rock drilling, Int. J. Rock Mech. Min. Sci. Geomech. Abstracts 2 (1) (1965) 57–73, http://dx.doi.org/10.1016/0148-9062(65)90022-7.