Data Science & Machine Learning Applied to Petroleum Engineering: Stick & Slip Focus
Luis Enrique Navarro Morales

Why? Provide the driller with useful information ahead of time about drillstring vibration (Stick & Slip), improving the drilling process by using the available data from other wells and by applying techniques whose results improve as more data becomes available.

How? Using real-time data released by Equinor, and ML techniques that have been used in other fields and that any computer can run.

What? Avoid damage to the various tools used while drilling a wellbore.

The present work aims to be a thesis proposal in petroleum engineering, together with its syllabus:
Chapter 1. MWD
Chapter 2. Data science and Machine Learning
Chapter 3. Stick & Slip
Chapter 4. Case study and model construction

Artificial Intelligence (AI) aims to use data to impart human-like decision making to machines, mimicking human behavior. Machine Learning (ML) is a subset of AI techniques that uses statistical methods to enable machines to improve with experience, or to use data to make optimized inferences and predictions. Deep Learning is a subset of ML dedicated to filtering inputs through layers (neural networks) in order to learn how to predict or classify.

In order to build reliable Machine Learning models, a series of disciplines must be understood and well managed.
- Data: factual information, a measurement used as a basis for reasoning, discussion or calculation. In the drilling field, data is taken through sensors in the drillstring, the rotary table and the tools that form the drilling system (hoisting, hydraulics, etc.).
- Features: a prominent part or characteristic of something. In the case at hand, the features are the drilling mechanics of the wellbore.
- Algorithms: the procedures to solve a problem or accomplish some end.
Data mining and data science (respectively, gathering the data from different sources and understanding that data) are the building blocks for ML algorithms; the programming relies on a language.

Two branches can be defined in the ML repertoire:
- Supervised: these algorithms need solutions or labels, which the model has to predict. The task is either classification, which predicts a categorical value, or regression, which predicts numbers.
- Unsupervised: no explicit solutions are given to the model; it needs to analyze the data and find patterns.

Stick & Slip

In the drilling process, the bit, the drillstring and the wellbore may interact in a way that creates unwanted vibrations:
- Bit bounce: motion that causes the bit to repeatedly lift off and impact the formation, via WOB fluctuations.
- Bending: lateral motion that causes the drillstring to strike the wellbore wall.
- Bit whirl: eccentric rotation of the bit, deviated from its geometric center.
- Stick & Slip: non-uniform rotation between surface and bit.

Stick & Slip (S&S) is a result of wellbore friction and the wrong combination of top drive speed (RPM) and surface weight on bit (SWOB), which makes the bit stick suddenly and then accelerate while it slips, hence Stick & Slip. This can cause operational problems (low ROP, stuck drillstring, etc.) and damage the components of the BHA. The sudden acceleration may damage the bit, overtorque connections, interfere with mud telemetry, etc. When the rotary torque applied to the bit is insufficient for its rotation, a momentary stick occurs, followed by a release and an acceleration that might be beyond the material limit.
"STICK_RT" is obtained by subtracting the minimum downhole RPM from the maximum downhole RPM. This is done by the MWD tool of the BHA, which has measuring gauges in its collar. The S&S severity percentage is a measure of how much faster the BHA is rotating compared with the surface RPM; if the rotational speed exceeds the limit of the tool, damage occurs.

Stable drilling

The S&S solution relies on the interaction between WOB and RPM. Field solutions include decreasing weight on bit or increasing RPM. Mitigation procedures have been designed by companies like Schlumberger, and tools such as the OMNI Roller Reamer have proven useful to mitigate S&S, with their own problems, such as cutting displacements.

The measurements shown here are from the MWD tool, where the S&S is the difference, or the space, between the green line (max RPM) and the blue line (min RPM); the tool reports only that difference as Stick & Slip Real Time. Other types of drillstring vibration, such as shocks, are also shown. As seen from the S&S track, continuous medium values caused the TD to be modified, and several solutions were applied to decrease the S&S:
- SWOB + RPM

Data gathering

In 2018, the company Equinor released a large number of datasets regarding the development of the Volve field. The data needed to be analyzed and cleaned in order to be used in the construction of a model.

The present work is coded in Python. This high-level programming language is very useful for data science techniques and Machine Learning implementations, and it has a lot of community support for improving the libraries, which are the tools used to interpret and decode files.

The first step towards selecting a drilling section on which to implement a model is to decode the WITSML (Wellsite Information Transfer Standard Markup Language) files, which carry the well-site data from the rig to the different stakeholders in the oil & gas industry.
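The STICK_RT definition given earlier in this section (maximum minus minimum downhole RPM) can be sketched in a few lines. The severity normalization below, the RPM swing divided by the surface RPM, is an assumption made for illustration, since the text does not state the exact formula; the function names and readings are hypothetical.

```python
def stick_rt(rpm_max, rpm_min):
    """Stick & Slip Real Time: maximum minus minimum downhole RPM."""
    if rpm_min > rpm_max:
        raise ValueError("rpm_min cannot exceed rpm_max")
    return rpm_max - rpm_min


def severity_pct(rpm_max, rpm_min, surface_rpm):
    """ASSUMPTION: severity as the RPM swing relative to surface RPM, in percent."""
    if surface_rpm <= 0:
        raise ValueError("surface_rpm must be positive")
    return 100.0 * stick_rt(rpm_max, rpm_min) / surface_rpm


# Hypothetical (max, min, surface) RPM readings, not taken from the Volve data.
readings = [(130.0, 110.0, 120.0), (200.0, 40.0, 120.0), (240.0, 0.0, 120.0)]
for rpm_max, rpm_min, surface in readings:
    print(stick_rt(rpm_max, rpm_min), round(severity_pct(rpm_max, rpm_min, surface), 1))
```

A minimum RPM of zero corresponds to the bit fully sticking, which is why the last reading gives the largest swing.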
The trajectories of a given well are transmitted using this scheme in each survey of the drilling process. The Python library BeautifulSoup lets us decode these XML files; an algorithm was developed to create a dataframe (data structure) containing the survey data, so that the trajectories could be plotted using Python plotting libraries.

Each survey is taken with its corresponding section. Plotting each section in a deviation plot shows the extent of each section and its deviation; the image shows the drilling sections mentioned in the End of Well Report provided in the documents. As seen from the deviations, the sections in green and blue, corresponding to the 12 ¼" and 8 ½" hole sizes respectively, are the ones that show the most deviation from the 45° line.

To analyze the whole trajectory, we need to use the XML nodes for the NorthSouth and EastWest coordinates and plot them using specialized plotting libraries such as plotly, which gives us the ability to create 3D and dynamic plots. It must be taken into account that the actual wellbore trajectory may differ from the planned trajectory due to rock/bit interaction, BHA vibrations and other mechanisms.

The drilling mechanics of each section are contained in the LAS files, corresponding to MWD data, as shown below. Basically, there are two types of LAS (well log) files: those indexed by time and those indexed by depth. Because the purpose of the Machine Learning model is to predict ahead of time, the time-indexed (DateIndexed) scheme is used, which raises the problem of limited support from well-known libraries. To solve this issue, an algorithm was developed to construct dataframes from this type of file; these data structures contain the drilling mechanics, that is, data obtained from the BHA memory and surface gauges.
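The algorithm developed for time-indexed LAS files is not shown in the source. As a rough sketch of the idea, the reader below handles a small unwrapped LAS 2.0 text using only the standard library: it takes curve mnemonics from the ~Curve section, the null value from the ~Well section, and data rows from the ~ASCII section. The sample file, its units, and the numeric elapsed-seconds index are assumptions; real time-indexed files may store timestamps in another form.

```python
def parse_las(text):
    """Return (curve_names, rows) from the text of an unwrapped LAS 2.0 file."""
    section, curves, rows, null = None, [], [], None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        if line.startswith("~"):
            section = line[1:2].upper()  # V, W, C, P, O or A
            continue
        if section == "W" and line.upper().startswith("NULL"):
            # "NULL.  -999.25 : NULL VALUE" -> -999.25
            null = float(line.split(".", 1)[1].split(":", 1)[0].strip())
        elif section == "C":
            curves.append(line.split(".", 1)[0].strip())  # mnemonic before the dot
        elif section == "A":
            values = [float(v) for v in line.split()]
            rows.append({c: (None if v == null else v) for c, v in zip(curves, values)})
    return curves, rows


# Hypothetical sample; mnemonics follow the tracks named in this work.
SAMPLE = """~Version Information
VERS.   2.0 : CWLS LOG ASCII STANDARD - VERSION 2.0
WRAP.   NO  : ONE LINE PER TIME STEP
~Well Information
NULL.   -999.25 : NULL VALUE
~Curve Information
TIME.S       : Elapsed time (assumed numeric index)
SWOB.KKGF    : Surface weight on bit
RPM.RPM      : Surface rotational speed
STICK_RT.RPM : Stick & Slip real time
~ASCII
0.0   10.5  120.0  35.0
5.0   11.0  121.0  -999.25
10.0  11.2  119.0  80.0
"""

curves, rows = parse_las(SAMPLE)
print(curves)
print(rows[1])
```

Each row is a dictionary keyed by mnemonic, so the list can be handed to a dataframe constructor; null readings become None rather than being kept as -999.25.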
The logging capabilities may differ from log to log depending on the tool used in the BHA. These tools are mentioned in the End of Well Report, along with averages obtained from the files, on the understanding that each file stores a different run of the drilling process.

Available runs

The data obtained from these files can be plotted in continuous form, the way logs are presented. The most common tracks are those measured at surface:
- Block Position (BPOS): the position of the block in the hoisting system, showing the ups and downs of the drilling process and the final pull out of hole (POOH).
- Block Velocity (BVEL): also related to the hoisting system; the velocity of the block during its travel.
- Hook Load (HKLD): the measurement of the weight of the drillstring.
- Depth (DEPT): the measured depth of the bit, which does not necessarily indicate that drilling is taking place.
- Torque (TQA): the rotational force between the drillstring and the formation, given by the stiffness of the drillstring, the top drive or the mud motor.
- Surface Weight on Bit (SWOB): the force exerted on the bit by the drillstring and its weight; the measurement is taken at surface and correlated with the hook load.
- Rotations Per Minute (RPM): the rotational speed of the top drive, which is transmitted at surface to the drillstring.
- Rate of Penetration (ROP): the drilling speed, that is, how much depth is gained per hour.

Drilling remarks from the F-14 wellbore

Run: 3
Data science analysis: pairplot
- Pairplot: a tool that plots each variable against every other and shows the histogram of each.
- Pearson's coefficient between each pair of drilling parameters.
- A straight line indicates a direct correlation between annulus pressure and depth.
- As formations are drilled, a recognizable pattern emerges from the gamma ray log.
- As depth is gained, the temperature rises with a constant slope; during POOH the temperature drops rapidly.
- While drilling, pipe must be connected; this process is seen in the load.
- The WOB applied to the drilling sections must remain in a defined range; TIH and POOH might deviate from this trend.
- S&S is present in the drilling process, mostly at the top and bottom.
- S&S is constant and more frequent with low ROP.
- ROP decreases due to a change in RPM; this might have been caused by a remedial action for S&S.
- Circulating bottoms up or cleaning the hole.
- POOH process.

Run: 4
- RIH process.
- The top of cement is located and then drilled, making the S&S measurement behave abnormally due to the casing shoe.

Run: 5
Full pairplot analysis
- The gamma ray response is well defined in the shallow section, while in the deeper section it ranges considerably. *Mud type()
- As depth is gained, the amount of weight supported increases, and the trend is lost in between.
- The wide response should be modified once the drilling process is selected.
- The RPM range needed to drill remains constant across the sections.
- The major S&S occurrences happen in the 12 1/4" section, but its length is relatively short compared with the other sections.
- Lower hook load equals more S&S in the shallowest section; can the same be said for the deeper sections?
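The pairplot remarks above lean on Pearson's coefficient, for example the straight-line relation between annulus pressure and depth. As a small illustration, the coefficient can be computed directly from its definition; the depth and pressure values below are hypothetical and chosen to be exactly linear.

```python
import math


def pearson(x, y):
    """Pearson's r: covariance of x and y divided by the product of their spreads."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("r is undefined for a constant series")
    return cov / (spread_x * spread_y)


# Hypothetical samples: pressure grows linearly with depth.
depth = [1000.0, 1100.0, 1200.0, 1300.0, 1400.0]
annulus_pressure = [150.0, 165.0, 180.0, 195.0, 210.0]
print(round(pearson(depth, annulus_pressure), 6))
```

Values near +1 or -1 correspond to the straight lines seen in the pairplot, while values near 0 correspond to scattered clouds.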
As seen above, the statistical properties of the S&S value differ mostly in the number of outliers, which is caused by the quartiles of each section:
- 17.5: most of the S&S values are below 50
- 12.25: the range widens and ends up at 100
- 8.5: the range stays almost the same, but the maximum value decreases

The maximum value of the S&S measure is reached in the three sections.

Once the analysis was done, the section selected for applying the ML algorithm to predict S&S severity was the 12 1/4 inch section.

- Start drilling depth
- Continuous up/down movement of the top drive and block, gradual increase of RPM until the appropriate value is reached; the mud pumps are stable and the joint for the next sections of drillstring is recognizable; the formation drilling process begins
- TOC recognition and cement drilling

As seen in the previous slides, the dataframe contains the whole run of the BHA, including cleaning processes, trips and other non-drilling procedures. The next step was to remove these data points and keep only the drilling process. An algorithm was developed to achieve this goal, applying the following concepts:
- HDTV: hole depth
- BONB: bit on bottom
- Flag: drilling or not

The Schlumberger Drilling Reference proposes the following classification for S&S occurrences. A categorical value is added to the data structure in order to train the ML model to predict this categorical severity; an algorithm was developed for this process.

Categorical severity: 0, 1, 2, 3

Once the dataframe has been through all of these steps, the S&S occurrences are easier to see.

Machine Learning model construction

As we are using a supervised ML algorithm, the prediction target must be filled in, and its predictors as well.
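The two cleaning algorithms described above (the drilling flag built from hole depth and bit on bottom, and the categorical severity 0 to 3) are not shown in the source. The sketch below is one possible reading: hole depth is tracked as a running maximum of bit depth, a sample counts as drilling when the bit is on bottom and new hole was made, and severity is binned by thresholds. The tolerance, the starting hole depth and the threshold values are placeholders, not the Schlumberger Drilling Reference values.

```python
from bisect import bisect_right


def flag_drilling(bit_depths, start_hole_depth, tol=0.05):
    """Label each sample with hole depth, bit-on-bottom and a drilling flag."""
    rows = []
    hole_depth = start_hole_depth
    for depth in bit_depths:
        made_hole = depth > hole_depth             # the bit went deeper than ever before
        hole_depth = max(hole_depth, depth)        # hole depth as a running maximum
        on_bottom = (hole_depth - depth) <= tol    # bit on bottom within tolerance
        rows.append({"hole_depth": hole_depth,
                     "on_bottom": on_bottom,
                     "drilling": on_bottom and made_hole})
    return rows


def severity_class(severity, thresholds=(40.0, 80.0, 100.0)):
    """Map a severity value to a class 0..3. ASSUMPTION: placeholder thresholds."""
    return bisect_right(thresholds, severity)


# Hypothetical run: trip in, tag bottom, drill, pick up, go back down, drill again.
bit_depths = [950.0, 1000.0, 1000.5, 1001.0, 990.0, 1001.0, 1001.4]
flags = [row["drilling"] for row in flag_drilling(bit_depths, start_hole_depth=1000.0)]
print(flags)
print([severity_class(s) for s in (10.0, 40.0, 85.0, 150.0)])
```

Keeping on_bottom separate from the drilling flag distinguishes samples where the bit merely rests on bottom, such as after a connection, from samples where depth is actually gained.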
One option would be deleting the rows that contain any missing value; the second would be applying an ML algorithm known as "KNN" to impute the missing values. This method will also be applied later.

To avoid collinearity, parameters that provide the same information are eliminated. The Pearson correlation factor must be contained in this interval.

To train the ML model, splitting the original dataframe is required:
- Training set: the set of examples used for learning, fitting the parameters of the predictors.
- Testing set: the subset of the dataframe dedicated to assessing the performance of the ML model.

Decision Tree Classifier

The tree is constructed by asking a series of questions (decision nodes) about the dataset at hand; each time an answer is received, a follow-up question is asked until a conclusion about the class label (leaf node) is reached. Several algorithms exist to create the trees, CART (Classification And Regression Trees) being the most used. These trees are the building block for ensemble or bagging algorithms, among which the Random Forest algorithm offers good results.

- random_state: value that controls the randomness of the algorithm, for reproducible results
- max_leaf_nodes: maximum number of leaf nodes in the tree

Annotations on the decision tree figure (depth levels labeled: 0, 1, 3, 4, 5):
- First question asked (depth 0)
- How pure is the node? A value > 0 means the samples it contains belong to different classes
- Number of samples in the node; as it is the root node, the whole training set
- How many samples belong to each class
- The prediction a given node will make

Based on the ML model, scores can be assigned to input features (predictors) according to how useful they are at predicting a target variable. As seen in the image below, predicting the S&S with only 2 predictors (SWOB & RPM) gives an accuracy of 83%, and visualizing the results shows that the severe cases of S&S are not predicted.
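The purity value annotated on the tree nodes can be illustrated with the Gini impurity, assuming that is the criterion shown in the figure (it is the usual choice for CART classifiers). The sketch below computes the impurity of a node and searches a single feature for the threshold with the lowest weighted impurity; the RPM values and labels are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0 for a pure node, above 0 when classes are mixed."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_gini(values, labels, threshold):
    """Weighted impurity of the two children of the question 'value <= threshold?'."""
    left = [lab for val, lab in zip(values, labels) if val <= threshold]
    right = [lab for val, lab in zip(values, labels) if val > threshold]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def best_threshold(values, labels):
    """Try the midpoints between sorted unique values and keep the purest split."""
    unique = sorted(set(values))
    midpoints = [(a + b) / 2 for a, b in zip(unique, unique[1:])]
    return min(midpoints, key=lambda t: split_gini(values, labels, t))


# Hypothetical node: low RPM samples show S&S (1), high RPM samples do not (0).
rpm = [60.0, 80.0, 100.0, 120.0, 140.0, 160.0]
has_stick_slip = [1, 1, 1, 0, 0, 0]
print(gini(has_stick_slip), best_threshold(rpm, has_stick_slip))
```

A full tree repeats this search over every feature at every node until a stopping rule, such as a maximum number of leaf nodes, is reached.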
Classification metrics:
- True Positive Rate (TPR): also known as recall or sensitivity; the probability that an actual positive is classified correctly.
- False Positive Rate (FPR): the probability that an actual negative is misclassified as positive.
- ROC (Receiver Operating Characteristic): describes the trade-off between TPR and FPR across different probability thresholds of the classifier.

From the image, we can see that the model achieves high recall values with few misclassified values (FPR), giving an area under the curve (AUC) of 0.913, which is a very good value taking into account that only two predictors were given.

Random Forest Classifier

The prediction of this algorithm is obtained by a majority vote over the predictions of individual decision trees; for a regression problem, averages are calculated instead.

- Number of cores used for training; with "-1" all available cores are used
- Out-of-bag error, a validation of a RF model
- Number of trees in the forest

The main contributors to severe S&S are RPM and SWOB; the Schlumberger solution algorithm relies heavily on modifying these parameters to avoid further S&S severity. The model obtained relies mostly on TurbineRPM and Torque, then Total Flow and then SWOB. The severity prediction in the image above shows the 96% accuracy of the model. For an ideal classifier, the AUC is the area of a rectangle with unit value.

Once the model is constructed and analyzed entirely, its predictions can be expected to hold for the section analyzed.

In the previous pages, I have shown the potential benefits that Machine Learning algorithms can bring to drilling engineering. It is important to note that these models will become more applicable as the dataframe grows; further feature engineering will be needed for the model to keep learning to predict the S&S severity, but within the scope of this work only one section is used and analyzed.
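As a supplementary illustration of the classification metrics defined in this section, the sketch below computes TPR and FPR at a given threshold and approximates the AUC with the trapezoidal rule. It assumes a binary problem (one severity class against the rest) with hypothetical scores; it is not the evaluation code used for the reported results.

```python
def tpr_fpr(y_true, y_score, threshold):
    """TPR and FPR when scores >= threshold are predicted positive."""
    pairs = list(zip(y_true, y_score))
    tp = sum(1 for t, s in pairs if t == 1 and s >= threshold)
    fn = sum(1 for t, s in pairs if t == 1 and s < threshold)
    fp = sum(1 for t, s in pairs if t == 0 and s >= threshold)
    tn = sum(1 for t, s in pairs if t == 0 and s < threshold)
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("both classes must be present in y_true")
    return tp / (tp + fn), fp / (fp + tn)


def roc_auc(y_true, y_score):
    """Area under the ROC curve, sweeping the threshold from high to low."""
    points = [(0.0, 0.0)]  # (FPR, TPR) with a threshold above every score
    for threshold in sorted(set(y_score), reverse=True):
        tpr, fpr = tpr_fpr(y_true, y_score, threshold)
        points.append((fpr, tpr))
    # Trapezoidal rule over consecutive ROC points.
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical labels (1 = severe S&S) and predicted probabilities.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
print(tpr_fpr(y_true, y_score, 0.4), roc_auc(y_true, y_score))
```

When every positive scores above every negative, the curve passes through the top-left corner and the area equals 1, which is the unit rectangle mentioned for the ideal classifier.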