From Machine Learning Tools to Mathematical Models: The Case of Disruption Prediction and Avoidance

Authors: S. Talebzadeh, A. Murari, P. Gaudio, M. Gelfusa, R. Moreno, M. Lungaroni, E. Peluso, J. Vega

MACHINE LEARNING AND DISRUPTIONS :
• Machine learning tools are used when a problem is so complex that it is difficult or impossible to develop models from first principles.
• Machine learning tools have traditionally been used for classification, regression, anomaly detection, etc.
• In fusion, one of the main applications is the prediction of disruptions.
• Disruptions are very complex phenomena that are difficult to model, yet it is imperative to avoid or mitigate them.

APODIS :
• Machine learning tools have proved to be very effective: APODIS is now deployed in the JET real-time system.
• Results for campaigns C28-C30 with the ILW, after training with the carbon wall:

                              Safe           False alarms   Unintentional disruptions   Missed alarms   TOTAL   Intentional disruptions
JET off-line classification   651            n/a            305                         n/a             956     35
APODIS prediction             645 (99.08%)   6 (0.92%)      300 (98.36%)                5 (1.64%)       956     n/a

• One of the reasons for the high success rate of APODIS is the use of Support Vector Machines (SVM) as classifiers.

SVM I :
• Disruption prediction is a typical classification problem.
• Various lines can be used to separate the two classes.
• The optimal separation plane is defined as the one with the largest margin with respect to the closest points, which are called Support Vectors.
• The remaining points are irrelevant for the classification.
• Mathematically, the objective is the minimization of a quadratic functional with linear constraints, solved with Lagrangian multipliers.
[Figure: two classes (Disruptive, Safe) separated by a plane, showing the Support Vectors and the margin m]

SVM II :
• In order to extend the concept to non-separable problems, the input space is transformed into a different space of higher dimensionality in which the samples are linearly separable.
• This can be achieved with the use of suitable kernels (linear, polynomial, radial basis functions, etc.).

SVM INTERPRETABILITY :
• SVMs are very powerful and their classification performance is typically better than that of other approaches: higher success rate, determinism, clear mathematical procedure, etc.
• Their main limitation is interpretability.
• The solution, the hypersurface separating the classes, is a sum of functions centered at the Support Vectors (K: kernel). In the standard SVM form:
  f(x) = Σ_i α_i y_i K(x_i, x) + b = 0
  where x_i are the Support Vectors, y_i their class labels, α_i the Lagrangian multipliers and b the bias.
• For problems of the complexity of disruption prediction on JET, the “model” identified by the SVM can consist of hundreds or even thousands of terms.

OUTLINE OF THE METHOD :
The technique consists of the following main steps:
1- Training the SVM for classification
2- Building an appropriate mesh on the domain
3- Determining a sufficient number of points on the hyper-surface identified by the SVM
4- Deploying symbolic regression to identify the equation of the hyper-surface from the points previously obtained

OUTLINE OF THE PROCEDURE: GENERATING SYNTHETIC DATA AND TRAINING SVM
Initial Defined Function : y = x × sin(x), 0 < x < 3.5
[Figure: the two classes of synthetic data on either side of the curve y = x × sin(x) in the (x, y) plane, annotated with the Offset and the Thickness of Data Bulk]
Train the SVM with these data.

ILLUSTRATION OF THE PROCEDURE: GENERATING GRIDS AND FINDING HYPER-PLANE
Initial Defined Function : y = x × sin(x), 0 < x < 3.5
• Generating grids with steps finer than the error bars
• Classifying the grid points
• Going through the grids and finding the changes in class
• Finding the hyper-plane points
• Using genetic programming to find the hyper-plane equation
[Figure: grid over the (x, y) plane with the hyper-plane points found where the class changes]

ILLUSTRATION OF AN EXAMPLE :
Initial Defined Function : y = sin(x1) + x2, -3 < x1 < 3, -2 < x2 < 2
y1 = y + (random value between 0 and L) + offset
y2 = y - (random value between 0 and L) - offset
where L is the data thickness and y1 and y2 are the values for the first and second class, respectively.
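The data generation and the grid scan described above can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: the values of `thickness`, `offset` and `step` are hypothetical, and `stand_in_classifier` is a placeholder for the class predicted by the trained SVM (no SVM is trained here). All function names are introduced for this sketch and do not come from the slides.

```python
import math
import random

def surface(x1, x2):
    """Initial defined function of the example: y = sin(x1) + x2."""
    return math.sin(x1) + x2

def make_two_class_data(n, thickness, offset, seed=0):
    """Draw n points per class on either side of the surface.

    Class +1: y1 = y + U(0, L) + offset
    Class -1: y2 = y - U(0, L) - offset   (L = data thickness)
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        for label in (+1, -1):
            x1 = rng.uniform(-3.0, 3.0)
            x2 = rng.uniform(-2.0, 2.0)
            shift = rng.uniform(0.0, thickness) + offset
            data.append(((x1, x2, surface(x1, x2) + label * shift), label))
    return data

def stand_in_classifier(x1, x2, y):
    """ASSUMPTION: placeholder for the class predicted by the trained SVM."""
    return 1 if y > surface(x1, x2) else -1

def linspace(lo, hi, n):
    """n evenly spaced values from lo to hi inclusive."""
    return [lo + (hi - lo) * i / (n - 1) for i in range(n)]

def boundary_points(classify, x1_vals, x2_vals, y_min, y_max, step):
    """Scan y on a fine grid at each (x1, x2) and record where the class flips."""
    points = []
    n_steps = int(round((y_max - y_min) / step))
    for x1 in x1_vals:
        for x2 in x2_vals:
            prev_y = y_min
            prev_class = classify(x1, x2, prev_y)
            for k in range(1, n_steps + 1):
                y = y_min + k * step
                current_class = classify(x1, x2, y)
                if current_class != prev_class:
                    # The boundary lies between the two grid nodes; take the midpoint.
                    points.append((x1, x2, 0.5 * (prev_y + y)))
                prev_y, prev_class = y, current_class
    return points

# Hypothetical settings, chosen only for illustration.
data = make_two_class_data(n=200, thickness=0.5, offset=0.1)
pts = boundary_points(stand_in_classifier,
                      linspace(-3.0, 3.0, 13), linspace(-2.0, 2.0, 9),
                      y_min=-4.0, y_max=4.0, step=0.01)
print(len(data), "training points,", len(pts), "boundary points")
```

The list `pts` plays the role of the hyper-plane points that are then passed to symbolic regression. With a real SVM, `classify` would wrap the model's prediction, and the grid step would be chosen finer than the error bars of the measurements.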
Obtained Function for Interpreting the Hyper-plane : y = 0.985 (sin(x1) + x2)

AN EXAMPLE : y = sin(x1) + x2
[Figure: three-dimensional plot of the example, with the following legend]
• Green triangles are points generated from the initial function.
• Cyan points belong to the first class.
• Magenta points belong to the second class.
• Blue triangles identify support vectors of the first class.
• Red triangles identify support vectors of the second class.
• The yellow surface is the hyper-surface obtained with symbolic regression (SR) via genetic programming (GP).

THREE DIMENSIONAL EXAMPLES :
Initial Defined Function : y = x1 + x2 - x1 × x2, -1 < x1 < 1, 1 < x2 < 2
Obtained Function for Interpreting the Hyper-plane : y = 1.011 (x1 + x2 - x1 × x2)

Initial Defined Function : y = exp((x1 × x2)^0.5), 0 < x1 < 1, 1 < x2 < 3
Obtained Function for Interpreting the Hyper-plane : y = 0.974 exp((x1 × x2)^0.5)

FOUR DIMENSIONAL EXAMPLES :
Initial Defined Function : y = x1 - x2 + x3, 1 < x1 < 2, 3 < x2 < 5, 0 < x3 < 1
Obtained Function for Interpreting the Hyper-plane : y = 1.002 (x1 - x2 + x3)

Initial Defined Function : y = x1 + sin(x2 × x3), 1 < x1 < 2, 3 < x2 < 5, 0 < x3 < 1
Obtained Function for Interpreting the Hyper-plane : y = 0.98 (x1 + sin(x2 × x3))

HIGH DIMENSIONAL NOISY EXAMPLE :
Initial Defined Function : y = sin(x1 + x2) - 0.5 x3 x4
Ranges of Variables : -1.5 < x1 < 1.5, -2 < x2 < 2, 0 < x3 < 2, 2 < x4 < 4
Classification Noise : about 4 %
SVM Accuracy on Train and Test Data : Train data: 96.1364 %, Test data: 95 %
Obtained Function for Interpreting the Hyper-plane : y = 0.9094 sin(x1 + x2) - 0.4547 x3 x4 - 0.1284
Obtained Function Accuracy on Train and Test Data : Train data: 96.0859 %, Test data: 95 %

REAL-WORLD HIGH DIMENSIONAL PROBLEM: DISRUPTIONS

Database Type                                 Number of Samples
Non-Disruptive Data                           200
Disruptive Data 256 ms Before Disruption      200
Disruptive Data 128 ms Before Disruption      200
Disruptive Data 64 ms Before Disruption       200

Variables (Units)                             Abbreviation
Plasma current (MA)                           Ip
Plasma internal inductance                    Ii
Plasma density (1e-19*m^-2)                   NElid
Derivative of diamagnetic energy (MW)         DWdiaDt
Total input power (MW)                        Ptot
Total radiated power (MW)                     Prad

REAL-WORLD HIGH DIMENSIONAL NOISY PROBLEM: (Selecting Data From the Signals)
• Non-Disruptive Signal: selecting non-disruptive points from the steady-state zone.
• Disruptive Signal: selecting disruptive points 64, 128, and 256 ms before disruption.
[Figures: normalized plasma current versus time (s) for a non-disruptive signal, with the steady-state zone marked, and for a disruptive signal, with the disruption time marked]

REAL-WORLD HIGH DIMENSIONAL PROBLEM: DISRUPTIONS
[Figure: total radiated power (MW) versus plasma density (1e-19*m^-2), data 64 ms before disruption. Red points: disruptive data. Blue points: non-disruptive data]

DISRUPTION CASE: (Obtained equation for hyper-plane, 64 ms before disruption)
Prad = 0.7769 Ip - 2.4562 Ii + 0.04145 NElid + 2.377 DWdiaDt + 0.3211 Ptot + 2.0226
Classification Accuracy: 94.697 %
Missed Alarms: 3.7879 %
False Alarms: 1.5152 %
[Figure: real radiated power (MW) versus calculated radiated power (MW). Red points: disruptive data. Blue points: non-disruptive data]

REAL-WORLD HIGH DIMENSIONAL PROBLEM: DISRUPTIONS
[Figure: total radiated power (MW) versus plasma density (1e-19*m^-2), data 128 ms before disruption. Red points: disruptive data. Blue points: non-disruptive data]

DISRUPTION CASE: (Obtained equation for hyper-plane, 128 ms before disruption)
Prad = 0.5514 Ip - 2.5868 Ii + 0.04753 NElid + 2.796 DWdiaDt + 0.2911 Ptot + 2.7346
Classification Accuracy: 93.1818 %
Missed Alarms: 5.0505 %
False Alarms: 1.7677 %
[Figure: real radiated power (MW) versus calculated radiated power (MW). Red points: disruptive data. Blue points: non-disruptive data]

REAL-WORLD HIGH DIMENSIONAL PROBLEM: DISRUPTIONS
[Figure: total radiated power (MW) versus plasma density (1e-19*m^-2), data 256 ms before disruption. Red points: disruptive data. Blue points: non-disruptive data]

DISRUPTION CASE: (Obtained equation for hyper-plane, 256 ms before disruption)
Prad = 0.6589 Ip - 2.2493 Ii + 0.09757 NElid + 2.0073 DWdiaDt + 0.2113 Ptot + 1.7985
Classification Accuracy: 85.7868 %
Missed Alarms: 12.4365 %
False Alarms: 1.7766 %
[Figure: real radiated power (MW) versus calculated radiated power (MW). Red points: disruptive data. Blue points: non-disruptive data]

CONCLUSIONS AND RECOMMENDATION :
• Based on systematic tests and real examples, the presented methodology shows a high capability of interpreting the SVM results.
• All the steps of the procedure can be checked.
• The presented technique is a completely new development, and more experience is needed before it can be applied to complex databases with any kind of data distribution.
• The approach has the potential to bridge the gap between theory and experiments and to help in the field of disruption avoidance.
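As an illustration of how one of the hyper-plane equations reported for the disruption database could be evaluated, the sketch below computes the radiated power on the surface for one sample and scores a set of predictions. Several points are assumptions: the slides do not state which side of the surface corresponds to disruptive data, so this is left as the parameter `disruptive_above`; the sample values are hypothetical; and the function names are introduced here. The reported accuracy, missed-alarm and false-alarm percentages add up to 100 % for each time window, which is consistent with all three being fractions of the total number of classified samples, so the sketch adopts that convention.

```python
# Coefficients of the hyper-plane reported for data 64 ms before disruption.
COEFFS_64MS = {
    "Ip": 0.7769,       # plasma current (MA)
    "Ii": -2.4562,      # plasma internal inductance
    "NElid": 0.04145,   # plasma density
    "DWdiaDt": 2.377,   # derivative of diamagnetic energy (MW)
    "Ptot": 0.3211,     # total input power (MW)
}
INTERCEPT_64MS = 2.0226

def prad_on_surface(sample, coeffs=COEFFS_64MS, intercept=INTERCEPT_64MS):
    """Radiated power (MW) that places the sample exactly on the hyper-plane."""
    return intercept + sum(coeffs[name] * sample[name] for name in coeffs)

def classify(sample, disruptive_above=True):
    """Label a sample by the side of the hyper-plane on which its Prad falls.

    ASSUMPTION: the slides do not say which side is disruptive, so the
    orientation is left to the caller through `disruptive_above`.
    """
    above = sample["Prad"] > prad_on_surface(sample)
    return "disruptive" if above == disruptive_above else "safe"

def rates(predicted, actual):
    """Accuracy, missed alarms and false alarms as percentages of all samples."""
    n = len(actual)
    pairs = list(zip(predicted, actual))
    missed = sum(p == "safe" and a == "disruptive" for p, a in pairs)
    false_alarms = sum(p == "disruptive" and a == "safe" for p, a in pairs)
    correct = n - missed - false_alarms
    return tuple(100.0 * count / n for count in (correct, missed, false_alarms))

# Hypothetical sample, for illustration only (not taken from the JET database).
sample = {"Ip": 2.0, "Ii": 1.0, "NElid": 10.0, "DWdiaDt": 0.0,
          "Ptot": 10.0, "Prad": 6.0}
print(prad_on_surface(sample), classify(sample))
```

The equations for 128 ms and 256 ms before disruption can be evaluated in the same way by passing their coefficients and intercept to `prad_on_surface`.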