
From Machine Learning Tools to Mathematical Models:
The Case of Disruption Prediction and Avoidance

Authors: S. Talebzadeh, A. Murari, P. Gaudio, M. Gelfusa, R. Moreno, M. Lungaroni, E. Peluso, J. Vega
MACHINE LEARNING AND DISRUPTIONS:
• Machine learning tools are used when problems are so complex that it is difficult or impossible to develop models from first principles.
• Machine learning tools have traditionally been used to perform classification, regression, anomaly detection, etc.
• In fusion, one of the main applications is the prediction of disruptions.
• Disruptions are very complex phenomena that are difficult to model, but at the same time it is imperative to avoid or mitigate them.
APODIS:
• Machine learning tools have proved to be very effective: APODIS is now deployed in the JET real-time system.
• Results for campaigns C28-C30 with the ITER-Like Wall (ILW), after training with the carbon wall:

                              | Safe         | False alarms | Unintentional disruptions | Missed alarms | TOTAL | Intentional disruptions
  JET off-line classification | 651          | n/a          | 305                       | n/a           | 956   | 35
  APODIS prediction           | 645 (99.08%) | 6 (0.92%)    | 300 (98.36%)              | 5 (1.64%)     | 956   | n/a
• One of the reasons for the high rate of success of APODIS is the use of Support Vector Machines (SVM) as classifiers.
SVM I:
• Disruption prediction is a typical classification problem.
• Various lines can be used to separate the two classes.
• The optimal separating plane is defined as the one with the largest margin with respect to the closest points, which are called Support Vectors.
• The remaining points are irrelevant for the classification.
• Mathematically, the objective is the minimization of a quadratic functional with linear constraints, solved with Lagrangian multipliers (see the formulation below).
[Figure: "Safe" and "Disruptive" classes separated by two candidate lines a and b; the Support Vectors define the margin m of the optimal separation.]
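For reference, a minimal sketch of the standard hard-margin formulation the last bullet refers to (textbook notation, not reproduced from the poster):

```latex
\begin{align*}
% Primal problem: the margin is 2/||w||, so maximizing it means
&\min_{w,\,b}\ \tfrac{1}{2}\lVert w\rVert^{2}
  \quad \text{subject to} \quad y_i\,(w^{\top}x_i + b) \ge 1 ,\ \ i = 1,\dots,N ,
\\[4pt]
% Introducing Lagrange multipliers \alpha_i \ge 0 leads to the dual problem:
&\max_{\alpha}\ \sum_{i=1}^{N}\alpha_i
  - \tfrac{1}{2}\sum_{i,j}\alpha_i \alpha_j\, y_i y_j\, x_i^{\top} x_j
  \quad \text{subject to} \quad \alpha_i \ge 0 ,\ \ \sum_{i=1}^{N}\alpha_i y_i = 0 .
\end{align*}
```

Only the points with α_i > 0 (the Support Vectors) enter the solution; all the other samples can be discarded without changing the classifier.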
SVM II:
• In order to extend the concept to non-separable problems, the input space is transformed into a different space of higher dimensionality in which the samples are linearly separable.
• This can be achieved with the use of suitable kernels (linear, polynomial, Radial Basis Functions, etc.); their standard forms are recalled below.
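The standard definitions of the kernels mentioned above (c, d and γ are hyperparameters chosen by the user; these expressions are textbook forms, not taken from the poster):

```latex
\begin{align*}
K_{\mathrm{lin}}(x, x')  &= x^{\top} x' ,                                        % linear
\\
K_{\mathrm{poly}}(x, x') &= \left(x^{\top} x' + c\right)^{d} ,                    % polynomial of degree d
\\
K_{\mathrm{RBF}}(x, x')  &= \exp\!\left(-\gamma \lVert x - x' \rVert^{2}\right) . % Radial Basis Function
\end{align*}
```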
SVM INTERPRETABILITY:
• SVMs are very powerful and their classification performance is typically better than that of other approaches: higher success rate, determinism, a clear mathematical procedure, etc.
• Their main limitation is interpretability.
• The solution, the hypersurface separating the classes, is a sum of functions centered at the Support Vectors (K: kernel), as written below.
• For problems of the complexity of disruption prediction on JET, the "model" identified by the SVM can consist of hundreds or even thousands of terms.
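In textbook notation, the decision function the bullet above refers to is

```latex
\[
f(x) \;=\; \sum_{i \in \mathrm{SV}} \alpha_i\, y_i\, K(x_i, x) \;+\; b ,
\qquad \text{class}(x) = \operatorname{sign} f(x) ,
\]
```

so with a non-linear kernel each Support Vector contributes one term centered at x_i, and the number of terms in the "model" grows with the number of Support Vectors.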
OUTLINE OF THE METHOD:
The technique consists of the following main steps:
1- Training the SVM for classification.
2- Building an appropriate mesh on the domain.
3- Determining a sufficient number of points on the hyper-surface identified by the SVM.
4- Deploying symbolic regression to identify the equation of the hypersurface from the points previously obtained.
OUTLINE OF THE PROCEDURE: GENERATING SYNTHETIC DATA AND TRAINING THE SVM
Initial defined function:   y = x × sin(x),   0 < x < 3.5
Two classes are generated around this curve by shifting the points by an offset and giving the data bulk a finite thickness; the SVM is then trained with these data.
[Figure: the curve y = x × sin(x) with the two synthetic classes in the (x, y) plane, indicating the offset and the thickness of the data bulk.]
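A minimal sketch of this step, assuming scikit-learn's SVC with an RBF kernel as the classifier; the offset and thickness values are illustrative, not the ones used by the authors:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def make_classes(n=500, offset=0.3, thickness=0.5):
    """Generate two classes above and below y = x*sin(x) on 0 < x < 3.5."""
    x = rng.uniform(0.0, 3.5, size=2 * n)
    y_curve = x * np.sin(x)
    labels = np.r_[np.ones(n), -np.ones(n)]                # +1: upper class, -1: lower class
    spread = offset + rng.uniform(0.0, thickness, 2 * n)   # offset plus thickness of the bulk
    y = y_curve + labels * spread                          # shift each class away from the curve
    return np.column_stack([x, y]), labels

X, t = make_classes()
svm = SVC(kernel="rbf", C=10.0, gamma="scale").fit(X, t)   # step 1: train the SVM
print("training accuracy:", svm.score(X, t))
```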
ILLUSTRATION OF THE PROCEDURE: GENERATING GRIDS AND FINDING THE HYPER-PLANE
Initial defined function:   y = x × sin(x),   0 < x < 3.5
• Generating a grid with steps finer than the error bars.
• Classifying the grid points with the trained SVM.
• Going through the grid and finding the changes of class.
• Finding the hyper-plane points.
• Using Genetic Programming to find the hyper-plane equation.
[Figure: grid points in the (x, y) plane classified by the SVM; the points where the class changes identify the hyper-plane.]
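A minimal sketch of these grid and symbolic-regression steps, continuing from the previous code (it reuses the trained `svm`); gplearn's SymbolicRegressor is used here only as a stand-in for the authors' Genetic Programming code, and all settings are illustrative:

```python
import numpy as np
from gplearn.genetic import SymbolicRegressor  # assumed stand-in for the authors' GP tool

# Step 2: build a grid with steps finer than the spread of the training data.
xs = np.linspace(0.0, 3.5, 200)
ys = np.linspace(-3.5, 3.5, 200)
XX, YY = np.meshgrid(xs, ys)
grid = np.column_stack([XX.ravel(), YY.ravel()])

# Step 3: classify the grid and locate the points where the predicted class flips
# along each x-column; the midpoints of the flips sample the hyper-plane.
labels = svm.predict(grid).reshape(XX.shape)   # svm trained in the previous sketch
flips = np.diff(labels, axis=0) != 0
rows, cols = np.nonzero(flips)
x_b = xs[cols]
y_b = 0.5 * (ys[rows] + ys[rows + 1])

# Step 4: symbolic regression on the boundary points to recover y(x).
sr = SymbolicRegressor(population_size=2000, generations=20,
                       function_set=('add', 'sub', 'mul', 'sin'),
                       random_state=0)
sr.fit(x_b.reshape(-1, 1), y_b)
print(sr._program)                              # expected to approximate x*sin(x)
```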
ILLUSTRATION OF AN EXAMPLE:
Initial defined function:   y = sin(x1) + x2,   -3 < x1 < 3,   -2 < x2 < 2
The two classes are generated as
  y1 = y + (random data between 0 and L) + offset
  y2 = y - (random data between 0 and L) - offset
where L is the data thickness and y1 and y2 are the values for the first and second class, respectively.
Obtained function for interpreting the hyper-plane:   y = 0.985 ( sin(x1) + x2 )
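A short sketch of this class-generation recipe for the sin(x1) + x2 case (the same recipe scales to the higher-dimensional examples that follow); L and the offset are illustrative values:

```python
import numpy as np

rng = np.random.default_rng(1)

def make_classes_2d(n=1000, L=0.5, offset=0.3):
    """Two classes above and below the surface y = sin(x1) + x2."""
    x1 = rng.uniform(-3.0, 3.0, size=2 * n)
    x2 = rng.uniform(-2.0, 2.0, size=2 * n)
    y = np.sin(x1) + x2
    labels = np.r_[np.ones(n), -np.ones(n)]              # first class: +1, second class: -1
    y_cls = y + labels * (offset + rng.uniform(0.0, L, 2 * n))
    return np.column_stack([x1, x2, y_cls]), labels      # SVM input space is (x1, x2, y)
```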
AN EXAMPLE:
y = sin(x1) + x2
[Figure: green triangles are points generated from the initial function; cyan and magenta points belong to the first and second class, respectively; blue and red triangles identify the Support Vectors of the two classes; the yellow surface is the hyper-surface obtained with symbolic regression via Genetic Programming.]
THREE-DIMENSIONAL EXAMPLES:
Initial defined function:   y = x1 + x2 - x1 × x2,   -1 < x1 < 1,   1 < x2 < 2
Obtained function for interpreting the hyper-plane:   y = 1.011 ( x1 + x2 - x1 × x2 )
Initial defined function:   y = exp( (x1 × x2)^0.5 ),   0 < x1 < 1,   1 < x2 < 3
Obtained function for interpreting the hyper-plane:   y = 0.974 exp( (x1 × x2)^0.5 )
FOUR-DIMENSIONAL EXAMPLES:
Initial defined function:   y = x1 - x2 + x3,   1 < x1 < 2,   3 < x2 < 5,   0 < x3 < 1
Obtained function for interpreting the hyper-plane:   y = 1.002 ( x1 - x2 + x3 )
Initial defined function:   y = x1 + sin(x2 × x3),   1 < x1 < 2,   3 < x2 < 5,   0 < x3 < 1
Obtained function for interpreting the hyper-plane:   y = 0.98 ( x1 + sin(x2 × x3) )
HIGH-DIMENSIONAL NOISY EXAMPLE:
Initial defined function:   y = sin(x1 + x2) - 0.5 x3 x4
Ranges of the variables:   -1.5 < x1 < 1.5,   -2 < x2 < 2,   0 < x3 < 2,   2 < x4 < 4
Classification noise: about 4%
SVM accuracy on train and test data:   train: 96.1364 %,   test: 95 %
Obtained function for interpreting the hyper-plane:   y = 0.9094 sin(x1 + x2) - 0.4547 x3 x4 - 0.1284
Accuracy of the obtained function on train and test data:   train: 96.0859 %,   test: 95 %
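A sketch of how the accuracy of the obtained function can be compared with the SVM on the same data; the convention that the first class lies above the recovered surface is an assumption, and `X_test`, `t_test` and `svm` are placeholders for the held-out data and the trained classifier:

```python
import numpy as np

def classify_with_formula(X):
    """Classify samples X = [x1, x2, x3, x4, y] with the recovered surface.
    Assumed convention: class +1 if the point lies above the surface, -1 otherwise."""
    x1, x2, x3, x4, y = X.T
    y_surface = 0.9094 * np.sin(x1 + x2) - 0.4547 * x3 * x4 - 0.1284
    return np.where(y > y_surface, 1.0, -1.0)

# Comparison on held-out data (X_test, t_test and svm assumed available):
# acc_svm     = np.mean(svm.predict(X_test) == t_test)
# acc_formula = np.mean(classify_with_formula(X_test) == t_test)
```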
REAL-WORLD HIGH-DIMENSIONAL PROBLEM: DISRUPTIONS

  Database type                             | Number of samples
  Non-disruptive data                       | 200
  Disruptive data, 256 ms before disruption | 200
  Disruptive data, 128 ms before disruption | 200
  Disruptive data, 64 ms before disruption  | 200

  Variable (units)                          | Abbreviation
  Plasma current (MA)                       | Ip
  Plasma internal inductance                | li
  Plasma density (10^19 m^-2)               | NElid
  Derivative of the diamagnetic energy (MW) | DWdiaDt
  Total input power (MW)                    | Ptot
  Total radiated power (MW)                 | Prad
REAL-WORLD HIGH-DIMENSIONAL NOISY PROBLEM: SELECTING DATA FROM THE SIGNALS
• Non-disruptive signal: non-disruptive points are selected from the steady-state zone.
• Disruptive signal: disruptive points are selected 64, 128 and 256 ms before the disruption.
[Figures: normalized plasma current vs time for a non-disruptive discharge, with the steady-state zone highlighted, and for a disruptive discharge, with the disruption time marked.]
REAL-WORLD HIGH-DIMENSIONAL PROBLEM: DISRUPTIONS
[Figure: total radiated power (MW) vs plasma density (10^19 m^-2), data 64 ms before disruption; red points: disruptive data, blue points: non-disruptive data.]
DISRUPTIONS: OBTAINED EQUATION FOR THE HYPER-PLANE, 64 ms BEFORE DISRUPTION
Prad = 0.7769 Ip - 2.4562 li + 0.04145 NElid + 2.377 DWdiaDt + 0.3211 Ptot + 2.0226
Classification accuracy: 94.697 %
Missed alarms: 3.7879 %
False alarms: 1.5152 %
[Figure: real radiated power (MW) vs calculated radiated power (MW); red points: disruptive data, blue points: non-disruptive data.]
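A minimal sketch of how this equation could be used as an explicit classifier; which side of the hyper-plane corresponds to the disruptive class is not stated on the poster, so the convention below (measured Prad above the boundary value means disruptive) is an assumption:

```python
def prad_boundary(Ip, li, NElid, DWdiaDt, Ptot):
    """Hyper-plane obtained 64 ms before disruption: radiated power on the boundary."""
    return (0.7769 * Ip - 2.4562 * li + 0.04145 * NElid
            + 2.377 * DWdiaDt + 0.3211 * Ptot + 2.0226)

def is_disruptive(Prad, Ip, li, NElid, DWdiaDt, Ptot):
    """Assumed convention: flag a sample as disruptive when the measured radiated
    power lies above the value predicted by the hyper-plane."""
    return Prad > prad_boundary(Ip, li, NElid, DWdiaDt, Ptot)
```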
REAL-WORLD HIGH-DIMENSIONAL PROBLEM: DISRUPTIONS
[Figure: total radiated power (MW) vs plasma density (10^19 m^-2), data 128 ms before disruption; red points: disruptive data, blue points: non-disruptive data.]
DISRUPTION CASE: OBTAINED EQUATION FOR THE HYPER-PLANE, 128 ms BEFORE DISRUPTION
Prad = 0.5514 Ip - 2.5868 li + 0.04753 NElid + 2.796 DWdiaDt + 0.2911 Ptot + 2.7346
Classification accuracy: 93.1818 %
Missed alarms: 5.0505 %
False alarms: 1.7677 %
[Figure: real radiated power (MW) vs calculated radiated power (MW); red points: disruptive data, blue points: non-disruptive data.]
REAL-WORLD HIGH-DIMENSIONAL PROBLEM: DISRUPTIONS
[Figure: total radiated power (MW) vs plasma density (10^19 m^-2), data 256 ms before disruption; red points: disruptive data, blue points: non-disruptive data.]
DISRUPTION CASE: OBTAINED EQUATION FOR THE HYPER-PLANE, 256 ms BEFORE DISRUPTION
Prad = 0.6589 Ip - 2.2493 li + 0.09757 NElid + 2.0073 DWdiaDt + 0.2113 Ptot + 1.7985
Classification accuracy: 85.7868 %
Missed alarms: 12.4365 %
False alarms: 1.7766 %
[Figure: real radiated power (MW) vs calculated radiated power (MW); red points: disruptive data, blue points: non-disruptive data.]
CONCLUSIONS AND RECOMMENDATIONS:
• Based on systematic tests and real examples, the presented methodology shows a high capability of interpreting the SVM results.
• All the various steps of the procedure can be checked.
• The presented technique is a completely new development and requires more experience before it can be applied to complex databases with any kind of data distribution.
• The approach has the potential to bridge the gap between theory and experiments and to help in the field of disruption avoidance.
Thank you for your attention
Any Questions?