Neural networks in data analysis
Michal Kreps
INSTITUT FÜR EXPERIMENTELLE KERNPHYSIK
KIT - Universität des Landes Baden-Württemberg und
nationales Forschungszentrum in der Helmholtz-Gemeinschaft
www.kit.edu
Outline
What are neural networks
Basic principles of NN
What are they useful for
Training of NN
Available packages
Comparison to other methods
Some examples of NN usage
[Figure: schematic neural network with input layer, hidden layer and output layer]
I'm not an expert on neural networks, only an advanced user who understands the principles.
Goal: Help you to understand the principles so that:
→ neural networks don't look like a black box to you
→ you have a starting point to use neural networks by yourself
Tasks to solve
There are different tasks in nature
Some are easy to solve algorithmically, some are difficult or impossible
Reading handwritten text:
Lots of personal specifics
Recognition often needs to work with only partial information
Practically impossible to do algorithmically
But our brains can solve these tasks extremely fast
→ Use the brain as inspiration
Tasks to solve
Why would an (astro-)particle physicist care about recognition of handwritten text?
Because it is a classification task on not very regular data
Do we see the letter A or H now?
→ Classification tasks are very common in our analyses
Is this event signal or background?
→ Neural networks have some advantages for this task:
They can learn from known examples (data or simulation)
They can evolve into complex non-linear models
They can easily deal with complicated correlations among the inputs
Going to more inputs does not cost additional examples
Down to principles
Zoom in: a detailed look shows a huge number of neurons
Neurons are interconnected, each having many inputs from other neurons
To build an artificial neural network:
Build a model of a neuron
Connect many neurons together
Model of neuron
Get inputs
Combine the inputs into a single quantity: $S = \sum_i w_i x_i$
Pass S through an activation function
Most common: $A(x) = \frac{2}{1 + e^{-x}} - 1$
Give output
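To make the neuron model concrete, here is a minimal sketch (illustrative Python/NumPy, not from the slides; weights and inputs are made-up numbers) of a single neuron: combine the inputs into S and pass S through the activation function A.

import numpy as np

def activation(x):
    # symmetric sigmoid, maps any real number into the interval (-1, 1)
    return 2.0 / (1.0 + np.exp(-x)) - 1.0

def neuron_output(weights, inputs):
    # combine the inputs into a single quantity S = sum_i w_i * x_i ...
    s = np.dot(weights, inputs)
    # ... and pass it through the activation function
    return activation(s)

# example: a neuron with three inputs and hypothetical weights
x = np.array([0.5, -1.2, 2.0])
w = np.array([0.1, 0.4, -0.3])
print(neuron_output(w, x))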
Building neural network
Connect neurons into complicated structures, even more complicated ones, or simple structures
[Figure: simple feed-forward network with input layer, hidden layer and output layer]
Output of Feed-forward NN
Output of node k in the second (hidden) layer:
$o_k = A\left(\sum_i w^{1\to2}_{ik} x_i\right)$
Output of node j in the output layer:
$o_j = A\left(\sum_k w^{2\to3}_{kj} x_k\right)$
Putting both together:
$o_j = A\left(\sum_k w^{2\to3}_{kj}\, A\left(\sum_i w^{1\to2}_{ik} x_i\right)\right)$
More layers are usually not needed
If more layers are involved, the output follows the same logic
[Figure: three-layer network with input layer, hidden layer and output layer]
In the following, stick to only three layers as shown here
Knowledge is stored in the weights
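A minimal sketch of the forward pass through such a three-layer network (illustrative Python/NumPy, not any particular package; w12 and w23 stand for the weight matrices w^{1→2} and w^{2→3} above):

import numpy as np

def activation(x):
    return 2.0 / (1.0 + np.exp(-x)) - 1.0   # symmetric sigmoid as above

def forward(x, w12, w23):
    # hidden layer: o_k = A( sum_i w12[i,k] * x_i )
    hidden = activation(x @ w12)
    # output layer: o_j = A( sum_k w23[k,j] * hidden_k )
    return activation(hidden @ w23)

rng = np.random.default_rng(1)
n_inputs, n_hidden, n_outputs = 5, 5, 1
w12 = rng.normal(size=(n_inputs, n_hidden))   # weights input -> hidden
w23 = rng.normal(size=(n_hidden, n_outputs))  # weights hidden -> output
x = rng.normal(size=n_inputs)                 # one example event
print(forward(x, w12, w23))                   # the knowledge is stored in w12, w23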
Tasks we can solve
Classification
Binary decision whether an event belongs to one of two classes
Is this particle a kaon or not?
Is this event a candidate for the Higgs?
Will Germany become soccer world champion in 2010?
→ The output layer contains a single node
Conditional probability density
Get a probability density for each event
What is the decay time of a decay?
What is the energy of a particle, given the observed shower?
What is the probability that a customer will cause a car accident?
→ The output layer contains many nodes
NN work flow
Neural network training
After assembling the neural network, the weights $w^{1\to2}_{ik}$ and $w^{2\to3}_{kj}$ are unknown
They are initialised to a default value
or to random values
Before using the neural network for the planned task, we need to find the best set of weights
We need some measure which tells us which set of weights performs best ⇒ the loss function
Training is the process in which, based on known example events, we search for the set of weights which minimises the loss function
It is the most important part of using neural networks
Bad training can result in bad or even wrong solutions
There are many traps in the process, so one needs to be careful
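To make the idea of training concrete, here is a deliberately crude sketch (illustrative Python/NumPy, not how NeuroBayes or any other package actually works): the weights of a small three-layer network are nudged downhill on a quadratic loss computed from known example events, using numerical derivatives.

import numpy as np

def activation(x):
    return 2.0 / (1.0 + np.exp(-x)) - 1.0

def forward(x, w12, w23):
    return activation(activation(x @ w12) @ w23)

def loss(flat_w, X, T, shape12, shape23):
    # unpack the flat weight vector into the two weight matrices
    n12 = shape12[0] * shape12[1]
    w12 = flat_w[:n12].reshape(shape12)
    w23 = flat_w[n12:].reshape(shape23)
    out = forward(X, w12, w23)[:, 0]
    return 0.5 * np.mean((T - out) ** 2)    # quadratic loss, averaged over events

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                    # known example events (3 inputs each)
T = np.where(X[:, 0] + X[:, 1] > 0, 1.0, -1.0)   # their true classes (+1 / -1)
shape12, shape23 = (3, 4), (4, 1)
w = rng.normal(scale=0.1, size=3 * 4 + 4)        # random starting weights

eta, eps = 0.5, 1e-5
for step in range(300):                          # crude gradient descent
    grad = np.zeros_like(w)
    for i in range(len(w)):                      # numerical derivative, weight by weight
        d = np.zeros_like(w); d[i] = eps
        grad[i] = (loss(w + d, X, T, shape12, shape23)
                   - loss(w - d, X, T, shape12, shape23)) / (2 * eps)
    w = w - eta * grad                           # step downhill in weight space
print("final loss:", loss(w, X, T, shape12, shape23))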
Loss function
Practically any function which allows you to decide which neural network is better
→ In particle physics it can be the one which results in the best measurement
In practical applications, two commonly used functions ($T_{ji}$ is the true value, −1 for one class and +1 for the other; $o_{ji}$ is the actual neural network output):
Quadratic loss function:
$\chi^2 = \sum_j w_j \chi^2_j = \sum_j w_j \, \frac{1}{2} \sum_i (T_{ji} - o_{ji})^2$
→ $\chi^2 = 0$ for correct decisions and $\chi^2 = 2$ for wrong ones
Entropy loss function:
$E_D = \sum_j w_j E_D^j = \sum_j w_j \sum_i -\log\left(\frac{1}{2}(1 + T_{ji}\, o_{ji})\right)$
→ $E_D = 0$ for correct decisions and $E_D = \infty$ for wrong ones
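The two loss functions written out as a small sketch (illustrative Python/NumPy, not from the slides; one output node per event, T holds the true classes ±1, out the network outputs in (−1, 1), and the optional array gives the per-event weights w_j):

import numpy as np

def quadratic_loss(T, out, weights=None):
    # chi^2 = sum_j w_j * 1/2 * (T_j - o_j)^2
    w = np.ones_like(T) if weights is None else weights
    return np.sum(w * 0.5 * (T - out) ** 2)

def entropy_loss(T, out, weights=None):
    # E_D = sum_j w_j * ( -log( 1/2 * (1 + T_j * o_j) ) ); diverges for o_j = -T_j
    w = np.ones_like(T) if weights is None else weights
    return np.sum(w * (-np.log(0.5 * (1.0 + T * out))))

T = np.array([1.0, -1.0, 1.0])       # true classes
out = np.array([0.8, -0.6, 0.1])     # network outputs
print(quadratic_loss(T, out), entropy_loss(T, out))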
Conditional probability density
Classification is the easy case: we need a single output node, and if the output is above a chosen value the event is class 1 (signal), otherwise class 2 (background)
Now the question is how to obtain a probability density
Use many nodes in the output layer, each answering the question whether the value is larger than some threshold
As an example, let's have n = 10 nodes and a true value of 0.63
Vector of true values for the output nodes: (1, 1, 1, 1, 1, 1, −1, −1, −1, −1)
Node j answers the question whether the true value is larger than j/10
→ The output vector represents the integrated probability density
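A small sketch (illustrative Python/NumPy, names are mine) of how the vector of true values could be built for one event, following the example above:

import numpy as np

def target_vector(true_value, n_nodes=10):
    # node j (j = 1..n) gets +1 if the true value is larger than j/n, else -1
    thresholds = np.arange(1, n_nodes + 1) / n_nodes
    return np.where(true_value > thresholds, 1.0, -1.0)

print(target_vector(0.63))   # -> [ 1  1  1  1  1  1 -1 -1 -1 -1]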
Interpretation of NN output
In classification tasks, a higher NN output in general means a more signal-like event
→ The NN output rescaled to the interval [0, 1] can be interpreted as a probability, provided that:
a quadratic/entropy loss function is used
the NN is well trained ⇔ the loss function is minimised
→ The purity $P(o) = N_{\text{selected signal}}(o)/N_{\text{selected}}(o)$ should then be a linear function of the NN output
[Figure: purity vs. network output, showing a linear dependence]
Looking at the expression for the output,
$o_j = A\left(\sum_k w^{2\to3}_{kj}\, A\left(\sum_i w^{1\to2}_{ik} x_i\right)\right)$
→ the NN is a function which maps the n-dimensional input space onto a 1-dimensional space
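A sketch of this linearity check (illustrative Python/NumPy; `output` and `is_signal` are assumed arrays of NN outputs and true labels): compute the purity in bins of the NN output and compare with the line P = (o + 1)/2.

import numpy as np

def purity_vs_output(output, is_signal, n_bins=20):
    # purity P(o) = N_selected_signal(o) / N_selected(o) in bins of the NN output
    edges = np.linspace(-1.0, 1.0, n_bins + 1)
    centres, purities = [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (output >= lo) & (output < hi)
        if in_bin.sum() == 0:
            continue
        centres.append(0.5 * (lo + hi))
        purities.append(is_signal[in_bin].mean())
    return np.array(centres), np.array(purities)

# toy usage with made-up outputs; for a well trained network the points should lie
# close to purity = (output + 1) / 2 (the toy only illustrates the bookkeeping)
rng = np.random.default_rng(2)
is_signal = rng.random(10000) < 0.5
output = np.where(is_signal, rng.beta(4, 2, 10000), rng.beta(2, 4, 10000)) * 2 - 1
print(purity_vs_output(output, is_signal))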
How to check quality of training
Define quantities:
Signal efficiency: $\epsilon_s = \frac{N(\text{selected signal})}{N(\text{all signal})}$
Efficiency: $\epsilon = \frac{N(\text{selected})}{N(\text{all})}$
Purity: $P = \frac{N(\text{selected signal})}{N(\text{selected})}$
[Figure: distribution of the NN output for the two classes] How good is the separation?
[Figure: purity vs. network output] Are we at the minimum of the loss function?
[Figure: signal purity vs. signal efficiency for the training sample, each point corresponding to a cut on the NN output; the best point is (1, 1)]
[Figure: signal efficiency vs. efficiency, each point corresponding to a cut on the NN output, together with the 'best achievable' and 'random decision' reference curves]
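The quantities above can be computed for a scan of cuts on the network output; a minimal sketch (illustrative Python/NumPy, arrays as in the previous sketch):

import numpy as np

def scan_cuts(output, is_signal, cuts):
    rows = []
    n_signal = is_signal.sum()
    n_all = len(output)
    for cut in cuts:
        selected = output > cut                       # requirement: output > cut
        n_sel = selected.sum()
        n_sel_sig = (selected & is_signal).sum()
        eff_sig = n_sel_sig / n_signal                # signal efficiency
        eff = n_sel / n_all                           # efficiency
        purity = n_sel_sig / n_sel if n_sel else 1.0  # purity
        rows.append((cut, eff_sig, eff, purity))
    return rows

# usage with hypothetical arrays of NN outputs and true labels
rng = np.random.default_rng(3)
is_signal = rng.random(5000) < 0.3
output = np.where(is_signal, rng.normal(0.5, 0.3, 5000), rng.normal(-0.5, 0.3, 5000)).clip(-1, 1)
for cut, es, e, p in scan_cuts(output, is_signal, np.linspace(-1, 1, 11)):
    print(f"cut={cut:+.1f}  signal eff={es:.2f}  eff={e:.2f}  purity={p:.2f}")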
Issues in training
Main issue: finding the minimum in a high-dimensional space
5 input nodes with 5 nodes in the hidden layer and one output node gives 5×5 + 5 = 30 weights
Imagine the task of finding the deepest valley in the Alps (only 2 dimensions)
Being put in some random place, it is easy to find some minimum
But what is in the next valley?
Issues in training
In training, we start with:
a large number of weights which are far from their optimal values
input values in unknown ranges
→ Large chance of numerical issues in the calculation of the loss function
Statistical issues coming from low statistics in training:
In many places in physics this is not such an issue, as the training statistics can be large
In industry it can be quite a big issue to obtain a large enough training dataset
But also in physics it can sometimes be very costly to obtain a large enough training sample
Learning by heart (overtraining), when the NN has enough weights to do that
How to treat issues
All the issues mentioned can in principle be treated by the neural network package:
Regularisation for the numerical issues at the beginning of training
Weight decay - forgetting (see the sketch after this list)
Helps against overtraining on statistical fluctuations
The idea is that effects which are not significant are not shown often enough to keep the weights that remember them
Pruning of connections - remove connections which don't carry significant information
Preprocessing:
avoids numerical issues
helps the search for a good minimum through a good starting point and decorrelation
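A minimal sketch of the weight-decay idea (illustrative Python/NumPy, not any package's actual update rule): each step combines the usual gradient step with a small shrinkage of all weights towards zero, so weights that are not regularly reinforced by the training data are slowly forgotten.

import numpy as np

def update_weights(w, grad, eta=0.01, decay=1e-4):
    # plain gradient step ...
    w = w - eta * grad
    # ... plus weight decay: every weight shrinks a little each step,
    # so only weights that the data keep pushing up survive
    return w * (1.0 - decay)

w = np.array([2.0, -0.5, 0.01])
grad = np.array([0.1, -0.2, 0.0])
print(update_weights(w, grad))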
NeuroBayes preprocessing
[Figure: preprocessing of one input variable in three steps, panels labelled 'flat', 'spline fit' and 'final']
Bin the variable into bins of equal statistics
Calculate the purity for each bin and do a spline fit
Use the purity instead of the original value, plus transform to µ = 0 and RMS = 1
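A rough sketch of this kind of preprocessing for one input variable (illustrative Python/NumPy; real NeuroBayes fits a spline through the bin purities, here the raw bin purity stands in for it):

import numpy as np

def preprocess(x, is_signal, n_bins=100):
    # 1) bin the variable into bins of equal statistics (quantiles of x)
    edges = np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1))
    idx = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, n_bins - 1)
    # 2) purity of each bin
    purity = np.array([is_signal[idx == b].mean() for b in range(n_bins)])
    # 3) replace each value by the purity of its bin
    #    (a spline fit through the purities would smooth this step)
    t = purity[idx]
    # 4) transform to mean 0 and RMS 1
    return (t - t.mean()) / t.std()

# toy usage with a made-up input variable
rng = np.random.default_rng(4)
is_signal = rng.random(20000) < 0.5
x = np.where(is_signal, rng.exponential(1.0, 20000), rng.exponential(2.0, 20000))
print(preprocess(x, is_signal)[:5])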
Electron ID at CDF
Comparing two different NN packages with exactly the same inputs
Preprocessing is important and can improve the result significantly
Available packages
NeuroBayes
http://www.phi-t.de
The best one I know
Commercial, but the fee is small for scientific use
Developed originally in a high energy physics experiment
ROOT/TMVA
Part of the ROOT package
Can be used for playing around, but I would advise using something else for serious applications
SNNS: Stuttgart Neural Network Simulator
http://www.ra.cs.uni-tuebingen.de/SNNS/
Written in C, but a ROOT interface exists
The best open-source package I know about
NN vs. other MV techniques
Several times I have seen statements of the type "my favourite method is better than other multivariate techniques"
Usually a lot of time is spent to understand and properly train the favourite method
The other methods are tried just for a quick comparison, without understanding or tuning
Often you will not even learn any details of how the other methods were set up
With TMVA it is now easy to compare different methods; often I saw that the Fisher discriminant won in those comparisons
From some tests we found that the NN implemented in TMVA is relatively bad
No wonder that it is usually worse than the other methods
Understand the method you are using; none of them is a black box
Practical tips
No need for several hidden layers, one should be sufficient for you
The number of nodes in the hidden layer should not be very large
→ Risk of learning by heart
The number of nodes in the hidden layer should not be very small either
→ Not enough capacity to learn your problem
→ Usually O(number of inputs) nodes in the hidden layer
Understand your problem; the NN gets numbers and can dig information out of them, but you know the meaning of those numbers
Practical tips
Don't throw away the knowledge you acquired about the problem before starting to use a NN
→ The NN can do a lot, but you don't need it to discover what you already know
If you already know that a given event is background, throw it away
If you know about good inputs, use them rather than the raw data from which they are calculated
Use the neural network to learn what you don't know yet
Physics examples
Mentioning only work done by the Karlsruhe HEP group
"Stable" particle identification at Delphi, CDF and now Belle
Properties (E, φ, θ, Q-value) of inclusively reconstructed B
mesons
Decay time in semileptonic Bs decays
Heavy quark resonance studies: X(3872), B**, Bs**, Λc*, Σc
B flavour tagging and Bs mixing (Delphi, CDF, Belle)
CPV in Bs → J/ψ φ decays (CDF)
B tagging for high pT physics (CDF, CMS)
Single top and Higgs search (CDF)
B hadron full reconstruction (Belle)
Optimisation of KEKB accelerator
Bs** at CDF
[Figures: CDF Run 2, 1.0 fb⁻¹, Bs** → B⁺K⁻ → [J/ψ K⁺] K⁻: neural network output for signal and background, and the M(B⁺K⁻) − M(B⁺) − M(K⁻) spectrum showing the Bs1 and Bs2* peaks]
Background from data
Signal from MC
First observation of Bs1
Most precise masses
X(3872) mass at CDF
[Figure: CDF II Preliminary, 2.4 fb⁻¹: neural network output for signal, background and total, with rejected and selected regions indicated]
Powerful selection at CDF of the largest sample, leading to the best mass measurement
X(3872) is the first of the charmonium-like states
A lot of effort to understand its nature
[Figure: J/ψ ππ mass spectrum (GeV/c²)]
Single top search
[Figure: KIT Flavor Sep. Output]
Final data plot
Part of a big program leading to the discovery of single top production
Tagging whether a jet is a b-quark jet or not using a NN
[Figure: all channels, CDF II Preliminary 3.2 fb⁻¹: NN output for single top, tt, Wbb+Wcc, Wc, Wqq, diboson, Z+jets, QCD and data, MC normalized to the SM prediction]
Complicated analysis with many different multivariate techniques
[Figure: TLC 2Jets 1Tag, CDF II Preliminary 3.2 fb⁻¹: event fraction vs. NN output, processes normalized to unit area]
B flavour tagging - CDF
Flavour tagging at hadron colliders is a difficult task
Lots of potential for improvements
In the plot, concentrate on the continuous lines
An example of not that good a separation
Training with weights
[Figure: CDF Run II preliminary, L = 2.4 fb⁻¹: Mass(Λc π) − Mass(Λc) [GeV/c²] spectrum; ≈ 23300 signal events, ≈ 472200 background events]
One knows statistically whether an event is signal or background
Or one has a mixture of signal and background plus a pure sample for one class
One can then use data only, as in this example of an application
Training with weights
[Figures: CDF Run II preliminary, L = 2.4 fb⁻¹: Mass(Λc⁺π⁺) − Mass(Λc⁺) spectra with fit function showing Σc(2455)⁺⁺ → Λc⁺π⁺ and Σc(2520)⁺⁺ → Λc⁺π⁺]
→ Use only data in selection
→ Can compare data to
simulation
→ Can reweight simulation to match data
32 11.08.2010
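One way to realise such a training with weights, as a hedged sketch (illustrative Python/NumPy, not the actual CDF/NeuroBayes procedure): each data event enters the training twice, once with target +1 and a weight equal to its signal probability (e.g. from a fit to the mass spectrum) and once with target −1 and the complementary weight; the weighted loss from the loss-function slide is then minimised.

import numpy as np

def weighted_training_sample(x, p_signal):
    # each data event appears twice: as signal (target +1) with weight p,
    # and as background (target -1) with weight 1 - p
    X = np.concatenate([x, x])
    T = np.concatenate([np.ones(len(x)), -np.ones(len(x))])
    W = np.concatenate([p_signal, 1.0 - p_signal])
    return X, T, W

def weighted_quadratic_loss(T, out, W):
    # the quadratic loss from before, with per-event weights w_j
    return np.sum(W * 0.5 * (T - out) ** 2)

# p_signal would come e.g. from a fit to the mass spectrum (here just assumed numbers)
x = np.array([[1.2, 0.3], [0.1, -0.5], [2.0, 1.1]])
p_signal = np.array([0.9, 0.05, 0.6])
X, T, W = weighted_training_sample(x, p_signal)
print(X.shape, T, W)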
Examples outside physics
Medicine and pharma research
e.g. effects and undesirable effects of drugs, early tumor recognition
Banks
e.g. credit scoring (Basel II), financial time series prediction, valuation of derivatives, risk-minimised trading strategies, client valuation
Insurances
e.g. risk and cost prediction for individual clients, probability of contract cancellation, fraud recognition, fairness of tariffs
Trading chain stores:
turnover prognosis for individual articles/stores
Data Mining Cup
World's largest student competition: the Data-Mining-Cup
Very successful in competition with other data-mining methods
2005: fraud detection in internet trading
2006: price prediction in eBay auctions
2007: coupon redemption prediction
2008: lottery customer behaviour prediction
NeuroBayes Turnover prognosis
Individual health costs
Pilot project for a large private health insurance
Prognosis of the costs in the following year for each insured person, with confidence intervals
4 years of training, test on the following year
Results: probability density for each customer/tariff combination
Very good test results!
Potential for an objective cost reduction in health management
Prognosis of sport events
Results: probabilities for home - tie - guest
Conclusions
Neural networks are a powerful tool for your analysis
They are not geniuses, but they can do marvellous things for a genius
A lecture of this kind cannot really make you an expert
I hope that I was successful with:
Giving you enough insight so that neural networks are not a black box to you anymore
Teaching you enough not to be afraid to try