Neural networks in data analysis
Michal Kreps
INSTITUT FÜR EXPERIMENTELLE KERNPHYSIK
KIT - Universität des Landes Baden-Württemberg und nationales Forschungszentrum in der Helmholtz-Gemeinschaft
www.kit.edu

Outline
- What are neural networks
- Basic principles of NN
- What they are useful for
- Training of NN
- Available packages
- Comparison to other methods
- Some examples of NN usage
[Figure: sketch of a feed-forward network with input layer, hidden layer and output layer]

I am not an expert on neural networks, only an advanced user who understands the principles.
Goal: help you understand the principles so that
→ neural networks no longer look like a black box to you
→ you have a starting point to use neural networks yourself

Tasks to solve
- Nature poses many different tasks: some are easy to solve algorithmically, some are difficult or impossible.
- Reading handwritten text:
  - lots of personal specifics
  - recognition often has to work with only partial information
  - practically impossible to do algorithmically
- But our brains can solve these tasks extremely fast.
→ Use the brain as inspiration.

Tasks to solve
- Why would an (astro-)particle physicist care about recognition of handwritten text?
- Because it is a classification task on rather irregular data: do we see the letter A or H?
→ Classification tasks are very common in our analyses: is this event signal or background?
→ Neural networks have several advantages for this task:
- they can learn from known examples (data or simulation)
- they can evolve into complex non-linear models
- they can easily deal with complicated correlations among the inputs
- going to more inputs does not require additional training examples

Down to principles
- A detailed look at the brain shows a huge number of neurons.
- Neurons are interconnected, each receiving many inputs from other neurons.
- To build an artificial neural network: build a model of a neuron, then connect many neurons together.

Model of neuron
- Get the inputs x_i.
- Combine the inputs into a single quantity S = sum_i w_i x_i.
- Pass S through an activation function; most common: A(x) = 2 / (1 + e^{-x}) - 1.
- Give the output A(S).

Building neural network
- Neurons can be connected into complicated structures, or into simple ones.
- Simple case: input layer, hidden layer, output layer (feed-forward network).

Output of feed-forward NN
- Output of node k in the second (hidden) layer: o_k = A( sum_i w_{ik}^{1->2} x_i )
- Output of node j in the output layer: o_j = A( sum_k w_{kj}^{2->3} x_k ) = A( sum_k w_{kj}^{2->3} A( sum_i w_{ik}^{1->2} x_i ) )
- More layers are usually not needed; if more layers are involved, the output follows the same logic.
- In the following, we stick to three layers as here.
- The knowledge is stored in the weights.
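To make the formulas above concrete, here is a minimal sketch of the forward pass of such a three-layer feed-forward network in NumPy, using the symmetric activation A(x) = 2/(1 + e^{-x}) - 1 from the slides. The weight shapes and example numbers are invented for illustration.

```python
import numpy as np

def activation(x):
    """Symmetric sigmoid A(x) = 2 / (1 + exp(-x)) - 1, mapping onto (-1, 1)."""
    return 2.0 / (1.0 + np.exp(-x)) - 1.0

def forward(x, w_in_hid, w_hid_out):
    """Forward pass of a three-layer feed-forward network.

    x         : input vector, shape (n_inputs,)
    w_in_hid  : weights input -> hidden,  shape (n_inputs, n_hidden)
    w_hid_out : weights hidden -> output, shape (n_hidden, n_outputs)
    """
    hidden = activation(x @ w_in_hid)        # o_k = A(sum_i w_ik x_i)
    output = activation(hidden @ w_hid_out)  # o_j = A(sum_k w_kj o_k)
    return output

# Invented example: 5 inputs, 5 hidden nodes, 1 output node
rng = np.random.default_rng(0)
w1 = rng.normal(size=(5, 5))
w2 = rng.normal(size=(5, 1))
print(forward(rng.normal(size=5), w1, w2))
```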
Tasks we can solve
Classification
- Binary decision whether an event belongs to one of two classes:
  - Is this particle a kaon or not?
  - Is this event a candidate for the Higgs?
  - Will Germany become soccer world champion in 2010?
→ Output layer contains a single node.

Conditional probability density
- Get a probability density for each event:
  - What is the decay time of a decay?
  - What is the energy of a particle given the observed shower?
  - What is the probability that a customer will cause a car accident?
→ Output layer contains many nodes.

NN work flow
[Figure: NN work flow]

Neural network training
- After assembling the neural network, the weights w_{ik}^{1->2} and w_{kj}^{2->3} are unknown; they are initialised to default or random values.
- Before using the neural network for the planned task, we need to find the best set of weights.
- We need a measure which tells which set of weights performs best ⇒ loss function.
- Training is the process in which, based on known examples of events, we search for the set of weights which minimises the loss function.
- It is the most important part of using neural networks: bad training can result in bad or even wrong solutions, and there are many traps in the process, so one needs to be careful.

Loss function
- Practically any function which allows you to decide which neural network is better; in particle physics it can be the one which results in the best measurement.
- In practical applications two functions are commonly used (T_ji is the true value, -1 for one class and +1 for the other; o_ji is the actual network output):
  - Quadratic loss function: chi^2 = sum_j w_j chi^2_j, with chi^2_j = (1/2) sum_i (T_ji - o_ji)^2
    → chi^2 = 0 for correct decisions and chi^2 = 2 for wrong ones
  - Entropy loss function: E_D = sum_j w_j E_Dj, with E_Dj = sum_i -log( (1/2)(1 + T_ji o_ji) )
    → E_D = 0 for correct decisions and E_D = ∞ for wrong ones
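A small sketch of these two loss functions, assuming targets T = ±1 and network outputs in (-1, 1); the per-event weights w_j default to one, and the example numbers are invented.

```python
import numpy as np

def quadratic_loss(targets, outputs, weights=None):
    """chi^2 = sum_j w_j * 1/2 * sum_i (T_ji - o_ji)^2"""
    weights = np.ones(len(targets)) if weights is None else weights
    per_event = 0.5 * np.sum((targets - outputs) ** 2, axis=1)
    return np.sum(weights * per_event)

def entropy_loss(targets, outputs, weights=None):
    """E_D = sum_j w_j * sum_i -log(0.5 * (1 + T_ji * o_ji))"""
    weights = np.ones(len(targets)) if weights is None else weights
    per_event = np.sum(-np.log(0.5 * (1.0 + targets * outputs)), axis=1)
    return np.sum(weights * per_event)

# Invented example: two events, one output node, targets +1 (signal) and -1 (background)
T = np.array([[+1.0], [-1.0]])
o = np.array([[+0.8], [+0.3]])   # the second event is misclassified
print(quadratic_loss(T, o), entropy_loss(T, o))
```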
Conditional probability density
- Classification is the easy case: we need a single output node, and if the output is above a chosen value the event is class 1 (signal), otherwise class 2 (background).
- The question now is how to obtain a probability density.
- Use many nodes in the output layer, each answering the question whether the true value is larger than some threshold.
- Example: n = 10 nodes and a true value of 0.63; node j answers whether the value is larger than j/10, so the vector of true values for the output nodes is (1, 1, 1, 1, 1, 1, -1, -1, -1, -1).
- The vector of outputs then represents the integrated probability density.

Interpretation of NN output
- In classification tasks, a higher NN output in general means a more signal-like event.
- If the quadratic or entropy loss function is used and the NN is well trained (loss function minimised), the NN output rescaled to the interval [0, 1] can be interpreted as a probability.
→ The purity P(o) = N_signal(o) / N(o) should then be a linear function of the NN output.
[Plot: purity vs. network output]
- Looking at the expression for the output, o_j = A( sum_k w_{kj}^{2->3} A( sum_i w_{ik}^{1->2} x_i ) ), the NN is a function which maps the n-dimensional input space onto a 1-dimensional space.

How to check quality of training
Define the quantities:
- Signal efficiency: eps_s = N(selected signal) / N(all signal)
- Efficiency: eps = N(selected) / N(all)
- Purity: P = N(selected signal) / N(selected)
[Plot: distribution of the NN output for the two classes; how good is the separation?]
[Plot: purity vs. network output; are we at the minimum of the loss function?]
[Plot: signal purity vs. signal efficiency, each point corresponding to a requirement output > cut (or output < cut); the best point is (1, 1)]
[Plot: signal efficiency vs. efficiency, with the best achievable and random-decision curves]
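A minimal sketch of how such quality curves can be produced from a labelled test sample: scan a cut on the network output and compute the signal efficiency and purity at each cut value. The toy sample and numbers below are invented.

```python
import numpy as np

def efficiency_purity_scan(outputs, is_signal, cuts):
    """For each cut, select events with NN output > cut and compute
    eps_s = N(sel. signal)/N(all signal) and P = N(sel. signal)/N(selected)."""
    n_signal = np.sum(is_signal)
    eff_s, purity = [], []
    for cut in cuts:
        selected = outputs > cut
        n_sel = np.sum(selected)
        n_sel_sig = np.sum(selected & is_signal)
        eff_s.append(n_sel_sig / n_signal)
        purity.append(n_sel_sig / n_sel if n_sel > 0 else 1.0)
    return np.array(eff_s), np.array(purity)

# Invented toy sample: signal peaks near +1, background near -1
rng = np.random.default_rng(1)
out = np.concatenate([rng.normal(0.6, 0.3, 1000), rng.normal(-0.6, 0.3, 5000)])
sig = np.concatenate([np.ones(1000, bool), np.zeros(5000, bool)])
eff, pur = efficiency_purity_scan(out, sig, np.linspace(-1, 1, 21))
print(eff[:3], pur[:3])
```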
Issues in training
- Main issue: finding the minimum in a high-dimensional space.
- Already 5 input nodes with 5 nodes in the hidden layer and one output node give 5x5 + 5 = 30 weights.
- Imagine the task of finding the deepest valley in the Alps (only 2 dimensions): put at some random place, it is easy to find some minimum, but what is in the next valley?
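The slides do not prescribe a particular minimisation algorithm; as a simple illustration of the local-minimum problem, here is a sketch of plain gradient descent on the quadratic loss of the small three-layer network, restarted from several random weight initialisations and keeping the best result. The numerical gradient, toy data and all settings are my own choices for the example, not a recipe from the talk.

```python
import numpy as np

def activation(x):
    return 2.0 / (1.0 + np.exp(-x)) - 1.0

def loss(w, x, t, n_in, n_hid):
    """Quadratic loss of a 3-layer net with all weights packed into one vector w."""
    w1 = w[:n_in * n_hid].reshape(n_in, n_hid)
    w2 = w[n_in * n_hid:].reshape(n_hid, 1)
    out = activation(activation(x @ w1) @ w2)
    return 0.5 * np.sum((t - out) ** 2)

def train_once(x, t, n_in, n_hid, rng, steps=200, lr=0.01, eps=1e-4):
    """Plain gradient descent with a numerical gradient (simple, not efficient)."""
    w = rng.normal(scale=0.5, size=n_in * n_hid + n_hid)
    for _ in range(steps):
        grad = np.empty_like(w)
        for i in range(len(w)):
            dw = np.zeros_like(w)
            dw[i] = eps
            grad[i] = (loss(w + dw, x, t, n_in, n_hid)
                       - loss(w - dw, x, t, n_in, n_hid)) / (2 * eps)
        w -= lr * grad
    return w, loss(w, x, t, n_in, n_hid)

# Invented toy data; restart from several random places and keep the deepest minimum found
rng = np.random.default_rng(2)
x = rng.normal(size=(200, 5))
t = np.where(x[:, 0] + x[:, 1] ** 2 > 1, 1.0, -1.0).reshape(-1, 1)
results = [train_once(x, t, 5, 5, rng) for _ in range(3)]
best_w, best_loss = min(results, key=lambda r: r[1])
print(best_loss)
```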
Issues in training
- In training we start with a large number of weights which are far from their optimal values, and with input values in unknown ranges.
→ Large chance of numerical issues in the calculation of the loss function.
- Statistical issues coming from low statistics in training:
  - in many places in physics not that much of an issue, as the training statistics can be large
  - in industry it can be quite a big issue to obtain a large enough training dataset
  - but also in physics it can sometimes be very costly to obtain a large enough training sample
- Learning by heart, when the NN has enough weights to do that.

How to treat issues
- In principle, all the issues mentioned can be treated by the neural network package:
  - regularisation for the numerical issues at the beginning of training
  - weight decay ("forgetting"): helps against overtraining on statistical fluctuations; the idea is that effects which are not significant are not shown often enough to keep the weights needed to remember them
  - pruning of connections: remove connections which do not carry significant information
  - preprocessing: against numerical issues, and it helps the search for a good minimum through a good starting point and decorrelation
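As an illustration of the weight-decay idea above (not the specific implementation of any package), the update step simply shrinks every weight a little on each iteration, so that weights which are not regularly reinforced by the data fade away. The learning rate and decay constant below are invented example values.

```python
import numpy as np

def update_with_weight_decay(weights, gradient, learning_rate=0.01, decay=1e-3):
    """One gradient step with weight decay.

    Equivalent to minimising  loss + (decay/2) * sum(w^2):
    weights which the data do not keep pushing up slowly decay towards zero.
    """
    return weights - learning_rate * (gradient + decay * weights)

# Invented example: a weight whose gradient is zero just shrinks step by step
w = np.array([2.0])
for _ in range(5):
    w = update_with_weight_decay(w, gradient=np.zeros(1), learning_rate=0.1, decay=0.5)
print(w)
```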
NeuroBayes preprocessing
- Bin the input variable into bins of equal statistics (flat distribution).
- Calculate the purity for each bin and do a spline fit of purity vs. bin number.
- Use the purity instead of the original value, and transform it to mean µ = 0 and RMS = 1.
[Figure: input distribution, equal-statistics binning, spline fit of the bin purities, and the final transformed distribution]
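A rough sketch of this kind of preprocessing for one input variable, assuming a binary target: equal-statistics binning via quantiles, per-bin purity, and standardisation to mean 0 and RMS 1. This only illustrates the idea; it is not the NeuroBayes implementation (in particular, the spline fit of the bin purities is omitted), and the toy data are invented.

```python
import numpy as np

def purity_transform(x, is_signal, n_bins=100):
    """Map a variable to its per-bin signal purity, then standardise to mean 0, RMS 1."""
    # Bin edges chosen so that each bin contains (roughly) equal statistics
    edges = np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1))
    idx = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, n_bins - 1)
    # Purity of each bin: fraction of signal among the events in that bin
    purity = np.array([is_signal[idx == b].mean() if np.any(idx == b) else 0.5
                       for b in range(n_bins)])
    transformed = purity[idx]
    return (transformed - transformed.mean()) / transformed.std()

# Invented toy example: signal shifted with respect to background
rng = np.random.default_rng(3)
x = np.concatenate([rng.exponential(1.0, 5000), rng.exponential(2.0, 5000)])
label = np.concatenate([np.ones(5000, bool), np.zeros(5000, bool)])
print(purity_transform(x, label)[:5])
```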
Electron ID at CDF
- Comparison of two different NN packages with exactly the same inputs.
- Preprocessing is important and can improve the result significantly.

Available packages
- NeuroBayes, http://www.phi-t.de
  The best I know; commercial, but the fee is small for scientific use; developed originally in a high energy physics experiment.
- ROOT/TMVA
  Part of the ROOT package; can be used for playing around, but I would advise using something else for serious applications.
- SNNS: Stuttgart Neural Network Simulator, http://www.ra.cs.uni-tuebingen.de/SNNS/
  Written in C, but a ROOT interface exists; the best open-source package I know about.

NN vs. other MV techniques
- Several times I have seen statements of the type "my favourite method is better than the other multivariate techniques".
  - Usually a lot of time is spent understanding and properly training the favourite method.
  - Other methods are tried just for a quick comparison, without understanding or tuning.
  - Often you will not even learn any details of how the other methods were set up.
- With TMVA it is now easy to compare different methods; often I saw the Fisher discriminant win in such comparisons.
  - From some tests we found that the NN implemented in TMVA is relatively bad, so no wonder it is usually worse than the other methods.
→ Understand the method you are using; none of them is a black box.

Practical tips
- No need for several hidden layers; one should be sufficient for you.
- The number of nodes in the hidden layer should not be very large → risk of learning by heart (a simple check is sketched after these tips).
- The number of nodes in the hidden layer should not be very small → not enough capacity to learn your problem.
→ Usually O(number of inputs) nodes in the hidden layer.
- Understand your problem: the NN gets numbers and can dig information out of them, but you know the meaning of those numbers.

Practical tips
- Don't throw away the knowledge you acquired about the problem before starting to use a NN.
→ The NN can do a lot, but you don't need it to discover what you already know.
  - If you already know that a given event is background, throw it away.
  - If you know about good inputs, use them rather than the raw data from which they are calculated.
- Use the neural network to learn what you don't know yet.
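One simple way to spot the "learning by heart" risk mentioned above is to hold back part of the known examples and compare the loss on the training and test samples: a test loss much larger than the training loss indicates overtraining. A minimal sketch, with an invented dummy "network" standing in for whatever trained model you have:

```python
import numpy as np

def split_train_test(x, t, test_fraction=0.3, seed=0):
    """Randomly split the known examples into a training and a test sample."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(x))
    n_test = int(test_fraction * len(x))
    test, train = order[:n_test], order[n_test:]
    return x[train], t[train], x[test], t[test]

def quadratic_loss(targets, outputs):
    # Averaged per event so that samples of different size are comparable
    return 0.5 * np.mean((targets - outputs) ** 2)

def check_overtraining(predict, x_train, t_train, x_test, t_test):
    """Compare the loss on the training and test samples for any callable `predict`."""
    return (quadratic_loss(t_train, predict(x_train)),
            quadratic_loss(t_test, predict(x_test)))

# Invented example with a dummy "network" that just thresholds one input
x = np.random.default_rng(4).normal(size=(1000, 5))
t = np.where(x[:, 0] > 0, 1.0, -1.0)
xtr, ttr, xte, tte = split_train_test(x, t)
print(check_overtraining(lambda v: np.tanh(3 * v[:, 0]), xtr, ttr, xte, tte))
```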
Physics examples
Only mentioning work done by the Karlsruhe HEP group:
- "Stable" particle identification at Delphi, CDF and now Belle
- Properties (E, φ, θ, Q-value) of inclusively reconstructed B mesons
- Decay time in semileptonic Bs decays
- Heavy-quark resonance studies: X(3872), B**, Bs**, Λc*, Σc
- B flavour tagging and Bs mixing (Delphi, CDF, Belle)
- CPV in Bs → J/ψ φ decays (CDF)
- B tagging for high-pT physics (CDF, CMS)
- Single top and Higgs searches (CDF)
- B hadron full reconstruction (Belle)
- Optimisation of the KEKB accelerator

Bs** at CDF
- B**_s → B+ K- → [J/ψ K+] K-; background taken from data, signal from MC.
[Plots (CDF Run 2, 1.0 fb-1): neural network output for signal and background, and the M(B+K-)-M(B+)-M(K-) spectrum showing the Bs1 and B*_s2 signals together with the B+K+ combinations]
- First observation of the Bs1; most precise masses.

X(3872) mass at CDF
- X(3872) was the first of the charmonium-like states; a lot of effort goes into understanding its nature.
- Powerful selection at CDF of the largest sample, leading to the best mass measurement.
[Plots (CDF II Preliminary, 2.4 fb-1): network output for signal and background with rejected and selected regions, and the J/ψ ππ mass distribution]

Single top search
- Complicated analysis with many different multivariate techniques.
- NN used to tag whether a jet is a b-quark jet or not.
[Plots (CDF II Preliminary, 3.2 fb-1): KIT flavour separator output (MC normalised to unit area) and the final NN output for data compared to the SM prediction (single top, tt, Wbb+Wcc, Wc, Wqq, diboson, Z+jets, QCD)]
- Part of the big programme leading to the discovery of single top production.

B flavour tagging - CDF
- Flavour tagging at hadron colliders is a difficult task, with a lot of potential for improvements.
- In the plot, concentrate on the continuous lines: an example of not-that-good separation.

Training with weights
- Often one only knows statistically whether an event is signal or background, or one has a mixture of signal and background plus a pure sample for one class.
- One can then use data only; as an example application:
[Plots (CDF Run II preliminary, L = 2.4 fb-1): Mass(Λc π) - Mass(Λc) spectrum with about 23300 signal and 472200 background events; after the selection, the Σc(2455)++ → Λc+ π+ and Σc(2520)++ → Λc+ π+ signals]
→ Use only data in the selection
→ Can compare data to simulation
→ Can reweight simulation to match data
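The slides do not spell out the exact weighting scheme; as a sketch of the general idea, the event weights w_j in the loss functions defined earlier can carry the statistical signal/background information, for example a per-event signal probability taken from a fit to the mass distribution. The split into a signal-weighted and a background-weighted term below, and all numbers, are invented for illustration.

```python
import numpy as np

def weighted_quadratic_loss(targets, outputs, weights):
    """chi^2 = sum_j w_j * 1/2 * (T_j - o_j)^2 with per-event weights w_j."""
    return np.sum(weights * 0.5 * (targets - outputs) ** 2)

# Invented illustration: each data candidate enters twice, once as signal with
# weight = purity and once as background with weight = 1 - purity, where the
# purity would come from e.g. a fit to the mass spectrum.
outputs = np.array([0.9, 0.2, -0.7])   # network outputs for three candidates
purity  = np.array([0.8, 0.5, 0.1])    # statistical probability to be signal

loss = (weighted_quadratic_loss(+1.0, outputs, purity)
        + weighted_quadratic_loss(-1.0, outputs, 1.0 - purity))
print(loss)
```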
Examples outside physics
- Medicine and pharma research: e.g. effects and undesirable side effects of drugs, early tumour recognition
- Banks: e.g. credit scoring (Basel II), financial time series prediction, valuation of derivatives, risk-minimised trading strategies, client valuation
- Insurances: e.g. risk and cost prediction for individual clients, probability of contract cancellation, fraud recognition, fairness in tariffs
- Retail chain stores: turnover prognosis for individual articles/stores

Data Mining Cup
- World's largest student competition: Data-Mining-Cup.
- Very successful in competition with other data-mining methods:
  - 2005: fraud detection in internet trading
  - 2006: price prediction in eBay auctions
  - 2007: coupon redemption prediction
  - 2008: lottery customer behaviour prediction

NeuroBayes turnover prognosis

Individual health costs
- Pilot project for a large private health insurance.
- Prognosis of the costs in the following year for each insured person, with confidence intervals.
- 4 years of training, test on the following year.
- Results: a probability density for each customer/tariff combination; very good test results!
- Potential for an objective cost reduction in health management.

Prognosis of sport events
- Results: probabilities for home - tie - guest.

Conclusions
- Neural networks are a powerful tool for your analysis.
- They are not a genius, but they can do marvellous things for a genius.
- A lecture of this kind cannot really make you an expert.
- I hope that I was successful in:
  - giving you enough insight so that a neural network is no longer a black box
  - teaching you enough not to be afraid to try