Statistics in ROOT
René Brun, Anna Kreshuk, Lorenzo Moneta
PH/SFT group, CERN
http://root.cern.ch
ftp://root.cern.ch/root/phystat05.ppt
15th September 2005, PHYSTAT 05, Oxford

Contents
• User interface
• Data storage and access
• Analysis
• Visualization
• New Math libraries
• Future plans

ROOT’s user interface
• C++ in batch mode:
    root -b -q myMacro.C > myMacro.log
• C++ interpreted code with CINT, the C++ interpreter
  – on the command line:
      root[0] for (int i=0; i<10; i++) cout << "hello " << i << endl;
  – loading a macro:
      root[1] .L mySmallMacro.C
      root[2] myFunction(1, 2, 3);
• C++ compiled code via CINT:
    root[] .L myScript.C+
    Creating shared library /home/…/myScript_C.so
• Python
  – access to ROOT from Python:
      >>> from ROOT import TLorentzVector
      >>> l = TLorentzVector()
  – access to Python from ROOT:
      root [0] TPython::LoadMacro("MyPyClass.py");
      root [1] MyPyClass mpc;

ROOT and external libraries
• Using external libraries from ROOT:
  – rootcint – utility to link compiled C/C++ objects with the CINT C/C++ interpreter
• Example:
  – In the Makefile of MyLibrary, rootcint generates the dictionary for MyClass
  – Load and use MyLibrary in a ROOT session:
      root[] .L MyLibrary.so
      root[] MyClass *mc = new MyClass();

Data storage and access
• Allows the analysis of terabytes of data
• Can select entries from different physical locations and collect them into the analysis dataset
• Branches of a TTree are read independently, so the variables not needed for the analysis are not loaded into memory
[Figure: trees TTree1, TTree2, ..., TTreeN collected into the dataset to analyze; branches V1, V2, ..., V23, ..., V99]

Histograms
• 1-, 2- and 3-dimensional histograms
• Errors for each bin can be computed:
  – Default: as sqrt(bin content)
  – As sqrt(sum of squares of weights of the bin)
• 1- and 2-dimensional profile histograms
  – Mean value of Y and its standard deviation for each bin in X

Analysis of TTrees
• TTree::Draw method and TTreeViewer – an easy way to examine the tree:
  – Producing histograms of user-defined expressions in up to 4 dimensions
  – Expressions – C++ formulas
  – Selections – expressions, user-defined macros or graphical cuts
• Examples:
    Tree.Draw("sqrt(x):y", "x>0 && y<1");
    Tree.Draw("2*TMath::Log(x)", cut1 || cut2);

Fitting - interface
• Minimization packages: Minuit and Fumili
• Fitting can be done:
  – Directly in those packages, with a user-defined function to minimize
  – Through the general interface of
      TH1::Fit (binned data) – chi-square and log-likelihood methods
      TGraph::Fit (unbinned data)
      TGraphErrors::Fit (data with errors)
      TGraphAsymmErrors::Fit (taking into account asymmetry of errors)
      TTree::Fit and TTree::UnbinnedFit
• RooFit package for object-oriented data modeling, distributed with ROOT starting from version 5.02-00

Linear Fitting (1)
• New class TLinearFitter
• Used to fit functions that are linear in the parameters
• 10-15 times faster than Minuit, depending on the fitting function
• Simple to use in the multidimensional case
• Example:
    lfitter.SetFormula("1 ++ x0 ++ sqrt(x1) ++ exp(x2) ++ x3 ++ x4");
• Expressions with this syntax can be used in all the Fit interface functions

Linear Fitting (2)
• Robust least trimmed squares fitting
  – Based on the subset of h cases (out of n) whose least squares fit possesses the smallest sum of squared residuals
  – High breakdown point, where the breakdown point is the smallest proportion of outliers that can cause the estimator to produce values arbitrarily far from the true parameters
• Example:
    Graph.Fit("pol3", "rob=0.75", -2, 2);
  – 2nd parameter – fraction h of the good points

Smoothing and peak finding
• TSpectrum class (1- and 2-dim):
  – background estimation
  – smoothing
  – deconvolution
  – peak search and fitting
• Graph smoothers:
  – Kernel smoother
  – Lowess
  – "Super smoother"
• Splines – cubic and quintic
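
The least trimmed squares definition on the Linear Fitting (2) slide can be illustrated with a small standalone sketch. It assumes a straight-line model y = a + b*x and searches exhaustively over every subset of h points, which is only feasible for a handful of points. It is not the algorithm used by TLinearFitter, and the names LineFit, fitSubset and leastTrimmedSquares are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Straight-line fit y = a + b*x and its sum of squared residuals.
struct LineFit {
    double a;    // intercept
    double b;    // slope
    double ssr;  // sum of squared residuals over the points used in the fit
};

// Ordinary least squares on the points whose bit is set in `mask`.
LineFit fitSubset(const std::vector<double>& x, const std::vector<double>& y,
                  unsigned long mask) {
    const double inf = std::numeric_limits<double>::infinity();
    double n = 0, sx = 0, sy = 0, sxx = 0, sxy = 0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        if (!((mask >> i) & 1UL)) continue;
        n += 1; sx += x[i]; sy += y[i];
        sxx += x[i] * x[i]; sxy += x[i] * y[i];
    }
    const double det = n * sxx - sx * sx;
    // Degenerate subset (fewer than 2 points or all x equal): never selected.
    if (n < 2 || std::fabs(det) < std::numeric_limits<double>::min())
        return LineFit{0.0, 0.0, inf};
    LineFit f{0.0, 0.0, 0.0};
    f.b = (n * sxy - sx * sy) / det;
    f.a = (sy - f.b * sx) / n;
    for (std::size_t i = 0; i < x.size(); ++i) {
        if (!((mask >> i) & 1UL)) continue;
        const double r = y[i] - f.a - f.b * x[i];
        f.ssr += r * r;
    }
    return f;
}

// Number of set bits in `mask`.
std::size_t countBits(unsigned long mask) {
    std::size_t c = 0;
    for (; mask != 0; mask >>= 1) c += mask & 1UL;
    return c;
}

// Exhaustive LTS: among all subsets of exactly h points, keep the
// least squares fit with the smallest sum of squared residuals.
LineFit leastTrimmedSquares(const std::vector<double>& x,
                            const std::vector<double>& y, std::size_t h) {
    const std::size_t n = x.size();
    assert(y.size() == n);
    assert(n <= 20);           // brute force: 2^n subsets are enumerated
    assert(h >= 2 && h <= n);
    LineFit best{0.0, 0.0, std::numeric_limits<double>::infinity()};
    for (unsigned long mask = 0; mask < (1UL << n); ++mask) {
        if (countBits(mask) != h) continue;
        const LineFit f = fitSubset(x, y, mask);
        if (f.ssr < best.ssr) best = f;
    }
    return best;
}
```

With 8 points of which 2 are gross outliers, calling leastTrimmedSquares(x, y, 6) corresponds to a good-point fraction h/n = 0.75, the same fraction as in the "rob=0.75" example above.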

Multivariate methods (1)
• Minimum Covariance Determinant Estimator – a highly robust estimator of multivariate location and scatter
  – Class TRobustEstimator
  – High breakdown point
  – Algorithm similar to Least Trimmed Squares regression

Multivariate methods (2)
• TPrincipal – principal components analysis
• TMultiDimFit – approximates a multidimensional function with monomials, Chebyshev or Legendre polynomials
• TMultiLayerPerceptron – a neural network class
• All multivariate methods can take input data from a TTree

Confidence intervals
• TLimit – computes 95% C.L. limits using the likelihood ratio semi-Bayesian method
• TRolke – computes confidence intervals for the rate of a Poisson process in the presence of background and efficiency, with a fully frequentist treatment of uncertainties
• TFeldmanCousins – calculates the C.L. upper limit using the Feldman-Cousins method

Small useful algorithms
• In the namespace TMath:
  – Most probability distribution functions, their densities and inverses
  – Special functions
  – Mean and Median (also for weighted datasets), Variance and K-th order statistic
  – Kolmogorov-Smirnov test

Linear algebra and quadratic programming
• Linear algebra package:
  – General, symmetric and sparse matrices
  – Matrix decompositions
  – Eigenvalue analysis
• Quadratic programming library:
  – Dense and sparse data
  – Gondzio and Mehrotra solving methods

Graphs
• 1-d:
  – TGraph
  – TGraphErrors
  – TGraphAsymmErrors
  – TMultiGraph – a collection of graphs
• 2-d:
  – TGraph2D
  – TGraph2DErrors

ROOT Math Packages

MathCore
• Library with the basic Math functionality
  – buildable as a standalone library
  – no dependency on other ROOT packages
  – no external dependency
• Main content of MathCore:
  – Basic and commonly used mathematical functions
  – Special and statistics (pdf, cdf) functions
  – Interfaces to function and algorithm classes
  – Basic implementation of some numerical algorithms
  – 3D and LorentzVectors
  – Random numbers

MathMore
• Library with extra mathematical functionality
• Current content:
  – C++ interface to functions and algorithms from the GNU Scientific Library (GSL)
  – Mathematical functions implemented using GSL
  – Algorithms currently present: adaptive numerical integration, numerical differentiation, root finders, interpolation, 1D minimization
• Repository for needed and useful extra Math functionality
  – could include other useful math libraries

Summary and Future plans
• First versions of the MathCore and MathMore libraries are being released
• Next addition will be a new random number package
• Improvement of the fitting interface
• Transition phase, over in 2-3 months
• Statistical algorithms to add:
  – sPlot
  – Loess – locally weighted polynomial regression
  – Cluster analysis
  – Boxplot and spiderplot
• Interface with R?

Mathematical Functions
• Special functions
  – use the proposed C++ standard interface:
      double cyl_bessel_i (double nu, double x);
• Statistical functions
  – Probability density functions (pdf)
  – Cumulative distributions (lower tail and upper tail)
  – Inverse of cumulative distributions
  – Coherent naming scheme (also proposed to the C++ standard):
      chisquared_pdf, chisquared_prob, chisquared_quant, chisquared_prob_inv, chisquared_quant_inv

Mathematical Functions (cont)
• New functions with better precision than the old ones in ROOT
• Extensive tests of numerical accuracy
• Comparison with other libraries (NAG, Mathematica)

Numerical Algorithms
• New C++ classes and interfaces for describing algorithms and functions
• Integrator classes
  – Implementation based on GSL (QGS) for definite and indefinite integration
• Functionality currently in ROOT TF1 moves into new classes in MathCore
  – Easier to use for all clients

Physics and Geometry Vectors
• Classes for 3D Vectors and LorentzVectors with their operations and transformations
  – New classes with cleaner interfaces, generic in the scalar type and in the underlying coordinate system (cartesian, polar, cylindrical, etc.)
• Classes for 3D rotations and Lorentz transformations
  – Merge of the old ROOT and CLHEP classes
  – Also include rotations based on quaternions
• Work done in collaboration with the Fermilab group

Minimization
• New C++ version of Minuit being introduced in ROOT
  – Same algorithms translated to C++, plus some added functionality: Fumili minimizer, single-sided bounds
• Undergoing extensive validation tests
[Figure: two panels labeled "before" and "after"]
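
The function family listed on the Mathematical Functions slide (pdf, lower-tail and upper-tail cumulative distributions, and their inverses) can be sketched for a distribution with closed forms. The example below uses the exponential distribution with rate lambda as an assumption for illustration. The function names are hypothetical and do not follow the MathCore naming scheme or API.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical names, for illustration only; this is not the MathCore API.
// Exponential distribution with rate lambda > 0.

// Probability density function.
double expo_pdf(double x, double lambda) {
    assert(lambda > 0);
    return x < 0 ? 0.0 : lambda * std::exp(-lambda * x);
}

// Lower-tail cumulative distribution: P(X <= x) = 1 - exp(-lambda*x).
// expm1 keeps precision when lambda*x is small.
double expo_cdf_lower(double x, double lambda) {
    assert(lambda > 0);
    return x < 0 ? 0.0 : -std::expm1(-lambda * x);
}

// Upper-tail cumulative distribution: P(X > x) = exp(-lambda*x).
double expo_cdf_upper(double x, double lambda) {
    assert(lambda > 0);
    return x < 0 ? 1.0 : std::exp(-lambda * x);
}

// Inverse of the lower-tail distribution: x such that P(X <= x) = p.
double expo_cdf_lower_inv(double p, double lambda) {
    assert(lambda > 0 && p >= 0 && p < 1);
    return -std::log1p(-p) / lambda;
}

// Inverse of the upper-tail distribution: x such that P(X > x) = p.
double expo_cdf_upper_inv(double p, double lambda) {
    assert(lambda > 0 && p > 0 && p <= 1);
    return -std::log(p) / lambda;
}
```

Providing the upper tail as its own function, rather than as 1 minus the lower tail, avoids loss of precision from cancellation when the tail probability is very small.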