Statistics in ROOT

René Brun, Anna Kreshuk, Lorenzo Moneta
PH/SFT group, CERN
http://root.cern.ch
ftp://root.cern.ch/root/phystat05.ppt
15th September 2005
PHYSTAT 05, Oxford
Contents

• User interface
• Data storage and access
• Analysis
• Visualization
• New Math libraries
• Future plans
ROOT’s user interface

• C++ in batch mode:
    root -b -q myMacro.C > myMacro.log
• C++ interpreted code with CINT, the C++ interpreter
  – on the command line:
    root[0] for (int i=0; i<10; i++) cout << "hello " << i << endl;
  – loading a macro:
    root[1] .L mySmallMacro.C
    root[2] myFunction(1, 2, 3);
• C++ compiled code via CINT and ACLiC (note the "+" after the file name):
    root[] .L myScript.C+
    Creating shared library /home/…/myScript_C.so

• Python:
  – Access to ROOT from Python
    >>> from ROOT import TLorentzVector
    >>> l = TLorentzVector()
  – Access to Python from ROOT
    root [0] TPython::LoadMacro("MyPyClass.py");
    root [1] MyPyClass mpc;
ROOT and external libraries

Using external libraries from ROOT:
• rootcint – utility to link compiled C/C++ objects with the CINT C/C++ interpreter
• Example:
  – In the Makefile of MyLibrary, rootcint generates the dictionary for MyClass
  – Load and use MyLibrary in a ROOT session:
    root[] .L MyLibrary.so
    root[] MyClass *mc = new MyClass();
Data storage and access
• Allows the analysis of terabytes of data
• Entries can be selected from trees in different physical locations (TTree1, TTree2, …, TTreeN) and collected into the dataset to analyze
• Branches of a TTree are read independently, so the variables not needed for the analysis are not loaded into memory
[Figure: a TTree with branches V1, V2, …, V23, …, V99]
Histograms

• 1-, 2- and 3-dimensional histograms
• Errors for each bin can be computed:
  – by default, as sqrt(bin content)
  – as sqrt(sum of squares of the weights in the bin), for weighted filling (enabled with TH1::Sumw2())
• 1- and 2-dimensional profile histograms
  – mean value of Y and its standard deviation for each bin in X
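
To make the two error rules concrete, here is a standalone C++ sketch. It is not ROOT's TH1/TProfile code, and the class names WeightedHist1D and ProfileBin are hypothetical. The sum-of-squared-weights rule needs a second accumulator per bin, which is what TH1::Sumw2() switches on in ROOT. The profile bin reports the mean and standard deviation of Y, as described above.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Fixed-binning 1D histogram keeping, per bin, the sum of weights (the bin
// content) and the sum of squared weights.
class WeightedHist1D {
 public:
  WeightedHist1D(std::size_t nbins, double xmin, double xmax)
      : xmin_(xmin), xmax_(xmax), sumw_(nbins, 0.0), sumw2_(nbins, 0.0) {
    assert(nbins > 0 && xmax > xmin);
  }

  // Adds weight w at position x; entries outside [xmin, xmax) are dropped
  // (a full histogram class would keep underflow/overflow bins).
  void Fill(double x, double w = 1.0) {
    if (x < xmin_ || x >= xmax_) return;
    std::size_t bin = static_cast<std::size_t>((x - xmin_) / (xmax_ - xmin_) * sumw_.size());
    if (bin >= sumw_.size()) bin = sumw_.size() - 1;  // guard against rounding near xmax
    sumw_[bin] += w;
    sumw2_[bin] += w * w;
  }

  double Content(std::size_t bin) const { return sumw_.at(bin); }
  // Default error: sqrt(bin content); assumes a non-negative content.
  double ErrorDefault(std::size_t bin) const { return std::sqrt(sumw_.at(bin)); }
  // Weighted error: sqrt(sum of squared weights in the bin).
  double ErrorSumw2(std::size_t bin) const { return std::sqrt(sumw2_.at(bin)); }

 private:
  double xmin_, xmax_;
  std::vector<double> sumw_, sumw2_;
};

// One X bin of a profile: accumulates the Y values that fall in it.
struct ProfileBin {
  double n = 0.0, sumy = 0.0, sumy2 = 0.0;
  void Add(double y) { n += 1.0; sumy += y; sumy2 += y * y; }
  double Mean() const { assert(n > 0.0); return sumy / n; }
  // Standard deviation of Y in the bin (population form, divides by n).
  double StdDev() const {
    const double m = Mean();
    const double var = sumy2 / n - m * m;
    return var > 0.0 ? std::sqrt(var) : 0.0;
  }
};
```

With two entries of weight 2 in one bin, the content is 4. The default rule then gives 2, while the sum-of-squares rule gives sqrt(8) ≈ 2.83. For unit weights both rules agree.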
Analysis of TTrees

TTree::Draw method and TTreeViewer – an easy way to examine the tree:
• Producing histograms of user-defined expressions in up to 4 dimensions
• Expressions – C++ formulas
• Selections – expressions, user-defined macros or graphical cuts
• Examples:
    Tree.Draw("sqrt(x):y", "x>0 && y<1");
    Tree.Draw("2*TMath::Log(x)", cut1 || cut2);   (cut1, cut2: TCut selections)
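
Conceptually, the first Draw call above loops over the entries, applies the selection, and evaluates the expression for each entry that passes. Below is a plain C++ sketch of that loop. It assumes std::vector columns as stand-ins for the branches x and y. The names Columns and drawSqrtXvsY are hypothetical. TTree::Draw itself parses the strings at run time and fills a histogram rather than returning points.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Stand-ins for two tree branches: one value per entry.
struct Columns {
  std::vector<double> x, y;
};

// Explicit-loop version of Draw("sqrt(x):y", "x>0 && y<1"). In "e1:e2" the
// second expression goes on the horizontal axis, so each point is
// (horizontal = y, vertical = sqrt(x)).
std::vector<std::pair<double, double>> drawSqrtXvsY(const Columns& t) {
  assert(t.x.size() == t.y.size());
  std::vector<std::pair<double, double>> points;
  for (std::size_t i = 0; i < t.x.size(); ++i) {
    if (t.x[i] > 0 && t.y[i] < 1) {                    // selection "x>0 && y<1"
      points.emplace_back(t.y[i], std::sqrt(t.x[i]));  // expression "sqrt(x):y"
    }
  }
  return points;
}
```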
Fitting – interface

• Minimization packages: Minuit and Fumili
• Fitting can be done:
  – directly in those packages, with a user-defined function to minimize
  – through the general interface of
      TH1::Fit (binned data) – chi-square and log-likelihood methods
      TGraph::Fit (unbinned data)
      TGraphErrors::Fit (data with errors)
      TGraphAsymmErrors::Fit (taking into account the asymmetry of errors)
      TTree::Fit and TTree::UnbinnedFit
  – with the RooFit package for object-oriented data modeling, distributed with ROOT starting from version 5.02-00
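
At low counts the two TH1::Fit methods can give noticeably different answers. Below is a minimal sketch that fits a constant μ to a set of bin counts, a case where both estimators have closed forms. It is not ROOT code, and the function names are hypothetical. It assumes the chi-square uses the default errors sqrt(n_i) and therefore skips empty bins.

```cpp
#include <cassert>
#include <vector>

// Chi-square fit of a constant with bin errors sqrt(n_i); empty bins have zero
// error and are skipped. Minimizing sum (n_i - mu)^2 / n_i over the K
// non-empty bins gives mu = K / sum(1 / n_i), the harmonic mean of the counts.
double constantChi2Fit(const std::vector<int>& counts) {
  double sumInv = 0.0;
  int used = 0;
  for (int n : counts) {
    if (n > 0) { sumInv += 1.0 / n; ++used; }
  }
  assert(used > 0);
  return used / sumInv;
}

// Poisson log-likelihood fit of a constant: maximizing sum (n_i ln mu - mu)
// gives mu = arithmetic mean of all bins, empty ones included.
double constantPoissonLikelihoodFit(const std::vector<int>& counts) {
  assert(!counts.empty());
  double sum = 0.0;
  for (int n : counts) sum += n;
  return sum / counts.size();
}
```

For counts {1, 4}, the chi-square estimate is 1.6, pulled towards the bin that fluctuated down. The likelihood gives the sample mean, 2.5. This is why the likelihood method is preferred for sparsely filled histograms.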
Linear Fitting (1)

• New class TLinearFitter
  – used to fit functions that are linear in the parameters
  – 10-15 times faster than Minuit, depending on the fitting function
  – simple to use in the multidimensional case
• Example:
    lfitter.SetFormula("1 ++ x0 ++ sqrt(x1) ++ exp(x2) ++ x3 ++ x4");
• Expressions with this "++" syntax can be used in all the Fit interface functions
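
Fitting a model that is linear in its parameters needs no iterative minimization. The least-squares solution satisfies the normal equations (AᵀA)a = Aᵀy, with A_ik = f_k(x_i), which is why a linear fitter can be much faster than a general minimizer like Minuit. Below is a standalone one-dimensional sketch with hypothetical names. The basis list plays the role of the terms separated by "++". This is not necessarily how TLinearFitter solves the system.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

using Basis = std::vector<std::function<double(double)>>;

// Least-squares fit of y ≈ sum_k a_k * f_k(x). Because the model is linear in
// the a_k, the minimum solves (A^T A) a = A^T y with A_ik = f_k(x_i):
// one linear solve, no iterations.
std::vector<double> linearLeastSquares(const Basis& f, const std::vector<double>& x,
                                       const std::vector<double>& y) {
  const std::size_t p = f.size(), n = x.size();
  assert(p > 0 && n == y.size() && n >= p);
  // Augmented normal-equation matrix [A^T A | A^T y].
  std::vector<std::vector<double>> m(p, std::vector<double>(p + 1, 0.0));
  for (std::size_t i = 0; i < n; ++i) {
    std::vector<double> row(p);
    for (std::size_t k = 0; k < p; ++k) row[k] = f[k](x[i]);
    for (std::size_t r = 0; r < p; ++r) {
      for (std::size_t c = 0; c < p; ++c) m[r][c] += row[r] * row[c];
      m[r][p] += row[r] * y[i];
    }
  }
  // Gaussian elimination with partial pivoting.
  for (std::size_t col = 0; col < p; ++col) {
    std::size_t piv = col;
    for (std::size_t r = col + 1; r < p; ++r)
      if (std::fabs(m[r][col]) > std::fabs(m[piv][col])) piv = r;
    std::swap(m[col], m[piv]);
    assert(m[col][col] != 0.0);  // singular: basis functions not independent on these x
    for (std::size_t r = col + 1; r < p; ++r) {
      const double factor = m[r][col] / m[col][col];
      for (std::size_t c = col; c <= p; ++c) m[r][c] -= factor * m[col][c];
    }
  }
  // Back substitution.
  std::vector<double> a(p);
  for (std::size_t r = p; r-- > 0;) {
    double s = m[r][p];
    for (std::size_t c = r + 1; c < p; ++c) s -= m[r][c] * a[c];
    a[r] = s / m[r][r];
  }
  return a;
}
```

For badly conditioned bases, a QR-based solver is numerically safer than Gaussian elimination on AᵀA.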
Linear Fitting (2)
Robust least trimmed squares (LTS) fitting
• Based on the subset of h cases (out of n) whose least squares fit has the smallest sum of squared residuals
• High breakdown point; the breakdown point is the smallest proportion of outliers that can cause the estimator to produce values arbitrarily far from the true parameters
• Example:
    Graph.Fit("pol3", "rob=0.75", "", -2, 2);
  the 2nd parameter, "rob=0.75", gives the fraction h/n of good points
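
Below is a compact sketch of LTS for a straight line. It is not TLinearFitter's implementation, and all names are hypothetical. Every pair of points seeds a candidate line, which is then refined by concentration steps, the key move of Rousseeuw and Van Driessen's FAST-LTS. Each step refits on the h points with the smallest residuals, reselects, and repeats while the trimmed sum decreases. The exhaustive seeding over pairs is a simplification that only suits small n. The sketch takes h directly rather than a fraction such as rob=0.75.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <numeric>
#include <vector>

struct Line { double a, b; };  // y = a + b * x

// Ordinary least squares on the points whose indices are in idx.
// Assumes those points do not all share the same x.
Line olsFit(const std::vector<double>& x, const std::vector<double>& y,
            const std::vector<std::size_t>& idx) {
  double sx = 0, sy = 0, sxx = 0, sxy = 0;
  const double n = static_cast<double>(idx.size());
  for (std::size_t i : idx) {
    sx += x[i]; sy += y[i]; sxx += x[i] * x[i]; sxy += x[i] * y[i];
  }
  const double b = (n * sxy - sx * sy) / (n * sxx - sx * sx);
  return {(sy - b * sx) / n, b};
}

// Puts in idx the h points with the smallest squared residuals w.r.t. line l
// and returns their sum of squared residuals (the trimmed objective).
double trimmedSubset(const std::vector<double>& x, const std::vector<double>& y,
                     const Line& l, std::size_t h, std::vector<std::size_t>& idx) {
  auto r2 = [&](std::size_t i) { const double r = y[i] - l.a - l.b * x[i]; return r * r; };
  std::vector<std::size_t> order(x.size());
  std::iota(order.begin(), order.end(), std::size_t{0});
  std::partial_sort(order.begin(), order.begin() + h, order.end(),
                    [&](std::size_t i, std::size_t j) { return r2(i) < r2(j); });
  idx.assign(order.begin(), order.begin() + h);
  double sum = 0;
  for (std::size_t i : idx) sum += r2(i);
  return sum;
}

// LTS straight-line fit keeping h of the n points.
Line ltsFit(const std::vector<double>& x, const std::vector<double>& y, std::size_t h) {
  assert(x.size() == y.size() && h >= 2 && h <= x.size());
  Line best{0.0, 0.0};
  double bestObj = std::numeric_limits<double>::infinity();
  for (std::size_t i = 0; i < x.size(); ++i) {
    for (std::size_t j = i + 1; j < x.size(); ++j) {
      if (x[i] == x[j]) continue;
      Line l;  // seed: the line through points i and j
      l.b = (y[j] - y[i]) / (x[j] - x[i]);
      l.a = y[i] - l.b * x[i];
      std::vector<std::size_t> idx, next;
      double obj = trimmedSubset(x, y, l, h, idx);
      for (int step = 0; step < 50; ++step) {  // concentration steps
        const Line refit = olsFit(x, y, idx);
        const double nextObj = trimmedSubset(x, y, refit, h, next);
        if (nextObj >= obj) break;
        l = refit; obj = nextObj; idx.swap(next);
      }
      if (obj < bestObj) { bestObj = obj; best = l; }
    }
  }
  return best;
}
```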
Smoothing and peak finding

TSpectrum class:
• 1- and 2-dim background estimation
• smoothing
• deconvolution
• peak search and fitting
Graph smoothers:
• Kernel smoother
• Lowess
• "Super smoother"
• Splines – cubic and quintic
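
As an illustration of the first graph smoother in the list, here is a sketch of a Nadaraya-Watson kernel smoother in standalone C++. The Gaussian kernel and the function name are choices made here, not taken from ROOT's smoother classes.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Kernel smoother (Nadaraya-Watson form) with a Gaussian kernel of width
// `bandwidth`: the smoothed value at x0 is a weighted average of all y_i,
// with weights exp(-0.5 * ((x_i - x0) / bandwidth)^2) that fall off with
// distance from x0. Assumes not every weight underflows to zero.
double kernelSmooth(const std::vector<double>& x, const std::vector<double>& y,
                    double x0, double bandwidth) {
  assert(x.size() == y.size() && !x.empty() && bandwidth > 0.0);
  double sumW = 0.0, sumWY = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i) {
    const double u = (x[i] - x0) / bandwidth;
    const double w = std::exp(-0.5 * u * u);
    sumW += w;
    sumWY += w * y[i];
  }
  return sumWY / sumW;
}
```

A larger bandwidth averages over more neighbours and gives a smoother but more biased curve.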
Multivariate methods (1)

• Minimum Covariance Determinant Estimator – a highly robust estimator of multivariate location and scatter
• Class TRobustEstimator
  – high breakdown point
  – algorithm similar to Least Trimmed Squares regression
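
In one dimension the covariance determinant reduces to a variance. The h-point subset with the smallest variance is always a run of consecutive sorted values, so the MCD idea fits in a few lines. Below is a standalone sketch with hypothetical names. It applies no consistency correction or reweighting, and it is not TRobustEstimator's multivariate algorithm.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct RobustLocScale { double location, variance; };

// One-dimensional MCD: among all subsets of h points, find the one with the
// smallest variance and return its mean and variance. The optimal subset is a
// window of h consecutive sorted values, so scanning the n - h + 1 windows is exact.
RobustLocScale mcd1D(std::vector<double> v, std::size_t h) {
  assert(h >= 1 && h <= v.size());
  std::sort(v.begin(), v.end());
  RobustLocScale best{0.0, -1.0};
  for (std::size_t start = 0; start + h <= v.size(); ++start) {
    double mean = 0.0;
    for (std::size_t i = start; i < start + h; ++i) mean += v[i];
    mean /= h;
    double var = 0.0;
    for (std::size_t i = start; i < start + h; ++i) var += (v[i] - mean) * (v[i] - mean);
    var /= h;
    if (best.variance < 0.0 || var < best.variance) best = {mean, var};
  }
  return best;
}
```

With data {1, 2, 3, 4, 100, 200} and h = 4, the two outliers are ignored. The location is 2.5, whereas the plain mean would be about 51.7.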
Multivariate methods (2)
• TPrincipal – principal components analysis
• TMultiDimFit – approximates a multidimensional function with monomials, Chebyshev or Legendre polynomials
• TMultiLayerPerceptron – a neural network class
• All multivariate methods can take input data from a TTree
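
Principal components analysis diagonalizes the covariance matrix of the input variables. For two variables the eigen-decomposition has a closed form, which makes the method easy to see. Below is a standalone sketch using the sample covariance (divisor n - 1). The names are hypothetical and unrelated to TPrincipal's interface.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Pca2D {
  double lambda1, lambda2;  // variances along the principal axes, lambda1 >= lambda2
  double angle;             // direction of the first axis, radians from the x axis
};

// Principal components of 2D data: eigen-decomposition of the sample
// covariance matrix [[sxx, sxy], [sxy, syy]], done in closed form.
Pca2D principalComponents(const std::vector<double>& x, const std::vector<double>& y) {
  const std::size_t n = x.size();
  assert(n == y.size() && n >= 2);
  double mx = 0, my = 0;
  for (std::size_t i = 0; i < n; ++i) { mx += x[i]; my += y[i]; }
  mx /= n;
  my /= n;
  double sxx = 0, syy = 0, sxy = 0;
  for (std::size_t i = 0; i < n; ++i) {
    sxx += (x[i] - mx) * (x[i] - mx);
    syy += (y[i] - my) * (y[i] - my);
    sxy += (x[i] - mx) * (y[i] - my);
  }
  sxx /= n - 1;
  syy /= n - 1;
  sxy /= n - 1;
  // Eigenvalues of a symmetric 2x2 matrix and the angle of the first eigenvector.
  const double half = 0.5 * (sxx + syy);
  const double r = std::sqrt(0.25 * (sxx - syy) * (sxx - syy) + sxy * sxy);
  return {half + r, half - r, 0.5 * std::atan2(2.0 * sxy, sxx - syy)};
}
```

For points lying on the line y = x, all the variance falls on the first component (lambda2 = 0), along the direction at 45 degrees.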

Confidence intervals

• TLimit – computes 95% C.L. limits using the likelihood ratio semi-Bayesian method
• TRolke – computes confidence intervals for the rate of a Poisson process in the presence of background and efficiency, with a fully frequentist treatment of the uncertainties
• TFeldmanCousins – calculates the C.L. upper limit using the Feldman-Cousins method
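
To show what the Feldman-Cousins construction does, here is a sketch for a Poisson signal μ on top of a known background b. It is not TFeldmanCousins's code, and the names are hypothetical. For each μ on a grid, counts n are ranked by the likelihood ratio R(n) = P(n | μ + b) / P(n | μ_best + b), with μ_best = max(0, n - b). They are added to the acceptance region until it holds the requested probability. The interval for an observed n0 collects the μ values whose region contains n0. The grid step, the μ range and the cut-off nmax are arbitrary choices of this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <utility>
#include <vector>

// Poisson probability P(n | lambda); lambda = 0 is handled explicitly.
double poissonProb(int n, double lambda) {
  if (lambda <= 0.0) return n == 0 ? 1.0 : 0.0;
  return std::exp(n * std::log(lambda) - lambda - std::lgamma(n + 1.0));
}

// Does the acceptance region for signal mu (background b) contain n0?
// Counts are added in decreasing order of R(n) until the summed probability
// reaches cl.
bool inAcceptance(int n0, double mu, double b, double cl, int nmax) {
  std::vector<std::pair<double, int>> ranked;  // (R(n), n)
  for (int n = 0; n <= nmax; ++n) {
    const double muBest = std::max(0.0, n - b);
    ranked.emplace_back(poissonProb(n, mu + b) / poissonProb(n, muBest + b), n);
  }
  std::sort(ranked.begin(), ranked.end(),
            [](const std::pair<double, int>& l, const std::pair<double, int>& r) {
              return l.first > r.first;
            });
  double sum = 0.0;
  for (const auto& entry : ranked) {
    sum += poissonProb(entry.second, mu + b);
    if (entry.second == n0) return true;
    if (sum >= cl) break;
  }
  return false;
}

struct Interval { double lower, upper; };

// Scans mu on a grid; returns the smallest and largest mu whose acceptance
// region contains n0 (treating the accepted set of mu as an interval).
Interval feldmanCousins(int n0, double b, double cl = 0.90,
                        double muMax = 20.0, double step = 0.005) {
  assert(n0 >= 0 && b >= 0.0 && cl > 0.0 && cl < 1.0 && step > 0.0);
  const int nmax = n0 + static_cast<int>(muMax + b + 10.0 * std::sqrt(muMax + b) + 10.0);
  Interval result{-1.0, -1.0};
  for (int i = 0; i * step <= muMax; ++i) {
    const double mu = i * step;
    if (inAcceptance(n0, mu, b, cl, nmax)) {
      if (result.lower < 0.0) result.lower = mu;
      result.upper = mu;
    }
  }
  return result;
}
```

For n0 = 0, b = 0 and 90% C.L., this should give an upper limit of about 2.44, up to the grid step. That is the value in Feldman and Cousins's published table.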
Small useful algorithms

In the namespace TMath:
• Most probability distribution functions, their densities and inverses
• Special functions
• Mean and Median (also for weighted datasets), Variance and K-th order statistic
• Kolmogorov-Smirnov test
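
The statistic behind the Kolmogorov-Smirnov test is the largest distance between two empirical cumulative distributions. Below is a standalone sketch of the two-sample distance D; the function name is hypothetical. Turning D into a probability is a further step, which TMath::KolmogorovTest performs in ROOT.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Two-sample Kolmogorov-Smirnov distance D = max_x |F_a(x) - F_b(x)|,
// where F_a and F_b are the empirical cumulative distributions.
double ksDistance(std::vector<double> a, std::vector<double> b) {
  assert(!a.empty() && !b.empty());
  std::sort(a.begin(), a.end());
  std::sort(b.begin(), b.end());
  const double na = static_cast<double>(a.size()), nb = static_cast<double>(b.size());
  std::size_t i = 0, j = 0;
  double d = 0.0;
  while (i < a.size() && j < b.size()) {
    const double x = std::min(a[i], b[j]);
    // Step both ECDFs past every value equal to x, so ties are handled.
    while (i < a.size() && a[i] <= x) ++i;
    while (j < b.size() && b[j] <= x) ++j;
    d = std::max(d, std::fabs(i / na - j / nb));
  }
  return d;  // once one sample is exhausted the difference can only shrink
}
```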
Linear algebra and quadratic programming

Linear algebra package:
• General, symmetric and sparse matrices
• Matrix decompositions
• Eigenvalue analysis
Quadratic programming library:
• Dense and sparse data
• Gondzio and Mehrotra solving methods
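
As an example of the matrix decompositions mentioned above, here is a standalone sketch of the Cholesky factorization A = LLᵀ of a symmetric positive-definite matrix, such as a covariance matrix. ROOT's linear algebra package has its own decomposition classes (e.g. TDecompChol); this code is unrelated to them.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Cholesky decomposition A = L * L^T of a symmetric positive-definite matrix.
// Returns the lower-triangular factor L (entries above the diagonal are 0).
Matrix cholesky(const Matrix& a) {
  const std::size_t n = a.size();
  for (const auto& row : a) assert(row.size() == n);  // square input
  Matrix l(n, std::vector<double>(n, 0.0));
  for (std::size_t j = 0; j < n; ++j) {
    double diag = a[j][j];
    for (std::size_t k = 0; k < j; ++k) diag -= l[j][k] * l[j][k];
    assert(diag > 0.0);  // fails if A is not positive definite
    l[j][j] = std::sqrt(diag);
    for (std::size_t i = j + 1; i < n; ++i) {
      double s = a[i][j];
      for (std::size_t k = 0; k < j; ++k) s -= l[i][k] * l[j][k];
      l[i][j] = s / l[j][j];
    }
  }
  return l;
}
```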
Graphs

1-d:
• TGraph
• TGraphErrors
• TGraphAsymmErrors
• TMultiGraph – a collection of graphs
2-d:
• TGraph2D
• TGraph2DErrors
ROOT Math Packages
MathCore

• Library with the basic Math functionality, buildable as a standalone library
  – no dependency on other ROOT packages
  – no external dependency
• Main content of MathCore:
  – basic and commonly used mathematical functions
  – special and statistical (pdf, cdf) functions
  – interfaces to function and algorithm classes
  – basic implementation of some numerical algorithms
  – 3D vectors and LorentzVectors
  – random numbers
MathMore

• Library with extra mathematical functionality
  – repository for needed and useful extra Math functionality
  – could include other useful math libraries
• Current content:
  – C++ interface to functions and algorithms from the GNU Scientific Library (GSL)
  – mathematical functions implemented using GSL
  – algorithms currently present: adaptive numerical integration, differentiation, root finders, interpolation, 1D minimization
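
Root finding is one of the GSL algorithm families listed above. Below is a standalone sketch of the simplest bracketing method, bisection. The name is hypothetical; this is not the MathMore interface and not GSL's code.

```cpp
#include <cassert>
#include <cmath>
#include <functional>

// Bisection: starting from a bracket [lo, hi] on which f changes sign, halve
// the bracket repeatedly, keeping the half that still contains the sign change.
double bisect(const std::function<double(double)>& f, double lo, double hi,
              double tol = 1e-12, int maxIter = 200) {
  double flo = f(lo);
  const double fhi = f(hi);
  assert(lo < hi && flo * fhi <= 0.0);
  if (flo == 0.0) return lo;
  if (fhi == 0.0) return hi;
  for (int it = 0; it < maxIter && hi - lo > tol; ++it) {
    const double mid = 0.5 * (lo + hi);
    const double fmid = f(mid);
    if (fmid == 0.0) return mid;
    if (std::signbit(fmid) == std::signbit(flo)) {  // root lies in [mid, hi]
      lo = mid;
      flo = fmid;
    } else {                                        // root lies in [lo, mid]
      hi = mid;
    }
  }
  return 0.5 * (lo + hi);
}
```

Bisection only needs a sign change and always converges, but it gains just one bit per step. Faster bracketing methods such as Brent's trade this simplicity for speed.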
Summary and Future plans

• First versions of MathCore and MathMore libraries are being released
  – transition phase, over in 2-3 months
• Next addition will be a new random number package
• Improvement of the fitting interface
• Statistical algorithms to add:
  – sPlot
  – Loess – locally weighted polynomial regression
  – Cluster analysis
  – Boxplot and spiderplot
  – Interface with R?
Mathematical Functions

• Special functions
  – use the proposed C++ standard interface, e.g.
      double cyl_bessel_i(double nu, double x);
• Statistical functions
  – probability density functions (pdf)
  – cumulative distributions (lower tail and upper tail)
  – inverses of the cumulative distributions
  – coherent naming scheme (also proposed to the C++ standard):
      chisquared_pdf, chisquared_prob, chisquared_quant, chisquared_prob_inv, chisquared_quant_inv
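
To illustrate what the pdf, cumulative and inverse functions in such a family compute, here is a standalone sketch for the chi-square distribution. The function names are hypothetical; they are not the proposed names listed above. The closed-form tail is restricted to an even number of degrees of freedom to keep the code short, since general ndf needs the incomplete gamma function. The inverse uses bisection.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Chi-square density for ndf degrees of freedom:
// x^(ndf/2 - 1) e^(-x/2) / (2^(ndf/2) Gamma(ndf/2)), evaluated in log space.
double chisquaredPdf(double x, int ndf) {
  assert(ndf > 0);
  if (x < 0.0) return 0.0;
  if (x == 0.0) {  // density at 0: infinite for ndf = 1, 1/2 for ndf = 2, else 0
    if (ndf == 1) return std::numeric_limits<double>::infinity();
    return ndf == 2 ? 0.5 : 0.0;
  }
  const double k = 0.5 * ndf;
  return std::exp((k - 1.0) * std::log(x) - 0.5 * x - k * std::log(2.0) - std::lgamma(k));
}

// Upper-tail probability P(X > x), closed form valid for even ndf only:
// e^(-x/2) * sum_{i=0}^{ndf/2 - 1} (x/2)^i / i!.
double chisquaredUpperTailEven(double x, int ndf) {
  assert(ndf > 0 && ndf % 2 == 0);
  if (x <= 0.0) return 1.0;
  double term = 1.0, sum = 1.0;
  for (int i = 1; i < ndf / 2; ++i) {
    term *= 0.5 * x / i;
    sum += term;
  }
  return std::exp(-0.5 * x) * sum;
}

// Inverse of the upper tail: the x with P(X > x) = p. Bisection works because
// the upper tail decreases monotonically in x.
double chisquaredUpperTailInverseEven(double p, int ndf) {
  assert(p > 0.0 && p < 1.0);
  double lo = 0.0, hi = 1.0;
  while (chisquaredUpperTailEven(hi, ndf) > p) hi *= 2.0;  // grow the bracket
  for (int it = 0; it < 200; ++it) {
    const double mid = 0.5 * (lo + hi);
    if (chisquaredUpperTailEven(mid, ndf) > p) lo = mid; else hi = mid;
  }
  return 0.5 * (lo + hi);
}
```

For ndf = 2 the tail is e^(-x/2), so the 5% critical value is -2 ln 0.05 ≈ 5.99.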
Mathematical Functions (cont)

• New functions with better precision than the old ones in ROOT
  – extensive tests of numerical accuracy
  – comparison with other libraries (NAG, Mathematica)
Numerical Algorithms

• New C++ classes and interfaces for describing algorithms and functions
• Integrator classes
  – implementation based on GSL (QGS) for definite and indefinite integration
• Move of the functionality currently in ROOT's TF1 into the new classes in MathCore
  – easier to use for all clients
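
MathMore's GSL-based integrators are adaptive: they subdivide the range only where the local error estimate is too large. GSL's routines use Gauss-Kronrod rules. To show the adaptive principle in a few lines, the sketch below swaps in the simpler adaptive Simpson rule. The names are hypothetical, and this is not the GSL algorithm.

```cpp
#include <cassert>
#include <cmath>
#include <functional>

namespace {
// Simpson's rule on [a, b] from the values at a, the midpoint and b.
double simpson(double fa, double fm, double fb, double a, double b) {
  return (b - a) / 6.0 * (fa + 4.0 * fm + fb);
}

// Compare Simpson on [a, b] with the sum over its two halves; accept when they
// agree to within tol, otherwise split further with half the tolerance each.
double adaptiveStep(const std::function<double(double)>& f, double a, double b,
                    double fa, double fm, double fb, double whole, double tol, int depth) {
  const double m = 0.5 * (a + b);
  const double flm = f(0.5 * (a + m)), frm = f(0.5 * (m + b));
  const double left = simpson(fa, flm, fm, a, m);
  const double right = simpson(fm, frm, fb, m, b);
  const double delta = left + right - whole;
  if (depth <= 0 || std::fabs(delta) <= 15.0 * tol)
    return left + right + delta / 15.0;  // Richardson extrapolation correction
  return adaptiveStep(f, a, m, fa, flm, fm, left, 0.5 * tol, depth - 1) +
         adaptiveStep(f, m, b, fm, frm, fb, right, 0.5 * tol, depth - 1);
}
}  // namespace

// Adaptive Simpson integration of f over [a, b] to an absolute tolerance tol.
double integrateAdaptive(const std::function<double(double)>& f, double a, double b,
                         double tol = 1e-10, int maxDepth = 50) {
  assert(b > a && tol > 0.0);
  const double fa = f(a), fb = f(b), fm = f(0.5 * (a + b));
  return adaptiveStep(f, a, b, fa, fm, fb, simpson(fa, fm, fb, a, b), tol, maxDepth);
}
```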
Physics and Geometry Vectors

• Classes for 3D vectors and LorentzVectors with their operations and transformations
• New classes with cleaner interfaces, generic on the scalar type and on the coordinate system (cartesian, polar, cylindrical, etc.)
• Classes for 3D rotations and Lorentz transformations
  – merging the old ROOT and CLHEP functionality
  – rotations can also be based on quaternions
• Work done in collaboration with the Fermilab group
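
The sketch below shows only the design idea. A four-vector class is templated on a coordinate-system type, which in turn fixes the scalar type, and physics quantities are computed through a common set of Cartesian accessors. The names LorentzVec, PxPyPzE and PtEtaPhiE are hypothetical here and are not the interface of the new ROOT classes.

```cpp
#include <cassert>
#include <cmath>

// Coordinate system 1: Cartesian components (px, py, pz, E).
template <typename T>
struct PxPyPzE {
  using Scalar = T;
  T px, py, pz, e;
  T Px() const { return px; }
  T Py() const { return py; }
  T Pz() const { return pz; }
  T E() const { return e; }
};

// Coordinate system 2: transverse momentum, pseudorapidity, azimuth, energy.
template <typename T>
struct PtEtaPhiE {
  using Scalar = T;
  T pt, eta, phi, e;
  T Px() const { return pt * std::cos(phi); }
  T Py() const { return pt * std::sin(phi); }
  T Pz() const { return pt * std::sinh(eta); }
  T E() const { return e; }
};

// Four-vector generic on the coordinate system (and through it on the scalar type).
template <typename Coords>
class LorentzVec {
 public:
  using Scalar = typename Coords::Scalar;
  explicit LorentzVec(const Coords& c) : c_(c) {}

  Scalar Px() const { return c_.Px(); }
  Scalar Py() const { return c_.Py(); }
  Scalar Pz() const { return c_.Pz(); }
  Scalar E() const { return c_.E(); }

  Scalar M2() const { return E() * E() - Px() * Px() - Py() * Py() - Pz() * Pz(); }
  // Invariant mass; a negative M2 (space-like or rounding) is clamped to 0 here.
  Scalar M() const {
    const Scalar m2 = M2();
    return m2 > Scalar(0) ? std::sqrt(m2) : Scalar(0);
  }

  // Sum with a vector in any coordinate system, returned in Cartesian form.
  template <typename Other>
  LorentzVec<PxPyPzE<Scalar>> operator+(const LorentzVec<Other>& o) const {
    return LorentzVec<PxPyPzE<Scalar>>(PxPyPzE<Scalar>{
        static_cast<Scalar>(Px() + o.Px()), static_cast<Scalar>(Py() + o.Py()),
        static_cast<Scalar>(Pz() + o.Pz()), static_cast<Scalar>(E() + o.E())});
  }

 private:
  Coords c_;
};
```

Usage: two massless particles with unit energy, one given in (pt, eta, phi, E) and one in Cartesian form, moving back to back along x, add up to an invariant mass of 2.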
Minimization

• New C++ version of Minuit being introduced in ROOT
  – same algorithms translated into C++, plus some added functionality: Fumili minimizer, single-side bounds
• Undergoing extensive validation tests
[Figure: validation plots with panels labelled "before" and "after"]