A Toolkit for Multi-variate Fitting Designed with Template Metaprogramming
Luca Lista (1), Francesco Fabozzi (1,2)
(1) INFN Napoli, (2) Università della Basilicata
Luca Lista, IEEE NSS-MIC 2003, Portland

Introduction
• The toolkit provides:
  - a language to describe and model parametric fit problems in C++
  - utilities to study the frequentist properties of the fit
• It is not intended to provide new mathematical algorithms
  - the underlying minimization engine is Minuit
• Motivated by analyses in the BaBar experiment that require complex fit modeling and Toy MC

Main functionalities
• Description of Probability Distribution Functions (PDF)
  - most common PDFs provided (Gaussian, Poisson, etc.)
  - random number generators for each provided PDF
  - utilities to combine PDFs
• Manipulation of symbolic expressions
  - simplifies the definition of PDF models and fit functions
• Fitter tools
  - different Unbinned Maximum Likelihood (UML) fitters and a Chi-square fitter are supported
• Toy Monte Carlo
  - utility to generate random data samples to validate the fit results (pull distribution, fit bias estimate, etc.)
• User-defined components can be easily plugged in

Design choices
• The code is optimized for speed
  - Toy Monte Carlo studies of complex fits are very CPU intensive
• Speed can be achieved without losing good OO design
  - avoid virtual functions where not necessary
  - use template generic programming
  - the Boost C++ library provides powerful tools
• Metaprogramming permits type manipulations at compile time
• Users don’t “see” these technical details in the interface
• External package dependencies are well isolated
  - Random number generator engines (ROOT, CLHEP, …)
  - Minuit wrapper (ROOT, …)
    • Other minimizers may be adopted (NAG, …)

PDF interface
A PDF implements the “()” operator: P = f( x, y, … )

    struct Flat {
      Flat( double a, double b ) : min( a ), max( b ) { }
      // returns dP(x)/dx
      double operator()( double x ) const {
        return ( x < min || x > max ? 0 : 1 / ( max - min ) );
      }
      double min, max;
    };

    struct Poissonian {
      Poissonian( double m ) : mean( m ) { }
      // returns P(n)
      double operator()( int n ) const {
        return ( exp( - mean ) * pow( mean, n ) / factorial( n ) );
      }
      double mean;
    };

• Variable set: a sequence of any variable type is supported
• Users can define new PDFs respecting the above interface

Random number generators
A generator implements the “generate” method: r.generate( x, y, … )

    // Partial specialization for the Flat PDF. In C++ the default engine
    // (Generator = DefaultGen) can only be declared on the primary template.
    template< typename Generator >
    struct RandomGenerator< Flat, Generator > {
      RandomGenerator( const Flat& pdf ) : _min( pdf.min ), _max( pdf.max ) { }
      // Random engine: CLHEP, ROOT, …
      void generate( double & x ) const { x = Generator::shootFlat( _min, _max ); }
    private:
      const double& _min, &_max;
    };

• Users can define new generators with the preferred method
• Numerical implementations are provided
  - trapezoidal PDF sampling
  - “hit or miss” technique

    RANDOM_GENERATOR_SAMPLE(MyPdf, Bins, Min, Max)
    RANDOM_GENERATOR_HITORMISS(MyPdf, Min, Max, fMax)

Combining PDFs

    // Argus + Gaussian peaking, with a 10% peaking component
    Argus shoulder( 5.20, 5.28, -0.1 );
    Gaussian peak( 5.28, 0.05 );
    typedef Mixture<Gaussian, Argus> Mix;
    Mix pdf( peak, shoulder, 0.1 );
    RandomGenerator<Mix> rnd;
    double x;
    rnd.generate( x );

    // 2D Gaussian peaking (separate snippet)
    Gaussian sigX( 5.28, 0.05 );
    Gaussian sigY( 0, 0.015 );
    typedef Independent<Gaussian, Gaussian> SigXY;
    RandomGenerator<SigXY> rndXY;
    double x, y;
    rndXY.generate( x, y );

• Random generators are defined automatically
• Transformation of variables is also supported
  - random variables are generated in the original coordinate system, then transformed

Fit PDF parameters and run Toy MC

    const int sig = 100;
    double mean = 0, sigma = 1;

    // Definition of fit model and fitter
    Gaussian pdf( mean, sigma );
    Likelihood<Gaussian> like( pdf );
    UMLParameterFitter<Likelihood<Gaussian> > fitter( like );
    // Parameters “linked” to the fitter
    fitter.addParameter( "mean", & pdf.mean );
    fitter.addParameter( "sigma", & pdf.sigma );

    // Poisson PDF for MC generation
    Poissonian num( sig ); // alternative: Constant
    Gaussian pdfExp( mean, sigma );
    Experiment<Poissonian, Gaussian> experiment( num, pdfExp );

    for ( int i = 0; i < 50000; i++ ) {
      // Type list deduced from Likelihood type
      Sample< Likelihood<Gaussian>::types > sample;
      experiment.generate( sample );
      double par[ 2 ] = { mean, sigma }, err[ 2 ] = { 1, 1 }, logLike;
      logLike = fitter.fit( par, err, sample );
      double pullm = ( par[ 0 ] - mean ) / err[ 0 ];
      double pulls = ( par[ 1 ] - sigma ) / err[ 1 ];
    }

Parameter fit Results (Pulls)
There is a bias (as expected):
$\sigma^2 = \frac{1}{n} \sum_i (x_i - \mu)^2 \neq \frac{1}{n-1} \sum_i (x_i - \mu)^2$

UML Yield fit

    const int sig = 10, bkg = 5;
    typedef Independent< Gaussian, Gaussian > PdfSig;
    typedef Independent< Flat, Flat > PdfBkg;
    PdfSig pdfSig( Gaussian( 0, 1 ), Gaussian( 0, 0.5 ) );
    PdfBkg pdfBkg( Flat( -5, 5 ), Flat( -5, 5 ) );
    typedef ExtendedLikelihood2< PdfSig,
    PdfBkg > Likelihood; // Ext. Likelihood with two samples
    Likelihood like( pdfSig, pdfBkg );
    // Yield fitter extracts the yield of the two components
    UMLYieldFitter< Likelihood > fitter( like );
    typedef Poissonian Fluctuation; // alternative: Constant
    Fluctuation fluctuationSig( sig ), fluctuationBkg( bkg );
    typedef Experiment< Fluctuation, PdfSig > ToySig;
    typedef Experiment< Fluctuation, PdfBkg > ToyBkg;
    ToySig toySig( fluctuationSig, pdfSig );
    ToyBkg toyBkg( fluctuationBkg, pdfBkg );
    Experiment2< ToySig, ToyBkg > toy( toySig, toyBkg );

    for ( int i = 0; i < 50000; i++ ) {
      Sample< Likelihood::types > sample;
      toy.generate( sample );
      double s[] = { sig, bkg }, err[] = { 1, 1 };
      double logLike = fitter.fit( s, err, sample );
      double pull1 = ( s[0] - sig ) / err[0],
             pull2 = ( s[1] - bkg ) / err[1];
    }

• In 2 dimensions: Gaussian signal over a flat background in a signal box

Yield fit Results (Pulls)
• <s> = 10, <b> = 5
• Discrete structure due to Poisson fluctuations at low statistics

Combined Yield and parameter fit

    const int sig = 10, bkg = 5;
    typedef Poissonian Fluctuation;
    Fluctuation fluctuationSig( sig ), fluctuationBkg( bkg );
    typedef Independent< Gaussian, Gaussian > Sig;
    typedef Independent< Flat, Flat > Bkg;
    Gaussian g1( 0, 1 ), g2( 0, 0.5 );
    Flat f1( -5, 5 ), f2( -5, 5 );
    Sig pdfSig( g1, g2 );
    Bkg pdfBkg( f1, f2 );
    typedef Experiment<Fluctuation, Sig> ToySig;
    typedef Experiment<Fluctuation, Bkg> ToyBkg;
    ToySig toySig( fluctuationSig, pdfSig );
    ToyBkg toyBkg( fluctuationBkg, pdfBkg );
    Experiment2<ToySig, ToyBkg> toy( toySig, toyBkg );
    typedef ExtendedLikelihood2<Sig, Bkg> Likelihood;
    Gaussian G1( 0, 1 );
    Sig pdfSig1( G1, g2 );
    Likelihood like( pdfSig1, pdfBkg );
    UMLYieldAndParameterFitter<Likelihood> fitter( like );
    fitter.addParameter( "mean", & G1.mean );
    double pull1, pull2, pull3;

    for ( int i = 0; i < 50000; i++ ) {
      Sample< Likelihood::types > sample;
      toy.generate( sample );
      double s[] = { sig, bkg, 0 };
      double err[] = { 1, 1, 1 };
      double logLike = fitter.fit( s, err, sample );
      pull1 = ( s[ 0 ] - sig ) / err[ 0 ];
      pull2 = ( s[ 1 ] - bkg ) / err[ 1 ];
      pull3 = ( s[ 2 ] - 0 ) / err[ 2 ];
    }

• 2D Gaussian signal over a 2D flat background
• Simultaneous fit of yields and Gaussian mean

Symbolic function package
• Symbolic expressions make the definition of PDFs easier

    {
      X x; // declare the variable x
      // normalize using the symbolic integration at c-tor
      // (normalization: analytic integral performed by the compiler)
      PdfNonParametric<X> f1( sqr( sin(x) + cos(x) ), 0, 4 * M_PI );
      // recompute the normalization every time, since
      // the parameter tau may change from call to call
      Parameter tau( 0.123 );
      PdfParametric<X> f2( x * exp( - tau * x ), 0, 10 );
    }

• Users can specify different ways of performing normalization and integration

Example of χ² fit

    {
      X x;
      Parameter a( 0 ), b( 1 ), c( 0 );
      Function<X> parabola( c + x*( b + x*a ) );
      UniformPartition partition( 100, -1.0, 1.0 );
      Chi2<Function<X> > chi2( parabola, partition );
      Chi2Fitter<Chi2<Function<X> > > fitter( chi2 );
      fitter.addParameter( "a", a.ptr() );
      fitter.addParameter( "b", b.ptr() );
      fitter.addParameter( "c", c.ptr() );
      SampleErr<double> sample( partition.bins() );
      // fill the sample...
      double par[] = { a, b, c }, err[] = { 1, 1, 1 };
      fitter.fit( par, err, sample );
    }

Possible future improvements
• Upper limit extraction based on Toy Monte Carlo
  - could be based on existing code from BaBar B analysis
• Support for χ² fit with correlated errors and covariance matrix
• Provide more “standard” PDFs
  - Crystal Ball, Chebyshev polynomials, …
• Managing singular PDFs
  - Dirac delta components
• Managing (un)folding
• …

Conclusion
• We designed a new tool to model fit problems
• Using template generic programming we obtained:
  - Generality
    • Users can plug in new components (PDF, transformations, random generators, etc.)
    • Easy to incorporate external contributions into the tool
  - Light weight
    • Most of the code is contained in header (#include) files
    • Mild external dependencies
  - Ease of use
    • Very “synthetic” and “expressive” code
  - CPU speed
    • Virtual function calls are extremely limited
    • Most of the methods are inlined
• Interest has been expressed by:
  - Geant4 Statistical testing toolkit
  - LCG/PI (LHC Computing Grid - Physics Interfaces)
• We will focus on a release version shortly
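
Appendix: standalone sketch of a Toy MC pull study
The pull studies in the slides rely on the toolkit's own classes and on Minuit. As a rough, self-contained illustration of the same idea, the sketch below uses only the C++ standard library: it generates Gaussian toy samples, replaces the Minuit fit with the closed-form maximum-likelihood estimators of a Gaussian, and accumulates the pull of the fitted mean. The names ToyResult and runToys are hypothetical and are not part of the toolkit, and estimating the uncertainty on the mean as sqrt(variance / n) is an assumption made here for simplicity.

```cpp
#include <cassert>
#include <cmath>
#include <random>
#include <vector>

// Summary of one toy study (hypothetical helper type, not a toolkit class).
struct ToyResult {
  double meanVariance;  // average of the ML variance estimate over all toys
  double meanPull;      // average pull of the fitted mean
  double rmsPull;       // RMS of the pull of the fitted mean
};

// Generate nToys samples of n Gaussian events each, estimate mean and variance
// with the closed-form ML estimators, and accumulate the pull of the mean.
ToyResult runToys( int nToys, int n, double mean, double sigma, unsigned seed ) {
  assert( nToys > 0 && n > 1 );
  std::mt19937 engine( seed );
  std::normal_distribution<double> gauss( mean, sigma );
  std::vector<double> x( n );
  double sumVar = 0, sumPull = 0, sumPull2 = 0;
  for ( int t = 0; t < nToys; ++t ) {
    double sum = 0;
    for ( int i = 0; i < n; ++i ) { x[ i ] = gauss( engine ); sum += x[ i ]; }
    const double m = sum / n;  // ML estimate of the mean
    double s2 = 0;
    for ( int i = 0; i < n; ++i ) s2 += ( x[ i ] - m ) * ( x[ i ] - m );
    s2 /= n;  // ML estimate of the variance: 1/n, not 1/(n-1)
    const double errMean = std::sqrt( s2 / n );  // assumed uncertainty on the mean
    const double pull = ( m - mean ) / errMean;
    sumVar += s2;
    sumPull += pull;
    sumPull2 += pull * pull;
  }
  ToyResult r;
  r.meanVariance = sumVar / nToys;
  r.meanPull = sumPull / nToys;
  r.rmsPull = std::sqrt( sumPull2 / nToys - r.meanPull * r.meanPull );
  return r;
}
```

Calling, for example, runToys( 20000, 10, 0.0, 1.0, 12345u ) from a main function should give an average variance close to (n-1)/n times the true variance, which is the bias of the 1/n estimator, together with a pull distribution centered near zero but somewhat wider than 1 because the uncertainty is underestimated at small n.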