A Toolkit for Multi-variate Fitting
Designed with Template Metaprogramming
Luca Lista (1), Francesco Fabozzi (1,2)
(1) INFN Napoli
(2) Università della Basilicata
Luca Lista, IEEE NSS-MIC 2003, Portland
Introduction
• The toolkit provides:
  - a language to describe and model parametric fit problems in C++
  - utilities to study the frequentist properties of the fit
• Not intended to provide new mathematical algorithms
  - The underlying minimization engine is Minuit
• Motivated by analyses in the BaBar experiment that require complex fit modeling and Toy MC
Main functionalities
• Description of Probability Distribution Functions (PDFs)
  - most common PDFs provided (Gaussian, Poisson, etc.)
  - random number generators for each provided PDF
  - utilities to combine PDFs
• Manipulation of symbolic expressions
  - simplifies the definition of PDF models and fit functions
• Fitter tools
  - several Unbinned Maximum Likelihood (UML) fitters and a chi-square fitter are supported
• Toy Monte Carlo
  - utilities to generate random data samples to validate the fit results (pull distributions, fit bias estimates, etc.)
• User-defined components can be easily plugged in
Design choices
• The code is optimized for speed
  - Toy Monte Carlo studies of complex fits are very CPU intensive
• Speed can be achieved without losing good OO design
  - avoid virtual functions where they are not necessary
  - use template generic programming
  - the Boost C++ library provides powerful tools
• Metaprogramming permits type manipulations at compile time
• Users don't “see” these technical details in the interface
• External package dependencies are well isolated
  - Random number generator engines (ROOT, CLHEP, …)
  - Minuit wrapper (ROOT, …)
• Other minimizers may be adopted (NAG, …)
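The point about avoiding virtual functions can be illustrated with a minimal sketch. It assumes nothing about the toolkit's internals: `UnitGaussian` and `integrate` are hypothetical names introduced here. Because the PDF type is a template parameter, the call `f( x )` is bound at compile time and can be inlined.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical PDF: any type with a const operator() can be used below.
struct UnitGaussian {
  double operator()( double x ) const {
    return std::exp( -0.5 * x * x ) / std::sqrt( 2.0 * 3.14159265358979323846 );
  }
};

// Midpoint-rule integral of any callable over [a, b].
// Pdf is a template parameter, so f( x ) is resolved at compile time
// and can be inlined; no virtual table is involved.
template< typename Pdf >
double integrate( const Pdf& f, double a, double b, int steps ) {
  assert( steps > 0 && b > a );
  const double h = ( b - a ) / steps;
  double sum = 0;
  for ( int i = 0; i < steps; ++i )
    sum += f( a + ( i + 0.5 ) * h );
  return sum * h;
}
```

Calling `integrate( UnitGaussian(), -8.0, 8.0, 2000 )` should return a value very close to 1, the normalization of the Gaussian.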
PDF interface
A PDF implements the “()” operator: P = f( x, y, … )
struct Flat {
  Flat( double a, double b ) :
    min( a ), max( b ) { }
  // Returns dP(x)/dx.
  // Variable set: a sequence of any variable type is supported.
  double operator()( double x ) const
  {
    return ( x < min || x > max ?
             0 :
             1 / ( max - min ) );
  }
  double min, max;
};

struct Poissonian {
  Poissonian( double m ) :
    mean( m ) { }
  // Returns P(n)
  double operator()( int n ) const
  {
    return ( exp( - mean ) *
             pow( mean, n ) /
             factorial( n ) );
  }
  double mean;
};
Users can define new PDFs respecting the above interface
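As an illustration of a user-defined PDF, here is a minimal sketch of an exponential distribution that follows the same pattern as the two structs above: a constructor taking the parameters, a const "()" operator, and public data members. The name `Exponential` and the choice of normalizing on [0, +inf) are assumptions made for this example.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical user-defined PDF: exponential decay with lifetime tau,
// normalized on [0, +inf).
struct Exponential {
  Exponential( double t ) : tau( t ) { assert( t > 0 ); }
  // Returns dP(x)/dx
  double operator()( double x ) const {
    return x < 0 ? 0 : std::exp( -x / tau ) / tau;
  }
  double tau;  // public, so that its address could be handed to a fitter
};
```

With `Exponential e( 2.0 )`, `e( 0.0 )` evaluates to 0.5 and `e( -1.0 )` to 0.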
Random number generators
Implements the “generate” method: r.generate( x, y, … )
// Primary template: Generator is the random engine (CLHEP, ROOT, …).
// The default argument belongs here, not on the partial specialization.
template< typename Pdf, typename Generator = DefaultGen >
struct RandomGenerator;

// Partial specialization for the Flat PDF
template< typename Generator >
struct RandomGenerator< Flat, Generator >
{
  RandomGenerator( const Flat& pdf ) :
    _min( pdf.min ), _max( pdf.max ) { }
  void generate( double & x ) const {
    x = Generator::shootFlat( _min, _max );
  }
private:
  const double& _min, &_max;
};

• Users can define new generators with the preferred method
• Numerical implementations are provided:
  - trapezoidal PDF sampling
  - “hit or miss” technique

RANDOM_GENERATOR_SAMPLE(MyPdf, Bins, Min, Max)
RANDOM_GENERATOR_HITORMISS(MyPdf, Min, Max, fMax)
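The “hit or miss” technique can be sketched as follows. This is not the code behind the RANDOM_GENERATOR_HITORMISS macro, whose implementation is not shown in the slides. It is a standalone illustration using the C++ standard `<random>` engines, and `hitOrMiss`, `Triangle` and `triangleSampleMean` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <random>

// Draws x from the (possibly unnormalized) density pdf on [min, max],
// assuming 0 <= pdf( x ) <= fMax everywhere in that range.
template< typename Pdf, typename Engine >
double hitOrMiss( const Pdf& pdf, double min, double max, double fMax,
                  Engine& engine ) {
  assert( max > min && fMax > 0 );
  std::uniform_real_distribution<double> flatX( min, max );
  std::uniform_real_distribution<double> flatY( 0.0, fMax );
  for ( ;; ) {
    const double x = flatX( engine );
    // Accept x with probability pdf( x ) / fMax, otherwise retry.
    if ( flatY( engine ) < pdf( x ) ) return x;
  }
}

// Example density: rises linearly from 0 to 1 on [0, 1].
struct Triangle {
  double operator()( double x ) const { return x; }
};

// Mean of n hit-or-miss draws from Triangle; the exact mean is 2/3.
inline double triangleSampleMean( int n, unsigned seed ) {
  std::mt19937 engine( seed );
  double sum = 0;
  for ( int i = 0; i < n; ++i )
    sum += hitOrMiss( Triangle(), 0.0, 1.0, 1.0, engine );
  return sum / n;
}
```

The efficiency of the method is the ratio between the area under the PDF and the area of the bounding box, so a tight `fMax` matters for sharply peaked distributions.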
Combining PDFs
// Argus + Gaussian peaking, with a 10% peaking component
Argus shoulder( 5.20, 5.28, -0.1 );
Gaussian peak( 5.28, 0.05 );
typedef Mixture<Gaussian, Argus> Mix;
Mix pdf( peak, shoulder, 0.1 );

// Random generators are defined automatically
RandomGenerator<Mix> rnd;
double x;
rnd.generate( x );

// 2D Gaussian peaking
Gaussian sigX( 5.28, 0.05 );
Gaussian sigY( 0, 0.015 );
typedef Independent<Gaussian, Gaussian> SigXY;

RandomGenerator<SigXY> rndXY;
double x, y;
rndXY.generate( x, y );

• Transformation of variables is also supported
  - Random variables are generated in the original coordinate system, then transformed
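To make the idea of a mixture concrete, here is a minimal sketch of how a two-component combination could be written as a class template. It is not the toolkit's `Mixture`: `MixtureSketch` and `FlatSketch` are hypothetical stand-ins, and the convention that the third argument is the fraction of the first component is an assumption suggested by the "10% peaking component" example.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical flat PDF on [min, max].
struct FlatSketch {
  FlatSketch( double a, double b ) : min( a ), max( b ) { }
  double operator()( double x ) const {
    return ( x < min || x > max ) ? 0.0 : 1.0 / ( max - min );
  }
  double min, max;
};

// f * A( x ) + ( 1 - f ) * B( x ): stays normalized if A and B are.
template< typename A, typename B >
struct MixtureSketch {
  MixtureSketch( const A& a, const B& b, double f ) :
    first( a ), second( b ), fraction( f ) {
    assert( f >= 0 && f <= 1 );
  }
  double operator()( double x ) const {
    return fraction * first( x ) + ( 1 - fraction ) * second( x );
  }
  A first;
  B second;
  double fraction;
};
```

A matching generator would pick the first component with probability `fraction` and the second otherwise, then delegate to that component's own generator. This is one plausible way a generator for the combination can be derived from its parts.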
Fit PDF parameters and run Toy MC
const int sig = 100;
double mean = 0, sigma = 1;

// Definition of fit model and fitter
Gaussian pdf( mean, sigma );
Likelihood<Gaussian> like( pdf );
UMLParameterFitter<Likelihood<Gaussian> > fitter( like );
// Parameters “linked” to the fitter
fitter.addParameter( "mean", & pdf.mean );
fitter.addParameter( "sigma", & pdf.sigma );

// Poisson PDF for MC generation
Poissonian num( sig ); // alternative: Constant
Gaussian pdfExp( mean, sigma );
Experiment<Poissonian, Gaussian> experiment( num, pdfExp );

for ( int i = 0; i < 50000; i++ ) {
  // Type list deduced from Likelihood type
  Sample<Likelihood<Gaussian>::types> sample;
  experiment.generate( sample );
  double par[ 2 ] = { mean, sigma }, err[ 2 ] = { 1, 1 }, logLike;
  logLike = fitter.fit( par, err, sample );
  double pullm = ( par[ 0 ] - mean ) / err[ 0 ];
  double pulls = ( par[ 1 ] - sigma ) / err[ 1 ];
}
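The structure of such a pull study can be reproduced without the toolkit. The sketch below makes two simplifying assumptions: the number of events per toy is fixed (the "Constant" alternative rather than a Poisson fluctuation), and the closed-form maximum likelihood estimates for a Gaussian stand in for the Minuit fit. `PullSummary` and `gaussianMeanPulls` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <random>

struct PullSummary { double mean, rms; };

// Runs nToys pseudo-experiments of nEvents Gaussian events each and
// summarizes the pull of the fitted mean.
inline PullSummary gaussianMeanPulls( int nToys, int nEvents, unsigned seed ) {
  assert( nToys > 0 && nEvents > 1 );
  const double trueMean = 0, trueSigma = 1;
  std::mt19937 engine( seed );
  std::normal_distribution<double> gauss( trueMean, trueSigma );
  double sumPull = 0, sumPull2 = 0;
  for ( int t = 0; t < nToys; ++t ) {
    double sum = 0, sum2 = 0;
    for ( int i = 0; i < nEvents; ++i ) {
      const double x = gauss( engine );
      sum += x;
      sum2 += x * x;
    }
    // Closed-form ML estimates (note the 1/n in the variance).
    const double fitMean = sum / nEvents;
    const double fitSigma = std::sqrt( sum2 / nEvents - fitMean * fitMean );
    const double errMean = fitSigma / std::sqrt( double( nEvents ) );
    const double pull = ( fitMean - trueMean ) / errMean;
    sumPull += pull;
    sumPull2 += pull * pull;
  }
  PullSummary s;
  s.mean = sumPull / nToys;
  s.rms = std::sqrt( sumPull2 / nToys - s.mean * s.mean );
  return s;
}
```

For an unbiased fit with correctly estimated errors, the pull distribution is expected to have mean close to 0 and RMS close to 1.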
Parameter fit Results (Pulls)
There is a bias (as expected):
$\sigma^2 = \frac{1}{n}\sum_i (x_i - \mu)^2 \neq \frac{1}{n-1}\sum_i (x_i - \mu)^2$
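The size of the expected bias on the fitted variance follows from a short standard calculation, sketched here for a fixed number of events n. Fitted values carry a hat to distinguish them from the true mean $\mu$ and true variance $\sigma^2$; $\hat\mu$ is the sample mean.

```latex
\begin{align}
\hat\sigma^2 &= \frac{1}{n}\sum_{i=1}^{n}(x_i-\hat\mu)^2,
  \qquad \hat\mu = \frac{1}{n}\sum_{i=1}^{n} x_i \\
\sum_{i=1}^{n}(x_i-\hat\mu)^2 &= \sum_{i=1}^{n}(x_i-\mu)^2 - n\,(\hat\mu-\mu)^2 \\
E\!\left[\hat\sigma^2\right] &= \frac{1}{n}\left(n\,\sigma^2 - n\,\frac{\sigma^2}{n}\right)
  = \frac{n-1}{n}\,\sigma^2
\end{align}
```

For n around 100 the maximum likelihood variance is therefore expected to be low by about 1%, which is the kind of small shift a pull distribution from a large number of toy experiments can resolve.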
UML Yield fit
const int sig = 10, bkg = 5;

// In 2 dimensions: Gaussian signal, flat background in a signal box
typedef Independent< Gaussian, Gaussian > PdfSig;
typedef Independent< Flat, Flat > PdfBkg;
PdfSig pdfSig( Gaussian( 0, 1 ), Gaussian( 0, 0.5 ) );
PdfBkg pdfBkg( Flat( -5, 5 ), Flat( -5, 5 ) );

// Ext. Likelihood with two samples
typedef ExtendedLikelihood2< PdfSig, PdfBkg > Likelihood;
Likelihood like( pdfSig, pdfBkg );
// Yield fitter extracts the yield of the two components
UMLYieldFitter< Likelihood > fitter( like );

typedef Poissonian Fluctuation; // alternative: Constant
Fluctuation fluctuationSig( sig ), fluctuationBkg( bkg );
typedef Experiment< Fluctuation, PdfSig > ToySig;
typedef Experiment< Fluctuation, PdfBkg > ToyBkg;
ToySig toySig( fluctuationSig, pdfSig );
ToyBkg toyBkg( fluctuationBkg, pdfBkg );
Experiment2< ToySig, ToyBkg > toy( toySig, toyBkg );

for ( int i = 0; i < 50000; i++ ) {
  Sample< Likelihood::types > sample;
  toy.generate( sample );
  double s[] = { sig, bkg }, err[] = { 1, 1 };
  double logLike = fitter.fit( s, err, sample );
  double pull1 = ( s[0] - sig ) / err[0], pull2 = ( s[1] - bkg ) / err[1];
}
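The quantity minimized in a two-component yield fit is the standard extended negative log-likelihood. The sketch below writes it out for a one-dimensional sample to keep the example short (the slide's case is two-dimensional). `extendedNll` and `FlatDensity` are hypothetical names, not toolkit classes.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Extended negative log-likelihood for two components with yields s and b:
//   -ln L = ( s + b ) - sum_i ln( s * Ps( x_i ) + b * Pb( x_i ) )
// The constant ln( n! ) term is dropped; it does not move the minimum.
template< typename PdfSig, typename PdfBkg >
double extendedNll( double s, double b, const std::vector<double>& sample,
                    const PdfSig& pdfSig, const PdfBkg& pdfBkg ) {
  double nll = s + b;
  for ( std::size_t i = 0; i < sample.size(); ++i ) {
    const double density = s * pdfSig( sample[ i ] ) + b * pdfBkg( sample[ i ] );
    assert( density > 0 );
    nll -= std::log( density );
  }
  return nll;
}

// Hypothetical flat density on [-5, 5], used for a quick check.
struct FlatDensity {
  double operator()( double x ) const {
    return ( x < -5 || x > 5 ) ? 0.0 : 0.1;
  }
};
```

A minimizer such as Minuit would vary s and b (and any shape parameters) to minimize this function. The yields are fitted directly, so their sum is free to differ from the observed number of events.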
Yield fit Results (Pulls)
<s> = 10, <b> = 5
Discrete structure because of low-statistics Poisson fluctuation
Combined Yield and parameter fit
// 2D Gaussian signal over a 2D flat background:
// simultaneous fit of yields and Gaussian mean
const int sig = 10, bkg = 5;
typedef Poissonian Fluctuation;
Fluctuation fluctuationSig( sig ), fluctuationBkg( bkg );
typedef Independent< Gaussian, Gaussian > PdfSig;
typedef Independent< Flat, Flat > PdfBkg;
Gaussian g1( 0, 1 ), g2( 0, 0.5 );
Flat f1( -5, 5 ), f2( -5, 5 );
PdfSig pdfSig( g1, g2 );
PdfBkg pdfBkg( f1, f2 );
typedef Experiment< Fluctuation, PdfSig > ToySig;
typedef Experiment< Fluctuation, PdfBkg > ToyBkg;
ToySig toySig( fluctuationSig, pdfSig );
ToyBkg toyBkg( fluctuationBkg, pdfBkg );
Experiment2< ToySig, ToyBkg > toy( toySig, toyBkg );

typedef ExtendedLikelihood2< PdfSig, PdfBkg > Likelihood;
Gaussian G1( 0, 1 );
PdfSig pdfSig1( G1, g2 );
Likelihood like( pdfSig1, pdfBkg );
UMLYieldAndParameterFitter< Likelihood > fitter( like );
fitter.addParameter( "mean", & G1.mean );

double pull1, pull2, pull3;
for ( int i = 0; i < 50000; i++ ) {
  Sample< Likelihood::types > sample;
  toy.generate( sample );
  double s[] = { sig, bkg, 0 };
  double err[] = { 1, 1, 1 };
  double logLike = fitter.fit( s, err, sample );
  pull1 = ( s[ 0 ] - sig ) / err[ 0 ];
  pull2 = ( s[ 1 ] - bkg ) / err[ 1 ];
  pull3 = ( s[ 2 ] - 0 ) / err[ 2 ];
}
Symbolic function package
• Symbolic expressions make the definition of PDFs easier
{
  X x; // declare the variable x
  // normalize using the symbolic integration at c-tor
  // Normalization: analytic integral performed by the compiler
  PdfNonParametric<X> f1( sqr( sin(x) + cos(x) ), 0, 4 * M_PI );
  // recompute the normalization every time, since
  // the parameter tau may change from call to call
  Parameter tau( 0.123 );
  PdfParametric<X> f2( x * exp( - tau * x ), 0, 10 );
}
• Users can specify different ways of performing normalization and integration
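Symbolic expressions of this kind are commonly built with expression templates, where operators return types that encode the structure of the expression. The sketch below is a generic illustration of that technique and covers only evaluation, not the toolkit's compile-time integration. Everything in the `sketch` namespace is hypothetical.

```cpp
#include <cassert>
#include <cmath>

namespace sketch {

// Leaf nodes: the variable x and a numeric constant.
struct Var {
  double operator()( double x ) const { return x; }
};
struct Const {
  explicit Const( double v ) : value( v ) { }
  double operator()( double ) const { return value; }
  double value;
};

// Composite nodes: the expression structure is encoded in the type.
template< typename A, typename B >
struct Product {
  Product( const A& l, const B& r ) : lhs( l ), rhs( r ) { }
  double operator()( double x ) const { return lhs( x ) * rhs( x ); }
  A lhs;
  B rhs;
};
template< typename A >
struct Exp {
  explicit Exp( const A& a ) : arg( a ) { }
  double operator()( double x ) const { return std::exp( arg( x ) ); }
  A arg;
};

// Operators build the expression type instead of computing a number.
template< typename A, typename B >
Product<A, B> operator*( const A& a, const B& b ) {
  return Product<A, B>( a, b );
}
template< typename A >
Exp<A> exp( const A& a ) { return Exp<A>( a ); }

// x * exp( -tau * x ) evaluated at x; the expression has type
// Product< Var, Exp< Product< Const, Var > > >.
inline double decayShape( double tau, double x ) {
  assert( tau > 0 );
  Var v;
  return ( v * exp( Const( -tau ) * v ) )( x );
}

}  // namespace sketch
```

Because the full expression tree is known to the compiler as a type, rules such as differentiation or integration can in principle be attached to each node type through template specialization.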
Example of χ² fit
{
X x;
Parameter a( 0 ), b( 1 ), c( 0 );
Function<X> parabola( c + x*( b + x*a ) );
UniformPartition partition( 100, -1.0, 1.0 );
Chi2<Function<X> > chi2( parabola, partition );
Chi2Fitter<Chi2<Function<X> > > fitter( chi2 );
fitter.addParameter( "a", a.ptr() );
fitter.addParameter( "b", b.ptr() );
fitter.addParameter( "c", c.ptr() );
SampleErr<double> sample( partition.bins() );
// fill the sample...
double par[] = { a, b, c }, err[] = { 1, 1, 1 };
fitter.fit( par, err, sample );
}
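For reference, the quantity a chi-square fitter minimizes can be sketched in a few lines. This version evaluates the model at each point's abscissa rather than over the bins of a partition, which is a simplification assumed here. `Point`, `Parabola` and `chiSquare` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One measured point: abscissa, value, and uncertainty on the value.
struct Point { double x, y, err; };

// Parabola c + x * ( b + x * a ), the same model shape as in the slide.
struct Parabola {
  Parabola( double a_, double b_, double c_ ) : a( a_ ), b( b_ ), c( c_ ) { }
  double operator()( double x ) const { return c + x * ( b + x * a ); }
  double a, b, c;
};

// chi2 = sum_i ( ( y_i - f( x_i ) ) / err_i )^2
template< typename F >
double chiSquare( const F& f, const std::vector<Point>& data ) {
  double chi2 = 0;
  for ( std::size_t i = 0; i < data.size(); ++i ) {
    assert( data[ i ].err > 0 );
    const double r = ( data[ i ].y - f( data[ i ].x ) ) / data[ i ].err;
    chi2 += r * r;
  }
  return chi2;
}
```

The fitter would then vary a, b and c through the registered parameter pointers until this sum is minimal.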
Possible future improvements
• Upper limit extraction based on Toy Monte Carlo
  - Could be based on existing code from a BaBar B decay analysis
• Support for χ² fits with correlated errors and a covariance matrix
• Provide more “standard” PDFs
  - Crystal Ball, Chebyshev polynomials, …
• Managing singular PDFs
  - Dirac delta components
• Managing (un)folding
• …
Conclusion
• We designed a new tool to model fit problems
• Using template generic programming we obtained:
  - Generality:
    • Users can plug in new components (PDFs, transformations, random generators, etc.)
    • External contributions are easy to incorporate in the tool
  - Light weight:
    • Most of the code is contained in header (#include) files
    • Mild external dependencies
  - Ease of use:
    • Very “synthetic” and “expressive” code
  - CPU speed:
    • Virtual function calls are extremely limited
    • Most of the methods are inlined
• Interest has been expressed by:
  - Geant4 Statistical testing toolkit
  - LCG/PI (LHC Computing Grid - Physics Interfaces)
• Will focus on a release version shortly