Propagation of Errors

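Suppose we measure the branching fraction $\mathrm{BR}(\mathrm{Higgs}\to\mu^+\mu^-)$ using the number
of produced Higgs bosons ($N_{\rm produced}$), the number of $\mathrm{Higgs}\to\mu^+\mu^-$ decays
found ($N_{\rm found}$), and the efficiency for finding a $\mathrm{Higgs}\to\mu^+\mu^-$ decay ($\epsilon$):
$$\mathrm{BR}(\mathrm{Higgs}\to\mu^+\mu^-) = \frac{N_{\rm found}}{\epsilon\,N_{\rm produced}}$$
If we know the uncertainties ($\sigma$'s) of $N_{\rm produced}$, $N_{\rm found}$, and $\epsilon$, what is the uncertainty
on $\mathrm{BR}(\mathrm{Higgs}\to\mu^+\mu^-)$?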
More formally we could ask, given that we have a functional relationship
between several measured variables (x, y, z), i.e.
Q = f(x, y, z)
What is the uncertainty in Q if the uncertainties in x, y, and z are known?
Usually when we talk about uncertainties in a measured variable such as x, we
assume that the value of x represents the mean of a Gaussian distribution
and the uncertainty in x is the standard deviation ($\sigma$) of the Gaussian distribution.
A word of caution here: not all measurements can be represented by Gaussian
distributions, but more on that later!
To answer this question we use a technique called Propagation of Errors.
880.P20 Winter 2006
Richard Kass
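To calculate the variance in Q as a function of the variances in x and y we use the following:
$$\sigma_Q^2 = \sigma_x^2\left(\frac{\partial Q}{\partial x}\right)^2 + \sigma_y^2\left(\frac{\partial Q}{\partial y}\right)^2 + 2\sigma_{xy}\left(\frac{\partial Q}{\partial x}\right)\left(\frac{\partial Q}{\partial y}\right)$$
Note: if x and y are uncorrelated ($\sigma_{xy} = 0$) then the last term in the above equation is 0.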
Let’s derive the above formula.
Assume we have several measurements of the quantities x (e.g. $x_1, x_2, \ldots, x_N$) and y (e.g. $y_1, y_2, \ldots, y_N$).
We can calculate the average of x and y using:
$$\mu_x = \frac{1}{N}\sum_{i=1}^{N} x_i \quad\text{and}\quad \mu_y = \frac{1}{N}\sum_{i=1}^{N} y_i$$
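Let's define:
$$Q_i \equiv f(x_i, y_i)$$
$$Q \equiv f(\mu_x, \mu_y) \quad\text{(evaluated at the average values)}$$
Now expand $Q_i$ about the average values:
$$Q_i = Q(\mu_x,\mu_y) + (x_i-\mu_x)\left.\frac{\partial Q}{\partial x}\right|_{\mu_x,\mu_y} + (y_i-\mu_y)\left.\frac{\partial Q}{\partial y}\right|_{\mu_x,\mu_y} + \text{higher order terms}$$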
Assume we can neglect the higher order terms (i.e. the measured values are close to the average values). We can
rewrite the above as:
$$Q_i - Q = (x_i-\mu_x)\left.\frac{\partial Q}{\partial x}\right|_{\mu_x,\mu_y} + (y_i-\mu_y)\left.\frac{\partial Q}{\partial y}\right|_{\mu_x,\mu_y}$$
We would like to find the variance of Q. By definition the variance of Q is just:
$$\sigma_Q^2 = \frac{1}{N}\sum_{i=1}^{N}(Q_i - Q)^2$$
Note: To first order the average of a function is the function evaluated at its average value(s): $\langle Q\rangle = Q(\mu_x, \mu_y)$
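If we expand the summation using the expression for $Q_i - Q$ we get:
$$\sigma_Q^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i-\mu_x)^2\left(\frac{\partial Q}{\partial x}\right)^2_{\mu_x,\mu_y} + \frac{1}{N}\sum_{i=1}^{N}(y_i-\mu_y)^2\left(\frac{\partial Q}{\partial y}\right)^2_{\mu_x,\mu_y} + \frac{2}{N}\sum_{i=1}^{N}(x_i-\mu_x)(y_i-\mu_y)\left(\frac{\partial Q}{\partial x}\right)_{\mu_x,\mu_y}\left(\frac{\partial Q}{\partial y}\right)_{\mu_x,\mu_y}$$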
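Since the derivatives are all evaluated at the average values ($\mu_x$, $\mu_y$) we can pull the derivatives outside of the
summations. Finally, remembering the definition of the variance we can write:
$$\sigma_Q^2 = \sigma_x^2\left(\frac{\partial Q}{\partial x}\right)^2_{\mu_x,\mu_y} + \sigma_y^2\left(\frac{\partial Q}{\partial y}\right)^2_{\mu_x,\mu_y} + \frac{2}{N}\left(\frac{\partial Q}{\partial x}\right)_{\mu_x,\mu_y}\left(\frac{\partial Q}{\partial y}\right)_{\mu_x,\mu_y}\sum_{i=1}^{N}(x_i-\mu_x)(y_i-\mu_y)$$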
If the measurements are uncorrelated then the summation in the above equation will be very close to zero (if the
variables are truly uncorrelated then the sum is 0) and can be neglected. Thus for uncorrelated variables we
have:
$$\sigma_Q^2 = \sigma_x^2\left(\frac{\partial Q}{\partial x}\right)^2_{\mu_x,\mu_y} + \sigma_y^2\left(\frac{\partial Q}{\partial y}\right)^2_{\mu_x,\mu_y} \qquad\text{(uncorrelated errors)}$$
If however x and y are correlated, then we define the COVARIANCE $\sigma_{xy}$ using:
$$\sigma_{xy} = \frac{1}{N}\sum_{i=1}^{N}(x_i-\mu_x)(y_i-\mu_y)$$
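The variance in Q including correlations is given by:
$$\sigma_Q^2 = \sigma_x^2\left(\frac{\partial Q}{\partial x}\right)^2_{\mu_x,\mu_y} + \sigma_y^2\left(\frac{\partial Q}{\partial y}\right)^2_{\mu_x,\mu_y} + 2\sigma_{xy}\left(\frac{\partial Q}{\partial x}\right)_{\mu_x,\mu_y}\left(\frac{\partial Q}{\partial y}\right)_{\mu_x,\mu_y} \qquad\text{(correlated errors)}$$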
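As a quick numerical check of the correlated formula, the sketch below uses a linear function Q = a x + b y, for which the first-order expansion is exact, so the directly computed variance of the $Q_i$ should match the propagated one. The data values, the coefficients `a` and `b`, and the helper names are all made up for illustration; the 1/N convention for the variance and covariance follows the definitions above.

```python
def mean(values):
    return sum(values) / len(values)

def covariance(xs, ys):
    """(1/N) * sum((x_i - mu_x) * (y_i - mu_y)); covariance(xs, xs) is the variance."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def propagated_variance(dq_dx, dq_dy, var_x, var_y, cov_xy=0.0):
    """sigma_Q^2 from the two-variable propagation-of-errors formula."""
    return var_x * dq_dx**2 + var_y * dq_dy**2 + 2.0 * cov_xy * dq_dx * dq_dy

# Hypothetical, strongly correlated measurements (illustration only).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]

# Q = a*x + b*y, so dQ/dx = a and dQ/dy = b everywhere.
a, b = 3.0, -2.0
qs = [a * x + b * y for x, y in zip(xs, ys)]

direct = covariance(qs, qs)  # variance of the Q_i themselves
with_cov = propagated_variance(a, b, covariance(xs, xs), covariance(ys, ys),
                               covariance(xs, ys))
without_cov = propagated_variance(a, b, covariance(xs, xs), covariance(ys, ys))

print(f"direct variance of Q       : {direct:.4f}")
print(f"propagated, with sigma_xy  : {with_cov:.4f}")
print(f"propagated, sigma_xy = 0   : {without_cov:.4f}")
```

For these made-up numbers, dropping the covariance term overestimates the variance by a large factor, which is why the correlated form matters whenever $\sigma_{xy}$ is not small.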
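Example: Error in $\mathrm{BR}(\mathrm{Higgs}\to\mu^+\mu^-)$. Assume: $N_{\rm produced} = 100 \pm 10$, $N_{\rm found} = 10 \pm 3$, $\epsilon = 0.2 \pm 0.02$. Writing $N_{pr}$ for $N_{\rm produced}$ and $N_{fd}$ for $N_{\rm found}$:
$$\sigma_{BR}^2 = \sigma_{N_{pr}}^2\left(\frac{\partial BR}{\partial N_{pr}}\right)^2 + \sigma_{N_{fd}}^2\left(\frac{\partial BR}{\partial N_{fd}}\right)^2 + \sigma_{\epsilon}^2\left(\frac{\partial BR}{\partial \epsilon}\right)^2$$
$$\sigma_{BR}^2 = \sigma_{N_{pr}}^2\left(\frac{N_{fd}}{\epsilon N_{pr}^2}\right)^2 + \sigma_{N_{fd}}^2\left(\frac{1}{\epsilon N_{pr}}\right)^2 + \sigma_{\epsilon}^2\left(\frac{N_{fd}}{\epsilon^2 N_{pr}}\right)^2$$
$$\sigma_{BR} = \left[10^2\left(\frac{10}{0.2\times10^4}\right)^2 + 9\left(\frac{1}{0.2\times10^2}\right)^2 + (4\times10^{-4})\left(\frac{10}{4\times10^{-2}\times10^2}\right)^2\right]^{1/2} = 0.17$$
$$\mathrm{BR}(\mathrm{Higgs}\to\mu^+\mu^-) = 0.5 \pm 0.2$$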
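Example: The error in the average.
The average of several measurements each with the same uncertainty ($\sigma$) is given by:
$$\mu = \frac{x_1 + x_2 + \cdots + x_n}{n}$$
$$\sigma_\mu^2 = \sigma_{x_1}^2\left(\frac{\partial\mu}{\partial x_1}\right)^2 + \sigma_{x_2}^2\left(\frac{\partial\mu}{\partial x_2}\right)^2 + \cdots + \sigma_{x_n}^2\left(\frac{\partial\mu}{\partial x_n}\right)^2 = \sigma^2\left(\frac{1}{n}\right)^2 + \sigma^2\left(\frac{1}{n}\right)^2 + \cdots + \sigma^2\left(\frac{1}{n}\right)^2 = n\sigma^2\left(\frac{1}{n}\right)^2$$
$$\sigma_\mu = \frac{\sigma}{\sqrt{n}} \qquad\text{("error in the mean")}$$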
This is a very important result! It says that we can determine the mean better by combining measurements.
Unfortunately, the precision only increases as the square root of the number of measurements.
Do not confuse $\sigma_\mu$ with $\sigma$!
$\sigma$ is related to the width of the pdf (e.g. Gaussian) that the measurements come from.
$\sigma$ does not get smaller as we combine measurements.
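The distinction between the width of the parent pdf and the error in the mean can be illustrated with a small simulation. The sketch below is only an illustration: the true mean (10), the width (2), the number of measurements per experiment (25), the number of repeated experiments and the random seed are all arbitrary choices.

```python
import math
import random

def spread_of_means(n, sigma, n_experiments, true_mean=10.0, seed=12345):
    """Repeat an 'experiment' of n Gaussian measurements many times and
    return the standard deviation of the resulting sample means."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_experiments):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        means.append(sum(sample) / n)
    grand_mean = sum(means) / len(means)
    return math.sqrt(sum((m - grand_mean)**2 for m in means) / len(means))

sigma = 2.0   # width of the parent Gaussian (does not shrink with n)
n = 25        # measurements combined in each average
observed = spread_of_means(n, sigma, n_experiments=4000)
expected = sigma / math.sqrt(n)   # error in the mean

print(f"spread of the averages : {observed:.3f}")
print(f"sigma / sqrt(n)        : {expected:.3f}")
```

The individual measurements still scatter with a width of 2; only the scatter of the averages is reduced, and only by the factor of 5 that comes from taking the square root of 25.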
A slightly more complicated problem is the case of the weighted average or unequal $\sigma$'s:
$$\mu = \frac{x_1/\sigma_1^2 + x_2/\sigma_2^2 + \cdots + x_n/\sigma_n^2}{1/\sigma_1^2 + 1/\sigma_2^2 + \cdots + 1/\sigma_n^2}$$
Using the same procedure as above we obtain:
$$\sigma_\mu^2 = \frac{1}{1/\sigma_1^2 + 1/\sigma_2^2 + \cdots + 1/\sigma_n^2} \qquad\text{("error in the weighted mean")}$$
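A minimal sketch of the weighted average and its error is given below. The input values are invented for illustration, and the measurements are assumed to be uncorrelated.

```python
import math

def weighted_mean(values, sigmas):
    """Weighted average with weights 1/sigma_i^2 and the error in the weighted mean."""
    weights = [1.0 / s**2 for s in sigmas]
    total_weight = sum(weights)
    mu = sum(w * x for w, x in zip(weights, values)) / total_weight
    return mu, math.sqrt(1.0 / total_weight)

# Two hypothetical measurements of the same quantity: 10 +/- 1 and 12 +/- 2.
mu, err = weighted_mean([10.0, 12.0], [1.0, 2.0])
print(f"weighted mean = {mu:.2f} +/- {err:.2f}")

# With equal sigmas the result reduces to the plain average and sigma/sqrt(n).
mu_eq, err_eq = weighted_mean([1.0, 2.0, 3.0, 4.0], [2.0] * 4)
print(f"equal-sigma case = {mu_eq:.2f} +/- {err_eq:.2f}")
```

The more precise measurement pulls the average toward itself, and the combined error (about 0.89 here) is smaller than either individual error.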
Problems with Propagation of Errors:
In calculating the variance using propagation of errors we usually assume that we are dealing with Gaussian errors
for the measured variable (e.g. x). Unfortunately, just because x is described by a Gaussian distribution
does not mean that f(x) will be described by a Gaussian distribution.
Example: when the new distribution is Gaussian.
Let y = Ax, with A a constant and x a Gaussian variable. Let the pdf for x be Gaussian:
$$p(x,\mu_x,\sigma_x)\,dx = \frac{1}{\sigma_x\sqrt{2\pi}}\,e^{-\frac{(x-\mu_x)^2}{2\sigma_x^2}}\,dx$$
Then $y = Ax$, $\mu_y = A\mu_x$, $\sigma_y = A\sigma_x$ and $dy = A\,dx$. Putting this into the above equation we have:
$$p(x,\mu_x,\sigma_x)\,dx = \frac{1}{(\sigma_y/A)\sqrt{2\pi}}\,e^{-\frac{(y/A-\mu_y/A)^2}{2(\sigma_y/A)^2}}\,dx = \frac{A}{\sigma_y\sqrt{2\pi}}\,e^{-\frac{(y-\mu_y)^2}{2\sigma_y^2}}\,dx = \frac{1}{\sigma_y\sqrt{2\pi}}\,e^{-\frac{(y-\mu_y)^2}{2\sigma_y^2}}\,dy = p(y,\mu_y,\sigma_y)\,dy$$
Thus the new pdf for y, $p(y,\mu_y,\sigma_y)$, is also given by a Gaussian probability distribution function.
[Figure: dN/dy versus y for y = 2x with x = 10 ± 2, for which $\sigma_y = 2\sigma_x = 4$.]
Start with a Gaussian with $\mu = 10$, $\sigma = 2$.
Get another Gaussian with $\mu = 20$, $\sigma = 4$.
Error of Propagation of Errors
Example when the new distribution is non-Gaussian: Let y = 2/x.
The transformed probability distribution function for y does not have the form of a Gaussian pdf.
[Figure: dN/dy versus y for y = 2/x with x = 10 ± 2, for which $\sigma_y = 2\sigma_x/x^2$.]
Start with a Gaussian with $\mu = 10$, $\sigma = 2$.
DO NOT get another Gaussian!
Get a pdf with $\mu = 0.2$, $\sigma = 0.04$.
This new pdf has longer tails than a Gaussian pdf:
$\mathrm{Prob}(y > \mu_y + 5\sigma_y) = 5\times10^{-3}$, compared with $3\times10^{-7}$ for a Gaussian.
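The size of the tail can be estimated without a simulation, because y = 2/x exceeds a positive threshold exactly when x lies between 0 and 2/threshold. The sketch below is a rough check under that reasoning, using the Gaussian CDF built from `math.erfc`; it is not necessarily how the number on the slide was obtained, and the helper name `gauss_cdf` is introduced here.

```python
import math

def gauss_cdf(z):
    """Cumulative probability of a unit Gaussian below z."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

mu_x, sigma_x = 10.0, 2.0
mu_y = 2.0 / mu_x                      # 0.2, y evaluated at the mean of x
sigma_y = 2.0 * sigma_x / mu_x**2      # 0.04, from propagation of errors
threshold = mu_y + 5.0 * sigma_y       # 0.4

# y = 2/x > threshold  <=>  0 < x < 2/threshold (here x < 5, i.e. -2.5 sigma)
x_high = 2.0 / threshold
p_tail = gauss_cdf((x_high - mu_x) / sigma_x) - gauss_cdf((0.0 - mu_x) / sigma_x)

# What a true Gaussian in y would give for a one-sided 5 sigma excursion.
p_gauss = 0.5 * math.erfc(5.0 / math.sqrt(2.0))

print(f"Prob(y > mu_y + 5 sigma_y) for y = 2/x : {p_tail:.1e}")
print(f"same probability for a Gaussian in y   : {p_gauss:.1e}")
```

Under these assumptions the tail probability comes out at a few times $10^{-3}$, the same order as the value quoted above and about four orders of magnitude larger than the Gaussian expectation.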
Unphysical situations can arise if we use the propagation of errors results blindly!
Example: Suppose we measure the volume of a cylinder: $V = \pi R^2 L$.
Let R = 1 cm exact, and L = 1.0 ± 0.5 cm.
Using propagation of errors we have: $\sigma_V = \pi R^2\sigma_L = \pi/2$ cm³,
and $V = \pi \pm \pi/2$ cm³.
However, if the error on V ($\sigma_V$) is to be interpreted in the Gaussian sense then
the above result says that there's a non-zero probability (≈ 2.3%) that the volume (V) is < 0, since V is
only two standard deviations away from 0!
Clearly this is unphysical! Care must be taken in interpreting the meaning of $\sigma_V$.
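The probability of a negative volume under a literal Gaussian reading of $\sigma_V$ follows from the one-sided Gaussian tail at two standard deviations. A short sketch, with variable names chosen here for illustration:

```python
import math

R = 1.0          # cm, taken as exact
L = 1.0          # cm
sigma_L = 0.5    # cm

V = math.pi * R**2 * L
sigma_V = math.pi * R**2 * sigma_L   # dV/dL = pi * R^2
n_sigma = V / sigma_V                # distance of V from zero in units of sigma_V

# One-sided Gaussian probability of fluctuating below zero.
p_negative = 0.5 * math.erfc(n_sigma / math.sqrt(2.0))

print(f"V = {V:.3f} +/- {sigma_V:.3f} cm^3, {n_sigma:.1f} sigma from zero")
print(f"Gaussian probability of V < 0: {100 * p_negative:.1f}%")
```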
Generalization of Propagation of Errors
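We can generalize the propagation of errors formula:
$$\sigma_Q^2 = \sigma_x^2\left(\frac{\partial Q}{\partial x}\right)^2 + \sigma_y^2\left(\frac{\partial Q}{\partial y}\right)^2 + 2\sigma_{xy}\left(\frac{\partial Q}{\partial x}\right)\left(\frac{\partial Q}{\partial y}\right)$$
In matrix notation:
$$\sigma_Q^2 = \begin{pmatrix}\dfrac{\partial Q}{\partial x} & \dfrac{\partial Q}{\partial y}\end{pmatrix}\begin{pmatrix}\sigma_x^2 & \sigma_{xy}\\ \sigma_{xy} & \sigma_y^2\end{pmatrix}\begin{pmatrix}\dfrac{\partial Q}{\partial x}\\[2mm] \dfrac{\partial Q}{\partial y}\end{pmatrix}$$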
We can generalize to any number of variables:
$$\sigma^2 = d^T V d$$
with d an N-dimensional vector of derivatives
and V an N×N matrix of variances and covariances.
V is often called the "error matrix" or "covariance matrix".
V is a symmetric matrix (N×N and $V = V^T$).
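Example: Error in $\mathrm{BR}(\mathrm{Higgs}\to\mu^+\mu^-)$. Assume: $\mathrm{BR} = N_{fd}/(\epsilon N_{pr})$
$$\sigma_{BR}^2 = \begin{pmatrix}-\dfrac{N_{fd}}{\epsilon N_{pr}^2} & \dfrac{1}{\epsilon N_{pr}} & -\dfrac{N_{fd}}{\epsilon^2 N_{pr}}\end{pmatrix}\begin{pmatrix}\sigma_{N_{pr}}^2 & 0 & 0\\ 0 & \sigma_{N_{fd}}^2 & 0\\ 0 & 0 & \sigma_\epsilon^2\end{pmatrix}\begin{pmatrix}-\dfrac{N_{fd}}{\epsilon N_{pr}^2}\\[2mm] \dfrac{1}{\epsilon N_{pr}}\\[2mm] -\dfrac{N_{fd}}{\epsilon^2 N_{pr}}\end{pmatrix}$$
$$\sigma_{BR}^2 = \sigma_{N_{pr}}^2\left(\frac{\partial BR}{\partial N_{pr}}\right)^2 + \sigma_{N_{fd}}^2\left(\frac{\partial BR}{\partial N_{fd}}\right)^2 + \sigma_\epsilon^2\left(\frac{\partial BR}{\partial\epsilon}\right)^2 = \sigma_{N_{pr}}^2\left(\frac{N_{fd}}{\epsilon N_{pr}^2}\right)^2 + \sigma_{N_{fd}}^2\left(\frac{1}{\epsilon N_{pr}}\right)^2 + \sigma_\epsilon^2\left(\frac{N_{fd}}{\epsilon^2 N_{pr}}\right)^2$$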
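The matrix form lends itself to a generic implementation in which the derivative vector d is obtained numerically. The sketch below is one possible way to do this, using central finite differences; the helper names (`gradient`, `quadratic_form`, `branching_ratio`) and the step size are choices made here, and the inputs are the numbers of the Higgs example with a diagonal error matrix.

```python
def gradient(f, point, rel_step=1e-6):
    """Central-difference estimate of the derivative vector d of f at 'point'."""
    grad = []
    for k, value in enumerate(point):
        h = abs(value) * rel_step or rel_step   # fall back to rel_step if value is 0
        up, down = list(point), list(point)
        up[k] = value + h
        down[k] = value - h
        grad.append((f(*up) - f(*down)) / (2.0 * h))
    return grad

def quadratic_form(d, V):
    """sigma^2 = d^T V d for a derivative vector d and an error matrix V."""
    n = len(d)
    return sum(d[i] * V[i][j] * d[j] for i in range(n) for j in range(n))

def branching_ratio(n_prod, n_found, eff):
    return n_found / (eff * n_prod)

# Order of variables: (N_produced, N_found, efficiency).
point = [100.0, 10.0, 0.2]
V = [
    [10.0**2, 0.0,    0.0],       # sigma^2 of N_produced
    [0.0,     3.0**2, 0.0],       # sigma^2 of N_found
    [0.0,     0.0,    0.02**2],   # sigma^2 of the efficiency
]

d = gradient(branching_ratio, point)
sigma_br = quadratic_form(d, V) ** 0.5
print("derivative vector d:", [round(x, 4) for x in d])
print(f"sigma_BR = {sigma_br:.3f}")
```

Correlations between the inputs, if any were known, would simply be entered in the off-diagonal elements of `V`; nothing else in the calculation changes.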
A Real Life Example
We want to measure the branching fraction for $B^-\to D^0K^{*-}$.
We can measure it using three different decay modes of the $D^0$:
$D^0\to K\pi$, $D^0\to K\pi\pi^0$ and $D^0\to K3\pi$
For each mode k:
$$\mathcal{B}(B^-\to D^0_k K^{*-}) = \frac{N(D^0\to X_k)\,f_{K^*}}{\epsilon_k\,N_{B\bar B}\,\mathcal{B}(K^{*-})\,\mathcal{B}(D^0\to X_k)}$$

                                 D0 -> K pi     D0 -> K pi pi0   D0 -> K 3pi
ε, efficiency (%)                13.30          4.60             8.82
B(D0 -> X) (%)                   3.80           12.84            7.46
N, yield (events)                144.4 ± 13.2   185.4 ± 18.6     195.0 ± 18.2
B(B- -> D0 K*-) x 10^-4          5.15 ± 0.45    5.65 ± 0.54      5.34 ± 0.48

Statistical uncertainties only.
Also have to take into account systematic errors.
How should we combine the 3 measurements?
Summary of Systematic Errors
uncorrelated
standard recipes
study data & MC, vary cuts
data (B-D0-) Vs MC
lumi script
finite MC samples
PDG BF uncertainties
study data & MC, vary cuts
uncorrelated
correlated
6.6%
6.2%
3.1%
5.2%
4.7%
7.3%
Some sources of systematic errors are correlated, some are not.
Correlated errors wind up in the off-diagonal elements of the error matrix.
Since the 3 measurements have different precision we will do a weighted average.
But this is a bit tricky because we have statistical and systematic errors, and
some of the systematic errors are correlated.
$$\mathcal{B}(B^-\to D^0K^{*-}) = w_1\,\mathcal{B}(B^-\to D^0_{K\pi}K^{*-}) + w_2\,\mathcal{B}(B^-\to D^0_{K\pi\pi^0}K^{*-}) + w_3\,\mathcal{B}(B^-\to D^0_{K3\pi}K^{*-})$$
Follow the procedure outlined in Lyons et al., NIMA 270, 110 (1988).
We want to find the weights ($w_i$) that minimize $\sigma^2 = w^T V w$
subject to: $\sum_i w_i = 1$
(Here the derivative vector is just the weights, and V is a 3×3 symmetric matrix.)
Can solve this problem using the Lagrange multiplier technique:
$$\sigma^2 = w^T V w + \lambda(w^T I - 1), \quad\text{here } I \text{ is a vector of 1's.}$$
$$\frac{d\sigma^2}{dw} = 2Vw + \lambda I = 0 \;\Rightarrow\; w = -(\lambda/2)\,V^{-1}I$$
$$\frac{d\sigma^2}{d\lambda} = w^T I - 1 = 0 \;\Rightarrow\; w^T I = 1$$
Need to find the multiplier, $\lambda$. From the constraint equation we get:
$$Vw = -(\lambda/2)\,I$$
$$w^T V w = -(\lambda/2)\,w^T I = -(\lambda/2) \quad\text{(using the constraint)}$$
$$w^T V w = w^T V V^{-1} V w = (Vw)^T V^{-1} (Vw) = (-\lambda/2\; I)^T V^{-1} (-\lambda/2\; I) = (\lambda^2/4)\, I^T V^{-1} I$$
$$(\lambda^2/4)\, I^T V^{-1} I = -\lambda/2 \;\Rightarrow\; \lambda = \frac{-2}{I^T V^{-1} I}$$
Can now solve the problem:
$$w = \frac{V^{-1} I}{I^T V^{-1} I}$$
For the uncorrelated case the weights are the same as using:
$$\mu = \frac{x_1/\sigma_1^2 + x_2/\sigma_2^2 + \cdots + x_n/\sigma_n^2}{1/\sigma_1^2 + 1/\sigma_2^2 + \cdots + 1/\sigma_n^2}$$
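The solution $w = V^{-1}I/(I^T V^{-1} I)$ can be sketched in plain Python by solving $Vx = I$ with Gaussian elimination and normalizing. In the sketch below the diagonal entries use the statistical errors of the three modes quoted earlier (in units of $10^{-4}$), while the correlated contributions in `corr` are invented numbers used only to show how off-diagonal terms change the weights; the function names are also introduced here.

```python
def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    a = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n + 1):
                a[row][k] -= factor * a[col][k]
    x = [0.0] * n
    for row in reversed(range(n)):
        tail = sum(a[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (a[row][n] - tail) / a[row][row]
    return x

def min_variance_weights(V):
    """w = V^-1 1 / (1^T V^-1 1): minimizes w^T V w subject to sum(w) = 1."""
    v_inv_ones = solve(V, [1.0] * len(V))
    norm = sum(v_inv_ones)
    return [v / norm for v in v_inv_ones]

def variance(w, V):
    n = len(w)
    return sum(w[i] * V[i][j] * w[j] for i in range(n) for j in range(n))

# Uncorrelated case: statistical errors of the three modes (units of 1e-4).
stat = [0.45, 0.54, 0.48]
V_uncorr = [[stat[i]**2 if i == j else 0.0 for j in range(3)] for i in range(3)]
w_uncorr = min_variance_weights(V_uncorr)

# The same weights from the simple 1/sigma^2 recipe.
inv_var = [1.0 / s**2 for s in stat]
w_simple = [v / sum(inv_var) for v in inv_var]

# HYPOTHETICAL correlated errors per mode, added as corr_i * corr_j.
corr = [0.1, 0.3, 0.2]
V_corr = [[V_uncorr[i][j] + corr[i] * corr[j] for j in range(3)] for i in range(3)]
w_corr = min_variance_weights(V_corr)

print("uncorrelated weights:", [round(w, 3) for w in w_uncorr])
print("correlated weights  :", [round(w, 3) for w in w_corr])
```

With the made-up correlated terms, weight shifts away from the mode with the largest correlated error, and the resulting variance is no larger than what the naive $1/\sigma^2$ weights would give for the same matrix.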
Branching Fraction Averaging Procedure
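The weights ($w_1, w_2, w_3$) are calculated using the error matrix, V:
$$V = V_{\rm statistics} + V_{\rm systematics}$$
$$V = \begin{pmatrix}\sigma^2_{st,K\pi} & 0 & 0\\ 0 & \sigma^2_{st,K\pi\pi^0} & 0\\ 0 & 0 & \sigma^2_{st,K3\pi}\end{pmatrix} + \begin{pmatrix}\sigma^2_{syTOT,K\pi} & \sigma_{syC,K\pi}\cdot\sigma_{syC,K\pi\pi^0} & \sigma_{syC,K\pi}\cdot\sigma_{syC,K3\pi}\\ \sigma_{syC,K\pi}\cdot\sigma_{syC,K\pi\pi^0} & \sigma^2_{syTOT,K\pi\pi^0} & \sigma_{syC,K3\pi}\cdot\sigma_{syC,K\pi\pi^0}\\ \sigma_{syC,K3\pi}\cdot\sigma_{syC,K\pi} & \sigma_{syC,K3\pi}\cdot\sigma_{syC,K\pi\pi^0} & \sigma^2_{syTOT,K3\pi}\end{pmatrix}$$
st = statistics, syTOT = total systematics, syC = correlated systematics
Note: an off-diagonal element is a "dot" product of the correlated errors of two of the 3 modes, e.g.:
$$\sigma_{syC,K\pi}\cdot\sigma_{syC,K\pi\pi^0} = \sigma_{syC1,K\pi}\,\sigma_{syC1,K\pi\pi^0} + \sigma_{syC2,K\pi}\,\sigma_{syC2,K\pi\pi^0} + \cdots$$
where 1 = tracking efficiency, 2 = particle ID, etc.
Calculate weights (u is a vector of 1's):
$$w = \frac{V^{-1}u}{u^T V^{-1} u} = (0.506,\ 0.216,\ 0.278)$$
Calculate variances (in units of $10^{-8}$):
$$\sigma_{st}^2 = w^T V_{\rm statistics}\, w = 0.09$$
$$\sigma_{sy}^2 = w^T V_{\rm systematics}\, w = 0.119$$
$$\mathcal{B}(B^-\to D^0K^{*-}) = (5.29 \pm 0.30 \pm 0.34)\times10^{-4}$$
published in PRD 73, 111104(R) (2006)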