Lecture 9: Sampling (from a decision perspective)

Sampling issues in forensic casework
• When the population (seizure, consignment) is too large to be analysed in its entirety
  – because of limitations in time and/or resources (personnel, money)
• When the analysis of a single unit means destruction
• When seizing is equivalent to sampling (Isn’t it always?)
• When the population is “infinite”
What is sampled?
• Drugs (pills, plastic bags, capsules, phials)
• Bank notes and coins
• CD-ROMs
• Crops (suspected cannabis)
• Individuals
• Glass
• Fibres
• Clothes
Objectives of sampling
• To do forensic analysis in particular cases
• To establish data bases (reference material) for use with evidence evaluation
• For quality assurance reasons
Two “general” cases
1. The population is expected to be (large and) heterogeneous
• Difficult to make prior assumptions about population parameters
• Sample size must usually be large (…to reflect the heterogeneity)
⇒ Normal approximations are valid and sample size determination can be done with the “frequentist” approach
2. The population is expected to be homogeneous
• Easier to make prior assumptions about population parameters
• Sample size need not be large (e.g. if we are 100 % certain that all elements in the population are of the same kind, we only need to sample one unit)
• A Bayesian approach to sample size determination is more attractive
The heterogeneous case
• Undesirable for forensic analysis in particular cases
• Expected when data bases are to be established
Sampling of individuals, glass, fibres etc.
Should be carried out with careful use of knowledge from survey theory:
• Comparison of frame population with true population
• Choice of sampling design (simple random sampling, stratified sampling, cluster sampling, …)
• Efficient prevention and post-handling of non-response
The homogeneous case
• Main “Objective” of the current presentation
• Required for efficient sampling in daily case-work
Sampling of drug pills, bank notes, CD-ROMs etc. for further analysis
General desire: To keep the sample size very small (5-10 units)
Sampling under experimental conditions for inference about proportions
General desire: To keep the sample size as small as possible
Some examples from drug sampling
1. Homogeneity expected from visual inspection and experience
Consider a case with a seizure of 5000 pills, all of the same colour (blue), form
(circular) and printing (e.g. the Mitsubishi trade mark)
The forensic scientist would say “this is a seizure of Ecstasy pills”
So, what do we know about blue pills (supposed to be Ecstasy)?
Consider historical cases with blue pills
Group the cases into M clusters with respect to another parameter, e.g. the print
on the pill.
Find an estimate of the prior distribution for the proportion $\theta$ of Ecstasy pills among blue pills.
Nordgaard A. (2006) Quantifying experience in sample size determination for drug
analysis of seized drugs. Law, Probability and Risk 4: 217-225
Cluster   Accumulated size of seizure   Accumulated size of sample   Number of Ecstasy pills   Number of non-Ecstasy pills
1         $N_1$                         $n_1$                        $x_1$                     $n_1 - x_1$
2         $N_2$                         $n_2$                        $x_2$                     $n_2 - x_2$
…         …                             …                            …                         …
M         $N_M$                         $n_M$                        $x_M$                     $n_M - x_M$
Use a generic beta prior for the proportion $\theta$ of Ecstasy pills in the current seizure:
$$ f(\theta \mid \alpha_1, \alpha_2) = \frac{\theta^{\alpha_1 - 1}\,(1-\theta)^{\alpha_2 - 1}}{B(\alpha_1, \alpha_2)} ;\quad 0 \le \theta \le 1 $$
Use the grouped data to estimate the parameters $\alpha_1$ and $\alpha_2$ of this beta prior.
This can be done by the maximum likelihood method, using that the probability of obtaining $x_i$ Ecstasy pills in cluster $i$ is given by the hypergeometric distribution
$$ P(x_i) = \frac{\dbinom{\lfloor N_i \theta \rfloor}{x_i}\,\dbinom{\lfloor N_i (1-\theta) \rfloor}{n_i - x_i}}{\dbinom{N_i}{n_i}} $$
where “$\lfloor\ \rfloor$” stands for rounding downwards to the nearest integer.
The likelihood function is thus
$$ L(x) = \prod_{i=1}^{M} P(x_i) $$
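The likelihood above can be sketched in code under some loudly stated assumptions that are not in the slides: the proportion in each cluster is treated as a draw from the Beta($\alpha_1$, $\alpha_2$) prior, so that each cluster contributes a marginal probability; the number of non-Ecstasy pills is taken as $N_i - \lfloor N_i\theta\rfloor$ so that the counts add up; and the cluster data are hypothetical toy values. The helper names (`beta_cdf`, `marginal_prob`, `log_likelihood`) are illustrative only.

```python
import math

def _betacf(a, b, x, max_iter=500, eps=1e-15, tiny=1e-300):
    """Continued fraction of the incomplete beta function (modified Lentz)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def beta_cdf(x, a, b):
    """CDF of Beta(a, b) at x (regularised incomplete beta function)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b

def hypergeom_pmf(x, N, K, n):
    """P(x Ecstasy pills in a sample of n from N pills of which K are Ecstasy)."""
    if x < 0 or x > n:
        return 0.0
    return math.comb(K, x) * math.comb(N - K, n - x) / math.comb(N, n)

def marginal_prob(x, N, n, a1, a2):
    """Hypergeometric probability averaged over theta ~ Beta(a1, a2).

    floor(N * theta) equals k exactly when k/N <= theta < (k+1)/N, so the
    integral over theta reduces to a finite sum over k.
    """
    cdf = [beta_cdf(k / N, a1, a2) for k in range(N + 1)]
    return sum((cdf[k + 1] - cdf[k]) * hypergeom_pmf(x, N, k, n)
               for k in range(N))

def log_likelihood(clusters, a1, a2):
    """Log of the product over clusters; clusters holds (N_i, n_i, x_i)."""
    return sum(math.log(marginal_prob(x, N, n, a1, a2)) for N, n, x in clusters)

# Hypothetical toy clusters (N_i, n_i, x_i); NOT data from Nordgaard (2006).
clusters = [(200, 5, 5), (100, 4, 4), (300, 6, 0)]
print(log_likelihood(clusters, 0.075, 0.224))
```

Maximising `log_likelihood` over the two parameters (for instance with a simple grid search) would then give point estimates; the optimiser is left out to keep the sketch short.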
The obtained point estimates of $\alpha_1$ and $\alpha_2$ can be assessed with respect to bias and variance using bootstrap resampling.
In Nordgaard (2006), the original point estimates of $\alpha_1$ and $\alpha_2$ for historical cases of blue pills at SKL (now NFC) are
$$ \hat{\alpha}_1 = 0.075 \quad\text{and}\quad \hat{\alpha}_2 = 0.224 $$
Bias-adjusted estimates are
$$ \hat{\alpha}_1^{*} = 0.038 \quad\text{and}\quad \hat{\alpha}_2^{*} = 0.133 $$
and upper 90% confidence limits for the true values of $\alpha_1$ and $\alpha_2$ are
$$ \alpha_1 < 0.062 \quad\text{and}\quad \alpha_2 < 0.262 $$
Now, assume the forthcoming sample of n units will consist entirely of Ecstasy pills. (Otherwise the case will be considered “non-standard”.)
The sample size is determined so that the posterior probability of $\theta$ being higher than a certain proportion, say 50 %, is at least, say, 99% (referred to as 99% credibility).
For large seizures the posterior distribution of $\theta$, given that all n sample units consist of Ecstasy, is also beta:
$$ f(\theta \mid n, \alpha_1, \alpha_2) = \frac{\theta^{\alpha_1 + n - 1}\,(1-\theta)^{\alpha_2 - 1}}{B(\alpha_1 + n, \alpha_2)} ;\quad 0 \le \theta \le 1 $$
Thus we solve for n
$$ \int_{0.50}^{1} f(\theta \mid n, \alpha_1, \alpha_2)\, d\theta \ge 0.99 \iff \int_{0.50}^{1} \frac{\theta^{\alpha_1 + n - 1}\,(1-\theta)^{\alpha_2 - 1}}{B(\alpha_1 + n, \alpha_2)}\, d\theta \ge 0.99 $$
where $\alpha_1$ and $\alpha_2$ are replaced by their (adjusted) point estimates or upper confidence limits.
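A minimal sketch of this search, assuming the large-seizure Beta($\alpha_1 + n$, $\alpha_2$) posterior stated above. The function names are hypothetical, and the beta CDF is computed with a standard continued-fraction routine so that only the Python standard library is needed.

```python
import math

def _betacf(a, b, x, max_iter=500, eps=1e-15, tiny=1e-300):
    """Continued fraction of the incomplete beta function (modified Lentz)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def beta_cdf(x, a, b):
    """CDF of Beta(a, b) at x (regularised incomplete beta function)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b

def prob_theta_above(p0, n, a1, a2):
    """Posterior P(theta > p0) when all n sampled units are Ecstasy."""
    return 1.0 - beta_cdf(p0, a1 + n, a2)

def min_sample_size(a1, a2, p0=0.5, cred=0.99, n_max=1000):
    """Smallest n whose posterior probability reaches the credibility level."""
    for n in range(1, n_max + 1):
        if prob_theta_above(p0, n, a1, a2) >= cred:
            return n
    raise ValueError("no sample size up to n_max meets the requirement")

# Bias-adjusted point estimates and upper confidence limits from the slides.
for a1, a2 in [(0.038, 0.133), (0.062, 0.262)]:
    n = min_sample_size(a1, a2)
    print(a1, a2, n, prob_theta_above(0.5, n, a1, a2))
```

Raising `p0` shows how the required sample size grows when a higher proportion is to be stated with the same credibility.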
For the above case we find that with the bias-adjusted point estimates
$$ \hat{\alpha}_1^{*} = 0.038 \quad\text{and}\quad \hat{\alpha}_2^{*} = 0.133 $$
the required sample size is at least 3, and with the upper confidence limits used instead (i.e. with 0.062 and 0.262) the required sample size is at least 4.
There are in general no large differences between different choices of estimated parameters, nor between different colours of Ecstasy pills.
A general sampling rule of n = 5 can therefore be used to state with 99% credibility that at least 50% of the seizure consists of Ecstasy pills. For a higher proportion, a sample size around 12 appears to be satisfactory.
For smaller seizures it is wiser to rephrase the requirement in terms of the number of Ecstasy units in the non-sampled part of the seizure.
The posterior beta distribution is then replaced with a beta-binomial distribution.
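As a rough illustration of the beta-binomial rephrasing: if all n sampled units are Ecstasy and the posterior for the proportion is Beta($\alpha_1 + n$, $\alpha_2$), the number of Ecstasy units among the $N - n$ non-sampled units follows a beta-binomial distribution under the usual exchangeability model. The seizure size, the "at least half of the remaining units" requirement and all function names below are assumptions made for the sketch, not values from the slides.

```python
import math

def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def betabinom_pmf(y, m, a, b):
    """P(Y = y) for Y ~ BetaBinomial(m, a, b)."""
    log_comb = math.lgamma(m + 1) - math.lgamma(y + 1) - math.lgamma(m - y + 1)
    return math.exp(log_comb + log_beta(y + a, m - y + b) - log_beta(a, b))

def prob_at_least(k, m, a, b):
    """P(Y >= k) for Y ~ BetaBinomial(m, a, b)."""
    return sum(betabinom_pmf(y, m, a, b) for y in range(k, m + 1))

def min_sample_size_small(N, a1, a2, frac=0.5, cred=0.99):
    """Smallest n such that, if all n sampled units are Ecstasy, at least a
    fraction `frac` of the N - n remaining units are Ecstasy with posterior
    probability >= cred. Returns N if only a full analysis suffices."""
    for n in range(1, N):
        m = N - n                      # units left unsampled
        k = math.ceil(frac * m)        # required number of Ecstasy units
        if prob_at_least(k, m, a1 + n, a2) >= cred:
            return n
    return N

# Hypothetical small seizure of 20 pills with the bias-adjusted estimates.
print(min_sample_size_small(20, 0.038, 0.133))
```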
2. Homogeneity stated upon inspection only
Consider now a case with a (large) seizure of drug pills of which the forensic
scientist cannot directly suspect the contents.
Visual inspection ⇒ All pills seem to be identical
Can we substitute the “experience” from the Ecstasy case?
UV lighting
Pills can be inspected under UV light.
The fluorescence differs between pills with different chemical composition, and looking at a number of pills under UV light would thus reveal heterogeneity (to the greatest extent).
Uncertainty of this procedure lies mainly with the person who does the inspection
⇒ Experiment required!
Assume a prior $g(\theta)$ for the proportion of pills in the seizure that contain a certain (but possibly unknown) illicit drug.
For the sake of simplicity, assume that pills may be of two kinds (the illicit drug or another substance).
Let Y be a random variable associated with the inspection such that
$$ Y = \begin{cases} 0 & \text{if inspection gives “all pills are identical”} \\ 1 & \text{if inspection gives “differences among pills”} \end{cases} $$
Relevant case is Y = 0
(Otherwise the result of the UV-inspection has rejected the assumption of homogeneity.)
Now,
$$ P(Y = 0 \mid \theta) \quad\text{for } 0 < \theta < 1 $$
is the false positive probability as a function of $\theta$ (if a positive result means that no heterogeneity is detected)
while
$$ P(Y = 0 \mid \theta = 0) = P(Y = 0 \mid \theta = 1) $$
is the true positive probability.
The prior g can be updated using this information (when available):
$$ h(\theta \mid Y = 0) = \frac{\Pr(Y = 0 \mid \theta)\, g(\theta)}{\int_0^1 \Pr(Y = 0 \mid \lambda)\, g(\lambda)\, d\lambda} $$
Note that a non-informative prior (i.e. $g(\theta) \equiv 1$; $0 \le \theta \le 1$) can be used.
The updated prior (i.e. the posterior upon UV-inspection) can then be used analogously to the previous case (Ecstasy).
Example Experiment (conducted at SKL (now NFC))
• 8 types of pills with different substances were used to form 9 different mixtures (i.e. of two proportions) of 2 types of pills
• Each mixture was prepared by randomly shuffling 100 pills with the current proportions on a tray that was put under UV light
• 10 case-workers made inspections in random order such that a total of 114 to 117 inspections were made for each mixture
Data can be illustrated by plotting estimated probabilities for Y = 0 vs. $\theta$.
[Figure: estimated probabilities for Y = 0 plotted against $\theta$; both axes run from 0 to 1]
Linear interpolation gives
$$ \hat{P}(Y = 0 \mid \theta) = \varphi(\theta) = \begin{cases} 0.97 - 48.5\,\theta & 0 \le \theta \le 0.02 \\ 0.005 - 0.024\,\theta & 0.02 \le \theta \le 0.20 \\ 0 & 0.20 \le \theta \le 0.80 \\ -0.019 + 0.024\,\theta & 0.80 \le \theta \le 0.98 \\ -47.5 + 48.5\,\theta & 0.98 \le \theta \le 1 \end{cases} $$
To avoid the vertices at $\theta$ = 0.02, 0.20, 0.80 and 0.98, the linearly interpolated values $\varphi(\theta)$ are smoothed using a kernel function:
$$ \tilde{\varphi}(\theta) = \int_0^1 K(\theta - \lambda)\, \varphi(\lambda)\, d\lambda $$
where K(x) is a symmetric function integrating to one over its support.
[Figure: the smoothed function for $\theta$ between 0.00 and 0.05; vertical axis from 0 to 1]
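The smoothing step can be sketched as follows. The slides do not say which kernel or bandwidth was used, so the Epanechnikov kernel and the bandwidth `h = 0.01` below are assumptions chosen purely for illustration. The piecewise-linear function is the interpolation given earlier, read with a leading minus sign in its last two pieces so that the probabilities stay non-negative.

```python
def phi(theta):
    """Piecewise-linear estimate of P(Y = 0 | theta) from the experiment."""
    if theta < 0.02:
        return 0.97 - 48.5 * theta
    if theta < 0.20:
        return 0.005 - 0.024 * theta
    if theta <= 0.80:
        return 0.0
    if theta <= 0.98:
        return -0.019 + 0.024 * theta
    return -47.5 + 48.5 * theta

def epanechnikov(u, h):
    """Symmetric kernel with support (-h, h) that integrates to one."""
    if abs(u) >= h:
        return 0.0
    return 0.75 * (1.0 - (u / h) ** 2) / h

def phi_smooth(theta, h=0.01, steps=400):
    """Midpoint-rule approximation of the integral of K(theta - lam) * phi(lam)
    over lam in [0, 1]; only the part of [0, 1] inside the kernel's support
    contributes."""
    lo, hi = max(0.0, theta - h), min(1.0, theta + h)
    width = (hi - lo) / steps
    total = 0.0
    for j in range(steps):
        lam = lo + (j + 0.5) * width
        total += epanechnikov(theta - lam, h) * phi(lam)
    return total * width

for t in (0.0, 0.01, 0.02, 0.03, 0.20, 0.50):
    print(t, phi(t), phi_smooth(t))
```

Near 0 and 1 part of the kernel falls outside [0, 1], so this plain convolution pulls the smoothed values down at the boundaries; a boundary correction may be wanted in practice.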
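Now, the prior can be updated using this smoothed function, $\tilde{\varphi}(\theta)$, as an estimate of $\Pr(Y = 0 \mid \theta)$, i.e.
$$ h(\theta \mid Y = 0) = \frac{\tilde{\varphi}(\theta)\, g(\theta)}{\int_0^1 \tilde{\varphi}(\lambda)\, g(\lambda)\, d\lambda} $$
(With a non-informative prior g, this simplifies into
$$ h(\theta \mid Y = 0) = \frac{\tilde{\varphi}(\theta)}{\int_0^1 \tilde{\varphi}(\lambda)\, d\lambda} $$
)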
Comparison of the non-informative prior and the updated prior
[Figure: densities g (non-informative prior) and h (updated prior) plotted against $\theta$ from 0 to 1; vertical axis from 0 to 30]
Now, let x be the number of illicit drug pills found in a sample of n pills.
Analogously with the Ecstasy case, n should be determined so that if x = n, a 99% credible lower limit for $\theta$ is 50% (or even higher).
With the updated prior derived, the following table of posterior probabilities is obtained.

n     $\Pr(\theta > 0.5 \mid x = n, Y = 0)$
3     0.99996032237
4     0.99999475894
5     0.99999924614
6     0.99999988597
7     0.99999998211
8     0.99999999711
9     0.99999999952
10    0.99999999992

Thus, a sample size of n = 3 units is satisfactory.
Slightly higher values may be recommended due to the limits of the experiment.
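A hedged sketch of how such posterior probabilities can be computed. It assumes a non-informative prior, a large seizure (so the likelihood of x = n is proportional to $\theta^n$), and, to keep the integrals exact, the unsmoothed piecewise-linear estimate of $\Pr(Y = 0 \mid \theta)$ rather than the kernel-smoothed one. The last two pieces are read with a leading minus sign so that the probabilities stay non-negative. The output is therefore expected to be close to the table but not to reproduce it digit for digit.

```python
# Each piece is (lower limit, upper limit, intercept a, slope b) of the
# piecewise-linear estimate of P(Y = 0 | theta); the zero piece on
# [0.20, 0.80] contributes nothing and is omitted.
PIECES = [
    (0.00, 0.02, 0.97, -48.5),
    (0.02, 0.20, 0.005, -0.024),
    (0.80, 0.98, -0.019, 0.024),
    (0.98, 1.00, -47.5, 48.5),
]

def piece_integral(lo, hi, a, b, n):
    """Exact integral of theta**n * (a + b * theta) from lo to hi."""
    def antiderivative(t):
        return a * t ** (n + 1) / (n + 1) + b * t ** (n + 2) / (n + 2)
    return antiderivative(hi) - antiderivative(lo)

def prob_theta_above_half(n):
    """Pr(theta > 0.5 | x = n, Y = 0) under a flat prior g."""
    lower = sum(piece_integral(*p, n) for p in PIECES if p[1] <= 0.5)
    upper = sum(piece_integral(*p, n) for p in PIECES if p[0] >= 0.5)
    return 1.0 - lower / (lower + upper)

for n in range(3, 11):
    print(n, f"{prob_theta_above_half(n):.11f}")
```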
The decision-theoretic approach
For any statistical problem that is defined as a decision problem, there is a loss
function (or utility function):
$$ L_S(d, \theta) $$
where d is the decision (action) taken and $\theta$ denotes the state of nature (often a parameter with unknown value).
The Bayes decision is the decision that minimises the expected loss (prior or
posterior depending on if data is available or not).
If data is available, data is assumed represented by a (random) sample of size n,
and the Bayes decision minimises the expected posterior loss.
How can the choice of sample size be defined as a decision problem?
“Extend” the loss function so that it includes the sample size:
$$ L_0(d, \theta, n) = L_S(d, \theta) + C(n) $$
where C(n) is the cost, given in the same units as $L_S$, of obtaining a sample of size n.
An additive composition is assumed since this is the most natural choice.
The expected posterior loss when $\theta$ is a parameter with parameter space $\Theta$ is then
$$ \bar{L}(d, q(\theta \mid x, n), n) = \int_{\Theta} L_S(d, \theta)\, q(\theta \mid x, n)\, d\theta + C(n) $$
where $q(\theta \mid x, n)$ is the posterior density of $\theta$ given a sample x of size n (i.e. $x = (x_1, \ldots, x_n)$).
When the states of nature form a countable set of distinct states ($H_1, H_2, \ldots$), the expected posterior loss is
$$ \bar{L}(d, q(H \mid x, n), n) = \sum_{i \ge 1} L_S(d, H_i)\, \Pr(H_i \mid x, n) + C(n) $$
with $q(H \mid x, n)$ being the posterior probability mass function.
Now, the Bayes decision will depend on n not only via the posterior density
(that depends on n) but also via the cost function C(n).
Let
$$ r(q(n), n) = \min_{d \in D} \bar{L}(d, q(\theta \mid x, n), n) $$
This is the Bayes risk and the minimisation is here over the decision space D at a certain sample size n.
The optimal sample size is then obtained by minimising the Bayes risk with respect to n:
$$ n_{opt} = \arg\min_{n \in \mathbb{N}} r(q(n), n) $$
Example: Return to the examples with illicit pills
Assume we should make a decision on whether the proportion, $\theta$, of Ecstasy pills in a seizure of 1000 pills is less than or at least 50 %.
The decision space is D = {$d_1$ = “$\theta$ < 50 %”, $d_2$ = “$\theta \ge$ 50 %”}
Assume a “0–$k_i$” loss function as

          $\theta$ < 50 %    $\theta \ge$ 50 %
$d_1$     0                  1
$d_2$     10                 0

Assume a prior distribution of $\theta$ as Beta($\alpha_1$, $\alpha_2$) with $\alpha_1$ = 0.038 and $\alpha_2$ = 0.133.
The number of Ecstasy pills in a sample of n pills is Bin(n, $\theta$) distributed.
Pre-assuming the sample to be completely homogeneous, i.e. either all are Ecstasy pills or all are non-Ecstasy pills, gives the posterior distribution as one of
Beta($\alpha_1 + n$, $\alpha_2$) [all are Ecstasy] and Beta($\alpha_1$, $\alpha_2 + n$) [all are non-Ecstasy]
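With Beta($\alpha_1 + n$, $\alpha_2$) as posterior (all n sampled pills are Ecstasy, x = n), the expected posterior losses, before the sampling cost C(n) is added, are
$$ \bar{L}_1(d_1, q(\theta \mid n, n), n) = 0 \cdot \Pr(\theta < 0.5 \mid n, n) + 1 \cdot \Pr(\theta \ge 0.5 \mid n, n) = \int_{0.5}^{1} \frac{\theta^{0.038 + n - 1}\,(1-\theta)^{0.133 - 1}}{B(0.038 + n,\, 0.133)}\, d\theta $$
$$ \bar{L}_1(d_2, q(\theta \mid n, n), n) = 10 \cdot \Pr(\theta < 0.5 \mid n, n) + 0 \cdot \Pr(\theta \ge 0.5 \mid n, n) = 10 \int_{0}^{0.5} \frac{\theta^{0.038 + n - 1}\,(1-\theta)^{0.133 - 1}}{B(0.038 + n,\, 0.133)}\, d\theta $$
With Beta($\alpha_1$, $\alpha_2 + n$) as posterior (all n sampled pills are non-Ecstasy, x = 0), the expected posterior losses, again before C(n) is added, are
$$ \bar{L}_2(d_1, q(\theta \mid 0, n), n) = 0 \cdot \Pr(\theta < 0.5 \mid 0, n) + 1 \cdot \Pr(\theta \ge 0.5 \mid 0, n) = \int_{0.5}^{1} \frac{\theta^{0.038 - 1}\,(1-\theta)^{0.133 + n - 1}}{B(0.038,\, 0.133 + n)}\, d\theta $$
$$ \bar{L}_2(d_2, q(\theta \mid 0, n), n) = 10 \cdot \Pr(\theta < 0.5 \mid 0, n) + 0 \cdot \Pr(\theta \ge 0.5 \mid 0, n) = 10 \int_{0}^{0.5} \frac{\theta^{0.038 - 1}\,(1-\theta)^{0.133 + n - 1}}{B(0.038,\, 0.133 + n)}\, d\theta $$
How would we obtain the optimal sample size?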
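One possible way to approach the question numerically is sketched below. The four expected posterior losses follow the integrals above. Everything needed to turn them into a sample-size choice is an assumption made for illustration, since the slides leave it open: a linear cost C(n) = c·n with a hypothetical c = 0.001, and weighting the two homogeneous outcomes by their prior predictive probabilities, renormalised to sum to one.

```python
import math

def _betacf(a, b, x, max_iter=500, eps=1e-15, tiny=1e-300):
    """Continued fraction of the incomplete beta function (modified Lentz)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def beta_cdf(x, a, b):
    """CDF of Beta(a, b) at x (regularised incomplete beta function)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b

def log_beta(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def expected_losses(n, all_ecstasy, a1=0.038, a2=0.133, k1=1.0, k2=10.0):
    """Expected posterior losses of d1 and d2, excluding the cost C(n)."""
    a, b = (a1 + n, a2) if all_ecstasy else (a1, a2 + n)
    p_below = beta_cdf(0.5, a, b)          # Pr(theta < 0.5 | data)
    return {"d1": k1 * (1.0 - p_below), "d2": k2 * p_below}

def bayes_decision(n, all_ecstasy):
    losses = expected_losses(n, all_ecstasy)
    return min(losses, key=losses.get)

def bayes_risk(n, a1=0.038, a2=0.133, cost_per_unit=0.001):
    """ASSUMPTION: outcome weights are renormalised prior predictive
    probabilities and the cost is linear in n; neither is from the slides."""
    w_e = math.exp(log_beta(a1 + n, a2) - log_beta(a1, a2))   # all Ecstasy
    w_n = math.exp(log_beta(a1, a2 + n) - log_beta(a1, a2))   # none Ecstasy
    w_e, w_n = w_e / (w_e + w_n), w_n / (w_e + w_n)
    risk = (w_e * min(expected_losses(n, True, a1, a2).values())
            + w_n * min(expected_losses(n, False, a1, a2).values()))
    return risk + cost_per_unit * n

n_opt = min(range(1, 31), key=bayes_risk)
print(n_opt, bayes_risk(n_opt))
```

Changing `cost_per_unit` shows the trade-off directly: a higher cost per analysed unit pushes the minimising n down, a lower one pushes it up.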