Lecture 5: Decision and inference, Part 2

Influence diagrams
Decision-theoretic components can be added to a Bayesian network. The
complete network is then referred to as a Bayesian Decision Network or, more
commonly, an Influence Diagram (ID).
Return to the example with banknotes. Let

  H0: Dye is present
  H1: Dye is not present          (states of nature)

and let

  E1: Method gives positive detection
  E2: Method gives negative detection    (data)
A simple Bayesian network can be constructed representing the relationship
between the state of nature and the data:
Node H, probabilities P(H):

         H0      H1
        0.001   0.999

Node E, probabilities P(E | H):

         H0      H1
  E1    0.99    0.02
  E2    0.01    0.98
Now we will add two nodes to the network: one for the actions that can be
taken and one for the loss function.
Node A, actions:

  a1: Destroy banknote
  a2: Use banknote

Node L, losses L(a, H):

         H0      H1
  a1      0     100
  a2    500       0
Neither of these nodes is a random node.
Node L must be a child node with nodes H and A as parents.
[Network diagram: H → E, H → L, A → L]
With this network, an influence diagram, we would like to be able to propagate
data from node E to a choice of decision in node A.
Hence, the expected posterior loss should be calculated in the loss node L.
Using GeNIe:

In GeNIe (and other software) this node is by definition a utility node, but it
can be used as a loss node by representing losses as negative utilities.

- Run the network.
- Instantiate node E to E1.
- The expected posterior utility (negative loss) can be read off in node A
  (and also in node L).
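To make the computation concrete, here is a minimal sketch, in plain Python
rather than GeNIe, of what the software computes for this network: the
posterior P(H | E) by Bayes' theorem, then the expected posterior loss of each
action. All numbers come from the tables above; the function name is ours.

```python
prior = {"H0": 0.001, "H1": 0.999}                     # P(H), from the table
likelihood = {("E1", "H0"): 0.99, ("E1", "H1"): 0.02,  # P(E | H)
              ("E2", "H0"): 0.01, ("E2", "H1"): 0.98}
loss = {("a1", "H0"): 0,   ("a1", "H1"): 100,          # L(a, H)
        ("a2", "H0"): 500, ("a2", "H1"): 0}

def expected_posterior_losses(e):
    """Instantiate node E to outcome e; return the expected loss of each action."""
    joint = {h: likelihood[(e, h)] * prior[h] for h in prior}   # P(E=e, H)
    p_e = sum(joint.values())                                   # P(E=e)
    posterior = {h: joint[h] / p_e for h in joint}              # P(H | E=e)
    return {a: sum(loss[(a, h)] * posterior[h] for h in posterior)
            for a in ("a1", "a2")}

losses = expected_posterior_losses("E1")
print(losses)                         # expected posterior loss per action
print(min(losses, key=losses.get))    # Bayes action: minimal expected loss
```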
About prior distributions
Conjugate prior distributions
Example: Assume the parameter of interest is θ, the proportion of some property of
interest in the population (i.e. the probability for this property to occur).

A reasonable prior density for θ is the Beta density:
\[
p(\theta;\alpha,\beta) = \frac{\theta^{\alpha-1}(1-\theta)^{\beta-1}}{B(\alpha,\beta)},
\qquad 0 \le \theta \le 1;\ \alpha > 0,\ \beta > 0 \ \text{(hyperparameters)}
\]
where
\[
B(\alpha,\beta) = \int_0^1 x^{\alpha-1}(1-x)^{\beta-1}\,dx
= \frac{\Gamma(\alpha)\,\Gamma(\beta)}{\Gamma(\alpha+\beta)}
\qquad \text{(Beta function)}
\]
[Figure: Beta densities on 0 ≤ θ ≤ 1 for (α, β) = (1,1), (5,5), (1,5), (5,1),
(2,5), (5,2), (0.5,0.5), (0.3,0.7), (0.7,0.3)]
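A short sketch, assuming SciPy and Matplotlib are available, that reproduces a
figure like the one above:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import beta

theta = np.linspace(0.001, 0.999, 500)   # avoid the endpoints, where some densities diverge
params = [(1, 1), (5, 5), (1, 5), (5, 1), (2, 5), (5, 2),
          (0.5, 0.5), (0.3, 0.7), (0.7, 0.3)]
for a, b in params:
    plt.plot(theta, beta.pdf(theta, a, b), label=f"Beta({a},{b})")
plt.xlabel(r"$\theta$")
plt.ylabel("density")
plt.legend()
plt.show()
```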
Now, assume a sample of size n from the population in which y of the values
possess the property of interest.
The likelihood is
\[
L(\theta;y) = \binom{n}{y}\theta^{y}(1-\theta)^{n-y}
\]
and the posterior density becomes
\[
q(\theta \mid y) = \frac{L(\theta \mid y)\,p(\theta)}{\int_0^1 L(x \mid y)\,p(x)\,dx}
= \frac{\binom{n}{y}\theta^{y}(1-\theta)^{n-y}\,\dfrac{\theta^{\alpha-1}(1-\theta)^{\beta-1}}{B(\alpha,\beta)}}
       {\int_0^1 \binom{n}{y}x^{y}(1-x)^{n-y}\,\dfrac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha,\beta)}\,dx}
\]
\[
= \frac{\theta^{y+\alpha-1}(1-\theta)^{n-y+\beta-1}}
       {\int_0^1 x^{y+\alpha-1}(1-x)^{n-y+\beta-1}\,dx}
= \frac{\theta^{y+\alpha-1}(1-\theta)^{n-y+\beta-1}}{B(y+\alpha,\ n-y+\beta)}
\]
Thus, the posterior density is also a Beta density, with hyperparameters y + α and
n – y + β.
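The update rule derived above is simple enough to state as a two-line function.
A minimal sketch (the function name and numbers are just for illustration):

```python
def beta_binomial_update(alpha, beta, y, n):
    """Beta(alpha, beta) prior + y successes in n trials
    -> Beta(y + alpha, n - y + beta) posterior."""
    return y + alpha, n - y + beta

# Example: a Beta(1, 1) prior and 3 successes in 10 trials
a_post, b_post = beta_binomial_update(1, 1, y=3, n=10)
print(a_post, b_post)               # 4 8
print(a_post / (a_post + b_post))   # posterior mean = 1/3
```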
Prior distributions that, combined with the likelihood, give a posterior in the same
distributional family are called conjugate priors.
(Note that by a distributional family we mean distributions that go under a
common name: the Normal distribution, the Binomial distribution, the Poisson
distribution, etc.)
A conjugate prior always goes together with a particular likelihood to produce the
posterior.
We sometimes refer to a conjugate pair of distributions, meaning
(prior distribution, sample distribution = likelihood).
In particular, if the sample distribution f(x; θ) belongs to the k-parameter
exponential family (class) of distributions:
\[
f(x;\boldsymbol{\theta}) = \exp\Big\{\sum_{j=1}^{k} A_j(\boldsymbol{\theta})B_j(x) + C(x) + D(\boldsymbol{\theta})\Big\}
\]
we may put
\[
p(\boldsymbol{\theta}) = \exp\Big\{\sum_{j=1}^{k} A_j(\boldsymbol{\theta})\alpha_j + \alpha_{k+1}D(\boldsymbol{\theta}) + K(\alpha_1,\dots,\alpha_k,\alpha_{k+1})\Big\}
\propto \exp\Big\{\sum_{j=1}^{k} A_j(\boldsymbol{\theta})\alpha_j + \alpha_{k+1}D(\boldsymbol{\theta})\Big\}
\]
where $\alpha_1,\dots,\alpha_{k+1}$ are parameters of this prior distribution and
$K(\cdot)$ is a function of $\alpha_1,\dots,\alpha_{k+1}$ only.
Then
\[
q(\boldsymbol{\theta} \mid \boldsymbol{x}) \propto L(\boldsymbol{\theta} \mid \boldsymbol{x})\,p(\boldsymbol{\theta})
= \exp\Big\{\sum_{j=1}^{k} A_j(\boldsymbol{\theta})\sum_{i=1}^{n} B_j(x_i) + \sum_{i=1}^{n} C(x_i) + nD(\boldsymbol{\theta})\Big\}
\cdot \exp\Big\{\sum_{j=1}^{k} A_j(\boldsymbol{\theta})\alpha_j + \alpha_{k+1}D(\boldsymbol{\theta}) + K(\alpha_1,\dots,\alpha_{k+1})\Big\}
\]
\[
\propto \exp\Big\{\sum_{j=1}^{k} A_j(\boldsymbol{\theta})\Big(\sum_{i=1}^{n} B_j(x_i) + \alpha_j\Big) + (n + \alpha_{k+1})\,D(\boldsymbol{\theta})\Big\}
\]
i.e. the posterior distribution is of the same form as the prior distribution, but with
parameters
\[
\alpha_1 + \sum_{i=1}^{n} B_1(x_i),\ \dots,\ \alpha_k + \sum_{i=1}^{n} B_k(x_i),\ \alpha_{k+1} + n
\]
instead of
\[
\alpha_1,\ \dots,\ \alpha_k,\ \alpha_{k+1}.
\]
Some common cases (within or outside the exponential family):

Beta prior / Binomial sample:
  θ ~ Beta(α, β),  X ~ Bin(n, θ)
  ⇒  θ | x ~ Beta(α + x, β + n − x)

Normal prior / Normal sample with known σ²:
  θ ~ N(μ, τ²),  Xᵢ ~ N(θ, σ²)
  ⇒  θ | x ~ N( (σ²μ + nτ²·x̄) / (σ² + nτ²),  σ²τ² / (σ² + nτ²) )

Gamma prior / Poisson sample:
  λ ~ Gamma(α, β),  Xᵢ ~ Po(λ)
  ⇒  λ | Σxᵢ ~ Gamma(α + Σxᵢ, β + n)

Pareto prior / Uniform sample:
  p(θ) ∝ θ^(−α), θ ≥ β,  Xᵢ ~ U(0, θ)
  ⇒  q(θ; x) ∝ θ^(−(α+n)), θ ≥ max(β, x₍ₙ₎)    (x₍ₙ₎ = sample maximum)
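A small numeric sanity check of the Gamma/Poisson row, with made-up data and
hyperparameters, comparing the conjugate update against a grid approximation of
Bayes' theorem:

```python
import numpy as np
from scipy.stats import gamma, poisson

a0, b0 = 2.0, 1.0                       # Gamma(a0, b0) prior, b0 = rate
x = np.array([3, 5, 4, 2, 6])           # made-up Poisson counts
n, s = len(x), x.sum()

# Conjugate update from the table: lambda | x ~ Gamma(a0 + sum(x), b0 + n)
post = gamma(a=a0 + s, scale=1.0 / (b0 + n))

# Grid-based posterior straight from Bayes' theorem, for comparison
lam = np.linspace(1e-6, 15, 4000)
unnorm = gamma.pdf(lam, a=a0, scale=1.0 / b0) * \
         np.prod(poisson.pmf(x[:, None], lam[None, :]), axis=0)
numeric = unnorm / np.trapz(unnorm, lam)

print(np.allclose(numeric, post.pdf(lam), atol=1e-3))   # True (up to grid error)
```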
Non-informative priors (uninformative)
A prior distribution that gives no more information about  than possibly the
parameter space is called a non-informative or uninformative prior.
Example: Beta(1,1) for an unknown proportion  simply says that the parameter
can be any value between 0 and 1 (which coincides with its definition)
A non-informative prior is characterized by the property that all values in the
parameter space are equally likely.
Proper non-informative priors: the prior is a true density or mass function.

[Figure: two proper non-informative priors, a uniform probability mass function
on {1, 2, 3, 4, 5} (mass 0.2 on each value) and a uniform density on (0, 1)]

Improper non-informative priors: the prior is a constant value over ℝᵏ.

Example: a constant (uniform) prior over the whole real line for the mean of a
normal population.
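A standard illustration (not from the slides) of why improper priors can still
be used: with known σ², the flat prior p(μ) ∝ 1 for a normal mean combines with
the likelihood to give a perfectly proper posterior,
\[
q(\mu \mid \boldsymbol{x}) \propto L(\mu \mid \boldsymbol{x}) \cdot 1
\propto \exp\Big\{-\frac{n}{2\sigma^2}(\mu - \bar{x})^2\Big\},
\qquad \text{i.e. } \mu \mid \boldsymbol{x} \sim N\big(\bar{x},\ \sigma^2/n\big).
\]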