

LINKÖPINGS UNIVERSITET

Department of Computer and Information Science

Division of Statistics/ANd

732A45 Statistical Evidence Evaluation

Fall semester 2015

Assignment 1


Below are four tasks that you shall try to solve. All questions should be answered.

Prepare your solutions in a nice format that can be easily read.

Your solutions should be submitted no later than Friday 9 October 2015.

1.

Assume a person is visiting his General Practitioner (GP) for some health problem. The patient shows a symptom that, from the GP's point of view, could be the consequence of any of three different diseases. This symptom appears with disease $A_1$ in 25 % of all cases, with disease $A_2$ in 15 % of all cases, and with disease $A_3$ in 5 % of all cases. One can further approximately assume that a person cannot have more than one of these diseases at the same time. The symptom can also appear for other reasons: when none of the three diseases is present, the probability that a person shows this symptom is approximately 0.5 %.

a) What diagnosis should the GP give if she uses the principle of inference to the best explanation?

b) Assume now that the prevalences of the diseases $A_1$, $A_2$ and $A_3$ are 0.1 %, 0.5 % and 1 % respectively. By prevalence is here meant the proportion of the relevant population (the people to which the current person belongs) that has the disease at this specific point in time (point prevalence). What are the conditional probabilities, given the symptom, that the person has disease $A_1$, $A_2$ and $A_3$ respectively? What is your opinion about diagnosis according to the principle of inference to the best explanation in this case?
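The contrast between ranking explanations by likelihood alone and ranking them by posterior probability can be sketched in a few lines of Python. This is only a minimal illustration of Bayes' theorem for mutually exclusive and exhaustive hypotheses; the function name `posterior_probabilities` and all numbers are hypothetical and deliberately differ from those in the task.

```python
def posterior_probabilities(priors, likelihoods):
    """Return P(H_i | E) for mutually exclusive, exhaustive hypotheses H_i.

    priors[i] is P(H_i) and likelihoods[i] is P(E | H_i).
    """
    if len(priors) != len(likelihoods):
        raise ValueError("priors and likelihoods must have the same length")
    if abs(sum(priors) - 1.0) > 1e-9:
        raise ValueError("priors must sum to 1 (include a 'none of these' hypothesis)")
    joint = [p * l for p, l in zip(priors, likelihoods)]  # P(H_i and E)
    evidence = sum(joint)                                 # P(E), law of total probability
    return [j / evidence for j in joint]


# Hypothetical illustration (NOT the numbers of the task):
# two conditions plus the hypothesis that neither is present.
priors = [0.02, 0.10, 0.88]        # P(H_1), P(H_2), P(neither)
likelihoods = [0.60, 0.20, 0.01]   # P(symptom | H_i)

post = posterior_probabilities(priors, likelihoods)
for name, l, q in zip(["H_1", "H_2", "neither"], likelihoods, post):
    print(f"{name}: likelihood = {l:.2f}, posterior = {q:.3f}")
```

With these made-up numbers the hypothesis with the highest likelihood is not the one with the highest posterior probability, which is the kind of tension part b) asks you to reflect on.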

2.

Consider a feature with an unknown population proportion θ. The Bayesian approach to point estimation of θ is to describe the initial uncertainty about it using a prior distribution p(θ) and to update this distribution using available data x to the (hopefully) more accurate posterior distribution q(θ | x). In general the posterior is obtained as

$$q(\theta \mid x) = \frac{f(x \mid \theta) \cdot p(\theta)}{\int_{\lambda} f(x \mid \lambda) \cdot p(\lambda) \, d\lambda}$$

where f(x | θ) is the likelihood of θ with the data.

In many cases, data consist of a number n of sampled elements (hypergeometric sampling) or random trials (binomial sampling) in which the absolute frequency x of elements possessing the feature has been obtained. With binomial sampling (which is a good approximation to hypergeometric sampling when the population is large) the likelihood becomes

$$f(x \mid \theta) = \binom{n}{x} \theta^{x} \cdot (1 - \theta)^{n - x}$$
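To get a feeling for how good the binomial approximation to hypergeometric sampling is, one can compare the two probabilities for growing population sizes. The sketch below assumes a hypothetical sample of n = 10 with x = 3 and a true proportion of 0.3; the function names are illustrative and the numbers are not taken from the tasks.

```python
from math import comb


def binomial_pmf(x, n, theta):
    """P(x successes in n independent trials with success probability theta)."""
    return comb(n, x) * theta**x * (1 - theta) ** (n - x)


def hypergeometric_pmf(x, n, population, marked):
    """P(x marked units in a sample of n drawn without replacement
    from `population` units of which `marked` possess the feature)."""
    return comb(marked, x) * comb(population - marked, n - x) / comb(population, n)


n, x, theta = 10, 3, 0.3  # hypothetical sample and proportion
gaps = {}
for population in (50, 500, 5000):
    marked = round(theta * population)  # number of units with the feature
    h = hypergeometric_pmf(x, n, population, marked)
    b = binomial_pmf(x, n, theta)
    gaps[population] = abs(h - b)
    print(f"N = {population:5d}: hypergeometric = {h:.5f}, binomial = {b:.5f}")
```

The gap between the two probabilities shrinks as the population grows relative to the sample, which is why the binomial likelihood is used in what follows.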


By choosing a beta(a, b) distribution as prior, i.e.

$$p(\theta) = \frac{\theta^{a - 1} (1 - \theta)^{b - 1}}{B(a, b)}$$

where B(a, b) is the Beta function, it can easily be shown (do it as an exercise!) that the posterior is a beta(a + x, b + n − x) distribution.
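The conjugacy result can be checked numerically (a check, not a proof). The sketch below normalises likelihood times prior on a grid and compares the resulting posterior mean with the mean of a beta(a + x, b + n − x) distribution, using the standard fact that a beta(α, β) distribution has mean α / (α + β). The function names and the values a = 2, b = 3, n = 12, x = 4 are hypothetical.

```python
def grid_posterior_mean(x, n, a, b, steps=20_000):
    """Posterior mean of theta obtained by midpoint-rule integration of
    likelihood * prior over (0, 1). Constant factors such as the binomial
    coefficient and 1 / B(a, b) cancel in the ratio and are omitted."""
    h = 1.0 / steps
    numerator = 0.0
    denominator = 0.0
    for i in range(steps):
        t = (i + 0.5) * h                                   # grid midpoint
        weight = t ** (a - 1 + x) * (1 - t) ** (b - 1 + n - x)
        numerator += t * weight
        denominator += weight
    return numerator / denominator


def conjugate_posterior_mean(x, n, a, b):
    """Mean of the beta(a + x, b + n - x) distribution."""
    return (a + x) / (a + b + n)


a, b, n, x = 2, 3, 12, 4  # hypothetical prior parameters and data
numeric = grid_posterior_mean(x, n, a, b)
closed_form = conjugate_posterior_mean(x, n, a, b)
print(f"grid: {numeric:.6f}, closed form: {closed_form:.6f}")
```

The grid version assumes a ≥ 1 and b ≥ 1 so that the integrand stays bounded at the end points.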

As a point estimate of θ we may then take the posterior mean (assuming the so-called loss function to be squared loss).

a) Assume a beta(1, 1) prior for θ (i.e. a uniform distribution) and compute a point estimate using the data given above.

b) Assume that we take a random sample of 10 units from a set of 1000 units in which the proportion of units with property A is unknown. In the sample we get 2 units with property A. Find the Bayes factor for the inference problem defined by the following hypotheses:

$H_0$: The unknown proportion is 20 %

$H_1$: The unknown proportion is 25 %

c) With the same sampling setup as in b), find the Bayes factor for the inference problem

$H_0$: The unknown proportion is 20 %

$H_1$: The unknown proportion is different from 20 %

where under $H_1$ you can assume that any proportion between 0 % and 100 % is as probable as any other, i.e. a non-informative prior distribution.

d) For the same inference problem as in c), calculate the posterior odds for $H_0$ when the prior odds are 2 to 1 on, i.e. numerically equivalent to 2.
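The mechanics behind b) to d) can be sketched as follows, assuming binomial sampling. For two simple hypotheses the Bayes factor is a ratio of likelihoods; for a composite alternative the likelihood is averaged over its prior (here uniform on [0, 1], where the single point θ₀ carries no weight); and posterior odds are the Bayes factor times the prior odds. All function names and the data (n = 20, x = 7, proportions 0.3 and 0.5, prior odds 3) are hypothetical and deliberately not those of the task.

```python
from math import comb


def binomial_likelihood(x, n, theta):
    """Likelihood of theta given x successes in n binomial trials."""
    return comb(n, x) * theta**x * (1 - theta) ** (n - x)


def bayes_factor_simple(x, n, theta0, theta1):
    """Bayes factor for H0: theta = theta0 against H1: theta = theta1."""
    return binomial_likelihood(x, n, theta0) / binomial_likelihood(x, n, theta1)


def marginal_likelihood_uniform(x, n, steps=20_000):
    """Likelihood averaged over a uniform prior on [0, 1] (midpoint rule)."""
    h = 1.0 / steps
    return h * sum(binomial_likelihood(x, n, (i + 0.5) * h) for i in range(steps))


def posterior_odds(bayes_factor, prior_odds):
    """Odds form of Bayes' theorem: posterior odds = Bayes factor * prior odds."""
    return bayes_factor * prior_odds


n, x = 20, 7  # hypothetical data
bf_simple = bayes_factor_simple(x, n, theta0=0.3, theta1=0.5)
marginal = marginal_likelihood_uniform(x, n)
bf_composite = binomial_likelihood(x, n, 0.3) / marginal
print(f"simple vs simple:    B = {bf_simple:.3f}")
print(f"simple vs composite: B = {bf_composite:.3f}")
print(f"posterior odds with prior odds 3: {posterior_odds(bf_composite, 3.0):.3f}")
```

As a sanity check, the uniform-prior average of a binomial likelihood equals 1 / (n + 1) for every x, so the numerical integral can be replaced by that closed form.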

3.

Assume some rare fibres have been secured from a car seat and they are suspected to originate from a sweatshirt worn by a person suspected of having been sitting on the car seat. The fibres match fibres of the sweatshirt, i.e. by inspection under microscope no relevant differences can be detected. However, when the key characteristics of the fibres (stored in database format) are compared against a database of the characteristics of 3557 different fibres, no match is found.

Let the inference problem be defined by the two hypotheses

$H_0$: “The fibres originate from the sweatshirt”

$H_1$: “The fibres originate from some other textile product”

The data can in a simple way be described as “The characteristics of the fibres are as those of the fibres of the sweatshirt”


a) What is the likelihood of $H_0$ with the data?

b) Add the characteristics of the fibres of the sweatshirt to the database, i.e. the database is increased with one item. Assume (unrealistic though) that this new database perfectly represents the population of fibre characteristics. If under $H_1$ all fibre characteristics are equally probable, what is the Bayes factor for the inference problem?
