22S6 - Numerical and data analysis techniques Mike Peardon Hilary Term 2012

22S6 - Numerical and data analysis techniques
Mike Peardon
School of Mathematics
Trinity College Dublin
Hilary Term 2012
Course content and assessment
Assessment
The course counts for 5 ECTS.
The course will be examined in the summer in a two-hour paper.
Homework assignments will be assessed.
Your final mark will be 20% homework score, 80% exam performance.
1. Probability
   Review the basic ideas
   Conditional probability and Bayes' theorem
   Random variables
   Distributions
2. Information from data: Statistics
3. Numerical methods and algorithms
4. Introduction to stochastic processes
Probability
Sample space
Consider performing an experiment whose outcome is determined purely at random, drawn from a set of possible outcomes.

Sample space

A sample space S associated with an experiment is a set such that:
1. each element of S denotes a possible outcome O of the experiment, and
2. performing the experiment leads to a result corresponding to exactly one element of S.

Example: flipping a coin - choose the sample space S = {H, T}, corresponding to the coin landing heads or tails.
The sample space is not unique: we could instead choose S = {L}, corresponding to the coin just landing. Not very useful!
Events

An event E can be defined for a sample space S if a question can be posed that has an unambiguous answer for every outcome in S. E is the subset of S for which the answer is true.
Example 1: two coin flips, with S = {HH, HT, TH, TT}. Define the event E1T = {HT, TH}, which corresponds to one and only one tail landing.
Example 2: two coin flips, with S = {HH, HT, TH, TT}. Define the event E≥1T = {HT, TH, TT}, which corresponds to at least one tail landing.
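Not part of the original slides: the sample space and the two events above can be built directly in Python with the standard library, a quick sketch:

```python
from itertools import product

# Sample space for two coin flips: all length-2 strings over {H, T}
S = {"".join(o) for o in product("HT", repeat=2)}

# Events are subsets of S picked out by a true/false question
E_1T = {o for o in S if o.count("T") == 1}    # exactly one tail
E_ge1T = {o for o in S if o.count("T") >= 1}  # at least one tail

print(sorted(E_1T))    # ['HT', 'TH']
print(sorted(E_ge1T))  # ['HT', 'TH', 'TT']
```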
Probability measure
We can now define a probability model, which consists of a sample space S, a collection of events (which are all subsets of S) and a probability measure.

Probability measure

The probability measure assigns to each event E a probability P(E), with the following properties:
1. P(E) is a real number with 0 ≤ P(E) ≤ 1.
2. P(∅) = 0 (∅ is the empty-set event).
3. P(S) = 1.
4. P is additive, meaning that if E1, E2, . . . is a sequence of disjoint events then
   P(E1 ∪ E2 ∪ . . . ) = P(E1) + P(E2) + . . .

Two events are disjoint if they have no common outcomes.
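Not from the slides: a minimal sketch of a probability measure on a finite sample space, assuming equally likely outcomes (the uniform measure, one valid choice among many), which lets the axioms be checked mechanically:

```python
from fractions import Fraction
from itertools import product

S = ["".join(o) for o in product("HT", repeat=2)]  # two coin flips

def P(event):
    """Uniform probability measure: |E| / |S| for equally likely outcomes."""
    return Fraction(len(event), len(S))

E1 = {"HH"}  # two disjoint events
E2 = {"TT"}
assert P(set()) == 0                # P(empty set) = 0
assert P(set(S)) == 1               # P(S) = 1
assert P(E1 | E2) == P(E1) + P(E2)  # additivity for disjoint events
```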
Probability measure (2)
Venn diagrams give a very useful way of visualising probability models.
Example: Ec ⊂ S is the complement of event E, the set of all outcomes NOT in E (i.e. Ec = {x : x ∉ E}).
The probability of an event is visualised as the area of its region in the Venn diagram.
The intersection A ∩ B and union A ∪ B of two events can be depicted similarly ...
Probability measure (3)
The intersection of two subsets A ⊂ S and B ⊂ S:
A ∩ B = {x : x ∈ A and x ∈ B}
The union of two subsets A ⊂ S and B ⊂ S:
A ∪ B = {x : x ∈ A or x ∈ B}
[Venn diagrams: shaded regions depicting A ∩ B and A ∪ B within S.]
Probability measure (4)
The Venn diagram approach makes it easy to remember:
P(Ec) = 1 − P(E)
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
We also define the conditional probability P(A|B), which is the probability that event A occurs, given that event B has occurred. Since event B occurs with probability P(B) and both events A and B occur with probability P(A ∩ B), the conditional probability P(A|B) can be computed from

Conditional probability

P(A|B) = P(A ∩ B) / P(B)
Conditional probability (1)
Conditional probability describes situations where partial information about the outcome is given.

Example: coin tossing

Three fair coins are flipped. What is the probability that the first coin landed heads, given exactly two coins landed heads?
S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}
A = {HHH, HHT, HTH, HTT} and B = {HHT, HTH, THH}
A ∩ B = {HHT, HTH}
Answer: P(A|B) = P(A ∩ B) / P(B) = (2/8) / (3/8) = 2/3
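Not from the slides: the 2/3 answer can be checked by brute-force enumeration of the eight equally likely outcomes, a quick sketch:

```python
from itertools import product
from fractions import Fraction

S = ["".join(o) for o in product("HT", repeat=3)]  # 8 equally likely outcomes
A = {o for o in S if o[0] == "H"}                  # first coin lands heads
B = {o for o in S if o.count("H") == 2}            # exactly two coins land heads

# P(A|B) = |A ∩ B| / |B| since all outcomes are equally likely
P_A_given_B = Fraction(len(A & B), len(B))
print(P_A_given_B)  # 2/3
```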
Conditional probability (2)
Bayes' theorem

For two events A and B with P(A) > 0 and P(B) > 0 we have

P(A|B) = P(B|A) P(A) / P(B)

Since P(A|B) = P(A ∩ B)/P(B) from the conditional probability result, we see P(A ∩ B) = P(B)P(A|B).
Switching A and B also gives P(B ∩ A) = P(A)P(B|A), and A ∩ B is the same event as B ∩ A, so we get P(A)P(B|A) = P(B)P(A|B) and Bayes' theorem follows.
[Portrait: Thomas Bayes (1702-1761)]
Partitions of state spaces
Suppose we can completely partition S into n disjoint events A1, A2, . . . , An, so S = A1 ∪ A2 ∪ · · · ∪ An.
Now for any event E, we find

P(E) = P(E|A1)P(A1) + P(E|A2)P(A2) + · · · + P(E|An)P(An)

This result follows from the conditional probability result and the additivity property of the probability measure. It can be remembered with a Venn diagram: [E overlapping the partition cells A1, . . . , A5 of S].
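Not from the slides: a small check of this law of total probability, using an illustrative fair-die example of my own (partition the outcomes by parity and take E = "roll at least 5"):

```python
from fractions import Fraction

S = set(range(1, 7))  # one fair die roll, equally likely outcomes

def P(E):
    return Fraction(len(E), len(S))

def P_cond(E, A):
    """P(E|A) = |E ∩ A| / |A| for equally likely outcomes."""
    return Fraction(len(E & A), len(A))

A1, A2 = {1, 3, 5}, {2, 4, 6}  # disjoint partition of S: odd and even rolls
E = {5, 6}                     # event "roll at least 5"

total = P_cond(E, A1) * P(A1) + P_cond(E, A2) * P(A2)
assert total == P(E)           # law of total probability holds
```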
A sobering example
With the framework built up so far, we can make powerful
(and sometimes surprising) predictions...
Diagnostic accuracy
A new clinical test for swine flu has been devised that has a
95% chance of finding the virus in an infected patient.
Unfortunately, it has a 1% chance of indicating the disease in
a healthy patient (false positive). One person per 1,000 in the
population is infected with swine flu. What is the probability
that an individual patient diagnosed with swine flu by this
method actually has the disease?
Answer: about 8.7%
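Not from the slides: the 8.7% figure follows from Bayes' theorem with the total-probability partition into infected and healthy patients; a quick numerical sketch:

```python
# F = "patient is infected", pos = "test indicates the disease"
p_F = 1 / 1000               # prevalence: one person per 1,000
p_pos_given_F = 0.95         # chance of finding the virus in an infected patient
p_pos_given_healthy = 0.01   # false-positive rate in a healthy patient

# Total probability: P(pos) = P(pos|F)P(F) + P(pos|healthy)P(healthy)
p_pos = p_pos_given_F * p_F + p_pos_given_healthy * (1 - p_F)

# Bayes' theorem: P(F|pos) = P(pos|F) P(F) / P(pos)
p_F_given_pos = p_pos_given_F * p_F / p_pos
print(round(p_F_given_pos, 3))  # 0.087
```

Even with a seemingly accurate test, the low prevalence means false positives swamp true positives.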
The Monty Hall problem
When it comes to probability, intuition is
often not very helpful...
The Monty Hall problem
In a gameshow, a contestant is shown three doors and asked
to select one. Hidden behind one door is a prize and the
contestant wins the prize if it is behind their chosen door at
the end of the game. The contestant picks one of the three
doors to start. The host then opens at random one of the
remaining two doors that does not contain the prize. Now the
contestant is asked if they want to change their mind and
switch to the other, unopened door. Should they? Does it
make any difference?
P(Win)=2/3 when switching, P(Win) = 1/3 otherwise
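Not from the slides: a Monte Carlo simulation makes the switching advantage easy to believe. A minimal sketch (when the contestant's first pick is the prize door, the host's choice between the two remaining doors does not affect the win probability, so a deterministic tie-break is used here):

```python
import random

def play(switch, trials=100_000):
    """Simulate the Monty Hall game; return the observed win frequency."""
    wins = 0
    for _ in range(trials):
        prize = random.randrange(3)
        pick = random.randrange(3)
        # Host opens a door that is neither the pick nor the prize
        opened = next(d for d in range(3) if d != pick and d != prize)
        if switch:
            # Move to the one remaining unopened door
            pick = next(d for d in range(3) if d != pick and d != opened)
        wins += (pick == prize)
    return wins / trials

print(play(switch=True))   # close to 2/3
print(play(switch=False))  # close to 1/3
```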
The Monty Hall problem (2)
Misunderstanding conditional probability in this way can
lead to incorrect conclusions from experiments...
Observing rationalised decision making?
An experiment is performed where a monkey picks between
two coloured sweets. Suppose he picks black in preference to
white. The monkey is then offered white and red sweets and
the experimenters notice more often than not, the monkey
continues to reject the white sweets and chooses red. The
experimental team concludes the monkey has consciously
rationalised his decision to reject white sweets and reinforced
his behaviour. Are they right in coming to this conclusion?
Not necessarily. Based on the first observation, there are three compatible preference rankings (B>W>R, B>R>W, R>B>W). In 2 of the 3, red is preferred to white, so a priori that outcome is more likely anyway.
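Not from the slides: the three compatible rankings can be enumerated directly, a quick sketch:

```python
from itertools import permutations

# All rankings of {Black, White, Red} consistent with the first
# observation: black preferred to white (B before W)
consistent = [r for r in permutations("BWR") if r.index("B") < r.index("W")]

# In how many of these is red preferred to white?
red_over_white = [r for r in consistent if r.index("R") < r.index("W")]
print(len(red_over_white), "/", len(consistent))  # 2 / 3
```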
Independent events
Independent events

Events A and B are said to be independent if

P(A ∩ B) = P(A) × P(B)

If P(A) > 0 and P(B) > 0, then independence implies both:
P(B|A) = P(B) and
P(A|B) = P(A).
These results follow from the conditional probability result.
Example: two coins are flipped, where the probability that the first lands heads is 1/2, and similarly for the second. If these events are independent, we can show that all outcomes in S = {HH, HT, TH, TT} have probability 1/4.
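Not from the slides: a quick enumeration check, assuming equally likely outcomes, that the "first coin heads" and "second coin heads" events are independent and that every outcome has probability 1/4:

```python
from fractions import Fraction
from itertools import product

S = ["".join(o) for o in product("HT", repeat=2)]  # equally likely outcomes

def P(E):
    return Fraction(len(E), len(S))

A = {o for o in S if o[0] == "H"}  # first coin lands heads
B = {o for o in S if o[1] == "H"}  # second coin lands heads

assert P(A & B) == P(A) * P(B)     # A and B are independent
assert all(P({o}) == Fraction(1, 4) for o in S)  # each outcome has P = 1/4
```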
Summary
Defining a probability model means choosing a good sample space S, a collection of events (which all correspond to subsets of S) and a probability measure defined on all the events.
Events are called disjoint if they have no common outcomes.
Understanding and remembering probability calculations or results is often made easier by visualising them with Venn diagrams.
The conditional probability P(A|B) is the probability that event A occurs given that event B also occurred.
Bayes' theorem relates P(A|B) to P(B|A).
Calculations are often made easier by partitioning state spaces, i.e. finding disjoint A1, A2, . . . , An such that S = A1 ∪ A2 ∪ · · · ∪ An.
Events are called independent if P(A ∩ B) = P(A) × P(B).
Binomial experiments
A binomial experiment

Binomial experiments are defined by a sequence of probabilistic trials where:
1. each trial returns a true/false result,
2. different trials in the sequence are independent,
3. the number of trials is fixed, and
4. the probability of a true result is the same for every trial.

The usual question to ask: what is the probability that the trial result is true x times out of n, given that the probability of each trial being true is p?
Examples of binomial experiments
Examples and counter-examples

These examples are binomial experiments:
1. Flip a coin 10 times: does the coin land heads?
2. Ask the next ten people you meet if they like pizza.
3. Screen 1,000 patients for a virus.

... and these are not:
Flip a coin until it lands heads (not a fixed number of trials).
Ask the next ten people you meet their age (not true/false).
Is it raining on the first Monday of each month? (not a constant probability).
Number of experiments with x true outcomes
Number of selections

There are

Nx,n ≡ nCx = n! / (x!(n − x)!)

ways of having x out of n selections.

Coin flip outcomes

Example: how many outcomes of five coin flips result in the coin landing heads three times?
Answer: N3,5 = 5!/(3!2!) = 10
They are: {HHHTT, HHTHT, HHTTH, HTHHT, HTHTH, HTTHH, THHHT, THHTH, THTHH, TTHHH}
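Not from the slides: the count and the outcomes themselves can be generated with the standard library, a quick sketch:

```python
from math import comb
from itertools import combinations

n, x = 5, 3
print(comb(n, x))  # 10, i.e. 5!/(3!2!)

# List the actual outcomes: choose which 3 of the 5 flips land heads
outcomes = ["".join("H" if i in heads else "T" for i in range(n))
            for heads in map(set, combinations(range(n), x))]
print(len(outcomes))  # 10
```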
Probability of x out of n true trials
If the probability of each trial being true is p (and so the probability of it being false is q = 1 − p) ...
and the trials are independent, then ...

Probability of x out of n true outcomes

Px,n = nCx p^x q^(n−x) ≡ nCx p^x (1 − p)^(n−x)

We can compute this probability because we can count the number of cases with x true trials, and each such case has the same probability p^x (1 − p)^(n−x).
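Not from the slides: the formula above translates directly into a few lines of Python, a quick sketch:

```python
from math import comb

def binom_pmf(x, n, p):
    """Probability of exactly x true results in n independent trials,
    each true with probability p: nCx * p^x * (1-p)^(n-x)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Probability of exactly 3 heads in 5 fair coin flips: 10 / 32
print(binom_pmf(3, 5, 0.5))  # 0.3125

# Sanity check: the probabilities over all x sum to 1
assert abs(sum(binom_pmf(x, 5, 0.5) for x in range(6)) - 1) < 1e-12
```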
Infinite state spaces
The set of outcomes of a probabilistic experiment may be an uncountably infinite set.
Here, the distinction between outcomes and events is more important: events can be assigned probabilities, individual outcomes can't.

Outcomes described by a continuous variable

1. If I throw a coin and measure how far away it lands, the state space is described by the set of real numbers, Ω = R.
2. I could also simultaneously see whether it lands heads or tails. This set of outcomes is still "uncountably infinite". The state space is now Ω = {H, T} × R.

It is impossible to assign a probability to the single outcome "the coin lands exactly 1 m away". Events can still be defined; for example, an event might be "the coin lands heads, more than 1 m away."