Lab 1: Review of Probability

Introduction
Population genetics is, in essence, a question of probability. What is the probability of two
alleles coming together to form a diploid organism? How do those alleles affect the probability that the
organism will survive and reproduce? What is the probability of occurrence of a fitness-altering mutation?
The combination of these probabilities results in the particular distribution of genetic variants that
comprise modern species, and this is the fundamental driver of organismal evolution. Therefore, a
fundamental understanding of probability is an essential building block for population genetics, and this is
the focus of our first two laboratory modules. Our specific goals are to:
1) Calculate the probability of an event using the Sample-Point Method and combinatorics,
2) Calculate the probability of the union and intersection of events, and
3) Apply the concepts of independence and conditionality in probability calculations.
Definition: Probability is a quantitative measure of one’s belief in the occurrence of a future event.
One way to think about probability is as the relative frequency of an event in a long series of trials (e.g.,
the proportion of heads resulting from flipping a fair coin a number of times). Another school of thought
regards probability as a measure of subjective plausibility of a proposition (i.e., the probability of an event
may vary depending on the beliefs of the person performing the evaluation). The former concept of
probability is known as the frequentist approach, whereas the latter is one of many concepts
characterizing the Bayesian interpretation of probability. The two approaches differ both philosophically
and mathematically, and resolving these differences is (fortunately) not one of the goals of this class.
While you are probably much more familiar with the frequentist approach, we will also see examples of
Bayesian inference later on.
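The relative-frequency view of probability can be illustrated with a short simulation. The sketch below is only an illustration under stated assumptions: the function name relative_frequency_of_heads and the fixed seed are hypothetical choices made here, and Python's pseudo-random generator stands in for a real coin.

```python
import random

def relative_frequency_of_heads(n_tosses, seed=0):
    """Simulate n_tosses of a fair coin and return the proportion of heads."""
    rng = random.Random(seed)  # seeded so that the run is reproducible
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses

# The proportion of heads should settle near 0.5 as the number of tosses grows.
for n in (10, 1_000, 100_000):
    print(n, relative_frequency_of_heads(n))
```

With only a few tosses the proportion can be far from 0.5; it is the long series of trials that matters in this interpretation.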
Sample-Point Method
The Sample-Point Method is a simple way to find the probability of an event. This method involves the
following conceptual steps:
1) Define the sample space S of an experiment by listing all sample points (i.e., all possible
outcomes of the experiment).
2) Assign probabilities $P_i$ to all sample points in S, making sure that $\sum_i P_i = 1$.
3) Determine which sample points constitute the event of interest and sum their probabilities to find
the probability of that event. If all sample points have equal probabilities, the probability of event
A can also be calculated as $P(A) = n_a / N$, where $n_a$ is the number of points constituting event A
and N is the total number of sample points.
Example: Use the Sample-Point Method to find the probability of getting exactly two heads in three tosses
of a balanced coin.
1) The sample space of this experiment is:
Outcome   Toss 1   Toss 2   Toss 3   Shorthand
1         Head     Head     Head     HHH
2         Head     Head     Tail     HHT
3         Head     Tail     Head     HTH
4         Tail     Head     Head     THH
5         Tail     Tail     Head     TTH
6         Tail     Head     Tail     THT
7         Head     Tail     Tail     HTT
8         Tail     Tail     Tail     TTT
2) Assuming that the coin is fair, each of these 8 outcomes has a probability of 1/8.
3) The probability of getting exactly two heads is the sum of the probabilities of outcomes 2, 3, and 4
(HHT, HTH, and THH), or 1/8 + 1/8 + 1/8 = 3/8 = 0.375.
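The three steps of the coin-toss example can also be carried out by enumeration. The following is a minimal sketch, assuming Python is available; the variable names (sample_space, p_point, event, p_event) are illustrative and not part of the lab.

```python
from fractions import Fraction
from itertools import product

# Step 1: list every sample point for three tosses of a coin.
sample_space = ["".join(tosses) for tosses in product("HT", repeat=3)]

# Step 2: a fair coin makes all sample points equally likely.
p_point = Fraction(1, len(sample_space))

# Step 3: sum the probabilities of the points that make up the event
# "exactly two heads".
event = [point for point in sample_space if point.count("H") == 2]
p_event = p_point * len(event)

print(sample_space)
print(event, p_event, float(p_event))
```

Using Fraction keeps the result exact (3/8) rather than a rounded decimal.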
Problem 1: The game of “craps” consists of rolling a pair of balanced dice (i.e., for each die getting 1, 2,
3, 4, 5, and 6 all have equal probabilities) and adding up the resulting numbers. A roll of 2, 3, or 12 loses,
while a roll of 7 or 11 wins. Using the Sample-Point Method, find the exact probabilities of
a. a losing roll, and
b. a winning roll.
Combinatorics
As you can imagine, listing all possible outcomes from an experiment can be somewhat tedious. To deal
with larger sample spaces, one would need to use some combinatorial tricks for counting sample points.
The simplest of these tricks is known as the mn rule, which states that with m elements from one group
and n elements from another group, it is possible to form m × n pairs containing one element from each
group. Applying this rule, you could quickly find that the two-dice problem you just worked on has 6 × 6
= 36 possible outcomes, which can be graphically confirmed by counting up the cells of the following
table:
                     Second die
First die     1    2    3    4    5    6
    1         .    .    .    .    .    .
    2         .    .    .    .    .    .
    3         .    .    .    .    .    .
    4         .    .    .    .    .    .
    5         .    .    .    .    .    .
    6         .    .    .    .    .    .
The mn rule readily extends to more than two groups. Thus, if the experiment consisted of rolling 3 dice,
there would have been $6^3 = 216$ possible outcomes.
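The counts given by the mn rule can be checked by listing the outcomes explicitly. This is a small sketch that only counts sample points; the names faces, two_dice, and three_dice are illustrative.

```python
from itertools import product

faces = range(1, 7)  # the six faces of one die

# Every ordered pair (first die, second die) and every ordered triple.
two_dice = list(product(faces, repeat=2))
three_dice = list(product(faces, repeat=3))

print(len(two_dice))    # number of outcomes for two dice
print(len(three_dice))  # number of outcomes for three dice
```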
Example: Calculate the probability that each person in a class of 14 students has a different birthday.
A sample point in this problem consists of 14 dates, each corresponding to the birthday of a student. If we
assume that there are 365 possible birthday dates (which of course is not true but makes things simpler),
and that all sample points are equiprobable, the total number of sample points is $N = 365^{14}$ (i.e., there are
365 possible birthday dates for student 1, 365 possible birthday dates for student 2, ..., and 365 possible
birthday dates for student 14). To calculate the number of ways in which 14 students could have different
birthday dates, we use the same logic: there are 365 possible birthday dates for student 1, 364 possible
birthday dates for student 2 (all but the birthday date of student 1), 363 possible birthday dates for student
3 (all but the birthday dates of students 1 and 2), etc. Thus, the number of sample points constituting the
event of interest is:
$n_a = 365 \times 364 \times 363 \times \cdots \times 352$,
and the probability of the event (let’s designate the event that 14 students all have different birthdays with
A) is:
$P(A) = \frac{n_a}{N} = \frac{365 \times 364 \times 363 \times \cdots \times 352}{365^{14}} = 0.7769.$
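The birthday calculation is tedious by hand but short in code. The sketch below assumes Python 3.8 or later (for math.perm); the function name p_all_different and its days parameter are hypothetical names introduced here.

```python
from math import perm

def p_all_different(k, days=365):
    """Probability that k people all have different birthdays,
    assuming `days` equally likely birthday dates."""
    n_a = perm(days, k)  # days * (days - 1) * ... * (days - k + 1)
    n_total = days ** k  # every person can have any of the dates
    return n_a / n_total

print(round(p_all_different(14), 4))
```

The same function can be used to explore how quickly this probability falls as the class size grows.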
Sample points can often be represented as sequences of numbers or symbols and, in some cases, the total
number of sample points is equal to the number of distinct ways of arranging these symbols or numbers in
a sequence. An ordered arrangement of distinct objects is called a permutation, and the total number of
ways of ordering n objects taken r at a time is:
$P^n_r = n(n-1)(n-2)\cdots(n-r+1) = \frac{n!}{(n-r)!}$,
where $n! = n \times (n-1) \times (n-2) \times \cdots \times 2 \times 1$ (remember also that $0! = 1$).
Example: How many trinucleotide sequences can be formed without repeating a nucleotide?
Using the notation given above, we are basically interested in the number of ways of ordering n = 4
elements (A, T, C, and G) taken r = 3 at a time. Using the formula for the total number of permutations:
$P^4_3 = 4 \times 3 \times 2 = \frac{4!}{(4-3)!} = 24.$
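The trinucleotide count can be confirmed by generating the sequences. A minimal sketch, assuming Python 3.8 or later for math.perm; the variable names are illustrative.

```python
from itertools import permutations
from math import perm

bases = "ATCG"

# All ordered arrangements of 3 distinct bases chosen from the 4.
trinucleotides = ["".join(p) for p in permutations(bases, 3)]

print(len(trinucleotides), perm(4, 3))
print(trinucleotides[:6])
```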
In other cases, the order of the elements in a sequence is not important. Unordered sets of r elements
chosen without replacement from n available elements are called combinations, and the total number of
combinations can be calculated using the formula:
$C^n_r = \binom{n}{r} = \frac{n!}{r!\,(n-r)!}.$
Notice that:
$C^n_r = \frac{P^n_r}{P^r_r} = \frac{P^n_r}{r!}.$
Translated into English, this formula means that the number of combinations of n elements taken r at a
time is equal to the number of permutations of n elements taken r at a time divided by the number of
possible arrangements of r selected elements.
In the case of our nucleotide sequences, if we don’t take the order of bases into account, the total number
of combinations is:
⎛ 4 ⎞
4!
4!
C34 = ⎜⎜ ⎟⎟ =
=
= 4.
⎝ 3 ⎠ 3!(4 − 3)! 3!1!
In this case, the number is only 4 because, for example, ACG, AGC, CAG, CGA, GAC, and GCA would
all be counted as one occurrence of the ACG combination.
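The relationship between combinations and permutations can be checked on the same nucleotide example. This sketch assumes Python 3.8 or later (math.comb and math.perm); the names unordered and orderings_of_acg are illustrative.

```python
from itertools import combinations, permutations
from math import comb, factorial, perm

bases = "ATCG"

# Unordered selections of 3 bases out of 4.
unordered = ["".join(c) for c in combinations(bases, 3)]
print(unordered)

# The number of combinations equals permutations divided by r!.
print(comb(4, 3), perm(4, 3) // factorial(3))

# The six orderings that collapse into the single ACG combination.
orderings_of_acg = sorted("".join(p) for p in permutations("ACG"))
print(orderings_of_acg)
```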
Problem 2: There are 36 computer workstations in this lab. How many distinct ways could the students
of this class be arranged, with one student per workstation?
Problem 3: Twenty students try out for the Morgantown High School basketball team, but only 12 can be
selected for the team.
a.) How many possible teams can be selected?
b.) There are five different starting positions on a basketball team: point guard, shooting guard, small
forward, power forward, and center. How many different starting lineups could be constructed
from the original twenty students who tried out?
Union and Intersection of Events
Definition: The union of events A and B, denoted by A ∪ B, contains all sample points that fall within A,
within B, or within both (i.e., union = A or B is true).
[Figure: Venn diagram for A ∪ B]
Example: Let event A be the occurrence of an odd number in a single roll of a fair die and event B the
occurrence of a number smaller than 4. The sample points falling within event A are {1, 3, 5}, and those
falling within event B are {1, 2, 3}. The union of A and B will, therefore, consist of sample points {1, 2, 3,
5}.
Definition: The intersection of events A and B, denoted by A ∩ B, contains all sample points that fall
within both A and B (i.e., intersection = A and B are both true).
[Figure: Venn diagram for A ∩ B]
Example: In the example above, the intersection of A and B would consist of sample points {1, 3}.
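Events defined on a small sample space map naturally onto sets. The sketch below restates the die example using Python's built-in set operators; the variable names union and intersection are illustrative.

```python
# Events for a single roll of a fair die, written as sets of sample points.
A = {1, 3, 5}  # odd number
B = {1, 2, 3}  # number smaller than 4

union = A | B         # points in A, in B, or in both
intersection = A & B  # points in both A and B

print(sorted(union))
print(sorted(intersection))
```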
Conditional Probability
The probability of an event sometimes depends on other events that might have occurred. Consider, for
example, a game in which a person rolls a balanced die that you cannot see, and then asks you to guess
the resulting number. If you guess that the number is 3, the probability of guessing correctly is 1/6.
However, if the other person tells you that the result is an odd number, the probability of 3 being the
correct guess becomes 1/3 (i.e., because the possible outcomes become 1, 3, and 5). The conditional
probability of an event A (the resulting number is 3), given that event B (the result is an odd number) has
occurred can be found as:
$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$,
where P(A | B) is read as “the probability of A given B.” In this example, P(A ∩ B) = 1/6, P(B) = 1/2,
and
$P(A \mid B) = \frac{1/6}{1/2} = 1/3.$
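The die-guessing example can be reproduced with exact fractions. This is a sketch only; the helper prob and the variable p_a_given_b are hypothetical names, and equally likely faces are assumed, as in the text.

```python
from fractions import Fraction

S = set(range(1, 7))  # sample space for one roll of a balanced die

def prob(event):
    """Probability of an event (a set of faces) when all faces are equally likely."""
    return Fraction(len(event & S), len(S))

A = {3}        # the resulting number is 3
B = {1, 3, 5}  # the result is an odd number

# Conditional probability: P(A | B) = P(A and B) / P(B).
p_a_given_b = prob(A & B) / prob(B)

print(prob(A), prob(B), p_a_given_b)
```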
Independent Events and the Multiplicative Law of Probability
Definition: Two events A and B are said to be independent if P(A | B) = P(A) or P(B | A) = P(B).
By rearranging the formula for conditional probability, the probability of the intersection of two events
can be calculated as:
P(A ∩ B) = P(A | B) P(B),
which is known as the multiplicative law of probability. If events A and B are independent:
P(A ∩ B) = P(A) P(B).
This basically means that the probability that two independent events will both occur can be found by
multiplying their individual probabilities.
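Whether two events are independent can be tested by comparing P(A ∩ B) with P(A) P(B). The sketch below reuses the die events from the union example and adds one hypothetical event C (a number smaller than 5) that is not in the lab, chosen here only to show an independent pair; the helpers prob and independent are also illustrative names.

```python
from fractions import Fraction

S = set(range(1, 7))  # one roll of a fair die

def prob(event):
    """Probability of an event (a set of faces) when all faces are equally likely."""
    return Fraction(len(event & S), len(S))

def independent(x, y):
    """True when P(X and Y) equals P(X) * P(Y)."""
    return prob(x & y) == prob(x) * prob(y)

A = {1, 3, 5}     # odd number
B = {1, 2, 3}     # number smaller than 4
C = {1, 2, 3, 4}  # hypothetical extra event: number smaller than 5

print(independent(A, B))  # P(A and B) = 1/3, but P(A) P(B) = 1/4
print(independent(A, C))  # P(A and C) = 1/3 and P(A) P(C) = 1/3
```

Knowing that the roll is smaller than 4 changes the chance that it is odd (from 1/2 to 2/3), so A and B are not independent; knowing that it is smaller than 5 leaves that chance at 1/2.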
Mutually Exclusive Events and the Additive Law of Probability
Definition: Two events A and B are said to be mutually exclusive if P(A ∩ B) = 0.
The probability of the union of two events can be calculated using the additive law of probability:
P(A ∪ B) = P(A) + P(B) − P(A ∩ B),
and if events A and B are mutually exclusive:
P(A ∪ B) = P(A) + P(B).
Thus, the probability that one of two mutually exclusive events will occur can be found by adding their
individual probabilities.
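The additive law can be verified on the same die events. In this sketch the event E (an even number) is a hypothetical addition used only to give a mutually exclusive pair; prob is an illustrative helper that assumes equally likely faces.

```python
from fractions import Fraction

S = set(range(1, 7))  # one roll of a fair die

def prob(event):
    """Probability of an event (a set of faces) when all faces are equally likely."""
    return Fraction(len(event & S), len(S))

A = {1, 3, 5}  # odd number
B = {1, 2, 3}  # number smaller than 4
E = {2, 4, 6}  # hypothetical extra event: even number

# General additive law: subtract the overlap so it is not counted twice.
lhs = prob(A | B)
rhs = prob(A) + prob(B) - prob(A & B)
print(lhs, rhs)

# A and E share no sample points, so their probabilities simply add.
print(prob(A & E), prob(A | E), prob(A) + prob(E))
```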
Problem 4: An inexperienced spelunker is preparing for the exploration of a big cave in a rural area of
Mexico. He is planning to use two independent light sources and, from reading their technical
specifications, he has concluded that each source is expected to malfunction with a probability of 0.01.
What is the probability that:
a) At least one of his light sources malfunctions?
b) Both of his light sources malfunction?
Problem 5. GRADUATE STUDENTS ONLY: Search the literature for an example of an application of
basic probability theory to a problem in genetics or genomics. Describe the hypothesis being tested, the
results of the test, and the interpretation. Was this a correct implementation of the method? Two points of
extra credit will be awarded if you uncover an error in calculation and/or interpretation that was published
in a peer-reviewed journal. Be sure to send the original manuscript to Rose when you submit your report.
Guidelines for Lab Report # 1
• Show all relevant formulas and calculations, including MS Excel spreadsheets. There will be
partial credit.
• At the end of each solution, provide a sentence (or several sentences) with specific answers to
the questions asked.
Bibliography:
Wackerly, D.D., Mendenhall, W., and Scheaffer, R.L. 2002. Mathematical statistics with applications.
Sixth edition. Duxbury, USA. Chapter II.