6430

Basic Probability, Including Contingency Tables

Random Variable

A random variable is a real-valued function defined on a sample space.

The sample space is the set of all distinct outcomes possible for an experiment.

Function: a pairing of the members of two sets (well-defined collections of objects) such that each member of one set (the domain) is paired with one and only one member of the other set (the range), although elements of the range may be paired with more than one element of the domain.

For a random variable, the domain is the sample space and the range is a set of real numbers. A random variable is the set of pairs created by pairing each possible experimental outcome with one and only one real number.

Examples: a.) the outcome of rolling a die: X(1) = 1, X(2) = 2, X(3) = 3, etc. (each outcome is paired with one and only one number, and vice versa); b.) X(odd) = 1, X(even) = 2, so X(1) = 1, X(2) = 2, X(3) = 1, etc. (each outcome is paired with only one number, but not vice versa); c.) the weight of each student in my statistics class.

Probability and Probability Experiments

A probability experiment is a well-defined act or process that leads to a single well-defined outcome. Examples: toss a coin (H or T); roll a die (1 through 6).

The probability of an event, P(A), is the fraction of times that event will occur in an indefinitely long series of trials of the experiment. This may be estimated:

Empirically: conduct the experiment many times and compute P(A) = N_A / N_total, the sample relative frequency of A. Roll a die 1000 times; if even numbers appear 510 times, P(even) = 510/1000 = .51 or 51%.
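The empirical approach can be sketched with a short simulation. This is illustrative code I have added, not part of the original handout; it assumes a fair die and uses Python's standard random module.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible

# Roll a fair die many times and estimate P(even) by the sample
# relative frequency, N_even / N_total.
n_trials = 100_000
n_even = sum(1 for _ in range(n_trials) if random.randint(1, 6) % 2 == 0)
p_even = n_even / n_trials
print(p_even)  # close to the analytic value of .50
```

With more trials the estimate converges on the analytic value of 1/2.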

Rationally or Analytically : make certain assumptions about the probabilities of the elementary events included in outcome A and compute probability by rules of probability.

Assume each event 1, 2, 3, 4, 5, 6 on the die is equally likely. The sum of the probabilities of all possible events must equal one, so P(1) = P(2) = P(3) = P(4) = P(5) = P(6) = 1/6. Then P(even) = P(2) + P(4) + P(6) = 1/6 + 1/6 + 1/6 = 1/2 (addition rule), or 50%.

Subjectively: a measure of an individual’s degree of belief assigned to a given event in whatever manner. I think that the probability that ECU will win its opening game of the season is 1/3 or 33%. This means I would accept 2:1 odds against ECU as a fair bet (if I bet $1 on ECU and they win, I get $2 in winnings).

Independence, Mutual Exclusion, and Mutual Exhaustion

Two events are independent iff (if and only if) the occurrence or non-occurrence of the one has no effect on the occurrence or non-occurrence of the other.

Two events are mutually exclusive iff the occurrence of the one precludes occurrence of the other (both cannot occur simultaneously on any one trial).

Copyright 2015 Karl L. Wuensch - All rights reserved.


Two (or more) events are mutually exhaustive iff they include all possible outcomes.

Marginal, Conditional, and Joint Probabilities

The marginal probability of event A, P(A) , is the probability of A ignoring whether or not any other event has also occurred.

The conditional probability of A given B, P(A|B) , is the probability that A will occur given that B has occurred. If A and B are independent, then P(A) = P(A|B).


A joint probability is the probability that both of two events will occur simultaneously.

The Multiplication Rule

If two events are independent, their joint probability is the product of their marginal probabilities: P(A ∩ B) = P(A) × P(B).

Regardless of the independence or nonindependence of A and B, the joint probability of A and B is: P(A ∩ B) = P(A) × P(B|A) = P(B) × P(A|B).

The Addition Rule

If A and B are mutually exclusive, the probability that either A or B will occur is the sum of their marginal probabilities: P(A ∪ B) = P(A) + P(B).

Regardless of their exclusivity, P(A ∪ B) = P(A) + P(B) − P(A ∩ B).

Bayes Theorem

Relating conditional probabilities to joint and marginal probabilities: P(A|B) = P(A ∩ B) / P(B). This relationship forms the basis of Bayesian Statistics.
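As a minimal numeric illustration of this relation, consider the fair die again: the probability of rolling a 2 given that the roll is even. The code below is an added sketch using exact fractions.

```python
from fractions import Fraction

# P(A|B) = P(A and B) / P(B), with A = "roll a 2" and B = "roll is even".
p_two_and_even = Fraction(1, 6)  # rolling a 2 is itself an even outcome
p_even = Fraction(1, 2)

p_two_given_even = p_two_and_even / p_even
print(p_two_given_even)  # 1/3
```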

Working with Contingency Tables

To determine whether two categorical variables are correlated with one another, we can use a two-dimensional table of frequencies, often called a contingency table. For example, suppose that we have asked each of 150 female college students two questions: 1. Do you smoke (yes/no), and 2. Do you have sleep disturbances (yes/no)? Suppose that we obtain the following data (these are totally contrived, not real):

Data From Independent Variables

              Smoke?
Sleep?     No    Yes   Total
No         20     40      60
Yes        30     60      90
Total      50    100     150

Marginal Probabilities

P(Smoke) = 100/150 = 10/15 = 2/3 ≈ .67. P(Sleep) = 90/150 = 9/15 = 3/5 = .60.

Conditional Probabilities

P(Sleep | Smoke) = 60/100 = 6/10 = 3/5 = .60. P(Sleep | NoSmoke) = 30/50 = 3/5 = .60.

Notice that the conditional probability that the student has sleeping disturbances is the same if she is a smoker as it is if she is not a smoker. Knowing the student’s smoking status does not alter our estimate of the probability that she has sleeping disturbances. That is, for these contrived data, smoking and sleeping are independent, not correlated.
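These calculations can be reproduced directly from the cell counts. The sketch below is my own illustration (the dictionary keys are hypothetical labels, not from the handout):

```python
# Cell counts from the independent-variables table.
table = {("no_sleep", "no_smoke"): 20, ("no_sleep", "smoke"): 40,
         ("sleep", "no_smoke"): 30, ("sleep", "smoke"): 60}
n = sum(table.values())  # 150 students in all

# Marginal probability of sleep disturbances, ignoring smoking status.
p_sleep = (table[("sleep", "no_smoke")] + table[("sleep", "smoke")]) / n

# Conditional probability of sleep disturbances among the 100 smokers.
n_smokers = table[("no_sleep", "smoke")] + table[("sleep", "smoke")]
p_sleep_given_smoke = table[("sleep", "smoke")] / n_smokers

print(p_sleep, p_sleep_given_smoke)  # 0.6 0.6 -- equal, so independent
```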

Multiplication Rule

The probability of the joint occurrence of two independent events is equal to the product of the events’ marginal probabilities.

P(Sleep ∩ Smoke) = 60/150 = 6/15 = .40, and P(Sleep) × P(Smoke) = (3/5)(2/3) = 6/15 = .40.

Addition Rule

Suppose that the probability distribution for final grades in PSYC 2101 were as follows:


Grade A B C D F

Probability .2 .3 .3 .15 .05

The probability that a randomly selected student would get an A or a B is P(A ∪ B) = P(A) + P(B) = .2 + .3 = .5, since A and B are mutually exclusive events.

Now, consider our contingency table, and suppose that the “sleep” question was not about having sleep disturbances, but rather about “sleeping” with men (that is, being sexually active).

Suppose that a fundamentalist preacher has told you that women who smoke go to Hades, and women who “sleep” go there too. What is the probability that a randomly selected woman from our sample is headed to Hades? If we were to apply the addition rule as we did earlier, P(Sleep) + P(Smoke) = 9/15 + 10/15 = 19/15 = 1.27, but a probability cannot exceed 1, so something is wrong here.

The problem is that the events (sleeping and smoking) are not mutually exclusive, so we have counted the overlap between sleeping and smoking (the 60 women who do both) twice. We need to subtract out that double counting. If we look back at the cell counts, we see that 30 + 40 + 60 = 130 of the women sleep and/or smoke, so the probability we seek must be 130/150 = 13/15 = .87. Using the more general form of the addition rule,

P(Sleep ∪ Smoke) = P(Sleep) + P(Smoke) − P(Sleep ∩ Smoke) = 9/15 + 10/15 − 6/15 = 13/15 = .87.
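The correction for double counting is easy to verify with exact fractions (an added sketch, not part of the original text):

```python
from fractions import Fraction

# General addition rule: P(A or B) = P(A) + P(B) - P(A and B).
p_sleep, p_smoke = Fraction(9, 15), Fraction(10, 15)
p_both = Fraction(6, 15)  # the 60 women who both sleep and smoke

p_either = p_sleep + p_smoke - p_both
print(p_either, float(p_either))  # 13/15, about .87
```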

Data From Correlated Variables

Now, suppose that the “smoke” question concerned marijuana use, and the “sleep” question concerned sexual activity, variables known to be related.


              Sleep?
Smoke?     No    Yes   Total
No         30     20      50
Yes        40     60     100
Total      70     80     150

Marginal Probabilities

P(Smoke) = 100/150 = 2/3 ≈ .67. P(Sleep) = 80/150 = 8/15 ≈ .53.

Conditional Probabilities

P(Sleep | Smoke) = 60/100 = 6/10 = 3/5 = .60. P(Sleep | NoSmoke) = 20/50 = 2/5 = .40.

Now our estimate of the probability that a randomly selected student “sleeps” depends on what we know about her smoking behavior. If we know nothing about her smoking behavior, our estimate is about 53%. If we know she smokes, our estimate is 60%. If we know she does not smoke, our estimate is 40%. We conclude that the two variables are correlated, that female students who smoke marijuana are more likely to be sexually active than are those who do not smoke.

Multiplication Rule

If we attempt to apply the multiplication rule to obtain the probability that a randomly selected student both sleeps and smokes, using the same method we employed with independent variables, we obtain: P(Sleep ∩ Smoke) = P(Sleep) × P(Smoke) = (8/15)(2/3) = 16/45 ≈ .36. This answer is, however, incorrect. Sixty of 150 students are smoking sleepers, so we should have obtained a probability of 60/150 = 6/15 = .40. The fact that the simple form of the multiplication rule (the one which assumes independence) did not produce the correct solution shows us that the two variables are not independent.

If we apply the more general form of the multiplication rule, the one which does not assume independence, we get the correct solution:

P(Smoke ∩ Sleep) = P(Smoke) × P(Sleep | Smoke) = (2/3)(3/5) = 6/15 = .40.
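Both forms of the multiplication rule can be checked against the correlated table. This is an added sketch using exact fractions:

```python
from fractions import Fraction

# The independence-assuming form gives the wrong answer here...
p_sleep, p_smoke = Fraction(8, 15), Fraction(2, 3)
assert p_sleep * p_smoke == Fraction(16, 45)  # about .36, not the true .40

# ...while the general form P(A and B) = P(A) * P(B|A) is correct.
p_sleep_given_smoke = Fraction(60, 100)
p_joint = p_smoke * p_sleep_given_smoke
print(p_joint)  # 2/5, i.e. .40
```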

Real Data

Finally, here is an example using data obtained by Castellow, Wuensch, and Moore (1990, Journal of Social Behavior and Personality, 5, 547-562). We manipulated the physical attractiveness of the plaintiff and the defendant in a mock trial. The plaintiff was a young woman suing her male boss for sexual harassment. Our earlier research had indicated that physical attractiveness is an asset for defendants in criminal trials (juries treat physically attractive defendants better than physically unattractive defendants), and we expected physical attractiveness to be an asset in civil cases as well. Here are the data relevant to the effect of the attractiveness of the plaintiff.

                Guilty?
Attractive?   No    Yes   Total
No            33     39      72
Yes           17     56      73
Total         50     95     145

Guilty verdicts (finding in favor of the plaintiff) were more likely when the plaintiff was physically attractive (56/73 = 77%) than when she was not physically attractive (39/72 = 54%). The magnitude of the effect of physical attractiveness can be obtained by computing an odds ratio. When the plaintiff was physically attractive, the odds of a guilty verdict were 56 to 17, that is, 56/17 = 3.29. That is, a guilty verdict was more than three times more likely than a not guilty verdict. When the plaintiff was not physically attractive the odds of a guilty verdict were much less, 39 to 33, that is, 1.18. The ratio of these two odds is (56/17) / (39/33) = 2.79. That is, the odds of a guilty verdict when the plaintiff was attractive were almost three times higher than when the plaintiff was not attractive. That is a big effect!
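The odds and the odds ratio can be computed in a few lines. The helper function below is my own, not from the article:

```python
def odds_ratio(a, b, c, d):
    """(a/b) / (c/d): the ratio of the odds a:b to the odds c:d."""
    return (a / b) / (c / d)

# Plaintiff attractive: 56 guilty vs 17 not; unattractive: 39 vs 33.
print(round(56 / 17, 2))                     # 3.29
print(round(39 / 33, 2))                     # 1.18
print(round(odds_ratio(56, 17, 39, 33), 2))  # 2.79
```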

We also found that physical attractiveness was an asset to the defendant. Here are the data:

                Guilty?
Attractive?   No    Yes   Total
No            17     53      70
Yes           33     42      75
Total         50     95     145

Guilty verdicts (finding in favor of the plaintiff) were less likely when the defendant was physically attractive (42/75 = 56%) than when he was not physically attractive (53/70 = 76%). The odds ratio here is (53/17) / (42/33) = 2.45.

We could form a ratio of probabilities rather than odds. The ratio of [the probability of a guilty verdict given that the defendant was not physically attractive] to [the probability of a guilty verdict given that the defendant was physically attractive] is (53/70) / (42/75) = 1.35.

We could look at these data from the perspective of the odds of a not guilty verdict. Not guilty verdicts were more likely when the defendant was physically attractive (33/75 = 44%) than when he was not physically attractive (17/70 = 24%). The odds ratio here is (33/42) / (17/53) = 2.45.

Notice that the odds ratio is the same regardless of which perspective we take. This is not true of probability ratios (and is why I much prefer odds ratios over probability ratios).


If we consider the combined effects of physical attractiveness of both litigants, guilty verdicts were most likely when the plaintiff was attractive and the defendant was not, in which case 83% of the jurors recommended a guilty verdict. When the defendant was attractive but the plaintiff was not, only 41% of the jurors recommended a guilty verdict (and only 26% of the male jurors did). That produces an odds ratio of (83/17) / (41/59) = 7.03. Clearly, it is to one’s advantage to appear physically attractive when in court.

Why Odds Ratios Rather Than Probability Ratios?

Odds ratios are my favorite way to describe the strength of the relationship between two dichotomous variables. One could use probability ratios, but there are problems with probability ratios, which I illustrate here.

Suppose that we randomly assign 200 ill patients to two groups. One group is treated with a modern antibiotic. The other group gets a homeopathic preparation.

At the end of two weeks we blindly determine whether or not there has been a substantial remission of symptoms of the illness. The data are presented in the table below.

                      Remission of Symptoms
Treatment        No    Yes   Total
Antibiotic       10     90     100
Homeopathic      60     40     100
Total            70    130     200

Odds of Success

For the group receiving the antibiotic, the odds of success (remission of symptoms) are 90/10 = 9. For the group receiving the homeopathic preparation the odds of success are 40/60 = 2/3, or about .67. The odds ratio reflecting the strength of the relationship is 9 divided by 2/3 = 13.5. That is, the odds of success for those treated with the antibiotic are 13.5 times higher than for those treated with the homeopathic preparation.

Odds of Failure

For the group receiving the homeopathic preparation, the odds of failure (no remission of symptoms) are 60/40 = 1.5. For those receiving the antibiotic the odds of failure are 10/90 = 1/9. The odds ratio is 1.5 divided by 1/9 = 13.5. Notice that this ratio is the same as that obtained when we used odds of success, as, IMHO, it should be.

Now let us see what happens if we use probability ratios.

Probability of Success

For the group receiving the antibiotic, the probability of success is 90/100 = .9. For the homeopathic group the probability of success is 40/100 = .4. The ratio of these probabilities is .9/.4 = 2.25. The probability of success for the antibiotic group is 2.25 times that of the homeopathic group.

Probability of Failure

For the group receiving the homeopathic preparation the probability of failure is 60/100 = .6. For the antibiotic group it is 10/100 = .1. The probability ratio is .6/.1 = 6.

The probability of failure for the homeopathic group is six times that for the antibiotic group.

The Problem

With probability ratios the value you get to describe the strength of the relationship when you compare (A given B) to (A given not B) is not the same as what you get when you compare (not A given B) to (not A given not B). This is, IMHO, a big problem. There is no such problem with odds ratios.
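The contrast is easy to demonstrate with the antibiotic data above (an added sketch; the variable names are mine):

```python
# Antibiotic: 90 successes, 10 failures; homeopathic: 40 successes, 60 failures.
ab_yes, ab_no = 90, 10
ho_yes, ho_no = 40, 60

# Odds ratios computed from success and from failure agree.
or_success = (ab_yes / ab_no) / (ho_yes / ho_no)  # 9 / (2/3) = 13.5
or_failure = (ho_no / ho_yes) / (ab_no / ab_yes)  # 1.5 / (1/9) = 13.5
assert abs(or_success - or_failure) < 1e-9

# Probability ratios computed the two ways do not agree.
pr_success = (ab_yes / 100) / (ho_yes / 100)  # .9 / .4
pr_failure = (ho_no / 100) / (ab_no / 100)    # .6 / .1
print(round(pr_success, 2), round(pr_failure, 2))  # 2.25 6.0
```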


Another Example

According to a report provided by Medscape, among the general population in the US, the incidence of narcissistic personality disorder is 0.5%. Among members of the US Military it is 20%.

Probability of NPD.

The probability that a randomly selected member of the military will have NPD is 20%, the probability that a randomly selected member of the general population will have NPD is 0.5%. This yields a probability ratio of .20/.005 = 40 .

Probability of NOT NPD.

The probability that a randomly selected member of the general population will not have NPD is .995. The probability that a randomly selected member of the military will not have NPD is .80. This yields a probability ratio of .995/.8 = 1.24.

Odds of NPD

For members of the military, the odds are .2/.8 = .25. For members of the general population, .005/.995 ≈ .005. The odds ratio is (.2/.8) / (.005/.995) = 49.75.

Odds of NOT NPD.

For members of the military, the odds are .8/.2 = 4. For members of the general population, .995/.005 = 199. The odds ratio is (.995/.005) / (.8/.2) = 199/4 = 49.75.
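A quick check of these NPD figures (an added sketch):

```python
# NPD prevalence: 20% in the military, 0.5% in the general population.
p_mil, p_gen = 0.20, 0.005

odds_mil = p_mil / (1 - p_mil)  # .25
odds_gen = p_gen / (1 - p_gen)  # about .005
print(round(odds_mil / odds_gen, 2))  # 49.75
```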

Probability Distributions

The probability distribution of a discrete variable Y is the pairing of each value of Y with one and only one probability. The pairing may be by a listing, a graph, or some other specification of a functional rule, such as a formula. Every P(Y) must be between 0 and 1 inclusive, and the sum of all the P(Y)s must equal 1.

For a continuous variable we work with a probability density function, defined by a formula or a figure (such as the normal curve).

Imagine a relative frequency histogram with data grouped into 5 class intervals so there are 5 bars. Now increase the number of intervals to 10, then to 100, then 100,000, then an infinitely large number of intervals; now you have a smooth curve, the probability density function.


There is an infinitely large number of points on the curve, so the probability of any one point is infinitely small.

We can obtain the probability of a score falling in any particular interval a to b by setting the total area under the curve equal to 1 and measuring the area under the curve between a and b. This involves finding the definite integral of the probability density function from a to b.

To avoid doing the calculus, we can, for some probability density functions (such as the normal), rely on tables or computer programs to compute the area.
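For the normal distribution, for instance, the area between a and b can be computed from the error function in Python's math module. This sketch is one way to do it, not the handout's method:

```python
import math

def normal_area(a, b, mu=0.0, sigma=1.0):
    """Area under the normal curve between a and b (the definite integral
    of the density), computed via the normal CDF and math.erf."""
    cdf = lambda x: 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))
    return cdf(b) - cdf(a)

# The familiar 95% band of the standard normal curve.
print(round(normal_area(-1.96, 1.96), 3))  # 0.95
```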

Although technically a continuous variable is one where there is an infinite number of possible intermediate values between any pair of values, if a discrete variable can take on many possible values we shall treat it as a continuous variable.

Random Sampling

Sampling N data points from a population is random if every possible different sample of size N was equally likely to be selected.

We want our samples to be representative of the population to which we are making inferences. With moderately large random samples, we can be moderately confident that our sample is representative.

The inferential statistics we shall use assume that random sampling is employed.

In fact, we rarely if ever achieve truly random sampling, but we try to get as close to it as is reasonable.

This definition differs from that given in Howell ("each and every element of the population has an equal chance of being selected"). A sampling procedure may meet Howell’s definition but not mine. For example, when sampling from a population of 4 objects (A, B, C, & D) without replacement, with N = 2, contrast sampling procedure X with Y:

            Probability
Sample      X      Y
AB         1/2    1/6
AC          0     1/6
AD          0     1/6
BC          0     1/6
BD          0     1/6
CD         1/2    1/6

If each time a single score is sampled, all scores in the population are equally likely to be selected, then a random sample will be obtained.
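Procedure Y above can be sketched with Python's random.sample, which gives every distinct sample an equal chance of selection (an added illustration; the simulated counts are approximate):

```python
import random
from collections import Counter

random.seed(2)  # reproducible sketch

# Procedure Y: random.sample gives each size-2 subset of the population
# an equal chance of selection (1/6 for each of the 6 pairs here).
population = ["A", "B", "C", "D"]
n_draws = 60_000
counts = Counter(frozenset(random.sample(population, 2)) for _ in range(n_draws))

for pair in sorted(counts, key=sorted):
    print("".join(sorted(pair)), round(counts[pair] / n_draws, 3))  # each near 1/6
```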

Counting Rules

Suppose I have four flavours of ice cream and I wish to know how many different types of ice cream cones I can make using all four flavours. I shall consider order important, so chocolate atop vanilla atop strawberry atop pistachio is a different cone than vanilla atop strawberry atop pistachio atop chocolate. Each cone must have all four flavours. The simple rule to solve this problem is:

There are Y! ways to arrange Y different things. The “!” means “factorial” and directs you to find the product of Y times Y−1 times Y−2 times ... times 1. By definition, 0! is 1. For our problem, there are 4 x 3 x 2 x 1 = 24 different orderings of four flavours.

Now, suppose I have 10 different flavours, but still only want to make 4-scoop cones. From 10 different flavours how many different 4-scoop cones can I make? This is a permutations problem.

Here is the rule to solve it: the number of ways of selecting and arranging Y objects from a set of N objects equals N! / (N − Y)!. For our problem, the solution is 10! / (10 − 4)! = (10 · 9 · 8 · 7 · 6!) / 6! = 5040.
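Both the factorial and the permutations rule can be checked with Python's standard library (an added sketch):

```python
import math
from itertools import permutations

# 4! arrangements of four flavours when all four are used.
assert math.factorial(4) == 24

# Permutations rule: 4-scoop cones from 10 flavours, order important.
n_perms = math.factorial(10) // math.factorial(10 - 4)
print(n_perms)  # 5040

# Brute-force confirmation by enumeration.
assert len(list(permutations(range(10), 4))) == 5040
```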

Now suppose I decide that order is not important, that is, I don’t care whether the chocolate is atop the vanilla or vice versa, etc. This is a combinations problem. Since the number of ways of arranging Y objects is Y!, I simply divide the number of permutations by Y!. That is, I find N! / ((N − Y)! Y!) = 10! / (6! 4!) = (10 · 9 · 8 · 7 · 6!) / (6! · 4 · 3 · 2 · 1) = 210.
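The combinations rule can be checked the same way (an added sketch using Python's standard library):

```python
import math
from itertools import combinations

# Combinations rule: 4-scoop cones from 10 flavours, order ignored.
n_combs = math.factorial(10) // (math.factorial(6) * math.factorial(4))
print(n_combs)  # 210

# math.comb and brute-force enumeration agree.
assert math.comb(10, 4) == 210
assert len(list(combinations(range(10), 4))) == 210
```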


Suppose I am assigning ID numbers to the employees at my ice cream factory. Each employee will have a two-digit number. How many different two-digit numbers can I generate? A rule I can apply is C^L, where C is the number of different characters available (10) and L is the length of the ID number. There are 10^2 = 100 different ID numbers, but you already knew that. Suppose I decide to use letters instead. There are 26 different characters in the alphabet you use, so there are 26^2 = 676 different two-character ID strings. Now suppose I decide to use both letters (A through Z) and numerals (0 through 9). Now there are 36^2 = 1,296 different two-character ID strings. Now suppose I decide to use one- and two-character strings. Since there are 36 different one-character strings, I have 1,296 + 36 = 1,332 different ID strings of not more than two characters.

If I up the maximum number of characters to three, that gives me an additional 36^3 = 46,656 strings, for a total of 46,656 + 1,332 = 47,988 different strings of one to three characters. Up it to one to four characters and I have 1,679,616 + 47,988 = 1,727,604. Up it to one to five characters and you get 60,466,176 + 1,727,604 = 62,193,780. Up it to one to six characters and we have 2,176,782,336 + 62,193,780 = 2,238,976,116 different strings of from one to six characters. I guess it will be a while until tinyurl needs to go to seven-character strings for their shortened urls.
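The running totals above come from summing C^L over the allowed lengths. A small helper reproduces them (the name n_strings is my own, for illustration):

```python
def n_strings(chars, max_len):
    """Number of ID strings of length 1 through max_len over an
    alphabet of `chars` characters: the sum of chars**L."""
    return sum(chars ** L for L in range(1, max_len + 1))

print(n_strings(36, 2))  # 1332
print(n_strings(36, 6))  # 2238976116
```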

Probability FAQ – Answers to frequently asked questions.

Return to Wuensch’s Stats Lessons
