Probability, Sampling, and Inference
Q560: Experimental Methods in Cognitive Science, Lecture 5

What is Probability?
Probability describes the relationship between samples and populations. It is used to predict what kinds of samples are likely to be obtained from a population.

Defining Probability
Probability is defined as a proportion of outcomes. Given an outcome A:
$p(A) = \dfrac{\text{number of outcomes classed as A}}{\text{total number of outcomes}}$
Examples: a coin toss; a deck of cards.

Probability Notation
The probability of outcome A is written p(A). Example: the probability of drawing a king is p(king) = 4/52. Probabilities can be expressed as fractions, decimals, or percentages: 4/52 = 0.0769 = 7.69%.

Probability and Random Sampling
For a random sample, two conditions must be met:
1. Each individual has an equal chance of being selected.
2. If more than one individual is selected, the probability must stay constant from one selection to the next (this requires sampling with replacement).
Explanation…

Location of Scores in a Distribution
X values are transformed into z-scores such that:
1. The sign (+ or −) indicates whether the score lies above or below the mean.
2. The number indicates the distance from the mean in standard deviations.
IQ scores: μ = 100, σ = 10.
$z = \dfrac{X - \mu}{\sigma} = \dfrac{\text{deviation score}}{\text{standard deviation}}$, and conversely $X = \mu + z\sigma$.

Standardizing a Distribution
What effects does the z-score transformation have on the original distribution?
1. Shape: stays the same. Individual scores do not change position.
2. Mean: the mean of a z-score distribution is always 0.
3. Standard deviation: the standard deviation of a z-score distribution is always 1.
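The definitions above can be sketched in a few lines of Python. This is a minimal illustration and not part of the original lecture. The helper names `probability`, `z_score` and `raw_score` are hypothetical, and the probability helper assumes equally likely outcomes, as in the coin and card examples.

```python
from fractions import Fraction

def probability(n_favourable: int, n_total: int) -> Fraction:
    """p(A) = outcomes classed as A / total outcomes (equally likely outcomes assumed)."""
    return Fraction(n_favourable, n_total)

def z_score(x: float, mu: float, sigma: float) -> float:
    """Deviation score divided by the standard deviation: z = (X - mu) / sigma."""
    return (x - mu) / sigma

def raw_score(z: float, mu: float, sigma: float) -> float:
    """Inverse transformation: X = mu + z * sigma."""
    return mu + z * sigma

# p(king): 4 kings in a 52-card deck, shown as fraction, decimal and percentage
p_king = probability(4, 52)
print(p_king, f"{float(p_king):.4f}", f"{float(p_king):.2%}")  # 1/13 0.0769 7.69%

# IQ example, using the slide's values (mu = 100, sigma = 10)
print(z_score(115, 100, 10))   # 1.5
print(raw_score(-2, 100, 10))  # 80
```

The two transformation functions are inverses of each other. This is why a z-score transformation only relabels scores and does not move them.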
A z-score transformation is like relabelling the x-axis…

Standardizing a Distribution: a worked z-score transformation
Population of scores: X: 0, 6, 5, 2, 3, 2 (N = 6)

Step 1, the mean: $\Sigma X = 18$, so $\mu = \dfrac{\Sigma X}{N} = \dfrac{18}{6} = 3$.

Step 2, deviations and squared scores:
X        0    6    5    2    3    2
X − μ   −3    3    2   −1    0   −1
X²       0   36   25    4    9    4
Check: $\Sigma(X - \mu) = 0$ and $\Sigma X^2 = 78$.

Step 3, the standard deviation (computational formula for SS):
$SS = \Sigma X^2 - \dfrac{(\Sigma X)^2}{N} = 78 - \dfrac{18^2}{6} = 78 - 54 = 24$
$\sigma = \sqrt{\dfrac{SS}{N}} = \sqrt{\dfrac{24}{6}} = \sqrt{4} = 2$

Step 4, the z-scores, with μ = 3, σ = 2 and $z = \dfrac{X - \mu}{\sigma}$:
X      0     6    5     2    3     2
z   −1.5   1.5    1  −0.5    0  −0.5

Standardizing a Distribution
Let’s draw frequency distribution graphs:
[Figure: frequency distribution graphs]

Probability and Frequency Graphs
Example: for the population of scores shown below, what is the probability of obtaining a score greater than 4 in a random draw?
[Figure: frequency distribution of the population]
p(X > 4) = ?

The Normal Distribution
[Figure: the normal distribution]
Proportions of area within the normal distribution can be quantified using z-scores:
[Figure: proportions of area under the normal curve by z-score]

Note: the normal distribution is symmetrical, so the proportions on both sides of the mean are identical.
Note: all normal distributions have the same proportions at the same z-score. This allows us to solve problems like the following:
Body height has a normal distribution with μ = 68 and σ = 6. If we select one person at random, what is the probability of selecting a person taller than 80?
[Figure: a graphical representation of the same problem]

The Unit Normal Table
Given the standard proportions of the normal distribution, we can give probabilities for z-scores with whole-number values. But what about fractional z-scores?
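The following Python sketch repeats the arithmetic of the worked example and checks the properties listed earlier (z-scores have mean 0 and standard deviation 1). It assumes Python 3.8+ for `statistics.NormalDist`. Its `cdf` method returns the proportion to the left of any score, including fractional z-scores, so here it stands in for a table lookup. The last lines apply it to one of the example's z-scores and to the body-height question.

```python
import math
from statistics import NormalDist

scores = [0, 6, 5, 2, 3, 2]          # population from the worked example
N = len(scores)

sum_x = sum(scores)                   # sum of X = 18
mu = sum_x / N                        # mu = 3.0
sum_x2 = sum(x * x for x in scores)   # sum of X squared = 78
ss = sum_x2 - sum_x ** 2 / N          # computational formula: SS = 24.0
sigma = math.sqrt(ss / N)             # population SD: sigma = 2.0

z = [(x - mu) / sigma for x in scores]
print(z)  # [-1.5, 1.5, 1.0, -0.5, 0.0, -0.5]

# The transformed distribution has mean 0 and (population) SD 1
z_mean = sum(z) / N
z_sd = math.sqrt(sum((v - z_mean) ** 2 for v in z) / N)
print(z_mean, z_sd)  # 0.0 1.0

# Fractional z-scores: proportion of the unit normal to the left of z = -0.5
std = NormalDist()  # mean 0, SD 1
print(round(std.cdf(-0.5), 4))  # 0.3085

# Body height (mu = 68, sigma = 6): p(X > 80), i.e. the tail beyond z = 2.0
height = NormalDist(mu=68, sigma=6)
print(round(1 - height.cdf(80), 4))  # 0.0228
```

Dividing SS by N (not N − 1) matches the lecture's treatment of these scores as a population.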
That’s what the unit normal table is all about… Or use one of the many online calculators: http://www.stat.tamu.edu/~west/applets/normaldemo.html

The Unit Normal Table
How the table is organized:
Things to remember when using the unit normal table:
1. The distribution is symmetrical, so only positive z-scores are tabulated.
2. Proportions are always positive.
3. The section larger than 50% is the “body”.
4. The section smaller than 50% is the “tail”.
5. Body + tail = 1.00 (100%).
In a graph: “area greater than” = “area to the right of”; “area smaller than” = “area to the left of”.

From Specific Scores to the Unit Normal Table
Sometimes you are asked for the probability associated with a specific X value (as opposed to a z-score). Example: for a normal distribution with μ = 500 and σ = 100, give the probability of selecting an individual whose score is above 650 (equivalently, the proportion of individuals with a score above 650).
Follow this procedure:
1. Make a rough sketch of the distribution (mark μ and σ).
2. Locate and mark the specific score X.
3. Shade the appropriate proportion.
4. Transform the X value into a z-score.
5. Look up the proportion in the unit normal table using the z-score.

Probability from the Unit Normal Table
The math section of the SAT has μ = 500 and σ = 100. If you selected a person at random:
a) What is the probability that the person would have a score greater than 650?
b) What is the probability that the person would have a score between 400 and 500?

The Binomial Distribution
“Binomial” = “two names”: the variable exists in two categories only (heads/tails, true/false).
The probability of each outcome is often known: p(heads) = 0.5, p(tails) = 0.5.
Question of interest: how often does an outcome occur in a sample of observations?

Notation:
1. Two categories: A and B.
2. Probabilities: p = p(A), q = p(B). Note: p + q = 1.00.
3. Number of observations in the sample: n.
4. The variable X is the number of times A occurs in the sample. Note: X ranges from 0 to n.
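The SAT questions above and the binomial notation can both be sketched in Python (3.8+ for `math.comb` and `statistics.NormalDist`). The first part follows the five-step procedure, with `NormalDist().cdf` standing in for the table lookup. The second part uses the exact binomial formula $p(X) = \binom{n}{X} p^X q^{n-X}$, which the slides do not state, to give the probability of each value of X. `to_z` and `binomial_pmf` are hypothetical helper names.

```python
from math import comb
from statistics import NormalDist

# --- Unit-normal procedure for the SAT questions (mu = 500, sigma = 100) ---
MU, SIGMA = 500, 100
std_normal = NormalDist()  # mean 0, SD 1: plays the role of the unit normal table

def to_z(x):
    """Step 4 of the procedure: transform X into a z-score."""
    return (x - MU) / SIGMA

# a) p(X > 650): z = 1.5, area to the right (the tail)
p_a = 1 - std_normal.cdf(to_z(650))
# b) p(400 < X < 500): area between z = -1 and z = 0
p_b = std_normal.cdf(to_z(500)) - std_normal.cdf(to_z(400))
print(round(p_a, 4), round(p_b, 4))  # 0.0668 0.3413

# --- Binomial notation: p = p(A), q = 1 - p, n observations, X = count of A ---
def binomial_pmf(x, n, p):
    """Exact probability of exactly x occurrences of A in n independent observations."""
    q = 1 - p
    return comb(n, x) * p**x * q**(n - x)

# Two coin tosses, X = number of heads
print([binomial_pmf(x, 2, 0.5) for x in range(3)])  # [0.25, 0.5, 0.25]
```

For two coin tosses the formula gives 1/4, 1/2, 1/4 for X = 0, 1, 2, which agrees with the table of outcomes for two tosses.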
The binomial distribution shows the probability associated with each value of X from X = 0 to X = n.

The Binomial Distribution
Table of outcomes for two coin tosses (X = number of heads):
Toss 1   Toss 2   X
Heads    Heads    2
Heads    Tails    1
Tails    Heads    1
Tails    Tails    0
p(X = 2) = 1/4, p(X = 1) = 1/2, p(X = 0) = 1/4

Draw the binomial distribution:
Class experiment: toss a coin 16 times and count the number of heads.
Shape of the binomial distribution for larger numbers of trials:
[Figure panels: n = 2, n = 8, n = 16, n = 64]

The binomial distribution approximates the normal distribution as n gets large. More precisely, the approximation holds when pn and qn are both greater than 10. The normal distribution will then have approximately:
$\mu = pn, \quad \sigma = \sqrt{npq}$
This means that, given p, q, and n, we can directly derive z-scores:
$z = \dfrac{X - pn}{\sqrt{npq}}$

An example graph:
Using a balanced coin, what is the probability of obtaining more than 30 heads in 50 tosses?
p = 0.5, q = 0.5, n = 50, X = 30
$\mu = pn = 0.5(50) = 25$
$\sigma = \sqrt{npq} = \sqrt{50(.5)(.5)} = 3.54$
$z = \dfrac{X - \mu}{\sigma} = \dfrac{30 - 25}{3.54} = 1.41$
Probability is .0793.

A friend bets you that he can draw a king more than 8 times in 20 draws (with replacement) from a fair deck of cards, and he does it. Is this a likely outcome, or should you conclude that the deck is not “fair”?
p = .077, q = .923, n = 20, X = 8
$\mu = pn = 0.077(20) = 1.54$
$\sigma = \sqrt{npq} = \sqrt{20(.077)(.923)} = 1.19$
$z = \dfrac{X - \mu}{\sigma} = \dfrac{8 - 1.54}{1.19} = 5.43$
Probability is ~0. (Caution: here pn = 1.54, well below 10, so the normal approximation is only rough. An exact binomial calculation also gives a probability very close to zero.)

Baby sea turtles hatch on land and have to make it to the ocean quickly before birds pick them off. A baby sea turtle has a 1/8 chance of reaching the water safely. If a mother lays 100 eggs (and they all hatch), what is the probability that more than half of the hatchlings make it to the ocean safely?
p = 0.125, q = 0.875, n = 100, X = 50
$\mu = pn = 0.125(100) = 12.5$
$\sigma = \sqrt{npq} = \sqrt{100(.125)(.875)} = 3.31$
$z = \dfrac{X - \mu}{\sigma} = \dfrac{50 - 12.5}{3.31} = 11.33$
Probability is close to zero.

Statistical Significance
It is very unlikely to obtain an individual from the original population whose z-score lies beyond ±1.96. Only about 5% of any normally distributed population falls in this area under the curve (2.5% in each tail).
Therefore, we define an event as “unlikely due to chance”, or statistically significant, if it has less than a 5% chance of occurring in a normal population.
Our card magician was “unlikely”, but our coin flip could still be explained by chance (p = .0793, which is not < .05).
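The three binomial examples (coin, cards, turtles) and the 1.96 cut-off can be checked with the Python sketch below (3.8+ for `math.comb` and `statistics.NormalDist`). It computes the normal-approximation z-score from $\mu = pn$ and $\sigma = \sqrt{npq}$ and compares the resulting upper-tail probability with an exact binomial tail sum, an addition that is not in the slides. It also flags whether the slides' "pn and qn greater than 10" rule is met. The helper names are hypothetical. Because z is not rounded here, printed values can differ slightly from the slide figures (about .079 for the coin rather than .0793), and the exact tails will not equal the approximations.

```python
import math
from statistics import NormalDist

def normal_approx_z(x, n, p):
    """z-score for X under the normal approximation: mu = pn, sigma = sqrt(npq)."""
    q = 1 - p
    return (x - p * n) / math.sqrt(n * p * q)

def exact_upper_tail(x, n, p):
    """Exact binomial p(X > x): sum of C(n, k) p^k q^(n-k) for k = x+1 .. n."""
    q = 1 - p
    return sum(math.comb(n, k) * p**k * q**(n - k) for k in range(x + 1, n + 1))

# (X, n, p) for each example; each question asks for p(more than X successes)
examples = {
    "coin, >30 heads in 50 tosses": (30, 50, 0.5),
    "cards, >8 kings in 20 draws": (8, 20, 4 / 52),
    "turtles, >50 of 100 reach the sea": (50, 100, 0.125),
}

ALPHA = 0.05  # significance threshold used in the lecture
std_normal = NormalDist()

for label, (x, n, p) in examples.items():
    z = normal_approx_z(x, n, p)
    p_normal = std_normal.cdf(-z)        # upper tail, using symmetry
    p_exact = exact_upper_tail(x, n, p)
    rule_met = p * n > 10 and (1 - p) * n > 10
    print(f"{label}: z = {z:.2f}, normal tail = {p_normal:.2e}, "
          f"exact tail = {p_exact:.2e}, pn/qn > 10: {rule_met}, "
          f"significant: {p_normal < ALPHA}")

# Proportion of a normal distribution beyond +/-1.96 (both tails combined)
print(round(2 * std_normal.cdf(-1.96), 3))  # 0.05
```

Computing the upper tail as `cdf(-z)` instead of `1 - cdf(z)` avoids subtracting two nearly equal numbers when z is large.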