The Standard Normal Distribution; Sampling Distribution
Seminar 5

An example of a very difficult question
Identify the variables
“Objects and events that are simultaneously attended with one’s social group are subject to more elaborative processing, are better remembered, and are more readily internalized through social learning. In contrast, none of these effects are observed when jointly experiencing an event with nongroup members.”
Shteynberg et al. (2014). Feeling more together. Emotion.
Your midterm will be easier than this, but more difficult than your tutorials.

STANDARD NORMAL DISTRIBUTION

Recap
• We learnt how normal distributions are formed.
• We learnt about z-scores.
• How are they related?

Today’s question
• Example: Ramadhar gets a 50 on his Statistics midterm and a 50 on his Calculus midterm. Did he do equally well on these two exams, compared to his classmates?
• Big question: How can we compare a person’s score on different variables?

Very similar concepts
Normal distribution vs. Standard(ized) normal distribution

The Normal Distribution
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}$$
Note constants: π = 3.14159, e = 2.71828
This is a bell-shaped curve with different centers and spreads depending on μ and σ.

The Normal Distribution
It’s a probability density function.
No matter what the values of μ and σ, it must integrate to 1!
$$\int_{-\infty}^{+\infty} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}\, dx = 1$$

Case 1
[Figure: grade distributions for Statistics and Calculus; x-axis: grade, 0 to 100.]
Mean Statistics = 40
Mean Calculus = 60
• Statistics: Ramadhar’s exam score is 10 points above the mean
• Calculus: Ramadhar’s exam score is 10 points below the mean
• How can we interpret Ramadhar’s grade relative to the average performance of the class, for each course?

*Note* It is wrong to interpret Ramadhar’s grade relative to the performance of both courses combined.
50°F and 100°F: you can’t say the average temperature is 75°F, right?

Case 2
[Figure: grade distributions for Statistics and Calculus; x-axis: grade, 0 to 100.]
• Both distributions have the same mean (40), but different standard deviations (5 vs. 20)
• In one case, Ramadhar is performing better than about 98% of the class. In the other, he is performing better than about 69% of the class (assuming normally distributed scores).
• Thus, how we evaluate Ramadhar’s performance depends on how much variability there is in the exam scores

Standard (Z) Scores
• We want to express a person’s score with respect to both (a) the mean of the group and (b) the variability of the scores
– how far a person is from the mean = X - M
– variability = SD
Standard score or $Z_i = \frac{X_i - M}{SD}$
** How far a person is from the mean, in the metric of standard deviation units **

Case 1
Mean Statistics = 40
Mean Calculus = 60
Statistics: (50 - 40)/10 = +1, one SD above the mean
Calculus: (50 - 60)/10 = -1, one SD below the mean

Case 2
An example where the means are identical, but the two sets of scores have different spreads
Statistics Z-score: (50 - 40)/5 = 2
Calculus Z-score: (50 - 40)/20 = .5

Three Properties of Standard Scores
1. The mean of a set of z-scores is always 0
2. The SD of a set of standardized scores is always 1
3. Shape of distribution for unstandardized and standardized scores is identical.
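The z-score arithmetic for Cases 1 and 2, and the first two properties of standard scores, can be checked with a short Python sketch. The `z_score` helper and the `grades` list are made up for illustration and are not part of the seminar material; the population SD (`pstdev`) is used here as an assumption, since the slides do not say which SD formula applies.

```python
from statistics import mean, pstdev


def z_score(x, m, sd):
    """How far x is from the mean m, in standard deviation units."""
    return (x - m) / sd


# Case 1: same SD (10), different means (40 vs. 60)
z_stats_case1 = z_score(50, 40, 10)  # +1.0, one SD above the mean
z_calc_case1 = z_score(50, 60, 10)   # -1.0, one SD below the mean

# Case 2: same mean (40), different SDs (5 vs. 20)
z_stats_case2 = z_score(50, 40, 5)   # 2.0
z_calc_case2 = z_score(50, 40, 20)   # 0.5

# Properties 1 and 2, checked on a hypothetical set of grades
grades = [25, 30, 35, 40, 40, 45, 50, 55]
m, sd = mean(grades), pstdev(grades)
z = [z_score(g, m, sd) for g in grades]
z_mean, z_sd = mean(z), pstdev(z)

print(z_stats_case1, z_calc_case1, z_stats_case2, z_calc_case2)
print(f"mean of z-scores = {z_mean:.3f}, SD of z-scores = {z_sd:.3f}")
```

Standardizing only shifts and rescales each score, so the ordering of the grades (and hence the shape of their distribution) is unchanged, which is property 3.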
[Figure: histograms of the same set of scores, UNSTANDARDIZED and STANDARDIZED.]

Two advantages of standard scores
We can use standard scores to find (per)centile scores: the proportion of people with scores less than or equal to a particular score.
(per)centile scores ≠ z-scores
(per)centile scores ↔ z-scores

The area under a normal curve
[Figure: normal curve; x-axis: score in SD units, -4 to 4. 50% of the area lies on each side of the mean: about 34% between the mean and 1 SD, about 14% between 1 and 2 SDs, and about 2% beyond 2 SDs.]

Two advantages of standard scores
Standard scores provide a way to standardize or equate different metrics. We can now interpret Ramadhar’s scores in Statistics and Calculus on the same z-score metric. (Each score comes from a distribution with the same mean [zero] and the same standard deviation [1].)

Two disadvantages of standard scores
Because a person’s score is expressed relative to the group (X - M), the same person (score) can have different z-scores when assessed in different samples.
Example: Ramadhar’s z-score depends on everyone else’s scores.

Two disadvantages of standard scores
If the absolute score (e.g., $, ₹, €) is meaningful or of psychological interest (e.g., milliseconds), it will be obscured by transforming it to a relative metric.
We will revisit this concept in Multiple Regression (Week 11).

Z-scores → Percentile scores
What’s the probability of getting a math SAT score of 575 or less, given μ = 500 and σ = 50?
$$Z = \frac{575 - 500}{50} = 1.5$$
A score of 575 is 1.5 standard deviations above the mean.
$$P(X \le 575) = \int_{200}^{575} \frac{1}{(50)\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-500}{50}\right)^{2}}\, dx \approx \int_{-\infty}^{1.5} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}z^{2}}\, dz$$
You don’t have to calculate this. Look up a z-score table.

Looking up probabilities in a standard normal table
What is the area to the left of Z = 1.51 in a standard normal curve?
The area is 93.45%: a score at Z = 1.51 is higher than 93.45% of all scores.

What does it mean to get a sample from a population?

SAMPLING DISTRIBUTIONS

Recap: Sample vs Population
• Population – A group that includes all the cases (individuals, objects, or groups) in which the researcher is interested.
• Sample – A relatively small subset from a population.

Today’s question
• Are the descriptive statistics we obtain from a sample the same as the corresponding statistics in a population?
• Of course not! How different will they be?
• In other words, what is the error associated with inferring parameters from a sample to a population?

Population inferences can be made...
...by selecting a representative sample from the population

Why sample?
• Reduces cost of research (e.g. political polls)
• In some cases (e.g. industrial production) analysis may be destructive
• Generalize findings to population
– Inference from sample to population: inferential statistics

Features of good inference from samples
Random selection
Don’t confuse this with “random assignment”
Every member of the population has the same chance of being selected in the sample
(Often violated in psychology…but is this necessarily a problem?)
Henrich et al. (2010). The weirdest people in the world? Behav Brain Sci.

Features of good inference from samples
Large sample size, N
How large?
Central Limit Theorem: N > 30 (demonstration on last slide)

Sampling distribution
• We take only one sample, so we need to know (a) how much error we can expect on average and (b) how much variation there will be on average in the errors observed
• Sampling distribution: the distribution of a sample statistic (e.g., a mean) when sampled under known sampling conditions from a known population.

The real problem
• Often, we don’t know the parameters of a population. (That’s why we sample!)
• The sample is an estimate of the population (philosophically: an estimate of the truth)

Features of sampling distribution
In statistics, we are mostly concerned with the M and SD.
• Mean of sampling distribution: $\mu_{\bar{x}} = \mu$
• Standard deviation (SD) of the sampling distribution: standard error (SE), $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
σ: population standard deviation (in practice estimated by the sample standard deviation)
SDs are “errors”.
SEs are essentially “errors” of “errors”.
n: sample size

From the previous two formulas…
[Figure: three histograms of sample means; x-axis: z, 0 to 20.]
“small” sample: mean of sample means = 10, SD of sample means = 4.16
“medium” sample: mean of sample means = 10, SD of sample means = 2.41
“large” sample: mean of sample means = 10, SD of sample means = 0.87

Central Limit Theorem
“No matter what we are measuring, the distribution of any measure across all possible samples is approximately a normal distribution, as long as the number of cases in each sample is about 30 or larger.”

What is the probability that the ball will end up at…
CLT demonstration
Discussion: Why aren’t the small balls normally distributed?

Summary
• Z-scores are useful relative scores in some cases
• A good sample is one that is random and large.
• The sampling technique matters too (to be taught in detail in SRM II)
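The two features of the sampling distribution (its mean equals μ, and its SD equals σ/√n) can be illustrated with a small simulation. This is a sketch under stated assumptions, not the demonstration used in the seminar: the uniform population on 0 to 20, the sample sizes 2, 10 and 50, the number of draws, and the `sample_means` helper are all hypothetical choices.

```python
import random
from math import sqrt
from statistics import mean, pstdev

random.seed(5)  # fixed seed so the run is repeatable

# Hypothetical, deliberately non-normal population: uniform scores from 0 to 20
population = [random.uniform(0, 20) for _ in range(100_000)]
mu, sigma = mean(population), pstdev(population)


def sample_means(n, draws=2_000):
    """Means of `draws` random samples (drawn with replacement), each of size n."""
    return [mean(random.choices(population, k=n)) for _ in range(draws)]


results = {}
for n in (2, 10, 50):
    means = sample_means(n)
    # (mean of sample means, SD of sample means, predicted standard error)
    results[n] = (mean(means), pstdev(means), sigma / sqrt(n))
    print(f"n={n:>2}  mean of sample means={results[n][0]:.2f}  "
          f"SD of sample means={results[n][1]:.2f}  "
          f"sigma/sqrt(n)={results[n][2]:.2f}")
```

In each row the mean of the sample means should sit close to the population mean, and the SD of the sample means should track σ/√n, shrinking as the sample size grows.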