Introduction to summary statistics: Sample mean & sample variance
Fred Boehm
Statistics 224
January 27, 2014

224 logistics
• Website updates: revised office hours info
  • Alyssa & Huikun: 12-2pm today in MSC 1217c
  • Fred: 6:30-8pm today in Wendt Library (room 129)
• Electronic survey
  • Respond by 6pm tonight (Jan 27)
  • Completion time: ~3 minutes
  • Email me if you can't find the email with the hyperlink
• Homework 1 due Wednesday at 11am in class

Lecture overview
• Key terms in statistics
  • Statistic & Parameter
  • Random variable
• Measures of central tendency
• Measures of spread
• Sample mean as a statistic related to central tendency
• Sample variance as a statistic related to spread of data
• Coin flip examples

Statistic vs. Parameter
• Statistic – observed values, or a function of observed values
  • Coin flip example: for ten coin flips, what is the number of heads?
• Parameter – an unknown, underlying value that affects the observed outcomes
  • Coin flip example: is the coin fair? In other words, is the probability of observing heads equal to 0.5?

Random variable
• Technical definitions use notions from probability theory
• For our purposes, we may think of a random variable as an outcome that has more than one possible value
• Random variable example: a coin flip
  • Two possible outcomes (heads or tails)

What is a "sample mean"?
• A statistic (a function of observed data)
• Intuitively, the 'center' point of your observations
• Mathematically, the "average" of your observed values: for n observations X₁, X₂, …, Xₙ, X̄ = (X₁ + X₂ + … + Xₙ) / n
• Written as X with a bar above it (X̄)
• Pronounced "X bar"

Batting Average in Baseball
• Baseball batting average (AVG)
• What is the maximum possible value of AVG?
• What is the minimum possible value of AVG?

Sample mean, continued
• Coin flips example
• Repeat coin flips and record outcomes

Coin Flips Activity
• Each student flips the penny 5 times
• Record the number of heads (between zero and five)
• Show of hands for each possible number of heads
• Plot the data (as a histogram) in R

Coin Flips Activity, continued
• Do you think that your coin is fair? Why?
• What might you do to better assess the fairness of your coin?
• Turn to your neighbor to discuss these three questions

Sample variance
• Tells you about the 'spread' of the data
• Larger sample variance corresponds to data being more spread out
• Mathematically, one definition is:

Sample variance & coin flips
• You've already flipped your penny 5 times
• You recorded the number of heads that you saw
• Calculate, from your five flips:
  • Sample mean = X̄ = (number of heads) / (number of flips)

Sample variance & coin flips, continued
• Now, calculate the sample variance from your five flips
• Compare your sample variance with those of your neighbors
• Should you have the same sample variance as your neighbors?
• Should you be surprised if you and your neighbor have the same sample variance? Why?

Histograms of three random samples
• Black: variance = 100; sample variance = 89
• Red: variance = 16; sample variance = 17
• Green: variance = 1; sample variance = 0.95

Sibling count histogram
• How do we get the sample mean from a histogram?
• What is (approximately) the sample mean here? Answer: about 1.8
Data from Stockholm Birth Cohort Study. http://www.stockholmbirthcohort.su.se/

Lecture overview
• Key terms in statistics
  • Statistic & Parameter
  • Random variable
• Measures of central tendency
• Sample mean as a statistic related to central tendency
• Measures of spread
• Sample variance as a statistic related to spread of data
• Coin flip examples
• Guessing a sample mean from a histogram
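The coin-flip calculations from this lecture can be sketched in Python (the lecture itself plots in R). This is a minimal sketch, assuming heads are coded as 1 and tails as 0, so that X̄ is simply the fraction of heads. The slide's own variance formula is not preserved here, so dividing by n − 1 is an assumption; passing `ddof=0` divides by n instead. The helper names `sample_mean` and `sample_variance`, and the recorded flips, are hypothetical.

```python
import random


def sample_mean(xs):
    """Sample mean X-bar: sum of the observations divided by n."""
    return sum(xs) / len(xs)


def sample_variance(xs, ddof=1):
    """Sum of squared deviations from X-bar, divided by (n - ddof).

    ddof=1 divides by n - 1 (assumed default here); ddof=0 divides by n.
    """
    n = len(xs)
    xbar = sample_mean(xs)
    return sum((x - xbar) ** 2 for x in xs) / (n - ddof)


# Hypothetical record of five penny flips: 1 = heads, 0 = tails.
my_flips = [1, 0, 1, 1, 0]
print("X-bar =", sample_mean(my_flips))  # 3 heads / 5 flips
print("variance, n - 1 divisor:", sample_variance(my_flips))
print("variance, n divisor:", sample_variance(my_flips, ddof=0))

# A simulated "neighbor" flipping a fair coin five times (seed is arbitrary).
rng = random.Random(224)
neighbor_flips = [rng.randint(0, 1) for _ in range(5)]
print("neighbor:", neighbor_flips, sample_variance(neighbor_flips))
```

With the n divisor, the variance of 0/1 data simplifies to X̄(1 − X̄), e.g. 0.6 × 0.4 = 0.24 for three heads in five flips. Under either divisor, the sample variance depends only on the number of heads, so two students with the same head count get the same sample variance even if their flip sequences differ.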
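One way to answer "How do we get the sample mean from a histogram?" is to weight each bar's value by its height: X̄ = Σ (value × count) / Σ count. The sketch below assumes each bar covers a single integer value, as with sibling counts; for binned data, using bin midpoints gives only an approximation. The counts are invented for illustration and are not the Stockholm Birth Cohort Study data, and `mean_from_histogram` is a hypothetical helper name.

```python
def mean_from_histogram(counts):
    """Sample mean from histogram bars given as {value: count}.

    Each bar contributes value * count to the total; dividing by the
    number of observations (the sum of the bar heights) gives X-bar.
    """
    total = sum(counts.values())
    return sum(value * count for value, count in counts.items()) / total


# Hypothetical sibling counts (NOT the Stockholm Birth Cohort data):
# 10 people with 0 siblings, 30 with 1, 35 with 2, 15 with 3, 10 with 4.
sibling_hist = {0: 10, 1: 30, 2: 35, 3: 15, 4: 10}
print(mean_from_histogram(sibling_hist))  # prints 1.85

# Same answer as expanding the histogram back into raw observations.
raw = [value for value, count in sibling_hist.items() for _ in range(count)]
print(sum(raw) / len(raw))
```

Reading bar heights off a plotted histogram is approximate, which is why the slide asks for an approximate sample mean.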