Describing Samples

Based on Chapter 3 of Gotelli & Ellison (2004) and Chapter 4 of D. Heath (1995). An Introduction to Experimental Design and Statistics for Biology. CRC Press.

• The basic output of any scientific investigation is a collection of observations, or data. (Ex. If Y is a random variable, then we use $Y_i$ to denote the ith observation in our sample.)
• Often, we will use our sample data to estimate unknown population parameters. (Ex. We can use the sample mean, $\bar{Y}$, to estimate the population mean, $\mu$.)
• The construction of frequency distributions is usually the first step in summarizing data.

Hypericum cumulicola:
• Small, short-lived perennial herb
• Narrowly endemic and endangered
• Flowers are small and bisexual

[Figure: Histogram of plant height (1995)]

Measures of location
• It is useful to identify a “typical value” to summarize our observations (i.e., an “average”)
• Examples include:
1. Mean
2. Median
3. Mode

The Arithmetic Mean
The arithmetic mean (or simply the mean) of a list of numbers is the sum of all the observations ($Y_i$) in the list divided by the number of observations ($n$):

$$\bar{Y} = \frac{\sum_{i=1}^{n} Y_i}{n}$$

• Remember the formula for the expected value of a discrete random variable?

$$E(Y) = \sum_{i=1}^{n} Y_i \, p_i$$

• Since we assume, for our sample, that the $Y_i$ are the values of a random variable and that $p_i = 1/n$ for all $Y_i$, we get:

$$E(Y) = \sum_{i=1}^{n} Y_i \, (1/n) = \frac{1}{n} \sum_{i=1}^{n} Y_i = \bar{Y}$$

• The arithmetic mean of the observations in our sample ($\bar{Y}$) is an unbiased estimator of the population mean ($\mu$) if 3 conditions are met:
1. Observations are made on randomly selected individuals
2. Observations in the sample are independent
3. 
Observations are drawn from a larger population that is distributed as a normal random variable

The Law of Large Numbers
• As the sample size $n$ increases, the arithmetic mean of the $Y_i$ approaches the expected value of Y:

$$\lim_{n \to \infty} \bar{Y}_n = \lim_{n \to \infty} \frac{\sum_{i=1}^{n} Y_i}{n} = E(Y)$$

The Median
• The value of a set of ordered observations that has an equal number of observations above and below it.
• Estimation:
– For an odd number of observations, the median is the middle observation of the set.
– Ex. Median of {1, 2, 3, 4, 5} = 3
– For an even number of observations, the median is the average of the two middle observations of the set.
– Ex. Median of {1, 2, 3, 4, 5, 6} = (3+4)/2 = 3.5

[Figure: The sample mean and the median height of Hypericum cumulicola (ADULTS ONLY)]

[Figure: The normal distribution with the observed sample mean and variance]

The Mode
• The value of the observations that occurs most frequently in the sample.
• This will be the peak of the frequency distribution in a histogram.

The distribution of height of Hypericum cumulicola is bimodal. Could you suggest why?

[Figure: Plotting seedlings and adults separately]

Final Comments on Measures of Location
• When the underlying distribution is symmetrical (or nearly so), the mean, median, and mode are all similar in value, BUT…
• …when there are extreme observations, the median or mode may better describe the location of the data

Measures of variability
• It is never sufficient to just state the mean or other measure of location of our data!
• Because there is variability in nature, variability due to our sampling, etc., we also need to estimate the spread of our observations around the average value
• Examples include: the range, the variance, and the standard deviation

The sample variance
For an individual value, the difference ($Y_i - \bar{Y}$) is called a deviation from the mean. The sum of the squared deviations is called the sum of squares (SS).
We divide SS by one less than the sample size to get the sample variance ($s^2$), which is an unbiased estimator of the population variance ($\sigma^2$).

$$s^2 = \frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n - 1} = \frac{\text{Sum of squares}}{n - 1} = \frac{SS}{n - 1}$$

The sample standard deviation
The units in which the variance is expressed are (original units)$^2$, which is conceptually awkward. To get around this, the sample variance is converted to the sample standard deviation ($s$) by simply taking the square root:

$$s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n - 1}}$$

[Figure: Normal distribution with 68.26% of observations within one standard deviation of the mean and 15.87% in each tail]

The Standard Error of the Mean
• Remember the Central Limit Theorem: if the $Y_i$ are independent random observations and the sample size is “reasonably large”, the sample mean ($\bar{Y}$) is approximately normally distributed with mean $E[Y]$ and variance $\sigma^2(Y)/n$
• Thus, we can calculate the standard error of the mean as follows:

$$s_{\bar{Y}} = \sqrt{\frac{\sigma^2(Y)}{n}} \approx \sqrt{\frac{s^2}{n}} = \frac{s}{\sqrt{n}}$$
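
Worked example
The measures of location and variability defined in these slides can be computed directly from their formulas. The sketch below is one minimal way to do so in Python. The `heights` values are hypothetical numbers chosen for illustration (they are not the Hypericum cumulicola data), and the function name `describe` is an assumption of this example.

```python
import math
import statistics

# Hypothetical plant heights in cm (illustrative values only,
# not the Hypericum cumulicola measurements from the lecture).
heights = [12.0, 15.0, 15.0, 18.0, 20.0, 22.0, 24.0]


def describe(sample):
    """Compute the summary statistics defined in the slides for one sample."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations to estimate variance")

    mean = sum(sample) / n                      # arithmetic mean: sum(Y_i) / n
    ss = sum((y - mean) ** 2 for y in sample)   # sum of squared deviations (SS)
    variance = ss / (n - 1)                     # sample variance s^2 = SS / (n - 1)
    sd = math.sqrt(variance)                    # sample standard deviation s
    se = sd / math.sqrt(n)                      # standard error of the mean s / sqrt(n)

    return {
        "n": n,
        "mean": mean,
        "median": statistics.median(sample),    # middle value of the ordered sample
        "mode": statistics.mode(sample),        # most frequent value
        "range": max(sample) - min(sample),
        "ss": ss,
        "variance": variance,
        "sd": sd,
        "se": se,
    }


summary = describe(heights)
for name, value in summary.items():
    print(f"{name:>8}: {value}")
```

Note that `statistics.mode` returns a single value, so for a bimodal sample like the one discussed under The Mode it reports only one of the peaks; `statistics.multimode` lists all of the most frequent values.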