Statistics Introduction

Statistics
• The art and science of gathering, describing, analyzing, summarizing and interpreting data to give us new information and knowledge

Populations and Samples

Reasons to Sample
• Reduced Cost: For populations having an extremely large number of individuals, measuring each individual would be impractical. For example, it would not be feasible to measure the volume of every tree on a large timber sale. Neither the time nor the money is available to do such an inventory. Additionally, the cost of such an inventory would exceed the value of the sale.
• Greater Speed: Sampling reduces the magnitude of the job, allowing the task to be completed in a shorter period of time. Not many timber sales would make it to market if the volume had to be measured on every tree and the sale prep work took ten months.
• Greater Scope: Sampling provides the ability to study a larger area and include diverse information about the population. Additional information such as species present in the population, type of defect, or other conditions in the population can be included when sampling is used.
• Greater Accuracy: It is often overlooked that the quality of work suffers when budgets and resources are stretched too thin. Good measurements on a sample of individuals provide more reliable information than bad measurements on the entire population.
• Hopefully the sample is representative.

Populations, Parameters and Estimates
• Unit – the object information is gathered on
  – e.g. a tree or a plot
  – All units combined equal the population
• Variable – the attribute, property or characteristic that varies from unit to unit
  – e.g. species, height, diameter or volume per acre
• Parameters describe populations
  – e.g. size, total, mean, variation

Accuracy vs. Precision
• Accuracy = the success of estimating the true value of a quantity
• Precision = the clustering of sample values about their own average

Bias
• A systematic distortion in the data, which may be caused by poor measurements, bad sample selection or incorrect estimation procedures.
• This could be caused by defective or improperly calibrated equipment or measuring procedures, or by user error such as failing to correct for declination, slope, etc.

Accuracy, Bias and Precision
• Bias can be measured

Notation

Subscripts
In $x_i$, the x indicates the associated value for a particular individual. It could represent height, weight, age or some other value of interest. The subscript i indicates that the ith individual in the population has the value of interest used in the equation.

Summation
When summing a list of numbers, say 1 through 6, the summation of those numbers looks like this:
1 + 2 + 3 + 4 + 5 + 6 = 21
In statistics, a value represented by x and subscripted by i indicates the ith individual in the list. So if there were six individuals in the list and x represented height, summing those six values could appear like this:
$x_1 + x_2 + x_3 + x_4 + x_5 + x_6$ = the sum of heights
For lengthy lists, this gets cumbersome quickly. A shorthand notation was created using the Greek letter sigma ($\Sigma$):
$\sum_{i=1}^{n} x_i$
The notation above and below the sigma gives the limits over which the summation is applied. In this example, the i = 1 below the symbol and the n above it indicate to sum the values of the individuals in the list starting at 1 and running through n, however many are in the list. In the example of six values to sum, n would be 6.

Brackets
• Brackets indicate that the operations within them are to be performed before other operations outside the brackets.
• Sum, then square: $(1 + 2 + 3 + 4 + 5 + 6)^2 = 21^2 = 441$
• Square, then sum: $1^2 + 2^2 + 3^2 + 4^2 + 5^2 + 6^2 = 91$

More Summation
• Add a constant with brackets: (1 + 7.5) + (2 + 7.5) + (3 + 7.5) + (4 + 7.5) + (5 + 7.5) + (6 + 7.5) = 66
• Add a constant without brackets: (1 + 2 + 3 + 4 + 5 + 6) + 7.5 = 21 + 7.5 = 28.5
• Multiply by a constant without brackets: 3 × 1 + 3 × 2 + 3 × 3 + 3 × 4 + 3 × 5 + 3 × 6 = 63
• Multiply by a constant with brackets: 3 × (1 + 2 + 3 + 4 + 5 + 6) = 3 × 21 = 63

Units
• The items that are observed in sampling. In forestry, the unit is usually a tree.

Variables
• An attribute, property or characteristic of an individual or unit being observed.
• In forestry, the variable of interest might be tree height, diameter, weight, species, etc.

Continuous or Discrete Variables
• A continuous variable is one which can take on a value between any other two values, such as tree height, age of an animal or amount of water consumed.
• A discrete variable corresponds to a digital quantity, while a continuous variable corresponds to an analog quantity. Continuous variables can be analyzed with the usual statistics such as the mean and standard deviation. Categorical discrete variables, such as species, cannot.
• A discrete variable is one with a well-defined finite set of possible values, called states. Examples are the number of chicks in a clutch, a habitat that is either 'present' or 'absent', or the species of a tree. Discrete data are often analyzed with nonparametric statistics.

Continuous or Discrete?
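As a rough illustration of the distinction, the sketch below summarizes one continuous variable (tree height) and one discrete variable (species). The measurements and the names `heights_ft`, `species`, and `species_counts` are made up for this example and do not come from any real inventory.

```python
from collections import Counter
import statistics

# Hypothetical plot data, invented purely for illustration.
heights_ft = [62.5, 70.1, 58.3, 66.0]        # continuous: any value in a range is possible
species = ["fir", "pine", "fir", "cedar"]    # discrete: a finite set of states

# Continuous variable: the mean and standard deviation are meaningful summaries.
mean_height = statistics.mean(heights_ft)
sd_height = statistics.stdev(heights_ft)     # sample standard deviation

# Discrete (categorical) variable: tally the states instead of averaging them.
species_counts = Counter(species)
most_common_species = species_counts.most_common(1)[0][0]

print(f"mean height: {mean_height:.2f} ft, SD: {sd_height:.2f} ft")
print(f"species counts: {dict(species_counts)}, most common: {most_common_species}")
```

Averaging the species labels would have no meaning, which is why the discrete variable is reported as counts and a most frequent state.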
Statistical Notation for Populations

Statistical Notation for Samples

Other Notation

Population

Population Total
$X = \sum_{i=1}^{N} x_i$
• X is the population total
• $x_i$ is the unit value
• N is the population size

Population Central Tendency
• The mean is the arithmetic average of the set of observations.
• The median is the middle value of the series of observations when they are arranged in order of magnitude.
• The mode is the most frequently appearing value or class of values in a set of observations.
• For a normally distributed (bell-shaped) population, these values are all the same.

Population Variance
The population variance is used to characterize the spread and is defined as the average squared difference of the observations from the population mean.

Population Standard Deviation
To get a measure of variation expressed in the same units as the original data, the square root of the variance is taken. Like the variance, the standard deviation is a measure of the dispersion of the individual observations about the mean in a normally distributed population.

Population Coefficient of Variation
Because populations with large means tend to have larger standard deviations than those with small means, the coefficient of variation (CV) permits a comparison of relative variability about different means. It is independent of the units used and is useful in comparing distributions where units may be different. The CV is the ratio of the standard deviation to the mean and is expressed as a percentage.
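The population measures described above can be worked through on a small example. The sketch below is a minimal illustration, not a reference implementation: it treats the values 1 through 6 from the summation slides as a complete population, and the function name `population_summary` and its dictionary keys are hypothetical. The variance divides by N, following the definition of the population variance as the average squared difference from the mean.

```python
import math
import statistics


def population_summary(values):
    """Return total, mean, median, variance, standard deviation and CV (%)
    for a list of values treated as an entire population."""
    n = len(values)                      # N, the population size
    total = sum(values)                  # X, the population total
    mean = total / n                     # arithmetic average
    # Average squared difference of each observation from the population mean.
    variance = sum((x - mean) ** 2 for x in values) / n
    std_dev = math.sqrt(variance)        # same units as the original data
    cv_percent = 100 * std_dev / mean    # standard deviation relative to the mean
    return {
        "total": total,
        "mean": mean,
        "median": statistics.median(values),
        "variance": variance,
        "std_dev": std_dev,
        "cv_percent": cv_percent,
    }


population = [1, 2, 3, 4, 5, 6]          # values reused from the summation example
summary = population_summary(population)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

For this population the total is 21 and the mean is 3.5; the variance is about 2.92, the standard deviation about 1.71, and the CV about 48.8%. The standard library's `statistics.pvariance` and `statistics.pstdev` compute the same population variance and standard deviation directly.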