Statistics

Introduction
Statistics
• the art and science of gathering, describing,
analyzing, summarizing and interpreting data
to give us new information and knowledge
Populations and Samples
Reasons to Sample
Reduced Cost
For populations having an extremely large number of individuals, measuring each individual
would be impractical. For example, it would be impossible to measure the volume of every
tree on a large timber sale. Neither the time nor the money is available for such an
inventory, and its cost would exceed the value of the sale.
Greater Speed
Sampling reduces the magnitude of the job, allowing the task to be completed in a shorter
period of time. Few timber sales would make it to market if the volume of every tree had to
be measured and the sale preparation took ten months.
Greater Scope
Sampling provides the ability to study a larger area and include diverse information about the
population. Additional information such as species present in the population, type of defect,
or other conditions in the population can be included when sampling is used.
Greater Accuracy
An often-overlooked point is that the quality of work suffers when budgets and resources are
stretched too thin. Good measurements on a sample of individuals provide more reliable
information than bad measurements on the entire population.
Hopefully, the sample is representative of the population
Populations, Parameters and Estimates
• Unit – the object on which information is gathered
– e.g., a tree or a plot
– All units combined make up the population
• Variable – the attribute, property or
characteristic that varies from unit to unit
– e.g., species, height, diameter or volume per acre
• Parameters describe populations
– e.g., size, total, mean, variation
Accuracy vs. Precision
• Accuracy refers to the success of estimating the true
value of a quantity
• Precision refers to the clustering of sample
values about their own average
Bias
• A systematic distortion in the data which may
be caused by poor measurements, bad sample
selection or incorrect estimation procedures.
• Bias can be caused by defective or
improperly calibrated equipment, flawed
measuring procedures, or user error such as failing to correct for declination or slope.
Accuracy, Bias and Precision
Bias can be measured
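Bias can only be measured when the true value is known. The sketch below assumes a hypothetical tree whose true height is known to be 20.0 m and two hypothetical instruments; all values are invented for illustration. It treats bias as the average measurement minus the true value, and it expresses precision (the clustering of values about their own average) as the population standard deviation of the measurements, which is one reasonable choice rather than the only one.

```python
from statistics import mean, pstdev

# Hypothetical example: a tree whose true height is known to be 20.0 m,
# measured five times with each of two hypothetical instruments.
TRUE_HEIGHT = 20.0
biased_but_precise = [21.0, 21.1, 20.9, 21.0, 21.0]      # tightly clustered, but too high
unbiased_but_imprecise = [18.5, 21.5, 19.0, 21.0, 20.0]  # centered on the truth, but scattered

def bias(measurements, true_value):
    """Bias: the average of the measurements minus the true value."""
    return mean(measurements) - true_value

def spread(measurements):
    """Precision as the spread of the values about their own average
    (population standard deviation; smaller means more precise)."""
    return pstdev(measurements)

for name, data in [("instrument A", biased_but_precise),
                   ("instrument B", unbiased_but_imprecise)]:
    print(f"{name}: bias = {bias(data, TRUE_HEIGHT):+.2f} m, spread = {spread(data):.2f} m")
```

Instrument A is precise but biased (about +1 m); instrument B is unbiased but less precise.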
Notation
Subscripts
In $x_i$, the x indicates the value associated with a particular
individual. It could represent height, weight, age or
some other value of interest. The subscript i indicates
that the value belongs to the ith individual in the population.
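In code, the subscript simply selects one individual's value. The minimal sketch below assumes a hypothetical population of four tree heights and a hypothetical helper `x(i)`; note that the notation counts from i = 1, while Python lists count from 0.

```python
# Hypothetical heights (m) for a population of N = 4 trees.
# In the notation x_i, i runs from 1 to N; Python lists are indexed from 0,
# so x_i is stored at position i - 1.
heights = [12.0, 15.5, 9.8, 20.1]

def x(i):
    """Return x_i, the value of interest for the ith individual (1-based i)."""
    if not 1 <= i <= len(heights):
        raise IndexError(f"i must be between 1 and {len(heights)}")
    return heights[i - 1]

print(x(1))  # value for the first individual
print(x(4))  # value for the fourth (last) individual
```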
Summation
When summing a list of numbers, say 1 through 6, the summation of
those numbers would look like this:
1 + 2 + 3 + 4 + 5 + 6 = 21
In statistics, a value represented by x and subscripted by i indicates
the ith individual in the list. So if there were six individuals in the list
and x represented height, summing those six values could appear like
this:
$x_1 + x_2 + x_3 + x_4 + x_5 + x_6$ = the sum of heights
For lengthy lists, this gets cumbersome quickly. A shorthand
notation was created using the Greek capital letter sigma:
$\sum_{i=1}^{n} x_i = x_1 + x_2 + \dots + x_n$
The notation below and above the sigma gives the limits over which
the summation is applied. Here, i = 1 below the symbol and n above it
indicate that the values are summed from the first individual through
the nth, however many are in the list. In the example of six values to
sum, n would be 6 to indicate there are six values in the list.
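The sigma notation translates directly into a loop over i from 1 to n. This sketch uses the list 1 through 6 from the example above; the variable names are illustrative only.

```python
# The list 1 through 6 from the summation example.
x = [1, 2, 3, 4, 5, 6]
n = len(x)  # n = 6

# Sigma notation: sum x_i for i = 1 to n (Python indices 0 to n - 1).
total = 0
for i in range(1, n + 1):
    total += x[i - 1]

print(total)   # 21
print(sum(x))  # the built-in sum gives the same result: 21
```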
Brackets
• Brackets indicate that the operations within them
are to be performed before operations outside
the brackets.
• $(1 + 2 + 3 + 4 + 5 + 6)^2 = 21^2 = 441$
• $1^2 + 2^2 + 3^2 + 4^2 + 5^2 + 6^2 = 91$
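The two bracket cases give very different results, which is easy to confirm numerically. This sketch uses the same values 1 through 6; the variable names are illustrative.

```python
x = [1, 2, 3, 4, 5, 6]

# Brackets first: sum the values, then square the sum.
square_of_sum = sum(x) ** 2              # (1 + ... + 6)^2 = 21^2 = 441

# No brackets around the sum: square each value, then sum the squares.
sum_of_squares = sum(v ** 2 for v in x)  # 1^2 + ... + 6^2 = 91

print(square_of_sum, sum_of_squares)  # 441 91
```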
More Summation
• Add a constant with brackets
(1 + 7.5) + (2 + 7.5) + (3 + 7.5) + (4 + 7.5) + (5 + 7.5) + (6 + 7.5) = 66
• Add a constant without brackets
(1 + 2 + 3 + 4 + 5 + 6) + 7.5 = 21 + 7.5 = 28.5
• Multiply by a constant without brackets
3 × 1 + 3 × 2 + 3 × 3 + 3 × 4 + 3 × 5 + 3 × 6 = 63
• Multiply by a constant with brackets
3 × (1 + 2 + 3 + 4 + 5 + 6) = 3 × 21 = 63
Adding a constant inside the brackets adds it once for every value (6 × 7.5 = 45 here), while multiplying by a constant gives the same result with or without brackets.
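The four cases can be checked directly. This minimal sketch uses the slide's constants (7.5 and 3) and the values 1 through 6, and asserts the two general rules: adding c inside the sum adds n × c, and a constant multiplier can be moved outside the sum.

```python
x = [1, 2, 3, 4, 5, 6]
n = len(x)
c_add, c_mult = 7.5, 3

add_inside = sum(v + c_add for v in x)    # sum of (x_i + 7.5): constant added n times -> 66.0
add_outside = sum(x) + c_add              # (sum of x_i) + 7.5: constant added once -> 28.5
mult_inside = sum(c_mult * v for v in x)  # sum of 3 * x_i -> 63
mult_outside = c_mult * sum(x)            # 3 * (sum of x_i) -> 63

# Adding inside the sum equals adding n * c to the plain sum;
# multiplying inside or outside the sum gives the same result.
assert add_inside == sum(x) + n * c_add
assert mult_inside == mult_outside
print(add_inside, add_outside, mult_inside, mult_outside)  # 66.0 28.5 63 63
```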
Units
Units are the items observed in sampling.
In forestry, the unit is usually a tree.
Variables
• an attribute, property or characteristic of an
individual or unit being observed.
• In forestry, the variable of interest might be:
tree height, diameter, weight, species, etc.
Continuous or Discrete Variables
• A continuous variable is one which can take on any value
between two other values, such as tree height, age of
an animal or amount of water consumed.
• A discrete variable is one with a well-defined finite set of
possible values, called states. Examples are the number of
chicks in a clutch, a habitat that is either ‘present’ or
‘absent’, or the species of a tree.
• A discrete variable corresponds to a digital quantity, while a
continuous variable corresponds to an analog quantity.
Continuous variables are commonly analyzed with parametric statistics
such as the mean and standard deviation. These summaries are not
meaningful for categorical discrete variables such as species; discrete
data are often analyzed with tools called nonparametric statistics.
Continuous or Discrete?
Statistical Notation for Populations
Statistical Notation for Samples
Other Notation
Population
Population Total
$X = \sum_{i=1}^{N} x_i$
where
X is the population total
$x_i$ is the unit value
N is the population size
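The population total is the sum of the unit values over all N units. This sketch assumes a hypothetical population of eight unit values (for example, volume per tree); the numbers are invented for illustration.

```python
# Hypothetical population of N = 8 unit values (e.g., volume per tree).
population = [2, 4, 4, 4, 5, 5, 7, 9]
N = len(population)

# X = sum of x_i for i = 1 to N
X = sum(population)
print(N, X)  # 8 40
```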
Population Central Tendency
• The mean is the arithmetic
average of the set of observations
• The median is the middle value of the series of
observations when they are arranged in order of
magnitude
• The mode is defined as the most frequently
appearing value or class of values in a set of
observations
• For a normally distributed (bell-shaped)
population, these three values are all the same.
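Python's standard `statistics` module computes all three measures. This sketch assumes a small hypothetical population that is not bell-shaped, so the mean, median and mode differ; with an even number of observations, `median` averages the two middle values.

```python
from statistics import mean, median, mode

# Hypothetical population of N = 8 observations (skewed, not bell-shaped).
population = [2, 4, 4, 4, 5, 5, 7, 9]

mu = mean(population)      # arithmetic average: 40 / 8 = 5
mid = median(population)   # even N: average of the two middle values, (4 + 5) / 2 = 4.5
most = mode(population)    # most frequently appearing value: 4

print(mu, mid, most)
```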
Population Variance
The population variance is used to
characterize the spread and is defined
as the average squared difference of
the observations from the population
mean.
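The definition above translates into $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$, dividing by N because every unit in the population is included. This sketch assumes a hypothetical population of eight values and checks the hand-written formula against `statistics.pvariance`.

```python
from statistics import pvariance

# Hypothetical population of N = 8 observations.
population = [2, 4, 4, 4, 5, 5, 7, 9]
N = len(population)
mu = sum(population) / N  # population mean = 5.0

# Population variance: the average squared difference from the population mean.
sigma2 = sum((x - mu) ** 2 for x in population) / N
print(sigma2)  # 4.0

# The standard library's pvariance computes the same population variance.
assert sigma2 == pvariance(population)
```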
Population Standard Deviation
To get a measure of variation expressed in the same units as the
original data, the square root of the variance is taken.
Like the variance, the standard deviation is a measure of the dispersion
of the individual observations about the mean in a normally distributed
population.
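Taking the square root returns the spread to the original units. This sketch assumes the same kind of hypothetical eight-value population (say, in metres), computes the standard deviation from the variance, and compares it with `statistics.pstdev`.

```python
import math
from statistics import pstdev

# Hypothetical population of N = 8 observations, in metres.
population = [2, 4, 4, 4, 5, 5, 7, 9]
N = len(population)
mu = sum(population) / N
sigma2 = sum((x - mu) ** 2 for x in population) / N  # variance = 4.0 (square metres)

# Standard deviation: square root of the variance, back in metres.
sigma = math.sqrt(sigma2)
print(sigma)  # 2.0
assert math.isclose(sigma, pstdev(population))
```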
Population Coefficient of Variation
Because populations with large means tend to have larger standard
deviations than those with small means, the coefficient of variation
permits a comparison of relative variability about different means. It is
independent of the units used and is useful in comparing distributions
where units may be different.
CV is the ratio of the standard deviation to the mean,
expressed as a percentage:
CV = (standard deviation ÷ mean) × 100
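The unit independence of the CV can be demonstrated by expressing the same hypothetical population in two units. This sketch uses a hypothetical helper, `coefficient_of_variation`, that applies the population formulas; converting metres to centimetres multiplies both the standard deviation and the mean by 100, so the CV stays the same.

```python
import math

def coefficient_of_variation(values):
    """Population CV (%) = population standard deviation / population mean * 100."""
    N = len(values)
    mu = sum(values) / N
    if mu == 0:
        raise ValueError("CV is undefined when the mean is zero")
    sigma = math.sqrt(sum((x - mu) ** 2 for x in values) / N)
    return sigma / mu * 100

# Hypothetical heights: the same population expressed in metres and in centimetres.
heights_m = [2, 4, 4, 4, 5, 5, 7, 9]
heights_cm = [h * 100 for h in heights_m]

print(coefficient_of_variation(heights_m))
print(coefficient_of_variation(heights_cm))
```

Both calls return a CV of 40%, even though the standard deviations (2 m and 200 cm) differ numerically.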