Statistics Definitions

COMPUTATIONAL CHEMISTRY
SOME STATISTICAL CONCEPTS AND DEFINITIONS
Population:
A population is a finite or infinite set of measurements of some property of a finite or infinite collection of
objects. The use of the word implies completeness and refers to the measurements of the variable of
interest and not to the objects whose properties are being measured.
Thus the population refers to the set of numerical measurements x1, x2, ..., xN, which are the outcomes of N
measurements of some property of the objects. Example: the heights and/or weights of students attending
PSU.
Sample:
A sample is a portion or a subset of the population and implies incompleteness. A sample always yields a
finite set of measured values due to the limitations of measurement. Samples are studied because of time
and financial limitations as well as the impossibility of measuring an infinite set. The important question is
this: WHAT CAN ONE LEARN ABOUT THE NATURE OF THE POPULATION FROM A STUDY OF THE
PROPERTIES OF SAMPLES DRAWN FROM THE POPULATION? Example: Harris polls attempt to
predict election results for the entire voting population (i.e., the statistical population is anyone who can vote)
from a detailed study of a statistical sample, which is usually about 1260 random individuals.
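The relationship between a population and a sample can be sketched numerically. The following Python sketch is an illustration only: the population of 20,000 heights is simulated, and the mean of 170 cm, the spread of 10 cm and the helper name draw_sample are assumptions made for the example, not data from these notes. Only the sample size of 1260 is taken from the poll example above.

```python
import random
import statistics

def draw_sample(population, n, seed=0):
    """Return n values drawn at random, without replacement, from population."""
    rng = random.Random(seed)
    return rng.sample(population, n)

# Hypothetical population: simulated heights (cm) of 20,000 students.
rng = random.Random(42)
population = [rng.gauss(170.0, 10.0) for _ in range(20000)]

# A sample of 1260 individuals, the size quoted for the poll example.
sample = draw_sample(population, 1260)

population_mean = statistics.fmean(population)
sample_mean = statistics.fmean(sample)

print(f"population mean = {population_mean:.2f} cm")
print(f"sample mean     = {sample_mean:.2f} cm")
```

In a run like this the sample mean typically lies close to the population mean, which is the sense in which a sample can tell us something about the population it was drawn from.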
Random Variable:
A random variable X is a real-valued function defined for all possible occurrences in an experiment.
'Random' implies that each measurement in the set of possible values is independent of all of the others.
Thus X(x1, x2, ..., xn) refers symbolically to the set x1, x2, ..., xn.
Example: In the rolling of a die, X takes values in the set 1, 2, ..., 6.
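As a minimal sketch of the die example, the Python fragment below treats each roll as one realization of X. It assumes a fair die, and the function name roll_die and the fixed seed are illustrative choices, not part of the original notes.

```python
import random

def roll_die(rng):
    """One realization of the random variable X for a fair six-sided die."""
    return rng.randint(1, 6)  # randint includes both endpoints

rng = random.Random(1)             # fixed seed so the run is repeatable
possible_values = set(range(1, 7))  # the set 1, 2, ..., 6

# Ten independent measurements x1, x2, ..., x10 of X.
rolls = [roll_die(rng) for _ in range(10)]
print(rolls)
```

Every entry of rolls is a member of possible_values, and no roll depends on the ones before it.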
Random variables may be of two kinds:
a) Discrete Variables: These are variables that can assume only specific values (usually integers), such as
the number of male students with brown hair, the number of wormy apples on a tree, or the number of
defective products in a batch. This includes all counting data from radioactive isotopes, die-rolling
problems, coin-flipping problems and opinion polls, to name only a few examples. Discrete variables are
very common in the sciences. The distribution that represents a discrete variable is a ‘discrete distribution’
and the corresponding probability function is the ‘probability mass function’ or pmf for short.
b) Continuous Variables: If a variable can assume any value within a finite interval, then we say that the
random variable is continuous. This concept is not limited by the resolution of the measuring process!
Examples: the height or weight of a person and the speed of a car; in fact, most of the variables studied in the
physical sciences are continuous. The fact that we cannot measure the variable to an infinite precision is
irrelevant. The distribution that represents a continuous variable is a ‘continuous distribution’ and the
corresponding probability function is called a ‘probability density function’ or pdf for short.
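The difference between a pmf and a density function can be made concrete with a short Python sketch. It assumes a fair die for the discrete case and, purely as an illustration, a normal distribution for the continuous case; the notes do not single out either model, and the function names are hypothetical. A pmf gives probabilities directly and sums to 1, while a density gives probabilities only after integration over an interval.

```python
import math
from fractions import Fraction

def die_pmf(k):
    """pmf of a fair six-sided die: P(X = k)."""
    return Fraction(1, 6) if k in range(1, 7) else Fraction(0)

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a normal distribution with mean mu and std. dev. sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def integrate(f, a, b, steps=10000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

# Discrete case: the probabilities of all possible values add up to 1.
pmf_total = sum(die_pmf(k) for k in range(1, 7))

# Continuous case: a probability is an area under the density curve.
prob_within_one_sigma = integrate(normal_pdf, -1.0, 1.0)

print("sum of pmf over 1..6 =", pmf_total)
print(f"P(-1 < X < 1) = {prob_within_one_sigma:.4f}")
```

Note that the value of a density at a single point is not itself a probability; only the integral over an interval is.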
Experimental Distributions:
An experimental distribution is a curve or graph (bar graph, pie chart, etc.) that results from plotting the
experimental frequencies of the measured data against the specific values that the experimental data take
on. Example: frequencies of people saying they are for, against and undecided in an election poll.
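A minimal sketch of how such a distribution is tabulated, in Python. The ten poll responses are made-up data used only for illustration, and the text bar chart stands in for a plotted bar graph.

```python
from collections import Counter

# Hypothetical poll responses (made-up data, for illustration only).
responses = ["for", "against", "for", "undecided", "for",
             "against", "for", "undecided", "against", "for"]

counts = Counter(responses)   # experimental frequency of each value
n = len(responses)
relative_freq = {answer: count / n for answer, count in counts.items()}

# Crude bar graph: one '#' per response.
for answer in ("for", "against", "undecided"):
    bar = "#" * counts[answer]
    print(f"{answer:<10} {counts[answer]:>2}  {relative_freq[answer]:.2f}  {bar}")
```

The counts are the experimental frequencies; dividing by the number of responses gives relative frequencies that sum to 1.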
Theoretical Distributions:
These are model equations that serve to represent the mathematical behavior of populations (not samples!).
They may be discrete or continuous. Usually, theoretical distributions are used to model experimental
distributions because once you know the theoretical model that represents an experimental distribution, you
can derive information about the experimental distribution that probably has not been or cannot be easily
measured. Of course, the validity of the procedure is dependent on how closely the experimental distribution
is modeled by the theoretical distribution.
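How closely a theoretical distribution models an experimental one can be checked directly. The Python sketch below is a simulation under stated assumptions: the "experiment" is 6000 simulated rolls of a fair die, and the theoretical model is the uniform pmf with probability 1/6 for each face. The number of rolls and the seed are arbitrary choices.

```python
import random
from collections import Counter

rng = random.Random(7)   # fixed seed so the run is repeatable
n_rolls = 6000

# Experimental distribution: observed frequencies from simulated rolls.
rolls = [rng.randint(1, 6) for _ in range(n_rolls)]
observed = Counter(rolls)

# Theoretical distribution: fair die, P(X = k) = 1/6 for k = 1, ..., 6.
theoretical = 1 / 6

max_dev = 0.0
for face in range(1, 7):
    freq = observed[face] / n_rolls
    max_dev = max(max_dev, abs(freq - theoretical))
    print(f"face {face}: observed {freq:.4f}, theoretical {theoretical:.4f}")

print(f"largest deviation = {max_dev:.4f}")
```

A small largest deviation suggests that the model is adequate for this data; a large one would be a warning against deriving further information from the model.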
8/21/2006