Statistics Definitions: Key Concepts & Methods

Statistics Definition
 Statistics is a collection of methods for planning experiments, obtaining data,
and then organizing, summarizing, presenting, analyzing, interpreting and
drawing conclusions based on that data. It’s the science of understanding data
and of making decisions in the face of variability and uncertainty.
 Descriptive Statistics uses both numerical and graphical methods to
summarize and/or describe the characteristics of a known set of data. Ex)
Average age of this class is 19.6 years old.
 Inferential Statistics goes beyond the description. It involves the use of sample
data to make inferences about a larger set of data from which the sample was
chosen. Ex) Infer or conclude that the average age of all Stfx students is 19.6
years old.
 Data are observations or measurements (scores, counts, measurements, names)
that you or someone else records in order to draw conclusions (inferences).
- Data does not have to be numbers
- Data is more than just a list of numbers or names
- Data has a story, or a context to describe the observations
 Raw Data is data that has not been sorted or changed in any way. Ex) {15.0,
18.5, 18.9, 24.0}
o Data: has meaning (a context)
o Raw Data: has no meaning on its own
 Variation is the difference or change in observations or measurements.
 Example) The total monthly sales (in $000) for four randomly selected months
last year were:
{15.0, 18.5, 18.9, 24.0}
• The average monthly sales are 19.1, or $19,100 (sum / number of observations =
76.4 / 4 = 19.1)
– We might conclude that the average monthly sales for the year are $19,100
– Or that the yearly sales are 12 × 19.1 = 229.2, i.e. $229,200
• Note how the sales differ from one sampled month to the next.
– We have variation in the data.
– How (accurate or reliable) will our (conclusion or inference) be given the
variation in this sample data?
– To answer this, we will need to have a better understanding of Variation.
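The arithmetic in this example can be checked with a short Python sketch. The
variable names are illustrative, and the range and sample standard deviation
are included only as two common ways to put a number on variation; the notes
have not chosen a measure of variation at this point.

```python
from statistics import mean, stdev

# Monthly sales in $000 for the four sampled months (values from the example).
sales = [15.0, 18.5, 18.9, 24.0]

avg_monthly = mean(sales)             # sum of observations / number of observations
projected_yearly = 12 * avg_monthly   # naive projection from only four months

# Two simple summaries of variation (an assumption of this sketch, not from the notes).
sales_range = max(sales) - min(sales)  # largest minus smallest observation
sample_sd = stdev(sales)               # sample standard deviation (divides by n - 1)

print(f"Average monthly sales: {avg_monthly:.1f} ($000)")
print(f"Projected yearly sales: {projected_yearly:.1f} ($000)")
print(f"Range: {sales_range:.1f}, sample standard deviation: {sample_sd:.2f}")
```

A spread of several thousand dollars around the average, from only four months,
suggests the yearly projection should be treated with caution.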
 Reliability is a measure of how good our inference (based on the sample) really
is.
 Goals of Statistics are to enable an investigator to plan research so as to take
variability into account (manage or reduce variability), to extract the maximum
amount of reliable information, and to quantify any variability in the data.
 Population is the complete collection of elements (scores, people,
measurements) to be studied.
 Sample is a sub-collection of elements drawn from the population.
 Experimental unit (or just unit or element) is an entity (person, object, event)
upon which we collect data.
 Parameter is a numerical measurement describing some characteristic of a
population.
 Statistic is a numerical measurement describing some characteristic of a
sample.
 Statistical inference is an estimate or prediction about a population and its
parameters based on information obtained through the sample and its sample
statistics.
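The difference between a parameter and a statistic can be illustrated with a
small simulation. This is only a sketch: the population below is made up
(randomly generated ages, not real student data), and the sizes 5000 and 50
are arbitrary choices.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: ages of 5,000 students (generated, not real data).
population = [random.randint(17, 25) for _ in range(5000)]

# Parameter: describes the population, so it uses every element.
population_mean = mean(population)

# Statistic: describes a sample drawn from that population.
sample = random.sample(population, 50)
sample_mean = mean(sample)

# Statistical inference: the sample mean serves as an estimate of the
# population mean; in practice the population mean is unknown.
estimation_error = sample_mean - population_mean

print(f"Population mean (parameter): {population_mean:.2f}")
print(f"Sample mean (statistic):     {sample_mean:.2f}")
print(f"Estimation error:            {estimation_error:+.2f}")
```

Here the parameter can be computed only because the whole population was
simulated; in a real study only the sample statistic would be available.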
 Variable is a characteristic observed on each unit in the sample that can vary
from unit to unit.
 For Example: Consider the class as a sample of STFX students. What are
some characteristics that can be observed on each student here?
- Hair Color, Degree, Height, Wake-up time, Shoe Size, Number of Siblings
Types of Variable
- Categorical (Qualitative) Variables are classified as belonging to groups or
categories. E.g., Hair colour, Degree program, Breakfast (yes/no)
- Numerical (Quantitative) Variables are measured using a numerical scale.
o Discrete variables can take only a finite or countable set of values.
E.g., Shoe size, Number of Siblings
o Continuous variables can take any value within an interval.
E.g., Height, Wake-up time
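As a quick self-check, the classification can be written as a lookup table in
Python. The variable names and their types come from the examples in these
notes; the dictionary and the helper function `is_numerical` are illustrative
names invented for this sketch.

```python
# Variable names and types as given in the examples in these notes.
variable_types = {
    "Hair colour": "categorical",
    "Degree program": "categorical",
    "Breakfast (yes/no)": "categorical",
    "Shoe size": "numerical, discrete",
    "Number of siblings": "numerical, discrete",
    "Height": "numerical, continuous",
    "Wake-up time": "numerical, continuous",
}

def is_numerical(variable_name):
    """Return True if the named variable is measured on a numerical scale."""
    return variable_types[variable_name].startswith("numerical")

# Collect the numerical variables, keeping the order in which they were listed.
numerical_variables = [name for name in variable_types if is_numerical(name)]
print(numerical_variables)
```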
 Classification of Variables (Alternative): Levels of Measurement
- Nominal -- data consist of names, labels or categories, no ordering scheme.
- Ordinal -- data can be arranged in some order, but differences cannot be
determined or are meaningless.
- Interval -- data can be arranged in some order with meaningful differences
between the data. No natural starting point.
- Ratio -- same as interval but does have a natural starting point.
 In an Observational Study, observations and measurements are made on subjects
in their natural setting, without modifying the subjects studied.
 In a Designed Experiment, we apply some treatment to the subjects (modify the
subjects) and observe the effects on the subjects.
- Usually, there are two or more groups to be studied: one is the control group
and the others are treatment groups. The treatment is applied only to the
treatment groups.
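One common way to form the control and treatment groups is to assign subjects
at random. The notes do not say how the groups are formed, so random
assignment is an assumption of this sketch, and the subject labels are
hypothetical.

```python
import random

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical subjects; in a real experiment these would be the study units.
subjects = [f"subject_{i}" for i in range(1, 21)]

# Shuffle a copy, then split it in half: first half control, second half treatment.
shuffled = subjects[:]
random.shuffle(shuffled)
half = len(shuffled) // 2
control_group = shuffled[:half]      # receives no treatment
treatment_group = shuffled[half:]    # receives the treatment

print("Control:  ", control_group)
print("Treatment:", treatment_group)
```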
 Survey asks people questions and records their responses.
- May be done by phone, by mail, or face to face
 Census surveys every member of the population.
- The results are very reliable, since the sample is the same as the population.
 Published Source: one uses results that have already been collected and
published.
 Selection Bias occurs when part of the population is excluded from ever being
in the sample; the exclusion may be intentional or unintentional.
 Non-Response Bias occurs when data cannot be obtained from every unit in the
sample. Ex) Not everyone who was selected as part of the sample responds to a
survey.
 Representative Sample has characteristics that are essentially the same as
those possessed by the population from which it was drawn.
 In Random Sampling, members of the population are selected in such a way that
each member has an equal chance of being selected. (Appendix A)
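The "equal chance" property can be illustrated with Python's standard
`random.sample`, which draws without replacement. This is a sketch under
stated assumptions: the population of ten member IDs, the sample size of 3 and
the 10,000 repetitions are all arbitrary choices made for illustration.

```python
import random
from collections import Counter

random.seed(0)  # fixed seed so the sketch is repeatable

population = list(range(1, 11))   # hypothetical population of 10 member IDs
sample_size = 3

# One simple random sample, drawn without replacement.
one_sample = random.sample(population, sample_size)
print("One random sample:", one_sample)

# Repeat the draw many times and record how often each member is selected.
n_draws = 10_000
counts = Counter()
for _ in range(n_draws):
    counts.update(random.sample(population, sample_size))

# Each member should be selected in roughly 3 out of every 10 draws.
selection_rates = {member: counts[member] / n_draws for member in population}
for member, rate in selection_rates.items():
    print(f"member {member}: selected in {rate:.1%} of draws")
```

With 3 of 10 members drawn each time, every member's selection rate should
settle near 30%, which is what equal chance of selection means here.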
 Confounding Variable is a variable that offers an alternative explanation for
the differences between the groups, one that cannot be ruled out.
 Placebo is a neutral treatment that has no "real" effect on the dependent
variable.
 Unstacked Data: Data values are stored in two columns. Each column holds a
variable from a different group, so this format can store data for only two
variables.
 Stacked Data: Data values are stored in a spreadsheet format. Each row contains
data for a single individual, and the format can store many variables. Most
statistical software uses this format.
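The two layouts can be compared with a short plain-Python sketch that converts
unstacked data to stacked data. The group names and scores are hypothetical
values chosen for illustration, and the column names `group` and `score` are
assumptions of this sketch.

```python
# Unstacked: one column per group (hypothetical scores for two groups).
unstacked = {
    "control":   [12.1, 11.8, 13.0],
    "treatment": [14.2, 13.9, 15.1],
}

# Stacked: one row per individual, with a column that names the group.
stacked = [
    {"group": group, "score": score}
    for group, scores in unstacked.items()
    for score in scores
]

for row in stacked:
    print(row)
```

Because every row describes one individual, further variables can be added to
the stacked layout simply by adding more keys to each row.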