Chapter 1: Introduction to Statistics
1.1 An Overview of Statistics
1.2 Data Classification
1.3 Experimental Design
1.1 An Overview of Statistics
What is statistics?
• Science of data
• Data are numbers with context
• It can be broken down into three branches:
  • Data analysis
  • Probability
  • Statistical inference
A Definition of Statistics
Data
• A collection of facts
• Consists of information coming from observations, counts, measurements, or responses
Statistics
• Uses data to gain insight and draw conclusions
• The science of collecting, organizing, analyzing, and interpreting data in order to make decisions
Data sets
Population
• The collection of all outcomes, responses, measurements, or counts that are of interest.
Sample
• A subset of the population.
Example
• Population: all students taking Statistics classes at NSCC
• Sample: all students in Math109 section 05
Data sets
Parameter
• A description of a population characteristic.
Statistic
• A description of a sample characteristic.
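The distinction can be made concrete with a short Python sketch. The exam scores below are made up for illustration, and using the first five values as the sample is an arbitrary choice. The mean over the whole population plays the role of a parameter, while the mean over the sample is a statistic.

```python
from statistics import mean

# Hypothetical population: exam scores for every student of interest.
population_scores = [72, 85, 90, 66, 78, 94, 81, 59, 88, 77]

# Hypothetical sample: a subset of the population (here, the first five scores).
sample_scores = population_scores[:5]

# Parameter: describes a characteristic of the whole population.
population_mean = mean(population_scores)

# Statistic: describes the same characteristic, computed from the sample only.
sample_mean = mean(sample_scores)

print(f"Parameter (population mean): {population_mean}")
print(f"Statistic (sample mean): {sample_mean}")
```

The two values generally differ, which is why a statistic is only an estimate of the corresponding parameter.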
Branches of Statistics
Descriptive statistics:
• It involves the organization, analysis, summarization, and display of data.
Probability theory:
• It is the branch of statistics that deals with chance or random phenomena, i.e., it tries to quantify how likely events are to occur.
Inferential statistics:
• It is the branch of statistics that involves using a sample to draw conclusions about a population. A basic tool in the study of inferential statistics is probability.
1.2 Data classification
How do we classify data?
Types of data
Qualitative data
• Data which cannot be measured on a numerical scale.
• It consists of attributes (like gender or nationality). It can be binary (yes or no) or categorical.
Quantitative data
• Data which can be measured or identified on a numerical scale, i.e., it consists of numerical measurements or counts.
Levels of measurement
Nominal: data at this level are qualitative only.
Ordinal: data at this level are qualitative or quantitative. They can be ranked or ordered, but differences between measurements are not meaningful.
Interval: data at this level can be ordered, and meaningful differences can be calculated. A zero entry measures a position on a scale; it is not an inherent zero **.
Ratio: data at this level are similar to those at the interval level, with the added property that a zero entry is an inherent zero. A ratio of two data values can be formed, so that one data value can be expressed as a multiple of another.
** An inherent zero is a zero that implies ‘none’.
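As an illustration of the difference between the interval and ratio levels, consider temperature. This is a minimal Python sketch using an example that does not appear in the slides. Celsius has no inherent zero (0 °C does not mean "no temperature"), so it is an interval scale, while Kelvin has an inherent zero and is a ratio scale. The two readings are arbitrary.

```python
# Two hypothetical temperature readings in degrees Celsius (interval level).
temp_a_celsius = 20.0
temp_b_celsius = 10.0

# Differences are meaningful at the interval level.
difference = temp_a_celsius - temp_b_celsius

# A ratio of Celsius values can be computed, but it is not meaningful:
# 20 °C is not "twice as hot" as 10 °C, because 0 °C is not an inherent zero.
naive_ratio = temp_a_celsius / temp_b_celsius

# Kelvin has an inherent zero (absolute zero), so ratios are meaningful.
KELVIN_OFFSET = 273.15
temp_a_kelvin = temp_a_celsius + KELVIN_OFFSET
temp_b_kelvin = temp_b_celsius + KELVIN_OFFSET
meaningful_ratio = temp_a_kelvin / temp_b_kelvin

print(f"Difference: {difference} degrees")
print(f"Ratio of Celsius values: {naive_ratio}")
print(f"Ratio of Kelvin values: {meaningful_ratio:.4f}")
```

The Kelvin ratio is only slightly above 1, which shows how misleading the Celsius ratio of 2 would be if read as "twice as hot".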
1.3 Experimental Design
What is an experimental study?
• An experiment deliberately imposes a treatment on a group of objects or subjects in the interest of observing the response.
• It is wise to take time and effort to organize the experiment properly to ensure that the right type of data, and enough of it, is available to answer the questions of interest as clearly and efficiently as possible.
Design of a statistical study
Guidelines to designing a statistical study:
• Identify the variable(s) of interest and the population of the study.
• Design the data collection process. If you use a sample, make sure the sample is representative of the population.
• Collect the data.
• Summarize the data, using descriptive statistics techniques.
• Interpret the data and make decisions about the population using inferential statistics.
• Identify any possible errors.
Data Collection
Methods:
• Observational study: Basically, you observe ‘what is’. An observational study is a study in which a researcher simply observes behavior in a systematic manner without influencing or interfering with the behavior.
• Experiment: Here, a treatment is applied to part of the population and responses are observed. Another part of the population may be used as a control group, in which no treatment is applied. The results of the treatment group and the control group are studied and compared.
• Simulation: It is the use of a mathematical or physical model to reproduce the conditions of a situation or process. Simulations allow you to study situations that are impractical or even dangerous to create in real life.
• Survey: It is an investigation of one or more characteristics of a population.
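To make the idea of a simulation concrete, here is a minimal Python sketch that uses a mathematical model (a pseudo-random number generator standing in for a pair of fair dice) to reproduce a process many times. The dice scenario, the function name, and the number of trials are all illustrative assumptions, not part of the slides.

```python
import random


def estimate_probability_sum_is_seven(num_trials, seed=0):
    """Estimate P(sum of two fair dice == 7) by simulating many rolls."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    hits = 0
    for _ in range(num_trials):
        roll = rng.randint(1, 6) + rng.randint(1, 6)
        if roll == 7:
            hits += 1
    return hits / num_trials


estimate = estimate_probability_sum_is_seven(100_000)
exact = 6 / 36  # six of the 36 equally likely outcomes sum to 7

print(f"Simulated estimate: {estimate:.4f}")
print(f"Exact probability:  {exact:.4f}")
```

With this many trials the simulated estimate should land close to the exact value, though it will rarely match it exactly.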
Experimental Design
An experiment deliberately imposes a treatment on a group of objects or subjects in the interest of observing the response. This differs from an observational study, which involves collecting and analyzing data without changing existing conditions. Because the validity of an experiment is directly affected by its construction and execution, attention to experimental design is extremely important.
Three key principles of experimental design are:
• Control of the effects of lurking variables on the response, most simply by comparing several treatments.
• Randomization, the use of chance to assign experimental units to treatments.
• Replication of the experiment on many units to reduce chance variation in the results.
Experimental Design
Control: An experiment involves a dependent variable and independent variables. One usually conducts the experiment to see the impact of the latter on the former. It is very likely that a variety of factors other than the independent variable of interest affect the results of the experiment. Hence, in order to maintain the integrity of the experiment, it is important to control these influential factors. Some factors are:
• Confounding variable: an extraneous variable in an experiment that correlates with both the dependent and independent variables.
• Placebo effect: it occurs when a subject shows a favorable reaction to a placebo, i.e., when he or she is not administered the actual treatment but a placebo in its place. To control or minimize this effect, the blinding technique is used.
  • Single blind: the subject does not know whether he or she is receiving the treatment or a placebo.
  • Double blind: both the researcher and the subject are unaware whether the subject is receiving the treatment or a placebo.
Experimental Design
Randomization:
• It is the process of randomly assigning experimental units to different treatment groups.
• In a completely randomized design, experimental units or subjects are assigned to different treatment groups through random selection.
• In some cases the experimenter is aware of differences among groups of the experimental units or subjects. In such cases it is necessary to first divide the subjects/units into blocks, which are groups with similar characteristics, and then randomly assign the members of each block to treatment groups. This setup is known as a randomized block design.
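The two designs above can be sketched in Python as follows. This is an illustrative sketch only: the subject labels, the two age-based blocks, the treatment names, and both function names are hypothetical, and dealing shuffled subjects out in turn is just one simple way to randomize.

```python
import random


def completely_randomized(subjects, treatments, rng):
    """Shuffle all subjects, then deal them out to the treatments in turn."""
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    groups = {t: [] for t in treatments}
    for i, subject in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(subject)
    return groups


def randomized_block(blocks, treatments, rng):
    """Randomize separately within each block of similar subjects."""
    groups = {t: [] for t in treatments}
    for block_subjects in blocks.values():
        block_groups = completely_randomized(block_subjects, treatments, rng)
        for t in treatments:
            groups[t].extend(block_groups[t])
    return groups


rng = random.Random(42)  # fixed seed so the assignment is reproducible
treatments = ["treatment", "control"]
subjects = [f"S{i}" for i in range(1, 13)]  # 12 hypothetical subjects

# Completely randomized design: ignore subject characteristics.
crd_groups = completely_randomized(subjects, treatments, rng)

# Randomized block design: hypothetical blocks by age group.
blocks = {"under_40": subjects[:6], "40_and_over": subjects[6:]}
rbd_groups = randomized_block(blocks, treatments, rng)

print(crd_groups)
print(rbd_groups)
```

In the block design each treatment group receives the same number of subjects from every block, so the blocking characteristic cannot be unevenly spread across treatments by chance.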
Replication:
To improve the results of an experiment, replication, the repetition of an experiment on a large group of subjects, is required. Replication reduces variability in experimental results, increasing their significance and the confidence level with which a researcher can draw conclusions about an experimental factor.
Sampling techniques
What is a census?
A census is a count or measure of an entire population. Although it provides complete information, it is costly, cumbersome, and time-consuming.
What is sampling?
Sampling is the process of selecting units (e.g., people,
organizations) from a population of interest so that by studying the
sample we may fairly generalize our results back to the population
from which they were chosen. To collect unbiased data, a researcher
must ensure that the sample is representative of the population.
What is a sampling error?
A sampling error is the difference between the results of a sample
and those of the population.
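A short Python sketch can show how sampling error varies from one sample to the next. Everything here is assumed for illustration: the population is 1,000 made-up measurements, the characteristic compared is the mean, and each sample has 30 members.

```python
import random
from statistics import mean

rng = random.Random(1)  # fixed seed so the run is reproducible

# Hypothetical population: 1,000 made-up measurements.
population = [rng.gauss(50, 10) for _ in range(1000)]
population_mean = mean(population)

# Draw several samples and record sample mean minus population mean for each.
sampling_errors = []
for _ in range(5):
    sample = rng.sample(population, 30)
    sampling_errors.append(mean(sample) - population_mean)

for i, error in enumerate(sampling_errors, start=1):
    print(f"Sample {i}: sampling error = {error:+.2f}")
```

Each sample gives a different error even though every sample was drawn from the same population in the same way.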
Sampling techniques
A random sample is one in which every member of the population has an equal chance of being selected.
A simple random sample is a sample in which every possible sample of the same size has the same chance of being selected. When you choose members of a sample, you should decide whether it is acceptable to have the same population member selected more than once:
• If it is acceptable, then the sampling process is known as sampling with replacement.
• If it is not acceptable, then the sampling process is said to be sampling without replacement.
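In Python, the two processes correspond to two functions in the standard `random` module: `sample` draws without replacement and `choices` draws with replacement. The population of 20 numbered members and the sample size of 5 below are assumptions made for illustration.

```python
import random

rng = random.Random(7)  # fixed seed so the run is reproducible

# Hypothetical population: members numbered 1 through 20.
population = list(range(1, 21))

# Without replacement: no member can be selected more than once.
without_replacement = rng.sample(population, k=5)

# With replacement: every draw is from the full population, so repeats are possible.
with_replacement = rng.choices(population, k=5)

print("Without replacement:", without_replacement)
print("With replacement:   ", with_replacement)
```

A sample drawn without replacement never contains duplicates, whereas one drawn with replacement may or may not, depending on chance.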
Sampling techniques
A stratified random sample is formed when the researcher first divides the population into groups that share similar characteristics, called strata, and then selects a simple random sample from each stratum.
A cluster sample is formed by dividing the population into naturally occurring subgroups, called clusters, and selecting all the members in one or more clusters.
A systematic sample is one in which members of the population are ordered in some way, a starting number is randomly selected, and then sample members are selected at regular intervals from the starting number.
A convenience sample consists of only available members of the population. This type of sample often leads to biased studies.
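The first three techniques can be sketched in Python as below. This is a rough illustration under stated assumptions: the student IDs, strata, class sections, and function names are all hypothetical, and the systematic sampler picks its random start within the first interval.

```python
import random


def stratified_sample(strata, per_stratum, rng):
    """Take a simple random sample of size per_stratum from every stratum."""
    chosen = []
    for members in strata.values():
        chosen.extend(rng.sample(members, per_stratum))
    return chosen


def cluster_sample(clusters, num_clusters, rng):
    """Pick whole clusters at random and keep every member of each one."""
    picked = rng.sample(list(clusters), num_clusters)
    return [member for name in picked for member in clusters[name]]


def systematic_sample(ordered_population, interval, rng):
    """Pick a random start in the first interval, then every interval-th member."""
    start = rng.randrange(interval)
    return ordered_population[start::interval]


rng = random.Random(3)  # fixed seed so the run is reproducible

# Hypothetical strata: students grouped by year.
strata = {
    "first_year": [f"F{i}" for i in range(10)],
    "second_year": [f"T{i}" for i in range(10)],
}
# Hypothetical clusters: students grouped by class section.
clusters = {
    "section_01": ["A1", "A2", "A3"],
    "section_02": ["B1", "B2", "B3", "B4"],
    "section_03": ["C1", "C2"],
}
# Hypothetical ordered population: ID numbers 1 through 100.
ordered_ids = list(range(1, 101))

stratified = stratified_sample(strata, per_stratum=2, rng=rng)
clustered = cluster_sample(clusters, num_clusters=1, rng=rng)
systematic = systematic_sample(ordered_ids, interval=10, rng=rng)

print("Stratified:", stratified)
print("Cluster:   ", clustered)
print("Systematic:", systematic)
```

Note the contrast between the first two: stratified sampling takes some members from every group, while cluster sampling takes every member from some groups.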
Homework
Section 1.1
• 1-4, 11, 13, 14, 17, 19, 21, 27 (assume U.S.), 29, 30, 32, 33, 36, 40, 41
Section 1.2
• 7-10, 15, 16
Section 1.3
• 1, 2, 4-10, 15-21, 29, 30, 31, 33, 43 (random, stratified and clustering only)
Read Chapter 2