Uploaded by mikabriella

01 Introduction to Statistics

Ways to obtain samples:
→ Survey (sampling at random from the existing population)
→ Create own samples by performing an experiment (experimental subjects: samples of the infinite population of subjects // … you could’ve created w/ infinite resources)
01: AN INTRODUCTION TO STATISTICS
TYPICAL FIGURES IN LECTURES & SCIENTIFIC PAPERS
- Bar chart
- Box & whisker plot
- Scatter plot
- Contingency table

Why do biologists have to bother with
statistics?
- Helps deal with variability by investigating the
distribution of samples
- Calculate reasonable estimates of the
situation in the whole population (e.g.
how tall women are on average) →
a.k.a. descriptive statistics
- Descriptive statistics: summarize
what you know about your samples
- Answer questions by conducting
hypothesis testing (e.g. whether
one group of women were taller than
another)
- To discount the possibility of chance
results → conduct a statistical test
(e.g. two-sample t test)
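
Descriptive statistics for one sample can be sketched as follows. This is a minimal illustration only: the heights are made-up numbers, and `describe` is a hypothetical helper name, not something from the lecture.

```python
import math
import statistics

# Hypothetical heights (cm) of a small sample of women; not real data.
heights_cm = [160.0, 165.0, 170.0, 175.0, 180.0]


def describe(sample):
    """Return the sample mean, sample standard deviation and standard error."""
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)   # sample SD, uses n - 1 in the denominator
    se = sd / math.sqrt(n)          # standard error of the mean
    return mean, sd, se


mean, sd, se = describe(heights_cm)
print(f"mean = {mean:.1f} cm, SD = {sd:.2f} cm, SE = {se:.2f} cm")
```

The mean estimates the population average, and the standard error indicates how uncertain that estimate is for a sample of this size.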

Why is statistical logic so strange?
- There is a need to construct a null
hypothesis
- Test whether the null hypothesis is likely to be true
(statistical tests: 4 main stages)
AWKWARD QUESTIONS

Why do biologists have to repeat
everything?
- Generalizations in biology are inexact because of variability
- Replicated observations of a sample: overcome variability
1. Formulating a null hypothesis
- Opposite of the scientific hypothesis
- No differences/relationships
- Preliminary assumption
2. Calculating a test statistic
- Measures the size of any effect
- Usually a difference between groups / relationship between measurements, relative to variability
- USUALLY: the larger the effect, the larger the test statistic (↑ effect, ↑ test statistic)
3. Calculating the significance probability
- The chance that a certain set of results could be obtained if the null hypothesis were true
- GENERALLY: the larger the test statistic & sample size, the smaller the significance probability (↑ test statistic & sample size, ↓ significance probability)
4. Deciding whether to reject the null hypothesis
- REJECT → significance probability ≤ 1 in 20 (5% or 0.05)
- NO EVIDENCE TO REJECT / SUPPORT → significance probability > 5%
- 5% cut-off: compromise to reduce the chances of errors
- TYPE 1 ERROR: detection of an apparently significant difference / association when in reality there is NONE between the populations
- TYPE 2 ERROR: failure to detect a significant difference / association when in reality it is PRESENT in the populations
→ Lowering the cut-off point ↑ the chance of a type 2 error (and ↓ the chance of a type 1 error)
*** STATISTICAL TESTS DON’T PROVE ANYTHING CONCLUSIVELY (there is still a chance that there may or may not be a significant effect)
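
The 4 stages can be walked through in a short Python sketch. Note the assumptions: it uses an exact permutation test rather than the two-sample t test named in these notes (so it needs no statistical tables), it uses a plain difference in means as the test statistic, and the data and the function name `exact_permutation_test` are made up for illustration.

```python
from itertools import combinations
from statistics import mean


def exact_permutation_test(group_a, group_b):
    """Two-sided exact permutation test on the difference in means.

    Returns (observed_difference, significance_probability).
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    total = sum(pooled)

    # Stage 2: test statistic = difference between the two group means.
    observed = mean(group_a) - mean(group_b)

    # Stage 3: if the null hypothesis is true the group labels are arbitrary,
    # so every way of splitting the pooled data is equally likely.
    extreme = 0
    count = 0
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = sum_a / n_a - (total - sum_a) / n_b
        count += 1
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1
    return observed, extreme / count


# Stage 1: null hypothesis = no difference between the two populations.
group_a = [1.0, 2.0, 3.0, 4.0, 5.0]    # hypothetical measurements
group_b = [6.0, 7.0, 8.0, 9.0, 10.0]   # hypothetical measurements

observed, p = exact_permutation_test(group_a, group_b)

# Stage 4: reject the null hypothesis if p is at or below the 5% cut-off.
reject = p <= 0.05
print(f"difference = {observed}, p = {p:.4f}, reject null: {reject}")
```

Here only 2 of the 252 possible splits give a difference as large as the observed one, so p = 2/252 (about 0.008) and the null hypothesis is rejected. A rejection is still not proof: the result could be a type 1 error.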
Why are there so many different
statistical tests?
- There are many different ways to
quantify things w/ different types of
data
- Data can vary in different ways
- There are very different questions you
might want to ask about the collected
data
TYPES OF DATA

MEASUREMENTS
- Character state which can be meaningfully represented by a number
- The most common way to quantify things is to take measurements
- Interval data:
→ Continuous (e.g. weight)
→ Discrete (e.g. # of hairs)
- Normal distribution: usual symmetrical & bell-shaped distribution pattern of measurements influenced by a large number of factors
- Parametric test: statistical test which assumes that data are normally distributed
- Non-parametric test: statistical test that doesn’t assume that data are normally distributed… BUT uses the ranks of the observations
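
To show what "using the ranks of the observations" means, here is a small sketch that converts measurements to ranks. Giving tied values the mean of their ranks is a common convention assumed here, and `average_ranks` is a hypothetical helper name.

```python
def average_ranks(values):
    """Rank values from smallest (rank 1) upward; ties share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        shared = (start + end) / 2 + 1   # mean of the 1-based positions
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


# Hypothetical measurements with one tie and one extreme value.
measurements = [3.1, 1.2, 3.1, 250.0]
print(average_ranks(measurements))   # [2.5, 1.0, 2.5, 4.0]
```

The extreme value 250.0 simply becomes rank 4, which is why rank-based tests are less sensitive to irregularly distributed data.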

RANKS
- Put measurements into an order w/o
the actual values having meaning
- Ranked / ordinal data
- E.g. 1st, 12th; none, light, medium,
heavy; 1 = poor to 5 = excellent
- MUST be analyzed w/ non-parametric tests

CATEGORICAL DATA
- Some organism features are unquantifiable
- Classify into different categories
- Quantify this sort of data by counting the frequency in each category
- Usually analyzed with χ² (chi-squared) tests or logistic regression
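
Counting frequencies for categorical data can be sketched like this. The eye and hair colour observations are invented for illustration; the result is a small contingency table of counts.

```python
from collections import Counter

# Hypothetical (eye colour, hair colour) observations; not real data.
observations = [
    ("brown", "dark"), ("brown", "dark"), ("blue", "fair"),
    ("blue", "dark"), ("brown", "fair"), ("blue", "fair"),
]

# Count how often each combination of categories occurs.
table = Counter(observations)

eye_colours = sorted({eye for eye, _ in observations})
hair_colours = sorted({hair for _, hair in observations})

# Print one row per eye colour, one count per hair colour.
for eye in eye_colours:
    row = {hair: table[(eye, hair)] for hair in hair_colours}
    print(eye, row)
```

Counts like these, not the category labels themselves, are what a χ² test works on.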
TYPES OF QUESTIONS
- Statistical tests are designed to answer 2 main types of questions:
→ Are there differences between
sets of measurements?
→ Are there relationships between
them?



TESTING FOR DIFFERENCES BETWEEN SETS OF MEASUREMENTS
- To find out if experimentally treated
organisms/cells are different from
controls
- To compare 2 sets of measurements
taken on a single group of organisms
(e.g. medical condition of patients
before & after treatment)
- To see if several types of organisms (e.g. 5 different bacterial strains), or those subjected to different treatments (e.g. wheat in different levels of nitrate & phosphate), are different from each other
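
For the before & after case, a paired comparison works on the difference within each individual. The sketch below computes a paired t statistic by hand. The blood pressure values and the name `paired_t_statistic` are assumptions for illustration, and it assumes the differences are roughly normally distributed.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(before, after):
    """t = mean difference / standard error of the differences."""
    differences = [a - b for a, b in zip(after, before)]
    n = len(differences)
    standard_error = stdev(differences) / math.sqrt(n)
    return mean(differences) / standard_error


# Hypothetical blood pressure readings for 5 patients; not real data.
before = [140.0, 150.0, 160.0, 145.0, 155.0]
after = [130.0, 145.0, 150.0, 140.0, 145.0]

t = paired_t_statistic(before, after)
print(f"t = {t:.3f} with {len(before) - 1} degrees of freedom")
```

With 4 degrees of freedom the two-tailed 5% critical value of t is 2.776, so a t of about -6.53 would count as a significant change for these invented numbers.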
TESTING FOR RELATIONSHIPS BETWEEN MEASUREMENTS
- Take 2 or more measurements on a
single group of organisms/cells &
investigate how the measurements
are related
- E.g. variation of heart rate w/ blood pressure, variation of weight w/ age, how concentrations of different cations in neurons vary w/ one another
- Help study how organisms operate
- Help predict things about them
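
One common way to measure how strongly 2 measurements are related is the Pearson correlation coefficient r. These notes do not name a specific method, so treat this as an illustrative choice; the age and weight values and the name `pearson_r` are invented.

```python
import math
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation: covariation of x and y scaled to lie in [-1, 1]."""
    mean_x, mean_y = mean(xs), mean(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    sum_xx = sum(a * a for a in dx)
    sum_yy = sum(b * b for b in dy)
    return sum_xy / math.sqrt(sum_xx * sum_yy)


# Hypothetical paired measurements on the same 5 organisms; not real data.
age = [1.0, 2.0, 3.0, 4.0, 5.0]
weight = [2.0, 4.0, 5.0, 4.0, 5.0]

r = pearson_r(age, weight)
print(f"r = {r:.3f}")
```

An r near +1 or -1 indicates a strong linear relationship and an r near 0 indicates little or none. Whether r is significant still has to be tested.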
TESTING FOR DIFFERENCES AND RELATIONSHIPS BETWEEN CATEGORICAL DATA
- Determine if there are different
frequencies of organisms in different
categories (e.g. rats turn more
frequently to the right in a maze)
- Determine if categorical traits are
associated (e.g. eye & hair color)
- Study how quantitative measurements might affect categorical traits (e.g. are tall people more likely to have brown eyes?)
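
The rat maze example can be sketched as a χ² test for differences in frequencies. The counts are invented, and the null hypothesis assumed here is that rats turn left and right equally often.

```python
def chi_squared(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts from 60 maze trials: [right turns, left turns].
observed = [40, 20]

# Null hypothesis: no preference, so each direction is expected equally.
expected = [sum(observed) / 2] * 2

chi2 = chi_squared(observed, expected)

# Critical value of chi-squared for 1 degree of freedom at the 5% level.
CRITICAL_5_PERCENT_DF1 = 3.841
reject_null = chi2 > CRITICAL_5_PERCENT_DF1
print(f"chi-squared = {chi2:.3f}, reject null: {reject_null}")
```

For these made-up counts χ² is about 6.67, which is above 3.841, so the null hypothesis of no turning preference would be rejected at the 5% level.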
USING THE DECISION CHART
- Complication: final box w/ 2 alternative tests
→ Parametric test: bold type
→ Non-parametric test: normal type
- ALWAYS advised to use the parametric test if valid
- Parametric tests: more powerful in detecting significant effects
- Non-parametric tests: for ranked data, irregularly distributed data that can’t be transformed to the normal distribution, or measurements w/ only a few discrete values
- INVESTIGATE the DISTRIBUTION of data before deciding which tests to carry out (to see if parametric tests are valid or if you can transform the data so that they become valid)
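
Investigating the distribution and trying a transformation can be sketched as below. Skewness is only one rough check of symmetry, and the log transformation is only one common option. Both are illustrative choices rather than the lecture's prescribed method, and the data and the name `skewness` are made up.

```python
import math
from statistics import mean, pstdev


def skewness(sample):
    """Mean cubed z-score: about 0 if symmetric, > 0 if right-skewed."""
    m = mean(sample)
    s = pstdev(sample)
    return mean([((x - m) / s) ** 3 for x in sample])


# Hypothetical strongly right-skewed measurements; not real data.
raw = [1.0, 10.0, 100.0, 1000.0, 10000.0]

# Log transformation: compresses large values, often reducing right skew.
logged = [math.log10(x) for x in raw]

print(f"skewness before: {skewness(raw):.2f}")
print(f"skewness after:  {skewness(logged):.2f}")
```

If the transformed data look roughly symmetrical and bell-shaped, a parametric test on the transformed values may be valid. If not, the non-parametric alternative is the safer choice.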

CARRYING OUT TESTS
- Test descriptions will do 5 things…
1. Tell the kinds of questions the test
will help you answer & give
examples to show the range of
situations in which it is suitable.
This helps you ensure you are
choosing the right test
2. Tell when it is valid to use the test
3. Describe the rationale & mathematical basis for the test (says how it works)
4. Show how to perform the test using a calculator and/or the computer-based statistical packages SPSS and RStudio
5. Tell how to present the results of the statistical tests

DESIGNING EXPERIMENTS
- Can use information about statistics
to design better experiments

COMPLEX STATISTICAL ANALYSIS
- Describe most of the statistical tests needed to analyze straightforward experiments and surveys (may look at 1-2 factors)
- Best to stay as far as possible within the limits when designing experiments
- In some branches of biology, on some occasions, you simply have to carry out and analyze rather more complex experiments or investigate huge sets of data
PRESENTING & DISCUSSING STATISTICS
- Describes how to present information
about what statistical tests you did and
why, how to present the results of your
tests, and the level at which to discuss
the results
- To produce professional write-ups