Uploaded by Bossun Senpai

lecture-1

An introduction
to statistics
Book Contents
Presenting and writing about statistics
Course Outline
Chapter
Topic
Minimum No. weeks allotted
1
1
1
2
end Jan
end Feb
PRELIMS EXAM
2 end March
2 end April
2
1
FINALS EXAM
To become a biologist,
you need to “know” statistics
UNDERSTAND & ANALYZE
what others describe
your own results
How can you make/interpret these visualizations?
Gordon et al. 2010. PLoS ONE. DOI: 10.1371/journal.pone.0010905
Ranasinghe et al. 2013. BMC Public Health 13(1):797
Nanchahal et al. 2018. EBioMedicine 33: DOI: 10.1016/j.ebiom.2018.06.022
Badrawy et al. 2016. Int J Stem Cells. 9(1): 145-151.
ASSIGNMENT:
Line Graphs
Bar Graphs
Scatterplot
Box and whiskers plot
Pie Graph
Histogram
Area Graph
Biologists do not generalize
from a single observation
Variability is inherent among organisms
Bias and error are possible during observations
Replicated observations can..
DETECT variability inherent among organisms
DECREASE bias and error during observations
Statistics helps by..
DETECT variability by INVESTIGATING distribution
DECREASE bias and error by CALCULATING reasonable estimates
OUR MAIN OBJECTIVE
Answer questions by hypothesis testing
based on type of observation
Data
set of values with respect to
qualitative or quantitative variables
3 types of data:
Measurements
Ranks
Frequencies
Measurements
also known as interval data
ex. height, mass, pH
2 kinds:
CONTINUOUS
can meaningfully take values with decimals
e.g. length in cm: [6, 12.4, 8.32]
DISCRETE
can only be integers (individual counts)
e.g. no. of pills/patient: patient A = 4 pills, patient B = 3 pills
Ranks
also known as ordinal data
Example: seriousness of infection (none, light, medium, heavy)
Results of questionnaire (1 = poor; 5 = excellent)
take note:
ranks remove the inherent gaps in variability (only the order of the values is kept, not the distances between them)
must be analyzed using non-parametric tests
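The note that ranks remove gaps can be illustrated with a minimal sketch. The helper `to_ranks` and the sample lengths are hypothetical, chosen only for illustration; ties are assumed to share the average rank, which is one common convention.

```python
def to_ranks(values):
    """Return the rank (1 = smallest) of each value; ties share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


# Hypothetical lengths in cm: the gap 6.0 -> 6.1 is tiny, 6.1 -> 12.4 is large.
lengths_cm = [6.0, 12.4, 6.1]
print(to_ranks(lengths_cm))  # [1.0, 3.0, 2.0]
```

After ranking, both gaps become exactly one rank apart, so the information about how far apart the measurements were is no longer available.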
Frequencies
also known as categorical data
different from measurement:
DISCRETE measurements are individual counts for each organism
frequencies are total counts per category
some features of organisms seem impossible to quantify
the only way is to get total counts per category
Ex. cancer or non-cancer, mutant or non-mutant, different species of turtles
usually analyzed using the chi-square (χ2) test or logistic regression
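As a rough sketch of what "total counts per category" looks like in practice, the snippet below tallies hypothetical mutant / non-mutant observations and computes a chi-square statistic by hand. The data and the equal-frequency expectation are assumptions made for illustration, not values from the lecture.

```python
from collections import Counter

# Hypothetical observations: one category label per organism.
observations = ["mutant", "non-mutant", "non-mutant",
                "mutant", "non-mutant", "non-mutant"]

# Frequencies are the total counts per category.
frequencies = Counter(observations)


def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Assumed expectation for this sketch: both categories are equally common.
observed = [frequencies["mutant"], frequencies["non-mutant"]]
expected = [len(observations) / 2] * 2
chi2 = chi_square_statistic(observed, expected)
print(dict(frequencies), round(chi2, 3))
```

The statistic alone does not say whether the departure from the expectation is significant; that still requires a critical value or p-value.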
QUIZ
Give 2 examples each of the types of data (be more specific):
1. Measurements, ex. height, mass, pH
2. Ranks, ex. seriousness of infection (none, light, medium, heavy); results of questionnaire (1 = poor; 5 = excellent)
3. Categorical data, ex. cancer or non-cancer, mutant or non-mutant, different species of turtles
There are 2 main “statistical” questions
DIFFERENCE?
RELATIONSHIP?
DIFFERENCE?
On average, is “x” more than “y”? (or less, bigger, smaller)
Examples of DIFFERENCE questions:
The Effect of Ethanolic Extract of Urtica dioica Leaves on High Levels of Blood Glucose and Gene Expression of Glucose Transporter 2 (Glut2) in Liver of Alloxan-Induced Diabetic Mice
[Figure: second example, labeled endothelial nitric oxide synthase (eNOS)]
RELATIONSHIP?
As “x” increases, does “y” increase? decrease? show no change?
BY WHAT AMOUNT?
Example of a RELATIONSHIP question:
Relationship between body mass index (BMI) and risk for diabetes in US Health Professionals, derived from data extracted from Chan et al. [2].
BMI is universally expressed in units of kg/m2.
Underweight: under 18.5 kg/m2; normal weight: 18.5 to 25; overweight: 25 to 30; obese: over 30.
DIFFERENCE? asks how much one data set is (more, less, bigger, smaller) in comparison to other data set/s
RELATIONSHIP? asks how much one data set varies with another data set
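The two kinds of question can be sketched numerically. In the snippet below, the function names, the data, and the use of a least-squares slope to answer "by what amount" are all assumptions for illustration; the lecture does not prescribe a particular calculation at this point.

```python
from statistics import mean


def mean_difference(x, y):
    """DIFFERENCE: how much bigger (positive) or smaller (negative) x is than y, on average."""
    return mean(x) - mean(y)


def slope(x, y):
    """RELATIONSHIP: least-squares change in y for a one-unit increase in x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Hypothetical data sets.
treated = [5.0, 6.0, 7.0]
control = [4.0, 4.0, 4.0]
print(mean_difference(treated, control))  # 2.0

dose = [1.0, 2.0, 3.0]
response = [2.0, 4.0, 6.0]
print(slope(dose, response))  # 2.0
```

A difference question compares two separate data sets, while a relationship question pairs each "x" value with a "y" value from the same individual.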
1. State the QUESTION
2. Formulate the NULL HYPOTHESIS
3. Create a VISUALIZATION
4. Calculate the TEST STATISTIC
5. Determine SIGNIFICANCE
6. Draw an INFERENCE
1. State the QUESTION
2 kinds:
BIOLOGICAL: overall theme investigated
STATISTICAL: specific; states the data set/s
example:
BIOLOGICAL (theme)
Is hypertension linked to obesity?
STATISTICAL (data sets)
Does increased systolic blood pressure positively vary with high body mass index?
reminder:
There can be one (1) BIOLOGICAL question investigated by multiple STATISTICAL questions
2. Formulate the NULL HYPOTHESIS
Also 2 kinds, but in statement format:
BIOLOGICAL: overall theme investigated
STATISTICAL: specific; states the data set/s
example:
BIOLOGICAL
Hypertension is not linked to obesity.
STATISTICAL
Increased systolic blood pressure does not vary with high body mass index.
The preliminary assumption must be NO difference / relationship.
3. Create a VISUALIZATION
Table OR Graph?
Histogram
Bar Chart
Box-Whiskers Plot
Scatter Plot
etc.
LECTURE: how to choose the appropriate one!
LABORATORY: create them from data sets!
example:
Pormousa et al. 2015. Dependence Modelling 3(1): DOI: 10.1515/demo-2015-0016
For relationship tests using measurements, the most common and basic visualization is the “scatterplot”.
4. Calculate the TEST STATISTIC
Measures effect size of difference/relationship relative to amount of variability
Example: Pearson correlation coefficient, a “parametric” test of relationships
A measure of the linear correlation between two variables X and Y.
In this example:
Pearson r : 0.76 (test statistic)
A value between +1 and −1, where 1 is total positive linear correlation, 0 is no linear correlation, and −1 is total negative linear correlation.
r2 : 0.58 (effect size)
0 lowest : 1 highest value
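A minimal sketch of how a Pearson r and its r2 could be computed from paired measurements is given below. The BMI and systolic pressure values are hypothetical and are not the data behind the r = 0.76 quoted in the lecture.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient between paired samples x and y."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # co-variation of x and y
    sxx = sum((a - mx) ** 2 for a in x)                   # variation in x
    syy = sum((b - my) ** 2 for b in y)                   # variation in y
    return sxy / sqrt(sxx * syy)


# Hypothetical paired measurements (one pair per person).
bmi = [19.0, 22.5, 24.0, 27.5, 31.0, 33.5]
systolic = [112.0, 118.0, 125.0, 121.0, 135.0, 140.0]

r = pearson_r(bmi, systolic)
print(round(r, 2), round(r ** 2, 2))  # test statistic, effect size
```

Squaring r always gives a value between 0 and 1, which is why r2 is read on a 0-to-1 scale while r itself runs from −1 to +1.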
5. Determine SIGNIFICANCE
Probability of getting the effect just by chance if the null hypothesis were true
WE CAN USE:
critical value
or
p-value
[Figure: example labeled with test statistic, sample size, sig. probability]
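One way to make "the probability of getting the effect just by chance if the null hypothesis were true" concrete is a permutation test: shuffle one variable many times so that any relationship is broken, and count how often the shuffled data give a correlation at least as strong as the observed one. This is a swapped-in illustration, not the critical-value or table-based method the lecture refers to; the data, the number of shuffles, and the seed are assumptions.

```python
import random
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient between paired samples x and y."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def permutation_p_value(x, y, n_shuffles=2000, seed=0):
    """Estimate a two-sided p-value for r by shuffling y (assumed helper)."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    at_least_as_extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # breaks any real pairing between x and y
        if abs(pearson_r(x, shuffled)) >= observed:
            at_least_as_extreme += 1
    # Add 1 to both counts so the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (n_shuffles + 1)


# Hypothetical, strongly related measurements.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1, 18.0, 20.2]
p = permutation_p_value(x, y)
print(p < 0.05)
```

A small p-value here means that shuffled (null) data almost never reproduce a correlation as strong as the observed one.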
IF, for example, we obtained
p < 0.05
test statistic > critical value
What to decide:
Therefore, we accept/reject the null hypothesis (H0).
That is what you DECIDE, not what you WRITE IN TEXT.
What to write:
If you decide to REJECT H0, STATE your ALTERNATIVE hypothesis (HA).
Increased systolic blood pressure positively varies with high body mass index.
6. Draw an INFERENCE
TOGETHER, we can write in the RESULTS:
Increased systolic blood pressure positively varies with high body mass index (Pearson r=0.76, p<0.05).
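The decision and the written result can be kept separate in code as well. The sketch below assumes a significance level of 0.05, as in the example, and uses hypothetical helper names; the p-value of 0.01 passed in is made up, since the lecture only reports p<0.05.

```python
ALPHA = 0.05  # significance level used in the lecture's example


def rejects_null(p_value, alpha=ALPHA):
    """The DECISION: True if the null hypothesis (H0) is rejected."""
    return p_value < alpha


def results_sentence(alternative, test_name, statistic, p_value, alpha=ALPHA):
    """What is WRITTEN: the alternative hypothesis with its supporting statistics.

    Returns None when H0 is not rejected, because wording for that case
    is not covered in this lecture.
    """
    if not rejects_null(p_value, alpha):
        return None
    return f"{alternative} ({test_name}={statistic:.2f}, p<{alpha})."


sentence = results_sentence(
    "Increased systolic blood pressure positively varies with high body mass index",
    "Pearson r",
    0.76,
    0.01,  # hypothetical p-value
)
print(sentence)
```

The sentence states the alternative hypothesis directly and puts the test statistic and significance in parentheses, rather than saying that H0 was rejected.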
REMEMBER:
Performing statistical tests does not actually allow you to prove anything conclusively
• there are always inherent biases / limitations; do your best to identify them
• understand results as “concepts in context”
BE CAREFUL!
Though hypothesis tests are meant to be reliable, two types of errors can occur:
Type 1 error: false positive
Type 2 error: false negative
https://www.abtasty.com/blog/type-1-and-type-2-errors/
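The two error types depend on two things: whether the null hypothesis is actually true, and whether the test rejected it. A small sketch with a hypothetical helper makes the four possible outcomes explicit.

```python
def classify_outcome(null_is_true, null_rejected):
    """Name the outcome of a hypothesis test given the (usually unknown) truth."""
    if null_is_true and null_rejected:
        return "false positive (Type 1 error)"
    if not null_is_true and not null_rejected:
        return "false negative (Type 2 error)"
    return "correct decision"


for truth in (True, False):
    for rejected in (True, False):
        print(truth, rejected, classify_outcome(truth, rejected))
```

In real studies the truth of the null hypothesis is unknown, so this table describes what can go wrong rather than something that can be checked after a single test.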