
How Statistics Can Empower Your Research?
Part II
Xiayu (Stacy) Huang
Bioinformatics Shared Resource
Sanford | Burnham Medical Research Institute
OUTLINE
• Summary of previous talk
  • Descriptive & inferential statistics
  • Student's t test, one-way ANOVA
• More common statistical tests and applications
  • Repeated measures one-way ANOVA
  • Two-way ANOVA
• Power analysis
• Common data transformation methods
SUMMARY OF PREVIOUS TALK
• Descriptive statistics
  • Measures of central tendency, dispersion, etc.
• Inferential statistics
  • Hypothesis, errors, p-value, power
• Three statistical tests and their applications
  • Two-sample unpaired t test, paired t test, and one-way ANOVA
PowerPoint presentation available at http://bsrweb.burnham.org
ONE-WAY ANOVA EXAMPLE
• Goal: study the effect of mouse genotype on learning skill on the rotarod.
• Dependent variable: number of seconds staying on the rotarod

Group1  Group2  Group3  Group4
   170     116      30     114
   214     102      60      24
   122     120     136      72
    44      82     126      42
    80      90      56      20
   130      54       6      32
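The one-way ANOVA on this example can be sketched in plain Python. This is a minimal illustration, not the GraphPad Prism procedure used in the talk. It assumes the 24 values are read row by row, so that each row holds one observation per group. The function name `one_way_anova` is hypothetical. The sketch stops at the F ratio, because the p-value requires the F distribution from a statistics package.

```python
# Rotarod times in seconds. Assumption: the values are read row by row,
# so each group receives every fourth value.
groups = {
    "Group1": [170, 214, 122, 44, 80, 130],
    "Group2": [116, 102, 120, 82, 90, 54],
    "Group3": [30, 60, 136, 126, 56, 6],
    "Group4": [114, 24, 72, 42, 20, 32],
}


def one_way_anova(samples):
    """Return (F, df_between, df_within, ss_between, ss_within)."""
    all_values = [x for s in samples for x in s]
    grand_mean = sum(all_values) / len(all_values)
    # Between-group variation: group means around the grand mean.
    ss_between = sum(len(s) * (sum(s) / len(s) - grand_mean) ** 2 for s in samples)
    # Within-group variation: observations around their own group mean.
    ss_within = sum((x - sum(s) / len(s)) ** 2 for s in samples for x in s)
    df_between = len(samples) - 1
    df_within = len(all_values) - len(samples)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within, ss_between, ss_within


f_stat, df_b, df_w, ss_b, ss_w = one_way_anova(list(groups.values()))
print(f"F({df_b}, {df_w}) = {f_stat:.3f}")
```

The F ratio is then compared with the critical F value for the same degrees of freedom, which is the step Prism performs when it reports the p-value.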
DECISION TREE
DECISION TREE: ONE-WAY ANOVA
ASSUMPTION CHECK IN GRAPHPAD PRISM
DATA ANALYSIS IN GRAPHPAD PRISM
Variance check
REPEATED MEASURES ONE-WAY ANOVA
• Compares the means of 3 or more groups
• Repeated measurements on the same group of subjects
• Assumptions:
  • Sampling should be independent and randomized.
  • Equal sample size per group preferred.
  • Sphericity or homogeneity of covariance
  • Data are normally distributed.
APPLICATION OF REPEATED MEASURES ONE-WAY ANOVA IN BIOLOGY
[Figure; axis label: Days]
REPEATED MEASURES ONE-WAY ANOVA EXAMPLE
• Goal: study the effect of practice on maze learning in rats.
• Independent variable: days
• Dependent variable: number of errors made each day

Rat ID  Day 1  Day 2  Day 3  Day 4
Rat_1       3      1      0      0
Rat_2       3      2      2      1
Rat_3       6      3      1      2
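As a rough sketch of what the repeated measures analysis does with this table, the code below partitions the total variation into days, subjects, and error, then compares the result with a regular one-way ANOVA on the same numbers. It is a hand calculation in plain Python, not the Prism workflow, and all variable names are illustrative.

```python
# Errors per day for each rat (rows: rats, columns: Day 1 to Day 4).
errors = [
    [3, 1, 0, 0],  # Rat_1
    [3, 2, 2, 1],  # Rat_2
    [6, 3, 1, 2],  # Rat_3
]

n_subjects = len(errors)
n_days = len(errors[0])
values = [x for row in errors for x in row]
grand_mean = sum(values) / len(values)

day_means = [sum(row[j] for row in errors) / n_subjects for j in range(n_days)]
rat_means = [sum(row) / n_days for row in errors]

ss_total = sum((x - grand_mean) ** 2 for x in values)
ss_days = n_subjects * sum((m - grand_mean) ** 2 for m in day_means)
ss_subjects = n_days * sum((m - grand_mean) ** 2 for m in rat_means)
# Repeated measures: subject-to-subject variation is removed from the error.
ss_error = ss_total - ss_days - ss_subjects

df_days = n_days - 1
df_error = (n_days - 1) * (n_subjects - 1)
f_repeated = (ss_days / df_days) / (ss_error / df_error)

# Regular one-way ANOVA leaves the subject variation in the error term.
ss_within = ss_total - ss_days
df_within = len(values) - n_days
f_regular = (ss_days / df_days) / (ss_within / df_within)

print(f"Repeated measures: F({df_days}, {df_error}) = {f_repeated:.2f}")
print(f"Regular one-way:   F({df_days}, {df_within}) = {f_regular:.2f}")
```

For these data the repeated measures analysis gives F(3, 6) = 9.00, while the regular one-way ANOVA gives F(3, 8) = 4.00. The difference arises because the consistent differences between rats are no longer counted as error.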
DECISION TREE: ONE-WAY REPEATED MEASURES ANOVA
TABLE FORMAT IN GRAPHPAD PRISM: REPEATED MEASURES ONE-WAY ANOVA
DATA FORMAT AND CHOOSING ANALYSIS METHODS
DATA ANALYSIS IN GRAPHPAD PRISM
ANALYSIS RESULT
ONE-WAY REPEATED MEASURES ANOVA COMPARED WITH REGULAR ONE-WAY ANOVA
TWO-WAY ANOVA
• One dependent variable and two independent variables or factors
• Assumptions
  • Samples are normally or approximately normally distributed
  • The samples from each treatment group must be independent
  • The variances of the populations must be equal
  • Equal sample size per treatment group preferred
• Treatment group
  • All possible combinations of the two factors

Treatment  Gender
Placebo    Female
Placebo    Male
Drug       Female
Drug       Male
TWO-WAY ANOVA
• Main effect
  • Effect of an individual factor
• Interaction effect
  • The effect of one factor depends on the level of the other
• Hypotheses
  • The population means of the first factor A are equal
  • The population means of the second factor B are equal
  • There is no interaction between the two factors
• Test
  • F test: the mean square for each main effect and for the interaction effect, divided by the within-group variance
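Written out, the three F tests take the following form. The notation is an assumption introduced here for illustration: factor A has a levels, factor B has b levels, and each treatment group has n replicates.

```latex
F_A = \frac{MS_A}{MS_{\text{within}}}, \qquad
F_B = \frac{MS_B}{MS_{\text{within}}}, \qquad
F_{A \times B} = \frac{MS_{A \times B}}{MS_{\text{within}}}

df_A = a - 1, \quad
df_B = b - 1, \quad
df_{A \times B} = (a - 1)(b - 1), \quad
df_{\text{within}} = ab(n - 1)
```

Each F ratio tests one of the three hypotheses above against the same within-group mean square.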
MAIN EFFECTS
[Figure: four panels showing pain score at the 1st and 2nd hour for aspirin and ibuprofen. Factor A: time. Factor B: treatment.]
I. No main effect of time or treatment
II. Main effect of treatment only
III. Main effect of time only
IV. Main effects of time and treatment
MAIN EFFECT AND INTERACTION EFFECT
[Figure: four panels showing pain score at the 1st and 2nd hour for aspirin and ibuprofen.]
V. Interaction effect only
VI. Main effect of time only, and interaction effect
VII. Main effect of treatment only, and interaction effect
VIII. Main effects of time and treatment, and interaction effect
TWO-WAY ANOVA EXPERIMENTAL DESIGN
Number of replicates in each treatment group:

I. Balanced design with equal replication (Best)
         Control  Treated
Time 0         4        4
Time 2         4        4
Time 4         4        4
Time 8         4        4

II. Design with proportional replication (Acceptable)
         Control  Treated
Time 0         3        4
Time 2         6        8
Time 4         3        4
Time 8         9       12

III. One replication only (Not recommended)
         Control  Treated
Time 0         1        1
Time 2         1        1
Time 4         1        1
Time 8         1        1

IV. Disproportional design (Bad)
         Control  Treated
Time 0         4        3
Time 2         2        2
Time 4         2        2
Time 8         3        4
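The four designs can be told apart from the replicate counts alone. The sketch below uses the usual criterion for proportional replication: each cell count equals its row total times its column total, divided by the grand total. The function name `classify_design` is hypothetical. The assignment of counts to designs I to IV is an assumption, read from the four tables on this slide.

```python
def classify_design(counts):
    """Classify a two-way layout from its table of replicates per cell."""
    cells = [n for row in counts for n in row]
    if all(n == 1 for n in cells):
        return "one replication only"
    if len(set(cells)) == 1:
        return "balanced"
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(col) for col in zip(*counts)]
    total = sum(row_totals)
    # Proportional replication: n_ij * N == (row total i) * (column total j).
    proportional = all(
        counts[i][j] * total == row_totals[i] * col_totals[j]
        for i in range(len(counts))
        for j in range(len(counts[0]))
    )
    return "proportional" if proportional else "disproportional"


# Rows are Time 0, 2, 4, 8; columns are Control, Treated (assumed layout).
designs = {
    "I": [[4, 4], [4, 4], [4, 4], [4, 4]],
    "II": [[3, 4], [6, 8], [3, 4], [9, 12]],
    "III": [[1, 1], [1, 1], [1, 1], [1, 1]],
    "IV": [[4, 3], [2, 2], [2, 2], [3, 4]],
}
for name, counts in designs.items():
    print(name, classify_design(counts))
```

Integer arithmetic is used for the proportionality check so that no rounding tolerance is needed.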
APPLICATION OF TWO-WAY ANOVA IN BIOLOGY
[Figure: Microarray: time-dose relationship, with doses of 0 mM, 50 mM, and 75 mM]
TWO-WAY ANOVA WITH REPLICATION EXAMPLE
Study the effect of gender and anti-cancer drugs on tumor growth

Tumor size by drug and gender:

Drug    cisplatin      vinblastine    5-fluorouracil
Gender  Female  Male   Female  Male   Female  Male
            65    50       70    45       55    35
            70    55       65    60       65    40
            60    80       60    85       70    35
            60    65       70    65       55    55
            60    70       65    70       55    35
            55    75       60    70       60    40
            60    75       60    80       50    45
            50    65       50    60       50    40
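A balanced two-way ANOVA with replication on these data can be sketched as follows. This is an illustrative hand calculation, not the Prism analysis. It assumes the 48 values are read row by row, with columns ordered cisplatin Female, cisplatin Male, vinblastine Female, vinblastine Male, 5-fluorouracil Female, 5-fluorouracil Male. The function `two_way_anova` is a hypothetical helper, and p-values are left to a statistics package.

```python
# Tumor sizes keyed by (drug, gender); eight replicates per cell.
# Assumption: column order follows the header row of the slide table.
tumor = {
    ("cisplatin", "Female"): [65, 70, 60, 60, 60, 55, 60, 50],
    ("cisplatin", "Male"): [50, 55, 80, 65, 70, 75, 75, 65],
    ("vinblastine", "Female"): [70, 65, 60, 70, 65, 60, 60, 50],
    ("vinblastine", "Male"): [45, 60, 85, 65, 70, 70, 80, 60],
    ("5-fluorouracil", "Female"): [55, 65, 70, 55, 55, 60, 50, 50],
    ("5-fluorouracil", "Male"): [35, 40, 35, 55, 35, 40, 45, 40],
}


def mean(xs):
    return sum(xs) / len(xs)


def two_way_anova(cells):
    """Balanced two-way ANOVA; returns (sums of squares, dfs, F ratios)."""
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    n = len(next(iter(cells.values())))  # replicates per cell, assumed equal
    grand = mean([x for xs in cells.values() for x in xs])

    def level_mean(index, level):
        # Mean of all observations at one level of factor A (0) or B (1).
        return mean([x for key, xs in cells.items() if key[index] == level for x in xs])

    ss = {
        "A": n * len(b_levels) * sum((level_mean(0, a) - grand) ** 2 for a in a_levels),
        "B": n * len(a_levels) * sum((level_mean(1, b) - grand) ** 2 for b in b_levels),
        "within": sum((x - mean(xs)) ** 2 for xs in cells.values() for x in xs),
    }
    ss_cells = n * sum((mean(xs) - grand) ** 2 for xs in cells.values())
    ss["AxB"] = ss_cells - ss["A"] - ss["B"]
    df = {
        "A": len(a_levels) - 1,
        "B": len(b_levels) - 1,
        "AxB": (len(a_levels) - 1) * (len(b_levels) - 1),
        "within": len(a_levels) * len(b_levels) * (n - 1),
    }
    ms_within = ss["within"] / df["within"]
    f = {k: (ss[k] / df[k]) / ms_within for k in ("A", "B", "AxB")}
    return ss, df, f


ss, df, f = two_way_anova(tumor)
for effect, label in (("A", "drug"), ("B", "gender"), ("AxB", "drug x gender")):
    print(f"{label}: F({df[effect]}, {df['within']}) = {f[effect]:.2f}")
```

The interaction sum of squares is obtained by subtraction, which is valid only for a balanced design such as this one.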
DECISION TREE: FACTORIAL ANOVA
TABLE FORMAT IN PRISM: TWO-WAY ANOVA
DATA FORMAT AND CHOOSING ANALYSIS METHODS
CHOOSING MODEL
ANALYSIS RESULT
TWO-WAY REPEATED MEASURES ANOVA EXAMPLE
Goal: investigate the effect of gender and caffeine consumption on memory
Independent variables: gender and caffeine consumption
Dependent variable: memory score

Subject  Sex     Lowcaff  Medcaff  Highcaff
1        Male         10       15        17
2        Male          9       12        11
3        Male         11       14        15
4        Male         13       11        12
5        Male         11       10        16
6        Male         12        6        12
7        Female       10       14        14
8        Female       12       21        22
9        Female       21       18        23
10       Female        9       18        22
11       Female       12       16        20
12       Female       15       17        26
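In this example sex varies between subjects and caffeine level is repeated within each subject. A minimal sketch of the corresponding sums-of-squares partition is shown below. It is a textbook hand calculation for a balanced design with one between-subject and one within-subject factor, not the Prism output, and all variable names are illustrative.

```python
# Memory scores per subject at low, medium, and high caffeine.
scores = {
    "Male": [[10, 15, 17], [9, 12, 11], [11, 14, 15],
             [13, 11, 12], [11, 10, 16], [12, 6, 12]],
    "Female": [[10, 14, 14], [12, 21, 22], [21, 18, 23],
               [9, 18, 22], [12, 16, 20], [15, 17, 26]],
}


def mean(xs):
    return sum(xs) / len(xs)


sexes = list(scores)                 # levels of the between-subject factor
n = len(scores[sexes[0]])            # subjects per sex, assumed equal
k = len(scores[sexes[0]][0])         # caffeine levels measured per subject
subjects = [row for s in sexes for row in scores[s]]
grand = mean([x for row in subjects for x in row])

ss_total = sum((x - grand) ** 2 for row in subjects for x in row)
ss_subjects = k * sum((mean(row) - grand) ** 2 for row in subjects)
ss_sex = n * k * sum(
    (mean([x for row in scores[s] for x in row]) - grand) ** 2 for s in sexes
)
ss_caff = len(subjects) * sum(
    (mean([row[j] for row in subjects]) - grand) ** 2 for j in range(k)
)
ss_cells = n * sum(
    (mean([row[j] for row in scores[s]]) - grand) ** 2
    for s in sexes for j in range(k)
)
ss_inter = ss_cells - ss_sex - ss_caff
ss_subj_within = ss_subjects - ss_sex   # subjects within sex
ss_error = ss_total - ss_subjects - ss_caff - ss_inter

df_sex = len(sexes) - 1
df_subj_within = len(sexes) * (n - 1)
df_caff = k - 1
df_inter = df_sex * df_caff
df_error = df_subj_within * df_caff

# Sex is tested against subject-to-subject variation within each sex;
# caffeine and the interaction are tested against the within-subject error.
f_sex = (ss_sex / df_sex) / (ss_subj_within / df_subj_within)
f_caff = (ss_caff / df_caff) / (ss_error / df_error)
f_inter = (ss_inter / df_inter) / (ss_error / df_error)

print(f"sex:            F({df_sex}, {df_subj_within}) = {f_sex:.2f}")
print(f"caffeine:       F({df_caff}, {df_error}) = {f_caff:.2f}")
print(f"sex x caffeine: F({df_inter}, {df_error}) = {f_inter:.2f}")
```

The size of the subjects-within-sex term relative to the error term indicates whether matching by subject removed much variation.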
DECISION TREE: TWO-WAY REPEATED MEASURES ANOVA
TABLE FORMAT: TWO-WAY REPEATED MEASURES ANOVA
DATA FORMAT AND ANALYSIS METHODS
CHOOSING MODEL
ANALYSIS RESULT
Matching not effective???
RECONSIDERING REGULAR TWO-WAY ANOVA
POWER ANALYSIS
• Power depends on:
  • Sample size (n)
  • Effect size, which combines:
    • Standard deviation (σ or s)
    • Minimal detectable difference
  • False positive rate (α)
• Power analysis includes:
  • Sample size required
  • Effect size or minimal detectable difference
  • Power of the test
POWER ANALYSIS SOFTWARE/PACKAGES
• G*Power (free!!!)
• Optimal Design (free!!!)
• SPSS SamplePower
• PASS
• SAS PROC POWER, Stata sampsi, etc.
• Mplus for more advanced/complicated analysis
• Many free online programs
  • http://www.stat.uiowa.edu/~rlenth/Power/
TWO INDEPENDENT SAMPLES POWER ANALYSIS: INPUT AND OUTPUT PARAMETERS IN G*POWER
• Sample size required
  • Input parameters
    • Effect size (d)
    • False positive rate (α)
    • Minimum power (1 − β)
    • Ratio of two sample sizes
  • Output parameters
    • Noncentrality parameter (δ)
    • Critical t
    • Degrees of freedom
    • Sample size for each group
    • Total sample size
    • Actual power
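To show how these inputs turn into a sample size, here is a rough sketch using the normal approximation for a two-sided comparison of two independent means with equal group sizes. This is not G*Power's algorithm: G*Power works with the noncentral t distribution and therefore reports slightly larger sample sizes. The function name `n_per_group` and the default values α = 0.05 and power = 0.80 are assumptions chosen for illustration.

```python
import math
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate per-group n for a two-sided, two-sample test of means.

    d is the standardized effect size (difference in means divided by the
    common standard deviation). Equal group sizes are assumed.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for false positive rate
    z_beta = z.inv_cdf(power)           # quantile for the required power
    return math.ceil(2 * ((z_alpha + z_beta) / d) ** 2)


for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: about {n_per_group(d)} per group")
```

With these defaults, an effect size of 0.5 needs about 63 subjects per group under the approximation, and halving the effect size roughly quadruples the requirement.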

TWO INDEPENDENT SAMPLES POWER ANALYSIS: INPUT AND OUTPUT PARAMETERS IN G*POWER
• Effect size
  • Input parameters
    • False positive rate
    • Minimum power
    • Sample size for each group
  • Output parameters
    • Noncentrality parameter
    • Critical t
    • Degrees of freedom
    • Effect size
    • Minimal detectable difference
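Under a normal approximation (an assumption made here for illustration; G*Power itself uses the noncentral t distribution), the detectable effect size for n subjects per group, and the corresponding minimal detectable difference, can be written as below. The symbol Δ for the minimal detectable difference is notation introduced for this sketch.

```latex
d \approx \left( z_{1-\alpha/2} + z_{1-\beta} \right) \sqrt{\frac{2}{n}},
\qquad
\Delta = d \cdot \sigma
```

Here α is the false positive rate, 1 − β is the minimum power, and σ is the common standard deviation.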

COMPUTE SAMPLE SIZE: TWO INDEPENDENT SAMPLES
DETERMINING EFFECT SIZE: TWO INDEPENDENT SAMPLES
ANALYSIS RESULTS: TWO INDEPENDENT SAMPLES
COMPUTE EFFECT SIZE: TWO INDEPENDENT SAMPLES
X-Y PLOT FOR A RANGE OF VALUES
FACTORS AFFECTING POWER: TWO INDEPENDENT SAMPLES
• Power increases as total sample size increases
• Power increases as effect size increases
• Power increases as significance level increases
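These three trends can be checked numerically. The sketch below uses a normal approximation to the power of a two-sided, two-sample test with equal group sizes. It is an illustration only, not the exact calculation G*Power performs. The function name `approx_power` and the parameter values are assumptions.

```python
from statistics import NormalDist


def approx_power(d, n_per_group, alpha=0.05):
    """Normal-approximation power of a two-sided, two-sample test of means."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = d * (n_per_group / 2) ** 0.5  # approximate noncentrality
    # Probability that the test statistic falls in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


print("sample size:", [round(approx_power(0.5, n), 3) for n in (10, 20, 40, 80)])
print("effect size:", [round(approx_power(d, 30), 3) for d in (0.2, 0.5, 0.8)])
print("alpha:      ", [round(approx_power(0.5, 30, a), 3) for a in (0.01, 0.05, 0.10)])
```

Each printed list increases from left to right, matching the three statements above.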
ONE-WAY ANOVA POWER ANALYSIS: INPUT AND OUTPUT PARAMETERS IN G*POWER
• Sample size required
  • Input parameters
    • Effect size (f)
    • False positive rate (α)
    • Minimum power (1 − β)
    • Number of groups
  • Output parameters
    • Noncentrality parameter (λ)
    • Critical F
    • Degrees of freedom
    • Total sample size
    • Actual power
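For reference, the quantities in this list are related as follows for a one-way ANOVA with k groups and total sample size N. The symbols σ_m (standard deviation of the group means around the grand mean) and σ (common within-group standard deviation) are notation assumed here for the definition of Cohen's f.

```latex
f = \frac{\sigma_m}{\sigma}, \qquad
\lambda = f^2 N, \qquad
df_1 = k - 1, \qquad
df_2 = N - k

\text{power} = P\left( F'_{df_1,\, df_2,\, \lambda} > F_{\text{crit}} \right)
```

F′ denotes the noncentral F distribution, and F_crit is the critical F value at false positive rate α.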

ONE-WAY ANOVA POWER ANALYSIS: INPUT AND OUTPUT PARAMETERS IN G*POWER
• Effect size
  • Input parameters
    • False positive rate
    • Minimum power
    • Total sample size
    • Number of groups
  • Output parameters
    • Noncentrality parameter
    • Critical F
    • Numerator and denominator degrees of freedom
    • Effect size
    • Minimal detectable difference

COMPUTE SAMPLE SIZE: ONE-WAY ANOVA
COMPUTE EFFECT SIZE: ONE-WAY ANOVA
FACTORS AFFECTING POWER: ONE-WAY ANOVA
• Power increases as total sample size increases
• Power increases as effect size increases
• Power increases as significance level increases
DATA TRANSFORMATION
• Why?
  • Many biological variables do not follow a normal distribution
• How?
  • Apply a mathematical function to each observation
  • Perform statistical tests using the transformed data
  • Interpret results using back transformation
• Common data transformation methods in biology
  • Log transformation
  • Square root transformation
  • Arcsine transformation
  • Reciprocal transformation
LOG TRANSFORMATION
• Usage
  • Converts a positively skewed distribution into a symmetrical one
  • Applicable when there is heteroscedasticity and standard deviations are proportional to the means
• Mathematical function: $x' = \log_2(x + 1)$
  • Logarithms in any base are satisfactory
• Back transformation: $x = 2^{x'} - 1$
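A small sketch of the log transformation and its back transformation follows. The data values are made up for illustration and the function names are hypothetical.

```python
import math


def log_transform(x):
    """Forward transformation: x' = log2(x + 1)."""
    return math.log2(x + 1)


def log_back_transform(x_prime):
    """Back transformation: x = 2**x' - 1."""
    return 2 ** x_prime - 1


# Illustrative, positively skewed values (not from the talk).
values = [1, 3, 7, 15, 31]
transformed = [log_transform(x) for x in values]

# Summarize on the transformed scale, then report on the original scale.
mean_transformed = sum(transformed) / len(transformed)
back_mean = log_back_transform(mean_transformed)
arithmetic_mean = sum(values) / len(values)

print("transformed:", transformed)
print("back-transformed mean:", round(back_mean, 3))
print("arithmetic mean:", arithmetic_mean)
```

For these values the back-transformed mean is 7, below the arithmetic mean of 11.4, because the log scale reduces the influence of the largest observations.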
SQUARE ROOT TRANSFORMATION
• Usage
  • Applicable when the group variances are proportional to the means
  • Samples taken from a Poisson distribution, such as count data
• Mathematical function: $x' = \sqrt{x + 0.5}$
• Back transformation: $x = x'^2 - 0.5$
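The square root transformation and its back transformation can be sketched in the same way. The counts below are made up for illustration and the function names are hypothetical.

```python
import math


def sqrt_transform(x):
    """Forward transformation: x' = sqrt(x + 0.5)."""
    return math.sqrt(x + 0.5)


def sqrt_back_transform(x_prime):
    """Back transformation: x = x'**2 - 0.5."""
    return x_prime ** 2 - 0.5


# Illustrative count data, including a zero count (not from the talk).
counts = [0, 2, 5, 9, 14]
roots = [sqrt_transform(c) for c in counts]
recovered = [sqrt_back_transform(r) for r in roots]

print("transformed:", [round(r, 3) for r in roots])
print("recovered:  ", [round(c, 6) for c in recovered])
```

Adding 0.5 before taking the root keeps zero counts away from the point where the square root function is steepest.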
ARCSINE TRANSFORMATION
• Usage
  • Applicable when data (proportions or percentages) were taken from a binomial distribution
• Mathematical function: $p' = \arcsin\sqrt{p}$
• Back transformation: $p = (\sin p')^2$
• Shortcoming
  • Not good at the ends of the range (near 0 and 100%)
  • Adjustment needed when p is near 0 or 100%
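A minimal sketch of the arcsine transformation, with proportions expressed on the 0 to 1 scale and angles in radians (both choices are assumptions for this illustration). The function names are hypothetical, and no end-of-range adjustment is applied.

```python
import math


def arcsine_transform(p):
    """Forward transformation: p' = arcsin(sqrt(p)), with p in [0, 1]."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a proportion between 0 and 1")
    return math.asin(math.sqrt(p))


def arcsine_back_transform(p_prime):
    """Back transformation: p = sin(p')**2."""
    return math.sin(p_prime) ** 2


# Illustrative proportions (not from the talk).
proportions = [0.05, 0.25, 0.50, 0.75, 0.95]
angles = [arcsine_transform(p) for p in proportions]

for p, a in zip(proportions, angles):
    print(f"p = {p:.2f} -> p' = {a:.4f} rad -> back = {arcsine_back_transform(a):.2f}")
```

Percentages need to be divided by 100 before calling `arcsine_transform`.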

CHOOSING TRANSFORMATION BASED ON DATA DISTRIBUTION
Shape                Figure  Transformation
Reverse J            A       1/X
Severe skew right    B       Log(X)
Moderate skew right  C       sqrt(X)
Moderate skew left   D       1/sqrt(X)
Severe skew left     E       -1/Log(X)
J-shaped             F       -1/X
LOG TRANSFORMATION
Untransformed  Square-root transformed  Log transformed
38             6.164                    1.580
1              1.000                    0.000
13             3.606                    1.114
2              1.414                    0.301
13             3.606                    1.114
20             4.472                    1.301
50             7.071                    1.699
9              3.000                    0.954
28             5.292                    1.447
6              2.449                    0.778
4              2.000                    0.602
43             6.557                    1.633
Note: the values in this table correspond to sqrt(x) and log10(x), without an added constant.
SUMMARY
• ANOVA
  • One-way ANOVA
    • With or without repeated measures
  • Two-way ANOVA
    • Regular two-way ANOVA
    • Two-way repeated measures ANOVA
• Power analysis
  • Two independent samples
  • One-way ANOVA
• Data transformations
  • Log transformation
  • Square root transformation
  • Arcsine transformation
BASIC STATISTICS TOOLS
Statistics software and packages:
1. GraphPad Prism, SPSS, and Excel add-ins
2. G*Power, Optimal Design, etc.
3. SAS, R, Stata, etc.
Basic statistics books:
1. Intro Stats, SDSU, 2nd edition, De Veaux, Velleman, Bock
2. Choosing and Using Statistics: A Biologist's Guide
3. Biostatistical Analysis, Jerrold H. Zar
4. Biostatistics: The Bare Essentials, Norman and Streiner
5. Handbook of Biological Statistics
Thank You All for Coming!!!
Questions???