Petr Soukup
Faculty of Social Sciences, Charles University; Institute of Sociology (Czech Academy of
Sciences)
Statistical significance – misused or bad concept?
Abstract
Nowadays, computers allow us to use statistical methods almost without thinking. We teach
students how to use statistics in a statistical package, not how to use statistics in science. One
of the biggest problems concerns the concept of statistical significance, established by Sir
Ronald Fisher, Jerzy Neyman and Egon Pearson. It seems to me that most social scientists use
statistical significance incorrectly, and that most students do not understand the key idea
behind the concept. Scientists are more likely to publish statistically significant results than
nonsignificant ones. Is a statistically significant result always important? Are there any
alternatives to statistical significance for evaluating the importance of a result? Do Czech
sociologists use statistical significance incorrectly? I discuss the limits of statistical
significance and the statistical significance controversy in psychological and educational
journals over the last 20 years. I offer some alternatives to classical statistical significance
(confidence intervals, power analysis, minimum sample size, the alternative-models
approach). I conducted a survey of social science students to find out what they know about
statistical significance. At the end of the article I give some recommendations for teaching in
methodological courses that can improve the practice of statistical significance in the future.
Short history of statistical testing
The history of statistical testing started in 1710 with Arbuthnot's paper [Arbuthnot, 1710].
The author wanted to show differences between girls' and boys' birth rates in Britain. The
idea of statistical testing then lay dormant for two centuries, and at the beginning of the 20th
century Fisher started to develop it. Fisher's basic idea was connected with experimental
design, and the key question was: Is there any difference between the control and the
experimental group? Or is there any difference between two experimental groups, etc.?
Fisher invented analysis of variance and the idea of statistical hypothesis testing. Fisher
proposed testing the so-called tested hypothesis, which states: there is no difference between
the control and experimental groups in the measured characteristic (e.g. the length of
XXXXX daffodils, etc.). Fisher derived the equations for analysis of variance and the Fisher
distribution (F-distribution).
F = MSb / MSw (the ratio of the between-groups mean square to the within-groups mean square)
According to the value of the test statistic F it is possible to compute the probability of
obtaining such a result, or a more extreme one, if the null hypothesis is true in the
population. This probability (nowadays often called Sig, P, or significance) lies between 0
and 1. Low values support the idea that the control and experimental groups differ, i.e. that
the effect applied to the experimental group had some impact. Fisher proposed treating a
probability (hereafter P) below 0.05 (5 %) as "proof" of the effect's impact, and proposed
continuing with experiments whenever P is lower than 0.05; such replications can confirm
our idea about the effect's impact. Fisher's second proposal concerned P in the interval
[0.05, 0.2]. In these cases Fisher proposed continuing the experiments, because the
experimental design may be problematic and larger differences may be reached in further
experiments. Fisher's third proposal can be formulated as follows: if P is above 0.2 (20 %),
retain the tested hypothesis, conclude that no effect can be proven, and do not replicate the
experiment. These ideas were developed further by the Polish statistician Jerzy Neyman and
the British statistician Egon Pearson, who introduced the idea of null and alternative
hypotheses: at the beginning, the researcher must state his or her null hypothesis and an
alternative one.
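The procedure described above can be sketched in a few lines of code. The following is a minimal illustration, not the author's own analysis: it runs a one-way ANOVA on two hypothetical groups of measurements (the data values are invented for the example) and then applies Fisher's three-zone rule to the resulting P value, using `scipy.stats.f_oneway` to compute F = MSb / MSw and the associated probability.

```python
# A minimal sketch of Fisher's F test and his three-zone decision rule.
# The measurements for the control and experimental groups are hypothetical.
from scipy import stats

control = [5.1, 4.8, 5.3, 5.0, 4.9, 5.2]
experimental = [5.6, 5.9, 5.4, 5.8, 6.0, 5.5]

# One-way ANOVA: F is the ratio of the between-groups mean square (MSb)
# to the within-groups mean square (MSw); P is the probability of an F
# at least this large if the null hypothesis is true.
f_stat, p_value = stats.f_oneway(control, experimental)

# Fisher's three zones for P, as described in the text.
if p_value < 0.05:
    verdict = "effect supported; replicate to confirm"
elif p_value <= 0.2:
    verdict = "inconclusive; continue experimenting"
else:
    verdict = "no effect demonstrated; stop"

print(f"F = {f_stat:.2f}, P = {p_value:.4f}: {verdict}")
```

Note that this decision rule reflects Fisher's original, exploratory use of P; the later Neyman–Pearson framework instead fixes a single significance level in advance.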