QM 1 Randomness - Research Design & Evaluation

What is randomness?
Stephen Gorard
Durham University
s.a.c.gorard@durham.ac.uk
A basic view of randomness
In statistics and the philosophy of science, randomness means something rather more than its
everyday meaning of haphazard. A random event is one that is completely unpredictable in form,
outcome or timing. Randomness is the quality of such an event – its unpredictability, lack of
intention, and lack of long-term patterning. Applied to a set of numbers, randomness means that
each number in the set is unrelated to any other. It means that knowing one or more numbers in the
set will not help identify any other numbers in the set (in the same range). For example, if a standard
six-sided die roll has a random outcome, then knowing the previous 10, 100 or 1,000 results from that
die will not assist prediction of the next one. The chances of guessing the result of the next roll
correctly remain 1 in 6, however many rolls have been seen previously. Randomness is the
characteristic of chance.
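The claim about the die can be checked informally by simulation. The sketch below is an illustration only, not part of the original argument. It assumes Python's standard `random` module as a stand-in for a fair die (so the rolls are strictly pseudo-random), and the function name `hit_rate_using_history` and its parameters are hypothetical. It guesses each roll as the most common face among the preceding rolls and reports how often that guess is right.

```python
import random

def hit_rate_using_history(n_rolls=60_000, history=10, seed=1):
    """Guess each roll as the most common face among the previous `history` rolls."""
    rng = random.Random(seed)  # assumption: pseudo-random stand-in for a fair die
    rolls = [rng.randint(1, 6) for _ in range(n_rolls)]
    hits = 0
    trials = 0
    for i in range(history, n_rolls):
        recent = rolls[i - history:i]
        # Most frequent face in the recent history (ties broken arbitrarily).
        guess = max(set(recent), key=recent.count)
        hits += guess == rolls[i]
        trials += 1
    return hits / trials

rate = hit_rate_using_history()
print(f"hit rate using history: {rate:.3f} (1 in 6 = {1 / 6:.3f})")
```

If the rolls behave as random, the reported rate should stay close to 1 in 6 whatever value of `history` is chosen.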
Apparent patterns in random events
There are three apparent exceptions to the lack of pattern in random events, but none disturb the
explanation given above. First, there will be short-term patterns in random events just like shapes in
the clouds. A die can roll a 6 ten times in a row and still be random. In fact, if this never happened its
randomness might be doubted. Second, random events can have a clear probability distribution over
a large number of events, and this is a kind of pattern. For a die roll the distribution is called uniform
because each outcome has the same likelihood. So, over a very large number of rolls the outcomes
will tend to be equal in occurrence – there will be something like 1 in 6 ones, 1 in 6 twos, and so on.
But this does not help to predict the next random event (to believe otherwise, and that the die
‘owes’ a six for example, is the gambler’s fallacy). Third, in real life, truly random events will be rare.
It is hard to imagine a die has been so perfectly manufactured and weighted that each side will have
precisely the same probability of occurrence. It is also possible to imagine that a person rolling the
die might be influenced slightly by the faces showing when the decision is made to release the die. In
real life, such events as rolling a die are therefore termed pseudo-random. They appear random and
they are practically as good as random (i.e. unpredictable). Even random number tables generated
by a computer must be based on the algorithm that creates them, and this creates the danger that
they are not truly random. Randomness is more of an ideal than a fact (but see Chapter 5 in Gorard
2013).
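The three points above can be illustrated with a short simulation. This is a minimal sketch under stated assumptions: Python's standard `random` module plays the part of the die, the seeds are arbitrary, and the helper names `face_shares` and `longest_run` are hypothetical. It shows the long-run share of each face, the longest streak of identical results, and the fact that a seeded generator reproduces its sequence exactly.

```python
import random
from collections import Counter

def face_shares(rolls):
    """Proportion of rolls showing each face, 1 to 6."""
    counts = Counter(rolls)
    return {face: counts[face] / len(rolls) for face in range(1, 7)}

def longest_run(rolls):
    """Length of the longest streak of identical consecutive results."""
    if not rolls:
        return 0
    best = current = 1
    for previous, result in zip(rolls, rolls[1:]):
        current = current + 1 if result == previous else 1
        best = max(best, current)
    return best

rng = random.Random(2)  # assumption: arbitrary seed for this illustration
rolls = [rng.randint(1, 6) for _ in range(600_000)]
shares = face_shares(rolls)
run = longest_run(rolls)

# Two generators given the same seed produce the same sequence:
# the algorithm, not chance, determines every "random" number.
gen_a = random.Random(42)
gen_b = random.Random(42)
first = [gen_a.randint(1, 6) for _ in range(20)]
second = [gen_b.randint(1, 6) for _ in range(20)]
same_sequence = first == second

print("share of each face:", {f: round(s, 4) for f, s in shares.items()})
print("longest run of one face:", run)
print("same seed gives same sequence:", same_sequence)
```

The shares settle near 1 in 6 and streaks appear, yet neither helps predict the next roll. The repeated sequence is why computer-generated numbers are better described as pseudo-random.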
Randomness and social science
The term ‘random’ is widely used in social science in the context of sampling and statistical analysis.
A sample is random if all of the cases in it were selected by chance from a larger set of cases known
as the population, and if all of the cases in the population had a genuine chance of being in the
sample. A population itself is clearly not a random sample, and nor is a sample selected by other
means (ad hoc, convenience, purposive etc.). A sample selected at random but in which cases
cannot be found, do not respond, or are otherwise not recorded is no longer a random sample. And
there is no reason to believe that such missing cases are a random subset of the planned sample
either (those refusing to take part in a piece of research, for example, can be predicted on the basis
of their prior characteristics with more success than chance alone). A complete sample selected by a
pseudo-random number generator is a pseudo-random sample, and in everyday terms is
indistinguishable from a random sample. A sample in which selected cases are intentionally removed
or not available is no kind of random sample at all. None of the statistical techniques predicated on
working with a random sample can or should be used with populations or samples that are not
random or not complete. This means that p-values, standard errors, or confidence intervals should
not be cited, and tests of significance should not be conducted or reported, with such cases. And it
means that such use or reporting should be ignored in the work of others unless the study is based
on a complete random sample. In turn, the scarcity of true random samples in social science means
that such statistical techniques should rarely be used or reported (even though their abuse remains
widespread).
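The difference between a complete random sample and one with missing cases can be sketched in code. Everything here is invented for illustration: the population of scores, the sample size, and especially the `responds` function, which assumes that cases with higher scores are more likely to take part. The sketch draws a complete pseudo-random sample with `random.sample`, then drops the non-responding cases, and compares the resulting means with the population mean.

```python
import random
import statistics

rng = random.Random(7)  # assumption: arbitrary seed for this illustration

# Hypothetical population: 100,000 cases with a score on some measure.
population = [rng.gauss(50, 10) for _ in range(100_000)]

# Complete pseudo-random sample: every case has the same chance of selection.
planned = rng.sample(population, 5_000)

def responds(score):
    """Assumed non-response mechanism: higher scores respond more often."""
    probability = min(max((score - 20) / 60, 0.05), 0.95)
    return rng.random() < probability

# Achieved sample: the planned sample minus the cases that did not respond.
achieved = [score for score in planned if responds(score)]

population_mean = statistics.mean(population)
planned_mean = statistics.mean(planned)
achieved_mean = statistics.mean(achieved)

print(f"population mean:      {population_mean:.2f}")
print(f"complete sample mean: {planned_mean:.2f}")
print(f"achieved sample mean: {achieved_mean:.2f} ({len(achieved)} of {len(planned)} cases)")
```

Under this assumed mechanism the complete sample stays close to the population, while the achieved sample drifts away from it, because the missing cases are not a random subset of the planned sample.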