What is randomness?

Stephen Gorard
Durham University
s.a.c.gorard@durham.ac.uk

A basic view of randomness

In statistics and the philosophy of science, randomness means something rather more than its everyday meaning of haphazard. A random event is one that is completely unpredictable in form, outcome or timing. Randomness is the quality of such an event: its unpredictability, lack of intention, and lack of long-term patterning. Applied to a set of numbers, randomness means that each number in the set is unrelated to any other. It means that knowing one or more numbers in the set will not help identify any other numbers in the set (in the same range). For example, if a standard six-sided die roll has a random outcome, then knowing the previous 10, 100 or 1,000 results from that die will not assist prediction of the next one. The chances of guessing the result of the next roll correctly remain 1 in 6, however many rolls have been seen previously. Randomness is the characteristic of chance.

Apparent patterns in random events

There are three apparent exceptions to the lack of pattern in random events, but none disturb the explanation given above.

First, there will be short-term patterns in random events, just like shapes in the clouds. A die can roll a 6 ten times in a row and still be random. In fact, if this never happened its randomness might be doubted.

Second, random events can have a clear probability distribution over a large number of events, and this is a kind of pattern. For a die roll the distribution is called uniform because each outcome has the same likelihood. So, over a very large number of rolls the outcomes will tend to be equal in occurrence: there will be something like 1 in 6 ones, 1 in 6 twos, and so on. But this does not help to predict the next random event (to believe otherwise, for example that the die ‘owes’ a six, is the gambler’s fallacy).

Third, in real life, truly random events will be rare.
It is hard to imagine a die has been so perfectly manufactured and weighted that each side will have precisely the same probability of occurrence. It is also possible to imagine that a person rolling the die might be influenced slightly by the faces showing when the decision is made to release the die. In real life, such events as rolling a die are therefore termed pseudo-random. They appear random and they are practically as good as random (i.e. unpredictable). Even random number tables generated by a computer must be based on the algorithm that creates them, and this creates the danger that they are not truly random. Randomness is more of an ideal than a fact (but see Chapter 5 in Gorard 2013).

Randomness and social science

The term ‘random’ is widely used in social science in the context of sampling and statistical analysis. A sample is random if all of the cases in it were selected by chance from a larger set of cases known as the population, and if all of the cases in the population had a genuine chance of being in the sample. A population itself is clearly not a random sample, and nor is a sample selected by other means (ad hoc, convenience, purposive, etc.). A sample selected at random but in which cases cannot be found, do not respond, or are otherwise not recorded is no longer a random sample. And there is no reason to believe that such missing cases are a random subset of the planned sample either (those refusing to take part in a piece of research, for example, can be predicted on the basis of their prior characteristics with more success than chance alone). A complete sample selected by a pseudo-random number generator is a pseudo-random sample, and in everyday terms is indistinguishable from a random sample. A sample in which selected cases are intentionally removed or not available is no kind of random sample at all.
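
The difference between a complete pseudo-random sample and one with missing cases can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration only: the population of 1,000 numbered cases, the sample size of 100, the seeds, and above all the hypothetical non-response rule, under which cases with higher identifiers are less likely to respond. It is not drawn from any real study.

```python
import random

# Hypothetical population: 1,000 case identifiers (an assumption for illustration).
population = list(range(1000))


def draw_sample(seed, size=100):
    """Select `size` distinct cases using a seeded pseudo-random generator."""
    rng = random.Random(seed)
    return rng.sample(population, size)


# The same seed reproduces exactly the same sample: the selection is fixed by
# the algorithm, which is why it is pseudo-random rather than truly random.
sample_a = draw_sample(seed=42)
sample_b = draw_sample(seed=42)
same_seed_same_sample = sample_a == sample_b

# Hypothetical non-response rule (assumption): the probability of NOT responding
# rises with the case identifier, so missingness depends on a prior characteristic.
response_rng = random.Random(7)
respondents = [
    case for case in sample_a
    if response_rng.random() > 0.8 * case / 1000
]

# Compare the planned sample with the achieved (respondent-only) sample.
mean_planned = sum(sample_a) / len(sample_a)
mean_achieved = sum(respondents) / len(respondents)

print("Same seed gives same sample:", same_seed_same_sample)
print("Planned cases:", len(sample_a), "Achieved cases:", len(respondents))
print("Mean identifier, planned vs achieved:", mean_planned, mean_achieved)
```

Under these assumptions the achieved sample is a subset of the planned one, selected partly by the characteristics of the cases rather than by chance alone, which is the situation the paragraph above describes.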
None of the statistical techniques predicated on working with a random sample can or should be used with populations or samples that are not random or not complete. This means that p-values, standard errors, or confidence intervals should not be cited, and tests of significance should not be conducted or reported, with such cases. And it means that such use or reporting should be ignored in the work of others unless the study is based on a complete random sample. In turn, the scarcity of true random samples in social science means that such statistical techniques should rarely be used or reported (even though their abuse remains widespread).