Jan00: Algorithm Alley - Eastern Washington University

advertisement
Randomized Shuffling
Dr. Dobb's Journal January 2000
By Timothy Rolfe
Tim is a visiting associate professor in the Computer Science Department of Eastern Washington
University. He can be contacted at rolfet@acm.org.
Card shuffling is an example of putting a fixed number of items into completely random order.
The method used to shuffle cards applies to any programming circumstance where you need to
randomize the order of a fixed number of items, including Scrabble tiles, dominoes, standard
playing cards, Tarot cards, or lottery numbers, for example.
You can view this randomization of items as randomizing the permutation of entries in an array.
In this article, I'll examine a couple of randomizing algorithms -- one that does not generate all
permutations with equal probability, and another that does. Both algorithms are based on
working through the array, swapping entries in the process.
The first algorithm focuses on proceeding through the array, randomly positioning the value
found in each successive position. With this algorithm, as you proceed through the N positions of
the array (source positions), you choose the target position randomly from the N possible
positions in the array. The values in the source and target positions are then interchanged. H.M.
Deitel and P.J. Deitel refer to this as their "high-performance card shuffling" algorithm in their
book C++: How to Program (Prentice-Hall, 1998). In C++, this process looks like Example 1
(presuming a function or macro Swap (L,R)). At first glance, it appears that this algorithm
generates all permutations with equal probability. On examination, however, you can see that
this generates NN rearrangements of elements -- each of the N iterations of the loop positions a
value among the N available positions -- even though there are only N! possible permutations of
N elements. For each permutation there are a number of ways to generate that permutation -- an
average of NN/N! ways.
for ( Src = 0; Src < N; Src++ )
{
Dest = rand() % N; // All N positions equally
likely
Swap (X[Src], X[Dest]);
}
Example 1: C++ implementation of shuffling algorithm that does not generate
all permutations with equal probability.
The second algorithm focuses on choosing a random value for each position in the array. Once
the value is placed in that position, the position (and the value) no longer participates in the
shuffling. While you could start at either end of the array, the code is simpler if the region for
choosing a source value always begins with subscript 0. To do that, you start with the rightmost
position in the array as the target and randomly select a source position from the front of the
array to that target position. You interchange the values, then treat the array as shortened by one
position and do the same thing over again. There are no choices left to be made when the
remaining array has a single element. Example 2 is C++ code that implements this approach.
Examination of the structure of this loop shows that it generates N! rearrangements of elements.
All permutations are equally likely, aside from the minor deviation from uniform distribution by
selecting a random value between 0 and Dest as (rand()%(Dest+1)).
for (Dest = N-1; Dest > 0; Dest--)
{
Src = rand() % (Dest+1); // Positions from [0] to
[Dest]
Swap (X[Src], X[Dest]);
}
Example 2: C++ implementation of shuffling algorithm that does generate all
permutations with equal probability.
A natural question is "How far from the uniform distribution are the permutations from the NN
algorithm discussed first?" To answer this, I'll develop a method of numbering permutations.
Then all of the permutations generated by the NN algorithm can be generated and counted. Since,
in a permutation, each position has one fewer choice than the one before it, you can borrow from
programming the notion of a multidimensional array, with dimensions: [N] [N-1] [N-2]...[3] [2]
[1].
You can then map subscripts for such an N-dimensional array onto a one- dimensional offset.
Each permutation can then be viewed as N times selecting elements from those remaining (as
selecting from position [0], [1], and so on). At first, there are N elements to choose from, then
(N-1), then (N-2), down to 1. Viewed as subscripts in an N-dimensional array, these positions of
elements chosen will generate offsets (permutation indices) between 0 and (N!-1). A further
advantage of the numbering is that, if the original permutation is in increasing order, it numbers
the permutations in lexicographic order; that is, something like alphabetical or dictionary order.
Thus the permutation with all elements in ascending order is numbered "0," while the
permutation with all elements in descending order is numbered "N!-1."
Figure 1 illustrates how, in the permutation of eight elements, the 88(16777216) rearrangements
from the first algorithm map onto the 8! (40320) possible permutations. The file mapperms.cpp
(available electronically; see "Resource Center," page 5) implements this process. Table 1 shows
that there are some individual permutations that are markedly more likely than others. While
most appear to cluster between 200 and 600, the pattern is by no means random. Applying a
moving average to these data, you can show some underlying regularities in the data, again
indicating that the data are far from random. In Figure 2, a moving window of about 0.5 percent
of all the data is applied (201 cells averaged and assigned to the central point). Another way that
you can get a feel for the distribution is by sorting those 40320 data points; see Figure 3.
Figure 1: Mapping NN generated permutations onto N! permutation indices.
Table 1: Permutations that are markedly more likely than others.
Figure 2: Mapping NN generated permutations onto N! permutations indices.
Moving average, 201 cell window.
Figure 3: Number of hits: sorted 88/8!=416.1016
Another approach is to investigate, in the mapping of these NN reorderings onto the N!
permutations, the probability (fraction of the total) for each position in the initial string to end up
in each position in the shuffled string. Table 2 shows those probabilities. In it, each column and
each row totals to 100 percent. Down the columns, each position contains one of the eight
available characters. Across the rows, each character shows up on one of the eight positions. The
first character in the source string shows equal probability of ending up in all eight positions of
the rearranged string, while all other characters in the source string show varying probabilities
for their positions in the rearranged string.
Table 2: Probability for each position in the initial string to end up in each
position in the shuffled string.
Also, you see that only the final position in the rearranged string has an equal probability of
receiving all of the characters of the source string. Figure 4 displays Table 2 with the eight series
taken from the rows.
Figure 4: Distribution of characters. Series labels: character positioned.
Acknowledgment
Thanks to Dr. Ray Hamel of the Eastern Washington University Computer Science Department
for his comments, specifically for the idea of examining the relationships between the initial and
final positions of items in the rearrangements.
DDJ
Download