Generating Random Numbers without Modulo Bias, Redux

2015-02-27
, ,

In an earlier post, I discussed how to remove modulo bias from your C code. I would like to add a different version of removing modulo bias, inspired by the code from the PCG family of random number generators.

The PCG Version

In the official minimal C implementation of PCG, there is a function called pcg32_boundedrand_r, which takes a bound variable. The code for that function is as follows:

uint32_t pcg32_boundedrand_r(pcg32_random_t* rng, uint32_t bound)
{
    uint32_t threshold = -bound % bound;
    for (;;) {
        uint32_t r = pcg32_random_r(rng);
        if (r >= threshold)
            return r % bound;
    }
}

. If bound is 5, then the function will return a uniform range from 0 to 4. I.e., the return value r is such that \(0 \leq r < bound\).

Explanation of the PCG Version

Although I generally prefer to explain code in the order they appear, top to bottom, for this case I would like to explain the pcg32_boundedrand_r() function by talking about the usual broken version found in the wild.

Broken Version

The broken version is like this:

r = some_rng();
return r % bound;

. Unless r is a multiple of bound, we will incur the wrath of modulo bias in the above code. Why is this? Well, consider what the modulo operator does. All it does is chop off any excess range of values that is not a multiple of bound. Consider the following ASCII diagram, where the maximum value returned by our some_rng() function is just 11:

0 1 2 3 4 5 6 7 8 9 10 11

. So, some_rng() has 12 possible unique values, 0 through 11. If our bound was 4, then doing r % bound is perfectly fine, because we can chop the RNG range into three equal parts of length 4 each, like this:

0 1 2 3 | 4 5 6 7 | 8 9 10 11

. If each of the 12 values 0 through 11 occurs uniformly1, then we can assume that each of the values occurs \(\frac{1}{12}\) times. If we apply % 4 to the values, then our output range looks like this:

0 1 2 3 | 0 1 2 3 | 0 1 2 3

. Can you see how the three subparts are the same? What’s more, 0 will occur \(\frac{3}{12} = \frac{1}{4}\) times. The same goes for 1, 2, and 3. Because all 4 possible values, 0 through 4, occur an equal \(\frac{1}{4}\) times, there is no modulo bias here!

What if the modulus is not a nice number with respect to

RAND_MAX?

This is where the problem occurs. RAND_MAX is the highest value returned by our RNG, and thus defines the range of the possible values output by the RNG.2 Continuing with our example above, if instead of bound = 4, we used another value like 5, we will get this instead:

0 1 2 3 4 | 5 6 7 8 9 | 10 11

— or essentially, these values are output if we just naively use r % bound:

0 1 2 3 4 | 0 1 2 3 4 | 0 1

. Can you see how 0 and 1 will occur \(\frac{3}{12}\) times, but 2, 3, and 4 will occur only \(\frac{2}{12}\) times?

The fix — adjust the range!

Now, we can fix the above example by simply throwing out certain values. The approach I used in the old blog post was to discard the right hand side values. So, in our example with bound = 5, where we have

0 1 2 3 4 | 0 1 2 3 4 | 0 1

, the old example tried to discard the last 2 values, like this:

0 1 2 3 4 | 0 1 2 3 4 | x x

. In other words, if r fell within the range 10 through 11, then we’d simply discard it and call some_rand() again.

But it doesn’t have to be this way. Instead of throwing out the values on the right, we can throw out the values on the left! So, instead of

0 1 2 3 4 | 0 1 2 3 4 | x x

we can instead do

x x 2 3 4 | 0 1 2 3 4 | 0 1

. Can you see how all of the values 0 through 4 occur exactly 2 times? No more bias!

PCG’s approach

The approach in PCG is the same; here is the code again:

uint32_t pcg32_boundedrand_r(pcg32_random_t* rng, uint32_t bound)
{
    uint32_t threshold = -bound % bound;
    for (;;) {
        uint32_t r = pcg32_random_r(rng);
        if (r >= threshold)
            return r % bound;
    }
}

. The threshold value is the initial range that must be discarded. So we can visualize it like this:

x x 2 3 4 | 0 1 2 3 4 | 0 1

// The `x x` here is the initial threshold range, that must be discarded.

On line 3, we determine the value of threshold, and then on line 4 we enter a for loop that repeatedly calls the RNG until we get a value outside of this threshold range (well, technically, greater-than-or-equal-to the threshold value). If we do get such a value, then we return the modulo of it. If we visualize it, it’s like this:

0 1 2 3 4 | 5 6 7 8 9 | 10 11

- Discard 0 and 1 (the threshold area).

x x 2 3 4 | 5 6 7 8 9 | 10 11

- Return (r % bound).

x x 2 3 4 | 0 1 2 3 4 | 0 1

.

How is threshold calculated?

The above high-level explanation should be sufficient for you, dear reader. But if you want to go down to the innards of C, to see how pcg32_boundedrand_r works, read on.

The cornerstone of the PCG approach is to use a variable called threshold. The code to calculate threshold is somewhat complicated:

uint32_t threshold = -bound % bound;

. Now, let’s remind ourselves that the whole point of threshold is to be a minimum value that sets a cutoff of values to be discarded — if the generated value r is too low (lower than our threshold), then we discard it.

Consider the following diagram, with RAND_MAX set to 11, and bound set to 5:

0 1 2 3 4 | 5 6 7 8 9 | 10 11

. We can visualze the above like this instead:

R R R R R | _ _ R R R | R R

. The underscores represent the values that must be skipped over, in order to eliminate modulo bias. The question then, is to figure out how to count the number of underscores. In our case, it is 2, and so as long as we skip the first two values 0 and 1, we should be fine.

The first step is to count backwards from the right-hand edge:

                |<---count|
R R R R R | _ _ R R R | R R
                t         |
                          \-> RAND_MAX

. We end up where t is on the diagram above, by counting backwards from RAND_MAX. If we then take the modulo of this by count itself, then we end up with t being the value we want — the number of underscores. This is the essence of -bound % bound — we first take -bound which is obtained by counting backwards from RAND_MAX, and then we take the modulo of this number by bound itself, to get what we need. Using the ASCII diagram again, we get

                |<-----count|
0 1 2 3 4 | 5 6 7 8 9 | 10 11
                t          |
                           \-> RAND_MAX

where t is 7, and now applying % 5, we get:

0 1 2 3 4 | 0 1 2 3 4 | 0 1
                t

t = 2, the correct answer! You can try out different values for RAND_MAX and bound, but you will get the right answer each time using threshold = -bound % bound.

But why is -bound the way it is?

The C standard says that a negative unsigned value is stored as a positive value. Without getting too technical, here are the values of bound as it becomes “negative”:

Bound | Actual value
------+-------------
2     | 2
1     | 1
0     | 0
-1    | (2^32) - 1 (same as RAND_MAX)
-2    | (2^32) - 2 (same as RAND_MAX - 1)
-3    | (2^32) - 3 (same as RAND_MAX - 2)
... and so on

. In our case, if our RAND_MAX is 11, and bound is 5, then -bound is indeed 7.

Bound | Actual value
------+-------------
0     | 0
-1    | 11 - 0 (same as RAND_MAX)
-2    | 11 - 1 (same as RAND_MAX - 1)
-3    | 11 - 2 (same as RAND_MAX - 2)
-4    | 11 - 3 (same as RAND_MAX - 3)
-5    | 11 - 4 (same as RAND_MAX - 4) = 7
... and so on

Conclusion

I thoroughly enjoyed looking at the source code in PCG, only to discover an elegant solution around removing modulo bias. Unfortunately, I do not know the true origin of this approach; it is possible that the authors of the PCG code invented it, but I find this improbable. Meanwhile, I strongly recommend the following code for anyone using a low-level generator that does not come with a bounded version:

/*
 * Uniformly return an integer from 0 to (bound - 1). We assume that rand()
 * returns a 32-bit unsigned integer, so we use uint32_t.
 */
uint32_t bound = some_arbitrary_bound;
uint32_t r;
uint32_t threshold = -bound % bound;
while (r = rand() < threshold) {};
return r % bound;

  1. Any RNG worth their salt will return a uniformly distributed value, typically from 0 to RAND_MAX. In the example here, our RAND_MAX is 11.↩︎

  2. For a 32-bit unsigned integer RNG, \(2^{32} - 1\) (all 1 bits set) is the highest value that can be returned. That is, our RNG returns a value from 0 (no bits set) to RAND_MAX (all bits set). This means that our RNG generates everything from all 0s to all 1s and everything in between.↩︎