Programmers are often faced with a problem of picking a random number within a given range. The most common way to solve this is to take a random integer and compute the modulo of some smaller number.
For example, assume the function rand() returns a random integer between 0 and an arbitrary maximum value, m. For some number n less than m, to compute a random number between 1 and n we do the following:
r = rand(0,m) % n + 1
This is fine enough, for most cases, however it CAN introduce what is called a modulo bias. A modulo bias can be visualized, quite clearly with a little help from the pigeonhole principle, which states:
Assume you are given n number of pigeonholes, and p > n number of pigeons. If you put each pigeon into a pigeonhole, there must exist at least one pigeonhole which has at least two pigeons.

In other words, because there are more pigeons than there are pigeonholes, at least one of the pigeonholes must have at least two pigeons in it! Here we are compressing 10 pigeons into 9 pigeonholes.
This is an example of compressing a larger range (number of pigeons) into a smaller range (number of pigeonholes).
So, how does this come into play when dealing with a random number in a range? We are doing the same thing when picking a random number in a range. We are given the larger range, (0,m), and are trying to compress this down into a smaller range (x, n).
Assume each of our pigeons is wearing a collar with a number 1-10 written on it. Now assume we have a 10-sided fair die numbered 1-10 on the sides, which we are going to roll. We are going to select a pigeon at random by rolling the die. The pigeon with a collar numbered the same as the up-facing side of the die is the one we will select. Now, the pigeonhole which houses the winning pigeon is going to have food placed into it. Which pigeonhole is most likely to get the food?
Each pigeon has a 10% chance of being selected, so each pigeonhole has a (10% * number of pigeons in pigeonhole) chance of being selected. That is, the pigeonhole with two pigeons has a 20% chance of being selected, where the other eight only have a 10% chance of being selected. There is a bias towards the pigeonhole with the most pigeons in it.
When each pigeonhole has just one pigeon in it, each pigeonhole shares a common chance winning the food. That is, each pigeonhole has the same chance of winning, because each pigeonhole has the same number of pigeons in it. If we have 18 pigeons and nine pigeonholes, we can put exactly two pigeons in each pigeonhole. Each pigeon adds a 1/18 chance of winning to the pigeonhole it is housed in. Since each pigeonhole has two pigeons, this means each pigeonhole has a 2/18 = 1/9 chance of winning, and the game is now fair!
The same thing happens with the random numbers. Assume our programming language generates random numbers between 0 and m = 9. Now, assume we are faced with a task, where we must select a random number between 1 and 3
that is:
$r = (rand(0,9) % 3) + 1;
This code will give us a random number between 1 and 3, but the problem is that it introduces a modulo bias. Here, we are compressing the range (0,9) down into the smaller range (1,3). There are 10 different values in the larger range (the pigeons), and just 3 different values in the smaller range (the pigeonholes).
First, we know that at least one of our pigeonholes will have at least two pigeons in it. We assign each of our pigeons to the correct pigeonhole by using the modulo operation:
(0 % 3) + 1 = 1
(3 % 3) + 1 = 1
(6 % 3) + 1 = 1
(9 % 3) + 1 = 1
|
(1 % 3) + 1 = 2
(4 % 3) + 1 = 2
(7 % 3) + 1 = 2
|
(2 % 3) + 1 = 3
(5 % 3) + 1 = 3
(8 % 3) + 1 = 3
|
Can you spot the modulo bias? In this example 1 has a greater chance (40%) of being selected than either 2 or 3 (30%). Here, because we have n close to m, the effect is greater. As m gets much bigger than n the effect becomes less noticeable, but does still exist (The pigeonholes are populated in order, so in the worst case, pigeon counts may differ by one). To eliminate the modulo bias, we have to decrease m until it is evenly divisible by n.
That is, if m % n != 0, we have a modulo bias.
Solving this equation yields a new value for m, m’ with no modulo bias:
m’ = (m – x) / n = 0
To prove this, I wrote a short PHP script, that computes a random number as:
$r = (rand(0,9) % 3) + 1;
and created 100,000 random numbers 1-3, and came out with the following results:
Array
(
[1] => 40005
[2] => 30352
[3] => 29643
)
Total of 100000 iterations:
1 = 40.005%
2 = 30.352%
3 = 29.643%
Just as I predicted! Now, if we reduce m from 10 down to 9, so that m % n = 0, we see the results change to:
Array
(
[1] => 33234
[2] => 33413
[3] => 33353
)
Total of 100000 iterations:
1 = 33.234%
2 = 33.413%
3 = 33.353%
And that’s more like it!