0xStubs

System Administration, Programming and Reconfigurable Computing

libc’s random number generators and what to be aware of when seeding them

When you need pseudo-random numbers in C code, commonly used routines are random(), rand() and rand_r(unsigned int*). First and foremost, it is important to know that none of these routines produce high quality random numbers suitable for cryptographic use. But there are cases where you just need some random looking numbers. Let’s focus on these use-cases in this blog post.

For example, let’s assume you have a multi-threaded OpenMP program where you need to generate different pseudo-random numbers in each thread. A natural approach to this would be the following:

#pragma omp parallel
{
    unsigned seed = omp_get_thread_num();
    #pragma omp for
    for (...) {
        int rnd = rand_r(&seed);
        ...
    }
}

Each thread stores its own state of the random number generator and initializes (seeds) it with its thread number. Therefore, each thread should obtain different random numbers, although they will be idential on each run due to the deterministically chosen initial values. So, what’s wrong with this code? Let’s have a look at the very first random numbers generated for each thread on macOS 12:

Thread 0 (seed=0): 520932930
Thread 1 (seed=1): 16807
Thread 2 (seed=2): 33614
Thread 3 (seed=3): 50421
Thread 4 (seed=4): 67228

For seed=1,2,3,..., the generated numbers look not at all random. Even in scenarios where you only need random looking values, this is probably not what you want. The reason for this behavior is the pseudo random number generator (PRNG) used to implement rand_r(unsigned int*) on macOS. It’s MINSTD, a so-called Lehmer RNG which could be impemented as follows:

#define RAND_MAX 0x7fffffff
int rand_r(unsigned *seed) {
    if (*seed == 0)
        *seed = 123459876;
    *seed = ((unsigned long long)*seed * 16807) % RAND_MAX;
    return *seed;
}

So basically, the seed is just multiplied by 16807 and since seed=0 would never generate any other number than 0, it needs to be mapped to some other value. In this case it’s behaving like seed=123459876. I took this observation as a reason to look at the generated random numbers for seed=0..4 for different libc implementations.

rand_r(unsigned int*)

The PRNG behind rand_r(unsigned int*) is inherently limited by the API to have an internal state consisting of only a single unsigned integer. This makes generating good pseudo-random numbers difficult.

macOS 12.1

Seed    Generated number
0       520932930
1       16807
2       33614
3       50421
4       67228
  • MINSTD with seed=0 being equal to seed=123459876.
  • Values ramp up for small seeds, so you probably want to seed it with numbers significantly larger than 1.
  • The implementation seems to originate from earlier versions of FreeBSD.

glibc 2.31

Seed    Generated number
0	1012484
1	476707713
2	952403967
3	1430195325
4	1905891579
  • Values seem to ramp up for small seeds, so you probably want to seed it with numbers significantly larger than 1.
  • Looking at the implementation, this behavior is not immediately obvious.

FreeBSD 13.0

Seed    Generated number
0	16806
1	33613
2	50420
3	67227
4	84034
  • Looks like MINSTD with a constant value of 16806 added to each random number. This allows seed=0 to generate a number different to 0.
  • Values seem to ramp up for small seeds, so you probably want to seed it with numbers significantly larger than 1.
  • Link to the implementation

random()

random() is a suggested replacement for rand() and rand_r(unsigned int*) and supposed to provide better pseudo-random numbers.

macOS 12.1

Seed    Generated number
0       577655601
1       1804289383
2       1505335290
3       1205554746
4       1968078301

glibc 2.31

Seed    Generated number
0	1804289383
1	1804289383
2	1505335290
3	1205554746
4	1968078301

FreeBSD 13.0

Seed    Generated number
0	1707946985
1	1408992891
2	1109212348
3	1871735903
4	493669277

rand()

macOS 12.1

Seed    Generated number
0       520932930
1       16807
2       33614
3       50421
4       67228
  • Identical to macOS’s rand_r(unsigned int*) implementation with all of its issues.

glibc 2.31

Seed    Generated number
0	1804289383
1	1804289383
2	1505335290
3	1205554746
4	1968078301
  • Identical to glibc’s random() implementation. Be aware of seed=0 and seed=1 behaving identically.

FreeBSD 13.0

Seed    Generated number
0	1707946985
1	1408992891
2	1109212348
3	1871735903
4	493669277

What about C++?

C++ implements different PRNGs in the STL that can be selected by the user and whose behavior is defined in the standard, so they should behave identically on all platforms. For example, there is the Mersenne Twister RNG std::mt19937:

Seed    Generated number
0       2357136044
1       1791095845
2       1872583848
3       2365658986
4       4153361530

If you really want to, you can also get MINSTD as std::minstd_rand0 with seed=0 being equal to seed=1:

Seed    Generated number
0       16807
1       16807
2       33614
3       50421
4       67228

Bottom line

  • If you really want to stick to libc in a single-threaded scenario: Use random() and make sure to never use seed=0.
  • If you use glibc and need different pseudo-random numbers in different threads: There is random_r(struct random_data *restrict, int32_t *restrict) which I have not tried but which should provide the quality of random() with the possibility of having thread-local RNG states.
  • If you really need to use rand_r(unsigned int*), make sure to use seeds significantly larger than 1.

Final remarks

All of the above only takes into account PRNGs from the C and C++ standard library. There are dozens of better choices, both performance and quality wise:

Leave a Reply

Your email address will not be published. Required fields are marked *

Captcha loading...