Performance of Random Number Generators

The performance is measured on a MacBook Pro with an Intel Core i7-4960HQ CPU running macOS Sierra (version 10.12.4).

Three compilers are tested: LLVM clang (Apple version 8.1.0), GNU GCC (version 6.3.0), and the Intel C++ compiler (version 17.0.2). They are labeled “LLVM”, “GCC”, and “Intel”, respectively.

Two use cases of RNGs are considered. The first is generating random integers within a loop, where each iteration generates a single random integer:

// RNGType is the type of RNG tested
RNGType rng;
::mckl::UniformBitsDistribution<RNGType::result_type> ubits;
for (std::size_t i = 0; i != n; ++i)
    r[i] = ::mckl::rand(rng, ubits);

For RNGs whose output is uniform on the set \(\{0,\dots,2^W - 1\}\), where \(W\) is the number of bits in the output integer type, this is equivalent to

for (std::size_t i = 0; i != n; ++i)
    r[i] = rng();
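Whether an RNG satisfies the full-range condition can be checked at compile time. The sketch below is illustrative only: `is_full_range` is a hypothetical helper, not part of MCKL, and it assumes the engine exposes static `min()` and `max()` members as the standard library engines do.

```cpp
#include <limits>
#include <random>

// Hypothetical helper (not part of MCKL): true when RNGType's output covers
// every value of its unsigned result_type, that is, {0, ..., 2^W - 1}.
template <typename RNGType>
constexpr bool is_full_range()
{
    using result_type = typename RNGType::result_type;
    return RNGType::min() == std::numeric_limits<result_type>::min() &&
        RNGType::max() == std::numeric_limits<result_type>::max();
}

// std::mt19937_64 produces all 64-bit values.
static_assert(is_full_range<std::mt19937_64>(), "full range expected");

// std::minstd_rand never returns 0, so the plain rng() loop is not
// equivalent to drawing uniform bits from it.
static_assert(!is_full_range<std::minstd_rand>(), "restricted range expected");
```

The result for `std::mt19937` may depend on the platform, because its `result_type` is `std::uint_fast32_t`, which can be wider than the 32 bits the engine produces.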

The second is batch (vectorized) generation, where a single call fills the whole array:

::mckl::rand(rng, ubits, n, r.data());

For RNGs whose output is uniform on the set \(\{0,\dots,2^W - 1\}\), where \(W\) is the number of bits in the output integer type, this is equivalent to

::mckl::rand(rng, n, r.data());
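To illustrate what the batch call computes, the following sketch uses only the standard library. `fill_batch` and `batch_matches_loop` are hypothetical names introduced here; the stand-in is not how MCKL implements `::mckl::rand` and it does nothing to vectorize the work. It only shows that, for identically seeded engines, a batch call is expected to yield the same sequence as the loop.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <random>
#include <vector>

// Hypothetical stand-in for a batch interface (not MCKL's implementation):
// fills r[0], ..., r[n - 1] with successive outputs of rng in one call.
template <typename RNGType>
void fill_batch(RNGType &rng, std::size_t n, typename RNGType::result_type *r)
{
    assert(r != nullptr || n == 0);
    // std::ref makes generate_n advance the caller's engine, not a copy.
    std::generate_n(r, n, std::ref(rng));
}

// Returns true when the loop and the batch call yield the same sequence
// for two identically seeded engines.
inline bool batch_matches_loop(std::size_t n)
{
    std::mt19937_64 rng_single(42);
    std::mt19937_64 rng_batch(42);
    std::vector<std::mt19937_64::result_type> a(n);
    std::vector<std::mt19937_64::result_type> b(n);

    for (std::size_t i = 0; i != n; ++i)
        a[i] = rng_single();
    fill_batch(rng_batch, n, b.data());

    return a == b;
}
```

A real batch implementation is free to compute many outputs at once internally, which is where the difference between the “Single” and “Batch” columns comes from.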

See Vectorized Random Number Generating for details of the batch interface. In both cases, the simulation is repeated 100 times, each time with \(n\) chosen randomly between 5,000 and 10,000. The total number of cycles over the 100 simulations is recorded and divided by the total number of bytes generated, giving the performance in cycles per byte (cpB); lower is better. The two cases are labeled “Single” and “Batch”, respectively, in the tables below.
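The measurement procedure can be sketched roughly as follows. This is not the benchmark code used to produce the tables: `Measurement` and `measure_single` are hypothetical names, the sketch measures wall-clock nanoseconds with `std::chrono::steady_clock` rather than CPU cycles, and it assumes a full-range RNG such as `std::mt19937_64` so that `sizeof(result_type)` matches the number of random bytes produced per call.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <random>
#include <vector>

// Totals accumulated over all repetitions.
struct Measurement {
    std::chrono::nanoseconds elapsed{0};
    std::size_t bytes = 0;

    // Average cost per generated byte, in nanoseconds.
    double ns_per_byte() const
    {
        return static_cast<double>(elapsed.count()) /
            static_cast<double>(bytes);
    }
};

// Hypothetical helper: times the "Single" loop for a full-range RNGType.
template <typename RNGType>
Measurement measure_single(std::size_t repeats = 100)
{
    using result_type = typename RNGType::result_type;
    using clock = std::chrono::steady_clock;

    assert(repeats > 0);

    RNGType rng;
    std::mt19937 size_rng(101); // chooses n for each repetition
    std::uniform_int_distribution<std::size_t> size_dist(5000, 10000);
    std::vector<result_type> r(10000);
    volatile result_type sink = 0; // keeps the stores to r observable

    Measurement m;
    for (std::size_t k = 0; k != repeats; ++k) {
        const std::size_t n = size_dist(size_rng);
        const auto start = clock::now();
        for (std::size_t i = 0; i != n; ++i)
            r[i] = rng();
        const auto stop = clock::now();
        sink = r[n - 1];
        m.elapsed +=
            std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start);
        m.bytes += n * sizeof(result_type);
    }
    return m;
}
```

For example, `measure_single<std::mt19937_64>().ns_per_byte()` multiplied by the clock frequency in GHz gives a rough estimate of cpB, assuming the clock frequency stays constant during the run.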

Performance of RNGs in the Standard Library

                      Single               Batch
RNG                   LLVM   GCC    Intel  LLVM   GCC    Intel
std::mt19937          4.56   4.21   5.03   4.56   3.99   5.03
std::mt19937_64       2.35   2.21   2.46   2.35   2.18   2.46
std::minstd_rand0     9.68   10.1   10.9   9.68   10.1   10.9
std::minstd_rand      7.61   8.56   9.59   7.54   8.56   9.57
std::ranlux24_base    12.4   5.98   13.7   12.6   6.15   12.1
std::ranlux48_base    5.83   3.11   6.59   5.83   3.07   6.69
std::ranlux24         130    94.2   130    130    93.7   113
std::ranlux48         223    159    243    224    159    230
std::knuth_b          43.7   49.6   30.5   37.8   48.8   30.4

Performance of RNGs in the Random123 Library

                      Single               Batch
RNG                   LLVM   GCC    Intel  LLVM   GCC    Intel
r123::AESNI4x32       6.58   1.56   8.93   6.61   1.52   8.83
r123::ARS4x32         5.13   1.18   6.39   5.19   1.18   6.30
r123::Philox2x32      10.1   3.04   12.6   10.1   3.07   12.6
r123::Philox4x32      6.14   2.86   13.1   6.17   2.81   13.1
r123::Philox2x64      2.19   1.77   10.3   2.15   1.56   10.2
r123::Philox4x64      2.25   1.72   12.0   2.25   1.67   11.9
r123::Threefry2x32    11.6   4.59   10.8   11.7   4.59   10.9
r123::Threefry4x32    6.94   3.86   7.86   6.84   3.87   7.80
r123::Threefry2x64    3.38   2.25   3.17   3.30   2.32   3.17
r123::Threefry4x64    2.62   1.97   6.75   2.53   1.97   6.72

Performance of AESEngine

                      Single               Batch
RNG                   LLVM   GCC    Intel  LLVM   GCC    Intel
AES128                1.53   1.90   1.87   0.64   0.63   0.63
AES192                2.19   2.28   2.25   0.76   0.76   0.76
AES256                2.57   2.63   2.63   0.89   0.89   0.89
ARS                   1.04   1.33   1.01   0.32   0.32   0.32
AES128_64             1.14   1.36   1.48   0.63   0.63   0.63
AES192_64             1.62   1.62   1.78   0.76   0.76   0.76
AES256_64             1.93   1.96   2.06   0.88   0.88   0.88
ARS_64                0.73   0.86   0.67   0.32   0.32   0.32

Performance of PhiloxEngine

                      Single               Batch
RNG                   LLVM   GCC    Intel  LLVM   GCC    Intel
Philox2x32            4.71   4.52   5.57   0.61   0.61   0.61
Philox4x32            3.74   6.27   4.34   0.63   0.63   0.63
Philox2x64            2.70   2.38   3.07   1.42   1.42   1.43
Philox4x64            2.79   2.67   2.48   1.45   1.45   1.45
Philox2x32_64         4.08   4.08   5.32   0.61   0.61   0.61
Philox4x32_64         3.46   6.07   4.06   0.61   0.61   0.63
Philox2x64_64         2.09   2.08   2.82   1.42   1.42   1.42
Philox4x64_64         2.28   2.16   2.08   1.45   1.45   1.45

Performance of ThreefryEngine

                      Single               Batch
RNG                   LLVM   GCC    Intel  LLVM   GCC    Intel
Threefry2x32          7.34   6.61   7.16   0.99   0.99   0.94
Threefry4x32          5.33   6.81   5.32   1.01   0.99   0.98
Threefry2x64          4.02   3.33   4.28   0.92   0.92   0.89
Threefry4x64          3.27   2.89   3.52   0.95   0.91   0.86
Threefry8x64          3.04   2.19   3.13   0.89   0.86   0.86
Threefry16x64         3.99   3.46   3.60   0.91   1.02   0.86
Threefish256          8.78   8.89   9.16   2.97   2.76   2.81
Threefish512          6.72   6.55   6.83   2.92   2.78   2.79
Threefish1024         10.6   9.21   9.44   3.41   3.84   3.26
Threefry2x32_64       6.53   5.50   6.59   0.98   0.98   0.94
Threefry4x32_64       4.78   6.61   4.85   1.01   0.98   0.96
Threefry2x64_64       3.29   3.13   4.03   0.92   0.91   0.88
Threefry4x64_64       2.57   2.62   2.95   0.94   0.89   0.85
Threefry8x64_64       1.97   1.93   2.50   0.88   0.86   0.86
Threefry16x64_64      2.50   2.48   2.43   0.91   0.99   0.85
Threefish256_64       8.05   8.35   8.43   2.95   2.75   2.79
Threefish512_64       5.67   5.58   6.05   2.89   2.78   2.78
Threefish1024_64      8.70   8.29   8.43   3.38   3.80   3.22

Performance of MKLEngine

                      Single               Batch
RNG                   LLVM   GCC    Intel  LLVM   GCC    Intel
MKL_ARS5              2.37   2.38   2.35   0.41   0.41   0.41
MKL_PHILOX4X32X10     2.95   2.98   2.95   0.76   0.77   0.77
MKL_MCG59             2.09   2.13   2.10   0.44   0.44   0.44
MKL_MT19937           2.00   2.09   2.09   0.32   0.32   0.32
MKL_MT2203            1.99   2.00   1.97   0.25   0.26   0.25
MKL_SFMT19937         1.97   2.00   2.00   0.22   0.22   0.22
MKL_ARS5_64           1.15   1.15   1.11   0.39   0.39   0.39
MKL_PHILOX4X32X10_64  1.58   1.58   1.52   0.75   0.76   0.76
MKL_MCG59_64          0.98   0.98   0.94   0.42   0.42   0.42
MKL_MT19937_64        0.92   0.92   0.89   0.32   0.32   0.32
MKL_MT2203_64         0.88   0.83   0.80   0.25   0.25   0.25
MKL_SFMT19937_64      0.89   0.91   0.85   0.20   0.20   0.20