Mathematical Functions

Constants

MCKL defines some mathematical constants in the form of constant expression functions. For example, to get the value of \(\pi\) with a desired precision, one can use the following,

constexpr float       pi_f = ::mckl::const_pi<float>();
constexpr double      pi_d = ::mckl::const_pi<double>();
constexpr long double pi_l = ::mckl::const_pi<long double>();

The compiler evaluates these values at compile-time and thus there is no performance difference between the function call and hard-coding the constants in the program, while the readability is improved. All defined constants are listed in the table below.

Mathematical Constants
Function Value
const_inf \(\infty\)
const_nan silent NaN
const_zero \(0\)
const_one \(1\)
const_pi \(\pi\)
const_pi_2 \(2\pi\)
const_pi_inv \(1/\pi\)
const_pi_sqr \(\pi^2\)
const_pi_by2 \(\pi/2\)
const_pi_by3 \(\pi/3\)
const_pi_by4 \(\pi/4\)
const_pi_by6 \(\pi/6\)
const_pi_2by3 \(2\pi/3\)
const_pi_3by4 \(3\pi/4\)
const_pi_4by3 \(4\pi/3\)
const_sqrt_pi \(\sqrt{\pi}\)
const_sqrt_pi_2 \(\sqrt{2\pi}\)
const_sqrt_pi_inv \(\sqrt{1/\pi}\)
const_sqrt_pi_by2 \(\sqrt{\pi/2}\)
const_sqrt_pi_by3 \(\sqrt{\pi/3}\)
const_sqrt_pi_by4 \(\sqrt{\pi/4}\)
const_sqrt_pi_by6 \(\sqrt{\pi/6}\)
const_sqrt_pi_2by3 \(\sqrt{2\pi/3}\)
const_sqrt_pi_3by4 \(\sqrt{3\pi/4}\)
const_sqrt_pi_4by3 \(\sqrt{4\pi/3}\)
const_ln_pi \(\ln{\pi}\)
const_ln_pi_2 \(\ln{2\pi}\)
const_ln_pi_inv \(\ln{1/\pi}\)
const_ln_pi_by2 \(\ln{\pi/2}\)
const_ln_pi_by3 \(\ln{\pi/3}\)
const_ln_pi_by4 \(\ln{\pi/4}\)
const_ln_pi_by6 \(\ln{\pi/6}\)
const_ln_pi_2by3 \(\ln{2\pi/3}\)
const_ln_pi_3by4 \(\ln{3\pi/4}\)
const_ln_pi_4by3 \(\ln{4\pi/3}\)
const_e \(\mathrm{e}\)
const_e_inv \(1/\mathrm{e}\)
const_sqrt_e \(\sqrt{\mathrm{e}}\)
const_sqrt_e_inv \(\sqrt{1/\mathrm{e}}\)
const_sqrt_2 \(\sqrt{2}\)
const_sqrt_3 \(\sqrt{3}\)
const_sqrt_5 \(\sqrt{5}\)
const_sqrt_10 \(\sqrt{10}\)
const_sqrt_1by2 \(\sqrt{1/2}\)
const_sqrt_1by3 \(\sqrt{1/3}\)
const_sqrt_1by5 \(\sqrt{1/5}\)
const_sqrt_1by10 \(\sqrt{1/10}\)
const_ln_2 \(\ln{2}\)
const_ln_3 \(\ln{3}\)
const_ln_5 \(\ln{5}\)
const_ln_10 \(\ln{10}\)
const_ln_inv_2 \(1/\ln{2}\)
const_ln_inv_3 \(1/\ln{3}\)
const_ln_inv_5 \(1/\ln{5}\)
const_ln_inv_10 \(1/\ln{10}\)
const_ln_ln_2 \(\ln\ln{2}\)

Vectorized Functions

MCKL provides a set of vectorized elementary mathematical functions. For example, to perform additions of two vectors,

size_t n = 1000;
::mckl::Vector<double> a(n), b(n), y(n);
// Fill vectors a and b
::mckl::add(n, a.data(), b.data(), y.data());

This is equivalent to,

for (size_t i = 0; i != n; ++i)
    y[i] = a[i] + b[i];

Conventions of Parameters

For each function, the first parameter is always the length of the vectors, and the last is a pointer to the output vector (except for sincos and modf which have two output vectors). The output parameters are always vectors. Some of the input parameters may be scalars. For example, the muladd function Functions` has the following seven overloads,

namespace mckl
{

template <typename T>
void muladd(size_t n, const T *a, const T *b, const T *c, T *y);

template <typename T>
void muladd(size_t n, const T *a, const T *b, T c, T *y);

template <typename T>
void muladd(size_t n, const T *a, T b, const T *c, T *y);

template <typename T>
void muladd(size_t n, T a, const T *b, const T *c, T *y);

template <typename T>
void muladd(size_t n, T a, T b, const T *c, T *y);

template <typename T>
void muladd(size_t n, T a, const T *b, T c, T *y);

template <typename T>
void muladd(size_t n, const T *a, T b, T c, T *y);

}

The input of these functions can be either real numbers (floating point types), or complex numbers (std::complex<double>, etc.), or both. The supported data types are also listed in the tables. In most cases, output data type is the same as the input. There are a few exceptions. The abs and arg functions always have real numbers as output. The cis function takes real numbers as input and complex numbers as output. Note that, mixed precision is not allowed. For example,

::mckl::Vector<double> a(n);
::mckl::Vector<double> b(n);
::mckl::Vector<double> y(n);
::mckl::muladd(n, a.data(), b.data(), 2, y.data());

causes compile-time error because the fourth argument is an integer while the others are floating point types. The correct call shall be,

::mckl::muladd(n, a.data(), b.data(), 2.0, y.data());

The same principle applies to mixed type functions (abs, arg and cis).

The input and output pointers are allowed to alias to each other in the sense that they might pointing to the same memory locations. However, if they point different locations but the vectors overlap, the behavior is undefined.

Vectorized Performance

With only the standard library, these functions do not provide performance advantage compared to simple loops. When Intel MKL is available, some functions can have substantial performance improvement when all input arguments are vectors of types float or double, std::complex<float> or std::complex<double>. The performance of Vectorized Random Number Generating heavily depends on these functions.

List of Functions

Arithmetic Functions
Function Operation Data Type
add \(y = a + b\) Real, Complex
sub \(y = a - b\) Real, Complex
sqr \(y = a^2\) Real
mul \(y = ab\) Real, Complex
mulbyconj \(y = a\bar{b}\) Complex
conj \(y = \bar{a}\) Complex
abs \(y = \lvert{a}\rvert\) Real, Complex
arg \(y = \mathrm{arg}(a)\) Complex
muladd \(y = ab + c\) Real, Complex
mulsub \(y = ab - c\) Real, Complex
nmuladd \(y = -ab + c\) Real, Complex
nmulsub \(y = -ab - c\) Real, Complex
fmadd \(y = ab + c\) (fused) Real
fmsub \(y = ab - c\) (fused) Real
fnmadd \(y = -ab + c\) (fused) Real
fnmsub \(y = -ab - c\) (fused) Real
Power and Root Functions
Function Operation Data Type
inv \(y = 1 / a\) Real
div \(y = a / b\) Real, Complex
sqrt \(y = \sqrt{a}\) Real, Complex
invsqrt \(y = 1 / \sqrt{a}\) Real
cbrt \(y = a^{1/3}\) Real
invcbrt \(y = a^{-1/3}\) Real
pow2o3 \(y = a^{2/3}\) Real
pow3o2 \(y = a^{3/2}\) Real
pow \(y = a^b\) Real, Complex
hypot \(y = \sqrt{a^2 + b^2}\) Real
Exponential and Logarithm Functions
Function Operation Data Type
exp \(y = \mathrm{e}^a\) Real, Complex
exp2 \(y = 2^a\) Real
expm1 \(y = \mathrm{e}^a - 1\) Real
log \(y = \ln a\) Real, Complex
log2 \(y = \log_2 a\) Real
log10 \(y = \log_{10} a\) Real, Complex
log1p \(y = \ln(a + 1)\) Real
Trigonometric Functions
Function Operation Data Type
cos \(y = \cos(a)\) Real, Complex
sin \(y = \sin(a)\) Real, Complex
sincos \(z = \cos(a)\) Real
cis \(y = \cos(a) + i\sin(a)\) Complex
tan \(y = \tan(a)\) Real, Complex
acos \(y = \arccos(a)\) Real, Complex
asin \(y = \arcsin(a)\) Real, Complex
atan \(y = \arctan(a)\) Real, Complex
atan2 \(y = \arctan(a / b)\) Real
Hyperbolic Functions
Function Operation Data Type
cosh \(y = \cosh(a)\) Real, Complex
sinh \(y = \sinh(a)\) Real, Complex
tanh \(y = \tanh(a)\) Real, Complex
acosh \(y = \mathrm{arc}\cosh(a)\) Real, Complex
asinh \(y = \mathrm{arc}\sinh(a)\) Real, Complex
atanh \(y = \mathrm{arc}\tanh(a)\) Real, Complex
Special Functions
Function Operation Data Type
erf \(y = \mathrm{erf}(a)\) Real
erfc \(y = \mathrm{erfc}(a)\) Real
cdfnorm \(y = (1 + \mathrm{erf}(a / \sqrt{2})) / 2\) Real
lgamma \(y = \ln\Gamma(a)\) Real
tgamma \(y = \Gamma(a)\) Real
Rounding Functions
Function Operation Data Type
floor \(y = \lfloor a \rfloor\) Real
ceil \(y = \lceil a \rceil\) Real
trunc \(y = \mathrm{sgn}(a)\lfloor|a|\rfloor\) Real
round \(y = \text{nearest integer of }a\) Real
nearbyint \(y = \text{nearest integer of }a\) Real
rint \(y = \text{nearest integer of }a\) Real
modf \(z = a - y\) Real

Fused Multiplication and Addition

The muladd, fmadd and related functions differ in that fmadd always does fused multiply-add (FMA) by std::fma while muladd use one multiplication and one addition. However, the compiler may or may not be able to make use the hardware FMA support. And it may not vectorize the loop in fmadd etc., as most modern C++ compilers would do for simpler operations such as addition and multiplication.

When software implementation of std::fma is used, it will be much slower than using one multiplication and one addition. In this case, there are assembly implementations that take advantage of the platform support for single and double precision. To enable this feature, one needs to build and link to the Optional Runtime Library. One also needs to use preprocessor configuration #define MCKL_USE_ASM_LIB 1. In addition, this feature is only enabled for platforms with FMA3 instruction set support.

The library detects the availability of FMA3 instructions using compiler predefined macros. If this mechanism is not adequate, one can manually enable them using preprocessor configuration #define MCKL_HAS_FMA 1. Note that, when the compiler is able to generate vectorized loop for std::fma, the assembly library may or may not outperform the compiler generated binary. If the compiler generated binary is preferred but the runtime library is also used for its other features, then one can use the preprocessor configuration #define MCKL_USE_ASM_FMA 0.