Mathematical Functions¶

Constants¶

MCKL defines some mathematical constants in the form of constant expression functions. For example, to get the value of \(\pi\) with a desired precision, one can use the following,

constexpr float       pi_f = ::mckl::const_pi<float>();
constexpr double      pi_d = ::mckl::const_pi<double>();
constexpr long double pi_l = ::mckl::const_pi<long double>();

The compiler evaluates these values at compile-time and thus there is no performance difference between the function call and hard-coding the constants in the program, while the readability is improved. All defined constants are listed in the table below.

Mathematical Constants¶
Function	Value
`const_inf`	\(\infty\)
`const_nan`	silent NaN
`const_zero`	\(0\)
`const_one`	\(1\)
`const_pi`	\(\pi\)
`const_pi_2`	\(2\pi\)
`const_pi_inv`	\(1/\pi\)
`const_pi_sqr`	\(\pi^2\)
`const_pi_by2`	\(\pi/2\)
`const_pi_by3`	\(\pi/3\)
`const_pi_by4`	\(\pi/4\)
`const_pi_by6`	\(\pi/6\)
`const_pi_2by3`	\(2\pi/3\)
`const_pi_3by4`	\(3\pi/4\)
`const_pi_4by3`	\(4\pi/3\)
`const_sqrt_pi`	\(\sqrt{\pi}\)
`const_sqrt_pi_2`	\(\sqrt{2\pi}\)
`const_sqrt_pi_inv`	\(\sqrt{1/\pi}\)
`const_sqrt_pi_by2`	\(\sqrt{\pi/2}\)
`const_sqrt_pi_by3`	\(\sqrt{\pi/3}\)
`const_sqrt_pi_by4`	\(\sqrt{\pi/4}\)
`const_sqrt_pi_by6`	\(\sqrt{\pi/6}\)
`const_sqrt_pi_2by3`	\(\sqrt{2\pi/3}\)
`const_sqrt_pi_3by4`	\(\sqrt{3\pi/4}\)
`const_sqrt_pi_4by3`	\(\sqrt{4\pi/3}\)
`const_ln_pi`	\(\ln{\pi}\)
`const_ln_pi_2`	\(\ln{2\pi}\)
`const_ln_pi_inv`	\(\ln{1/\pi}\)
`const_ln_pi_by2`	\(\ln{\pi/2}\)
`const_ln_pi_by3`	\(\ln{\pi/3}\)
`const_ln_pi_by4`	\(\ln{\pi/4}\)
`const_ln_pi_by6`	\(\ln{\pi/6}\)
`const_ln_pi_2by3`	\(\ln{2\pi/3}\)
`const_ln_pi_3by4`	\(\ln{3\pi/4}\)
`const_ln_pi_4by3`	\(\ln{4\pi/3}\)
`const_e`	\(\mathrm{e}\)
`const_e_inv`	\(1/\mathrm{e}\)
`const_sqrt_e`	\(\sqrt{\mathrm{e}}\)
`const_sqrt_e_inv`	\(\sqrt{1/\mathrm{e}}\)
`const_sqrt_2`	\(\sqrt{2}\)
`const_sqrt_3`	\(\sqrt{3}\)
`const_sqrt_5`	\(\sqrt{5}\)
`const_sqrt_10`	\(\sqrt{10}\)
`const_sqrt_1by2`	\(\sqrt{1/2}\)
`const_sqrt_1by3`	\(\sqrt{1/3}\)
`const_sqrt_1by5`	\(\sqrt{1/5}\)
`const_sqrt_1by10`	\(\sqrt{1/10}\)
`const_ln_2`	\(\ln{2}\)
`const_ln_3`	\(\ln{3}\)
`const_ln_5`	\(\ln{5}\)
`const_ln_10`	\(\ln{10}\)
`const_ln_inv_2`	\(1/\ln{2}\)
`const_ln_inv_3`	\(1/\ln{3}\)
`const_ln_inv_5`	\(1/\ln{5}\)
`const_ln_inv_10`	\(1/\ln{10}\)
`const_ln_ln_2`	\(\ln\ln{2}\)

Vectorized Functions¶

MCKL provides a set of vectorized elementary mathematical functions. For example, to perform additions of two vectors,

size_t n = 1000;
::mckl::Vector<double> a(n), b(n), y(n);
// Fill vectors a and b
::mckl::add(n, a.data(), b.data(), y.data());

This is equivalent to,

for (size_t i = 0; i != n; ++i)
    y[i] = a[i] + b[i];

Conventions of Parameters¶

For each function, the first parameter is always the length of the vectors, and the last is a pointer to the output vector (except for sincos and modf which have two output vectors). The output parameters are always vectors. Some of the input parameters may be scalars. For example, the muladd function Functions` has the following seven overloads,

namespace mckl
{

template <typename T>
void muladd(size_t n, const T *a, const T *b, const T *c, T *y);

template <typename T>
void muladd(size_t n, const T *a, const T *b, T c, T *y);

template <typename T>
void muladd(size_t n, const T *a, T b, const T *c, T *y);

template <typename T>
void muladd(size_t n, T a, const T *b, const T *c, T *y);

template <typename T>
void muladd(size_t n, T a, T b, const T *c, T *y);

template <typename T>
void muladd(size_t n, T a, const T *b, T c, T *y);

template <typename T>
void muladd(size_t n, const T *a, T b, T c, T *y);

}

The input of these functions can be either real numbers (floating point types), or complex numbers (std::complex<double>, etc.), or both. The supported data types are also listed in the tables. In most cases, output data type is the same as the input. There are a few exceptions. The abs and arg functions always have real numbers as output. The cis function takes real numbers as input and complex numbers as output. Note that, mixed precision is not allowed. For example,

::mckl::Vector<double> a(n);
::mckl::Vector<double> b(n);
::mckl::Vector<double> y(n);
::mckl::muladd(n, a.data(), b.data(), 2, y.data());

causes compile-time error because the fourth argument is an integer while the others are floating point types. The correct call shall be,

::mckl::muladd(n, a.data(), b.data(), 2.0, y.data());

The same principle applies to mixed type functions (abs, arg and cis).

The input and output pointers are allowed to alias to each other in the sense that they might pointing to the same memory locations. However, if they point different locations but the vectors overlap, the behavior is undefined.

Vectorized Performance¶

With only the standard library, these functions do not provide performance advantage compared to simple loops. When Intel MKL is available, some functions can have substantial performance improvement when all input arguments are vectors of types float or double, std::complex<float> or std::complex<double>. The performance of Vectorized Random Number Generating heavily depends on these functions.

List of Functions¶

Arithmetic Functions¶
Function	Operation	Data Type
`add`	\(y = a + b\)	Real, Complex
`sub`	\(y = a - b\)	Real, Complex
`sqr`	\(y = a^2\)	Real
`mul`	\(y = ab\)	Real, Complex
`mulbyconj`	\(y = a\bar{b}\)	Complex
`conj`	\(y = \bar{a}\)	Complex
`abs`	\(y = \lvert{a}\rvert\)	Real, Complex
`arg`	\(y = \mathrm{arg}(a)\)	Complex
`muladd`	\(y = ab + c\)	Real, Complex
`mulsub`	\(y = ab - c\)	Real, Complex
`nmuladd`	\(y = -ab + c\)	Real, Complex
`nmulsub`	\(y = -ab - c\)	Real, Complex
`fmadd`	\(y = ab + c\) (fused)	Real
`fmsub`	\(y = ab - c\) (fused)	Real
`fnmadd`	\(y = -ab + c\) (fused)	Real
`fnmsub`	\(y = -ab - c\) (fused)	Real

Power and Root Functions¶
Function	Operation	Data Type
`inv`	\(y = 1 / a\)	Real
`div`	\(y = a / b\)	Real, Complex
`sqrt`	\(y = \sqrt{a}\)	Real, Complex
`invsqrt`	\(y = 1 / \sqrt{a}\)	Real
`cbrt`	\(y = a^{1/3}\)	Real
`invcbrt`	\(y = a^{-1/3}\)	Real
`pow2o3`	\(y = a^{2/3}\)	Real
`pow3o2`	\(y = a^{3/2}\)	Real
`pow`	\(y = a^b\)	Real, Complex
`hypot`	\(y = \sqrt{a^2 + b^2}\)	Real

Exponential and Logarithm Functions¶
Function	Operation	Data Type
`exp`	\(y = \mathrm{e}^a\)	Real, Complex
`exp2`	\(y = 2^a\)	Real
`expm1`	\(y = \mathrm{e}^a - 1\)	Real
`log`	\(y = \ln a\)	Real, Complex
`log2`	\(y = \log_2 a\)	Real
`log10`	\(y = \log_{10} a\)	Real, Complex
`log1p`	\(y = \ln(a + 1)\)	Real

Trigonometric Functions¶
Function	Operation	Data Type
`cos`	\(y = \cos(a)\)	Real, Complex
`sin`	\(y = \sin(a)\)	Real, Complex
`sincos`	\(z = \cos(a)\)	Real
`cis`	\(y = \cos(a) + i\sin(a)\)	Complex
`tan`	\(y = \tan(a)\)	Real, Complex
`acos`	\(y = \arccos(a)\)	Real, Complex
`asin`	\(y = \arcsin(a)\)	Real, Complex
`atan`	\(y = \arctan(a)\)	Real, Complex
`atan2`	\(y = \arctan(a / b)\)	Real

Hyperbolic Functions¶
Function	Operation	Data Type
`cosh`	\(y = \cosh(a)\)	Real, Complex
`sinh`	\(y = \sinh(a)\)	Real, Complex
`tanh`	\(y = \tanh(a)\)	Real, Complex
`acosh`	\(y = \mathrm{arc}\cosh(a)\)	Real, Complex
`asinh`	\(y = \mathrm{arc}\sinh(a)\)	Real, Complex
`atanh`	\(y = \mathrm{arc}\tanh(a)\)	Real, Complex

Special Functions¶
Function	Operation	Data Type
`erf`	\(y = \mathrm{erf}(a)\)	Real
`erfc`	\(y = \mathrm{erfc}(a)\)	Real
`cdfnorm`	\(y = (1 + \mathrm{erf}(a / \sqrt{2})) / 2\)	Real
`lgamma`	\(y = \ln\Gamma(a)\)	Real
`tgamma`	\(y = \Gamma(a)\)	Real

Rounding Functions¶
Function	Operation	Data Type
`floor`	\(y = \lfloor a \rfloor\)	Real
`ceil`	\(y = \lceil a \rceil\)	Real
`trunc`	\(y = \mathrm{sgn}(a)\lfloor\|a\|\rfloor\)	Real
`round`	\(y = \text{nearest integer of }a\)	Real
`nearbyint`	\(y = \text{nearest integer of }a\)	Real
`rint`	\(y = \text{nearest integer of }a\)	Real
`modf`	\(z = a - y\)	Real

Fused Multiplication and Addition¶

The muladd, fmadd and related functions differ in that fmadd always does fused multiply-add (FMA) by std::fma while muladd use one multiplication and one addition. However, the compiler may or may not be able to make use the hardware FMA support. And it may not vectorize the loop in fmadd etc., as most modern C++ compilers would do for simpler operations such as addition and multiplication.

When software implementation of std::fma is used, it will be much slower than using one multiplication and one addition. In this case, there are assembly implementations that take advantage of the platform support for single and double precision. To enable this feature, one needs to build and link to the Optional Runtime Library. One also needs to use preprocessor configuration #define MCKL_USE_ASM_LIB 1. In addition, this feature is only enabled for platforms with FMA3 instruction set support.

The library detects the availability of FMA3 instructions using compiler predefined macros. If this mechanism is not adequate, one can manually enable them using preprocessor configuration #define MCKL_HAS_FMA 1. Note that, when the compiler is able to generate vectorized loop for std::fma, the assembly library may or may not outperform the compiler generated binary. If the compiler generated binary is preferred but the runtime library is also used for its other features, then one can use the preprocessor configuration #define MCKL_USE_ASM_FMA 0.