Mathematical Functions¶
Constants¶
MCKL defines some mathematical constants in the form of constant expression functions. For example, to get the value of \(\pi\) with a desired precision, one can use the following,
constexpr float pi_f = ::mckl::const_pi<float>();
constexpr double pi_d = ::mckl::const_pi<double>();
constexpr long double pi_l = ::mckl::const_pi<long double>();
The compiler evaluates these values at compile-time and thus there is no performance difference between the function call and hard-coding the constants in the program, while the readability is improved. All defined constants are listed in the table below.
| Function | Value |
|---|---|
const_inf |
\(\infty\) |
const_nan |
silent NaN |
const_zero |
\(0\) |
const_one |
\(1\) |
const_pi |
\(\pi\) |
const_pi_2 |
\(2\pi\) |
const_pi_inv |
\(1/\pi\) |
const_pi_sqr |
\(\pi^2\) |
const_pi_by2 |
\(\pi/2\) |
const_pi_by3 |
\(\pi/3\) |
const_pi_by4 |
\(\pi/4\) |
const_pi_by6 |
\(\pi/6\) |
const_pi_2by3 |
\(2\pi/3\) |
const_pi_3by4 |
\(3\pi/4\) |
const_pi_4by3 |
\(4\pi/3\) |
const_sqrt_pi |
\(\sqrt{\pi}\) |
const_sqrt_pi_2 |
\(\sqrt{2\pi}\) |
const_sqrt_pi_inv |
\(\sqrt{1/\pi}\) |
const_sqrt_pi_by2 |
\(\sqrt{\pi/2}\) |
const_sqrt_pi_by3 |
\(\sqrt{\pi/3}\) |
const_sqrt_pi_by4 |
\(\sqrt{\pi/4}\) |
const_sqrt_pi_by6 |
\(\sqrt{\pi/6}\) |
const_sqrt_pi_2by3 |
\(\sqrt{2\pi/3}\) |
const_sqrt_pi_3by4 |
\(\sqrt{3\pi/4}\) |
const_sqrt_pi_4by3 |
\(\sqrt{4\pi/3}\) |
const_ln_pi |
\(\ln{\pi}\) |
const_ln_pi_2 |
\(\ln{2\pi}\) |
const_ln_pi_inv |
\(\ln{1/\pi}\) |
const_ln_pi_by2 |
\(\ln{\pi/2}\) |
const_ln_pi_by3 |
\(\ln{\pi/3}\) |
const_ln_pi_by4 |
\(\ln{\pi/4}\) |
const_ln_pi_by6 |
\(\ln{\pi/6}\) |
const_ln_pi_2by3 |
\(\ln{2\pi/3}\) |
const_ln_pi_3by4 |
\(\ln{3\pi/4}\) |
const_ln_pi_4by3 |
\(\ln{4\pi/3}\) |
const_e |
\(\mathrm{e}\) |
const_e_inv |
\(1/\mathrm{e}\) |
const_sqrt_e |
\(\sqrt{\mathrm{e}}\) |
const_sqrt_e_inv |
\(\sqrt{1/\mathrm{e}}\) |
const_sqrt_2 |
\(\sqrt{2}\) |
const_sqrt_3 |
\(\sqrt{3}\) |
const_sqrt_5 |
\(\sqrt{5}\) |
const_sqrt_10 |
\(\sqrt{10}\) |
const_sqrt_1by2 |
\(\sqrt{1/2}\) |
const_sqrt_1by3 |
\(\sqrt{1/3}\) |
const_sqrt_1by5 |
\(\sqrt{1/5}\) |
const_sqrt_1by10 |
\(\sqrt{1/10}\) |
const_ln_2 |
\(\ln{2}\) |
const_ln_3 |
\(\ln{3}\) |
const_ln_5 |
\(\ln{5}\) |
const_ln_10 |
\(\ln{10}\) |
const_ln_inv_2 |
\(1/\ln{2}\) |
const_ln_inv_3 |
\(1/\ln{3}\) |
const_ln_inv_5 |
\(1/\ln{5}\) |
const_ln_inv_10 |
\(1/\ln{10}\) |
const_ln_ln_2 |
\(\ln\ln{2}\) |
Vectorized Functions¶
MCKL provides a set of vectorized elementary mathematical functions. For example, to perform additions of two vectors,
size_t n = 1000;
::mckl::Vector<double> a(n), b(n), y(n);
// Fill vectors a and b
::mckl::add(n, a.data(), b.data(), y.data());
This is equivalent to,
for (size_t i = 0; i != n; ++i)
y[i] = a[i] + b[i];
Conventions of Parameters¶
For each function, the first parameter is always the length of the vectors, and
the last is a pointer to the output vector (except for sincos and modf
which have two output vectors). The output parameters are always vectors. Some
of the input parameters may be scalars. For example, the muladd function
Functions` has the following seven overloads,
namespace mckl
{
template <typename T>
void muladd(size_t n, const T *a, const T *b, const T *c, T *y);
template <typename T>
void muladd(size_t n, const T *a, const T *b, T c, T *y);
template <typename T>
void muladd(size_t n, const T *a, T b, const T *c, T *y);
template <typename T>
void muladd(size_t n, T a, const T *b, const T *c, T *y);
template <typename T>
void muladd(size_t n, T a, T b, const T *c, T *y);
template <typename T>
void muladd(size_t n, T a, const T *b, T c, T *y);
template <typename T>
void muladd(size_t n, const T *a, T b, T c, T *y);
}
The input of these functions can be either real numbers (floating point types),
or complex numbers (std::complex<double>, etc.), or both. The supported
data types are also listed in the tables. In
most cases, output data type is the same as the input. There are a few
exceptions. The abs and arg functions always have real numbers as
output. The cis function takes real numbers as input and complex numbers as
output. Note that, mixed precision is not allowed. For example,
::mckl::Vector<double> a(n);
::mckl::Vector<double> b(n);
::mckl::Vector<double> y(n);
::mckl::muladd(n, a.data(), b.data(), 2, y.data());
causes compile-time error because the fourth argument is an integer while the others are floating point types. The correct call shall be,
::mckl::muladd(n, a.data(), b.data(), 2.0, y.data());
The same principle applies to mixed type functions (abs, arg and
cis).
The input and output pointers are allowed to alias to each other in the sense that they might pointing to the same memory locations. However, if they point different locations but the vectors overlap, the behavior is undefined.
Vectorized Performance¶
With only the standard library, these functions do not provide performance
advantage compared to simple loops. When Intel MKL is available, some
functions can have substantial performance improvement when all input arguments
are vectors of types float or double, std::complex<float> or
std::complex<double>. The performance of Vectorized Random Number Generating heavily depends on these functions.
List of Functions¶
| Function | Operation | Data Type |
|---|---|---|
add |
\(y = a + b\) | Real, Complex |
sub |
\(y = a - b\) | Real, Complex |
sqr |
\(y = a^2\) | Real |
mul |
\(y = ab\) | Real, Complex |
mulbyconj |
\(y = a\bar{b}\) | Complex |
conj |
\(y = \bar{a}\) | Complex |
abs |
\(y = \lvert{a}\rvert\) | Real, Complex |
arg |
\(y = \mathrm{arg}(a)\) | Complex |
muladd |
\(y = ab + c\) | Real, Complex |
mulsub |
\(y = ab - c\) | Real, Complex |
nmuladd |
\(y = -ab + c\) | Real, Complex |
nmulsub |
\(y = -ab - c\) | Real, Complex |
fmadd |
\(y = ab + c\) (fused) | Real |
fmsub |
\(y = ab - c\) (fused) | Real |
fnmadd |
\(y = -ab + c\) (fused) | Real |
fnmsub |
\(y = -ab - c\) (fused) | Real |
| Function | Operation | Data Type |
|---|---|---|
inv |
\(y = 1 / a\) | Real |
div |
\(y = a / b\) | Real, Complex |
sqrt |
\(y = \sqrt{a}\) | Real, Complex |
invsqrt |
\(y = 1 / \sqrt{a}\) | Real |
cbrt |
\(y = a^{1/3}\) | Real |
invcbrt |
\(y = a^{-1/3}\) | Real |
pow2o3 |
\(y = a^{2/3}\) | Real |
pow3o2 |
\(y = a^{3/2}\) | Real |
pow |
\(y = a^b\) | Real, Complex |
hypot |
\(y = \sqrt{a^2 + b^2}\) | Real |
| Function | Operation | Data Type |
|---|---|---|
exp |
\(y = \mathrm{e}^a\) | Real, Complex |
exp2 |
\(y = 2^a\) | Real |
expm1 |
\(y = \mathrm{e}^a - 1\) | Real |
log |
\(y = \ln a\) | Real, Complex |
log2 |
\(y = \log_2 a\) | Real |
log10 |
\(y = \log_{10} a\) | Real, Complex |
log1p |
\(y = \ln(a + 1)\) | Real |
| Function | Operation | Data Type |
|---|---|---|
cos |
\(y = \cos(a)\) | Real, Complex |
sin |
\(y = \sin(a)\) | Real, Complex |
sincos |
\(z = \cos(a)\) | Real |
cis |
\(y = \cos(a) + i\sin(a)\) | Complex |
tan |
\(y = \tan(a)\) | Real, Complex |
acos |
\(y = \arccos(a)\) | Real, Complex |
asin |
\(y = \arcsin(a)\) | Real, Complex |
atan |
\(y = \arctan(a)\) | Real, Complex |
atan2 |
\(y = \arctan(a / b)\) | Real |
| Function | Operation | Data Type |
|---|---|---|
cosh |
\(y = \cosh(a)\) | Real, Complex |
sinh |
\(y = \sinh(a)\) | Real, Complex |
tanh |
\(y = \tanh(a)\) | Real, Complex |
acosh |
\(y = \mathrm{arc}\cosh(a)\) | Real, Complex |
asinh |
\(y = \mathrm{arc}\sinh(a)\) | Real, Complex |
atanh |
\(y = \mathrm{arc}\tanh(a)\) | Real, Complex |
| Function | Operation | Data Type |
|---|---|---|
erf |
\(y = \mathrm{erf}(a)\) | Real |
erfc |
\(y = \mathrm{erfc}(a)\) | Real |
cdfnorm |
\(y = (1 + \mathrm{erf}(a / \sqrt{2})) / 2\) | Real |
lgamma |
\(y = \ln\Gamma(a)\) | Real |
tgamma |
\(y = \Gamma(a)\) | Real |
| Function | Operation | Data Type |
|---|---|---|
floor |
\(y = \lfloor a \rfloor\) | Real |
ceil |
\(y = \lceil a \rceil\) | Real |
trunc |
\(y = \mathrm{sgn}(a)\lfloor|a|\rfloor\) | Real |
round |
\(y = \text{nearest integer of }a\) | Real |
nearbyint |
\(y = \text{nearest integer of }a\) | Real |
rint |
\(y = \text{nearest integer of }a\) | Real |
modf |
\(z = a - y\) | Real |
Fused Multiplication and Addition¶
The muladd, fmadd and related functions differ in that fmadd always
does fused multiply-add
(FMA) by std::fma while muladd use one multiplication and one addition.
However, the compiler may or may not be able to make use the hardware FMA
support. And it may not vectorize the loop in fmadd etc., as most modern
C++ compilers would do for simpler operations such as addition and
multiplication.
When software implementation of std::fma is used, it will be much slower
than using one multiplication and one addition. In this case, there are
assembly implementations that take advantage of the platform support for single
and double precision. To enable this feature, one needs to build and link to
the Optional Runtime Library. One also needs to use preprocessor
configuration #define MCKL_USE_ASM_LIB 1. In addition, this feature is only
enabled for platforms with FMA3
instruction set support.
The library detects the availability of FMA3 instructions using compiler
predefined macros. If this mechanism is not adequate, one can manually enable
them using preprocessor configuration #define MCKL_HAS_FMA 1. Note that,
when the compiler is able to generate vectorized loop for std::fma, the
assembly library may or may not outperform the compiler generated binary. If
the compiler generated binary is preferred but the runtime library is also used
for its other features, then one can use the preprocessor configuration
#define MCKL_USE_ASM_FMA 0.