fast_sse_tanh

example for approximation of hyperbolic tangent from Lambert's (Gauss)
continued fraction.

last updated: 13.12.2011
------------------------------------------------------------------------------
tanh_sse.h

example for approximation of hyperbolic tangent from Lambert's (Gauss)
continued fraction.
http://en.wikipedia.org/wiki/Gauss's_continued_fraction
http://maths.ashwyninnovations.com/lambert.pdf

plain c and sse intrinsic versions for msvc and gcc

compile with:
  gcc -msse -mfpmath=387 -W -Wall
  gcc -msse -mfpmath=387,sse -W -Wall
  cl /arch:SSE /W4

notes:
- the plain c version also ends up quite fast with gcc's fpu optimizations
- _TANH_RANGE could be brought down, so that the "clamp" engages for lower
  values. error in the 0.00001 range starts to show up near x = 5 to 6
  (or -5 to -6)
- if _TANH_CLAMP_INDIVIDUAL is not set the entire vector will be clamped
- speed comparison against libm on an amd athlon xp with gcc 4.x, using one
  and the same value in every element of the vector:
    flags:         -O3 -msse -mfpmath=387
    iterations:    1E+6
    fast_tanh_sse: 37 ms
    libm:          172 ms
- orders of 5/6 are used:
    tanh(x) = (21*x^5 + 1260*x^3 + 10395*x) /
              (x^6 + 210*x^4 + 4725*x^2 + 10395)
- define _TANH_FAST_DIV for a less accurate (~14% faster) division with
  ~(21 - 22) bits of mantissa.

contact: lubomir i. ivanov, neolit123 [at] gmail
------------------------------------------------------------------------------

updates:
[13.12.2011] thanks to Jakub Bystron (jb.elitecode [ at ] gmail) for pointing
out a branching issue.
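
the 5/6 rational form from the notes can be sketched in plain c as below. this is a minimal illustration, not the contents of tanh_sse.h: the function names tanh_rational and tanh_rational_clamped are hypothetical, and the clamp threshold is passed in as a parameter because the actual value of _TANH_RANGE is not given here. both polynomials are evaluated in powers of x^2 (horner form), which is an assumption about how one might arrange the arithmetic.

```c
#include <math.h>

/* hypothetical helper: 5/6 rational approximation of tanh(x),
   (21*x^5 + 1260*x^3 + 10395*x) / (x^6 + 210*x^4 + 4725*x^2 + 10395),
   with numerator and denominator evaluated in horner form on x^2. */
static float tanh_rational(float x)
{
    const float x2  = x * x;
    const float num = x * (10395.0f + x2 * (1260.0f + x2 * 21.0f));
    const float den = 10395.0f + x2 * (4725.0f + x2 * (210.0f + x2));
    return num / den;
}

/* hypothetical helper: same approximation, but inputs whose magnitude
   exceeds `range` are clamped to +1 or -1. `range` stands in for
   _TANH_RANGE, whose real value is an assumption left to the caller. */
static float tanh_rational_clamped(float x, float range)
{
    if (fabsf(x) > range)
        return (x > 0.0f) ? 1.0f : -1.0f;
    return tanh_rational(x);
}
```

for example, tanh_rational(1.0f) agrees with libm's tanhf(1.0f) to well within 0.00001, and tanh_rational_clamped(10.0f, 5.0f) returns exactly 1.0f because the input lies outside the assumed range of 5. lowering the range makes the clamp engage earlier, as the notes suggest.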