
fast_sse_tanh

example for approximation of hyperbolic tangent from Lambert's (Gauss) continued fraction.
last updated: 13.12.2011


information:
------------------------------------------------------------------------------
tanh_sse.h

  example for approximation of hyperbolic tangent from Lambert's (Gauss)
  continued fraction.
  http://en.wikipedia.org/wiki/Gauss's_continued_fraction
  http://maths.ashwyninnovations.com/lambert.pdf

  plain c and sse intrinsic versions for msvc and gcc

  compile with:
    gcc -msse -mfpmath=387 -W -Wall
    gcc -msse -mfpmath=387,sse -W -Wall
    cl /arch:SSE /W4

  notes:
    - the plain c version also ends up quite fast with gcc's fpu optimizations
    - _TANH_RANGE could be lowered, so that the "clamp" engages at smaller
      input values. error in the 0.00001 range starts to appear near
      x = 5, 6 or x = -5, -6
    - if _TANH_CLAMP_INDIVIDUAL is not set the entire vector will be clamped
    - speed comparison against libm on an amd athlon xp with gcc 4.x, using
      the same value in every element of the vector:
        flags:          -O3 -msse -mfpmath=387
        iterations:     1E+6
        fast_tanh_sse:  37 ms
        libm:           172 ms
    - a numerator of order 5 and a denominator of order 6 are used:
      tanh(x) ~= (21*x^5 + 1260*x^3 + 10395*x) /
                 (x^6 + 210*x^4 + 4725*x^2 + 10395)
    - define _TANH_FAST_DIV for a less accurate (~14% faster) division that
      keeps about 21 to 22 bits of mantissa.

  contact:
    lubomir i. ivanov, neolit123 [at] gmail
------------------------------------------------------------------------------
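
plain c sketch:
the 5/6 formula and the clamp from the notes above can be sketched in plain c
roughly as follows. this is not the code from tanh_sse.h. the names tanh_approx
and TANH_RANGE are hypothetical, and the threshold of 5.0 is an assumed value
chosen only for illustration.

```c
/* hypothetical stand-in for _TANH_RANGE; 5.0 is an assumed value */
#define TANH_RANGE 5.0f

/* 5/6 approximant of tanh(x), with both polynomials evaluated in
   horner form on x*x. inputs beyond +/-TANH_RANGE are clamped to +/-1. */
static float tanh_approx(float x)
{
    float x2, num, den;

    if (x > TANH_RANGE)
        return 1.0f;
    if (x < -TANH_RANGE)
        return -1.0f;

    x2  = x * x;
    /* 21*x^5 + 1260*x^3 + 10395*x */
    num = x * (10395.0f + x2 * (1260.0f + x2 * 21.0f));
    /* x^6 + 210*x^4 + 4725*x^2 + 10395 */
    den = 10395.0f + x2 * (4725.0f + x2 * (210.0f + x2));

    return num / den;
}
```

calling tanh_approx(1.0f) evaluates 11676 / 15331, which is close to tanh(1).
the clamp is needed because the ratio decays like 21/x for large x instead of
approaching 1.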
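
sse sketch:
the sse intrinsic variant, the per-element clamp and the faster division from
the notes above could look something like this. it is a sketch under
assumptions, not the contents of tanh_sse.h: tanh_approx_ps, fast_div_ps and
TANH_RANGE_V are hypothetical names, 5.0 is an assumed threshold, and the
reciprocal estimate plus one newton-raphson step is only one common way to get
a cheaper division, not necessarily what _TANH_FAST_DIV does.

```c
#include <xmmintrin.h> /* sse1 intrinsics */

/* hypothetical stand-in for _TANH_RANGE; 5.0 is an assumed value */
#define TANH_RANGE_V 5.0f

/* approximate a / b: reciprocal estimate refined by one
   newton-raphson step, r = r * (2 - b * r) */
static __m128 fast_div_ps(__m128 a, __m128 b)
{
    __m128 r = _mm_rcp_ps(b);
    r = _mm_mul_ps(r, _mm_sub_ps(_mm_set1_ps(2.0f), _mm_mul_ps(b, r)));
    return _mm_mul_ps(a, r);
}

/* 5/6 approximant of tanh for four floats, clamped per element */
static __m128 tanh_approx_ps(__m128 x)
{
    const __m128 range = _mm_set1_ps(TANH_RANGE_V);
    const __m128 x2 = _mm_mul_ps(x, x);
    __m128 num, den, y, hi, lo;

    /* num = x * (10395 + x2 * (1260 + 21 * x2)) */
    num = _mm_mul_ps(x2, _mm_set1_ps(21.0f));
    num = _mm_add_ps(num, _mm_set1_ps(1260.0f));
    num = _mm_mul_ps(num, x2);
    num = _mm_add_ps(num, _mm_set1_ps(10395.0f));
    num = _mm_mul_ps(num, x);

    /* den = 10395 + x2 * (4725 + x2 * (210 + x2)) */
    den = _mm_add_ps(x2, _mm_set1_ps(210.0f));
    den = _mm_mul_ps(den, x2);
    den = _mm_add_ps(den, _mm_set1_ps(4725.0f));
    den = _mm_mul_ps(den, x2);
    den = _mm_add_ps(den, _mm_set1_ps(10395.0f));

    y = fast_div_ps(num, den);

    /* branchless clamp: elements above +range become 1,
       elements below -range become -1, the rest keep y */
    hi = _mm_cmpgt_ps(x, range);
    lo = _mm_cmplt_ps(x, _mm_sub_ps(_mm_setzero_ps(), range));
    y = _mm_or_ps(_mm_and_ps(hi, _mm_set1_ps(1.0f)), _mm_andnot_ps(hi, y));
    y = _mm_or_ps(_mm_and_ps(lo, _mm_set1_ps(-1.0f)), _mm_andnot_ps(lo, y));
    return y;
}
```

load four floats with _mm_loadu_ps, pass the vector to tanh_approx_ps and
store the result with _mm_storeu_ps. selecting with compare masks avoids a
branch and clamps each element on its own. swapping fast_div_ps for
_mm_div_ps gives the full precision division.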

download:

updates:
[13.12.2011] thanks to Jakub Bystron (jb.elitecode [ at ] gmail) for pointing out a branching issue.

--