Purpose - One day I decided to time MATLAB's built-in factorial() and gamma() functions and learned that the gamma function is ridiculously fast. So fast, in fact, that MATLAB warns the measurement may be inaccurate: it comes in on the order of tens of billionths of a second (7.9614e-08 sec), which was around 10x faster than the factorial function (when calculating 100!). That didn't make sense to me: how is the gamma function, which is defined by an integral, quicker than multiplying several numbers together? This prompted multiple attempts at recreating MATLAB's speed in calculating the factorial, which ended with a function on the same order of magnitude as MATLAB's.
For those who don't know, the relationship between the Gamma Function and the Factorial is defined below:
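$$n! = \Gamma(n+1) = \int_0^{\infty} x^{\,n} e^{-x}\,dx$$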
Because I wanted to compare each implementation to the others, I measured both precision and execution time. Below are the methodologies, followed by the code. I ran each implementation on various values of n: for the implementations that only work with positive integers it was every integer from 0 to 102, and for the rest it was the same range but with a step size of 0.1, so 0, 0.1, 0.2, 0.3, ..., 102. The value 102 was chosen because 103 caused some of the implementations to output <missing>. Negatives weren't tested because most implementations output <missing>, Inf, or simply very wrong values, and the gamma function is not defined at negative integers, which would add extra complexity for only the two implementations that handle negatives correctly.
Timing - Timing was fairly straightforward using MATLAB's built-in timeit() function. That said, I had ambitions of timing each call multiple times and taking an average, but that was too slow for the desired number of points. It should also be noted that, as mentioned before, some implementations were so quick that timeit() occasionally output the following warning:
Warning: The measured time for F may be inaccurate because it is running too fast. Try measuring
something that takes longer.
> In timeit (line 158)
In Factorial_Timing (line 18)
Precision - Precision was trickier than timing, since none of the implementations were 100% exact, which is what prompted me to measure it. After some trial and error I settled on using the vpaintegral() function, which for select values matched Wolfram Alpha's answer. The full function is shown below along with a table of select values:
syms f(x)
f(x) = x^n * exp(-x);                                % n is the value whose factorial (n!) is being computed
vpaintegral(f,0,1e3, 'RelTol', 1e-50, 'AbsTol', 0)   % high-precision reference value for n!
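All precision numbers reported below are percent errors relative to this reference value, as computed in testFunc():

$$\text{error}\ (\%) = \left|\frac{\text{computed} - \text{exact}}{\text{exact}}\right| \times 100$$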
function ret = time()
    max_test = 102;
    delete factorial_timing.xls

    % Integer-only implementations, n = 0, 1, 2, ..., 102
    test = 0:max_test;
    headers = {'n', 'Matlab Factorial', 'Loop', 'Vector', 'Recursion'};
    writecell(headers,'factorial_timing.xls','Sheet','Integer Precision','WriteMode','append')
    writecell(headers,'factorial_timing.xls','Sheet','Integer Timing','WriteMode','append')
    for j = 1:length(test)
        i = test(j);
        data = ones(2,length(headers)) * 100;
        data(:,1) = i;
        data(:,2) = testFunc(@() factorial(i), i);
        data(:,3) = testFunc(@() loop(i), i);
        data(:,4) = testFunc(@() vector(i), i);
        data(:,5) = testFunc(@() recursion(i), i);
        writematrix(data(1,:),'factorial_timing.xls','Sheet','Integer Timing', 'WriteMode','append')
        writematrix(data(2,:),'factorial_timing.xls','Sheet','Integer Precision', 'WriteMode','append')
        disp(j / length(test) * 100)   % progress in percent
    end

    % Real-valued implementations, n = 0, 0.1, 0.2, ...
    max_test = 100;
    test = 0:0.1:max_test;
    headers = {'n', 'Matlab Gamma', 'Matlab Integral', 'Simpsons Rule', 'Stirlings Formula', 'Lanczos Approximation', 'Exact'};
    writecell(headers,'factorial_timing.xls','Sheet','Real Precision','WriteMode','append')
    writecell(headers,'factorial_timing.xls','Sheet','Real Timing','WriteMode','append')
    for j = 1:length(test)
        i = test(j);
        data = ones(2,length(headers)) * 100;
        data(:,1) = i;
        data(:,2) = testFunc(@() gamma(i + 1), i);
        data(:,3) = testFunc(@() integral(@(x) x.^i .* exp(-x),0,10e2), i);
        data(:,4) = testFunc(@() simpsons_rule(i,0,1e3), i);
        data(:,5) = testFunc(@() stirling(i), i);
        data(:,6) = testFunc(@() lanczos(i), i);
        syms f(x)
        f(x) = x^i * exp(-x);
        data(:,7) = testFunc(@() vpaintegral(f,0,1e3, 'RelTol', 1e-50, 'AbsTol', 0), i);
        writematrix(data(1,:),'factorial_timing.xls','Sheet','Real Timing', 'WriteMode','append')
        writematrix(data(2,:),'factorial_timing.xls','Sheet','Real Precision', 'WriteMode','append')
        disp(j / length(test) * 100)   % progress in percent
    end
end
function [ret] = testFunc(func, n)
    % Returns ret = [execution time in seconds, percent error], where func
    % is a handle that computes n!
    ret = [0 0];

    % Reference value of n! from high-precision numeric integration
    syms f(x)
    f(x) = x^n * exp(-x);
    exactValue = vpaintegral(f,0,1e3, 'RelTol', 1e-50, 'AbsTol', 0);

    evaluatedValue = func();
    ret(2) = abs((evaluatedValue - exactValue) / exactValue) * 100;

    trials = 1;
    data = zeros(1,trials);
    for i = 1:trials
        data(i) = timeit(func);
    end
    ret(1) = mean(data);
end
clc
clear
clf
warning('off')

file = "factorial_timing.xlsx";
intPrec = readtable(file, "Sheet","Integer Precision");
intTime = readtable(file, "Sheet","Integer Timing");
realPrec = readtable(file, "Sheet","Real Precision");
realTime = readtable(file, "Sheet","Real Timing");

n = 10;
m = false;

hold on
title('Precision of Loop', 'FontSize', 20)
xlabel('n', 'FontWeight','bold', 'FontSize', 18)
xlim([0 100])
ax = gca;
ax.XAxis.FontSize = 15;
ax.YAxis.FontSize = ax.XAxis.FontSize;
ax.YAxis.Exponent = 0;
ylabel('Time (sec)', 'FontWeight','bold', 'FontSize', 18)

if m == true
    intPrec = avTable(intPrec, 5);
    intTime = avTable(intTime, 5);
    realPrec = avTable(realPrec, 100);
    realTime = avTable(realTime, 100);
end

plot(intPrec.n, intPrec.MatlabFactorial, 'LineWidth',3)
%legend({'Loop','LanczosApproximation', 'Matlab Gamma'},'Location','northwest')
disp('Done')

function ret = avTable(T, n)
    var = T.Variables;
    cols = length(var(1,:));
    rows = length(var(:,1));
    %n = ceil(rows / n);
    for col = 1:cols
        T.(col) = movmean(T.(col), n);
    end
    ret = T;
end
Precision
Timing (moving average of 5)
This method uses MATLAB's built-in factorial function, the code of which is below. It appears to find the max value in the input vector, calculate every factorial up to that value, and then pick out and return the requested values. It calculates the factorials through repeated multiplication, although over a vector using cumprod(). MATLAB's own documentation also cites a maximum of 15 digits of accuracy for numbers larger than 21, which matches the measured precision.
function n = factorial(n)
%FACTORIAL Factorial function.
% FACTORIAL(N) for scalar N, is the product of all the integers from 1 to N,
% i.e. prod(1:N). When N is an N-D matrix, FACTORIAL(N) is the factorial for
% each element of N. Since double precision numbers only have about
% 15 digits, the answer is only accurate for N <= 21. For larger N,
% the answer will have the correct order of magnitude, and is accurate for
% the first 15 digits.
%
% Class support for input N:
% float: double, single
% integer: uint8, int8, uint16, int16, uint32, int32, uint64, int64
%
% See also PROD.
% Copyright 1998-2012 The MathWorks, Inc.
N = n(:);
if ~isreal(n) || any(fix(N) ~= N) || any(N < 0)
error(message('MATLAB:factorial:NNegativeInt'))
end
if isa(n,'double')
thres = 171;
elseif isa(n, 'single')
thres = 35;
elseif isinteger(n)
thres = 21;
else
error(message('MATLAB:factorial:unsupportedType'));
end
N = min(N,thres);
m = max(1, max(N));
Fa = cumprod([1 1 2:m]);
n(:) = Fa(N+1);
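As a minimal illustration of the cumprod() lookup idea described above (my own sketch, not part of MathWorks' code):

N = [5 10 100];                % requested values
Fa = cumprod([1 1 2:max(N)]);  % Fa(k+1) = k!, built in one cumulative product
Fa(N + 1)                      % returns 5!, 10!, 100!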
Precision
Timing (moving average of 5)
The loop implementation is very simple in how it works, but it was very quick and took a lot of effort to beat. It scales linearly and seems to have the same precision as the factorial() function, so it is reasonable to assume that the loop implementation is also exact up to n = 21.
function fact = loop(n)
fact = 1;
for i = 2:n
fact = fact * i;
end
end
Precision
Timing (moving average of 5)
I learned during the Prime Sieve deep dive and Simpson's Rule that MATLAB is occasionally faster when working with vectors than when using a loop. However, that isn't the case here, although there are some benefits, like being a single line of code and probably scaling better than the loop implementation. However, MATLAB will only calculate the factorial up to 170 before outputting Inf, which is not enough for the vector implementation to become faster than the loop.
function fact = vector(n)
fact = prod(1:n);
end
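The Inf cutoff is easy to verify, since 171! is larger than the biggest double (realmax is roughly 1.8e308):

prod(1:170)   % 7.2574e+306, still representable as a double
prod(1:171)   % Inf, since 171! overflows double precision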
Precision
Timing (moving average of 5)
Admittedly, this one was mostly for fun, as well as to see how MATLAB handles recursion. It scales linearly, as you would expect, and sticks with the trend of having the same precision as the previous implementations. It is also slower than the loop implementation, which is what I would expect.
function fact = recursion(n)
if (n <= 1)
fact = 1;
return
end
fact = n * recursion(n - 1);
end
Precision (moving average of 10)
Timing (moving average of 10)
This implementation is the first one to use the Gamma Function definition of the factorial, which can take in any number, and was done using MATLAB's built-in gamma() function. As the code isn't freely available there is no listing to show; however, MATLAB's documentation cites
Cody, J., An Overview of Software Development for Special Functions, Lecture Notes in Mathematics, 506, Numerical Analysis Dundee, G. A. Watson (ed.), Springer Verlag, Berlin, 1976.
for its algorithm, which I believe I tracked down for interested readers. To my limited understanding, it uses different approximations over different ranges of the input to get the best performance and precision, which can be seen in the timing graph as the time steadily increases before stabilizing at around n = 10.
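For reference, evaluating a factorial through gamma() is just a shifted call, which is how it was timed here:

n = 100;
gamma(n + 1)               % 100!, using n! = gamma(n + 1)
timeit(@() gamma(n + 1))   % the quantity reported in the timing graphs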
Precision
Timing (moving average of 10)
My first thought for why the gamma function is so quick was that MATLAB can calculate integrals quickly, so I timed the integral() function. That turned out not to be the explanation: the gamma function is both more accurate and faster than the integral() call, although the integral's precision can be increased. How exactly the function works is beyond me, but the code is viewable for those interested.
n = 4;
fun = @(x) x.^n .* exp(-x);
integral(fun,0,10e2)
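If precision matters more than speed, integral() also accepts tolerance options; a sketch with tightened tolerances (the specific tolerance values here are just an illustration):

integral(fun, 0, 10e2, 'RelTol', 1e-12, 'AbsTol', 1e-15)   % tighter than the defaults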
Precision (moving average of 5)
Timing (moving average of 10)
This implementation uses Simpson's Rule, which, similarly to the recursive implementation, was partly for fun but was also a test to see if MATLAB is doing some sort of integral approximation. After choosing an n value (number of subintervals) such that the function calculates 0.5! to 5 decimal places, which is how many the other implementations would display, it was obvious that this wasn't the solution. Even after slight optimizations over the previous project, tailoring it to this specific integral rather than an arbitrary function, it was still far behind in terms of timing, though surprisingly close to the gamma function in precision.
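As a reminder, composite Simpson's rule with n subintervals of width Δx = (b − a)/n weights the interior points with alternating 4's and 2's:

$$\int_a^b f(x)\,dx \approx \frac{\Delta x}{3}\left[f(x_0) + 4f(x_1) + 2f(x_2) + 4f(x_3) + \cdots + 4f(x_{n-1}) + f(x_n)\right]$$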
function fact = simpsons_rule(z, a, b)
n = 1211534;
deltaX = (b - a) / n;
values = linspace(a + deltaX, b - deltaX, n - 1);
y = values.^z .* exp(-values);
odd = 4 .* y(1:2:end);
even = 2 .* y(2:2:end);
fact = (deltaX / 3) * (a^z * exp(-a) + b^z * exp(-b) + sum(odd) + sum(even));
end
Precision
Timing (moving average of 10)
This implementation uses Stirling's Approximation, which gives the factorial as a single closed-form expression. The problem is that, due to the heavier math, it is only barely faster than the loop implementation and still slower than the gamma function. It is also quite inaccurate, providing only around 2 digits of accuracy, although it does have the most satisfying precision graphs. Taking what I have learned, it could probably be made faster, but it is too inaccurate to be of much use.
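The exact form used in the code below is the sqrt((2n + 1/3)π) refinement of Stirling's formula (Gosper's version), rather than the more common sqrt(2πn):

$$n! \approx \sqrt{\left(2n + \tfrac{1}{3}\right)\pi}\;\, n^{n} e^{-n}$$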
function fact = stirling(n)
fact = sqrt((2 * n + 1/3) * pi) * n^n * exp(-n);
end
This implementation is the main one I worked on in terms of making it quicker and more precise, and overall the one I learned the most from. It uses the Lanczos approximation across two functions: one is an attempt at calculating the coefficients used, and the second actually implements the part that calculates the factorial. First, a quick overview of the method; below are the main equations used.
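In the form used throughout this section (the standard Lanczos approximation, with free parameter g and coefficients p0, p1, p2, ...):

$$z! = \Gamma(z+1) \approx \sqrt{2\pi}\,\left(z+g+\tfrac{1}{2}\right)^{z+\frac{1}{2}} e^{-\left(z+g+\frac{1}{2}\right)}\, A_g(z)$$

$$A_g(z) = p_0 + \frac{p_1}{z+1} + \frac{p_2}{z+2} + \frac{p_3}{z+3} + \cdots$$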
Where p0, p1, p2, and so on are the coefficients mentioned above, which can be pre-calculated to improve performance. So first, let's discuss the code that calculates the coefficients. I mainly followed the steps laid out by Paul Godfrey; in his example he used g = 5 and calculated 7 coefficients, which I will likewise use here. Below is the equation for the kth coefficient.
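Reconstructed from the code below and Godfrey's write-up, each coefficient is a weighted sum of the terms Fg(a) (up to the C(1,1) = 1/2 adjustment handled in the code):

$$p_k = \sum_{a=0}^{k} C_{2k,2a}\, F_g(a), \qquad F_g(a) = \sqrt{\frac{2}{\pi}}\;\frac{(2a-1)!!\; e^{\,a+g+\frac{1}{2}}}{2^{a}\left(a+g+\tfrac{1}{2}\right)^{a+\frac{1}{2}}}$$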
Note that (2a - 1)!! is the double factorial of (2a - 1). The code for Fg(a) is shown below:
%Calculates Fg(a)
a = 0:n-1;
f = sqrt(2/sym(pi)) * (doubleFact(2*a-1).*my_exp(a+g+sym(1)/2))./(2.^a.*(a+g+1/2).^(a+1/2));
and it does give Godfrey's answer.
I also implemented my own e^x function, along with other tricks, in order to get more precision; this will be discussed further later. The next step is to calculate C(2k,2a), the even columns and even rows of the coefficients of the Chebyshev polynomials; the code is below.
%Calculates Coefficients of the Chebyshev polynomials
%https://en.wikipedia.org/wiki/Lanczos_approximation
C = zeros(2*n-1,2*n-1);
%Gets the Chebyshev polynomials Coefficients
for N = 1:length(C)
for M = 1:length(C)
C(N,M) = T(N,M);
end
end
%Gets the even rows and columns of C
C = C(1:2:end, 1:2:end);
C(1,1) = 0.5;
This does give Godfrey's answer
Thus pk = C * Fg(a). Through some math and reworking, we have that Ag(z) = Z * D * B * pk, where Z, D, and B are as follows.
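Here Z is the row vector of basis terms that turns the coefficient vector into the partial-fraction series used in the final code (my reconstruction from that code):

$$Z(z) = \begin{bmatrix} 1 & \dfrac{1}{z+1} & \dfrac{1}{z+2} & \cdots & \dfrac{1}{z+n-1} \end{bmatrix}$$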
D is essentially the Sloane sequence (placed on a diagonal matrix), and B is built from the odd columns of the binomial coefficients, with alternating signs.
Therefore the coefficients are c = (D * B * C) * Fg(a); note that this is matrix multiplication, not element-wise multiplication. The coefficients for g = 9 and n = 11, which Godfrey found to be the most precise values, are below.
You may notice that my values and Godfrey's values are slightly off from each other. Despite my best efforts I couldn't figure out why they differ, but to my knowledge Godfrey's values are the more accurate ones, so I used those instead of my own when comparing against the gamma function.
%Calculates the coefficients for the approximation
%Main Steps: https://www.numericana.com/answer/info/godfrey.htm
function p = lancoz_coef(n,g)
digits(128)
%Calculates Fg(a)
a = 0:n-1;
f = sqrt(2/sym(pi)) * (doubleFact(2*a-1).*my_exp(a+g+sym(1)/2))./(2.^a.*(a+g+1/2).^(a+1/2));
%Version 1 that was less precise
% a = 0:n-1;
% f = vpa(sqrt(2)/pi)*gamma(a-0.5+1).*(a+g+0.5).^-(a+0.5).*exp(a+g+0.5);
% disp(vpa(f(1)))
%Version 2 that was less precise
% f = zeros(1,n);
% for i = 0:n-1
% f(i+1) = vpa(sqrt(2)/pi)*halfGamma(i-0.5+1)*(i+g+0.5)^-(i+0.5)*vpa(exp(i+g+0.5));
% end
%Calculates Coefficients of the Chebyshev polynomials
%https://en.wikipedia.org/wiki/Lanczos_approximation
C = zeros(2*n-1,2*n-1);
%Gets the Chebyshev polynomials Coefficients
for N = 1:length(C)
for M = 1:length(C)
C(N,M) = T(N,M);
end
end
%Gets the even rows and columns of C
C = C(1:2:end, 1:2:end);
C(1,1) = 0.5;
%Calculates D the Sloane sequence
D = zeros(n);
D(1,1) = 1;
for i = 0:n-2
D(i+2,i+2) = -factorial(2*i+2) ./ (2*factorial(i).*factorial(i+1));
end
%Calculates the Binomial Coefficients
%https://en.wikipedia.org/wiki/Binomial_coefficient
B = zeros(n);
%Trivial first row
B(1,:) = 1;
%Creates a vector with alternating 1's and -1's
alt = (-1).^(0:n-1);
%Gets the rows of B which come from the odd columns of the Binomial
%Coefficients and alternates the sign
for i = 2:n
row = i+i-3:n+i-3;
col = (i-2)*2+1;
B(i,i:end) = alt(1:end-i+1).*factorial(row) ./ (factorial(col) .* factorial(row - col));
end
%Multiplies the matrices to get a final answer for the
%coefficients
p = (D * B * C) * vpa(sym(f)');
disp(round(p,21))
end
%My implementation of exp(n), intended to give more precise answers
%https://en.wikipedia.org/wiki/E_(mathematical_constant)
function exp = my_exp(n)
digits(128)
k = 300;
exp = zeros(1,length(n));
a = vpa(1:k);
fact = factorial(a);
for i = 1:length(n)
exp(i) = vpa(1 + sum((n(i).^a ./ fact)));
end
end
%Made to calculate the gamma function of a half, ex: 9.5, 6.5, 0.5
function fact = halfGamma(n)
digits(128)
if n == 0.5
fact = vpa(sqrt(pi));
return
else
d = vpa(n-1);
fact = d * halfGamma(n-1);
end
end
%Chebyshev Polynomial Coefficients through recursion
%https://en.wikipedia.org/wiki/Lanczos_approximation#Coefficients
function ret = T(n,m)
if n <= 1 && m == 1
ret = 1;
elseif n == 2 && m == 2
ret = 1;
elseif m == 1
ret = -T(n-2,1);
elseif m == n
ret = 2 * T(n-1,m-1);
elseif n >= 1
ret = 2 * T(n-1,m-1) - T(n-2,m);
else
ret = 0;
end
end
%Implements the double factorial
%https://mathworld.wolfram.com/DoubleFactorial.html
function fact = doubleFact(n)
fact = zeros(1, length(n));
for i = 1:length(n)
term = n(i);
if term == -1 || term == 0
fact(i) = 1;
elseif mod(term,2) == 0
fact(i) = prod(2:2:term);
else
fact(i) = prod(1:2:term);
end
end
end
Now, having discussed the coefficient code, we can move on to the actual calculation, that being computing the factorial itself. First, a reminder of the new calculation.
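With g = 9 (so the g + 1/2 term is the 9.5 that appears in the code), the quantity being evaluated is:

$$z! \approx \sqrt{2\pi}\,(z+9.5)^{z+\frac{1}{2}}\, e^{-(z+9.5)}\, A_g(z), \qquad A_g(z) = p_0 + \frac{p_1}{z+1} + \cdots + \frac{p_{10}}{z+10}$$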
There are many ways to code this, some quicker than others. First, let's look at the part right before Ag(z). I originally coded it as shown:
temp = z+9.5;
comm = sqrt(2 * pi)*temp^z*sqrt(temp)*exp(-temp);
in an attempt to reduce the number of calculations and lean on built-in functions. I also learned that the slowest calculation is (z + 9.5)^z, because when z = 100, as in my tests, that is 109.5^100, which is a very expensive operation. That is when I came up with the idea of making the base a constant, ideally e, since MATLAB has a built-in exp() function. The following algebra shows the rewrite.
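Writing t = z + 9.5 and using t^z = e^(z ln t):

$$\sqrt{2\pi}\; t^{z}\,\sqrt{t}\; e^{-t} \;=\; \sqrt{2\pi t}\;\, e^{z\ln t}\, e^{-t} \;=\; \sqrt{2\pi t}\;\, e^{\,z\ln t - t}$$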
After further simplification, as well as splitting the computation across lines (both to time it and as a possible performance boost), we arrive at the final version:
temp = z+9.5;
comm = sqrt(2 * pi * temp);
comm = comm * exp(z * log(temp) - temp);
We see that the execution time dropped from around 0.00013689 ms to 7.0418e-05 ms for 100!, or a 64.1287% difference. The next significant optimization came from not using a loop: I originally had the coefficients in a vector and iterated through them, as seen below, with a mild decrease in the number of calculations:
c = [ %1.000000000000000174663;
5716.400188274341379136;
-14815.30426768413909044;
14291.49277657478554025;
-6348.160217641458813289;
1301.608286058321874105;
-108.1767053514369634679;
2.605696505611755827729;
-0.7423452510201416151527e-2;
0.5384136432509564062961e-7;
-0.4023533141268236372067e-8];
fact = 1.000000000000000174663;
for i = 1:10
%fact = fact + c(i) / (z + i - 1);
temp = c(i) / (z + i);
temp = 1 / (z + i);
temp = temp * c(i);
fact = fact + temp;
end
fact = fact * comm;
However, removing the loop and instead hard-coding the iterations, as seen below,
fact = 1.000000000000000174663;
fact = fact + 5716.400188274341379136 / (z + 1);
fact = fact - 14815.30426768413909044 / (z + 2);
fact = fact + 14291.49277657478554025 / (z + 3);
fact = fact - 6348.160217641458813289 / (z + 4);
fact = fact + 1301.608286058321874105 / (z + 5);
fact = fact - 108.1767053514369634679 / (z + 6);
fact = fact + 2.605696505611755827729 / (z + 7);
fact = fact - 0.007423452510201416151527 / (z + 8);
fact = fact + 0.00000005384136432509564062961 / (z + 9);
fact = fact - 0.000000004023533141268236372067 / (z + 10);
fact = fact * comm;
we see that 8.5678e-05 ms becomes 7.3558e-05 ms, or around a 15.2227% difference.
%Uses the Lanczos Approximation
%About: https://en.wikipedia.org/wiki/Lanczos_approximation
%Main Steps: https://www.numericana.com/answer/info/godfrey.htm
function fact = lanczos(z)
%g = 9;
%n = 11;
% p = [1.000000000434457002999
% 5716.40018826882469642082
% -14815.304267303827193624659
% 14291.492770243227957671811
% -6348.160174678220466185036
% 1301.608146277464603149011
% -108.176491813501910571434
% 2.605616416699605525942
% -0.007600138516935476197
% 0.000223334913969033078
% -0.000077312348075020927];
% c = [ %1.000000000000000174663;
% 5716.400188274341379136;
% -14815.30426768413909044;
% 14291.49277657478554025;
% -6348.160217641458813289;
% 1301.608286058321874105;
% -108.1767053514369634679;
% 2.605696505611755827729;
% -0.7423452510201416151527e-2;
% 0.5384136432509564062961e-7;
% -0.4023533141268236372067e-8];
%p = double(lancoz_coef(n,g));
%Previous Version that is slower
%comm = sqrt(2 * pi)*temp^z*sqrt(temp)*exp(-temp);
%Originally used a loop but for speed did it manually
% for i = 1:10
% %fact = fact + c(i) / (z + i - 1);
% temp = c(i) / (z + i);
% temp = 1 / (z + i);
% temp = temp * c(i);
% fact = fact + temp;
% end
% fact = fact * comm;
temp = z+9.5;
comm = sqrt(2 * pi * temp);
comm = comm * exp(z * log(temp) - temp);
fact = 1.000000000000000174663;
fact = fact + 5716.400188274341379136 / (z + 1);
fact = fact - 14815.30426768413909044 / (z + 2);
fact = fact + 14291.49277657478554025 / (z + 3);
fact = fact - 6348.160217641458813289 / (z + 4);
fact = fact + 1301.608286058321874105 / (z + 5);
fact = fact - 108.1767053514369634679 / (z + 6);
fact = fact + 2.605696505611755827729 / (z + 7);
fact = fact - 0.007423452510201416151527 / (z + 8);
fact = fact + 0.00000005384136432509564062961 / (z + 9);
fact = fact - 0.000000004023533141268236372067 / (z + 10);
fact = fact * comm;
end
Precision (moving average of 10)
Timing (moving average of 10)
Gamma and Lanczos got a moving average of 10; Loop got a moving average of 5
As one would expect, I didn't revolutionize anything; however, I came very close to MATLAB's gamma function and firmly beat the loop implementation, which I am very proud of.